[Time] Name | Message |
[09:10] vegaicm
|
Question: is it possible for an application to have multiple I/O threads (and therefore multiple zmq sockets) binding on the same TCP port? What I want to achieve is something similar to the classic pattern of spawning a limited number of threads for each new accepted connection. Considering the architecture of zeromq this does not seem possible, but having this option may increase throughput.
|
[09:13] guido_g
|
first, io-threads are ømq internal things and it is unlikely that you'll ever need more than one
|
[09:13] guido_g
|
second, some worker pool examples are shown in the guide
|
[09:15] guido_g
|
you can create a forwarder that reads incoming messages and distributes them to the workers in a few lines of code; again, see the guide for examples
|
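A forwarder in front of a worker pool really can be written in a few lines. Below is a rough sketch of the idea, assuming the pyzmq Python binding; the inproc endpoint names, the `demo` helper, and the worker count are illustrative, not taken from the guide:

```python
# A minimal forwarder + worker-pool sketch (assumes the pyzmq binding).
# Endpoint names and counts are illustrative.
import threading
import time
import zmq

ctx = zmq.Context.instance()

def forwarder():
    """Reads incoming messages and fair-queues them out to the workers."""
    frontend = ctx.socket(zmq.PULL)   # producers push work here
    backend = ctx.socket(zmq.PUSH)    # workers pull work from here
    frontend.bind("inproc://frontend")
    backend.bind("inproc://backend")
    zmq.proxy(frontend, backend)      # blocks, shuttling messages to workers

def worker(results, lock):
    sock = ctx.socket(zmq.PULL)
    sock.connect("inproc://backend")
    while True:
        msg = sock.recv()
        with lock:
            results.append(msg)

def demo(n_workers=4, n_msgs=100):
    results, lock = [], threading.Lock()
    threading.Thread(target=forwarder, daemon=True).start()
    time.sleep(0.2)                   # let the forwarder bind its endpoints
    for _ in range(n_workers):
        threading.Thread(target=worker, args=(results, lock),
                         daemon=True).start()
    push = ctx.socket(zmq.PUSH)
    push.connect("inproc://frontend")
    for i in range(n_msgs):
        push.send(("task-%d" % i).encode())
    while True:                       # wait until every message was delivered
        with lock:
            if len(results) == n_msgs:
                return len(results)
        time.sleep(0.01)
```

Here the forwarder plays the role of a device; swapping the frontend to a tcp:// endpoint gives the client-facing setup discussed later in the log.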
[09:16] vegaicm
|
guido_g: I tried the simple hello world multithread example, but I am not really impressed by its performance. Because all the messages pass through one device I see it as a bottleneck. I was thinking of creating more devices on different ports, but this is not really what I was looking for: all the devices on the same port
|
[09:17] guido_g
|
you're going through this port anyway on os level, so i can't see any problem
|
[09:18] Guthur
|
vegaicm: If you look at the device code there is actually not much there
|
[09:18] Guthur
|
it just receives a message and passes it on
|
[09:19] Guthur
|
If that becomes a bottleneck I'd hazard that things have become too granular and you are not doing enough in your worker threads
|
[09:21] guido_g
|
w/o a more or less detailed description of the work to do and the message frequency all one can do is to guess
|
[09:22] vegaicm
|
guido_g, Guthur: what I meant is that with a zmq device I have 1 thread processing all the communication through a TCP socket. If I could have more than 1 thread managing tcp/ip communication I could have more throughput. Give me a minute and I will try to describe what I am comparing, but Guthur you are right: the worker threads aren't doing much ... details coming now
|
[09:23] guido_g
|
vegaicm: read the guide, it explains what you can do in various simple examples, really
|
[09:25] vegaicm
|
I created a simple application using standard socket programming. This application spawns 1 thread for each incoming connection and processes client requests. What the application does is get a key=>value from the client and write it into a hash table with the proper mutex handling. I can reach 360k SET/sec. With zmq using 1 device and 10 workers that do absolutely nothing I can't go beyond 45k messages/sec
|
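The thread-per-connection baseline described above can be sketched with nothing but the standard library; this is an illustration of the pattern, not vegaicm's actual code, and the `key=value\n` wire format and function names are assumptions:

```python
# Stdlib-only sketch of the thread-per-connection baseline: one thread per
# accepted connection, writing key=value pairs into a shared dict under a
# mutex. The wire format and names are illustrative.
import socket
import threading

store = {}
store_lock = threading.Lock()

def handle_client(conn):
    with conn:
        buf = b""
        while True:
            data = conn.recv(4096)
            if not data:
                break
            buf += data
            while b"\n" in buf:
                line, buf = buf.split(b"\n", 1)
                key, _, value = line.partition(b"=")
                with store_lock:          # mutex around the shared table
                    store[key] = value
                conn.sendall(b"OK\n")

def serve(host="127.0.0.1", port=0):
    """Bind, then spawn one handler thread per accepted connection."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))
    srv.listen()
    port = srv.getsockname()[1]
    def accept_loop():
        while True:
            conn, _ = srv.accept()        # classic accept-and-spawn loop
            threading.Thread(target=handle_client, args=(conn,),
                             daemon=True).start()
    threading.Thread(target=accept_loop, daemon=True).start()
    return port
```

The mutex is the interesting part: at 360k SET/sec the lock is held very briefly per request, which is why this design can still scale to many threads.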
[09:26] vegaicm
|
guido_g: I will check the guide again, maybe I missed something in there. Thanks.
|
[09:26] guido_g
|
http://zguide.zeromq.org/chapter:all#toc33
|
[09:27] vegaicm
|
guido_g: I read it, and tried that code. It doesn't meet my expectation :-/
|
[09:27] vegaicm
|
"You cannot use sockets except in the thread that created them."
|
[09:27] guido_g
|
so?
|
[09:28] vegaicm
|
it is a bottleneck for me
|
[09:28] guido_g
|
sure
|
[09:30] vegaicm
|
I can create multiple ZMQ_REQ sockets and multiple worker pools, but this would mean I have to bind on multiple TCP ports
|
[09:31] guido_g
|
sounds fishy, but hey...
|
[09:31] Guthur
|
vegaicm: are you using inproc connections at the worker end of the device?
|
[09:32] vegaicm
|
correct, it is fishy, especially for the client. Unless the client itself is a multithreaded application that connects to multiple sockets.
|
[09:33] vegaicm
|
Guthur: yes, inproc between workers. tcp only for client-server communication
|
[09:43] Guthur
|
guido_g: Does that performance hit sound right? I'd imagine that there would be some overhead but that seems like quite a bit.
|
[09:45] guido_g
|
depends on the implementation i'd say
|
[09:46] guido_g
|
but ømq is not a "one size fits all" thing
|
[09:56] vegaicm
|
Guthur, guido_g : this is a *vague* idea of what I mean: http://pastebin.com/BWcT62GK . On 127.0.0.1 and with TCP, 1 client - 1 server I have 100k/4.485 = 22.3k messages processed per second. If I have 10 clients and 10 servers, the messages processed are 100k x 10 / 6.379 (the slower client) = 156.8k messages per second. If I increase the number of clients/servers the figures go even better.
|
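For reference, the figures quoted in that message reduce to simple arithmetic (this just reproduces the numbers above, it is not a new measurement):

```python
# Reproducing the throughput arithmetic from the message above.
single = 100000 / 4.485       # 100k messages / elapsed seconds, 1 client -> 1 server
multi = 100000 * 10 / 6.379   # 10 clients x 100k messages / slowest client's time

print(round(single / 1000.0, 1))   # 22.3  (thousand msg/sec)
print(round(multi / 1000.0, 1))    # 156.8 (thousand msg/sec)
```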
[10:03] guido_g
|
and what are you showing?
|
[10:08] vegaicm
|
guido_g: look at this: http://pastebin.com/GXbDiS9F . I start 10 clients that connect to a multithreaded server that has 10 workers. 10 clients -> 1 MT server = 41.8k msg/sec. 10 clients -> 10 servers = 156.8k msg/sec. Where is the bottleneck? I doubt it is in inproc; I think it is in the tcp socket
|
[10:10] guido_g
|
again, what do you want to show?
|
[10:10] vegaicm
|
I want to show that there is a bottleneck in having just 1 thread processing messages from a tcp port
|
[10:11] guido_g
|
this is something widely known, but ok
|
[10:13] vegaicm
|
guido_g: ok :) Then what I am adding is that having multiple threads on the same tcp port can increase performance a lot.
|
[10:14] guido_g
|
threads on a port?
|
[10:15] vegaicm
|
yep. With standard socket programming this is possible and quite easy. For example you can have the main process bound on the socket and waiting for connections. For each connection it spawns a thread.
|
[10:15] vegaicm
|
but by design I think this is impossible in zeromq
|
[10:16] guido_g
|
it's outright stupid
|
[10:16] guido_g
|
given the fact that a single server can process thousands of ongoing connections
|
[10:17] guido_g
|
this is why queueing and a worker pool is used
|
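The queueing + worker pool alternative guido_g refers to can be shown with the standard library alone; ZeroMQ's internal queues replace the explicit `queue.Queue` here, but the shape is the same. The helper name, pool size, and sentinel convention are illustrative:

```python
# Stdlib sketch of the queue + fixed worker pool pattern, as an alternative
# to one thread per client. Sizes and names are illustrative.
import queue
import threading

def start_pool(n_workers, handler):
    """Start n_workers threads that pull tasks from a shared queue."""
    tasks = queue.Queue()
    results = queue.Queue()
    def worker():
        while True:
            item = tasks.get()
            if item is None:       # sentinel: shut this worker down
                break
            results.put(handler(item))
    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    return tasks, results, threads
```

A fixed pool keeps the thread count bounded no matter how many clients connect, which is exactly the property the thread-per-connection design gives up.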
[10:21] vegaicm
|
what if I am ok with having hundreds/thousands of threads? As I was saying, I have a prototype of an application that spawns 1000 threads and processes requests from 1000 clients at a very good speed, 360k requests/sec. And it isn't just a "hello world" where the workers do nothing; it actually processes a few GB of data
|
[10:22] vegaicm
|
and I would love to reach the same performance using zeromq , if possible
|
[10:22] vegaicm
|
thanks :D
|
[10:42] pieterh
|
hi vegaicm
|
[10:42] pieterh
|
would you like to discuss high performance server architectures?
|
[10:43] vegaicm
|
hi pieterh . Sure
|
[10:43] pieterh
|
first of all, what is the work your threads are doing?
|
[10:45] vegaicm
|
right now the worker threads in zeromq just send the "world" string back to the client, nothing more. But my future idea is to use the workers to process data in a key=>value store
|
[10:46] pieterh
|
your basic architecture depends on the kinds of work you are doing
|
[10:46] pieterh
|
i will explain
|
[10:46] pieterh
|
some tasks are totally independent, e.g. "here is a data set, compute me some stuff on it"
|
[10:46] pieterh
|
some tasks are highly interdependent, e.g. "get me something from a database"
|
[10:47] pieterh
|
in the first case, you can create a highly concurrent (parallel) architecture
|
[10:47] pieterh
|
in the second case that is fairly pointless and you can use a more serial architecture
|
[10:47] pieterh
|
ack?
|
[10:47] vegaicm
|
ack so far
|
[10:48] pieterh
|
now you need to get requests from clients and transfer them to your workers as rapidly as you can
|
[10:48] pieterh
|
workers can be processes, threads, or whatever
|
[10:48] pieterh
|
there are many strategies for doing this
|
[10:48] pieterh
|
indeed, starting a thread for each client is one option. Some architectures start a process for each client.
|
[10:48] pieterh
|
on Linux, a process, a thread, it's much the same
|
[10:49] pieterh
|
both are fairly heavy mechanisms
|
[10:49] vegaicm
|
ack on all this too
|
[10:49] pieterh
|
ack?
|
[10:50] vegaicm
|
yep
|
[10:50] pieterh
|
so in general when you come to ask for architecture advice, do not explain your solutions
|
[10:50] pieterh
|
but rather explain your problems and then collect possible solutions
|
[10:50] pieterh
|
now I'll explain the very fastest possible model for getting stuff off a network and to workers
|
[10:51] vegaicm
|
ok, I am following you
|
[10:51] pieterh
|
first of all, you have one thread per network device
|
[10:51] pieterh
|
so that data can flow without interrupts
|
[10:51] pieterh
|
second, you have one thread per worker
|
[10:51] pieterh
|
so that tasks can execute without interrupts
|
[10:51] pieterh
|
third, you connect these two sets of threads using lock-free queues
|
[10:52] pieterh
|
so that messages can flow without interrupts
|
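The three layers pieterh describes (network-facing thread, worker threads, queues in between) map directly onto the Guide's multithreaded-server pattern. A rough sketch with the pyzmq binding follows; the endpoint names, worker count, and the trivial "world" handler are illustrative, and ZeroMQ's own internal queues stand in for the lock-free layer:

```python
# Sketch of the three layers with pyzmq: a ROUTER/DEALER proxy as the
# network-facing layer, a fixed pool of worker threads, and ZeroMQ's
# internal queues carrying messages between them. Names are illustrative.
import threading
import zmq

ctx = zmq.Context.instance()

def worker():
    sock = ctx.socket(zmq.REP)
    sock.connect("inproc://workers")
    while True:
        sock.recv()
        sock.send(b"world")           # trivial handler, as in the benchmark

def start_server(n_workers=4):
    frontend = ctx.socket(zmq.ROUTER)  # clients connect here over tcp
    port = frontend.bind_to_random_port("tcp://127.0.0.1")
    backend = ctx.socket(zmq.DEALER)   # workers sit behind this, in-process
    backend.bind("inproc://workers")
    for _ in range(n_workers):
        threading.Thread(target=worker, daemon=True).start()
    threading.Thread(target=zmq.proxy, args=(frontend, backend),
                     daemon=True).start()
    return port
```

Note how nothing here locks: each worker owns its socket, and the only shared state is inside the library's queues.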
[10:52] CIA-20
|
zeromq2: 03Mikko Koppanen 07master * r945c931 10/ (acinclude.m4 configure.in):
|
[10:52] CIA-20
|
zeromq2: Run autoupdate on the configure.in
|
[10:52] CIA-20
|
zeromq2: I ran autoupdate on the configure.in, which generated most of the
|
[10:52] CIA-20
|
zeromq2: patch attached. There is also a small manual fix in which removes the
|
[10:52] CIA-20
|
zeromq2: warning "Remember to add LT_INIT to configure.in" which I assume is
|
[10:52] CIA-20
|
zeromq2: because AC_PROG_LIBTOOL was called inside a macro.
|
[10:52] CIA-20
|
zeromq2: Signed-off-by: Mikko Koppanen <mkoppanen@php.net> - http://bit.ly/ciuque
|
[10:52] pieterh
|
the single greatest slow down in a concurrent / multithread architecture is when threads are swapped in and out, or locked, or waiting
|
[10:53] pieterh
|
ideally, thus, you end up with exactly 1 thread per CPU, running full speed without any locks
|
[10:53] pieterh
|
ack?
|
[10:53] vegaicm
|
I am following you 100% so far
|
[10:53] pieterh
|
you understand that this is very different than "one thread per client", which will result in chaotic use of the CPU cores
|
[10:53] pieterh
|
now here is the difficulty
|
[10:54] pieterh
|
creating these three layers that i described (i/o, workers, queues) is very very hard
|
[10:54] pieterh
|
that is why practically no projects do this
|
[10:54] pieterh
|
and so, people use techniques like you described
|
[10:54] pieterh
|
happily, we made zeromq for you
|
[10:54] pieterh
|
and it does exactly this
|
[10:55] pieterh
|
by design zeromq does not let you create one thread per client because it would be silly
|
[10:55] pieterh
|
what you do want control over
|
[10:55] pieterh
|
is the threading model for your workers
|
[10:56] pieterh
|
which is what zeromq gives you... you decide exactly how to architect your workers across one or more boxes
|
[10:57] pieterh
|
if you need to, you can start more than 1 I/O thread
|
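For completeness, a one-liner showing where the I/O thread count is set, assuming the pyzmq binding (in the 0MQ 2.x C API of the era this is the io_threads argument to zmq_init):

```python
# The I/O thread count is fixed when the context is created
# (pyzmq binding assumed; default is 1).
import zmq

ctx = zmq.Context(io_threads=2)
print(ctx.get(zmq.IO_THREADS))   # 2
```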
[10:57] pieterh
|
does this all make sense?
|
[10:57] vegaicm
|
the 1 I/O thread seems to be my current limitation, because I have multiple TCP circuits handled by a single I/O thread. Is this correct?
|
[10:58] pieterh
|
note that 1 I/O thread can handle a _lot_ of messages per second
|
[10:58] pieterh
|
like several million
|
[10:58] vegaicm
|
on TCP ?
|
[10:58] pieterh
|
most of what you think you know about performance is, unfortunately, wrong
|
[10:58] pieterh
|
yes
|
[10:59] pieterh
|
you can monitor your system, under load, and test with more I/O threads
|
[10:59] pieterh
|
that is useful if you have more NICs
|
[11:00] vegaicm
|
pieterh: how can I test the maximum number of msgs/sec an I/O thread can process over TCP?
|
[11:01] pieterh
|
use the performance test tools provided with zeromq
|
[11:01] pieterh
|
vegaicm, have you read the Guide?
|
[11:02] vegaicm
|
pieterh: tbh only the first two chapters. The multi-thread approach is what I was interested in, and I wanted to test it before going further
|
[11:04] pieterh
|
with zeromq you have to abandon your view of a server as a big thing with many clients
|
[11:04] pieterh
|
think instead of a cluster of servers
|
[11:05] pieterh
|
no central bottlenecks
|
[11:08] pieterh
|
vegaicm, I need to go, I hope this has helped you
|
[11:09] vegaicm
|
a cluster of servers is my aim. And inter-server communication is what I need to be fast. This is why I was thinking of multiple I/O threads that manage persistent connections between clients and servers and between servers. So actually I don't want an unlimited number of I/O threads spawned on demand, but a limited (and almost predefined) yet high number of connections.
|
[11:09] vegaicm
|
pieterh: thanks a lot for your time, I will continue reading the guide; maybe later on there will be something more relevant for the problem I have. Thanks again
|
[11:09] pieterh
|
kindly, don't try to do 0MQ's work, it's not constructive
|
[11:10] vegaicm
|
ack :)
|
[11:11] pieterh
|
as well as reading the guide, make your own simple examples
|
[11:11] pieterh
|
and when you find _real_, not _imagined_ performance issues, let's talk again
|
[11:11] vegaicm
|
pieterh: will do that. Thanks.
|
[12:48] mikko
|
good morning
|
[12:49] Guthur
|
morning mikko
|
[12:49] Guthur
|
cheers for the help last night
|
[12:50] Guthur
|
Hopefully the test applications should execute without exceptions now
|
[12:55] mikko
|
Guthur: haven't checked this morning
|
[12:55] mikko
|
no warnings now
|
[13:13] Guthur
|
mikko, cool
|
[14:56] mikko
|
sustrik: icc build: nbytes != -1 (tcp_socket.cpp:197)
|
[15:02] sustrik
|
mikko: we have to find out what the errno is at that point...
|
[15:03] sustrik
|
it should be reported to stderr
|
[15:03] sustrik
|
let me look
|
[15:04] mikko
|
/bin/bash: line 5: 8278 Aborted ${dir}$tst
|
[15:04] mikko
|
Bad file descriptor
|
[15:04] sustrik
|
Bad File Descriptor
|
[15:04] mikko
|
maybe it's a race condition
|
[15:04] sustrik
|
yes
|
[15:04] mikko
|
as multiple builds might try to bind to the same things
|
[15:04] sustrik
|
maybe one of the problems already reported on the ML
|
[15:05] sustrik
|
mikko: all builds go to the same directory?
|
[15:05] mikko
|
sustrik: nope
|
[15:06] mikko
|
sustrik: but they might try to bind at the same time
|
[15:06] sustrik
|
good
|
[15:06] mikko
|
if you have concurrent builds
|
[15:06] sustrik
|
aha
|
[15:06] mikko
|
only GCC builds do 'make install'
|
[15:06] sustrik
|
true
|
[15:06] sustrik
|
anyway, this is a different problem
|
[15:06] sustrik
|
a bug in 0mq itself
|
[15:07] mikko
|
non icc specific?
|
[15:07] sustrik
|
i don't think so
|
[15:08] sustrik
|
it looks like we are trying to use a file descriptor that we've already closed
|
[16:52] mikko
|
sustrik: there?
|