Thursday November 18, 2010

[Time] Name Message
[09:10] vegaicm Question: is it possible for an application to have multiple I/O threads (and therefore multiple zmq sockets) binding on the same TCP port? What I want to achieve is something similar to the classic spawning a limited number of threads for each new accepted connection. Considering the architecture of zeromq this seems not possible, but having this option may increase throughput.
[09:13] guido_g first, io-threads are ømq internal things and it is unlikely that you'll ever need more than one
[09:13] guido_g second, some worker pool examples are shown in the guide
[09:15] guido_g you can create a forwarder reading incoming messages and distributing them to the workers with a few lines of code, again see the guide for examples
[09:16] vegaicm guido_g: I tried the simple multithreaded hello world, but I am not really impressed by its performance. Because all the messages pass through one device, I see it as a bottleneck. I was thinking of creating more devices on different ports, but this is not really what I was looking for: all the devices on the same port
[09:17] guido_g you're going through this port anyway on os level, so i can't see any problem
[09:18] Guthur vegaicm: If you look at the device code there is actually not much there
[09:18] Guthur it just receives a message and passes it on
[09:19] Guthur If that becomes a bottleneck I'd hazard that things have become too granular and you are not doing enough in your worker threads
[09:21] guido_g w/o a more or less detailed description of the work to do and the message frequency all one can do is to guess
[09:22] vegaicm guido_g, Guthur : what I meant is that with a zmq device I have 1 thread processing all the communication through a TCP socket. While if I can have more than 1 thread managing tcp/ip communication I can have more throughput. Give me a minute and I will try to describe what I am comparing, but Guthur you are right: the worker threads aren't doing much ... details coming now
[09:23] guido_g vegaicm: read the guide, it explains what you can do in various simple examples, really
[09:25] vegaicm I created a simple application using standard socket programming. This application spawns 1 thread for each incoming connection and processes client requests. What the application does is get a key=>value from the client and write it in a hash table with the proper mutex handling. I can reach 360k SET/sec . With zmq using 1 device and 10 workers that do absolutely nothing I can't go beyond 45k messages/sec
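A minimal sketch of the thread-per-connection design vegaicm describes here: one thread per accepted connection, with a mutex guarding a shared hash table. The `SET key value` line protocol, the `OK`/`ERR` replies, and the names `SetHandler`, `ThreadedServer`, and `run_demo` are assumptions made for illustration. The original application's code does not appear in the log, and this Python version makes no claim about its 360k SET/sec figure.

```python
import socket
import socketserver
import threading

store = {}                      # shared hash table
store_lock = threading.Lock()   # mutex guarding the table


class SetHandler(socketserver.StreamRequestHandler):
    """Runs in its own thread, one per accepted connection."""

    def handle(self):
        # Hypothetical wire format: one "SET key value" line per request.
        for raw in self.rfile:
            parts = raw.decode().strip().split(maxsplit=2)
            if len(parts) == 3 and parts[0] == "SET":
                with store_lock:
                    store[parts[1]] = parts[2]
                self.wfile.write(b"OK\n")
            else:
                self.wfile.write(b"ERR\n")


class ThreadedServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    daemon_threads = True        # handler threads do not block exit
    allow_reuse_address = True


def run_demo():
    # Port 0 lets the OS choose a free port.
    with ThreadedServer(("127.0.0.1", 0), SetHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        with socket.create_connection(server.server_address) as conn:
            conn.sendall(b"SET colour blue\n")
            reply = conn.makefile("rb").readline()
        server.shutdown()
    with store_lock:
        return reply, dict(store)
```

Calling `run_demo()` starts the server, sends a single request over loopback, and returns the reply together with a copy of the table.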
[09:26] vegaicm guido_g: I will check the guide again, maybe I missed something in there. Thanks.
[09:26] guido_g
[09:27] vegaicm guido_g: I read it, and tried that code. It doesn't meet my expectation :-/
[09:27] vegaicm "You cannot use sockets except in the thread that created them."
[09:27] guido_g so?
[09:28] vegaicm it is a bottleneck for me
[09:28] guido_g sure
[09:30] vegaicm I can create multiple ZMQ_REQ sockets and multiple workers pool, but this would mean I have to bind on multiple TCP ports
[09:31] guido_g sounds fishy, but hey...
[09:31] Guthur vegaicm: are you using inproc connections at the worker end of the device?
[09:32] vegaicm correct, it is fishy , especially for the client . Unless the client itself is a multithreaded application that connects to multiple sockets.
[09:33] vegaicm Guthur: yes, inproc between workers. tcp only for client-server communication
[09:43] Guthur guido_g: Does that performance hit sound right? I'd imagine that there would be some overhead but that seems like quite a bit.
[09:45] guido_g depends on the implementation i'd say
[09:46] guido_g but ømq is not a "one size fits all" thing
[09:56] vegaicm Guthur, guido_g : this is a *vague* idea of what I mean: . On and with TCP, 1 client - 1 server I have 100k/4.485 = 22.3k messages processed per second. If I have 10 clients and 10 servers, the messages processed are 100k x 10 / 6.379 (the slower client) = 156.8k messages per second. If I increase the number of clients/servers the figures go even better.
[10:03] guido_g and what you're showing?
[10:08] vegaicm guido_g: look at this: . I start 10 clients that connect to a multithreaded server that has 10 workers . 10 clients -> 1 MT server = 41.8k msg/sec . 10 clients -> 10 servers = 156.8k msg/sec . Where is the bottleneck? I doubt it is in inproc, but in the tcp socket
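The figures vegaicm quotes can be reproduced with a few lines of arithmetic. This is only a sketch of the calculation as stated in the chat (messages per client, times the number of clients, divided by the slowest client's run time); the function name `throughput` is hypothetical, and the 41.8k figure is taken from the message as given rather than derived.

```python
def throughput(messages_per_client, clients, slowest_seconds):
    """Aggregate messages per second, timed by the slowest client."""
    return messages_per_client * clients / slowest_seconds


single = throughput(100_000, 1, 4.485)      # 1 client  -> 1 server
ten_pairs = throughput(100_000, 10, 6.379)  # 10 clients -> 10 servers

print(f"1 client  -> 1 server   : {single / 1000:.1f}k msg/s")
print(f"10 clients -> 10 servers: {ten_pairs / 1000:.1f}k msg/s")
# Ratio against the 41.8k msg/s reported for one multithreaded server.
print(f"10 servers vs 1 MT server: {ten_pairs / 41_800:.2f}x")
```

The first two lines print 22.3k and 156.8k, matching the numbers in the chat.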
[10:10] guido_g again, what do you want to show?
[10:10] vegaicm I want to show that there is a bottleneck in having just 1 thread processing messages from a tcp port
[10:11] guido_g this is something widely known, but ok
[10:13] vegaicm guido_g: ok :) Then , what I am adding too is that having multiple threads on the same tcp port can increase performance a lot.
[10:14] guido_g threads on a port?
[10:15] vegaicm yep . With standard socket programming this is possible and quite easy. For example you can have the main process bound to the socket and waiting for connections. For each connection it spawns a thread.
[10:15] vegaicm but by design I think this is impossible in zeromq
[10:16] guido_g it's outright stupid
[10:16] guido_g given the fact that a single server can process thousands of ongoing connections
[10:17] guido_g this is why queueing and a worker pool is used
[10:21] vegaicm what if I am ok to have hundreds/thousands of threads. As I was saying, I have a prototype of an application that spawns 1000 threads and processes requests from 1000 clients at a very good speed, 360k requests/sec . And it isn't just a "hello world" where the workers do nothing, but it actually processes a few GB of data
[10:22] vegaicm and I would love to reach the same performance using zeromq , if possible
[10:22] vegaicm thanks :D
[10:42] pieterh hi vegaicm
[10:42] pieterh would you like to discuss high performance server architectures?
[10:43] vegaicm hi pieterh . Sure
[10:43] pieterh first of all, what is the work your threads are doing?
[10:45] vegaicm right now the worker threads in zeromq they just send the "world" string back to the client, nothing more. But my future idea is to use the workers to process data in a key=>value storage
[10:46] pieterh your basic architecture depends on the kinds of work you are doing
[10:46] pieterh i will explain
[10:46] pieterh some tasks are totally independent, e.g. "here is a data set, compute me some stuff on it"
[10:46] pieterh some tasks are highly interdependent, e.g. "get me something from a database"
[10:47] pieterh in the first case, you can create a highly concurrent (parallel) architecture
[10:47] pieterh in the second case that is fairly pointless and you can use a more serial architecture
[10:47] pieterh ack?
[10:47] vegaicm ack so far
[10:48] pieterh now you need to get requests from clients and transfer them to your workers as rapidly as you can
[10:48] pieterh workers can be processes, threads, or whatever
[10:48] pieterh there are many strategies for doing this
[10:48] pieterh indeed, starting a thread for each client is one option. Some architectures start a process for each client.
[10:48] pieterh on Linux, a process, a thread, it's much the same
[10:49] pieterh both are fairly heavy mechanisms
[10:49] vegaicm ack on all this too
[10:49] pieterh ack?
[10:50] vegaicm yep
[10:50] pieterh so in general when you come to ask for architecture advice, do not explain your solutions
[10:50] pieterh but rather explain your problems and then collect possible solutions
[10:50] pieterh now I'll explain the very fastest possible model for getting stuff off a network and to workers
[10:51] vegaicm ok, I am following you
[10:51] pieterh first of all, you have one thread per network device
[10:51] pieterh so that data can flow without interrupts
[10:51] pieterh second, you have one thread per worker
[10:51] pieterh so that tasks can execute without interrupts
[10:51] pieterh third, you connect these two sets of threads using lock-free queues
[10:52] pieterh so that messages can flow without interrupts
[10:52] CIA-20 zeromq2: Mikko Koppanen master * r945c931 / (acinclude.m4
[10:52] CIA-20 zeromq2: Run autoupdate on the
[10:52] CIA-20 zeromq2: I ran autoupdate on the, which generated most of the
[10:52] CIA-20 zeromq2: patch attached. There is also a small manual fix in which removes the
[10:52] CIA-20 zeromq2: warning "Remember to add LT_INIT to" which I assume is
[10:52] CIA-20 zeromq2: because AC_PROG_LIBTOOL was called inside a macro.
[10:52] CIA-20 zeromq2: Signed-off-by: Mikko Koppanen <> -
[10:52] pieterh the single greatest slowdown in a concurrent / multithreaded architecture is when threads are swapped in and out, or locked, or waiting
[10:53] pieterh ideally, thus, you end up with exactly 1 thread per CPU, running full speed without any locks
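The three layers pieterh lists (an I/O thread, a fixed set of worker threads, and queues connecting them) can be sketched roughly as follows. This is an illustration under stated assumptions, not how zeromq is implemented: Python's `queue.Queue` uses locks internally rather than being lock-free, `io_thread` reads from a list instead of a network device, and `NUM_WORKERS`, `STOP`, and `run_pipeline` are hypothetical names.

```python
import queue
import threading

NUM_WORKERS = 4      # assumption: ideally one per CPU core
STOP = object()      # sentinel telling a worker to exit


def io_thread(requests, inbox):
    """Stand-in for the network layer: hands each request to the workers."""
    for request in requests:
        inbox.put(request)
    for _ in range(NUM_WORKERS):
        inbox.put(STOP)


def worker(inbox, outbox):
    """One long-lived thread per worker, however many clients exist."""
    while True:
        request = inbox.get()
        if request is STOP:
            break
        outbox.put(request.upper())   # placeholder for real work


def run_pipeline(requests):
    # Note: queue.Queue is lock-based, unlike the lock-free queues
    # described in the chat; it only shows the shape of the design.
    inbox, outbox = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(inbox, outbox))
               for _ in range(NUM_WORKERS)]
    for t in workers:
        t.start()
    io = threading.Thread(target=io_thread, args=(requests, inbox))
    io.start()
    io.join()
    for t in workers:
        t.join()
    # Workers finish in no fixed order, so sort for a stable result.
    return sorted(outbox.get() for _ in range(len(requests)))
```

The number of threads here is fixed by `NUM_WORKERS`, not by the number of requests or clients, which is the contrast with "one thread per client" being drawn in the conversation.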
[10:53] pieterh ack?
[10:53] vegaicm I am following you 100% so far
[10:53] pieterh you understand that this is very different than "one thread per client", which will result in chaotic use of the CPU cores
[10:53] pieterh now here is the difficulty
[10:54] pieterh creating these three layers that i described (i/o, workers, queues) is very very hard
[10:54] pieterh that is why practically no projects do this
[10:54] pieterh and so, people use techniques like you described
[10:54] pieterh happily, we made zeromq for you
[10:54] pieterh and it does exactly this
[10:55] pieterh by design zeromq does not let you create one thread per client because it would be silly
[10:55] pieterh what you do want control over
[10:55] pieterh is the threading model for your workers
[10:56] pieterh which is what zeromq gives you... you decide exactly how to architect your workers across one or more boxes
[10:57] pieterh if you need to, you can start more than 1 I/O thread
[10:57] pieterh does this all make sense?
[10:57] vegaicm the 1 I/O thread seems to be my current limitation, because I have multiple TCP circuits handled by a single I/O thread. Is this correct?
[10:58] pieterh note that 1 I/O thread can handle a _lot_ of messages per second
[10:58] pieterh like several million
[10:58] vegaicm on TCP ?
[10:58] pieterh most of what you think you know about performance is, unfortunately, wrong
[10:58] pieterh yes
[10:59] pieterh you can monitor your system, under load, and test with more I/O threads
[10:59] pieterh that is useful if you have more NICs
[11:00] vegaicm pieterh: how can I test the maximum number of msgs/sec an I/O thread can process over TCP ?
[11:01] pieterh use the performance test tools provided with zeromq
[11:01] pieterh vegaicm, have you read the Guide?
[11:02] vegaicm pieterh: tbh only the first two chapters. The multi-thread approach is what I was interested in and I wanted to test it before going further
[11:04] pieterh with zeromq you have to abandon your view of a server as a big thing with many clients
[11:04] pieterh think instead of a cluster of servers
[11:05] pieterh no central bottlenecks
[11:08] pieterh vegaicm, I need to go, I hope this has helped you
[11:09] vegaicm a cluster of servers is my aim. And inter-server communication is what I need fast. This is why I was thinking of multiple I/O threads that manage persistent connections between clients and servers and between servers. So actually I don't want an unlimited number of I/O threads spawned on demand, but a limited (and almost predefined) but high number of connections.
[11:09] vegaicm pieterh: thanks a lot for your time, I will continue reading the guide, maybe later on there will be something more relevant for the problem I have. Thanks again
[11:09] pieterh kindly, don't try to do 0MQ's work, it's not constructive
[11:10] vegaicm ack :)
[11:11] pieterh as well as reading the guide, make your own simple examples
[11:11] pieterh and when you find _real_, not _imagined_ performance issues, let's talk again
[11:11] vegaicm pieterh: will do that. Thanks.
[12:48] mikko good morning
[12:49] Guthur morning mikko
[12:49] Guthur cheers for the help last night
[12:50] Guthur Hopefully the test applications should execute without exceptions now
[12:55] mikko Guthur: haven't checked this morning
[12:55] mikko no warnings now
[13:13] Guthur mikko, cool
[14:56] mikko sustrik: icc build: nbytes != -1 (tcp_socket.cpp:197)
[15:02] sustrik mikko: we have to find out what's the errno at that point...
[15:03] sustrik it should be reported to stderr
[15:03] sustrik let me look
[15:04] mikko /bin/bash: line 5: 8278 Aborted ${dir}$tst
[15:04] mikko Bad file descriptor
[15:04] sustrik Bad File Descriptor
[15:04] mikko maybe it's a race condition
[15:04] sustrik yes
[15:04] mikko as multiple builds might try to bind to same things
[15:04] sustrik maybe one of the problems already reported on the ML
[15:05] sustrik mikko: all builds go to the same directory?
[15:05] mikko sustrik: nope
[15:06] mikko sustrik: but they might try to bind at the same time
[15:06] sustrik good
[15:06] mikko if you have concurrent builds
[15:06] sustrik aha
[15:06] mikko only GCC builds do 'make install'
[15:06] sustrik true
[15:06] sustrik anyway, this is a different problem
[15:06] sustrik a bug in 0mq itself
[15:07] mikko non icc specific?
[15:07] sustrik i don't think so
[15:08] sustrik it looks like we are trying to use a file descriptor that we've already closed
[16:52] mikko sustrik: there?