ZeroMQ IRC Log

Thursday November 18, 2010

[Time] Name	Message
[09:10] vegaicm	Question: is it possible for an application to have multiple I/O threads (and therefore multiple zmq sockets) binding on the same TCP port? What I want to achieve is something similar to the classic spawning a limited number of threads for each new accepted connection. Considering the architecture of zeromq this seems not possible, but having this option may increase throughput.
[09:13] guido_g	first, io-threads are ømq internal things and it is unlikely that you'll ever need more than one
[09:13] guido_g	second, some worker pool examples are shown in the guide
[09:15] guido_g	you can create a forwarder reading incoming messages and distributing them to the workers with a few lines of code, again see the guide for examples
[09:16] vegaicm	guido_g: I tried the simple multithreaded hello world, but I am not really impressed by its performance. Because all the messages pass through one device I see it as a bottleneck. I was thinking of creating more devices on different ports, but this is not really what I was looking for: all the devices on the same port
[09:17] guido_g	you're going through this port anyway on os level, so i can't see any problem
[09:18] Guthur	vegaicm: If you look at the device code there is actually not much there
[09:18] Guthur	it just receives a message and passes it on
[09:19] Guthur	If that becomes a bottleneck I'd hazard that things have become too granular and you are not doing enough in your worker threads
[09:21] guido_g	w/o a more or less detailed description of the work to do and the message frequency all one can do is to guess
[09:22] vegaicm	guido_g, Guthur : what I meant is that with a zmq device I have 1 thread processing all the communication through a TCP socket. While if I can have more than 1 thread managing tcp/ip communication I can have more throughput. Give me a minute and I will try to describe what I am comparing, but Guthur you are right: the worker threads aren't doing much ... details coming now
[09:23] guido_g	vegaicm: read the guide, it explains what you can do in various simple examples, really
[09:25] vegaicm	I created a simple application using standard socket programming. This application spawns 1 thread for each incoming connection and processes client requests. What the application does is to get a key=>value from the client and write it in a hash table with the proper mutex handling. I can reach 360k SET/sec . With zmq using 1 device and 10 workers that do absolutely nothing I can't go beyond 45k messages/sec
[09:26] vegaicm	guido_g: I will check the guide again, maybe I missed something in there. Thanks.
[09:26] guido_g	http://zguide.zeromq.org/chapter:all#toc33
[09:27] vegaicm	guido_g: I read it, and tried that code. It doesn't meet my expectation :-/
[09:27] vegaicm	"You cannot use sockets except in the thread that created them."
[09:27] guido_g	so?
[09:28] vegaicm	it is a bottleneck for me
[09:28] guido_g	sure
[09:30] vegaicm	I can create multiple ZMQ_REQ sockets and multiple workers pool, but this would mean I have to bind on multiple TCP ports
[09:31] guido_g	sounds fishy, but hey...
[09:31] Guthur	vegaicm: are you using inproc connections at the worker end of the device?
[09:32] vegaicm	correct, is fishy , especially for the client . Unless the client itself is a multithread application that connects to multiple socket.
[09:33] vegaicm	Guthur: yes, inproc between workers. tcp only for client-server communication
[09:43] Guthur	guido_g: Does that performance hit sound right? I'd imagine that there would be some overhead but that seems like quite a bit.
[09:45] guido_g	depends on the implementation i'd say
[09:46] guido_g	but ømq is not a "one size fits all" thing
[09:56] vegaicm	Guthur, guido_g : this is a vague idea of what I mean: http://pastebin.com/BWcT62GK . On 127.0.0.1 and with TCP, 1 client - 1 server I have 100k/4.485 = 22.3k messages processed per second. If I have 10 clients and 10 servers, the messages processed are 100k x 10 / 6.379 (the slower client) = 156.8k messages per second. If I increase the number of clients/servers the figures go even better.
[10:03] guido_g	and what you're showing?
[10:08] vegaicm	guido_g: look at this: http://pastebin.com/GXbDiS9F . I start 10 clients that connect to a multithreaded server that has 10 workers . 10 clients -> 1 MT server = 41.8k msg/sec . 10 clients -> 10 servers = 156.8k msg/sec . Where is the bottleneck? I doubt it is in inproc, but in the tcp socket
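The figures quoted in the two pastebin messages can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the log; the function and variable names are made up for the example, and the 41.8k figure is taken as quoted because the log does not give the elapsed time behind it.

```python
def throughput(messages, seconds):
    """Messages processed per second."""
    return messages / seconds


single = throughput(100_000, 4.485)           # 1 client -> 1 server
ten_pairs = throughput(10 * 100_000, 6.379)   # 10 clients -> 10 servers, slowest client
mt_server = 41_800                            # 10 clients -> 1 MT server, as quoted

print(f"1 client, 1 server:     {single / 1000:.1f}k msg/sec")
print(f"10 clients, 10 servers: {ten_pairs / 1000:.1f}k msg/sec")
print(f"10 servers vs 1 MT server: {ten_pairs / mt_server:.2f}x")
```

The ratio compares ten separate server processes against one multithreaded server handling the same ten clients.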
[10:10] guido_g	again, what do you want to show?
[10:10] vegaicm	I want to show that there is a bottleneck in having just 1 thread processing messages from a tcp port
[10:11] guido_g	this is something widely known, but ok
[10:13] vegaicm	guido_g: ok :) Then , what I am adding too is that having multiple threads on the same tcp port can increase performance a lot.
[10:14] guido_g	threads on a port?
[10:15] vegaicm	yep . With standard socket programming this is possible and quite easy. For example you can have the main process bound to the socket and waiting for connections. For each connection it spawns a thread.
[10:15] vegaicm	but by design I think this is impossible in zeromq
[10:16] guido_g	it's outright stupid
[10:16] guido_g	given the fact that a single server can process thousands of ongoing connections
[10:17] guido_g	this is why queueing and a worker pool is used
[10:21] vegaicm	what if I am ok to have hundreds/thousands of threads. As I was saying, I have a prototype of an application that spawns 1000 threads and processes requests from 1000 clients at a very good speed, 360k requests/sec . And it isn't just a "hello world" where the workers do nothing, but it actually processes a few GB of data
[10:22] vegaicm	and I would love to reach the same performance using zeromq , if possible
[10:22] vegaicm	thanks :D
[10:42] pieterh	hi vegaicm
[10:42] pieterh	would you like to discuss high performance server architectures?
[10:43] vegaicm	hi pieterh . Sure
[10:43] pieterh	first of all, what is the work your threads are doing?
[10:45] vegaicm	right now the worker threads in zeromq they just send the "world" string back to the client, nothing more. But my future idea is to use the workers to process data in a key=>value storage
[10:46] pieterh	your basic architecture depends on the kinds of work you are doing
[10:46] pieterh	i will explain
[10:46] pieterh	some tasks are totally independent, e.g. "here is a data set, compute me some stuff on it"
[10:46] pieterh	some tasks are highly interdependent, e.g. "get me something from a database"
[10:47] pieterh	in the first case, you can create a highly concurrent (parallel) architecture
[10:47] pieterh	in the second case that is fairly pointless and you can use a more serial architecture
[10:47] pieterh	ack?
[10:47] vegaicm	ack so far
[10:48] pieterh	now you need to get requests from clients and transfer them to your workers as rapidly as you can
[10:48] pieterh	workers can be processes, threads, or whatever
[10:48] pieterh	there are many strategies for doing this
[10:48] pieterh	indeed, starting a thread for each client is one option. Some architectures start a process for each client.
[10:48] pieterh	on Linux, a process, a thread, it's much the same
[10:49] pieterh	both are fairly heavy mechanisms
[10:49] vegaicm	ack on all this too
[10:49] pieterh	ack?
[10:50] vegaicm	yep
[10:50] pieterh	so in general when you come to ask for architecture advice, do not explain your solutions
[10:50] pieterh	but rather explain your problems and then collect possible solutions
[10:50] pieterh	now I'll explain the very fastest possible model for getting stuff off a network and to workers
[10:51] vegaicm	ok, I am following you
[10:51] pieterh	first of all, you have one thread per network device
[10:51] pieterh	so that data can flow without interrupts
[10:51] pieterh	second, you have one thread per worker
[10:51] pieterh	so that tasks can execute without interrupts
[10:51] pieterh	third, you connect these two sets of threads using lock-free queues
[10:52] pieterh	so that messages can flow without interrupts
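The three layers pieterh lists (an I/O thread, worker threads, and queues joining them) can be sketched in outline as below. This shows the topology only and is not how ZeroMQ is implemented: Python's `queue.Queue` uses locks internally, so it is not a lock-free queue, and the I/O thread here just iterates over a list instead of reading from a network device. `NUM_WORKERS`, `STOP` and `run_pipeline` are hypothetical names.

```python
import queue
import threading

NUM_WORKERS = 4          # hypothetical; ideally about one per CPU core
STOP = object()          # sentinel telling a worker to exit


def io_thread(requests, work_q):
    """Stand-in for the network layer: hands each request to the queue."""
    for req in requests:
        work_q.put(req)
    for _ in range(NUM_WORKERS):    # one sentinel per worker
        work_q.put(STOP)


def worker(work_q, result_q):
    """Pulls tasks until it sees the sentinel; never touches the network."""
    while True:
        req = work_q.get()
        if req is STOP:
            break
        result_q.put(req.upper())   # placeholder for real work


def run_pipeline(requests):
    """Run all requests through the pipeline and return the sorted results."""
    work_q, result_q = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(work_q, result_q))
               for _ in range(NUM_WORKERS)]
    for t in workers:
        t.start()
    io = threading.Thread(target=io_thread, args=(requests, work_q))
    io.start()
    io.join()
    for t in workers:
        t.join()
    return sorted(result_q.get() for _ in range(len(requests)))
```

Calling `run_pipeline(["hello", "world"])` pushes both requests through the worker pool. The number of threads stays fixed no matter how many clients or requests arrive, which is the contrast with one thread per client.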
[10:52] CIA-20	zeromq2: Mikko Koppanen master * r945c931 / (acinclude.m4 configure.in):
[10:52] CIA-20	zeromq2: Run autoupdate on the configure.in
[10:52] CIA-20	zeromq2: I ran autoupdate on the configure.in, which generated most of the
[10:52] CIA-20	zeromq2: patch attached. There is also a small manual fix in which removes the
[10:52] CIA-20	zeromq2: warning "Remember to add LT_INIT to configure.in" which I assume is
[10:52] CIA-20	zeromq2: because AC_PROG_LIBTOOL was called inside a macro.
[10:52] CIA-20	zeromq2: Signed-off-by: Mikko Koppanen <mkoppanen@php.net> - http://bit.ly/ciuque
[10:52] pieterh	the single greatest slow down in a concurrent / multithread architecture is when threads are swapped in and out, or locked, or waiting
[10:53] pieterh	ideally, thus, you end up with exactly 1 thread per CPU, running full speed without any locks
[10:53] pieterh	ack?
[10:53] vegaicm	I am following you 100% so far
[10:53] pieterh	you understand that this is very different than "one thread per client", which will result in chaotic use of the CPU cores
[10:53] pieterh	now here is the difficulty
[10:54] pieterh	creating these three layers that i described (i/o, workers, queues) is very very hard
[10:54] pieterh	that is why practically no projects do this
[10:54] pieterh	and so, people use techniques like you described
[10:54] pieterh	happily, we made zeromq for you
[10:54] pieterh	and it does exactly this
[10:55] pieterh	by design zeromq does not let you create one thread per client because it would be silly
[10:55] pieterh	what you do want control over
[10:55] pieterh	is the threading model for your workers
[10:56] pieterh	which is what zeromq gives you... you decide exactly how to architect your workers across one or more boxes
[10:57] pieterh	if you need to, you can start more than 1 I/O thread
[10:57] pieterh	does this all make sense?
[10:57] vegaicm	the 1 I/O thread seems to be my current limitation, because I have multiple TCP circuits handled by a single I/O thread. Is this correct?
[10:58] pieterh	note that 1 I/O thread can handle a _lot_ of messages per second
[10:58] pieterh	like several million
[10:58] vegaicm	on TCP ?
[10:58] pieterh	most of what you think you know about performance is, unfortunately, wrong
[10:58] pieterh	yes
[10:59] pieterh	you can monitor your system, under load, and test with more I/O threads
[10:59] pieterh	that is useful if you have more NICs
[11:00] vegaicm	pieterh: how can I test the maximum number of msgs/sec a I/O thread can process over TCP ?
[11:01] pieterh	use the performance test tools provided with zeromq
[11:01] pieterh	vegaicm, have you read the Guide?
[11:02] vegaicm	pieterh: tbh only the first two chapters. The multi-thread approach is what I was interested and I wanted to test it before going further
[11:04] pieterh	with zeromq you have to abandon your view of a server as a big thing with many clients
[11:04] pieterh	think instead of a cluster of servers
[11:05] pieterh	no central bottlenecks
[11:08] pieterh	vegaicm, I need to go, I hope this has helped you
[11:09] vegaicm	a cluster of servers is my aim. And inter-server communication is what I need fast. This is why I was thinking of multiple I/O threads that manage persistent connections between clients and servers and between servers. So actually I don't want an unlimited number of I/O threads spawned on demand, but a limited (and almost predefined) but high number of connections.
[11:09] vegaicm	pieterh: thanks a lot for your time, I will continue the read of the guide, maybe later on there will be something more relevant for the problem I have. Thanks again
[11:09] pieterh	kindly, don't try to do 0MQ's work, it's not constructive
[11:10] vegaicm	ack :)
[11:11] pieterh	as well as reading the guide, make your own simple examples
[11:11] pieterh	and when you find _real_, not _imagined_ performance issues, let's talk again
[11:11] vegaicm	pieterh: will do that. Thanks.
[12:48] mikko	good morning
[12:49] Guthur	morning mikko
[12:49] Guthur	cheers for the help last night
[12:50] Guthur	Hopefully the test applications should execute without exceptions now
[12:55] mikko	Guthur: haven't checked this morning
[12:55] mikko	no warnings no
[12:55] mikko	w
[13:13] Guthur	mikko, cool
[14:56] mikko	sustrik: icc build: nbytes != -1 (tcp_socket.cpp:197)
[15:02] sustrik	mikko: we have to find out what's the errno at that point...
[15:03] sustrik	it should be reported to stderr
[15:03] sustrik	let me look
[15:04] mikko	/bin/bash: line 5: 8278 Aborted ${dir}$tst
[15:04] mikko	Bad file descriptor
[15:04] sustrik	Bad File Descriptor
[15:04] mikko	maybe it's a race condition
[15:04] sustrik	yes
[15:04] mikko	as multiple builds might try to bind to same things
[15:04] sustrik	maybe one of the problems already reported on the ML
[15:05] sustrik	mikko: all builds go to the same directory?
[15:05] mikko	sustrik: nope
[15:06] mikko	sustrik: but they might try to bind at the same time
[15:06] sustrik	good
[15:06] mikko	if you have concurrent builds
[15:06] sustrik	aha
[15:06] mikko	only GCC builds do 'make install'
[15:06] sustrik	true
[15:06] sustrik	anyway, this is a different problem
[15:06] sustrik	a buf in 0mq itself
[15:06] sustrik	bug
[15:07] mikko	non icc specific?
[15:07] sustrik	i don't think so
[15:08] sustrik	it looks like we are trying to use a file descriptor that we've already closed
[16:52] mikko	sustrik: there?