ZeroMQ IRC Log

Wednesday May 12, 2010

[Time] Name	Message
[05:01] joshua	hi
[05:02] joshua	I've been having trouble using ZMQ_P2P socket types
[05:02] joshua	are there any examples around that use it successfully?
[05:19] guido_g	try to specify an ip address on both sides
[05:19] guido_g	you gave the interface to bind on the server side
[05:20] guido_g	just an idea, of course :)
[05:26] joshua	so for the server, instead of "tcp://lo:5555" bind "tcp://127.0.0.1:5555"?
[05:32] guido_g	yes
[05:40] joshua	hmm, it only works if I do send/receive pairs
[05:40] joshua	I can't get just one to send something to the other
[05:42] sustrik	joshua__: presumably, that's because you close the sender app before it has a chance to send the data
[05:42] joshua	ah, the sender doesn't block?
[05:42] sustrik	no, it's async
[05:42] sustrik	that's what MQ means
[05:43] joshua	:D
[06:47] sjampoo	morning!
[07:22] sustrik	morning!
[08:00] mike	sustrik: ping
[08:00] sustrik	pong
[08:01] mike8901	hey - I'm working with joshua on a project - I know you guys talked a bit. Is it possible to block until all sends have completed?
[08:01] mike8901	joshua discovered the hard way that zmq won't keep your program alive
[08:04] sustrik	no, there's no way to do so -- it would result in deadlock if the peer is unavailable
[08:04] joshua	fun
[08:04] sustrik	you have to send acknowledgement by hand
[08:04] mike8901	so what is the recommended way to deal with the situation in which a client sends messages to a server, then exits?
[08:05] mike8901	hm
[08:05] sustrik	is it a request/reply scenario?
[08:05] mike8901	yes, but the reply may not come from the same peer
[08:05] mike8901	basically, we have a central server doling out work to slaves
[08:05] mike8901	and the slaves can talk to each other and redistribute work if needed
[08:05] mike8901	each piece of work has a unique ID attached to it, and a reply will come from some slave, but we don't know which
[08:06] sustrik	that doesn't matter imo
[08:06] sustrik	just wait for reply
[08:06] sustrik	and everything will work as expected
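The advice above (sends are asynchronous, so wait for a reply before exiting) can be sketched in plain Python. This is not ZeroMQ code: `AsyncSender`, `request`, and the echoing peer are hypothetical stand-ins built on the standard library, and the timeout is an added assumption so the caller does not block forever when the peer is unavailable.

```python
import queue
import threading


class AsyncSender:
    """Toy stand-in for an asynchronous socket: send() only enqueues."""

    def __init__(self, deliver):
        self._outbox = queue.Queue()
        self._deliver = deliver  # hypothetical callback playing the network
        # daemon=True: this background thread will not keep the program alive
        threading.Thread(target=self._io_loop, daemon=True).start()

    def send(self, msg):
        self._outbox.put(msg)  # returns immediately; nothing is delivered yet

    def _io_loop(self):
        while True:
            self._deliver(self._outbox.get())


def request(msg, timeout=1.0):
    """Send msg, then block until the peer's reply arrives (the hand-made ack)."""
    replies = queue.Queue()
    # The "peer" here simply echoes an acknowledgement back.
    sender = AsyncSender(lambda m: replies.put("ack:" + m))
    sender.send(msg)
    return replies.get(timeout=timeout)  # raises queue.Empty if no reply comes
```

If the program returned right after `send()` instead of waiting in `replies.get()`, the daemon thread could be killed before delivering anything, which is the failure joshua ran into.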
[08:07] mike8901	yeah, I actually think this is a non-issue
[08:07] sjampoo	If you don't even care about a reply, you could use the callback on the message object and implement the blocking yourself.
[08:08] sustrik	yes, but caution is needed as the callback is called from a different thread
[08:13] mike8901	does zmq have a more appropriate topology than zmq_p2p for doing a round-robin distribution scheme? we're ending up having to have a vector of locks to each socket, which seems a little wasteful.
[08:14] sustrik	p2p does no distribution...
[08:15] sustrik	use DOWNSTREAM
[08:16] mike8901	ok, will look into that - thanks
[08:17] mike8901	also, does zmq have a way to get a callback when data is available on a socket? we're wasting a lot of time polling every socket on a separate thread now...
[08:20] mike8901	though that may be a function of us just "doing it wrong" by having many sockets, and polling them all in a nonblocking manner
[08:22] sjampoo	You can use zmq_poll to poll them all at once and fire callbacks from there, no? Or are you trying to do something different?
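The "poll them all at once and fire callbacks from there" idea can be sketched as follows. This uses Python's standard `selectors` module and `socket.socketpair()` as a stand-in for `zmq_poll` and ZeroMQ sockets; `poll_once` and `demo` are hypothetical names, and the real `zmq_poll` interface differs.

```python
import selectors
import socket


def poll_once(sel, timeout=1.0):
    """Wait on all registered sockets at once; run the callback of each ready one."""
    results = []
    for key, _events in sel.select(timeout):
        callback = key.data  # the callback was stored at registration time
        results.append(callback(key.fileobj))
    return results


def demo():
    sel = selectors.DefaultSelector()
    a_send, a_recv = socket.socketpair()
    b_send, b_recv = socket.socketpair()
    sel.register(a_recv, selectors.EVENT_READ, lambda s: ("a", s.recv(1024)))
    sel.register(b_recv, selectors.EVENT_READ, lambda s: ("b", s.recv(1024)))
    b_send.sendall(b"hello")  # only socket b has data waiting
    ready = poll_once(sel)
    sel.close()
    for s in (a_send, a_recv, b_send, b_recv):
        s.close()
    return ready
```

A single thread owns every socket it polls here, so no per-socket mutex is needed; that is the design assumption behind the sketch.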
[08:24] mike8901	only issue with zmq_poll is that a different thread could be accessing a given socket
[08:25] mike8901	right now we use mutexes to prevent this, before checking with "recv" to see if data is available.
[08:26] mikko	A ØMQ context is thread safe and may be shared among as many application threads as the application has requested using the app_threads parameter to zmq_init(), without any additional locking required on the part of the caller. Each ØMQ socket belonging to a particular context may only be used by the thread that created it using zmq_socket().
[08:26] sustrik	mike8901: what exactly are you trying to achieve?
[08:27] sustrik	you have a thread that owns 1 socket, right?
[08:27] sustrik	you want to wait till there's message available in the socket, no?
[08:28] mike8901	Currently (this may not be the best architecture) we have a setup thread which connects to all the clients using a zmq_p2p model, sticks each socket into a vector, then spawns off a thread to send events to other zmq sockets, as well as a thread to receive events from zmq sockets.
[08:29] mike8901	and (this may not be correct; we haven't gotten code running yet), we use a vector of mutexes to prevent access to each socket
[08:29] sustrik	what do you want to do? load balance the messages among N sockets?
[08:30] mike8901	yes, in a round-robin manner
[08:30] sustrik	use DOWNSTREAM socket
[08:30] mike8901	okay, I'll look into that
[08:30] sustrik	you'll have a single socket
[08:30] sustrik	it'll do all the hard work for you
[08:30] mike8901	but can I still have another thread polling for receiving messages?
[08:30] sustrik	yes, but there should be another socket there
[08:31] mike8901	oh ok
[08:31] sustrik	presumably an UPSTREAM one
[08:31] sustrik	that one merges messages from many sources
[08:32] mike8901	I'm a bit tired to look at that now, but I'll read through http://www.zeromq.org/tutorials:butterfly tomorrow. Thanks for your help!
[08:32] sustrik	np
[08:47] mike8901	oh - one last quick question before I go off to bed: how do you get the list of peers to ZMQ_UPSTREAM? http://github.com/sustrik/jbutterfly/blob/master/gonzo/Component.java specifies an "inp-interface," but I'm not sure what that looks like.
[08:48] sustrik	you don't have a list of peers
[08:48] sustrik	0mq should manage it for you
[08:49] mike8901	er, I guess the question is more appropriate for ZMQ_DOWNSTREAM
[08:49] sustrik	same applies to any socket type
[08:49] mike8901	now, I'm really confused ;)
[08:49] sustrik	the peers are managed by the library
[08:49] mike8901	how is it done though? using some multicast?
[08:49] sustrik	it's transparent to the user
[08:49] sustrik	you can opt for multicast but it's not necessary
[08:50] mike8901	how do the peers find each other?
[08:50] mike8901	or rather
[08:50] sustrik	via address
[08:50] mike8901	but how do you specify the address?
[08:50] mike8901	*addresses
[08:50] mike8901	the connect function takes in a single address
[08:50] sustrik	yes, the connecting side speaks to a single peer
[08:51] sustrik	the binding side speaks to multiple peers
[08:51] mike8901	ah, that may be an issue then....
[08:51] mike8901	our "root" node is going to be transient, and needs to be able to connect to the slaves at will
[08:52] sustrik	you can connect multiple times if needed
[08:52] sustrik	s.connect (A);
[08:52] sustrik	s.connect (B)
[08:52] sustrik	etc.
[08:52] mike8901	oh ok
[08:52] mike8901	so the root can use a downstream socket, and just call connect for each addr
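The behaviour described in this exchange (one sending socket connected to several peers and distributing round-robin, plus one receiving socket that merges messages from many sources) can be modelled in plain Python. `FanOut` and `FanIn` are hypothetical classes that only imitate what the log attributes to DOWNSTREAM and UPSTREAM; they are not the ZeroMQ API, and the queues stand in for network peers.

```python
import queue


class FanOut:
    """Models the round-robin distribution attributed to DOWNSTREAM in the log."""

    def __init__(self):
        self._peers = []
        self._i = 0

    def connect(self, peer_queue):
        # May be called once per peer, as in: s.connect (A); s.connect (B);
        self._peers.append(peer_queue)

    def send(self, msg):
        peer = self._peers[self._i % len(self._peers)]
        self._i += 1
        peer.put(msg)


class FanIn:
    """Models UPSTREAM: messages from many sources merged into one inbox."""

    def __init__(self):
        self.inbox = queue.Queue()

    def recv(self):
        return self.inbox.get_nowait()


def demo():
    workers = [queue.Queue() for _ in range(3)]
    out, merged = FanOut(), FanIn()
    for w in workers:
        out.connect(w)
    for n in range(6):
        out.send(n)
    # Each worker drains its own queue and replies into the shared inbox.
    for wid, w in enumerate(workers):
        while not w.empty():
            merged.inbox.put((wid, w.get()))
    return [merged.recv() for _ in range(6)]
```

The sender keeps no per-peer locks and never names a peer when sending, which is the point sustrik makes about a single socket doing the distribution work.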
[08:53] mike8901	is there an easy way to establish a corresponding upstream socket, without bothering to pass the server's IP to the clients?
[08:53] sustrik	server? upstream? what applications are there?
[08:54] mike8901	okay, sorry
[08:54] mike8901	let me explain my application in detail
[08:54] mikko	maybe dns?
[08:54] mike8901	we're implementing a distributed compiler(on top of clang)
[08:54] sustrik	mikko: possibly, but let's first listen to the use case
[08:54] mike8901	the "master" is spawned on demand on the user's computer
[08:55] mike8901	the "slaves" will always be listening for work to process(i.e. files to compile to object code)
[08:55] mike8901	the "master" is not guaranteed to always be running; it is only up for the duration of the compile
[08:55] sustrik	how many masters may there be?
[08:55] mike8901	for now, just 1
[08:55] mike8901	but the master is transient
[08:56] sustrik	so 1 client, 1 master, N workers
[08:56] mike8901	yes
[08:56] mike8901	but the client/master are transient
[08:56] mike8901	well, the client will stay around until work it needs is done
[08:56] sustrik	what does the interaction pattern look like?
[08:56] sustrik	client sends a request
[08:57] sustrik	master dispatches it to one worker
[08:57] sustrik	worker processes it
[08:57] sustrik	sends reply to the master
[08:57] sustrik	master forwards the reply to client
[08:57] sustrik	is that it?
[08:57] mike8901	yes
[08:57] mike8901	well
[08:57] mike8901	there's not really any client-master interaction now..
[08:58] sustrik	ok, so let's drop the client from the scheme
[08:58] mike8901	(there is technically, but we use UNIX sockets for that now)
[08:58] sustrik	master sends request to a worker
[08:58] sustrik	worker replies back to the master
[08:58] sustrik	right?
[08:58] mike8901	yep
[08:58] mike8901	well
[08:59] mike8901	workers are not necessarily the same, but yes
[08:59] mike8901	that's the idea
[08:59] sustrik	they are not the same?
[08:59] sustrik	what's the difference?
[08:59] mike8901	well, we're implementing work queue stealing, so if one worker runs out of work, it can ask another for work.
[08:59] sustrik	hm, what is that good for?
[08:59] mike8901	so the master may not receive the response from the worker it sent the request to
[09:00] sustrik	why not let the master load balance the work?
[09:00] mike8901	the master is going to be overloaded preprocessing (Amdahl's law) - we want the slaves to load balance amongst themselves
[09:02] sustrik	the master has to send the requests anyway, no?
[09:02] mike8901	yes
[09:02] mike8901	the requests are going to be of varying size though
[09:02] mike8901	as with any project, you'll have really small source files and really large ones
[09:03] mike8901	it's inevitable that some slaves will run out of work, and we want the slaves to be able to steal work off each other's queues
[09:03] sustrik	so what you want is to avoid queueing, right?
[09:04] sustrik	at most one request dispatched to the worker at a time
[09:04] mike8901	no
[09:04] mike8901	we want the slaves to maintain a queue
[09:04] mike8901	so that if another slave asks slave A for work, it can provide it to slave B
[09:04] sustrik	yes, i understand, but what's the point?
[09:05] mike8901	to load balance
[09:05] sustrik	why not load-balance upfront?
[09:05] sustrik	rather than messing with queues and reassigning the work?
[09:05] mike8901	it's impossible to exactly load balance up front... each request could take an arbitrary amount of time
[09:06] sustrik	say you send at most one request to each worker at a time
[09:06] sustrik	when it responds you send another request
[09:06] sustrik	etc.
[09:06] sustrik	wouldn't that solve the problem?
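The scheme proposed here (at most one outstanding request per worker, with the next request sent to whichever worker replies first) can be compared with blind round-robin in a small simulation. This is a plain Python sketch; `dispatch_on_reply`, `round_robin`, and the job costs are hypothetical, and the job costs stand in for compile times of differently sized source files.

```python
import heapq


def dispatch_on_reply(job_costs, n_workers):
    """Each worker holds at most one job; the next job goes to the worker
    that replies first. Returns the total busy time of each worker."""
    busy = [0] * n_workers
    # Heap of (time the worker becomes free, worker id); all idle at t=0.
    free_at = [(0, w) for w in range(n_workers)]
    heapq.heapify(free_at)
    for cost in job_costs:
        t, w = heapq.heappop(free_at)  # earliest worker to reply
        busy[w] += cost
        heapq.heappush(free_at, (t + cost, w))
    return busy


def round_robin(job_costs, n_workers):
    """Assign jobs in fixed rotation regardless of how long each one takes."""
    busy = [0] * n_workers
    for i, cost in enumerate(job_costs):
        busy[i % n_workers] += cost
    return busy
```

With one large job followed by nine small ones on two workers, `round_robin([9] + [1] * 9, 2)` gives `[13, 5]` while `dispatch_on_reply([9] + [1] * 9, 2)` gives `[9, 9]`. The trade-off is that undispatched jobs stay queued on the master, which is the overhead mike8901 objects to.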
[09:06] mike8901	right, that takes up overhead on the master
[09:06] sustrik	but the master has to send the requests anyway
[09:06] sustrik	what overhead do you have in mind?
[09:07] mike8901	for one, memory overhead
[09:07] sustrik	ack
[09:07] mike8901	queueing the requests will take away from the preprocessor's cache
[09:07] sustrik	ok, i see
[09:07] mike8901	basically, we want as little burden as possible to be on the master
[09:08] sustrik	what about having a separate load-balancer node then?
[09:08] mike8901	that sounds like it would add a whole other layer of inefficiency - now the source has to travel twice over the network
[09:08] mike8901	and you have half the effective bandwidth
[09:09] mike8901	or maybe even less
[09:10] mike8901	anyway, sorry to cut this discussion short(really enjoyed talking with you), but it's 4:10am and I'm exhausted. I'd love to continue this some other time(before Monday at 9am though ;) ).
[09:10] sustrik	sure
[09:10] sustrik	good night!
[09:11] mike8901	(Monday at 9am is the deadline for this project - yes, we're screwed) ;)
[09:11] mike8901	night!
[09:11] sjampoo	heh
[09:11] sjampoo	goodnight and good luck :)
[09:32] sjampoo	sustrik: i am getting "Assertion failed: fetched (rep.cpp:265)" with a REQ/REP socket on messages larger than about 8k
[09:32] sjampoo	what could be causing this?
[09:32] sustrik	let me see
[09:33] sjampoo	Seems to be something that didn't happen on 2.0.6
[09:33] sustrik	sjampoo: what peer socket type is connected to your REP socket?
[09:33] sjampoo	REQ
[09:34] sustrik	then it's a bug
[09:34] sustrik	can you report it please?
[09:34] sustrik	test program would help
[09:34] sjampoo	i cannot really reproduce it with C code ;/
[09:35] sustrik	hm, which binding does it appear with?
[09:35] sjampoo	PyZMQ
[09:35] sjampoo	and local_lat / remote_lat
[09:35] sjampoo	(the perf benchmark)
[09:36] sustrik	i would then suggest reporting it as problem with pyzmq
[09:37] sjampoo	what could be causing it?
[09:37] sustrik	brian will presumably pass the issue upstream with more details attached
[09:37] sustrik	dunno, looks like the message processed has no body
[09:38] sjampoo	Ok
[10:50] CIA-15	zeromq2: Brett Cameron master * r714a8d5 / (5 files): fixes for OpenVMS - http://bit.ly/9IYypp
[10:50] CIA-15	zeromq2: Martin Sustrik master * r8e5ac10 / (7 files in 6 dirs): Merge branch 'master' of git@github.com:sustrik/zeromq2 - http://bit.ly/bKeYae
[12:12] sjampoo	The above issue seems to be a byproduct of this commit: http://github.com/sustrik/zeromq2/commit/ad6fa9d0d4f1cf29ce63998d7efe337b1a784ef6
[12:14] sustrik	sjampoo: yes, that's when the functionality was introduced
[14:21] mato	sustrik: are you there?
[14:21] sustrik	mato: hi
[14:21] mato	sustrik: I want to revert those atomics changes you committed
[14:21] sustrik	yes, sure
[14:22] mato	while I'm at it, can I remove the native SPARC ops? They are #ifdef-ed out in any case
[14:22] mato	also, in the current git atomic_bitmap is gone, this is correct?
[14:22] sustrik	ack
[14:22] mato	so we have just atomic_counter and atomic_ptr, right?
[14:22] sustrik	SPARC: sure, go on, it's been commented out for 2 years now or so :)
[14:22] sustrik	right
[14:23] mato	I'm surprised you committed those changes without asking for review :-(
[14:23] mato	anyway, no harm done, I'll put back the old code
[14:25] sustrik	no way to check everything, i'm committing in optimistic fashion
[14:25] sjampoo	sustrik: that commit introduces multihop, but i am not using that functionality as i have two req/rep sockets connected directly. Anyway i can reproduce it right now, i probably had too many versions lying around. I'll make an issue
[14:25] sustrik	sjampoo: yes, please
[14:25] mato	sustrik: sure, but you know I spent time on that code, so you could have waited till I got back from holiday
[14:26] sustrik	mato: do you want to become a maintainer for particular subset of files?
[14:26] sustrik	say the atomics?
[14:27] mato	I kind of assumed I was, since I spent time on it
[14:27] mato	same for doc/*
[14:27] sustrik	ok, let's make this more formal so that it's obvious who's responsible for what
[14:28] mato	if you like
[14:28] sustrik	definitely
[14:35] sustrik	mato: ok, i've written down the list of components in the project
[14:35] sustrik	what's the common way of listing maintainers?
[14:35] mato	MAINTAINERS file in source tree
[14:35] sustrik	in root?
[14:35] mato	yeah
[14:35] sustrik	ok
[14:36] mato	with Component, Name (of maintainer), Email address
[14:36] mato	or some format like that
[14:36] sustrik	ack
[14:36] sustrik	what about the autotools build
[14:36] sustrik	would you like to maintain that?
[14:37] mato	not really, but you can add me in there as a point of contact
[14:37] sustrik	ok, so it's autotools, docs & atomics
[14:37] sustrik	ok?
[14:37] mato	yeah
[14:44] mato	you should of course add in yourself (with an address of the mailing list) as the maintainer for "everything else"
[14:46] CIA-15	zeromq2: Martin Sustrik master * r127cb89 / MAINTAINERS : MAINTAINERS file added - http://bit.ly/aEumLZ
[14:46] sustrik	done
[14:50] CIA-15	zeromq2: Martin Lucina master * r52ef3f3 / (src/atomic_counter.hpp src/atomic_ptr.hpp):
[14:50] CIA-15	zeromq2: Revert commit 7cb076e, atomic ops cleanup
[14:50] CIA-15	zeromq2: Reverted to using atomic.h on NetBSD
[14:50] CIA-15	zeromq2: Removed GNU builtins (see http://lists.zeromq.org/pipermail/zeromq-dev/2010-May/003485.html)
[14:50] CIA-15	zeromq2: Removed SPARC native atomic ops as they are untested and have been commented out for years
[14:50] CIA-15	zeromq2: Add "memory" to asm clobber for X86 atomic_counter::sub() - http://bit.ly/buhvIA
[14:50] CIA-15	zeromq2: Martin Lucina master * rf6c1c97 / (6 files in 2 dirs): Merge branch 'master' of github.com:sustrik/zeromq2 - http://bit.ly/cNQN1Z