Wednesday May 12, 2010

[Time] Name Message
[05:01] joshua hi
[05:02] joshua I've been having trouble using ZMQ_P2P socket types
[05:02] joshua are there any examples around that use it successfully?
[05:19] guido_g try to specify an ip address on both sides
[05:19] guido_g you gave the interface to bind on the server side
[05:20] guido_g just an idea, of course :)
[05:26] joshua so for the server, instead of "tcp://lo:5555" bind "tcp://"?
[05:32] guido_g yes
[05:40] joshua hmm, it only works if I do send/receive pairs
[05:40] joshua I can't get just one to send something to the other
[05:42] sustrik joshua__: presumably, that's because you close the sender app before it has chance to send the data
[05:42] joshua ah, the sender doesn't block?
[05:42] sustrik no, it's async
[05:42] sustrik that's what MQ means
[05:43] joshua :D
[06:47] sjampoo morning!
[07:22] sustrik morning!
[08:00] mike sustrik: ping
[08:00] sustrik pong
[08:01] mike8901 hey - I'm working with joshua on a project - I know you guys talked a bit. Is it possible to block until all sends have completed?
[08:01] mike8901 joshua discovered the hard way that zmq won't keep your program alive
[08:04] sustrik no, there's no way to do so -- it would result in deadlock if the peer is unavailable
[08:04] joshua fun
[08:04] sustrik you have to send acknowledgement by hand
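The point about asynchronous sends and hand-written acknowledgements can be sketched without 0MQ at all. The following is a minimal simulation, assuming a background thread and two standard-library queues stand in for the peer and the socket's internal queues; `run_exchange` and the message values are hypothetical names used only for illustration.

```python
import queue
import threading


def run_exchange(messages):
    """Send every message asynchronously, then wait for one ack per message."""
    outbound = queue.Queue()  # stands in for the socket's async send queue
    acks = queue.Queue()      # stands in for the channel the acks come back on
    received = []

    def peer():
        # The peer records each message, then acknowledges it explicitly.
        for _ in messages:
            msg = outbound.get()
            received.append(msg)
            acks.put(msg)

    threading.Thread(target=peer, daemon=True).start()

    for msg in messages:
        outbound.put(msg)  # returns immediately, like an async send

    # Without this loop the sender could exit before anything was delivered.
    pending = set(messages)
    while pending:
        pending.discard(acks.get(timeout=5))
    return received
```

The sender only returns once every message has been acknowledged, which is the application-level substitute for "block until all sends have completed".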
[08:04] mike8901 so what is the recommended way to deal with the situation in which a client sends messages to a server, then exits?
[08:05] mike8901 hm
[08:05] sustrik is it a request/reply scenario?
[08:05] mike8901 yes, but the reply may not come from the same peer
[08:05] mike8901 basically, we have a central server doling out work to slaves
[08:05] mike8901 and the slaves can talk to each other and redistribute work if needed
[08:05] mike8901 each piece of work has a unique ID attached to it, and a reply will come from *some* slave, but we don't know which
[08:06] sustrik that doesn't matter imo
[08:06] sustrik just wait for reply
[08:06] sustrik and everything will work as expected
[08:07] mike8901 yeah, I actually think this is a non-issue
[08:07] sjampoo If you don't even care about a reply, you could use the callback on the message object and implement something that blocks yourself.
[08:08] sustrik yes, but caution is needed as the callback is called from a different thread
[08:13] mike8901 does zmq have a more appropriate topology than zmq_p2p for doing a round-robin distribution scheme? we're ending up having to have a vector of locks to each socket, which seems a little wasteful.
[08:14] sustrik p2p does no distribution...
[08:15] sustrik use DOWNSTREAM
[08:16] mike8901 ok, will look into that - thanks
[08:17] mike8901 also, does zmq have a way to get a callback when data is available on a socket? we're wasting a lot of time polling every socket on a separate thread now...
[08:20] mike8901 though that may be a function of us just "doing it wrong" by having many sockets, and polling them all in a nonblocking manner
[08:22] sjampoo You can use zmq_poll to poll them all at once and fire callbacks from there, no? Or are you trying to do something different?
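The idea of waiting on all sockets at once and firing callbacks from one place can be illustrated with the Python standard library. This is an analogy only: it uses `selectors` on plain OS sockets, not `zmq_poll`, and `poll_once` and `demo` are hypothetical names.

```python
import selectors
import socket


def poll_once(sel, timeout=1.0):
    """Block until any registered socket is readable, then fire its callback."""
    fired = []
    for key, _events in sel.select(timeout):
        callback = key.data  # the callback registered alongside the socket
        fired.append(callback(key.fileobj))
    return fired


def demo():
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(3)]
    for index, (reader, _writer) in enumerate(pairs):
        reader.setblocking(False)
        # Bind the index as a default argument so each callback keeps its own.
        sel.register(reader, selectors.EVENT_READ,
                     lambda sock, index=index: (index, sock.recv(1024)))

    pairs[1][1].sendall(b"job done")  # only the second pair has data
    result = poll_once(sel)

    sel.close()
    for reader, writer in pairs:
        reader.close()
        writer.close()
    return result
```

A single thread owns every socket here, so no per-socket mutex is needed and nothing spins in a non-blocking receive loop.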
[08:24] mike8901 only issue with zmq_poll is that a different thread could be accessing a given socket
[08:25] mike8901 right now we use mutexes to prevent this, before checking with "recv" to see if data is available.
[08:26] mikko A ØMQ context is thread safe and may be shared among as many application threads as the application has requested using the app_threads parameter to zmq_init(), without any additional locking required on the part of the caller. Each ØMQ socket belonging to a particular context may only be used by the thread that created it using zmq_socket().
[08:26] sustrik mike8901: what exactly are you trying to achieve?
[08:27] sustrik you have a thread that owns 1 socket, right?
[08:27] sustrik you want to wait till there's message available in the socket, no?
[08:28] mike8901 Currently (this may not be the best architecture) we have a setup thread which connects to all the clients using a zmq_p2p model, sticks each socket into a vector, then spawns off a thread to send events to other zmq sockets, as well as a thread to receive events from zmq sockets.
[08:29] mike8901 and (this may not be correct; we haven't gotten code running yet), we use a vector of mutexes to prevent access to each socket
[08:29] sustrik what do you want to do? load balance the messages among N sockets?
[08:30] mike8901 yes, in a round-robin manner
[08:30] sustrik use DOWNSTREAM socket
[08:30] mike8901 okay, I'll look into that
[08:30] sustrik you'll have a single socket
[08:30] sustrik it'll do all the hard work for you
[08:30] mike8901 but can I still have another thread polling for receiving messages?
[08:30] sustrik yes, but there should be another socket there
[08:31] mike8901 oh ok
[08:31] sustrik presumably an UPSTREAM one
[08:31] sustrik that one merges messages from many sources
[08:32] mike8901 I'm a bit tired to look at that now, but I'll read through tomorrow. Thanks for your help!
[08:32] sustrik np
[08:47] mike8901 oh - one last quick question before I go off to bed: how do you get the list of peers to ZMQ_UPSTREAM? specifies an "inp-interface," but I'm not sure what that looks like.
[08:48] sustrik you don't have a list of peers
[08:48] sustrik 0mq should manage it for you
[08:49] mike8901 er, I guess the question is more appropriate for ZMQ_DOWNSTREAM
[08:49] sustrik same applies to any socket type
[08:49] mike8901 now, I'm really confused ;)
[08:49] sustrik the peers are managed by the library
[08:49] mike8901 how is it done though? using some multicast?
[08:49] sustrik it's transparent to the user
[08:49] sustrik you can opt for multicast but it's not necessary
[08:50] mike8901 how do the peers find each other?
[08:50] mike8901 or rather
[08:50] sustrik via address
[08:50] mike8901 but how do you specify the address?
[08:50] mike8901 *addresses
[08:50] mike8901 the connect function takes in a single address
[08:50] sustrik yes, the connecting side speaks to a single peer
[08:51] sustrik the binding side speaks to multiple peers
[08:51] mike8901 ah, that may be an issue then....
[08:51] mike8901 our "root" node is going to be transient, and needs to be able to connect to the slaves at will
[08:52] sustrik you can connect multiple times if needed
[08:52] sustrik s.connect (A);
[08:52] sustrik s.connect (B)
[08:52] sustrik etc.
[08:52] mike8901 oh ok
[08:52] mike8901 so the root can use a downstream socket, and just call connect for each addr
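What a single distributing socket connected to several addresses does for the sender can be sketched as a small simulation. This is not the 0MQ API: `RoundRobinSender` is a hypothetical class, the peer addresses are made up, and `send` merely reports which peer a message would go to.

```python
class RoundRobinSender:
    """Toy stand-in for one socket that fans messages out to many peers."""

    def __init__(self):
        self._peers = []
        self._next = 0

    def connect(self, address):
        # Each connect call adds one more peer to the rotation.
        self._peers.append(address)

    def send(self, message):
        if not self._peers:
            raise RuntimeError("no peers connected")
        peer = self._peers[self._next % len(self._peers)]
        self._next += 1
        return peer, message


sender = RoundRobinSender()
sender.connect("tcp://worker-a:5555")  # hypothetical addresses
sender.connect("tcp://worker-b:5555")
targets = [sender.send(job)[0] for job in ("a.c", "b.c", "c.c")]
```

The caller keeps one object and one code path, in contrast to maintaining a vector of sockets with a lock for each.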
[08:53] mike8901 is there an easy way to establish a corresponding upstream socket, without bothering to pass the server's IP to the clients?
[08:53] sustrik server? upstream? what applications are there?
[08:54] mike8901 okay, sorry
[08:54] mike8901 let me explain my application in detail
[08:54] mikko maybe dns?
[08:54] mike8901 we're implementing a distributed compiler(on top of clang)
[08:54] sustrik mikko: possibly, but let's first listen to the use case
[08:54] mike8901 the "master" is spawned on demand on the user's computer
[08:55] mike8901 the "slaves" will always be listening for work to process(i.e. files to compile to object code)
[08:55] mike8901 the "master" is not guaranteed to always be running; it is only up for the duration of the compile
[08:55] sustrik how many masters may there be?
[08:55] mike8901 for now, just 1
[08:55] mike8901 but the master is transient
[08:56] sustrik so 1 client, 1 master, N workers
[08:56] mike8901 yes
[08:56] mike8901 but the client/master are transient
[08:56] mike8901 well, the client will stay around until work it needs is done
[08:56] sustrik what does the interaction pattern look like?
[08:56] sustrik client sends a request
[08:57] sustrik master dispatches it to one worker
[08:57] sustrik worker processes it
[08:57] sustrik sends reply to the master
[08:57] sustrik master forwards the reply to client
[08:57] sustrik is that it?
[08:57] mike8901 yes
[08:57] mike8901 well
[08:57] mike8901 there's not really any client-master interaction now..
[08:58] sustrik ok, so let's drop the client from the scheme
[08:58] mike8901 (there is technically, but we use UNIX sockets for that now)
[08:58] sustrik master sends request to a worker
[08:58] sustrik worker replies back to the master
[08:58] sustrik right?
[08:58] mike8901 yep
[08:58] mike8901 well
[08:59] mike8901 workers are not necessarily the same, but yes
[08:59] mike8901 that's the idea
[08:59] sustrik they are not the same?
[08:59] sustrik what's the difference?
[08:59] mike8901 well, we're implementing work queue stealing, so if one worker runs out of work, it can ask another for work.
[08:59] sustrik hm, what is that good for?
[08:59] mike8901 so the master may not receive the response from the worker it sent the request to
[09:00] sustrik why not let the master load balance the work?
[09:00] mike8901 the master is going to be overloaded preprocessing (Amdahl's law) - we want the slaves to load balance amongst themselves
[09:02] sustrik the master has to send the requests anyway, no?
[09:02] mike8901 yes
[09:02] mike8901 the requests are going to be of varying size though
[09:02] mike8901 as with any project, you'll have really small source files and really large ones
[09:03] mike8901 it's inevitable that some slaves will run out of work, and we want the slaves to be able to steal work off each other's queues
[09:03] sustrik so what you want is to avoid queueing, right?
[09:04] sustrik at most one request dispatched to the worker at a time
[09:04] mike8901 no
[09:04] mike8901 we want the slaves to maintain a queue
[09:04] mike8901 so that if another slave asks slave A for work, it can provide it to slave B
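The work-stealing scheme described here can be sketched with a double-ended queue per worker. This is a minimal single-process illustration, assuming each worker drains its own queue from the front and an idle worker takes from the back of a peer's queue; `Worker`, `next_job` and the file names are hypothetical.

```python
from collections import deque


class Worker:
    def __init__(self, name, jobs=()):
        self.name = name
        self.queue = deque(jobs)

    def next_job(self, peers):
        """Return own work if any, otherwise steal from the first busy peer."""
        if self.queue:
            return self.queue.popleft()   # own work: oldest job first
        for peer in peers:
            if peer is not self and peer.queue:
                return peer.queue.pop()   # stolen work: taken from the far end
        return None                       # nothing left anywhere


a = Worker("A", ["a.c", "b.c", "c.c"])
b = Worker("B")
stolen = b.next_job([a, b])  # B is idle, so it steals from A
own = a.next_job([a, b])     # A continues with its own oldest job
```

Stealing from the opposite end from the one the owner uses is a common choice because it keeps the two parties from contending for the same job; in a networked setup the steal would be a request/reply exchange between workers.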
[09:04] sustrik yes, i understand, but what's the point?
[09:05] mike8901 to load balance
[09:05] sustrik why not load-balance upfront?
[09:05] sustrik rather than messing with queues and reassigning the work?
[09:05] mike8901 it's impossible to exactly load balance up front... each request could take an arbitrary amount of time
[09:06] sustrik say you send at most one request to each worker at a time
[09:06] sustrik when it responds you send another request
[09:06] sustrik etc.
[09:06] sustrik wouldn't that solve the problem?
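The alternative suggested here, one outstanding request per worker with the next request going to whichever worker replies first, can be sketched as a small discrete-time simulation. The function name and the job costs are hypothetical, and the sketch assumes at least one worker.

```python
import heapq


def dispatch_one_at_a_time(job_costs, n_workers):
    """Give each job to the worker that becomes free earliest.

    Returns (assignment, makespan), where assignment maps a worker id to the
    list of job indices it processed. Requires n_workers >= 1.
    """
    # Heap of (time at which the worker is free again, worker id).
    free_at = [(0, worker) for worker in range(n_workers)]
    heapq.heapify(free_at)
    assignment = {worker: [] for worker in range(n_workers)}

    for job, cost in enumerate(job_costs):
        time_free, worker = heapq.heappop(free_at)  # first worker to reply
        assignment[worker].append(job)
        heapq.heappush(free_at, (time_free + cost, worker))

    makespan = max(time_free for time_free, _ in free_at)
    return assignment, makespan


# One large source file followed by three small ones, two workers.
result = dispatch_one_at_a_time([5, 1, 1, 1], 2)
```

With these costs the large job occupies worker 0 while worker 1 handles all three small ones, so the total time is 5 units; a fixed alternating assignment would take 6. The price is that the master has to hold the undispatched requests, which is the overhead raised in the discussion.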
[09:06] mike8901 right, that takes up overhead on the master
[09:06] sustrik but the master has to send the requests anyway
[09:06] sustrik what overhead do you have in mind?
[09:07] mike8901 for one, memory overhead
[09:07] sustrik ack
[09:07] mike8901 queueing the requests will take away from the preprocessor's cache
[09:07] sustrik ok, i see
[09:07] mike8901 basically, we want as little burden as possible to be on the master
[09:08] sustrik what about having a separate load-balancer node then?
[09:08] mike8901 that sounds like it would add a whole other layer of inefficiency - now the source has to travel *twice* over the network
[09:08] mike8901 and you have half the effective bandwidth
[09:09] mike8901 or maybe even less
[09:10] mike8901 anyway, sorry to cut this discussion short(really enjoyed talking with you), but it's 4:10am and I'm exhausted. I'd love to continue this some other time(before Monday at 9am though ;) ).
[09:10] sustrik sure
[09:10] sustrik good night!
[09:11] mike8901 (Monday at 9am is the deadline for this project - yes, we're screwed) ;)
[09:11] mike8901 night!
[09:11] sjampoo heh
[09:11] sjampoo goodnight and goodluck :)
[09:32] sjampoo sustrik: i am getting "Assertion failed: fetched (rep.cpp:265)" with a REQ/REP socket on messages larger than about 8k
[09:32] sjampoo what could be causing this?
[09:32] sustrik let me see
[09:33] sjampoo Seems to be something that didn't happen on 2.0.6
[09:33] sustrik sjampoo: what peer socket type is connected to your REP socket?
[09:33] sjampoo REQ
[09:34] sustrik then it's a bug
[09:34] sustrik can you report it please?
[09:34] sustrik test program would help
[09:34] sjampoo i cannot really reproduce it with C code ;/
[09:35] sustrik hm, which binding does it appear with?
[09:35] sjampoo PyZMQ
[09:35] sjampoo and local_lat / remote_lat
[09:35] sjampoo (the perf benchmark)
[09:36] sustrik i would then suggest reporting it as problem with pyzmq
[09:37] sjampoo what could be causing it?
[09:37] sustrik brian will presumably pass the issue upstream with more details attached
[09:37] sustrik dunno, looks like the message processed has no body
[09:38] sjampoo Ok
[10:50] CIA-15 zeromq2: Brett Cameron master * r714a8d5 / (5 files): fixes for OpenVMS -
[10:50] CIA-15 zeromq2: Martin Sustrik master * r8e5ac10 / (7 files in 6 dirs): Merge branch 'master' of -
[12:12] sjampoo The above issue seems to be a by product of this commit:
[12:14] sustrik sjampoo: yes, that's when the functionality was introduced
[14:21] mato sustrik: are you there?
[14:21] sustrik mato: hi
[14:21] mato sustrik: I want to revert those atomics changes you committed
[14:21] sustrik yes, sure
[14:22] mato while I'm at it, can I remove the native SPARC ops? They are #ifdef-ed out in any case
[14:22] mato also, in the current git atomic_bitmap is gone, this is correct?
[14:22] sustrik ack
[14:22] mato so we have just atomic_counter and atomic_ptr, right?
[14:22] sustrik SPARC: sure, go on, it's commented out for 2 years now or so :)
[14:22] sustrik right
[14:23] mato I'm surprised you committed those changes without asking for review :-(
[14:23] mato anyway, no harm done, I'll put back the old code
[14:25] sustrik no way of checking everything, i'm committing in optimistic fashion
[14:25] sjampoo sustrik: that commit introduces multihop, but i am not using that functionality as i have two req/rep sockets connected directly. Anyway i can reproduce it right now, i probably had too many versions lying around. I'll make an issue
[14:25] sustrik sjampoo: yes, please
[14:25] mato sustrik: sure, but you know I spent time on that code, so you could have waited till I got back from holiday
[14:26] sustrik mato: do you want to become a maintainer for particular subset of files?
[14:26] sustrik say the atomics?
[14:27] mato I kind of assumed I was, since I spent time on it
[14:27] mato same for doc/*
[14:27] sustrik ok, let's make this more formal so that it's obvious who's responsible for what
[14:28] mato if you like
[14:28] sustrik definitely
[14:35] sustrik mato: ok, i've written down the list of components in the project
[14:35] sustrik what's the common way of listing maintainers?
[14:35] mato MAINTAINERS file in source tree
[14:35] sustrik in root?
[14:35] mato yeah
[14:35] sustrik ok
[14:36] mato with Component, Name (of maintainer), Email address
[14:36] mato or some format like that
[14:36] sustrik ack
[14:36] sustrik what about the autotools build
[14:36] sustrik would you like to maintain that?
[14:37] mato not really, but you can add me in there as a point of contact
[14:37] sustrik ok, so it's autotools, docs & atomics
[14:37] sustrik ok?
[14:37] mato yeah
[14:44] mato you should of course add in yourself (with an address of the mailing list) as the maintainer for "everything else"
[14:46] CIA-15 zeromq2: Martin Sustrik master * r127cb89 / MAINTAINERS : MAINTAINERS file added -
[14:46] sustrik done
[14:50] CIA-15 zeromq2: Martin Lucina master * r52ef3f3 / (src/atomic_counter.hpp src/atomic_ptr.hpp):
[14:50] CIA-15 zeromq2: Revert commit 7cb076e, atomic ops cleanup
[14:50] CIA-15 zeromq2: Reverted to using atomic.h on NetBSD
[14:50] CIA-15 zeromq2: Removed GNU builtins (see
[14:50] CIA-15 zeromq2: Removed SPARC native atomic ops as they are untested and have been commented out for years
[14:50] CIA-15 zeromq2: Add "memory" to asm clobber for X86 atomic_counter::sub() -
[14:50] CIA-15 zeromq2: Martin Lucina master * rf6c1c97 / (6 files in 2 dirs): Merge branch 'master' of -