[Time] Name | Message |
[05:01] joshua
|
hi
|
[05:02] joshua
|
I've been having trouble using ZMQ_P2P socket types
|
[05:02] joshua
|
are there any examples around that use it successfully?
|
[05:19] guido_g
|
try to specify an ip address on both sides
|
[05:19] guido_g
|
you gave the interface to bind on the server side
|
[05:20] guido_g
|
just an idea, of course :)
|
[05:26] joshua
|
so for the server, instead of "tcp://lo:5555" bind "tcp://127.0.0.1:5555"?
|
[05:32] guido_g
|
yes
|
[05:40] joshua
|
hmm, it only works if I do send/receive pairs
|
[05:40] joshua
|
I can't get just one to send something to the other
|
[05:42] sustrik
|
joshua__: presumably, that's because you close the sender app before it has a chance to send the data
|
[05:42] joshua
|
ah, the sender doesn't block?
|
[05:42] sustrik
|
no, it's async
|
[05:42] sustrik
|
that's what MQ means
|
[05:43] joshua
|
:D
|
[06:47] sjampoo
|
morning!
|
[07:22] sustrik
|
morning!
|
[08:00] mike
|
sustrik: ping
|
[08:00] sustrik
|
pong
|
[08:01] mike8901
|
hey - I'm working with joshua on a project - I know you guys talked a bit. Is it possible to block until all sends have completed?
|
[08:01] mike8901
|
joshua discovered the hard way that zmq won't keep your program alive
|
[08:04] sustrik
|
no, there's no way to do so -- it would result in deadlock if the peer is unavailable
|
[08:04] joshua
|
fun
|
[08:04] sustrik
|
you have to send acknowledgement by hand
|
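A minimal sketch of sending the acknowledgement by hand, so the sender only exits once the receiver has confirmed the message. It uses plain Python standard-library sockets rather than 0MQ, and the names `send_with_ack`, `receiver`, and the one-byte `ACK` value are illustrative assumptions, not part of any 0MQ API. The timeout is there because waiting forever would deadlock if the peer is unavailable.

```python
import socket
import threading

ACK = b"\x06"  # hypothetical one-byte acknowledgement marker


def receiver(conn, received):
    # Read one message (assumed to fit in a single recv), record it,
    # then confirm receipt explicitly.
    data = conn.recv(1024)
    received.append(data)
    conn.sendall(ACK)


def send_with_ack(conn, payload, timeout=5.0):
    # Send, then block until the peer acknowledges or the timeout expires.
    conn.settimeout(timeout)
    conn.sendall(payload)
    try:
        return conn.recv(1) == ACK
    except socket.timeout:
        return False


def demo():
    sender_end, receiver_end = socket.socketpair()
    received = []
    thread = threading.Thread(target=receiver, args=(receiver_end, received))
    thread.start()
    ok = send_with_ack(sender_end, b"compile foo.c")
    thread.join()
    sender_end.close()
    receiver_end.close()
    return ok, received
```

Only after `send_with_ack` returns `True` is it safe for the sending program to exit.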
[08:04] mike8901
|
so what is the recommended way to deal with the situation in which a client sends messages to a server, then exits?
|
[08:05] mike8901
|
hm
|
[08:05] sustrik
|
is it a request/reply scenario?
|
[08:05] mike8901
|
yes, but the reply may not come from the same peer
|
[08:05] mike8901
|
basically, we have a central server doling out work to slaves
|
[08:05] mike8901
|
and the slaves can talk to each other and redistribute work if needed
|
[08:05] mike8901
|
each piece of work has a unique ID attached to it, and a reply will come from *some* slave, but we don't know which
|
[08:06] sustrik
|
that doesn't matter imo
|
[08:06] sustrik
|
just wait for reply
|
[08:06] sustrik
|
and everything will work as expected
|
[08:07] mike8901
|
yeah, I actually think this is a non-issue
|
[08:07] sjampoo
|
If you don't even care about a reply, you could use the callback on the message object and implement something that blocks yourself.
|
[08:08] sustrik
|
yes, but caution is needed as the callback is called from different thread
|
[08:13] mike8901
|
does zmq have a more appropriate topology than zmq_p2p for doing a round-robin distribution scheme? we're ending up having to have a vector of locks to each socket, which seems a little wasteful.
|
[08:14] sustrik
|
p2p does no distribution...
|
[08:15] sustrik
|
use DOWNSTREAM
|
[08:16] mike8901
|
ok, will look into that - thanks
|
[08:17] mike8901
|
also, does zmq have a way to get a callback when data is available on a socket? we're wasting a lot of time polling every socket on a separate thread now...
|
[08:20] mike8901
|
though that may be a function of us just "doing it wrong" by having many sockets, and polling them all in a nonblocking manner
|
[08:22] sjampoo
|
You can use zmq_poll to poll them all at once and fire callbacks from there, no? Or are you trying to do something different?
|
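A rough sketch of the "poll them all at once and fire callbacks" idea. It uses Python's standard `selectors` module on ordinary socket pairs as a stand-in for `zmq_poll`, so it illustrates the pattern only and is not the 0MQ API. The helper names `poll_once` and `make_callback` are assumptions made for the example.

```python
import selectors
import socket


def make_callback(index, log):
    # Build a callback that records which socket fired and what it received.
    def on_readable(sock):
        log.append((index, sock.recv(1024)))
    return on_readable


def poll_once(selector, timeout=1.0):
    # One blocking wait over every registered socket, then dispatch callbacks.
    fired = 0
    for key, _events in selector.select(timeout):
        key.data(key.fileobj)
        fired += 1
    return fired


def demo():
    selector = selectors.DefaultSelector()
    log = []
    pairs = [socket.socketpair() for _ in range(3)]
    for index, (rx, _tx) in enumerate(pairs):
        rx.setblocking(False)
        selector.register(rx, selectors.EVENT_READ, make_callback(index, log))
    pairs[1][1].sendall(b"hello")  # only the second socket has data
    fired = poll_once(selector)
    selector.close()
    for rx, tx in pairs:
        rx.close()
        tx.close()
    return fired, log
```

A single thread does the waiting and the reading here, which avoids both the per-socket busy polling and the per-socket mutexes.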
[08:24] mike8901
|
only issue with zmq_poll is that a different thread could be accessing a given socket
|
[08:25] mike8901
|
right now we use mutexes to prevent this, before checking with "recv" to see if data is available.
|
[08:26] mikko
|
A ØMQ context is thread safe and may be shared among as many application threads as the application has requested using the app_threads parameter to zmq_init(), without any additional locking required on the part of the caller. Each ØMQ socket belonging to a particular context may only be used by the thread that created it using zmq_socket().
|
[08:26] sustrik
|
mike8901: what exactly are you trying to achieve?
|
[08:27] sustrik
|
you have a thread that owns 1 socket, right?
|
[08:27] sustrik
|
you want to wait till there's message available in the socket, no?
|
[08:28] mike8901
|
Currently (this may not be the best architecture) we have a setup thread which connects to all the clients using a zmq_p2p model, sticks each socket into a vector, then spawns off a thread to send events to other zmq sockets, as well as a thread to receive events from zmq sockets.
|
[08:29] mike8901
|
and (this may not be correct; we haven't gotten code running yet), we use a vector of mutexes to prevent access to each socket
|
[08:29] sustrik
|
what do you want to do? load balance the messages among N sockets?
|
[08:30] mike8901
|
yes, in a round-robin manner
|
[08:30] sustrik
|
use DOWNSTREAM socket
|
[08:30] mike8901
|
okay, I'll look into that
|
[08:30] sustrik
|
you'll have a single socket
|
[08:30] sustrik
|
it'll do all the hard work for you
|
[08:30] mike8901
|
but can I still have another thread polling for receiving messages?
|
[08:30] sustrik
|
yes, but there should be another socket there
|
[08:31] mike8901
|
oh ok
|
[08:31] sustrik
|
presumably an UPSTREAM one
|
[08:31] sustrik
|
that one merges messages from many sources
|
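A small sketch of the two behaviours just described: a single sending endpoint that hands messages to its peers in round-robin order (the DOWNSTREAM role), and a single receiving endpoint that merges messages from many sources (the UPSTREAM role). It is a standard-library simulation using `queue.Queue`, not 0MQ code, and `RoundRobinSender`, `drain`, and `demo` are hypothetical names.

```python
import itertools
import queue


class RoundRobinSender:
    """Stand-in for a DOWNSTREAM-style socket: one send(), many peers."""

    def __init__(self, peer_queues):
        self._next_peer = itertools.cycle(peer_queues)

    def send(self, message):
        # Each call goes to the next peer in turn; no per-peer locking needed.
        next(self._next_peer).put(message)


def drain(q):
    # Collect everything currently in a queue without blocking.
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


def demo():
    workers = [queue.Queue() for _ in range(3)]
    merged = queue.Queue()  # stand-in for an UPSTREAM-style socket
    sender = RoundRobinSender(workers)
    for job in range(7):
        sender.send(job)
    per_worker = [drain(w) for w in workers]
    # Every worker reports into the same merged queue.
    for index, jobs in enumerate(per_worker):
        for job in jobs:
            merged.put((index, job))
    return per_worker, sorted(job for _index, job in drain(merged))
```

The caller holds one sender and one merged receiver, instead of a vector of sockets with a lock for each.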
[08:32] mike8901
|
I'm a bit tired to look at that now, but I'll read through http://www.zeromq.org/tutorials:butterfly tomorrow. Thanks for your help!
|
[08:32] sustrik
|
np
|
[08:47] mike8901
|
oh - one last quick question before I go off to bed: how do you get the list of peers to ZMQ_UPSTREAM? http://github.com/sustrik/jbutterfly/blob/master/gonzo/Component.java specifies an "inp-interface," but I'm not sure what that looks like.
|
[08:48] sustrik
|
you don't have a list of peers
|
[08:48] sustrik
|
0mq should manage it for you
|
[08:49] mike8901
|
er, I guess the question is more appropriate for ZMQ_DOWNSTREAM
|
[08:49] sustrik
|
same applies to any socket type
|
[08:49] mike8901
|
now, I'm really confused ;)
|
[08:49] sustrik
|
the peers are managed by the library
|
[08:49] mike8901
|
how is it done though? using some multicast?
|
[08:49] sustrik
|
it's transparent to the user
|
[08:49] sustrik
|
you can opt for multicast but it's not necessary
|
[08:50] mike8901
|
how do the peers find each other?
|
[08:50] mike8901
|
or rather
|
[08:50] sustrik
|
via address
|
[08:50] mike8901
|
but how do you specify the address?
|
[08:50] mike8901
|
*addresses
|
[08:50] mike8901
|
the connect function takes in a single address
|
[08:50] sustrik
|
yes, the connecting side speaks to a single peer
|
[08:51] sustrik
|
the binding side speaks to multiple peers
|
[08:51] mike8901
|
ah, that may be an issue then....
|
[08:51] mike8901
|
our "root" node is going to be transient, and needs to be able to connect to the slaves at will
|
[08:52] sustrik
|
you can connect multiple times if needed:
|
[08:52] sustrik
|
s.connect (A);
|
[08:52] sustrik
|
s.connect (B);
|
[08:52] sustrik
|
etc.
|
[08:52] mike8901
|
oh ok
|
[08:52] mike8901
|
so the root can use a downstream socket, and just call connect for each addr
|
[08:53] mike8901
|
is there an easy way to establish a corresponding upstream socket, without bothering to pass the server's IP to the clients?
|
[08:53] sustrik
|
server? upstream? what applications there are?
|
[08:54] mike8901
|
okay, sorry
|
[08:54] mike8901
|
let me explain my application in detail
|
[08:54] mikko
|
maybe dns?
|
[08:54] mike8901
|
we're implementing a distributed compiler(on top of clang)
|
[08:54] sustrik
|
mikko: possibly, but let's first listen to the use case
|
[08:54] mike8901
|
the "master" is spawned on demand on the user's computer
|
[08:55] mike8901
|
the "slaves" will always be listening for work to process(i.e. files to compile to object code)
|
[08:55] mike8901
|
the "master" is not guaranteed to always be running; it is only up for the duration of the compile
|
[08:55] sustrik
|
how many masters there may be?
|
[08:55] mike8901
|
for now, just 1
|
[08:55] mike8901
|
but the master is transient
|
[08:56] sustrik
|
so 1 client, 1 master, N workers
|
[08:56] mike8901
|
yes
|
[08:56] mike8901
|
but the client/master are transient
|
[08:56] mike8901
|
well, the client will stay around until work it needs is done
|
[08:56] sustrik
|
how does the interaction pattern looks like?
|
[08:56] sustrik
|
client sends a request
|
[08:57] sustrik
|
master dispatches it to one worker
|
[08:57] sustrik
|
worker processes it
|
[08:57] sustrik
|
sends reply to the master
|
[08:57] sustrik
|
master forwards the reply to client
|
[08:57] sustrik
|
is that it?
|
[08:57] mike8901
|
yes
|
[08:57] mike8901
|
well
|
[08:57] mike8901
|
there's not really any client-master interaction now..
|
[08:58] sustrik
|
ok, so let's drop the client from the scheme
|
[08:58] mike8901
|
(there is technically, but we use UNIX sockets for that now)
|
[08:58] sustrik
|
master sends request to a worker
|
[08:58] sustrik
|
worker replies back to the master
|
[08:58] sustrik
|
right?
|
[08:58] mike8901
|
yep
|
[08:58] mike8901
|
well
|
[08:59] mike8901
|
workers are not necessarily the same, but yes
|
[08:59] mike8901
|
that's the idea
|
[08:59] sustrik
|
they are not the same?
|
[08:59] sustrik
|
what's the difference?
|
[08:59] mike8901
|
well, we're implementing work queue stealing, so if one worker runs out of work, it can ask another for work.
|
[08:59] sustrik
|
hm, what is that good for?
|
[08:59] mike8901
|
so the master may not receive the response from the worker it sent the request to
|
[09:00] sustrik
|
why not let the master load balance the work?
|
[09:00] mike8901
|
the master is going to be overloaded preprocessing (Amdahl's law) - we want the slaves to load balance amongst themselves
|
[09:02] sustrik
|
the master has to send the requests anyway, no?
|
[09:02] mike8901
|
yes
|
[09:02] mike8901
|
the requests are going to be of varying size though
|
[09:02] mike8901
|
as with any project, you'll have really small source files and really large ones
|
[09:03] mike8901
|
it's inevitable that some slaves will run out of work, and we want the slaves to be able to steal work off each other's queues
|
[09:03] sustrik
|
so what you want is to avoid queueing, right?
|
[09:04] sustrik
|
at most one request dispatched to the worker at time
|
[09:04] mike8901
|
no
|
[09:04] mike8901
|
we want the slaves to maintain a queue
|
[09:04] mike8901
|
so that if another slave asks slave A for work, it can provide it to slave B
|
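A deterministic, single-threaded sketch of the work-stealing idea described here: each worker keeps its own queue, and a worker that runs dry takes a job from the tail of the fullest peer queue. It is a standard-library simulation under simplifying assumptions (one job per worker per round, no networking), and `run_with_stealing` is a hypothetical name.

```python
from collections import deque


def run_with_stealing(queues):
    # queues maps a worker name to a deque of pending job ids.
    done = {name: [] for name in queues}
    while any(queues.values()):
        for name, own in queues.items():
            if not own:
                # Out of work: ask the peer with the longest queue.
                victim = max(queues, key=lambda n: len(queues[n]))
                if queues[victim]:
                    own.append(queues[victim].pop())  # steal from the tail
            if own:
                done[name].append(own.popleft())  # process from the head
    return done


def demo():
    queues = {"A": deque([1, 2, 3, 4]), "B": deque()}
    return run_with_stealing(queues)
```

In `demo`, worker B starts with nothing and ends up doing half the jobs, without the master being involved after the initial hand-off.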
[09:04] sustrik
|
yes, i understand, but what's the point?
|
[09:05] mike8901
|
to load balance
|
[09:05] sustrik
|
why not load-balance upfront?
|
[09:05] sustrik
|
rather than messing with queues and reassigning the work?
|
[09:05] mike8901
|
it's impossible to exactly load balance up front... each request could take an arbitrary amount of time
|
[09:06] sustrik
|
say you send at most one request to each worker at time
|
[09:06] sustrik
|
when it responds you send another request
|
[09:06] sustrik
|
etc.
|
[09:06] sustrik
|
wouldn't that solve the problem?
|
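A minimal sketch of the alternative proposed here: the master keeps at most one request outstanding per worker and sends the next one only when that worker replies. It is simulated with standard-library threads and queues rather than 0MQ sockets. The names `worker` and `dispatch`, and the `.c` to `.o` "compile" step, are illustrative assumptions.

```python
import queue
import threading
from collections import deque


def worker(name, inbox, replies):
    # Process requests one at a time until told to stop (None).
    while True:
        task = inbox.get()
        if task is None:
            break
        replies.put((name, task, task.replace(".c", ".o")))


def dispatch(tasks, n_workers=3):
    replies = queue.Queue()
    inboxes = {"w%d" % i: queue.Queue() for i in range(n_workers)}
    threads = [
        threading.Thread(target=worker, args=(name, inbox, replies))
        for name, inbox in inboxes.items()
    ]
    for thread in threads:
        thread.start()

    pending = deque(tasks)
    results = {}
    in_flight = 0
    # Prime every worker with a single request.
    for inbox in inboxes.values():
        if pending:
            inbox.put(pending.popleft())
            in_flight += 1
    # Each reply frees exactly one worker, which gets the next request.
    while in_flight:
        name, task, result = replies.get()
        results[task] = result
        in_flight -= 1
        if pending:
            inboxes[name].put(pending.popleft())
            in_flight += 1

    for inbox in inboxes.values():
        inbox.put(None)
    for thread in threads:
        thread.join()
    return results
```

Slow jobs hold up only the worker running them; faster workers simply come back for more. The cost, which is the objection raised next in the conversation, is that the unsent requests stay queued on the master.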
[09:06] mike8901
|
right, that takes up overhead on the master
|
[09:06] sustrik
|
but the master has to send the requests anyway
|
[09:06] sustrik
|
what overhead do you have in mind?
|
[09:07] mike8901
|
for one, memory overhead
|
[09:07] sustrik
|
ack
|
[09:07] mike8901
|
queueing the requests will take away from the preprocessor's cache
|
[09:07] sustrik
|
ok, i see
|
[09:07] mike8901
|
basically, we want as little burden as possible to be on the master
|
[09:08] sustrik
|
what about having a separate load-balancer node then?
|
[09:08] mike8901
|
that sounds like it would add a whole other layer of inefficiency - now the source has to travel *twice* over the network
|
[09:08] mike8901
|
and you have half the effective bandwidth
|
[09:09] mike8901
|
or maybe even less
|
[09:10] mike8901
|
anyway, sorry to cut this discussion short(really enjoyed talking with you), but it's 4:10am and I'm exhausted. I'd love to continue this some other time(before Monday at 9am though ;) ).
|
[09:10] sustrik
|
sure
|
[09:10] sustrik
|
good night!
|
[09:11] mike8901
|
(Monday at 9am is the deadline for this project - yes, we're screwed) ;)
|
[09:11] mike8901
|
night!
|
[09:11] sjampoo
|
heh
|
[09:11] sjampoo
|
goodnight and goodluck :)
|
[09:32] sjampoo
|
sustrik: i am getting "Assertion failed: fetched (rep.cpp:265)" with a REQ/REP socket on messages larger than about 8k
|
[09:32] sjampoo
|
what could be causing this?
|
[09:32] sustrik
|
let me see
|
[09:33] sjampoo
|
Seems to be something that didn't happen on 2.0.6
|
[09:33] sustrik
|
sjampoo: what peer socket types is connected to your REP socket?
|
[09:33] sjampoo
|
REQ
|
[09:34] sustrik
|
then it's a bug
|
[09:34] sustrik
|
can you report it please?
|
[09:34] sustrik
|
test program would help
|
[09:34] sjampoo
|
i cannot really reproduce it with C code ;/
|
[09:35] sustrik
|
hm, which binding it appears with?
|
[09:35] sjampoo
|
PyZMQ
|
[09:35] sjampoo
|
and local_lat / remote_lat
|
[09:35] sjampoo
|
(the perf benchmark)
|
[09:36] sustrik
|
i would then suggest reporting it as problem with pyzmq
|
[09:37] sjampoo
|
what could be causing it?
|
[09:37] sustrik
|
brian will presumably pass the issue upstream with more details attached
|
[09:37] sustrik
|
dunno, looks like the message processed has no body
|
[09:38] sjampoo
|
Ok
|
[10:50] CIA-15
|
zeromq2: Brett Cameron master * r714a8d5 / (5 files): fixes for OpenVMS - http://bit.ly/9IYypp
|
[10:50] CIA-15
|
zeromq2: Martin Sustrik master * r8e5ac10 / (7 files in 6 dirs): Merge branch 'master' of git@github.com:sustrik/zeromq2 - http://bit.ly/bKeYae
|
[12:12] sjampoo
|
The above issue seems to be a by product of this commit: http://github.com/sustrik/zeromq2/commit/ad6fa9d0d4f1cf29ce63998d7efe337b1a784ef6
|
[12:14] sustrik
|
sjampoo: yes, that's when the functionality was introduced
|
[14:21] mato
|
sustrik: are you there?
|
[14:21] sustrik
|
mato: hi
|
[14:21] mato
|
sustrik: I want to revert those atomics changes you committed
|
[14:21] sustrik
|
yes, sure
|
[14:22] mato
|
while I'm at it, can I remove the native SPARC ops? They are #ifdef-ed out in any case
|
[14:22] mato
|
also, in the current git atomic_bitmap is gone, this is correct?
|
[14:22] sustrik
|
ack
|
[14:22] mato
|
so we have just atomic_counter and atomic_ptr, right?
|
[14:22] sustrik
|
SPARC: sure, go on, it's commented out for 2 years now or so :)
|
[14:22] sustrik
|
right
|
[14:23] mato
|
I'm surprised you committed those changes without asking for review :-(
|
[14:23] mato
|
anyway, no harm done, I'll put back the old code
|
[14:25] sustrik
|
no way to check everything, i'm committing in optimistic fashion
|
[14:25] sjampoo
|
sustrik: that commit introduces multihop, but i am not using that functionality as i have two req/rep sockets connected directly. Anyway i can reproduce it right now, i probably had too many versions lying around. I'll make an issue
|
[14:25] sustrik
|
sjampoo: yes, please
|
[14:25] mato
|
sustrik: sure, but you know I spent time on that code, so you could have waited till I got back from holiday
|
[14:26] sustrik
|
mato: do you want to become a maintainer for particular subset of files?
|
[14:26] sustrik
|
say the atomics?
|
[14:27] mato
|
I kind of assumed I was, since I spent time on it
|
[14:27] mato
|
same for doc/*
|
[14:27] sustrik
|
ok, let's make this more formal so that it's obvious who's responsible for what
|
[14:28] mato
|
if you like
|
[14:28] sustrik
|
definitely
|
[14:35] sustrik
|
mato: ok, i've written down the list of components in the project
|
[14:35] sustrik
|
what's the common way of listing maintainers?
|
[14:35] mato
|
MAINTAINERS file in source tree
|
[14:35] sustrik
|
in root?
|
[14:35] mato
|
yeah
|
[14:35] sustrik
|
ok
|
[14:36] mato
|
with Component, Name (of maintainer), Email address
|
[14:36] mato
|
or some format like that
|
[14:36] sustrik
|
ack
|
[14:36] sustrik
|
what about the autotools build
|
[14:36] sustrik
|
would you like to maintain that?
|
[14:37] mato
|
not really, but you can add me in there as a point of contact
|
[14:37] sustrik
|
ok, so it's autotools, docs & atomics
|
[14:37] sustrik
|
ok?
|
[14:37] mato
|
yeah
|
[14:44] mato
|
you should of course add in yourself (with an address of the mailing list) as the maintainer for "everything else"
|
[14:46] CIA-15
|
zeromq2: Martin Sustrik master * r127cb89 / MAINTAINERS: MAINTAINERS file added - http://bit.ly/aEumLZ
|
[14:46] sustrik
|
done
|
[14:50] CIA-15
|
zeromq2: Martin Lucina master * r52ef3f3 / (src/atomic_counter.hpp src/atomic_ptr.hpp):
|
[14:50] CIA-15
|
zeromq2: Revert commit 7cb076e, atomic ops cleanup
|
[14:50] CIA-15
|
zeromq2: Reverted to using atomic.h on NetBSD
|
[14:50] CIA-15
|
zeromq2: Removed GNU builtins (see http://lists.zeromq.org/pipermail/zeromq-dev/2010-May/003485.html)
|
[14:50] CIA-15
|
zeromq2: Removed SPARC native atomic ops as they are untested and have been commented out for years
|
[14:50] CIA-15
|
zeromq2: Add "memory" to asm clobber for X86 atomic_counter::sub() - http://bit.ly/buhvIA
|
[14:50] CIA-15
|
zeromq2: Martin Lucina master * rf6c1c97 / (6 files in 2 dirs): Merge branch 'master' of github.com:sustrik/zeromq2 - http://bit.ly/cNQN1Z
|