Thursday July 21, 2011

[Time] Name Message
[00:06] failshell hello. how does zeromq handle queues? can they be distributed in a cluster?
[00:13] jhawk28 no
[00:13] jhawk28 you could create a broker that is implemented as a cluster
[00:14] jhawk28 the queues in zeromq are effectively network buffers
[09:52] CIA-32 libzmq: Martin Sustrik master * rcb2d715 / builds/redhat/ : endmsg(3) and zmq_recvmsg(3) added to RPM spec file ...
[11:49] Darkproger hi! is it possible to set a socket timeout?
[11:50] Darkproger except using poll()
[11:53] guido_g no
[11:54] guido_g ahhh... stop
[11:54] guido_g check the docs for an option, it might be that there materialized one
[11:55] guido_g not for 2.1
[11:55] Darkproger HEAD?
[11:56] guido_g <- there is something
[11:56] guido_g but it's 3.0
[11:57] sustrik yes, there's a ZMQ_RCVTIMEO option
[11:57] sustrik in 3.0
[11:58] Darkproger are there matching pyzmq bindings by chance?
[11:58] sustrik i recall there's some 3.0 support in pyzmq
[11:58] guido_g pyzmq is sort of 3.0 ready, but i wouldn't expect it to be fully compatible
[12:04] Darkproger great
[12:10] Darkproger btw, what if i use a REQ/REP socket to ping the status of a remote process: send() succeeds the first time, but recv() dies on a timeout, and the next send() returns an error because it is not being called in the natural order. Is it possible to reset the REQ socket state?
[12:12] guido_g no, you've to close it
[12:14] Darkproger so if i recreate it upon the next call, i guess that connect() will block in case when the endpoint is down?
[12:14] sustrik it won't
[12:17] pieterh sustrik: are you going to make router work with MORE instead of LABEL for 3.x?
[12:18] sustrik you are the maintainer of 3-0
[12:18] sustrik it's up to you to decide
[12:18] pieterh so you are not going to make a 3.1 branch?
[12:18] sustrik dunno
[12:18] sustrik what's the conclusion so far?
[12:18] pieterh well, no-one seems to mind either way
[12:19] sustrik i see
[12:20] sustrik anyway, the ROUTER is removed from current 4-0 and replaced by generic socket
[12:20] pieterh ah, ok
[12:20] pieterh what are you calling it?
[12:20] sustrik ZMQ_GENERIC
[12:20] sustrik lame
[12:20] pieterh lame
[12:20] pieterh can't you just call it ROUTER?
[12:20] pieterh less shift
[12:21] sustrik sure
[12:21] sustrik it'll make guide confusing afaics though
[12:21] sustrik so it's up to you
[12:21] pieterh the guide will have to be redone in large portions
[12:21] pieterh it's going to break 80% of the examples
[12:22] sustrik ok, i'll rename it to ROUTER
[12:22] pieterh that'll help IMO, since we can continue to speak of ROUTER-based patterns
[12:22] sustrik good
[12:22] pieterh then we simplify to use ROUTER at all sides
[12:22] sustrik ok
[12:23] pieterh ok, 4.0 it is...
[12:23] pieterh it might be we stop using minor version numbers at all...
[12:24] pieterh ok, I'll apply that patch to 3.0, since breaking ROUTER there would really be too painful
[12:24] pieterh also, I'm not sure if blocking is the right strategy on HWM... dropping might be better
[12:25] guido_g don't forget an _easy_ way to detect the hwm situation
[12:26] pieterh guido_g: if ROUTER drops messages on HWM, it can be detected as a real exception
[12:26] pieterh this is the basis for proving credit-based flow control:
[12:27] guido_g ahh ok, that's cool
[12:27] guido_g i like exceptions :)
[12:27] guido_g saw the article
[12:27] pieterh the principle is that you never reach HWM, and if you do, it's fatal
[12:27] pieterh this will work beautifully with the 4.0 ROUTER sockets
[12:28] guido_g ok
[12:28] sustrik block is the right solution IMO
[12:28] sustrik because it's easy to simulate drop on top of block
[12:28] sustrik but not vice versa
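sustrik's point that drop can be layered on top of block can be illustrated with a minimal sketch. It uses Python's standard `queue.Queue` as a stand-in for a socket with a high-water mark; this is not ZeroMQ code, and `HWM`, `send_blocking`, and `send_or_drop` are hypothetical names chosen for illustration.

```python
import queue

HWM = 3  # hypothetical high-water mark, in messages


def send_blocking(q, msg):
    """'Block on HWM': wait until the queue has room for the message."""
    q.put(msg)


def send_or_drop(q, msg):
    """'Drop on HWM', built on the same queue: try once without waiting
    and discard the message if the queue is already full."""
    try:
        q.put_nowait(msg)
        return True
    except queue.Full:
        return False


q = queue.Queue(maxsize=HWM)
results = [send_or_drop(q, i) for i in range(5)]
print(results)  # the first HWM sends are accepted, the rest are dropped
```

The reverse direction is the hard one: if the primitive silently drops, the caller has nothing to wait on, so blocking cannot be rebuilt from it.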
[12:29] guido_g so true
[12:30] pieterh sustrik: you might want to read that article, it could be useful to add cbfc to some sockets
[12:30] sustrik i've already read it
[12:34] pieterh sustrik: do you feel strongly about removing / not removing zmq_device from 3.0?
[12:34] pieterh I have an unmerged branch with your patch that restores it
[12:34] sustrik no
[12:34] pieterh anyone here care about that?
[12:35] guido_g nope
[12:35] sustrik :)
[12:43] sustrik as for the cbfc it's an interesting topic
[12:43] sustrik because there actually *is* cbfc on TCP level
[12:43] sustrik credit = SNDBUF + RCVBUF
[12:44] sustrik the only reason for cbfc on 0mq level is when you want to define your credit in messages rather than bytes
[12:45] sustrik and afaiu the only value people are really interested in is 1
[12:45] sustrik ie. lock-step
[12:46] sustrik which kind of implies we are dealing with some kind of higher level business logic here rather than simple flow control
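Credit counted in messages rather than bytes can be sketched as a toy model. This is a pure-Python simulation, not the design from pieterh's article and not a ZeroMQ API; `CreditSender`, `grant`, and `wire` are hypothetical names. With `credit=1` it degenerates into the lock-step case sustrik mentions.

```python
from collections import deque


class CreditSender:
    """Toy sender that may only have `credit` unacknowledged messages in flight."""

    def __init__(self, credit):
        self.credit = credit     # messages we may still send without an ack
        self.pending = deque()   # messages waiting for credit
        self.wire = []           # stand-in for the network

    def send(self, msg):
        """Queue a message and push out whatever the current credit allows."""
        self.pending.append(msg)
        self._flush()

    def grant(self, n):
        """The receiver acknowledges n messages, which returns n credits."""
        self.credit += n
        self._flush()

    def _flush(self):
        while self.credit > 0 and self.pending:
            self.wire.append(self.pending.popleft())
            self.credit -= 1


sender = CreditSender(credit=2)
for i in range(5):
    sender.send(i)
print(sender.wire)   # [0, 1]
sender.grant(2)
print(sender.wire)   # [0, 1, 2, 3]
```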
[12:47] FellowTraveler hi all
[12:48] FellowTraveler Question: When my server is running, then I can send messages back and forth no problem. But if my server is NOT running, then the client hangs for a while and freezes my GUI.
[12:48] FellowTraveler Do you know how to configure ZMQ so this doesn't happen?
[12:48] sustrik aren't you calling a blocking function from a GUI thread or somesuch?
[13:03] pieterh sustrik: there are many patterns which repeat at TCP and also higher levels
[13:03] pieterh the point of CBFC is (a) to do multi message acknowledgment, asynchronously, and (b) to avoid blocking on write
[13:03] FellowTraveler How is that done?
[13:03] pieterh the specific use case I wrote that example for was a server sending out to many clients, which could never afford to block
[13:04] FellowTraveler How can I use Zmq to send a message, and check for a response later, without blocking on the call
[13:04] pieterh FellowTraveler: how much of the Guide have you read?
[13:05] pieterh sustrik: so window=1 is essentially the same as LRU
[13:05] pieterh that's a common pattern but really not the only one
[13:05] pieterh for one thing, it is slow, since each message requires a hand shake
[13:06] FellowTraveler pieterh: my understanding was that ZMQ was smart enough to handle all the messages back and forth, thus saving me from having to deal with that myself.
[13:06] FellowTraveler I would think that I can just send a message off, and then later just check the reply queue to see if anything came in
[13:06] sustrik so in the scenario with credit>1, why can't you simply use TCP credit?
[13:07] pieterh if there was some way to accurately know when it was safe to write to a specific client, via ROUTER, we'd not need cbfc
[13:07] sustrik FellowTraveler: Yes you can, but recv() can block if the response is not yet there
[13:07] sustrik pieterh: there is
[13:07] pieterh sustrik: in 4.0 you mean?
[13:07] sustrik rc = zmq_send (msg, NOBLOCK);
[13:08] sustrik if (rc == -1 && errno == EAGAIN)
[13:08] sustrik printf ("cannot send");
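The "try to send, and treat EAGAIN as cannot send" idea in sustrik's snippet can be tried out with plain sockets. The sketch below uses only the Python standard library (a non-blocking `socket.socketpair()`), not ZeroMQ, so it shows the general pattern rather than 0MQ's behaviour; `try_send` and `fill_until_full` are hypothetical helper names.

```python
import socket


def try_send(sock, data):
    """Attempt a non-blocking send; return False instead of blocking
    when the kernel buffer is full (EAGAIN / EWOULDBLOCK)."""
    try:
        sock.send(data)
        return True
    except BlockingIOError:
        return False


def fill_until_full(sock, chunk=b"x" * 4096, limit=100_000):
    """Send chunks until the socket would block; return how many were accepted."""
    sent = 0
    while sent < limit and try_send(sock, chunk):
        sent += 1
    return sent


a, b = socket.socketpair()
a.setblocking(False)          # nobody reads from b, so a's buffer fills up
accepted = fill_until_full(a)
print("chunks accepted before the send would block:", accepted)
```

The number of accepted chunks depends on the operating system's buffer sizes, which is the byte-based credit sustrik refers to above.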
[13:08] pieterh sustrik: afaiu, if any of the client queues are full, it'll block on all of them
[13:08] sustrik nope
[13:08] pieterh ok
[13:08] sustrik each queue is a separate thing
[13:09] pieterh I'd have to find that thread again, it was a few months ago
[13:13] pieterh sustrik: ok, I found it...
[13:13] pieterh basically xrep_t::has_out always returns true
[13:13] sustrik yes
[13:13] pieterh so you will not get an EAGAIN when one queue is full
[13:14] pieterh so there is no way to know in advance not to write to a full client
[13:14] sustrik yes, that's the case with XREP
[13:14] sustrik the generic socket blocks instead
[13:14] pieterh so, that's what I meant, you changed this in ROUTER for 4.0
[13:14] sustrik ah, yes
[13:14] sustrik sorry
[13:15] pieterh np, just nice to not have wasted my time making that CBFC
[13:15] sustrik still, the poll has to return POLLOUT all the time
[13:15] pieterh sure, because it doesn't know what peer you're talking about
[13:15] sustrik exactly
[13:15] sustrik it's weid
[13:15] pieterh so it still means tentatively writing all the time
[13:15] sustrik weird
[13:15] pieterh which is a pain
[13:15] sustrik yes
[13:15] pieterh no way to do that properly in fact
[13:15] sustrik exactly
[13:16] pieterh whereas CBFC does it properly
[13:16] pieterh you get an event, respond to it immediately
[13:16] pieterh ok...
[13:16] sustrik that's one of the things i had in mind when i argued that "generic" socket cannot be implemented properly
[13:16] pieterh it's true but I'm happy layering two patterns
[13:17] pieterh even if we did cbfc in the socket, it wouldn't help
[13:17] sustrik yes, it's a conceptual problem
[13:17] pieterh now, what could work is a different kind of poll/event design
[13:17] pieterh i.e. "peer X is ready for output now, on socket S"
[13:18] sustrik yes, separate fds for different connections
[13:18] pieterh yes, in fact
[13:18] sustrik that's why i'm saying that pursuing generic socket leads back to standard TCP
[13:18] pieterh brings it closer to TCP while still giving that 0MQ magic
[13:18] pieterh well, 'standard' except for all the magic stuff
[13:18] sustrik yep
[13:18] pieterh it leads back to a TCP-like API
[13:19] pieterh which is perfectly OK IMO
[13:19] sustrik sure
[13:19] sustrik i guess most people use 0mq because they don't want to deal with TCP API
[13:19] sustrik but those could ignore generic socket
[13:19] pieterh well, to be honest, the basic TCP API is fine
[13:19] sustrik sure, it's nice
[13:20] pieterh except for the clunky details like converting addresses, handling errors, configuring sockets, etc.
[13:20] sustrik but not exactly easy to work with
[13:20] pieterh socket/accept/close/recv/send is fine
[13:20] sustrik ack
[13:20] pieterh this could be a better API than using meta-messages with a single socket
[13:21] sustrik maybe
[13:21] sustrik anyway, the clear separation of patterns means that we can do anything we want with generic socket
[13:21] sustrik and even if we mess it up
[13:21] pieterh yup, it's nice
[13:21] sustrik there's no harm done
[13:21] pieterh what's good is that we have a whole bunch of real use cases that we can use to test new designs against
[13:22] sustrik yes
[13:22] pieterh if we do it right, the various high level patterns should become simpler to implement
[13:23] pieterh ok, /me goes back to writing the chapter on Social Architecture...
[13:23] sustrik yes, but still, in the long run they have to be moved to the core
[13:23] sustrik because of minor inconsistencies
[13:24] sustrik like the generic socket always signaling POLLOUT
[13:24] pieterh that was the plan, but I assume it'll take 10 years or more
[13:24] sustrik etc.
[13:24] sustrik sure
[17:52] CIA-32 libzmq: Martin Sustrik master * r72a793f / (7 files in 2 dirs): ZMQ_GENERIC renamed to ZMQ_ROUTER ...
[17:52] CIA-32 libzmq: Martin Sustrik master * ra1e09fa / (doc/zmq_send.txt include/zmq.h src/router.cpp): ROUTER socket reports error when message cannot be routed ...
[17:52] CIA-32 libzmq: Martin Sustrik master * r6b873d4 / src/router.cpp : ROUTER socket blocks on SNDHWM ...
[17:52] CIA-32 libzmq: Martin Sustrik master * r4bd3359 / doc/zmq_sendmsg.txt : ECANTROUTE error documented in zmq_sendmsg(3) ...
[18:04] sustrik hi mikko!
[18:49] mikko hi sustrik
[18:49] sustrik hi
[18:49] mikko i'm in a slightly intoxicated state
[18:49] sustrik np :)
[18:49] sustrik i've applied the RPM patch
[18:50] sustrik the build system doesn't seem to build it
[18:50] sustrik is it off?
[18:50] mikko it builds once a day
[18:50] mikko i can force it
[18:50] sustrik ah, i see
[18:50] sustrik let it be
[18:50] sustrik thanks
[18:50] mikko 00 00 * * *
[18:51] mikko midnight BST
[18:51] mikko our autoconf is too new for redhat
[18:51] sustrik ok
[18:51] mikko so it builds from the snapshot src package
[18:51] mikko
[18:53] sustrik yes, i was just not sure about the scheduling of the build
[18:53] mikko success
[18:54] sustrik great, thanks!
[18:55] mikko np
[20:10] xristos
[20:10] xristos why are my signal handler not getting called?
[20:10] xristos *handelrs
[20:23] pieterh xristos: hi
[20:23] pieterh I use signal handlers in CZMQ and they definitely work
[20:24] pieterh I use sigaction() instead of signal() but IMO it's the same
[20:24] pieterh if it's not working in Python, I'd expect that pyzmq is doing its own signal handling
[20:24] pieterh check that code, or ask the pyzmq devs if they have an idea
[20:35] michelp xristos, python signals are acted on "between" bytecodes by the main thread, so maybe the recv is blocking and the signal won't be called until it returns
[20:35] michelp if you switch to poll() with a timeout, it might work?
[20:37] michelp that's what this in the docs makes me think: "Although Python signal handlers are called asynchronously as far as the Python user is concerned, they can only occur between the “atomic” instructions of the Python interpreter. This means that signals arriving during long calculations implemented purely in C (such as regular expression matches on large bodies of text) may be delayed for an arbitrary amount of time."
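michelp's suggestion of polling with a timeout, so that the interpreter regularly gets a chance to run signal handlers, might look roughly like the sketch below. It uses only the standard library (`select` on a `socket.socketpair()`) rather than pyzmq, and `recv_with_timeout`, `serve`, `install_handler`, and the `state` flag are hypothetical names.

```python
import select
import signal
import socket

state = {"stop": False}


def handle_signal(signum, frame):
    """Only record the request; the receive loop notices it at its next wake-up."""
    state["stop"] = True


def install_handler(signum=signal.SIGINT):
    """Register the handler (must be called from the main thread)."""
    signal.signal(signum, handle_signal)


def recv_with_timeout(sock, timeout_s):
    """Wait up to timeout_s seconds for data; return bytes, or None on timeout."""
    readable, _, _ = select.select([sock], [], [], timeout_s)
    if readable:
        return sock.recv(4096)
    return None


def serve(sock, timeout_s=0.1):
    """Collect messages until a signal handler has asked the loop to stop."""
    received = []
    while not state["stop"]:
        msg = recv_with_timeout(sock, timeout_s)
        if msg is not None:
            received.append(msg)
    return received


left, right = socket.socketpair()
right.send(b"ping")
print(recv_with_timeout(left, 0.1))  # data is already waiting
print(recv_with_timeout(left, 0.1))  # nothing arrives, so this returns after the timeout
```

A real program would call `install_handler()` once at startup and then run `serve()`; the loop then exits within roughly one timeout of the signal arriving.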
[20:37] sustrik michelp: this was the original problem with handling Ctrl+C in python
[20:37] xristos michelp: i think pieterh is right in that whoever is handling the signal is terminating the program
[20:38] xristos so pyzmq must do it internally
[20:38] sustrik the signal was delayed until blocking zmq function returned
[20:38] sustrik such as zmq_recv()
[20:38] sustrik then we've changed the behaviour
[20:39] sustrik in such a way that blocking calls return EINTR when signal happens
[20:39] sustrik so it should work ok now
[21:02] xristos mailbox.cpp:77 errno_assert (rc == 0);
[21:02] xristos wont this terminate if rc != 0 (i assume EINTR is not 0)
[21:03] xristos so when the signal is delivered, and if signaler.wait returns EINTR, the process terminates?
[21:12] sustrik xristos: what version are you referring to?
[21:16] sustrik zeromq2-1 trunk i guess
[21:16] sustrik yes, that's new code and it seems to be incorrect
[21:18] xristos sustrik: 2.1 yes
[21:18] xristos i straced it and it's calling abort
[21:18] xristos so the assert is the issue
[21:23] sustrik xristos: yes, it should pass the EINTR to the caller
[21:24] sustrik can you test the fix?
[21:24] sustrik if (rc != 0 && (errno == EAGAIN || errno == EINTR) ...
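The idea behind the fix, passing EAGAIN and EINTR back to the caller instead of asserting, can be sketched in Python. This is not the libzmq patch (the line above is truncated in the log), only an illustration of the error-handling pattern; `checked_wait` and `interrupted` are hypothetical names.

```python
import errno


def checked_wait(wait_fn):
    """Call wait_fn(); hand EAGAIN/EINTR back to the caller as a negative
    errno value and let any other error propagate as fatal."""
    try:
        wait_fn()
        return 0
    except OSError as exc:
        if exc.errno in (errno.EAGAIN, errno.EINTR):
            return -exc.errno   # recoverable: the caller decides whether to retry
        raise                   # anything else is still treated as a bug


def interrupted():
    """Stand-in for a blocking wait that is interrupted by a signal."""
    raise InterruptedError(errno.EINTR, "interrupted by signal")


print(checked_wait(interrupted) == -errno.EINTR)  # True
```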
[21:25] xristos yeah it seems fine
[21:25] xristos i'm at home now, will try it tomorrow at work
[21:25] sustrik ok, i'll commit it then
[21:25] xristos i'm already using 2.1 trunk for a previous commit