Thursday October 14, 2010

[Time] Name Message
[05:51] deri i enjoyed reading the zeromq guide. a couple of the takeaways I got are to focus on the 1-to-N fan out principle, and also setting up the REP-REP "pinch points". i still need to sit down and think things through beyond that though.
[06:38] CIA-14 zeromq2: 03Martin Sustrik 07master * re2167ce 10/ src/zmq.cpp :
[06:38] CIA-14 zeromq2: Precise timeouts in zmq_poll implemented
[06:38] CIA-14 zeromq2: Signed-off-by: Martin Sustrik <> -
[06:38] CIA-14 zeromq2: 03Martin Pales 07master * rda73b7c 10/ (AUTHORS src/devpoll.cpp):
[06:38] CIA-14 zeromq2: zmq::devpoll_t : correct a typo in loop()
[06:38] CIA-14 zeromq2: A minor typo correction to resolve compilation error on Solaris.
[06:38] CIA-14 zeromq2: Signed-off-by: Martin Pales <> -
[07:17] CIA-14 zeromq2: 03Martin Sustrik 07master * rb174ad2 10/ doc/zmq_poll.txt :
[07:17] CIA-14 zeromq2: zmq_poll man page fixed to reflect the precise timeout semantics.
[07:17] CIA-14 zeromq2: Signed-off-by: Martin Sustrik <> -
[07:35] CIA-14 zeromq2: 03Martin Sustrik 07master * rcafcdbb 10/ src/zmq.cpp :
[07:35] CIA-14 zeromq2: Safety measure in zmq_msg_close implemented
[07:35] CIA-14 zeromq2: zmq_msg_close now empties the message on zmq_msg_close, thus not
[07:35] CIA-14 zeromq2: leaving random data in the structure, that may be mistaken for
[07:35] CIA-14 zeromq2: a valid message.
[07:35] CIA-14 zeromq2: Signed-off-by: Martin Sustrik <> -
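The idea behind the zmq_msg_close commit above can be sketched roughly as follows. This is a toy illustration only: toy_msg_t and both functions are hypothetical names invented here, not libzmq's zmq_msg_t or its real implementation. The point is just that closing resets every field, so a closed structure cannot be mistaken for a live message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a message structure; NOT libzmq's zmq_msg_t.
struct toy_msg_t {
    void *data;
    std::size_t size;
};

// Allocate a zero-filled payload. Returns 0 on success, -1 on failure.
int toy_msg_init_size(toy_msg_t *msg, std::size_t size) {
    assert(msg != nullptr);
    // Allocate at least one byte so a zero-sized message still owns a buffer.
    msg->data = std::calloc(size ? size : 1, 1);
    if (msg->data == nullptr)
        return -1;
    msg->size = size;
    return 0;
}

// Release the payload, then empty the structure so that no stale pointer
// or size is left behind.
int toy_msg_close(toy_msg_t *msg) {
    assert(msg != nullptr);
    std::free(msg->data);
    msg->data = nullptr;
    msg->size = 0;
    return 0;
}
```

After toy_msg_close returns, both fields are zero, so code that inspects the structure by mistake sees an empty message rather than a dangling pointer.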
[10:02] CIA-14 zeromq2: 03Martin Pales 07master * rf9e6d94 10/ src/poller_base.cpp :
[10:02] CIA-14 zeromq2: zmq::poller_base_t : workaround for sunstudio compiler in add_timer()
[10:02] CIA-14 zeromq2: A minor workaround to resolve compilation error with sunstudio compiler,
[10:02] CIA-14 zeromq2: which does not yet support member templates for std::multimap.
[10:02] CIA-14 zeromq2: Signed-off-by: Martin Pales <> -
[10:15] CIA-14 zeromq2: 03Martin Sustrik 07master * rb7386f5 10/ (4 files):
[10:15] CIA-14 zeromq2: To insert to associative STL containers value_type used instead of make_pair
[10:15] CIA-14 zeromq2: Signed-off-by: Martin Sustrik <> -
[11:06] CIA-14 zeromq2: 03Gonzalo Diethelm 07maint * r26d7669 10/ .gitignore : Added bin directory to ignore list. -
[11:16] CIA-14 jzmq: 03Gonzalo Diethelm 07master * r04603ed 10/ (12 files in 4 dirs):
[11:16] CIA-14 jzmq: All socket options are now 64 bits.
[11:16] CIA-14 jzmq: Enabled some socket options only from version 2.1.0, at compile and run time.
[11:16] CIA-14 jzmq: Added version functions to Java binding.
[11:16] CIA-14 jzmq: Changed several file modes back to 644. -
[12:25] mato sustrik: are you there?
[12:26] sustrik mato: hi
[12:26] mato sustrik: you ignored my comment about writing a test_poll, why?
[12:27] mato sustrik: then you commit the patch and ask "please check if everything works" :-(
[12:27] sustrik i did it by accident
[12:27] sustrik to save myself from embarrassment i announced it afterwards as if it was intended :)
[12:27] mato what was intended?
[12:28] sustrik i've had a git tree with the patch applied
[12:28] mato what patch?
[12:28] sustrik zmq_poll one
[12:28] sustrik then there was another patch on the mailing list
[12:28] sustrik so i've applied it
[12:28] mato huh?
[12:29] sustrik forgetting that there is the zmq_poll patch already applied in that repo
[12:29] mato i thought you wrote the zmq_poll patch for timeouts? it has your name on it
[12:29] sustrik yes, i did
[12:29] mato so? what's this about another patch?
[12:29] sustrik i'm explaining the accident
[12:29] sustrik 1. i have my local repo
[12:29] mato what accident?
[12:30] sustrik how the untested zmq_poll patch got into zeromq/zeromq2
[12:30] sustrik 2. i fix zmq_poll
[12:30] sustrik 3. i commit the fix so that i can do git format-patch
[12:30] sustrik 4. i send the patch to the ML
[12:30] mato oh, i see now
[12:30] sustrik 5. I am happy
[12:31] sustrik 6. another patch arrives on ML
[12:31] sustrik etc.
[12:31] mato ok, two things here
[12:31] mato or, two different approaches to fix the problem
[12:31] mato 1. use local topic branches for your own work, so it's not on your 'master' branch at all
[12:32] mato and/or 2. (the safest approach), use two separate clones of zeromq2 for your work
[12:32] mato in other words, one clone for your maintainer hayt
[12:32] mato *hat
[12:32] mato where all you do is apply patches as they come, merge branches, and push to github
[12:32] sustrik i'm doing 2
[12:32] mato and a separate clone for your contributor hat
[12:33] sustrik but accidents happen
[12:33] sustrik i have to think of my local naming convention to make it safe...
[12:33] mato git clone ...
[12:33] mato mv zeromq2 zeromq2-integration
[12:33] mato git clone ...
[12:34] mato problem solved :-)
[12:34] sustrik something like that
[12:34] mato alternatively, clone your personal zeromq2 repo from the git:// url
[12:34] mato then you can't push to origin at all
[12:34] sustrik up to now i was naming the clones randomly
[12:34] sustrik which produced the accident
[12:34] mato yeah, well, there are many ways to do it...
[12:34] sustrik np, i'll think of something
[12:35] mato ok, another question...
[12:36] mato how much effort would it be to get SO_LINGER implemented?
[12:36] sustrik no idea
[12:36] sustrik you've seen what the shutdown code looks like
[12:36] sustrik one day at most for coding
[12:37] sustrik unspecified time to fix the resulting bugs
[12:41] mato sustrik: well, the problem is we have no clean "abort" path right now
[12:41] sustrik ack
[12:41] mato sustrik: in other words, say a 0MQ-based API sends off some messages
[12:41] sustrik i know what you mean
[12:41] mato sustrik: then some timeout hits, or whatever... I can't get rid of a socket with messages pending on it
[12:42] sustrik you can close it
[12:42] mato yeah, but it'll hang around forever
[12:42] sustrik it will be there but invisible to you
[12:42] mato and term will block
[12:42] sustrik right
[12:42] sustrik right
[12:42] mato plus the socket will reconnect, or whatever...
[12:42] sustrik sure
[12:42] sustrik :)
[12:42] mato so to complete the semantics, some kind of SO_LINGER or at least "get rid of this socket *now*" thingy is required
[12:43] mato otherwise it's not sane...
[12:43] sustrik ack
[12:43] pieterh mato: good news, I just tried git am on a mailbox file made by copy/paste of 'original email' in gmail and it works perfectly
[12:44] mato pieterh_: good for you :-)
[12:44] mato pieterh_: also there's some way to get emails out of gmail via IMAP
[12:45] mato pieterh_: so you could have a process where you move patches to apply into a "Patches to apply" folder
[12:45] mato pieterh_: then have a command line pipe that grabs that folder and shoves it into git-am
[12:45] mato pieterh_: no cut/paste involved...
[12:49] mato sustrik: anyhow, what about that test_poll? i pointed you to brian's tests which should be trivial to port to C++...
[12:49] sustrik sure, go on
[12:49] mato :-)
[12:50] sustrik :o)
[12:50] mato I was kind of hoping since you're mucking with the code that you'll do it
[12:50] mato but I see you obviously don't believe in tests :-)
[12:50] sustrik the problem is that it's not easy to check all the paths in the poll implementation
[12:50] mato start somewhere
[12:50] sustrik you have to generate internal events somehow etc.
[12:51] mato ok, i get it
[12:51] mato since sustrik says "it's too hard" :-)
[12:51] sustrik let's rather start with "what has to be tested"
[12:52] mato did you look at brian's test scripts at all?
[12:52] mato they're pretty good
[12:52] sustrik nope, where can i find them?
[12:52] mato I wrote you an email
[12:52] mato with the URL :-)
[12:52] mato sustrik:
[12:53] sustrik hm, there are no timeouts used there afaics
[12:53] mato sure, but it's a start
[12:55] sustrik ok, i'll give it a try once i have some time free
[12:55] mato sure, if i have time i'll ping you and look at it myself if you've not started on it
[12:55] sustrik ok
[13:01] jason When using socket identity for durable sockets is there a way to query the sending socket to see what messages still haven't been received?
[13:01] pieterh jason__, nope
[13:19] drbobbeaty I'm using ZMQ 2.0.7 and ran into this error message on a ZMQ_SUB socket using the "epgm://" transport (OpenPGM) and wanted to know if anyone had seen this before: The error says:
[13:20] drbobbeaty (process:10408): Pgm-WARNING **: peer expired, tsi
[13:21] drbobbeaty The ZMQ messages didn't stop, but I didn't know what to make of the error.
[13:22] drbobbeaty As a side note, is there any targeted release of ZMQ that will incorporate the new OpenPGM with the better communication between OpenPGM and ZMQ? (Pieter H mentioned it when he was here giving a talk)
[13:40] mikko hmm
[13:40] mikko i was looking at adding ICC builds for zeromq
[13:40] mikko but it looks like it doesn't qualify for intel free tools
[13:40] sustrik mikko: why not?
[13:41] mikko
[13:42] mikko i am not sure if the second last question applies to me
[13:42] mikko or wait
[13:43] mikko does imatix charge for support? does that extend to people outside that organisation?
[13:43] mikko no idea
[13:43] sustrik i recall we had a free icc license once
[13:43] sustrik let me ask intel guys about it
[13:45] sustrik the text there seems to be nonsense
[13:45] sustrik with such restrictions no one would qualify
[13:55] mikko sustrik: true
[13:55] mikko would be nice to add sun studio as well
[13:55] mikko but that is as far as i know a free download
[13:56] sustrik mikko: yes, it would be nice
[13:56] sustrik it's up to you :)
[13:56] mikko i'll make it happen
[13:57] mikko i don't have internet at my new flat yet so might take longer
[13:58] sustrik no haste
[13:59] sustrik btw, how does hudson know when to rebuild?
[14:00] mikko it polls SCM every 15 minutes and builds if there are changes
[14:01] sustrik hm, jzmq was fixed this morning
[14:01] sustrik hudson shows last failure 14hrs ago
[14:01] mikko currently it polls only zeromq
[14:01] mikko the bindings are not being polled
[14:01] mikko i could add polling for individual bindings as well
[14:02] mikko currently it polls zeromq2 master and maint branches and builds if there are changes
[14:02] mikko all bindings are built as dependent projects
[14:03] mikko my initial thinking is that eventually it would do a lot of polling if everything polled
[14:07] mikko logging in allows you to configure / manually kick off builds
[14:07] mikko but can't really open it to everyone as people can execute arbitrary shell commands
[14:07] sustrik mikko: sure
[14:09] sustrik nice, jzmq/maint is now ok
[14:09] mikko i could disable erlzmq/maint build
[14:09] mikko as it will always fail
[14:09] sustrik yes, please
[14:09] sustrik erlzmq doesn't work with maint
[14:10] mikko 0%
[14:10] mikko ermm
[14:10] mikko zeromq perl needs work on both branches
[14:10] sustrik seen it
[14:10] sustrik lestrrat, summon!
[14:11] sustrik hm, it's past midnight in japan
[14:11] sustrik never mind
[14:32] CIA-14 zeromq2: 03Martin Pales 07master * r03a18c2 10/ src/clock.cpp :
[14:32] CIA-14 zeromq2: zmq::clock_t : return correct value in rdtsc() on solaris
[14:32] CIA-14 zeromq2: Function clock_t::rdtsc() now returns correct value when compiled
[14:32] CIA-14 zeromq2: with sunstudio 12 compiler.
[14:32] CIA-14 zeromq2: Signed-off-by: Martin Pales <> -
[15:04] mikko hmm
[15:04] mikko sun studio doesn't seem to do the trick out of the box
[15:17] mikko ah
[15:17] mikko got it
[15:19] mikko
[15:19] mikko this is what i get with sun studio
[15:21] delaney is there any way for an XREP to work with http over tcp? i can get requests but how would i send them back given the request doesn't have a zmq envelope?
[15:22] mikko delaney: not without creating a wrapper
[15:22] cremes delaney: 0mq doesn't handle that... you should look at the mongrel2 project:
[15:23] delaney gotcha, thanks!
[15:28] mikko a lot of errors are eliminated by removing _GNU_SOURCE definition
[15:43] mikko mato: are you there? i got a build related patch / idea
[15:50] mato mikko: yes?
[15:50] mikko
[15:51] mikko you reckon this is OK?
[15:51] mikko it fixes the build on my sun studio installation
[15:51] mikko _GNU_SOURCE seems to define a lot of stuff in headers that's not supported by non-gnu compilers
[15:51] mato hmm
[15:52] mikko im yet to test ICC
[15:52] mato yeah, and i guess sun studio does not try to specifically be GCC-compatible
[15:52] mato ICC when I last looked tried quite a bit harder to be compatible with GCC where possible
[15:52] mikko ill send the patch to mailing-list after ICC tests
[15:52] mato Linux is generally quite forgiving of absence or presence-of feature flags so that should be fine
[15:53] mato yes please, test, etc....
[15:53] mikko
[15:54] mikko sun studio gives a couple of those as well
[15:54] mato hmm, dunno about that one, ask sustrik
[15:57] mikko ICC build fails
[16:00] sustrik hm, the worker routine should have C signature rather than C++ signature
[16:04] mikko sustrik: see src/epoll.cpp line 141
[16:04] mikko hmm
[16:04] mikko nm
[16:05] sustrik int n = epoll_wait (epoll_fd, &ev_buf [0], max_io_events,
[16:05] sustrik timeout ? timeout : -1);
[16:06] mikko yeah
[16:06] mikko trying to figure out why i'm getting a compilation error on that line
[16:06] mikko about signedness
[16:07] mikko epoll_wait takes an int as last param?
[16:08] sustrik yes
[16:08] sustrik it should be explicitly cast to int, yes
[16:09] sustrik otherwise it's uint64_t
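The signedness complaint discussed above comes from the conditional expression: when `timeout` is a uint64_t, `timeout ? timeout : -1` has type uint64_t, so the -1 is converted to a huge unsigned value before being narrowed back to the int that epoll_wait expects. A minimal sketch of one way to make the conversion explicit follows. The helper name to_epoll_timeout is hypothetical, and the clamping is an extra safety choice made here, not something taken from the zeromq2 source.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical helper: maps an unsigned millisecond timeout to the int that
// epoll_wait takes as its last argument. A value of 0 is treated as "block
// indefinitely" (-1), mirroring the `timeout ? timeout : -1` expression
// quoted in the log.
int to_epoll_timeout(std::uint64_t timeout_ms) {
    if (timeout_ms == 0)
        return -1;
    // Clamp instead of silently truncating values that do not fit in an int.
    const std::uint64_t max_ms = static_cast<std::uint64_t>(INT_MAX);
    const int result =
        static_cast<int>(timeout_ms > max_ms ? max_ms : timeout_ms);
    // A non-zero timeout must never turn into "block forever".
    assert(result > 0);
    return result;
}
```

A call site would then read something like `epoll_wait (epoll_fd, &ev_buf [0], max_io_events, to_epoll_timeout (timeout))`, with both branches of the choice already typed as int.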
[16:39] ptrb so it's possible for a ZMQ_SUB to connect() to more than 1 ZMQ_PUB, but can we instead connect() on a ZMQ_PUB to more than one ZMQ_SUB?
[16:44] sustrik sure
[16:54] mikko hopefully the patches came through
[16:58] mikko if not, they are here as well
[16:58] mikko now i gotta run, see you tomorrow (latest)
[17:01] ptrb sustrik: when I did that, the HWM behavior on the ZMQ_PUB wasn't respected
[17:03] sustrik mikko: cyl
[17:03] sustrik ptrb: what have you observed exactly?
[17:04] ptrb I still need to do the "simple complete reproducible example" step, but my experience was ZMQ_PUB with HWM=1 then connect()'ed to one or more ZMQ_SUB sockets (which were bind()ed) had the effect of HWM=0 (unlimited) when I started publishing shit
[17:06] sustrik you mean the memory grew without limit?
[17:07] ptrb correct
[17:10] sustrik ptrb: looks like a bug
[17:12] ptrb ok, let me make something reproducible, and if it reproduces, I'll file.. something.. somewhere
[17:29] sustrik mato: can you approve mikko's patch no. 0001
[17:30] sustrik oops, done, sorry
[18:00] mato sustrik: beer o'clock?
[18:11] delaney is down?
[18:14] mato delaney: yeah, it looks like the ISP has some kind of outage
[18:14] mato delaney: started about 15mins ago
[18:26] delaney i know there is a high water mark per socket, but is there one per id?
[18:26] delaney the api reference made a reference that makes it seem like there is
[18:27] delaney but i couldn't find the way to set it
[18:46] cremes delaney: high and low water marks are per socket; the socket identity doesn't have anything to do with it
[18:47] delaney right but say you have 1000s of messages for a client and they don't come back?
[18:48] delaney is there a way to tell zeromq, you can clear all the messages for 'Lucy'
[19:04] cremes delaney: no
[20:30] delaney cremes: so how would you deal with an environment where a high volume of transient clients may disconnect ungracefully and leave stuff on the queue basically forever?
[20:31] cremes delaney: i should have written a bit more up above than "no" :)
[20:31] cremes if you have a publisher putting out say 100 msgs/s and you have subscribers coming in and out all of the time, the subscribers
[20:31] cremes who disconnect/close their sockets will cause the publisher to drop those messages
[20:32] delaney and in a xreq/xrep setup?
[20:32] cremes so internally you could look at it like each identity has its own queue
[20:32] cremes but that is not exposed to you at all; it all is handled by the library
[20:33] cremes xreq will block when it hits its high water mark
[20:33] cremes it *should* unblock if all subscribers drop their connections but i don't see that specifically documented
[20:34] cremes and if it doesn't, it's probably a bug
[20:34] cremes delaney: does that help?
[20:36] delaney here's the concrete issue, maybe that'll help. writing a game server, clients are out of our control obviously and may disconnect without proper shutdown. we are using XREQ for the client and XREP for the server to allow bi-directional traffic. say we are sending messages to the client for a specific amount of time and if they don't respond with at least a ping they timeout. Now the server has a bunch of messages on its queue that'll never go away. And eve
[20:38] delaney there is nothing in the api doc to say how it drops
[20:38] delaney is it by time?
[20:41] cremes mato or sustrik can give you a definitive answer since they are deep into the source
[20:42] cremes however...
[20:42] cremes for xreq, queued messages should be dropped/deleted as soon as the 0mq socket detects that the other end is *gone*
[20:42] cremes are you seeing it behave differently?
[20:43] delaney yeah, i'm looking at the XREP side right now
[20:44] cremes is your server opening a xrep or xreq socket?
[20:44] delaney xrep
[20:44] delaney so the behavior is Drop
[20:45] delaney i just need to know if its not allowing more stuff to the transport queue (bad) or gets rid of the oldest message (good in my case)
[20:45] cremes the docs are pretty clear on this
[20:45] cremes i'll quote a small piece:
[20:45] cremes Likewise, any messages routed to a non-existent peer or a peer for which the individual high water mark has been reached shall also be dropped.
[20:45] cremes so if the peer disappears, those messages are dropped
[20:46] cremes even the ones that are already queued
[20:46] cremes make sense?
[20:46] delaney right... but how do you set an individual high water mark?
[20:46] delaney that was my initial question :P
[20:46] cremes you don't; it is global for the socket
[20:47] cremes so how many xreq sockets are going to be connected to the server's xrep socket?
[20:47] delaney if its global how do you have an individual one too?
[20:47] delaney hopefully in the 1000s
[20:47] cremes an individual what?
[20:48] cremes the HWM is global for each xrep socket but you can have different HWMs for different xrep sockets
[20:48] cremes is that what you wanted to know?
[20:48] delaney the confusing part is it says in that sentence there is an individual HWM but then you just said there is only a global one
[20:48] cremes each socket has its own HWM
[20:48] delaney OH, so its a global value per connection to the XREP?
[20:48] cremes right
[20:49] cremes XREP-1 can have HWM equal to 100
[20:49] delaney k
[20:49] cremes while XREP-2 has HWM set to 5500
[20:49] delaney but in my case there is only 1 xrep
[20:49] delaney and 1000 xreq connected to it
[20:49] cremes right
[20:50] delaney okay let me make an example real quick
[20:50] cremes sure
[20:51] cremes maybe this will help... let's say you have 3 xreq sockets connecting to your xrep
[20:52] cremes 2 of them are very fast while 1 is very slow, so it queues messages for the slow one
[20:52] cremes if the slow socket's queue hits the HWM, it will drop messages *only* for that one
[20:52] cremes the fast sockets will continue to get messages
[20:52] cremes so internally there is probably a separate message queue for each connected socket
[20:53] cremes the HWM is enforced separately for each connected socket
[20:53] cremes that behavior is pretty specific to xrep sockets
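The per-connection behaviour described above can be modelled with a toy class: one outbound queue per connected peer, each capped by the same socket-wide HWM value. Everything here (toy_router_t, the identity strings, the method names) is hypothetical and only illustrates the semantics discussed in the log and in the quoted doc sentence. It is not libzmq's implementation, and it does not model HWM = 0 (unlimited).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>
#include <string>

// Toy model only: NOT libzmq's implementation.
class toy_router_t {
public:
    explicit toy_router_t(std::size_t hwm) : hwm_(hwm) {}

    // Register a peer under a (hypothetical) identity string.
    void connect_peer(const std::string &id) { queues_[id]; }

    // Forget a peer and discard everything still queued for it.
    void drop_peer(const std::string &id) { queues_.erase(id); }

    // Queue a message for one peer. Returns false if the message was dropped
    // because the peer is unknown or that peer's own queue is at the HWM.
    bool send(const std::string &id, const std::string &msg) {
        assert(hwm_ > 0);  // this sketch does not model HWM = 0 (unlimited)
        auto it = queues_.find(id);
        if (it == queues_.end() || it->second.size() >= hwm_)
            return false;
        it->second.push_back(msg);
        return true;
    }

    // Number of messages currently queued for a peer (0 if unknown).
    std::size_t pending(const std::string &id) const {
        auto it = queues_.find(id);
        return it == queues_.end() ? 0 : it->second.size();
    }

private:
    std::size_t hwm_;
    std::map<std::string, std::deque<std::string>> queues_;
};
```

With an HWM of 2, a third send to a slow peer is dropped while sends to a fast peer still succeed, and dropping the slow peer discards whatever was queued for it. Note that in this model it is the new message that gets dropped once the limit is reached, not the oldest queued one.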
[20:55] delaney AH
[20:55] deri i thought the point of the hwm was just to get the average right so that chances are there will be enough overall headroom to keep things going. anyway, delaney do you think that a kernel parameter, assuming you are using the linux kernel, like inet_peer_maxttl could help here? maybe the offending connections can be clipped away so zeromq can notice it and free individual buffers
[20:56] delaney the wording is confusing... it's not HWM per socket... it's HWM per connection to a socket (since a socket can have multiple connections)
[20:57] cremes delaney: right
[20:57] cremes if you have some wording that would be clearer, you should send in a documentation patch
[20:58] cremes the doc mostly covers the extreme cases... 1) all sockets hit HWM or 2) there are no peers
[20:58] cremes you are concerned with the case in the middle... some sockets are fine but a few hit HWM
[21:04] delaney if thats the case then awesome. yeah the docs actually scared me with the drop on a global queue, which is scary
[21:05] delaney whereas its a global max for each queue.
[21:09] lestrrat sustrik/mikko: I'm running YAPC::Asia today and tomorrow -- and I'm going to be burnt out a few days after that, so won't be doing anything during that timeframe :/
[21:56] sustrik lestrrat: good luck!