Tuesday February 22, 2011

[Time] Name Message
[00:20] cremes on a multi-part send, if any part fails (rc != 0 and errno is set) then 0mq gives my user code ownership of the messages again, yes?
[00:20] cremes in normal circumstances, once you pass a message to zmq_send() it's 0mq's responsibility to call zmq_msg_close()
[00:20] cremes except in the case above, right?
[00:33] mikko cremes: nope
[00:34] mikko cremes: even if you zmq_send message you need to close it
[00:35] cremes mikko: sheesh, i've been doing this wrong for *months* then
[00:36] cremes so, does zmq_send() increment the "copy counter" on the zmq_msg_t and then decrement it (and release) when it's sent?
[00:37] cremes it must otherwise when i call close it would release the message before 0mq has a chance to transmit it
[00:37] cremes pls confirm if you can
[01:37] mikko cremes: sorry, was away building stuff
[01:37] mikko cremes: you can close right after send
[01:38] mikko take a look at zguide for samples
[01:38] mikko also, that might clear it up a bit
[01:40] mikko also, see the page: for zmq_msg_close
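Editor's note: one way "close right after send" can be safe is if send moves the payload out of the caller's message, so that close only releases an empty shell. The toy model below sketches that idea in plain C. Everything named `toy_*` is hypothetical and invented for illustration; this is not 0MQ's code, and the real `zmq_msg_t` and `zmq_send()` may handle ownership differently (for example with reference counts).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy message, standing in for zmq_msg_t. */
typedef struct {
    void *data;
    size_t size;
} toy_msg_t;

/* Hypothetical one-slot queue, standing in for a socket's outgoing pipe.
   A second send would overwrite the first; a real queue would not. */
typedef struct {
    toy_msg_t queued;
} toy_socket_t;

/* Allocate a payload of the given size for the message. */
static int
toy_msg_init_size (toy_msg_t *msg, size_t size)
{
    msg->data = malloc (size);
    if (!msg->data && size > 0)
        return -1;
    msg->size = size;
    return 0;
}

/* Move the payload into the queue and leave the caller's message empty. */
static int
toy_send (toy_socket_t *socket, toy_msg_t *msg)
{
    socket->queued = *msg;
    msg->data = NULL;
    msg->size = 0;
    return 0;
}

/* Release whatever the message still owns; harmless on an empty message. */
static int
toy_msg_close (toy_msg_t *msg)
{
    free (msg->data);
    msg->data = NULL;
    msg->size = 0;
    return 0;
}
```

In this model the caller can call `toy_msg_close()` immediately after `toy_send()` without touching the queued payload, which matches the calling pattern mikko describes.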
[02:09] jugg whoa! Since when did we get versioned api on the web? Nice! :)
[02:10] jugg oh, different domain. interesting.
[02:11] mikko jugg: been there a few days
[02:12] mikko and finally!
[02:12] mikko centos rpms available
[02:12] mikko time to sleep
[10:40] pieterh good morning
[10:42] ianbarber morning
[10:48] pieterh ianbarber: what part of the world are you in?
[10:48] pieterh London? you seem to get around a lot for your "0MQ is the answer" talks :-)
[10:56] ianbarber yep, london
[10:56] ianbarber i was hoping to get to give it at confoo as well, but they went with a different talk in the end :)
[10:59] pieterh I was thinking of doing a small 0MQ event in Brussels later in spring
[11:00] pieterh April or May, when it's nicer weather
[11:03] ianbarber awesome, i think that would be fun
[11:05] ianbarber are you based in brussels then, or near by?
[11:07] pieterh I'm in Brussels, yes, so it's easy for us to organize something here
[11:08] pieterh There's a nice place in the center of town I used to hold conferences in
[11:09] pieterh Brussels is reasonably central IMO, and of course there's the beer...
[11:09] ianbarber surely there are some unused government buildings available? :)
[11:10] ianbarber yeah, brussels is a really nice place, and easy to get to on the train from everywhere as well for people that aren't keen on flying.
[11:10] pieterh you mean because one of our 7 governments is currently on extended holiday?
[11:10] pieterh lol
[11:10] pieterh ok, I'll set it up... excellent...
[11:11] pieterh I'm thinking, mix of workshops and project presentations
[11:12] pieterh people can go home in the evening, or stay overnight and socialise
[11:13] ianbarber i think that's sensible, though I would probably aim for a panel or discussion slot or two, just to give it a less structured feel - I would imagine that the crowd will all be pretty good with the library so the chat will be as good as the talks
[11:17] pieterh So the idea is a lot of tables, chairs, refreshments, in a large room
[11:18] pieterh wifi
[11:19] ianbarber sounds good
[11:29] ianbarber btw, the clone example in the new guide chapter is excellent.
[11:31] pieterh ah, glad you like it
[11:32] pieterh do you think it makes sense to do all the design discussion first, and then the code later?
[11:32] pieterh these examples are going to be a lot larger than the earlier ones
[11:34] ianbarber yeah, i think that's tricky whenever you get to a more real world example - i mean if the reader had been paying attention then they should pretty much get how to build it by that point as all of the blocks have been covered. I think it is going to be a big block of code to cover the whole client and server case, but I'm not sure there's that much that should split it up
[11:36] pieterh I mean, after you read the Clone discussion, do you want to see it worked out in code, or do you want to continue to the Harmony discussion?
[11:36] pieterh assuming that code is 20 pages long...
[11:36] pieterh (not one code block, but developed piece by piece)
[11:37] pieterh for me the only way to prove the design is running code
[11:37] ianbarber yeah, i would definitely prefer to see some code
[11:38] pieterh ok, then I'll switch back to the earlier structure... it'll be like the last Worked example in Ch3
[11:38] ianbarber yeah, i think that was a reasonable model
[11:39] ianbarber would it be worth maybe having these examples be in python or another scripting language, just to trim the size on-page some?
[11:41] pieterh that could be good, yes
[11:41] pieterh it solves one problem I have with C, the lack of containers
[11:41] pieterh I was thinking of using ZFL for these advanced cases
[11:41] pieterh but Python would be neater
[11:42] pieterh However... I still need to write them in C :-)
[11:42] pieterh For completeness' sake
[11:43] pieterh let's continue in C, which is the language of the API
[11:43] pieterh but we can produce versions of the Guide for every single language
[11:43] pieterh you want the Guide in Python? Not a problem!
[11:44] pieterh (any examples not translated will default to C then)
[11:45] pieterh since the source for examples is merged into the text at build time anyhow
[11:46] ianbarber that would be quite cool
[11:47] pieterh ok, it's a deal
[11:47] ianbarber on that note - is there a process for saying when new examples are ready for translation?
[11:48] ianbarber maybe just a mailing list ping that the code is there before the guide goes live
[11:48] pieterh hmm, I guess it involves tracking the git
[11:48] pieterh I'd rather not get into a release process for the guide
[11:49] pieterh there are only a couple of languages that people have translated systematically
[11:49] pieterh like PHP :-)
[11:49] stimpie I understood the goal of zeromq is becoming a kernel module, I have just read the interesting new part of the manual 'clone pattern' but I wonder how this adds to a kernel level system?
[11:49] pieterh stimpie: it's a layer on top
[11:50] ianbarber pieterh: fair point :)
[11:50] pieterh ianbarber: yesterday I updated the C examples and text for 2.1...
[11:51] pieterh I'm not sure the PHP binding handles 2.1 even...
[11:51] ianbarber yeah, i was just making a note I should check the PHP ones
[11:51] ianbarber it's up to date as far as I know, I've been mostly using it against 2.1.0
[11:51] stimpie pieterh, ok clear enough.
[11:51] pieterh nice!
[11:51] pieterh hopefully these shifts will become rarer and rarer
[11:52] pieterh stimpie: there are lots of reusable patterns we can make on top of 0MQ, Ch4 is covering some of the reliability ones
[11:52] pieterh I think giving them names makes them easier to understand and reuse
[11:54] stimpie They are interesting patterns but it confuses me with what the scope of the zeromq project is
[11:55] private_meta hmm... I've been told that the c++ version of zmq uses void pointers because you can send something OTHER than char pointers as well, yet for the python version you can pretty much send a standard string in the send function. Doesn't that mean that this gives c++ functionality you can't capture with a similar python implementation?
[11:57] stimpie private_meta, the message content is up to the client. You could also serialize java objects which are pretty useless in a c++ client.
[11:58] private_meta Yeah, but it looks to me that for the python interface, in case I don't misunderstand it, the message content is narrowed down to strings
[11:59] ianbarber private_meta: underneath it's just a chunk of bytes - I would imagine that there is a pack function or similar that can pack anything into a string for python?
[12:00] private_meta I don't know, it's just that the send interface for python looks so convenient while with c++ you have to bitch around with zmq::message_t where you have to use memcpy or message_t.rebuild to get a simple string into a message
[12:02] ianbarber hmm, for C there's the little zhelpers script pieterh uses that provides some handy helper functions to hide some of that stuff, i don't know if there's a C++ equivalent
[12:03] pieterh stimpie: I'll make this clear in the text, thanks for pointing that out
[12:04] pieterh ianbarber: yes, some kind person made zhelpers.hpp
[12:05] pieterh private_meta: feel free to translate the zmsg code from C to C++, it'll give you what you want
[12:06] private_meta pieterh: uhm... it doesn't exist yte?
[12:06] private_meta *yet?
[12:06] pieterh private_meta: the feeling of pride and accomplishment you'll feel as you make it... will be better than steak salad and fries
[12:06] private_meta Well, not that hard, I don't quite like steak
[12:07] pieterh even better then... :-)
[12:07] pieterh general rule with 0MQ is, if there's something you think could work better, make it happen
[12:07] pieterh you can take the C code and wrap it as C++ very easily IMO
[12:07] private_meta I wouldn't know how
[12:08] pieterh you are working in what language?
[12:08] private_meta C++
[12:08] pieterh did you read the zmsg.c code yet?
[12:10] private_meta nope
[12:10] private_meta Well, isn't there a zmsg.cpp already?
[12:12] pieterh private_meta: please read both those files, then come back...
[12:13] pieterh you have three choices, when it comes to getting functionality in 0MQ (or any software)
[12:13] pieterh 1. pay for it and get it soon
[12:14] private_meta 2. don't pay and wait
[12:14] pieterh 2. wait until someone else makes it and shares it, then get it for free
[12:14] private_meta 3. do it yourself
[12:14] pieterh exactly
[12:14] pieterh :-)
[12:14] pieterh in this case I've made it really, really easy for you...
[12:14] private_meta seen an interesting venn diagram for that
[12:14] pieterh since you can literally take the C code (already designed as a class), wrap it with (I'd guess 20 lines of C++ code)
[12:14] pieterh and get what you need
[12:14] stimpie the file: examples/Java/ contains some garbage at the end
[12:15] pieterh stimpie: indeed, it seems chopped off...
[12:16] pieterh stimpie: ah, I see what you mean
[12:17] pieterh ok, fixing that, I found the original contribution
[12:17] pieterh stimpie: fixed, thanks!
[12:17] stimpie np
[12:26] private_meta just out of curiosity, don't have time at the moment, let's say i want to improve zmq by adding convenience methods for send/recv for c++, let's say for strings, would that be worth considering it?
[12:28] stimpie private_meta, what is wrong with send(msg.toBytes())?
[12:28] pieterh private_meta: it would not go into the core
[12:28] pieterh private_meta: if all you want is send/recv string, it's already in zhelpers.hpp
[12:29] pieterh // Convert string to 0MQ string and send to socket
[12:29] pieterh static bool
[12:29] pieterh s_send (zmq::socket_t & socket, const std::string & string) {
[12:29] pieterh zmq::message_t message(string.size());
[12:29] pieterh memcpy(message.data(), string.data(), string.size());
[12:29] pieterh bool rc = socket.send(message);
[12:29] pieterh return (rc);
[12:29] pieterh }
[12:30] private_meta oh, I've already seen that. By the way, why do you use memcpy there and not rebuild?
[12:38] private_meta argh... the disabled copy constructor is annoying, I can't even return a message_t object from a function
[13:08] pieterh private_meta: that code was written by Olivier Chamoux afair
[13:09] private_meta The example code carries that name as well
[13:10] private_meta According to a comment in the source it's to avoid "shared messages", which seems to be a somewhat valid argument if you tried to use it for that, but it's annoying if you want to return a message by a function
[13:11] pieterh sure
[15:24] CIA-21 zeromq2: 03Martin Sustrik 07master * r43e8868 10/ (24 files):
[15:24] CIA-21 zeromq2: Added explicit error message in case of memory exhaustion
[15:24] CIA-21 zeromq2: Signed-off-by: Martin Sustrik <> -
[16:00] amacleod Are there any plans to have language-native ZMQ libraries, rather than wrapping C++ libs?
[16:01] amacleod I guess maintaining parallel language-native libs would be a much bigger maintenance load.
[16:08] pieterh amacleod: it's a lot of work unless there's a real payoff, e.g. languages that can't link to C++
[16:09] pieterh you would not reach a similar level of performance and functionality
[16:09] pieterh it's been discussed, we would need to document the wire level protocols properly first
[16:10] amacleod pieterh, hm, yeah, good point. And I guess the hassle of linking, for example, JNI libraries in Java, is pretty much a one-time configuration thing.
[16:10] pieterh plus you always get the latest/greatest 0MQ, etc.
[16:10] amacleod pieterh, well, documenting the wire level protocols sounds like a good idea anyway. :-D
[16:10] pieterh yes, when someone actually wants it... :-)
[16:11] amacleod pieterh, by the way, where can I look at the new router/dealer example you made?
[16:11] pieterh amacleod: hang on, I'll rebuild the Guide...
[16:11] amacleod Thanks.
[16:16] pieterh hmm, Wikidot seems to cache the old text for a while...
[16:17] pieterh oops, error in my upload robot, it's sending the content to the wrong place...
[16:17] amacleod Silly robot.
[16:18] pieterh ok, here:
[16:19] ljackson pieterh, got that code working last night, thx for your help
[16:19] pieterh ljackson: np
[16:19] ljackson pieterh, silly mistake of not de-ref on the work/clients sockets before sending to the queue device
[16:19] ljackson odd that the api took that tho
[16:20] pieterh ljackson: what language binding?
[16:20] ljackson who maintains the c++ api for zeromq ? Maybe I could extend to accept socket pointers and ask for a pull request ?
[16:20] pieterh ah, yes, I'm sure the maintainers will welcome contributions
[16:20] amacleod "Worked Example: Inter-Broker Routing", right?
[16:20] pieterh amacleod: "Asynchronous Client-Server"...
[16:20] pieterh reload that page
[16:21] amacleod Ah, ok.
[16:21] amacleod Looks like it renders as #toc50 for me.
[16:22] pieterh caching issue perhaps
[16:22] amacleod Yeah.. I think I'm still getting the old version.
[16:22] pieterh do you see figure 46 "Asynchronous Client Server" ?
[16:22] amacleod Yeah.. I see the diagram.
[16:22] pieterh ok, then that's all good
[16:23] amacleod The next code sample is router-to-router, though...
[16:24] pieterh does it not show the asyncsrv example?
[16:24] pieterh ah, diagrams are accurate but text is out of date...
[16:24] pieterh reload, reload, reload!
[16:24] amacleod Nope. It's a little incongruous, actually... :)
[16:24] amacleod aha.. there it is!
[16:25] pieterh enjoy, amacleod, and let me know if it's helpful
[16:25] pieterh it was quite fun making this pattern
[16:25] amacleod Sure thing. :)
[16:27] amacleod Could the client_task use blocking recv rather than polling, or is the polling crucial?
[16:28] pieterh amacleod: if you want to send a mix of requests and replies, it can't block on recv
[16:28] pieterh you can't have a separate thread doing the receiving
[16:29] pieterh it can use a simpler poll than the one I made, that's to ensure requests are sent on time
[16:30] pieterh brb, lunch...
[16:34] CIA-21 zeromq2: 03Martin Sustrik 07sub-forward * r977f5b7 10/ (5 files):
[16:34] CIA-21 zeromq2: Trie-based matcher (ptrie_t) implemented.
[16:34] CIA-21 zeromq2: Signed-off-by: Martin Sustrik <> -
[17:31] cremes is there any technique for detecting a slow subscriber? e.g. check queue sizes on netstat or something?
[17:32] cremes i have a pub socket that has a dozen or so subscribers; my memory usage slowly climbs even though i don't have any leaks
[17:32] cremes i now suspect a slow subscriber isn't pulling stuff off the queue quickly enough and it's backing up at the publisher (HWM is default)
[17:44] pieterh cremes: there's zmq_getsockopt (..ZMQ_BACKLOG)
[17:46] cremes is that really appropriate? that just controls the queue of initial connects/binds
[17:46] cremes it doesn't have anything to do with message queue length, right?
[17:46] pieterh oops
[17:46] cremes :)
[17:47] pieterh so there's no way to know what's happening at the level of individual subscribers...
[17:47] pieterh do you number your messages?
[17:47] cremes no but they do get a timestamp
[17:47] nooob you might want to setup a connection from the subscriber back to the sender
[17:48] cremes so they are sequential
[17:48] pieterh ok, even better
[17:48] pieterh in the subscriber, check how old incoming messages are
[17:48] pieterh if you exceed X seconds, send an alert to your system console
[17:48] nooob there was a pattern like that in the guide
[17:48] cremes hmmm, that doesn't seem like it would help unless i misunderstand how the pub socket queueing works
[17:48] cremes there is a separate outgoing queue for each subscriber on a pub socket, yes?
[17:49] pieterh cremes: if your pubsub system is stable, subscribers will get messages with predictably low delays
[17:49] cremes so fast subscribers would have a small or empty queue while my slow guy would have a large queue
[17:49] pieterh it's running over TCP?
[17:49] cremes yes
[17:49] pieterh even over PGM...
[17:49] pieterh slow subscribers will by definition :-) get messages 'too slowly'
[17:50] pieterh timestamp checking should do it
[17:50] cremes i will check that out
[17:50] pieterh i like the pattern, will try a quick implementation
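Editor's note: the timestamp check pieterh suggests can be sketched in plain C as below. This is a minimal illustration, not code from the Guide. The helper name `s_message_is_late` is hypothetical, and it assumes each message carries a send time in milliseconds and that publisher and subscriber clocks are reasonably synchronized.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the 0MQ API.
   Returns true when a message stamped at sent_msec is seen at now_msec
   more than max_age_msec later, i.e. the subscriber is falling behind. */
static bool
s_message_is_late (int64_t sent_msec, int64_t now_msec, int64_t max_age_msec)
{
    /* A negative age means the two clocks disagree; with a non-negative
       limit such a message is never reported as late. */
    int64_t age = now_msec - sent_msec;
    return age > max_age_msec;
}
```

A subscriber would call this for each incoming message and raise an alert when it returns true, which is the "if you exceed X seconds" step from the discussion.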
[18:12] cremes pieterh: question...
[18:12] pieterh cremes: shoot...
[18:12] cremes i wrote an example where i have a single publisher that connects to a forwarder device
[18:12] cremes it publishes as fast as it can
[18:12] cremes there are no subscribers connected to the device
[18:12] cremes i see memory growing rapidly; is that expected?
[18:12] pieterh depends...
[18:13] pieterh running on the same box?
[18:13] cremes yeah
[18:13] pieterh one core?
[18:13] cremes dual quad, 16gb memory... beefy box
[18:13] pieterh no, then it's not expected
[18:13] cremes ok
[18:13] pieterh is the memory growing in the publisher or in the forwarder?
[18:13] cremes i'm going to try and replicate this with the C forwarder
[18:13] cremes it grows in the forwarder
[18:14] pieterh how about CPU usage?
[18:14] cremes high
[18:14] pieterh ok, try this
[18:14] pieterh - publish 20M messages, then pause for 10 seconds
[18:14] pieterh - repeat
[18:14] cremes ok
[18:14] pieterh see if memory usage remains high during that pause
[18:14] pieterh if so, forwarder is broken somehow
[18:15] pieterh if it comes down, it's just queuing bizarreness
[18:28] sustrik with a single incoming stream and many outgoing streams you would expect the latter to be bandwidth-bound and thus slower than the former
[18:28] sustrik consequently, in congestion situations you should expect messages queueing in the forwarder
[18:38] pieterh sustrik: yes, but here there are no subscribers on the forwarder...
[18:39] sustrik ah
[18:40] sustrik i've missed that
[18:41] sustrik then the queue's main loop must be slower than message receiving in its SUB socket
[18:41] sustrik in any case, when doing congestion tests
[18:41] sustrik use hwm
[18:41] sustrik otherwise you are inevitably going to run out of memory
[18:52] pieterh hmm, forwarder should run at least as fast as publisher in this case...
[18:52] pieterh let's see if cremes comes back with more data
[18:52] cremes when there are no subscribers attached to the forwarder, calls to zmq_send() should just close the msg and drop it, right?
[18:53] pieterh cremes: ack
[18:53] pieterh did you try that run/pause/run/pause ?
[18:55] cremes yes, the memory did not shrink
[18:56] pieterh did _not_ shrink?
[18:56] pieterh then it's a real leak
[18:56] cremes i need to try this with the C forwarder device that comes with the lib
[18:57] cremes and see if it behaves the same
[18:57] pieterh yes
[18:57] cremes i'll open a ticket if i see it replicated
[18:57] pieterh i just reviewed the code for that, there is zero chance it leaks memory
[18:57] cremes right, it's so simple there is *no* way
[18:57] cremes <sigh>
[18:57] pieterh if there's a leak it's either in the pub socket (unlikely), or the binding (possible), or it's a heap artifact (plausible)
[18:58] pieterh sometimes the heap does not shrink immediately when memory is freed
[18:58] pieterh try setting a HWM of, say, 100K on the publisher and see what effect that has
[18:58] pieterh s/publisher/forwarder/ sorry
[18:59] pieterh on the frontend socket, initially
[18:59] cremes yes, i'll keep at it
[19:05] amacleod pieterh, my problem from yesterday does seem to be a threading issue. When I changed my test harness (which simulates a client) to use a separate context from the server, the messages got through correctly in both directions.
[19:06] amacleod However, now I'm seeing "Assertion failed: pending_term_acks" when closing the socket.
[19:06] pieterh amacleod: ah, good... I've also been using separate contexts for each 'task' when it simulates a separate process
[19:06] pieterh that's not a 0MQ assertion...
[19:07] amacleod It lists it as socket_base.cpp:690
[19:07] amacleod It's hard for me to debug, because it doesn't generate a Java exception, it just kills the process.
[19:07] pieterh what version of 0MQ are you on?
[19:07] amacleod Might be jzmq, hmm..
[19:07] amacleod 2.0.10
[19:08] pieterh any problem upgrading to master?
[19:08] pieterh there are a lot of fixes since 2.0.10
[19:09] pieterh sustrik: do you keep any log of major changes made apart from the git commit history?
[19:09] amacleod Depends--how stable is 2.1? I think we chose 2.0.10 because we wanted not to use the "development" branch.
[19:10] pieterh amacleod: for various reasons, the git master is significantly more stable than the 'stable'
[19:10] pieterh ... than the 'stable' 2.0.10 release
[19:10] amacleod Hmm. :) Might be worth the switch, then.
[19:11] sustrik btw, the assertion is in code that was completely rewritten in 2.1
[19:11] amacleod sustrik, good to know.
[19:11] pieterh we're in the slow process of making a formal release for 2.1.11
[19:11] pieterh sustrik: the one thing that will be problematic is making release notes
[19:11] sustrik pieterh: what log?
[19:11] sustrik why so?
[19:11] pieterh i was afraid of that...
[19:11] sustrik it's automatic
[19:12] pieterh what's automatic is a dump of every commit message
[19:12] sustrik yup
[19:12] pieterh that is not release notes
[19:12] pieterh nope, NEWS is painfully made by hand
[19:12] sustrik ah, i'll go through the commit messages and write the release notes
[19:12] sustrik not a problem
[19:12] pieterh excellent...!
[19:12] pieterh then IMO we're ready to break off the branch...
[19:13] pieterh there were zero issues porting the Guide examples to 2.1.11
[19:13] sustrik there are some pgm problems being experienced with head currently
[19:13] pieterh that's ok, we'll have at least a couple of weeks to stabilize
[19:13] amacleod In the mean time, if we assume I cannot presently upgrade from 2.0.10, any suggestion on where I should look to prevent this assertion from failing?
[19:13] pieterh the key now IMO is to get a formal package out so folks like amacleod use the current master, not old code
[19:14] pieterh sustrik_, so I'm going to create a separate git but this is somewhat experimental
[19:14] amacleod As far as I know, jzmq is set up to handle both 2.0.10 and 2.1.x, so upgrading shouldn't be too painful.
[19:15] sustrik the only thing preventing branching off right now is the pgm problem
[19:15] pieterh amacleod: at least, try on the 2.1.x master so you know whether it works better or not
[19:15] sustrik i can't help with that :|
[19:15] sustrik so we'll have to wait while someone fixes it
[19:15] pieterh sustrik_ steve's traveling right now
[19:16] pieterh we'll pipeline it all
[19:16] sustrik or, alternatively, you can branch from a historic version
[19:16] pieterh pgm fixes can come in after we branch
[19:16] pieterh it's good to have known issues so we can prove that the process works
[19:16] sustrik right, you can branch a backport the ix
[19:16] sustrik fix
[19:16] pieterh yes
[19:17] pieterh do you want to push any code before I clone the repo?
[19:17] sustrik no
[19:17] sustrik do it now
[19:17] pieterh okay... going for it :-)
[19:17] sustrik amacleod: upgrading should not be painful
[19:17] sustrik trying to fix the problem in 2.0.10 is going to be painful
[19:18] amacleod Hm, I think you are right.
[19:18] sustrik it's a problem with shutdown subsystem
[19:18] sustrik which was pretty creaky in 2.0.x
[19:18] sustrik it was one of the major reasons for making 2.1
[19:18] amacleod So, which version should I get? The 2.1.0 package from the front page?
[19:19] sustrik yes
[19:24] pieterh hmm, anyone know how to clone a github repository and _keep_ it in the same organization?
[19:24] pieterh I'm sure I'm missing something obvious...
[19:33] pieterh hmm, git push -u, obviously... duh
[19:40] pieterh sustrik_: okaaay, I think that's done... we now have
[19:40] pieterh just the master branch and no tracking between the two gits, I hope
[19:43] pieterh sustrik_: next step is to make release notes (you) and then packages (me)
[19:49] sustrik ok
[19:49] sustrik let me see
[19:53] pieterh I suggest we edit the NEWS together at, then commit to (the real) master
[19:54] sustrik yes
[19:54] sustrik wait a sec
[19:56] pieterh I'd like to make 2-3 release candidates over 2-3 weeks
[19:57] pieterh I think we have enough momentum to get rapid feedback on releases
[19:58] pieterh the one problem I see with this approach is we don't get a branch for the stable release, automatically, in the real git
[19:59] pieterh s/branch/tag/
[19:59] pieterh mato's going to cut my throat...
[20:05] sustrik what's the problem?
[20:05] sustrik it's DCVS
[20:05] sustrik so it shouldn't matter whether it's a branch or a separate repo
[20:05] sustrik DVCS*
[20:05] sustrik pieterh: still there
[20:05] pieterh ok, allow for the fact that any gross git manipulations leave me nervous
[20:05] sustrik ?
[20:06] pieterh I find the tool 10x too complex and dangerous, so...
[20:06] pieterh I'd much prefer to work with a copy of the repository (not a clone or a fork)
[20:06] pieterh advantages: anyone can imitate this, make releases, safely
[20:06] sustrik you mean you've created a new repo?
[20:06] pieterh yes
[20:07] sustrik i.e. deleted the entire history?
[20:07] pieterh I was able to copy the master branch
[20:07] pieterh which is fine for my purposes (stabilization)
[20:07] sustrik github seems to be dead
[20:07] pieterh it wasn't me!!!!!
[20:07] sustrik :)))
[20:08] pieterh i'd *much* prefer to work with readonly access to the real repository
[20:08] sustrik let's move to piratepad now
[20:08] pieterh but problem is, separate repository breaks the neat history of version tags in the real repo
[20:09] pieterh piratepad seems unreliable right now...
[20:09] sustrik yuck
[20:09] cremes ok, i can reproduce the memory "leak"; it is somewhat complicated and it's not a real leak... it's more like a DDOS
[20:09] sustrik anyway, there are just 2 items
[20:09] sustrik ZMQ_RECONNECT_IVL_MAX
[20:09] sustrik and
[20:09] sustrik ZMQ_RECOVERY_IVL_MSEC
[20:09] sustrik + the bug fixes
[20:10] sustrik you can get description for both from zmq_setsockopt(3)
[20:10] pieterh indeed
[20:11] pieterh I'll do that, np
[20:11] sustrik it's easy this time
[20:11] pieterh "An automatic restart of Piratepad will occur in -31945 seconds."
[20:11] pieterh good god
[20:11] sustrik they are pirates, you know
[20:11] pieterh We killed Piratepad...
[20:11] sustrik not exactly precise
[20:11] pieterh Well, Google bought etherpad and then shut it down...
[20:12] sustrik anyway
[20:12] sustrik anything else missing for making the stable branch?
[20:13] pieterh not for me
[20:13] sustrik good
[20:13] pieterh but did you understand my concern?
[20:13] sustrik which one?
[20:13] pieterh we're breaking the master/maint process
[20:14] pieterh smashing it into little pieces
[20:14] sustrik yes
[20:14] sustrik the problem is the maint was not really well maintained anyway
[20:14] pieterh since I personally dislike that process and find it complex and bizarre, I'm happy with that
[20:15] pieterh I'd like a process that takes 30 seconds to fully understand
[20:15] pieterh on a cold monday morning before coffee
[20:15] sustrik i think it's the same now
[20:16] sustrik patches being passed between branches
[20:16] sustrik ah, one point
[20:16] sustrik have a look how version numbers are to be maintained
[20:16] sustrik it's important not to mess that up
[20:17] pieterh I still don't remember how changes flow from maint to master and vice-versa
[20:17] pieterh so if this works, what we'll get are a series of stand-alone gits, zeromq2-1, zeromq2-2, zeromq3-0, each with their own maintainer(s)
[20:17] pieterh and pull requests with patches between them
[20:17] pieterh each being copied off the real repo as that heads towards stability
[20:18] sustrik it works both ways
[20:18] pieterh the version numbering works properly IMO
[20:18] amacleod :-( Now jzmq tests are hung forever in Context.finalize
[20:18] pieterh next releases will be 2.1.1, 2.1.2, 2.1.3...
[20:18] sustrik 1. backporting patches from master to maint(s)
[20:18] sustrik 2. upstreaming patches from maint(s) to master
[20:19] pieterh I'd assume 2.1.3 will be the stable one if people find bugs
[20:19] pieterh yes
[20:19] pieterh I'd veto that
[20:19] pieterh upstreaming patches from temporary clones of maint(s) to master
[20:19] pieterh at least in my view... for now...
[20:19] pieterh anyhow, yes, if it makes sense
[20:19] pieterh DCVS as you said
[20:20] pieterh each repo is largely independent and we work by protocol
[20:20] sustrik well, if you bugfix a patch for maint, you want to upstream it
[20:20] sustrik only thing it means is sending it to the ML
[20:20] sustrik no big deal
[20:21] pieterh if the bug is in the current master, I'd first want to get it fixed there
[20:21] pieterh because until it's gone past that filter, there's no guarantee it's sane
[20:21] pieterh if the bug is in old code, then there's no upstreaming anyhow
[20:21] sustrik it's up to you
[20:22] sustrik so you're going to reject patches to maint
[20:22] sustrik right?
[20:22] pieterh presumably, yes
[20:22] sustrik ok
[20:22] pieterh or at least treat them with scepticism
[20:22] pieterh the only person I trust for my patches is you
[20:22] sustrik ok, now for the versioning
[20:22] pieterh especially, especially on a stable release people rely on for production
[20:22] pieterh versioning... ok
[20:23] sustrik the process looks like this:
[20:23] pieterh s/you/the owner of the code in question/
[20:23] sustrik when about to make a release:
[20:23] sustrik 1. update the version numbers in zmq.h
[20:23] sustrik 2. make a release
[20:23] sustrik 3. bump the version number
[20:24] pieterh right,
[20:24] sustrik yes
[20:24] sustrik exactly the same process
[20:24] pieterh this would happen on the release git only
[20:24] sustrik are you saying you are not going to version the maint branch?
[20:25] sustrik that's nonsense
[20:25] pieterh as far as I'm concerned, there is no maint branch
[20:25] sustrik maint repo
[20:25] sustrik no versions?
[20:25] pieterh ok, the terms are vital here
[20:25] pieterh let's call it the release git
[20:25] pieterh as compared to the master git
[20:25] sustrik whatever
[20:25] sustrik what about versions?
[20:25] pieterh I'm going to tag the release git properly
[20:26] pieterh and version it properly, exactly as now
[20:26] sustrik change zmq.h as well
[20:26] pieterh exactly the same, but it all happens on my master branch
[20:26] sustrik otherwise users won't be able to find out what the version is
[20:26] sustrik (zmq_version(), version macros
[20:26] pieterh yes, all that is unambiguous
[20:27] pieterh the versioning is sane, obvious, necessary
[20:27] sustrik right, 3 steps
[20:27] sustrik tag
[20:27] sustrik release
[20:27] sustrik change version
[20:27] pieterh yes
[20:27] sustrik that's it
[20:27] sustrik ok
[20:28] pieterh and the final step is update the version in the development master
[20:28] sustrik so, i'm going to bump master to 2.2
[20:28] pieterh so it goes from 2.1.1 to 2.2.0
[20:28] sustrik yes, i'll do that now
[20:28] pieterh very excellent...!
[20:28] pieterh ok, I'm going to make a release now
[20:28] pieterh no time like the present
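The version numbers discussed above are what users see through zmq_version() and the version macros in zmq.h. A minimal sketch of how such macros can be packed and compared follows. The DEMO_ names are hypothetical stand-ins so the snippet compiles without libzmq; they are modelled on the idea of major/minor/patch macros and are not copied from zmq.h. The values 2.1.1 are taken from the release being discussed.

```c
/* Hypothetical stand-ins for the version macros in zmq.h; the DEMO_ prefix
   marks them as illustrative, not the library's own definitions. */
#define DEMO_VERSION_MAJOR 2
#define DEMO_VERSION_MINOR 1
#define DEMO_VERSION_PATCH 1

/* Pack major.minor.patch into one integer so versions compare with >=. */
#define DEMO_MAKE_VERSION(major, minor, patch) \
    ((major) * 10000 + (minor) * 100 + (patch))

#define DEMO_VERSION \
    DEMO_MAKE_VERSION(DEMO_VERSION_MAJOR, DEMO_VERSION_MINOR, DEMO_VERSION_PATCH)

/* Runtime counterpart, in the spirit of zmq_version(): reports the version
   the code was built with through three out-parameters. */
static void demo_version(int *major, int *minor, int *patch)
{
    *major = DEMO_VERSION_MAJOR;
    *minor = DEMO_VERSION_MINOR;
    *patch = DEMO_VERSION_PATCH;
}

/* Returns 1 when the compiled-in version is at least major.minor.patch. */
static int demo_version_at_least(int major, int minor, int patch)
{
    return DEMO_VERSION >= DEMO_MAKE_VERSION(major, minor, patch);
}
```

With the values above, demo_version_at_least(2, 1, 0) is true and demo_version_at_least(2, 2, 0) is false. Bumping the development master to 2.2.0 would amount to changing the three DEMO_VERSION_* values right after the release is tagged.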
[20:29] sustrik cremes: sorry for the delay
[20:29] sustrik have you found out what's the problem?
[20:29] pieterh amacleod: sounds like there's a socket left open
[20:29] pieterh amacleod: sorry also for the delay
[20:30] amacleod Terminating a context will not finish if a socket is left open?
[20:31] pieterh amacleod: not in every case, but in some cases
[20:31] pieterh this is new behavior in 2.1
[20:31] pieterh a side-effect of zmq_term's determination to flush sockets safely
[20:32] pieterh if the socket is held by the same thread that calls zmq_term, it deadlocks (or something)
[20:32] amacleod Hmm, yeah. It did look as though the Java finalizer was waiting on some things.
[20:36] pieterh sustrik: s/untill/until/ in zmq_setsockopt.txt...
[20:37] sustrik would you fix it in maint and upstream it to master or should i fix it in master and you'll backport it to the maint :)
[20:38] pieterh For a one-word fix... let me clone the git and send you a pull request with regression test case
[20:38] pieterh yeah :-) process!
[20:40] sustrik i love it :)
[20:41] pieterh what do you call sys://log? do we want to document that briefly?
[20:41] pieterh is it an internal transport?
[20:41] cremes sustrik: yes, i've discovered a problem; i am just finishing a second test of my hypothesis.... i'll share it in about 10m
[20:41] sustrik good god, everything is shutting down
[20:41] pieterh someone dropped the Internet!
[20:42] sustrik i suspect i'm in Libya
[20:42] pieterh I knew they should have left it safe with the Elders of the Internet at the top of Big Ben
[20:43] pieterh sustrik: funny, a friend of mine was supposed to be doing a rally in Libya just about now...
[20:43] sustrik let's hope he's not there really
[20:43] pieterh indeed, I asked him like two weeks ago, "you going before or after the revolution?", he lol'd
[20:44] fbarriga hi everyone
[20:44] fbarriga I have a quick question
[20:45] sustrik yuck, github is totally braindead
[20:45] sustrik repos as well as the site
[20:45] pieterh fbarriga: hi, what's up?
[20:45] fbarriga hi pieterh , playing with the socket
[20:46] fbarriga this probably deserves an RTFM reply..
[20:46] fbarriga if I do this: zmq_msg_init(&msg); zmq_recv(sock, &msg, 0); zmq_recv(sock, &msg, 0); zmq_msg_close(&msg);
[20:46] fbarriga am i leaking memory?
[20:46] cremes sustrik: apparently i can reproduce a memory ddos with 0mq sockets; here's how (there may be a simpler way but this is how i did it)
[20:46] fbarriga does zmq_recv allocate new memory every time?
[20:47] cremes setup a standard forwarder
[20:47] pieterh fbarriga: yes, that works fine
[20:47] fbarriga and is it mandatory to init the msg before receiving the data?
[20:47] pieterh yes
[20:47] cremes connect a publisher to it and let it go crazy broadcasting (bigger messages are better, e.g. 20k+)
[20:47] cremes so far, no leak
[20:47] cremes now start adding subscribers
[20:47] fbarriga umm, but if I have a while(true), can't I put the init outside to avoid overhead?
[20:48] cremes again, no real leak except for a little queueing if the subscribers are slower than the pub
[20:48] pieterh fbarriga: close and init are a pair
[20:48] cremes i adjusted my publisher so that it would not overrun the subscribers
[20:48] cremes now here's where the leak comes in....
[20:49] cremes modify the subscribers so that they only stay connected for a few seconds; they close their sockets, and then immediately reconnect to the forwarder
[20:49] fbarriga so I can do this: zmq_msg_init(&msg); while (true) { zmq_recv(sock, &msg, 0); } zmq_msg_close(&msg); ?
[20:49] cremes they continue to do this for at least 15 minutes
[20:49] cremes what i see happening is a lot of sockets go into TIME_WAIT
[20:49] cremes i suspected that the socket ZMQ_LINGER option (defaults to indefinite) was holding onto the messages
[20:50] cremes but i just ran a test where i set it to 0 and it still occurred
[20:50] cremes so i'm thinking that as a socket sits in TIME_WAIT, its queue is still active; as new messages arrive from the publisher they are added to the queue
[20:50] sustrik do the subscribers have identities?
[20:50] cremes when the TIME_WAIT expires, the queued data doesn't go away
[20:51] cremes they are using the default random id's; i am not overriding with my own
[20:51] pieterh fbarriga: nope, it'll leak memory
[20:51] pieterh if you have positive proof that init/close are slow, send it along
[20:51] cremes when i stop the publisher, the forwarder's memory stops growing
[20:51] pieterh if you don't then please don't optimize unnecessarily, it's pointless
[20:52] cremes when i disconnect the publisher and *all* subscribers, the forwarder's memory footprint remains the same (big)
[20:52] cremes so if i left it running (like for an overnight test) it would exhaust all memory resources by morning
[20:52] sustrik cremes: afaik processes don't return allocated memory to the OS
[20:52] cremes and this is exactly what has been plaguing me the last week or so (since i found that last bug!)
[20:53] pieterh cremes: does the forwarder memory grow again from that high point ?
[20:53] pieterh i.e. leak vs. cached heap in process...?
[20:53] cremes pieterh: it only grows again if i turn the publisher back on
[20:53] pieterh but does it start growing again immediately?
[20:54] cremes let's see....
[20:54] pieterh if it's cached heap memory then after a few cycles it will stop growing
[20:55] fbarriga pieterh, but hypothetically calling those functions will slow down the loop
[20:55] pieterh fbarriga: doing hypothetical optimization is a waste of time, please don't
[20:55] cremes pieterh: no, it doesn't start growing again right away
[20:56] pieterh cremes: then it's cached heap memory, not a leak
[20:56] pieterh fbarriga: if you wish to actually know, try calling zmq_msg_init() and zmq_msg_close() in a loop, 1 billion times
[20:56] pieterh measure how many seconds it takes
[20:56] cremes so this is just due to the heap growing really large and it never gives the buffer space back to the OS even if the buffer is empty
[20:57] cremes ?
[20:57] pieterh cremes: yup
[20:57] CIA-21 jzmq: Gonzalo Diethelm master * ra791958 / (.gitignore Changes by jrideout to allow the build to succeed with autoconf 2.59. -
[20:57] pieterh if you have a normally peaky stream, it'll hit some max and stop growing
[20:57] pieterh there may be a system call to return memory to the OS
[20:57] pieterh but it's always virtualized anyhow, not real RAM, afaik
[20:58] pieterh sustrik: is the inproc hwm+swap change worth mentioning in the release notes?
[20:59] sustrik dunno
[20:59] sustrik it's more of a fix
[20:59] sustrik than a new feature
[21:01] pieterh basically the hwm is shared between peers, right?
[21:03] sustrik it's sum of the two hwms
[21:05] pieterh the peers share a single buffer...
[21:08] pieterh sustrik: is the NEWS for 2.1.1...
[21:08] pieterh I've gone through the git log
[21:10] sustrik you've killed the edupad as well!
[21:11] sustrik isn't it an IPv6 day today?
[21:15] pieterh sustrik: let's do that tomorrow... update your NEWS, send me a pull request
[21:15] pieterh when we get that working, I'll make the release
[21:16] sustrik ok
[21:19] cremes i'm pretty convinced there is still a leak here... new information
[21:19] cremes slowing the publisher to every 10ms and setting HWM for all sockets to 1 and LINGER to 0
[21:19] cremes it still slowly grows in size
[21:20] cremes there shouldn't be *anything* getting queued or buffered here
[21:20] sustrik ack
[21:20] sustrik can you provide the test program?
[21:20] cremes the "cause" appears to be the subscribers connecting/disconnecting
[21:21] sustrik yes, looks like
[21:21] cremes sustrik: i can but it will be in ruby; i'll describe the steps so it could be replicated in another lang
[21:21] pieterh hmm, sustrik, maybe you really are in Libya...
[21:23] pieterh sustrik: does the HWM on a PUB socket affect each subscriber queue (over TCP)? so potentially N x HWM?
[21:23] sustrik cremes: ok
[21:23] sustrik yes
[21:23] sustrik pieterh: yes
[21:24] pieterh thx
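The two HWM answers above (inproc: the sum of the two HWMs; PUB over TCP: potentially N x HWM) can be turned into a rough worst-case memory estimate. The sketch below is plain arithmetic and does not call libzmq. The function names are hypothetical, it assumes every subscriber queue is bounded by the same non-zero HWM, and it ignores per-message overhead.

```c
/* Worst-case number of messages a PUB socket may hold when each of
   `subscribers` peers has its own queue capped at `hwm` messages. */
static unsigned long long demo_pub_max_queued_msgs(unsigned long long subscribers,
                                                   unsigned long long hwm)
{
    return subscribers * hwm;
}

/* Same bound expressed in bytes, for messages of `msg_size` bytes each. */
static unsigned long long demo_pub_max_queued_bytes(unsigned long long subscribers,
                                                    unsigned long long hwm,
                                                    unsigned long long msg_size)
{
    return demo_pub_max_queued_msgs(subscribers, hwm) * msg_size;
}

/* For an inproc pair, the effective limit is described in the log as the
   sum of the two peers' HWMs. */
static unsigned long long demo_inproc_limit(unsigned long long send_hwm,
                                            unsigned long long recv_hwm)
{
    return send_hwm + recv_hwm;
}
```

For example, 50 subscribers with an HWM of 1000 and 20,000-byte messages (roughly the "20k+" size cremes mentions) gives demo_pub_max_queued_bytes(50, 1000, 20000) = 1,000,000,000 bytes, so a forwarder can grow large without any leak being involved.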
[21:39] sustrik pieterh: still there
[21:39] sustrik ?
[21:39] sustrik one more important thing...
[21:40] pieterh yeah, still here
[21:41] pieterh hmm?
[21:41] sustrik given that there are two separately maintained repos now
[21:42] sustrik we should keep the patches completely clean
[21:42] sustrik i.e. no whitespace patching
[21:42] sustrik no mixed patches
[21:42] sustrik like fixing a bug and correcting an unrelated typo in a single patch
[21:42] sustrik etc.
[21:42] pieterh you mean so we can re-port the history anywhere?
[21:42] sustrik otherwise we'll end in dependency hell
[21:43] sustrik yes
[21:43] pieterh right
[21:43] sustrik keeping patches mobile
[21:43] pieterh so the way a pull request works is you specify a commit (or multiple commits)
[21:43] pieterh this may not work all the time
[21:44] sustrik ?
[21:50] pieterh clearly if we can keep patches fully mobile on the development master, that's ideal
[21:50] pieterh makes it fairly easy to send them anywhere
[21:50] pieterh we may want to add more process support, e.g. something greppable in the log
[21:50] pieterh in the worst case we can apply patches by hand
[21:51] pieterh well, some situations I can see happening
[21:51] pieterh - bug hits in 2.1 but that code has changed in 2.2
[21:51] pieterh - patch in 2.2 doesn't apply cleanly in 2.1
[21:52] pieterh - 2.1 needs only part of a patch made to 2.2
[21:52] pieterh etc.
[21:55] sustrik git has support for that
[21:55] amacleod I am getting a different assertion error from jzmq tests now: Unknown error 156384765, rc == 0 (src/socket_base.cpp:243)
[21:55] sustrik it's called merging
[21:55] sustrik so you apply the patch
[21:55] sustrik you get merge conflicts
[21:55] sustrik you solve them
[21:55] sustrik you commit the patch
[21:55] sustrik it's still the same patch, but it has been backported
[21:56] sustrik the backport iirc is visible as a separate commit
[21:57] sustrik amacleod: what version?
[21:57] pieterh sustrik: ok, we'll play it through
[21:58] amacleod sustrik, v2.1.0, from tarball on
[22:00] amacleod Looks like that line reference is in zmq::socket_base_t::getsockopt
[22:01] sustrik amacleod: yes, it's a bug
[22:01] sustrik it's solved in head
[22:01] amacleod Ok. Guess I should get head, then, eh?
[22:01] sustrik pieterh: when are you going to release 2.1.1?
[22:02] sustrik amacleod: either get the head or wait for 2.1.1
[22:02] amacleod I will try with head.
[22:10] pieterh sustrik: please review and add to your master if you're happy with it
[22:10] pieterh date is set for tomorrow
[22:10] pieterh send me a pull request for that change, and I'll process it, and then make the release
[22:10] pieterh tomorrow morning after coffee
[22:10] pieterh does that work?
[22:10] pieterh I'd like to split off "Making a Release" from the "Source Code" page, it's two separate topics
[22:10] pieterh and this lets me document how to make a release, as a naive and not very expert person
[22:10] pieterh which is ideal
[22:11] pieterh ok, away
[22:12] pieterh g'nite everyone
[22:12] amacleod Nite Pieter.
[22:15] sustrik what change?
[22:16] sustrik edupad times out here btw
[22:16] sustrik good night anyway
[22:21] amacleod Do pollers need to be closed or finalized in some way before terminating the context?
[22:28] rphillips is there a way to set a zeromq socket up to timeout on a zmq_send?
[22:28] ianbarber you could poll for a socket being writeable, with a timeout
[22:32] rphillips Do I need to check for writability after the connect or after the send?
[22:32] rphillips the send is blocked already
[22:44] sustrik before the send