ZeroMq IRC Log

Tuesday February 22, 2011

[Time] Name	Message
[00:20] cremes	on a multi-part send, if any part fails (rc != 0 and errno is set) then 0mq gives my user code ownership of the messages again, yes?
[00:20] cremes	in normal circumstances, once you pass a message to zmq_send() it's 0mq's responsibility to call zmq_msg_close()
[00:20] cremes	except in the case above, right?
[00:33] mikko	cremes: nope
[00:34] mikko	cremes: even if you zmq_send message you need to close it
[00:35] cremes	mikko: sheesh, i've been doing this wrong for months then
[00:36] cremes	so, does zmq_send() increment the "copy counter" on the zmq_msg_t and then decrement it (and release) when it's sent?
[00:37] cremes	it must otherwise when i call close it would release the message before 0mq has a chance to transmit it
[00:37] cremes	pls confirm if you can
[01:37] mikko	cremes: sorry, was away building stuff
[01:37] mikko	cremes: you can close right after send
[01:38] mikko	take a look at zguide for samples
[01:38] mikko	https://github.com/zeromq/zeromq2/blob/master/src/zmq.cpp#L129
[01:38] mikko	also, that might clear it up a bit
[01:40] mikko	also, see the page: http://api.zero.mq/master:zmq-msg-close for zmq_msg_close
[02:09] jugg	whoa! Since when did we get versioned api on the web? Nice! :)
[02:10] jugg	oh, different domain. interesting.
[02:11] mikko	jugg: been there a few days
[02:12] mikko	and finally!
[02:12] mikko	http://snapshot.zero.mq/rpm/2011-02-22_02-09-11/centos5/i386/
[02:12] mikko	centos rpms available
[02:12] mikko	time to sleep
[10:40] pieterh	good morning
[10:42] ianbarber	morning
[10:48] pieterh	ianbarber: what part of the world are you in?
[10:48] pieterh	London? you seem to get around a lot for your "0MQ is the answer" talks :-)
[10:56] ianbarber	yep, london
[10:56] ianbarber	i was hoping to get to give it at confoo as well, but they went with a different talk in the end :)
[10:59] pieterh	I was thinking of doing a small 0MQ event in Brussels later in spring
[11:00] pieterh	April or May, when it's nicer weather
[11:03] ianbarber	awesome, i think that would be fun
[11:05] ianbarber	are you based in brussels then, or near by?
[11:07] pieterh	I'm in Brussels, yes, so it's easy for us to organize something here
[11:08] pieterh	There's a nice place in the center of town I used to hold conferences in
[11:09] pieterh	Brussels is reasonably central IMO, and of course there's the beer...
[11:09] ianbarber	surely there are some unused government buildings available? :)
[11:10] ianbarber	yeah, brussels is a really nice place, and easy to get to on the train from everywhere as well for people that aren't keen on flying.
[11:10] pieterh	you mean because one of our 7 governments is currently on extended holiday?
[11:10] pieterh	lol
[11:10] pieterh	ok, I'll set it up... excellent...
[11:11] pieterh	I'm thinking, mix of workshops and project presentations
[11:12] pieterh	people can go home in the evening, or stay overnight and socialise
[11:13] ianbarber	i think that's sensible, though I would probably aim for a panel or discussion slot or two, just to give it a less structured feel - I would imagine that the crowd will all be pretty good with the library so the chat will be as good as the talks
[11:17] pieterh	So the idea is a lot of tables, chairs, refreshments, in a large room
[11:18] pieterh	wifi
[11:19] ianbarber	sounds good
[11:29] ianbarber	btw, the clone example in the new guide chapter is excellent.
[11:31] pieterh	ah, glad you like it
[11:32] pieterh	do you think it makes sense to do all the design discussion first, and then the code later?
[11:32] pieterh	these examples are going to be a lot larger than the earlier ones
[11:34] ianbarber	yeah, i think that's tricky whenever you get to a more real world example - i mean if the reader had been paying attention then should pretty much get how to build it by that point as all of the blocks have been covered. I think it is going to be a big block of code to cover the whole client and server case, but I'm not sure there's that much that should split it up
[11:36] pieterh	I mean, after you read the Clone discussion, do you want to see it worked out in code, or do you want to continue to the Harmony discussion?
[11:36] pieterh	assuming that code is 20 pages long...
[11:36] pieterh	(not one code block, but developed piece by piece)
[11:37] pieterh	for me the only way to prove the design is running code
[11:37] ianbarber	yeah, i would definitely prefer to see some code
[11:38] pieterh	ok, then I'll switch back to the earlier structure... it'll be like the last Worked example in Ch3
[11:38] ianbarber	yeah, i think that was a reasonable model
[11:39] ianbarber	would it be worth maybe having these examples be in python or another scripting language, just to trim the size on-page some?
[11:41] pieterh	that could be good, yes
[11:41] pieterh	it solves one problem I have with C, the lack of containers
[11:41] pieterh	I was thinking of using ZFL for these advanced cases
[11:41] pieterh	but Python would be neater
[11:42] pieterh	However... I still need to write them in C :-)
[11:42] pieterh	For completeness' sake
[11:43] pieterh	let's continue in C, which is the language of the API
[11:43] pieterh	but we can produce versions of the Guide for every single language
[11:43] pieterh	you want the Guide in Python? Not a problem!
[11:44] pieterh	(any examples not translated will default to C then)
[11:45] pieterh	since the source for examples is merged into the text at build time anyhow
[11:46] ianbarber	that would be quite cool
[11:47] pieterh	ok, it's a deal
[11:47] ianbarber	on that note - is there a process for saying when new examples are ready for translation?
[11:48] ianbarber	maybe just a mailing list ping that the code is there before the guide goes live
[11:48] pieterh	hmm, I guess it involves tracking the git
[11:48] pieterh	I'd rather not get into a release process for the guide
[11:49] pieterh	there are only a couple of languages that people have translated systematically
[11:49] pieterh	like PHP :-)
[11:49] stimpie	I understood the goal of zeromq is becoming a kernel module, I have just read the interesting new part of the manual 'clone pattern' but I wonder how this adds to a kernel level system?
[11:49] pieterh	stimpie: it's a layer on top
[11:50] ianbarber	pieterh: fair point :)
[11:50] pieterh	ianbarber: yesterday I updated the C examples and text for 2.1...
[11:51] pieterh	I'm not sure the PHP binding handles 2.1 even...
[11:51] ianbarber	yeah, i was just making a note I should check the PHP ones
[11:51] ianbarber	it's up to date as far as I know, I've been mostly using it against 2.1.0
[11:51] stimpie	pieterh, ok clear enough.
[11:51] pieterh	nice!
[11:51] pieterh	hopefully these shifts will become rarer and rarer
[11:52] pieterh	stimpie: there are lots of reusable patterns we can make on top of 0MQ, Ch4 is covering some of the reliability ones
[11:52] pieterh	I think giving them names makes them easier to understand and reuse
[11:54] stimpie	They are interesting patterns but it confuses me with what the scope of the zeromq project is
[11:55] private_meta	hmm... I've been told that the c++ version of zmq uses void pointers because you can send something OTHER than char pointers as well, yet for the python version you can pretty much send a standard string in the send function. Doesn't that mean that this gives c++ functionality you can't capture with a similar python implementation?
[11:57] stimpie	private_meta, the message content is up to the client. You could also serialize java objects which are pretty useless in a c++ client.
[11:58] private_meta	Yeah, but it looks to me that for the python interface, in case I don't misunderstand it, the message content is narrowed down to strings
[11:59] ianbarber	private_meta: underneath it's just a chunk of bytes - I would imagine that there is a pack function or similar that can pack anything into a string for python?
[12:00] private_meta	I don't know, it's just that the send interface for python looks so convenient while with c++ you have to bitch around with zmq::message_t where you have to use memcpy or message_t.rebuild to get a simple string into a message
[12:02] ianbarber	hmm, for C there's the little zhelpers script pieterh uses that provides some handy helper functions to hide some of that stuff, i don't know if there's a C++ equivalent
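
The point made above, that a message body is just a chunk of bytes whatever the binding, can be sketched without 0MQ at all. This is a minimal illustration and not code from zhelpers or any binding: `Tick`, `to_bytes`, `from_bytes` and `string_to_bytes` are hypothetical names invented here. It assumes sender and receiver share the same architecture and struct layout, since raw struct bytes are not portable otherwise.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical payload type, used only for this illustration.
struct Tick {
    std::uint32_t sequence;
    double price;
};

// Copy any trivially copyable value into a byte buffer.
template <typename T>
std::vector<unsigned char> to_bytes(const T &value) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types can be copied byte-for-byte");
    std::vector<unsigned char> buffer(sizeof(T));
    std::memcpy(buffer.data(), &value, sizeof(T));
    return buffer;
}

// Rebuild a value from a buffer produced by to_bytes<T>().
template <typename T>
T from_bytes(const std::vector<unsigned char> &buffer) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types can be copied byte-for-byte");
    assert(buffer.size() == sizeof(T));
    T value;
    std::memcpy(&value, buffer.data(), sizeof(T));
    return value;
}

// A string is just another run of bytes (no trailing NUL is stored).
std::vector<unsigned char> string_to_bytes(const std::string &text) {
    return std::vector<unsigned char>(text.begin(), text.end());
}
```

Either buffer could then be copied into a message body of the same size, in the same way the string helpers copy `string.data()`.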
[12:03] pieterh	stimpie: I'll make this clear in the text, thanks for pointing that out
[12:04] pieterh	ianbarber: yes, some kind person made zhelpers.hpp
[12:05] pieterh	private_meta: feel free to translate the zmsg code from C to C++, it'll give you what you want
[12:06] private_meta	pieterh: uhm... it doesn't exist yte?
[12:06] private_meta	*yet?
[12:06] pieterh	private_meta: the feeling of pride and accomplishment you'll feel as you make it... will be better than steak salad and fries
[12:06] private_meta	Well, not that hard, I don't quite like steak
[12:07] pieterh	even better then... :-)
[12:07] pieterh	general rule with 0MQ is, if there's something you think could work better, make it happen
[12:07] pieterh	you can take the C code and wrap it as C++ very easily IMO
[12:07] private_meta	I wouldn't know how
[12:08] pieterh	you are working in what language?
[12:08] private_meta	C++
[12:08] pieterh	did you read the zmsg.c code yet?
[12:10] private_meta	nope
[12:10] private_meta	Well, isn't there a zmsg.cpp already?
[12:12] pieterh	private_meta: please read both those files, then come back...
[12:13] pieterh	you have three choices, when it comes to getting functionality in 0MQ (or any software)
[12:13] pieterh	1. pay for it and get it soon
[12:14] private_meta	2. don't pay and wait
[12:14] pieterh	2. wait until someone else makes it and shares it, then get it for free
[12:14] private_meta	3. do it yourself
[12:14] pieterh	exactly
[12:14] pieterh	:-)
[12:14] pieterh	in this case I've made it really, really easy for you...
[12:14] private_meta	seen an interesting venn diagram for that
[12:14] pieterh	since you can literally take the C code (already designed as a class), wrap it with (I'd guess 20 lines of C++ code)
[12:14] pieterh	and get what you need
[12:14] stimpie	the file: examples/Java/psenvsub.java contains some garbage at the end
[12:15] pieterh	stimpie: indeed, it seems chopped off...
[12:16] pieterh	stimpie: ah, I see what you mean
[12:17] pieterh	ok, fixing that, I found the original contribution
[12:17] pieterh	stimpie: fixed, thanks!
[12:17] stimpie	np
[12:26] private_meta	just out of curiosity, don't have time at the moment, let's say i want to improve zmq by adding convenience methods for send/recv for c++, let's say for strings, would that be worth considering it?
[12:28] stimpie	private_meta, what is wrong with send(msg.toBytes())?
[12:28] pieterh	private_meta: it would not go into the core
[12:28] pieterh	private_meta: if all you want is send/recv string, it's already in zhelpers.hpp
[12:29] pieterh	// Convert string to 0MQ string and send to socket
[12:29] pieterh	static bool
[12:29] pieterh	s_send (zmq::socket_t & socket, const std::string & string) {
[12:29] pieterh	zmq::message_t message(string.size());
[12:29] pieterh	memcpy(message.data(), string.data(), string.size());
[12:29] pieterh	bool rc = socket.send(message);
[12:29] pieterh	return (rc);
[12:29] pieterh	}
[12:30] private_meta	oh, I've already seen that. By the way, why do you use memcpy there and not rebuild?
[12:38] private_meta	argh... the disabled copy constructor is annoying, I can't even return a message_t object from a function
[13:08] pieterh	private_meta: that code was written by Olivier Chamoux afair
[13:09] private_meta	The example code carries that name as well
[13:10] private_meta	According to a comment in the source it's to avoid "shared messages", which seems to be a somewhat valid argument if you tried to use it for that, but it's annoying if you want to return a message by a function
[13:11] pieterh	sure
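
One common workaround for the "can't return a non-copyable object from a function" complaint is to return ownership of a heap-allocated object instead. The sketch below assumes a C++11 compiler and uses a hypothetical `Message` class as a stand-in; it is not `zmq::message_t`, and `make_message` and `message_text` are names invented for this illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a message class with copying disabled.
class Message {
public:
    explicit Message(std::size_t size) : bytes_(size) {}
    Message(const Message &) = delete;             // no shared messages
    Message &operator=(const Message &) = delete;

    void *data() { return bytes_.data(); }
    const void *data() const { return bytes_.data(); }
    std::size_t size() const { return bytes_.size(); }

private:
    std::vector<char> bytes_;
};

// Return ownership of the message rather than a copy of it.
std::unique_ptr<Message> make_message(const std::string &text) {
    std::unique_ptr<Message> message(new Message(text.size()));
    if (!text.empty())
        std::memcpy(message->data(), text.data(), text.size());
    return message;  // the unique_ptr is moved, the Message is never copied
}

// Read the message body back as a string.
std::string message_text(const Message &message) {
    return std::string(static_cast<const char *>(message.data()),
                       message.size());
}
```

The message itself is never duplicated, so the "no shared messages" intent of the disabled copy constructor is preserved; the cost is one heap allocation per message.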
[15:24] CIA-21	zeromq2: 03Martin Sustrik 07master * r43e8868 10/ (24 files):
[15:24] CIA-21	zeromq2: Added explicit error message in case of memory exhaustion
[15:24] CIA-21	zeromq2: Signed-off-by: Martin Sustrik <sustrik@250bpm.com> - http://bit.ly/eVM1AS
[16:00] amacleod	Are there any plans to have language-native ZMQ libraries, rather than wrapping C++ libs?
[16:01] amacleod	I guess maintaining parallel language-native libs would be a much bigger maintenance load.
[16:08] pieterh	amacleod: it's a lot of work unless there's a real payoff, e.g. languages that can't link to C++
[16:09] pieterh	you would not reach a similar level of performance and functionality
[16:09] pieterh	it's been discussed, we would need to document the wire level protocols properly first
[16:10] amacleod	pieterh, hm, yeah, good point. And I guess the hassle of linking, for example, JNI libraries in Java, is pretty much a one-time configuration thing.
[16:10] pieterh	plus you always get the latest/greatest 0MQ, etc.
[16:10] amacleod	pieterh, well, documenting the wire level protocols sounds like a good idea anyway. :-D
[16:10] pieterh	yes, when someone actually wants it... :-)
[16:11] amacleod	pieterh, by the way, where can I look at the new router/dealer example you made?
[16:11] pieterh	amacleod: hang on, I'll rebuild the Guide...
[16:11] amacleod	Thanks.
[16:16] pieterh	hmm, Wikidot seems to cache the old text for a while...
[16:17] pieterh	oops, error in my upload robot, it's sending the content to the wrong place...
[16:17] amacleod	Silly robot.
[16:18] pieterh	ok, here: http://zguide.zeromq.org/chapter:all#toc51
[16:19] ljackson	pieterh, got that code working last night, thx for your help
[16:19] pieterh	ljackson: np
[16:19] ljackson	pieterh, silly mistake of not de-ref on the work/clients sockets before sending to the queue device
[16:19] ljackson	odd that the api took that tho
[16:20] pieterh	ljackson: what language binding?
[16:20] ljackson	who maintains the c++ api for zeromq ? Maybe I could extend to accept socket pointers and ask for a pull request ?
[16:20] pieterh	ah, yes, I'm sure the maintainers will welcome contributions
[16:20] amacleod	"Worked Example: Inter-Broker Routing", right?
[16:20] pieterh	amacleod: "Asynchronous Client-Server"...
[16:20] pieterh	reload that page
[16:21] amacleod	Ah, ok.
[16:21] amacleod	Looks like it renders as #toc50 for me.
[16:22] pieterh	caching issue perhaps
[16:22] amacleod	Yeah.. I think I'm still getting the old version.
[16:22] pieterh	do you see figure 46 "Asynchronous Client Server" ?
[16:22] amacleod	Yeah.. I see the diagram.
[16:22] pieterh	ok, then that's all good
[16:23] amacleod	The next code sample is router-to-router, though...
[16:24] pieterh	does it not show the asyncsrv example?
[16:24] pieterh	ah, diagrams are accurate but text is out of date...
[16:24] pieterh	reload, reload, reload!
[16:24] amacleod	Nope. It's a little incongruous, actually... :)
[16:24] amacleod	aha.. there it is!
[16:25] pieterh	enjoy, amacleod, and let me know if it's helpful
[16:25] pieterh	it was quite fun making this pattern
[16:25] amacleod	Sure thing. :)
[16:27] amacleod	Could the client_task use blocking recv rather than polling, or is the polling crucial?
[16:28] pieterh	amacleod: if you want to send a mix of requests and replies, it can't block on recv
[16:28] pieterh	you can't have a separate thread doing the receiving
[16:29] pieterh	it can use a simpler poll than the one I made, that's to ensure requests are sent on time
[16:30] pieterh	brb, lunch...
[16:34] CIA-21	zeromq2: 03Martin Sustrik 07sub-forward * r977f5b7 10/ (5 files):
[16:34] CIA-21	zeromq2: Trie-based matcher (ptrie_t) implemented.
[16:34] CIA-21	zeromq2: Signed-off-by: Martin Sustrik <sustrik@250bpm.com> - http://bit.ly/gSZiWK
[17:31] cremes	is there any technique for detecting a slow subscriber? e.g. check queue sizes on netstat or something?
[17:32] cremes	i have a pub socket that has a dozen or so subscribers; my memory usage slowly climbs even though i don't have any leaks
[17:32] cremes	i now suspect a slow subscriber isn't pulling stuff off the queue quickly enough and it's backing up at the publisher (HWM is default)
[17:44] pieterh	cremes: there's zmq_getsockopt (..ZMQ_BACKLOG)
[17:46] cremes	is that really appropriate? that just controls the queue of initial connects/binds
[17:46] cremes	it doesn't have anything to do with message queue length, right?
[17:46] pieterh	oops
[17:46] cremes	:)
[17:47] pieterh	so there's no way to know what's happening at the level of individual subscribers...
[17:47] pieterh	do you number your messages?
[17:47] cremes	no but they do get a timestamp
[17:47] nooob	you might want to setup a connection from the subscriber back to the sender
[17:48] cremes	so they are sequential
[17:48] pieterh	ok, even better
[17:48] pieterh	in the subscriber, check how old incoming messages are
[17:48] pieterh	if you exceed X seconds, send an alert to your system console
[17:48] nooob	there was a pattern like that in the guide
[17:48] cremes	hmmm, that doesn't seem like it would help unless i misunderstand how the pub socket queueing works
[17:48] cremes	there is a separate outgoing queue for each subscriber on a pub socket, yes?
[17:49] pieterh	cremes: if your pubsub system is stable, subscribers will get messages with predictably low delays
[17:49] cremes	so fast subscribers would have a small or empty queue while my slow guy would have a large queue
[17:49] pieterh	it's running over TCP?
[17:49] cremes	yes
[17:49] pieterh	even over PGM...
[17:49] pieterh	slow subscribers will by definition :-) get messages 'too slowly'
[17:50] pieterh	timestamp checking should do it
[17:50] cremes	i will check that out
[17:50] pieterh	i like the pattern, will try a quick implementation
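
The timestamp check suggested above can be sketched as follows. This is only an illustration of the idea, not an implementation from the Guide: `LagMonitor` and its methods are hypothetical names, the threshold is whatever "X seconds" the application picks, and it assumes publisher and subscriber clocks are synchronised closely enough for the comparison to mean something.

```cpp
#include <chrono>

typedef std::chrono::system_clock Clock;

// Tracks how many received messages were older than a chosen threshold.
class LagMonitor {
public:
    explicit LagMonitor(std::chrono::milliseconds max_age)
        : max_age_(max_age), late_(0) {}

    // sent_at is the timestamp carried in the message, received_at is the
    // subscriber's clock when it arrived. Returns true when the message is
    // older than the threshold, i.e. the subscriber is falling behind.
    bool observe(Clock::time_point sent_at, Clock::time_point received_at) {
        const bool late = (received_at - sent_at) > max_age_;
        if (late)
            ++late_;
        return late;
    }

    unsigned long late_count() const { return late_; }

private:
    std::chrono::milliseconds max_age_;
    unsigned long late_;
};
```

A subscriber would call `observe()` for each incoming message and raise its alert whenever the call returns true, or when `late_count()` keeps climbing.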
[18:12] cremes	pieterh: question...
[18:12] pieterh	cremes: shoot...
[18:12] cremes	i wrote an example where i have a single publisher that connects to a forwarder device
[18:12] cremes	it publishes as fast as it can
[18:12] cremes	there are no subscribers connected to the device
[18:12] cremes	i see memory growing rapidly; is that expected?
[18:12] pieterh	depends...
[18:13] pieterh	running on the same box?
[18:13] cremes	yeah
[18:13] pieterh	one core?
[18:13] cremes	dual quad, 16gb memory... beefy box
[18:13] pieterh	no, then it's not expected
[18:13] cremes	ok
[18:13] pieterh	is the memory growing in the publisher or in the forwarder?
[18:13] cremes	i'm going to try and replicate this with the C forwarder
[18:13] cremes	it grows in the forwarder
[18:14] pieterh	how about CPU usage?
[18:14] cremes	high
[18:14] pieterh	ok, try this
[18:14] pieterh	- publish 20M messages, then pause for 10 seconds
[18:14] pieterh	- repeat
[18:14] cremes	ok
[18:14] pieterh	see if memory usage remains high during that pause
[18:14] pieterh	if so, forwarder is broken somehow
[18:15] pieterh	if it comes down, it's just queuing bizarreness
[18:28] sustrik	with a single incoming stream and many outgoing streams you would expect the latter to be bandwidth-bound and thus slower than the former
[18:28] sustrik	consequently, in congestion situations you should expect messages queueing in the forwarder
[18:38] pieterh	sustrik: yes, but here there are no subscribers on the forwarder...
[18:39] sustrik	ah
[18:40] sustrik	i've missed that
[18:41] sustrik	then the queue's main loop must be slower than message receiving in its SUB socket
[18:41] sustrik	in any case, when doing congestion tests
[18:41] sustrik	use hwm
[18:41] sustrik	otherwise you are inevitably going to run out of memory
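
Why a high-water mark keeps memory bounded can be shown with a toy queue. This is a model of the idea only, not 0MQ's internal pipe code: `BoundedQueue` is a hypothetical class, and it assumes the 2.x convention that a limit of zero means "no limit" and that messages beyond the limit are dropped, as a PUB socket does.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Toy outgoing queue with a high-water mark (HWM).
class BoundedQueue {
public:
    // high_water_mark == 0 is treated as "no limit" (assumed default).
    explicit BoundedQueue(std::size_t high_water_mark)
        : hwm_(high_water_mark), dropped_(0) {}

    // Queue a message, or drop it when the limit has been reached.
    // Returns true when the message was queued.
    bool push(const std::string &message) {
        if (hwm_ != 0 && queue_.size() >= hwm_) {
            ++dropped_;
            return false;
        }
        queue_.push_back(message);
        return true;
    }

    // Take the oldest message, if any. Returns false when the queue is empty.
    bool pop(std::string &out) {
        if (queue_.empty())
            return false;
        out = queue_.front();
        queue_.pop_front();
        return true;
    }

    std::size_t size() const { return queue_.size(); }
    std::size_t dropped() const { return dropped_; }

private:
    std::size_t hwm_;
    std::size_t dropped_;
    std::deque<std::string> queue_;
};
```

With a limit of zero the queue grows for as long as the consumer is slower than the producer, which is the memory growth described in the congestion discussion; with a non-zero limit the growth stops and the drop counter rises instead.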
[18:52] pieterh	hmm, forwarder should run at least as fast as publisher in this case...
[18:52] pieterh	let's see if cremes comes back with more data
[18:52] cremes	when there are no subscribers attached to the forwarder, calls to zmq_send() should just close the msg and drop it, right?
[18:53] pieterh	cremes: ack
[18:53] pieterh	did you try that run/pause/run/pause ?
[18:55] cremes	yes, the memory did not shrink
[18:56] pieterh	did _not_ shrink?
[18:56] pieterh	then it's a real leak
[18:56] cremes	i need to try this with the C forwarder device that comes with the lib
[18:57] cremes	and see if it behaves the same
[18:57] pieterh	yes
[18:57] cremes	i'll open a ticket if i see it replicated
[18:57] pieterh	i just reviewed the code for that, there is zero chance it leaks memory
[18:57] cremes	right, it's so simple there is no way
[18:57] cremes	<sigh>
[18:57] pieterh	if there's a leak it's either in the pub socket (unlikely), or the binding (possible), or it's a heap artifact (plausible)
[18:58] pieterh	sometimes the heap does not shrink immediately when memory is freed
[18:58] pieterh	try setting a HWM of, say, 100K on the publisher and see what effect that has
[18:58] pieterh	s/publisher/forwarder/ sorry
[18:59] pieterh	on the frontend socket, initially
[18:59] cremes	yes, i'll keep at it
[19:05] amacleod	pieterh, my problem from yesterday does seem to be a threading issue. When I changed my test harness (which simulates a client) to use a separate context from the server, the messages got through correctly in both directions.
[19:06] amacleod	However, now I'm seeing "Assertion failed: pending_term_acks" when closing the socket.
[19:06] pieterh	amacleod: ah, good... I've also been using separate contexts for each 'task' when it simulates a separate process
[19:06] pieterh	that's not a 0MQ assertion...
[19:07] amacleod	It lists it as socket_base.cpp:690
[19:07] amacleod	It's hard for me to debug, because it doesn't generate a Java exception, it just kills the process.
[19:07] pieterh	what version of 0MQ are you on?
[19:07] amacleod	Might be jzmq, hmm..
[19:07] amacleod	2.0.10
[19:08] pieterh	any problem upgrading to master?
[19:08] pieterh	there are a lot of fixes since 2.0.10
[19:09] pieterh	sustrik: do you keep any log of major changes made apart from the git commit history?
[19:09] amacleod	Depends--how stable is 2.1? I think we chose 2.0.10 because we wanted not to use the "development" branch.
[19:10] pieterh	amacleod: for various reasons, the git master is significantly more stable than the 'stable'
[19:10] pieterh	... than the 'stable' 2.0.10 release
[19:10] amacleod	Hmm. :) Might be worth the switch, then.
[19:11] sustrik	btw, the assertion is in code that was completely rewritten in 2.1
[19:11] amacleod	sustrik, good to know.
[19:11] pieterh	we're in the slow process of making a formal release for 2.1.1
[19:11] pieterh	sustrik: the one thing that will be problematic is making release notes
[19:11] sustrik	pieterh: what log?
[19:11] sustrik	why so?
[19:11] pieterh	i was afraid of that...
[19:11] sustrik	it's automatic
[19:12] pieterh	what's automatic is a dump of every commit message
[19:12] sustrik	yup
[19:12] amacleod	https://github.com/zeromq/zeromq2/raw/master/NEWS
[19:12] pieterh	that is not release notes
[19:12] pieterh	nope, NEWS is painfully made by hand
[19:12] sustrik	ah, i'll go through the commit messages and write the release notes
[19:12] sustrik	not a problem
[19:12] pieterh	excellent...!
[19:12] pieterh	then IMO we're ready to break off the branch...
[19:13] pieterh	there were zero issues porting the Guide examples to 2.1.1
[19:13] sustrik	there are some pgm problems being experienced with head currently
[19:13] pieterh	that's ok, we'll have at least a couple of weeks to stabilize
[19:13] amacleod	In the mean time, if we assume I cannot presently upgrade from 2.0.10, any suggestion on where I should look to prevent this assertion from failing?
[19:13] pieterh	the key now IMO is to get a formal package out so folks like amacleod use the current master, not old code
[19:14] pieterh	sustrik_, so I'm going to create a separate git but this is somewhat experimental
[19:14] amacleod	As far as I know, jzmq is set up to handle both 2.0.10 and 2.1.x, so upgrading shouldn't be too painful.
[19:15] sustrik	the only thing preventing branching off right now is the pgm problem
[19:15] pieterh	amacleod: at least, try on the 2.1.x master so you know whether it works better or not
[19:15] sustrik	i can't help with that :|
[19:15] sustrik	so we'll have to wait while someone fixes it
[19:15] pieterh	sustrik_ steve's traveling right now
[19:16] pieterh	we'll pipeline it all
[19:16] sustrik	or, alternatively, you can branch from a historic version
[19:16] pieterh	pgm fixes can come in after we branch
[19:16] pieterh	it's good to have known issues so we can prove that the process works
[19:16] sustrik	right, you can branch a backport the ix
[19:16] sustrik	fix
[19:16] pieterh	yes
[19:17] pieterh	do you want to push any code before I clone the repo?
[19:17] sustrik	no
[19:17] sustrik	do it now
[19:17] pieterh	okay... going for it :-)
[19:17] sustrik	amacleod: upgrading should not be painful
[19:17] sustrik	trying to fix the problem in 2.0.10 is going to be painful
[19:18] amacleod	Hm, I think you are right.
[19:18] sustrik	it's a problem with shutdown subsystem
[19:18] sustrik	which was pretty creaky in 2.0.x
[19:18] sustrik	it was one of the major reasons for making 2.1
[19:18] amacleod	So, which version should I get? The 2.1.0 package from the front page?
[19:19] sustrik	yes
[19:24] pieterh	hmm, anyone know how to clone a github repository and _keep_ it in the same organization?
[19:24] pieterh	I'm sure I'm missing something obvious...
[19:33] pieterh	hmm, git push -u, obviously... duh
[19:40] pieterh	sustrik_: okaaay, I think that's done... we now have http://github.com/zeromq/zeromq2-1
[19:40] pieterh	just the master branch and no tracking between the two gits, I hope
[19:43] pieterh	sustrik_: next step is to make release notes (you) and then packages (me)
[19:49] sustrik	ok
[19:49] sustrik	let me see
[19:53] pieterh	I suggest we edit the NEWS together at http://piratepad.net/, then commit to (the real) master
[19:54] sustrik	yes
[19:54] sustrik	wait a sec
[19:56] pieterh	I'd like to make 2-3 release candidates over 2-3 weeks
[19:56] pieterh	http://piratepad.net/dkenb33ThF
[19:57] pieterh	I think we have enough momentum to get rapid feedback on releases
[19:58] pieterh	the one problem I see with this approach is we don't get a branch for the stable release, automatically, in the real git
[19:59] pieterh	s/branch/tag/
[19:59] pieterh	mato's going to cut my throat...
[20:05] sustrik	what's the problem?
[20:05] sustrik	it's DCVS
[20:05] sustrik	so it shouldn't matter whether it's a branch or a separate repo
[20:05] sustrik	DVCS*
[20:05] sustrik	pieterh: still there
[20:05] pieterh	ok, allow for the fact that any gross git manipulations leave me nervous
[20:05] sustrik	?
[20:06] pieterh	I find the tool 10x too complex and dangerous, so...
[20:06] pieterh	I'd much prefer to work with a copy of the repository (not a clone or a fork)
[20:06] pieterh	advantages: anyone can imitate this, make releases, safely
[20:06] sustrik	you mean you've created a new repo?
[20:06] pieterh	yes
[20:07] sustrik	i.e. deleted the entire history?
[20:07] pieterh	http://github.com/zeromq/zeromq2-1
[20:07] pieterh	I was able to copy the master branch
[20:07] pieterh	which is fine for my purposes (stabilization)
[20:07] sustrik	github seems to be dead
[20:07] pieterh	it wasn't me!!!!!
[20:07] sustrik	:)))
[20:08] pieterh	i'd much prefer to work with readonly access to the real repository
[20:08] sustrik	let's move to piratepad now
[20:08] pieterh	but problem is, separate repository breaks the neat history of version tags in the real repo
[20:09] pieterh	piratepad seems unreliable right now...
[20:09] sustrik	yuck
[20:09] cremes	ok, i can reproduce the memory "leak"; it is somewhat complicated and it's not a real leak... it's more like a DDOS
[20:09] sustrik	anyway, there are just 2 items
[20:09] sustrik	ZMQ_RECONNECT_IVL_MAX
[20:09] sustrik	and
[20:09] sustrik	ZMQ_RECOVERY_IVL_MSEC
[20:09] sustrik	+ the bug fixes
[20:10] sustrik	you can get description for both from zmq_setsockopt(3)
[20:10] pieterh	indeed
[20:11] pieterh	I'll do that, np
[20:11] sustrik	it's easy this time
[20:11] pieterh	"An automatic restart of Piratepad will occur in -31945 seconds."
[20:11] pieterh	good god
[20:11] sustrik	they are pirates, you know
[20:11] pieterh	We killed Piratepad...
[20:11] sustrik	not exactly precise
[20:11] pieterh	Well, Google bought etherpad and then shut it down...
[20:12] sustrik	anyway
[20:12] sustrik	anything else missing for making the stable branch?
[20:13] pieterh	not for me
[20:13] sustrik	good
[20:13] pieterh	but did you understand my concern?
[20:13] sustrik	which one?
[20:13] pieterh	we're breaking the master/maint process
[20:14] pieterh	smashing it into little pieces
[20:14] sustrik	yes
[20:14] sustrik	the problem is the maint was not really well maintained anyway
[20:14] pieterh	since I personally dislike that process and find it complex and bizarre, I'm happy with that
[20:15] pieterh	I'd like a process that takes 30 seconds to fully understand
[20:15] pieterh	on a cold monday morning before coffee
[20:15] sustrik	i think it's the same now
[20:16] sustrik	patches being passed between branches
[20:16] sustrik	ah, one point
[20:16] sustrik	have a look how version numbers are to be maintained
[20:16] sustrik	it's important not to mess that up
[20:17] pieterh	I still don't remember how changes flow from maint to master and vice-versa
[20:17] pieterh	so if this works, what we'll get are a series of stand-alone gits, zeromq2-1, zeromq2-2, zeromq3-0, each with their own maintainer(s)
[20:17] pieterh	and pull requests with patches between them
[20:17] pieterh	each being copied off the real repo as that heads towards stability
[20:18] sustrik	it works both ways
[20:18] pieterh	the version numbering works properly IMO
[20:18] amacleod	:-( Now jzmq tests are hung forever in Context.finalize
[20:18] pieterh	next releases will be 2.1.1, 2.1.2, 2.1.3...
[20:18] sustrik	1. backporting patches from master to maint(s)
[20:18] sustrik	2. upstreaming patches from maint(s) to master
[20:19] pieterh	I'd assume 2.1.3 will be the stable one if people find bugs
[20:19] pieterh	yes
[20:19] pieterh	I'd veto that
[20:19] pieterh	upstreaming patches from temporary clones of maint(s) to master
[20:19] pieterh	at least in my view... for now...
[20:19] pieterh	anyhow, yes, if it makes sense
[20:19] pieterh	DCVS as you said
[20:20] pieterh	each repo is largely independent and we work by protocol
[20:20] sustrik	well, if you bugfix a patch for maint, you want to upstream it
[20:20] sustrik	only thing it means is sending it to the ML
[20:20] sustrik	no big deal
[20:21] pieterh	if the bug is in the current master, I'd first want to get it fixed there
[20:21] pieterh	because until it's gone past that filter, there's no guarantee it's sane
[20:21] pieterh	if the bug is in old code, then there's no upstreaming anyhow
[20:21] sustrik	it's up to you
[20:22] sustrik	so you're going to reject patches to maint
[20:22] sustrik	right?
[20:22] pieterh	presumably, yes
[20:22] sustrik	ok
[20:22] pieterh	or at least treat them with scepticism
[20:22] pieterh	the only person I trust for my patches is you
[20:22] sustrik	ok, now for the versioning
[20:22] pieterh	especially, especially on a stable release people rely on for production
[20:22] pieterh	versioning... ok
[20:23] sustrik	the process looks like this:
[20:23] pieterh	s/you/the owner of the code in question/
[20:23] sustrik	when about to make a release:
[20:23] sustrik	1. update the version numbers in zmq.h
[20:23] sustrik	2. make a release
[20:23] sustrik	3. bump the version number
[20:24] pieterh	right, http://www.zeromq.org/docs:procedures
[20:24] sustrik	yes
[20:24] sustrik	exactly the same process
[20:24] pieterh	this would happen on the release git only
[20:24] sustrik	are you saying you are not going to version the maint branch?
[20:25] sustrik	that's nonsense
[20:25] pieterh	as far as I'm concerned, there is no maint branch
[20:25] sustrik	maint repo
[20:25] sustrik	no versions?
[20:25] pieterh	ok, the terms are vital here
[20:25] pieterh	let's call it the release git
[20:25] pieterh	as compared to the master git
[20:25] sustrik	whatever
[20:25] sustrik	what about versions?
[20:25] pieterh	I'm going to tag the release git properly
[20:26] pieterh	and version it properly, exactly as now
[20:26] sustrik	change zmq.h as well
[20:26] pieterh	exactly the same, but it all happens on my master branch
[20:26] sustrik	otherwise users won't be able to find out what the version is
[20:26] sustrik	(zmq_version(), version macros)
[20:26] pieterh	yes, all that is unambiguous
[20:27] pieterh	the versioning is sane, obvious, necessary
[20:27] sustrik	right, 3 steps
[20:27] sustrik	tag
[20:27] sustrik	release
[20:27] sustrik	change version
[20:27] pieterh	yes
[20:27] sustrik	that's it
[20:27] sustrik	ok
[20:28] pieterh	and the final step is update the version in the development master
[20:28] sustrik	so, i'm going to bump master to 2.2
[20:28] pieterh	so it goes from 2.1.1 to 2.2.0
[20:28] sustrik	yes, i'll do that now
[20:28] pieterh	very excellent...!
[20:28] pieterh	ok, I'm going to make a release now
[20:28] pieterh	no time like the present
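
The versioning steps discussed above (set the version in the header, release, then bump) can be sketched roughly as follows. This is a minimal illustration only: the `DEMO_*` macros and `demo_version()` are hypothetical stand-ins loosely modeled on the version macros and `zmq_version()` that sustrik mentions, not the actual contents of zmq.h.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the version numbers kept in a public header.
   Step 1 of the release process edits these three values. */
#define DEMO_VERSION_MAJOR 2
#define DEMO_VERSION_MINOR 1
#define DEMO_VERSION_PATCH 1

/* Pack the three parts into one integer so versions can be compared
   at compile time (assumes minor and patch stay below 100). */
#define DEMO_MAKE_VERSION(major, minor, patch) \
    ((major) * 10000 + (minor) * 100 + (patch))

#define DEMO_VERSION \
    DEMO_MAKE_VERSION(DEMO_VERSION_MAJOR, DEMO_VERSION_MINOR, DEMO_VERSION_PATCH)

/* Runtime query: reports the version compiled into the library, which may
   differ from the header the caller was built against. */
void demo_version(int *major, int *minor, int *patch)
{
    *major = DEMO_VERSION_MAJOR;
    *minor = DEMO_VERSION_MINOR;
    *patch = DEMO_VERSION_PATCH;
}

/* Print the version in the usual dotted form. */
void demo_print_version(void)
{
    int major, minor, patch;
    demo_version(&major, &minor, &patch);
    printf("version %d.%d.%d\n", major, minor, patch);
}
```

With this layout a user can guard code at compile time, for example `#if DEMO_VERSION >= DEMO_MAKE_VERSION(2, 2, 0)`, which is why the header has to be updated as part of every release and not only the git tag.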
[20:29] sustrik	cremes: sorry for the delay
[20:29] sustrik	have you found out what's the problem?
[20:29] pieterh	amacleod: sounds like there's a socket left open
[20:29] pieterh	amacleod: sorry also for the delay
[20:30] amacleod	Terminating a context will not finish if a socket is left open?
[20:31] pieterh	amacleod: not in every case, but in some cases
[20:31] pieterh	this is new behavior in 2.1
[20:31] pieterh	a side-effect of zmq_term's determination to flush sockets safely
[20:32] pieterh	if the socket is held by the same thread that calls zmq_term, it deadlocks (or something)
[20:32] amacleod	Hmm, yeah. It did look as though the Java finalizer was waiting on some things.
[20:36] pieterh	sustrik: s/untill/until/ in zmq_setsockopt.txt...
[20:37] sustrik	would you fix it in maint and upstream it to master or should i fix it in master and you'll backport it to maint :)
[20:38] pieterh	For a one-word fix... let me clone the git and send you a pull request with regression test case
[20:38] pieterh	yeah :-) process!
[20:40] sustrik	i love it :)
[20:41] pieterh	what do you call sys://log? do we want to document that briefly?
[20:41] pieterh	is it an internal transport?
[20:41] cremes	sustrik: yes, i've discovered a problem; i am just finishing a second test of my hypothesis.... i'll share it in about 10m
[20:41] sustrik	good god, everything is shutting down
[20:41] pieterh	someone dropped the Internet!
[20:42] sustrik	i suspect i'm in Libya
[20:42] pieterh	I knew they should have left it safe with the Elders of the Internet at the top of Big Ben
[20:43] pieterh	sustrik: funny, a friend of mine was supposed to be doing a rally in Libya just about now...
[20:43] sustrik	let's hope he's not there really
[20:43] pieterh	indeed, I asked him like two weeks ago, "you going before or after the revolution?", he lol'd
[20:44] fbarriga	hi everyone
[20:44] fbarriga	I have a quick question
[20:45] sustrik	yuck, github is totally braindead
[20:45] sustrik	repos as well as the site
[20:45] pieterh	fbarriga: hi, what's up?
[20:45] fbarriga	hi pieterh , playing with the socket
[20:46] fbarriga	this probably deserves an RTFM reply..
[20:46] fbarriga	if I do this: zmq_msg_init(&msg); zmq_recv(sock, &msg, 0); zmq_recv(sock, &msg, 0); zmq_msg_close(&msg);
[20:46] fbarriga	am i leaking memory?
[20:46] cremes	sustrik: apparently i can reproduce a memory ddos with 0mq sockets; here's how (there may be a simpler way but this is how i did it)
[20:46] fbarriga	does zmq_recv allocate new memory every time?
[20:47] cremes	setup a standard forwarder
[20:47] pieterh	fbarriga: yes, that works fine
[20:47] fbarriga	and is it mandatory to init the msg before receiving the data?
[20:47] pieterh	yes
[20:47] cremes	connect a publisher to it and let it go crazy broadcasting (bigger messages are better, e.g. 20k+)
[20:47] cremes	so far, no leak
[20:47] cremes	now start adding subscribers
[20:47] fbarriga	umm, but if I have a while(true), can't I put the init outside to avoid overhead?
[20:48] cremes	again, no real leak except for a little queueing if the subscribers are slower than the pub
[20:48] pieterh	fbarriga: close and init are a pair
[20:48] cremes	i adjusted my publisher so that it would not overrun the subscribers
[20:48] cremes	now here's where the leak comes in....
[20:49] cremes	modify the subscribers so that they only stay connected for a few seconds; they close their sockets, and then immediately reconnect to the forwarder
[20:49] fbarriga	so I can do this: zmq_msg_init(&msg); while (true) { zmq_recv(sock, &msg, 0); } zmq_msg_close(&msg); ?
[20:49] cremes	they continue to do this for at least 15 minutes
[20:49] cremes	what i see happening is a lot of sockets go into TIME_WAIT
[20:49] cremes	i suspected that the socket ZMQ_LINGER option (defaults to indefinite) was holding onto the messages
[20:50] cremes	but i just ran a test where i set it to 0 and it still occurred
[20:50] cremes	so i'm thinking that as a socket sits in TIME_WAIT, its queue is still active; as new messages arrive from the publisher they are added to the queue
[20:50] sustrik	do the subscribers have identities?
[20:50] cremes	when the TIME_WAIT expires, the queued data doesn't go away
[20:51] cremes	they are using the default random id's; i am not overriding with my own
[20:51] pieterh	fbarriga: nope, it'll leak memory
[20:51] pieterh	if you have positive proof that init/close are slow, send it along
[20:51] cremes	when i stop the publisher, the forwarder's memory stops growing
[20:51] pieterh	if you don't then please don't optimize unnecessarily, it's pointless
[20:52] cremes	when i disconnect the publisher and all subscribers, the forwarder's memory footprint remains the same (big)
[20:52] cremes	so if i left it running (like for an overnight test) it would exhaust all memory resources by morning
[20:52] sustrik	cremes: afaik processes don't return allocated memory to the OS
[20:52] cremes	and this is exactly what has been plaguing me the last week or so (since i found that last bug!)
[20:53] pieterh	cremes: does the forwarder memory grow again from that high point ?
[20:53] pieterh	i.e. leak vs. cached heap in process...?
[20:53] cremes	pieterh: it only grows again if i turn the publisher back on
[20:53] pieterh	but does it start growing again immediately?
[20:54] cremes	let's see....
[20:54] pieterh	if it's cached heap memory then after a few cycles it will stop growing
[20:55] fbarriga	pieterh, but hypothetically calling those functions will slow down the loop
[20:55] pieterh	fbarriga: doing hypothetical optimization is a waste of time, please don't
[20:55] cremes	pieterh: no, it doesn't start growing again right away
[20:56] pieterh	cremes: then it's cached heap memory, not a leak
[20:56] pieterh	fbarriga: if you wish to actually know, try calling zmq_msg_init() and zmq_msg_close() in a loop, 1 billion times
[20:56] pieterh	measure how many seconds it takes
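
The measurement pieterh suggests can be sketched as a small timing harness. This is only a sketch under stated assumptions: `demo_msg_t`, `demo_msg_init()` and `demo_msg_close()` are hypothetical stand-ins so the example compiles without libzmq. To measure the real cost, swap in `zmq_msg_t`, `zmq_msg_init()` and `zmq_msg_close()` and link against the library.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for a message object; NOT the real zmq_msg_t. */
typedef struct {
    void *data;
    size_t size;
} demo_msg_t;

/* Stand-in init: start with an empty message. */
static int demo_msg_init(demo_msg_t *msg)
{
    msg->data = NULL;
    msg->size = 0;
    return 0;
}

/* Stand-in close: release whatever the message holds. */
static int demo_msg_close(demo_msg_t *msg)
{
    free(msg->data);
    msg->data = NULL;
    msg->size = 0;
    return 0;
}

/* Run `iterations` init/close pairs and return the CPU time used, in seconds. */
double time_init_close(long iterations)
{
    demo_msg_t msg;
    clock_t start = clock();
    for (long i = 0; i < iterations; i++) {
        demo_msg_init(&msg);
        demo_msg_close(&msg);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_init_close(1000000000L)` and dividing the result by the iteration count gives a per-pair cost that can be compared with the time spent receiving and processing one message. Note that an optimizing compiler may remove most of the stand-in work, so the numbers only mean something once the real library calls are substituted.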
[20:56] cremes	so this is just due to the heap growing really large and it never gives the buffer space back to the OS even if the buffer is empty
[20:57] cremes	?
[20:57] pieterh	cremes: yup
[20:57] CIA-21	jzmq: Gonzalo Diethelm master * ra791958 / (.gitignore configure.in): Changes by jrideout to allow the build to succeed with autoconf 2.59. - http://bit.ly/fypsls
[20:57] pieterh	if you have a normally peaky stream, it'll hit some max and stop growing
[20:57] pieterh	there may be a system call to return memory to the OS
[20:57] pieterh	but it's always virtualized anyhow, not real RAM, afaik
[20:58] pieterh	sustrik: is the inproc hwm+swap change worth mentioning in the release notes?
[20:59] sustrik	dunno
[20:59] sustrik	it's more of a fix
[20:59] sustrik	than a new feature
[21:01] pieterh	basically the hwm is shared between peers, right?
[21:03] sustrik	it's sum of the two hwms
[21:05] pieterh	the peers share a single buffer...
[21:08] pieterh	sustrik: http://edupad.ch/2tZLebJbTg is the NEWS for 2.1.1...
[21:08] pieterh	I've gone through the git log
[21:10] sustrik	you've killed the edupad as well!
[21:11] sustrik	isn't it an IPv6 day today?
[21:15] pieterh	sustrik: let's do that tomorrow... update your NEWS, send me a pull request
[21:15] pieterh	when we get that working, I'll make the release
[21:16] sustrik	ok
[21:19] cremes	i'm pretty convinced there is still a leak here... new information
[21:19] cremes	slowing the publisher to every 10ms and setting HWM for all sockets to 1 and LINGER to 0
[21:19] cremes	it still slowly grows in size
[21:20] cremes	there shouldn't be anything getting queued or buffered here
[21:20] sustrik	ack
[21:20] sustrik	can you provide the test program?
[21:20] cremes	the "cause" appears to be the subscribers connecting/disconnecting
[21:21] sustrik	yes, looks like
[21:21] cremes	sustrik: i can but it will be in ruby; i'll describe the steps so it could be replicated in another lang
[21:21] pieterh	hmm, sustrik, maybe you really are in Libya...
[21:23] pieterh	sustrik: does the HWM on a PUB socket affect each subscriber queue (over TCP)? so potentially N x HWM?
[21:23] sustrik	cremes: ok
[21:23] sustrik	yes
[21:23] sustrik	pieterh: yes
[21:24] pieterh	thx
[21:39] sustrik	pieterh: still there
[21:39] sustrik	?
[21:39] sustrik	one more important thing...
[21:40] pieterh	yeah, still here
[21:41] pieterh	hmm?
[21:41] sustrik	given that there are two separately maintained repos now
[21:42] sustrik	we should keep the patches completely clean
[21:42] sustrik	i.e. no whitespace patching
[21:42] sustrik	no mixed patches
[21:42] sustrik	like fixing a bug and correcting an unrelated typo in a single patch
[21:42] sustrik	etc.
[21:42] pieterh	you mean so we can re-port the history anywhere?
[21:42] sustrik	otherwise we'll end up in dependency hell
[21:43] sustrik	yes
[21:43] pieterh	right
[21:43] sustrik	keeping patches mobile
[21:43] pieterh	so the way a pull request works is you specify a commit (or multiple commits)
[21:43] pieterh	this may not work all the time
[21:44] sustrik	?
[21:50] pieterh	clearly if we can keep patches fully mobile on the development master, that's ideal
[21:50] pieterh	makes it fairly easy to send them anywhere
[21:50] pieterh	we may want to add more process support, e.g. something greppable in the log
[21:50] pieterh	in the worst case we can apply patches by hand
[21:51] pieterh	well, some situations I can see happening
[21:51] pieterh	- bug hits in 2.1 but that code has changed in 2.2
[21:51] pieterh	- patch in 2.2 doesn't apply cleanly in 2.1
[21:52] pieterh	- 2.1 needs only part of a patch made to 2.2
[21:52] pieterh	etc.
[21:55] sustrik	git has support for that
[21:55] amacleod	I am getting a different assertion error from jzmq tests now: Unknown error 156384765, rc == 0 (src/socket_base.cpp:243)
[21:55] sustrik	it's called merging
[21:55] sustrik	so you apply the patch
[21:55] sustrik	you get merge conflicts
[21:55] sustrik	you solve them
[21:55] sustrik	you commit the patch
[21:55] sustrik	it's still the same patch, but it has been backported
[21:56] sustrik	the backport iirc is visible as a separate commit
[21:57] sustrik	amacleod: what version?
[21:57] pieterh	sustrik: ok, we'll play it through
[21:58] amacleod	sustrik, v2.1.0, from tarball on http://www.zeromq.org/intro:get-the-software
[22:00] amacleod	Looks like that line reference is in zmq::socket_base_t::getsockopt
[22:01] sustrik	amacleod: yes, it's a bug
[22:01] sustrik	it's solved in head
[22:01] amacleod	Ok. Guess I should get head, then, eh?
[22:01] sustrik	pieterh: when are you going to release 2.1.1?
[22:02] sustrik	amacleod: either get the head or wait for 2.1.1
[22:02] amacleod	I will try with head.
[22:10] pieterh	sustrik: please review http://edupad.ch/2tZLebJbTg and add to your master if you're happy with it
[22:10] pieterh	date is set for tomorrow
[22:10] pieterh	send me a pull request for that change, and I'll process it, and then make the release
[22:10] pieterh	tomorrow morning after coffee
[22:10] pieterh	does that work?
[22:10] pieterh	I'd like to split off "Making a Release" from the "Source Code" page, it's two separate topics
[22:10] pieterh	and this lets me document how to make a release, as a naive and not very expert person
[22:10] pieterh	which is ideal
[22:11] pieterh	ok, away
[22:12] pieterh	g'nite everyone
[22:12] amacleod	Nite Pieter.
[22:15] sustrik	what change?
[22:16] sustrik	edupad times out here btw
[22:16] sustrik	good night anyway
[22:21] amacleod	Do pollers need to be closed or finalized in some way before terminating the context?
[22:28] rphillips	is there a way to set a zeromq socket up to timeout on a zmq_send?
[22:28] ianbarber	you could poll for a socket being writeable, with a timeout
[22:32] rphillips	Do I need to check for writability after the connect or after the send?
[22:32] rphillips	the send is blocked already
[22:44] sustrik	before the send