ZeroMq IRC Log

Thursday October 14, 2010

[Time] Name	Message
[05:51] deri	i enjoyed reading the zeromq guide. a couple of the takeaways I got are to focus on the 1-to-N fan out principle, and also setting up the REP-REP "pinch points". i still need to sit down and think things through beyond that though.
[06:38] CIA-14	zeromq2: Martin Sustrik master * re2167ce / src/zmq.cpp :
[06:38] CIA-14	zeromq2: Precise timeouts in zmq_poll implemented
[06:38] CIA-14	zeromq2: Signed-off-by: Martin Sustrik <sustrik@250bpm.com> - http://bit.ly/bfyFvq
[06:38] CIA-14	zeromq2: Martin Pales master * rda73b7c / (AUTHORS src/devpoll.cpp):
[06:38] CIA-14	zeromq2: zmq::devpoll_t : correct a typo in loop()
[06:38] CIA-14	zeromq2: A minor typo correction to resolve compilation error on Solaris.
[06:38] CIA-14	zeromq2: Signed-off-by: Martin Pales <m.pales@gmail.com> - http://bit.ly/a57V6q
[07:17] CIA-14	zeromq2: Martin Sustrik master * rb174ad2 / doc/zmq_poll.txt :
[07:17] CIA-14	zeromq2: zmq_poll man page fixed to reflect the precise timeout semantics.
[07:17] CIA-14	zeromq2: Signed-off-by: Martin Sustrik <sustrik@250bpm.com> - http://bit.ly/cE8zjz
[07:35] CIA-14	zeromq2: Martin Sustrik master * rcafcdbb / src/zmq.cpp :
[07:35] CIA-14	zeromq2: Safety measure in zmq_msg_close implemented
[07:35] CIA-14	zeromq2: zmq_msg_close now empties the message on zmq_msg_close, thus not
[07:35] CIA-14	zeromq2: leaving random data in the structure, that may be mistaken for
[07:35] CIA-14	zeromq2: a valid message.
[07:35] CIA-14	zeromq2: Signed-off-by: Martin Sustrik <sustrik@250bpm.com> - http://bit.ly/aBQ9ez
[10:02] CIA-14	zeromq2: Martin Pales master * rf9e6d94 / src/poller_base.cpp :
[10:02] CIA-14	zeromq2: zmq::poller_base_t : workaround for sunstudio compiler in add_timer()
[10:02] CIA-14	zeromq2: A minor workaround to resolve compilation error with sunstudio compiler,
[10:02] CIA-14	zeromq2: which does not yet support member templates for std::multimap.
[10:02] CIA-14	zeromq2: Signed-off-by: Martin Pales <m.pales@gmail.com> - http://bit.ly/c21Vwa
[10:15] CIA-14	zeromq2: Martin Sustrik master * rb7386f5 / (4 files):
[10:15] CIA-14	zeromq2: To insert to associative STL containers value_type used instead of make_pair
[10:15] CIA-14	zeromq2: Signed-off-by: Martin Sustrik <sustrik@250bpm.com> - http://bit.ly/aafuBG
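
The change described in this commit can be sketched as follows. This is a minimal illustration, not the committed code: timers_t and add_entry are hypothetical names. The likely motivation, an assumption based on the preceding Sun Studio commit, is that make_pair yields a pair<int, std::string>, which must be converted to the container's pair<const int, std::string> through a templated constructor, while value_type is already the exact element type.

```cpp
#include <map>
#include <string>

// Hypothetical container type used only for this illustration.
typedef std::multimap<int, std::string> timers_t;

// Insert using the container's own value_type, so no converting
// (member template) pair constructor is required.
inline timers_t::iterator add_entry (timers_t &timers, int expiry,
    const std::string &name)
{
    return timers.insert (timers_t::value_type (expiry, name));
}
```

A multimap keeps entries with equal keys, so two entries added with the same expiry are both retained.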
[11:06] CIA-14	zeromq2: Gonzalo Diethelm maint * r26d7669 / .gitignore : Added bin directory to ignore list. - http://bit.ly/ceMfu6
[11:16] CIA-14	jzmq: Gonzalo Diethelm master * r04603ed / (12 files in 4 dirs):
[11:16] CIA-14	jzmq: All socket options are now 64 bits.
[11:16] CIA-14	jzmq: Enabled some socket options only from version 2.1.0, at compile and run time.
[11:16] CIA-14	jzmq: Added version functions to Java binding.
[11:16] CIA-14	jzmq: Changed several file modes back to 644. - http://bit.ly/d82VKX
[12:25] mato	sustrik: are you there?
[12:26] sustrik	mato: hi
[12:26] mato	sustrik: you ignored my comment about writing a test_poll, why?
[12:27] mato	sustrik: then you commit the patch and ask "please check if everything works" :-(
[12:27] sustrik	i did it by accident
[12:27] sustrik	to prevent myself from embarrassment i've announced it as if it was intended afterwards :)
[12:27] mato	what was intended?
[12:28] sustrik	i've had a git tree with the patch applied
[12:28] mato	what patch?
[12:28] sustrik	zmq_poll one
[12:28] sustrik	then there was another patch on the mailing list
[12:28] sustrik	so i've applied it
[12:28] mato	huh?
[12:29] sustrik	forgetting that there is the zmq_poll patch already applied in that repo
[12:29] mato	i thought you wrote the zmq_poll patch for timeouts? it has your name on it
[12:29] sustrik	yes, i did
[12:29] mato	so? what's this about another patch?
[12:29] sustrik	i'm explaining the accident
[12:29] sustrik	1. i have my local repo
[12:29] mato	what accident?
[12:30] sustrik	how the untested zmq_poll patch got into zeromq/zeromq2
[12:30] sustrik	2. i fix zmq_poll
[12:30] sustrik	3. i commit the fix so that i can do git format-patch
[12:30] sustrik	4. i send the patch to the ML
[12:30] mato	oh, i see now
[12:30] sustrik	5. I am happy
[12:31] sustrik	6. another patch arrives on ML
[12:31] sustrik	etc.
[12:31] mato	ok, two things here
[12:31] mato	or, two different approaches to fix the problem
[12:31] mato	1. use local topic branches for your own work, so it's not on your 'master' branch at all
[12:32] mato	and/or 2. (the safest approach), use two separate clones of zeromq2 for your work
[12:32] mato	in other words, one clone for your maintainer hayt
[12:32] mato	*hat
[12:32] mato	where all you do is apply patches as they come, merge branches, and push to github
[12:32] sustrik	i'm doing 2
[12:32] mato	and a separate clone for your contributor hat
[12:33] sustrik	but accidents happen
[12:33] sustrik	i have to think of my local naming convention to make it safe...
[12:33] mato	git clone ...
[12:33] mato	mv zeromq2 zeromq2-integration
[12:33] mato	git clone ...
[12:34] mato	problem solved :-)
[12:34] sustrik	something like that
[12:34] mato	alternatively, clone your personal zeromq2 repo from the git:// url
[12:34] mato	then you can't push to origin at all
[12:34] sustrik	up to now i was naming the clones randomly
[12:34] sustrik	which produced the accident
[12:34] mato	yeah, well, there are many ways to do it...
[12:34] sustrik	np, i'll think of something
[12:35] mato	ok, another question...
[12:36] mato	how much effort would it be to get SO_LINGER implemented?
[12:36] sustrik	no idea
[12:36] sustrik	you've seen what the shutdown code looks like
[12:36] sustrik	one day at most for coding
[12:37] sustrik	unspecified time to fix the resulting bugs
[12:41] mato	sustrik: well, the problem is we have no clean "abort" path right now
[12:41] sustrik	ack
[12:41] mato	sustrik: in other words, say a 0MQ-based API sends off some messages
[12:41] sustrik	i know what you mean
[12:41] mato	sustrik: then some timeout hits, or whatever... I can't get rid of a socket with messages pending on it
[12:42] sustrik	you can close it
[12:42] mato	yeah, but it'll hang around forever
[12:42] sustrik	it will be there but invisible to you
[12:42] mato	and term will block
[12:42] sustrik	right
[12:42] sustrik	right
[12:42] mato	plus the socket will reconnect, or whatever...
[12:42] sustrik	sure
[12:42] sustrik	:)
[12:42] mato	so to complete the semantics, some kind of SO_LINGER or at least "get rid of this socket now" thingy is required
[12:43] mato	otherwise it's not sane...
[12:43] sustrik	ack
[12:43] pieterh	mato: good news, I just tried git am on a mailbox file made by copy/paste of 'original email' in gmail and it works perfectly
[12:44] mato	pieterh_: good for you :-)
[12:44] mato	pieterh_: also there's some way to get emails out of gmail via IMAP
[12:45] mato	pieterh_: so you could have a process where you move patches to apply into a "Patches to apply" folder
[12:45] mato	pieterh_: then have a command line pipe that grabs that folder and shoves it into git-am
[12:45] mato	pieterh_: no cut/paste involved...
[12:49] mato	sustrik: anyhow, what about that test_poll? i pointed you to brian's tests which should be trivial to port to C++...
[12:49] sustrik	sure, go on
[12:49] mato	:-)
[12:50] sustrik	:o)
[12:50] mato	I was kind of hoping since you're mucking with the code that you'll do it
[12:50] mato	but I see you obviously don't believe in tests :-)
[12:50] sustrik	the problem is that it's not easy to check all the paths in the poll implementation
[12:50] mato	start somewhere
[12:50] sustrik	you have to generate internal events somehow etc.
[12:51] mato	ok, i get it
[12:51] mato	since sustrik says "it's too hard" :-)
[12:51] sustrik	let's rather start with "what has to be tested"
[12:52] mato	did you look at brian's test scripts at all?
[12:52] mato	they're pretty good
[12:52] sustrik	nope, where can i find them?
[12:52] mato	I wrote you an email
[12:52] mato	with the URL :-)
[12:52] mato	sustrik: http://github.com/zeromq/pyzmq/blob/master/zmq/tests/test_poll.py
[12:53] sustrik	hm, there are no timeouts used there afaics
[12:53] mato	sure, but it's a start
[12:55] sustrik	ok, i'll give it a try once i have some time free
[12:55] mato	sure, if i have time i'll ping you and look at it myself if you've not started on it
[12:55] sustrik	ok
[13:01] jason	When using socket identity for durable sockets is there a way to query the sending socket to see what messages still haven't been received?
[13:01] pieterh	jason__, nope
[13:19] drbobbeaty	I'm using ZMQ 2.0.7 and ran into this error message on a ZMQ_SUB socket using the "epgm://" transport (OpenPGM) and wanted to know if anyone had seen this before: The error says:
[13:20] drbobbeaty	(process:10408): Pgm-WARNING **: peer expired, tsi 26.117.130.238.238.254.49517
[13:21] drbobbeaty	The ZMQ messages didn't stop, but I didn't know what to make of the error.
[13:22] drbobbeaty	As a side note, is there any targeted release of ZMQ that will incorporate the new OpenPGM with the better communication between OpenPGM and ZMQ? (Pieter H mentioned it when he was here giving a talk)
[13:40] mikko	hmm
[13:40] mikko	i was looking at adding ICC builds for zeromq
[13:40] mikko	but it looks like it doesn't qualify for intel free tools
[13:40] sustrik	mikko: why not?
[13:41] mikko	http://software.intel.com/en-us/articles/non-commercial-software-faq/
[13:42] mikko	i am not sure if the second last question applies to me
[13:42] mikko	or wait
[13:43] mikko	does imatix charge for support? does that extend to people outside that organisation?
[13:43] mikko	no idea
[13:43] sustrik	i recall we had a free icc license once
[13:43] sustrik	let me ask intel guys about it
[13:45] sustrik	the text there seems to be nonsense
[13:45] sustrik	with such restrictions no one would qualify
[13:55] mikko	sustrik: true
[13:55] mikko	would be nice to add sun studio as well
[13:55] mikko	but that is as far as i know a free download
[13:56] sustrik	mikko: yes, it would be nice
[13:56] sustrik	it's up to you :)
[13:56] mikko	i'll make it happen
[13:57] mikko	i don't have internet at my new flat yet so might take longer
[13:58] sustrik	no haste
[13:59] sustrik	btw, how does hudson know when to rebuild?
[14:00] mikko	it polls SCM every 15 minutes and builds if there are changes
[14:01] sustrik	hm, jzmq was fixed this morning
[14:01] sustrik	hudson shows last failure 14hrs ago
[14:01] mikko	currently it polls only zeromq
[14:01] mikko	the bindings are not being polled
[14:01] mikko	i could add polling for individual bindings as well
[14:02] mikko	currently it polls zeromq2 master and maint branches and builds if there are changes
[14:02] mikko	all bindings are built as dependent projects
[14:03] mikko	my initial thinking is that eventually it would do a lot of polling if everything polled
[14:07] mikko	logging in allows you to configure / manually kick off builds
[14:07] mikko	but can't really open it to everyone as people can execute arbitrary shell commands
[14:07] sustrik	mikko: sure
[14:09] sustrik	nice, jzmq/maint is now ok
[14:09] mikko	i could disable erlzmq/maint build
[14:09] mikko	as it will always fail
[14:09] sustrik	yes, please
[14:09] sustrik	erlzmq doesn't work with maint
[14:10] mikko	0%
[14:10] mikko	ermm
[14:10] mikko	zeromq perl needs work on both branches
[14:10] sustrik	seen it
[14:10] sustrik	lestrrat, summon!
[14:11] sustrik	hm, it's past midnight in japan
[14:11] sustrik	never mind
[14:32] CIA-14	zeromq2: Martin Pales master * r03a18c2 / src/clock.cpp :
[14:32] CIA-14	zeromq2: zmq::clock_t : return correct value in rdtsc() on solaris
[14:32] CIA-14	zeromq2: Function clock_t::rdtsc() now returns correct value when compiled
[14:32] CIA-14	zeromq2: with sunstudio 12 compiler.
[14:32] CIA-14	zeromq2: Signed-off-by: Martin Pales <m.pales@gmail.com> - http://bit.ly/bBjhV0
[15:04] mikko	hmm
[15:04] mikko	sun studio doesn't seem to do the trick out of the box
[15:17] mikko	ah
[15:17] mikko	got it
[15:19] mikko	https://gist.github.com/909b0835608e559e3fb8
[15:19] mikko	this is what i get with sun studio
[15:21] delaney	is there any way for a XREP to work with http over tcp? i can get requests but how would i send them back given the request doesn't have a zmq envelope?
[15:22] mikko	delaney: not without creating a wrapper
[15:22] cremes	delaney: 0mq doesn't handle that... you should look at the mongrel2 project: http://mongrel2.org/home
[15:23] delaney	gotcha, thanks!
[15:28] mikko	a lot of errors are eliminated by removing _GNU_SOURCE definition
[15:43] mikko	mato: are you there? i got a build related patch / idea
[15:50] mato	mikko: yes?
[15:50] mikko	https://gist.github.com/b78863aa72bf01f0f1a7
[15:51] mikko	you reckon this is OK?
[15:51] mikko	it fixes the build on my sun studio installation
[15:51] mikko	_GNU_SOURCE seems to define a lot of stuff in headers that's not supported by non-gnu compilers
[15:51] mato	hmm
[15:52] mikko	im yet to test ICC
[15:52] mato	yeah, and i guess sun studio does not try to specifically be GCC-compatible
[15:52] mato	ICC when I last looked tried quite a bit harder to be compatible with GCC where possible
[15:52] mikko	ill send the patch to mailing-list after ICC tests
[15:52] mato	Linux is generally quite forgiving of absence or presence-of feature flags so that should be fine
[15:53] mato	yes please, test, etc....
[15:53] mikko	http://gist.github.com/626438
[15:54] mikko	sun studio gives a couple of those as well
[15:54] mato	hmm, dunno about that one, ask sustrik
[15:57] mikko	ICC build fails
[16:00] sustrik	hm, the worker routine should have C signature rather than C++ signature
[16:04] mikko	sustrik: see src/epoll.cpp line 141
[16:04] mikko	hmm
[16:04] mikko	nm
[16:05] sustrik	int n = epoll_wait (epoll_fd, &ev_buf [0], max_io_events,
[16:05] sustrik	timeout ? timeout : -1);
[16:06] mikko	yeah
[16:06] mikko	trying to figure out why i'm getting a compilation error on that line
[16:06] mikko	about signedness
[16:07] mikko	epoll_wait takes an int as last param?
[16:08] sustrik	yes
[16:08] sustrik	it should be explicitly cast to int, yes
[16:09] sustrik	otherwise it's uint64_t
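
The explicit cast sustrik mentions can be sketched as a small helper. This is a minimal illustration, not the zeromq2 source: the name to_epoll_timeout is hypothetical, and it assumes, as in the snippet above, that a timeout of 0 means there is nothing to wait for and should be mapped to -1 (block indefinitely). With a uint64_t timeout, the expression timeout ? timeout : -1 is itself of type uint64_t, which is why a compiler may complain about signedness when it is passed as an int.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: convert a 64-bit millisecond timeout into the
// int expected by epoll_wait. 0 is mapped to -1 (wait indefinitely);
// values too large for an int are clamped instead of wrapping around.
inline int to_epoll_timeout (std::uint64_t timeout_ms)
{
    if (timeout_ms == 0)
        return -1;
    const std::uint64_t max_int =
        static_cast<std::uint64_t> (std::numeric_limits<int>::max ());
    return static_cast<int> (timeout_ms < max_int ? timeout_ms : max_int);
}
```

The call would then read epoll_wait (epoll_fd, &ev_buf [0], max_io_events, to_epoll_timeout (timeout)), with both branches of the conversion already typed as int.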
[16:39] ptrb	so it's possible for a ZMQ_SUB to connect() to more than 1 ZMQ_PUB, but can we instead connect() on a ZMQ_PUB to more than one ZMQ_SUB?
[16:44] sustrik	sure
[16:54] mikko	hopefully the patches came through
[16:58] mikko	if not, they are here as well http://valokuva.org/~mikko/misc/
[16:58] mikko	now i gotta run, see you tomorrow (latest)
[17:01] ptrb	sustrik: when I did that, the HWM behavior on the ZMQ_PUB wasn't respected
[17:03] sustrik	mikko: cyl
[17:03] sustrik	ptrb: what have you observed exactly?
[17:04] ptrb	I still need to do the "simple complete reproducible example" step, but my experience was ZMQ_PUB with HWM=1 then connect()'ed to one or more ZMQ_SUB sockets (which were bind()ed) had the effect of HWM=0 (unlimited) when I started publishing shit
[17:06] sustrik	you mean the memory grew without limit?
[17:07] ptrb	correct
[17:10] sustrik	ptrb: looks like a bug
[17:12] ptrb	ok, let me make something reproducible, and if it reproduces, I'll file.. something.. somewhere
[17:29] sustrik	mato: can you approve mikko's patch no. 0001
[17:30] sustrik	oops, done, sorry
[18:00] mato	sustrik: beer o'clock?
[18:11] delaney	is http://api.zeromq.org/zmq.html down?
[18:14] mato	delaney: yeah, it looks like the ISP has some kind of outage
[18:14] mato	delaney: started about 15mins ago
[18:26] delaney	i know there is a high water mark per socket, but is there one per id?
[18:26] delaney	the api reference made a reference that makes it seem like there is
[18:27] delaney	but i couldn't find the way to set it
[18:46] cremes	delaney: high and low water marks are per socket; the socket identity doesn't have anything to do with it
[18:47] delaney	right but say you have 1000s of messages for a client and they don't come back?
[18:48] delaney	is there a way to tell zeromq, you can clear all the messages for 'Lucy'
[19:04] cremes	delaney: no
[20:30] delaney	cremes: so how would you deal with an environment where a high volume of transient clients may disconnect ungracefully and leave stuff on the queue basically forever?
[20:31] cremes	delaney: i should have written a bit more up above than "no" :)
[20:31] cremes	if you have a publisher putting out say 100 msgs/s and you have subscribers coming in and out all of the time, the subscribers
[20:31] cremes	who disconnect/close their sockets will cause the publisher to drop those messages
[20:32] delaney	and in a xreq/xrep setup?
[20:32] cremes	so internally you could look at it like each identity has its own queue
[20:32] cremes	but that is not exposed to you at all; it all is handled by the library
[20:33] cremes	xreq will block when it hits its high water mark
[20:33] cremes	it should unblock if all subscribers drop their connections but i don't see that specifically documented
[20:34] cremes	and if it doesn't, it's probably a bug
[20:34] cremes	delaney: does that help?
[20:36] delaney	heres the concrete issue, maybe that'll help. writing a game server, clients are out of our control obviously and may disconnect without proper shutdown. we are using XREQ for the client and XREP for the server to allow bi-directional traffic. say we are sending messages to the client for a specific amount of time and if they don't respond with at least a ping they timeout. Now the server has a bunch of messages on its queue that'll never go away. And eve
[20:38] delaney	there is nothing in the api doc to say how it drops
[20:38] delaney	is it by time?
[20:41] cremes	mato or sustrik can give you a definitive answer since they are deep into the source
[20:42] cremes	however...
[20:42] cremes	for xreq, queued messages should be dropped/deleted as soon as the 0mq socket detects that the other end is gone
[20:42] cremes	are you seeing it behave differently?
[20:43] delaney	yeah, i'm looking at the XREP side right now
[20:44] cremes	is your server opening a xrep or xreq socket?
[20:44] delaney	xrep
[20:44] delaney	so the behavior is Drop
[20:45] delaney	i just need to know if its not allowing more stuff to the transport queue (bad) or gets rid of the oldest message (good in my case)
[20:45] cremes	the docs are pretty clear on this
[20:45] cremes	i'll quote a small piece:
[20:45] cremes	Likewise, any messages routed to a non-existent peer or a peer for which the individual high water mark has been reached shall also be dropped.
[20:45] cremes	so if the peer disappears, those messages are dropped
[20:46] cremes	even the ones that are already queued
[20:46] cremes	make sense?
[20:46] delaney	right... but how do you set an individual high water mark?
[20:46] delaney	that was my initial question :P
[20:46] cremes	you don't; it is global for the socket
[20:47] cremes	so how many xreq sockets are going to be connected to the server's xrep socket?
[20:47] delaney	if its global how do you have an individual one too?
[20:47] delaney	hopefully in the 1000s
[20:47] cremes	an individual what?
[20:48] cremes	the HWM is global for each xrep socket but you can have different HWMs for different xrep sockets
[20:48] cremes	is that what you wanted to know?
[20:48] delaney	the confusing part is it says in that sentence there is an individual HWM but then you just said there is only a global one
[20:48] cremes	each socket has its own HWM
[20:48] delaney	OH, so its a global value per connection to the XREP?
[20:48] cremes	right
[20:49] cremes	XREP-1 can have HWM equal to 100
[20:49] delaney	k
[20:49] cremes	while XREP-2 has HWM set to 5500
[20:49] delaney	but in my case there is only 1 xrep
[20:49] delaney	and 1000 xreq connected to it
[20:49] cremes	right
[20:50] delaney	okay let me make an example real quick
[20:50] cremes	sure
[20:51] cremes	maybe this will help... let's say you have 3 xreq sockets connecting to your xrep
[20:52] cremes	2 of them are very fast while 1 is very slow, so it queues messages for the slow one
[20:52] cremes	if the slow socket's queue hits the HWM, it will drop messages only for that one
[20:52] cremes	the fast sockets will continue to get messages
[20:52] cremes	so internally there is probably a separate message queue for each connected socket
[20:53] cremes	the HWM is enforced separately for each connected socket
[20:53] cremes	that behavior is pretty specific to xrep sockets
[20:55] delaney	AH
[20:55] deri	i thought the point of the hwm was just to get the average right so that chances are there will be enough overall headroom to keep things going. anyway, delaney do you think that a kernel parameter, assuming you are using the linux kernel, like inet_peer_maxttl could help here? maybe the offending connections can be clipped away so zeromq can notice it and free individual buffers
[20:56] delaney	the wording is confusing... its not HWM per socket... its HWM per connection to a socket (since a socket can have multiple connections)
[20:57] cremes	delaney: right
[20:57] cremes	if you have some wording that would be clearer, you should send in a documentation patch
[20:58] cremes	the doc mostly covers the extreme cases... 1) all sockets hit HWM or 2) there are no peers
[20:58] cremes	you are concerned with the case in the middle... some sockets are fine but a few hit HWM
[21:04] delaney	if thats the case then awesome. yeah the docs actually scared me with the drop on a global queue, which is scary
[21:05] delaney	whereas its a global max for each queue.
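
The behaviour cremes describes can be pictured with a toy model. This is only a sketch under stated assumptions, not 0MQ's internals: toy_router_t and its members are hypothetical names, each connected peer gets its own queue, the single HWM value set on the socket is enforced on every queue separately, and a HWM of 0 is treated as unlimited.

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <string>

// Toy model only: hypothetical names, not 0MQ's internal classes.
class toy_router_t
{
public:
    // hwm_ applies to every peer's queue separately; 0 means no limit.
    explicit toy_router_t (std::size_t hwm_) : hwm (hwm_) {}

    // A peer connects: it gets its own, initially empty, queue.
    void attach (const std::string &peer) { queues [peer]; }

    // A peer goes away: everything still queued for it is discarded.
    void detach (const std::string &peer) { queues.erase (peer); }

    // Returns true if the message was queued, false if it was dropped
    // (unknown peer, or that peer's queue is already at the HWM).
    bool send (const std::string &peer, const std::string &msg)
    {
        queues_t::iterator it = queues.find (peer);
        if (it == queues.end ())
            return false;
        if (hwm != 0 && it->second.size () >= hwm)
            return false;
        it->second.push_back (msg);
        return true;
    }

    // Number of messages currently queued for the given peer.
    std::size_t pending (const std::string &peer) const
    {
        queues_t::const_iterator it = queues.find (peer);
        return it == queues.end () ? 0 : it->second.size ();
    }

private:
    typedef std::map<std::string, std::deque<std::string> > queues_t;
    std::size_t hwm;
    queues_t queues;
};
```

In this model a slow peer that reaches the HWM only loses its own new messages, while sends to other peers keep succeeding, and detaching a peer discards whatever was still queued for it.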
[21:09] lestrrat	sustrik/mikko: I'm running YAPC::Asia today and tomorrow -- and I'm going to be burnt out a few days after that, so won't be doing anything during that timeframe :/
[21:56] sustrik	lestrrat: good luck!