ZeroMq IRC Log

Wednesday February 16, 2011

[Time] Name	Message
[00:00] cremes	kdj: you are welcome; remember to pay it forward at some point ;)
[00:01] kdj	Hopefully that won't involve inadvertently leading someone astray. ;)
[04:50] zedas	sustrik: hey so i still see this poll 100%CPU bug even with the latest 2.1.0 and cannot figure out how to fix it. http://dpaste.de/oxeU/
[04:51] zedas	sustrik: it looks like i'll have to dig into the zeromq code and pull out the error handling that zmq_poll does.
[07:07] sustrik	zedas: any chance to reproduce the problem here?
[07:24] zedas	sustrik: it happens at random on my servers, so next time i can gdb to it and debug for you.
[07:24] sustrik	thanks
[07:25] sustrik	find out what's looping there
[07:31] zedas	well i'm pretty sure it's zmq_poll not handling an EAGAIN on zeromq socket objects.
[07:31] zedas	but i'll confirm it and work up a fix. looking at the code the fix may be a flag that says to not stuff errors.
[07:42] sustrik	let me have a look...
[07:42] sustrik	zedas: is that linux?
[07:44] sustrik	hm, the only operation on zeromq socket objects within zmq_poll is zmq_getsockopt()
[07:45] sustrik	are you getting EAGAIN from zmq_getsockopt()? That should not happen as far as i am aware.
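zedas suspects that zmq_poll mishandles EAGAIN; the log never confirms the cause. As a general illustration of how a poll loop can pin a CPU, here is a minimal POSIX sketch that uses plain `poll()`/`read()` on a file descriptor, not zmq_poll or 0MQ sockets. The helper names `set_nonblocking` and `poll_then_read` are hypothetical. The sketch treats EAGAIN as "nothing yet, go back to blocking in poll()", and it treats EOF as a reason to stop polling the descriptor, because a descriptor at EOF stays readable and would otherwise make poll() return immediately forever.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Put fd into non-blocking mode so read() reports EAGAIN instead of blocking. */
int set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Block in poll() for up to timeout_ms, then attempt one read.
   Returns  > 0  bytes read
              0  nothing available yet (timeout, EINTR or EAGAIN): poll again
             -1  error or EOF: stop polling this descriptor */
ssize_t poll_then_read(int fd, char *buf, size_t len, int timeout_ms) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return errno == EINTR ? 0 : -1;
    if (rc == 0)
        return 0;                              /* timed out */
    ssize_t n = read(fd, buf, len);
    if (n > 0)
        return n;
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
        return 0;                              /* spurious wakeup, not an error */
    return -1;                                 /* EOF (n == 0) or a real error */
}
```

A caller that loops on `poll_then_read()` only wakes when poll() reports activity or the timeout expires, so the process sits idle while poll() blocks instead of spinning on EAGAIN.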
[08:41] enleth	Hello
[08:44] sustrik	hi
[08:44] enleth	I've got a problem building 0MQ - it's about the luuid dependency. 0MQ requires the OSSP UUID library, which, due to conflicts with (unmaintained and dropped a long time ago) e2fsprogs libuuid, was renamed to libossp-uuid in my Linux distribution and, FWIW, this was generally a very popular solution.
[08:45] enleth	But 0MQ looks for libuuid and the configure script does not accept an alternate name
[08:45] sustrik	enleth: easy, patch the build system and submit the patch to the mailing list
[08:47] enleth	Oh, and I just noticed that the proper libuuid provides an uuid-config program for the configure script to use
[08:48] enleth	uuid-config --libs outputs -lossp-uuid, which should be used
[08:48] enleth	I guess this is what the build system should do instead of using a hardcoded name
[08:49] sustrik	great, post your suggestion to the mailing list
[08:49] enleth	The problem is, my skills with autotools are crap
[08:49] sustrik	so that build system maintainers can have a look at it
[08:49] enleth	OK, will do
[08:49] sustrik	thanks
[09:16] enleth	No, wait. It does use the old e2fsprogs-derived libuuid, my bad.
[09:25] enleth	OK, there is no problem, the distro repository manager screwed up and I got a bad upgrade installed
[09:34] mikko	pieterh: are you here sir?
[09:34] pieterh	mikko: just arrived
[09:36] mikko	is there a specific reason why test functions are compiled into zfl ?
[09:36] mikko	are those symbols needed outside selftest?
[09:38] pieterh	if you can find a way of compiling a single C source file into two objects, I'm hapopy
[09:38] pieterh	*happy
[09:39] pieterh	but the test code must, for me, be in the same source as the actual class
[09:40] mikko	pieterh: ok
[09:41] pieterh	mikko: if people are unhappy about extra code in their executables we could make these conditionally compiled
[09:42] mikko	pieterh: currently i was prototyping something like: https://gist.github.com/3f2a43c19ab439b22884
[09:42] mikko	separate tests/ directory
[09:42] mikko	but i think it should be possible to create separate objects from same code as well
[09:43] pieterh	aaaghhhh.....
[09:43] pieterh	it's the reason the man pages are a real pain to maintain
[09:43] pieterh	separate directories look very clean organizationally
[09:43] pieterh	but they ensure pieces don't get updated
[09:44] pieterh	also the test cases are essential documentation, like the rest of the source file
[09:44] pieterh	running the selftest in its own directory is a good idea, some tests need to mess with files
[09:44] pieterh	but I really, really don't want to find ourselves in the zmq situation of having lots of code that lacks test cases
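pieterh's constraint is that the tests live in the same source file as the class, while mikko would like them out of the library object. One way to get both, sketched here with a hypothetical `counter` class and a hypothetical `COUNTER_NO_SELFTEST` macro (this is not ZFL's actual layout), is to guard the selftest with the preprocessor and compile the one source file twice.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical class, used only to illustrate the file layout. */
typedef struct {
    int value;
} counter_t;

counter_t *counter_new(void) {
    return calloc(1, sizeof(counter_t));
}

void counter_destroy(counter_t **self_p) {
    if (self_p && *self_p) {
        free(*self_p);
        *self_p = NULL;
    }
}

void counter_incr(counter_t *self) {
    self->value++;
}

int counter_value(const counter_t *self) {
    return self->value;
}

#ifndef COUNTER_NO_SELFTEST
/* The selftest stays next to the code it documents. Building with
   -DCOUNTER_NO_SELFTEST leaves it out of the resulting object file. */
int counter_test(void) {
    counter_t *counter = counter_new();
    assert(counter);
    counter_incr(counter);
    counter_incr(counter);
    assert(counter_value(counter) == 2);
    counter_destroy(&counter);
    assert(counter == NULL);
    return 0;
}
#endif
```

For example, `cc -c -DCOUNTER_NO_SELFTEST counter.c -o counter.o` produces the library object without test code, and `cc -c counter.c -o counter_test.o` produces the object linked into the selftest runner, so the tests never leave the source file.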
[09:46] mikko	hmmm, this gives me additional idea
[09:47] mikko	in zfls case code coverage reports would make sense
[09:47] pieterh	yes, as an additional insurance
[09:47] pieterh	that's meta testing, i.e. testing the test cases
[09:48] pieterh	it's a neat idea
[09:48] mikko	i'll put this on my todo
[09:49] pieterh	there's still space? I'm impressed...
[09:49] pieterh	:-)
[09:49] ianbarber	speaking off: mikko, did you move the pear server?
[09:49] ianbarber	s/off/of
[09:50] mikko	ianbarber: in the works
[09:51] mikko	hmm
[09:51] mikko	i guess the easiest would be to put it where rest of the stuff is
[09:51] mikko	you can point the dns to 193.211.31.222
[09:51] ianbarber	i'll point both php. and pear. at it
[09:56] mikko	looking at the apache rewrite rules this makes me want to use nginx
[09:56] kristsk	nginx is about the same, imho
[09:57] mikko	kristsk: dynamic virtualhosting seems a lot more fluent in nginx
[10:03] kristsk	might be because of nginx's config syntax, it does not feel so archaic
[10:05] kristsk	in regard to vhosts, lighttpd is thought to be more powerful
[10:48] Guthur	sustrik: do you think having WSAPoll on supported win platforms would be good to have?
[11:06] ianbarber	pieterh: about?
[11:06] pieterh	ianbarber: about 12ish
[11:06] pieterh	:-) how can I help you?
[11:06] ianbarber	:)
[11:08] ianbarber	i discovered the wonderful land of martinique has a fun domain extension, so the PHP extension is now available on php.zero.mq and pear.zero.mq (pear is the PHP package system). Was wondering - do you want to have zeromq.org listen on zero.mq and www.zero.mq as well, i can point in that direction (even if it's just doing a rewrite to zeromq.org)
[11:08] ianbarber	we can redirect from hosting as well, just seems like if someone does go to just zero.mq, they should end up at the site. It's on mikko's geo redundant hosting at the mo :)
[11:09] pieterh	oh... I like it
[11:10] ianbarber	i can point them at 74.86.234.146 if thats sensible - don't know if there are any weird wikidot issues or similar
[11:10] pieterh	if you point www.zero.mq to www.wikidot.com, then I'll add it to the custom domains on the website
[11:10] ianbarber	cool
[11:10] ianbarber	will do
[11:10] pieterh	wow, we have a sneaky short domain name, so 2011...
[11:11] pieterh	afair you can't point zero.mq itself to a DNS name, you need to use the IP address there
[11:11] mikko	you can
[11:11] mikko	CNAME
[11:11] ianbarber	should be able to cname it
[11:11] ianbarber	yeah
[11:12] pieterh	maybe I'm confusing with wildcards, I usually point *.zeromq.org etc. to wikidot
[11:13] pieterh	cname the heck out of it, ianbarber, I'll add the custom domain entries in an hour or so
[11:14] ianbarber	cool :) I've pointed zero.mq and www.zero.mq, so we'll see then :)
[11:15] pieterh	would it be worth doing something sneaky like...
[11:15] pieterh	zero.mq -> redirects to www.zeromq.org/community... ?
[11:16] pieterh	I can make that work
[11:16] pieterh	ianbarber: DNS seems to have propagated already, that was fast
[11:17] pieterh	presumably not cached anywhere
[11:17] ianbarber	yeah, www wasn't set up before
[11:17] ianbarber	redirect to community sounds like an idea, if that's doable on wikidot
[11:18] pieterh	np, give me 5 minutes...
[11:20] enleth	mikko: hey, just wanted to say thanks for the PHP bindings for ZMQ, TC and TT - good job!
[11:21] enleth	It was pretty amusing when I opened the github page for ZMQ bindings a moment ago, saw your username and thought "well, I know this guy - what else might I be using that he did?"
[11:21] pieterh	ianbarber: ok, done, give it a whirl... :-)
[11:22] mikko	enleth: my pleasure
[11:25] ianbarber	pieterh: i seem to be getting a password page. that's odd
[11:25] pieterh	ianbarber: ah, my bad, it's still a private site, will fix immediately
[11:25] ianbarber	ah, cool
[11:26] pieterh	ianbarber: try again now?
[11:26] ianbarber	yep, that's looking good
[11:26] ianbarber	very nice!
[11:26] pieterh	it's very cool
[14:30] ianbarber	pieterh: was thinking, I've noticed that there are a lot of questions on the mailing lists that are solved in broadly the same way, even from people who have read the guide (myself included). I was wondering whether there is any value in some sort of 0MQ pattern library.
[14:30] ianbarber	sort of like http://developer.yahoo.com/ypatterns/ but with messaging patterns at all kinds of scales
[14:31] ianbarber	i like how the generic pattern is described and an example given in each one of those (http://developer.yahoo.com/ypatterns/navigation/accordion.html)
[14:32] ianbarber	but still pretty simple, 1 page
[14:51] mikko	cremes: you can run make check
[14:52] mikko	(dont wanna confuse the thread as it has moved on from there)
[14:54] cremes	mikko: here are the results: https://gist.github.com/829493
[14:54] cremes	failure...
[14:57] mikko	No space left on device
[14:58] cremes	how did i not see that?.... bleary eyed after 30 hours of debugging...
[14:58] mikko	also, the tests wont output anything but they should assert on failure
[14:58] mikko	return code for success is 0
[14:59] cremes	oh wait, that out of space condition happened overnight as i was testing something
[14:59] cremes	hold on a sec
[15:00] cremes	mikko: reload the gist; it now shows all as passing
[15:01] cremes	my problem with running the tests was i didn't know the right make target
[15:01] mikko	make check is autotools default test target
[15:01] cremes	i tried 'make test' and 'make all' but the former didn't exist and the latter didn't seem to run them
[15:01] cremes	didn't know that
[15:02] mikko	make test seems to be widely used as well
[15:02] cremes	looks like all is well; chalk this up to user error
[15:02] cremes	yeah, maybe adding it as an additional target would be a nice convenience
[15:02] mikko	i'll add that on todo
[15:09] pieterh	ianbarber, was eating lunch... back now
[15:10] pieterh	imo there would be value in a pattern library but I'll use Sustrik's Law here
[15:10] pieterh	find the person to collect and maintain the patterns, and the problem is solved :-)
[15:12] mikko	http://build.valokuva.org/job/test-gcov/5/cobertura/?
[15:12] mikko	zfl code coverage
[15:13] ianbarber	pieterh_: fair point, i do appreciate sustrik's law :)
[15:13] mikko	hmm source code missing
[15:14] pieterh	ianbarber, you can also apply Pieter's Response to Calls to Action
[15:15] pieterh	"Excellent idea, Ian, I'm curious to see how you do it"
[15:15] pieterh	Known in ruder groups as nypa :-)
[15:16] pieterh	Actually, I do have a more positive idea
[15:17] pieterh	When you see a question solved in a way you think is reusable, point me to it, and I'll cover it in the Guide at some stage
[15:17] pieterh	there are a lot of chapters waiting to be written
[15:19] ianbarber	yeah, i think that's good. the guide really is the basis for shared understanding about it
[15:19] mikko	ah
[15:19] mikko	finally it works
[15:19] mikko	https://build.valokuva.org/job/test-gcov/7/cobertura/_default_/zfl_rpcd_c/
[15:19] ianbarber	i'm happy to do some patterns (at some point!) just wanted to check whether it fitted in with the direction you're taking the guide
[15:25] pieterh	mikko: sweet!
[15:25] pieterh	ianbarber, I guess the Guide aims to be the bible, eventually
[15:26] pieterh	modest aims
[15:27] pieterh	we can (and by 'we' I really mean 'you') start by collecting text on a wiki page
[15:27] pieterh	that is trivial, shareable, reusable
[15:27] pieterh	join the zero.mq (great name) wiki if you're not already on it, start a docs:patterns page...
[15:27] ianbarber	yeah. i think the tricky thing with the guide is balancing it for new users, and for experienced ones
[15:28] ianbarber	yep, i'm on it, will do
[15:28] pieterh	no problem, really... start with simple stuff, get more advanced as you go along
[15:28] pieterh	patterns would be like a cookbook, stand alone section, with some good indexing
[15:28] ianbarber	yeah
[15:28] ianbarber	that's pretty much the idea, just to have a concise example of different interaction models really
[15:29] pieterh	even copy/paste of solutions from the email list is a good start
[15:29] pieterh	don't worry about producing prose, that's my speciality
[15:42] mikko	hi Steve-o
[15:42] Steve-o	hi mikko
[15:43] Steve-o	working on new house this week, a foreclosure so many minor issues :/
[15:44] Steve-o	back in HK next week and back to work
[15:44] mikko	is your house in the states?
[15:44] Steve-o	upstate NY
[15:45] mikko	are you moving there?
[15:45] Steve-o	near Martha Stewart is about the only notable point
[15:46] Steve-o	eventually moving there, house prices very cheap so good time to buy
[15:46] Steve-o	I have another year for my greencard it looks
[15:48] Steve-o	so what is the status on autoconf in zeromq, anymore changes required?
[15:49] mikko	i think we should get 2.1.0 out before refactoring the openpgm part
[15:49] mikko	it seems to be working well with openpgm trunk
[15:50] mikko	some open issues to solve but in general good
[15:50] mikko	one of them is how to link openpgm if zeromq invokes openpgm built?
[15:50] mikko	build*
[15:50] mikko	install openpgm.so and use the shared lib?
[15:51] mikko	use the object files directly?
[15:51] mikko	etc
[15:51] Steve-o	good question, distros would like shared libs,
[15:51] mikko	linking libpgm.a into libzmq.so works on linux (assuming libpgm.a is position independent code) but not portable
[15:52] mikko	yes, my only fear is the following scenario:
[15:52] Steve-o	which is why I don't have a dll on Windows
[15:52] mikko	user has libpgm installed, now installs zeromq with openpgm support, zeromq invokes openpgm build and overwrites the existing installation
[15:54] Steve-o	well a common solution I have seen to that is to install the dependent library in a sub-directory of the product build instead of the OS preferred location
[15:55] mikko	but distros dont like rpath
[15:55] Steve-o	For convenience prefer static libraries but allow distributions to use shared libraries.
[15:55] Steve-o	so out of the tarball build libpgm.a but allow configure options for libpgm.so
[15:56] mikko	but how to use the libpgm.a ?
[15:56] mikko	.a inside .so is not really portable
[15:56] Steve-o	really? where isn't it valid?
[15:57] mikko	i can check, i did a lot of googling on this
[16:01] mikko	hp-ux seems to be one
[16:01] mikko	is that even supported by openpgm?
[16:01] Steve-o	not yet
[16:02] mikko	Libtool convenience library
[16:02] mikko	sounds like a solution
[16:02] mikko	http://sourceware.org/autobook/autobook/autobook_92.html
[16:02] mikko	groups together a set of object files
[16:02] Steve-o	that's what zeromq is using now
[16:03] mikko	but on different side of the fence
[16:04] Steve-o	let me read up on HPUX, v10 was fine as I remember they broke various things with 11
[16:04] mikko	Steve-o: how does bundling convenience lib on openpgm side sound like?
[16:04] mikko	and then zeromq links that
[16:04] mikko	i could at least investigate this as it seems like a portable option
[16:05] Steve-o	ok, if you can provide the code, I'm not sure how this is supposed to work with two different projects
[16:06] mikko	the ultimate goal i guess is to have both as shared libraries provided by distros
[16:06] mikko	but in the meanwhile convenience lib sounds ok
[16:06] mikko	i'll put this on my ever growing todo list
[16:07] mikko	at least i got ZFL code coverage working today
[16:08] Steve-o	using gcov?
[16:08] mikko	yes
[16:09] mikko	http://build.valokuva.org/job/test-gcov/7/cobertura/_default_/
[16:12] Steve-o	nice, it's tedious getting those percentages higher though
[16:13] mikko	true. you would almost need to preload a malloc implementation that fails randomly
[16:13] mikko	to test all asserts
[16:14] mikko	and even then it would be very random
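mikko's idea for reaching the assertion paths is an allocator that fails on purpose. The sketch below is a variation under stated assumptions: the code under test calls a wrapper (the hypothetical `test_malloc`) instead of relying on LD_PRELOAD to replace malloc itself, and it fails on a chosen call rather than randomly, so a coverage run that exercises a given failure path is repeatable.

```c
#include <stdlib.h>

/* Hypothetical fault-injection state: 0 means never fail; N > 0 means
   the Nth test_malloc() call from now returns NULL, once. */
static int fail_countdown = 0;

void test_malloc_fail_after(int n) {
    fail_countdown = n;
}

/* Drop-in replacement for malloc() in the code under test. */
void *test_malloc(size_t size) {
    if (fail_countdown > 0 && --fail_countdown == 0)
        return NULL;                  /* simulated out-of-memory */
    return malloc(size);
}
```

Stepping N from 1 upward across test runs makes each allocation site fail in turn, which is a deterministic stand-in for the random approach mikko describes.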
[16:15] mikko	might add same thing for zeromq later as well
[16:17] cremes	pieterh: ping... where is "zhelpers.h"? i can't compile your mailbugz.c test without it
[16:18] pieterh	cremes: sorry!
[16:18] pieterh	adding it now
[16:18] sustrik	cremes, just replace it with zmq.h
[16:18] pieterh	sustrik: nope, that and other stuff
[16:18] sustrik	there's nothing used from zhelpers.h in the code
[16:18] sustrik	i've just compiled it
[16:18] sustrik	aha
[16:18] sustrik	replace the line with:
[16:18] sustrik	#include <zmq.h>
[16:19] sustrik	#include <stdio.h>
[16:19] sustrik	#include <string.h>
[16:19] sustrik	that works
[16:19] pieterh	yes, that works
[16:21] Steve-o	mikko: ok so I already have the libtool convenience library libpgm.la, libtool is giving me the shared and static libraries for free
[16:22] mikko	Steve-o: i know, but if you link against the .la from zeromq it gives a warning "Warning: libpgm.la won't be deployed"
[16:22] mikko	not sure if that can be ignored
[16:22] mikko	maybe it can
[16:22] Steve-o	is that because of a noinst_ line?
[16:23] mikko	i got a local branch here
[16:23] pieterh	sustrik, in the pubsub pattern it is IMO a design flaw that zmq_connect is asynchronous
[16:23] mikko	Steve-o: https://gist.github.com/3f14f1a3f816df3016c7
[16:23] mikko	these are some of the changes related to zeromq
[16:24] pieterh	that is, on a sub socket
[16:25] mikko	Steve-o: i tested that with ./configure --without-documentation --with-pgm=/tmp/to/pgm-trunk
[16:28] Steve-o	mikko: I can't find anything on that error message in google
[16:29] sustrik	pieterh_: why so?
[16:32] zedas	sustrik: yep that's linux. why?
[16:33] sustrik	there are 2 implementations of zmq_poll
[16:33] sustrik	i was just checking which one to have a look at
[16:34] sustrik	anyway, what's the problem you were referring to?
[16:35] sustrik	ah, the EAGAINs in strace
[16:35] Steve-o	mikko: maybe I need to explicitly add a noinst_LTLIBRARIES instead of lib_LTLIBRARIES
[16:35] sustrik	i've missed the link, sorry
[16:36] cremes	pieterh_: i don't compile a lot of C programs; what's the gcc line to get the example to compile & link?
[16:37] cremes	nm, got it
[16:38] mikko	Steve-o: gimme a sec
[16:38] mikko	getting the exact error message out
[16:40] pieterh	cremes: sorry, my irc client's not alerting me for some reason
[16:41] cremes	no worries; i compiled the program and ran it successfully
[16:41] cremes	no failures
[16:41] cremes	so my hypothesis must be wrong as to the cause of the mailbox assertion
[16:41] pieterh	at least it's not that simple
[16:42] cremes	right
[16:42] pieterh	assuming I got the case right
[16:42] pieterh	5M writes, 5M reads...
[16:42] cremes	you got it right as i explained it
[16:42] pieterh	sustrik: sorry also, I'm not getting beeps...
[16:42] pieterh	pubsub fails, for every new user, in the same way
[16:43] pieterh	subscriber connects, then misses X milliseconds of messages
[16:43] sustrik	ack
[16:43] pieterh	i'm not sure doing a synchronous connect would make any difference
[16:43] sustrik	it probably won't
[16:43] cremes	pieterh_: is it possible to run this under gdb and have it drop into the debugger instead of asserting?
[16:43] pieterh	but there is definitely a problem when every user hits the same issue
[16:44] cremes	if so, perhaps i could dump the contents of the mailbox?
[16:44] pieterh	cremes, afaik usual tactic is to get a core dump and then debug from there
[16:44] pieterh	i'm no gdb expert
[16:44] cremes	ok, how can i force it to core?
[16:44] sustrik	cremes: p
[16:44] pieterh	divide by zero?
[16:45] sustrik	when you want to dump the content of variable x, type "p x"
[16:45] pieterh	assertion failure will produce a core I think
[16:45] pieterh	you need to enable core dumps for your process
[16:45] pieterh	ulimit -c unlimited
[16:45] cremes	yeah, right now i'm set for a core size of 0; i can change that
[16:46] cremes	are you sure the assertion causes a core?
[16:46] sustrik	cremes: just start the executable under gdb
[16:46] sustrik	it will stop and get you gdb prompt when assertion is hit
[16:46] pieterh	yeah, and make sure it's compiled and linked for debugging
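The shell-side step is `ulimit -c unlimited` before starting the process. When the shell setup is awkward (for example when the program is launched from a Ruby wrapper), a process can instead raise its own core-size soft limit up to the hard limit with POSIX `getrlimit()`/`setrlimit()`. The helper name `enable_core_dumps` is hypothetical, and whether a core file actually appears still depends on system settings such as Linux's `/proc/sys/kernel/core_pattern`.

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit, the most an
   unprivileged process may set. Returns 0 on success, -1 on error. */
int enable_core_dumps(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling this early in `main()` has the same effect as the `ulimit -c` line for that process only, up to whatever hard limit the environment imposes.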
[16:47] cremes	i did run it under gdb several times; the assertion would cause the ruby runtime to throw an exception and exit cleanly
[16:47] cremes	so gdb never caught the issue
[16:47] cremes	outside of gdb, it would assert
[16:47] cremes	very frustrating
[16:47] sustrik	:|
[16:48] pieterh	my brute force approach would be to add code to 0MQ that dumps the mailbox just before it asserts, under the same conditions
[16:48] pieterh	don't waste time trying to get debuggers working unless you already know how
[16:49] cremes	i like that suggestion; any suggestion on how to dump the mailbox?
[16:49] sustrik	cremes: i would do a bit different thing
[16:49] cremes	i.e. are there important components to capture or should i just dump it as a string?
[16:49] cremes	sustrik: talk to me
[16:49] sustrik	just print some text when mailbox_t::send() is invoked
[16:50] sustrik	in your scenario the number of invocations should be pretty modest
[16:50] sustrik	if it starts printing a lot of text, there's definitely some problem there
[16:50] cremes	sustrik: just any text like "mailbox.send!"
[16:50] sustrik	yes
[16:50] cremes	ok
[16:51] cremes	so you don't care about the contents of the mailbox
[16:51] sustrik	not really
[16:51] cremes	ok, i'll try that now
[16:51] sustrik	if we find out that a lot of commands are being written
[16:52] sustrik	we'll have a look at what kind of commands those are
[17:10] pieterh	mikko: I'm improving some of the coverage but it's always going to miss on assertions, apparently
[17:15] mikko	pieterh_: yes
[17:15] mikko	i dont think it calculates those
[17:15] pieterh	hey, my beep works now! :-)
[17:15] mikko	and 100% is not really a realistic or even desirable aim
[17:15] mikko	Steve-o: i think i solved it
[17:15] pieterh	ok, I'll improve some of the coverage but like Steve-o says, it gets messy
[17:16] mikko	Steve-o: almost. now it compiles twice it seems
[17:17] ianbarber	just to be doubly sure
[17:18] ianbarber	compare the two, and if they're different fail on a non-deterministic build process
[17:25] cremes	sustrik: yes, there are a lot of commands sent
[17:25] sustrik	ok
[17:25] cremes	what's the next step? dump the commands when the mailbox buffer is increased?
[17:26] sustrik	can you print out cmd->type?
[17:26] sustrik	that will show what kind of commands are being passed
[17:26] cremes	sure; on every invocation or just when the buffer size is increased?
[17:26] sustrik	on every invocation
[17:26] cremes	ok
[17:28] cremes	sustrik: i see it's defined as an enum so i can use printf("%d", cmd->type), yes?
[17:29] sustrik	printf("%d", (int) cmd->type)
[17:29] sustrik	just in case
[17:29] cremes	k
[17:31] cremes	sustrik: mailbox.cpp:158:34: error: base operand of '->' has non-pointer type 'const zmq::command_t'
[17:32] cremes	??
[17:32] sustrik	it should be cmd_.type
[17:32] sustrik	sorry
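A variation on the per-call printf, assuming a single-threaded debug build: tally sends per command type and print one summary line per type, which avoids the interleaving a threaded program produces when every send writes to the same logfile. The enum is an illustrative stand-in for zmq::command_t's type field; only 1 = plug and 3 = attach are confirmed by this log, and the other names are assumptions. `debug_count_command` and `debug_dump_counts` are hypothetical. As sustrik notes, cast the enum to int before printing or indexing.

```c
#include <stdio.h>

/* Illustrative subset of command types; 1 = plug and 3 = attach as in the log. */
enum cmd_type { CMD_STOP, CMD_PLUG, CMD_OWN, CMD_ATTACH, CMD_TYPE_COUNT };

static unsigned long cmd_counts[CMD_TYPE_COUNT];

/* Call from the send path, e.g. debug_count_command((int) cmd_.type).
   Not thread-safe: concurrent callers would need atomic counters. */
void debug_count_command(int type) {
    if (type >= 0 && type < CMD_TYPE_COUNT)
        cmd_counts[type]++;
}

/* Print one "TY(<type>) x <count>" line for each type seen at least once. */
void debug_dump_counts(FILE *out) {
    for (int i = 0; i < CMD_TYPE_COUNT; i++)
        if (cmd_counts[i] > 0)
            fprintf(out, "TY(%d) x %lu\n", i, cmd_counts[i]);
}
```

Out-of-range types are silently ignored here; a real debug build might count them in an extra "unknown" slot instead.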
[17:34] cremes	clean compile; running now
[17:37] cremes	sustrik: here's a sampling of what i see; the cmd is wrapped in TY(cmd) so i can pick it out of the log easily
[17:37] cremes	https://gist.github.com/829782
[17:39] sustrik	do you call connect or bind in that app?
[17:40] cremes	i call both early on during setup, then i don't need to call it again
[17:41] sustrik	ah, both are in the same process
[17:41] sustrik	i see
[17:41] sustrik	what transport do you use?
[17:41] sustrik	tcp? inproc? ipc?
[17:41] cremes	tcp
[17:42] sustrik	cremes: can you printf something in the connect_session_t::detached() function?
[17:42] cremes	yes
[17:42] sustrik	(that way we'll see if there's a lot of reconnecting happening)
[17:47] cremes	sustrik: [cremes@box1 servers]$ grep ^REC t.out | wc -l
[17:47] cremes	921674
[17:47] cremes	so yes, lots of reconnects
[17:51] cremes	this is a threaded app writing to the same logfile so sequence is a bit suspect
[17:52] cremes	however, it appears each REC is always followed by command type 1 or 3 (plug or attach) which kind of makes sense
[17:56] sustrik	yep
[17:56] sustrik	the question is: why does it reconnect at all?
[17:57] sustrik	moreover, the default reconnect interval is 0.1 sec
[17:57] cremes	agreed; all transport strings are of the form 'tcp://127.0.0.1:<port>'
[17:57] sustrik	so to get 921675 would require couple of days
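sustrik's estimate as a worked figure, assuming all 921,674 logged reconnects (cremes's grep count) happened back to back on a single connection at the default 0.1 s reconnect interval:

```latex
921\,674 \times 0.1\ \text{s} = 92\,167.4\ \text{s} = \frac{92\,167.4}{3600}\ \text{h} \approx 25.6\ \text{h}
```

That is roughly a day of continuous retrying, far longer than the few minutes the program actually ran, which is why the count points to something other than ordinary interval-paced reconnection.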
[17:58] sustrik	you mean: "both" rather than "all", right?
[17:59] cremes	there is a PUB producer, a FORWARDER device, and multiple SUB consumers in this process
[17:59] cremes	they all connect up in the beginning and should never close/reconnect for the life of the program
[17:59] cremes	so each one has its own transport connection string; that's what i meant by 'all'
[18:00] sustrik	i see
[18:00] sustrik	how many SUBs?
[18:01] cremes	let's see...
[18:02] sustrik	approximately...
[18:02] sustrik	tens, hundreds, thousands?
[18:02] cremes	5 in the clients and 1 in the FORWARDER, so about 6 (i might be forgetting one or two)
[18:02] sustrik	ok
[18:03] sustrik	do you close the FORWARDER before closing the SUBs?
[18:04] cremes	they should all terminate at roughly the same time when i interrupt/kill the program
[18:04] sustrik	ok
[18:05] cremes	otherwise, the FORWARDER never exits
[18:05] sustrik	does FORWARDER connect to SUBs or other way round?
[18:05] cremes	FORWARDER binds while all clients connect
[18:05] sustrik	what about PUB?
[18:05] cremes	actually, the IN/OUT sockets on the FORWARDER always bind
[18:06] cremes	the publisher connects too as a result
[18:06] sustrik	ok
[18:06] sustrik	hm, i see no reason then for reconnections to happen
[18:06] sustrik	are you 100% sure that the connection strings match?
[18:07] cremes	match in what way?
[18:07] cremes	they are all tcp?
[18:07] sustrik	are they the same on bind and connect side?
[18:07] cremes	if they weren't, the data wouldn't flow through my app, yes?
[18:08] sustrik	ah, the data flow through
[18:08] sustrik	i see
[18:08] sustrik	to all 5 subs?
[18:09] cremes	yes, the main PUB broadcasts and the 5 subs each sub to everything
[18:09] sustrik	and all of them actually get the data
[18:09] cremes	if they weren't getting the data, the app would lock (and produce something similar to EFSM in my code)
[18:09] sustrik	ok, good
[18:09] cremes	it's kind of like an election algo
[18:10] sustrik	to be frank, i have no idea what's going on there
[18:10] sustrik	if the reconnections happen
[18:10] sustrik	one would expect that at least some messages would be lost
[18:10] cremes	any idea how i can do 900k reconnects in a few minutes?
[18:10] sustrik	no idea
[18:11] cremes	<sigh>
[18:11] sustrik	have you changed the default RECONNECT_IVL?
[18:11] cremes	btw, i ran pieter's mailbugz code with these debug prints in them and it barely puts out anything at all
[18:11] sustrik	exactly
[18:11] cremes	nope, no changes to RECONNECT_IVL
[18:12] cremes	all sockets are allocated in their default state; the one exception is calling setsockopt on the SUBs to set their subscription string
[18:12] cremes	and i always set my own IDENTITY
[18:12] cremes	someone on the ML suggested a potential IDENTITY collision; could that be related?
[18:13] sustrik	maybe
[18:13] sustrik	do you have identity collisions there?
[18:13] sustrik	like all 5 subs having the same identity?
[18:13] cremes	i shouldn't; the identity is always <random id>.<sock type>.<server type> where random id is 0 to 999_999_999
[18:14] cremes	it's possible there is a collision but improbable
[18:14] sustrik	try printing them out
[18:15] cremes	i'm auditing that right now; give me 5m
[18:22] pieterh	cremes, are you sure you're initializing your random number generator?
[18:22] pieterh	if not, every client will produce an identical 'random' sequence
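cremes's identity scheme and pieterh's seeding point, sketched with the C standard library; the helper names `make_identity` and `identities_unique` are hypothetical. `rand()` produces the same sequence in every process that does not call `srand()` with a distinct seed, so "random" identities can collide across processes. A uniqueness audit like `identities_unique` is the kind of check that eventually turned up the duplicate XREQ identity in this session.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write "<random id>.<sock type>.<server type>" into buf, with the id in
   0..999999999. Two rand() calls are combined because RAND_MAX may be as
   small as 32767; the final modulo adds a small bias, acceptable here.
   Seed once per process with a per-process value before calling this. */
int make_identity(char *buf, size_t len, const char *sock_type,
                  const char *server_type) {
    unsigned long long r =
        (unsigned long long) rand() * ((unsigned long long) RAND_MAX + 1ULL)
        + (unsigned long long) rand();
    unsigned long id = (unsigned long) (r % 1000000000ULL);
    return snprintf(buf, len, "%lu.%s.%s", id, sock_type, server_type);
}

/* Return 1 if all n identity strings are distinct, 0 otherwise.
   O(n^2), which is fine for a handful of sockets per process. */
int identities_unique(const char *const *ids, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (strcmp(ids[i], ids[j]) == 0)
                return 0;
    return 1;
}
```

Reseeding with the same value reproduces the same identity, which is exactly the collision an unseeded generator causes between processes.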
[18:23] pieterh	cremes: if you're getting reconnects, presumably you're also getting disconnects
[18:23] pieterh	and if you can find those, you can find what is causing them
[18:24] pieterh	sustrik: how many places does 0MQ forcefully disconnect a subscriber socket without assertion
[18:24] pieterh	do we have the sys: transport working?
[18:26] sustrik	pieterh_: every time the other side does something unexpected
[18:26] sustrik	such as sending malformed frame
[18:26] pieterh	yeah, but are there lots of places in the code?
[18:26] sustrik	not much, 3-4 i think
[18:26] pieterh	right... so a few well-placed prints and we'll know what's happening
[18:26] sustrik	sys: works
[18:27] sustrik	and should be used exactly for this kind of thing
[18:27] pieterh	precisely
[18:27] sustrik	the only problem is that some kind of throttling is needed
[18:27] sustrik	so as not to overload the log
[18:27] pieterh	presumably all we care about are the first 10 messages
[18:27] sustrik	i.e. if the same problem happens over and over again
[18:27] sustrik	in 10us intervals
[18:28] sustrik	only the first one should be reported
[18:28] pieterh	add a numeric code and ignore duplicates, standard solution
[18:28] sustrik	you need some kind of state machine
[18:29] sustrik	if connecting fails, log it and switch to "no log" state
[18:29] cremes	alas, it looks to me like they are all unique: https://gist.github.com/829865
[18:29] sustrik	any subsequent connect failures are not logged
[18:29] cremes	interestingly, out of all 4 components, only the one that crashes shows the hundreds of thousands of reconnects
[18:29] sustrik	when connecting succeeds, switch back to "log" state
[18:29] sustrik	thus making next disconnect being logged
[18:30] pieterh	you don't need anything that complex IMO
[18:30] pieterh	if you get more than 1000 alerts on sys: you can give up
[18:30] pieterh	(in a minute, hour, day)
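sustrik's throttling state machine for connection-failure logging, as a minimal sketch (the type and function names are hypothetical, not 0MQ code): the first failure is logged and the logger goes quiet; a successful connect re-arms it so the next disconnect is reported again. pieterh's simpler alternative, a hard cap on the number of alerts per period, would replace the state flag with a counter.

```c
#include <stdbool.h>

/* Per-connection logging state: LOG until a failure is reported, then NO_LOG. */
typedef enum { THROTTLE_LOG, THROTTLE_NO_LOG } throttle_state_t;

typedef struct {
    throttle_state_t state;
} log_throttle_t;

/* Call on every connect failure; returns true only if this one should be logged. */
bool throttle_on_failure(log_throttle_t *t) {
    if (t->state == THROTTLE_NO_LOG)
        return false;               /* repeat of a problem already reported */
    t->state = THROTTLE_NO_LOG;
    return true;
}

/* Call when a connection succeeds, so the next failure is reported again. */
void throttle_on_success(log_throttle_t *t) {
    t->state = THROTTLE_LOG;
}
```

A failure that repeats every 10 us therefore produces one log line per outage rather than one per retry.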
[18:30] pieterh	cremes, you may want to add prints in the places 0MQ disconnects subscribers
[18:31] sustrik	cremes: no more ideas, i need a minimal test case
[18:31] sustrik	to reproduce it here
[18:31] cremes	ok, i'll keep poking at it
[18:32] pieterh	sustrik, can you tell cremes where those 3-4 places are?
[18:32] sustrik	hm, i don't know precisely
[18:32] sustrik	dhammika supplied those patches
[18:33] pieterh	it used to be easy 'egrep assert *.cpp'
[18:33] sustrik	maybe check the commit log
[18:33] sustrik	?
[18:33] sustrik	it's not asserting, it's closing the connections
[18:36] cremes	this conversation gave me an idea... i think i am narrowing it down... give me 10m
[18:36] pieterh	sustrik, I meant, it used to assert and I remember several times chasing down framing errors by sticking printfs into those places
[18:38] sustrik	these asserts have been removed via your "0MQ competition" :)
[18:46] cremes	sustrik, pieterh_: found it!
[18:46] pieterh	:-)
[18:46] cremes	i had a duplicate identity on an unrelated XREQ socket!
[18:46] pieterh	yay!
[18:47] cremes	to reproduce, it's probably just these steps...
[18:47] pieterh	sustrik, does zmq already send anything to sys:?
[18:47] cremes	1. create a QUEUE device that binds to some port
[18:47] sustrik	pieterh_: no
[18:47] cremes	2. create two XREQ (REQ too?) sockets, set their identity the same and connect them to the QUEUE
[18:47] cremes	3. check for reconnects
[18:48] cremes	4. Maybe need to send some data through first...?
[18:48] pieterh	cremes: I'll make a test case later on
[18:48] cremes	ok, thanks pieter! your c skills far exceed my own
[18:48] pieterh	what do you mean by 'check for reconnects'?
[18:48] cremes	thank you both so much for working through this with me; this conversation solved it
[18:49] pieterh	i'd like to get a test case that results in a crash
[18:49] cremes	i added a debug statement to connect_session.cpp:detach to print whenever it detached and attempted a reconnect
[18:50] cremes	let me try to write one in ruby
[18:50] pieterh	this still does not explain why the mailbox exploded...
[18:50] cremes	then i can tell you exactly what needs to be done in c
[18:50] pieterh	yes, make a ruby test case, that's perfect
[18:50] pieterh	exploding mailbox gets double score
[18:50] pieterh	sustrik: we should start to send stuff to sys: where we used to assert
[18:51] pieterh	if you can document how to use sys: from inside zmq I can try that
[18:51] pieterh	ideally, a 1-liner that sends a string... :-)
[18:52] pieterh	then we can apply that to cremes test case and check that we'd have caught this error
[18:52] sustrik	log ();
[18:52] sustrik	it's there
[18:53] pieterh	ah, it requires all the work of creating a message first
[18:53] pieterh	that's tedious
[18:54] pieterh	do we have a standardized format for sys://log messages?
[18:54] pieterh	sorry to complain but if this was packaged somewhat, it'd be easier for people to use it internally
[18:55] sustrik	no format
[18:55] sustrik	just use string atm
[18:55] sustrik	we can polish the format later on
[18:55] pieterh	every single object has a log method?
[18:56] pieterh	inherited from object_t?
[18:56] sustrik	yes
[18:56] pieterh	so the log method there could be somewhat expanded to take a string and create/destroy the msg itself
[18:57] pieterh	afaics we don't use this anywhere yet
[18:57] sustrik	sure
[18:57] pieterh	and then we need a documented parsable format for messages
[18:57] pieterh	minimal
[18:57] pieterh	easy to improve later
[18:57] sustrik	ack
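Every object inherits log () from object_t, but sustrik confirms there is no message format yet ("just use string atm"). As a sketch of what pieterh proposes, the C below assumes a hypothetical one-line format `<severity> <component>: <text>` and a hypothetical log_string () wrapper that builds the record, hands it to a sink (standing in for copying it into a zmq_msg_t and sending it to sys://log), and frees it, so callers only pass a string. None of these names are 0MQ API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record format, NOT defined by 0MQ:
 *   "<severity> <component>: <text>"   e.g. "W session: duplicate identity"
 * The component must not contain ": ". */

/* Build a record into a freshly allocated, NUL-terminated buffer.
 * Returns NULL on failure; the caller frees the result. */
char *log_record_format (char severity, const char *component, const char *text)
{
    int len = snprintf (NULL, 0, "%c %s: %s", severity, component, text);
    if (len < 0)
        return NULL;
    char *buf = malloc ((size_t) len + 1);
    if (!buf)
        return NULL;
    snprintf (buf, (size_t) len + 1, "%c %s: %s", severity, component, text);
    return buf;
}

/* Parse a record back into its fields. Returns 0 on success, -1 if the
 * line does not match. component receives at most comp_size - 1 chars;
 * *text points into line. */
int log_record_parse (const char *line, char *severity, char *component,
                      size_t comp_size, const char **text)
{
    if (!line[0] || line[1] != ' ')
        return -1;
    const char *start = line + 2;
    const char *sep = strstr (start, ": ");
    if (!sep || sep == start || (size_t) (sep - start) >= comp_size)
        return -1;
    *severity = line[0];
    memcpy (component, start, (size_t) (sep - start));
    component[sep - start] = '\0';
    *text = sep + 2;
    return 0;
}

/* Hypothetical sink: in 0MQ this would copy the bytes into a message
 * and send it on the sys://log socket. */
typedef void (*log_sink_fn) (const char *data, size_t size, void *hint);

/* Convenience wrapper: take a string, create the record, emit it and
 * destroy it, so callers never handle a message object directly. */
int log_string (log_sink_fn sink, void *hint, char severity,
                const char *component, const char *text)
{
    assert (sink != NULL);
    char *rec = log_record_format (severity, component, text);
    if (!rec)
        return -1;
    sink (rec, strlen (rec), hint);
    free (rec);
    return 0;
}
```

A one-character severity plus a split on the first ": " keeps records trivially parsable while leaving room to add fields later, in line with the "minimal, easy to improve later" requirement.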
[18:57] pieterh	ok, I'll try my hand at this, apologies in advance...
[19:11] cremes	yes! i have a reproducible crasher in ruby!
[19:12] cremes	pieterh_: do you want the ruby code or an explanation for translation to c?
[19:12] pieterh	cremes, I think we need to log two issues here
[19:13] cremes	ok, i can create the issues, but i only see one
[19:13] pieterh	(a) lack of any warning to the app developer
[19:13] pieterh	(b) mailbox crash
[19:13] pieterh	(b) is the critical one, and the ruby example will be valuable there
[19:13] cremes	ok, so (a) is for tracking a new feature request to add the sys: stuff, yes?
[19:13] pieterh	yes
[19:13] cremes	ok, i'll write them up
[19:14] pieterh	well, we don't track new feature requests, so perhaps skip (a)
[19:14] cremes	i'll add it to the wiki 3.0/roadmap page
[19:14] pieterh	i'm working on it now... :-)
[19:15] cremes	ok!
[19:27] cremes	pieterh_: preview this issue and let me know if you need more details to reproduce in c: https://github.com/zeromq/zeromq2/issues/165
[19:27] pieterh	cremes, thanks!
[19:27] cremes	pieterh_: i've spent the last 96 hours banging on this! i'm happy to see it solved!
[19:28] pieterh	that's why i'm doing the sys://log stuff, it's insane to lose so much time to a missing warning
[19:28] cremes	honestly, i'm taking the rest of the day off.... i feel deflated
[19:29] pieterh	sustrik, what's the correct way to work with a msg in the zmq core?
[19:30] pieterh	::zmq_msg_t or is there a message class I'm missing?
[19:39] enleth	Hello
[19:39] enleth	mikko: is the API documentation at http://valokuva.org/~mikko/php-zmq/ supposed to be inaccessible?
[19:46] ianbarber	enleth: check php.zero.mq
[19:46] ianbarber	references probably need updating
[19:50] mikko	enleth: yes
[19:55] enleth	ianbarber: thanks, that's it.
[19:56] enleth	mikko: can I suggest a 302 redirect to the new address?
[19:56] pieterh	cremes: still there?
[19:56] enleth	The old one is all over the latest git tree
[20:01] mikko	done
[20:01] cremes	pieterh_: for a bit more; what's up?
[20:01] pieterh	just wondered if you need to actually use the REQ/REP sockets to create the crash
[20:01] pieterh	or just bind them and BOOM
[20:02] pieterh	s/bind/connect
[20:02] cremes	let me see... give me 1m
[20:03] cremes	pieterh_: nope, crashes without using them; good catch... it's even more reduced now
[20:03] pieterh	excellent...
[20:03] pieterh	thanks a lot
[20:03] cremes	i'm no longer thinking clearly otherwise i would have tried that :)
[20:04] pieterh	it's been a long day :-)
[20:04] cremes	pieterh_: looks like you do need the REQ socket too
[20:05] cremes	a pair of REP's with the same ID is insufficient
[20:05] cremes	it's been a long week
[20:05] pieterh	ack, you need a pair of sockets with one disconnecting the other
[20:05] pieterh	presumably, I'll test that, it applies to all relevant socket types
[20:05] pieterh	it's been a long year!
[20:05] cremes	perhaps...
[20:05] pieterh	hang on...
[20:06] pieterh	:-)
[20:06] cremes	heh
[20:10] pieterh	cremes: bingo, I reproduced it!
[20:10] cremes	awesome!
[20:11] cremes	once started it only takes a few seconds to exhaust that buffer even when it's 5MB!
[20:11] pieterh	just connect two req sockets with same ID, wait 1 second...
[20:11] pieterh	I'm going to try with other socket types now
[20:16] pieterh	cremes: it affects all socket types
[20:16] pieterh	any combination of bind/connect, even pub connecting to sub
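pieterh's working hypothesis is that two sockets sharing an identity keep disconnecting each other. The C below is a toy model of that feedback loop, not 0MQ's session code; router_t, router_attach () and simulate () are hypothetical names. A "router" keeps one slot per identity, attaching under a taken identity evicts the current holder, and the evicted peer reconnects immediately. With duplicate identities the evictions never stop; with distinct ones they never start. The fprintf marks where a duplicate-identity warning, issue (a) above, could be emitted. If each eviction in the real library also queues internal commands, a loop like this would plausibly fill a mailbox within seconds, which fits cremes's observation, but the log does not confirm that mechanism.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SLOTS 8

/* Toy model only: one slot per identity, peer == -1 means unused. */
typedef struct {
    char identity[32];
    int  peer;
} slot_t;

typedef struct {
    slot_t slots[MAX_SLOTS];
    int    evictions;   /* times a peer was kicked out of its slot */
} router_t;

static void router_init (router_t *r)
{
    memset (r, 0, sizeof *r);
    for (int i = 0; i < MAX_SLOTS; i++)
        r->slots[i].peer = -1;
}

/* Attach peer under identity. Returns the evicted peer, or -1. */
static int router_attach (router_t *r, const char *identity, int peer)
{
    assert (peer >= 0);
    int free_slot = -1;
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (r->slots[i].peer == -1) {
            if (free_slot == -1)
                free_slot = i;
            continue;
        }
        if (strcmp (r->slots[i].identity, identity) == 0) {
            int old = r->slots[i].peer;
            if (old == peer)
                return -1;          /* already attached */
            /* A warning here would flag the misconfiguration. */
            fprintf (stderr, "warning: duplicate identity '%s': "
                     "peer %d evicts peer %d\n", identity, peer, old);
            r->slots[i].peer = peer;
            r->evictions++;
            return old;
        }
    }
    if (free_slot != -1) {
        snprintf (r->slots[free_slot].identity,
                  sizeof r->slots[free_slot].identity, "%s", identity);
        r->slots[free_slot].peer = peer;
    }
    return -1;
}

/* Connect peers 0 and 1, then let every evicted peer reconnect with its
 * fixed identity for up to `rounds` attempts. Returns total evictions. */
static int simulate (const char *id_a, const char *id_b, int rounds)
{
    router_t r;
    router_init (&r);
    const char *ids[2] = { id_a, id_b };
    router_attach (&r, ids[0], 0);
    int pending = router_attach (&r, ids[1], 1);
    for (int i = 0; i < rounds && pending != -1; i++)
        pending = router_attach (&r, ids[pending], pending);
    return r.evictions;
}
```

For example, simulate ("X", "X", 10) keeps evicting on every round, while simulate ("A", "B", 10) settles immediately with no evictions.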
[20:17] cremes	wow
[20:17] cremes	this might explain a lot of people's problems; there are several issues open about this assertion
[20:18] pieterh	ironically 0MQ used to assert before :-)
[20:18] cremes	oh, the irony... :(
[20:19] cremes	well, i'm just glad it's no longer a mystery
[20:19] pieterh	anyhow, this makes it much easier to solve properly
[20:19] cremes	other than this, i haven't hit an assertion in a long time
[20:22] pieterh	indeed, we had a competition to kill them :-)
[22:30] jol	pieterh: nice talk at FOSDEM, I just watched it.
[22:40] Steve-o	thx mikko
[23:53] dan	hello
[23:53] mikko	hi
[23:53] dan	i've got a question about zmq
[23:54] mikko	go ahead
[23:54] dan	is there any reason I should not be able to implement a pubsub connection with one side in python and the other in cpp over ipc?
[23:55] mikko	no reason
[23:55] mikko	should be perfectly ok
[23:55] dan	hm
[23:56] mikko	you are not seeing any messages?
[23:56] dan	i see them when I use tcp, but not when i use ipc
[23:56] mikko	can i see the code?
[23:56] dan	what's the best way to share it?
[23:56] dan	copy paste in here?
[23:56] mikko	gist.github.com
[23:57] dan	sure - let me copy the code