[Time] Name | Message |
[00:00] cremes
|
kdj: you are welcome; remember to pay it forward at some point ;)
|
[00:01] kdj
|
Hopefully that won't involve inadvertently leading someone astray. ;)
|
[04:50] zedas
|
sustrik: hey so i still see this poll 100%CPU bug even with the latest 2.1.0 and *cannot* figure out how to fix it. http://dpaste.de/oxeU/
|
[04:51] zedas
|
sustrik: it looks like i'll have to dig into the zeromq code and pull out the error handling that zmq_poll does.
|
[07:07] sustrik
|
zedas: any chance to reproduce the problem here?
|
[07:24] zedas
|
sustrik: it happens at random on my servers, so next time i can gdb to it and debug for you.
|
[07:24] sustrik
|
thanks
|
[07:25] sustrik
|
find out what's looping there
|
[07:31] zedas
|
well i'm pretty sure it's zmq_poll not handling an EAGAIN on zeromq socket objects.
|
[07:31] zedas
|
but i'll confirm it and work up a fix. looking at the code the fix may be a flag that says to not stuff errors.
|
[07:42] sustrik
|
let me have a look...
|
[07:42] sustrik
|
zedas: is that linux?
|
[07:44] sustrik
|
hm, the only operation on zeromq socket objects within zmq_poll is zmq_getsockopt()
|
[07:45] sustrik
|
are you getting EAGAIN from zmq_getsockopt()? That should not happen as far as i am aware.
|
[08:41] enleth
|
Hello
|
[08:44] sustrik
|
hi
|
[08:44] enleth
|
I've got a problem building OMQ - it's about the luuid dependency. OMQ requires the OSSP UUID library, which, due to conflicts with (unmaintained and dropped a long time ago) e2fsprogs libuuid was renamed to libossp-uuid in my Linux distribution and, FWIW, this was generally a very popular solution.
|
[08:45] enleth
|
But OMQ looks for libuuid and the configure script does not accept an alternate name
|
[08:45] sustrik
|
enleth: easy, patch the build system and submit the patch to the mailing list
|
[08:47] enleth
|
Oh, and I just noticed that the proper libuuid provides an uuid-config program for the configure script to use
|
[08:48] enleth
|
uuid-config --libs outputs -lossp-uuid, which should be used
|
[08:48] enleth
|
I guess this is what the build system should do instead of using a hardcoded name
|
[08:49] sustrik
|
great, post your suggestion to the mailing list
|
[08:49] enleth
|
The problem is, my skills with autotools are crap
|
[08:49] sustrik
|
so that build system maintainers can have a look at it
|
[08:49] enleth
|
OK, will do
|
[08:49] sustrik
|
thanks
|
[09:16] enleth
|
No, wait. It does use the old e2fsprogs-derived libuuid, my bad.
|
[09:25] enleth
|
OK, there is no problem, the distro repository manager screwed up and I got a bad upgrade installed
|
[09:34] mikko
|
pieterh: are you here sir?
|
[09:34] pieterh
|
mikko: just arrived
|
[09:36] mikko
|
is there a specific reason why test functions are compiled into zfl ?
|
[09:36] mikko
|
are those symbols needed outside selftest?
|
[09:38] pieterh
|
if you can find a way of compiling a single C source file into two objects, I'm hapopy
|
[09:38] pieterh
|
*happy
|
[09:39] pieterh
|
but the test code must, for me, be in the same source as the actual class
|
[09:40] mikko
|
pieterh: ok
|
[09:41] pieterh
|
mikko: if people are unhappy about extra code in their executables we could make these conditionally compiled
|
[09:42] mikko
|
pieterh: currently i was prototyping something like: https://gist.github.com/3f2a43c19ab439b22884
|
[09:42] mikko
|
separate tests/ directory
|
[09:42] mikko
|
but i think it should be possible to create separate objects from same code as well
|
[09:43] pieterh
|
aaaghhhh.....
|
[09:43] pieterh
|
it's the reason the man pages are a real pain to maintain
|
[09:43] pieterh
|
separate directories look very clean organizationally
|
[09:43] pieterh
|
but they ensure pieces don't get updated
|
[09:44] pieterh
|
also the test cases are essential documentation, like the rest of the source file
|
[09:44] pieterh
|
running the selftest in its own directory is a good idea, some tests need to mess with files
|
[09:44] pieterh
|
but I really, really don't want to find ourselves in the zmq situation of having lots of code that lacks test cases
|
[09:46] mikko
|
hmmm, this gives me additional idea
|
[09:47] mikko
|
in zfls case code coverage reports would make sense
|
[09:47] pieterh
|
yes, as an additional insurance
|
[09:47] pieterh
|
that's meta testing, i.e. testing the test cases
|
[09:48] pieterh
|
it's a neat idea
|
[09:48] mikko
|
i'll put this on my todo
|
[09:49] pieterh
|
there's still space? I'm impressed...
|
[09:49] pieterh
|
:-)
|
[09:49] ianbarber
|
speaking off: mikko, did you move the pear server?
|
[09:49] ianbarber
|
s/off/of
|
[09:50] mikko
|
ianbarber: in the works
|
[09:51] mikko
|
hmm
|
[09:51] mikko
|
i guess the easiest would be to put it where rest of the stuff is
|
[09:51] mikko
|
you can point the dns to 193.211.31.222
|
[09:51] ianbarber
|
i'll point both php. and pear. at it
|
[09:56] mikko
|
looking at the apache rewrite rules this makes me want to use nginx
|
[09:56] kristsk
|
nginx is about the same, imho
|
[09:57] mikko
|
kristsk: dynamic virtualhosting seems a lot more fluent in nginx
|
[10:03] kristsk
|
might be because of nginx's config syntax, it does not feel so archaic
|
[10:05] kristsk
|
in regard to vhosts lighttpd is thought to be more powerful
|
[10:48] Guthur
|
sustrik: do you think having wsapoll on supported win platforms would be good to have?
|
[11:06] ianbarber
|
pieterh: about?
|
[11:06] pieterh
|
ianbarber: about 12ish
|
[11:06] pieterh
|
:-) how can I help you?
|
[11:06] ianbarber
|
:)
|
[11:08] ianbarber
|
i discovered the wonderful land of martinique has a fun domain extension, so the PHP extension is now available on php.zero.mq and pear.zero.mq (pear is the PHP package system). Was wondering - do you want to have zeromq.org listen on zero.mq and www.zero.mq as well, i can point in that direction (even if it's just doing a rewrite to zeromq.org)
|
[11:08] ianbarber
|
we can redirect from hosting as well, just seems like if someone does go to just zero.mq, they should end up at the site. It's on mikko's geo redundant hosting at the mo :)
|
[11:09] pieterh
|
oh... I like it
|
[11:10] ianbarber
|
i can point them at 74.86.234.146 if thats sensible - don't know if there are any weird wikidot issues or similar
|
[11:10] pieterh
|
if you point www.zero.mq to www.wikidot.com, then I'll add it to the custom domains on the website
|
[11:10] ianbarber
|
cool
|
[11:10] ianbarber
|
will do
|
[11:10] pieterh
|
wow, we have a sneaky short domain name, so 2011...
|
[11:11] pieterh
|
afair you can't point zero.mq itself to a DNS name, you need to use the IP address there
|
[11:11] mikko
|
you can
|
[11:11] mikko
|
CNAME
|
[11:11] ianbarber
|
should be able to cname it
|
[11:11] ianbarber
|
yeah
|
[11:12] pieterh
|
maybe I'm confusing with wildcards, I usually point *.zeromq.org etc. to wikidot
|
[11:13] pieterh
|
cname the heck out of it, ianbarber, I'll add the custom domain entries in an hour or so
|
[11:14] ianbarber
|
cool :) I've pointed zero.mq and www.zero.mq, so we'll see then :)
|
[11:15] pieterh
|
would it be worth doing something sneaky like...
|
[11:15] pieterh
|
zero.mq -> redirects to www.zeromq.org/community... ?
|
[11:16] pieterh
|
I can make that work
|
[11:16] pieterh
|
ianbarber: DNS seems to have propagated already, that was fast
|
[11:17] pieterh
|
presumably not cached anywhere
|
[11:17] ianbarber
|
yeah, www wasn't set up before
|
[11:17] ianbarber
|
redirect to community sounds like an idea, if that's doable on wikidot
|
[11:18] pieterh
|
np, give me 5 minutes...
|
[11:20] enleth
|
mikko: hey, just wanted to say thanks for the PHP bindings for ZMQ, TC and TT - good job!
|
[11:21] enleth
|
It was pretty amusing when I opened the github page for ZMQ bindings a moment ago, saw your username and thought "well, I know this guy - what else I might be using that he did?"
|
[11:21] pieterh
|
ianbarber: ok, done, give it a whirl... :-)
|
[11:22] mikko
|
enleth: my pleasure
|
[11:25] ianbarber
|
pieterh: i seem to be getting a password page. that's odd
|
[11:25] pieterh
|
ianbarber: ah, my bad, it's still a private site, will fix immediately
|
[11:25] ianbarber
|
ah, cool
|
[11:26] pieterh
|
ianbarber: try again now?
|
[11:26] ianbarber
|
yep, that's looking good
|
[11:26] ianbarber
|
very nice!
|
[11:26] pieterh
|
it's very cool
|
[14:30] ianbarber
|
pieterh: was thinking, I've noticed that there are a lot of questions on the mailing lists that are solved in broadly the same way, even from people who have read the guide (myself included). I was wondering whether there is any value in some sort of 0MQ pattern library.
|
[14:30] ianbarber
|
sort of like http://developer.yahoo.com/ypatterns/ but with messaging patterns at all kinds of scales
|
[14:31] ianbarber
|
i like how the generic pattern is described and an example given in each one of those (http://developer.yahoo.com/ypatterns/navigation/accordion.html)
|
[14:32] ianbarber
|
but still pretty simple, 1 page
|
[14:51] mikko
|
cremes: you can run make check
|
[14:52] mikko
|
(dont wanna confuse the thread as it has moved on from there)
|
[14:54] cremes
|
mikko: here are the results: https://gist.github.com/829493
|
[14:54] cremes
|
failure...
|
[14:57] mikko
|
No space left on device
|
[14:58] cremes
|
how did i not see that?.... bleary eyed after 30 hours of debugging...
|
[14:58] mikko
|
also, the tests wont output anything but they should assert on failure
|
[14:58] mikko
|
return code for success is 0
|
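A minimal sketch of the convention mikko describes, where a test stays silent and signals failure only through a non-zero exit code. The `run_checks` helper and the stand-in commands are assumptions for illustration; they are not part of the 0MQ build system.

```python
import subprocess
import sys


def run_checks(commands):
    """Run each test command; a check passes only if its exit code is 0."""
    results = {}
    for name, argv in commands.items():
        # Output is captured and ignored: only the exit status matters here.
        completed = subprocess.run(argv, capture_output=True)
        results[name] = (completed.returncode == 0)
    return results


if __name__ == "__main__":
    # Stand-ins for real test binaries: one exits 0, one dies on an assertion.
    demo = {
        "passing": [sys.executable, "-c", "assert 1 + 1 == 2"],
        "failing": [sys.executable, "-c", "assert 1 + 1 == 3"],
    }
    for name, ok in run_checks(demo).items():
        print(name, "PASS" if ok else "FAIL")
```

A failed assertion ends the stand-in process with a non-zero status, which is all the runner looks at.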
[14:59] cremes
|
oh wait, that out of space condition happened overnight as i was testing something
|
[14:59] cremes
|
hold on a sec
|
[15:00] cremes
|
mikko: reload the gist; it now shows all as passing
|
[15:01] cremes
|
my problem with running the tests was i didn't know the right make target
|
[15:01] mikko
|
make check is autotools default test target
|
[15:01] cremes
|
i tried 'make test' and 'make all' but the former didn't exist and the latter didn't seem to run them
|
[15:01] cremes
|
didn't know that
|
[15:02] mikko
|
make test seems to be widely used as well
|
[15:02] cremes
|
looks like all is well; chalk this up to user error
|
[15:02] cremes
|
yeah, maybe adding it as an additional target would be a nice convenience
|
[15:02] mikko
|
i'll add that on todo
|
[15:09] pieterh
|
ianbarber, was eating lunch... back now
|
[15:10] pieterh
|
imo there would be value in a pattern library but I'll use Sustrik's Law here
|
[15:10] pieterh
|
find the person to collect and maintain the patterns, and the problem is solved :-)
|
[15:12] mikko
|
http://build.valokuva.org/job/test-gcov/5/cobertura/?
|
[15:12] mikko
|
zfl code coverage
|
[15:13] ianbarber
|
pieterh_: fair point, i do appreciate sustrik's law :)
|
[15:13] mikko
|
hmm source code missing
|
[15:14] pieterh
|
ianbarber, you can also apply Pieter's Response to Calls to Action
|
[15:15] pieterh
|
"Excellent idea, Ian, I'm curious to see how you do it"
|
[15:15] pieterh
|
Known in ruder groups as nypa :-)
|
[15:16] pieterh
|
Actually, I do have a more positive idea
|
[15:17] pieterh
|
When you see a question solved in a way you think is reusable, point me to it, and I'll cover it in the Guide at some stage
|
[15:17] pieterh
|
there are a lot of chapters waiting to be written
|
[15:19] ianbarber
|
yeah, i think that's good. the guide really is the basis for shared understanding about it
|
[15:19] mikko
|
ah
|
[15:19] mikko
|
finally it works
|
[15:19] mikko
|
https://build.valokuva.org/job/test-gcov/7/cobertura/_default_/zfl_rpcd_c/
|
[15:19] ianbarber
|
i'm happy to do some patterns (at some point!) just wanted to check whether it fitted in with the direction you're taking the guide
|
[15:25] pieterh
|
mikko: sweet!
|
[15:25] pieterh
|
ianbarber, I guess the Guide aims to be the bible, eventually
|
[15:26] pieterh
|
modest aims
|
[15:27] pieterh
|
we can (and by 'we' I really mean 'you') start by collecting text on a wiki page
|
[15:27] pieterh
|
that is trivial, shareable, reusable
|
[15:27] pieterh
|
join the zero.mq (great name) wiki if you're not already on it, start a docs:patterns page...
|
[15:27] ianbarber
|
yeah. i think the tricky thing with the guide is balancing it for new users, and for experienced ones
|
[15:28] ianbarber
|
yep, i'm on it, will do
|
[15:28] pieterh
|
no problem, really... start with simple stuff, get more advanced as you go along
|
[15:28] pieterh
|
patterns would be like a cookbook, stand alone section, with some good indexing
|
[15:28] ianbarber
|
yeah
|
[15:28] ianbarber
|
that's pretty much the idea, just to have a concise example of different interaction models really
|
[15:29] pieterh
|
even copy/paste of solutions from the email list is a good start
|
[15:29] pieterh
|
don't worry about producing prose, that's my speciality
|
[15:42] mikko
|
hi Steve-o
|
[15:42] Steve-o
|
hi mikko
|
[15:43] Steve-o
|
working on new house this week, a foreclosure so many minor issues :/
|
[15:44] Steve-o
|
back in HK next week and back to work
|
[15:44] mikko
|
is your house in the states?
|
[15:44] Steve-o
|
upstate NY
|
[15:45] mikko
|
are you moving there?
|
[15:45] Steve-o
|
near Martha Stewart is about the only notable point
|
[15:46] Steve-o
|
eventually moving there, house prices very cheap so good time to buy
|
[15:46] Steve-o
|
I have another year for my greencard it looks
|
[15:48] Steve-o
|
so what is the status on autoconf in zeromq, any more changes required?
|
[15:49] mikko
|
i think we should get 2.1.0 out before refactoring the openpgm part
|
[15:49] mikko
|
it seems to be working well with openpgm trunk
|
[15:50] mikko
|
some open issues to solve but in general good
|
[15:50] mikko
|
one of them is how to link openpgm if zeromq invokes openpgm built?
|
[15:50] mikko
|
build*
|
[15:50] mikko
|
install openpgm.so and use the shared lib?
|
[15:51] mikko
|
use the object files directly?
|
[15:51] mikko
|
etc
|
[15:51] Steve-o
|
good question, distros would like shared libs,
|
[15:51] mikko
|
linking libpgm.a into libzmq.so works on linux (assuming libpgm.a is position independent code) but not portable
|
[15:52] mikko
|
yes, my only fear is the following scenario:
|
[15:52] Steve-o
|
which is why I don't have a dll on Windows
|
[15:52] mikko
|
user has libpgm installed, now installs zeromq with openpgm support, zeromq invokes openpgm build and overwrites the existing installation
|
[15:54] Steve-o
|
well a common solution I have seen to that is to install the dependent library in a sub-directory of the product build instead of the OS preferred location
|
[15:55] mikko
|
but distros dont like rpath
|
[15:55] Steve-o
|
For convenience prefer static libraries but allow distributions to use shared libraries.
|
[15:55] Steve-o
|
so out of the tarball build libpgm.a but allow configure options for libpgm.so
|
[15:56] mikko
|
but how to use the libpgm.a ?
|
[15:56] mikko
|
.a inside .so is not really portable
|
[15:56] Steve-o
|
really? where isn't it valid?
|
[15:57] mikko
|
i can check, i did a lot of googling on this
|
[16:01] mikko
|
hp-ux seems to be one
|
[16:01] mikko
|
is that even supported by openpgm?
|
[16:01] Steve-o
|
not yet
|
[16:02] mikko
|
Libtool convenience library
|
[16:02] mikko
|
sounds like a solution
|
[16:02] mikko
|
http://sourceware.org/autobook/autobook/autobook_92.html
|
[16:02] mikko
|
groups together a set of object files
|
[16:02] Steve-o
|
that's what zeromq is using now
|
[16:03] mikko
|
but on different side of the fence
|
[16:04] Steve-o
|
let me read up on HPUX, v10 was fine as I remember they broke various things with 11
|
[16:04] mikko
|
Steve-o: how does bundling convenience lib on openpgm side sound like?
|
[16:04] mikko
|
and then zeromq links that
|
[16:04] mikko
|
i could at least investigate this as it seems like a portable option
|
[16:05] Steve-o
|
ok, if you can provide the code, I'm not sure how this is supposed to work with two different projects
|
[16:06] mikko
|
the ultimate goal i guess is to have both as shared libraries provided by distros
|
[16:06] mikko
|
but in the meanwhile convenience lib sounds ok
|
[16:06] mikko
|
i'll put this on my ever growing todo list
|
[16:07] mikko
|
at least i got ZFL code coverage working today
|
[16:08] Steve-o
|
using gcov?
|
[16:08] mikko
|
yes
|
[16:09] mikko
|
http://build.valokuva.org/job/test-gcov/7/cobertura/_default_/
|
[16:12] Steve-o
|
nice, it's tedious getting those percentages higher though
|
[16:13] mikko
|
true. you would almost need to preload a malloc implementation that fails randomly
|
[16:13] mikko
|
to test all asserts
|
[16:14] mikko
|
and even then it would be very random
|
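One way to make mikko's idea less random, sketched under assumptions: instead of an allocator that fails at random, fail the Nth allocation and walk N upward so each error path is hit exactly once. This is a Python stand-in rather than a preloaded malloc, and `FailingAllocator` and `build_message` are hypothetical names.

```python
class InjectedAllocationFailure(MemoryError):
    """Raised in place of a real out-of-memory condition."""


class FailingAllocator:
    """Hands out bytearrays, but fails on chosen call numbers (1-based)."""

    def __init__(self, fail_on):
        self.fail_on = set(fail_on)
        self.calls = 0

    def alloc(self, size):
        self.calls += 1
        if self.calls in self.fail_on:
            raise InjectedAllocationFailure(f"injected failure on call {self.calls}")
        return bytearray(size)


def build_message(allocator):
    """Hypothetical code under test: needs two allocations to succeed."""
    header = allocator.alloc(8)
    body = allocator.alloc(64)
    return header, body


def failure_points(max_calls):
    """Fail each allocation in turn and record which call numbers were reached."""
    reached = []
    for n in range(1, max_calls + 1):
        try:
            build_message(FailingAllocator(fail_on={n}))
        except InjectedAllocationFailure:
            reached.append(n)
    return reached


if __name__ == "__main__":
    print(failure_points(3))
```

Walking the failure point deterministically gives the same coverage on every run, which a randomly failing allocator cannot promise.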
[16:15] mikko
|
might add same thing for zeromq later as well
|
[16:17] cremes
|
pieterh: ping... where is "zhelpers.h"? i can't compile your mailbugz.c test without it
|
[16:18] pieterh
|
cremes: sorry!
|
[16:18] pieterh
|
adding it now
|
[16:18] sustrik
|
cremes, just replace it with zmq.h
|
[16:18] pieterh
|
sustrik: nope, that and other stuff
|
[16:18] sustrik
|
there's nothing used from zhelpers.h in the code
|
[16:18] sustrik
|
i've just compiled it
|
[16:18] sustrik
|
aha
|
[16:18] sustrik
|
replace the line with:
|
[16:18] sustrik
|
#include <zmq.h>
|
[16:19] sustrik
|
#include <stdio.h>
|
[16:19] sustrik
|
#include <string.h>
|
[16:19] sustrik
|
that works
|
[16:19] pieterh
|
yes, that works
|
[16:21] Steve-o
|
mikko: ok so I already have the libtool convenience library libpgm.la, libtool is giving me the shared and static libraries for free
|
[16:22] mikko
|
Steve-o: i know, but if you link against the .la from zeromq it gives a warning "Warning: libpgm.la won't be deployed"
|
[16:22] mikko
|
not sure if that can be ignored
|
[16:22] mikko
|
maybe it can
|
[16:22] Steve-o
|
is that because of a noinst_ line?
|
[16:23] mikko
|
i got a local branch here
|
[16:23] pieterh
|
sustrik, in the pubsub pattern it is IMO a design flaw that zmq_connect is asynchronous
|
[16:23] mikko
|
Steve-o: https://gist.github.com/3f14f1a3f816df3016c7
|
[16:23] mikko
|
these are some of the changes related to zeromq
|
[16:24] pieterh
|
that is, on a sub socket
|
[16:25] mikko
|
Steve-o: i tested that with ./configure --without-documentation --with-pgm=/tmp/to/pgm-trunk
|
[16:28] Steve-o
|
mikko: I can't find anything on that error message in google
|
[16:29] sustrik
|
pieterh_: why so?
|
[16:32] zedas
|
sustrik: yep that's linux. why?
|
[16:33] sustrik
|
there are 2 implementations of zmq_poll
|
[16:33] sustrik
|
i was just checking which one to have a look at
|
[16:34] sustrik
|
anyway, what's the problem you were referring to?
|
[16:35] sustrik
|
ah, the EAGAINs in strace
|
[16:35] Steve-o
|
mikko: maybe I need to explicitly add a noinst_LTLIBRARIES instead of lib_LTLIBRARIES
|
[16:35] sustrik
|
i've missed the link, sorry
|
[16:36] cremes
|
pieterh_: i don't compile a lot of C programs; what's the gcc line to get the example to compile & link?
|
[16:37] cremes
|
nm, got it
|
[16:38] mikko
|
Steve-o: gimme a sec
|
[16:38] mikko
|
getting the exact error message out
|
[16:40] pieterh
|
cremes: sorry, my irc client's not alerting me for some reason
|
[16:41] cremes
|
no worries; i compiled the program and ran it successfully
|
[16:41] cremes
|
no failures
|
[16:41] cremes
|
so my hypothesis must be wrong as to the cause of the mailbox assertion
|
[16:41] pieterh
|
at least it's not that simple
|
[16:42] cremes
|
right
|
[16:42] pieterh
|
assuming I got the case right
|
[16:42] pieterh
|
5M writes, 5M reads...
|
[16:42] cremes
|
you got it right as i explained it
|
[16:42] pieterh
|
sustrik: sorry also, I'm not getting beeps...
|
[16:42] pieterh
|
pubsub fails, for every new user, in the same way
|
[16:43] pieterh
|
subscriber connects, then misses X milliseconds of messages
|
[16:43] sustrik
|
ack
|
[16:43] pieterh
|
i'm not sure doing a synchronous connect would make any difference
|
[16:43] sustrik
|
it probably won't
|
[16:43] cremes
|
pieterh_: is it possible to run this under gdb and have it drop into the debugger instead of asserting?
|
[16:43] pieterh
|
but there is definitely a problem when every user hits the same issue
|
[16:44] cremes
|
if so, perhaps i could dump the contents of the mailbox?
|
[16:44] pieterh
|
cremes, afaik usual tactic is to get a core dump and then debug from there
|
[16:44] pieterh
|
i'm no gdb expert
|
[16:44] cremes
|
ok, how can i force it to core?
|
[16:44] sustrik
|
cremes: p
|
[16:44] pieterh
|
divide by zero?
|
[16:45] sustrik
|
when you want to dump the content of variable x, type "p x"
|
[16:45] pieterh
|
assertion failure will produce a core I think
|
[16:45] pieterh
|
you need to enable core dumps for your process
|
[16:45] pieterh
|
ulimit -c unlimited
|
[16:45] cremes
|
yeah, right now i'm set for a core size of 0; i can change that
|
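A rough sketch of raising the core-size limit from inside a process rather than from the shell, assuming a Unix system where Python's `resource` module is available. `enable_core_dumps` is a hypothetical helper; whether an abort then actually writes a core file still depends on system settings.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-size limit to the hard limit for this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # A process may always raise its soft limit up to its hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    if soft == 0:
        print("core dumps still disabled: the hard limit is 0")
    else:
        print("core size limit is now", soft)
```

Child processes started afterwards inherit the raised limit.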
[16:46] cremes
|
are you sure the assertion causes a core?
|
[16:46] sustrik
|
cremes: just start the executable under gdb
|
[16:46] sustrik
|
it will stop and get you gdb prompt when assertion is hit
|
[16:46] pieterh
|
yeah, and make sure it's compiled and linked for debugging
|
[16:47] cremes
|
i did run it under gdb several times; the assertion would cause the ruby runtime to throw an exception and exit cleanly
|
[16:47] cremes
|
so gdb never caught the issue
|
[16:47] cremes
|
outside of gdb, it would assert
|
[16:47] cremes
|
very frustrating
|
[16:47] sustrik
|
:|
|
[16:48] pieterh
|
my brute force approach would be to add code to 0MQ that dumps the mailbox just before it asserts, under the same conditions
|
[16:48] pieterh
|
don't waste time trying to get debuggers working unless you already know how
|
[16:49] cremes
|
i like that suggestion; any suggestion on how to dump the mailbox?
|
[16:49] sustrik
|
cremes: i would do a bit different thing
|
[16:49] cremes
|
i.e. are there important components to capture or should i just dump it as a string?
|
[16:49] cremes
|
sustrik: talk to me
|
[16:49] sustrik
|
just print some text when mailbox_t::send() is invoked
|
[16:50] sustrik
|
in your scenario the number of invocations should be pretty modest
|
[16:50] sustrik
|
if it starts printing a lot of text, there's definitely some problem there
|
[16:50] cremes
|
sustrik: just any text like "mailbox.send!"
|
[16:50] sustrik
|
yes
|
[16:50] cremes
|
ok
|
[16:51] cremes
|
so you don't care about the contents of the mailbox
|
[16:51] sustrik
|
not really
|
[16:51] cremes
|
ok, i'll try that now
|
[16:51] sustrik
|
if we find out that a lot of commands are being written
|
[16:52] sustrik
|
we'll have a look at what kind of commands those are
|
[17:10] pieterh
|
mikko: I'm improving some of the coverage but it's always going to miss on assertions, apparently
|
[17:15] mikko
|
pieterh_: yes
|
[17:15] mikko
|
i dont think it calculates those
|
[17:15] pieterh
|
hey, my beep works now! :-)
|
[17:15] mikko
|
and 100% is not really a realistic or even desirable aim
|
[17:15] mikko
|
Steve-o: i think i solved it
|
[17:15] pieterh
|
ok, I'll improve some of the coverage but like Steve-o says, it gets messy
|
[17:16] mikko
|
Steve-o: almost. now it compiles twice it seems
|
[17:17] ianbarber
|
just to be doubly sure
|
[17:18] ianbarber
|
compare the two, and if they're different fail on a non-deterministic build process
|
[17:25] cremes
|
sustrik: yes, there are a *lot* of commands sent
|
[17:25] sustrik
|
ok
|
[17:25] cremes
|
what's the next step? dump the commands when the mailbox buffer is increased?
|
[17:26] sustrik
|
can you print out cmd->type?
|
[17:26] sustrik
|
that will show what kind of commands are being passed
|
[17:26] cremes
|
sure; on every invocation or just when the buffer size is increased?
|
[17:26] sustrik
|
on every invocation
|
[17:26] cremes
|
ok
|
[17:28] cremes
|
sustrik: i see it's defined as an enum so i can use printf("%d", cmd->type), yes?
|
[17:29] sustrik
|
printf("%d", (int) cmd->type)
|
[17:29] sustrik
|
just in case
|
[17:29] cremes
|
k
|
[17:31] cremes
|
sustrik: mailbox.cpp:158:34: error: base operand of '->' has non-pointer type 'const zmq::command_t'
|
[17:32] cremes
|
??
|
[17:32] sustrik
|
it should be cmd_.type
|
[17:32] sustrik
|
sorry
|
[17:34] cremes
|
clean compile; running now
|
[17:37] cremes
|
sustrik: here's a sampling of what i see; the cmd is wrapped in TY(cmd) so i can pick it out of the log easily
|
[17:37] cremes
|
https://gist.github.com/829782
|
[17:39] sustrik
|
do you call connect or bind in that app?
|
[17:40] cremes
|
i call both early on during setup, then i don't need to call it again
|
[17:41] sustrik
|
ah, both are in the same process
|
[17:41] sustrik
|
i see
|
[17:41] sustrik
|
what transport do you use?
|
[17:41] sustrik
|
tcp? inproc? ipc?
|
[17:41] cremes
|
tcp
|
[17:42] sustrik
|
cremes: can you printf something in connect_session_t::detached() function?
|
[17:42] cremes
|
yes
|
[17:42] sustrik
|
(that way we'll see if there's a lot of reconnecting happening)
|
[17:47] cremes
|
sustrik: [cremes@box1 servers]$ grep ^REC t.out | wc -l
|
[17:47] cremes
|
921674
|
[17:47] cremes
|
so yes, lots of reconnects
|
[17:51] cremes
|
this is a threaded app writing to the same logfile so sequence is a bit suspect
|
[17:52] cremes
|
however, it appears each REC is always followed by command type 1 or 3 (plug or attach) which kind of makes sense
|
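The kind of tally cremes describes can be sketched as a small script. The log shapes here are assumptions based on the conversation (reconnect lines starting with `REC`, command types wrapped as `TY(<number>)`); the real gist may differ.

```python
import re
from collections import Counter

# Assumed log shapes: reconnect lines start with "REC", and command types
# appear as "TY(<number>)" somewhere in a line.
TYPE_PATTERN = re.compile(r"TY\((\d+)\)")


def summarize(lines):
    """Count reconnect lines and tally the command types seen in the log."""
    reconnects = 0
    types = Counter()
    for line in lines:
        if line.startswith("REC"):
            reconnects += 1
        for match in TYPE_PATTERN.findall(line):
            types[int(match)] += 1
    return reconnects, types


if __name__ == "__main__":
    sample = ["REC", "TY(1)", "TY(3)", "REC", "TY(1)", "other output"]
    reconnects, types = summarize(sample)
    print(reconnects, dict(types))
```

Because several threads write to the same file, the counts are more trustworthy than the ordering of adjacent lines.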
[17:56] sustrik
|
yep
|
[17:56] sustrik
|
the question is: why does it reconnect at all?
|
[17:57] sustrik
|
moreover, the default reconnect interval is 0.1 sec
|
[17:57] cremes
|
agreed; all transport strings are of the form 'tcp://127.0.0.1:<port>'
|
[17:57] sustrik
|
so to get 921674 would require a couple of days
|
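The arithmetic behind sustrik's point, as a quick sketch: if each socket retries no faster than once per reconnect interval, the count cremes measured needs many hours of wall-clock time, not minutes. `min_hours` is a hypothetical helper, and the six-socket figure assumes every socket reconnects at the maximum rate.

```python
def min_hours(reconnects, interval_s=0.1, sockets=1):
    """Least wall-clock time, in hours, to see that many reconnects if every
    socket retries no faster than once per interval."""
    return reconnects * interval_s / sockets / 3600


if __name__ == "__main__":
    print(f"one socket:  {min_hours(921674):.1f} h")
    print(f"six sockets: {min_hours(921674, sockets=6):.1f} h")
```

Either way the observed rate is far above what the default interval allows, which suggests the reconnects are not going through the normal back-off path.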
[17:58] sustrik
|
you mean: "both" rather than "all", right?
|
[17:59] cremes
|
there is a PUB producer, a FORWARDER device, and multiple SUB consumers in this process
|
[17:59] cremes
|
they all connect up in the beginning and should never close/reconnect for the life of the program
|
[17:59] cremes
|
so each one has its own transport connection string; that's what i meant by 'all'
|
[18:00] sustrik
|
i see
|
[18:00] sustrik
|
how many SUBs?
|
[18:01] cremes
|
let's see...
|
[18:02] sustrik
|
approximately...
|
[18:02] sustrik
|
tens, hundreds, thousands?
|
[18:02] cremes
|
5 in the clients and 1 in the FORWARDER, so about 6 (i might be forgetting one or two)
|
[18:02] sustrik
|
ok
|
[18:03] sustrik
|
do you close the FORWARDER before closing the SUBs?
|
[18:04] cremes
|
they should all terminate at roughly the same time when i interrupt/kill the program
|
[18:04] sustrik
|
ok
|
[18:05] cremes
|
otherwise, the FORWARDER never exits
|
[18:05] sustrik
|
does FORWARDER connect to SUBs or other way round?
|
[18:05] cremes
|
FORWARDER binds while all clients connect
|
[18:05] sustrik
|
what about PUB?
|
[18:05] cremes
|
actually, the IN/OUT sockets on the FORWARDER always bind
|
[18:06] cremes
|
the publisher connects too as a result
|
[18:06] sustrik
|
ok
|
[18:06] sustrik
|
hm, i see no reason then for reconnections to happen
|
[18:06] sustrik
|
are you 100% sure that the connection strings match?
|
[18:07] cremes
|
match in what way?
|
[18:07] cremes
|
they are all tcp?
|
[18:07] sustrik
|
are they the same on bind and connect side?
|
[18:07] cremes
|
if they weren't, the data wouldn't flow through my app, yes?
|
[18:08] sustrik
|
ah, the data flow through
|
[18:08] sustrik
|
i see
|
[18:08] sustrik
|
to all 5 subs?
|
[18:09] cremes
|
yes, the main PUB broadcasts and the 5 subs each sub to everything
|
[18:09] sustrik
|
and all of them actually get the data
|
[18:09] cremes
|
if they weren't getting the data, the app would lock (and produce something similar to EFSM in my code)
|
[18:09] sustrik
|
ok, good
|
[18:09] cremes
|
it's kind of like an election algo
|
[18:10] sustrik
|
to be frank, i have no idea what's going on there
|
[18:10] sustrik
|
if the reconnections happen
|
[18:10] sustrik
|
one would expect that at least some messages would be lost
|
[18:10] cremes
|
any idea how i can do 900k reconnects in a few minutes?
|
[18:10] sustrik
|
no idea
|
[18:11] cremes
|
<sigh>
|
[18:11] sustrik
|
have you changed the default RECONNECT_IVL?
|
[18:11] cremes
|
btw, i ran pieter's mailbugz code with these debug prints in them and it barely puts out anything at all
|
[18:11] sustrik
|
exactly
|
[18:11] cremes
|
nope, no changes to RECONNECT_IVL
|
[18:12] cremes
|
all sockets are allocated in their default state; the one exception is calling setsockopt on the SUBs to set their subscription string
|
[18:12] cremes
|
and i always set my own IDENTITY
|
[18:12] cremes
|
someone on the ML suggested a potential IDENTITY collision; could that be related?
|
[18:13] sustrik
|
maybe
|
[18:13] sustrik
|
do you have identity collisions there?
|
[18:13] sustrik
|
like all 5 subs having the same identity?
|
[18:13] cremes
|
i shouldn't; the identity is always <random id>.<sock type>.<server type> where random id is 0 to 999_999_999
|
[18:14] cremes
|
it's *possible* there is a collision but *improbable*
|
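How improbable is "improbable"? A small birthday-problem sketch, assuming the random part is drawn uniformly from the 1,000,000,000 values cremes mentions and that about six sockets share the same suffix. `collision_probability` is a hypothetical helper.

```python
def collision_probability(sockets, id_space=1_000_000_000):
    """Chance that at least two of `sockets` uniform random ids coincide."""
    p_unique = 1.0
    for i in range(sockets):
        # The i-th id must avoid the i ids already drawn.
        p_unique *= (id_space - i) / id_space
    return 1.0 - p_unique


if __name__ == "__main__":
    print(f"{collision_probability(6):.2e}")
```

For six sockets this comes out near 1.5e-8, so a chance collision is a poor explanation unless the ids are not really random.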
[18:14] sustrik
|
try printing them out
|
[18:15] cremes
|
i'm auditing that right now; give me 5m
|
[18:22] pieterh
|
cremes, are you sure you're initializing your random number generator?
|
[18:22] pieterh
|
if not, every client will produce an identical 'random' sequence
|
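pieterh's concern can be sketched as follows: two generators started from the same seed produce the same "random" identity. The identity shape follows cremes's description; `make_identity` and the socket and server labels are hypothetical.

```python
import random


def make_identity(rng, sock_type, server_type):
    """Build an identity shaped like <random id>.<sock type>.<server type>."""
    return f"{rng.randrange(1_000_000_000)}.{sock_type}.{server_type}"


if __name__ == "__main__":
    # Two "clients" that start from the same fixed seed draw the same id.
    a = make_identity(random.Random(1), "SUB", "client")
    b = make_identity(random.Random(1), "SUB", "client")
    print(a == b)

    # With no seed argument the generator is seeded from the system, which
    # makes a repeat very unlikely, though not impossible.
    c = make_identity(random.Random(), "SUB", "client")
    print(c)
```

Python seeds its default generator automatically; in languages where the default seed is fixed, the first two lines are what every unseeded client would do.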
[18:23] pieterh
|
cremes: if you're getting reconnects, presumably you're also getting disconnects
|
[18:23] pieterh
|
and if you can find those, you can find what is causing them
|
[18:24] pieterh
|
sustrik: how many places does 0MQ forcefully disconnect a subscriber socket without assertion
|
[18:24] pieterh
|
do we have the sys: transport working?
|
[18:26] sustrik
|
pieterh_: every time the other side does something unexpected
|
[18:26] sustrik
|
such as sending malformed frame
|
[18:26] pieterh
|
yeah, but are there lots of places in the code?
|
[18:26] sustrik
|
not many, 3-4 i think
|
[18:26] pieterh
|
right... so a few well-placed prints and we'll know what's happening
|
[18:26] sustrik
|
sys: works
|
[18:27] sustrik
|
and should be used exactly for this kind of thing
|
[18:27] pieterh
|
precisely
|
[18:27] sustrik
|
the only problem is that some kind of throttling
|
[18:27] sustrik
|
not to get the log overloaded
|
[18:27] pieterh
|
presumably all we care about are the first 10 messages
|
[18:27] sustrik
|
i.e. if the same problem happens over and over again
|
[18:27] sustrik
|
in 10us intervals
|
[18:28] sustrik
|
only the first one should be reported
|
[18:28] pieterh
|
add a numeric code and ignore duplicates, standard solution
|
[18:28] sustrik
|
you need some kind of state machine
|
[18:29] sustrik
|
if a connect failure happens, log it and switch to "no log" state
|
[18:29] cremes
|
alas, it looks to me like they are all unique: https://gist.github.com/829865
|
[18:29] sustrik
|
any subsequent connect failures are not logged
|
[18:29] cremes
|
interestingly, out of all 4 components, only the one that crashes shows the hundreds of thousands of reconnects
|
[18:29] sustrik
|
when connecting succeeds, switch back to "log" state
|
[18:29] sustrik
|
so that the next disconnect gets logged
|
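A minimal sketch of the two-state machine sustrik outlines: the first connect failure is logged and logging is switched off, and a successful connect switches it back on. `ConnectLogger` and its method names are hypothetical; nothing here is taken from the 0MQ source.

```python
class ConnectLogger:
    """Log only the first failure in a run of consecutive connect failures."""

    def __init__(self):
        self.logging_enabled = True   # the "log" state
        self.lines = []

    def connect_failed(self, endpoint):
        if self.logging_enabled:
            self.lines.append(f"connect failed: {endpoint}")
            self.logging_enabled = False   # enter the "no log" state

    def connect_succeeded(self, endpoint):
        # Success re-arms logging so the next failure is reported again.
        self.logging_enabled = True


if __name__ == "__main__":
    log = ConnectLogger()
    for _ in range(1000):
        log.connect_failed("tcp://127.0.0.1:5555")
    log.connect_succeeded("tcp://127.0.0.1:5555")
    log.connect_failed("tcp://127.0.0.1:5555")
    print(len(log.lines))
```

A thousand back-to-back failures produce one line, and the failure after the successful connect produces a second.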
[18:30] pieterh
|
you don't need anything that complex IMO
|
[18:30] pieterh
|
if you get more than 1000 alerts on sys: you can give up
|
[18:30] pieterh
|
(in a minute, hour, day_)
|
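pieterh's simpler alternative, sketched under assumptions: tag each alert with a numeric code, drop repeats of a code, and stop emitting once a cap is reached within the current window. `AlertLimiter` is a hypothetical name, and calling `reset()` at each window boundary (minute, hour, day) is left to the caller.

```python
class AlertLimiter:
    """Drop duplicate alert codes and stop after a fixed number of alerts."""

    def __init__(self, max_alerts=1000):
        self.max_alerts = max_alerts
        self.seen_codes = set()
        self.emitted = []

    def alert(self, code, text):
        """Return True if the alert was emitted, False if it was dropped."""
        if code in self.seen_codes:
            return False              # duplicate of an earlier alert
        if len(self.emitted) >= self.max_alerts:
            return False              # cap reached: give up for this window
        self.seen_codes.add(code)
        self.emitted.append((code, text))
        return True

    def reset(self):
        """Start a new window: forget seen codes and the emitted count."""
        self.seen_codes.clear()
        self.emitted.clear()


if __name__ == "__main__":
    limiter = AlertLimiter(max_alerts=2)
    print(limiter.alert(1, "connect failed"))
    print(limiter.alert(1, "connect failed"))
    print(limiter.alert(2, "malformed frame"))
    print(limiter.alert(3, "identity in use"))
```

This keeps no per-connection state, at the cost of staying silent about a repeated problem until the next window.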
[18:30] pieterh
|
cremes, you may want to add prints in the places 0MQ *disconnects* subscribers
|
[18:31] sustrik
|
cremes: no more ideas, i need a minimal test case
|
[18:31] sustrik
|
to reproduce it here
|
[18:31] cremes
|
ok, i'll keep poking at it
|
[18:32] pieterh
|
sustrik, can you tell cremes where those 3-4 places are?
|
[18:32] sustrik
|
hm, i don't know precisely
|
[18:32] sustrik
|
dhammika has supplied those patches
|
[18:33] pieterh
|
it used to be easy 'egrep assert *.cpp'
|
[18:33] sustrik
|
maybe check the commit log
|
[18:33] sustrik
|
?
|
[18:33] sustrik
|
it's not asserting, it's closing the connections
|
[18:36] cremes
|
this conversation gave me an idea... i think i am narrowing it down... give me 10m
|
[18:36] pieterh
|
sustrik, I meant, it *used* to assert and I remember several times chasing down framing errors by sticking printfs into those places
|
[18:38] sustrik
|
these asserts have been removed via your "0MQ competition" :)
|
[18:46] cremes
|
sustrik, pieterh_: found it!
|
[18:46] pieterh
|
:-)
|
[18:46] cremes
|
i had a duplicate identity on an unrelated XREQ socket!
|
[18:46] pieterh
|
yay!
|
[18:47] cremes
|
to reproduce, it's probably just these steps...
|
[18:47] pieterh
|
sustrik, does zmq already send anything to sys:?
|
[18:47] cremes
|
1. create a QUEUE device that binds to some port
|
[18:47] sustrik
|
pieterh_: no
|
[18:47] cremes
|
2. create two XREQ (REQ too?) sockets, set their identity the same and connect them to the QUEUE
|
[18:47] cremes
|
3. check for reconnects
|
[18:48] cremes
|
4. Maybe need to send some data through first...?
|
[18:48] pieterh
|
cremes: I'll make a test case later on
|
[18:48] cremes
|
ok, thanks pieter! your c skills far exceed my own
|
[18:48] pieterh
|
what do you mean by 'check for reconnects'?
|
[18:48] cremes
|
thank you both so much for working through this with me; this conversation solved it
|
[18:49] pieterh
|
i'd like to get a test case that results in a crash
|
[18:49] cremes
|
i added a debug statement to connect_session.cpp:detach to print whenever it detached and attempted a reconnect
|
[18:50] cremes
|
let me try to write one in ruby
|
[18:50] pieterh
|
this still does not explain why the mailbox exploded...
|
[18:50] cremes
|
then i can tell you exactly what needs to be done in c
|
[18:50] pieterh
|
yes, make a ruby test case, that's perfect
|
[18:50] pieterh
|
exploding mailbox gets double score
|
[18:50] pieterh
|
sustrik: we should start to send stuff to sys: where we used to assert
|
[18:51] pieterh
|
if you can document how to use sys: from inside zmq I can try that
|
[18:51] pieterh
|
ideally, a 1-liner that sends a string... :-)
|
[18:52] pieterh
|
then we can apply that to cremes test case and check that we'd have caught this error
|
[18:52] sustrik
|
log ();
|
[18:52] sustrik
|
it's there
|
[18:53] pieterh
|
ah, it requires all the work of creating a message first
|
[18:53] pieterh
|
that's tedious
|
[18:54] pieterh
|
do we have a standardized format for sys://log messages?
|
[18:54] pieterh
|
sorry to complain but if this was packaged somewhat, it'd be easier for people to use it internally
|
[18:55] sustrik
|
no format
|
[18:55] sustrik
|
just use string atm
|
[18:55] sustrik
|
we can polish the format later on
|
[18:55] pieterh
|
every single object has a log method?
|
[18:56] pieterh
|
inherited from object_t?
|
[18:56] sustrik
|
yes
|
[18:56] pieterh
|
so the log method there could be somewhat expanded to take a string and create/destroy the msg itself
|
[18:57] pieterh
|
afaics we don't use this anywhere yet
|
[18:57] sustrik
|
sure
|
[18:57] pieterh
|
and then we need a documented parsable format for messages
|
[18:57] pieterh
|
minimal
|
[18:57] pieterh
|
easy to improve later
|
[18:57] sustrik
|
ack
|
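As an illustration of what a "minimal, easy to improve later" parsable format could look like, here is a Python sketch. Nothing here is the 0MQ sys://log format (sustrik states above that no format exists yet): the field layout `<unix-seconds> <numeric-code> <text>` and the names `format_log_line` and `parse_log_line` are assumptions made for the example.

```python
import time


def format_log_line(code, text, timestamp=None):
    """Build one log line: "<unix-seconds> <4-digit code> <free text>" (hypothetical layout)."""
    ts = time.time() if timestamp is None else timestamp
    # keep each record on a single line so a reader can split on newlines
    flat_text = " ".join(text.splitlines())
    return f"{int(ts)} {code:04d} {flat_text}"


def parse_log_line(line):
    """Split a line produced by format_log_line back into (timestamp, code, text)."""
    ts, code, text = line.split(" ", 2)
    return int(ts), int(code), text
```

A numeric code in a fixed position also makes pieterh's earlier suggestion of ignoring duplicates straightforward, since a consumer can compare codes without parsing the free text.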
[18:57] pieterh
|
ok, I'll try my hand at this, apologies in advance...
|
[19:11] cremes
|
yes! i have a reproducible crasher in ruby!
|
[19:12] cremes
|
pieterh_: do you want the ruby code or an explanation for translation to c?
|
[19:12] pieterh
|
cremes, I think we need to log two issues here
|
[19:13] cremes
|
ok, i can create the issues, but i only see one
|
[19:13] pieterh
|
(a) lack of any warning to the app developer
|
[19:13] pieterh
|
(b) mailbox crash
|
[19:13] pieterh
|
(b) is the critical one, and the ruby example will be valuable there
|
[19:13] cremes
|
ok, so (a) is for tracking a new feature request to add the sys: stuff, yes?
|
[19:13] pieterh
|
yes
|
[19:13] cremes
|
ok, i'll write them up
|
[19:14] pieterh
|
well, we don't track new feature requests, so perhaps skip (a)
|
[19:14] cremes
|
i'll add it to the wiki 3.0/roadmap page
|
[19:14] pieterh
|
i'm working on it now... :-)
|
[19:15] cremes
|
ok!
|
[19:27] cremes
|
pieterh_: preview this issue and let me know if you need more details to reproduce in c: https://github.com/zeromq/zeromq2/issues/165
|
[19:27] pieterh
|
cremes, thanks!
|
[19:27] cremes
|
pieterh_: i've spent the last 96 hours banging on this! i'm happy to see it solved!
|
[19:28] pieterh
|
that's why i'm doing the sys://log stuff, it's insane to lose so much time to a missing warning
|
[19:28] cremes
|
honestly, i'm taking the rest of the day off.... i feel deflated
|
[19:29] pieterh
|
sustrik, what's the correct way to work with a msg in the zmq core?
|
[19:30] pieterh
|
::zmq_msg_t or is there a message class I'm missing?
|
[19:39] enleth
|
Hello
|
[19:39] enleth
|
mikko: is the API documentation at http://valokuva.org/~mikko/php-zmq/ supposed to be inaccessible?
|
[19:46] ianbarber
|
enleth: check php.zero.mq
|
[19:46] ianbarber
|
references probably need updating
|
[19:50] mikko
|
enleth: yes
|
[19:55] enleth
|
ianbarber: thanks, that's it.
|
[19:56] enleth
|
mikko: can I suggest a 302 redirect to the new address?
|
[19:56] pieterh
|
cremes: still there?
|
[19:56] enleth
|
The old one is all over the latest git tree
|
[20:01] mikko
|
done
|
[20:01] cremes
|
pieterh_: for a bit more; what's up?
|
[20:01] pieterh
|
just wondered if you need to actually use the REQ/REP sockets to create the crash
|
[20:01] pieterh
|
or just bind them and BOOM
|
[20:02] pieterh
|
s/bind/connect
|
[20:02] cremes
|
let me see... give me 1m
|
[20:03] cremes
|
pieterh_: nope, crashes without using them; good catch... it's even *more* reduced now
|
[20:03] pieterh
|
excellent...
|
[20:03] pieterh
|
thanks a lot
|
[20:03] cremes
|
i'm no longer thinking clearly otherwise i would have tried that :)
|
[20:04] pieterh
|
it's been a long day :-)
|
[20:04] cremes
|
pieterh_: looks like you *do* need the REQ socket too
|
[20:05] cremes
|
a pair of REP's with the same ID is insufficient
|
[20:05] cremes
|
it's been a long *week*
|
[20:05] pieterh
|
ack, you need a pair of sockets with one disconnecting the other
|
[20:05] pieterh
|
presumably, I'll test that, it applies to all relevant socket types
|
[20:05] pieterh
|
it's been a long *year*!
|
[20:05] cremes
|
perhaps...
|
[20:05] pieterh
|
hang on...
|
[20:06] pieterh
|
:-)
|
[20:06] cremes
|
heh
|
[20:10] pieterh
|
cremes: bingo, I reproduced it!
|
[20:10] cremes
|
awesome!
|
[20:11] cremes
|
once started it only takes a few seconds to exhaust that buffer even when it's 5MB!
|
[20:11] pieterh
|
just connect two req sockets with same ID, wait 1 second...
|
[20:11] pieterh
|
I'm going to try with other socket types now
|
[20:16] pieterh
|
cremes: it affects all socket types
|
[20:16] pieterh
|
any combination of bind/connect, even pub connecting to sub
|
[20:17] cremes
|
wow
|
[20:17] cremes
|
this *might* explain a lot of people's problems; there are several issues open about this assertion
|
[20:18] pieterh
|
ironically 0MQ used to assert before :-)
|
[20:18] cremes
|
oh, the irony... :(
|
[20:19] cremes
|
well, i'm just glad it's no longer a mystery
|
[20:19] pieterh
|
anyhow, this makes it much easier to solve properly
|
[20:19] cremes
|
other than this, i haven't hit an assertion in a long time
|
[20:22] pieterh
|
indeed, we had a competition to kill them :-)
|
[22:30] jol
|
pieterh: nice talk at fosdem, I just watched it.
|
[22:40] Steve-o
|
thx mikko
|
[23:53] dan
|
hello
|
[23:53] mikko
|
hi
|
[23:53] dan
|
i've got a question about zmq
|
[23:54] mikko
|
go ahead
|
[23:54] dan
|
is there any reason I should not be able to implement a pubsub connection with one side in python and the other in cpp over ipc?
|
[23:55] mikko
|
no reason
|
[23:55] mikko
|
should be perfectly ok
|
[23:55] dan
|
hm
|
[23:56] mikko
|
you are not seeing any messages?
|
[23:56] dan
|
i see them when I use tcp, but not when i use ipc
|
[23:56] mikko
|
can i see the code?
|
[23:56] dan
|
what's the best way to share it?
|
[23:56] dan
|
copy paste in here?
|
[23:56] mikko
|
gist.github.com
|
[23:57] dan
|
sure - let me copy the code
|