Wednesday October 26, 2011

[Time] Name Message
[05:44] liang2012 has any one encountered a send being blocked?
[09:28] CIA-79 libzmq: 03Mikko Koppanen 07master * r6c1b50c 10/ (acinclude.m4 src/ip.cpp): Added compile-time test for SOCK_CLOEXEC ...
[09:51] mikko hi
[09:51] mikko how was ukraine?
[09:52] sustrik mikko: hi
[09:52] sustrik great
[09:52] sustrik 0mq meet turned into pycon after-after-party as people leaving the after-party joined the meetup
[09:53] sustrik some have allegedly even continued to an after-after-after-party :)
[09:54] sustrik btw, i've asked pieter to pull the CLOEXEC patch to 3.0 and 2.1
[09:55] mikko cool. thanks
[10:04] mikko sustrik: seen LIBZMQ-275?
[10:04] sustrik yes
[10:04] sustrik i kind of recall fixing that kind of thing already
[10:23] jond hi Martin re: that zynga issue which Henry Geddes posted a stack trace on; another poster Marc Rossi remembered a patch he had applied which does appear to be in 2.1.10 but the code is slightly different to original patch as it handles a timeout now; this is in socket_base_t::recv. does this ring any bells?
[10:40] mikko sustrik: might be 3.0 issue?
[10:41] sustrik mikko: may be, i'll check it
[10:42] sustrik jond: not really, i'm going to check
[10:42] sustrik (been out of town for 6 days, i'm catching up still)
[10:58] jond sustrik: no problem, well he ends with two threads in epoll_wait (that'll be reaper, and io) and another doing poll from mailbox. mikko has asked them if they are running centos
[10:58] jond which they are
[12:07] mikko jond: i saw this on irc the other day
[12:07] mikko guy was running pub/sub with large throughput
[12:07] mikko and was having hangups
[12:07] mikko on centos5
[12:58] sustrik mikko: LIBZMQ-261, the kqueue problem
[12:58] sustrik i'm running the attached test on freebsd
[12:58] sustrik but it seems to work ok
[12:59] sustrik do you remember whether there was some other freebsd test program?
[13:15] mikko sustrik: which version are you testing?
[13:16] sustrik 2-1
[13:17] sustrik mikko: latest 2-1 i mean
[13:18] mikko sustrik: did you remove the patch?
[13:18] mikko the one that i wrote hides the issue
[13:19] sustrik it's been applied to 2-1?
[13:19] mikko yes, pieter applied it
[13:19] sustrik i see
[13:19] sustrik let me downgrade to 2.1.10
[13:20] mikko 2.1.9 iirc doesnt have it
[13:20] sustrik the error was reported with 2.1.10
[13:20] sustrik so it should be visible there
[13:20] mikko i reported it against 2.1.0
[13:20] mikko ermm
[13:21] mikko 2.1.10 because that was the trunk back then
[13:21] mikko i think 2.1.10 released with this patch
[13:21] sustrik i see
[13:22] sustrik let's use 2.1.9 then
[13:22] mikko you should get it out with ease on 2.1.9
[13:22] mikko my patch hides the error and makes it workable
[13:22] mikko probably not ideal but seems to work
[13:23] sustrik trying...
[13:25] jond sustrik: could kevent::rm_fd call kevent_twice?
[13:25] sustrik no idea
[13:25] sustrik :)
[13:26] sustrik i haven't written the code
[13:26] jond s/kevent/kqueue/g
[13:27] sustrik i would say it should call it only once
[13:27] jond well I wonder what happens if pe->flag_pollin and pe->flag_pollout are both true
[13:27] jond as it's called twice then
[13:28] sustrik mikko: no luck
[13:28] sustrik jond: let me see
[13:29] jond looks iffy to me, but i don't have a bsd system
[13:29] sustrik jond: what line?
[13:30] jond kqueue.cpp, zmq::kqueue_t::rm_fd
[13:32] sustrik hm, i have no experience with kqueue
[13:32] sustrik however, it seems that READ and WRITE events are different objects
[13:32] sustrik and so both should be removed
[13:33] jond sustrik: same here; can the removal be done in single call, or'ing the flags?
[13:34] sustrik no idea
[13:34] sustrik however, it should have no impact on this bug
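
On jond's question about removing both events in one call: kqueue filters are separate registrations rather than bit flags, so they cannot be OR'ed together, but a single kevent() call accepts a change list with one entry per filter. The sketch below is not libzmq code; it uses Python's select.kqueue wrapper, and the function name is made up for illustration. It returns None on systems without kqueue (for example Linux). On FreeBSD and macOS, deleting a filter that is no longer registered is expected to fail with ENOENT, the same errno reported for the OS X case.

```python
import errno
import select
import socket


def delete_filters_in_one_call():
    """Add read and write filters for one fd, delete both with one control()
    call, then try to delete the read filter a second time.

    Returns the errno of the repeated delete, 0 if it unexpectedly succeeds,
    or None where kqueue is not available.
    """
    if not hasattr(select, "kqueue"):
        return None  # kqueue exists only on BSD-derived systems

    a, b = socket.socketpair()
    kq = select.kqueue()
    try:
        fd = a.fileno()
        # One change-list entry per filter; the filters are distinct objects.
        add = [
            select.kevent(fd, select.KQ_FILTER_READ, select.KQ_EV_ADD),
            select.kevent(fd, select.KQ_FILTER_WRITE, select.KQ_EV_ADD),
        ]
        kq.control(add, 0, 0)

        # Both deletions travel in a single kevent() system call.
        delete = [
            select.kevent(fd, select.KQ_FILTER_READ, select.KQ_EV_DELETE),
            select.kevent(fd, select.KQ_FILTER_WRITE, select.KQ_EV_DELETE),
        ]
        kq.control(delete, 0, 0)

        # Deleting the read filter again should now be rejected.
        try:
            kq.control(delete[:1], 0, 0)
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        kq.close()
        a.close()
        b.close()


result = delete_filters_in_one_call()
if result is None:
    print("kqueue not available on this platform")
else:
    print("repeated delete:", errno.errorcode.get(result, result))
```
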
[13:35] sustrik damn, i cannot reproduce it
[13:35] sustrik mikko: have you used a different test case or something?
[13:58] mikko sustrik: yes
[13:58] mikko it crashed pzq all the time
[13:59] sustrik on freebsd, right?
[13:59] mikko on mac
[13:59] mikko Environment:
[13:59] mikko Mac OS X
[13:59] sustrik :(
[13:59] sustrik no mac os x here
[14:00] mikko should be similar error
[14:00] mikko Gábor Farkas seems to have issue with push/pull on freebsd
[14:00] mikko let me test if the test code in the issue does it on mac
[14:01] sustrik it's attached by someone with osx so presumably it should
[14:01] sustrik what about trying pzq on freebsd?
[14:03] mikko the test case looks like a simplified version of what pzq does
[14:03] mikko i wonder if this is os x specific?
[14:03] sustrik maybe
[14:03] mikko or at least harder to reproduce on freebsd
[14:03] sustrik still, gabor reports something similar
[14:03] sustrik yes, harder to reproduce
[14:03] sustrik and the error code is actually different (so your patch doesn't solve it)
[14:04] sustrik gabor reports EBADF
[14:04] sustrik while mac os x produces ENOENT
[14:25] djc I have a situation with two inproc pub/subs, where second and later subscriber threads to connect to the second inproc channel seem to fail
[14:28] sustrik what error?
[14:28] djc connection refused
[14:29] djc it appears that maybe the publisher thread fails
[14:29] djc because a quick second subscriber does work
[14:29] djc but they reliably start failing to connect after a while
[14:30] djc (these are python threads, btw)
[14:31] sustrik zmq_connect() returns connection refused?
[14:31] mikko djc: does the bind happen before connect in all cases?
[14:31] sustrik it's not a legal return code
[14:31] djc mikko: yeah, the bind happens way before connect
[14:31] djc sustrik: it mentions zmq/core/socket.c:4114
[14:32] djc (this is with 2.1.7, but I didn't see anything particularly relevant in the NEWS)
[14:32] sustrik there's no such file
[14:32] sustrik not even the directory
[14:32] djc it's probably pyzmq
[14:34] sustrik hm, maybe there's a bug in 2.1.7 causing zmq_connect() return ECONNREFUSED instead of trying to reconnect
[14:35] sustrik but i quite doubt it
[14:35] djc it throws this ZMQError on any non-zero return from zmq_connect()
[14:35] sustrik as "connect first, bind second" used to work even back then
[14:35] djc according to the pyzmq code
[14:36] sustrik irrespective of what the actual error is?
[14:36] sustrik you should file an error ticket in pyzmq bug tracker then
[14:37] djc yeah
[14:37] djc I'm checking if it's somehow wrapping the error code in the exception
[14:37] sustrik the valid error codes are described here:
[14:38] djc most of those seem rather unlikely if an earlier invocation of the same code works
[14:39] sustrik find out what the actual error code is then
[14:41] mikko sustrik: those dont seem to be right
[14:41] mikko sustrik: if you connect inproc to a non-existent endpoint you get connection refused
[14:42] mikko at least that seems to be the case with 2.1
[14:42] sustrik yes, the auto-reconnect functionality for inproc is missing
[14:42] sustrik djc: is it inproc transport?
[14:43] djc sustrik: yeah
[14:43] mikko sustrik: i think ECONNREFUSED should be added to that man page
[14:43] sustrik mikko: yes
[14:43] djc that's what I mentioned at the start :)
[14:43] sustrik djc: there's unimplemented feature with inproc
[14:43] sustrik it doesn't reconnect automatically upon failure
[14:44] djc failure for what reason?
[14:44] sustrik for example that there's nobody bound to the endpoint
[14:44] sustrik so it returns ECONNREFUSED instead
[14:44] djc ah! so if all the subscribers end, an inproc publishers would also end?
[14:45] djc if so, you should mention that on
[14:45] mikko djc: it shouldnt
[14:46] mikko assuming it doesnt exit
[14:46] djc well, that kind of describes the behavior I'm seeing
[14:47] mikko djc: if your bound socket exits and then rebinds, that would probably cause errors on clients
[14:48] djc okay, it looks like my publisher is somehow ending
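
Since inproc connects in 2.1 fail with ECONNREFUSED rather than retrying when nothing is bound, one workaround is to retry in the caller. The helper below is a minimal sketch, not part of 0mq or pyzmq: connect_with_retry and its parameters are made-up names, and it only assumes that the exception raised by the connect callable carries an errno attribute (pyzmq's ZMQError does).

```python
import errno
import time


def connect_with_retry(connect, endpoint, attempts=5, delay=0.05):
    """Call connect(endpoint), retrying while it fails with ECONNREFUSED.

    Returns the number of attempts used. Any other error, or a refusal on
    the last attempt, is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            connect(endpoint)
            return attempt
        except Exception as exc:
            refused = getattr(exc, "errno", None) == errno.ECONNREFUSED
            if not refused or attempt == attempts:
                raise
            time.sleep(delay)  # give the binding thread time to come up


# Stand-in for a socket whose peer binds only after two refused connects.
state = {"calls": 0}


def flaky_connect(endpoint):
    state["calls"] += 1
    if state["calls"] < 3:
        raise OSError(errno.ECONNREFUSED, "nobody bound to " + endpoint)


used = connect_with_retry(flaky_connect, "inproc://events", delay=0)
print("connected after", used, "attempts")
```

With pyzmq the call would look like connect_with_retry(sock.connect, "inproc://events"). Retrying only helps when the publisher binds late; if the publisher thread has died, as turned out to be the case here, every attempt is refused and the last error is raised.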
[15:17] CIA-79 jzmq: 03Gonzalo Diethelm 07master * rdaf4775 10/ pom.xml : Changed version to 1.0.0 in preparation to first oficial release. -
[15:54] CIA-79 pyzmq: 03MinRK 07master * r6c5b78c 10/ zmq/utils/ : fix dumps->loads typo in jsonapi.__all__ ...
[16:09] mikko sustrik: back
[16:10] mikko sustrik: do you want me to look into this kqueue thing?
[16:10] mikko is there a backtrace you could use?
[16:11] sustrik mikko: hm
[16:11] sustrik no idea what's going on there
[16:11] sustrik we can start with the backtrace
[16:11] mikko i'll try to ebay a cheap mac mini
[16:11] mikko i can probably stick it into same place where build cluster is
[16:12] sustrik that would be nice
[16:12] mikko it would probably be beneficial as we have large amount of users on macs
[16:12] sustrik yep, it's kind of annoying not to be able to reproduce the osx problems
[16:14] jond mikko, sustrik perhaps dtrace would be useful in looking at the problem....
[16:18] cremes mikko, sustrik: i can provide an account to you guys on my osx box at work
[16:18] cremes actually, i already did that for sustrik a while ago...
[16:18] cremes account should still be active
[16:19] sustrik cremes: great
[16:22] mikko cremes: that would be great if you can provide sustrik with access
[16:22] mikko can't see cheap imacs on ebay atm
[16:22] cremes mikko: we're chatting privately about it :)
[16:24] mikko cremes: is it a development server?
[16:24] mikko i wonder if you might be able to run a build slave on it
[16:25] cremes mikko: it's my desktop that i use for all development... it's pretty busy for 12 hours a day
[16:25] cremes but should be available as a potential build slave for the remainder
[16:26] mikko ah ok
[16:27] mikko hmm
[16:27] mikko it might be better if i ebay a mac mini
[16:27] cremes probably :)
[16:27] cremes but i'm willing to lend a hand in the short term if you need it
[16:29] mikko short term it's more important for sustrik to have access
[16:29] mikko so that he can fix all the weird ones
[16:39] cremes mikko: he's got it
[16:44] sustrik cremes, mikko: reproduced!
[16:44] sustrik thanks!
[16:45] cremes great news!
[16:51] mikko sustrik: cool
[16:51] mikko sustrik: it was easier on mac os x ?
[16:51] mikko it might be that it's also easier to reproduce on older freebsd
[16:56] sustrik it happened immediately
[17:34] crankycoder has there been any progress on making 0mq safer from fuzzing? I saw a post about hardening it via a competition, but can't find any links to that competition.
[17:41] mikko crankycoder: it's a work in progress
[17:42] crankycoder is there anywhere to track it?
[17:42] crankycoder i'm looking for a bug tracker or list of bugs outstanding
[17:59] Steve-o enjoy packages for zeromq-java on windows. well 64-bit anyway.
[18:24] mikko maybe we should add issue tracker url to topic
[18:44] collision Hi, i'm using a REQ-REP socket
[18:45] collision the client send arrives ok
[18:45] collision but the server responses come corrupted to the client
[18:50] collision i found it
[18:51] collision never mind
[21:59] tarcieri guess I should chill here
[21:59] tarcieri :)
[21:59] mikko possibly
[21:59] cremes tarcieri: i saw you ping me in rubinius... what's up?
[21:59] tarcieri cremes: oh hey, just playing with ffi-rzmq
[22:00] tarcieri had some more general questions so I figured I'd pop in here
[22:00] cremes no! don't add 0mq to celluloid!
[22:00] cremes :)
[22:00] tarcieri lol
[22:00] cremes i'll try to help... i have limited time as i'm going out tonight...
[22:00] cremes what's your ??
[22:00] tarcieri just trying to figure out how to map distributed Celluloid onto 0MQ
[22:01] tarcieri I want to use one actor per link to another node
[22:01] tarcieri and have that actor talk to a corresponding actor on the remote node using an exclusive pair
[22:01] cremes ok
[22:02] tarcieri it's just as I read about exclusive pairs the docs keep guilt tripping me into trying something else
[22:02] cremes sure... PAIR offers no real benefit over regular bsd sockets
[22:03] tarcieri seems like 0MQ would have me use inproc for ALL inter-actor communication
[22:03] tarcieri and like what, marshal every time?
[22:03] tarcieri seems bad
[22:03] cremes yes... inproc does a little "pointer flip" behind the scenes but you can't use that with ruby
[22:04] cremes since objects move in memory due to GC
[22:04] tarcieri yeah
[22:04] cremes so you have to marshal
[22:04] tarcieri especially in, say, JRuby
[22:04] tarcieri which is kinda the #1 thing I'm targeting right now
[22:04] cremes it's a pain in the ass... we need a Object#pin/Object#unpin to allow for locking objects in place
[22:04] tarcieri so anyway Celluloid already has primitives for in-VM messaging
[22:04] tarcieri and I'm not going to rip them out and replace them with 0MQ
[22:05] tarcieri although I could offer 0MQ-based actors if I wanted, i just don't see the point really :/
[22:05] cremes well, 0mq-based actors when they are spread over multiple machines
[22:05] tarcieri yeah, indeed
[22:05] cremes on a single box it doesn't offer you much benefit
[22:05] tarcieri so like... you have N nodes that all need to talk to each other
[22:06] grantr tarcieri, sounds like a job for xrep/xreq (or dealer/router, depending on who you're talking to)
[22:06] tarcieri at least for the basic actor protocol I don't really see the usefulness of anything but pairs
[22:06] cremes right
[22:06] tarcieri grantr: Celluloid messaging is fully asynchronous though
[22:07] grantr yep, so is xre[pq]
[22:07] cremes grantr: when using ZMQ_NOBLOCK
[22:07] grantr they dont enforce the request-reply ordering
[22:07] grantr cremes, right
[22:08] tarcieri grantr: so if I understand them correctly... I think they're a bad fit
[22:08] tarcieri grantr: there's not necessarily replies to any message
[22:08] cremes tarcieri: have you read the guide?
[22:09] cremes it's the best doc to use for getting a foundational understanding of 0mq
[22:09] grantr tarcieri, try this for an overview of the socket types
[22:09] cremes and it is possible that 0mq isn't a good fit
[22:12] grantr cremes, why shouldn't tarcieri add 0mq to celluloid?
[22:13] tarcieri so the thing is
[22:13] cremes grantr: why use it if it offers no benefit over bsd sockets?
[22:13] tarcieri I can see various other socket types being useful for specific cases
[22:13] cremes it would just be a wasted dependency
[22:13] tarcieri those wouldn't be part of the general actor communication protocol though
[22:14] tarcieri but the thing is, if I want to incorporate those types later, shouldn't I start with 0mq, or is that a waste of time?
[22:14] grantr cremes, i mean this: <cremes> no! don't add 0mq to celluloid!
[22:14] tarcieri particularly pubsub and push/pull
[22:14] tarcieri I assume that was facetious
[22:14] grantr i thought so too :)
[22:15] cremes grantr: oh, that first response was a joke :)
[22:15] grantr as i see it, the advantage of using zmq for inter-node communication - abstracts socket setup/teardown, abstracts failure handling, abstracts buffering
[22:16] grantr dunno if its worth the dependency
[22:16] cremes tarcieri: a lot of the internal 0mq classes reflect the message-passing aesthetic of actors... mailboxes and whatnot
[22:16] cremes fyi
[22:17] cremes grantr is right... it does help with those issues but i suggest you remember YAGNI
[22:17] cremes you might only ever use PAIR in which case i think the dependency is pointless
[22:18] grantr for my use case with celluloid, the other socket types are a big help so it makes sense for ME to use zmq in other places too
[22:18] grantr i could see if the only thing you need it for is casting rpc to other nodes, maybe not worth the dependency (since it can be a pretty gnarly dependency to resolve)
[22:19] cremes sorry folks... gotta run; i'll check the scrollback in a few hours
[22:20] tarcieri I think pubsub is the biggest use case
[22:21] grantr thats the one i need
[22:21] tarcieri the biggest non-pair use case, that is
[22:23] tarcieri cremes: well provided you check the scrollback, I did want to confirm that ZMQ::Contexts are thread safe
[22:23] grantr tarcieri, contexts are, sockets are not
[22:23] tarcieri I'd like to share one across all threads if possible
[22:23] tarcieri yeah
[22:23] tarcieri cool
[22:23] grantr contexts are intended to be one per process (as i understand it)
[22:23] tarcieri seems good
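
The "contexts are thread safe, sockets are not" rule can be sketched with pyzmq (chosen here only for brevity; the same shape applies to ffi-rzmq). One context is shared by every thread, while each thread creates, uses and closes its own socket. The endpoint name and function names are made up, and the import is guarded so the sketch is a no-op where pyzmq is not installed.

```python
import threading

try:
    import zmq
except ImportError:  # pyzmq not installed; the sketch then does nothing
    zmq = None


def run_workers(count=3):
    """Share one context across threads; give each thread its own socket."""
    if zmq is None:
        return None

    ctx = zmq.Context()          # one context for the whole process
    sink = ctx.socket(zmq.PULL)  # owned and used by the main thread only
    sink.bind("inproc://results")

    def worker(number):
        # The socket is created, used and closed inside this thread.
        sock = ctx.socket(zmq.PUSH)
        sock.connect("inproc://results")
        sock.send(str(number).encode())
        sock.close()

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(count)]
    for thread in threads:
        thread.start()

    received = sorted(int(sink.recv()) for _ in range(count))

    for thread in threads:
        thread.join()
    sink.close()
    ctx.term()
    return received


print(run_workers())
```

The bind happens before any worker starts, which keeps the sketch within the inproc ordering limitation discussed earlier in the log.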