ZeroMq IRC Log

Wednesday October 26, 2011

[Time] Name	Message
[05:44] liang2012	has any one encountered a send being blocked?
[09:28] CIA-79	libzmq: Mikko Koppanen master * r6c1b50c / (acinclude.m4 configure.in src/ip.cpp): Added compile-time test for SOCK_CLOEXEC ...
[09:51] mikko	hi
[09:51] mikko	how was ukraine?
[09:52] sustrik	mikko: hi
[09:52] sustrik	great
[09:52] sustrik	0mq meet turned into pycon after-after-party as people leaving the after-party joined the meetup
[09:53] sustrik	some have allegedly even continued to an after-after-after-party :)
[09:54] sustrik	btw, i've asked pieter to pull the CLOEXEC patch to 3.0 and 2.1
[09:55] mikko	cool. thanks
[10:04] mikko	sustrik: seen LIBZMQ-275?
[10:04] sustrik	yes
[10:04] sustrik	i kind of recall fixing that kind of thing already
[10:23] jond	hi Martin re: that zynga issue which Henry Geddes posted a stack trace on; another poster Marc Rossi remembered a patch he had applied which does appear to be in 2.1.10 but the code is slightly different to original patch as it handles a timeout now; this is in socket_base_t::recv. does this ring any bells?
[10:40] mikko	sustrik: might be 3.0 issue?
[10:41] sustrik	mikko: may be, i'll check it
[10:42] sustrik	jond: not really, i'm going to check
[10:42] sustrik	(been out of town for 6 days, i'm catching up still)
[10:58] jond	sustrik: no problem, well he ends with two threads in epoll_wait (that'll be reaper, and io) and another doing poll from mailbox. mikko has asked them if they are running centos
[10:58] jond	which they are
[12:07] mikko	jond: i saw this on irc the other day
[12:07] mikko	guy was running pub/sub with large throughput
[12:07] mikko	and was having hangups
[12:07] mikko	on centos5
[12:58] sustrik	mikko: LIBZMQ-261, the kqueue problem
[12:58] sustrik	i'm running the attached test on freebsd
[12:58] sustrik	but it seems to work ok
[12:59] sustrik	do you remember whether there was some other freebsd test program?
[13:15] mikko	sustrik: which version are you testing?
[13:16] sustrik	2-1
[13:17] sustrik	mikko: latest 2-1 i mean
[13:18] mikko	sustrik: did you remove the patch?
[13:18] mikko	the one that i wrote hides the issue
[13:19] sustrik	it's been applied to 2-1?
[13:19] mikko	yes, pieter applied it
[13:19] sustrik	i see
[13:19] sustrik	let me downgrade to 2.1.10
[13:19] mikko	https://zeromq.jira.com/browse/LIBZMQ-261?focusedCommentId=13522&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13522
[13:20] mikko	2.1.9 iirc doesnt have it
[13:20] sustrik	the error was reported with 2.1.10
[13:20] sustrik	so it should be visible there
[13:20] mikko	i reported it against 2.1.0
[13:20] mikko	ermm
[13:21] mikko	2.1.10 because that was the trunk back then
[13:21] mikko	i think 2.1.10 released with this patch
[13:21] sustrik	i see
[13:22] sustrik	let's use 2.1.9 then
[13:22] mikko	you should get it out with ease on 2.1.9
[13:22] mikko	my patch hides the error and makes it workable
[13:22] mikko	probably not ideal but seems to work
[13:23] sustrik	trying...
[13:25] jond	sustrik: could kevent::rm_fd call kevent_twice?
[13:25] sustrik	no idea
[13:25] sustrik	:)
[13:26] sustrik	i haven't written the code
[13:26] jond	s/kevent/kqueue/g
[13:27] sustrik	i would say it should call it only once
[13:27] jond	well I wonder what happens if pe->flag_pollin and pe->flag_pollout are both true
[13:27] jond	as it's called twice then
[13:28] sustrik	mikko: no luck
[13:28] sustrik	jond: let me see
[13:29] jond	looks iffy to me, but i don't have a bsd system
[13:29] sustrik	jond: what line?
[13:30] jond	kqueue.cpp, zmq::kqueue_t::rm_fd
[13:32] sustrik	hm, i have no experience with kqueue
[13:32] sustrik	however, it seems that READ and WRITE events are different objects
[13:32] sustrik	and so both should be removed
[13:33] jond	sustrik: same here; can the removal be done in single call, or'ing the flags?
[13:34] sustrik	no idea
[13:34] sustrik	however, it should have no impact on this bug
[13:35] sustrik	damn, i cannot reproduce it
[13:35] sustrik	mikko: have you used a different test case or something?
[13:58] mikko	sustrik: yes
[13:58] mikko	it crashed pzq all the time
[13:59] sustrik	on freebsd, right?
[13:59] mikko	on mac
[13:59] mikko	Environment:
[13:59] mikko	Mac OS X
[13:59] sustrik	:(
[13:59] sustrik	no mac os x here
[14:00] mikko	should be similar error
[14:00] mikko	Gábor Farkas seems to have issue with push/pull on freebsd
[14:00] mikko	let me test if the test code in the issue does it on mac
[14:01] sustrik	it's attached by someone with osx so presumably it should
[14:01] sustrik	what about trying pzq on freebsd?
[14:03] mikko	the test case looks like a simplified version of what pzq does
[14:03] mikko	i wonder if this is os x specific?
[14:03] sustrik	maybe
[14:03] mikko	or at least harder to reproduce on freebsd
[14:03] sustrik	still, gabor reports something similar
[14:03] sustrik	yes, harder to reproduce
[14:03] sustrik	and the error code is actually different (so your patch doesn't solve it)
[14:04] sustrik	gabor reports EBADF
[14:04] sustrik	while mac os x produces ENOENT
[14:25] djc	I have a situation with two inproc pub/subs, where second and later subscriber threads to connect to the second inproc channel seem to fail
[14:28] sustrik	what error?
[14:28] djc	connection refused
[14:29] djc	it appears that maybe the publisher thread fails
[14:29] djc	because a quick second subscriber does work
[14:29] djc	but they reliably start failing to connect after a while
[14:30] djc	(these are python threads, btw)
[14:31] sustrik	zmq_connect() returns connection refused?
[14:31] mikko	djc: does the bind happen before connect in all cases?
[14:31] sustrik	it's not a legal return code
[14:31] djc	mikko: yeah, the bind happens way before connect
[14:31] djc	sustrik: it mentions zmq/core/socket.c:4114
[14:32] djc	(this is with 2.1.7, but I didn't see anything particularly relevant in the NEWS)
[14:32] sustrik	there's no such file
[14:32] sustrik	not even the directory
[14:32] djc	it's probably pyzmq
[14:32] djc	http://paste.pocoo.org/show/498476/
[14:34] sustrik	hm, maybe there's a bug in 2.1.7 causing zmq_connect() to return ECONNREFUSED instead of trying to reconnect
[14:35] sustrik	but i quite doubt it
[14:35] djc	it throws this ZMQError on any non-zero return from zmq_connect()
[14:35] sustrik	as "connect first, bind second" used to work even back then
[14:35] djc	according to the pyzmq code
[14:36] sustrik	irrespective of what the actual error is?
[14:36] sustrik	you should file a ticket in the pyzmq bug tracker then
[14:37] djc	yeah
[14:37] djc	I'm checking if it's somehow wrapping the error code in the exception
[14:37] sustrik	the valid error codes are described here:
[14:37] sustrik	http://api.zeromq.org/2-1:zmq-connect#toc4
[14:38] djc	most of those seem rather unlikely if an earlier invocation of the same code works
[14:39] sustrik	find out what the actual error code is then
[14:41] mikko	sustrik: those dont seem to be right
[14:41] mikko	sustrik: if you connect inproc to a non-existent endpoint you get connection refused
[14:42] mikko	at least that seems to be the case with 2.1
[14:42] sustrik	yes, the auto-reconnect functionality for inproc is missing
[14:42] sustrik	djc: is it inproc transport?
[14:43] djc	sustrik: yeah
[14:43] mikko	sustrik: i think ECONNREFUSED should be added to that man page
[14:43] sustrik	mikko: yes
[14:43] djc	that's what I mentioned at the start :)
[14:43] sustrik	djc: there's unimplemented feature with inproc
[14:43] sustrik	it doesn't reconnect automatically upon failure
[14:44] djc	failure for what reason?
[14:44] sustrik	for example that there's nobody bound to the endpoint
[14:44] sustrik	so it returns ECONNREFUSED instead
[14:44] djc	ah! so if all the subscribers end, an inproc publisher would also end?
[14:45] djc	if so, you should mention that on http://api.zeromq.org/2-1:zmq-inproc
[14:45] mikko	djc: it shouldnt
[14:46] mikko	assuming it doesnt exit
[14:46] djc	well, that kind of describes the behavior I'm seeing
[14:47] mikko	djc: if the bound side exits and then rebinds, that would probably cause errors on clients
[14:48] djc	okay, it looks like my publisher is somehow ending
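
A note on the inproc exchange above: because inproc in 2.1 returns ECONNREFUSED when nobody is bound to the endpoint, rather than reconnecting automatically, a caller can retry the connect itself. What follows is a minimal sketch, not pyzmq or libzmq code. `connect_with_retry` and its parameters are hypothetical names, and the sketch assumes the exception raised by the connect call carries an `errno` attribute, as `OSError` does and as pyzmq's `ZMQError` is assumed to.

```python
import errno
import time


def connect_with_retry(connect, attempts=5, delay=0.05):
    """Call connect() until it stops failing with ECONNREFUSED.

    `connect` is any zero-argument callable. With pyzmq it might be
    `lambda: sock.connect("inproc://events")` (assumed usage; the
    endpoint name is hypothetical).
    Returns the number of attempts that were needed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            connect()
            return attempt
        except Exception as exc:
            # Only "connection refused" is worth retrying; anything else
            # (bad endpoint, terminated context, ...) is re-raised at once.
            refused = getattr(exc, "errno", None) == errno.ECONNREFUSED
            if not refused or attempt == attempts:
                raise
            time.sleep(delay)
```

Retrying only papers over a binder that has not started yet. If the binding side has exited, as turned out to be the case here, the retries will simply run out and the last error is raised.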
[15:17] CIA-79	jzmq: Gonzalo Diethelm master * rdaf4775 / pom.xml : Changed version to 1.0.0 in preparation to first oficial release. - http://git.io/bB309Q
[15:54] CIA-79	pyzmq: MinRK master * r6c5b78c / zmq/utils/jsonapi.py : fix dumps->loads typo in jsonapi.__all__ ...
[16:09] mikko	sustrik: back
[16:10] mikko	sustrik: do you want me to look into this kqueue thing?
[16:10] mikko	is there a backtrace you could use?
[16:11] sustrik	mikko: hm
[16:11] sustrik	no idea what's going on there
[16:11] sustrik	we can start with the backtrace
[16:11] mikko	i'll try to ebay a cheap mac mini
[16:11] mikko	i can probably stick it into same place where build cluster is
[16:12] sustrik	that would be nice
[16:12] mikko	it would probably be beneficial as we have large amount of users on macs
[16:12] sustrik	yep, it's kind of annoying not to be able to reproduce the osx problems
[16:14] jond	mikko, sustrik perhaps dtrace would be useful in looking at the problem....
[16:18] cremes	mikko, sustrik: i can provide an account to you guys on my osx box at work
[16:18] cremes	actually, i already did that for sustrik a while ago...
[16:18] cremes	account should still be active
[16:19] sustrik	cremes: great
[16:22] mikko	cremes: that would be great if you can provide sustrik with access
[16:22] mikko	can't see cheap imacs on ebay atm
[16:22] cremes	mikko: we're chatting privately about it :)
[16:24] mikko	cremes: is it a development server?
[16:24] mikko	i wonder if you might be able to run a build slave on it
[16:25] cremes	mikko: it's my desktop that i use for all development... it's pretty busy for 12 hours a day
[16:25] cremes	but should be available as a potential build slave for the remainder
[16:26] mikko	ah ok
[16:27] mikko	hmm
[16:27] mikko	it might be better if i ebay a mac mini
[16:27] cremes	probably :)
[16:27] cremes	but i'm willing to lend a hand in the short term if you need it
[16:29] mikko	short term it's more important for sustrik to have access
[16:29] mikko	so that he can fix all the weird ones
[16:39] cremes	mikko: he's got it
[16:44] sustrik	cremes, mikko: reproduced!
[16:44] sustrik	thanks!
[16:45] cremes	great news!
[16:51] mikko	sustrik: cool
[16:51] mikko	sustrik: it was easier on mac os x ?
[16:51] mikko	it might be that it's also easier to reproduce on older freebsd
[16:56] sustrik	it happened immediately
[17:34] crankycoder	has there been any progress on making 0mq safer from fuzzing? I saw a post about hardening it via a competition, but can't find any links to that competition.
[17:34] crankycoder	http://www.mail-archive.com/zeromq-dev@lists.zeromq.org/msg05592.html
[17:41] mikko	crankycoder: it's a work in progress
[17:42] crankycoder	is there anywhere to track it?
[17:42] crankycoder	i'm looking for a bug tracker or list of bugs outstanding
[17:59] Steve-o	enjoy packages for zeromq-java on windows. well 64-bit anyway.
[18:24] mikko	maybe we should add issue tracker url to topic
[18:44] collision	Hi, i'm using a REQ-REP socket
[18:45] collision	the client send arrives ok
[18:45] collision	but the server responses come corrupt to the client
[18:50] collision	i found it
[18:51] collision	never mind
[21:59] tarcieri	guess I should chill here
[21:59] tarcieri	:)
[21:59] mikko	possibly
[21:59] cremes	tarcieri: i saw you ping me in rubinius... what's up?
[21:59] tarcieri	cremes: oh hey, just playing with ffi-rzmq
[22:00] tarcieri	had some more general questions so I figured I'd pop in here
[22:00] cremes	no! don't add 0mq to celluloid!
[22:00] cremes	:)
[22:00] tarcieri	lol
[22:00] cremes	i'll try to help... i have limited time as i'm going out tonight...
[22:00] cremes	what's your ??
[22:00] tarcieri	just trying to figure out how to map distributed Celluloid onto 0MQ
[22:01] tarcieri	I want to use one actor per link to another node
[22:01] tarcieri	and have that actor talk to a corresponding actor on the remote node using an exclusive pair
[22:01] cremes	ok
[22:02] tarcieri	it's just as I read about exclusive pairs the docs keep guilt tripping me into trying something else
[22:02] cremes	sure... PAIR offers no real benefit over regular bsd sockets
[22:03] tarcieri	seems like 0MQ would have me use inproc for ALL inter-actor communication
[22:03] tarcieri	and like what, marshal every time?
[22:03] tarcieri	seems bad
[22:03] cremes	yes... inproc does a little "pointer flip" behind the scenes but you can't use that with ruby
[22:04] cremes	since objects move in memory due to GC
[22:04] tarcieri	yeah
[22:04] cremes	so you have to marshal
[22:04] tarcieri	especially in, say, JRuby
[22:04] tarcieri	which is kinda the #1 thing I'm targeting right now
[22:04] cremes	it's a pain in the ass... we need a Object#pin/Object#unpin to allow for locking objects in place
[22:04] tarcieri	so anyway Celluloid already has primitives for in-VM messaging
[22:04] tarcieri	and I'm not going to rip them out and replace them with 0MQ
[22:05] tarcieri	although I could offer 0MQ-based actors if I wanted, i just don't see the point really :/
[22:05] cremes	well, 0mq-based actors when they are spread over multiple machines
[22:05] tarcieri	yeah, indeed
[22:05] cremes	on a single box it doesn't offer you much benefit
[22:05] tarcieri	so like... you have N nodes that all need to talk to each other
[22:06] grantr	tarcieri, sounds like a job for xrep/xreq (or dealer/router, depending on who you're talking to)
[22:06] tarcieri	at least for the basic actor protocol I don't really see the usefulness of anything but pairs
[22:06] cremes	right
[22:06] tarcieri	grantr: Celluloid messaging is fully asynchronous though
[22:07] grantr	yep, so is xre[pq]
[22:07] cremes	grantr: when using ZMQ_NOBLOCK
[22:07] grantr	they dont enforce the request-reply ordering
[22:07] grantr	cremes, right
[22:08] tarcieri	grantr: so if I understand them correctly... I think they're a bad fit
[22:08] tarcieri	grantr: there's not necessarily replies to any message
[22:08] cremes	tarcieri: have you read the guide? zero.mq/zg
[22:09] cremes	it's the best doc to use for getting a foundational understanding of 0mq
[22:09] grantr	tarcieri, try this for an overview of the socket types http://api.zeromq.org/3-0:zmq-socket
[22:09] cremes	and it is possible that 0mq isn't a good fit
[22:12] grantr	cremes, why shouldn't tarcieri add 0mq to celluloid?
[22:13] tarcieri	so the thing is
[22:13] cremes	grantr: why use it if it offers no benefit over bsd sockets?
[22:13] tarcieri	I can see various other socket types being useful for specific cases
[22:13] cremes	it would just be a wasted dependency
[22:13] tarcieri	those wouldn't be part of the general actor communication protocol though
[22:14] tarcieri	but the thing is, if I want to incorporate those types later, shouldn't I start with 0mq, or is that a waste of time?
[22:14] grantr	cremes, i mean this: <cremes> no! don't add 0mq to celluloid!
[22:14] tarcieri	particularly pubsub and push/pull
[22:14] tarcieri	I assume that was facetious
[22:14] grantr	i thought so too :)
[22:15] cremes	grantr: oh, that first response was a joke :)
[22:15] grantr	as i see it, the advantage of using zmq for inter-node communication - abstracts socket setup/teardown, abstracts failure handling, abstracts buffering
[22:16] grantr	dunno if its worth the dependency
[22:16] cremes	tarcieri: a lot of the internal 0mq classes reflect the message-passing aesthetic of actors... mailboxes and whatnot
[22:16] cremes	fyi
[22:17] cremes	grantr is right... it does help with those issues but i suggest you remember YAGNI
[22:17] cremes	you might only ever use PAIR in which case i think the dependency is pointless
[22:18] grantr	for my use case with celluloid, the other socket types are a big help so it makes sense for ME to use zmq in other places too
[22:18] grantr	i could see if the only thing you need it for is casting rpc to other nodes, maybe not worth the dependency (since it can be a pretty gnarly dependency to resolve)
[22:19] cremes	sorry folks... gotta run; i'll check the scrollback in a few hours
[22:20] tarcieri	I think pubsub is the biggest use case
[22:21] grantr	thats the one i need
[22:21] tarcieri	the biggest non-pair use case, that is
[22:23] tarcieri	cremes: well provided you check the scrollback, I did want to confirm that ZMQ::Contexts are thread safe
[22:23] grantr	tarcieri, contexts are, sockets are not
[22:23] tarcieri	I'd like to share one across all threads if possible
[22:23] tarcieri	yeah
[22:23] tarcieri	cool
[22:23] grantr	contexts are intended to be one per process (as i understand it)
[22:23] tarcieri	seems good
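
The rule grantr states (contexts are thread safe, sockets are not) is commonly handled by sharing one context and giving each thread its own socket. Below is a minimal sketch of that pattern using only the standard library. `PerThreadSockets` and `make_socket` are hypothetical names, and the context and sockets are treated as opaque objects, so nothing here is the actual 0MQ or ffi-rzmq API.

```python
import threading


class PerThreadSockets:
    """Share one context across threads, but never share a socket.

    `context` is the single process-wide context object.
    `make_socket` is a callable that creates a socket from it; with pyzmq
    this might be `lambda ctx: ctx.socket(zmq.PUSH)` (assumed usage).
    """

    def __init__(self, context, make_socket):
        self._context = context
        self._make_socket = make_socket
        self._local = threading.local()  # separate attribute storage per thread

    def get(self):
        # Lazily create the calling thread's socket on first use, then
        # keep handing the same one back to that thread only.
        sock = getattr(self._local, "sock", None)
        if sock is None:
            sock = self._make_socket(self._context)
            self._local.sock = sock
        return sock
```

Each thread calls `get()` and uses only the socket it receives. The context is the one object that crosses thread boundaries.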
