IRC Log

Wednesday November 2, 2011

[Time] Name Message
[10:06] g4bor hi, are there any known issues with using (py)zeromq in an application that uses process-forking?
[10:14] mikko g4bor: possibly
[10:15] g4bor mikko: ok, i see that my question was a little too generic :-) ... could you name some of those issues?
[10:16] g4bor mikko: basically i'm trying to find out why a problem is happening (http://lists.zeromq.org/pipermail/zeromq-dev/2011-October/013781.html)
[10:19] g4bor mikko: and i'm using it in a forking python web-app (based on gunicorn (http://gunicorn.org/)), so i thought, maybe there are some known problems in such a situation...
[10:28] sustrik the kqueue problem?
[10:28] sustrik g4bor: it has nothing to do with forking
[10:29] g4bor sustrik_: yes, that one. except it morphed into the 'mailbox' problem when moving from 2.1.7 to 2.1.10
[10:29] g4bor sustrik_: nothing to do with forking.. i see...
[10:29] sustrik ok
[10:29] sustrik well, mailbox may have to do with forking, dunno
[10:29] sustrik we need a reproducible minimal test case to fix it
[10:30] g4bor sustrik_: i completely agree about the test-case... i just haven't found the time to do it yet :-)
[10:31] sustrik sure, np
[10:31] g4bor and IF the problem happens, it is ALWAYS the same situation in the application... in short, i wonder whether maybe i'm sending "incorrect" data into zeromq somehow
[10:34] sustrik g4bor: no, it looks like an internal 0mq problem
[10:34] sustrik of course, unless you are using the socket from multiple threads in parallel
[10:46] g4bor sustrik_: hmm... what if i open a PUSH socket, then i fork, and use the socket from both processes? is that fine or is that insane? :-) (i am not doing it, but maybe i have a bug and it is happening)
[10:50] mikko g4bor: i dont think thats ok
[10:53] g4bor mikko: neither do i... i just hope i get an error message like 'gabor, you are doing this wrong.' or something like that .... anyway, will investigate this issue
[10:53] sustrik it's definitely not OK
[10:54] g4bor is there an easy way to ask a PUSH socket for some id-number? i mean, if i want to check whether 2 sockets in 2 processes are the same zeromq socket, is there an easy way to do that?
[10:54] g4bor i'm using the python-bindings...
[10:57] g4bor hmm... maybe comparing the pointer that's returned from zmq_socket will be enough.. will see
[11:02] sustrik 0mq sockets are user-space objects
[11:02] sustrik they can't be shared between processes
[11:10] CIA-79 jzmq: Jason Chown master * r2303993 src/org/zeromq/ZMQ.java : Clarified comments in timeout parameter ...
[11:10] CIA-79 jzmq: Gonzalo Diethelm master * rf9b0964 src/org/zeromq/ZMQ.java : Merge pull request #84 from jchown/master ...
[11:22] g4bor sustrik_: maybe i'm onto something. i did some test-scripts, and IF i do the context-creation and socket-creation in one process, and then call connect in a forked process, i get the mailbox assertion... will investigate this more...
[11:24] sustrik g4bor: are you using the forked context/socket?
[11:24] sustrik instead of opening a new one in the forked process?
[11:25] mato sustrik_: what's this? someone trying to use contexts across fork()? that will never work...
[11:26] sustrik dunno
[11:26] sustrik that's why i'm asking
[11:30] g4bor sustrik_: yes, i am not trying to do it.. i have an app that does use zeromq, and uses process-forking.. i THOUGHT that i'm initializing zeromq AFTER the fork, but now i wonder whether i have a bug and am doing it BEFORE the fork... still investigating ...
[11:31] mato g4bor: which version of zeromq are you using? i'm not sure if 2.x has the SOCK_CLOEXEC stuff in it and without that forking will break badly
[12:27] g4bor mato: 2.1.10... it's totally fine if zeromq breaks with forking. the plan is that zeromq is not used 'cross-fork'. i just have to find the bug in my code that causes this behavior (assuming it is the case. i have no proof yet, only suspicion)
[12:35] mato g4bor: ok, looking at the git log, the fix for CLOEXEC (issue 273) went in after the release of 2.1.10
[12:35] mato g4bor: without that it really won't work for you at all if you so much as create a single 0mq socket and then fork()
[12:36] mato g4bor: so i suggest, try with the zeromq2-1 git master (latest 2.1.x) first
[12:36] mato whoops, hang on
[12:36] mato LIBZMQ-218 was the original issue
[12:36] mato -273 is something else
[12:37] mato ok, and the LIBZMQ-218 fix is in 2.1.10
[12:42] g4bor cool, that might explain why i am getting a different error message in 2.1.7 and in 2.1.10 :-)
[12:42] g4bor but that's fine. in my case, after the fork, the "parent" is not supposed to ever touch zeromq, only the "child"... the problem is that the "parent" is doing it. when i fix this issue, it (hopefully) will solve the problem.
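A minimal sketch of the pattern this thread converges on — each process creates its own context and sockets strictly after fork(), and the parent never touches 0mq (pyzmq shown, since g4bor is on the python bindings; the endpoint and payload are illustrative):

    import os
    import zmq

    def child_work():
        # Everything 0mq-related is created *after* the fork, in the child only.
        ctx = zmq.Context()
        push = ctx.socket(zmq.PUSH)
        push.connect("tcp://127.0.0.1:5555")  # illustrative endpoint
        push.send(b"hello from child")
        push.close()
        ctx.term()

    pid = os.fork()
    if pid == 0:
        child_work()
        os._exit(0)
    else:
        # Parent: no contexts or sockets created before the fork, none used here.
        os.waitpid(pid, 0)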
[18:57] cremes sustrik: :) for your ML reply
[19:06] sustrik you mean the primary-school-like one?
[19:06] sustrik :)
[19:13] technoweenie hey is there a way to do a safe close on a socket? i want to stop the socket from receiving any more messages, but i want to process any that have come in on the socket but not yet been returned from a lib's #recv call
[19:16] sustrik technoweenie: no, there's not, but there has been some discussion about that kind of functionality
[19:17] technoweenie i have a little project that just takes crap off a zmq socket into some local queue, but there's still a minute chance of dropping a message if i have to restart something
[19:18] sustrik yes, that's the reasoning for adding that kind of thing
[19:18] technoweenie cool i'm so on board with that :)
[19:19] sustrik it's not as simple as it seems though
[19:19] sustrik it requires some kind of handshake between the peers prior to terminating
[19:19] sustrik but what if the handshake cannot be completed?
[19:20] sustrik you can time out, but then you can lose a message anyway
[19:20] sustrik etc.
[19:22] technoweenie ah
[19:44] technoweenie sustrik_: is there an issue or something for this? or is everything just on the ML
[19:46] sustrik technoweenie: it's only on the ML, but it intersects with the LIBZMQ-160 issue
[19:47] sustrik https://zeromq.jira.com/browse/LIBZMQ-160
[19:47] technoweenie thanks
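There is no built-in "safe close", but a common approximation of what technoweenie wants is to stop accepting new work and drain whatever 0mq has already queued locally, polling with a timeout until the socket goes quiet. A pyzmq sketch — not a guaranteed no-loss protocol, since messages still in flight on the wire can be dropped, which is exactly the handshake problem sustrik describes; handle() is a hypothetical application callback:

    import zmq

    def drain_and_close(sock, quiet_ms=500):
        """Read already-delivered messages until the socket is quiet, then close."""
        poller = zmq.Poller()
        poller.register(sock, zmq.POLLIN)
        while True:
            events = dict(poller.poll(quiet_ms))
            if sock not in events:
                break  # nothing arrived for quiet_ms; assume the local queue is drained
            msg = sock.recv()
            handle(msg)  # hypothetical application handler
        sock.close()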
[20:01] tarcieri cremes: you around?
[20:02] cremes tarcieri: yessir
[20:03] tarcieri cremes: is there a way I can get a ZMQ::Poller to actually monitor a file descriptor?
[20:03] tarcieri I was trying to do this: https://github.com/tarcieri/dcell/blob/master/lib/celluloid/zmq/reactor.rb#L15
[20:03] tarcieri it didn't work
[20:04] cremes tarcieri: ah yes... i put some basic framework in there for watching file descriptors but it isn't tested alas
[20:04] cremes let me look at it real quickly...
[20:06] cremes tarcieri: what is that call to @poller.register() returning? are you getting false or an integer?
[20:06] tarcieri -1 iirc
[20:06] cremes hmmm, i don't think that's possible
[20:06] tarcieri I might be wrong, let me double check
[20:11] cremes tarcieri: looks like registering will work, but a later call to retrieve readables/writables won't pick up that poll_item
[20:11] cremes the code i wrote assumes it is always a socket and never a file descriptor :(
[20:13] tarcieri hmm, it's returning 1 on YARV
[20:13] cremes right, if registration works you'll get a fixnum back
[20:13] tarcieri and on JRuby
[20:13] tarcieri wtf o_O
[20:13] tarcieri okay so it works
[20:13] tarcieri I swear it wasn't working before :D
[20:13] cremes it's the later call to #poll which in turn updates two arrays (readables/writables) that doesn't work
[20:14] cremes or rather, it only works for sockets and not for file descriptors
[20:14] tarcieri well I swear I couldn't even get as far as registering it before
[20:14] cremes let me think about how to solve that so both are treated equally
[20:14] tarcieri cool
[20:14] tarcieri I'd be fine if you just dropped an integer into readables/writables
[20:15] cremes ok
[20:17] tarcieri maybe it broke when I actually tried to poll
[20:17] tarcieri I'm trying to remember
[20:21] tarcieri well, anyway if you got that working it'd be nice. until then I'm going with a slightly more ghetto implementation that only blocks for 100ms at a time
[20:24] cremes tarcieri: i just pushed what i hope is a fix; please pull from master and build & install a new gem
[20:24] tarcieri ok
[20:25] cremes if the library blows up when you run your code, gist me the error :)
[20:25] tarcieri lol
[20:25] cremes hey, you get what you pay for!
[20:30] tarcieri I get -1 out of the call to poll
[20:32] tarcieri any way to get a more descriptive error message?
[20:32] tarcieri like grab errno or whatever
[20:33] cremes tarcieri: you can call ZMQ::Util.errno and ZMQ::Util.error_string to get the errno and the english text for it
[20:33] tarcieri aah, cool
[20:35] tarcieri cremes: Socket operation on non-socket
[20:35] tarcieri so, the file descriptor I'm registering isn't a socket
[20:35] tarcieri it's a pipe
[20:35] tarcieri problem?
[20:36] cremes tarcieri: that's a question for the 0mq core guys
[20:36] cremes i think it should "just work"
[20:36] tarcieri I could switch to a socketpair
[20:37] cremes sure
[20:37] tarcieri if that would actually help
[20:38] cremes a socketpair is a socket, no? i think the fd should work too though.
[20:38] cremes tarcieri: you are probably venturing into an area where not many devs have exercised the code in libzmq
[20:39] tarcieri o_O heh
[20:39] cremes typically folks go the *other way* where they get a 0mq socket file descriptor and register it with poll/select/kqueue
[20:39] tarcieri nor jruby
[20:39] tarcieri > sockets = Socket.pair(Socket::AF_UNIX, Socket::SOCK_STREAM, 0)
[20:39] tarcieri NoMethodError: undefined method `pair' for Socket:Class
[20:39] tarcieri cremes: perhaps I'll just make an inproc pair socket and be done with it
[20:40] tarcieri like 0MQ socket
[20:40] cremes i don't know what you're trying to do, but sure :)
[20:40] tarcieri I'm trying to adapt some of the code for multiplexing I/O inside of actors to 0MQ
[20:41] tarcieri the original purpose was to wake up an actor making a blocking syscall
[20:41] tarcieri I think the POLS here is to just go 100% 0MQ
[20:41] grantr tarcieri, so you're thinking of using a pair socket?
[20:41] tarcieri yes
[20:42] grantr seems like the way to go
[20:42] tarcieri and poll that alongside a PULL socket
[20:42] tarcieri which is how the mailbox actually receives messages from the network
[20:42] cremes tarcieri: i like it
[20:43] tarcieri so, problem
[20:43] tarcieri aren't sockets thread-specific?
[20:43] grantr so does each node have a pull socket, or each actor? is there one actor that dispatches all messages to node-local actors?
[20:43] grantr tarcieri, they are not thread-safe
[20:43] cremes tarcieri: yes, one socket per thread, pls
[20:43] tarcieri grantr: each node has a single pull socket
[20:43] cremes contexts are thread safe but sockets ain't
[20:43] tarcieri the part I'm trying to replace is how other in-process actors would wake up the mailbox
[20:44] tarcieri the mailbox can get messages one of two places: from the network, or in process
[20:44] cremes tarcieri: you can have a single socket bind to multiple transports
[20:44] tarcieri I guess what I haven't mentioned: the node mailbox *is* an actor
[20:44] cremes e.g. socket.bind("inproc://mailbox")
[20:44] cremes and socket.bind("tcp://127.0.0.1:5555")
[20:44] cremes connect works the same way too
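A sketch of the layout cremes is describing — one PULL socket bound to both an inproc and a tcp endpoint, so in-process and network senders feed the same queue (pyzmq for illustration, since the Ruby API mirrors it; endpoints are illustrative):

    import zmq

    ctx = zmq.Context()

    # One PULL socket, bound to two transports at once.
    mailbox = ctx.socket(zmq.PULL)
    mailbox.bind("inproc://mailbox")      # for in-process senders (bind before inproc connects)
    mailbox.bind("tcp://127.0.0.1:5555")  # for remote nodes

    # Local senders connect over inproc (and must share the same context);
    # remote senders connect over tcp; both deliver to the same socket.
    local = ctx.socket(zmq.PUSH)
    local.connect("inproc://mailbox")
    local.send(b"wake up")

    print(mailbox.recv())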
[20:45] tarcieri cremes: I suppose I could create a thread-specific PAIR socket and tear it down each time you send a message to one of the 0MQ actors
[20:45] tarcieri seems bad?
[20:45] cremes yes, you don't want to close/open sockets continuously
[20:45] tarcieri oi
[20:45] cremes why would you need to tear it down?
[20:45] tarcieri le sigh, how to explain this
[20:46] cremes btw, that multiple bind/connect trick works with all sockets *except* pair
[20:46] cremes :)
[20:46] tarcieri other objects in the system need to talk to the mailbox object
[20:46] tarcieri that mailbox is going to be blocking in the ZMQ::Poller, waiting for incoming messages from the network
[20:46] tarcieri but if something inproc sends it a control message, it needs to wake up too
[20:47] tarcieri and those messages can come from N threads
[20:47] tarcieri making N sockets to do that seems bad to me
[20:47] cremes ok, i get it
[20:47] cremes so the mailbox has a single PULL socket
[20:47] tarcieri yeah
[20:47] cremes and the N threads have PUSH sockets, yes?
[20:48] tarcieri so, that was the goal
[20:48] cremes this is fine...
[20:48] tarcieri but where you have N nodes on the network with PUSH sockets
[20:48] tarcieri but, ungh
[20:48] cremes the mailbox PULL socket should *bind* twice, once to inproc and a second time to a well-known port
[20:48] cremes having N PUSH sockets is *fine*
[20:48] tarcieri this totally doesn't need N push sockets for the inproc case
[20:48] cremes i have a distributed app that typically creates 20_000 sockets all running at the same time
[20:49] tarcieri it works today with one pipe
[20:49] tarcieri and the pipe is only there to unblock the syscall
[20:49] tarcieri it's just sending a single event, "wake up and check your incoming messages"
[20:49] cremes N push sockets will use up about (50 bytes + 2 file descriptors) * N
[20:50] tarcieri 2 file descriptors * N is really bad
[20:50] tarcieri 2 file descriptors alone is really bad
[20:50] cremes heh
[20:50] tarcieri it currently uses 2 file descriptors
[20:50] cremes maybe you could post this to the ML and we could get some input from other 0mq experts
[20:50] tarcieri I just need a thread safe way to unblock the poller
[20:51] cremes who can send this message to unblock the poller? any of the N threads?
[20:51] tarcieri yes
[20:51] cremes ok... an idea
[20:51] cremes you *can* use a 0mq socket from multiple threads if you surround it with a mutex
[20:51] cremes maybe for this case that would be ok
[20:52] cremes @mutex.synchronize { push.send(wakeup) }
[20:52] tarcieri oh, cool
[20:52] tarcieri seems good
[20:52] cremes that's a choice, N sockets is a choice, and using something other than 0mq is a choice :)
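The mutex idea in pyzmq terms — one PUSH socket shared by all threads, every send serialized through a lock so no two threads ever touch the socket concurrently. A sketch; as the pair.cpp assertion later in the log shows, the lock must cover *every* use of the socket, and the inproc endpoint assumes the PULL side bound it first:

    import threading
    import zmq

    ctx = zmq.Context()
    push = ctx.socket(zmq.PUSH)
    push.connect("inproc://mailbox")  # assumes the mailbox PULL socket bound this first

    send_lock = threading.Lock()

    def wakeup():
        # The lock provides both mutual exclusion and the full memory
        # barrier the socket needs when hopping between threads.
        with send_lock:
            push.send(b"wakeup")

    for _ in range(4):
        threading.Thread(target=wakeup).start()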
[20:52] tarcieri I'm kind of fighting the grain of 0MQ by the fact that Celluloid already has an in-process messaging system
[20:53] tarcieri and 0MQ would really like to be the in-process messaging system
[20:53] tarcieri I'm just trying to marry Celluloid's existing system to 0MQ in a reasonably sane manner
[20:53] cremes right
[20:54] cremes well, do the simplest thing possible to get it working and go from there... that's usually my plan
[20:54] tarcieri yeah, already did that
[20:54] tarcieri and did the wonky poll for 100ms crap
[20:54] cremes ideally we could get that file descriptor thing working
[20:54] tarcieri poll zmq, poll the socket
[20:54] cremes tarcieri: you could get the zmq socket fds and poll those from the system poll you know
[20:54] tarcieri orly
[20:55] tarcieri well Celluloid can already do that with Celluloid::IO
[20:55] tarcieri that'd be perfect
[20:55] cremes look at the man page for zmq_getsockopt() and ZMQ_FD
[20:55] tarcieri err
[20:55] tarcieri I'd need to make an IO object I can hand to select :/
[20:55] cremes ah yes, ruby land
[20:55] cremes there be dragons as far as i'm concerned :)
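Going "the other way", as cremes suggests: zmq_getsockopt(ZMQ_FD) yields a plain file descriptor you can hand to select/poll/kqueue. The catch is that the descriptor is edge-triggered, so you must check ZMQ_EVENTS and drain everything pending before waiting again. A pyzmq sketch with an illustrative endpoint:

    import select
    import zmq

    ctx = zmq.Context()
    pull = ctx.socket(zmq.PULL)
    pull.bind("tcp://127.0.0.1:5555")

    fd = pull.getsockopt(zmq.FD)  # OS-level descriptor for the socket's signal mailbox

    while True:
        # Edge-triggered: drain everything that is ready *before* blocking,
        # otherwise the fd may never fire again for already-queued messages.
        while pull.getsockopt(zmq.EVENTS) & zmq.POLLIN:
            print(pull.recv())
        select.select([fd], [], [])  # wait here alongside any other fds you care about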
[20:56] tarcieri I think I'll just try a pair socket + a mutex
[20:56] tarcieri seems good
[20:56] cremes ok
[20:56] cremes well, you can avoid creating the second PAIR socket if you just share a PUSH via the mutex
[20:56] cremes have the mailbox PULL socket bind to inproc and ipc (or tcp)
[20:56] cremes and then share a PUSH socket amongst the threads
[20:57] tarcieri ok
[20:57] cremes then you don't need PAIR at all (which may disappear from future iterations of the library)
[20:57] cremes when i get back from my trip, i'll take a peek at the work necessary to wrap a 0mq FD as a ruby io object
[20:58] tarcieri cool
[21:00] bb Is there any way I can see the number of queued items? Let's say I have a Push/Pull architecture and my Pull worker died, so messages are building up..
[21:00] mikko bb: nope
[21:01] mikko bb: you can use high watermark to protect yourself from this scenario
[21:01] bb Ah thanks.
[21:01] mikko and log when high watermark has been reached
[21:01] mikko but currently there is no functionality for seeing queue size
[21:01] mikko this is under debate
[21:02] bb ok I'll look into that
[21:06] cremes tarcieri: i just took a quick peek at wrapping a 0mq socket up as a Ruby IO object
[21:06] cremes tarcieri: looks pretty easy
[21:07] cremes though first looks are often deceptive :)
[21:08] tarcieri heh, cool
[21:12] bb Something I didn't quite get - So if I don't set ZMQ_HWM (by default it's 0...), messages will keep queuing up until I run out of memory?
[21:17] cremes bb: read the man page for zmq_socket; it describes the behavior of each socket when HWM is set
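On the 2.x series being discussed, the knob is a single ZMQ_HWM set before connect/bind: once a PUSH socket hits it, further sends block (or fail with EAGAIN in non-blocking mode) instead of queueing without bound. A pyzmq sketch with an illustrative endpoint and limit — note that libzmq 3.x and later split this option into ZMQ_SNDHWM and ZMQ_RCVHWM:

    import zmq

    ctx = zmq.Context()
    push = ctx.socket(zmq.PUSH)
    push.setsockopt(zmq.HWM, 1000)  # 2.x: cap at 1000 queued messages (0 = unlimited)
    push.connect("tcp://127.0.0.1:5555")

    try:
        push.send(b"work", zmq.NOBLOCK)
    except zmq.ZMQError as e:
        if e.errno == zmq.EAGAIN:
            # High watermark reached and no peer draining: log it, as mikko suggests.
            print("HWM reached; message not queued")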
[21:50] tarcieri cremes: lol, I don't think pairs work anyway
[21:51] tarcieri Assertion failed: !inpipe && !outpipe (pair.cpp:49)
[21:52] mikko tarcieri: is that easily reproducible?
[21:52] tarcieri yes
[21:52] tarcieri 100% of the time even
[21:53] mikko are you using the socket from multiple threads?
[21:53] tarcieri yes, albeit with a mutex
[21:53] tarcieri cremes said it was ok
[21:53] tarcieri apparently not?
[21:54] mikko well, it looks like you are either accessing the socket concurrently or there hasn't been a full memory barrier
[21:55] mikko if you are for example polling in one thread and reading in another you need to make sure that both these use mutual exclusion
[21:55] tarcieri I made two pair sockets
[21:55] tarcieri a "sender" which is shared among threads
[21:55] tarcieri that's wrapped in a mutex
[21:55] tarcieri and the receiver
[21:55] tarcieri that's exclusive
[21:56] mikko why pair?
[21:56] mikko is there something preventing you from using socket per thread?
[21:56] mikko usually simplifies the code a lot
[21:56] tarcieri heh, scroll up
[21:57] tarcieri I can try rewriting it so the in process and network messages are handled by a single PULL socket
[21:57] tarcieri that'd be quite a significant change from what I have now though
[21:57] cremes tarcieri: why is it a rewrite to use a second shared PUSH socket for in process?
[21:58] cremes i don't see why you need to use pair...?
[21:58] mikko tarcieri: you can bind the socket multiple times
[21:58] tarcieri that's not the issue
[21:58] tarcieri I'm trying to replace the pipe which was previously used to unblock the event loop
[21:58] mikko tarcieri: i would bet that the assertion happens due to concurrent access
[21:59] mikko Assertion failed: !inpipe && !outpipe (pair.cpp:49) this one
[21:59] tarcieri yeah
[21:59] mikko unless we horribly broke pair sockets recently
[21:59] tarcieri wish I knew which socket was implicated
[21:59] mikko which i think is less likely
[21:59] mikko tarcieri: this is the pain you usually want to get away from by using inproc sockets and not sharing between threads
[22:00] cremes tarcieri: why share a pair amongst threads instead of push?
[22:00] tarcieri cremes: it's... a significant change from how the code works now
[22:00] tarcieri it can be done
[22:00] tarcieri it would involve ripping out what I have working now and starting over
[22:01] cremes i don't follow... you are able to use a PAIR without the rewrite, yes?
[22:01] cremes why can't you swap push for pair?
[22:01] tarcieri I was basing a lot of this code on how Celluloid handles multiplexing IO with the actor mailbox
[22:01] tarcieri I... kinda can I guess?
[22:01] cremes i don't see why not
[22:01] tarcieri it'd be pretty wonky
[22:02] cremes can you point me to the code?
[22:02] tarcieri it's sitting in my working copy
[22:02] tarcieri there's a lot of... concerns that are cleanly separated now
[22:02] cremes ok
[22:03] tarcieri to mash it together would involve coupling a bunch of components that are cleanly separated at the moment
[22:03] cremes try this... replace the shared pair socket with a push, and replace the single pair socket with a pull
[22:03] cremes and see if it works
[22:03] cremes don't rewrite anything else... just change those two socket types
[22:03] tarcieri Celluloid lets you have duck types roaming around that talk the same method protocol
[22:04] tarcieri ok
[22:04] cremes you only want to use this to wake up the mailbox... you aren't actually passing data so it shouldn't matter
[22:04] tarcieri it will almost certainly still be broken if concurrent access is the issue
[22:04] cremes true
[22:04] cremes but a ruby mutex should be executing a full memory barrier which is all that's necessary
[22:05] cremes shit... i gotta run for about an hour
[22:05] cremes post your issues here and i'll get back to them in a bit
[22:05] tarcieri okay, now it looks like this:
[22:05] tarcieri SENDING SPIKE...
[22:05] tarcieri Assertion failed: inpipe_ && !outpipe_ (pull.cpp:42)
[22:05] tarcieri puts "SENDING SPIKE..."
[22:05] tarcieri @sender_lock.synchronize { @sender.send_string PAYLOAD }
[22:06] cremes tarcieri: what version of 0mq? 2.1.10 or master, i hope
[22:06] tarcieri uhh, old
[22:07] tarcieri 2.1.10
[22:07] cremes oh, that isn't old
[22:07] cremes i have to go; i'll try to repro a little later
[22:07] cremes but i do have a spec in ffi-rzmq that uses push sockets from multiple threads to a single pull
[22:07] tarcieri orly
[22:07] cremes it could probably be easily changed to use the same push from multiple threads but protected via a mutex
[22:08] tarcieri wonder what I'm doing wrong then
[22:08] cremes look at the pushpull_spec
[22:08] tarcieri ^^^ should be fine, right?
[22:08] tarcieri yeah sure
[22:08] cremes that one uses a dedicated socket *per* thread
[22:08] cremes but it *could* be changed to use one socket with a mutex to see if we can repro your issue
[22:09] cremes later
[22:09] tarcieri cool
[22:09] tarcieri later
[23:05] cremes tarcieri: i adapted that spec to use one push socket shared across 4 threads with a mutex; it worked just fine
[23:05] cremes you have something else going on
[23:06] cremes tarcieri: i pushed the change to the repository; it's in pushpull_spec.rb