Monday September 13, 2010

[Time] Name Message
[00:00] kenkeiter sleeperbot: it was worth a try.. did you verify that you're only running one or two messaging threads?
[00:00] andrewvc I've been curious as to how stable the node driver is
[00:00] kenkeiter :/
[00:03] sleeperbot I'm looking up how to do that
[00:03] sleeperbot do you know what I can type in the command line to bring that info up?
[00:04] kenkeiter sleeperbot: which platform?
[00:04] sleeperbot unix
[00:04] sleeperbot ubuntu karmic
[00:07] kenkeiter htop might work.. haven't done it under *nix
[00:09] kenkeiter
[00:14] sleeperbot I see 3 versions of my node.js stream and web servers
[00:14] sleeperbot don't see anything related to zmq
[00:15] sleeperbot killed the extraneous processes, will check if anything changed in cpu usage
[03:34] andrewvc I assume that XREQ/XREP sockets apply backpressure in the same manner as PUSH/PULL and REQ/REP yes?
[11:28] CIA-20 zeromq2: Martin Lucina master * rbe159b6 / src/pipe.cpp : zmq::writer_t: Add missing test for swap -
[11:29] icy sustrik: hi, is there any paper on the algorithm used for the lock-free queue?
[11:30] sustrik icy: there's a very old article here:
[11:30] sustrik
[11:30] sustrik lot of it doesn't apply any more
[11:31] sustrik this is what still applies: "Introduction
[11:32] sustrik Y-suite is a set of components designed for ultra-efficient passing of messages between threads within a process. Y-suite is somewhat similar to local sockets, however, it is much faster.
[11:32] sustrik In version 0.1 of the ØMQ lightweight messaging kernel, the only y-suite component available is ypipe, a lock-free and wait-free implementation of a queue. In version 0.2 ypollset is added to allow a thread to interchange messages with several other threads at the same time (similar to the POSIX poll function). The component known as semaphore in version 0.1 is renamed to ysemaphore in version 0.2 to mark that it belongs to y-suite. In the same way, spipe is renamed to ysocketpair.
[11:32] sustrik Design
[11:32] sustrik The basic means of transferring messages between threads is ypipe. Messages are passed through a pipe in the standard write and read manner. Once the reader has no more messages to read from the pipe, it notifies the sender using passive synchronisation and goes asleep. Passive synchronisation means that the other thread is not notified directly using some kind of async signal; rather, it will be notified once it tries to write the next message to the pipe. When this happens, the writer becomes aware that the reader is already asleep, or at least going asleep at the moment. It knows that there is a new message available, so it wakes the reader up using active synchronisation, i.e. actively sending a wake-up event to the other thread. Active synchronisation is not provided by ypipe itself, but rather by other y-suite components, to be discussed below. Usage of ypipe is depicted in the following sequence diagram:"
[11:32] sustrik yuck
[11:32] sustrik sorry
[11:33] sustrik too much text, but the last paragraph is relevant
[11:33] sustrik also see the diagram that follows the text above
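The passive/active synchronisation scheme sustrik quotes above can be sketched in plain Python. This is a toy model, not the real lock-free C++ ypipe: `YPipe` and its method names are illustrative, a lock stands in for the atomic pointer operations, and an `Event` stands in for the socketpair wake-up.

```python
import collections
import threading

class YPipe:
    """Toy model of ypipe's passive/active synchronisation."""

    def __init__(self):
        self._items = collections.deque()
        self._lock = threading.Lock()      # stands in for the atomic ops
        self._wakeup = threading.Event()   # "active synchronisation" channel
        self._reader_asleep = False

    def write(self, item):
        with self._lock:
            self._items.append(item)
            # Passive synchronisation: the writer only learns that the
            # reader went to sleep when it writes the next message, and
            # then wakes it actively.
            if self._reader_asleep:
                self._reader_asleep = False
                self._wakeup.set()

    def read(self):
        while True:
            with self._lock:
                if self._items:
                    return self._items.popleft()
                # Nothing left to read: note that we are going asleep,
                # then wait for the writer's wake-up event.
                self._reader_asleep = True
                self._wakeup.clear()
            self._wakeup.wait()
```

The key property is the same as in the quoted text: the reader never signals the writer directly; the sleep flag is only observed by the writer on its next `write()`, which is when the active wake-up happens.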
[11:34] ekidd Good morning! ZeroMQ is a really nice library.
[11:35] ekidd If I'm using REQ/REP messaging with multiple servers, what happens if one server is asked to handle an unusually long-running request?
[11:36] ekidd Do the clients just route requests to one of the available servers? Or do they continue to send requests to the busy server?
[11:36] sustrik ekidd: if you set a high watermark, its queue eventually gets full and subsequent requests will be dispatched to other servers
[11:36] icy sustrik: yea I've read that, I guess because it is single-reader single-writer, it does not suffer from the ABA problem?
[11:37] ekidd sustrik: Ah, OK. The useful high watermark in my case is very small: The servers are inherently single-threaded workers with long-running jobs. I want to keep them loaded.
[11:37] sustrik icy: what's ABA?
[11:38] ekidd I do, however, have lots of clients and servers.
[11:38] icy sustrik:
[11:39] icy sustrik: it's one of the main problems that lock-free queues have to overcome
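The ABA problem icy refers to can be shown in a few lines: a compare-and-swap succeeds because the value compares equal, even though the structure underneath changed in between. The sketch below simulates CAS on the top of a lock-free stack (all names are illustrative); a single-reader/single-writer design like ypipe avoids this because it has no CAS retry loop contended by multiple threads.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class Stack:
    def __init__(self):
        self.top = None

    def push(self, node):
        node.next = self.top
        self.top = node

    def pop(self):
        node = self.top
        self.top = node.next
        return node

    def cas_top(self, expected, new):
        # Stands in for an atomic compare-and-swap on self.top.
        if self.top is expected:
            self.top = new
            return True
        return False

# Thread 1 starts a lock-free pop: it reads top (A) and A.next (B) ...
stack = Stack()
b = Node("B")
a = Node("A")
stack.push(b)
stack.push(a)
seen_top, seen_next = stack.top, stack.top.next   # A, B

# ... but is preempted. Another thread pops A, pops (and "frees") B,
# then pushes A back on.
stack.pop()
stack.pop()
stack.push(a)   # top is A again, but A.next is now None

# Thread 1 resumes: its CAS succeeds because top is still A, and it
# installs B -- a node that was already removed. That is ABA.
assert stack.cas_top(seen_top, seen_next)
assert stack.top is b   # the stack now points at a "freed" node
```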
[11:39] guido_g it takes an unusually long time to complete the request
[11:40] ekidd icy: My clients and servers are different machines, so I don't think the lock-free stuff is relevant. But I might be confused.
[11:40] sustrik ekidd: that's a different conversation going on :)
[11:40] ekidd Ah, OK. I was confused. :-)
[11:41] guido_g ekidd: did you see that req/rep is locked to the send/recv order?
[11:41] sustrik ekidd: there's no such thing in 0MQ as explicit ack
[11:41] sustrik so there's no way for it to work in lock-step fashion
[11:41] ekidd guido_g: Yeah, that works for me.
[11:42] sustrik icy: it's basically a two step process
[11:42] guido_g same for the rep side (the server)
[11:42] ekidd I basically have a farm of Windows workers that take 0.1 to (say) 60 seconds to process a job, and idle time costs money. There's one worker per server.
[11:42] guido_g see the user guide for lots of examples and ideas
[11:43] sustrik icy: while there are messages to be read the synchronisation is done simply by moving a pointer in the linked list in atomic manner
[11:43] sustrik icy: when there are no messages to be read, the pointer becomes NULL
[11:43] ekidd I don't mind locking the order of responses the way req/rep does: I'm talking to expensive, single-threaded Windows libraries in any case.
[11:44] icy sustrik: understood so far and it seems it does not suffer from aba, just was curious if there was real proof of the correctness of the algorithm
[11:44] sustrik icy: reader goes asleep and standard inter-thread mechanism (socketpair) is used to wake it up
[11:44] ekidd But I want to maximize utilization of those expensive libraries.
[11:44] sustrik icy: no
[11:44] sustrik want to prove it?
[11:44] icy that would take more time than I have probably :)
[11:45] ekidd As long as zeromq clients respect the individual server's high water marks and route requests to another worker, everything will work fine.
[11:46] ekidd I'm going to write some tests (of course). I just wanted to know whether I was even trying something sane. :-)
[11:48] ekidd Many thanks for your advice, folks!
[11:51] guido_g ekidd: did you read it? there is something on hwm
[11:54] ekidd guido_g: Excellent. It definitely has the right semantics. I'll still need to find out whether it does the right thing, performance-wise, with large messages and queues that are often at their high water marks.
[13:44] cremes while writing some specs for my bindings this weekend i came across a few issues with SWAP, RECOVERY_IVL and RATE
[13:45] cremes all 3 of those take signed 64-bit integers for input
[13:45] cremes they also do *not* return an error when passed a negative number even though that doesn't make any sense
[13:45] cremes should the library return an error for negative numbers or should my bindings take care of that issue?
[13:57] ptrb so I have this:
[13:58] ptrb I start the server, it sits at zmq_recv(), great; i run the client, it runs fine and exits, but the server never receives anything. ideas?
[14:00] pieterh cremes: i think all the setsockopt types need to be reviewed for 3.0
[14:00] cremes pieterh: ok; so should i open bugs for those against the 2.1.x branch?
[14:00] pieterh but certainly if they are signed get a negative value that should return EINVAL
[14:00] pieterh yup, even 2.0.x IMO
[14:00] cremes ok, i'll do that now
[14:01] pieterh ptrb: looking at it...
[14:02] ptrb pieterh: thx; I'm guessing there's some setup step I've overlooked
[14:02] pieterh ptrb: try 'ps'
[14:02] pieterh imo you have a second copy of the server running
[14:02] pieterh (though it would assert then...)
[14:03] pieterh sorry, forget I said that plz
[14:03] ptrb hmm, no, but maybe something else is sitting on 5001, let me try changing that
[14:03] pieterh ptrb: client writes a message and then closes & exits
[14:03] pieterh two things: (a) it should wait for a reply
[14:03] pieterh (b) if it does not want to wait, it can't exit immediately
[14:03] pieterh you need to read the users guide
[14:04] ptrb I have.
[14:04] pieterh 0mq/2.0.x loses data if you close the socket while there is data in flight
[14:04] cremes ptrb: are you starting the server first?
[14:04] ptrb Of course.
[14:04] pieterh send/close is not going to work
[14:04] pieterh send/recv/close is ok
[14:04] pieterh send/sleep/close is ok
[14:04] ptrb OK, so, do I need to recv() in the clie... k
[14:04] cremes ah yes, that's right
[14:04] cremes do a sleep before exiting
[14:04] ptrb even if I don't post anything back explicitly?
[14:04] pieterh ptrb: either a recv
[14:05] pieterh ptrb: if you're using REQ and REP sockets, you should be doing send/recv and recv/send
[14:05] pieterh if you want to just send 1 message as such use PUSH/PULL
[14:05] pieterh it's not a biggie
[14:05] pieterh the problem here is not giving the client process time to send its data
[14:05] ptrb I'm doing something vaguely RPC-ish, so I guess if I want to represent a void blah(); I still have to send something back
[14:06] pieterh or else use XREQ/XREP
[14:06] ptrb yeah it makes sense, sure. thanks. i guess it's just not explicit anywhere in the docs (afaict)
[14:06] pieterh rtfug... :-)
[14:06] pieterh it is explicit in there
[14:06] ptrb i have; if you want to point me to the sentence in question I'm happy to be made a fool
[14:07] pieterh Note that we do sleep (1); before exiting the ventilator. This is a hack that gets around ØMQ/2.0's design, which discards messages that have not yet been sent, if you exit the program too soon. If you are using ØMQ/2.1 you can remove this sleep statement.
[14:07] ptrb eh.
[14:08] pieterh
[14:08] pieterh it's the first example that has this problem, so I explain it there
[14:08] pieterh the hello world client waits for an answer
[14:08] pieterh and the pubsub example never exits
[14:09] pieterh maybe i should put it in bold...
[14:09] pieterh and repeat this, it's a common fault
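The failure mode pieterh is explaining — a send is queued for a background I/O thread, and exiting or closing too soon discards it — can be modelled deterministically. This is a toy illustration, not the libzmq API: `BufferedSocket`, `io_step`, and `linger_steps` are invented names, with the linger behaviour standing in for what 0MQ/2.1 adds.

```python
class BufferedSocket:
    """Sends are queued; each io_step ships at most one queued message."""

    def __init__(self):
        self.pending = []   # queued by the application, not yet sent
        self.wire = []      # what actually made it onto the network

    def send(self, msg):
        self.pending.append(msg)   # returns immediately, like zmq_send

    def io_step(self):
        # Stands in for one pass of the background I/O thread.
        if self.pending:
            self.wire.append(self.pending.pop(0))

    def close(self, linger_steps=0):
        # 0MQ/2.1-style linger: give the I/O thread time to drain first.
        for _ in range(linger_steps):
            self.io_step()
        # 0MQ/2.0 behaviour: anything still in flight is dropped.
        self.pending.clear()
```

Usage mirrors the ventilator note quoted above: `send()` then `close()` loses the message, while giving the I/O loop a chance to drain first (the `sleep (1)` hack, or a real linger) delivers it.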
[14:11] ptrb if you're willing to take some constructive criticism about the documentation, i'd say that while example-based docs are great, when I have a specific problem (like this) I find there isn't really a way I can find a solution; there's no idioms or implementation details or whatever to search through (as far as I've found)
[14:13] ptrb but!! but but, thank you :)
[14:19] ptrb hmm, recv on the client side hangs... is there not some zmq_flush or something I can call?
[14:20] guido_g no
[14:20] ptrb poop :|
[14:21] guido_g pardon?
[14:21] ptrb that was an expression of mild disappointment
[15:35] pieterh ptrb: still there?
[15:36] pieterh sorry, was in a meeting
[15:36] ptrb yeah sure
[15:36] pieterh making a problem driven section in the guide would be good
[15:37] pieterh did you find out why your client hangs?
[15:37] ptrb No, I just threw a sleep in there and moved on to bigger, even more problematic things :)
[15:37] ptrb a problem-driven section would be good, but it'll never be comprehensive
[15:38] ptrb FWIW I think a good documentation model would be ZeroC's ICE, which has a really comprehensive .pdf
[15:38] pieterh "did not get a message" is a pretty classic stumbling block
[15:38] ptrb yeah, fair
[15:38] pieterh i'll write a flowchart
[15:40] ptrb now, i'm working on an implementation based on the multithreaded code in the user guide, and i'm getting infinite size-0 messages on the server side after sending one legitimate message from a client
[15:40] ptrb ever hear of something like this?
[15:42] ptrb sorry, based on the multithreaded server in the *introduction* doc
[15:43] cremes ptrb: i've never seen that... you say your code is "based on" the example; it's always a good idea to start from code that you *know* works and modify from there
[15:43] cremes sounds like your mods broke it
[15:44] cremes the easiest way to find the failure is to revert back to the original "good" code and slowly modify it to your specifications
[15:44] ptrb yeah. i know. i'm trying to drop the server into an existing process to provide a zmq "layer", so there's not really any way to iterate my way to where I am now.
[15:45] ptrb i guess i can try taking out some functionality.
[15:45] cremes did you change the code that sends 0mq messages?
[15:48] ptrb yes; in ways i initially thought were inconsequential, but i suppose i'm in an assumption-revalidating mood :)
[15:49] ptrb as a meta-comment, it's really great you guys are hanging out on irc to help folks; zmq is a great project and this is a great resource.
[15:50] ptrb aha! so, if i zmq_recv(), get a message, and don't zmq_send() something in the server, subsequent zmq_recv()s have the effect of not blocking
[15:51] ptrb ...which seems quite strange to me
[15:51] cremes ptrb: what kind of socket are you using on this server side?
[15:53] cremes because that behavior doesn't sound right; the zmq_recv() call is returning an error, right?
[15:54] ptrb yeah, it returns -1 EAGAIN
[15:54] ptrb i believe it's EAGAIN, at least.
[15:56] ptrb the topology is the multithread server example in the intro doc: public tcp XREP endpoint, managed by one thread running zmq_device(ZMQ_QUEUE, ...), forwarding via XREQ to an inproc endpoint, being consumed by worker threads binding to REP
[16:02] cremes can you provide a code pastie?
[16:03] ptrb it won't be complete, but sure, one sec...
[16:03] cremes it doesn't need to be complete... i want to see the code that sets up the socket and calls recv on it
[16:04] ptrb the worker thread ultimately responsible for processing the recv, right?
[16:04] cremes whatever code is returning -1 EAGAIN
[16:05] ptrb
[16:08] cremes ptrb: in your DEBUG statement, also print out the value of zmq_strerror()
[16:08] cremes i need more information to figure this out
[16:09] ptrb Operation cannot be accomplished in current state
[16:10] cremes ah, then there we have it; with a REP socket you can't call recv again until you have subsequently called send
[16:10] cremes it needs that recv/send/recv pattern because it maintains a small internal state machine
[16:10] ptrb oh, interesting
[16:10] cremes that's the whole point of the REQ/REP socket pattern
[16:10] ptrb ok
[16:10] ptrb see, this is useful! this should be on a website somewhere :)
[16:10] cremes the worker is supposed to respond when it is done, right?
[16:10] ptrb well, my thought is that it may optionally respond
[16:10] cremes it is for sure
[16:10] ptrb but if it *has* to respond, that's fine too
[16:11] cremes if you want it to be optional, use XREP sockets
[16:11] cremes that kind of socket does not enforce the recv/send/recv pattern
[16:11] ptrb I suspect there is more to XREP than simply dropping that enforcement, though
[16:13] cremes ptrb: not really; REP sockets are built on top of the XREP socket
[16:14] ptrb hmm, interesting
[16:14] cremes REP sockets know how to "route" their responses back over multiple hops
[16:14] cremes you need to do a little extra work when using an XREP socket to retain that functionality
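The "little extra work" cremes mentions is envelope bookkeeping: an XREP socket delivers each request as address frames, an empty delimiter frame, then the body, and the reply must carry the same envelope back so it can be routed. A plain-Python sketch of that bookkeeping (helper names here are illustrative, not the pyzmq or libzmq API):

```python
def split_envelope(frames):
    """Split an XREP-style multipart message into (envelope, body).

    The envelope is every frame up to and including the first empty
    delimiter frame; the body is everything after it.
    """
    i = frames.index(b"")            # locate the empty delimiter frame
    return frames[:i + 1], frames[i + 1:]

def make_reply(frames, body):
    """Build a reply that reuses the request's envelope for routing."""
    envelope, _ = split_envelope(frames)
    return envelope + [body]
```

For example, a request received as `[b"client-1", b"", b"hello"]` must be answered with `[b"client-1", b"", ...]` so the XREP socket can route it back to `client-1`; a REP socket does this envelope handling for you.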
[16:15] cremes this might help a little:
[16:26] ptrb curiouser and curiouser
[20:37] ModusPwnens Hi guys, I have a question on req/rep topology
[20:38] ModusPwnens Previously, I have been doing benchmarking with the subscriber/publisher topology, but I wanted to see what results I would get with req/rep
[20:38] ModusPwnens and I was wondering if there is anything else I need to do besides the obvious change of the socket types and adding in additional send/recv function calls to avoid blocking the code
[20:39] ModusPwnens because i noticed after I did those things, rather than sending a message of X bytes, it sends X messages of 1 byte
[20:41] ModusPwnens Actually, I lied. It seems to just send a lot of zero byte messages
[20:41] cremes ModusPwnens: pastie some code, because it should "just work"
[20:51] ModusPwnens Actually, i figured out what it was. However, should rep/req have better or worse performance than sub/pub?
[21:07] cremes ModusPwnens: same perf but round-trip latency is higher (no such notion as round-trip latency with pub/sub)