Monday September 13, 2010

[Time] Name Message
[00:00] kenkeiter sleeperbot: it was worth a try.. did you verify that you're only running one or two messaging threads?
[00:00] andrewvc I've been curious as to how stable the node driver is
[00:00] kenkeiter :/
[00:03] sleeperbot I'm looking up how to do that
[00:03] sleeperbot do you know what I can type in the command line to bring that info up?
[00:04] kenkeiter sleeperbot: which platform?
[00:04] sleeperbot unix
[00:04] sleeperbot ubuntu karmic
[00:07] kenkeiter htop might work.. haven't done it under *nix
[00:09] kenkeiter
[00:14] sleeperbot I see 3 versions of my node.js stream and web servers
[00:14] sleeperbot don't see anything related to zmq
[00:15] sleeperbot killed the extraneous processes, will check if anything changed in cpu usage
[03:34] andrewvc I assume that XREQ/XREP sockets apply backpressure in the same manner as PUSH/PULL and REQ/REP yes?
[11:28] CIA-20 zeromq2: Martin Lucina master * rbe159b6 / src/pipe.cpp : zmq::writer_t: Add missing test for swap -
[11:29] icy sustrik: hi, is there any paper on the algorithm used for the lock-free queue?
[11:30] sustrik icy: there's a very old article here:
[11:30] sustrik
[11:30] sustrik lot of it doesn't apply any more
[11:31] sustrik this is what still applies: "Introduction
[11:32] sustrik Y-suite is a set of components designed for ultra-efficient passing of messages between threads within a process. Y-suite is somewhat similar to local sockets, however, it is much faster.
[11:32] sustrik In version 0.1 of the ØMQ lightweight messaging kernel, the only y-suite component available is ypipe, a lock-free and wait-free implementation of a queue. In version 0.2 ypollset is added to allow a thread to interchange messages with several other threads at the same time (similar to the POSIX poll function). The component known as semaphore in version 0.1 is renamed to ysemaphore in version 0.2 to mark that it belongs to y-suite. In the same way, spipe is renamed to ysocketpair.
[11:32] sustrik Design
[11:32] sustrik The basic means of transferring messages between threads is ypipe. Messages are passed through a pipe in the standard write and read manner. Once the reader has no more messages to read from the pipe, it notifies the sender using passive synchronisation and goes asleep. Passive synchronisation means that the other thread is not notified directly using some kind of async signal; rather, it will be notified once it tries to write the next message to the pipe. When this happens, the writer becomes aware that the reader is already asleep, or at least going asleep at the moment. It knows that there is a new message available, so it wakes the reader up using active synchronisation, i.e. actively sending a wake-up event to the other thread. Active synchronisation is not provided by ypipe itself, but rather by other y-suite components, to be discussed below. Usage of ypipe is depicted in the following sequence diagram:"
[11:32] sustrik yuck
[11:32] sustrik sorry
[11:33] sustrik too much text, but the last paragraph is relevant
[11:33] sustrik also see the diagram that follows the text above
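The passive/active synchronisation scheme sustrik quotes above can be sketched in plain Python. This is a toy model, not the real lock-free C++ ypipe: `YPipe` and its method names are illustrative, a lock stands in for the atomic pointer operations, and an `Event` stands in for the socketpair wake-up.

```python
import collections
import threading

class YPipe:
    """Toy model of ypipe's passive/active synchronisation."""

    def __init__(self):
        self._items = collections.deque()
        self._lock = threading.Lock()      # stands in for the atomic ops
        self._wakeup = threading.Event()   # "active synchronisation" channel
        self._reader_asleep = False

    def write(self, item):
        with self._lock:
            self._items.append(item)
            # Passive synchronisation: the writer only learns that the
            # reader went to sleep when it writes the next message, and
            # then wakes it actively.
            if self._reader_asleep:
                self._reader_asleep = False
                self._wakeup.set()

    def read(self):
        while True:
            with self._lock:
                if self._items:
                    return self._items.popleft()
                # Nothing left to read: note that we are going asleep,
                # then wait for the writer's wake-up event.
                self._reader_asleep = True
                self._wakeup.clear()
            self._wakeup.wait()
```

The key property is the same as in the quoted text: the reader never signals the writer directly; the sleep flag is only observed by the writer on its next `write()`, which is when the active wake-up happens.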
[11:34] ekidd Good morning! ZeroMQ is a really nice library.
[11:35] ekidd If I'm using REQ/REP messaging with multiple servers, what happens if one server is asked to handle an unusually long-running request?
[11:36] ekidd Do the clients just route requests to one of the available servers? Or do they continue to send requests to the busy server?
[11:36] sustrik ekidd: if you set a high watermark, its queue eventually gets full and subsequent requests will be dispatched to other servers
[11:36] icy sustrik: yea I've read that, I guess because it is single-reader single-writer, it does not suffer from the ABA problem?
[11:37] ekidd sustrik: Ah, OK. The useful high watermark in my case is very small: The servers are inherently single-threaded workers with long-running jobs. I want to keep them loaded.
[11:37] sustrik icy: what's ABA?
[11:38] ekidd I do, however, have lots of clients and servers.
[11:38] icy sustrik:
[11:39] icy sustrik: it's one of the main problems that lock-free queues have to overcome
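The ABA problem icy refers to can be shown in a few lines: a compare-and-swap succeeds because the value compares equal, even though the structure underneath changed in between. The sketch below simulates CAS on the top of a lock-free stack (all names are illustrative); a single-reader/single-writer design like ypipe avoids this because it has no CAS retry loop contended by multiple threads.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class Stack:
    def __init__(self):
        self.top = None

    def push(self, node):
        node.next = self.top
        self.top = node

    def pop(self):
        node = self.top
        self.top = node.next
        return node

    def cas_top(self, expected, new):
        # Stands in for an atomic compare-and-swap on self.top.
        if self.top is expected:
            self.top = new
            return True
        return False

# Thread 1 starts a lock-free pop: it reads top (A) and A.next (B) ...
stack = Stack()
b = Node("B")
a = Node("A")
stack.push(b)
stack.push(a)
seen_top, seen_next = stack.top, stack.top.next   # A, B

# ... but is preempted. Another thread pops A, pops (and "frees") B,
# then pushes A back on.
stack.pop()
stack.pop()
stack.push(a)   # top is A again, but A.next is now None

# Thread 1 resumes: its CAS succeeds because top is still A, and it
# installs B -- a node that was already removed. That is ABA.
assert stack.cas_top(seen_top, seen_next)
assert stack.top is b   # the stack now points at a "freed" node
```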
[11:39] guido_g it takes an unusually long time to complete the request
[11:40] ekidd icy: My clients and servers are different machines, so I don't think the lock-free stuff is relevant. But I might be confused.
[11:40] sustrik ekidd: that's a different conversation going on :)
[11:40] ekidd Ah, OK. I was confused. :-)
[11:41] guido_g ekidd: did you see that req/rep is locked to the send/recv order?
[11:41] sustrik ekidd: there's no such thing in 0MQ as explicit ack
[11:41] sustrik so there's no way for it to work in lock-step fashion
[11:41] ekidd guido_g: Yeah, that works for me.
[11:42] sustrik icy: it's basically a two step process
[11:42] guido_g same for the rep side (the server)
[11:42] ekidd I basically have a farm of Windows workers that take 0.1 to (say) 60 seconds to process a job, and idle time costs money. There's one worker per server.
[11:42] guido_g see the user guide for lots of examples and ideas
[11:43] sustrik icy: while there are messages to be read the synchronisation is done simply by moving a pointer in the linked list in atomic manner
[11:43] sustrik icy: when there are no messages to be read, the pointer becomes NULL
[11:43] ekidd I don't mind locking the order of responses the way req/rep does: I'm talking to expensive, single-threaded Windows libraries in any case.
[11:44] icy sustrik: understood so far and it seems it does not suffer from aba, just was curious if there was real proof of the correctness of the algorithm
[11:44] sustrik icy: reader goes asleep and standard inter-thread mechanism (socketpair) is used to wake it up
[11:44] ekidd But I want to maximize utilization of those expensive libraries.
[11:44] sustrik icy: no
[11:44] sustrik want to prove it?
[11:44] icy that would take more time than I have probably :)
[11:45] ekidd As long as zeromq clients respect the individual server's high water marks and route requests to another worker, everything will work fine.
[11:46] ekidd I'm going to write some tests (of course). I just wanted to know whether I was even trying something sane. :-)
[11:48] ekidd Many thanks for your advice, folks!
[11:51] guido_g ekidd: did you read it? there is something on hwm
[11:54] ekidd guido_g: Excellent. It definitely has the right semantics. I'll still need to find out whether it does the right thing, performance-wise, with large messages and queues that are often at their high water marks.
[13:44] cremes while writing some specs for my bindings this weekend i came across a few issues with SWAP, RECOVERY_IVL and RATE
[13:45] cremes all 3 of those take signed 64-bit integers for input
[13:45] cremes they also do *not* return an error when passed a negative number even though that doesn't make any sense
[13:45] cremes should the library return an error for negative numbers or should my bindings take care of that issue?
[13:57] ptrb so I have this:
[13:58] ptrb I start the server, it sits at zmq_recv(), great; i run the client, it runs fine and exits, but the server never receives anything. ideas?
[14:00] pieterh cremes: i think all the setsockopt types need to be reviewed for 3.0
[14:00] cremes pieterh: ok; so should i open bugs for those against the 2.1.x branch?
[14:00] pieterh but certainly if they are signed get a negative value that should return EINVAL
[14:00] pieterh yup, even 2.0.x IMO
[14:00] cremes ok, i'll do that now
[14:01] pieterh ptrb: looking at it...
[14:02] ptrb pieterh: thx; I'm guessing there's some setup step I've overlooked
[14:02] pieterh ptrb: try 'ps'
[14:02] pieterh imo you have a second copy of the server running
[14:02] pieterh (though it would assert then...)
[14:03] pieterh sorry, forget I said that plz
[14:03] ptrb hmm, no, but maybe something else is sitting on 5001, let me try changing that
[14:03] pieterh ptrb: client writes a message and then closes & exits
[14:03] pieterh two things: (a) it should wait for a reply
[14:03] pieterh (b) if it does not want to wait, it can't exit immediately
[14:03] pieterh you need to read the users guide
[14:04] ptrb I have.
[14:04] pieterh 0mq/2.0.x loses data if you close the socket while there is data in flight
[14:04] cremes ptrb: are you starting the server first?
[14:04] ptrb Of course.
[14:04] pieterh send/close is not going to work
[14:04] pieterh send/recv/close is ok
[14:04] pieterh send/sleep/close is ok
[14:04] ptrb OK, so, do I need to recv() in the clie... k
[14:04] cremes ah yes, that's right
[14:04] cremes do a sleep before exiting
[14:04] ptrb even if I don't post anything back explicitly?
[14:04] pieterh ptrb: either a recv
[14:05] pieterh ptrb: if you're using REQ and REP sockets, you should be doing send/recv and recv/send
[14:05] pieterh if you want to just send 1 message as such use PUSH/PULL
[14:05] pieterh it's not a biggie
[14:05] pieterh the problem here is not giving the client process time to send its data
[14:05] ptrb I'm doing something vaguely RPC-ish, so I guess if I want to represent a void blah(); I still have to send something back
[14:06] pieterh or else use XREQ/XREP
[14:06] ptrb yeah it makes sense, sure. thanks. i guess it's just not explicit anywhere in the docs (afaict)
[14:06] pieterh rtfug... :-)
[14:06] pieterh it is explicit in there
[14:06] ptrb i have; if you want to point me to the sentence in question I'm happy to be made a fool
[14:07] pieterh Note that we do sleep (1); before exiting the ventilator. This is a hack that gets around ØMQ/2.0's design, which discards messages that have not yet been sent, if you exit the program too soon. If you are using ØMQ/2.1 you can remove this sleep statement.
[14:07] ptrb eh.
[14:08] pieterh
[14:08] pieterh it's the first example that has this problem, so I explain it there
[14:08] pieterh the hello world client waits for an answer
[14:08] pieterh and the pubsub example never exits
[14:09] pieterh maybe i should put it in bold...
[14:09] pieterh and repeat this, it's a common fault
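The failure mode pieterh is explaining — a send is queued for a background I/O thread, and exiting or closing too soon discards it — can be modelled deterministically. This is a toy illustration, not the libzmq API: `BufferedSocket`, `io_step`, and `linger_steps` are invented names, with the linger behaviour standing in for what 0MQ/2.1 adds.

```python
class BufferedSocket:
    """Sends are queued; each io_step ships at most one queued message."""

    def __init__(self):
        self.pending = []   # queued by the application, not yet sent
        self.wire = []      # what actually made it onto the network

    def send(self, msg):
        self.pending.append(msg)   # returns immediately, like zmq_send

    def io_step(self):
        # Stands in for one pass of the background I/O thread.
        if self.pending:
            self.wire.append(self.pending.pop(0))

    def close(self, linger_steps=0):
        # 0MQ/2.1-style linger: give the I/O thread time to drain first.
        for _ in range(linger_steps):
            self.io_step()
        # 0MQ/2.0 behaviour: anything still in flight is dropped.
        self.pending.clear()
```

Usage mirrors the ventilator note quoted above: `send()` then `close()` loses the message, while giving the I/O loop a chance to drain first (the `sleep (1)` hack, or a real linger) delivers it.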
[14:11] ptrb if you're willing to take some constructive criticism about the documentation, i'd say that while example-based docs are great, when I have a specific problem (like this) I find there isn't really a way I can find a solution; there's no idioms or implementation details or whatever to search through (as far as I've found)
[14:13] ptrb but!! but but, thank you :)
[14:19] ptrb hmm, recv on the client side hangs... is there not some zmq_flush or something I can call?
[14:20] guido_g no
[14:20] ptrb poop :|
[14:21] guido_g pardon?
[14:21] ptrb that was an expression of mild disappointment
[15:35] pieterh ptrb: still there?
[15:36] pieterh sorry, was in a meeting
[15:36] ptrb yeah sure
[15:36] pieterh making a problem driven section in the guide would be good
[15:37] pieterh did you find out why your client hangs?
[15:37] ptrb No, I just threw a sleep in there and moved on to bigger, even more problematic things :)
[15:37] ptrb a problem-driven section would be good, but it'll never be comprehensive
[15:38] ptrb FWIW I think a good documentation model would be ZeroC's ICE, which has a really comprehensive .pdf
[15:38] pieterh "did not get a message" is a pretty classic stumbling block
[15:38] ptrb yeah, fair
[15:38] pieterh i'll write a flowchart
[15:40] ptrb now, i'm working on an implementation based on the multithreaded code in the user guide, and i'm getting infinite size-0 messages on the server side after sending one legitimate message from a client
[15:40] ptrb ever hear of something like this?
[15:42] ptrb sorry, based on the multithreaded server in the *introduction* doc
[15:43] cremes ptrb: i've never seen that... you say your code is "based on" the example; it's always a good idea to start from code that you *know* works and modify from there
[15:43] cremes sounds like your mods broke it
[15:44] cremes the easiest way to find the failure is to revert back to the original "good" code and slowly modify it to your specifications
[15:44] ptrb yeah. i know. i'm trying to drop the server into an existing process to provide a zmq "layer", so there's not really any way to iterate my way to where I am now.
[15:45] ptrb i guess i can try taking out some functionality.
[15:45] cremes did you change the code that sends 0mq messages?
[15:48] ptrb yes; in ways i initially thought were inconsequential, but i suppose i'm in an assumption-revalidating mood :)
[15:49] ptrb as a meta-comment, it's really great you guys are hanging out on irc to help folks; zmq is a great project and this is a great resource.
[15:50] ptrb aha! so, if i zmq_recv(), get a message, and don't zmq_send() something in the server, subsequent zmq_recv()s have the effect of not blocking
[15:51] ptrb ...which seems quite strange to me
[15:51] cremes ptrb: what kind of socket are you using on this server side?
[15:53] cremes because that behavior doesn't sound right; the zmq_recv() call is returning an error, right?
[15:54] ptrb yeah, it returns -1 EAGAIN
[15:54] ptrb i believe it's EAGAIN, at least.
[15:56] ptrb the topology is the multithread server example in the intro doc: public tcp XREP endpoint, managed by one thread running zmq_device(ZMQ_QUEUE, ...), forwarding via XREQ to an inproc endpoint, being consumed by worker threads binding to REP
[16:02] cremes can you provide a code pastie?
[16:03] ptrb it won't be complete, but sure, one sec...
[16:03] cremes it doesn't need to be complete... i want to see the code that sets up the socket and calls recv on it
[16:04] ptrb the worker thread ultimately responsible for processing the recv, right?
[16:04] cremes whatever code is returning -1 EAGAIN
[16:05] ptrb
[16:08] cremes ptrb: in your DEBUG statement, also print out the value of zmq_strerror()
[16:08] cremes i need more information to figure this out
[16:09] ptrb Operation cannot be accomplished in current state
[16:10] cremes ah, then there we have it; with a REP socket you can't call recv again until you have subsequently called send
[16:10] cremes it needs that recv/send/recv pattern because it maintains a small internal state machine
[16:10] ptrb oh, interesting
[16:10] cremes that's the whole point of the REQ/REP socket pattern
[16:10] ptrb ok
[16:10] ptrb see, this is useful! this should be on a website somewhere :)
[16:10] cremes the worker is supposed to respond when it is done, right?
[16:10] ptrb well, my thought is that it may optionally respond
[16:10] cremes it is for sure
[16:10] ptrb but if it *has* to respond, that's fine too
[16:11] cremes if you want it to be optional, use XREP sockets
[16:11] cremes that kind of socket does not enforce the recv/send/recv pattern
[16:11] ptrb I suspect there is more to XREP than simply dropping that enforcement, though
[16:13] cremes ptrb: not really; REP sockets are built on top of the XREP socket
[16:14] ptrb hmm, interesting
[16:14] cremes REP sockets know how to "route" their responses back over multiple hops
[16:14] cremes you need to do a little extra work when using an XREP socket to retain that functionality
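The "little extra work" cremes mentions is envelope bookkeeping: an XREP socket delivers each request as address frames, an empty delimiter frame, then the body, and the reply must carry the same envelope back so it can be routed. A plain-Python sketch of that bookkeeping (helper names here are illustrative, not the pyzmq or libzmq API):

```python
def split_envelope(frames):
    """Split an XREP-style multipart message into (envelope, body).

    The envelope is every frame up to and including the first empty
    delimiter frame; the body is everything after it.
    """
    i = frames.index(b"")            # locate the empty delimiter frame
    return frames[:i + 1], frames[i + 1:]

def make_reply(frames, body):
    """Build a reply that reuses the request's envelope for routing."""
    envelope, _ = split_envelope(frames)
    return envelope + [body]
```

For example, a request received as `[b"client-1", b"", b"hello"]` must be answered with `[b"client-1", b"", ...]` so the XREP socket can route it back to `client-1`; a REP socket does this envelope handling for you.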
[16:15] cremes this might help a little:
[16:26] ptrb curiouser and curiouser
[20:37] ModusPwnens Hi guys, I have a question on req/rep topology
[20:38] ModusPwnens Previously, I have been doing benchmarking with the subscriber/publisher topology, but I wanted to see what results I would get with req/rep
[20:38] ModusPwnens and I was wondering if there is anything else I need to do besides the obvious change of the socket types and adding in additional send/recv function calls to avoid blocking the code
[20:39] ModusPwnens because i noticed after I did those things, rather than sending a message of X bytes, it sends X messages of 1 byte
[20:41] ModusPwnens Actually, I lied. It seems to just send a lot of zero byte messages
[20:41] cremes ModusPwnens: pastie some code, because it should "just work"
[20:51] ModusPwnens Actually, i figured out what it was. However, should rep/req have better or worse performance than sub/pub?
[21:07] cremes ModusPwnens: same perf but round-trip latency is higher (no such notion as round-trip latency with pub/sub)