Thursday May 27, 2010

[Time] Name Message
[06:42] umesh Hi
[06:42] umesh I am newbie to zeromq
[06:43] umesh I have a simple question: how can we use zeromq with XML-RPC?
[06:43] umesh I mean, any pointers to a simple example would help
[06:53] sustrik umesh: you have to pair xml-rpc serialisation library with 0mq library
[06:55] sustrik zedas: i had a look at the leak you've reported; it seems the debug info is missing and presumably, optimiser messed with the code; can you do the same with -g -O0 built code?
[06:55] sustrik zedas: as an option, do you have a test program to reproduce the problem?
[06:55] zedas sustrik: that is with -g -O0, it's valgrind on OSX and the retarded dylibs OSX uses
[06:56] sustrik hm, strange, the filenames and line numbers seem to be missing
[06:56] zedas sustrik: i do have a test program, but it's my project, so probably take you too long to build it and get it working.
[06:57] zedas i'll run the test on linux and get you line numbers
[06:57] sustrik great, thanks
[06:57] zedas in the mean time, does your rbzmq actually build? it fails miserably for me because of the test for zmq_init not including the right libraries or header files.
[06:58] sustrik not sure, what version are you using 2.0.6?
[06:58] sustrik i mean 0mq version
[06:59] sustrik unfortunately, rbzmq is orphaned at the moment
[06:59] sustrik i am taking care of it a bit, but i am not a ruby person...
[07:01] zedas latest from git for both 0mq and rbzmq
[07:02] zedas alright i'll see what's up and give you a patch
[07:02] zedas i'm not a ruby person anymore, but seems i get asked to fix ruby stuff :-)
[07:02] sustrik that would be great, thanks :)
[07:02] umesh sustrik: Is there any real time example which demonstrates xml-rpc and zeromq?
[07:03] sustrik umesh: i am not aware that anyone has done that yet
[07:03] sustrik you may ask on the mailing list
[07:03] zedas sustrik: there ya go
[07:04] zedas that might be a slightly older git checkout than is current, but i can rebuild if that doesn't pinpoint the spot
[07:04] sustrik zedas: interesting, is that the only leak there?
[07:05] zedas that's the only one i can find
[07:05] sustrik ok
[07:05] zedas the others are stock sets of ram for say arc4, threads, etc.
[07:05] zedas this spot is the only one that grows perpetually and causes server death
[07:05] zedas oom killer
[07:06] sustrik one leak for each connection created, right?
[07:06] zedas no, i think one for each message passed
[07:07] sustrik hm
[07:07] zedas so i get 2MB of leaks from about that much data transmitted, but only a few hundred connections
[07:07] zedas but, i could be wrong. maybe the unit tests are doing more than i think.
[07:07] zedas or, 0mq is doing more connections behind the scenes
[07:07] zedas let me get you the code that does this, it's pretty simple
[07:08] sustrik ok
[07:08] zedas
[07:08] zedas that's it
[07:08] zedas nothing else in the system does 0mq, and that's the entire 0mq usage
[07:09] zedas and all the tests are over the REQ/REP socket, not the pub/sub socket
[07:09] zedas so lines 84-86, which calls perform_request
[07:10] zedas ohhhhh hey, see how i'm passing in the same socket as an in and out socket? could that be causing it?
[07:10] zedas perform_request(in_socket, in_socket)
[07:10] zedas they're by ref (&) so not sure, but i'm not clear how the memory handling is done
[07:11] sustrik the strange thing is that the leak, as reported by valgrind
[07:11] sustrik is tied to connection creation
[07:11] sustrik how many clients do you have?
[07:11] sustrik just one?
[07:14] zedas yep.
[07:14] zedas i can show you the python code for the client, it's not much
[07:15] sustrik wait a sec
[07:16] sustrik no luck :(
[07:16] zedas
[07:16] sustrik thx
[07:16] zedas that's what makes the connection
[07:17] zedas and i only make one for the whole test code run
[07:17] zedas so, i can try putting in an explicit close, but iirc there isn't one in the python library, similar to the c++ library
[07:18] sustrik client is perfectly ok
[07:18] sustrik the most basic use case
[07:18] zedas yep, super vanilla
[07:18] sustrik i'll try to run a code similar to one in your server
[07:18] sustrik we'll see...
[07:18] zedas and the server isn't doing much either. it gets the data out, and then hands it to another class or two
[07:18] sustrik but the leak is in server, right?
[07:19] sustrik sure it is
[07:28] zedas yep
[07:28] zedas although i could try running a fast loop of requests in a client
[07:31] sustrik you get the leak with req/rep or pub/sub style?
[07:31] zedas req/rep
[07:31] sustrik ok
[07:37] sustrik the client sends one message, receive a reply and exits
[07:37] sustrik or does it run in a loop?
[07:38] sjampoo sustrik: interesting plans for the future
[07:39] sjampoo btw i just benched the nocopy version of pyzmq
[07:41] sustrik sjampoo: nice, what about putting it on the website?
[07:41] sustrik it's us not ns btw
[07:42] sjampoo hah, good catch
[07:42] sjampoo I think it would be nice to put it on the website when the nocopy version is release ready
[07:43] sustrik you should speak to brian about this
[07:44] sustrik i am not sure what's the right place for it
[07:44] sjampoo i will
[07:44] sustrik feel free to use's "performance" section if you have no better place
[07:46] sustrik zedas: i'm running your code but can't produce the leak here :(
[07:47] zedas sustrik: client creates the connection, then does a bunch of req/rep tests, then exits
[07:48] zedas the server is a daemon that just processes in that loop
[07:48] sustrik yes, i've tried both scenarios
[07:48] sustrik no luck though
[07:48] zedas ok can i see your code. maybe i can root out what's in mine that causes it
[07:48] sustrik can you check whether all sockets and the context are properly shut down?
[07:49] sustrik
[07:50] zedas sockets shutdown on the client or server side
[07:51] zedas also, this wouldn't be a shutdown leak because the server crashes after repeated hits
[07:51] sustrik on the server side i mean
[07:51] zedas ok let me do something to this sample you wrote sustrik
[07:52] sustrik sure
[08:07] zedas sustrik: looks like it's on connect
[08:07] sustrik ?
[08:10] zedas ok this client produces the leak: i'll post my server code next
[08:11] zedas
[08:11] zedas sustrik: not much different from your own except i'm sending back a more reliable message, and freeing it in a free method
[08:11] zedas but, if you hit that server with tons of messages, no leak
[08:12] zedas it's only if you run the client in the shell with:
[08:12] zedas while ./testcli ; do ps aux | grep test | grep -v grep ; done
[08:12] zedas then you start to leak ram from each connection
[08:13] sustrik let me try
[08:13] sustrik ugh, i have to leave for an hour
[08:13] sustrik will continue afterwards
[08:17] zedas sustrik: no problem.
[08:17] zedas i'll just make fewer connections :-)
[08:17] zedas but i think that narrows it down pretty good.
[09:03] pieterh @zedas: mulletdb looks pretty neat, do you have use cases for it already?
[09:26] zedas pieterh: i'm using it on some dinky projects, but mostly just for fun
[09:27] pieterh @zedas: would you like to be listed on the site? I'd like to collect projects using 0MQ
[09:29] zedas pieterh: yeah sure, if you like. it's fairly new but i'm contributing it regularly.
[09:29] pieterh yes, I read your initial blog
[09:30] pieterh well, we can wait a while to see if mulletdb becomes famous :-)
[09:31] zedas it's famous now
[09:31] pieterh :-)
[09:31] jugg I'm using lua bindings, and calling the equivalent of zmq_getsockopt with ZMQ_RCVMORE and the call is failing. Any known issues with that?
[09:31] zedas well, famous for being another of my weirdo projects :-)
[09:32] pieterh @zedas: do you have a pretty logo yet?
[09:42] jugg never mind, it's a bug in the lua bindings.
[10:09] sustrik jugg: the point is that getsockopt is a new function
[10:09] sustrik so it's probably not yet available via lua binding
[10:09] jugg it is, it was incorrectly implemented.
[10:09] jugg I've patched it.
[10:09] sustrik ah
[10:10] sustrik do move the patch upstream
[10:10] sustrik bindings are separate products though
[10:10] jugg in the middle of doing so... :)
[10:14] jugg
[10:16] sustrik !
[10:34] sustrik zedas: ok, i am running your test
[10:34] sustrik how do you terminate it?
[10:46] sustrik ok, i see
[11:55] cremes zedas: for ruby you may want to check out the ruby FFI bindings (it's my project) at
[11:55] cremes they are getting a lot more love than the official bindings at the moment
[11:56] cremes i'm adding specs over the next few days to give some confidence in the release quality
[11:57] cremes also, the bindings are not idiomatic ruby; they map to the C functions pretty closely; i am working on a higher-level library to add ruby sugar back in
[11:57] cremes i know you're not a ruby guy anymore so pass this info on to whoever needed the help with rbzmq
[12:01] cremes oh, and the ffi bindings allow 0mq to work with ruby runtimes other than MRI
[13:06] jugg I've set up a REQ/REP pattern, where the REQ socket is connecting, and the REP socket is binding. Is there a way for the REQ socket to determine if a binding exists? Because the REQ socket send must subsequently block on recv waiting for the reply, I'd rather not have start sending messages through the REQ socket until the REP socket is ready to go.
[13:06] jugg s/not have/not/
[13:07] sustrik jugg: that's done automatically
[13:07] sustrik when there's no connection, messages are queued and sent after the server is available
[13:08] jugg yes, this is the problem. My app is now hung on recv waiting for a reply that is sometime to come yet.
[13:08] sustrik ok, what would you want to do instead?
[13:08] sustrik check periodically?
[13:09] jugg I'd like to detect if this is the case (no server available) or just have the send go to oblivion and the recv be a nop.
[13:09] sustrik you can do a non-blocking send
[13:09] jugg sure, but I still have to wait for the recv.
[13:09] sustrik zmq_send (s, msg, ZMQ_NOBLOCK)
[13:09] sustrik no, because the message is not sent in that case
[13:10] jugg ok, so without a server on the other side, it is send that is blocking?
[13:10] sustrik it just says EAGAIN
[13:10] sustrik yes
[13:10] jugg well, I *do* want to block on recv if my message is actually sent.
[13:11] sustrik yes
[13:11] sustrik pseudocode:
[13:11] sustrik s = socket (REQ)
[13:11] sustrik s.send (request, NOBLOCK)
[13:11] sustrik if (EAGAIN)
[13:11] sustrik do something else
[13:11] sustrik reply = s.recv ()
[13:13] jugg ok, so NOBLOCK will cause send to fail if no server is available? May there be another reason that send would fail because NOBLOCK is set even if there was a server?
[13:14] jugg i.e. are there other side effects of NOBLOCK?
[13:14] sustrik no, this is the only reason
[13:14] sustrik ah
[13:14] sustrik ok, one more reason
[13:14] sustrik if you set high watermarks on all sockets
[13:15] sustrik it may happen (if server is slow to respond)
[13:15] sustrik that all buffer space will be exhausted
[13:15] sustrik by pending requests
[13:15] jugg ok, but without messing with ZMQ_HWM this is not an issue?
[13:15] sustrik then NOBLOCK would return EAGAIN
[13:15] sustrik no
[13:16] jugg great, thanks.
[13:16] sustrik you are welcome
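
The high-watermark case described above can be pictured with a toy model. This is only a sketch using Python's standard library, not the 0MQ API: the class `BoundedSender` and its methods are hypothetical names, and a bounded `queue.Queue` stands in for a socket buffer limited by ZMQ_HWM. A non-blocking send is refused (the analogue of EAGAIN) only once the pending requests have filled the buffer.

```python
import queue


class BoundedSender:
    """Toy stand-in for a socket with a high watermark (not the 0MQ API)."""

    def __init__(self, high_watermark):
        # The bounded queue plays the role of the per-socket message buffer.
        self._pending = queue.Queue(maxsize=high_watermark)

    def send_noblock(self, message):
        """Return True if queued, False if the buffer is full (EAGAIN-like)."""
        try:
            self._pending.put_nowait(message)
            return True
        except queue.Full:
            return False

    def drain_one(self):
        """Simulate the peer consuming one pending message."""
        return self._pending.get_nowait()


sender = BoundedSender(high_watermark=2)
# Two sends fit in the buffer; the third is refused.
results = [sender.send_noblock(f"req-{i}") for i in range(3)]
# Once the peer consumes a message there is room again.
sender.drain_one()
after_drain = sender.send_noblock("req-3")
```

With no watermark set, the buffer in this model would be unbounded and the send would never be refused, which matches the point that without ZMQ_HWM this is not an issue.
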
[13:26] jugg well, either another bug in the lua bindings, or NOBLOCK isn't working...
[13:26] jugg send is returning success
[13:26] sustrik REQ socket?
[13:26] jugg yes
[13:26] sustrik let me see
[13:27] jugg the binding looks good
[13:27] sustrik ah, right, you are connecting it
[13:27] jugg ?
[13:27] sustrik that creates the buffer for messages right away
[13:28] jugg oh
[13:28] sustrik forget what i said before
[13:28] jugg unfortunate, because your confusion is what I wanted... :/
[13:28] sustrik you were right
[13:29] sustrik you send a message and it'll be delivered eventually
[13:29] sustrik what about checking for request in non-blocking manner and doing other work if there's no reply yet?
[13:29] sustrik checking for reply
[13:32] jugg that doesn't fit the existing structure that I'm plugging into :/ The current implementation requires that it is known whether the send was received by the server. But blocking on the send until the server is available isn't acceptable. It needs to simply drop the message and keep going in that case.
[13:33] sustrik then you need a roundtrip to server to ensure that, right?
[13:34] jugg well, the round trip is to ensure the message was processed correctly when it actually gets processed. But it is ok to drop the message into oblivion if the server isn't available.
[13:34] sustrik pub/sub sockets work that way
[13:34] sustrik deliver if peer is available
[13:35] sustrik drop message if it is not
[13:35] jugg sure, but the peer doesn't reply back in that pattern.
[13:35] sustrik it does not
[13:36] sustrik basically, what you need is a sync connection rather than message queueing system
[13:36] sustrik that's somehow out of scope of 0mq
[13:37] sustrik why not use standard BSD sockets for such a trivial scenario?
[13:37] jugg I think it is simply an issue of the REQ/REP pattern, there needs to be some way to check if a REQ can be made, if it can, then certainly block on recv for the reply. But if a request can't be made, don't sit there for eternity waiting to make the request.
[13:38] sustrik it's a design issue
[13:38] jugg this is only the first part of the message chain. Once it gets inserted here, it needs to be ensured from there out.
[13:38] sustrik mq systems are expected to queue the message and let the user continue trusting the message will be delivered
[13:42] sustrik it's similar to TCP in a way
[13:42] sustrik you send data
[13:42] sustrik and trust the underlying stack to deliver it eventually
[13:46] jugg This is a frontend delivery to the messaging system. Once the message is in the system, then that message is ensured to be processed, but if the system isn't available there is no point in having the messages queued by the requesting application. That application needs to be told the system isn't available and try again later when it tries to send a message.
[13:47] sustrik i would use simple TCP connection in such case
[13:48] sustrik other option would be to time out
[13:48] sustrik if no response arrives in N seconds the system is assumed to be unavailable
[13:48] sustrik not sure whether that would work for you
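
The timeout idea can be sketched as follows. This is a toy model built on Python's standard library only, not on 0MQ: `request_with_timeout` and `echo_server` are hypothetical names, and two `queue.Queue` objects stand in for the request and reply directions. The request is always queued, and the caller assumes the service is unavailable if no reply arrives within the given number of seconds.

```python
import queue
import threading


def request_with_timeout(requests, replies, payload, timeout_s):
    """Queue a request, then wait at most timeout_s seconds for a reply.

    Returns the reply, or None if nothing arrived in time.
    """
    requests.put(payload)  # queued even if nobody is consuming yet
    try:
        return replies.get(timeout=timeout_s)
    except queue.Empty:
        return None  # treat the service as unavailable


def echo_server(requests, replies):
    """Hypothetical service: answers one request, then exits."""
    replies.put(b"re:" + requests.get())


# No server running: the call gives up after the timeout.
no_reply = request_with_timeout(queue.Queue(), queue.Queue(), b"ping", 0.1)

# Server running: the reply arrives well within the timeout.
req_q, rep_q = queue.Queue(), queue.Queue()
server = threading.Thread(target=echo_server, args=(req_q, rep_q), daemon=True)
server.start()
reply = request_with_timeout(req_q, rep_q, b"ping", 2.0)
server.join()
```

Note that in this model the timed-out request stays queued, so a late server could still process it; the caller has to tolerate that.
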
[13:50] sustrik actually, there was already some discussion about providing sync access to 0mq network
[13:50] sustrik say, having a simple library that would speak 0mq wire protocol
[13:51] sustrik but would handle connections same way as TCP does
[13:51] sustrik but that kind of thing doesn't exist yet
[13:53] cremes jugg: i think sustrik's suggestion of sending the message async and timing out on the reply is the way to go if you want to stay within the 0mq framework
[13:53] cremes otherwise what you are trying to do isn't a good fit for 0mq
[13:53] sustrik cremes: there's a point in what jugg says
[13:53] sustrik the applications that handle exactly 1 connection
[13:53] sustrik would want that kind of thing
[13:54] sustrik say client/server application
[13:54] cremes but don't you agree that it is already possible today without a change to the library?
[13:54] sustrik server does obviously need 0mq framework as it is today
[13:55] sustrik the connections are hidden to the user
[13:55] sustrik which makes perfect sense if there's arbitrary number of them
[13:55] sustrik like in server handling 10,000's of connections
[13:55] cremes i agree
[13:55] sustrik with the client the case is a bit different
[13:56] sustrik you want to open exactly one connection
[13:56] sustrik send/recv messages
[13:56] sustrik and get notified if the connection breaks
[13:56] sustrik technically, it's much easier than what 0mq does at the moment
[13:57] sustrik but nobody implemented it yet
[13:58] cremes by "nobody implemented it yet" do you mean no one has implemented the 0mq wire format outside of the 0mq framework?
[13:58] cremes because that is one way to accomplish that goal
[13:59] sustrik within or outside, doesn't matter
[13:59] sustrik no sync interface as for now
[14:02] cremes btw, what is your release plan for 2.0.7? 1 week? a month? the api has changed quite a bit in master but i am waiting to update my bindings until the next release.
[14:02] cremes i don't want to chase more api changes.
[14:02] sustrik it's not completely up to me
[14:02] sustrik let me check with others involved
[14:02] cremes oh, i thought you owned it :)
[14:03] sustrik i have no idea how the build system works :)
[14:03] cremes me neither!
[14:04] jugg hmm, the watermark stuff doesn't even make sense in conjunction with REQ/REP anyway, as there can only ever be one message in the queue.
[14:05] sustrik jugg: right
[14:05] jugg So, in this case, I would think that NOBLOCK should work the way I want it.
[14:05] sustrik on send?
[14:06] jugg yes, the only point in the queue on send is if the server isn't available.
[14:07] sustrik yes, the queue allows you to continue processing even though the peer is not available
[14:07] sustrik it might seem an overkill in REQ case
[14:07] sustrik but REQ is just a convenience wrapper over XREQ
[14:08] sustrik which allows you to send arbitrary number of requests without waiting for answer
[14:08] sustrik in that case the queue makes perfect sense
[14:08] jugg I guess that one isn't documented...
[14:08] sustrik it's not, sorry
[14:08] sustrik it works in same way as REQ
[14:09] sustrik but you don't have to recv after each send
[14:09] sustrik the drawback is that you have to do some bookkeeping by hand then
[14:10] jugg yah, it would seem the messages would have to have manual sequence numbers tagged on.
[14:10] jugg maybe not in a 1to1 but in a load-balanced topology, certainly.
[14:10] sustrik it works this way
[14:11] sustrik imagine a complex network of requester and repliers with arbitrary structure of stand-alone queues in the middle
[14:11] sustrik each request travels N hops till it gets to service
[14:11] sustrik the reply has to be routed back the same way, opposite direction
[14:12] sustrik so what happens is that each intermediate node tags the request with its name
[14:12] sustrik when request arrives at the service
[14:12] sustrik it has a stack of node names attached to it
[14:13] sustrik what REP socket does
[14:13] sustrik is that it cuts off the stack and provides raw message to the service
[14:13] sustrik (it stores the stack somewhere in the meantime)
[14:14] sustrik when service processes the request and sends the reply, REP socket takes the stored stack and attaches it to the reply
[14:14] sustrik then each node routes the reply the right way and chops off one name from the stack
[14:15] sustrik when reply arrives at the original requester there are no more attached names in the message
[14:15] sustrik makes sense?
[14:16] jugg yes
[14:16] sustrik so, when you use XREQ/XREP instead of REQ/REP
[14:16] sustrik the bookkeeping you have to do is:
[14:17] sustrik 1. in XREP you have to chop and store the stack manually and attach it back to the reply manually
[14:17] sustrik 2. in XREQ you have to attach 'bottom of the stack' message part to the request manually
[14:17] sustrik and chop it off from the reply manually
[14:17] sustrik 'bottom of the stack' message part is just a message part zero bytes long
[14:18] sustrik that's all about XREQ/XREP
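
The stack-of-names routing described above can be traced with a small simulation. This is a sketch in plain Python, not the 0MQ API: a message is modelled as a list of byte-string parts, all function names are hypothetical, and node names are assumed to be non-empty so the zero-length part unambiguously marks the bottom of the stack.

```python
def xreq_wrap(body):
    """Requester side: attach the empty 'bottom of the stack' part."""
    return [b"", body]


def hop_forward(node_name, parts):
    """Intermediate node: tag the request with its own name."""
    return [node_name] + parts


def rep_split(parts):
    """Service side: cut off the stack (up to and including the empty part)."""
    bottom = parts.index(b"")
    return parts[: bottom + 1], parts[bottom + 1 :]


def rep_join(stack, reply_body):
    """Service side: attach the stored stack to the reply."""
    return stack + [reply_body]


def hop_back(parts):
    """Intermediate node: chop one name off and route by it."""
    return parts[0], parts[1:]


def xreq_unwrap(parts):
    """Requester side: chop off the empty part, leaving the raw reply."""
    assert parts[0] == b""
    return parts[1]


# Request travels requester -> node A -> node B -> service.
request = hop_forward(b"B", hop_forward(b"A", xreq_wrap(b"hello")))
stack, payload = rep_split(request)

# Reply travels back the same way in the opposite direction.
reply = rep_join(stack, b"world")
first_hop, reply = hop_back(reply)
second_hop, reply = hop_back(reply)
final = xreq_unwrap(reply)
```

Here the service only ever sees `payload`, and the requester only ever sees `final`; the names in between exist purely for routing the reply.
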
[14:19] mato sustrik: maybe it's time to document XREQ/XREP after all? but mark it prominently as "API is experimental" or something?
[14:19] sustrik yes, i think so
[14:19] sustrik people are using it
[14:20] sustrik so no point in hiding the documentation
[14:20] mato ok
[14:20] mato do we have a code example that uses it? one of the devices?
[14:21] sustrik zmq_queue
[14:22] mato ok, good
[14:51] jugg well, I still cast my vote that a REQ socket should fail if NOBLOCK is set and the REP socket isn't available. Failing that, I cast a vote for a UNAVAIL flag to be added which would then cause a failure if the REP socket is not available. (available meaning not bound/connected)
[14:59] sustrik the problem is that it's a distributed system; say if you have a stand-alone queue in the middle, client would send the request (because queue is available) but it'll linger in the queue because there are no services connected to it
[15:00] sustrik :|
[15:01] sustrik one of the design goals for 0mq is to allow for adding middle nodes into the network without changing semantics for the applications
[15:01] jugg not at all, if the first hop is available, that is all that matters. The nodes in the middle shouldn't be using NOBLOCK, and it should wait until the message can be processed further.
[15:01] jugg The "gateway" received it, therefore the final processing should be ensured.
[15:02] sustrik can't you think of the queue in the client as a gateway as well?
[15:02] jugg no
[15:03] sustrik why so?
[15:03] sustrik because of client application failing?
[15:05] jugg An application knows better how to queue its data than a messaging handler. The application is purely interested in knowing whether it can connect to the "system" or not. If it can, it will send out messages. If it can't, it'll wait until later.
[15:10] sustrik makes sense
[15:10] sustrik so we have two conflicting user requirements here
[15:11] sustrik 1. "don't bother me with transport details, accept the message and leave me alone"
[15:11] sustrik 2. "let me know if the message cannot be sent immediately"
[15:13] jugg in the general case, it isn't if it can't be sent immediately (queuing is fine, as long as the queue is being processed - ie something is consuming it). However in the REQ/REP case, where there is only a single message at a time, then yes, it simplifies down to that.
[15:13] jugg Seems like the UNAVAIL flag satisfies both needs.
[15:13] sustrik jugg: yes, REQ/REP is special in a way
[15:13] jugg sustrik: thanks for the input and discussion.
[15:14] sustrik I like to think of XREP/XREQ as of IP and REQ/REP as TCP
[15:14] sustrik it's as if there were two separate layers there
[15:14] sustrik with distinct interfaces possibly
[15:14] sustrik anyway
[15:14] sustrik see you later
[15:14] jugg bye
[16:20] jugg Hmm, this is odd on a REP socket, I call send(msg, SNDMORE) 20 times, and one more time without SNDMORE. On the REQ socket, I call recv() and check getsockopt(RCVMORE) which only returns 10 times that there are more messages. The 10 successful recv() calls return every other sent message. So, every other message is being lost somewhere.
[16:28] jugg meh, this is quite bothersome...
[16:29] zedas cremes: thanks.
[16:29] zedas sustrik: any luck?
[16:45] jugg there certainly seems to be a bug here. I can combine my data set into a single message, and it comes across just fine. But if I use SNDMORE, every other message is lost.
[16:47] cremes jugg: fyi, you are using an experimental feature. if you can reproduce a bug, file a report
[17:31] jugg issue filed:
[17:31] cremes excellent
[20:48] bgranger sustrik: I have a question about if/when zmq_free_fn is called if zmq_send fails...
[20:49] CIA-17 pyzmq: Brian Granger nocopy * r4685c6d / (zmq/_zmq.c zmq/_zmq.pyx): Adding comments about when Py_DECREF is called. -
[20:53] sjampoo bgranger, hi
[20:53] bgranger hi
[20:54] sjampoo i verified it, it does not call zmq_free_fn on error
[20:54] sjampoo not sure if it should though
[20:54] bgranger sjampoo: thanks, so I possibly have to clean up myself (I can't rely on zmq_free_fn for that)
[20:55] bgranger That is a tough one, because it really depends on why the send fails.
[20:56] bgranger As long as 0mq is consistent and never calls it on error, I can handle it.
[21:19] CIA-17 pyzmq: Brian Granger nocopy * rc0dce80 / (zmq/_zmq.c zmq/_zmq.pyx): Documentation an minor improvements to Message object. -
[21:22] mato hi guys, does this work better for a description of the pipeline pattern?
[21:22] mato The pipeline pattern is used for distributing data to _nodes_ arranged in
[21:22] mato a pipeline. Data always flows *down* the pipeline, and each stage of the
[21:22] mato pipeline is connected to at least one _node_. When a pipeline stage is
[21:22] mato connected to multiple _nodes_, data shall be processed by all connected _nodes_
[21:22] mato in parallel.
[21:29] cremes mato: sounds better than what is there
[21:29] cremes as usual, code is the best documentation
[21:29] mato goodo
[22:49] CIA-17 pyzmq: Brian Granger master * r88a07bb / (README.rst setup.cfg.template): Updating README.rst and setup.cfg.template. -
[22:59] CIA-17 zeromq2: Martin Lucina master * r5219e4c / (doc/zmq.txt doc/zmq_setsockopt.txt doc/zmq_socket.txt): Clarify socket types in documentation, reinstate ZMQ_PAIR -
[22:59] CIA-17 zeromq2: Mikko Koppanen master * r8bd3f74 / builds/redhat/zeromq.spec : Import redhat packaging -
[22:59] CIA-17 zeromq2: Mikko Koppanen master * rb4cc7b9 / : dist-hook for copying zeromq.spec to top-level -
[22:59] CIA-17 zeromq2: Martin Lucina master * r74f1a4a / builds/redhat/zeromq.spec :
[22:59] CIA-17 zeromq2: RPM packaging cleanups
[22:59] CIA-17 zeromq2: - ditch -utils package
[22:59] CIA-17 zeromq2: - add descriptions from Debian packaging -
[23:39] CIA-17 zeromq2: Martin Lucina master * rda37c45 / (doc/zmq_bind.txt doc/zmq_connect.txt):
[23:39] CIA-17 zeromq2: Clarify zmq_bind/zmq_connect
[23:39] CIA-17 zeromq2: Use the term 'endpoint' correctly, and drop the nonsense about local/remote addresses which doesn't clearly explain what is going on -