Thursday October 20, 2011

[Time] Name Message
[03:19] aroman is it possible to do async RPC with req/rep? I have a process that is doing some IO bound stuff that takes a while to complete, but I need it to be able to accept additional requests while one is still going.
[03:29] aroman oh I see now,
[03:29] aroman dealer/router is *exactly* addressing my issue
[03:41] mattbillenstein aroman: I've been looking at this a bit as well — to implement msgpack-rpc
[03:45] aroman mattbillenstein: ooh, that is very interesting
[03:45] aroman hadn't heard of msgpack-rpc before, but I could definitely use it.
[03:45] mattbillenstein I was working on a zeromq - msgpack-rpc implementation
[03:46] aroman I currently use JSON to power my zeromq RPC system, but JSON is definitely not the bottleneck for my scale -- I just need asynchronous RPC
[03:46] mattbillenstein I backed out the zeromq part of it after I couldn't really figure out exactly how to handle failure cases
[03:46] mattbillenstein but still interested in seeing if it's possible to make it all work
[03:47] aroman i'm sure it is, but zeromq's inner machinations still escape me
[03:47] mattbillenstein json doesn't handle binary data well
[03:47] mattbillenstein well
[03:47] aroman i'm still barely wrapping my head around the DEALER syntax
[03:47] aroman yep, and I'm only sending very small lists or HTML.
[03:47] mattbillenstein I went about doing this using regular tcp sockets and it seems to work pretty well
[03:48] aroman (RPC for a Python web scraper doing async http requests)
[03:48] mattbillenstein gevent or eventlet?
[03:48] aroman neither, Tornado's async http client
[03:48] mattbillenstein ah, cool
[03:48] aroman since pyzmq has tornado support built in
[03:48] mattbillenstein I was using gevent — sounds about like the same thing
[03:49] mattbillenstein there is a library for making zmq sockets block in a greenlet — gevent-zeromq
[03:49] aroman (literally one line of code to make the event loops of tornado and zmq work together)
[03:49] aroman s/loops/loop/
[03:49] aroman mattbillenstein: why would one want blocking sockets?
[03:49] aroman (in an eventloop, I mean)
[03:50] mattbillenstein well, to the user, it seems like they block, but they just yield to the event loop so something else can run
[03:51] mattbillenstein does tornado use callbacks?
[03:51] aroman yes, it's fully asynchronous
[03:53] aroman actually, it looks like I don't really need zeromq at all
[03:54] aroman msgpack-rpc is solving literally exactly what I need -- node.js async, parallel pipelined RPC to python.
[03:54] mattbillenstein word — never been a fan of writing code using callbacks — big fan of eventlet/gevent
[03:54] mattbillenstein I implemented that interface
[03:54] aroman node.js RPC to python?
[03:54] mattbillenstein but I never have more than one message in flight down a single connection at time
[03:55] mattbillenstein so to make a bunch of async calls, I just spawn a new greenlet which creates a new connection, makes a request - when it's done, it closes the connection and hands back the response
[03:55] mattbillenstein no, nothing with node
[03:55] mattbillenstein msgpack-rpc is just a specification
[03:55] mattbillenstein I'm doing python <—> python now
[03:55] mattbillenstein but I may use the same machinery to do python <—> c++ soon
[03:55] aroman ah
[03:56] aroman my project is mostly aimed at getting me more familiar with zeromq/node/ etc, so I'm kind of reluctant to simply bail on zeromq
[03:56] aroman but it definitely seems that i'd be reinventing a lesser wheel as compared to using msgpack-rpc
[03:56] mattbillenstein well, and I use connection pools, so I'm not necessarily creating a new connection each time — so I think it's pretty decent
[03:57] mattbillenstein well, the tradeoff is tcp vs zeromq
[03:57] mattbillenstein you can use msgpack in either case
[03:57] aroman what do you mean?
[03:58] mattbillenstein like msgpack is basically json
[03:58] mattbillenstein it's how you encode the messages on the wire
[03:58] mattbillenstein the wire in this case could be either a tcp connection, or zeromq, or something else entirely
[03:58] aroman ohh
[03:59] aroman so zeromq is not mutually exclusive to msgpack
[03:59] aroman zeromq being the transport and msgpack being the data format
[03:59] mattbillenstein exactly
[03:59] aroman which goes back to your original statement about trying to get the two working together
[03:59] mattbillenstein I started with zeromq and msgpack
[03:59] mattbillenstein but replaced zeromq with tcp
[03:59] aroman ah.
[04:00] mattbillenstein think I just don't understand zeromq well enough and i wanted to get this code out into production
[04:00] mattbillenstein but I might try again sooner or later
[04:01] aroman yeah. I have no actual deadline or concrete end-goal here, so I have the luxury of evaluating my options
[04:02] aroman so maybe what i'll do is use zeromq's DEALER pattern with JSON (since I'm already using JSON, just with sync req/rep sockets) and then see about moving to msgpack
[04:03] mattbillenstein cool, I'd be interested to hear how that turns out
[13:31] rays any idea why this test fails?
[14:31] mikko rays: that is a hefty test
[15:53] cremes rays: what does that code output? when it "fails" does it print an assertion, hang, ... ?
[15:53] cremes if it hangs, can you attach with gdb and get a backtrace?
[17:03] cratores howdy all, getting: Assertion failed: (msg_->flags | ZMQ_MSG_MASK) == 0xff (zmq.cpp:223) on a moderately high traffic app ( about 500k msg/s 15B each) using only inproc:// endpoints between threads, is this worth submitting a tkt or is it likely me (and if so any tips on debugging)? zmq 2.1.9
[17:03] cratores this happens randomly
[17:03] cratores it will sometimes happen reasonably fast and sometimes the app will run for 15 min
[17:04] cratores sorry not 2.1.9, 2.1.10
[17:04] cratores forgot i updated today to see if it would help (does not)
[17:12] mikko cratores: what sort of socket types?
[17:13] mikko cratores: is this reproducible in a smaller test case?
[17:14] mikko cratores: that looks like an invalid message is being accessed
[17:14] mikko for example one that has been closed
[17:15] mikko brb
[17:19] cratores i haven't tried reproducing, but i will if i can't figurei t out. the sockets are pub/sub and there's only 1 of each (so maybe I should use pairs)
[17:19] cratores the thing is i don't know how i would be accessing an invalid msg, id on't do anything fancy like no copy
[17:20] cratores all i do is send/recv with locally created msgs (on the stack, not heap)
[17:21] cremes cratores: accessing messages on the stack might be your issue
[17:21] cremes any chance you can easily change your code to allocate messages on the heap instead?
[17:21] cratores but i copy all the data i need before they go away?
[17:22] cratores all i'm saying is I use them like: func(x,y,z) { zmq:msg_t gets used here and all of its data copied to my own structures }
[17:23] cratores i could just be doing something stupid elsewhere and corrupting of course, so there may not be anything you guys can tell me
[17:24] cratores it's just such a pain to debug multithreaded i/o using gdb, of course the problem doesn't occur in debug since it runs 50x slower
[17:24] cremes cratores: valgrind it... you might be doing something funky that valgrind can catch
[17:24] cratores yeah
[17:27] guido_g messages on the stack is a bad idea, afair inproc just copies pointers, so the messages might be already gond/garbled when the receiver accesses them
[17:30] cratores sorry i'm not being clear
[17:30] cratores i'm not passing them on the stack
[17:31] cratores i'm creating them in a function, receiving data into them
[17:31] cratores locally
[17:31] cratores and then copying what i need before the function is done
[17:32] cratores like func() { msg m; socket.recv(m); copy_stuff_from_m(m); }
[17:32] cratores anyway, i'm going to try valgrind
[18:03] guido_g and on the ending side?
[18:26] mikko guido_g: the message data is refcounted
[18:26] mikko stack is perfectly fine for the message container
[18:28] mikko zmq_msg_init () allocates the contents on the heap
[18:28] mikko unless the message is a VSM
[18:29] mikko which i can't remember how it's handled
[18:37] guido_g ok
[18:37] guido_g is a 15 byte msg a vsm?
[18:40] mikko guido_g: probably
[18:41] mikko zmq_msg_init_data might not be ok from stack
[18:41] guido_g ahh thx
[18:42] mikko ZMQ_MAX_VSM_SIZE 30
[18:43] guido_g so we need to know where the message data resides
[18:49] cratores on both sides i create local zmq_msgs (sorry i use c++ wrapper so these are zmq::msg_t's), recv from the socket, then copy from the msg onto my own structures all in the same function
[18:49] cratores this not zero copy, why would this not be ok?
[18:50] cratores i do this all over this place in apps that run over 1mm msgs a second on Windows and don't have any problems
[18:50] cratores i'm just porting my apps to linux now and running into issues
[18:50] mikko cratores: it's ok
[18:51] guido_g w/o code impossible to tell
[18:51] mikko cratores: it would be beneficial to run the app under valgrind
[18:51] mikko to see if there are memory errors somewhere
[18:51] cratores yes i'm installing it now
[18:52] cratores never used it before since i'm mostly a windows developer, so i'm learning
[18:52] mikko cratores: also, if you can put the relevant parts to it would help
[18:52] cratores will keep the channel posted
[18:52] mikko i'm gonna hit the gym, will be back in 2h or so
[19:04] espeed when using a Streamer Push/Pull device, sometimes messages are lost. For example, a client will use TCP to push 1000 messages to an internal worker on the same host, and the worker will only receive ~108 of them. HWM is set to default. Is this likely a buffer overrun issue? There is plenty of memory on the system.
[19:09] cremes espeed: are you sure all of the PULLers are connected before you start pushing?
[19:10] espeed I set it to only 1 puller, and it received the first through 108 messages, and it is still running in background
[19:10] cremes espeed: nevermind... PUSH is supposed to block if there are no PULL sockets connected
[19:11] espeed The first time I ran it, it receive all 100 messages
[19:11] cremes espeed: if this is a real bug, then a simple reproduction ought to be simple to make
[19:11] espeed subsquent times the quantity received fluctuated
[19:11] espeed *all 1000
[19:11] cremes what OS and what version of 0mq?
[19:11] cremes and what bindings?
[19:12] cratores valgrind detects nothing btw :(
[19:13] cremes cratores: :(
[19:14] cratores well i can easily enough try reproducing in a tiny case in 20 lines of code, but i'm sure it will work of course ;)
[19:14] cremes of course!
[19:14] cratores part of the problem is that valgrind slows down my app to the point that it may not trigger whatever is causing the problem
[19:15] espeed pyzmq, ZeroMQ 2.1.4 on Fedora
[19:20] espeed cremes: let me make sure I understand how 0mq is supposed to buffer -- an available PULL socket will buffer under the limits of available memory and then if no other PULL sockets are available, the messages are discarded?
[19:21] cremes hmmm
[19:21] cratores you mean the push socket will suffer, yes?
[19:21] cremes PULL sockets can receive as many messages as memory allows
[19:21] cratores buffer
[19:22] cremes the PUSH socket and its HWM determine how many messages the "worst" PULL socket can be behind
[19:22] cremes before it blocks
[19:22] cremes by default, HWM is 0 which means "no limit"
[19:22] cremes does that answer your question?
[19:22] espeed yes
[19:24] espeed ok, so then if there is ample GBs of memory on the system, and the PULL socket is "losing" 900 "hello" messages out of 1000, then that's not normal
[19:25] espeed it's so intermittent as to how many messages it will receive
[19:32] rbancroft are you using an up to date version?
[19:35] espeed Here's my test code... Streamer Device: , Client & Worker Tester:
[21:06] cremes espeed: i think i know why you are getting odd results
[21:07] cremes espeed: you aren't closing the sockets and terminating the context at the conclusion of the program
[21:07] cremes so it exits before all messages can be delivered
[21:07] cremes if you used ZMQ_LINGER, closed the sockets when the work was done, and terminated
[21:07] cremes the context to make sure everything was flushed, then you would probably get all messages every time
[21:26] espeed cremes: ahh, thanks -- that's what I was just investigating. However, the guide says "When you use ØMQ in a language like Python, stuff gets automatically freed for you"
[21:27] cremes espeed: sure, but it doesn't say "stuff automatically gets sent for you"
[21:36] mikko flushing is not always enough
[21:36] mikko in some cases things are in tcp buffers
[21:36] mikko which might get discarded
[21:37] mikko see
[21:37] mikko not sure if that's the case here
[21:38] mikko haven't read the whole backlog yet
[21:56] sigmonsay Hello, I'm attempting to call recv_pyobj() on a ipc and am getting an exception about cannot be accomplished in current state
[21:56] mikko sigmonsay: req/rep?
[21:56] sigmonsay Yah
[21:57] mikko request - reply follow a strict pattern
[21:57] sigmonsay I thought the intent was to blcok if the other side wasn't there
[21:57] mikko so request socket is always send/recv/send/recv
[21:57] mikko and reply is always recv/send/recv/send
[21:57] mikko etc
[21:58] sigmonsay Ok, perhaps I have a bug. Just didn't expect an error of this nature
[21:58] mikko if the operation cannot be accomplished in the current state (EFSM) then check that you are following the pattern strictly
[21:59] sigmonsay ty sir, that was it!
[21:59] mikko np
[22:08] espeed set LINGER to 0 on all sockets, changed from TCP to IPC on both, closing all client and worker sockets on termination, and destroying client and worker contexts -- but the problem persists
[22:17] sigmonsay in python does a zmq context create a process or a thread?
[22:19] minrk sigmonsay: no
[22:21] sigmonsay I see two processes forked off of the main sitting in epoll loops, they're not mine afaik
[22:21] minrk the ProcessDevice in the example means run zmq_device in a Process
[22:21] minrk but creating a Context does approximately nothing but call zmq_init
[22:27] sigmonsay well does calling bind or connect call fork? =P
[22:29] sigmonsay Yah, that's exactly what is happening
[22:29] cremes sigmonsay: no, zmq_bind() and zmq_connect() do not call fork
[22:29] sigmonsay in python perhaps?
[22:29] cremes but the python library could be doing so
[22:29] minrk no
[22:31] sigmonsay ps doesn't lie, its a thread or a new process. I'll RTFM
[22:31] minrk how many processes are there?
[22:31] sigmonsay 3 total, one process w/ 1 zmq context kicks off 2 children
[22:32] minrk and what's the code?
[22:34] sigmonsay drop dead simple zmq.REP socket
[22:35] minrk Is is possible to post actual code that sees this fork?
[22:37] sigmonsay they are likely threads then
[22:37] sigmonsay Yep, they're threads. Long way to a short answer
[22:41] jjg hi .. i'm looking for a robust and lightweight http to apmq gateway that keeps a persistent connection to the queue broker .. any tips?
[22:41] sigmonsay dude, wait a second for an answer before cross posting
[22:42] jjg sigmonsay: they're called channels not channel
[22:42] mvolkov Hello, Q: How to find out the volume of queue in a worker? I am using PUSH ... PULL
[22:42] mvolkov is it even possible?
[22:43] minrk @sigmonsay - I think zmq_init does spawn C-threads, but pyzmq definitely does not spawn any processes or *Python* threads unless you are calling an object named 'Thread…' or 'Process...'
[22:44] sigmonsay That's what i'm seeing, still newb here. thanks mikko
[22:44] cremes mvolkov: no, that information is not exposed
[22:44] cremes ok all, on my way to the chicago meetup
[22:45] mvolkov cremes: thank you
[22:46] rem7 Did the structure of msg envelopes change for zeromq3? Im playing address-based routing (ROUTER<->REP) My code seems to work with zeromq2.1.7 but not in 3. It seems it doesn't like the empty msg between the identity and the workload.
[22:48] jjg seb`: will do, tx
[22:49] rem7 if I remove the empty msg then it works in ver3, but for ver2 I need to put the empty msg... I just want to make sure if thats normal...?
[23:29] espeed I found that if I include a sleep(5) at the end of the client function, it stays active long enough for the socket to finish sending the messages to the worker
[23:30] nag espeed: surely that's not ideal
[23:31] espeed no, def not, but now I have a better understanding of where the problem is