IRC Log


Tuesday September 7, 2010

[Time] Name Message
[05:57] lestrrat I'm having problems using fork() and making my parent process talk to the child processes via 0mq. I'm getting segfaults after the child exits and while the parent is in recv()
[05:57] lestrrat is this supposed to work?
[06:55] pieterh sustrik: you there?
[09:51] pieterh anyone here felt that XREP and XREQ could use better names?
[09:52] lestrrat I just gave a talk to my coworkers about zeromq, and yes, better_names++
[09:52] pieterh I was thinking of ROUTE and FORWARD
[09:53] pieterh XREP creates routing envelopes around incoming messages and uses these on output to route replies back to original clients
[09:54] pieterh XREQ just forwards messages in both directions without touching them
[09:55] pieterh XREP really looks like a router... it's the only socket type that lets you address specific connections
[09:55] keffo I like 'route' very much, not forward so much
[09:55] pieterh yeah, forward wasn't inspired
[09:55] pieterh it has to be a verb
[09:56] pieterh that says "move stuff in both directions but don't mess with it"
[09:56] pieterh PORT
[09:56] pieterh XFER
[10:11] pieterh keffo: here's a thought: http://www.zeromq.org/sandbox:mudem
[10:11] pieterh lestrrat: does that ring a bell?
[10:13] lestrrat thinking
[10:15] lestrrat hmm. I grok ROUTE, but not the mudem part :) but I don't have a great alternative plan either
[10:15] lestrrat naming is hard, eh.
[10:17] pieterh well, mudem is a play on modem, modulator/demodulator...
[10:17] keffo pieterh, I have a 'route-codec' in my code...
[10:17] lestrrat yeah, I know
[10:17] pieterh i don't like invented words but if one has to invent them they should be expressive
[10:18] keffo which encodes/decodes routes for an xreq
[10:18] pieterh keffo: sounds right
[10:18] pieterh creates and uses envelopes, right?
[10:20] keffo it just handles them, contains a payload the 'enduser' is interested in, and also has a sendroute function
[10:20] pieterh well, an alternative to mudem: dispatch
[10:21] keffo naa, someone who implements any type of loadbalancing of messages is in essence a dispatcher, imo
[10:21] pieterh true
[10:22] pieterh it's the combo of fanout and fanin
[10:23] pieterh i think "multiplex" is wrong since that suggest copying whereas its distribution
[10:23] pieterh *it's
[10:23] lestrrat yeah, I thought about multiplex, but it didn't quite fit
[10:24] pieterh in terms of use cases, xreq is like push+pull, it ventilates and sinks at once
[10:25] pieterh one could create a nice pipeline pattern using just XREQ to XREQ
[10:27] pieterh how about... something more visual... 1TON
[10:27] lestrrat 1000kg!
[10:28] pieterh yeah
[10:28] pieterh 1-to-N for the pedantic of us
[10:30] pieterh http://www.zeromq.org/sandbox:1ton
[10:31] pieterh it kind of feels more like a building block now
[10:42] keffo I have an issue where my worker process simply disappears, but I cant seem to trap it
[10:42] keffo no exceptions, no atexits are run.. nada..
[10:42] pieterh what OS?
[10:42] keffo very annoying!
[10:43] keffo win7
[10:43] pieterh ah, that is a known problem
[10:43] keffo hu?
[10:43] pieterh the usual solution is to upgrade to Linux
[10:43] pieterh sorry :-)
[10:43] keffo caused by zmq??
[10:44] keffo tossing away 98% of the global userbase is hardly an upgrade btw :)
[10:44] pieterh I was kidding, my bad
[10:44] pieterh what language are you using?
[10:44] keffo c++, lua
[10:44] pieterh so you need a debug build of 0MQ IMO
[10:44] keffo oh it's all debug, debugger is attached too :)
[10:44] pieterh aw :-(
[10:45] keffo gives me nothing.. I've tried all routes I can think of
[10:45] keffo abort()?
[10:45] keffo but why would that be called?
[10:45] pieterh DebugBreak() afair, then continue it in the debugger
[10:45] pieterh ... assertion failure?
[10:45] keffo nada :)
[10:46] keffo no asserts. no breakpoints, simply disappears.. windows event log shows nothing
[10:46] keffo puzzling actually
[10:46] keffo no exceptions are raised either
[10:46] keffo It's as if the app cleanly exits, except it cant since it's a while(true)
[10:46] pieterh it could exit in another thread I guess
[10:47] pieterh i've not worked on win32 for ages... maybe someone else here can be more helpful
[10:47] keffo does zmq ever call exit?
[10:47] keffo (or abort)
[10:48] pieterh nada
[10:48] pieterh asserts, yes
[10:48] pieterh 98%? keffo, 2010 is the Year of Linux
[10:48] pieterh it's no more than 97.85% by now
[10:49] keffo if my mom could use any generic desktop linux without calling me, then I'd agree :)
[10:50] pieterh hah, my mum actually does use linux and has for years...
[10:51] pieterh but then again she's currently asking me how to hide her IP address so she can troll Anonymous so perhaps she's not typical...
[10:51] pieterh keffo: if you can make a reproducible case, and chop it down, maybe we can reproduce it on another platform
[10:52] keffo lord no, that would take ages :)
[10:52] keffo I just want to somehow detect -when- it happens, and go from there, but so far I've been unable to
[10:52] pieterh then, my friend, you might have to resort to...
[10:53] pieterh if really you have no other option...
[10:53] keffo print? =)
[10:53] pieterh yeah :-)
[10:53] pieterh don't forget the fflush (stdout);
[10:54] keffo It's remarkably reproducible though
[10:55] pieterh well, that's always good
[10:55] keffo 4th time I calculate pi, it disappears
[10:55] pieterh hopefully it remains stable as you add hundreds of prints
[10:55] pieterh you're calculating pi?
[10:57] keffo printing is not the problem, I generate ~250k of logs on each run :)
[10:57] keffo pi yeah, easy and verifiable thing to calc distributed :)
[10:58] pieterh is there an algo for distributed pi calculation somewhere?
[10:59] keffo sure
[10:59] keffo tons of different I guess
[10:59] pieterh i had an idea for a supermassive 0MQ project... lol
[10:59] pieterh not original but who cares...
[10:59] keffo as did I :)
[10:59] keffo for i=self.beginspan, self.endspan do
[10:59] keffo localpi = localpi + (1.0 / (i * 4.0 + 1.0) )
[10:59] keffo localpi = localpi - (1.0 / (i * 4.0 + 3.0) )
[11:00] keffo do that for each subspan of some arbitrary length, then sum them all up and the answer * 4 is pi :)
[11:00] pieterh not 42? weird...
[11:00] keffo hehe
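The loop keffo pasted is the Leibniz series for pi/4, split into index ranges that can be summed independently. A minimal Python sketch of the same idea follows; the function names, term count, and span length are illustrative assumptions, and the "workers" are simply called in-process rather than over 0MQ.

```python
import math

def partial_sum(begin, end):
    """Sum the series terms for indices begin..end inclusive, like the Lua loop."""
    local_pi = 0.0
    for i in range(begin, end + 1):
        local_pi += 1.0 / (i * 4.0 + 1.0)
        local_pi -= 1.0 / (i * 4.0 + 3.0)
    return local_pi

def estimate_pi(total_terms, span_length):
    # Split indices 0..total_terms-1 into sub-spans; each span could go to a worker.
    spans = [(start, min(start + span_length, total_terms) - 1)
             for start in range(0, total_terms, span_length)]
    # Sum the partial results, then multiply by 4 as described in the log.
    return 4.0 * sum(partial_sum(b, e) for b, e in spans)

pi_estimate = estimate_pi(total_terms=200_000, span_length=25_000)
print(pi_estimate, abs(pi_estimate - math.pi))
```

Because every span is independent, the result is easy to verify against a known value no matter how the spans are distributed.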
[11:00] pieterh aight, so if we have a server somewhere that distributes workloads, and a simple 0MQ client that accepts them...
[11:01] pieterh has surely been done dozens of times
[11:02] keffo I'm doing a fairly more complicated scenario, but yeah
[11:05] keffo and I guess you can figure out why I want something a bit more flexible than roundrobin :)
[11:06] pieterh right now I'm writing examples on how to use XREP to do routing
[11:06] keffo good, it was messy :)
[11:06] pieterh how so?
[11:06] pieterh you mean no documentation on the envelopes etc.?
[11:06] keffo just not very nicely explained
[11:06] pieterh right...
[11:13] keffo I would explain it as a stack+req...
[11:13] keffo push, push, push, payload, then pop,pop,pop, payload on the other side
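keffo's "stack" description of the routing envelope can be modelled with plain lists. This is a simplified sketch, not the 0MQ wire format: the hop names are hypothetical, and the empty delimiter frame between the addresses and the body is assumed here for illustration.

```python
def push_hop(frames, address):
    """Request path: a routing hop prepends the address of the peer it received from."""
    return [address] + frames

def pop_hop(frames):
    """Reply path: a routing hop removes the first address and routes to it."""
    return frames[0], frames[1:]

# An empty delimiter frame separates the address stack from the body.
request = [b"", b"payload"]
for hop in (b"client-A", b"queue-1", b"queue-2"):
    request = push_hop(request, hop)
# request is now [queue-2, queue-1, client-A, b"", payload]

# The service keeps the envelope and swaps only the body.
reply = request[:-1] + [b"result"]

# Each hop on the way back pops one address until the delimiter is reached.
route = []
while reply[0] != b"":
    next_hop, reply = pop_hop(reply)
    route.append(next_hop)

print(route, reply)
```

The reply visits the hops in the reverse of the order in which the request passed them, which is the push/pop behaviour described in the log.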
[11:14] keffo btw, what happens in a queue device if a client never reconnects? will the msg linger indef.?
[11:21] keffo btw, what happens in a queue device if a client never reconnects? will the msg linger indef.?
[11:21] pieterh hmm, you mean a reply?
[11:21] pieterh with or without identity?
[11:22] pieterh this is what 0MQ/2.1 is fixing
[11:22] pieterh it will wait in some cases, discard in other cases
[11:22] keffo in general, the whole reconnect business
[11:22] keffo if it goes into the queue but never out, what happens to it?
[11:23] pieterh well, the queue is per socket, eventually
[11:24] pieterh there is not yet a proper explanation of how the 2.1 socket close semantics should work
[11:24] pieterh afaik
[11:24] keffo client(100msgs) -> queuedev -> service, then back again, except the client is gone forever..
[11:24] pieterh anonymous clients -> messages get thrown away
[11:25] pieterh client with identity -> messages persist as long as service is running
[11:25] keffo ok
[11:25] pieterh 0MQ does have the concept of a connection going away
[11:26] pieterh otherwise PUB sockets for example would end up with horrid resource leaks
[11:26] keffo I need to introduce some sort of session.. if I have (known)client A, does a bunch of test junk(like pi), but aborts prematurely, but then reconnects to start some other type of job, I dont want to receive a bunch of old pi results :)
[11:27] keffo and I would need to be able to tell all parties involved to dump everything related to an "old" session as well
[11:28] pieterh keffo: this starts to be industrial design work
[11:29] keffo pieterh, what do you mean?
[11:29] pieterh i mean, what you're making is heavy duty...
[11:30] keffo oh very much :)
[11:30] keffo it has fried my brain on many occasions.. Tons of papers of diagrams spread all over the place :)
[11:30] pieterh if you have budget to throw at it, i can recommend an industrial 0MQ designer like Mato here
[11:31] keffo the bulk of the work is not the transport & topology though
[11:31] keffo although that needs to obviously be stable
[11:31] pieterh well, you need an infrastructure that understands 'sessions'
[11:32] keffo Sure, but that's already handled
[11:32] pieterh what do you still need then?
[11:32] pieterh apart from the thing not crashing...
[11:32] keffo hehe
[11:32] keffo lingering data trying to reconnect for one.
[11:33] keffo oh and more liberal means of implementing loadbalancing, but I've made that point already :)
[11:34] pieterh well, load balancing using XREP routing is pretty clear, and will be nicely explained in Ch3 of the Guide
[11:34] keffo will be? =)
[11:34] pieterh is in progress if I was not chatting here :-)
[11:34] pieterh anything to do with maintaining overall state is a different kettle of chicken, though
[11:34] keffo But I think I know what that will say by now :)
[11:35] pieterh hopefully, yeah
[11:35] keffo what I'm doing is something I've thought about for years though, so it should work :)
[11:35] keffo zmq solved a big gaping questionmark though :)
[11:36] keffo I might be getting a job soon though, so dev on this will sadly be sidetracked to weekends and evenings only though :/
[11:37] pieterh is it open source?
[11:38] keffo it might be eventually!
[11:38] keffo would benefit the lua community I guess
[11:38] pieterh well... i've learned two relevant things here having done software for way too long
[11:39] pieterh a. if it's not open source it will die
[11:39] pieterh b. if you don't start as open source you can't make it work afterwards
[11:40] keffo I dont agree, and I've never done anything other than software :)
[11:40] pieterh cause it's not about building code but about building community...
[11:40] pieterh good luck, anyhow
[11:41] keffo I wasnt thinking of opensource as in leveraging resources, but simply aiding someone else, when I'm done with it :)
[11:41] pieterh nah, without people who helped make the code, it dies as soon as your evenings and weekends aren't available any more
[11:42] pieterh it's not about leveraging resources but about software that lives past the "free time" of its creator
[11:43] pieterh imho
[11:43] keffo Oh, it's not free time at all, but I need to work for a bit to not starve :)
[11:43] pieterh starving is not pleasant, no
[11:44] keffo Not so much starving, but keeping girlfriend happier :)
[11:45] keffo when this is done, it will take me & the other dude about 2-3 weeks to produce the actual product we'll eventually sell.. -That- part is already planned and so forth..
[11:45] keffo And ones that happens, there is no benefit to keep the code not opensource
[11:46] keffo err, once..
[11:46] pieterh :-) good spellng is hrad somtimes
[11:55] keffo um, this is odd
[11:56] keffo I wonder if lua might freak a little at some weird binary
[13:51] CIA-20 zeromq2: 03Martin Sustrik 07master * r6d4ffd9 10/ (src/fq.cpp src/lb.cpp): Bug in fq_t and lb_t (when used via ZMQ_EVENTS option) fixed - http://bit.ly/cvOPzL
[15:14] CIA-20 zeromq2: 03Martin Sustrik 07master * rf374431 10/ src/pipe.hpp : get rid of 'has virtual functions but non-virtual destructor' warnings in pipe.hpp - http://bit.ly/9Relxm
[15:21] Tasser cremes, it's more about the ruby part you wrote
[15:21] cremes Tasser: i'm around if you have questions
[15:21] Tasser cremes, oh, just asking for the big picture
[15:22] Tasser aka what is where, how stuff flows
[15:22] cremes sure...
[15:22] Tasser and probably write it down into your git
[15:22] Tasser HACKING or something like that :-)
[15:22] cremes whatever i write here, i'll clean up and add to the README
[15:23] cremes ZM::Reactor is a thread that contains a single ZMQ context
[15:23] cremes from this context, you can create any kind of socket
[15:23] cremes (stop me if i'm not answering your question)
[15:24] cremes during socket creation, you pass a ruby object that will act as that socket's handler
[15:24] Tasser callback?
[15:24] cremes the handler should provide on_attach, on_writable and on_readable methods
[15:25] cremes the on_attach method is called right away and lets you set things up (kind of like a constructor)
[15:25] Tasser so why not #new ?
[15:25] cremes the on_readable and on_writable methods are called when the socket is polled for those events and finds them to be true
[15:26] cremes explain what you mean by "not #new"?
[15:27] Tasser create a new instance per socket, so call #new and on that instance #on_writable, #on_readable
[15:29] cremes the handler instance is just a regular ruby class that implements the 3 methods i mentioned
[15:29] cremes it has a constructor (def initialize(*args) nil; end) just like any other class
[15:30] Tasser less abstraction than EM
[15:30] cremes you *could* use one instance of a class to manage multiple sockets; look at the one-handed-ping-pong example
[15:30] Tasser yeah, having that one here atm
[15:30] cremes yeah, EM is kind of confusing with the EM::Connection stuff
[15:31] Tasser meh, gotta go :-(
[15:31] cremes sure
[15:31] cremes i'm usually on irc from 8am to 5pm central standard time (gmt -6, i think)
[15:32] cremes ping me if you have more questions or send them to the 0mq ml
[15:33] bbigras Is there a way to build zeromq with mingw?
[15:33] cremes bbigras: luislavena (rubyinstaller.org guy) has been playing with that
[15:34] cremes he opened an issue on github to fix a problem he encountered
[15:34] cremes so as far as i know he succeeded
[15:34] bbigras cremes: nice, thanks!
[15:35] bbigras cremes: Do you know if anyone had success using zeromq with Qt without having to disable Qt's signal/slot macros?
[15:36] cremes i haven't heard anything about that, so no
[15:36] cremes you might try asking on the 0mq ML
[15:38] bbigras cremes: thanks
[18:17] ModusPwnens hi
[18:17] ModusPwnens Is there anyone in here that has used google protobufs with zeromq? I'm wondering what kind of throughput is normal when using google protobufs
[18:20] cremes ModusPwnens: i recommend you write a small benchmark that serializes/deserializes your data
[18:20] cremes and see what the upper limit is on your message rate
[18:20] cremes then 0mq will have that as its upper limit for throughput
[18:41] ModusPwnens yeah i did this cremes
[18:41] ModusPwnens I am just wondering whether or not my results are expected
[18:42] ModusPwnens Hmm, well actually, I benchmarked it with zeromq too
[18:42] ModusPwnens so it's timing how long it takes to send messages as well
[18:44] cremes ModusPwnens: yeah, take 0mq out of the equation to get an upper bound
[18:44] cremes *then* you can test with 0mq to see what kind of overhead it is introducing
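cremes' advice is to time serialization on its own first, so that the transport can be judged against a known ceiling. A minimal sketch of such a benchmark, using the standard library's `json` module as a stand-in for protobufs (an assumption made so the example is self-contained; the record and the count are illustrative):

```python
import json
import time

def serialization_rate(record, count=20_000):
    """Encode and decode `record` `count` times; return round trips per second."""
    start = time.perf_counter()
    for _ in range(count):
        wire = json.dumps(record).encode("utf-8")
        decoded = json.loads(wire)
    elapsed = time.perf_counter() - start
    # Sanity check that the round trip preserved the data.
    assert decoded == record
    return count / elapsed

sample = {"id": 42, "symbol": "ABC", "price": 101.25, "tags": ["x", "y"]}
rate = serialization_rate(sample)
print(f"{rate:.0f} msg/s with no transport involved")
```

Whatever rate this reports is the most a sender could feed into 0MQ, so any lower figure measured end to end shows the overhead added by the transport and the rest of the pipeline.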
[18:57] ModusPwnens Hmm, another thing, I am having somewhat surprising results with the remote/local throughput tests
[18:58] ModusPwnens I am just using localhost, but I only get around 200 Mb/s throughput, which seems low to me.
[18:59] cremes ModusPwnens: try increasing the message body size from 50 bytes (from the example you posted last week) to something larger
[18:59] cremes also, note that the remote/local tests are doing a ping pong with REQ/REP sockets
[19:00] cremes you could see higher throughput on a PUB socket
[19:01] ModusPwnens Yeah, I have tried with larger message sizes. 5000 byte and 2500 count messages
[19:01] ModusPwnens is 200mb/s normal on localhost? I would have thought it would be much much faster
[19:01] cremes and what did you see?
[19:01] cremes with the varying message sizes...?
[19:02] ModusPwnens I see about 200mb/s or 5000 messages/s
[19:02] cremes so you see the same throughput regardless of message size?
[19:03] ModusPwnens not really, when the parameters are smaller I see different values
[19:03] ModusPwnens when i said 200 earlier i was using these parameters
[19:03] cremes so what does 200 MB/s represent? the *best* that you see, the *average* or the *worst*?
[19:04] ModusPwnens about the average
[19:04] ModusPwnens i see 180 sometimes, sometimes 220
[19:04] cremes this is on windows, right?
[19:04] ModusPwnens Yeah. I was wondering if it would be faster on linux
[19:05] cremes sometimes; it's hard to draw conclusions because this stuff is so dependent on OS and the hardware
[19:05] ModusPwnens Yeah. I notice that the official results on the zeromq website are ridiculously high
[19:05] ModusPwnens but I'm thinking that's because they have a godly computer with 12 gigs of ram and 4 cores
[19:05] ModusPwnens i only have 1 core on this computer
[19:06] ModusPwnens as well as only 3 gigs of ram
[19:06] icy size of ram does not matter, speed does
[19:06] ModusPwnens I know that has an effect, but I'm not sure how large an effect that would be.
[19:06] cremes that computer is ancient if it has only 1 core; i don't think intel has shipped a 1-core desktop cpu since around 2006
[19:07] ModusPwnens it's actually relatively new
[19:07] ModusPwnens amd sempron m100
[19:07] ModusPwnens which i think has only 1 core
[19:08] cremes amd is behind the curve; sorry :)
[19:09] ModusPwnens Heh, apparently.
[19:09] ModusPwnens Anyways, is that sort of throughput expected?
[19:09] cremes again, it's dependent upon OS and hardware
[19:09] ModusPwnens Hmm. So it's at least not abnormal?
[19:10] cremes if you have nothing to compare it to...?
[19:10] guido_g http://answers.yahoo.com/question/index?qid=20091030200427AAjvYJw <- "That is a pretty decent mobile single core processor."
[19:10] ModusPwnens Ah, well there you go i guess.
[19:11] icy well, what cpu usage do you get while benchmarking?
[19:11] icy maybe it's just really slow ram :)
[19:12] ModusPwnens i get 100% CPU usage
[19:13] ModusPwnens Does that mean the RAM is just slow?
[19:13] Steve-o what parameters are you using, I can compare with a single core Xeon right now
[19:13] ModusPwnens ok
[19:13] ModusPwnens for the cpu usage test I just ran
[19:13] ModusPwnens I used
[19:14] ModusPwnens 5000 byte messages
[19:14] ModusPwnens and 250,000 message count
[19:14] ModusPwnens all on localhost
[19:15] Steve-o I get 39,031 msgs/s and 1561 Mb/s
[19:15] guido_g ./local_thr tcp://127.0.0.1:5000 1024 100000
[19:15] guido_g message size: 1024 [B]
[19:15] guido_g message count: 100000
[19:15] guido_g mean throughput: 381327 [msg/s]
[19:15] guido_g mean throughput: 3123.831 [Mb/s]
[19:15] guido_g also a notebook
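The two throughput figures in each report are tied together by simple arithmetic: messages per second times message size in bytes times 8 bits. A short sketch, assuming 1 Mb means 10^6 bits, which is consistent with the numbers quoted in this log:

```python
def megabits_per_second(msgs_per_second, message_bytes):
    """Convert a message rate and a message size into megabits per second."""
    return msgs_per_second * message_bytes * 8 / 1_000_000

guido_mbps = megabits_per_second(381327, 1024)   # guido_g's local_thr run
steve_mbps = megabits_per_second(39031, 5000)    # Steve-o's single core Xeon
modus_mbps = megabits_per_second(5000, 5000)     # ModusPwnens' reported rate

print(round(guido_mbps, 3), round(steve_mbps, 2), round(modus_mbps, 1))
```

The same conversion makes it easier to compare runs that used different message sizes.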
[19:16] ModusPwnens Hmm, that is definitely much higher than what I am getting.
[19:16] ModusPwnens how much ram do you have and what kind of processor?
[19:16] ModusPwnens guido that is
[19:16] guido_g Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz
[19:17] guido_g ram usage is not a problem at all
[19:17] guido_g it's more a matter of cache and latency
[19:17] ModusPwnens Latency shouldn't really be a problem on localhost though...right?
[19:18] guido_g ram latency
[19:18] guido_g *sigh*
[19:19] icy and bandwidth
[19:19] ModusPwnens Oh..sorry about that, i misunderstood..
[19:19] guido_g message size: 1024 [B]
[19:19] guido_g message count: 10000
[19:19] guido_g mean throughput: 98001 [msg/s]
[19:19] guido_g mean throughput: 802.824 [Mb/s]
[19:20] guido_g via lan
[19:20] guido_g so obviously you're on a dog slow machine
[19:20] icy or windows :P
[19:21] guido_g hrhrhr
[19:22] icy nice webserver choice for zeromq.org
[19:23] ModusPwnens Hmm thanks. I will try to procure another computer to test this on.
[19:23] guido_g icy: you mean wikidot?
[19:24] icy I mean the lighttpd part :)
[19:31] icy hm local_thr does not seem to do anything on my osx box
[19:49] cremes icy: you need to run local_thr and remote_thr as a pair; one is the client and the other is the server
[19:49] icy ah right, thx
[19:53] icy ouf, this thing just sent me 1gb into swap
[19:54] icy sending 1000000 1kb messages
[19:54] icy I guess they get buffered in ram
[19:57] cremes icy: the receiver must not have pulled them off the queue fast enough
[19:57] cremes when i run those tests on my system, memory usage is constant (no queueing)
[19:58] cremes why don't you pastie the arguments you passed to both programs so we can comment
[19:59] icy tcp://127.0.0.1:5000 1024 1000000 for both
[19:59] icy maybe I should start local_thr before remote_thr :)
[20:00] cremes try lowering the 1 million to 10 thousand and monitor the memory size of the programs
[20:00] cremes yeah, start order is important...
[20:01] icy doing that I get it to work even though it hits into swap briefly (understandable, the sender will always be faster)
[20:02] cremes icy: not true; this test is using REQ/REP sockets so it should only have 1 message in flight at any given time
[20:02] cremes one sender should *not* be able to get ahead of the other
[20:03] cremes (i was wrong in my statement from 2:57; no queueing should occur)
[20:03] icy I get 40mb ram usage with 100k messages
[20:03] cremes are you running the C programs or using the samples from another language binding?
[20:04] icy perf/ <- the ones in there which are C I think
[20:05] cremes did you modify the code at all?
[20:05] icy no, downloaded tarball, ./configure, make and ran the apps
[20:05] cremes huh... what OS?
[20:05] icy osx
[20:06] cremes 2.0.9?
[20:06] cremes 0mq, that is
[20:06] icy yea
[20:06] Samy I see a lot of emphasis on lock-free algorithms on the ZeroMQ website.
[20:06] cremes weird
[20:06] cremes it should *not* have unbounded memory growth
[20:06] cremes you should file a bug
[20:06] Samy What lock-free objects does ZeroMQ use? Where would the source-code be for them?
[20:07] Samy The atomics interface seemed too simple to support more complex data structures, though I wasn't looking at the right thing.
[20:07] cremes Samy: check out the y_pipe stuff... i believe that is where the lock-free algorithms are used though sustrik would know better (he wrote it)
[20:07] icy http://singularity.cryosphere.de/pub/remote_thr.png (at this point local_thr is long gone already)
[20:07] Samy cremes, cool. Does sustrik IRC?
[20:08] cremes Samy: yes... he's usually in channel but he isn't here right now
[20:08] Samy Ok, thank you.
[20:08] icy the real mem does not show all the memory it uses as I'm already several hundred mb into swap at the time the screenshot was made
[20:08] cremes icy: that shouldn't be; i would open a bug and describe the problem
[20:09] cremes make sure to include 0mq release, OS, OS release, etc
[20:09] icy k
[20:10] cremes icy: hold on a sec...
[20:11] cremes were you doing local_thr or local_lat as your perf test?
[20:14] icy thr
[20:16] cremes the local_thr/remote_thr examples don't make any sense
[20:17] cremes the remote_thr program is using a REQ socket while the local_thr is using a SUB socket
[20:17] cremes the two are not compatible
[20:18] icy I'm totally new to zeromq, just saw this benchmark app in the src dir and thought I'd give it a go :)
[20:18] cremes nm... i'm looking at the wrong stuff
[20:18] cremes argh...
[20:19] cremes okay, so remote_thr uses a PUB socket while local_thr uses a SUB
[20:19] cremes that is correct
[20:19] cremes you need to start local_thr first
[20:19] cremes remote_thr will slam your system by publishing as fast as possible, so there *will* be queueing
[20:20] cremes (i was thinking of the local_lat/remote_lat examples which uses different socket types)
[20:20] icy I did start local_thr first and even after that execited, remote_thr was allocating ram
[20:21] icy s/execited/exited/
[20:21] cremes yeah, i'm looking at it now...
[20:23] cremes if everything is working correctly, remote_thr should exit *first*
[20:55] dermoth I'm wondering if there's an easy way to monitor a queue size on a zeromq worker... It doesn't seems like there is anything in the API
[20:55] dermoth on a zeromq broker I mean
[20:55] dermoth i.e. a device
[21:02] ModusPwnens whoa
[21:02] ModusPwnens im getting a strange error with the benchmarking tests now
[21:04] dermoth my concern is that the PUSH workers may send more messages than can be processed by the PULL workers, eventually filling up the queue. This would be possible if the PULL workers sync their state to disk to avoid data loss...
[21:12] ModusPwnens hey does zeroMQ allocate memory for messages all at once?
[21:13] ModusPwnens Or maybe it is because I am using the publish subscribe topology so it just continuously creates messages and the receiving end is not fast enough..
[22:23] cremes ModusPwnens: yes to the second thing you said; the publisher outpaces the subscriber
[22:23] cremes dermoth: no, there is no way to fetch the queue size; check the mailing list for the reasons why
[22:23] cremes that topic has been raised and answered a bunch of times (someone should add it to the FAQ)
[22:24] cremes dermoth: also, check out HWM (high water mark) settings for the PUSH sockets
[22:24] cremes by setting HWM, it will block when the queue hits that message level (or return EAGAIN if you try sending with ZMQ_NOBLOCK)
[22:26] dermoth yes, but I need to know before i'll be blocking... I guess I could test the latency though
[22:26] cremes dermoth: what do you mean that you need to know before?
[22:26] cremes and what does latency have to do with it?
[22:27] dermoth well, if my queues start filling up at peak times I want to react before all pushers block... my primary goal in using zmq is to avoid blocking
[22:28] cremes ok, then use send with ZMQ_NOBLOCK and test for EAGAIN; when you get it then you know you have hit your high water mark
[22:28] cremes and you can take whatever action is necessary
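The pattern cremes describes is a bounded queue plus a non-blocking send that reports "full" instead of waiting. The sketch below is only a standard-library analogy, not the 0MQ API: `queue.Queue(maxsize=...)` stands in for a socket with a high water mark, and `queue.Full` stands in for EAGAIN. The limit of 3 and the message names are illustrative.

```python
import queue

HWM = 3                                # hypothetical high water mark
outbound = queue.Queue(maxsize=HWM)    # stands in for the socket's send queue

def try_send(message):
    """Return True if queued, False if the queue is at its limit (the EAGAIN case)."""
    try:
        outbound.put_nowait(message)
        return True
    except queue.Full:
        # The caller is never blocked; it decides what to do: log, alert, drop, retry.
        return False

# Nothing drains the queue here, so sends start failing once the limit is hit.
results = [try_send(f"event-{n}") for n in range(5)]
print(results)
```

The sender always returns immediately, and the first `False` is the signal that the consumers are not keeping up.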
[22:29] dermoth cremes, if the queue fills up on the device, then there will be some latency between the time I push to the queue and the time my worker gets the event. I can push a special message and have whichever worker gets it respond to me, then I know the latency. if it rises then I need more workers downstream
[22:29] cremes ok
[22:30] cremes you could also have another pair of sockets where each worker tells the server/pusher that it has received a message
[22:30] cremes using this "out of band" communication, you could publish *only* those messages that can be immediately handled by a worker
[22:31] cremes you would have at most 1 message in someone's queue because you would not push another one until each one had been acknowledged
[22:31] cremes that seems better than trying to rely on some weird latency calculation that might not be trustworthy
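The acknowledgement scheme cremes outlines can be sketched as a small bookkeeping class: a message is handed to a worker only when that worker has nothing outstanding, and an ack makes the worker eligible again. The class and worker names are hypothetical, and the sockets are omitted so that only the accounting is shown.

```python
from collections import deque

class CreditedSender:
    """Hands out at most one unacknowledged message per worker."""

    def __init__(self, workers):
        self.ready = deque(workers)   # workers with no outstanding message
        self.backlog = deque()        # messages waiting for a free worker
        self.sent = []                # (worker, message) pairs, in send order

    def submit(self, message):
        self.backlog.append(message)
        self._drain()

    def ack(self, worker):
        # The worker reported over the side channel that it is done.
        self.ready.append(worker)
        self._drain()

    def _drain(self):
        while self.backlog and self.ready:
            self.sent.append((self.ready.popleft(), self.backlog.popleft()))

sender = CreditedSender(["w1", "w2"])
for message in ("a", "b", "c"):
    sender.submit(message)
backlog_before_ack = len(sender.backlog)   # "c" waits: both workers are busy
sender.ack("w2")                           # w2 finishes, so "c" goes to w2
print(sender.sent, backlog_before_ack)
```

The length of `backlog` is then a direct measure of how far the workers are behind, which is the early warning dermoth asks for.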
[22:31] dermoth i'll implement XREQ/XREP for things where I need to make sure a worker is getting it... the rest is high-throughput stuff that can suffer a small percentage loss...
[22:32] cremes definitely take a look at HWM
[22:32] cremes i think it does what you need
[22:33] dermoth the point is that I don't want to block on the sending side... HWM will be useful in logging error conditions, but I should never end up hitting this limit...
[22:34] cremes dermoth: sorry, but your requirements don't make sense to me
[22:34] cremes if you could get the queue length, you would probably prevent your pusher from sending more messages if it hit some threshold, right?
[22:35] cremes if so, then this is exactly what you can do with HWM
[22:35] cremes couple HWM with ZMQ_NOBLOCK and you'll get your "signal" that there aren't enough workers
[22:35] dermoth well I don't really control the pusher... it logs as data comes it. I need to have enough pullers to absorb the data as it comes in
[22:35] cremes assuming you ever hit the HWM
[22:36] cremes so you don't have any control over throttling the pusher?
[22:37] dermoth because if the pushers block, it's not going to work anymore and I will lose data - the point is that the pushers have to be non-blocking. But i'll figure out something... maybe even measuring the rss size of the process might work.
[22:37] cremes dermoth: let me say it again then.... use send with ZMQ_NOBLOCK and test for EAGAIN
[22:37] cremes this will NOT BLOCK
[22:38] dermoth yes I get that ;)
[22:38] cremes so what's the problem then? ;)
[22:39] dermoth what I mean, EAGAIN is bad too - I will log it & alert on it, but I want to know before that happens... If I start getting latency spikes when I turn down downstream workers, or during peaks, even before filling up the queues all the way up to the pushers, I want to know it. But that's fine, I'll work with what I have ;)
[22:40] cremes ok, i see
[22:40] cremes let us know what you come up with or add your solution to the wiki
[22:41] dermoth Sure. btw, is having multiple parallel devices working well? i.e. if one crashes will it nicely fall back to the other one? Will my XREQ/XREP suffer any latency? More than 2-3 seconds avg per request may become problematic
[22:42] dermoth here actually on the ends it's more like a REQ/REP socket, but the device will be XREQ/XREP to pass & load-balance the messages
[22:43] cremes a device is just a fancy packaging for 2 sockets to aid with load balancing
[22:43] cremes if you are worried about a device crashing, then messages in flight could be lost
[22:43] dermoth i'll probably be running some tests anyway... Thanks
[22:44] cremes if you can't handle data loss, then you need to add some code around all of this to ack/nak messages and retry when things timeout or disappear
[22:44] cremes none of that is built in to 0mq; you have to build it on top
[22:44] dermoth yes... a server crashing is pretty rare, what i'm concerned about is if the system will keep on working. the REQ/REP are for where I can't handle loss and I will lose the request if the device crashes with my message
[22:45] cremes and unless you have a really slow link or are sending *very* large messages, i can't imagine it would ever take 2-3 seconds to send a message
[22:45] cremes if you use XREQ/XREP and do explicit acks, then you could have things continue to work
[22:46] cremes if you use REQ/REP sockets, they enforce a very strict send/recv/send/recv pattern, so a crashed peer would not be recoverable
[22:46] dermoth well, i'm thinking possible tcp timeouts, etc. I would assume ZeroMQ is able to handle well peers, but I will test it anyway. Thanks!
[22:46] cremes ok, good luck
[22:46] dermoth to handle well down peers
[22:47] cremes dermoth: TCP timeouts are not exposed to you via the 0mq api
[22:48] cremes 0mq may see a connection disappear at the tcp level, but that doesn't mean that the socket is no good
[22:48] cremes because a 0mq socket can be bound or connected to multiple endpoints
[22:48] cremes one endpoint failure does not cause the whole 0mq socket to fail
[22:49] cremes the 0mq socket will continue to load balance messages to any "surviving" endpoints
[22:51] dermoth ok, sounds good. FWIW, what I have in mind to monitor PUSH/PULL latency is to send a special message with an IP/port to reply with, and have the workers respond with a udp packet. I can release the check as a Nagios plugin like I usually do for my monitoring scripts, but obviously the other end is up to the developer to get right
[22:53] cremes sounds neat