[Time] Name | Message |
[05:57] lestrrat
|
I'm having problems using fork() and making my parent process talk to the child processes via 0mq. I'm getting segfaults after the child exits and while the parent is in recv()
|
[05:57] lestrrat
|
is this supposed to work?
|
[06:55] pieterh
|
sustrik: you there?
|
[09:51] pieterh
|
anyone here felt that XREP and XREQ could use better names?
|
[09:52] lestrrat
|
I just gave a talk to my coworkers about zeromq, and yes, better_names++
|
[09:52] pieterh
|
I was thinking of ROUTE and FORWARD
|
[09:53] pieterh
|
XREP creates routing envelopes around incoming messages and uses these on output to route replies back to original clients
|
[09:54] pieterh
|
XREQ just forwards messages in both directions without touching them
|
[09:55] pieterh
|
XREP really looks like a router... it's the only socket type that lets you address specific connections
|
[09:55] keffo
|
I like 'route' very much, not forward so much
|
[09:55] pieterh
|
yeah, forward wasn't inspired
|
[09:55] pieterh
|
it has to be a verb
|
[09:56] pieterh
|
that says "move stuff in both directions but don't mess with it"
|
[09:56] pieterh
|
PORT
|
[09:56] pieterh
|
XFER
|
[10:11] pieterh
|
keffo: here's a thought: http://www.zeromq.org/sandbox:mudem
|
[10:11] pieterh
|
lestrrat: does that ring a bell?
|
[10:13] lestrrat
|
thinking
|
[10:15] lestrrat
|
hmm. I grok ROUTE, but not the mudem part :) but I don't have a great alternative plan either
|
[10:15] lestrrat
|
naming is hard, eh.
|
[10:17] pieterh
|
well, mudem is a play on modem, modulator/demodulator...
|
[10:17] keffo
|
pieterh, I have a 'route-codec' in my code...
|
[10:17] lestrrat
|
yeah, I know
|
[10:17] pieterh
|
i don't like invented words but if one has to invent them they should be expressive
|
[10:18] keffo
|
which encodes/decodes routes for an xreq
|
[10:18] pieterh
|
keffo: sounds right
|
[10:18] pieterh
|
creates and uses envelopes, right?
|
[10:20] keffo
|
it just handles them, contains a payload the 'enduser' is interested in, and also has a sendroute function
|
[10:20] pieterh
|
well, an alternative to mudem: dispatch
|
[10:21] keffo
|
naa, someone who implements any type of load balancing of messages is in essence a dispatcher, imo
|
[10:21] pieterh
|
true
|
[10:22] pieterh
|
it's the combo of fanout and fanin
|
[10:23] pieterh
|
i think "multiplex" is wrong since that suggest copying whereas its distribution
|
[10:23] pieterh
|
*it's
|
[10:23] lestrrat
|
yeah, I thought about multiplex, but it didn't quite fit
|
[10:24] pieterh
|
in terms of use cases, xreq is like push+pull, it ventilates and sinks at once
|
[10:25] pieterh
|
one could create a nice pipeline pattern using just XREQ to XREQ
|
[10:27] pieterh
|
how about... something more visual... 1TON
|
[10:27] lestrrat
|
1000kg!
|
[10:28] pieterh
|
yeah
|
[10:28] pieterh
|
1-to-N for the pedantic of us
|
[10:30] pieterh
|
http://www.zeromq.org/sandbox:1ton
|
[10:31] pieterh
|
it kind of feels more like a building block now
|
[10:42] keffo
|
I have an issue where my worker process simply disappears, but I can't seem to trap it
|
[10:42] keffo
|
no exceptions, no atexits are run.. nada..
|
[10:42] pieterh
|
what OS?
|
[10:42] keffo
|
very annoying!
|
[10:43] keffo
|
win7
|
[10:43] pieterh
|
ah, that is a known problem
|
[10:43] keffo
|
hu?
|
[10:43] pieterh
|
the usual solution is to upgrade to Linux
|
[10:43] pieterh
|
sorry :-)
|
[10:43] keffo
|
caused by zmq??
|
[10:44] keffo
|
tossing away 98% of the global userbase is hardly an upgrade btw :)
|
[10:44] pieterh
|
I was kidding, my bad
|
[10:44] pieterh
|
what language are you using?
|
[10:44] keffo
|
c++, lua
|
[10:44] pieterh
|
so you need a debug build of 0MQ IMO
|
[10:44] keffo
|
oh it's all debug, debugger is attached too :)
|
[10:44] pieterh
|
aw :-(
|
[10:45] keffo
|
gives me nothing.. I've tried all routes I can think of
|
[10:45] keffo
|
abort()?
|
[10:45] keffo
|
but why would that be called?
|
[10:45] pieterh
|
DebugBreak() afair, then continue it in the debugger
|
[10:45] pieterh
|
... assertion failure?
|
[10:45] keffo
|
nada :)
|
[10:46] keffo
|
no asserts, no breakpoints, simply disappears.. windows event log shows nothing
|
[10:46] keffo
|
puzzling actually
|
[10:46] keffo
|
no exceptions are raised either
|
[10:46] keffo
|
It's as if the app cleanly exits, except it can't since it's a while(true)
|
[10:46] pieterh
|
it could exit in another thread I guess
|
[10:47] pieterh
|
i've not worked on win32 for ages... maybe someone else here can be more helpful
|
[10:47] keffo
|
does zmq ever call exit?
|
[10:47] keffo
|
(or abort)
|
[10:48] pieterh
|
nada
|
[10:48] pieterh
|
asserts, yes
|
[10:48] pieterh
|
98%? keffo, 2010 is the Year of Linux
|
[10:48] pieterh
|
it's no more than 97.85% by now
|
[10:49] keffo
|
if my mom could use any generic desktop linux without calling me, then I'd agree :)
|
[10:50] pieterh
|
hah, my mum actually does use linux and has for years...
|
[10:51] pieterh
|
but then again she's currently asking me how to hide her IP address so she can troll Anonymous so perhaps she's not typical...
|
[10:51] pieterh
|
keffo: if you can make a reproducible case, and chop it down, maybe we can reproduce it on another platform
|
[10:52] keffo
|
lord no, that would take ages :)
|
[10:52] keffo
|
I just want to somehow detect -when- it happens, and go from there, but so far I've been unable to
|
[10:52] pieterh
|
then, my friend, you might have to resort to...
|
[10:53] pieterh
|
if really you have no other option...
|
[10:53] keffo
|
print? =)
|
[10:53] pieterh
|
yeah :-)
|
[10:53] pieterh
|
don't forget the fflush (stdout);
|
[10:54] keffo
|
It's remarkably reproducible though
|
[10:55] pieterh
|
well, that's always good
|
[10:55] keffo
|
4th time I calculate pi, it disappears
|
[10:55] pieterh
|
hopefully it remains stable as you add hundreds of prints
|
[10:55] pieterh
|
you're calculating pi?
|
[10:57] keffo
|
printing is not the problem, I generate ~250k of logs on each run :)
|
[10:57] keffo
|
pi yeah, easy and verifiable thing to calc distributed :)
|
[10:58] pieterh
|
is there an algo for distributed pi calculation somewhere?
|
[10:59] keffo
|
sure
|
[10:59] keffo
|
tons of different I guess
|
[10:59] pieterh
|
i had an idea for a supermassive 0MQ project... lol
|
[10:59] pieterh
|
not original but who cares...
|
[10:59] keffo
|
as did I :)
|
[10:59] keffo
|
for i=self.beginspan, self.endspan do
|
[10:59] keffo
|
localpi = localpi + (1.0 / (i * 4.0 + 1.0) )
|
[10:59] keffo
|
localpi = localpi - (1.0 / (i * 4.0 + 3.0) )
|
[11:00] keffo
|
do that for each subspan of some arbitrary length, then sum them all up and the answer * 4 is pi :)
|
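A minimal sketch of the span-splitting idea keffo describes, written in Python rather than Lua. The function names, the span length and the term count are illustrative assumptions, and the spans run in sequence here instead of being handed to separate workers.

```python
import math


def partial_sum(begin: int, end: int) -> float:
    """Sum the series terms for i in [begin, end], inclusive like the Lua loop."""
    local_pi = 0.0
    for i in range(begin, end + 1):
        local_pi += 1.0 / (i * 4.0 + 1.0)
        local_pi -= 1.0 / (i * 4.0 + 3.0)
    return local_pi


def split_spans(total_terms: int, span_length: int):
    """Yield (begin, end) pairs covering 0..total_terms-1 without overlap."""
    for begin in range(0, total_terms, span_length):
        yield begin, min(begin + span_length, total_terms) - 1


def estimate_pi(total_terms: int = 200_000, span_length: int = 25_000) -> float:
    """Sum every span's partial result and multiply by 4."""
    # Each span is independent, so each could go to a different worker.
    spans = split_spans(total_terms, span_length)
    return 4.0 * sum(partial_sum(begin, end) for begin, end in spans)


if __name__ == "__main__":
    print(estimate_pi(), math.pi)
```

Because the partial sums are independent, the order in which workers return them does not matter; only the final sum does.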
[11:00] pieterh
|
not 42? weird...
|
[11:00] keffo
|
hehe
|
[11:00] pieterh
|
aight, so if we have a server somewhere that distributes workloads, and a simple 0MQ client that accepts them...
|
[11:01] pieterh
|
has surely been done dozens of times
|
[11:02] keffo
|
I'm doing a fairly more complicated scenario, but yeah
|
[11:05] keffo
|
and I guess you can figure out why I want something a bit more flexible than round-robin :)
|
[11:06] pieterh
|
right now I'm writing examples on how to use XREP to do routing
|
[11:06] keffo
|
good, it was messy :)
|
[11:06] pieterh
|
how so?
|
[11:06] pieterh
|
you mean no documentation on the envelopes etc.?
|
[11:06] keffo
|
just not very nicely explained
|
[11:06] pieterh
|
right...
|
[11:13] keffo
|
I would explain it as a stack+req...
|
[11:13] keffo
|
push, push, push, payload, then pop,pop,pop, payload on the other side
|
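The "stack + payload" picture can be sketched with plain lists of frames. This is only an illustration of the push/pop idea, not 0MQ's wire format: the helper names, the hop identities and the use of an empty frame as the delimiter between envelope and payload are assumptions made for the example.

```python
from typing import List, Tuple

DELIMITER = b""  # assumed empty frame separating the envelope from the payload


def push_identity(frames: List[bytes], identity: bytes) -> List[bytes]:
    """Request path: a routing hop pushes the sender's identity on top."""
    return [identity] + frames


def pop_identity(frames: List[bytes]) -> Tuple[bytes, List[bytes]]:
    """Reply path: a routing hop pops the top identity to pick the next peer."""
    return frames[0], frames[1:]


def make_reply(request: List[bytes], reply_payload: bytes) -> List[bytes]:
    """Keep the request's envelope untouched and swap in the reply payload."""
    envelope = request[: request.index(DELIMITER)]
    return envelope + [DELIMITER, reply_payload]


def round_trip(hops: List[bytes], payload: bytes, reply_payload: bytes):
    """Push one identity per hop, build the reply, then pop back through the hops."""
    frames = [DELIMITER, payload]
    for hop in hops:
        frames = push_identity(frames, hop)
    reply = make_reply(frames, reply_payload)
    visited = []
    for _ in hops:
        identity, reply = pop_identity(reply)
        visited.append(identity)
    return frames, visited, reply
```

The reply visits the hops in the reverse of the order in which they were pushed, which is what lets each routing socket hand the message back to the connection it came from.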
[11:14] keffo
|
btw, what happens in a queue device if a client never reconnects? will the msg linger indef.?
|
[11:21] keffo
|
btw, what happens in a queue device if a client never reconnects? will the msg linger indef.?
|
[11:21] pieterh
|
hmm, you mean a reply?
|
[11:21] pieterh
|
with or without identity?
|
[11:22] pieterh
|
this is what 0MQ/2.1 is fixing
|
[11:22] pieterh
|
it will wait in some cases, discard in other cases
|
[11:22] keffo
|
in general, the whole reconnect business
|
[11:22] keffo
|
if it goes into the queue but never out, what happens to it?
|
[11:23] pieterh
|
well, the queue is per socket, eventually
|
[11:24] pieterh
|
there is not yet a proper explanation of how the 2.1 socket close semantics should work
|
[11:24] pieterh
|
afaik
|
[11:24] keffo
|
client(100msgs) -> queuedev -> service, then back again, except the client is gone forever..
|
[11:24] pieterh
|
anonymous clients -> messages get thrown away
|
[11:25] pieterh
|
client with identity -> messages persist as long as service is running
|
[11:25] keffo
|
ok
|
[11:25] pieterh
|
0MQ does have the concept of a connection going away
|
[11:26] pieterh
|
otherwise PUB sockets for example would end up with horrid resource leaks
|
[11:26] keffo
|
I need to introduce some sort of session.. if I have (known)client A, does a bunch of test junk(like pi), but aborts prematurely, but then reconnects to start some other type of job, I dont want to receive a bunch of old pi results :)
|
[11:27] keffo
|
and I would need to be able to tell all parties involved to dump everything related to an "old" session as well
|
[11:28] pieterh
|
keffo: this starts to be industrial design work
|
[11:29] keffo
|
pieterh, what do you mean?
|
[11:29] pieterh
|
i mean, what you're making is heavy duty...
|
[11:30] keffo
|
oh very much :)
|
[11:30] keffo
|
it has fried my brain on many occasions.. Tons of papers of diagrams spread all over the place :)
|
[11:30] pieterh
|
if you have budget to throw at it, i can recommend an industrial 0MQ designer like Mato here
|
[11:31] keffo
|
the bulk of the work is not the transport & topology though
|
[11:31] keffo
|
although that needs to obviously be stable
|
[11:31] pieterh
|
well, you need an infrastructure that understands 'sessions'
|
[11:32] keffo
|
Sure, but that's already handled
|
[11:32] pieterh
|
what do you still need then?
|
[11:32] pieterh
|
apart from the thing not crashing...
|
[11:32] keffo
|
hehe
|
[11:32] keffo
|
lingering data trying to reconnect for one.
|
[11:33] keffo
|
oh and more liberal means of implementing loadbalancing, but I've made that point already :)
|
[11:34] pieterh
|
well, load balancing using XREP routing is pretty clear, and will be nicely explained in Ch3 of the Guide
|
[11:34] keffo
|
will be? =)
|
[11:34] pieterh
|
is in progress if I was not chatting here :-)
|
[11:34] pieterh
|
anything to do with maintaining overall state is a different kettle of chicken, though
|
[11:34] keffo
|
But I think I know what that will say by now :)
|
[11:35] pieterh
|
hopefully, yeah
|
[11:35] keffo
|
what I'm doing is something I've thought about for years though, so it should work :)
|
[11:35] keffo
|
zmq solved a big gaping questionmark though :)
|
[11:36] keffo
|
I might be getting a job soon though, so dev on this will sadly be sidetracked to weekends and evenings only though :/
|
[11:37] pieterh
|
is it open source?
|
[11:38] keffo
|
it might be eventually!
|
[11:38] keffo
|
would benefit the lua community I guess
|
[11:38] pieterh
|
well... i've learned two relevant things here having done software for way too long
|
[11:39] pieterh
|
a. if it's not open source it will die
|
[11:39] pieterh
|
b. if you don't start as open source you can't make it work afterwards
|
[11:40] keffo
|
I dont agree, and I've never done anything other than software :)
|
[11:40] pieterh
|
cause it's not about building code but about building community...
|
[11:40] pieterh
|
good luck, anyhow
|
[11:41] keffo
|
I wasnt thinking of opensource as in leveraging resources, but simply aiding someone else, when I'm done with it :)
|
[11:41] pieterh
|
nah, without people who helped make the code, it dies as soon as your evenings and weekends aren't available any more
|
[11:42] pieterh
|
it's not about leveraging resources but about software that lives past the "free time" of its creator
|
[11:43] pieterh
|
imho
|
[11:43] keffo
|
Oh, it's not free time at all, but I need to work for a bit to not starve :)
|
[11:43] pieterh
|
starving is not pleasant, no
|
[11:44] keffo
|
Not so much starving, but keeping girlfriend happier :)
|
[11:45] keffo
|
when this is done, it will take me & the other dude about 2-3 weeks to produce the actual product we'll eventually sell.. -That- part is already planned and so forth..
|
[11:45] keffo
|
And ones that happens, there is no benefit to keep the code not opensource
|
[11:46] keffo
|
err, once..
|
[11:46] pieterh
|
:-) good spellng is hrad somtimes
|
[11:55] keffo
|
um, this is odd
|
[11:56] keffo
|
I wonder if lua might freak a little at some weird binary
|
[13:51] CIA-20
|
zeromq2: Martin Sustrik master * r6d4ffd9 / (src/fq.cpp src/lb.cpp): Bug in fq_t and lb_t (when used via ZMQ_EVENTS option) fixed - http://bit.ly/cvOPzL
|
[15:14] CIA-20
|
zeromq2: Martin Sustrik master * rf374431 / src/pipe.hpp : get rid of 'has virtual functions but non-virtual destructor' warnings in pipe.hpp - http://bit.ly/9Relxm
|
[15:21] Tasser
|
cremes, it's more about the ruby part you wrote
|
[15:21] cremes
|
Tasser: i'm around if you have questions
|
[15:21] Tasser
|
cremes, oh, just asking for the big picture
|
[15:22] Tasser
|
aka what is where, how stuff flows
|
[15:22] cremes
|
sure...
|
[15:22] Tasser
|
and probably write it down into your git
|
[15:22] Tasser
|
HACKING or something like that :-)
|
[15:22] cremes
|
whatever i write here, i'll clean up and add to the README
|
[15:23] cremes
|
ZM::Reactor is a thread that contains a single ZMQ context
|
[15:23] cremes
|
from this context, you can create any kind of socket
|
[15:23] cremes
|
(stop me if i'm not answering your question)
|
[15:24] cremes
|
during socket creation, you pass a ruby object that will act as that socket's handler
|
[15:24] Tasser
|
callback?
|
[15:24] cremes
|
the handler should provide on_attach, on_writable and on_readable methods
|
[15:25] cremes
|
the on_attach method is called right away and lets you set things up (kind of like a constructor)
|
[15:25] Tasser
|
so why not #new ?
|
[15:25] cremes
|
the on_readable and on_writable methods are called when the socket is polled for those events and finds them to be true
|
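The handler contract described above (on_attach once, then on_readable / on_writable when polling reports the event) can be sketched in Python with the standard selectors module. This is an analogy using ordinary OS sockets, not the Ruby library's actual API; MiniReactor, RecordingHandler and demo are hypothetical names.

```python
import selectors
import socket


class RecordingHandler:
    """Hypothetical handler that records which callbacks fired."""

    def __init__(self):
        self.events = []
        self.received = b""

    def on_attach(self, sock):
        # Called once, right after registration; a place for setup work.
        self.events.append("attach")

    def on_readable(self, sock):
        self.received += sock.recv(4096)
        self.events.append("readable")

    def on_writable(self, sock):
        self.events.append("writable")


class MiniReactor:
    """Polls registered sockets and forwards events to their handlers."""

    def __init__(self):
        self.selector = selectors.DefaultSelector()

    def attach(self, sock, handler, events=selectors.EVENT_READ):
        self.selector.register(sock, events, data=handler)
        handler.on_attach(sock)

    def poll_once(self, timeout=1.0):
        for key, mask in self.selector.select(timeout):
            if mask & selectors.EVENT_READ:
                key.data.on_readable(key.fileobj)
            if mask & selectors.EVENT_WRITE:
                key.data.on_writable(key.fileobj)

    def close(self):
        self.selector.close()


def demo():
    """Attach one handler, send it four bytes, and poll once."""
    left, right = socket.socketpair()
    reactor = MiniReactor()
    handler = RecordingHandler()
    try:
        reactor.attach(left, handler)  # read events only in this demo
        right.sendall(b"ping")
        reactor.poll_once()
    finally:
        reactor.close()
        left.close()
        right.close()
    return handler
```

Nothing stops one handler instance from being attached to several sockets, which matches the remark that a single object could manage more than one socket.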
[15:26] cremes
|
explain what you mean by "not #new"?
|
[15:27] Tasser
|
create a new instance per socket, so call #new and on that instance #on_writable, #on_readable
|
[15:29] cremes
|
the handler instance is just a regular ruby class that implements the 3 methods i mentioned
|
[15:29] cremes
|
it has a constructor (def initialize(*args) nil; end) just like any other class
|
[15:30] Tasser
|
less abstraction than EM
|
[15:30] cremes
|
you *could* use one instance of a class to manage multiple sockets; look at the one-handed-ping-pong example
|
[15:30] Tasser
|
yeah, having that one here atm
|
[15:30] cremes
|
yeah, EM is kind of confusing with the EM::Connection stuff
|
[15:31] Tasser
|
meh, gotta go :-(
|
[15:31] cremes
|
sure
|
[15:31] cremes
|
i'm usually on irc from 8am to 5pm central standard time (gmt -6, i think)
|
[15:32] cremes
|
ping me if you have more questions or send them to the 0mq ml
|
[15:33] bbigras
|
Is there a way to build zeromq with mingw?
|
[15:33] cremes
|
bbigras: luislavena (rubyinstaller.org guy) has been playing with that
|
[15:34] cremes
|
he opened an issue on github to fix a problem he encountered
|
[15:34] cremes
|
so as far as i know he succeeded
|
[15:34] bbigras
|
cremes: nice, thanks!
|
[15:35] bbigras
|
cremes: Do you know if anyone had success using zeromq with Qt without having to disable Qt's signal/slot macros?
|
[15:36] cremes
|
i haven't heard anything about that, so no
|
[15:36] cremes
|
you might try asking on the 0mq ML
|
[15:38] bbigras
|
cremes: thanks
|
[18:17] ModusPwnens
|
hi
|
[18:17] ModusPwnens
|
Is there anyone in here that has used google protobufs with zeromq? I'm wondering what kind of throughput is normal when using google protobufs
|
[18:20] cremes
|
ModusPwnens: i recommend you write a small benchmark that serializes/deserializes your data
|
[18:20] cremes
|
and see what the upper limit is on your message rate
|
[18:20] cremes
|
then 0mq will have that as its upper limit for throughput
|
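A small sketch of the kind of benchmark cremes suggests: time the encode/decode step on its own, with no transport involved. The json module stands in for protobuf here purely so the example is self-contained; the sample message and the function name are assumptions.

```python
import json
import time


def codec_rate(message: dict, count: int = 20_000) -> float:
    """Return encode+decode round trips per second, with no sockets involved."""
    start = time.perf_counter()
    for _ in range(count):
        wire = json.dumps(message).encode("utf-8")  # serialize
        json.loads(wire.decode("utf-8"))            # deserialize
    elapsed = time.perf_counter() - start
    return count / elapsed


# Hypothetical message shape; use one that resembles the real payload.
sample = {"id": 42, "symbol": "ZMQ", "values": list(range(20))}

if __name__ == "__main__":
    print(f"{codec_rate(sample):.0f} round trips per second")
```

Whatever rate this reports is a ceiling: adding a socket to the loop can only bring the message rate down from there.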
[18:41] ModusPwnens
|
yeah i did this cremes
|
[18:41] ModusPwnens
|
I am just wondering whether or not my results are expected
|
[18:42] ModusPwnens
|
Hmm, well actually, I benchmarked it with zeromq too
|
[18:42] ModusPwnens
|
so it's timing how long it takes to send messages as well
|
[18:44] cremes
|
ModusPwnens: yeah, take 0mq out of the equation to get an upper bound
|
[18:44] cremes
|
*then* you can test with 0mq to see what kind of overhead it is introducing
|
[18:57] ModusPwnens
|
Hmm, another thing, I am having somewhat surprising results with the remote/local throughput tests
|
[18:58] ModusPwnens
|
I am just using localhost, but I only get around 200 Mb/s throughput, which seems low to me.
|
[18:59] cremes
|
ModusPwnens: try increasing the message body size from 50 bytes (from the example you posted last week) to something larger
|
[18:59] cremes
|
also, note that the remote/local tests are doing a ping pong with REQ/REP sockets
|
[19:00] cremes
|
you could see higher throughput on a PUB socket
|
[19:01] ModusPwnens
|
Yeah, I have tried with larger message sizes. 5000 byte and 2500 count messages
|
[19:01] ModusPwnens
|
is 200mb/s normal on localhost? I would have thought it would be much much faster
|
[19:01] cremes
|
and what did you see?
|
[19:01] cremes
|
with the varying message sizes...?
|
[19:02] ModusPwnens
|
I see about 200mb/s or 5000 messages/s
|
[19:02] cremes
|
so you see the same throughput regardless of message size?
|
[19:03] ModusPwnens
|
not really, when the parameters are smaller I see different values
|
[19:03] ModusPwnens
|
when i said 200 earlier i was using these parameters
|
[19:03] cremes
|
so what does 200 Mb/s represent? the *best* that you see, the *average* or the *worst*?
|
[19:04] ModusPwnens
|
about the average
|
[19:04] ModusPwnens
|
i see 180 sometimes, sometimes 220
|
[19:04] cremes
|
this is on windows, right?
|
[19:04] ModusPwnens
|
Yeah. I was wondering if it would be faster on linux
|
[19:05] cremes
|
sometimes; it's hard to draw conclusions because this stuff is so dependent on OS and the hardware
|
[19:05] ModusPwnens
|
Yeah. I notice that the official results on the zeromq website are ridiculously high
|
[19:05] ModusPwnens
|
but I'm thinking that's because they have a godly computer with 12 gigs of ram and 4 cores
|
[19:05] ModusPwnens
|
i only have 1 core on this computer
|
[19:06] ModusPwnens
|
as well as only 3 gigs of ram
|
[19:06] icy
|
size of ram does not matter, speed does
|
[19:06] ModusPwnens
|
I know that has an effect, but I'm not sure how large an effect that would be.
|
[19:06] cremes
|
that computer is ancient if it has only 1 core; i don't think intel has shipped a 1-core desktop cpu since around 2006
|
[19:07] ModusPwnens
|
it's actually relatively new
|
[19:07] ModusPwnens
|
amd sempron m100
|
[19:07] ModusPwnens
|
which i think has only 1 core
|
[19:08] cremes
|
amd is behind the curve; sorry :)
|
[19:09] ModusPwnens
|
Heh, apparently.
|
[19:09] ModusPwnens
|
Anyways, is that sort of throughput expected?
|
[19:09] cremes
|
again, it's dependent upon OS and hardware
|
[19:09] ModusPwnens
|
Hmm. So it's at least not abnormal?
|
[19:10] cremes
|
if you have nothing to compare it to...?
|
[19:10] guido_g
|
http://answers.yahoo.com/question/index?qid=20091030200427AAjvYJw <- "That is a pretty decent mobile single core processor."
|
[19:10] ModusPwnens
|
Ah, well there you go i guess.
|
[19:11] icy
|
well, what cpu usage do you get while benchmarking?
|
[19:11] icy
|
maybe it's just really slow ram :)
|
[19:12] ModusPwnens
|
i get 100% CPU usage
|
[19:13] ModusPwnens
|
Does that mean the RAM is just slow?
|
[19:13] Steve-o
|
what parameters are you using, I can compare with a single core Xeon right now
|
[19:13] ModusPwnens
|
ok
|
[19:13] ModusPwnens
|
for the cpu usage test I just ran
|
[19:13] ModusPwnens
|
I used
|
[19:14] ModusPwnens
|
5000 byte messages
|
[19:14] ModusPwnens
|
and 250,000 message count
|
[19:14] ModusPwnens
|
all on localhost
|
[19:15] Steve-o
|
I get 39,031 msgs/s and 1561 Mb/s
|
[19:15] guido_g
|
./local_thr tcp://127.0.0.1:5000 1024 100000
|
[19:15] guido_g
|
message size: 1024 [B]
|
[19:15] guido_g
|
message count: 100000
|
[19:15] guido_g
|
mean throughput: 381327 [msg/s]
|
[19:15] guido_g
|
mean throughput: 3123.831 [Mb/s]
|
[19:15] guido_g
|
also a notebook
|
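The figures quoted in this exchange are consistent with a simple conversion: messages per second times message size in bytes times 8, divided by one million. A short sketch, with a function name chosen for the example:

```python
def megabits_per_second(msgs_per_second: float, message_bytes: int) -> float:
    """Convert a message rate and a message size into megabits per second."""
    return msgs_per_second * message_bytes * 8 / 1_000_000


if __name__ == "__main__":
    # (message rate, message size in bytes) pairs taken from the discussion.
    for rate, size in [(5000, 5000), (39031, 5000), (381327, 1024)]:
        print(rate, size, round(megabits_per_second(rate, size), 3))
```

This makes it easy to compare runs that used different message sizes, since the Mb/s number alone hides how many messages were actually moved.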
[19:16] ModusPwnens
|
Hmm, that is definitely much higher than what I am getting.
|
[19:16] ModusPwnens
|
how much ram do you have and what kind of processor?
|
[19:16] ModusPwnens
|
guido that is
|
[19:16] guido_g
|
Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz
|
[19:17] guido_g
|
ram usage is not a problem at all
|
[19:17] guido_g
|
it's more a matter of cache and latency
|
[19:17] ModusPwnens
|
Latency shouldn't really be a problem on localhost though...right?
|
[19:18] guido_g
|
ram latency
|
[19:18] guido_g
|
*sigh*
|
[19:19] icy
|
and bandwidth
|
[19:19] ModusPwnens
|
Oh..sorry about that, i misunderstood..
|
[19:19] guido_g
|
message size: 1024 [B]
|
[19:19] guido_g
|
message count: 10000
|
[19:19] guido_g
|
mean throughput: 98001 [msg/s]
|
[19:19] guido_g
|
mean throughput: 802.824 [Mb/s]
|
[19:20] guido_g
|
via lan
|
[19:20] guido_g
|
so obviously you're on a dog slow machine
|
[19:20] icy
|
or windows :P
|
[19:21] guido_g
|
hrhrhr
|
[19:22] icy
|
nice webserver choice for zeromq.org
|
[19:23] ModusPwnens
|
Hmm thanks. I will try to procure another computer to test this on.
|
[19:23] guido_g
|
icy: you mean wikidot?
|
[19:24] icy
|
I mean the lighttpd part :)
|
[19:31] icy
|
hm local_thr does not seem to do anything on my osx box
|
[19:49] cremes
|
icy: you need to run local_thr and remote_thr as a pair; one is the client and the other is the server
|
[19:49] icy
|
ah right, thx
|
[19:53] icy
|
ouf, this thing just sent me 1gb into swap
|
[19:54] icy
|
sending 1000000 1kb messages
|
[19:54] icy
|
I guess they get buffered in ram
|
[19:57] cremes
|
icy: the receiver must not have pulled them off the queue fast enough
|
[19:57] cremes
|
when i run those tests on my system, memory usage is constant (no queueing)
|
[19:58] cremes
|
why don't you pastie the arguments you passed to both programs so we can comment
|
[19:59] icy
|
tcp://127.0.0.1:5000 1024 1000000 for both
|
[19:59] icy
|
maybe I should start local_thr before remote_thr :)
|
[20:00] cremes
|
try lowering the 1 million to 10 thousand and monitor the memory size of the programs
|
[20:00] cremes
|
yeah, start order is important...
|
[20:01] icy
|
doing that I get it to work even though it hits into swap briefly (understandable, the sender will always be faster)
|
[20:02] cremes
|
icy: not true; this test is using REQ/REP sockets so it should only have 1 message in flight at any given time
|
[20:02] cremes
|
one sender should *not* be able to get ahead of the other
|
[20:03] cremes
|
(i was wrong in my statement from 2:57; no queueing should occur)
|
[20:03] icy
|
I get 40mb ram usage with 100k messages
|
[20:03] cremes
|
are you running the C programs or using the samples from another language binding?
|
[20:04] icy
|
perf/ <- the ones in there which are C I think
|
[20:05] cremes
|
did you modify the code at all?
|
[20:05] icy
|
no, downloaded tarball, ./configure, make and ran the apps
|
[20:05] cremes
|
huh... what OS?
|
[20:05] icy
|
osx
|
[20:06] cremes
|
2.0.9?
|
[20:06] cremes
|
0mq, that is
|
[20:06] icy
|
yea
|
[20:06] Samy
|
I see a lot of emphasis on lock-free algorithms on the ZeroMQ website.
|
[20:06] cremes
|
weird
|
[20:06] cremes
|
it should *not* have unbounded memory growth
|
[20:06] cremes
|
you should file a bug
|
[20:06] Samy
|
What lock-free objects does ZeroMQ use? Where would the source-code be for them?
|
[20:07] Samy
|
The atomics interface seemed too simple to support more complex data structures, though I wasn't looking at the right thing.
|
[20:07] cremes
|
Samy: check out the y_pipe stuff... i believe that is where the lock-free algorithms are used though sustrik would know better (he wrote it)
|
[20:07] icy
|
http://singularity.cryosphere.de/pub/remote_thr.png (at this point local_thr is long gone already)
|
[20:07] Samy
|
cremes, cool. Does sustrik IRC?
|
[20:08] cremes
|
Samy: yes... he's usually in channel but he isn't here right now
|
[20:08] Samy
|
Ok, thank you.
|
[20:08] icy
|
the real mem does not show all the memory it uses as I'm already several hundred mb into swap at the time the screenshot was made
|
[20:08] cremes
|
icy: that shouldn't be; i would open a bug and describe the problem
|
[20:09] cremes
|
make sure to include 0mq release, OS, OS release, etc
|
[20:09] icy
|
k
|
[20:10] cremes
|
icy: hold on a sec...
|
[20:11] cremes
|
were you doing local_thr or local_lat as your perf test?
|
[20:14] icy
|
thr
|
[20:16] cremes
|
the local_thr/remote_thr examples don't make any sense
|
[20:17] cremes
|
the remote_thr program is using a REQ socket while the local_thr is using a SUB socket
|
[20:17] cremes
|
the two are not compatible
|
[20:18] icy
|
I'm totally new to zeromq, just saw this benchmark app in the src dir and thought I'd give it a go :)
|
[20:18] cremes
|
nm... i'm looking at the wrong stuff
|
[20:18] cremes
|
argh...
|
[20:19] cremes
|
okay, so remote_thr uses a PUB socket while local_thr uses a SUB
|
[20:19] cremes
|
that is correct
|
[20:19] cremes
|
you need to start local_thr first
|
[20:19] cremes
|
remote_thr will slam your system by publishing as fast as possible, so there *will* be queueing
|
[20:20] cremes
|
(i was thinking of the local_lat/remote_lat examples which use different socket types)
|
[20:20] icy
|
I did start local_thr first and even after that execited, remote_thr was allocating ram
|
[20:21] icy
|
s/execited/exited/
|
[20:21] cremes
|
yeah, i'm looking at it now...
|
[20:23] cremes
|
if everything is working correctly, remote_thr should exit *first*
|
[20:55] dermoth
|
I'm wondering if there's an easy way to monitor a queue size on a zeromq worker... It doesn't seem like there is anything in the API
|
[20:55] dermoth
|
on a zeromq broker I mean
|
[20:55] dermoth
|
i.e. a device
|
[21:02] ModusPwnens
|
whoa
|
[21:02] ModusPwnens
|
im getting a strange error with the benchmarking tests now
|
[21:04] dermoth
|
my concern is that the PUSH workers may send more messages than can be processed by the PULL workers, eventually filling up the queue. This would be possible if the PULL workers sync their state to disk to avoid data loss...
|
[21:12] ModusPwnens
|
hey does zeroMQ allocate memory for messages all at once?
|
[21:13] ModusPwnens
|
Or maybe it is because I am using the publish subscribe topology so it just continuously creates messages and the receiving end is not fast enough..
|
[22:23] cremes
|
ModusPwnens: yes to the second thing you said; the publisher outpaces the subscriber
|
[22:23] cremes
|
dermoth: no, there is no way to fetch the queue size; check the mailing list for the reasons why
|
[22:23] cremes
|
that topic has been raised and answered a bunch of times (someone should add it to the FAQ)
|
[22:24] cremes
|
dermoth: also, check out HWM (high water mark) settings for the PUSH sockets
|
[22:24] cremes
|
by setting HWM, it will block when the queue hits that message level (or return EAGAIN if you try sending with ZMQ_NOBLOCK)
|
[22:26] dermoth
|
yes, but I need to know before I'll be blocking... I guess I could test the latency though
|
[22:26] cremes
|
dermoth: what do you mean that you need to know before?
|
[22:26] cremes
|
and what does latency have to do with it?
|
[22:27] dermoth
|
well, if my queues start filling up at peak times I want to react before all pushers block... my primary goal in using zmq is to avoid blocking
|
[22:28] cremes
|
ok, then use send with ZMQ_NOBLOCK and test for EAGAIN; when you get it then you know you have hit your high water mark
|
[22:28] cremes
|
and you can take whatever action is necessary
|
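The pattern cremes describes (a bounded queue plus a non-blocking send that reports "would block" instead of waiting) can be illustrated with Python's standard queue module. This is an analogy only and uses no 0MQ calls: queue.Full plays the role of EAGAIN, maxsize plays the role of the high water mark, and the names and the limit of 3 are assumptions.

```python
import queue

HIGH_WATER_MARK = 3  # hypothetical limit, standing in for the socket's HWM


def try_send(outbound: queue.Queue, message: bytes) -> bool:
    """Enqueue without ever blocking; False plays the role of EAGAIN."""
    try:
        outbound.put_nowait(message)
        return True
    except queue.Full:
        return False


def send_burst(count: int):
    """Send a burst with no consumer draining the queue; count the outcomes."""
    outbound = queue.Queue(maxsize=HIGH_WATER_MARK)
    accepted = rejected = 0
    for n in range(count):
        if try_send(outbound, b"event %d" % n):
            accepted += 1
        else:
            rejected += 1  # the place to log, alert, or add workers
    return accepted, rejected
```

The sender never waits: once the limit is reached, every further attempt returns immediately with a failure the caller can act on.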
[22:29] dermoth
|
cremes, if the queue fills up on the device, then there will be some latency between the time I push to the queue and the time my worker gets the event. I can push a special message and have whichever worker gets it respond to me, then I know the latency. if it rises then I need more workers downstream
|
[22:29] cremes
|
ok
|
[22:30] cremes
|
you could also have another pair of sockets where each worker tells the server/pusher that it has received a message
|
[22:30] cremes
|
using this "out of band" communication, you could publish *only* those messages that can be immediately handled by a worker
|
[22:31] cremes
|
you would have at most 1 message in someone's queue because you would not push another one until each one had been acknowledged
|
[22:31] cremes
|
that seems better than trying to rely on some weird latency calculation that might not be trustworthy
|
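A rough sketch of the acknowledgement scheme suggested above, with the sockets left out so that only the bookkeeping remains. CreditedPusher and its method names are hypothetical; in a real setup the returned (worker, message) pairs would be sent on one socket and ack() would be driven by messages arriving on the out-of-band socket.

```python
from collections import deque


class CreditedPusher:
    """Hands out a message only to a worker with no unacknowledged message."""

    def __init__(self, workers):
        self.ready = deque(workers)  # workers currently able to take a message
        self.backlog = deque()       # messages waiting for a free worker
        self.in_flight = {}          # worker -> message awaiting its ack

    def submit(self, message):
        """Queue a message; return the (worker, message) pairs to send now."""
        self.backlog.append(message)
        return self._dispatch()

    def ack(self, worker):
        """Record a worker's acknowledgement; return any newly released sends."""
        self.in_flight.pop(worker)
        self.ready.append(worker)
        return self._dispatch()

    def _dispatch(self):
        sent = []
        while self.backlog and self.ready:
            worker = self.ready.popleft()
            message = self.backlog.popleft()
            self.in_flight[worker] = message
            sent.append((worker, message))
        return sent
```

One side effect of keeping the backlog in the application is that its length is directly observable, which is the kind of early signal dermoth was asking for.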
[22:31] dermoth
|
i'll implement XREQ/XREP for things where I need to make sure a worker is getting it... the rest is high-throughput stuff that can suffer a small percentage loss...
|
[22:32] cremes
|
definitely take a look at HWM
|
[22:32] cremes
|
i think it does what you need
|
[22:33] dermoth
|
the point is that I don't want to block on the sending side... HWM will be useful in logging error conditions, but I should never end up hitting this limit...
|
[22:34] cremes
|
dermoth: sorry, but your requirements don't make sense to me
|
[22:34] cremes
|
if you could get the queue length, you would probably prevent your pusher from sending more messages if it hit some threshold, right?
|
[22:35] cremes
|
if so, then this is exactly what you can do with HWM
|
[22:35] cremes
|
couple HWM with ZMQ_NOBLOCK and you'll get your "signal" that there aren't enough workers
|
[22:35] dermoth
|
well I don't really control the pusher... it logs as data comes in. I need to have enough pullers to absorb the data as it comes in
|
[22:35] cremes
|
assuming you ever hit the HWM
|
[22:36] cremes
|
so you don't have any control over throttling the pusher?
|
[22:37] dermoth
|
because if the pushers block, it's not going to work anymore and I will lose data - the point is that the pushers have to be non-blocking. But i'll figure out something... maybe even measuring the rss size of the process might work.
|
[22:37] cremes
|
dermoth: let me say it again then.... use send with ZMQ_NOBLOCK and test for EAGAIN
|
[22:37] cremes
|
this will NOT BLOCK
|
[22:38] dermoth
|
yes I get that ;)
|
[22:38] cremes
|
so what's the problem then? ;)
|
[22:39] dermoth
|
what I mean, EAGAIN is bad too - I will log it & alert on it, but I want to know before that happens... If I start getting latency spikes when I turn down downstream workers, or during spikes, even before filling up the queues all the way up to the pushers, I want to know it. But that's fine, I'll work with what I have ;)
|
[22:40] cremes
|
ok, i see
|
[22:40] cremes
|
let us know what you come up with or add your solution to the wiki
|
[22:41] dermoth
|
Sure. btw, is having multiple parallel devices working well? i.e. if one crashes will it nicely fall back to the other one? Will my XREQ/XREP suffer any latency? More than 2-3 seconds avg per request may become problematic
|
[22:42] dermoth
|
here actually on the ends it's more like a REQ/REP socket, but the device will be XREQ/XREP to pass & load-balance the messages
|
[22:43] cremes
|
a device is just a fancy packaging for 2 sockets to aid with load balancing
|
[22:43] cremes
|
if you are worried about a device crashing, then messages in flight could be lost
|
[22:43] dermoth
|
i'll probably be running some tests anyway... Thanks
|
[22:44] cremes
|
if you can't handle data loss, then you need to add some code around all of this to ack/nak messages and retry when things timeout or disappear
|
[22:44] cremes
|
none of that is built in to 0mq; you have to build it on top
|
[22:44] dermoth
|
yes... a server crashing is pretty rare, what i'm concerned about is if the system will keep on working. the REQ/REP are for where I can't handle loss and I will lose the request if the device crashes with my message
|
[22:45] cremes
|
and unless you have a really slow link or are sending *very* large messages, i can't imagine it would ever take 2-3 seconds to send a message
|
[22:45] cremes
|
if you use XREQ/XREP and do explicit acks, then you could have things continue to work
|
[22:46] cremes
|
if you use REQ/REP sockets, they enforce a very strict send/recv/send/recv pattern, so a crashed peer would not be recoverable
|
[22:46] dermoth
|
well, i'm thinking of possible tcp timeouts, etc. I would assume ZeroMQ is able to handle well peers, but I will test it anyway. Thanks!
|
[22:46] cremes
|
ok, good luck
|
[22:46] dermoth
|
to handle well down peers
|
[22:47] cremes
|
dermoth: TCP timeouts are not exposed to you via the 0mq api
|
[22:48] cremes
|
0mq may see a connection disappear at the tcp level, but that doesn't mean that the socket is no good
|
[22:48] cremes
|
because a 0mq socket can be bound or connected to multiple endpoints
|
[22:48] cremes
|
one endpoint failure does not cause the whole 0mq socket to fail
|
[22:49] cremes
|
the 0mq socket will continue to load balance messages to any "surviving" endpoints
|
[22:51] dermoth
|
ok, sounds good. FWIW, what I have in mind to monitor PUSH/PULL latency is to send a special message with an IP/port to reply with, and have the workers respond with a UDP packet. I can release the check as a Nagios plugin like I usually do for my monitoring scripts, but obviously the other end is up to the developer to get right
|
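A loopback-only sketch of the probe idea dermoth outlines: the monitor embeds a reply address in a special message, and whichever worker picks it up echoes it back over UDP. The message layout, the function names and the sleep that stands in for time spent queued in the pipeline are all assumptions; no 0MQ sockets are involved, so this shows only the probe and reply logic.

```python
import json
import socket
import time


def make_probe(reply_host: str, reply_port: int) -> bytes:
    """Build a probe carrying the UDP address the worker should answer to."""
    return json.dumps({
        "type": "probe",
        "reply_to": [reply_host, reply_port],
        "sent_at": time.monotonic(),  # only the sender ever interprets this
    }).encode("utf-8")


def worker_handle(message: bytes) -> bool:
    """Hypothetical worker hook: echo probes over UDP, ignore everything else."""
    data = json.loads(message)
    if data.get("type") != "probe":
        return False
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as out:
        out.sendto(message, tuple(data["reply_to"]))
    return True


def measure_once(queue_delay: float = 0.05, timeout: float = 2.0) -> float:
    """Send one probe through a simulated pipeline and return its latency."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listener:
        listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
        listener.settimeout(timeout)
        host, port = listener.getsockname()
        probe = make_probe(host, port)
        time.sleep(queue_delay)          # stands in for time spent queued
        worker_handle(probe)
        echoed, _ = listener.recvfrom(4096)
    return time.monotonic() - json.loads(echoed)["sent_at"]
```

Since the same process both stamps and reads the send time, no clock synchronisation between the monitor and the workers is needed; a lost UDP reply simply shows up as a timeout.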
[22:53] cremes
|
sounds neat
|