ZeroMQ IRC Log

Tuesday September 7, 2010

[Time] Name	Message
[05:57] lestrrat	I'm having problems using fork() and making my parent process talk to the child processes via 0mq. I'm getting segfaults after the child exits and while the parent is in recv()
[05:57] lestrrat	is this supposed to work?
[06:55] pieterh	sustrik: you there?
[09:51] pieterh	anyone here felt that XREP and XREQ could use better names?
[09:52] lestrrat	I just gave a talk to my coworkers about zeromq, and yes, better_names++
[09:52] pieterh	I was thinking of ROUTE and FORWARD
[09:53] pieterh	XREP creates routing envelopes around incoming messages and uses these on output to route replies back to original clients
[09:54] pieterh	XREQ just forwards messages in both directions without touching them
[09:55] pieterh	XREP really looks like a router... it's the only socket type that lets you address specific connections
[09:55] keffo	I like 'route' very much, not forward so much
[09:55] pieterh	yeah, forward wasn't inspired
[09:55] pieterh	it has to be a verb
[09:56] pieterh	that says "move stuff in both directions but don't mess with it"
[09:56] pieterh	PORT
[09:56] pieterh	XFER
[10:11] pieterh	keffo: here's a thought: http://www.zeromq.org/sandbox:mudem
[10:11] pieterh	lestrrat: does that ring a bell?
[10:13] lestrrat	thinking
[10:15] lestrrat	hmm. I grok ROUTE, but not the mudem part :) but I don't have a great alternative plan either
[10:15] lestrrat	naming is hard, eh.
[10:17] pieterh	well, mudem is a play on modem, modulator/demodulator...
[10:17] keffo	pieterh, I have a 'route-codec' in my code...
[10:17] lestrrat	yeah, I know
[10:17] pieterh	i don't like invented words but if one has to invent them they should be expressive
[10:18] keffo	which encodes/decodes routes for an xreq
[10:18] pieterh	keffo: sounds right
[10:18] pieterh	creates and uses envelopes, right?
[10:20] keffo	it just handles them, contains a payload the 'enduser' is interested in, and also has a sendroute function
[10:20] pieterh	well, an alternative to mudem: dispatch
[10:21] keffo	naa, someone who implements any type of loadbalancing of messages is in essence a dispatcher, imo
[10:21] pieterh	true
[10:22] pieterh	it's the combo of fanout and fanin
[10:23] pieterh	i think "multiplex" is wrong since that suggest copying whereas its distribution
[10:23] pieterh	*it's
[10:23] lestrrat	yeah, I thought about multiplex, but it didn't quite fit
[10:24] pieterh	in terms of use cases, xreq is like push+pull, it ventilates and sinks at once
[10:25] pieterh	one could create a nice pipeline pattern using just XREQ to XREQ
[10:27] pieterh	how about... something more visual... 1TON
[10:27] lestrrat	1000kg!
[10:28] pieterh	yeah
[10:28] pieterh	1-to-N for the pedantic of us
[10:30] pieterh	http://www.zeromq.org/sandbox:1ton
[10:31] pieterh	it kind of feels more like a building block now
[10:42] keffo	I have an issue where my worker process simply disappears, but I can't seem to trap it
[10:42] keffo	no exceptions, no atexits are run.. nada..
[10:42] pieterh	what OS?
[10:42] keffo	very annoying!
[10:43] keffo	win7
[10:43] pieterh	ah, that is a known problem
[10:43] keffo	hu?
[10:43] pieterh	the usual solution is to upgrade to Linux
[10:43] pieterh	sorry :-)
[10:43] keffo	caused by zmq??
[10:44] keffo	tossing away 98% of the global userbase is hardly an upgrade btw :)
[10:44] pieterh	I was kidding, my bad
[10:44] pieterh	what language are you using?
[10:44] keffo	c++, lua
[10:44] pieterh	so you need a debug build of 0MQ IMO
[10:44] keffo	oh it's all debug, debugger is attached too :)
[10:44] pieterh	aw :-(
[10:45] keffo	gives me nothing.. I've tried all routes I can think of
[10:45] keffo	abort()?
[10:45] keffo	but why would that be called?
[10:45] pieterh	DebugBreak() afair, then continue it in the debugger
[10:45] pieterh	... assertion failure?
[10:45] keffo	nada :)
[10:46] keffo	no asserts. no breakpoints, simply disappears.. windows event log shows nothing
[10:46] keffo	puzzling actually
[10:46] keffo	no exceptions are raised either
[10:46] keffo	It's as if the app cleanly exits, except it can't since it's a while(true)
[10:46] pieterh	it could exit in another thread I guess
[10:47] pieterh	i've not worked on win32 for ages... maybe someone else here can be more helpful
[10:47] keffo	does zmq ever call exit?
[10:47] keffo	(or abort)
[10:48] pieterh	nada
[10:48] pieterh	asserts, yes
[10:48] pieterh	98%? keffo, 2010 is the Year of Linux
[10:48] pieterh	it's no more than 97.85% by now
[10:49] keffo	if my mom could use any generic desktop linux without calling me, then I'd agree :)
[10:50] pieterh	hah, my mum actually does use linux and has for years...
[10:51] pieterh	but then again she's currently asking me how to hide her IP address so she can troll Anonymous so perhaps she's not typical...
[10:51] pieterh	keffo: if you can make a reproducible case, and chop it down, maybe we can reproduce it on another platform
[10:52] keffo	lord no, that would take ages :)
[10:52] keffo	I just want to somehow detect -when- it happens, and go from there, but so far I've been unable to
[10:52] pieterh	then, my friend, you might have to resort to...
[10:53] pieterh	if really you have no other option...
[10:53] keffo	print? =)
[10:53] pieterh	yeah :-)
[10:53] pieterh	don't forget the fflush (stdout);
[10:54] keffo	It's remarkably reproducible though
[10:55] pieterh	well, that's always good
[10:55] keffo	4th time I calculate pi, it disappears
[10:55] pieterh	hopefully it remains stable as you add hundreds of prints
[10:55] pieterh	you're calculating pi?
[10:57] keffo	printing is not the problem, I generate ~250k of logs on each run :)
[10:57] keffo	pi yeah, easy and verifiable thing to calc distributed :)
[10:58] pieterh	is there an algo for distributed pi calculation somewhere?
[10:59] keffo	sure
[10:59] keffo	tons of different I guess
[10:59] pieterh	i had an idea for a supermassive 0MQ project... lol
[10:59] pieterh	not original but who cares...
[10:59] keffo	as did I :)
[10:59] keffo	for i=self.beginspan, self.endspan do
[10:59] keffo	localpi = localpi + (1.0 / (i * 4.0 + 1.0) )
[10:59] keffo	localpi = localpi - (1.0 / (i * 4.0 + 3.0) )
[11:00] keffo	do that for each subspan of some arbitrary length, then sum them all up and the answer * 4 is pi :)
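A minimal Python sketch of the span-splitting scheme keffo describes, assuming the spans start at index 0 and are inclusive at both ends (as in the Lua `for` loop). The names `partial_sum`, `split_spans` and `distributed_pi` are made up for illustration, and the spans are summed in one process here rather than handed to workers over 0MQ.

```python
import math


def partial_sum(begin, end):
    """Sum the paired Leibniz terms for i in [begin, end], inclusive."""
    local_pi = 0.0
    for i in range(begin, end + 1):
        local_pi += 1.0 / (i * 4.0 + 1.0)
        local_pi -= 1.0 / (i * 4.0 + 3.0)
    return local_pi


def split_spans(total_terms, span_length):
    """Cut [0, total_terms) into inclusive (begin, end) spans of at most span_length."""
    spans = []
    start = 0
    while start < total_terms:
        stop = min(start + span_length, total_terms) - 1
        spans.append((start, stop))
        start = stop + 1
    return spans


def distributed_pi(total_terms=200_000, span_length=25_000):
    """Each span could go to a different worker; here they are summed locally."""
    partials = [partial_sum(b, e) for b, e in split_spans(total_terms, span_length)]
    return 4.0 * sum(partials)


if __name__ == "__main__":
    estimate = distributed_pi()
    print(estimate, abs(estimate - math.pi))
```

Because each span depends only on its own begin and end indices, the partial sums can be computed in any order and on any worker before being added up.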
[11:00] pieterh	not 42? weird...
[11:00] keffo	hehe
[11:00] pieterh	aight, so if we have a server somewhere that distributes workloads, and a simple 0MQ client that accepts them...
[11:01] pieterh	has surely been done dozens of times
[11:02] keffo	I'm doing a fairly more complicated scenario, but yeah
[11:05] keffo	and I guess you can figure out why I want something a bit more flexible than roundrobin :)
[11:06] pieterh	right now I'm writing examples on how to use XREP to do routing
[11:06] keffo	good, it was messy :)
[11:06] pieterh	how so?
[11:06] pieterh	you mean no documentation on the envelopes etc.?
[11:06] keffo	just not very nicely explained
[11:06] pieterh	right...
[11:13] keffo	I would explain it as a stack+req...
[11:13] keffo	push, push, push, payload, then pop,pop,pop, payload on the other side
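The "stack + req" picture can be sketched with plain Python lists standing in for multipart messages. This is a simplified model, not the 0MQ API: it assumes each routing hop pushes one identity frame on the way in and pops one on the way out, with an empty frame separating the envelope from the payload. All function names and identities below are hypothetical.

```python
DELIMITER = b""  # empty frame between envelope and payload (assumed convention)


def push_identity(frames, identity):
    """Inbound hop: push the sender's identity on top of the stack."""
    return [identity] + list(frames)


def pop_identity(frames):
    """Outbound hop: pop the top identity to learn where to send the rest."""
    return frames[0], list(frames[1:])


def split_envelope(frames):
    """Worker side: everything before the delimiter is the return path."""
    idx = frames.index(DELIMITER)
    return list(frames[:idx]), list(frames[idx + 1:])


def build_reply(envelope, payload_frames):
    """Worker side: put the saved return path back in front of the reply."""
    return list(envelope) + [DELIMITER] + list(payload_frames)


def round_trip(payload, hop_identities):
    """Push through every hop, answer, then pop back through every hop."""
    frames = [DELIMITER, payload]
    for identity in hop_identities:
        frames = push_identity(frames, identity)
    envelope, body = split_envelope(frames)
    reply = build_reply(envelope, [body[0].upper()])
    route = []
    for _ in hop_identities:
        identity, reply = pop_identity(reply)
        route.append(identity)
    return route, reply
```

The reply visits the hops in the reverse of the order in which they were pushed, which is the push, push, push / pop, pop, pop behaviour described above.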
[11:14] keffo	btw, what happens in a queue device if a client never reconnects? will the msg linger indef.?
[11:21] keffo	btw, what happens in a queue device if a client never reconnects? will the msg linger indef.?
[11:21] pieterh	hmm, you mean a reply?
[11:21] pieterh	with or without identity?
[11:22] pieterh	this is what 0MQ/2.1 is fixing
[11:22] pieterh	it will wait in some cases, discard in other cases
[11:22] keffo	in general, the whole reconnect business
[11:22] keffo	if it goes into the queue but never out, what happens to it?
[11:23] pieterh	well, the queue is per socket, eventually
[11:24] pieterh	there is not yet a proper explanation of how the 2.1 socket close semantics should work
[11:24] pieterh	afaik
[11:24] keffo	client(100msgs) -> queuedev -> service, then back again, except the client is gone forever..
[11:24] pieterh	anonymous clients -> messages get thrown away
[11:25] pieterh	client with identity -> messages persist as long as service is running
[11:25] keffo	ok
[11:25] pieterh	0MQ does have the concept of a connection going away
[11:26] pieterh	otherwise PUB sockets for example would end up with horrid resource leaks
[11:26] keffo	I need to introduce some sort of session.. if I have (known)client A, does a bunch of test junk(like pi), but aborts prematurely, but then reconnects to start some other type of job, I dont want to receive a bunch of old pi results :)
[11:27] keffo	and I would need to be able to tell all parties involved to dump everything related to an "old" session as well
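One way to keep stale results out, sketched in Python under the assumption that every message carries a session number chosen by the client. The `SessionFilter` class and the `"session"` key are hypothetical; nothing like this is built into 0MQ.

```python
class SessionFilter:
    """Tracks the client's current session and rejects older results."""

    def __init__(self):
        self.current = 0

    def new_session(self):
        """Start a fresh session; results tagged with older numbers become stale."""
        self.current += 1
        return self.current

    def accept(self, message):
        """Keep a result only if it belongs to the current session."""
        return message.get("session") == self.current


def drop_stale(session_filter, messages):
    """Return only the messages that belong to the current session."""
    return [m for m in messages if session_filter.accept(m)]
```

The same session number could be broadcast to the other parties so they can discard queued work for sessions that are no longer current.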
[11:28] pieterh	keffo: this starts to be industrial design work
[11:29] keffo	pieterh, what do you mean?
[11:29] pieterh	i mean, what you're making is heavy duty...
[11:30] keffo	oh very much :)
[11:30] keffo	it has fried my brain on many occasions.. Tons of papers of diagrams spread all over the place :)
[11:30] pieterh	if you have budget to throw at it, i can recommend an industrial 0MQ designer like Mato here
[11:31] keffo	the bulk of the work is not the transport & topology though
[11:31] keffo	although that needs to obviously be stable
[11:31] pieterh	well, you need an infrastructure that understands 'sessions'
[11:32] keffo	Sure, but that's already handled
[11:32] pieterh	what do you still need then?
[11:32] pieterh	apart from the thing not crashing...
[11:32] keffo	hehe
[11:32] keffo	lingering data trying to reconnect for one.
[11:33] keffo	oh and more liberal means of implementing loadbalancing, but I've made that point already :)
[11:34] pieterh	well, load balancing using XREP routing is pretty clear, and will be nicely explained in Ch3 of the Guide
[11:34] keffo	will be? =)
[11:34] pieterh	is in progress if I was not chatting here :-)
[11:34] pieterh	anything to do with maintaining overall state is a different kettle of chicken, though
[11:34] keffo	But I think I know what that will say by now :)
[11:35] pieterh	hopefully, yeah
[11:35] keffo	what I'm doing is something I've thought about for years though, so it should work :)
[11:35] keffo	zmq solved a big gaping questionmark though :)
[11:36] keffo	I might be getting a job soon though, so dev on this will sadly be sidetracked to weekends and evenings only though :/
[11:37] pieterh	is it open source?
[11:38] keffo	it might be eventually!
[11:38] keffo	would benefit the lua community I guess
[11:38] pieterh	well... i've learned two relevant things here having done software for way too long
[11:39] pieterh	a. if it's not open source it will die
[11:39] pieterh	b. if you don't start as open source you can't make it work afterwards
[11:40] keffo	I dont agree, and I've never done anything other than software :)
[11:40] pieterh	cause it's not about building code but about building community...
[11:40] pieterh	good luck, anyhow
[11:41] keffo	I wasnt thinking of opensource as in leveraging resources, but simply aiding someone else, when I'm done with it :)
[11:41] pieterh	nah, without people who helped make the code, it dies as soon as your evenings and weekends aren't available any more
[11:42] pieterh	it's not about leveraging resources but about software that lives past the "free time" of its creator
[11:43] pieterh	imho
[11:43] keffo	Oh, it's not free time at all, but I need to work for a bit to not starve :)
[11:43] pieterh	starving is not pleasant, no
[11:44] keffo	Not so much starving, but keeping girlfriend happier :)
[11:45] keffo	when this is done, it will take me & the other dude about 2-3 weeks to produce the actual product we'll eventually sell.. -That- part is already planned and so forth..
[11:45] keffo	And ones that happens, there is no benefit to keep the code not opensource
[11:46] keffo	err, once..
[11:46] pieterh	:-) good spellng is hrad somtimes
[11:55] keffo	um, this is odd
[11:56] keffo	I wonder if lua might freak a little at some weird binary
[13:51] CIA-20	zeromq2: 03Martin Sustrik 07master * r6d4ffd9 10/ (src/fq.cpp src/lb.cpp): Bug in fq_t and lb_t (when used via ZMQ_EVENTS option) fixed - http://bit.ly/cvOPzL
[15:14] CIA-20	zeromq2: 03Martin Sustrik 07master * rf374431 10/ src/pipe.hpp : get rid of 'has virtual functions but non-virtual destructor' warnings in pipe.hpp - http://bit.ly/9Relxm
[15:21] Tasser	cremes, it's more about the ruby part you wrote
[15:21] cremes	Tasser: i'm around if you have questions
[15:21] Tasser	cremes, oh, just asking for the big picture
[15:22] Tasser	aka what is where, how stuff flows
[15:22] cremes	sure...
[15:22] Tasser	and probably write it down into your git
[15:22] Tasser	HACKING or something like that :-)
[15:22] cremes	whatever i write here, i'll clean up and add to the README
[15:23] cremes	ZM::Reactor is a thread that contains a single ZMQ context
[15:23] cremes	from this context, you can create any kind of socket
[15:23] cremes	(stop me if i'm not answering your question)
[15:24] cremes	during socket creation, you pass a ruby object that will act as that socket's handler
[15:24] Tasser	callback?
[15:24] cremes	the handler should provide on_attach, on_writable and on_readable methods
[15:25] cremes	the on_attach method is called right away and lets you set things up (kind of like a constructor)
[15:25] Tasser	so why not #new ?
[15:25] cremes	the on_readable and on_writable methods are called when the socket is polled for those events and finds them to be true
[15:26] cremes	explain what you mean by "not #new"?
[15:27] Tasser	create a new instance per socket, so call #new and on that instance #on_writable, #on_readable
[15:29] cremes	the handler instance is just a regular ruby class that implements the 3 methods i mentioned
[15:29] cremes	it has a constructor (def initialize(*args) nil; end) just like any other class
[15:30] Tasser	less abstraction than EM
[15:30] cremes	you could use one instance of a class to manage multiple sockets; look at the one-handed-ping-pong example
[15:30] Tasser	yeah, having that one here atm
[15:30] cremes	yeah, EM is kind of confusing with the EM::Connection stuff
[15:31] Tasser	meh, gotta go :-(
[15:31] cremes	sure
[15:31] cremes	i'm usually on irc from 8am to 5pm central standard time (gmt -6, i think)
[15:32] cremes	ping me if you have more questions or send them to the 0mq ml
[15:33] bbigras	Is there a way to build zeromq with mingw?
[15:33] cremes	bbigras: luislavena (rubyinstaller.org guy) has been playing with that
[15:34] cremes	he opened an issue on github to fix a problem he encountered
[15:34] cremes	so as far as i know he succeeded
[15:34] bbigras	cremes: nice, thanks!
[15:35] bbigras	cremes: Do you know if anyone had success using zeromq with Qt without having to disable Qt's signal/slot macros?
[15:36] cremes	i haven't heard anything about that, so no
[15:36] cremes	you might try asking on the 0mq ML
[15:38] bbigras	cremes: thanks
[18:17] ModusPwnens	hi
[18:17] ModusPwnens	Is there anyone in here that has used google protobufs with zeromq? I'm wondering what kind of throughput is normal when using google protobufs
[18:20] cremes	ModusPwnens: i recommend you write a small benchmark that serializes/deserializes your data
[18:20] cremes	and see what the upper limit is on your message rate
[18:20] cremes	then 0mq will have that as its upper limit for throughput
[18:41] ModusPwnens	yeah i did this cremes
[18:41] ModusPwnens	I am just wondering whether or not my results are expected
[18:42] ModusPwnens	Hmm, well actually, I benchmarked it with zeromq too
[18:42] ModusPwnens	so it's timing how long it takes to send messages as well
[18:44] cremes	ModusPwnens: yeah, take 0mq out of the equation to get an upper bound
[18:44] cremes	then you can test with 0mq to see what kind of overhead it is introducing
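A rough Python sketch of the serialization-only benchmark cremes suggests. It uses the standard-library `json` module as a stand-in for protobuf, which is an assumption made only to keep the example self-contained; the function names are hypothetical. The resulting round-trip rate is the ceiling that any transport, 0MQ included, can only approach.

```python
import json
import time


def measure_rate(encode, decode, record, count=20_000):
    """Return encode+decode round trips per second for one record."""
    start = time.perf_counter()
    for _ in range(count):
        decode(encode(record))
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return count / elapsed


def json_roundtrip_rate(record, count=20_000):
    """Stand-in for a protobuf SerializeToString/ParseFromString pair."""
    return measure_rate(
        lambda r: json.dumps(r).encode("utf-8"),
        lambda b: json.loads(b.decode("utf-8")),
        record,
        count,
    )


if __name__ == "__main__":
    sample = {"id": 1, "payload": "x" * 50}
    print("round trips per second:", json_roundtrip_rate(sample))
```

Running the same loop with the messaging layer added and comparing the two rates shows how much overhead the transport introduces.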
[18:57] ModusPwnens	Hmm, another thing, I am having somewhat surprising results with the remote/local throughput tests
[18:58] ModusPwnens	I am just using localhost, but I only get around 200 Mb/s throughput, which seems low to me.
[18:59] cremes	ModusPwnens: try increasing the message body size from 50 bytes (from the example you posted last week) to something larger
[18:59] cremes	also, note that the remote/local tests are doing a ping pong with REQ/REP sockets
[19:00] cremes	you could see higher throughput on a PUB socket
[19:01] ModusPwnens	Yeah, I have tried with larger message sizes. 5000 byte and 2500 count messages
[19:01] ModusPwnens	is 200 Mb/s normal on localhost? I would have thought it would be much much faster
[19:01] cremes	and what did you see?
[19:01] cremes	with the varying message sizes...?
[19:02] ModusPwnens	I see about 200 Mb/s or 5000 messages/s
[19:02] cremes	so you see the same throughput regardless of message size?
[19:03] ModusPwnens	not really, when the parameters are smaller I see different values
[19:03] ModusPwnens	when i said 200 earlier i was using these parameters
[19:03] cremes	so what does 200 Mb/s represent? the best that you see, the average or the worst?
[19:04] ModusPwnens	about the average
[19:04] ModusPwnens	i see 180 sometimes, sometimes 220
[19:04] cremes	this is on windows, right?
[19:04] ModusPwnens	Yeah. I was wondering if it would be faster on linux
[19:05] cremes	sometimes; it's hard to draw conclusions because this stuff is so dependent on OS and the hardware
[19:05] ModusPwnens	Yeah. I notice that the official results on the zeromq website are ridiculously high
[19:05] ModusPwnens	but I'm thinking that's because they have a godly computer with 12 gigs of ram and 4 cores
[19:05] ModusPwnens	i only have 1 core on this computer
[19:06] ModusPwnens	as well as only 3 gigs of ram
[19:06] icy	size of ram does not matter, speed does
[19:06] ModusPwnens	I know that has an effect, but I'm not sure how large an effect that would be.
[19:06] cremes	that computer is ancient if it has only 1 core; i don't think intel has shipped a 1-core desktop cpu since around 2006
[19:07] ModusPwnens	it's actually relatively new
[19:07] ModusPwnens	amd sempron m100
[19:07] ModusPwnens	which i think has only 1 core
[19:08] cremes	amd is behind the curve; sorry :)
[19:09] ModusPwnens	Heh, apparently.
[19:09] ModusPwnens	Anyways, is that sort of throughput expected?
[19:09] cremes	again, it's dependent upon OS and hardware
[19:09] ModusPwnens	Hmm. So it's at least not abnormal?
[19:10] cremes	if you have nothing to compare it to...?
[19:10] guido_g	http://answers.yahoo.com/question/index?qid=20091030200427AAjvYJw <- "That is a pretty decent mobile single core processor."
[19:10] ModusPwnens	Ah, well there you go i guess.
[19:11] icy	well, what cpu usage do you get while benchmarking?
[19:11] icy	maybe it's just really slow ram :)
[19:12] ModusPwnens	i get 100% CPU usage
[19:13] ModusPwnens	Does that mean the RAM is just slow?
[19:13] Steve-o	what parameters are you using, I can compare with a single core Xeon right now
[19:13] ModusPwnens	ok
[19:13] ModusPwnens	for the cpu usage test I just ran
[19:13] ModusPwnens	I used
[19:14] ModusPwnens	5000 byte messages
[19:14] ModusPwnens	and 250,000 message count
[19:14] ModusPwnens	all on localhost
[19:15] Steve-o	I get 39,031 msgs/s and 1561 Mb/s
[19:15] guido_g	./local_thr tcp://127.0.0.1:5000 1024 100000
[19:15] guido_g	message size: 1024 [B]
[19:15] guido_g	message count: 100000
[19:15] guido_g	mean throughput: 381327 [msg/s]
[19:15] guido_g	mean throughput: 3123.831 [Mb/s]
[19:15] guido_g	also a notebook
[19:16] ModusPwnens	Hmm, that is definitely much higher than what I am getting.
[19:16] ModusPwnens	how much ram do you have and what kind of processor?
[19:16] ModusPwnens	guido that is
[19:16] guido_g	Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz
[19:17] guido_g	ram usage is not a problem at all
[19:17] guido_g	it's more a matter of cache and latency
[19:17] ModusPwnens	Latency shouldn't really be a problem on localhost though...right?
[19:18] guido_g	ram latency
[19:18] guido_g	sigh
[19:19] icy	and bandwidth
[19:19] ModusPwnens	Oh..sorry about that, i misunderstood..
[19:19] guido_g	message size: 1024 [B]
[19:19] guido_g	message count: 10000
[19:19] guido_g	mean throughput: 98001 [msg/s]
[19:19] guido_g	mean throughput: 802.824 [Mb/s]
[19:20] guido_g	via lan
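The msg/s and Mb/s figures quoted in this exchange are consistent with megabits computed as messages per second times message size in bytes times 8, divided by 1,000,000. A small Python check of that arithmetic (the function name is made up for illustration):

```python
def megabits_per_second(messages_per_second, message_size_bytes):
    """Convert a message rate and message size into megabits per second."""
    return messages_per_second * message_size_bytes * 8 / 1_000_000


if __name__ == "__main__":
    # (msg/s, message size in bytes) pairs taken from the log above.
    for rate, size in [(381327, 1024), (98001, 1024), (39031, 5000), (5000, 5000)]:
        print(rate, size, megabits_per_second(rate, size))
```

By the same formula, 5000 messages/s at 5000 bytes each is 200 Mb/s, which is 25 megabytes per second.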
[19:20] guido_g	so obviously you're on a dog slow machine
[19:20] icy	or windows :P
[19:21] guido_g	hrhrhr
[19:22] icy	nice webserver choice for zeromq.org
[19:23] ModusPwnens	Hmm thanks. I will try to procure another computer to test this on.
[19:23] guido_g	icy: you mean wikidot?
[19:24] icy	I mean the lighttpd part :)
[19:31] icy	hm local_thr does not seem to do anything on my osx box
[19:49] cremes	icy: you need to run local_thr and remote_thr as a pair; one is the client and the other is the server
[19:49] icy	ah right, thx
[19:53] icy	ouf, this thing just sent me 1gb into swap
[19:54] icy	sending 1000000 1kb messages
[19:54] icy	I guess they get buffered in ram
[19:57] cremes	icy: the receiver must not have pulled them off the queue fast enough
[19:57] cremes	when i run those tests on my system, memory usage is constant (no queueing)
[19:58] cremes	why don't you pastie the arguments you passed to both programs so we can comment
[19:59] icy	tcp://127.0.0.1:5000 1024 1000000 for both
[19:59] icy	maybe I should start local_thr before remote_thr :)
[20:00] cremes	try lowering the 1 million to 10 thousand and monitor the memory size of the programs
[20:00] cremes	yeah, start order is important...
[20:01] icy	doing that I get it to work even though it hits into swap briefly (understandable, the sender will always be faster)
[20:02] cremes	icy: not true; this test is using REQ/REP sockets so it should only have 1 message in flight at any given time
[20:02] cremes	one sender should not be able to get ahead of the other
[20:03] cremes	(i was wrong in my statement from 2:57; no queueing should occur)
[20:03] icy	I get 40mb ram usage with 100k messages
[20:03] cremes	are you running the C programs or using the samples from another language binding?
[20:04] icy	perf/ <- the ones in there which are C I think
[20:05] cremes	did you modify the code at all?
[20:05] icy	no, downloaded tarball, ./configure, make and ran the apps
[20:05] cremes	huh... what OS?
[20:05] icy	osx
[20:06] cremes	2.0.9?
[20:06] cremes	0mq, that is
[20:06] icy	yea
[20:06] Samy	I see a lot of emphasis on lock-free algorithms on the ZeroMQ website.
[20:06] cremes	weird
[20:06] cremes	it should not have unbounded memory growth
[20:06] cremes	you should file a bug
[20:06] Samy	What lock-free objects does ZeroMQ use? Where would the source-code be for them?
[20:07] Samy	The atomics interface seemed too simple to support more complex data structures, though I wasn't looking at the right thing.
[20:07] cremes	Samy: check out the y_pipe stuff... i believe that is where the lock-free algorithms are used though sustrik would know better (he wrote it)
[20:07] icy	http://singularity.cryosphere.de/pub/remote_thr.png (at this point local_thr is long gone already)
[20:07] Samy	cremes, cool. Does sustrik IRC?
[20:08] cremes	Samy: yes... he's usually in channel but he isn't here right now
[20:08] Samy	Ok, thank you.
[20:08] icy	the real mem does not show all the memory it uses as I'm already several hundred mb into swap at the time the screenshot was made
[20:08] cremes	icy: that shouldn't be; i would open a bug and describe the problem
[20:09] cremes	make sure to include 0mq release, OS, OS release, etc
[20:09] icy	k
[20:10] cremes	icy: hold on a sec...
[20:11] cremes	were you doing local_thr or local_lat as your perf test?
[20:14] icy	thr
[20:16] cremes	the local_thr/remote_thr examples don't make any sense
[20:17] cremes	the remote_thr program is using a REQ socket while the local_thr is using a SUB socket
[20:17] cremes	the two are not compatible
[20:18] icy	I'm totally new to zeromq, just saw this benchmark app in the src dir and thought I'd give it a go :)
[20:18] cremes	nm... i'm looking at the wrong stuff
[20:18] cremes	argh...
[20:19] cremes	okay, so remote_thr uses a PUB socket while local_thr uses a SUB
[20:19] cremes	that is correct
[20:19] cremes	you need to start local_thr first
[20:19] cremes	remote_thr will slam your system by publishing as fast as possible, so there will be queueing
[20:20] cremes	(i was thinking of the local_lat/remote_lat examples which uses different socket types)
[20:20] icy	I did start local_thr first and even after that execited, remote_thr was allocating ram
[20:21] icy	s/execited/exited/
[20:21] cremes	yeah, i'm looking at it now...
[20:23] cremes	if everything is working correctly, remote_thr should exit first
[20:55] dermoth	I'm wondering if there's an easy way to monitor a queue size on a zeromq worker... It doesn't seems like there is anything in the API
[20:55] dermoth	on a zeromq broker I mean
[20:55] dermoth	i.e. a device
[21:02] ModusPwnens	whoa
[21:02] ModusPwnens	im getting a strange error with the benchmarking tests now
[21:04] dermoth	my concern is that the PUSH workers may send more messages than can be processed by the PULL workers, eventually filling up the queue. This would be possible if the PULL workers sync their state to disk to avoid data loss...
[21:12] ModusPwnens	hey does zeroMQ allocate memory for messages all at once?
[21:13] ModusPwnens	Or maybe it is because I am using the publish subscribe topology so it just continuously creates messages and the receiving end is not fast enough..
[22:23] cremes	ModusPwnens: yes to the second thing you said; the publisher outpaces the subscriber
[22:23] cremes	dermoth: no, there is no way to fetch the queue size; check the mailing list for the reasons why
[22:23] cremes	that topic has been raised and answered a bunch of times (someone should add it to the FAQ)
[22:24] cremes	dermoth: also, check out HWM (high water mark) settings for the PUSH sockets
[22:24] cremes	by setting HWM, it will block when the queue hits that message level (or return EAGAIN if you try sending with ZMQ_NOBLOCK)
[22:26] dermoth	yes, but I need to know before I'll be blocking... I guess I could test the latency though
[22:26] cremes	dermoth: what do you mean that you need to know before?
[22:26] cremes	and what does latency have to do with it?
[22:27] dermoth	well, if my queues start filling up at peak times I want to react before all pushers block... my primary goal in using zmq is to avoid blocking
[22:28] cremes	ok, then use send with ZMQ_NOBLOCK and test for EAGAIN; when you get it then you know you have hit your high water mark
[22:28] cremes	and you can take whatever action is necessary
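The pattern cremes describes (non-blocking send, treat EAGAIN as "high water mark reached") can be illustrated without 0MQ by a bounded standard-library queue. This is an analogy only: `queue.Queue(maxsize=...)` plays the role of the HWM and `queue.Full` plays the role of EAGAIN. The helper names are hypothetical.

```python
import queue


def try_send(outbound, message):
    """Non-blocking send: True if queued, False if the limit was hit."""
    try:
        outbound.put_nowait(message)
        return True
    except queue.Full:
        # Analogous to a send with ZMQ_NOBLOCK failing with EAGAIN.
        return False


def send_burst(outbound, messages):
    """Send without ever blocking; count what was accepted and rejected."""
    accepted = 0
    rejected = 0
    for message in messages:
        if try_send(outbound, message):
            accepted += 1
        else:
            rejected += 1  # the place to log, alert, or add workers
    return accepted, rejected


if __name__ == "__main__":
    outbound = queue.Queue(maxsize=3)  # stand-in for an HWM of 3 messages
    print(send_burst(outbound, range(5)))
```

The sender never stalls; the rejected count is the signal that downstream capacity is insufficient.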
[22:29] dermoth	cremes, if the queue fills up on the device, then there will be some latency between the time I push to the queue and the time my worker gets the event. I can push a special message and have whichever worker gets it respond to me, then I know the latency. if it rises then I need more workers downstream
[22:29] cremes	ok
[22:30] cremes	you could also have another pair of sockets where each worker tells the server/pusher that it has received a message
[22:30] cremes	using this "out of band" communication, you could publish only those messages that can be immediately handled by a worker
[22:31] cremes	you would have at most 1 message in someone's queue because you would not push another one until each one had been acknowledged
[22:31] cremes	that seems better than trying to rely on some weird latency calculation that might not be trustworthy
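A minimal Python model of the acknowledgement scheme cremes outlines: a message is handed to a worker only after that worker has acknowledged its previous one, so each worker has at most one message outstanding. The `AckDispatcher` class is hypothetical and tracks state in memory; in a real setup the acknowledgements would arrive on the second, out-of-band socket pair.

```python
from collections import deque


class AckDispatcher:
    """Hands out one message per worker and waits for an ack before the next."""

    def __init__(self, worker_ids):
        self.ready = deque(worker_ids)  # workers with nothing outstanding
        self.backlog = deque()          # messages waiting for a free worker
        self.in_flight = {}             # worker id -> unacknowledged message

    def submit(self, message):
        """Queue a message and dispatch it if a worker is free."""
        self.backlog.append(message)
        return self._dispatch()

    def acknowledge(self, worker_id):
        """Worker reports completion; it becomes eligible for more work."""
        if worker_id not in self.in_flight:
            return []  # ignore unexpected or duplicate acks
        del self.in_flight[worker_id]
        self.ready.append(worker_id)
        return self._dispatch()

    def _dispatch(self):
        """Pair waiting messages with free workers; return (worker, message) sends."""
        sent = []
        while self.backlog and self.ready:
            worker = self.ready.popleft()
            message = self.backlog.popleft()
            self.in_flight[worker] = message
            sent.append((worker, message))
        return sent
```

The length of `backlog` is then a directly observable measure of how far the workers are falling behind.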
[22:31] dermoth	i'll implement XREQ/XREP for things where I need to make sure a worker is getting it... the rest is high-throughput stuff that can suffer a small percentage loss...
[22:32] cremes	definitely take a look at HWM
[22:32] cremes	i think it does what you need
[22:33] dermoth	the point is that I don't want to block on the sending side... HWM will be useful in logging error conditions, but I should never end up hitting this limit...
[22:34] cremes	dermoth: sorry, but your requirements don't make sense to me
[22:34] cremes	if you could get the queue length, you would probably prevent your pusher from sending more messages if it hit some threshold, right?
[22:35] cremes	if so, then this is exactly what you can do with HWM
[22:35] cremes	couple HWM with ZMQ_NOBLOCK and you'll get your "signal" that there aren't enough workers
[22:35] dermoth	well I don't really control the pusher... it logs as data comes it. I need to have enough pullers to absorb the data as it comes in
[22:35] cremes	assuming you ever hit the HWM
[22:36] cremes	so you don't have any control over throttling the pusher?
[22:37] dermoth	because if the pushers block, it's not going to work anymore and I will lose data - the point is that the pushers have to be non-blocking. But i'll figure out something... maybe even measuring the rss size of the process might work.
[22:37] cremes	dermoth: let me say it again then.... use send with ZMQ_NOBLOCK and test for EAGAIN
[22:37] cremes	this will NOT BLOCK
[22:38] dermoth	yes I get that ;)
[22:38] cremes	so what's the problem then? ;)
[22:39] dermoth	what I mean is, EAGAIN is bad too - I will log it & alert on it, but I want to know before that happens... If I start getting latency spikes when I turn down downstream workers, or during peaks, even before the queues fill all the way up to the pushers, I want to know it. But that's fine, I'll work with what I have ;)
[22:40] cremes	ok, i see
[22:40] cremes	let us know what you come up with or add your solution to the wiki
[22:41] dermoth	Sure. btw, is having multiple parallel devices working well? i.e. if one crashes will it nicely fall back to the other one? Will my XREQ/XREP suffer any latency? More than 2-3 seconds avg per request may become problematic
[22:42] dermoth	here actually on the ends it's more like a REQ/REP socket, but the device will be XREQ/XREP to pass & load-balance the messages
[22:43] cremes	a device is just a fancy packaging for 2 sockets to aid with load balancing
[22:43] cremes	if you are worried about a device crashing, then messages in flight could be lost
[22:43] dermoth	i'll probably be running some tests anyway... Thanks
[22:44] cremes	if you can't handle data loss, then you need to add some code around all of this to ack/nak messages and retry when things timeout or disappear
[22:44] cremes	none of that is built in to 0mq; you have to build it on top
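The ack-and-retry layer cremes says must be built on top can be sketched in Python as a loop around two callables. `send` and `wait_for_ack` are placeholders for whatever transport is used, and `LossyChannel` is a test double invented here to show the behaviour; none of these names come from 0MQ.

```python
def send_with_retry(send, wait_for_ack, message, max_attempts=3):
    """Resend until acknowledged; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        send(message)
        if wait_for_ack(message):  # a real version would wait with a timeout
            return attempt
    raise TimeoutError("no ack after %d attempts" % max_attempts)


class LossyChannel:
    """Test double that silently drops the first `drops` sends."""

    def __init__(self, drops):
        self.drops = drops
        self.delivered = []
        self._acked = False

    def send(self, message):
        if self.drops > 0:
            self.drops -= 1       # message lost, no ack will come
            self._acked = False
        else:
            self.delivered.append(message)
            self._acked = True

    def wait_for_ack(self, message):
        return self._acked


if __name__ == "__main__":
    channel = LossyChannel(drops=2)
    print(send_with_retry(channel.send, channel.wait_for_ack, "job-1"))
```

A receiver in such a scheme has to tolerate duplicates, since a lost ack looks the same to the sender as a lost message.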
[22:44] dermoth	yes... a server crashing is pretty rare, what i'm concerned about is whether the system will keep on working. the REQ/REP are for where I can't handle loss and I will lose the request if the device crashes with my message
[22:45] cremes	and unless you have a really slow link or are sending very large messages, i can't imagine it would ever take 2-3 seconds to send a message
[22:45] cremes	if you use XREQ/XREP and do explicit acks, then you could have things continue to work
[22:46] cremes	if you use REQ/REP sockets, they enforce a very strict send/recv/send/recv pattern, so a crashed peer would not be recoverable
[22:46] dermoth	well, i'm thinking possible tcp timeouts, etc. I would assume ZeroMQ is able to handle well peers, but I will test it anyway. Thanks!
[22:46] cremes	ok, good luck
[22:46] dermoth	to handle well down peers
[22:47] cremes	dermoth: TCP timeouts are not exposed to you via the 0mq api
[22:48] cremes	0mq may see a connection disappear at the tcp level, but that doesn't mean that the socket is no good
[22:48] cremes	because a 0mq socket can be bound or connected to multiple endpoints
[22:48] cremes	one endpoint failure does not cause the whole 0mq socket to fail
[22:49] cremes	the 0mq socket will continue to load balance messages to any "surviving" endpoints
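A toy Python model of the behaviour described here: one logical socket with several endpoints, where a failed endpoint is skipped and traffic continues round-robin over the rest. The `EndpointBalancer` class is hypothetical and says nothing about how 0MQ implements this internally.

```python
class EndpointBalancer:
    """Round-robin over endpoints, skipping any that are marked down."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self.alive = set(self.endpoints)
        self._next = 0

    def mark_down(self, endpoint):
        self.alive.discard(endpoint)

    def mark_up(self, endpoint):
        if endpoint in self.endpoints:
            self.alive.add(endpoint)

    def pick(self):
        """Return the next surviving endpoint, or fail if none are left."""
        for _ in range(len(self.endpoints)):
            endpoint = self.endpoints[self._next]
            self._next = (self._next + 1) % len(self.endpoints)
            if endpoint in self.alive:
                return endpoint
        raise RuntimeError("no surviving endpoints")
```

Losing one endpoint reduces capacity but does not stop the sender, which matches the point that a single endpoint failure does not fail the whole socket.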
[22:51] dermoth	ok, sounds good. FWIW, what I have in mind to monitor PUSH/PULL latency is to send a special message with an IP/port to reply with, and have the workers respond with an udp packet. I can release the check as a Nagios plugin like I usually do for my monitoring scripts, but obviously the other end is up to the developer to get right
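dermoth's probe idea can be sketched in Python with the standard library alone. In this sketch a `queue.Queue` stands in for the PUSH/PULL pipeline (an assumption made to keep it runnable without 0MQ), while the reply really is a UDP datagram on the loopback interface. The message fields and function names are hypothetical. The worker echoes the sender's timestamp back, so only the sender's clock is used.

```python
import json
import queue
import socket
import threading
import time


def worker(work_queue):
    """Process messages; answer probes with a UDP datagram to the given address."""
    while True:
        message = work_queue.get()
        if message is None:  # shutdown marker
            break
        if message.get("type") == "probe":
            reply = json.dumps({"sent_at": message["sent_at"]}).encode("utf-8")
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
                sock.sendto(reply, (message["reply_host"], message["reply_port"]))


def measure_latency(work_queue, timeout=2.0):
    """Push a probe through the queue and time how long the UDP reply takes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listener:
        listener.bind(("127.0.0.1", 0))  # let the OS choose a free port
        listener.settimeout(timeout)
        host, port = listener.getsockname()
        work_queue.put({
            "type": "probe",
            "sent_at": time.monotonic(),
            "reply_host": host,
            "reply_port": port,
        })
        data, _ = listener.recvfrom(4096)
        echoed = json.loads(data.decode("utf-8"))
        return time.monotonic() - echoed["sent_at"]


if __name__ == "__main__":
    q = queue.Queue()
    t = threading.Thread(target=worker, args=(q,), daemon=True)
    t.start()
    print("probe latency in seconds:", measure_latency(q))
    q.put(None)
    t.join(timeout=2)
```

Because the probe waits in line behind real work, its round-trip time grows as the queue backs up, which is the early warning dermoth is after. A lost UDP reply shows up as a `socket.timeout`, which a monitoring check would need to treat as its own alert condition.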
[22:53] cremes	sounds neat