ZeroMQ IRC Log

Tuesday October 19, 2010

[Time] Name	Message
[02:28] driethr	hello. do I have to zmq_setsockopt for ZMQ_IDENTITY before the zmq_connect is done? there is no chance to reassign it, before sending a message?
[07:43] sustrik	driethr: no, no chance
[09:20] omarkj	Morning.
[09:24] keffo	Morning
[09:24] keffo	What is it Sebastian? I'm arranging matches
[09:24] keffo	(sorry, felt British!) :)
[09:58] omarkj	Haha, it's fine.
[12:43] omarkj	Any idea what can cause a message not getting sent to a ZMQ_PUB socket?
[12:43] omarkj	When I'm publishing a few thousand messages, I sometimes have to retry sending each one up to ~600 times before they're actually published
[13:42] drbobbeaty	Question about an assertion in the code: If I get the log message: "Assertion failed: *tmpbuf > 0 (zmq_decoder.cpp:60)" -- this is in the 2.0.9 release version.
[13:42] drbobbeaty	What should I be looking at for the root cause?
[13:43] drbobbeaty	I'm using a simple "epgm://" PUB/SUB system and this is on the SUB-side.
[13:47] drbobbeaty	The code seems to indicate that this is indicating that there's nothing in the message to decode, but at the same time, the code has a TODO to indicate that there needs to be work done on the handling of oversized messages.
[13:48] drbobbeaty	So I'm wondering if there are limits I need to adhere to in message sizes when using the epgm:// transport?
[13:53] cremes	omarkj: what version of 0mq? what platform (linux, osx, windows)? and do you have a small code example that shows the problem?
[13:55] omarkj	Using version 2.1.0 on Linux (Ubuntu to be precise). Publishing from an erlang process.
[13:59] cremes	omarkj: it's possible there is a bug in 2.1.0 that you have uncovered; it hasn't been released yet
[13:59] cremes	do you see the same problem with 2.0.10?
[13:59] omarkj	Ah.
[13:59] omarkj	I'll have to try it.
[13:59] cremes	that's the last release of the 2.0 series
[14:00] cremes	honestly, it's more likely there is a bug in your code; unfortunately, i can't read erlang otherwise i'd lend a hand
[14:00] cremes	the most likely culprits are...
[14:00] cremes	1. starting your publisher before you start the subscribers; all messages get dropped due to a timing issue
[14:00] cremes	2. forgetting to call setsockopt(ZMQ_SUBSCRIBE, topic) to set a subscriber topic
[14:01] cremes	3. not sleeping at the end of your publisher's send; the subscriber doesn't have time to fetch messages before the queue is released
[14:01] Andreas	Hi everybody!
[14:02] Andreas	I got a question regarding pub/sub. Maybe somebody can help me with that ....
[14:02] Andreas	Is there any possibility to get notified when a subscriber gets disconnected?
[14:02] guido_g	no
[14:03] guido_g	you have to do it yourself
[14:03] Andreas	is something planned for future releases?
[14:04] omarkj	cremes: I guess I'm not sleeping for the ms after the send..
[14:05] cremes	omarkj: try that and see what happens... sleep for 10s just for kicks
[14:06] omarkj	Why is that needed by the way? To flush or something like that?
[14:17] cremes	omarkj: the publisher would exit before the subscriber could process all of the messages; that's all
[14:17] cremes	omarkj: read the guide if you haven't yet; it answers pretty much all of the basics: http://zguide.zeromq.org/chapter:all
[14:22] sustrik	drbobbeaty: are you using multi-part messages?
[14:22] omarkj	No, I have, just wondering,
[14:23] drbobbeaty	No, I'm just making a single message and sending it. If it's multi-part, then it's by default as I'm using a very simplistic interface to ZeroMQ.
[14:23] sustrik	hm, it looks like a bug, can you provide a minimal test case to reproduce it?
[14:24] drbobbeaty	I'm actually going to try 2.0.10 now as I just noticed that it's been released. If that has the same issue, I'll send something to the mailing list.
[14:25] sustrik	drbobbeaty: thanks
[14:27] sustrik	omarkj: with 2.1 the sleep at the end of the program should not be needed
[14:27] sustrik	zmq_term just blocks until outbound messages are sent
[14:27] omarkj	Okay.
[14:28] sustrik	omarkj: have you ZMQ_HWM option set?
[14:29] omarkj	sustrik: Yup, it's set at one. Maybe raise that.
[14:29] omarkj	I don't remember if I tried doing that.
[14:29] sustrik	that's the problem
[14:29] sustrik	if you have buffer of size 1
[14:29] sustrik	it can store 1 message
[14:29] pieterh	omarkj: yes, you're basically just dropping stuff at the pub side
[14:29] omarkj	Okay.
[14:29] pieterh	HWM=1 is for specialized request-reply cases or if you are using SWAP
[14:30] omarkj	I see, silly me. I'll give it a try.
[14:30] pieterh	just remove all HWM settings
[14:30] omarkj	I was having problems with the server crashing badly after some time I did that.
[14:30] pieterh	then remove IDENTITY settings on the subscribers
[14:31] pieterh	read the section on durable subscribers in the guide
[14:31] pieterh	if you create durable subscribers you must set a HWM but something like 10,000 is reasonable depending on message size and rate
[14:31] pieterh	actually with durable subscribers the server will eventually crash anyhow since there's no concept of ending a durable subscriber
[14:32] sustrik	even if there's no identity set
[14:32] sustrik	if the sender is faster than the subscriber
[14:32] pieterh	yes, indeed
[14:32] sustrik	the buffer would eventually grow out of memory
[14:32] pieterh	what the guide says is "set HWM in a serious publisher"
[14:33] pieterh	but not 1 :-)
[14:33] omarkj	Haha, yes, I must have changed it to one during some bug hunting or something.
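
The effect of a very small high-water mark discussed above can be illustrated with a toy model. This is a minimal sketch in plain Python, not ZeroMQ code: the function name `simulate_publisher` and the `drain_every` parameter are hypothetical, and the model simply assumes that a message offered to a full buffer is lost, which is how the exchange above describes a PUB socket with HWM=1.

```python
from collections import deque


def simulate_publisher(n_messages, hwm, drain_every):
    """Toy model of a send buffer limited to `hwm` queued messages.

    The consumer takes one message out of the buffer after every
    `drain_every` send attempts. A message offered while the buffer is
    full is counted as dropped. Returns (delivered, dropped).
    """
    buffer = deque()
    delivered = 0
    dropped = 0
    for i in range(1, n_messages + 1):
        if len(buffer) < hwm:
            buffer.append(i)      # room left: the message is queued
        else:
            dropped += 1          # buffer at the high-water mark: message lost
        if i % drain_every == 0 and buffer:
            buffer.popleft()      # the slow consumer takes one message
            delivered += 1
    delivered += len(buffer)      # assume the remaining queue is flushed at exit
    return delivered, dropped


print(simulate_publisher(1000, hwm=1, drain_every=10))       # (100, 900)
print(simulate_publisher(1000, hwm=10000, drain_every=10))   # (1000, 0)
```

With a buffer of one message and a consumer ten times slower than the producer, nine out of ten messages never leave the publisher in this model; with a large buffer nothing is dropped, at the cost of memory that grows while the consumer lags.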
[15:52] drbobbeaty	Question: I have a user that's trying to run/debug a ZeroMQ app I've written in NetBeans 6.7.1. In gdb it runs fine, but in NetBeans it doesn't. I've never used NetBeans and can't really help because it works in the shell and in gdb. Has anyone ever heard of any issues with ZeroMQ and NetBeans? Assuming this guy can run it in gdb...
[15:54] DerGuteMoritz	AFAIK IDEs like NetBeans tend to mess with PATH
[15:54] DerGuteMoritz	maybe he needs to change some setting first
[15:54] DerGuteMoritz	I bet this is on Windows?
[16:58] drbobbeaty	No, NetBeans on Linux
[16:59] drbobbeaty	Seems to have a decent path as it's starting to run, and the LD_LIBRARY_PATH is set, but thanks for the idea. We'll have to keep checking things.
[19:19] sustrik	drbobbeaty: hi
[19:19] sustrik	still there?
[19:19] drbobbeaty	Yup
[19:19] sustrik	i am not sure i understand your use case exactly
[19:19] drbobbeaty	OK... I'll explain.
[19:19] sustrik	how many publisher sockets are there
[19:19] sustrik	?
[19:20] drbobbeaty	At the current time - the network will have probably 270. Not all on one process - maybe 4 to 10 per process.
[19:21] sustrik	i mean in the test where you are seeing the problem
[19:21] drbobbeaty	The idea is to distribute the exchange tick data on different multicast addresses to allow the switches to "squelch" the traffic if it's not subscribed for.
[19:21] drbobbeaty	In the test, there are four open publishers.
[19:21] drbobbeaty	Two of which are lines 80 and 81 in the gist.
[19:21] sustrik	four PUB sockets?
[19:21] drbobbeaty	Yup.
[19:21] drbobbeaty	in one process.
[19:22] drbobbeaty	In the other process, there is one SUB socket with 27 "connections"
[19:22] sustrik	how many connects/binds on each PUB socket?
[19:23] drbobbeaty	One socket - One connect. It's using epgm:// so I didn't think I needed a bind() - at least I didn't see it in the examples.
[19:23] sustrik	sure
[19:23] sustrik	so each PUB socket connects to a different multicast group, right?
[19:24] drbobbeaty	Yup - exactly right.
[19:24] sustrik	so you have 4 PUB sockets and 4 multicast groups
[19:24] drbobbeaty	Yes... in the transmitter process.
[19:24] sustrik	good
[19:25] sustrik	now, in the gist i see you connect SUB socket to ~20 multicast groups
[19:25] sustrik	meaning 16 of them are idle
[19:25] sustrik	right?
[19:25] drbobbeaty	Actually, of the 4 PUB sockets in the transmitter - only two are listed in the gist - so in the gist example, 2 of the 27 are expecting traffic.
[19:26] sustrik	ok
[19:26] sustrik	there's only one SUB socket, right?
[19:26] drbobbeaty	Right.
[19:26] sustrik	now, rach PUB socket transmits 4000 msgs/sec, right?
[19:26] sustrik	each*
[19:26] drbobbeaty	approximately, yes.
[19:27] sustrik	so we have ~16000 msgs/sec on the wire
[19:27] sustrik	of which the sub socket should retrieve 8000/sec
[19:27] sustrik	now, what are you seeing?
[19:28] sustrik	800,000 msgs/sec?
[19:28] drbobbeaty	To be honest, the PUB sockets aren't all the same at 4000 msgs/sec - two of them (the ones I'm NOT listening to in the gist) are much less... So I'd say we're looking at 8000 msgs/sec I should see.
[19:29] sustrik	ok
[19:29] drbobbeaty	With the gist example I'm seeing 200k - 700 k msgs/sec.
[19:29] drbobbeaty	in my latest tests.
[19:29] sustrik	that's number of successful recvs on the SUB socket, right?
[19:29] sustrik	per second
[19:29] drbobbeaty	Yup.
[19:30] drbobbeaty	It varies a lot as it's live exchange data.
[19:30] sustrik	so let's say each "connect" gets all the data sent
[19:31] drbobbeaty	But when I edit the gist to only connect to the URLs in line 80 and 81, the numbers line up very close.
[19:31] sustrik	that's 16000 msgs/sec * 27
[19:31] sustrik	432000 msgs/sec
[19:31] sustrik	hm
[19:31] drbobbeaty	OK, I'm with you... Yeah... and I didn't see it being a linear multiplication either.
[19:32] sustrik	what's the interval for calculation of throughput?
[19:32] sustrik	one second?
[19:32] drbobbeaty	Typically 10 sec on the transmitter and about 1 sec on the receiver.
[19:32] sustrik	ok, so the variation may be caused by small sample interval on the receiver
[19:33] drbobbeaty	Yeah, easily could be.
[19:33] sustrik	can you use a larger interval?
[19:33] drbobbeaty	Yeah.
[19:33] sustrik	great
[19:34] drbobbeaty	Increased it to 10 sec on the receiver... still seeing 92k msgs/sec received versus 4k msgs/sec sent.
[19:35] sustrik	92000, ok
[19:35] sustrik	92000 / 27 = ~3400
[19:36] drbobbeaty	It's close if you do the division...
[19:36] sustrik	that's kind of close to the sending rate on a single pub socket
[19:36] drbobbeaty	Yup. Agreed.
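
The arithmetic in this exchange can be checked with a few lines of Python. This is only a sketch of the reasoning: the helper names are hypothetical, and the input figures (4 PUB sockets, roughly 4000 msgs/sec each, 27 connects, 92000 msgs/sec observed) are the approximate values quoted in the conversation.

```python
def wire_rate(pub_sockets, rate_per_pub):
    """Total message rate on the wire if every PUB socket sends at the same rate."""
    return pub_sockets * rate_per_pub


def duplicated_rate(rate, connects):
    """Rate the SUB side would see if every connect delivered all the traffic."""
    return rate * connects


def per_connect_rate(observed, connects):
    """Observed receive rate divided evenly over the SUB socket's connects."""
    return observed / connects


print(wire_rate(4, 4000))                    # 16000
print(duplicated_rate(16000, 27))            # 432000
print(round(per_connect_rate(92000, 27)))    # 3407
```

The last figure is what makes the duplication theory plausible: 92000 msgs/sec spread over 27 connects is about 3400 msgs/sec, close to the sending rate of one busy PUB socket.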
[19:36] drbobbeaty	Which is why I wondered if I was doing something wrong and publishing the same message on ALL URLs and getting duplicates that way.
[19:37] drbobbeaty	But I didn't see how in the code, and in the logging of the connections.
[19:37] sustrik	if you turn the unused publishers off, does it make a difference in throughput on the SUB side?
[19:37] drbobbeaty	If they are unused, they are never turned "on" - the default is to be OFF until needed. So it doesn't affect this.
[19:38] drbobbeaty	...on the PUB side.
[19:38] drbobbeaty	On the SUB side, if I turn off the unused ones, the numbers line up very closely.
[19:38] sustrik	how do you do that?
[19:38] sustrik	there's some OOB channel to inform publishers to start/stop?
[19:39] drbobbeaty	On the PUB side, I look at the message from the exchange - is it a quote, is it a trade, what symbol is it for - I use these to "classify" the message into 1 of the 270 multicast channels.
[19:39] drbobbeaty	If the PUB channel isn't open, a socket is created, the connection to the correct URL is made, and the message is sent.
[19:39] drbobbeaty	"On Demand" sockets and connections on the PUB side.
[19:40] drbobbeaty	The SUB side can't know what's "active" so it had to listen to large "sections" of the multicast space.
[19:40] sustrik	what i am trying to figure out is whether there are 2 sockets publishing or 4
[19:40] sustrik	in the test scenario
[19:41] drbobbeaty	The transmitter is publishing on 4. Two of which are in the gist code, and two are not. The two that are in the gist code are the "busy" ones.
[19:42] drbobbeaty	Essentially, the gist test receiver is listening to a part of what the transmitter is sending. But it's also listening to a lot of "dead channels".
[19:42] sustrik	ok, but there are 4 sockets pushing data to the wire
[19:42] drbobbeaty	Yup.
[19:42] sustrik	thus overall load on the wire is ~16000 msgs/sec
[19:42] sustrik	good
[19:43] sustrik	if you stop the two publishers that nobody listens to
[19:43] sustrik	does it change the throughput on the receiver?
[19:43] drbobbeaty	Hmmm... I don't know. I can try that. I'll do that now.
[19:49] drbobbeaty	Lots of fluctuation in the market now, but it appears that the numbers sent and received are nearly the same as before. This, I believe, is due to the different loads on the 4 multicast channels. The two I'm listening to are Quotes - very high volume. The two I'm not listening to, and have now turned off, are Trades, and very low volume. If I recall correctly, the ration is somewhere in the 200:1 range or so. Meaning there are about 200 Quote messages for
[19:49] drbobbeaty	every Trade message - roughly.
[19:50] drbobbeaty	That was supposed to read: "..ratio is somewhere in the 200:1 range..."
[19:51] sustrik	ok, anyway
[19:52] sustrik	it looks like the SUB socket is getting the data from each connect even though 25 of them should get nothing
[19:52] drbobbeaty	That's my theory.
[19:52] sustrik	it looks like the filtering is broken
[19:52] sustrik	would it be possible to write a simple test program
[19:53] sustrik	say a publisher that sends 1 message/sec
[19:53] sustrik	and a subscriber that would connect to two different mcast groups
[19:53] sustrik	and check whether we get 1 or 2 msgs/sec on the SUB side?
[19:54] sustrik	if we'll get 2, the theory is proven
[19:56] drbobbeaty	I can write that, sure. I'll use the same gist receiver but I'll make a simple transmitter and have it transmit on 1 or 2 of the multicast channels at a regular interval, and we'll see.
[19:56] drbobbeaty	It'll probably take me until tomorrow morning, but I'll do it and post it to the mailing list with the results.
[19:57] drbobbeaty	Sounds OK?
[19:57] sustrik	sure
[19:57] sustrik	just keep the code as simple as possible
[19:57] sustrik	while (1) {
[19:57] sustrik	sleep (1);
[19:57] sustrik	zmq_send (msg);
[19:57] sustrik	}
[19:57] sustrik	something like that
[19:58] drbobbeaty	Yup, that's what I was planning. Very simple.
[19:58] sustrik	ack
[19:58] sustrik	ok, see you
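
The test pair outlined above could look roughly like the following. This is a sketch under stated assumptions, not the code from the gists: it uses the pyzmq binding rather than the C API, it needs a libzmq build with PGM support, and the interface name, multicast groups, port and function names are all hypothetical placeholders.

```python
import time

# Hypothetical placeholders: substitute the real interface, groups and port.
INTERFACE = "eth0"
GROUPS = ["239.192.1.1", "239.192.1.2"]
PORT = 5555


def epgm_endpoint(interface, group, port):
    """Build an endpoint string of the form epgm://interface;group:port."""
    return f"epgm://{interface};{group}:{port}"


def run_publisher(endpoint, interval=1.0):
    """Send one small message every `interval` seconds on a single group."""
    import zmq  # pyzmq, imported here so the helpers above work without it

    ctx = zmq.Context()
    pub = ctx.socket(zmq.PUB)
    pub.connect(endpoint)
    seq = 0
    while True:
        time.sleep(interval)
        pub.send(str(seq).encode())
        seq += 1


def run_subscriber(endpoints, window=10.0):
    """Connect one SUB socket to every endpoint and print msgs/sec per window."""
    import zmq

    ctx = zmq.Context()
    sub = ctx.socket(zmq.SUB)
    sub.setsockopt(zmq.SUBSCRIBE, b"")  # empty prefix: accept every message
    for endpoint in endpoints:
        sub.connect(endpoint)
    while True:
        count = 0
        deadline = time.monotonic() + window
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            # Wait for a message, but never past the end of the window.
            if sub.poll(timeout=int(remaining * 1000)):
                sub.recv()
                count += 1
        print(f"{count / window:.2f} msgs/sec")
```

One process would call `run_publisher(epgm_endpoint(INTERFACE, GROUPS[0], PORT))` and another `run_subscriber([epgm_endpoint(INTERFACE, g, PORT) for g in GROUPS])`. If the subscriber reports about 1 msg/sec the connects are independent; about 2 msgs/sec would support the duplication theory.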
[20:21] drbobbeaty	sustrik: I've got the second gist done and the results are stunning. The gist is: http://gist.github.com/635015 - it's a very simple transmitter that's sending a message a second on one of the 27 multicast channels.
[20:22] sustrik	and?
[20:22] drbobbeaty	When I have the same receiver gist running against this, it returns all kinds of numbers more than 1/sec. But when I comment out all but the one that's actually active, it returns 1/sec - just like you'd expect.
[20:22] sustrik	good
[20:22] drbobbeaty	It really looks like something isn't working on the filtering.
[20:22] sustrik	so we have a reproducible test case
[20:23] drbobbeaty	You can build and run these gists as they have no dependencies other than ZeroMQ.
[20:23] drbobbeaty	Yup, we do.
[20:23] sustrik	can you report the problem on the mailing list, point to the gist etc.?
[20:23] drbobbeaty	You bet. Be glad to.
[20:23] sustrik	great, thanks!