Tuesday October 19, 2010

[Time] Name Message
[02:28] driethr hello. do I have to call zmq_setsockopt for ZMQ_IDENTITY before zmq_connect is done? is there no chance to reassign it before sending a message?
[07:43] sustrik driethr: no, no chance
[09:20] omarkj Morning.
[09:24] keffo Morning
[09:24] keffo What is it Sebastian? I'm arranging matches
[09:24] keffo (sorry, felt British!) :)
[09:58] omarkj Haha, it's fine.
[12:43] omarkj Any idea what can cause a message not to get sent on a ZMQ_PUB socket?
[12:43] omarkj When I'm publishing a few thousand messages, I sometimes have to retry sending each one up to ~600 times before they're actually published
[13:42] drbobbeaty Question about an assertion in the code: I'm getting the log message "Assertion failed: *tmpbuf > 0 (zmq_decoder.cpp:60)" -- this is with the 2.0.9 release.
[13:42] drbobbeaty What should I be looking at for the root cause?
[13:43] drbobbeaty I'm using a simple "epgm://" PUB/SUB system and this is on the SUB-side.
[13:47] drbobbeaty The code seems to indicate that there's nothing in the message to decode, but at the same time it has a TODO saying that more work needs to be done on the handling of oversized messages.
[13:48] drbobbeaty So I'm wondering if there are limits I need to adhere to in message sizes when using the epgm:// transport?
[13:53] cremes omarkj: what version of 0mq? what platform (linux, osx, windows)? and do you have a small code example that shows the problem?
[13:55] omarkj Using version 2.10 on Linux (Ubuntu, to be precise). Publishing from an Erlang process.
[13:59] cremes omarkj: it's possible there is a bug in 2.1.0 that you have uncovered; it hasn't been released yet
[13:59] cremes do you see the same problem with 2.0.10?
[13:59] omarkj Ah.
[13:59] omarkj I'll have to try it.
[13:59] cremes that's the last release of the 2.0 series
[14:00] cremes honestly, it's more likely there is a bug in your code; unfortunately, i can't read erlang otherwise i'd lend a hand
[14:00] cremes the most likely culprits are...
[14:00] cremes 1. starting your publisher *before* you start the subscribers; messages sent before a subscriber has connected get dropped due to a timing issue
[14:00] cremes 2. forgetting to call setsockopt(ZMQ_SUBSCRIBE, topic) to set a subscriber topic
[14:01] cremes 3. not sleeping at the end of your publisher's sends; the publisher exits and its queue is released before the subscriber has had time to fetch the messages
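cremes' second point comes down to how SUB-side filtering works: a subscription set with ZMQ_SUBSCRIBE matches any message whose data starts with that prefix, an empty subscription matches every message, and a freshly created SUB socket with no subscriptions receives nothing. The sketch below models that rule in plain C; it is not libzmq's implementation, and the function name and the array-of-subscriptions layout are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of SUB-side filtering (not libzmq code): a message passes if
 * any subscription is a byte prefix of the message data. An empty
 * subscription ("") is a prefix of everything; zero subscriptions match
 * nothing, which is why forgetting ZMQ_SUBSCRIBE looks like lost messages. */
int subscription_matches(const char *const subs[], size_t nsubs,
                         const void *data, size_t size)
{
    assert(nsubs == 0 || subs != NULL);
    for (size_t i = 0; i < nsubs; i++) {
        size_t len = strlen(subs[i]);
        if (len <= size && memcmp(subs[i], data, len) == 0)
            return 1;
    }
    return 0;
}
```

memcmp rather than strncmp is used so the check also works for binary message data that may contain zero bytes.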
[14:01] Andreas Hi everybody!
[14:02] Andreas I got a question regarding pub/sub. Maybe somebody can help me with that ....
[14:02] Andreas Is there any possibility to get notified when a subscriber disconnects?
[14:02] guido_g no
[14:03] guido_g you have to do it yourself
[14:03] Andreas is something planned for future releases?
[14:04] omarkj cremes: I guess I'm not sleeping for a few ms after the send..
[14:05] cremes omarkj: try that and see what happens... sleep for 10s just for kicks
[14:06] omarkj Why is that needed by the way? To flush or something like that?
[14:17] cremes omarkj: the publisher would exit before the subscriber could process all of the messages; that's all
[14:17] cremes omarkj: read the guide if you haven't yet; it answers pretty much all of the basics:
[14:22] sustrik drbobbeaty: are you using multi-part messages?
[14:22] omarkj No, I have; just wondering.
[14:23] drbobbeaty No, I'm just making a single message and sending it. If it's multi-part, then that's happening by default, since I'm using a very simple interface to ZeroMQ.
[14:23] sustrik hm, it looks like a bug, can you provide a minimal test case to reproduce it?
[14:24] drbobbeaty I'm actually going to try 2.0.10 now as I just noticed that it's been released. If that has the same issue, I'll send something to the mailing list.
[14:25] sustrik drbobbeaty: thanks
[14:27] sustrik omarkj: with 2.1 the sleep at the end of the program should not be needed
[14:27] sustrik zmq_term just blocks until all outbound messages are sent
[14:27] omarkj Okay.
[14:28] sustrik omarkj: do you have the ZMQ_HWM option set?
[14:29] omarkj sustrik: Yup, it's set at one. Maybe raise that.
[14:29] omarkj I don't remember if I tried doing that.
[14:29] sustrik that's the problem
[14:29] sustrik if you have a buffer of size 1
[14:29] sustrik it can store 1 message
[14:29] pieterh omarkj: yes, you're basically just dropping stuff at the pub side
[14:29] omarkj Okay.
[14:29] pieterh HWM=1 is for specialized request-reply cases or if you are using SWAP
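Why HWM=1 loses most of a burst can be shown with a toy model in plain C (not libzmq's pipe code): a PUB-side queue accepts messages until it holds `hwm` of them and drops the rest, and, as with ZMQ_HWM in 0MQ 2.x, `hwm == 0` means no limit. The drain pattern, a subscriber that catches up in batches of `drain_every` messages, is a made-up assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of one PUB-side pipe (not libzmq code). As with ZMQ_HWM in
 * 0MQ 2.x, hwm == 0 means "no limit"; otherwise a full pipe drops. */
typedef struct {
    size_t hwm, queued, accepted, dropped;
} pipe_model;

static void pipe_send(pipe_model *p)
{
    if (p->hwm == 0 || p->queued < p->hwm) {
        p->queued++;
        p->accepted++;
    } else {
        p->dropped++;   /* PUB sockets drop rather than block at the HWM */
    }
}

/* Send `total` messages back to back; the hypothetical subscriber drains
 * up to `drain_every` queued messages after every `drain_every` sends, so
 * it keeps up on average, but only if the queue can absorb each batch. */
size_t dropped_in_burst(size_t hwm, size_t total, size_t drain_every)
{
    assert(drain_every > 0);
    pipe_model p = { hwm, 0, 0, 0 };
    for (size_t i = 0; i < total; i++) {
        pipe_send(&p);
        if ((i + 1) % drain_every == 0)
            p.queued = p.queued > drain_every ? p.queued - drain_every : 0;
    }
    return p.dropped;
}
```

Under these assumptions, `dropped_in_burst(1, 1000, 10)` drops 900 of the 1,000 messages, while `dropped_in_burst(10000, 1000, 10)` drops none.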
[14:30] omarkj I see, silly me. I'll give it a try.
[14:30] pieterh just remove all HWM settings
[14:30] omarkj I was having problems with the server crashing badly after a while when I did that.
[14:30] pieterh then remove IDENTITY settings on the subscribers
[14:31] pieterh read the section on durable subscribers in the guide
[14:31] pieterh if you create durable subscribers you must set a HWM; something like 10,000 is reasonable, depending on message size and rate
[14:31] pieterh actually with durable subscribers the server will eventually crash anyhow since there's no concept of ending a durable subscriber
[14:32] sustrik even if there's no identity set
[14:32] sustrik if the sender is faster than the subscriber
[14:32] pieterh yes, indeed
[14:32] sustrik the buffer would eventually grow out of memory
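The two failure modes pieterh and sustrik describe can be put into rough numbers. With a HWM, the backlog in one subscriber's pipe is bounded by HWM x message size; without one, a backlog growing at (send rate - receive rate) x message size bytes per second eventually reaches any memory limit. The helpers below do that arithmetic while ignoring per-message overhead; the rates, sizes and limit in the example are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on the bytes queued in one pipe whose HWM is `hwm`
 * (ignores per-message overhead). */
uint64_t hwm_backlog_bytes(uint64_t hwm, uint64_t msg_bytes)
{
    return hwm * msg_bytes;
}

/* Seconds until an unbounded backlog reaches limit_bytes, or UINT64_MAX
 * if the subscriber keeps up and the queue never grows. */
uint64_t seconds_to_limit(uint64_t send_rate, uint64_t recv_rate,
                          uint64_t msg_bytes, uint64_t limit_bytes)
{
    assert(msg_bytes > 0);
    if (send_rate <= recv_rate)
        return UINT64_MAX;
    uint64_t growth_per_sec = (send_rate - recv_rate) * msg_bytes;
    return limit_bytes / growth_per_sec;
}
```

For example, a publisher sending 5,000 msgs/sec to a subscriber that handles 4,000, with 100-byte messages, adds 100,000 bytes of backlog per second, so `seconds_to_limit(5000, 4000, 100, 1000000000)` returns 10,000 seconds (under three hours) to reach 1 GB, whereas `hwm_backlog_bytes(10000, 100)` caps that pipe at 1,000,000 bytes.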
[14:32] pieterh what the guide says is "set HWM in a serious publisher"
[14:33] pieterh but not 1 :-)
[14:33] omarkj Haha, yes, I must have changed it to one during some bug hunting or something.
[15:52] drbobbeaty Question: I have a user that's trying to run/debug a ZeroMQ app I've written in NetBeans 6.7.1. In gdb it runs fine, but in NetBeans it doesn't. I've never used NetBeans and can't really help because it works in the shell and in gdb. Has anyone ever heard of any issues with ZeroMQ and NetBeans? Assuming this guy can run it in gdb...
[15:54] DerGuteMoritz AFAIK IDEs like NetBeans tend to mess with PATH
[15:54] DerGuteMoritz maybe he needs to change some setting first
[15:54] DerGuteMoritz I bet this is on Windows?
[16:58] drbobbeaty No, NetBeans on Linux
[16:59] drbobbeaty Seems to have a decent path as it's starting to run, and the LD_LIBRARY_PATH is set, but thanks for the idea. We'll have to keep checking things.
[19:19] sustrik drbobbeaty: hi
[19:19] sustrik still there?
[19:19] drbobbeaty Yup
[19:19] sustrik i am not sure i understand your use case exactly
[19:19] drbobbeaty OK... I'll explain.
[19:19] sustrik how many publisher sockets are there?
[19:20] drbobbeaty At the current time - the network will have probably 270. Not all on one process - maybe 4 to 10 per process.
[19:21] sustrik i mean in the test where you are seeing the problem
[19:21] drbobbeaty The idea is to distribute the exchange tick data on different multicast addresses to allow the switches to "squelch" the traffic if it's not subscribed for.
[19:21] drbobbeaty In the test, there are four open publishers.
[19:21] drbobbeaty Two of which are lines 80 and 81 in the gist.
[19:21] sustrik four PUB sockets?
[19:21] drbobbeaty Yup.
[19:21] drbobbeaty in one process.
[19:22] drbobbeaty In the other process, there is one SUB socket with 27 "connections"
[19:22] sustrik how many connects/binds on each PUB socket?
[19:23] drbobbeaty One socket - One connect. It's using epgm:// so I didn't think I needed a bind() - at least I didn't see it in the examples.
[19:23] sustrik sure
[19:23] sustrik so each PUB socket connects to a different multicast group, right?
[19:24] drbobbeaty Yup - exactly right.
[19:24] sustrik so you have 4 PUB sockets and 4 multicast groups
[19:24] drbobbeaty Yes... in the transmitter process.
[19:24] sustrik good
[19:25] sustrik now, in the gist i see you connect SUB socket to ~20 multicast groups
[19:25] sustrik meaning 16 of them are idle
[19:25] sustrik right?
[19:25] drbobbeaty Actually, of the 4 PUB sockets in the transmitter, only two are listed in the gist - so in the gist example, 2 of the 27 are expecting traffic.
[19:26] sustrik ok
[19:26] sustrik there's only one SUB socket, right?
[19:26] drbobbeaty Right.
[19:26] sustrik now, each PUB socket transmits 4000 msgs/sec, right?
[19:26] drbobbeaty approximately, yes.
[19:27] sustrik so we have ~16000 msgs/sec on the wire
[19:27] sustrik of which the sub socket should retrieve 8000/sec
[19:27] sustrik now, what are you seeing?
[19:28] sustrik 800,000 msgs/sec?
[19:28] drbobbeaty To be honest, the PUB sockets aren't all the same at 4000 msgs/sec - two of them (the ones I'm NOT listening to in the gist) are much less... So I'd say we're looking at 8000 msgs/sec I should see.
[19:29] sustrik ok
[19:29] drbobbeaty With the gist example I'm seeing 200k - 700k msgs/sec.
[19:29] drbobbeaty in my latest tests.
[19:29] sustrik that's number of successful recvs on the SUB socket, right?
[19:29] sustrik per second
[19:29] drbobbeaty Yup.
[19:30] drbobbeaty It varies a lot as it's live exchange data.
[19:30] sustrik so let's say each "connect" gets all the data sent
[19:31] drbobbeaty But when I edit the gist to only connect to the URLs in line 80 and 81, the numbers line up very close.
[19:31] sustrik that's 16000 msgs/sec * 27
[19:31] sustrik 432000 msgs/sec
[19:31] sustrik hm
[19:31] drbobbeaty OK, I'm with you... Yeah... and I didn't see it being a linear multiplication either.
[19:32] sustrik what's the interval for calculation of throughput?
[19:32] sustrik one second?
[19:32] drbobbeaty Typically 10 sec on the transmitter and about 1 sec on the receiver.
[19:32] sustrik ok, so the variation may be caused by small sample interval on the receiver
[19:33] drbobbeaty Yeah, easily could be.
[19:33] sustrik can you use a larger interval?
[19:33] drbobbeaty Yeah.
[19:33] sustrik great
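On the sample-interval point: with bursty live data, a 1-second window reports whatever burst happened to land in it, while a 10-second window averages the bursts out. A minimal sketch of a trailing-window rate computed from per-second receive counts; the counts in the example are hypothetical, not drbobbeaty's measurements.

```c
#include <assert.h>
#include <stddef.h>

/* Average msgs/sec over the `window` seconds ending at index `end`
 * (inclusive) of a series of per-second receive counts. */
double window_rate(const unsigned counts[], size_t end, size_t window)
{
    assert(window > 0 && window <= end + 1);
    unsigned long total = 0;
    for (size_t i = end + 1 - window; i <= end; i++)
        total += counts[i];
    return (double)total / (double)window;
}
```

With per-second counts alternating 0, 20, 0, 20, ..., a 1-second window reports either 0 or 20 msgs/sec, while the 10-second window ending at the last sample reports 10.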
[19:34] drbobbeaty Increased it to 10 sec on the receiver... still seeing 92k msgs/sec received versus 4k msgs/sec sent.
[19:35] sustrik 92000, ok
[19:35] sustrik 92000 / 27 = 3400
[19:36] drbobbeaty It's close if you do the division...
[19:36] sustrik that's kind of close to the sending rate on a single pub socket
[19:36] drbobbeaty Yup. Agreed.
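The back-of-the-envelope check in this exchange compares the observed receive rate against the hypothesis that each of the 27 connects on the SUB socket delivers a copy of everything on the wire: 16,000 msgs/sec x 27 = 432,000 msgs/sec in the worst case, and 92,000 / 27 is about 3,407 msgs/sec per connect, close to the rate of a single busy PUB socket. The same arithmetic as two small helpers (the names are illustrative):

```c
#include <assert.h>

/* Receive rate if every one of `connects` connects delivered all the
 * traffic on the wire (the "filtering is broken" hypothesis). */
unsigned long rate_if_unfiltered(unsigned long wire_rate, unsigned connects)
{
    return wire_rate * connects;
}

/* Observed receive rate spread evenly across the connects. */
double rate_per_connect(double observed_rate, unsigned connects)
{
    assert(connects > 0);
    return observed_rate / connects;
}
```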
[19:36] drbobbeaty Which is why I wondered if I was doing something wrong and publishing the same message on ALL URLs and getting duplicates that way.
[19:37] drbobbeaty But I didn't see how, either in the code or in the logging of the connections.
[19:37] sustrik if you turn the unused publishers off, does it make a difference in throughput on the SUB side?
[19:37] drbobbeaty If they are unused, they are never turned "on" - the default is to be OFF until needed. So it doesn't affect this.
[19:38] drbobbeaty ...on the PUB side.
[19:38] drbobbeaty On the SUB side, if I turn off the unused ones, the numbers line up very closely.
[19:38] sustrik how do you do that?
[19:38] sustrik there's some OOB channel to inform publishers to start/stop?
[19:39] drbobbeaty On the PUB side, I look at the message from the exchange - is it a quote, is it a trade, what symbol is it for - I use these to "classify" the message into 1 of the 270 multicast channels.
[19:39] drbobbeaty If the PUB channel isn't open, a socket is created, the connection to the correct URL is made, and the message is sent.
[19:39] drbobbeaty "On Demand" sockets and connections on the PUB side.
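drbobbeaty's on-demand scheme (classify each exchange message into one of 270 multicast channels, then create and connect a PUB socket the first time a channel is needed) might look roughly like the sketch below. The log does not give the real classification rules, so splitting the channels into 135 for quotes and 135 for trades, hashing the symbol with FNV-1a, the `eth0` interface and the `epgm://eth0;239.192.x.y:5555` addressing plan are all hypothetical; socket creation is reduced to a flag where a real publisher would call zmq_socket and zmq_connect.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_CHANNELS 270                /* channel count from the log */
#define PER_KIND (NUM_CHANNELS / 2)     /* hypothetical: quotes | trades */

typedef enum { MSG_QUOTE = 0, MSG_TRADE = 1 } msg_kind;

/* FNV-1a hash of the symbol (hypothetical choice of hash). */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Map (kind, symbol) to a channel: quotes in [0,135), trades in [135,270). */
int classify(msg_kind kind, const char *symbol)
{
    return (int)(kind * PER_KIND + fnv1a(symbol) % PER_KIND);
}

/* Hypothetical addressing plan on an assumed interface "eth0":
 * channel 0 -> 239.192.1.1, channel 250 -> 239.192.2.1. */
void channel_endpoint(char *buf, size_t n, int ch)
{
    assert(ch >= 0 && ch < NUM_CHANNELS);
    snprintf(buf, n, "epgm://eth0;239.192.%d.%d:5555",
             1 + ch / 250, 1 + ch % 250);
}

/* Channels are "opened" lazily; a real publisher would create a ZMQ_PUB
 * socket and connect it to channel_endpoint() where opened[ch] is set. */
typedef struct {
    unsigned char opened[NUM_CHANNELS];
    int open_count;
} publisher;

void publisher_init(publisher *p) { memset(p, 0, sizeof *p); }

/* Returns the channel a message goes to, opening it on first use. */
int route_message(publisher *p, msg_kind kind, const char *symbol)
{
    int ch = classify(kind, symbol);
    if (!p->opened[ch]) {
        p->opened[ch] = 1;
        p->open_count++;
    }
    return ch;
}
```

Since a subscriber cannot know which of these channels are active, it has to connect to whole ranges of endpoints, which matches drbobbeaty's description of the SUB side listening to large sections of the multicast space.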
[19:40] drbobbeaty The SUB side can't know what's "active", so it has to listen to large "sections" of the multicast space.
[19:40] sustrik what i am trying to figure out is whether there are 2 sockets publishing or 4
[19:40] sustrik in the test scenario
[19:41] drbobbeaty The transmitter is publishing on 4. Two of which are in the gist code, and two are not. The two that are in the gist code are the "busy" ones.
[19:42] drbobbeaty Essentially, the gist test receiver is listening to a part of what the transmitter is sending. But it's also listening to a lot of "dead channels".
[19:42] sustrik ok, but there are 4 sockets pushing data to the wire
[19:42] drbobbeaty Yup.
[19:42] sustrik thus overall load on the wire is ~16000 msgs/sec
[19:42] sustrik good
[19:43] sustrik if you stop the two publishers that nobody listens to
[19:43] sustrik does it change the throughput on the receiver?
[19:43] drbobbeaty Hmmm... I don't know. I can try that. I'll do that now.
[19:49] drbobbeaty Lots of fluctuation in the market now, but it appears that the numbers sent and received are nearly the same as before. This, I believe, is due to the different loads on the 4 multicast channels. The two I'm listening to are Quotes - very high volume. The two I'm not listening to, and have now turned off, are Trades - very low volume. If I recall correctly, the ratio is somewhere in the 200:1 range, meaning there are roughly 200 Quote messages for every Trade message.
[19:51] sustrik ok, anyway
[19:52] sustrik it looks like the SUB socket is getting the data from each connect even though 25 of them should get nothing
[19:52] drbobbeaty That's my theory.
[19:52] sustrik it looks like the filtering is broken
[19:52] sustrik would it be possible to write a simple test program
[19:53] sustrik say a publisher that sends 1 message/sec
[19:53] sustrik and a subscriber that would connect to two different mcast groups
[19:53] sustrik and check whether we get 1 or 2 msgs/sec on the SUB side?
[19:54] sustrik if we get 2, the theory is proven
[19:56] drbobbeaty I can write that, sure. I'll use the same gist receiver but I'll make a simple transmitter and have it transmit on 1 or 2 of the multicast channels at a regular interval, and we'll see.
[19:56] drbobbeaty It'll probably take me until tomorrow morning, but I'll do it and post it to the mailing list with the results.
[19:57] drbobbeaty Sounds OK?
[19:57] sustrik sure
[19:57] sustrik just keep the code as simple as possible
[19:57] sustrik while (1) {
[19:57] sustrik sleep (1);
[19:57] sustrik zmq_send (msg);
[19:57] sustrik }
[19:57] sustrik something like that
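A slightly fuller version of sustrik's loop, as a sketch: it paces sends with POSIX nanosleep and stops after a fixed count instead of running forever. To keep it self-contained, the real zmq_send call is replaced by a callback; `send_fn`, `paced_publish` and the dry-run `count_send` helper are hypothetical names, and in the actual test program the callback would send one small message on a PUB socket connected to a single multicast group.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for "send one message" (e.g. a zmq_send wrapper). */
typedef void (*send_fn)(void *ctx, unsigned long seq);

/* Call send_one() `count` times, sleeping interval_ms between sends;
 * interval_ms = 1000 gives sustrik's one message per second. */
unsigned long paced_publish(unsigned long count, long interval_ms,
                            send_fn send_one, void *ctx)
{
    assert(send_one != NULL && interval_ms >= 0);
    struct timespec gap = { interval_ms / 1000,
                            (interval_ms % 1000) * 1000000L };
    unsigned long sent = 0;
    for (unsigned long seq = 0; seq < count; seq++) {
        if (seq > 0)
            nanosleep(&gap, NULL);
        send_one(ctx, seq);
        sent++;
    }
    return sent;
}

/* Dry-run callback (hypothetical helper): counts calls instead of sending. */
void count_send(void *ctx, unsigned long seq)
{
    (void)seq;
    ++*(unsigned long *)ctx;
}
```

`paced_publish(60, 1000, cb, ctx)` would send one message per second for a minute; on the receiving side, seeing about one message per second means the filtering works, while about two per second (with the SUB socket connected to two groups) supports the theory.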
[19:58] drbobbeaty Yup, that's what I was planning. Very simple.
[19:58] sustrik ack
[19:58] sustrik ok, see you
[20:21] drbobbeaty sustrik: I've got the second gist done and the results are stunning. The gist is: - it's a very simple transmitter that's sending a message a second on one of the 27 multicast channels.
[20:22] sustrik and?
[20:22] drbobbeaty When I have the same receiver gist running against this, it returns all kinds of numbers more than 1/sec. But when I comment out all but the one that's actually active, it returns 1/sec - just like you'd expect.
[20:22] sustrik good
[20:22] drbobbeaty It really looks like something isn't working on the filtering.
[20:22] sustrik so we have a reproducible test case
[20:23] drbobbeaty You can build and run these gists as they have no dependencies other than ZeroMQ.
[20:23] drbobbeaty Yup, we do.
[20:23] sustrik can you report the problem on the mailing list, point to the gist etc.?
[20:23] drbobbeaty You bet. Be glad to.
[20:23] sustrik great, thanks!