Monday August 16, 2010

[Time] Name  Message
[11:09] pieter_hintjens sustrik: random question
[11:09] sustrik yes?
[11:10] pieter_hintjens can i connect a pull socket to a pub socket?
[11:10] sustrik you can but you should not
[11:10] pieterh i have a problem that this solves very elegantly
[11:10] sustrik ?
[11:10] pieterh in parallel pipeline, workers read off pull sockets
[11:11] pieterh at some point i want to kill them all
[11:11] pieterh broadcast a kill message to them
[11:11] pieterh simplest is that they connect their pull socket to a pub socket on coordinator node
[11:11] pieterh which can then kill them when the batch is done
[11:11] pieterh if i can't make this connection, i need to manage two sockets with poll
[11:12] pieterh which is significantly more work
[11:12] sustrik using same socket to transfer both work and admin messages?
[11:12] pieterh on input only
[11:12] pieterh separate patterns on the topology
[11:12] pieterh i want a single input queue
[11:12] sustrik 1. it wouldn't work
[11:13] sustrik 2. mingling disparate data flows is never a good idea
[11:13] pieterh 1. why not?
[11:13] pieterh 2. it's only fanin of disparate data flows
[11:13] pieterh no different than polling two sockets and reading from each
[11:13] pieterh just more work
[11:14] sustrik because the work won't be load-balanced but sent to all workers
[11:14] sustrik that's what PUB socket does
[11:14] pieterh that is the point
[11:14] pieterh push/pull for the work
[11:14] pieterh pub/sub for the control
[11:15] pieterh except worker merges sub into pull to get a single queue and easier API
[11:15] pieterh conceptually it's two data flows
[11:15] sustrik yes, it's an API problem
[11:15] sustrik you don't want to use zmq_poll
[11:15] pieterh nope
[11:15] sustrik but something simpler
[11:15] pieterh it's too complex for this part of the guide
[11:15] keffo speaking of pub/sub, is there any timeframe for when the filtering will move to the source, if ever?
[11:15] sustrik mixing the messaging patterns is not a solution
[11:16] pieterh i could use a fanin device
[11:16] pieterh keffo: it'll come but it's not priority for now
[11:16] sustrik filtering is at destination atm
[11:16] keffo and what about ipc/inproc support for win platform?
[11:16] pieterh keffo: sorry, there is no clear roadmap for various reasons
[11:17] pieterh we will try to make one to clarify this
[11:17] sustrik no, using a device means mixing the feeds
[11:17] sustrik what you need is a better API
[11:17] keffo it's not a prio for me at the moment though, my workloads are usually far longer than any latency tcp adds..
[11:17] pieterh sustrik: probably, but i'm explaining how to use what we have today... :-)
[11:18] pieterh keffo: in naive tests, tcp and ipc are pretty comparable
[11:18] pieterh the biggest difference is perhaps security, ipc maybe comes from a 'more secure' origin
[11:19] pieterh sustrik: my use case does require mixing the feeds at the worker side
[11:19] sustrik not messing with flows
[11:19] sustrik you may, for example, implement a multi-socket recv:
[11:19] sustrik zmq_recv_multiple (s1, s2, s3, ...);
[11:19] pieterh sure
[11:19] sustrik application obviously has to process the messages from all connected feeds
[11:19] pieterh but connecting a socket to two endpoints is semantically just the same
[11:20] sustrik the point is that the messaging infrastructure should keep them separate
[11:20] sustrik once you mix the feeds you are never going to get them apart
[11:20] sustrik it's asking for trouble in the future
[11:20] pieterh "don't cross the streams!"
[11:21] sustrik yes
[11:21] sustrik divide and conquer!
[11:21] pieterh you mean, specifically, never connect socket types to ones they're not explicitly designed to talk to?
[11:21] sustrik (your streams)
[11:21] pieterh it's a ghostbusters quote
[11:21] sustrik ah, don't recall that one
[11:22] sustrik yes, i mean, pipeline is a pipeline is a pipeline
[11:22] sustrik and distribution tree is a distribution tree
[11:22] sustrik well, frankenstein is frankenstein...
[11:23] pieterh ok, so some random observations on this
[11:23] pieterh 1- there is no checking of this vital rule in the infrastructure
[11:23] pieterh 2- it's not documented anywhere afaics
[11:23] sustrik :(
[11:23] sustrik i know
[11:23] pieterh 3- breaking this rule makes things work nicely in cases
[11:24] pieterh 4- there seems to be no nice alternative in cases
[11:24] pieterh 5- the objections are theoretical, because in fact these sockets are compatible
[11:24] sustrik mixing the patterns breaks scalability
[11:24] sustrik shrug
[11:24] sustrik just don't do that
[11:25] sustrik if you have an API problem solve the API problem
[11:25] pieterh sorry, i'm a writer, not a fighter
[11:25] pieterh i have to mix two streams
[11:25] sustrik it's up to you
[11:25] pieterh if i can't use a fanin model (that would seem to make sense)
[11:25] sustrik but you should not
[11:25] pieterh what can i do, easily?
[11:26] sustrik use zmq_poll
[11:26] pieterh sorry, not 'easy'
[11:26] sustrik that you cannot do it
[11:26] pieterh please provide me a simple, stupid solution
[11:26] sustrik that's what zmq_poll is for
[11:27] pieterh i'm going to explain zmq_poll but it's not something i want to throw into the basic examples
[11:27] sustrik allowing a component to handle multiple streams
[11:27] pieterh yes
[11:27] sustrik ok, think of the same problem with TCP
[11:27] pieterh hang on
[11:27] sustrik i want to get data from 2 TCP connections
[11:27] sustrik therefore i patch the kernel to mix the packets for me
[11:28] pieterh can i recv on both sockets, non-blocking, with small nanosleep?
[11:28] sustrik yes, you can
[11:28] pieterh is that going to break scalability?
[11:28] sustrik but you'll get busy loop
[11:28] sustrik nope
[11:28] pieterh not busy loop, sleeping
[11:28] sustrik spinlocking basically
[11:28] pieterh and when there is work, no sleeping, obviously
[11:28] sustrik it's dirty but it doesn't break anything
[11:29] pieterh why is this dirty, specifically?
[11:29] sustrik i wouldn't use it personally
[11:29] pieterh please don't just say "because poll is cleaner", that's opinion
[11:29] sustrik in real world i mean
[11:29] pieterh technically, what's wrong with it?
[11:29] sustrik increase in latency
[11:30] pieterh only for first message, and it's for heavy parallel workloads
[11:30] sustrik sure, if that's your use case then it's ok
[11:30] pieterh ok, this is good, i have a simple way to work with multiple sockets that can be done 'better' with zmq_poll at a later stage
[11:30] pieterh thanks
[11:30] sustrik np
[11:45] sustrik btw:
[11:49] sustrik AMQP on top of 0MQ?
[12:39] pieterh sustrik: nice find, I've added it to
[13:04] pieterh sustrik: i have a strange case of message loss, can i ask you for help?
[13:04] sustrik sure
[13:04] sustrik what happened?
[13:04] pieterh ok, three programs in a pipeline
[13:04] pieterh
[13:05] pieterh the worker does this 'receive from two sockets' thing
[13:05] pieterh simply creating/binding a second socket causes message loss
[13:05] pieterh if you run taskvent, taskwork, tasksink and then start it, you will see
[13:05] pieterh should do 100 tasks
[13:06] pieterh if the '#if/#endif' in the worker is removed, it does 100 tasks but only 50 get to the sink
[13:07] pieterh if i run the code as its shown in the gist, the worker doesn't get all messages
[13:07] pieterh thx
[13:12] sustrik pieterh: the second program
[13:12] pieterh yes?
[13:12] sustrik you send messages then you exit
[13:12] sustrik those not already sent will be dropped
[13:13] pieterh ah... of course, it'll give EAGAIN and the loop will exit
[13:13] pieterh hang on, no
[13:13] pieterh it loops forever, there's no exit except on real errors
[13:13] sustrik it'll give OK
[13:14] pieterh which is the 'second' program?
[13:14] pieterh taskwork?
[13:14] sustrik "task ventilator"
[13:14] sustrik taskvent.c
[13:14] pieterh right... ok
[13:14] pieterh let me add the necessary sleep to that
[13:15] sustrik ack
[13:15] sustrik this should work when the big patch i worked on these last weeks is merged though
[13:15] sustrik next release presumably
[13:16] pieterh ok, this solves one problem but not the weird one
[13:17] pieterh tasks now get to the worker properly (this happened most of the time anyhow)
[13:17] pieterh the sink only gets half the result messages
[13:17] pieterh 1 in 2 get dropped
[13:18] pieterh it is not systematic :-(
[13:19] pieterh hah!
[13:19] sustrik found it?
[13:19] pieterh well, doing this:
[13:19] pieterh control = zmq_socket (context, ZMQ_SUB);
[13:19] pieterh zmq_connect (output, "tcp://localhost:5559");
[13:19] pieterh zmq_setsockopt (control, ZMQ_SUBSCRIBE, "", 0);
[13:19] pieterh causes outputs to be lost, whereas when that is commented out, it works
[13:20] pieterh systematically, at least on my box
[13:20] pieterh this is in the taskwork program
[13:21] sustrik the control socket is never used in the program, right?
[13:21] pieterh nope
[13:21] pieterh the connect is the critical thing
[13:22] sustrik i mean, the control socket is just opened and never used
[13:22] pieterh yes, that's right
[13:22] sustrik that's strange
[13:22] pieterh if i don't do the connect, it works
[13:23] pieterh the setsockopt() is innocent
[13:23] pieterh it loses exactly 1 message in two
[13:24] sustrik what do the workers say?
[13:24] sustrik are they processing messages?
[13:24] pieterh the workers process 100 tasks and send 100 results
[13:24] pieterh the sink receives 50 results
[13:24] sustrik hm
[13:24] pieterh it looks like a bug in 0MQ, to be honest
[13:25] pieterh let me try opening a 3rd random socket
[13:25] sustrik yes, it does
[13:25] sustrik the workers end up looping forever, right?
[13:25] pieterh hang on... lol
[13:26] pieterh sorry, mea culpa
[13:26] sustrik !!
[13:26] sustrik control = zmq_socket (context, ZMQ_SUB);
[13:26] sustrik zmq_connect (output, "tcp://localhost:5559");
[13:26] sustrik zmq_setsockopt (control, ZMQ_SUBSCRIBE, "", 0);
[13:26] pieterh i'm connecting the /output/ not the /control/ socket!
[13:26] pieterh fairqueuing of results... lol
[13:26] sustrik :)
[13:26] pieterh could 0MQ not detect this kind of misconnection, it's illegal...
[13:26] pieterh perhaps add the peer socket type in opening message
[13:26] sustrik incompatible socket types?
[13:26] pieterh yeah
[13:26] sustrik yes, it should
[13:27] pieterh ok, let's add that to the 3.0 wishlist
[13:27] sustrik ack
[13:27] pieterh nice! thanks, martin
[13:28] sustrik you are welcome
[13:45] pieterh sustrik: just to confirm, it all works perfectly, including reading from two sockets
[13:45] pieterh thanks... :-)
[13:48] sustrik it's part of the user guide or a separate tutorial?
[14:01] pieterh sustrik: this is in the guide
[14:01] pieterh what i might do later is separate the different examples into mini-tutorials
[14:01] pieterh they are fairly self-contained
[14:02] sustrik ok
[14:02] pieterh i'm reorganizing the guide so that it develops more and more complex examples, each time illustrating some aspects
[14:26] sustrik ack
[22:51] pieterh sustrik: not sure if you're still around but on the off-chance...
[22:51] pieterh are the built-in forwarder/streamer devices multipart safe?
[22:51] pieterh it looks like they'll read multipart messages but not send them correctly
[23:30] JohnMcL Can anyone here tell me the scoop on cross-compiling ZeroMQ?
[23:31] JohnMcL Web site makes reference to CMake... but seems to be missing a key config file
[23:36] pieterh JohnMcL: sorry, never tried that