Tuesday February 15, 2011

[Time] Name Message
[08:26] sustrik Guthur: what's pipe ops?
[08:33] CIA-21 zeromq2: 03Michael Compton 07master * rfbe5d85 10/ (AUTHORS doc/zmq_setsockopt.txt):
[08:33] CIA-21 zeromq2: Added note regarding setting sockopt before bind/connect
[08:33] CIA-21 zeromq2: Signed-off-by: Michael Compton <> -
[11:22] cyball hi can i do something like this :: REQ -> XREP->XREQ->SUB with zmq_device::ZMQ_FORWARDER ? i want to publish a message to more than one subscriber :) thx
[11:23] cyball or is it also possible to work with ... REQ->PUB->SUB ?
[11:31] mikko cyball: do you need req sock on client?
[11:32] mikko cyball: you could have PUB on client and use forwarder device
[11:32] mikko client PUB <---> SUB forwarder PUB <---> SUB subscribers
[11:34] cyball mikko, i work on a continuous integration service and i get a post-commit from github, so i can not run the PUB all the time, only on request from github :)
[11:35] mikko cyball: you can have the pub connect to forwarder on post-commit ?
[11:35] mikko connect, publish, go away
[11:36] cyball mikko, ohh ok .. i thought that i have to run the publisher all the time
[11:36] cyball because of the subscribers
[11:36] mikko the subscribers would be connected to forwarder device
[11:36] mikko which can run all the time
[11:37] cyball ok
[11:37] mikko look at the diagram:
[11:37] mikko client PUB <---> SUB forwarder PUB <---> SUB subscribers
[11:37] mikko :)
[11:38] cyball mikko, thx
[11:38] mikko forwarder would run all the time and subscribers would only know that they are connected to it
[11:38] cyball ok
[11:38] cyball i will have a look on it
[11:38] cyball do you have a link to it ?
[11:39] mikko link to where?
[11:39] cyball diagrag :)
[11:39] cyball diagram :)
[11:39] mikko it's that ascii one
[11:39] mikko let me see if zguide has similar with prettier graphics
[11:39] cyball no sorry i thought there is one in the manual i had not seen
[11:41] mikko it's a very simple scenario
[11:41] mikko the publisher client connects to insocket on forwarder and publishes message
[11:41] mikko forwarder then publishes it on outsocket
[11:42] mikko and subscribers are connected to outsocket
[11:42] mikko forwarder runs in the background and subscribers only know about its existence
[11:42] mikko publishers on insocket can come and go as they please
[11:42] cyball mikko, is that ok ?
[11:45] mikko cyball: subscribe to "" on frontend
[11:46] mikko so before the frontend bind
[11:47] mikko zmq_setsockopt(frontend, ZMQ_SUBSCRIBE, "", 0);
[11:47] mikko otherwise your frontend will filter all messages
[12:03] cyball mikko, is that ok for the publisher on the client ?
[12:04] cyball or should i also add some socket options too ?
[12:04] mikko should be fine
[12:19] cyball mikko, ok i have put all pieces together :: i do not see anything on the subscriber i guess there is something i do not see can u have pls a look on it ?
[12:36] mikko cyball: subscribe the sub
[12:36] mikko subscriber.setsockopt(ZMQ_SUBSCRIBE, "", 0);
[12:36] mikko otherwise it will filter all messages
[12:36] mikko i gotta commute to the office
[12:36] mikko back in 30 mins or so
[12:37] cyball thx
[12:42] cyball it does not work :-(
[12:47] cyball sure that i can have a SUB bind on a port and it does not only support connect ?
[12:51] sustrik yes, bind/connect are orthogonal to the socket type
[12:55] cyball sustrik, thx
[12:57] cyball sustrik, can you please have a look on the code ... ?
[12:57] cyball probably there is something missing ... i have added the subscriber.setsockopt(ZMQ_SUBSCRIBE, "", 0);
[12:57] cyball but it also does not work
[12:58] sustrik what version of 0mq are you using?
[12:59] cyball 2.0.10
[13:00] sustrik with 2.0.10 there's no blocking zmq_close()
[13:00] cyball ohhh upps
[13:00] sustrik thus, if there are any queued outbound messages
[13:00] sustrik they are dropped on zmq_close()
[13:00] cyball ok that means i should compile the beta ?
[13:00] sustrik if you need to block till the messages are sent, you'll have to use 2.1.0
[13:01] sustrik yup
[13:01] cyball ok i will do it now :)
[13:06] cyball sustrik, yeahhh it works now THX
[13:06] sustrik np
[13:10] marcinkuzminski Hi, I'm planning a system to do concurrent database inserts that is supposed to be fail safe: when an insert fails, it would retry/remember. Can all that be achieved using zeromq ?
[13:11] sustrik what do you mean by fail-safe?
[13:12] marcinkuzminski sustrik, That i cannot allow a task to be lost.
[13:12] marcinkuzminski i run ~200-300 tasks/s
[13:13] marcinkuzminski let say one out of million fails due to any reason network/db. And i need to retry that task with delay few times, if it fails permanently i need to store and remember that.
[13:14] sustrik then you have to store the task into a database
[13:14] sustrik distributed transactions should be used to pair the task generation with the insertion
[13:14] sustrik so that either both fail or both succeed
[13:14] sustrik it has very little to do with 0mq
[13:15] marcinkuzminski right,
[13:16] marcinkuzminski so it's just about distribution of tasks that zeromq does ?
[13:16] marcinkuzminski sustrik, ok, getting back to reading 0mq manual
[13:16] sustrik right
[13:19] Guthur sustrik: did you catch my messages last night
[13:19] Guthur you can ignore the first suggestion though
[13:19] Guthur I now think the second would be better
[13:22] sustrik Guthur: i missed it
[13:22] sustrik can you explain once more?
[13:30] Guthur basically would it be ok for 0MQ to pass in a custom OVERLAPPED struct with an operation type flag when performing various pipe operations. iocp_t (epoll etc equivalent) would then use this along with poll_entry, which would be set as the completion key, to call to determine which event handlers to call
[13:30] Guthur I also have a question regarding the retired functionality, but I'm not near the code at the moment and can not remember the details
[13:31] Guthur does that make any sense?
[13:34] sustrik Guthur: pass what from where to where?
[13:35] sustrik sorry, i don't follow
[13:36] sustrik afaics, the OVERLAPPED should be part of poll_entry
[13:36] sustrik actually 2 of them
[13:36] sustrik one for write another one for read
[13:37] Guthur when doing ReadFile or WriteFile or ConnectNamedPipe there is an argument for the OVERLAPPED, this will be returned through the IOCP when the op completes
[13:38] sustrik the event in the OVERLAPPED will be signaled when the op completes, right?
[13:39] Guthur you can ignore the event field, not using it
[13:39] sustrik hm
[13:39] Guthur but the OVERLAPPED can be extended, so that it contains custom fields
[13:39] sustrik how are you notified about operation being exectured then?
[13:39] sustrik executed*
[13:40] Guthur the only way is by setting a field in the custom OVERLAPPED struct
[13:40] sustrik i mean, is it a callback or what?
[13:41] Guthur IOCP will return an array of OVERLAPPED_ENTRY structs, these will contain the OVERLAPPED that was passed in when starting the op
[13:41] sustrik which function is that?
[13:42] sustrik the one that returns the ENTRIES
[13:42] Guthur it will also return the Completion Key which you specify when you add the handle to the IOCP
[13:42] Guthur GetQueuedCompletionStatus
[13:42] Guthur GetQueuedCompletionStatusEx actually (for the timeout)
[13:45] Guthur I'm going to grab a coffee, back in a mo
[13:46] sustrik checking the docs
[13:46] sustrik how do you associate a particular read/write request with a specific completion port?
[13:48] zchrish I am testing zmq::poll with a single entry pollitem_t list. After I send a packet, I check whether items[0].revents & ZMQ_POLLIN is "1" and then process the input from REP. But, in my case, it always seems to be set to "1" even though I purposely put a 5 second delay in my XEQ program. Am I doing something wrong?
[13:50] sustrik zchrish: so you get POLLIN even though there is no message available, right?
[13:51] zchrish I think so.
[13:51] sustrik write a minimal test case then and report it as a bug
[13:51] sustrik POLLIN should be signaled only if there's a message available for reading
[13:52] zchrish ok; let me test a minimal case.
[13:52] mikko pieterh_: i've been ripping the guts out from zfl builds
[13:53] pieterh mikko: nice, I think... :-)
[13:53] mikko there were quite a lot of things that didn't seem to be needed
[13:53] mikko like checks for C++ compiler, atomic ops, linking against socket libs etc
[13:53] mikko tested linux, mingw32 and mac os x this far
[13:54] mikko solaris and freebsd to go from platforms i have access to
[13:54] pieterh sounds great
[13:54] mikko also, make check now runs zfl_selftest
[13:54] pieterh is that the normal action, there's no "make test"?
[13:55] mikko make check seems to be default action
[13:55] pieterh how about I give you commit access to the git?
[13:55] pieterh that seems simpler than pull requests
[13:55] pieterh are you committer on zmq?
[13:56] mikko no, im not
[13:56] pieterh how would you like to work? I'm happy giving you commit access
[13:56] mikko well, let's see whether you agree my thinking here:
[13:57] mikko my thinking was to rip out as much as possible to make things maintainable and then fix per platform if there are bugs on let's say very old qnx
[13:57] mikko example:
[13:58] mikko mac os x was set to build without -pedantic even though ZFL builds fine with -pedantic on mac os x
[13:59] pieterh well, you are far more expert in this than me
[13:59] mikko sparc cpu optimization:
[13:59] mikko -mcpu=v9
[13:59] mikko is that really needed?
[13:59] pieterh :-)
[13:59] pieterh I hope you're not asking me
[13:59] mikko i am
[13:59] mikko it's in the build :)
[14:00] pieterh well, mikko, my process is kind of different
[14:00] pieterh copy some code, hack it till it works, forget about it again asap, wait for patches
[14:00] mikko i can agree with that
[14:00] pieterh specifically, for the tooling, which I don't want to be expert in
[14:01] pieterh so most of what is there I copied, and left unchanged because it didn't break things
[14:02] pieterh clearly someone who knows their stuff, like you, would rip most of it out
[14:02] pieterh which is perfect
[14:03] pieterh what's your github id?
[14:03] pieterh plain mikko?
[14:03] mikko mkoppanen
[14:04] Guthur sustrik: the completion key will take care of that
[14:04] pieterh ok, mikko, you are now committer on zfl
[14:04] Guthur it is returned as part of the OVERLAPPED_ENTRY struct
[14:04] mikko cool
[14:04] mikko i'll hopefully finish this during this week
[14:04] Guthur sustrik: oh wait I misread
[14:04] pieterh :-) I'm enormously grateful...
[14:05] mikko got company evening today so might be a bit out of game tomorrow
[14:05] Guthur I think if you had multiple IOCP it would be returned to all
[14:05] pieterh suddenly zfl actually builds properly across more than Ubuntu :-)
[14:05] Guthur That's a guess though
[14:05] sustrik what would that be good for?
[14:05] sustrik strange
[14:06] Guthur sustrik: that comment for me?
[14:06] sustrik yep
[14:06] sustrik if every event is passed to all completion ports
[14:06] Guthur I have to admit I never really considered multiple IOCP
[14:06] sustrik what's the point of having many of them
[14:07] Guthur you associate the pipe handle with the IOCP
[14:07] sustrik ah
[14:07] sustrik how do you do that?
[14:07] Guthur you can associate multiple pipes with the IOCP
[14:07] zchrish so I added code to C++ example from "the guide" to hwclient.cpp and hwserver.cpp to include a single pollitem_t entry. I included a 5 second delay in the hwserver.cpp. It appears the code always enters regardless.
[14:07] Guthur CreateIoCompletionPort
[14:07] Guthur sustrik: ^^
[14:08] Guthur
[14:08] pieterh zchrish: could you post the code in a pastebin somewhere?
[14:08] zchrish Sure, that's next.
[14:08] sustrik Guthur: ok, i see
[14:09] Guthur sustrik: so, can we make it fit?
[14:09] sustrik i think i have a vague idea of how it works now :)
[14:09] mikko isnt ZMQ_SUBSCRIBE an exception to setsockopts before connect/bind?
[14:09] sustrik Guthur: there are few things to keep in mind
[14:09] sustrik mikko: no
[14:09] pieterh sustrik: really?
[14:10] mikko then i found an error on zguide :)
[14:10] sustrik yes, ZMQ_SUBSCRIBE applies to the socket as a whole
[14:10] pieterh you can't add/remove a filter after binding?
[14:10] Guthur i think you can
[14:10] sustrik as opposed to other sockopts that apply only to subsequent connects/binds
[14:10] sustrik pieterh_: yes
[14:10] pieterh sorry, the 'no' part confused us, I think
[14:11] sustrik Guthur: namely, that the messages should be kept in 0mq as long as possible
[14:11] sustrik thus instead of starting asynch writes immediately for any data to send
[14:12] sustrik you should start async send
[14:12] sustrik wait while it completes
[14:12] sustrik then start next send
[14:12] sustrik etc.
[14:12] sustrik the rationale is that if you pushed all the data to the kernel immediately
[14:13] Guthur sustrik: and does 0MQ use the poll to do that subsequent send?
[14:13] sustrik 0mq flow control (such as HWM) won't work
[14:13] sustrik Guthur: all existing polling mechanisms are using sync sends
[14:13] sustrik i.e. they poll for pollout
[14:14] sustrik pollout is signaled if there's a space free in the kernel buffer
[14:14] sustrik then it sends the data
[14:14] Guthur which should be the same as an IOCP completion status signaled for a write op, correct?
[14:14] sustrik yes
[14:15] sustrik it should be same
[14:15] sustrik except that IOCP itself is different from poll
[14:15] sustrik so it'll be a bit complex
[14:15] sustrik but the semantics should be the same, yes
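The "start one send, wait for it to complete, then start the next" rule described above can be sketched without any real I/O. This is only an illustration under assumptions: `paced_sender` and its member names are hypothetical and not part of 0MQ, the queue limit stands in for the HWM, and the actual asynchronous write (for example WriteFile with an OVERLAPPED) is left to the caller.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical helper: keeps messages queued in user space and allows
// at most one asynchronous write to be outstanding at any time.
class paced_sender
{
public:
    explicit paced_sender (std::size_t hwm) : hwm_ (hwm), in_flight_ (false) {}

    // Queue a message. Returns false (pushback) once hwm messages are waiting.
    bool enqueue (const std::string &msg)
    {
        if (queue_.size () >= hwm_)
            return false;
        queue_.push_back (msg);
        return true;
    }

    // If no write is outstanding and a message is queued, copy it to 'out'
    // and return true; the caller would then issue the asynchronous write.
    bool start_next (std::string &out)
    {
        if (in_flight_ || queue_.empty ())
            return false;
        out = queue_.front ();
        queue_.pop_front ();
        in_flight_ = true;
        return true;
    }

    // Call when the completion for the outstanding write is reported.
    void on_send_complete () { in_flight_ = false; }

private:
    std::size_t hwm_;
    bool in_flight_;
    std::deque<std::string> queue_;
};
```

Because only one message at a time is handed to the kernel, everything else stays in the user-space queue, which is where a limit such as the HWM can still be enforced.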
[14:15] Guthur yep, we need to send some op identifying data in the OVERLAPPED,
[14:16] zchrish OK; here is the snippet -
[14:16] Guthur it's the only way we will know what the completion status is being returned for
[14:16] sustrik even better: we can have a single IOCP per poller
[14:17] Guthur but IOCP will return for all ops
[14:17] sustrik yes, but we can place custom data to the result, right?
[14:17] Guthur via a custom OVERLAPPED sure
[14:17] pieterh zchrish: what does the client program print?
[14:18] sustrik then we can identify the socket there as well as operation being performed
[14:18] sustrik something like:
[14:18] sustrik {
[14:18] sustrik HANDLE socket;
[14:18] sustrik bool read_or_write;
[14:18] sustrik }
[14:18] zchrish Just the normal case : Received reply 0: [World]
[14:18] Guthur struct exactly that
[14:19] Guthur well roughly actually
[14:19] Guthur you don't need the Handle though, we can set the completion key to that
[14:19] Guthur it will be returned for all completion status' for that socket
[14:19] zchrish Never goes to "WAITING..."
[14:19] pieterh zchrish: you are doing an infinite timeout in zmq::poll, what else would you expect to see?
[14:19] sustrik Guthur: ok
[14:20] zchrish Sorry, yes.
[14:20] pieterh zchrish: note that timeout is in usec (so use 1000000 for 1 second)
[14:21] Guthur sustrik: a bool wont cut it though, I actually intended to use an enum. There is at least 3 ops that take an overlapped ConnectNamedPipe, Read, write
[14:22] Guthur so the connect will also be returning to the IOCP
[14:25] zchrish So if I want to wrap my code like this, it seems like I should enter "0" and then perform the sleep myself. Is that permissible? Seems so.
[14:27] pieterh zchrish: did you get it working as expected?
[14:27] pieterh e.g. using a zmq_poll timeout of 1 second...
[14:28] pieterh doing the sleep outside zmq_poll is a bad design
[14:29] pieterh imagine your server replies after 1000 usec
[14:29] pieterh your client won't get the response until after 1 full second
[14:30] zchrish yes, it works. Thank you. I used "0". I agree with your assessment.
[14:33] sustrik Guthur: right
[14:36] Guthur cool, I think we are making progress with this. I have some rough code at home, hopefully we can go over it sometime soon
[14:44] zchrish pieterh: So I am playing around with different ways to detect errors in network traffic flow and the one that I seem to feel most comfortable with is the concept of a watchdog thread that monitors socket states. I put my 0mq socket into a thread and, if that thread doesn't reset the state, an alarm goes off. This is the best method I have learned thus far. If there are better ways that you are willing to share, please do so. Thank
[14:44] zchrish you.
[14:44] pieterh zchrish: you cannot share a socket between threads, remember
[14:45] pieterh in general you need specific algorithms for different kinds of failure
[14:45] zchrish pieterh: No, I have a variable that represents the thread state and the thread is responsible for updating that state on a timely basis.
[14:45] mikko zchrish: what does that actually monitor?
[14:45] pieterh you cannot share state between threads either
[14:45] mikko that the thread is not stuck blocking?
[14:47] zchrish Well the idea is to try to verify that the routine is cycling through its while (true) state which I have defined to do so every "x" cycles of time. I want to ensure this is the case.
[14:49] zchrish pieterh: I am referring to "state" in a non-zeromq sense.
[14:49] pieterh zchrish: you are IMO misusing threads quite fundamentally
[14:49] pieterh each thread should be entirely isolated in terms of state, meaning memory
[14:49] pieterh threads should communicate only by sending each other messages
[14:50] pieterh threads should process a set of sockets that they own fully
[14:51] pieterh the only object in ZMQ that's safe to share between threads is the context
[14:51] zchrish Thank you for your feedback; I will think...
[14:53] stimpie zchrish, you could have all your threads send a 'variable' to your watchdog thread using messages
[15:44] sustrik Guthur: still there?
[15:44] Guthur sure
[15:44] sustrik there's one problem with IOCP i haven't realised
[15:45] sustrik namely: how to implement zmq_poll()
[15:45] sustrik ?
[15:45] sustrik given that fd_t will be HANDLE instead of SOCKET
[15:45] sustrik we can't use select() to simulate the polling
[15:45] Guthur yeah, that's something I meant to be asking you
[15:48] Guthur sustrik: though you can use SOCKETS with IOCP, but I assume that's not the issue
[15:48] sustrik the problem is that IPC descriptor *has* to be HANDLE
[15:49] sustrik hm, well
[15:49] sustrik the I/O thread has to poll on both TCP and IPC sockets
[15:50] sustrik zmq_poll has to poll only on the descriptors provided by mailbox_t
[15:50] sustrik currently SOCKET but presumably a HANDLE in the future
[15:51] sustrik could be doable...
[15:51] Guthur yeah, a socket handle can be a file handle so it would make sense to get them all the same
[15:52] Guthur I think IOCP seems quite neat actually
[15:52] Guthur a bit murky at the beginning
[15:53] Guthur one thing though...
[15:53] Guthur I don't think one can remove a handle from an IOCP
[15:53] sustrik the problem with IOCP is that it doesn't provide a sane pushback mechanism
[15:53] Guthur which beggars belief
[15:54] Guthur sustrik: you mean the fact we have to use the overlapped struct to identify etc?
[15:54] sustrik i mean the fact that you can push any amount of data to the socket
[15:55] sustrik without being notified that the TCP buffer is full
[15:57] sustrik there seems to be no equivalent to HWM when using IOCP
[15:57] Guthur sustrik: there is data in the overlapped regarding the amount of data sent
[15:57] Guthur you would probably also have to pass your HWM to make the comparison
[15:58] Guthur that is off the top of my head, so there may be other nicer ways
[15:58] sustrik i don't follow
[15:58] sustrik what data in OVERLAPPED
[15:58] sustrik ?
[15:58] Guthur
[15:58] Guthur internalhigh
[15:59] sustrik "The InternalHigh member was originally reserved for system use and its behavior may change. "
[15:59] Guthur internal might have an error code for full buffer
[15:59] sustrik it's some internal IOCP stuff
[15:59] sustrik better not touch it
[15:59] Guthur no i meant the part that says: The number of bytes transferred for the I/O request. The system sets this member if the request is completed without errors.
[16:00] Guthur we can add our own stuff to the custom overlapped struct so that's not an issue
[16:01] Guthur a custom overlapped could look like follows...
[16:01] Guthur { OVERLAPPED olp; OP_TYPE op; int HWM; }
[16:01] Guthur damn
[16:01] Guthur {
[16:01] Guthur OVERLAPPED olp;
[16:01] Guthur OP_TYPE op;
[16:02] Guthur int HWM;
[16:02] Guthur }
[16:02] Guthur sorry for the spam
[16:02] Guthur that's just a very crude example
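The extended OVERLAPPED idea discussed above can be sketched as follows. This is a hedged illustration, not 0MQ code: `io_request`, `op_type` and `request_from_overlapped` are hypothetical names, the HWM field from the crude example is left out, and on non-Windows builds a stand-in `OVERLAPPED` is declared purely so that the sketch compiles (the real one comes from `<windows.h>`).

```cpp
#include <cassert>
#include <cstddef>

#ifdef _WIN32
#include <windows.h>
#else
// Stand-in only, so the sketch compiles off Windows. The real definition
// lives in <windows.h> and has a different layout.
struct OVERLAPPED
{
    void *Internal;
    void *InternalHigh;
    void *Pointer;
    void *hEvent;
};
#endif

// Hypothetical operation tags; an enum rather than a bool because at least
// connect, read and write all complete through the same port.
enum op_type
{
    op_connect,
    op_read,
    op_write
};

// Extended OVERLAPPED. The system structure is the first member, so a
// pointer to it can be converted back to the enclosing request.
struct io_request
{
    OVERLAPPED olp;
    op_type op;
};

static_assert (offsetof (io_request, olp) == 0,
               "OVERLAPPED must be the first member");

// Recover the request from the OVERLAPPED pointer handed back by the
// completion port (e.g. by GetQueuedCompletionStatus).
inline io_request *request_from_overlapped (OVERLAPPED *olp)
{
    assert (olp != nullptr);
    return reinterpret_cast<io_request *> (olp);
}
```

When starting an operation, `&req.olp` would be passed as the OVERLAPPED argument; when the completion arrives, the returned pointer is mapped back to the request and `op` says which kind of operation finished. The socket or pipe itself can be identified by the completion key, as suggested in the discussion.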
[16:03] sustrik hm, how would you limit the amount of pending outbound data?
[16:15] Guthur hmm, yeah that HWM probably wouldn't help, it was more to explicitly show the custom Overlapped
[16:16] Guthur but I think in terms of what is in epoll etc, we can get that easy enough with IOCP
[16:16] Guthur agree?
[16:16] Guthur I haven't looked much outside that
[16:17] sustrik Guthur: in terms of functionality you can get the same with IOCP
[16:17] sustrik although it requires a bit more work
[16:18] sustrik in terms of performance, there can be problems with IOCP
[16:23] Guthur i though IOCP was pretty performant
[16:23] Guthur thought*
[16:24] sustrik the problem i see is with under-filled outbound TCP buffer
[16:24] sustrik to honour the HWM on the send side
[16:25] Guthur how to the other method facilitate that?
[16:25] Guthur to/do
[16:25] sustrik hm, in theory we can count the number of bytes we've already sent to the socket and haven't seen acknowledgements for
[16:26] sustrik yuck
[16:26] Guthur ah yes I see, pretty yucky
[16:26] Guthur so you get this for free the other ways?
[16:27] sustrik yes, using select/poll/epoll etc.
[16:27] sustrik any sane OS has a mechanism like this
[16:27] sustrik Win32 has select
[16:27] sustrik but unfortunately, it can be used just for SOCKETs
[16:27] sustrik (i.e. not named pipes)
[16:28] sustrik There's WSAPoll btw
[16:28] sustrik but:
[16:28] sustrik "The WSAPoll function is defined on Windows Vista and later."
[16:29] sustrik so the alternative to IOCP would be to use WSAPoll on Vista and Win7
[16:29] Guthur umm, XP is a large chunk of windows to not support
[16:29] sustrik and fall back to select() on XP or somesuch
[16:30] sustrik shrug
[16:30] Guthur also may rule out a lot of windows server
[16:30] Guthur not sure on win server kernel families though
[16:31] sustrik Minimum supported server
[16:31] sustrik Windows Server 2008
[16:32] Guthur umm that's quite modern
[16:32] sustrik damn, it works for SOCKETs only
[16:32] Guthur hehe
[16:32] Guthur red herring then
[16:40] sustrik however, WSAPoll seems to have no limit on number of sockets it can poll on
[16:40] sustrik select is by default limited to 64
[16:41] sustrik so maybe, as a warm up, you could try to modify poll.hpp/poll.cpp to use WSAPoll on windows instead of poll
[16:41] sustrik you would need vista/win7 for that, obviously
[16:43] sustrik WSAPoll looks like pretty close copy of POSIX poll, so it should take some 1 hour to do that...
[16:46] Guthur sustrik: and keep the old version as a fall back?
[16:46] sustrik poll.cpp doesn't compile on windows
[16:46] sustrik there's no poll() function there
[16:46] sustrik so you won't break anything
[16:47] Guthur oh, it uses select instead though, right?
[16:47] sustrik right
[16:47] sustrik you could force 0MQ to compile with poll
[16:48] sustrik by defining ZMQ_FORCE_POLL macro
[16:48] sustrik that would make it use poll.cpp instead of select.cpp
[16:48] sustrik obviously, the build will fail now
[16:48] sustrik but it can be presumably fixed by doing something like this:
[16:48] sustrik #ifdef ZMQ_HAVE_WINDOWS
[16:48] sustrik WSAPoll (...);
[16:48] sustrik #else
[16:49] sustrik poll (...);
[16:49] sustrik #endif
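A slightly fuller version of the #ifdef idea above, as a standalone sketch. Assumptions: it tests `_WIN32` instead of `ZMQ_HAVE_WINDOWS` so that it compiles outside the 0MQ tree, and `poll_fd_t` and `portable_poll` are hypothetical names, not anything from poll.cpp. On Windows the program would also need WSAStartup and Ws2_32.lib.

```cpp
#ifdef _WIN32
#include <winsock2.h>
// WSAPOLLFD mirrors struct pollfd, but its fd member is a SOCKET.
typedef WSAPOLLFD poll_fd_t;
#else
#include <poll.h>
typedef struct pollfd poll_fd_t;
#endif

// Hypothetical wrapper: waits until one of the descriptors is ready or the
// timeout (in milliseconds, -1 blocks indefinitely) expires. Returns the
// number of ready descriptors, 0 on timeout, or a negative value on error.
inline int portable_poll (poll_fd_t *fds, unsigned long count, int timeout_ms)
{
#ifdef _WIN32
    return WSAPoll (fds, count, timeout_ms);
#else
    return poll (fds, static_cast<nfds_t> (count), timeout_ms);
#endif
}
```

The typedef hides the one visible difference between the two APIs, namely the type of the descriptor field, so calling code can fill in `fd`, `events` and read `revents` the same way on both platforms.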
[16:50] Guthur ok, seems a reasonable well contained updated
[16:50] Guthur update*
[16:51] Guthur so would there be performance gains for ZMQ, how does the 64 socket limit affect ZMQ at the moment?
[16:53] sustrik in MSVC build the limit is raised to 1024
[16:53] sustrik still, if 0mq hits the limit it fails
[16:54] sustrik also, poll() should be more efficient with large pollsets than select
[16:56] sustrik Guthur: wait a sec, the current implementation of poll presumes that fd is an int
[16:57] Guthur sustrik: can you point me to the portion of polls which IOCP does supply
[16:57] sustrik which is not true on windows
[16:57] Guthur sorry I'm a little inexperienced with polls and sockets in general
[16:57] sustrik so rewriting the poll wouldn't be that easy
[16:57] sustrik anyway, what's your question?
[16:58] sustrik polling means that you can wait for multiple sockets at once
[16:58] sustrik you wait either for socket becoming readable or socket becoming writeable (or both)
[16:58] sustrik POSIX defines 2 ways of polling : select and poll
[16:59] sustrik different unix flavours provide additional polling mechanisms:
[16:59] sustrik epoll, /dev/poll, kqueue
[16:59] sustrik winapi is, unfortunately, highly inconsistent
[16:59] Guthur and there is something more than the events?
[17:00] sustrik ?
[17:00] sustrik poll() simply exists
[17:00] sustrik when one of the sockets is readable/writeable
[17:00] sustrik it works in the same way as zmq_poll() does
[17:01] sustrik exits*
[17:01] Guthur ok, so the problem is that IOCP only notifies when an operation has completed?
[17:01] sustrik exactly
[17:01] sustrik it's so called AIO
[17:01] sustrik (async I/O)
[17:02] sustrik which is supposed to be better than standard I/O
[17:02] sustrik however, it's not used much
[17:02] private_meta Heya... Small question. Is there a way the server knows when a client disconnects?
[17:02] sustrik linux, for example, never implemented AIO for sockets
[17:02] sustrik private_meta: no
[17:03] private_meta so I'd have to implement some heartbeat and check if it's going through?
[17:04] sustrik it's up to you
[17:04] private_meta Would there be better options?
[17:04] sustrik i personally prefer timing out the request and resending afterwards
[17:05] private_meta Well, I would have needed to know when a connection terminates unexpectedly :/
[17:05] sustrik what does that mean?
[17:06] sustrik network stack has no idea about "connection termination"
[17:07] sustrik the only way to find out whether the other party is alive
[17:07] sustrik is to send it a ping
[17:07] sustrik and wait for a reply
[17:07] private_meta hmm k, thank you
[17:07] sustrik if the reply doesn't arrive in x secs, you say the "connection is broken"
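The "no reply within x secs means the connection is broken" rule can be kept in a small bookkeeping class on the server side. This is a sketch under assumptions: `peer_monitor` and its methods are hypothetical, peers are identified by a plain string, and sending the pings and receiving the replies (for instance over 0MQ sockets with a zmq_poll timeout) is left out.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <vector>

// Hypothetical liveness tracker: remembers when each peer was last heard
// from and reports the ones that have been silent for too long.
class peer_monitor
{
public:
    typedef std::chrono::steady_clock::time_point time_point;

    explicit peer_monitor (std::chrono::milliseconds timeout) :
        timeout_ (timeout)
    {
    }

    // Call whenever a ping reply (or any other message) arrives from a peer.
    void seen (const std::string &peer, time_point now)
    {
        last_seen_ [peer] = now;
    }

    // Returns the peers whose last message is older than the timeout.
    std::vector<std::string> expired (time_point now) const
    {
        std::vector<std::string> dead;
        for (const auto &entry : last_seen_)
            if (now - entry.second > timeout_)
                dead.push_back (entry.first);
        return dead;
    }

private:
    std::chrono::milliseconds timeout_;
    std::map<std::string, time_point> last_seen_;
};
```

The current time is passed in explicitly rather than read inside the class, which keeps the logic easy to exercise with made-up timestamps. What the server does with an expired peer is application-specific.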
[17:07] private_meta Apparently boost asio implemented something like that under the hood
[17:07] sustrik quite possibly
[17:08] private_meta Just to make sure, are timeout mechanisms somehow implemented?
[17:09] sustrik there's timeout parameter in zmq_poll() finction
[17:09] sustrik function
[17:09] private_meta Thank you! I'll try to figure out the rest on my own.
[17:20] pieterh sustrik: I sent an email to the list about releases
[17:22] pieterh private_meta: it kind of depends on the type of work you're doing
[17:23] pieterh e.g. for pub-sub, servers don't even know clients exist
[17:24] pieterh and 0MQ's tcp:// transport is 'disconnected' meaning nodes can go and come back invisibly
[17:24] private_meta pieterh: I need two way communication between a server and multiple clients, and the server needs to be aware of the online status of clients
[17:24] pieterh so you have to define what this means, "online status"
[17:24] pieterh and then you have to explicitly send that to the server from clients
[17:24] pieterh typically it means "alive and kicking", i.e. not frozen, not crashed, not offline
[17:24] private_meta if the network connection between server and client is severed, the server needs to know, that's the basic thing
[17:25] private_meta hmm
[17:25] pieterh right
[17:25] pieterh the other typical problems are looping application threads, CPU overload on client box, etc.
[17:25] pieterh so a heartbeat sent by the main thread in your client is often the best thing
[17:25] private_meta ok
[17:25] pieterh this could be done by certain 0MQ sockets but it would not be fully reliable
[17:26] pieterh i.e. if your main thread looped, heartbeats would still be sent out
[17:26] pieterh also the reaction of your server to a dead client is specific to the use case
[17:26] pieterh if you read the Guide, you'll see an example of "least recently used" routing
[17:26] private_meta Of course. The reaction is already implemented. It's just that we need to switch the underlying server-client-infrastructure
[17:26] pieterh it's quite easy to modify to implement heartbeats
[17:27] pieterh I believe there are more advanced examples that actually do heartbeating
[17:28] pieterh take a look at the peering1/3 examples
[17:29] pieterh well, it's more complex than heartbeating but shows how to handle multiple sockets using zmq_poll
[17:29] pieterh
[17:30] pieterh mikko: there are zfl build failures from Hud^hJenkins, I've fixed that issue
[17:31] private_meta Thanks, I'll look it up
[17:31] private_meta *sigh* it's a pain having to switch to a new library if it's not all too compatible :/
[17:36] sustrik pieterh: thx
[17:36] pieterh private_meta: you can most likely make a decent emulation of your old library
[17:36] pieterh sustrik: let's see what discussion that creates...
[17:37] private_meta pieterh: In some way that is what I want to do. Or let's say need to do.
[17:37] pieterh what is the old library? Boost.asio?
[17:39] private_meta Yes
[17:39] pieterh I'd suggest making that a public project then
[17:39] pieterh shove it on github, announce it on zeromq-dev, get others to help you
[17:40] private_meta But apparently, when compiled with a linux MPI compiler and when using it with MPI commands, it loses messages
[17:40] pieterh private_meta: if your only problem is bugs, that's pretty good
[17:41] private_meta How so?
[17:41] pieterh should be easy to solve, if it's reproducible
[17:41] private_meta Ahahaa... yeah, that's what we thought
[17:42] private_meta before we spent months trying to fix it
[17:42] pieterh months? wow... ok
[17:42] private_meta The OpenMPI project doesn't care and not a single boost or asio developer can help
[17:43] pieterh Can you explain briefly the relationship between boost asio and MPI?
[17:44] pieterh Also, if you get stuck on any 0MQ issue for more than a few... days... come here or to the dev list for help
[17:44] private_meta Well, I don't know exactly what you want detailed. We use boost asio to communicate between a server and clients, while these clients are MPI programs that run parallel code.
[17:44] pieterh i've never used MPI and have only seen boost asio from a distance
[17:44] pieterh does the MPI API call boost asio?
[17:45] pieterh or is the MPI part separate from the boost asio stuff?
[17:45] private_meta Nonono, MPI and Boost asio are not connected in any way except for our code. We use MPI for parallelization.
[17:45] pieterh ok...
[17:45] private_meta But still we need communication from the parallel clients to the server(s)
[17:45] pieterh so your client apps are doing weird multithreading via MPI
[17:46] pieterh and at the same time trying to do sane multithreading via 0MQ at the other side
[17:46] pieterh all in a single process
[17:46] private_meta Somewhat. I'd rather call it multiprocessing
[17:46] pieterh :-) to be able to help, I need to map unfamiliar stuff onto words that make sense in this universe...
[17:47] private_meta We have a cluster with several nodes/servers, and they all need to be communicated with while they do some multicore and multinode crunching
[17:47] private_meta hmm
[17:47] private_meta something familiar...
[17:47] pieterh so a client behaves correctly when it doesn't do any work, and starts to lose messages when it uses MPI...?
[17:47] private_meta well, imagine MPI to be some sort of threading library, where the threads can communicate with each other AND can be on different computers
[17:47] private_meta >_>
[17:47] pieterh sure, like a primitive 0MQ
[17:48] private_meta ok
[17:48] pieterh nah, I'm sure MPI is great, that's not the point
[17:48] pieterh your emulation over 0MQ works until you link clients with MPI, right?
[17:48] private_meta so, we use this setup to execute code on different platforms, like Graphics Cards (CUDA compiler), CPUs (MPI compiler) or IBM Cell Broadband Engine (IBM compiler)
[17:49] pieterh ack, a fairly classic setup IMO
[17:49] private_meta Whenever it's compiled with MPI and used with MPI, several messages are lost
[17:49] pieterh ok
[17:49] pieterh do you have a *minimal* test case that reproduces this?
[17:50] private_meta I'd have to ask my colleague, but he's not here right now
[17:50] pieterh faced with this, what I'd do is:
[17:50] private_meta As far as I know, he created some minimal test case for trying to find the bug
[17:50] pieterh ... ok, more questions
[17:50] pieterh what 0MQ socket types are you using?
[17:51] private_meta slow down, slow down, I'm new to 0MQ, do you mean like "TCP and IPC" or do you mean that "ZMQ_REP" stuff?
[17:51] pieterh :-)
[17:52] pieterh both
[17:52] pieterh socket types means REP/REQ/PUB/SUB/etc.
[17:52] pieterh but I was also going to ask what transports you use (presumably tcp://)
[17:52] private_meta We want to use TCP and IPC
[17:53] pieterh the best way to proceed (and this is for any kind of 0MQ problem you face)
[17:53] pieterh is to make a minimal 0MQ server/client that reproduces the problem
[17:53] private_meta I guess I still need to find a list where the REP/REQ and other 0MQ vocabulary is detailed
[17:53] pieterh and post this somewhere we can look at it (e.g. a gist at github)
[17:53] private_meta uhm...
[17:54] private_meta My problem isn't with 0MQ right now I hope
[17:54] pieterh and you do need to read the Guide
[17:54] private_meta It's getting to simulate what I HAVE (without the error of course)
[17:54] pieterh 90% of the time it's down to some error in how you use 0MQ
[17:54] private_meta yeah, I'm doing that, I was initially coming here to ask about the socket termination issue
[17:55] private_meta It's not like the docs are a 5 minute read :)
[17:55] pieterh 3-4 days, IMO
[17:55] pieterh well, it took longer to write :-)
[17:56] pieterh anyhow, first thing to do is a sanity check of your 0MQ code
[17:56] private_meta I'm sure of that. I try to extract what's necessary, I doubt I need all the details for now (somewhat lazy approach, I know)
[17:56] pieterh we don't care about code that works
[17:56] pieterh so a minimal (totally stripped down) server/client that fails, that we can look at...
[17:56] pieterh if we don't find any errors in that, we can start to blame something else
[17:57] private_meta Well, the usual then :)
[17:57] pieterh right
[17:57] pieterh feel free to email me at if I'm not here when you're ready
[17:57] pieterh or else post to the zeromq-dev list
[17:58] pieterh I assume the problem can be reproduced without exotic hardware?
[18:00] private_meta Somehow it feels like I'm being misunderstood. Until now there's no problem with 0MQ yet, just with Boost Asio, that's why I want to replace Asio with 0MQ, but I just started, so there are no problems, except for the learning curve :)
[18:04] pieterh ah
[18:04] private_meta >_<
[18:04] pieterh see how fast we kill problems with 0MQ!
[18:04] pieterh that took negative 30 minutes
[18:05] pieterh of *course* boost asio is dropping messages
[18:05] private_meta o_O
[18:05] private_meta Well, it shouldn't
[18:05] pieterh sorry for misunderstanding
[18:05] pieterh presumably there is a queue overflow issue or something
[18:08] pieterh So if you want to make a boost asio emulation layer over 0MQ, I'd recommend doing it open source
[18:11] private_meta The amount of "emulation" we need contradicts a full open source emulation... it would be a heck of a lot of work, and I don't have time for that at work >_>
[18:11] private_meta not that it wouldn't be a neat idea
[18:14] pieterh The usual (sane) approach is to make strictly only what you need for your apps, release it, and allow others to expand it
[18:15] pieterh Assuming it's possible to map the subset of boost asio you use to 0MQ
[18:15] private_meta It would be difficult I assume
[18:42] Guthur sustrik, returning to our discussion earlier re: IOCP, would it not be beneficial in the long run, if possible, to have both named pipes and sockets on IOCP, with the aim of removing the need for Select
[18:53] staylor I have a question about zmq sockets, are the underlying sockets maintained or opened/closed on demand?
[18:53] staylor reason I ask is I'd like to know from my application if the client application is currently connected to the server or not, but I don't see any socket status calls in the zmq_socket api
[19:04] cremes pieterh: does anyone with a wiki account have permission to modify the FAQ?
[19:04] cremes pieterh: nm; just answered my own question
[19:13] cremes just updated the FAQ to help people with the assertion in mailbox.cpp:182
[20:40] sejo hey all I'm looking at different solutions, and basically just need an mq that I can use from python,
[20:40] sejo so what would be my advantage using 0mq over rabbitmq or others?
[20:47] Guthur sejo, I really think it depends on use case scenario
[20:48] Guthur they are different beasts, rabbitmq is a broker-based MQ, whereas ZeroMQ is brokerless, for a start
[20:48] Guthur I won't pretend to know much about rabbitmq though
[20:49] Guthur hehe, even my 0MQ knowledge would be on the lighter side compared to some around here
[20:50] sejo ok, well basically in the beginning i probably have only like 10 clients popping items, and the same 10 pushing others onto it
[20:51] sejo as far as I understand now I should write my own protocol(s) and can use them over the clients and the servers. However is it easy to have multiple servers handling the same data?
[20:52] sejo it probably is
[20:52] sejo sorry stupid question
[20:56] sejo my biggest fear is that i'll spend too much time developing on it before I can use it...
[20:56] sejo that's why I'm asking around rather than testing them all out.. don't have the time for it
[20:58] Guthur sure, it's sensible to do research first
[20:59] Guthur scaling to multiple servers would be something that 0MQ can do well
[21:01] Guthur but you don't really get much in the way of 'topic' or 'queue' management out of the box, though there are PUB/SUB sockets
[21:01] Guthur I'm reluctant to give any hard advice though, due to my lack of hardcore experience and knowledge
[21:02] sejo thanks anyway right now I have no knowledge on what to use so
[21:02] Guthur you could glance through the 0MQ guide
[21:02] sejo basically i want multiple servers and n-clients pushing and popping independently
[21:03] Guthur
[21:03] sejo i'm reading through it while we talk :p
[21:03] Guthur ok
[21:03] Guthur hehe cool
[21:03] Guthur there are a few examples in there that could give some inspiration for your particular problem
[21:04] sejo yeah, main thing is that I don't need a real pub/sub, client just chooses when to pop a message
[21:06] Guthur check out the Queue device, which would show a possible multi server pattern
[21:06] Guthur at a very simple level
[21:06] Guthur pieter or sustrik would be better at giving advice than me
[21:07] sejo thx i'll check it out
[21:07] sejo the thing that got me here was the nice looking python api :p
[21:13] Guthur I'm not familiar with the python binding, but yeah I'm sure it's nice, hehe
[21:14] Guthur python has that sort of philosophy, nice simple interfaces
[21:15] sejo well i'll read up on it more, the ventilator example pretty much does what i want, only i have multiple ventilators and each of them has multiple types of messages
[21:16] sejo well no probably i only need one type that works with json
[21:16] Guthur I like JSON, nice format
[21:17] sejo Guthur: thanks for the information, i'll read up on it a bit more and then i'll probably need to choose
[21:17] sejo ttyal
[21:17] sejo gtg
[21:17] Guthur later
[21:17] Guthur ok, drop by later and someone more experienced can give better advice
[21:43] lt_schmidt_jr is gonzalo here perchance
[21:46] whack So, is there no way to bind to a random port? (like binding to port 0)
[21:47] whack I'm not seeing anything obvious in the docs, and attempts to bind to tcp://blah:0 result in an error
[22:46] sustrik lt_schmidt_jr: gonzalo doesn't come here often, you have to use email instead
[22:47] sustrik whack: no there's no way
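Note: a hedged workaround sketch for the "bind to port 0" question above, assuming the pyzmq Python binding. Since the library will not pick a port itself here, the caller can try candidate ports until one binds. The helper name `bind_to_free_port` and the port range are illustrative assumptions, not part of 0MQ; pyzmq ships a similar helper of its own, `Socket.bind_to_random_port`.

```python
import random

import zmq


def bind_to_free_port(sock, host="tcp://127.0.0.1", lo=49152, hi=65535, tries=100):
    """Try random ports in [lo, hi] until bind succeeds; return the port used.

    Hypothetical helper for illustration only.
    """
    for _ in range(tries):
        port = random.randint(lo, hi)
        try:
            sock.bind("%s:%d" % (host, port))
            return port
        except zmq.ZMQError as err:
            # Only "address in use" means "try another port"; anything
            # else is a real error and is re-raised.
            if err.errno != zmq.EADDRINUSE:
                raise
    raise RuntimeError("no free port found after %d tries" % tries)
```

The chosen port then has to be communicated to peers by some other channel, which is the usual cost of not using a fixed port.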
[22:53] lt_schmidt_jr sustrik; we are having an impedance mismatch on our responses, thanks
[23:06] kdj So what is the proper way to make sure that a message is sent to a polling server? Just a response?
[23:07] cremes kdj: i don't understand the question; can you rephrase?
[23:09] kdj Sorry. We have some clients that will occasionally send a short message to a server... but just sending won't error if the server isn't there. I understand why (I think)
[23:10] kdj But I want to make sure the server is there
[23:10] cremes kdj: that's correct; 0mq has no indicator that the server went away
[23:11] cremes you should establish an "ack" that the server should send back; if it times out, the server is dead
[23:11] cremes i recommend polling on req/rep sockets to accomplish this
[23:11] cremes e.g. each client has its own REQ socket; the server has a XREP socket (so that it can respond to multiple clients)
[23:11] kdj You can poll on REQ sockets?
[23:12] kdj Yeah, that is how it is setup now
[23:12] cremes absolutely; send/recv with ZMQ_NOBLOCK
[23:12] cremes and register them with zmq_poll
[23:13] kdj Sending with NOBLOCK isn't actually doing anything with just a normal REQ socket, but it does on receive... is that because I need polling?
[23:14] cremes kdj: well, you don't *need* to send with noblock
[23:14] cremes the basic idea is when your client sends the data, start a timer
[23:15] cremes if the server responds back, cancel the timer
[23:15] cremes if the timer expires, close the req socket
[23:15] cremes none of that needs noblock
[23:15] lt_schmidt_jr to jump in with kdj, when would you use polling vs blocking
[23:15] cremes you will need to poll if your timer and req socket are in the same thread
[23:16] cremes lt_schmidt_jr: like so...
[23:16] cremes if you start your timer and then call recv in blocking mode, how do you handle timer expiration?
[23:16] cremes 1. timer must live in a separate thread or process from the blocking recv
[23:17] lt_schmidt_jr right
[23:17] cremes 2. recv is non-blocking and you use poll to handle the recv; timer is on the same thread
[23:17] cremes those are the 2 ways i would approach
[23:17] cremes i like #2 better
[23:17] kdj Ok. I wasn't sending an acknowledgement from the server originally... just receiving the message and moving on
[23:17] cremes threading gets so messy
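Note: a minimal sketch of approach #2 above (send, then poll with a timeout standing in for the timer, all on one thread), assuming the pyzmq Python binding. The function name `request_with_ack`, the endpoint and the timeout are illustrative assumptions. The `LINGER` option used so that close does not hang belongs to later 0MQ releases than 2.0.x.

```python
import zmq


def request_with_ack(ctx, endpoint, payload, timeout_ms=1000):
    """Send one request and wait up to timeout_ms for the server's ack.

    Returns the reply bytes, or None if nothing came back in time.
    """
    sock = ctx.socket(zmq.REQ)
    sock.setsockopt(zmq.LINGER, 0)  # drop unsent data on close if the server is gone
    sock.connect(endpoint)
    try:
        sock.send(payload)
        poller = zmq.Poller()
        poller.register(sock, zmq.POLLIN)
        # poll() takes the place of the timer: it returns when a reply is
        # readable or when timeout_ms elapses, whichever comes first.
        ready = dict(poller.poll(timeout_ms))
        if ready.get(sock, 0) & zmq.POLLIN:
            return sock.recv()
        return None  # timer expired: treat the server as dead
    finally:
        sock.close()  # a timed-out REQ socket cannot be reused, so close it
```

A caller would treat `None` as "server not there" and retry or report, which matches the "if the timer expires, close the req socket" advice.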
[23:18] cremes kdj: if you were using REQ sockets on the client, the next time you tried to send you would get a EFSM error
[23:18] cremes REQ/REP sockets are strictly stateful; REQ *must* send/recv/send/recv while REP *must* recv/send/recv/send
[23:19] kdj Yeah, that makes sense.
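Note: the strict send/recv alternation on REQ can be seen directly. A small sketch, assuming the pyzmq Python binding; the function name `second_send_errno` and the endpoint are illustrative assumptions, and no server needs to be listening because connect is asynchronous.

```python
import zmq


def second_send_errno(endpoint="tcp://127.0.0.1:5555"):
    """Send twice on a REQ socket without a recv in between.

    Returns the errno raised by the second send, or None if it succeeded.
    """
    ctx = zmq.Context()
    req = ctx.socket(zmq.REQ)
    req.setsockopt(zmq.LINGER, 0)
    req.connect(endpoint)
    try:
        req.send(b"first")  # legal: a REQ socket starts in the "send" state
        try:
            req.send(b"second", zmq.NOBLOCK)  # illegal: a recv must come first
        except zmq.ZMQError as err:
            return err.errno
        return None
    finally:
        req.close()
        ctx.term()
```

Comparing the result with `zmq.EFSM` shows the state-machine error cremes describes.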
[23:21] lt_schmidt_jr hmm, interesting, so I should be able to put multiple sockets with a poller
[23:21] lt_schmidt_jr same poller
[23:23] kdj Hmmm... does 0mq send an acknowledgement automatically?
[23:23] cremes lt_schmidt_jr: yes
[23:23] cremes kdj: no
[23:24] cremes kdj: the heartbeat is an application-level responsibility; your code must process and send the ack
[23:24] cremes you could actually abstract this out into your own private "heartbeat" socket and make it completely transparent
[23:25] kdj Yeah, that totally makes sense... I just threw some code together to test it though and it (sort of) works
[23:26] kdj having a poller on the server end, which just receives messages (no sending), and a client which sends and then receives... somehow the receiving on the client end is still happening (and not blocking)
[23:26] lt_schmidt_jr kdj: for me I am planning to use ZooKeeper, which I have used successfully in a similar way to figure out server presence
[23:27] lt_schmidt_jr in my case to figure out other servers that will form a cluster
[23:27] cremes kdj: print out the data that your client is receiving
[23:27] cremes or run tcpdump and watch the packets fly
[23:27] cremes unless you are issuing a zmq_send() from the server, the client shouldn't be getting a response
[23:27] cremes there has to be code doing that somewhere in your example
[23:28] cremes is it small enough to pastie?
[23:29] lt_schmidt_jr kdj, cremes: you can use
[23:31] kdj Sorry, I think it was just my threading code for testing it. It works as it is supposed to. :X
[23:32] cremes yeah, that's an easy mistake to make
[23:32] cremes take a look at using the "inproc" transport for communicating between threads
[23:32] cremes it obviates the need for mutexes and makes threading code simpler
[23:32] cremes btw, that's one of the great wins of using 0mq; it's a threading library too!
[23:34] lt_schmidt_jr cremes: not to ask a stupid question, but how does one use it for threading - is it in the guide?
[23:36] cremes lt_schmidt_jr: i don't know if it's in the guide; haven't looked lately
[23:36] cremes but here's the basic idea
[23:36] cremes imagine you have 10 threads trying to access a shared resource
[23:36] lt_schmidt_jr right
[23:36] cremes right now you use a mutex, spinlock or some locking structure
[23:36] lt_schmidt_jr ok
[23:37] cremes with 0mq, put the resource that everyone wants into its own thread and give it a XREP socket
[23:37] cremes now make every other thread a "client" of that "server" and give them REQ sockets
[23:37] cremes connect them all together using inproc (all platforms) or ipc (unix only) to communicate so you don't pay the TCP penalty
[23:38] cremes each client "asks" the resource for whatever via the 0mq socket
[23:38] cremes the 0mq socket serializes all access to the resource and prevents all race conditions
[23:38] cremes make sense?
[23:38] lt_schmidt_jr I see
[23:38] lt_schmidt_jr absolutely
[23:38] lt_schmidt_jr thank you
[23:38] cremes this is the basic idea behind Actors if you have played with those in any languages
[23:39] lt_schmidt_jr I have played with erl
[23:39] cremes lt_schmidt_jr: right; instead of using mutexes, you are using *messaging* for your concurrency
[23:39] cremes and here's another cool part of using 0mq
[23:39] lt_schmidt_jr cremes: very cool
[23:40] cremes let's say at some point this "server" resource needs to be on its own box
[23:40] cremes all you have to do to change communications is modify the transport string that you pass to zmq_connect/zmq_bind from inproc (or ipc) to tcp
[23:40] cremes instant scaling
[23:40] cremes i have used this technique many times already; works wonderfully
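Note: a minimal sketch of the pattern cremes describes, assuming the pyzmq Python binding, where XREP is spelled `zmq.ROUTER` (its later name). One thread owns the "resource" (here just a counter) and every other thread reaches it through a REQ socket over inproc. The names `run_counter_demo` and `inproc://counter` are illustrative assumptions.

```python
import threading

import zmq


def run_counter_demo(n_clients=10, endpoint="inproc://counter"):
    """Serialize access to a counter through a socket instead of a mutex."""
    ctx = zmq.Context()
    server = ctx.socket(zmq.ROUTER)
    server.bind(endpoint)  # bind before any client connects

    def resource_owner():
        counter = 0  # only this thread ever touches the counter
        for _ in range(n_clients):
            # A REQ peer arrives at ROUTER as [identity, empty delimiter, body].
            identity, empty, _body = server.recv_multipart()
            counter += 1
            server.send_multipart([identity, empty, str(counter).encode()])

    results = [None] * n_clients

    def client(i):
        req = ctx.socket(zmq.REQ)  # one socket per thread, never shared
        req.connect(endpoint)
        req.send(b"next")
        results[i] = int(req.recv())  # each client writes only its own slot
        req.close()

    threads = [threading.Thread(target=resource_owner)]
    threads += [threading.Thread(target=client, args=(i,)) for i in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    server.close()
    ctx.term()
    return sorted(results)
```

Each client gets a distinct value because requests are handled one at a time by the owning thread. Moving the resource to another box would mean running the owner in its own process and changing `endpoint` to a `tcp://` address on both sides.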
[23:40] lt_schmidt_jr yeah, you would just change the ..
[23:41] lt_schmidt_jr I have prototyped a pub/sub message bus and I have inproc/ipc/tcp going between different participants
[23:42] kdj Hmmm... now I'm not really sure how our original client/server stuff was working...
[23:42] lt_schmidt_jr but I think I am just not treating the threading correctly - too many threads
[23:43] cremes lt_schmidt_jr: you'll have to figure that one out; i'm not a threading expert
[23:45] lt_schmidt_jr cremes: the issue is I have a thread per connection and I still need to use polling to figure out if the thread needs to be shut down
[23:45] lt_schmidt_jr so it's a little ugly
[23:46] cremes i don't understand, but ok
[23:46] lt_schmidt_jr If I block on recv, I am not sure how a subscriber can be interrupted
[23:46] cremes oh, i see
[23:47] cremes are you using 2.0.10 or 2.1.0?
[23:47] lt_schmidt_jr 2.0.1 and Java
[23:47] lt_schmidt_jr 2.0.10
[23:47] cremes um... ok
[23:47] lt_schmidt_jr is there something in 2.1.0 that I should be using?
[23:47] cremes i think your only solution then is to close the entire context via zmq_term()
[23:48] cremes that will cause each socket to awaken and return ETERM
[23:48] cremes everybody should be on 2.1.0 now; the only 2.0.10 users should be legacy guys who *cannot* upgrade for whatever reason
[23:48] cremes so yeah, upgrade
[23:48] lt_schmidt_jr see, I have multiple subscribers within the same context, and only one would need to be terminated
[23:49] cremes yep, terminating the context terminates *all* sockets so that's your only choice there
[23:49] cremes in 2.1.0 i believe you can call zmq_close() on the socket from another thread and it will work as expected
[23:49] lt_schmidt_jr ok, I skipped 2.1.0, because it caused the java binding unit tests to fail
[23:49] cremes yeah, 2.1.0 is considered beta so not everyone has updated their bindings
[23:50] lt_schmidt_jr maybe I should do that myself
[23:50] cremes but it is *way* more stable than 2.0.10 so i would upgrade
[23:50] cremes maybe you could submit a patch to fix the java tests
[23:50] lt_schmidt_jr I submitted the maven fix, should do this as well
[23:51] lt_schmidt_jr so I could close the socket from a different thread, great
[23:52] lt_schmidt_jr I guess I could figure out how to use polling correctly and not have a bunch of threads in the first place
[23:52] lt_schmidt_jr that is have many sockets and a single polling thread
[23:53] lt_schmidt_jr and not have the computer turn into a space heater
[23:53] lt_schmidt_jr will go through the guide
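Note: a hedged sketch of the "many sockets, one polling thread" idea above, assuming the pyzmq Python binding. A separate control socket is one possible way to tell the loop to stop without blocking on any single recv; the function name `poll_until_stopped` and the control-socket convention are illustrative assumptions, not something the log prescribes.

```python
import zmq


def poll_until_stopped(sockets, control, on_message, timeout_ms=100):
    """Service many sockets from one thread until `control` receives a message.

    on_message(sock, data) is called once per ready socket per poll round.
    """
    poller = zmq.Poller()
    poller.register(control, zmq.POLLIN)
    for sock in sockets:
        poller.register(sock, zmq.POLLIN)
    while True:
        # Sleeps in the kernel until something is readable or the timeout
        # elapses, so an idle loop does not spin the CPU.
        ready = dict(poller.poll(timeout_ms))
        for sock in sockets:
            if ready.get(sock, 0) & zmq.POLLIN:
                on_message(sock, sock.recv())
        if ready.get(control, 0) & zmq.POLLIN:
            control.recv()  # any message on the control socket means "stop"
            return
```

The `sockets` list could hold SUB sockets, with the stop message sent from another thread over an inproc PAIR socket. Dropping a single subscriber would then be a matter of unregistering and closing that one socket inside the loop rather than terminating the whole context.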
[23:54] kdj Thanks for your help cremes