Monday April 5, 2010

[Time] Name  Message
[06:19] Bakafish Hello group. I was hacking on the new "Butterfly" Example code and have an issue where I successfully send a message over an ipc channel but the blocked read side never seems to see it.
[06:26] Bakafish I'm referring to the final 'sync' message indicating that all the test packets were received. The test packets are going across a similar channel just fine. But the final sync packet to trigger the end of the timer is sent and the "Sending" process never seems to see it.
[06:30] sustrik Bakafish: does the packet cross the network?
[06:30] sustrik mikko: re
[06:31] Bakafish It's using ipc (Pipes) so, no :-)
[06:31] sustrik ah
[06:33] Bakafish Is it a situation where the packet is below some threshold so it's being cached in a queue? I'm grasping at straws here.
[06:34] sustrik Bakafish: no
[06:34] sustrik it should be passed immediately
[06:34] sustrik so all the components are running on the same box
[06:35] sustrik using IPC for communication
[06:35] sustrik right?
[06:35] Bakafish I'm getting a good return on the send side. The blocking read blocks correctly, it just never sees the packet.
[06:35] Bakafish That's correct.
[06:36] Bakafish The other, seemingly identical connections are behaving correctly.
[06:36] sustrik is the IPC filename same on the send and recv side?
[06:36] sustrik (sanity check)
[06:38] Bakafish :-) I checked for that, and low and behold, one arg was ipc:///tmp/zmq/send_sync the other ipc://tmp/zmq/send_sync
[06:38] Bakafish Way to introduce myself to the group :-(
[06:39] Bakafish I was ls-ing my tmp directory and saw the 3 expected files, but didn't think to check for a missing slash. Thanks for the help!
[06:40] sustrik no problem :)
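The mismatch above is easy to miss because ipc://tmp/… is still a syntactically valid endpoint; it just names a path relative to the working directory rather than /tmp, so bind and connect quietly end up on different files. A hypothetical sanity-check helper (not part of the 0MQ API) for catching the missing slash:

```python
def check_ipc_endpoint(endpoint: str) -> bool:
    """Return True if an ipc:// endpoint uses an absolute filesystem path.

    "ipc:///tmp/foo" -> path "/tmp/foo" (absolute, what you usually want)
    "ipc://tmp/foo"  -> path "tmp/foo"  (relative: the missing-slash bug)
    """
    prefix = "ipc://"
    if not endpoint.startswith(prefix):
        return False
    return endpoint[len(prefix):].startswith("/")

# The two endpoints from the conversation above:
assert check_ipc_endpoint("ipc:///tmp/zmq/send_sync")       # absolute: ok
assert not check_ipc_endpoint("ipc://tmp/zmq/send_sync")    # relative: the bug
```

Running such a check on every endpoint string before bind/connect turns this class of silent mismatch into an immediate failure.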
[07:29] Bakafish Okay, a question probably rivaling the level of stupidity of my prior one. In the case of more than one agent connecting to a ZMQ_DOWNSTREAM it round robins great, but if an agent goes away it doesn't seem to be aware and packets are seemingly queued or lost for that agent. How do you access the queue registry and manipulate it?
[07:43] sustrik Bakafish: the messages already delivered to an application are lost once the application crashes
[07:43] sustrik you can limit the damage
[07:43] sustrik by setting HWM socket option
[07:43] sustrik which specifies how many messages may be queued at any given moment
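The option being discussed is ZMQ_HWM, set via zmq_setsockopt in 0MQ 2.x. A pure-Python sketch (a simulation only, not libzmq) of the bounding behavior sustrik describes: once the high-water mark is reached, further sends are refused, which caps how many messages can be lost with a dead peer. The exact behavior at the mark (block vs. drop) depends on socket type in real 0MQ.

```python
from collections import deque

class BoundedQueue:
    """Toy model of a per-peer send queue with a high-water mark."""

    def __init__(self, hwm: int):
        self.hwm = hwm
        self.pending = deque()

    def send(self, msg) -> bool:
        """Queue msg; refuse once the high-water mark is reached,
        roughly like a non-blocking send hitting EAGAIN."""
        if len(self.pending) >= self.hwm:
            return False
        self.pending.append(msg)
        return True

q = BoundedQueue(hwm=3)
results = [q.send(i) for i in range(5)]
assert results == [True, True, True, False, False]
assert len(q.pending) == 3   # at most hwm messages sit with (and die with) a peer
```

A smaller HWM limits the damage but also limits how far the sender can run ahead of a slow receiver, which is the performance trade-off mentioned later in the conversation.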
[07:46] Bakafish That's fair. But it's not what I mean. If I launch the 'Sender' with two 'processor' nodes and one 'results' node and run 1000 checks, everything is fine. 500 each. Then I kill one processor node, so there is only a single node in the pipeline. The queue still thinks there are two (timeout not met yet?) and a second run will result in 500 checks sent to the single processor instance.
[07:47] sustrik let me see...
[07:48] sustrik hm, i am not sure how UNIX domain sockets handle application crashes
[07:48] sustrik maybe it takes some time to notify the sender that the receiver is dead?
[07:49] sustrik can you try sleeping for a second after killing the app
[07:49] sustrik ?
[07:49] Bakafish If I have to write my own code to see if a process is alive or not that's fine, but how do I identify the processor nodes attached to a particular queue?
[07:50] Bakafish The server is dispatching these checks and so it has some knowledge of how many there are I'd assume.
[07:50] Bakafish Ahh, this isn't a problem with TCP?
[07:51] Bakafish Meaning, if I was using TCP instead of sockets I shouldn't expect this behavior?
[07:51] sustrik well, i am not sure what's happening, however, the obvious explanation would be that there's a delay between the receiver dying and the sender being notified of the fact
[07:52] Bakafish sockets, bah. Pipes
[07:52] sustrik in the meantime the messages will be dispatched to that receiver
[07:52] sustrik it applies to any communication channel (TCP, IPC, ...)
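The failure window sustrik describes can be sketched as a toy round-robin distributor (an illustration only, not how libzmq is implemented): until the transport reports the disconnect, the sender keeps dispatching to the dead peer, and whatever lands in that peer's queue during the window is lost.

```python
class RoundRobinSender:
    """Toy model of a push-style socket round-robining over peers."""

    def __init__(self, peers):
        self.peers = list(peers)              # peers the sender believes alive
        self.queues = {p: [] for p in peers}  # per-peer outbound queues
        self._next = 0

    def send(self, msg):
        peer = self.peers[self._next % len(self.peers)]
        self._next += 1
        self.queues[peer].append(msg)

    def notice_disconnect(self, peer):
        # Only called once the transport reports the peer is gone --
        # possibly well after the process actually died.
        self.peers.remove(peer)

s = RoundRobinSender(["proc_a", "proc_b"])
# proc_b has crashed, but the sender has not been notified yet:
for i in range(4):
    s.send(i)
assert s.queues["proc_b"] == [1, 3]   # dispatched into the window; lost
s.notice_disconnect("proc_b")
s.send(4)
assert s.queues["proc_a"] == [0, 2, 4]  # everything now goes to the live peer
```

The length of the window is a property of the underlying transport (how quickly TCP or the IPC implementation surfaces the disconnect), which is why the behavior looked the same across channels.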
[07:53] Bakafish I see. I will check to see if it eventually times out. Is there a setting? (I recall someone mentioning there was.)
[07:53] sustrik nope, it's dependent on IPC implementation
[07:53] sustrik there's probably a timeout in the kernel somewhere
[07:56] Bakafish Ahhh, the HWM socket option was what was discussed. I think someone was saying you could limit it, but that there were all sorts of places where messages could be queued along the path, and reducing the number of messages in the queue had performance implications.
[07:58] Bakafish Let me play with it some more since I was using local pipes for convenience, I will be using TCP in production so it may behave better.
[13:34] mikko sustrik: did you notice the issue i opened at github over the hols?
[13:47] sustrik mikko: yes, i did
[14:09] mikko cool
[14:09] mikko i got zmq_poll soon implemented
[14:09] mikko hopefully
[14:15] sustrik mikko: seen you did ~20 commits a day during the holiday :)
[14:21] mikko open source is made by night
[14:21] mikko :)
[14:21] mikko at least in my case
[14:28] sustrik mikko: i've linked the PHP wiki page from the main page
[14:28] mikko sustrik: thanks
[14:28] sustrik you should announce availability of the binding on the mailing list
[14:29] mikko sustrik: i'll announce as soon as the api stabilizes
[14:29] sustrik goodo
[14:29] mikko still need to add polling support
[14:29] sustrik sure
[14:29] mikko after that it's fairly complete
[14:29] mikko some of the tests are failing due to zeromq2 issue #12
[14:32] sustrik i've reproduced the problem
[14:32] sustrik trying to fix it...
[14:32] mikko different errors with different socket types
[14:32] sustrik yes
[14:37] mikko i'm gonna take a short nap, been touring around berlin all day
[14:37] mikko laters
[14:52] sustrik see you