Monday April 5, 2010

[Time] Name  Message
[06:19] Bakafish Hello group. I was hacking on the new "Butterfly" Example code and have an issue where I successfully send a message over an ipc channel but the blocked read side never seems to see it.
[06:26] Bakafish I'm referring to the final 'sync' message indicating that all the test packets were received. The test packets are going across a similar channel just fine. But the final sync packet to trigger the end of the timer is sent and the "Sending" process never seems to see it.
[06:30] sustrik Bakafish: does the packet cross the network?
[06:30] sustrik mikko: re
[06:31] Bakafish It's using ipc (Pipes) so, no :-)
[06:31] sustrik ah
[06:33] Bakafish Is it a situation where the packet is below some threshold so it's being cached in a queue? I'm grasping at straws here.
[06:34] sustrik Bakafish: no
[06:34] sustrik it should be passed immediately
[06:34] sustrik so all the components are running on the same box
[06:35] sustrik using IPC for communication
[06:35] sustrik right?
[06:35] Bakafish I'm getting a good return on the send side. The blocking read blocks correctly, it just never sees the packet.
[06:35] Bakafish That's correct.
[06:36] Bakafish The other, seemingly identical connections are behaving correctly.
[06:36] sustrik is the IPC filename same on the send and recv side?
[06:36] sustrik (sanity check)
[06:38] Bakafish :-) I checked for that, and low and behold, one arg was ipc:///tmp/zmq/send_sync the other ipc://tmp/zmq/send_sync
[06:38] Bakafish Way to introduce myself to the group :-(
[06:39] Bakafish I was ls-ing my tmp directory and saw the 3 expected files, but didn't think to check for a missing slash. Thanks for the help!
[06:40] sustrik no problem :)
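The mismatch above is easy to miss because ipc://tmp/… is still a syntactically valid endpoint; it just names a path relative to the working directory rather than /tmp, so bind and connect quietly end up on different files. A hypothetical sanity-check helper (not part of the 0MQ API) for catching the missing slash:

```python
def check_ipc_endpoint(endpoint: str) -> bool:
    """Return True if an ipc:// endpoint uses an absolute filesystem path.

    "ipc:///tmp/foo" -> path "/tmp/foo" (absolute, what you usually want)
    "ipc://tmp/foo"  -> path "tmp/foo"  (relative: the missing-slash bug)
    """
    prefix = "ipc://"
    if not endpoint.startswith(prefix):
        return False
    return endpoint[len(prefix):].startswith("/")

# The two endpoints from the conversation above:
assert check_ipc_endpoint("ipc:///tmp/zmq/send_sync")       # absolute: ok
assert not check_ipc_endpoint("ipc://tmp/zmq/send_sync")    # relative: the bug
```

Running such a check on every endpoint string before bind/connect turns this class of silent mismatch into an immediate failure.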
[07:29] Bakafish Okay, a question probably rivaling the level of stupidity of my prior one. In the case of more than one agent connecting to a ZMQ_DOWNSTREAM it round robins great, but if an agent goes away it doesn't seem to be aware and packets are seemingly queued or lost for that agent. How do you access the queue registry and manipulate it?
[07:43] sustrik Bakafish: the messages already delivered to an application are lost once the application crashes
[07:43] sustrik you can limit the damage
[07:43] sustrik by setting HWM socket option
[07:43] sustrik which specifies how many messages may be queued at any given moment
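The option being discussed is ZMQ_HWM, set via zmq_setsockopt in 0MQ 2.x. A pure-Python sketch (a simulation only, not libzmq) of the bounding behavior sustrik describes: once the high-water mark is reached, further sends are refused, which caps how many messages can be lost with a dead peer. The exact behavior at the mark (block vs. drop) depends on socket type in real 0MQ.

```python
from collections import deque

class BoundedQueue:
    """Toy model of a per-peer send queue with a high-water mark."""

    def __init__(self, hwm: int):
        self.hwm = hwm
        self.pending = deque()

    def send(self, msg) -> bool:
        """Queue msg; refuse once the high-water mark is reached,
        roughly like a non-blocking send hitting EAGAIN."""
        if len(self.pending) >= self.hwm:
            return False
        self.pending.append(msg)
        return True

q = BoundedQueue(hwm=3)
results = [q.send(i) for i in range(5)]
assert results == [True, True, True, False, False]
assert len(q.pending) == 3   # at most hwm messages sit with (and die with) a peer
```

A smaller HWM limits the damage but also limits how far the sender can run ahead of a slow receiver, which is the performance trade-off mentioned later in the conversation.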
[07:46] Bakafish That's fair. But it's not what I mean. If I launch the 'Sender' with two 'processor' nodes and one 'results' node and run 1000 checks, everything is fine. 500 each. Then I kill one processor node, so there is only a single node in the pipeline. The queue still thinks there are two (timeout not met yet?) and a second run will result in 500 checks sent to the single processor instance.
[07:47] sustrik let me see...
[07:48] sustrik hm, i am not sure how UNIX domain sockets handle application crashes
[07:48] sustrik maybe it takes some time to notify the sender that the receiver is dead?
[07:49] sustrik can you try sleeping for a second after killing the app
[07:49] sustrik ?
[07:49] Bakafish If I have to write my own code to see if a process is alive or not that's fine, but how do I identify the processor nodes attached to a particular queue?
[07:50] Bakafish The server is dispatching these checks and so it has some knowledge of how many there are I'd assume.
[07:50] Bakafish Ahh, this isn't a problem with TCP?
[07:51] Bakafish Meaning, if I was using TCP instead of sockets I shouldn't expect this behavior?
[07:51] sustrik well, i am not sure what's happening, however, the obvious explanation would be that there's a delay between the receiver dying and the sender being notified of the fact
[07:52] Bakafish sockets, bah. Pipes
[07:52] sustrik in the meantime the messages will be dispatched to that receiver
[07:52] sustrik it applies to any communication channel (TCP, IPC, ...)
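The failure window sustrik describes can be sketched as a toy round-robin distributor (an illustration only, not how libzmq is implemented): until the transport reports the disconnect, the sender keeps dispatching to the dead peer, and whatever lands in that peer's queue during the window is lost.

```python
class RoundRobinSender:
    """Toy model of a push-style socket round-robining over peers."""

    def __init__(self, peers):
        self.peers = list(peers)              # peers the sender believes alive
        self.queues = {p: [] for p in peers}  # per-peer outbound queues
        self._next = 0

    def send(self, msg):
        peer = self.peers[self._next % len(self.peers)]
        self._next += 1
        self.queues[peer].append(msg)

    def notice_disconnect(self, peer):
        # Only called once the transport reports the peer is gone --
        # possibly well after the process actually died.
        self.peers.remove(peer)

s = RoundRobinSender(["proc_a", "proc_b"])
# proc_b has crashed, but the sender has not been notified yet:
for i in range(4):
    s.send(i)
assert s.queues["proc_b"] == [1, 3]   # dispatched into the window; lost
s.notice_disconnect("proc_b")
s.send(4)
assert s.queues["proc_a"] == [0, 2, 4]  # everything now goes to the live peer
```

The length of the window is a property of the underlying transport (how quickly TCP or the IPC implementation surfaces the disconnect), which is why the behavior looked the same across channels.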
[07:53] Bakafish I see. I will check to see if it eventually times out. Is there a setting? (I recall someone mentioning there was.)
[07:53] sustrik nope, it's dependent on IPC implementation
[07:53] sustrik there's probably a timeout in the kernel somewhere
[07:56] Bakafish Ahhh, the HWM socket option was what was discussed. I think someone was saying you could limit it, but that there were all sorts of places where messages could be queued along the path, and reducing the number of messages in the queue had performance implications.
[07:58] Bakafish Let me play with it some more since I was using local pipes for convenience, I will be using TCP in production so it may behave better.
[13:34] mikko sustrik: did you notice the issue i opened at github over the hols?
[13:47] sustrik mikko: yes, i did
[14:09] mikko cool
[14:09] mikko i got zmq_poll soon implemented
[14:09] mikko hopefully
[14:15] sustrik mikko: seen you did ~20 commits a day during the holiday :)
[14:21] mikko open source is made by night
[14:21] mikko :)
[14:21] mikko at least in my case
[14:28] sustrik mikko: i've linked the PHP wiki page from the main page
[14:28] mikko sustrik: thanks
[14:28] sustrik you should announce availability of the binding on the mailing list
[14:29] mikko sustrik: i'll announce as soon as the api stabilizes
[14:29] sustrik goodo
[14:29] mikko still need to add polling support
[14:29] sustrik sure
[14:29] mikko after that it's fairly complete
[14:29] mikko some of the tests are failing due to zeromq2 issue #12
[14:32] sustrik i've reproduced the problem
[14:32] sustrik trying to fix it...
[14:32] mikko different errors with different socket types
[14:32] sustrik yes
[14:37] mikko i'm gonna take a short nap, been touring around berlin all day
[14:37] mikko laters
[14:52] sustrik see you