Sunday December 12, 2010

[Time] Name Message
[10:35] mathijs Hi all, is it possible to find out how many messages are currently buffered?
[10:36] mathijs I like the HWM and the transparent way things get handled in 0mq, but I would like to do some custom stuff when problems arise
[10:38] mathijs So if a buffer starts to fill up, or reaches HWM, I would like my program to log warnings about it, or talk to some controller process about it.
[10:40] mathijs of course I can have my processes all log every received/sent message through some other socket to have a central process track overall flow, but it would be nice if there was a more low-level way
[10:42] sustrik set hwm, send in non-blocking way
[10:42] rgl mathijs, I don't think it's possible right now; but from what I understood, future ZMQ versions will have a way to access those events.
[10:42] sustrik when zmq_send returns EAGAIN, the buffer is full
[10:43] rgl oh we can? *G*
[10:43] mathijs sustrik: so set a low hwm to have the application handle it?
[10:43] sustrik if you want so
[10:44] sustrik think of the queue as a network buffer
[10:44] rgl we can also store the messages in a file, right?
[10:44] sustrik you can offload them to the disk, yes
[10:46] mathijs hmmm ok, that will work for preventing problems. but what I would like as well is a way to shut down an application safely. I understood 0mq 2.1 will wait for all messages to be handled (empty buffers). So in case one buffer isn't empty yet and is waiting for some other process to come up, I would like to find out which socket/buffer is waiting
[10:50] rgl indeed, it would be nice to get the current state of a socket (or receive events about state changes). this is something that will be available in the future, right sustrik?
[12:30] sustrik mathijs, rgl: the number of messages in the 0MQ queue is an irrelevant statistic; the messages are queued along the whole path from sender to receiver, in 0MQ buffers, TCP buffers, NICs, switch and router queues etc.
[12:33] rgl sustrik, humm. sure, but they give you some hint that something might be wrong. for example, how do we know why the messages are piling up? it might be because a single consumer is delaying the whole shebang.
[12:34] rgl or for some reason we stopped being able to connect the socket that once was working fine.
[12:34] sustrik that's why slow consumers are disconnected in pub/sub pattern
[12:35] rgl indeed, but currently, how do you know it was disconnected?
[12:35] sustrik yes, "cannot reconnect" problem should be logged
[12:35] sustrik because it may be a signal to admins that something went wrong
[12:35] sustrik and manual intervention is needed
[12:36] rgl indeed, that's something that we should be able to log, and to monitor, so we can act upon it.
[12:36] sustrik there's a logging mechanism in 0MQ, but at the moment nobody logs anything
[12:36] sustrik TODO
[12:38] rgl I might be wrong, but I thought there were some messages on the ML about funneling that into some in-queue that could be consumed as a regular socket.
[15:02] mikko sustrik: Assertion failed: stored (pipe.cpp:238)
[15:02] mikko what should be the behavior when swap fills up?
[15:03] mikko happens in zmq::writer_t::write pipe.cpp
[15:06] mikko actually it should probably return false in bool zmq::writer_t::write (zmq_msg_t *msg_)
[15:34] benoitc hi all
[15:34] benoitc can we fork processes and share a zeromq "socket" between children ?
[15:34] benoitc with pyzmq
[15:49] cremes benoitc: i doubt that is supported; it's not a good idea at all
[15:49] cremes if the children need to communicate with each other, open up sockets that are specific to that job
[16:04] benoitc so i need to open a zmq socket per worker on the parent process ?
[16:06] cremes benoitc: that's what i would recommend
[16:06] cremes these sockets are "cheap" so i wouldn't worry about computational cost
[16:07] benoitc hm ok
[16:07] benoitc thanks
[16:07] cremes saying "hmmm" implies you either don't agree or you don't understand why
[16:07] cremes do you still have a question?
[16:08] benoitc well it means i've to rethink my design :)
[16:08] benoitc i was thinking at first to just open a zmq socket like any socket and make it non-blocking
[16:09] cremes not necessarily; if your design called for multiple children to all listen to the same socket then perhaps a device in the middle will solve your issue
[16:09] cremes all children can connect to the device on the same port and the device will (depending on socket type) deliver messages to the children
[16:09] benoitc well the idea was to balance between workers as soon as a message is sent to the socket
[16:10] cremes definitely read the guide if you haven't done so yet
[16:10] cremes so you are using req/rep sockets?
[16:13] cremes i need to run out for about 90m; feel free to post a description of what you are trying to do
[16:13] cremes i'll try to give some guidance when i return
[16:14] benoitc mm yes, I should reply that i took the message I guess
[16:14] benoitc in my first approach i wanted to have this balancing you can do with "normal" sockets
[16:14] benoitc anyway later
[16:53] mikko benoitc: 0MQ has load-balancing built-in
[16:56] benoitc mikko: so once the message is received by one listener in req/rep it won't be sent to other listeners ?
[16:56] benoitc (just want to make sure)
[16:58] mikko benoitc: different socket types have different kinds of distribution algorithms
[16:58] mikko for example in PUB/SUB the published message is sent to all subscribers
[16:59] mikko but for example PUSH/PULL balances between live connections
[16:59] mikko you should really read zguide
[18:00] cremes benoitc: mikko is right; most of this is covered in the guide:
[18:01] cremes for some reason, that link is returning a 503 right now
[18:01] cremes sustrik, pieterh: the guide is offline
[18:01] cremes it's returning a 503 error
[18:26] mikko cremes: works from here
[18:27] cremes mikko: it's responding now for me too but it wasn't before
[19:18] benoitc yeah reading it
[19:18] benoitc thanks
[20:14] sustrik mikko: what version is that?
[20:15] sustrik what it should do is behave the same way as if HWM were reached
[20:15] sustrik block, drop, whatever
[20:15] sustrik depending on the pattern
[20:18] mikko sustrik: trunk, the lines won't match as i have some local changes
[20:19] mikko sustrik: yes, probably
[20:19] sustrik i see, so which assertion was actually hit?
[20:19] mikko sustrik: writer_t::write
[20:19] mikko when swap is full
[20:19] mikko let me check the upstream lineno
[20:19] sustrik zmq_assert (stored); ?
[20:20] mikko yes
[20:20] sustrik looks like a bug
[20:20] mathijs I read some discussion about socket migrations and memory barriers. I never heard of those before, so I probably shouldn't touch stuff... but I think I have a possible use case. GHC (haskell) uses an IO manager which manages a pool of OS threads. On top of those, it runs "green threads". A green thread can signal it's waiting for some FD; execution stops. When the fd is ready (epoll), execution resumes. But it's possible that it's mapped to a different OS thread
[20:21] sustrik afaiu the saving of messages to the swap is not atomic
[20:21] sustrik i.e. you can store one message part and next one is rejected
[20:21] sustrik should be fixed
[20:22] mathijs is the case I describe a valid one for using socket migrations?
[20:24] sustrik yes, that was the main use case for the "migration" feature
[20:25] sustrik you can assume that the memory barriers are handled correctly by haskell runtime
[20:25] mathijs I can signal the IO manager that a certain green thread gets mapped to the same OS thread every time it runs/continues, but having them scheduled on the least-busy OS thread sounds better.
[20:25] sustrik yes, it should work
[20:25] mathijs sustrik: thanks
[21:08] mikko sustrik: so, i'm thinking about the whole swap thing
[21:09] mikko what do you think about abstracting the swap slightly further?
[21:09] mikko maybe something like: make the swap implementation pluggable and use inproc pipe to communicate with the swap engine
[21:24] sustrik mikko: that way you would introduce a bottleneck
[21:25] mikko but swap is going to be a bottleneck in any case
[21:25] sustrik you can always make a device that would store messages on the disk if you want to
[21:25] mikko the whole concept of swapping is more about preserving operation than about performance
[21:26] sustrik the point is that you want the swap as fast as possible exactly because it is slow
[21:26] sustrik so, for example
[21:26] sustrik if your app falls over into swap mode
[21:27] sustrik you want it to get out of it as fast as possible once the network is up and running again
[21:29] sustrik if you make it slow it may even happen that it will never get out of the swap mode
[21:29] sustrik it's kind of tricky
[21:29] mikko how much overhead does forwarding a message over inproc pipe add?
[21:29] sustrik dunno
[21:29] sustrik should be measured
[21:30] sustrik maybe it would make no difference at all
[21:31] mikko if there is little or no overhead it would allow people to implement optimised swap solutions
[21:32] sustrik sure
[21:32] mikko i noticed that there is the posix_fadvise for linux
[21:32] sustrik yes
[21:32] mikko but there are probably available optimizations for other platforms as well
[21:33] mikko my point with this was: swap is really something that doesn't necessarily need to be in 0MQ core
[21:33] sustrik ack
[21:33] mikko if the implementation was pluggable it would allow people to make their own and share them
[21:33] mikko not sure, it's tricky
[21:33] sustrik it was added as a feature paid for by a customer
[21:34] sustrik so it's kind of a hack atm
[21:35] mikko void zmq::swap_t::rollback ()
[21:35] mikko is that used ?
[21:35] sustrik let me think about it...
[21:35] mikko as it seems that it ends up in an assertion on every possible path
[21:35] sustrik mikko: it should be afaics
[21:35] mikko well, possibly ends up
[21:36] sustrik hm, quite possibly i've broken the implementation when i did all the changes for 2.1
[21:37] sustrik well, it would be nice if swapping was implemented as a device
[21:37] sustrik so you would just plug it in-between two nodes to get swap
[21:38] sustrik one problem is that by doing so you have to do two hops instead of a single one
[21:38] sustrik even though swap is not being used at the moment
[21:38] mikko that is true
[21:38] mikko but would you use swap if you were aiming for absolute performance?
[21:38] sustrik actually, i like the idea
[21:39] sustrik we could just say 'sorry, there's performance impact with swap'
[21:39] mikko so, if you were to generalise this a bit further
[21:40] mikko zmq_callback_device which takes function pointers for message received from front and sent out to back
[21:40] mikko then swap would implement functions for storing and removing
[21:40] mikko you could use the same pattern for implementing 'persistent' things
[21:41] mikko not sure about exact details yet
[21:41] sustrik right, it's just a device
[21:41] sustrik i would like to move devices out of core 0mq with 3.0
[21:41] sustrik swap could be removed as well at that point
[21:41] mikko i think what would make devices more useful is a way to stop them without SIGINT
[21:42] sustrik and instead we can provide a standalone swapping device
[21:42] sustrik yes, you want remote management
[21:42] mikko void *device = zmq_device(...); while (1) { if (shutdown) { device_shutdown(device); } sleep(5); }
[21:42] mikko something like that
[21:43] sustrik shutdown sent as a message to the device, right?
[21:43] mikko yes, currently in my small program i run device in a thread and just use pthread_cancel to stop the device thread
[21:43] sustrik from the management console
[21:43] mikko that is a possibility yes
[21:43] mikko but does it open the device for DoS ?
[21:44] mikko or do you mean some sort of internal message?
[21:44] sustrik you need authentication etc.
[21:44] sustrik the whole device thing is where development will happen in the future imo
[21:45] sustrik actually, what i foresee is that devices will be removed from core 0mq
[21:45] sustrik there will be various open source devices
[21:45] sustrik but there can also be complex proprietary devices
[21:46] mikko we should have a hackathon on removing all these assertions at some point
[21:46] sustrik think of how cisco builds boxes for TCP/IP stack
[21:46] sustrik yes, but it has to happen with new major version
[21:46] sustrik i.e. 3.0
[21:47] sustrik as it's not backward compatible
[21:49] mikko that's true