[Time] Name | Message |
[10:35] mathijs
|
Hi all, is it possible to find out how many messages are currently buffered?
|
[10:36] mathijs
|
I like the HWM and the transparent way things get handled in 0mq, but I would like to do some custom stuff when problems arise
|
[10:38] mathijs
|
So if a buffer starts to fill up, or reaches HWM, I would like my program to log warnings about it, or talk to some controller process about it.
|
[10:40] mathijs
|
of course I can have my processes log every received/sent message through some other socket so a central process can track overall flow, but it would be nice if there were a more low-level way
|
[10:42] sustrik
|
set hwm, send in non-blocking way
|
[10:42] rgl
|
mathijs, I don't think it's possible right now; but from what I understood, future ZMQ versions will have a way to access those events.
|
[10:42] sustrik
|
when zmq_send returns EAGAIN, the buffer is full
|
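[sidebar] sustrik's suggestion above can be sketched in a few lines of pyzmq (pyzmq comes up later in this log): set a low high-water mark, send non-blocking, and treat EAGAIN (surfaced as `zmq.Again` in pyzmq) as "the buffer is full". Note this uses the modern split `SNDHWM`/`RCVHWM` options; the 0MQ 2.x being discussed had a single `ZMQ_HWM`. The endpoint name is made up for the example.

```python
import zmq

ctx = zmq.Context()
push = ctx.socket(zmq.PUSH)
push.setsockopt(zmq.SNDHWM, 1)       # tiny send-side buffer (set before bind)
push.bind("inproc://demo")

pull = ctx.socket(zmq.PULL)
pull.setsockopt(zmq.RCVHWM, 1)       # tiny receive-side buffer
pull.connect("inproc://demo")        # connected, but this peer never reads

sent = 0
for _ in range(100):
    try:
        push.send(b"work", zmq.NOBLOCK)  # non-blocking send
        sent += 1
    except zmq.Again:
        # Queue is full: here is where you would log a warning or
        # notify a controller process, as mathijs wants to do.
        break

print("messages buffered before EAGAIN:", sent)  # roughly SNDHWM + RCVHWM
ctx.destroy(linger=0)
```

The exact count before EAGAIN depends on internal pipe sizing (for inproc the two HWMs are summed), which is part of sustrik's later point that the queue length is spread across the whole path.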
[10:43] rgl
|
oh we can? *G*
|
[10:43] mathijs
|
sustrik: so set a low hwm to have the application handle it?
|
[10:43] sustrik
|
if you want so
|
[10:44] sustrik
|
think of the queue as a network buffer
|
[10:44] rgl
|
we can also store the messages in a file, right?
|
[10:44] sustrik
|
you can offload them to the disk, yes
|
[10:46] mathijs
|
hmmm ok, that will work for preventing problems. but what I would like as well is a way to shut down an application safely. I understood 0mq 2.1 will wait for all messages to be handled (empty buffers). So in case one buffer isn't empty yet because it's waiting for some other process to come up, I would like to find out which socket/buffer is waiting
|
[10:50] rgl
|
indeed, it would be nice to get the current state of a socket (or receive events about state changes). this is something that will be available in the future, right sustrik?
|
[12:30] sustrik
|
mathijs, rgl: the number of messages in the 0MQ queue is an irrelevant statistic; messages are queued along the whole path from sender to receiver, in 0MQ buffers, TCP buffers, NICs, switch and router queues, etc.
|
[12:33] rgl
|
sustrik, hmm. sure, but they give you some hint that something might be wrong. for example, how do we know why the messages are piling up? it might be because a single consumer is delaying the whole shebang.
|
[12:34] rgl
|
or because for some reason we stopped being able to connect the socket that once was working fine.
|
[12:34] sustrik
|
that's why slow consumers are disconnected in pub/sub pattern
|
[12:35] rgl
|
indeed, but currently, how do you know it was disconnected?
|
[12:35] sustrik
|
yes, "cannot reconnect" problem should be logged
|
[12:35] sustrik
|
because it may be a signal to admins that something went wrong
|
[12:35] sustrik
|
and manual intervention is needed
|
[12:36] rgl
|
indeed, that's something that we should be able to log, and to monitor, so we can act upon it.
|
[12:36] sustrik
|
there's a logging mechanism in 0MQ, but at the moment nobody logs anything
|
[12:36] sustrik
|
TODO
|
[12:38] rgl
|
I might be wrong, but I thought there were some messages on the ML about funneling that into some in-queue that could be consumed as a regular socket.
|
[15:02] mikko
|
sustrik: Assertion failed: stored (pipe.cpp:238)
|
[15:02] mikko
|
what should be the behavior when swap fills up?
|
[15:03] mikko
|
happens in zmq::writer_t::write pipe.cpp
|
[15:06] mikko
|
actually it should probably return false in bool zmq::writer_t::write (zmq_msg_t *msg_)
|
[15:34] benoitc
|
hi all
|
[15:34] benoitc
|
can we fork processes and share a zeromq "socket" between children ?
|
[15:34] benoitc
|
with pyzmq
|
[15:49] cremes
|
benoitc: i doubt that is supported; it's not a good idea at all
|
[15:49] cremes
|
if the children need to communicate with each other, open up sockets that are specific to that job
|
[16:04] benoitc
|
so i need to open a zmq socket per worker on the parent process ?
|
[16:06] cremes
|
benoitc: that's what i would recommend
|
[16:06] cremes
|
these sockets are "cheap" so i wouldn't worry about computational cost
|
[16:07] benoitc
|
hm ok
|
[16:07] benoitc
|
thanks
|
[16:07] cremes
|
saying "hmmm" implies you either don't agree or you don't understand why
|
[16:07] cremes
|
do you still have a question?
|
[16:08] benoitc
|
well it means i have to rethink my design :)
|
[16:08] benoitc
|
i was thinking at first to just open a zmq socket like any other socket and make it non-blocking
|
[16:09] cremes
|
not necessarily; if your design called for multiple children to all listen to the same socket then perhaps a device in the middle will solve your issue
|
[16:09] cremes
|
all children can connect to the device on the same port and the device will (depending on socket type) deliver messages to the children
|
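[sidebar] cremes's "device in the middle" can be sketched with pyzmq: the device owns two well-known endpoints, children connect to the backend, producers to the frontend, and the device just forwards between them. The endpoint names and the fixed message count are made up for the example; a real device would run until told to stop.

```python
import threading
import zmq

ctx = zmq.Context()
N = 4

# The device's two sockets: producers connect to the frontend,
# worker children connect to the backend.
frontend = ctx.socket(zmq.PULL)
frontend.bind("inproc://frontend")
backend = ctx.socket(zmq.PUSH)
backend.bind("inproc://backend")

def device():
    for _ in range(N):                # forward a fixed number of messages
        backend.send(frontend.recv())

t = threading.Thread(target=device)

producer = ctx.socket(zmq.PUSH)
producer.connect("inproc://frontend")

workers = [ctx.socket(zmq.PULL) for _ in range(2)]
for w in workers:
    w.connect("inproc://backend")

t.start()
for i in range(N):
    producer.send(b"task-%d" % i)

poller = zmq.Poller()
for w in workers:
    poller.register(w, zmq.POLLIN)

received = []
for _ in range(100):                  # bounded wait instead of blocking forever
    if len(received) == N:
        break
    for sock, _ in poller.poll(100):
        received.append(sock.recv())

t.join()
print(sorted(received))               # all tasks arrive, spread across workers
ctx.destroy(linger=0)
```

How the messages are split across the workers depends on the socket type, which is exactly mikko's point later in the log: PUSH load-balances, PUB would fan out to all of them.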
[16:09] benoitc
|
well the idea was to balance between workers as soon as a message is sent to the socket
|
[16:10] cremes
|
definitely read the guide if you haven't done so yet
|
[16:10] cremes
|
so you are using req/rep sockets?
|
[16:13] cremes
|
i need to run out for about 90m; feel free to post a description of what you are trying to do
|
[16:13] cremes
|
i'll try to give some guidance when i return
|
[16:14] benoitc
|
mm yes, I should reply that I took the message, I guess
|
[16:14] benoitc
|
in my first approach I wanted to have this balancing you can do with "normal" sockets
|
[16:14] benoitc
|
anyway later
|
[16:53] mikko
|
benoitc: 0MQ has load-balancing built-in
|
[16:56] benoitc
|
mikko: so once the message is received by one listener in req/rep it won't be sent to other listeners? (just want to make sure)
|
[16:56] benoitc
|
(just want to make sure)
|
[16:58] mikko
|
benoitc: different socket types have different kind of distribution algorithms
|
[16:58] mikko
|
for example in PUB/SUB the published message is sent to all subscribers
|
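[sidebar] the PUB/SUB fan-out mikko describes is easy to see in pyzmq: one published message reaches every subscriber, in contrast to PUSH/PULL which balances across them. The endpoint name is invented for the example, and the short sleep works around the well-known "slow joiner" issue (subscriptions take a moment to propagate to the publisher).

```python
import time
import zmq

ctx = zmq.Context()
pub = ctx.socket(zmq.PUB)
pub.bind("inproc://fanout")

subs = []
for _ in range(3):
    s = ctx.socket(zmq.SUB)
    s.connect("inproc://fanout")
    s.setsockopt(zmq.SUBSCRIBE, b"")   # subscribe to everything
    s.RCVTIMEO = 2000                  # fail fast instead of hanging
    subs.append(s)

time.sleep(0.2)   # let subscriptions propagate (the "slow joiner" issue)
pub.send(b"hello")

copies = [s.recv() for s in subs]
print(copies)     # every subscriber gets its own copy of the message
ctx.destroy(linger=0)
```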
[16:59] mikko
|
but for example PUSH/PULL balances between live connections
|
[16:59] mikko
|
you should really read zguide
|
[18:00] cremes
|
benoitc: mikko is right; most of this is covered in the guide: http://zguide.zeromq.org/chapter:all
|
[18:01] cremes
|
for some reason, that link is returning a 503 right now
|
[18:01] cremes
|
sustrik, pieterh: the guide is offline
|
[18:01] cremes
|
it's returning a 503 error
|
[18:26] mikko
|
cremes: works from here
|
[18:27] cremes
|
mikko: it's responding now for me too but it wasn't before
|
[19:18] benoitc
|
yeah reading it
|
[19:18] benoitc
|
thanks
|
[20:14] sustrik
|
mikko: what version is that?
|
[20:15] sustrik
|
what it should do is behave the same way as if HWM is reached
|
[20:15] sustrik
|
block, drop, whatever
|
[20:15] sustrik
|
depending on the pattern
|
[20:18] mikko
|
sustrik: trunk, the lines won't match as i have some local changes
|
[20:19] mikko
|
sustrik: yes, probably
|
[20:19] sustrik
|
i see, so which assertion was actually hit?
|
[20:19] mikko
|
sustrik: writer_t::write
|
[20:19] mikko
|
when swap is full
|
[20:19] mikko
|
let me check the upstream lineno
|
[20:19] sustrik
|
zmq_assert (stored); ?
|
[20:20] mikko
|
https://github.com/zeromq/zeromq2/blob/master/src/pipe.cpp#L237
|
[20:20] mikko
|
yes
|
[20:20] sustrik
|
looks like a bug
|
[20:20] mathijs
|
I read some discussion about socket migrations and memory barriers. I never heard of those before, so I probably shouldn't touch stuff... but I think I have a possible use case. GHC (haskell) uses an IO manager which manages a pool of OS threads. On top of those, it runs "green threads". A green thread can signal it's waiting for some FD; execution stops. When the fd is ready (epoll), execution resumes. But it's possible that it's mapped to a
|
[20:20] mathijs
|
different OS thread
|
[20:21] sustrik
|
afaiu the saving of messages to the swap is not atomic
|
[20:21] sustrik
|
i.e. you can store one message part and have the next one rejected
|
[20:21] sustrik
|
should be fixed
|
[20:22] mathijs
|
is the case I describe a valid one for using socket migrations?
|
[20:24] sustrik
|
yes, that was the main use case for the "migration" feature
|
[20:25] sustrik
|
you can assume that the memory barriers are handled correctly by haskell runtime
|
[20:25] mathijs
|
I can signal the IO manager that a certain green thread gets mapped to the same OS thread every time it runs/continues, but having them scheduled on the least-busy OS thread sounds better.
|
[20:25] sustrik
|
yes, it should work
|
[20:25] mathijs
|
sustrik: thanks
|
[21:08] mikko
|
sustrik: so, i'm thinking about the whole swap thing
|
[21:09] mikko
|
what do you think about abstracting the swap slightly further?
|
[21:09] mikko
|
maybe something like: make the swap implementation pluggable and use inproc pipe to communicate with the swap engine
|
[21:24] sustrik
|
mikko: that way you would introduce a bottleneck
|
[21:25] mikko
|
but swap is going to be a bottleneck in any case
|
[21:25] sustrik
|
you can always make a device that would store messages on the disk if you want to
|
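[sidebar] the disk-spooling device sustrik mentions boils down to framing messages into a file and replaying them in order. A minimal sketch of just that spooling logic, with invented length-prefix framing (a real swap device would wrap this between a receiving and a sending socket):

```python
import struct
import tempfile

def spool(path, messages):
    """Append each message to the file, prefixed with its 4-byte length."""
    with open(path, "ab") as f:
        for msg in messages:
            f.write(struct.pack(">I", len(msg)))
            f.write(msg)

def replay(path):
    """Read spooled messages back in the order they were written."""
    out = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break               # end of spool file
            (size,) = struct.unpack(">I", header)
            out.append(f.read(size))
    return out

path = tempfile.NamedTemporaryFile(delete=False).name
spool(path, [b"one", b"two", b"three"])
print(replay(path))   # messages come back in FIFO order
```

sustrik's performance caveat below applies directly: every message now pays for a disk round trip, so draining the spool fast enough to leave "swap mode" is the hard part.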
[21:25] mikko
|
the whole concept of swapping is more about preserving operation than about performance
|
[21:26] sustrik
|
the point is that you want the swap as fast as possible exactly because it is slow
|
[21:26] sustrik
|
so, for example
|
[21:26] sustrik
|
if your app falls over into swap mode
|
[21:27] sustrik
|
you want it to get out of it as fast as possible once the network is up and running again
|
[21:29] sustrik
|
if you make it slow it may even happen that it will never get out of the swap mode
|
[21:29] sustrik
|
it's kind of tricky
|
[21:29] mikko
|
how much overhead does forwarding a message over inproc pipe add?
|
[21:29] sustrik
|
dunno
|
[21:29] sustrik
|
should be measured
|
[21:30] sustrik
|
maybe it would make no difference at all
|
[21:31] mikko
|
if there is little or no overhead it would allow people to implement optimised swap solutions
|
[21:32] sustrik
|
sure
|
[21:32] mikko
|
i noticed that there is the posix_fadvise for linux
|
[21:32] sustrik
|
yes
|
[21:32] mikko
|
but there are probably available optimizations for other platforms as well
|
[21:33] mikko
|
my point with this was: swap is really something that doesn't necessarily need to be in 0MQ core
|
[21:33] sustrik
|
ack
|
[21:33] mikko
|
if the implementation was pluggable it would allow people to make their own and share them
|
[21:33] mikko
|
not sure, it's tricky
|
[21:33] sustrik
|
it was added as a feature paid for by a customer
|
[21:34] sustrik
|
so it's kind of a hack atm
|
[21:35] mikko
|
void zmq::swap_t::rollback ()
|
[21:35] mikko
|
is that used?
|
[21:35] sustrik
|
let me think about it...
|
[21:35] mikko
|
as it seems that it ends up in an assertion on every possible path
|
[21:35] sustrik
|
mikko: it should be afaics
|
[21:35] mikko
|
well, possibly ends up
|
[21:36] sustrik
|
hm, quite possibly i've broken the implementation when i did all the changes for 2.1
|
[21:37] sustrik
|
well, it would be nice if swapping was implemented as a device
|
[21:37] sustrik
|
so you would just plug it in-between two nodes to get swap
|
[21:38] sustrik
|
one problem is that by doing so you have to do two hops instead of a single one
|
[21:38] sustrik
|
even though swap is not being used at the moment
|
[21:38] mikko
|
that is true
|
[21:38] mikko
|
but would you use swap if you were aiming for absolute performance?
|
[21:38] sustrik
|
actually, i like the idea
|
[21:39] sustrik
|
we could just say 'sorry, there's performance impact with swap'
|
[21:39] mikko
|
so, if you were to generalise this a bit further
|
[21:40] mikko
|
zmq_callback_device which takes function pointers for message received from front and sent out to back
|
[21:40] mikko
|
then swap would implement functions for storing and removing
|
[21:40] mikko
|
you could use the same pattern for implementing 'persistent' things
|
[21:41] mikko
|
not sure about exact details yet
|
[21:41] sustrik
|
right, it's just a device
|
[21:41] sustrik
|
i would like to move devices out of core 0mq with 3.0
|
[21:41] sustrik
|
swap could be removed as well at that point
|
[21:41] mikko
|
i think what devices need to be more useful is a way to stop them without SIGINT
|
[21:42] sustrik
|
and instead we can provide a standalone swapping device
|
[21:42] sustrik
|
yes, you want remote management
|
[21:42] mikko
|
void *device = zmq_device(...); while (1) { if (shutdown) { device_shutdown(device); } sleep(5); }
|
[21:42] mikko
|
something like that
|
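[sidebar] sustrik's "shutdown sent as a message to the device" can be sketched as a device loop that polls a control socket alongside its data socket and exits cleanly when told to, instead of being pthread_cancel'ed. The endpoint names and the SHUTDOWN command are invented for the example; 0MQ's built-in devices had no such hook at the time.

```python
import threading
import zmq

ctx = zmq.Context()

frontend = ctx.socket(zmq.PULL)       # data in
frontend.bind("inproc://data")
backend = ctx.socket(zmq.PUSH)        # data out
backend.bind("inproc://sink")
control = ctx.socket(zmq.PAIR)        # management channel
control.bind("inproc://control")

def device():
    poller = zmq.Poller()
    poller.register(frontend, zmq.POLLIN)
    poller.register(control, zmq.POLLIN)
    while True:
        events = dict(poller.poll(1000))
        if control in events and control.recv() == b"SHUTDOWN":
            return                    # clean exit instead of pthread_cancel
        if frontend in events:
            backend.send(frontend.recv())

t = threading.Thread(target=device)
t.start()

cmd = ctx.socket(zmq.PAIR)
cmd.connect("inproc://control")
cmd.send(b"SHUTDOWN")

t.join(timeout=5)
print("device stopped:", not t.is_alive())
ctx.destroy(linger=0)
```

As the discussion notes, once the control endpoint is reachable remotely you need authentication, or anyone who can connect can stop the device.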
[21:43] sustrik
|
shutdown sent as a message to the device, right?
|
[21:43] mikko
|
yes, currently in my small program i run device in a thread and just use pthread_cancel to stop the device thread
|
[21:43] sustrik
|
from the management console
|
[21:43] mikko
|
that is a possibility yes
|
[21:43] mikko
|
but does it open the device up to DoS?
|
[21:44] mikko
|
or do you mean some sort of internal message?
|
[21:44] sustrik
|
you need authentication etc.
|
[21:44] sustrik
|
the whole device thing is where development will happen in the future imo
|
[21:45] sustrik
|
actually, what i foresee is that devices will be removed from core 0mq
|
[21:45] sustrik
|
there will be various open source devices
|
[21:45] sustrik
|
but there can also be complex proprietary devices
|
[21:46] mikko
|
we should have a hackathon on removing all these assertions at some point
|
[21:46] sustrik
|
think of how cisco builds boxes for the TCP/IP stack
|
[21:46] sustrik
|
yes, but it has to happen with new major version
|
[21:46] sustrik
|
i.e. 3.0
|
[21:47] sustrik
|
as it's not backward compatible
|
[21:49] mikko
|
that's true
|