[Time] Name | Message |
[06:50] sustrik
|
petrilli: can you spell your problems with java binding more explicitly
|
[06:50] sustrik
|
?
|
[06:50] sustrik
|
having a list of issues could make it move forward faster
|
[08:01] mikko
|
good morning
|
[08:13] sustrik
|
morning
|
[08:24] mikko
|
Assertion failed: term_acks > 0 (own.cpp:175)
|
[08:24] mikko
|
this random assertion keeps popping up
|
[08:24] mikko
|
let me make sure that i got the latest master
|
[08:35] mikko
|
sustrik: at the moment on master: the context close will block even if the sockets are closed ?
|
[08:36] mikko
|
assuming there are messages in-flight waiting to be sent
|
[08:55] mikko
|
hmm
|
[09:08] sustrik
|
mikko: yes
|
[09:09] sustrik
|
the requirement was not to drop messages, so someone has to wait till they are sent
|
[09:09] mikko
|
sustrik: take a look at this
|
[09:09] mikko
|
sec
|
[09:10] mikko
|
https://gist.github.com/25d81c09cf6a838a2aed
|
[09:10] mikko
|
seems to result in a deadlock
|
[09:10] mikko
|
zmq::ctx_t::terminate (this=0x601010) at semaphore.hpp:117
|
[09:10] sustrik
|
you haven't closed the sockets
|
[09:10] mikko
|
let me close
|
[09:11] sustrik
|
thus the context has no idea whether there are more messages going to be sent or what
|
[09:11] mikko
|
because i keep getting a deadlock in php
|
[09:12] mikko
|
which i cant reproduce in plain c
|
[09:12] mikko
|
i assume it has something to do with destruction order
|
[09:15] mikko
|
Assertion failed: !prefetched (xrep.cpp:108)
|
[09:15] mikko
|
now i got this out
|
[09:18] mikko
|
also Assertion failed: inpipe_ && outpipe_ (xreq.cpp:42)
|
[09:18] mikko
|
i think i must be doing something wrong
|
[09:21] sustrik
|
mikko: that's your test program
|
[09:21] sustrik
|
?
|
[09:21] sustrik
|
in C?
|
[09:21] mikko
|
sustrik: i can see how these happen
|
[09:22] mikko
|
yes
|
[09:22] mikko
|
C
|
[09:22] sustrik
|
can you paste it, so that i can try?
|
[09:23] mikko
|
first one: https://gist.github.com/b7b74bf1521c085aa51f
|
[09:23] mikko
|
i think i must have error there
|
[09:23] mikko
|
as it ends up blocking on recv
|
[09:24] sustrik
|
what about the assertions?
|
[09:24] sustrik
|
what version are you using?
|
[09:24] mikko
|
comment out lines 45 - 49
|
[09:24] sustrik
|
xrep.cpp:108 has no assert in HEAD
|
[09:24] mikko
|
and you will get Assertion failed: !prefetched (xrep.cpp:108)
|
[09:24] mikko
|
let me see which version i got
|
[09:25] mikko
|
i thought i got latest master but i'll recheck
|
[09:25] mikko
|
taking a fresh checkout just in case
|
[09:30] sustrik
|
when i remove the lines 45-49
|
[09:30] sustrik
|
program exits with no problem
|
[09:30] sustrik
|
when i keep them in it freezes
|
[09:30] mikko
|
it blocks on recv() ?
|
[09:30] mikko
|
is that expected or do i have some silly error there?
|
[09:31] mikko
|
sustrik: http://github.com/zeromq/zeromq2/blob/master/src/xrep.cpp#L108
|
[09:32] mikko
|
?
|
[09:33] sustrik
|
hm, you are right
|
[09:33] sustrik
|
i wonder why it's not on my box
|
[09:33] mikko
|
so commenting out lines 45-49 causes Assertion failed: !prefetched (xrep.cpp:108)
|
[09:35] sustrik
|
ack, i'll remove the assert
|
[09:36] sustrik
|
it was a patch I've applied without thinking about it sufficiently :|
|
[09:36] mikko
|
https://gist.github.com/9cc7dbeaa1b37ff44626
|
[09:37] mikko
|
that causes
|
[09:37] mikko
|
Assertion failed: inpipe_ && outpipe_ (xreq.cpp:42)
|
[09:37] sustrik
|
as for the freeze, it's hung up in zmq_recv
|
[09:37] mikko
|
the freeze is unexpected?
|
[09:38] sustrik
|
nope
|
[09:38] sustrik
|
when using XREP
|
[09:38] sustrik
|
you have to send the identity first
|
[09:38] mikko
|
will zmq_poll show it readable?
|
[09:39] sustrik
|
when exactly?
|
[09:39] sustrik
|
btw, changing socket types to REQ/REP works OK
|
[09:40] mikko
|
it's blocking on zmq_recv, i wonder if polling the socket before the recv would show it as readable
|
[09:45] sustrik
|
it should not
|
[09:45] mikko
|
i can test
|
[09:51] mikko
|
zmq_poll returns it not readable
|
[09:51] mikko
|
good
|
[09:52] sustrik
|
ack
|
[09:52] mikko
|
will zmq_poll show socket non-writable if HWM has been reached?
|
[09:52] mikko
|
the inpipe/outpipe assert might be because of incorrect usage of XRE(P|Q) sockets
|
[09:53] sustrik
|
mikko: yes
|
[09:53] sustrik
|
it will show !writeable
|
[09:54] sustrik
|
as for the assert, it should not happen even if the sockets are used in incorrect way
|
[09:54] sustrik
|
i'll check
|
[09:57] mikko
|
https://gist.github.com/e7779cdc9345967cc75e this is also supposed to block on zmq_term?
|
[09:57] mikko
|
i assume because i connect the PUB socket
|
[10:25] CIA-14
|
zeromq2: Martin Sustrik master * rf22e85f / src/xrep.cpp :
|
[10:25] CIA-14
|
zeromq2: Reverting commit 1d431190f50c86f62460
|
[10:25] CIA-14
|
zeromq2: The patch was supposed to check that pipe writer sends messages
|
[10:25] CIA-14
|
zeromq2: in atomic fashion. However, it prevented the user to read
|
[10:25] CIA-14
|
zeromq2: half of a message and close the socket.
|
[10:25] CIA-14
|
zeromq2: Signed-off-by: Martin Sustrik <sustrik@250bpm.com> - http://bit.ly/c70jEY
|
[10:25] sustrik
|
mikko: the assert is removed from master
|
[10:25] mikko
|
good!
|
[10:26] sustrik
|
what next?
|
[10:26] mikko
|
it's odd that PUB socket close semantics are different depending on whether you bind or connect
|
[10:26] mikko
|
that might be confusing for new users
|
[10:26] sustrik
|
it's that way for all sockets
|
[10:26] sustrik
|
when you connect, a queue is created
|
[10:26] sustrik
|
the messages are stored in it
|
[10:27] sustrik
|
when you bind, there's no queue
|
[10:27] sustrik
|
as you don't even know how many peers there are going to be
|
[10:27] sustrik
|
a queue for a peer is created when the peer connects
|
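The bind/connect difference described above can be sketched with a toy model. This is only an illustration of the explanation given in the chat, not the libzmq implementation; `ToySocket`, `peer_connected` and `pending` are hypothetical names invented for the sketch.

```python
class ToySocket:
    """Toy model of per-peer send queues (hypothetical, not libzmq code)."""

    def __init__(self):
        self.queues = {}  # endpoint or peer name -> list of pending messages

    def connect(self, endpoint):
        # Connecting creates a queue right away, even if the peer is down.
        self.queues[endpoint] = []

    def bind(self, endpoint):
        # Binding creates no queue: the number of future peers is unknown.
        pass

    def peer_connected(self, peer):
        # A bound socket gets a queue only once a peer actually connects.
        self.queues.setdefault(peer, [])

    def send(self, msg):
        # PUB-like fan-out: every existing queue receives the message.
        for queue in self.queues.values():
            queue.append(msg)

    def pending(self):
        # Messages still stored locally, i.e. what a terminate would wait for.
        return sum(len(queue) for queue in self.queues.values())


bound = ToySocket()
bound.bind("tcp://*:5555")
bound.send("hello")        # no peers yet, so nothing is stored

connected = ToySocket()
connected.connect("tcp://somehost:5555")
connected.send("hello")    # stored in the queue created by connect
```

In this model the connected socket is left with one pending message while the bound one has none, which matches the observation that termination blocks only in the connect case.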
[10:28] mikko
|
tricky situation, i think the current semantic for close is a bit problematic but apart from timeout i can't really think anything better either
|
[10:29] sustrik
|
yes, samw here
|
[10:29] sustrik
|
same*
|
[10:29] mikko
|
it's too easy to shoot yourself in the leg at the moment
|
[10:29] sustrik
|
you mean by blocking in term, right?
|
[10:29] mikko
|
for example if your remote peer goes down it might cause things to block eternally. in case of something like php scripts that would bring the whole site down
|
[10:30] sustrik
|
ack
|
[10:30] sustrik
|
we need to add SO_LINGER option
|
[10:30] sustrik
|
btw, reproduced the xreq.cpp:42 problem
|
[10:37] mikko
|
good!
|
[10:38] mikko
|
sustrik: even SO_LINGER is slightly non-deterministic
|
[10:39] mikko
|
as the caller can't know whether it blocks due to "not being able to send" or whether it's sending but hasn't flushed everything yet
|
[10:39] mikko
|
what about making zmq_term non-blocking and returning error code if there are messages in-flight?
|
[10:39] mikko
|
that way the user can handle the different scenarios as needed
|
[10:40] mikko
|
or zmq_term(ctx, 0) for blocking zmq_term(ctx, ZMQ_NOBLOCK);
|
[10:40] mikko
|
latter would come back with EAGAIN if it's still flushing stuff
|
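A rough sketch of how a caller might use the non-blocking termination proposed above. The proposal is only an idea from this discussion, so nothing here is a real 0MQ call: `FakeContext`, `term_nonblocking` and `shutdown` are hypothetical stand-ins, and the drain rate is simulated.

```python
import errno


class FakeContext:
    """Hypothetical stand-in for a context; not a real 0MQ binding."""

    def __init__(self, pending, drain_per_call):
        self.pending = pending                # messages still queued
        self.drain_per_call = drain_per_call  # simulated progress per attempt

    def term_nonblocking(self):
        # Mimics the proposed zmq_term(ctx, ZMQ_NOBLOCK): report EAGAIN
        # while messages are still waiting to be flushed, 0 when done.
        self.pending = max(0, self.pending - self.drain_per_call)
        return errno.EAGAIN if self.pending else 0


def shutdown(ctx, max_attempts):
    """Retry termination, then report what is left so the caller can decide."""
    for _ in range(max_attempts):
        if ctx.term_nonblocking() == 0:
            return "terminated", 0
    return "gave_up", ctx.pending
```

The point of the design is that the caller, not the library, chooses what to do with the leftover count: keep retrying, discard, or store the data locally.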
[10:41] mikko
|
that is an API breakage but isn't api breaks possible in 2.1 ?
|
[10:43] mikko
|
it would enable to do things such as: http://gist.github.com/620337
|
[10:45] mikko
|
the blocking version could also use so_linger to determine timeout
|
[10:45] mikko
|
that way the core library doesn't need to try to give 'one size fits all' solution but to delegate it to the user
|
[10:46] sustrik
|
what's the difference between "not being able to send" and "haven't flushed everything yet"?
|
[10:48] mikko
|
not being able to send is for example if there are no lower level sockets open (not sure if context knows this)
|
[10:48] mikko
|
and the latter is when the messages are flying out to the network stack
|
[10:50] sustrik
|
by the former you mean that there wasn't zmq_bind or zmq_connect called on the socket?
|
[10:53] mikko
|
yes, that as well
|
[10:54] mikko
|
i don't know whether the context knows things about zmq_connect getting back connection refused
|
[10:54] mikko
|
and there is no active connection
|
[10:55] mikko
|
the main problem in close are 'connect'ed sockets
|
[10:55] mikko
|
i assume
|
[10:56] mikko
|
for example: 1. create pub socket 2. call zmq_connect 3. send() (under the hood socket gets connection refused) 4. close the socket 5. close the context
|
[10:57] mikko
|
in this scenario the remote peer is not there so you cannot send
|
[10:57] mikko
|
not sure if that is too much state
|
[10:58] sustrik
|
how does that differ from the case when server went down while sending the message?
|
[11:00] sustrik
|
anyway, if you want to define consistent semantics for the shutdown, you have to forget about underlying transport
|
[11:00] sustrik
|
details of how TCP works are irrelevant
|
[11:01] mikko
|
but that information is relevant to me as a user
|
[11:01] sustrik
|
why so?
|
[11:01] mikko
|
if i call close and there are 100 messages in-flight
|
[11:01] mikko
|
if the same 100 messages are there after 10 seconds i want to be able to act on it
|
[11:02] guido_g
|
because the app-developer knows how to handle the situation
|
[11:02] guido_g
|
'morning btw
|
[11:02] mikko
|
exactly, because my close semantics might depend on the data that the specific socket has been handling
|
[11:03] mikko
|
in some cases i might want to block until they are sent, even if it took days
|
[11:03] sustrik
|
so what you want is reliable delivery
|
[11:03] guido_g
|
no
|
[11:03] guido_g
|
more information on what is going on
|
[11:03] sustrik
|
either get the message to the peer or return it to the sender
|
[11:03] mikko
|
in some cases i might want to discard them if they are not being sent
|
[11:03] sustrik
|
that's what SO_LINGER is for
|
[11:03] guido_g
|
some sort of introspection of the current state of a ømq context or socket
|
[11:04] sustrik
|
impossible in distributed environment
|
[11:04] sustrik
|
the message may be in a device somewhere
|
[11:04] mikko
|
sustrik: i don't care about that
|
[11:04] sustrik
|
the library has no idea what state it is in
|
[11:04] guido_g
|
that's bad
|
[11:04] mikko
|
sustrik: as a developer all i care is that it has left my program
|
[11:04] mikko
|
or that it's not leaving my program
|
[11:05] mikko
|
think about the following scenario: i send 100 huge messages, the remote peer is consuming them but slowly. given small so_linger the messages might be discarded even if the remote peer is actually consuming
|
[11:06] mikko
|
that situation is different from a situation where the messages are in memory and are not being consumed at all
|
[11:06] mikko
|
i'm not saying that so_linger is not useful. it is for some scenarios but it's still a bit non-deterministic
|
[11:07] mikko
|
if i've closed my sockets, i'm not sending anything and the messages are not leaving my program i would like to know about that
|
[11:07] mikko
|
i dont need to care whether the remote peer is actually down or network is down. i just want to know they are not being sent and act on it
|
[11:07] sustrik
|
i think the problem in your reasoning is that you assume we know whether messages are being consumer or not
|
[11:07] mikko
|
depending on data i might choose to discard it or store locally
|
[11:07] sustrik
|
what does it exactly mean?
|
[11:08] sustrik
|
consumed*
|
[11:09] mikko
|
apart from inproc, to me it means that the message has left the current program
|
[11:10] sustrik
|
we can drop them then, no?
|
[11:10] mikko
|
as a developer i would like to choose
|
[11:10] mikko
|
keep blocking or discard
|
[11:14] sustrik
|
i still don't follow, how would you do the decision, based on what?
|
[11:15] mikko
|
i would do the decision based on the data
|
[11:15] mikko
|
(not sure if that answers the question)
|
[11:15] sustrik
|
what data?
|
[11:16] mikko
|
let me try to write down the scenarios i got in my head
|
[11:16] mikko
|
just a sec
|
[11:16] sustrik
|
you mean based on number of messages in 0mq's send buffer?
|
[11:17] mikko
|
the data that my application was handling and based on whether the send buffer is getting smaller over a period of time
|
[11:17] sustrik
|
ah, you want to shutdown depending on the throughput
|
[11:18] sustrik
|
if throughput goes below certain threshold => shutdown
|
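The throughput rule stated above could look roughly like this. It assumes the application can sample some queue-depth figure, which is exactly the metric being debated here, so `queue_depths` and `min_drained_per_sample` are hypothetical inputs rather than anything 0MQ provides.

```python
def should_give_up(queue_depths, min_drained_per_sample):
    """Decide whether to stop waiting, given successive queue-depth samples.

    queue_depths: depths of the send queue sampled at a fixed interval
                  (hypothetical metric, oldest sample first).
    min_drained_per_sample: minimum number of messages that must leave
                  the queue between two samples to count as progress.
    """
    for before, after in zip(queue_depths, queue_depths[1:]):
        if after == 0:
            return False  # everything was flushed, nothing to give up on
        if before - after < min_drained_per_sample:
            return True   # drain rate fell below the threshold
    return False          # still draining fast enough
```

A slow but steady consumer keeps the drain rate above the threshold and is left alone, while a queue that stops shrinking triggers the shutdown decision.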
[11:18] mikko
|
that was my original suggestion
|
[11:18] mikko
|
ages ago
|
[11:19] sustrik
|
yeah, that's semantically consistent solution
|
[11:19] mikko
|
because as an application developer i might want to do different decisions based on the data available to me: how many messages in flight? are the messages leaving my program? what kind of data i was sending, can i just discard it or do i need to do more?
|
[11:22] sustrik
|
"how many messages in flight?"
|
[11:22] sustrik
|
that's messages in 0mq transmit buffer?
|
[11:22] mikko
|
yes
|
[11:23] mikko
|
i hope you see my point through this babbling
|
[11:23] mikko
|
:)
|
[11:23] sustrik
|
what about messages in TCP tx buffer?
|
[11:24] mikko
|
how large buffers are we talking about?
|
[11:25] sustrik
|
TCP tx buffer?
|
[11:25] sustrik
|
depends
|
[11:25] sustrik
|
128kB
|
[11:25] sustrik
|
1MB
|
[11:25] sustrik
|
shrug
|
[11:25] guido_g
|
on one side we're not allowed to see tcp through ømq and on the other side we're asked what we need to know about its state, confusing
|
[11:26] sustrik
|
exactly
|
[11:26] sustrik
|
you should not see it at all
|
[11:27] guido_g
|
what i'd like to see in the future is more thought on how to get these parameters of operation out of ømq
|
[11:27] guido_g
|
for things like monitoring
|
[11:27] sustrik
|
ack
|
[11:28] sustrik
|
there are 2 levels to the monitoring imo
|
[11:28] sustrik
|
1. network monitoring
|
[11:28] guido_g
|
i -- in the role of an ops guy -- want to know how many connections from which host are done, if there are failures and how much per etc.
|
[11:28] sustrik
|
done on IP level
|
[11:29] sustrik
|
2. device monitoring -- connecting to 0mq device and finding out how many messages are queued there and so on
|
[11:29] sustrik
|
what's a failure?
|
[11:29] guido_g
|
also i want to correlate that with the applications state and behaviour
|
[11:29] guido_g
|
a failure is this kind of situation that ops defines as a failure
|
[11:29] guido_g
|
nothing more or less
|
[11:30] sustrik
|
can you give an example?
|
[11:30] guido_g
|
in my eyes ømq as a library should provide a way to peek into its workings
|
[11:30] guido_g
|
monitoring != alerting
|
[11:30] guido_g
|
the monitoring is just collecting the data -- for starters
|
[11:31] guido_g
|
if i can't get key data like average queue sizes i'm basically lost
|
[11:32] guido_g
|
i know that this data isn't accurate, but it hasn't to be
|
[11:32] guido_g
|
most data is aggregated anyway
|
[11:32] sustrik
|
the problem is there's no real definition for "messages in flight"
|
[11:33] sustrik
|
if what you are worried about is memory consumption
|
[11:33] sustrik
|
you should monitor the memory used by your app
|
[11:33] guido_g
|
then stick a different label on the data and be done
|
[11:34] guido_g
|
sure, memory, cpu, ctx switches all known
|
[11:35] guido_g
|
except for the fact that (seen from app level) i can't say: for timespan ts there were 1000 messages sent from node a, but only 40 received by node b
|
[11:35] guido_g
|
which amazingly correlates with the memory consumption on node a
|
[11:35] guido_g
|
and the reconnect rate of the corresponding sockets
|
[11:35] sustrik
|
wait a sec
|
[11:36] guido_g
|
sure
|
[11:36] sustrik
|
why can't you say how many messages you've sent and how many you've received?
|
[11:36] guido_g
|
this one i can do
|
[11:37] guido_g
|
but it gets a little complicated if ømq routing kicks in
|
[11:37] guido_g
|
and queueing
|
[11:37] guido_g
|
then i'm completely blind
|
[11:37] guido_g
|
obviously a fact i don't anticipate
|
[11:37] sustrik
|
the queueing is just a buffer, same as tcp tx buffer
|
[11:37] sustrik
|
set the HWM
|
[11:38] sustrik
|
and you have an upper limit on the buffer
|
[11:39] guido_g
|
why is it so complicated to understand that this data is kind of important?
|
[11:40] sustrik
|
because it has no clear semantics
|
[11:40] sustrik
|
if you can't say what the figure means, you don't need it
|
[11:40] guido_g
|
huh?
|
[11:41] sustrik
|
all i want is a clear definition of the figure you want 0MQ to provide
|
[11:42] sustrik
|
one that won't change arbitrarily depending on where the data is accidentally stored
|
[11:42] guido_g
|
why should I define "semantics" of data that is already there? shouldn't this be done beforehand?
|
[11:42] sustrik
|
whether it's in 0mq buff, tcp buff, NICs buff etc.
|
[11:42] guido_g
|
we're talking about ømq
|
[11:42] sustrik
|
let me give you an example
|
[11:42] guido_g
|
so the topic is set, no ip, tcp or moonphase
|
[11:43] sustrik
|
say you connect
|
[11:43] sustrik
|
then you send a message
|
[11:43] sustrik
|
the peer goes offline in the meantime
|
[11:43] sustrik
|
what's the number of "messages in flight"?
|
[11:44] guido_g
|
not in flight
|
[11:45] guido_g
|
there is a number of messages in the queue
|
[11:45] sustrik
|
ok, so what's the number of messages in queue
|
[11:45] guido_g
|
this would be one of the numbers people might be interested in
|
[11:45] sustrik
|
?
|
[11:45] guido_g
|
how much messages are in the send queue or queues
|
[11:46] sustrik
|
1?
|
[11:46] sustrik
|
the problem is it depends on details of how TCP works
|
[11:46] sustrik
|
and timing
|
[11:46] guido_g
|
NO
|
[11:47] guido_g
|
it depends on how many send calls have put something into the queues, no?
|
[11:47] sustrik
|
no
|
[11:47] guido_g
|
tcp is no ømq
|
[11:47] sustrik
|
what happens is that 0mq is either able to push the message to TCP buffer
|
[11:48] guido_g
|
then it's removed from the send q, right?
|
[11:48] sustrik
|
before TCP realises the other endpoint is not available
|
[11:48] sustrik
|
or the order of events is reverse
|
[11:48] sustrik
|
i.e. TCP realises the peer is not available first
|
[11:48] guido_g
|
see, you're thinking way too deep here
|
[11:49] sustrik
|
then the message stays in 0mq buffer
|
[11:49] sustrik
|
so the figure is either 0 or 1
|
[11:49] sustrik
|
depending on tcp details
|
[11:49] guido_g
|
it's just about getting some numbers that might help to spot or trace problems and performance
|
[11:49] sustrik
|
exactly
|
[11:50] sustrik
|
so let's define them in a consistent way
|
[11:50] sustrik
|
rather than depending on details of underlying network transport
|
[11:50] sustrik
|
that way you are generic, consistent and future-proof
|
[11:50] guido_g
|
as i said, number of messages in a queue is a very nice and probably useful number
|
[11:51] sustrik
|
it's a definition based on implementation details
|
[11:51] sustrik
|
real definition should be based on observable behaviour
|
[11:52] guido_g
|
no
|
[11:52] guido_g
|
because you provide an "abstraction"
|
[11:52] sustrik
|
exactly
|
[11:52] sustrik
|
abstraction works only if you abstract from implementation details
|
[11:52] guido_g
|
the visible behaviour does not show what is going on
|
[11:53] sustrik
|
i mean observable behaviour such as "memory usage"
|
[11:53] sustrik
|
that's pretty clear
|
[11:53] guido_g
|
every abstraction leaks
|
[11:53] guido_g
|
the more you want to hide, the more leakage happens
|
[11:53] guido_g
|
a bad situation for both sides
|
[11:54] sustrik
|
ok, we've got into theoretical discussion :)
|
[11:54] guido_g
|
the app-devs are using "undocumented features" to get what they want and the lib-devs try to stop that
|
[11:54] guido_g
|
sustrik: not my fault
|
[11:54] sustrik
|
:)
|
[11:54] sustrik
|
it's about layering, in a correctly designed stack
|
[11:55] guido_g
|
see, monitoring is an extremely important thing, imnsho
|
[11:55] sustrik
|
if layer N doesn't provide enough flexibility, you shift down to layer N-1
|
[11:55] sustrik
|
guido_g: definitely
|
[11:55] sustrik
|
but let's do it right
|
[11:55] guido_g
|
i do need a lot of information about the current state of my apps, including the communication
|
[11:55] sustrik
|
monitoring random implementation details makes no sense
|
[11:56] guido_g
|
sure, but being picky on names of data isn't very helpful imho
|
[11:56] sustrik
|
we have to monitor real data
|
[11:56] sustrik
|
i don't care about name
|
[11:56] sustrik
|
what i'm saying is that size of 0mq queue is an implementation detail
|
[11:56] guido_g
|
but an important one
|
[11:56] guido_g
|
if i use ømq i know that
|
[11:57] guido_g
|
i mean, i know that i use ømq
|
[11:57] sustrik
|
that's because you ignore all the layers below 0mq and all the devices on your path
|
[11:57] guido_g
|
so no further abstraction is needed
|
[11:57] guido_g
|
for now and this discussion, yes
|
[11:57] sustrik
|
i still don't see what you would use the number for
|
[11:57] guido_g
|
but devices are formed with ømq so...
|
[11:58] sustrik
|
it's completely random
|
[11:58] sustrik
|
if you send 200kB of messages
|
[11:58] guido_g
|
no
|
[11:58] sustrik
|
and there's TCP tx buffer of 120 kB
|
[11:58] sustrik
|
you'll have 80kB in 0mq queue
|
[11:58] sustrik
|
if the TCP buffer is accidentally set to 200kB
|
[11:58] sustrik
|
the 0mq queue will be empty
|
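The arithmetic in this example can be written out as a small sketch. It assumes the peer reads nothing and that the TCP transmit buffer fills first, with the remainder staying in the 0MQ queue; `split_across_buffers` is a hypothetical helper, not part of any 0MQ API.

```python
def split_across_buffers(sent_kb, tcp_tx_buffer_kb):
    """Return (kB held in the TCP tx buffer, kB left in the 0MQ queue).

    Simplified model: the peer consumes nothing, the TCP buffer is
    filled first and whatever does not fit stays in the 0MQ queue.
    """
    in_tcp = min(sent_kb, tcp_tx_buffer_kb)
    in_zmq_queue = sent_kb - in_tcp
    return in_tcp, in_zmq_queue


print(split_across_buffers(200, 120))  # (120, 80)
print(split_across_buffers(200, 200))  # (200, 0)
```

The same 200 kB of application data yields a queue size of 80 kB or 0 kB depending only on the TCP buffer setting, which is the reason the queue size is called an implementation detail in this discussion.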
[11:58] guido_g
|
see it as an indicator
|
[11:59] sustrik
|
exactly, it's an indicator
|
[11:59] guido_g
|
w/o the data you will lose information on what the whole system is doing
|
[11:59] sustrik
|
try to define what it is indicating
|
[11:59] sustrik
|
then try to find a consistent indicator
|
[11:59] guido_g
|
but with this indicator at hand, you might find a way to predict upcoming problems or shortcomings etc.
|
[11:59] guido_g
|
this is the whole point of monitoring
|
[12:00] guido_g
|
and if this number is already a problem, then wait for the tcp connection details one might need...
|
[12:03] guido_g
|
ok, what number can ømq provide which reflects the number of messages that the application has sent but that are not put into the lower layer for delivery?
|
[12:04] guido_g
|
i mean, there must be a point where ømq treats a message as delivered (in the sense that the lower level has taken responsibility)
|
[12:07] sustrik
|
it's on 0MQ API
|
[12:08] sustrik
|
when you call zmq_send, you transfer the responsibility
|
[12:10] guido_g
|
to ømq
|
[12:10] guido_g
|
but between the send on the app side and the send from ømq to os is "something"
|
[12:11] sustrik
|
well, yes
|
[12:11] sustrik
|
and?
|
[12:11] sustrik
|
there's some 6 layers of functionality below zmq_send call
|
[12:11] sustrik
|
most of them doing some buffering
|
[12:11] guido_g
|
and because this "something" is quite important, one needs to know if "something" is feeling well etc.
|
[12:12] sustrik
|
understand me right, i am not against monitoring
|
[12:12] sustrik
|
i just want to monitor metrics that have real meaning
|
[12:12] guido_g
|
ok
|
[12:12] sustrik
|
let's rather start from use cases
|
[12:13] guido_g
|
above i gave one
|
[12:14] guido_g
|
"messages" put into ømq via send vs. "messages" removed from ømq's responsibility
|
[12:14] guido_g
|
ops
|
[12:14] sustrik
|
that's not a use case
|
[12:14] sustrik
|
that's a solution
|
[12:14] guido_g
|
it is
|
[12:14] sustrik
|
use case is "what you want to do"
|
[12:14] guido_g
|
no
|
[12:15] sustrik
|
:)
|
[12:15] sustrik
|
anyway, what do you want to do?
|
[12:15] sustrik
|
i can see two options:
|
[12:15] guido_g
|
most infrastructure things are not very well described by use-cases
|
[12:15] sustrik
|
1. memory monitoring
|
[12:15] sustrik
|
2. latency monitoring
|
[12:15] guido_g
|
monitoring in itself is not a closed system that can be described statically
|
[12:16] sustrik
|
c'mon you have to know what you want :)
|
[12:16] guido_g
|
for all these points we need some numbers, right?
|
[12:16] sustrik
|
yes, we need metrics to monitor
|
[12:16] guido_g
|
i know what i want now, yes
|
[12:16] guido_g
|
but i cant know what ops will need in 3 months/years
|
[12:17] sustrik
|
solve those then
|
[12:17] guido_g
|
but i've to provide as much of possibilities as possible
|
[12:17] guido_g
|
that's my job
|
[12:17] guido_g
|
if you don't have the data, you can't
|
[12:17] sustrik
|
my job is to cur possibilities :)
|
[12:17] sustrik
|
cut
|
[12:17] guido_g
|
good
|
[12:18] sustrik
|
some balance may result from us two discussing
|
[12:18] guido_g
|
yes
|
[12:18] sustrik
|
basically, 0mq resulted from taking a corporate middleware and cutting everything not strictly needed off
|
[12:19] guido_g
|
and now we need to put things back in, otherwise it's not useable for larger projects
|
[12:19] sustrik
|
so when adding a feature back we need a serious understanding of why it's needed
|
[12:19] guido_g
|
where larger is more than a handful of nodes
|
[12:19] sustrik
|
otherwise we'll end up back in corporate middleware sphere
|
[12:19] sustrik
|
agreed
|
[12:19] sustrik
|
but extreme caution is needed
|
[12:20] guido_g
|
ack
|
[12:20] guido_g
|
one of the "key features" of ømq is its size
|
[12:20] sustrik
|
low memory footprint
|
[12:20] sustrik
|
right
|
[12:20] guido_g
|
and slick api
|
[12:20] sustrik
|
yes
|
[12:21] sustrik
|
we need a way to keep the memory footprint low
|
[12:21] sustrik
|
i am aware of that
|
[12:21] sustrik
|
HWM is already implemented
|
[12:21] sustrik
|
we need "max message size" option
|
[12:21] sustrik
|
as well
|
[12:21] sustrik
|
but that's orthogonal to monitoring
|
[12:22] guido_g
|
i think we should not start on the api side of monitoring
|
[12:22] guido_g
|
we should start by finding "interesting" data points in ømq
|
[12:23] sustrik
|
ack
|
[12:23] sustrik
|
so, can you produce a monitoring use case?
|
[12:23] guido_g
|
if we have that, i'm sure there will be a consistent way to gain access to them
|
[12:23] guido_g
|
i'll try
|
[12:23] sustrik
|
that'll be great
|
[12:23] sustrik
|
are you going to arrive at amsterdam btw?
|
[12:24] guido_g
|
hmmm...
|
[12:24] guido_g
|
would be the most expensive beer i ever had
|
[12:24] sustrik
|
same here
|
[12:24] mato
|
hi guys
|
[12:25] sustrik
|
anyway, i think we should do a conference later on anyway
|
[12:25] guido_g
|
but on the other hand, would be nice to discuss face to face (and scare away innocent bystanders :)
|
[12:25] mato
|
sustrik: check also brussels, pieter was offering space to crash at his place
|
[12:26] sustrik
|
i thought of doing some event during/after FOSSDEM
|
[12:26] sustrik
|
that's february
|
[12:26] sustrik
|
and makes the whole thing more worth of coming
|
[12:26] sustrik
|
as you can attend the conference as well
|
[12:27] guido_g
|
sounds good
|
[12:30] guido_g
|
ok, need to do something for my health (besides eating :)
|
[12:31] guido_g
|
will come up with some ideas regarding monitoring
|
[12:33] sustrik
|
great
|
[12:33] sustrik
|
thanks
|
[13:20] mikko
|
sustrik: http://github.com/mkoppanen/php-zmq/issues#issue/11 does this look familiar?
|
[13:20] mikko
|
seems like the segfault happens inside uuid
|
[13:45] sustrik
|
mikko: no
|
[13:45] sustrik
|
yes, the segfault is inside uuid
|
[13:46] sustrik
|
it's either invalid buffer passed to uuid_generate
|
[13:46] sustrik
|
or a bug in libuuid
|
[13:46] sustrik
|
anyway, hard to say what has gone wrong without reproducing the case
|
[13:47] mikko
|
i remember seeing this ages ago. it was due to linking order of libuuid
|
[13:47] sustrik
|
oh my
|
[13:47] mikko
|
http://usrportage.de/archives/922-PHP-segfaulting-with-pecluuid-and-peclimagick.html
|
[13:48] mikko
|
someone blogged about similar issue where two modules are linked against libuuid
|
[13:48] mikko
|
and it was fixed by changing the loading order of them
|
[13:48] mikko
|
which sounds pretty strange
|
[13:50] sustrik
|
well, if libuuid has some code hooked to the loading of the library
|
[13:50] sustrik
|
some strange misinteraction may happen
|
[13:50] sustrik
|
causing it to be used before it is initialised
|
[13:53] sustrik
|
maybe it's initialised twice
|
[13:53] sustrik
|
then deinitialised once
|
[13:53] sustrik
|
then called
|
[13:53] sustrik
|
?
|
[13:54] mikko
|
im just reading through libuuid code
|
[13:56] mikko
|
it's not that
|
[13:57] mikko
|
the guy commented
|
[13:57] mato
|
sustrik: see my email re the version patch, please put back the two lines I asked for, you've broken make dist
|
[13:58] mato
|
sustrik: that *and* the version number propagation to doc/Makefile
|
[14:00] mikko
|
sustrik: rather interesting valgrind output
|
[14:05] mato
|
mikko: rpath patch has been sent off to sustrik for applying
|
[14:05] mikko
|
mato: nice
|
[14:05] mikko
|
i found hudson iphone application
|
[14:06] mikko
|
i've been checking the builds even on the move :)
|
[14:06] mato
|
:-)
|
[14:10] sustrik
|
mato: hey
|
[14:10] sustrik
|
should i apply patches to the build system?
|
[14:10] mato
|
sustrik: damnit well, you just did
|
[14:10] mato
|
sustrik: and you broke it
|
[14:11] mato
|
sustrik: so now please fix what you broke :-)
|
[14:11] sustrik
|
i mean from procedural point of view
|
[14:11] sustrik
|
am i the only committer?
|
[14:11] mato
|
sustrik: you're the only committer to the github hosted-repository, yes
|
[14:11] mato
|
sustrik: that's the way it should work
|
[14:11] sustrik
|
ok
|
[14:12] mato
|
sustrik: otherwise things get problematic due to the maint branch
|
[14:12] mikko
|
no more holidays for Martin
|
[14:12] mato
|
:-)
|
[14:12] sustrik
|
ugh
|
[14:12] sustrik
|
:)
|
[14:13] mato
|
sustrik: it'll help a lot if you eventually use a real mail client and/or pull requests
|
[14:13] mato
|
sustrik: since as I showed you, applying X patches becomes one command
|
[14:13] mato
|
no hand work involved
|
[14:14] sustrik
|
you have to show me how to do that later on
|
[14:15] mato
|
will do, but you'll have to move to a better mail client
|
[14:15] mato
|
since Thunderbird doesn't understand "Save As" means "Save this without mangling it" :-)
|
[14:41] CIA-14
|
zeromq2: Martin Sustrik maint * r6cd0867 / configure.in :
|
[14:41] CIA-14
|
zeromq2: Fixing the Red Hat packaging
|
[14:41] CIA-14
|
zeromq2: When adding ZMQ_VERSION macros, I incorrectly removed
|
[14:41] CIA-14
|
zeromq2: the PACKAGE_VERSION macro. Adding it back.
|
[14:41] CIA-14
|
zeromq2: Signed-off-by: Martin Sustrik <sustrik@250bpm.com> - http://bit.ly/amuEWp
|
[14:41] CIA-14
|
zeromq2: Martin Lucina maint * r57428db / configure.in : (log message trimmed)
|
[14:41] CIA-14
|
zeromq2: configure.in: Do not patch libtool rpath handling
|
[14:41] CIA-14
|
zeromq2: For historic reasons (mainly compatbility with really old libtool), configure was
|
[14:41] CIA-14
|
zeromq2: patching libtool to not use rpath in binaries. This breaks (among other things)
|
[14:41] CIA-14
|
zeromq2: correct operation of "make check" since the test binaries may not be run with
|
[14:41] CIA-14
|
zeromq2: the correct shared library version.
|
[14:41] CIA-14
|
zeromq2: Current best practice as seen e.g. at http://wiki.debian.org/RpathIssue suggests
|
[14:42] sustrik
|
mato: done
|
[14:43] mikko
|
do those go into master as well?
|
[14:43] CIA-14
|
zeromq2: Martin Sustrik master * r6cd0867 / configure.in :
|
[14:43] CIA-14
|
zeromq2: Fixing the Red Hat packaging
|
[14:43] CIA-14
|
zeromq2: When adding ZMQ_VERSION macros, I incorrectly removed
|
[14:43] CIA-14
|
zeromq2: the PACKAGE_VERSION macro. Adding it back.
|
[14:43] CIA-14
|
zeromq2: Signed-off-by: Martin Sustrik <sustrik@250bpm.com> - http://bit.ly/amuEWp
|
[14:43] CIA-14
|
zeromq2: Martin Lucina master * r57428db / configure.in : (log message trimmed)
|
[14:43] CIA-14
|
zeromq2: configure.in: Do not patch libtool rpath handling
|
[14:43] CIA-14
|
zeromq2: For historic reasons (mainly compatbility with really old libtool), configure was
|
[14:43] CIA-14
|
zeromq2: patching libtool to not use rpath in binaries. This breaks (among other things)
|
[14:43] CIA-14
|
zeromq2: correct operation of "make check" since the test binaries may not be run with
|
[14:43] CIA-14
|
zeromq2: the correct shared library version.
|
[14:43] CIA-14
|
zeromq2: Current best practice as seen e.g. at http://wiki.debian.org/RpathIssue suggests
|
[14:43] CIA-14
|
zeromq2: Martin Sustrik master * re168173 / configure.in :
|
[14:43] CIA-14
|
zeromq2: Merge branch 'maint'
|
[14:44] mato
|
sustrik: thx
|
[14:44] mikko
|
ah
|
[14:46] sustrik
|
mikko: yes?
|
[14:47] mikko
|
sustrik: building now
|
[14:48] mikko
|
All 7 tests passed
|
[14:48] mikko
|
rpath thingie fixed the build for me
|
[14:48] sustrik
|
great
|
[14:48] sustrik
|
how come all the bindings work
|
[14:48] sustrik
|
?
|
[14:48] mato
|
mikko: you might want to add 'make dist' to the build also
|
[14:48] sustrik
|
mikko: what's the link?
|
[14:48] mikko
|
http://valokuva.org:8080/
|
[14:48] mikko
|
it's building all the dependent projects atm
|
[14:49] mikko
|
yeah, means "photograph" in finnish
|
[14:49] mikko
|
mato: i'll add it
|
[14:49] sustrik
|
wow, great, i know one finnish word now
|
[14:49] keffo
|
terve!
|
[14:49] sustrik
|
though, i can't remember it :)
|
[14:49] mikko
|
mato: done
|
[14:53] mikko
|
building with 'make dist' now
|
[14:55] sustrik
|
mato: btw, there was some discussion about changing the license headers in 0mq source code
|
[14:55] sustrik
|
is there any outcome of that?
|
[14:55] mato
|
sustrik: talked about it with pieter, afaik does not need to be changed
|
[14:56] mato
|
sustrik: only the README and various supporting files (LGPL exception) need to be changed
|
[14:56] mato
|
sustrik: but not the actual source files, since neither the original copyright (iMatix) nor the license (LGPL) has changed
|
[14:56] sustrik
|
There's wrong name of the license there
|
[14:56] mato
|
sustrik: oh, only thing is, there is a wording error
|
[14:56] mikko
|
make[2]: *** No rule to make target `zmq_forwarder.1', needed by `dist-hook'. Stop.
|
[14:56] mato
|
yes, i just remembered
|
[14:57] mato
|
mikko: ja, you want asciidoc + xmlto for make dist, since it generates documentation
|
[14:57] mikko
|
im missing the doc generation tools
|
[14:57] mikko
|
hmm
|
[14:57] mikko
|
should make dist fail if those are not in place during configure?
|
[14:57] mato
|
make dist is special
|
[14:57] mato
|
in that, most users will never touch it
|
[14:57] mato
|
so, maybe, no, whatever, doesn't matter right now :-)
|
[14:58] mikko
|
hehe
|
[14:58] mato
|
sustrik: yeah, so, that stuff should be fixed, but no hurry
|
[14:58] mato
|
sustrik: TBD before a release
|
[14:58] mikko
|
installing the tools and rebuilding soon
|
[14:58] sustrik
|
akc
|
[14:58] sustrik
|
ack
|
[14:58] mato
|
sustrik: or we can do it together some time, involves writing a script
|
[14:59] mato
|
sustrik: that way you don't go changing files by hand :-)
|
[14:59] sustrik
|
i can do it by hand
|
[14:59] sustrik
|
but script is definitely better
|
[15:00] mato
|
sustrik: man, sometimes i feel you actually like feeding computers :-)
|
[15:00] mato
|
sustrik: "do it by hand" ... geez ...
|
[15:00] keffo
|
they're supposed to be fed!
|
[15:00] sustrik
|
it imposes discipline on a programmer
|
[15:00] sustrik
|
which is a good thing
|
[15:00] mato
|
fed by code, not by programmers :-)
|
[15:00] sustrik
|
a bit similar to brainwashing
|
[15:00] sustrik
|
but still good :)
|
[15:00] keffo
|
indeed, programmers solve problems.. The best programmer is the one already done :)
|
[15:31] mikko
|
mato: http://valokuva.org:8080/job/ZeroMQ2_master/ws/zeromq-2.1.0.tar.gz
|
[15:31] mikko
|
make dist built that
|
[15:32] mato
|
mikko: great...
|
[15:34] mikko
|
was missing zip as well
|
[15:34] mikko
|
noticed
|
[16:00] CIA-14
|
zeromq2: Steven McCoy master * r5b8af52 / (src/pgm_receiver.cpp src/pgm_sender.cpp):
|
[16:00] CIA-14
|
zeromq2: Fix assertion in PGM transports on cancel_timer
|
[16:00] CIA-14
|
zeromq2: Signed-off-by: Steven McCoy <steven.mccoy@miru.hk> - http://bit.ly/aknF0L
|
[16:14] delaney
|
hi, i'm curious why there aren't pre-built binaries for windows on the site download page
|
[16:17] mato
|
sustrik: those patches you applied from steve, where did they come from?
|
[16:18] sustrik
|
from steve
|
[16:18] mato
|
sustrik: the reason i'm asking is that on the ML I did not see patches with a Signed-Off-By tag
|
[16:18] mato
|
sustrik: but the commit has a Signed-Off-By tag...
|
[16:18] mato
|
sustrik: so I'm confused
|
[16:19] sustrik
|
damn, i've got that wrong
|
[16:19] sustrik
|
let me ask steven to sign them off post hoc
|
[16:20] mato
|
np, you're learning ... signing them off "post hoc" won't really help anything now
|
[16:20] mato
|
anyway, no real problem
|
[16:20] mato
|
no panic
|
[16:21] mato
|
just do remember to double check what you're pushing to github makes sense :-)
|
[16:21] sustrik
|
why won't it help?
|
[16:21] mato
|
because it's not in the git history
|
[16:21] mato
|
hence not persistent
|
[16:22] sustrik
|
?
|
[16:22] sustrik
|
there's sign-off in the repo
|
[16:22] mato
|
it doesn't matter much though since the licensing is automatic
|
[16:22] mato
|
signoff is just tracking
|
[16:22] mato
|
ah, right, added by you :-)
|
[16:22] mato
|
bad you :-)
|
[16:22] delaney
|
i'm a python guy and was looking to use zmq, would it be useful to the project to include the dll i just made on the download area?
|
[16:22] sustrik
|
so we need just steve to approve the sign-off now
|
[16:23] mato
|
well, there's no point
|
[16:23] mato
|
you added a signed-off-by tag
|
[16:23] mato
|
anyhow
|
[16:23] sustrik
|
by signing it off steven basically says "yes, i've created the patch myself"
|
[16:23] sustrik
|
that's it
|
[16:23] mato
|
the only point is, review everything you actually push to master :-)
|
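A minimal sketch of the point being discussed, assuming the sign-off is a plain `Signed-off-by:` trailer line in the commit message, as in the commits announced above. The helper name `signed_off_by` is hypothetical. It only lists the trailers that are present; it cannot tell who actually added them, which is exactly the gap mato is pointing at.

```python
def signed_off_by(commit_message):
    """Return the values of all Signed-off-by trailers in a commit message."""
    prefix = "signed-off-by:"
    signers = []
    for line in commit_message.splitlines():
        stripped = line.strip()
        # Match the trailer case-insensitively and keep the name/email part.
        if stripped.lower().startswith(prefix):
            signers.append(stripped[len(prefix):].strip())
    return signers


message = (
    "Fix assertion in PGM transports on cancel_timer\n"
    "\n"
    "Signed-off-by: Steven McCoy <steven.mccoy@miru.hk>\n"
)
print(signed_off_by(message))
```

An empty list means the patch carries no sign-off at all, which is the case to catch before pushing.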
[16:24] mato
|
i thought you liked bureaucracy :-)
|
[16:24] sustrik
|
i do
|
[16:24] sustrik
|
i just need few more patches to get it right
|
[16:24] sustrik
|
delaney: the problem is not providing binaries, rather maintaining them
|
[16:24] mato
|
or you could use the right tool... hang on... bureaucracy... right... involves doing everything by hand so that the job takes as long as possible :-)
|
[16:25] mato
|
i have to go
|
[16:25] sustrik
|
cya
|
[16:25] mato
|
sustrik: will make a robust patch for the version stuff, the latest idea looks ok
|
[16:25] mato
|
cyl
|
[16:25] sustrik
|
delaney: building new binaries when new version is released etc.
|
[16:26] delaney
|
yeah, true. still. i'm getting a 'Unable to find vcvarsall.bat' from easy_install but i have msvc 2010 express, any ideas?
|
[16:27] sustrik
|
no idea, sorry
|
[16:35] pieterh
|
delaney, it's normally produced by the installer if you ask for command line use
|
[16:38] starkdg
|
as an aside, is there any way to monitor buffer length in the io queues ? the number of messages ?
|
[16:38] starkdg
|
it might be a feature worth considering ?
|
[16:46] delaney
|
i'm trying to follow the http://www.zeromq.org/docs:windows-installations which i didn't see before... I'm able to build the solution but there is no libzmq.lib in the zeromq2\lib directory, only the libzmq.dll
|
[16:48] delaney
|
hmm, all seems to install to site-packages now, not sure what i did
|
[16:48] delaney
|
is that a misprint, should it be 'copy libzmq.DLL'?
|
[16:56] delaney
|
when i try to run the chat example i get http://pastebin.com/U1nrBEsU
|
[16:57] delaney
|
please excuse my c++/c n00bness
|
[17:02] mikko
|
are you compiling against github master?
|
[17:03] delaney
|
no, off the downloads, let me try that
|
[17:47] sustrik
|
delaney: it looks like you are passing invalid argument to bind
|
[17:48] sustrik
|
what's the string you are using?
|
[18:18] delaney
|
python display.py 127.0.0.1
|
[18:18] delaney
|
using the examples/chat, haven't touched the code
|
[18:44] sustrik
|
you are missing the port number i would say
|
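A small sketch of the missing-port diagnosis, assuming endpoints take the `tcp://host:port` form seen elsewhere in this log (for example `tcp://127.0.0.1:22000`). The helper `has_tcp_port` and port 5555 are illustrative only and are not part of 0MQ or the chat example; what argument format `display.py` itself expects is not shown in the log.

```python
from urllib.parse import urlsplit


def has_tcp_port(endpoint):
    """Return True if endpoint looks like tcp://host:port with a numeric port."""
    try:
        parts = urlsplit(endpoint)
        if parts.scheme != "tcp" or not parts.hostname:
            return False
        # .port raises ValueError when the port is not a valid integer.
        return parts.port is not None
    except ValueError:
        return False


for candidate in ("127.0.0.1", "tcp://127.0.0.1", "tcp://127.0.0.1:5555"):
    print(candidate, has_tcp_port(candidate))
```

A bare address such as `127.0.0.1` fails the check, which matches the invalid-argument symptom described above.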
[18:49] pieterh
|
delaney, did you read the user guide?
|
[19:00] delaney
|
ah, no i didn't, thought i'd just run the examples
|
[19:00] delaney
|
that makes more sense
|
[20:34] rphillips
|
I'm running strace on a misbehaving subscriber daemon. I don't see a TCP connect after zmq_connect() with a tcp:// endpoint
|
[20:34] rphillips
|
should I?
|
[20:39] rgl
|
not immediately. but soon a connection is going to be made in a background thread (the I/O thread of zmq).
|
[20:42] rphillips
|
strange... I don't see one
|
[21:09] rphillips
|
rgl: zdevice zmq_forwarder "tcp://127.0.0.1:22000" "tcp://127.0.0.1:22001" is creating netlink raw sockets on my system... that doesn't look correct
|
[21:11] rgl
|
maybe zmq has special handling for the loopback interface
|
[21:11] rgl
|
can you try connecting to different machines?
|
[21:20] rphillips
|
that seems to help... I'll have to submit a patch to the resolver code
|
[21:20] rphillips
|
thanks
|