Monday October 11, 2010

[Time] Name Message
[06:50] sustrik petrilli: can you spell your problems with java binding more explicitly
[06:50] sustrik ?
[06:50] sustrik having a list of issues could make it move forward faster
[08:01] mikko good morning
[08:13] sustrik morning
[08:24] mikko Assertion failed: term_acks > 0 (own.cpp:175)
[08:24] mikko this random assertion keeps popping up
[08:24] mikko let me make sure that i got the latest master
[08:35] mikko sustrik: at the moment on master: the context close will block even if the sockets are closed ?
[08:36] mikko assuming there are messages in-flight waiting to be sent
[08:55] mikko hmm
[09:08] sustrik mikko: yes
[09:09] sustrik the requirement was not to drop messages, so someone has to wait till they are sent
[09:09] mikko sustrik: take a look at this
[09:09] mikko sec
[09:10] mikko
[09:10] mikko seems to result into deadlock
[09:10] mikko zmq::ctx_t::terminate (this=0x601010) at semaphore.hpp:117
[09:10] sustrik you haven't closed the sockets
[09:10] mikko let me close
[09:11] sustrik thus the context has no idea whether there are more messages going to be sent or what
[09:11] mikko because i keep getting a deadlock in php
[09:12] mikko which i cant reproduce in plain c
[09:12] mikko i assume it has something to do with destruction order
[09:15] mikko Assertion failed: !prefetched (xrep.cpp:108)
[09:15] mikko now i got this out
[09:18] mikko also Assertion failed: inpipe_ && outpipe_ (xreq.cpp:42)
[09:18] mikko i think i must be doing something wrong
[09:21] sustrik mikko: that's your test program>
[09:21] sustrik ?
[09:21] sustrik in C?
[09:21] mikko sustrik: i can see how these happen
[09:22] mikko yes
[09:22] mikko C
[09:22] sustrik can you paste it, so that i can try?
[09:23] mikko first one:
[09:23] mikko i think i must have error there
[09:23] mikko as it ends up blocking on recv
[09:24] sustrik what about the assertions?
[09:24] sustrik what version are you using?
[09:24] mikko comment out lines 45 - 49
[09:24] sustrik xrep.cpp:108 has no assert in HEAD
[09:24] mikko and you will get Assertion failed: !prefetched (xrep.cpp:108)
[09:24] mikko let me see which version i got
[09:25] mikko i thought i got latest master but i'll recheck
[09:25] mikko taking a fresh checkout just in case
[09:30] sustrik when i remove the lines 45-49
[09:30] sustrik program exits with no problem
[09:30] sustrik when i keep them in it freezes
[09:30] mikko it blocks on recv() ?
[09:30] mikko is that expected or do i have some silly error there?
[09:31] mikko sustrik:
[09:32] mikko ?
[09:33] sustrik hm, you are right
[09:33] sustrik i wonder why it's not on my box
[09:33] mikko so commenting out lines 45-49 causes Assertion failed: !prefetched (xrep.cpp:108)
[09:35] sustrik ack, i'll remove the assert
[09:36] sustrik it was a patch I've applied without thinking about it sufficiently :|
[09:36] mikko
[09:37] mikko that causes
[09:37] mikko Assertion failed: inpipe_ && outpipe_ (xreq.cpp:42)
[09:37] sustrik as for the freeze, it's hung up in zmq_recv
[09:37] mikko the freeze is unexpected?
[09:38] sustrik nope
[09:38] sustrik when using XREP
[09:38] sustrik you have to send the identity first
[09:38] mikko will zmq_poll show it readable?
[09:39] sustrik when exactly?
[09:39] sustrik btw, changing socket types to REQ/REP works OK
[09:40] mikko it's blocking on zmq_recv, i wonder if polling socket before the recv show it as readable
[09:45] sustrik it should not
[09:45] mikko i can test
[09:51] mikko zmq_poll returns it not readable
[09:51] mikko good
[09:52] sustrik ack
[09:52] mikko will zmq_poll show socket non-writable if HWM has been reached?
[09:52] mikko the inpipe/outpipe assert might be because of incorrect usage of XRE(P|Q) sockets
[09:53] sustrik mikko: yes
[09:53] sustrik it will show !writeable
[09:54] sustrik as for the assert, it should not happen even if the sockets are used in incorrect way
[09:54] sustrik i'll check
[09:57] mikko this is also supposed to block on zmq_term?
[09:57] mikko i assume because i connect the PUB socket
[10:25] CIA-14 zeromq2: Martin Sustrik master * rf22e85f / src/xrep.cpp :
[10:25] CIA-14 zeromq2: Reverting commit 1d431190f50c86f62460
[10:25] CIA-14 zeromq2: The patch was supposed to check that pipe writer sends messages
[10:25] CIA-14 zeromq2: in atomic fashion. However, it prevented the user to read
[10:25] CIA-14 zeromq2: half of a message and close the socket.
[10:25] CIA-14 zeromq2: Signed-off-by: Martin Sustrik <> -
[10:25] sustrik mikko: the assert is removed from master
[10:25] mikko good!
[10:26] sustrik what next?
[10:26] mikko it's odd that PUB socket close semantics are different depending on whether you bind or connect
[10:26] mikko that might be confusing for new users
[10:26] sustrik it's that way for all sockets
[10:26] sustrik when you connect, a queue is created
[10:26] sustrik the messages are stored in it
[10:27] sustrik when you bind, there's no queue
[10:27] sustrik as you don't even know how many peers there are going to be
[10:27] sustrik a queue for a peer is created when the peer connects
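A minimal sketch of the bind/connect difference described above, as a toy Python model rather than libzmq code. The class, its method names, and the endpoint string are hypothetical; the rule that a send with no queue simply drops the message is an assumption modelled on PUB behaviour.

```python
from collections import deque


class ToySocket:
    """Toy model of per-peer send queues; not the libzmq implementation."""

    def __init__(self):
        self.queues = {}  # peer name -> deque of pending messages

    def connect(self, peer):
        # Connecting creates the queue immediately, even if the peer is down.
        self.queues.setdefault(peer, deque())

    def bind(self):
        # Binding creates no queue: the number of future peers is unknown.
        pass

    def peer_connected(self, peer):
        # On a bound socket, a queue appears only once a peer connects.
        self.queues.setdefault(peer, deque())

    def send(self, msg):
        # Assumption: with no queues the toy drops the message, as PUB would.
        for q in self.queues.values():
            q.append(msg)

    def pending(self):
        # Total number of messages still held in all queues.
        return sum(len(q) for q in self.queues.values())


connected = ToySocket()
connected.connect("tcp://peer-a")  # hypothetical endpoint name
connected.send("hello")

bound = ToySocket()
bound.bind()
bound.send("hello")
```

In this model the connected socket still holds a message at close time while the bound one holds none, which is why only the connected case can keep termination waiting.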
[10:28] mikko tricky situation, i think the current semantic for close is a bit problematic but apart from timeout i can't really think anything better either
[10:29] sustrik yes, samw here
[10:29] sustrik same*
[10:29] mikko it's too easy to shoot yourself in the leg at the moment
[10:29] sustrik you mean by blocking in term, right?
[10:29] mikko for example if your remote peer goes down it might cause things to block eternally. in case of something like php scripts that would bring the whole site down
[10:30] sustrik ack
[10:30] sustrik we need to add SO_LINGER option
[10:30] sustrik btw, reproduced the xreq.cpp:42 problem
[10:37] mikko good!
[10:38] mikko sustrik: even SO_LINGER is slightly undeterministic
[10:39] mikko as the caller can't know whether it blocks due to "not being able to send" or whether it's sending but hasn't flushed everything yet
[10:39] mikko what about making zmq_term non-blocking and returning error code if there are messages in-flight?
[10:39] mikko that way user can handle the different scenarios as needed
[10:40] mikko or zmq_term(ctx, 0) for blocking zmq_term(ctx, ZMQ_NOBLOCK);
[10:40] mikko latter would come back with EAGAIN if it's still flushing stuff
[10:41] mikko that is an API breakage but isn't api breaks possible in 2.1 ?
[10:43] mikko it would enable to do things such as:
[10:45] mikko the blocking version could also use so_linger to determine timeout
[10:45] mikko that way the core library doesn't need to try to give 'one size fits all' solution but to delegate it to the user
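The proposal above can be sketched as a toy Python model. Everything here is hypothetical: libzmq's zmq_term takes only the context, and the `noblock` argument, `ToyContext`, `flush_one` and `shutdown` are invented for illustration of the idea, not taken from the paste referenced in the log.

```python
import errno


class ToyContext:
    """Hypothetical model of the proposed non-blocking terminate."""

    def __init__(self, in_flight):
        self.in_flight = in_flight  # messages still waiting to be sent

    def flush_one(self):
        # Stand-in for the I/O thread pushing one message out.
        if self.in_flight > 0:
            self.in_flight -= 1

    def term(self, noblock=False):
        if noblock:
            # Proposed behaviour: report EAGAIN instead of blocking.
            return errno.EAGAIN if self.in_flight else 0
        # Blocking behaviour: wait until everything has been sent.
        while self.in_flight:
            self.flush_one()
        return 0


def shutdown(ctx, max_attempts):
    """Retry a non-blocking term, then let the caller decide what to do."""
    for _ in range(max_attempts):
        if ctx.term(noblock=True) == 0:
            return "clean"
        ctx.flush_one()
    return "gave up with %d message(s) unsent" % ctx.in_flight
```

The point of the design is that the retry policy (how often, how long, what to do with leftovers) lives in the caller rather than in the library.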
[10:46] sustrik what's the difference between "not being able to send" and "haven't flushed everything yet"?
[10:48] mikko not being able to send is for example if there are no lower level sockets open (not sure if context knows this)
[10:48] mikko and the latter is when the messages are flying out to the network stack
[10:50] sustrik by the former you mean that there wasn't zmq_bind or zmq_connect called on the socket?
[10:53] mikko yes, that as well
[10:54] mikko i don't know whether the context knows things about zmq_connect getting back connection refused
[10:54] mikko and there is no active connection
[10:55] mikko the main problem in close are 'connect'ed sockets
[10:55] mikko i assume
[10:56] mikko for example: 1. create pub socket 2. call zmq_connect 3. send() (under the hood socket gets connection refused) 4. close the socket 5. close the context
[10:57] mikko in this scenario the remote peer is not there so you cannot send
[10:57] mikko not sure if that is too much state
[10:58] sustrik how does that differ from the case when server went down while sending the message?
[11:00] sustrik anyway, if you want to define consistent semantics for the shutdown, you have to forget about underlying transport
[11:00] sustrik details of how TCP works are irrelevant
[11:01] mikko but that information is relevant to me as a user
[11:01] sustrik why so?
[11:01] mikko if i call close and there are 100 messages in-flight
[11:01] mikko if the same 100 messages are there after 10 seconds i want to be able to act on it
[11:02] guido_g because the app-developer knows how to handle the situation
[11:02] guido_g 'morning btw
[11:02] mikko exactly, because my close semantics might depend on the data that the specific socket has been handling
[11:03] mikko in some cases i might want to block until they are sent, even if it took days
[11:03] sustrik so what you want is reliable delivery
[11:03] guido_g no
[11:03] guido_g more information on what is going on
[11:03] sustrik either get the message to the peer or return it to the sender
[11:03] mikko in some cases i might want to discard them if they are not being sent
[11:03] sustrik that's what SO_LINGER is for
[11:03] guido_g some sort of introspection of the current state of a ømq context or socket
[11:04] sustrik impossible in distributed environment
[11:04] sustrik the message may be in a device somewhere
[11:04] mikko sustrik: i don't care about that
[11:04] sustrik the library has no idea what state it is in
[11:04] guido_g that's bad
[11:04] mikko sustrik: as a developer all i care is that it has left my program
[11:04] mikko or that it's not leaving my program
[11:05] mikko think about the following scenario: i send 100 huge messages, the remote peer is consuming them but slowly. given small so_linger the messages might be discarded even if the remote peer is actually consuming
[11:06] mikko that situation is different from a situation where the messages are in memory and are not being consumed at all
[11:06] mikko i'm not saying that so_linger is not useful. it is for some scenarios but it's still a bit non-deterministic
[11:07] mikko if i've closed my sockets, i'm not sending anything and the messages are not leaving my program i would like to know about that
[11:07] mikko i dont need to care whether the remote peer is actually down or network is down. i just want to know they are not being sent and act on it
[11:07] sustrik i think the problem in your reasoning is that you assume we know whether messages are being consumer or not
[11:07] mikko depending on data i might choose to discard it or store locally
[11:07] sustrik what does it exactly mean?
[11:08] sustrik consumed*
[11:09] mikko apart from inproc, to me it means that the message has left the current program
[11:10] sustrik we can drop them then, no?
[11:10] mikko as a developer i would like to choose
[11:10] mikko keep blocking or discard
[11:14] sustrik i still don't follow, how would you do the decision, based on what?
[11:15] mikko i would do the decision based on the data
[11:15] mikko (not sure if that answers the question)
[11:15] sustrik what data?
[11:16] mikko let me try to write down the scenarios i got in my head
[11:16] mikko just a sec
[11:16] sustrik you mean based on number of messages in 0mq's send buffer?
[11:17] mikko the data that my application was handling and based on whether the send buffer is getting smaller on a period of time
[11:17] sustrik ah, you want to shutdown depending on the throughput
[11:18] sustrik if throughput goes below certain threshold => shutdown
[11:18] mikko that was my original suggestion
[11:18] mikko ages ago
[11:19] sustrik yeah, that's semantically consistent solution
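A rough sketch of the "shut down when throughput drops below a threshold" rule, in Python. It assumes the application can sample the send-queue size at regular intervals, which 0MQ does not expose according to this discussion; `should_give_up` and `min_drained` are hypothetical names.

```python
def should_give_up(queue_sizes, min_drained):
    """Decide whether to stop waiting for pending messages.

    queue_sizes: send-queue sizes sampled at regular intervals, oldest first.
    min_drained: minimum number of messages that must have left the queue
                 over the sampled window for waiting to be worthwhile.
    """
    if len(queue_sizes) < 2:
        return False  # not enough samples to estimate throughput
    drained = queue_sizes[0] - queue_sizes[-1]
    # Give up only if messages remain and too few left during the window.
    return queue_sizes[-1] > 0 and drained < min_drained
```

A slow but live consumer (queue shrinking from 100 to 20) keeps the caller waiting, while a stalled queue (100, 100, 100) triggers the give-up path, which is the distinction a fixed linger timeout cannot make.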
[11:19] mikko because as an application developer i might want to do different decisions based on the data available to me: how many messages in flight? are the messages leaving my program? what kind data i was sending, can i just discard it or do i need to do more?
[11:22] sustrik "how many messages in flight?"
[11:22] sustrik that's messages in 0mq transmit buffer?
[11:22] mikko yes
[11:23] mikko i hope you see my point through this babbling
[11:23] mikko :)
[11:23] sustrik what about messages in TCP tx buffer?
[11:24] mikko how large buffers are we talking about?
[11:25] sustrik TCP tx buffer?
[11:25] sustrik depends
[11:25] sustrik 128kB
[11:25] sustrik 1MB
[11:25] sustrik shrug
[11:25] guido_g on one side we're not allowed to see tcp through ømq and on the other side we're asked what we need to know about its state, confusing
[11:26] sustrik exactly
[11:26] sustrik you should not see it at all
[11:27] guido_g what i'd like to see in the future is more thought on how to get these parameters of operation out of ømq
[11:27] guido_g for things like monitoring
[11:27] sustrik ack
[11:28] sustrik there are 2 levels to the monitoring imo
[11:28] sustrik 1. network monitoring
[11:28] guido_g i -- in the role of an ops guy -- want to know how many connections from which host are done, if there are failures and how much per etc.
[11:28] sustrik done on IP level
[11:29] sustrik 2. device monitoring -- connecting to 0mq device and finding out how many messages are queued there and so on
[11:29] sustrik what's a failure?
[11:29] guido_g also i want to correlate that with the applications state and behaviour
[11:29] guido_g a failure is this kind of situation that ops defines as a failure
[11:29] guido_g nothing more or less
[11:30] sustrik can you give an example?
[11:30] guido_g in my eyes ømq as a library should provide a way to peek into its workings
[11:30] guido_g monitoring != alerting
[11:30] guido_g the monitoring is just collecting the data -- for starters
[11:31] guido_g if i can't get key data like average queue sizes i'm basically lost
[11:32] guido_g i know that this data isn't accurate, but it hasn't to be
[11:32] guido_g most data is aggregated anyway
[11:32] sustrik the problem is there's no real definition for "messages in flight"
[11:33] sustrik if what you are worried about is memory consumption
[11:33] sustrik you should monitor the memory used by your app
[11:33] guido_g then stick a different label on the data and be done
[11:34] guido_g sure, memory, cpu, ctx switches all known
[11:35] guido_g except for the fact that (seen from app level) i can't say: for timespan ts there were 1000 messages sent from node a, but only 40 received by node b
[11:35] guido_g which amazingly correlates with the memory consumption on node a
[11:35] guido_g and the reconnect rate of the corresponding sockets
[11:35] sustrik wait a sec
[11:36] guido_g sure
[11:36] sustrik why can't you say how many messages you've sent and how many you've received?
[11:36] guido_g this one i can do
[11:37] guido_g but it gets a little complicated if ømq routing kicks in
[11:37] guido_g and queueing
[11:37] guido_g then i'm completely blind
[11:37] guido_g obviously a fact i don't anticipate
[11:37] sustrik the queueing is just a buffer, same as tcp tx buffer
[11:37] sustrik set the HWM
[11:38] sustrik and you have an upper limit on the buffer
[11:39] guido_g why is it so complicated to understand that this data is kind of important?
[11:40] sustrik because it has no clear semantics
[11:40] sustrik if you can't say what the figure means, you don't need it
[11:40] guido_g huh?
[11:41] sustrik all i want is a clear definition of the figure you want 0MQ to provide
[11:42] sustrik one that won't change arbitrarily depending on where the data is accidentally stored
[11:42] guido_g why should I define "semantics" of data that is already there? shouldn't this be done beforehand?
[11:42] sustrik whether it's in 0mq buff, tcp buff, NICs buff etc.
[11:42] guido_g we're talking about ømq
[11:42] sustrik let me give you an example
[11:42] guido_g so the topic is set, no ip, tcp or moonphase
[11:43] sustrik say you connect
[11:43] sustrik then you send a message
[11:43] sustrik the peer goes offline in the meantime
[11:43] sustrik what's the number of "messages in flight"?
[11:44] guido_g not in flight
[11:45] guido_g there is a number of messages in the queue
[11:45] sustrik ok, so what's the number of messages in queue
[11:45] guido_g this would be one of the numbers people might be interested in
[11:45] sustrik ?
[11:45] guido_g how much messages are in the send queue or queues
[11:46] sustrik 1?
[11:46] sustrik the problem is it depends on details of how TCP works
[11:46] sustrik and timing
[11:46] guido_g NO
[11:47] guido_g it depends on how many send calls have put something into the queues, no?
[11:47] sustrik no
[11:47] guido_g tcp is no ømq
[11:47] sustrik what happens is that 0mq is either able to push the message to TCP buffer
[11:48] guido_g then it's removed from the send q, right?
[11:48] sustrik before TCP realises the other endpoint is not available
[11:48] sustrik or the order of events is reverse
[11:48] sustrik i.e. TCP realises the peer is not available first
[11:48] guido_g see, you're thinking way too deep here
[11:49] sustrik then the message stays in 0mq buffer
[11:49] sustrik so the figure is either 0 or 1
[11:49] sustrik depending on tcp details
[11:49] guido_g it's just about getting some numbers, that might help to spot or trace problems and performance
[11:49] sustrik exactly
[11:50] sustrik so let's define them in a consistent way
[11:50] sustrik rather than depending on details of underlying network transport
[11:50] sustrik that way you are generic, consistent and future-proof
[11:50] guido_g as i said, number of messages in a queue is a very nice and probably useful number
[11:51] sustrik it's a definition based on implementation details
[11:51] sustrik real definition should be based on observable behaviour
[11:52] guido_g no
[11:52] guido_g because you provide an "abstraction"
[11:52] sustrik exactly
[11:52] sustrik abstraction works only if you abstract from implementation details
[11:52] guido_g the visible behaviour does not show what is going on
[11:53] sustrik i mean observable behaviour such as "memory usage"
[11:53] sustrik that's pretty clear
[11:53] guido_g every abstraction leaks
[11:53] guido_g the more you want to hide, the more leakage happens
[11:53] guido_g a bad situation for both sides
[11:54] sustrik ok, we've got into theoretical discussion :)
[11:54] guido_g the app-devs are using "undocumented features" to get what they want and the lib-devs try to stop that
[11:54] guido_g sustrik: not my fault
[11:54] sustrik :)
[11:54] sustrik it's about layering, in a correctly designed stack
[11:55] guido_g see, monitoring is an extremely important thing, imnsho
[11:55] sustrik if layer N doesn't provide enough flexibility, you shift down to layer N-1
[11:55] sustrik guido_g: definitely
[11:55] sustrik but let's do it right
[11:55] guido_g i do need a lot of information about the current state of my apps, including the communication
[11:55] sustrik monitoring random implementation details makes no sense
[11:56] guido_g sure, but being picky on names of data isn't very helpful imho
[11:56] sustrik we have to monitor real data
[11:56] sustrik i don't care about name
[11:56] sustrik what i'm saying is that size of 0mq queue is an implementation detail
[11:56] guido_g but an important one
[11:56] guido_g if i use ømq i know that
[11:57] guido_g i mean, i know that i use ømq
[11:57] sustrik that's because you ignore all the layers below 0mq and all the devices on your path
[11:57] guido_g so no further abstraction is needed
[11:57] guido_g for now and this discussion, yes
[11:57] sustrik i still don't see what you would use the number for
[11:57] guido_g but devices are formed with ømq so...
[11:58] sustrik it's completely random
[11:58] sustrik if you send 200kB of messages
[11:58] guido_g no
[11:58] sustrik and there's TCP tx buffer of 120 kB
[11:58] sustrik you'll have 80kB in 0mq queue
[11:58] sustrik if the TCP buffer is accidentally set to 200kB
[11:58] sustrik the 0mq queue will be empty
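The arithmetic in this example can be written out as a small Python sketch. It assumes, as the example does, that nothing has been delivered to the peer yet and that the TCP tx buffer fills before anything stays in the 0MQ queue; the function name is hypothetical.

```python
def zmq_queue_bytes(sent_bytes, tcp_tx_buffer_bytes):
    """Bytes left in the 0MQ queue once the TCP tx buffer has taken its share.

    Assumes nothing has reached the peer yet, so the TCP buffer absorbs as
    much as it can hold and the remainder stays queued in 0MQ.
    """
    return max(0, sent_bytes - tcp_tx_buffer_bytes)


# The two cases from the discussion, in kB.
small_buffer = zmq_queue_bytes(200, 120)
large_buffer = zmq_queue_bytes(200, 200)
```

The same 200 kB of application data yields a 0MQ queue of 80 kB or 0 kB depending only on the TCP buffer setting, which is the reason given here for calling the queue size an implementation detail.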
[11:58] guido_g see it as an indicator
[11:59] sustrik exactly, it's an indicator
[11:59] guido_g w/o the data you will lose information on what the whole system is doing
[11:59] sustrik try to define what it is indicating
[11:59] sustrik then try to find a consistent indicator
[11:59] guido_g but with this indicator at hand, you might find a way to predict upcoming problems or shortcomings etc.
[11:59] guido_g this is the whole point of monitoring
[12:00] guido_g and if this number is already a problem, then wait for the tcp connection details one might need...
[12:03] guido_g ok, what number can ømq provide which reflects the number of messages that the application has sent but that are not put into the lower layer for delivery?
[12:04] guido_g i mean, there must be a point where ømq treats a message as delivered (in the sense that the lower level has taken responsibility)
[12:07] sustrik it's on 0MQ API
[12:08] sustrik when you call zmq_send, you transfer the responsibility
[12:10] guido_g to ømq
[12:10] guido_g but between the send on the app side and the send from ømq to os is "something"
[12:11] sustrik well, yes
[12:11] sustrik and?
[12:11] sustrik there's some 6 layers of functionality below zmq_send call
[12:11] sustrik most of them doing some buffering
[12:11] guido_g and because this "something" is quite important, one needs to know if "something" is feeling well etc.
[12:12] sustrik understand me right, i am not against monitoring
[12:12] sustrik i just want to monitor metrics that have real meaning
[12:12] guido_g ok
[12:12] sustrik let's rather start from use cases
[12:13] guido_g above i gave one
[12:14] guido_g "messages" put into ømq via send vs. "messages" removed from ømq responsibility
[12:14] guido_g ops
[12:14] sustrik that's not a use case
[12:14] sustrik that a solution
[12:14] guido_g it is
[12:14] sustrik use case is "what you want to do"
[12:14] guido_g no
[12:15] sustrik :)
[12:15] sustrik anyway, what do you want to do?
[12:15] sustrik i can see two options:
[12:15] guido_g most infrastructure things are not very well described by use-cases
[12:15] sustrik 1. memory monitoring
[12:15] sustrik 2. latency monitoring
[12:15] guido_g monitoring in itself is not a closed system that can be described statically
[12:16] sustrik c'mon you have to know what you want :)
[12:16] guido_g for all these points we need some numbers, right?
[12:16] sustrik yes, we need metrics to monitor
[12:16] guido_g i know what i want now, yes
[12:16] guido_g but i cant know what ops will need in 3 month/years
[12:17] sustrik solve those then
[12:17] guido_g but i've to provide as much of possibilities as possible
[12:17] guido_g that's my job
[12:17] guido_g if you don't have the data, you can't
[12:17] sustrik my job is to cur possibilities :)
[12:17] sustrik cut
[12:17] guido_g good
[12:18] sustrik some balance may result from us two discussing
[12:18] guido_g yes
[12:18] sustrik basically, 0mq resulted from taking a corporate middleware and cutting everything not strictly needed off
[12:19] guido_g and now we need to put things back in, otherwise it's not useable for larger projects
[12:19] sustrik so when adding a feature back we need a serious understanding of why it's needed
[12:19] guido_g where larger is more than a handful of nodes
[12:19] sustrik otherwise we'll end up back in corporate middleware sphere
[12:19] sustrik agreed
[12:19] sustrik but extreme caution is needed
[12:20] guido_g ack
[12:20] guido_g one of the "key features" of ømq is its size
[12:20] sustrik low memory footprint
[12:20] sustrik right
[12:20] guido_g and slick api
[12:20] sustrik yes
[12:21] sustrik we need a way to keep the memory footprint low
[12:21] sustrik i am aware of that
[12:21] sustrik HWM is already implemented
[12:21] sustrik we need "max message size" option
[12:21] sustrik as well
[12:21] sustrik but that's orthogonal to monitoring
[12:22] guido_g i think we should not start on the api side of monitoring
[12:22] guido_g we should start by finding "interesting" data points in ømq
[12:23] sustrik ack
[12:23] sustrik so, can you produce a monitoring use case?
[12:23] guido_g if we have that, i'm sure there will be a consistent way to gain access to them
[12:23] guido_g i'll try
[12:23] sustrik that'll be great
[12:23] sustrik are you going to arrive at amsterdam btw?
[12:24] guido_g hmmm...
[12:24] guido_g would be the most expensive beer i ever had
[12:24] sustrik same here
[12:24] mato hi guys
[12:25] sustrik anyway, i think we should do a conference later on anyway
[12:25] guido_g but on the other hand, would be nice to discuss face to face (and scare away innocent bystanders :)
[12:25] mato sustrik: check also brussels, pieter was offering space to crash at his place
[12:26] sustrik i thought of doing some event during/after FOSSDEM
[12:26] sustrik that's february
[12:26] sustrik and makes the whole thing more worth of coming
[12:26] sustrik as you can attend the conference as well
[12:27] guido_g sounds good
[12:30] guido_g ok, need to do something for my health (besides eating :)
[12:31] guido_g will come up with some ideas regarding monitoring
[12:33] sustrik great
[12:33] sustrik thanks
[13:20] mikko sustrik: does this look familiar?
[13:20] mikko seems like the segfault happens inside uuid
[13:45] sustrik mikko: no
[13:45] sustrik yes, the segfault is inside uuid
[13:46] sustrik it's either invalid buffer passed to uuid_generate
[13:46] sustrik or a bug in libuuid
[13:46] sustrik anyway, hard to say what have gone wrong without reproducing the case
[13:47] mikko i remember seeing this ages ago. it was due to linking order of libuuid
[13:47] sustrik oh my
[13:47] mikko
[13:48] mikko someone blogged about similar issue where two modules are linked against libuuid
[13:48] mikko and it was fixed by changing the loading order of them
[13:48] mikko which sounds pretty strange
[13:50] sustrik well, if libuuid has some code hooked to the loading of the library
[13:50] sustrik some strange misinteraction may happen
[13:50] sustrik causing it to be used before it is initialised
[13:53] sustrik maybe it's initialised twice
[13:53] sustrik then deinitialised once
[13:53] sustrik then called
[13:53] sustrik ?
[13:54] mikko im just reading through libuuid code
[13:56] mikko it's not that
[13:57] mikko the guy commented
[13:57] mato sustrik: see my email re the version patch, please put back the two lines I asked for, you've broken make dist
[13:58] mato sustrik: that *and* the version number propagation to doc/Makefile
[14:00] mikko sustrik: rather interesting valgrind output
[14:05] mato mikko: rpath patch has been sent off to sustrik for applying
[14:05] mikko mato: nice
[14:05] mikko i found hudson iphone application
[14:06] mikko i've been checking the builds even on the move :)
[14:06] mato :-)
[14:10] sustrik mato: hey
[14:10] sustrik should i apply patches to the build system?
[14:10] mato sustrik: damnit well, you just did
[14:10] mato sustrik: and you broke it
[14:11] mato sustrik: so now please fix what you broke :-)
[14:11] sustrik i mean from procedural point of view
[14:11] sustrik am i the only committer?
[14:11] mato sustrik: you're the only committer to the github hosted-repository, yes
[14:11] mato sustrik: that's the way it should work
[14:11] sustrik ok
[14:12] mato sustrik: otherwise things get problematic due to the maint branch
[14:12] mikko no more holidays for Martin
[14:12] mato :-)
[14:12] sustrik ugh
[14:12] sustrik :)
[14:13] mato sustrik: it'll help a lot if you eventually use a real mail client and/or pull requests
[14:13] mato sustrik: since as I showed you, applying X patches becomes one command
[14:13] mato no hand work involved
[14:14] sustrik you have to show me how to do that later on
[14:15] mato will do, but you'll have to move to a better mail client
[14:15] mato since Thunderbird doesn't understand "Save As" means "Save this without mangling it" :-)
[14:41] CIA-14 zeromq2: Martin Sustrik maint * r6cd0867 / :
[14:41] CIA-14 zeromq2: Fixing the Red Hat packaging
[14:41] CIA-14 zeromq2: When adding ZMQ_VERSION macros, I incorrectly removed
[14:41] CIA-14 zeromq2: the PACKAGE_VERSION macro. Adding it back.
[14:41] CIA-14 zeromq2: Signed-off-by: Martin Sustrik <> -
[14:41] CIA-14 zeromq2: Martin Lucina maint * r57428db / : (log message trimmed)
[14:41] CIA-14 zeromq2: Do not patch libtool rpath handling
[14:41] CIA-14 zeromq2: For historic reasons (mainly compatbility with really old libtool), configure was
[14:41] CIA-14 zeromq2: patching libtool to not use rpath in binaries. This breaks (among other things)
[14:41] CIA-14 zeromq2: correct operation of "make check" since the test binaries may not be run with
[14:41] CIA-14 zeromq2: the correct shared library version.
[14:41] CIA-14 zeromq2: Current best practice as seen e.g. at suggests
[14:42] sustrik mato: done
[14:43] mikko do those go into master as well?
[14:43] CIA-14 zeromq2: Martin Sustrik master * r6cd0867 / :
[14:43] CIA-14 zeromq2: Fixing the Red Hat packaging
[14:43] CIA-14 zeromq2: When adding ZMQ_VERSION macros, I incorrectly removed
[14:43] CIA-14 zeromq2: the PACKAGE_VERSION macro. Adding it back.
[14:43] CIA-14 zeromq2: Signed-off-by: Martin Sustrik <> -
[14:43] CIA-14 zeromq2: Martin Lucina master * r57428db / : (log message trimmed)
[14:43] CIA-14 zeromq2: Do not patch libtool rpath handling
[14:43] CIA-14 zeromq2: For historic reasons (mainly compatbility with really old libtool), configure was
[14:43] CIA-14 zeromq2: patching libtool to not use rpath in binaries. This breaks (among other things)
[14:43] CIA-14 zeromq2: correct operation of "make check" since the test binaries may not be run with
[14:43] CIA-14 zeromq2: the correct shared library version.
[14:43] CIA-14 zeromq2: Current best practice as seen e.g. at suggests
[14:43] CIA-14 zeromq2: Martin Sustrik master * re168173 / :
[14:43] CIA-14 zeromq2: Merge branch 'maint'
[14:44] mato sustrik: thx
[14:44] mikko ah
[14:46] sustrik mikko: yes?
[14:47] mikko sustrik: building now
[14:48] mikko All 7 tests passed
[14:48] mikko rpath thingie fixed the build for me
[14:48] sustrik great
[14:48] sustrik how come all the bindings work
[14:48] sustrik ?
[14:48] mato mikko: you might want to add 'make dist' to the build also
[14:48] sustrik mikko: what's the link?
[14:48] mikko
[14:48] mikko it's building all the dependent projects atm
[14:49] mikko yeah, means "photograph" in finnish
[14:49] mikko mato: i'll add it
[14:49] sustrik wow, great, i know one finnish word now
[14:49] keffo terve!
[14:49] sustrik though, i can't remember it :)
[14:49] mikko mato: done
[14:53] mikko building with 'make dist' now
[14:55] sustrik mato: btw, there was some discussion about changing the license headers in 0mq source code
[14:55] sustrik is there any outcome of that?
[14:55] mato sustrik: talked about it with pieter, afaik does not need to be changed
[14:56] mato sustrik: only the README and various supporting files (LGPL exception) need to be changed
[14:56] mato sustrik: but not the actual source files, since neither the original copyright (iMatix) nor the license (LGPL) has changed
[14:56] sustrik There's wrong name of the license there
[14:56] mato sustrik: oh, only thing is, there is a wording error
[14:56] mikko make[2]: *** No rule to make target `zmq_forwarder.1', needed by `dist-hook'. Stop.
[14:56] mato yes, i just remembered
[14:57] mato mikko: ja, you want asciidoc + xmlto for make dist, since it generates documentation
[14:57] mikko im missing the doc generation tools
[14:57] mikko hmm
[14:57] mikko should make dist fail if those are not in place during configure?
[14:57] mato make dist is special
[14:57] mato in that, most users will never touch it
[14:57] mato so, maybe, no, whatever, doesn't matter right now :-)
[14:58] mikko hehe
[14:58] mato sustrik: yeah, so, that stuff should be fixed, but no hurry
[14:58] mato sustrik: TBD before a release
[14:58] mikko installing the tools and rebuilding soon
[14:58] sustrik akc
[14:58] sustrik ack
[14:58] mato sustrik: or we can do it together some time, involves writing a script
[14:59] mato sustrik: that way you don't go changing files by hand :-)
[14:59] sustrik i can do it by hand
[14:59] sustrik but script is definitely better
[15:00] mato sustrik: man, sometimes i feel you actually like feeding computers :-)
[15:00] mato sustrik: "do it by hand" ... geez ...
[15:00] keffo they're supposed to be fed!
[15:00] sustrik it imposes discipline on a programmer
[15:00] sustrik which is a good thing
[15:00] mato fed by code, not by programmers :-)
[15:00] sustrik a bit similar to brainwashing
[15:00] sustrik but still good :)
[15:00] keffo indeed, programmers solve problems.. The best programmer is the one already done :)
[15:31] mikko mato:
[15:31] mikko make dist built that
[15:32] mato mikko: great...
[15:34] mikko was missing zip as well
[15:34] mikko noticed
[16:00] CIA-14 zeromq2: Steven McCoy master * r5b8af52 / (src/pgm_receiver.cpp src/pgm_sender.cpp):
[16:00] CIA-14 zeromq2: Fix assertion in PGM transports on cancel_timer
[16:00] CIA-14 zeromq2: Signed-off-by: Steven McCoy <> -
[16:14] delaney hi, i'm curious why there aren't pre-built binaries for windows on the site download page
[16:17] mato sustrik: those patches you applied from steve, where did they come from?
[16:18] sustrik from steve
[16:18] mato sustrik: the reason i'm asking is that on the ML I did not see patches with a Signed-Off-By tag
[16:18] mato sustrik: but the commit has a Signed-Off-By tag...
[16:18] mato sustrik: so I'm confused
[16:19] sustrik damn, i've got that wrong
[16:19] sustrik let me ask steven to sign them off post hoc
[16:20] mato np, you're learning ... signing them off "post hoc" won't really help anything now
[16:20] mato anyway, no real problem
[16:20] mato no panic
[16:21] mato just do remember to double check what you're pushing to github makes sense :-)
[16:21] sustrik why won't it help?
[16:21] mato because it's not in the git history
[16:21] mato hence not persistent
[16:22] sustrik ?
[16:22] sustrik there's sign-off in the repo
[16:22] mato it doesn't matter much though since the licensing is automatic
[16:22] mato signoff is just tracking
[16:22] mato ah, right, added by you :-)
[16:22] mato bad you :-)
[16:22] delaney i'm a python guy and was looking to use zmq, would it be useful to the project to include the dll i just made on the download area?
[16:22] sustrik so we need just steve to approve the sign-off now
[16:23] mato well, there's no point
[16:23] mato you added a signed-off-by tag
[16:23] mato anyhow
[16:23] sustrik by signing it off steven basically says "yes, i've created the patch myself"
[16:23] sustrik that's it
[16:23] mato the only point is, review everything you actually push to master :-)
[16:24] mato i thought you liked bureaucracy :-)
[16:24] sustrik i do
[16:24] sustrik i just need few more patches to get it right
[16:24] sustrik delaney: the problem is not providing binaries, rather maintaining them
[16:24] mato or you could use the right tool... hang on... bureaucracy... right... involves doing everything by hand so that the job takes as long as possible :-)
[16:25] mato i have to go
[16:25] sustrik cya
[16:25] mato sustrik: will make a robust patch for the version stuff, the latest idea looks ok
[16:25] mato cyl
[16:25] sustrik delaney: building new binaries when new version is released etc.
[16:26] delaney yeah, true. still. i'm getting an 'Unable to find vcvarsall.bat' from easy_install but i have msvc 2010 express, any ideas?
[16:27] sustrik no idea, sorry
[16:35] pieterh delaney, it's normally produced by the installer if you ask for command line use
[16:38] starkdg as an aside, is there any way to monitor buffer length in the io queues ? the number of messages ?
[16:38] starkdg it might be a feature worth considering ?
[16:46] delaney i'm trying to follow the which i didn't see before... I'm able to build the solution but there is no libzmq.lib in the zeromq2\lib directory, only the libzmq.dll
[16:48] delaney hmm, all seems to install to site-packages now, not sure what i did
[16:48] delaney is that a misprint, should it be 'copy libzmq.DLL'?
[16:56] delaney when i try to run the chat example i get
[16:57] delaney please excuse my c++/c n00bness
[17:02] mikko are you compiling against github master?
[17:03] delaney no, off the downloads, let me try that
[17:47] sustrik delaney: it looks like you are passing invalid argument to bind
[17:48] sustrik what's the string you are using?
[18:18] delaney python
[18:18] delaney using the examples/chat, haven't touched the code
[18:44] sustrik you are missing the port number i would say
[18:49] pieterh delaney, did you read the user guide?
[19:00] delaney ah, no i didn't, thought i'd just run the examples
[19:00] delaney that makes more sense
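[editor's note] The log does not show the string delaney actually passed to bind, so the following is only a sketch of the kind of check sustrik's guess implies: a 0MQ tcp endpoint is written as tcp://address:port, and leaving the port off makes it invalid. The helper name `has_tcp_port` is hypothetical and is not part of 0MQ or pyzmq; it only inspects the string and does not call the library.

```python
def has_tcp_port(endpoint):
    """Return True if `endpoint` looks like tcp://<address>:<numeric port>.

    Hypothetical helper for illustration only; it does not validate the
    address part and does not talk to 0MQ at all.
    """
    prefix = "tcp://"
    if not endpoint.startswith(prefix):
        return False
    address = endpoint[len(prefix):]
    # Split on the last colon so the port is whatever follows it.
    host, sep, port = address.rpartition(":")
    return bool(sep) and bool(host) and port.isdigit()


if __name__ == "__main__":
    for candidate in ("tcp://127.0.0.1:5555", "tcp://127.0.0.1", "tcp://*:5555"):
        print(candidate, has_tcp_port(candidate))
```

Running a check like this on the endpoint before handing it to bind would point at a missing port directly, rather than surfacing as an invalid-argument error from the library.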
[20:34] rphillips I'm running strace on a misbehaving subscriber daemon. I don't see a TCP connect after zmq_connect() with a tcp:// endpoint
[20:34] rphillips should I?
[20:39] rgl not immediately. but soon a connection is going to be made in a background thread (the I/O thread of zmq).
[20:42] rphillips strange... I don't see one
[21:09] rphillips rgl: zdevice zmq_forwarder "tcp://" "tcp://" is creating netlink raw sockets on my system... that doesn't look correct
[21:11] rgl maybe zmq has special handling for the loopback interface
[21:11] rgl can you try connecting to different machines?
[21:20] rphillips that seems to help... I'll have to submit a patch to the resolver code
[21:20] rphillips thanks