[Time] Name | Message |
[09:13] pieterh
|
sustrik: you there?
|
[09:21] sustrik
|
pieterh: hi
|
[09:22] pieterh
|
I have a feeling that as we roll out 2.1.x, people will complain about zmq_term
|
[09:23] sustrik
|
we can switch LINGER default to 0
|
[09:23] pieterh
|
that would make sense IMO
|
[09:23] pieterh
|
or if not 0, then 1 second or whatever
|
[09:23] sustrik
|
but that would mean that such a simple program as:
|
[09:23] pieterh
|
neither 0 nor infinity are sensible defaults IMO
|
[09:23] sustrik
|
send();close();term();exit()
|
[09:24] sustrik
|
would not send anything
|
[09:24] pieterh
|
sensible defaults work for simple programs
|
[09:24] sustrik
|
anything larger than 0 and less than infinity does not make sense imo
|
[09:24] sustrik
|
such a default is a trap
|
[09:25] sustrik
|
it works in low-load envs (test env)
|
[09:25] sustrik
|
and breaks when high load hits
|
[09:25] sustrik
|
you can try experimenting with the throughput perf test
|
[09:25] pieterh
|
possibly, but if you make it hard for people to write simple code, they won't
|
[09:25] sustrik
|
if you set linger to say 10
|
[09:25] pieterh
|
and it's easy to make 0MQ explain wtf is going on
|
[09:25] sustrik
|
it works initially
|
[09:26] sustrik
|
unless you send a lot of messages
|
[09:26] pieterh
|
e.g. if linger is 1 second, it works, and if there is still unsent data after 1 second, it can say something
|
[09:26] sustrik
|
then it breaks mysteriously
|
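To put numbers on sustrik's argument, here is a toy model in plain Python (no 0MQ calls). It assumes, purely for illustration, that queued messages drain at a constant rate during the LINGER period; the function name is hypothetical and this is not how libzmq itself measures anything.

```python
def delivered_before_exit(queued, drain_per_ms, linger_ms):
    """Toy model: how many queued messages reach the wire before close/term gives up.

    linger_ms < 0 models an infinite LINGER (block until everything is sent),
    0 models "discard everything at close", and a positive value models a
    finite grace period during which messages drain at a constant rate.
    """
    if linger_ms < 0:
        return queued
    return min(queued, drain_per_ms * linger_ms)


# Low load (a test environment): 1 second of linger is plenty.
print(delivered_before_exit(queued=100, drain_per_ms=10, linger_ms=1000))        # 100
# High load: the same setting silently drops most of the backlog.
print(delivered_before_exit(queued=1_000_000, drain_per_ms=10, linger_ms=1000))  # 10000
# LINGER = 0: the "send(); close(); term(); exit()" program sends nothing.
print(delivered_before_exit(queued=1, drain_per_ms=10, linger_ms=0))             # 0
```

The same 1000 ms setting delivers everything in a quiet test and one percent of the backlog under load, which is the "trap" sustrik describes; pieterh's counter-proposal is to make the second case loud rather than silent.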
[09:26] pieterh
|
mystery is entirely a design choice here
|
[09:26] pieterh
|
point is,
|
[09:26] pieterh
|
you're going to get a lot of complaints IMO
|
[09:26] sustrik
|
i'd rather be explicit than mysterious
|
[09:26] sustrik
|
let's see
|
[09:26] pieterh
|
right now it's extremely mysterious
|
[09:27] sustrik
|
well, it blocks
|
[09:27] sustrik
|
what's mysterious about that?
|
[09:27] pieterh
|
i don't even understand the reply you sent to Christian
|
[09:27] sustrik
|
deterministic behaviour
|
[09:27] sustrik
|
zmq_term() -> ETERM -> zmq_close()
|
[09:27] pieterh
|
deterministically useless behaviour
|
[09:27] sustrik
|
better than heisenbugs
|
[09:27] pieterh
|
erlang binding works around it
|
[09:28] pieterh
|
this is not going to end nicely...
|
[09:28] sustrik
|
what's the thing with erlang?
|
[09:28] pieterh
|
you didn't follow?
|
[09:29] sustrik
|
haven't seen anything
|
[09:29] pieterh
|
it tracks open sockets and secretly closes them all _before_ calling zmq_term
|
[09:29] sustrik
|
where's the discussion?
|
[09:29] pieterh
|
because an infinitely blocking system call is so insane
|
[09:29] pieterh
|
utterly... pathological
|
[09:30] sustrik
|
where's the discussion?
|
[09:30] pieterh
|
can't find it immediately
|
[09:31] pieterh
|
email is not a database
|
[09:31] pieterh
|
as I've pointed out many times, it's useless for finding back stuff
|
[09:31] pieterh
|
search for a thread titled "zmq_term() blocks in 2.1"
|
[09:32] pieterh
|
"For simplicity, the port driver doesn't handle any threads itself. And it never
|
[09:32] pieterh
|
actually calls a zmq library function that could block indefinitely."
|
[09:32] sustrik
|
zeromq-dev?
|
[09:32] pieterh
|
from chris <csrl@gmx.com>
|
[09:32] pieterh
|
yes, of course it's on zeromq-dev
|
[09:32] pieterh
|
and you were on that thread
|
[09:34] sustrik
|
aha, found it
|
[09:34] sustrik
|
that's not about LINGER
|
[09:34] sustrik
|
that's about necessity to close the sockets
|
[09:34] sustrik
|
same problem you've reported a long time ago
|
[09:35] pieterh
|
you mentioned Linger, I just pointed out that zmq_term blocks
|
[09:35] pieterh
|
and that people will hit this more and more
|
[09:36] sustrik
|
ok, there are several issues involved:
|
[09:36] sustrik
|
1. necessity to close sockets
|
[09:36] sustrik
|
it would be nice to be able to avoid that
|
[09:37] sustrik
|
however, i have no idea of how to do that
|
[09:37] sustrik
|
2. linger
|
[09:37] pieterh
|
no other classic OS API requires that kind of thing
|
[09:37] sustrik
|
3. Ctrl+C
|
[09:37] sustrik
|
this one was solved in one go with reaper thread
|
[09:37] sustrik
|
you are free to send a patch
|
[09:37] private_meta
|
uhm... which example set should I refer to when having multiple clients connect to the same port (in case of tcp), when I need to know which client sends the message and which client to send the message to, because Handling Multiple Sockets only seems to work with dedicated ports, or am I mistaken?
|
[09:38] pieterh
|
sustrik: "send a patch" is telling me to jump in the lake, you know that
|
[09:38] sustrik
|
well, i have no idea how to avoid 1.
|
[09:38] pieterh
|
I'm raising a concern that the "stable 2.1" release will annoy many people and break a lot of code
|
[09:38] sustrik
|
so someone else has to
|
[09:38] pieterh
|
this makes it very hard for me to release that with any kind of confidence
|
[09:39] pieterh
|
except with a large disclaimer, which may be sufficient
|
[09:39] pieterh
|
but...
|
[09:39] pieterh
|
not when the explanation is confused
|
[09:40] sustrik
|
what should i say? i have no idea how to fix it
|
[09:40] sustrik
|
that's it
|
[09:40] pieterh
|
sustrik: look, I can document the need to close sockets
|
[09:40] pieterh
|
I can document the need to set LINGER even in the most trivial apps
|
[09:41] pieterh
|
that's just "oh, it's not so shiny and elegant anymore"
|
[09:41] pieterh
|
but if you tell people to call zmq_term before closing sockets, I'm kind of confused
|
[09:41] sustrik
|
let's not mix the issues
|
[09:41] sustrik
|
are you concerned about 1 or 2?
|
[09:41] pieterh
|
this is your breakdown, it's not mine
|
[09:42] sustrik
|
are you concerned about both?
|
[09:42] pieterh
|
my concern is just "zmq_term blocks in 2.1 and I don't know why"
|
[09:42] sustrik
|
ok
|
[09:42] sustrik
|
1. i can't solve it
|
[09:42] pieterh
|
it's relatively easy to solve in a simple threaded app
|
[09:42] sustrik
|
you are free to try
|
[09:42] pieterh
|
1. close sockets, 2. set linger = 0, terminate
|
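A toy model of the ordering pieterh gives, assuming the 2.1 behaviour described in this discussion: zmq_term() waits until every socket is closed, and a closed socket with the default infinite LINGER (-1) then waits for its pending messages. ToySocket and ToyContext are hypothetical stand-ins written in plain Python, not a 0MQ binding.

```python
class ToySocket:
    """Stand-in for a 0MQ socket: pending messages, LINGER and open/closed state."""

    def __init__(self, pending=0, linger_ms=-1):  # -1 = infinite LINGER
        self.pending = pending
        self.linger_ms = linger_ms
        self.closed = False

    def set_linger(self, ms):
        self.linger_ms = ms

    def close(self):
        self.closed = True
        if self.linger_ms == 0:
            self.pending = 0  # LINGER = 0: unsent messages are discarded at close


class ToyContext:
    def __init__(self):
        self.sockets = []

    def socket(self, **kwargs):
        s = ToySocket(**kwargs)
        self.sockets.append(s)
        return s

    def term_blocks_forever(self):
        """True if term() would never return in this model: a socket is still
        open, or a closed socket has unsent messages and an infinite LINGER
        (assuming the peer never takes the data)."""
        return any(not s.closed or (s.pending > 0 and s.linger_ms < 0)
                   for s in self.sockets)


# Forgetting to close a socket: term() hangs.
ctx = ToyContext()
ctx.socket(pending=5)
print(ctx.term_blocks_forever())   # True

# Closed, but infinite LINGER and a peer that never takes the data: still hangs.
ctx = ToyContext()
ctx.socket(pending=5).close()
print(ctx.term_blocks_forever())   # True

# pieterh's recipe: set LINGER = 0, close every socket, then terminate.
ctx = ToyContext()
sock = ctx.socket(pending=5)
sock.set_linger(0)
sock.close()
print(ctx.term_blocks_forever())   # False
```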
[09:42] sustrik
|
2. linger is a problem
|
[09:42] pieterh
|
but if this starts to go wrong in multithreaded apps, people will _refuse_ to use 0MQ...
|
[09:43] pieterh
|
it's a class 1 fatal "no go" problem that will stop it going into production
|
[09:43] sustrik
|
setting linger to some default value
|
[09:43] pieterh
|
"sorry, we can't solve it" may be one answer
|
[09:43] pieterh
|
but it's a really crappy answer
|
[09:43] sustrik
|
means that even a single message won't pass
|
[09:43] sustrik
|
given it's sufficiently long and/or network is sufficiently slow
|
[09:43] pieterh
|
like I said, there are easy ways to make that work
|
[09:43] sustrik
|
?
|
[09:44] sustrik
|
sure, do so
|
[09:44] pieterh
|
a. use a sensible default linger value
|
[09:44] pieterh
|
b. if the app still has unsent messages after that, issue a loud warning
|
[09:44] sustrik
|
hm, returning an error from zmq_term()?
|
[09:44] sustrik
|
that may work
|
[09:45] pieterh
|
no, a loud warning
|
[09:45] sustrik
|
what's that?
|
[09:45] pieterh
|
printf ("123 messages not sent, please raise ZMQ_LINGER on socket")
|
[09:45] pieterh
|
etc.
|
[09:45] pieterh
|
something that gets literally printed and sent to logs
|
[09:45] pieterh
|
or sent on sys://log if that ever goes live
|
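A sketch of the "loud warning" proposal, assuming the library can count the messages still unsent when LINGER expires. warn_if_unsent and the logger name are hypothetical, and Python's logging module stands in for printf or a log channel; nothing here is an existing libzmq API.

```python
import logging

log = logging.getLogger("zmq.shutdown")  # hypothetical logger name


def warn_if_unsent(socket_name, unsent, linger_ms):
    """Return (and log) a loud warning if messages were still unsent when the
    LINGER period expired; return None if nothing was lost."""
    if unsent == 0:
        return None
    msg = (f"{unsent} messages not sent on {socket_name} after {linger_ms} ms, "
           "please raise ZMQ_LINGER on socket")
    log.warning(msg)
    return msg


print(warn_if_unsent("tcp://*:5555", 123, 1000))
```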
[09:45] sustrik
|
that works only when a console is available
|
[09:45] sustrik
|
problem on windows
|
[09:45] pieterh
|
so, people _need_ consoles on production systems
|
[09:46] sustrik
|
but the error would kind of make sense
|
[09:46] sustrik
|
let me think about it
|
[09:46] pieterh
|
returning an error?
|
[09:46] sustrik
|
rc = zmq_term()
|
[09:46] pieterh
|
maybe but it just makes the caller responsible again
|
[09:46] pieterh
|
yes, it would at least be consistent
|
[09:46] sustrik
|
if (rc == EPENDINGMESSAGES)...
|
[09:46] pieterh
|
yes
|
[09:46] pieterh
|
and then set LINGER to 1 second by default please
|
[09:47] sustrik
|
ok, i'm going to think about it
|
[09:47] pieterh
|
this simplifies simple cases
|
[09:47] sustrik
|
the close() problem remains though
|
[09:47] pieterh
|
as for the deadlock issue, it just needs accurate documentation
|
[09:47] sustrik
|
ok
|
[09:47] pieterh
|
accurate, i.e. precisely what do people have to do to avoid it
|
[09:48] pieterh
|
this change to linger would be very good, at least it'll distinguish the deadlock from infinite linger
|
[09:48] pieterh
|
that's a headache today, not knowing what's actually going wrong
|
[09:49] sustrik
|
it won't distinguish the two cases :(
|
[09:49] sustrik
|
it will just timeout the term() after a while
|
[09:49] sustrik
|
and allow you to restart it
|
[09:50] pieterh
|
you would also timeout the deadlock?
|
[09:50] sustrik
|
yes
|
[09:50] pieterh
|
...
|
[09:50] pieterh
|
but in the deadlock case there are zero messages to send
|
[09:50] sustrik
|
the deadlock is caused by the handshake
|
[09:51] sustrik
|
"tell me whether there are more you've queued"
|
[09:51] sustrik
|
"ok, there are no more messages"
|
[09:51] sustrik
|
the application thread's part of the handshake is executed in zmq_close() call
|
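A toy model of that handshake, showing why a socket that is never closed stalls termination, and how a timeout (the idea discussed here, not something 2.1's zmq_term() offers) could report the stuck sockets instead of hanging. HandshakeContext and its methods are hypothetical; threading.Event plays the role of the reply that close() sends.

```python
import threading
import time


class HandshakeContext:
    """Toy model of the term/close handshake: term() asks each socket whether it
    has more queued messages, and the application thread's half of the exchange
    only runs inside close(). A socket that is never closed never answers."""

    def __init__(self):
        self._acked = {}  # socket name -> Event, set when close() answers

    def socket(self, name):
        self._acked[name] = threading.Event()
        return name

    def close(self, name):
        # "ok, there are no more messages"
        self._acked[name].set()

    def term(self, timeout):
        """Wait at most `timeout` seconds in total and return the names of the
        sockets that never completed the handshake (empty list = clean shutdown)."""
        deadline = time.monotonic() + timeout
        stuck = []
        for name, acked in self._acked.items():
            if not acked.wait(max(0.0, deadline - time.monotonic())):
                stuck.append(name)
        return stuck


ctx = HandshakeContext()
ctx.close(ctx.socket("frontend"))
ctx.socket("backend")            # never closed by the application
print(ctx.term(timeout=0.1))     # ['backend']
```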
[09:51] pieterh
|
right
|
[09:52] pieterh
|
well, we know how many sockets are not responding, right?
|
[09:52] sustrik
|
yes
|
[09:52] pieterh
|
that's valuable information to report
|
[09:53] sustrik
|
yup
|
[09:53] pieterh
|
rc = number of unclosed sockets, maybe
|
[09:54] sustrik
|
possibly
|
[09:55] pieterh
|
you can't use EPENDINGMESSAGES unless you know there are actually messages waiting
|
[09:55] pieterh
|
something like ETIMEOUT
|
[09:55] pieterh
|
if we can make this work sensibly, IMO 2.1 is ready for the big stage
|
[09:56] pieterh
|
coffee, brb
|
[09:57] sustrik
|
to be ready for the big stage we need subscription forwarding :|
|
[10:03] pieterh
|
uhm, no, you just don't need to break every app already running...
|
[10:04] sustrik
|
well, it's actually a bugfix
|
[10:04] sustrik
|
people complained that messages are dropped on exit
|
[10:04] sustrik
|
namely, mato
|
[10:05] pieterh
|
well, there's always someone complaining... :-)
|
[10:06] pieterh
|
my radar mainly focuses on the dev list
|
[10:06] sustrik
|
it was a common complaint back then
|
[10:06] pieterh
|
LINGER per socket is also kind of a strange choice
|
[10:07] sustrik
|
it's POSIX
|
[10:07] pieterh
|
yes, this was a necessary change, no argument with that
|
[10:07] pieterh
|
zmq_term is not POSIX :-)
|
[10:07] sustrik
|
zmq_term = OS shutdown
|
[10:07] pieterh
|
nope
|
[10:07] sustrik
|
yes
|
[10:08] pieterh
|
sigh
|
[10:08] pieterh
|
then why am I calling "Shutdown OS" in my apps?
|
[10:08] pieterh
|
0MQ is _not_ a kernel module
|
[10:08] pieterh
|
sorry, this is 2011 and we're on version 2.x.x
|
[10:08] pieterh
|
please remain in the present
|
[10:08] Steve-o
|
lol
|
[10:08] pieterh
|
you may have a vision of where 0MQ will go
|
[10:08] sustrik
|
zmq_term is *equivalent* to OS shutdown
|
[10:09] sustrik
|
not OS shutdown itself
|
[10:09] pieterh
|
but we are discussing today's code and today's design
|
[10:09] pieterh
|
again, I do not call OS shutdown in my apps
|
[10:09] sustrik
|
it does the same thing TCP does with tx buffers on OS shutdown
|
[10:09] pieterh
|
please, this analogy is not helpful
|
[10:09] pieterh
|
it really is not helpful
|
[10:10] sustrik
|
it's what it does
|
[10:10] pieterh
|
"Sorry, sir, your app is deadlocking because zmq_term is like OS shutdown"
|
[10:10] sustrik
|
shrug
|
[10:10] sustrik
|
no point in this discussion
|
[10:10] pieterh
|
well, it'll keep coming back
|
[10:10] sustrik
|
i'll have a look at the timeout for zmq_term()
|
[10:10] pieterh
|
you won't be able to make it work IMO
|
[10:10] pieterh
|
because you have LINGER per socket not per context
|
[10:11] sustrik
|
they are two different timeouts
|
[10:11] pieterh
|
how would you modify the term timeout?
|
[10:11] pieterh
|
as a user, I mean
|
[10:11] sustrik
|
zmq_term_wait (void *ctx, int timeout);
|
[10:12] pieterh
|
so revert the old method to not blocking, and introduce a new one?
|
[10:12] sustrik
|
introduce a new one
|
[10:12] pieterh
|
+1, gets my vote
|
[10:13] pieterh
|
it is totally explicit and leaves 2.0 semantics unchanged
|
[10:13] sustrik
|
reverting zmq_term() to immediate would be consistent with 2.0
|
[10:13] pieterh
|
ack
|
[10:13] sustrik
|
however, 2.1 users may complain
|
[10:13] sustrik
|
so it's up to consensus
|
[10:13] pieterh
|
HEY EVERYONE!!!
|
[10:14] pieterh
|
please ack/nack sustrik's suggestion here...
|
[10:14] sustrik
|
something like that
|
[10:14] sustrik
|
on mailing list
|
[10:15] pieterh
|
yes
|
[10:15] pieterh
|
it's a major topic, would you raise it then?
|
[10:16] sustrik
|
i have to think about the whole thing first
|
[10:17] pieterh
|
ok
|
[11:14] private_meta
|
hmm
|
[11:14] private_meta
|
uhm... which example set should I refer to when having multiple clients connect to the same port (in case of tcp), when I need to know which client sends the message and which client to send the message to, because Handling Multiple Sockets only seems to work with dedicated ports, or am I mistaken?
|
[11:15] pieterh
|
private_meta: Chapter 3 of the Guide
|
[11:16] pieterh
|
various routing based on using XREP / ROUTER socket and identities of peers
|
[12:14] stimpie
|
Does anyone have measurements or explanations of the performance of one connection per thread/CPU versus a single connection per system with a dispatcher to each thread?
|
[12:18] ianbarber
|
threads vs events? that's a big argument :) the http://www.kegel.com/c10k.html c10k page is a good overview, not 0MQ specific
|
[12:25] stimpie
|
that's an interesting read but not exactly what I'm thinking about. I have a system with x cores and x threads, and messages from other systems need to arrive at those threads. I can create a socket for each thread, or create one socket from which messages are dispatched to each thread.
|
[12:26] stimpie
|
With one socket, each physical device has only one address, and other systems do not have to take the number of threads per system into account
|
[12:27] stimpie
|
A socket per thread appears faster to me but requires more 'global' knowledge (messages should be duplicated across physical machines)
|
[12:29] ianbarber
|
yeah, i see
|
[12:30] pieterh
|
stimpie: it's not really an either/or choice IMO
|
[12:30] pieterh
|
on the one hand you need a frontend able to poll your 10K sockets
|
[12:30] pieterh
|
but you usually also need a bunch of threads to do the real work
|
[12:30] pieterh
|
however it is pathological to create one thread per socket
|
[12:31] pieterh
|
see asyncsrv example in Chapter 3 of the guide
|
[12:32] ianbarber
|
pieterh: i think he's asking about one thread per core basically, with one tcp socket per thread, or a device on tcp with inproc/ipc to the other threads
|
[12:32] stimpie
|
ianbarber, those should have been my words ;-)
|
[12:33] pieterh
|
well, you want one thread per core for threads that do real work
|
[12:33] ianbarber
|
definitely
|
[12:33] pieterh
|
however, that does not map to TCP connections
|
[12:34] pieterh
|
not "one tcp socket per thread", nope
|
[12:34] pieterh
|
that would be an anti-pattern in 0MQ
|
[12:35] ianbarber
|
i think, tbh, that a forwarder type device would be fine, they're pretty quick. If you did want to have a TCP listener per core, then you could have them check in to a name service, and have your clients query the name service
|
[12:36] ianbarber
|
though I would have each of those TCP listeners be a separate process
|
[12:37] stimpie
|
So you think the overhead of the forwarder (dispatcher) would not have a negative impact?
|
[12:38] stimpie
|
Best way to find out is to benchmark I guess
|
[12:39] pieterh
|
stimpie: best way is to benchmark, try any device and see how it performs
|
[12:39] ianbarber
|
yeah
|
[12:40] stimpie
|
I will do, thanks for your thoughts
|
[12:40] pieterh
|
stimpie: the pattern I'd recommend is:
|
[12:40] pieterh
|
n clients, connecting as usual to a queue
|
[12:41] pieterh
|
m workers, where m is much smaller than n
|
[12:41] pieterh
|
queue talking over inproc to workers
|
[12:41] pieterh
|
total number of threads on the server is m + 1
|
[12:41] pieterh
|
if m is too large, you will lose time in context switching
|
[12:42] pieterh
|
sorry, total number of app threads on server is m + 1, there is also at least 1 I/O thread
|
[12:42] pieterh
|
so optimal value for m is (total cores on server box) - 2
|
[12:43] pieterh
|
assuming you can dedicate a whole multicore box to your server app
|
[12:43] pieterh
|
this would be for CPU-limited workers, it's different if they are I/O bound
|
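A sketch of the layout pieterh recommends, with the thread arithmetic spelled out: m workers plus one queue thread plus at least one 0MQ I/O thread should fit the core count, so m = cores - 2. Plain Python threads and queue.Queue stand in for the 0MQ queue and the inproc:// transport; the function names and the squaring "work" are placeholders.

```python
import os
import queue
import threading


def optimal_workers(cores, io_threads=1):
    """Rule of thumb for CPU-bound workers on a dedicated box:
    cores minus one queue thread minus the 0MQ I/O thread(s), i.e. cores - 2."""
    return max(1, cores - 1 - io_threads)


def serve(requests, m):
    """n requests -> one queue -> m workers; queue.Queue stands in for inproc://."""
    jobs, results = queue.Queue(), queue.Queue()

    def worker():
        while True:
            req = jobs.get()
            if req is None:          # shutdown sentinel
                return
            results.put(req * req)   # placeholder for the real CPU-bound work

    workers = [threading.Thread(target=worker) for _ in range(m)]
    for w in workers:
        w.start()
    for r in requests:               # the queue thread hands every request to a worker
        jobs.put(r)
    for _ in workers:                # one sentinel per worker, queued after all requests
        jobs.put(None)
    for w in workers:
        w.join()
    return sorted(results.get() for _ in range(results.qsize()))


m = optimal_workers(os.cpu_count() or 4)
print(m, serve(range(10), m))
```

With 8 cores this gives 6 workers. Python threads do not run CPU-bound code in parallel, so the sketch shows the structure only, not the performance; benchmarking the real setup remains the way to decide.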
[12:45] ianbarber
|
make sure to benchmark under conditions as realistic as you can; it's easy to benchmark with (say) much smaller messages than you'd normally use and see a different performance character
|
[12:59] Guthur
|
pieterh: The new projects page seems a little similar to the labs page, imo
|
[12:59] pieterh
|
Guthur: yes, it's meant to overlap
|
[12:59] pieterh
|
this projects page is a temporary place to collect community projects
|
[13:00] pieterh
|
that is, projects we consider part of the 0MQ community and want to expose to potential contributors
|
[13:00] Guthur
|
ok, and then what is labs?
|
[13:00] pieterh
|
the Labs page goes a bit further and also doesn't really expose the core projects
|
[13:00] pieterh
|
so my idea with the projects business is to show these on the main community page
|
[13:00] pieterh
|
similarly as we do for the bindings
|
[13:01] Guthur
|
ok, so I suppose they should have a reasonable level of maturity
|
[13:01] pieterh
|
not necessarily but they should be tight extensions of 0MQ
|
[13:01] pieterh
|
rather than apps which use it
|
[13:01] pieterh
|
e.g. I'd consider zguide a project but not mongrel2
|
[13:02] Guthur
|
oh ok, that clears it up
|
[13:02] pieterh
|
ideally all these projects would gravitate towards the same workflow, core community of contributors, infrastructure, etc.
|
[13:02] pieterh
|
like the bindings
|
[13:03] pieterh
|
I had this vision of making it into a dashboard like this: http://extensions.wdeditor.com/
|
[13:03] pieterh
|
that's based on my design
|
[13:03] pieterh
|
but it'd have to be red/black/white of course :-)
|
[13:04] Guthur
|
of course, hehe
|
[13:05] pieterh
|
so you come to the community site and see a whole bunch of projects, each with a name/person/graphic
|
[13:05] pieterh
|
I guess we're moving towards that very slowly
|
[13:06] Guthur
|
so, for example, where would an implementation of the FIXT 1.1 (Transport Independent) protocol using ZeroMQ as the transport lie?
|
[13:06] pieterh
|
it's really up to the owner
|
[13:06] pieterh
|
it's a choice: move it into the 0MQ community or keep it separate
|
[13:07] Guthur
|
ok
|
[13:07] pieterh
|
if, for example, there were several such bridges, it would be great to see them as 0MQ projects
|
[13:08] pieterh
|
let me give another example
|
[13:09] pieterh
|
I'm working on Whaleshark (http://zero.mq/ws)
|
[13:09] pieterh
|
which depends on a bunch of other 0MQ layers
|
[13:09] pieterh
|
like a name service, security service, etc.
|
[13:09] pieterh
|
it could be fun to also include FIXT support
|
[13:09] pieterh
|
so if the FIXT layer was aimed at 0MQ apps like Whaleshark, it's a natural 0MQ project
|
[13:10] pieterh
|
but if it's aimed at FIXT apps, it's not
|
[13:10] Guthur
|
FIXT seemed like a nice place to start with FIX and 0MQ, due to its transport independent spec
|
[13:10] Guthur
|
ok i understand
|
[13:11] pieterh
|
acid test would be, do you discuss project X here and on zeromq-dev, or on some other forum
|
[13:14] Guthur
|
would it be possible to offer commercial support for such projects via a corporate entity, similar to how imatix is mentioned for whaleshark?
|
[13:14] pieterh
|
of course
|
[13:15] pieterh
|
that's why there's a 'website' column
|
[13:15] pieterh
|
you'd probably not be able to use the zeromq.org domain without iMatix agreeing
|
[13:16] Guthur
|
that's reasonable
|
[13:46] sustrik
|
pieterh: it seems there a problem with the mailing list
|
[13:46] sustrik
|
i've sent an email
|
[13:46] sustrik
|
it hasn't appeared
|
[13:46] pieterh
|
hmm, ok, let me restart the server...
|
[13:48] pieterh
|
rebooting, it'll take a minute or so
|
[13:48] pieterh
|
there's a service (spam filter afair) which gets confused now and then
|
[13:59] pieterh
|
sustrik: didn't help, I'm contacting Ewen
|
[14:33] sustrik
|
thx
|
[14:41] Seta00
|
I need an example that uses polling on a sub socket :/
|
[14:42] pieterh
|
Seta00: poll works the same on all socket types
|
[14:42] Seta00
|
well then I need an example that uses polling
|
[14:42] pieterh
|
there are lots in the Guide
|
[14:43] Seta00
|
kk I'll check
|
[14:53] pieterh
|
sustrik: I've put a note on the community page, this sucks, sorry
|
[15:55] travlr
|
pieterh: just had to mention how much i appreciate the work you did with the online reference... much much nicer to work with... very thorough too! thanks.
|
[15:56] pieterh
|
travlr: you mean the new API site?
|
[15:56] travlr
|
yes
|
[15:56] pieterh
|
np :-) it was fun to make
|
[15:56] travlr
|
cool. thanks again for all
|
[15:56] pieterh
|
we needed to cover older/newer versions anyhow
|
[15:57] travlr
|
yes, very smooth and easy to work with
|
[16:51] private_meta
|
Does the router in a router-to-dealer relationship know when a dealer connects, even if the dealer hasn't sent a message yet? That is, can I, as a user of the router, know that?
|
[16:53] pieterh
|
private_meta: not when it connects, but if it sends a message, yes
|
[16:54] pieterh
|
any router-to-anything depends on the anything sending something to the router first
|
[16:54] private_meta
|
kk...
|
[16:55] private_meta
|
pieterh: so, to make sure no messages are lost in a router-dealer relationship, the router must wait for the first message to arrive
|
[16:56] private_meta
|
well, sounds logical now that i write it
|
[16:56] pieterh
|
yes
|
[16:56] pieterh
|
the router needs to know an address to send to
|
[16:56] pieterh
|
that only comes with an input message
|
[16:56] pieterh
|
unless (a) you pass the identities some other way
|
[16:56] pieterh
|
or (b) you use durable sockets
|
[16:56] private_meta
|
I'm in need of logon messages anyway
|
[16:57] pieterh
|
and router is like pub: if there's no recipient, the message is not queued, it's dropped immediately
|
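A toy model of the ROUTER behaviour just described: the router learns a peer's address only from an incoming message, and a send to an unknown identity is dropped on the spot rather than queued. ToyRouter is a hypothetical plain-Python stand-in; it ignores the two exceptions pieterh mentions (identities passed some other way, durable sockets).

```python
class ToyRouter:
    """Learns identities only from inbound messages; drops sends to unknown peers."""

    def __init__(self):
        self.peers = {}     # identity -> messages delivered to that peer
        self.dropped = 0

    def on_message(self, identity, body):
        # An inbound message is the only way this router learns an address.
        self.peers.setdefault(identity, [])
        return identity, body

    def send(self, identity, body):
        if identity not in self.peers:
            self.dropped += 1   # no known recipient: dropped immediately, like PUB
            return False
        self.peers[identity].append(body)
        return True


router = ToyRouter()
print(router.send(b"dealer-1", b"welcome"))   # False: dealer-1 has not spoken yet
router.on_message(b"dealer-1", b"LOGON")       # a logon message teaches the router the address
print(router.send(b"dealer-1", b"welcome"))   # True
```

A logon message from each dealer, as private_meta plans, is exactly what makes the second send succeed in this model.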
[16:58] private_meta
|
pieterh: I seem to have overlooked that in the docs, but what happens to a dealer trying to connect to a non-existent router, and how does the dealer know?
|
[16:59] pieterh
|
it doesn't know unless it expects a reply and doesn't get one
|
[16:59] pieterh
|
actually I'm writing this up now for Ch4
|
[17:00] private_meta
|
so there is no such thing as "unknown host" or other error messages that I could get?
|
[17:00] pieterh
|
nope
|
[17:01] pieterh
|
note that tcp:// is a disconnected protocol... the host might be away at lunch and back in 2 hours, 0MQ will wait
|
[17:01] pieterh
|
inproc:// will tell you if it can't connect
|
[17:01] private_meta
|
Did you do that so you have an abstraction of any protocols?
|
[17:01] private_meta
|
oh
|
[17:01] pieterh
|
it's just more useful like that, for most apps
|
[17:02] private_meta
|
I'm not quite sure how to implement a timeout to wait for that :/
|
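Since tcp:// gives no "unknown host" error, the usual client-side answer is to treat "no reply within a timeout" as failure and retry a bounded number of times, the idea behind the Lazy Pirate client (lpclient.c in the Guide). This plain-Python sketch uses queue.Queue.get(timeout=...) in place of polling a 0MQ socket; the function name and the default timeout and retry values are assumptions.

```python
import queue


def request_with_retries(send, replies, timeout_s=2.5, retries=3):
    """Send a request and wait up to `timeout_s` for a reply, retrying up to
    `retries` times. `send` ships the request; `replies` (a queue.Queue) stands in
    for a socket polled with a timeout. Returns the reply, or None if the server
    never answered."""
    for _ in range(retries):
        send()
        try:
            return replies.get(timeout=timeout_s)
        except queue.Empty:
            pass  # no reply in time; a real REQ client would also close and reopen its socket here
    return None


dead = queue.Queue()
print(request_with_retries(lambda: None, dead, timeout_s=0.05))                 # None
alive = queue.Queue()
print(request_with_retries(lambda: alive.put(b"OK"), alive, timeout_s=0.05))    # b'OK'
```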
[17:02] pieterh
|
it's documented... hang on...
|
[17:02] pieterh
|
ah, sorry, not yet pushed :-)
|
[17:03] private_meta
|
huh?
|
[17:03] pieterh
|
if you can wait a little while...
|
[17:03] private_meta
|
define little while
|
[17:04] private_meta
|
for some people, a week might be a little while, for others a little while is an hour :D
|
[17:05] private_meta
|
As far as I figured, you use durable sockets where you have a fixed name whenever you reconnect (more or less), but also the router discards messages that are sent to a target it doesn't know. So if a router sends a message to a durable socket that is not yet connected, are these messages also discarded?
|
[17:06] pieterh
|
durable sockets cannot be "not yet connected"
|
[17:06] pieterh
|
a durable socket may be "temporarily away for lunch"
|
[17:07] pieterh
|
i've no idea what a router socket does with durable sockets but I imagine it queues messages for them
|
[17:07] pieterh
|
that would be consistent with PUB, but it's not documented afaik
|
[17:07] private_meta
|
kk, so if the computer where the durable socket is located reboots, let's say, the socket is just "away for lunch" as far as the router is concerned?
|
[17:07] pieterh
|
the whole business of "XREP discards and does not queue messages it can't route" is not documented
|
[17:08] private_meta
|
kk
|
[17:24] pieterh
|
private_meta: ok, http://zguide.zeromq.org/page:all#toc67
|
[17:28] private_meta
|
sweet
|
[17:29] private_meta
|
pieterh: So the initial timeout is, of course, pretty much the first heartbeat not coming through, I assume?
|
[17:30] pieterh
|
it's not quite that simple
|
[17:30] private_meta
|
how so?
|
[17:30] pieterh
|
you need a clock for the poll; it should be the lowest heartbeat interval
|
[17:30] pieterh
|
if you use the same heartbeat for all peers, that value
|
[17:30] pieterh
|
then you need to allow for 2-3 lost heartbeats before declaring a 'disconnected peer'
|
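A sketch of the liveness bookkeeping described here, assuming one heartbeat interval for all peers that doubles as the poll timeout, and a tolerance of three silent intervals. PeerLiveness and both constants are hypothetical names chosen for illustration.

```python
HEARTBEAT_INTERVAL_MS = 1000   # assumed: same interval for all peers, used as the poll timeout
HEARTBEAT_LIVENESS = 3         # assumed: declare the peer dead after 3 silent intervals


class PeerLiveness:
    def __init__(self, liveness=HEARTBEAT_LIVENESS):
        self.max = liveness
        self.liveness = liveness

    def on_poll(self, got_traffic):
        """Call once per poll tick. Any inbound traffic (data or heartbeat) resets
        the counter; a silent tick decrements it. Returns True while the peer is alive."""
        if got_traffic:
            self.liveness = self.max
        else:
            self.liveness -= 1
        return self.liveness > 0


peer = PeerLiveness()
ticks = [True, False, False, True, False, False, False]
print([peer.on_poll(t) for t in ticks])
# [True, True, True, True, True, True, False]
```

Two silent ticks are tolerated; the third one in a row marks the peer as disconnected.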
[17:31] private_meta
|
Yes, seems like a good thing to allow for single lost messages.
|
[17:32] private_meta
|
Uhm... a "lost heartbeat" would be, in your case, a heartbeat that doesn't get a reply, wouldn't it? Isn't 0mq built so that, if the client decides to connect one day, all those "lost" heartbeats would be sent?
|
[17:33] pieterh
|
heartbeats don't get replies
|
[17:33] pieterh
|
they are asynchronous in both directions
|
[17:33] private_meta
|
ah yeah
|
[17:33] private_meta
|
sorry, true
|
[17:33] pieterh
|
please read the code and the docs...
|
[17:33] private_meta
|
I will
|
[17:33] private_meta
|
sorry for asking prematurely :)
|
[17:36] pieterh
|
np, if there's anything unclear or missing in the text, let me know
|
[17:36] pieterh
|
it's a first draft and raw
|
[17:40] private_meta
|
pieterh: to get it straight, you would use one zmq_poll call with infinite timeout for message transfer and one with heartbeat timeout to send heartbeat messages?
|
[17:40] pieterh
|
i don't think that's what the examples do
|
[17:40] private_meta
|
You mean the pirate example?
|
[17:41] pieterh
|
any of them
|
[17:41] private_meta
|
Okay, I'll look at that one
|
[17:41] pieterh
|
it's tempting to do heartbeating via a second socket
|
[17:41] pieterh
|
this is a bad idea for two or three reasons
|
[17:41] pieterh
|
which I'll document
|
[17:44] pieterh
|
"First, if you're sending data you don't need to send heartbeats. Second, sockets may, due to network vagaries, become jammed. You need to know when your main data socket is silent because it's dead, rather than just not busy, so you need heartbeats on that socket. Lastly, two sockets is more complex than one."
|
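A sketch of the first point, heartbeating on the data socket itself: any outgoing message counts as proof of life, so a heartbeat is only sent after a full interval of outbound silence. heartbeat_schedule is a hypothetical helper that computes, offline, when heartbeats would go out for a given list of send times.

```python
def heartbeat_schedule(data_times_ms, until_ms, interval_ms=1000):
    """Return the times (ms) at which a heartbeat would be sent on the data socket.

    Outgoing data resets the heartbeat timer, so heartbeats only fill gaps of at
    least `interval_ms` with no outbound traffic. Time starts at 0.
    """
    beats = []
    last_out = 0                              # time of the last outgoing message of any kind
    for t in sorted(data_times_ms):
        while last_out + interval_ms < t:     # a full silent interval passes before this send
            last_out += interval_ms
            beats.append(last_out)
        last_out = t                          # data went out, so no heartbeat is needed
    while last_out + interval_ms <= until_ms: # trailing silence up to the end of the window
        last_out += interval_ms
        beats.append(last_out)
    return beats


print(heartbeat_schedule([200, 900, 1500, 4200], until_ms=6000))  # [2500, 3500, 5200]
print(heartbeat_schedule(range(500, 6000, 500), until_ms=6000))   # [] (busy socket)
```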
[17:54] cremes
|
is there a C FORWARDER device in the zguide anywhere? i can't seem to find one and I'd like one for testing
|
[18:02] pieterh
|
cremes, afaik the msgqueue example will work if you use PUB and SUB
|
[18:02] pieterh
|
a forwarder just reads and writes two sockets
|
[18:02] cremes
|
pieterh: ok, i'll try it
|
[18:03] pieterh
|
sorry, msgqueue just calls the built-in device, that's not what you want, is it?
|
[18:03] pieterh
|
you want the actual core, poll / recv / send?
|
[18:03] cremes
|
no, i just want something that will subscribe to everything and publish out the other side
|
[18:04] cremes
|
the built in device is probably okay then, yes?
|
[18:04] pieterh
|
yes
|
[18:04] pieterh
|
it's the same code for all three devices
|
[18:04] pieterh
|
the only differences are the bind/connect directions and socket types
|
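A sketch of that shared core in plain Python: after a poll reports both sides readable, the device moves complete (possibly multipart) messages from each side to the other. Deques stand in for sockets and the function names are hypothetical; the table lists the usual frontend/backend socket types for the three built-in 2.x devices.

```python
from collections import deque

# Usual (frontend, backend) socket types for the three built-in 2.x devices.
DEVICE_SOCKETS = {
    "queue":     ("XREP", "XREQ"),   # a.k.a. ROUTER / DEALER
    "forwarder": ("SUB", "PUB"),     # the frontend SUB must subscribe to "" to get everything
    "streamer":  ("PULL", "PUSH"),
}


def pump(src, dst):
    """Move every pending message from src to dst, keeping multipart frames together."""
    moved = 0
    while src:
        dst.append(src.popleft())
        moved += 1
    return moved


def device_step(frontend_in, backend_out, backend_in, frontend_out):
    """One iteration of the device loop: forward traffic in each direction."""
    return pump(frontend_in, backend_out), pump(backend_in, frontend_out)


fin, bout, bin_, fout = deque([[b"topic", b"data"]]), deque(), deque(), deque()
print(device_step(fin, bout, bin_, fout))  # (1, 0): a forwarder only carries one direction
print(list(bout))                          # [[b'topic', b'data']]
```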
[18:09] zedas
|
pieterh: what?! not even http://mulltedb.org :-)
|
[18:10] zedas
|
pieterh: or i mean http://mulletdb.org/ :-)
|
[18:10] cremes
|
pieterh: looks like i don't need it; i have isolated another slow leaker with PUB sockets
|
[18:10] pieterh
|
zedas: uhm, what's the question?
|
[18:10] pieterh
|
cremes: really, and it's not even Friday yet?
|
[18:11] cremes
|
:)
|
[18:11] cremes
|
well, i need to verify one or two more things.... but yeah
|
[18:12] pieterh
|
zedas: you mean for the 0MQ projects list?
|
[18:12] pieterh
|
and it's mulletdb.com, :-)
|
[18:13] zedas
|
damn, see i don't even care about that project.
|
[18:13] zedas
|
pieterh: yeah i was joking about "projects"
|
[18:14] pieterh
|
yeah, the love shows
|
[18:14] pieterh
|
tokyo cabinet seems useful
|
[18:14] pieterh
|
not so sure about that zeromq stuff you are so keen about
|
[18:23] cremes
|
false alarm on that leak... i was calling setsockopt(LINGER) after zmq_connect()
|
[18:23] cremes
|
i guess it doesn't honor it after the socket has been bound/connected
|
[18:23] cremes
|
or is that a bug?
|
[18:24] cremes
|
nope, not a bug according to the man page
|
[18:41] sp4ke
|
Hi
|
[18:42] sp4ke
|
can anyone help me setting up zeromq with my project on Visual Studio 2010
|
[18:42] sp4ke
|
i get unresolved external symbols when i build projects
|
[18:42] sp4ke
|
i built the libzmq project and added the path to the directory in my project dependencies
|
[18:52] sustrik
|
the libs are in libs subdir
|
[18:52] sustrik
|
iirc
|
[18:53] sp4ke
|
in the libs subdir i've got only a libzmq.dll and libzmq.ilk
|
[18:53] sp4ke
|
how can i add these files as dependencies in VS ?
|
[18:54] sp4ke
|
i mean other than specifying the path in Library Directories, which i did
|
[18:55] sustrik
|
there should be libzmq.lib iirc
|
[18:55] sustrik
|
you should link that with your project
|
[18:58] sp4ke
|
ok thanks, i found a discussion in the irc archive; it's a common problem not to get the .lib, and the answer should be there
|
[19:29] pieterh
|
cremes: you can set LINGER at any time before close, afaics
|
[19:30] cremes
|
the docs say otherwise: "Caution: All options, with the exception of subscription strings, only take effect for subsequent socket bind/connects."
|
[19:30] cremes
|
that's from the zmq_setsockopt man page
|
[19:30] cremes
|
i don't think it's lying... my testing appears to bear this out
|
[19:32] pieterh
|
i've been using LINGER in examples to stop zmq_term blocking, and I use it just before close
|
[19:32] pieterh
|
something to clarify...
|
[19:32] cremes
|
indeed
|
[19:33] pieterh
|
example like https://github.com/imatix/zguide/blob/master/examples/C/lpclient.c
|
[19:59] mikko
|
sigh
|
[20:05] Guthur
|
cremes pieterh: that was my update
|
[20:06] pieterh
|
Guthur: yeah, but is it accurate?
|
[20:06] Guthur
|
sustrik mentioned that all options should be set before connect
|
[20:06] mikko
|
Guthur: not all
|
[20:06] mikko
|
ZMQ_SUBSCRIBE can be set afterwards
|
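A toy model of SUB-side filtering, the part of a socket's configuration that, per sustrik later in this log, can still change on a live connection: a message is delivered if it starts with any subscribed prefix, an empty prefix matches everything, and unsubscribing removes one instance of a repeated subscription. ToySubscriber is a hypothetical plain-Python stand-in, not a binding.

```python
class ToySubscriber:
    """Prefix-matching subscription filter, modelled on ZMQ_SUBSCRIBE/ZMQ_UNSUBSCRIBE."""

    def __init__(self):
        self.prefixes = []   # a list, so the same prefix can be subscribed more than once

    def subscribe(self, prefix):
        self.prefixes.append(prefix)

    def unsubscribe(self, prefix):
        if prefix in self.prefixes:
            self.prefixes.remove(prefix)   # removes a single instance only

    def accepts(self, message):
        return any(message.startswith(p) for p in self.prefixes)


sub = ToySubscriber()
print(sub.accepts(b"weather 22C"))   # False: no subscriptions yet
sub.subscribe(b"weather")
print(sub.accepts(b"weather 22C"))   # True
sub.unsubscribe(b"weather")
sub.subscribe(b"")
print(sub.accepts(b"anything"))      # True: the empty prefix matches everything
```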
[20:07] Guthur
|
mikko, yeah besides that
|
[20:07] pieterh
|
mikko: that's what the text says :-)
|
[20:07] pieterh
|
Guthur: it should IMO say "ZMQ_SUBSCRIBE" rather than "subscription strings" but that's minor
|
[20:08] pieterh
|
ZMQ_SUBSCRIBE, ZMQ_UNSUBSCRIBE, ZMQ_LINGER can afaik be set at any time
|
[20:09] pieterh
|
not sure about ZMQ_RECONNECT_IVL
|
[20:09] Guthur
|
ok, I can post another update patch
|
[20:09] Guthur
|
if that's ok
|
[20:09] pieterh
|
we need El Sustrik's formal confirmation with an "are you sure", IMO
|
[20:10] pieterh
|
I made an issue: https://github.com/zeromq/zeromq2/issues/173
|
[20:10] Guthur
|
hehe, yep, that's a very sensible idea
|
[20:10] pieterh
|
there are a couple of fuzzy areas that cropped up
|
[20:20] pieterh
|
omg, I'm reinventing AMQP for Ch4... :-/
|
[20:21] pieterh
|
please shoot me now before this goes too far
|
[20:24] Guthur
|
at some point someone is bound to say 'It would be nice if core had this'
|
[20:24] Guthur
|
and then that will be the end
|
[20:24] pieterh
|
nah, it's all just user-space patterns
|
[20:25] pieterh
|
the key IMO is not even software, but documented protocols
|
[20:25] Guthur
|
is AMQP poorly documented?
|
[20:25] Guthur
|
I am not very familiar with it to be honest
|
[20:26] pieterh
|
hmm, depends on the version of AMQP, there are quite a few
|
[20:26] pieterh
|
on this page http://www.amqp.org/confluence/display/AMQP/AMQP+Specification
|
[20:26] pieterh
|
only AMQP/0-8 and AMQP/0-9-1 are properly documented
|
[20:27] pieterh
|
0-9 and 0-10 don't even have dates in the document... very shoddy work
|
[20:27] pieterh
|
every version is incompatible with every other version
|
[20:27] pieterh
|
oh, don't get me started :-)
|
[20:28] Guthur
|
I don't think i'll delve into it too deeply
|
[20:28] Guthur
|
I've enough on my plate without getting lost in AMQP
|
[20:28] pieterh
|
:-)
|
[20:51] cremes
|
pieterh: can you confirm this leaks memory on your system? https://gist.github.com/848007
|
[20:51] cremes
|
if so, i'll open a ticket and attach it
|
[20:51] sustrik
|
it's only SUBSCRIBE and UNSUBSCRIBE that affect the connection after it is established
|
[20:52] cremes
|
sustrik: i think i *might* have found another leak with PUB
|
[20:52] sustrik
|
yes?
|
[20:52] cremes
|
see this gist: https://gist.github.com/848007
|
[20:52] pieterh
|
cremes: nope
|
[20:52] cremes
|
if someone can confirm it leaks on their system, i'll open a ticket
|
[20:52] pieterh
|
it does not leak
|
[20:52] pieterh
|
it does consume 300% CPU
|
[20:53] pieterh
|
but memory usage is stable: "7867 ph 20 0 198m 1904 1148 S 312 0.0 1:09.50 leaker6 "
|
[20:53] cremes
|
hrmm...
|
[20:53] pieterh
|
sustrik: I've tested LINGER and it definitely works after the connection is established
|
[20:54] sustrik
|
aaaah
|
[20:54] sustrik
|
i recall something like that dimly
|
[20:54] sustrik
|
let me check the code
|
[20:54] pieterh
|
Ergo^: are you on the latest 0MQ?
|
[20:56] pieterh
|
Ergo^: check the release notes, Ctrl-C was fixed but I don't recall exactly what version
|
[20:59] cremes
|
pieterh: ah! make a small change to that code and it will leak like a sieve
|
[20:59] cremes
|
change the number of client threads it spawns to something greater than 1
|
[20:59] pieterh
|
cremes... put the 'free' into comments?
|
[20:59] pieterh
|
ah, will try
|
[20:59] pieterh
|
Ergo^: did you read the Guide yet?
|
[20:59] cremes
|
i think it's a race condition bug
|
[20:59] pieterh
|
cremes: I'll spend 10 minutes on that, would you spend 10 minutes reviewing http://rfc.zeromq.org/spec:7?
|
[21:00] cremes
|
my pleasure
|
[21:00] pieterh
|
Ergo^: until you've read at least Ch1 and Ch2, you're kind of in RTFM mode here
|
[21:01] sustrik
|
ack: LINGER is socket-wide
|
[21:01] sustrik
|
not connection-wide
|
[21:02] pieterh
|
cremes: I hereby name this ship the "Leaky and Nasty"
|
[21:02] pieterh
|
7993 ph 20 0 1853m 1.4g 1148 S 382 17.6 2:42.12 leaker6
|
[21:02] cremes
|
huzzah!
|
[21:02] pieterh
|
That's 1.4g of memory in about 30 seconds
|
[21:02] pieterh
|
with 10 client threads
|
[21:02] cremes
|
i can email you guys a call-tree backtrace if that is helpful to you
|
[21:02] cremes
|
yeah, same thing happens on my box
|
[21:03] pieterh
|
i love it when people send beautiful C code that reproduces problems...
|
[21:03] cremes
|
btw, it doesn't leak as fast when the LINGER line is uncommented but it still leaks *rapidly*
|
[21:10] sustrik
|
what unit is s_clock() in?
|
[21:10] cremes
|
milliseconds
|
[21:11] mikko
|
success!
|
[21:12] sustrik
|
cremes: ok, what about the cpu usage?
|
[21:12] mikko
|
i managed to create pure shell-script that executes zeromq build and sends results over http to jenkins
|
[21:12] sustrik
|
a peak followed by flat line?
|
[21:12] pieterh
|
mikko: nice!
|
[21:12] cremes
|
sustrik: let me take a look
|
[21:13] mikko
|
also, in other news: i am bringing up a powerpc (debian 6.0) build slave soon(ish)
|
[21:13] cremes
|
sustrik: did you update the code to use 2+ client threads? i see cpu spike and *stay* there
|
[21:13] sustrik
|
mikko: btw, i've had a discussion with a guy who has problems building 0mq under mingw-win64
|
[21:14] cremes
|
sustrik: reload that gist if you like; i updated it to create 5 client threads which more readily show the leak
|
[21:14] mikko
|
sustrik: what is the problem?
|
[21:14] mikko
|
using mingw64?
|
[21:14] sustrik
|
order of includes, presumably
|
[21:14] sustrik
|
https://github.com/zeromq/zeromq2/issues/#issue/60
|
[21:15] sustrik
|
i just thought it might make sense to add that to the builds
|
[21:15] sustrik
|
cremes: ok, so it's processing something
|
[21:15] mikko
|
sustrik: the current cluster is 32bit hardware
|
[21:15] sustrik
|
that definitely looks like a bug
|
[21:15] mikko
|
that's slightly problematic
|
[21:16] mikko
|
would need a win64 box (i presume)
|
[21:16] sustrik
|
ah, i thought it's a cross-compile
|
[21:16] mikko
|
or does the cross-compile work on 32bit?
|
[21:16] sustrik
|
never mind
|
[21:16] sustrik
|
no idea
|
[21:16] sustrik
|
check the issue
|
[21:16] mikko
|
can't do 'make check' without win64
|
[21:16] mikko
|
i can add build
|
[21:16] sustrik
|
mikko: spot on
|
[21:16] sustrik
|
i forgot about the tests
|
[21:17] cremes
|
sustrik: yes, i agree; i changed the publish interval to 500ms and cpu remains high
|
[21:17] cremes
|
sustrik: whatever it is processing, it's stuck
|
[21:17] sustrik
|
right
|
[21:17] cremes
|
sustrik: i can send you the call-tree for the code that is allocating (and holding onto) all of this memory if that's helpful
|
[21:17] pieterh
|
cremes: I think I see the problem
|
[21:17] sustrik
|
yes, please
|
[21:18] pieterh
|
the client is never pausing for breath
|
[21:18] sustrik
|
it's not, but it's time-limited
|
[21:18] pieterh
|
server can't keep up
|
[21:18] sustrik
|
so it should send for 200ms
|
[21:18] pieterh
|
let me set a HWM and do small sleep in the client after closing a socket...
|
[21:18] sustrik
|
then stop
|
[21:18] pieterh
|
the clock in the client has no purpose at all afaics
|
[21:20] cremes
|
ok, so a small sleep inside the publish loop fixes it
|
[21:20] cremes
|
but shouldn't it just drop those messages if they are in queue and undelivered?
|
[21:20] cremes
|
LINGER = 0 in this case
|
[21:21] pieterh
|
cremes: if I sleep 1 second after each publish burst, client memory usage is flat
|
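The client in the gist is time-limited (sustrik: it should send for about 200 ms, then stop; s_clock() is in milliseconds), and pausing between bursts is what keeps memory flat. A minimal sketch of that loop shape in plain C, assuming a monotonic millisecond clock in place of the Guide's s_clock() helper; `burst_then_pause` and `count_send` are hypothetical names, and the callback stands in for a real zmq send so the sketch runs without libzmq:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Milliseconds from a monotonic clock; stands in for the Guide's
   s_clock() helper, which the discussion confirms is in milliseconds. */
static int64_t clock_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Hypothetical stand-in for "send one message on the PUB socket";
   here it only counts calls so the sketch runs without libzmq. */
static void count_send(void *arg)
{
    (*(long *) arg)++;
}

/* Send for burst_ms milliseconds, then pause for pause_ms so the
   receiver gets time to drain its queue. Returns messages sent. */
static long burst_then_pause(void (*send_one)(void *), void *arg,
                             int64_t burst_ms, long pause_ms)
{
    assert(burst_ms >= 0 && pause_ms >= 0);
    long sent = 0;
    int64_t end = clock_ms() + burst_ms;
    while (clock_ms() < end) {
        send_one(arg);
        sent++;
    }
    sleep_ms(pause_ms);
    return sent;
}
```

The time limit alone bounds how long each client sends, not how much it sends; it is the pause that gives a single subscriber time to catch up with several clients.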
[21:21] pieterh
|
they are sent to publisher before you close the socket
|
[21:21] pieterh
|
the memory consumption is in the server queue
|
[21:22] cremes
|
hmmm, i can believe that
|
[21:22] sustrik
|
2 producers are definitely going to overload one consumer
|
[21:22] pieterh
|
hmm, indeed: I set a 10K HWM on the server socket and it still runs out of memory
|
[21:22] sustrik
|
you have to set HWM to make excess messages be dropped
|
[21:22] pieterh
|
setting 10K HWM on client socket AND sleeping in between bursts, it's ok
|
[21:23] sustrik
|
what about HWM on both sender and receiver?
|
[21:23] pieterh
|
cremes: ah...
|
[21:23] pieterh
|
LINGER is only executed at zmq_term time!
|
[21:24] sustrik
|
zmq_close() time, to be precise
|
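LINGER only governs what zmq_close() does with messages still queued on the sending side; it does nothing for buffers held by the receiver. The toy model below (not 0MQ code; `dropped_at_close` and the constant drain rate are assumptions for illustration) shows why a finite linger can look fine at low load and still drop data once the backlog outgrows what the peer can accept within the linger window:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (not 0MQ code) of what happens at zmq_close() to messages
   still queued on the *sending* side, for a given ZMQ_LINGER value:
     linger < 0   wait until everything is sent, so nothing is dropped
     linger == 0  discard pending messages immediately
     linger > 0   keep sending for up to linger ms, then discard the rest
   drain_per_ms is an assumed constant rate at which the peer takes data. */
static size_t dropped_at_close(size_t pending, int linger_ms,
                               size_t drain_per_ms)
{
    if (linger_ms < 0)
        return 0;
    size_t can_send = (size_t) linger_ms * drain_per_ms;
    return pending > can_send ? pending - can_send : 0;
}
```

With a 50 ms linger and a peer taking 10 messages/ms, a backlog of 100 messages gets out completely, while a backlog of 1000 loses 500.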
[21:24] pieterh
|
bleh, you're right, and doing init/term in the loop makes no difference
|
[21:25] pieterh
|
cremes: you always find the weird cases... :-)
|
[21:25] sustrik
|
have you tried with HWM on both sides?
|
[21:25] pieterh
|
have tried on either side, no difference
|
[21:25] cremes
|
i didn't think HWM had any effect on a SUB socket...?
|
[21:25] sustrik
|
i meant *both*
|
[21:25] sustrik
|
not either
|
[21:26] sustrik
|
cremes: it does
|
[21:26] sustrik
|
it specifies how many messages can be buffered before 0mq starts dropping them
|
[21:26] pieterh
|
sustrik: either, both, makes no visible difference
|
[21:26] sustrik
|
ok, that looks like a buf
|
[21:27] sustrik
|
bug
|
[21:27] cremes
|
on the zmq_socket() man page, it says N/A for HWM on a SUB socket
|
[21:27] sustrik
|
oh
|
[21:27] sustrik
|
i see
|
[21:27] pieterh
|
the only thing that seems to work is a long (1 second) sleep in the client loop
|
[21:27] sustrik
|
the clients are creating new connections all the time
|
[21:27] cremes
|
sustrik: right
|
[21:27] pieterh
|
cremes: yeah, I remember that, it's a bug, no?
|
[21:28] sustrik
|
meaning that the server creates a new buffer each time
|
[21:28] sustrik
|
each buffer is limited by HWM
|
[21:28] sustrik
|
but the number of buffers is unlimited
|
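sustrik's point is that HWM caps each connection's buffer, while nothing caps the number of buffers, and a buffer on the receiving side is only freed once it has been read empty. A toy simulation of the leak scenario (one short-lived client connecting per tick and bursting a fixed number of messages), assuming a fixed total read rate on the server; `simulate_backlog` is hypothetical and models the idea, not 0MQ's internals:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUEUES 1024  /* size of the toy array, not a 0MQ limit */

/* Each tick: one new client connects, bursts msgs_per_client messages
   into its own server-side queue (capped at hwm; 0 = no cap, as with the
   0MQ 2.x default), then disconnects. The server reads drain_per_tick
   messages in total, oldest queues first. Empty queues are freed.
   Returns the number of messages still buffered at the end. */
static size_t simulate_backlog(int ticks, size_t msgs_per_client,
                               size_t hwm, size_t drain_per_tick)
{
    size_t queue[MAX_QUEUES];
    int nqueues = 0;
    size_t total = 0;

    for (int t = 0; t < ticks && nqueues < MAX_QUEUES; t++) {
        /* New connection: a fresh buffer, bounded only by hwm. */
        size_t accepted = (hwm && msgs_per_client > hwm) ? hwm : msgs_per_client;
        queue[nqueues++] = accepted;
        total += accepted;

        /* Application reads from the oldest buffers first. */
        size_t budget = drain_per_tick;
        for (int i = 0; budget > 0 && i < nqueues; i++) {
            size_t take = queue[i] < budget ? queue[i] : budget;
            queue[i] -= take;
            total -= take;
            budget -= take;
        }

        /* Free buffers that have been read empty (compact the array). */
        int j = 0;
        for (int i = 0; i < nqueues; i++)
            if (queue[i] > 0)
                queue[j++] = queue[i];
        nqueues = j;
    }
    return total;
}
```

With 100-message bursts, an HWM of 100 and a reader draining 50 messages per tick, the backlog grows by 50 every tick no matter what the HWM is, which is consistent with HWM on either side making no visible difference.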
[21:29] sustrik
|
there should be MAX_CONNECTIONS socket options...
|
[21:29] sustrik
|
option*
|
[21:29] cremes
|
that buffer should be dropped when zmq_close() is called so it should catch up, right?
|
[21:29] Guthur
|
what is expected to happen if you poll before TCP sockets are fully connected?
|
[21:29] sustrik
|
cremes: the buffer is dropped on the client side
|
[21:30] sustrik
|
the server side buffer remains until all the messages are read from it
|
[21:30] cremes
|
sustrik: i thought zmq_connect() is what created the buffer
|
[21:30] pieterh
|
Guthur: nothing in particular?
|
[21:30] cremes
|
ok, right
|
[21:30] sustrik
|
cremes: yes
|
[21:30] pieterh
|
sustrik: yes, but are there multiple buffers at the server side?
|
[21:30] sustrik
|
but the server side buffer remains in place while there are messages in it
|
[21:31] sustrik
|
yes, one buffer per connection
|
[21:31] pieterh
|
it's N client-side buffers (that should be destroyed by close + LINGER=0) + 1 server-side buffer
|
[21:31] Guthur
|
pieterh, I'm getting strange behaviour on POSIX OSs (linux and OSX) with polling with CLRZMQ2
|
[21:31] pieterh
|
setting HWM on sub socket (server) makes no difference
|
[21:31] pieterh
|
Guthur: 'strange' = ?
|
[21:31] sustrik
|
the socket on the server side is never closed
|
[21:32] sustrik
|
so the buffers remain
|
[21:32] Guthur
|
pieterh, well if I don't delay the polling ever so slightly it throws an exception
|
[21:32] pieterh
|
sustrik... where is that 1.4Gb of memory sitting then?
|
[21:32] Guthur
|
and a users seems to be getting similar problems on OSX
|
[21:32] Guthur
|
user*
|
[21:32] sustrik
|
lot of buffers in the server socket
|
[21:32] sustrik
|
they are gradually being emptied and deallocated
|
[21:32] Guthur
|
same code works on windows fine though
|
[21:33] sustrik
|
but client create new buffers even faster
|
[21:33] Guthur
|
without the delay
|
[21:33] mikko
|
http://johanharjono.com/archives/633
|
[21:33] mikko
|
installation instructions missing something?
|
[21:33] pieterh
|
and HWM is for each buffer independently... not the socket as such
|
[21:33] pieterh
|
Guthur: no idea, we'd need some test code that reproduces it
|
[21:33] sustrik
|
yes, HWM is same as SO_SNDBUF and SO_RCVBUF
|
[21:33] sustrik
|
local
|
[21:34] sustrik
|
doesn't affect the peer
|
[21:34] Guthur
|
it's all related to this issue: https://github.com/zeromq/clrzmq2/issues/13
|
[21:34] pieterh
|
cremes: so what did you not know that led you to think this could work?
|
[21:35] cremes
|
pieterh: i saw another resource leak and followed it back to the PUB socket
|
[21:36] Guthur
|
i do notice that if I place it in a try block it also works; I put this down to the fact that a try block will possibly delay the polling ever so slightly
|
[21:36] cremes
|
i'll have to look and see if i am overrunning the SUB socket on the other side like in this example
|
[21:36] pieterh
|
seems like that opening/closing the client sockets each time is the cause
|
[21:36] sustrik
|
this is a problem i wanted to address for a long time but never quite got around to it
|
[21:36] pieterh
|
Guthur: I can't really help, have no idea what the exception could be or why
|
[21:36] sustrik
|
there should be a socket option limiting the max number of concurrent connections
|
[21:37] peter_NOrth
|
is nial dalton on this IRC ever?
|
[21:37] pieterh
|
sustrik: anti-DoS protection
|
[21:37] sustrik
|
exactly
|
[21:37] pieterh
|
useful, but here we have a problem of documentation IMO
|
[21:37] pieterh
|
or something
|
[21:38] sustrik
|
possibly
|
[21:38] pieterh
|
it's unclear how HWM and LINGER help here
|
[21:38] pieterh
|
(in fact they don't)
|
[21:38] sustrik
|
LINGER is irrelevant
|
[21:39] sustrik
|
because it affects the send side
|
[21:39] sustrik
|
and the problem is on recv side
|
[21:39] pieterh
|
yes, but that's not obvious
|
[21:39] sustrik
|
HWM would help in combination with MAX_CONNECTIONS
|
[21:39] sustrik
|
MAX_CONNECTION * HWM = max number of messages queued
|
[21:39] pieterh
|
possibly HWM affecting socket rather than each buffer
|
[21:39] pieterh
|
ah, yes
|
[21:40] sustrik
|
* MAX_MSG_SIZE = max memory used
|
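The bound sustrik describes can be written out directly. A small sketch, assuming MAX_CONNECTIONS and MAX_MSG_SIZE behave as proposed here (they are not 0MQ 2.1 socket options) and treating 0 as "no limit", as 2.x does for HWM; the function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Worst-case bytes queued on one receiving socket:
   MAX_CONNECTIONS * HWM * MAX_MSG_SIZE (proposed options, assumed
   semantics). Returns false when there is no finite 64-bit bound:
   any factor is 0 ("no limit") or the product overflows. */
static bool max_queued_bytes(uint64_t max_connections, uint64_t hwm,
                             uint64_t max_msg_size, uint64_t *out)
{
    if (max_connections == 0 || hwm == 0 || max_msg_size == 0)
        return false;
    if (hwm > UINT64_MAX / max_connections)
        return false;
    uint64_t msgs = max_connections * hwm;   /* max messages queued */
    if (max_msg_size > UINT64_MAX / msgs)
        return false;
    *out = msgs * max_msg_size;              /* max memory used */
    return true;
}
```

For example, 100 connections, an HWM of 1000 and 1 KiB messages give 102,400,000 bytes (about 98 MiB); leave any one factor unlimited and no finite bound exists.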
[21:40] pieterh
|
...calculating...
|
[21:40] pieterh
|
102523.2231GB
|
[21:40] pieterh
|
yeah, that'll do
|
[21:41] pieterh
|
sustrik: why not add MAX_CONNECTIONS and MAX_MSG_SIZE to the 3.0 roadmap?
|
[21:41] pieterh
|
they are excellent ideas
|
[21:41] Guthur
|
pieterh, errno 4 mean anything?
|
[21:42] pieterh
|
documenting them will perhaps give someone the incentive to go make the patch
|
[21:42] sustrik
|
it can be added to 2.x
|
[21:42] NoToes
|
Hi Guthur, I'm "johndeko". So you've managed to reproduce the poll timing issue? If so, I won't bother to reproduce it outside of Unity.
|
[21:42] sustrik
|
no backward compatibility problem
|
[21:42] pieterh
|
sustrik: sure
|
[21:42] pieterh
|
we have a 2.2 roadmap page?
|
[21:42] sustrik
|
nope
|
[21:42] Guthur
|
NoToes, I think so
|
[21:42] Guthur
|
very strange one though
|
[21:43] pieterh
|
sustrik: ok, I'm going to make it, I assume?
|
[21:43] sustrik
|
why not
|
[21:43] NoToes
|
Sure is!
|
[21:43] sustrik
|
no, i'm not
|
[21:44] sustrik
|
it's either brian granger or minrk
|
[21:44] Guthur
|
NoToes, a sleep of at least 100 milliseconds before starting to poll and there is no problem
|
[21:45] Guthur
|
but I don't think that's really what you want to hear
|
[21:45] pieterh
|
sustrik: ok, done, and I added the socket type renames since there was consensus on that
|
[21:45] pieterh
|
oh, I can provide a patch for that already :-)
|
[21:45] sustrik
|
what renames?
|
[21:46] pieterh
|
:-)
|
[21:46] pieterh
|
XREP -> ROUTER, XREQ -> DEALER
|
[21:46] sustrik
|
yuck
|
[21:46] NoToes
|
Guthur, not really. It doesn't fill me with certainty and makes fast updates impossible.
|
[21:46] Guthur
|
here it's an interrupted syscall
|
[21:46] pieterh
|
yeah, you should have said that when it was discussed on zeromq-dev
|
[21:46] pieterh
|
les absents ont toujours tort (the absent are always in the wrong)
|
[21:46] Guthur
|
that's the exception
|
[21:46] Guthur
|
NoToes, ^
|
[21:46] sustrik
|
ok, good
|
[21:47] sustrik
|
i'll add it as an alias
|
[21:47] pieterh
|
sustrik: thread has title "[0MQ/3.0] discuss: rename XREP to ROUTER"
|
[21:47] pieterh
|
but we can introduce the name change in 2.2 as we did for PUSH/PULL
|
[21:48] cremes
|
Ergo^: if the python 0mq interface allows you to send multipart messages, make sure the topic is the first
|
[21:48] Guthur
|
NoToes, it may be that you only have to do this after first connecting, and then things will be fine unless you have to reconnect again; that's a guess though
|
[21:48] cremes
|
Ergo^: part and your json-encoded string is the second part
|
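The reason for putting the topic in the first part: a SUB socket filters by byte prefix, and in 0MQ 2.x, as we understand it, the filter is applied to the start of the message, which for a multipart message is its first frame. A sketch of that matching rule in plain C (`subscription_matches` is a hypothetical helper, not a 0MQ function):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* A subscription matches when its bytes are a prefix of the first
   frame. An empty subscription (length 0) matches every message. */
static bool subscription_matches(const void *sub, size_t sub_len,
                                 const void *first_frame, size_t frame_len)
{
    return sub_len <= frame_len && memcmp(sub, first_frame, sub_len) == 0;
}
```

A JSON payload sent as the first frame would have to begin with the subscription bytes to match, so the topic goes first and the JSON follows as a second frame.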
[21:48] cremes
|
Ergo^: don't be overly concerned that the api doesn't have a single call that does everything you want
|
[21:48] Guthur
|
NoToes, I have not got the sleep in the polling loop, rather just before it, does this work for you?
|
[21:48] cremes
|
Ergo^: you can build your own convenience method from the methods already present, right?
|
[21:48] NoToes
|
Guthur, well easy enough for me to test.
|
[21:49] NoToes
|
Guthur, I'll try it out.
|
[21:49] Guthur
|
cool
|
[21:50] Guthur
|
sustrik, any idea why we would get an "Interrupted system call" error when polling too quickly after a TCP socket connection
|
[21:50] cremes
|
Ergo^: i disagree; i don't think the api should have any explicit method dealing with json
|
[21:50] cremes
|
Ergo^: why not a different serialization format? what is json's connection to 0mq?
|
[21:51] cremes
|
Ergo^: i guess i fail to see the problem here; you can easily accomplish what you want with a 3-line method
|
[21:51] cremes
|
Ergo^: why does it matter that the api doesn't already have it? write it and send in a patch...?
|
[21:51] sustrik
|
Guthur: presumably, there's a signal generated somewhere
|
[21:52] mikko
|
sustrik: http://build.zero.mq/job/ZeroMQ2-core-master_mingw64/5/console
|
[21:52] mikko
|
mingw64 cross compile running
|
[21:52] mikko
|
well, was running
|
[21:52] sustrik
|
wow, that was quick
|
[21:52] cremes
|
Ergo^: ok!
|
[21:53] mikko
|
sustrik: not sure if that is my environment or something else
|
[21:53] sustrik
|
no windows.h
|
[21:53] sustrik
|
strange
|
[21:53] mikko
|
might be something odd with the build i guess
|
[21:54] mikko
|
./configure --host=amd64-mingw32msvc --target=mingw64
|
[21:54] mikko
|
do i need anything else?
|
[21:55] sustrik
|
no idea
|
[21:55] NoToes
|
Guthur, no luck with a sleep before the poll loop.
|
[21:55] sustrik
|
try asking the guy who filed the issue
|
[21:55] mikko
|
ok, will investigate
|
[21:55] sustrik
|
he's pretty responsive
|
[21:56] cremes
|
sustrik: what would you say is holding onto the memory if you saw this callstack? https://gist.github.com/848123
|
[21:56] sustrik
|
those are messages
|
[21:56] cremes
|
are they unsent and in a queue?
|
[21:57] sustrik
|
they are received by I/O thread and waiting to be read by the application
|
[21:58] cremes
|
sustrik: i don't understand that... it's from a PUB socket, so what is waiting to read it?
|
[21:58] sustrik
|
sorry?
|
[21:58] sustrik
|
I/O thread reads messages from TCP connections and buffers them
|
[21:58] sustrik
|
application reads them
|
[21:59] cremes
|
that call-tree is for a pub socket that is sending messages
|
[21:59] cremes
|
i don't understand why you say the i/o thread has received them and is waiting for the application to read them
|
[21:59] sustrik
|
oops
|
[21:59] cremes
|
i thought pub was broadcast, fire-and-forget
|
[21:59] sustrik
|
missed the first line
|
[21:59] sustrik
|
it is
|
[22:00] sustrik
|
but there's some reliability built in
|
[22:00] cremes
|
pieterh: sent you some feedback on that rfc
|
[22:00] sustrik
|
namely, up to HWM messages are buffered before 0mq starts dropping them
|
[22:00] pieterh
|
cremes: our email server is dead atm
|
[22:00] cremes
|
ok, so what are the conditions that will cause pub to hang onto those messages?
|
[22:00] sustrik
|
by default, HWM=infinite
|
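A toy model of the per-subscriber buffering described here (`pipe_model` and `pipe_offer` are hypothetical names, not 0MQ internals): up to HWM messages are queued for a slow subscriber, and anything beyond that is dropped instead of blocking the publisher. With the 2.x default of no limit, the queue simply keeps growing, which fits the memory held in the call tree above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One PUB -> SUB pipe. */
typedef struct {
    size_t queued;   /* messages waiting for the subscriber */
    size_t dropped;  /* messages discarded because the pipe was full */
    size_t hwm;      /* high-water mark; 0 means unlimited */
} pipe_model;

/* Offer one published message to the pipe; returns false if dropped. */
static bool pipe_offer(pipe_model *p)
{
    if (p->hwm != 0 && p->queued >= p->hwm) {
        p->dropped++;
        return false;
    }
    p->queued++;
    return true;
}
```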
[22:00] pieterh
|
cremes: could you resend to pieterh@gmail.com, thanks
|
[22:00] cremes
|
pieterh: that explains why the email bounced!
|
[22:01] pieterh
|
bounced? that's not nice... rats...
|
[22:01] NoToes
|
Guthur, adding a System.GC.Collect() instead of a sleep also works.
|
[22:01] cremes
|
sustrik: ok, so they are in queue because there is a slow subscriber somewhere; is that right?
|
[22:02] Guthur
|
NoToes, that's even weirder
|
[22:02] sustrik
|
yes
|
[22:02] cremes
|
ok
|
[22:02] sustrik
|
to guard against slow consumers
|
[22:02] Guthur
|
NoToes, but the sleep did not work for you?
|
[22:02] NoToes
|
Guthur, that doesn't say much if it just takes up some time.
|
[22:02] sustrik
|
all buffering has to have upper limit
|
[22:02] cremes
|
and if there are *no* subscribers, it should just drop those messages, yes?
|
[22:03] sustrik
|
so we need at least 3 options: HWM, MAX_CONNECTIONS, and MAX_SIZE
|
[22:03] sustrik
|
cremes: yes
|
[22:03] cremes
|
cool
|
[22:03] cremes
|
i must have a slow subscriber somewhere.... damn it
|
[22:03] sustrik
|
well, if you are doing something like the example posted
|
[22:04] sustrik
|
i.e. publishing at full speed from several apps to a single app
|
[22:04] sustrik
|
it's just going to blow up
|
[22:04] peter_NOrth
|
dalton
|
[22:04] cremes
|
i don't think i have that configuration though... i'll have to dig into this; thanks for your help
|
[22:05] sustrik
|
you are welcome
|
[22:08] Guthur
|
NoToes, crumbs, I cannot replicate it anymore
|
[22:09] Guthur
|
it's just working now, grrr
|
[22:11] NoToes
|
Guthur, That's timing bugs for you!
|
[22:14] pieterh
|
cremes: thanks for the review, made changes
|
[22:14] pieterh
|
could you send me that email bounce message so I can see the error?
|
[22:15] cremes
|
pieterh: it wasn't a real bounce; the mail app refused to take a message to sustrik probably because it was too large
|
[22:15] cremes
|
pieterh: so... never mind!
|
[22:15] pieterh
|
ok
|
[22:19] pieterh
|
cremes: could you send me something random to ph@imatix.com?
|
[22:20] pieterh
|
I've fixed our email server but need to test
|
[22:20] NoToes
|
Guthur, I missed your message. No, putting a sleep before the loop instead of in the poll loop didn't work.
|
[22:20] cremes
|
pieterh: on its way
|
[22:20] pieterh
|
zeromq-dev should be working again now
|
[22:20] pieterh
|
thx!
|
[22:20] Guthur
|
NoToes, There is no error with you either?
|
[22:21] NoToes
|
Guthur, no error.
|
[22:21] NoToes
|
Guthur, zmq_poll just always returns 0.
|
[22:21] pieterh
|
sustrik: email list is fixed
|
[22:22] pieterh
|
messages will be coming in slowly as servers retry
|
[22:22] Guthur
|
NoToes, it seems as if I am getting a slightly different issue then
|
[22:22] Guthur
|
Mine returns errno 4 if there isn't a slight delay before starting the polling loop
|
[22:23] Guthur
|
this translates to an "Interrupted system call"
|
[22:25] NoToes
|
Guthur, OK, different issue then.
|
[22:26] Guthur
|
which is doubly annoying, hehe
|
[22:26] NoToes
|
Guthur, I suppose I should try to reproduce this outside of Unity then.
|
[22:26] Guthur
|
NoToes, that would be helpful, and much appreciated if you could
|
[22:35] Guthur
|
is there any advisable action an app should take when getting EINTR while polling?
|
[22:37] Guthur
|
NoToes, I have found that if I catch that EINTR and then continue all is fine
|
[22:37] Guthur
|
OSX does signal things properly, I assume, and Unity isn't suppressing them, or something
|
[22:38] Guthur
|
I admit we are on the borders of my knowledge here
|
[22:38] Guthur
|
probably left the country to be honest
|
[22:40] NoToes
|
Guthur, unfortunately I'm new to OSX as well. Is EINTR a signal or a return code from zmq_recv?
|
[22:41] Guthur
|
http://api.zeromq.org/master:zmq-poll
|
[22:41] Guthur
|
EINTR is returned if there is a signal
|
[22:42] Guthur
|
well not return actually, the errno is set
|
[22:42] Guthur
|
poll returns -1
|
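Catching EINTR and going back to polling, as Guthur found, is the usual way to handle an interrupted system call. zmq_poll() reports it the same way POSIX poll() does (returns -1 with errno set to EINTR, which is errno 4 on Linux and OSX), so a sketch with plain poll() shows the loop; swapping in zmq_poll() with its zmq_pollitem_t array is an assumption left to the reader, and `poll_retry` is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>

/* Retry a poll that was interrupted by a signal before any event
   arrived. Note: the full timeout restarts on each retry; a real loop
   may want to subtract the time already spent waiting. */
static int poll_retry(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    int rc;
    do {
        rc = poll(fds, nfds, timeout_ms);
    } while (rc == -1 && errno == EINTR);
    return rc;
}
```

Any other error (rc == -1 with a different errno) is passed back to the caller unchanged.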
[22:42] NoToes
|
Guthur, Ah OK.
|
[22:43] Guthur
|
I think the issue I have here in linux MONO is something I can't really rectify, but is easily worked around
|
[22:44] Guthur
|
The OSX one is a little less clear
|
[22:44] Guthur
|
have you tried it outside Unity?
|
[22:49] NoToes
|
Guthur, I'm trying now...
|
[22:53] NoToes
|
Guthur, it's working inside Unity now :(
|
[22:55] Guthur
|
I wonder if it's a MONO issue
|
[22:56] Guthur
|
but that doesn't really make much sense either, it's only a relatively simple interop call
|
[23:03] NoToes
|
Guthur, I don't know enough about zmq to make sense of it. Is it possible that there is a shared native buffer referenced by multiple managed objects (or something like that)? That would explain why running the GC helps, and the timing issues.
|
[23:05] Guthur
|
NoToes, I'm looking through now
|
[23:22] Guthur
|
NoToes, Not seeing anything at the moment
|
[23:22] pieterh
|
Ergo^_: build using --with-openpgm afir
|
[23:23] Guthur
|
it's getting late here so i'll probably get my head down soon, sorry we've been unable to get this sorted for you
|
[23:23] Guthur
|
hopefully we'll get to the bottom of it eventually
|
[23:23] pieterh
|
:-)
|
[23:24] NoToes
|
Guthur, Thanks for all your help.
|
[23:24] Guthur
|
no probs
|
[23:24] Guthur
|
I might drop by the MONO channel tomorrow and see if I can get an clues
|
[23:24] Guthur
|
an/any
|
[23:25] Guthur
|
ok, it's late, night all
|