[Time] Name | Message |
[06:40] sustrik
|
zadas: why does the pointer have to be in zmq_pollitem_t?
|
[06:41] sustrik
|
keeping a different array of data pointers would work equally well, no?
|
[07:52] CIA-19
|
zeromq2: Martin Sustrik wip-shutdown * rc001223 / (src/fq.cpp src/pipe.cpp src/req.cpp src/req.hpp): REQ socket implementation is layered on top of XREQ - http://bit.ly/brXle9
|
[09:24] CIA-19
|
zeromq2: Martin Sustrik wip-shutdown * rb5fc565 / (src/rep.cpp src/rep.hpp): REP socket layered on top of XREP socket - http://bit.ly/cEXRbB
|
[11:05] sustrik
|
pieterh: i like the story of ordinary socket turning into supersocket :)
|
[11:26] pieterh
|
sustrik: the idea was to give a striking visual image :-)
|
[11:26] pieterh
|
Zap! Pow!
|
[11:28] pieterh
|
sustrik: i have to admit, i'm impressed
|
[11:28] pieterh
|
translated the C++ multithreaded server into C
|
[11:29] pieterh
|
it worked first time
|
[11:29] pieterh
|
that is kind of scary
|
[11:37] sustrik
|
:)
|
[11:37] sustrik
|
with previous enterprise messaging experience it makes you a bit suspicious
|
[11:39] sustrik
|
this has to be some kind of cheat...
|
[11:39] sustrik
|
no way it's that easy...
|
[11:41] sustrik
|
maybe we should provide an "enterprise" wrapper that'll add some complexity to make corporate users feel comfortable :)
|
[11:48] pieterh
|
someone started an AMQP-to-0MQ bridge... I guess that counts?
|
[12:46] mato
|
sustrik: i just tested remote_thr on the wip-shutdown branch and it does *not* freeze
|
[12:47] sustrik
|
it fails then...
|
[12:47] mato
|
sustrik: maybe you forgot to comment out the zmq_sleep(10) ? :-)
|
[12:47] mato
|
sustrik: doesn't fail either :-)
|
[12:47] sustrik
|
ha
|
[12:47] sustrik
|
anyway, it freezes on my box, so i'll work on that
|
[12:47] sustrik
|
in the meantime you can do some tests
|
[12:48] mato
|
and you did comment out the zmq_sleep(), right? :-)
|
[12:48] sustrik
|
i've tried both commenting it out and not commenting it out
|
[12:48] sustrik
|
freezes in both cases
|
[12:48] mato
|
ah, interesting
|
[12:49] mato
|
sustrik: it freezes if you start remote first
|
[12:49] mato
|
at least that's what i see...
|
[12:49] mato
|
if i start local_thr, then remote_thr it works fine (with or without sleep)
|
[12:49] sustrik
|
doesn't really matter, the code is still a bit creaky, i'll have to do a few fixes
|
[12:50] mato
|
yes, yes
|
[12:50] mato
|
it sort of works
|
[12:50] sustrik
|
kind of
|
[12:50] mato
|
not for small amounts of messages :)
|
[12:53] sustrik
|
i have to leave now for an hour
|
[12:53] sustrik
|
can you have a look at the poll stuff in the meantime?
|
[12:53] mato
|
ok, will do
|
[12:54] mato
|
sustrik: where did that patch come from?
|
[12:54] mato
|
sustrik: all i have from you is some github url...
|
[12:54] sustrik
|
it's a pull request
|
[12:54] sustrik
|
from that repo
|
[12:54] mato
|
sustrik: aha, ok
|
[12:55] mato
|
sustrik: we should disable pull requests via github somehow
|
[12:55] mato
|
sustrik: not transparent :(
|
[14:54] mato
|
sustrik: back yet?
|
[14:55] sustrik
|
yes, i'm back
|
[14:55] mato
|
sustrik: that patch kind of works, but not really
|
[14:55] sustrik
|
?
|
[14:55] mato
|
sustrik: i'm rewriting zmq_poll to use ZMQ_FD and ZMQ_EVENTS which is what it should be doing
|
[14:55] sustrik
|
yes, that's the idea
|
[14:55] mato
|
sustrik: which also kills this business of zmq_poll() poking around inside socket_base_t
|
[14:56] sustrik
|
right
|
[14:56] mato
|
sustrik: anyway, getting there... maybe poll will not be a hack after i've finished :)
|
[14:56] sustrik
|
:)
|
[15:05] mato
|
sustrik: so the idea is I'm supposed to poll for POLLIN on ZMQ_FD, then if I get POLLIN get the "real" events using ZMQ_EVENTS
|
[15:05] mato
|
sustrik: correct?
|
[15:31] sustrik
|
mato: correct
|
[16:11] mato
|
sustrik: ok, do you have a minute?
|
[16:12] mato
|
sustrik: i have zmq_poll () working, at least the version using poll ()
|
[16:12] mato
|
sustrik: but, I'm using some test programs Brian Granger wrote and things are breaking (as we would expect)
|
[16:13] mato
|
sustrik: what I can do now, is commit the poll () version of zmq_poll (), leaving the select () version #ifdefed out
|
[16:13] mato
|
sustrik: and then, either you use Brian's test programs, or I can rewrite them in C++ which might be easier for you
|
[16:13] mato
|
sustrik: and you can squash the bugs ...
|
[16:26] sustrik
|
ok, commit it
|
[16:26] CIA-19
|
zeromq2: Martin Lucina wip-shutdown * rf4db0f6 / src/zmq.cpp :
|
[16:26] CIA-19
|
zeromq2: zmq_poll(): Rewrite to use ZMQ_FD/ZMQ_EVENTS pt1
|
[16:26] CIA-19
|
zeromq2: Rewrite zmq_poll() to use ZMQ_FD and ZMQ_EVENTS introduced on the
|
[16:26] CIA-19
|
zeromq2: wip-shutdown branch. Only do the poll()-based version of zmq_poll (), the
|
[16:26] CIA-19
|
zeromq2: select()-based version will not compile at the moment. - http://bit.ly/b9yiFc
|
[16:26] mato
|
sustrik: done
|
[16:26] sustrik
|
i am running into different tricky races between delayed socket close and reconnection functionality
|
[16:26] sustrik
|
it'll take a while to get that sorted out
|
[16:27] mato
|
sustrik: d'you want those test programs rewritten in C++?
|
[16:27] sustrik
|
for poll?
|
[16:27] mato
|
sustrik: yeah
|
[16:27] sustrik
|
no need thanks
|
[16:27] sustrik
|
i have to make simple cases work first
|
[16:27] mato
|
sustrik: ok, so there's nothing i can do in the mean time?
|
[16:28] mato
|
sustrik: ...except complete the poll() rewrite (fix the select() based version)
|
[16:28] mato
|
sustrik: ack?
|
[16:28] sustrik
|
that would be one good thing to do
|
[16:28] mato
|
well i'll go and do it then
|
[16:28] sustrik
|
the other would be to make poll's timeout behave as expected
|
[16:28] mato
|
yeah, i was going to do that too :-)
|
[16:28] sustrik
|
:)
|
[16:29] sustrik
|
now the performance problem of calling gettimeofday is not critical
|
[16:29] mato
|
yup
|
[16:29] mato
|
actually zmq_poll() is much simpler now
|
[16:30] mato
|
so these are all good changes (ZMQ_FD, ZMQ_EVENTS...)
|
[16:30] sustrik
|
those who need fast version can use ZMQ_FD and ZMQ_EVENTS instead
|
[16:31] sustrik
|
ack
|
[16:34] CIA-19
|
zeromq2: Martin Lucina wip-shutdown * r39abee0 / src/zmq.cpp :
|
[16:34] CIA-19
|
zeromq2: Fix whitespace
|
[16:34] CIA-19
|
zeromq2: Dunno where those <TAB>s came from... - http://bit.ly/c3hytK
|
[17:37] pieterh
|
sustrik: does zmq_errno not work portably?
|
[17:37] sustrik
|
it should
|
[17:38] pieterh
|
the man page strongly advises people not to use the function except on win32
|
[17:38] sustrik
|
on any sane platform errno should do
|
[17:38] mato
|
pieterh: that's because it's really just a hack for win32
|
[17:39] mato
|
pieterh: and it's a mess with multiple CRTs
|
[17:39] pieterh
|
i understand from the "0mq will become a linux kernel module" PoV
|
[17:39] pieterh
|
but from "i want to write an app that runs anywhere" PoV it's... not useful
|
[17:40] pieterh
|
do we have an explicit policy of not trying to help people write portable code?
|
[17:40] pieterh
|
fine by me, but we should say so explicitly IMO
|
[17:40] mato
|
more or less
|
[17:41] sustrik
|
the question is how much api contamination we are willing to accept to support broken platforms
|
[17:41] pieterh
|
it's ok for non-0MQ functions, we can tell people to use a random portability layer
|
[17:41] pieterh
|
but in this case the only solution is for people to themselves wrap zmq_errno and errno
|
[17:41] pieterh
|
which is insane IMO
|
[17:42] mato
|
write an email
|
[17:42] pieterh
|
will do
|
[18:26] pieterh
|
sustrik: is there a reason the C api does not assert if its passed null objects, e.g. context in zmq_socket?
|
[18:26] pieterh
|
*it's
|
[18:27] sustrik
|
no
|
[18:27] sustrik
|
it can possibly return EFAULT
|
[18:27] sustrik
|
(invalid pointer)
|
[18:28] sustrik
|
but if you want to do so, check whether the error exists on Win32
|
[18:28] sustrik
|
and if it does not, define it in zmq.h
|
[18:28] pieterh
|
ack
|
[18:28] mato
|
of course it has no way to tell if a non-NULL pointer passed in is a valid object...
|
[18:28] pieterh
|
indeed
|
[18:29] mato
|
so not really worth it imho
|
[18:29] sustrik
|
pieterh: btw, this won't work:
|
[18:29] mato
|
you would also need to update the docs specifying that EFAULT is a valid return, etc etc
|
[18:29] sustrik
|
cc myprogram.c -lzmq
|
[18:29] sustrik
|
there's also a uuid library needed
|
[18:30] pieterh
|
I'd not bother returning EFAULT, but asserting if it gets NULL is cheap and useful
|
[18:30] mato
|
hmm, maybe, no opinion right now
|
[18:30] mato
|
also -lzmq is sufficient if that points to a shared lib
|
[18:38] CIA-19
|
zeromq2: Martin Lucina wip-shutdown * r3d72e38 / src/zmq.cpp :
|
[18:38] CIA-19
|
zeromq2: zmq_poll(): Rewrite to use ZMQ_FD/ZMQ_EVENTS pt2
|
[18:38] CIA-19
|
zeromq2: Rewrite the select()-based zmq_poll() implementation to use
|
[18:38] CIA-19
|
zeromq2: ZMQ_FD and ZMQ_EVENTS.
|
[18:38] CIA-19
|
zeromq2: Also fix some corner cases: We should not pollute revents with
|
[18:38] CIA-19
|
zeromq2: unrequested events, and we don't need to poll on ZMQ_FD at all
|
[18:38] CIA-19
|
zeromq2: if a pollitem with no events set was passed in. - http://bit.ly/9x0r7G
|
[18:51] CIA-19
|
zeromq2: Pieter Hintjens master * r677b3d9 / src/zmq.cpp : (log message trimmed)
|
[18:51] CIA-19
|
zeromq2: Added not-null assertions on pointer arguments in C API functions
|
[18:51] CIA-19
|
zeromq2: * zmq_term
|
[18:51] CIA-19
|
zeromq2: * zmq_socket
|
[18:51] CIA-19
|
zeromq2: * zmq_close
|
[18:51] CIA-19
|
zeromq2: * zmq_setsockopt
|
[18:51] CIA-19
|
zeromq2: * zmq_getsockopt
|
[19:11] CIA-19
|
zeromq2: Martin Lucina wip-shutdown * rc240ece / src/zmq.cpp :
|
[19:11] CIA-19
|
zeromq2: zmq_poll(): Fix some corner cases
|
[19:11] CIA-19
|
zeromq2: Trying to optimize out the case where items_[i].events is 0 would
|
[19:11] CIA-19
|
zeromq2: result in a bogus pollfds[i]. Similarly in the select()-based impl,
|
[19:11] CIA-19
|
zeromq2: while not strictly necessary it's better to get ZMQ_FD even if
|
[19:11] CIA-19
|
zeromq2: events is 0 since that detects ETERM and friends. - http://bit.ly/cSKumL
|
[19:17] mato
|
pieterh: incidentally i'd somewhat prefer the EFAULT approach as opposed to asserting... the only reason is asserting makes it look as if there's a problem in zmq which in this case there's not...
|
[19:17] pieterh
|
yeah, i was kind of thinking the same
|
[19:17] pieterh
|
EFAULT exists on win32, I checked
|
[19:18] pieterh
|
and the assertion is kind of useless anyhow
|
[19:18] mato
|
well, then, it's a bit more work but you may as well do it that way...
|
[19:18] mato
|
exactly
|
[19:18] pieterh
|
yes
|
[19:19] sustrik
|
mato: semantics quizz
|
[19:19] mato
|
sustrik: go on
|
[19:20] sustrik
|
how should zmq_close behave
|
[19:20] sustrik
|
here's my suggestion:
|
[19:20] sustrik
|
1. for connections created by zmq_connect
|
[19:20] sustrik
|
send all data to the wire
|
[19:20] sustrik
|
even if the peer is disconnected at the moment
|
[19:20] sustrik
|
in such case wait till it reconnects
|
[19:20] sustrik
|
and send the data
|
[19:21] sustrik
|
2. connections established via bind
|
[19:21] sustrik
|
send the data to connections that are connected at the moment
|
[19:21] sustrik
|
ignore disconnected ones
|
[19:21] sustrik
|
if disconnection happens during sending the data
|
[19:21] sustrik
|
drop all the remaining messages
|
[19:22] sustrik
|
how does that feel?
|
[19:22] mato
|
hmm
|
[19:22] mato
|
i would unify 1) and 2)
|
[19:22] sustrik
|
how?
|
[19:22] mato
|
i.e. for 1) if there are no connections then discard the data
|
[19:23] sustrik
|
what if there's a disconnected connection?
|
[19:23] mato
|
also if a disconnect occurs during sending then discard the remaining data
|
[19:23] mato
|
ignore it
|
[19:23] sustrik
|
consider this:
|
[19:23] sustrik
|
zmq_connect
|
[19:23] sustrik
|
connecting happens in background, will take some time
|
[19:23] sustrik
|
zmq_send
|
[19:23] sustrik
|
zmq_close
|
[19:23] sustrik
|
not yet connected
|
[19:23] mato
|
yeh, i know what you're going to say ... yes
|
[19:23] sustrik
|
=> drop the data
|
[19:23] mato
|
yes yes
|
[19:23] mato
|
how about this
|
[19:24] mato
|
if we've never tried to connect, then behave as you describe
|
[19:24] mato
|
if a connection has been seen, but is now disconnected, then ignore it
|
[19:24] mato
|
don't know if it makes sense, just brainstorming
|
[19:24] sustrik
|
pretty inconsistent
|
[19:25] sustrik
|
behaves differently based on something invisible to user
|
[19:25] mato
|
so does 2) in your case
|
[19:25] sustrik
|
i would prefer dropping the data straight away in 2)
|
[19:26] sustrik
|
however, at least trying gets the least surprise behaviour
|
[19:26] sustrik
|
with ZMQ_PUSH socket
|
[19:26] pieterh
|
i thought you were going to drop nothing until zmq_term?
|
[19:26] sustrik
|
how would you do that with zmq_bind?
|
[19:26] sustrik
|
you don't know what connections are going to arrive
|
[19:27] pieterh
|
it depends on the pattern, no?
|
[19:27] sustrik
|
at some point you have to decide:
|
[19:27] sustrik
|
"i don't care about connections that arrive from this point on"
|
[19:27] pieterh
|
if it's a pub socket and there are no connections, data will be dropped
|
[19:27] pieterh
|
if it's a req or xreq socket and there are no connections, data will be queued
|
[19:28] pieterh
|
but if you do zmq_term, it'll get dropped anyhow
|
[19:28] pieterh
|
no?
|
[19:28] mato
|
no
|
[19:28] pieterh
|
:-)
|
[19:28] mato
|
with the current implementation zmq_term() will just block forever if the data can't be sent
|
[19:29] pieterh
|
imo any working design that fixes that is already good enough for now
|
[19:29] pieterh
|
no need to attain perfection in a single cycle
|
[19:29] mato
|
fixes what?
|
[19:29] sustrik
|
let's consider the pieter's algorithm
|
[19:29] pieterh
|
the blocking zmq_term() and/or lost messages
|
[19:30] sustrik
|
say we have a bound ZMQ_REP socket
|
[19:30] pieterh
|
sustrik: my algorithm is 90% ignorance
|
[19:30] sustrik
|
thinkng about it aloud...
|
[19:30] sustrik
|
so if clients disconnects
|
[19:30] sustrik
|
the reply will be queued
|
[19:30] sustrik
|
if the client doesn't reconnect
|
[19:30] sustrik
|
zmq_term will block
|
[19:31] sustrik
|
doesn't sound right
|
[19:31] sustrik
|
servers should terminate in a more or less forceful way
|
[19:31] pieterh
|
sustrik: i suggested zmq_term drop all waiting messages
|
[19:31] pieterh
|
but that zmq_close act more or less asynchronously
|
[19:31] sustrik
|
consider this:
|
[19:31] sustrik
|
zmq_send
|
[19:31] sustrik
|
zmq_close
|
[19:31] sustrik
|
zmq_term
|
[19:31] sustrik
|
the message will never get to the other side
|
[19:32] mato
|
yes, that case must work, so term must block
|
[19:32] mato
|
but we need to find some kind of middle ground here
|
[19:32] pieterh
|
question
|
[19:32] pieterh
|
"must work, term must block", but for how long?
|
[19:32] pieterh
|
does it make sense to block indefinitely?
|
[19:33] sustrik
|
yes?
|
[19:33] sustrik
|
in the first iteration, yes
|
[19:33] pieterh
|
besides, that scenario seems artificial, is it a real use case?
|
[19:33] sustrik
|
yes
|
[19:33] sustrik
|
:)
|
[19:33] sustrik
|
it's standard client case
|
[19:33] sustrik
|
zmq_connect
|
[19:33] sustrik
|
zmq_send
|
[19:33] sustrik
|
zmq_close
|
[19:33] pieterh
|
no, client is send/recv/close/term
|
[19:33] sustrik
|
zmq_term
|
[19:33] mato
|
pieterh: it's precisely the case people are having a problem with at the moment
|
[19:33] pieterh
|
ah, push sockets, ok
|
[19:33] sustrik
|
that's for req/rep
|
[19:33] mato
|
pieterh: and solving with various sleep() hacks
|
[19:33] sustrik
|
think of say push
|
[19:34] pieterh
|
i think it depends on the pattern, it's a policy issue
|
[19:34] pieterh
|
e.g. for pub socket, you'd drop data
|
[19:34] pieterh
|
but a push socket tries to deliver data
|
[19:34] pieterh
|
like an xreq socket will also do
|
[19:34] sustrik
|
see above
|
[19:35] sustrik
|
<sustrik> so if clients disconnects
|
[19:35] sustrik
|
<sustrik> the reply will be queued
|
[19:35] sustrik
|
<sustrik> if the client doesn't reconnect
|
[19:35] sustrik
|
<sustrik> zmq_term will block
|
[19:35] sustrik
|
the question is how to shut down a server
|
[19:35] pieterh
|
request/reply or push? i'm confused, you are mixing the two
|
[19:35] mato
|
well, the counter-question is how to shut down a client with no server :-)
|
[19:35] pieterh
|
and i think the use cases are not the same
|
[19:35] sustrik
|
mato: block the client
|
[19:36] sustrik
|
that one is easy
|
[19:36] pieterh
|
reply socket is like push, it should block forever until someone can collect the message
|
[19:36] pieterh
|
where "forever" can be tuned with a timeout
|
[19:36] pieterh
|
imho
|
[19:36] sustrik
|
hm
|
[19:36] mato
|
sustrik: what if the client *really* wants the context to go away...?
|
[19:36] sustrik
|
actually it makes sense
|
[19:36] pieterh
|
but xrep and pub sockets should not do thuis
|
[19:37] pieterh
|
*this
|
[19:37] pieterh
|
they have no delivery promise
|
[19:37] pieterh
|
we're talking about a basic reliability policy here
|
[19:37] mato
|
sustrik: actually, pieter is right about PUB sockets... the policy should be similar to what happens on HWM being hit
|
[19:37] pieterh
|
pull and sub sockets are not relevant, obviously
|
[19:37] sustrik
|
mato: set SO_LINGER to 0?
|
[19:38] pieterh
|
yes, mato, it's a policy decision
|
[19:39] mato
|
sustrik: yes, i just have a feeling that for trivial cases there should essentially be a "linger" option to zmq_term()
|
[19:40] mato
|
hmm
|
[19:40] sustrik
|
what about this case:
|
[19:40] pieterh
|
setsockopt, please don't add options to zmq_init() or zmq_term()
|
[19:40] pieterh
|
and it can't be an option to zmq_term because it differs per socket
|
[19:40] sustrik
|
zmq_socket (PUB)
|
[19:41] sustrik
|
zmq_connect
|
[19:41] sustrik
|
zmq_send
|
[19:41] sustrik
|
zmq_close
|
[19:41] sustrik
|
zmq_term
|
[19:41] sustrik
|
message is lost
|
[19:41] pieterh
|
sustrik: you're right that connect/bind are relevant
|
[19:41] sustrik
|
is that the least surprise scenario?
|
[19:41] pieterh
|
no, PUB sockets don't normally connect anyhow
|
[19:42] sustrik
|
yes, it's the corner case
|
[19:42] pieterh
|
if you're using a PUB socket like that you should probably be using PUSH or XREQ
|
[19:42] sustrik
|
?
|
[19:42] pieterh
|
if connect/bind are 100% orthogonal to socket policy
|
[19:42] pieterh
|
THEN whether the PUB socket connects or binds is immaterial
|
[19:42] sustrik
|
PUB is fanout while PUSH/XREQ are load-balanced
|
[19:42] pieterh
|
yes, my bad
|
[19:43] sustrik
|
i mean, dropping the message in the case above is ok
|
[19:43] sustrik
|
but it's kind of surprising
|
[19:43] pieterh
|
anyhow, PUB sockets as 'clients' is weird and does not (yet) count as a realistic use case
|
[19:43] mato
|
pieterh: not for you, but people do weird things...
|
[19:43] pieterh
|
mato: yes, but often they should not
|
[19:44] pieterh
|
part of education is to help them not do that
|
[19:44] sustrik
|
ok, let me put the problem in another way
|
[19:44] pieterh
|
anyhow, if you make connect/bind direction relevant to socket policy you can do this
|
[19:44] pieterh
|
but that is a fairly big change semantically
|
[19:44] sustrik
|
the people often complain about not being able to write regular "clients"
|
[19:44] sustrik
|
i.e. dumb apps that connect to the server
|
[19:45] mato
|
yeah, i'm wondering about that [connect/bind semantic change]
|
[19:45] pieterh
|
mato: i don't like it from a complexity PoV
|
[19:45] sustrik
|
the problem is that messages from the client may be lost
|
[19:45] pieterh
|
sustrik: PUB as client is weird
|
[19:45] mato
|
|
[19:45] sustrik
|
it is
|
[19:45] sustrik
|
|
[19:45] pieterh
|
it's not a use case I care to cuddle
|
[19:45] sustrik
|
|
[19:46] sustrik
|
ok, i'll give it a thought
|
[19:46] mato
|
sustrik: i'm thinking... the fact is, a lot of the time it's not obvious in zmq apps which end is a "server" and which is a "client"
|
[19:46] sustrik
|
true
|
[19:46] pieterh
|
mato: yes, it is
|
[19:46] pieterh
|
the static nodes are servers
|
[19:46] sustrik
|
you can be both
|
[19:46] pieterh
|
the dynamic nodes are clients
|
[19:46] mato
|
bah
|
[19:46] pieterh
|
yes, this is clear
|
[19:46] pieterh
|
it's how every IETF protocol uses the semantics
|
[19:46] mato
|
i can't discuss this with people talking all over one another
|
[19:47] sustrik
|
ok, go on
|
[19:47] pieterh
|
:-)
|
[19:48] mato
|
so, i think that making such policy decisions on whether or not the connection came from bind or connect is a bad idea
|
[19:48] mato
|
now, sustrik, you have the problem that a "server will never exit"
|
[19:48] mato
|
i can counter that with what you told me about the client never exiting -- just use SO_LINGER
|
[19:49] mato
|
and I can answer my own problem by saying that we need a setcontextopt() which lets you set a timeout for zmq_term()
|
[19:49] mato
|
which can also be set to 0
|
[19:49] mato
|
which solves all the cases
|
[19:49] mato
|
no?
|
[19:49] pieterh
|
mato: yes
|
[19:49] pieterh
|
plus per socket-type policy similar to how we handle HWM
|
[19:50] pieterh
|
it's clear and systematic
|
[19:50] mato
|
well, that only affects PUB really... i'm undecided on that but leaning towards doing what HWM does for consistency
|
[19:50] mato
|
although
|
[19:50] mato
|
let me find the flow control docs... 2mins
|
[19:50] sustrik
|
ok, say we agree on the above
|
[19:51] sustrik
|
now let's enter level 2
|
[19:51] sustrik
|
there's no such thing as "socket queue"
|
[19:51] pieterh
|
mato: yes, it should be like HWM since it's about delivery guarantees
|
[19:51] sustrik
|
what we have is a queue per _connection_
|
[19:52] pieterh
|
right
|
[19:52] sustrik
|
so the messages sent are destined for a specific connection
|
[19:52] sustrik
|
now, the connection can be either reconnectable, or transient
|
[19:52] pieterh
|
how about an XREQ socket with zero connections?
|
[19:52] sustrik
|
it blocks
|
[19:52] pieterh
|
on send()?
|
[19:52] sustrik
|
blocks
|
[19:52] pieterh
|
ok
|
[19:52] mato
|
one person at a time please, i can't follow
|
[19:53] mato
|
else i'll go talk to sustrik out of band and face to face :-)
|
[19:53] pieterh
|
mato: this is... irc, people chat.
|
[19:53] pieterh
|
sustrik: please, don't let mato interrupt you
|
[19:53] sustrik
|
:)
|
[19:53] sustrik
|
eee...
|
[19:54] sustrik
|
i've lost my train of thought
|
[19:54] mato
|
see :)
|
[19:54] pieterh
|
no queues per socket...
|
[19:54] sustrik
|
aha
|
[19:54] sustrik
|
if the connection is transient and it disconnects
|
[19:54] sustrik
|
it will never get reconnected
|
[19:55] sustrik
|
in such case there's nothing to do but to drop the messages
|
[19:55] sustrik
|
even if socket type is REQ or whatever
|
[19:55] pieterh
|
sure
|
[19:55] sustrik
|
which gives us inconsistent semantics:
|
[19:55] pieterh
|
it's an unrecoverable network error
|
[19:55] sustrik
|
1. transient connections - drop
|
[19:55] sustrik
|
2. permanent connection - wait till send
|
[19:55] pieterh
|
what defines a transient connection?
|
[19:55] sustrik
|
sent
|
[19:56] mato
|
sustrik: just a minute, ignoring close(), send() handles transient connections reconnecting, does it not?
|
[19:56] sustrik
|
basically it's a client's connection to the server
|
[19:56] pieterh
|
you mean as compared to incoming connection on bound endpoint?
|
[19:57] pieterh
|
why will a transient connection never get reconnected?
|
[19:57] pieterh
|
there is too much data missing from your explanation, sorry
|
[19:57] mato
|
sustrik: by "transient connection", you mean an inbound connection?
|
[19:58] mato
|
sustrik: if so, by definition, that inbound (transient) connection will be someone else's outbound (permanent) connection
|
[19:58] sustrik
|
when the client has no identity
|
[19:58] sustrik
|
if client dies
|
[19:58] sustrik
|
it'll never get reconnected
|
[19:58] sustrik
|
there are 3 options:
|
[19:58] sustrik
|
1. zmq_connect
|
[19:58] sustrik
|
here we are proactively creating a connection, we can reconnect if the connection breaks
|
[19:58] sustrik
|
2. zmq_bind
|
[19:58] sustrik
|
here we are waiting for connections; if a connection breaks there's no way of telling whether a new connection is actually a reincarnation of the old one
|
[19:58] sustrik
|
3. zmq_bind + connecter has a unique identity
|
[19:59] sustrik
|
ok
|
[19:59] sustrik
|
think of a server
|
[19:59] pieterh
|
hmm... so when an anonymous connection breaks, it's dead
|
[19:59] pieterh
|
but if the application did not specify an identity
|
[19:59] pieterh
|
this should not change ITS semantic view of things
|
[20:00] sustrik
|
connection is accepted
|
[20:00] sustrik
|
pieterh: exactly
|
[20:00] sustrik
|
that's case 2
|
[20:00] sustrik
|
case 3 is when client has a strong identity
|
[20:00] sustrik
|
in that case it can announce its identity on reconnect
|
[20:00] sustrik
|
and server can attach it to the existing session
|
[20:00] sustrik
|
it does not
|
[20:00] pieterh
|
sustrik: would you perhaps write this up on the wiki somewhere?
|
[20:00] pieterh
|
i'm tired and need to leave soon but would like to understand this fully
|
[20:01] mato
|
sustrik: but all connections have an identity. weak identity still means we can see when that client comes back
|
[20:01] mato
|
sustrik: it won't come back if it's restarted, but then we never know if a client *will* come back
|
[20:01] mato
|
sustrik: strong identity or not
|
[20:02] sustrik
|
the only semantic difference can be observed on client restart
|
[20:02] sustrik
|
i am not sure how it works/should work myself
|
[20:02] sustrik
|
identity stuff is not trivial
|
[20:02] sustrik
|
it's actually about how 0mq handles application restart
|
[20:02] sustrik
|
we have something that kind of works in some scenarios
|
[20:02] sustrik
|
but is in no way theoretically sound
|
[20:02] sustrik
|
the whole thing intersects with intended REQ/REP resend functionality
|
[20:04] mato
|
sustrik: well, we can do two things...
|
[20:04] mato
|
sustrik: i think it would be helpful if all the cases of the new semantics are formally written down
|
[20:04] pieterh
|
yes
|
[20:04] pieterh
|
this should be written down first
|
[20:04] mato
|
sustrik: then we can discuss that in person which may help, and present any results on the mailing list
|
[20:05] pieterh
|
IMO the intended REQ/REP resend can be built on top of this
|
[20:05] pieterh
|
or perhaps it will be the same thing
|
[20:06] mato
|
sustrik: we can discuss this tomorrow if you like? after all, that's why i'm here over the weekend
|
[20:06] mato
|
sustrik: and we can write the proposal together if you want
|
[20:06] pieterh
|
mato: i have the git workflows from wikidot, will post to the wiki tomorrow in edited form
|
[20:06] mato
|
pieterh: ok
|
[20:07] mato
|
sustrik: ack?
|
[20:07] sustrik
|
not relevant, because if it comes back it's treated as a different client
|
[20:07] sustrik
|
thus messages queued for its first reincarnation
|
[20:07] sustrik
|
won't be delivered to the next one
|
[20:07] sustrik
|
so, from the server's point of view, disconnection of an anonymous session = permanently dead
|
[20:07] sustrik
|
in other words, there is no permanent session created on server side for anonymous connections
|
[20:07] sustrik
|
that kind of thing would very quickly exhaust all the memory
|
[20:07] sustrik
|
anyway, thanks for the "algorithm based on socket type" suggestion
|
[20:07] sustrik
|
i'm going to think about it
|
[20:07] mato
|
ah, sustrik is lagged
|
[20:07] mato
|
this is why the conversation sucks
|
[20:07] pieterh
|
lol
|
[20:07] mato
|
i'll call him :)
|
[20:08] sustrik
|
here i am again
|
[20:08] sustrik
|
my connection blocks regularly
|
[20:08] sustrik
|
hello world!
|
[20:08] sustrik
|
hoo!
|
[20:08] mato
|
:-)
|
[20:08] sustrik
|
is anyone there???
|
[20:08] pieterh
|
sustrik is on a transient connection
|
[20:08] mato
|
sustrik: yes
|
[20:08] pieterh
|
sustrik: no
|
[20:08] sustrik
|
:)
|
[20:08] mato
|
bah
|
[20:09] pieterh
|
sustrik: IMO anonymous connection that is dead is the same as no connection at all
|
[20:09] pieterh
|
ex-parrots do not count
|
[20:10] pieterh
|
you might want to requeue message(s) that failed
|
[20:10] pieterh
|
or not
|
[20:10] pieterh
|
its kind of an edge case and does not need to be solved properly now
|
[20:16] sustrik
|
it's the case that's used all the time
|
[20:17] sustrik
|
people normally don't use identities to create permanent connections
|
[20:19] sustrik
|
cyl
|
[20:20] mato
|
sustrik: i'll think about it/sleep on it and will try to enumerate the cases tomorrow
|
[20:21] mato
|
sustrik: cyl
|