[Time] Name | Message |
[05:54] CIA-76
|
libzmq: Martin Sustrik master * rc7fb5c5 / (src/ctx.cpp src/select.cpp src/select.hpp src/windows.hpp): Reverting previous commit that broke MSVC2010 build ...
|
[05:54] CIA-76
|
libzmq: Martin Sustrik master * r970798f / builds/msvc/libzmq/libzmq.vcproj : mtrie.cpp added to MSVC build ...
|
[08:12] mikko
|
sustrik: the icc patch
|
[08:12] mikko
|
sustrik: that didn't break the gcc build?
|
[08:40] sustrik
|
mikko: no, i've tested it with gcc
|
[08:41] sustrik
|
and the compilation went ok
|
[08:41] sustrik
|
did it break it for you?
|
[08:50] mikko
|
no, i'm just wondering why gcc is so relaxed about these things
|
[08:54] jsimmons
|
built by bearded hippies :D
|
[08:55] sustrik
|
actually, gcc gets it right, icc does not
|
[08:56] sustrik
|
zmq_assert (false); means that the execution never gets past that point
|
[08:56] mikko
|
i just viewed the patch, it looks like a missing return
|
[08:56] sustrik
|
gcc detects the fact, icc does not
|
[08:56] mikko
|
sun studio works as well?
|
[08:56] mikko
|
i think at the moment sun studio is my favourite compiler
|
[08:56] sustrik
|
let me see...
|
[08:56] mikko
|
it seems to be the strictest of the three main ones
|
[08:57] sustrik
|
sun studio looks ok
|
[09:01] pieterh
|
hi
|
[09:02] pieterh
|
mikko: sustrik: did we figure out why mingw32 didn't complain with Steve's patch?
|
[09:03] mikko
|
pieterh: i don't know what the patch fixes really
|
[09:03] sustrik
|
no idea
|
[09:03] sustrik
|
some kind of windows black magic
|
[09:03] mikko
|
i've been slightly super-busy lately and haven't been able to pay much attention
|
[09:03] sustrik
|
order of includes can break things
|
[09:04] pieterh
|
I don't like blindly applying / reverting patches
|
[09:04] sustrik
|
i warned you about this one
|
[09:04] pieterh
|
let me boot up that old virtual XP
|
[09:30] pieterh
|
sustrik: ok, I've found the cause and the fix of the win32 build issue
|
[09:30] sustrik
|
yes?
|
[09:31] pieterh
|
I've no idea why it worked before, it doesn't seem directly caused by Steve's patch
|
[09:31] pieterh
|
basically if you include winsock2.h and then later winsock.h, it'll do weird stuff
|
[09:31] pieterh
|
tries to redefine (some) of the same constants
|
[09:32] pieterh
|
it's the mswsock.h include, I think (will check)
|
[09:32] pieterh
|
the fix is, in windows.hpp, #define _WINSOCKAPI_ // stops windows.h including winsock.h
|
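A sketch of what that workaround would look like near the top of windows.hpp. It is Windows-only, so this is a non-portable fragment rather than a runnable example, and the conversation below ends up reverting the patch instead of keeping it:

```c
/* windows.hpp (sketch of the workaround discussed above) */

#define _WINSOCKAPI_    /* stops windows.h from including winsock.h  */
#include <windows.h>    /* safe now: won't drag in the old winsock.h */
#include <winsock2.h>   /* so the winsock2 constants are defined     */
                        /* exactly once, with no redefinitions       */
```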
[09:34] pieterh
|
well, it's not that header... in fact I can't find which header is responsible :-/
|
[09:34] pieterh
|
but the fix does work, have tested that
|
[09:35] sustrik
|
you mean steve's fix?
|
[09:36] sustrik
|
it doesn't work for me
|
[09:36] sustrik
|
msvc2010 build failed
|
[09:36] sustrik
|
when i reverted the patch it succeeded
|
[09:36] pieterh
|
yeah, agreed
|
[09:37] pieterh
|
there's something weird going on, when I define that macro in windows.hpp, at the start, it succeeds
|
[09:37] pieterh
|
otherwise, it fails
|
[09:37] pieterh
|
yet the very first include file that is supposed to be called starts by defining that macro
|
[09:37] pieterh
|
I think there's something broken in the Windows version number detection
|
[09:38] sustrik
|
do you have any idea what problem steve was fixing?
|
[09:38] sustrik
|
jenkins mingw builds seem to work ok
|
[09:38] pieterh
|
it looks like he was fixing some errors, e.g. using "" to include system header files
|
[09:39] pieterh
|
it's definitely a problem with version detection, possibly provoked by that LEAN_AND_MEAN change
|
[09:40] sustrik
|
the description of the patch is not very helpful: "Fix scope on Windows includes. Fix windows.h included before winsock2.h. Remove definition of _WINSOCKAPI_."
|
[09:41] pieterh
|
sustrik: heh, " Remove definition of _WINSOCKAPI_."
|
[09:41] pieterh
|
He could as well have said, "Cause major breakage at compile time"
|
[09:42] pieterh
|
ok, since the fix is apparently to revert the patch, I'm going with that
|
[09:42] sustrik
|
ok
|
[09:43] pieterh
|
the problem here is if you somehow include winsock.h and then include windows.h (which the code does, after his change), you get symbols defined again by winsock2.h
|
[09:43] pieterh
|
or even if you include windows.h, then include winsock2.h, the same
|
[09:44] sustrik
|
does the unpatched version fail for you?
|
[09:44] pieterh
|
let me revert, and check that
|
[09:47] pieterh
|
sustrik: it builds fine, after reverting the patch, though I see what Steve was aiming at
|
[11:38] pieterh
|
sustrik: is there an issue for commit 864c18 (https://github.com/zeromq/libzmq/commit/864c18f797203c06e66e739166b246cfb3d47ce9)?
|
[11:39] pieterh
|
no changes to stable without issues and test cases
|
[11:39] pieterh
|
http://www.zeromq.org/docs:distributions#toc3
|
[11:40] pieterh
|
we managed to release 2.1.5 and 2.1.6 with breakage, it's not worth cutting corners
|
[11:40] pieterh
|
I'm happy to make a test case if there's an issue
|
[11:42] sustrik
|
drop it then
|
[11:42] pieterh
|
we shouldn't lower our standards for contributions
|
[11:42] pieterh
|
we agreed to have issues for changes
|
[11:42] sustrik
|
what it does is that it returns ENOMEM instead of asserting in zmq_msg_init*
|
[11:43] pieterh
|
for out of memory conditions?
|
[11:43] sustrik
|
yes
|
[11:43] pieterh
|
well, if there's no test case, there's no proof the fix actually works
|
[11:43] sustrik
|
sure
|
[11:43] sustrik
|
ditch it
|
[11:43] pieterh
|
ditched
|
[11:44] pieterh
|
I'd recommend being more strict about contributed patches in the future
|
[11:44] pieterh
|
though at the same time it's good to keep barriers low
|
[11:44] sustrik
|
:)
|
[11:45] pieterh
|
i think contributions to libzmq can be quite difficult, there are lower barriers elsewhere
|
[11:45] pieterh
|
plus there is an educational aspect, making test cases is just good practice
|
[11:46] pieterh
|
sustrik: I have an unrelated question / discussion
|
[11:46] sustrik
|
yes?
|
[11:46] pieterh
|
so I've gotten UDP working quite nicely
|
[11:46] sustrik
|
kudos!
|
[11:46] pieterh
|
it's actually a nice fit for 0MQ
|
[11:46] pieterh
|
I have a small wire protocol on top of UDP
|
[11:47] pieterh
|
there is a connection semantic, heartbeating, etc.
|
[11:47] pieterh
|
I'm going to call the protocol NOM-1
|
[11:47] pieterh
|
nom-oriented messaging protocol 1
|
[11:48] pieterh
|
so... I'd like to experiment with selectors
|
[11:48] pieterh
|
that is, socket validation at connection time, and receivers specifying selectors
|
[11:48] pieterh
|
so pull, dealer, and sub would all work with prefix filters
|
[11:49] pieterh
|
done at the sender side
|
[11:49] pieterh
|
the reason here is that I basically have one engine for all socket types
|
[11:50] pieterh
|
so if I do filtering for sub sockets, it's the same code whether it's pub-side or sub-side
|
[11:50] pieterh
|
and I can do it for free on all socket types where it makes sense
|
[11:50] pieterh
|
any obvious scalability problems that you can see with this?
|
[11:52] sustrik
|
why do you want to do filtering in the transport?
|
[11:52] sustrik
|
seems to be a wrong place for it
|
[11:52] pieterh
|
good question
|
[11:53] pieterh
|
it turns out the transport cannot simply be a transport
|
[11:53] pieterh
|
in fact the driver has to reimplement socket semantics
|
[11:53] pieterh
|
that is quite OK
|
[11:55] sustrik
|
what's the reason for that?
|
[11:55] pieterh
|
it's more to do with the VTX approach than UDP
|
[11:55] pieterh
|
I may be able to create a generic socket emulation layer above the transport
|
[11:55] pieterh
|
basically because VTX talks to applications over inproc
|
[11:56] pieterh
|
(you'll see the same issue with any bridge, in fact)
|
[11:56] pieterh
|
the semantics of app-to-bridge are pair
|
[11:57] sustrik
|
a bridge is basically a device imo
|
[11:57] pieterh
|
that is also what I'd hoped, but it doesn't work like that
|
[11:57] sustrik
|
why so?
|
[11:58] eintr
|
a device always breaks delivery guarantees, correct? it's not "transparent" as far as delivery goes, right?
|
[11:58] pieterh
|
well, if your bridge wants to support all socket types, it has to have N device models built in
|
[11:58] pieterh
|
where each device model in fact emulates a specific socket type
|
[11:58] sustrik
|
yes
|
[11:59] pieterh
|
which is where I ended up, with one engine emulating 10 socket types rather than 10 simpler engines
|
[11:59] pieterh
|
especially if all 10 simpler engines have to speak UDP
|
[11:59] pieterh
|
you can see that each bridge/device would be custom made for the transport
|
[11:59] sustrik
|
can't you simply specify the pattern when creating the bridge?
|
[12:00] pieterh
|
:-) of course
|
[12:00] pieterh
|
you say, "i want a push socket", and the bridge emulates that
|
[12:00] sustrik
|
right
|
[12:00] pieterh
|
but if you use the simplistic "bridge is device" approach you have to write 10 bridges
|
[12:00] pieterh
|
and then start the right one
|
[12:00] sustrik
|
not really
|
[12:01] sustrik
|
the device code is generic
|
[12:01] pieterh
|
only in 0MQ because you can use 0MQ sockets at both ends :-)
|
[12:01] pieterh
|
that is cheating
|
[12:01] pieterh
|
and doesn't work when one end is UDP or something else
|
[12:01] pieterh
|
you need to e.g. load-balance yourself
|
[12:01] sustrik
|
i see
|
[12:01] pieterh
|
manually, explicitly
|
[12:02] sustrik
|
but different patterns have different protocols
|
[12:02] sustrik
|
req/rep has backtrace stack
|
[12:02] pieterh
|
indeed
|
[12:02] sustrik
|
pub/sub has topics etc.
|
[12:02] pieterh
|
indeed
|
[12:02] pieterh
|
I will support all these over UDP, of course
|
[12:02] sustrik
|
one off-topic remark
|
[12:03] pieterh
|
sure
|
[12:03] sustrik
|
if you want pub-side filtering with UDP you have to build reliability into the transport
|
[12:03] pieterh
|
yes, I know
|
[12:03] sustrik
|
ok
|
[12:03] pieterh
|
in any case request-reply won't work otherwise either
|
[12:03] pieterh
|
single lost message = blocked client
|
[12:04] pieterh
|
it actually works really nicely
|
[12:04] pieterh
|
since I can add reliability precisely in those cases I need it
|
[12:04] sustrik
|
any point in using UDP then?
|
[12:04] pieterh
|
oh, yes
|
[12:04] sustrik
|
looks like duplicating TCP functionality
|
[12:04] pieterh
|
e.g. no reliability on pubsub
|
[12:04] pieterh
|
broadcast functionality, i.e. connect to *:port
|
[12:04] eintr
|
no timeouts
|
[12:05] pieterh
|
no reliability on push/pull or dealer
|
[12:05] pieterh
|
plus the goal isn't really UDP, it's about learning how to add user-space transports
|
[12:05] pieterh
|
next up, TCP
|
[12:06] sustrik
|
ok, i see
|
[12:06] pieterh
|
is it worth exploring selectors for pull sockets?
|
[12:06] sustrik
|
nope imo
|
[12:06] sustrik
|
the parallelised pipeline is for load distribution
|
[12:06] sustrik
|
filtering doesn't make sense there
|
[12:07] pieterh
|
it does if you do sender-side filtering
|
[12:07] sustrik
|
what would that be good for?
|
[12:07] pieterh
|
well, you have a task queue and then workers can join and specify the category of tasks they're prepared to handle
|
[12:08] sustrik
|
the workers are meant to be interchangeable
|
[12:08] pieterh
|
sure
|
[12:08] pieterh
|
you can have interchangeable workers
|
[12:08] pieterh
|
many for any category of tasks, any mix
|
[12:08] sustrik
|
then you should have multiple pipelines
|
[12:08] pieterh
|
ah, queuing issues
|
[12:09] sustrik
|
and administration
|
[12:09] pieterh
|
yes, that's one option but it means you split work over queues
|
[12:09] sustrik
|
imagine one worker fails
|
[12:09] sustrik
|
are the other workers able to take over the load?
|
[12:09] sustrik
|
etc.
|
[12:09] pieterh
|
yes
|
[12:09] pieterh
|
because we know when peers disappear
|
[12:10] sustrik
|
yes
|
[12:10] pieterh
|
but that's just the same as normal pipeline
|
[12:10] sustrik
|
however, with filtering it's not clear whether there's a worker that can process a particular request
|
[12:10] sustrik
|
with multiple pipelines it's obvious
|
[12:10] pieterh
|
true, so a task may remain stuck in a queue
|
[12:11] pieterh
|
well, we solved this issue in AMQP, if a task can't be delivered because no-one's willing to handle it, it gets dropped
|
[12:12] pieterh
|
but ok
|
[12:12] pieterh
|
how did you implement xsub, then? you send subscribe messages from sub to pub?
|
[12:12] sustrik
|
yes
|
[12:13] pieterh
|
ok
|
[12:13] sustrik
|
here's a sketch of the arch whitepaper:
|
[12:13] sustrik
|
http://www.250bpm.com/pubsub
|
[12:15] pieterh
|
hmm, any reason for not simply replacing sub/pub with xsub/xpub semantics?
|
[12:15] pieterh
|
we seem to have three different ways to talk to 'special' sockets
|
[12:15] pieterh
|
setsockopt, send special frame, send special message
|
[12:16] sustrik
|
there are two layers in the stack
|
[12:16] sustrik
|
X- layer and non-X layer
|
[12:16] pieterh
|
:-)
|
[12:16] sustrik
|
in X-layer you compose the messages by hand
|
[12:16] pieterh
|
you always tell me about the architecture of your implementation
|
[12:17] sustrik
|
ok, forget it
|
[12:17] pieterh
|
whereas I'm always arguing about APIs :-)
|
[12:17] sustrik
|
as an end user you should use only non-X socket types
|
[12:17] sustrik
|
which are consistent in using socket options
|
[12:18] pieterh
|
yes, you're right
|
[12:18] sustrik
|
note that you can plug into subscription forwarding mechanism
|
[12:19] sustrik
|
that may be of use in vtx
|
[12:19] pieterh
|
we could one day see if there's an alternative implementation for ROUTER
|
[12:19] sustrik
|
ah, that reminds me of issue 190
|
[12:19] sustrik
|
i have to fix that sooner or later
|
[12:19] sustrik
|
at that point we have to separate req/rep from the router
|
[12:20] sustrik
|
i can do that
|
[12:20] sustrik
|
however, someone has to take care of new router/dealer socket types
|
[12:20] sustrik
|
would you like to become a maintainer?
|
[12:21] pieterh
|
well, I'm not competent to modify the code, but willing to learn
|
[12:21] pieterh
|
however, you mean 'contributor', right?
|
[12:21] pieterh
|
:)
|
[12:21] pieterh
|
how will you fix issue 190?
|
[12:22] sustrik
|
straightforwardly: drop messages on disconnect
|
[12:22] pieterh
|
drop messages where, on disconnect what?
|
[12:22] sustrik
|
client
|
[12:22] sustrik
|
requester
|
[12:22] pieterh
|
that doesn't seem to address the issue...
|
[12:22] pieterh
|
1000 requests waiting at the REP side, no?
|
[12:23] sustrik
|
if requester is dead, there's no one to send replies to => drop any pending requests & replies
|
[12:23] pieterh
|
so when a REQ dies, you go through the queue and remove any requests it sent?
|
[12:23] sustrik
|
yes
|
[12:24] pieterh
|
how does that end up separating req/rep from router?
|
[12:24] sustrik
|
router won't work anymore then
|
[12:24] sustrik
|
i assume "send(); close();" is a valid sequence for router
|
[12:24] pieterh
|
ah, right
|
[12:25] sustrik
|
so, i'll separate the two
|
[12:25] pieterh
|
i don't see how you can solve issue 190 over more than 1 hop
|
[12:25] sustrik
|
i can't
|
[12:25] sustrik
|
it's just an optimisation
|
[12:25] sustrik
|
not perfect solution
|
[12:25] pieterh
|
imo it
|
[12:25] pieterh
|
it's a wrongly stated problem
|
[12:26] pieterh
|
"In my case I am attempting to do least recently used"
|
[12:26] pieterh
|
the problem is not the queuing at the rep socket end
|
[12:26] sustrik
|
yes
|
[12:26] pieterh
|
the problem is the semantics where clients disappear as part of the normal scenario
|
[12:27] sustrik
|
exactly
|
[12:27] pieterh
|
if this problem is even worth solving, it means it happens often
|
[12:27] pieterh
|
once a week, it's not an issue
|
[12:27] sustrik
|
yes
|
[12:27] pieterh
|
so the problem is not solvable at the socket level at all
|
[12:27] sustrik
|
?
|
[12:27] pieterh
|
it requires some protocol for disconnected clients and workers
|
[12:27] pieterh
|
i.e. I can make a request and come back later to fetch the response
|
[12:28] pieterh
|
if clients are connected, they're connected
|
[12:28] sustrik
|
you can do that using identities
|
[12:28] pieterh
|
explicit identities?
|
[12:28] sustrik
|
yeah
|
[12:28] pieterh
|
that's not a good answer
|
[12:28] sustrik
|
ugly
|
[12:28] pieterh
|
nope
|
[12:29] pieterh
|
it's hacking the socket layer for something it shouldn't be used for
|
[12:29] pieterh
|
and I don't see that issue 190 is a reason to break ROUTER functionality
|
[12:29] sustrik
|
router is a different pattern anyway
|
[12:29] pieterh
|
perhaps
|
[12:30] pieterh
|
I mean, it should be implemented, in one way or another
|
[12:30] sustrik
|
so my proposal is to split the two
|
[12:30] sustrik
|
what's missing though
|
[12:30] pieterh
|
being able to address anonymous peers by automatic identity is a valid pattern
|
[12:30] sustrik
|
is documentation
|
[12:30] pieterh
|
it so happens xrep does that perfectly
|
[12:30] pieterh
|
clumsy, though
|
[12:30] pieterh
|
however the use case in 190 is bogus IMO
|
[12:31] sustrik
|
forget about the use case
|
[12:31] sustrik
|
router will break sooner or later anyway
|
[12:31] pieterh
|
why?
|
[12:31] sustrik
|
imagine adding discarding duplicates to req/rep
|
[12:31] sustrik
|
for example
|
[12:32] sustrik
|
basically any further development of req/rep pattern breaks router
|
[12:32] pieterh
|
look, the basic addressing model for req-xrep-xreq-rep will break sooner or later
|
[12:32] sustrik
|
which means req/rep is stuck atm
|
[12:32] pieterh
|
we know that
|
[12:33] pieterh
|
once you accept that router (and dealer) are valid user space patterns
|
[12:33] pieterh
|
we can design a better router API
|
[12:33] sustrik
|
sure, that's what i am proposing
|
[12:33] pieterh
|
makes sense
|
[12:33] sustrik
|
i'll separate req/rep from router
|
[12:33] sustrik
|
you'll become maintainer of router
|
[12:33] pieterh
|
this won't happen in 2.1.x but could happen in 2.2
|
[12:33] sustrik
|
i'll take care of req/rep
|
[12:33] pieterh
|
well, 'maintainer' means, accepting patches and running test cases and enforcing process
|
[12:34] sustrik
|
you can leave the code as is if that's what you want
|
[12:34] sustrik
|
the point now is there's no documentation for router pattern
|
[12:34] pieterh
|
?
|
[12:34] sustrik
|
so even if i split the two
|
[12:34] pieterh
|
there's like 50 pages of that
|
[12:34] pieterh
|
please, one day when you have nothing else to do, read the Guide
|
[12:34] pieterh
|
seriously
|
[12:35] sustrik
|
i meant in man pages
|
[12:35] sustrik
|
so zmq_socket(3)
|
[12:35] sustrik
|
see
|
[12:35] sustrik
|
there should be at least a paragraph about router pattern
|
[12:35] pieterh
|
there is a section describing ZMQ_ROUTER, yes
|
[12:35] sustrik
|
and a paragraph for each associated socket type
|
[12:36] sustrik
|
in 2-1?
|
[12:36] pieterh
|
yes
|
[12:36] sustrik
|
let me see
|
[12:36] pieterh
|
ROUTER only makes sense within REQ-REP pattern
|
[12:36] pieterh
|
it's not a separate pattern
|
[12:36] sustrik
|
it is
|
[12:36] sustrik
|
ROUTER != XREP
|
[12:36] pieterh
|
I seem to remember that patterns are not interconnectable
|
[12:36] sustrik
|
exactly
|
[12:37] pieterh
|
yet ROUTER must be able to talk to REP, REQ, DEALER, and ROUTER
|
[12:37] pieterh
|
otherwise it's kind of... useless :)
|
[12:37] pieterh
|
hey, I have this great socket type but it can't talk to anything else
|
[12:37] pieterh
|
-1
|
[12:37] sustrik
|
ok, let it be for now
|
[12:37] sustrik
|
we'll sort it out once the router breaks
|
[12:37] pieterh
|
read the Guide, martin, get some idea of actual use cases for this stuff
|
[12:37] pieterh
|
it's good to have all the theory
|
[12:38] pieterh
|
but in the end it's what people do with it that really defines reality
|
[12:38] pieterh
|
if you break router arbitrarily, you'll annoy a lot of people
|
[12:39] sustrik
|
that's why router should be a separate pattern
|
[12:39] pieterh
|
the argument "you should not have used it, I warned you" won't work
|
[12:39] sustrik
|
changing req/rep won't break it
|
[12:39] pieterh
|
that's only plausible if you let patterns interconnect
|
[12:39] sustrik
|
req/rep is meant for stateless services
|
[12:39] pieterh
|
router should be part of the request-reply pattern, but be much more explicit
|
[12:39] sustrik
|
if you want something different, use router
|
[12:39] sustrik
|
easy
|
[12:40] pieterh
|
connecting to what?
|
[12:40] sustrik
|
another router?
|
[12:40] sustrik
|
dealer?
|
[12:40] pieterh
|
no, no, and like my son says, no
|
[12:40] pieterh
|
:)
|
[12:40] sustrik
|
you can add as many socket types to the router pattern as you want
|
[12:40] pieterh
|
routing is from req, to rep / dealer
|
[12:41] pieterh
|
this is splitting hairs
|
[12:41] sustrik
|
routing allows you to address a particular service instance
|
[12:41] pieterh
|
making the router semantics a clearly defined package is good
|
[12:41] sustrik
|
which breaks the model of interchangeable stateless services
|
[12:41] pieterh
|
sustrik: you're not accurate, really sorry
|
[12:41] sustrik
|
shrug
|
[12:42] pieterh
|
router is definitely used for interchangeable stateless services
|
[12:42] pieterh
|
look at the lruqueue for example
|
[12:42] pieterh
|
I can't discuss this if you refuse to read the dozens of worked examples I made, which are widely used
|
[12:42] sustrik
|
what if the service instance you send message to is dead?
|
[12:43] pieterh
|
precisely
|
[12:43] sustrik
|
0mq has to pass it to some other instance of the service
|
[12:43] sustrik
|
which means the address is disregarded anyway
|
[12:43] pieterh
|
ok, as you like
|
[12:43] pieterh
|
i'm not going to insist on this
|
[12:44] pieterh
|
you've said for a year or so that XREP was not meant for end users
|
[12:44] sustrik
|
ok, cyl
|
[12:44] pieterh
|
we went ahead and did it
|
[12:44] pieterh
|
I'm sure you'll break it and explain why
|
[12:45] pieterh
|
and you're pretty stubborn about not learning *why* people use it, and *what* they do with it
|
[12:46] sustrik
|
they use it to address state in the network
|
[12:46] sustrik
|
which is ok, but not a stateless req/rep model
|
[12:47] sustrik
|
simply a different thing
|
[12:47] pieterh
|
hey, we use router to kill puppies, which is obviously wrong
|
[12:47] pieterh
|
the actual use case is to create application-level routing to peers
|
[12:47] pieterh
|
which is not the same as state
|
[12:48] sustrik
|
yes
|
[12:48] pieterh
|
req-rep already has state
|
[12:48] pieterh
|
a reply address is state
|
[12:48] sustrik
|
yes, but it's nicely encapsulated in the message
|
[12:48] pieterh
|
meaningless distinction
|
[12:48] sustrik
|
the point is not to have state at the nodes
|
[12:48] pieterh
|
obviously 190 is about state
|
[12:48] pieterh
|
there is no state at the nodes
|
[12:48] sustrik
|
erlang-style approach
|
[12:49] sustrik
|
you mean the queues?
|
[12:49] pieterh
|
what do you mean by 'nodes'?
|
[12:49] sustrik
|
applications
|
[12:49] sustrik
|
state = business logic state
|
[12:49] pieterh
|
I don't think a single of the router use cases puts state in the applications
|
[12:49] pieterh
|
you'd know that if you read the guide
|
[12:50] pieterh
|
in fact, categorically, router is used to construct devices
|
[12:50] sustrik
|
XREP
|
[12:51] pieterh
|
you got me
|
[12:51] sustrik
|
:)
|
[12:51] pieterh
|
should have said that half an hour ago
|
[12:51] sustrik
|
well, which pattern should i have a look at?
|
[12:51] pieterh
|
XREP/ROUTER is not used in applications
|
[12:51] sustrik
|
is that the state of affairs with the users?
|
[12:51] pieterh
|
it makes no sense and that was never covered in the guide
|
[12:52] pieterh
|
if they do stuff that's not explained in the Guide, I'm not responsible :)
|
[12:52] pieterh
|
I'm sure people try *everything*
|
[12:52] sustrik
|
well, but that's the point
|
[12:52] pieterh
|
however our _users_ are principally not application developers
|
[12:52] sustrik
|
if i break that uncovered in the guide use case
|
[12:52] pieterh
|
they are infrastructure builders
|
[12:52] pieterh
|
they build brokers
|
[12:52] sustrik
|
everybody will be pissed off
|
[12:53] pieterh
|
for good reasons
|
[12:53] sustrik
|
that's why i want to split the existing functionality into a separate pattern
|
[12:53] pieterh
|
but I'm confident when you actually read the guide you'll be like "ah, I get it!"
|
[12:54] pieterh
|
there's only 280 pages or so to get through
|
[12:54] pieterh
|
I've made a nice PDF for you to download
|
[12:54] sustrik
|
that's quite a lot. which part should i have a look at?
|
[12:54] pieterh
|
chapters 3 and 4, I guess
|
[12:55] sustrik
|
any pattern that makes your point clear?
|
[12:55] pieterh
|
you need to see how router/xrep is really used, why I forced that rename
|
[12:55] pieterh
|
well, there are a few
|
[12:55] pieterh
|
lruqueue, all the reliable request-reply patterns
|
[12:55] sustrik
|
ok, let me have a look at lrqueue
|
[12:55] pieterh
|
lruqueue was the original abuse of XREP to solve a real issue
|
[12:56] pieterh
|
there was no better answer at the time (I'd have loved one)
|
[12:56] pieterh
|
tbh the whole business with special message frames is annoying
|
[12:56] pieterh
|
but it does work
|
[12:56] sustrik
|
hm, no "lrqueue" in the text
|
[12:56] pieterh
|
lruqueue
|
[12:56] sustrik
|
ah
|
[12:56] pieterh
|
least recently used queue broker
|
[12:57] pieterh
|
the only state it maintains is presence/absence/busy/available of workers
|
[12:58] pieterh
|
does not use explicit identities
|
[12:58] pieterh
|
ok, cyl, I'm going skating with the kids, it's a holiday here in Belgium
|
[12:59] sustrik
|
cya
|
[13:01] pieterh
|
I think lruqueue is the canonical example, if you can solve that better, we're winning
|
[13:01] pieterh
|
cyl, nice chatting
|
[13:04] sustrik
|
ok, read it
|
[13:04] sustrik
|
the goal is to tweak the scheduler
|
[13:04] sustrik
|
that can be done by socket options
|
[13:09] sustrik
|
i have a more powerful tool on my todo list -- a priority-based scheduler
|
[13:09] sustrik
|
requires some work though
|
[16:22] michelp
|
OMG backscroll
|
[16:24] michelp
|
FWIW we use the lruqueue pattern to great effect
|
[17:52] brianjarita
|
hey all ... I am trying to daemonize a python script that is a handler for mongrel2. What is the best way to do this?
|
[17:58] pieterh
|
brianjarita: I kind of think you're in the wrong irc channel
|
[17:58] pieterh
|
this is #zeromq
|
[17:59] pieterh
|
we're more about ... well... puppies, and stuff
|
[18:02] brianjarita
|
haha ... thanks i'll ask #mongrel2
|
[18:02] pieterh
|
brianjarita: or maybe #python
|
[20:56] iFire
|
wondering, anyone use infiniband with zeromq?
|
[20:59] pieterh
|
iFire: hi
|
[20:59] iFire
|
just wondering. I don't really plan on it
|
[20:59] pieterh
|
http://www.zeromq.org/results:ib-tests-v206
|
[21:00] iFire
|
that's megabits right?
|
[21:01] pieterh
|
megabytes
|
[21:01] pieterh
|
ah, sorry, no, megabits
|
[21:01] iFire
|
what type of infiniband
|
[21:02] pieterh
|
that page is all the info I've got, but if you google 'zeromq infiniband' you can find more material
|
[21:02] iFire
|
4x 16gigabit ddr is rated for 16 gigabits
|
[21:02] iFire
|
which is about 2000megabytes/s (calculator)
|
[21:03] iFire
|
9026 megabits to megabytes = 1128 megabytes
|
[21:03] iFire
|
but experimental is good :)
|
[21:04] pieterh
|
also this isn't using rdma or any such
|
[21:04] pieterh
|
I'd assume that message throughput will be limited by CPU, to 6-8M events a second per core
|
[21:05] pieterh
|
but you should be able to saturate any network at messages of 64KB and above
|
[21:06] pieterh
|
in those 2.0.6 tests, obviously the network is saturated at about 9Mb/sec
|
[21:06] pieterh
|
network, or driver
|
[21:06] iFire
|
yea 9000Megabits/sec
|
[21:06] pieterh
|
that's not a zeromq limitation, afaics
|
[21:08] iFire
|
Mellanox MT25204 is either 10Gigabits/s or 20 gigabits/s
|
[21:09] iFire
|
I'm thinking it's probably 10gigabits in that machine
|
[21:09] pieterh
|
that's what I'd conclude
|
[21:09] iFire
|
hmm let me see what 8gigabits is in megabits
|
[21:09] pieterh
|
do you have material to test on?
|
[21:09] iFire
|
no
|
[21:09] iFire
|
I was just wondering
|
[21:10] pieterh
|
you may get more answers on the list
|
[21:43] ssi
|
does it make sense to use high-water marks on PULL/PUSH sockets?
|
[21:44] ssi
|
I see stuff in the guide regarding HWM with pub/sub model, but nothing regarding push/pull
|
[21:50] michelp
|
ssi, it does make sense. Check out the man page for zmq_socket
|
[21:56] ssi
|
I see... so the HWM will basically only be on the PUSH side... if there's no one pulling messages from downstream, the PUSH socket will fill up, and on the HWM it'll block
|
[21:56] ssi
|
so I guess the call to send() itself blocks?
|
[21:57] ssi
|
I'm working on writing a monitoring system that'll let me watch the backlogs on all the sockets for each node
|
[22:04] ssi
|
so does a PULL socket only have inbound messages if recv is being called? Ie, no queueing at the PULL end ever happens?
|
[22:04] ssi
|
otherwise, wouldn't PULL also be able to fill up and need a HWM to signal the upstream node not to send anymore?
|
[22:08] ssi
|
oshi I think I broke activemq's brain
|
[22:09] ssi
|
I tested my new jmsreceiver to zmq bridge with the new zmq pipeline, and ran 10 million small text messages through activemq
|
[22:09] ssi
|
zmq side ate it up, but activemq gave up around 8.5M
|
[22:21] michelp
|
ssi, you should look into zmq_poll, i'm not 100% positive on this but it will tell you when it's ok to push without blocking
|
[22:21] michelp
|
if you register your socket with POLLOUT
|
[22:22] michelp
|
yeah "For 0MQ sockets, at least one message may be sent to the socket without blocking." that's for POLLOUT
|
[22:22] michelp
|
the man page is zmq_poll
|
[22:23] ssi
|
hrm
|
[22:23] ssi
|
I'm using poll for receives
|
[22:23] ssi
|
here's the issue I'm facing:
|
[22:24] ssi
|
I have a JmsReceiver component which pulls messages off a JMS queue, converts them to my internal message format, and puts them into the pipeline
|
[22:24] ssi
|
if I fire it up with more messages on the queue than heap in my jvm, it eagerly consumes them all and OOMs
|
[22:25] ssi
|
so I want to find a solid way of making sure that I'm throttling everyone upstream while work is being done
|
[22:25] ssi
|
the more I dig in, the more I think most links in the chain are well-behaved
|
[22:26] ssi
|
I'm trying to print the backlog on all the sockets, but everything is coming back -1 so far
|
[22:29] ssi
|
my arch is kinda complex, but basically each pipeline node is a streamer device which is bound on the upstream end to the address that other nodes can send messages in through, and on the downstream end to a worker address. Each node has N workers which exist in a threadpool, consuming messages off the streamer and doing work. When they're finished with the work, they send the message down the pipeline to the stable address of the next node(s) in the fl
|
[22:30] ssi
|
the JmsReceiver is a node which also happens to be an MDB. onMessage, it creates a socket, connects to its own inbound stable address, sends that message, and closes the socket
|
[22:30] ssi
|
if queueing on push/pull is done at the push end, that may be my issue
|
[22:30] ssi
|
if I make that receiver maintain its socket, it may behave better
|
[22:38] michelp
|
woah sorry, i'm at work and not able to really pay enough attention to help you. maybe just as a some general advice i'd tell you to try and do what you want to do in a very small and simple test case first
|
[22:38] michelp
|
without all the complexity
|
[22:39] michelp
|
even in just some side code, or in a quick scripting language where you can experiment at will
|
[22:39] michelp
|
then translate that into your big project once you've figured out where the misbehavior is
|
[22:40] ssi
|
heheh sadly, this is the small test case :)
|
[22:40] ssi
|
it's not as complex as it sounds, really
|
[22:41] ssi
|
and the problem is indeed that the jms consumer is eating everything on the queue as fast as possible
|
[22:41] ssi
|
I just need to figure out how to properly monitor the backlogs... everything reports -1 no matter what
|
[22:47] ssi
|
hrm backlog isn't what I think it is
|
[22:47] ssi
|
is there a way to get ahold of the number of messages that are queued?
|
[22:53] pieterh
|
ssi: hi
|
[22:53] pieterh
|
how many processes is this split into?
|
[22:56] ssi
|
this is all done with inproc, so it's a single process
|
[22:56] ssi
|
threaded, of course
|
[22:56] pieterh
|
ok, so a few things here...
|
[22:56] pieterh
|
you can't measure the queue size, no
|
[22:56] ssi
|
right, reading that
|
[22:56] pieterh
|
wish that was possible but it's not
|
[22:57] pieterh
|
best you can do is set a lowish HWM everywhere and check for POLLOUT before writing
|
[22:57] pieterh
|
when you hit a HWM somewhere, you know you've a problem
|
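The advice above is stated against the C API, but the same check can be sketched in Python with pyzmq (the endpoint name, HWM value, and `can_send` helper are illustrative, not from the log; note that modern libzmq splits the old ZMQ_HWM into ZMQ_SNDHWM/ZMQ_RCVHWM):

```python
import zmq

ctx = zmq.Context.instance()
push = ctx.socket(zmq.PUSH)
push.sndhwm = 4                      # lowish send-side HWM (ZMQ_SNDHWM)
push.bind("inproc://pipeline")

poller = zmq.Poller()
poller.register(push, zmq.POLLOUT)

def can_send(timeout_ms=0):
    """True if a send on `push` would not block right now."""
    events = dict(poller.poll(timeout_ms))
    return bool(events.get(push, 0) & zmq.POLLOUT)
```

A PUSH socket with no connected peer (or with its HWM reached) reports no POLLOUT, so `can_send()` returns False until a PULL peer connects and there is room in the queue.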
[22:57] ssi
|
I am setting low HWMs everywhere, but i'm not checking pollout, so that's likely the part I'm missing
|
[22:57] ssi
|
I need to figure out how to do that (using the java bindings)
|
[22:58] pieterh
|
second thing, this is presumably your first real 0MQ application?
|
[22:58] ssi
|
yes
|
[22:58] pieterh
|
ok, so plan to throw it away
|
[22:58] ssi
|
and I'm not sure I'd call it "real" yet, but it certainly has come together quicker than I expected
|
[22:58] pieterh
|
especially if you make it bigger than you can fully control
|
[22:59] pieterh
|
it's kind of too easy to put things together without really understanding what's going on
|
[22:59] ssi
|
well fortunately, it's complex but it's not big. It's exactly one class of any significance
|
[22:59] pieterh
|
sure
|
[22:59] pieterh
|
once you internalize the 0MQ semantics the complexity will go away
|
[23:00] pieterh
|
third, for success, you have to start with a small piece, make that work fully, then add one piece at a time
|
[23:00] ssi
|
yes, that's what I've done... I just skipped over the HWM part I think
|
[23:00] ssi
|
trying to go back and make that happen before I get too far along
|
[23:00] pieterh
|
so assuming you're pretty close to having an accurate design
|
[23:01] pieterh
|
I think you can see memory consumption per thread
|
[23:01] pieterh
|
that may not be sufficient, you may need to break the app into multiple processes over tcp://
|
[23:02] pieterh
|
you need to understand why messages are building up
|
[23:02] pieterh
|
HWM is not really the best tool IME for resolving message build up, it's more of an exceptional condition
|
[23:02] ssi
|
well I don't think I even have buildup
|
[23:02] pieterh
|
which you can test for by doing non-blocking writes and checking for EAGAIN
|
[23:03] pieterh
|
you said you have out-of-memory?
|
[23:03] ssi
|
I've artificially slowed down my downstream processes to simulate it
|
[23:03] ssi
|
yes, I have out of memory because I connected to a jms queue with 50 messages on it, each around 10MB in size
|
[23:03] ssi
|
and a 128MB heap in my jvm
|
[23:03] pieterh
|
ah, so it actually works until you do breakage testing?
|
[23:03] ssi
|
and my jms receiver is eagerly consuming EVERYTHING
|
[23:03] ssi
|
I'm trying to find a way to signal the jms receiver that yes, the downstream nodes can accept, or no they can't
|
[23:03] pieterh
|
hmm
|
[23:03] ssi
|
yeah everything works great
|
[23:04] pieterh
|
how much of the guide have you read?
|
[23:04] ssi
|
all of it
|
[23:04] ssi
|
not sure I fully internalized it all
|
[23:04] ssi
|
but I read it all :)
|
[23:04] pieterh
|
so the pattern I'm thinking you need is a least-recently used routing
|
[23:04] pieterh
|
with some way to stop the JMS receiver in case of queue overflow
|
[23:05] pieterh
|
that's just a superficial opinion...
|
[23:05] ssi
|
yes, that makes sense
|
[23:05] ssi
|
I don't know if I necessarily need the LRU or not
|
[23:05] pieterh
|
you don't have much traffic, so you can definitely synchronize between downstream and upstream
|
[23:06] ssi
|
I have a document I drew up explaining how my messaging works currently
|
[23:06] ssi
|
I'm trying to get my hands on it so I can show you
|
[23:06] pieterh
|
well, tbh it sounds like you do have it under control...
|
[23:07] ssi
|
I'm just not sure how to make it respect the HWM
|
[23:07] pieterh
|
find the specific case where you think it doesn't respect the HWM
|
[23:07] pieterh
|
no-one will help you debug anything more complex than a minimal test case
|
[23:08] ssi
|
fair enough
|
[23:08] ssi
|
can you point me at what you mean by "check pollout on the send"?
|
[23:08] pieterh
|
well, what I'd do is not use poll for that, it's painful
|
[23:08] pieterh
|
instead, use zmq_send with ZMQ_NOBLOCK
|
[23:08] pieterh
|
and check for an error return with EAGAIN as error value
|
[23:09] pieterh
|
I hope the Java binding exposes this error value, it must
|
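In pyzmq the EAGAIN case surfaces as the `zmq.Again` exception rather than a C-style error return, so the non-blocking-send pattern described above can be sketched like this (the `try_send` helper and endpoint name are invented for illustration):

```python
import time
import zmq

def try_send(socket, payload):
    """Return True if sent, False if the send would have blocked (EAGAIN)."""
    try:
        socket.send(payload, zmq.NOBLOCK)
        return True
    except zmq.Again:
        return False

ctx = zmq.Context.instance()
push = ctx.socket(zmq.PUSH)
push.bind("inproc://nb-demo")
```

With no PULL peer connected, `try_send` returns False immediately instead of blocking; once a peer connects, sends start succeeding again.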
[23:09] ssi
|
not sure I fully comprehend, but if I put together the simple test case it may come to me
|
[23:10] pieterh
|
there's a lot still missing from the Guide, unfortunately
|
[23:17] ssi
|
now, I'm correct from my reading that only the PUSH side has an HWM, is that right?
|
[23:18] pieterh
|
both sides will apply a queue limit
|
[23:18] pieterh
|
I'm not sure what the semantics are for HWM at the receiver side
|
[23:19] pieterh
|
it will, I assume, cause the PULL socket to stop reading from the network, which will cause the PUSH socket to eventually stop sending to that PULL socket
|
[23:19] pieterh
|
that means the actual queued data = queues at both sides plus network buffers at both sides plus whatever's in transit over the network
|
[23:19] ssi
|
that's fine, not entirely concerned with how much
|
[23:19] ssi
|
just as long as I can make sure that something stops it
|
[23:20] pieterh
|
hmm, the docs don't specify any exception handling for PULL sockets
|
[23:20] ssi
|
and are they entirely a function of message count? or does message size come into play at all
|
[23:20] pieterh
|
so let's assume they're accurate, and only the PUSH side counts
|
[23:20] pieterh
|
it's only a message count
|
[23:20] pieterh
|
no byte counts
|
[23:20] ssi
|
ok
|
[23:21] ssi
|
that'll need to be tuned individually then... cause I have to be able to deal with enormous messages and tiny messages both
|
[23:21] pieterh
|
one pattern I've used successfully is credit-based flow control
|
[23:21] pieterh
|
so receiver sends credit messages to sender
|
[23:21] pieterh
|
which only sends when it has credit
|
[23:21] ssi
|
hrm that's interesting
|
[23:22] pieterh
|
it's fairly simple to implement, you need some basic command framing like all the toy protocols in the Guide have
|
[23:22] pieterh
|
then receiver says 'ready' by sending off credit, and tops up as it receives and processes data
|
[23:22] pieterh
|
sender can route based on credit, as well as stop/start sending
|
[23:23] pieterh
|
for this you don't use PUSH/PULL any more, but ROUTER/DEALER
|
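A minimal sketch of that credit-based pattern over ROUTER/DEALER in pyzmq (the frame layout, the b"CREDIT" token, and the endpoint are invented for illustration; a real version would top up credit as the receiver finishes processing):

```python
import zmq

ctx = zmq.Context.instance()

sender = ctx.socket(zmq.ROUTER)      # routes data to receivers with credit
sender.bind("inproc://credit")

receiver = ctx.socket(zmq.DEALER)    # grants credit, then consumes data
receiver.connect("inproc://credit")

# Receiver opens with an initial credit of 2 messages.
for _ in range(2):
    receiver.send(b"CREDIT")

# Sender: absorb credit messages, remembering the receiver's identity.
credit = 0
ident = None
for _ in range(2):
    ident, msg = sender.recv_multipart()
    assert msg == b"CREDIT"
    credit += 1

# Sender pushes data only while credit lasts, then stops.
sent = 0
while credit > 0:
    sender.send_multipart([ident, b"data-%d" % sent])
    credit -= 1
    sent += 1
```

The sender halts after two data frames because it has exhausted its credit; the receiver would normally send another b"CREDIT" frame each time it finishes processing a message, which is what lets it throttle the upstream.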
[23:23] ssi
|
right
|
[23:23] ssi
|
and i worry about getting away from push/pull, because it so perfectly represents my flow
|
[23:23] ssi
|
(plus routing is still scary)
|
[23:23] pieterh
|
well, dealer is push and pull combined
|
[23:24] pieterh
|
and routing is scary but seems inevitable in many patterns
|
[23:26] pieterh
|
ssi: ok, good luck, I'm off now, it's 1.26am here :)
|
[23:27] ssi
|
ok, thanks for the help
|
[23:27] ssi
|
I hope to have this simple test case working tomorrow
|
[23:30] ssi
|
...or right now... and it definitely behaves the same way. I just need to figure out how to respect the HWM, and I think I'm good to go
|