Friday May 6, 2011

[Time] Name Message
[01:03] eyecue hai
[01:12] mikko hello
[06:05] ASY i am looking at the guide again. from what I gather, to make a bi-directional communication between two instances of the system (where each instance can do on-demand requests to one another) I have to basically create two PUSH/PULL sockets. When system A wants to ask B something, it will ask on a dedicated PUSH socket and expect response on dedicated PULL socket. am I correct in this
[06:05] ASY understanding? REQ/REP doesn't make sense as I would still need two but then I would have requests coming back on both. however (thinking out loud) I wouldn't be able to synchronize PUSH/PULL, i.e. what if A sends to B something while B requests and waits for reply for something else... Thus I do need REQ/REP which will guarantee me that request receives reply in order. So I have to setup two
[06:05] ASY REQ/REP then. can someone please read this and confirm that I make sense? been too many late nights lately... :)
[08:12] Guthur is the latest version of 2.1.x available on the main libzmq repo
[08:12] Guthur it only seems to have 2.1.0
[08:12] djc you should get it from zeromq2-1
[08:14] Guthur ?
[08:14] Guthur doesn't exist
[08:14] djc no, zeromq/zeromq2-1
[08:14] Guthur ah, definitely not sold on this multi repo malarky
[08:14] Guthur no one is going to find that
[08:15] djc sure they are
[08:15] Guthur it's not even mentioned here
[08:15] Guthur
[08:16] Guthur plus it uses the old zeromq name and not libzmq
[08:16] djc yeah, well, that's because it has more stuff than libzmq, IIUC
[08:22] pieterh Guthur: malarky?
[08:24] Guthur hehe
[08:24] Guthur sorry it's early, and I haven't had any coffee
[08:25] pieterh There are good reasons for having separate repos, in fact
[08:25] Guthur and have horrid software issues to resolve
[08:25] pieterh get some coffee, man!
[08:25] Guthur pieterh: it was just difficult to find
[08:25] pieterh yeah, I'm fixing that
[08:25] pieterh normally no-one should have to go to the repos for the distributions, they'd use the zip/tar packages
[08:25] Guthur I had read there was a 2.1.6, but couldn't find it
[08:26] pieterh main download page
[08:26] pieterh
[08:26] pieterh and that links to the git repo
[09:11] sustrik mikko: hi
[09:11] sustrik what's up?
[09:27] ronny hi
[09:28] ronny what are the current options for securing the transport?
[09:36] pieterh ronny: none, at present
[09:44] ronny any timelines? or is it a 'done when done item'?
[09:47] mikko sustrik: nothing anymore, i think i had an application issue
[09:47] sustrik ok
[09:58] pieterh ronny: no timelines, no real agreement on how to make it
[09:58] pieterh there are several options
[09:58] pieterh 1. an SSL/TLS protocol layer (replacing tcp://)
[09:58] pieterh 2. bridging across a secure protocol (e.g. HTTPS)
[09:58] pieterh 3. tunneling across a VPN
[09:59] ronny how about starttls support for non multicast sockets?
[09:59] pieterh my preference would be (2)
[09:59] pieterh ronny: that would be option (1)
[10:00] ronny bridge over secure would use connect or something, wouldnt it?
[10:00] pieterh bridging over a secure transport would use a separate application
[10:01] ronny i dont care too much about 1 vs 2 as long as it works reliably
[10:01] ronny i wonder if (2) could be integrated with mongrel2
[10:01] pieterh ronny: yes, that would be my first choice
[10:01] mikko sustrik: yes application problem
[10:02] pieterh I'd make a proof of concept with a throw-away HTTP server, and a real implementation over mongrel2
[10:02] mikko i was 'new'ing an object and passing over to gui
[10:02] mikko and i had no gui thread running so it wasn't freed
[10:02] ronny bbl
[10:02] pieterh sustrik: hi
[10:04] sustrik morning
[10:04] mikko what's this: nbytes != -1 (mailbox.cpp:242) ?
[10:05] mikko i get it during termination of the application
[10:05] sustrik let me see
[10:07] sustrik it's an unexpected error when reading from the "mailbox", i.e. the socket pair used for communication between threads
[10:07] sustrik it should have printed out the error
[10:07] sustrik did it?
[10:08] mikko Bad file descriptor
[10:08] mikko nbytes != -1 (mailbox.cpp:242)
[10:08] sustrik ok, obviously a bug
[10:08] sustrik is it reproducible?
[10:09] sustrik pieterh: btw, nice to see you've started hacking the core
[10:09] mikko not sure but roughly happens when context goes out of scope
[10:09] pieterh sustrik: well, it was quite a pleasant experiment
[10:09] sustrik one comment: you haven't fixed the MSVC build system
[10:09] mikko and sockets should terminate
[10:09] pieterh sustrik: ah, right
[10:09] sustrik mikko: right
[10:09] sustrik but i haven't seen it myself yet
[10:10] sustrik you get it regularly
[10:10] mikko it's very difficult to reproduce because it happens inside a gui application
[10:10] sustrik or was it just 1-off thing?
[10:10] mikko that requires kinect to run
[10:10] mikko consistent
[10:10] sustrik hm
[10:10] mikko i can try to create backtrace
[10:10] pieterh sustrik: I'll do that, and when/if there's feedback from Fabien that this socket type is useful, I'll propose a patch for 3.0
[10:10] sustrik what's needed is the description of the semantics of the pattern
[10:10] pieterh the funny thing is that I was sketching out the socket types some weeks ago and this showed up as "missing"
[10:11] mikko not meaning to be a kill joy here
[10:11] pieterh bidirectional fanout/in
[10:11] mikko but new functionality developed in 2.2 and then upstreamed to master?
[10:11] mikko guys
[10:11] pieterh mikko: why not
[10:11] sustrik mikko: i think the backtrace won't be of much help in this case
[10:11] pieterh mikko: how else to test a new feature without at the same time migrating to 3.0 api...?
[10:11] sustrik it looks like the problem already happened when the program asserts
[10:12] pieterh sustrik: that mailbox error is familiar
[10:12] pieterh isn't it what you get when you try to work with a closed socket?
[10:12] sustrik right, that can be the case
[10:12] sustrik it happens when you close the ZMQ_FD
[10:12] pieterh something like that, yes
[10:13] sustrik mikko: any chance you do that?
[10:13] mikko it's possible that i have a race condition in thread shutdown
[10:13] pieterh yeah, quite plausible
[10:13] sustrik i mean, you should never close ZMQ_FD
[10:13] pieterh try commenting out all socket closes
[10:13] pieterh and also don't terminate the context
[10:13] mikko pieterh: c++
[10:14] pieterh see if the problem disappears
[10:14] mikko it closes when auto_ptr goes out of scope
[10:14] sustrik ugh
[10:14] pieterh ah... magic happens
[10:14] sustrik what about using raw fd instead?
[10:14] pieterh simple solution, rewrite in a real language like C...
[10:14] mikko sustrik: raw fd for what?
[10:14] sustrik for ZMQ_FD
[10:14] mikko i am not using ZMQ_FD
[10:14] sustrik ok, then there's a bug in 0mq
[10:15] sustrik what version are you using?
[10:15] mikko now im debugging why the program doesn't terminate
[10:15] mikko 2.1.4
[10:15] sustrik pieterh: has the socket validity checking gone to 2.1.4?
[10:16] sustrik one possibility is that mikko closes the same socket twice
[10:16] sustrik the validity checking would catch that
[10:16] mikko i'm not closing twice
[10:16] pieterh sustrik: not in 2.1.4, but I don't see an entry for it in 2.1.6 either...
[10:17] sustrik ok
[10:17] mikko unless std::auto_ptr is broken in some way
[10:17] mikko it's inside one scope
[10:17] mikko so i wouldn't expect things go south
[10:17] sustrik ok, then it must be a bug
[10:17] sustrik we have to find some way to reproduce it
[10:18] pieterh sustrik: I don't think we downstreamed socket validity checking, only msg checking
[10:18] sustrik ok, np
[10:18] pieterh mikko: have we ruled out accessing a socket after it was closed?
[10:18] pieterh sustrik: we could add this patch, and mikko can at least test that...
[10:18] mikko i can rule that out by moving the auto_ptr to outer scope
[10:19] pieterh definitely worth trying
[10:20] sustrik let me see
[10:20] pieterh sustrik: I don't see any commits for socket validity, not in 2.1, not in libzmq master
[10:21] mikko yes
[10:21] mikko that is the case
[10:21] mikko moving the auto_ptr to outer scope eliminates the assertion
[10:21] sustrik ph: b96fe15bb666e59728b6aa02f28c5838020f6bf3
[10:21] pieterh mikko: yay! /me wins 9,000 internets!
[10:23] mikko it would be nice if it threw zmq::error_t instead of asserting
[10:23] pieterh sustrik: ack... maybe I should port that down to 2.1?
[10:23] jait hey, i'm currently working on a project on process migration and am planning on using zeromq for handling the message passing between systems basically to share their current system load information. I want this information to be broadcasted throughout the network. can this be done?
[10:23] pieterh mikko: this is what we're discussing, checking it and throwing an error
[10:23] mikko jait: yes, sir
[10:24] mikko jait: you would probably want openpgm for that
[10:24] pieterh jait: you need to tell us roughly how many peers, how many messages / second, how large the messages
[10:24] jait ok
[10:25] pieterh mikko: if I get you a patched 2.1.6, can you test that?
[10:26] mikko pieterh: i can
[10:26] pieterh sustrik: did you see that bash 1-liner?
[10:26] pieterh some insane person actually implemented 0MQ in 1 line of shell script
[10:26] sustrik pieterh: yes, i retweeted it even
[10:26] pieterh "implemented" and "0MQ", of course
[10:26] pieterh freaky, huh :)
[10:26] sustrik i like it
[10:27] pieterh me too, it's very unprotocol
[10:27] pieterh
[10:27] pieterh it's how to design protocols
[10:27] sustrik i see
[10:27] pieterh if you can't implement a protocol in 1 line of bash, you're doing something wrong
[10:27] sustrik +1
[10:28] pieterh sustrik: mikko: I'm going to downstream b96fe15 and provide Mikko with a new version to test
[10:31] jait this being a distributed system, the number of peers will vary during runtime, but for experimental purposes i'll be running five peers, the messages per second will not exceed 20 and message size will be approximately 30 bytes. we're only sharing the system load information
[10:32] pieterh jait: so use tcp, not pgm, and cross-connect the peers in a pub/sub architecture
[10:33] pieterh if you get more peers, stick a forwarder device in the middle
[10:33] pieterh i.e. everyone publishes to that device, everyone subscribes to that device
[10:33] jait ok got it
[10:33] pieterh if you had thousands of peers and thousands of messages / second, you'd want to use pgm
[10:34] jait ok
[10:34] jait thanks a ton
[10:35] pieterh np
[10:35] jait i'll try it out and get back if i have a problem
[10:35] jait thanks again
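A rough sketch of why the forwarder device suggested above helps as the peer count grows. It assumes every peer both publishes and subscribes, so a full cross-connect needs one connection from each subscriber to every other publisher, while the device topology needs two connections per peer. The function names are illustrative and not part of 0MQ.

```python
def mesh_connections(peers: int) -> int:
    """Connections when every peer's SUB connects to every other peer's PUB."""
    return peers * (peers - 1)


def device_connections(peers: int) -> int:
    """Connections when every peer publishes to, and subscribes from, one device."""
    return 2 * peers


# Compare both topologies for a few peer counts.
for n in (5, 50, 500):
    print(n, mesh_connections(n), device_connections(n))
```

With the five peers mentioned in the discussion the difference is small, which fits the advice to start with a plain cross-connect and add the device only when the peer count grows.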
[10:37] pieterh sustrik: b96fe15b applied cleanly to 2.1, I'm testing it now
[10:37] sustrik ok
[10:40] pieterh mikko: it's ready
[10:40] pieterh if you checkout git:// then you'll have the patch on master
[10:53] jait ok this may seem n00bish but can a subscriber connect to multiple publishers? i'm right now using zmq_connect(subscriber,"tcp://*:5555") where subscriber is a pointer to zmq socket. is it correct?
[10:55] sustrik no
[10:56] sustrik you have either to connect to different publishers one by one
[10:56] pieterh jait: you connect to a specific IP address
[10:56] sustrik or bind in sub socket and connect from pub sockets
[10:56] pieterh you bind to an interface, and tcp://*:5555 means "all interfaces on this box"
[10:57] jait understood
[10:58] pieterh however, a subscriber can connect to multiple publishers, one by one, specifying their IP address each time
[10:59] pieterh this gets annoying when you have more than a few peers, which is why we recommend using a device in the middle
[10:59] pieterh a device is simply a 0MQ loop that connects a sub to a pub
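A small sketch of the bind/connect distinction described above. It assumes a hypothetical helper, `subscriber_endpoints` (not part of 0MQ), that builds the list of endpoints a subscriber would pass to `zmq_connect`, one per publisher. It refuses the `*` wildcard, which per the discussion only makes sense when binding. The IP addresses are made up for illustration.

```python
from typing import Iterable, List


def subscriber_endpoints(publisher_ips: Iterable[str], port: int) -> List[str]:
    """Build one tcp:// endpoint per publisher for a subscriber to connect to.

    Hypothetical helper for illustration only. The wildcard "*" is refused
    because it names "all interfaces on this box" and so belongs to bind.
    """
    endpoints = []
    for ip in publisher_ips:
        if ip == "*":
            raise ValueError("connect needs a specific address, not '*'")
        endpoints.append(f"tcp://{ip}:{port}")
    return endpoints


# Each resulting endpoint would be passed to zmq_connect in turn.
print(subscriber_endpoints(["192.168.1.10", "192.168.1.11"], 5555))
```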
[10:59] pieterh mikko: ping?
[11:01] jait ok got it
[11:01] mikko y?
[11:01] mikko ok
[11:15] mikko pieterh: testing in a sec
[11:15] mikko need to sort out some lunch first
[11:16] pieterh mikko: ok, no hurry
[11:31] mikko i wonder if i could bring the kinect to the unconf
[11:31] mikko to demo the software
[11:32] mikko pieterh: i get an exception saying "Context was terminated"
[11:32] mikko i think
[11:32] mikko unless i've fixed the issue somewhere else
[11:32] pieterh mikko: that's neat
[11:32] mikko no assertion in any case
[11:32] pieterh it means you're trying to do a socket access after closing the socket
[11:32] pieterh closing the context, I mean
[11:33] pieterh hmm, or somesuch... normally you'd expect an ETERM at that stage
[11:33] pieterh mikko: yes, bring the kinect, that'd be fun
[13:51] CIA-75 libzmq: 03Martin Sustrik 07master * r0c5b781 10/ src/xrep.cpp : urrent pipe pointer in XREP out of range -- fixed. ...
[14:14] pieterh sustrik: btw your commit message lost its first 'C'
[14:14] pieterh that 1 liner, it crashes 0MQ with 'Assertion failed: (msg_->flags | ZMQ_MSG_MASK) == 0xff (zmq.cpp:223)'
[14:24] sustrik sorry>
[14:25] sustrik ?
[14:25] pieterh bleh... sorry, I was testing that bash 1-liner
[14:25] pieterh it actually works, that assertion disappeared when I rebuilt zmq, not sure what was going on
[14:26] sustrik i see
[14:26] pieterh but your commit message lost its initial 'C' anyhow
[14:26] sustrik i guess it cannot be fixed
[14:26] pieterh not if you've pushed the repo, no
[14:26] pieterh i fixed it in 2.1/2.2
[14:26] sustrik i did
[14:26] sustrik ok
[14:26] pieterh it'd be real nice if we could make rep sockets resistant against bad frames, in 2.1
[14:27] pieterh that's the assertion: msg_->flags & ZMQ_MSG_MORE (rep.cpp:81)
[14:27] sustrik yes
[14:27] sustrik the problem is that to achieve that you have to move the protocol state machine into I/O thread
[14:28] sustrik not a trivial change
[14:28] sustrik as currently the I/O thread just blindly passes the frames to the application thread
[14:28] pieterh sure
[14:28] pieterh can't I just discard the message if it's invalid, and loop?
[14:28] pieterh doesn't need to change state at all
[14:29] sustrik good idea
[14:29] sustrik it's not perfect as the offending connection should be dropped
[14:29] sustrik but will do as a workaround
[14:29] pieterh indeed, it'll leave the connection alive
[14:29] pieterh but it'll do...
[14:30] pieterh ok, I'll give it a shot, I guess it's all in the zmq::rep_t::xrecv method?
[14:30] sustrik yep, just read the rest of the message from the pipe and drop it
[14:30] pieterh ok
[14:33] pieterh is there any way to clear stuff that's already been sent to the reply pipe?
[14:33] sustrik rollback()
[14:34] pieterh ack
[14:34] sustrik you have send, flush and rollback
[14:34] sustrik rollback rolls back everything that's not flushed yet
[14:34] pieterh neat
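The send/flush/rollback semantics sustrik describes can be pictured with a toy model. This is only a sketch under the assumption that unflushed frames sit in a separate pending buffer. The class `ToyPipe` and its methods are invented for illustration and are not libzmq's actual pipe implementation.

```python
class ToyPipe:
    """Toy model of a pipe with send/flush/rollback (not libzmq's real class)."""

    def __init__(self):
        self._flushed = []  # frames the reader can already see
        self._pending = []  # frames sent but not yet flushed

    def send(self, frame):
        """Queue a frame; it stays invisible to the reader until flush()."""
        self._pending.append(frame)

    def flush(self):
        """Make all pending frames visible to the reader."""
        self._flushed.extend(self._pending)
        self._pending.clear()

    def rollback(self):
        """Discard everything sent since the last flush; return the count dropped."""
        dropped = len(self._pending)
        self._pending.clear()
        return dropped

    def readable(self):
        """Return a copy of the frames the reader can see."""
        return list(self._flushed)


pipe = ToyPipe()
pipe.send(b"a")
pipe.flush()
pipe.send(b"b")
pipe.send(b"c")
dropped = pipe.rollback()  # b"b" and b"c" were never flushed, so they vanish
print(dropped, pipe.readable())
```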
[15:01] pieterh sustrik: yay, it works...
[15:01] sustrik :)
[15:01] pieterh I'll push a branch, you can review the code, it's pretty simple
[15:01] sustrik ok
[15:05] pieterh
[15:06] pieterh I'll make a patch that applies to 3.0 and send it to the list...
[15:19] pieterh sustrik: ok, patch sent to list
[15:20] sustrik reviewing it...
[15:20] sustrik it looks like it won't do
[15:20] sustrik the problem is that xrep socket is reading frames from pipe and passing it to the app
[15:20] sustrik then it finds out that the message is invalid
[15:21] sustrik some frames were already passed to the user
[15:21] sustrik it's probably needed to pick the whole message from the pipe, validate it and either deliver it or drop it
[15:22] pieterh so, afaics, nothing has been passed to the app when this code kicks in
[15:22] pieterh it is parsing the envelope and shoving it down the out pipe
[15:22] pieterh it expects an empty frame
[15:22] pieterh if it meanwhile gets a frame without MORE, it rollsback, and restarts
[15:25] pieterh this is all happening in the REP socket, where the app doesn't see the envelope, right?
[15:25] sustrik ah, i misread the patch
[15:25] sustrik sorry
[15:25] sustrik so there's no validation at xrep level
[15:25] pieterh right
[15:26] pieterh I'm sure there'll be other problems to catch but this seems to be the only one with rep
[15:26] sustrik so it'll fail if it gets an invalid message, no>
[15:26] sustrik ?
[15:26] pieterh well, depends on definition of 'invalid'
[15:26] pieterh the actual assert was hitting in a very specific case
[15:26] sustrik one without empty frame
[15:26] pieterh malformed envelopes
[15:26] pieterh e.g. send simple message to REP socket, boom!
[15:27] sustrik ah, ok
[15:27] pieterh that was the 1-liner
[15:27] sustrik i thought you are trying to make it resilient against random data from the network
[15:27] pieterh well, yes, this is part of it
[15:27] pieterh in this case 'random' means 'valid to some sockets but toxic to REP'
[15:28] pieterh it also seems to cover a lot of the futzing, but I'll test that later
[15:28] sustrik what was the original assert you were solving?
[15:28] pieterh as written here:
[15:29] pieterh it was the same assert that the futzing raised a few days ago
[15:29] pieterh so it hits in two cases
[15:29] pieterh a. normal message but without envelope, sent to REP socket
[15:29] pieterh b. pretty much any random data, by experimentation
[15:29] sustrik 50% of them
[15:29] sustrik given it's a bit flag
[15:30] pieterh well, 'valid' for REP is quite complex
[15:30] pieterh 0 or more MORE frames, plus an empty MORE frame
[15:30] sustrik yep, state machine is needed
[15:30] pieterh i.e. 65534 combinations out of 65535 are invalid
[15:31] pieterh I mean, this approach will catch all 64k random errors except one
[15:31] pieterh to craft an attack now you have to make a valid envelope, and then invalid data
[15:31] sustrik ?
[15:32] sustrik the error condition you catch now is "there's no MORE flag on first frame"
[15:32] sustrik right?
[15:32] pieterh the error condition is "anything up to and including the end of the envelope that's not got a MORE flag"
[15:32] pieterh that was the assert
[15:33] sustrik ok
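The rule pieterh describes (any frame up to and including the end of the envelope must carry the MORE flag, otherwise roll back and restart) can be sketched as a toy parser. This is an illustrative model, not the code in rep.cpp. The `(data, more)` frame representation and the function name `next_request` are assumptions made for the example.

```python
from typing import Iterable, List, Optional, Tuple

Frame = Tuple[bytes, bool]  # (data, MORE flag)


def next_request(frames: Iterable[Frame]) -> Optional[Tuple[List[bytes], List[bytes]]]:
    """Return (envelope, body) of the first well-formed request, or None.

    A well-formed request is zero or more address frames with MORE set,
    an empty delimiter frame with MORE set, then the body frames.
    """
    stream = iter(frames)
    envelope: List[bytes] = []
    for data, more in stream:
        if not more:
            # Frame without MORE before the envelope ended: malformed.
            # Drop what was collected so far (the "rollback") and restart.
            envelope.clear()
            continue
        if data == b"":
            # Empty delimiter ends the envelope; the body follows.
            body: List[bytes] = []
            for data, more in stream:
                body.append(data)
                if not more:
                    return envelope, body
            return None  # stream ended in the middle of the body
        envelope.append(data)
    return None  # stream ended before any complete request


# A bare message with no envelope, followed by a valid request.
stream = [(b"hello", False), (b"id1", True), (b"", True), (b"ping", False)]
print(next_request(stream))
```

In this model the bare `hello` message, the case that triggered the assertion, is silently discarded and the following valid request is delivered. As noted in the discussion, the offending connection stays alive.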
[15:34] pieterh I need to set up some more wide-randing futzing tests, all socket types
[15:34] pieterh but not now...
[15:34] pieterh *wide-ranging
[15:49] iFire neopallium is there a windows threading lua library?
[15:53] staylor I received an assertion for zmq_connecter.cpp:48 on windows, from my understanding asserts in the library should be reported as bugs yes?
[15:55] pieterh staylor: yes, in general
[15:55] pieterh but in some cases it will be due to bugs in the application, not zmq
[15:55] staylor I believe it is caused during sleep/wakeup but I'll do more tests to find out for sure
[15:57] pieterh staylor: yes, looks like 0MQ is trying to reconnect while the network interface is still coming up
[16:15] iFire neopallium perhaps boost threads?
[16:15] iFire or c++0x threads
[16:16] iFire hmm probably not a c++ library
[16:17] pieterh iFire: would it help to have C win32 threading functions?
[16:22] iFire well I'm trying to see what's out there
[16:23] pieterh iFire: I've ported the minimal pthreads code to win32 as part of czmq
[16:23] pieterh only for thread creation
[16:24] pieterh I'm happy to provide that code for wrapping in the Lua binding if it'd help
[16:24] iFire it's mit right?
[16:24] iFire hmm
[16:24] iFire lgpl with linking exception?
[16:25] iFire that'll be pretty good
[16:27] iFire pieterh what is the source filename?
[16:27] pieterh iFire: I'll relicense it any way needed
[16:27] pieterh it's in the zthreads class, at
[16:27] pieterh the code would need some modification anyhow, it's bound into the way czmq wraps threads
[16:28] staylor pieterh: is there a work around for that or should I post a bug report for it?
[16:28] pieterh staylor: post an issue, please, and explain how to reproduce it
[16:28] pieterh staylor: it's a fairly typical problem for Windows servers
[16:29] staylor pieterh: sounds good thanks.
[16:33] pieterh staylor: I don't know if it's possible to get a stack trace, but that'd definitely help
[16:36] staylor pieterh: I can look, might be screenshots though because I'm not sure what visual studio will let me do with that
[16:37] pieterh if you can run it from inside VS, you should get a stack when it asserts
[16:37] pieterh screenshots are fine, of course
[16:37] staylor yeah, just not sure how to export it never had to but I'll figure something out
[17:25] rsain pieterh: alright I posted the bug report to github with the backtrace and I can post more examples if needed.
[17:58] guido_g the world is full of magic :)
[18:19] staylor I just updated from zmq 2.1.4 to 2.1.6 on windows and am now getting a assert on (msg_->flags | ZMQ_MSG_MASK) == 0xff in zmq.cpp:223
[18:59] mikko staylor: interesting
[19:03] sustrik staylor: i believe that one was already fixed
[19:04] staylor mikko: switching back to the 2.1.4 works fine, I'll put together a small test sample when I get a chance if needed
[19:04] sustrik check with current 2-1
[19:54] trbs I'm running a slight variation on the weather-proxy example ( on 2.1.7 and noticed that after a couple of days the process started to grow / leak memory. Is there a way to get some information out of zmq to know what exactly is taking up the space (or if it's a memory leak) ?
[19:58] sustrik trbs: no, but there are some diagnostic tools for that
[20:04] trbs ok thanks
[20:04] trbs restarted the process and everything works fine again now... also running a test setup on my local system with much more msg/s than the other server is doing.. and everything still looks fine
[20:05] trbs so i'll just keep my eye on it for some days :)
[20:27] mikko trbs: maybe on the server the consumers are slower?
[20:27] mikko not sure what does
[21:58] trbs mikko, i have a simple program that has a pub and a sub socket that bind on two ports on that machine. sub socket is subscribed to all messages and pub socket has HWM=1. then the mainloop just copies all messages received on the sub socket and (re)sends them on the pub socket
[21:59] trbs (except for the hwm setsockopt my code looks identical to )
[22:00] trbs sorry, identical except for the hwm setsockopt and that the zmq.SUB port does a bind() not a connect