Monday October 4, 2010

[Time] Name Message
[08:19] mikko erlzmq and jzmq seem to be broken against maint branch
[08:42] mikko also pyzmq stopped building after last night
[08:42] mikko hmm
[09:50] ptrb so I'm noticing that I can easily run my system out of memory if I try to publish a ton of messages (10MM) to a ZMQ_PUB socket. is there a way I can limit the max amount of memory that will take? some sockopt or something?
[09:51] Zao Can't you set a highwatermark or something?
[09:52] ptrb ok, that was probably the keyword I was looking for
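[Editor's note] In ZeroMQ 2.x the high-water mark is the ZMQ_HWM socket option, and a ZMQ_PUB socket drops messages once that limit is reached instead of queueing them. The sketch below is a pure-Python toy model of that behaviour, not ZeroMQ code: the class `BoundedPubQueue` and the function `demo` are hypothetical names introduced only for illustration.

```python
from collections import deque


class BoundedPubQueue:
    """Toy stand-in for a PUB socket's outbound queue (not a ZeroMQ API)."""

    def __init__(self, hwm):
        self.hwm = hwm          # maximum number of messages held in memory
        self.queue = deque()
        self.dropped = 0

    def publish(self, msg):
        # Once the high-water mark is reached, drop instead of growing.
        if len(self.queue) >= self.hwm:
            self.dropped += 1
            return False
        self.queue.append(msg)
        return True


def demo(total=10_000, hwm=1_000):
    """Publish `total` messages with no consumer; return (queued, dropped)."""
    q = BoundedPubQueue(hwm)
    for i in range(total):
        q.publish(i)
    return len(q.queue), q.dropped
```

With no consumer draining the queue, memory use is capped at `hwm` messages and everything beyond that is counted as dropped.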
[09:54] sustrik mikko: erlzmq never really worked with 2.0.x
[09:54] mikko
[09:55] mikko i added almost all bindings
[09:55] sustrik nice
[09:56] sustrik jzmq is being discussed on the mailing list
[09:56] sustrik not sure what happened to pyzmq
[09:56] lestrrat what do I need to do in order to add Perl to this list ? :)
[09:56] ptrb noooo :(
[09:58] sustrik mikko: would it be possible to send an automated email to the list if something goes wrong?
[09:58] mikko sustrik: yes
[09:58] mikko sustrik: im testing 'on-demand' builds next
[09:58] sustrik :)
[09:58] mikko so that people can put a machine on, build executes
[09:58] mikko and they can turn off
[09:59] mikko i think it's possible via hudson api
[09:59] mikko make a small script that runs on @reboot of the machine
[09:59] mikko and posts to master hudson to start builds
[09:59] mikko lestrrat: let me know what to add
[09:59] mikko lestrrat: i added most of the things under zeromq account in github
[09:59] mikko things from*
[09:59] lestrrat mikko:
[10:00] mikko lestrrat: i'll add that in a minute
[10:01] mikko zeromq master/maint runs on midnight GMT
[10:01] lestrrat thanks.
[10:01] mikko and all bindings are compiled against both
[10:01] mikko i can add people accounts later on as well
[10:01] mikko as soon as the setup is stable
[10:06] mikko lestrrat: how do i build it?
[10:06] lestrrat perl Makefile.PL; make; make test
[10:06] lestrrat ah, but you would need to install some Perl modules..
[10:08] lestrrat are you ok installing some Perl modules in your system? when you run perl Makefile.PL, it will tell you the required modules
[10:08] lestrrat let me know if you need help installing those modules
[10:12] mikko yes
[10:16] mikko lestrrat: how do i specify a custom libzmq path?
[10:17] lestrrat ZMQ_H=/path/to/include/zmq.h perl Makefile.PL
[10:17] lestrrat are the shared objects also in places where you need explicit specification?
[10:17] mikko yes
[10:17] mikko because i build multiple versions of libzmq
[10:18] lestrrat hold on, verifying ....
[10:21] lestrrat hmm, I guess I hadn't implemented it :/
[10:21] lestrrat Will let you know when you can do it.
[10:21] lestrrat sorry bout that
[10:22] mikko no problem
[10:22] mikko i'll add it when you got it
[10:31] lestrrat mikko: LIBS=-L/path/to/libs INCLUDES=/path/to/includes ZMQ_H=/path/to/zmq.h perl Makefile.PL
[10:31] lestrrat just pushed to master.
[10:34] omarkj morning all.
[10:44] mikko lestrrat: do i need ZMQ_H if i use INCLUDES?
[11:00] mikko lestrrat: i installed the required modules from cpan
[11:00] mikko but it still complains about them missing
[11:00] mikko do i need to do something else?
[11:01] mikko should INCLUDES be -I/path/to ?
[11:20] ptrb should I be cleaning up zmq_sockets, or in general doing anything more than zmq_term() at the end of my ZMQ lifecycle?
[11:21] mikko zmq_close(s);
[11:22] ptrb oi!
[11:25] ptrb that does resolve quite a few 'possibly lost' blocks, thanks :)
[11:27] mikko lestrrat: nevermind, there was a build failure among the output
[11:34] omarkj Hey guys. I'm having some problems with the zmq driver for erlang. It seems to eat up my memory until it comes crashing down with the error "Assertion failed: nbytes == sizeof (command_t) (signaler.cpp:284)"
[11:35] omarkj nbytes in this case is 0.
[11:38] mikko omarkj: that is a known issue
[11:38] mikko can you check the value of nbytes there?
[11:38] mikko (i think that was what sustrik was after the other day)
[11:39] omarkj Yup, it's zero at the time of failure.
[11:39] omarkj mikko: Yes, that's what he was after..
[11:43] mikko lestrrat: finally!
[11:43] mikko there were a couple of things that were missing
[11:43] mikko but now it seems to build
[11:43] mikko but tests get stuck
[11:44] mikko either t/002_socket.t ........... 1/? takes a lot of time or its frozen
[11:45] mikko yep, against maintenance of ZeroMQ everything works
[11:45] mikko that test is waiting on something against master
[11:52] mikko lestrrat:
[12:10] sustrik omarkj: thanks
[12:14] sustrik omarkj: does that happen during the shutdown on the socket/library?
[12:41] mikko hmm, i now got a script that starts hudson slave on a machine and executes builds assigned to that machine
[12:45] nisbus sustrik: this happens while in a tight send loop, see erlzmq issues:
[12:55] sustrik nisbus: looking at the erlang errors in the issue it almost looks like some kind of memory overwrite bug
[14:23] CIA-20 zeromq2: Steven McCoy master * rd62d721 / : Add amd64 to OpenPGM supported platforms
[14:23] CIA-20 zeromq2: Martin Lucina master * r965fb77 / : OpenPGM no longer requires pkg-config
[14:29] ptrb is zmq_send threadsafe?
[14:30] sustrik no
[14:35] ptrb thanks
[14:58] mikko lestrrat: the perl one is def stuck
[14:58] mikko lestrrat: its still building (several hours now)
[15:21] mikko bgranger: did something happen in the past 12 hours for pyzmq?
[15:22] mikko i added builds last night (GMT) and this morning they were broken
[15:22] bgranger Yes, what are you seeing...
[15:22] mikko bgranger:
[15:22] mikko pyzmq went unstable this morning
[15:22] mikko
[15:23] mikko thats the output
[15:24] bgranger Ahh, we are no longer including the .c files in the git repo. We are including those in the released versions though.
[15:24] bgranger To fix this:
[15:25] bgranger Download and install the latest version of Cython (>= 0.13)
[15:25] bgranger Then do the following before building and installing
[15:25] bgranger python cython
[15:25] bgranger The .c files change often and were making the repo massive (they are autogenerated)
[15:32] mikko looks like debian testing doesnt have new enough cython
[15:34] CIA-20 jzmq: Gonzalo Diethelm master * rbe0aef9 / src/Socket.cpp : Handle 32 and 64 bit [gs]etsockopt options with specific code for each size.
[15:34] CIA-20 jzmq: Gonzalo Diethelm master * rda2e47d / src/Socket.cpp : Left out changes for setsockopt.
[15:38] bgranger Yes Cython 0.13 is quite new
[15:38] bgranger I can help you get cython installed...
[15:40] bgranger mikko: Also after doing the build, you could do python test to run the test suite
[15:40] mikko
[15:40] mikko building now
[15:40] mikko sure
[15:41] mikko ill add the test suite step
[15:41] mikko build succeeds now
[15:41] mikko let me try the tests
[15:42] mikko running test
[15:42] mikko ...Assertion failed: sessions.empty () (socket_base.cpp:117)
[15:42] mikko
[15:44] mikko bgranger: how do i "make clean" ?
[15:44] mikko python clean ?
[15:44] bgranger Yes, that should clean the .so files
[15:46] mikko cool
[15:46] mikko now the build works
[15:46] mikko but tests fail
[15:46] mikko ...Assertion failed: sessions.empty () (socket_base.cpp:117)
[15:46] bgranger What version of zeromq are you building against?
[15:46] mikko master and maint branches
[15:46] mikko master is the one that fails now
[15:46] bgranger We don't have any code that is tested with zeromq master.
[15:47] bgranger Right now pyzmq master is a 2.0.9 dev build, so I would build it against 2.0.9 or zeromq maint
[15:47] bgranger After we release 2.0.9 we will create a new maint branch for 2.0.x and then move master to following zeromq master...
[15:48] mikko ok
[15:48] mikko at the moment the automated build runs your master against zeromq master and maint
[15:48] mikko i can change that if needed
[15:48] mikko i need to test buildbot at some point as well
[15:53] mikko so, jzmq against zeromq maintenance is broken atm
[15:53] mikko but i guess thats a pretty easy fix when the details have been agreed
[15:54] mikko after the*
[15:54] bgranger ok
[15:54] bgranger Is the build bot site linked to from the main site?
[15:54] mikko not yet
[15:54] mikko it's just a prototype at this point
[15:56] bgranger OK, I will be back in about an hour...
[16:14] bgranger mikko: ok I am back around now...
[16:15] mikko howdy
[16:21] sustrik mikko, bgranger: it looks like you've found a bug in master
[16:21] sustrik do we know which test has failed?
[16:21] bgranger Very likely
[16:21] bgranger It will take a bit of digging because it is failing in a C++ assert.
[16:22] sustrik the output says:
[16:22] sustrik + python test
[16:22] sustrik running test
[16:22] sustrik ...Assertion failed: sessions.empty () (socket_base.cpp:117)
[16:22] sustrik Aborted
[16:22] mikko sustrik: the setup is already paying off!
[16:22] bgranger We will probably get to that when we port pyzmq to the 2.1 stuff.
[16:22] mikko :)
[16:22] sustrik if there are multiple tests, would it be possible to print out the test name before running it?
[16:22] sustrik :)
[16:22] bgranger Also having a test suite like that of pyzmq helps a lot as well
[16:22] sustrik definitely
[16:22] bgranger Yes, we can do that
[16:23] mikko perl, python and php have tests which i know how to run
[16:23] mikko jzmq might have but i dont know yet how to run them
[16:23] bgranger That is great - it starts to give pretty good test coverage
[16:23] mikko perl also hangs on master
[16:23] mikko the tests
[16:23] mikko noticed that the build took ~5 hours
[16:24] mikko which is a bit above the average for a binding
[16:24] sustrik :)
[16:24] bgranger A bit...
[16:24] bgranger
[16:25] bgranger Ideally whenever a bug is fixed in zeromq itself, we could add a test to pyzmq that tests the fix...
[16:25] sustrik thanks!
[16:25] bgranger But I am not following zeromq that closely
[16:26] sustrik there's a test suite in 0mq core itself
[16:26] sustrik not many tests there yet
[16:26] mikko sustrik: is it make test ?
[16:26] sustrik make check
[16:26] mikko ill add that to builds as well
[16:26] sustrik ack
[16:27] sustrik mikko: wait a sec
[16:27] sustrik it's only added to the master
[16:27] sustrik the maint has no tests
[16:27] mikko ok
[16:28] mikko added the build step for master
[16:28] mikko
[16:28] mikko let's see how it goes
[16:30] mikko sustrik:
[16:30] mikko what do you think about this for close semantics?
[16:30] mikko too far away from posixy behavior?
[16:31] mikko zmq_close would return the number of messages in flight
[16:31] mikko and would let the user decide what to do with timeouts etc
[16:31] sustrik people want it simple
[16:32] sustrik they don't want to even care about shutdown details
[16:32] sustrik it should "just work"
[16:32] sustrik i think POSIX sockets got this part right
[16:32] sustrik i.e. don't block on close
[16:33] sustrik send data after the close
[16:33] sustrik allow users to tweak close timeouts using SO_LINGER
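[Editor's note] For readers unfamiliar with the POSIX behaviour sustrik refers to, here is a minimal sketch of setting SO_LINGER on an ordinary TCP socket with Python's standard `socket` module. It is not ZeroMQ code. It assumes a POSIX platform where `struct linger` is two C ints (`l_onoff`, `l_linger`); the helper names `set_linger`, `get_linger` and `demo` are made up for this example.

```python
import socket
import struct

LINGER_FMT = "ii"  # assumed layout of struct linger: l_onoff, l_linger


def set_linger(sock, seconds):
    """Enable SO_LINGER so close() may wait up to `seconds` for unsent data."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack(LINGER_FMT, 1, seconds))


def get_linger(sock):
    """Return (enabled, seconds) as currently configured on the socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(LINGER_FMT))
    onoff, seconds = struct.unpack(LINGER_FMT, raw)
    return bool(onoff), seconds


def demo():
    # An unconnected socket is enough to show the option round-trip.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        set_linger(s, 5)
        return get_linger(s)
```

Without the option, close() returns immediately and the kernel keeps trying to deliver queued data in the background, which is the "don't block on close, send data after the close" default described above.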
[16:33] mikko /home/hudson/.hudson/jobs/ZeroMQ2_master/workspace/tests/.libs/lt-test_shutdown_stress: error while loading shared libraries: cannot open shared object
[16:34] mikko no -rpath?
[16:34] mikko i assume
[16:34] sustrik i dimly recall tweaking something with rpath very long time ago
[16:35] sustrik anyway, mato is maintaining the build system
[16:35] sustrik so just post the output on the mailing list or so
[16:37] mikko brb
[18:34] mikko mato: here?
[19:11] mikko
[19:11] mikko this fixes the tests for me
[19:17] RyanSchneider hey guys, anyone have a minute to answer some questions about using zmq in a problem I'm working on involving 50+ nodes?
[19:23] mikko RyanSchneider: sure
[19:25] RyanSchneider So here's the problem: I'm doing load testing on our game, and need to launch load testing apps across 50+ EC2 instances, with each instance hosting ~5 processes.
[19:26] RyanSchneider I want the processes on each instance to talk to an aggregator on the instance, and have each of the 50 aggregators talk to a 'master'.
[19:26] RyanSchneider The master knows the name/password to use for each load tester process. I want to hand out each account once at startup, and replace it if a process goes down.
[19:27] RyanSchneider So I'm thinking the startup would go something like this:
[19:27] RyanSchneider - Aggregator launches, and sends 5 'Request an Account' messages to Master
[19:28] RyanSchneider - For each reply, it spawns a LoadTester process configured with that account.
[19:29] RyanSchneider - The aggregator also acts as a Device of some sort between LoadTester and Master so LoadTester can keep Master up to date with its status.
[19:30] RyanSchneider The part I'm struggling with is how to determine that a LoadTester has 'gone down' (e.g. crashed) and to reclaim that account for another Aggregator/LoadTester process to use.
[19:30] RyanSchneider Should I use some sort of 'keep-alive'/timestamp messages?
[19:31] mikko do you want to check for both Aggregator and LoadTester to go down?
[19:31] mikko or are you relying on the Aggregator to stay up?
[19:32] RyanSchneider Hmm, good question. While under development I'd expect Aggregators to occasionally crash, so I guess I want to handle that too. :)
[19:33] RyanSchneider But my main concern is that I have X LoadTesters running with Accounts A1...AX.
[19:34] mikko hmm
[19:35] mikko i guess you need some sort of heartbeat in two places
[19:35] mikko from Master -> heartbeat -> Aggregator -> heartbeat -> LoadTester
[19:36] mikko currently there is no way to detect a socket disconnection in the code
[19:36] RyanSchneider Yeah that's about what I was thinking. What socket types (e.g. REP/REQ) work best for heartbeats?
[19:37] mikko probably yes
[19:37] mikko it _should_ be something as simple as "ping? pong!"
[19:38] mikko why do you have loadtesters in separate processes from the aggregator?
[19:38] mikko wouldn't it be easier to manage a pool of threads?
[19:38] mikko (depending on language of course)
[19:39] RyanSchneider The LoadTester is a stand-alone process built by someone else. It's basically a headless version of our game.
[19:39] RyanSchneider Basically I run one LoadTester for each core on the EC2 instance.
[19:40] mikko ah
[19:40] RyanSchneider I can add code to LoadTester, but just want to add simple things like 'Send a heartbeat with these stats' rather than add a bunch of Aggregator-ish logic.
[19:41] mikko makes sense
[19:41] RyanSchneider But LoadTester is a C++ app, and I'd prefer to write the MasterAggregator using PyZMQ
[19:41] RyanSchneider er But -> Plus
[19:41] RyanSchneider So I guess I can do it like this:
[19:42] RyanSchneider - Every X seconds, LoadTester sends status packet to Aggregator.
[19:42] RyanSchneider - Aggregator assumes LoadTester hung if status packet isn't received in Y seconds.
[19:42] RyanSchneider - Aggregator sends status packet to Master every A seconds.
[19:43] RyanSchneider - Master assumes Aggregator is hung if status packet isn't received in B seconds.
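[Editor's note] The timeout scheme in the four steps above could be sketched roughly as follows. This is plain Python with no messaging layer, and `HeartbeatMonitor` is a hypothetical helper, not part of pyzmq. The same class would serve both levels: the Aggregator watching LoadTesters (timeout Y) and the Master watching Aggregators (timeout B). The clock is injectable so the logic can be exercised without sleeping.

```python
import time


class HeartbeatMonitor:
    """Tracks the last status packet per peer and reports peers that went quiet."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout   # seconds of silence before a peer counts as hung
        self.clock = clock       # injectable time source, handy for testing
        self.last_seen = {}      # peer id -> time of the last status packet

    def beat(self, peer):
        """Call this whenever a status packet from `peer` arrives."""
        self.last_seen[peer] = self.clock()

    def expired(self):
        """Return the sorted ids of peers silent for longer than the timeout."""
        now = self.clock()
        return sorted(p for p, t in self.last_seen.items()
                      if now - t > self.timeout)

    def forget(self, peer):
        """Stop tracking a peer, e.g. after its account has been reclaimed."""
        self.last_seen.pop(peer, None)
```

A receive loop would call `beat()` for every status packet and poll `expired()` periodically; choosing the timeout as a few multiples of the send interval avoids declaring a peer hung after a single late packet.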
[19:43] RyanSchneider So now the next big issue is how do I send A1...AX accounts one to each LoadTester?
[19:44] mikko just a sec
[19:44] mikko ill draw this down
[19:44] RyanSchneider cool, I just logged into creately to do the same :)
[19:45] mikko so initial thoughts
[19:46] mikko does master know about all aggregators?
[19:46] mikko i think it would be better if Aggregators would know about master
[19:46] mikko and register themselves to the master
[19:47] mikko and if you got enough LoadTesters running already the Aggregator would be in "idle queue"
[19:48] RyanSchneider yeah, I think that makes sense. Master binds, Aggregators connect. Likewise I think LoadTesters would connect() to a zmq socket that the Aggregator bind()s.
[19:49] RyanSchneider We can also assume that we can always start-over with a clean slate (that is, when we do a test, Master and all EC2 instances can be restarted)
[19:50] RyanSchneider Doh I forgot I have a meeting in 10 minutes :\
[19:51] RyanSchneider So I think the startup process is more or less like:
[19:51] RyanSchneider - Aggregator connect()s to Master.
[19:52] RyanSchneider - Aggregator sends X 'Request Username' packets, where X is number of LoadTesters it wants to spawn.
[19:53] RyanSchneider - Master replies to each request when a Username U1..UX is available.
[19:53] RyanSchneider - On reply, Aggregator spawns LoadTester and assigns it Username Ux.
[19:53] mikko When Master gives out a username it would also give an address where to report periodically
[19:54] RyanSchneider Would that address be used by Aggregator or directly by LoadTester?
[19:54] RyanSchneider I'm guessing Aggregator
[19:55] mikko i would say yes
[19:55] mikko let me think for a bit
[19:56] mikko master is not likely to go down?
[19:56] mikko i guess thats handled on the aggregators by stopping the loadtesters
[19:57] mikko so the master could bind one port for status reports
[19:57] RyanSchneider Correct. If Master goes down, then I'd say it's safe to assume the entire test is cancelled.
[19:57] mikko aggregator reports two things: that it's up and which loadtesters are active and sending
[19:57] RyanSchneider Grr, got my meeting now, I'll be back.. later, maybe not for a couple hours..
[19:57] RyanSchneider Thanks a ton mikko! :)
[19:57] mikko i might be in bed by then
[19:57] rgl humm mongrel2 has similar requirements. you should check it out RyanSchneider
[19:57] mikko GMT
[19:58] mikko mongrel2 uses two sockets for the communication iirc?
[19:58] rgl yes.
[19:58] RyanSchneider Thanks rgl I'll do that.
[19:58] RyanSchneider bbl. I'll finish my graph at some point and post a link here as well.
[19:58] rgl a PUSH/PULL and a PUB/SUB
[19:59] mikko PUB/SUB would be good for status reports i guess
[20:03] rgl I've got something that is itching me... normally sockets have a high watermark line (or backlog I guess). for example, when using a PUSH/PULL and a HWM of 2, I can push two messages before being blocked waiting for more space. correct?
[20:07] mikko yes
[20:07] mikko well, you can push more
[20:07] mikko but 2 can be pending in the io-thread
[20:07] mikko as far as i understand
[20:08] mikko if someone is consuming on the other side with the rate that you are sending then it wouldnt block
[20:11] rgl my itch is, what happens if the consuming side dies? the messages that were already queued in the consuming side seem to get lost in the void...
[20:11] mikko yes, that would be the case
[20:11] rgl or, if the consuming side wants to correctly close the socket?
[20:11] rgl I didn't find a way to: 1. halclose the socket. 2. consume the queue messages. :|
[20:12] mikko as far as i know the guarantee goes "when it reaches the consumer it's considered to be delivered"
[20:12] rgl err halfclose
[20:12] rgl that's what I understood too :(
[20:12] mikko you could use REQ/REP and answer with "send me more" or "dont send anymore"
[20:12] mikko adds a bit of overhead
[20:13] mikko or have a second port open on the sender where the receiver can tell it's had enough
[20:17] rgl humm how would that work?
[20:17] rgl you'd also include the sender address in every message?
[20:18] rgl (btw, this is what mongrel2 does for the response on the PUB/SUB socket)
[20:28] mikko rgl: yes, it would probably include sender identifier
[20:29] mikko unless you have something like +1 port or similar
[20:29] mikko depends on the case i guess
[20:30] rgl humm but how do you know how sent the message? that is, to retrieve its source port address?
[20:31] rgl errr s,how,who,
[20:31] mikko peer id 1 -> port + 1
[20:31] mikko peer id 2 -> port + 2
[20:31] mikko but that would possibly require a lot of port
[20:31] mikko s
[20:32] rgl how to get the peer id?
[20:32] mikko something you would generate
[20:32] rgl this is mutation to another doubt of mine *G*
[20:32] rgl mutating!
[20:33] rgl but would also include in the message?
[20:52] neale I'm looking at building 0MQ on Linux on System z and enabling openpgm. The issue is that x86 etc. use rdtsc to get the timer. For system z I have the option of using gettimeofday (CONFIG_HAVE_GETTIMEOFDAY) or adding the equivalent of rdtsc (trivial). However, in the root of zeromq2 has an explicit -DCONFIG_HAVE_TSC. Could it use a test instead in to establish whether there is a TSC or that gettimeofday() is available?
[21:26] RyanSchneider mikko or anyone else interested, I think I'm basically going to go with a 'folded' PUSH/PULL topology like 'Divide and Conquer' in the Guide, but the Master is both the Ventilator and the Sink (using separate sockets of course)
[21:28] RyanSchneider Sigh.. I just built a nice diagram in Creately, but wasn't logged in, and lost it when I logged in..
[21:28] RyanSchneider lemme recreate it real quick.
[21:42] RyanSchneider So here's my plan:
[22:29] lestrrat mikko: Thanks for the setup I'll look into it!
[22:56] RyanSchneider So I think I need to use REQ/REP instead of PUSH/PULL for the config data. I basically want the Master to have a queue of available accounts, and pop off and reply with the next one on an incoming request.
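[Editor's note] The Master-side bookkeeping described here (a queue of available accounts, popped on each request and refilled when a LoadTester is declared dead) might look like the sketch below. It is plain Python without any socket code; `AccountPool` and its method names are hypothetical. In a REQ/REP setup the Master would call `request()` for each incoming message and send the result back as the reply.

```python
from collections import deque


class AccountPool:
    """Hands out each account at most once until it is reclaimed."""

    def __init__(self, accounts):
        self.available = deque(accounts)  # accounts not currently assigned
        self.in_use = {}                  # account -> owner (e.g. aggregator id)

    def request(self, owner):
        """Pop the next free account for `owner`, or None if none is free."""
        if not self.available:
            return None
        account = self.available.popleft()
        self.in_use[account] = owner
        return account

    def reclaim(self, account):
        """Return an account to the pool, e.g. after its LoadTester crashed."""
        if account in self.in_use:
            del self.in_use[account]
            self.available.append(account)
```

Because a REP socket has to answer every request, a `None` result would need to be turned into an explicit "no account free, retry later" reply rather than silence.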