Saturday August 21, 2010

[Time] Name Message
[01:14] rbraley say, what does zero-copy mean for message passing?
[01:52] zedas sustrik: hey so i've found a potential bug in zmq_poll
[01:52] zedas sustrik: i'll work up a fix, but you go into a permanent loop when poll sets EINTR errno
[01:52] zedas it still seems to work, but it's just calling poll like mad pegging that thread at 100%
[01:53] zedas it's also why 0mq sockets seem to "die". we traced it down in mongrel2 to the line in zmq.cpp that does the continue on EINTR
[01:55] travlr rbraley:
[02:10] rbraley check the tech talk
[02:11] rbraley I want to see if I can do this with C++ and ZeroMQ
[02:28] travlr rbraley: after reading the first sentence, i saw that's the concept you've been reaching for with zmq
[02:28] rbraley yes!
[02:28] travlr use the source luke... use the source ... lol
[02:30] travlr you should start a new thread on zeromq-dev mail list for this topic... you'll get some interest and suggestions to boot.
[02:32] rbraley travlr, since this technique requires code transformation I am thinking about either seeing if I can do it with the preprocessor (kinda doubtful) or maybe extending Qt's meta-object compiler to support the Actor Model
[02:33] travlr ok,, here's a stupid question i've been thinking about.. this is neat stuff, but are you sure you need it?
[02:33] travlr btw the actor model is very close to fbp itself
[02:35] rbraley basically the concept for an ultra-lightweight process a la erlang is just like process isolation for OS processes, they don't need to know about the other ones at all and shouldn't, unless you send messages. But they are much much smaller than threads. That means they can fail in isolation and your program doesn't die unless you tell it to die.
[02:37] rbraley yes I am sure I need it. My game architecture requires that every entity in the game be an actor. That means that threads can't cut it unless I only want 1000 things in my game.
[02:38] travlr and individual processes (executables) don't have zero-copy so that scratches that idea i suppose.
[02:38] jsimmons why should you have an individual thread per actor?
[02:39] rbraley and they're huge and would mess up your system process monitor, like $ ps -eaf or $ top or some gui equivalent
[02:39] travlr true, with inproc maybe you don't need threads
[02:40] rbraley messages aren't the only concurrent things that happen. Actors still need to do stuff. And they shouldn't have to wait for each other unnecessarily
[02:40] jsimmons aside from the difficulty in implementation, having actors individually threaded is horribly inefficient
[02:41] travlr i see. you need light weighted threads ;)
[02:41] rbraley I do want to multiplex actors onto different system threads, one thread per core.
[02:41] travlr you'll do the gaming community a big service if you do implement it in zmq
[02:42] jsimmons you mean co-routines or cooperative multithreading
[02:42] jsimmons I presume
[02:43] rbraley jsimmons, it is possible to do preemptive scheduling of ultra-lightweight threads (as I think only erlang does right now). But yeah I would transform functions into continuation-passing style
[02:45] rbraley and each actor would get its own stack and context switch between other actors on the same thread entirely in user space making it around 3 orders of magnitude faster than context switching between user space and kernel space that threads do.
[02:46] rbraley sorry, the first occurrence of context switch in my previous statement was used incorrectly
[02:46] jsimmons my point is, are you really going to make it faster than say brute forcing or executing in blocks
[02:47] rbraley brute forcing?
[02:47] jsimmons as in for actor in actors do actor.update()
[02:47] jsimmons I should say 'buckets' rather than blocks
[02:48] travlr you'd probably save yourself hassle if you implemented in java with kilim, but that would have to mean coding in java.. eeww :P
[02:48] rbraley within a single thread it will do round robin or poll() or something
[02:49] rbraley but there will be n threads for n cores so it will scale as well as whatever your hardware can handle
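The round-robin idea rbraley describes for a single thread can be sketched with cooperative coroutines. This is only an illustration in Python, not rbraley's design: the `actor` and `run_round_robin` names are hypothetical, each actor is a generator that yields after one unit of work, and nothing here involves ZeroMQ or multiple threads.

```python
from collections import deque


def actor(name, steps, log):
    """Hypothetical actor: does one unit of work, then yields control."""
    for i in range(steps):
        log.append((name, i))
        yield  # hand control back to the scheduler


def run_round_robin(actors):
    """Run generator-based actors in turn until all of them have finished."""
    ready = deque(actors)
    while ready:
        current = ready.popleft()
        try:
            next(current)  # resume the actor until its next yield
        except StopIteration:
            continue  # actor finished; do not reschedule it
        ready.append(current)  # still alive, goes to the back of the queue


log = []
run_round_robin([actor("a", 2, log), actor("b", 3, log)])
print(log)
```

One such scheduler per system thread would roughly match the "n threads for n cores" layout mentioned above, with message passing between threads left to whatever transport is chosen.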
[02:49] rbraley and it doesn't even need to be one machine, with zeromq.
[02:50] travlr have you studied zmq source yet?
[02:50] rbraley but this may also work with GPUs
[02:50] rbraley travlr, yes but not since we talked
[02:51] travlr yeah what about simply using cuda for gpu or similar .... 10-100x
[02:51] rbraley well I was hoping to migrate actors off the threads to the GPU if possible
[02:52] jsimmons gpu will suck for actor logic
[02:52] travlr yeah i know i was missing something with that thought
[02:52] jsimmons unless you have super super simple actors
[02:52] rbraley all actors tend to be super simple jsimmons
[02:53] jsimmons by actors do you mean something like boids, then?
[02:54] rbraley no, (of course they would be good for agent-based modeling and emergent behavior) actors are in the context of the Actor Model of concurrency
[02:54] rbraley
[02:55] jsimmons because as soon as you start branching on the gpu you take a massive performance hit
[02:55] rbraley ah
[02:55] rbraley yes well just make two actors then
[02:56] rbraley one where the branch succeeds and one where the branch fails
[02:56] rbraley then leave choosing which computation to keep back at the CPU
[02:57] jsimmons ah I've been oversimplifying 'Actor'
[02:58] rbraley it is hard to oversimplify actors, since they are a single primitive that is computationally universal :P
[02:58] rbraley even the lambda calculus has two primitives
[03:01] rbraley basically: the real world is concurrent, how can we model that in an optimal way? -> actors
[03:01] jsimmons I mean I was imposing my own meaning from traditional game engines
[03:02] rbraley yeah I am doing something that is either really stupid or 10-30 years ahead of its time :P so I wouldn't expect any game engine to bear a resemblance to mine.
[03:02] rbraley although some come close
[03:03] travlr nothing wrong with trying to "touch" the hardware from my point of view ;-)
[03:04] rbraley travlr, I want to make sexy times with the hardware and blow current engines out of the water, at least from a scalability/concurrency standpoint
[03:05] travlr no doubt.. kudos imo
[03:06] rbraley thanks, it's frustrating that I have to invent so much to realize my vision, but hey, at least there's 0MQ, and Protocol Buffers.
[03:07] jsimmons I'll be interested if you make it work :D
[03:07] travlr i like paying attention to your vision.. as i apply some of it in my mind to my needs
[03:07] rbraley jsimmons, so will I :D
[03:09] rbraley I just need to invent the 0MQ of actor model frameworks.
[03:10] rbraley well that is a recursive definition ;)
[03:10] rbraley I need ultra-lightweight processes for C++
[03:11] travlr start a thread on the ml... i want to follow along ;)
[03:12] travlr i'm sure that if implemented as you describe it will find plenty of use cases outside of a game engine
[03:12] rbraley of course!
[03:12] travlr like mine
[03:13] travlr :-D
[03:13] rbraley hehe
[03:13] rbraley this kilim is cool, but it doesn't adhere to a unix philosophy
[03:14] travlr for some reason anything java just rubs me the wrong way.. i don't even want to begin to play with it.. i don't know why but that's the way it is
[03:14] rbraley travlr, that's called bigotry :P
[03:14] travlr lol
[03:15] jsimmons as far as I'm concerned, at least it's not C++ :P
[03:15] travlr i've learned to love c++, thanks to Qt
[03:15] rbraley
[03:15] jsimmons I kinda despise Qt
[03:16] rbraley any rubyists may enjoy that link
[03:16] travlr jsimmons: yes i understand.. to each their own..
[03:16] jsimmons Gtk supremacy travlr :P
[03:16] rbraley jsimmons, whaaat? I don't understand.
[03:16] travlr oh, so you want to start a flame war now... lol
[03:16] rbraley oh Gtk, ok
[03:17] jsimmons Actually I'm not a big fan of the Gtk graphical toolkit bit, but I like GLib, especially GObject introspection. :D
[03:18] rbraley I do too, actually have you heard of Vala, jsimmons?
[03:18] jsimmons Yeah rbraley, I've even patched some of the vapis :P
[03:18] rbraley well played
[03:19] rbraley Gtk is problematic, but Glib is really neat.
[03:19] jsimmons Have you heard of Clutter/Mx rbraley?
[03:20] rbraley is that the opengl gui?
[03:20] jsimmons Yeah clutter is an opengl scene graph/animation framework kinda thing, and mx is a ui library that uses clutter.
[03:21] rbraley nice
[03:21] jsimmons It's used in stuff like MeeGo
[03:21] rbraley cool
[03:23] jsimmons but enough tomfoolery for now, I think I'll go stab some people in Oblivion. :D good luck with your multi-tasking madness.
[03:25] rbraley jsimmons, the game I am working on is very similar to a TES title
[03:25] rbraley jsimmons,
[03:27] jsimmons bookmarked
[03:45] rbraley travlr, this might just do the trick
[03:48] travlr very interesting.. get it in zmq now
[03:50] travlr this stuff still confuses me a bit.. i'll have to study it more
[03:53] rbraley they have their own message passing implementation, which seems to be just within the same address space of the process and can't scale to multiple machines
[03:53] rbraley like 0MQ can
[03:53] travlr right, hence my call to zmq impl
[03:54] travlr only provides part of the story
[03:54] travlr are you gonna do it?
[03:55] rbraley depends on if it is written with hard dependencies on the way they do message passing or not
[03:56] travlr use their ideas. what's its license?
[03:58] travlr Creative Commons Attribution 3.0 License
[03:59] rbraley sweet
[04:01] travlr i say start a thread on the ml and get some interest going
[04:05] travlr this is a new library.. you should contact the dev(s) there and convince them to work with you on zmq impl
[04:05] travlr not new but very active
[04:05] rbraley travlr, never used a mailinglist before. Always google groups or fora
[04:06] travlr only 503 threads
[04:06] travlr is that enough for you
[04:19] rbraley nevar!
[04:21] travlr does dabbleboard have automagic spacing for flow charts?
[04:37] rbraley I don't know I just found it on the spot because you wanted something like that
[04:37] travlr k
[05:20] sustrik zedas: yes, the code is messy there
[05:20] sustrik zmq_poll will get rewritten in new version
[05:20] sustrik in the meantime, if you get patch, i'll apply it
[05:48] zedas sustrik: cool it'll be against 2.0.7
[05:49] sustrik sure
[05:49] sustrik thanks
[05:51] zedas sustrik: ah i think i found it: // Wait for events. Ignore interrupts if there's infinite timeout
[05:51] zedas we have infinite timeout
[05:53] sustrik let me see
[05:54] sustrik hm, does it result in infinite loop?
[05:54] sustrik once the signal is processed, the loop should exit afaics
[05:55] sustrik what signal are you testing with?
[06:05] zedas well, if you have a -1 timeout, then EINTR happens, and then it loops, poll runs, exits immediately with EINTR, repeat
[06:05] zedas so not sure why poll thinks it has an EINTR condition again
[06:05] sustrik maybe the signal is unhandled
[06:05] zedas we've got SIGINT, SIGHUP, SIGQUIT, but none of those fire during this loop
[06:05] sustrik so it stays in the signal queue
[06:06] zedas could it be socket close for some reason?
[06:06] zedas let me try a few signal catchers....see what i'm getting.
[06:06] sustrik SIGPIPE?
[06:06] zedas yeah but i can't see why...i pretty much block those but lemme see.
[06:07] sustrik use gdb
[06:07] sustrik there's an option to stop on a signal
[06:07] sustrik iirc
[06:07] sustrik 'handle' command i think
[06:08] zedas ah yeah
[06:10] zedas SIGPIPE is the most frustrating signal ever
[06:10] zedas whoever thought it should be part of sockets should be shot
[06:10] sustrik :)
[06:26] zedas sustrik: ok looks like SIGPIPE was the culprit. i set it to SIG_IGN and haven't had the problem yet.
[06:26] zedas i'll keep you posted
[06:27] sustrik zedas: thanks
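For readers unfamiliar with the SIG_IGN step zedas mentions, here is a minimal sketch of what ignoring SIGPIPE does, assuming a POSIX system. It is written in Python for brevity (the code under discussion is C and C++), and `write_to_closed_pipe` is a hypothetical helper. CPython already ignores SIGPIPE at startup, so the explicit call only makes the setting visible. As the log goes on to show, this did not turn out to be the fix for the EINTR loop.

```python
import os
import signal

# Ignore SIGPIPE so that writing to a closed pipe or socket reports an
# error (EPIPE) instead of terminating the process.
signal.signal(signal.SIGPIPE, signal.SIG_IGN)


def write_to_closed_pipe():
    """Write to a pipe whose read end is closed and report what happens."""
    read_end, write_end = os.pipe()
    os.close(read_end)
    try:
        os.write(write_end, b"x")
        return "written"
    except BrokenPipeError:
        # With SIGPIPE ignored, the failure surfaces as EPIPE.
        return "EPIPE"
    finally:
        os.close(write_end)


print(write_to_closed_pipe())
```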
[06:38] zedas sustrik: nope
[06:39] zedas i have every possible signal blocked and it's still setting EINTR
[06:41] sustrik what does gdb say?
[06:41] sustrik handle all print
[07:29] zedas sustrik: well in the process of adding a timer, i found out that zmq_poll has a timeout/1000 in it, so it gives different poll semantics on timeout
[07:30] zedas which was causing mongrel2 to thrash if there was a timer, so fixed that, now seeing if having a timeout causes it or not
[07:32] sustrik are you saying the signal you've seen was the timer signal?
[07:36] zedas nope, i'm saying the timeout parameter to zmq_poll is different from real poll
[07:36] zedas it's /1000, so if you pass in ms it's way off
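The unit mismatch zedas describes can be guarded against with a small conversion at the call site. This sketch assumes, as the /1000 suggests, that zmq_poll in this version takes its timeout in microseconds while poll() takes milliseconds. The helper name is hypothetical, and other ZeroMQ versions may use different units.

```python
def ms_to_zmq_poll_timeout(timeout_ms):
    """Convert a poll()-style millisecond timeout to microseconds.

    Assumption: the zmq_poll being called divides its timeout by 1000
    before handing it to poll(), i.e. it expects microseconds.
    A negative value means "wait forever" and is passed through as -1.
    """
    if timeout_ms < 0:
        return -1
    return timeout_ms * 1000


print(ms_to_zmq_poll_timeout(250))
```

Passing a millisecond value straight through would make the wait 1000 times shorter than intended, which fits the thrashing described above.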
[07:37] zedas i added a timeout to the zmq_poll so i could see the effect of it on those branches in zmq_poll that continue if there's EINTR
[07:37] zedas but, even with a solid timeout it still pegs the CPU 100% with EINTR when a handler dies
[07:37] zedas and, killing the handler makes it go back to normal
[07:37] zedas so, more digging.
[07:39] jsimmons so you can produce that bug reliably now zedas?
[07:40] zedas yep, all the time.
[07:40] zedas i just hit handles with 0mq messages, wait a bit, 100% cpu. kill the dead handler, 0%
[07:40] zedas it also pulls out another bug so i'll just go fix that for now, this is kind of wearing me down.
[08:12] sustrik zedas: hm, would returning EINTR from zmq_poll instead of looping help?
[08:13] sustrik however, if you handle the EINTR by simply calling zmq_poll again, it'll behave exactly as it does now
[08:15] sustrik btw, what's "the handler"
[08:15] sustrik signal handler?
[08:15] sustrik or the network peer?
[09:45] jsimmons he was talking about a handler in mongrel2 sustrik, which is just an entity that receives and sends zmq messages to/from the mongrel2 server. He's also gone to bed.
[11:32] pieterh sustrik: hi
[11:33] pieterh random question, I'm writing a relay example (step1->step2->step3)
[11:33] pieterh to demonstrate inter-thread signalling
[11:46] CIA-20 zeromq2: Pieter Hintjens master * rc52d1f2 / doc/zmq_recv.txt : Fixed example for multipart zmq_recv() -
[11:52] pieterh Hah... do NOT use 'socket' in example code
[11:52] pieterh it compiles even if you don't define a variable called 'socket'
[13:47] CIA-20 zeromq2: Pieter Hintjens master * r2b2accb / doc/zmq_recv.txt : Added calls to zmq_msg_close in examples -
[17:43] sustrik jsimmons: thanks
[17:43] kshah I've used homebrew to install zeromq, but I can not get the rubygem (not the ffi one) to build even when I specify the zmq-dir
[17:44] kshah I've had this issue for a while, and I can't seem to hack at the extconf.rb file provided in the gem either to solve the problem
[17:44] sustrik kshah: what's the problem?
[17:45] kshah sustrik: when I provide the --with-zmq-dir option on "sudo gem install zmq" it can't find the necessary libraries
[17:45] kshah and prints back "extconf.rb:25: Couldn't find zmq library. try setting --with-zmq-dir=<path> to tell me where it is. (RuntimeError)"
[17:46] sustrik what's the OS?
[17:46] kshah OSX 10.6
[17:46] sustrik maybe a problem with .so vs. .dylib?
[17:47] kshah sustrik: not sure, are you familiar with homebrew btw?
[17:48] sustrik i haven't the slightest idea what it is :)
[17:48] kshah it's quickly replacing macports as the package manager of choice for OSX... I think its formula for installing zeromq, however, is not compatible
[17:48] sustrik but OSX seems to be a bit tricky
[17:48] sustrik because the dynamic libraries tend to be named something.dylib
[17:48] sustrik as opposed to .so on Linux
[17:49] kshah this may mean something to you:
[17:49] sustrik no, sorry
[17:50] kshah I've been using dynamic languages for most of my life/career, so .dylib and .so stuff is usually over my head :/
[17:50] sustrik can you check the filename of your zmq library?
[17:50] kshah yes, I'll print the tree in a pastebin
[17:52] kshah
[17:52] kshah that is in /usr/local/Cellar/zeromq/2.0.7/
[17:52] sustrik libzmq.dylib
[17:52] sustrik so i would say, your extconf.rb is checking for
[17:52] sustrik and doesn't find it because there's no such file
[17:52] sustrik instead there's libzmq.dylib :|
[17:53] sustrik no idea how's that solved in ruby world
[17:53] kshah well, the extconf.rb is using the mkmf library, I'll read up on that
[17:54] sustrik ok
[17:54] kshah I don't suppose I can pass a flag to specify the location of that lib
[17:54] sustrik i assume the location is ok
[17:54] sustrik the problem is that filename is different on OSX
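The naming difference sustrik points to can be made concrete with a small sketch. It is illustrative Python and not what extconf.rb or mkmf does. The function name is hypothetical, and the platform strings follow the values Python's `sys.platform` uses.

```python
def shared_library_filename(name, platform):
    """Return the conventional shared-library filename for a platform.

    `platform` uses sys.platform-style strings ("darwin", "linux", ...).
    Only the two cases discussed in the log are covered.
    """
    if platform == "darwin":
        return f"lib{name}.dylib"  # OS X convention
    if platform.startswith("linux"):
        return f"lib{name}.so"  # Linux convention
    raise ValueError(f"no naming rule known for platform {platform!r}")


print(shared_library_filename("zmq", "darwin"))
print(shared_library_filename("zmq", "linux"))
```

A build script that hardcodes one of these names will fail to find the library on the other platform even when the directory it searches is correct.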
[17:57] sustrik you may also check the other ruby binding, maybe the problem is solved there in some way
[17:57] kshah that homebrew formula is straight up pulling the source and running ./configure --disable-dependency-tracking and make install on it, that portion isn't ruby, even though homebrew itself is written in ruby
[17:58] kshah oh, but you're saying the compiler on OSX creates the library with a different extension
[17:58] zedas sustrik: i think returning EINTR might be the next step, but i have to try a few other things first.
[17:59] sustrik zedas: so it's SIGPIPE caused by the peer failing, right?
[18:00] sustrik kshah: yes
[18:00] sustrik it's kind of weird but that seems to be the OSX way
[18:01] zedas sustrik: nope, i've got no idea why poll thinks it should return EINTR
[18:02] sustrik bleh
[18:02] sustrik anyway, maybe it's not in general a good idea not to return on signals
[18:02] zedas sustrik: and i'm pretty sure that even *if* there was a signal, poll isn't supposed to flail about because of it. it should report EINTR once and then move on.
[18:03] sustrik yeah, we were trying to be too smart
[18:04] sustrik it won't work well with users explicitly sending signals and reading them via sigwait
[18:04] sustrik they would just get stuck in an infinite loop
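The trade-off discussed here, retrying silently on EINTR versus reporting it to the caller, can be sketched as follows. This is not the zmq_poll code: `poll_retrying` and the fake poll functions are hypothetical, and Python's `InterruptedError` stands in for an EINTR return. The point is that an unbounded retry spins forever if the interruption never clears, while a bounded retry hands the decision back to the caller.

```python
def poll_retrying(poll_fn, timeout, max_retries=3):
    """Call poll_fn, retrying a bounded number of times on interruption.

    Returns (events, interruptions). events is None if every attempt
    was interrupted and the retry budget ran out.
    """
    interruptions = 0
    while True:
        try:
            return poll_fn(timeout), interruptions
        except InterruptedError:
            interruptions += 1
            if interruptions > max_retries:
                # Give up and let the caller decide what to do next.
                return None, interruptions


def always_interrupted(timeout):
    """Fake poll that is interrupted on every call (the pathological case)."""
    raise InterruptedError


def interrupted_once():
    """Build a fake poll that is interrupted only on its first call."""
    state = {"calls": 0}

    def poll_fn(timeout):
        state["calls"] += 1
        if state["calls"] == 1:
            raise InterruptedError
        return ["readable"]

    return poll_fn


print(poll_retrying(always_interrupted, -1))
print(poll_retrying(interrupted_once(), -1))
```

With `max_retries` removed, the first call would never return, which is the 100% CPU behaviour zedas reports.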
[18:04] sustrik mato: are you here? you may have an opinion on this
[18:09] kshah sustrik: would this mean it did find the library but couldn't find a necessary function? "checking for zmq_init() in -lzmq... no"
[18:10] sustrik yes, possibly
[18:10] sustrik but my guess is that it's the extension problem
[18:20] kshah sustrik: the ffi ruby library built with no problems at all
[18:21] kshah minus the missing gem dependency on ffi, but thats just updating the gemspec
[18:23] kshah the Oliver Smith video introduction is incredibly helpful
[18:25] sustrik kshah: so the problem is solved?
[18:25] kshah the problem is evaded
[18:25] kshah so no problem! :)
[18:26] kshah thank you, I forked the ruby-ffi library on GH, I'll build out some docs for the README to get people started
[18:34] sustrik thanks
[18:35] sustrik btw, try to push your README changes back to the original library, so that it's not lost
[18:36] kshah will do
[18:38] kshah i'm going to make a node services framework with this for managing remote/cloud infrastructure
[18:38] kshah we have over 100 nodes in EC2 and I can't really effectively message them without clustershell
[18:38] kshah which is kinda sad
[18:39] kshah i'm also sure someone has done this before.. but maybe not in ruby
[19:18] pieterh sustrik: ping
[19:20] sustrik pieterh: pong
[19:21] pieterh hi, do you have that whitepaper measuring impact of different frame encoding sizes?
[19:21] pieterh i'm still lobbying HyBi to use an 8/64 algorithm
[19:22] sustrik :)
[19:22] sustrik it used to be on
[19:22] sustrik let me see
[19:23] sustrik
[19:23] pieterh thanks!
[21:29] kshah I've got my hello world up with the ruby-ffi library, but I get a *lot* of segmentation faults
[21:31] kshah
[21:36] kshah instance variables seem to perform better, again I don't use real languages so I don't know very much about memory allocation/management
[21:36] kshah but I was able to trigger the same behavior
[21:57] kshah this has got to be some sort of Ruby locking nonsense.. this cant really be tested on the same machine, at least I don't think so
[22:01] kshah okay, this is FFI and Ruby 1.8.7 related, not at all zeromq related
[22:07] cremes kshah: i'm the ffi-ruby bindings author; feel free to ping me with questions
[22:09] kshah cremes: I saw a closed ticket related to issues on 1.8.7, am I correct to say FFI flat out doesn't work well with the 1.8.x branch?
[22:10] cremes kshah: correct; the 1.8.x branch has terminally broken thread handling
[22:10] cremes i recommend jruby and then the 1.9 branch (with the updated ffi build for further threading fixes)
[22:12] kshah that's a many month project for my company (porting to another Ruby VM) but one which we need to get done regardless
[22:13] cremes kshah: i understand
[22:13] cremes at this stage it is probably a good idea to at least get on the 1.9.x branch
[22:14] cremes it *is* faster and the syntax changes are mostly avoidable
[22:14] kshah thats a community wide effort
[22:14] kshah
[22:20] pieterh I like those four support levels
[22:20] pieterh could be nice to adopt that for 0MQ
[22:27] cremes not a bad idea for 0mq
[22:27] cremes it helps to set expectations
[22:30] pieterh cremes: ack
[22:33] kshah gentlemen I'll be back in a few months after I upgrade our VM, I learned *a lot* today looking at the ffi-rzmq source and reading the docs. The way we do messaging right now sucks and costs us a lot of money. I'm excited for this, thanks
[22:34] cremes a few months to upgrade the VM? oh boy...
[22:34] cremes go with god, my son