ZeroMq IRC Log

Saturday August 21, 2010

[Time] Name	Message
[01:14] rbraley	say, what does zero-copy mean for message passing?
[01:52] zedas	sustrik: hey so i've found a potential bug in zmq_poll
[01:52] zedas	sustrik: i'll work up a fix, but you go into a permanent loop when poll sets EINTR errno
[01:52] zedas	it still seems to work, but it's just calling poll like mad pegging that thread at 100%
[01:53] zedas	it's also why 0mq sockets seem to "die". we traced it down in mongrel2 to the line in zmq.cpp that does the continue on EINTR
[01:55] travlr	rbraley: http://www.zeromq.org/docs:user-guide#toc17
[02:10] rbraley	http://www.malhar.net/sriram/kilim/ check the tech talk
[02:11] rbraley	I want to see if I can do this with C++ and ZeroMQ
[02:28] travlr	rbraley: after reading the first sentence, i saw that's the concept you've been reaching for with zmq
[02:28] rbraley	yes!
[02:28] travlr	use the source luke... use the source ... lol
[02:30] travlr	you should start a new thread on zeromq-dev mail list for this topic... you'll get some interest and suggestions to boot.
[02:32] rbraley	travlr, since this technique requires code transformation I am thinking about either seeing if I can do it with the preprocessor (kinda doubtful) or maybe extending Qt's Meta-Object Compiler to support the Actor Model
[02:33] travlr	ok, here's a stupid question i've been thinking about.. this is neat stuff, but are you sure you need it?
[02:33] travlr	btw the actor model is very close to fbp itself
[02:35] rbraley	basically the concept for an ultra-lightweight process a la erlang is just like process isolation for OS processes, they don't need to know about the other ones at all and shouldn't, unless you send messages. But they are much much smaller than threads. That means they can fail in isolation and your program doesn't die unless you tell it to die.
[02:37] rbraley	yes I am sure I need it. My game architecture requires that every entity in the game be an actor. That means that threads can't cut it unless I only want 1000 things in my game.
[02:38] travlr	and individual processes (executables) don't have zero-copy so that scratches that idea i suppose.
[02:38] jsimmons	why should you have an individual thread per actor?
[02:39] rbraley	and they're huge and would mess up your system process monitor, like $ ps -eaf or $ top or some gui equivalent
[02:39] travlr	true, with inproc maybe you don't need threads
[02:40] rbraley	messages aren't the only concurrent things that happen. Actors still need to do stuff. And they shouldn't have to wait for each other unnecessarily
[02:40] jsimmons	aside from the difficulty in implementation, having actors individually threaded is horribly inefficient
[02:41] travlr	i see. you need light weighted threads ;)
[02:41] rbraley	I do want to multiplex actors onto different system threads, one thread per core.
[02:41] travlr	you'll do the gaming community a big service if you do implement it zmq
[02:42] jsimmons	you mean co-routines or cooperative multithreading
[02:42] jsimmons	I presume
[02:43] rbraley	jsimmons, it is possible to do preemptive scheduling of ultra-lightweight threads (as I think only erlang does right now). But yeah I would transform functions into continuation-passing style
[02:45] rbraley	and each actor would get its own stack and context switch between other actors on the same thread entirely in user space making it around 3 orders of magnitude faster than context switching between user space and kernel space that threads do.
[02:46] rbraley	sorry, the first occurrence of context switch in my previous statement was used incorrectly
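rbraley's description of giving each actor its own stack and switching between actors in user space can be illustrated with the `ucontext` functions (`getcontext`, `makecontext`, `swapcontext`), which glibc still provides even though POSIX.1-2008 dropped them. This is a minimal single-threaded sketch under stated assumptions: scheduling is cooperative (each actor yields explicitly) rather than preemptive, there is no message passing, and the names `run_actors` and `actor_body` are hypothetical.

```c
#define _XOPEN_SOURCE 700            /* the ucontext functions are XSI interfaces */
#include <stddef.h>
#include <string.h>
#include <ucontext.h>

#define NUM_ACTORS 2
#define STACK_SIZE (64 * 1024)
#define STEPS 3

static ucontext_t scheduler_ctx;                 /* where actors yield back to */
static ucontext_t actor_ctx[NUM_ACTORS];         /* saved state of each actor */
static char actor_stack[NUM_ACTORS][STACK_SIZE]; /* each actor owns its own stack */
static int done[NUM_ACTORS];
static char trace[32];                           /* order in which actors ran */
static size_t trace_len;

/* Hypothetical actor: records its letter, then yields to the scheduler.
   swapcontext() saves and restores registers and the stack pointer without
   involving the kernel's thread scheduler (glibc's version still makes one
   system call to save and restore the signal mask). */
static void actor_body(int id)
{
    for (int step = 0; step < STEPS; step++) {
        trace[trace_len++] = (char) ('A' + id);
        swapcontext(&actor_ctx[id], &scheduler_ctx);    /* yield */
    }
    done[id] = 1;            /* returning resumes uc_link, i.e. the scheduler */
}

/* Cooperative round-robin scheduler on the calling thread.
   Returns the execution trace as a string. */
const char *run_actors(void)
{
    trace_len = 0;
    for (int i = 0; i < NUM_ACTORS; i++) {
        done[i] = 0;
        getcontext(&actor_ctx[i]);
        actor_ctx[i].uc_stack.ss_sp = actor_stack[i];
        actor_ctx[i].uc_stack.ss_size = sizeof actor_stack[i];
        actor_ctx[i].uc_link = &scheduler_ctx;
        makecontext(&actor_ctx[i], (void (*)(void)) actor_body, 1, i);
    }

    int remaining = NUM_ACTORS;
    while (remaining > 0) {
        for (int i = 0; i < NUM_ACTORS; i++) {
            if (done[i])
                continue;
            swapcontext(&scheduler_ctx, &actor_ctx[i]);  /* run until it yields */
            if (done[i])
                remaining--;
        }
    }
    trace[trace_len] = '\0';
    return trace;
}
```

Calling `run_actors()` returns "ABABAB": the two actors interleave on one OS thread. Spreading many such actors over one thread per core, as rbraley goes on to propose, would need a scheduler loop per thread and a way to move actors between them, which this sketch omits.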
[02:46] jsimmons	my point is, are you really going to make it faster than say brute forcing or executing in blocks
[02:47] rbraley	brute forcing?
[02:47] jsimmons	as in for actor in actors do actor.update()
[02:47] jsimmons	I should say 'buckets' rather than blocks
[02:48] travlr	you'd probably save yourself hassle if you implemented in java with kilim, but that would have to mean coding in java.. eeww :P
[02:48] rbraley	within a single thread it will do round robin or poll() or something
[02:49] rbraley	but there will be n threads for n cores so it will scale as well as whatever your hardware can handle
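The single-thread round robin rbraley describes, as opposed to jsimmons's `for actor in actors do actor.update()`, can be sketched as a loop that hands each actor at most one queued message per pass. This is a minimal sketch assuming integer messages, fixed-size mailboxes, and a single scheduler thread; `spawn`, `send_msg`, `run_once`, and `demo_ping_pong` are hypothetical names, and the one-thread-per-core layer is left out.

```c
#include <stddef.h>

#define MAILBOX_CAP 16
#define MAX_ACTORS 4

typedef struct actor actor_t;
typedef void (*handler_fn)(actor_t *self, int msg);

struct actor {
    int id;
    int mailbox[MAILBOX_CAP];   /* ring buffer of pending messages */
    size_t head, count;
    handler_fn handle;          /* runs one message to completion */
    int state;                  /* private state, touched only by handle */
};

static actor_t actors[MAX_ACTORS];
static size_t num_actors;

actor_t *spawn(handler_fn handle)
{
    if (num_actors == MAX_ACTORS)
        return NULL;
    actor_t *a = &actors[num_actors];
    a->id = (int) num_actors++;
    a->head = a->count = 0;
    a->handle = handle;
    a->state = 0;
    return a;
}

/* Queue a message; returns -1 if the mailbox is full. */
int send_msg(actor_t *to, int msg)
{
    if (to->count == MAILBOX_CAP)
        return -1;
    to->mailbox[(to->head + to->count) % MAILBOX_CAP] = msg;
    to->count++;
    return 0;
}

/* One round-robin pass: every actor with mail handles exactly one message.
   Returns how many messages were handled, so the caller can stop when idle. */
size_t run_once(void)
{
    size_t handled = 0;
    for (size_t i = 0; i < num_actors; i++) {
        actor_t *a = &actors[i];
        if (a->count == 0)
            continue;
        int msg = a->mailbox[a->head];
        a->head = (a->head + 1) % MAILBOX_CAP;
        a->count--;
        a->handle(a, msg);
        handled++;
    }
    return handled;
}

/* Example actor (assumes exactly two actors, ids 0 and 1): counts what it
   receives and bounces msg - 1 to its peer until the counter reaches 0. */
static void ping_pong(actor_t *self, int msg)
{
    self->state++;
    if (msg > 0)
        send_msg(&actors[1 - self->id], msg - 1);
}

/* Bounce a counter between two actors; returns total messages handled. */
int demo_ping_pong(int n)
{
    num_actors = 0;
    spawn(ping_pong);
    spawn(ping_pong);
    send_msg(&actors[0], n);
    while (run_once() > 0)
        ;
    return actors[0].state + actors[1].state;
}
```

`demo_ping_pong(5)` handles six messages (the counter values 5 down to 0). Running one such loop per core and routing cross-thread messages through something like 0MQ would be the next layer; this sketch stays on one thread.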
[02:49] rbraley	and it doesn't even need to be one machine, with zeromq.
[02:50] travlr	have you studied zmq source yet?
[02:50] rbraley	but this may also work with GPUs
[02:50] rbraley	travlr, yes but not since we talked
[02:51] travlr	yeah what about simply using CUDA for gpu or similar .... 10-100x
[02:51] rbraley	well I was hoping to migrate actors off the threads to the GPU if possible
[02:52] jsimmons	gpu will suck for actor logic
[02:52] travlr	yeah i know i was missing something with that thought
[02:52] jsimmons	unless you have super super simple actors
[02:52] rbraley	all actors tend to be super simple jsimmons
[02:53] jsimmons	by actors do you mean something like boids, then?
[02:54] rbraley	no, (of course they would be good for agent-based modeling and emergent behavior) actors are in the context of the Actor Model of concurrency
[02:54] rbraley	https://secure.wikimedia.org/wikipedia/en/wiki/Actor_model
[02:55] jsimmons	because as soon as you start branching on the gpu you take a massive performance hit
[02:55] rbraley	ah
[02:55] rbraley	yes well just make two actors then
[02:56] rbraley	one where the branch succeeds and one where the branch fails
[02:56] rbraley	then leave choosing which computation to keep back at the CPU
[02:57] jsimmons	ah I've been oversimplifying 'Actor'
[02:58] rbraley	it is hard to oversimplify actors, since they are a single primitive that is computationally universal :P
[02:58] rbraley	even the lambda calculus has two primitives
[03:01] rbraley	basically: the real world is concurrent, how can we model that in an optimal way? -> actors
[03:01] jsimmons	I mean I was imposing my own meaning from traditional game engines
[03:02] rbraley	yeah I am doing something that is either really stupid or 10-30 years ahead of its time :P so I wouldn't expect any game engine to bear a resemblance to mine.
[03:02] rbraley	although some come close
[03:03] travlr	nothing wrong with trying to "touch" the hardware from my point of view ;-)
[03:04] rbraley	travlr, I want to make sexy times with the hardware and blow current engines out of the water, at least from a scalability/concurrency standpoint
[03:05] travlr	no doubt.. kudos imo
[03:06] rbraley	thanks, it's frustrating that I have to invent so much to realize my vision, but hey, at least there's 0MQ, and Protocol Buffers.
[03:07] jsimmons	I'll be interested if you make it work :D
[03:07] travlr	i like paying attention to your vision.. as i apply some of it in my mind to my needs
[03:07] rbraley	jsimmons, so will I :D
[03:09] rbraley	I just need to invent the 0MQ of actor model frameworks.
[03:10] rbraley	well that is a recursive definition ;)
[03:10] rbraley	I need ultra-lightweight processes for C++
[03:11] travlr	start a thread on the ml... i want to follow along ;)
[03:12] travlr	i'm sure that if implemented as you describe it will find plenty of use cases outside of a game engine
[03:12] rbraley	of course!
[03:12] travlr	like mine
[03:13] travlr	:-D
[03:13] rbraley	hehe
[03:13] rbraley	this kilim is cool, but it doesn't adhere to a unix philosophy
[03:14] travlr	for some reason anything java just rubs me the wrong way.. i don't even want to begin to play with it.. i don't know why but that's the way it is
[03:14] rbraley	travlr, that's called bigotry :P
[03:14] travlr	lol
[03:15] jsimmons	as far as I'm concerned, at least it's not C++ :P
[03:15] travlr	i've learned to love c++, thanks to Qt
[03:15] rbraley	http://www.mirah.org/
[03:15] jsimmons	I kinda despise Qt
[03:16] rbraley	any rubyists may enjoy that link
[03:16] travlr	jsimmons: yes i understand.. to each their own..
[03:16] jsimmons	Gtk supremacy travlr :P
[03:16] rbraley	jsimmons, whaaat? I don't understand.
[03:16] travlr	oh, so you want to start a flame war now... lol
[03:16] rbraley	oh Gtk, ok
[03:17] jsimmons	Actually I'm not a big fan of the Gtk graphical toolkit bit, but I like GLib, especially GObject introspection. :D
[03:18] rbraley	I do too, actually have you heard of Vala, jsimmons?
[03:18] jsimmons	Yeah rbraley, I've even patched some of the vapis :P
[03:18] rbraley	well played
[03:19] rbraley	Gtk is problematic, but Glib is really neat.
[03:19] jsimmons	Have you heard of Clutter/Mx rbraley?
[03:20] rbraley	is that the opengl gui?
[03:20] jsimmons	Yeah clutter is an opengl scene graph/animation framework kinda thing, and mx is a ui library that uses clutter.
[03:21] rbraley	nice
[03:21] jsimmons	It's used in stuff like MeeGo
[03:21] rbraley	cool
[03:23] jsimmons	but enough tomfoolery for now, I think I'll go stab some people in Oblivion. :D good luck with your multi-tasking madness.
[03:25] rbraley	jsimmons, the game I am working on is very similar to a TES title
[03:25] rbraley	jsimmons, http://dungeonhack.sf.net
[03:27] jsimmons	bookmarked
[03:45] rbraley	travlr, this might just do the trick http://theron.ashtonmason.net/index.php?t=page&p=threadpool
[03:48] travlr	very interesting.. get it in zmq now
[03:50] travlr	this stuff still confuses me a bit.. i'll have to study it more
[03:53] rbraley	they have their own message passing implementation, which seems to be just within the same address space of the process and can't scale to multiple machines
[03:53] rbraley	like 0MQ can
[03:53] travlr	right, hence my call to zmq impl
[03:54] travlr	only provides part of the story
[03:54] travlr	are you gonna do it?
[03:55] rbraley	depends on if it is written with hard dependencies on the way they do message passing or not
[03:56] travlr	use their ideas. what's its license?
[03:58] travlr	Creative Commons Attribution 3.0 License
[03:59] rbraley	sweet
[04:01] travlr	i say start a thread on the ml and get some interest going
[04:05] travlr	this is a new library.. you should contact the dev(s) there and convince them to work with you on zmq impl
[04:05] travlr	not new but very active
[04:05] rbraley	travlr, never used a mailing list before. Always google groups or fora
[04:06] travlr	only 503 threads
[04:06] travlr	is that enough for you
[04:19] rbraley	nevar!
[04:21] travlr	does dabbleboard have automagic spacing for flow charts?
[04:37] rbraley	I don't know I just found it on the spot because you wanted something like that
[04:37] travlr	k
[05:20] sustrik	zedas: yes, the code is messy there
[05:20] sustrik	zmq_poll will get rewritten in new version
[05:20] sustrik	in the meantime, if you get patch, i'll apply it
[05:48] zedas	sustrik: cool it'll be against 2.0.7
[05:49] sustrik	sure
[05:49] sustrik	thanks
[05:51] zedas	sustrik: ah i think i found it: // Wait for events. Ignore interrupts if there's infinite timeout
[05:51] zedas	we have infinite timeout
[05:53] sustrik	let me see
[05:54] sustrik	hm, does it result in infinite loop?
[05:54] sustrik	once the signal is processed, the loop should exit afaics
[05:55] sustrik	what signal are you testing with?
[06:05] zedas	well, if you have a -1 timeout, then EINTR happens, and then it loops: poll runs, exits immediately with EINTR, repeat
[06:05] zedas	so not sure why poll thinks it has an EINTR condition again
[06:05] sustrik	maybe the signal is unhandled
[06:05] zedas	we've got SIGINT, SIGHUP, SIGQUIT, but none of those fire during this loop
[06:05] sustrik	so it stays in the signal queue
[06:06] zedas	could it be socket close for some reason?
[06:06] zedas	let me try a few signal catchers....see what i'm getting.
[06:06] sustrik	SIGPIPE?
[06:06] zedas	yeah but i can't see why...i pretty much block those but lemme see.
[06:07] sustrik	use gdb
[06:07] sustrik	there's an option to stop on a signal
[06:07] sustrik	iirc
[06:07] sustrik	'handle' command i think
[06:08] zedas	ah yeah
[06:10] zedas	SIGPIPE is the most frustrating signal ever
[06:10] zedas	whoever thought it should be part of sockets should be shot
[06:10] sustrik	:)
[06:26] zedas	sustrik: ok looks like SIGPIPE was the culprit. i set it to SIG_IGN and haven't had the problem yet.
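Ignoring SIGPIPE is the usual precaution in socket code, even though, as the log goes on to show, it did not turn out to be the cause of this particular loop. A minimal POSIX sketch: with SIGPIPE set to SIG_IGN, writing to a pipe or socket whose reader has gone away fails with `EPIPE` instead of terminating the process. `ignore_sigpipe` and `write_to_closed_pipe` are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Ignore SIGPIPE process-wide; returns 0 on success. */
int ignore_sigpipe(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Write to a pipe whose read end is already closed and return the errno
   observed (EPIPE once SIGPIPE is ignored; without that, the default
   action would kill the process before write() returned). */
int write_to_closed_pipe(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return errno;
    close(fds[0]);                          /* no reader left */
    ssize_t n = write(fds[1], "x", 1);
    int err = (n < 0) ? errno : 0;
    close(fds[1]);
    return err;
}
```

Per-call and per-socket alternatives exist as well: on Linux, `send()` accepts the `MSG_NOSIGNAL` flag, and on Mac OS X the `SO_NOSIGPIPE` socket option plays a similar role.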
[06:26] zedas	i'll keep you posted
[06:27] sustrik	zedas: thanks
[06:38] zedas	sustrik: nope
[06:39] zedas	i have every possible signal blocked and it's still setting EINTR
[06:41] sustrik	what does gdb say?
[06:41] sustrik	handle all print
[07:29] zedas	sustrik: well in the process of adding a timer, i found out that zmq_poll has a timeout/1000 in it, so it gives different poll semantics on timeout
[07:30] zedas	which was causing mongrel2 to thrash if there was a timer, so fixed that, now seeing if having a timeout causes it or not
[07:32] sustrik	are you saying the signal you've seen was the timer signal?
[07:36] zedas	nope, i'm saying the timeout parameter to zmq_poll is different from real poll
[07:36] zedas	it's /1000, so if you pass in ms it's way off
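zedas's point is about units: in the 0MQ 2.0.x line discussed here, `zmq_poll()` takes its `long timeout` in microseconds and divides it by 1000 before calling `poll()`, which takes milliseconds. (Later 0MQ releases switched `zmq_poll()` to milliseconds.) A small helper, hypothetical and not part of the 0MQ API, keeps call sites written in milliseconds while preserving -1 as "wait forever".

```c
#include <limits.h>

/* Convert a poll()-style timeout in milliseconds into the microsecond
   value expected by zmq_poll() in 0MQ 2.x. A negative value means
   "block indefinitely" in both APIs, so it is passed through as -1. */
long ms_to_zmq2_poll_timeout(int timeout_ms)
{
    if (timeout_ms < 0)
        return -1L;
    if ((long) timeout_ms > LONG_MAX / 1000L)
        return LONG_MAX;                /* avoid overflow on 32-bit long */
    return (long) timeout_ms * 1000L;
}
```

A call site would then read `zmq_poll(items, nitems, ms_to_zmq2_poll_timeout(250))` and wait roughly 250 ms, as a `poll()` user would expect.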
[07:37] zedas	i added a timeout to the zmq_poll so i could see the effect of it on those branches in zmq_poll that continue if there's EINTR
[07:37] zedas	but, even with a solid timeout it still pegs the CPU 100% with EINTR when a handler dies
[07:37] zedas	and, killing the handler makes it go back to normal
[07:37] zedas	so, more digging.
[07:39] jsimmons	so you can produce that bug reliably now zedas?
[07:40] zedas	yep, all the time.
[07:40] zedas	i just hit handlers with 0mq messages, wait a bit, 100% cpu. kill the dead handler, 0%
[07:40] zedas	it also pulls out another bug so i'll just go fix that for now, this is kind of wearing me down.
[08:12] sustrik	zedas: hm, would returning EINTR from zmq_poll instead of looping help?
[08:13] sustrik	however, if you handle the EINTR by simply calling zmq_poll again, it'll behave exactly as it does now
[08:15] sustrik	btw, what's "the handler"
[08:15] sustrik	signal handler?
[08:15] sustrik	or the network peer?
[09:45] jsimmons	he was talking about a handler in mongrel2 sustrik, which is just an entity that receives and sends zmq messages to/from the mongrel2 server. He's also gone to bed.
[11:32] pieterh	sustrik: hi
[11:33] pieterh	random question, I'm writing a relay example (step1->step2->step3)
[11:33] pieterh	to demonstrate inter-thread signalling
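pieterh's relay chains three steps, each waiting for a signal from the previous one before signalling the next. A 0MQ version would use `inproc` sockets between threads; the sketch below swaps in plain POSIX pipes and pthreads to show the same handoff shape without libzmq. `step_t`, `relay_step`, and `run_relay` are hypothetical names.

```c
#include <pthread.h>
#include <unistd.h>

/* One relay step: wait for a token on `in` (step 1 has no input and
   creates the token), then pass it on through `out`. */
typedef struct {
    int in;    /* read end from the previous step, or -1 for step 1 */
    int out;   /* write end towards the next step */
} step_t;

static void *relay_step(void *arg)
{
    step_t *s = arg;
    char token = 'R';                      /* "ready" signal */
    ssize_t n = 1;
    if (s->in >= 0)
        n = read(s->in, &token, 1);        /* block until the previous step signals */
    if (n == 1)
        n = write(s->out, &token, 1);      /* signal the next step */
    close(s->out);                         /* on failure, the next step sees EOF */
    return NULL;
}

/* Steps 1 and 2 run in worker threads; the caller acts as step 3.
   Returns 0 if the ready signal made it through the chain, -1 otherwise. */
int run_relay(void)
{
    int a[2], b[2];                        /* a: step1 -> step2, b: step2 -> step3 */
    if (pipe(a) != 0)
        return -1;
    if (pipe(b) != 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    step_t s1 = { -1, a[1] };
    step_t s2 = { a[0], b[1] };
    pthread_t t1, t2;
    int have_t1 = 0, have_t2 = 0;

    if (pthread_create(&t2, NULL, relay_step, &s2) == 0)
        have_t2 = 1;
    else
        close(b[1]);                       /* step 3 will see EOF */
    if (pthread_create(&t1, NULL, relay_step, &s1) == 0)
        have_t1 = 1;
    else
        close(a[1]);                       /* step 2 will see EOF */

    char token = 0;
    ssize_t n = read(b[0], &token, 1);     /* step 3: wait for the relay */

    if (have_t1)
        pthread_join(t1, NULL);
    if (have_t2)
        pthread_join(t2, NULL);
    close(a[0]);
    close(b[0]);
    return (n == 1 && token == 'R') ? 0 : -1;
}
```

Closing each write end after use means a failed step wakes its successor with end-of-file instead of leaving it blocked forever.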
[11:46] CIA-20	zeromq2: Pieter Hintjens master * rc52d1f2 / doc/zmq_recv.txt : Fixed example for multipart zmq_recv() - http://bit.ly/a3r6KT
[11:52] pieterh	Hah... do NOT use 'socket' in example code
[11:52] pieterh	it compiles even if you don't define a variable called 'socket'
[13:47] CIA-20	zeromq2: Pieter Hintjens master * r2b2accb / doc/zmq_recv.txt : Added calls to zmq_msg_close in examples - http://bit.ly/bWEUgx
[17:43] sustrik	jsimmons: thanks
[17:43] kshah	I've used homebrew to install zeromq, but I can not get the rubygem (not the ffi one) to build even when I specify the zmq-dir
[17:44] kshah	I've had this issue for a while, and I can't seem to hack at the extconf.rb file provided in the gem either to solve the problem
[17:44] sustrik	kshah: what's the problem?
[17:45] kshah	sustrik: when I provide the --with-zmq-dir option on "sudo gem install zmq" it can't find the necessary libraries
[17:45] kshah	and prints back "extconf.rb:25: Couldn't find zmq library. try setting --with-zmq-dir=<path> to tell me where it is. (RuntimeError)"
[17:46] sustrik	what's the OS?
[17:46] kshah	OSX 10.6
[17:46] sustrik	maybe a problem with .so vs. .dylib?
[17:47] kshah	sustrik: not sure, are you familiar with homebrew btw?
[17:48] sustrik	i haven't slightest idea what is it :)
[17:48] kshah	it's quickly replacing macports as the package manager of choice for OSX... I think its formula, however, for installing zeromq is not compatible
[17:48] sustrik	but OSX seems to be a bit tricky
[17:48] sustrik	because the dynamic libraries tend to be named something.dylib
[17:48] sustrik	as opposed to .so on Linux
[17:49] kshah	this may mean something to you: http://github.com/mxcl/homebrew/blob/master/Library/Formula/zeromq.rb
[17:49] sustrik	no, sorry
[17:50] kshah	I've been using dynamic languages for most of my life/career, so .dylib and .so stuff is usually over my head :/
[17:50] sustrik	can you check the filename of your zmq library?
[17:50] kshah	yes, I'll print the tree in a pastebin
[17:52] kshah	http://gist.github.com/542632
[17:52] kshah	that is in /usr/local/Cellar/zeromq/2.0.7/
[17:52] sustrik	libzmq.dylib
[17:52] sustrik	so i would say, your extconf.rb is checking for libzmq.so
[17:52] sustrik	and doesn't find it because there's no such file
[17:52] sustrik	instead there's libzmq.dylib :|
[17:53] sustrik	no idea how's that solved in ruby world
[17:53] kshah	well, the extconf.rb is using the mkmf library, I'll read up on that
[17:54] sustrik	ok
[17:54] kshah	I don't suppose I can pass a flag to specify the location of that lib
[17:54] sustrik	i assume the location is ok
[17:54] sustrik	the problem is that filename is different on OSX
[17:57] sustrik	you may also check the other ruby binding, maybe the problem is solved there in some way
[17:57] kshah	that homebrew formula is straight up pulling the source and running ./configure --disable-dependency-tracking and make install on it, that portion isn't ruby, even though homebrew itself is written in ruby
[17:58] kshah	oh, but you're saying the compiler on OSX creates the library with a different extension
[17:58] zedas	sustrik: i think returning EINTR might be the next step, but i have to try a few other things first.
[17:59] sustrik	zedas: so it's SIGPIPE caused by the peer failing, right?
[18:00] sustrik	kshah: yes
[18:00] sustrik	it's kind of weird but that seems to be the OSX way
[18:01] zedas	sustrik: nope, i've got no idea why poll thinks it should return EINTR
[18:02] sustrik	bleh
[18:02] sustrik	anyway, maybe it's not in general a good idea not to return on signals
[18:02] zedas	sustrik: and i'm pretty sure that even if there was a signal, poll isn't supposed to flail about because of it. it should report EINTR once and then move on.
[18:03] sustrik	yeah, we were trying to be too smart
[18:04] sustrik	it won't work well with users explicitly sending signals and reading them via sigwait
[18:04] sustrik	they would just get stuck in an infinite loop
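The alternative sustrik and zedas converge on is to let `zmq_poll()` return EINTR to its caller instead of looping internally. The same contract can be shown with plain POSIX `poll()`: the wait is interrupted once per signal, and the application, not the library, decides whether the signal means "retry" or "stop". This is a minimal sketch, not 0MQ's own code; `wait_readable_or_stop`, `stop_requested`, and the SIGALRM demo are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t stop_requested = 0;

static void on_signal(int sig)
{
    (void) sig;
    stop_requested = 1;                    /* only async-signal-safe work here */
}

/* Block until fd is readable. Instead of silently retrying on EINTR, the
   loop checks whether the application's own handler asked it to stop.
   Returns 0 if fd is readable, 1 if stopped by a signal, -1 on error. */
int wait_readable_or_stop(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    for (;;) {
        int rc = poll(&pfd, 1, -1);
        if (rc > 0)
            return 0;
        if (rc < 0 && errno != EINTR)
            return -1;
        if (rc < 0 && stop_requested)
            return 1;                      /* the caller decides what a signal means */
        /* EINTR from an unrelated signal: wait again */
    }
}

/* Demonstration: wait on a pipe nobody writes to and let SIGALRM
   interrupt the wait after about one second. */
int demo_interrupted_wait(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    stop_requested = 0;
    alarm(1);
    int result = wait_readable_or_stop(fds[0]);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

A library that retried internally on every EINTR would never give this loop the chance to see `stop_requested`, which is the trap sustrik describes for programs that manage their own signals.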
[18:04] sustrik	mato: are you here? you may have an opinion on this
[18:09] kshah	sustrik: would this mean it did find the library but couldn't find a necessary function? "checking for zmq_init() in -lzmq... no"
[18:10] sustrik	yes, possibly
[18:10] sustrik	but my guess is that it's the extension problem
[18:20] kshah	sustrik: the ffi ruby library built with no problems at all
[18:21] kshah	minus the missing gem dependency on ffi, but that's just updating the gemspec
[18:23] kshah	the Oliver Smith video introduction is incredibly helpful
[18:25] sustrik	kshah: so the problem is solved?
[18:25] kshah	the problem is evaded
[18:25] kshah	so no problem! :)
[18:26] kshah	thank you, I forked the ruby-ffi library on GH, I'll build out some docs for the README to get people started
[18:34] sustrik	thanks
[18:35] sustrik	btw, try to push your README changes back to the original library, so that it's not lost
[18:36] kshah	will do
[18:38] kshah	i'm going to make a node services framework with this for managing remote/cloud infrastructure
[18:38] kshah	we have over 100 nodes in EC2 and I can't really effectively message them without clustershell
[18:38] kshah	which is kinda sad
[18:39] kshah	i'm also sure someone has done this before.. but maybe not in ruby
[19:18] pieterh	sustrik: ping
[19:20] sustrik	pieterh: pong
[19:21] pieterh	hi, do you have that whitepaper measuring impact of different frame encoding sizes?
[19:21] pieterh	i'm still lobbying HyBi to use an 8/64 algorithm
[19:22] sustrik	:)
[19:22] sustrik	it used to be on zeromq.org
[19:22] sustrik	let me see
[19:23] sustrik	http://www.zeromq.org/whitepapers:design-v01#toc5
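The "8/64" scheme pieterh is lobbying for uses a one-octet length for short frames and a 64-bit length for long ones. A minimal sketch, assuming the convention used by the 0MQ 2.x wire format: a length below 255 is sent as a single octet; otherwise an 0xFF escape octet is followed by the length as a 64-bit big-endian integer. Whether the length also covers a flags octet is not modeled, and `encode_length`/`decode_length` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the length prefix for a frame of `len` bytes into `out`, which
   must have room for 9 bytes. Returns the number of bytes written. */
size_t encode_length(uint64_t len, unsigned char *out)
{
    if (len < 255) {
        out[0] = (unsigned char) len;               /* short form: 8 bits */
        return 1;
    }
    out[0] = 0xFF;                                  /* escape: long form follows */
    for (int i = 0; i < 8; i++)
        out[1 + i] = (unsigned char) (len >> (56 - 8 * i));  /* big-endian */
    return 9;
}

/* Parse a length prefix from `in` (with `avail` bytes available). Returns
   the number of bytes consumed, or 0 if more input is needed. */
size_t decode_length(const unsigned char *in, size_t avail, uint64_t *len)
{
    if (avail < 1)
        return 0;
    if (in[0] < 255) {
        *len = in[0];
        return 1;
    }
    if (avail < 9)
        return 0;
    uint64_t v = 0;
    for (int i = 1; i < 9; i++)
        v = (v << 8) | in[i];
    *len = v;
    return 9;
}
```

Short frames pay one octet of overhead and long frames nine.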
[19:23] pieterh	thanks!
[21:29] kshah	I've got my hello world up with the ruby-ffi library, but I get a lot of segmentation faults
[21:31] kshah	http://gist.github.com/542878
[21:36] kshah	instance variables seem to perform better, again I don't use real languages so I don't know very much about memory allocation/management
[21:36] kshah	but I was able to trigger the same behavior
[21:57] kshah	this has got to be some sort of Ruby locking nonsense.. this can't really be tested on the same machine, at least I don't think so
[22:01] kshah	okay, this is FFI and Ruby 1.8.7 related, not at all zeromq related
[22:07] cremes	kshah: i'm the ffi-ruby bindings author; feel free to ping me with questions
[22:09] kshah	cremes: I saw a closed ticket related to issues on 1.8.7, am I correct to say FFI flat out doesn't work well with the 1.8.x branch?
[22:10] cremes	kshah: correct; the 1.8.x branch has terminally broken thread handling
[22:10] cremes	i recommend jruby and then the 1.9 branch (with the updated ffi build for further threading fixes)
[22:12] kshah	that's a many month project for my company (porting to another Ruby VM) but one which we need to get done regardless
[22:13] cremes	kshah: i understand
[22:13] cremes	at this stage it is probably a good idea to at least get on the 1.9.x branch
[22:14] cremes	it is faster and the syntax changes are mostly avoidable
[22:14] kshah	that's a community wide effort
[22:14] kshah	http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/367983
[22:20] pieterh	I like those four support levels
[22:20] pieterh	could be nice to adopt that for 0MQ
[22:27] cremes	not a bad idea for 0mq
[22:27] cremes	it helps to set expectations
[22:30] pieterh	cremes: ack
[22:33] kshah	gentlemen I'll be back in a few months after I upgrade our VM, I learned a lot today looking at the ffi-rzmq source and reading the docs. The way we do messaging right now sucks and costs us a lot of money. I'm excited for this, thanks
[22:34] cremes	a few months to upgrade the VM? oh boy...
[22:34] cremes	go with god, my son