ZeroMq IRC Log

Tuesday September 28, 2010

[Time] Name	Message
[05:39] CIA-20	zeromq2: Martin Sustrik maint * rf61921d / src/req.cpp : REQ socket can die when reply is delivered on wrong unerlying connection -- fixed - http://bit.ly/cX2rpW
[05:46] CIA-20	zeromq2: Dhammika Pathirana maint * rc1deb22 / src/ypipe.hpp : crash when closing an ypipe -- fixed - http://bit.ly/bmxM8U
[05:53] CIA-20	zeromq2: Martin Sustrik master * rf61921d / src/req.cpp : REQ socket can die when reply is delivered on wrong unerlying connection -- fixed - http://bit.ly/cX2rpW
[05:53] CIA-20	zeromq2: Dhammika Pathirana master * rc1deb22 / src/ypipe.hpp : crash when closing an ypipe -- fixed - http://bit.ly/bmxM8U
[05:53] CIA-20	zeromq2: Martin Sustrik master * r6715f9b / src/ypipe.hpp :
[05:53] CIA-20	zeromq2: Merge branch 'maint'
[05:53] CIA-20	zeromq2: * maint:
[05:53] CIA-20	zeromq2: crash when closing an ypipe -- fixed - http://bit.ly/cxlL0o
[09:20] keffo	pieterh, around?
[09:21] pieterh	keffo: hi!
[09:23] keffo	Busy?
[09:24] pieterh	always, but shoot...
[09:24] keffo	I'm in the process of solidifying how the network is monitored.. Currently I gather various info & statistics in the loadbalancer, which publishes regularly (~5s), and a WPF app subscribes and displays nice graphs etc
[09:25] pieterh	sounds good
[09:25] keffo	but I can't really figure out a good way of limiting it.. I don't want to send a complete state every 5s
[09:25] pieterh	how large is the state?
[09:26] keffo	Depends, both on what info I decide to publish.. I'd like load, bandwidth usage, connected nodes and their respective stats (cpu/ram/etc), but also an overview of what is happening, as well as more detailed info for each worker
[09:27] pieterh	estimated size? in bytes?
[09:27] keffo	geesh, no idea, not in the mb range at least :)
[09:27] pieterh	if you have no idea, it's not sensible to think about optimizing it
[09:27] pieterh	so do a back-of-envelope calculation and come up with a figure...
[09:28] keffo	I wondered if it was sound to have the monitoring app poll a complete current-state at startup, and then depend on a persistent sub-forwarder to handle deltastates? But that sounds very complicated
[09:28] keffo	I can't really guestimate, the number of nodes can range from the local setup I have here of a few machines, to much larger..
[09:28] pieterh	...
[09:29] pieterh	how big is "much larger"?
[09:29] keffo	ideally wan :)
[09:29] pieterh	please stick a number onto it...
[09:29] keffo	but being realistic, perhaps around 20?
[09:29] pieterh	and how large would the state be per worker?
[09:30] pieterh	please stick a number onto it...
[09:30] pieterh	then multiply the two numbers and add something for the overview
[09:30] keffo	Basic info(linpack measurements), uptime, average load, around that
[09:30] pieterh	come back when you have a total in KB per 5 seconds, a'ight?
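pieterh's back-of-envelope estimate can be sketched as below. Only the node count (about 20) comes from the log; the per-node and overview sizes are hypothetical placeholders to be replaced with real measurements.

```python
def monitoring_bandwidth_kb(nodes, bytes_per_node, overview_bytes, interval_s=5.0):
    """Return (KB per publish, KB per second) for publishing the full state."""
    per_publish_bytes = nodes * bytes_per_node + overview_bytes
    kb_per_publish = per_publish_bytes / 1024
    return kb_per_publish, kb_per_publish / interval_s


# 20 nodes is keffo's estimate; both byte sizes are hypothetical placeholders.
print(monitoring_bandwidth_kb(nodes=20, bytes_per_node=256, overview_bytes=1024))
# (6.0, 1.2)
```

With these made-up sizes the complete state is 6 KB every 5 seconds, about 1.2 KB/s.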
[09:32] keffo	That's not what I'm interested in, the assumption here is that no data is published that isn't "needed", but I would like to figure out the most efficient means of passing around that data
[09:32] keffo	basically how to monitor a distributed system with as little overhead as possible..
[09:32] pieterh	this is for research purposes rather than an actual use case...
[09:32] keffo	(regardless of what the data actually is)
[09:33] keffo	well both I guess, research first, use later :)
[09:33] pieterh	well, you can wait until Ch4 of the Guide if you want to
[09:33] keffo	Surely this problem has been dealt with before, as loadbalancing has :)
[09:33] pieterh	but here is how I'd do it...
[09:33] pieterh	- maintain state in the publisher
[09:33] pieterh	- apply updates to state and publish updates to pub socket
[09:34] pieterh	- in subscriber, request state via req/rep socket
[09:34] pieterh	- and also subscribe to updates
[09:34] pieterh	- queue incoming updates
[09:34] pieterh	- as soon as state arrives, apply updates to state and continue to do this
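A minimal in-process Python sketch of the recipe above, with plain method calls standing in for the PUB, SUB and REQ/REP sockets (no 0MQ binding is used). It assumes one addition that is not in the log: every update carries a sequence number, so updates that were queued before the snapshot but are already reflected in it can be skipped. The subscriber subscribes before requesting the snapshot, so no update can fall into the gap between the two.

```python
from collections import deque


class StatePublisher:
    """Authoritative key/value state; stands in for the PUB and REP sides."""

    def __init__(self):
        self.state = {}
        self.seq = 0              # assumed sequence number, not part of the log's recipe
        self.subscribers = []     # delivery callbacks standing in for a PUB socket

    def update(self, key, value):
        # Apply the update to our own state, then publish it.
        self.seq += 1
        self.state[key] = value
        for deliver in self.subscribers:
            deliver(self.seq, key, value)

    def snapshot(self):
        # Stands in for a REQ/REP round trip: a copy of the state plus its sequence.
        return self.seq, dict(self.state)


class StateSubscriber:
    def __init__(self):
        self.state = None         # unknown until the snapshot arrives
        self.snapshot_seq = 0
        self.pending = deque()    # updates received while waiting for the snapshot

    def on_update(self, seq, key, value):
        if self.state is None:
            self.pending.append((seq, key, value))
        elif seq > self.snapshot_seq:      # skip updates the snapshot already contains
            self.state[key] = value

    def on_snapshot(self, seq, state):
        self.state, self.snapshot_seq = state, seq
        while self.pending:                # replay the queued updates in order
            self.on_update(*self.pending.popleft())


pub = StatePublisher()
pub.update("node-1/load", 0.4)         # before the subscriber exists

sub = StateSubscriber()
pub.subscribers.append(sub.on_update)  # 1. subscribe to updates
pub.update("node-2/load", 0.9)         #    queued: no state yet
seq, state = pub.snapshot()            # 2. request the state (already has node-2)
pub.update("node-1/load", 0.1)         #    still queued: snapshot not applied yet
sub.on_snapshot(seq, state)            # 3. apply the state, then the queued updates
pub.update("node-3/load", 0.5)         # 4. later updates apply directly

print(sub.state == pub.state)          # True
```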
[09:34] keffo	m, that's what I was thinking as well.
[09:35] pieterh	i think it's robust but need to prove it
[09:35] keffo	Things do start to get hairy if the monitor app starts to depend on delta-updates though, like "join/part" of nodes
[09:35] keffo	you're doing this type of stuff for ch 4?
[09:35] pieterh	yup
[09:36] pieterh	stateful pubsub
[09:36] pieterh	or whatever this is properly called...
[09:36] keffo	I'll try it out, see if it behaves well.. :)
[09:37] pieterh	feel free to write it up as a recipe or code sample
[09:37] pieterh	if i can reuse that for the guide it'll save me time
[09:37] keffo	I'll let you know for sure yeah
[09:38] pieterh	the main reason for this is not so much to save network bandwidth but to allow realtime updates
[09:38] keffo	I was also thinking about 'history-nodes'.. something appealing about that, someone who keeps track of what's going on, but isn't directly part -of- the system
[09:38] pieterh	indeed, this work could be totally outsourced to a stateful device
[09:38] keffo	non-gonzo network monitoring
[09:39] pieterh	with some notion of state + patches
[09:39] pieterh	like pair/value updates
[09:39] pieterh	hmm, nice
[09:39] pieterh	it's a distributed cache
[09:40] keffo	yes, distributed tuple-storage
[09:40] pieterh	yup, that's the thing
[09:40] keffo	I can't help thinking most source control software faces many of the same issues here
[09:40] pieterh	in the general solution any node can update its cache
[09:41] pieterh	it is a very useful general solution to state distribution
[09:41] keffo	oh yeah, one can think of it as cache hits and misses, makes things clearer
[09:42] pieterh	i'd start with a simple model, one publisher, many subscribers, pair/value updates
[09:43] keffo	It reuses the same mechanisms as the rest
[09:44] keffo	maybe I could contribute some C# stuff to go along with ch 4.. I do find it quite lacking on the site..
[09:45] pieterh	keffo: we've gotten one C# example yesterday, but others would be great
[09:46] pieterh	do start at ch1 if you could, it's quite trivial stuff but useful to newcomers
[09:46] keffo	yeah..
[09:46] keffo	need a complete binding, for starters :)
[09:47] pieterh	the binding is not complete?
[09:47] keffo	no, there was no poll for example?
[09:47] keffo	or did I somehow get an old version?
[09:48] keffo	http://github.com/zeromq/clrzmq/blob/master/clrzmq/zmq.cs
[09:48] keffo	nope..
[09:48] pieterh	who maintains this binding?
[09:49] pieterh	ask them to fix it or submit a patch
[09:49] keffo	not sure, says sustrik last commit I guess
[09:49] pieterh	hmm, the owner of every project should be WRITTEN IN HUGE LETTERS
[09:49] keffo	then there's http://nzmq.codeplex.com as well
[09:49] pieterh	otherwise it's kind of dead by definition
[09:49] pieterh	nzmq is something layered on top afaics
[09:49] pieterh	different API
[09:50] keffo	yes, but it does the same lowlevel binding of the dll
[09:50] pieterh	a'ight
[09:51] keffo	So merging them shouldn't be very difficult
[09:51] keffo	on the list?
[09:51] pieterh	"What's the latest version of ZeroMQ that's not LGPL? I need to static link in commercial projects and the LGPL is not an option. "
[09:51] pieterh	on http://www.zeromq.org/blog:rfc-0mq-contributions
[09:53] pieterh	not a single sensible comment on that thread, just trolls coming to complain that [sic] switching to LGPL will kill 0MQ...
[09:53] keffo	will it? =)
[09:54] pieterh	oh, yes, of course...
[09:54] pieterh	that's why we have to go back in time and switch to the Microsoft Open Software License or whatever...
[09:55] keffo	licensing is tricky business, it went and got itself hugely complicated
[09:55] keffo	I prefer "mine" and "public domain" :)
[09:55] pieterh	it's not really tricky, just politically sensitive because it involves so much money
[09:56] pieterh	every license is the contract on which the community grows
[09:56] pieterh	LGPL and GPL are IME proven beyond a reasonable doubt to be the most effective contracts
[09:56] pieterh	because they make it impossible to cheat
[09:56] pieterh	end.
[09:56] PerfDave	Not impossible, see gpl-violations.org. But very difficult ;)
[09:57] pieterh	impossible in any sustainable sense
[09:57] pieterh	and GPLv3 closed the loopholes people found in GPLv2
[09:57] keffo	who has the time anyway :)
[09:57] pieterh	oh, people love to cheat
[09:57] pieterh	but communities die when they get parasited
[09:58] pieterh	so these trolls come and complain that LGPL will kill the community when in fact it creates it
[09:58] pieterh	sigh.
[09:58] keffo	:)
[10:03] keffo	what's the most elaborate zmq based project anyway?
[10:07] pieterh	keffo: wow, there are some very elaborate ones out there
[10:07] pieterh	but most are so secret that I'd have to kill you after explaining them
[10:07] keffo	Oh I just mean scale, not impl. details :)
[10:08] keffo	I'd like to know what level I'm at :)
[10:08] pieterh	scale: hundreds to thousands of nodes
[10:09] pieterh	multiple data centers
[10:10] keffo	are they mostly about shuffling data around, or generic compute clusters?
[10:10] pieterh	both cases
[10:10] keffo	interesting stuff
[10:10] pieterh	it's also growing rapidly
[10:11] keffo	I would assume so!
[10:11] pieterh	the first 0MQ projects a year or two ago were maybe 10 nodes
[10:11] pieterh	i'd say the scale is growing x10 every six months or so
[10:12] keffo	It would be nice if the license included algorithmic contributions as opposed to solely source code :)
[10:12] pieterh	well, ideas cannot be copyrighted
[10:13] keffo	A lot of information was probably gathered during development of those, which ideally should be shared :)
[10:13] pieterh	well
[10:13] pieterh	whenever possible we do move experience into the open source layers
[10:14] pieterh	however there are often valuable business secrets in these algorithms
[10:14] pieterh	obviously we do not consider sharing those
[10:14] keffo	I was more thinking along the lines of "this common-practice method breaks down under these conditions" etc
[10:15] pieterh	this is what the user guide will eventually cover
[10:15] pieterh	at least the more common cases
[10:15] pieterh	we can also try to document some of the higher level patterns as protocols
[10:15] keffo	Some of the things I've found annoyingly void so far have been the loadbalancing (which is now covered well enough), and also recursive behaviour, which I think I've solved
[10:16] pieterh	have you been using the custom routing from Ch3?
[10:16] keffo	Yeah, pretty much, but with prioqueues
[10:16] pieterh	what is a prioqueue?
[10:16] keffo	both for incoming tasks and outgoing results
[10:16] keffo	priority queue
[10:16] pieterh	ah, so queues in your broker rather than using the socket queues
[10:17] keffo	indeed
[10:17] pieterh	i just didn't want to start mucking with data structures
[10:17] pieterh	but I think it's inevitable
[10:17] pieterh	i already had to define a zmsg class, will probably define a zqueue class as well
[10:17] keffo	messages and tasks have priorities, as well as workers based on a mix of scimark(hardware) and something like a running average network 'behaviour'
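A rough sketch of a broker that keeps its own priority queue of tasks, as keffo describes, instead of relying on the sockets' FIFO queues. The class name, the 0.7/0.3 weighting between the hardware benchmark and the network score, and all example values are hypothetical; the actual scoring keffo uses is not given in the log.

```python
import heapq
import itertools


class PriorityBroker:
    """Broker-side queues for tasks and idle workers (hypothetical design)."""

    def __init__(self, hw_weight=0.7, net_weight=0.3):   # weights are placeholders
        self.tasks = []                     # heap of (-priority, arrival, task)
        self.arrival = itertools.count()    # FIFO tie-break for equal priorities
        self.idle = {}                      # worker id -> (benchmark, network score)
        self.hw_weight, self.net_weight = hw_weight, net_weight

    def submit(self, task, priority):
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self.tasks, (-priority, next(self.arrival), task))

    def worker_ready(self, worker_id, benchmark, network_score):
        self.idle[worker_id] = (benchmark, network_score)

    def score(self, worker_id):
        benchmark, network = self.idle[worker_id]
        return self.hw_weight * benchmark + self.net_weight * network

    def dispatch(self):
        """Pair the most urgent task with the best idle worker, or return None."""
        if not self.tasks or not self.idle:
            return None
        _, _, task = heapq.heappop(self.tasks)
        best = max(self.idle, key=self.score)
        del self.idle[best]                 # busy until it reports ready again
        return best, task


broker = PriorityBroker()
broker.worker_ready("w-slow", benchmark=100, network_score=50)
broker.worker_ready("w-fast", benchmark=400, network_score=80)
broker.submit("render-frame", priority=1)
broker.submit("urgent-report", priority=5)
broker.submit("cleanup", priority=1)
first = broker.dispatch()    # ('w-fast', 'urgent-report')
second = broker.dispatch()   # ('w-slow', 'render-frame')
third = broker.dispatch()    # None: "cleanup" waits for an idle worker
```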
[10:17] pieterh	right
[10:18] keffo	Another thing I've been struggling to figure out a 'pretty' solution to is how to make workers present their updates when doing long running jobs..
[10:18] pieterh	intermediate updates?
[10:18] keffo	That sort of ties into what we talked about earlier with the Monitoring app
[10:19] keffo	yeah
[10:19] pieterh	two threads in the workers, I assume
[10:19] pieterh	workers as micro clusters
[10:19] pieterh	it's fractal :-)
[10:19] pieterh	every node can be a cluster of nodes
[10:19] keffo	leaving it solely up to the designer of the job (ie, they post progress at will) leads to abuse most likely
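One way to read the "two threads in the workers" suggestion, sketched in Python: one thread runs the job, a second forwards its progress upstream. To address the abuse concern, this sketch assumes (this is not something settled in the log) that the reporter thread rate-limits: it sends at most one update per interval and only the most recent value. The job interface and `send` are hypothetical; `send` stands in for whatever socket carries progress messages.

```python
import queue
import threading
import time


def run_with_progress(job, send, min_interval=0.5):
    """Run job(report) while a second thread forwards its progress via send().

    job:  callable taking a report(value) function (hypothetical job interface).
    send: callable that ships one progress value upstream (stands in for a socket).
    At most one value is forwarded per min_interval, and only the newest one,
    so a job that reports too often cannot flood the network.
    """
    updates = queue.Queue()
    done = threading.Event()

    def reporter():
        last_sent = None
        while not (done.is_set() and updates.empty()):
            try:
                latest = updates.get(timeout=min_interval)
            except queue.Empty:
                continue
            while not updates.empty():      # keep only the most recent value
                latest = updates.get_nowait()
            if latest != last_sent:
                send(latest)
                last_sent = latest
            time.sleep(min_interval)        # rate limit

    thread = threading.Thread(target=reporter)
    thread.start()
    try:
        return job(updates.put)
    finally:
        done.set()                          # the final value is still flushed
        thread.join()


def fake_job(report):
    for step in range(1, 101):
        report(step / 100)                  # reports far more often than needed
    return "done"


sent = []
result = run_with_progress(fake_job, sent.append, min_interval=0.01)
```

The job can report as often as it likes; only the reporter thread decides how much of that reaches the network.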
[10:20] pieterh	sounds like you're solving a lot of interesting problems
[10:20] keffo	indeed :)
[10:20] keffo	most interesting of all is the fact that a job can post new jobs :)
[10:20] pieterh	you should write about it, if you can
[10:20] keffo	that's a brain teaser if anything :)
[10:20] keffo	yeah
[10:20] keffo	It's becoming quite large to be honest :)
[10:20] pieterh	well, stack-based simulated recursion is an old technique
[10:20] keffo	but it's working well
[10:20] pieterh	it's how we used to do quicksort in cobol
[10:21] keffo	stack-based?
[10:21] pieterh	your prioqueue can also be a stack
[10:21] pieterh	you can push jobs to the front
[10:21] pieterh	or to the back
[10:21] keffo	ah yeah, that's the priority of the messages :)
[10:21] pieterh	that's how you simulate recursion
[10:21] keffo	the deeper the recursion, the higher the priority
[10:21] pieterh	priority is perhaps the wrong metaphor
[10:22] pieterh	in fact it's "push these child jobs" followed by "pop next job and execute"
[10:23] pieterh	hah, I found an old paper on this: http://www.arnoldtrembley.com/svalgard.htm
[10:24] pieterh	If you look at section 3, you see how Quicksort (recursion) works using a stack
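Stack-based simulated recursion in its general form, as a Python sketch (it is not the pseudocode from the linked paper): each job is an index range to sort; running a job partitions the range and pushes the two halves as child jobs, then the loop pops the next job.

```python
def quicksort_with_stack(items):
    """Sort a list in place using an explicit stack of jobs instead of recursion.

    Each job is a (lo, hi) slice to sort. Executing a job partitions the slice
    and pushes the two halves as child jobs; the loop then pops the next job.
    """
    jobs = [(0, len(items) - 1)]
    while jobs:
        lo, hi = jobs.pop()                 # "pop next job and execute"
        if lo >= hi:
            continue
        pivot = items[hi]                   # Lomuto partition around the last element
        i = lo
        for j in range(lo, hi):
            if items[j] < pivot:
                items[i], items[j] = items[j], items[i]
                i += 1
        items[i], items[hi] = items[hi], items[i]
        jobs.append((lo, i - 1))            # "push these child jobs"
        jobs.append((i + 1, hi))
    return items


print(quicksort_with_stack([5, 3, 8, 1, 9, 2, 7]))   # [1, 2, 3, 5, 7, 8, 9]
```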
[10:24] keffo	My solution was to let each worker-node (which owns one worker-process per cpu-core) request an additional worker to be spawned while it sleeps.. So when it posts a child-job, it does so with a higher priority
[10:24] pieterh	http://www.arnoldtrembley.com/pseudor2.htm
[10:24] keffo	interesting
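keffo's variant, where a child job is posted with a higher priority than its parent, can be sketched with a heap keyed on recursion depth. This only models the queueing order: the extra worker spawned while the parent sleeps is left out, and the job interface (a callable returning child jobs and an optional result) is hypothetical. Because deeper jobs always run first, the queue holds roughly one waiting sibling per level rather than a whole level at once.

```python
import heapq
import itertools


def run_jobs(root_job):
    """Run jobs that may post child jobs, giving deeper jobs higher priority.

    A job is a callable returning (children, result): child jobs to post, and a
    value to collect or None. Returns the collected results and the largest
    number of jobs that were ever waiting in the queue.
    """
    order = itertools.count()                    # FIFO tie-break within a depth
    pending = [(0, next(order), root_job)]       # heap of (-depth, posted, job)
    results, max_pending = [], 1
    while pending:
        neg_depth, _, job = heapq.heappop(pending)
        children, result = job()
        if result is not None:
            results.append(result)
        for child in children:
            # One level deeper means a smaller key, so children pop first.
            heapq.heappush(pending, (neg_depth - 1, next(order), child))
        max_pending = max(max_pending, len(pending))
    return results, max_pending


def sum_range(lo, hi):
    """Hypothetical job: sum lo..hi-1, splitting ranges longer than 4."""
    def job():
        if hi - lo <= 4:
            return [], sum(range(lo, hi))
        mid = (lo + hi) // 2
        return [sum_range(lo, mid), sum_range(mid, hi)], None
    return job


results, max_pending = run_jobs(sum_range(0, 64))
print(sum(results), max_pending)   # 2016 5
```

For this 64-element example the queue never holds more than 5 jobs, whereas a plain FIFO would at one point hold all 16 leaf jobs.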
[10:25] pieterh	Leif and I developed these techniques in the 80's...
[10:25] keffo	how old are you? =)
[10:25] pieterh	it's very basic but I think it maps correctly to recursive messaging
[10:25] pieterh	not so old
[10:25] pieterh	:-)
[10:25] pieterh	47, to be accurate
[10:26] keffo	that was grad student times then? =)
[10:26] pieterh	nope, first job developing tools for large software houses
[10:26] pieterh	event-driven concurrency in cobol
[10:27] keffo	cobol of all things :)
[10:27] keffo	there's a scary amount of need for cobol developers now
[10:28] pieterh	lol...
[10:28] pieterh	we used to train people to become cobol developers in like 3 weeks
[10:29] keffo	That's even more scary :)
[10:29] keffo	like today's php crowd I guess?
[10:29] pieterh	hmm, yeah, I guess
[10:29] pieterh	Cobol was good for mediocre programmers, they could make stuff that worked, and didn't kill the system
[10:30] keffo	I'd be very afraid if my bank announced they moved from cobol to php :)
[10:30] pieterh	indeed
[10:30] keffo	or voting/pacemakers :)
[11:33] CIA-20	jzmq: Stefan Majer master * rd46166f / src/org/zeromq/ZMQ.java : Merged from upstream - http://bit.ly/c9pkwW
[11:33] CIA-20	jzmq: Stefan Majer master * r05b384d / src/org/zeromq/ZMQ.java : Reduced duplicate Javadoc comments by references to the corresponding setter. - http://bit.ly/dvaPdr
[11:33] CIA-20	jzmq: Stefan Majer master * r1a98406 / src/org/zeromq/ZMQ.java : References to the man pages to further clarify the Javadoc. - http://bit.ly/c5FVwB
[11:33] CIA-20	jzmq: Gonzalo Diethelm master * rc7c9929 / src/org/zeromq/ZMQ.java : Merge branch 'master' of http://github.com/majst01/jzmq into majst02-master - http://bit.ly/93TVKZ
[13:27] CIA-20	zeromq2: Gonzalo Diethelm master * r87beaaa / (14 files in 4 dirs): ZMQ_TYPE socket option added - http://bit.ly/b3HZYe
[13:32] CIA-20	zeromq2: Martin Sustrik master * r10bb9d0 / AUTHORS : Dhammika Pathirana was missing from the AUTOHRS file for some reason -- fixed - http://bit.ly/cz4TZ7
[14:39] CIA-20	zeromq2: Steven McCoy master * r00cd7d4 / (7 files in 3 dirs): Upgrade to OpenPGM-5.0.78 - http://bit.ly/aKqJjZ
[15:01] CIA-20	zeromq2: Steven McCopy master * r1dc4531 / (4 files): (log message trimmed)
[15:01] CIA-20	zeromq2: * Add assertions to check for OpenPGM calls with invalid parameters.
[15:01] CIA-20	zeromq2: * Assertion to check that pgm_getaddrinfo is actually returning something.
[15:01] CIA-20	zeromq2: * Missing pgm_connect call.
[15:01] CIA-20	zeromq2: * Typo on TOS causing immediate abort.
[15:01] CIA-20	zeromq2: * Placeholder calls for timeouts whilst continuing spin loop functionality.
[15:01] CIA-20	zeromq2: * OpenPGM v5 now supports reference counting so remove init checks.
[15:28] psino	I'm having some issues understanding how a PUSH socket with no downstream nodes is supposed to behave. There seems to be a difference between the sockets depending on whether they have been used with .bind or .connect
[15:28] psino	I have a small test case here: https://gist.github.com/a7e8b4fdfc303ce4b0e6
[15:28] psino	the connect_socket sends the data (send does not block) even if there are no downstream nodes, but the bind_socket blocks
[15:30] psino	is this the expected behaviour? from what I understand after reading http://api.zeromq.org/zmq_socket.html, both should have been blocking
[15:31] psino	As its said here: "When a ZMQ_PUSH socket enters an exceptional state due to having reached the high water mark for all downstream nodes, or if there are no downstream nodes at all, then any zmq_send(3) operations on the socket shall block until the exceptional state ends or at least one downstream node becomes available for sending; messages are not discarded."
[15:37] sustrik	psino: yes. it works that way
[15:38] sustrik	on connect, socket can immediately create a queue to store messages in -- even before actual connection is established
[15:38] psino	so the manual is incorrect/imprecise?
[15:39] sustrik	when binding, the socket has to wait till peers connect, it cannot create a queue itself (it doesn't even know whether there'll be a connection in the future)
[15:39] sustrik	imprecise, i would say
[15:39] sustrik	connect assumes that there's a peer
[15:40] sustrik	exactly 1 peer
[15:40] sustrik	bind makes no such assumption
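A toy Python model of the behaviour sustrik describes. It is not the libzmq implementation and ignores the high water mark; it only shows why a connected PUSH socket can accept a message immediately while a bound one has nowhere to put it until a peer arrives. `ToyPushSocket` and `WouldBlock` are hypothetical names (`WouldBlock` marks the point where a real blocking send would wait), and the endpoint strings are placeholders.

```python
from collections import deque


class WouldBlock(Exception):
    """Raised where a real blocking send would wait instead."""


class ToyPushSocket:
    """Toy model of the PUSH behaviour described above; not the real library."""

    def __init__(self):
        self.pipes = []          # one outbound queue per (expected) peer
        self.next_pipe = 0

    def connect(self, endpoint):
        # connect assumes exactly one peer, so its queue can exist right away,
        # before the underlying connection is actually established.
        self.pipes.append(deque())

    def bind(self, endpoint):
        # bind cannot know whether anyone will ever connect: no queue yet.
        pass

    def peer_connected(self):
        # A peer connecting to a bound socket finally creates a queue.
        self.pipes.append(deque())

    def send(self, msg):
        if not self.pipes:
            raise WouldBlock("no downstream queue; a real send would block")
        pipe = self.pipes[self.next_pipe % len(self.pipes)]   # round-robin
        self.next_pipe += 1
        pipe.append(msg)


connected = ToyPushSocket()
connected.connect("tcp://127.0.0.1:5555")   # placeholder endpoint; no peer listening
connected.send(b"hello")                    # accepted: queued for the expected peer

bound = ToyPushSocket()
bound.bind("tcp://*:5555")                  # placeholder endpoint
try:
    bound.send(b"hello")
    blocked = False
except WouldBlock:
    blocked = True                          # nowhere to queue the message yet

bound.peer_connected()                      # a peer arrives...
bound.send(b"hello")                        # ...and now the send succeeds
```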
[15:41] psino	hmm
[15:52] jashmenn	hey can anyone give me some guidance on socket migration between threads
[15:52] jashmenn	i'm having a tough time figuring out how that works
[15:52] jashmenn	basically i want to close a socket from a thread different than the thread that started it
[15:58] mato	sustrik: yo
[15:59] sustrik	mato: hi
[15:59] mato	sustrik: i would have appreciated a heads up on the OpenPGM commits
[16:00] mato	sustrik: it does touch the build system, of which you declared me the maintainer :-)
[16:00] sustrik	ah
[16:00] sustrik	i blindly applied the patches :|
[16:00] mato	also, isn't the author of those commits steve mccoy?
[16:00] sustrik	yes, he is
[16:00] mato	ah, it's there
[16:00] mato	'cept you have a typo
[16:00] mato	but that's my fault
[16:01] sustrik	?
[16:01] mato	sustrik: you should not be typing author names manually if at all possible
[16:01] sustrik	no idea how to do it automatically
[16:01] mato	git-am :-)
[16:02] sustrik	good god!
[16:02] mato	sustrik: anyway, please do ask for review before committing stuff that touches other people's designated areas of responsibility
[16:02] sustrik	anyway, can you check the part that touches the build system post hoc?
[16:02] mato	yes, i can, i will
[16:02] sustrik	thanks
[16:02] mato	but please don't do that in future :-)
[16:02] mato	including e.g. docs
[16:02] sustrik	sure
[16:03] sustrik	eek
[16:03] mato	eek?
[16:03] sustrik	i've just committed gonzalos patch
[16:03] sustrik	which touches docs
[16:03] sustrik	ZMQ_TYPE socket opt description
[16:03] mato	it also has a commit to .gitignore for bin/ for some reason
[16:03] mato	as part of gonzalo's patch
[16:03] sustrik	hm
[16:04] sustrik	it shouldn't be there but once it's there it doesn't hurt
[16:04] mato	sure
[16:04] mato	but my point is be careful about what is in a patch
[16:04] sustrik	what are we going to do with patches that intersect different areas of functionality?
[16:04] sustrik	there's no clear committer there
[16:05] mato	ask the interested parties for review
[16:05] mato	that's how it normally works
[16:05] sustrik	ok, will do
[16:05] mato	also, if you want my attention quickly a good way to do that is to Cc: me directly as well as sending email to the list
[16:05] sustrik	ack
[16:05] mato	that way it lands in my INBOX rather than in the auto-filtered-away list folder :-)
[16:42] CIA-20	jzmq: Gonzalo Diethelm master * re3e6b7f / (src/Socket.cpp src/org/zeromq/ZMQ.java): Added support for getType() and added some missing constants. - http://bit.ly/at8wUF
[17:30] vharron	Is there a list of projects using zeromq? I'm trying to understand what it does well by looking at the problems it solves.
[17:52] drbobbeaty	vharron: I can't give you a list of projects, but I can tell you about mine. I'm using ZMQ as a transport layer using OpenPGM and the reliable multicast. I'm using this because what I need is a transport more than a full-fledged messaging system like Tibco or 29West. It's working out wonderfully in this capacity and we're moving a lot of messages through the system.
[17:53] drbobbeaty	It's not a real full-blown messaging system - but you can build one with it. I just see its real target as a very advanced transport system with a lot of flexibility.
[20:47] CIA-20	zeromq2: Steven McCoy master * ra729357 / (6 files): more fixes to (e)pgm transport - http://bit.ly/9E1o2T
[23:14] lluad	If I have multiple clients connecting to a single server over TCP, is there a good idiom for the server to send an unsolicited message to just one of the clients?
[23:20] Zao	lluad: When connecting a client, send a connection string to the server if it needs to communicate with you?
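Zao's suggestion, sketched in Python: each client's first message announces an ID and an endpoint where it can be reached, and the server keeps that mapping so it can later push a message to exactly one client. The `network` dict of in-memory queues stands in for real sockets, and the `Server` class, client IDs and endpoints are hypothetical.

```python
import queue


class Server:
    """Keeps an endpoint per client so it can message one client unsolicited."""

    def __init__(self, network):
        self.network = network     # endpoint -> inbox; stands in for real sockets
        self.clients = {}          # client id -> endpoint the client announced

    def on_hello(self, client_id, endpoint):
        # First message a client sends after connecting: its id and where to reach it.
        self.clients[client_id] = endpoint

    def send_to(self, client_id, msg):
        # Unsolicited message to exactly one client; KeyError if it never said hello.
        self.network[self.clients[client_id]].put(msg)


network = {}
server = Server(network)
for client_id in ("alice", "bob"):
    endpoint = f"tcp://{client_id}.example:6000"    # hypothetical endpoints
    network[endpoint] = queue.Queue()
    server.on_hello(client_id, endpoint)

server.send_to("bob", "ping")
```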