Tuesday September 28, 2010

[Time] Name Message
[05:39] CIA-20 zeromq2: Martin Sustrik maint * rf61921d / src/req.cpp : REQ socket can die when reply is delivered on wrong underlying connection -- fixed -
[05:46] CIA-20 zeromq2: Dhammika Pathirana maint * rc1deb22 / src/ypipe.hpp : crash when closing an ypipe -- fixed -
[05:53] CIA-20 zeromq2: Martin Sustrik master * rf61921d / src/req.cpp : REQ socket can die when reply is delivered on wrong underlying connection -- fixed -
[05:53] CIA-20 zeromq2: Dhammika Pathirana master * rc1deb22 / src/ypipe.hpp : crash when closing an ypipe -- fixed -
[05:53] CIA-20 zeromq2: Martin Sustrik master * r6715f9b / src/ypipe.hpp :
[05:53] CIA-20 zeromq2: Merge branch 'maint'
[05:53] CIA-20 zeromq2: * maint:
[05:53] CIA-20 zeromq2: crash when closing an ypipe -- fixed -
[09:20] keffo pieterh, around?
[09:21] pieterh keffo: hi!
[09:23] keffo Busy?
[09:24] pieterh always, but shoot...
[09:24] keffo I'm in the process of solidifying how the network is monitored.. Currently I gather various info & statistics in the loadbalancer which publishes regularly(~5s), and a WPF app subscribing and displaying nice graphs etc
[09:25] pieterh sounds good
[09:25] keffo but I cant really figure out a good way of limiting it.. I dont want to send a complete state every 5s
[09:25] pieterh how large is the state?
[09:26] keffo Depends, both on what info I decide to publish.. I'd like load, bandwidth usage, connected nodes and their respective stats(cpu/ram/etc), but also an overview of what is happening, as well as more detailed info for each worker
[09:27] pieterh estimated size? in bytes?
[09:27] keffo geesh, no idea, not in the mb range at least :)
[09:27] pieterh if you have no idea, it's not sensible to think about optimizing it
[09:27] pieterh so do a back-of-envelope calculation and come up with a figure...
[09:28] keffo I wondered if it was sound to have the monitoring app poll a complete current-state at startup, and then depend on a persistent sub-forwarder to handle deltastates? But that sounds very complicated
[09:28] keffo I can't really guesstimate, the number of nodes can range from the local setup I have here of a few machines, to much larger..
[09:28] pieterh ...
[09:29] pieterh how big is "much larger"?
[09:29] keffo ideally wan :)
[09:29] pieterh please stick a number onto it...
[09:29] keffo but being realistic, perhaps around 20?
[09:29] pieterh and how large would the state be per worker?
[09:30] pieterh please stick a number onto it...
[09:30] pieterh then multiply the two numbers and add something for the overview
[09:30] keffo Basic info(linpack measurements), uptime, average load, around that
[09:30] pieterh come back when you have a total in KB per 5 seconds, a'ight?
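The back-of-envelope calculation pieterh asks for could look like the sketch below. The node count (20) and the 5-second interval come from the conversation; the per-worker and overview sizes are placeholder assumptions, since no figures were given.

```python
# Back-of-envelope estimate of monitoring traffic.
NODES = 20                # keffo's "realistic" node count
PUBLISH_INTERVAL_S = 5    # publish period mentioned in the conversation
PER_WORKER_BYTES = 200    # ASSUMPTION: not stated in the conversation
OVERVIEW_BYTES = 1024     # ASSUMPTION: not stated in the conversation


def state_size_bytes(nodes, per_worker, overview):
    """Multiply the two numbers and add something for the overview."""
    return nodes * per_worker + overview


total = state_size_bytes(NODES, PER_WORKER_BYTES, OVERVIEW_BYTES)
print(f"{total / 1024:.1f} KB per {PUBLISH_INTERVAL_S} s "
      f"(about {total / PUBLISH_INTERVAL_S:.0f} bytes/s)")
```

With these assumed sizes the full state is only a few KB per interval, which is the kind of figure that tells you whether optimizing is worth the effort.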
[09:32] keffo That's not what I'm interested in, the assumption here is that no data is published that isn't "needed", but I would like to figure out the most efficient means of passing around that data
[09:32] keffo basically how to monitor a distributed system with the least amount of overhead as possible..
[09:32] pieterh this is for research purposes rather than an actual use case...
[09:32] keffo (regardless of what the data actually is)
[09:33] keffo well both I guess, research first, use later :)
[09:33] pieterh well, you can wait until Ch4 of the Guide if you want to
[09:33] keffo Surely this problem has been dealt with before, as loadbalancing has :)
[09:33] pieterh but here is how I'd do it...
[09:33] pieterh - maintain state in the publisher
[09:33] pieterh - apply updates to state and publish updates to pub socket
[09:34] pieterh - in subscriber, request state via req/rep socket
[09:34] pieterh - and also subscribe to updates
[09:34] pieterh - queue incoming updates
[09:34] pieterh - as soon as state arrives, apply updates to state and continue to do this
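The steps pieterh lists can be sketched without any sockets, as plain objects passing messages by hand. This is only an illustration: the class names are hypothetical, and the sequence number used to skip updates already contained in the snapshot is an added assumption that is not part of the list above.

```python
from collections import deque


class Publisher:
    """Holds the authoritative state and emits one update per change."""

    def __init__(self):
        self.state = {}
        self.seq = 0  # ASSUMPTION: sequence numbers are not in the original list

    def update(self, key, value):
        self.seq += 1
        self.state[key] = value
        return (self.seq, key, value)      # what would go out on the pub socket

    def snapshot(self):
        return (self.seq, dict(self.state))  # what a req/rep reply would carry


class Subscriber:
    """Queues updates until the snapshot arrives, then applies them."""

    def __init__(self):
        self.state = None
        self.seq = 0
        self.pending = deque()

    def on_update(self, update):
        if self.state is None:
            self.pending.append(update)    # no state yet: queue it
        else:
            self._apply(update)

    def on_snapshot(self, seq, state):
        self.state, self.seq = dict(state), seq
        while self.pending:                # drain the queue in arrival order
            self._apply(self.pending.popleft())

    def _apply(self, update):
        seq, key, value = update
        if seq > self.seq:                 # skip updates the snapshot already has
            self.state[key] = value
            self.seq = seq


pub, sub = Publisher(), Subscriber()
pub.update("node1.load", 0.4)                 # sent before sub existed: missed
sub.on_update(pub.update("node2.load", 0.9))  # queued, snapshot not here yet
snap = pub.snapshot()                         # state request answered here
sub.on_update(pub.update("node1.load", 0.7))  # queued while reply is in flight
sub.on_snapshot(*snap)                        # state arrives, queue is applied
sub.on_update(pub.update("node3.load", 0.1))  # applied directly from now on
print(sub.state == pub.state)
```

Subscribing before requesting the state is what makes this work: any update published while the snapshot is in flight is already sitting in the queue when the snapshot arrives.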
[09:34] keffo m, that's what I was thinking as well.
[09:35] pieterh i think it's robust but need to prove it
[09:35] keffo Things do start to get hairy if the monitor app starts to depend on delta-updates though, like "join/part" of nodes
[09:35] keffo you're doing this type of stuff for ch 4?
[09:35] pieterh yup
[09:36] pieterh stateful pubsub
[09:36] pieterh or whatever this is properly called...
[09:36] keffo I'll try it out, see if it behaves well.. :)
[09:37] pieterh feel free to write it up as a recipe or code sample
[09:37] pieterh if i can reuse that for the guide it'll save me time
[09:37] keffo I'll let you know for sure yeah
[09:38] pieterh the main reason for this is not so much to save network bandwidth but to allow realtime updates
[09:38] keffo I was also thinking about 'history-nodes'.. something appealing about that, someone who keeps track of what's going on, but isn't directly part -of- the system
[09:38] pieterh indeed, this work could be totally outsourced to a stateful device
[09:38] keffo non-gonzo network monitoring
[09:39] pieterh with some notion of state + patches
[09:39] pieterh like pair/value updates
[09:39] pieterh hmm, nice
[09:39] pieterh it's a distributed cache
[09:40] keffo yes, distributed tuple-storage
[09:40] pieterh yup, that's the thing
[09:40] keffo I can't help thinking most source control software faces much of the same issues here
[09:40] pieterh in the general solution any node can update its cache
[09:41] pieterh it is a very useful general solution to state distribution
[09:41] keffo oh yeah, one can think of it as cache hits and misses, makes things clearer
[09:42] pieterh i'd start with a simple model, one publisher, many subscribers, pair/value updates
[09:43] keffo It reuses the same mechanisms as the rest
[09:44] keffo maybe I could contribute some C# stuff to go along with ch 4.. I do find it quite lacking on the site..
[09:45] pieterh keffo: we've gotten one C# example yesterday, but others would be great
[09:46] pieterh do start at ch1 if you could, it's quite trivial stuff but useful to newcomers
[09:46] keffo yeah..
[09:46] keffo need a complete binding, for starters :)
[09:47] pieterh the binding is not complete?
[09:47] keffo no, there was no poll for example?
[09:47] keffo or did I somehow get an old version?
[09:48] keffo
[09:48] keffo nope..
[09:48] pieterh who maintains this binding?
[09:49] pieterh ask them to fix it or submit a patch
[09:49] keffo not sure, says sustrik last commit I guess
[09:49] pieterh hmm, the owner of every project should be WRITTEN IN HUGE LETTERS
[09:49] keffo then there's as well
[09:49] pieterh otherwise it's kind of dead by definition
[09:49] pieterh nzmq is something layered on top afaics
[09:49] pieterh different API
[09:50] keffo yes, but it does the same lowlevel binding of the dll
[09:50] pieterh a'ight
[09:51] keffo So merging them shouldnt be very difficult
[09:51] keffo on the list?
[09:51] pieterh "What's the latest version of ZeroMQ that's not LGPL? I need to static link in commercial projects and the LGPL is not an option. "
[09:51] pieterh on
[09:53] pieterh not a single sensible comment on that thread, just trolls coming to complain that [sic] switching to LGPL will kill 0MQ...
[09:53] keffo will it? =)
[09:54] pieterh oh, yes, of course...
[09:54] pieterh that's why we have to go back in time and switch to the Microsoft Open Software License or whatever...
[09:55] keffo licensing is tricky business, it went and got itself hugely complicated
[09:55] keffo I prefer "mine" and "public domain" :)
[09:55] pieterh it's not really tricky, just politically sensitive because it involves so much money
[09:56] pieterh every license is the contract on which the community grows
[09:56] pieterh LGPL and GPL are IME proven beyond a reasonable doubt to be the most effective contracts
[09:56] pieterh because they make it impossible to cheat
[09:56] pieterh end.
[09:56] PerfDave Not *impossible*, see But very difficult ;)
[09:57] pieterh impossible in any sustainable sense
[09:57] pieterh and GPLv3 closed the loopholes people found in GPLv2
[09:57] keffo who has the time anyway :)
[09:57] pieterh oh, people love to cheat
[09:57] pieterh but communities die when they get parasited
[09:58] pieterh so these trolls come and complain that LGPL will kill the community when in fact it creates it
[09:58] pieterh sigh.
[09:58] keffo :)
[10:03] keffo what's the most elaborate zmq based project anyway?
[10:07] pieterh keffo: wow, there are some very elaborate ones out there
[10:07] pieterh but most are so secret that I'd have to kill you after explaining them
[10:07] keffo Oh I just mean scale, not impl. details :)
[10:08] keffo I'd like to know what level I'm at :)
[10:08] pieterh scale: hundreds to thousands of nodes
[10:09] pieterh multiple data centers
[10:10] keffo are they mostly about shuffling data around, or generic compute clusters?
[10:10] pieterh both cases
[10:10] keffo interesting stuff
[10:10] pieterh it's also growing rapidly
[10:11] keffo I would assume so!
[10:11] pieterh the first 0MQ projects a year or two ago were maybe 10 nodes
[10:11] pieterh i'd say the scale is growing x10 every six months or so
[10:12] keffo It would be nice if the license included algorithmic contributions as opposed to solely sourcecode :)
[10:12] pieterh well, ideas cannot be copyrighted
[10:13] keffo A lot of information was probably gathered during development of those, which ideally should be shared :)
[10:13] pieterh well
[10:13] pieterh whenever possible we do move experience into the open source layers
[10:14] pieterh however there are often valuable business secrets in these algorithms
[10:14] pieterh obviously we do not consider sharing those
[10:14] keffo I was more thinking along the lines of "this common-practice method breaks down under these conditions" etc
[10:15] pieterh this is what the user guide will eventually cover
[10:15] pieterh at least the more common cases
[10:15] pieterh we can also try to document some of the higher level patterns as protocols
[10:15] keffo Some of the things I've found annoyingly void so far have been the loadbalancing(which is now covered well enough), and also recursive behaviour, which I think I've solved
[10:16] pieterh have you been using the custom routing from Ch3?
[10:16] keffo Yeah, pretty much, but with prioqueues
[10:16] pieterh what is a prioqueue?
[10:16] keffo both for incoming tasks and outgoing results
[10:16] keffo priority queue
[10:16] pieterh ah, so queues in your broker rather than using the socket queues
[10:17] keffo indeed
[10:17] pieterh i just didn't want to start mucking with data structures
[10:17] pieterh but I think it's inevitable
[10:17] pieterh i already had to define a zmsg class, will probably define a zqueue class as well
[10:17] keffo messages and tasks have priorities, as well as workers based on a mix of scimark(hardware) and something like a running average network 'behaviour'
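A broker-side priority queue of the kind keffo describes might be sketched as follows. This is not keffo's code: the class name, the integer priorities and the first-in-first-out tie-break are all assumptions made for illustration.

```python
import heapq
import itertools


class PriorityTaskQueue:
    """Highest priority first; equal priorities come out in arrival order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserving arrival order

    def push(self, task, priority):
        # heapq is a min-heap, so the priority is negated to pop the largest first
        heapq.heappush(self._heap, (-priority, next(self._counter), task))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


q = PriorityTaskQueue()
q.push("routine job A", priority=1)
q.push("urgent job", priority=5)
q.push("routine job B", priority=1)
order = [q.pop() for _ in range(len(q))]
print(order)
```

The same structure could hold outgoing results, or workers ranked by a benchmark score, by pushing those objects instead of task names.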
[10:17] pieterh right
[10:18] keffo Another thing I've been struggling to figure out a 'pretty' solution to is how to make workers present their updates when doing long running jobs..
[10:18] pieterh intermediate updates?
[10:18] keffo That sort of ties into what we talked about earlier with the Monitoring app
[10:19] keffo yeah
[10:19] pieterh two threads in the workers, I assume
[10:19] pieterh workers as micro clusters
[10:19] pieterh it's fractal :-)
[10:19] pieterh every node can be a cluster of nodes
[10:19] keffo leaving it solely up to the designer of the job (ie, they post progress at will) leads to abuse most likely
[10:20] pieterh sounds like you're solving a lot of interesting problems
[10:20] keffo indeed :)
[10:20] keffo most interesting of all is the fact that a job can post new jobs :)
[10:20] pieterh you should write about it, if you can
[10:20] keffo that's a brain teaser if anything :)
[10:20] keffo yeah
[10:20] keffo It's becoming quite large to be honest :)
[10:20] pieterh well, stack-based simulated recursion is an old technique
[10:20] keffo but it's working well
[10:20] pieterh it's how we used to do quicksort in cobol
[10:21] keffo stack-based?
[10:21] pieterh your prioqueue can also be a stack
[10:21] pieterh you can push jobs to the front
[10:21] pieterh or to the back
[10:21] keffo ah yeah, that's the priority of the messages :)
[10:21] pieterh that's how you simulate recursion
[10:21] keffo the deeper the recursion, the higher the priority
[10:21] pieterh priority is perhaps the wrong metaphor
[10:22] pieterh in fact it's "push these child jobs" followed by "pop next job and execute"
[10:23] pieterh hah, I found an old paper on this:
[10:24] pieterh If you look at section 3, you see how Quicksort (recursion) works using a stack
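The "push these child jobs, pop next job and execute" idea can be illustrated with a quicksort that uses an explicit stack instead of recursive calls. This is a generic textbook sketch, not the code from the paper pieterh mentions; the function name and the choice of the last element as pivot are assumptions.

```python
def quicksort_iterative(items):
    """Sort a copy of items, using an explicit stack of pending sub-ranges."""
    data = list(items)
    stack = [(0, len(data) - 1)]       # each "job" is a sub-range to sort
    while stack:
        lo, hi = stack.pop()           # pop next job and execute
        if lo >= hi:
            continue                   # ranges of 0 or 1 elements are sorted
        pivot = data[hi]
        i = lo
        for j in range(lo, hi):        # partition around the pivot
            if data[j] < pivot:
                data[i], data[j] = data[j], data[i]
                i += 1
        data[i], data[hi] = data[hi], data[i]
        stack.append((lo, i - 1))      # push these child jobs
        stack.append((i + 1, hi))
    return data


print(quicksort_iterative([3, 1, 4, 1, 5, 9, 2, 6]))
```

In the messaging case the stack entries would be job messages rather than index ranges, and a job that posts new jobs corresponds to the two `append` calls.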
[10:24] keffo My solution was to let each worker-node (which owns one worker-process per cpu-core) request an additional worker to be spawned while it sleeps.. So when it posts a child-job, it does so with a higher priority
[10:24] pieterh
[10:24] keffo interesting
[10:25] pieterh Leif and I developed these techniques in the 80's...
[10:25] keffo how old are you? =)
[10:25] pieterh it's very basic but I think it maps correctly to recursive messaging
[10:25] pieterh not so old
[10:25] pieterh :-)
[10:25] pieterh 47, to be accurate
[10:26] keffo that was grad student times then? =)
[10:26] pieterh nope, first job developing tools for large software houses
[10:26] pieterh event-driven concurrency in cobol
[10:27] keffo cobol of all things :)
[10:27] keffo there's a scary amount of need for cobol developers now
[10:28] pieterh lol...
[10:28] pieterh we used to train people to become cobol developers in like 3 weeks
[10:29] keffo That's even more scary :)
[10:29] keffo like today's php crowd I guess?
[10:29] pieterh hmm, yeah, I guess
[10:29] pieterh Cobol was good for mediocre programmers, they could make stuff that worked, and didn't kill the system
[10:30] keffo I'd be very afraid if my bank announced they moved from cobol to php :)
[10:30] pieterh indeed
[10:30] keffo or voting/pacemakers :)
[11:33] CIA-20 jzmq: Stefan Majer master * rd46166f / src/org/zeromq/ : Merged from upstream -
[11:33] CIA-20 jzmq: Stefan Majer master * r05b384d / src/org/zeromq/ : Reduced duplicate Javadoc comments by references to the corresponding setter. -
[11:33] CIA-20 jzmq: Stefan Majer master * r1a98406 / src/org/zeromq/ : References to the man pages to further clarify the Javadoc. -
[11:33] CIA-20 jzmq: Gonzalo Diethelm master * rc7c9929 / src/org/zeromq/ : Merge branch 'master' of into majst02-master -
[13:27] CIA-20 zeromq2: Gonzalo Diethelm master * r87beaaa / (14 files in 4 dirs): ZMQ_TYPE socket option added -
[13:32] CIA-20 zeromq2: Martin Sustrik master * r10bb9d0 / AUTHORS : Dhammika Pathirana was missing from the AUTHORS file for some reason -- fixed -
[14:39] CIA-20 zeromq2: Steven McCoy master * r00cd7d4 / (7 files in 3 dirs): Upgrade to OpenPGM-5.0.78 -
[15:01] CIA-20 zeromq2: Steven McCopy master * r1dc4531 / (4 files): (log message trimmed)
[15:01] CIA-20 zeromq2: * Add assertions to check for OpenPGM calls with invalid parameters.
[15:01] CIA-20 zeromq2: * Assertion to check that pgm_getaddrinfo is actually returning something.
[15:01] CIA-20 zeromq2: * Missing pgm_connect call.
[15:01] CIA-20 zeromq2: * Typo on TOS causing immediate abort.
[15:01] CIA-20 zeromq2: * Placeholder calls for timeouts whilst continuing spin loop functionality.
[15:01] CIA-20 zeromq2: * OpenPGM v5 now supports reference counting so remove init checks.
[15:28] psino I'm having some issues with understanding how a PUSH socket with no downstream nodes is supposed to behave. There seems to be a difference between sockets depending on whether they were set up with .bind or .connect
[15:28] psino I have a small test case here:
[15:28] psino the connect_socket sends the data (send does not block) even if there are no downstream nodes, but the bind_socket blocks
[15:30] psino is this the expected behaviour? from what I understand after reading is that both should have been blocking
[15:31] psino As its said here: "When a ZMQ_PUSH socket enters an exceptional state due to having reached the high water mark for all downstream nodes, or if there are no downstream nodes at all, then any zmq_send(3) operations on the socket shall block until the exceptional state ends or at least one downstream node becomes available for sending; messages are not discarded."
[15:37] sustrik psino: yes. it works that way
[15:38] sustrik on connect, socket can immediately create a queue to store messages in -- even before actual connection is established
[15:38] psino so the manual is incorrect/imprecise?
[15:39] sustrik when binding, the socket has to wait till peers connect, it cannot create a queue itself (it doesn't even know whether there'll be a connection in the future)
[15:39] sustrik imprecise, i would say
[15:39] sustrik connect assumes that there's a peer
[15:40] sustrik exactly 1 peer
[15:40] sustrik bind makes no such assumption
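sustrik's explanation can be pictured with a toy model in plain Python. It is not the libzmq API: `ToyPushSocket`, `peer_connected` and `try_send` are hypothetical names, the send returns False where a real zmq_send would block, and high water marks and round-robin distribution are left out.

```python
from collections import deque


class ToyPushSocket:
    """Toy model of the behaviour described above; not the libzmq API."""

    def __init__(self):
        self.pipes = []  # one outgoing queue per known peer

    def connect(self, endpoint):
        # connect assumes exactly one peer, so a queue can be created at once,
        # even before the actual connection is established
        self.pipes.append(deque())

    def bind(self, endpoint):
        # bind makes no such assumption: no queue until a peer shows up
        pass

    def peer_connected(self):
        # hypothetical hook: a peer has attached to a bound socket
        self.pipes.append(deque())

    def try_send(self, msg):
        """Return True if the message was queued, False if a send would block."""
        if not self.pipes:
            return False
        self.pipes[0].append(msg)
        return True


connector = ToyPushSocket()
connector.connect("tcp://localhost:5555")
sent_on_connect = connector.try_send("hello")       # queued, no peer needed yet

binder = ToyPushSocket()
binder.bind("tcp://*:5555")
sent_before_peer = binder.try_send("hello")         # nowhere to queue it
binder.peer_connected()
sent_after_peer = binder.try_send("hello")          # a queue now exists
print(sent_on_connect, sent_before_peer, sent_after_peer)
```

In this model the difference psino observed falls out of where the queue is created, which matches the manual's wording only for the bound socket.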
[15:41] psino hmm
[15:52] jashmenn hey can anyone give me some guidance on socket migration between threads
[15:52] jashmenn i'm having a tough time figuring out how that works
[15:52] jashmenn basically i want to close a socket from a thread different than the thread that started it
[15:58] mato sustrik: yo
[15:59] sustrik mato: hi
[15:59] mato sustrik: i would have appreciated a heads up on the OpenPGM commits
[16:00] mato sustrik: it does touch the build system, of which you declared me the maintainer :-)
[16:00] sustrik ah
[16:00] sustrik i blindly applied the patches :|
[16:00] mato also, isn't the author of those commits steve mccoy?
[16:00] sustrik yes, he is
[16:00] mato ah, it's there
[16:00] mato 'cept you have a typo
[16:00] mato but that's my fault
[16:01] sustrik ?
[16:01] mato sustrik: you should not be typing author names manually if at all possible
[16:01] sustrik no idea how to do it automatically
[16:01] mato git-am :-)
[16:02] sustrik good god!
[16:02] mato sustrik: anyway, please do ask for review before committing stuff that touches other people's designated areas of responsibility
[16:02] sustrik anyway, can you check the part that touches the build system post hoc?
[16:02] mato yes, i can, i will
[16:02] sustrik thanks
[16:02] mato but please don't do that in future :-)
[16:02] mato including e.g. docs
[16:02] sustrik sure
[16:03] sustrik eek
[16:03] mato eek?
[16:03] sustrik i've just committed gonzalos patch
[16:03] sustrik which touches docs
[16:03] sustrik ZMQ_TYPE socket opt description
[16:03] mato it also has a commit to .gitignore for bin/ for some reason
[16:03] mato as part of gonzalo's patch
[16:03] sustrik hm
[16:04] sustrik it shouldn't be there but once it's there it doesn't hurt
[16:04] mato sure
[16:04] mato but my point is be careful about what is in a patch
[16:04] sustrik what are we going to do with patches that intersect different areas of functionality?
[16:04] sustrik there's no clear committer there
[16:05] mato ask the interested parties for review
[16:05] mato that's how it normally works
[16:05] sustrik ok, will do
[16:05] mato also, if you want my attention quickly a good way to do that is to Cc: me directly as well as sending email to the list
[16:05] sustrik ack
[16:05] mato that way it lands in my INBOX rather than in the auto-filtered-away list folder :-)
[16:42] CIA-20 jzmq: Gonzalo Diethelm master * re3e6b7f / (src/Socket.cpp src/org/zeromq/ Added support for getType() and added some missing constants. -
[17:30] vharron Is there a list of projects using zeromq? I'm trying to understand what it does well by looking at the problems it solves.
[17:52] drbobbeaty vharron: I can't give you a list of projects, but I can tell you about mine. I'm using ZMQ as a transport layer using OpenPGM and the reliable multicast. I'm using this because what I need is a transport more than a full-fledged messaging system like Tibco or 29West. It's working out wonderfully in this capacity and we're moving a lot of messages through the system.
[17:53] drbobbeaty It's not a real full-blown messaging system - but you can build one with it. I just see its real target as a very advanced transport system with a lot of flexibility.
[20:47] CIA-20 zeromq2: Steven McCoy master * ra729357 / (6 files): more fixes to (e)pgm transport -
[23:14] lluad If I have multiple clients connecting to a single server over TCP, is there a good idiom for the server to send an unsolicited message to just one of the clients?
[23:20] Zao lluad: When connecting a client, send a connection string to the server if it needs to communicate with you?