Sunday June 19, 2011

[Time] Name Message
[07:27] CIA-32 libzmq: Martin Sustrik master * r4b60023 / (5 files in 2 dirs): Merge branch 'master' of -
[07:27] CIA-32 libzmq: Martin Sustrik master * r5b77a41 / (perf/inproc_thr.cpp perf/local_thr.cpp perf/remote_thr.cpp): Throughput tests fixed. ...
[07:34] CIA-32 libzmq: Martin Sustrik master * r6052709 / src/tcp_connecter.cpp : ENETDOWN is a legal error from TCP connect ...
[07:34] CIA-32 libzmq: Martin Sustrik master * r00dc024 / src/pipe.cpp : Race condition in pipe_t fixed. ...
[10:26] fredix hi
[10:27] fredix I can't find an example with push/pull and load balancing
[10:37] mikko fredix: the load balancing happens automatically between connected peers
[10:37] mikko round-robin
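
A minimal sketch of the round-robin idea mikko describes, written as a toy model in plain Python with no sockets. The function name `round_robin` and the peer and task names are hypothetical; this is not how libzmq implements PUSH internally, only an illustration of each message going to exactly one connected peer in turn.

```python
from itertools import cycle


def round_robin(messages, peers):
    """Assign each message to the next peer in turn (toy model, no sockets)."""
    assignment = {peer: [] for peer in peers}
    turn = cycle(peers)  # endless iterator: peer 1, peer 2, ..., peer 1, ...
    for msg in messages:
        assignment[next(turn)].append(msg)
    return assignment


# Hypothetical tasks and workers, for illustration only.
result = round_robin(["task1", "task2", "task3", "task4"], ["worker_a", "worker_b"])
print(result)
```

Each task lands on exactly one worker, which is the difference from PUB/SUB, where every subscriber would get every message.
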
[10:41] fredix I try with taskvent
[10:41] fredix and 2 taskwork receive the message
[10:41] mikko fredix: have you read the guide?
[10:41] mikko there are a lot of load-balancing examples
[10:41] fredix yes
[10:50] CIA-32 libzmq: Martin Sustrik master * r9f4d376 / src/session.cpp : Session termination error fixed ...
[10:54] fredix ok
[10:54] fredix taskvent loadbalance between worker
[10:56] fredix but it's really strange that pub/sub cannot load balance between subscribers on the same filter
[11:37] sustrik why should it?
[11:38] sustrik pub/sub is for data distribution
[11:38] sustrik not for load balancing
[11:38] sustrik you can place a node in the middle that would subscribe for messages from upstream and load balance them to the downstream
[11:38] sustrik if that's what you want
[12:56] fredix sustrik, if you see my architecture maybe you should understand what I want :
[12:57] sustrik hm, not really
[12:57] sustrik anyway
[12:57] fredix sustrik, :)
[12:57] sustrik do you want to distribute or load-balance?
[12:57] fredix sustrik, the dispatcher send a message on each channel
[12:58] sustrik that's message distribution, ie. PUB/SUB
[12:58] fredix sustrik, I can have many workers listening on a channel, but only one receives a message
[12:59] sustrik so you have two-layered architecture
[12:59] fredix sustrik, but with zeromq pub/sub all workers receive the same message
[12:59] sustrik dispatcher sends messages to load-balancers
[12:59] sustrik load-balancers load-balance them among workers
[13:00] fredix mmm
[13:01] sustrik each load-balancer instance represents what you call a "channel"
[13:03] fredix can the dispatcher and load balancer be in the same process?
[13:04] sustrik sure
[13:04] sustrik that's what inproc transport is for
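
The two-layered architecture sustrik outlines can be sketched roughly as follows. This is a toy single-process model using plain Python objects, not sockets or the inproc transport; the class names `Dispatcher` and `LoadBalancer` and the channel and worker names are assumptions made for illustration. The first layer distributes every message to every channel, and each channel's load balancer then hands the message to exactly one of its workers.

```python
from itertools import cycle


class LoadBalancer:
    """One instance per "channel"; gives each message to exactly one worker."""

    def __init__(self, workers):
        self.delivered = {w: [] for w in workers}
        self._turn = cycle(workers)

    def on_message(self, msg):
        # Load balancing: only the next worker in turn gets the message.
        self.delivered[next(self._turn)].append(msg)


class Dispatcher:
    """Distribution layer: every channel sees every message (PUB/SUB-like)."""

    def __init__(self, balancers):
        self.balancers = balancers

    def send(self, msg):
        for lb in self.balancers.values():
            lb.on_message(msg)


# Hypothetical channels and workers.
channels = {
    "channel_1": LoadBalancer(["w1", "w2"]),
    "channel_2": LoadBalancer(["w3"]),
}
dispatcher = Dispatcher(channels)
for m in ["m1", "m2", "m3"]:
    dispatcher.send(m)

for name, lb in channels.items():
    print(name, lb.delivered)
```

Every channel receives all three messages, but within a channel each message reaches only one worker.
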
[13:04] fredix mmm
[13:04] fredix really powerful
[13:04] fredix but complicated :)
[14:40] CIA-32 libzmq: Martin Sustrik master * red680a3 / doc/zmq_socket.txt : Documentation for XPUB and XSUB socket added ...
[17:09] CIA-32 libzmq: Martin Sustrik master * r082f8e1 / src/mailbox.cpp : Mailbox timeouts fixed on Windows ...
[18:42] pieterh sustrik: ping
[18:47] sustrik pieterh: pong
[18:47] pieterh hi martin, random question about tcp transport
[18:47] sustrik yes?
[18:47] pieterh I don't see any writev's
[18:47] sustrik there are none
[18:47] pieterh how are you batching writes...?
[18:48] sustrik small messages are copied to a buffer and sent in a single go
[18:48] pieterh ah, that makes sense
[18:48] sustrik large messages are sent in place
[18:48] pieterh any reason for not doing writev?
[18:48] sustrik too much overhead
[18:48] sustrik it's cheaper to copy small messages to a buffer
[18:48] sustrik than constructing iovecs
[18:49] pieterh I recall from openamq, for certain message flows it was much faster
[18:49] pieterh but indeed, copying to a single buffer was even faster
[18:49] sustrik we've measured it thoroughly back in 2008
[18:50] sustrik writev seemed to give no improvement
[18:50] pieterh do you have any documentation on the different optimizations that you studied at the time?
[18:50] pieterh writev is definitely faster than multiple writes, for small messages
[18:50] sustrik some of them, in no way all of them
[18:51] sustrik we've tested several options a day
[18:51] sustrik for several months
[18:52] pieterh ok, np, was just curious (am making TCP VTX driver now)
[18:52] sustrik the rule of thumb is
[18:52] sustrik copy small messages, process large messages in place
[18:52] sustrik that's the recurring pattern in HPC
[18:52] pieterh right
[18:53] pieterh clearly, given cycles to copy vs. cycles to process
[18:53] sustrik exactly
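
The "copy small messages, process large messages in place" rule can be sketched as below. This is only an illustration under stated assumptions, not libzmq's actual encoder: the function name `plan_writes` and the 8192-byte threshold are hypothetical, and the function merely returns the chunks that would each be handed to a single send call.

```python
BATCH_SIZE = 8192  # hypothetical threshold, not a value taken from libzmq


def plan_writes(messages, batch_size=BATCH_SIZE):
    """Return the byte chunks that would each go out in one send call.

    Small messages are copied into a shared buffer and flushed together;
    a message of batch_size bytes or more is passed through as-is (no copy).
    """
    writes = []
    buf = bytearray()
    for msg in messages:
        if len(msg) >= batch_size:
            if buf:  # flush pending small messages first to keep ordering
                writes.append(bytes(buf))
                buf.clear()
            writes.append(msg)  # large message: same object, sent in place
        else:
            if len(buf) + len(msg) > batch_size:  # buffer would overflow
                writes.append(bytes(buf))
                buf.clear()
            buf += msg  # small message: cheap copy into the batch buffer
    if buf:
        writes.append(bytes(buf))
    return writes


big = b"x" * 10000
writes = plan_writes([b"a" * 10, b"b" * 20, big, b"c" * 5], batch_size=64)
print([len(w) for w in writes])
```

The two small messages leave as one chunk, the large one is passed through untouched, and the trailing small message forms its own chunk.
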
[18:53] pieterh regarding that 'separation' thread on email
[18:54] pieterh I actually do have a kind of API between transport and socket patterns, emerging
[18:54] sustrik goodo
[18:54] sustrik i would like to have it there
[18:54] sustrik but it's terribly complex atm
[18:54] pieterh it's nothing at all like the 0MQ API, but it might form a basis for a future framework
[18:54] pieterh it's complex, yes
[18:55] pieterh it'll get simpler as I make more drivers
[18:55] sustrik TCP is the most problematic one
[18:55] pieterh why do you think that?
[18:55] sustrik async connects, disconnects, reconnects
[18:56] pieterh ah, like that
[18:56] pieterh yes, UDP is very similar since I have a peering layer on top
[18:56] sustrik too much async stuff happening at once
[18:56] pieterh it's helpful to use a reactor model but I've not measured that performance yet
[18:57] sustrik when identities get into the mix it becomes insanely complex
[18:57] pieterh indeed
[18:57] sustrik reactor = event driven?
[18:57] pieterh yes
[18:57] sustrik yes, i use that internally
[18:57] pieterh it lets one handle the async pieces neatly
[18:58] pieterh it's still complex, IMO this will take two years at least to get into shape
[18:58] sustrik presumably, no idea
[18:58] pieterh with sufficient performance to be useful
[18:58] pieterh if performance isn't a criterion, much faster of course
[18:59] sustrik yes, without performance constraints it's a piece of cake
[18:59] pieterh yes
[18:59] pieterh well, I'm still doing it *properly*, just not optimized in any way
[18:59] pieterh async connects etc.
[18:59] sustrik great
[19:00] sustrik the API would be the most important outcome
[19:00] sustrik if you manage to get it done
[19:00] pieterh I
[19:00] pieterh I've abstracted the routing, that's simplest
[19:00] pieterh not done exception handling yet
[19:01] sustrik i would say the connection creation/teardown is the most scary stuff
[19:01] pieterh hmm, that's ok, so far
[19:01] pieterh I use a peering concept
[19:02] sustrik the problems look like this for example:
[19:02] sustrik imagine there's a TCP listener object listening on a port
[19:02] pieterh sure
[19:02] sustrik it gets a new connection
[19:02] sustrik it creates a connection object
[19:02] pieterh right
[19:03] sustrik that runs in async way, maybe in a different thread
[19:03] sustrik the connection object reads the identity
[19:03] pieterh ah, I'm cheating :-)
[19:03] sustrik finds the associated session
[19:03] sustrik (if one exists, if it does not it creates one)
[19:03] pieterh right
[19:04] sustrik the session may be running in a 3rd thread
[19:04] sustrik so it has to migrate itself to a different thread
[19:04] pieterh oh, I'm definitely cheating :-)
[19:04] sustrik etc.
[19:04] pieterh since 90% of use cases only need one thread for I/O
[19:04] sustrik now imagine all the possible combinations that may happen during this process
[19:04] pieterh and I can create multiple I/O threads by creating multiple driver instances
[19:04] pieterh I do all this in a single thread
[19:05] sustrik the problem is that you don't know in advance in which thread you are going to run
[19:05] sustrik you have to read identity first
[19:05] pieterh like I said, I'm cheating, I'm using exactly one thread
[19:05] pieterh per driver instance
[19:05] sustrik ok
[19:05] sustrik it'll get more complex once you start making it multi-threaded
[19:06] pieterh that would be, if you want multiple threads, you create multiple instances of a driver
[19:06] pieterh tcp1://, tcp2:// etc.
[19:06] pieterh it's only needed for extreme use cases
[19:06] pieterh I didn't see any reason to make multithreaded drivers
[19:06] sustrik sure, the problem is that the session persists between subsequent tcp connections
[19:06] pieterh indeed
[19:06] sustrik thus you have to migrate the driver
[19:06] pieterh so more cheating: no identities
[19:06] sustrik to the right thread
[19:07] pieterh this is a good opportunity to experiment with "no identities" as a simplification
[19:07] sustrik i've asked for that
[19:07] sustrik everyone was like "no way"
[19:07] pieterh no durable sockets, in any case
[19:07] pieterh well, I like the simplification
[19:08] sustrik so do i
[19:08] sustrik but
[19:08] sustrik backward compatibility
[19:08] pieterh but like any change it has to be justified with pros and cons
[19:08] sustrik not pissing off users
[19:08] sustrik etc.
[19:08] pieterh it's just a matter of demonstrating why it's worth it
[19:08] pieterh let me give you an example...
[19:09] sustrik tell that to someone using identities in production :|
[19:09] pieterh "here is a secure SASL TCP transport for #zeromq"
[19:09] pieterh "hey, cool!"
[19:09] pieterh "btw, it doesn't do explicit identities"
[19:09] pieterh "who the heck cares, GIMME!"
[19:09] pieterh eventually people will migrate use cases away from explicit identities
[19:10] pieterh and then we can deprecate that, and then eventually remove it
[19:10] pieterh durable sockets break driver single-threadedness
[19:10] pieterh therefore VTX won't support them
[19:12] pieterh We do need to gather use cases for explicit identities and see how otherwise to solve them
[19:12] pieterh IMO it's like SWAP, it should be done at the device/broker level
[19:12] pieterh we already do persistence decently at a higher level
[19:26] sustrik sorry, got disconnected
[19:26] sustrik the problem is that the infrastructure is generic
[19:26] sustrik so even with identity-less SASL transport
[19:26] sustrik you would have to maintain identities because of TCP transport
[19:26] sustrik btw, subscription forwarding looks to be working ok
[19:26] sustrik i would say it's time to start thinking about polishing 3.0 and releasing it
[19:37] jurica what happens when a worker dies that still has tasks in its queue? are those tasks lost?
[19:38] sustrik yes