[Time] Name | Message |
[01:50] kenkeiter
|
Okay, Rubyists with zmq experience -- question!
|
[02:23] Xin
|
test
|
[02:25] Xin
|
Does it make sense to add UDP support to 0mq?
|
[08:06] sustrik
|
Samy: pong
|
[08:06] Zao
|
Doesn't PGM stuff travel on UDP already?
|
[08:09] sustrik
|
Zao: PGM travels on IP
|
[08:09] sustrik
|
epgm is a convenience hack that travels on UDP instead of IP
|
[08:09] Zao
|
Oh, right.
|
[08:09] Zao
|
Never used any of them, so I went with faint memories of manpages past.
|
[09:18] omarkj
|
kleppari: Morning.
|
[09:29] mrm2m
|
Hey ho!
|
[09:30] mrm2m
|
Is there a lightweight introduction, how zmq works internally?
|
[09:33] pieterh
|
mrm2m: there is documentation of the source code and there are whitepapers
|
[09:33] pieterh
|
but there is nothing lightweight about 0MQ's internals, really
|
[09:34] mrm2m
|
ok
|
[09:39] Bruc
|
hey wat sup all
|
[10:00] kleppari
|
omarkj: hah, hi!
|
[10:01] omarkj
|
kleppari: Didn't notice you here until now, are you guys using 0mq?
|
[10:01] kleppari
|
I'm experimenting a bit
|
[10:02] kleppari
|
I just love the idea of message passing for concurrency
|
[10:03] kleppari
|
well, outside of erlang at least
|
[10:05] kleppari
|
how about yourself?
|
[10:07] omarkj
|
Working with it.
|
[10:07] omarkj
|
Let's go pm. :)
|
[12:49] keffo
|
sustrik, I think I might have figured out the concurrency issue
|
[12:49] sustrik
|
cremes: here i am
|
[12:50] sustrik
|
keffo: so what's the problem?
|
[12:51] keffo
|
I was rewriting (using poll rather than sequential) to clean up logging etc, and I just noticed one place where I send a null message before sending the rest, except "the rest" already included the incoming route-null, so I suspect it was being sent twice
|
[12:52] keffo
|
nullmore, nullmore, uuidmore, data, etc..
|
[12:52] sustrik
|
ok, i see
|
[12:53] sustrik
|
so no problem in 0mq to worry about
|
[12:53] keffo
|
Not at all
|
[12:53] sustrik
|
:)
|
[12:53] keffo
|
I never expected that either :)
|
[12:53] sustrik
|
you are an optimist then :)
|
[12:54] keffo
|
It's been very stable and well behaved so far, apart from this issue which is my own fault :)
|
[12:55] keffo
|
I did 100k pi calcs yesterday, with 10 worker procs on different machines.. the load balancer had ~500kb/s passing through it, and in total more than half a gig of logs were generated.. That's when I decided the logging was way wonky, didn't really tell much
|
[13:25] cremes
|
sustrik: i'm available for the next hour or so if you want to chat
|
[13:45] sustrik
|
cremes: hi
|
[13:46] sustrik
|
give me some context first: why have you moved from ruby-ffi to rbzmq?
|
[13:46] cremes
|
there are multiple ruby runtimes; not all of them support the C extension api that rbzmq uses
|
[13:47] cremes
|
ffi allows me to support *all* of the runtimes
|
[13:47] cremes
|
oh, and i haven't moved from one to the other; i would just like rbzmq to have the same api as ffi-rzmq
|
[13:48] sustrik
|
ok, i see
|
[13:48] sustrik
|
it looks like brian buchanan is not responding, right?
|
[13:48] cremes
|
right
|
[13:48] sustrik
|
ok, i have admin access to the project
|
[13:49] sustrik
|
i'll add you there as a developer
|
[13:49] mato
|
sustrik: cremes: a question; i was wondering myself why there were two ruby bindings
|
[13:49] mato
|
is there any point in keeping both around?
|
[13:49] sustrik
|
hi
|
[13:49] mato
|
if FFI is the way to go then why keep rbzmq at all?
|
[13:50] cremes
|
um... i don't know; it's probably best to get feedback from folks using rbzmq and ask them why they prefer it
|
[13:50] cremes
|
i'm biased towards the ffi one, obviously ;)
|
[13:51] mato
|
that might be a good idea; having two bindings is IMO confusing
|
[13:51] sustrik
|
does it work with any runtime?
|
[13:51] cremes
|
that's why i was hoping brian buchanan would speak up
|
[13:51] mato
|
cremes: is there a mailing list for this? I've not seen any discussion on zeromq-dev...
|
[13:52] mato
|
maybe just ping there, outline the situation (two bindings, ffi works on all runtimes, etc.) and ask the community
|
[13:52] cremes
|
this has not been brought up on the zeromq-dev list
|
[13:52] cremes
|
i'll do that
|
[13:53] sustrik
|
can you explain how the runtime interoperability works?
|
[13:53] sustrik
|
so far i had an impression that ffi doesn't work with all the ruby runtimes
|
[13:53] cremes
|
sustrik: as of a few days ago, ffi works with all of the runtimes
|
[13:53] sustrik
|
aha
|
[13:53] cremes
|
i worked out the last details with the runtime guys
|
[13:53] sustrik
|
so it's new
|
[13:54] cremes
|
the issue with C extensions is that they oftentimes access lots of internal memory structures
|
[13:54] sustrik
|
that may actually mean that rbzmq is not needed any more
|
[13:54] mato
|
cremes: what's the status of FFI support for Ruby MRI (i.e. that which most people are using right now)?
|
[13:54] cremes
|
ffi allows the runtime authors to hide their implementation details and provide a more solid mechanism for accessing C libraries
|
[13:55] sustrik
|
i see
|
[13:55] cremes
|
mato: it's quite good
|
[13:55] mato
|
cremes: right, but does "quite good" mean "production quality"?
|
[13:55] mato
|
i guess what i'm asking is if rbzmq goes away then what will that mean for the current mainstream of Ruby MRI users
|
[13:56] mato
|
will they all just happily start using the FFI binding, or not?
|
[13:56] cremes
|
mato: perhaps; i can only speak for how it works with 0mq; obviously we still have hangs until 2.1.x comes out with EINTR support for blocking calls
|
[13:56] mato
|
sure, but the EINTR hangs are orthogonal to FFI/non-FFI
|
[13:56] sustrik
|
well, whatever happens the rbzmq project is not going to be killed
|
[13:56] sustrik
|
so no worry
|
[13:56] cremes
|
mato: i'll ask for feedback on the ML
|
[13:57] sustrik
|
the problem is that it doesn't have a permanent maintainer
|
[13:57] cremes
|
mato: sure, it's orthogonal, but it's hard to say if there are other bugs when you get a hang
|
[13:57] cremes
|
is it due to the blocking behavior or something else?
|
[13:57] mato
|
well then put a note up about that (lack of maintainer) on the rbzmq project page
|
[13:57] cremes
|
i don't have the time or energy to run stuff under gdb all of the time to figure that out :)
|
[13:58] cremes
|
let's take this to the ML and see what users have to say
|
[13:58] mato
|
yup
|
[13:58] sustrik
|
ok
|
[13:58] cremes
|
btw, from a search on github most projects are using the ffi-based binding
|
[13:58] cremes
|
but i certainly don't want to lose or alienate the rbzmq users
|
[13:59] cremes
|
which takes us full circle; i'd like each one to present the same api to the user (based off of the C and C++ binding apis)
|
[13:59] mato
|
definitely
|
[13:59] mato
|
otherwise it just all gets way too confusing
|
[13:59] cremes
|
if people can switch back and forth with no code changes, that's a great situation for the community
|
[13:59] mato
|
is it much work to synchronize the APIs?
|
[14:00] cremes
|
mato: not a lot but unfortunately i don't have the C chops to do it myself
|
[14:00] cremes
|
so i need someone else to agree to it
|
[14:00] cremes
|
to do the work
|
[14:00] mato
|
cremes: well, ask for help on the list and we'll see what happens
|
[14:00] cremes
|
will do
|
[14:05] sustrik
|
mato: man bug!
|
[14:05] sustrik
|
zmq_socket(3):
|
[14:05] sustrik
|
"When a ZMQ_XREQ socket is connected to a ZMQ_REP socket each message sent must consist of an empty message part, the delimiter, followed by one or more body parts."
|
[14:05] mato
|
sustrik: !!?!? :-)
|
[14:06] sustrik
|
the identities should be mentioned
|
[14:06] sustrik
|
they are definitely mentioned for ZMQ_XREP
|
[14:06] ptrb
|
yeah, it's not really clear to me what work i have to do at the app level to do XREQ/REP (or REQ/XREP)
|
[14:07] mato
|
uh, yeah, i kind of punted on explaining the identity stack on the XREQ side at the time
|
[14:07] mato
|
sustrik: that stuff was written in a hurry, you remember :)
|
[14:07] sustrik
|
ptrb: there was a nice diagram somewhere...
|
[14:07] mato
|
sustrik: anyway, it's not a bug, it's just a simplification :)
|
[14:07] mato
|
sustrik: i'll look into it.
|
[14:08] cremes
|
ptrb: was this helpful at all? http://www.zeromq.org/tutorials:xreq-and-xrep/
|
[14:08] cremes
|
if not, tell me what is unclear and i'll fix it (or edit the page yourself)
|
[14:08] ptrb
|
*clicks*
|
[14:08] sustrik
|
ah -- that's the one i meant
|
[14:08] cremes
|
i was certainly confused by that stuff which is why i wrote it down
|
[14:09] ptrb
|
yeah, it definitely helps
|
[14:09] ptrb
|
but ruby is definitely not the right language to use in examples like this
|
[14:09] ptrb
|
it's as opaque as perl without the benefit of perl's saturation
|
[14:09] cremes
|
ha
|
[14:10] mato
|
:-)
|
[14:45] CIA-20
|
zeromq2: Martin Sustrik maint * re2802d9 / src/options.cpp : values of RATE, RECOVERY_IVL and SWAP options are checked for negative values - http://bit.ly/9lAWvv
|
[14:46] pieterh
|
sustrik: nice to see the validation of option values
|
[14:47] sustrik
|
there are 3 bugs filed by cremes :)
|
[14:47] mato
|
sustrik: incidentally, there's one annoying thing regarding the option values
|
[14:48] sustrik
|
yes?
|
[14:48] mato
|
well, it's the whole "option types" issue
|
[14:48] sustrik
|
?
|
[14:48] mato
|
one thing that i realised the other day when i was writing some simple test cases
|
[14:48] mato
|
is that in order to use options sensibly from C, you have to have uint64_t / int64_t defined
|
[14:49] mato
|
now, obviously those are not defined by zmq.h
|
[14:49] mato
|
of course if you want to use the API you have to do "platform specific stuff" to get those types
|
[14:49] mato
|
not a good situation :-(
|
[14:50] mato
|
especially on windows where M$ does not ship stdint.h
|
[14:50] sustrik
|
i know
|
[14:50] sustrik
|
what can i do without breaking the backward compatibility?
|
[14:51] mato
|
probably nothing
|
[14:51] sustrik
|
:|
|
[14:51] mato
|
hmm, maybe...
|
[14:52] mato
|
well, in theory you could define zmq_ namespaced equivalents of those types including the platform magic in zmq.h, but that's kind of ugly
|
[14:52] mato
|
and it'd probably still break C++ due to its strict notions of types
|
[14:52] sustrik
|
it has to be solved properly
|
[14:53] sustrik
|
BSD-style
|
[14:53] mato
|
hmm
|
[14:53] sustrik
|
4-byte unsigned integer in network byte order
|
[14:53] mato
|
hang on
|
[14:53] sustrik
|
and such
|
[14:53] mato
|
what?
|
[14:53] mato
|
what does network byte order have to do with it? :)
|
[14:53] sustrik
|
that's what i do when using BSD sockets
|
[14:53] mato
|
?
|
[14:54] sustrik
|
addr.port = htons (port);
|
[14:58] mato
|
sustrik: ah, you're right, i didn't realise e.g. sockaddr_in.sin_port was in network byte order
|
[14:58] mato
|
sustrik: the thing is, the setsockopt stuff is somewhat ad-hoc
|
[14:58] CIA-20
|
zeromq2: Martin Sustrik master * re2802d9 / src/options.cpp : values of RATE, RECOVERY_IVL and SWAP options are checked for negative values - http://bit.ly/9lAWvv
|
[14:58] CIA-20
|
zeromq2: Martin Sustrik master * rff10807 / src/options.cpp :
|
[14:58] CIA-20
|
zeromq2: Merge branch 'maint'
|
[14:58] CIA-20
|
zeromq2: * maint:
|
[14:58] CIA-20
|
zeromq2: values of RATE, RECOVERY_IVL and SWAP options are checked for negative values - http://bit.ly/9P99Fu
|
[14:58] mato
|
it does not use network byte order AFAIK
|
[14:59] mato
|
sustrik: anyway thinking about it, i guess zmq.h will have to define its own types
|
[15:00] mato
|
the main problem with that is doing it in a portable fashion :-(
|
[15:07] ptrb
|
#defines, #defines for everybody!!
|
[15:07] mato
|
sure, feel free to suggest how to portably define a 64-bit unsigned integer type :-)
|
[15:08] ptrb
|
you just take two ints and mash 'em together, obviously
|
[15:08] mato
|
:-)
|
[15:29] sustrik
|
mato: btw, there's some packaging PATCH on the mailing list
|
[15:29] sustrik
|
will you reply to that?
|
[15:30] mato
|
sustrik: yes, later, i know it's there
|
[15:30] mato
|
have to go now
|
[15:30] sustrik
|
ok, there's one in the bug tracker as well
|
[15:30] mato
|
sustrik: i'd have just applied it but the license stuff needs to be sorted out
|
[15:31] mato
|
else all i can reply is "please state your patch is licensed under..."
|
[15:31] mato
|
have discussed with pieter will update you this evening
|
[15:31] mato
|
or pieter will
|
[15:31] mato
|
bbl
|
[15:31] sustrik
|
should i ask for the license?
|
[15:31] mato
|
don't bother, let's just fix it
|
[15:31] mato
|
talk to you in the evening
|
[15:31] sustrik
|
ok
|
[15:41] Samy
|
sustrik, just had some questions regarding zeromq's lock-less data structures.
|
[15:42] sustrik
|
yes?
|
[15:42] Samy
|
sustrik, I'm working on a library to help ease concurrent programming, it includes a plethora of concurrent data structures, synchronization methods and mechanisms for SMR (currently using hazard pointers, considering implementing RCU).
|
[15:43] Samy
|
sustrik, I was curious if there were some features you would really like to see, and some specific constrained data structures (M:N consumer/producers) for zeromq.
|
[15:43] Samy
|
sustrik, I was also curious if SMR was important for ZeroMQ at all.
|
[15:44] sustrik
|
sorry, what's SMR?
|
[15:44] Samy
|
sustrik, safe memory reclamation.
|
[15:45] Samy
|
sustrik, usually it's important for unbounded lock-less data structures (a simple stack being a great example).
|
[15:46] sustrik
|
does it translate to "make the current state of the memory visible to the other CPU cores"?
|
[15:47] sustrik
|
sorry, i am not an expert on lock-free algos
|
[15:48] Samy
|
sustrik, well, I guess fundamentally, it's more like "let other CPUs know I am not using this memory".
|
[15:48] sustrik
|
ah, ok, let me explain what 0mq is doing
|
[15:48] Samy
|
sustrik, some techniques will require a full barrier (RCU) while others do not (hazard pointers).
|
[15:48] Samy
|
sustrik, ok.
|
[15:48] sustrik
|
the communication always happens between exactly two endpoints
|
[15:49] sustrik
|
ie. at most 2 cpu cores
|
[15:49] sustrik
|
so each lock-free queue has exactly one writer and exactly one reader
|
[15:49] Samy
|
Ah, makes it much easier. :-)
|
[15:49] Samy
|
Sorry, let me pop open the source-code.
|
[15:49] sustrik
|
src/ypipe.hpp
|
[15:49] Samy
|
Thanks, loading.
|
[15:50] sustrik
|
what it does basically is that writer is appending new items to the linked list
|
[15:50] sustrik
|
the reader is reading items from the linked list
|
[15:50] Samy
|
I don't see any padding, sustrik.
|
[15:51] sustrik
|
what padding?
|
[15:51] Samy
|
Between w and r.
|
[15:51] Samy
|
sustrik, to prevent false cache line sharing, this improves concurrency.
|
[15:51] sustrik
|
that would be nice
|
[15:51] Samy
|
sustrik, it can be drastic. :-)
|
[15:51] Samy
|
Let me show you a simple example on this machine.
|
[15:52] sustrik
|
ok, so what has to be done is separate the variables that belong to the reader and those that belong to the writer
|
[15:53] sustrik
|
and keep them in different cachelines
|
[15:53] sustrik
|
right?
|
[15:53] keffo
|
the data too
|
[15:53] Samy
|
[sbahra@sbahra validate]$ time ./ck_fifo_spsc 8 1 100000
|
[15:53] Samy
|
real 0m5.294s
|
[15:53] Samy
|
[sbahra@sbahra validate]$ time ./ck_fifo_spsc 8 1 100000
|
[15:53] Samy
|
real 0m7.718s
|
[15:54] Samy
|
sustrik, the latter is without appropriate padding.
|
[15:54] sustrik
|
nice
|
[15:54] sustrik
|
the data are allocated in a contiguous block
|
[15:54] keffo
|
that was pretty substantial, what arch is that?
|
[15:54] sustrik
|
to avoid excessive memory allocation
|
[15:54] Samy
|
sustrik, that is a simple benchmark that creates a token ring using a single-producer/single-consumer lock-less queue (passes around 100000 across 8 threads some number of iterations)
|
[15:55] Samy
|
keffo, that is on a Nehalem box.
|
[15:55] keffo
|
aho
|
[15:55] keffo
|
sustrik, You should look into pooling, not only for that
|
[15:55] Samy
|
sustrik, you want the writer and reader variables to be on separate cache lines.
|
[15:55] sustrik
|
right, i understand that
|
[15:55] Samy
|
sustrik, so for example, void *reader; char pad[56]; void *writer; ...
|
[15:56] Samy
|
sustrik, ok.
|
[15:56] sustrik
|
not sure how to do that in portable fashion though
|
[15:56] keffo
|
compile time :)
|
[15:56] keffo
|
macro hell
|
[15:56] ptrb
|
did someone say #define??
|
[15:56] ptrb
|
:D
|
[15:56] Samy
|
sustrik, not very portable (my library tries to make that portable by doing the hard work of generating those constants at compile-time).
|
[15:56] sustrik
|
:)
|
[15:56] Samy
|
sustrik, but modern IA32 and SPARCv9 boxes I know of all have 64 byte cache lines.
|
[15:56] sustrik
|
well, if we assumed the cache line size is 64 bytes
|
[15:56] keffo
|
cpuid fiddling works I guess :)
|
[15:57] sustrik
|
we would be right in most cases
|
[15:57] sustrik
|
no?
|
[15:57] keffo
|
all x86_64 is >64 no?
|
[15:57] sustrik
|
128?
|
[15:57] Samy
|
keffo, no.
|
[15:57] Samy
|
keffo, 64.
|
[15:57] Samy
|
keffo, when I say IA32, I mean x86/x86_64.
|
[15:57] keffo
|
>=64 then :)
|
[15:57] Samy
|
I don't know of any > 64.
|
[15:58] sustrik
|
padding to 64 bytes seems reasonable imo
|
[15:58] keffo
|
I meant, x86_64 is presumably >=64, while x86 is not
|
[15:58] sustrik
|
that would be 64 bytes per reader and 64 bytes per writer
|
[15:59] sustrik
|
the shared variable should probably be on a separate cacheline
|
[15:59] keffo
|
Samy, Have you tried measuring on itanium? :)
|
[15:59] sustrik
|
so it's 192 bytes per queue
|
[15:59] Samy
|
keffo, Itanium is very interesting, but barely used, so no.
|
[16:00] Samy
|
keffo, I would like to work on a port of my library there but I have had design issues, just recently did a major design change.
|
[16:00] keffo
|
Yeah, void of any cachemisses it should be
|
[16:00] Samy
|
sustrik, sorry, can you continue with your explanation of ypipe?
|
[16:00] sustrik
|
ah
|
[16:00] sustrik
|
ok, so part of the list belongs to the writer thread
|
[16:00] sustrik
|
part of it to the reader thread
|
[16:01] sustrik
|
the only point where the two interact is when reader has no more data to read
|
[16:01] sustrik
|
then it does CAS
|
[16:01] sustrik
|
to get the writer's portion of the list
|
[16:02] Samy
|
So, basically, you batch the operations?
|
[16:02] sustrik
|
exactly
|
[16:02] sustrik
|
that's the key
|
[16:02] Samy
|
So, writer does enqueue operations to the queue.
|
[16:02] sustrik
|
ack
|
[16:02] Samy
|
The reader occasionally does a batch dequeue.
|
[16:02] sustrik
|
ack
|
[16:02] Samy
|
Ok.
|
[16:02] Samy
|
sustrik, why do you use CAS for the batch dequeue? That can be implemented using an xchg, making the batch dequeue a wait-free operation.
|
[16:02] sustrik
|
it's pretty effective just because of the batching
|
[16:03] sustrik
|
hm, i haven't seen the code for a long time
|
[16:03] sustrik
|
let me have a look
|
[16:04] Samy
|
It looks like you do a single CAS.
|
[16:04] Samy
|
Line 137, check_read (I assume?)
|
[16:05] sustrik
|
"If there are no
|
[16:05] sustrik
|
// items to prefetch, set c to NULL"
|
[16:05] sustrik
|
Samy: yes
|
[16:05] sustrik
|
the cas is used to communicate back to the writer
|
[16:05] Samy
|
You can implement this without any lock-free operations.
|
[16:06] sustrik
|
you mean without atomic ops?
|
[16:06] sustrik
|
or bus locking?
|
[16:07] Samy
|
Without explicit atomic operations.
|
[16:07] Samy
|
sustrik, the idea is to share a stub node for both tail and head of the queue.
|
[16:08] sustrik
|
can you explain in more detail?
|
[16:08] Samy
|
sustrik, writer updates tail, always updating the next pointer, and reader always assume first node is stub entry.
|
[16:08] Samy
|
sustrik, I can show you source-code.
|
[16:08] Samy
|
Let me see if I find it, hold on.
|
[16:09] sustrik
|
ok
|
[16:09] sustrik
|
how does the communication between two cpu cores happen then?
|
[16:09] sustrik
|
cache coherency algos?
|
[16:11] Samy
|
sustrik, yes.
|
[16:11] Samy
|
sustrik, the point is, the only time we need to really share state is if the queue is empty.
|
[16:11] sustrik
|
ack
|
[16:11] Samy
|
sustrik, we need a way to detect if the queue is empty atomically and update both head and tail atomically.
|
[16:11] Samy
|
sustrik, by sharing a stub node, this is possible.
|
[16:12] sustrik
|
i think something like that is done in ypipe
|
[16:12] sustrik
|
there's an empty item in the list
|
[16:12] sustrik
|
that serves as a placeholder
|
[16:12] sustrik
|
that's what you mean by stub, right?
|
[16:12] Samy
|
Yes.
|
[16:12] Samy
|
sustrik, http://codepad.org/gtPkkz57
|
[16:13] Samy
|
This isn't fenced correctly, but that should be fine on IA32.
|
[16:14] sustrik
|
i thought that IA requires you to fence explicitely
|
[16:14] sustrik
|
ah, you mean x86
|
[16:14] Samy
|
Yes.
|
[16:15] Samy
|
The fences may be sufficient, I just haven't verified it on non-IA32 yet. :-)
|
[16:15] sustrik
|
ok, there's one more issue there
|
[16:15] sustrik
|
when reader finds out that there are no more items to read
|
[16:15] sustrik
|
it goes asleep
|
[16:16] sustrik
|
it's the writer's responsibility to wake it up
|
[16:16] sustrik
|
so the writer has to be informed about the fact that reader tried to get more items and failed
|
[16:16] Samy
|
sustrik, futex(2) works well.
|
[16:16] sustrik
|
yes, it's similar
|
[16:17] sustrik
|
but keep in mind that this is a multi-platform app
|
[16:17] Samy
|
Things sort of suck without futex. :-(
|
[16:18] Samy
|
(or a similar mechanism, at least in ring-3)
|
[16:18] sustrik
|
that's how it is :|
|
[16:18] Samy
|
But you could abstract a generic CV layer that uses futex directly if available.
|
[16:18] Samy
|
sustrik, where is this wake-up mechanism implemented?
|
[16:18] sustrik
|
out of the class, it's a dumb socket pair
|
[16:19] sustrik
|
i had a futex implementation once
|
[16:19] Samy
|
What happened?
|
[16:19] sustrik
|
but it was pain in the ass to make it work everywhere
|
[16:19] Samy
|
I see.
|
[16:19] sustrik
|
some linux kernels pretend to have futexes
|
[16:19] sustrik
|
but actually return ENOTSUP when you try to use them
|
[16:19] sustrik
|
and alike
|
[16:19] Samy
|
I see.
|
[16:20] sustrik
|
anyway, the wake up mechanism is irrelevant for this discussion
|
[16:20] Samy
|
Mostly, yes.
|
[16:20] sustrik
|
what's relevant is the writer has to be notified that reader is sleeping
|
[16:20] sustrik
|
that's what the second cas is for
|
[16:20] sustrik
|
in flush function
|
[16:21] Samy
|
I don't understand.
|
[16:21] Samy
|
Why not simply use plain loads and stores if this is single reader and consumer?
|
[16:22] sustrik
|
because one bit of information is passed the other way round
|
[16:22] sustrik
|
from reader to writer
|
[16:22] sustrik
|
"i am sleeping"
|
[16:22] sustrik
|
that's when c is set to NULL
|
[16:22] sustrik
|
maybe it can be done without atomic ops, i am not an expert
|
[16:23] sustrik
|
however, if it was a normal locking code, this would be a place where races can occur
|
[16:24] Samy
|
Ok.
|
[16:24] sustrik
|
so, reader tries to get the latest batch of the items
|
[16:24] sustrik
|
and if there are none, it sets c to NULL
|
[16:25] sustrik
|
in flush function, writer adds new batch of items and if it finds out that c was NULL previously
|
[16:25] sustrik
|
it knows the reader is sleeping
|
[16:25] sustrik
|
and that it should wake it up
|
[16:26] sustrik
|
my feeling is that the operations on c have to be atomic
|
[16:26] sustrik
|
not 100% sure though
|
[16:30] Samy
|
read_asleep = false; loop { old = xchg(top, NULL); if old == null then reader_asleep = true; sleep(); }
|
[16:31] Samy
|
Sorry, read_asleep is in loop.
|
[16:32] sustrik
|
that's the writer side?
|
[16:33] Samy
|
Reader.
|
[16:34] Samy
|
There is still a risk of spurious wake-up, but that isn't a big deal.
|
[16:34] Samy
|
If it is, then you basically implement a barrier.
|
[16:34] Samy
|
You have a reader and a writer shared variable. Writer signals reader once, only if reader has signaled writer since last wake-up.
|
[16:35] sustrik
|
yes, that's the goal
|
[16:36] Samy
|
And the algorithm.
|
[16:36] Samy
|
Me too.
|
[16:36] sustrik
|
can you give a pseudo-code for writer side as well?
|
[16:38] sustrik
|
in the code above reader_asleep is another shared variable, right?
|
[16:39] pieterh
|
sustrik: I was trying to say something about options but got interrupted by stuff...
|
[16:40] pieterh
|
API should IMO reject an identity starting with zero byte
|
[16:40] sustrik
|
now you are likely to get interrupted by brain-damaging lock-free algo discussion :)
|
[16:40] pieterh
|
:-/
|
[16:40] sustrik
|
yes, that would be nice
|
[16:41] Samy
|
reader: loop { r = false; store_fence; w = true; o = xchg(top, NULL); if old == null then { r = true; sleep(); } }
|
[16:41] Samy
|
writer: add(); c_w = w; load_fence; c_r = r; if c_r == true and c_w == true; then { c_w = false; signal_reader; }
|
[16:41] sustrik
|
pieter: fancy sumbitting a patch?
|
[16:41] pieterh
|
sustrik: I'm pulling master as we speak, yes I'll make you a patch
|
[16:42] sustrik
|
thanks
|
[16:42] sustrik
|
Samy: sorry, having 3 discussions in parallel
|
[16:43] Samy
|
Hey, no problem. :-)
|
[16:43] Samy
|
I'll be back in some minutes.
|
[16:43] sustrik
|
sure, checking your code right now
|
[16:46] sustrik
|
what's top?
|
[16:46] sustrik
|
which variable belong to whom?
|
[16:46] sustrik
|
both r and w seem to be shared
|
[16:49] pieterh
|
sustrik: np. options.cpp already checks that... :-)
|
[16:49] sustrik
|
good
|
[16:51] Samy
|
sustrik, yes, they are shared.
|
[16:52] Samy
|
sustrik, that statement is whatever condition you use to check if the queue is empty.
|
[16:53] sustrik
|
this one: o = xchg(top, NULL); ?
|
[16:53] Samy
|
That and next.
|
[16:53] Samy
|
o == null
|
[16:53] Samy
|
(not old)
|
[16:54] sustrik
|
so c_r and c_w are local to the writer?
|
[16:54] Samy
|
Yes.
|
[16:54] Samy
|
The reason we have 2 variables is that writer will signal the reader only once.
|
[16:55] Samy
|
The reader will then wake-up, and if the queue is empty it will indicate that it is asleep and it is fine for the writer to send another signal.
|
[16:56] Samy
|
sustrik, if you treat it like a stack, it will make more sense.
|
[16:56] sustrik
|
Samy: stack of what?
|
[16:56] Samy
|
sustrik, for a FIFO, you can use another technique.
|
[16:57] Samy
|
sustrik, of objects.
|
[16:57] sustrik
|
that's the case in 0MQ
|
[16:57] Samy
|
sustrik, this is in reference of xchg(top, ...) :)
|
[16:57] Samy
|
sustrik, what's of relevance to you is the notion of the r and w variables, that's all.
|
[16:58] sustrik
|
r = "i am sleeping"
|
[16:58] sustrik
|
what about w?
|
[16:58] Samy
|
w = "reader has woken-up since the last signal"
|
[16:59] sustrik
|
something is missing
|
[16:59] sustrik
|
w is never set to false
|
[16:59] Samy
|
Oh, sorry.
|
[16:59] Samy
|
{ c_w = false; signal_reader; } should be { w = false; signal_reader; }
|
[17:00] sustrik
|
ok, now it makes sense
|
[17:00] sustrik
|
now, what about the fences
|
[17:01] sustrik
|
does it sync the data in the queue as well?
|
[17:02] Samy
|
You can figure that out.
|
[17:02] Samy
|
That is mainly meant for those flags.
|
[17:02] Samy
|
The idea: the reader flag should always appear to be set to false before the writer flag is set to true (it must indicate it has woken up before it indicates it is fine to send another signal).
|
[17:03] sustrik
|
got it
|
[17:04] sustrik
|
that part ensures that everything happens in nice step-lock fashion
|
[17:04] Samy
|
Right.
|
[17:04] sustrik
|
now few technical questions...
|
[17:04] Samy
|
I'm no expert, but ok.
|
[17:05] sustrik
|
what does lock;xchg() actually mean
|
[17:05] sustrik
|
is there an implied barrier?
|
[17:05] Samy
|
On IA32, atomic operations have a total order across all processors.
|
[17:05] Samy
|
There is an implicit pipeline flush.
|
[17:06] Samy
|
There is an implied barrier.
|
[17:06] Samy
|
There is also no need for the lock prefix on xchg.
|
[17:06] Samy
|
xchg is guaranteed to be atomic (which is what also makes it expensive to use, if you don't need it).
|
[17:06] sustrik
|
in that case there's no global ordering, right?
|
[17:06] Samy
|
Usually, yes.
|
[17:07] Samy
|
add is not guaranteed to be atomic, lock add is.
|
[17:07] Samy
|
xchg is always atomic.
|
[17:07] sustrik
|
oh my
|
[17:07] Samy
|
It's expensive compared to simple loads and stores, that's for sure.
|
[17:07] sustrik
|
so there's an implied barrier on xchg, right?
|
[17:07] Samy
|
Yes.
|
[17:07] sustrik
|
then, in your code we have 2 barriers
|
[17:08] Samy
|
For more information, you can see Volume 3 of the Intel Architecture Manuals, Chapter 8.
|
[17:08] sustrik
|
isn't it better to use single cas with a barrier then?
|
[17:08] Samy
|
sustrik, you don't need any of those barriers for IA32. Stores are always seen in order.
|
[17:09] Samy
|
sustrik, CAS is expensive.
|
[17:09] Samy
|
sustrik, it's best to avoid the lock prefix completely if possible.
|
[17:09] sustrik
|
ack
|
[17:09] Samy
|
sustrik, atomic loads and stores do this.
|
[17:10] sustrik
|
atomic load & stores == have global ordering ?
|
[17:10] Samy
|
sustrik, in theory, CAS can provide infinite consensus. If you don't need infinite consensus, you might not need CAS at all.
|
[17:10] sustrik
|
anyway, i have to get my head around it
|
[17:11] sustrik
|
are you around somewhere?
|
[17:11] Samy
|
Processor ordering.
|
[17:11] sustrik
|
email, or so?
|
[17:11] Samy
|
sbahra@repnop.org
|
[17:11] Samy
|
sustrik, I'd love your feedback on this library I'm working on sometime.
|
[17:11] Samy
|
sustrik, what architectures does ZeroMQ support?
|
[17:12] sustrik
|
we've tested on x86, itanium, sparc, ppc
|
[17:12] sustrik
|
arm
|
[17:12] Samy
|
sustrik, if you see section 8.2.2 in Volume 3 of the IA32 manuals, you'll get a nice break-down of the memory ordering.
|
[17:12] Samy
|
sustrik, does the ZeroMQ project have regular access to such boxes?
|
[17:13] sustrik
|
we have an itanium and sparc box
|
[17:13] sustrik
|
no ppc
|
[17:13] Samy
|
Itanium, nice. :-)
|
[17:13] Samy
|
My PPC box died (Mac Mini) recently. Need to fix it.
|
[17:14] Samy
|
I sold my only decent SPARCv9 box, at work our SPARCv9 machines are tied to our build cluster.
|
[17:14] Samy
|
sustrik, if I could get accounts on those, I can port my library's atomic interface for them to support those.
|
[17:14] Samy
|
sustrik, might be useful, at least as a reference implementation of some structures.
|
[17:15] Samy
|
sustrik, ;]
|
[17:15] sustrik
|
our sparc box is extremely old and slow :)
|
[17:15] Samy
|
Itanium is fine too.
|
[17:16] Samy
|
Well, Itanium is what I really want access to.
|
[17:16] sustrik
|
but mato mumbled something about bringing some 14-core sparc
|
[17:16] sustrik
|
ah
|
[17:16] sustrik
|
i have to ask mato, i think there's some applications running on it
|
[17:16] keffo
|
sandy bridge ftw!
|
[17:16] Samy
|
sustrik, ok.
|
[17:17] Samy
|
I look forward to future discussions. I would like to show you the work I've done so far for feedback.
|
[17:17] Samy
|
It is C, not C++, however.
|
[17:17] sustrik
|
Samy, i can have a look but as you see i am not an expert :)
|
[17:18] Samy
|
Well, that's fine. The idea is, what would it take to make you use this? As a 3rd party looking to integrate this into their product, what issues do you have with the interface?
|
[17:18] Samy
|
etc
|
[17:18] sustrik
|
what's the link to the lib?
|
[17:18] kleppari
|
we're ditching a couple of sun t1000 boxes at work
|
[17:18] Samy
|
No link. Tarball, sustrik.
|
[17:18] Samy
|
When we have a discussion, I can share that.
|
[17:18] kleppari
|
let me see if I can 'rescue' one of them
|
[17:19] sustrik
|
Samy: sustrik@250bpm.com
|
[17:20] Samy
|
Cool.
|
[17:20] Samy
|
I'm back to work, take care.
|
[17:20] sustrik
|
you too
|
[17:20] sustrik
|
bye
|
[17:20] sustrik
|
kleppari: i don't think it's really needed
|
[17:21] sustrik
|
unless you need it yourself
|
[17:21] kleppari
|
not really
|
[17:21] kleppari
|
kind of lost interest in solaris after oracle bought it
|
[17:21] kleppari
|
err, after oracle bought sun
|
[17:22] kleppari
|
I think they'll do a wonderful job killing it
|
[17:22] sustrik
|
yeah, looks like there will be a lot of sun boxes available :)
|
[17:22] kleppari
|
heh, yeah - red hat will make a fortune
|
[17:23] kleppari
|
but the t2000 was a good box, I still think 16 concurrent threads of execution per core is impressive
|
[17:29] pieterh
|
kleppari: we used to have a t2000 at iMatix when we were making OpenAMQ
|
[17:30] pieterh
|
it was a pretty impressive box, made a noise like an aircraft taking off
|
[17:30] pieterh
|
and was slower (all 32 cores or whatever) than a 2-core Athlon
|
[17:31] mato
|
pieterh: it still might be more useful than the old e4500 i can get for the project
|
[17:31] kleppari
|
slower at what?
|
[17:31] mato
|
pieterh: 0mq is a very different codebase from OpenAMQ
|
[17:32] pieterh
|
mato: perhaps, yes
|
[17:32] mato
|
kleppari: if you do get a chance to rescue one i think it'd be more interesting for development of the lockfree algorithms in 0mq than what i've been offered
|
[17:32] pieterh
|
kleppari: at raw I/O afaics
|
[17:32] pieterh
|
but the main difference was probably Linux vs. Solaris
|
[17:32] mato
|
kleppari: which is an old E4500 (fully spec'ed with 14 CPUs)
|
[17:32] kleppari
|
I found that these boxes perform pretty well with a huge runqueue
|
[17:32] kleppari
|
mato: let me see what I can do
|
[17:32] pieterh
|
they were designed for web services, indeed
|
[17:32] kleppari
|
mato: can't promise anything, though, but I'll try
|
[17:33] mato
|
kleppari: no hurry; also, where are you located? it's not worth shipping that stuff from outside of the EU, too expensive
|
[17:33] kleppari
|
iceland
|
[17:33] mato
|
lol :)
|
[17:33] kleppari
|
so the shipping might kill the deal :P
|
[17:33] mato
|
kleppari: precisely, unless you have friends at smyril line :-)
|
[17:33] mato
|
"chuck it down with them bananas" :-)
|
[17:34] kleppari
|
heheh
|
[17:35] kleppari
|
I don't think we'd be shipping bananas out of the country, but fair point :P
|
[17:36] pieterh
|
hey... ebay is offering free shipping or something...
|
[17:36] mato
|
kleppari: thanks for the offer in any case
|
[17:37] pieterh
|
0beer0clock
|
[17:37] mato
|
øbeer
|
[17:37] pieterh
|
zbeer
|
[17:37] mato
|
cyl
|
[17:37] pieterh
|
cyrsn
|
[17:38] mato
|
øøøøøøøøøøøø!
|
[17:38] pieterh
|
:-) I knew it...
|
[17:39] pieterh
|
oh... hang on...
|
[17:43] kleppari
|
alt gr + o ?
|
[17:43] kleppari
|
works on the .is layout
|
[17:44] guido_g
|
and on 105 key .de (qwertz)
|
[17:44] guido_g
|
and I just saw that ch. 3 is back...
|
[17:44] pieterh
|
yeah, I fixed it
|
[17:46] pieterh
|
cyal, i'm off for zbeer
|
[17:47] pieterh
|
guido_g: there's also a new problem solver in Ch1
|
[17:47] guido_g
|
oh...
|
[17:50] guido_g
|
nice one
|
[17:50] guido_g
|
but can you do it in uml?
|
[17:51] guido_g
|
*ducks for cover*
|
[17:53] guido_g
|
more serious thing: smaller version that fits on one page so it can be printed easily
|
[18:04] keffo
|
sustrik, What scenario could trigger a deadlock? (in single threaded proc.)
|
[19:09] Samy
|
sustrik, ping?
|
[19:09] Samy
|
sustrik, http://codepad.org/u1DWN3FG is the correct version.
|
[19:27] ModusPwnens
|
Does zeromq have a lot of start up overhead of initialization?
|
[19:27] ModusPwnens
|
or initialization*
|
[19:36] ModusPwnens
|
the only reason I ask is because i keep getting strange results when I benchmark where the throughput is low initially, but then gradually increases
|