[Time] Name | Message |
[00:09] Guthur
|
0MQ does work on x86-64, correct?
|
[00:09] Guthur
|
polling in particualr
|
[00:09] Guthur
|
particular*
|
[00:31] Guthur
|
oh got it sorted
|
[00:32] Guthur
|
SOCKET changes size depending on x86 / x86-64
|
[00:32] Guthur
|
Is fd always an int on *nix?
|
[00:32] Guthur
|
this is in the zmq_pollitem_t struct
|
[00:35] Guthur
|
oh it is, how bothersome
|
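[Editor's note] The struct Guthur hit here is zmq_pollitem_t, whose fd member is a plain int on POSIX but a SOCKET on Windows, and SOCKET is pointer-sized, hence wider on x86-64 than on x86. A minimal sketch of polling a 0MQ socket and a native descriptor together, assuming a 2.x-era libzmq (the function and variable names are illustrative):

    #include <zmq.h>
    #include <assert.h>

    void poll_both (void *zmq_sock, int raw_fd)
    {
        zmq_pollitem_t items [2];

        items [0].socket = zmq_sock;    /* poll a 0MQ socket... */
        items [0].fd = 0;
        items [0].events = ZMQ_POLLIN;
        items [0].revents = 0;

        items [1].socket = NULL;        /* ...and a native file descriptor */
        items [1].fd = raw_fd;
        items [1].events = ZMQ_POLLIN;
        items [1].revents = 0;

        int rc = zmq_poll (items, 2, -1);   /* 2.x: timeout in usec, -1 = block forever */
        assert (rc >= 0);
    }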
[08:08] euk
|
hi all
|
[08:09] euk
|
can anyone advise on unlimited memory growth with zmq?
|
[08:10] guido_g
|
see HWM socket option and read the guide
|
[08:10] euk
|
when just sending and receiving a large set (millions) of small messages
|
[08:10] euk
|
thank you
|
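[Editor's note] A minimal sketch of guido_g's suggestion: cap per-socket queueing with ZMQ_HWM so memory stays bounded when millions of small messages are in flight. Assumes the 2.1-era C API; the socket type, endpoint and value are illustrative. ZMQ_HWM is a uint64_t and 0 (the default) means no limit:

    #include <zmq.h>
    #include <stdint.h>
    #include <assert.h>

    int main (void)
    {
        void *ctx = zmq_init (1);
        void *push = zmq_socket (ctx, ZMQ_PUSH);

        uint64_t hwm = 1000;    /* queue at most ~1000 messages per pipe */
        int rc = zmq_setsockopt (push, ZMQ_HWM, &hwm, sizeof hwm);
        assert (rc == 0);

        rc = zmq_connect (push, "tcp://127.0.0.1:5555");
        assert (rc == 0);
        /* ...send loop here; once the HWM is reached the socket blocks or
           drops (depending on socket type) instead of growing memory... */
        zmq_close (push);
        zmq_term (ctx);
        return 0;
    }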
[08:20] euk
|
another question - i'm getting tcp throughput higher than ipc (linux, small messages). is that normal? cpu utilization is higher on tcp (so every message takes more cpu), still throughput is somehow higher
|
[09:49] Guthur
|
sustrik: Do you think it would be worth making a clrzmq2 repo
|
[11:12] sustrik
|
Guthur: yes, you can't just drop the existing codebase
|
[11:16] Guthur
|
Yeah that's fair enough
|
[11:17] mikko
|
mornin'
|
[11:17] sustrik
|
mikko: morning
|
[11:17] Guthur
|
Just felt it might be better to make the distinction clear, especially considering it can never go into master due to the compatibility issue
|
[11:18] mikko
|
what is the biggest compatibility break?
|
[11:19] Guthur
|
mikko: Most of it to be honest
|
[11:19] Guthur
|
I used namespaces for one
|
[11:20] Guthur
|
Also changed the constants to enums
|
[11:21] sustrik
|
mikko: what about tomorrow, fancy a beer in the evening?
|
[11:21] mikko
|
sustrik: sure
|
[11:21] mikko
|
it's a storm in london
|
[11:21] mikko
|
so take an umbrella
|
[11:22] mato
|
hi guys
|
[11:22] sustrik
|
i was already told about the storm in london by the girl behind the counter in the local supermarket here :)
|
[11:22] sustrik
|
mato: hi
|
[11:22] Guthur
|
Recv also now just returns the message; it's null if there is no message, which removes the need for an out parameter
|
[11:22] mato
|
sustrik: i've replied to the mailbox problem, can you check if my reasoning is correct?
|
[11:22] sustrik
|
mikko: 7pm or so?
|
[11:23] sustrik
|
let me see
|
[11:23] mikko
|
sustrik: 7pm in Doggett's Coat & Badge ?
|
[11:23] mikko
|
is that fine or do you want to eat something nicer?
|
[11:23] sustrik
|
i am ok with that
|
[11:23] sustrik
|
can you drop a notice to the mailing list in case someone would like to join us?
|
[11:24] mikko
|
yeah, i'm just thinking if there was a nicer place to eat
|
[11:24] mikko
|
at 7pm people are going to be hungry
|
[11:24] sustrik
|
mikko: it's up to you
|
[11:24] sustrik
|
i am not familiar with london too much
|
[11:25] mikko
|
what kind of food do you like?
|
[11:25] sustrik
|
all kinds :)
|
[11:28] sustrik
|
mato: yes, the reasoning seems correct
|
[11:28] mato
|
sustrik: ok, i'll whip up a patch and get chuck to test it
|
[11:28] sustrik
|
however, keep in mind that by writing a command in two chunks
|
[11:28] sustrik
|
you can recv it in 2 chunks as well
|
[11:28] sustrik
|
so the recv part has to be changed as well
|
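[Editor's note] The change being discussed, sketched generically: if the writer may push a command out in two chunks, the reader has to loop until a whole fixed-size command has arrived. command_t below is a stand-in for 0MQ's internal command struct, not the real definition:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <errno.h>
    #include <stddef.h>

    typedef struct { char payload [32]; } command_t;   /* illustrative only */

    int recv_command (int fd, command_t *cmd)
    {
        size_t got = 0;
        while (got < sizeof (command_t)) {
            ssize_t n = recv (fd, (char *) cmd + got, sizeof (command_t) - got, 0);
            if (n == -1 && errno == EINTR)
                continue;               /* interrupted before any data: retry */
            if (n <= 0)
                return -1;              /* real error or peer closed */
            got += (size_t) n;          /* partial read: keep going */
        }
        return 0;
    }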
[11:28] mato
|
hmm
|
[11:33] mikko
|
sustrik: where are you staying ?
|
[11:34] sustrik
|
southwark
|
[11:34] sustrik
|
but choose any place you like
|
[11:35] mikko
|
http://www.doggettscoatandbadge.co.uk/
|
[11:35] mikko
|
it's easy
|
[11:35] sustrik
|
yep, that's 2 mins from my hotel
|
[11:38] mato
|
sustrik: the recv() side could in theory use MSG_WAITALL
|
[11:38] mato
|
sustrik: I can try that and give it to chuck to test
|
[11:39] mato
|
sustrik: there's also the problem that if recv() gets EINTR, it *may* get some bytes...
|
[11:39] mikko
|
sustrik: sent
|
[11:39] mato
|
sustrik: which sucks, I'm not sure how to solve that...
|
[11:39] sustrik
|
mikko: thx
|
[11:39] mikko
|
should've written Martin S. as there are many martins
|
[11:39] sustrik
|
see you tomorrow
|
[11:39] mikko
|
:)
|
[11:39] sustrik
|
mato: EINTR?
|
[11:39] mato
|
sustrik: yes, what?
|
[11:39] sustrik
|
how would the API report that kind of thing?
|
[11:40] mato
|
sustrik: it already does report EINTR
|
[11:40] sustrik
|
nbytes == 3 && errno == EINTR?
|
[11:40] mato
|
oh, that, right
|
[11:40] mato
|
sustrik: sorry, you're right, i was reading the recvmsg docs
|
[11:41] sustrik
|
as for the WAITALL wouldn't it work only for blocking recv?
|
[11:42] sustrik
|
is there a way to combine WAITALL and NONBLOCK?
|
[11:43] mato
|
unclear
|
[11:43] mato
|
hang on, reading various threads
|
[11:43] mato
|
sustrik: the most reliable way to do it would be to use a datagram socket instead of a stream socket
|
[11:43] mato
|
sustrik: datagram sockets guarantee that the send/recv is atomic
|
[11:43] sustrik
|
mato: is it possible with socketpair?
|
[11:44] mato
|
sustrik: yes, you just ask for AF_UNIX, SOCK_DGRAM
|
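[Editor's note] What "AF_UNIX, SOCK_DGRAM" looks like in practice: an anonymous datagram socketpair, where each send()/recv() moves one whole datagram atomically, so a command can never be split across reads. Minimal sketch with error handling reduced to asserts:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <assert.h>

    int main (void)
    {
        int fds [2];
        int rc = socketpair (AF_UNIX, SOCK_DGRAM, 0, fds);
        assert (rc == 0);

        const char cmd [1] = { 0x01 };            /* a one-byte "command" */
        assert (send (fds [0], cmd, sizeof cmd, 0) == (ssize_t) sizeof cmd);

        char buf [64];
        ssize_t n = recv (fds [1], buf, sizeof buf, 0);
        assert (n == (ssize_t) sizeof cmd);       /* exactly one datagram back */
        return 0;
    }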
[11:45] mato
|
?
|
[11:45] mato
|
What do you mean?
|
[11:45] sustrik
|
MTU per packet?
|
[11:45] mato
|
?
|
[11:45] sustrik
|
if you send command, which was 1 byte
|
[11:45] sustrik
|
how much of the buffer will be used
|
[11:46] sustrik
|
1 byte
|
[11:46] sustrik
|
?
|
[11:46] sustrik
|
MTU bytes?
|
[11:46] mato
|
No idea
|
[11:46] mato
|
System-dependent
|
[11:46] sustrik
|
it can make the problem even worse
|
[11:46] sustrik
|
in any case, you should be able to read the command using at most 2 recv calls
|
[11:47] sustrik
|
given that you write it using at most 2 sends
|
[11:47] mato
|
guess so
|
[11:47] sustrik
|
assert (PIPE_BUF < sizeof (command_t)) guarantees that
|
[11:47] sustrik
|
>=
|
[11:48] mato
|
I kind of doubt PIPE_BUF has much to do with AF_UNIX sockets
|
[11:49] sustrik
|
what does it apply to then?
|
[11:49] sustrik
|
mkfifo?
|
[11:49] mato
|
sustrik: pipes
|
[11:49] mato
|
sustrik: yeah
|
[11:49] sustrik
|
btw, why aren't we using pipes?
|
[11:50] mato
|
UNIX sockets are better these days IMO
|
[11:50] mato
|
also you don't have to invent funky naming schemes
|
[11:50] mato
|
etc etc
|
[11:50] mato
|
since socketpair nicely gives you an anonymous pair
|
[11:51] sustrik
|
ok
|
[11:51] mato
|
sustrik: also, pipes have some fixed buffer size
|
[11:51] mato
|
sustrik: not resizable
|
[11:51] sustrik
|
i was just thinking about the fact that what we need is a unidirectional pipe
|
[11:51] sustrik
|
the other direction is unused
|
[11:52] mato
|
too bad :-)
|
[11:52] mato
|
use it for something :-)
|
[11:52] mato
|
if it bothers you :-)
|
[11:52] sustrik
|
maybe we can at least shrink the buffer for that direction?
|
[11:52] mato
|
maybe
|
[12:29] Guthur
|
sustrik: Can someone create a zeromq / clrzmq2 repo?
|
[12:30] sustrik
|
yup, wait a sec
|
[12:31] Guthur
|
I'm at work at the moment but when I get home I can move the code to it and update the clrzmq page accordingly
|
[12:32] Guthur
|
I'm off for lunch back in a bit
|
[12:33] mato
|
sustrik: I occasionally get this from test_shutdown_stress:
|
[12:33] mato
|
Socket operation on non-socket
|
[12:33] mato
|
nbytes != -1 (tcp_socket.cpp:197)
|
[12:33] mato
|
/bin/sh: line 4: 32321 Aborted (core dumped) ${dir}$tst
|
[12:33] mato
|
FAIL: test_shutdown_stress
|
[12:34] sustrik
|
mato: yes, the problem was reported already
|
[12:40] mato
|
sustrik: ok, well, I have a preliminary patch for the mailbox retry stuff, sent to the ML
|
[12:41] sustrik
|
mato: thanks
|
[12:46] dv
|
what would be the recommended way to implement messaging without a response?
|
[12:47] dv
|
i have two nodes A and B, and they communicate asynchronously,
|
[12:47] dv
|
there are no "requests" and subsequent "responses", only events
|
[12:48] dv
|
i could open two req-rep connections between the two, so that both are requesters, and simply not use the response (or send some dummy response)
|
[12:48] dv
|
but that sounds wasteful
|
[12:48] dv
|
any suggestions?
|
[12:48] sustrik
|
Guthur: done
|
[12:49] sustrik
|
dv_: you have to think about scaling to get it right
|
[12:49] sustrik
|
are there going to be multiple A's in the future?
|
[12:50] sustrik
|
or multiple B's?
|
[12:50] sustrik
|
if so, how are the messages to be dispatched to the multiple instances?
|
[12:50] sustrik
|
each message to each instance?
|
[12:50] sustrik
|
if so, use PUB/SUB
|
[12:50] sustrik
|
if you want to load-balance messages between instances
|
[12:50] sustrik
|
use PUSH/PULL
|
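[Editor's note] The two dispatch patterns sustrik contrasts, side by side: a PUB socket copies every message to every connected subscriber, while a PUSH socket load-balances messages across connected pullers. A sketch assuming the 2.1-era C API; the endpoints are placeholders:

    #include <zmq.h>

    void setup_senders (void *ctx)
    {
        /* each message to each instance -> PUB/SUB */
        void *pub = zmq_socket (ctx, ZMQ_PUB);
        zmq_bind (pub, "tcp://*:5556");

        /* load-balance messages across instances -> PUSH/PULL */
        void *push = zmq_socket (ctx, ZMQ_PUSH);
        zmq_bind (push, "tcp://*:5557");

        /* receivers connect a SUB socket to :5556 (with ZMQ_SUBSCRIBE set),
           or a PULL socket to :5557, respectively */
    }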
[12:51] dv
|
no they would be peers
|
[12:51] dv
|
there is a pair pattern, but it is marked as experimental
|
[12:52] dv
|
if multiple A's/B's were the case, i would use pub/sub, yes
|
[12:52] dv
|
hmm. you know, I could also use it for two nodes only.
|
[12:52] dv
|
but can multiple publishers exist with an IPC connection?
|
[12:53] sustrik
|
you can have multiple pubs pushing messages to a single SUB for example
|
[12:54] dv
|
in fact, can i have multiple publishers with any kind of connection?
|
[12:54] sustrik
|
yes
|
[12:54] dv
|
cool
|
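[Editor's note] One way to get multiple publishers over any transport: a single SUB socket may connect to several PUB endpoints (equally, the SUB side could bind and the PUBs connect to it). A sketch; the ipc paths are placeholders:

    #include <zmq.h>

    void *connect_subscriber (void *ctx)
    {
        void *sub = zmq_socket (ctx, ZMQ_SUB);
        zmq_setsockopt (sub, ZMQ_SUBSCRIBE, "", 0);     /* receive everything */
        zmq_connect (sub, "ipc:///tmp/player-events");   /* publisher A */
        zmq_connect (sub, "ipc:///tmp/scanner-events");  /* publisher B */
        return sub;
    }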
[12:55] dv
|
i have an audio playback process here that is controlled by the frontend, and the playback process can send events, such as "song finished", "song metadata scanned" ..
|
[12:55] dv
|
i think i'll use pub-sub here
|
[12:56] dv
|
but does the IPC mechanism scale linearly with the amount of publishers/subscribers? for example, with tcp, since it is strictly point-to-point, i have a situation where every node has connections to all the other nodes, right?
|
[13:08] sustrik
|
right, same with ipc
|
[13:10] dv
|
something like multicast for ipc would rock. but i guess this is far from trivial. not just in zeromq, but speaking generally
|
[13:11] sustrik
|
the whole multicast thing is complex
|
[13:11] sustrik
|
for ipc to prevent copying the message you would have to have it stored in shmem
|
[13:11] dv
|
i noticed. i stumbled upon weird bugs that i have yet to replicate with the openpgm tools.
|
[13:11] sustrik
|
however, allocating shmem is an expensive operation
|
[13:12] sustrik
|
so it's not that easy
|
[13:12] dv
|
to think of that in ipc ...
|
[13:12] dv
|
yes i see
|
[13:13] dv
|
oh one other thing, when I use pub/sub with multicast (epgm),
|
[13:13] dv
|
and I turn set mcast_loop to 0 in the publisher socket,
|
[13:14] dv
|
the receiver doesnt get anything anymore - but i've only tried it out with receiver and sender running on the same host so far
|
[13:15] dv
|
if I understand this correctly, mcast_loop 0 means that only messages sent between hosts will pass through? i found that part of the manual a bit confusing
|
[13:15] dv
|
(thats my last question btw. :) )
|
[13:15] sustrik
|
dv_: yes, that's how it works
|
[13:15] sustrik
|
multicast over loopback is a terrible hack
|
[13:16] sustrik
|
that is better not to use at all
|
[13:16] dv
|
hmm. i wonder if the bugs i noticed stem from this
|
[13:16] dv
|
i will use IPC for nodes on the same host then
|
[13:16] sustrik
|
quite possibly, anyway turning mcast_loop to off means you won't get the messages on the same host
|
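[Editor's note] The option under discussion, for reference. In the 2.x API ZMQ_MCAST_LOOP is an int64_t: 1 (the default) loops multicast back so same-host receivers see it, 0 disables that loopback hack, which is why dv's same-host subscriber went quiet. A sketch; the interface and multicast group are placeholders:

    #include <zmq.h>
    #include <stdint.h>

    void *make_epgm_pub (void *ctx)
    {
        void *pub = zmq_socket (ctx, ZMQ_PUB);

        int64_t loop = 0;   /* 0 = no loopback: same-host SUBs get nothing */
        zmq_setsockopt (pub, ZMQ_MCAST_LOOP, &loop, sizeof loop);

        zmq_connect (pub, "epgm://eth0;239.192.1.1:5555");
        return pub;
    }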
[13:17] dv
|
i dont know if you follow the mailing list much, the bug I mentioned was an assert that happened randomly
|
[13:17] dv
|
openpgm sometimes thinks there is no multicast capable NIC
|
[13:17] sustrik
|
i'm following the list, but i have little knowledge of openpgm
|
[13:17] dv
|
overriding internal openpgm checks made it work all the time
|
[13:17] dv
|
but it felt ... wrong.
|
[13:18] sustrik
|
steve mccoy should comment on that
|
[13:18] dv
|
he did. i just mention it
|
[13:18] dv
|
hmm i'll tell him about the same host
|
[13:18] dv
|
or no, hold on. it already happened when I was just creating a publisher socket
|
[13:19] dv
|
but maybe turning off mcast_loop first does help ... i'll try it out this afternoon. thanks for the suggestions.
|
[13:21] cremes
|
mato: any chance you can send me that mailbox patch as a gist? i get "fatal: corrupt patch at line 6" if i copy/paste it into a file for application
|
[13:22] mato
|
i'll email it to you directly
|
[13:22] mato
|
cremes: what's your email address?
|
[13:23] mato
|
cremes: I sent it to the email you use on the ML.
|
[13:26] cremes
|
mato: perfect; applied it but now get another assertion
|
[13:26] cremes
|
Assertion failed: nbytes == want_nbytes (mailbox.cpp:213)
|
[13:27] mato
|
cremes: Hmm, well, that's precisely what shouldn't happen
|
[13:28] mato
|
cremes: Either you send me your code and I'll try and reproduce it here, or insert some printf's in the various code paths in send() yourself and try and figure out the actual sequence of syscalls
|
[13:28] cremes
|
if you have ruby 1.9.2 on your system, i can send you the code
|
[13:28] mato
|
it's possible OSX is behaving funny when the socket SNDBUF is resized
|
[13:29] cremes
|
otherwise, suggest what details you want printed out and i'll modify the 0mq source locally
|
[13:29] mato
|
cremes: no i don't, sorry... I have some oldish snapshot of 1.9.1
|
[13:29] cremes
|
ok
|
[13:29] cremes
|
before modifying the source, i'll try this on my archlinux box
|
[13:29] mato
|
cremes: that's a good idea
|
[13:29] cremes
|
let's see if linux blows up too
|
[13:31] mato
|
lunch. bbl
|
[13:35] cremes
|
mato: when i run the *unpatched* master on linux against my code example, i get a different assertion
|
[13:35] cremes
|
Assertion failed: new_sndbuf > old_sndbuf (mailbox.cpp:182)
|
[13:35] cremes
|
let me try it with the patch...
|
[13:35] mato
|
cremes: that means you hit the system SNDBUF max
|
[13:35] mato
|
cremes: set the net.core.wmem_max sysctl to something high
|
[13:35] mato
|
cremes: and it'll go away
|
[13:36] cremes
|
ko
|
[14:39] cremes
|
ok, so after boosting net.core.wmem_max i am now getting a socket error "too many open files" when i allocate 508 REQ sockets
|
[14:40] cremes
|
according to ulimit -n, i set the file descriptor max to 250_000 and i still get that
|
[14:40] cremes
|
this is on archlinux, kernel 2.6.35
|
[14:41] cremes
|
same behavior with *and* without mato's patch
|
[14:55] pieterh
|
cremes, cat /proc/sys/fs/file-max
|
[14:55] pieterh
|
what does it give you?
|
[14:56] mato
|
cremes: for ulimit -n you'll have to run that as root, then from that *same* root shell run your code
|
[14:56] mato
|
cremes: since it's inherited by child processes, it is not a system setting
|
[14:56] mato
|
my patch doesn't change anything wrt number of open files
|
[14:56] cremes
|
pieterh: results in 1201177
|
[14:56] pieterh
|
This is from the Confluence wiki
|
[14:56] pieterh
|
Run the command sysctl -a. If this is less than 200000, increase the number of file handles by editing /etc/sysctl.conf and changing the property fs.file-max to 200000. If there isn't a value set already for this property, you need to add the line fs.file-max=200000.
|
[14:56] pieterh
|
Then run sysctl -p to apply your changes to your system.
|
[14:56] cremes
|
mato: i logged in a fresh shell after making the change
|
[14:56] mato
|
cremes: yes, but that won't work
|
[14:57] pieterh
|
cremes: seems high enough, how about 'sysctl -a'?
|
[14:57] mato
|
cremes: verify with ulimit -a in the shell that you're running your code in that the setting has actually taken effect
|
[14:57] mato
|
cremes: anyway, this is beside the point of my patch, so you're running into some macosx weirdness there
|
[14:57] cremes
|
mato: it tells me 250000 for open files
|
[14:58] cremes
|
this is on linux
|
[14:58] pieterh
|
archlinux
|
[14:58] mato
|
cremes: and you're still running out of open files?
|
[14:58] cremes
|
yep
|
[14:58] mato
|
cremes: then your code is the problem, sorry :-)
|
[14:58] cremes
|
ha
|
[14:58] pieterh
|
cremes: do you have a simple test case I can try?
|
[14:58] mato
|
as in, the number of open files is not infinite :-)
|
[14:59] cremes
|
pieterh: results of sysctl -a .... https://gist.github.com/667772
|
[14:59] mato
|
cremes: anyhow, the interesting case is the failure you're getting on macosx, let me know when you have time to put some printfs in the mailbox code
|
[14:59] cremes
|
mato: i have time now; give me suggestions on what is important and i'll print it out
|
[14:59] pieterh
|
cremes, fs.file-max = 1201177
|
[14:59] pieterh
|
fs.nr_open = 1048576
|
[14:59] cremes
|
pieterh: if you have ruby 1.9.2 then i can give you a relatively simple test case
|
[15:00] pieterh
|
derp
|
[15:00] mato
|
cremes: print nbytes after the 1st do loop in send() (after line 163)
|
[15:00] cremes
|
mato: prepatched or postpatch?
|
[15:00] mato
|
cremes: patched
|
[15:00] mato
|
cremes: then, print old_sndbuf, new_sndbuf after the assert on line 184
|
[15:01] mato
|
cremes: and print retry_nbytes after the 2nd do loop in send(), i.e. after line 202
|
[15:01] mato
|
cremes: that should give us an idea of what is going on...
|
[15:02] mato
|
cremes: oh, and each thing that you print...
|
[15:02] mato
|
cremes: print also the this pointer as %p, (void *)this
|
[15:02] mato
|
cremes: so that it's obvious which printf belongs to which instance
|
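[Editor's note] The kind of tracing mato is asking for, shown as fragments to paste into the patched mailbox_t::send () in mailbox.cpp (the exact lines depend on the patch; the variable names are the ones mentioned above). stderr avoids the stdout/stderr interleaving problem that shows up later, and %p on the object pointer tells instances apart:

    /* after the 1st do loop in send () */
    fprintf (stderr, "%p: send: nbytes = %d\n", (void *) this, (int) nbytes);

    /* after the SNDBUF-resize assert */
    fprintf (stderr, "%p: send: old_sndbuf = %d new_sndbuf = %d\n",
        (void *) this, (int) old_sndbuf, (int) new_sndbuf);

    /* after the 2nd do loop (the retry) in send () */
    fprintf (stderr, "%p: send: retry_nbytes = %d\n",
        (void *) this, (int) retry_nbytes);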
[15:07] pieterh
|
cremes, how about trying a simple program to test your Linux file handle limit...
|
[15:07] pieterh
|
https://gist.github.com/667779
|
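[Editor's note] The linked gist isn't reproduced in the log; this is a sketch of the same idea against the 2.1-era C API: keep creating REQ sockets until the library refuses, then report how many fit. On that era's 2.1 master the run may instead die on an internal assertion, which is the behaviour reported below and filed as issue 113:

    #include <zmq.h>
    #include <stdio.h>

    int main (void)
    {
        void *ctx = zmq_init (1);
        int count = 0;
        while (1) {
            void *s = zmq_socket (ctx, ZMQ_REQ);
            if (s == NULL)
                break;              /* EMFILE, or 0MQ's own socket limit */
            count++;
        }
        printf ("This system allows up to %d sockets\n", count);
        return 0;
    }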
[15:09] pieterh
|
i can create over 8M sockets on my linux box
|
[15:10] cremes
|
pieterh: where is zfl.h at?
|
[15:11] mikko
|
cremes: github/zeromq/zfl
|
[15:13] pieterh
|
cremes: sorry, just use zmq.h it'll work too
|
[15:14] pieterh
|
cremes, ok, fixed it
|
[15:14] cremes
|
pieterh: give me the line to compile it please... i tried "gcc test.c" but it's bitching about implicit declaration of 'printf'
|
[15:15] pieterh
|
ack, take my fixed version
|
[15:15] pieterh
|
gcc -o testit testit.c
|
[15:15] pieterh
|
don't call c programs 'test' cause 'test' is a built-in shell command
|
[15:15] pieterh
|
gcc -o testit -lzmq testit.c
|
[15:15] mato
|
pieterh: your program will fall over in zmq sooner than it can print the number of sockets it managed to create
|
[15:16] pieterh
|
mato: doesn't seem to
|
[15:16] mato
|
pieterh: because of the mailbox failing to get a socket
|
[15:16] mato
|
pieterh: well, in master anyway
|
[15:16] pieterh
|
perhaps in master, yes... i was just trying in 2.0.10
|
[15:16] mato
|
2.0.10 is much different
|
[15:16] mato
|
in fact, in 2.1 even if it did not fall over, it would produce a different number
|
[15:17] pieterh
|
hey: :-)
|
[15:17] pieterh
|
"too many open files"
|
[15:17] pieterh
|
not even a clean return, just internal failure
|
[15:17] pieterh
|
neato
|
[15:17] mato
|
2.0.10 has completely different signalling mechanisms
|
[15:17] cremes
|
pieterh: aha!
|
[15:17] pieterh
|
Too many open files
|
[15:17] pieterh
|
rc == 0 (mailbox.cpp:374)
|
[15:17] pieterh
|
Aborted (core dumped)
|
[15:17] cremes
|
maybe i'm not so crazy after all...? :)
|
[15:18] pieterh
|
cremes: i can't make a test program for that :-)
|
[15:18] cremes
|
too bad...
|
[15:18] pieterh
|
let me see how quickly it dies...
|
[15:18] mato
|
with the default limit of 1024 it'll die somewhere around 512
|
[15:18] mato
|
actually, probably not
|
[15:19] pieterh
|
cremes!
|
[15:19] mato
|
it'll die at 1024/3 roughly i'd guess
|
[15:19] pieterh
|
hey!
|
[15:19] pieterh
|
it dies at 508
|
[15:19] mato
|
yup
|
[15:19] pieterh
|
wow... from >8M to 508... that is quite a step backwards
|
[15:19] pieterh
|
ok, mato, where's that default limit of 1K?
|
[15:19] cremes
|
pieterh: 508 is what i see too
|
[15:20] mato
|
hang on guys, there are several things going on here
|
[15:20] pieterh
|
net.core.wmem_max.. right?
|
[15:20] mato
|
no
|
[15:20] mato
|
that's part of it
|
[15:20] mato
|
allow me to explain
|
[15:20] pieterh
|
shoot
|
[15:21] mato
|
in 2.0.x, there is one signalling socketpair (what is called mailbox in 2.1) per *application thread*
|
[15:21] mato
|
hence, creating a 0mq socket does not use up an actual file descriptor
|
[15:21] pieterh
|
ah, neat
|
[15:21] mato
|
now, in 2.1 the situation is entirely different
|
[15:21] mato
|
each 0mq socket created == 1 mailbox
|
[15:22] mato
|
2 file descriptors
|
[15:22] mato
|
hence, if the default limit is 1024, you'll die at <512
|
[15:22] pieterh
|
sure...
|
[15:22] pieterh
|
now we have a trivial reproducible case on a default linux box
|
[15:22] pieterh
|
how do I raise that limit?
|
[15:22] cremes
|
guys, what is the printf format code for printing ssize_t?
|
[15:22] pieterh
|
%d
|
[15:23] mato
|
cremes: use %d and cast it to int
|
[15:23] cremes
|
archlinux bitches at me about that
|
[15:23] pieterh
|
%u maybe
|
[15:23] cremes
|
ah, ok, it will fit
|
[15:23] mato
|
cremes: there is no printf format for ssize_t
|
[15:23] mato
|
now, as to raising the limit
|
[15:23] mato
|
there are two limits
|
[15:23] mato
|
1. the system maximum, which is the sysctl pieter already mentioned
|
[15:23] mato
|
normally you don't need to touch that
|
[15:24] mato
|
2. your distribution will have some $mechanism for setting a default RLIMIT_NOFILE on login, usually to 1024
|
[15:24] mato
|
if you want to raise that, either figure out where the setting for $mechanism is
|
[15:24] mato
|
or
|
[15:24] mato
|
1. (as user) $ su root
|
[15:24] mato
|
2. (as root) # ulimit -n 8192
|
[15:25] mato
|
3. (still as that root) # su user
|
[15:25] mato
|
4. run your app
|
[15:25] mato
|
you need to follow those 4 exact steps otherwise the raised rlimit setting will not propagate to your app
|
[15:25] mato
|
if you open a new shell, it won't work
|
[15:25] mato
|
is this clear?
|
[15:27] pieterh
|
mato: having raised the ulimit to 8192 (or any large number), the test program now exits cleanly
|
[15:27] pieterh
|
and reports: "This system allows up to 511 sockets"
|
[15:27] mato
|
pieterh: yes, now there is a third limit also
|
[15:27] cremes
|
mato: results from running my test program on OSX and archlinux with the printf statements
|
[15:27] cremes
|
https://gist.github.com/667811
|
[15:27] mato
|
pieterh: compile time setting in 0MQ src/config.hpp
|
[15:28] mato
|
pieterh: which happens to be set to 512
|
[15:28] pieterh
|
was this limit introduced in 2.1.0?
|
[15:29] pieterh
|
i assume it must have been, since previously I could open 8M sockets
|
[15:29] mato
|
pieterh: presumably
|
[15:29] pieterh
|
well, ok I can confirm what you've explained
|
[15:30] pieterh
|
though the max-sockets option seems 1 off
|
[15:30] mato
|
pieterh: dunno, that is sustrik's work
|
[15:30] pieterh
|
- setting it to 8000 allows me to create 7999 sockets
|
[15:30] mato
|
it's allocating resources of some kind up front, otherwise it wouldn't be a compile-time option
|
[15:31] mato
|
cremes: ok, that output is strange
|
[15:31] mato
|
cremes: note that the old_sndbuf/new_sndbuf values for the failing mailbox look completely bogus
|
[15:31] mato
|
32? 16? wtf...
|
[15:31] cremes
|
mato: are you looking at the osx or arch output?
|
[15:32] mato
|
oh, right, sorry
|
[15:32] mato
|
bah, let me save those gists to a file
|
[15:33] cremes
|
i think on linux i am hitting the 512 socket limit
|
[15:33] cremes
|
on osx, it fails long before
|
[15:35] mato
|
cremes: hmm, the output is a bit mixed up due to stdout/stderr mixing
|
[15:35] cremes
|
mato: yes, want me to separate them?
|
[15:35] mato
|
cremes: are you printing the retry_nbytes case?
|
[15:35] mato
|
cremes: i can't find it anywhere...
|
[15:35] mato
|
cremes: just use fprintf (stderr, ...) for everything, it'll work fine
|
[15:35] cremes
|
mato: no, you didn't mention that as being interesting
|
[15:35] mato
|
cremes: oh, sorry, i did, you must have missed it
|
[15:36] mato
|
16:01 < mato> cremes: and print retry_nbytes after the 2nd do loop in send(), i.e. after line 202
|
[15:36] mato
|
cremes: can you try that please? and set it all to use stderr?
|
[15:37] cremes
|
ok, i'll add that one... sorry for the mixup
|
[15:51] cremes
|
mato: osx results... https://gist.github.com/3892c9fcb0f5493f6ae8
|
[15:53] pieterh
|
cremes, I've reported the issue https://github.com/zeromq/zeromq2/issues/issue/113
|
[15:54] cremes
|
pieterh: thank you
|
[15:58] pieterh
|
I'm pretty sure the limit of ~512 sockets will trip up a few existing apps
|
[15:58] cremes
|
pieterh: i changed that limit to 10k on my linux box, reran the test and it passed
|
[15:58] pieterh
|
yes, indeed
|
[15:58] cremes
|
i did the same change on osx and it still fails in the same spot, same error :(
|
[15:59] cremes
|
must be some funky osx-ism
|
[15:59] pieterh
|
different techniques for setting per-user limits
|
[15:59] pieterh
|
its BSD-4.3 heritage or something
|
[15:59] cremes
|
pieterh: i bumped file descriptors per process to 25k a long time ago for mongodb on this box
|
[16:00] mato
|
pieterh: max_sockets directly affects the number of mailbox slots
|
[16:00] mato
|
pieterh: so the limit is set intentionally low
|
[16:00] zmq_help
|
hi folks - i'm trying to get zeromq ruby bindings working on Ubuntu 10.10
|
[16:00] pieterh
|
mato: you mean as in reducing pre-allocated memory size?
|
[16:00] mato
|
pieterh: yes
|
[16:00] pieterh
|
is there another performance reason?
|
[16:00] cremes
|
zmq_help: which bindings? the zmq gem or ffi-rzmq?
|
[16:01] pieterh
|
data copying etc?
|
[16:01] mato
|
pieterh: possibly, read the code :-)
|
[16:01] zmq_help
|
I installed zmq
|
[16:01] zmq_help
|
I installed zmq ruby gem
|
[16:01] mato
|
pieterh: but in any case, the standard untuned Linux limit is 1024 fds
|
[16:01] cremes
|
zmq_help: ok; so what seems to be the problem?
|
[16:01] mato
|
pieterh: so there's no point in changing the default of max_sockets
|
[16:01] mato
|
pieterh: since it'll fall over anyway
|
[16:01] pieterh
|
mato: IMO there is
|
[16:01] pieterh
|
it'll fall over with a normal system error
|
[16:02] pieterh
|
that can be fixed without recompilation
|
[16:02] pieterh
|
modifying code to change a limit is not acceptable for many processes
|
[16:02] pieterh
|
it means requalification
|
[16:02] pieterh
|
we've discussed this previously
|
[16:02] pieterh
|
imagine 'large bank'
|
[16:03] mato
|
pieterh: large bank should test damn well for such limits...
|
[16:03] zmq_help
|
cremes: when I enter 'require "zmq"' I get this error message
|
[16:03] zmq_help
|
LoadError: libzmq.so.0: cannot open shared object file: No such file or directory - /usr/lib/ruby/gems/1.8/gems/zmq-2.0.9/lib/zmq.so
|
[16:03] pieterh
|
mato: that's not the point
|
[16:03] pieterh
|
point is tuning the sw at compile time is sub-optimal
|
[16:03] zmq_help
|
cremes: but I can see that the zmq.so file is there, and readable
|
[16:03] pieterh
|
and it can be done better unless there is an actual cost in having a larger limit
|
[16:03] pieterh
|
anyhow, issue is noted
|
[16:04] mato
|
pieterh: well, don't forget there are many audiences, embedded, etc etc..
|
[16:04] cremes
|
zmq_help: did you install the 0mq library? the gem does not include it
|
[16:04] cremes
|
also, make sure you run ldconfig after installing the 0mq lib
|
[16:04] mato
|
pieterh: and the limit only affects # of sockets, not # connections obviously
|
[16:04] cremes
|
so that linux picks it up
|
[16:04] mato
|
anyway, noted.
|
[16:04] pieterh
|
mato: I'd suggest embedded systems are less 'typical'
|
[16:04] zmq_help
|
cremes - yes I installed everything - i'll try the ldconfig trick now...
|
[16:04] pieterh
|
plus 0MQ encourages use of many sockets
|
[16:04] mato
|
it does? :-)
|
[16:05] pieterh
|
yes
|
[16:05] pieterh
|
typically one for each flow
|
[16:05] cremes
|
pieterh: agreed... i sometimes need thousands for just that reason
|
[16:05] pieterh
|
and if you want any custom routing that means several sockets per peer
|
[16:05] zmq_help
|
cremes: THANKS! the ldconfig trick did the job
|
[16:05] pieterh
|
as you well know, mato :-)
|
[16:05] cremes
|
zmq_help: remember to help someone else in channel when you get a chance ;)
|
[16:06] zmq_help
|
:-) will do thanks
|
[16:06] zmq_help
|
cremes: where should I give documentation feedback re: the ldconfig trick ??
|
[16:06] cremes
|
pieterh: does imatix have an osx box for testing?
|
[16:06] cremes
|
zmq_help: that trick is listed in the FAQ i think
|
[16:06] pieterh
|
cremes: yes, sure
|
[16:06] pieterh
|
try this: sudo echo "limit maxfiles 1000000 1000000" > /etc/launchd.conf
|
[16:06] cremes
|
otherwise, feel free to join the wiki and modify it on your own
|
[16:07] zmq_help
|
cremes: ok thanks -
|
[16:07] pieterh
|
and then launchctl limit maxfiles 1000000 1000000
|
[16:07] cremes
|
pieterh: i did that when i bumped my limits to 25k... here are the current contents: limit maxfiles 25000 100000
|
[16:07] pieterh
|
cremes, I'm just googling random pages for ""too many open files" OSX"
|
[16:07] pieterh
|
hmm
|
[16:07] cremes
|
ulimit -a agrees that it is 25k
|
[16:08] pieterh
|
and you tuned 0MQ to 10k sockets?
|
[16:08] cremes
|
yes
|
[16:08] pieterh
|
and you still get an abort at 508 sockets?
|
[16:09] pieterh
|
cremes, see http://artur.hefczyc.net/node/27
|
[16:09] mato
|
cremes: hmph, those sndbuf numbers from osx make no sense
|
[16:09] mato
|
cremes: if it really was 16/32 bytes, it'd fall over immediately
|
[16:10] mato
|
cremes: i'd need access to an OSX box to figure out what is going on
|
[16:10] cremes
|
mato: i can give you an account on mine if you'd like
|
[16:10] pieterh
|
mato: I can bring you my MacBook when I manage to make it to BA
|
[16:11] mato
|
cremes: that would be great, but honestly i don't have time this week to look at it seriously
|
[16:11] cremes
|
mato: ok
|
[16:12] mato
|
cremes: upcoming deadlines, too much to do, sorry...
|
[16:12] cremes
|
i understand... no worries
|
[16:15] twomashi1
|
How can I process all outstanding messages in a socket and stop receiving new ones?
|
[16:19] guido_g
|
you can't
|
[16:20] guido_g
|
you can receive all messages in a loop, but you can't tell the ømq socket to stop accepting more/new messages
|
[16:23] Steve-o
|
twomashi1: what is the logical requirement from this scenario? clean FT fail over?
|
[16:29] twomashi1
|
Steve-o: Removing a worker from a worker pool
|
[16:30] Guthur
|
Does 0MQ aim to provide more devices in the future?
|
[16:34] mikko
|
Guthur: i would imagine that if certain devices are used in large amount of projects they could be incorporated into the core
|
[16:34] mikko
|
but not sure whether adding new devices is the biggest priority
|
[16:35] mikko
|
Guthur: are there specific devices that you are after?
|
[16:36] Guthur
|
mikko: Not really just interested is all, if I could find examples of what would be beneficial I would maybe even have a stab at implementing
|
[16:41] twomashi1
|
guido_g: I wanted to have a worker which would process n messages and then die
|
[16:42] twomashi1
|
but it sounds like there's no way to stop the worker receiving more messages
|
[16:42] guido_g
|
as i already said
|
[16:42] twomashi1
|
so this pattern wont work
|
[16:42] twomashi1
|
thanks
|
[16:47] Steve-o
|
twomashi1: I think there are few implementations of worker pools already, e.g. http://kfsone.wordpress.com/2010/07/21/asyncworker-parallelism-with-zeromq/
|
[16:51] guido_g
|
doesn't handle the dynamic shutdown of one worker though
|
[17:04] ngerakines
|
sup
|
[17:17] cremes
|
found the trick for osx... it has some sysctl settings that it shares with freebsd
|
[17:18] cremes
|
for localhost/loopback connections, there are separate send/recv buffers allocated
|
[17:18] cremes
|
net.local.stream.sendspace=82320
|
[17:18] cremes
|
net.local.stream.recvspace=82320
|
[17:18] cremes
|
or whatever value you want
|
[17:18] cremes
|
i'll add this to the FAQ
|
[17:18] mato
|
cremes: so upping those values makes the code work for you?
|
[17:19] mato
|
cremes: that would imply that at least for some sockets, the default on OSX is ridiculously low
|
[17:19] cremes
|
yes, because it avoids the recovery code that tries to adjust SO_SNDBUF
|
[17:19] mato
|
the thing is, the recovery code works
|
[17:19] mato
|
as long as the OS comes back with a sane sndbuf in the getsockopt
|
[17:19] cremes
|
i believe you, but SO_SNDBUF isn't returning the right vals
|
[17:19] cremes
|
and i don't know why
|
[17:20] cremes
|
maybe it is returning kbytes instead of bytes
|
[17:20] mato
|
that is indeed what it looks like
|
[17:20] mato
|
cremes: could you please somehow summarize what we've found and reply to the thread on the ML so that we don't lose it?
|
[17:20] cremes
|
absolutely
|
[17:21] mato
|
cremes: it's plausible that even just setting a large-ish value at mailbox socketpair creation time would make the problem go away
|
[17:21] mato
|
needs more investigation...
|
[17:21] cremes
|
right... i'll respond to the ML with a few details; i'll also update pieter's issue
|
[17:21] mato
|
thx
|
[17:49] cremes
|
hmmm, what would you guess is happening if i get "Too many open files: rc == 0 (mailbox.cpp:431)" ?
|
[17:49] cremes
|
this is with the patched mailbox.cpp
|
[17:51] mato
|
precisely what it says :)
|
[17:52] cremes
|
heh
|
[17:53] cremes
|
i must not have raised my limits high enough and now i'm running out of another resource
|
[18:10] ngerakines
|
hey folks, I've got a c++ app that uses zmq with threads and I'm not sure about the proper way of doing something in particular:
|
[18:10] ngerakines
|
for these threads, they bind to a given socket and wait for incoming messages, however when I want to shutdown these threads I'm not finding through the docs the best way to my socket_in.recv/1 to stop blocking on input
|
[18:11] ngerakines
|
how should I go about this?
|
[18:12] twomashi1
|
ngerakines: i have the same issue
|
[18:13] twomashi1
|
I'm told there's no way to get zmq to stop receiving messages so you can process outstanding ones
|
[18:13] nettok
|
ngerakines: I would like to know about that too
|
[18:15] nettok
|
ngerakines: maybe registering a signal handler and then trigger the signal from another thread or process, something like that?
|
[18:16] ngerakines
|
hmm
|
[18:16] nettok
|
Or making read non-blocking
|
[18:16] twomashi1
|
oh wait... sorry different issue
|
[18:16] twomashi1
|
use poll?
|
[18:17] ngerakines
|
yeah, looking into poll
|
[18:18] guido_g
|
ngerakines: simply send a quit message (can be of length 0)
|
[18:19] ngerakines
|
ok, was looking for a way to effectively halt all zmq communication once stop was initiated, but thanks
|
[18:20] guido_g
|
then just terminate the context
|
[18:20] guido_g
|
that will close all ømq sockets and stop all ømq related activities
|
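[Editor's note] A sketch combining both suggestions: the receive loop treats a zero-length message as "quit", and the controlling thread can instead (or additionally) call zmq_term (), which in 2.1 makes a blocking recv fail with ETERM. Assumes the 2.1-era C API; socket setup is not shown:

    #include <zmq.h>
    #include <errno.h>

    void worker_loop (void *sock)
    {
        while (1) {
            zmq_msg_t msg;
            zmq_msg_init (&msg);
            if (zmq_recv (sock, &msg, 0) == -1) {
                if (errno == ETERM)
                    break;              /* context terminated: shut down */
                zmq_msg_close (&msg);
                continue;               /* e.g. EINTR */
            }
            size_t size = zmq_msg_size (&msg);
            zmq_msg_close (&msg);
            if (size == 0)
                break;                  /* zero-length "quit" message */
            /* ... process the message ... */
        }
        zmq_close (sock);
    }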
[18:22] nettok
|
guido_g: thanks!
|
[18:23] ngerakines
|
guido_g: and in cases where i've got a pub/sub client, use poll?
|
[18:24] guido_g
|
ngerakines: can't follow you, sorry
|
[18:24] ngerakines
|
I've got a pub/sub subscriber with a run-loop that uses socket_t.recv(..)
|
[18:24] guido_g
|
so?
|
[18:24] ngerakines
|
instead of recv, should I be using poll to close out cleanly?
|
[18:24] guido_g
|
why?
|
[18:25] ngerakines
|
Is there a way to send it a kill message like above given that it isn't binding?
|
[18:25] guido_g
|
wht isn't binding? w/o a bound socket you can't receive any messages
|
[18:26] guido_g
|
so if you have a sub socket, it'll just receive the message
|
[18:27] ngerakines
|
because it is creating a socket with ZMQ_SUB ?
|
[18:27] ngerakines
|
with connect ?
|
[18:27] ngerakines
|
socket_t.connect(...) as opposed to socket_t.bind(...)
|
[18:28] guido_g
|
there is no difference between the connect side and the bind side in this case
|
[18:28] guido_g
|
this is for sure mentioned in the guide somewhere
|
[18:56] gandhijee
|
hey guys, is there an ubuntu/debian package of zeromq anywhere?
|
[18:56] pieterh
|
gandhijee, I believe there is, somewhere
|
[18:57] gandhijee
|
happen to have an idea where somewhere might be?
|
[18:57] pieterh
|
Do a google for "debian package zeromq"
|
[18:57] pieterh
|
http://packages.debian.org/source/sid/zeromq
|
[18:58] twomashi1
|
pieterh: do you know how a process using zeromq could stop receiving messages and process its outstanding messages? (say to remove itself from a worker pool)
|
[18:58] gandhijee
|
yeah i had found that one, its 2.0.6, the other machines are all on 2.10
|
[18:58] pieterh
|
twomashi1, let me think about it... I just got back and am reading the traffic here
|
[18:59] twomashi1
|
ok cool
|
[18:59] twomashi1
|
thanks
|
[18:59] pieterh
|
gandhijee, right... but it's pretty simple to build from source anyhow
|
[18:59] gandhijee
|
i don't want to have to deal with any issues because 1 is newer than the other
|
[18:59] gandhijee
|
yeah, but i wanted to be lazy and not build the .deb
|
[18:59] gandhijee
|
i have to put it on 3 more machines, 1 x86_64 and 2 atoms
|
[18:59] pieterh
|
gandhijee, download tarball, build, it's really simple
|
[19:00] gandhijee
|
yes i know, like i said, i wanted to be lazy and not build the deb, because i have to get it to 3 more machines,
|
[19:00] pieterh
|
you can make a simple script that does it from wget to 'sudo make install; ldconfig'
|
[19:00] gandhijee
|
building the deb is just a cpl extra steps
|
[19:00] pieterh
|
sure
|
[19:01] pieterh
|
being lazy is good if you use that to create something
|
[19:01] pieterh
|
if it's just to avoid work... well... :-)
|
[19:01] pieterh
|
twomashi1, can you provide more background info?
|
[19:01] pieterh
|
do you want flow control?
|
[19:03] twomashi1
|
pieterh: I want to use a pool of PHP workers to process data, this is existing PHP code adapted to read messages from a ZMQ socket. I don't want them to live indefinitely because PHP is prone to memory leaks, so they must exit after processing n requests. Zeromq will fetch more messages than I need though, so I want some way to stop ZMQ prefetching messages and process the messages already in memory.
|
[19:04] pieterh
|
right...
|
[19:04] pieterh
|
I can think of a few ways
|
[19:04] pieterh
|
here is the most brutal
|
[19:04] pieterh
|
you batch the messages together, using multipart and some delimiters
|
[19:05] pieterh
|
so that you actually send your whole batch, say 100 messages, as one 0MQ message
|
[19:05] pieterh
|
you then use the LRU routing technique from the Guide ch3
|
[19:05] pieterh
|
where the worker signals 'ready' and then gets a job
|
[19:05] pieterh
|
when the worker has processed its job it just terminates
|
[19:05] pieterh
|
it won't signal 'ready' again, so won't get another batch
|
[19:05] pieterh
|
you control the batch size explicitly at the sender side
|
[19:06] twomashi1
|
ah, with LRU the worker must signal to get a job?
|
[19:06] pieterh
|
now the problems with this:
|
[19:06] pieterh
|
yes
|
[19:06] pieterh
|
it's a nice model except it's chatty
|
[19:06] twomashi1
|
chatty is fine, it must be safe.
|
[19:06] pieterh
|
so if you have many small jobs the to and fro will cost too much
|
[19:06] pieterh
|
well, study the lruqueue example
|
[19:06] pieterh
|
you can probably use that as a device in front of your existing client/s
|
[19:06] pieterh
|
with some small mods it'll do what you want
|
[19:07] twomashi1
|
that could work because the job server will be on the same machine.
|
[19:07] pieterh
|
sure
|
[19:07] twomashi1
|
so I dont think chatty is an issue
|
[19:07] pieterh
|
so use the LRU device and then in the workers, die after doing X jobs
|
[19:07] pieterh
|
put $5 in the can on the way out :-)
|
[19:07] twomashi1
|
hehe
|
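[Editor's note] A sketch of the worker side of the scheme pieterh describes (the Guide's lruqueue pattern): signal READY, receive a job, repeat, and simply stop asking after a fixed number of jobs so the queue never hands this worker more work. The endpoint, the READY framing and the job handling are placeholders; assumes the 2.1-era C API:

    #include <zmq.h>
    #include <string.h>

    #define MAX_JOBS 100

    int main (void)
    {
        void *ctx = zmq_init (1);
        void *sock = zmq_socket (ctx, ZMQ_REQ);
        zmq_connect (sock, "ipc:///tmp/jobs");      /* the LRU queue device */

        int done;
        for (done = 0; done < MAX_JOBS; done++) {
            zmq_msg_t msg;

            /* tell the queue we're ready for exactly one more job */
            zmq_msg_init_size (&msg, 5);
            memcpy (zmq_msg_data (&msg), "READY", 5);
            zmq_send (sock, &msg, 0);
            zmq_msg_close (&msg);

            /* receive and process one job (or one batch, if jobs are batched) */
            zmq_msg_init (&msg);
            zmq_recv (sock, &msg, 0);
            /* ... do the work ... */
            zmq_msg_close (&msg);
        }
        zmq_close (sock);               /* no more READY -> no more jobs */
        zmq_term (ctx);
        return 0;
    }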
[19:08] twomashi1
|
Ok, and do you think that in some future there will be a way to support this usage case by instructing the context to stop receiving for a socket, or something like that?
|
[19:08] pieterh
|
well...
|
[19:08] pieterh
|
i don't think it has to be built in
|
[19:08] twomashi1
|
i imagine that if it doesnt go against the goals and policies of the project it could happen, if someone put the time in.
|
[19:08] pieterh
|
it works very nicely at the level it's at now
|
[19:09] pieterh
|
to answer that in technical detail...
|
[19:09] pieterh
|
if you want to regulate how much the sender sends
|
[19:09] pieterh
|
then you must send information back explicitly to synchronize it
|
[19:10] pieterh
|
you could call this an ack windw
|
[19:10] pieterh
|
ack window
|
[19:10] pieterh
|
it only makes sense in an asynchronous request-reply model
|
[19:10] pieterh
|
so yes, it *might* be added to the XREP socket
|
[19:11] twomashi1
|
I see what you mean... the effect can be reproduced using the currently available facilities
|
[19:11] twomashi1
|
thanks for your help!
|
[19:18] cremes
|
OSX sysctl is very odd... see my update on the ML
|
[19:22] mato
|
gandhijee: there is a Debian package of ZeroMQ 2.0.10, I maintain it
|
[19:22] mato
|
gandhijee: it's in Debian unstable
|
[19:22] gandhijee
|
sweet!!
|
[20:25] gandhijee
|
how do i have zeromq listen on a spec interface?
|
[20:25] gandhijee
|
right now i have 2 in the machine that is a client, and it doesn't seem to get messages over the network for some reason
|
[20:45] Guthur
|
gandhijee, Spec Interface?
|
[20:46] Guthur
|
maybe you want to check out IO polling
|
[20:46] gandhijee
|
its ok i figured it out
|
[20:46] gandhijee
|
it was something super silly, its been a long day
|
[20:46] Guthur
|
hehe no bother, we all have those days
|
[21:11] Ben
|
hi all - I have a question about using sockets from different threads in C++. I'm on the latest code from the git repository master branch. The scenario is that I've created a socket in one thread, but I use it from another thread. Only one thread ever uses the socket at a time. I thought this was possible with the latest code, but I am not seeing any messages come into my test receiver on the other end.
|
[21:12] Ben
|
another question - my test receiver is written in Python and is on 2.0.8. Do I need to upgrade this to 2.1 as well in order to get it to work?
|
[21:15] ngerakines
|
from my experience its a bad idea and leads to unexpected results
|
[21:15] Ben
|
which part? mixing zmq versions or migrated between threads?
|
[21:15] ngerakines
|
I think the docs say in bold that sockets should never be shared across threads
|
[21:15] Ben
|
it does for 2.0
|
[21:16] Ben
|
but in 2.1 there is a statement saying that this is now legal
|
[21:16] Ben
|
I know it isn't released yet - I'm doing this out of git
|
[21:16] Ben
|
it may just be that it isn't ready
|
[21:16] ngerakines
|
I don't know then, sorry
|
[21:23] Guthur
|
is there really a need for the wuserver example to be binding to ipc
|
[21:23] Guthur
|
It's not used, and it just makes the example incompatible with windows
|
[21:25] Ben
|
if anyone is curious about my earlier question - moving my test client to 2.1 from 2.0.8 did indeed fix the problem.
|
[22:03] mikko
|
sustrik: how do i recognise you?
|
[22:04] sustrik
|
mikko: i've sent you my number
|
[22:04] sustrik
|
have you got it?
|
[22:04] mikko
|
yes
|
[22:04] sustrik
|
let me find some picture...
|
[22:05] mikko
|
http://farm4.static.flickr.com/3297/3626630182_006c6ba2c0.jpg
|
[22:05] mikko
|
thats me
|
[22:05] mikko
|
i dont have the beard anymore, just the moustache
|
[22:05] sustrik
|
that's me:
|
[22:05] sustrik
|
http://www.facebook.com/#!/photo.php?fbid=1403475059607&set=t.1233485121
|
[22:05] sustrik
|
guy with the tuba
|
[22:05] mikko
|
ok
|
[22:06] mikko
|
and it's not that large pub
|
[22:06] sustrik
|
ok
|
[22:07] sustrik
|
ah, jon dyte mentioned he's going to arrive
|
[22:19] pieterh
|
sustrik: you take a tube to all 0MQ meetups?
|
[22:19] pieterh
|
*tuba
|
[22:19] pieterh
|
sorry, it's been a long day :-)
|
[22:38] lestrrat
|
is fd_t only available in C++ land? (for getsockopt( ... ZMQ_FD ))
|
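[Editor's note] lestrrat's question goes unanswered in the log. In the 2.1 C API the descriptor comes back from zmq_getsockopt () as a plain int on POSIX (a SOCKET on Windows); fd_t itself is only a C++-side typedef. A minimal sketch:

    #include <zmq.h>
    #include <stddef.h>

    int get_notification_fd (void *sock)
    {
        int fd;
        size_t fd_size = sizeof fd;
        zmq_getsockopt (sock, ZMQ_FD, &fd, &fd_size);
        return fd;   /* readiness fd, usable with poll()/select() */
    }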
[22:51] Guthur
|
sustrik, I pushed the clrzmq2 code to the repo
|
[22:53] Guthur
|
I kept the same .NET assembly name, but bumped the version up to 2.0.0.0
|
[22:53] Guthur
|
It's easy for people to set there required version in MSVC, and MonoDevelop
|
[22:53] Guthur
|
their*
|