Monday November 8, 2010

[Time] Name Message
[00:09] Guthur 0MQ does work on x86-64, correct?
[00:09] Guthur polling in particualr
[00:09] Guthur particular*
[00:31] Guthur oh got it sorted
[00:32] Guthur SOCKET changes size depending on x86 x86-64
[00:32] Guthur Is fd always an int on *nix
[00:32] Guthur this is in the zmq_pollitem_t struct
[00:35] Guthur oh it is, how bothersome
[08:08] euk hi all
[08:09] euk can anyone advise on unlimited memory growth with zmq?
[08:10] guido_g see HWM socket option and read the guide
[08:10] euk when just sending and receiving a large set (millions) of small messages
[08:10] euk thank you
[08:20] euk another question - i'm getting tcp throughput higher than ipc (linux, small messages). is that normal? cpu utilization is higher on tcp (so every message takes more cpu), still throughput is somehow higher
[09:49] Guthur sustrik: Do you think it would be worth making a clrzmq2 repo
[11:12] sustrik Guthur: yes, you can't just drop the existing codebase
[11:16] Guthur Yeah that's fair enough
[11:17] mikko mornin'
[11:17] sustrik mikko: morning
[11:17] Guthur Just felt it might be better to make the distinction clear, and especially considering it can not ever go into the master due to the compatibility issue
[11:18] mikko what is the biggest compatibility break?
[11:19] Guthur mikko: Most of it to be honest
[11:19] Guthur I used namespaces for one
[11:20] Guthur Also changed the constants to enums
[11:21] sustrik mikko: what about tomorrow, fancy having a beer in the evening?
[11:21] mikko sustrik: sure
[11:21] mikko it's a storm in london
[11:21] mikko so take an umbrella
[11:22] mato hi guys
[11:22] sustrik i was already told about the storm in london by the girl behind the counter in the local supermarket here :)
[11:22] sustrik mato: hi
[11:22] Guthur Recv also now just returns the message; it's null if there is no message, which removes the need for an out parameter
[11:22] mato sustrik: i've replied to the mailbox problem, can you check if my reasoning is correct?
[11:22] sustrik mikko: 7pm or so?
[11:23] sustrik let me see
[11:23] mikko sustrik: 7pm in Doggett's Coat & Badge ?
[11:23] mikko is that fine or do you want to eat something nicer?
[11:23] sustrik i am ok with that
[11:23] sustrik can you drop a notice to the mailing list in case someone would like to join us?
[11:24] mikko yeah, i'm just thinking if there was a nicer place to eat
[11:24] mikko at 7pm people are going to be hungry
[11:24] sustrik mikko: it's up to you
[11:24] sustrik i am not familiar with london too much
[11:25] mikko what kind of food do you like?
[11:25] sustrik all kinds :)
[11:28] sustrik mato: yes, the reasoning seems correct
[11:28] mato sustrik: ok, i'll whip up a patch and get chuck to test it
[11:28] sustrik however, keep in mind that by writing a command in two chunks
[11:28] sustrik you can recv it in 2 chunks as well
[11:28] sustrik so the recv part has to be changed as well
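
A minimal sketch of the receive-side change sustrik is pointing at, assuming a blocking AF_UNIX stream socket. The helper name `recv_all` is hypothetical and this is not the code in mailbox.cpp; it only illustrates that a command written in two chunks may also arrive in two chunks, so the reader has to loop until it has the whole thing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: read exactly `want` bytes from a blocking stream
   socket, looping because one recv() may return only part of what was sent.
   Returns 0 on success, -1 on error or if the peer closed early. */
static int recv_all (int fd, void *buf, size_t want)
{
    assert (buf != NULL);
    unsigned char *p = (unsigned char *) buf;
    size_t got = 0;
    while (got < want) {
        ssize_t n = recv (fd, p + got, want - got, 0);
        if (n == -1 && errno == EINTR)
            continue;       /* interrupted before any data arrived: retry */
        if (n <= 0)
            return -1;      /* real error, or peer closed the socket */
        got += (size_t) n;
    }
    return 0;
}
```

A caller would pass the address and size of its command structure, for example `recv_all (fd, &cmd, sizeof cmd)`. A non-blocking socket would additionally need to handle EAGAIN, which this sketch leaves out.
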
[11:28] mato hmm
[11:33] mikko sustrik: where are you staying ?
[11:34] sustrik southwark
[11:34] sustrik but choose any place you like
[11:35] mikko
[11:35] mikko it's easy
[11:35] sustrik yep, that's 2 mins from my hotel
[11:38] mato sustrik: the recv() side could in theory use MSG_WAITALL
[11:38] mato sustrik: I can try that and give it to chuck to test
[11:39] mato sustrik: there's also the problem that if recv() gets EINTR, it *may* get some bytes...
[11:39] mikko sustrik: sent
[11:39] mato sustrik: which sucks, I'm not sure how to solve that...
[11:39] sustrik mikko: thx
[11:39] mikko should've written Martin S. as there are many martins
[11:39] sustrik see you tomorrow
[11:39] mikko :)
[11:39] sustrik mato: EINTR?
[11:39] mato sustrik: yes, what?
[11:39] sustrik how would the API report that kind of thing?
[11:40] mato sustrik: it already does report EINTR
[11:40] sustrik nbytes == 3 && errno = EINTR?
[11:40] mato oh, that, right
[11:40] mato sustrik: sorry, you're right, i was reading the recvmsg docs
[11:41] sustrik as for the WAITALL wouldn't it work only for blocking recv?
[11:42] sustrik is there a way to combine WAITALL and NONBLOCK?
[11:43] mato unclear
[11:43] mato hang on, reading various threads
[11:43] mato sustrik: the most reliable way to do it would be to use a datagram socket instead of a stream socket
[11:43] mato sustrik: datagram sockets guarantee that the send/recv is atomic
[11:43] sustrik mato: is it possible with socketpair?
[11:44] mato sustrik: yes, you just ask for AF_UNIX, SOCK_DGRAM
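
A small sketch of the datagram idea mato describes, assuming a POSIX system. `demo_command_t` and `dgram_roundtrip` are made-up names for illustration (not 0MQ's command_t); the point is only that with AF_UNIX and SOCK_DGRAM each recv() hands back one whole message rather than an arbitrary slice of a byte stream.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Illustrative stand-in for a fixed-size command; not 0MQ's command_t. */
typedef struct {
    int type;
    int arg;
} demo_command_t;

/* Send two commands over an AF_UNIX/SOCK_DGRAM socketpair and check that
   each recv() returns exactly one whole command.
   Returns 0 on success, -1 otherwise. */
static int dgram_roundtrip (void)
{
    int sv[2];
    if (socketpair (AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;

    demo_command_t out[2] = { { 1, 10 }, { 2, 20 } };
    demo_command_t in;
    int rc = 0;
    for (int i = 0; i < 2; i++)
        if (send (sv[0], &out[i], sizeof out[i], 0) != (ssize_t) sizeof out[i])
            rc = -1;
    for (int i = 0; rc == 0 && i < 2; i++) {
        /* The buffer fits a full command, so each datagram arrives whole. */
        ssize_t n = recv (sv[1], &in, sizeof in, 0);
        if (n != (ssize_t) sizeof in || in.type != out[i].type
            || in.arg != out[i].arg)
            rc = -1;
    }
    close (sv[0]);
    close (sv[1]);
    return rc;
}
```

How much kernel buffer space each small datagram consumes is system-dependent, which is the concern raised in the lines that follow.
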
[11:45] mato ?
[11:45] mato What do you mean?
[11:45] sustrik MTU per packet?
[11:45] mato ?
[11:45] sustrik if you send command, which was 1 byte
[11:45] sustrik how much of the buffer will be used
[11:46] sustrik 1 byte
[11:46] sustrik ?
[11:46] sustrik MTU bytes?
[11:46] mato No idea
[11:46] mato System-dependent
[11:46] sustrik it can make the problem even worse
[11:46] sustrik in any case, you should be able to read the command using at most 2 recv calls
[11:47] sustrik given that you write it using at most 2 sends
[11:47] mato guess so
[11:47] sustrik assert (PIPE_BUF < sizeof (command_t)) guarantees that
[11:47] sustrik >=
[11:48] mato I kind of doubt PIPE_BUF has much to do with AF_UNIX sockets
[11:49] sustrik what does it apply to then?
[11:49] sustrik mkfifo?
[11:49] mato sustrik: pipes
[11:49] mato sustrik: yeah
[11:49] sustrik btw, why aren't we using pipes?
[11:50] mato UNIX sockets are better these days IMO
[11:50] mato also you don't have to invent funky naming schemes
[11:50] mato etc etc
[11:50] mato since socketpair nicely gives you an anonymous pair
[11:51] sustrik ok
[11:51] mato sustrik: also, pipes have some fixed buffer size
[11:51] mato sustrik: not resizable
[11:51] sustrik i was just thinking about the fact that what we need is unidirectional pipe
[11:51] sustrik the other direction is unused
[11:52] mato too bad :-)
[11:52] mato use it for something :-)
[11:52] mato if it bothers you :-)
[11:52] sustrik maybe we can at least shrink the buffer for that direction?
[11:52] mato maybe
[12:29] Guthur sustrik: Can someone create a zeromq / clrzmq2 repo?
[12:30] sustrik yup, wait a sec
[12:31] Guthur I'm at work at the moment but when I get home I can move the code to it and update the clrzmq page accordingly
[12:32] Guthur I'm off for lunch back in a bit
[12:33] mato sustrik: I occasionally get this from test_shutdown_stress:
[12:33] mato Socket operation on non-socket
[12:33] mato nbytes != -1 (tcp_socket.cpp:197)
[12:33] mato /bin/sh: line 4: 32321 Aborted (core dumped) ${dir}$tst
[12:33] mato FAIL: test_shutdown_stress
[12:34] sustrik mato: yes, the problem was reported already
[12:40] mato sustrik: ok, well, I have a preliminary patch for the mailbox retry stuff, sent to the ML
[12:41] sustrik mato: thanks
[12:46] dv what would be the recommended way to implement a messaging without a response?
[12:47] dv i have two nodes A and B, and they communicate asynchronously,
[12:47] dv there are no "requests" and subsequent "responses", only events
[12:48] dv i could open two req-rep connections between the two, so that both are requesters, and simply not use the response (or send some dummy response)
[12:48] dv but that sounds wasteful
[12:48] dv any suggestions?
[12:48] sustrik Guthur: done
[12:49] sustrik dv_: you have to think about scaling to get it right
[12:49] sustrik are there going to be multiple A's in the future?
[12:50] sustrik or multiple B's?
[12:50] sustrik if so, how are the messages to be dispatched to the multiple instances?
[12:50] sustrik each message to each instance?
[12:50] sustrik if so, use PUB/SUB
[12:50] sustrik if you want to load-balance messages between instances
[12:50] sustrik use PUSH/PULL
[12:51] dv no they would be peers
[12:51] dv there is a pair pattern, but it is marked as experimental
[12:52] dv if multiple A's/B's were the case, i would use pub/sub, yes
[12:52] dv hmm. you know, I could also use it for two nodes only.
[12:52] dv but can multiple publishers exist with an IPC connection?
[12:53] sustrik you can have multiple pubs pushing messages to a single SUB for example
[12:54] dv in fact, can i have multiple publishers with any kind of connection?
[12:54] sustrik yes
[12:54] dv cool
[12:55] dv i have an audio playback process here that is controlled by the frontend, and the playback process can send events, such as "song finished", "song metadata scanned" ..
[12:55] dv i think i'll use pub-sub here
[12:56] dv but does the IPC mechanism scale linearly with the amount of publishers/subscribers? for example, with tcp, since it is strictly point-to-point, i have a situation where every node has connections to all the other nodes, right?
[13:08] sustrik right, same with ipc
[13:10] dv something like multicast for ipc would rock. but i guess this is far from trivial. not just in zeromq, but speaking generally
[13:11] sustrik the whole multicast thing is a complex thing
[13:11] sustrik for ipc to prevent copying the message you would have to have it stored in shmem
[13:11] dv i noticed. i stumbled upon weird bugs that i have yet to replicate with the openpgm tools.
[13:11] sustrik however, allocating shmem is an expensive operation
[13:12] sustrik so it's not that easy
[13:12] dv to think of that in ipc ...
[13:12] dv yes i see
[13:13] dv oh one other thing, when I use pub/sub with multicast (epgm),
[13:13] dv and I set mcast_loop to 0 in the publisher socket,
[13:14] dv the receiver doesnt get anything anymore - but i've only tried it out with receiver and sender running on the same host so far
[13:15] dv if I understand this correctly, mcast_loop 0 means that only messages sent between hosts will pass through? i found that part of the manual a bit confusing
[13:15] dv (thats my last question btw. :) )
[13:15] sustrik dv_: yes, that's how it works
[13:15] sustrik multicast over loopback is a terrible hack
[13:16] sustrik that is better not to use at all
[13:16] dv hmm. i wonder if the bugs i noticed stem from this
[13:16] dv i will use IPC for nodes on the same host then
[13:16] sustrik quite possibly, anyway turning mcast_loop to off means you won't get the messages on the same host
[13:17] dv i dont know if you follow the mailing list much, the bug I mentioned was an assert that happened randomly
[13:17] dv openpgm sometimes thinks there is no multicast capable NIC
[13:17] sustrik i'm following the list, but i have little knowledge of openpgm
[13:17] dv overriding internal openpgm checks made it work all the time
[13:17] dv but it felt ... wrong.
[13:18] sustrik steve mccoy should comment on that
[13:18] dv he did. i just mention it
[13:18] dv hmm i'll tell him about the same host
[13:18] dv or no, hold on. it already happened when I was just creating a publisher socket
[13:19] dv but maybe turning off mcast_loop first does help ... i'll try it out this afternoon. thanks for the suggestions.
[13:21] cremes mato: any chance you can send me that mailbox patch as a gist? i get "fatal: corrupt patch at line 6" if i copy/paste it into a file for application
[13:22] mato i'll email it to you directly
[13:22] mato cremes: what's your email address?
[13:23] mato cremes: I sent it to the email you use on the ML.
[13:26] cremes mato: perfect; applied it but now get another assertion
[13:26] cremes Assertion failed: nbytes == want_nbytes (mailbox.cpp:213)
[13:27] mato cremes: Hmm, well, that's precisely what shouldn't happen
[13:28] mato cremes: Either you send me your code and I'll try and reproduce it here, or insert some printf's in the various code paths in send() yourself and try and figure out the actual sequence of syscalls
[13:28] cremes if you have ruby 1.9.2 on your system, i can send you the code
[13:28] mato it's possible OSX is behaving funny when the socket SNDBUF is resized
[13:29] cremes otherwise, suggest what details you want printed out and i'll modify the 0mq source locally
[13:29] mato cremes: no i don't, sorry... I have some oldish snapshot of 1.9.1
[13:29] cremes ok
[13:29] cremes before modifying the source, i'll try this on my archlinux box
[13:29] mato cremes: that's a good idea
[13:29] cremes let's see if linux blows up too
[13:31] mato lunch. bbl
[13:35] cremes mato: when i run the *unpatched* master on linux against my code example, i get a different assertion
[13:35] cremes Assertion failed: new_sndbuf > old_sndbuf (mailbox.cpp:182)
[13:35] cremes let me try it with the patch...
[13:35] mato cremes: that means you hit the system SNDBUF max
[13:35] mato cremes: set the net.core.wmem_max sysctl to something high
[13:35] mato cremes: and it'll go away
[13:36] cremes ko
[14:39] cremes ok, so after boosting net.core.wmem_max i am now getting a socket error "too many open files" when i allocate 508 REQ sockets
[14:40] cremes according to ulimit -n, i set the file descriptor max to 250_000 and i still get that
[14:40] cremes this is on archlinux, kernel 2.6.35
[14:41] cremes same behavior with *and* without mato's patch
[14:55] pieterh cremes, cat /proc/sys/fs/file-max
[14:55] pieterh what does it give you?
[14:56] mato cremes: for ulimit -n you'll have to run that as root, then from that *same* root shell run your code
[14:56] mato cremes: since it's inherited by child processes, it is not a system setting
[14:56] mato my patch doesn't change anything wrt number of open files
[14:56] cremes pieterh: results in 1201177
[14:56] pieterh This is from the Confluence wiki
[14:56] pieterh Run the command sysctl -a. If this is less than 200000, increase the number of file handles by editing /etc/sysctl.conf and changing the property fs.file-max to 200000. If there isn't a value set already for this property, you need to add the line fs.file-max=200000.
[14:56] pieterh Then run sysctl -p to apply your changes to your system.
[14:56] cremes mato: i logged in a fresh shell after making the change
[14:56] mato cremes: yes, but that won't work
[14:57] pieterh cremes: seems high enough, how about 'sysctl -a'?
[14:57] mato cremes: verify with ulimit -a in the shell that you're running your code in that the setting has actually taken effect
[14:57] mato cremes: anyway, this is beside the point of my patch, so you're running into some macosx weirdness there
[14:57] cremes mato: it tells me 250000 for open files
[14:58] cremes this is on linux
[14:58] pieterh archlinux
[14:58] mato cremes: and you're still running out of open files?
[14:58] cremes yep
[14:58] mato cremes: then your code is the problem, sorry :-)
[14:58] cremes ha
[14:58] pieterh cremes: do you have a simple test case I can try?
[14:58] mato as in, the number of open files is not infinite :-)
[14:59] cremes pieterh: results of sysctl -a ....
[14:59] mato cremes: anyhow, the interesting case is the failure you're getting on macosx, let me know when you have time to put some printfs in the mailbox code
[14:59] cremes mato: i have time now; give me suggestions on what is important and i'll print it out
[14:59] pieterh cremes, fs.file-max = 1201177
[14:59] pieterh fs.nr_open = 1048576
[14:59] cremes pieterh: if you have ruby 1.9.2 then i can give you a relatively simple test case
[15:00] pieterh derp
[15:00] mato cremes: print nbytes after the 1st do loop in send() (after line 163)
[15:00] cremes mato: prepatched or postpatch?
[15:00] mato cremes: patched
[15:00] mato cremes: then, print old_sndbuf, new_sndbuf after the assert on line 184
[15:01] mato cremes: and print retry_nbytes after the 2nd do loop in send(), i.e. after line 202
[15:01] mato cremes: that should give us an idea of what is going on...
[15:02] mato cremes: oh, and each thing that you print...
[15:02] mato cremes: print also the this pointer as %p, (void *)this
[15:02] mato cremes: so that it's obvious which printf belongs to which instance
[15:07] pieterh cremes, how about trying a simple program to test your Linux file handle limit...
[15:07] pieterh
[15:09] pieterh i can create over 8M sockets on my linux box
[15:10] cremes pieterh: where is zfl.h at?
[15:11] mikko cremes: github/zeromq/zfl
[15:13] pieterh cremes: sorry, just use zmq.h it'll work too
[15:14] pieterh cremes, ok, fixed it
[15:14] cremes pieterh: give me the line to compile it please... i tried "gcc test.c" but it's bitching about implicit declaration of 'printf'
[15:15] pieterh ack, take my fixed version
[15:15] pieterh gcc -o testit testit.c
[15:15] pieterh don't call c programs 'test' cause 'test' is a built-in shell command
[15:15] pieterh gcc -o testit -lzmq testit.c
[15:15] mato pieterh: your program will fall over in zmq sooner than it can print the number of sockets it managed to create
[15:16] pieterh mato: doesn't seem to
[15:16] mato pieterh: because of the mailbox failing to get a socket
[15:16] mato pieterh: well, in master anyway
[15:16] pieterh perhaps in master, yes... i was just trying in 2.0.10
[15:16] mato 2.0.10 is much different
[15:16] mato in fact, in 2.1 even if it did not fall over, it would produce a different number
[15:17] pieterh hey: :-)
[15:17] pieterh "too many open files"
[15:17] pieterh not even a clean return, just internal failure
[15:17] pieterh neato
[15:17] mato 2.0.10 has completely different signalling mechanisms
[15:17] cremes pieterh: aha!
[15:17] pieterh Too many open files
[15:17] pieterh rc == 0 (mailbox.cpp:374)
[15:17] pieterh Aborted (core dumped)
[15:17] cremes maybe i'm not so crazy after all...? :)
[15:18] pieterh cremes: i can't make a test program for that :-)
[15:18] cremes too bad...
[15:18] pieterh let me see how quickly it dies...
[15:18] mato with the default limit of 1024 it'll die somewhere around 512
[15:18] mato actually, probably not
[15:19] pieterh cremes!
[15:19] mato it'll die at 1024/3 roughly i'd guess
[15:19] pieterh hey!
[15:19] pieterh it dies at 508
[15:19] mato yup
[15:19] pieterh wow... from >8M to 508... that is quite a step backwards
[15:19] pieterh ok, mato, where's that default limit of 1K?
[15:19] cremes pieterh: 508 is what i see too
[15:20] mato hang on guys, there are several things going on here
[15:20] pieterh net.core.wmem_max.. right?
[15:20] mato no
[15:20] mato that's part of it
[15:20] mato allow me to explain
[15:20] pieterh shoot
[15:21] mato in 2.0.x, there is one signalling socketpair (what is called mailbox in 2.1) per *application thread*
[15:21] mato hence, creating a 0mq socket does not use up an actual file descriptor
[15:21] pieterh ah, neat
[15:21] mato now, in 2.1 the situation is entirely different
[15:21] mato each 0mq socket created == 1 mailbox
[15:22] mato 2 file descriptors
[15:22] mato hence, if the default limit is 1024, you'll die at <512
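
The arithmetic above (one mailbox per 0MQ socket, two file descriptors per mailbox) can be probed without 0MQ at all. The sketch below is an illustration only, with a hypothetical helper name; it creates plain socketpairs until the process runs out of descriptors and reports how many it managed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical probe: create socketpairs until the process runs out of
   file descriptors or `max_pairs` is reached, then close them all.
   Returns the number of pairs created, or -1 on an unexpected error. */
static int count_socketpairs (int max_pairs)
{
    assert (max_pairs > 0);
    int (*pairs)[2] = malloc ((size_t) max_pairs * sizeof *pairs);
    if (pairs == NULL)
        return -1;

    int count = 0;
    int failed = 0;
    while (count < max_pairs) {
        if (socketpair (AF_UNIX, SOCK_STREAM, 0, pairs[count]) != 0) {
            /* EMFILE: per-process limit; ENFILE: system-wide limit. */
            failed = (errno != EMFILE && errno != ENFILE);
            break;
        }
        count++;
    }
    for (int i = 0; i < count; i++) {
        close (pairs[i][0]);
        close (pairs[i][1]);
    }
    free (pairs);
    return failed ? -1 : count;
}
```

With a soft limit of 1024 and only stdin, stdout and stderr already open, one would expect a result of roughly 510 for a large `max_pairs`; the exact number depends on what else the process has open.
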
[15:22] pieterh sure...
[15:22] pieterh now we have a trivial reproducible case on a default linux box
[15:22] pieterh how do I raise that limit?
[15:22] cremes guys, what is the printf format code for printing ssize_t?
[15:22] pieterh %d
[15:23] mato cremes: use %d and cast it to int
[15:23] cremes archlinux bitches at me about that
[15:23] pieterh %u maybe
[15:23] cremes ah, ok, it will fit
[15:23] mato cremes: there is no printf format for ssize_t
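
One warning-free way to print an ssize_t, sketched under the assumption of a C99 compiler: cast to intmax_t and use %jd. The helper name `format_nbytes` is made up for this example. On many POSIX systems %zd is also used for ssize_t, since it is defined for the signed counterpart of size_t, but the cast does not depend on that.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: format an ssize_t without a compiler warning by
   casting to intmax_t and using C99's %jd.
   Returns what snprintf returns. */
static int format_nbytes (char *buf, size_t size, ssize_t nbytes)
{
    assert (buf != NULL && size > 0);
    return snprintf (buf, size, "nbytes=%jd", (intmax_t) nbytes);
}
```

The same cast works directly in a debugging line such as `fprintf (stderr, "%p nbytes=%jd\n", (void *) ptr, (intmax_t) nbytes)`, where `ptr` stands for whatever pointer identifies the instance.
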
[15:23] mato now, as to raising the limit
[15:23] mato there are two limits
[15:23] mato 1. the system maximum, which is the sysctl pieter already mentioned
[15:23] mato normally you don't need to touch that
[15:24] mato 2. your distribution will have some $mechanism for setting a default RLIMIT_NOFILE on login, usually to 1024
[15:24] mato if you want to raise that, either figure out where the setting for $mechanism is
[15:24] mato or
[15:24] mato 1. (as user) $ su root
[15:24] mato 2. (as root) # ulimit -n 8192
[15:25] mato 3. (still as that root) # su user
[15:25] mato 4. run your app
[15:25] mato you need to follow those 4 exact steps otherwise the raised rlimit setting will not propagate to your app
[15:25] mato if you open a new shell, it won't work
[15:25] mato is this clear?
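
A complementary sketch, not the su/ulimit procedure mato lists: a process can raise its own soft RLIMIT_NOFILE up to the hard limit with setrlimit(), without root. `raise_nofile_limit` is a hypothetical helper; going beyond the hard limit still needs the root shell route described above, and some systems apply extra caps that this sketch does not handle.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: raise this process's soft RLIMIT_NOFILE towards
   `wanted`, capped at the hard limit (only root may raise the hard limit).
   Returns the resulting soft limit, or 0 on error. */
static rlim_t raise_nofile_limit (rlim_t wanted)
{
    assert (wanted > 0);
    struct rlimit rl;
    if (getrlimit (RLIMIT_NOFILE, &rl) != 0)
        return 0;
    if (wanted > rl.rlim_max)
        wanted = rl.rlim_max;       /* cannot exceed the hard limit */
    if (wanted > rl.rlim_cur) {
        rl.rlim_cur = wanted;
        if (setrlimit (RLIMIT_NOFILE, &rl) != 0)
            return 0;
    }
    return rl.rlim_cur;
}
```

Calling `raise_nofile_limit (8192)` early in main() would mirror the `ulimit -n 8192` step, as far as the hard limit allows. Like the shell setting, the new limit is inherited by child processes.
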
[15:27] pieterh mato: having raised the ulimit to 8192 (or any large number), the test program now exits cleanly
[15:27] pieterh and reports: "This system allows up to 511 sockets"
[15:27] mato pieterh: yes, now there is a third limit also
[15:27] cremes mato: results from running my test program on OSX and archlinux with the printf statements
[15:27] cremes
[15:27] mato pieterh: compile time setting in 0MQ src/config.hpp
[15:28] mato pieterh: which happens to be set to 512
[15:28] pieterh was this limit introduced in 2.1.0?
[15:29] pieterh i assume it must have been, since previously I could open 8M sockets
[15:29] mato pieterh: presumably
[15:29] pieterh well, ok I can confirm what you've explained
[15:30] pieterh though the max-sockets option seems 1 off
[15:30] mato pieterh: dunno, that is sustrik's work
[15:30] pieterh - setting it to 8000 allows me to create 7999 sockets
[15:30] mato it's allocating resources of some kind up front, otherwise it wouldn't be a compile-time option
[15:31] mato cremes: ok, that output is strange
[15:31] mato cremes: note that the old_sndbuf/new_sndbuf values for the failing mailbox look completely bogus
[15:31] mato 32? 16? wtf...
[15:31] cremes mato: are you looking at the osx or arch output?
[15:32] mato oh, right, sorry
[15:32] mato bah, let me save those gists to a file
[15:33] cremes i think on linux i am hitting the 512 socket limit
[15:33] cremes on osx, it fails long before
[15:35] mato cremes: hmm, the output is a bit mixed up due to stdout/stderr mixing
[15:35] cremes mato: yes, want me to separate them?
[15:35] mato cremes: are you printing the retry_nbytes case?
[15:35] mato cremes: i can't find it anywhere...
[15:35] mato cremes: just use fprintf (stderr, ...) for everything, it'll work fine
[15:35] cremes mato: no, you didn't mention that as being interesting
[15:35] mato cremes: oh, sorry, i did, you must have missed it
[15:36] mato 16:01 < mato> cremes: and print retry_nbytes after the 2nd do loop in send(), i.e. after line 202
[15:36] mato cremes: can you try that please? and set it all to use stderr?
[15:37] cremes ok, i'll add that one... sorry for the mixup
[15:51] cremes mato: osx results...
[15:53] pieterh cremes, I've reported the issue
[15:54] cremes pieterh: thank you
[15:58] pieterh I'm pretty sure the limit of ~512 sockets will trip up a few existing apps
[15:58] cremes pieterh: i changed that limit to 10k on my linux box, reran the test and it passed
[15:58] pieterh yes, indeed
[15:58] cremes i did the same change on osx and it still fails in the same spot, same error :(
[15:59] cremes must be some funky osx-ism
[15:59] pieterh different techniques for setting per-user limits
[15:59] pieterh its BSD-4.3 heritage or something
[15:59] cremes pieterh: i bumped file descriptors per process to 25k a long time ago for mongodb on this box
[16:00] mato pieterh: max_sockets directly affects the number of mailbox slots
[16:00] mato pieterh: so the limit is set intentionally low
[16:00] zmq_help hi folks - i'm trying to get zeromq ruby bindings working on Ubuntu 10.10
[16:00] pieterh mato: you mean as in reducing pre-allocated memory size?
[16:00] mato pieterh: yes
[16:00] pieterh is there another performance reason?
[16:00] cremes zmq_help: which bindings? the zmq gem or ffi-rzmq?
[16:01] pieterh data copying etc?
[16:01] mato pieterh: possibly, read the code :-)
[16:01] zmq_help I installed zmq
[16:01] zmq_help I installed zmq ruby gem
[16:01] mato pieterh: but in any case, the standard untuned Linux limit is 1024 fds
[16:01] cremes zmq_help: ok; so what seems to be the problem?
[16:01] mato pieterh: so there's no point in changing the default of max_sockets
[16:01] mato pieterh: since it'll fall over anyway
[16:01] pieterh mato: IMO there is
[16:01] pieterh it'll fall over with a normal system error
[16:02] pieterh that can be fixed without recompilation
[16:02] pieterh modifying code to change a limit is not acceptable for many processes
[16:02] pieterh it means requalification
[16:02] pieterh we've discussed this previously
[16:02] pieterh imagine 'large bank'
[16:03] mato pieterh: large bank should test damn well for such limits...
[16:03] zmq_help cremes: when I enter 'require "zmq"' I get this error message
[16:03] zmq_help LoadError: cannot open shared object file: No such file or directory - /usr/lib/ruby/gems/1.8/gems/zmq-2.0.9/lib/
[16:03] pieterh mato: that's not the point
[16:03] pieterh point is tuning the sw at compile time is sub-optimal
[16:03] zmq_help cremes: but I can see that the file is there, and readable
[16:03] pieterh and it can be done better unless there is an actual cost in having a larger limit
[16:03] pieterh anyhow, issue is noted
[16:04] mato pieterh: well, don't forget there are many audiences, embedded, etc etc..
[16:04] cremes zmq_help: did you install the 0mq library? the gem does not include it
[16:04] cremes also, make sure you run ldconfig after installing the 0mq lib
[16:04] mato pieterh: and the limit only affects # of sockets, not # connections obviously
[16:04] cremes so that linux picks it up
[16:04] mato anyway, noted.
[16:04] pieterh mato: I'd suggest embedded systems are less 'typical'
[16:04] zmq_help cremes - yes I installed everything - i'll try the ldconfig trick now...
[16:04] pieterh plus 0MQ encourages use of many sockets
[16:04] mato it does? :-)
[16:05] pieterh yes
[16:05] pieterh typically one for each flow
[16:05] cremes pieterh: agreed... i sometimes need thousands for just that reason
[16:05] pieterh and if you want any custom routing that means several sockets per peer
[16:05] zmq_help cremes: THANKS! the ldconfig trick did the job
[16:05] pieterh as you well know, mato :-)
[16:05] cremes zmq_help: remember to help someone else in channel when you get a chance ;)
[16:06] zmq_help :-) will do thanks
[16:06] zmq_help cremes: where should I give documentation feedback re: the ldconfig trick ??
[16:06] cremes pieterh: does imatix have an osx box for testing?
[16:06] cremes zmq_help: that trick is listed in the FAQ i think
[16:06] pieterh cremes: yes, sure
[16:06] pieterh try this: sudo echo "limit maxfiles 1000000 1000000" > /etc/launchd.conf
[16:06] cremes otherwise, feel free to join the wiki and modify it on your own
[16:07] zmq_help cremes: ok thanks -
[16:07] pieterh and then launchctl limit maxfiles 1000000 1000000
[16:07] cremes pieterh: i did that when i bumped my limits to 25k... here are the current contents: limit maxfiles 25000 100000
[16:07] pieterh cremes, I'm just googling random pages for ""too many open files" OSX"
[16:07] pieterh hmm
[16:07] cremes ulimit -a agrees that it is 25k
[16:08] pieterh and you tuned 0MQ to 10k sockets?
[16:08] cremes yes
[16:08] pieterh and you still get an abort at 508 sockets?
[16:09] pieterh cremes, see
[16:09] mato cremes: hmph, those sndbuf numbers from osx make no sense
[16:09] mato cremes: if it really was 16/32 bytes, it'd fall over immediately
[16:10] mato cremes: i'd need access to an OSX box to figure out what is going on
[16:10] cremes mato: i can give you an account on mine if you'd like
[16:10] pieterh mato: I can bring you my MacBook when I manage to make it to BA
[16:11] mato cremes: that would be great, but honestly i don't have time this week to look at it seriously
[16:11] cremes mato: ok
[16:12] mato cremes: upcoming deadlines, too much to do, sorry...
[16:12] cremes i understand... no worries
[16:15] twomashi1 How can I process all outstanding messages in a socket and stop receiving new ones?
[16:19] guido_g you can't
[16:20] guido_g you can receive all messages in a loop, but you can't tell the ømq socket to stop accepting more/new messages
[16:23] Steve-o twomashi1: what is the logical requirement from this scenario? clean FT fail over?
[16:29] twomashi1 Steve-o: Removing a worker from a worker pool
[16:30] Guthur Does 0MQ aim to provide more devices in the future?
[16:34] mikko Guthur: i would imagine that if certain devices are used in large amount of projects they could be incorporated into the core
[16:34] mikko but not sure whether adding new devices is the biggest priority
[16:35] mikko Guthur: are there specific devices that you are after?
[16:36] Guthur mikko: Not really just interested is all, if I could find examples of what would be beneficial I would maybe even have a stab at implementing
[16:41] twomashi1 guido_g: I wanted to have a worker which would process n messages and then die
[16:42] twomashi1 but it sounds like there's no way to stop the worker receiving more messages
[16:42] guido_g as i already said
[16:42] twomashi1 so this pattern wont work
[16:42] twomashi1 thanks
[16:47] Steve-o twomashi1: I think there are few implementations of worker pools already, e.g.
[16:51] guido_g doesn't handle the dynamic shutdown of one worker though
[17:04] ngerakines sup
[17:17] cremes found the trick for osx... it has some sysctl settings that it shares with freebsd
[17:18] cremes for localhost/loopback connections, there are separate send/recv buffers allocated
[17:18] cremes
[17:18] cremes
[17:18] cremes or whatever value you want
[17:18] cremes i'll add this to the FAQ
[17:18] mato cremes: so upping those values makes the code work for you?
[17:19] mato cremes: that would imply that at least for some sockets, the default on OSX is ridiculously low
[17:19] cremes yes, because it avoids the recovery code that tries to adjust SO_SNDBUF
[17:19] mato the thing is, the recovery code works
[17:19] mato as long as the OS comes back with a sane sndbuf in the getsockopt
[17:19] cremes i believe you, but SO_SNDBUF isn't returning the right vals
[17:19] cremes and i don't know why
[17:20] cremes maybe it is returning kbytes instead of bytes
[17:20] mato that is indeed what it looks like
[17:20] mato cremes: could you please somehow summarize what we've found and reply to the thread on the ML so that we don't lose it?
[17:20] cremes absolutely
[17:21] mato cremes: it's plausible that even just setting a large-ish value at mailbox socketpair creation time would make the problem go away
[17:21] mato needs more investigation...
[17:21] cremes right... i'll respond to the ML with a few details; i'll also update pieter's issue
[17:21] mato thx
[17:49] cremes hmmm, what would you guess is happening if i get "Too many open files: rc == 0 (mailbox.cpp:431)" ?
[17:49] cremes this is with the patched mailbox.cpp
[17:51] mato precisely what it says :)
[17:52] cremes heh
[17:53] cremes i must not have raised my limits high enough and now i'm running out of another resource
[18:10] ngerakines hey folks, I've got a c++ app that uses zmq with threads and I'm not sure about the proper way of doing something in particular:
[18:10] ngerakines for these threads, they bind to a given socket and wait for incoming messages, however when I want to shut down these threads I'm not finding in the docs the best way to get my socket_in.recv/1 to stop blocking on input
[18:11] ngerakines how should I go about this?
[18:12] twomashi1 ngerakines: i have the same issue
[18:13] twomashi1 i'm told there's no way to get zmq to stop receiving messages so you can process outstanding ones
[18:13] nettok ngerakines: I would like to know about that too
[18:15] nettok ngerakines: maybe registering a signal handler and then trigger the signal from another thread or process, something like that?
[18:16] ngerakines hmm
[18:16] nettok Or making read non-blocking
[18:16] twomashi1 oh wait... sorry different issue
[18:16] twomashi1 use poll?
[18:17] ngerakines yeah, looking into poll
[18:18] guido_g ngerakines: simply send a quit message (can be of length 0)
[18:19] ngerakines ok, was looking for a way to effectively halt all zmq communication once stop was initiated, but thanks
[18:20] guido_g then just terminate the context
[18:20] guido_g that will close all ømq sockets and stop all ømq related activities
[18:22] nettok guido_g: thanks!
[18:23] ngerakines guido_g: and in cases where i've got a pub/sub client, use poll?
[18:24] guido_g ngerakines: can't follow you, sorry
[18:24] ngerakines I've got a pub/sub subscriber with a run-loop that uses socket_t.recv(..)
[18:24] guido_g so?
[18:24] ngerakines instead of recv, should I be using poll to close out cleanly?
[18:24] guido_g why?
[18:25] ngerakines Is there a way to send it a kill message like above given that it isn't binding?
[18:25] guido_g what isn't binding? w/o a bound socket you can't receive any messages
[18:26] guido_g so if you have a sub socket, it'll just receive the message
[18:27] ngerakines because it is creating a socket with ZMQ_SUB ?
[18:27] ngerakines with connect ?
[18:27] ngerakines socket_t.connect(...) as opposed to socket_t.bind(...)
[18:28] guido_g there is no difference between the connect side and the bind side in this case
[18:28] guido_g this is for sure mentioned in the guide somewhere
[18:56] gandhijee hey guys, is there an ubuntu/debian package of zeromq anywhere?
[18:56] pieterh gandhijee, I believe there is, somewhere
[18:57] gandhijee happen to have an idea where somewhere might be?
[18:57] pieterh Do a google for "debian package zeromq"
[18:57] pieterh
[18:58] twomashi1 pieterh: do you know how a process using zeromq could stop receiving messages and process its outstanding messages? (say to remove itself from a worker pool)
[18:58] gandhijee yeah i had found that one, its 2.0.6, the other machines are all on 2.10
[18:58] pieterh twomashi1, let me think about it... I just got back and am reading the traffic here
[18:59] twomashi1 ok cool
[18:59] twomashi1 thanks
[18:59] pieterh gandhijee, right... but it's pretty simple to build from source anyhow
[18:59] gandhijee i don't want to have to deal with any issues because 1 is newer than the other
[18:59] gandhijee yeah, but i wanted to be lazy and not build the .deb
[18:59] gandhijee i have to put it on 3 more machines, 1 x86_64 and 2 atoms
[18:59] pieterh gandhijee, download tarball, build, it's really simple
[19:00] gandhijee yes i know, like i said, i wanted to be lazy and not build the deb, because i have to get it to 3 more machines,
[19:00] pieterh you can make a simple script that does it from wget to 'sudo make install; ldconfig'
[19:00] gandhijee building the deb is just a cpl extra steps
[19:00] pieterh sure
[19:01] pieterh being lazy is good if you use that to create something
[19:01] pieterh if it's just to avoid work... well... :-)
[19:01] pieterh twomashi1, can you provide more background info?
[19:01] pieterh do you want flow control?
[19:03] twomashi1 pieterh: I want to use a pool of PHP workers to process data, this is existing PHP code adapted to read messages from a ZMQ socket. I don't want them to live indefinitely because PHP is prone to memory leaks, so they must exit after processing n requests. Zeromq will fetch more messages than I need though, so I want some way to stop ZMQ prefetching messages and process the messages already in memory.
[19:04] pieterh right...
[19:04] pieterh I can think of a few ways
[19:04] pieterh here is the most brutal
[19:04] pieterh you batch the messages together, using multipart and some delimiters
[19:05] pieterh so that you actually send your whole batch, say 100 messages, as one 0MQ message
[19:05] pieterh you then use the LRU routing technique from the Guide ch3
[19:05] pieterh where the worker signals 'ready' and then gets a job
[19:05] pieterh when the worker has processed its job it just terminates
[19:05] pieterh it won't signal 'ready' again, so won't get another batch
[19:05] pieterh you control the batch size explicitly at the sender side
[19:06] twomashi1 ah, with LRU the worker must signal to get a job?
[19:06] pieterh now the problems with this:
[19:06] pieterh yes
[19:06] pieterh it's a nice model except it's chatty
[19:06] twomashi1 chatty is fine, it must be safe.
[19:06] pieterh so if you have many small jobs the to and fro will cost too much
[19:06] pieterh well, study the lruqueue example
[19:06] pieterh you can probably use that as a device in front of your existing client/s
[19:06] pieterh with some small mods it'll do what you want
[19:07] twomashi1 that could work because the job server will be on the same machine.
[19:07] pieterh sure
[19:07] twomashi1 so I dont think chatty is an issue
[19:07] pieterh so use the LRU device and then in the workers, die after doing X jobs
[19:07] pieterh put $5 in the can on the way out :-)
[19:07] twomashi1 hehe
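
A rough, single-process model of the scheme pieterh outlines above: a job goes only to a worker that has signalled 'ready', and a worker that has reached its job limit stops signalling and so receives nothing more. This is not the lruqueue example from the Guide and uses no 0MQ calls; `dispatch`, its parameters, and the worker names in the test data are hypothetical, and the ready list is a plain `deque`.

```python
from collections import deque


def dispatch(jobs, workers, max_jobs):
    """Hand out jobs only to workers that have signalled 'ready'.

    jobs     -- iterable of job payloads (hypothetical)
    workers  -- list of worker names (hypothetical)
    max_jobs -- number of jobs a worker handles before it exits
    """
    pending = deque(jobs)
    ready = deque(workers)      # every worker starts by signalling ready
    done = {w: [] for w in workers}
    while pending and ready:
        w = ready.popleft()     # least recently used worker goes first
        done[w].append(pending.popleft())
        if len(done[w]) < max_jobs:
            ready.append(w)     # worker signals ready again
        # otherwise the worker exits and never asks for more work
    return done, list(pending)
```

Jobs left over when every worker has exited stay with the dispatcher, so nothing is stranded in a dead worker's queue. In a real pool a fresh worker would presumably be started and signal ready to pick them up.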
[19:08] twomashi1 Ok, and do you think that in some future there will be a way to support this use case by instructing the context to stop receiving for a socket, or something like that?
[19:08] pieterh well...
[19:08] pieterh i don't think it has to be built in
[19:08] twomashi1 i imagine that if it doesnt go against the goals and policies of the project it could happen, if someone put the time in.
[19:08] pieterh it works very nicely at the level it's at now
[19:09] pieterh to answer that in technical detail...
[19:09] pieterh if you want to regulate how much the sender sends
[19:09] pieterh then you must send information back explicitly to synchronize it
[19:10] pieterh you could call this an ack windw
[19:10] pieterh ack window
[19:10] pieterh it only makes sense in an asynchronous request-reply model
[19:10] pieterh so yes, it *might* be added to the XREP socket
[19:11] twomashi1 I see what you mean... the effect can be reproduced using the currently available facilities
[19:11] twomashi1 thanks for your help!
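
One way to picture the ack window pieterh mentions: the sender keeps a count of unacknowledged messages and holds further messages back until the receiver acknowledges earlier ones. The sketch below is only an illustration of that bookkeeping, not anything built into 0MQ; `WindowedSender` and its methods are hypothetical names, and the `sent` list stands in for a real socket send.

```python
from collections import deque


class WindowedSender:
    """Allow at most `window` unacknowledged messages in flight."""

    def __init__(self, window):
        self.window = window
        self.in_flight = 0
        self.backlog = deque()
        self.sent = []          # stands in for the real socket send

    def submit(self, msg):
        """Queue a message; it goes out as soon as the window allows."""
        self.backlog.append(msg)
        self._pump()

    def ack(self):
        """The receiver reports that it finished one message."""
        if self.in_flight == 0:
            raise RuntimeError("ack without a message in flight")
        self.in_flight -= 1
        self._pump()

    def _pump(self):
        # Send backlog messages while there is room in the window.
        while self.backlog and self.in_flight < self.window:
            self.sent.append(self.backlog.popleft())
            self.in_flight += 1
```

With a window of 1 this reduces to the ready/job exchange discussed earlier; a larger window trades some of that strictness for less back-and-forth.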
[19:18] cremes OSX sysctl is very odd... see my update on the ML
[19:22] mato gandhijee: there is a Debian package of ZeroMQ 2.0.10, I maintain it
[19:22] mato gandhijee: it's in Debian unstable
[19:22] gandhijee sweet!!
[20:25] gandhijee how do i have zeromq listen on a spec interface?
[20:25] gandhijee right now i have 2 in the machine that is a client, and it doesn't seem to get messages over the network for some reason
[20:45] Guthur gandhijee, Spec Interface?
[20:46] Guthur maybe you want to check out IO polling
[20:46] gandhijee its ok i figured it out
[20:46] gandhijee it was something super silly, its been a long day
[20:46] Guthur hehe no bother, we all have those days
[21:11] Ben hi all - I have a question about using sockets from different threads in C++. I'm on the latest code from the git repository master branch. The scenario is that I've created a socket in one thread, but I use it from another thread. Only one thread ever uses the socket at a time. I thought this was possible with the latest code, but I am not seeing any messages come into my test receiver on the other end.
[21:12] Ben another question - my test receiver is written in Python and is on 2.0.8. Do I need to upgrade this to 2.1 as well in order to get it to work?
[21:15] ngerakines from my experience it's a bad idea and leads to unexpected results
[21:15] Ben which part? mixing zmq versions or migrating sockets between threads?
[21:15] ngerakines I think the docs say in bold that sockets should never be shared across threads
[21:15] Ben it does for 2.0
[21:16] Ben but in 2.1 there is a statement saying that this is now legal
[21:16] Ben I know it isn't released yet - I'm doing this out of git
[21:16] Ben it may just be that it isn't ready
[21:16] ngerakines I don't know then, sorry
[21:23] Guthur is there really a need for the wuserver example to be binding to ipc
[21:23] Guthur It's not used, and it just makes the example incompatible with windows
[21:25] Ben if anyone is curious about my earlier question - moving my test client to 2.1 from 2.0.8 did indeed fix the problem.
[22:03] mikko sustrik: how do i recognise you?
[22:04] sustrik mikko: i've sent you my number
[22:04] sustrik have you got it?
[22:04] mikko yes
[22:04] sustrik let me find some picture...
[22:05] mikko
[22:05] mikko thats me
[22:05] mikko i dont have the beard anymore, just the moustache
[22:05] sustrik that's me:
[22:05] sustrik !/photo.php?fbid=1403475059607&set=t.1233485121
[22:05] sustrik guy with the tuba
[22:05] mikko ok
[22:06] mikko and it's not that large pub
[22:06] sustrik ok
[22:07] sustrik ah, jon dyte mentioned he's going to arrive
[22:19] pieterh sustrik: you take a tube to all 0MQ meetups?
[22:19] pieterh *tuba
[22:19] pieterh sorry, it's been a long day :-)
[22:38] lestrrat is fd_t only available in C++ land? (for getsockopt( ... ZMQ_FD ))
[22:51] Guthur sustrik, I pushed the clrzmq2 code to the repo
[22:53] Guthur I kept the same .NET assembly name, but bumped the version up to
[22:53] Guthur It's easy for people to set there required version in MSVC, and MonoDevelop
[22:53] Guthur their*