[Time] Name | Message |
[00:09] Guthur
|
0MQ does work on x86-64, correct?
|
[00:09] Guthur
|
polling in particualr
|
[00:09] Guthur
|
particular*
|
[00:31] Guthur
|
oh got it sorted
|
[00:32] Guthur
|
SOCKET changes size depending on x86 / x86-64
|
[00:32] Guthur
|
Is fd always an int on *nix?
|
[00:32] Guthur
|
this is in the zmq_pollitem_t struct
|
[00:35] Guthur
|
oh it is, how bothersome
|
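[Editor's note] The struct Guthur hit here is zmq_pollitem_t, whose fd member is a plain int on POSIX but a SOCKET on Windows, and SOCKET is pointer-sized, hence wider on x86-64 than on x86. A minimal sketch of polling a 0MQ socket and a native descriptor together, assuming a 2.x-era libzmq (the function and variable names are illustrative):

    #include <zmq.h>
    #include <assert.h>

    void poll_both (void *zmq_sock, int raw_fd)
    {
        zmq_pollitem_t items [2];

        items [0].socket = zmq_sock;    /* poll a 0MQ socket... */
        items [0].fd = 0;
        items [0].events = ZMQ_POLLIN;
        items [0].revents = 0;

        items [1].socket = NULL;        /* ...and a native file descriptor */
        items [1].fd = raw_fd;
        items [1].events = ZMQ_POLLIN;
        items [1].revents = 0;

        int rc = zmq_poll (items, 2, -1);   /* 2.x: timeout in usec, -1 = block forever */
        assert (rc >= 0);
    }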
[08:08] euk
|
hi all
|
[08:09] euk
|
can anyone advise on unlimited memory growth with zmq?
|
[08:10] guido_g
|
see HWM socket option and read the guide
|
[08:10] euk
|
when just sending and receiving a large set (millions) of small messages
|
[08:10] euk
|
thank you
|
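[Editor's note] A minimal sketch of guido_g's suggestion: cap per-socket queueing with ZMQ_HWM so memory stays bounded when millions of small messages are in flight. Assumes the 2.1-era C API; the socket type, endpoint and value are illustrative. ZMQ_HWM is a uint64_t and 0 (the default) means no limit:

    #include <zmq.h>
    #include <stdint.h>
    #include <assert.h>

    int main (void)
    {
        void *ctx = zmq_init (1);
        void *push = zmq_socket (ctx, ZMQ_PUSH);

        uint64_t hwm = 1000;    /* queue at most ~1000 messages per pipe */
        int rc = zmq_setsockopt (push, ZMQ_HWM, &hwm, sizeof hwm);
        assert (rc == 0);

        rc = zmq_connect (push, "tcp://127.0.0.1:5555");
        assert (rc == 0);
        /* ...send loop here; once the HWM is reached the socket blocks or
           drops (depending on socket type) instead of growing memory... */
        zmq_close (push);
        zmq_term (ctx);
        return 0;
    }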
[08:20] euk
|
another question - i'm getting tcp throughput higher than ipc (linux, small messages). is that normal? cpu utilization is higher on tcp (so every message takes more cpu), still throughput is somehow higher
|
[09:49] Guthur
|
sustrik: Do you think it would be worth making a clrzmq2 repo
|
[11:12] sustrik
|
Guthur: yes, you can't just drop the existing codebase
|
[11:16] Guthur
|
Yeah that's fair enough
|
[11:17] mikko
|
mornin'
|
[11:17] sustrik
|
mikko: morning
|
[11:17] Guthur
|
Just felt it might be better to make the distinction clear, especially considering it can never go into master due to the compatibility issue
|
[11:18] mikko
|
what is the biggest compatibility break?
|
[11:19] Guthur
|
mikko: Most of it to be honest
|
[11:19] Guthur
|
I used namespaces for one
|
[11:20] Guthur
|
Also changed the constants to enums
|
[11:21] sustrik
|
mikko: what about tomorrow, fancy a beer in the evening?
|
[11:21] mikko
|
sustrik: sure
|
[11:21] mikko
|
it's a storm in london
|
[11:21] mikko
|
so take an umbrella
|
[11:22] mato
|
hi guys
|
[11:22] sustrik
|
i was already told about the storm in london by the girl behind the counter in the local supermarket here :)
|
[11:22] sustrik
|
mato: hi
|
[11:22] Guthur
|
Recv also now just returns the message; it's null if there is no message, which removes the need for an out parameter
|
[11:22] mato
|
sustrik: i've replied to the mailbox problem, can you check if my reasoning is correct?
|
[11:22] sustrik
|
mikko: 7pm or so?
|
[11:23] sustrik
|
let me see
|
[11:23] mikko
|
sustrik: 7pm in Doggett's Coat & Badge ?
|
[11:23] mikko
|
is that fine or do you want to eat something nicer?
|
[11:23] sustrik
|
i am ok with that
|
[11:23] sustrik
|
can you drop a notice to the mailing list in case someone would like to join us?
|
[11:24] mikko
|
yeah, i'm just thinking if there was a nicer place to eat
|
[11:24] mikko
|
at 7pm people are going to be hungry
|
[11:24] sustrik
|
mikko: it's up to you
|
[11:24] sustrik
|
i am not familiar with london too much
|
[11:25] mikko
|
what kind of food do you like?
|
[11:25] sustrik
|
all kinds :)
|
[11:28] sustrik
|
mato: yes, the reasoning seems correct
|
[11:28] mato
|
sustrik: ok, i'll whip up a patch and get chuck to test it
|
[11:28] sustrik
|
however, keep in mind that by writing a command in two chunks
|
[11:28] sustrik
|
you can recv it in 2 chunks as well
|
[11:28] sustrik
|
so the recv part has to be changed as well
|
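[Editor's note] The change being discussed, sketched generically: if the writer may push a command out in two chunks, the reader has to loop until a whole fixed-size command has arrived. command_t below is a stand-in for 0MQ's internal command struct, not the real definition:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <errno.h>
    #include <stddef.h>

    typedef struct { char payload [32]; } command_t;   /* illustrative only */

    int recv_command (int fd, command_t *cmd)
    {
        size_t got = 0;
        while (got < sizeof (command_t)) {
            ssize_t n = recv (fd, (char *) cmd + got, sizeof (command_t) - got, 0);
            if (n == -1 && errno == EINTR)
                continue;               /* interrupted before any data: retry */
            if (n <= 0)
                return -1;              /* real error or peer closed */
            got += (size_t) n;          /* partial read: keep going */
        }
        return 0;
    }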
[11:28] mato
|
hmm
|
[11:33] mikko
|
sustrik: where are you staying ?
|
[11:34] sustrik
|
southwark
|
[11:34] sustrik
|
but choose any place you like
|
[11:35] mikko
|
http://www.doggettscoatandbadge.co.uk/
|
[11:35] mikko
|
it's easy
|
[11:35] sustrik
|
yep, that's 2 mins from my hotel
|
[11:38] mato
|
sustrik: the recv() side could in theory use MSG_WAITALL
|
[11:38] mato
|
sustrik: I can try that and give it to chuck to test
|
[11:39] mato
|
sustrik: there's also the problem that if recv() gets EINTR, it *may* get some bytes...
|
[11:39] mikko
|
sustrik: sent
|
[11:39] mato
|
sustrik: which sucks, I'm not sure how to solve that...
|
[11:39] sustrik
|
mikko: thx
|
[11:39] mikko
|
should've written Martin S. as there are many martins
|
[11:39] sustrik
|
see you tomorrow
|
[11:39] mikko
|
:)
|
[11:39] sustrik
|
mato: EINTR?
|
[11:39] mato
|
sustrik: yes, what?
|
[11:39] sustrik
|
how would the API report that kind of thing?
|
[11:40] mato
|
sustrik: it already does report EINTR
|
[11:40] sustrik
|
nbytes == 3 && errno == EINTR?
|
[11:40] mato
|
oh, that, right
|
[11:40] mato
|
sustrik: sorry, you're right, i was reading the recvmsg docs
|
[11:41] sustrik
|
as for the WAITALL wouldn't it work only for blocking recv?
|
[11:42] sustrik
|
is there a way to combine WAITALL and NONBLOCK?
|
[11:43] mato
|
unclear
|
[11:43] mato
|
hang on, reading various threads
|
[11:43] mato
|
sustrik: the most reliable way to do it would be to use a datagram socket instead of a stream socket
|
[11:43] mato
|
sustrik: datagram sockets guarantee that the send/recv is atomic
|
[11:43] sustrik
|
mato: is it possible with socketpair?
|
[11:44] mato
|
sustrik: yes, you just ask for AF_UNIX, SOCK_DGRAM
|
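[Editor's note] What "AF_UNIX, SOCK_DGRAM" looks like in practice: an anonymous datagram socketpair, where each send()/recv() moves one whole datagram atomically, so a command can never be split across reads. Minimal sketch with error handling reduced to asserts:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <assert.h>

    int main (void)
    {
        int fds [2];
        int rc = socketpair (AF_UNIX, SOCK_DGRAM, 0, fds);
        assert (rc == 0);

        const char cmd [1] = { 0x01 };            /* a one-byte "command" */
        assert (send (fds [0], cmd, sizeof cmd, 0) == (ssize_t) sizeof cmd);

        char buf [64];
        ssize_t n = recv (fds [1], buf, sizeof buf, 0);
        assert (n == (ssize_t) sizeof cmd);       /* exactly one datagram back */
        return 0;
    }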
[11:45] mato
|
?
|
[11:45] mato
|
What do you mean?
|
[11:45] sustrik
|
MTU per packet?
|
[11:45] mato
|
?
|
[11:45] sustrik
|
if you send command, which was 1 byte
|
[11:45] sustrik
|
how much of the buffer will be used
|
[11:46] sustrik
|
1 byte
|
[11:46] sustrik
|
?
|
[11:46] sustrik
|
MTU bytes?
|
[11:46] mato
|
No idea
|
[11:46] mato
|
System-dependent
|
[11:46] sustrik
|
it can make the problem even worse
|
[11:46] sustrik
|
in any case, you should be able to read the command using at most 2 recv calls
|
[11:47] sustrik
|
given that you write it using at most 2 sends
|
[11:47] mato
|
guess so
|
[11:47] sustrik
|
assert (PIPE_BUF < sizeof (command_t)) guarantees that
|
[11:47] sustrik
|
>=
|
[11:48] mato
|
I kind of doubt PIPE_BUF has much to do with AF_UNIX sockets
|
[11:49] sustrik
|
what does it apply to then?
|
[11:49] sustrik
|
mkfifo?
|
[11:49] mato
|
sustrik: pipes
|
[11:49] mato
|
sustrik: yeah
|
[11:49] sustrik
|
btw, why aren't we using pipes?
|
[11:50] mato
|
UNIX sockets are better these days IMO
|
[11:50] mato
|
also you don't have to invent funky naming schemes
|
[11:50] mato
|
etc etc
|
[11:50] mato
|
since socketpair nicely gives you an anonymous pair
|
[11:51] sustrik
|
ok
|
[11:51] mato
|
sustrik: also, pipes have some fixed buffer size
|
[11:51] mato
|
sustrik: not resizable
|
[11:51] sustrik
|
i was just thinking about the fact that what we need is a unidirectional pipe
|
[11:51] sustrik
|
the other direction is unused
|
[11:52] mato
|
too bad :-)
|
[11:52] mato
|
use it for something :-)
|
[11:52] mato
|
if it bothers you :-)
|
[11:52] sustrik
|
maybe we can at least shrink the buffer for that direction?
|
[11:52] mato
|
maybe
|
[12:29] Guthur
|
sustrik: Can someone create a zeromq / clrzmq2 repo?
|
[12:30] sustrik
|
yup, wait a sec
|
[12:31] Guthur
|
I'm at work at the moment but when I get home I can move the code to it and update the clrzmq page accordingly
|
[12:32] Guthur
|
I'm off for lunch back in a bit
|
[12:33] mato
|
sustrik: I occasionally get this from test_shutdown_stress:
|
[12:33] mato
|
Socket operation on non-socket
|
[12:33] mato
|
nbytes != -1 (tcp_socket.cpp:197)
|
[12:33] mato
|
/bin/sh: line 4: 32321 Aborted (core dumped) ${dir}$tst
|
[12:33] mato
|
FAIL: test_shutdown_stress
|
[12:34] sustrik
|
mato: yes, the problem was reported already
|
[12:40] mato
|
sustrik: ok, well, I have a preliminary patch for the mailbox retry stuff, sent to the ML
|
[12:41] sustrik
|
mato: thanks
|
[12:46] dv
|
what would be the recommended way to implement messaging without a response?
|
[12:47] dv
|
i have two nodes A and B, and they communicate asynchronously,
|
[12:47] dv
|
there are no "requests" and subsequent "responses", only events
|
[12:48] dv
|
i could open two req-rep connections between the two, so that both are requesters, and simply not use the response (or send some dummy response)
|
[12:48] dv
|
but that sounds wasteful
|
[12:48] dv
|
any suggestions?
|
[12:48] sustrik
|
Guthur: done
|
[12:49] sustrik
|
dv_: you have to think about scaling to get it right
|
[12:49] sustrik
|
are there going to be multiple A's in the future?
|
[12:50] sustrik
|
or multiple B's?
|
[12:50] sustrik
|
if so, how are the messages to be dispatched to the multiple instances?
|
[12:50] sustrik
|
each message to each instance?
|
[12:50] sustrik
|
if so, use PUB/SUB
|
[12:50] sustrik
|
if you want to load-balance messages between instances
|
[12:50] sustrik
|
use PUSH/PULL
|
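[Editor's note] The two dispatch patterns sustrik contrasts, side by side: a PUB socket copies every message to every connected subscriber, while a PUSH socket load-balances messages across connected pullers. A sketch assuming the 2.1-era C API; the endpoints are placeholders:

    #include <zmq.h>

    void setup_senders (void *ctx)
    {
        /* each message to each instance -> PUB/SUB */
        void *pub = zmq_socket (ctx, ZMQ_PUB);
        zmq_bind (pub, "tcp://*:5556");

        /* load-balance messages across instances -> PUSH/PULL */
        void *push = zmq_socket (ctx, ZMQ_PUSH);
        zmq_bind (push, "tcp://*:5557");

        /* receivers connect a SUB socket to :5556 (with ZMQ_SUBSCRIBE set),
           or a PULL socket to :5557, respectively */
    }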
[12:51] dv
|
no they would be peers
|
[12:51] dv
|
there is a pair pattern, but it is marked as experimental
|
[12:52] dv
|
if multiple A's/B's were the case, i would use pub/sub, yes
|
[12:52] dv
|
hmm. you know, I could also use it for two nodes only.
|
[12:52] dv
|
but can multiple publishers exist with an IPC connection?
|
[12:53] sustrik
|
you can have multiple pubs pushing messages to a single SUB for example
|
[12:54] dv
|
in fact, can i have multiple publishers with any kind of connection?
|
[12:54] sustrik
|
yes
|
[12:54] dv
|
cool
|
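[Editor's note] One way to get multiple publishers over any transport: a single SUB socket may connect to several PUB endpoints (equally, the SUB side could bind and the PUBs connect to it). A sketch; the ipc paths are placeholders:

    #include <zmq.h>

    void *connect_subscriber (void *ctx)
    {
        void *sub = zmq_socket (ctx, ZMQ_SUB);
        zmq_setsockopt (sub, ZMQ_SUBSCRIBE, "", 0);     /* receive everything */
        zmq_connect (sub, "ipc:///tmp/player-events");   /* publisher A */
        zmq_connect (sub, "ipc:///tmp/scanner-events");  /* publisher B */
        return sub;
    }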
[12:55] dv
|
i have an audio playback process here that is controlled by the frontend, and the playback process can send events, such as "song finished", "song metadata scanned" ..
|
[12:55] dv
|
i think i'll use pub-sub here
|
[12:56] dv
|
but does the IPC mechanism scale linearly with the amount of publishers/subscribers? for example, with tcp, since it is strictly point-to-point, i have a situation where every node has connections to all the other nodes, right?
|
[13:08] sustrik
|
right, same with ipc
|
[13:10] dv
|
something like multicast for ipc would rock. but i guess this is far from trivial. not just in zeromq, but speaking generally
|
[13:11] sustrik
|
the whole multicast thing is complex
|
[13:11] sustrik
|
for ipc to prevent copying the message you would have to have it stored in shmem
|
[13:11] dv
|
i noticed. i stumbled upon weird bugs that i have yet to replicate with the openpgm tools.
|
[13:11] sustrik
|
however, allocating shmem is an expensive operation
|
[13:12] sustrik
|
so it's not that easy
|
[13:12] dv
|
to think of that in ipc ...
|
[13:12] dv
|
yes i see
|
[13:13] dv
|
oh one other thing, when I use pub/sub with multicast (epgm),
|
[13:13] dv
|
and I turn set mcast_loop to 0 in the publisher socket,
|
[13:14] dv
|
the receiver doesnt get anything anymore - but i've only tried it out with receiver and sender running on the same host so far
|
[13:15] dv
|
if I understand this correctly, mcast_loop 0 means that only messages sent between hosts will pass through? i found that part of the manual a bit confusing
|
[13:15] dv
|
(thats my last question btw. :) )
|
[13:15] sustrik
|
dv_: yes, that's how it works
|
[13:15] sustrik
|
multicast over loopback is a terrible hack
|
[13:16] sustrik
|
that is better not to use at all
|
[13:16] dv
|
hmm. i wonder if the bugs i noticed stem from this
|
[13:16] dv
|
i will use IPC for nodes on the same host then
|
[13:16] sustrik
|
quite possibly, anyway turning mcast_loop to off means you won't get the messages on the same host
|
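[Editor's note] The option under discussion, for reference. In the 2.x API ZMQ_MCAST_LOOP is an int64_t: 1 (the default) loops multicast back so same-host receivers see it, 0 disables that loopback hack, which is why dv's same-host subscriber went quiet. A sketch; the interface and multicast group are placeholders:

    #include <zmq.h>
    #include <stdint.h>

    void *make_epgm_pub (void *ctx)
    {
        void *pub = zmq_socket (ctx, ZMQ_PUB);

        int64_t loop = 0;   /* 0 = no loopback: same-host SUBs get nothing */
        zmq_setsockopt (pub, ZMQ_MCAST_LOOP, &loop, sizeof loop);

        zmq_connect (pub, "epgm://eth0;239.192.1.1:5555");
        return pub;
    }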
[13:17] dv
|
i dont know if you follow the mailing list much, the bug I mentioned was an assert that happened randomly
|
[13:17] dv
|
openpgm sometimes thinks there is no multicast capable NIC
|
[13:17] sustrik
|
i'm following the list, but i have little knowledge of openpgm
|
[13:17] dv
|
overriding internal openpgm checks made it work all the time
|
[13:17] dv
|
but it felt ... wrong.
|
[13:18] sustrik
|
steve mccoy should comment on that
|
[13:18] dv
|
he did. i just mention it
|
[13:18] dv
|
hmm i'll tell him about the same host
|
[13:18] dv
|
or no, hold on. it already happened when I was just creating a publisher socket
|
[13:19] dv
|
but maybe turning off mcast_loop first does help ... i'll try it out this afternoon. thanks for the suggestions.
|
[13:21] cremes
|
mato: any chance you can send me that mailbox patch as a gist? i get "fatal: corrupt patch at line 6" if i copy/paste it into a file for application
|
[13:22] mato
|
i'll email it to you directly
|
[13:22] mato
|
cremes: what's your email address?
|
[13:23] mato
|
cremes: I sent it to the email you use on the ML.
|
[13:26] cremes
|
mato: perfect; applied it but now get another assertion
|
[13:26] cremes
|
Assertion failed: nbytes == want_nbytes (mailbox.cpp:213)
|
[13:27] mato
|
cremes: Hmm, well, that's precisely what shouldn't happen
|
[13:28] mato
|
cremes: Either you send me your code and I'll try and reproduce it here, or insert some printf's in the various code paths in send() yourself and try and figure out the actual sequence of syscalls
|
[13:28] cremes
|
if you have ruby 1.9.2 on your system, i can send you the code
|
[13:28] mato
|
it's possible OSX is behaving funny when the socket SNDBUF is resized
|
[13:29] cremes
|
otherwise, suggest what details you want printed out and i'll modify the 0mq source locally
|
[13:29] mato
|
cremes: no i don't, sorry... I have some oldish snapshot of 1.9.1
|
[13:29] cremes
|
ok
|
[13:29] cremes
|
before modifying the source, i'll try this on my archlinux box
|
[13:29] mato
|
cremes: that's a good idea
|
[13:29] cremes
|
let's see if linux blows up too
|
[13:31] mato
|
lunch. bbl
|
[13:35] cremes
|
mato: when i run the *unpatched* master on linux against my code example, i get a different assertion
|
[13:35] cremes
|
Assertion failed: new_sndbuf > old_sndbuf (mailbox.cpp:182)
|
[13:35] cremes
|
let me try it with the patch...
|
[13:35] mato
|
cremes: that means you hit the system SNDBUF max
|
[13:35] mato
|
cremes: set the net.core.wmem_max sysctl to something high
|
[13:35] mato
|
cremes: and it'll go away
|
[13:36] cremes
|
ko
|
[14:39] cremes
|
ok, so after boosting net.core.wmem_max i am now getting a socket error "too many open files" when i allocate 508 REQ sockets
|
[14:40] cremes
|
according to ulimit -n, i set the file descriptor max to 250_000 and i still get that
|
[14:40] cremes
|
this is on archlinux, kernel 2.6.35
|
[14:41] cremes
|
same behavior with *and* without mato's patch
|
[14:55] pieterh
|
cremes, cat /proc/sys/fs/file-max
|
[14:55] pieterh
|
what does it give you?
|
[14:56] mato
|
cremes: for ulimit -n you'll have to run that as root, then from that *same* root shell run your code
|
[14:56] mato
|
cremes: since it's inherited by child processes, it is not a system setting
|
[14:56] mato
|
my patch doesn't change anything wrt number of open files
|
[14:56] cremes
|
pieterh: results in 1201177
|
[14:56] pieterh
|
This is from the Confluence wiki
|
[14:56] pieterh
|
Run the command sysctl -a. If this is less than 200000, increase the number of file handles by editing /etc/sysctl.conf and changing the property fs.file-max to 200000. If there isn't a value set already for this property, you need to add the line fs.file-max=200000.
|
[14:56] pieterh
|
Then run sysctl -p to apply your changes to your system.
|
[14:56] cremes
|
mato: i logged in a fresh shell after making the change
|
[14:56] mato
|
cremes: yes, but that won't work
|
[14:57] pieterh
|
cremes: seems high enough, how about 'sysctl -a'?
|
[14:57] mato
|
cremes: verify with ulimit -a in the shell that you're running your code in that the setting has actually taken effect
|
[14:57] mato
|
cremes: anyway, this is beside the point of my patch, so you're running into some macosx weirdness there
|
[14:57] cremes
|
mato: it tells me 250000 for open files
|
[14:58] cremes
|
this is on linux
|
[14:58] pieterh
|
archlinux
|
[14:58] mato
|
cremes: and you're still running out of open files?
|
[14:58] cremes
|
yep
|
[14:58] mato
|
cremes: then your code is the problem, sorry :-)
|
[14:58] cremes
|
ha
|
[14:58] pieterh
|
cremes: do you have a simple test case I can try?
|
[14:58] mato
|
as in, the number of open files is not infinite :-)
|
[14:59] cremes
|
pieterh: results of sysctl -a .... https://gist.github.com/667772
|
[14:59] mato
|
cremes: anyhow, the interesting case is the failure you're getting on macosx, let me know when you have time to put some printfs in the mailbox code
|
[14:59] cremes
|
mato: i have time now; give me suggestions on what is important and i'll print it out
|
[14:59] pieterh
|
cremes, fs.file-max = 1201177
|
[14:59] pieterh
|
fs.nr_open = 1048576
|
[14:59] cremes
|
pieterh: if you have ruby 1.9.2 then i can give you a relatively simple test case
|
[15:00] pieterh
|
derp
|
[15:00] mato
|
cremes: print nbytes after the 1st do loop in send() (after line 163)
|
[15:00] cremes
|
mato: prepatched or postpatch?
|
[15:00] mato
|
cremes: patched
|
[15:00] mato
|
cremes: then, print old_sndbuf, new_sndbuf after the assert on line 184
|
[15:01] mato
|
cremes: and print retry_nbytes after the 2nd do loop in send(), i.e. after line 202
|
[15:01] mato
|
cremes: that should give us an idea of what is going on...
|
[15:02] mato
|
cremes: oh, and each thing that you print...
|
[15:02] mato
|
cremes: print also the this pointer as %p, (void *)this
|
[15:02] mato
|
cremes: so that it's obvious which printf belongs to which instance
|
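[Editor's note] The kind of tracing mato is asking for, shown as fragments to paste into the patched mailbox_t::send () in mailbox.cpp (the exact lines depend on the patch; the variable names are the ones mentioned above). stderr avoids the stdout/stderr interleaving problem that shows up later, and %p on the object pointer tells instances apart:

    /* after the 1st do loop in send () */
    fprintf (stderr, "%p: send: nbytes = %d\n", (void *) this, (int) nbytes);

    /* after the SNDBUF-resize assert */
    fprintf (stderr, "%p: send: old_sndbuf = %d new_sndbuf = %d\n",
        (void *) this, (int) old_sndbuf, (int) new_sndbuf);

    /* after the 2nd do loop (the retry) in send () */
    fprintf (stderr, "%p: send: retry_nbytes = %d\n",
        (void *) this, (int) retry_nbytes);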
[15:07] pieterh
|
cremes, how about trying a simple program to test your Linux file handle limit...
|
[15:07] pieterh
|
https://gist.github.com/667779
|
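[Editor's note] The linked gist isn't reproduced in the log; this is a sketch of the same idea against the 2.1-era C API: keep creating REQ sockets until the library refuses, then report how many fit. On that era's 2.1 master the run may instead die on an internal assertion, which is the behaviour reported below and filed as issue 113:

    #include <zmq.h>
    #include <stdio.h>

    int main (void)
    {
        void *ctx = zmq_init (1);
        int count = 0;
        while (1) {
            void *s = zmq_socket (ctx, ZMQ_REQ);
            if (s == NULL)
                break;              /* EMFILE, or 0MQ's own socket limit */
            count++;
        }
        printf ("This system allows up to %d sockets\n", count);
        return 0;
    }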
[15:09] pieterh
|
i can create over 8M sockets on my linux box
|
[15:10] cremes
|
pieterh: where is zfl.h at?
|
[15:11] mikko
|
cremes: github/zeromq/zfl
|
[15:13] pieterh
|
cremes: sorry, just use zmq.h it'll work too
|
[15:14] pieterh
|
cremes, ok, fixed it
|
[15:14] cremes
|
pieterh: give me the line to compile it please... i tried "gcc test.c" but it's bitching about implicit declaration of 'printf'
|
[15:15] pieterh
|
ack, take my fixed version
|
[15:15] pieterh
|
gcc -o testit testit.c
|
[15:15] pieterh
|
don't call c programs 'test' cause 'test' is a built-in shell command
|
[15:15] pieterh
|
gcc -o testit -lzmq testit.c
|
[15:15] mato
|
pieterh: your program will fall over in zmq sooner than it can print the number of sockets it managed to create
|
[15:16] pieterh
|
mato: doesn't seem to
|
[15:16] mato
|
pieterh: because of the mailbox failing to get a socket
|
[15:16] mato
|
pieterh: well, in master anyway
|
[15:16] pieterh
|
perhaps in master, yes... i was just trying in 2.0.10
|
[15:16] mato
|
2.0.10 is much different
|
[15:16] mato
|
in fact, in 2.1 even if it did not fall over, it would produce a different number
|
[15:17] pieterh
|
hey: :-)
|
[15:17] pieterh
|
"too many open files"
|
[15:17] pieterh
|
not even a clean return, just internal failure
|
[15:17] pieterh
|
neato
|
[15:17] mato
|
2.0.10 has completely different signalling mechanisms
|
[15:17] cremes
|
pieterh: aha!
|
[15:17] pieterh
|
Too many open files
|
[15:17] pieterh
|
rc == 0 (mailbox.cpp:374)
|
[15:17] pieterh
|
Aborted (core dumped)
|
[15:17] cremes
|
maybe i'm not so crazy after all...? :)
|
[15:18] pieterh
|
cremes: i can't make a test program for that :-)
|
[15:18] cremes
|
too bad...
|
[15:18] pieterh
|
let me see how quickly it dies...
|
[15:18] mato
|
with the default limit of 1024 it'll die somewhere around 512
|
[15:18] mato
|
actually, probably not
|
[15:19] pieterh
|
cremes!
|
[15:19] mato
|
it'll die at 1024/3 roughly i'd guess
|
[15:19] pieterh
|
hey!
|
[15:19] pieterh
|
it dies at 508
|
[15:19] mato
|
yup
|
[15:19] pieterh
|
wow... from >8M to 508... that is quite a step backwards
|
[15:19] pieterh
|
ok, mato, where's that default limit of 1K?
|
[15:19] cremes
|
pieterh: 508 is what i see too
|
[15:20] mato
|
hang on guys, there are several things going on here
|
[15:20] pieterh
|
net.core.wmem_max.. right?
|
[15:20] mato
|
no
|
[15:20] mato
|
that's part of it
|
[15:20] mato
|
allow me to explain
|
[15:20] pieterh
|
shoot
|
[15:21] mato
|
in 2.0.x, there is one signalling socketpair (what is called mailbox in 2.1) per *application thread*
|
[15:21] mato
|
hence, creating a 0mq socket does not use up an actual file descriptor
|
[15:21] pieterh
|
ah, neat
|
[15:21] mato
|
now, in 2.1 the situation is entirely different
|
[15:21] mato
|
each 0mq socket created == 1 mailbox
|
[15:22] mato
|
2 file descriptors
|
[15:22] mato
|
hence, if the default limit is 1024, you'll die at <512
|
[15:22] pieterh
|
sure...
|
[15:22] pieterh
|
now we have a trivial reproducible case on a default linux box
|
[15:22] pieterh
|
how do I raise that limit?
|
[15:22] cremes
|
guys, what is the printf format code for printing ssize_t?
|
[15:22] pieterh
|
%d
|
[15:23] mato
|
cremes: use %d and cast it to int
|
[15:23] cremes
|
archlinux bitches at me about that
|
[15:23] pieterh
|
%u maybe
|
[15:23] cremes
|
ah, ok, it will fit
|
[15:23] mato
|
cremes: there is no printf format for ssize_t
|
[15:23] mato
|
now, as to raising the limit
|
[15:23] mato
|
there are two limits
|
[15:23] mato
|
1. the system maximum, which is the sysctl pieter already mentioned
|
[15:23] mato
|
normally you don't need to touch that
|
[15:24] mato
|
2. your distribution will have some $mechanism for setting a default RLIMIT_NOFILE on login, usually to 1024
|
[15:24] mato
|
if you want to raise that, either figure out where the setting for $mechanism is
|
[15:24] mato
|
or
|
[15:24] mato
|
1. (as user) $ su root
|
[15:24] mato
|
2. (as root) # ulimit -n 8192
|
[15:25] mato
|
3. (still as that root) # su user
|
[15:25] mato
|
4. run your app
|
[15:25] mato
|
you need to follow those 4 exact steps otherwise the raised rlimit setting will not propagate to your app
|
[15:25] mato
|
if you open a new shell, it won't work
|
[15:25] mato
|
is this clear?
|
[15:27] pieterh
|
mato: having raised the ulimit to 8192 (or any large number), the test program now exits cleanly
|
[15:27] pieterh
|
and reports: "This system allows up to 511 sockets"
|
[15:27] mato
|
pieterh: yes, now there is a third limit also
|
[15:27] cremes
|
mato: results from running my test program on OSX and archlinux with the printf statements
|
[15:27] cremes
|
https://gist.github.com/667811
|
[15:27] mato
|
pieterh: compile time setting in 0MQ src/config.hpp
|
[15:28] mato
|
pieterh: which happens to be set to 512
|
[15:28] pieterh
|
was this limit introduced in 2.1.0?
|
[15:29] pieterh
|
i assume it must have been, since previously I could open 8M sockets
|
[15:29] mato
|
pieterh: presumably
|
[15:29] pieterh
|
well, ok I can confirm what you've explained
|
[15:30] pieterh
|
though the max-sockets option seems 1 off
|
[15:30] mato
|
pieterh: dunno, that is sustrik's work
|
[15:30] pieterh
|
- setting it to 8000 allows me to create 7999 sockets
|
[15:30] mato
|
it's allocating resources of some kind up front, otherwise it wouldn't be a compile-time option
|
[15:31] mato
|
cremes: ok, that output is strange
|
[15:31] mato
|
cremes: note that the old_sndbuf/new_sndbuf values for the failing mailbox look completely bogus
|
[15:31] mato
|
32? 16? wtf...
|
[15:31] cremes
|
mato: are you looking at the osx or arch output?
|
[15:32] mato
|
oh, right, sorry
|
[15:32] mato
|
bah, let me save those gists to a file
|
[15:33] cremes
|
i think on linux i am hitting the 512 socket limit
|
[15:33] cremes
|
on osx, it fails long before
|
[15:35] mato
|
cremes: hmm, the output is a bit mixed up due to stdout/stderr mixing
|
[15:35] cremes
|
mato: yes, want me to separate them?
|
[15:35] mato
|
cremes: are you printing the retry_nbytes case?
|
[15:35] mato
|
cremes: i can't find it anywhere...
|
[15:35] mato
|
cremes: just use fprintf (stderr, ...) for everything, it'll work fine
|
[15:35] cremes
|
mato: no, you didn't mention that as being interesting
|
[15:35] mato
|
cremes: oh, sorry, i did, you must have missed it
|
[15:36] mato
|
16:01 < mato> cremes: and print retry_nbytes after the 2nd do loop in send(), i.e. after line 202
|
[15:36] mato
|
cremes: can you try that please? and set it all to use stderr?
|
[15:37] cremes
|
ok, i'll add that one... sorry for the mixup
|
[15:51] cremes
|
mato: osx results... https://gist.github.com/3892c9fcb0f5493f6ae8
|
[15:53] pieterh
|
cremes, I've reported the issue https://github.com/zeromq/zeromq2/issues/issue/113
|
[15:54] cremes
|
pieterh: thank you
|
[15:58] pieterh
|
I'm pretty sure the limit of ~512 sockets will trip up a few existing apps
|
[15:58] cremes
|
pieterh: i changed that limit to 10k on my linux box, reran the test and it passed
|
[15:58] pieterh
|
yes, indeed
|
[15:58] cremes
|
i did the same change on osx and it still fails in the same spot, same error :(
|
[15:59] cremes
|
must be some funky osx-ism
|
[15:59] pieterh
|
different techniques for setting per-user limits
|
[15:59] pieterh
|
its BSD-4.3 heritage or something
|
[15:59] cremes
|
pieterh: i bumped file descriptors per process to 25k a long time ago for mongodb on this box
|
[16:00] mato
|
pieterh: max_sockets directly affects the number of mailbox slots
|
[16:00] mato
|
pieterh: so the limit is set intentionally low
|
[16:00] zmq_help
|
hi folks - i'm trying to get zeromq ruby bindings working on Ubuntu 10.10
|
[16:00] pieterh
|
mato: you mean as in reducing pre-allocated memory size?
|
[16:00] mato
|
pieterh: yes
|
[16:00] pieterh
|
is there another performance reason?
|
[16:00] cremes
|
zmq_help: which bindings? the zmq gem or ffi-rzmq?
|
[16:01] pieterh
|
data copying etc?
|
[16:01] mato
|
pieterh: possibly, read the code :-)
|
[16:01] zmq_help
|
I installed zmq
|
[16:01] zmq_help
|
I installed zmq ruby gem
|
[16:01] mato
|
pieterh: but in any case, the standard untuned Linux limit is 1024 fds
|
[16:01] cremes
|
zmq_help: ok; so what seems to be the problem?
|
[16:01] mato
|
pieterh: so there's no point in changing the default of max_sockets
|
[16:01] mato
|
pieterh: since it'll fall over anyway
|
[16:01] pieterh
|
mato: IMO there is
|
[16:01] pieterh
|
it'll fall over with a normal system error
|
[16:02] pieterh
|
that can be fixed without recompilation
|
[16:02] pieterh
|
modifying code to change a limit is not acceptable for many processes
|
[16:02] pieterh
|
it means requalification
|
[16:02] pieterh
|
we've discussed this previously
|
[16:02] pieterh
|
imagine 'large bank'
|
[16:03] mato
|
pieterh: large bank should test damn well for such limits...
|
[16:03] zmq_help
|
cremes: when I enter 'require "zmq"' I get this error message
|
[16:03] zmq_help
|
LoadError: libzmq.so.0: cannot open shared object file: No such file or directory - /usr/lib/ruby/gems/1.8/gems/zmq-2.0.9/lib/zmq.so
|
[16:03] pieterh
|
mato: that's not the point
|
[16:03] pieterh
|
point is tuning the sw at compile time is sub-optimal
|
[16:03] zmq_help
|
cremes: but I can see that the zmq.so file is there, and readable
|
[16:03] pieterh
|
and it can be done better unless there is an actual cost in having a larger limit
|
[16:03] pieterh
|
anyhow, issue is noted
|
[16:04] mato
|
pieterh: well, don't forget there are many audiences, embedded, etc etc..
|
[16:04] cremes
|
zmq_help: did you install the 0mq library? the gem does not include it
|
[16:04] cremes
|
also, make sure you run ldconfig after installing the 0mq lib
|
[16:04] mato
|
pieterh: and the limit only affects # of sockets, not # connections obviously
|
[16:04] cremes
|
so that linux picks it up
|
[16:04] mato
|
anyway, noted.
|
[16:04] pieterh
|
mato: I'd suggest embedded systems are less 'typical'
|
[16:04] zmq_help
|
cremes - yes I installed everything - i'll try the ldconfig trick now...
|
[16:04] pieterh
|
plus 0MQ encourages use of many sockets
|
[16:04] mato
|
it does? :-)
|
[16:05] pieterh
|
yes
|
[16:05] pieterh
|
typically one for each flow
|
[16:05] cremes
|
pieterh: agreed... i sometimes need thousands for just that reason
|
[16:05] pieterh
|
and if you want any custom routing that means several sockets per peer
|
[16:05] zmq_help
|
cremes: THANKS! the ldconfig trick did the job
|
[16:05] pieterh
|
as you well know, mato :-)
|
[16:05] cremes
|
zmq_help: remember to help someone else in channel when you get a chance ;)
|
[16:06] zmq_help
|
:-) will do thanks
|
[16:06] zmq_help
|
cremes: where should I give documentation feedback re: the ldconfig trick ??
|
[16:06] cremes
|
pieterh: does imatix have an osx box for testing?
|
[16:06] cremes
|
zmq_help: that trick is listed in the FAQ i think
|
[16:06] pieterh
|
cremes: yes, sure
|
[16:06] pieterh
|
try this: sudo echo "limit maxfiles 1000000 1000000" > /etc/launchd.conf
|
[16:06] cremes
|
otherwise, feel free to join the wiki and modify it on your own
|
[16:07] zmq_help
|
cremes: ok thanks -
|
[16:07] pieterh
|
and then launchctl limit maxfiles 1000000 1000000
|
[16:07] cremes
|
pieterh: i did that when i bumped my limits to 25k... here are the current contents: limit maxfiles 25000 100000
|
[16:07] pieterh
|
cremes, I'm just googling random pages for ""too many open files" OSX"
|
[16:07] pieterh
|
hmm
|
[16:07] cremes
|
ulimit -a agrees that it is 25k
|
[16:08] pieterh
|
and you tuned 0MQ to 10k sockets?
|
[16:08] cremes
|
yes
|
[16:08] pieterh
|
and you still get an abort at 508 sockets?
|
[16:09] pieterh
|
cremes, see http://artur.hefczyc.net/node/27
|
[16:09] mato
|
cremes: hmph, those sndbuf numbers from osx make no sense
|
[16:09] mato
|
cremes: if it really was 16/32 bytes, it'd fall over immediately
|
[16:10] mato
|
cremes: i'd need access to an OSX box to figure out what is going on
|
[16:10] cremes
|
mato: i can give you an account on mine if you'd like
|
[16:10] pieterh
|
mato: I can bring you my MacBook when I manage to make it to BA
|
[16:11] mato
|
cremes: that would be great, but honestly i don't have time this week to look at it seriously
|
[16:11] cremes
|
mato: ok
|
[16:12] mato
|
cremes: upcoming deadlines, too much to do, sorry...
|
[16:12] cremes
|
i understand... no worries
|
[16:15] twomashi1
|
How can I process all outstanding messages in a socket and stop receiving new ones?
|
[16:19] guido_g
|
you can't
|
[16:20] guido_g
|
you can receive all messages in a loop, but you can't tell the ømq socket to stop accepting more/new messages
|
[16:23] Steve-o
|
twomashi1: what is the logical requirement from this scenario? clean FT fail over?
|
[16:29] twomashi1
|
Steve-o: Removing a worker from a worker pool
|
[16:30] Guthur
|
Does 0MQ aim to provide more devices in the future?
|
[16:34] mikko
|
Guthur: i would imagine that if certain devices are used in large amount of projects they could be incorporated into the core
|
[16:34] mikko
|
but not sure whether adding new devices is the biggest priority
|
[16:35] mikko
|
Guthur: are there specific devices that you are after?
|
[16:36] Guthur
|
mikko: Not really just interested is all, if I could find examples of what would be beneficial I would maybe even have a stab at implementing
|
[16:41] twomashi1
|
guido_g: I wanted to have a worker which would process n messages and then die
|
[16:42] twomashi1
|
but it sounds like there's no way to stop the worker receiving more messages
|
[16:42] guido_g
|
as i already said
|
[16:42] twomashi1
|
so this pattern wont work
|
[16:42] twomashi1
|
thanks
|
[16:47] Steve-o
|
twomashi1: I think there are few implementations of worker pools already, e.g. http://kfsone.wordpress.com/2010/07/21/asyncworker-parallelism-with-zeromq/
|
[16:51] guido_g
|
doesn't handle the dynamic shutdown of one worker though
|
[17:04] ngerakines
|
sup
|
[17:17] cremes
|
found the trick for osx... it has some sysctl settings that it shares with freebsd
|
[17:18] cremes
|
for localhost/loopback connections, there are separate send/recv buffers allocated
|
[17:18] cremes
|
net.local.stream.sendspace=82320
|
[17:18] cremes
|
net.local.stream.recvspace=82320
|
[17:18] cremes
|
or whatever value you want
|
[17:18] cremes
|
i'll add this to the FAQ
|
[17:18] mato
|
cremes: so upping those values makes the code work for you?
|
[17:19] mato
|
cremes: that would imply that at least for some sockets, the default on OSX is ridiculously low
|
[17:19] cremes
|
yes, because it avoids the recovery code that tries to adjust SO_SNDBUF
|
[17:19] mato
|
the thing is, the recovery code works
|
[17:19] mato
|
as long as the OS comes back with a sane sndbuf in the getsockopt
|
[17:19] cremes
|
i believe you, but SO_SNDBUF isn't returning the right vals
|
[17:19] cremes
|
and i don't know why
|
[17:20] cremes
|
maybe it is returning kbytes instead of bytes
|
[17:20] mato
|
that is indeed what it looks like
|
[17:20] mato
|
cremes: could you please somehow summarize what we've found and reply to the thread on the ML so that we don't lose it?
|
[17:20] cremes
|
absolutely
|
[17:21] mato
|
cremes: it's plausible that even just setting a large-ish value at mailbox socketpair creation time would make the problem go away
|
[17:21] mato
|
needs more investigation...
|
[17:21] cremes
|
right... i'll respond to the ML with a few details; i'll also update pieter's issue
|
[17:21] mato
|
thx
|
[17:49] cremes
|
hmmm, what would you guess is happening if i get "Too many open files: rc == 0 (mailbox.cpp:431)" ?
|
[17:49] cremes
|
this is with the patched mailbox.cpp
|
[17:51] mato
|
precisely what it says :)
|
[17:52] cremes
|
heh
|
[17:53] cremes
|
i must not have raised my limits high enough and now i'm running out of another resource
|
[18:10] ngerakines
|
hey folks, I've got a c++ app that uses zmq with threads and I'm not sure about the proper way of doing something in particular:
|
[18:10] ngerakines
|
for these threads, they bind to a given socket and wait for incoming messages, however when I want to shutdown these threads I'm not finding through the docs the best way to my socket_in.recv/1 to stop blocking on input
|
[18:11] ngerakines
|
how should I go about this?
|
[18:12] twomashi1
|
ngerakines: i have the same issue
|
[18:13] twomashi1
|
I'm told there's no way to get zmq to stop receiving messages so you can process outstanding ones
|
[18:13] nettok
|
ngerakines: I would like to know about that too
|
[18:15] nettok
|
ngerakines: maybe registering a signal handler and then trigger the signal from another thread or process, something like that?
|
[18:16] ngerakines
|
hmm
|
[18:16] nettok
|
Or making read non-blocking
|
[18:16] twomashi1
|
oh wait... sorry different issue
|
[18:16] twomashi1
|
use poll?
|
[18:17] ngerakines
|
yeah, looking into poll
|
[18:18] guido_g
|
ngerakines: simply send a quit message (can be of length 0)
|
[18:19] ngerakines
|
ok, was looking for a way to effectively halt all zmq communication once stop was initiated, but thanks
|
[18:20] guido_g
|
then just terminate the context
|
[18:20] guido_g
|
that will close all ømq sockets and stop all ømq related activities
|
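[Editor's note] A sketch combining both suggestions: the receive loop treats a zero-length message as "quit", and the controlling thread can instead (or additionally) call zmq_term (), which in 2.1 makes a blocking recv fail with ETERM. Assumes the 2.1-era C API; socket setup is not shown:

    #include <zmq.h>
    #include <errno.h>

    void worker_loop (void *sock)
    {
        while (1) {
            zmq_msg_t msg;
            zmq_msg_init (&msg);
            if (zmq_recv (sock, &msg, 0) == -1) {
                if (errno == ETERM)
                    break;              /* context terminated: shut down */
                zmq_msg_close (&msg);
                continue;               /* e.g. EINTR */
            }
            size_t size = zmq_msg_size (&msg);
            zmq_msg_close (&msg);
            if (size == 0)
                break;                  /* zero-length "quit" message */
            /* ... process the message ... */
        }
        zmq_close (sock);
    }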
[18:22] nettok
|
guido_g: thanks!
|
[18:23] ngerakines
|
guido_g: and in cases where i've got a pub/sub client, use poll?
|
[18:24] guido_g
|
ngerakines: can't follow you, sorry
|
[18:24] ngerakines
|
I've got a pub/sub subscriber with a run-loop that uses socket_t.recv(..)
|
[18:24] guido_g
|
so?
|
[18:24] ngerakines
|
instead of recv, should I be using poll to close out cleanly?
|
[18:24] guido_g
|
why?
|
[18:25] ngerakines
|
Is there a way to send it a kill message like above given that it isn't binding?
|
[18:25] guido_g
|
wht isn't binding? w/o a bound socket you can't receive any messages
|
[18:26] guido_g
|
so if you have a sub socket, it'll just receive the message
|
[18:27] ngerakines
|
because it is creating a socket with ZMQ_SUB ?
|
[18:27] ngerakines
|
with connect ?
|
[18:27] ngerakines
|
socket_t.connect(...) as opposed to socket_t.bind(...)
|
[18:28] guido_g
|
there is no difference between the connect side and the bind side in this case
|
[18:28] guido_g
|
this is for sure mentioned in the guide somewhere
|
[18:56] gandhijee
|
hey guys, is there an ubuntu/debian package of zeromq anywhere?
|
[18:56] pieterh
|
gandhijee, I believe there is, somewhere
|
[18:57] gandhijee
|
happen to have an idea where somewhere might be?
|
[18:57] pieterh
|
Do a google for "debian package zeromq"
|
[18:57] pieterh
|
http://packages.debian.org/source/sid/zeromq
|
[18:58] twomashi1
|
pieterh: do you know how a process using zeromq could stop receiving messages and process its outstanding messages? (say to remove itself from a worker pool)
|
[18:58] gandhijee
|
yeah i had found that one, its 2.0.6, the other machines are all on 2.10
|
[18:58] pieterh
|
twomashi1, let me think about it... I just got back and am reading the traffic here
|
[18:59] twomashi1
|
ok cool
|
[18:59] twomashi1
|
thanks
|
[18:59] pieterh
|
gandhijee, right... but it's pretty simple to build from source anyhow
|
[18:59] gandhijee
|
i don't want to have to deal with any issues because 1 is newer than the other
|
[18:59] gandhijee
|
yeah, but i wanted to be lazy and not build the .deb
|
[18:59] gandhijee
|
i have to put it on 3 more machines, 1 x86_64 and 2 atoms
|
[18:59] pieterh
|
gandhijee, download tarball, build, it's really simple
|
[19:00] gandhijee
|
yes i know, like i said, i wanted to be lazy and not build the deb, because i have to get it to 3 more machines,
|
[19:00] pieterh
|
you can make a simple script that does it from wget to 'sudo make install; ldconfig'
|
[19:00] gandhijee
|
building the deb is just a cpl extra steps
|
[19:00] pieterh
|
sure
|
[19:01] pieterh
|
being lazy is good if you use that to create something
|
[19:01] pieterh
|
if it's just to avoid work... well... :-)
|
[19:01] pieterh
|
twomashi1, can you provide more background info?
|
[19:01] pieterh
|
do you want flow control?
|
[19:03] twomashi1
|
pieterh: I want to use a pool of PHP workers to process data, this is existing PHP code adapted to read messages from a ZMQ socket. I don't want them to live indefinitely because PHP is prone to memory leaks, so they must exit after processing n requests. Zeromq will fetch more messages than I need though, so I want some way to stop ZMQ prefetching messages and process the messages already in memory.
|
[19:04] pieterh
|
right...
|
[19:04] pieterh
|
I can think of a few ways
|
[19:04] pieterh
|
here is the most brutal
|
[19:04] pieterh
|
you batch the messages together, using multipart and some delimiters
|
[19:05] pieterh
|
so that you actually send your whole batch, say 100 messages, as one 0MQ message
|
[19:05] pieterh
|
you then use the LRU routing technique from the Guide ch3
|
[19:05] pieterh
|
where the worker signals 'ready' and then gets a job
|
[19:05] pieterh
|
when the worker has processed its job it just terminates
|
[19:05] pieterh
|
it won't signal 'ready' again, so won't get another batch
|
[19:05] pieterh
|
you control the batch size explicitly at the sender side
|
[19:06] twomashi1
|
ah, with LRU the worker must signal to get a job?
|
[19:06] pieterh
|
now the problems with this:
|
[19:06] pieterh
|
yes
|
[19:06] pieterh
|
it's a nice model except it's chatty
|
[19:06] twomashi1
|
chatty is fine, it must be safe.
|
[19:06] pieterh
|
so if you have many small jobs the to and fro will cost too much
|
[19:06] pieterh
|
well, study the lruqueue example
|
[19:06] pieterh
|
you can probably use that as a device in front of your existing client/s
|
[19:06] pieterh
|
with some small mods it'll do what you want
|
[19:07] twomashi1
|
that could work because the job server will be on the same machine.
|
[19:07] pieterh
|
sure
|
[19:07] twomashi1
|
so I dont think chatty is an issue
|
[19:07] pieterh
|
so use the LRU device and then in the workers, die after doing X jobs
|
[19:07] pieterh
|
put $5 in the can on the way out :-)
|
[19:07] twomashi1
|
hehe
|
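[Editor's note] A sketch of the worker side of the scheme pieterh describes (the Guide's lruqueue pattern): signal READY, receive a job, repeat, and simply stop asking after a fixed number of jobs so the queue never hands this worker more work. The endpoint, the READY framing and the job handling are placeholders; assumes the 2.1-era C API:

    #include <zmq.h>
    #include <string.h>

    #define MAX_JOBS 100

    int main (void)
    {
        void *ctx = zmq_init (1);
        void *sock = zmq_socket (ctx, ZMQ_REQ);
        zmq_connect (sock, "ipc:///tmp/jobs");      /* the LRU queue device */

        int done;
        for (done = 0; done < MAX_JOBS; done++) {
            zmq_msg_t msg;

            /* tell the queue we're ready for exactly one more job */
            zmq_msg_init_size (&msg, 5);
            memcpy (zmq_msg_data (&msg), "READY", 5);
            zmq_send (sock, &msg, 0);
            zmq_msg_close (&msg);

            /* receive and process one job (or one batch, if jobs are batched) */
            zmq_msg_init (&msg);
            zmq_recv (sock, &msg, 0);
            /* ... do the work ... */
            zmq_msg_close (&msg);
        }
        zmq_close (sock);               /* no more READY -> no more jobs */
        zmq_term (ctx);
        return 0;
    }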
[19:08] twomashi1
|
Ok, and do you think that in some future there will be a way to support this usage case by instructing the context to stop receiving for a socket, or something like that?
|
[19:08] pieterh
|
well...
|
[19:08] pieterh
|
i don't think it has to be built in
|
[19:08] twomashi1
|
i imagine that if it doesnt go against the goals and policies of the project it could happen, if someone put the time in.
|
[19:08] pieterh
|
it works very nicely at the level it's at now
|
[19:09] pieterh
|
to answer that in technical detail...
|
[19:09] pieterh
|
if you want to regulate how much the sender sends
|
[19:09] pieterh
|
then you must send information back explicitly to synchronize it
|
[19:10] pieterh
|
you could call this an ack windw
|
[19:10] pieterh
|
ack window
|
[19:10] pieterh
|
it only makes sense in an asynchronous request-reply model
|
[19:10] pieterh
|
so yes, it *might* be added to the XREP socket
|
[19:11] twomashi1
|
I see what you mean... the effect can be reproduced using the currently available facilities
|
[19:11] twomashi1
|
thanks for your help!
|
[19:18] cremes
|
OSX sysctl is very odd... see my update on the ML
|
[19:22] mato
|
gandhijee: there is a Debian package of ZeroMQ 2.0.10, I maintain it
|
[19:22] mato
|
gandhijee: it's in Debian unstable
|
[19:22] gandhijee
|
sweet!!
|
[20:25] gandhijee
|
how do i have zeromq listen on a spec interface?
|
[20:25] gandhijee
|
right now i have 2 in the machine that is a client, and it doesn't seem to get messages over the network for some reason
|
[20:45] Guthur
|
gandhijee, Spec Interface?
|
[20:46] Guthur
|
maybe you want to check out IO polling
|
[20:46] gandhijee
|
its ok i figured it out
|
[20:46] gandhijee
|
it was something super silly, its been a long day
|
[20:46] Guthur
|
hehe no bother, we all have those days
|
[21:11] Ben
|
hi all - I have a question about using sockets from different threads in C++. I'm on the latest code from the git repository master branch. The scenario is that I've created a socket in one thread, but I use it from another thread. Only one thread ever uses the socket at a time. I thought this was possible with the latest code, but I am not seeing any messages come into my test receiver on the other end.
|
[21:12] Ben
|
another question - my test receiver is written in Python and is on 2.0.8. Do I need to upgrade this to 2.1 as well in order to get it to work?
|
[21:15] ngerakines
|
from my experience its a bad idea and leads to unexpected results
|
[21:15] Ben
|
which part? mixing zmq versions or migrated between threads?
|
[21:15] ngerakines
|
I think the docs say in bold that sockets should never be shared across threads
|
[21:15] Ben
|
it does for 2.0
|
[21:16] Ben
|
but in 2.1 there is a statement saying that this is now legal
|
[21:16] Ben
|
I know it isn't released yet - I'm doing this out of git
|
[21:16] Ben
|
it may just be that it isn't ready
|
[21:16] ngerakines
|
I don't know then, sorry
|
[21:23] Guthur
|
is there really a need for the wuserver example to be binding to ipc
|
[21:23] Guthur
|
It's not used, and it just makes the example incompatible with windows
|
[21:25] Ben
|
if anyone is curious about my earlier question - moving my test client to 2.1 from 2.0.8 did indeed fix the problem.
|
[22:03] mikko
|
sustrik: how do i recognise you?
|
[22:04] sustrik
|
mikko: i've sent you my number
|
[22:04] sustrik
|
have you got it?
|
[22:04] mikko
|
yes
|
[22:04] sustrik
|
let me find some picture...
|
[22:05] mikko
|
http://farm4.static.flickr.com/3297/3626630182_006c6ba2c0.jpg
|
[22:05] mikko
|
thats me
|
[22:05] mikko
|
i dont have the beard anymore, just the moustache
|
[22:05] sustrik
|
that's me:
|
[22:05] sustrik
|
http://www.facebook.com/#!/photo.php?fbid=1403475059607&set=t.1233485121
|
[22:05] sustrik
|
guy with the tuba
|
[22:05] mikko
|
ok
|
[22:06] mikko
|
and it's not that large pub
|
[22:06] sustrik
|
ok
|
[22:07] sustrik
|
ah, jon dyte mentioned he's going to arrive
|
[22:19] pieterh
|
sustrik: you take a tube to all 0MQ meetups?
|
[22:19] pieterh
|
*tuba
|
[22:19] pieterh
|
sorry, it's been a long day :-)
|
[22:38] lestrrat
|
is fd_t only available in C++ land? (for getsockopt( ... ZMQ_FD ))
|
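[Editor's note] lestrrat's question goes unanswered in the log. In the 2.1 C API the descriptor comes back from zmq_getsockopt () as a plain int on POSIX (a SOCKET on Windows); fd_t itself is only a C++-side typedef. A minimal sketch:

    #include <zmq.h>
    #include <stddef.h>

    int get_notification_fd (void *sock)
    {
        int fd;
        size_t fd_size = sizeof fd;
        zmq_getsockopt (sock, ZMQ_FD, &fd, &fd_size);
        return fd;   /* readiness fd, usable with poll()/select() */
    }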
[22:51] Guthur
|
sustrik, I pushed the clrzmq2 code to the repo
|
[22:53] Guthur
|
I kept the same .NET assembly name, but bumped the version up to 2.0.0.0
|
[22:53] Guthur
|
It's easy for people to set there required version in MSVC, and MonoDevelop
|
[22:53] Guthur
|
their*
|