ZeroMQ IRC Log

Monday September 20, 2010

[Time] Name	Message
[03:20] stoneman	hello
[03:20] stoneman	is anyone here?
[03:27] dbudworth	Trying to figure out if 0mq is appropriate for my project. A simple description would be something like a sticky load balancer: once a client talks to a service, it sticks with that one, for a bi-directional conversation between 2 nodes once an initial node has been selected. i.e. my client does round robin on each new conversation, but updates to a given conversation have to stick to the server that began it
[03:27] dbudworth	and hello stoneman, i'm here. but basically not useful if you are looking for help
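
The sticky routing dbudworth describes can be sketched in a few lines. This is only a minimal illustration in plain Python, not 0MQ code: the class name `StickyBalancer`, the server names, and the conversation ids are all hypothetical, and a real deployment would also have to deal with servers going away.

```python
from itertools import cycle


class StickyBalancer:
    """Round robin for new conversations, sticky for existing ones (hypothetical helper)."""

    def __init__(self, servers):
        self._next_server = cycle(servers)  # endless round-robin iterator
        self._assigned = {}                 # conversation id -> chosen server

    def route(self, conversation_id):
        # The first message of a conversation takes the next server in rotation;
        # every later message for that conversation reuses the same server.
        if conversation_id not in self._assigned:
            self._assigned[conversation_id] = next(self._next_server)
        return self._assigned[conversation_id]

    def end(self, conversation_id):
        # Forget the mapping once the conversation is over.
        self._assigned.pop(conversation_id, None)


balancer = StickyBalancer(["server-a", "server-b"])
first = balancer.route("conv-1")   # new conversation, first server in rotation
second = balancer.route("conv-2")  # new conversation, next server in rotation
again = balancer.route("conv-1")   # existing conversation, same server as before
```

The point of the sketch is that the round-robin state and the per-conversation table are separate: rotation only advances when a new conversation starts.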
[09:10] mato	sustrik: i was just thinking about the zmq_term()/zmq_close() semantics
[09:10] mato	sustrik: and i came up with one small detail
[09:11] mato	sustrik: which may or may not be helpful in the case where zmq_term() would wait indefinitely on exit
[09:11] mato	sustrik: the thing is, syscall close() means "commit".
[09:12] mato	sustrik: if you don't close your file descriptors, and don't use fsync() (not for sockets obviously), then there is no commitment on the part of the kernel
[09:13] mato	sustrik: hence, as far as sockets with data in-flight at zmq_term() time are concerned, we only need care about those that zmq_close() has actually been called on
[09:13] mato	sustrik: for any other sockets, all bets are off
[09:19] sustrik	mato: sure, but user has to close all the sockets anyway
[09:19] sustrik	otherwise he'll end up with memory leaks
[09:19] mato	sustrik: i was thinking more of the case of "the application terminating in an abnormal way"
[09:20] sustrik	no guarantees then
[09:20] mato	sure, what i mean is...
[09:20] mato	if an application handles e.g. SIGINT/SIGTERM
[09:20] mato	should it go and call zmq_term() as part of its exit process in such a case?
[09:20] mato	or explicitly not do that, because the call may block...?
[09:21] mato	i guess the answer is application dependent
[09:21] sustrik	no idea
[09:21] mato	oh, and another thing
[09:21] mato	we talked about the non-zero-copy APIs
[09:22] mato	and i have the most obvious names :)
[09:22] mato	zmq_sendcopy and zmq_recvcopy ... ?
[09:22] mato	would do fine for 2.1, in 3.x the whole mess can be done the right way
[09:23] sustrik	dunno, does it make sense to introduce something that will be changed anyway?
[09:23] sustrik	it's just asking for problems
[09:23] mato	why?
[09:24] mato	well, one reason is that pieter uses all these "helper" functions in the user guide precisely for this reason
[09:25] sustrik	so, we'll have send, sendcopy and the helper function
[09:25] sustrik	:)
[09:25] mato	no, the helpers can go away
[09:25] mato	check with pieter, but imo the reason those are there is because zerocopy in C is "too much typing" or something :)
[09:25] mato	at least that's what he told me
[09:25] sustrik	anyway, first thing on this path is to define the semantics for recvcopy when the buffer is not large enough
[09:27] mato	yeah, good point
[09:27] mato	presumably some error will have to be returned ...
[09:27] mato	SCTP works in terms of messages as atomic units?
[09:30] sustrik	mato: yes
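
One possible answer to the "buffer not large enough" question is to treat the message as an atomic unit and fail rather than truncate. The sketch below is purely illustrative: `recv_copy` is a hypothetical Python stand-in, not the proposed zmq_recvcopy and not any existing 0MQ API, and the choice of EMSGSIZE as the error is an assumption.

```python
import errno


def recv_copy(message: bytes, buffer: bytearray) -> int:
    """Copy one whole message into a caller-supplied buffer (hypothetical helper).

    Assumed policy: a message is atomic, so it is either copied completely
    or an error is raised and the buffer is left untouched.
    """
    if len(message) > len(buffer):
        raise OSError(errno.EMSGSIZE, "buffer too small for message")
    buffer[:len(message)] = message  # same-length slice assignment, buffer size unchanged
    return len(message)


buf = bytearray(8)
copied = recv_copy(b"hello", buf)  # fits: 5 bytes are copied into the 8-byte buffer
```

The alternative policy, silently truncating and reporting the original length, is what datagram sockets typically do; which one a copying receive call should follow is exactly the open question in the discussion above.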
[13:36] user	was hoping someone can help me
[13:37] user	can't download anything, always comes up with an error
[13:43] drbobbeaty	I was just able to download the POSIX version from the website: http://www.zeromq.org/area:download
[13:44] user	whats that?
[13:44] drbobbeaty	The POSIX version is the version for Linux, etc. There's also a Windows version on the same page.
[13:44] mato	sustrik: are you around?
[13:45] mato	sustrik: I've found a silly bug in the select() impl. of zmq_poll() on master
[13:45] sustrik	mato: here am I
[13:45] mato	sustrik: trivial fix, but something else is broken when using select()
[13:45] user	k. my error code tells me: Archive: /media/OFFICE12/setup.exe
[13:45] user	[/media/OFFICE12/setup.exe]
[13:45] user	End-of-central-directory signature not found. Either this file is not
[13:45] user	a zipfile, or it constitutes one disk of a multi-part archive. In the
[13:45] user	latter case the central directory and zipfile comment will be found on
[13:45] user	the last disk(s) of this archive.
[13:45] user	note: /media/OFFICE12/setup.exe may be a plain executable, not an archive
[13:45] user	zipinfo: cannot find zipfile directory in one of /media/OFFICE12/setup.exe or
[13:45] user	/media/OFFICE12/setup.exe.zip, and cannot find /media/OFFICE12/setup.exe.ZIP, period.
[13:46] mato	user: I'm sorry, you're probably on the wrong chat room here.
[13:46] user	where do i need to go?
[13:47] sustrik	:)
[13:47] sustrik	mato: wull?
[13:48] mato	sustrik: wull?
[13:48] sustrik	well?
[13:48] mato	hang on
[13:48] mato	i'm on bloody win32
[13:48] mato	everything is confusing :-)
[13:48] mato	patience...
[13:49] mato	sustrik: ok, 1st, somewhere around line 547 of zmq.cpp, the select () call needs to be changed to use maxfd + 1
[13:49] mato	sustrik: not just maxfd
[13:49] mato	sustrik: that's probably my fault
[13:49] sustrik	ok
[13:50] mato	sustrik: now, then, what i'm seeing is on win32 _or_ on Linux with ZMQ_FORCE_SELECT (and I patched zmq.cpp to also use select() for zmq_poll() when ZMQ_FORCE_SELECT is defined)
[13:50] mato	sustrik: for some reason a socket is not becoming ready on the app side when it should
[13:51] mato	sustrik: i.e. data gets sent down XREP, XREQ on the client side never becomes ready
[13:51] sustrik	do you have a simple test program?
[13:52] mato	working on it...
[13:52] sustrik	ok
[14:20] mato	sustrik: ok, i have a test case... will msg you, it's on the test box
[14:45] CIA-20	zeromq2: Martin Lucina master * r1abfc92 / src/zmq.cpp : minor problem in zmq_poll (select version) fixed - http://bit.ly/c9Sdnb
[14:55] CIA-20	zeromq2: Martin Lucina master * rf49b77e / src/zmq.cpp : zmq_poll honours ZMQ_FORCE_POLL and ZMQ_FORCE_SELECT options - http://bit.ly/dzi76e
[15:16] sustrik	mato: afaics the problem is that ZMQ_FD is edge-triggered
[15:17] sustrik	thus, the IN/OUT flag may have been set in the past, but select/poll won't exit because of it
[15:18] sustrik	mato: check how poll version of zmq_poll works
[15:20] mato	sustrik: the code seems equivalent, no?
[15:20] sustrik	nope
[15:21] mato	then i don't understand the problem...
[15:21] sustrik	you have to delete lines 572-577
[15:21] sustrik	let me do it
[15:21] mato	which lines, i have different line numbers here...
[15:22] mato	and i'd like to understand what the problem is
[15:22] sustrik	you should _not_ check POLLIN is set on ZMQ_FD and check ZMQ_EVENTS anyway
[15:22] sustrik	whather
[15:22] sustrik	ehether
[15:23] sustrik	whether
[15:23] sustrik	:)
[15:23] mato	?
[15:23] sustrik	...whether POLLIN is set...
[15:24] mato	sustrik: hmm, so you're saying that each ZMQ_FD needs to be checked every time you come out of the select/poll() ?
[15:24] CIA-20	zeromq2: Martin Sustrik master * r4d51a52 / src/zmq.cpp : zmq_poll (select version) now correctly assumes that ZMQ_FD is edge-trigerred - http://bit.ly/dklauP
[15:25] sustrik	yes
[15:25] sustrik	committed
[15:25] sustrik	the test program seems to work
[15:26] mato	i still don't understand... select would not have exited if the fd did not become ready...?
[15:27] sustrik	what?
[15:27] mato	sustrik: if you're sitting in select(), and the notify fd becomes ready, then you read the events
[15:27] mato	sustrik: what is the other code path?
[15:28] sustrik	the commands in ZMQ_FD were already processed before calling zmq_poll
[15:28] sustrik	select blocks forever
[15:28] sustrik	although there are messages available
[15:28] mato	processed by who?
[15:28] sustrik	random previous command
[15:29] mato	ah, right, this is because ZMQ_FD is tapping straight into the signaller
[15:29] mato	mumble
[15:29] sustrik	ack
[15:29] mato	i wish we could fix that
[15:29] sustrik	?
[15:29] sustrik	it's fixed
[15:29] mato	this is also why you do that first_pass thing, right?
[15:29] sustrik	yes
[15:29] sustrik	no timeout on first pass
[15:30] sustrik	exit immediately
[15:30] sustrik	then check whether events are available
[15:30] mato	this is to pick up events coming from previously processed commands, right?
[15:30] sustrik	yes
[15:30] mato	ok, understood
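
The edge-triggered behaviour discussed above can be reproduced without 0MQ. The sketch below is a toy model, not the zmq_poll implementation: `Mailbox` is a hypothetical stand-in for a socket whose notification fd (like ZMQ_FD) only signals new arrivals, `has_input()` plays the role of reading ZMQ_EVENTS, and the two poll functions are assumptions made to contrast the broken and the fixed logic.

```python
import select
import socket
from collections import deque


class Mailbox:
    """Toy socket with an edge-triggered notification fd (hypothetical)."""

    def __init__(self):
        self._rx, self._tx = socket.socketpair()
        self._rx.setblocking(False)
        self._queue = deque()

    def fileno(self):
        # Lets select() wait on the notification end, like ZMQ_FD.
        return self._rx.fileno()

    def post(self, message):
        self._queue.append(message)
        self._tx.send(b"\x01")  # one signal byte per arrival

    def has_input(self):
        # Like reading ZMQ_EVENTS: drains pending signals, then reports state.
        try:
            while self._rx.recv(64):
                pass
        except BlockingIOError:
            pass
        return bool(self._queue)


def naive_poll(box, timeout):
    # Broken: only looks at the queue if the fd itself became readable.
    readable, _, _ = select.select([box], [], [], timeout)
    return bool(readable) and box.has_input()


def edge_aware_poll(box, timeout):
    # Fixed: zero timeout on the first pass, and the event state is checked
    # after every select() regardless of what select() reported.
    first_pass = True
    while True:
        select.select([box], [], [], 0 if first_pass else timeout)
        if box.has_input():
            return True
        if not first_pass:
            return False
        first_pass = False


box = Mailbox()
box.post("hello")
box.has_input()                      # an earlier call consumes the signal
naive = naive_poll(box, 0.1)         # times out although a message is queued
aware = edge_aware_poll(box, 0.1)    # finds the message on the first pass
```

Here `naive` ends up False and `aware` ends up True: once some earlier operation has drained the signal, only an unconditional check of the event state notices the queued message.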
[15:38] sustrik	mato: btw, if you want to optimise it you can still perform the check when !first_pass
[15:39] sustrik	getting ZMQ_EVENTS can be rather slow as it involves reading from the signaler => recv()
[15:40] mato	right, that might be a good idea
[15:40] mato	also, is getting ZMQ_FD slow?
[15:41] mato	since that is also done more times than strictly necessary
[15:41] mato	sustrik: btw, the signaler uses socketpair?
[15:41] mato	sustrik: which translates to a tcp connection with winsock?
[15:44] sustrik	yes
[15:44] mato	interesting, i get an occasional "Address already in use" on Win32 from signaler.cpp:80
[15:45] sustrik	let me see
[15:45] mato	I guess this is just the poor M$ WIn$ock running out of tcp ports or something
[15:45] mato	it's the "connect to remote peer" call...
[15:45] sustrik	EADDRINUSE on connect?
[15:45] mato	yeah
[15:45] mato	bizarre
[15:46] sustrik	hm, it's documented
[15:46] mato	"not enough ports" ?
[15:47] sustrik	very useful description:
[15:47] sustrik	[EADDRINUSE]
[15:47] sustrik	Attempt to establish a connection that uses addresses that are already in use.
[15:47] sustrik	(that's POSIX)
[15:48] mato	sustrik: windoze is slightly different
[15:48] sustrik	Linux: EADDRINUSE
[15:48] sustrik	Local address is already in use.
[15:48] mato	http://msdn.microsoft.com/en-us/library/ms740668%28VS.85%29.aspx
[15:49] sustrik	The socket's local address is already in use and the socket was not marked to allow address reuse with SO_REUSEADDR. This error usually occurs when executing bind, but could be delayed until the connect function if the bind was to a wildcard address (INADDR_ANY or in6addr_any) for the local IP address. A specific address needs to be implicitly bound by the connect function.
[15:49] mato	Yeah, I never quite understood the SO_REUSEADDR semantics on win32
[15:49] sustrik	does it make any sense to you?
[15:49] mato	but i'll try adding that in and see if it changes anything
[15:50] mato	well, i think what it's saying is "the wildcard bind() picked a port that is still in time-wait state"
[15:50] mato	or some nonsense like that
[15:50] sustrik	:|
[15:50] mato	let me try adding SO_REUSEADDR for win32 to the signaler listen socket and see what happens
[15:50] mato	i don't care what it actually does on win32 as long as the error goes away :-)
[15:51] mato	proper win32 solution is obviously to use named pipes/win32 objects/whatsits and iocp
[15:54] sustrik	ok
[16:07] mato	hmm
[16:07] mato	i dunno
[16:07] mato	doesn't seem to help much
[16:07] mato	anyway, this doesn't really matter
[16:08] mato	i added SO_REUSEADDR to both ends of the emulated socketpair and still get EADDRINUSE back
[16:08] mato	so i think it's just poor windows running out of ports to auto-assign or something :-)
[16:08] mato	sustrik: anyway, the problem with select() has been fixed, so it's all good!
[16:11] sustrik	nice
[16:11] sustrik	you may fill in the bug report for the EADDRINUSE stuff
[16:24] mato	done
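
For readers unfamiliar with the emulation mato mentions, a socketpair built from a loopback TCP connection looks roughly like this. It is a minimal Python sketch under stated assumptions, not the code in signaler.cpp: the function name `tcp_socketpair` is hypothetical, binding to 127.0.0.1 with port 0 is an assumption, and SO_REUSEADDR is set only to show where the option mato tried would go. The log reports that it did not make the error go away.

```python
import socket


def tcp_socketpair():
    """Emulate socketpair() with a loopback TCP connection (hypothetical helper)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The option mato experimented with; its Win32 semantics differ from POSIX.
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)

        writer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        writer.connect(listener.getsockname())  # the "connect to remote peer" step
        reader, _ = listener.accept()
    finally:
        listener.close()  # the listening socket is only needed for setup
    return reader, writer


reader, writer = tcp_socketpair()
writer.sendall(b"\x01")  # one signal byte, as a signaller would send
data = reader.recv(1)
```

Every pair created this way consumes an ephemeral port on each end of the connection, which is consistent with mato's guess about running out of ports, though the log does not confirm the cause.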
[20:00] cpscotti	Hello there, anyone up to a philosophical (although basic) discussion on a zmq networking topology for a "generic" application? Regarding many clients & services but without a broker.
[20:17] cremes	cpscotti: i'd be happy to have that conversation with you tomorrow; i have to leave for the rest of today
[20:19] cpscotti	cremes: thanks.. if nothing is solved on my side of things I'll try tomorrow then
[20:19] cremes	ok
[21:07] ModusPwnens	is the send function faster than the receive function? or are they both just as fast?
[21:08] cpscotti	send doesn't block, recv blocks
[21:08] cpscotti	(as for the "program flow" speed)
[21:08] cpscotti	now for the underlying stuff, dunno
[21:08] ModusPwnens	even with pub/sub topology?
[21:08] jhawk28	recv can do a noblock flag
[21:08] cpscotti	awl.. yep..
[21:08] ModusPwnens	I see.
[21:09] ModusPwnens	So that would explain benchmarking tests where
[21:09] ModusPwnens	i run several in a row
[21:09] ModusPwnens	http://pastie.org/1161267
[21:09] ModusPwnens	that's probably better than explaining it
[21:10] ModusPwnens	I've sort of run into a wall with this and have been stuck on it for several days
[21:10] ModusPwnens	because I am seeing those results
[21:10] ModusPwnens	and am not sure what to make of them
[22:41] larrytheliquid	are there any semantic implications to connecting multiple times to the same address?
[22:41] larrytheliquid	from the same socket, that is