Monday September 20, 2010

[Time] Name Message
[03:20] stoneman hello
[03:20] stoneman is anyone here?
[03:27] dbudworth Trying to figure out if 0mq is appropriate for my project, simple description would be something like a sticky load balancer. once a client talks to a service, it sticks with that one. for a bi-directional conversation between 2 nodes once an initial node has been selected. ie: my client does round robin on each new conversation but updates to a given conversation have to stick to the beginning server
[03:27] dbudworth and hello stoneman, i'm here. but basically not useful if you are looking for help
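The "sticky load balancer" dbudworth describes (round robin for each new conversation, then every update goes back to the server that started it) can be sketched independently of any messaging library. This is a minimal illustration only: the `StickyRouter` class, its method names, and the server labels are hypothetical and are not part of 0MQ.

```python
from itertools import cycle


class StickyRouter:
    """Hypothetical helper: round robin for new conversations, sticky afterwards."""

    def __init__(self, servers):
        self._next_server = cycle(servers)  # endless round-robin iterator
        self._assigned = {}                 # conversation id -> chosen server

    def route(self, conversation_id):
        # A new conversation takes the next server in the rotation;
        # a known conversation always returns to its original server.
        if conversation_id not in self._assigned:
            self._assigned[conversation_id] = next(self._next_server)
        return self._assigned[conversation_id]

    def end(self, conversation_id):
        # Forget a finished conversation so the table does not grow forever.
        self._assigned.pop(conversation_id, None)
```

The client would call `route()` before every send and `end()` when a conversation finishes; how the chosen server is actually addressed is left open here.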
[09:10] mato sustrik: i was just thinking about the zmq_term()/zmq_close() semantics
[09:10] mato sustrik: and i came up with one small detail
[09:11] mato sustrik: which may or may not be helpful in the case where zmq_term() would wait indefinitely on exit
[09:11] mato sustrik: the thing is, syscall close() means "commit".
[09:12] mato sustrik: if you don't close your filedescriptors, and don't use fsync() (not for sockets obviously), then there is no commitment on the part of the kernel
[09:13] mato sustrik: hence, as far as sockets with data in-flight at zmq_term() time are concerned, we only need care about those that zmq_close() has actually been called on
[09:13] mato sustrik: for any other sockets, all bets are off
[09:19] sustrik mato: sure, but user has to close all the sockets anyway
[09:19] sustrik otherwise he'll end up with memory leaks
[09:19] mato sustrik: i was thinking more of the case of "the application terminating in an abnormal way"
[09:20] sustrik no guarantees then
[09:20] mato sure, what i mean is...
[09:20] mato if an application handles e.g. SIGINT/SIGTERM
[09:20] mato should it go and call zmq_term() as part of its exit process in such a case?
[09:20] mato or explicitly not do that, because the call may block...?
[09:21] mato i guess the answer is application dependent
[09:21] sustrik no idea
[09:21] mato oh, and another thing
[09:21] mato we talked about the non-zero-copy APIs
[09:22] mato and i have the most obvious names :)
[09:22] mato zmq_sendcopy and zmq_recvcopy ... ?
[09:22] mato would do fine for 2.1, in 3.x the whole mess can be done the right way
[09:23] sustrik dunno, does it make sense to introduce something that will be changed anyway?
[09:23] sustrik it's just asking for problems
[09:23] mato why?
[09:24] mato well, one reason is that pieter uses all these "helper" functions in the user guide precisely for this reason
[09:25] sustrik so, we'll have send, sendcopy and the helper function
[09:25] sustrik :)
[09:25] mato no, the helpers can go away
[09:25] mato check with pieter, but imo the reason those are there is because zerocopy in C is "too much typing" or something :)
[09:25] mato at least that's what he told me
[09:25] sustrik anyway, first thing on this path is to define the semantics for recvcopy when the buffer is not large enough
[09:27] mato yeah, good point
[09:27] mato presumably some error will have to be returned ...
[09:27] mato SCTP works in terms of messages as atomic units?
[09:30] sustrik mato: yes
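The question raised at 09:25 (what a copying receive should do when the caller's buffer is too small) is left open in the log. One possible answer, assuming messages are atomic units as in the SCTP comparison, is to fail without consuming the message. The sketch below models that choice with a plain in-memory queue; `recv_copy` and `BufferTooSmall` are hypothetical names and are not the proposed `zmq_recvcopy` or any real 0MQ API.

```python
from collections import deque


class BufferTooSmall(Exception):
    """Hypothetical error: the caller's buffer cannot hold the whole message."""

    def __init__(self, needed):
        super().__init__(f"message needs {needed} bytes")
        self.needed = needed


def recv_copy(queue, buf):
    """Copy the next whole message from `queue` into `buf`; return its length.

    The message is atomic: if it does not fit, nothing is copied and it
    stays queued, so the caller can retry with a larger buffer.
    """
    msg = queue[0]  # peek only; raises IndexError if the queue is empty
    if len(msg) > len(buf):
        raise BufferTooSmall(len(msg))
    buf[:len(msg)] = msg  # same-length slice assignment keeps len(buf) fixed
    queue.popleft()
    return len(msg)
```

Reporting the required size in the error lets the caller resize and retry; truncating silently would be the other obvious design, at the cost of losing the tail of the message.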
[13:36] user was hoping someone can help me
[13:37] user cant download anything always comes up with an error
[13:43] drbobbeaty I was just able to download the POSIX version from the website:
[13:44] user whats that?
[13:44] drbobbeaty The POSIX version is the version for Linux, etc. There's also a Windows version on the same page.
[13:44] mato sustrik: are you around?
[13:45] mato sustrik: I've found a silly bug in the select() impl. of zmq_poll() on master
[13:45] sustrik mato: here am I
[13:45] mato sustrik: trivial fix, but something else is broken when using select()
[13:45] user k. my error code tells me: Archive: /media/OFFICE12/setup.exe
[13:45] user [/media/OFFICE12/setup.exe]
[13:45] user End-of-central-directory signature not found. Either this file is not
[13:45] user a zipfile, or it constitutes one disk of a multi-part archive. In the
[13:45] user latter case the central directory and zipfile comment will be found on
[13:45] user the last disk(s) of this archive.
[13:45] user note: /media/OFFICE12/setup.exe may be a plain executable, not an archive
[13:45] user zipinfo: cannot find zipfile directory in one of /media/OFFICE12/setup.exe or
[13:45] user /media/OFFICE12/, and cannot find /media/OFFICE12/setup.exe.ZIP, period.
[13:46] mato user: I'm sorry, you're probably on the wrong chat room here.
[13:46] user where do i need to go?
[13:47] sustrik :)
[13:47] sustrik mato: wull?
[13:48] mato sustrik: wull?
[13:48] sustrik well?
[13:48] mato hang on
[13:48] mato i'm on bloody win32
[13:48] mato everything is confusing :-)
[13:48] mato patience...
[13:49] mato sustrik: ok, 1st, somewhere around line 547 of zmq.cpp, the select () call needs to be changed to use maxfd + 1
[13:49] mato sustrik: not just maxfd
[13:49] mato sustrik: that's probably my fault
[13:49] sustrik ok
[13:50] mato sustrik: now, then, what i'm seeing is on win32 _or_ on Linux with ZMQ_FORCE_SELECT (and I patched zmq.cpp to also use select() for zmq_poll() when ZMQ_FORCE_SELECT is defined)
[13:50] mato sustrik: for some reason a socket is not becoming ready on the app side when it should
[13:51] mato sustrik: i.e. data gets sent down XREP, XREQ on the client side never becomes ready
[13:51] sustrik do you have a simple test program?
[13:52] mato working on it...
[13:52] sustrik ok
[14:20] mato sustrik: ok, i have a test case... will msg you, it's on the test box
[14:45] CIA-20 zeromq2: Martin Lucina master * r1abfc92 / src/zmq.cpp : minor problem in zmq_poll (select version) fixed -
[14:55] CIA-20 zeromq2: Martin Lucina master * rf49b77e / src/zmq.cpp : zmq_poll honours ZMQ_FORCE_POLL and ZMQ_FORCE_SELECT options -
[15:16] sustrik mato: afaics the problem is that ZMQ_FD is edge-triggered
[15:17] sustrik thus, IN/OUT flag may be set in the past, but the select/poll won't exit because of it
[15:18] sustrik mato: check how poll version of zmq_poll works
[15:20] mato sustrik: the code seems equivalent, no?
[15:20] sustrik nope
[15:21] mato then i don't understand the problem...
[15:21] sustrik you have to delete lines 572-577
[15:21] sustrik let me do it
[15:21] mato which lines, i have different line numbers here...
[15:22] mato and i'd like to understand what the problem is
[15:22] sustrik you should _not_ check POLLIN is set on ZMQ_FD and check ZMQ_EVENTS anyway
[15:22] sustrik whather
[15:22] sustrik ehether
[15:23] sustrik whether
[15:23] sustrik :)
[15:23] mato ?
[15:23] sustrik ...whether POLLIN is set...
[15:24] mato sustrik: hmm, so you're saying that each ZMQ_FD needs to be checked every time you come out of the select/poll() ?
[15:24] CIA-20 zeromq2: Martin Sustrik master * r4d51a52 / src/zmq.cpp : zmq_poll (select version) now correctly assumes that ZMQ_FD is edge-trigerred -
[15:25] sustrik yes
[15:25] sustrik committed
[15:25] sustrik the test program seems to work
[15:26] mato i still don't understand... select would not have exited if the fd did not become ready...?
[15:27] sustrik what?
[15:27] mato sustrik: if you're sitting in select(), and the notify fd becomes ready, then you read the events
[15:27] mato sustrik: what is the other code path?
[15:28] sustrik the commands in ZMQ_FD were already processed before calling zmq_poll
[15:28] sustrik select blocks forever
[15:28] sustrik although there are messages available
[15:28] mato processed by who?
[15:28] sustrik random previous command
[15:29] mato ah, right, this is because ZMQ_FD is tapping straight into the signaller
[15:29] mato mumble
[15:29] sustrik ack
[15:29] mato i wish we could fix that
[15:29] sustrik ?
[15:29] sustrik it's fixed
[15:29] mato this is also why you do that first_pass thing, right?
[15:29] sustrik yes
[15:29] sustrik no timeout on first pass
[15:30] sustrik exit immediately
[15:30] sustrik then check whether events are available
[15:30] mato this is to pick up events coming from previously processed commands, right?
[15:30] sustrik yes
[15:30] mato ok, understood
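The failure mode discussed above can be reproduced with a toy model: the notification fd is only a signal that commands arrived, so once some earlier call has drained it, select() sees nothing even though a message is waiting. The sketch below is an illustration under stated assumptions, not the zmq.cpp code: `ToySocket`, `naive_poll`, and `poll` are hypothetical names, and `has_events()` merely stands in for reading ZMQ_EVENTS.

```python
import select
import socket
from collections import deque


class ToySocket:
    """Hypothetical stand-in for a socket with an edge-triggered notify fd."""

    def __init__(self):
        self._rd, self._wr = socket.socketpair()
        self._rd.setblocking(False)
        self._commands = deque()  # delivered but not yet processed
        self._inbox = deque()     # messages ready for the application

    def fileno(self):
        return self._rd.fileno()

    def deliver(self, msg):
        # What a peer would do: queue a command, then signal once.
        self._commands.append(msg)
        self._wr.send(b"\x01")

    def process_commands(self):
        # Any call on the socket may drain the signal bytes as a side effect.
        try:
            while self._rd.recv(64):
                pass
        except BlockingIOError:
            pass
        while self._commands:
            self._inbox.append(self._commands.popleft())

    def has_events(self):
        self.process_commands()
        return bool(self._inbox)


def naive_poll(sock, timeout):
    # Broken: trusts the fd alone, so it misses already-processed commands.
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable) and sock.has_events()


def poll(sock, timeout):
    # First pass uses a zero timeout; events are checked after every
    # select() regardless of whether the fd was reported readable.
    first_pass = True
    while True:
        select.select([sock], [], [], 0 if first_pass else timeout)
        if sock.has_events():
            return True
        if not first_pass:
            return False
        first_pass = False
```

If a message is delivered and `process_commands()` runs before polling, `naive_poll` times out while `poll` reports the message on its first pass.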
[15:38] sustrik mato: btw, if you want to optimise it you can still perform the check when !first_pass
[15:39] sustrik getting ZMQ_EVENTS can be rather slow as it involves reading from the signaler => recv()
[15:40] mato right, that might be a good idea
[15:40] mato also, is getting ZMQ_FD slow?
[15:41] mato since that is also done more times than strictly necessary
[15:41] mato sustrik: btw, the signaler uses socketpair?
[15:41] mato sustrik: which translates to a tcp connection with winsock?
[15:44] sustrik yes
[15:44] mato interesting, i get an occasional "Address already in use" on Win32 from signaler.cpp:80
[15:45] sustrik let me see
[15:45] mato I guess this is just the poor M$ Win$ock running out of tcp ports or something
[15:45] mato it's the "connect to remote peer" call...
[15:45] sustrik EADDRINUSE on connect?
[15:45] mato yeah
[15:45] mato bizarre
[15:46] sustrik hm, it's documented
[15:46] mato "not enough ports" ?
[15:47] sustrik very useful description:
[15:47] sustrik [EADDRINUSE]
[15:47] sustrik Attempt to establish a connection that uses addresses that are already in use.
[15:47] sustrik (that's POSIX)
[15:48] mato sustrik: windoze is slightly different
[15:48] sustrik Linux: EADDRINUSE
[15:48] sustrik Local address is already in use.
[15:48] mato
[15:49] sustrik The socket's local address is already in use and the socket was not marked to allow address reuse with SO_REUSEADDR. This error usually occurs when executing bind, but could be delayed until the connect function if the bind was to a wildcard address (INADDR_ANY or in6addr_any) for the local IP address. A specific address needs to be implicitly bound by the connect function.
[15:49] mato Yeah, I never quite understood the SO_REUSEADDR semantics on win32
[15:49] sustrik does it make any sense to you?
[15:49] mato but i'll try adding that in and see if it changes anything
[15:50] mato well, i think what it's saying is "the wildcard bind() picked a port that is still in time-wait state"
[15:50] mato or some nonsense like that
[15:50] sustrik :|
[15:50] mato let me try adding SO_REUSEADDR for win32 to the signaler listen socket and see what happens
[15:50] mato i don't care what it actually does on win32 as long as the error goes away :-)
[15:51] mato proper win32 solution is obviously to use named pipes/win32 objects/whatsits and iocp
[15:54] sustrik ok
[16:07] mato hmm
[16:07] mato i dunno
[16:07] mato doesn't seem to help much
[16:07] mato anyway, this doesn't really matter
[16:08] mato i added SO_REUSEADDR to both ends of the emulated socketpair and still get EADDRINUSE back
[16:08] mato so i think it's just poor windows running out of ports to auto-assign or something :-)
[16:08] mato sustrik: anyway, the problem with select() has been fixed, so it's all good!
[16:11] sustrik nice
[16:11] sustrik you may file a bug report for the EADDRINUSE stuff
[16:24] mato done
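For readers unfamiliar with the "emulated socketpair" mentioned above, the general technique is a loopback TCP listener that accepts exactly one connection from the same process. The sketch below shows that technique with SO_REUSEADDR set on the listener, as mato tried. It is not the signaler.cpp code: `tcp_socketpair` is a hypothetical name, and binding to an ephemeral port (port 0) is an assumption made here for simplicity.

```python
import socket


def tcp_socketpair():
    """Hypothetical emulation of socketpair() over a loopback TCP connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Set before bind(); its exact effect differs between platforms.
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)

        writer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            writer.connect(listener.getsockname())
            reader, _ = listener.accept()
        except OSError:
            writer.close()  # do not leak the half-built pair on failure
            raise
    finally:
        listener.close()  # the listener is only needed to form the pair
    return reader, writer
```

Each call consumes loopback ports that may linger after close, which is consistent with mato's guess that creating many such pairs in quick succession can exhaust the auto-assigned range.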
[20:00] cpscotti Hello there, anyone up to a philosophical (although basic) discussion on a zmq networking topology for a "generic" application? Regarding many clients & services but without a broker.
[20:17] cremes cpscotti: i'd be happy to have that conversation with you tomorrow; i have to leave for the rest of today
[20:19] cpscotti cremes: thanks.. if nothing is solved in my side of things I'll try tomorrow then
[20:19] cremes ok
[21:07] ModusPwnens is the send function faster than the receive function? or are they both just as fast?
[21:08] cpscotti send doesn't block, recv blocks
[21:08] cpscotti (as for the "program flow" speed)
[21:08] cpscotti now for the underlying stuff, dunno
[21:08] ModusPwnens even with pub/sub topology?
[21:08] jhawk28 recv can do a noblock flag
[21:08] cpscotti awl.. yep..
[21:08] ModusPwnens I see.
[21:09] ModusPwnens So that would explain benchmarking tests where
[21:09] ModusPwnens i run several in a row
[21:09] ModusPwnens
[21:09] ModusPwnens thats probably better than explaining it
[21:10] ModusPwnens I've sort of run into a wall with this and have been stuck on it for several days
[21:10] ModusPwnens because I am seeing those results
[21:10] ModusPwnens and am not sure what to make of them
[22:41] larrytheliquid are there any semantic implications to connecting multiple times to the same address?
[22:41] larrytheliquid from the same socket, that is