ZeroMq IRC Log

Monday June 13, 2011

[Time] Name	Message
[05:54] CIA-76	libzmq: Martin Sustrik master * rc7fb5c5 / (src/ctx.cpp src/select.cpp src/select.hpp src/windows.hpp): Reverting previous commit that broke MSVC2010 build ...
[05:54] CIA-76	libzmq: Martin Sustrik master * r970798f / builds/msvc/libzmq/libzmq.vcproj : mtrie.cpp added to MSVC build ...
[08:12] mikko	sustrik: the icc patch
[08:12] mikko	sustrik: that didnt break gcc build?
[08:40] sustrik	mikko: no, i've tested it with gcc
[08:41] sustrik	and the compilation went ok
[08:41] sustrik	did it break it for you?
[08:50] mikko	no, im just wondering why gcc is so relaxed about these things
[08:54] jsimmons	built by bearded hippies :D
[08:55] sustrik	actually, gcc gets it right, icc does not
[08:56] sustrik	zmq_assert (false); means that the execution never gets past that point
[08:56] mikko	i just viewed the patch, it looks like a missing return
[08:56] sustrik	gcc detects the fact, icc does not
[08:56] mikko	sun studio works as well?
[08:56] mikko	i think at the moment sun studio is my favourite compiler
[08:56] sustrik	let me see...
[08:56] mikko	it seems to be the strictest of the three main ones
[08:57] sustrik	sun studio looks ok
[09:01] pieterh	hi
[09:02] pieterh	mikko: sustrik: did we figure out why mingw32 didn't complain with Steve's patch?
[09:03] mikko	pieterh: i don't know what the patch fixes really
[09:03] sustrik	no idea
[09:03] sustrik	some kind of windows black magic
[09:03] mikko	i've been slightly super-busy lately and haven't been able to pay much attention
[09:03] sustrik	order of includes can break things
[09:04] pieterh	I don't like blindly applying / reverting patches
[09:04] sustrik	i warned you about this one
[09:04] pieterh	let me boot up that old virtual XP
[09:30] pieterh	sustrik: ok, I've found the cause and the fix of the win32 build issue
[09:30] sustrik	yes?
[09:31] pieterh	I've no idea why it worked before, it doesn't seem directly caused by Steve's patch
[09:31] pieterh	basically if you include winsock2.h and then later winsock.h, it'll do weird stuff
[09:31] pieterh	tries to redefine (some) of the same constants
[09:32] pieterh	it's the mswsock.h include, I think (will check)
[09:32] pieterh	the fix is, in windows.hpp, #define _WINSOCKAPI_ // stops windows.h including winsock.h
[09:34] pieterh	well, it's not that header... in fact I can't find which header is responsible :-/
[09:34] pieterh	but the fix does work, have tested that
[09:35] sustrik	you mean steve's fix?
[09:36] sustrik	it doesn't work for me
[09:36] sustrik	msvc2010 build failed
[09:36] sustrik	when i reverted the patch it succeeded
[09:36] pieterh	yeah, agreed
[09:37] pieterh	there's something weird going on, when I define that macro in windows.hpp, at the start, it succeeds
[09:37] pieterh	otherwise, it fails
[09:37] pieterh	yet the very first include file that is supposed to be called starts by defining that macro
[09:37] pieterh	I think there's something broken in the Windows version number detection
[09:38] sustrik	do you have any idea what problem steve was fixing?
[09:38] sustrik	jenkins mingw builds seem to work ok
[09:38] pieterh	it looks like he was fixing some errors, e.g. using "" to include system header files
[09:39] pieterh	it's definitely a problem with version detection, possibly provoked by that LEAN_AND_MEAN change
[09:40] sustrik	the description of the patch is not very helpful: "Fix scope on Windows includes. Fix windows.h included before
[09:40] sustrik	winsock2.h. Remove definition of _WINSOCKAPI_.
[09:40] sustrik	"
[09:41] pieterh	sustrik: heh, " Remove definition of _WINSOCKAPI_."
[09:41] pieterh	He could as well have said, "Cause major breakage at compile time"
[09:42] pieterh	ok, since the fix is apparently to revert the patch, I'm going with that
[09:42] sustrik	ok
[09:43] pieterh	the problem here is if you somehow include winsock.h and then include windows.h (which the code does, after his change), you get symbols defined again by winsock2.h
[09:43] pieterh	or even if you include windows.h, then include winsock2.h, the same
[09:44] sustrik	does the unpatched version fail for you?
[09:44] pieterh	let me revert, and check that
[09:47] pieterh	sustrik: it builds fine, after reverting the patch, though I see what Steve was aiming at
[11:38] pieterh	sustrik: is there an issue for commit 864c18 (https://github.com/zeromq/libzmq/commit/864c18f797203c06e66e739166b246cfb3d47ce9)?
[11:39] pieterh	no changes to stable without issues and test cases
[11:39] pieterh	http://www.zeromq.org/docs:distributions#toc3
[11:40] pieterh	we managed to release 2.1.5 and 2.1.6 with breakage, it's not worth cutting corners
[11:40] pieterh	I'm happy to make a test case if there's an issue
[11:42] sustrik	drop it then
[11:42] pieterh	we shouldn't lower our standards for contributions
[11:42] pieterh	we agreed to have issues for changes
[11:42] sustrik	what it does is that it returns ENOMEM instead of asserting in zmq_msg_init*
[11:43] pieterh	for out of memory conditions?
[11:43] sustrik	yes
[11:43] pieterh	well, if there's no test case, there's no proof the fix actually works
[11:43] sustrik	sure
[11:43] sustrik	ditch it
[11:43] pieterh	ditched
[11:44] pieterh	I'd recommend being more strict about contributed patches in the future
[11:44] pieterh	though at the same time it's good to keep barriers low
[11:44] sustrik	:)
[11:45] pieterh	i think contributions to libzmq can be quite difficult, there are lower barriers elsewhere
[11:45] pieterh	plus there is an educational aspect, making test cases is just good practice
[11:46] pieterh	sustrik: I have an unrelated question / discussion
[11:46] sustrik	yes?
[11:46] pieterh	so I've gotten UDP working quite nicely
[11:46] sustrik	kudos!
[11:46] pieterh	it's actually a nice fit for 0MQ
[11:46] pieterh	I have a small wire protocol on top of UDP
[11:47] pieterh	there is a connection semantic, heartbeating, etc.
[11:47] pieterh	I'm going to call the protocol NOM-1
[11:47] pieterh	nom-oriented messaging protocol 1
[11:48] pieterh	so... I'd like to experiment with selectors
[11:48] pieterh	that is, socket validation at connection time, and receivers specifying selectors
[11:48] pieterh	so pull, dealer, and sub would all work with prefix filters
[11:49] pieterh	done at the sender side
[11:49] pieterh	the reason here is that I basically have one engine for all socket types
[11:50] pieterh	so if I do filtering for sub sockets, it's the same code whether it's pub-side or sub-side
[11:50] pieterh	and I can do it for free on all socket types where it makes sense
[11:50] pieterh	any obvious scalability problems that you can see with this?
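Editorial sketch of the sender-side prefix filtering described above: receivers register selectors (byte prefixes) and the sending engine decides who gets each message. This is not VTX or libzmq code; the class `SenderSideFilter`, its methods, and the peer names are hypothetical, and only the Python standard library is used.

```python
class SenderSideFilter:
    """Hypothetical sender-side engine: receivers register prefix selectors."""

    def __init__(self):
        # peer name -> set of byte-string prefixes that peer asked for
        self._selectors = {}

    def subscribe(self, peer, prefix=b""):
        """Record that `peer` wants messages starting with `prefix`."""
        self._selectors.setdefault(peer, set()).add(prefix)

    def disconnect(self, peer):
        """Forget a peer and all of its selectors."""
        self._selectors.pop(peer, None)

    def recipients(self, message):
        """Return the sorted names of peers whose selectors match `message`."""
        return sorted(
            peer
            for peer, prefixes in self._selectors.items()
            if any(message.startswith(p) for p in prefixes)
        )


engine = SenderSideFilter()
engine.subscribe("worker-a", b"img.")
engine.subscribe("worker-b", b"img.resize")
engine.subscribe("worker-c", b"")  # empty prefix matches every message

print(engine.recipients(b"img.resize:42"))
print(engine.recipients(b"audio.mix:7"))
```

This naive version scans every peer for every message, so its cost grows with the number of connected receivers. A sub-like socket would deliver to all matching peers, while a pull-like socket would presumably pick one of them.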
[11:52] sustrik	why do you want to do filtering in the transport?
[11:52] sustrik	seems to be a wrong place for it
[11:52] pieterh	good question
[11:53] pieterh	it turns out the transport cannot simply be a transport
[11:53] pieterh	in fact the driver has to reimplement socket semantics
[11:53] pieterh	that is quite OK
[11:55] sustrik	what's the reason for that?
[11:55] pieterh	it's more to do with the VTX approach than UDP
[11:55] pieterh	I may be able to create a generic socket emulation layer above the transport
[11:55] pieterh	basically because VTX talks to applications over inproc
[11:56] pieterh	(you'll see the same issue with any bridge, in fact)
[11:56] pieterh	the semantics of app-to-bridge are pair
[11:56] sustrik	a bridge is basically a device imo
[11:57] pieterh	that is also what I'd hoped, but it doesn't work like that
[11:57] sustrik	why so?
[11:58] eintr	a device breaks delivery guarantees always, correct? it's not "transparent" as far as delivery goes, right?
[11:58] pieterh	well, if your bridge wants to support all socket types, it has to have N device models built in
[11:58] pieterh	where each device model in fact emulates a specific socket type
[11:58] sustrik	yes
[11:59] pieterh	which is where I ended up, with one engine emulating 10 socket types rather than 10 simpler engines
[11:59] pieterh	especially if all 10 simpler engines have to speak UDP
[11:59] pieterh	you can see that each bridge/device would be custom made for the transport
[11:59] sustrik	can't you simply specify the pattern when creating the bridge?
[12:00] pieterh	:-) of course
[12:00] pieterh	you say, "i want a push socket", and the bridge emulates that
[12:00] sustrik	right
[12:00] pieterh	but if you use the simplistic "bridge is device" approach you have to write 10 bridges
[12:00] pieterh	and then start the right one
[12:00] sustrik	not really
[12:01] sustrik	the device code is generic
[12:01] pieterh	only in 0MQ because you can use 0MQ sockets at both ends :-)
[12:01] pieterh	that is cheating
[12:01] pieterh	and doesn't work when one end is UDP or something else
[12:01] pieterh	you need to e.g. load-balance yourself
[12:01] sustrik	i see
[12:01] pieterh	manually, explicitly
[12:02] sustrik	but different patterns have different protocols
[12:02] sustrik	req/rep has backtrace stack
[12:02] pieterh	indeed
[12:02] sustrik	pub/sub has topics etc.
[12:02] pieterh	indeed
[12:02] pieterh	I will support all these over UDP, of course
[12:02] sustrik	one off-topic remark
[12:03] pieterh	sure
[12:03] sustrik	if you want pub-side filtering with UDP you have to build reliability into the transport
[12:03] pieterh	yes, I know
[12:03] sustrik	ok
[12:03] pieterh	in any case request-reply won't work otherwise either
[12:03] pieterh	single lost message = blocked client
[12:04] pieterh	it actually works really nicely
[12:04] pieterh	since I can add reliability precisely in those cases I need it
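Editorial sketch of why request-reply needs reliability on a lossy transport: without a retry, one lost message leaves the client waiting forever. The helper `request_with_retry` and the fake transport `make_lossy_server` are hypothetical names, not part of NOM-1 or 0MQ; the "timeout" is modelled as the transport returning `None`.

```python
def request_with_retry(send_request, max_attempts=3):
    """Call send_request() until it yields a reply or attempts run out.

    send_request is a hypothetical callable that returns the reply, or
    None when the request or its reply was lost (a timeout expired).
    Returns (reply, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        reply = send_request()
        if reply is not None:
            return reply, attempt
    raise TimeoutError(f"no reply after {max_attempts} attempts")


def make_lossy_server(lost_first):
    """Fake transport that loses the first `lost_first` requests."""
    state = {"calls": 0}

    def send_request():
        state["calls"] += 1
        return None if state["calls"] <= lost_first else b"OK"

    return send_request


reply, attempts = request_with_retry(make_lossy_server(lost_first=2))
print(reply, attempts)
```

The retry lives in the request-reply path only, which mirrors the point above that reliability can be added just where a pattern needs it.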
[12:04] sustrik	any point in using UDP then?
[12:04] pieterh	oh, yes
[12:04] sustrik	looks like duplicating TCP functionality
[12:04] pieterh	e.g. no reliability on pubsub
[12:04] pieterh	broadcast functionality, i.e. connect to *:port
[12:04] eintr	no timeouts
[12:05] pieterh	no reliability on push/pull or dealer
[12:05] pieterh	plus the goal isn't really UDP, it's about learning how to add user-space transports
[12:05] pieterh	next up, TCP
[12:06] sustrik	ok, i see
[12:06] pieterh	is it worth exploring selectors for pull sockets?
[12:06] sustrik	nope imo
[12:06] sustrik	the parallelised pipeline is for load distribution
[12:06] sustrik	filtering doesn't make sense there
[12:07] pieterh	it does if you do sender-side filtering
[12:07] sustrik	what would that be good for?
[12:07] pieterh	well, you have a task queue and then workers can join and specify the category of tasks they're prepared to handle
[12:08] sustrik	the workers are meant to be interchangeable
[12:08] pieterh	sure
[12:08] pieterh	you can have interchangeable workers
[12:08] pieterh	many for any category of tasks, any mix
[12:08] sustrik	then you should have multiple pipelines
[12:08] pieterh	ah, queuing issues
[12:09] sustrik	and administration
[12:09] pieterh	yes, that's one option but it means you split work over queues
[12:09] sustrik	imagine one worker fails
[12:09] sustrik	are the other workers able to take over the load?
[12:09] sustrik	etc.
[12:09] pieterh	yes
[12:09] pieterh	because we know when peers disappear
[12:10] sustrik	yes
[12:10] pieterh	but that's just the same as normal pipeline
[12:10] sustrik	however, with filtering it's not clear whether there's a worker that can process particular request
[12:10] sustrik	with multiple pipelines it's obvious
[12:10] pieterh	true, so a task may remain stuck in a queue
[12:11] pieterh	well, we solved this issue in AMQP, if a task can't be delivered because no-one's willing to handle it, it gets dropped
[12:12] pieterh	but ok
[12:12] pieterh	how did you implement xsub, then? you send subscribe messages from sub to pub?
[12:12] sustrik	yes
[12:13] pieterh	ok
[12:13] sustrik	here's a sketch of the arch whitepaper:
[12:13] sustrik	http://www.250bpm.com/pubsub
[12:15] pieterh	hmm, any reason for not simply replacing sub/pub with xsub/xpub semantics?
[12:15] pieterh	we seem to have three different ways to talk to 'special' sockets
[12:15] pieterh	setsockopt, send special frame, send special message
[12:16] sustrik	there are two layers in the stack
[12:16] sustrik	X- layer and non-X layer
[12:16] pieterh	:-)
[12:16] sustrik	in X-layer you compose the messages by hand
[12:16] pieterh	you always tell me about the architecture of your implementation
[12:17] sustrik	ok, forget it
[12:17] pieterh	whereas I'm always arguing about APIs :-)
[12:17] sustrik	as an end user you should use only non-X socket types
[12:17] sustrik	which are consistent in using socket options
[12:18] pieterh	yes, you're right
[12:18] sustrik	note that you can plug into subscription forwarding mechanism
[12:19] sustrik	that may be of use in vtx
[12:19] pieterh	we could one day see if there's an alternative implementation for ROUTER
[12:19] sustrik	ah, that reminds me of issue 190
[12:19] sustrik	i have to fix that sooner or later
[12:19] sustrik	at that point we have to separate req/rep from the router
[12:20] sustrik	i can do that
[12:20] sustrik	however, someone has to take care of new router/dealer socket types
[12:20] sustrik	would you like to become a maintainer?
[12:21] pieterh	well, I'm not competent to modify the code, but willing to learn
[12:21] pieterh	however, you mean 'contributor', right?
[12:21] pieterh	:)
[12:21] pieterh	how will you fix issue 190?
[12:22] sustrik	straightforwardly: drop messages on disconnect
[12:22] pieterh	drop messages where, on disconnect what?
[12:22] sustrik	client
[12:22] sustrik	requester
[12:22] pieterh	that doesn't seem to address the issue...
[12:22] pieterh	1000 requests waiting at the REP side, no?
[12:23] sustrik	if requester is dead, there's noone to send replies to => drop any pending requests & replies
[12:23] pieterh	so when a REQ dies, you go through the queue and remove any requests it sent?
[12:23] sustrik	yes
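Editorial sketch of the "drop messages on disconnect" idea for issue 190: when a requester goes away, walk the queue and remove whatever it sent. This is a toy model, not the libzmq implementation; `PendingRequests` and the requester ids are hypothetical.

```python
from collections import deque


class PendingRequests:
    """Toy request queue at the REP side, tagged by requester id."""

    def __init__(self):
        self._queue = deque()  # items are (requester_id, payload)

    def enqueue(self, requester_id, payload):
        self._queue.append((requester_id, payload))

    def on_disconnect(self, requester_id):
        """Drop every queued request from a dead requester.

        Returns the number of requests that were discarded.
        """
        before = len(self._queue)
        self._queue = deque(
            item for item in self._queue if item[0] != requester_id
        )
        return before - len(self._queue)

    def __len__(self):
        return len(self._queue)


pending = PendingRequests()
for n in range(1000):
    pending.enqueue("client-1", n)
pending.enqueue("client-2", "still wanted")

dropped = pending.on_disconnect("client-1")
print(dropped, len(pending))
```

As noted in the discussion, this only helps on the hop where the disconnect is visible; it does not address requests already forwarded further along.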
[12:24] pieterh	how does that end up separating req/rep from router?
[12:24] sustrik	router won't work anymore then
[12:24] sustrik	i assume "send(); close();" is a valid sequence for router
[12:24] pieterh	ah, right
[12:25] sustrik	so, i'll separate the two
[12:25] pieterh	i don't see how you can solve issue 190 over more than 1 hop
[12:25] sustrik	i can't
[12:25] sustrik	it's just an optimisation
[12:25] sustrik	not perfect solution
[12:25] pieterh	imo it
[12:25] pieterh	it's a wrongly stated problem
[12:26] pieterh	"In my case I am attempting to do least recently used"
[12:26] pieterh	the problem is not the queuing at the rep socket end
[12:26] sustrik	yes
[12:26] pieterh	the problem is the semantics where clients disappear as part of the normal scenario
[12:27] sustrik	exactly
[12:27] pieterh	if this problem is even worth solving, it means it happens often
[12:27] pieterh	once a week, it's not an issue
[12:27] sustrik	yes
[12:27] pieterh	so the problem is not solvable at the socket level at all
[12:27] sustrik	?
[12:27] pieterh	it requires some protocol for disconnected clients and workers
[12:27] pieterh	i.e. I can make a request and come back later to fetch the response
[12:28] pieterh	if clients are connected, they're connected
[12:28] sustrik	you can do that using identities
[12:28] pieterh	explicit identities?
[12:28] sustrik	yeah
[12:28] pieterh	that's not a good answer
[12:28] sustrik	ugly
[12:28] pieterh	nope
[12:29] pieterh	it's hacking the socket layer for something it shouldn't be used for
[12:29] pieterh	and I don't see that issue 190 is a reason to break ROUTER functionality
[12:29] sustrik	router is a different pattern anyway
[12:29] pieterh	perhaps
[12:30] pieterh	I mean, it should be implemented, in one way or another
[12:30] sustrik	so my proposal is to split the two
[12:30] sustrik	what's missing though
[12:30] pieterh	being able to address anonymous peers by automatic identity is a valid pattern
[12:30] sustrik	is documentation
[12:30] pieterh	it so happens xrep does that perfectly
[12:30] pieterh	clumsy, though
[12:30] pieterh	however the use case in 190 is bogus IMO
[12:31] sustrik	forget about the use case
[12:31] sustrik	router will break sooner or later anyway
[12:31] pieterh	why?
[12:31] sustrik	imagine adding discarding duplicates to req/rep
[12:31] sustrik	for example
[12:31] sustrik	basically any further development of req/rep pattern breaks router
[12:32] pieterh	look, the basic addressing model for req-xrep-xreq-rep will break sooner or later
[12:32] sustrik	which means req/rep is stuck atm
[12:32] pieterh	we know that
[12:33] pieterh	once you accept that router (and dealer) are valid user space patterns
[12:33] pieterh	we can design a better router API
[12:33] sustrik	sure, that's what i am proposing
[12:33] pieterh	makes sense
[12:33] sustrik	i'll separate req/rep from router
[12:33] sustrik	you'll become maintainer of router
[12:33] pieterh	this won't happen in 2.1.x but could happen in 2.2
[12:33] sustrik	i'll take care of req/rep
[12:33] pieterh	well, 'maintainer' means, accepting patches and running test cases and enforcing process
[12:34] sustrik	you can leave the code as is if that's what you want
[12:34] sustrik	the point now is there's no documentation for router pattern
[12:34] pieterh	?
[12:34] sustrik	so even if i split the two
[12:34] pieterh	there's like 50 pages of that
[12:34] pieterh	please, one day when you have nothing else to do, read the Guide
[12:34] pieterh	seriously
[12:35] sustrik	i meant in man pages
[12:35] sustrik	so zmq_socket(3)
[12:35] sustrik	see
[12:35] sustrik	there should be at least a paragraph about router pattern
[12:35] pieterh	there is a section describing ZMQ_ROUTER, yes
[12:35] sustrik	and a paragraph for each associated socket type
[12:36] sustrik	in 2-1?
[12:36] pieterh	yes
[12:36] sustrik	let me see
[12:36] pieterh	ROUTER only makes sense within REQ-REP pattern
[12:36] pieterh	it's not a separate pattern
[12:36] sustrik	it is
[12:36] sustrik	ROUTER != XREP
[12:36] pieterh	I seem to remember that patterns are not interconnectable
[12:36] sustrik	exactly
[12:37] pieterh	yet ROUTER must be able to talk to REP, REQ, DEALER, and ROUTER
[12:37] pieterh	otherwise it's kind of... useless :)
[12:37] pieterh	hey, I have this great socket type but it can't talk to anything else
[12:37] pieterh	-1
[12:37] sustrik	ok, let it be for now
[12:37] sustrik	we'll sort it out once the router breaks
[12:37] pieterh	read the Guide, martin, get some idea of actual use cases for this stuff
[12:37] pieterh	it's good to have all the theory
[12:38] pieterh	but in the end it's what people do with it that really defines reality
[12:38] pieterh	if you break router arbitrarily, you'll annoy a lot of people
[12:39] sustrik	that's why router should be a separate pattern
[12:39] pieterh	the argument "you should not have used it, I warned you" won't work
[12:39] sustrik	changing req/rep won't break it
[12:39] pieterh	that's only plausible if you let patterns interconnect
[12:39] sustrik	req/rep is meant for stateless services
[12:39] pieterh	router should be part of the request-reply pattern, but be much more explicit
[12:39] sustrik	if you want something different, use router
[12:39] sustrik	easy
[12:40] pieterh	connecting to what?
[12:40] sustrik	another router?
[12:40] sustrik	dealer?
[12:40] pieterh	no, no, and like my son says, no
[12:40] pieterh	:)
[12:40] sustrik	you can add as many socket types to the router pattern as you want
[12:40] pieterh	routing is from req, to rep / dealer
[12:41] pieterh	this is splitting hairs
[12:41] sustrik	routing allows you to address particular service instance
[12:41] pieterh	making the router semantics a clearly defined package is good
[12:41] sustrik	which breaks the model of interchangable staless services
[12:41] sustrik	stateless
[12:41] pieterh	sustrik: you're not accurate, really sorry
[12:41] sustrik	shrug
[12:42] pieterh	router is definitely used for interchangeable stateless services
[12:42] pieterh	look at the lruqueue for example
[12:42] pieterh	I can't discuss this if you refuse to read the dozens of worked examples I made, which are widely used
[12:42] sustrik	what if the service instance you send message to is dead?
[12:43] pieterh	precisely
[12:43] sustrik	0mq has to pass it to some other instance of the service
[12:43] sustrik	which means the address is disregarded anyway
[12:43] pieterh	ok, as you like
[12:43] pieterh	i'm not going to insist on this
[12:44] pieterh	you've said for a year or so that XREP was not meant for end users
[12:44] sustrik	ok, cyl
[12:44] pieterh	we went ahead and did it
[12:44] pieterh	I'm sure you'll break it and explain why
[12:45] pieterh	and you're pretty stubborn about not learning why people use it, and what they do with it
[12:46] sustrik	they use it to address state in the network
[12:46] sustrik	which is ok, but not a stateless req/rep model
[12:47] sustrik	simply a different thing
[12:47] pieterh	hey, we use router to kill puppies, which is obviously wrong
[12:47] pieterh	the actual use case is to create application-level routing to peers
[12:47] pieterh	which is not the same as state
[12:48] sustrik	yes
[12:48] pieterh	req-rep already has state
[12:48] pieterh	a reply address is state
[12:48] sustrik	yes, but it's nicely encapsulated in the message
[12:48] pieterh	meaningless distinction
[12:48] sustrik	the point is not to have state at the nodes
[12:48] pieterh	obviously 190 is about state
[12:48] pieterh	there is no state at the nodes
[12:48] sustrik	erlang-style approach
[12:49] sustrik	you mean the queues?
[12:49] pieterh	what do you mean by 'nodes'?
[12:49] sustrik	applications
[12:49] sustrik	state = business logic state
[12:49] pieterh	I don't think a single of the router use cases puts state in the applications
[12:49] pieterh	you'd know that if you read the guide
[12:50] pieterh	in fact, categorically, router is used to construct devices
[12:50] sustrik	XREP
[12:51] pieterh	you got me
[12:51] sustrik	:)
[12:51] pieterh	should have said that half an hour ago
[12:51] sustrik	well, which pattern should i have a look at?
[12:51] pieterh	XREP/ROUTER is not used in applications
[12:51] sustrik	is that the state of affairs with the users?
[12:51] pieterh	it makes no sense and that was never covered in the guide
[12:52] pieterh	if they do stuff that's not explained in the Guide, I'm not responsible :)
[12:52] pieterh	I'm sure people try everything
[12:52] sustrik	well, but that's the point
[12:52] pieterh	however our _users_ are principally not application developers
[12:52] sustrik	if i break that uncovered in the guide use case
[12:52] pieterh	they are infrastructure builders
[12:52] pieterh	they build brokers
[12:52] sustrik	everybody will be pissed off
[12:53] pieterh	for good reasons
[12:53] sustrik	that's why i want to split the existing functionality into a separate pattern
[12:53] pieterh	but I'm confident when you actually read the guide you'll be like "ah, I get it!"
[12:54] pieterh	there's only 280 pages or so to get through
[12:54] pieterh	I've made a nice PDF for you to download
[12:54] sustrik	that's quite a lot. which part should i have a look at?
[12:54] pieterh	chapters 3 and 4, I guess
[12:55] sustrik	any pattern that makes your point clear?
[12:55] pieterh	you need to see how router/xrep is really used, why I forced that rename
[12:55] pieterh	well, there are a few
[12:55] pieterh	lruqueue, all the reliable request-reply patterns
[12:55] sustrik	ok, let me have a look at lrqueue
[12:55] pieterh	lruqueue was the original abuse of XREP to solve a real issue
[12:56] pieterh	there was no better answer at the time (I'd have loved one)
[12:56] pieterh	tbh the whole business with special message frames is annoying
[12:56] pieterh	but it does work
[12:56] sustrik	hm, no "lrqueue" in the text
[12:56] pieterh	lruqueue
[12:56] sustrik	ah
[12:56] pieterh	least recently used queue broker
[12:57] pieterh	the only state it maintains is presence/absence/busy/available of workers
[12:58] pieterh	does not use explicit identities
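Editorial sketch of the least-recently-used routing idea: the broker keeps no business state, only which workers are currently available, and hands each request to the worker that has been idle longest. This is not the Guide's lruqueue code (which is built on 0MQ sockets); `LruQueue` and its method names are hypothetical.

```python
from collections import deque


class LruQueue:
    """Toy LRU broker: its only state is the list of available workers."""

    def __init__(self):
        self._ready = deque()  # available workers, longest-idle first

    def worker_ready(self, worker):
        """A worker signals that it can take a request."""
        if worker not in self._ready:
            self._ready.append(worker)

    def worker_gone(self, worker):
        """Forget a worker that disappeared."""
        try:
            self._ready.remove(worker)
        except ValueError:
            pass  # it was busy or unknown; nothing to forget

    def dispatch(self, request):
        """Assign the request to the longest-idle worker, or None if none."""
        if not self._ready:
            return None
        return self._ready.popleft(), request


broker = LruQueue()
broker.worker_ready("A")
broker.worker_ready("B")
first = broker.dispatch("r1")
second = broker.dispatch("r2")
third = broker.dispatch("r3")   # nobody is free
broker.worker_ready("A")        # A finished r1 and reports back
fourth = broker.dispatch("r3")
print(first, second, third, fourth)
```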
[12:58] pieterh	ok, cyl, I'm going skating with the kids, it's a holiday here in Belgium
[12:59] sustrik	cya
[13:01] pieterh	I think lruqueue is the canonical example, if you can solve that better, we're winning
[13:01] pieterh	cyl, nice chatting
[13:04] sustrik	ok, read it
[13:04] sustrik	the goal is to tweak the scheduler
[13:04] sustrik	that can be done by socket options
[13:09] sustrik	i have a more powerful tool on my todo list -- a priority-based scheduler
[13:09] sustrik	requires some work though
[16:22] michelp	OMG backscroll
[16:24] michelp	FWIW we use the lruqueue pattern to great effect
[17:52] brianjarita	hey all ... I am trying to daemonize a python script that is a handler for mongrel2. What is the best way to do this?
[17:58] pieterh	brianjarita: I kind of think you're in the wrong irc channel
[17:58] pieterh	this is #zeromq
[17:59] pieterh	we're more about ... well... puppies, and stuff
[18:02] brianjarita	haha ... thanks i'll ask #mongrel2
[18:02] pieterh	brianjarita: or maybe #python
[20:56] iFire	wondering, anyone use infiniband with zeromq?
[20:59] pieterh	iFire: hi
[20:59] iFire	just wondering. I don't really plan on it
[20:59] pieterh	http://www.zeromq.org/results:ib-tests-v206
[21:00] iFire	that's megabits right?
[21:01] pieterh	megabytes
[21:01] pieterh	ah, sorry, no, megabits
[21:01] iFire	what type of infiniband
[21:02] pieterh	that page is all the info I've got, but if you google 'zeromq infiniband' you can find more material
[21:02] iFire	4x 16gigabit ddr is rated for 16gigabits
[21:02] iFire	which is about 2000megabytes/s (calculator)
[21:03] iFire	9026 megabits to megabytes = 1128 megabytes
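Editorial check of the unit arithmetic above, assuming decimal units (1 gigabit = 1000 megabits) and 8 bits per byte. The function name is made up for the example.

```python
BITS_PER_BYTE = 8


def megabits_to_megabytes(megabits):
    """Convert a rate in megabits/s to megabytes/s (decimal units)."""
    return megabits / BITS_PER_BYTE


print(megabits_to_megabytes(16_000))  # 16 gigabits/s  -> 2000.0
print(megabits_to_megabytes(9_026))   # measured value -> 1128.25
```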
[21:03] iFire	but experimental is good :)
[21:04] pieterh	also this isn't using rdma or anysuch
[21:04] pieterh	I'd assume that message throughput will be limited by CPU, to 6-8M events a second per core
[21:05] pieterh	but you should be able to saturate any network at messages of 64KB and above
[21:06] pieterh	in those 2.0.6 tests, obviously the network is saturated at about 9Mb/sec
[21:06] pieterh	network, or driver
[21:06] iFire	yea 900Megabits/sec
[21:06] iFire	9000Megabits/sec
[21:06] pieterh	that's not a zeromq limitation, afaics
[21:08] iFire	Mellanox MT25204 is either 10Gigabits/s or 20 gigabits/s
[21:09] iFire	I'm thinking it's probably 10gigabits in that machine
[21:09] pieterh	that's what I'd conclude
[21:09] iFire	hmm let me see what 8gigabits is in megabits
[21:09] pieterh	do you have material to test on?
[21:09] iFire	no
[21:09] iFire	I was just wondering
[21:10] pieterh	you may get more answers on the list
[21:43] ssi	does it make sense to use high-water marks on PULL/PUSH sockets?
[21:44] ssi	I see stuff in the guide regarding HWM with pub/sub model, but nothing regarding push/pull
[21:50] michelp	ssi, it does make sense. Check out the man page for zmq_socket
[21:56] ssi	I see... so the HWM will basically only be on the PUSH side... if there's noone pulling messages from downstream, the PUSH socket will fill up, and on the HWM it'll block
[21:56] ssi	so I guess the call to send() itself blocks?
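Editorial sketch of the high-water-mark behaviour being asked about, modelled with a bounded `queue.Queue` from the Python standard library rather than a real 0MQ socket. The limit `HWM = 3` and the helper `try_send` are assumptions for illustration; a blocking `put()` would stall at the limit, while the non-blocking form used here reports failure instead.

```python
import queue

HWM = 3  # assumed high-water mark for this toy pipe
pipe = queue.Queue(maxsize=HWM)


def try_send(message):
    """Non-blocking send: True if queued, False if the pipe is at its HWM."""
    try:
        pipe.put_nowait(message)
        return True
    except queue.Full:
        return False


results = [try_send(n) for n in range(5)]
print(results)        # [True, True, True, False, False]

pipe.get_nowait()     # a downstream reader takes one message
print(try_send(99))   # True: there is room again
```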
[21:57] ssi	I'm working on writing a monitoring system that'll let me watch the backlogs on all the sockets for each node
[22:04] ssi	so does a PULL socket only have inbound messages if recv is being called? Ie, no queueing at the PULL end ever happens?
[22:04] ssi	otherwise, wouldn't PULL also be able to fill up and need a HWM to signal the upstream node not to send anymore?
[22:08] ssi	oshi I think I broke activemq's brain
[22:09] ssi	I tested my new jmsreceiver to zmq bridge with the new zmq pipeline, and ran 10 million small text messages through activemq
[22:09] ssi	zmq side ate it up, but activemq gave up around 8.5M
[22:21] michelp	ssi, you should look into zmq_poll, i'm not 100% positive on this but it will tell you when it's ok to push without blocking
[22:21] michelp	if you register your socket with POLLOUT
[22:22] michelp	yeah "For 0MQ sockets, at least one message may be sent to the socket without blocking." that's for POLLOUT
[22:22] michelp	the man page is zmq_poll
[22:23] ssi	hrm
[22:23] ssi	I'm using poll for receives
[22:23] ssi	here's the issue I'm facing:
[22:24] ssi	I have a JmsReceiver component which pulls messages off a JMS queue, converts them to my internal message format, and puts them into the pipeline
[22:24] ssi	if I fire it up with more messages on the queue than heap in my jvm, it eagerly consumes them all and OOMs
[22:25] ssi	so I want to find a solid way of making sure that I'm throttling everyone upstream while work is being done
[22:25] ssi	the more I dig in, the more I think most links in the chain are well-behaved
[22:26] ssi	I'm trying to print the backlog on all the sockets, but everything is coming back -1 so far
[22:29] ssi	my arch is kinda complex, but basically each pipeline node is a streamer device which is bound on the upstream end to the address that other nodes can send messages in through, and on the downstream end to a worker address. Each node has N workers which exist in a threadpool, consuming messages off the streamer and doing work. When they're finished with the work, they send the message down the pipeline to the stable address of the next node(s) in the fl
[22:30] ssi	the JmsReceiver is a node which also happens to be an MDB. onMessage, it creates a socket, connects to its own inbound stable address, sends that message, and closes the socket
[22:30] ssi	if queueing on push/pull is done at the push end, that may be my issue
[22:30] ssi	if I make that receiver maintain its socket, it may behave better
[22:38] michelp	woah sorry, i'm at work and not able to really pay enough attention to help you. maybe just as some general advice i'd tell you to try and do what you want to do in a very small and simple test case first
[22:38] michelp	without all the complexity
[22:39] michelp	even in just some side code, or in a quick scripting language where you can experiment at will
[22:39] michelp	then translate that into your big project once you've figured out where the misbehavior is
[22:40] ssi	heheh sadly, this is the small test case :)
[22:40] ssi	it's not as complex as it sounds, really
[22:41] ssi	and the problem is indeed that the jms consumer is eating everything on the queue as fast as possible
[22:41] ssi	I just need to figure out how to properly monitor the backlogs... everything reports -1 no matter what
[22:47] ssi	hrm backlog isn't what I think it is
[22:47] ssi	is there a way to get ahold of the number of messages that are queued?
[22:53] pieterh	ssi: hi
[22:53] pieterh	how many processes is this split into?
[22:56] ssi	this is all done with inproc, so it's a single process
[22:56] ssi	threaded, of course
[22:56] pieterh	ok, so a few things here...
[22:56] pieterh	you can't measure the queue size, no
[22:56] ssi	right, reading that
[22:56] pieterh	wish that was possible but it's not
[22:57] pieterh	best you can do is set a lowish HWM everywhere and check for POLLOUT before writing
[22:57] pieterh	when you hit a HWM somewhere, you know you've a problem
[22:57] ssi	I am setting low HWMs everywhere, but i'm not checking pollout, so that's likely the part I'm missing
[22:57] ssi	I need to figure out how to do that (using the java bindings)
[22:58] pieterh	second thing, this is presumably your first real 0MQ application?
[22:58] ssi	yes
[22:58] pieterh	ok, so plan to throw it away
[22:58] ssi	and I'm not sure I'd call it "real" yet, but it certainly has come together quicker than I expected
[22:58] pieterh	especially if you make it bigger than you can fully control
[22:59] pieterh	it's kind of too easy to put things together without really understanding what's going on
[22:59] ssi	well fortunately, it's complex but it's not big. It's exactly one class of any significance
[22:59] pieterh	sure
[22:59] pieterh	once you internalize the 0MQ semantics the complexity will go away
[23:00] pieterh	third, for success, you have to start with a small piece, make that work fully, then add one piece at a time
[23:00] ssi	yes, that's what I've done... I just skipped over the HWM part I think
[23:00] ssi	trying to go back and make that happen before I get too far along
[23:00] pieterh	so assuming you're pretty close to having an accurate design
[23:01] pieterh	I think you can see memory consumption per thread
[23:01] pieterh	that may not be sufficient, you may need to break the app into multiple processes over tcp://
[23:02] pieterh	you need to understand why messages are building up
[23:02] pieterh	HWM is not really the best tool IME for resolving message build up, it's more of an exceptional condition
[23:02] ssi	well I don't think I even have buildup
[23:02] pieterh	which you can test for by doing non-blocking writes and checking for EAGAIN
[23:03] pieterh	you said you have out-of-memory?
[23:03] ssi	I've artificially slowed down my downstream processes to simulate it
[23:03] ssi	yes, I have out of memory because I connected to a jms queue with 50 messages on it, each around 10MB in size
[23:03] ssi	and a 128MB heap in my jvm
[23:03] pieterh	ah, so it actually works until you do breakage testing?
[23:03] ssi	and my jms receiver is eagerly consuming EVERYTHING
[23:03] ssi	I'm trying to find a way to signal the jms receiver that yes, the downstream nodes can accept, or no they can't
[23:03] pieterh	hmm
[23:03] ssi	yeah everything works great
[23:04] pieterh	how much of the guide have you read?
[23:04] ssi	all of it
[23:04] ssi	not sure I fully internalized it all
[23:04] ssi	but I read it all :)
[23:04] pieterh	so the pattern I'm thinking you need is least-recently-used (LRU) routing
[23:04] pieterh	with some way to stop the JMS receiver in case of queue overflow
[23:05] pieterh	that's just a superficial opinion...
[23:05] ssi	yes, that makes sense
[23:05] ssi	I don't know if I necessarily need the LRU or not
[23:05] pieterh	you don't have much traffic, so you can definitely synchronize between downstream and upstream
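
The least-recently-used routing idea mentioned above, combined with a way to pause the upstream receiver, might look roughly like the sketch below. It is a plain-Python model, not ZeroMQ code: the class `LruRouter`, its methods, and the worker names are hypothetical and exist only for illustration. Workers announce that they are idle, the router hands each message to the worker that has been idle longest, and the upstream side pulls a new message only while some worker is idle.

```python
from collections import deque


class LruRouter:
    """Hypothetical model of least-recently-used routing with back-pressure."""

    def __init__(self):
        # Workers that have announced they are idle, oldest announcement first.
        self.ready = deque()

    def worker_ready(self, worker_id):
        self.ready.append(worker_id)

    def can_accept(self):
        """Upstream should fetch a new message only while this is True."""
        return bool(self.ready)

    def route(self, message):
        """Assign the message to the worker that has been idle longest."""
        if not self.ready:
            raise RuntimeError("no idle worker; upstream should have paused")
        return self.ready.popleft(), message


router = LruRouter()
router.worker_ready("A")
router.worker_ready("B")

upstream = deque(["m1", "m2", "m3"])  # stands in for the JMS queue
assignments = []
while upstream and router.can_accept():
    assignments.append(router.route(upstream.popleft()))

print(assignments)     # two messages were routed
print(list(upstream))  # the third stays upstream until a worker is idle again
```

The point of the design is that the third message never leaves the upstream queue, so nothing accumulates in memory while both workers are busy.
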
[23:06] ssi	I have a document I drew up explaining how my messaging works currently
[23:06] ssi	I'm trying to get my hands on it so I can show you
[23:06] pieterh	well, tbh it sounds like you do have it under control...
[23:07] ssi	I'm just not sure how to make it respect the HWM
[23:07] pieterh	find the specific case where you think it doesn't respect the HWM
[23:07] pieterh	no-one will help you debug anything more complex than a minimal test case
[23:08] ssi	fair enough
[23:08] ssi	can you point me at what you mean by "check pollout on the send"?
[23:08] pieterh	well, what I'd do is not use poll for that, it's painful
[23:08] pieterh	instead, use zmq_send with ZMQ_NOBLOCK
[23:08] pieterh	and check for an error return with EAGAIN as error value
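
The non-blocking send described above can be modelled without ZeroMQ at all. The sketch below is an analogy only: a bounded `queue.Queue` from the Python standard library stands in for a socket with a high-water mark, and `queue.Full` plays the role that an `EAGAIN` error plays for `zmq_send` with `ZMQ_NOBLOCK`. The helper `try_send` and the limit `HWM = 3` are assumptions for illustration; how a given language binding reports the error is not shown here.

```python
import queue

HWM = 3  # hypothetical high-water mark, counted in messages


def try_send(pipe, message):
    """Non-blocking send: True if the message was queued, False if the pipe is full.

    queue.Full stands in for the EAGAIN error of a non-blocking send.
    """
    try:
        pipe.put_nowait(message)
        return True
    except queue.Full:
        return False


pipe = queue.Queue(maxsize=HWM)

# The first HWM sends succeed; the rest are refused instead of piling up.
results = [try_send(pipe, f"msg-{i}") for i in range(5)]
print(results)

# Once the consumer drains one message, one slot is free again.
pipe.get_nowait()
retry = try_send(pipe, "msg-3")
print(retry)
```

A refused send is the signal the producer can act on, for example by not fetching the next message from its own upstream source until a retry succeeds.
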
[23:09] pieterh	I hope the Java binding exposes this error value, it must
[23:09] ssi	not sure I fully comprehend, but if I put together the simple test case it may come to me
[23:10] pieterh	there's a lot still missing from the Guide, unfortunately
[23:17] ssi	now, I'm correct from my reading that only the PUSH side has an HWM, is that right?
[23:18] pieterh	both sides will apply a queue limit
[23:18] pieterh	I'm not sure what the semantics are for HWM at the receiver side
[23:19] pieterh	it will, I assume, cause the PULL socket to stop reading from the network, which will cause the PUSH socket to eventually stop sending to that PULL socket
[23:19] pieterh	that means the actual queued data = queues at both sides plus network buffers at both sides plus whatever's in transit over the network
[23:19] ssi	that's fine, not entirely concerned with how much
[23:19] ssi	just as long as I can make sure that something stops it
[23:20] pieterh	hmm, the docs don't specify any exception handling for PULL sockets
[23:20] ssi	and are they entirely a function of message count? or does message size come into play at all
[23:20] pieterh	so let's assume they're accurate, and only the PUSH side counts
[23:20] pieterh	it's only a message count
[23:20] pieterh	no byte counts
[23:20] ssi	ok
[23:21] ssi	that'll need to be tuned individually then... cause I have to be able to deal with enormous messages and tiny messages both
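
Because the limit counts messages and not bytes, the worst-case memory held by one queue is roughly the limit times the largest message. A back-of-the-envelope sketch, using the 10MB messages mentioned earlier and a memory budget of 64MB that is purely an assumption (half of the 128MB heap), could be written as follows. The function `max_hwm` and its `queues` parameter are hypothetical names, and the calculation ignores network buffers and per-message overhead.

```python
MB = 1024 * 1024


def max_hwm(memory_budget_bytes, largest_message_bytes, queues=1):
    """Largest per-queue message limit whose worst case still fits the budget.

    `queues` is the number of queues that can all be full at once.
    """
    if largest_message_bytes <= 0 or queues <= 0:
        raise ValueError("message size and queue count must be positive")
    return memory_budget_bytes // (largest_message_bytes * queues)


budget = 64 * MB   # assumed share of the heap available for queued messages
largest = 10 * MB  # largest message size mentioned in the conversation

one_queue = max_hwm(budget, largest)
two_queues = max_hwm(budget, largest, queues=2)
print(one_queue, two_queues)

# For comparison: all 50 queued 10MB messages held at once.
print(50 * largest // MB, "MB")
```

With tiny messages the same limit wastes almost nothing, which is why the value has to be tuned against the largest message a given queue can carry.
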
[23:21] pieterh	one pattern I've used successfully is credit-based flow control
[23:21] pieterh	so receiver sends credit messages to sender
[23:21] pieterh	which only sends when it has credit
[23:21] ssi	hrm that's interesting
[23:22] pieterh	it's fairly simple to implement, you need some basic command framing like all the toy protocols in the Guide have
[23:22] pieterh	then receiver says 'ready' by sending off credit, and tops up as it receives and processes data
[23:22] pieterh	sender can route based on credit, as well as stop/start sending
[23:23] pieterh	for this you don't use PUSH/PULL any more, but ROUTER/DEALER
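
The credit-based flow control outlined above can be sketched as a small model. This is not ZeroMQ code and uses no sockets: the class `CreditSender`, its methods, and the message names are hypothetical, and the "credit messages" are reduced to a method call. The sender holds a credit counter, spends one credit per message sent, and stops as soon as the counter reaches zero; the receiver tops it up as it finishes processing.

```python
from collections import deque


class CreditSender:
    """Hypothetical sender that transmits only while it holds credit."""

    def __init__(self, backlog):
        self.backlog = deque(backlog)  # messages waiting to be sent
        self.credit = 0                # credit granted by the receiver so far

    def grant(self, n):
        """Called when a credit message arrives from the receiver."""
        self.credit += n

    def send_available(self):
        """Return the messages that may go out now, spending one credit each."""
        sent = []
        while self.credit > 0 and self.backlog:
            sent.append(self.backlog.popleft())
            self.credit -= 1
        return sent


sender = CreditSender(f"doc-{i}" for i in range(5))

first = sender.send_available()   # no credit yet, so nothing is sent
sender.grant(2)                   # receiver says 'ready' with two credits
second = sender.send_available()
sender.grant(1)                   # receiver tops up after processing one message
third = sender.send_available()

print(first, second, third)
print(len(sender.backlog), "messages still held back")
```

The amount of data in flight is bounded by the credit the receiver has handed out, so the receiver, not the sender, decides how much it is prepared to hold.
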
[23:23] ssi	right
[23:23] ssi	and i worry about getting away from push/pull, because it so perfectly represents my flow
[23:23] ssi	(plus routing is still scary)
[23:23] pieterh	well, dealer is push and pull combined
[23:24] pieterh	and routing is scary but seems inevitable in many patterns
[23:26] pieterh	ssi: ok, good luck, I'm off now, it's 1.26am here :)
[23:27] ssi	ok, thanks for the help
[23:27] ssi	I hope to have this simple test case working tomorrow
[23:30] ssi	...or right now... and it definitely behaves the same way. I just need to figure out how to respect the HWM, and I think I'm good to go