ZeroMq IRC Log

Tuesday March 22, 2011

[Time] Name	Message
[01:02] reiddraper	I'm seeing ~4K requests/second for a REQ-REP socket with two python processes. They are sending "hello". Is this performance expected?
[01:04] jhawk28	not sure about python, but it usually depends
[01:04] jhawk28	on the network, the computing platform, and the language bindings
[01:05] reiddraper	jhawk28: localhost, ubuntu
[01:06] reiddraper	jhawk28: suppose I was expecting it to be an order of magnitude faster
[01:06] jhawk28	are you just req: Hello and Rep: hello?
[01:06] reiddraper	yes
[01:07] reiddraper	zeromq 2.1.3
[01:07] jhawk28	let me do a quick one in Java
[01:07] reiddraper	same performance with ipc and tcp
[01:08] reiddraper	jhawk28: thanks
[01:21] andrewvc	cremes: around?
[01:21] andrewvc	cremes: Wondering if you'd mind me releasing 0.7.3 w/ zdevice support
[01:23] jhawk28	reiddraper: I'm getting about 10k/s
[01:25] reiddraper	jhawk28: ok, seems reasonable that python would be 4K then
[01:26] jhawk28	thats Java with 2.1.3 on a 2.4 Core i5 (OSX)
[01:30] jhawk28	reiddraper: push/pull gets 2mil/s
[01:30] reiddraper	jhawk28: pretty big difference
[01:31] reiddraper	only really surprised because I've seen http servers do more req/s than what I'm seeing
[01:33] jhawk28	req/rep is synchronous
[01:33] jhawk28	single threaded both sides
[01:34] reiddraper	yeah, figured it would have to be, that being said, so is something like Redis, which gets 10's of thousands of operations / second, over tcp
[01:34] reiddraper	req-rep
[01:47] jhawk28	reiddraper: increase the number of clients
[01:48] jhawk28	when I bump up the number of clients, I am getting 25k/sec
[01:49] reiddraper	jhawk28: Cool, and to be honest, for what I have in mind, 4k/sec is plenty fast
[01:51] jhawk28	I could probably scale it more if I used XREP
[01:52] jhawk28	and actually split it between machines
[01:55] reiddraper	jhawk28: can you explain the difference between xreq/rep and req/rep?
[01:56] jhawk28	req/rep is synchronous, xreq/xrep uses identities
[01:56] reiddraper	ah, OK
[01:56] jhawk28	the identities are then used by zmq to route the response back to the correct socket
[01:58] jhawk28	thats as much as I know
[01:58] jhawk28	I haven't used them much
[01:58] jhawk28	its on my todo list...
[02:00] reiddraper	jhawk28: ok, so you don't get slowed down by slowest client that is load-balanced
[02:01] jhawk28	or worker
[02:01] jhawk28	thats what xreq is for (for dealing out work)
[02:04] reiddraper	ok, so for a queue broker, the clients sending xreq (give me a task), and the broker sending xrep (do this) makes sense?
[02:07] jhawk28	Chapter 3 does a good job explaining it: http://zguide.zeromq.org/page:all#toc45
[02:08] reiddraper	jhawk28: awesome. thanks
[02:16] believa	newb question - are the following statements true? You cannot "bind" multiple sockets to the same endpoint. You "can" connect multiple sockets to the same endpoint. A socket can "bind" and/or "connect" to multiple endpoints.
[02:18] jhawk28	Yes, yes
[02:19] jhawk28	yes
[02:19] believa	jhawk28: thanks for the confirmation
[02:20] neopallium	believa: you can bind one socket to multiple different endpoints, but you can't bind multiple sockets to the same endpoint.
[02:21] believa	neopallium: gotcha
[02:21] neopallium	just like you can't bind multiple tcp sockets to the same port on the same computer.
[02:22] believa	neopallium: that should result in EADDRINUSE right?
[02:30] neopallium	believa: yes
[02:30] believa	neopallium: thanks
[07:47] sustrik	reiddraper: req/rep is lock-step; the performance is determined by the latency of your network
[07:47] sustrik	the actual messaging fabric is almost irrelevant
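A rough way to read sustrik's point: a lock-step REQ/REP client has only one request in flight, so its rate is bounded by the inverse of the round-trip time. The sketch below is plain arithmetic; the function names are illustrative and not part of any 0MQ API, and the multi-client scaling is an assumption that holds only until the server or machine saturates.

```c
/* Upper bound on requests/second for ONE lock-step client:
   each request costs one full round trip. */
double lockstep_max_rate(double round_trip_usec)
{
    return 1e6 / round_trip_usec;
}

/* Inverse: the round-trip time implied by an observed request rate. */
double implied_round_trip_usec(double requests_per_sec)
{
    return 1e6 / requests_per_sec;
}

/* With N independent lock-step clients the bound scales roughly with N
   (assumption: no contention and an unsaturated server). */
double lockstep_max_rate_n(double round_trip_usec, int clients)
{
    return clients * lockstep_max_rate(round_trip_usec);
}
```

By this reading, the ~4K requests/second reported earlier corresponds to about 250 microseconds per round trip, and 10K/second to about 100 microseconds.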
[08:09] pieterh	good morning
[08:10] pieterh	sustrik: we had this question twice in a day, perhaps worth some explanation in the guide
[08:12] sustrik	yes, that would be good
[08:12] sustrik	alternatively
[08:12] sustrik	there's a page about running perf tests
[08:12] sustrik	let me see
[08:12] pieterh	perhaps an explicit page we can refer to upfront, yes
[08:12] pieterh	e.g. expected throughput and latency of each pattern
[08:13] pieterh	very rough, but to set expectations properly
[08:14] sustrik	http://www.zeromq.org/results:perf-howto
[08:20] pieterh	sustrik: it's not very useful to beginners IMO
[08:21] pieterh	I'll think about how to explain this, it's got to be in terms of limits, capacity, speed of different patterns & transports
[08:21] pieterh	like a spec sheet
[08:21] sustrik	let me send you a diagram
[08:22] pieterh	sure
[08:39] pieterh	sustrik: random idea for cleaner semantics on multipart messages
[08:39] pieterh	make the MORE bit a property of a frame (zmq_msg_t) rather than a socket
[08:40] sustrik	that's how it works on wire level
[08:40] pieterh	it would also make more sense at the API level IMO
[08:40] sustrik	with API it's a pain to use
[08:40] sustrik	i thought of combining the two approaches
[08:40] pieterh	it means I can prepare a frame and write it with a generic method
[08:40] sustrik	yes
[08:40] pieterh	if you consider zmq_msg_t as a 'smart blob' (and I like this), then it should have a more property
[08:41] sustrik	the problem is this:
[08:41] sustrik	zmq_send (msg, ZMQ_SNDMORE);
[08:41] sustrik	vs.
[08:41] sustrik	zmq_msg_setflag (msg, ZMQ_MORE, 1);
[08:41] pieterh	you can always do both
[08:41] sustrik	zmq_send (msg, 0);
[08:41] sustrik	yes
[08:42] pieterh	I see the zmq_send (..., ZMQ_SNDMORE) as either an optimization or an override
[08:42] sustrik	yes
[08:42] sustrik	convenience feature
[08:42] pieterh	e.g. if I have an identity frame, and want to send it, it's always going to be MORE
[08:43] pieterh	read / write then become symmetric
[08:43] sustrik	yes, it would simplify devices
[08:43] pieterh	yes
[08:43] pieterh	indeed, any generic handling of multipart messages becomes cleaner
[08:43] sustrik	ack
[08:43] pieterh	another question, is it necessary to destroy a message after sending it?
[08:43] pieterh	sending the same frame N times is rather clumsy today
[08:43] sustrik	if you don't, nothing happens
[08:44] sustrik	but it's safer to do so for forward compatibility
[08:44] pieterh	ah, you mean _close is optional
[08:44] sustrik	nope
[08:44] sustrik	technically, closing empty message translates to noop
[08:45] sustrik	however, that is not guaranteed to hold in future versions of 0mq
[08:45] sustrik	so, preferable, close the messages so that 0mq can hook into message destruction process
[08:45] sustrik	preferably*
[08:46] pieterh	To send a message twice I need to:
[08:46] pieterh	zmq_msg_t copy;
[08:46] pieterh	zmq_msg_init (&copy);
[08:46] pieterh	zmq_msg_copy (&copy, &original);
[08:46] pieterh	zmq_send (socket, &copy, 0);
[08:46] pieterh	zmq_msg_close (&copy);
[08:46] sustrik	yes
[08:46] sustrik	well, you should close the original as well
[08:46] pieterh	so my question is whether it's necessary for 0MQ to destroy the message after sending
[08:46] sustrik	unless you are going to use it
[08:46] sustrik	not now
[08:46] sustrik	may be necessary in future
[08:47] pieterh	could I have a flag saying, "don't nullify after sending"?
[08:47] pieterh	MSG_REUSE
[08:47] sustrik	ah, a convenience feature
[08:47] sustrik	you can have that, but you should be aware it's slow
[08:47] pieterh	slower than creating copies each time?
[08:47] pieterh	how so?
[08:47] sustrik	there's reference counting going on there
[08:48] sustrik	which is implemented using atomic ops
[08:48] sustrik	which in turn means the memory bus is locked each time you do so
[08:48] pieterh	yes, but I'm copying the message each time now
[08:48] pieterh	that also locks the memory bus each time
[08:48] sustrik	yes, we can add the convenience flag
[08:49] sustrik	nope, copying doesn't lock the bus
[08:49] sustrik	well, unless there's contention between CPU cores on that particular cacheline
[08:49] pieterh	well, copying also uses atomic ops for reference counting
[08:49] sustrik	ah, you mean zmq_msg_copy
[08:49] sustrik	yes
[08:49] sustrik	so yes, we can add the flag
[08:50] pieterh	yes, there's no other way to send the same frame twice
[08:50] sustrik	what i'm saying is that it should not be the default
[08:50] pieterh	aight...
[08:50] pieterh	ah, certainly
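The cost being discussed comes from shared ownership: each copy of a message and each release has to update a counter with an atomic read-modify-write. A minimal C11 sketch of that idea follows. The `blob_t` type and its functions are hypothetical and are not libzmq's internal message layout.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared payload; NOT libzmq's real message structure. */
typedef struct {
    atomic_int refs;        /* number of owners sharing this payload */
    size_t size;
    unsigned char data[];   /* payload bytes follow the header */
} blob_t;

/* Allocate a payload with a single owner. Returns NULL on failure. */
blob_t *blob_new(const void *src, size_t size)
{
    blob_t *b = malloc(sizeof *b + size);
    if (!b)
        return NULL;
    atomic_init(&b->refs, 1);
    b->size = size;
    memcpy(b->data, src, size);
    return b;
}

/* "Copy" by sharing: one atomic increment, no byte copy. */
blob_t *blob_share(blob_t *b)
{
    atomic_fetch_add(&b->refs, 1);
    return b;
}

/* Drop one reference; returns 1 if this call freed the payload. */
int blob_release(blob_t *b)
{
    if (atomic_fetch_sub(&b->refs, 1) == 1) {
        free(b);
        return 1;
    }
    return 0;
}
```

In this sketch, sending the same payload N times costs N increments and N decrements on one counter, which is the overhead a reuse flag would also have to pay.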
[08:52] pieterh	well, I'll add this to the 3.0 page but I have no idea how to make it :-)
[08:52] pieterh	it would be useful, though IMO
[08:53] sustrik	it's backward compatible, so no need to solve it immediately
[08:54] sustrik	the backward incompatible changes are what's in focus now
[08:54] sustrik	they have to be done in a single go, if possible
[08:54] sustrik	to minimise the pain
[08:54] sustrik	the remaining functionality can be added gradually afterwards
[08:58] pieterh	I'm not sure the changes will be as painful as you imagine
[08:58] sustrik	dunno, but minimising the pain is a good thing in itself
[08:59] pieterh	yes, at least doing it all in one go
[10:59] pieterh	sustrik: I've built the basic API for the high-level C binding, at https://github.com/zeromq/zapi
[10:59] pieterh	will fill in the pieces over the next few days
[11:01] drbobbeaty	pieterh: if the C level binding is separate in 3.x, is the C++ binding as well? Are they different bindings? What comes "standard" with the ZMQ libraries? Any 'default' API?
[11:01] pieterh	drbobbeaty: yes, we plan to split off the C++ binding as well
[11:01] pieterh	the default API is the Core C API
[11:02] pieterh	the new C binding will work over 2.x as well as 3.x
[11:02] drbobbeaty	Ah! I see the advantage to having the separate binding now - bridge the versions. Nice plan.
[11:03] pieterh	also it makes it much easier to add useful functionality without breaking other language bindings
[11:03] pieterh	so we can e.g. write a C reactor without affecting the core
[11:18] Guthur	pieterh: I think I'll draw some inspiration from the new C binding
[11:19] pieterh	Guthur: could be fun, I've tried to use a class-oriented approach for most of it
[11:20] pieterh	I'll be converting the Guide examples to zapi when it's ready
[11:20] pieterh	so if the C++ binding is anything like that, it'll be a lot easier for those examples too
[11:21] pieterh	Guthur: when you need a repository created in the zeromq organization, give me a shout
[11:21] Guthur	C# you mean, hehe
[11:21] pieterh	oh, sorry
[11:21] pieterh	C#
[11:24] Guthur	When 0MQ 3.0 API is finalized I'm going to develop a new branch for clrzmq2
[11:24] Guthur	I have a far better idea now what works and what doesn't
[11:25] pieterh	please don't call it clrzmq3 :-)
[11:25] Guthur	yeah I'll resist that temptation
[11:25] Guthur	It will just be a branch of clrzmq2
[11:26] pieterh	though ... embracing the chaos... it could be useful
[11:26] pieterh	if binding versions track the development version
[11:26] pieterh	so people know that clrzmq3 supports 3.x and 2.x
[11:26] Guthur	the assembly will be version 3.x
[11:26] Guthur	the assembly is currently 2.1.x
[11:26] pieterh	problem is that the version number is in the github repo name
[11:27] Guthur	true
[11:27] pieterh	we have the same problem with zeromq2
[11:28] Guthur	not an easy decision to be honest
[11:28] Guthur	I really don't want to confuse potential/current users
[11:30] Guthur	pieterh: can you briefly explain your reactor pattern thing in the C binding
[11:31] pieterh	Guthur: to be honest I've never used a reactor pattern so this is kind of a guess
[11:31] pieterh	the idea is to register the events you want to handle
[11:31] pieterh	and then a tickless poll loop can handle it
[11:31] mikko	howdy boys
[11:31] mikko	and girls
[11:31] pieterh	hey mikko!
[11:31] Guthur	hi mikko
[11:31] Guthur	pieterh: what is the alarm part
[11:32] Guthur	and clock
[11:32] pieterh	Guthur: if you look at a realistic app like the Majordomo broker
[11:32] pieterh	then it mixes socket events with timer events
[11:32] pieterh	e.g. "send heartbeats every 3 seconds"
[11:32] pieterh	"kill server if no response in 2500 msecs"
[11:33] pieterh	I made a proper tickless poll loop in the flcliapi (freelance client)
[11:33] pieterh	it calculates the next timer event and polls that long
[11:33] Guthur	ok so that would be to all registered sockets
[11:33] pieterh	rather than polling every second or whatever
[11:33] Guthur	I haven't got as far as freelance yet
[11:34] pieterh	take a brief look at the flcliapi poll loop, if you want to understand zloop
[11:34] Guthur	I've just got to majordomo
[11:34] pieterh	:-)
[11:35] pieterh	the reactor won't work for all cases, sometimes we poll selectively
[11:35] pieterh	but it should help the more complex designs
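The tickless idea described above can be sketched as a function that derives the poll timeout from the nearest timer deadline instead of waking at a fixed interval. The function name and the millisecond clock are assumptions made for illustration; this is not the zloop or flcliapi code.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns how long to block in poll, in milliseconds:
     -1  no timers registered, block until a socket event arrives
      0  a timer is already due, do not block
     >0  time remaining until the earliest deadline */
long next_poll_timeout(const int64_t *deadlines_ms, size_t count, int64_t now_ms)
{
    if (count == 0)
        return -1;

    /* Find the earliest absolute deadline. */
    int64_t earliest = deadlines_ms[0];
    for (size_t i = 1; i < count; i++)
        if (deadlines_ms[i] < earliest)
            earliest = deadlines_ms[i];

    if (earliest <= now_ms)
        return 0;
    return (long) (earliest - now_ms);
}
```

With a heartbeat due in 3000 msecs and a server timeout due in 2500 msecs, the loop would poll for 2500 msecs, handle whichever timer or socket event fired, and recompute. The result needs converting to whatever unit the poll call in use expects.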
[11:40] Guthur	I think I've sort of included a limited reactor pattern in clrzmq2
[11:42] Guthur	but unfortunately the incremental nature of the development of the polling mechanism has left the API in a state that is less clear than I would like
[11:42] Guthur	it's one of the main areas I would like to refactor
[11:44] Guthur	I'd also like to simplify the send/recv
[11:45] pieterh	in C# strings are just blobs, right?
[11:45] Guthur	they are objects
[11:46] pieterh	right
[11:47] Guthur	i'd rather deal with them in an msg object as opposed to at the socket level send/recv
[11:48] pieterh	the breakdown I'm using in zapi is frame vs. msg
[11:48] pieterh	where frame is one part, msg is a multipart object
[11:48] Guthur	I meant to ask about the framw
[11:48] Guthur	frame*
[11:48] pieterh	0MQ uses 'msg' to mean 'part', which is confusing
[11:49] Guthur	yeah, was talking more about the multipart you describe
[11:49] pieterh	perhaps I should use 'part' instead of 'frame'... anyhow
[11:49] pieterh	the frame class lets you do things like 'receive the identity' and 'send the identity' with ROUTER sockets
[11:50] pieterh	whereas the msg class is more like 'recv a list of frames' and 'send a list of frames'
[11:51] Guthur	I was also then going to take a lazy marshalling approach, only marshalling the zeromq msg (part) to a c# data type when required, should improve performance in situations where you don't need to know about the whole message
[11:52] pieterh	that makes sense, it's what I'm doing in other places
[11:52] Guthur	it's sort of all in my head at the moment though, I really should try to define the API like you have done
[11:52] pieterh	yeah, start with the API, it makes everything clearer
[13:03] drbobbeaty	I have a core dump with ZeroMQ 2.1.3 this morning... I have detailed it in this gist: https://gist.github.com/881176 . It includes the stack trace and the code from OpenPGM that's the last step in the trace.
[13:03] drbobbeaty	It seems impossible to be true -- if the assert is causing the exception, then the value of minor_bucket has to be NULL... but it's not, as shown in the stack trace.
[13:03] drbobbeaty	Is this a problem for Steve?
[13:04] drbobbeaty	(I've received this on two different boxes on four separate occasions this morning)
[13:06] drbobbeaty	HA! I think it's the data_size being 0!
[13:06] drbobbeaty	Any ideas as to why that would be?
[13:07] sustrik	drbobbeaty: you have to discuss that with steve-o
[13:07] sustrik	seems to be a problem with the new version of openpgm
[13:09] drbobbeaty	Steve-o: can you have a look at https://gist.github.com/881176 and give me an idea of why data_size == 0 on the call? I'm hitting the assertion and have no idea why.
[13:10] drbobbeaty	pieterh: should I just hit the mailing list for steve-o?
[13:10] pieterh	drbobbeaty: I'd do that, he's in Asia and probably out of the office by now
[13:11] pieterh	the good news is you can get a new OpenPGM and use that with 2.1.3 without further changes
[13:16] Guthur	sustrik: I see in the IPC discussion no one is actually talking about using IOCP and named pipes. I've been meaning to ask you about whether you think an abstraction layer over either Sockets & Named Pipes or IOCP itself could feasibly provide the necessary functionality to mimic what is required from poll, select etc in 0MQ
[13:16] Guthur	this would all be Windows-centric changes of course
[13:22] drbobbeaty	Steve-o: if you get this, please check the mailing list... I'm getting more than two core dumps an hour with 2.1.3 due to this data_size == 0 issue. Yikes!
[13:23] pieterh	drbobbeaty: sorry about this, we don't have the facilities to properly test OpenPGM yet
[13:24] pieterh	I'd advise you to rollback to 2.1.2 until we get a fix to it
[13:24] drbobbeaty	I understand... If I had to guess it's an edge condition where data_size == 0, and I'm just hitting it more with all the exchange feeds I'm dealing with.
[13:24] sustrik	Guthur: what's being discussed is a workaround
[13:25] drbobbeaty	pieterh: that's a good plan.
[13:25] sustrik	something that would look like IPC but would actually be TCP
[13:25] pieterh	drbobbeaty: I expect tomorrow morning Steve will have an updated OpenPGM package
[13:25] sustrik	real solution is IPC & NamedPipes
[13:25] pieterh	you can install it and rebuild 2.1.3, --with-openpgm=<version> or somesuch, I'm not 100% sure on that syntax
[13:28] Guthur	sustrik: does the abstraction sound like a feasible objective?
[13:29] sustrik	Guthur: the abstraction exists already
[13:30] sustrik	check how poll_t, select_t, epoll_t etc. interface with the rest of the system
[13:30] sustrik	check whether IOCP can use the same interface
[13:30] sustrik	if not, propose changes
[13:31] Guthur	I thought the problem was that underneath those abstractions they use functionality that IOCP does not provide
[13:31] Guthur	IOCP only providing notification of completion
[13:32] sustrik	right, IOCP has an additional buffer between the user and the network
[13:32] sustrik	something like AIO
[13:32] sustrik	is there a way to limit the size of the buffer?
[13:32] Guthur	I can check that out
[13:33] sustrik	yes, please
[13:33] sustrik	if there's no limit to the buffer
[13:33] sustrik	it could easily exhaust all the memory
[13:33] sustrik	in such case the code using IOCP would have to keep track of amount of memory in use
[13:35] Guthur	oh, that does not sound ideal
[13:45] sustrik	cremes: hi
[13:46] cremes	sustrik: good morning
[13:46] sustrik	morning
[13:46] sustrik	as for the assertion, any chance of getting backtrace?
[13:47] cremes	sustrik: i can try to capture it in gdb; give me a few minutes and i'll see what i can do
[13:47] sustrik	ok
[13:47] sustrik	another thing: the allocation mechanism you proposed
[13:47] sustrik	have you found out where the memory disappears?
[13:48] sustrik	(issue 174)
[13:48] cremes	sustrik: i need to run your code with the patch you supplied on my linux box
[13:48] cremes	i ran it on my osx box and it was disappearing in the same place as before
[13:48] cremes	(the backtrace in issue 174)
[13:52] sustrik	i thought the proposal you made is related to 174
[13:54] cremes	sustrik: it was; based on your feedback i was assuming the memory growth was due to page fragmentation caused
[13:54] cremes	by small memory allocations
[13:55] sustrik	afaik most allocators have per-size caches
[13:55] sustrik	so you have special cache for 16 byte blocks
[13:56] sustrik	another one for 32 byte blocks etc.
[13:56] Guthur	sustrik: you can indeed specify a buffer size
[13:56] sustrik	that in turn leads to optimal heap utilisation
[13:56] Guthur	I need to check some of my resources at home though to remind myself of details of IOCP
[13:56] cremes	sustrik: okay, then i don't understand why in the ticket you wrote that it's due to a known problem with malloc
[13:56] sustrik	Guthur: how do you do that?
[13:57] Guthur	sustrik: during the read call
[13:57] Guthur	you specify a buffer and bytestoread
[13:58] sustrik	Guthur: i meant limiting the send buffer
[13:58] sustrik	say, total amount that can be used is 64kB
[13:58] sustrik	attempt to exceed the buffer would mean the send call would fail
[13:59] sustrik	is there anything like that in IOCP?
[13:59] sustrik	cremes: the problem i referred to is that processes don't return allocated memory to the OS
[14:00] sustrik	thus, it's not used, but cannot be reused by a different process
[14:00] cremes	sustrik: sure, and from my research that is due to page fragmentation caused by malloc/free called on lots of small blocks (smaller than a page)
[14:01] sustrik	does it return allocated memory at all?
[14:01] cremes	yes, if an entire page can be freed
[14:01] sustrik	that's linux?
[14:01] cremes	yes
[14:01] sustrik	nice
[14:02] cremes	i got this from a few different write-ups that i read on stackoverflow and one other site
[14:02] sustrik	iirc it wasn't the case in the past
[14:02] cremes	i'll try to find them again
[14:02] Guthur	sustrik: do you mean more than http://msdn.microsoft.com/en-us/library/aa365748(v=VS.85).aspx
[14:02] cremes	yes, it appears that was an issue with kernel 2.4 and earlier
[14:02] cremes	apparently 2.6 resolves that issue
[14:05] Guthur	you could presumably build some chunking mechanism on top of that with IOCP
[14:05] Guthur	send chunk 1 with Completed send chunk 2....
[14:05] Guthur	with/when
[14:09] sustrik	Guthur: yes, something like that
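The bookkeeping sustrik mentions (tracking memory held by in-flight sends, and failing a send that would exceed a cap such as 64kB) could look roughly like the following. The type and function names are hypothetical and no Windows API is called here; a real IOCP version would release the reservation from its completion handler.

```c
#include <stddef.h>

/* Hypothetical per-connection send budget. Invariant: in_flight <= limit. */
typedef struct {
    size_t limit;      /* max bytes allowed in flight, e.g. 64 * 1024 */
    size_t in_flight;  /* bytes handed to the OS and not yet completed */
} send_budget_t;

/* Reserve room for a send. Returns 1 on success, 0 if the send would
   push the in-flight total over the limit (caller should retry later). */
int send_budget_reserve(send_budget_t *b, size_t bytes)
{
    if (bytes > b->limit - b->in_flight)
        return 0;
    b->in_flight += bytes;
    return 1;
}

/* Give the bytes back once the send has completed. */
void send_budget_release(send_budget_t *b, size_t bytes)
{
    b->in_flight -= bytes;
}
```

Chunking, as Guthur suggests, then amounts to reserving one chunk, issuing it, and reserving the next only after the previous completion has released its bytes.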
[14:09] sustrik	cremes: nice
[14:10] sustrik	now, the problem occurs because of 2 specific allocations
[14:10] sustrik	according to the OSX tool
[14:10] sustrik	1. allocating the chunk in yqueue_t
[14:10] sustrik	2. allocating the encoder/decoder buffers in engine
[14:11] sustrik	so, afaiu, if the size of those allocations is a multiple of page size, the problem should go away, right?
[14:11] sustrik	as for 2 the size of those buffers is 8kB
[14:12] sustrik	so there's no fragmentation issue
[14:12] sustrik	hm, for 1. the size is 12kB
[14:12] sustrik	so the fragmentation should not happen
[14:13] sustrik	but given that the test is run on OSX, the allocation mechanism may be different
[14:13] cremes	sustrik: true, but the memory growth is also reproducible on linux
[14:14] cremes	i just don't have a tool there to show where it's happening
[14:14] sustrik	:(
[14:14] cremes	i am assuming it occurs in the same place on both OSes
[14:14] sustrik	you'll have to run the test with my patch
[14:14] cremes	sustrik: just to confirm, you were able to reproduce this on your linux system with that example code, right?
[14:14] sustrik	that'll at least show whether it's the yqueue issue
[14:15] sustrik	yes, i think so
[14:15] sustrik	i assumed it was process not returning memory to the OS
[14:15] sustrik	but given the issue was solved in linux/2.6
[14:15] sustrik	it has to be something different
[14:16] cremes	sustrik: after i try to get this backtrace for the other issue (181) i'll apply your patch on linux and run it
[14:16] sustrik	thanks
[14:20] Guthur	pieterh: with MDP do you see anything inherently wrong with having workers being other brokers
[14:29] Guthur	maybe even have service requests routed via a URI
[14:52] pieterh	Guthur: workers can certainly be brokers as well
[14:54] cremes	sustrik: got a core; do you want the output from 'bt' or from 'thread apply all bt'?
[14:54] sustrik	presumably the latter
[14:55] cremes	sustrik: ok... i'll give you both :)
[14:55] sustrik	thx
[14:55] cremes	sustrik: https://gist.github.com/881336
[14:56] cremes	btw, this is off of commit 1619b3d84a04fe1886347fd83280a607
[14:59] sustrik	cremes: the assertion happens in mutex destructor
[14:59] sustrik	interesting
[15:00] sustrik	do you have console output?
[15:00] sustrik	it should print out the actual error code
[15:00] cremes	i don't understand the question; i have a core file... is there something else you want me to look at?
[15:00] sustrik	errno
[15:00] cremes	p errno?
[15:00] sustrik	yes
[15:00] sustrik	in the asserted thread
[15:01] sustrik	thread 1
[15:01] sustrik	p errno
[15:01] cremes	Cannot find thread-local variables on this target
[15:01] sustrik	:|
[15:02] sustrik	does the program show the console output
[15:02] sustrik	?
[15:02] sustrik	stderr
[15:02] sustrik	?
[15:02] sustrik	if so, the error should be visible there
[15:03] cremes	looking...
[15:03] sustrik	you should see the assert there
[15:04] sustrik	the line above it should be the error
[15:04] cremes	all it printed was the assert from socket_base.cpp
[15:05] sustrik	the stack trace shows a different assert
[15:05] sustrik	is that the same run?
[15:05] cremes	let me run it again and put everything to the console instead of to files (i usually redirect the output)
[15:05] cremes	same run
[15:05] sustrik	strange
[15:06] cremes	running again...
[15:12] cremes	sustrik: all it prints is:
[15:12] cremes	Assertion failed: sessions.empty () (socket_base.cpp:127)
[15:12] cremes	Aborted (core dumped)
[15:14] cremes	the thread backtraces are the same for this core
[15:15] sustrik	that's really strange
[15:15] sustrik	thread 1 reports the assertion happened in mutex.hpp
[15:16] sustrik	rather than in socket_base.cpp
[15:16] sustrik	maybe it's optimiser's fault
[15:16] cremes	i don't have an explanation :(
[15:16] pieterh	cremes: are you building a debug version?
[15:16] sustrik	any chance of building 0mq with optimisations disabled?
[15:16] cremes	pieterh: yes, ./configure --enable-debug
[15:16] cremes	sustrik: yes
[15:17] sustrik	--enable-debug should turn optimisations off
[15:17] sustrik	(-O0)
[15:17] cremes	let me rebuild
[15:17] cremes	i'll make clean first...
[15:18] sustrik	cremes: wait a sec
[15:18] sustrik	maybe it makes more sense to try to figure out what's failing
[15:18] sustrik	are you using identities?
[15:18] cremes	oops, too late
[15:18] cremes	yes, i am using identities
[15:19] sustrik	the failing socket seems to be xreq, right?
[15:19] cremes	yes
[15:20] sustrik	any chance it gets connected to two peers that happen to have the same identity
[15:20] sustrik	one of them via connect, other one via bind?
[15:20] cremes	let me take a quick look at the code; i want to say "no" but let me verify
[15:21] sustrik	simpler question: do you bind or connect the xreq socket; or both?
[15:22] cremes	hard to say
[15:22] cremes	if the xreq is part of a QUEUE device, then it's binding
[15:22] cremes	otherwise all other sockets connect
[15:23] sustrik	never both on the same socket, right?
[15:23] cremes	correct
[15:24] sustrik	hm, maybe the socket_base_t happens to get destructed twice
[15:24] sustrik	is it possible to add printf's to your program?
[15:25] cremes	sustrik: yes
[15:25] cremes	you want them added to calls to zmq_close() ?
[15:25] sustrik	something like this:
[15:25] sustrik	printf ("alloc %p\n", (void*) this);
[15:26] sustrik	in socket_base_t constructor
[15:26] sustrik	and
[15:26] sustrik	printf ("dealloc %p\n", (void*) this);
[15:26] sustrik	in the destructor
[15:26] sustrik	that should show us whether the destructor is called twice for the same object
[15:27] cremes	in the destructor, do you want this printf called first or last?
[15:27] cremes	presumably first
[15:27] sustrik	first
[15:28] cremes	running...
[15:28] cremes	hmmm, i should have separated that stuff out to stderr
[15:31] cremes	how do i redirect stdout and stderr to separate files in bash?
[15:33] cremes	figured it out
[15:33] pieterh	cremes: someproc 2> stderr.log
[15:33] cremes	running...
[15:38] mikko	pieterh: did you merge the pull req?
[15:38] mikko	there is prolly more coming soon
[15:38] mikko	([zeromq-dev] ZMQ 2.1.3 w/OpenPGM Assertion on pgm_rate_check2())
[15:38] pieterh	mikko: you mean the autoconf fix for OpenPGM in 2.1?
[15:38] mikko	y
[15:39] mikko	im thinking this atm
[15:39] pieterh	hmm, did you see my email about sending patches instead?
[15:39] pieterh	I think it's going to be simpler
[15:39] pieterh	pubsub instead of reqrep
[15:39] mikko	"There is a smarter way to do this, and I think it is:"
[15:39] pieterh	two subscribers (martin, myself), one publisher (you)
[15:40] mikko	i will update a patch and send to ML
[15:40] pieterh	that's best IMO
[15:41] pieterh	somewhere there's a magic git incantation to pull a commit from a random git, but I don't know it
[15:41] mikko	what do you mean?
[15:41] mikko	git is built on that sort of thing
[15:41] sustrik	cherry-pick?
[15:41] mikko	you could even pull changes from my personal repo
[15:42] mikko	and if i was privileged to i could push to your home machine
[15:42] cremes	sustrik: with a confirmed debug build, the backtraces are different now
[15:42] cremes	sustrik: https://gist.github.com/881439
[15:42] cremes	thread 1 and thread 19 look relevant
[15:42] cremes	i still can't print errno
[15:44] sustrik	never mind, this is the assert (empty) thing
[15:44] sustrik	what about the output?
[15:44] pieterh	mikko, ok, so how do we do this without using github's pull request?
[15:44] cremes	lots of alloc/dealloc messages
[15:44] pieterh	e.g. create an issue pointing to a remote git/commit
[15:45] mikko	pieterh: or mailing list?
[15:45] pieterh	let's work it out here...
[15:45] sustrik	cremes: check for 0x7f1a582a4ea0
[15:45] pieterh	what git commands do I need to use to pull one commit from your git
[15:45] cremes	sustrik: 1 alloc, 1 dealloc
[15:46] cremes	sustrik: however, that is the last dealloc printed
[15:46] mikko	git remote add mikko <url to my repo>
[15:46] pieterh	ok
[15:46] sustrik	and the printf is at the beginning of the destructor, right?
[15:46] mikko	git fetch mikko
[15:46] mikko	then check the hash of the commit
[15:46] cremes	correct
[15:46] pieterh	git fetch --no-tags, at least
[15:46] mikko	and cherry-pick it
[15:46] mikko	i think that should do it
[15:46] sustrik	then it's some other problem...
[15:46] cremes	sustrik: line 792 is the first line of the destructor
[15:47] pieterh	git fetch won't make a mess of your target git?
[15:47] cremes	sustrik: i grep'ed all of the dealloc lines to their own file 'z2'
[15:47] sustrik	792?
[15:47] sustrik	what file?
[15:47] cremes	sustrik: then ran: wc -l z2; sort -u z2 | wc -l
[15:47] sustrik	socket_base_t?
[15:47] cremes	socket_base.cpp
[15:47] mikko	pieterh: git-fetch - Download objects and refs from another repository
[15:47] mikko	pieterh: nope
[15:47] sustrik	cremes: what version?
[15:48] sustrik	my socket_base.cpp ends at 790
[15:48] pieterh	mikko: ok, I like cherry-pick, it's how I pull patches from sustrik
[15:48] cremes	sustrik: oops, sorry, misread the vi output; line 124 of 792 (i added two printfs)
[15:48] pieterh	but you have to do 'git fetch --no-tags whatever' to avoid polluting your home git
[15:48] sustrik	cremes: ok
[15:48] pieterh	sustrik: do you want to try cherry-pick from mikko's git?
[15:49] sustrik	i want a patch on the ML
[15:49] sustrik	that's the process
[15:49] cremes	sustrik: anyway, by removing duplicate dealloc prints using 'sort -u' it looks like 180 sockets are dealloc'ed twice
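One caveat when comparing `wc -l` against `sort -u`: an allocator may hand a freed address to a later object, so an address that appears twice in the dealloc list is not by itself proof of a double destruction. A stricter check replays the alloc and dealloc lines in order and flags any dealloc of an address that is not currently live. The sketch below uses hypothetical names and assumes the log has already been parsed into events.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { EV_ALLOC, EV_DEALLOC } ev_kind_t;

typedef struct {
    ev_kind_t kind;
    uintptr_t addr;   /* the %p value printed by the constructor/destructor */
} event_t;

#define MAX_LIVE 1024   /* arbitrary cap for this sketch */

/* Replays events in log order. Returns the number of dealloc events whose
   address was not live at that point, or -1 if more than MAX_LIVE objects
   are alive at once. */
int count_bad_deallocs(const event_t *ev, size_t n)
{
    uintptr_t live[MAX_LIVE];
    size_t nlive = 0;
    int bad = 0;

    for (size_t i = 0; i < n; i++) {
        if (ev[i].kind == EV_ALLOC) {
            if (nlive == MAX_LIVE)
                return -1;
            live[nlive++] = ev[i].addr;
        } else {
            /* Linear search of the live set for this address. */
            size_t j = 0;
            while (j < nlive && live[j] != ev[i].addr)
                j++;
            if (j == nlive)
                bad++;                      /* dealloc with no live alloc */
            else
                live[j] = live[--nlive];    /* remove by swapping in the last entry */
        }
    }
    return bad;
}
```

A sequence of alloc, dealloc, alloc, dealloc on the same address yields 0 here, while alloc, dealloc, dealloc yields 1.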
[15:49] pieterh	sustrik: you don't send me patches via the ML
[15:49] sustrik	i should
[15:49] pieterh	so the process isn't symmetric now
[15:49] pieterh	perhaps
[15:50] pieterh	sending me a commit tag is much less work for you
[15:50] pieterh	and also easier for me
[15:50] pieterh	if you send the URI to the commit, it's as good as a patch
[15:50] sustrik	that's good for backporting
[15:50] pieterh	everyone who cares can review it
[15:50] pieterh	here we're porting in all directions, no back or fwd
[15:50] sustrik	it's only a reference to what has to be backported
[15:50] pieterh	nope, no special cases please
[15:50] sustrik	with new functionality you want a full patch
[15:51] pieterh	it should be a single process between any two gits
[15:51] pieterh	github pull requests are eliminated because they only work between forks
[15:52] sustrik	there has to be an entry point
[15:52] pieterh	entry point?
[15:52] sustrik	the place where patch enters the ecosystem
[15:52] pieterh	one always addresses a patch/commit to the author of the target git
[15:52] pieterh	entry point can be ML, that's excellent
[15:52] pieterh	it's ideal for this
[15:52] sustrik	yup
[15:52] pieterh	but signed patches are extra effort
[15:52] pieterh	and we can do it better
[15:53] pieterh	i think, anyhow, and I'm sure it should be orthogonal for any two gits
[15:53] pieterh	i.e. if you want to send a patch to mikko, same process
[15:53] sustrik	mikko doesn't care about sign-offs
[15:53] sustrik	as it's his private git
[15:54] pieterh	that's a separate issue
[15:54] pieterh	signed off or not, that's QC on the patch or commit
[15:54] pieterh	I'd like a single process for all of us, no special cases
[15:54] mikko	what about CLA?
[15:54] mikko	too complicated?
[15:54] pieterh	CLA?
[15:55] pieterh	way too complex
[15:55] pieterh	by 100x
[15:55] mikko	contributor license agreement
[15:55] mikko	like apache etc do
[15:55] pieterh	no way, dead body over, etc.
[15:55] sustrik	sign-off is simpler imo
[15:55] pieterh	it creates huge management issues and individual developers hate it to the point of not contributing
[15:56] pieterh	we switched to co-owned code and signoff last year, it's superior in every sense
[15:56] pieterh	this is not relevant to the thread however
[15:56] pieterh	cherry-picking is very simple for both parties
[15:56] mikko	pieterh: we were a bit too drunk to discuss this properly last weds
[15:56] pieterh	mikko: drunk? I'm Scottish, we don't get drunk
[15:57] pieterh	oh, hang on, yes we do... but it was English beer, doesn't count
[15:57] pieterh	sustrik: I'm not sure cherry-picking conforms to signing off, that's all
[15:58] pieterh	but it has to, of course
[15:58] mikko	can you not sign a commit?
[15:58] pieterh	pulling code from a public git means it's by definition signed off
[15:58] pieterh	you don't need to
[15:58] pieterh	once you commit & push to a git, you license the code under whatever, LGPL etc.
[15:59] pieterh	this is why pull requests work
[15:59] pieterh	but they are crippled for general use
[15:59] pieterh	sustrik: allow me to make a short proposal on the list, OK?
[16:00] sustrik	sure
[16:00] pieterh	thanks
[16:00] pieterh	BTW the new C API is almost working
[16:00] pieterh	it does everything I was asking for, e.g. automatic close of sockets, message reuse, etc.
[16:00] pieterh	nicely emulates 2.0 semantics on termination :-)
[16:01] sustrik	goodo
[16:01] cremes	sustrik: i applied your patch for issue 174 and ran your minimal test case (also in that issue)
[16:01] cremes	sustrik: shall i email you the output?
[16:01] cremes	it's too big to pastie
[16:04] Guthur	pieterh: you're scottish?
[16:04] pieterh	Guthur: half Scottish, half Belgian...
[16:04] pieterh	born somewhere in the middle of the North Sea
[16:06] Guthur	one of those water births then
[16:07] pieterh	lol
[16:08] Guthur	belgians aren't renowned for their seafaring as well
[16:09] Guthur	I hear they don't have much of a navy
[16:10] pieterh	my great-great grandfather was hung for being a pirate
[16:11] sustrik	cremes: did it drop to 0?
[16:11] cremes	sustrik: no, it didn't; i just sent you the output in an email with attachment
[16:12] sustrik	ok
[16:13] sustrik	cremes: it looks like it's dropping
[16:13] sustrik	see the end of the file
[16:13] sustrik	if you let it run a bit longet it would probably get to 0
[16:14] sustrik	longer*
[16:14] cremes	sustrik: i let it run for 5 minutes... it didn't appear to be dropping anymore
[16:15] cremes	sustrik: i'll let it go for 20m and see what happens
[16:15] sustrik	it seems like it was terminated in the middle of the process of dropping
[16:15] sustrik	see the last line
[16:15] sustrik	it's cut in the middle of output
[16:16] sustrik	356 355 354 35
[16:16] cremes	sustrik: i think that's from buffered output
[16:16] cremes	i was tailing the log file and it stopped there
[16:16] sustrik	ah
[16:16] sustrik	anyway, it dropped almost to zero
[16:16] sustrik	up to 30,000
[16:16] sustrik	then back to 353
[16:17] sustrik	the leak is either elsewhere
[16:17] sustrik	or it's the process not returning memory to the OS
[16:18] cremes	sustrik: i'll let it sit for a while... i just saw it go from 1 to 3200 and now it's down to 310
[16:18] cremes	but it's not printing anything anymore
[16:18] sustrik	hm
[16:18] cremes	if it's 'cut off' due to stdout buffering, hitting ctrl-c ought to flush that buffer
[16:19] sustrik	maybe a problem with the buffering itself?
[16:19] sustrik	anyway, i've used simple non-buffered console
[16:19] sustrik	and i saw the number of chunks dropping to 0
[16:19] cremes	i don't know how to get an unbuffered xterm going
[16:20] sustrik	cremes: i don't think it matters
[16:20] cremes	i should probably add an explicit flush
[16:20] sustrik	350 chunks won't account for all the allocated memory you see
[16:20] sustrik	right?
[16:20] sustrik	1 chunk = 12kb
[16:20] mikko	fflush (stdout);
[16:20] sustrik	x350 = 4.2MB
[16:20] cremes	sustrik: do those chunks contain message data?
[16:21] sustrik	how large is your message?
[16:21] sustrik	8 bytes?
[16:21] cremes	your test case was using a 20byte message
[16:21] sustrik	ok
[16:21] sustrik	then the chunks contain message data
[16:23] cremes	i added fflush() and am rerunning; we'll see if that makes a difference
[16:23] cremes	it did; count went to 0
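
The truncated log line discussed above is what block-buffered output typically looks like to a reader such as `tail`. The conversation is about C's `fflush (stdout)`; the following is only a minimal Python sketch of the same effect, where the temporary file, the buffer size, and the helper names `visible_bytes` and `demo` are illustrative assumptions rather than anything from the test case in the issue.

```python
import os
import tempfile

def visible_bytes(path):
    """Return how many bytes another reader (e.g. tail) could see on disk."""
    return os.path.getsize(path)

def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # A large buffer keeps small writes in user space until flushed.
        with open(path, "w", buffering=64 * 1024) as log:
            log.write("356 355 354 353\n")
            before = visible_bytes(path)  # write is still sitting in the buffer
            log.flush()                   # plays the role of fflush(stdout) in C
            after = visible_bytes(path)
        return before, after
    finally:
        os.remove(path)

if __name__ == "__main__":
    before, after = demo()
    print("bytes visible before flush:", before)
    print("bytes visible after flush:", after)
```

Until the explicit flush, the counter output exists only in the writer's buffer, which would explain a log that appears to stop mid-line while the process is still running.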
[16:23] cremes	resident memory is still inflated though
[16:24] sustrik	the next step would be to replace all the mallocs and frees and news and deletes
[16:24] sustrik	by something that would track amount of memory allocated
[16:24] cremes	sustrik: i think it's easier for me to just restart my applications every 48 hours
[16:24] cremes	to free up swap
[16:25] sustrik	hm
[16:25] sustrik	let's keep the issue open then
[16:25] sustrik	when i get some time
[16:25] cremes	ok
[16:25] cremes	i'll add an update with my findings
[16:25] spht	Does zmq not exit in its SIGINT handler? Can't SIGINT to kill my script using the python binding..
[16:26] sustrik	i'll try to check whether the memory is held by 0mq or glibc
[16:27] sustrik	spht: what version of 0mq are you using?
[16:27] spht	sustrik: Python module version is 2.0.10
[16:28] sustrik	there used to be a problem with SIGINT in old versions, let me check...
[16:29] cremes	spht: this was fixed in the 2.1 branches
[16:29] cremes	spht: time to upgrade
[16:29] spht	cremes: sustrik: Ahht thanks! I'll go upgrade ASAP :)
[16:29] ianbarber	mikko: about?
[16:29] mikko	ianbarber: yes
[16:29] mikko	bam!
[16:31] ianbarber	bam!
[16:31] ianbarber	so, I was translating a pieterh example
[16:31] ianbarber	which measures round trip speed
[16:31] ianbarber	it starts a client, worker and broker, and tries doing send() recv() 10000 times and then send() 10000 and recv() 10000
[16:32] ianbarber	to compare the speed of waiting for a response versus async
[16:32] ianbarber	in the c version I get ~6k/s for the first, around 50k/s for the second
[16:32] ianbarber	in the PHP i get ~6k/s for both
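
The two measurement styles described above (send/recv in lock step versus sending everything and then collecting the replies) can be sketched without 0MQ at all. The example below is an assumption-laden stand-in: it uses a plain `socket.socketpair()` and an echo thread in place of the client, worker and broker, and the names `lock_step`, `pipelined` and `run` are hypothetical. It shows the shape of the comparison, not the numbers quoted in the channel.

```python
import socket
import threading
import time

MSG = b"hello"  # fixed-size payload so both sides can frame by length

def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed")
        buf.extend(chunk)
    return bytes(buf)

def echo_worker(sock, count):
    """Echo `count` fixed-size messages back to the sender."""
    for _ in range(count):
        sock.sendall(recv_exact(sock, len(MSG)))

def lock_step(sock, count):
    """send(), recv(), repeat: every request waits for a full round trip."""
    for _ in range(count):
        sock.sendall(MSG)
        recv_exact(sock, len(MSG))
    return count

def pipelined(sock, count):
    """send() everything first, then collect all the replies."""
    for _ in range(count):
        sock.sendall(MSG)
    for _ in range(count):
        recv_exact(sock, len(MSG))
    return count

def run(mode, count=1000):
    """Run one mode against an echo thread; return (replies, seconds)."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=echo_worker, args=(server, count))
    worker.start()
    start = time.perf_counter()
    done = mode(client, count)
    elapsed = time.perf_counter() - start
    worker.join()
    client.close()
    server.close()
    return done, elapsed

if __name__ == "__main__":
    for mode in (lock_step, pipelined):
        done, elapsed = run(mode)
        print(mode.__name__, done, "round trips in", round(elapsed, 4), "s")
```

If a binding shows the same rate for both styles, one thing to check is whether its send path is actually returning before the reply arrives, which is the only difference between the two loops here.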
[16:33] mikko	that's interesting
[16:33] ianbarber	i thought it might be because PHP was doing a msg destroy after each one, but taking that out of the c version didn't affect it
[16:34] mikko	probably need to run with a profiler to see
[16:35] ianbarber	yeah, i guess
[17:01] pieterh	ianbarber: destroying messages, allocs, etc. don't make that much impact
[17:06] ianbarber	yeah, that's what i concluded.
[18:32] picasso	i'm designing an analytics system that needs to handle potentially heavy loads and usage spikes
[18:32] picasso	creating a web service API (preferably in PHP), packaging incoming requests and dumping onto a message queue, and then processing these requests in the background
[18:33] picasso	i apologize for not being too familiar with zeroMQ, but is this a type of problem that would be a good use case for zero?
[18:43] cremes	picasso: yep, it sure would
[18:43] cremes	have you read the guide? http://zero.mq/zg
[18:43] cremes	it covers several use-cases
[18:46] picasso	seems i have quite a bit of reading to do :)
[19:13] cremes	picasso: just a little bit!
[19:49] michelp	i'm just reading the news about Sergey Aleynikov, was he a 0mq developer?
[19:49] michelp	i saw his github (maybe bitbucket) googling around a few days ago but now it appears to be gone
[19:49] michelp	and it looked like he had made some contributions
[19:50] michelp	weird that i can't find it now, they must have deleted it
[19:52] pieterh	michelp: he wrote the original Erlang binding for 0MQ
[19:52] pieterh	user id on github is saleyn
[19:53] michelp	i'm not up on the details, but his sentence seems extremely harsh for just moving code, it doesn't sound like they proved he used it in any way
[19:56] michelp	ah it's still up on github
[19:57] cremes	michelp: it doesn't matter if he used it or not; let's say you take a candy bar from the store but you don't eat it; has it been stolen?
[19:57] cremes	or not?
[19:58] michelp	yeah i guess it has. the couple of stories i've read so far don't have much detail, but i just found one that does
[19:59] pieterh	michelp: crimes against property are usually treated harshly unless you're wealthy
[19:59] spht	dang: He was employed for two years at Goldman on a salary of $400,000. In early June, he left Goldman to join Teza Technologies, a Chicago start-up which offered to triple his pay.
[20:00] spht	$1.2M coding, that's pretty decent I'd say :)
[20:08] pieterh	aight, zapi the high-level C binding for 0MQ is now ready and working!
[20:08] pieterh	http://zero.mq/c
[20:09] mikko	cool!
[20:09] mikko	did you take zfl build as base?
[20:09] pieterh	yes, for the autotools
[20:09] mikko	very good
[20:10] mikko	should be about the same process
[20:10] pieterh	the only change was to switch to valgrind for the selftest script
[20:10] pieterh	much better than whatever heap checker I was using in ZFL
[21:24] pieterh	sigh
[22:00] Guthur	pieterh, you work fast
[22:00] pieterh	yeah
[22:00] pieterh	am just making packages now
[22:00] pieterh	beta release in one day
[22:00] pieterh	will aim to also upload man pages to a web site, should be doable
[22:01] pieterh	first replying to people who confuse posix streams with length-specified messages
[22:01] pieterh	sigh
[22:01] Guthur	In the same time I haven't got nearly as far with the 0MQ-FIX bridge
[22:01] Guthur	admittedly I had to go to work in the middle
[22:03] Guthur	robustly extracting a FIX message from a socket wrote buffer is a none too trivial exercise though
[22:03] pieterh	FIX is complex
[22:03] pieterh	that's why I suggested doing that totally separately
[22:05] Guthur	it's actually my first time working directly with raw sockets as well
[22:05] Guthur	so all a bit of a learning curce
[22:05] Guthur	curve*
[22:24] pieterh	raw sockets are pretty OK if you're not trying to be multithreaded or something
[22:31] Guthur	hehe, I quickly decided not to try that
[22:32] Guthur	it's not so much the Sockets it's extracting the individual messages from the stream
[22:32] Guthur	ensuring they are complete and valid
[22:34] Guthur	just trying to find the nice solution, I've made a state transition table, which should help
[22:34] Guthur	and just ran a small socket demo to sanity check my socket understanding
[22:34] Guthur	so all set now to do the real work
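
Extracting complete, valid messages from a byte stream, as described above, comes down to framing. The sketch below is not Guthur's bridge (which is in C# and not shown in the log); it is a hedged Python illustration that relies on standard FIX tag=value framing: BeginString (tag 8) first, BodyLength (tag 9) second, and a trailing three-digit CheckSum (tag 10). The function names `extract_messages`, `checksum` and `build` are hypothetical, and only framing is checked, not message content.

```python
SOH = b"\x01"  # FIX field delimiter

def checksum(data):
    """FIX CheckSum: byte sum modulo 256, as three ASCII digits."""
    return b"%03d" % (sum(data) % 256)

def build(body, begin=b"FIX.4.2"):
    """Frame a body (fields each ending in SOH) for demonstration purposes."""
    head = b"8=" + begin + SOH + b"9=%d" % len(body) + SOH
    return head + body + b"10=" + checksum(head + body) + SOH

def extract_messages(buffer):
    """Split complete FIX messages off the front of `buffer`.

    Returns (messages, remainder). The remainder is any partial message
    and should be prepended to the next socket read.
    Raises ValueError on malformed framing or a bad checksum.
    """
    messages = []
    while True:
        # State 1: wait for "8=...<SOH>9=<len><SOH>".
        begin_end = buffer.find(SOH)
        if begin_end < 0:
            break
        if not buffer.startswith(b"8="):
            raise ValueError("message does not start with BeginString")
        length_end = buffer.find(SOH, begin_end + 1)
        if length_end < 0:
            break
        length_field = buffer[begin_end + 1:length_end]
        if not length_field.startswith(b"9=") or not length_field[2:].isdigit():
            raise ValueError("second field is not a valid BodyLength")
        body_len = int(length_field[2:])
        # State 2: wait for the body plus the 7-byte trailer "10=nnn<SOH>".
        body_end = length_end + 1 + body_len
        total = body_end + 7
        if len(buffer) < total:
            break
        trailer = buffer[body_end:total]
        if trailer != b"10=" + checksum(buffer[:body_end]) + SOH:
            raise ValueError("bad CheckSum trailer")
        messages.append(buffer[:total])
        buffer = buffer[total:]
    return messages, buffer

if __name__ == "__main__":
    stream = build(b"35=0\x0149=A\x0156=B\x01") + build(b"35=1\x01112=x\x01")
    msgs, rest = extract_messages(stream[:-3])  # last message arrives incomplete
    print(len(msgs), "complete message(s),", len(rest), "byte(s) left over")
```

Keeping the leftover bytes and re-running the same function after each read keeps the state machine down to two waiting states, header and body plus trailer.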
[22:46] pieterh	Guthur: are you working in C or C# for this?
[22:49] Guthur	C# for the time being, I think it might suit being implemented in C, if it's 'dumb' enough
[23:01] pieterh	I'm really curious to see how you get on with this