Tuesday March 22, 2011

[Time] Name Message
[01:02] reiddraper I'm seeing ~4K requests/second for a REQ-REP socket with two python processes. They are sending "hello". Is this performance expected?
[01:04] jhawk28 not sure about python, but it usually depends
[01:04] jhawk28 on the network, the computing platform, and the language bindings
[01:05] reiddraper jhawk28: localhost, ubuntu
[01:06] reiddraper jhawk28: suppose I was expecting it to be an order of magnitude faster
[01:06] jhawk28 are you just req: Hello and Rep: hello?
[01:06] reiddraper yes
[01:07] reiddraper zeromq 2.1.3
[01:07] jhawk28 let me do a quick one in Java
[01:07] reiddraper same performance with ipc and tcp
[01:08] reiddraper jhawk28: thanks
[01:21] andrewvc cremes: around?
[01:21] andrewvc cremes: Wondering if you'd mind me releasing 0.7.3 w/ zdevice support
[01:23] jhawk28 reiddraper: I'm getting about 10k/s
[01:25] reiddraper jhawk28: ok, seems reasonable that python would be 4K then
[01:26] jhawk28 thats Java with 2.1.3 on a 2.4 Core i5 (OSX)
[01:30] jhawk28 reiddraper: push/pull gets 2mil/s
[01:30] reiddraper jhawk28: pretty big difference
[01:31] reiddraper only really surprised because I've seen http servers do more req/s than what I'm seeing
[01:33] jhawk28 req/rep is synchronous
[01:33] jhawk28 single threaded both sides
[01:34] reiddraper yeah, figured it would have to be, that being said, so is something like Redis, which gets 10's of thousands of operations / second, over tcp
[01:34] reiddraper req-rep
[01:47] jhawk28 reiddraper: increase the number of clients
[01:48] jhawk28 when I bump up the number of clients, I am getting 25k/sec
[01:49] reiddraper jhawk28: Cool, and to be honest, for what I have in mind, 4k/sec is plenty fast
[01:51] jhawk28 I could probably scale it more if I used XREP
[01:52] jhawk28 and actually split it between machines
[01:55] reiddraper jhawk28: can you explain the difference between xreq/rep and req/rep?
[01:56] jhawk28 req/rep is synchronous, xreq/xrep uses identities
[01:56] reiddraper ah, OK
[01:56] jhawk28 the identities are then used by zmq to route the response back to the correct socket
[01:58] jhawk28 thats as much as I know
[01:58] jhawk28 I haven't used them much
[01:58] jhawk28 its on my todo list...
[02:00] reiddraper jhawk28: ok, so you don't get slowed down by slowest client that is load-balanced
[02:01] jhawk28 or worker
[02:01] jhawk28 thats what xreq is for (for dealing out work)
[02:04] reiddraper ok, so for a queue broker, the clients sending xreq (give me a task), and the broker sending xrep (do this) makes sense?
[02:07] jhawk28 Chapter 3 does a good job explaining it:
[02:08] reiddraper jhawk28: awesome. thanks
[02:16] believa newb question - are the following statements true? You cannot "bind" multiple sockets to the same endpoint. You "can" connect multiple sockets to the same endpoint. A socket can "bind" and/or "connect" to multiple endpoints.
[02:18] jhawk28 Yes, yes
[02:19] jhawk28 yes
[02:19] believa jhawk28: thanks for the confirmation
[02:20] neopallium believa: you can bind one socket to multiple different endpoints, but you can't bind multiple sockets to the same endpoint.
[02:21] believa neopallium: gotcha
[02:21] neopallium just like you can't bind multiple tcp sockets to the same port on the same computer.
[02:22] believa neopallium: that should result in EADDRINUSE right?
[02:30] neopallium believa: yes
[02:30] believa neopallium: thanks
[07:47] sustrik reiddraper: req/rep is lock-step; the performance is determined by the latency of your network
[07:47] sustrik the actual messaging fabric is almost irrelevant
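Editor's note: sustrik's point can be put in numbers. With a single lock-step client, throughput is roughly the reciprocal of the round-trip time, so ~4K requests/second corresponds to about 250 microseconds per round trip. Below is a minimal sketch of that arithmetic. It assumes an idealized model in which every client has exactly one request in flight and clients do not contend with each other; the function names are illustrative and not part of any 0MQ API.

```python
def lockstep_throughput(round_trip_seconds, clients=1):
    """Upper bound on requests/second when each client waits for its reply."""
    if round_trip_seconds <= 0:
        raise ValueError("round trip time must be positive")
    return clients / round_trip_seconds


def implied_round_trip(requests_per_second, clients=1):
    """Round-trip time (seconds) implied by an observed lock-step request rate."""
    if requests_per_second <= 0:
        raise ValueError("request rate must be positive")
    return clients / requests_per_second


# The ~4K requests/second reported above implies about 250 microseconds per round trip.
rtt = implied_round_trip(4000)
print(f"{rtt * 1e6:.0f} microseconds per request/reply pair")
```

Under this model, adding clients raises the ceiling roughly linearly until something else saturates, which matches jhawk28's observation that more clients gave more requests per second.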
[08:09] pieterh good morning
[08:10] pieterh sustrik: we had this question twice in a day, perhaps worth some explanation in the guide
[08:12] sustrik yes, that would be good
[08:12] sustrik alternatively
[08:12] sustrik there's a page about running perf tests
[08:12] sustrik let me see
[08:12] pieterh perhaps an explicit page we can refer to upfront, yes
[08:12] pieterh e.g. expected throughput and latency of each pattern
[08:13] pieterh very rough, but to set expectations properly
[08:14] sustrik
[08:20] pieterh sustrik: it's not very useful to beginners IMO
[08:21] pieterh I'll think about how to explain this, it's got to be in terms of limits, capacity, speed of different patterns & transports
[08:21] pieterh like a spec sheet
[08:21] sustrik let me send you a diagram
[08:22] pieterh sure
[08:39] pieterh sustrik: random idea for cleaner semantics on multipart messages
[08:39] pieterh make the MORE bit a property of a frame (zmq_msg_t) rather than a socket
[08:40] sustrik that's how it works on wire level
[08:40] pieterh it would also make more sense at the API level IMO
[08:40] sustrik with API it's a pain to use
[08:40] sustrik i thought of combining the two approaches
[08:40] pieterh it means I can prepare a frame and write it with a generic method
[08:40] sustrik yes
[08:40] pieterh if you consider zmq_msg_t as a 'smart blob' (and I like this), then it should have a more property
[08:41] sustrik the problem is this:
[08:41] sustrik zmq_send (msg, MSG_SNDMORE);
[08:41] sustrik vs.
[08:41] sustrik zmq_msg_setflag (msg, ZMQ_MORE, 1);
[08:41] pieterh you can always do both
[08:41] sustrik zmq_send (msg, 0);
[08:41] sustrik yes
[08:42] pieterh I see the zmq_send (..., MSG_SENDMORE) as either an optimization or an override
[08:42] sustrik yes
[08:42] sustrik convenience feature
[08:42] pieterh e.g. if I have an identity frame, and want to send it, it's always going to be MORE
[08:43] pieterh read / write then become symmetric
[08:43] sustrik yes, it would simplify devices
[08:43] pieterh yes
[08:43] pieterh indeed, any generic handling of multipart messages becomes cleaner
[08:43] sustrik ack
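Editor's note: the idea discussed above, carrying the MORE flag on the frame itself while still allowing an override at send time, can be sketched as a toy model. This is not the 0MQ API: `Frame`, `send_frame`, `send_message`, `recv_message`, and the `wire` list standing in for a socket are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    data: bytes
    more: bool = False  # True if another frame of the same message follows


def send_frame(wire, frame, more=None):
    """Write one frame; an explicit `more` argument overrides the frame's own flag."""
    flag = frame.more if more is None else more
    wire.append((frame.data, flag))


def send_message(wire, frames):
    """Send a list of frames as one multipart message."""
    last = len(frames) - 1
    for i, frame in enumerate(frames):
        send_frame(wire, frame, more=(i < last))


def recv_message(wire):
    """Read frames until one arrives with the MORE flag cleared."""
    frames = []
    while wire:
        data, more = wire.pop(0)
        frames.append(Frame(data, more))
        if not more:
            break
    return frames
```

With the flag stored on the frame, reading and writing become symmetric: a frame read with `more` set can be forwarded as-is, which is the device simplification mentioned in the conversation.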
[08:43] pieterh another question, is it necessary to destroy a message after sending it?
[08:43] pieterh sending the same frame N times is rather clumsy today
[08:43] sustrik if you don't, nothing happens
[08:44] sustrik but it's safer to do so for forward compatibility
[08:44] pieterh ah, you mean _close is optional
[08:44] sustrik nope
[08:44] sustrik technically, closing empty message translates to noop
[08:45] sustrik however, that is not guaranteed to hold in future versions of 0mq
[08:45] sustrik so, preferable, close the messages so that 0mq can hook into message destruction process
[08:45] sustrik preferably*
[08:46] pieterh To send a message twice I need to:
[08:46] pieterh zmq_msg_t copy;
[08:46] pieterh zmq_msg_init (&copy);
[08:46] pieterh zmq_msg_copy (&copy, &original);
[08:46] pieterh zmq_send (socket, &copy, 0);
[08:46] pieterh zmq_msg_close (&copy);
[08:46] sustrik yes
[08:46] sustrik well, you should close the original as well
[08:46] pieterh so my question is whether it's necessary for 0MQ to destroy the message after sending
[08:46] sustrik unless you are going to use it
[08:46] sustrik not now
[08:46] sustrik may be necessary in future
[08:47] pieterh could I have a flag saying, "don't nullify after sending"?
[08:47] pieterh MSG_REUSE
[08:47] sustrik ah, a convenience feature
[08:47] sustrik you can have that, but you should be aware it's slow
[08:47] pieterh slower than creating copies each time?
[08:47] pieterh how so?
[08:47] sustrik there's reference counting going on there
[08:48] sustrik which is implemented using atomic ops
[08:48] sustrik which in turn means the memory bus is locked each time you do so
[08:48] pieterh yes, but I'm copying the message each time now
[08:48] pieterh that also locks the memory bus each time
[08:48] sustrik yes, we can add the convenience flag
[08:49] sustrik nope, copying doesn't lock the bus
[08:49] sustrik well, unless there's contention between CPU cores on that particular cacheline
[08:49] pieterh well, copying also uses atomic ops for reference counting
[08:49] sustrik ah, you mean zmq_msg_copy
[08:49] sustrik yes
[08:49] sustrik so yes, we can add the flag
[08:50] pieterh yes, there's no other way to send the same frame twice
[08:50] sustrik what i'm saying is that it should not be the default
[08:50] pieterh aight...
[08:50] pieterh ah, certainly
[08:52] pieterh well, I'll add this to the 3.0 page but I have no idea how to make it :-)
[08:52] pieterh it would be useful, though IMO
[08:53] sustrik it's backward compatible, so no need to solve it immediately
[08:54] sustrik the backward incompatible changes are what's in focus now
[08:54] sustrik they have to be done in a single go, if possible
[08:54] sustrik to minimise the pain
[08:54] sustrik the remaining functionality can be added gradually afterwards
[08:58] pieterh I'm not sure the changes will be as painful as you imagine
[08:58] sustrik dunno, but minimising the pain is a good thing in itself
[08:59] pieterh yes, at least doing it all in one go
[10:59] pieterh sustrik: I've built the basic API for the high-level C binding, at
[10:59] pieterh will fill in the pieces over the next few days
[11:01] drbobbeaty pieterh: if the C level binding is separate in 3.x, is the C++ binding as well? Are they different bindings? What comes "standard" with the ZMQ libraries? Any 'default' API?
[11:01] pieterh drbobbeaty: yes, we plan to split off the C++ binding as well
[11:01] pieterh the default API is the Core C API
[11:02] pieterh the new C binding will work over 2.x as well as 3.x
[11:02] drbobbeaty Ah! I see the advantage to having the separate binding now - bridge the versions. Nice plan.
[11:03] pieterh also it makes it much easier to add useful functionality without breaking other language bindings
[11:03] pieterh so we can e.g. write a C reactor without affecting the core
[11:18] Guthur pieterh: I think I'll draw some inspiration from the new C binding
[11:19] pieterh Guthur: could be fun, I've tried to use a class-oriented approach for most of it
[11:20] pieterh I'll be converting the Guide examples to zapi when it's ready
[11:20] pieterh so if the C++ binding is anything like that, it'll be a lot easier for those examples too
[11:21] pieterh Guthur: when you need a repository created in the zeromq organization, give me a shout
[11:21] Guthur C# you mean, hehe
[11:21] pieterh oh, sorry
[11:21] pieterh C#
[11:24] Guthur When 0MQ 3.0 API is finalized I'm going to develop a new branch for clrzmq2
[11:24] Guthur I have a far better idea now what works and what doesn't
[11:25] pieterh please don't call it clrzmq3 :-)
[11:25] Guthur yeah I'll resist that temptation
[11:25] Guthur It will just be a branch of clrzmq2
[11:26] pieterh though ... embracing the chaos... it could be useful
[11:26] pieterh if binding versions track the development version
[11:26] pieterh so people know that clrzmq3 supports 3.x and 2.x
[11:26] Guthur the assembly will be version 3.x
[11:26] Guthur the assembly is currently 2.1.x
[11:26] pieterh problem is that the version number is in the github repo name
[11:27] Guthur true
[11:27] pieterh we have the same problem with zeromq2
[11:28] Guthur not an easy decision to be honest
[11:28] Guthur I really don't want to confuse potential/current users
[11:30] Guthur pieterh: can you briefly explain your reactor pattern thing in the C binding
[11:31] pieterh Guthur: to be honest I've never used a reactor pattern so this is kind of a guess
[11:31] pieterh the idea is to register the events you want to handle
[11:31] pieterh and then a tickless poll loop can handle it
[11:31] mikko howdy boys
[11:31] mikko and girls
[11:31] pieterh hey mikko!
[11:31] Guthur hi mikko
[11:31] Guthur pieterh: what is the alarm part
[11:32] Guthur and clock
[11:32] pieterh Guthur: if you look at a realistic app like the Majordomo broker
[11:32] pieterh then it mixes socket events with timer events
[11:32] pieterh e.g. "send heartbeats every 3 seconds"
[11:32] pieterh "kill server if no response in 2500 msecs"
[11:33] pieterh I made a proper tickless poll loop in the flcliapi (freelance client)
[11:33] pieterh it calculates the next timer event and polls that long
[11:33] Guthur ok so that would be to all registered sockets
[11:33] pieterh rather than polling every second or whatever
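Editor's note: the "tickless" idea described above is to compute how long until the next timer is due and use that as the poll timeout, instead of waking at a fixed interval. A minimal sketch follows, assuming timers are kept as absolute deadlines in milliseconds; the function names are hypothetical and this is not the flcliapi or zloop code.

```python
def next_poll_timeout(deadlines_ms, now_ms):
    """Milliseconds to wait until the earliest deadline, or None if no timers exist."""
    if not deadlines_ms:
        return None  # nothing scheduled: block until a socket event arrives
    return max(0, min(deadlines_ms) - now_ms)


def due_timers(deadlines_ms, now_ms):
    """Deadlines that have already passed and should fire now, earliest first."""
    return sorted(d for d in deadlines_ms if d <= now_ms)


# Heartbeat due at 3000 ms, server timeout due at 2500 ms, current time 0 ms.
print(next_poll_timeout([3000, 2500], 0))
```

The loop would pass this value as the timeout of its poll call, fire whatever `due_timers` returns after waking, and then recompute.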
[11:33] Guthur I haven't got as far as freelance yet
[11:34] pieterh take a brief look at the flcliapi poll loop, if you want to understand zloop
[11:34] Guthur I've just got to majordomo
[11:34] pieterh :-)
[11:35] pieterh the reactor won't work for all cases, sometimes we poll selectively
[11:35] pieterh but it should help the more complex designs
[11:40] Guthur I think I've sort of included a limited reactor pattern in clrzmq2
[11:42] Guthur but unfortunately the incremental nature of the development of the polling mechanism has left the API in a state that is less clear than I would like
[11:42] Guthur it's one of the main areas I would like to refactor
[11:44] Guthur I'd also like to simplify the send/recv
[11:45] pieterh in C# strings are just blobs, right?
[11:45] Guthur they are objects
[11:46] pieterh right
[11:47] Guthur i'd rather deal with them in an msg object as opposed to at the socket level send/recv
[11:48] pieterh the breakdown I'm using in zapi is frame vs. msg
[11:48] pieterh where frame is one part, msg is a multipart object
[11:48] Guthur I meant to ask about the framw
[11:48] Guthur frame*
[11:48] pieterh 0MQ uses 'msg' to mean 'part', which is confusing
[11:49] Guthur yeah, was talking more about the multipart you describe
[11:49] pieterh perhaps I should use 'part' instead of 'frame'... anyhow
[11:49] pieterh the frame class lets you do things like 'receive the identity' and 'send the identity' with ROUTER sockets
[11:50] pieterh whereas the msg class is more like 'recv a list of frames' and 'send a list of frames'
[11:51] Guthur I was also then going to take a lazy marshalling approach, only marshalling the zeromq msg (part) to a c# data type when required, should improve performance in situations where you don't need to know about the whole message
[11:52] pieterh that makes sense, it's what I'm doing in other places
[11:52] Guthur it's sort of all in my head at the moment though, I really should try to define the API like you have done
[11:52] pieterh yeah, start with the API, it makes everything clearer
[13:03] drbobbeaty I have a core dump with ZeroMQ 2.1.3 this morning... I have detailed it in this gist: . It includes the stack trace and the code from OpenPGM that's the last step in the trace.
[13:03] drbobbeaty It seems impossible to be true -- if the assert is causing the exception, then the value of minor_bucket has to be NULL... but it's not, as shown in the stack trace.
[13:03] drbobbeaty Is this a problem for Steve?
[13:04] drbobbeaty (I've received this on two different boxes on four separate occasions this morning)
[13:06] drbobbeaty HA! I think it's the data_size being 0!
[13:06] drbobbeaty Any ideas as to why that would be?
[13:07] sustrik drbobbeaty: you have to discuss that with steve-o
[13:07] sustrik seems to be a problem with the new version of openpgm
[13:09] drbobbeaty Steve-o: can you have a look at and give me an idea of why data_size == 0 on the call? I'm hitting the assertion and have no idea why.
[13:10] drbobbeaty pieterh: should I just hit the mailing list for steve-o?
[13:10] pieterh drbobbeaty: I'd do that, he's in Asia and probably out of the office by now
[13:11] pieterh the good news is you can get a new OpenPGM and use that with 2.1.3 without further changes
[13:16] Guthur sustrik: I see in the IPC discussion no one is actually talking about using IOCP and named pipes. I've been meaning to ask you about whether you think an abstraction layer over either Sockets & Named Pipes or IOCP itself could feasibly provide the necessary functionality to mimic what is required from poll, select etc in 0MQ
[13:16] Guthur this would all be Windows-centric changes of course
[13:22] drbobbeaty Steve-o: if you get this, please check the mailing list... I'm getting more than two core dumps an hour with 2.1.3 due to this data_size == 0 issue. Yikes!
[13:23] pieterh drbobbeaty: sorry about this, we don't have the facilities to properly test OpenPGM yet
[13:24] pieterh I'd advise you to rollback to 2.1.2 until we get a fix to it
[13:24] drbobbeaty I understand... If I had to guess it's an edge condition where data_size == 0, and I'm just hitting it more with all the exchange feeds I'm dealing with.
[13:24] sustrik Guthur: what's being discussed is a workaround
[13:25] drbobbeaty pieterh: that's a good plan.
[13:25] sustrik something that would look like IPC but would actually be TCP
[13:25] pieterh drbobbeaty: I expect tomorrow morning Steve will have an updated OpenPGM package
[13:25] sustrik real solution is IPC & NamedPipes
[13:25] pieterh you can install it and rebuild 2.1.3, --with-openpgm=<version> or somesuch, I'm not 100% sure on that syntax
[13:28] Guthur sustrik: does the abstraction sound like a feasible objective?
[13:29] sustrik Guthur: the abstraction exists already
[13:30] sustrik check how poll_t, select_t, epoll_t etc. interface with the rest of the system
[13:30] sustrik check whether IOCP can use the same interface
[13:30] sustrik if not, propose changes
[13:31] Guthur I thought the problem was that underneath those abstractions they use functionality that IOCP does not provide
[13:31] Guthur IOCP only providing notification of completion
[13:32] sustrik right, IOCP has an additional buffer between the user and the network
[13:32] sustrik something like AIO
[13:32] sustrik is there a way to limit the size of the buffer?
[13:32] Guthur I can check that out
[13:33] sustrik yes, please
[13:33] sustrik if there's no limit to the buffer
[13:33] sustrik it could easily exhaust all the memory
[13:33] sustrik in such case the code using IOCP would have to keep track of amount of memory in use
[13:35] Guthur oh, that does not sound ideal
[13:45] sustrik cremes: hi
[13:46] cremes sustrik: good morning
[13:46] sustrik morning
[13:46] sustrik as for the assertion, any chance of getting backtrace?
[13:47] cremes sustrik: i can try to capture it in gdb; give me a few minutes and i'll see what i can do
[13:47] sustrik ok
[13:47] sustrik another thing: the allocation mechanism you proposed
[13:47] sustrik have you found out where the memory disappears?
[13:48] sustrik (issue 174)
[13:48] cremes sustrik: i need to run your code with the patch you supplied on my linux box
[13:48] cremes i ran it on my osx box and it was disappearing in the same place as before
[13:48] cremes (the backtrace in issue 174)
[13:52] sustrik i thought the proposal you made is related to 174
[13:54] cremes sustrik: it was; based on your feedback i was assuming the memory growth was due to page fragmentation caused
[13:54] cremes by small memory allocations
[13:55] sustrik afaik most allocators have per-size caches
[13:55] sustrik so you have special cache for 16 byte blocks
[13:56] sustrik another one for 32 byte blocks etc.
[13:56] Guthur sustrik: you can indeed specify a buffer size
[13:56] sustrik that in turn leads to optimal heap utilisation
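Editor's note: the per-size caches sustrik mentions can be pictured as free lists keyed by a rounded-up block size. The sketch below is a toy model only. It assumes power-of-two size classes starting at 16 bytes, which real allocators may not use, and `size_class` and `SizeClassCache` are hypothetical names.

```python
def size_class(nbytes, smallest=16):
    """Round a request up to the next power-of-two size class (assumed scheme)."""
    size = smallest
    while size < nbytes:
        size *= 2
    return size


class SizeClassCache:
    """Toy allocator: freed blocks are recycled only within their own size class."""

    def __init__(self):
        self.free = {}    # size class -> list of recycled block ids
        self.next_id = 0  # stands in for fresh memory from the OS

    def alloc(self, nbytes):
        cls = size_class(nbytes)
        blocks = self.free.setdefault(cls, [])
        if blocks:
            return cls, blocks.pop()  # reuse a cached block of the same class
        self.next_id += 1
        return cls, self.next_id

    def release(self, cls, block):
        self.free.setdefault(cls, []).append(block)
```

Because a freed 32-byte block is only ever reused for another request in the 32-byte class, small allocations of mixed sizes do not carve holes into each other's regions.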
[13:56] Guthur I need to check some of my resources at home though to remind myself of details of IOCP
[13:56] cremes sustrik: okay, then i don't understand why in the ticket you wrote that it's due to a known problem with malloc
[13:56] sustrik Guthur: how do you do that?
[13:57] Guthur sustrik: during the read call
[13:57] Guthur you specify a buffer and bytestoread
[13:58] sustrik Guthur: i meant limiting the send buffer
[13:58] sustrik say, total amount that can be used is 64kB
[13:58] sustrik attempt to exceed the buffer would mean the send call would fail
[13:59] sustrik is there anything like that in IOCP?
[13:59] sustrik cremes: the problem i referred to is that processes don't return allocated memory to the OS
[14:00] sustrik thus, it's not used, but cannot be reused by a different process
[14:00] cremes sustrik: sure, and from my research that is due to page fragmentation caused by malloc/free called on lots of small blocks (smaller than a page)
[14:01] sustrik does it return allocated memory at all?
[14:01] cremes yes, if an entire page can be freed
[14:01] sustrik that's linux?
[14:01] cremes yes
[14:01] sustrik nice
[14:02] cremes i got this from a few different write-ups that i read on stackoverflow and one other site
[14:02] sustrik iirc it wasn't the case in the past
[14:02] cremes i'll try to find them again
[14:02] Guthur sustrik: do you mean more than
[14:02] cremes yes, it appears that was an issue with kernel 2.4 and earlier
[14:02] cremes apparently 2.6 resolves that issue
[14:05] Guthur you could presumably build some chunking mechanism on top of that with IOCP
[14:05] Guthur send chunk 1 with Completed send chunk 2....
[14:05] Guthur with/when
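Editor's note: Guthur's chunking idea, submitting the next chunk only once the previous one has completed, keeps the amount of buffered data bounded by the chunk size. The sketch below models only that control flow. It makes no real IOCP calls: `submit` is a hypothetical callback standing in for "post an overlapped send and wait for its completion", and the 64 kB figure is taken from sustrik's example.

```python
def split_chunks(payload, chunk_size):
    """Cut a byte string into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]


def send_chunked(payload, chunk_size, submit):
    """Submit chunks strictly one at a time; return the peak bytes in flight."""
    peak_in_flight = 0
    for chunk in split_chunks(payload, chunk_size):
        peak_in_flight = max(peak_in_flight, len(chunk))
        submit(chunk)  # assumed to return only after this chunk has completed
    return peak_in_flight


sent = []
peak = send_chunked(b"x" * 150000, 64 * 1024, sent.append)
print(len(sent), peak)
```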
[14:09] sustrik Guthur: yes, something like that
[14:09] sustrik cremes: nice
[14:10] sustrik now, the problem occurs because of 2 specific allocations
[14:10] sustrik according to the OSX tool
[14:10] sustrik 1. allocating the chunk in yqueue_t
[14:10] sustrik 2. allocating the encoder/decoder buffers in engine
[14:11] sustrik so, afaiu, if the size of those allocations is a multiple of page size, the problem should go away, right?
[14:11] sustrik as for 2 the size of those buffers is 8kB
[14:12] sustrik so there's no fragmentation issue
[14:12] sustrik hm, for 1. the size is 12kB
[14:12] sustrik so the fragmentation should not happen
[14:13] sustrik but given that the test is run on OSX, the allocation mechanism may be different
[14:13] cremes sustrik: true, but the memory growth is also reproducible on linux
[14:14] cremes i just don't have a tool there to show where it's happening
[14:14] sustrik :(
[14:14] cremes i am *assuming* it occurs in the same place on both OSes
[14:14] sustrik you'll have to run the test with my patch
[14:14] cremes sustrik: just to confirm, you were able to reproduce this on your linux system with that example code, right?
[14:14] sustrik that'll at least show whether it's the yqueue issue
[14:15] sustrik yes, i think so
[14:15] sustrik i assumed it was process not returning memory to the OS
[14:15] sustrik but given the issue was solved in linux/2.6
[14:15] sustrik it has to be something different
[14:16] cremes sustrik: after i try to get this backtrace for the other issue (181) i'll apply your patch on linux and run it
[14:16] sustrik thanks
[14:20] Guthur pieterh: with MDP do you see anything inherently wrong with having workers being other brokers
[14:29] Guthur maybe even have service requests routed via a URI
[14:52] pieterh Guthur: workers can certainly be brokers as well
[14:54] cremes sustrik: got a core; do you want the output from 'bt' or from 'thread apply all bt'?
[14:54] sustrik presumably the latter
[14:55] cremes sustrik: ok... i'll give you both :)
[14:55] sustrik thx
[14:55] cremes sustrik:
[14:56] cremes btw, this is off of commit 1619b3d84a04fe1886347fd83280a607
[14:59] sustrik cremes: the assertion happens in mutex destructor
[14:59] sustrik interesting
[15:00] sustrik do you have console output?
[15:00] sustrik it should print out the actual error code
[15:00] cremes i don't understand the question; i have a core file... is there something else you want me to look at?
[15:00] sustrik errno
[15:00] cremes p errno?
[15:00] sustrik yes
[15:00] sustrik in the asserted thread
[15:01] sustrik thread 1
[15:01] sustrik p errno
[15:01] cremes Cannot find thread-local variables on this target
[15:01] sustrik :|
[15:02] sustrik does the program show the console output
[15:02] sustrik ?
[15:02] sustrik stderr
[15:02] sustrik ?
[15:02] sustrik if so, the error should be visible there
[15:03] cremes looking...
[15:03] sustrik you should see the assert there
[15:04] sustrik the line above it should be the error
[15:04] cremes all it printed was the assert from socket_base.cpp
[15:05] sustrik the stack trace shows a different assert
[15:05] sustrik is that the same run?
[15:05] cremes let me run it again and put everything to the console instead of to files (i usually redirect the output)
[15:05] cremes same run
[15:05] sustrik strange
[15:06] cremes running again...
[15:12] cremes sustrik: all it prints is:
[15:12] cremes Assertion failed: sessions.empty () (socket_base.cpp:127)
[15:12] cremes Aborted (core dumped)
[15:14] cremes the thread backtraces are the same for this core
[15:15] sustrik that's really strange
[15:15] sustrik thread 1 reports the assertion happened in mutex.hpp
[15:16] sustrik rather than in socket_base.cpp
[15:16] sustrik maybe it's optimiser's fault
[15:16] cremes i don't have an explanation :(
[15:16] pieterh cremes: are you building a debug version?
[15:16] sustrik any chance of building 0mq with optimisations disabled?
[15:16] cremes pieterh: yes, ./configure --enable-debug
[15:16] cremes sustrik: yes
[15:17] sustrik --enable-debug should turn optimisations off
[15:17] sustrik (-O0)
[15:17] cremes let me rebuild
[15:17] cremes i'll make clean first...
[15:18] sustrik cremes: wait a sec
[15:18] sustrik maybe it makes more sense to try to figure out what's failing
[15:18] sustrik are you using identities?
[15:18] cremes oops, too late
[15:18] cremes yes, i am using identities
[15:19] sustrik the failing socket seems to be xreq, right?
[15:19] cremes yes
[15:20] sustrik any chance it gets connected to two peers that happen to have the same identity
[15:20] sustrik one of them via connect, other one via bind?
[15:20] cremes let me take a quick look at the code; i want to say "no" but let me verify
[15:21] sustrik simpler question: do you bind or connect the xreq socket; or both?
[15:22] cremes hard to say
[15:22] cremes if the xreq is part of a QUEUE device, then it's binding
[15:22] cremes otherwise all other sockets connect
[15:23] sustrik never both on the same socket, right?
[15:23] cremes correct
[15:24] sustrik hm, maybe the socket_base_t happens to get destructed twice
[15:24] sustrik is it possible to add printf's to your program?
[15:25] cremes sustrik: yes
[15:25] cremes you want them added to calls to zmq_close() ?
[15:25] sustrik something like this:
[15:25] sustrik printf ("alloc %p\n", (void*) this);
[15:26] sustrik in socket_base_t constructor
[15:26] sustrik and
[15:26] sustrik printf ("dealloc %p\n", (void*) this);
[15:26] sustrik in the destructor
[15:26] sustrik that should show us whether the destructor is called twice for the same object
[15:27] cremes in the destructor, do you want this printf called first or last?
[15:27] cremes presumably first
[15:27] sustrik first
[15:28] cremes running...
[15:28] cremes hmmm, i should have separated that stuff out to stderr
[15:31] cremes how do i redirect stdout and stderr to separate files in bash?
[15:33] cremes figured it out
[15:33] pieterh cremes: someproc 2> stderr.log
[15:33] cremes running...
[15:38] mikko pieterh: did you merge the pull req?
[15:38] mikko there is prolly more coming soon
[15:38] mikko ([zeromq-dev] ZMQ 2.1.3 w/OpenPGM Assertion on pgm_rate_check2())
[15:38] pieterh mikko: you mean the autoconf fix for OpenPGM in 2.1?
[15:38] mikko y
[15:39] mikko im thinking this atm
[15:39] pieterh hmm, did you see my email about sending patches instead?
[15:39] pieterh I think it's going to be simpler
[15:39] pieterh pubsub instead of reqrep
[15:39] mikko "There is a smarter way to do this, and I think it is:"
[15:39] pieterh two subscribers (martin, myself), one publisher (you)
[15:40] mikko i will update a patch and send to ML
[15:40] pieterh that's best IMO
[15:41] pieterh somewhere there's a magic git incantation to pull a commit from a random git, but I don't know it
[15:41] mikko what do you mean?
[15:41] mikko git is built on that sort of thing
[15:41] sustrik cherry-pick?
[15:41] mikko you could even pull changes from my personal repo
[15:42] mikko and if i was privileged to i could push to your home machine
[15:42] cremes sustrik: with a confirmed debug build, the backtraces are different now
[15:42] cremes sustrik:
[15:42] cremes thread 1 and thread 19 look relevant
[15:42] cremes i still can't print errno
[15:44] sustrik never mind, this is the assert (empty) thing
[15:44] sustrik what about the output?
[15:44] pieterh mikko, ok, so how do we do this without using github's pull request?
[15:44] cremes lots of alloc/dealloc messages
[15:44] pieterh e.g. create an issue pointing to a remote git/commit
[15:45] mikko pieterh: or mailing list?
[15:45] pieterh let's work it out here...
[15:45] sustrik cremes: check for 0x7f1a582a4ea0
[15:45] pieterh what git commands do I need to use to pull one commit from your git
[15:45] cremes sustrik: 1 alloc, 1 dealloc
[15:46] cremes sustrik: however, that is the *last* dealloc printed
[15:46] mikko git remote add mikko <url to my repo>
[15:46] pieterh ok
[15:46] sustrik and the printf is at the beginning of the destructor, right?
[15:46] mikko git fetch mikko
[15:46] mikko then check the hash of the commit
[15:46] cremes correct
[15:46] pieterh git fetch --no-tags, at least
[15:46] mikko and cherry-pick it
[15:46] mikko i think that should do it
[15:46] sustrik then it's some other problem...
[15:46] cremes sustrik: line 792 is the first line of the destructor
[15:47] pieterh git fetch won't make a mess of your target git?
[15:47] cremes sustrik: i grep'ed all of the dealloc lines to their own file 'z2'
[15:47] sustrik 792?
[15:47] sustrik what file?
[15:47] cremes sustrik: then ran: wc -l z2; sort -u z2 | wc -l
[15:47] sustrik socket_base_t?
[15:47] cremes socket_base.cpp
[15:47] mikko pieterh: git-fetch - Download objects and refs from another repository
[15:47] mikko pieterh: nope
[15:47] sustrik cremes: what version?
[15:48] sustrik my socket_base.cpp ends at 790
[15:48] pieterh mikko: ok, I like cherry-pick, it's how I pull patches from sustrik
[15:48] cremes sustrik: oops, sorry, misread the vi output; line 124 of 792 (i added two printfs)
[15:48] pieterh but you have to do 'git fetch --no-tags whatever' to avoid polluting your home git
[15:48] sustrik cremes: ok
[15:48] pieterh sustrik: do you want to try cherry-pick from mikko's git?
[15:49] sustrik i want a patch on the ML
[15:49] sustrik that's the process
[15:49] cremes sustrik: anyway, by removing duplicate dealloc prints using 'sort -u' it looks like 180 sockets are dealloc'ed twice
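Editor's note: comparing `wc -l` against `sort -u | wc -l` counts addresses that appear in more than one dealloc line. One caveat: an address showing up twice is not by itself proof of a double destruct, because the allocator may hand the same address to a later object. A more careful check pairs each dealloc with a preceding alloc. The sketch below assumes the log contains only lines of the form `alloc <addr>` and `dealloc <addr>`, as produced by the printf calls suggested earlier; `find_double_deallocs` is a hypothetical helper name.

```python
def find_double_deallocs(lines):
    """Return addresses deallocated while not currently allocated."""
    live = set()
    suspicious = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip anything that is not an alloc/dealloc record
        action, addr = parts
        if action == "alloc":
            live.add(addr)
        elif action == "dealloc":
            if addr in live:
                live.remove(addr)
            else:
                suspicious.append(addr)  # freed twice, or freed without an alloc
    return suspicious


log = [
    "alloc 0x1", "dealloc 0x1",  # normal lifetime
    "alloc 0x1", "dealloc 0x1",  # same address reused by a new object: fine
    "alloc 0x2", "dealloc 0x2", "dealloc 0x2",  # genuine double dealloc
]
print(find_double_deallocs(log))
```

In this example a plain duplicate count would flag both addresses, while the paired check reports only `0x2`.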
[15:49] pieterh sustrik: you don't send me patches via the ML
[15:49] sustrik i should
[15:49] pieterh so the process isn't symmetric now
[15:49] pieterh perhaps
[15:50] pieterh sending me a commit tag is much less work for you
[15:50] pieterh and also easier for me
[15:50] pieterh if you send the URI to the commit, it's as good as a patch
[15:50] sustrik that's good for backporting
[15:50] pieterh everyone who cares can review it
[15:50] pieterh here we're porting in all directions, no back or fwd
[15:50] sustrik it's only a reference to what has to be backported
[15:50] pieterh nope, no special cases please
[15:50] sustrik with new functionality you want a full patch
[15:51] pieterh it should be a single process between any two gits
[15:51] pieterh github pull requests are eliminated because they only work between forks
[15:52] sustrik there has to be an entry point
[15:52] pieterh entry point?
[15:52] sustrik the place where patch enters the ecosystem
[15:52] pieterh one always addresses a patch/commit to the author of the target git
[15:52] pieterh entry point can be ML, that's excellent
[15:52] pieterh it's ideal for this
[15:52] sustrik yup
[15:52] pieterh but signed patches are extra effort
[15:52] pieterh and we can do it better
[15:53] pieterh i think, anyhow, and I'm sure it should be orthogonal for any two gits
[15:53] pieterh i.e. if you want to send a patch to mikko, same process
[15:53] sustrik mikko doesn't care about sign-offs
[15:53] sustrik as it's his private git
[15:54] pieterh that's a separate issue
[15:54] pieterh signed off or not, that's QC on the patch or commit
[15:54] pieterh I'd like a single process for all of us, no special cases
[15:54] mikko what about CLA?
[15:54] mikko too complicated?
[15:54] pieterh CLA?
[15:55] pieterh way too complex
[15:55] pieterh by 100x
[15:55] mikko contributor license agreement
[15:55] mikko like apache etc do
[15:55] pieterh no way, dead body over, etc.
[15:55] sustrik sign-off is simpler imo
[15:55] pieterh it creates *huge* management issues and individual developers hate it to the point of not contributing
[15:56] pieterh we switched to co-owned code and signoff last year, it's superior in every sense
[15:56] pieterh this is not relevant to the thread however
[15:56] pieterh cherry-picking is very simple for both parties
[15:56] mikko pieterh: we were a bit too drunk to discuss this properly last weds
[15:56] pieterh mikko: drunk? I'm Scottish, we don't get drunk
[15:57] pieterh oh, hang on, yes we do... but it was English beer, doesn't count
[15:57] pieterh sustrik: I'm not sure cherry-picking conforms to signing off, that's all
[15:58] pieterh but it has to, of course
[15:58] mikko can you not sign a commit?
[15:58] pieterh pulling code from a public git means it's by definition signed off
[15:58] pieterh you don't need to
[15:58] pieterh once you commit & push to a git, you license the code under whatever, LGPL etc.
[15:59] pieterh this is why pull requests work
[15:59] pieterh but they are crippled for general use
[15:59] pieterh sustrik: allow me to make a short proposal on the list, OK?
[16:00] sustrik sure
[16:00] pieterh thanks
[16:00] pieterh BTW the new C API is almost working
[16:00] pieterh it does everything I was asking for, e.g. automatic close of sockets, message reuse, etc.
[16:00] pieterh nicely emulates 2.0 semantics on termination :-)
[16:01] sustrik goodo
[16:01] cremes sustrik: i applied your patch for issue 174 and ran your minimal test case (also in that issue)
[16:01] cremes sustrik: shall i email you the output?
[16:01] cremes it's too big to pastie
[16:04] Guthur pieterh: you're scottish?
[16:04] pieterh Guthur: half Scottish, half Belgian...
[16:04] pieterh born somewhere in the middle of the North Sea
[16:06] Guthur one of those water births then
[16:07] pieterh lol
[16:08] Guthur Belgians aren't renowned for their seafaring as well
[16:09] Guthur I hear they don't have much of a navy
[16:10] pieterh my great-great grandfather was hung for being a pirate
[16:11] sustrik cremes: did it drop to 0?
[16:11] cremes sustrik: no, it didn't; i just sent you the output in an email with attachment
[16:12] sustrik ok
[16:13] sustrik cremes: it looks like it's dropping
[16:13] sustrik see the end of the file
[16:13] sustrik if you let it run a bit longet it would probably get to 0
[16:14] sustrik longer*
[16:14] cremes sustrik: i let it run for 5 minutes... it didn't appear to be dropping anymore
[16:15] cremes sustrik: i'll let it go for 20m and see what happens
[16:15] sustrik it seems like it was terminated in the middle of the process of dropping
[16:15] sustrik see the last line
[16:15] sustrik it's cut in the middle of output
[16:16] sustrik 356 355 354 35
[16:16] cremes sustrik: i think that's from buffered output
[16:16] cremes i was tailing the log file and it stopped there
[16:16] sustrik ah
[16:16] sustrik anyway, it dropped almost to zero
[16:16] sustrik up to 30,000
[16:16] sustrik then back to 353
[16:17] sustrik the leak is either elsewhere
[16:17] sustrik or it's the process not returning memory to the OS
[16:18] cremes sustrik: i'll let it sit for a while... i just saw it go from 1 to 3200 and now it's down to 310
[16:18] cremes but it's not printing anything anymore
[16:18] sustrik hm
[16:18] cremes if it's 'cut off' due to stdout buffering, hitting ctrl-c ought to flush that buffer
[16:19] sustrik maybe a problem with the buffering itself?
[16:19] sustrik anyway, i've used simple non-buffered console
[16:19] sustrik and i saw the number of chunks dropping to 0
[16:19] cremes i don't know how to get an unbuffered xterm going
[16:20] sustrik cremes: i don't think it matters
[16:20] cremes i should probably add an explicit flush
[16:20] sustrik 350 chunks won't account for all the allocated memory you see
[16:20] sustrik right?
[16:20] sustrik 1 chunk = 12kb
[16:20] mikko fflush (stdout);
[16:20] sustrik x350 = 4.2MB
[16:20] cremes sustrik: do those chunks contain message data?
[16:21] sustrik how large is your message?
[16:21] sustrik 8 bytes?
[16:21] cremes your test case was using a 20byte message
[16:21] sustrik ok
[16:21] sustrik then the chunks contain message data
[16:23] cremes i added fflush() and am rerunning; we'll see if that makes a difference
[16:23] cremes it did; count went to 0
[16:23] cremes resident memory is still inflated though
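The truncated log line above ("356 355 354 35") is what buffered output looks like when the reader gets ahead of the writer. A minimal Python sketch of the same effect, assuming a temporary file stands in for the log being tailed (the helper name `buffered_then_flushed` is made up for illustration, and the test case in the issue is C, where the equivalent call is `fflush (stdout);` as mikko notes):

```python
import os
import tempfile


def buffered_then_flushed(payload: bytes):
    """Write payload through a buffered file object and report the
    on-disk size before and after an explicit flush."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # An 8 KiB buffer: small writes stay in user space until flushed.
        with open(path, "wb", buffering=8192) as f:
            f.write(payload)
            size_before = os.path.getsize(path)  # what a 'tail' would see now
            f.flush()
            size_after = os.path.getsize(path)   # what it sees after the flush
        return size_before, size_after
    finally:
        os.remove(path)


line = b"356 355 354 353\n"
before, after = buffered_then_flushed(line)
print("bytes visible before flush:", before)
print("bytes visible after flush: ", after)
```

Anything tailing the file only sees what has actually been flushed, so a counter that has already reached 0 inside the process can still look stuck at a higher value from outside.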
[16:24] sustrik the next step would be to replace all the mallocs and frees and news and deletes
[16:24] sustrik by something that would track amount of memory allocated
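A rough sketch of the bookkeeping sustrik describes, written in Python for brevity rather than as a real malloc/free replacement. The class `TrackingAllocator` and its integer handles are hypothetical; the chunk size of 12,000 bytes is an assumption chosen to match the "1 chunk = 12kb, x350 = 4.2MB" arithmetic earlier in the log.

```python
class TrackingAllocator:
    """Counts live and peak bytes for every alloc/free pair.

    Handles are plain integers standing in for pointers; a real
    implementation would wrap malloc/free and new/delete instead.
    """

    def __init__(self):
        self._sizes = {}      # handle -> size of the live allocation
        self._next_handle = 1
        self.live_bytes = 0
        self.peak_bytes = 0

    def alloc(self, size: int) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._sizes[handle] = size
        self.live_bytes += size
        self.peak_bytes = max(self.peak_bytes, self.live_bytes)
        return handle

    def free(self, handle: int) -> None:
        if handle not in self._sizes:
            # Catches the "dealloc'ed twice" case as well as bad handles.
            raise ValueError(f"double free or unknown handle: {handle}")
        self.live_bytes -= self._sizes.pop(handle)


tracker = TrackingAllocator()
handles = [tracker.alloc(12_000) for _ in range(350)]  # 350 chunks of 12 kB
live_at_peak = tracker.live_bytes
for h in handles:
    tracker.free(h)

print("live at peak:", live_at_peak)
print("live after frees:", tracker.live_bytes)
print("peak:", tracker.peak_bytes)
```

If a counter like `live_bytes` returns to zero while resident memory stays inflated, that would point at the allocator or the OS holding on to pages rather than at a leak in the library itself.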
[16:24] cremes sustrik: i think it's easier for me to just restart my applications every 48 hours
[16:24] cremes to free up swap
[16:25] sustrik hm
[16:25] sustrik let's keep the issue open then
[16:25] sustrik when i get some time
[16:25] cremes ok
[16:25] cremes i'll add an update with my findings
[16:25] spht Does zmq not exit in its SIGINT handler? Can't SIGINT to kill my script using the python binding..
[16:26] sustrik i'll try to check whether the memory is held by 0mq or glibc
[16:27] sustrik spht: what version of 0mq are you using?
[16:27] spht sustrik: Python module version is 2.0.10
[16:28] sustrik there used to be a problem with SIGINT in old versions, let me check...
[16:29] cremes spht: this was fixed in the 2.1 branches
[16:29] cremes spht: time to upgrade
[16:29] spht cremes: sustrik: Ahht thanks! I'll go upgrade ASAP :)
[16:29] ianbarber mikko: about?
[16:29] mikko ianbarber: yes
[16:29] mikko bam!
[16:31] ianbarber bam!
[16:31] ianbarber so, I was translating a pieterh example
[16:31] ianbarber which measures round trip speed
[16:31] ianbarber it starts a client, worker and broker, and tries doing send() recv() 10000 times and then send() 10000 and recv() 10000
[16:32] ianbarber to compare the speed of waiting for a response versus async
[16:32] ianbarber in the c version I get ~6k/s for the first, around 50k/s for the second
[16:32] ianbarber in the PHP i get ~6k/s for both
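The two measurements being compared can be sketched without 0MQ at all. The version below is an illustration only: it assumes a plain `socket.socketpair()` and an echo thread in place of the client, broker and worker from the original example, and the names `run`, `echo_worker` and `recv_exact` are made up here. In the pipelined case the replies are drained on a separate thread so that neither side stalls on a full socket buffer.

```python
import socket
import threading
import time

MSG = b"hello"
N = 2000  # number of round trips per run


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo_worker(sock, count):
    """Echo back `count` fixed-size messages, one at a time."""
    for _ in range(count):
        sock.sendall(recv_exact(sock, len(MSG)))


def run(pipelined, n=N):
    client, worker = socket.socketpair()
    echo = threading.Thread(target=echo_worker, args=(worker, n))
    echo.start()
    received = []

    def drain():
        for _ in range(n):
            received.append(recv_exact(client, len(MSG)))

    start = time.perf_counter()
    if pipelined:
        # Asynchronous style: keep sending while replies are collected elsewhere.
        reader = threading.Thread(target=drain)
        reader.start()
        for _ in range(n):
            client.sendall(MSG)
        reader.join()
    else:
        # Lockstep style: wait for each reply before sending the next request.
        for _ in range(n):
            client.sendall(MSG)
            received.append(recv_exact(client, len(MSG)))
    elapsed = time.perf_counter() - start

    echo.join()
    client.close()
    worker.close()
    return len(received), elapsed


sync_replies, sync_time = run(pipelined=False)
async_replies, async_time = run(pipelined=True)
print(f"lockstep:  {sync_replies / sync_time:.0f} msg/s")
print(f"pipelined: {async_replies / async_time:.0f} msg/s")
```

The absolute numbers depend heavily on the machine and the interpreter, so only the ratio between the two runs is worth reading. A binding that shows the same rate for both, as the PHP translation does, is spending its time somewhere other than waiting on the round trip.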
[16:33] mikko that's interesting
[16:33] ianbarber i thought it might be because PHP was doing a msg destroy after each one, but taking that out of the c version didn't affect it
[16:34] mikko probably need to run with a profiler to see
[16:35] ianbarber yeah, i guess
[17:01] pieterh ianbarber: destroying messages, allocs, etc. don't make that much impact
[17:06] ianbarber yeah, that's what i concluded.
[18:32] picasso i'm designing an analytics system that needs to handle potentially heavy loads and usage spikes
[18:32] picasso creating a web service API (preferably in PHP), packaging incoming requests and dumping onto a message queue, and then processing these requests in the background
[18:33] picasso i apologize for not being too familiar with zeroMQ, but is this a type of problem that would be a good use case for zero?
[18:43] cremes picasso: yep, it sure would
[18:43] cremes have you read the guide?
[18:43] cremes it covers several use-cases
[18:46] picasso seems i have quite a bit of reading to do :)
[19:13] cremes picasso: just a little bit!
[19:49] michelp i'm just reading the news about Sergey Aleynikov, was he a 0mq developer?
[19:49] michelp i saw his github (maybe bitbucket) googling around a few days ago but now it appears to be gone
[19:49] michelp and it looked like he had made some contributions
[19:50] michelp weird that i can't find it now, they must have deleted it
[19:52] pieterh michelp: he wrote the original Erlang binding for 0MQ
[19:52] pieterh user id on github is saleyn
[19:53] michelp i'm not up on the details, but his sentence seems extremely harsh for just moving code, it doesn't sound like they proved he used it in any way
[19:56] michelp ah it's still up on github
[19:57] cremes michelp: it doesn't matter if he used it or not; let's say you take a candy bar from the store but you don't eat it; has it been stolen?
[19:57] cremes or not?
[19:58] michelp yeah i guess it has. the couple of stories i've read so far don't have much detail, but i just found one that does
[19:59] pieterh michelp: crimes against property are usually treated harshly unless you're wealthy
[19:59] spht dang: He was employed for two years at Goldman on a salary of $400,000. In early June, he left Goldman to join Teza Technologies, a Chicago start-up which offered to triple his pay.
[20:00] spht $1.2M coding, that's pretty decent I'd say :)
[20:08] pieterh aight, zapi the high-level C binding for 0MQ is now ready and working!
[20:09] mikko cool!
[20:09] mikko did you take zfl build as base?
[20:09] pieterh yes, for the autotools
[20:09] mikko very good
[20:10] mikko should be about the same process
[20:10] pieterh the only change was to switch to valgrind for the selftest script
[20:10] pieterh much better than whatever heap checker I was using in ZFL
[21:24] pieterh sigh
[22:00] Guthur pieterh, you work fast
[22:00] pieterh yeah
[22:00] pieterh am just making packages now
[22:00] pieterh beta release in one day
[22:00] pieterh will aim to also upload man pages to a web site, should be doable
[22:01] pieterh first replying to people who confuse posix streams with length-specified messages
[22:01] pieterh sigh
[22:01] Guthur In the same time I haven't got nearly as far with the 0MQ-FIX bridge
[22:01] Guthur admittedly I had to go to work in the middle
[22:03] Guthur robustly extracting a FIX message from a socket wrote buffer is a none too trivial exercise though
[22:03] pieterh FIX is complex
[22:03] pieterh that's why I suggested doing that totally separately
[22:05] Guthur it's actually my first time working directly with raw sockets as well
[22:05] Guthur so all a bit of a learning curce
[22:05] Guthur curve*
[22:24] pieterh raw sockets are pretty OK if you're not trying to be multithreaded or something
[22:31] Guthur hehe, I quickly decided not to try that
[22:32] Guthur it's not so much the Sockets it's extracting the individual messages from the stream
[22:32] Guthur ensuring they are complete and valid
[22:34] Guthur just trying to find the nice solution, I've made a state transition table, which should help
[22:34] Guthur and just ran a small socket demo to sanity check my socket understanding
[22:34] Guthur so all set now to do the real work
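The extraction problem Guthur describes, pulling complete and valid messages out of a byte stream, might look roughly like this. It is a sketch in Python rather than C#, and it is not Guthur's state transition table, which is not shown in the log. It assumes the usual FIX tag=value layout: BeginString (tag 8) first, BodyLength (tag 9) second, and a three-digit CheckSum (tag 10) last, with fields separated by SOH (0x01). The names `FixFramer` and `build` are hypothetical.

```python
SOH = b"\x01"
TRAILER_LEN = 7  # "10=" + three digits + SOH


def checksum(data: bytes) -> int:
    """Sum of all bytes modulo 256."""
    return sum(data) % 256


class FixFramer:
    """Accumulates stream bytes and returns complete, checksum-valid messages."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data: bytes):
        self._buf.extend(data)
        messages = []
        while (msg := self._try_extract()) is not None:
            messages.append(msg)
        return messages

    def _try_extract(self):
        buf = self._buf
        while True:
            start = buf.find(b"8=FIX")
            if start < 0:
                # Keep a short tail in case the header is split across reads.
                del buf[:max(0, len(buf) - 4)]
                return None
            del buf[:start]                      # drop garbage before the header
            begin_end = buf.find(SOH)
            len_end = buf.find(SOH, begin_end + 1) if begin_end >= 0 else -1
            if len_end < 0:
                return None                      # header fields not complete yet
            length_field = bytes(buf[begin_end + 1:len_end])
            if not (length_field.startswith(b"9=") and length_field[2:].isdigit()):
                del buf[:1]                      # malformed: resynchronise
                continue
            total = len_end + 1 + int(length_field[2:]) + TRAILER_LEN
            if len(buf) < total:
                return None                      # body or trailer still in flight
            candidate = bytes(buf[:total])
            trailer = candidate[-TRAILER_LEN:]
            valid = (trailer[:3] == b"10=" and trailer[3:6].isdigit()
                     and trailer[6:] == SOH
                     and int(trailer[3:6]) == checksum(candidate[:-TRAILER_LEN]))
            if not valid:
                del buf[:1]
                continue
            del buf[:total]
            return candidate


def build(body: bytes, begin: bytes = b"FIX.4.2") -> bytes:
    """Wrap a body (ending in SOH) with header and checksum, for testing."""
    data = b"8=" + begin + SOH + b"9=" + str(len(body)).encode() + SOH + body
    return data + b"10=" + b"%03d" % checksum(data) + SOH


m1 = build(b"35=0" + SOH + b"49=A" + SOH + b"56=B" + SOH)
m2 = build(b"35=1" + SOH + b"49=B" + SOH + b"56=A" + SOH)
framer = FixFramer()
stream = b"junk" + m1 + m2
first = framer.feed(stream[:30])    # ends in the middle of m1
second = framer.feed(stream[30:])   # completes m1 and delivers m2
print(len(first), len(second))
```

The framer never assumes that one read from the socket equals one message, which is the stream versus length-specified message distinction pieterh mentions. A production version would also need to bound the buffer and reject implausible BodyLength values, which this sketch does not do.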
[22:46] pieterh Guthur: are you working in C or C# for this?
[22:49] Guthur C# for the time being, I think it might suit being implemented in C, if it's 'dumb' enough
[23:01] pieterh I'm really curious to see how you get on with this