IRC Log


Monday February 28, 2011

[Time] Name Message
[09:13] pieterh sustrik: you there?
[09:21] sustrik pieterh: hi
[09:22] pieterh I have this feeling as we roll out 2.1.x people will complain about zmq_term
[09:23] sustrik we can switch LINGER default to 0
[09:23] pieterh that would make sense IMO
[09:23] pieterh or if not 0, then 1 second or whatever
[09:23] sustrik but that would mean that such a simple program as:
[09:23] pieterh neither 0 nor infinity are sensible defaults IMO
[09:23] sustrik send();close();term();exit()
[09:24] sustrik would not send anything
[09:24] pieterh sensible defaults work for simple programs
[09:24] sustrik anything larger than 0 and less than infinity does not make sense imo
[09:24] sustrik such a default is a trap
[09:25] sustrik it works in low-load envs (test env)
[09:25] sustrik and break when high load hits
[09:25] sustrik you can try some experimenting with throughput perf test
[09:25] pieterh possibly, but if you make it hard for people to write simple code, they won't
[09:25] sustrik if you set linger to say 10
[09:25] pieterh and it's easy to make 0MQ explain wtf is going on
[09:25] sustrik it works initially
[09:26] sustrik unless you send a lot of messages
[09:26] pieterh e.g. if linger is 1 second, it works, and if there is still unsent data at 1 second, it can say something
[09:26] sustrik then it breaks mysteriously
[09:26] pieterh mystery is entirely a design choice here
[09:26] pieterh point is,
[09:26] pieterh you're going to get a lot of complaints IMO
[09:26] sustrik i'd be rather explicit than mysterious
[09:26] sustrik let's see
[09:26] pieterh right now it's extremely mysterious
[09:27] sustrik well, it blocks
[09:27] sustrik what's mysterious about that?
[09:27] pieterh i don't even understand the reply you sent to Christian
[09:27] sustrik deterministic behaviour
[09:27] sustrik zmq_term() -> ETERM -> zmq_close()
[09:27] pieterh deterministically useless behaviour
[09:27] sustrik better than heisenbugs
[09:27] pieterh erlang binding works around it
[09:28] pieterh this is not going to end nicely...
[09:28] sustrik what's the thing with erlang?
[09:28] pieterh you didn't follow?
[09:29] sustrik haven't seen anything
[09:29] pieterh it tracks open sockets and secretly closes them all _before_ calling zmq_term
[09:29] sustrik where's the discussion?
[09:29] pieterh because an infinitely blocking system call is so insane
[09:29] pieterh utterly... pathological
[09:30] sustrik where's the discussion?
[09:30] pieterh can't find it immediately
[09:31] pieterh email is not a database
[09:31] pieterh as I've pointed out many times, it's useless for finding back stuff
[09:31] pieterh search for a thread titled "zmq_term() blocks in 2.1"
[09:32] pieterh "For simplicity, the port driver doesn't handle any threads itself.  And it never
[09:32] pieterh actually calls a zmq library function that could block indefinitely."
[09:32] sustrik zeromq-dev?
[09:32] pieterh from chris <csrl@gmx.com>
[09:32] pieterh yes, of course it's on zeromq-dev
[09:32] pieterh and you were on that thread
[09:34] sustrik aha, found it
[09:34] sustrik that's not about LINGER
[09:34] sustrik that's about necessity to close the sockets
[09:34] sustrik same problem you've reported a long time ago
[09:35] pieterh you mentioned Linger, I just pointed out that zmq_term blocks
[09:35] pieterh and that people will hit this more and more
[09:36] sustrik ok, there are several issues involved:
[09:36] sustrik 1. necessity to close sockets
[09:36] sustrik it would be nice to be able to avoid that
[09:37] sustrik however, i have no idea of how to do that
[09:37] sustrik 2. linger
[09:37] pieterh no other classic OS API requires that kind of thing
[09:37] sustrik 3. Ctrl+C
[09:37] sustrik this one was solved in one go with reaper thread
[09:37] sustrik you are free to send a patch
[09:37] private_meta uhm... which example set should I refer to when having multiple clients connect to the same port (in case of tcp), when I need to know which client sends the message and which client to send the message to, because Handling Multiple Sockets only seems to work with dedicated ports, or am I mistaken?
[09:38] pieterh sustrik: "send a patch" is telling me to jump in the lake, you know that
[09:38] sustrik well, i have no idea how to avoid 1.
[09:38] pieterh I'm raising a concern that the "stable 2.1" release will annoy many people and break a lot of code
[09:38] sustrik so, someone else has to
[09:38] pieterh this makes it very hard for me to release that with any kind of confidence
[09:39] pieterh except with a large disclaimer, which may be sufficient
[09:39] pieterh but...
[09:39] pieterh not when the explanation is confused
[09:40] sustrik what should i say? i have no idea how to fix it
[09:40] sustrik that's it
[09:40] pieterh sustrik: look, I can document the need to close sockets
[09:40] pieterh I can document the need to set LINGER even in the most trivial apps
[09:41] pieterh that's just "oh, it's not so shiny and elegant anymore"
[09:41] pieterh but if you tell people to call zmq_term before closing sockets, I'm kind of confused
[09:41] sustrik let's not mix the issues
[09:41] sustrik are you concerned about 1 or 2?
[09:41] pieterh this is your breakdown, it's not mine
[09:42] sustrik are you concerned about both?
[09:42] pieterh my concern is just "zmq_term blocks in 2.1 and I don't know why"
[09:42] sustrik ok
[09:42] sustrik 1. i can't solve it
[09:42] pieterh it's relatively easy to solve in a simple threaded app
[09:42] sustrik you are free to try
[09:42] pieterh 1. close sockets, 2. set linger = 0, terminate
[09:42] sustrik 2. linger is a problem
[09:42] pieterh but if this starts to go wrong in multithreaded apps, people will _refuse_ to use 0MQ...
[09:43] pieterh it's a class 1 fatal "no go" problem that will stop it going into production
[09:43] sustrik setting linger to some default value
[09:43] pieterh "sorry, we can't solve it" may be one answer
[09:43] pieterh but it's a really crappy answer
[09:43] sustrik means that even a single message won't pass
[09:43] sustrik given it's sufficiently long and/or network is sufficiently slow
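The point about a finite linger failing once the message is long enough or the network slow enough comes down to simple arithmetic. Below is a minimal sketch of that check. The helper `drains_within_linger` and its parameters are hypothetical and not part of libzmq; it assumes a constant sustained send rate, which a real network will not give you.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a libzmq function.
   Returns 1 if queued_bytes can be flushed within linger_ms at a
   sustained rate of bytes_per_sec, and 0 otherwise. */
int drains_within_linger(uint64_t queued_bytes, uint64_t bytes_per_sec,
                         uint64_t linger_ms)
{
    assert(bytes_per_sec > 0);
    /* Bytes that fit in the linger window. Integer arithmetic only;
       very large rate * linger products could overflow uint64_t. */
    uint64_t capacity = bytes_per_sec * linger_ms / 1000;
    return queued_bytes <= capacity;
}
```

With a 1 second linger and an assumed 12.5 MB/s link, a 100 KB backlog passes the check while a 500 MB backlog does not, which matches the "works in test, breaks under load" concern.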
[09:43] pieterh like I said, there are easy ways to make that work
[09:43] sustrik ?
[09:44] sustrik sure, do so
[09:44] pieterh a. use a sensible default linger value
[09:44] pieterh b. if the app still has unsent messages after that, issue a loud warning
[09:44] sustrik hm, returning an error from zmq_term()?
[09:44] sustrik that may work
[09:45] pieterh no, a loud warning
[09:45] sustrik what's that?
[09:45] pieterh printf ("123 messages not sent, please raise ZMQ_LINGER on socket")
[09:45] pieterh etc.
[09:45] pieterh something that gets literally printed and sent to logs
[09:45] pieterh or sent on sys://log if that ever goes live
[09:45] sustrik that works only when a console is available
[09:45] sustrik problem on windows
[09:45] pieterh so, people _need_ consoles on production systems
[09:46] sustrik but the error would kind of make sense
[09:46] sustrik let me think about it
[09:46] pieterh returning an error?
[09:46] sustrik rc = zmq_term()
[09:46] pieterh maybe but it just makes the caller responsible again
[09:46] pieterh yes, it would at least be consistent
[09:46] sustrik if (rc == EPENDINGMESSAGES)...
[09:46] pieterh yes
[09:46] pieterh and then set LINGER to 1 second by default please
[09:47] sustrik ok, i'm going to think about it
[09:47] pieterh this simplifies simple cases
[09:47] sustrik the close() problem remains though
[09:47] pieterh as for the deadlock issue, it just needs accurate documentation
[09:47] sustrik ok
[09:47] pieterh accurate, i.e. precisely what do people have to do to avoid it
[09:48] pieterh this change to linger would be very good, at least it'll distinguish the deadlock from infinite linger
[09:48] pieterh that's a headache today, not knowing what's actually going wrong
[09:49] sustrik it won't distinguish the two cases :(
[09:49] sustrik it will just timeout the term() after a while
[09:49] sustrik and allow to restart it
[09:50] pieterh you would also timeout the deadlock?
[09:50] sustrik yes
[09:50] pieterh ...
[09:50] pieterh but in the deadlock case there are zero messages to send
[09:50] sustrik the deadlock is caused by the handshake
[09:51] sustrik "tell me whether there are more you've queued"
[09:51] sustrik "ok, there are no more messages"
[09:51] sustrik the application thread's part of the handshake is executed in zmq_close() call
[09:51] pieterh right
[09:52] pieterh well, we know how many sockets are not responding, right?
[09:52] sustrik yes
[09:52] pieterh that's valuable information to report
[09:53] sustrik yup
[09:53] pieterh rc = number of unclosed sockets, maybe
[09:54] sustrik possibly
[09:55] pieterh you can't use EPENDINGMESSAGES unless you know there are actually messages waiting
[09:55] pieterh something like ETIMEOUT
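One way to picture the distinction being discussed here (unclosed sockets versus messages still queued) is a small classification function. This is only a sketch of the idea: the result codes, the `ctx_state` struct and `classify_term` are all hypothetical, nothing like them exists in libzmq 2.1, and sustrik notes above that the library may not be able to tell the two cases apart.

```c
#include <assert.h>

/* Hypothetical result codes; none of these exist in libzmq. */
enum term_result { TERM_OK = 0, TERM_PENDING_MESSAGES, TERM_TIMEOUT };

/* Hypothetical snapshot of a context when a timed terminate expires. */
struct ctx_state {
    int open_sockets;      /* sockets the application never closed */
    int pending_messages;  /* messages still queued for sending */
};

enum term_result classify_term(const struct ctx_state *s)
{
    assert(s != 0);
    if (s->open_sockets > 0)
        return TERM_TIMEOUT;           /* close handshake never completed */
    if (s->pending_messages > 0)
        return TERM_PENDING_MESSAGES;  /* wait expired with data queued */
    return TERM_OK;
}
```

Checking unclosed sockets first reflects the remark that a "pending messages" error only makes sense when messages are known to be waiting.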
[09:55] pieterh if we can make this work sensibly, IMO 2.1 is ready for the big stage
[09:56] pieterh coffee, brb
[09:57] sustrik to be ready for big stage we need subscription forwarding :|
[10:03] pieterh uhm, no, you just don't need to break every app already running...
[10:04] sustrik well, it's actually a bugfix
[10:04] sustrik people complained that messages are dropped on exit
[10:04] sustrik namely, mato
[10:05] pieterh well, there's always someone complaining... :-)
[10:06] pieterh my radar mainly focuses on the dev list
[10:06] sustrik it has been a common complaint back then
[10:06] pieterh LINGER per socket is also kind of a strange choice
[10:07] sustrik it's POSIX
[10:07] pieterh yes, this was a necessary change, no argument with that
[10:07] pieterh zmq_term is not POSIX :-)
[10:07] sustrik zmq_term = OS shutdown
[10:07] pieterh nope
[10:07] sustrik yes
[10:08] pieterh sigh
[10:08] pieterh then why am I calling "Shutdown OS" in my apps?
[10:08] pieterh 0MQ is _not_ a kernel module
[10:08] pieterh sorry, this is 2011 and we're on version 2.x.x
[10:08] pieterh please remain in the present
[10:08] Steve-o lol
[10:08] pieterh you may have a vision of where 0MQ will go
[10:08] sustrik zmq_term is *equivalent* to OS shutdown
[10:09] sustrik not OS shutdown itself
[10:09] pieterh but we are discussing today's code and today's design
[10:09] pieterh again, I do not call OS shutdown in my apps
[10:09] sustrik it does the same thing the TCP does with tx buffers on OS shutdown
[10:09] pieterh please, this analogy is not helpful
[10:09] pieterh it really is not helpful
[10:10] sustrik it's what it does
[10:10] pieterh "Sorry, sir, your app is deadlocking because zmq_term is like OS shutdown"
[10:10] sustrik shrug
[10:10] sustrik no point in this discussion
[10:10] pieterh well, it'll keep coming back
[10:10] sustrik i'll have a look at the timeout for zmq_term()
[10:10] pieterh you won't be able to make it work IMO
[10:10] pieterh because you have LINGER per socket not per context
[10:11] sustrik they are two different timeouts
[10:11] pieterh how would you modify the term timeout?
[10:11] pieterh as a user, I mean
[10:11] sustrik zmq_term_wait (void *ctx, int timeout);
[10:12] pieterh so revert the old method to not blocking, and introduce a new one?
[10:12] sustrik introduce a new one
[10:12] pieterh +1, gets my vote
[10:13] pieterh it is totally explicit and leaves 2.0 semantics unchanged
[10:13] sustrik reverting zmq_term() to immediate would be consistent with 2.0
[10:13] pieterh ack
[10:13] sustrik however, 2.1 users may complain
[10:13] sustrik so it's up to consensus
[10:13] pieterh HEY EVERYONE!!!
[10:14] pieterh please ack/nack sustrik's suggestion here...
[10:14] sustrik something like that
[10:14] sustrik on mailing list
[10:15] pieterh yes
[10:15] pieterh it's a major topic, would you raise it then?
[10:16] sustrik i have to think about the whole thing first
[10:17] pieterh ok
[11:14] private_meta hmm
[11:14] private_meta uhm... which example set should I refer to when having multiple clients connect to the same port (in case of tcp), when I need to know which client sends the message and which client to send the message to, because Handling Multiple Sockets only seems to work with dedicated ports, or am I mistaken?
[11:15] pieterh private_meta: Chapter 3 of the Guide
[11:16] pieterh various routing based on using XREP / ROUTER socket and identities of peers
[12:14] stimpie Does anyone have measurements or explanations on performance of a connection per thread/cpu versus a single connection per system with a dispatcher to each thread?
[12:18] ianbarber threads vs events? that's a big argument :) the http://www.kegel.com/c10k.html c10k page is a good overview, not 0MQ specific
[12:25] stimpie that's an interesting read but not exactly what I'm thinking about. I have a system with x cores and x threads, and messages from other systems need to arrive at those threads. I can create a socket for each thread or create 1 socket from which messages are dispatched to each thread.
[12:26] stimpie With one socket each physical device has only one address and other systems do not have to take the number of threads per system into account
[12:27] stimpie Each thread a socket appears faster to me but requires more 'global' knowledge, (messages should be duplicated across physical machines)
[12:29] ianbarber yeah, i see
[12:30] pieterh stimpie: it's not really an either/or choice IMO
[12:30] pieterh on the one hand you need a frontend able to poll your 10K sockets
[12:30] pieterh but you usually also need a bunch of threads to do the real work
[12:30] pieterh however it is pathological to create one thread per socket
[12:31] pieterh see asyncsrv example in Chapter 3 of the guide
[12:32] ianbarber pieterh: i think he's asking about one thread per core basically, with one tcp socket per thread, or a device on tcp with inproc/ipc to the other threads
[12:32] stimpie ianbarber, those should have been my words ;-)
[12:33] pieterh well, you want one thread per core for threads that do real work
[12:33] ianbarber definitely
[12:33] pieterh however, that does not map to TCP connections
[12:34] pieterh not "one tcp socket per thread", nope
[12:34] pieterh that would be an anti-pattern in 0MQ
[12:35] ianbarber i think, tbh, that a forwarder type device would be fine, they're pretty quick. If you did want to have a TCP listener per core, then you could have them check in to a name service, and have your clients query the name service
[12:36] ianbarber though I would have each of those TCP listeners be a separate process
[12:37] stimpie So you think the overhead of the forwarder (dispatcher) would not be a negative impact?
[12:38] stimpie Best way to find out is to benchmark I guess
[12:39] pieterh stimpie: best way is to benchmark, try any device and see how it performs
[12:39] ianbarber yeah
[12:40] stimpie I will do, thanks for your thoughts
[12:40] pieterh stimpie: the pattern I'd recommend is:
[12:40] pieterh n clients, connecting as usual to a queue
[12:41] pieterh m workers, where m is much smaller than n
[12:41] pieterh queue talking over inproc to workers
[12:41] pieterh total number of threads on the server is m + 1
[12:41] pieterh if m is too large, you will lose time in context switching
[12:42] pieterh sorry, total number of app threads on server is m + 1, there is also at least 1 I/O thread
[12:42] pieterh so optimal value for m is (total cores on server box) - 2
[12:43] pieterh assuming you can dedicate a whole multicore box to your server app
[12:43] pieterh this would be for CPU-limited workers, it's different if they are I/O bound
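The sizing rule above (m workers, one queue thread, at least one I/O thread, so m = cores - 2 on a dedicated box with CPU-bound workers) can be written down as a tiny helper. The function name `worker_count` is hypothetical, and the floor of one worker on very small machines is an assumption added for the sketch, not something stated in the discussion.

```c
#include <assert.h>

/* Hypothetical helper: number of CPU-bound worker threads for a server
   that also runs one queue (app) thread and one I/O thread. */
int worker_count(int total_cores)
{
    assert(total_cores > 0);
    int m = total_cores - 2;   /* reserve cores for queue and I/O threads */
    return m < 1 ? 1 : m;      /* assumed floor: always at least one worker */
}
```

On an 8-core box this gives 6 workers. For I/O-bound workers the rule does not apply, as noted above.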
[12:45] ianbarber make sure to benchmark with as realistic conditions etc. as you can - it can be easy to benchmark with (say) much smaller messages than you'd normally use, and see a different performance character
[12:59] Guthur pieterh: The new projects page seems a little similar to the labs page, imo
[12:59] pieterh Guthur: yes, it's meant to overlap
[12:59] pieterh this projects page is a temporary place to collect community projects
[13:00] pieterh that is, projects we consider part of the 0MQ community and want to expose to potential contributors
[13:00] Guthur ok, and then what is labs?
[13:00] pieterh the Labs page goes a bit further and also doesn't really expose the core projects
[13:00] pieterh so my idea with the projects business is to show these on the main community page
[13:00] pieterh similarly as we do for the bindings
[13:01] Guthur ok, so I suppose they should have a reasonable level of maturity
[13:01] pieterh not necessarily but they should be tight extensions of 0MQ
[13:01] pieterh rather than apps which use it
[13:01] pieterh e.g. I'd consider zguide a project but not mongrel2
[13:02] Guthur oh ok, that clears it up
[13:02] pieterh ideally all these projects would gravitate towards the same workflow, core community of contributors, infrastructure, etc.
[13:02] pieterh like the bindings
[13:03] pieterh I had this vision of making it into a dashboard like this: http://extensions.wdeditor.com/
[13:03] pieterh that's based on my design
[13:03] pieterh but it'd have to be red/black/white of course :-)
[13:04] Guthur of course, hehe
[13:05] pieterh so you come to the community site and see a whole bunch of projects, each with a name/person/graphic
[13:05] pieterh I guess we're moving towards that very slowly
[13:06] Guthur so for an example, where would an implementation of the FIXT 1.1 (Transport Independent) protocol using ZeroMQ as the transport lie
[13:06] pieterh it's really up to the owner
[13:06] pieterh it's a choice: move it into the 0MQ community or keep it separate
[13:07] Guthur ok
[13:07] pieterh if, for example, there were several such bridges, it would be great to see them as 0MQ projects
[13:08] pieterh let me give another example
[13:09] pieterh I'm working on Whaleshark (http://zero.mq/ws)
[13:09] pieterh which depends on a bunch of other 0MQ layers
[13:09] pieterh like a name service, security service, etc.
[13:09] pieterh it could be fun to also include FIXT support
[13:09] pieterh so if the FIXT layer was aimed at 0MQ apps like Whaleshark, it's a natural 0MQ project
[13:10] pieterh but if it's aimed at FIXT apps, it's not
[13:10] Guthur FIXT seemed like a nice place to start with FIX and 0MQ, due to its transport independent spec
[13:10] Guthur ok i understand
[13:11] pieterh acid test would be, do you discuss project X here and on zeromq-dev, or on some other forum
[13:14] Guthur would it be possible to offer commercial support for such projects via a corporate entity, similar to how imatix is mentioned for whaleshark?
[13:14] pieterh of course
[13:15] pieterh that's why there's a 'website' column
[13:15] pieterh you'd probably not be able to use the zeromq.org domain without iMatix agreeing
[13:16] Guthur that's reasonable
[13:46] sustrik pieterh: it seems there's a problem with the mailing list
[13:46] sustrik i've sent an email
[13:46] sustrik it hasn't appeared
[13:46] pieterh hmm, ok, let me restart the server...
[13:48] pieterh rebooting, it'll take a minute or so
[13:48] pieterh there's a service (spam filter afair) which gets confused now and then
[13:59] pieterh sustrik: didn't help, I'm contacting Ewen
[14:33] sustrik thx
[14:41] Seta00 I need an example that uses polling on a sub socket :/
[14:42] pieterh Seta00: poll works the same on all socket types
[14:42] Seta00 well then I need an example that uses polling
[14:42] pieterh there are lots in the Guide
[14:43] Seta00 kk I'll check
[14:53] pieterh sustrik: I've put a note on the community page, this sucks, sorry
[15:55] travlr pieterh: just had to mention how much i appreciate the work you did with the online reference... much much nicer to work with... very thorough too! thanks.
[15:56] pieterh travlr: you mean the new API site?
[15:56] travlr yes
[15:56] pieterh np :-) it was fun to make
[15:56] travlr cool. thanks again for all
[15:56] pieterh we needed to cover older/newer versions anyhow
[15:57] travlr yes, very smooth and easy to work with
[16:51] private_meta Does the router in a router-to-dealer-relationship know when a dealer connects, even if it didn't send a message yet? Meaning, can I as a user of the router know that?
[16:53] pieterh private_meta: not when it connects, but if it sends a message, yes
[16:54] pieterh any router-to-anything depends on the anything sending something to the router first
[16:54] private_meta kk...
[16:55] private_meta pieterh: so that no messages are lost in a router-dealer-relationship the router must wait for the first message to arrive
[16:56] private_meta well, sounds logical now that i write it
[16:56] pieterh yes
[16:56] pieterh the router needs to know an address to send to
[16:56] pieterh that only comes with an input message
[16:56] pieterh unless (a) you pass the identities some other way
[16:56] pieterh or (b) you use durable sockets
[16:56] private_meta I'm in need of logon messages anyway
[16:57] pieterh and router is like pub: if there's no recipient, the message is not queued, it's dropped immediately
[16:58] private_meta pieterh: I seem to have overlooked that in the docs, but what happens to a dealer trying to connect to a non-existent router, and how does the dealer know?
[16:59] pieterh it doesn't know unless it expects a reply and doesn't get one
[16:59] pieterh actually I'm writing this up now for Ch4
[17:00] private_meta so there is no such thing as "unknown host" or other error messages that I could get?
[17:00] pieterh nope
[17:01] pieterh note that tcp:// is a disconnected protocol... the host might be away at lunch and back in 2 hours, 0MQ will wait
[17:01] pieterh inproc:// will tell you if it can't connect
[17:01] private_meta Did you do that so you have an abstraction of any protocols?
[17:01] private_meta oh
[17:01] pieterh it's just more useful like that, for most apps
[17:02] private_meta I'm not quite sure how to implement a timeout to wait for that :/
[17:02] pieterh it's documented... hang on...
[17:02] pieterh ah, sorry, not yet pushed :-)
[17:03] private_meta huh=
[17:03] private_meta *huh?
[17:03] pieterh if you can wait a little while...
[17:03] private_meta define little while
[17:04] private_meta for some people, a week might be a little while, for others a little while is an hour :D
[17:05] private_meta As far as I figured, you use durable sockets where you have a fixed name whenever you reconnect (more or less), but also the router discards messages that are sent to a target it doesn't know. So if a router sends a message to a durable socket that is not yet connected, are these messages also discarded?
[17:06] pieterh durable sockets cannot be "not yet connected"
[17:06] pieterh a durable socket may be "temporarily away for lunch"
[17:07] pieterh i've no idea what a router socket does with durable sockets but I imagine it queues messages for them
[17:07] pieterh that would be consistent with PUB, but it's not documented afaik
[17:07] private_meta kk, so a computer where the durable socket is located on which, let's say, reboots, is "away for lunch" for the router?
[17:07] private_meta -which
[17:07] pieterh the whole business of "XREP discards and does not queue messages it can't route" is not documented
[17:08] private_meta kk
[17:24] pieterh private_meta: ok, http://zguide.zeromq.org/page:all#toc67
[17:28] private_meta sweet
[17:29] private_meta pieterh: So the initial timeout is oc pretty much the first heartbeat not coming through I assume?
[17:30] pieterh it's not quite that simple
[17:30] private_meta how so?
[17:30] pieterh you need a clock for the poll, should be the lowest heartbeat interval
[17:30] pieterh if you use the same heartbeat for all peers, that value
[17:30] pieterh then you need to allow for 2-3 lost heartbeats before declaring a 'disconnected peer'
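The rule of tolerating 2-3 lost heartbeats before declaring a peer disconnected might look roughly like the sketch below. The names `peer`, `peer_seen`, `peer_is_alive` and the constant `HEARTBEAT_LIVENESS` are hypothetical and chosen for illustration; the value 3 is an assumption taken from the "2-3" range mentioned above, and timestamps are assumed to be milliseconds from a monotonic clock.

```c
#include <assert.h>
#include <stdint.h>

#define HEARTBEAT_LIVENESS 3   /* assumed: silent intervals tolerated */

struct peer {
    int64_t last_seen_ms;      /* time of the last message from the peer */
};

/* Record that something (heartbeat or data) arrived from the peer. */
void peer_seen(struct peer *p, int64_t now_ms)
{
    p->last_seen_ms = now_ms;
}

/* Returns 1 while the peer has been silent for fewer than
   HEARTBEAT_LIVENESS intervals, and 0 once it should be treated
   as disconnected. */
int peer_is_alive(const struct peer *p, int64_t now_ms, int64_t interval_ms)
{
    assert(interval_ms > 0);
    return now_ms - p->last_seen_ms < HEARTBEAT_LIVENESS * interval_ms;
}
```

A poll loop would call `peer_seen` whenever a message arrives and `peer_is_alive` each time the poll timeout (the lowest heartbeat interval) fires.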
[17:31] private_meta Yes, seems like a good thing to allow for single lost messages.
[17:32] private_meta Uhm... a "lost heartbeat" would be, in your case, a certain heartbeat not receiving a reply, wouldn't it? Isn't 0mq build so, if the client decides to connect one day, all those "lost" heartbeats would be sent?
[17:32] private_meta *built
[17:33] pieterh heartbeats don't get replies
[17:33] pieterh they are asynchronous in both directions
[17:33] private_meta ah yeah
[17:33] private_meta sorry, true
[17:33] pieterh please read the code and the docs...
[17:33] private_meta I will
[17:33] private_meta sorry for asking prematurely :)
[17:36] pieterh np, if there's anything unclear or missing in the text, let me know
[17:36] pieterh it's a first draft and raw
[17:40] private_meta pieterh: to get it straight, you would use one zmq_poll call with infinite timeout for message transfer and one with heartbeat timeout to send heartbeat messages?
[17:40] pieterh i don't think that's what the examples do
[17:40] private_meta You mean the pirate example?
[17:41] pieterh any of them
[17:41] private_meta Okay, I'll look at that one
[17:41] pieterh it's tempting to do heartbeating via a second socket
[17:41] pieterh this is a bad idea for two or three reasons
[17:41] pieterh which I'll document
[17:44] pieterh "First, if you're sending data you don't need to send heartbeats. Second, sockets may, due to network vagaries, become jammed. You need to know when your main data socket is silent because it's dead, rather than just not busy, so you need heartbeats on that socket. Lastly, two sockets is more complex than one."
[17:54] cremes is there a C FORWARDER device in the zguide anywhere? i can't seem to find one and I'd like one for testing
[18:02] pieterh cremes, afaik the msgqueue example will work if you use PUB and SUB
[18:02] pieterh a forwarder just reads and writes two sockets
[18:02] cremes pieterh: ok, i'll try it
[18:03] pieterh sorry, msgqueue just calls the built-in device, that's not what you want, is it
[18:03] pieterh you want the actual core, poll / recv / send?
[18:03] cremes no, i just want something that will subscribe to everything and publish out the other side
[18:04] cremes the built in device is probably okay then, yes?
[18:04] pieterh yes
[18:04] pieterh it's the same code for all three devices
[18:04] pieterh the only differences are the bind/connect directions and socket types
[18:09] zedas pieterh: what?! not even http://mulltedb.org :-)
[18:10] zedas pieterh: or i mean http://mulletdb.org/ :-)
[18:10] cremes pieterh: looks like i don't need it; i have isolated another slow leaker with PUB sockets
[18:10] pieterh zedas: uhm, what's the question?
[18:10] pieterh cremes: really, and it's not even Friday yet?
[18:11] cremes :)
[18:11] cremes well, i need to verify one or two more things.... but yeah
[18:12] pieterh zedas: you mean for the 0MQ projects list?
[18:12] pieterh and it's mulletdb.com, :-)
[18:13] zedas damn, see i don't even care about that project.
[18:13] zedas pieterh: yeah i was joking about "projects"
[18:14] pieterh yeah, the love shows
[18:14] pieterh tokyo cabinet seems useful
[18:14] pieterh not so sure about that zeromq stuff you are so keen about
[18:23] cremes false alarm on that leak... i was calling setsockopt(LINGER) after zmq_connect()
[18:23] cremes i guess it doesn't honor it after the socket has been bound/connected
[18:23] cremes or is that a bug?
[18:24] cremes nope, not a bug according to the man page
[18:41] sp4ke Hi
[18:42] sp4ke can anyone help me setting up zeromq with my project on Visual Studio 2010
[18:42] sp4ke i get unresolved external symbols when i build projects
[18:42] sp4ke i built the libzmq project and added the path to the directory on my project dependencies
[18:52] sustrik the libs are in libs subdir
[18:52] sustrik iirc
[18:53] sp4ke in the libs subdir i've got only a libzmq.dll and libzmq.ilk
[18:53] sp4ke how can i add these files as dependencies in VS ?
[18:54] sp4ke i mean other than specify the path in the Library Directories which i did
[18:55] sustrik there should be libzmq.lib iirc
[18:55] sustrik you should link that with your project
[18:58] sp4ke ok thanx, i found a discussion in the irc archive; it's a common problem to not get the .lib, the answer should be there
[19:29] pieterh cremes: you can set LINGER at any time before close, afaics
[19:30] cremes the docs say otherwise: "Caution: All options, with the exception of subscription strings, only take effect for subsequent socket bind/connects."
[19:30] cremes that's from the zmq_setsockopt man page
[19:30] cremes i don't think it's lying... my testing appears to bear this out
[19:32] pieterh i've been using LINGER in examples to stop zmq_term blocking, and I use it just before close
[19:32] pieterh something to clarify...
[19:32] cremes indeed
[19:33] pieterh example like https://github.com/imatix/zguide/blob/master/examples/C/lpclient.c
[19:59] mikko sigh
[20:05] Guthur cremes pieterh: that was my update
[20:06] pieterh Guthur: yeah, but is it accurate?
[20:06] Guthur sustrik mentioned that all options should be set before connect
[20:06] mikko Guthur: not all
[20:06] mikko zmq_subscribe can be set afterwards
[20:07] Guthur mikko, yeah besides that
[20:07] pieterh mikko: that's what the text says :-)
[20:07] pieterh Guthur: it should IMO say "ZMQ_SUBSCRIBE" rather than "subscription strings" but that's minor
[20:08] pieterh ZMQ_SUBSCRIBE, ZMQ_UNSUBSCRIBE, ZMQ_LINGER can afaik be set at any time
[20:09] pieterh not sure about ZMQ_RECONNECT_IVL
[20:09] Guthur ok, I can post another update patch
[20:09] Guthur if that's ok
[20:09] pieterh we need El Sustrik's formal confirmation with an "are you sure", IMO
[20:10] pieterh I made an issue: https://github.com/zeromq/zeromq2/issues/173
[20:10] Guthur hehe, yep that's a very sensible idea
[20:10] pieterh there are a couple of fuzzy areas that cropped up
[20:20] pieterh omg, I'm reinventing AMQP for Ch4... :-/
[20:21] pieterh please shoot me now before this goes too far
[20:24] Guthur at some point someone is bound to say 'It would be nice if core had this'
[20:24] Guthur and then that will be the end
[20:24] pieterh nah, it's all just user-space patterns
[20:25] pieterh the key IMO is not even software, but documented protocols
[20:25] Guthur is AMQP poorly documented?
[20:25] Guthur I am not very familiar with it to be honest
[20:26] pieterh hmm, depends on the version of AMQP, there are quite a few
[20:26] pieterh on this page http://www.amqp.org/confluence/display/AMQP/AMQP+Specification
[20:26] pieterh only AMQP/0-8 and AMQP/0-9-1 are properly documented
[20:27] pieterh 0-9 and 0-10 don't even have dates in the document... very shoddy work
[20:27] pieterh every version is incompatible with every other version
[20:27] pieterh oh, don't get me started :-)
[20:28] Guthur I don't think i'll delve into it too deeply
[20:28] Guthur I've enough on my plate without getting lost in AMQP
[20:28] pieterh :-)
[20:51] cremes pieterh: can you confirm this leaks memory on your system? https://gist.github.com/848007
[20:51] cremes if so, i'll open a ticket and attach it
[20:51] sustrik it's only SUBSCRIBE and UNSUBSCRIBE that affect the connection after it is established
[20:52] cremes sustrik: i think i *might* have found another leak with PUB
[20:52] sustrik yes?
[20:52] cremes see this gist: https://gist.github.com/848007
[20:52] pieterh cremes: nope
[20:52] cremes if someone can confirm it leaks on their system, i'll open a ticket
[20:52] pieterh it does not leak
[20:52] pieterh it does consume 300% CPU
[20:53] pieterh but memory usage is stable: "7867 ph 20 0 198m 1904 1148 S 312 0.0 1:09.50 leaker6 "
[20:53] cremes hrmm...
[20:53] pieterh sustrik: I've tested LINGER and it definitely works after the connection is established
[20:54] sustrik aaaah
[20:54] sustrik i recall something like that dimly
[20:54] sustrik let me check the code
[20:54] pieterh Ergo^: are you on the latest 0MQ?
[20:56] pieterh Ergo^: check the release notes, Ctrl-C was fixed but I don't recall exactly what version
[20:59] cremes pieterh: ah! make a small change to that code and it will leak like a sieve
[20:59] cremes change the number of client threads it spawns to something greater than 1
[20:59] pieterh cremes... put the 'free' into comments?
[20:59] pieterh ah, will try
[20:59] pieterh Ergo^: did you read the Guide yet?
[20:59] cremes i think it's a race condition bug
[20:59] pieterh cremes: I'll spend 10 minutes on that, would you spend 10 minutes reviewing http://rfc.zeromq.org/spec:7?
[21:00] cremes my pleasure
[21:00] pieterh Ergo^: until you've read at least Ch1 and Ch2, you're kind of in RTFM mode here
[21:01] sustrik ack: LINGER is socket-wide
[21:01] sustrik not connection-wide
[21:02] pieterh cremes: I hereby name this ship the "Leaky and Nasty"
[21:02] pieterh 7993 ph 20 0 1853m 1.4g 1148 S 382 17.6 2:42.12 leaker6
[21:02] cremes huzzah!
[21:02] pieterh That's 1.4g of memory in about 30 seconds
[21:02] pieterh with 10 client threads
[21:02] cremes i can email you guys a call-tree backtrace if that is helpful to you
[21:02] cremes yeah, same thing happens on my box
[21:03] pieterh i love it when people send beautiful C code that reproduces problems...
[21:03] cremes btw, it doesn't leak as fast when the LINGER line is uncommented but it still leaks *rapidly*
[21:10] sustrik what unit is s_clock() in?
[21:10] cremes milliseconds
[21:11] mikko success!
[21:12] sustrik cremes: ok, what about the cpu usage?
[21:12] mikko i managed to create pure shell-script that executes zeromq build and sends results over http to jenkins
[21:12] sustrik a peak followed by flat line?
[21:12] pieterh mikko: nice!
[21:12] cremes sustrik: let me take a look
[21:13] mikko also, on the other news. i am bringing up powerpc (debian 6.0) build slave soon(ish)
[21:13] cremes sustrik: did you update the code to use 2+ client threads? i see cpu spike and *stay* there
[21:13] sustrik mikko: btw, i've had a discussion with a guy who has problems building 0mq under mingw-win64
[21:14] cremes sustrik: reload that gist if you like; i updated it to create 5 client threads which more readily show the leak
[21:14] mikko sustrik: what is the problem?
[21:14] mikko using mingw64?
[21:14] sustrik order of includes, presumably
[21:14] sustrik https://github.com/zeromq/zeromq2/issues/#issue/60
[21:15] sustrik i just thought it can possibly make sense to add that to builds
[21:15] sustrik cremes: ok, so it's processing something
[21:15] mikko sustrik: the current cluster is 32bit hardware
[21:15] sustrik that definitely looks like a bug
[21:15] mikko that's slightly problematic
[21:16] mikko would need a win64 box (i presume)
[21:16] sustrik ah, i thought it's a cross-compile
[21:16] mikko or does the cross-compile work on 32bit?
[21:16] sustrik never mind
[21:16] sustrik no idea
[21:16] sustrik check the issue
[21:16] mikko can't do 'make check' without win64
[21:16] mikko i can add build
[21:16] sustrik mikko: spot on
[21:16] sustrik i forgot about the tests
[21:17] cremes sustrik: yes, i agree; i changed the publish interval to 500ms and cpu remains high
[21:17] cremes sustrik: whatever it is processing, it's stuck
[21:17] sustrik right
[21:17] cremes sustrik: i can send you the call-tree for the code that is allocating (and holding onto) all of this memory if that's helpful
[21:17] pieterh cremes: I think I see the problem
[21:17] sustrik yes, please
[21:18] pieterh the client is never pausing for breath
[21:18] sustrik it's not, but it's time-limited
[21:18] pieterh server can't keep up
[21:18] sustrik so it should send for 200ms
[21:18] pieterh let me set a HWM and do small sleep in the client after closing a socket...
[21:18] sustrik then stop
[21:18] pieterh the clock in the client has no purpose at all afaics
[21:20] cremes ok, so a small sleep inside the publish loop fixes it
[21:20] cremes but shouldn't it just drop those messages if they are in queue and undelivered?
[21:20] cremes LINGER = 0 in this case
[21:21] pieterh cremes: if I sleep 1 second after each publish burst, client memory usage is flat
[21:21] pieterh they are sent to publisher before you close the socket
[21:21] pieterh the memory consumption is in the server queue
[21:22] cremes hmmm, i can believe that
[21:22] sustrik 2 producers are definitely going to overload one consumer
[21:22] pieterh hmm, indeed, I set 10k HWM on server socket, still runs out of memory
[21:22] sustrik you have to set HWM to make excess messages be dropped
[21:22] pieterh setting 10K HWM on client socket AND sleeping in between bursts, it's ok
[21:23] sustrik what about HWM on both sender and receiver?
[21:23] pieterh cremes: ah...
[21:23] pieterh LINGER is only executed at zmq_term time!
[21:24] sustrik zmq_close() time, to be precise
[21:24] pieterh bleh, you're right, and doing init/term in the loop makes no difference
[21:25] pieterh cremes: you always find the weird cases... :-)
[21:25] sustrik have you tried with HWM on both sides?
[21:25] pieterh have tried on either side, no difference
[21:25] cremes i didn't think HWM had any effect on a SUB socket...?
[21:25] sustrik i meant *both*
[21:25] sustrik not either
[21:26] sustrik cremes: it does
[21:26] sustrik it specifies how many messages can be buffered before 0mq starts dropping them
[21:26] pieterh sustrik: either, both, makes no visible difference
[21:26] sustrik ok, that looks like a buf
[21:27] sustrik bug
[21:27] cremes on the zmq_socket() man page, it says N/A for HWM on a SUB socket
[21:27] sustrik oh
[21:27] sustrik i see
[21:27] pieterh the only thing that seems to work is a long (1 second) sleep in the client loop
[21:27] sustrik the clients are creating new connections all the time
[21:27] cremes sustrik: right
[21:27] pieterh cremes: yeah, I remember that, it's a bug, no?
[21:28] sustrik meaning that the server creates a new buffer each time
[21:28] sustrik each buffer is limited by HWM
[21:28] sustrik but the number of buffers is unlimited
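The point above (each buffer is capped by HWM, but the number of buffers is not) can be illustrated with a toy model. This is only a sketch and not 0MQ code: `ToyServerSocket`, `simulate`, and the numbers used are hypothetical, and it assumes messages over the per-connection limit are simply dropped.

```python
from collections import deque


class ToyServerSocket:
    """Toy model (not 0MQ): one HWM-bounded buffer per connection."""

    def __init__(self, hwm):
        self.hwm = hwm
        self.buffers = {}  # connection id -> deque of unread messages

    def receive(self, conn_id, msg):
        # Every new connection gets its own buffer; nothing limits how many exist.
        buf = self.buffers.setdefault(conn_id, deque())
        if len(buf) >= self.hwm:
            return False  # this connection's buffer is full, message dropped
        buf.append(msg)
        return True

    def queued(self):
        # Total unread messages across all per-connection buffers.
        return sum(len(buf) for buf in self.buffers.values())


def simulate(connections, msgs_per_conn, hwm):
    """Each connection sends msgs_per_conn messages; nothing is ever read."""
    server = ToyServerSocket(hwm)
    for conn_id in range(connections):
        for _ in range(msgs_per_conn):
            server.receive(conn_id, b"x")
    return server.queued()


print(simulate(1, 1000, 100))   # one connection: capped at the HWM
print(simulate(50, 1000, 100))  # fifty connections: fifty times the HWM
```

In this model the total backlog grows linearly with the number of connections, which matches the behaviour described for clients that keep opening new connections.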
[21:29] sustrik there should be MAX_CONNECTIONS socket options...
[21:29] sustrik option*
[21:29] cremes that buffer should be dropped when zmq_close() is called so it should catch up, right?
[21:29] Guthur what is expected to happen if you poll before TCP sockets are fully connected?
[21:29] sustrik cremes: the buffer is dropped on the client side
[21:30] sustrik the server side buffer remains until all the messages are read from it
[21:30] cremes sustrik: i thought zmq_connect() is what created the buffer
[21:30] pieterh Guthur: nothing in particular?
[21:30] cremes ok, right
[21:30] sustrik cremes: yes
[21:30] pieterh sustrik: yes, but are there multiple buffers at the server side?
[21:30] sustrik but the server side buffer remains in place while there are messages in it
[21:31] sustrik yes, one buffer per connection
[21:31] pieterh it's N client-side buffers (that should be destroyed by close + LINGER=0) + 1 server-side buffer
[21:31] Guthur pieterh, I'm getting strange behaviour on POSIX OSs (linux and OSX) with polling with CLRZMQ2
[21:31] pieterh setting HWM on sub socket (server) makes no difference
[21:31] pieterh Guthur: 'strange' = ?
[21:31] sustrik the socket on the server side is never closed
[21:32] sustrik so the buffers remain
[21:32] Guthur pieterh, well if I don't delay the polling ever so slightly it throws an exception
[21:32] pieterh sustrik... where is that 1.4Gb of memory sitting then?
[21:32] Guthur and a users seems to be getting similar problems on OSX
[21:32] Guthur user*
[21:32] sustrik lot of buffers in the server socket
[21:32] sustrik they are gradually being emptied and deallocated
[21:32] Guthur same code works on windows fine though
[21:33] sustrik but client create new buffers even faster
[21:33] Guthur without the delay
[21:33] mikko http://johanharjono.com/archives/633
[21:33] mikko installation instructions missing something?
[21:33] pieterh and HWM is for each buffer independently... not the socket as such
[21:33] pieterh Guthur: no idea, we'd need some test code that reproduces it
[21:33] sustrik yes, HWM is same as SO_SNDBUF and SO_RCVBUF
[21:33] sustrik local
[21:34] sustrik doesn't affect the peer
[21:34] Guthur it's all related to this issue: https://github.com/zeromq/clrzmq2/issues/13
[21:34] pieterh cremes: so what did you not know that led you to think this could work?
[21:35] cremes pieterh: i saw another resource leak and followed it back to the PUB socket
[21:36] Guthur i do notice that if I place it in a try block it also works, I put this down to the fact a try block will possibly delay the polling ever so slightly
[21:36] cremes i'll have to look and see if i am overrunning the SUB socket on the other side like in this example
[21:36] pieterh seems like opening/closing the client sockets each time is the cause
[21:36] sustrik this is a problem i wanted to address for a long time but never quite got to do it
[21:36] pieterh Guthur: I can't really help, have no idea what the exception could be or why
[21:36] sustrik there should be a socket option limiting the max number of concurrent connections
[21:37] peter_NOrth is nial dalton on this IRC ever?
[21:37] pieterh sustrik: anti-DoS protection
[21:37] sustrik exactly
[21:37] pieterh useful, but here we have a problem of documentation IMO
[21:37] pieterh or something
[21:38] sustrik possibly
[21:38] pieterh it's unclear how HWM and LINGER help here
[21:38] pieterh (in fact they don't)
[21:38] sustrik LINGER is irrelevant
[21:39] sustrik because it affects the send side
[21:39] sustrik and the problem is on recv side
[21:39] pieterh yes, but that's not obvious
[21:39] sustrik HWM would help in combination with MAX_CONNECTIONS
[21:39] sustrik MAX_CONNECTIONS * HWM = max number of messages queued
[21:39] pieterh possibly HWM affecting socket rather than each buffer
[21:39] pieterh ah, yes
[21:40] sustrik * MAX_MSG_SIZE = max memory used
[21:40] pieterh ...calculating...
[21:40] pieterh 102523.2231GB
[21:40] pieterh yeah, that'll do
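The bound sketched above is a plain product: connections times HWM gives the maximum number of queued messages, and multiplying by the maximum message size gives the maximum memory. A minimal worked example follows. MAX_CONNECTIONS and MAX_MSG_SIZE are only proposed options in this discussion, and the function name and sample values here are hypothetical.

```python
def max_queued_bytes(max_connections, hwm, max_msg_size):
    """Upper bound on queued bytes: connections * messages each * bytes each."""
    max_messages = max_connections * hwm
    return max_messages * max_msg_size


# Hypothetical limits: 1000 connections, HWM of 10,000 messages, 1 KiB messages.
bound = max_queued_bytes(1000, 10_000, 1024)
print(bound)                    # bytes
print(round(bound / 2**30, 2))  # the same bound in GiB
```

With any one of the three factors left unlimited, the product has no upper bound, which is why all three limits are needed together.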
[21:41] pieterh sustrik: why not add MAX_CONNECTIONS and MAX_MSG_SIZE to the 3.0 roadmap?
[21:41] pieterh they are excellent ideas
[21:41] Guthur pieterh, errno 4 mean anything?
[21:42] pieterh documenting them will perhaps give someone the incentive to go make the patch
[21:42] sustrik it can be added to 2.x
[21:42] NoToes Hi Guthur, I'm "johndeko". So you've managed to reproduce the poll timing issue? If so I won't bother to reproduce it outside of Unity.
[21:42] sustrik no backward compatibility problem
[21:42] pieterh sustrik: sure
[21:42] pieterh we have a 2.2 roadmap page?
[21:42] sustrik nope
[21:42] Guthur NoToes, I think so
[21:42] Guthur very strange one though
[21:43] pieterh sustrik: ok, I'm going to make it, I assume?
[21:43] sustrik why not
[21:43] NoToes Sure is!
[21:43] sustrik no, i'm not
[21:44] sustrik it's either brian granger or minrk
[21:44] Guthur NoToes, a sleep of at least 100 milliseconds before starting to poll and there is no problem
[21:45] Guthur but I don't think that's really what you want to hear
[21:45] pieterh sustrik: ok, done, and I added the socket type renames since there was consensus on that
[21:45] pieterh oh, I can provide a patch for that already :-)
[21:45] sustrik what renames?
[21:46] pieterh :-)
[21:46] pieterh XREP -> ROUTER, XREQ -> DEALER
[21:46] sustrik yuck
[21:46] NoToes Guthur, not really. It doesn't fill me with certainty and makes fast updates impossible.
[21:46] Guthur here it's an interrupted syscall
[21:46] pieterh yeah, you should have said that when it was discussed on zeromq-dev
[21:46] pieterh the absent are always in the wrong ("les absents ont toujours tort")
[21:46] Guthur that's the exception
[21:46] Guthur NoToes, ^
[21:46] sustrik ok, good
[21:47] sustrik i'll add it as an alias
[21:47] pieterh sustrik: thread has title "[0MQ/3.0] discuss: rename XREP to ROUTER"
[21:47] pieterh but we can introduce the name change in 2.2 as we did for PUSH/PULL
[21:48] cremes Ergo^: if the python 0mq interface allows you to send multipart messages, make sure the topic is the first
[21:48] Guthur NoToes, it may be that you only have to do this after first connecting, and then things will be fine unless you have to reconnect again, that's a guess though
[21:48] cremes Ergo^: part and your json-encoded string is the second part
[21:48] cremes Ergo^: don't be overly concerned that the api doesn't have a single call that does everything you want
[21:48] Guthur NoToes, I have not got the sleep in the polling loop, rather just before it, does this work for you?
[21:48] cremes Ergo^: you can build your own convenience method from the methods already present, right?
[21:48] NoToes Guthur, well easy enough for me to test.
[21:49] NoToes Guthur, I'll try it out.
[21:49] Guthur cool
[21:50] Guthur sustrik, any idea why we would get an "Interrupted system call" error when polling too quickly after a TCP socket connection
[21:50] cremes Ergo^: i disagree; i don't think the api should have any explicit method dealing with json
[21:50] cremes Ergo^: why not a different serialization format? what is json's connection to 0mq?
[21:51] cremes Ergo^: i guess i fail to see the problem here; you can easily accomplish what you want with a 3-line method
[21:51] cremes Ergo^: why does it matter that the api doesn't already have it? write it and send in a patch...?
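The convenience method suggested above (topic as the first message part, JSON-encoded string as the second) could look roughly like this. It is a sketch only: `send_json_with_topic` is a hypothetical helper name, and it assumes a socket object exposing `send_multipart`, as pyzmq sockets do. The `FakeSocket` class exists only so the example runs without a 0MQ installation.

```python
import json


def send_json_with_topic(socket, topic, payload):
    """Send the topic as part one and the JSON-encoded payload as part two."""
    parts = [topic.encode("utf-8"), json.dumps(payload).encode("utf-8")]
    socket.send_multipart(parts)


class FakeSocket:
    """Stand-in for a real socket; records what would have been sent."""

    def __init__(self):
        self.sent = []

    def send_multipart(self, parts):
        self.sent.append(parts)


sock = FakeSocket()
send_json_with_topic(sock, "weather", {"temp": 21})
print(sock.sent)
```

Keeping the topic in its own leading part means subscribers can filter on it by prefix without parsing the JSON body.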
[21:51] sustrik Guthur: presumably, there's a signal generated somewhere
[21:52] mikko sustrik: http://build.zero.mq/job/ZeroMQ2-core-master_mingw64/5/console
[21:52] mikko mingw64 cross compile running
[21:52] mikko well, was running
[21:52] sustrik wow, that was quick
[21:52] cremes Ergo^: ok!
[21:53] mikko sustrik: not sure if that is my environment or something else
[21:53] sustrik no windows.h
[21:53] sustrik strange
[21:53] mikko might be something odd with the build i guess
[21:54] mikko ./configure --host=amd64-mingw32msvc --target=mingw64
[21:54] mikko do i need anything else?
[21:55] sustrik no idea
[21:55] NoToes Guthur, no luck with a sleep before the poll loop.
[21:55] sustrik try asking the guy who filed the issue
[21:55] mikko ok, will investigate
[21:55] sustrik he's pretty responsive
[21:56] cremes sustrik: what would you say is holding onto the memory if you saw this callstack? https://gist.github.com/848123
[21:56] sustrik those are messages
[21:56] cremes are they unsent and in a queue?
[21:57] sustrik they are received by I/O thread and waiting to be read by the application
[21:58] cremes sustrik: i don't understand that... it's from a PUB socket, so what is waiting to read it?
[21:58] sustrik sorry?
[21:58] sustrik I/O thread reads messages from TCP connections and buffers them
[21:58] sustrik application reads them
[21:59] cremes that call-tree is for a pub socket that is sending messages
[21:59] cremes i don't understand why you say the i/o thread has received them and is waiting for the application to read them
[21:59] sustrik oops
[21:59] cremes i thought pub was broadcast, fire-and-forget
[21:59] sustrik missed the first line
[21:59] sustrik it is
[22:00] sustrik but there's some reliability built in
[22:00] cremes pieterh: sent you some feedback on that rfc
[22:00] sustrik namely, up to HWM messages are buffered before 0mq starts dropping them
[22:00] pieterh cremes: our email server is dead atm
[22:00] cremes ok, so what are the conditions that will cause pub to hang onto those messages?
[22:00] sustrik by default, HWM=infinite
[22:00] pieterh cremes: could you resend to pieterh@gmail.com, thanks
[22:00] cremes pieterh: that explains why the email bounced!
[22:01] pieterh bounced? that's not nice... rats...
[22:01] NoToes Guthur, adding a System.GC.Collect() instead of a sleep also works.
[22:01] cremes sustrik: ok, so they are in queue because there is a slow subscriber somewhere; is that right?
[22:02] Guthur NoToes, that's even weirder
[22:02] sustrik yes
[22:02] cremes ok
[22:02] sustrik to guard against slow consumers
[22:02] Guthur NoToes, but the sleep did not work for you?
[22:02] NoToes Guthur, doesn't say much if it just takes up some time.
[22:02] sustrik all buffering has to have upper limit
[22:02] cremes and if there are *no* subscribers, it should just drop those messages, yes?
[22:03] sustrik so we need at least 3 options: HWM, MAX_CONNECTIONS, and MAX_SIZE
[22:03] sustrik cremes: yes
[22:03] cremes cool
[22:03] cremes i must have a slow subscriber somewhere.... damn it
[22:03] sustrik well, if you are doing something like the example posted
[22:04] sustrik i.e. publishing at full speed from several apps to a single app
[22:04] sustrik it's just going to blow up
[22:04] peter_NOrth dalton
[22:04] cremes i don't think i have that configuration though... i'll have to dig into this; thanks for your help
[22:05] sustrik you are welcome
[22:08] Guthur NoToes, crumbs, I can not replicate anymore
[22:09] Guthur it's just working now, grrr
[22:11] NoToes Guthur, That's timing bugs for you!
[22:14] pieterh cremes: thanks for the review, made changes
[22:14] pieterh could you send me that email bounce message so I can see the error?
[22:15] cremes pieterh: it wasn't a real bounce; the mail app refused to take a message to sustrik probably because it was too large
[22:15] cremes pieterh: so... never mind!
[22:15] pieterh ok
[22:19] pieterh cremes: could you send me random something to ph@imatix.com?
[22:20] pieterh I've fixed our email server but need to test
[22:20] NoToes Guthur, I missed your message. No, putting a sleep before the loop, instead of in the poll loop, didn't work.
[22:20] cremes pieterh: on its way
[22:20] pieterh zeromq-dev should be working again now
[22:20] pieterh thx!
[22:20] Guthur NoToes, There is no error with you either?
[22:21] NoToes Guthur, no error.
[22:21] NoToes Guthur, zmq_poll just always returns 0.
[22:21] pieterh sustrik: email list is fixed
[22:22] pieterh messages will be coming in slowly as servers retry
[22:22] Guthur NoToes, it seems as if I am getting a slightly different issue then
[22:22] Guthur Mine returns errno 4 if there isn't the slight delay before starting the polling loop
[22:23] Guthur this translates to an "Interrupted system call"
[22:25] NoToes Guthur, OK, different issue then.
[22:26] Guthur which is doubly annoying, hehe
[22:26] NoToes Guthur, I suppose I should try to reproduce this outside of Unity then.
[22:26] Guthur NoToes, that would be helpful, and much appreciated if you could
[22:35] Guthur is there any advisable action an app should take when getting EINTR while polling?
[22:37] Guthur NoToes, I have found that if I catch that EINTR and then continue all is fine
[22:37] Guthur OSX does signal things properly I assume, and unity isn't suppressing them even, or something
[22:38] Guthur I admit we are on the borders of my knowledge here
[22:38] Guthur probably left the country to be honest
[22:40] NoToes Guthur, unfortunately I'm new to OSX as well. Is EINTR a signal or a return code from zmq_recv?
[22:41] Guthur http://api.zeromq.org/master:zmq-poll
[22:41] Guthur EINTR is returned if there is a signal
[22:42] Guthur well not return actually, the errno is set
[22:42] Guthur poll returns -1
[22:42] NoToes Guthur, Ah OK.
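The workaround mentioned above (catch EINTR and continue) amounts to retrying the poll call when it was interrupted by a signal. A minimal sketch follows. It is not the clrzmq2 or libzmq code: `poll_retrying_eintr` and `flaky_poll` are hypothetical names, and the poll call is simulated so the example runs on its own.

```python
import errno


def poll_retrying_eintr(poll_once, max_attempts=10):
    """Call poll_once(); retry if it fails with EINTR, re-raise anything else."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return poll_once()
        except OSError as exc:
            # Only an interrupted call is safe to retry blindly.
            if exc.errno != errno.EINTR or attempts >= max_attempts:
                raise


calls = {"count": 0}


def flaky_poll():
    """Simulated poll: interrupted by a signal twice, then one socket is ready."""
    calls["count"] += 1
    if calls["count"] <= 2:
        raise OSError(errno.EINTR, "Interrupted system call")
    return 1


print(poll_retrying_eintr(flaky_poll))
print(calls["count"])
```

Capping the attempts is a design choice so that a persistent signal storm surfaces as an error instead of spinning forever.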
[22:43] Guthur I think the issue I have here in linux MONO is something I can't really rectify, but is easily worked around
[22:44] Guthur The OSX one is a little less clear
[22:44] Guthur have you tried it outside Unity?
[22:49] NoToes Guthur, I'm trying now...
[22:53] NoToes Guthur, it's working inside Unity now :(
[22:55] Guthur I wonder if it's a MONO issue
[22:56] Guthur but that doesn't really make much sense either, it's only a relatively simple interop call
[23:03] NoToes Guthur, I don't know enough about zmq to make sense of it. Is it possible that there is a shared native buffer referenced by multiple managed objects (or something like that)? Would explain why running the GC helps and the timing issues.
[23:05] Guthur NoToes, I'm looking through now
[23:22] Guthur NoToes, Not seeing anything at the moment
[23:22] pieterh Ergo^_: build using --with-openpgm afair
[23:23] Guthur it's getting late here so i'll probably get my head down soon, sorry we've been unable to get this sorted for you
[23:23] Guthur hopefully we'll get to the bottom of it eventually
[23:23] pieterh :-)
[23:24] NoToes Guthur, Thanks for all your help.
[23:24] Guthur no probs
[23:24] Guthur I might drop by the MONO channel tomorrow and see if I can get an clues
[23:24] Guthur an/any
[23:25] Guthur ok, it's late, night all