ZeroMQ IRC Log

Monday February 28, 2011

[Time] Name	Message
[09:13] pieterh	sustrik: you there?
[09:21] sustrik	pieterh: hi
[09:22] pieterh	I have this feeling as we roll out 2.1.x people will complain about zmq_term
[09:23] sustrik	we can switch LINGER default to 0
[09:23] pieterh	that would make sense IMO
[09:23] pieterh	or if not 0, then 1 second or whatever
[09:23] sustrik	but that would mean that such a simple program as:
[09:23] pieterh	neither 0 nor infinity are sensible defaults IMO
[09:23] sustrik	send();close();term();exit()
[09:24] sustrik	would not send anything
[09:24] pieterh	sensible defaults work for simple programs
[09:24] sustrik	anything larger than 0 and less than infinity does not make sense imo
[09:24] sustrik	such a default is a trap
[09:25] sustrik	it works in low-load envs (test env)
[09:25] sustrik	and breaks when high load hits
[09:25] sustrik	you can try some experimenting with throughput perf test
[09:25] pieterh	possibly, but if you make it hard for people to write simple code, they won't
[09:25] sustrik	if you set linger to say 10
[09:25] pieterh	and it's easy to make 0MQ explain wtf is going on
[09:25] sustrik	it works initially
[09:26] sustrik	unless you send a lot of messages
[09:26] pieterh	e.g. if linger is 1 second, it works, and if there is still unsent data at 1 second, it can say something
[09:26] sustrik	then it breaks mysteriously
[09:26] pieterh	mystery is entirely a design choice here
[09:26] pieterh	point is,
[09:26] pieterh	you're going to get a lot of complaints IMO
[09:26] sustrik	i'd be rather explicit than mysterious
[09:26] sustrik	let's see
[09:26] pieterh	right now it's extremely mysterious
[09:27] sustrik	well, it blocks
[09:27] sustrik	what's mysterious about that?
[09:27] pieterh	i don't even understand the reply you sent to Christian
[09:27] sustrik	deterministic behaviour
[09:27] sustrik	zmq_term() -> ETERM -> zmq_close()
[09:27] pieterh	deterministically useless behaviour
[09:27] sustrik	better than heisenbugs
[09:27] pieterh	erlang binding works around it
[09:28] pieterh	this is not going to end nicely...
[09:28] sustrik	what's the thing with erlang?
[09:28] pieterh	you didn't follow?
[09:29] sustrik	haven't seen anything
[09:29] pieterh	it tracks open sockets and secretly closes them all _before_ calling zmq_term
[09:29] sustrik	where's the discussion?
[09:29] pieterh	because an infinitely blocking system call is so insane
[09:29] pieterh	utterly... pathological
[09:30] sustrik	where's the discussion?
[09:30] pieterh	can't find it immediately
[09:31] pieterh	email is not a database
[09:31] pieterh	as I've pointed out many times, it's useless for finding back stuff
[09:31] pieterh	search for a thread titled "zmq_term() blocks in 2.1"
[09:32] pieterh	"For simplicity, the port driver doesn't handle any threads itself. And it never
[09:32] pieterh	actually calls a zmq library function that could block indefinitely."
[09:32] sustrik	zeromq-dev?
[09:32] pieterh	from chris <csrl@gmx.com>
[09:32] pieterh	yes, of course it's on zeromq-dev
[09:32] pieterh	and you were on that thread
[09:34] sustrik	aha, found it
[09:34] sustrik	that's not about LINGER
[09:34] sustrik	that's about necessity to close the sockets
[09:34] sustrik	same problem you've reported a long time ago
[09:35] pieterh	you mentioned Linger, I just pointed out that zmq_term blocks
[09:35] pieterh	and that people will hit this more and more
[09:36] sustrik	ok, there are several issues involved:
[09:36] sustrik	1. necessity to close sockets
[09:36] sustrik	it would be nice to be able to avoid that
[09:37] sustrik	however, i have no idea of how to do that
[09:37] sustrik	2. linger
[09:37] pieterh	no other classic OS API requires that kind of thing
[09:37] sustrik	3. Ctrl+C
[09:37] sustrik	this one was solved in one go with reaper thread
[09:37] sustrik	you are free to send a patch
[09:37] private_meta	uhm... which example set should I refer to when having multiple clients connect to the same port (in case of tcp), when I need to know which client sends the message and which client to send the message to, because Handling Multiple Sockets only seems to work with dedicated ports, or am I mistaken?
[09:38] pieterh	sustrik: "send a patch" is telling me to jump in the lake, you know that
[09:38] sustrik	well, i have no idea how to avoid 1.
[09:38] pieterh	I'm raising a concern that the "stable 2.1" release will annoy many people and break a lot of code
[09:38] sustrik	so, someone else has to
[09:38] pieterh	this makes it very hard for me to release that with any kind of confidence
[09:39] pieterh	except with a large disclaimer, which may be sufficient
[09:39] pieterh	but...
[09:39] pieterh	not when the explanation is confused
[09:40] sustrik	what should i say? i have no idea how to fix it
[09:40] sustrik	that's it
[09:40] pieterh	sustrik: look, I can document the need to close sockets
[09:40] pieterh	I can document the need to set LINGER even in the most trivial apps
[09:41] pieterh	that's just "oh, it's not so shiny and elegant anymore"
[09:41] pieterh	but if you tell people to call zmq_term before closing sockets, I'm kind of confused
[09:41] sustrik	let's not mix the issues
[09:41] sustrik	are you concerned about 1 or 2?
[09:41] pieterh	this is your breakdown, it's not mine
[09:42] sustrik	are you concerned about both?
[09:42] pieterh	my concern is just "zmq_term blocks in 2.1 and I don't know why"
[09:42] sustrik	ok
[09:42] sustrik	1. i can't solve it
[09:42] pieterh	it's relatively easy to solve in a simple threaded app
[09:42] sustrik	you are free to try
[09:42] pieterh	1. close sockets, 2. set linger = 0, terminate
[09:42] sustrik	2. linger is a problem
[09:42] pieterh	but if this starts to go wrong in multithreaded apps, people will _refuse_ to use 0MQ...
[09:43] pieterh	it's a class 1 fatal "no go" problem that will stop it going into production
[09:43] sustrik	setting linger to some default value
[09:43] pieterh	"sorry, we can't solve it" may be one answer
[09:43] pieterh	but it's a really crappy answer
[09:43] sustrik	means that even a single message won't pass
[09:43] sustrik	given it's sufficiently long and/or network is sufficiently slow
[09:43] pieterh	like I said, there are easy ways to make that work
[09:43] sustrik	?
[09:44] sustrik	sure, do so
[09:44] pieterh	a. use a sensible default linger value
[09:44] pieterh	b. if the app still has unsent messages after that, issue a loud warning
[09:44] sustrik	hm, returning an error from zmq_term()?
[09:44] sustrik	that may work
[09:45] pieterh	no, a loud warning
[09:45] sustrik	what's that?
[09:45] pieterh	printf ("123 messages not sent, please raise ZMQ_LINGER on socket")
[09:45] pieterh	etc.
[09:45] pieterh	something that gets literally printed and sent to logs
[09:45] pieterh	or sent on sys://log if that ever goes live
[09:45] sustrik	that works only when a console is available
[09:45] sustrik	problem on windows
[09:45] pieterh	so, people _need_ consoles on production systems
[09:46] sustrik	but the error would kind of make sense
[09:46] sustrik	let me think about it
[09:46] pieterh	returning an error?
[09:46] sustrik	rc = zmq_term()
[09:46] pieterh	maybe but it just makes the caller responsible again
[09:46] pieterh	yes, it would at least be consistent
[09:46] sustrik	if (rc == EPENDINGMESSAGES)...
[09:46] pieterh	yes
[09:46] pieterh	and then set LINGER to 1 second by default please
[09:47] sustrik	ok, i'm going to think about it
[09:47] pieterh	this simplifies simple cases
[09:47] sustrik	the close() problem remains though
[09:47] pieterh	as for the deadlock issue, it just needs accurate documentation
[09:47] sustrik	ok
[09:47] pieterh	accurate, i.e. precisely what do people have to do to avoid it
[09:48] pieterh	this change to linger would be very good, at least it'll distinguish the deadlock from infinite linger
[09:48] pieterh	that's a headache today, not knowing what's actually going wrong
[09:49] sustrik	it won't distinguish the two cases :(
[09:49] sustrik	it will just timeout the term() after a while
[09:49] sustrik	and allow to restart it
[09:50] pieterh	you would also timeout the deadlock?
[09:50] sustrik	yes
[09:50] pieterh	...
[09:50] pieterh	but in the deadlock case there are zero messages to send
[09:50] sustrik	the deadlock is caused by the handshake
[09:51] sustrik	"tell me whether there are more you've queued"
[09:51] sustrik	"ok, there are no more messages"
[09:51] sustrik	the application thread's part of the handshake is executed in zmq_close() call
[09:51] pieterh	right
[09:52] pieterh	well, we know how many sockets are not responding, right?
[09:52] sustrik	yes
[09:52] pieterh	that's valuable information to report
[09:53] sustrik	yup
[09:53] pieterh	rc = number of unclosed sockets, maybe
[09:54] sustrik	possibly
[09:55] pieterh	you can't use EPENDINGMESSAGES unless you know there are actually messages waiting
[09:55] pieterh	something like ETIMEOUT
[09:55] pieterh	if we can make this work sensibly, IMO 2.1 is ready for the big stage
[09:56] pieterh	coffee, brb
[09:57] sustrik	to be ready for big stage we need subscription forwarding :|
[10:03] pieterh	uhm, no, you just don't need to break every app already running...
[10:04] sustrik	well, it's actually a bugfix
[10:04] sustrik	people complained that messages are dropped on exit
[10:04] sustrik	namely, mato
[10:05] pieterh	well, there's always someone complaining... :-)
[10:06] pieterh	my radar mainly focuses on the dev list
[10:06] sustrik	it has been a common complaint back then
[10:06] pieterh	LINGER per socket is also kind of a strange choice
[10:07] sustrik	it's POSIX
[10:07] pieterh	yes, this was a necessary change, no argument with that
[10:07] pieterh	zmq_term is not POSIX :-)
[10:07] sustrik	zmq_term = OS shutdown
[10:07] pieterh	nope
[10:07] sustrik	yes
[10:08] pieterh	sigh
[10:08] pieterh	then why am I calling "Shutdown OS" in my apps?
[10:08] pieterh	0MQ is _not_ a kernel module
[10:08] pieterh	sorry, this is 2011 and we're on version 2.x.x
[10:08] pieterh	please remain in the present
[10:08] Steve-o	lol
[10:08] pieterh	you may have a vision of where 0MQ will go
[10:08] sustrik	zmq_term is equivalent to OS shutdown
[10:09] sustrik	not OS shutdown itself
[10:09] pieterh	but we are discussing today's code and today's design
[10:09] pieterh	again, I do not call OS shutdown in my apps
[10:09] sustrik	it does the same thing the TCP does with tx buffers on OS shutdown
[10:09] pieterh	please, this analogy is not helpful
[10:09] pieterh	it really is not helpful
[10:10] sustrik	it's what it does
[10:10] pieterh	"Sorry, sir, your app is deadlocking because zmq_term is like OS shutdown"
[10:10] sustrik	shrug
[10:10] sustrik	no point in this discussion
[10:10] pieterh	well, it'll keep coming back
[10:10] sustrik	i'll have a look at the timeout for zmq_term()
[10:10] pieterh	you won't be able to make it work IMO
[10:10] pieterh	because you have LINGER per socket not per context
[10:11] sustrik	they are two different timeouts
[10:11] pieterh	how would you modify the term timeout?
[10:11] pieterh	as a user, I mean
[10:11] sustrik	zmq_term_wait (void *ctx, int timeout);
[10:12] pieterh	so revert the old method to not blocking, and introduce a new one?
[10:12] sustrik	introduce a new one
[10:12] pieterh	+1, gets my vote
[10:13] pieterh	it is totally explicit and leaves 2.0 semantics unchanged
[10:13] sustrik	reverting zmq_term() to immediate would be consistent with 2.0
[10:13] pieterh	ack
[10:13] sustrik	however, 2.1 users may complain
[10:13] sustrik	so it's up to consensus
[10:13] pieterh	HEY EVERYONE!!!
[10:14] pieterh	please ack/nack sustrik's suggestion here...
[10:14] sustrik	something like that
[10:14] sustrik	on mailing list
[10:15] pieterh	yes
[10:15] pieterh	it's a major topic, would you raise it then?
[10:16] sustrik	i have to think about the whole thing first
[10:17] pieterh	ok
[11:14] private_meta	hmm
[11:14] private_meta	uhm... which example set should I refer to when having multiple clients connect to the same port (in case of tcp), when I need to know which client sends the message and which client to send the message to, because Handling Multiple Sockets only seems to work with dedicated ports, or am I mistaken?
[11:15] pieterh	private_meta: Chapter 3 of the Guide
[11:16] pieterh	various routing based on using XREP / ROUTER socket and identities of peers
[12:14] stimpie	Does anyone have measurements or explanations on performance on a connection per thread/cpu versus a single connection per system with a dispatcher to each thread?
[12:18] ianbarber	threads vs events? that's a big argument :) the http://www.kegel.com/c10k.html c10k page is a good overview, not 0MQ specific
[12:25] stimpie	that's an interesting read but not exactly what I'm thinking about, I have a system with x cores and x threads, messages from other systems need to arrive at those threads. I can create a socket for each thread or create 1 socket from where messages are dispatched to each thread.
[12:26] stimpie	With one socket each physical device has only one address and other systems do not have to take the number of threads per system into account
[12:27] stimpie	Each thread a socket appears faster to me but requires more 'global' knowledge, (messages should be duplicated across physical machines)
[12:29] ianbarber	yeah, i see
[12:30] pieterh	stimpie: it's not really an either/or choice IMO
[12:30] pieterh	on the one hand you need a frontend able to poll your 10K sockets
[12:30] pieterh	but you usually also need a bunch of threads to do the real work
[12:30] pieterh	however it is pathological to create one thread per socket
[12:31] pieterh	see asyncsrv example in Chapter 3 of the guide
[12:32] ianbarber	pieterh: i think he's asking about one thread per core basically, with one tcp socket per thread, or a device on tcp with inproc/ipc to the other threads
[12:32] stimpie	ianbarber, those should have been my words ;-)
[12:33] pieterh	well, you want one thread per core for threads that do real work
[12:33] ianbarber	definitely
[12:33] pieterh	however, that does not map to TCP connections
[12:34] pieterh	not "one tcp socket per thread", nope
[12:34] pieterh	that would be an anti-pattern in 0MQ
[12:35] ianbarber	i think, tbh, that a forwarder type device would be fine, they're pretty quick. If you did want to have a TCP listener per core, then you could have them check in to a name service, and have your clients query the name service
[12:36] ianbarber	though I would have each of those TCP listeners be a separate process
[12:37] stimpie	So you think the overhead of the forwarder (dispatcher) would not be a negative impact?
[12:38] stimpie	Best way to find out is to benchmark I guess
[12:39] pieterh	stimpie: best way is to benchmark, try any device and see how it performs
[12:39] ianbarber	yeah
[12:40] stimpie	I will do, thanks for your thoughts
[12:40] pieterh	stimpie: the pattern I'd recommend is:
[12:40] pieterh	n clients, connecting as usual to a queue
[12:41] pieterh	m workers, where m is much smaller than n
[12:41] pieterh	queue talking over inproc to workers
[12:41] pieterh	total number of threads on the server is m + 1
[12:41] pieterh	if m is too large, you will lose time in context switching
[12:42] pieterh	sorry, total number of app threads on server is m + 1, there is also at least 1 I/O thread
[12:42] pieterh	so optimal value for m is (total cores on server box) - 2
[12:43] pieterh	assuming you can dedicate a whole multicore box to your server app
[12:43] pieterh	this would be for CPU-limited workers, it's different if they are I/O bound
[12:45] ianbarber	make sure to benchmark with as realistic conditions etc. as you can - it can be easy to benchmark with (say) much smaller messages than you'd normally use, and see a different performance character
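A rough sketch of the sizing rule pieterh describes above (m workers, one queue thread, at least one I/O thread), written as plain C with no 0MQ calls. The function names are hypothetical, and clamping to a minimum of one worker is an assumption added here for small machines. As noted in the discussion, this only applies to CPU-bound workers on a dedicated box.

```c
/* Hypothetical helpers illustrating the "cores - 2" rule of thumb. */

/* Workers for a CPU-bound server on a dedicated box: total cores minus one
   core for the queue thread and one for the 0MQ I/O thread. Clamping to at
   least 1 is an assumption added here so small machines still get a worker. */
int worker_thread_count(int total_cores)
{
    int m = total_cores - 2;
    return m < 1 ? 1 : m;
}

/* Application threads on the server: m workers plus the queue thread. */
int app_thread_count(int workers)
{
    return workers + 1;
}
```

On an 8-core box this gives 6 workers and 7 application threads. Benchmarking under realistic load should still decide the final number.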
[12:59] Guthur	pieterh: The new projects page seems a little similar to the labs page, imo
[12:59] pieterh	Guthur: yes, it's meant to overlap
[12:59] pieterh	this projects page is a temporary place to collect community projects
[13:00] pieterh	that is, projects we consider part of the 0MQ community and want to expose to potential contributors
[13:00] Guthur	ok, and then what is labs?
[13:00] pieterh	the Labs page goes a bit further and also doesn't really expose the core projects
[13:00] pieterh	so my idea with the projects business is to show these on the main community page
[13:00] pieterh	similarly as we do for the bindings
[13:01] Guthur	ok, so I suppose they should have a reasonable level of maturity
[13:01] pieterh	not necessarily but they should be tight extensions of 0MQ
[13:01] pieterh	rather than apps which use it
[13:01] pieterh	e.g. I'd consider zguide a project but not mongrel2
[13:02] Guthur	oh ok, that clears it up
[13:02] pieterh	ideally all these projects would gravitate towards the same workflow, core community of contributors, infrastructure, etc.
[13:02] pieterh	like the bindings
[13:03] pieterh	I had this vision of making it into a dashboard like this: http://extensions.wdeditor.com/
[13:03] pieterh	that's based on my design
[13:03] pieterh	but it'd have to be red/black/white of course :-)
[13:04] Guthur	of course, hehe
[13:05] pieterh	so you come to the community site and see a whole bunch of projects, each with a name/person/graphic
[13:05] pieterh	I guess we're moving towards that very slowly
[13:06] Guthur	so for an example, where would a implementation of the FIXT 1.1 (Transport Independent) protocol using ZeroMQ as the transport lie
[13:06] pieterh	it's really up to the owner
[13:06] pieterh	it's a choice: move it into the 0MQ community or keep it separate
[13:07] Guthur	ok
[13:07] pieterh	if, for example, there were several such bridges, it would be great to see them as 0MQ projects
[13:08] pieterh	let me give another example
[13:09] pieterh	I'm working on Whaleshark (http://zero.mq/ws)
[13:09] pieterh	which depends on a bunch of other 0MQ layers
[13:09] pieterh	like a name service, security service, etc.
[13:09] pieterh	it could be fun to also include FIXT support
[13:09] pieterh	so if the FIXT layer was aimed at 0MQ apps like Whaleshark, it's a natural 0MQ project
[13:10] pieterh	but if it's aimed at FIXT apps, it's not
[13:10] Guthur	FIXT seemed like a nice place to start with FIX and 0MQ, due to its transport independent spec
[13:10] Guthur	ok i understand
[13:11] pieterh	acid test would be, do you discuss project X here and on zeromq-dev, or on some other forum
[13:14] Guthur	would it be possible to offer commercial support for such projects via a corporate entity, similar to how imatix is mentioned for whaleshark?
[13:14] pieterh	of course
[13:15] pieterh	that's why there's a 'website' column
[13:15] pieterh	you'd probably not be able to use the zeromq.org domain without iMatix agreeing
[13:16] Guthur	that's reasonable
[13:46] sustrik	pieterh: it seems there's a problem with the mailing list
[13:46] sustrik	i've sent an email
[13:46] sustrik	it hasn't appeared
[13:46] pieterh	hmm, ok, let me restart the server...
[13:48] pieterh	rebooting, it'll take a minute or so
[13:48] pieterh	there's a service (spam filter afair) which gets confused now and then
[13:59] pieterh	sustrik: didn't help, I'm contacting Ewen
[14:33] sustrik	thx
[14:41] Seta00	I need an example that uses polling on a sub socket :/
[14:42] pieterh	Seta00: poll works the same on all socket types
[14:42] Seta00	well then I need an example that uses polling
[14:42] pieterh	there are lots in the Guide
[14:43] Seta00	kk I'll check
[14:53] pieterh	sustrik: I've put a note on the community page, this sucks, sorry
[15:55] travlr	pieterh: just had to mention how much i appreciate the work you did with the online reference... much much nicer to work with... very thorough too! thanks.
[15:56] pieterh	travlr: you mean the new API site?
[15:56] travlr	yes
[15:56] pieterh	np :-) it was fun to make
[15:56] travlr	cool. thanks again for all
[15:56] pieterh	we needed to cover older/newer versions anyhow
[15:57] travlr	yes, very smooth and easy to work with
[16:51] private_meta	Does the router in a router-to-dealer-relationship know when a dealer connects, even if it didn't send a message yet? Meaning, can I as a user of the router know that?
[16:53] pieterh	private_meta: not when it connects, but if it sends a message, yes
[16:54] pieterh	any router-to-anything depends on the anything sending something to the router first
[16:54] private_meta	kk...
[16:55] private_meta	pieterh: so that no messages are lost in a router-dealer-relationship the router must wait for the first message to arrive
[16:56] private_meta	well, sounds logical now that i write it
[16:56] pieterh	yes
[16:56] pieterh	the router needs to know an address to send to
[16:56] pieterh	that only comes with an input message
[16:56] pieterh	unless (a) you pass the identities some other way
[16:56] pieterh	or (b) you use durable sockets
[16:56] private_meta	I'm in need of logon messages anyway
[16:57] pieterh	and router is like pub: if there's no recipient, the message is not queued, it's dropped immediately
[16:58] private_meta	pieterh: I seem to have overlooked that in the docs, but what happens to a dealer trying to connect to a non-existent router, and how does the dealer know?
[16:59] pieterh	it doesn't know unless it expects a reply and doesn't get one
[16:59] pieterh	actually I'm writing this up now for Ch4
[17:00] private_meta	so there is no such thing as "unknown host" or other error messages that I could get?
[17:00] pieterh	nope
[17:01] pieterh	note that tcp:// is a disconnected protocol... the host might be away at lunch and back in 2 hours, 0MQ will wait
[17:01] pieterh	inproc:// will tell you if it can't connect
[17:01] private_meta	Did you do that so you have an abstraction of any protocols?
[17:01] private_meta	oh
[17:01] pieterh	it's just more useful like that, for most apps
[17:02] private_meta	I'm not quite sure how to implement a timeout to wait for that :/
[17:02] pieterh	it's documented... hang on...
[17:02] pieterh	ah, sorry, not yet pushed :-)
[17:03] private_meta	huh=
[17:03] private_meta	*huh?
[17:03] pieterh	if you can wait a little while...
[17:03] private_meta	define little while
[17:04] private_meta	for some people, a week might be a little while, for others a little while is an hour :D
[17:05] private_meta	As far as I figured, you use durable sockets where you have a fixed name whenever you reconnect (more or less), but also the router discards messages that are sent to a target it doesn't know. So if a router sends a message to a durable socket that is not yet connected, are these messages also discarded?
[17:06] pieterh	durable sockets cannot be "not yet connected"
[17:06] pieterh	a durable socket may be "temporarily away for lunch"
[17:07] pieterh	i've no idea what a router socket does with durable sockets but I imagine it queues messages for them
[17:07] pieterh	that would be consistent with PUB, but it's not documented afaik
[17:07] private_meta	kk, so a computer where the durable socket is located on which, let's say, reboots, is "away for lunch" for the router?
[17:07] private_meta	-which
[17:07] pieterh	the whole business of "XREP discards and does not queue messages it can't route" is not documented
[17:08] private_meta	kk
[17:24] pieterh	private_meta: ok, http://zguide.zeromq.org/page:all#toc67
[17:28] private_meta	sweet
[17:29] private_meta	pieterh: So the initial timeout is oc pretty much the first heartbeat not coming through I assume?
[17:30] pieterh	it's not quite that simple
[17:30] private_meta	how so?
[17:30] pieterh	you need a clock for the poll, should be the lowest heartbeat interval
[17:30] pieterh	if you use the same heartbeat for all peers, that value
[17:30] pieterh	then you need to allow for 2-3 lost heartbeats before declaring a 'disconnected peer'
[17:31] private_meta	Yes, seems like a good thing to allow for single lost messages.
[17:32] private_meta	Uhm... a "lost heartbeat" would be, in your case, a certain heartbeat not receiving a reply, wouldn't it? Isn't 0mq build so, if the client decides to connect one day, all those "lost" heartbeats would be sent?
[17:32] private_meta	*built
[17:33] pieterh	heartbeats don't get replies
[17:33] pieterh	they are asynchronous in both directions
[17:33] private_meta	ah yeah
[17:33] private_meta	sorry, true
[17:33] pieterh	please read the code and the docs...
[17:33] private_meta	I will
[17:33] private_meta	sorry for asking prematurely :)
[17:36] pieterh	np, if there's anything unclear or missing in the text, let me know
[17:36] pieterh	it's a first draft and raw
[17:40] private_meta	pieterh: to get it straight, you would use one zmq_poll call with infinite timeout for message transfer and one with heartbeat timeout to send heartbeat messages?
[17:40] pieterh	i don't think that's what the examples do
[17:40] private_meta	You mean the pirate example?
[17:41] pieterh	any of them
[17:41] private_meta	Okay, I'll look at that one
[17:41] pieterh	it's tempting to do heartbeating via a second socket
[17:41] pieterh	this is a bad idea for two or three reasons
[17:41] pieterh	which I'll document
[17:44] pieterh	"First, if you're sending data you don't need to send heartbeats. Second, sockets may, due to network vagaries, become jammed. You need to know when your main data socket is silent because it's dead, rather than just not busy, so you need heartbeats on that socket. Lastly, two sockets is more complex than one."
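The heartbeat bookkeeping discussed above can be sketched in plain C, independent of any 0MQ call. This is only an illustration under stated assumptions: the type peer_t, the helper names, the 1000 ms interval and the liveness of 3 are all hypothetical, and the caller is expected to supply the current time in milliseconds (for instance from its poll loop). It covers the two points made in the discussion: a peer is declared disconnected only after several silent intervals, and any outgoing traffic on the data socket counts as a heartbeat.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values: one heartbeat per second, and a peer is considered
   gone after three silent intervals. */
#define HEARTBEAT_INTERVAL_MS 1000
#define HEARTBEAT_LIVENESS    3

/* Hypothetical per-peer state, tracked on the same socket as the data. */
typedef struct {
    int64_t last_seen_ms;   /* last time anything arrived from the peer */
    int64_t last_sent_ms;   /* last time anything was sent to the peer */
} peer_t;

void peer_init(peer_t *peer, int64_t now_ms)
{
    peer->last_seen_ms = now_ms;
    peer->last_sent_ms = now_ms;
}

/* Any incoming message, data or heartbeat, proves the peer is alive. */
void peer_on_receive(peer_t *peer, int64_t now_ms)
{
    peer->last_seen_ms = now_ms;
}

/* Any outgoing message, data or heartbeat, resets the send timer. */
void peer_on_send(peer_t *peer, int64_t now_ms)
{
    peer->last_sent_ms = now_ms;
}

/* Alive until HEARTBEAT_LIVENESS full intervals pass with no input. */
bool peer_is_alive(const peer_t *peer, int64_t now_ms)
{
    return now_ms - peer->last_seen_ms
         < (int64_t) HEARTBEAT_INTERVAL_MS * HEARTBEAT_LIVENESS;
}

/* A heartbeat is only due if nothing was sent for a whole interval. */
bool peer_heartbeat_due(const peer_t *peer, int64_t now_ms)
{
    return now_ms - peer->last_sent_ms >= HEARTBEAT_INTERVAL_MS;
}
```

In a poll loop, the poll timeout would be the heartbeat interval. After each wakeup the application would call peer_on_receive for any input, send a heartbeat when peer_heartbeat_due returns true, and treat the peer as disconnected once peer_is_alive returns false.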
[17:54] cremes	is there a C FORWARDER device in the zguide anywhere? i can't seem to find one and I'd like one for testing
[18:02] pieterh	cremes, afaik the msgqueue example will work if you use PUB and SUB
[18:02] pieterh	a forwarder just reads and writes two sockets
[18:02] cremes	pieterh: ok, i'll try it
[18:03] pieterh	sorry, msgqueue just calls the built-in device, that's not what you want, is it
[18:03] pieterh	you want the actual core, poll / recv / send?
[18:03] cremes	no, i just want something that will subscribe to everything and publish out the other side
[18:04] cremes	the built in device is probably okay then, yes?
[18:04] pieterh	yes
[18:04] pieterh	it's the same code for all three devices
[18:04] pieterh	the only differences are the bind/connect directions and socket types
[18:09] zedas	pieterh: what?! not even http://mulltedb.org :-)
[18:10] zedas	pieterh: or i mean http://mulletdb.org/ :-)
[18:10] cremes	pieterh: looks like i don't need it; i have isolated another slow leaker with PUB sockets
[18:10] pieterh	zedas: uhm, what's the question?
[18:10] pieterh	cremes: really, and it's not even Friday yet?
[18:11] cremes	:)
[18:11] cremes	well, i need to verify one or two more things.... but yeah
[18:12] pieterh	zedas: you mean for the 0MQ projects list?
[18:12] pieterh	and it's mulletdb.com, :-)
[18:13] zedas	damn, see i don't even care about that project.
[18:13] zedas	pieterh: yeah i was joking about "projects"
[18:14] pieterh	yeah, the love shows
[18:14] pieterh	tokyo cabinet seems useful
[18:14] pieterh	not so sure about that zeromq stuff you are so keen about
[18:23] cremes	false alarm on that leak... i was calling setsockopt(LINGER) after zmq_connect()
[18:23] cremes	i guess it doesn't honor it after the socket has been bound/connected
[18:23] cremes	or is that a bug?
[18:24] cremes	nope, not a bug according to the man page
[18:41] sp4ke	Hi
[18:42] sp4ke	can anyone help me setting up zeromq with my project on Visual Studio 2010
[18:42] sp4ke	i get unresolved external symbols when i build projects
[18:42] sp4ke	i built the libzmq project and added the path to the directory on my project dependencies
[18:52] sustrik	the libs are in libs subdir
[18:52] sustrik	iirc
[18:53] sp4ke	in the libs subdir i've got only a libzmq.dll and libzmq.ilk
[18:53] sp4ke	how can i add these files as dependencies in VS ?
[18:54] sp4ke	i mean other than specify the path in the Library Directories which i did
[18:55] sustrik	there should be libzmq.lib iirc
[18:55] sustrik	you should link that with your project
[18:58] sp4ke	ok thanx, i found a discussion in the irc archive; it's a common problem to not get the .lib, the answer should be there
[19:29] pieterh	cremes: you can set LINGER at any time before close, afaics
[19:30] cremes	the docs say otherwise: "Caution: All options, with the exception of subscription strings, only take effect for subsequent socket bind/connects."
[19:30] cremes	that's from the zmq_setsockopt man page
[19:30] cremes	i don't think it's lying... my testing appears to bear this out
[19:32] pieterh	i've been using LINGER in examples to stop zmq_term blocking, and I use it just before close
[19:32] pieterh	something to clarify...
[19:32] cremes	indeed
[19:33] pieterh	example like https://github.com/imatix/zguide/blob/master/examples/C/lpclient.c
[19:59] mikko	sigh
[20:05] Guthur	cremes pieterh: that was my update
[20:06] pieterh	Guthur: yeah, but is it accurate?
[20:06] Guthur	sustrik mentioned that all options should be set before connect
[20:06] mikko	Guthur: not all
[20:06] mikko	zmq_subscribe can be set afterwards
[20:07] Guthur	mikko, yeah besides that
[20:07] pieterh	mikko: that's what the text says :-)
[20:07] pieterh	Guthur: it should IMO say "ZMQ_SUBSCRIBE" rather than "subscription strings" but that's minor
[20:08] pieterh	ZMQ_SUBSCRIBE, ZMQ_UNSUBSCRIBE, ZMQ_LINGER can afaik be set at any time
[20:09] pieterh	not sure about ZMQ_RECONNECT_IVL
[20:09] Guthur	ok, I can post another update patch
[20:09] Guthur	if that's ok
[20:09] pieterh	we need El Sustrik's formal confirmation with an "are you sure", IMO
[20:10] pieterh	I made an issue: https://github.com/zeromq/zeromq2/issues/173
[20:10] Guthur	hehe, yep that's a very sensible idea
[20:10] pieterh	there are a couple of fuzzy areas that cropped up
[20:20] pieterh	omg, I'm reinventing AMQP for Ch4... :-/
[20:21] pieterh	please shoot me now before this goes too far
[20:24] Guthur	at some point someone is bound to say 'It would be nice if core had this'
[20:24] Guthur	and then that will be the end
[20:24] pieterh	nah, it's all just user-space patterns
[20:25] pieterh	the key IMO is not even software, but documented protocols
[20:25] Guthur	is AMQP poorly documented?
[20:25] Guthur	I am not very familiar with it to be honest
[20:26] pieterh	hmm, depends on the version of AMQP, there are quite a few
[20:26] pieterh	on this page http://www.amqp.org/confluence/display/AMQP/AMQP+Specification
[20:26] pieterh	only AMQP/0-8 and AMQP/0-9-1 are properly documented
[20:27] pieterh	0-9 and 0-10 don't even have dates in the document... very shoddy work
[20:27] pieterh	every version is incompatible with every other version
[20:27] pieterh	oh, don't get me started :-)
[20:28] Guthur	I don't think i'll delve into it too deeply
[20:28] Guthur	I've enough on my plate without getting lost in AMQP
[20:28] pieterh	:-)
[20:51] cremes	pieterh: can you confirm this leaks memory on your system? https://gist.github.com/848007
[20:51] cremes	if so, i'll open a ticket and attach it
[20:51] sustrik	it's only SUBSCRIBE and UNSUBSCRIBE that affect the connection after it is established
[20:52] cremes	sustrik: i think i might have found another leak with PUB
[20:52] sustrik	yes?
[20:52] cremes	see this gist: https://gist.github.com/848007
[20:52] pieterh	cremes: nope
[20:52] cremes	if someone can confirm it leaks on their system, i'll open a ticket
[20:52] pieterh	it does not leak
[20:52] pieterh	it does consume 300% CPU
[20:53] pieterh	but memory usage is stable: "7867 ph 20 0 198m 1904 1148 S 312 0.0 1:09.50 leaker6 "
[20:53] cremes	hrmm...
[20:53] pieterh	sustrik: I've tested LINGER and it definitely works after the connection is established
[20:54] sustrik	aaaah
[20:54] sustrik	i recall something like that dimly
[20:54] sustrik	let me check the code
[20:54] pieterh	Ergo^: are you on the latest 0MQ?
[20:56] pieterh	Ergo^: check the release notes, Ctrl-C was fixed but I don't recall exactly what version
[20:59] cremes	pieterh: ah! make a small change to that code and it will leak like a sieve
[20:59] cremes	change the number of client threads it spawns to something greater than 1
[20:59] pieterh	cremes... put the 'free' into comments?
[20:59] pieterh	ah, will try
[20:59] pieterh	Ergo^: did you read the Guide yet?
[20:59] cremes	i think it's a race condition bug
[20:59] pieterh	cremes: I'll spend 10 minutes on that, would you spend 10 minutes reviewing http://rfc.zeromq.org/spec:7?
[21:00] cremes	my pleasure
[21:00] pieterh	Ergo^: until you've read at least Ch1 and Ch2, you're kind of in RTFM mode here
[21:01] sustrik	ack: LINGER is socket-wide
[21:01] sustrik	not connection-wide
[21:02] pieterh	cremes: I hereby name this ship the "Leaky and Nasty"
[21:02] pieterh	7993 ph 20 0 1853m 1.4g 1148 S 382 17.6 2:42.12 leaker6
[21:02] cremes	huzzah!
[21:02] pieterh	That's 1.4g of memory in about 30 seconds
[21:02] pieterh	with 10 client threads
[21:02] cremes	i can email you guys a call-tree backtrace if that is helpful to you
[21:02] cremes	yeah, same thing happens on my box
[21:03] pieterh	i love it when people send beautiful C code that reproduces problems...
[21:03] cremes	btw, it doesn't leak as fast when the LINGER line is uncommented but it still leaks rapidly
[21:10] sustrik	what unit is s_clock() in?
[21:10] cremes	milliseconds
[21:11] mikko	success!
[21:12] sustrik	cremes: ok, what about the cpu usage?
[21:12] mikko	i managed to create pure shell-script that executes zeromq build and sends results over http to jenkins
[21:12] sustrik	a peak followed by flat line?
[21:12] pieterh	mikko: nice!
[21:12] cremes	sustrik: let me take a look
[21:13] mikko	also, on the other news. i am bringing up powerpc (debian 6.0) build slave soon(ish)
[21:13] cremes	sustrik: did you update the code to use 2+ client threads? i see cpu spike and stay there
[21:13] sustrik	mikko: btw, i've had a discussion with a guy who has problems building 0mq under mingw-win64
[21:14] cremes	sustrik: reload that gist if you like; i updated it to create 5 client threads which more readily show the leak
[21:14] mikko	sustrik: what is the problem?
[21:14] mikko	using mingw64?
[21:14] sustrik	order of includes, presumably
[21:14] sustrik	https://github.com/zeromq/zeromq2/issues/#issue/60
[21:15] sustrik	i just thought it could possibly make sense to add that to builds
[21:15] sustrik	cremes: ok, so it's processing something
[21:15] mikko	sustrik: the current cluster is 32bit hardware
[21:15] sustrik	that definitely looks like a bug
[21:15] mikko	that's slightly problematic
[21:16] mikko	would need a win64 box (i presume)
[21:16] sustrik	ah, i thought it's a cross-compile
[21:16] mikko	or does the cross-compile work on 32bit?
[21:16] sustrik	never mind
[21:16] sustrik	no idea
[21:16] sustrik	check the issue
[21:16] mikko	can't do 'make check' without win64
[21:16] mikko	i can add build
[21:16] sustrik	mikko: spot on
[21:16] sustrik	i forgot about the tests
[21:17] cremes	sustrik: yes, i agree; i changed the publish interval to 500ms and cpu remains high
[21:17] cremes	sustrik: whatever it is processing, it's stuck
[21:17] sustrik	right
[21:17] cremes	sustrik: i can send you the call-tree for the code that is allocating (and holding onto) all of this memory if that's helpful
[21:17] pieterh	cremes: I think I see the problem
[21:17] sustrik	yes, please
[21:18] pieterh	the client is never pausing for breath
[21:18] sustrik	it's not, but it's time-limited
[21:18] pieterh	server can't keep up
[21:18] sustrik	so it should send for 200ms
[21:18] pieterh	let me set a HWM and do small sleep in the client after closing a socket...
[21:18] sustrik	then stop
[21:18] pieterh	the clock in the client has no purpose at all afaics
[21:20] cremes	ok, so a small sleep inside the publish loop fixes it
[21:20] cremes	but shouldn't it just drop those messages if they are in queue and undelivered?
[21:20] cremes	LINGER = 0 in this case
[21:21] pieterh	cremes: if I sleep 1 second after each publish burst, client memory usage is flat
[21:21] pieterh	they are sent to publisher before you close the socket
[21:21] pieterh	the memory consumption is in the server queue
[21:22] cremes	hmmm, i can believe that
[21:22] sustrik	2 producers are definitely going to overload one consumer
[21:22] pieterh	hmm, indeed, I set 10k HWM on server socket, still runs out of memory
[21:22] sustrik	you have to set HWM to make excess messages be dropped
[21:22] pieterh	setting 10K HWM on client socket AND sleeping in between bursts, it's ok
[21:23] sustrik	what about HWM on both sender and receiver?
[21:23] pieterh	cremes: ah...
[21:23] pieterh	LINGER is only executed at zmq_term time!
[21:24] sustrik	zmq_close() time, to be precise
[21:24] pieterh	bleh, you're right, and doing init/term in the loop makes no difference
[21:25] pieterh	cremes: you always find the weird cases... :-)
[21:25] sustrik	have you tried with HWM on both sides?
[21:25] pieterh	have tried on either side, no difference
[21:25] cremes	i didn't think HWM had any effect on a SUB socket...?
[21:25] sustrik	i meant both
[21:25] sustrik	not either
[21:26] sustrik	cremes: it does
[21:26] sustrik	it specifies how many messages can be buffered before 0mq starts dropping them
[21:26] pieterh	sustrik: either, both, makes no visible difference
[21:26] sustrik	ok, that looks like a buf
[21:27] sustrik	bug
[21:27] cremes	on the zmq_socket() man page, it says N/A for HWM on a SUB socket
[21:27] sustrik	oh
[21:27] sustrik	i see
[21:27] pieterh	the only thing that seems to work is a long (1 second) sleep in the client loop
[21:27] sustrik	the clients are creating new connections all the time
[21:27] cremes	sustrik: right
[21:27] pieterh	cremes: yeah, I remember that, it's a bug, no?
[21:28] sustrik	meaning that the server creates a new buffer each time
[21:28] sustrik	each buffer is limited by HWM
[21:28] sustrik	but the number of buffers is unlimited
[21:29] sustrik	there should be MAX_CONNECTIONS socket options...
[21:29] sustrik	option*
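The point sustrik makes above (each buffer is capped by HWM, but the number of buffers is not) can be illustrated with a toy model. This is only a sketch in plain Python, not 0MQ code: `ToyServer`, its methods, and the numbers are hypothetical names chosen for illustration.

```python
from collections import deque

HWM = 1000  # assumed per-connection high-water mark, in messages


class ToyServer:
    """Toy model: one HWM-bounded queue per connection, unlimited connections."""

    def __init__(self, hwm):
        self.hwm = hwm
        self.buffers = {}  # connection id -> queue of unread messages

    def receive(self, conn_id, msg):
        # A new connection always gets a fresh buffer.
        buf = self.buffers.setdefault(conn_id, deque())
        if len(buf) < self.hwm:
            buf.append(msg)
            return True
        return False  # over HWM for this connection: message is dropped

    def queued(self):
        # Total messages held across all per-connection buffers.
        return sum(len(buf) for buf in self.buffers.values())


server = ToyServer(HWM)
for conn_id in range(50):      # the client reconnects 50 times
    for _ in range(2000):      # and bursts 2000 messages each time
        server.receive(conn_id, b"x")

print(len(server.buffers), server.queued())
```

Each queue stops at HWM, yet the total keeps growing with every new connection, which is the behaviour a connection limit would bound.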
[21:29] cremes	that buffer should be dropped when zmq_close() is called so it should catch up, right?
[21:29] Guthur	what is expected to happen if you poll before TCP sockets are fully connected?
[21:29] sustrik	cremes: the buffer is dropped on the client side
[21:30] sustrik	the server side buffer remains until all the messages are read from it
[21:30] cremes	sustrik: i thought zmq_connect() is what created the buffer
[21:30] pieterh	Guthur: nothing in particular?
[21:30] cremes	ok, right
[21:30] sustrik	cremes: yes
[21:30] pieterh	sustrik: yes, but are there multiple buffers at the server side?
[21:30] sustrik	but the server side buffer remains in place while there are messages in it
[21:31] sustrik	yes, one buffer per connection
[21:31] pieterh	it's N client-side buffers (that should be destroyed by close + LINGER=0) + 1 server-side buffer
[21:31] Guthur	pieterh, I'm getting strange behaviour on POSIX OSs (linux and OSX) with polling with CLRZMQ2
[21:31] pieterh	setting HWM on sub socket (server) makes no difference
[21:31] pieterh	Guthur: 'strange' = ?
[21:31] sustrik	the socket on the server side is never closed
[21:32] sustrik	so the buffers remain
[21:32] Guthur	pieterh, well if I don't delay the polling ever so slightly it throws an exception
[21:32] pieterh	sustrik... where is that 1.4Gb of memory sitting then?
[21:32] Guthur	and a users seems to be getting similar problems on OSX
[21:32] Guthur	user*
[21:32] sustrik	lot of buffers in the server socket
[21:32] sustrik	they are gradually being emptied and deallocated
[21:32] Guthur	same code works on windows fine though
[21:33] sustrik	but clients create new buffers even faster
[21:33] Guthur	without the delay
[21:33] mikko	http://johanharjono.com/archives/633
[21:33] mikko	installation instructions missing something?
[21:33] pieterh	and HWM is for each buffer independently... not the socket as such
[21:33] pieterh	Guthur: no idea, we'd need some test code that reproduces it
[21:33] sustrik	yes, HWM is same as SO_SNDBUF and SO_RCVBUF
[21:33] sustrik	local
[21:34] sustrik	doesn't affect the peer
[21:34] Guthur	it's all related to this issue: https://github.com/zeromq/clrzmq2/issues/13
[21:34] pieterh	cremes: so what did you not know that led you to think this could work?
[21:35] cremes	pieterh: i saw another resource leak and followed it back to the PUB socket
[21:36] Guthur	i do notice that if I place it in a try block it also works, I put this down to the fact a try block will possibly delay the polling ever so slightly
[21:36] cremes	i'll have to look and see if i am overrunning the SUB socket on the other side like in this example
[21:36] pieterh	seems like opening/closing the client sockets each time is the cause
[21:36] sustrik	this is a problem i wanted to address for a long time but never quite got to do it
[21:36] pieterh	Guthur: I can't really help, have no idea what the exception could be or why
[21:36] sustrik	there should be a socket option limiting the max number of concurrent connections
[21:37] peter_NOrth	is nial dalton on this IRC ever?
[21:37] pieterh	sustrik: anti-DoS protection
[21:37] sustrik	exactly
[21:37] pieterh	useful, but here we have a problem of documentation IMO
[21:37] pieterh	or something
[21:38] sustrik	possibly
[21:38] pieterh	it's unclear how HWM and LINGER help here
[21:38] pieterh	(in fact they don't)
[21:38] sustrik	LINGER is irrelevant
[21:39] sustrik	because it affects the send side
[21:39] sustrik	and the problem is on recv side
[21:39] pieterh	yes, but that's not obvious
[21:39] sustrik	HWM would help in combination with MAX_CONNECTIONS
[21:39] sustrik	MAX_CONNECTIONS * HWM = max number of messages queued
[21:39] pieterh	possibly HWM affecting socket rather than each buffer
[21:39] pieterh	ah, yes
[21:40] sustrik	* MAX_MSG_SIZE = max memory used
[21:40] pieterh	...calculating...
[21:40] pieterh	102523.2231GB
[21:40] pieterh	yeah, that'll do
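The bound sustrik describes is a plain product. A minimal sketch of the arithmetic follows, assuming the proposed options existed; MAX_CONNECTIONS and MAX_MSG_SIZE are only suggestions in this discussion, and the function name and sample values are hypothetical.

```python
def max_queued_bytes(max_connections, hwm, max_msg_size):
    """Worst-case bytes buffered: connections * messages per buffer * bytes per message."""
    return max_connections * hwm * max_msg_size


# Hypothetical limits: 100 connections, HWM of 10,000 messages, 1 KiB messages.
bound = max_queued_bytes(100, 10_000, 1024)
print(f"{bound} bytes = {bound / 2**30:.2f} GiB")
```

With all three limits finite the worst case is finite; with any one of them unbounded (HWM is infinite by default in 2.x) there is no ceiling.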
[21:41] pieterh	sustrik: why not add MAX_CONNECTIONS and MAX_MSG_SIZE to the 3.0 roadmap?
[21:41] pieterh	they are excellent ideas
[21:41] Guthur	pieterh, errno 4 mean anything?
[21:42] pieterh	documenting them will perhaps give someone the incentive to go make the patch
[21:42] sustrik	it can be added to 2.x
[21:42] NoToes	Hi Guthur, I'm "johndeko". So you've managed to reproduce the poll timing issue? If so I won't bother to reproduce it outside of Unity.
[21:42] sustrik	no backward compatibility problem
[21:42] pieterh	sustrik: sure
[21:42] pieterh	we have a 2.2 roadmap page?
[21:42] sustrik	nope
[21:42] Guthur	NoToes, I think so
[21:42] Guthur	very strange one though
[21:43] pieterh	sustrik: ok, I'm going to make it, I assume?
[21:43] sustrik	why not
[21:43] NoToes	Sure is!
[21:43] sustrik	no, i'm not
[21:44] sustrik	it's either brian granger or minrk
[21:44] Guthur	NoToes, a sleep of at least 100 milliseconds before starting to poll and there is no problem
[21:45] Guthur	but I don't think that's really what you want to hear
[21:45] pieterh	sustrik: ok, done, and I added the socket type renames since there was consensus on that
[21:45] pieterh	oh, I can provide a patch for that already :-)
[21:45] sustrik	what renames?
[21:46] pieterh	:-)
[21:46] pieterh	XREP -> ROUTER, XREQ -> DEALER
[21:46] sustrik	yuck
[21:46] NoToes	Guthur, not really. It doesn't fill me with certainty and makes fast updates impossible.
[21:46] Guthur	here it's an interrupted syscall
[21:46] pieterh	yeah, you should have said that when it was discussed on zeromq-dev
[21:46] pieterh	the absent are always in the wrong (les absents ont toujours tort)
[21:46] Guthur	that's the exception
[21:46] Guthur	NoToes, ^
[21:46] sustrik	ok, good
[21:47] sustrik	i'll add it as an alias
[21:47] pieterh	sustrik: thread has title "[0MQ/3.0] discuss: rename XREP to ROUTER"
[21:47] pieterh	but we can introduce the name change in 2.2 as we did for PUSH/PULL
[21:48] cremes	Ergo^: if the python 0mq interface allows you to send multipart messages, make sure the topic is the first
[21:48] Guthur	NoToes, it may be that you only have to do this after first connecting, and then things will be fine unless you have to reconnect again, that's a guess though
[21:48] cremes	Ergo^: part and your json-encoded string is the second part
[21:48] cremes	Ergo^: don't be overly concerned that the api doesn't have a single call that does everything you want
[21:48] Guthur	NoToes, I have not got the sleep in the polling loop, rather just before it, does this work for you?
[21:48] cremes	Ergo^: you can build your own convenience method from the methods already present, right?
[21:48] NoToes	Guthur, well easy enough for me to test.
[21:49] NoToes	Guthur, I'll try it out.
[21:49] Guthur	cool
[21:50] Guthur	sustrik, any idea why we would get an "Interrupted system call" error when polling too quickly after a TCP socket connection?
[21:50] cremes	Ergo^: i disagree; i don't think the api should have any explicit method dealing with json
[21:50] cremes	Ergo^: why not a different serialization format? what is json's connection to 0mq?
[21:51] cremes	Ergo^: i guess i fail to see the problem here; you can easily accomplish what you want with a 3-line method
[21:51] cremes	Ergo^: why does it matter that the api doesn't already have it? write it and send in a patch...?
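The convenience method cremes has in mind could look roughly like this. It is a sketch, assuming the binding's socket object exposes a `send_multipart(frames)` method (pyzmq does); the helper name `send_json_with_topic` is hypothetical.

```python
import json


def send_json_with_topic(socket, topic, obj):
    """Send a two-part message: the topic frame first, the JSON body second."""
    frames = [topic.encode("utf-8"), json.dumps(obj).encode("utf-8")]
    socket.send_multipart(frames)
```

Putting the topic in the first part lets SUB-side prefix matching work on the topic alone, while the body can use any serialization format.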
[21:51] sustrik	Guthur: presumably, there's a signal generated somewhere
[21:52] mikko	sustrik: http://build.zero.mq/job/ZeroMQ2-core-master_mingw64/5/console
[21:52] mikko	mingw64 cross compile running
[21:52] mikko	well, was running
[21:52] sustrik	wow, that was quick
[21:52] cremes	Ergo^: ok!
[21:53] mikko	sustrik: not sure if that is my environment or something else
[21:53] sustrik	no windows.h
[21:53] sustrik	strange
[21:53] mikko	might be something odd with the build i guess
[21:54] mikko	./configure --host=amd64-mingw32msvc --target=mingw64
[21:54] mikko	do i need anything else?
[21:55] sustrik	no idea
[21:55] NoToes	Guthur, no luck with a sleep before the poll loop.
[21:55] sustrik	try asking the guy who filed the issue
[21:55] mikko	ok, will investigate
[21:55] sustrik	he's pretty responsive
[21:56] cremes	sustrik: what would you say is holding onto the memory if you saw this callstack? https://gist.github.com/848123
[21:56] sustrik	those are messages
[21:56] cremes	are they unsent and in a queue?
[21:57] sustrik	they are received by I/O thread and waiting to be read by the application
[21:58] cremes	sustrik: i don't understand that... it's from a PUB socket, so what is waiting to read it?
[21:58] sustrik	sorry?
[21:58] sustrik	I/O thread reads messages from TCP connections and buffers them
[21:58] sustrik	application reads them
[21:59] cremes	that call-tree is for a pub socket that is sending messages
[21:59] cremes	i don't understand why you say the i/o thread has received them and is waiting for the application to read them
[21:59] sustrik	oops
[21:59] cremes	i thought pub was broadcast, fire-and-forget
[21:59] sustrik	missed the first line
[21:59] sustrik	it is
[22:00] sustrik	but there's some reliability built in
[22:00] cremes	pieterh: sent you some feedback on that rfc
[22:00] sustrik	namely, up to HWM messages are buffered before 0mq starts dropping them
[22:00] pieterh	cremes: our email server is dead atm
[22:00] cremes	ok, so what are the conditions that will cause pub to hang onto those messages?
[22:00] sustrik	by default, HWM=infinite
[22:00] pieterh	cremes: could you resend to pieterh@gmail.com, thanks
[22:00] cremes	pieterh: that explains why the email bounced!
[22:01] pieterh	bounced? that's not nice... rats...
[22:01] NoToes	Guthur, adding a System.GC.Collect() instead of a sleep also works.
[22:01] cremes	sustrik: ok, so they are in queue because there is a slow subscriber somewhere; is that right?
[22:02] Guthur	NoToes, that's even weirder
[22:02] sustrik	yes
[22:02] cremes	ok
[22:02] sustrik	to guard against slow consumers
[22:02] Guthur	NoToes, but the sleep did not work for you?
[22:02] NoToes	Guthur, doesn't say much if just takes up some time.
[22:02] sustrik	all buffering has to have upper limit
[22:02] cremes	and if there are no subscribers, it should just drop those messages, yes?
[22:03] sustrik	so we need at least 3 options: HWM, MAX_CONNECTIONS, and MAX_SIZE
[22:03] sustrik	cremes: yes
[22:03] cremes	cool
[22:03] cremes	i must have a slow subscriber somewhere.... damn it
[22:03] sustrik	well, if you are doing something like the example posted
[22:03] sustrik	i.e. publishing at full speed from several apps to a single app
[22:04] sustrik	it's just going to blow up
[22:04] peter_NOrth	dalton
[22:04] cremes	i don't think i have that configuration though... i'll have to dig into this; thanks for your help
[22:05] sustrik	you are welcome
[22:08] Guthur	NoToes, crumbs, I can not replicate anymore
[22:09] Guthur	it's just working now, grrr
[22:11] NoToes	Guthur, That's timing bugs for you!
[22:14] pieterh	cremes: thanks for the review, made changes
[22:14] pieterh	could you send me that email bounce message so I can see the error?
[22:15] cremes	pieterh: it wasn't a real bounce; the mail app refused to take a message to sustrik probably because it was too large
[22:15] cremes	pieterh: so... never mind!
[22:15] pieterh	ok
[22:19] pieterh	cremes: could you send me random something to ph@imatix.com?
[22:20] pieterh	I've fixed our email server but need to test
[22:20] NoToes	Guthur, I missed your message. No, putting a sleep before the loop, instead of in the poll loop, didn't work.
[22:20] cremes	pieterh: on its way
[22:20] pieterh	zeromq-dev should be working again now
[22:20] pieterh	thx!
[22:20] Guthur	NoToes, There is no error with you either?
[22:21] NoToes	Guthur, no error.
[22:21] NoToes	Guthur, zmq_poll just always returns 0.
[22:21] pieterh	sustrik: email list is fixed
[22:22] pieterh	messages will be coming in slowly as servers retry
[22:22] Guthur	NoToes, it seems as if I am getting a slightly different issue then
[22:22] Guthur	Mine returns errno 4, if there isn't the slight delay before starting the polling loop
[22:23] Guthur	this translates to an "Interrupted system call"
[22:25] NoToes	Guthur, OK, different issue then.
[22:26] Guthur	which is doubly annoying, hehe
[22:26] NoToes	Guthur, I suppose I should try to reproduce this outside of Unity then.
[22:26] Guthur	NoToes, that would be helpful, and much appreciated if you could
[22:35] Guthur	is there any advisable action an app should take when getting EINTR while polling?
[22:37] Guthur	NoToes, I have found that if I catch that EINTR and then continue all is fine
[22:37] Guthur	OSX does signal things properly I assume, and unity isn't suppressing them even, or something
[22:38] Guthur	I admit we are on the borders of my knowledge here
[22:38] Guthur	probably left the country to be honest
[22:40] NoToes	Guthur, unfortunately I'm new to OSX as well. Is EINTR a signal or a return code from zmq_recv?
[22:41] Guthur	http://api.zeromq.org/master:zmq-poll
[22:41] Guthur	EINTR is returned if there is a signal
[22:42] Guthur	well not return actually, the errno is set
[22:42] Guthur	poll returns -1
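The workaround Guthur mentions (catch EINTR and carry on) is commonly written as a retry loop. The following is a sketch only: `retry_on_eintr` and `poll_fn` are hypothetical names, and it assumes the binding reports an interrupted poll as an `OSError` whose `errno` is `EINTR`, which may not hold for every 0MQ binding.

```python
import errno


def retry_on_eintr(poll_fn, *args, max_retries=10):
    """Call poll_fn(*args), retrying when it is interrupted by a signal."""
    for _ in range(max_retries):
        try:
            return poll_fn(*args)
        except OSError as exc:
            if exc.errno != errno.EINTR:
                raise  # a real error, not a signal interruption
    # Give up after max_retries interruptions rather than spin forever.
    raise OSError(errno.EINTR, "poll interrupted too many times")
```

The retry cap is a design choice for the sketch; an application that expects frequent signals might instead recompute the remaining timeout and loop until it expires.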
[22:42] NoToes	Guthur, Ah OK.
[22:43] Guthur	I think the issue I have here in linux MONO is something I can't really rectify, but is easily worked around
[22:44] Guthur	The OSX one is a little less clear
[22:44] Guthur	have you tried it outside Unity?
[22:49] NoToes	Guthur, I'm trying now...
[22:53] NoToes	Guthur, it's working inside Unity now :(
[22:55] Guthur	I wonder if it's a MONO issue
[22:56] Guthur	but that doesn't really make much sense either, it's only a relatively simple interop call
[23:03] NoToes	Guthur, I don't know enough about zmq to make sense of it. Is it possible that there is a shared native buffer referenced by multiple managed objects (or something like that)? Would explain why running the GC helps and the timing issues.
[23:05] Guthur	NoToes, I'm looking through now
[23:22] Guthur	NoToes, Not seeing anything at the moment
[23:22] pieterh	Ergo^_: build using --with-openpgm afair
[23:23] Guthur	it's getting late here so i'll probably get my head down soon, sorry we've been unable to get this sorted for you
[23:23] Guthur	hopefully we'll get to the bottom of it eventually
[23:23] pieterh	:-)
[23:24] NoToes	Guthur, Thanks for all your help.
[23:24] Guthur	no probs
[23:24] Guthur	I might drop by the MONO channel tomorrow and see if I can get an clues
[23:24] Guthur	an/any
[23:25] Guthur	ok, it's late, night all