Saturday February 12, 2011

[Time] Name Message
[02:25] andrewvc chuck
[02:25] andrewvc err
[02:25] andrewvc cremes: around?
[02:25] andrewvc I think I found the bug, it's in ffi-rzmq I believe
[06:41] yrashk under certain circumstances I am getting some weird segfault in zmq_recv (I use it in an Erlang NIF, and it works fine until I create a so called 'release' for my erlang app)
[06:41] yrashk
[06:41] yrashk peer_identity = 0x870e5bfa <Address 0x870e5bfa out of bounds>
[06:41] yrashk looks suspicious
[06:41] yrashk any ideas what to do to figure out the source of the problem
[06:41] yrashk happens only in that aforementioned setting and only on osx
[06:41] yrashk works fine on linux
[06:45] yrashk sustrik: ^^^
[06:51] yrashk and yes, this is 2.1.0 and master
[07:31] sustrik yrashk: aren't you using same socket from multiple threads?
[07:32] yrashk well, all sockets are created in thread #1
[07:32] yrashk then I use push socket from a thread #1
[07:32] yrashk and a pull socket from thread #2
[07:33] yrashk will that result in an undefined behaviour?
[07:35] sustrik if passing of the socket from #1 to #2 is well synchronised, then it should be ok
[07:36] yrashk Each ØMQ socket belonging to a particular context may only be used by the thread that created it using zmq_socket().
[07:37] yrashk I might have missed this
[07:37] yrashk so the documentation says I have to use the socket from the thread that created it?
[07:37] sustrik with 2.1 the restriction is alleviated
[07:37] yrashk k
[07:37] sustrik you *can* pass sockets between threads
[07:38] sustrik but you can't access a socket from 2 threads in parallel
[07:38] yrashk I am never touching that pull socket from the 1st thread
[07:38] yrashk except for its initialization
[07:39] sustrik how do you pass the socket to the other thread?
[07:40] yrashk it just lives in a class instance that 1st thread creates
[07:42] yrashk after that the 1st thread creates a 2nd thread and passes that object as an argument to a thread function
[07:42] sustrik ok, i see
[07:42] sustrik that should be ok
[07:43] yrashk is there anything else that might result in that segfault? specifically on OSX
[07:43] yrashk as Linux build seems to work just fine
[07:43] yrashk at least I am yet to see a single crash in the same scenario
[07:43] yrashk while OSX build crashes 80-90% of the time
[07:44] sustrik well, what you have shown me looks like a socket has been closed while still being used
[07:44] sustrik either from another thread
[07:44] sustrik or from a single thread this way: zmq_close(x);zmq_recv(x,...);
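The close-then-recv pattern sustrik describes can be made harder to hit by accident. What follows is only a minimal sketch: `GuardedSocket` is a hypothetical wrapper that is not part of the ZeroMQ API, and it holds a string in place of a real socket handle so that it runs without libzmq. It shows one way to have a receive after close fail cleanly instead of touching freed state.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical wrapper (not a ZeroMQ type): it remembers whether close()
// was called so that a later recv() is refused instead of using a dead handle.
class GuardedSocket {
public:
    // 'pending' stands in for a message waiting on a real socket.
    explicit GuardedSocket(std::string pending)
        : pending_(std::move(pending)), open_(true) {}

    void close() { open_ = false; }

    // Copies the pending message into 'out' and returns true,
    // or returns false and leaves 'out' untouched if the socket is closed.
    bool recv(std::string& out) const {
        if (!open_) return false;
        out = pending_;
        return true;
    }

private:
    std::string pending_;
    bool open_;
};

// Usage check: recv works before close() and is refused afterwards.
void check_guarded_socket() {
    GuardedSocket sock("hello");
    std::string msg;
    assert(sock.recv(msg) && msg == "hello");
    sock.close();
    assert(!sock.recv(msg));
}
```

A guard like this only catches the single-threaded case; a close from another thread would still need external synchronisation.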
[07:45] sustrik do you have a minimal test case?
[07:45] yrashk not yet
[07:45] yrashk the setting is pretty complicated
[07:45] sustrik ok
[07:45] yrashk as it works just fine when not packaged as an erlang app release
[07:47] yrashk I am just trying to think of any possible explanation of this segfault
[07:48] sustrik library mismatch?
[07:48] sustrik the recv that fails, is it the first recv called?
[07:50] yrashk nope
[07:50] yrashk normally anywhere from like 4 calls to about, say, 20
[07:52] sustrik memory overwrite then?
[07:52] sustrik it's just guesswork
[07:53] yrashk well, maybe
[07:54] yrashk but I am not quite sure how this can happen
[07:54] yrashk and only when the app is packaged
[07:54] sustrik it's a segfault, right?
[07:55] sustrik does it print out the address it segfaults at?
[07:56] yrashk it is
[07:57] yrashk
[07:58] sustrik i don't see the notification about the segfault there
[07:58] sustrik i mean what address it tries to access that is out of bounds?
[07:59] yrashk peer_identity = 0x870e5bfa <Address 0x870e5bfa out of bounds>
[07:59] yrashk I guess this is it
[08:00] sustrik nope
[08:00] sustrik it prints "segfault" somewhere
[08:00] sustrik is there an address mentioned there?
[08:01] yrashk it didn't print any address
[08:02] yrashk anyway I think you led me into something
[08:04] yrashk thanks a lot!
[08:07] yrashk I really appreciate your help, sustrik -- I think I got it
[08:08] sustrik what's the problem?
[08:09] yrashk apparently due to the hack I used to ensure a NIF module is loaded, I was calling the initialization of that NIF twice
[08:09] yrashk and it rewrote the context
[08:09] yrashk facepalm
[08:09] yrashk absolute facepalm
[08:10] yrashk well I didn't check that it actually overwrote the context
[08:10] yrashk but it is fairly trivial to guess that from the code
[08:10] yrashk because it's my code
[08:10] yrashk :]
[08:12] sustrik :)
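The failure yrashk describes, an init routine that runs twice and replaces a context other code is still using, can be avoided by making initialization idempotent. The sketch below is an assumption about how that could look, not yrashk's actual NIF code: `FakeContext`, `create_context` and `module_init` are hypothetical names, and no ZeroMQ or Erlang calls are made. It relies on C++11 function-local statics, which are initialized exactly once.

```cpp
#include <cassert>

// Hypothetical stand-in for the real context handle; nothing here
// calls into ZeroMQ.
struct FakeContext {
    int generation;  // which creation produced this context
};

static int g_contexts_created = 0;

// Creates a new context and counts how often that has happened.
static FakeContext* create_context() {
    ++g_contexts_created;
    return new FakeContext{g_contexts_created};
}

// Safe to call repeatedly: the function-local static is initialized only
// on the first call, so later calls return the same context instead of
// overwriting it. The context is deliberately never freed in this sketch.
FakeContext* module_init() {
    static FakeContext* const ctx = create_context();
    assert(ctx != nullptr);
    return ctx;
}

int contexts_created() { return g_contexts_created; }

// Usage check: two init calls yield one context.
void check_module_init() {
    FakeContext* first = module_init();
    FakeContext* second = module_init();
    assert(first == second);
    assert(contexts_created() == 1);
}
```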
[10:00] pieterh hi guys
[10:00] pieterh seems our email server was out of action for a while
[10:00] pieterh it looks like stuff was queued but not being sent out
[10:01] pieterh this would have affected the zeromq-dev list presumably
[10:01] pieterh anyhow, I rebooted the beast and emails are now slowly appearing
[13:57] pieterh Anyone hitting "Successful WSASTARTUP not yet performed (c:\work\src\zeromq2\src\mailbox.cpp:263)" on Windows
[13:57] pieterh ?
[14:18] Guthur pieterh, never seen such an error myself
[14:18] Guthur which windows version?
[14:21] pieterh Guthur: I'm running WinXP, this hits at zmq_init()
[14:21] pieterh Seems new to 2.1.0
[14:22] Guthur I've been running 2.1.0 ok recently, WinXP as well
[14:22] Guthur sorry, that's not really helpful for you though
[14:23] pieterh I'll check if zmq is doing WSAStartup or not...
[14:24] Guthur i'm looking at IOCP today for IPC on win
[14:24] Guthur not sure if I will have it ready this weekend though, it's not very well documented
[14:27] pieterh getting ipc: to work on win32 would be great
[14:28] pieterh for some reason I'm getting zmq calling make_socketpair before doing WSAStartup... strange
[14:37] sustrik pieterh: what version are you using?
[14:37] pieterh latest from github
[14:37] pieterh stepping through, it definitely tries to create a mailbox socket pair before doing WSAStartup
[14:38] pieterh C++ is a joy to understand
[14:38] sustrik how come?
[14:38] sustrik see ctx.cpp
[14:38] pieterh I am staring at it :-)
[14:38] sustrik ctx_t constructor
[14:38] sustrik line 36
[14:38] pieterh any specific line no?
[14:39] sustrik the very first thing done is WSAStartup
[14:39] pieterh well, line 36 is a blank line here
[14:39] pieterh yes, the very first thing it does is WSAStartup
[14:40] sustrik that's zmq_init() implementation
[14:40] pieterh ok, when I debug it step by step, I get...
[14:40] pieterh (hang on, it'll take me a second...)
[14:41] Guthur cool, just got feedback of some users of zmq2 and clrzmq2 being used as core tech, sweet
[14:42] Guthur good to be getting feedback on some field tests
[14:42] pieterh sustrik: array, vector, mutex_t, vector, mailbox constructors before it does first line of ctx constructor
[14:42] pieterh that is, after calling ctx constructor from zmq_init...
[14:42] sustrik wait a sec, checking...
[14:42] pieterh Guthur: saw that on twitter... nice
[14:43] sustrik ok, got it
[14:43] sustrik let me fix it
[14:43] pieterh excellent, can you explain what it's doing?
[14:43] Guthur pieterh, they already caught a couple of bugs, so paying dividends already
[14:43] Guthur bugs in clrzmq2
[14:44] pieterh It's nice to have users :-)
[14:44] sustrik constructors of embedded objects are called *before* the constructor body of the main object
[14:44] pieterh sustrik: ah, and mailbox is embedded I guess
[14:44] sustrik ctx_t has a member called term_mailbox
[14:44] pieterh right
[14:44] sustrik right
[14:45] pieterh you could move the WSAStartup code to zmq_init
[14:45] sustrik yes, i should
[14:45] pieterh well, let me try that, test it, submit a patch
[14:45] sustrik goodo
[14:45] pieterh it'll take me 3 minutes...
[14:46] sustrik also, to retain symmetry, move WSACleanup to zmq_term()
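The ordering problem discussed above can be reproduced without any ZeroMQ or Winsock code. In the sketch below every type is a hypothetical stand-in: `Mailbox` plays the role of mailbox_t, `BrokenContext` and `FixedContext` the role of ctx_t, and the string "platform startup" stands in for WSAStartup. It shows that a member's constructor runs before the enclosing constructor's body, and that doing the startup before constructing the object restores the intended order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which the steps run.
std::vector<std::string>& trace() {
    static std::vector<std::string> events;
    return events;
}

// Stand-in for mailbox_t: in the real code its constructor already
// needs the platform layer to be initialized.
struct Mailbox {
    Mailbox() { trace().push_back("mailbox constructor"); }
};

// Startup inside the constructor body: too late, because the member
// term_mailbox has been constructed by the time the body runs.
struct BrokenContext {
    Mailbox term_mailbox;
    BrokenContext() { trace().push_back("platform startup"); }
};

// Same layout, but the constructor no longer performs the startup.
struct FixedContext {
    Mailbox term_mailbox;
    FixedContext() {}
};

std::vector<std::string> broken_order() {
    trace().clear();
    BrokenContext ctx;
    return trace();
}

// The caller performs the startup first, as an init function would.
std::vector<std::string> fixed_order() {
    trace().clear();
    trace().push_back("platform startup");
    FixedContext ctx;
    return trace();
}

// Usage check: the member is built first unless startup is hoisted out.
void check_construction_order() {
    assert(broken_order().front() == "mailbox constructor");
    assert(fixed_order().front() == "platform startup");
}
```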
[14:50] zchrish In C++, should a context ever be introduced in a place other than in main()?
[14:52] zchrish I am trying to design a thread management strategy to incorporate some sort of error management. Will a context ever become corrupted?
[14:59] pieterh sustrik: ok, fixed and tested, sending patch now
[15:04] pieterh zchrish: you can create a context anywhere you like but two threads that want to communicate via inproc: must share the same context
[15:04] pieterh so the natural place is usually where you create child threads, which is usually main()
[15:05] pieterh and no, a context will not become corrupted unless your application overwrites memory erroneously
[15:05] zchrish OK; thanks.
[15:17] sustrik pieterh: please, sign-off the patch
[15:17] sustrik (commit -s)
[15:40] Guthur sustrik, he has left
[15:54] sustrik ah, missed that, thanks
[15:55] Guthur sustrik, question about the IOCP integration...
[15:55] sustrik sure
[15:55] sustrik go on
[15:55] Guthur will the zmq engine be able to call PostQueuedCompletionStatus on socket recvs and sends
[15:56] sustrik ?
[15:56] Guthur i could be missing something but that seems to be how IOCP works, it's just a means of syncing stuff
[15:57] Guthur so the polling object calls GetQueuedCompletionStatusEx to get any signalled events
[15:58] Guthur but i'm having trouble seeing where these would get signalled from
[15:58] sustrik i would say the named pipe would signal it
[15:58] sustrik you don't need to do that yourself
[15:58] Guthur that's what i was initially thinking too
[15:59] Guthur i'll dig a bit more
[15:59] Guthur the documentation is useless
[15:59] Guthur MSDN is crap
[16:01] sustrik any examples out there?
[16:02] Guthur some stuff, I have more code here, but they do seem to be calling PostQueuedCompletionStatus explicitly
[16:02] Guthur they are passing custom overlapped structs with event details
[16:03] Guthur I have some server code here, i'll look through that
[16:04] sustrik Have you seen this:
[16:04] sustrik
[16:06] Guthur I hadn't seen that
[16:59] Guthur I think I'm just going to have to throw some code together and experiment
[17:23] sustrik maybe discussing it at some windows forum may give you some insight into different technologies
[17:23] Guthur sustrik, way ahead. hehe
[17:23] Guthur talking to someone on #winapi
[17:23] sustrik :)
[17:23] Guthur it is indeed possible to get events automatically from pipes via IOCP
[17:23] Guthur it's just a little confusing
[17:32] sustrik i see
[17:36] CIA-21 zeromq2: Pieter Hintjens master * r14a0e14 / (src/ctx.cpp src/zmq.cpp):
[17:36] CIA-21 zeromq2: Fixed win32 issue with WSAStartup
[17:36] CIA-21 zeromq2: - ctx constructor was calling mailbox_t constructor implicitly
[17:36] CIA-21 zeromq2: - moved WSAStartup and WSACleanup to be outside constructor/destructor
[17:36] CIA-21 zeromq2: Signed-off-by: Pieter Hintjens <> -
[19:01] eut does anyone use the lua zmq bindings? i'm having some trouble with the nonblocking recv
[19:02] eut it seems as though i can never receive a message
[19:05] cremes eut: what kind of sockets are you using in your test?
[19:05] eut xrep/xreq
[19:09] eut ah, never mind...
[19:09] cremes ok
[19:09] eut it looks like zmq buffers outgoing messages, sending several all at once
[19:10] eut so sometimes i would quit listening before it finally sent
[19:10] cremes i believe it has an internal timer on its I/O thread so that messages are coalesced and sent
[19:10] cremes kind of like nagle's algorithm
[19:10] eut ok i see
[19:11] eut is there a way to influence that internal timer (or whatever)?
[19:18] cremes eut: don't know... plus, i may be wrong on that
[19:18] cremes ask the mailing list; be sure to include a pointer to your code just in case it's a different issue
[19:19] eut ok
[19:35] zedas hey can anyone point me at the docs on how 2.1.0 does graceful shutdown? apparently there's a change where sockets will LINGER or not?
[19:37] cremes zedas:
[19:38] cremes and
[19:38] cremes
[19:38] cremes by default, it will "linger" forever until all packets are flushed
[19:47] Guthur ah, i think i'm making progress
[19:47] Guthur i'll not have the IOCP in ZMQ tonight, but I think i might be able to get it in soon enough
[19:48] Guthur I have it working in a small test client server app
[20:03] zedas cremes: ok thanks, i've gotta get 2.1.0 working with mongrel2 and this is the only thing that's broken right now.