Sunday November 21, 2010

[Time] Name Message
[10:14] mikko sustrik:
[10:14] mikko this time with sun studio
[10:14] sustrik mikko: let me see
[10:15] sustrik still the same bug
[10:16] sustrik dhammika seems to understand what's going on
[10:39] mikko sustrik: it's a lot harder to reproduce with debug build
[10:51] sustrik ack
[10:58] mikko
[10:58] mikko backtrace with symbols
[10:59] mikko the crash is in #0 zmq::object_t::get_tid (this=0xb1c088c8) at object.cpp:47
[11:09] sustrik mikko: that's maint?
[11:10] sustrik nope, it's master
[11:11] mikko master
[11:11] mikko ran while [ 1 = 1 ]; do ./test_shutdown_stress ; echo "x"; done until it crashed
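The shell loop above never exits by itself: it keeps re-running the test after a crash until someone interrupts it. Below is a minimal C sketch of a harness that stops at the first failing run and reports which run it was. It assumes a POSIX `system()` and `<sys/wait.h>`. The function name `run_until_failure` is hypothetical and not part of the 0MQ test suite.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run `cmd` through the shell up to `max_runs` times.
   Returns the 1-based number of the first run that failed, or 0 if every
   run succeeded. A run counts as failed if it could not be launched, was
   killed by a signal, or exited non-zero (a shell that does not exec the
   command directly reports a crash as exit status 128 + signal number). */
static int run_until_failure(const char *cmd, int max_runs)
{
    for (int run = 1; run <= max_runs; run++) {
        int status = system(cmd);
        if (status == -1) {
            fprintf(stderr, "run %d: could not start command\n", run);
            return run;
        }
        if (WIFSIGNALED(status)) {
            fprintf(stderr, "run %d: killed by signal %d\n", run, WTERMSIG(status));
            return run;
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) != 0) {
            fprintf(stderr, "run %d: exit status %d\n", run, WEXITSTATUS(status));
            return run;
        }
    }
    return 0;
}
```

Called from a small `main()` as `run_until_failure("./test_shutdown_stress", 1000)`, it would return the number of the first crashing run, and the binary is left in the state it crashed in for inspection.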
[11:13] sustrik hm, looks like some kind of stack overwrite
[11:13] sustrik the error actually emerges in the test program, not 0mq core...
[11:14] sustrik strange
[11:17] sustrik hm, cannot reproduce on my box
[11:20] mikko takes anywhere between 10 and 100 runs here
[11:20] mikko until it crashes
[11:29] sustrik on my box it runs ~30 times
[11:29] sustrik then the test slows down considerably
[11:29] sustrik (too many open sockets?)
[11:29] sustrik anyway, it doesn't fail
[11:35] sustrik mikko: can you possibly check the patch proposed in my last email?
[11:36] sustrik it's probably a different problem, but there's a chance that it's actually a different manifestation of the same bug...
[11:37] sustrik just add the two lines into zmq_engine.cpp
[11:42] mikko line 155?
[11:48] sustrik mikko: yes
[11:49] mikko *** glibc detected *** /tmp/zeromq2/tests/.libs/lt-test_shutdown_stress: corrupted double-linked list: 0xb0b058c8 ***
[11:49] mikko after ~50 runs
[11:50] mikko
[11:50] mikko bt
[11:50] mikko let me do thread apply all bt
[11:51] mikko
[11:52] sustrik hm, probably the same problem
[12:21] rgl are there any plans to properly support a 64 bit build on windows?
[12:26] Guthur rgl: what happens with a 64bit build?
[12:26] Guthur I tried it briefly and it seemed ok?
[12:26] rgl Guthur, windows uses a different model for 64-bit than common linuxes; eg. on windows a long is not 64 bit.
[12:27] rgl Guthur, which means that some code in ZMQ does not build correctly (well, it builds, but with warnings that do not seem safe to ignore)
[12:28] Guthur ok, seems fair enough, I didn't really test it or watch the build closely
[12:29] mikko rgl: are you building with mingw or msvc?
[12:29] rgl the C data model on windows is LLP64 but on linux it's LP64
[12:29] rgl mikko, it does not matter. mingw uses the normal windows data model.
[12:29] mikko rgl: it seems to be a bit different
[12:29] rgl mikko, but, I'm using msvc *G*
[12:29] mikko __int64_t is defined as long long on mingw32
[12:30] rgl lemme put the msvc compiler logs somewhere.
[12:30] mikko rgl: i thought zeromq uses fixed size ints everywhere
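To make the LP64/LLP64 difference concrete, here is a small C sketch that reports which data model a build uses and prints the widths that differ. It assumes only a hosted C99 compiler; the enum and function names are illustrative. Under LP64 (64-bit Linux) `long` is 64 bits, while under LLP64 (64-bit Windows, with MSVC or MinGW) `long` stays 32 bits. The `<stdint.h>` types such as `int64_t`, and `size_t` as a pointer-sized type, have the same width under both 64-bit models, which is why fixed-size ints avoid the problem.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative classification of the data model this code was compiled for. */
enum data_model { DM_ILP32, DM_LP64, DM_LLP64, DM_OTHER };

static enum data_model detect_data_model(void)
{
    if (sizeof(void *) == 4 && sizeof(long) == 4 && sizeof(int) == 4)
        return DM_ILP32;   /* 32-bit Windows and Linux */
    if (sizeof(void *) == 8 && sizeof(long) == 8)
        return DM_LP64;    /* 64-bit Linux and most other Unix systems */
    if (sizeof(void *) == 8 && sizeof(long) == 4 && sizeof(long long) == 8)
        return DM_LLP64;   /* 64-bit Windows */
    return DM_OTHER;
}

/* Print the widths (in bits) of the types that differ between models. */
static void print_type_widths(void)
{
    printf("int       %zu\n", sizeof(int) * CHAR_BIT);
    printf("long      %zu\n", sizeof(long) * CHAR_BIT);
    printf("long long %zu\n", sizeof(long long) * CHAR_BIT);
    printf("void *    %zu\n", sizeof(void *) * CHAR_BIT);
    printf("size_t    %zu\n", sizeof(size_t) * CHAR_BIT);
    printf("int64_t   %zu\n", sizeof(int64_t) * CHAR_BIT);
}
```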
[12:31] Guthur I know in some of my builds i see warnings about casting size_t to unsigned int
[12:31] Guthur not directly related, more in reply to the fixed size ints
[12:32] mikko Guthur: is that zeromq core or binding code?
[12:32] mikko i think i need to do MSVC build at some point
[12:32] Guthur mikko, core
[12:34] rgl
[12:35] mikko rgl: can you try the same build with master?
[12:35] rgl I can. I just have to get the repo.
[12:40] rgl its more-or-less the same
[12:51] Guthur lots of size_t conversions
[12:51] Guthur size_t is very troublesome, especially for bindings
[13:00] rgl can we safely ignore them?
[13:21] mikko size_t to int isn't really a safe conversion
[13:22] mikko it would require bounds checking
[13:23] mikko not sure why the array index isn't size_t there
[13:24] mikko most of those look like pretty safe casts
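The bounds check mikko mentions can be sketched in C as below. The helper name `size_to_int` is hypothetical. A plain `(int)` cast of a `size_t` larger than `INT_MAX` yields an implementation-defined (in practice truncated) value, which is exactly what the MSVC warnings are flagging on 64-bit builds; the checked version refuses the conversion instead.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: convert a size_t to int only if the value fits.
   Returns 0 and stores the result in *out on success, or -1 (leaving *out
   untouched) if the value exceeds INT_MAX and would be truncated. */
static int size_to_int(size_t value, int *out)
{
    if (value > (size_t) INT_MAX)
        return -1;
    *out = (int) value;
    return 0;
}
```

Where the value is known to be small (an array index bounded by a small constant, say), the explicit cast is fine; the check matters for lengths that come from user data.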
[13:37] rgl humm, tried to run remote_thr.exe tcp:// 3000 1234567 and it completely spiked the memory usage to 4G *G*
[13:37] rgl is this supposed to happen when no watermark is set on the socket?
[15:46] Guthur Is size_t really a requirement?
[15:46] Guthur It's very bothersome for bindings, as I said, because it is hard to be sure what size it really is
[15:47] Guthur primitive or explicit types are easier
[15:57] sustrik Guthur: checked POSIX SO_SNDBUF which is kind of similar
[15:57] sustrik and it is indeed "int" rather than "size_t"
[15:58] sustrik on the other hand, int is not a fixed-size integer either
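For comparison with the 0MQ option types, this C sketch shows how POSIX passes `SO_SNDBUF`: the value is an `int` and its length a `socklen_t`, not a `size_t`. It assumes a POSIX sockets API; `probe_sndbuf` is a hypothetical helper. The value read back need not equal the request; Linux, for instance, doubles it to leave room for bookkeeping overhead.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP socket, request an SO_SNDBUF size, read
   back the size the kernel actually applied, and close the socket.
   Returns the effective size in bytes, or -1 on any error. */
static int probe_sndbuf(int requested)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int actual = 0;
    socklen_t len = sizeof(actual);
    int result = -1;

    /* The option value is passed as a plain int in both directions. */
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested)) == 0 &&
        getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) == 0)
        result = actual;

    close(fd);
    return result;
}
```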
[16:42] sustrik rgl: yes, it's supposed to happen
[17:45] Guthur sustrik, I have a bad feeling my attempts at cross-platform (x86/x86-64) support with clrzmq2 are possibly not going to stand up, needs more testing really
[18:04] sustrik Guthur: sure, that's how it goes with cross-platform support
[18:04] sustrik in most cases you cannot do all the work yourself
[18:04] sustrik you have to rely on people who actually own and understand the OSes in question
[18:05] Guthur I'll certainly be keeping an eye out for feedback
[18:27] rgl Guthur, I have one request. Maybe it does not make much sense, but it would be great to choose (at runtime) the version of libzmq to use, e.g. if the OS is 32-bit, use the 32-bit version, otherwise use the 64-bit version.
[18:28] Guthur well as far as clrzmq2 goes, I have used preprocessor directives
[18:29] Guthur It would not be impossible to do it at runtime, not sure of benefits though
[18:31] rgl just more convenient. you just need to drop both versions of the native library and the whole application will work on either 32 or 64 bit.
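One detail of the runtime-selection idea: what matters is the bitness of the running process, not of the OS, because a 32-bit process on 64-bit Windows can only load a 32-bit DLL. A .NET binding would make this check at runtime (for example via `IntPtr.Size`); the same rule is sketched in C below. The DLL file names are hypothetical placeholders, not names that libzmq or clrzmq2 actually ship.

```c
#include <stddef.h>

/* Hypothetical file names: pick the native library that matches the
   pointer width of the current process (8 bytes on x64, 4 on x86). */
static const char *native_zmq_library(void)
{
    return sizeof(void *) == 8 ? "libzmq-x64.dll" : "libzmq-x86.dll";
}
```

The returned name would then be handed to the platform loader (on Windows, `LoadLibraryA`).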
[18:38] Guthur Might as well drop in both clrzmq.dll builds as well then, though
[18:38] Guthur and then have the application decide
[18:39] Guthur that's not generally the approach of Linux, you tend to have either discrete packages or build from source
[18:40] Guthur rgl: Reasonable enough thought, just imho it's something that should be delegated to the application provider
[18:41] rgl ah yes. indeed. it could be delegated to the app. point taken :D
[20:33] mikko mato: is there a reason why on HP UX the CPPFLAGS are overridden completely?
[20:33] mikko CPPFLAGS="-D_POSIX_C_SOURCE=200112L"