[Time] Name | Message |
[10:14] mikko
|
sustrik: http://build.valokuva.org/job/ZeroMQ2-core-master_SunStudio/70/console
|
[10:14] mikko
|
this time with sun studio
|
[10:14] sustrik
|
mikko: let me see
|
[10:15] sustrik
|
still the same bug
|
[10:16] sustrik
|
dhammika seems to understand what's going on
|
[10:39] mikko
|
sustrik: it's a lot harder to reproduce with debug build
|
[10:51] sustrik
|
ack
|
[10:58] mikko
|
https://gist.github.com/6d2c9eb0b3efc2afd9a8
|
[10:58] mikko
|
backtrace with symbols
|
[10:59] mikko
|
the crash is in #0 zmq::object_t::get_tid (this=0xb1c088c8) at object.cpp:47
|
[11:09] sustrik
|
mikko: that's maint?
|
[11:10] sustrik
|
nope, it's master
|
[11:11] mikko
|
master
|
[11:11] mikko
|
ran while [ 1 = 1 ]; do ./test_shutdown_stress ; echo "x"; done until it crashed
|
[11:13] sustrik
|
hm, looks like some kind of stack overwrite
|
[11:13] sustrik
|
the error actually emerges in the test program, not 0mq core...
|
[11:14] sustrik
|
strange
|
[11:17] sustrik
|
hm, cannot reproduce on my box
|
[11:20] mikko
|
takes anywhere between 10 and 100 runs here
|
[11:20] mikko
|
until it crashes
|
[11:29] sustrik
|
on my box it runs ~30 times
|
[11:29] sustrik
|
then the test slows down considerably
|
[11:29] sustrik
|
(too many open sockets?)
|
[11:29] sustrik
|
anyway, it doesn't fail
|
[11:35] sustrik
|
mikko: can you possibly check the patch proposed in my last email?
|
[11:36] sustrik
|
it's probably a different problem, but there's a chance that it's actually a different manifestation of the same bug...
|
[11:37] sustrik
|
just add the two lines into zmq_engine.cpp
|
[11:42] mikko
|
line 155?
|
[11:48] sustrik
|
mikko: yes
|
[11:49] mikko
|
*** glibc detected *** /tmp/zeromq2/tests/.libs/lt-test_shutdown_stress: corrupted double-linked list: 0xb0b058c8 ***
|
[11:49] mikko
|
after ~50 runs
|
[11:50] mikko
|
https://gist.github.com/0707e21ec88f54822fd3
|
[11:50] mikko
|
bt
|
[11:50] mikko
|
let me do thread apply all bt
|
[11:51] mikko
|
https://gist.github.com/a87f11557fee4c7617aa
|
[11:52] sustrik
|
hm, probably the same problem
|
[12:21] rgl
|
are there any plans to properly support a 64 bit build on windows?
|
[12:26] Guthur
|
rgl: what happens with a 64bit build?
|
[12:26] Guthur
|
I tried it briefly and it seemed ok?
|
[12:26] rgl
|
Guthur, windows uses a different data model for 64-bit than common linuxes; e.g. on windows a long is not 64 bits.
|
[12:27] rgl
|
Guthur, which means that some code in ZMQ does not build correctly (well, it builds, but with warnings that do not seem safe to ignore)
|
[12:28] Guthur
|
ok, seems fair enough, I didn't really test or watch thoroughly the build
|
[12:29] mikko
|
rgl: are you building with mingw or msvc?
|
[12:29] rgl
|
the C data model on windows is LLP64 but on linux it's LP64
|
[12:29] rgl
|
mikko, it does not matter. mingw uses the normal windows data model.
|
[12:29] mikko
|
rgl: it seems to be a bit different
|
[12:29] rgl
|
mikko, but, I'm using msvc *G*
|
[12:29] mikko
|
__int64_t is defined as long long on mingw32
|
[12:30] rgl
|
lemme put the msvc compiler logs somewhere.
|
[12:30] mikko
|
rgl: i thought zeromq uses fixed size ints everywhere
|
[12:31] Guthur
|
I know in some of my builds i see warnings about casting size_t to unsigned int
|
[12:31] Guthur
|
not directly related, more in reply to the fixed size ints
|
[12:32] mikko
|
Guthur: is that zeromq core or binding code?
|
[12:32] mikko
|
i think i need to do MSVC build at some point
|
[12:32] Guthur
|
mikko, core
|
[12:34] rgl
|
http://pastie.org/pastes/1315022/text
|
[12:35] mikko
|
rgl: can you try the same build with master?
|
[12:35] rgl
|
I can. I just have to get the repo.
|
[12:40] rgl
|
it's more or less the same http://pastie.org/pastes/1315029/text
|
[12:51] Guthur
|
lots of size_t conversions
|
[12:51] Guthur
|
size_t is very troublesome, especially for bindings
|
[13:00] rgl
|
can we safely ignore them?
|
[13:21] mikko
|
size_t to int isn't really a safe conversion
|
[13:22] mikko
|
it would require bounds checking
|
[13:23] mikko
|
not sure why the array index isn't size_t there
|
[13:24] mikko
|
most of those look like pretty safe casts
|
[13:37] rgl
|
humm tried to run remote_thr.exe tcp://127.0.0.1:56789 3000 1234567 and it completely spiked the memory usage to 4G *G*
|
[13:37] rgl
|
is this supposed to happen when no watermark is set on the socket?
|
[15:46] Guthur
|
Is size_t really a requirement?
|
[15:46] Guthur
|
It's very bothersome for bindings, as I said, because it is hard to be sure what size it really is
|
[15:47] Guthur
|
primitive or explicit types are easier
|
[15:57] sustrik
|
Guthur: checked POSIX SO_SNDBUF which is kind of similar
|
[15:57] sustrik
|
and it is indeed "int" rather than "size_t"
|
[15:58] sustrik
|
on the other hand, int is not a fixed-size integer either
|
[16:42] sustrik
|
rgl: yes, it's supposed to happen
|
[17:45] Guthur
|
sustrik, I have a bad feeling my attempts at cross platform (x86/x86-64) support with clrzmq2 are possibly not going to stand up, needs more testing really
|
[18:04] sustrik
|
Guthur: sure, that's how it goes with cross-platform support
|
[18:04] sustrik
|
in most cases you cannot do all the work yourself
|
[18:04] sustrik
|
you have to rely on people who actually own and understand the OSes in question
|
[18:05] Guthur
|
I'll certainly be keeping an eye out for feedback
|
[18:27] rgl
|
Guthur, I have one request. Maybe it does not make much sense, but it would be great to choose (at runtime) the version of libzmq to use, say, if the OS is 32-bit, use the 32-bit version, otherwise use the 64-bit version.
|
[18:28] Guthur
|
well as far as clrzmq2 goes, I have used preprocessor directives
|
[18:29] Guthur
|
It would not be impossible to do it at runtime, not sure of benefits though
|
[18:31] rgl
|
just more convenient. you just need to drop both versions of the native library and the whole application will work on either 32 or 64 bit.
|
[18:38] Guthur
|
Might as well drop in both clrzmq.dll builds then, though
|
[18:38] Guthur
|
and then have the application decide
|
[18:39] Guthur
|
that's not generally the approach of Linux, you tend to have either discrete packages or build from source
|
[18:40] Guthur
|
rgl: Reasonable enough thought, just imho it's something that should be delegated to the application provider
|
[18:41] rgl
|
ah yes. indeed. it could be delegated to the app. point taken :D
|
[20:33] mikko
|
mato: is there a reason why on HP UX the CPPFLAGS are overridden completely?
|
[20:33] mikko
|
CPPFLAGS="-D_POSIX_C_SOURCE=200112L"
|