[Time] Name | Message |
[11:53] jsia
|
I'm coding using C++
|
[11:53] jsia
|
can anyone help on how to share the zmq context within 2 classes
|
[13:28] hwinkel
|
hi
|
[13:45] mikko
|
jsia: what's the problem?
|
[13:49] jsia
|
I used the C library for zeromq
|
[13:50] jsia
|
I'm having problems with the inproc socket
|
[13:50] mikko
|
i tend to use boost::shared_ptr for sharing the context
|
[13:50] jsia
|
im using ZMQ_ROUTER
|
[13:50] mikko
|
what is the problem?
|
[13:50] jsia
|
the problem is during the first send I can receive the message and parse it correctly
|
[13:50] jsia
|
when the recipient needs to reply to the sender, it sends the message but the other party can't receive it
|
[13:51] mikko
|
ROUTER on both ends?
|
[13:51] jsia
|
yup
|
[13:51] jsia
|
I have 2 classes, I named one the client and the other one the proxy
|
[13:51] mikko
|
is the peer id the first part in the reply message?
|
[13:51] jsia
|
I have the code running in python
|
[13:51] jsia
|
I am just recoding it into C
|
[13:51] jsia
|
yup
|
[13:51] jsia
|
I'm passing the id
|
[13:52] jsia
|
I send two parts the first is the identity
|
[13:52] jsia
|
then the next is the real payload
|
[13:52] mikko
|
if you got router to router connection how do you know the id?
|
[13:52] mikko
|
do you use explicit identities?
|
[13:53] jsia
|
yup
|
[13:53] jsia
|
I pass the class object
|
[13:53] jsia
|
I have a method in the first class where it can return the id
|
[13:53] jsia
|
and also the context
|
[13:53] jsia
|
by the way im using the same context
|
[13:53] jsia
|
isn't it that inproc communications are like that?
|
[13:53] jsia
|
that was what I did in python
|
[13:54] mikko
|
are these one to one communications?
|
[13:54] mikko
|
or one to many?
|
[13:54] jsia
|
currently one to one
|
[13:55] mikko
|
can i see the code?
|
[13:55] jsia
|
it's a bit huge, I'll try to get the most important snippets
|
[13:56] mikko
|
jsia: are you setting the identity on both sockets?
|
[13:57] jsia
|
yup
|
[13:57] jsia
|
oh
|
[13:57] jsia
|
wait
|
[13:57] jsia
|
only at the proxy
|
[13:57] mikko
|
i'm not sure about ROUTER <-> ROUTER
|
[13:57] jsia
|
only one has an identity
|
[13:57] jsia
|
int linger = 0;
this->INTERNAL_SOCKET = zmq_socket(context, ZMQ_ROUTER);
identity = "mode-127.0.0.1_CL";
zmq_setsockopt(this->INTERNAL_SOCKET, ZMQ_IDENTITY, identity.c_str(), identity.size());
zmq_setsockopt(this->INTERNAL_SOCKET, ZMQ_LINGER, &linger, sizeof(linger));
zmq_bind(this->INTERNAL_SOCKET, "inproc://ipc.proc");
|
[13:57] mikko
|
you might be better off with DEALER <-> ROUTER
|
[13:57] jsia
|
that's how I bind the proxy
|
[13:58] jsia
|
do you think its better?
|
[13:58] jsia
|
I'm trying to build an async req reply
|
[13:58] mikko
|
it might be depending on the scenario
|
[13:58] jsia
|
with reliability
|
[13:58] jsia
|
so I made a protocol
|
[13:58] mikko
|
DEALER automatically load-balances to connected peers
|
[13:58] jsia
|
the client talks to a proxy, registers itself, waits for acknowledgement
|
[13:59] mikko
|
and you can use ROUTER to reply to specific DEALER
|
[13:59] jsia
|
then the proxy talks with another external proxy for the real consumer apps
|
[13:59] mikko
|
what i've used for brokers in the past
|
[14:00] mikko
|
producer DEALER <-> ROUTER broker DEALER <-> ROUTER consumer
|
[14:00] mikko
|
so producer fair queues between multiple brokers
|
[14:00] mikko
|
which fair queue to multiple consumers
|
[14:00] mikko
|
which ACK back to the specific broker
|
[14:00] jsia
|
ic
|
[14:01] jsia
|
that can also be a nice design
|
[14:02] jsia
|
does that provide async request reply?
|
[14:02] mikko
|
i guess you can use that for async reqrep
|
[14:02] mikko
|
i used it for persistent device
|
[14:03] jsia
|
ic
|
[14:03] mikko
|
but i dont see why it wouldnt work for async request reply
|
[14:04] mikko
|
it's a bit heavier design though
|
[14:04] jsia
|
it's funny, I have this working in python
|
[14:04] jsia
|
when I recode it based on the python code it doesn't work
|
[14:05] mikko
|
are you making sure that you bind before connect?
|
[14:05] mikko
|
the inproc endpoint
|
[14:05] jsia
|
yup
|
[14:05] jsia
|
I call the proxy initialization first
|
[14:05] jsia
|
before the client
|
[14:05] mikko
|
you dont seem to be checking for errors in that code (or its stripped down)
|
[14:05] mikko
|
is this threaded or single process?
|
[14:05] jsia
|
I'll try to make a stripped down version
|
[14:05] jsia
|
just pass simple messages
|
[14:05] jsia
|
coz currently I'm passing protobuf messages
|
[14:06] mikko
|
but is the program in multiple threads?
|
[14:06] jsia
|
currently the proxy runs on one thread
|
[14:06] jsia
|
and the client runs on another
|
[14:06] mikko
|
ok
|
[14:06] jsia
|
those are the only threads that im using
|
[14:06] mikko
|
and you are not receiving any messages from client?
|
[14:07] jsia
|
the proxy is receiving the message from the client
|
[14:07] jsia
|
but when the proxy replies
|
[14:07] jsia
|
the client is not receiving it
|
[14:07] mikko
|
are you copying the first message part from client to outgoing message?
|
[14:07] jsia
|
yup
|
[14:07] mikko
|
can you gist.github.com that part where the client reply happens
|
[14:07] jsia
|
ok
|
[14:08] mikko
|
why are you using C bindings inside C++ ?
|
[14:08] mikko
|
just out of curiosity
|
[14:10] jsia
|
I read about C bindings first, then I was too lazy to implement the C++ hahaha
|
[14:10] jsia
|
i read the C++ code, it's also using the C library
|
[14:10] jsia
|
hahaha
|
[14:10] mikko
|
yes, it's pretty slim .hpp file
|
[14:10] jsia
|
I tried it a while ago
|
[14:11] jsia
|
I got problems with the context
|
[14:11] mikko
|
what sort of problems?
|
[14:11] mikko
|
i usually create context in the main () and pass it down from there
|
[14:12] mikko
|
and tend to create the sockets there as well and inject to different objects
|
[14:12] mikko
|
https://github.com/mkoppanen/pzq/blob/master/src/main.cpp#L149
|
[14:15] jsia
|
git@gist.github.com:fe8f95bcb86b61279a7c.git
|
[14:15] jsia
|
I tried that a while ago, however it's weird
|
[14:16] jsia
|
i was initializing the classes on main after i created the context
|
[14:16] jsia
|
inside the class initialization
|
[14:16] jsia
|
I bind the socket
|
[14:16] jsia
|
then after it finishes the initialization the socket gets disconnected
|
[14:17] mikko
|
does it go out of scope?
|
[14:18] mikko
|
zmq_getsockopt(this->INTERNAL_SOCKET, ZMQ_IDENTITY, (void *) sidentity, &identity_size);
|
[14:18] mikko
|
you are getting the identity of the internal socket here
|
[14:18] mikko
|
but you are sending a message to the client
|
[14:19] mikko
|
not sure i fully understand this code
|
[14:19] jsia
|
I send two parts
|
[14:19] jsia
|
the first is the identity
|
[14:19] jsia
|
however when I try to execute s_recv
|
[14:19] jsia
|
I don't get the first message
|
[14:19] jsia
|
when I get the next message I get the payload
|
[14:19] jsia
|
that's why I executed two s_recv
|
[14:20] mikko
|
you need to check for RCVMORE
|
[14:20] mikko
|
c bindings will return only one part at a time
|
[14:20] jsia
|
but when I execute two recv will I get both messages?
|
[14:21] mikko
|
but you dont really know where the boundary is
|
[14:21] mikko
|
that makes the assumption that client always sends two messages
|
[14:21] jsia
|
yeah
|
[14:21] mikko
|
it's better to loop with ZMQ_RCVMORE
|
[14:21] jsia
|
ok
|
[14:21] jsia
|
currently i only send two messages
|
[14:21] mikko
|
https://github.com/mkoppanen/pzq/blob/master/src/socket.hpp#L53
|
[14:21] jsia
|
ill fix that one
|
[14:21] mikko
|
something similar to that
|
[14:21] mikko
|
where do you send the message back to client?
|
[14:21] mikko
|
on line 41?
|
[14:22] jsia
|
yup
|
[14:22] mikko
|
and identity is what you init on line 23?
|
[14:22] jsia
|
yup
|
[14:22] mikko
|
that confuses me
|
[14:23] mikko
|
you are getting the identity of the INTERNAL_SOCKET
|
[14:23] mikko
|
and use that when sending a message to EXTERNAL_SOCKET
|
[14:23] mikko
|
you most likely want to use the identity of the EXTERNAL_SOCKET rather than the internal one
|
[14:23] mikko
|
otherwise it will be null routed
|
[14:24] mikko
|
or did i miss something?
|
[14:25] jsia
|
the external socket is a tcp socket
|
[14:25] jsia
|
that's for my external communication
|
[14:26] mikko
|
i am very confused
|
[14:26] jsia
|
Sending regiseter node-127.0.0.1_CL as identity
Sending localhost-10834-0client_service as content
finish sending ....
The message id is localhost-10834-0
output message id to be passeed to the get_response localhost-10834-0
Fetching Internal Identity node-127.0.0.1_CL
Receiving next message part
received regitration request
Proxy will reply with register_req_ack
Sending reply back to the internal socket
Sending node-127.0.0.1_CL
|
[14:26] jsia
|
oops
|
[14:26] jsia
|
git@gist.github.com:2e0ff7ec07fb9fcb243a.git
|
[14:26] jsia
|
here's a sample output when I run the code
|
[14:28] mikko
|
you are using the identity of the internal socket when you send
|
[14:28] mikko
|
are you trying to send message to itself?
|
[14:29] jsia
|
yup
|
[14:29] jsia
|
im sending it as the first part of the message
|
[14:30] mikko
|
but this confuses me
|
[14:30] mikko
|
are you not trying to send a message to a socket that is connected to INTERNAL_SOCKET ?
|
[14:30] mikko
|
rather than to the INTERNAL_SOCKET itself
|
[14:31] jsia
|
yup
|
[14:31] mikko
|
so why do you route the message to the INTERNAL_SOCKET ?
|
[14:31] jsia
|
I'm trying to get the identity that sent me a message
|
[14:31] mikko
|
the first part determines where the message goes to
|
[14:31] jsia
|
and answer back to it
|
[14:31] mikko
|
zmq_getsockopt(this->INTERNAL_SOCKET, ZMQ_IDENTITY, (void *) sidentity, &identity_size);
|
[14:31] mikko
|
this will give you the identity of the current socket
|
[14:31] mikko
|
not the remote peer
|
[14:32] jsia
|
I see
|
[14:33] jsia
|
hmmm I think that's the difference with my python code
|
[14:33] jsia
|
coz with python I can receive the first part of the message that the client is sending
|
[14:33] jsia
|
I think I have to implement the recvmore
|
[14:34] mikko
|
jsia: https://github.com/mkoppanen/pzq/blob/master/src/socket.hpp#L53
|
[14:34] mikko
|
you need something like that
|
[14:35] mikko
|
and the first part is the remote identity
|
[14:36] jsia
|
ok ill give it a try :)
|
[14:36] jsia
|
thnx for the help :D
|
[14:36] jsia
|
I'll keep you posted on what happened
|
[14:37] mikko
|
cool
|
[14:37] mikko
|
ill hit the gym, bbl
|
[16:40] mbj
|
Is there a zmq built-in that enables workers to "pull requests when ready", so the upstream will not push data to dead / nonexistent workers?
|
[17:25] mikko
|
mbj: yes
|
[17:25] mikko
|
mbj: how much have you read the guide?
|
[17:30] mbj
|
mikko: I read the guide up to the point where I asked myself why mongrel2 is using PUSH / PULL for handlers. When a handler was handling a message, the peer would send messages to this address even when this handler was removed and will never come back.
|
[17:31] mbj
|
mikko: Maybe I misunderstood this part.
|
[17:31] mikko
|
mbj: messages won't be sent to that peer when it goes away
|
[17:31] mikko
|
the messages are round-robined between connected peers
|
[17:31] mikko
|
there is a small window before the upstream notices that the downstream has gone away
|
[17:31] mbj
|
Okay, but there could be some messages not yet processed in the handler's mailbox
|
[17:32] mikko
|
but by design push/pull is fire and forget
|
[17:32] mikko
|
there are other socket types if ACK-type behaviour is needed
|
[17:32] mikko
|
also, LIBZMQ-160 might be somewhat related
|
[17:32] mbj
|
reading
|
[17:33] mikko
|
but in case ACKs are needed ROUTER/DEALER pattern can be used
|
[17:33] mbj
|
So I have to refine my initial query
|
[17:36] mikko
|
some messages can be pushed to the downstream (depends on HWM values) and in case of a crash they might get lost
|
[17:36] mikko
|
in such a scenario timeout based retry is the best bet
|
[17:37] mbj
|
mikko: Will do some more doc research, thx for your time and for pointing me on the "connected" part
|
[17:39] mikko
|
i think the guide hopefully explains this
|
[17:40] mbj
|
mikko: actually it does. My problem was the "connected" part. Since I did not realize zeromq will not reconnect from peer to downstream (the downstream had initiated the connection), this question came up.
|
[17:41] mikko
|
mbj: in zeromq the upstream can also connect to bound downstream
|
[17:41] mikko
|
:)
|
[17:41] mikko
|
to confuse things more
|
[17:42] mikko
|
but if there is no underlying tcp connection to peer messages wont be sent to it
|
[17:42] mbj
|
mikko: I knew
|
[17:47] mbj
|
mikko: I'm implementing worker-driven consuming of jobs. I'll make sure no message is in the worker's mailbox when it crashes by using REQ/REP at the jobserver. Used PULL before but ran into the assumption that the jobserver will not realize a worker is gone and will try to reconnect while throwing messages into the worker's socket :)
|
[17:54] mbj
|
Can I limit a worker's mailbox capacity using HWM on the worker's PULL socket?
|
[18:49] mikko
|
mbj: you might want to use more explicit ack mechanism
|
[18:50] mikko
|
like the worker signals back when the work is done
|
[18:58] mbj
|
mikko: I'll do so, saw the discussions on HWM on stackoverflow
|
[19:53] mbj
|
Is it possible to receive the current outgoing/incoming mailbox queue length of a socket? (And is mailbox the correct term?)
|
[20:15] mikko
|
mbj: nope
|
[20:27] hoppy
|
I have a long-running app that uses PUB with multiple SUB clients, and REP with multiple REQ clients that runs flawlessly for hours and then stops communicating. This is Centos 5. Any suggestions on a debugging methodology, or other ideas?
|
[20:40] mbj
|
mikko: Something planned in this direction? Would be nice to have a getsockopt option.
|
[20:41] mikko
|
mbj: there has been some talk
|
[20:42] mikko
|
mbj: but it's impossible to provide accurate count
|
[20:42] mikko
|
hoppy: what language?
|
[20:47] mbj
|
mikko: Because there is one queue for all sockets? Or the queue data structure does not allow determining the length in an easy way?
|
[20:48] mikko
|
no, not that
|
[20:49] mikko
|
because some messages might be in network buffers on each side
|
[20:49] mbj
|
mikko: Ahh, and the size of this buffer does not tell you if there are 1000 small messages or one big one
|
[20:49] mikko
|
indeed
|
[20:50] mikko
|
tcp buffers for example are just bytes
|
[20:50] mbj
|
Forgot about them
|
[20:50] mbj
|
zmqs abstraction is good :)
|
[20:51] mbj
|
So the tcp buffers are normally larger than zmq mailbox queue...
|
[20:53] hoppy
|
mikko: C++ as the main app, there are C++ and Python clients both talking to it.
|
[20:56] mikko
|
hoppy: do both req/rep and pub/sub stop?
|
[20:56] mikko
|
are they in same thread?
|
[20:56] hoppy
|
it's single threaded. The req/rep continue fine, but the PUBs quit being heard by the subscribers.
|
[20:57] mikko
|
hoppy: 2.1?
|
[20:57] hoppy
|
I have a similar case in another app (again PUB/SUB) with a similar failure rate - but that one is TCP across machines.
|
[20:58] mikko
|
in both cases is the producer C or C++ and consumer python?
|
[20:58] mikko
|
oh nm
|
[20:58] mikko
|
that sounds odd indeed
|
[20:59] hoppy
|
I've been playing with versions for some time hoping to quash it, right now using 2.1.9 that I built my own rpm for.
|
[20:59] mikko
|
hoppy: it would be beneficial if you can attach gdb
|
[20:59] mikko
|
and see where the server is
|
[20:59] hoppy
|
in the other app python to python. In this one C++ with both clients
|
[20:59] mikko
|
thread apply all bt
|
[21:01] mikko
|
hoppy: is it possible that the main app is blocking somewhere outside zeromq?
|
[21:01] mikko
|
or have you isolated this to be zeromq related?
|
[21:01] hoppy
|
I don't believe so.
|
[21:01] mikko
|
backtrace would be helpful
|
[21:01] hoppy
|
I need to double check that it is stuck now, but this is what's up ATM:
|
[21:02] hoppy
|
(gdb) thread apply all bt

Thread 4 (Thread 0x428a1940 (LWP 24268)):
#0  0x0000003178cd48a8 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003e45c0f2f8 in zmq_msg_data () from /usr/lib64/libzmq.so.1
#2  0x0000003e45c22457 in zmq_msg_data () from /usr/lib64/libzmq.so.1
#3  0x000000317980673d in start_thread () from /lib64/libpthread.so.0
#4  0x0000003178cd44bd in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x432a2940 (LWP 24270)):
|
[21:02] hoppy
|
useless pasting that.
|
[21:02] hoppy
|
send me a quick email hoppy@gnurdle.com and I'll reply with the stacktrace
|
[21:02] mikko
|
gist.github.com works well for paste
|
[21:02] hoppy
|
(needs a better irc client)
|
[21:03] mikko
|
just paste to http://gist.github.com and provide url to your paste
|
[21:03] mikko
|
that is probably the easiest way
|
[21:03] hoppy
|
git://gist.github.com/1307887.git
|
[21:04] mikko
|
thanks
|
[21:04] hoppy
|
It didn't like being debugged. I had to restart after that.
|
[21:05] mikko
|
most likely zmq_poll returning with interrupted
|
[21:05] mikko
|
if you are using it
|
[21:05] hoppy
|
agree
|
[21:05] mikko
|
might be helpful to compile with symbols and without optimisations
|
[21:05] mikko
|
the current backtrace seems to be optimised and doesn't have symbols
|
[21:06] hoppy
|
k.
|
[21:06] hoppy
|
I think I will try to make a smaller testcase and see if I can reproduce.
|
[21:06] mikko
|
that would be useful
|
[21:06] hoppy
|
this one has 15-20G/day going through it, and it is ugly to knock it over.
|
[21:06] mikko
|
if you can isolate a test case open an issue in jira
|
[21:07] mikko
|
the gdb trace only helps if you take it when the app starts blocking
|
[21:07] hoppy
|
will try.
|
[21:07] mikko
|
i assume the issue is on server
|
[21:07] hoppy
|
agree
|
[21:07] mikko
|
any possibility of a flaky network or so?
|
[21:07] mikko
|
have you tried reconnecting a consumer after this happens?
|
[21:07] hoppy
|
should all be tcp inside the same box.
|
[21:08] mikko
|
ok, that excludes network pretty much
|
[21:08] mikko
|
reproduce case would be good
|
[21:08] mikko
|
and if that doesn't work, a backtrace when things grind to a halt might help as well
|
[21:08] mbj
|
does zmq_recv with ZMQ_NOBLOCK guarantee the whole message is readable, not only a part? (IMHO it does, but it is better to ask once rather than building a system around an incorrect assumption)
|
[21:09] hoppy
|
this thing is basically doing req/rep against requestors who want to poke something into a database, poking it in, and then notifying everybody on a pub/sub that it went in.
|
[21:09] mikko
|
mbj: yes, if you can receive a part you can receive rest
|
[21:09] mbj
|
mikko: thx
|
[21:09] hoppy
|
I can always still see things going in the database, but can't hear the pub/sub anymore
|
[21:10] mikko
|
hoppy: so req rep keeps going but pub sub stops?
|
[21:10] mikko
|
ok
|
[21:10] hoppy
|
correct
|
[21:10] cremes
|
hoppy: and netstat shows the ports still open/connected ?
|
[21:10] hoppy
|
haven't checked that.
|
[21:11] mikko
|
that would be helpful as well
|
[21:11] hoppy
|
I'm basically trying to get an isolation procedure.
|
[21:11] hoppy
|
life would be simpler if I could get it to fail in a crucible, we'll see how lucky I am with that.
|
[21:12] hoppy
|
concerned because it is a very long time between issues.
|
[21:12] cremes
|
hoppy: i typically run PUB/SUB for 5 days at a time and push about 10G a day through the PUB
|
[21:12] cremes
|
without any hangs or weirdness
|
[21:13] cremes
|
and all socket events are handled via zmq_poll & friends
|
[21:13] cremes
|
so the same code paths are likely being exercised
|
[21:13] mikko
|
cremes: this is on linux as well?
|
[21:14] cremes
|
osx + linux
|
[21:14] cremes
|
hoppy: any chance zmq_close() is getting called on the pub socket?
|
[21:14] cremes
|
by default the socket will linger indefinitely
|
[21:14] hoppy
|
I doubt it.
|
[21:15] cremes
|
you might want to set LINGER to 0 and see if the thread goes away which could indicate it got closed
|
[21:15] cremes
|
ok
|
[21:15] cremes
|
another idea...
|
[21:15] cremes
|
since you are doing zmq_poll, do you have any "churn" on the poll item list that you hand to that function?
|
[21:15] cremes
|
that is, do you add and/or delete a lot of sockets from that list?
|
[21:15] cremes
|
if so, it's possible you are leaving the PUB socket off the list and it never gets polled again
|
[21:16] cremes
|
(i did that once :)
|
[21:16] mikko
|
you shouldn't need to poll pub
|
[21:16] mikko
|
as the messages are dropped if there are no peers
|
[21:18] cremes
|
er, sorry... the SUB socket that is supposed to be receiving data from the PUB
|
[21:19] mikko
|
i understood that in this case there are many consumers
|
[21:19] mikko
|
which all stop receiving the pubs
|