ZeroMq IRC Log

Sunday October 23, 2011

[Time] Name	Message
[11:53] jsia	im coding using C++
[11:53] jsia	can anyone help on how to share the zmq context within 2 classes
[13:28] hwinkel	hi
[13:45] mikko	jsia: whats the problem?
[13:49] jsia	I used the C library for zeromq
[13:50] jsia	Im having problems with th einproc socket
[13:50] mikko	i tend to use boost::shared_ptr for sharing the context
[13:50] jsia	im using ZMQ_ROUTER
[13:50] mikko	what is the problem?
[13:50] jsia	the problem is during the first send I can receive the message and parse it correctly
[13:50] jsia	the recipient needs to reply to the sender it sends the message but the other party can't receive the message
[13:51] mikko	ROUTER on both ends?
[13:51] jsia	yup
[13:51] jsia	I have 2 classes I named it the client and the other one the proxy
[13:51] mikko	is the peer id the first part in the reply message?
[13:51] jsia	I have the code running in python
[13:51] jsia	I am just recoding it into C
[13:51] jsia	yup
[13:51] jsia	I'm passing the id
[13:52] jsia	I send two parts the first is the identity
[13:52] jsia	then the next is the real payload
[13:52] mikko	if you got router to router connection how do you know the id?
[13:52] mikko	do you use explicit identities?
[13:53] jsia	yup
[13:53] jsia	I pass the class object
[13:53] jsia	I have a method in the first class where it can return the id
[13:53] jsia	and alsoi the context
[13:53] jsia	by the way im using the same context
[13:53] jsia	isn't it that inproc communications are like that?
[13:53] jsia	that was what I did in python
[13:54] mikko	are these one to one communications?
[13:54] mikko	or one to many?
[13:54] jsia	currenyl one to one
[13:55] mikko	can i see the code?
[13:55] jsia	its a bit huge I'll try to get the most important snippets
[13:56] mikko	jsia: are you setting the identity on both sockets?
[13:57] jsia	yup
[13:57] jsia	oh
[13:57] jsia	wait
[13:57] jsia	only atht the proxy
[13:57] mikko	i'm not sure about ROUTER <-> ROUTER
[13:57] jsia	only one has an identity
[13:57] jsia	int linger = 0; this->INTERNAL_SOCKET = zmq_socket(context, ZMQ_ROUTER); identity = "mode-127.0.0.1_CL"; zmq_setsockopt(this->INTERNAL_SOCKET, ZMQ_IDENTITY, identity.c_str(), identity.size()); zmq_setsockopt(this->INTERNAL_SOCKET, ZMQ_LINGER, &linger, sizeof(linger)); zmq_bind(this->INTERNAL_SOCKET, "inproc://ipc.proc");
[13:57] mikko	you might be better off with DEALER <-> ROUTER
[13:57] jsia	that's how I bind the proxy
[13:58] jsia	do you think its better?
[13:58] jsia	Im trying to buiuld an async req reply
[13:58] mikko	it might be depending on the scenario
[13:58] jsia	woith reliability
[13:58] jsia	so I made a protocol
[13:58] mikko	DEALER automatically load-balances to connected peers
[13:58] jsia	that the client talks to a proxy registers itself wwaits for acknowledgement
[13:59] mikko	and you can use ROUTER to reply to specific DEALER
[13:59] jsia	then the proxy talks with another external proxy for the real consumer apps
[13:59] mikko	what i've used for brokers in the past
[14:00] mikko	producer DEALER < - > ROUTER broker DEALER < - > ROUTER consumer
[14:00] mikko	so producer fair queues between multiple brokers
[14:00] mikko	which fair queue to multiple consumers
[14:00] mikko	which ACK back to the specific broker
[14:00] jsia	ic
[14:01] jsia	that can also be a nice design
[14:02] jsia	does that provide async request reply?
[14:02] mikko	i guess you can use that for async reqrep
[14:02] mikko	i used it for persistent device
[14:03] jsia	ic
[14:03] mikko	but i dont see why it wouldnt work for async request reply
[14:04] mikko	it's a bit heavier design though
[14:04] jsia	its funny I have this wworking in python
[14:04] jsia	when I recode it basing on the python code it doesn't work
[14:05] mikko	are you making sure that you bind before connect?
[14:05] mikko	the inproc endpoint
[14:05] jsia	yup
[14:05] jsia	I call the proxy initialization first
[14:05] jsia	before the client
[14:05] mikko	you dont seem to be checking for errors in that code (or its stripped down)
[14:05] mikko	is this threaded or single process?
[14:05] jsia	ill try to make a striped down version
[14:05] jsia	just pass simple messages
[14:05] jsia	coz currently Im passing protobuf messages
[14:06] mikko	but is the program in multiple threads?
[14:06] jsia	currently the proxy runs on one thread
[14:06] jsia	and the client runs on another
[14:06] mikko	ok
[14:06] jsia	those are the only threads that im using
[14:06] mikko	and you are not receiving any messages from client?
[14:07] jsia	the proxy is receiving the message from the client
[14:07] jsia	but when the proxy replies
[14:07] jsia	the client is not receiving it
[14:07] mikko	are you copying the first message part from client to outgoing message?
[14:07] jsia	yup
[14:07] mikko	can you gist.github.com that part where the client reply happens
[14:07] jsia	ok
[14:08] mikko	why are you using C bindings inside C++ ?
[14:08] mikko	just out of curiosity
[14:10] jsia	U read about C bindings first then I was too lazy to implement the C++ hahaha
[14:10] jsia	i read the C++ code, it's also using the C library
[14:10] jsia	hahaha
[14:10] mikko	yes, it's pretty slim .hpp file
[14:10] jsia	I tried it a while ago
[14:11] jsia	I got problems with the context
[14:11] mikko	what sort of problems?
[14:11] mikko	i usually create context in the main () and pass it down from there
[14:12] mikko	and tend to create the sockets there as well and inject to different objects
[14:12] mikko	https://github.com/mkoppanen/pzq/blob/master/src/main.cpp#L149
[14:15] jsia	git@gist.github.com:fe8f95bcb86b61279a7c.git
[14:15] jsia	I tried that a while ago however its wierd
[14:16] jsia	i was initializing the classes on main after i created the context
[14:16] jsia	inside the class initialization
[14:16] jsia	I bind the socket
[14:16] jsia	then after it finishes the initalization the socket gets disconnected
[14:17] mikko	does it go out of scope?
[14:18] mikko	zmq_getsockopt(this->INTERNAL_SOCKET, ZMQ_IDENTITY, (void *) sidentity, &identity_size);
[14:18] mikko	you are getting the identity of the internal socket here
[14:18] mikko	but you are sending a message to the client
[14:19] mikko	not sure i fully understand this code
[14:19] jsia	I send two parts
[14:19] jsia	the first is the identity
[14:19] jsia	however when I try to execute s_recv
[14:19] jsia	I don't get the first message
[14:19] jsia	when I get the next message I get the payload
[14:19] jsia	that's why I executed two s_recv
[14:20] mikko	you need to check for RCVMORE
[14:20] mikko	c bindings will return only one part at a time
[14:20] jsia	but when I execute two recv will I get both messages?
[14:21] mikko	but you dont really know where the boundary is
[14:21] mikko	that makes the assumption that client always sends two messages
[14:21] jsia	yeah
[14:21] mikko	it's better to loop with ZMQ_RCVMORE
[14:21] jsia	ok
[14:21] jsia	currently i only send two messages
[14:21] mikko	https://github.com/mkoppanen/pzq/blob/master/src/socket.hpp#L53
[14:21] jsia	ill fix that one
[14:21] mikko	something similar to that
[14:21] mikko	where do you send the message back to client?
[14:21] mikko	on line 41?
[14:22] jsia	yup
[14:22] mikko	and identity is what you init on line 23?
[14:22] jsia	yup
[14:22] mikko	that confuses me
[14:23] mikko	you are getting the identity of the INTERNAL_SOCKET
[14:23] mikko	and use that when sending a message to EXTERNAL_SOKET
[14:23] mikko	+C
[14:23] mikko	you most likely want to use the identity of the EXTERNAL_SOCKET rather than the internal one
[14:23] mikko	otherwise it will be null router
[14:23] mikko	routed*
[14:24] mikko	or did i miss something?
[14:25] jsia	the external socket is a tcp socket
[14:25] jsia	that's for my external communication
[14:26] mikko	i am very confused
[14:26] jsia	Sending regiseter node-127.0.0.1_CL as identity Sending localhost-10834-0client_service as content finish sending .... The message id is localhost-10834-0 output message id to be passeed to the get_response localhost-10834-0 Fetching Internal Identity node-127.0.0.1_CL Receiving next message part received regitration request Proxy will reply with register_req_ack Sending reply back to the internal socket Sending node-127.0.0.1_CL
[14:26] jsia	oops
[14:26] jsia	git@gist.github.com:2e0ff7ec07fb9fcb243a.git
[14:26] jsia	here's a sample output when I run the code
[14:28] mikko	you are using the identity of the internal socket when you send
[14:28] mikko	are you trying to send message to itself?
[14:29] jsia	yup
[14:29] jsia	im sending it as the first part of the message
[14:30] mikko	but this confuses me
[14:30] mikko	are you not trying to send a message to a socket that is connected to INTERNAL_SOCKET ?
[14:30] mikko	rather than to the INTERNAL_SOCKET itself
[14:31] jsia	yup
[14:31] mikko	so why do you route the message to the INTERNAL_SOCKET ?
[14:31] jsia	Im trying to get the identity that sent me a mesage
[14:31] mikko	the first part determines where the message goes to
[14:31] jsia	and answer back to it
[14:31] mikko	zmq_getsockopt(this->INTERNAL_SOCKET, ZMQ_IDENTITY, (void *) sidentity, &identity_size);
[14:31] mikko	this will give you the identity of the current socket
[14:31] mikko	not the remote peer
[14:32] jsia	I see
[14:33] jsia	hmmm I think that's the difference with my python code
[14:33] jsia	coz with python I can receive the first part of the message that the client is sending
[14:33] jsia	I think I have to implement the recvmore
[14:34] mikko	jsia: https://github.com/mkoppanen/pzq/blob/master/src/socket.hpp#L53
[14:34] mikko	you need something like that
[14:35] mikko	and the first part is the remote identity
[14:36] jsia	ok ill give it a try :)
[14:36] jsia	thnx for the help :D
[14:36] jsia	Ill keep you posted what happened
[14:37] mikko	cool
[14:37] mikko	ill hit the gym, bbl
[16:40] mbj	Is there a zmq buildin that enables workers to "pull request when ready". So the upstream will not push data to dead / inexistend workers?
[17:25] mikko	mbj: yes
[17:25] mikko	mbj: how much have you read the guide?
[17:30] mbj	mikko: I read the guide until a point I asked myself why is mongrel2 using PUSH / PULL for handlers. When a handler was handling a message the peer would send messages to this address even when this handler was removed and will never come back.
[17:31] mbj	mikko: Maybe I missunderstood this part.
[17:31] mikko	mbj: messages wont be sent to that peer when it goes away
[17:31] mikko	the messages are round-robined between connected peers
[17:31] mikko	there is a small window before the upstream notices that the downstream has gone away
[17:31] mbj	Okay but there could be some messages not jet processed in the handlers mailbox
[17:32] mikko	but by design push/pull is fire and forget
[17:32] mikko	there are other socket types if ACK-type behaviour is needed
[17:32] mikko	also, LIBZMQ-160 might be somewhat related
[17:32] mbj	reading
[17:33] mikko	but in case ACKs are needed ROUTER/DEALER pattern can be used
[17:33] mbj	So I have to refine my initial query
[17:36] mikko	some messages can be pushed to the downstream (depends on HWM values) and in case of a crash they might get lost
[17:36] mikko	in such a scenario timeout based retry is the best bet
[17:37] mbj	mikko: Will do some more doc research, thx for your time and for pointing me on the "connected" part
[17:39] mikko	i think the guide hopefully explains this
[17:40] mbj	mikko: actualy it does. My problem was the "connected". Since I did not realize zeromq will not reconnect from peer to downstream (downstream had initiated the connection) this question came up.
[17:41] mikko	mbj: in zeromq the upstream can also connect to bound downstream
[17:41] mikko	:)
[17:41] mikko	to confuse things more
[17:42] mikko	but if there is no underlying tcp connection to peer messages wont be sent to it
[17:42] mbj	mikko: I knew
[17:47] mbj	mikko: Im implementing worker driven consuming of jobs. I'll make sure no message is in the workers mailbox when he crashes using REQ/REP at the jobsserver. Used PULL before but ran into the assumption the jobserver will not realize a worker is gone and try to reconnect while throwing messages in the workers socket :)
[17:54] mbj	Can I limit a workers mailbox size using HQM on workers PULL socket?
[17:54] mbj	s/mailbox size/mailbox capacity/
[17:54] mbj	s/HQM/HWM/ (sorry)
[18:49] mikko	mbj: you might want to use more explicit ack mechanism
[18:50] mikko	like the worker signals back when the work is done
[18:58] mbj	mikko: I'll do so, saw the discussions on HWM on stackoverflow
[19:53] mbj	Is it possible to receive the current outgoing/incoming mailbox queue length of a socket? (And is mailbox the corret term?)
[20:15] mikko	mbj: nope
[20:27] hoppy	I have a long-running app that uses PUB with multiple SUB clients, and REP with multiple REQ clients that runs flawlessly for hours and then stops communicating. This is Centos 5. Any suggestions on a debugging methodology, or other ideas?
[20:40] mbj	mikko: Something planned in this direction? Would be nice to have a getsockopt option.
[20:41] mikko	mbj: there has been some talk
[20:42] mikko	mbj: but it's impossible to provide accurate count
[20:42] mikko	hoppy: what language?
[20:47] mbj	mikko: Because there is one queue for all sockets? Or the queue data structure does not allow to detemine the length in an easy way?
[20:48] mikko	no, not that
[20:49] mikko	because some messages might be in network buffers on each side
[20:49] mbj	mikko: Ahh and the size of this buffer does not tell you if there are 1000 small or one big message
[20:49] mikko	indeed
[20:50] mikko	tcp buffers for example are just bytes
[20:50] mbj	Forgot about them
[20:50] mbj	zmqs abstraction is good :)
[20:51] mbj	So the tcp buffers are normally larger than zmq mailbox queue...
[20:53] hoppy	mikko: C++ as the main app, there are C++ and Python clients both talking to it.
[20:56] mikko	hoppy: do both req/rep and pub/sub stop?
[20:56] mikko	are they in same thread?
[20:56] hoppy	it's single threaded. The req/rep continue fine, but the PUBs quit being heard by the subscribers.
[20:57] mikko	hoppy: 2.1?
[20:57] hoppy	I have a similar case in another app (again PUB/SUB) with a similar failure rate - but that one is TCP across machines.
[20:58] mikko	in both cases is the producer C or C++ and consumer python?
[20:58] mikko	oh nm
[20:58] mikko	that sounds odd indeed
[20:59] hoppy	I've been playing with versions for some time hoping to quash it, right now using 2.1.9 that I built my own rpm for.
[20:59] mikko	hoppy: it would be beneficial if you can attach gdb
[20:59] mikko	and see where there server is
[20:59] hoppy	in the other app python to python. In this one C++ with both clients
[20:59] mikko	thread apply all bt
[21:00] mikko	where the*
[21:01] mikko	hoppy: is it possible that the main app is blocking somewhere outside zeromq?
[21:01] mikko	or have you isolated this to be zeromq related?
[21:01] hoppy	I don't believe so.
[21:01] mikko	backtrace would be helpful
[21:01] hoppy	I need to double check that it is stuck now, but this is what's up ATM:
[21:02] hoppy	gdb) thread apply all bt Thread 4 (Thread 0x428a1940 (LWP 24268)): #0 0x0000003178cd48a8 in epoll_wait () from /lib64/libc.so.6 #1 0x0000003e45c0f2f8 in zmq_msg_data () from /usr/lib64/libzmq.so.1 #2 0x0000003e45c22457 in zmq_msg_data () from /usr/lib64/libzmq.so.1 #3 0x000000317980673d in start_thread () from /lib64/libpthread.so.0 #4 0x0000003178cd44bd in clone () from /lib64/libc.so.6 Thread 3 (Thread 0x432a2940 (LWP 24270)):
[21:02] hoppy	useless pasting that.
[21:02] hoppy	send me a quick email hoppy@gnurdle.com and I'll reply with the stacktrace
[21:02] mikko	gist.github.com works well for paste
[21:02] hoppy	(needs a better irc client)
[21:03] mikko	just paste to http://gist.github.com and provide url to your paste
[21:03] mikko	that is probably the easiest way
[21:03] hoppy	git://gist.github.com/1307887.git
[21:04] mikko	thanks
[21:04] hoppy	It didn't like being debugged. I had to restart after that.
[21:05] mikko	most likely zmq_poll returning with interrupted
[21:05] mikko	if you are using it
[21:05] hoppy	agree
[21:05] mikko	might be helpful to compile with symbols and without optimisations
[21:05] mikko	the current backtrace seems to be optimised and doesn't have symbols
[21:06] hoppy	k.
[21:06] hoppy	I think I will try to make a smaller testcase and see if I can reproduce.
[21:06] mikko	that would be useful
[21:06] hoppy	this one has 15-20G/day going through it, and it is ugly to knock it over.
[21:06] mikko	if you can isolate a test case open an issue in jira
[21:07] mikko	the gdb trace only helps if you take it when the app starts blocking
[21:07] hoppy	will try.
[21:07] mikko	i assume the issue is on server
[21:07] hoppy	agree
[21:07] mikko	any possiblity of a flaky network or so?
[21:07] mikko	have you tried reconnecting a consumer after this happens?
[21:07] hoppy	should all be tcp inside the same box.
[21:08] mikko	ok, that excludes network pretty much
[21:08] mikko	reproduce case would be good
[21:08] mikko	and if that doesn't work a backtrace when things grind to halt might help as well
[21:08] mbj	does zmq_recv with ZMQ_NOBLOCK guarantee the whole message is readable not only a part? (IMHO it does, but it is better to ask once rather than building a system around an incorrect assumption)
[21:09] hoppy	this thing is basically doing req/rep against requestors who want to poke something into a database, poking it in, and then notifying everybody on a pub/sub that it went it.and every time s
[21:09] mikko	mbj: yes, if you can receive a part you can receive rest
[21:09] mbj	mikko: thx
[21:09] hoppy	I can always still see things going in the database, but can't hear the pub/sub anymore
[21:10] mikko	hoppy: so req rep keeps going but pub sub stops?
[21:10] mikko	ok
[21:10] hoppy	correct
[21:10] cremes	hoppy: and netstat shows the ports still open/connected ?
[21:10] hoppy	haven't checked that.
[21:11] mikko	that would be helpful as well
[21:11] hoppy	I'm basically trying to get an isolation procedure.
[21:11] hoppy	life would be simpler if I could get it to fail in a crucible, we'll see how lucky I am with that.
[21:12] hoppy	concerned because it is a very long time between issues.
[21:12] cremes	hoppy: i typically run PUB/SUB for 5 days at a time and push about 10G a day through the PUB
[21:12] cremes	without any hangs or weirdness
[21:13] cremes	and all socket events are handled via zmq_poll & friends
[21:13] cremes	so the same code paths are likely being exercised
[21:13] mikko	cremes: this is on linux as well?
[21:14] cremes	osx + linux
[21:14] cremes	hoppy: any chance zmq_close() is getting called on the pub socket?
[21:14] cremes	by default the socket will linger indefinitely
[21:14] hoppy	I doubt it.
[21:15] cremes	you might want to set LINGER to 0 and see if the thread goes away which could indicate it got closed
[21:15] cremes	ok
[21:15] cremes	another idea...
[21:15] cremes	since you are doing zmq_poll, do you have any "churn" on the poll item list that you hand to that function?
[21:15] cremes	that is, do you add and/or delete a lot of sockets from that list?
[21:15] cremes	if so, it's possible you are leaving the PUB socket off the list and it never gets polled again
[21:16] cremes	(i did that once :)
[21:16] mikko	you shouldn't need to poll pub
[21:16] mikko	as the messages are dropped if there are no peers
[21:18] cremes	er, sorry... the SUB socket that is supposed to be receiving data from the PUB
[21:19] mikko	i understood that in this case there are many consumers
[21:19] mikko	which all stop receiving the pubs

ZeroMq Home

Sunday October 23, 2011