migrated from Bugzilla #458512
status RESOLVED severity major in component MQTT-Embedded-C for 1.2
Reported in version 1.0 on platform Other
Assigned to: Ian Craggs
Original attachment names and IDs:
On 2015-01-27 07:41:57 -0500, Roger Fowler wrote:
Created attachment 250259
Mosquitto message log
The HELLO.INO program eventually hangs when the Mosquitto MAX_INFLIGHT_MSGS limit is reached. Running Mosquitto in verbose mode (output attached) shows that the client has subscribed to a topic with QoS=2 but is not responding to the PUBREL messages i.e. it never issues a PUBCOMP in response. Eventually this hits the max_inflight_msgs limit which causes the hang. The log shows the first PUBREL is sent by the broker at 1422353538 with message id 2 but no corresponding PUBCOMP is ever returned by the client.
The necessary code to respond to an incoming PUBREL appears to be missing from:
MQTTClient -> MQTTClient.h (around line 568) and
MQTTClientC -> MQTTClient.c (around line 270)
Hope I'm not standing on anyone's toes here, and apologies if I've reported this incorrectly - I'm a newbie to Bugzilla, Eclipse and Git !
Regards
Roger Fowler
On 2015-01-27 07:54:18 -0500, Roger Fowler wrote:
Created attachment 250261
Mosquitto message log
Annotated with <<< to identify the PUBREL messages from the server that were not responded to
On 2015-01-28 05:12:00 -0500, Ian Craggs wrote:
Hi Roger,
thanks for the report -- you've done exactly the right thing by raising this bug! I was going to invite you to raise any bugs you found when I saw your comments on Twitter, but I didn't actually do so yet.
I think I've fixed this on the mbed version. I'll look into it.
On 2015-01-28 05:41:06 -0500, Ian Craggs wrote:
Roger, I've pushed a fix. Should work now. Let me know. Thanks!
On 2015-01-28 06:04:59 -0500, Ian Craggs wrote:
Actually, where are you getting the code from? I've updated the git repo, but not any other packages yet.
On 2015-01-28 06:35:12 -0500, Roger Fowler wrote:
Hi Ian
I got the original code from https://www.eclipse.org/downloads/download.php?file=/paho/arduino.zip which is the Arduino download link on https://www.eclipse.org/paho/clients/c/embedded/
(I noticed that it contained a couple of other bugs which have already been fixed so I'm guessing that the zip file is back-level.(
The repository link on that page, https://www.eclipse.org/paho/clients/c/embedded/ doesn't appear to work (for me) but I've managed to do a git clone from https://git.eclipse.org/r/paho/org.eclipse.paho.mqtt.embedded-c
This is how I've picked up your updated MQTTClient.h
First test shows that the client is now responding with PUBCOMPs to the incoming PUBREL packets nicely. However, the HELLO.INO has just hung with just the occasional PINGREQ being issued by the client. I need to investigate and will send you an update soon.
Cheers
Roger
On 2015-01-28 09:24:39 -0500, Roger Fowler wrote:
Created attachment 250301
Mosquitto log 28-1-2015
Mosquitto log after the MQTTClient.h patch has been installed
On 2015-01-28 09:38:50 -0500, Roger Fowler wrote:
With the MQTTClient patch included, the message flow from the Mosquito log appears as you would expect (attached as MOSQUITO.OUT). However, the HELLO.INO program still hangs after a while.
I've narrowed this down to client.publish() calls by the client for QoS=2 payloads (QoQ 0 and 1 are working fine). On the eleventh iteration of loop(), the code hangs in the yield() call immediately after the call to client.publish(). The publish call by the client goes through the full PUBLISH/PUBREC/PUBREL/PUBCOMP exchange (client msg id 12). Next, the broker sends the message back to client (as the client is subscribed to the same topic), again at QoS=2. This too goes through the full PUBLISH/PUBREC/PUBREL/PUBCOMP exchange (broker msg id 11). But the messageArrived() function on the client doesn't get driven this time.
My original thought that we were hitting max_inflight_msgs in the broker may have been a red herring. I've increased this limit to 38 but the code still hangs in yield() on the eleventh iteration of loop. After the hang, the only activity on the Mosquito log is the occasional PINGREQ/PINGRESP exchange.
On 2015-01-28 09:40:36 -0500, Roger Fowler wrote:
BTW, the log shows the activity after the QoS=0 and QoS=1 calls have been excluded from the run
On 2015-01-28 09:47:18 -0500, Roger Fowler wrote:
Actually, the hang is not within yield() itself. I've just confirmed it is stuck in the loop which repeatedly calls yield(), because 'arrivedcount' is never being updated by messageArrived().
On 2015-01-29 12:23:58 -0500, Ian Craggs wrote:
Are you using an ethernet shield?
Does messageArrived get called for the last QoS 2 message?
If you comment out the section that sends and receives QoS 2 messages, leaving QoS 0 or 1 messages being sent and received only, does it go on for longer?
I've been doing some debugging and sometimes things are getting a bit unstable making things tricky. I've used some code I found to display the free memory (http://playground.arduino.cc/Code/AvailableMemory) which was very low (177 bytes). I've reduced the memory use in my test code, but we have to be very careful not to overflow memory in the basic Arduino. If you don't want to send/receive QoS 1 messages, that code can be conditionally compiled out, for example.
On 2015-01-29 13:10:46 -0500, Benjamin Cabé wrote:
(In reply to Ian Craggs from comment # 10)
I've reduced the memory use in my test code, but we
have to be very careful not to overflow memory in the basic Arduino.
I think that is a very good point!
How about getting rid of all the sprintf's in the example code? AFAICT they can all be replaced by Serial.print/Serial.println calls and this would allow to get rid of the printbuffer that would become essentially useless, plus this would also make the binary code a few KBs smaller by not having to link to sprintf (but having to use Serial.print(int) would mitigate this �small� saving: removing sprintf save 1.5K, Serial.print(int) costs 500B...)
Also there are many "almost-duplicated" string that could probably be simplified/factored
On 2015-01-29 18:08:56 -0500, Ian Craggs wrote:
Benjamin,
I wasn't being that careful with the example. I'll remove the sprintfs as you suggest.
Quite a lot of code can also be removed if we don't have a message handler per subscription. I can make that conditionally compiled too.
I can save a few more bytes by optimizing the data types too. Int return types should be one byte for instance.
There are also error checks in the low-level code. These could be conditionally compiled for environments like this where every byte counts.
On 2015-01-30 05:42:12 -0500, Roger Fowler wrote:
(In reply to Ian Craggs from comment # 10)
Hi Ian - thanks for this.
Are you using an ethernet shield?
Yes
Does messageArrived get called for the last QoS 2 message?
No. Last QoS 2 msg sent by client gets full PUBLISH/PUBREC/PUBREL/PUBCOMP exchange. Then the broker issues a QoS 2 msg back to client (as it subscribed to this topic). This also goes through the PUBLISH/PUBREC/PUBREL/PUBCOMP exchange. But messageArrived() doesn't get called on the client after the final PUBCOMP is received.
If you comment out the section that sends and receives QoS 2 messages,
leaving QoS 0 or 1 messages being sent and received only, does it go on for
longer?
There doesn't appear to be any problem with QoS 1 or 2 msgs - the code runs freely as long as QoS 2 is avoided. If QoS 2 msgs are sent then the code hangs on the eleventh iteration of loop(). This is regardless of whether just QoS 2 msgs are being sent or all three types.
I've been doing some debugging and sometimes things are getting a bit
unstable making things tricky. I've used some code I found to display the
free memory (http://playground.arduino.cc/Code/AvailableMemory) which was
very low (177 bytes). I've reduced the memory use in my test code, but we
have to be very careful not to overflow memory in the basic Arduino. If you
don't want to send/receive QoS 1 messages, that code can be conditionally
compiled out, for example.
I'm using an Arduino Mega 2560 so have a little more SRAM to play with. Using the code at that link, the available memory is showing as 5856 bytes.
Cheers, Roger
On 2015-01-30 09:17:10 -0500, Ian Craggs wrote:
After removing the sprintfs, I realized that they contributed to sketch size more than use of RAM. Nevertheless I've reduced my use of RAM enough to make the program run reliably, and I found the problem. QoS 2 message ids weren't being freed up when the PUBREL arrived.
As I write, my own Arduino Uno is up to packet id 1320 :-)
I've pushed a fix to the master branch.
On 2015-01-30 09:21:03 -0500, Roger Fowler wrote:
Ian
That's great news, thanks. I'll download and test when I get home this evening.
Have a good weekend
Roger.
On 2015-01-31 04:37:20 -0500, Roger Fowler wrote:
Looks like it is fixed - no more hangs :-)
Cheers