Comments (17)
I have traced down the problem to lwmqtt_write_to_network()
in client.c
. You have a while-loop in there to send a message in parts if necessary. You interrupt that loop based on the timeout given to your esp_mqtt_init()
function. The time to a timeout you pass to client->network_write()
is based on the remaining time which doesn't make much sense. This means if client->network_write()
times out there is no remaining time so the while-loop will never loop (it will interrupt immidiatly with LWMQTT_NETWORK_TIMEOUT
)and try to send the remaining data in another try.
To fix that and actually enable multiple partial writes the timeout time given to client->network_write
has to be shorter than the time used to calculate the remaining_time
in lwmqtt_write_to_network()
.
I just changed the line
lwmqtt_err_t err = client->network_write(client->network, client->write_buf + offset + written, len - written, &partial_write, (uint32_t)remaining_time / 15);
as my timeout time is 15s.
I will test this and wether it fixes the problem, so far it seems promising.
from esp-mqtt.
Every call into lwmqtt resets the command timeout and passes it down to nested function calls to ensure an action is cancelled in case the timeout has been reached. In esp-mqtt the command timeout is defined globally and passed into the lwmqtt function calls.
If you initialized esp-mqtt with a command timeout of 15s it will reset the timer when you call esp_mqtt_publish
. After the packet has been encoded it will call client->network_write
the first time with the 15s as the initial timeout. As the client->network_write
my return early without writing all the data, it will call it repeatedly until the timeout has been reached. In case of a timeout the operation is aborted.
Since we assume that the command timeout is reasonably high and client->network_write
quite quick, there should not be an issue with timeouts if the connection is healthy. However, in this scenario the data might overwhelm the modem and cause a delay that might trigger the timeout.
Did you try a higher command timeout and keep alive?
from esp-mqtt.
As the client->network_write my return early without writing all the data, it will call it repeatedly until the timeout has been reached. In case of a timeout the operation is aborted.
No. If client->network_write
returns it has a) sent all data or b) timed out and only sent a part of the data or nothing at all. In case b) 15s have allready passed so there will be no repeated call to client->network_write
.
See http://pubs.opengroup.org/onlinepubs/007908799/xns/getsockopt.html
SO_SNDTIMEO
Reports the timeout value specifying the amount of time that an output function blocks because flow control prevents data from being sent. If a send operation has blocked for this time, it returns with a partial count or with errno set to [EAGAIN] or [EWOULDBLOCK] if no data were sent. The default for this option is zero, which indicates that a send operation will not time out. The option stores a timeval structure. Note that not all implementations allow this option to be retrieved.
I verified that behaviour by changing lwmqtt_write_to_network
to:
static lwmqtt_err_t lwmqtt_write_to_network(lwmqtt_client_t* client, size_t offset, size_t len) {
// prepare counter
size_t written = 0;
ESP_LOGI("client.c_write", "lwmqtt sending data..."); // DEBUG
// write while data is left
while (written < len) {
// check remaining time
int32_t remaining_time = client->timer_get(client->command_timer);
ESP_LOGI("client.c_write", "...remaining time %d", remaining_time); // DEBUG
if (remaining_time <= 0) {
return LWMQTT_NETWORK_TIMEOUT;
}
// write
size_t partial_write = 0;
lwmqtt_err_t err = client->network_write(client->network, client->write_buf + offset + written, len - written, &partial_write, (uint32_t)remaining_time / 15);
if (err != LWMQTT_SUCCESS) {
return err;
}
// increment counter
written += partial_write;
ESP_LOGI("client.c_write", "...bytes written %d/%d", written, len); // DEBUG
}
return LWMQTT_SUCCESS;
}
And esp_lwmqtt_network_write
to:
lwmqtt_err_t esp_lwmqtt_network_write(void* ref, uint8_t* buffer, size_t len, size_t* sent, uint32_t timeout) {
// cast network reference
esp_lwmqtt_network_t* n = (esp_lwmqtt_network_t*)ref;
// set timeout
struct timeval t = {.tv_sec = timeout / 1000, .tv_usec = (timeout % 1000) * 1000};
int rc = lwip_setsockopt_r(n->socket, SOL_SOCKET, SO_SNDTIMEO, (char*)&t, sizeof(t));
if (rc < 0) {
return LWMQTT_NETWORK_FAILED_WRITE;
}
// write to socket
int bytes = lwip_write_r(n->socket, buffer, len);
ESP_LOGI("esp_lwmqtt", "network write errno: %d", errno);
if (bytes < 0 && errno != EAGAIN) {
return LWMQTT_NETWORK_FAILED_WRITE;
} else if (errno != EAGAIN) {
// increment counter
*sent += bytes;
} else {
vTaskDelay(200 / portTICK_PERIOD_MS);
}
return LWMQTT_SUCCESS;
}
That revealed another little mistake as lwip_write_r
returns -1
when it fails to send any data and errno == EAGAIN
. That causes written
to count down if that case isn't catched like in the shown modified code. The vTaskDelay
is ugly and something similar should probably be achieved with select
.
And yes, extending the timeout and the keep alive to 60s each doesn't help.
from esp-mqtt.
The write
system call may return early with not all data written:
On success, the number of bytes written is returned (zero indicates nothing was written). It is not an error if this number is smaller than the number of bytes requested; this may happen for example because the disk device was filled. (http://man7.org/linux/man-pages/man2/write.2.html)
While this might happen on Linux it does not necessarily happen on the ESP.
However the other thing really seems to be a bug. We should definitely check if the returned value is below zero and not decrease the total amount. I'll fix it in lwmqtt and esp-mqtt. Thanks for noticing.
from esp-mqtt.
What do you mean by "early"? I think you don't get my point. The write
function can return without having sent anything or only a part of the message, yes. But if it does so it does so after a timeout specified by SO_SNDTIMEO
. You set this timeout to be the same time lwmqtt_write_to_network
uses to determine wether it should retry a send or not. This means the check
if (remaining_time <= 0) {
return LWMQTT_NETWORK_TIMEOUT;
}
will always trigger if there has only been a partial or failed write, which in return means there will never be a retry in that while-loop.
from esp-mqtt.
The timeout is not there to perform any retries. The check is there to stop writing if the operation takes more time that has been granted. In other words: If something takes too long, we want to stop and eventually reestablish the connection and try again. The higher you set the command timeout the more time you grant to finish reads and writes in time.
from esp-mqtt.
Than why even do that loop?
// write while data is left
while (written < len) {...}
from esp-mqtt.
Because on Linux write might not send all data at once and need multiple calls. Depending on the OS the kernel might have a smaller output buffer than the data you're about to send. Thus, you need to call write multiple times to get all the data through. The loop tracks the amount of bytes written and stops if all has been written.
from esp-mqtt.
Yes. But between each retry to send the remaining data the timespan is SO_SNDTIMEO
. The library sets SO_SNDTIMEO
to the same 15s (in my case) as the command_timer
that times out the retries.
size_t written = 0;
// write while data is left
while (written < len) {
*/ 2. now we loop to try to send the rest of the data but before that we check wether we timed out
*/ as the call to write allready took 15s we whill always timeout here to and not retry
// check remaining time
int32_t remaining_time = client->timer_get(client->command_timer);
if (remaining_time <= 0) {
return LWMQTT_NETWORK_TIMEOUT;
}
// write
size_t partial_write = 0;
*/ 1. this call takes 15s before it returns if it only sends part of the data...
*/ (as remaining_time will be 15s at the start)
lwmqtt_err_t err = client->network_write(client->network, client->write_buf + offset + written, len - written, &partial_write, (uint32_t)remaining_time);
if (err != LWMQTT_SUCCESS) {
return err;
}
// increment counter
written += partial_write;
}
The command_timer
should take longer to timeout the while loop than what is defined as timeout in SO_SNDTIMEO
. The command_timer
is based on the time defined in timeout
and so is SO_SNDTIMEO
. That means timeout/timeout = 1
which is the maximum partial writes that could happen. If SO_SNDTIMEO
would for example be only 1s while timeout for the command_timer
would stay at 15s that would be 15 maximum partial writes before the publish-try completly times out.
I hope you see what I mean now as I fear we could otherwise keep moving in circles for hours... :D
from esp-mqtt.
There is no difference in giving the client->network_write
a smaller timeout: It will just timeout earlier. Yes the loop would then retry the operation, but this has the same effect as giving it the whole timeout from the beginning. Either the underlying write is able to write the bytes out in time or not. I can't see why this would make a difference.
An example: You need to bake 10 cakes. I give you the task and say: "Come back to me after and hour if you didn't already bake all the cakes". If you manage to bake all cakes in time, all is good. If you're to slow then you reach the timeout and come back to me. Now, there is no difference in giving you 6 times the task to bake 10 cakes in 10 minutes. Either you're fast enough or you'r to slow. ;)
from esp-mqtt.
I see. Now we are on the same page and what you say makes sense and fits into my understanding. And I now understand that there could be a thrid case in which the write only writes part of the data but returns before it times out, thats for the loop is for. Sadly that still doesn't fix my issue. I will update to your latest changes and do some more digging.
from esp-mqtt.
Ok cool. But, Is your issue not more related to memory management? I mean 37K is a lot of data for the esp32...
from esp-mqtt.
No. The memory management problem I had described in #26 is working now. And with your fix 8fd0f16 I am able to check wether the task has started and if not (as not enough memory was available) I just TaskDelay
a bit to wait for the IDLE task to free memory, as there should be enough memory, just not freed yet, and then try to start the task again. To double check I call heap_caps_get_largest_free_block(MALLOC_CAP_8BIT)
and print its result alot. I'm pretty much always above 15kB.
The problem has to be somewhat connection related as I also have problems connecting to the broker at times. Lwmqtt errors -6 and -9. In Wireshark there is alot going on and I'm yet unable to get any solid structure into it to better describe the problem. Trying it anyway I see a few errors:
- Sometimes I can see lwip trying to send something and it never reaches my computer (can't see the package in Wireshark). That means the socket is now "full" (wont accept any more data, program keeps running) and can't sent anything else... the timeout happens -4. The client then disconnects and can't connect again because the socket is still blocked and the client can't send a MQTT connection request.
- My mosquitto broker sometime (randomly?!, but mostly inbetween multiple TCP packages of one message) sends a TCP-RST to the ESP and the ESP doesn't react with a new SYN as expected (in my TCP/IP understanding). That causes the whole connection to not work anymore... hard reset needed.
- MQTT client starts to send data. Wireshark shows the first few ~1490byte TCP packages of the image but then a flood of retransmissions messes everything up.
- Can't connect to mosquitto... The sent MQTT connection request isn't shown in Wireshark. Wireshark only prompts "tcp previous segment not captured". Lwmqtt errors -6 and -9 mostly.
Errors 1. and 3. don't happen if I only send a timestamp which is so small, it doesn't have to be split into multiple TCP packages.
Most of it sounds like problems with the physical connection to me, but if I run a simple program on the ESP that just sends random strings (no camera, no MQTT) via TCP (using the write
function from sockets.h
) I don't have any connectivity problems whatsoever.
Best thing... sometimes there are like 5minutes where everything is working like a charm no problems at all...
I know this is probably very confusing but I'm pretty much unable to structure everything that's going wrong at the moment...
from esp-mqtt.
That is quite a weird problem indeed. One thing you could try that I can think of is changing lwmqtt_write_to_network
to call client->network_write
multiple times with buffers up to 1024 bytes or just below the MTU (usually 1400 or 1500 bytes):
size_t write_len = len - written;
if (length > 1024) {
length = 1024;
}
lwmqtt_err_t err = client->network_write(client->network,
client->write_buf + offset + written, write_len, &partial_write,
(uint32_t)remaining_time);
Maybe lwip is doing something weird under the hood if it is called with big buffers.
from esp-mqtt.
Doing that would mean I will have to reassemble the data on the other side by myself? If I really need to go that route I will just send multiple MQTT-publishes.
At the moment I'm trying to trace down the problems by enabling the debug-logs from lwip. In combination with wireshark it seems like there are indeed a lot of packages dropping. Lwip correctly tries to resent them and sometimes they reach there goal... sometimes not. The retries happen in a somewhat wierd and slow pattern so eventually it just times out after a while.
At this point my problem is still unsolved but I'm pretty sure it has nothing to do with your library so I will close this issue.
from esp-mqtt.
If anyone may end up here or you want to note it in your README. I found a solution for my problem which completly solves my connection issues.
As stated here espressif/esp-mqtt#48 for larger buffers sent via lwip #define CONFIG_USE_ONLY_LWIP_SELECT 1
needs to be set. In menuconfig it can be found at "Component Setting" -> "LWIP" -> "Support LWIP socket select() only".
from esp-mqtt.
Thanks for figuring it out. I added a note to the readme. I hope this gets resolved soon.
from esp-mqtt.
Related Issues (20)
- Reconnect after lost connection HOT 6
- include string.h missing - memset undefined
- MQTT connection works, publish fails - v0.4.4 OK, v 0.5.1 FAILS HOT 15
- Initializing NVS before WIFI HOT 3
- branch idf3: 2nd esp_mqtt_publish(QOS1) always fails HOT 5
- Feature: add event ESP_MQTT_STATUS_STOPPED HOT 2
- Connection loss after a few seconds HOT 10
- Max library MQTT message recieve size? HOT 3
- Installation help HOT 5
- Security features HOT 3
- example HOT 2
- does this ever vTaskDelay? HOT 4
- Error -6 and -13 on sending larger (up to 37kB) chunks of data HOT 4
- Query: esp_mqtt_start API HOT 1
- How does it compare to "official" esp-mqtt ? HOT 1
- Support multiple instances in parallel HOT 1
- keep getting this error dont know what to do. 11:54:16.549 -> Scan start 11:54:16.549 -> E (74130) wifi:init nvs: failed, ret=101 11:54:16.549 -> Scan done 11:54:16.549 -> -2 networks found 11:54:16.549 -> Nr | SSID | RSSI | CH | Encryption
- old version of lwmqtt HOT 2
- esp_mqtt: lwmqtt_connect: -5 with mosquitto broker HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from esp-mqtt.