
Comments (11)

massi-ang commented on September 24, 2024

To reproduce, you can modify the sample pub-sub to run the following code once a connection has been acquired (using version 1.3.4):

// Assumes `connection` is an established MQTT connection from the sample
const SIZE = 120000;
const d = [];
for (let j = 0; j < SIZE; j++) {
    d.push('a');
}
setInterval(() => { connection.publish('test', JSON.stringify({ msg: d }), 0); }, 20);
setInterval(() => {
    const used = process.memoryUsage().heapUsed / 1024 / 1024;
    console.log(`The script uses approximately ${Math.round(used * 100) / 100} MB`);
}, 1000);

When connecting to Greengrass, this logs:

The script uses approximately 52.79 MB
The script uses approximately 98.75 MB
The script uses approximately 144.13 MB
The script uses approximately 190.45 MB
The script uses approximately 233.85 MB
The script uses approximately 280.58 MB
The script uses approximately 327.43 MB
The script uses approximately 374.23 MB
The script uses approximately 417.31 MB
The script uses approximately 466.4 MB

<--- Last few GCs --->

[12871:0x57ff080]    21723 ms: Mark-sweep (reduce) 492.8 (502.2) -> 492.8 (502.2) MB, 8.7 / 0.0 ms  (average mu = 0.248, current mu = 0.002) last resort GC in old space requested
[12871:0x57ff080]    21731 ms: Mark-sweep (reduce) 492.8 (502.2) -> 492.8 (502.2) MB, 7.8 / 0.0 ms  (average mu = 0.146, current mu = 0.002) last resort GC in old space requested


<--- JS stacktrace --->

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
 1: 0xa63060 node::Abort() [node]
 2: 0x995c57 node::FatalError(char const*, char const*) [node]
 3: 0xc3bc1e v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xc3bf97 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xe04a05  [node]
 6: 0xe16cd1 v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
 7: 0xdda89a v8::internal::Factory::AllocateRaw(int, v8::internal::AllocationType, v8::internal::AllocationAlignment) [node]
 8: 0xdd3da4 v8::internal::FactoryBase<v8::internal::Factory>::AllocateRawWithImmortalMap(int, v8::internal::AllocationType, v8::internal::Map, v8::internal::AllocationAlignment) [node]
 9: 0xdd5ea0 v8::internal::FactoryBase<v8::internal::Factory>::NewRawOneByteString(int, v8::internal::AllocationType) [node]
10: 0x103bf3a v8::internal::String::SlowFlatten(v8::internal::Isolate*, v8::internal::Handle<v8::internal::ConsString>, v8::internal::AllocationType) [node]
11: 0xc4eef9 v8::String::Utf8Length(v8::Isolate*) const [node]
12: 0xa4091a  [node]
13: 0xca876b  [node]
14: 0xca9d1c  [node]
15: 0xcaa396 v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [node]
16: 0x14c8439  [node]
Signal received: -2102827800, errno: 32767
################################################################################
Resolved stacktrace:
################################################################################
0x00007fd2c0b8895b: ?? ??:0
0x00000000000a2213: s_print_stack_trace at module.c:?
0x000000000000f5e0: __restore_rt at sigaction.c:?
0x00007fd2c9cb5277: ?? ??:0
0x00007fd2c9cb6968: ?? ??:0
node() [0xa63071]
node(_ZN4node10FatalErrorEPKcS1_+0) [0x995c57]
node(_ZN2v85Utils16ReportOOMFailureEPNS_8internal7IsolateEPKcb+0x4e) [0xc3bc1e]
node(_ZN2v88internal2V823FatalProcessOutOfMemoryEPNS0_7IsolateEPKcb+0x347) [0xc3bf97]
node() [0xe04a05]
node(_ZN2v88internal4Heap34AllocateRawWithRetryOrFailSlowPathEiNS0_14AllocationTypeENS0_16AllocationOriginENS0_19AllocationAlignmentE+0xf1) [0xe16cd1]
node(_ZN2v88internal7Factory11AllocateRawEiNS0_14AllocationTypeENS0_19AllocationAlignmentE+0x9a) [0xdda89a]
node(_ZN2v88internal11FactoryBaseINS0_7FactoryEE26AllocateRawWithImmortalMapEiNS0_14AllocationTypeENS0_3MapENS0_19AllocationAlignmentE+0x14) [0xdd3da4]
node(_ZN2v88internal11FactoryBaseINS0_7FactoryEE19NewRawOneByteStringEiNS0_14AllocationTypeE+0x50) [0xdd5ea0]
node(_ZN2v88internal6String11SlowFlattenEPNS0_7IsolateENS0_6HandleINS0_10ConsStringEEENS0_14AllocationTypeE+0x18a) [0x103bf3a]
node(_ZNK2v86String10Utf8LengthEPNS_7IsolateE+0x19) [0xc4eef9]
node() [0xa4091a]
node() [0xca876b]
node() [0xca9d1c]
node(_ZN2v88internal21Builtin_HandleApiCallEiPmPNS0_7IsolateE+0x16) [0xcaa396]
node() [0x14c8439]
################################################################################
Raw stacktrace:
################################################################################
/home/ec2-user/environment/device/node_modules/aws-crt/dist/bin/linux-x64/aws-crt-nodejs.node(aws_backtrace_print+0x4b) [0x7fd2c0b8895b]
/home/ec2-user/environment/device/node_modules/aws-crt/dist/bin/linux-x64/aws-crt-nodejs.node(+0xa2213) [0x7fd2c0a88213]
/lib64/libpthread.so.0(+0xf5e0) [0x7fd2ca05b5e0]
/lib64/libc.so.6(gsignal+0x37) [0x7fd2c9cb5277]
/lib64/libc.so.6(abort+0x148) [0x7fd2c9cb6968]
node() [0xa63071]
node(_ZN4node10FatalErrorEPKcS1_+0) [0x995c57]
node(_ZN2v85Utils16ReportOOMFailureEPNS_8internal7IsolateEPKcb+0x4e) [0xc3bc1e]
node(_ZN2v88internal2V823FatalProcessOutOfMemoryEPNS0_7IsolateEPKcb+0x347) [0xc3bf97]
node() [0xe04a05]
node(_ZN2v88internal4Heap34AllocateRawWithRetryOrFailSlowPathEiNS0_14AllocationTypeENS0_16AllocationOriginENS0_19AllocationAlignmentE+0xf1) [0xe16cd1]
node(_ZN2v88internal7Factory11AllocateRawEiNS0_14AllocationTypeENS0_19AllocationAlignmentE+0x9a) [0xdda89a]
node(_ZN2v88internal11FactoryBaseINS0_7FactoryEE26AllocateRawWithImmortalMapEiNS0_14AllocationTypeENS0_3MapENS0_19AllocationAlignmentE+0x14) [0xdd3da4]
node(_ZN2v88internal11FactoryBaseINS0_7FactoryEE19NewRawOneByteStringEiNS0_14AllocationTypeE+0x50) [0xdd5ea0]
node(_ZN2v88internal6String11SlowFlattenEPNS0_7IsolateENS0_6HandleINS0_10ConsStringEEENS0_14AllocationTypeE+0x18a) [0x103bf3a]
node(_ZNK2v86String10Utf8LengthEPNS_7IsolateE+0x19) [0xc4eef9]
node() [0xa4091a]
node() [0xca876b]
node() [0xca9d1c]
node(_ZN2v88internal21Builtin_HandleApiCallEiPmPNS0_7IsolateE+0x16) [0xcaa396]
node() [0x14c8439]

Connecting to AWS IoT Core does not work with SIZE=120000; SIZE must be reduced to 20000.
The output is similar, but the MQTT connection times out:

The script uses approximately 11.83 MB
The script uses approximately 19.09 MB
The script uses approximately 27.92 MB
The script uses approximately 35.35 MB
The script uses approximately 42.44 MB
The script uses approximately 49.53 MB
The script uses approximately 58.26 MB
The script uses approximately 65.55 MB
The script uses approximately 72.75 MB
The script uses approximately 81.33 MB
The script uses approximately 88.62 MB
The script uses approximately 95.7 MB
The script uses approximately 104.43 MB
The script uses approximately 111.72 MB
The script uses approximately 119 MB
The script uses approximately 127.58 MB
The script uses approximately 134.34 MB
The script uses approximately 142.03 MB
The script uses approximately 150.62 MB
The script uses approximately 157.91 MB
The script uses approximately 165.19 MB
The script uses approximately 173.76 MB
AWS IoT MQTT interrupt
CrtError: libaws-c-mqtt: AWS_ERROR_MQTT_TIMEOUT, Time limit between request and response has been exceeded.
    at MqttClientConnection._on_connection_interrupted (/home/ec2-user/environment/device/node_modules/aws-crt/dist/native/mqtt.js:336:32)
    at /home/ec2-user/environment/device/node_modules/aws-crt/dist/native/mqtt.js:114:113 {
  error: 5129,
  error_code: 5129,
  error_name: 'AWS_ERROR_MQTT_TIMEOUT'
}
N-API call failed: napi_call_function(env, this_ptr, function, argc, argv, NULL)
    @ /codebuild/output/src902820455/src/aws-crt-nodejs/source/module.c:368: napi_pending_exception
Calling (error_code) => { this._on_connection_interrupted(error_code); }
Error: libaws-c-mqtt: AWS_ERROR_MQTT_TIMEOUT, Time limit between request and response has been exceeded.
Stack:
Error: libaws-c-mqtt: AWS_ERROR_MQTT_TIMEOUT, Time limit between request and response has been exceeded.
    at MqttClientConnection._on_connection_interrupted (/home/ec2-user/environment/device/node_modules/aws-crt/dist/native/mqtt.js:336:32)
    at /home/ec2-user/environment/device/node_modules/aws-crt/dist/native/mqtt.js:114:113
N-API call failed: aws_napi_dispatch_threadsafe_function( env, binding->on_connection_interrupted, NULL, on_interrupted, num_params, params)
    @ /codebuild/output/src902820455/src/aws-crt-nodejs/source/mqtt_client_connection.c:96: napi_pending_exception
Fatal error condition occurred in /codebuild/output/src902820455/src/aws-crt-nodejs/source/mqtt_client_connection.c:96: aws_napi_dispatch_threadsafe_function( env, binding->on_connection_interrupted, NULL, on_interrupted, num_params, params)
Exiting Application
################################################################################
Resolved stacktrace:
################################################################################
0x00007f688c34d95b: ?? ??:0
0x00007f688c343423: ?? ??:0
0x00000000000a4111: s_on_connection_interrupted_call at mqtt_client_connection.c:?
node() [0xa32868]
node() [0x144c979]
node(uv_run+0x2f0) [0x14452f0]
node(_ZN4node13SpinEventLoopEPNS_11EnvironmentE+0x135) [0x9b5ed5]
node(_ZN4node16NodeMainInstance3RunEPKNS_16EnvSerializeInfoE+0x170) [0xaa44d0]
node(_ZN4node5StartEiPPc+0x10a) [0xa2fe0a]
0x00007f68907e9445: ?? ??:0
node() [0x9b2ecc]
################################################################################
Raw stacktrace:
################################################################################
/home/ec2-user/environment/device/node_modules/aws-crt/dist/bin/linux-x64/aws-crt-nodejs.node(aws_backtrace_print+0x4b) [0x7f688c34d95b]
/home/ec2-user/environment/device/node_modules/aws-crt/dist/bin/linux-x64/aws-crt-nodejs.node(aws_fatal_assert+0x43) [0x7f688c343423]
/home/ec2-user/environment/device/node_modules/aws-crt/dist/bin/linux-x64/aws-crt-nodejs.node(+0xa4111) [0x7f688c24f111]
node() [0xa32868]
node() [0x144c979]
node(uv_run+0x2f0) [0x14452f0]
node(_ZN4node13SpinEventLoopEPNS_11EnvironmentE+0x135) [0x9b5ed5]
node(_ZN4node16NodeMainInstance3RunEPKNS_16EnvSerializeInfoE+0x170) [0xaa44d0]
node(_ZN4node5StartEiPPc+0x10a) [0xa2fe0a]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f68907e9445]
node() [0x9b2ecc]
Signal received: -1870884220, errno: 32616
################################################################################
Resolved stacktrace:
################################################################################
0x00007f688c34d95b: ?? ??:0
0x00000000000a2213: s_print_stack_trace at module.c:?
0x000000000000f5e0: __restore_rt at sigaction.c:?
0x00007f68907fd277: ?? ??:0
0x00007f68907fe968: ?? ??:0
0x0000000000198428: aws_fatal_assert at ??:?
0x00000000000a4111: s_on_connection_interrupted_call at mqtt_client_connection.c:?
node() [0xa32868]
node() [0x144c979]
node(uv_run+0x2f0) [0x14452f0]
node(_ZN4node13SpinEventLoopEPNS_11EnvironmentE+0x135) [0x9b5ed5]
node(_ZN4node16NodeMainInstance3RunEPKNS_16EnvSerializeInfoE+0x170) [0xaa44d0]
node(_ZN4node5StartEiPPc+0x10a) [0xa2fe0a]
0x00007f68907e9445: ?? ??:0
node() [0x9b2ecc]
################################################################################
Raw stacktrace:
################################################################################
/home/ec2-user/environment/device/node_modules/aws-crt/dist/bin/linux-x64/aws-crt-nodejs.node(aws_backtrace_print+0x4b) [0x7f688c34d95b]
/home/ec2-user/environment/device/node_modules/aws-crt/dist/bin/linux-x64/aws-crt-nodejs.node(+0xa2213) [0x7f688c24d213]
/lib64/libpthread.so.0(+0xf5e0) [0x7f6890ba35e0]
/lib64/libc.so.6(gsignal+0x37) [0x7f68907fd277]
/lib64/libc.so.6(abort+0x148) [0x7f68907fe968]
/home/ec2-user/environment/device/node_modules/aws-crt/dist/bin/linux-x64/aws-crt-nodejs.node(+0x198428) [0x7f688c343428]
/home/ec2-user/environment/device/node_modules/aws-crt/dist/bin/linux-x64/aws-crt-nodejs.node(+0xa4111) [0x7f688c24f111]
node() [0xa32868]
node() [0x144c979]
node(uv_run+0x2f0) [0x14452f0]
node(_ZN4node13SpinEventLoopEPNS_11EnvironmentE+0x135) [0x9b5ed5]
node(_ZN4node16NodeMainInstance3RunEPKNS_16EnvSerializeInfoE+0x170) [0xaa44d0]
node(_ZN4node5StartEiPPc+0x10a) [0xa2fe0a]
/lib64/libc.so.6(__libc_start_main+0xf5) [0x7f68907e9445]
node() [0x9b2ecc]

If connection.publish is commented out, memory stays constant.
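One way to keep memory bounded in a repro like the one above is to apply backpressure instead of publishing on a fixed 20 ms timer: await each publish before starting the next, so each payload can be released as soon as the connection completes it. This is only a sketch; it assumes `connection.publish` returns a Promise (as the aws-crt MqttClientConnection does), and `publishLoop` is an illustrative helper, not part of the library.

```javascript
// Sketch: bound the number of in-flight payloads to one by awaiting each
// publish, rather than firing a new publish every 20 ms regardless of
// whether earlier ones have completed.
async function publishLoop(connection, topic, makePayload, count) {
    for (let i = 0; i < count; i++) {
        // Waiting for completion lets the previous payload be freed before
        // the next one is allocated.
        await connection.publish(topic, makePayload(i), 0);
    }
}
```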

from aws-crt-nodejs.

accnops commented on September 24, 2024

I ran this in the Docker container node:12.18.0-slim.

massi-ang commented on September 24, 2024

The issue is that the publish method creates a callback closure (function on_publish(packet_id: number, error_code: number)) which does not allow the Node GC to free the memory assigned to the payload.
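The retention pattern being described can be sketched as follows. This is an illustrative model, not the library's actual code: a map of in-flight completion callbacks keeps each closure, and therefore the payload it captured, alive until the callback fires and the entry is removed.

```javascript
// Hypothetical sketch of the leak: each closure captures its payload, and
// the `inflight` map keeps the closure (and payload) reachable until the
// entry is deleted. If entries are never deleted, payloads are never freed.
const inflight = new Map();
let nextPacketId = 1;

function publish(payload, onComplete) {
    const packetId = nextPacketId++;
    inflight.set(packetId, (errorCode) => {
        onComplete(packetId, errorCode);
        // Fix: remove the entry so the closure and its payload can be GC'd.
        inflight.delete(packetId);
    });
    return packetId;
}

const id = publish('x'.repeat(1000), () => {});
inflight.get(id)(0);         // simulate the completion callback firing
console.log(inflight.size);  // 0: the entry, and the payload, are released
```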

DavidOgunsAWS commented on September 24, 2024

Closing this in favor of related PR: https://www.github.com/awslabs/aws-crt-nodejs/pull/155 to run the build with tests up front. I've rebased the work starting with these changes. Thanks for your help!

DavidOgunsAWS commented on September 24, 2024

I intended to close the PR, not this issue.

laurentva commented on September 24, 2024

Hey @DavidOgunsAWS, the memory leak is still present. I've taken over from @accnops, and I thought this memory leak had been fixed in a new release. We're using version 1.3.8 of your library, as we thought the leak was fixed from version 1.3.6 onwards.

I should elaborate a bit on my use case: quite a lot of publishes currently happen per second, and we want to send at least 500 packets per second in the next deliverable. In Arthur's post we were still only at 100. As stated, each packet is 700-900 kB. This is a naive implementation that just sends stringified JSON, so we can still save bandwidth by packing the data.

I've tried a few approaches to handling this MQTT connection over the past few days, and unfortunately none has been stable under this load.

Aggregating the data

When I saw that the data was being sent at effectively 500 Hz with no aggregation, and the CPU was running above 100%, the first thing I did was add aggregation to the stream, with configurable parameters for either a timeout or a maximum number of aggregated packets before a packet is published.
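The aggregation described above can be sketched as a small batcher that flushes either when a maximum count is reached or when a timeout expires, whichever comes first. All names here are illustrative, not from the aws-crt library.

```javascript
// Sketch of a timeout-or-count aggregator: buffer incoming packets and call
// `flush` with the batch when `maxCount` packets accumulate or `timeoutMs`
// elapses since the first buffered packet.
class Aggregator {
    constructor(maxCount, timeoutMs, flush) {
        this.maxCount = maxCount;
        this.timeoutMs = timeoutMs;
        this.flush = flush;    // called with an array of buffered packets
        this.buffer = [];
        this.timer = null;
    }

    add(packet) {
        this.buffer.push(packet);
        if (this.buffer.length >= this.maxCount) {
            this.flushNow();
        } else if (this.timer === null) {
            // Start the timeout when the first packet of a batch arrives.
            this.timer = setTimeout(() => this.flushNow(), this.timeoutMs);
        }
    }

    flushNow() {
        if (this.timer !== null) {
            clearTimeout(this.timer);
            this.timer = null;
        }
        if (this.buffer.length > 0) {
            const batch = this.buffer;
            this.buffer = [];
            this.flush(batch);
        }
    }
}
```

The batched array would then be stringified and published as a single MQTT message, trading latency (up to timeoutMs) for fewer publish calls.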

This reduced CPU usage a lot, to about 50%, when I chose to aggregate 5 packets. However, the MQTT connection still crashes after a few minutes with sudden disconnect errors (the same errors that occur when sending a packet that is too large). Additionally, after a crash the process systematically restarts after ~45 seconds and doesn't seem to recover. It only survives for a few minutes if I turn off the data stream for the first minute or reduce the packet rate in my environment.

After experimenting with the number of packets to find a crash-free aggregation configuration, I tried a hypothetical smaller packet size of about 70 kB, working from the hypothesis that we should eventually use heavily optimized packed data to achieve a high packet rate. This was more stable, and that stability let us verify that at normal CPU load (~50%) we still see a memory leak. The leak leads to increased CPU consumption, and eventually the CPU usage becomes too high and the process crashes. My hope is that this is the main underlying issue.

Summary of issues

  • The CPU seems to suffer heavily when publishing: my Node process went above 100% CPU at 500 publishes per second. So there is a bottleneck, apart from the network itself, on the number of publishes.
  • The MQTT connection seems to crash on large packets, which means a large aggregated packet can't be sent over MQTT when the individual packet sizes are large.

Thus there is a fundamental issue here: we're capped both on how often we can send data and on its size. Additionally, playing with the parameters, I haven't been able to get a stable connection going at 500 packets per second. Do you think there is a maximum throughput for the MQTT connection?

  • Crash triggered immediately when sending a lot of data after connecting.

  • Memory leak when continuously publishing data, leading to ever-increasing CPU usage and ultimately crashing the process. Is this actually fixed? It's easy to reproduce.

Conclusion

Our next deliverable entails sending 500 packets per second. The next milestone entails 1300 packets per second, and the final product needs about 4500 packets per second. Even after optimizing the data size, I'm not sure we can achieve those numbers with this library. What is your advice on this issue and on the intended usage of your API? What material could I provide to help you fix these breaking issues?

massi-ang commented on September 24, 2024

@laurentva Hi Laurent, if you are using AWS IoT Core as your MQTT broker, you should be aware that a single connection supports a maximum of 100 msg/sec at 5 kB per message, i.e. 512 kB/sec of throughput. If you have sustained traffic at 100 msg/sec / 512 kB/sec or more, I would suggest looking into alternative services, such as Amazon Kinesis Data Streams.
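A quick back-of-the-envelope check shows how far the stated workload is from that per-connection limit, assuming the 512 kB/sec figure above:

```javascript
// Compare the required throughput against the stated per-connection limit.
const limitKBps = 512;      // per-connection limit stated above
const packetsPerSec = 500;  // next deliverable's packet rate
const sizeKB = 700;         // lower bound of the 700-900 kB packet size
const neededKBps = packetsPerSec * sizeKB;
console.log(`${neededKBps} kB/s needed vs ${limitKBps} kB/s allowed`);
// Even the optimized ~70 kB packets at 500/s would need 35000 kB/s,
// roughly 68x the per-connection limit.
```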

laurentva commented on September 24, 2024

Hey! After a conversation with Massimiliano, he told me I should give you more information about my use case. The AWS IoT MQTT connection is between an on-premise server and a Greengrass edge server, which means the 512 kB/s per-connection bandwidth limit should not apply.

massi-ang commented on September 24, 2024

Any progress on the resolution for this issue?

laurentva commented on September 24, 2024

We replaced aws-crt-nodejs with the generic mqtt package and have moved on 😄 thanks again for the recommendation.

bretambrose commented on September 24, 2024

We believe this, and other similar issues, were fixed in v1.9.1.
