Comments (17)

ktsaou commented on September 13, 2024

I confirm this. netdata pauses with this:

#0  __wait () at src/thread/__wait.c:14
#1  0x00007f871ce90873 in __lockfile () at src/stdio/__lockfile.c:10
#2  0x00007f871ce92620 in close_file () at src/stdio/__stdio_exit.c:11
#3  0x00007f871ce9266b in __stdio_exit_needed () at src/stdio/__stdio_exit.c:19
#4  0x00007f871cdec8bb in exit () at src/exit/exit.c:32
#5  0x00007f871ce0a27b in netdata_cleanup_and_exit ()
#6  0x00007f871ce0dfd4 in main ()

So, it calls exit(), but then libc waits for a lock. I guess something is not cleaned up properly. Searching for a solution...

from binary-packages.

ktsaou commented on September 13, 2024

I think it is a timing issue on musl-libc (the libc used to build the static netdata).

I added a small delay of 3 seconds while exiting, to allow all background threads of netdata to exit properly. It seems fixed (although, yes, I should find a better solution to synchronize the exit of all threads).

The 1-minute delay on CentOS 6.3 remains, though. netdata exits after 3 seconds, but somehow the init scripts decide to wait for a minute:

18694 pts/0    S+     0:00  |                   \_ /bin/sh /sbin/service netdata stop
18699 pts/0    S+     0:00  |                       \_ /bin/sh /etc/init.d/netdata stop
18704 pts/0    S+     0:00  |                           \_ sleep 60

Wing924 commented on September 13, 2024

@ktsaou
I know it's not a good solution, but my workaround is:

$ diff -u  /etc/init.d/netdata.orig  /etc/init.d/netdata
--- /etc/init.d/netdata.orig	2017-09-25 11:20:06.476164160 +0900
+++ /etc/init.d/netdata	2017-12-27 13:40:04.997113378 +0900
@@ -15,7 +15,7 @@
 DAEMON_PATH=/opt/netdata/usr/sbin
 PIDFILE=/opt/netdata/var/run/$DAEMON.pid
 DAEMONOPTS="-P $PIDFILE"
-STOP_TIMEOUT="60"
+STOP_TIMEOUT="3"

 [ -e /etc/sysconfig/$DAEMON ] && . /etc/sysconfig/$DAEMON

ktsaou commented on September 13, 2024

fixed it.

Wing924 commented on September 13, 2024

@ktsaou
It does not seem to be fixed.

$ sudo sh -c 'time /sbin/service netdata restart'
Stopping netdata...                                        [  OK  ]
Starting netdata...                                        [  OK  ]

real	1m1.224s
user	0m0.012s
sys	0m0.003s

Installed version: netdata-v1.9.0-89-gb4c2b536-x86_64-20180103-052914.gz.run

ktsaou commented on September 13, 2024

This is the init script taking that time.
While this happens, do ps fax and you will see:

18694 pts/0    S+     0:00  |                   \_ /bin/sh /sbin/service netdata stop
18699 pts/0    S+     0:00  |                       \_ /bin/sh /etc/init.d/netdata stop
18704 pts/0    S+     0:00  |                           \_ sleep 60

So, the init script just waits 60 seconds.
If you know how to fix the init script, please submit a PR.

Wing924 commented on September 13, 2024

@ktsaou
This is a part of killproc function:

if checkpid $pid 2>&1; then
   # TERM first, then KILL if not dead
   kill -TERM $pid >/dev/null 2>&1
   usleep 100000
   if checkpid $pid && sleep 1 &&
      checkpid $pid && sleep $delay &&
      checkpid $pid ; then
        kill -KILL $pid >/dev/null 2>&1
        usleep 100000
   fi
fi
checkpid $pid
RC=$?
[ "$RC" -eq 0 ] && failure $"$base shutdown" || success $"$base shutdown"
RC=$((! $RC))

If netdata stops within 1.1s of the SIGTERM, the init script never executes sleep $delay ($delay=60).
I know it's not a good solution, but I can send a PR to shorten the delay from 60 to 3 (the default).
What do you think?
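
To make the timing concrete, here is a rough model of how long the excerpt above blocks, depending on how quickly the daemon exits after SIGTERM. The function name killproc_wait_ms and the millisecond bookkeeping are made up for illustration; only the 100 ms usleep, the sleep 1 and the sleep $delay come from the excerpt, and the time spent in checkpid itself is ignored.

```shell
#!/bin/sh
# Hypothetical model of the waits in the killproc excerpt; not part of the init script.
# $1 = milliseconds the daemon needs to exit after SIGTERM
# $2 = value of $delay in seconds
killproc_wait_ms() {
    exit_ms=$1
    delay_s=$2
    waited=100                                  # usleep 100000 after kill -TERM
    if [ "$exit_ms" -gt "$waited" ]; then
        waited=$((waited + 1000))               # first checkpid still sees it: sleep 1
        if [ "$exit_ms" -gt "$waited" ]; then
            waited=$((waited + delay_s * 1000)) # second checkpid still sees it: sleep $delay
            if [ "$exit_ms" -gt "$waited" ]; then
                waited=$((waited + 100))        # third checkpid: kill -KILL, usleep 100000
            fi
        fi
    fi
    echo "$waited"
}

killproc_wait_ms 50 60     # gone before the first check: 100
killproc_wait_ms 500 60    # gone during sleep 1: 1100
killproc_wait_ms 3000 60   # still alive at 1.1s: 61100
```

In this model, any shutdown that takes longer than 1.1s costs the full $delay, which is consistent with the restart time of roughly one minute reported earlier in the thread.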

ktsaou commented on September 13, 2024

Well, the time netdata needs to exit is subject to the size of the database and the speed of the disks. If we timeout in 3 seconds, data will be lost...
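
One way to reconcile both concerns could be to poll instead of sleeping for the whole timeout: the stop action returns as soon as the process is gone, while a slow shutdown still gets the full 60 seconds to save its database. The sketch below is only an illustration, not the fix that was eventually merged. wait_for_exit is a hypothetical helper, and it assumes kill -0 is an acceptable stand-in for checkpid.

```shell
#!/bin/sh
# Hypothetical helper: wait up to $2 seconds for pid $1 to exit, checking once per second.
# Returns 0 as soon as the process is gone, 1 if it is still alive after the timeout.
wait_for_exit() {
    pid=$1
    timeout=$2
    elapsed=0
    # kill -0 sends no signal; it only reports whether the pid can still be signalled
    while kill -0 "$pid" 2>/dev/null; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

In an init script this might replace the fixed sleep, for example: kill -TERM "$pid"; wait_for_exit "$pid" "$STOP_TIMEOUT" || kill -KILL "$pid".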

Wing924 commented on September 13, 2024

@ktsaou
I found that the tc-qos-helper.sh process does not finish in time:

 6245 ?        Sl     0:00 /opt/netdata/bin/srv/netdata -P /opt/netdata/var/run/netdata.pid
 6256 ?        S      0:00  \_ bash /opt/netdata/usr/libexec/netdata/plugins.d/tc-qos-helper.sh 1
 6264 ?        Z      0:00  \_ [python] <defunct>
 6266 ?        Z      0:00  \_ [apps.plugin] <defunct>

ktsaou commented on September 13, 2024

Could you please post /opt/netdata/var/log/netdata/error.log while netdata exits?

Wing924 commented on September 13, 2024

2018-01-04 17:17:38: netdata INFO  : MAIN : SIGNAL: Received SIGTERM. Cleaning up to exit...
2018-01-04 17:17:38: netdata INFO  : MAIN : EXIT: netdata prepares to exit with code 0...
2018-01-04 17:17:38: netdata INFO  : MAIN : EXIT: stopping master threads...
2018-01-04 17:17:38: netdata INFO  : MAIN : EXIT: Stopping master thread: PLUGIN[proc]
2018-01-04 17:17:38: netdata INFO  : MAIN : EXIT: Stopping master thread: PLUGIN[diskspace]
2018-01-04 17:17:38: netdata INFO  : MAIN : EXIT: Stopping master thread: PLUGIN[cgroup]
2018-01-04 17:17:38: netdata INFO  : MAIN : EXIT: Stopping master thread: PLUGIN[tc]
2018-01-04 17:17:38: netdata INFO  : MAIN : EXIT: Stopping master thread: PLUGIN[idlejitter]
2018-01-04 17:17:38: netdata INFO  : MAIN : EXIT: Stopping master thread: HEALTH
2018-01-04 17:17:38: netdata INFO  : MAIN : EXIT: Stopping master thread: PLUGINSD
2018-01-04 17:17:38: netdata INFO  : MAIN : EXIT: Stopping master thread: WEB_SERVER[multi]
2018-01-04 17:17:38: netdata INFO  : MAIN : EXIT: Stopping master thread: STATSD
2018-01-04 17:17:38: netdata INFO  : MAIN : EXIT: cleaning up the database...
2018-01-04 17:17:38: netdata INFO  : MAIN : Cleaning up database [1 hosts(s)]...
2018-01-04 17:17:38: netdata INFO  : MAIN : Cleaning up database of host '********************************'...
2018-01-04 17:17:38: netdata INFO  : PLUGIN[proc] : PLUGIN[proc]: thread with task id 29286 finished
2018-01-04 17:17:38: netdata INFO  : PLUGIN[diskspace] : PLUGIN[diskspace]: thread with task id 29287 finished
2018-01-04 17:17:38: netdata INFO  : PLUGIN[cgroup] : PLUGIN[cgroup]: thread with task id 29288 finished
2018-01-04 17:17:38: netdata INFO  : PLUGIN[idlejitter] : PLUGIN[idlejitter]: thread with task id 29290 finished
2018-01-04 17:17:38: netdata INFO  : HEALTH : HEALTH: thread with task id 29292 finished
2018-01-04 17:17:38: netdata INFO  : PLUGINSD : PLUGINSD: thread with task id 29293 finished
2018-01-04 17:17:38: netdata INFO  : WEB_SERVER[multi] : WEB_SERVER[multi]: thread with task id 29294 finished
2018-01-04 17:17:38: netdata INFO  : STATSD : STATSD: thread with task id 29295 finished
2018-01-04 17:17:38: netdata INFO  : MAIN : EXIT: freeing database memory...
2018-01-04 17:17:38: netdata INFO  : MAIN : Freeing all memory for host '********************************'...
2018-01-04 17:17:38: netdata INFO  : MAIN : EXIT: removing netdata PID file '/opt/netdata/var/run/netdata.pid'...
2018-01-04 17:17:38: netdata INFO  : MAIN : EXIT: all done - netdata is now exiting - bye bye...
2018-01-04 17:17:38: netdata ERROR : PLUGINSD[apps]: '/opt/netdata/usr/libexec/netdata/plugins.d/apps.plugin' (pid 29306) disconnected after 364 successful data collections (ENDs).
2018-01-04 17:17:39: netdata INFO  : PLUGIN[tc] : PLUGIN[tc]: thread with task id 29289 finished
2018-01-04 17:17:39: netdata ERROR : PLUGINSD[python.d]: '/opt/netdata/usr/libexec/netdata/plugins.d/python.d.plugin' (pid 29304) disconnected after 273 successful data collections (ENDs).
2018-01-04 17:17:41: netdata INFO  : PLUGINSD[charts.d] : PLUGINSD[charts.d]: thread with task id 29301 finished

There is no exit log for tc-qos-helper because it was killed by SIGKILL after 60s.
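
For context, a long-running helper written in bash only exits promptly on SIGTERM if it traps the signal and is not blocked in a foreground command. A minimal sketch of that pattern follows. It is a generic illustration, not the actual tc-qos-helper.sh, and cleanup, run_helper and the 1-second interval are assumptions made for this example.

```shell
#!/bin/bash
# Generic sketch of a helper loop that exits promptly on SIGTERM.
# Not the real tc-qos-helper.sh; all names here are hypothetical.
cleanup() {
    echo "helper: received SIGTERM, exiting" >&2
    exit 0
}

run_helper() {
    trap cleanup TERM
    while true; do
        # Run the sleep in the background and wait for it: the shell defers a
        # trap until the current foreground command finishes, but the wait
        # builtin returns immediately when a trapped signal arrives.
        sleep 1 &
        wait $!
    done
}
```

A script built this way would call run_helper as its last line. With a plain foreground sleep in the loop, the trap would only run once that sleep had finished.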

ktsaou commented on September 13, 2024

well, you are right. I implemented the cleanup and then I just bypassed it.
Making a PR now.

ktsaou commented on September 13, 2024

merged it.
could you please update and check it again?

Wing924 commented on September 13, 2024

@ktsaou
The new static package has not been built. Can you build it?
And can you reopen this issue until it is fixed?

ktsaou commented on September 13, 2024

It is automatic, every night.
But I just did it by hand too: netdata-v1.9.0-127-gcafaf427-x86_64-20180105-002205.gz.run

I'll reopen it if it is not fixed.

Wing924 commented on September 13, 2024

Thank you for the quick build :)
Now it stops in less than 1 second.

$ sudo sh -c 'time service netdata stop'
Stopping netdata...  [  OK  ]

real	0m0.734s
user	0m0.010s
sys	0m0.003s

ktsaou commented on September 13, 2024

ok. so it is fixed.
