Comments (17)
I can confirm this. netdata hangs on exit with this backtrace:
#0 __wait () at src/thread/__wait.c:14
#1 0x00007f871ce90873 in __lockfile () at src/stdio/__lockfile.c:10
#2 0x00007f871ce92620 in close_file () at src/stdio/__stdio_exit.c:11
#3 0x00007f871ce9266b in __stdio_exit_needed () at src/stdio/__stdio_exit.c:19
#4 0x00007f871cdec8bb in exit () at src/exit/exit.c:32
#5 0x00007f871ce0a27b in netdata_cleanup_and_exit ()
#6 0x00007f871ce0dfd4 in main ()
So, it calls exit(), but then libc waits for a lock. I guess something is not cleaned up properly. Searching for a solution...
from binary-packages.
I think it is a timing issue on musl-libc (the libc used to build the static netdata).
I added a small delay of 3 seconds while exiting, to allow all background threads of netdata to exit properly. It seems fixed (although, yes, I should find a better solution to synchronize the exit of all threads).
The 1 minute delay on CentOS 6.3 remains, though. netdata exits after 3 seconds, but somehow the init scripts decide to wait for a minute:
18694 pts/0 S+ 0:00 | \_ /bin/sh /sbin/service netdata stop
18699 pts/0 S+ 0:00 | \_ /bin/sh /etc/init.d/netdata stop
18704 pts/0 S+ 0:00 | \_ sleep 60
from binary-packages.
@ktsaou
I know it is not a good solution, but my workaround is:
$ diff -u /etc/init.d/netdata.orig /etc/init.d/netdata
--- /etc/init.d/netdata.orig 2017-09-25 11:20:06.476164160 +0900
+++ /etc/init.d/netdata 2017-12-27 13:40:04.997113378 +0900
@@ -15,7 +15,7 @@
DAEMON_PATH=/opt/netdata/usr/sbin
PIDFILE=/opt/netdata/var/run/$DAEMON.pid
DAEMONOPTS="-P $PIDFILE"
-STOP_TIMEOUT="60"
+STOP_TIMEOUT="3"
[ -e /etc/sysconfig/$DAEMON ] && . /etc/sysconfig/$DAEMON
from binary-packages.
fixed it.
from binary-packages.
@ktsaou
It does not seem to be fixed.
$ sudo sh -c 'time /sbin/service netdata restart'
Stopping netdata... [ OK ]
Starting netdata... [ OK ]
real 1m1.224s
user 0m0.012s
sys 0m0.003s
Installed version: netdata-v1.9.0-89-gb4c2b536-x86_64-20180103-052914.gz.run
from binary-packages.
This is the init script taking that time.
While this happens, run ps fax and you will see:
18694 pts/0 S+ 0:00 | \_ /bin/sh /sbin/service netdata stop
18699 pts/0 S+ 0:00 | \_ /bin/sh /etc/init.d/netdata stop
18704 pts/0 S+ 0:00 | \_ sleep 60
So, the init script just waits 60 seconds.
If you know how to fix the init script, please submit a PR.
from binary-packages.
@ktsaou
This is a part of killproc function:
if checkpid $pid 2>&1; then
    # TERM first, then KILL if not dead
    kill -TERM $pid >/dev/null 2>&1
    usleep 100000
    if checkpid $pid && sleep 1 &&
       checkpid $pid && sleep $delay &&
       checkpid $pid ; then
        kill -KILL $pid >/dev/null 2>&1
        usleep 100000
    fi
fi
checkpid $pid
RC=$?
[ "$RC" -eq 0 ] && failure $"$base shutdown" || success $"$base shutdown"
RC=$((! $RC))
If we can stop netdata within 1.1s, the init script won't execute sleep $delay ($delay=60).
I know it is not a good solution, but I can send a PR to shorten the delay from 60 to 3 (the default).
What do you think?
from binary-packages.
Well, the time netdata needs to exit is subject to the size of the database and the speed of the disks. If we timeout in 3 seconds, data will be lost...
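One way to reconcile a long timeout with a fast return is to poll instead of doing a single fixed sleep. The following is only a sketch, not the real killproc or the netdata init script: the names stop_with_polling and is_running are hypothetical, and the zombie check assumes the Linux /proc layout. It sends SIGTERM, checks every 0.1s whether the process is gone, and sends SIGKILL only if the full timeout elapses.

```shell
# Hypothetical helpers, not part of /etc/init.d/functions or netdata.

# True while the PID exists and is not a zombie (assumes Linux /proc).
is_running() {
    local stat state
    [ -r "/proc/$1/stat" ] || return 1
    read -r stat 2>/dev/null < "/proc/$1/stat" || return 1
    state=${stat##*) }          # drop "pid (comm) ", leaving the state letter first
    case $state in
        Z*) return 1 ;;         # a zombie has already exited
    esac
    return 0
}

# stop_with_polling PID [TIMEOUT_SECONDS]
# Returns 0 if the process exited after SIGTERM, 1 if SIGKILL was needed.
stop_with_polling() {
    local pid=$1 timeout=${2:-60}
    local ticks=$((timeout * 10))

    kill -TERM "$pid" 2>/dev/null || return 0   # nothing to stop

    while [ "$ticks" -gt 0 ]; do
        is_running "$pid" || return 0           # exited within the timeout
        sleep 0.1
        ticks=$((ticks - 1))
    done

    kill -KILL "$pid" 2>/dev/null               # last resort after the timeout
    return 1
}
```

With this shape, a call such as stop_with_polling "$pid" 60 would still allow up to 60 seconds for a large database to be saved, but it returns as soon as the process has actually exited.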
from binary-packages.
@ktsaou
I found that the tc-qos-helper.sh process does not finish in time:
6245 ? Sl 0:00 /opt/netdata/bin/srv/netdata -P /opt/netdata/var/run/netdata.pid
6256 ? S 0:00 \_ bash /opt/netdata/usr/libexec/netdata/plugins.d/tc-qos-helper.sh 1
6264 ? Z 0:00 \_ [python] <defunct>
6266 ? Z 0:00 \_ [apps.plugin] <defunct>
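To check for this state from a script rather than by eye, something like the following could be used. It is a sketch only: children_of is a hypothetical helper, and it assumes the listed processes are direct children of the netdata PID (6245 in the listing above), which the flattened tree suggests but does not prove. It reads "PID PPID STAT COMMAND" rows and reports which children are still running and which are defunct.

```shell
# Hypothetical helper: reads "PID PPID STAT COMMAND" rows on stdin and prints
# each child of the given parent PID, labelling zombie (Z state) children as
# "defunct" and all others as "running".
children_of() {
    awk -v parent="$1" '
        $2 == parent {
            state = ($3 ~ /^Z/) ? "defunct" : "running"
            print $1, state, $4
        }'
}
```

With procps ps, it could be fed like this: ps -eo pid=,ppid=,stat=,comm= | children_of 6245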
from binary-packages.
Could you please post /opt/netdata/var/log/netdata/error.log while netdata exits?
from binary-packages.
2018-01-04 17:17:38: netdata INFO : MAIN : SIGNAL: Received SIGTERM. Cleaning up to exit...
2018-01-04 17:17:38: netdata INFO : MAIN : EXIT: netdata prepares to exit with code 0...
2018-01-04 17:17:38: netdata INFO : MAIN : EXIT: stopping master threads...
2018-01-04 17:17:38: netdata INFO : MAIN : EXIT: Stopping master thread: PLUGIN[proc]
2018-01-04 17:17:38: netdata INFO : MAIN : EXIT: Stopping master thread: PLUGIN[diskspace]
2018-01-04 17:17:38: netdata INFO : MAIN : EXIT: Stopping master thread: PLUGIN[cgroup]
2018-01-04 17:17:38: netdata INFO : MAIN : EXIT: Stopping master thread: PLUGIN[tc]
2018-01-04 17:17:38: netdata INFO : MAIN : EXIT: Stopping master thread: PLUGIN[idlejitter]
2018-01-04 17:17:38: netdata INFO : MAIN : EXIT: Stopping master thread: HEALTH
2018-01-04 17:17:38: netdata INFO : MAIN : EXIT: Stopping master thread: PLUGINSD
2018-01-04 17:17:38: netdata INFO : MAIN : EXIT: Stopping master thread: WEB_SERVER[multi]
2018-01-04 17:17:38: netdata INFO : MAIN : EXIT: Stopping master thread: STATSD
2018-01-04 17:17:38: netdata INFO : MAIN : EXIT: cleaning up the database...
2018-01-04 17:17:38: netdata INFO : MAIN : Cleaning up database [1 hosts(s)]...
2018-01-04 17:17:38: netdata INFO : MAIN : Cleaning up database of host '********************************'...
2018-01-04 17:17:38: netdata INFO : PLUGIN[proc] : PLUGIN[proc]: thread with task id 29286 finished
2018-01-04 17:17:38: netdata INFO : PLUGIN[diskspace] : PLUGIN[diskspace]: thread with task id 29287 finished
2018-01-04 17:17:38: netdata INFO : PLUGIN[cgroup] : PLUGIN[cgroup]: thread with task id 29288 finished
2018-01-04 17:17:38: netdata INFO : PLUGIN[idlejitter] : PLUGIN[idlejitter]: thread with task id 29290 finished
2018-01-04 17:17:38: netdata INFO : HEALTH : HEALTH: thread with task id 29292 finished
2018-01-04 17:17:38: netdata INFO : PLUGINSD : PLUGINSD: thread with task id 29293 finished
2018-01-04 17:17:38: netdata INFO : WEB_SERVER[multi] : WEB_SERVER[multi]: thread with task id 29294 finished
2018-01-04 17:17:38: netdata INFO : STATSD : STATSD: thread with task id 29295 finished
2018-01-04 17:17:38: netdata INFO : MAIN : EXIT: freeing database memory...
2018-01-04 17:17:38: netdata INFO : MAIN : Freeing all memory for host '********************************'...
2018-01-04 17:17:38: netdata INFO : MAIN : EXIT: removing netdata PID file '/opt/netdata/var/run/netdata.pid'...
2018-01-04 17:17:38: netdata INFO : MAIN : EXIT: all done - netdata is now exiting - bye bye...
2018-01-04 17:17:38: netdata ERROR : PLUGINSD[apps]: '/opt/netdata/usr/libexec/netdata/plugins.d/apps.plugin' (pid 29306) disconnected after 364 successful data collections (ENDs).
2018-01-04 17:17:39: netdata INFO : PLUGIN[tc] : PLUGIN[tc]: thread with task id 29289 finished
2018-01-04 17:17:39: netdata ERROR : PLUGINSD[python.d]: '/opt/netdata/usr/libexec/netdata/plugins.d/python.d.plugin' (pid 29304) disconnected after 273 successful data collections (ENDs).
2018-01-04 17:17:41: netdata INFO : PLUGINSD[charts.d] : PLUGINSD[charts.d]: thread with task id 29301 finished
There is no exit log for tc-qos-helper because it was killed by SIGKILL after 60s.
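The thread does not say why the helper lingered. One common reason a shell helper reacts late to SIGTERM is that the shell runs a trap only after the current foreground command returns. The sketch below is not the actual tc-qos-helper.sh code; helper_loop and its messages are hypothetical. It shows the usual pattern for a loop that exits promptly and leaves a log line: run the sleep in the background and block in wait, which a trapped signal interrupts right away.

```shell
# Hypothetical long-running helper loop; not the real tc-qos-helper.sh.
helper_loop() {
    child=
    # On SIGTERM: log, stop the background sleep if any, and exit cleanly.
    trap 'echo "helper: got SIGTERM, exiting"; [ -n "$child" ] && kill "$child" 2>/dev/null; exit 0' TERM

    while :; do
        # A foreground "sleep 5" would delay the trap by up to 5 seconds.
        # Backgrounding it and using the wait builtin lets the trap run
        # as soon as the signal arrives.
        sleep 5 &
        child=$!
        wait "$child"
    done
}
```

Started in the background as helper_loop &, such a loop should end shortly after kill -TERM on its PID, well inside the 1.1s window discussed earlier in the thread.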
from binary-packages.
well, you are right. I implemented the cleanup and then I just bypassed it.
Making a PR now.
from binary-packages.
merged it.
could you please update and check it again?
from binary-packages.
@ktsaou
The new static package has not been built. Can you build it?
And can you reopen this issue until it is fixed?
from binary-packages.
It is automatic, every night.
But I just did it by hand too: netdata-v1.9.0-127-gcafaf427-x86_64-20180105-002205.gz.run
I'll reopen it if it is not fixed.
from binary-packages.
Thank you for the quick build :)
Now it stops in less than 1 second.
$ sudo sh -c 'time service netdata stop'
Stopping netdata... [ OK ]
real 0m0.734s
user 0m0.010s
sys 0m0.003s
from binary-packages.
ok. so it is fixed.
from binary-packages.