Comments (8)
@ddebroy, some additional information: we are exhausting inodes, as shown below.
swarm-worker000003:~$ df -i
Filesystem   Inodes     Used  Available  Use%  Mounted on
overlay     1966080  1960215       5865  100%  /
tmpfs       1792091      186    1791905    0%  /dev
tmpfs       1792091       15    1792076    0%  /sys/fs/cgroup
tmpfs       1792091     1884    1790207    0%  /etc
/dev/sda1   1966080  1960215       5865  100%  /home
tmpfs       1792091     1884    1790207    0%  /mnt
shm         1792091        1    1792090    0%  /dev/shm
tmpfs       1792091     1884    1790207    0%  /lib/firmware
/dev/sda1   1966080  1960215       5865  100%  /var/log
/dev/sda1   1966080  1960215       5865  100%  /etc/ssh
tmpfs       1792091     1884    1790207    0%  /lib/modules
/dev/sda1   1966080  1960215       5865  100%  /etc/hosts
/dev/sda1   1966080  1960215       5865  100%  /var/etc/hostname
/dev/sda1   1966080  1960215       5865  100%  /etc/resolv.conf
/dev/sda1   1966080  1960215       5865  100%  /var/etc/docker
tmpfs       1792091      376    1791715    0%  /var/run/docker.sock
/dev/sda1   1966080  1960215       5865  100%  /var/lib/waagent
tmpfs       1792091     1884    1790207    0%  /usr/local/bin/docker
/dev/sdb1       256       27        229   11%  /mnt/resource
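For anyone who wants to watch for this condition, here is a minimal sketch that scans `df -i` output and reports the mount points whose inode usage is at or above a threshold. The function name `inode_hotspots` and the 90% default are assumptions made for illustration; the column order (filesystem, total, used, available, percent, mount point) is taken from the output above and may need adjusting for other `df` variants.

```python
def inode_hotspots(df_output, threshold=90.0):
    """Return (mount point, percent used) pairs at or above the threshold.

    Assumes the six-column `df -i` layout shown in this thread.
    """
    hotspots = []
    for line in df_output.strip().splitlines():
        # Split into at most six fields so mount points with spaces survive.
        fields = line.split(None, 5)
        # Skip the header row and rows without numeric inode counts.
        if len(fields) < 6 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        total, used = int(fields[1]), int(fields[2])
        if total == 0:
            continue  # some filesystems report no inode totals
        percent = 100.0 * used / total
        if percent >= threshold:
            hotspots.append((fields[5].strip(), round(percent, 1)))
    return hotspots


# Three rows copied from the output above, plus the header.
sample = """\
Filesystem Inodes Used Available Use% Mounted on
overlay 1966080 1960215 5865 100% /
tmpfs 1792091 186 1791905 0% /dev
/dev/sdb1 256 27 229 11% /mnt/resource
"""
print(inode_hotspots(sample))
```

The percentage is recomputed from the used and total counts rather than read from the Use% column, because `df` rounds that column (99.7% is shown as 100% above).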
Based on moby/moby#10613, we ran docker rmi $(docker images -q --filter "dangling=true"), which brought inode usage down to 21%.
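One caveat with the one-liner above: when there are no dangling images, the command substitution is empty and `docker rmi` is invoked with no arguments, which fails. A minimal sketch of the same cleanup with that case guarded is below. The helper names (`dangling_image_ids`, `rmi_command`, `remove_dangling_images`) are hypothetical; only the `docker images -q --filter dangling=true` and `docker rmi` invocations come from this thread.

```python
import subprocess


def dangling_image_ids(listing):
    """Turn the output of `docker images -q --filter dangling=true`
    into a list of unique image IDs, preserving order."""
    ids = []
    for line in listing.splitlines():
        image_id = line.strip()
        if image_id and image_id not in ids:
            ids.append(image_id)
    return ids


def rmi_command(image_ids):
    """Build the `docker rmi` argument list, or None if there is nothing to remove."""
    return ["docker", "rmi", *image_ids] if image_ids else None


def remove_dangling_images():
    """Remove dangling images; returns how many IDs were passed to `docker rmi`."""
    listing = subprocess.run(
        ["docker", "images", "-q", "--filter", "dangling=true"],
        capture_output=True, text=True, check=True,
    )
    ids = dangling_image_ids(listing.stdout)
    command = rmi_command(ids)
    if command is None:
        return 0  # nothing dangling, so skip the call entirely
    subprocess.run(command, check=True)
    return len(ids)
```

Calling `remove_dangling_images()` on a node needs a reachable Docker daemon; the two pure helpers can be checked without one.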
from for-azure.
Seems like something is off with the VHD used by the template I pointed to earlier: it is not mounting /dev/sdb correctly. Will update with more findings.
from for-azure.
Thanks @ddebroy
from for-azure.
Update: It turns out the template I referred to earlier, https://download.docker.com/azure/17.06/17.06.2/Docker-DDC.tmpl, points to VHD 1.0.9, which did not incorporate the enhancement to use the second, larger sdb disk provisioned by Azure to mount /var/lib/docker. That enhancement is being rolled out first in the CE version and then, depending on how things go, will be rolled out as part of the next EE release: 17.06.2-ee4.
Re-reading the above, it sounds like the df -h output was in sync with what UCP was reporting, but the problem was the inode exhaustion, which docker rmi took care of. Correct?
from for-azure.
Yes @ddebroy, but we ended up in a bad state between managers and workers, similar to what is described here: docker-archive/classicswarm#2044
Although we could pull down images after the docker rmi, the tasks wouldn't advance past the 'assigned' state, and the workers were logging the following:
Not enough managers yet. We only have 0 and we need 3 to continue.
sleep for a bit, and try again when we wake up.
We tried provisioning a new worker (which failed to connect) and restarting the UCP agent and controller, to no avail.
At this point, we have deleted the cluster again and may wait for 17.06.2-ee4. Is there an expected release date?
from for-azure.
Hmm, I am not sure of the steps you took, but a worker will never log the message:
Not enough managers yet. We only have 0 and we need 3 to continue.
sleep for a bit, and try again when we wake up.
It is something a new manager logs when it is unable to join the swarm. Sounds like you were trying to bring up new manager nodes? Looking through your diagnostics logs from the initial message, the swarm appears to be in a stable state. I guess the swarm cluster ended up in a bad state once the inode issue appeared.
By any chance, could you share steps to reproduce step (2) above, "Deploy a number of services (accumulated worker images are about 14GB)", in a manner as close as possible to what you tried? That would allow us to reproduce your environment internally and investigate.
Regarding 17.06.2-ee-4: we are running into some delays with getting the VHDs (that work the way we want with 17.06.2-ee-4) published through Azure. Will update once that is done and we are ready.
from for-azure.
Sure. It is probably a side effect of Node.js applications, where thousands of tiny files make up each application. I'll see if I can locate a suitable example; otherwise I'll publish a sample for you that triggers the issue.
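To check whether a particular directory tree is what is consuming inodes, something like the following rough sketch could help. It counts files and directories under each top-level entry of a root and lists the largest first. The function names are made up for illustration, and the count is only an approximation of inode usage (hard-linked files are counted once per name, for example).

```python
import os


def count_entries(root):
    """Count the files and directories below root (root itself excluded)."""
    total = 0
    # os.walk does not follow symlinked directories by default.
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


def top_inode_consumers(root, limit=10):
    """Return (path, entry count) for the top-level entries of root,
    largest first. Each entry counts itself plus everything beneath it."""
    counts = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                counts[entry.path] = 1 + count_entries(entry.path)
            else:
                counts[entry.path] = 1
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

For example, `top_inode_consumers("/var/lib/docker")` (run with sufficient permissions) would show which subdirectory holds the most entries; the path is just the Docker data directory mentioned earlier in this thread.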
from for-azure.
@ddebroy - @jeffnessen mentioned he has a suitable test container for you.
from for-azure.