Comments (8)

spanditcaa commented on July 17, 2024

@ddebroy some additional information: we are exhausting inodes, as shown below.

swarm-worker000003:~$ df -i
Filesystem              Inodes      Used Available Use% Mounted on
overlay                1966080   1960215      5865 100% /
tmpfs                  1792091       186   1791905   0% /dev
tmpfs                  1792091        15   1792076   0% /sys/fs/cgroup
tmpfs                  1792091      1884   1790207   0% /etc
/dev/sda1              1966080   1960215      5865 100% /home
tmpfs                  1792091      1884   1790207   0% /mnt
shm                    1792091         1   1792090   0% /dev/shm
tmpfs                  1792091      1884   1790207   0% /lib/firmware
/dev/sda1              1966080   1960215      5865 100% /var/log
/dev/sda1              1966080   1960215      5865 100% /etc/ssh
tmpfs                  1792091      1884   1790207   0% /lib/modules
/dev/sda1              1966080   1960215      5865 100% /etc/hosts
/dev/sda1              1966080   1960215      5865 100% /var/etc/hostname
/dev/sda1              1966080   1960215      5865 100% /etc/resolv.conf
/dev/sda1              1966080   1960215      5865 100% /var/etc/docker
tmpfs                  1792091       376   1791715   0% /var/run/docker.sock
/dev/sda1              1966080   1960215      5865 100% /var/lib/waagent
tmpfs                  1792091      1884   1790207   0% /usr/local/bin/docker
/dev/sdb1                  256        27       229  11% /mnt/resource
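A listing like the one above can be filtered down to the filesystems that are close to running out of inodes. The following is a minimal sketch, not part of the original report: the function name inodes_over and the default threshold of 90 are hypothetical, and it assumes the six-column df -i layout shown above with mount points that contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: print "mountpoint use%" for every filesystem whose
# inode usage is at or above a threshold (percent, default 90).
# Reads `df -i`-style output on stdin and assumes the layout
#   Filesystem Inodes Used Available Use% Mounted-on
inodes_over() {
    threshold="${1:-90}"
    awk -v t="$threshold" '
        NR > 1 {                        # skip the header line
            use = $5
            sub(/%$/, "", use)          # strip the trailing percent sign
            # ignore rows where the usage column is not a number (e.g. "-")
            if (use ~ /^[0-9]+$/ && use + 0 >= t + 0)
                print $6 " " $5
        }'
}

# Usage on a live node:
#   df -i | inodes_over 90
```

Because several bind mounts share /dev/sda1 in the output above, the same device would be reported once per mount point.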

Based on moby/moby#10613, we ran docker rmi $(docker images -q --filter "dangling=true"), which brought inode usage down to 21%.
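When no dangling images exist, the command substitution in that cleanup expands to nothing and docker rmi is called with no arguments, which makes it fail. A minimal sketch of a guarded variant follows; the function name remove_dangling_images is hypothetical, and only the docker images and docker rmi invocations quoted in this thread are used.

```shell
#!/bin/sh
# Hypothetical wrapper around the cleanup quoted above: remove dangling
# images only when there are any, so `docker rmi` is never called with an
# empty argument list.
remove_dangling_images() {
    ids=$(docker images -q --filter "dangling=true")
    if [ -z "$ids" ]; then
        echo "no dangling images"
        return 0
    fi
    # Word splitting of $ids is intentional: one image ID per argument.
    docker rmi $ids
}

# Usage:
#   remove_dangling_images
```

On Docker 1.13 and later, docker image prune should cover the same case without a wrapper.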

from for-azure.

ddebroy commented on July 17, 2024

Seems like something is off with the VHD used by the template I pointed to earlier: it is not mounting /dev/sdb correctly. Will update with more findings.

spanditcaa commented on July 17, 2024

Thanks @ddebroy

ddebroy commented on July 17, 2024

Update: It turns out the template I referred to earlier (https://download.docker.com/azure/17.06/17.06.2/Docker-DDC.tmpl) points to VHD 1.0.9, which did not incorporate the enhancement to use the second, larger sdb disk provisioned by Azure to mount /var/lib/docker. That enhancement is being rolled out in the CE version first and then, depending on how things go, will be rolled out as part of the next EE release: 17.06.2-ee4.

Re-reading the above, it sounds like the df -h output was in sync with what UCP was reporting, but the problem was the inode exhaustion, which docker rmi took care of, correct?

spanditcaa commented on July 17, 2024

Yes @ddebroy, but we ended up in a bad state between managers and workers - similar to what is described here: docker-archive/classicswarm#2044

Although we could pull down images after the docker rmi, the tasks wouldn't advance past 'assigned' state, and the workers were logging the following:

Not enough managers yet. We only have 0 and we need 3 to continue.
sleep for a bit, and try again when we wake up.

We tried provisioning a new worker (which failed to connect) and restarting the ucp agent and controller, to no avail.

At this point, we deleted the cluster again and may wait for 17.06.2-ee4. Is there an expected release date?

ddebroy commented on July 17, 2024

Hmm... I am not sure of the steps you took, but a worker will never log the message:

Not enough managers yet. We only have 0 and we need 3 to continue.
sleep for a bit, and try again when we wake up.

It is something a new manager logs when it is unable to join the swarm. Sounds like you were trying to bring up new manager nodes? Looking through your diagnostics logs from the initial message, the swarm appears to be in a stable state. I guess the swarm cluster ended up in a bad state once the inode issue appeared.

By any chance, can you share steps to reproduce step (2) above, "Deploy a number of services (accumulated worker images are about 14GB)", as closely as possible to what you actually did? That would allow us to reproduce your environment internally and investigate.

Regarding 17.06.2-ee-4: we are running into some delays with getting the VHDs (that work the way we want with 17.06.2-ee-4) published through Azure. Will update once that is done and we are ready.

spanditcaa commented on July 17, 2024

Sure - it is probably a side effect of Node.js applications, where thousands of tiny files make up the application. I'll see if I can locate a suitable example; otherwise I'll publish a sample for you that triggers the issue.
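Since every file and directory consumes one inode regardless of its size, counting entries per directory is one way to check whether trees of tiny files are responsible. The sketch below is an illustration only: the function name inode_count_by_dir is hypothetical, it skips hidden subdirectories, and it miscounts file names that contain newlines.

```shell
#!/bin/sh
# Hypothetical helper: for each immediate (non-hidden) subdirectory of a
# root, count the filesystem entries beneath it, including the directory
# itself, and list the largest first. Each entry consumes one inode.
inode_count_by_dir() {
    root="${1:-.}"
    for d in "$root"/*/; do
        [ -d "$d" ] || continue
        # -xdev keeps find on the same filesystem; tr removes any padding
        # that some wc implementations add around the count.
        n=$(find "$d" -xdev | wc -l | tr -d '[:space:]')
        printf '%s %s\n' "$n" "${d%/}"
    done | sort -rn
}

# Usage (reading /var/lib/docker typically requires root):
#   inode_count_by_dir /var/lib/docker
```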

spanditcaa commented on July 17, 2024

@ddebroy - @jeffnessen mentioned he has a suitable test container for you.
