
dev's Issues

Terminating one of three etcd masters caused a massive failure

The chain of events:

  1. Terminate master node #1.
  2. Notice that etcd on master node #2 hangs, so fleetctl times out:
$ fleetctl list-units
2014/10/11 02:38:21 ERROR fleetctl.go:171: error attempting to check latest fleet version in Registry: timeout reached
2014/10/11 02:38:21 INFO client.go:278: Failed getting response from http://127.0.0.1:4001/: cancelled
Error retrieving list of units from repository: timeout reached
  3. Notice that etcd on master node #3 fails with:
$ ssh 10.10.126.61 etcdctl ls /
Error:  501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
  4. Run systemctl restart etcd on master #2.
  5. Notice that fleet is badly broken, with most units inactive, failed, or stuck in start-pre:
$ fleetctl list-units
UNIT                    MACHINE             ACTIVE      SUB
[email protected]       268251a3.../10.10.126.61    inactive    dead
[email protected]       4e283163.../10.10.146.25    inactive    dead
[email protected]       836de616.../10.10.1.191     inactive    dead
[email protected]            268251a3.../10.10.126.61    inactive    dead
[email protected]            4e283163.../10.10.146.25    inactive    dead
[email protected]            836de616.../10.10.1.191     activating  start-pre
[email protected]    268251a3.../10.10.126.61    activating  start-pre
[email protected]    4e283163.../10.10.146.25    activating  start-pre
[email protected]    836de616.../10.10.1.191     active      running
[email protected]     836de616.../10.10.1.191     active      running
[email protected]     4e283163.../10.10.146.25    activating  start-pre
[email protected]     268251a3.../10.10.126.61    activating  start-pre
[email protected]      268251a3.../10.10.126.61    activating  start-pre
[email protected]      4e283163.../10.10.146.25    activating  start-pre
[email protected]      836de616.../10.10.1.191     active      running
[email protected]         836de616.../10.10.1.191     active      running
[email protected]         4e283163.../10.10.146.25    active      running
[email protected]         268251a3.../10.10.126.61    active      running
[email protected]         836de616.../10.10.1.191     activating  start-pre
[email protected]         4e283163.../10.10.146.25    inactive    dead
[email protected]         268251a3.../10.10.126.61    failed      failed
[email protected]            268251a3.../10.10.126.61    active      running
[email protected]            4e283163.../10.10.146.25    active      running
[email protected]            836de616.../10.10.1.191     active      running
nginx-presence-dns.service      4e283163.../10.10.146.25    activating  start-pre
[email protected]        268251a3.../10.10.126.61    inactive    dead
[email protected]        4e283163.../10.10.146.25    inactive    dead
[email protected]        4e283163.../10.10.146.25    inactive    dead
[email protected]             268251a3.../10.10.126.61    activating  start-pre
[email protected]             4e283163.../10.10.146.25    activating  start-pre
[email protected]             4e283163.../10.10.146.25    activating  start-pre
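etcd replicates its data with the Raft consensus algorithm. A cluster therefore keeps committing writes only while a majority of its members can reach each other. The arithmetic below is a minimal sketch of that majority rule, not etcd code; `quorum` and `fault_tolerance` are hypothetical helper names. It shows why losing one of three masters was expected to be survivable. On paper, the two remaining masters still form a majority.

```python
def quorum(members: int) -> int:
    """Smallest majority for a Raft-style cluster with `members` voting peers."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can be lost while a majority remains."""
    return members - quorum(members)


# Print cluster size, quorum size, and tolerated failures for a few sizes.
for n in (1, 2, 3, 5):
    print(n, quorum(n), fault_tolerance(n))
```

For the three-master cluster above, `quorum(3)` is 2 and `fault_tolerance(3)` is 1. That is why the cascade after terminating a single master looks like a bug rather than an expected loss of quorum.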

fleet failure should force a reboot, shutdown, or similar recovery action

We had a cluster of six machines: 3 masters and 3 workers. However, fleetctl list-machines showed only one worker. This happened shortly after the failure described in #1, so fleet may have died in reaction to the etcd failure.

Regardless, there are classes of failure where we should reboot, terminate, or bail:

  • failure of fleet on a worker
  • failure of etcd on a master
  • ...
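One way to make "should reboot/terminate/bail" concrete is a small policy that only gives up after several consecutive failed health checks. Requiring a streak means a single transient timeout, like the fleetctl one in #1, does not take a machine down. The sketch below is hypothetical. The `Watchdog` class, the threshold of 3, and the action strings are illustrative assumptions, not part of fleet or etcd. How a check is performed, and what "reboot" actually does, are left to the caller.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Watchdog:
    """Hypothetical policy: request a reboot after `threshold` consecutive failures.

    The caller supplies each check result (for example, whether fleet on a
    worker or etcd on a master responded); this class only decides when to bail.
    """

    threshold: int = 3
    failures: Dict[str, int] = field(default_factory=dict)

    def record(self, service: str, healthy: bool) -> str:
        if healthy:
            # Any successful check resets that service's failure streak.
            self.failures[service] = 0
            return "ok"
        count = self.failures.get(service, 0) + 1
        self.failures[service] = count
        # Keep retrying until the streak reaches the threshold, then bail.
        return "reboot" if count >= self.threshold else "retry"


wd = Watchdog(threshold=3)
for healthy in (True, False, False, False):
    action = wd.record("etcd", healthy)
print(action)  # reboot
```

Tracking the streak per service lets fleet on a worker and etcd on a master share one policy while failing independently.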
