

Incident Commander

Incident Commander: the first 15 minutes (... and the rest of the minutes), or "How to start (and continue) with Incident Command"

Why?

When commanding incidents, the hardest part is getting started. Once you start, it only gets easier. Well, it doesn't, but starting is still pretty tough.

So, to help you get started with this fine discipline, I've prepared a general flowchart for commanding the response to typical IT disruptions.

Incident Command Flowchart

graph TD
    A[Incident started] --> |assign commander| B[Are the right people in the room?]
    B -->|yes|C[Do we understand <br> symptoms?]
    B -->|no|B1[reach out and <br> involve more people] --> B
    C -->|yes|D[Are external <br> customers impacted?]
    C -->|no|B
    D -->|yes|E[Assign Comms Lead]
    E --> F[Update external <br> comms channels]
    G[What change do we <br> need to make next?]
    F --> G
    D -->|no| G
    G --> H[Does the change pose additional risk?]
    H -->|no|I[Implement the change]
    H -->|yes|H1[communicate with stakeholders] --> H2[Is the risk worth the outcome?] -->|yes|I
    H2 -->|no|G
    I --> J[Did the change resolve <br> all the symptoms?]
    J -->|yes| K[Resolve the incident]
    J -->|no| B
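To make the loops in the flowchart explicit, here is a minimal sketch that encodes it as a state machine in Python. The `Step` names, the `advance` function and the action strings are hypothetical, chosen for illustration; they are not part of any existing tool. Yes/no questions need an answer, while plain steps (starting the incident, choosing the next change, implementing it) move on unconditionally.

```python
from enum import Enum, auto
from typing import Dict, Optional, Tuple


class Step(Enum):
    """Nodes of the flowchart (hypothetical names chosen for this sketch)."""
    STARTED = auto()
    RIGHT_PEOPLE = auto()         # Are the right people in the room?
    UNDERSTAND_SYMPTOMS = auto()  # Do we understand the symptoms?
    CUSTOMERS_IMPACTED = auto()   # Are external customers impacted?
    NEXT_CHANGE = auto()          # What change do we need to make next?
    CHANGE_RISKY = auto()         # Does the change pose additional risk?
    RISK_WORTH_IT = auto()        # Is the risk worth the outcome?
    IMPLEMENT = auto()            # Implement the change
    SYMPTOMS_RESOLVED = auto()    # Did the change resolve all the symptoms?
    RESOLVED = auto()             # Terminal: the incident is resolved


# Steps that are not yes/no questions: step -> (action to take, next step).
UNCONDITIONAL: Dict[Step, Tuple[str, Step]] = {
    Step.STARTED: ("assign commander", Step.RIGHT_PEOPLE),
    Step.NEXT_CHANGE: ("decide on the next change", Step.CHANGE_RISKY),
    Step.IMPLEMENT: ("implement the change", Step.SYMPTOMS_RESOLVED),
}

# Yes/no questions: (step, answer) -> (action to take, next step).
# An empty action string means "just move on".
TRANSITIONS: Dict[Tuple[Step, bool], Tuple[str, Step]] = {
    (Step.RIGHT_PEOPLE, True): ("", Step.UNDERSTAND_SYMPTOMS),
    (Step.RIGHT_PEOPLE, False): ("reach out and involve more people", Step.RIGHT_PEOPLE),
    (Step.UNDERSTAND_SYMPTOMS, True): ("", Step.CUSTOMERS_IMPACTED),
    (Step.UNDERSTAND_SYMPTOMS, False): ("", Step.RIGHT_PEOPLE),
    (Step.CUSTOMERS_IMPACTED, True): (
        "assign Comms Lead; update external comms channels", Step.NEXT_CHANGE),
    (Step.CUSTOMERS_IMPACTED, False): ("", Step.NEXT_CHANGE),
    (Step.CHANGE_RISKY, True): ("communicate with stakeholders", Step.RISK_WORTH_IT),
    (Step.CHANGE_RISKY, False): ("", Step.IMPLEMENT),
    (Step.RISK_WORTH_IT, True): ("", Step.IMPLEMENT),
    (Step.RISK_WORTH_IT, False): ("", Step.NEXT_CHANGE),
    (Step.SYMPTOMS_RESOLVED, True): ("resolve the incident", Step.RESOLVED),
    (Step.SYMPTOMS_RESOLVED, False): ("", Step.RIGHT_PEOPLE),
}


def advance(step: Step, answer: Optional[bool] = None) -> Tuple[str, Step]:
    """Return (action, next step) for the current step and the team's answer."""
    if step is Step.RESOLVED:
        raise ValueError("the incident is already resolved")
    if step in UNCONDITIONAL:
        return UNCONDITIONAL[step]
    if answer is None:
        raise ValueError(f"{step.name} is a yes/no question and needs an answer")
    return TRANSITIONS[(step, answer)]


if __name__ == "__main__":
    # Example run: right people present, symptoms understood, customers
    # impacted, a low-risk change is made and it resolves every symptom.
    step = Step.STARTED
    for answer in [None, True, True, True, None, False, None, True]:
        action, step = advance(step, answer)
        if action:
            print(action)
    print(step.name)
```

Running it prints the actions in the order the commander would take them and finishes at `RESOLVED`. Every `False` that leads back to `RIGHT_PEOPLE` or `NEXT_CHANGE` mirrors one of the arrows that loop back in the chart.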

Steps commented & explained

Incident started

Someone has decided that something warrants an incident. People need to be able to trust their own judgement when starting incidents; it is better to be safe than sorry. An incident can be started by someone in a customer-facing role, but also by someone without direct customer contact.

Remember though, internal customers are customers too.

Assign commander

If you're the only one who knows about the incident, why not assign the role to yourself? Before you and the team get to any technical work, there will be some organisational work to do, so it helps to have the commander role assigned, and its responsibilities covered, right from the start.

Are the right people in the room?

Important question! At this point, you probably have some people involved in the incident, depending on how well your service ownership model works.

Ask the team if they're confident they'll be able to solve the problem or if they'd like to involve more responders.

These might be:

  • more experienced engineers in the same field
  • people with more context

People from some cultures, and people with some personalities, can be reluctant to ask for help, so as a commander you need to make sure everyone is comfortable pulling in more people when needed.

Do we understand the symptoms?

You probably collect tons of metrics, logs and traces. Do you know which ones matter in this specific case? Do you understand the causality? If you're not sure, either take time to research or pull in more people (see the arrow going back to "Are the right people in the room?").

Are external customers impacted?

At this point, you need to figure out whether what's happening is impacting customers, and to what degree. Think about the future, too.

If your customers live in a specific timezone and it is currently night for them, they might be fine now, but they're likely to be impacted in the near future. Factor that into your preparations.
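One way to put a number on "likely to be impacted in the near future" is to compute how long it is until your customers' working day starts. The sketch below is hypothetical: it assumes your customer base sits at a single fixed UTC offset and works from 09:00 to 17:00, and it deliberately ignores daylight saving time, weekends and holidays.

```python
from datetime import datetime, timedelta, timezone


def hours_until_customers_active(now_utc: datetime, utc_offset_hours: float,
                                 start_hour: int = 9, end_hour: int = 17) -> float:
    """Hours until customers at a fixed UTC offset start their working day.

    now_utc must be timezone-aware. Returns 0.0 if customers are already
    within the (hypothetical) business hours. No DST, weekends or holidays.
    """
    customer_tz = timezone(timedelta(hours=utc_offset_hours))
    local = now_utc.astimezone(customer_tz)
    if start_hour <= local.hour < end_hour:
        return 0.0
    next_start = local.replace(hour=start_hour, minute=0, second=0, microsecond=0)
    if local.hour >= end_hour:
        # Their working day is already over; the next one starts tomorrow.
        next_start += timedelta(days=1)
    return (next_start - local).total_seconds() / 3600


if __name__ == "__main__":
    # Hypothetical customer base at UTC-5, incident running at 23:00 UTC.
    now = datetime(2024, 1, 15, 23, 0, tzinfo=timezone.utc)
    print(hours_until_customers_active(now, utc_offset_hours=-5))  # 15.0
```

If the result is shorter than your realistic time to fix, it is worth treating customers as impacted even though nobody is complaining yet.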

If unsure, check for signs of customer impact: SLO breaches, or the number of support tickets being opened. Ask openly in your company's busier channels to find out. Be transparent.
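As a rough first check, the two signals mentioned above (SLO breaches and ticket volume) can be combined in a few lines. This is a sketch under stated assumptions: the `ImpactSignals` fields, the 99.9% availability target and the "more than twice the usual ticket count" rule are hypothetical placeholders for your own SLOs and support data, and availability is measured over a single window rather than as an error-budget burn rate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImpactSignals:
    """Hypothetical inputs gathered for one observation window."""
    good_requests: int       # requests that met the SLI definition
    total_requests: int      # all requests in the window
    tickets_opened: int      # support tickets opened in the window
    baseline_tickets: float  # typical ticket count for a window of this length


def customer_impact_signals(s: ImpactSignals, slo_target: float = 0.999,
                            ticket_factor: float = 2.0) -> List[str]:
    """Return human-readable signs of customer impact (empty list if none)."""
    signals: List[str] = []
    if s.total_requests > 0:
        availability = s.good_requests / s.total_requests
        if availability < slo_target:
            signals.append(
                f"availability {availability:.3%} is below the {slo_target:.1%} SLO")
    if s.tickets_opened > ticket_factor * s.baseline_tickets:
        signals.append(
            f"{s.tickets_opened} tickets opened vs. a baseline of {s.baseline_tickets:g}")
    return signals


if __name__ == "__main__":
    window = ImpactSignals(good_requests=9_950, total_requests=10_000,
                           tickets_opened=12, baseline_tickets=3)
    for signal in customer_impact_signals(window):
        print(signal)
```

An empty result is not proof that nobody is affected; asking around openly is still the better check.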

Assign Comms Lead

A Comms Lead handles communication with external customers. They can take care of status page updates and other communications that keep the public informed, while you focus on the technical issues.

TODO: Risk of changes
