deprecated-dispatch's Introduction

⚠️ DEPRECATED ⚠️

Dispatch is an alarm routing tool for security and platform incident response teams. It dynamically routes alarms to PagerDuty or Slack based on incident severity, urgency, or type. Dispatch sends interactive Slack direct messages that empower users to self-triage their own security alarms. It also supports emergency broadcast style alerts via Slack, as well as escalating alarms from Slack to PagerDuty. For each alarm, Dispatch creates a GitHub issue for auditing and logging purposes, avoiding the need to maintain a separate database to store state.

To use Dispatch, have your applications and monitoring systems send AWS Simple Notification Service (SNS) messages following the Dispatch message specification to your Dispatch SNS topic.

Dispatch alert types

  • Self-service alerts send interactive Slack messages to users, prompting them to answer yes or no. The user's response is tracked via a GitHub issue for audit purposes. If a user responds yes, Dispatch closes the issue. If a user responds no, Dispatch escalates the alarm to PagerDuty.
  • Broadcast alerts are non-interactive messages delivered via Slack to multiple users. These alerts create a single GitHub issue for audit purposes with a list of users that received the message.
  • High priority alerts are sent directly to PagerDuty.
  • Low priority alerts are sent directly to a GitHub issue.
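The routing rules above can be summarized as a lookup table. This is a minimal sketch, not Dispatch's actual code. `destinationsFor` and the route values are hypothetical. The type strings match the `type` values used in the Testing examples.

```javascript
// Hypothetical summary of the alert type descriptions; not Dispatch source.
const ROUTES = {
  // Slack DM with yes/no buttons, one GitHub issue per user,
  // PagerDuty only if the user answers "no"
  'self-service': { slack: 'interactive', github: 'per-user', pagerduty: 'on-no' },
  // Non-interactive Slack message to many users, one audit issue
  'broadcast': { slack: 'non-interactive', github: 'single', pagerduty: 'never' },
  'high-priority': { slack: 'none', github: 'none', pagerduty: 'always' },
  'low-priority': { slack: 'none', github: 'single', pagerduty: 'never' }
};

function destinationsFor(type) {
  const route = ROUTES[type];
  if (!route) {
    throw new Error(`Unknown Dispatch alert type: ${type}`);
  }
  return route;
}

console.log(destinationsFor('self-service').pagerduty); // prints on-no
```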

Architecture

Dispatch consists of two separate AWS Lambda functions that leverage the lambda-cfn framework:

  • dispatch-incoming: receives SNS notifications and creates PagerDuty alarms or GitHub issues.
  • dispatch-triage: uses API Gateway to respond to Slack interactive messages, either closing the corresponding GitHub issue or escalating the issue to PagerDuty.
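Before dispatch-incoming can route anything, it has to unwrap the SNS notification that invokes it. The sketch below assumes the standard Lambda SNS event shape, in which the published payload arrives as a string in `Records[i].Sns.Message`. `parseDispatchMessages` is a hypothetical name, not a function from the Dispatch source.

```javascript
// Minimal sketch, not the actual dispatch-incoming handler.
function parseDispatchMessages(event) {
  return (event.Records || []).map((record) => {
    const raw = record.Sns && record.Sns.Message;
    if (typeof raw !== 'string') {
      throw new Error('Record is missing Sns.Message');
    }
    // Dispatch messages are JSON documents following the message spec
    return JSON.parse(raw);
  });
}

// Example event, trimmed to the fields used above
const sampleEvent = {
  Records: [{ Sns: { Subject: 'test', Message: '{"type":"high-priority"}' } }]
};
console.log(parseDispatchMessages(sampleEvent)[0].type); // prints high-priority
```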

Prerequisites

Lambda-cfn

To deploy and manage Dispatch you'll need to globally install the latest version of lambda-cfn.

npm install -g @mapbox/lambda-cfn

Third party services

You'll also need a GitHub organization with private repositories, a PagerDuty account, and a Slack workspace in order to run Dispatch.

Set up

To set up Dispatch for your organization, you'll need to do the following:

  1. Configure GitHub
  2. Configure PagerDuty
  3. Configure the Dispatch Slack app and bot
  4. Configure AWS Key Management Service (KMS)
  5. Deploy the dispatch-incoming AWS Lambda function
  6. Deploy the dispatch-triage AWS Lambda function
  7. Update the Dispatch Slack app with the dispatch-triage API Gateway URL

1. Configure GitHub

To configure GitHub for Dispatch, you'll need to do the following:

  1. Create or select a default GitHub repository for Dispatch GitHub issues
  2. Select or create a failover default GitHub user or team
  3. Create a machine account or select an existing user account to run Dispatch
  4. Generate a GitHub personal access token with repo scope using the account from Step #3

Dispatch creates a new GitHub issue for each alarm, using the title and body from the Dispatch message specification to populate the issue. You can use an existing GitHub repository or create a new one. You'll provide the name of the default GitHub repository via the GitHubRepo CloudFormation parameter when deploying the incoming and triage functions via lambda-cfn in steps 5 and 6 of setup. Dispatch will default to creating issues in this repository; however, you can also specify a different destination repository using the githubRepo property in the SNS message specification. This allows different alarms to be routed to different GitHub repos.

When deploying Dispatch you'll also need to provide a GitHub personal access token with a full repo scope via the GitHubToken CloudFormation parameter. For least privilege we recommend that you use a dedicated GitHub account that only has write access to your Dispatch alerts repository. Dispatch will use the account associated with the access token to create GitHub issues.

If the SNS message doesn't include a GitHub handle for the user, Dispatch falls back to tagging a default GitHub user or GitHub team. Provide this via the GitHubDefaultUser CloudFormation parameter.
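Taken together, the repository and user defaults work roughly like this. This is a hypothetical sketch, not Dispatch's implementation. `resolveGithubTarget` is an illustrative name, it only looks at the first entry in `users`, and the parameter values are placeholders.

```javascript
// Hypothetical sketch of the GitHub defaults described above.
function resolveGithubTarget(message, params) {
  // githubRepo in the SNS message overrides the GitHubRepo parameter
  const repo = message.githubRepo || params.GitHubRepo;
  // Only the first user is considered here, for simplicity
  const user = (message.users || [])[0];
  // Fall back to GitHubDefaultUser (a user or org/team) when no handle is given
  const handle = (user && user.github) || params.GitHubDefaultUser;
  return { owner: params.GitHubOwner, repo: repo, mention: '@' + handle };
}

// Placeholder parameter values
const params = {
  GitHubOwner: 'example-org',
  GitHubRepo: 'dispatch-alerts',
  GitHubDefaultUser: 'example-org/security'
};
```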

It's on our road map to evaluate and possibly switch to GitHub apps instead of personal access tokens.

2. Configure PagerDuty

You'll need to create a new PagerDuty service or use an existing one for Dispatch to send alerts to. You'll also need a PagerDuty admin or account owner to generate a new dedicated API key for Dispatch.

3. Configure Slack

You'll need to create a custom Slack app and bot user in your Slack workspace for Dispatch. It's on our road map to eventually publish an installable Slack app in the public Slack App Directory to make this process easier.

  1. Visit https://api.slack.com/apps/, click Create an App. Provide a name, select your Slack workspace, then click Create App.
  2. Scroll down to App Credentials and save the value for Verification Token somewhere safe and secure. You'll need this value later when deploying dispatch-triage for the SlackVerificationToken parameter.
  3. Scroll down to Display Information and upload the Dispatch Slack App icon as well as provide a description for your users. We recommend "Security alarm routing bot - https://github.com/mapbox/dispatch" but feel free to use your own!
  4. Click on Bot Users under the Features section, then create a Bot User named Dispatch and check Always Show My Bot as Online.
  5. Click on OAuth & Permissions under the Features section, then scroll down to the Scopes section. Add the chat:write:bot scope. You should already see the bot scope, which was added when you created the bot user in step 4; if not, add it.
  6. On the same page, scroll to the top and click on Install App to Workspace then Authorize.
  7. Save the value for the Bot User OAuth Access Token somewhere safe - you'll need it for the SlackBotToken parameter later when deploying the dispatch-incoming Lambda function. You can also retrieve this later by clicking on Install App under the Settings section.

4. Configure AWS Key Management Service (KMS)

Dispatch by default uses cloudformation-kms to decrypt the values of sensitive CloudFormation parameters, such as PagerDuty and Slack API keys, that are encrypted as part of the deploy process with lambda-cfn. Follow the setup instructions for cloudformation-kms.

If you'd prefer to not use cloudformation-kms, then you can also edit the CloudFormation templates for both incoming and triage to use raw KMS key ARNs instead of cloudformation-kms stacks. Replace the following statements section of function.template.js for both the dispatch-incoming and dispatch-triage AWS Lambda functions.

Instead of

  statements: [
    {
      Effect: 'Allow',
      Action: [
        'kms:Decrypt'
      ],
      Resource: {
        'Fn::ImportValue': {
          'Ref': 'KmsKey'
        }
      }
    }
  ],

Instead use

  statements: [
    {
      Effect: 'Allow',
      Action: [
        'kms:Decrypt'
      ],
      Resource: {
        'Ref': 'KmsKey'
      }
    }
  ],

This will allow you to pass in a raw KMS key ARN when deploying both Lambda functions instead of a CloudFormation stack name or alias.

5. Deploy the dispatch-incoming AWS Lambda function

To deploy dispatch-incoming to your AWS infrastructure you'll need to first clone Dispatch, navigate to the incoming directory, then use lambda-cfn create to launch a new CloudFormation stack. Since we're providing sensitive credentials as parameter values, to encrypt them in CloudFormation we'll use the -k flag with lambda-cfn create.

git clone git@github.com:mapbox/dispatch.git
cd dispatch/incoming
lambda-cfn create <environment-name> -k

For example, if you run lambda-cfn create dev -k this will create a CloudFormation stack named dispatch-incoming-dev.

When deploying or updating dispatch-incoming you'll need to provide values for the following CloudFormation parameters:

  • GitHubOwner = Your GitHub organization's name
  • GitHubDefaultUser = Default GitHub user or team when a user's GitHub handle is missing
  • GitHubRepo = Default GitHub repository for Dispatch issues
  • GitHubToken = [sensitive] GitHub personal access token for Dispatch machine account
  • PagerDutyServiceId = The ID of your Dispatch PagerDuty service, obtained from the service URL in PagerDuty
  • PagerDutyFromAddress = Email address of a valid PagerDuty user in your team, required by the PagerDuty API
  • PagerDutyApiKey = [sensitive] PagerDuty API key
  • slackDefaultChannel = Fallback Slack channel for when Dispatch direct messages fail
  • SlackBotToken = [sensitive] Bot user OAuth access token from your Dispatch Slack app (begins with xoxb-)
  • KmsKey = Cloudformation-kms stack name or AWS KMS key ARN to encrypt sensitive parameter values

For CodeS3Bucket, CodeS3Prefix, GitSha, and ServiceAlarmEmail please see the lambda-cfn documentation for these parameters.
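A blank or mistyped parameter is easy to miss. The helper below is a hypothetical pre-deploy check, not part of lambda-cfn or Dispatch. It flags any dispatch-incoming parameter from the list above that has no value, and it flags a Slack token that lacks the `xoxb-` prefix.

```javascript
// Hypothetical pre-deploy check; parameter names come from the list above.
const REQUIRED_PARAMS = [
  'GitHubOwner', 'GitHubDefaultUser', 'GitHubRepo', 'GitHubToken',
  'PagerDutyServiceId', 'PagerDutyFromAddress', 'PagerDutyApiKey',
  'slackDefaultChannel', 'SlackBotToken', 'KmsKey'
];

function checkIncomingParams(values) {
  const problems = REQUIRED_PARAMS
    .filter((name) => !values[name])
    .map((name) => `missing ${name}`);
  // Bot user OAuth access tokens begin with xoxb-
  if (values.SlackBotToken && !values.SlackBotToken.startsWith('xoxb-')) {
    problems.push('SlackBotToken should begin with xoxb-');
  }
  return problems; // an empty array means the values look complete
}
```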

6. Deploy the dispatch-triage AWS Lambda function

Similar to deploying dispatch-incoming, switch to the triage directory, then deploy dispatch-triage using lambda-cfn create <environment-name> -k.

You'll need to provide most of the same parameter values from deploying dispatch-incoming. Notably, you'll need to provide the Slack verification token for your Dispatch app (step #2 of configuring Slack) for the SlackVerificationToken CloudFormation parameter.

7. Update the Dispatch Slack app with the dispatch-triage API Gateway URL

  1. After deploying dispatch-triage, from the triage directory run lambda-cfn info <environment name> then scroll down to the Outputs section of the CloudFormation template.
  2. Copy the value for triageWebhookAPIEndpoint. It should be an AWS API Gateway URL.
  3. Visit your Slack Apps, then click on Interactive Components under the Features section.
  4. Click on Enable Interactive Components.
  5. Paste the URL for triageWebhookAPIEndpoint under Request URL and click on Save changes.

You're done setting up Dispatch! You can now test and verify your installation; see the Testing section.

Testing

You can test your Dispatch installation by using the AWS CLI to send SNS messages that follow the Dispatch message specification. For the complete message specification see MESSAGE-SPEC.md.

We've provided examples for each Dispatch alert type below: self-service, broadcast, high priority, low priority, and nag. To obtain your Dispatch SNS topic ARN ($SNS_ARN in the examples), from the incoming directory:

  1. Run lambda-cfn info <environment name>
  2. Scroll down to the Outputs section of the CloudFormation template and copy the value for incomingSNSTopic.

Self-service example

This will send a Slack direct message from your Dispatch bot and create a GitHub issue in your Dispatch repo for a user. If the user clicks yes it will close the GitHub issue. If the user clicks no it will trigger a PagerDuty incident.

Replace $SNS_ARN and $USER with your SNS topic ARN and your GitHub and Slack usernames.

aws sns publish --topic-arn "$SNS_ARN" --subject "test" \
--message "{\"type\":\"self-service\",\"users\":[{\"slackId\": \"$USER\",\"github\":\"$USER\"}],\"body\":{\"github\":{\"title\":\"self-service title\",\"body\":\"self-service body\"},\"slack\":{\"message\":\"testSlackMessage\",\"prompt\":\"testSlackPrompt\",\"actions\":{\"yes\":\"testYesAction\",\"no\":\"testNoAction\"}}}}"
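Hand-escaping JSON in a shell command is error-prone. If you generate test messages from a script, you can build the object and serialize it once. The sketch below uses a hypothetical `selfServiceMessage` helper that produces the same payload as the command above, with placeholder user names.

```javascript
// Hypothetical helper: build a self-service message and serialize it once.
function selfServiceMessage(user, github, slack) {
  return JSON.stringify({
    type: 'self-service',
    users: [{ slackId: user.slackId, github: user.github }],
    body: { github: github, slack: slack }
  });
}

const message = selfServiceMessage(
  { slackId: 'alice', github: 'alice' }, // placeholder usernames
  { title: 'self-service title', body: 'self-service body' },
  {
    message: 'testSlackMessage',
    prompt: 'testSlackPrompt',
    actions: { yes: 'testYesAction', no: 'testNoAction' }
  }
);
console.log(message);
```

Pass the resulting string as the `--message` value to `aws sns publish`, or as the `Message` field of an SNS Publish request.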

Broadcast example

Broadcast alerts send non-interactive Slack messages to multiple users. They create a single GitHub issue of the broadcast for audit purposes, but do not create a GitHub issue for each user. Replace $SNS_ARN with your SNS topic ARN and provide Slack usernames for $USER1 and $USER2; broadcast messages only need a slackId for each user.

aws sns publish --topic-arn "$SNS_ARN" --subject "test" \
--message "{\"type\":\"broadcast\",\"users\":[{\"slackId\": \"$USER1\"},{\"slackId\": \"$USER2\"}],\"body\":{\"github\":{\"title\":\"broadcast title\",\"body\":\"broadcast body\", \"labels\": [\"broadcast\"]},\"slack\":{\"message\":\"testSlackMessage\"}}}"

High priority example

High priority Dispatch alerts create PagerDuty incidents without creating a GitHub issue. Replace $SNS_ARN and $PD_SERVICE_ID with your SNS topic ARN and PagerDuty service ID.

aws sns publish --topic-arn "$SNS_ARN" --subject "test" --message "{\"type\":\"high-priority\",\"body\":{\"pagerduty\":{\"service\":\"$PD_SERVICE_ID\",\"title\":\"testAlert\",\"body\":\"testAlert\"}}}"

Low priority example

Low priority Dispatch alerts create a GitHub issue only. Replace $SNS_ARN and $GITHUB_REPO with your SNS topic ARN and target GitHub repository.

aws sns publish --topic-arn "$SNS_ARN" --subject "test" --message "{\"type\":\"low-priority\",\"githubRepo\":\"$GITHUB_REPO\",\"body\":{\"github\":{\"title\":\"low-priority title\",\"body\":\"low-priority body\", \"labels\": [\"low_priority\"]}}}"

Nag example

Nag alerts use the same message shape as the low priority example, with type set to nag. Replace $SNS_ARN and $GITHUB_REPO with your SNS topic ARN and target GitHub repository.

aws sns publish --topic-arn "$SNS_ARN" --subject "test" --message "{\"type\":\"nag\",\"githubRepo\":\"$GITHUB_REPO\",\"body\":{\"github\":{\"title\":\"nag title\",\"body\":\"low-priority body\", \"labels\": [\"low_priority\"]}}}"
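The examples differ mainly in which `body` sections they include. The sketch below checks a message against those differences before you publish it. The required fields are inferred from the examples only, so treat this as hypothetical. MESSAGE-SPEC.md is the authority on which fields are actually required.

```javascript
// Fields each example above includes, by type (inferred, not authoritative).
const REQUIRED_BODY = {
  'self-service': ['github', 'slack'],
  'broadcast': ['github', 'slack'],
  'high-priority': ['pagerduty'],
  'low-priority': ['github'],
  'nag': ['github']
};
// Types whose examples address specific Slack users
const NEEDS_USERS = new Set(['self-service', 'broadcast']);

function checkMessage(msg) {
  const required = REQUIRED_BODY[msg.type];
  if (!required) return [`unknown type: ${msg.type}`];
  const errors = [];
  const body = msg.body || {};
  required.forEach((key) => {
    if (!body[key]) errors.push(`missing body.${key}`);
  });
  if (NEEDS_USERS.has(msg.type)) {
    const users = msg.users || [];
    if (users.length === 0) errors.push('missing users');
    users.forEach((u, i) => {
      if (!u.slackId) errors.push(`users[${i}] has no slackId`);
    });
  }
  return errors; // empty when the message has every expected section
}
```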

Development

Installation

Make sure you are running Node 6.10.3 with npm 5 installed.

git clone git@github.com:mapbox/dispatch.git
cd dispatch
npm install

Tests

Dispatch uses eslint for linting and tape for tests. It mocks HTTP requests with sinon and nock. Tests run on Travis CI after every commit.

  • npm test will run eslint then tape.
  • npm run lint will only run eslint.
  • npm run unit-test will only run tape tests.

Feature Roadmap

The planned features and development roadmap for Dispatch can be found in the Dispatch Roadmap GitHub project.

Contributing

Contributors are welcome! If you want to contribute, please fork this repo then submit a pull request (PR).

All of your tests should pass both locally and in Travis before we'll accept your PR. We also request that you add additional test coverage and documentation updates in your PR where applicable.

deprecated-dispatch's People

Contributors

agius, alulsh, elfakyn, ianshward, k-mahoney, matiskay, npeternel, zmully

deprecated-dispatch's Issues

Unique identifier per dispatch run for logging

@zmully and I discussed having a unique identifier for all logging output associated with a single dispatch run, e.g. dispatch-incoming -> dispatch-oracle -> dispatch-triage. It will be difficult to resolve any issues with one of these steps if we can't identify all the associated logs.

I'm going to take a look at the triggering SNS message metadata and the dispatch-incoming lambda context object to see if there is an ideal unique identifier we can pass with all logs from the three lambda functions associated with a single dispatch run.

/cc @oliikit @ianshward
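One candidate is the SNS `MessageId`, which is unique per published message and is present in the Lambda SNS event. Below is a hypothetical sketch of tagging every log line with it. Passing the ID on to dispatch-triage (for example inside a Slack button value) is an assumption this sketch does not cover.

```javascript
// Hypothetical sketch: use the SNS MessageId as a per-run identifier.
function runIdFromEvent(event) {
  const record = event.Records && event.Records[0];
  return (record && record.Sns && record.Sns.MessageId) || 'unknown-run';
}

// Prefix every log line with the run ID so a run's logs can be searched together
function makeLogger(runId, write) {
  const out = write || ((...parts) => console.log(...parts));
  return (...parts) => out(`[dispatch:${runId}]`, ...parts);
}
```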

Refactoring to work with Slack's API changes

Slack announced that they are removing the username field from their API and will now only support a mutable display_name field and the full, immutable user ID (e.g. W012A3CDE).

https://api.slack.com/changelog/2017-09-the-one-about-usernames

Currently we track the username in our employee database, which is what we query to associate the GitHub username or other IDs from the initial alarm with the Slack user. We will no longer be able to rely on this.

Some initial thoughts on how to pivot and account for this change...

  • Replace the stored Slack username with the full, immutable user id in internal database
  • Further improve fallback and error handling for missing username cases

/cc @mapbox/security

Dispatch responses returned as JSON by API Gateway

The default lambda-cfn config treats anything returned by the function as JSON. With Dispatch triage this means the response to the user is always quoted. NBD, but kind of annoying. The default API-gateway response mapping would need to be updated for triage.

Adapt a patrol rule to use Dispatch

Which patrol rule should we adapt to use Dispatch? How about 2FA disabled on GitHub? What does it actually take to adapt a rule to use Dispatch?

The patrol rule needs to send an SNS message to dispatch. To do this there needs to be an SNS IAM policy in place to allow the lambda to post to the dispatch SNS topic. @zmully I partly recall you discussing this. Is this supported in lambda-cfn?

Otherwise, it looks like what it will take is swapping out message with code that sends a dispatch-formatted message to dispatch's SNS topic. Maybe we could ship a client with some error handling in dispatch, so you could do, within patrol-rules-github:

var client = require('dispatch').client; // proposed API, does not exist yet
client.send({
  // properties for the message
}, function(err, res) {
  // handle errors from publishing to the dispatch SNS topic
});

@k-mahoney @zmully do you think it makes sense to start w/ this rule first, and, what do you think about the steps and the client?
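Below is a sketch of what such a client could look like, assuming it wraps an SNS publish call. `require('dispatch').client` does not exist, and `createClient` is hypothetical. The publish function is injected so the sketch has no AWS dependency. With the AWS SDK for JavaScript v2, you could pass `(params, cb) => sns.publish(params, cb)`.

```javascript
// Hypothetical client sketch; the Subject value is a placeholder.
function createClient(topicArn, publish) {
  return {
    send(message, callback) {
      // Catch obviously malformed messages before they reach SNS
      if (!message || !message.type) {
        return callback(new Error('Dispatch message needs a type'));
      }
      let payload;
      try {
        payload = JSON.stringify(message);
      } catch (err) {
        return callback(err); // e.g. circular references
      }
      // TopicArn, Subject and Message are standard SNS Publish parameters
      return publish({ TopicArn: topicArn, Subject: 'dispatch', Message: payload }, callback);
    }
  };
}
```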

Make KMS configurable

Right now the KMS key for encrypting secure CloudFormation parameters is hard coded in the CloudFormation template for both dispatch-incoming and dispatch-triage.

We should find a way to parameterize it. Hopefully this is a matter of creating a new CloudFormation parameter, then referencing the value of that parameter in the IAM statement.

/cc @k-mahoney

Do not require a GitHub issue for broadcast

We should have the option to send a broadcast message without creating a GitHub issue. Have some initial work on this, iterated through discussion related to Jamf enrollment alert PoC.

User testing

We need to test both the broadcast and self-service workflows with folks outside of the security team to get a sense of how simple and straightforward they are to use.

Alongside testing with a handful of specific people, as a PoC we'll be doing our initial test with the DC office, sending a broadcast alert instructing how to register for Jamf and the deadline for doing so. Alert text can be found here. Will track feedback in this ticket.

Round 1

Round 2

/cc @zmully @oliikit

Staging dispatch

@k-mahoney what would it take to get a staging dispatch in place, whereby we could start sending messages to a staging dispatch-incoming and DM ourselves in the Mapbox Slack org? Getting staging dispatch-incoming and dispatch-triage stacks up is straightforward, but I have less of an idea what's required to hook up Slack. Does it make sense timing-wise to set this up now?

cc @oliikit

Support for announce-only tasking

Sometimes we'll want to use dispatch to DM + create a GH issue for folks, but not give the option to escalate the incident to PagerDuty, since doing so would not serve a helpful purpose. An example is where we want to task a set of individuals with going through the initial enrollment of their laptop into Jamf.

Questions related to this are:

  • Do you have a single button in the Slack message, or, do you just send a non-interactive Slack DM message with a link to the Github issue? I lean toward the latter.
  • How do we capture this in the message spec? Is it announce?
{
  priority: high | self_service | announce
}

cc @k-mahoney @oliikit

Volume testing broadcast messages

Make sure we can:

  • Dispatch a broadcast message to the entire company (say 300 for now)
  • Dispatch a self-service message to entire company

Then double the volume to see if we hit any API limits in GitHub (self-service) or Slack (either type).

Expand GitHub issue ticket functionality

In working through implementing our first broadcast and self-service alerts, we've found that more passive error reporting in the associated GitHub issue would be ideal, rather than failing outright or escalating immediately to PD. For Slack errors specifically, @zmully has laid out a lot of the initial work on this in #78.

Similarly, it would make sense to have a dispatch route that simply ingested the incoming SNS and opened an issue ticket and tagged a specified team or team member. We currently have priority, self-service, and broadcast alert types - so maybe this could be low-priority.

dispatch-oracle: knowing who to DM and tag

In order to know who to DM on Slack, as well as who to tag in GitHub issues, for rules like https://github.com/mapbox/patrol-service/issues/137, we need a way to look up a user's Slack and GitHub handles. On one hand, we could assume that all messages sent to dispatch already contain who to DM and who to tag. However, having to bake that into every single patrol rule would not be very convenient. I'm leaning toward supporting two ways (example message JSON sent to dispatch; property names are just examples and don't imply we'll call them this):

{
  recipient: {
    slack: "@ianshward",
    github: "@ianshward"
  }
}

In this format, we tell dispatch we know who to send the message to.

So what does our REST endpoint do

For our REST endpoint (dispatch-oracle), it'd be an api-gateway-fronted lambda. The lambda code would fetch the latest version of our internal list of GitHub and Slack handles.

The interface could look like:

lookup('@ianshward')
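Below is a hypothetical in-memory version of that interface. It assumes the directory holds separate `slack` and `github` fields per person, and the entries are placeholders. A real oracle would fetch the directory from its endpoint instead.

```javascript
// Hypothetical sketch of lookup(); not the dispatch-oracle implementation.
function createLookup(directory) {
  return function lookup(handle) {
    // Accept '@handle' or 'handle', case-insensitively
    const key = String(handle).replace(/^@/, '').toLowerCase();
    const entry = directory.find((person) =>
      [person.github, person.slack].some((h) => Boolean(h) && h.toLowerCase() === key)
    );
    // Return both handles, since they can differ for the same person
    return entry ? { slack: entry.slack, github: entry.github } : null;
  };
}

// Placeholder directory entries
const lookup = createLookup([
  { github: 'ianshward', slack: 'ianshward' },
  { github: 'example-gh', slack: 'example-slack' }
]);
```

Matching on either handle is convenient, but it becomes ambiguous if one person's Slack name equals someone else's GitHub handle. A real oracle may want the caller to say which kind of handle it is passing.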

What do other peoples' REST endpoints do

Assuming we open source dispatch, people can plug in a URL to their "oracle endpoint" and it can work however they want it to. It should just return JSON in a specified manner.

Where does this live

This dispatch-oracle should be a separate GH project, since if we open source Dispatch, dispatch-oracle is very custom to our use case and we would not want to make it publicly available.

Is it overkill

I don't think so. This is one of the most critical aspects of dispatch - knowing who to assign a task to, and, ensuring dispatch always has a very up-to-date directory (which /us does for us already).

cc @k-mahoney @oliikit

Load testing self-service messages

See #51 for background on broadcast message load testing. Concurrency will be harder to control for self-service messages, because the current dispatch architecture maps a single SNS event to a single invocation of the dispatch-incoming Lambda function. As Lambda has no concurrency controls other than an account invocation limit, if 200 messages are dispatched, Lambda could invoke dispatch-incoming 200 times at once, and Slack would rate-limit some of those requests.

Possibilities:

{
  forever: true,
  retries: 20, 
  factor: 1.22226,
  maxTimeout: 5 * 60 * 1000,
  randomize: true
}
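These look like options for the node-retry module. As a rough illustration, the sketch below computes the wait before each retry using the formula node-retry documents, `min(random * minTimeout * factor^attempt, maxTimeout)`. It is not Dispatch code, and it ignores `forever`.

```javascript
// Sketch of an exponential backoff schedule modeled on node-retry's formula.
function backoffTimeouts(opts) {
  const minTimeout = opts.minTimeout || 1000; // node-retry's default
  const maxTimeout = opts.maxTimeout || Infinity;
  const timeouts = [];
  for (let attempt = 0; attempt < opts.retries; attempt++) {
    // randomize multiplies each wait by a factor in [1, 2)
    const random = opts.randomize ? Math.random() + 1 : 1;
    const timeout = Math.round(random * minTimeout * Math.pow(opts.factor, attempt));
    timeouts.push(Math.min(timeout, maxTimeout));
  }
  return timeouts;
}

// The options above, with randomize turned off for a predictable schedule
const schedule = backoffTimeouts({
  retries: 20, factor: 1.22226, maxTimeout: 5 * 60 * 1000, randomize: false
});
```

With `randomize` off and a 1000 ms `minTimeout`, the 20th wait works out to roughly 45 seconds, well under the 5-minute `maxTimeout`. Randomization can stretch any wait by up to 2x, still capped at `maxTimeout`.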

Update/add docs for open sourcing

As part of open sourcing Dispatch we should update our existing documentation as well as add new docs.

Update existing

  • README.md
  • MESSAGE-SPEC.md

Add new

  • CONTRIBUTING.md (using a Contributing section of the README instead)
  • CHANGELOG.md

@k-mahoney noted that in the MESSAGE-SPEC we should be clear that there needs to be an outside oracle that stores username mappings, and that this could be either a service or a file.

Release Dispatch Slack app

Carrying on from https://github.com/mapbox/dispatch/issues/63, we have the option of releasing dispatch as a Slack application. Currently we include documentation for setting it up as a private workspace application; it'd be great to eliminate as many of the manual steps as we can.

Will need to explore the Slack app release process further, as well as determine whether custom parameters like the API Gateway URL and SNS Topic are possible in a released application.

TODOs in dispatch-triage

There are about five TODOs in dispatch-triage which I left open related to the finalization of the message spec, and, uncertainty I had around what would be where in terms of the payload that comes from slack. There are also a couple cases of adding proper error handling. They're all identified with a TODO in the dispatch-triage function.

cc @k-mahoney @oliikit

Update readme documentation for dispatch

Readme needs to be updated before we 🛳️. We can start off with the overall purpose of dispatch and our motivation for developing the system, followed by the more technical documentation:

  • overall architecture
  • how to send a message to dispatch
  • how to start up your own dispatch service

The answers to "why dispatch is a thing" and what purposes security will use it for should reside in /security. (Ticket-https://github.com/mapbox/security/issues/533)

cc @mapbox/security

Open sourcing Dispatch

Creating an umbrella ticket to discuss and track how to open source Dispatch.

We're dividing tasks into two categories:

  1. bare minimum clean up, sanitization, and parameterization that must be done before "flipping the switch" to make Dispatch open source
  2. improvements and fixes that can technically be done after open sourcing but are still critical to Dispatch's success and adoption as an open source project

The first category of tasks won't take long at all, but it'll be important to knock these out of the way quickly so we can spend more of our week working on the features in the second category. We don't necessarily have to open source the repo after all of category 1 is complete - but we want to be able to confidently do this by the end of next week (Friday, November 10th).

@k-mahoney gardened the Dispatch project to move all tasks from category 1 to the "Current Phase". Once those are done (or almost done) we can start moving high priority category 2 items to the "Current phase".

Category 1:

Issue already in Category 2 for sure:

Dashboard of Dispatch activity in Sumologic

So we can keep 👀 on dispatch's activity, we should have a dashboard of its activities, which should include, but is not limited to:

  • dispatch oracle's directory creation (should be every 2 hours)
  • requests to oracle
  • messages sent to users which includes:
    • message content
    • if the user took action
  • any errors

thoughts on test organization

@oliikit I took a look at the gh-tests branch. I know this is still in progress, but I've got some test organization thoughts that may be helpful. Here goes:

  • Create a ./test/fixtures directory
    • sns.js looks like a fixture. If so, you can put it in the ./test/fixtures directory
    • we'll likely have a few more fixtures
  • Put the tests for the lambdas in sub directories, like:
    • ./test/triage/triage.test.js
    • ./test/incoming/incoming.test.js
    • Based on yesterday's meeting, we're going to rename the code files from dispatch-triage.js and dispatch-incoming.js to just triage.js and incoming.js so these test names will follow suit.
  • Sounds good to keep the ./lib tests in ./test/lib like you've got them.
  • I'm not sure where patrol-sns-examples.js belongs yet... it looks like a fixture. I know this was already there. You could put it in ./test/fixtures for now

cc @k-mahoney

"Send reminder" functionality

For the future backlog - for tasks that create GH issues to track a team member's progress on completing an action, especially those where we want dispatch to track state as to who has and has not taken an action when doing so is otherwise impossible to track from some other data source we have access to, we could have a remind CLI command which would work like:

  • $ remind "title of issue" "text of reminder"

  • dispatch-remind looks up all the GH issues open that match the title
  • dispatch-remind posts a reminder comment to remind folks they still need to complete the action
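Below is a hypothetical sketch of the matching step. It assumes dispatch-remind has already fetched issues from the GitHub API and relies only on each issue's `number`, `title`, and `state` fields. `buildReminders` is an illustrative name, and posting the comments is left out.

```javascript
// Hypothetical sketch; not part of Dispatch.
function buildReminders(issues, title, text) {
  return issues
    // Only open issues with exactly this title still need a nudge
    .filter((issue) => issue.state === 'open' && issue.title === title)
    // One comment per issue; posting them is left to the caller
    .map((issue) => ({ issueNumber: issue.number, body: text }));
}
```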

Interactive slack buttons support return values

Currently the interactive slack buttons return the value of the triage function's successful callback, so the user gets a message like "b3c90ad35e58: closed GitHub issue 41" when they click on an interactive button.

The Slack interactive buttons (https://api.slack.com/docs/interactive-message-field-guide) support passing a value behind the scenes which we can use to hold the value of what we'd like the response to the user to be. This is currently hardcoded in dispatch.

So the SNS message could look like:

{
    // ...other message properties
    body: {
        slack: {
            message: 'STRING_VALUE',
            actions: {
                yes: "Yes, I've enrolled!", // Slack button text for 'yes' action type
                yes_response: 'Thank you for enrolling your computer in JAMF!',
                no: 'Nope, I had an issue enrolling!', // Slack button text for 'no' action type
                no_response: "We're sorry you've had issues, the Security team has been notified, and we'll be in touch!"
            }
        }
    }
}
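One way to wire this up is sketched below under stated assumptions. dispatch-incoming would copy each `*_response` string into the hidden `value` of the matching legacy Slack message button, and dispatch-triage would read it back from the clicked action. `buildButtons`, `handleClick`, and the fallback reply text are hypothetical, not Dispatch code.

```javascript
// Hypothetical sketch; not Dispatch code.
function buildButtons(actions) {
  return ['yes', 'no'].map((choice) => ({
    name: choice,                                // tells triage which path to take
    text: actions[choice],                       // label shown on the button
    type: 'button',
    value: actions[`${choice}_response`] || ''   // reply shown after the click
  }));
}

function handleClick(payload) {
  // Legacy interactive message payloads list the clicked button in actions[0]
  const clicked = payload.actions[0];
  return { choice: clicked.name, reply: clicked.value || 'Thanks for responding.' };
}
```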

PagerDuty serviceId passed in dispatch message body

Currently, the PagerDuty serviceId is configured as a stack parameter in dispatch-incoming and dispatch-triage, but it should be overridable by a serviceId in the Dispatch message.

For example, a self-service dispatch creates a PagerDuty incident if the user hits the "no" slack action. Right now that will always go to the service set by the stack parameter, but if the message were to look like:

body: { // required
    github: { ... },
    slack: { ... },
    pagerduty: {
        service: 'PDNCI9', // optional, defaults to stack parameter
        title: 'Something', // optional, defaults to github title
        body: 'Something' // optional, defaults to github body
    }
}
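The proposed defaults reduce to a per-field fallback. This is a hypothetical sketch. `pagerDutyFields` is an illustrative name, and `stackServiceId` stands in for the PagerDutyServiceId stack parameter.

```javascript
// Hypothetical sketch of the proposed PagerDuty overrides.
function pagerDutyFields(body, stackServiceId) {
  const pd = body.pagerduty || {};
  const gh = body.github || {};
  return {
    service: pd.service || stackServiceId, // message wins over stack parameter
    title: pd.title || gh.title,           // default to the GitHub issue title
    body: pd.body || gh.body               // default to the GitHub issue body
  };
}
```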

Protect dispatch-triage api-gateway

We should look at how to add authentication on the dispatch-triage api-gateway. I believe we could have api-gateway require an API token, and then pass the dispatch-triage api-gateway API token to the dispatch-incoming stack as a CFN parameter... or concatenated onto the action URL parameter (which is the api-gateway REST endpoint URL).

cc @k-mahoney @oliikit
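If the token check were done inside dispatch-triage rather than by API Gateway itself, it would come down to comparing a provided token with the expected one. Below is a hypothetical sketch using Node's `crypto.timingSafeEqual` (available since Node 6.6). The constant-time comparison keeps response timing from revealing how much of a guessed token was correct.

```javascript
const crypto = require('crypto');

// Hypothetical check of a shared API token for dispatch-triage requests.
function tokenMatches(provided, expected) {
  if (!expected) return false; // never accept requests if no token is configured
  const a = Buffer.from(String(provided || ''));
  const b = Buffer.from(String(expected));
  // timingSafeEqual throws when lengths differ, so compare lengths first
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```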

Add labels to Github issues

We should support adding labels to GitHub issues. This will allow us to manage /dispatch-alerts with what's been looked at with the L1.

The label that is created should be tied to the inbound SNS message.

Retrigger: false - add option to check closed issues before creating new GitHub issue

I'd like to add an option to the Dispatch message spec (and the relevant code in Dispatch itself) so that Dispatch also checks for closed GitHub issues with the same title before deciding to create a new issue + send a Slack DM.

The ultimate goal is to leverage Dispatch for checking state (in the GitHub issues) and for the code in my Patrol rule to avoid having to manage state.

Consider the following use case:

  1. A patrol-rule-google function checks for new public Google Drive documents. It can't use push notifications so it instead uses polling and returns the 50-100 most recent public file ACL changes in Google Drive. For the sake of not worrying about existing public Google Drive docs, it won't return any data before a specified date.
  2. This rule sends an SNS message to Dispatch using the message spec.
  3. Dispatch creates a GitHub issue with a title like "User X created a public Google Drive document on ." Due to the combination of username and timestamp, this title is unique (for our purposes). It also sends a Slack DM to the user.
  4. The user responds to the ticket and makes the Google Drive document private. They close the GitHub issue.
  5. Our patrol rule runs again after 30 minutes and since it's polling, returns the same data from Step 1. Since the original GitHub issue was closed, Dispatch creates a new GitHub issue (Dispatch currently only searches for open issues) and sends another Slack DM to the user. The user closes the issue and the process keeps repeating. This results in spamming the user and is highly undesirable.

/cc @k-mahoney @zmully @ianshward

Add package-lock.json

We should add a package-lock.json file to this project to avoid issues like https://github.com/mapbox/dispatch/issues/84 in the future.

Per the npm docs on package-lock.json:

package-lock.json is automatically generated for any operations where npm modifies either the node_modules tree, or package.json. It describes the exact tree that was generated, such that subsequent installs are able to generate identical trees, regardless of intermediate dependency updates.

This file is intended to be committed into source repositories

/cc @oliikit @zmully @k-mahoney

Integrating with Dispatch-oracle

dispatch-oracle has now been created.

As #22 pointed out, we should create a utils function within Dispatch that calls the oracle's REST endpoint to look up whoever (or whatever) we're trying to identify. The lookup function should look like:

lookup('oliikit') // looking up Github handle
lookup('ASADSKLABA1232') // looking up an asset

The parameters may change for us based on optimizing the oracle's code (per https://github.com/mapbox/dispatch-oracle/issues/3#issuecomment-324465446).

The oracle's README describes the format of the request and its response.
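The real request and response formats are defined in the oracle's README and are not reproduced here, so the following is only a shape sketch. A hypothetical `createLookup` takes the oracle's base URL and an injectable `request(url)` function that resolves to the response text. It returns the `lookup` function described above. The `/lookup/` path is an assumption; substitute whatever the README specifies.

```javascript
// Hypothetical lookup helper for dispatch-oracle. The endpoint path and the
// `request` signature are assumptions; see the oracle README for the real API.
function createLookup(oracleUrl, request) {
  return function lookup(identifier) {
    if (typeof identifier !== 'string' || identifier.length === 0) {
      return Promise.reject(new Error('lookup requires a non-empty identifier'));
    }
    // e.g. lookup('oliikit') for a GitHub handle, lookup('ASADSKLABA1232') for an asset
    const url = `${oracleUrl}/lookup/${encodeURIComponent(identifier)}`;
    return request(url).then((responseText) => JSON.parse(responseText));
  };
}
```

Injecting `request` keeps the helper testable without network access and lets the calling code choose its HTTP client.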

cc @ianshward @k-mahoney @zmully

Code clean-up tasks

Ticket to track assorted, lingering code clean-up tasks, in preparation for future open source plans.

  • Merge linter https://github.com/mapbox/dispatch/pull/36
  • Add Travis
  • Abstract dispatch-incoming GitHub issue creation to function
  • Abstract dispatch-incoming Slack alert post to function
  • Normalize styling
  • Normalize test structure
  • Add in-line function documentation
  • Expand tests in dispatch-incoming
    • Tests for decrypt failure behavior
    • Add tests for error handling in self-service events
    • Add tests for error handling in PD events

/cc @oliikit @ianshward

Handle users with different GitHub and Slack handles

Currently Dispatch assumes a user's GitHub and Slack handles are the same, and uses the GitHub handle returned from the Oracle for both operations. Users (especially new users) may have different handles, and this assumption may become less safe in the future: GitHub handles are globally unique, while Slack handles are (I believe) only unique within the company.

The Oracle already returns both in its response.
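The following sketch resolves each handle separately. It assumes the Oracle response exposes the handles as `github` and `slack` fields; these names are hypothetical, and the Oracle's README defines the real ones. The sketch falls back to the GitHub handle only when no Slack handle is present, which keeps today's behavior as the default.

```javascript
// Resolve the handles Dispatch should use for each service from an Oracle
// response. The field names `github` and `slack` are assumptions.
function resolveHandles(oracleResponse) {
  const github = oracleResponse.github;
  if (!github) throw new Error('Oracle response is missing a GitHub handle');
  // Prefer the Slack handle from the Oracle; fall back to the GitHub handle.
  const slack = oracleResponse.slack || github;
  return { github, slack };
}
```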

Command-line tool for manually triggering Dispatch events

For many incidents, the appropriate course of action would be to open a master issue for triage and coordination, then manually trigger a Dispatch event to notify affected users.

For instance, a credential leak may require X users to rotate their credentials. While the main incident is being triaged, a Dispatch self-service event could be triggered from the command line with something like:

$ dispatch self-service --users affectedUsers.json --message message.json

The users file would contain the array to be passed to the Oracle for lookup:
[ "user1", "user2", "user3", "user4" ]

The message JSON file would contain the appropriate body object from the Dispatch message specification. For example, for a self-service message:

{
  "github": {
    "title": "User1 NPM credentials require immediate rotation",
    "body": "We have detected that your NPM credentials may have been compromised. Rotate your credentials immediately. Please see issue #XXX for details and instructions."
  },
  "slack": {
    "message": "Your NPM credentials may have been compromised. Please rotate your NPM tokens immediately, following the instructions in issue #XXX. If you have any questions, please ask in #security.",
    "actions": {
      "yes": "I have rotated all my NPM tokens",
      "no": "I'm busy, remind me later!"
    }
  }
}

For a self-service message, the dispatch CLI would:

  1. Look up the SNS topic of the dispatch-incoming-production stack via CloudFormation
  2. Query the Oracle with the user array
  3. Perform basic error handling and apply defaults to the Oracle response
  4. Generate Dispatch messages and publish them to the Dispatch SNS topic
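Step 4 could look roughly like the following sketch. It turns the users array and the message body into Dispatch messages, using the fields from the self-service `aws sns publish` example in this document (`timestamp`, `type`, `users`, `body`). Generating one message per user is an assumption, made so that each user gets their own GitHub issue and Slack prompt. `buildSelfServiceMessages` and `loadMessages` are hypothetical names. The CloudFormation lookup, the Oracle query, and the SNS publish (steps 1-3 and the actual publish) are left out.

```javascript
const fs = require('fs');

// Build one self-service Dispatch message per user (an assumed design choice).
function buildSelfServiceMessages(users, body, now = new Date()) {
  if (!Array.isArray(users) || users.length === 0) {
    throw new Error('users must be a non-empty array');
  }
  return users.map((user) => ({
    timestamp: now.toISOString(),
    type: 'self-service',
    users: [user],
    body: body
  }));
}

// Hypothetical CLI glue: read the files passed via --users and --message.
function loadMessages(usersPath, messagePath) {
  const users = JSON.parse(fs.readFileSync(usersPath, 'utf8'));
  const body = JSON.parse(fs.readFileSync(messagePath, 'utf8'));
  return buildSelfServiceMessages(users, body);
}
```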

Support more than just a single GitHub repository

Dispatch messages should be able to specify which GitHub repository a message is posted to, instead of all messages going to a single repository. At minimum, this entails changing the message spec to support a repo parameter in the github property, and having the code use it to determine where to post the message.
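The following sketch shows the repository selection. It assumes a hypothetical `repo` field in the message's `github` property, written in `owner/name` form. The repository configured for the stack serves as the fallback.

```javascript
// Pick the GitHub repository for a Dispatch message. `message.body.github.repo`
// is a hypothetical field; `defaultRepo` comes from the stack's configuration.
function resolveRepo(message, defaultRepo) {
  const github = (message && message.body && message.body.github) || {};
  const repo = github.repo || defaultRepo || '';
  const parts = repo.split('/');
  // Expect exactly "owner/name"; anything else is treated as a configuration error.
  if (parts.length !== 2 || !parts[0] || !parts[1]) {
    throw new Error(`invalid repository "${repo}", expected owner/name`);
  }
  return { owner: parts[0], repo: parts[1] };
}
```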

Interactive Dispatch generates two Slack messages

@k-mahoney could not get the "replace_original": "false" flag to work, so Dispatch generates two messages instead. The problem is that the action response (coming from the triage function) overwrites the original message by default. With two messages, only the second one, containing the prompt and actions, gets overwritten, leaving the original alert body visible to the user.

https://api.slack.com/interactive-messages#building_workflows
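Slack's message-response documentation describes `replace_original` as a boolean field in the JSON posted to an interactive message's `response_url`. The following sketch shows a payload the triage function might post if the single-message approach were retried. It sends the flag as the JSON boolean `false` rather than the string `"false"`. This issue does not confirm whether the string form caused the failure. `buildTriageResponse` is a hypothetical helper.

```javascript
// Build the JSON body the triage function could POST to Slack's response_url.
// Only a literal `true` sets replace_original, so a string such as "false"
// is serialized as the boolean false rather than a truthy string.
function buildTriageResponse(text, replaceOriginal) {
  return JSON.stringify({
    text: text,
    replace_original: replaceOriginal === true
  });
}
```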

Better error handling when parsing message

If the JSON.parse call in incoming/function.js were wrapped in a try/catch, Dispatch could point out incorrectly formatted JSON when parsing fails. We've seen at least one bad message come through:

SyntaxError: Unexpected token } in JSON at position 174
    at Object.parse (native)
    at /var/task/incoming/function.js:28:26
    at /var/task/lib/utils.js:15:7
    at /var/task/node_modules/decrypt-kms-env/index.js:21:5
    at Queue._call (/var/task/node_modules/decrypt-kms-env/index.js:67:5)
    at maybeNotify (/var/task/node_modules/d3-queue/build/d3-queue.js:120:7)
    at /var/task/node_modules/d3-queue/build/d3-queue.js:91:12
    at Response.<anonymous> (/var/task/node_modules/decrypt-kms-env/index.js:61:9)
    at Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:364:18)
    at Request.callListeners (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:105:20)
    at Request.emit (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
    at Request.emit (/var/task/node_modules/aws-sdk/lib/request.js:683:14)
    at Request.transition (/var/task/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/var/task/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /var/task/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:38:9)
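The following is a minimal sketch of that try/catch. It assumes the string being parsed is the SNS message from a Lambda event record (`record.Sns.Message`, with `record.Sns.MessageId`). `parseDispatchMessage` is a hypothetical name. On failure, the function rethrows with the message ID and a short excerpt around the reported position, so the bad message can be identified from the logs.

```javascript
// Parse the Dispatch message from an SNS Lambda event record, reporting where
// the JSON is malformed instead of surfacing a bare SyntaxError.
function parseDispatchMessage(snsRecord) {
  const raw = snsRecord.Sns.Message;
  try {
    return JSON.parse(raw);
  } catch (err) {
    // V8 usually reports "... at position N"; include nearby text when present.
    const match = /position (\d+)/.exec(err.message);
    let excerpt = '';
    if (match) {
      const pos = Number(match[1]);
      excerpt = ` near: ${JSON.stringify(raw.slice(Math.max(0, pos - 20), pos + 20))}`;
    }
    const wrapped = new Error(
      `Malformed Dispatch message ${snsRecord.Sns.MessageId}: ${err.message}${excerpt}`
    );
    wrapped.name = 'DispatchMessageParseError';
    throw wrapped;
  }
}
```

Rethrowing keeps the invocation failing, so the error stays visible, while the added context points at the offending message.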

cc @k-mahoney @alulsh

Message specification

What information do we need from the patrol SNS message for dispatch to work?

We'll need content to post to both the GitHub issue and Slack, as well as directions on how to resolve the issue if necessary. For GitHub-related alarms, the message will include a GitHub username, which will need to be mapped to the correct Slack username.

Tentative list:

  • Issue content for GitHub
  • Message content for Slack prompt
  • Button text for affirmative option (yes)
  • Button text for negative option (no)
  • GitHub username
  • AWS username
  • Slack username
  • Timestamp
  • Priority

We can add to this and further develop structure as we work through getting a working skeleton this week. I added some tentative JSON test objects for the time being that I'm using for Slack testing. These are by no means set in stone, just something to work with.
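The tentative list above eventually maps onto concrete fields. The `aws sns publish` examples later in this document use `timestamp`, `type`, `users`, and a `body` with `github`, `slack`, or `pagerduty` sections. The following validator sketch covers the two message types shown there. The per-type lists of required fields are inferred from those examples and are not taken from a published spec.

```javascript
// Required dotted paths per message type, inferred from the CLI examples.
const REQUIRED_FIELDS = {
  'self-service': [
    'timestamp', 'users', 'body.github.title', 'body.github.body',
    'body.slack.message', 'body.slack.actions.yes', 'body.slack.actions.no'
  ],
  high: ['timestamp', 'body.pagerduty.title']
};

// Read a dotted path such as "body.slack.message" from an object.
function getPath(obj, path) {
  return path.split('.').reduce((value, key) => (value == null ? undefined : value[key]), obj);
}

// Return a list of problems; an empty list means the message looks valid.
function validateMessage(message) {
  const type = message && message.type;
  const required = REQUIRED_FIELDS[type];
  if (!required) return [`unknown or missing type: ${type}`];
  const problems = required
    .filter((path) => getPath(message, path) === undefined)
    .map((path) => `missing ${path}`);
  if (type === 'self-service' && Array.isArray(message.users) && message.users.length === 0) {
    problems.push('users must not be empty');
  }
  return problems;
}
```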

/cc @mapbox/security

Tests are failing on master

Tests were passing on master 4 days ago on Friday September 22nd, see https://travis-ci.com/mapbox/dispatch/builds/55238770. As of today they are now failing, see https://travis-ci.com/mapbox/dispatch/builds/55455033.

@oliikit first reported this while working on https://github.com/mapbox/dispatch/pull/83.

I added .log(console.log) to the failing nock test and got the following result:

matching https://api.pagerduty.com:443 to POST https://api.pagerduty.com:443/incidents: true
bodies don't match:
 { incident:
   { type: 'incident',
     title: '6cf9397c71e2: user kara responded \'no\' for self-service issue 7',
     service: { id: 'XXXXXXX', type: 'service_reference' },
     incident_key: '6cf9397c71e2' } }
 {"incident":{"type":"incident","title":"6cf9397c71e2: user kara responded 'no' for self-service issue 7","service":{"id":"XXXXXXX","type":"service_reference"},"incident_key":"6cf9397c71e2","body":{"type":"incident_body","details":"6cf9397c71e2: user kara responded 'no' for self-service issue 7\n\n https://github.com/testOwner/testRepo/issues/7"}}}

For some reason, either a body property is now being sent in the request to PD, or the body property was dropped (I'm unsure which is the real request and which is the nock).
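One way to make the test robust to this kind of drift is sketched below. It assumes the test only needs to pin down the fields it cares about. nock accepts a function as the request-body matcher, so the interceptor can check the required `incident` fields and tolerate an extra `body` property. `matchesIncident` is a hypothetical helper. The test exercises it directly, without nock.

```javascript
// Body matcher for a nock interceptor, e.g.
//   nock('https://api.pagerduty.com').post('/incidents', matchesIncident(expected))
// It checks the fields the test cares about and ignores extra properties
// such as incident.body.
function matchesIncident(expected) {
  return function (requestBody) {
    const incident = requestBody && requestBody.incident;
    if (!incident || !incident.service) return false;
    return incident.type === expected.type &&
      incident.title === expected.title &&
      incident.incident_key === expected.incident_key &&
      incident.service.id === expected.service.id &&
      incident.service.type === expected.service.type;
  };
}
```

The trade-off is that the test no longer notices changes to `body`. If the `details` text matters, assert on it explicitly.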

/cc @mapbox/security @oliikit

Using the AWS CLI (aws sns publish) to trigger dispatch-incoming

self-service

aws sns publish --topic-arn "SNS_ARN" --subject "ANYTHING" \
--message "{\"timestamp\":\"2017-07-31T00:54:06.655Z\",\"type\":\"self-service\",\"users\":[\"USER\"],\"body\":{\"github\":{\"title\":\"self-service title\",\"body\":\"self-service body\"},\"slack\":{\"message\":\"testSlackMessage\",\"actions\":{\"yes\":\"testYesAction\",\"no\":\"testNoAction\"}}}}"

high

aws sns publish --topic-arn "SNS_ARN" --subject "ANYTHING" \
--message "{\"timestamp\":\"2017-07-31T00:54:06.655Z\",\"type\":\"high\",\"body\":{\"pagerduty\":{\"title\":\"pagerduty title\"}}}"

Replace SNS_ARN with the ARN of the SNS topic for the function under test.
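Hand-escaping the `--message` JSON is error-prone. The following sketch generates the `high` payload with `JSON.stringify` instead, so braces and quotes are always balanced. It could live in a small helper script; the file name used below is hypothetical.

```javascript
// Build the "high" example message as an object and serialize it.
const highMessage = {
  timestamp: new Date('2017-07-31T00:54:06.655Z').toISOString(),
  type: 'high',
  body: { pagerduty: { title: 'pagerduty title' } }
};

const payload = JSON.stringify(highMessage);
console.log(payload);
// {"timestamp":"2017-07-31T00:54:06.655Z","type":"high","body":{"pagerduty":{"title":"pagerduty title"}}}
```

From a shell, the output can be passed straight through, e.g. `aws sns publish --topic-arn "SNS_ARN" --subject "ANYTHING" --message "$(node build-message.js)"`.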

CloudFormation parameter updates

Currently, the GitHub repository in which Dispatch opens tickets is set via the CloudFormation template. This should be changed so that the repository specified in the template acts as the default (fallback) destination, and a repository can optionally be specified in the message specification. This way, different Dispatch alerts can be ticketed in specific repositories rather than all going to one.

/cc @alulsh
