mapbox / deprecated-dispatch

Alarm routing engine for security and platform incident response teams.

License: Other

JavaScript 100.00%

security pagerduty slack security-dataset-notifications

deprecated-dispatch's Introduction

⚠️ DEPRECATED ⚠️

Dispatch is an alarm routing tool for security and platform incident response teams. It dynamically routes alarms to PagerDuty or Slack based on incident severity, urgency, or type. Dispatch sends interactive Slack direct messages that empower users to self-triage their own security alarms. It also supports emergency broadcast style alerts via Slack, as well as escalating alarms from Slack to PagerDuty. For each alarm, Dispatch creates a GitHub issue for auditing and logging purposes, avoiding the need to maintain a separate database to store state.

To use Dispatch, have your applications and monitoring systems send AWS Simple Notification Service (SNS) messages following the Dispatch message specification to your Dispatch SNS topic.

Dispatch alert types

Self-service alerts send interactive Slack messages to users, prompting them to answer yes or no. The user's response is tracked via a GitHub issue for audit purposes. If a user responds yes, Dispatch closes the issue. If a user responds no, Dispatch escalates the alarm to PagerDuty.
Broadcast alerts are non-interactive messages delivered via Slack to multiple users. These alerts create a single GitHub issue for audit purposes with a list of users that received the message.
High priority alerts are sent directly to PagerDuty.
Low priority alerts are sent directly to a GitHub issue.
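
The four alert types above can be read as a lookup from the message type to the services Dispatch touches. The sketch below is only an illustration of that mapping: the `DESTINATIONS` table and the `destinationsFor` helper are hypothetical names, not part of Dispatch's code, and the type strings are taken from the test messages in the Testing section.

```javascript
// Hypothetical summary of the alert types described above; not Dispatch's actual code.
const DESTINATIONS = {
  'self-service': ['slack', 'github'],  // PagerDuty is only used if the user answers no
  'broadcast': ['slack', 'github'],     // a single GitHub issue covers all recipients
  'high-priority': ['pagerduty'],
  'low-priority': ['github']
};

// Returns the list of services involved for a given alert type.
function destinationsFor(type) {
  if (!Object.prototype.hasOwnProperty.call(DESTINATIONS, type)) {
    throw new Error('Unknown Dispatch alert type: ' + type);
  }
  return DESTINATIONS[type];
}

console.log(destinationsFor('high-priority').join(', '));
```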

Architecture

Dispatch consists of two separate AWS Lambda functions that leverage the lambda-cfn framework:

dispatch-incoming: receives SNS notifications and creates PagerDuty alarms or GitHub issues.
dispatch-triage: uses API Gateway to respond to Slack interactive messages, either closing the corresponding GitHub issue or escalating the issue to PagerDuty.

Prerequisites

Lambda-cfn

To deploy and manage Dispatch you'll need to globally install the latest version of lambda-cfn.

npm install -g @mapbox/lambda-cfn

Third party services

You'll also need a GitHub organization with private repositories, a PagerDuty account, and a Slack workspace in order to run Dispatch.

Set up

To set up Dispatch for your organization, you'll need to do the following:

Configure GitHub
Configure PagerDuty
Configure the Dispatch Slack app and bot
Configure AWS Key Management Service (KMS)
Deploy the dispatch-incoming AWS Lambda function
Deploy the dispatch-triage AWS Lambda function
Update the Dispatch Slack app with the dispatch-triage API Gateway URL

1. Configure GitHub

To configure GitHub for Dispatch, you'll need to do the following:

Create or select a default GitHub repository for Dispatch GitHub issues
Select or create a failover default GitHub user or team
Create a machine account or select an existing user account to run Dispatch
Generate a GitHub personal access token with repo scope using the account from the previous step

Dispatch creates a new GitHub issue for each alarm, using the title and body from the Dispatch message specification to populate the issue. You can use an existing GitHub repository or create a new one. You'll provide the name of the default GitHub repository via the GitHubRepo CloudFormation parameter when deploying the incoming and triage functions via lambda-cfn in steps 5 and 6 of setup. Dispatch will default to creating issues in this repository; however, you can also specify a different destination repository using the githubRepo property in the SNS message specification. This allows different alarms to be routed to different GitHub repos.

When deploying Dispatch you'll also need to provide a GitHub personal access token with a full repo scope via the GitHubToken CloudFormation parameter. For least privilege we recommend that you use a dedicated GitHub account that only has write access to your Dispatch alerts repository. Dispatch will use the account associated with the access token to create GitHub issues.

If Dispatch doesn't receive the GitHub handle for the user in the SNS message, it will fall back to tagging a default GitHub user or GitHub team. Provide this via the GitHubDefaultUser CloudFormation parameter.

It's on our road map to evaluate and possibly switch to GitHub apps instead of personal access tokens.

2. Configure PagerDuty

You'll need to create a new PagerDuty service or use an existing one for Dispatch to send alerts to. You'll also need a PagerDuty admin or account owner to generate a new dedicated API key for Dispatch.

3. Configure Slack

You'll need to create a custom Slack app and bot user in your Slack workspace for Dispatch. It's on our road map to eventually publish an installable Slack app in the public Slack App Directory to make this process easier.

Visit https://api.slack.com/apps/, click Create an App. Provide a name, select your Slack workspace, then click Create App.
Scroll down to App Credentials and save the value for Verification Token somewhere safe and secure. You'll need this value later when deploying dispatch-triage for the SlackVerificationToken parameter.
Scroll down to Display Information and upload the Dispatch Slack App icon as well as provide a description for your users. We recommend "Security alarm routing bot - https://github.com/mapbox/dispatch" but feel free to use your own!
Click on Bot Users under the Features section, then create a Bot User named Dispatch and check Always Show My Bot as Online.
Click on OAuth & Permissions under the Features section, then scroll down to the Scopes section. Add the chat:write:bot scope. You should already see the bot scope added from the previous step, but if not then add it.
On the same page, scroll to the top and click on Install App to Workspace then Authorize.
Save the value for the Bot User OAuth Access Token somewhere safe - you'll need it for the SlackBotToken parameter later when deploying the dispatch-incoming Lambda function. You can also retrieve this later by clicking on Install App under the Settings section.

4. Configure AWS Key Management Service (KMS)

Dispatch by default uses cloudformation-kms to decrypt the values of sensitive CloudFormation parameters, such as PagerDuty and Slack API keys, that are encrypted as part of the deploy process with lambda-cfn. Follow the setup instructions for cloudformation-kms.

If you'd prefer to not use cloudformation-kms, then you can also edit the CloudFormation templates for both incoming and triage to use raw KMS key ARNs instead of cloudformation-kms stacks. Replace the following statements section of function.template.js for both the dispatch-incoming and dispatch-triage AWS Lambda functions.

Instead of

  statements: [
    {
      Effect: 'Allow',
      Action: [
        'kms:Decrypt'
      ],
      Resource: {
        'Fn::ImportValue': {
          'Ref': 'KmsKey'
        }
      }
    }
  ],

Instead use

  statements: [
    {
      Effect: 'Allow',
      Action: [
        'kms:Decrypt'
      ],
      Resource: {
        'Ref': 'KmsKey'
      }
    }
  ],

This will allow you to pass in a raw KMS key ARN when deploying both Lambda functions instead of a CloudFormation stack name or alias.

5. Deploy the dispatch-incoming AWS Lambda function

To deploy dispatch-incoming to your AWS infrastructure you'll need to first clone Dispatch, navigate to the incoming directory, then use lambda-cfn create to launch a new CloudFormation stack. Since we're providing sensitive credentials as parameter values, to encrypt them in CloudFormation we'll use the -k flag with lambda-cfn create.

git clone git@github.com:mapbox/dispatch.git
cd dispatch/incoming
lambda-cfn create <environment-name> -k

For example, if you run lambda-cfn create dev -k this will create a CloudFormation stack named dispatch-incoming-dev.

When deploying or updating dispatch-incoming you'll need to provide values for the following CloudFormation parameters:

GitHubOwner = Your GitHub organization's name
GitHubDefaultUser = Default GitHub user or team when a user's GitHub handle is missing
GitHubRepo = Default GitHub repository for Dispatch issues
GitHubToken = [sensitive] GitHub personal access token for Dispatch machine account
PagerDutyServiceId = The ID of your Dispatch PagerDuty service, obtained from the service URL in PagerDuty
PagerDutyFromAddress = Email address of a valid PagerDuty user in your team, required by the PagerDuty API
PagerDutyApiKey = [sensitive] PagerDuty API key
slackDefaultChannel = Fallback Slack channel for when Dispatch direct messages fail
SlackBotToken = [sensitive] Bot user OAuth access token from your Dispatch Slack app (begins with xoxb-)
KmsKey = Cloudformation-kms stack name or AWS KMS key ARN to encrypt sensitive parameter values

For CodeS3Bucket, CodeS3Prefix, GitSha, and ServiceAlarmEmail please see the lambda-cfn documentation for these parameters.

6. Deploy the dispatch-triage AWS Lambda function

Similar to deploying dispatch-incoming, switch to the triage directory, then deploy dispatch-triage using lambda-cfn create <environment-name> -k.

You'll need to provide most of the same parameter values from deploying dispatch-incoming. Notably, you'll need to provide the Slack verification token for your Dispatch app (step #2 of configuring Slack) for the SlackVerificationToken CloudFormation parameter.

7. Update the Dispatch Slack app with the dispatch-triage API Gateway URL

After deploying dispatch-triage, from the triage directory run lambda-cfn info <environment name> then scroll down to the Outputs section of the CloudFormation template.
Copy the value for triageWebhookAPIEndpoint. It should be an AWS API Gateway URL.
Visit your Slack Apps, then click on Interactive Components under the Features section.
Click on Enable Interactive Components.
Paste the URL for triageWebhookAPIEndpoint under Request URL and click on Save changes.

You're done setting up Dispatch! To test and verify your installation, see the Testing section.

Testing

You can test your Dispatch installation by using the AWS CLI to send SNS messages that follow the Dispatch message specification. For the complete message specification see MESSAGE-SPEC.md.

We've provided examples for each Dispatch alert type below: self-service, broadcast, high priority, low priority, and nag. To obtain your Dispatch SNS topic ARN ($SNS_ARN in the examples), from the incoming directory:

Run lambda-cfn info <environment name>
Scroll down to the Outputs section of the CloudFormation template and copy the value for incomingSNSTopic.

Self-service example

This will send a Slack direct message from your Dispatch bot and create a GitHub issue in your Dispatch repo for a user. If the user clicks yes it will close the GitHub issue. If the user clicks no it will trigger a PagerDuty incident.

Replace $SNS_ARN and $USER with your SNS topic ARN and your GitHub and Slack usernames.

aws sns publish --topic-arn "$SNS_ARN" --subject "test" \
--message "{\"type\":\"self-service\",\"users\":[{\"slackId\": \"$USER\",\"github\":\"$USER\"}],\"body\":{\"github\":{\"title\":\"self-service title\",\"body\":\"self-service body\"},\"slack\":{\"message\":\"testSlackMessage\",\"prompt\":\"testSlackPrompt\",\"actions\":{\"yes\":\"testYesAction\",\"no\":\"testNoAction\"}}}}"
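
Hand-escaping JSON inside a shell string is error-prone, so it can help to build the same self-service message as a plain object and serialize it. The sketch below mirrors the field names used in the command above. `buildSelfServiceParams` is a hypothetical helper, and the returned object is only shaped like the parameters of an SNS publish call (`TopicArn`, `Subject`, `Message`); passing it to an AWS client is left to you.

```javascript
// Hypothetical helper: builds SNS publish parameters for a self-service test alert.
function buildSelfServiceParams(topicArn, slackId, githubHandle) {
  const message = {
    type: 'self-service',
    users: [{ slackId: slackId, github: githubHandle }],
    body: {
      github: { title: 'self-service title', body: 'self-service body' },
      slack: {
        message: 'testSlackMessage',
        prompt: 'testSlackPrompt',
        actions: { yes: 'testYesAction', no: 'testNoAction' }
      }
    }
  };

  return {
    TopicArn: topicArn,
    Subject: 'test',
    Message: JSON.stringify(message) // SNS message bodies are strings
  };
}

// Example with placeholder values; replace them with your own ARN and usernames.
const params = buildSelfServiceParams('arn:aws:sns:us-east-1:000000000000:example', 'someuser', 'someuser');
console.log(params.Message);
```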

Broadcast example

Broadcast alerts send non-interactive Slack messages to multiple users. They create a single GitHub issue of the broadcast for audit purposes, but do not create a GitHub issue for each user. Replace $SNS_ARN with your SNS topic ARN and provide GitHub and Slack usernames for $USER1 and $USER2.

aws sns publish --topic-arn "$SNS_ARN" --subject "test" \
--message "{\"type\":\"broadcast\",\"users\":[{\"slackId\": \"$USER1\"},{\"slackId\": \"$USER2\"}],\"body\":{\"github\":{\"title\":\"broadcast title\",\"body\":\"broadcast body\", \"labels\": [\"broadcast\"]},\"slack\":{\"message\":\"testSlackMessage\"}}}"

High priority example

High priority Dispatch alerts create PagerDuty incidents without creating a GitHub issue. Replace $SNS_ARN and $PD_SERVICE_ID with your SNS topic ARN and PagerDuty service ID.

aws sns publish --topic-arn "$SNS_ARN" --subject "test" --message "{\"type\":\"high-priority\",\"body\":{\"pagerduty\":{\"service\":\"$PD_SERVICE_ID\",\"title\":\"testAlert\",\"body\":\"testAlert\"}}}"

Low priority example

Low priority Dispatch alerts create a GitHub issue only. Replace $SNS_ARN and $GITHUB_REPO with your SNS topic ARN and target GitHub repository.

aws sns publish --topic-arn "$SNS_ARN" --subject "test" --message "{\"type\":\"low-priority\",\"githubRepo\":\"$GITHUB_REPO\",\"body\":{\"github\":{\"title\":\"low-priority title\",\"body\":\"low-priority body\", \"labels\": [\"low_priority\"]}}}"

Nag example

Nag alerts use the same message shape as low priority alerts, with the type set to nag. Replace $SNS_ARN and $GITHUB_REPO with your SNS topic ARN and target GitHub repository.

aws sns publish --topic-arn "$SNS_ARN" --subject "test" --message "{\"type\":\"nag\",\"githubRepo\":\"$GITHUB_REPO\",\"body\":{\"github\":{\"title\":\"nag title\",\"body\":\"low-priority body\", \"labels\": [\"low_priority\"]}}}"

Development

Installation

Make sure you are running Node 6.10.3 with npm 5 installed.

git clone git@github.com:mapbox/dispatch.git
cd dispatch
npm install

Tests

Dispatch uses eslint for linting and tape for tests. It mocks HTTP requests with sinon and nock. Tests run on Travis CI after every commit.

npm test will run eslint then tape.
npm run lint will only run eslint.
npm run unit-test will only run tape tests.

Feature Roadmap

The planned features and development roadmap for Dispatch can be found in the Dispatch Roadmap GitHub project.

Contributing

Contributors are welcome! If you want to contribute, please fork this repo then submit a pull request (PR).

All of your tests should pass both locally and in Travis before we'll accept your PR. We also request that you add additional test coverage and documentation updates in your PR where applicable.

deprecated-dispatch's Issues

Unique identifier per dispatch run for logging

@zmully and I discussed having a unique identifier for all logging output associated with a single dispatch run, e.g. dispatch-incoming -> dispatch-oracle -> dispatch-triage. It will be difficult to resolve any issues with one of these steps if we can't identify all the associated logs.

I'm going to take a look at the triggering SNS message metadata and the dispatch-incoming lambda context object to see if there is an ideal unique identifier we can pass with all logs from the three lambda functions associated with a single dispatch run.
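
One way to sketch this idea, assuming the SNS message ID of the triggering record is used as the shared identifier (an assumption, since the issue leaves the choice open): each function builds a logger that prefixes every line with that ID. `runIdFromEvent` and `makeLogger` are hypothetical names.

```javascript
// Assumption: the MessageId of the triggering SNS record serves as the run identifier.
function runIdFromEvent(event) {
  return event.Records[0].Sns.MessageId;
}

// Returns a logger that prefixes every line with the run identifier.
// `write` defaults to console.log and can be swapped out in tests.
function makeLogger(runId, write) {
  const out = write || console.log;
  return function log(message) {
    out(runId + ': ' + message);
  };
}

// Example with a minimal, made-up SNS event.
const exampleEvent = { Records: [{ Sns: { MessageId: 'example-message-id' } }] };
const log = makeLogger(runIdFromEvent(exampleEvent));
log('received alarm');
```

Since dispatch-triage is invoked through API Gateway rather than SNS, the identifier would have to travel with the Slack message for this to cover the whole run.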

/cc @oliikit @ianshward

Refactoring to work with Slack's API changes

Slack announced that they are removing the username object from their API and will now only support a mutable display_name object and the full user id - W012A3CDE.

https://api.slack.com/changelog/2017-09-the-one-about-usernames

Currently we track the username in our employee database, which is what we query to associate the GitHub username or other IDs from the initial alarm with the Slack user. We will no longer be able to rely on this.

Some initial thoughts on how to pivot and account for this change...

Replace the stored Slack username with the full, immutable user id in internal database
Further improve fallback and error handling for missing username cases

/cc @mapbox/security

Dispatch responses returned as JSON by API Gateway

The default lambda-cfn config treats anything returned by the function as JSON. With Dispatch triage this means the response to the user is always quoted. NBD, but kind of annoying. The default API-gateway response mapping would need to be updated for triage.
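
The quoting comes from serializing a plain string as JSON, which a few lines can demonstrate. This sketch only illustrates the effect; it does not reproduce lambda-cfn's or API Gateway's actual response mapping.

```javascript
// A plain string serialized as JSON gains surrounding double quotes.
const reply = 'closed GitHub issue 41';
const asJson = JSON.stringify(reply);

console.log(reply);  // closed GitHub issue 41
console.log(asJson); // "closed GitHub issue 41"
```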

Adapt a patrol rule to use Dispatch

Which patrol rule should we adapt to use Dispatch? How about 2FA disabled on GitHub? What does it actually take to adapt a rule to use Dispatch?

The patrol rule needs to send an SNS message to dispatch. To do this there needs to be an SNS IAM policy in place to allow the lambda to post to the dispatch SNS topic. @zmully I partly recall you discussing this. Is this supported in lambda-cfn?

Otherwise, it looks like what it will take is swapping out message with code that sends a dispatch-formatted message to dispatch's SNS topic. Maybe we could ship a client with some error handling in dispatch, so you could do, within patrol-rules-github:

var client = require('dispatch').client;
client.send({
  // properties for the message
}, function(err, res) {
  // handle the error or the response here
});

@k-mahoney @zmully do you think it makes sense to start w/ this rule first, and, what do you think about the steps and the client?

Rename ./dispatch-triage and ./dispatch-incoming

This is just a refactor task, likely best to do after https://github.com/mapbox/dispatch/issues/6. The idea is to rename them to just:

./triage
./incoming

This gets rid of the redundant dispatch- prefix.

Tests for dispatch-triage

dispatch-triage still needs tests. Waiting on wrapper around dke

Make KMS configurable

Right now the KMS key for encrypting secure CloudFormation parameters is hard coded in the CloudFormation template for both dispatch-incoming and dispatch-triage.

We should find a way to parameterize it. Hopefully this is a matter of creating a new CloudFormation parameter, then referencing the value of the parameter in the IAM statement.

/cc @k-mahoney

Do not require a GitHub issue for broadcast

We should have the option to send a broadcast message without creating a GitHub issue. Have some initial work on this, iterated through discussion related to Jamf enrollment alert PoC.

Tracking Slack username mismatches in Jamf enrollment broadcast

Several dispatch messages have failed due to incorrect Slack usernames from internal database and missing username objects due to Slack's API changes.

/cc @zmully @oliikit @ianshward

User testing

We need to test both the broadcast and self-service workflows with folks outside of the security team, to get a sense of how simple and straightforward it is to use.

Alongside testing with a handful of specific people, as a PoC we'll be doing our initial test with the DC office, sending a broadcast alert instructing how to register for Jamf and the deadline for doing so. Alert text can be found here. Will track feedback in this ticket.

Round 1

Round 2

@drboyer

/cc @zmully @oliikit

Staging dispatch

@k-mahoney what would it take to get a staging dispatch in place, whereby we could start sending messages to a staging dispatch-incoming and DM ourselves in the Mapbox Slack org? Getting staging dispatch-incoming and dispatch-triage stacks up is straightforward, but I have less of an idea what's required to hook up Slack. Does it make sense timing-wise to set this up now?

cc @oliikit

Dispatch Slack app description

Before we 🛳️, we need to edit the info for the Slack app so users can know what dispatch is. We should link to the internal documentation (https://github.com/mapbox/security/issues/533).

It would also be nice to have a custom avatar for dispatch, but that's not required for the completion of this ticket.

Support for announce-only tasking

Sometimes we'll want to use dispatch to DM + create GH issue for folks, but not give the option to escalate the incident to PagerDuty, since doing so would not serve a helpful purpose. An example is where we want to task a set of individuals with going through the initial enrollment of their laptop into Jamf.

Questions related to this are:

Do you have a single button in the Slack message, or do you just send a non-interactive Slack DM with a link to the GitHub issue? I lean toward the latter.
How do we capture this in the message spec? Is it announce?

{
  priority: high | self_service | announce
}

cc @k-mahoney @oliikit

Volume testing broadcast messages

Make sure we can:

Dispatch a broadcast message to the entire company (say 300 for now)
Dispatch a self-service message to entire company

Then double the volume to see if we hit any API limits in GitHub (self-service) or Slack (either type).

Expand GitHub issue ticket functionality

In working through implementing our first broadcast and self-service alerts, we've found that more passive error reporting in the associated GitHub issue would be ideal, rather than failing outright or escalating immediately to PD. For Slack errors specifically, @zmully has laid out a lot of the initial work on this in #78.

Similarly, it would make sense to have a dispatch route that simply ingested the incoming SNS and opened an issue ticket and tagged a specified team or team member. We currently have priority, self-service, and broadcast alert types - so maybe this could be low-priority.

dispatch-oracle: knowing who to DM and tag

In order to know who to DM on Slack, as well as tag in GitHub issues, for rules like https://github.com/mapbox/patrol-service/issues/137 we need a way to look up a user's Slack and GitHub handles. On one hand we could assume that all messages sent to dispatch should already contain who to DM and who to tag. However, having to bake that into every single patrol rule would not be very convenient. I'm leaning toward supporting two ways (example message JSON sent to dispatch; property names are just examples and do not imply we call them this):

{
  recipient: {
    slack: "@ianshward",
    github: "@ianshward"
  }
}

In this format, we tell dispatch we know who to send the message to

So what does our REST endpoint do

For our REST endpoint (dispatch-oracle), it'd be an api-gateway-fronted lambda. The lambda code would fetch the latest version of our internal list of GitHub and Slack handles.

The interface could look like:

lookup('@ianshward')

What do other peoples' REST endpoints do

Assuming we open source dispatch, people can plug in a URL to their "oracle endpoint" and it can work however they want it to. It should just return JSON in a specified manner.

Where does this live

This dispatch-oracle should be a separate GH project, since if we open source Dispatch, dispatch-oracle is very custom to our use case and we would not want to make it publicly available.

Is it overkill

I don't think so. This is one of the most critical aspects of dispatch - knowing who to assign a task to, and, ensuring dispatch always has a very up-to-date directory (which /us does for us already).

cc @k-mahoney @oliikit

Adapt to lambda-cfn 1r1s

@zmully has a branch of lambda-cfn for 1r1s. We should try to switch to that branch sooner than later.

cc @oliikit @k-mahoney

Load testing self-service messages

See #51 for background on broadcast message load testing. Concurrency will be more difficult to control for self-service messages, as the current dispatch architecture maps a single SNS event to a single self-service dispatch lambda function. As Lambda has no concurrency controls other than an account invocation limit, if 200 messages are dispatched, Lambda could invoke all 200 dispatch-incoming lambda functions at once, and Slack would rate-limit some number of them.

Possibilities:

use a single SNS message with multiple records, invoking a single lambda, then concurrency control can be implemented within the lambda. At large volumes, it may not be possible to process all messages before the lambda timeout is reached.
implement a dead letter queue for rate limited messages.
update the default Slack webclient retry policy (this is undocumented: https://github.com/slackapi/node-slack-sdk/blob/33b87762225218ae74828d336127ce5773f6e19d/lib/clients/client.js). The webclient uses https://github.com/tim-kos/node-retry to do this, and the default policy for the webclient is: https://github.com/slackapi/node-slack-sdk/blob/33b87762225218ae74828d336127ce5773f6e19d/lib/clients/retry-policies.js#L28-L32 which equates to a max of 10 attempts in 30 minutes. A suggested retry policy for testing: try 20 times for 5 minutes, exponential backoff, randomizing retry window, with the last retry at 5 minutes:

{
  forever: true,
  retries: 20, 
  factor: 1.22226,
  maxTimeout: 5 * 60 * 1000,
  randomize: true
}
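
To see what this policy means in practice, the sketch below recomputes the delay before each retry using the formula documented in node-retry's README, `Math.min(random * minTimeout * Math.pow(factor, attempt), maxTimeout)`. It assumes node-retry's default `minTimeout` of 1000 ms and leaves out randomization, which multiplies each delay by a factor between 1 and 2, so the numbers are a lower bound. `backoffSchedule` is a hypothetical helper, not part of either library.

```javascript
// Computes the un-randomized delay (in ms) before each retry attempt.
function backoffSchedule(options) {
  const minTimeout = options.minTimeout || 1000; // assumed node-retry default
  const delays = [];
  for (let attempt = 0; attempt < options.retries; attempt++) {
    const delay = Math.round(minTimeout * Math.pow(options.factor, attempt));
    delays.push(Math.min(delay, options.maxTimeout));
  }
  return delays;
}

const delays = backoffSchedule({
  retries: 20,
  factor: 1.22226,
  maxTimeout: 5 * 60 * 1000
});
const totalMs = delays.reduce(function (sum, d) { return sum + d; }, 0);

console.log('delays (ms):', delays.join(', '));
console.log('total wait (s):', Math.round(totalMs / 1000));
```

Without randomization the delays sum to roughly four minutes; with randomization the total can be up to twice that.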

Update/add docs for open sourcing

As part of open sourcing Dispatch we should update our existing documentation as well as add new docs.

Update existing

README.md
MESSAGE-SPEC.md

Add new

~~[ ] CONTRIBUTING.md~~ using a Contributing section of the README instead
CHANGELOG.md

@k-mahoney noted that for the MESSAGE-SPEC, we should be clear about there needing to be an outside oracle that stores username mappings, and that this could be either a service or a file.

Release Dispatch Slack app

Carrying on from https://github.com/mapbox/dispatch/issues/63, we have the option to potentially release dispatch as a Slack application. Currently we include documentation for setting it up as a private Workspace application; it'd be great to remove whatever manual steps we can.

Will need to explore the Slack app release process further, as well as determine whether custom parameters like the API Gateway URL and SNS Topic are possible in a released application.

TODOs in dispatch-triage

There are about five TODOs in dispatch-triage which I left open related to the finalization of the message spec, and, uncertainty I had around what would be where in terms of the payload that comes from slack. There are also a couple cases of adding proper error handling. They're all identified with a TODO in the dispatch-triage function.

cc @k-mahoney @oliikit

Create wrapper around `dke` to make easier to write tests for lambdas

To make writing tests easier, we need to write a wrapper around dke, put it in ./lib/utils, and use that wrapper in the lambdas. The wrapper can check for NODE_ENV=test and in that case would not actually try to decrypt the env; instead it'd do nothing and just return process.env as-is.
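
A minimal sketch of such a wrapper follows. To keep it self-contained, the decrypt function is passed in rather than required; its `(env, callback)` signature is an assumption about dke, and `makeDecryptEnv` is a hypothetical name.

```javascript
// `decrypt` stands in for dke; its (env, callback) signature is an assumption.
function makeDecryptEnv(decrypt) {
  return function decryptEnv(env, callback) {
    if (env.NODE_ENV === 'test') {
      // In tests, skip decryption and hand back the environment untouched.
      return callback(null, env);
    }
    decrypt(env, function (err) {
      if (err) return callback(err);
      callback(null, env);
    });
  };
}

// Example with a stand-in decrypt function that does nothing.
const decryptEnv = makeDecryptEnv(function (env, done) { done(null); });
decryptEnv({ NODE_ENV: 'test' }, function (err, env) {
  console.log(err ? 'failed' : 'environment ready');
});
```

In a lambda this would be called with process.env before any handler code reads a secret.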

cc @oliikit

Add body to Pagerduty message body for self service dispatches

The PD message body is currently empty, and the incident title is the Slack message body.

Update readme documentation for dispatch

Readme needs to be updated before we 🛳️. We can start off with the overall purpose of dispatch and our motivation for developing the system, followed by the more technical documentation:

overall architecture
how to send a message to dispatch
how to start up your own dispatch service

The answers to "why dispatch is a thing" and what purposes security will use it for should reside in /security. (Ticket-https://github.com/mapbox/security/issues/533)

cc @mapbox/security

Open sourcing Dispatch

Creating an umbrella ticket to discuss and track how to open source Dispatch.

We're dividing tasks into two categories:

bare minimum clean up, sanitization, and parameterization that must be done before "flipping the switch" to make Dispatch open source
improvements and fixes that can technically be done after open sourcing but are still critical to Dispatch's success and adoption as an open source project

The first category of tasks won't take long at all, but it'll be important to knock these out of the way quickly so we can spend more of our week working on the features in the second category. We don't necessarily have to open source the repo after all of category 1 is complete - but we want to be able to confidently do this by the end of next week (Friday, November 10th).

@k-mahoney gardened the Dispatch project to move all tasks from category 1 to the "Current Phase". Once those are done (or almost done) we can start moving high priority category 2 items to the "Current phase".

Category 1:

Code clean up, especially adding a license (https://github.com/mapbox/dispatch/issues/37)
Make KMS configurable (https://github.com/mapbox/dispatch/issues/95)
Update GitHub repository parameter (https://github.com/mapbox/dispatch/issues/98)
Update/add docs for open sourcing (https://github.com/mapbox/dispatch/issues/96) (this is somewhere between category 1 and category 2)
Dispatch mascot graphic (https://github.com/mapbox/design/issues/209)

Issues already in Category 2 for sure:

Low priority PagerDuty alerts (https://github.com/mapbox/dispatch/issues/81)
More helpful and descriptive PagerDuty alerts (https://github.com/mapbox/dispatch/issues/89)

Mapbox-machine-dispatcher can't tag mapbox security

Dispatch can't tag @mapbox/security because they aren't in the Mapbox org. Is there a better way we can tag security when there's an undefined message sent out?

Dashboard of Dispatch activity in Sumologic

So we can keep 👀 on dispatch's activity, we should have a dashboard of its activities which should include, but is not limited to:

dispatch oracle's directory creation (should be every 2 hours)
requests to oracle
messages sent to users which includes:
- message content
- if the user took action
any errors

thoughts on test organization

@oliikit I took a look at the gh-tests branch. I know this is still in progress, but I've got some test organization thoughts that may be helpful. Here goes:

Create a ./test/fixtures directory
- sns.js looks like it's a fixture. If so, you can stick that into the ./test/fixtures directory
- we'll likely have a few more fixtures
Put the tests for the lambdas in sub directories, like:
- ./test/triage/triage.test.js
- ./test/incoming/incoming.test.js
- Based on yesterday's meeting, we're going to rename the code files from dispatch-triage.js and dispatch-incoming.js to just triage.js and incoming.js so these test names will follow suit.
Sounds good to keep the ./lib tests in ./test/lib like you've got them.
I'm not sure where patrol-sns-examples.js belongs yet... it looks like a fixture. I know this was already there. You could put it in ./test/fixtures for now

cc @k-mahoney

Fine tuning the Slack interactive message flow

As we work through more detailed UX testing, a number of minor issues with the interactive Slack issue flow have come to light. Opening this to track them.

"Send reminder" functionality

For the future backlog - for tasks that create GH issues to track a team member's progress on completing an action, especially those where we want dispatch to track state as to who has and has not taken an action when doing so is otherwise impossible to track from some other data source we have access to, we could have a remind CLI command which would work like:

$ remind "title of issue" "text of reminder"
dispatch-remind looks up all the GH issues open that match the title
dispatch-remind posts a reminder comment to remind folks they still need to complete the action

Interactive slack buttons support return values

Currently the interactive slack buttons return the value of the triage function's successful callback, so the user gets a message like "b3c90ad35e58: closed GitHub issue 41" when they click on an interactive button.

The Slack interactive buttons (https://api.slack.com/docs/interactive-message-field-guide) support passing a value behind the scenes which we can use to hold the value of what we'd like the response to the user to be. This is currently hardcoded in dispatch.

So the SNS message could look like:

{
   ...
        slack: {
            message: 'STRING_VALUE',
            actions: {
                yes: "Yes, I've enrolled!", // Slack button text for 'yes' action type
                yes_response: 'Thank you for enrolling your computer in JAMF!',
                no: 'Nope, I had an issue enrolling!', // Slack button text for 'no' action type
                no_response: "We're sorry you've had issues, the Security team has been notified, and we'll be in touch!"
            }
        }
    }
}
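
On the triage side, picking the reply could then be a small lookup. This is a sketch of the proposal only: `responseFor` is a hypothetical helper, and the fallback text stands in for whatever dispatch currently hardcodes.

```javascript
// Hypothetical helper: picks the reply for a clicked button from the proposed
// yes_response / no_response fields, falling back to a default when absent.
function responseFor(actions, choice, fallback) {
  const custom = actions[choice + '_response'];
  return typeof custom === 'string' ? custom : fallback;
}

const actions = {
  yes: "Yes, I've enrolled!",
  yes_response: 'Thank you for enrolling your computer in JAMF!',
  no: 'Nope, I had an issue enrolling!'
};

console.log(responseFor(actions, 'yes', 'closed GitHub issue'));
console.log(responseFor(actions, 'no', 'escalated to PagerDuty'));
```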

PagerDuty serviceId passed in dispatch message body

Currently, the PagerDuty serviceId is configured as a stack parameter in dispatch-incoming and triage, but it should be overridable by a serviceId in the Dispatch message.

For example, a self-service dispatch creates a PagerDuty incident if the user hits the "no" slack action. Right now that will always go to the service set by the stack parameter. Instead, the message could name the service itself, for example:

body: { // required
    github: { ... },
    slack: { ... },
    pagerduty: {
        service: 'PDNCI9', // optional, defaults to stack parameter
        title: 'Something', // optional, defaults to github title
        body: 'Something' // optional, defaults to github body
    }
}
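
The fallback rules in the comments above could be sketched as follows. The function name `resolvePagerDuty` and the `stackServiceId` argument are hypothetical, standing in for however the stack parameter reaches the function:

```javascript
// Resolve PagerDuty settings for a Dispatch message body.
// Values in body.pagerduty win; otherwise fall back to the stack
// parameter (service) and the GitHub title and body.
function resolvePagerDuty(body, stackServiceId) {
  const pagerduty = body.pagerduty || {};
  return {
    service: pagerduty.service || stackServiceId,
    title: pagerduty.title || body.github.title,
    body: pagerduty.body || body.github.body
  };
}
```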

Protect dispatch-triage api-gateway

We should look at how to add authentication on the dispatch-triage api-gateway. I believe we could have api-gateway require an API token, and then pass the dispatch-triage api-gateway API token to the dispatch-incoming stack as a CFN parameter... or concatenated onto the action URL parameter (which is the api-gateway REST endpoint URL).
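
A sketch of the "concatenated onto the action URL" option, assuming the token travels as a `token` query parameter and that dispatch-triage can read query parameters from the incoming event. The parameter name and both function names are hypothetical:

```javascript
const crypto = require('crypto');

// dispatch-incoming side: append the shared token to the triage endpoint URL.
function withToken(triageUrl, token) {
  const url = new URL(triageUrl);
  url.searchParams.set('token', token);
  return url.toString();
}

// dispatch-triage side: compare the supplied token to the expected one
// in constant time. timingSafeEqual requires equal-length buffers, so
// the length check comes first.
function isAuthorized(queryStringParameters, expectedToken) {
  const supplied = Buffer.from(String((queryStringParameters || {}).token || ''));
  const expected = Buffer.from(expectedToken);
  return supplied.length === expected.length &&
    crypto.timingSafeEqual(supplied, expected);
}
```

A token in the URL can end up in logs, so this is weaker than a header-based API key; it is shown only because the action URL is the one thing Dispatch controls in the Slack request.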

cc @k-mahoney @oliikit

Add labels to Github issues

We should support adding labels to GitHub issues. This will allow us to manage /dispatch-alerts by tracking what the L1 has already looked at.

The label that is created should be tied to the inbound SNS message.

Retrigger: false - add option to check closed issues before creating new GitHub issue

I'd like to add an option to the Dispatch message spec (and the relevant code in Dispatch itself) so that Dispatch also checks for closed GitHub issues with the same title before deciding to create a new issue + send a Slack DM.

The ultimate goal is to leverage Dispatch for checking state (in the GitHub issues) and for the code in my Patrol rule to avoid having to manage state.

Consider the following use case:

1. A patrol-rule-google function checks for new public Google Drive documents. It can't use push notifications, so it polls instead and returns the 50-100 most recent public file ACL changes in Google Drive. To avoid worrying about existing public Google Drive docs, it won't return any data before a specified date.
2. This rule sends an SNS message to Dispatch using the message spec.
3. Dispatch creates a GitHub issue with a title like "User X created a public Google Drive document on <timestamp>." Due to the combination of username and timestamp, this title is unique (for our purposes). It also sends a Slack DM to the user.
4. The user responds to the ticket and makes the Google Drive document private. They close the GitHub issue.
5. Our patrol rule runs again after 30 minutes and, since it's polling, returns the same data from Step 1. Since the original GitHub issue was closed, Dispatch creates a new GitHub issue (Dispatch currently only searches for open issues) and sends another Slack DM to the user. The user closes the issue and the process keeps repeating. This results in spamming the user and is highly undesirable.
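
The decision this option would add could look roughly like the sketch below. It assumes the existing issues are available as objects with `title` and `state` ('open' or 'closed') and that the option is a boolean named `retrigger` as in the issue title; `shouldCreateIssue` is a hypothetical name:

```javascript
// Decide whether Dispatch should open a new issue (and send a Slack DM).
// An open issue with the same title always suppresses a new one.
// With retrigger === false, a closed issue with the same title does too.
function shouldCreateIssue(existingIssues, title, retrigger) {
  const sameTitle = existingIssues.filter((issue) => issue.title === title);
  if (sameTitle.some((issue) => issue.state === 'open')) return false;
  if (retrigger === false && sameTitle.some((issue) => issue.state === 'closed')) return false;
  return true;
}
```

Leaving `retrigger` unset keeps the current behavior, so existing senders would not need to change.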

/cc @k-mahoney @zmully @ianshward

Add package-lock.json

We should add a package-lock.json file to this project to avoid issues like https://github.com/mapbox/dispatch/issues/84 in the future.

Per the npm docs on package-lock.json:

package-lock.json is automatically generated for any operations where npm modifies either the node_modules tree, or package.json. It describes the exact tree that was generated, such that subsequent installs are able to generate identical trees, regardless of intermediate dependency updates.

This file is intended to be committed into source repositories

/cc @oliikit @zmully @k-mahoney

Integrating with Dispatch-oracle

So dispatch-oracle is created.

As #22 pointed out, we should create a utils function within Dispatch, to which we can provide a REST endpoint, to look up who we're trying to retrieve. The lookup function should look like:

lookup('oliikit') // looking up Github handle
lookup('ASADSKLABA1232') // looking up an asset

The parameters may change for us based on optimizing the oracle's code (per https://github.com/mapbox/dispatch-oracle/issues/3#issuecomment-324465446).

The oracle's README covers what the request looks like and its response.

cc @ianshward @k-mahoney @zmully

Code clean-up tasks

Ticket to track assorted, lingering code clean-up tasks, in preparation for future open source plans.

/cc @oliikit @ianshward

Handle users with different github and slack handles

Currently Dispatch assumes both github and slack handles are the same, and uses the github handle returned from the Oracle for both operations. It's possible that users (especially new users) could have different handles, and it's possible that in the future this will not be a safe assumption as github handles are globally unique, while slack handles are only unique to the company (I believe).

The Oracle already returns both in its response.
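
A small sketch of the change, assuming the Oracle response exposes the two handles as `github` and `slack` fields. Those field names are a guess, as is the helper name `resolveHandles`; the oracle's README is the authority on the real response shape:

```javascript
// Pick the handle to use for each service from an Oracle response.
// Falls back to the GitHub handle for Slack when no Slack handle is returned,
// which matches what Dispatch assumes today.
function resolveHandles(oracleResponse) {
  return {
    github: oracleResponse.github,
    slack: oracleResponse.slack || oracleResponse.github
  };
}
```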

Commandline tool for manually triggering dispatch events

For many incidents, the appropriate course of action would be to open a master issue for triage and coordination, then manually trigger a Dispatch event to notify affected users.

For instance, a credential leak requires rotation of X users' credentials. While triaging the main incident, a Dispatch self-service event would be triggered from the command line with something like:

$ dispatch self-service --users affectedUsers.json --message message.json

The users file would be the array to be passed to the Oracle for lookup:
[ 'user1', 'user2', 'user3', 'user4' ]

The message json file would be the appropriate body object from the Dispatch message specification, for example a self-service message:

{
    github: {
        title: 'User1 NPM credentials require immediate rotation',
        body: 'We have detected that your NPM credentials may have been compromised. Rotate your credentials immediately. Please see issue #XXX for details and instructions'
    },
    slack: {
        message: 'Your NPM credentials may have been compromised, please rotate your NPM tokens immediately, following the instructions in issue #XXX. If you have any questions please ask in #security',
        actions: {
            yes: 'I have rotated all my NPM tokens',
            no: "I'm busy, remind me later!"
        }
    }
}

In this case of a self-service message, the dispatch CLI would:

Do a CFN lookup for the SNS topic of the dispatch-incoming-production stack
Query the Oracle with the user array
Perform some basic error handling and defaults on the Oracle response
Generate Dispatch messages and inject into the Dispatch SNS topic
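
The last step above might be sketched like this, assuming one Dispatch message per user and the same top-level fields (`timestamp`, `type`, `users`, `body`) used in the awscli examples elsewhere in these issues. The function name is hypothetical, and the CFN lookup, Oracle query, and SNS publish are not shown:

```javascript
// Build one self-service Dispatch message per affected user.
// `users` is the resolved user array and `body` is the parsed message file.
function buildSelfServiceMessages(users, body, now) {
  return users.map((user) => ({
    timestamp: now.toISOString(),
    type: 'self-service',
    users: [user],
    body: body
  }));
}
```

Each returned object would be passed through `JSON.stringify` before being published to the Dispatch SNS topic.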

Support more than just a single Github repository

Dispatch should allow dispatch messages to define which GitHub repository a message gets posted to, instead of always using a single repository. This entails at least changing the message spec to support a repo parameter in the github property, and having the code use it to determine where to post the message.
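
A minimal sketch of the lookup, assuming the new parameter is named `repo` as suggested above and that the currently configured repository is kept as a default. `resolveRepo` and `defaultRepo` are hypothetical names:

```javascript
// Choose the repository for a Dispatch issue: the message's github.repo
// when present, otherwise the repository configured for the stack.
function resolveRepo(github, defaultRepo) {
  return (github && github.repo) || defaultRepo;
}
```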

Interactive dispatch generates two slack messages

@k-mahoney could not get the "replace_original": "false" flag to work so went with generating two messages instead. The issue here is that the action response (coming from the triage function) will by default overwrite the original message. By using two messages, only the second message, containing the "prompt" and actions gets overwritten, leaving the original alert body visible to the user.

https://api.slack.com/interactive-messages#building_workflows

Tests for slack integration

Creating a ticket to track this todo. @k-mahoney has a PR in progress for tests around all of the Slack usage in this project.

Better error handling when parsing message

If this JSON.parse is done in a try/catch it'd be possible to point out incorrectly formatted json when the parsing fails. We've seen at least one bad message come through:

SyntaxError: Unexpected token } in JSON at position 174
at Object.parse (native)
at /var/task/incoming/function.js:28:26
at /var/task/lib/utils.js:15:7
at /var/task/node_modules/decrypt-kms-env/index.js:21:5
at Queue._call (/var/task/node_modules/decrypt-kms-env/index.js:67:5)
at maybeNotify (/var/task/node_modules/d3-queue/build/d3-queue.js:120:7)
at /var/task/node_modules/d3-queue/build/d3-queue.js:91:12
at Response.<anonymous> (/var/task/node_modules/decrypt-kms-env/index.js:61:9)
at Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:364:18)
at Request.callListeners (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:105:20)
at Request.emit (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
at Request.emit (/var/task/node_modules/aws-sdk/lib/request.js:683:14)
at Request.transition (/var/task/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/var/task/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /var/task/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:38:9)
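
A sketch of the try/catch suggested above. The wrapper name `parseMessage` and the wording of the error are hypothetical; the point is only that a malformed message yields a descriptive error instead of an uncaught SyntaxError:

```javascript
// Parse a raw SNS message string, returning either the parsed message
// or an error that names the problem as malformed JSON.
function parseMessage(raw) {
  try {
    return { message: JSON.parse(raw) };
  } catch (err) {
    return { error: new Error('Dispatch message is not valid JSON: ' + err.message) };
  }
}
```

The caller can then pass `error` to the Lambda callback so the bad message shows up clearly in the logs.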

cc @k-mahoney @alulsh

Expand GitHub issue ticket functionality

Message specification

What information do we need from the patrol SNS message for dispatch to work?

We'll need to have content to post to both the GitHub issue and Slack, as well as directions on how to resolve the issue if necessary. The GitHub username will be provided in the case of GitHub related alarms which will need to be mapped to the correct Slack username.

Tentative list:

We can add to this and further develop structure as we work through getting a working skeleton this week. I added some tentative JSON test objects for the time being that I'm using for Slack testing. These are by no means set in stone, just something to work with.

/cc @mapbox/security

Remove Node 4 tests from Travis

Looks like we're still testing for Node 4 compatibility on Travis

https://github.com/mapbox/dispatch/blob/3579a68e48381c83b68f7f6824356dcce503b770/.travis.yml#L4

even though we only support Node 6

https://github.com/mapbox/dispatch/blob/3579a68e48381c83b68f7f6824356dcce503b770/package.json#L7

@oliikit @zmully @k-mahoney - any reason why we need these Node 4 tests or can we remove them?

Tests are failing on master

Tests were passing on master 4 days ago on Friday September 22nd, see https://travis-ci.com/mapbox/dispatch/builds/55238770. As of today they are now failing, see https://travis-ci.com/mapbox/dispatch/builds/55455033.

@oliikit first reported this while working on https://github.com/mapbox/dispatch/pull/83.

I added .log(console.log) to the failing nock test and got the following result:

matching https://api.pagerduty.com:443 to POST https://api.pagerduty.com:443/incidents: true
bodies don't match:
 { incident:
   { type: 'incident',
     title: '6cf9397c71e2: user kara responded \'no\' for self-service issue 7',
     service: { id: 'XXXXXXX', type: 'service_reference' },
     incident_key: '6cf9397c71e2' } }
 {"incident":{"type":"incident","title":"6cf9397c71e2: user kara responded 'no' for self-service issue 7","service":{"id":"XXXXXXX","type":"service_reference"},"incident_key":"6cf9397c71e2","body":{"type":"incident_body","details":"6cf9397c71e2: user kara responded 'no' for self-service issue 7\n\n https://github.com/testOwner/testRepo/issues/7"}}}

For some reason or another either PD sends a body property now or the body property was dropped (unsure which is the real request and which is the nock).

/cc @mapbox/security @oliikit

using the awscli sns to trigger dispatch-incoming

self-service

aws sns publish --topic-arn "SNS_ARN" --subject "ANYTHING" \
--message "{\"timestamp\":\"2017-07-31T00:54:06.655Z\",\"type\":\"self-service\",\"users\":[\"USER\"],\"body\":{\"github\":{\"title\":\"self-service title\",\"body\":\"self-service body\"},\"slack\":{\"message\":\"testSlackMessage\",\"actions\":{\"yes\":\"testYesAction\",\"no\":\"testNoAction\"}}}}"

high

aws sns publish --topic-arn "SNS_ARN" --subject "ANYTHING" \
--message "{\"timestamp\":\"2017-07-31T00:54:06.655Z\",\"type\":\"high\",\"body\":{\"pagerduty\":{\"title\":\"pagerduty title\"}}}"

Replace SNS_ARN with the correct SNS ARN for the function under test.
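
Hand-escaping the JSON is easy to get wrong, so one option is to generate the message with a small Node script. This is only a sketch: the function name is hypothetical, and the fields mirror the high priority example above.

```javascript
// Build a high priority Dispatch message like the awscli example above.
function buildHighPriorityMessage(title, now) {
  return {
    timestamp: now.toISOString(),
    type: 'high',
    body: { pagerduty: { title: title } }
  };
}

// Print the message as a single JSON string for use with the aws CLI.
console.log(JSON.stringify(buildHighPriorityMessage('pagerduty title', new Date())));
```

Saved under a name of your choosing, for example build-message.js, the output can be passed along with --message "$(node build-message.js)".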

CloudFormation parameter updates

Currently the GitHub repository that dispatch tickets are opened in is set via the CloudFormation template. This should be updated so that the repository specified in the template acts as the default or fallback destination for dispatch tickets, and a repository can be specified in the message specification. This way different dispatch alerts can be ticketed in specific repositories, rather than all defaulting to one.

/cc @alulsh
