deprecated-dispatch's Issues

Unique identifier per dispatch run for logging

@zmully and I discussed having a unique identifier for all logging output associated with a single dispatch run, e.g. dispatch-incoming -> dispatch-oracle -> dispatch-triage. It will be difficult to resolve any issues with one of these steps if we can't identify all the associated logs.

I'm going to take a look at the triggering SNS message metadata and the dispatch-incoming lambda context object to see if there is an ideal unique identifier we can pass with all logs from the three lambda functions associated with a single dispatch run.
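One possible shape for this, as a minimal sketch: generate a short ID once in dispatch-incoming, pass it along in every payload, and prefix every log line with it. The names `newRequestId` and `makeLogger` are hypothetical, and whether the ID should instead come from the SNS message metadata or the Lambda context is still the open question here.

```javascript
const crypto = require('crypto');

// Hypothetical helper: create a 12-character hex ID for one dispatch run.
function newRequestId() {
  return crypto.randomBytes(6).toString('hex');
}

// Hypothetical helper: returns a log function that prefixes every line
// with the run's ID. `sink` defaults to console.log and can be swapped
// out in tests.
function makeLogger(requestId, sink) {
  const write = sink || console.log;
  return function log(message) {
    write(requestId + ': ' + message);
  };
}

const requestId = newRequestId();
const log = makeLogger(requestId);
log('dispatch-incoming received SNS message');
```

The same `requestId` would then be handed to dispatch-oracle and dispatch-triage so that a single search term finds every log line for the run.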

/cc @oliikit @ianshward

Update readme documentation for dispatch

Readme needs to be updated before we 🛳️. We can start off with the overall purpose of dispatch and our motivation for developing the system, followed by the more technical documentation:

  • overall architecture
  • how to send a message to dispatch
  • how to start up your own dispatch service

The answers to "why dispatch is a thing" and what purposes security will use it for should reside in /security. (Ticket: https://github.com/mapbox/security/issues/533)

cc @mapbox/security

Refactoring to work with Slack's API changes

Slack announced that they are removing the username object from their API and will now only support a mutable display_name object and the full user id - W012A3CDE.

https://api.slack.com/changelog/2017-09-the-one-about-usernames

Currently we track the username in our employee database, which is what we query to associate the GitHub username or other IDs from the initial alarm with the Slack user. We will no longer be able to rely on this.

Some initial thoughts on how to pivot and account for this change...

  • Replace the stored Slack username with the full, immutable user id in internal database
  • Further improve fallback and error handling for missing username cases

/cc @mapbox/security

TODOs in dispatch-triage

There are about five TODOs in dispatch-triage that I left open. Most relate to finalizing the message spec and to my uncertainty about where things will sit in the payload that comes from Slack. A couple are about adding proper error handling. They're all marked with a TODO in the dispatch-triage function.

cc @k-mahoney @oliikit

Handle users with different github and slack handles

Currently Dispatch assumes both github and slack handles are the same, and uses the github handle returned from the Oracle for both operations. It's possible that users (especially new users) could have different handles, and it's possible that in the future this will not be a safe assumption as github handles are globally unique, while slack handles are only unique to the company (I believe).

The Oracle already returns both in its response.
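A minimal sketch of using each handle for its own service. The response shape (`github` and `slack` properties) and the name `resolveHandles` are assumptions for illustration; the Oracle's actual field names may differ.

```javascript
// Assumed Oracle response shape: { github: 'handle', slack: 'handle' }.
// Falls back to the GitHub handle when no Slack handle is present,
// which matches what Dispatch does today.
function resolveHandles(oracleResponse) {
  const github = oracleResponse.github;
  const slack = oracleResponse.slack || github;
  return { github: github, slack: slack };
}

const handles = resolveHandles({ github: 'octo-dev', slack: 'octo' });
console.log('tag @' + handles.github + ' on GitHub, DM @' + handles.slack + ' on Slack');
```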

Integrating with Dispatch-oracle

So dispatch-oracle is created.

As #22 pointed out, we should create a utils function within Dispatch, configured with a REST endpoint, that looks up who (or what) we're trying to retrieve. The lookup function should look like

lookup('oliikit') // looking up Github handle
lookup('ASADSKLABA1232') // looking up an asset

The parameters may change for us based on optimizing the oracle's code (per https://github.com/mapbox/dispatch-oracle/issues/3#issuecomment-324465446).

The oracle's README covers what the request and its response look like.

cc @ianshward @k-mahoney @zmully

Add labels to Github issues

We should support adding labels to Github issues. This will allow us to manage /dispatch-alerts with what's been looked at with the L1.

The label that is created should be tied to the inbound SNS message.

Release Dispatch Slack app

Carrying on from https://github.com/mapbox/dispatch/issues/63, we have the option to potentially release dispatch as a Slack application. Currently we include documentation for setting it up as a private Workspace application; it'd be great to remove as many of the manual steps as we can.

Will need to explore the Slack app release process further, as well as determine whether custom parameters like the API Gateway URL and SNS Topic are possible in a released application.

Open sourcing Dispatch

Creating an umbrella ticket to discuss and track how to open source Dispatch.

We're dividing tasks into two categories:

  1. bare minimum clean up, sanitization, and parameterization that must be done before "flipping the switch" to make Dispatch open source
  2. improvements and fixes that can technically be done after open sourcing but are still critical to Dispatch's success and adoption as an open source project

The first category of tasks won't take long at all, but it'll be important to knock these out of the way quickly so we can spend more of our week working on the features in the second category. We don't necessarily have to open source the repo after all of category 1 is complete - but we want to be able to confidently do this by the end of next week (Friday, November 10th).

@k-mahoney gardened the Dispatch project to move all tasks from category 1 to the "Current Phase". Once those are done (or almost done) we can start moving high priority category 2 items to the "Current phase".

Category 1:

Issue already in Category 2 for sure:

Protect dispatch-triage api-gateway

We should look at how to add authentication on the dispatch-triage api-gateway. I believe we could have api-gateway require an API token, and then pass the dispatch-triage api-gateway API token to the dispatch-incoming stack as a CFN parameter... or concatenated onto the action URL parameter (which is the api-gateway REST endpoint URL).

cc @k-mahoney @oliikit

Tests are failing on master

Tests were passing on master 4 days ago on Friday September 22nd, see https://travis-ci.com/mapbox/dispatch/builds/55238770. As of today they are now failing, see https://travis-ci.com/mapbox/dispatch/builds/55455033.

@oliikit first reported this while working on https://github.com/mapbox/dispatch/pull/83.

I added .log(console.log) to the failing nock test and got the following result:

matching https://api.pagerduty.com:443 to POST https://api.pagerduty.com:443/incidents: true
bodies don't match:
 { incident:
   { type: 'incident',
     title: '6cf9397c71e2: user kara responded \'no\' for self-service issue 7',
     service: { id: 'XXXXXXX', type: 'service_reference' },
     incident_key: '6cf9397c71e2' } }
 {"incident":{"type":"incident","title":"6cf9397c71e2: user kara responded 'no' for self-service issue 7","service":{"id":"XXXXXXX","type":"service_reference"},"incident_key":"6cf9397c71e2","body":{"type":"incident_body","details":"6cf9397c71e2: user kara responded 'no' for self-service issue 7\n\n https://github.com/testOwner/testRepo/issues/7"}}}

For some reason or another either PD sends a body property now or the body property was dropped (unsure which is the real request and which is the nock).

/cc @mapbox/security @oliikit

thoughts on test organization

@oliikit I took a look at the gh-tests branch. I know this is still in progress, but I've got some test organization thoughts that may be helpful. Here goes:

  • Create a ./test/fixtures directory
    • sns.js looks like it's a fixture. If so, you can stick that into the ./test/fixtures directory
    • we'll likely have a few more fixtures
  • Put the tests for the lambdas in sub directories, like:
    • ./test/triage/triage.test.js
    • ./test/incoming/incoming.test.js
    • Based on yesterday's meeting, we're going to rename the code files from dispatch-triage.js and dispatch-incoming.js to just triage.js and incoming.js so these test names will follow suit.
  • Sounds good to keep the ./lib tests in ./test/lib like you've got them.
  • I'm not sure where patrol-sns-examples.js belongs yet... it looks like a fixture. I know this was already there. You could put it in ./test/fixtures for now

cc @k-mahoney

PagerDuty serviceId passed in dispatch message body

Currently, the PagerDuty serviceId is configured as a stack parameter in dispatch-incoming and triage, but it should be overridable by a serviceId in the Dispatch message.

For example, a self-service dispatch creates a PagerDuty incident if the user hits the "no" slack action. Right now that will always go to the service set by the stack parameter, but if the message were to look like:

body: { // required
    github: { ... },
    slack: { ... },
    pagerduty: {
        service: 'PDNCI9', // optional, defaults to stack parameter
        title: 'Something', // optional, defaults to github title
        body: 'Something' // optional, defaults to github body
    }
}
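The defaulting described in the comments could be handled in one place. This is only a sketch under the proposed spec: `resolvePagerDuty` is a hypothetical helper name, and `stackServiceId` stands in for the existing stack parameter.

```javascript
// Hypothetical helper: merge optional PagerDuty overrides from a Dispatch
// message body with the stack-level and GitHub defaults.
function resolvePagerDuty(body, stackServiceId) {
  const pd = body.pagerduty || {};
  const gh = body.github || {};
  return {
    service: pd.service || stackServiceId, // message override wins
    title: pd.title || gh.title,           // defaults to the GitHub title
    body: pd.body || gh.body               // defaults to the GitHub body
  };
}

const incident = resolvePagerDuty({
  github: { title: 'Issue title', body: 'Issue body' },
  pagerduty: { service: 'PDNCI9' }
}, 'XXXXXXX');
console.log(incident);
```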

Support for announce-only tasking

Sometimes we'll want to use dispatch to DM + create GH issue for folks, but, not give the option to escalate the incident to Pagerduty since doing so would not serve a helpful purpose. An example is where we want to task a set of individuals with going through the initial enrollment of their laptop into Jamf.

Questions related to this are:

  • Do you have a single button in the Slack message, or, do you just send a non-interactive Slack DM message with a link to the Github issue? I lean toward the latter.
  • How do we capture this in the message spec? Is it announce?
{
  priority: high | self_service | announce
}

cc @k-mahoney @oliikit

Dispatch responses returned as JSON by API Gateway

The default lambda-cfn config treats anything returned by the function as JSON. With Dispatch triage this means the response to the user is always quoted. NBD, but kind of annoying. The default API-gateway response mapping would need to be updated for triage.

Expand GitHub issue ticket functionality

In working through implementing our first broadcast and self-service alerts, we've found that more passive error reporting in the associated GitHub issue would be ideal, rather than failing outright or escalating immediately to PD. For Slack errors specifically, @zmully has laid out a lot of the initial work on this in #78.

Similarly, it would make sense to have a dispatch route that simply ingested the incoming SNS and opened an issue ticket and tagged a specified team or team member. We currently have priority, self-service, and broadcast alert types - so maybe this could be low-priority.

Staging dispatch

@k-mahoney what would it take to get a staging dispatch in place, whereby we could start sending messages to a staging dispatch-incoming and DM ourselves in the Mapbox Slack org? Getting staging dispatch-incoming and dispatch-triage stacks up is straightforward, but I have less of an idea what's required to hook up Slack. Does it make sense timing-wise to set this up now?

cc @oliikit

Add package-lock.json

We should add a package-lock.json file to this project to avoid issues like https://github.com/mapbox/dispatch/issues/84 in the future.

Per the npm docs on package-lock.json:

package-lock.json is automatically generated for any operations where npm modifies either the node_modules tree, or package.json. It describes the exact tree that was generated, such that subsequent installs are able to generate identical trees, regardless of intermediate dependency updates.

This file is intended to be committed into source repositories

/cc @oliikit @zmully @k-mahoney

Dashboard of Dispatch activity in Sumologic

So we can keep 👀 on dispatch's activity, we should have a dashboard of its activities, which should include, but is not limited to:

  • dispatch oracle's directory creation (should be every 2 hours)
  • requests to oracle
  • messages sent to users which includes:
    • message content
    • if the user took action
  • any errors

"Send reminder" functionality

For the future backlog: some tasks create GH issues to track a team member's progress on completing an action, and for some of those Dispatch is the only place that tracks who has and has not taken the action, because no other data source we have access to can tell us. For these we could have a remind CLI command which would work like:

  • $ remind "title of issue" "text of reminder"

  • dispatch-remind looks up all the GH issues open that match the title
  • dispatch-remind posts a reminder comment to remind folks they still need to complete the action

Code clean-up tasks

Ticket to track assorted, lingering code clean-up tasks, in preparation for future open source plans.

  • Merge linter https://github.com/mapbox/dispatch/pull/36
  • Add Travis
  • Abstract dispatch-incoming GitHub issue creation to function
  • Abstract dispatch-incoming Slack alert post to function
  • Normalize styling
  • Normalize test structure
  • Add in-line function documentation
  • Expand tests in dispatch-incoming
    • Tests for decrypt failure behavior
    • Add tests for error handling in self-service events
    • Add tests for error handling in PD events

/cc @oliikit @ianshward

Adapt a patrol rule to use Dispatch

Which patrol rule should we adapt to use Dispatch? How about 2FA disabled on Github ? What does it actually take to adapt a rule to use Dispatch?

The patrol rule needs to send an SNS message to dispatch. To do this there needs to be an SNS IAM policy in place to allow the lambda to post to the dispatch SNS topic. @zmully i partly recall you discussing this. Is this supported in lambda-cfn?

Otherwise, it looks like what it will take is swapping out message with code that sends a dispatch-formatted message to dispatch's SNS topic. Maybe we could ship a client with some error handling in dispatch, so you could do, within patrol-rules-github:

var client = require('dispatch').client;
client.send({
  // properties for the message
}, function(err, res) {
  // handle the error or response here
});

@k-mahoney @zmully do you think it makes sense to start w/ this rule first, and, what do you think about the steps and the client?

Support more than just a single Github repository

Dispatch should allow for dispatch messages to define to what Github repository a message gets posted, instead of having this be a single repository. This entails at least changing the message spec. to support a repo parameter in the github property, and having the code use this to determine where to post the message.

using the awscli sns to trigger dispatch-incoming

self-service

aws sns publish --topic-arn "SNS_ARN" --subject "ANYTHING" \
--message "{\"timestamp\":\"2017-07-31T00:54:06.655Z\",\"type\":\"self-service\",\"users\":[\"USER\"],\"body\":{\"github\":{\"title\":\"self-service title\",\"body\":\"self-service body\"},\"slack\":{\"message\":\"testSlackMessage\",\"actions\":{\"yes\":\"testYesAction\",\"no\":\"testNoAction\"}}}}"

high

aws sns publish --topic-arn "SNS_ARN" --subject "ANYTHING" \
--message "{\"timestamp\":\"2017-07-31T00:54:06.655Z\",\"type\":\"high\",\"body\":{\"pagerduty\":{\"title\":\"pagerduty title\"}}}"

replace the SNS_ARN with the correct SNS arn for the function under test.

Retrigger: false - add option to check closed issues before creating new GitHub issue

I'd like to add an option to the Dispatch message spec (and the relevant code in Dispatch itself) so that Dispatch also checks for closed GitHub issues with the same title before deciding to create a new issue + send a Slack DM.

The ultimate goal is to leverage Dispatch for checking state (in the GitHub issues) and for the code in my Patrol rule to avoid having to manage state.

Consider the following use case:

  1. A patrol-rule-google function checks for new public Google Drive documents. It can't use push notifications so it instead uses polling and returns the 50-100 most recent public file ACL changes in Google Drive. For the sake of not worrying about existing public Google Drive docs, it won't return any data before a specified date.
  2. This rule sends an SNS message to Dispatch using the message spec.
  3. Dispatch creates a GitHub issue with a title like "User X created a public Google Drive document on <timestamp>." Due to the combination of username and timestamp, this title is unique (for our purposes). It also sends a Slack DM to the user.
  4. The user responds to the ticket and makes the Google Drive document private. They close the GitHub issue.
  5. Our patrol rule runs again after 30 minutes and since it's polling, returns the same data from Step 1. Since the original GitHub issue was closed, Dispatch creates a new GitHub issue (Dispatch currently only searches for open issues) and sends another Slack DM to the user. The user closes the issue and the process keeps repeating. This results in spamming the user and is highly undesirable.
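The decision in step 5 could be sketched as a small pure function. Everything named here is hypothetical: `shouldCreateIssue` is an illustrative helper, `issues` is assumed to be a list of `{ title, state }` objects from whatever GitHub search Dispatch performs, and `retrigger` is the proposed message-spec option, defaulting to today's behavior.

```javascript
// Hypothetical decision helper for the proposed `retrigger` option.
// Returns true when Dispatch should open a new issue and send a DM.
function shouldCreateIssue(issues, title, retrigger) {
  const allowRetrigger = retrigger !== false; // default: current behavior
  return !issues.some(function (issue) {
    if (issue.title !== title) return false;
    // An open issue with the same title always blocks a new one.
    if (issue.state === 'open') return true;
    // A closed issue only blocks a new one when retrigger is false.
    return !allowRetrigger;
  });
}

const existing = [{ title: 'User X created a public doc', state: 'closed' }];
console.log(shouldCreateIssue(existing, 'User X created a public doc', false));
```

With `retrigger: false`, the closed issue from step 4 is enough to stop the repeat alert; without it, Dispatch behaves as it does now.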

/cc @k-mahoney @zmully @ianshward

Load testing self-service messages

See #51 for background on broadcast message load testing. Self-service messages will be more difficult to control concurrency for, as the current dispatch architecture for self-service dispatches maps a single SNS event to a single self-service dispatch lambda function. As Lambda has no concurrency controls other than an account invocation limit, if 200 messages are dispatched, Lambda could invoke all 200 Dispatch-incoming lambda functions at once, and Slack would rate-limit some number of them.

Possibilities:

{
  forever: true,
  retries: 20, 
  factor: 1.22226,
  maxTimeout: 5 * 60 * 1000,
  randomize: true
}
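The options above resemble those accepted by exponential backoff libraries. As an illustration only, the sketch below computes the delay before each retry using the common formula min(minTimeout * factor^attempt, maxTimeout). The `backoffDelays` helper is hypothetical, the 1000 ms `minTimeout` is an assumption since the options don't set one, and randomization is left out so the delays are deterministic.

```javascript
// Hypothetical helper: list the wait (in ms) before each retry attempt.
function backoffDelays(options) {
  const minTimeout = options.minTimeout || 1000; // assumed default
  const delays = [];
  for (let attempt = 0; attempt < options.retries; attempt++) {
    const delay = Math.round(minTimeout * Math.pow(options.factor, attempt));
    delays.push(Math.min(delay, options.maxTimeout)); // cap at maxTimeout
  }
  return delays;
}

const delays = backoffDelays({
  retries: 20,
  factor: 1.22226,
  maxTimeout: 5 * 60 * 1000
});
console.log(delays);
```

Printing the schedule like this makes it easier to judge whether the retries spread Slack calls out enough to stay under the rate limit.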

Update/add docs for open sourcing

As part of open sourcing Dispatch we should update our existing documentation as well as add new docs.

Update existing

  • README.md
  • MESSAGE-SPEC.md

Add new

  • CONTRIBUTING.md (using a Contributing section of the README instead)
  • CHANGELOG.md

@k-mahoney noted that for the MESSAGE-SPEC, we should be clear about there needing to be an outside oracle that stores username mappings, and that this could be either a service or a file.

Interactive slack buttons support return values

Currently the interactive slack buttons return the value of the triage function's successful callback, so the user gets a message like "b3c90ad35e58: closed GitHub issue 41" when they click on an interactive button.

The Slack interactive buttons (https://api.slack.com/docs/interactive-message-field-guide) support passing a value behind the scenes which we can use to hold the value of what we'd like the response to the user to be. This is currently hardcoded in dispatch.

So the SNS message could look like:

{
   ...
        slack: {
            message: 'STRING_VALUE', 
            actions: {
                yes: "Yes, I've enrolled!",
                yes_response: 'Thank you for enrolling your computer in JAMF!',
                no: 'Nope, I had an issue enrolling!', // Slack button text for 'no' action type
                no_response: "We're sorry you've had issues, the Security team has been notified, and we'll be in touch!"
            }
        }
    }
}
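On the triage side, picking the reply could be as small as the sketch below. `responseFor` is a hypothetical helper; the `yes_response` and `no_response` keys follow the proposal above, and `fallback` stands in for the response Dispatch currently hardcodes.

```javascript
// Hypothetical helper: choose the text shown to the user after a click.
// `clicked` is the action type, either 'yes' or 'no'.
function responseFor(actions, clicked, fallback) {
  const custom = actions[clicked + '_response'];
  return custom || fallback;
}

const actions = {
  yes: "Yes, I've enrolled!",
  yes_response: 'Thank you for enrolling your computer in JAMF!',
  no: 'Nope, I had an issue enrolling!'
};
console.log(responseFor(actions, 'yes', 'closed GitHub issue'));
console.log(responseFor(actions, 'no', 'closed GitHub issue'));
```

Keeping a fallback means existing messages without the new keys would keep working unchanged.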

dispatch-oracle: knowing who to DM and tag

In order to know who to DM on Slack, as well as tag in Github issues, for rules like https://github.com/mapbox/patrol-service/issues/137 we need a way to look up a user's Slack and Github handles. On one hand we could assume that all messages sent to dispatch should already contain who to DM and who to tag. However, having to bake that into every single patrol rule would not be very convenient. I'm leaning toward supporting two ways (example message JSON sent to dispatch; property names are just examples and don't imply we'd call them this):

{
  recipient: {
    slack: "@ianshward",
    github: "@ianshward"
  }
}

In this format, we tell dispatch we know who to send the message to

So what does our REST endpoint do

For our REST endpoint (dispatch-oracle), it'd be an api-gateway-fronted lambda. The lambda code would fetch the latest version of our internal list of GitHub and Slack handles.

The interface could look like:

lookup('@ianshward')

What do other peoples' REST endpoints do

Assuming we open source dispatch, people can plug in a URL to their "oracle endpoint" and it can work however they want it to. It should just return JSON in a specified manner.

Where does this live

This dispatch-oracle should be a separate GH project, since if we open source Dispatch, dispatch-oracle is very custom to our use case and we would not want to make it publicly available.

Is it overkill

I don't think so. This is one of the most critical aspects of dispatch - knowing who to assign a task to, and, ensuring dispatch always has a very up-to-date directory (which /us does for us already).

cc @k-mahoney @oliikit

User testing

We need to test both the broadcast and self-service workflows with folks outside of the security team to get a sense of how simple and straightforward they are to use.

Alongside testing with a handful of specific people, as a PoC we'll be doing our initial test with the DC office, sending a broadcast alert instructing how to register for Jamf and the deadline for doing so. Alert text can be found here. Will track feedback in this ticket.

Round 1

Round 2

/cc @zmully @oliikit

Volume testing broadcast messages

Make sure we can:

  • Dispatch a broadcast message to the entire company (say 300 for now)
  • Dispatch a self-service message to entire company

Then double the volume to see if we hit any API limits in GitHub (self-service) or Slack (either type).

Do not require a GitHub issue for broadcast

We should have the option to send a broadcast message without creating a GitHub issue. Have some initial work on this, iterated through discussion related to Jamf enrollment alert PoC.

CloudFormation parameter updates

Currently the GitHub repository dispatch tickets are opened in is set via the CloudFormation template. This should be updated such that the repository specified in the template acts as the default or fallback destination for dispatch tickets and allow a repository to be specified in the message specification. This way different dispatch alerts can be ticketed in specific repositories, rather than all defaulting to one.
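The fallback could look like the following sketch. `resolveRepo` is a hypothetical helper, and `body.github.repo` is an assumed property name that is not part of the current message spec; `defaultRepo` stands in for the value from the CloudFormation template.

```javascript
// Hypothetical helper: pick the repository a dispatch ticket is opened in.
// Prefers a repo named in the message, else the template's default.
function resolveRepo(message, defaultRepo) {
  const github = (message.body && message.body.github) || {};
  return github.repo || defaultRepo;
}

console.log(resolveRepo({ body: { github: { repo: 'team-alerts' } } }, 'dispatch-alerts'));
console.log(resolveRepo({ body: { github: { title: 'No repo given' } } }, 'dispatch-alerts'));
```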

/cc @alulsh

Interactive dispatch generates two slack messages

@k-mahoney could not get the "replace_original": "false" flag to work so went with generating two messages instead. The issue here is that the action response (coming from the triage function) will by default overwrite the original message. By using two messages, only the second message, containing the "prompt" and actions gets overwritten, leaving the original alert body visible to the user.

https://api.slack.com/interactive-messages#building_workflows

Make KMS configurable

Right now the KMS key for encrypting secure CloudFormation parameters is hard coded in the CloudFormation template for both dispatch-incoming and dispatch-triage.

We should find a way to parameterize it. Hopefully this is a matter of creating a new CloudFormation parameter, then referencing the value of the parameter in the IAM statement.

/cc @k-mahoney

Message specification

What information do we need from the patrol SNS message for dispatch to work?

We'll need to have content to post to both the GitHub issue and Slack, as well as directions on how to resolve the issue if necessary. The GitHub username will be provided in the case of GitHub related alarms which will need to be mapped to the correct Slack username.

Tentative list:

  • Issue content for GitHub
  • Message content for Slack prompt
  • Button text for affirmative option (yes)
  • Button text for negative option (no)
  • GitHub username
  • AWS username
  • Slack username
  • Timestamp
  • Priority

We can add to this and further develop structure as we work through getting a working skeleton this week. I added some tentative JSON test objects for the time being that I'm using for Slack testing. These are by no means set in stone, just something to work with.

/cc @mapbox/security

Commandline tool for manually triggering dispatch events

For many incidents, the appropriate course of action would be to open a master issue to triage and coordinate with, then manually trigger a Dispatch event to notify affected users.

For instance, a credential leak requires rotation of X users credentials. While triaging the main incident, a Dispatch self-service event would be triggered from the commandline something like:

$ dispatch self-service --users affectedUsers.json --message message.json

The users file would be the array to be passed to the Oracle for lookup:
[ 'user1', 'user2', 'user3', 'user4' ]

The message JSON file would be the appropriate body object from the Dispatch message specification, for example for a self-service message:

{
    github: {
        title: 'User1 NPM credentials require immediate rotation',
        body: 'We have detected that your NPM credentials may have been compromised. Rotate your credentials immediately. Please see issue #XXX for details and instructions'
    },
    slack: {
        message: 'Your NPM credentials may have been compromised, please rotate your NPM tokens immediately, following the instructions in issue #XXX. If you have any questions please ask in #security',
        actions: {
            yes: 'I have rotated all my NPM tokens',
            no: "I'm busy, remind me later!"
        }
    }
}

In this case of a self-service message, the dispatch CLI would:

  1. do a CFN look up for the SNS topic of the dispatch-incoming-production stack
  2. Query the Oracle with the user array
  3. Perform some basic error handling and defaults on the Oracle response
  4. Generate Dispatch messages and inject into the Dispatch SNS topic
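Step 4 could be sketched as below. `buildMessages` is a hypothetical helper, the field names mirror the self-service example messages used elsewhere in these issues, and generating one message per user (rather than one message listing all users) is an assumption.

```javascript
// Hypothetical helper for step 4: build one Dispatch message per user.
// `now` is optional and only exists to make the timestamp testable.
function buildMessages(users, body, now) {
  const timestamp = (now || new Date()).toISOString();
  return users.map(function (user) {
    return {
      timestamp: timestamp,
      type: 'self-service',
      users: [user],
      body: body
    };
  });
}

const messages = buildMessages(['user1', 'user2'], {
  github: { title: 'Rotate NPM credentials', body: 'See the main issue.' },
  slack: {
    message: 'Please rotate your NPM tokens.',
    actions: { yes: 'Done', no: 'Remind me later' }
  }
});
// Each entry would be serialized and published to the Dispatch SNS topic.
console.log(JSON.stringify(messages[0]));
```

Serializing with JSON.stringify also avoids the hand-escaped quoting that manual `aws sns publish` calls need.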

Better error handling when parsing message

If this JSON.parse is done in a try/catch it'd be possible to point out incorrectly formatted json when the parsing fails. We've seen at least one bad message come through:

SyntaxError: Unexpected token } in JSON at position 174
at Object.parse (native)
at /var/task/incoming/function.js:28:26
at /var/task/lib/utils.js:15:7
at /var/task/node_modules/decrypt-kms-env/index.js:21:5
at Queue._call (/var/task/node_modules/decrypt-kms-env/index.js:67:5)
at maybeNotify (/var/task/node_modules/d3-queue/build/d3-queue.js:120:7)
at /var/task/node_modules/d3-queue/build/d3-queue.js:91:12
at Response.<anonymous> (/var/task/node_modules/decrypt-kms-env/index.js:61:9)
at Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:364:18)
at Request.callListeners (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:105:20)
at Request.emit (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
at Request.emit (/var/task/node_modules/aws-sdk/lib/request.js:683:14)
at Request.transition (/var/task/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/var/task/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /var/task/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:38:9)
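A minimal sketch of the try/catch approach, assuming the function uses Node-style callbacks. `parseMessage` is a hypothetical wrapper, not existing Dispatch code, and the error wording is illustrative.

```javascript
// Hypothetical wrapper around JSON.parse that reports malformed messages
// through a Node-style callback instead of throwing.
function parseMessage(raw, callback) {
  let message;
  try {
    message = JSON.parse(raw);
  } catch (err) {
    // Keep the parser's own message so the bad position is still reported.
    return callback(new Error('Dispatch message is not valid JSON: ' + err.message));
  }
  return callback(null, message);
}

// The stray closing brace makes this input invalid.
parseMessage('{"type":"high"}}', function (err, message) {
  if (err) return console.log(err.message);
  console.log(message.type);
});
```

The caller then gets a clear "not valid JSON" error to log or post to the issue, rather than an unhandled SyntaxError deep in the stack.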

cc @k-mahoney @alulsh
