mapbox / deprecated-dispatch
Alarm routing engine for security and platform incident response teams.
License: Other
@zmully and I discussed having a unique identifier for all logging output associated with a single dispatch run, e.g. dispatch-incoming -> dispatch-oracle -> dispatch-triage. It will be difficult to resolve any issues with one of these steps if we can't identify all the associated logs.
I'm going to take a look at the triggering SNS message metadata and the dispatch-incoming lambda context object to see if there is an ideal unique identifier we can pass with all logs from the three lambda functions associated with a single dispatch run.
/cc @oliikit @ianshward
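A minimal sketch of how such an identifier could be threaded through log output, assuming the SNS MessageId of the triggering event is used and the Lambda context's awsRequestId serves as a fallback. The helper names requestIdFrom and makeLogger are hypothetical and not part of dispatch.

```javascript
'use strict';

// Hypothetical helper: pick one identifier for the whole dispatch run.
// Prefers the SNS MessageId of the triggering event and falls back to
// the Lambda context's awsRequestId.
function requestIdFrom(event, context) {
  const record = event && event.Records && event.Records[0];
  if (record && record.Sns && record.Sns.MessageId) return record.Sns.MessageId;
  return (context && context.awsRequestId) || 'unknown';
}

// Hypothetical helper: returns a log function that prefixes every line
// with the identifier so related logs can be searched together.
function makeLogger(requestId, sink) {
  const write = sink || console.log;
  return function log(message) {
    write(requestId + ': ' + message);
  };
}

// Example usage with a fake SNS event and an in-memory sink.
const lines = [];
const id = requestIdFrom({ Records: [{ Sns: { MessageId: 'abc-123' } }] }, {});
const log = makeLogger(id, (line) => lines.push(line));
log('created GitHub issue');
```

For the later functions to log the same value, the identifier would also have to travel in whatever payload dispatch-incoming hands on to them.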
Readme needs to be updated before we 🛳️. We can start off with the overall purpose of dispatch and our motivation for developing the system, followed by the more technical documentation:
The answers to "why dispatch is a thing" and what purposes security will use it for should reside in /security. (Ticket-https://github.com/mapbox/security/issues/533)
cc @mapbox/security
Slack announced that they are removing the username object from their API and will now only support a mutable display_name object and the full user id - W012A3CDE.
https://api.slack.com/changelog/2017-09-the-one-about-usernames
Currently we track the username in our employee database, which is what we query to associate the GitHub username or other IDs from the initial alarm with the Slack user. We will no longer be able to rely on this.
Some initial thoughts on how to pivot and account for this change...
username with the full, immutable user id in internal database
username cases
/cc @mapbox/security
There are about five TODOs in dispatch-triage which I left open, related to the finalization of the message spec and to uncertainty I had around where things would be in the payload that comes from Slack. There are also a couple of cases of adding proper error handling. They're all identified with a TODO in the dispatch-triage function.
In order to make writing tests easier, we need to write a wrapper around dke, stick it in ./lib/utils, and use that wrapper in the lambdas. The wrapper can check for NODE_ENV=test and in that case would not actually try to decrypt the env, but instead it'd do nothing and just return the process.env as-is.
cc @oliikit
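A rough sketch of what such a wrapper might look like. The decrypt function is passed in rather than required, and it is assumed to take the shape decrypt(env, callback); the name decryptEnv is hypothetical.

```javascript
'use strict';

// Sketch of a wrapper around a decrypt function such as dke.
// `decrypt` is injected and assumed to have the shape decrypt(env, callback).
function decryptEnv(env, decrypt, callback) {
  if (env.NODE_ENV === 'test') {
    // In tests, skip decryption and hand back the environment untouched.
    return callback(null, env);
  }
  decrypt(env, function (err) {
    if (err) return callback(err);
    callback(null, env);
  });
}

// Example: under NODE_ENV=test the decrypt function is never called.
let decryptCalled = false;
let result;
decryptEnv({ NODE_ENV: 'test', SlackToken: 'encrypted-value' }, function () {
  decryptCalled = true;
}, function (err, env) {
  result = env;
});
```

In the lambdas the first argument would be process.env, so tests only need to set NODE_ENV=test before loading the function.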
Currently Dispatch assumes both github and slack handles are the same, and uses the github handle returned from the Oracle for both operations. It's possible that users (especially new users) could have different handles, and it's possible that in the future this will not be a safe assumption as github handles are globally unique, while slack handles are only unique to the company (I believe).
The Oracle already returns both in its response.
So dispatch-oracle is created.
As #22 pointed out, we should create a utils function within Dispatch to which we can provide a REST endpoint to look up who we're trying to retrieve. The lookup function should look like
lookup('oliikit') // looking up Github handle
lookup('ASADSKLABA1232') // looking up an asset
The parameters may change for us based on optimizing the oracle's code (per https://github.com/mapbox/dispatch-oracle/issues/3#issuecomment-324465446).
The oracle's README covers what the request looks like and its response.
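One possible shape for that utility, sketched with the HTTP call injected so it can be stubbed. The request body ({ query }) and the response fields (github, slack) are assumptions for illustration only; the real shapes are whatever the oracle's README specifies. makeLookup and the example URL are hypothetical.

```javascript
'use strict';

// Hypothetical lookup helper. `endpoint` is the oracle's REST URL and
// `post` is any function (url, body, callback) that performs the HTTP call.
// Both the request body shape and the response shape are assumptions.
function makeLookup(endpoint, post) {
  return function lookup(query, callback) {
    post(endpoint, { query: query }, function (err, res) {
      if (err) return callback(err);
      if (!res) return callback(new Error('no match for ' + query));
      callback(null, res);
    });
  };
}

// Example with a stubbed HTTP call that echoes the query back.
const fakePost = (url, body, cb) => cb(null, { github: body.query, slack: body.query });
const lookup = makeLookup('https://example.com/oracle', fakePost);
let found;
lookup('oliikit', (err, user) => { found = user; });
```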
We should support adding labels to Github issues. This will allow us to manage /dispatch-alerts with what's been looked at with the L1.
The label that is created should be tied to the inbound SNS message.
Looks like we're still testing for Node 4 compatibility on Travis
https://github.com/mapbox/dispatch/blob/3579a68e48381c83b68f7f6824356dcce503b770/.travis.yml#L4
even though we only support Node 6
https://github.com/mapbox/dispatch/blob/3579a68e48381c83b68f7f6824356dcce503b770/package.json#L7
@oliikit @zmully @k-mahoney - any reason why we need these Node 4 tests or can we remove them?
Carrying on from https://github.com/mapbox/dispatch/issues/63, we have the option to potentially release dispatch as a Slack application. Currently we include documentation for setting it up as a private Workspace application; it'd be great to alleviate the manual steps where we can.
Will need to explore the Slack app release process further, as well as determine whether custom parameters like the API Gateway URL and SNS Topic are possible in a released application.
Creating an umbrella ticket to discuss and track how to open source Dispatch.
We're dividing tasks into two categories:
The first category of tasks won't take long at all, but it'll be important to knock these out of the way quickly so we can spend more of our week working on the features in the second category. We don't necessarily have to open source the repo after all of category 1 is complete - but we want to be able to confidently do this by the end of next week (Friday, November 10th).
@k-mahoney gardened the Dispatch project to move all tasks from category 1 to the "Current Phase". Once those are done (or almost done) we can start moving high priority category 2 items to the "Current phase".
Category 1:
Issue already in Category 2 for sure:
This is just a refactor task, likely best to do after https://github.com/mapbox/dispatch/issues/6. The idea is to rename them to just:
./triage
./incoming
(getting rid of the dispatch- prefix) which is just redundant.
We should look at how to add authentication on the dispatch-triage api-gateway. I believe we could have api-gateway require an API token, and then pass the dispatch-triage api-gateway API token to the dispatch-incoming stack as a CFN parameter... or concatenated onto the action URL parameter (which is the api-gateway REST endpoint URL).
Tests were passing on master 4 days ago on Friday September 22nd, see https://travis-ci.com/mapbox/dispatch/builds/55238770. As of today they are now failing, see https://travis-ci.com/mapbox/dispatch/builds/55455033.
@oliikit first reported this while working on https://github.com/mapbox/dispatch/pull/83.
I added .log(console.log) to the failing nock test and got the following result:
matching https://api.pagerduty.com:443 to POST https://api.pagerduty.com:443/incidents: true
bodies don't match:
{ incident:
{ type: 'incident',
title: '6cf9397c71e2: user kara responded \'no\' for self-service issue 7',
service: { id: 'XXXXXXX', type: 'service_reference' },
incident_key: '6cf9397c71e2' } }
{"incident":{"type":"incident","title":"6cf9397c71e2: user kara responded 'no' for self-service issue 7","service":{"id":"XXXXXXX","type":"service_reference"},"incident_key":"6cf9397c71e2","body":{"type":"incident_body","details":"6cf9397c71e2: user kara responded 'no' for self-service issue 7\n\n https://github.com/testOwner/testRepo/issues/7"}}}
For some reason or another either PD sends a body property now or the body property was dropped (unsure which is the real request and which is the nock).
/cc @mapbox/security @oliikit
The PD message body is currently empty; the incident title is the Slack message body.
@oliikit I took a look at the gh-tests branch. I know this is still in progress, but I've got some test organization thoughts that may be helpful. Here goes:
cc @k-mahoney
Currently, the PagerDuty serviceId is configured as a stack parameter in dispatch-incoming and triage, but it should be overridable by a serviceId in the Dispatch message.
For example, a self-service dispatch creates a PagerDuty incident if the user hits the "no" slack action. Right now that will always go to the service set by the stack parameter, but if the message were to look like:
body: { // required
  github: { ... },
  slack: { ... },
  pagerduty: {
    service: 'PDNCI9', // optional, defaults to stack parameter
    title: 'Something', // optional, defaults to github title
    body: 'Something' // optional, defaults to github body
  }
}
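The defaulting described in the comments above could be resolved along these lines. This is only a sketch; resolvePagerDuty and the 'STACK_DEFAULT' value are hypothetical names standing in for the real code and the stack parameter.

```javascript
'use strict';

// Resolve PagerDuty settings for a message body, falling back to the
// stack parameter for the service and to the GitHub title/body for text.
function resolvePagerDuty(body, stackServiceId) {
  const pd = body.pagerduty || {};
  const gh = body.github || {};
  return {
    service: pd.service || stackServiceId,
    title: pd.title || gh.title,
    body: pd.body || gh.body
  };
}

// Example: the message overrides only the service.
const resolved = resolvePagerDuty({
  github: { title: 'GH title', body: 'GH body' },
  pagerduty: { service: 'PDNCI9' }
}, 'STACK_DEFAULT');
```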
Sometimes we'll want to use dispatch to DM + create GH issue for folks, but, not give the option to escalate the incident to Pagerduty since doing so would not serve a helpful purpose. An example is where we want to task a set of individuals with going through the initial enrollment of their laptop into Jamf.
Questions related to this are:
announce?
{
  priority: high | self_service | announce
}
In working through implementing our first broadcast and self-service alerts, we've found that more passive error reporting in the associated GitHub issue would be ideal - rather than failing outright or escalating immediately to PD. For Slack errors specifically, @zmully has laid out a lot of the initial work on this in #78.
Similarly, it would make sense to have a dispatch route that simply ingested the incoming SNS and opened an issue ticket and tagged a specified team or team member. We currently have priority, self-service, and broadcast alert types - so maybe this could be low-priority.
@k-mahoney what would it take to get a staging dispatch in place, whereby we could start sending messages to a staging dispatch-incoming and DM ourselves in the Mapbox Slack org? Getting staging dispatch-incoming and dispatch-triage stacks up is straightforward, but I have less of an idea what's required to hook up Slack. Does it make sense timing-wise to set this up now?
cc @oliikit
As we work through more detailed UX testing, a number of minor issues with the interactive Slack issue flow have come to light. Opening this to track them.
We should add a package-lock.json file to this project to avoid issues like https://github.com/mapbox/dispatch/issues/84 in the future.
Per the npm docs on package-lock.json:
package-lock.json is automatically generated for any operations where npm modifies either the node_modules tree, or package.json. It describes the exact tree that was generated, such that subsequent installs are able to generate identical trees, regardless of intermediate dependency updates.
This file is intended to be committed into source repositories
So we can keep 👀 on dispatch's activity, we should have a dashboard of its activities, which should include, but is not limited to:
For the future backlog - for tasks that create GH issues to track a team member's progress on completing an action, especially those where we want dispatch to track state as to who has and has not taken an action when doing so is otherwise impossible to track from some other data source we have access to, we could have a remind CLI command which would work like:
$ remind "title of issue" "text of reminder"
Ticket to track assorted, lingering code clean-up tasks, in preparation for future open source plans.
dispatch-incoming GitHub issue creation to function
dispatch-incoming Slack alert post to function
dispatch-incoming decrypt failure behavior
/cc @oliikit @ianshward
Which patrol rule should we adapt to use Dispatch? How about 2FA disabled on Github ? What does it actually take to adapt a rule to use Dispatch?
The patrol rule needs to send an SNS message to dispatch. To do this there needs to be an SNS IAM policy in place to allow the lambda to post to the dispatch SNS topic. @zmully I partly recall you discussing this. Is this supported in lambda-cfn?
Otherwise, it looks like what it will take is swapping out message with code that sends a dispatch-formatted message to dispatch's SNS topic. Maybe we could ship a client with some error handling in dispatch, so you could do, within patrol-rules-github:
var client = require('dispatch').client;
client.send({
  // properties for the message
}, function(err, res) {
  // handle the error or the response here
});
@k-mahoney @zmully do you think it makes sense to start w/ this rule first, and, what do you think about the steps and the client?
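As a sketch of what such a client could do internally: the version below takes an SNS-like object with a publish(params, callback) method (the AWS SDK's SNS client has that shape) and a topic ARN. The createClient name, the 'dispatch' subject, and the "type is required" check are assumptions for illustration.

```javascript
'use strict';

// Hypothetical client factory. `sns` is expected to expose
// publish(params, callback), as the AWS SDK's SNS client does.
function createClient(sns, topicArn) {
  return {
    send: function (message, callback) {
      // Assumed minimal validation: every dispatch message carries a type.
      if (!message || !message.type) {
        return callback(new Error('message requires a type'));
      }
      sns.publish({
        TopicArn: topicArn,
        Subject: 'dispatch',
        Message: JSON.stringify(message)
      }, callback);
    }
  };
}

// Example with a fake SNS object that records what was published.
const published = [];
const fakeSns = {
  publish: (params, cb) => { published.push(params); cb(null, { MessageId: 'fake-id' }); }
};
const client = createClient(fakeSns, 'SNS_ARN');
let sendError = null;
client.send({ type: 'self-service', users: ['USER'] }, (err) => { sendError = err; });
```

Injecting the SNS object keeps the client testable without AWS credentials; a patrol rule would pass a real SNS client and the dispatch topic ARN.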
Dispatch should allow for dispatch messages to define to what Github repository a message gets posted, instead of having this be a single repository. This entails at least changing the message spec. to support a repo parameter in the github property, and having the code use this to determine where to post the message.
self-service
aws sns publish --topic-arn "SNS_ARN" --subject "ANYTHING" \
--message "{\"timestamp\":\"2017-07-31T00:54:06.655Z\",\"type\":\"self-service\",\"users\":[\"USER\"],\"body\":{\"github\":{\"title\":\"self-service title\",\"body\":\"self-service body\"},\"slack\":{\"message\":\"testSlackMessage\",\"actions\":{\"yes\":\"testYesAction\",\"no\":\"testNoAction\"}}}}"
high
aws sns publish --topic-arn "SNS_ARN" --subject "ANYTHING" \
--message "{\"timestamp\":\"2017-07-31T00:54:06.655Z\",\"type\":\"high\",\"body\":{\"pagerduty\":{\"title\":\"pagerduty title\"}}}"
Replace SNS_ARN with the correct SNS ARN for the function under test.
I'd like to add an option to the Dispatch message spec (and the relevant code in Dispatch itself) so that Dispatch also checks for closed GitHub issues with the same title before deciding to create a new issue + send a Slack DM.
The ultimate goal is to leverage Dispatch for checking state (in the GitHub issues) and for the code in my Patrol rule to avoid having to manage state.
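A small sketch of the decision this option would change, assuming Dispatch already skips creation when an open issue with the same title exists. The function name needsNewIssue and the checkClosed flag are hypothetical.

```javascript
'use strict';

// Decide whether a new issue is needed. `issues` is a list of
// { title, state } objects, where state is 'open' or 'closed'.
// `checkClosed` stands in for the proposed message option.
function needsNewIssue(title, issues, checkClosed) {
  return !issues.some(function (issue) {
    if (issue.title !== title) return false;
    return issue.state === 'open' || (checkClosed && issue.state === 'closed');
  });
}

// Example: only a closed issue with the same title exists.
const existing = [{ title: '2FA disabled for user1', state: 'closed' }];
const withoutOption = needsNewIssue('2FA disabled for user1', existing, false);
const withOption = needsNewIssue('2FA disabled for user1', existing, true);
```

With the option off, the closed issue is ignored and a new issue would be created; with it on, the closed issue counts as existing state and nothing new is sent.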
Consider the following use case:
See #51 for background on broadcast message load testing. Self-service messages will be more difficult to control concurrency for, as the current dispatch architecture for self-service dispatches maps a single SNS event to a single self-service dispatch lambda function. As Lambda has no concurrency controls other than an account invocation limit, if 200 messages are dispatched, Lambda could invoke all 200 dispatch-incoming lambda functions at once, and Slack would rate-limit some number of them.
Possibilities:
{
forever: true,
retries: 20,
factor: 1.22226,
maxTimeout: 5 * 60 * 1000,
randomize: true
}
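To get a feel for what these settings imply, the sketch below computes an exponential backoff schedule from them. It is a standalone illustration, not a call into any retry library; the 1000 ms base delay is an assumption, since the options above do not set a minimum timeout, and randomization is switched off here so the schedule is reproducible.

```javascript
'use strict';

// Delay in ms before a given retry attempt (0-based), exponential backoff
// capped at maxTimeout. With randomize on, the delay is scaled by 1x to 2x.
function backoffDelay(attempt, opts) {
  const minTimeout = opts.minTimeout || 1000; // assumed base delay
  const jitter = opts.randomize ? Math.random() + 1 : 1;
  const delay = Math.round(jitter * minTimeout * Math.pow(opts.factor, attempt));
  return Math.min(delay, opts.maxTimeout);
}

// Build the schedule for the retry count and factor given above.
const options = { retries: 20, factor: 1.22226, maxTimeout: 5 * 60 * 1000, randomize: false };
const schedule = [];
for (let i = 0; i < options.retries; i++) {
  schedule.push(backoffDelay(i, options));
}
```

Turning randomize back on spreads concurrent lambdas across different delays, which is the part that helps with Slack rate limiting.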
As part of open sourcing Dispatch we should update our existing documentation as well as add new docs.
@k-mahoney noted that for the MESSAGE-SPEC, we should be clear about there needing to be an outside oracle that stores username mappings, and that this could be either a service or a file.
Currently the interactive slack buttons return the value of the triage function's successful callback, so the user gets a message like "b3c90ad35e58: closed GitHub issue 41" when they click on an interactive button.
The Slack interactive buttons (https://api.slack.com/docs/interactive-message-field-guide) support passing a value behind the scenes which we can use to hold the value of what we'd like the response to the user to be. This is currently hardcoded in dispatch.
So the SNS message could look like:
{
  ...
  slack: {
    message: 'STRING_VALUE',
    actions: {
      yes: "Yes, I've enrolled!",
      yes_response: 'Thank you for enrolling your computer in JAMF!',
      no: 'Nope, I had an issue enrolling!', // Slack button text for 'no' action type
      no_response: "We're sorry you've had issues, the Security team has been notified, and we'll be in touch!"
    }
  }
}
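A sketch of how the proposed yes_response / no_response fields might be carried in the button value and read back when a button is clicked. The helper names are hypothetical, and the button objects follow the name/text/type/value fields of Slack's interactive message buttons; how dispatch actually builds its attachments may differ.

```javascript
'use strict';

// Build Slack button actions whose `value` carries the reply text
// that should be shown to the user after the click.
function buildActions(actions) {
  return ['yes', 'no'].filter((name) => actions[name]).map((name) => ({
    name: name,
    text: actions[name],
    type: 'button',
    value: actions[name + '_response'] || ''
  }));
}

// Pick the reply for a clicked button from the interactive payload,
// falling back to a default when no value was supplied.
function responseFor(payload, fallback) {
  const clicked = payload.actions && payload.actions[0];
  return (clicked && clicked.value) || fallback;
}

// Example: build buttons, then simulate the user clicking 'yes'.
const buttons = buildActions({
  yes: "Yes, I've enrolled!",
  yes_response: 'Thank you for enrolling your computer in JAMF!',
  no: 'Nope, I had an issue enrolling!'
});
const reply = responseFor({ actions: [buttons[0]] }, 'Thanks for responding.');
```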
Several dispatch messages have failed due to incorrect Slack usernames from internal database and missing username objects due to Slack's API changes.
Before we 🛳️, we need to edit the info for the Slack app so users can know what dispatch is. We should link to the internal documentation (https://github.com/mapbox/security/issues/533).
It would also be nice to have a custom avatar for dispatch, but that's not required for the completion of this ticket.
In order to know who to DM on Slack, as well as tag in Github issues, for rules like https://github.com/mapbox/patrol-service/issues/137 we need a way to look up a user's Slack and Github. On one hand we could assume that all messages sent to dispatch should already contain who to DM and who to tag. However, having to bake that into every single patrol rule would not be very convenient. I'm leaning toward that we should support two ways (example message json sent to dispatch. Property names are just examples and not implying we call them this):
{
recipient: {
slack: "@ianshward",
github: "@ianshward"
}
}
In this format, we tell dispatch we know who to send the message to
For our REST endpoint (dispatch-oracle), it'd be an api-gateway-fronted lambda. The lambda code would fetch the latest version of our internal list of GitHub and Slack handles.
The interface could look like:
lookup('@ianshward')
Assuming we open source dispatch, people can plug in a URL to their "oracle endpoint" and it can work however they want it to. It should just return JSON in a specified manner.
This dispatch-oracle should be a separate GH project, since if we open source Dispatch, dispatch-oracle is very custom to our use case and we would not want to make it publicly available.
I don't think so. This is one of the most critical aspects of dispatch - knowing who to assign a task to, and, ensuring dispatch always has a very up-to-date directory (which /us does for us already).
Creating a ticket to track this todo. @k-mahoney has a PR in progress for tests around all of the Slack usage in this project.
We need to test both the broadcast and self-service workflows with folks outside of the security team, to get a sense of how simple and straightforward it is to use.
Alongside testing with a handful of specific people, as a PoC we'll be doing our initial test with the DC office, sending a broadcast alert instructing how to register for Jamf and the deadline for doing so. Alert text can be found here. Will track feedback in this ticket.
Round 1
Round 2
Dispatch can't tag @mapbox/security because they aren't in the Mapbox org. Is there a better way we can tag security when there's an undefined message sent out?
Make sure we can:
Then double the volume to see if we hit any API limits in GitHub (self-service) or Slack (either type).
We should have the option to send a broadcast message without creating a GitHub issue. Have some initial work on this, iterated through discussion related to Jamf enrollment alert PoC.
Currently the GitHub repository dispatch tickets are opened in is set via the CloudFormation template. This should be updated such that the repository specified in the template acts as the default or fallback destination for dispatch tickets and allow a repository to be specified in the message specification. This way different dispatch alerts can be ticketed in specific repositories, rather than all defaulting to one.
/cc @alulsh
dispatch-triage still needs tests. Waiting on wrapper around dke
@k-mahoney could not get the "replace_original": "false" flag to work so went with generating two messages instead. The issue here is that the action response (coming from the triage function) will by default overwrite the original message. By using two messages, only the second message, containing the "prompt" and actions gets overwritten, leaving the original alert body visible to the user.
https://api.slack.com/interactive-messages#building_workflows
Right now the KMS key for encrypting secure CloudFormation parameters is hard coded in the CloudFormation template for both dispatch-incoming and dispatch-triage.
We should find a way to parameterize it. Hopefully this is a matter of creating a new CloudFormation parameter, then referencing the value of the parameter in the IAM statement.
/cc @k-mahoney
What information do we need from the patrol SNS message for dispatch to work?
We'll need to have content to post to both the GitHub issue and Slack, as well as directions on how to resolve the issue if necessary. The GitHub username will be provided in the case of GitHub related alarms which will need to be mapped to the correct Slack username.
Tentative list:
We can add to this and further develop structure as we work through getting a working skeleton this week. I added some tentative JSON test objects for the time being that I'm using for Slack testing. These are by no means set in stone, just something to work with.
/cc @mapbox/security
@zmully has a branch of lambda-cfn for 1r1s. We should try to switch to that branch sooner than later.
For many incidents, the appropriate course of action would be to open a master issue to triage and coordinate with then manually trigger a Dispatch event to notify affected users.
For instance, a credential leak requires rotation of X users credentials. While triaging the main incident, a Dispatch self-service event would be triggered from the commandline something like:
$ dispatch self-service --users affectedUsers.json --message message.json
The users file would be the array to be passed to the Oracle for lookup:
[ 'user1', 'user2', 'user3', 'user4' ]
The message JSON file would be the appropriate body object from the Dispatch message specification, for example a self-service message:
{
  github: {
    title: 'User1 NPM credentials require immediate rotation',
    body: 'We have detected that your NPM credentials may have been compromised. Rotate your credentials immediately. Please see issue #XXX for details and instructions'
  },
  slack: {
    message: 'Your NPM credentials may have been compromised, please rotate your NPM tokens immediately, following the instructions in issue #XXX. If you have any questions please ask in #security',
    actions: {
      yes: 'I have rotated all my NPM tokens',
      no: "I'm busy, remind me later!"
    }
  }
}
In this case of a self-service message, the dispatch CLI would:
dispatch-incoming-production stack
If this JSON.parse is done in a try/catch it'd be possible to point out incorrectly formatted JSON when the parsing fails. We've seen at least one bad message come through:
SyntaxError: Unexpected token } in JSON at position 174
at Object.parse (native)
at /var/task/incoming/function.js:28:26
at /var/task/lib/utils.js:15:7
at /var/task/node_modules/decrypt-kms-env/index.js:21:5
at Queue._call (/var/task/node_modules/decrypt-kms-env/index.js:67:5)
at maybeNotify (/var/task/node_modules/d3-queue/build/d3-queue.js:120:7)
at /var/task/node_modules/d3-queue/build/d3-queue.js:91:12
at Response.<anonymous> (/var/task/node_modules/decrypt-kms-env/index.js:61:9)
at Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:364:18)
at Request.callListeners (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:105:20)
at Request.emit (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
at Request.emit (/var/task/node_modules/aws-sdk/lib/request.js:683:14)
at Request.transition (/var/task/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/var/task/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /var/task/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request.<anonymous> (/var/task/node_modules/aws-sdk/lib/request.js:38:9)
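A minimal sketch of wrapping the parse in a try/catch so a malformed message produces a descriptive error rather than an uncaught exception. The function name parseMessage and the error wording are hypothetical.

```javascript
'use strict';

// Parse the raw SNS message string. Returns { message } on success or
// { error } with a descriptive Error when the JSON is malformed.
function parseMessage(raw) {
  try {
    return { message: JSON.parse(raw) };
  } catch (err) {
    return { error: new Error('Invalid dispatch message JSON: ' + err.message) };
  }
}

// Example: one well-formed message and one with a stray closing brace.
const good = parseMessage('{"type":"high"}');
const bad = parseMessage('{"type":"high"}}');
```

The incoming function could then log bad.error (or report it on the GitHub issue) and return early instead of crashing.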