
matanolabs / matano


Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS

Home Page: https://matano.dev

License: Apache License 2.0

JavaScript 1.13% Batchfile 0.01% TypeScript 10.28% Python 5.86% Kotlin 13.84% Makefile 0.45% Rust 67.15% Shell 0.09% Java 1.19%
alerting apache-iceberg aws aws-security big-data cloud cloud-native cloud-security cybersecurity detection-engineering dfir log-analytics log-management rust secops security security-tools serverless siem threat-hunting

matano's Introduction


Open source security data lake for AWS

Matano is an open source, cloud-native security data lake, built for security teams on AWS.

Note

Matano offers a commercial managed Cloud SIEM for a complete enterprise Security Operations platform. Learn more.

Features



  • Security Data Lake: Normalize unstructured security logs into a structured realtime data lake in your AWS account.
  • Collect All Your Logs: Integrates out of the box with 50+ sources for security logs and can easily be extended with custom sources.
  • Detection-as-Code: Use Python to build realtime detections as code. Support for automatic import of Sigma detections to Matano.
  • Log Transformation Pipeline: Supports custom VRL (Vector Remap Language) scripting to parse, enrich, normalize and transform your logs as they are ingested without managing any servers.
  • No Vendor Lock-In: Uses an open table format (Apache Iceberg) and open schema standards (ECS), to give you full ownership of your security data in a vendor-neutral format.
  • Bring Your Own Analytics: Query your security lake directly from any Iceberg-compatible engine (AWS Athena, Snowflake, Spark, Trino etc.) without having to copy data around.
  • Serverless: Fully serverless, designed specifically for AWS, and focused on enabling high scale, low cost, and zero-ops.

Architecture


👀 Use cases

  • Reduce SIEM costs.
  • Augment your SIEM with a security data lake for additional context during investigations.
  • Write detections-as-code using Python to detect suspicious behavior & create contextualized alerts.
  • ECS-compatible serverless alternative to ELK / Elastic Security stack.

✨ Integrations

Managed log sources

Alert destinations

Query engines

Quick start

View the complete installation instructions

Installation

Install the matano CLI to deploy Matano into your AWS account, and manage your deployment.

Linux

curl -OL https://github.com/matanolabs/matano/releases/download/nightly/matano-linux-x64.sh
chmod +x matano-linux-x64.sh
sudo ./matano-linux-x64.sh

macOS

curl -OL https://github.com/matanolabs/matano/releases/download/nightly/matano-macos-x64.sh
chmod +x matano-macos-x64.sh
sudo ./matano-macos-x64.sh

Deployment

Read the complete docs on getting started

To get started, run the matano init command.

  • Make sure you have AWS credentials in your environment (or in an AWS CLI profile).
  • The interactive CLI wizard will walk you through getting started by generating an initial Matano directory for you, initializing your AWS account, and deploying into your AWS account.
  • Initial deployment takes a few minutes.
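For example, with AWS credentials in a named AWS CLI profile (the profile name below is illustrative):

AWS_PROFILE=my-security-account matano init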

Directory structure

Once initialized, your Matano directory is used to control & manage all resources in your project, e.g. log sources, detections, and other configuration. It is structured as follows:

➜  example-matano-dir git:(main) tree
├── detections
│   └── aws_root_credentials
│       ├── detect.py
│       └── detection.yml
├── log_sources
│   ├── cloudtrail
│   │   ├── log_source.yml
│   │   └── tables
│   │       └── default.yml
│   └── zeek
│       ├── log_source.yml
│       └── tables
│           └── dns.yml
├── matano.config.yml
└── matano.context.json

When onboarding a new log source or authoring a detection, run matano deploy from anywhere in your project to deploy the changes to your account.
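For example:

cd example-matano-dir
matano deploy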

🔧 Log Transformation & Data Normalization

Read the complete docs on configuring custom log sources

Vector Remap Language (VRL) allows you to easily onboard custom log sources, and encourages you to normalize fields according to the Elastic Common Schema (ECS) to enable enhanced pivoting and bulk searching for IOCs across your security data lake.

Users can define custom VRL programs to parse and transform unstructured logs as they are being ingested through one of the supported mechanisms for a log source (e.g. S3, SQS).

VRL is an expression-oriented language designed for transforming observability data (e.g. logs) in a safe and performant manner. It features a simple syntax and a rich set of built-in functions tailored specifically to observability use cases.

Example: parsing JSON

Let's have a look at a simple example. Imagine that you're working with HTTP log events that look like this:

{
  "line": "{\"status\":200,\"srcIpAddress\":\"1.1.1.1\",\"message\":\"SUCCESS\",\"username\":\"ub40fan4life\"}"
}

You want to apply these changes to each event:

  • Parse the raw line string into JSON, and explode the fields to the top level
  • Rename srcIpAddress to the source.ip ECS field
  • Remove the username field
  • Convert the message to lowercase

Adding this VRL program to your log source as a transform step would accomplish all of that:

log_source.yml
transform: |
  . = object!(parse_json!(string!(.json.line)))
  .source.ip = del(.srcIpAddress)
  del(.username)
  .message = downcase(string!(.message))

schema:
  ecs_field_names:
    - source.ip
    - http.status

The resulting event 🎉:

{
  "message": "success",
  "status": 200,
  "source": {
    "ip": "1.1.1.1"
  }
}

📝 Writing Detections

Read the complete docs on detections

Use detections to define rules that can alert on threats in your security logs. A detection is a Python program that is invoked with data from a log source in realtime and can create an alert.

Examples

Detect failed attempts to export an AWS EC2 instance in AWS CloudTrail logs.

def detect(record):
  return (
    record.deepget("event.action") == "CreateInstanceExportTask"
    and record.deepget("event.provider") == "ec2.amazonaws.com"
    and record.deepget("event.outcome") == "failure"
  )

Detect Brute Force Logins by IP across all configured log sources (e.g. Okta, AWS, GWorkspace)

detect.py
def detect(r):
    return (
        "authentication" in r.deepget("event.category", [])
        and r.deepget("event.outcome") == "failure"
    )


def title(r):
    return f"Multiple failed logins from {r.deepget('user.full_name')} - {r.deepget('source.ip')}"


def dedupe(r):
    return r.deepget("source.ip")
detection.yml
---
tables:
  - aws_cloudtrail
  - okta_system
  - o365_audit
alert:
  severity: medium
  threshold: 5
  deduplication_window_minutes: 15
  destinations:
    - slack_my_team

Detect Successful Login from a never-before-seen IP for a User

from detection import remotecache

# a cache of user -> ip[]
user_to_ips = remotecache("user_ip")

def detect(record):
    if (
      record.deepget("event.action") == "ConsoleLogin" and
      record.deepget("event.outcome") == "success"
    ):
        # A unique key on the user name
        user = record.deepget("user.name")

        existing_ips = user_to_ips[user] or []
        updated_ips = user_to_ips.add_to_string_set(
          user,
          record.deepget("source.ip")
        )

        # Alert on new IPs
        new_ips = set(updated_ips) - set(existing_ips)
        if existing_ips and new_ips:
            return True

🚨 Alerting

Read the complete docs on alerting

Alerts table

All alerts are automatically stored in a Matano table named matano_alerts. The alerts and rule matches are normalized to ECS and contain context about the original event that triggered the rule match, along with the alert and rule data.

Example Queries

Summarize alerts in the last week that are activated (exceeded the threshold)

select
  matano.alert.id as alert_id,
  matano.alert.rule.name as rule_name,
  max(matano.alert.title) as title,
  count(*) as match_count,
  min(matano.alert.first_matched_at) as first_matched_at,
  max(ts) as last_matched_at,
  array_distinct(flatten(array_agg(related.ip))) as related_ip,
  array_distinct(flatten(array_agg(related.user))) as related_user,
  array_distinct(flatten(array_agg(related.hosts))) as related_hosts,
  array_distinct(flatten(array_agg(related.hash))) as related_hash
from
  matano_alerts
where
  matano.alert.first_matched_at > (current_timestamp - interval '7' day)
  and matano.alert.activated = true
group by
  matano.alert.rule.name,
  matano.alert.id
order by
  last_matched_at desc

Delivering alerts

You can deliver alerts to external systems using the alerting SNS topic, e.g. to Email, Slack, and other services.
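For example, a minimal boto3 sketch that subscribes an email endpoint to the alerting topic (the topic ARN below is a placeholder; use the ARN from your own deployment's outputs):

import boto3

sns = boto3.client("sns")

# Placeholder ARN; substitute the alerting SNS topic ARN from your Matano deployment.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:matano-alerting-topic"

sns.subscribe(
    TopicArn=TOPIC_ARN,
    Protocol="email",
    Endpoint="secops@example.com",
)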



A medium severity alert delivered to Slack

❤️ Community support

For general help on usage, please refer to the official documentation. For additional help, feel free to use one of these channels to ask a question:

  • Discord (Come join the family, and hang out with the team and community)
  • Forum (For deeper conversations about features, the project, or problems)
  • GitHub (Bug reports, Contributions)
  • Twitter (Get news hot off the press)

👷 Contributors

Thanks go to these wonderful people (emoji key):

Shaeq Ahmed 🚧
Samrose 🚧
Kai Herrera 💻 🤔 🚇
Ram 🐛 🤔 📓
Zach Mowrey 🤔 🐛 📓
marcin-kwasnicki 📓 🐛 🤔
Greg Rapp 🐛 🤔
Matthew X. Economou 🐛
Jarret Raim 🐛
Matt Franz 🐛
Francesco Faenzi 🤔
Nishant Das Patnaik 🤔
Tim O'Guin 🤔 🐛 💻
Francesco R. 🐛
Joshua Sorenson 💻 📖
Chris Smith 💻

This project follows the all-contributors specification. Contributions of any kind are welcome!

License


matano's Issues

Make it easier to test VRL transformations + schema changes

Overview

It is currently difficult to test VRL and schema changes in Matano. Testing a change requires a full deployment, and failures surface as errors that make it hard to ascertain the root cause.

Goal

Add functionality to easily test changes to VRL transformations and schemas while developing.
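As a stopgap, Vector ships an interactive VRL REPL that can be used to iterate on a VRL program locally before deploying:

# Opens an interactive REPL for experimenting with VRL expressions
vector vrl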

Installation instructions in README do not match the documentation

The documentation at https://www.matano.dev/docs/installation tells users to run these commands to install the Matano CLI:

git clone https://github.com/matanolabs/matano.git
cd matano && make install

However, the README at https://github.com/matanolabs/matano#from-source tells users to run these commands, which do not include the cd command required to complete the installation successfully:

git clone https://github.com/matanolabs/matano.git
make install

Add CLI command to bulk search for IoCs across data lake

Overview

Currently, it is difficult to search for a known indicator across all/multiple tables in your Matano security lake.

Goals

Add a CLI command that automatically searches for a given indicator against all relevant fields in all relevant tables.

For example, one can provide a malicious IP and it will be searched across columns such as related.ip in all Matano tables that have this field.

Notes

  • Display a table showing aggregate view of matches in each table
  • Support ability to save matches to file.
  • Be able to search any ECS field and narrow matches by time.
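For illustration, the per-table query such a command might issue (hypothetical; related.ip is an ECS array field, and contains() is Athena/Trino SQL):

select ts, related.ip
from aws_cloudtrail
where contains(related.ip, '198.51.100.7')
  and ts > (current_timestamp - interval '7' day)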

[Feature Request] Allow Custom Tags on all taggable resources

What

Allow users to apply their own custom AWS tags to all taggable resources.

Why

Business/Enterprise clients will need the ability to determine ownership and costs for this application.

How

Add support for an optional tags property in matano.config.yml which allows users to input a list of key:value pairs which will be applied to all taggable resources in all Matano stacks.
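One possible shape for the proposed property (illustrative only; final syntax to be decided):

matano.config.yml
tags:
  - key: owner
    value: security-team
  - key: cost_center
    value: "12345"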

matano init should create a unique resource identifier

When running matano init multiple times (each in a new directory), I would expect a new unique identifier to be created each time:

 CDKToolkit |  0/12 | 9:06:46 AM | CREATE_FAILED        | AWS::S3::Bucket         | StagingBucket cdk-hnb659fds-assets-XXXXXX-us-east-2 already exists

I ran init twice and cdk-hnb659fds seems to be reused. I would expect this to be unique each run, but maybe this is a constraint of CDK.

When you have multiple repeated failures to deploy this makes cleanup difficult. I would also expect each deployment to have unique roles.

{
  "version": "20.0.0",
  "files": {
    "70f03c831095bf0345af1dac68037dcb2b95a9fe0c4b4d27738cfad55da1c8c7": {
      "source": {
        "path": "DPCommonStack.template.json",
        "packaging": "file"
      },
      "destinations": {
        "647303185053-us-east-2": {
          "bucketName": "cdk-hnb659fds-assets-XXXXX-us-east-2",
          "objectKey": "70f03c831095bf0345af1dac68037dcb2b95a9fe0c4b4d27738cfad55da1c8c7.json",
          "region": "us-east-2",
          "assumeRoleArn": "arn:${AWS::Partition}:iam::XXX:role/cdk-hnb659fds-file-publishing-role-XXX-us-east-2"
        }
      }
    }
  },
  "dockerImages": {}
}
mfranz@pixel-slate-cros:~$ cat /tmp/matanocdkoutonT9xy/DPCommonStack.assets.json
{
  "version": "20.0.0",
  "files": {
    "70f03c831095bf0345af1dac68037dcb2b95a9fe0c4b4d27738cfad55da1c8c7": {
      "source": {
        "path": "DPCommonStack.template.json",
        "packaging": "file"
      },
      "destinations": {
        "647303185053-us-east-2": {
          "bucketName": "cdk-hnb659fds-assets-XXXX-us-east-2",
          "objectKey": "70f03c831095bf0345af1dac68037dcb2b95a9fe0c4b4d27738cfad55da1c8c7.json",
          "region": "us-east-2",
          "assumeRoleArn": "arn:${AWS::Partition}:iam::XXX:role/cdk-hnb659fds-file-publishing-role-XXX-us-east-2"
        }
      }
    }
  },
  "dockerImages": {}

"make install" does not install the Matano CLI independent of its source code

I am running Ubuntu 20.04 on Windows via WSL 1.

I have installed node.js v12.22.12 via the Node Version Manager per https://www.digitalocean.com/community/tutorials/how-to-install-node-js-on-ubuntu-20-04.

After installing the Matano CLI per https://www.matano.dev/docs/installation, I cannot remove my copy of the source code without breaking the installation of the Matano CLI:

xenophonf@l0000000d:~/src$ git clone https://github.com/matanolabs/matano.git
Cloning into 'matano'...
remote: Enumerating objects: 4324, done.
remote: Counting objects: 100% (1119/1119), done.
remote: Compressing objects: 100% (518/518), done.
remote: Total 4324 (delta 618), reused 931 (delta 488), pack-reused 3205
Receiving objects: 100% (4324/4324), 7.21 MiB | 967.00 KiB/s, done.
Resolving deltas: 100% (2203/2203), done.
xenophonf@l0000000d:~/src$ cd matano && make install
cd infra && npm run clean && npm ci && npm run build

> [email protected] clean /home/xenophonf/src/matano/infra
> rm -rf dist && rm -rf node_modules

added 598 packages in 14.224s

> [email protected] build /home/xenophonf/src/matano/infra
> rm -rf dist && tsc

cd cli && npm run clean && npm run full-install

> [email protected] clean /home/xenophonf/src/matano/cli
> rm -rf dist && rm -rf node_modules


> [email protected] full-install /home/xenophonf/src/matano/cli
> npm ci && npm run build && npm uninstall -g matano && npm install -g .


> [email protected] preinstall /home/xenophonf/src/matano/cli/node_modules/yarn
> :; (node ./preinstall.js > /dev/null 2>&1 || true)

added 632 packages in 21.89s

> [email protected] build /home/xenophonf/src/matano/cli
> rm -rf dist && tsc -b

removed 1 package in 2.042s
/home/xenophonf/.nvm/versions/node/v12.22.12/bin/matano -> /home/xenophonf/.nvm/versions/node/v12.22.12/lib/node_modules/matano/bin/run
+ [email protected]
added 1 package from 1 contributor in 0.833s
xenophonf@l0000000d:~/src/matano$ which matano
/home/xenophonf/.nvm/versions/node/v12.22.12/bin/matano
xenophonf@l0000000d:~/src/matano$ matano --help
(node:24394) SyntaxError Plugin: matano: Unexpected token '.'
module: @oclif/[email protected]
task: toCached
plugin: matano
root: /home/xenophonf/src/matano/cli
See more details with DEBUG=*
(node:24394) SyntaxError Plugin: matano: Unexpected token '.'
module: @oclif/[email protected]
task: toCached
plugin: matano
root: /home/xenophonf/src/matano/cli
See more details with DEBUG=*
(node:24394) SyntaxError Plugin: matano: Unexpected token '?'
module: @oclif/[email protected]
task: toCached
plugin: matano
root: /home/xenophonf/src/matano/cli
See more details with DEBUG=*
█▀▄▀█ ▄▀█ ▀█▀ ▄▀█ █▄░█ █▀█
█░▀░█ █▀█ ░█░ █▀█ █░▀█ █▄█

Matano - the open source security lake platform for AWS.

VERSION
  matano/0.0.0 wsl-x64 node-v12.22.12

USAGE
  $ matano [COMMAND]

TOPICS
  generate  Utilities to get started and generate boilerplate.

COMMANDS
  autocomplete  display autocomplete installation instructions
  help          Display help for matano.

xenophonf@l0000000d:~/src/matano$ cd ..
xenophonf@l0000000d:~/src$ rm -rf matano
xenophonf@l0000000d:~/src$ which matano
xenophonf@l0000000d:~/src$ ls -l /home/xenophonf/.nvm/versions/node/v12.22.12/bin/matano
lrwxrwxrwx 1 xenophonf xenophonf 34 Aug 12 11:00 /home/xenophonf/.nvm/versions/node/v12.22.12/bin/matano -> ../lib/node_modules/matano/bin/run
xenophonf@l0000000d:~/src$ ls -l /home/xenophonf/.nvm/versions/node/v12.22.12/bin/../lib/node_modules/matano/bin/run
ls: cannot access '/home/xenophonf/.nvm/versions/node/v12.22.12/bin/../lib/node_modules/matano/bin/run': No such file or directory
xenophonf@l0000000d:~/src$ ls -l /home/xenophonf/.nvm/versions/node/v12.22.12/bin/../lib/node_modules/
total 0
lrwxrwxrwx 1 xenophonf xenophonf   32 Aug 12 11:00 matano -> ../../../../../../src/matano/cli
drwx------ 1 xenophonf xenophonf 4096 Apr  5 03:11 npm

Generic equivalent?

This is not really an "issue". I just want to thank you for open-sourcing this interesting project. I have been thinking along the same lines, but about a vendor-neutral alternative. Do you think there could be a vendor-neutral equivalent of this project? Something that can be deployed across cloud providers as well as on bare metal (Kubernetes)? Do you think the AWS components could be replaced with equivalent CNCF and/or FOSS projects? That would be really awesome and would probably see much wider adoption, IMHO.

It's perfectly fine, though, if you want to stay AWS-specific. :) Happy to chat further.

Move schema definitions out of template to avoid CFN template size limit

Overview

We currently inline the table schema as a property in the CloudFormation template through CDK. Schemas can be quite long and when we have many tables, we'll run into the CFN 1MB template size limit.

Goals

  • Don't inline the schema as a property; instead, publish the schema as an asset file to S3 and pass the S3 path as a property that the custom resource can download.

Notes

Implement deduplication for threat intel enrichment ingestion

Overview

Many threat intel sources are not static and are modified/updated. If we are polling for data based on time, this will introduce duplicates.

Goal

Add ability to deduplicate data ingested from enrichment sources.

Notes

Can be implemented with Athena V3 Iceberg MERGE INTO

  • For each enrichment table, have a temp table: table_temp (need to create this table statically).
  • On new data pulled, overwrite the temp table with the new data (the puller writes to the temp table).
  • Inside the metadata writer, execute an Athena query that merges new data from the temp table into the main table, e.g.:
MERGE INTO enrichment_table main
USING enrichment_table_temp new
  -- primary key
  ON (main.event.id = new.event.id)
WHEN MATCHED
  -- all top-level cols
  THEN UPDATE SET event = new.event, threat = new.threat
WHEN NOT MATCHED
  -- all top-level cols
  THEN INSERT (event, threat) VALUES (new.event, new.threat)

SQS ingestion

Tracking issue for SQS ingestion support.

Goal

Someone can send logs through SQS to Matano.

Design

  • Likely have a separate ingest queue per log source.

  • SQS has a max message size of 256 KB. For efficiency, we can store many log 'rows' in one SQS message (similar to how Kinesis packs messages).

  • Aim to basically encode each message very similarly to an S3 file.

  • Can also add support for compression, using base64 encoding of the compressed payload.

Notes

  • SQS ingestion can skip Data batcher (necessary for S3 to have predictable work sent to Transformer).
    • Instead, set the SQS event source to the maximum batch size (10,000) with a batching window; Lambda will send payloads up to the 6 MB maximum.
  • Transformer to be updated to be able to handle SQS messages. Lambda will be invoked with array of messages.
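A rough sketch of the sender side under this design (the queue URL is hypothetical; packs NDJSON rows into one message with gzip + base64, per the compression note above):

import base64
import gzip
import json

import boto3

sqs = boto3.client("sqs")

# Hypothetical per-log-source ingest queue URL.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/matano-ingest-mysource"

def send_logs(rows):
    # Pack many log rows into one message, encoded like an S3 object of NDJSON lines.
    payload = "\n".join(json.dumps(r) for r in rows).encode()
    body = base64.b64encode(gzip.compress(payload)).decode()
    assert len(body) <= 256 * 1024, "SQS max message size is 256 KB"
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=body)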

bug: Init fails if there is no default VPC in the account

What?

When running matano init, the wizard fails while initializing the AWS environment because it can't find a default VPC.

Error output:

Initializing AWS environment... (1/3) ›   Error: An error occurred: Could not find any VPCs matching {"account":"111111111111","region":"us-east-1","filter":{"is-default":"true"}}

Fix

  • It should prompt for a VPC ID.
  • Even better (next iteration), it should fetch a list of VPCs and provide a selection prompt.

Managed log source for AWS ECR image scanning

Add support for managing data from ECR's image scanning service.

Considerations

ECR supports automatic vulnerability scanning on push for container images.

There are two ways to get these events:

  • Poll the ECR API
  • Send results to Amazon Inspector, which can surface events to EventBridge (see the rule pattern sketch after this list)
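For the Inspector path, an EventBridge rule pattern along these lines should capture scan findings (sketch; exact detail-types to confirm during design):

{
  "source": ["aws.inspector2"],
  "detail-type": ["Inspector2 Finding"]
}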

Tasks

  • Research, planning, and design
  • ECS schemas for repository events
  • VRL transforms
  • Docs and examples

[Bug] Specified ReservedConcurrentExecutions for function decreases account's UnreservedConcurrentExecution below its minimum value of [x]

I have been trying to run matano in a fresh personal AWS account after having it tried it an another account with extended lambda limits to see if there exists any additional configuration / request for quota increase. I hit upon this error with matano.

Details below.

Version : matano/0.0.0 linux-x64 node-v14.18.1
Note: This is the nightly build as of today.

Snippet Error from Terminal Below :

rams3sh@monastery:~/Garage/matano$ matano init
━━━ Matano: Get started Wizard ━━━

Welcome to the Matano init wizard. This will get you started with Matano.
Follow the prompts to get started. You can always change these values later.

✔ Which AWS Region to deploy to? · us-east-1
✔ What is the AWS Account ID to deploy to? · XXXXXXXXXXXXX
✔ Do you have an existing matano directory? (y/N) · false
  I will generate a Matano directory in the current directory.
✔ What is the name of the directory to generate?(use . for current directory) · .
✔ Generated Matano directory at /home/rams3sh/Garage/matano.
✔ Successfully initialized your account.
⠦ Now deploying Matano to your AWS account... 
›   Error: An error occurred: Command failed with exit code 1: /usr/local/matano-cli/cdk deploy DPMainStack --require-approval never --app /usr/local/matano-cli/matano-cdk 
...

 ›   Failed resources:
 ›   MatanoDPMainStack | 7:51:57 PM | CREATE_FAILED        | AWS::Lambda::Function            | DPMainStack/LakeWriter/AlertsFunction (LakeWriterAlertsFunctionCB567D9B) 
 ›   Resource handler returned message: "Specified ReservedConcurrentExecutions for function decreases account's UnreservedConcurrentExecution below its minimum value of [50].
 ›    (Service: Lambda, Status Code: 400, Request ID: c990af9b-a3e6-4328-a3c7-4f0b01967c4f)" (RequestToken: b6eb4fad-441b-2493-86c5-5c29b6969a6f, HandlerErrorCode: 
 ›   InvalidRequest)
 ›   
 ›    ❌  DPMainStack (MatanoDPMainStack) failed: Error: The stack named MatanoDPMainStack failed creation, it may need to be manually deleted from the AWS console: 
 ›   ROLLBACK_COMPLETE: Resource handler returned message: "Specified ReservedConcurrentExecutions for function decreases account's UnreservedConcurrentExecution below its 
 ›   minimum value of [50]. (Service: Lambda, Status Code: 400, Request ID: c990af9b-a3e6-4328-a3c7-4f0b01967c4f)" (RequestToken: b6eb4fad-441b-2493-86c5-5c29b6969a6f, 
 ›   HandlerErrorCode: InvalidRequest)
 ›       at FullCloudFormationDeployment.monitorDeployment (/snapshot/node_modules/aws-cdk/lib/api/deploy-stack.ts:505:13)
 ›       at runMicrotasks (<anonymous>)
 ›       at processTicksAndRejections (internal/process/task_queues.js:95:5)
 ›       at deployStack2 (/snapshot/node_modules/aws-cdk/lib/cdk-toolkit.ts:265:24)
 ›       at /snapshot/node_modules/aws-cdk/lib/deploy.ts:39:11
 ›       at run (/snapshot/node_modules/p-queue/dist/index.js:163:29)
 ›   
 ›    ❌ Deployment failed: Error: Stack Deployments Failed: Error: The stack named MatanoDPMainStack failed creation, it may need to be manually deleted from the AWS console:
 ›    ROLLBACK_COMPLETE: Resource handler returned message: "Specified ReservedConcurrentExecutions for function decreases account's UnreservedConcurrentExecution below its 
 ›   minimum value of [50]. (Service: Lambda, Status Code: 400, Request ID: c990af9b-a3e6-4328-a3c7-4f0b01967c4f)" (RequestToken: b6eb4fad-441b-2493-86c5-5c29b6969a6f, 
 ›   HandlerErrorCode: InvalidRequest)
 ›       at deployStacks (/snapshot/node_modules/aws-cdk/lib/deploy.ts:61:11)
 ›       at runMicrotasks (<anonymous>)
 ›       at processTicksAndRejections (internal/process/task_queues.js:95:5)
 ›       at CdkToolkit.deploy (/snapshot/node_modules/aws-cdk/lib/cdk-toolkit.ts:339:7)
 ›       at initCommandLine (/snapshot/node_modules/aws-cdk/lib/cli.ts:374:12)
 ›
 ›   Stack Deployments Failed: Error: The stack named MatanoDPMainStack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Resource 
 ›   handler returned message: "Specified ReservedConcurrentExecutions for function decreases account's UnreservedConcurrentExecution below its minimum value of [50]. 
 ›   (Service: Lambda, Status Code: 400, Request ID: c990af9b-a3e6-4328-a3c7-4f0b01967c4f)" (RequestToken: b6eb4fad-441b-2493-86c5-5c29b6969a6f, HandlerErrorCode: 
 ›   InvalidRequest)
 ›   Created temporary directory for configuration files: /tmp/mtnconfigv9yADs/config
 ›   arn:aws:cloudformation:us-east-1:XXXXXXXXXX:stack/MatanoDPCommonStack/cebd94d0-7e14-11ed-9855-0e5a30013c2f

Lambda Quotas :

rams3sh@monastery:~/Garage/matano$ aws lambda get-account-settings
{
    "AccountLimit": {
        "TotalCodeSize": 80530636800,
        "CodeSizeUnzipped": 262144000,
        "CodeSizeZipped": 52428800,
        "ConcurrentExecutions": 50,
        "UnreservedConcurrentExecutions": 50
    },
    "AccountUsage": {
        "TotalCodeSize": 1337,
        "FunctionCount": 1
    }
}

Please let me know how to proceed from here.

Also, do I have to increase the Lambda quota, given that it has separate pricing? Could there be an option to not enable this reserved concurrency as part of the Matano deployment? That would be helpful for experimentation use cases like mine, where I don't expect production-scale events.

More generally, such recommended production settings could be exposed behind a CLI argument so the user can explicitly disable them when they aren't required for staging/experimentation.

Managed log source for AWS VPC Flow logs

Add support for managing VPC Flow logs.

Considerations

Flow logs can now be published to Kinesis Firehose. We should look at implementing the current transformer Lambda in a way that it can be utilized by a Firehose Delivery Stream for transformation. This is probably as close to real-time as we can get. It would allow us to handle the normalization and delivery to S3 (in parquet) in one step, bypassing the data batcher Lambda.

Flow logs can also be published directly to S3 in parquet format. Since we already need to do record-based ECS normalization, these may not be as useful as the text-based delivery. Now that I think of it, would it actually be useful to process the raw parquet files simply due to the smaller file sizes? It would also save on up-front storage costs for the ingestion buckets. Flow logs get big fast.

Flow logs can also be delivered to CloudWatch Logs, although anyone with real volume is probably not doing this because CloudWatch Logs gets expensive. However, streaming CloudWatch Logs to Firehose is a rather nice experience when such high volumes are not a concern.

Tasks

  • ECS normalization for VPC Flow logs
  • VRL transforms (see the sketch after this list)
  • Investigate Firehose transforms
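As a starting point for the VRL transforms task, VRL already ships a parser for the default flow-log text format; a rough sketch, assuming the raw line arrives in .message (ECS mappings illustrative, not a complete transform):

transform: |
  . = parse_aws_vpc_flow_log!(string!(.message))
  .source.ip = del(.srcaddr)
  .destination.ip = del(.dstaddr)
  .source.port = del(.srcport)
  .destination.port = del(.dstport)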

References

Managed log source for Google Workspace

Managed log source for AWS S3 access logs

Add support for managing AWS S3 access logs.

Considerations

S3 access logs are an odd man out for a few reasons:

  • KMS encryption is NOT supported (must use SSE-S3 encryption, if any)
  • Cross-account access requires role assumption due to object ownership weirdness

Tasks

  • ECS normalization for S3 access logs
  • VRL transforms
  • UX design if role assumption is required (add role ARN to config?)

References

Transformer function(s) for Kinesis Firehose

Allow running transformations as part of Kinesis Firehose delivery streams.

Considerations

Many AWS services still will only deliver logs to CloudWatch Logs. Some can send directly to Kinesis Firehose.

Anything sent to CloudWatch Logs can be streamed from there to Kinesis Firehose.

The transformation features of Firehose are quite nice to work with, and they would allow skipping the data batcher logic required for processing files from S3. Firehose also manages retries and other logic, and it adds a number of additional points of observability.

This will require planning and design work. I'm not sure if we'd want to make the current transformer Lambda more generic so it can process more triggers, or if we'd want to implement something slightly different for the Firehose use case. My guess is it'd be better to support more triggers / sources for the current transformer.

Tasks

  • Research, planning, and design

User guide for querying data

Add a basic user guide showing how to run queries.

Considerations

I'm not sure if we should have a top-level guide (perhaps below the "Tables" section), or if we should have some examples included for each type of managed log source.

Without a robust UI to poke around with, many users will stall at this step. Some example queries for common use cases would be helpful to ensure users get quick feedback.

These can eventually be made into a collection of views, UDFs, prepared statements, etc. Later we could allow defining custom ones as part of the Matano config.

References

S3 Event Notifications - Configuration is ambiguously defined

The Problem

When creating Matano with BYOB (bring your own bucket), I was unable to deploy DPMainStack; it failed with the error "Configuration is ambiguously defined".

Ultimately this was because my existing dev bucket already had an SNS Event Notification on it for an existing workflow.

Side effects of this problem include:

  • Must remove the existing S3 Event Notification + deploy Matano (this causes an outage)
  • Repoint any existing SQS queue(s) to the new Matano SNS topic
    • If a user was using any Event trigger types other than Object Creation, they would then have to go and add their Event triggers again, causing stack drift / manual point of failure

Discussed solutions at the time of writing

  • Allow users to bring their own SNS Topics if there is already an S3 Event Notification
  • In the Transformer, skip events that don't match an acceptable Struct for ingest

Managed log source for Microsoft Graph

Overview

Microsoft Graph is a unified API for access to many relevant Microsoft/Azure logs & resources.

Puller

The advantage of Microsoft Graph is we can implement a largely unified poller, and only have to define transforms/schemas for each source within the Graph API.

  • Implement Microsoft Graph puller
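For instance, a minimal sketch of what a unified poller could look like against one Graph endpoint (assumes the requests library and an already-acquired bearer token; the endpoint shown is Azure AD sign-in logs):

import requests

GRAPH = "https://graph.microsoft.com/v1.0"

def pull_signin_logs(token):
    """Page through Azure AD sign-in logs via Microsoft Graph."""
    url = f"{GRAPH}/auditLogs/signIns?$top=500"
    headers = {"Authorization": f"Bearer {token}"}
    while url:
        resp = requests.get(url, headers=headers, timeout=30)
        resp.raise_for_status()
        body = resp.json()
        yield from body.get("value", [])
        url = body.get("@odata.nextLink")  # Graph pagination cursor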

Tables

Relevant tables to target:

  • Azure Active Directory
    • #76
    • Provisioning logs
    • #92
    • Identity Protection logs

Add option to Use VPCs in lambdas if specified by user

Overview

We currently let the user define a VPC id in their matano.config.yml like so:

vpc:
  id: vpc-05175918865d89771

However, we don't currently use the VPC in all the generated resources.

Goal

If the user specifies a VPC ID in their config, use the VPC when generating all resources.

Relevant resource currently is just Lambda functions.

Notes

  • Matano uses CDK context to cache the VPC info. You can access the VPC info inside a CDK stack like so, which will be defined if the user specified a VPC in their config:
const vpc: cdk.aws_ec2.IVpc | undefined = (cdk.Stack.of(this) as MatanoStack).matanoVpc;
  • If user doesn't specify a VPC, can just not use any VPC for now.
  • Possibly look into using CDK aspects to simplify.
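Building on that, a rough sketch of threading the optional VPC into a function (CDK v2, inside a construct in a Matano stack; construct names illustrative):

import * as cdk from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";

// `vpc` is undefined when the user didn't configure one, and lambda.Function's
// `vpc` prop is optional, so passing it through unconditionally preserves the
// current (no-VPC) behavior.
const vpc = (cdk.Stack.of(this) as MatanoStack).matanoVpc;

new lambda.Function(this, "ExampleFunction", {
  runtime: lambda.Runtime.PROVIDED_AL2,
  handler: "main",
  code: lambda.Code.fromAsset("path/to/asset"),
  vpc,
});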

Managed log source for AWS Config

Add support for managing logs (data?) and events from AWS Config. This includes configuration snapshots, configuration history, and configuration streams.

Considerations

AWS Config sends notifications to SNS for a number of events, the most useful of which are the ConfigurationItemChangeNotification and the ComplianceChangeNotification. These would be highly useful to ingest.

AWS Config also delivers configuration snapshots and configuration history data to S3.

Another useful event is the OversizedConfigurationItemChangeNotification, which delivers configuration change data to S3 in the event that it is too large for an SNS message (so they require additional processing).

Tasks

  • Configuration Stream support (SNS)
  • Configuration Snapshot support
  • Configuration History support

References

Package and publish CLI as a Docker image

What?

This is a packaging concern. It'd be nice to be able to pull and run a Docker image for the Matano CLI.

Why?

Run it locally without having to bother with installation. Run it in container clusters. Use it with Docker Compose as part of a testing setup.

How?

Probably by adding an image definition at cli/Dockerfile, followed by a CI/CD workflow to build and publish to Docker Hub, GH Container Registry, etc.
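A minimal sketch of such an image (illustrative only; the real Dockerfile would need to mirror the repo's make install build steps):

# cli/Dockerfile (sketch)
FROM node:18
WORKDIR /src
COPY . .
RUN make install
ENTRYPOINT ["matano"]
CMD ["--help"]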

"Error: command bootstrap not found" when bootstrapping the AWS account

I am running Ubuntu 20.04 on Windows via WSL 1.

I have installed node.js v12.22.12 via the Node Version Manager per https://www.digitalocean.com/community/tutorials/how-to-install-node-js-on-ubuntu-20-04.

I have installed the Matano CLI and created the configuration directory.

When I run matano bootstrap, I get the following error:

xenophonf@l0000000d:~/src/matano/my-matano-config$ matano bootstrap
(node:24682) SyntaxError Plugin: matano: Unexpected token '.'
module: @oclif/[email protected]
task: toCached
plugin: matano
root: /home/xenophonf/src/matano/cli
See more details with DEBUG=*
(node:24682) SyntaxError Plugin: matano: Unexpected token '.'
module: @oclif/[email protected]
task: toCached
plugin: matano
root: /home/xenophonf/src/matano/cli
See more details with DEBUG=*
(node:24682) SyntaxError Plugin: matano: Unexpected token '?'
module: @oclif/[email protected]
task: toCached
plugin: matano
root: /home/xenophonf/src/matano/cli
See more details with DEBUG=*
 ›   Error: command bootstrap not found

Zscaler - Managed log source

Add support for Zscaler logs to Matano.

Sources

  1. Zscaler Internet Access logs (zscaler_zia)

     Tables:

       • alerts
       • dns
       • firewall
       • tunnel
       • web

  2. Zscaler Private Access logs (zscaler_zpa)

     Tables:

       • audit
       • browser_access
       • user_activity
       • user_status

Steps

  • Implement all relevant parsers to ECS (processes from the ingest S3 bucket)
  • Build a managed poller to automatically pull logs from Zscaler

Init Fails with: Resource handler returned message: "Invalid request provided: Queue visibility timeout: 30 seconds is less than Function timeout: 60 seconds

CLI Version (installed via docs today)

mfranz@pixel-slate-cros:~/matano$  matano --version
matano/0.0.0 linux-x64 node-v14.18.1
mfranz@pixel-slate-cros:~/matano$ md5sum /usr/local/bin/matano
ca5dbebd474f92dd3448bc54398b93b2  /usr/local/bin/matano

Logs


 ›   MatanoDPMainStack |  99/104 | 10:27:12 AM | CREATE_COMPLETE      | AWS::Lambda::Function            | DPMainStack/Transformer/Function 
 ›   (TransformerFunctionFE009084) 
 ›   MatanoDPMainStack | 100/104 | 10:27:13 AM | CREATE_COMPLETE      | AWS::Lambda::Function            | DPMainStack/LakeWriter/Function (LakeWriterFunctionF773435F)
 › 
 ›   MatanoDPMainStack | 100/104 | 10:27:15 AM | CREATE_IN_PROGRESS   | AWS::Lambda::EventSourceMapping  | 
 ›   DPMainStack/LakeWriter/AlertsFunction/SqsEventSource:DPMainStackAlertsDefaultTableLakeWriterQueueC3CE4805 
 ›   (LakeWriterAlertsFunctionSqsEventSourceDPMainStackAlertsDefaultTableLakeWriterQueueC3CE4805B599AD6D) 
 ›   MatanoDPMainStack | 100/104 | 10:27:16 AM | CREATE_IN_PROGRESS   | AWS::Lambda::EventSourceMapping  | 
 ›   DPMainStack/Transformer/Function/SqsEventSource:DPMainStackDataBatcherOutputQueueD9616F88 
 ›   (TransformerFunctionSqsEventSourceDPMainStackDataBatcherOutputQueueD9616F888667E4CB) 
 ›   MatanoDPMainStack | 100/104 | 10:27:18 AM | CREATE_IN_PROGRESS   | AWS::Lambda::EventSourceMapping  | 
 ›   DPMainStack/LakeWriter/AlertsFunction/SqsEventSource:DPMainStackAlertsDefaultTableLakeWriterQueueC3CE4805 
 ›   (LakeWriterAlertsFunctionSqsEventSourceDPMainStackAlertsDefaultTableLakeWriterQueueC3CE4805B599AD6D) Resource creation Initiated
 ›   MatanoDPMainStack | 100/104 | 10:27:18 AM | CREATE_IN_PROGRESS   | AWS::Lambda::EventSourceMapping  | 
 ›   DPMainStack/LakeWriter/Function/SqsEventSource:DPMainStackMatanoLogstestlogsourceDefaultTableLakeWriterQueueC1E4E04B 
 ›   (LakeWriterFunctionSqsEventSourceDPMainStackMatanoLogstestlogsourceDefaultTableLakeWriterQueueC1E4E04BF71C3721) 
 ›   MatanoDPMainStack | 100/104 | 10:27:18 AM | CREATE_FAILED        | AWS::Lambda::EventSourceMapping  | 
 ›   DPMainStack/Transformer/Function/SqsEventSource:DPMainStackDataBatcherOutputQueueD9616F88 
 ›   (TransformerFunctionSqsEventSourceDPMainStackDataBatcherOutputQueueD9616F888667E4CB) Resource handler returned message: "Invalid request provided: Queue 
 ›   visibility timeout: 30 seconds is less than Function timeout: 60 seconds (Service: Lambda, Status Code: 400, Request ID: bc350930-2f6e-4f4c-9b68-809dd098f9c7)" 
 ›   (RequestToken: fb617621-26d6-2dd5-6739-1c4b5762885b, HandlerErrorCode: InvalidRequest)
 ›   MatanoDPMainStack | 100/104 | 10:27:19 AM | CREATE_FAILED        | AWS::Lambda::EventSourceMapping  | 
 ›   DPMainStack/LakeWriter/AlertsFunction/SqsEventSource:DPMainStackAlertsDefaultTableLakeWriterQueueC3CE4805 
 ›   (LakeWriterAlertsFunctionSqsEventSourceDPMainStackAlertsDefaultTableLakeWriterQueueC3CE4805B599AD6D) Resource creation cancelled
 ›   MatanoDPMainStack | 100/104 | 10:27:20 AM | CREATE_FAILED        | AWS::Lambda::EventSourceMapping  | 
 ›   DPMainStack/LakeWriter/Function/SqsEventSource:DPMainStackMatanoLogstestlogsourceDefaultTableLakeWriterQueueC1E4E04B 
 ›   (LakeWriterFunctionSqsEventSourceDPMainStackMatanoLogstestlogsourceDefaultTableLakeWriterQueueC1E4E04BF71C3721) Resource creation cancelled
 ›   MatanoDPMainStack | 100/104 | 10:27:22 AM | ROLLBACK_IN_PROGRESS | AWS::CloudFormation::Stack       | MatanoDPMainStack The following resource(s) failed to 
 ›   create: [TransformerFunctionSqsEventSourceDPMainStackDataBatcherOutputQueueD9616F888667E4CB, 
 ›   LakeWriterFunctionSqsEventSourceDPMainStackMatanoLogstestlogsourceDefaultTableLakeWriterQueueC1E4E04BF71C3721, 
 ›   LakeWriterAlertsFunctionSqsEventSourceDPMainStackAlertsDefaultTableLakeWriterQueueC3CE4805B599AD6D]. Rollback requested by user.

Error Message

 ›   Failed resources:
 ›   MatanoDPMainStack | 10:27:18 AM | CREATE_FAILED        | AWS::Lambda::EventSourceMapping  | 
 ›   DPMainStack/Transformer/Function/SqsEventSource:DPMainStackDataBatcherOutputQueueD9616F88 
 ›   (TransformerFunctionSqsEventSourceDPMainStackDataBatcherOutputQueueD9616F888667E4CB) Resource handler returned message: "Invalid request provided: Queue 
 ›   visibility timeout: 30 seconds is less than Function timeout: 60 seconds (Service: Lambda, Status Code: 400, Request ID: bc350930-2f6e-4f4c-9b68-809dd098f9c7)" 
 ›   (RequestToken: fb617621-26d6-2dd5-6739-1c4b5762885b, HandlerErrorCode: InvalidRequest)
 ›   
 ›    ❌  DPMainStack (MatanoDPMainStack) failed: Error: The stack named MatanoDPMainStack failed creation, it may need to be manually deleted from the AWS console: 
 ›   ROLLBACK_COMPLETE: Resource handler returned message: "Invalid request provided: Queue visibility timeout: 30 seconds is less than Function timeout: 60 seconds 
 ›   (Service: Lambda, Status Code: 400, Request ID: bc350930-2f6e-4f4c-9b68-809dd098f9c7)" (RequestToken: fb617621-26d6-2dd5-6739-1c4b5762885b, HandlerErrorCode: 
 ›   InvalidRequest)
 ›       at FullCloudFormationDeployment.monitorDeployment (/snapshot/node_modules/aws-cdk/lib/api/deploy-stack.ts:505:13)
 ›       at processTicksAndRejections (internal/process/task_queues.js:95:5)
 ›       at deployStack2 (/snapshot/node_modules/aws-cdk/lib/cdk-toolkit.ts:265:24)
 ›       at /snapshot/node_modules/aws-cdk/lib/deploy.ts:39:11
 ›       at run (/snapshot/node_modules/p-queue/dist/index.js:163:29)
 ›   
 ›    ❌ Deployment failed: Error: Stack Deployments Failed: Error: The stack named MatanoDPMainStack failed creation, it may need to be manually deleted from the AWS 
 ›   console: ROLLBACK_COMPLETE: Resource handler returned message: "Invalid request provided: Queue visibility timeout: 30 seconds is less than Function timeout: 60 
 ›   seconds (Service: Lambda, Status Code: 400, Request ID: bc350930-2f6e-4f4c-9b68-809dd098f9c7)" (RequestToken: fb617621-26d6-2dd5-6739-1c4b5762885b, 
 ›   HandlerErrorCode: InvalidRequest)
 ›       at deployStacks (/snapshot/node_modules/aws-cdk/lib/deploy.ts:61:11)
 ›       at processTicksAndRejections (internal/process/task_queues.js:95:5)
 ›       at CdkToolkit.deploy (/snapshot/node_modules/aws-cdk/lib/cdk-toolkit.ts:339:7)
 ›       at initCommandLine (/snapshot/node_modules/aws-cdk/lib/cli.ts:374:12)
 ›
 ›   Stack Deployments Failed: Error: The stack named MatanoDPMainStack failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: 
 ›   Resource handler returned message: "Invalid request provided: Queue visibility timeout: 30 seconds is less than Function timeout: 60 seconds (Service: Lambda, 
 ›   Status Code: 400, Request ID: bc350930-2f6e-4f4c-9b68-809dd098f9c7)" (RequestToken: fb617621-26d6-2dd5-6739-1c4b5762885b, HandlerErrorCode: InvalidRequest)
 ›   Created temporary directory for configuration files: /tmp/mtnconfignGHkpS/config

Structured log output for all functions

All functions should output fully structured logs.

Considerations

Every individual event that triggers a Lambda should result in at least one line of structured output. This will improve observability in a number of ways. By default they're going to go to CloudWatch Logs, which allows us to create alarms based on metrics calculated from the logs. Then they could also be easily streamed from there into Kinesis Firehose, and then delivered in parquet right back into Matano.

A good structured logging library will allow us to create a log context that we can flow through the application logic, attaching relevant log data along the way, and then flushing one line at the end. High cardinality log output is key.
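For the Rust functions, one candidate is the tracing ecosystem with a JSON formatter, so each invocation flushes its high-cardinality fields as a single line (sketch; assumes the tracing and tracing-subscriber crates with the json feature enabled):

use tracing::{info, info_span};

fn main() {
    // Emit one JSON object per event; CloudWatch metric filters can key off its fields.
    tracing_subscriber::fmt().json().init();

    let _guard = info_span!("transform", log_source = "cloudtrail").entered();
    info!(records = 128, bytes = 54_321, "processed batch");
}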

Tasks

  • Research structured log libraries
    • Rust
    • Should we provide something for Python code? For example, to assist debugging if any of the detections are failing.
    • Java / Kotlin?
    • NodeJS?
  • Design work
  • Implementation for each language

References

Will add later.

Managed log source for S3 Inventory Reports

Add support for managing S3 Inventory Reports.

Considerations

Inventory reports can be delivered in one of three formats: CSV, ORC, or Parquet. Since this isn't the only AWS service that can deliver data in Parquet (or ORC) format, we should support ingesting them, especially considering columnar format support will continue to expand to other services.

Inventory reports can be delivered hourly or weekly.

The object metadata to include in the reports is configurable, so we'll need to be sure to handle any missing keys/values.

References

EACCES error when installing the Matano CLI

I am running Ubuntu 20.04 on Windows via WSL 1.

I have installed node.js v12.22.12 via the NodeSource PPA per https://www.digitalocean.com/community/tutorials/how-to-install-node-js-on-ubuntu-20-04.

Running make install per https://www.matano.dev/docs/installation, I get the following error:

npm WARN checkPermissions Missing write access to /usr/lib/node_modules
npm ERR! code EACCES
npm ERR! syscall access
npm ERR! path /usr/lib/node_modules
npm ERR! errno -13
npm ERR! Error: EACCES: permission denied, access '/usr/lib/node_modules'
npm ERR!  [Error: EACCES: permission denied, access '/usr/lib/node_modules'] {
npm ERR!   errno: -13,
npm ERR!   code: 'EACCES',
npm ERR!   syscall: 'access',
npm ERR!   path: '/usr/lib/node_modules'
npm ERR! }
npm ERR!
npm ERR! The operation was rejected by your operating system.
npm ERR! It is likely you do not have the permissions to access this file as the current user
npm ERR!
npm ERR! If you believe this might be a permissions issue, please double-check the
npm ERR! permissions of the file and its containing directories, or try running
npm ERR! the command again as root/Administrator.

The relevant NPM logs are attached.

2022-08-12T14_24_33_045Z-debug.log
2022-08-12T14_24_33_060Z-debug.log

Error parsing compressed file containing Cloudwatch event

Hello,

I ran into this issue while testing matano on some sample log files. TransformerLambda fails with the message:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: stream did not contain valid UTF-8', transformer/src/main.rs:538:58

The file that I want to parse is delivered by Kinesis Firehose; it contains CloudTrail logs streamed from CloudWatch to S3. It doesn't have an extension and its content-type is marked as 'application/octet-stream'. Inside there is a JSON file representing a CloudWatch event.
An important note on this type of file can be found here: https://docs.aws.amazon.com/firehose/latest/dev/writing-with-cloudwatch-logs.html:
"CloudWatch log events are compressed with gzip level 6. If you want to specify OpenSearch Service or Splunk as the destination for the delivery stream, use a Lambda function to uncompress the records to UTF-8 and single-line JSON."
I suspect that some additional parsing is required for this type of file.
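For context, a minimal sketch of sniffing and decompressing such payloads before UTF-8 decoding (assumes the flate2 crate; illustrative, not the transformer's actual code):

use flate2::read::MultiGzDecoder;
use std::io::Read;

// CloudWatch Logs -> Firehose payloads are gzipped (level 6) with no file
// extension, so sniff the gzip magic bytes rather than trusting content-type.
fn maybe_decompress(bytes: Vec<u8>) -> std::io::Result<String> {
    if bytes.starts_with(&[0x1f, 0x8b]) {
        let mut out = String::new();
        MultiGzDecoder::new(&bytes[..]).read_to_string(&mut out)?;
        Ok(out)
    } else {
        Ok(String::from_utf8_lossy(&bytes).into_owned())
    }
}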

bug: An empty detections directory causes init to fail with stack trace

What?

If you have an empty detections/ directory, matano init will fail with a stack trace when trying to bundle the (non-existent) detections with maturin. Deleting the empty directory allows the command to succeed.

Full error output with stack trace
 ›   Bundling asset DPMainStack/Detections/Function/Code/Stage...
 ›
 ›   /snapshot/matano/infra/node_modules/aws-cdk-lib/core/lib/asset-staging.js:2
 ›   `),localBundling=options.local?.tryBundle(bundleDir,options),!localBundling){let user;if(options.user)user=options.user;else{const
 ›   userInfo=os.userInfo();user=userInfo.uid!==-1?`${userInfo.uid}:${userInfo.gid}`:"1000:1000"}options.image.run({command:options.command,user,volumes,environment:options.environment,entrypoint:options.entrypoint,workingDirectory:optio
 ›   ns.workingDirectory??AssetStaging.BUNDLING_INPUT_DIR,securityOpt:options.securityOpt??"",volumesFrom:options.volumesFrom})}}catch(err){const bundleErrorDir=bundleDir+"-error";throw
 ›   fs.existsSync(bundleErrorDir)&&fs.removeSync(bundleErrorDir),fs.renameSync(bundleDir,bundleErrorDir),new Error(`Failed to bundle asset ${this.node.path}, bundle output is located at ${bundleErrorDir}:
 ›   ${err}`)}if(fs_1.FileSystem.isEmpty(bundleDir)){const outputDir=localBundling?bundleDir:AssetStaging.BUNDLING_OUTPUT_DIR;throw new Error(`Bundling did not produce any output. Check that content is written to
 ›   ${outputDir}.`)}}calculateHash(hashType,bundling,outputDir){if(hashType==assets_1.AssetHashType.CUSTOM||hashType==assets_1.AssetHashType.SOURCE&&bundling){const hash=crypto.createHash("sha256");return
 ›   hash.update(this.customSourceFingerprint??fs_1.FileSystem.fingerprint(this.sourcePath,this.fingerprintOptions)),bundling&&hash.update(JSON.stringify(bundling)),hash.digest("hex")}switch(hashType){case
 ›   assets_1.AssetHashType.SOURCE:return fs_1.FileSystem.fingerprint(this.sourcePath,this.fingerprintOptions);case assets_1.AssetHashType.BUNDLE:case assets_1.AssetHashType.OUTPUT:if(!outputDir)throw new Error(`Cannot use
 ›   \`${hashType}\` hash type when \`bundling\` is not specified.`);return fs_1.FileSystem.fingerprint(outputDir,this.fingerprintOptions);default:throw new Error("Unknown asset hash type.")}}}exports.AssetStaging=AssetStaging,_a=JSII_RT
 ›   TI_SYMBOL_1,AssetStaging[_a]={fqn:"aws-cdk-lib.AssetStaging",version:"2.56.0"},AssetStaging.BUNDLING_INPUT_DIR="/asset-input",AssetStaging.BUNDLING_OUTPUT_DIR="/asset-output",AssetStaging.assetCache=new cache_1.Cache;function
 ›   renderAssetFilename(assetHash,extension=""){return`asset.${assetHash}${extension}`}function determineHashType(assetHashType,customSourceFingerprint){const
 ›   hashType=customSourceFingerprint?assetHashType??assets_1.AssetHashType.CUSTOM:assetHashType??assets_1.AssetHashType.SOURCE;if(customSourceFingerprint&&hashType!==assets_1.AssetHashType.CUSTOM)throw new Error(`Cannot specify
 ›   \`${assetHashType}\` for \`assetHashType\` when \`assetHash\` is specified. Use \`CUSTOM\` or leave \`undefined\`.`);if(hashType===assets_1.AssetHashType.CUSTOM&&!customSourceFingerprint)throw new Error("`assetHash` must be
 ›   specified when `assetHashType` is set to `AssetHashType.CUSTOM`.");return hashType}function calculateCacheKey(props){return crypto.createHash("sha256").update(JSON.stringify(sortObject(props))).digest("hex")}function
 ›   sortObject(object){if(typeof object!="object"||object instanceof Array)return object;const ret={};for(const key of Object.keys(object).sort())ret[key]=sortObject(object[key]);return ret}function
 ›   singleArchiveFile(directory){if(!fs.existsSync(directory))throw new Error(`Directory ${directory} does not exist.`);if(!fs.statSync(directory).isDirectory())throw new Error(`${directory} is not a directory.`);const
 ›   content=fs.readdirSync(directory);if(content.length===1){const file=path.join(directory,content[0]),extension=getExtension(content[0]).toLowerCase();if(fs.statSync(file).isFile()&&ARCHIVE_EXTENSIONS.includes(extension))return
 ›   file}}function determineBundledAsset(bundleDir,outputType){const
 ›   archiveFile=singleArchiveFile(bundleDir);switch(outputType===bundling_1.BundlingOutput.AUTO_DISCOVER&&(outputType=archiveFile?bundling_1.BundlingOutput.ARCHIVED:bundling_1.BundlingOutput.NOT_ARCHIVED),outputType){case
 ›   bundling_1.BundlingOutput.NOT_ARCHIVED:return{path:bundleDir,packaging:assets_1.FileAssetPackaging.ZIP_DIRECTORY};case bundling_1.BundlingOutput.ARCHIVED:if(!archiveFile)throw new Error("Bundling output directory is expected to
 ›   include only a single archive file when `output` is set to `ARCHIVED`");return{path:archiveFile,packaging:assets_1.FileAssetPackaging.FILE,extension:getExtension(archiveFile)}}}function getExtension(source){for(const ext of
 ›   ARCHIVE_EXTENSIONS)if(source.toLowerCase().endsWith(ext))return ext;return path.extname(source)}
 ›
 ›
 ›
 ›                                                                                                                                                                                   ^
 ›   Error: Bundling did not produce any output. Check that content is written to /var/folders/sq/yzwl6n255zz2s32y1pwymf400000gq/T/matanocdkoutd2YWDY/asset.874a339ec05d104f21b143404e74723ce7d1e51ad089d6a0183a839a53c207ac.
 ›       at AssetStaging.bundle (/snapshot/matano/infra/node_modules/aws-cdk-lib/core/lib/asset-staging.js:2:873)
 ›       at AssetStaging.stageByBundling (/snapshot/matano/infra/node_modules/aws-cdk-lib/core/lib/asset-staging.js:1:4506)
 ›       at stageThisAsset (/snapshot/matano/infra/node_modules/aws-cdk-lib/core/lib/asset-staging.js:1:1867)
 ›       at Cache.obtain (/snapshot/matano/infra/node_modules/aws-cdk-lib/core/lib/private/cache.js:1:242)
 ›       at new AssetStaging (/snapshot/matano/infra/node_modules/aws-cdk-lib/core/lib/asset-staging.js:1:2262)
 ›       at new Asset (/snapshot/matano/infra/node_modules/aws-cdk-lib/aws-s3-assets/lib/asset.js:1:736)
 ›       at AssetCode.bind (/snapshot/matano/infra/node_modules/aws-cdk-lib/aws-lambda/lib/code.js:1:4628)
 ›       at new Function (/snapshot/matano/infra/node_modules/aws-cdk-lib/aws-lambda/lib/function.js:1:2803)
 ›       at new MatanoDetections (/snapshot/matano/infra/lib/detections.ts:48:30)
 ›       at new DPMainStack (/snapshot/matano/infra/src/DPMainStack.ts:76:24)
 ›   [07:47:24] Reading cached notices from /Users/REDACTED/.cdk/cache/notices.json
 ›   [07:47:24] Failed to get tree.json file: Error: /var/folders/sq/yzwl6n255zz2s32y1pwymf400000gq/T/matanocdkoutd2YWDY/tree.json: ENOENT: no such file or directory, open
 ›   '/var/folders/sq/yzwl6n255zz2s32y1pwymf400000gq/T/matanocdkoutd2YWDY/tree.json'. Proceeding with empty tree.
 ›
 ›   Subprocess exited with error 1
 ›   [07:47:24] Error: Subprocess exited with error 1
 ›       at ChildProcess.<anonymous> (/snapshot/node_modules/aws-cdk/lib/api/cxapp/exec.ts:153:23)
 ›       at ChildProcess.emit (events.js:400:28)
 ›       at ChildProcess.emit (domain.js:475:12)
 ›       at Process.ChildProcess._handle.onexit (internal/child_process.js:282:12)
 ›   Created temporary directory for configuration files: /var/folders/sq/yzwl6n255zz2s32y1pwymf400000gq/T/mtnconfignpydzp/config

Transform error with client.geo.location

I'm creating a log source for Okta logs and am struggling to transform log data to the ECS fields client.geo.location.lat and client.geo.location.lon. With the VRL below, I consistently get the error "USER_ERROR: Failed at FindUnionVariant, likely schema issue." in the transformer Lambda. I have pretty much every other Okta log field working.

Looking at the ECS schema JSON, both lat and lon are defined as floats, so this should work.

Relevant VRL transform:
.client.geo.location.lat = to_float(del(.json.client.geographicalContext.geolocation.lat)) ?? null
.client.geo.location.lon = to_float(del(.json.client.geographicalContext.geolocation.lon)) ?? null

Relevant log data:

{
    "json": {
        "client": {
            "geographicalContext": {
                "city": "Ashburn",
                "country": "United States",
                "geolocation": {
                    "lat": 39.0469,
                    "lon": -77.4903
                },
                "postalCode": "20149",
                "state": "Virginia"
            }
        }
    }
}

Any assistance identifying the issue or bug would be appreciated.

Thanks.

Alerting integrations

Tracking for integrating Matano alerts with external systems

Destinations

  • Pagerduty
  • Slack
  • TheHive
  • Webhook
  • Jira
  • ServiceNow
  • Step functions (trigger custom workflow on alert)

Design

Integration

@shaeqahmed

  • Sending rule matches, alerts, schema
  • Updating alerts

Transformer: Add CSV support

Overview

Currently Matano only supports JSONL and text lines for log sources. Ingesting CSV logs would currently require writing custom VRL transform scripts inside expand_records_from_payload which is fairly complicated.

Goal

Add native support for ingesting and parsing CSV data.

Notes

  • Look at file extension to determine if data is CSV.
  • Use Rust csv_async crate to parse.
  • For now can only support CSV with headers.
    • Later can add option in log_source.yml in ingest.csv to specify whether to read headers from first row.
  • Reference here: add an if/else for the CSV case and generate a Stream of Results by wrapping the reader with the csv_async async reader instead of LinesCodec:
    None => FramedRead::new(reader, LinesCodec::new())
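A rough sketch of that branch (assumes the csv-async crate's futures-based API; per-row event construction elided):

use csv_async::AsyncReaderBuilder;
use futures::StreamExt;

async fn read_csv_records<R>(reader: R) -> anyhow::Result<()>
where
    R: futures::io::AsyncRead + Unpin + Send,
{
    let mut rdr = AsyncReaderBuilder::new()
        .has_headers(true) // for now, only CSV with headers is supported
        .create_reader(reader);
    let headers = rdr.headers().await?.clone();
    let mut records = rdr.records();
    while let Some(record) = records.next().await {
        let record = record?;
        // Zip header names with row values to build one event per row.
        for (name, value) in headers.iter().zip(record.iter()) {
            let _ = (name, value); // emit into the record stream here
        }
    }
    Ok(())
}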

Managed log source for Signal Sciences audit logs

Add support for managing audit logs from Signal Sciences.

Considerations

Signal Sciences has two types of audit logs:

  • Corp audit logs
  • Site audit logs

Currently, only the site audit logs can be streamed in a useful way, via generic webhooks.

Corp audit logs, which may arguably be more useful, can only be sent to email, Microsoft Teams, or Slack. I have an open feature request (pre-Fastly acquisition) for webhook support that is several years old and seems to have gotten lost in the ether a few times, despite several follow-ups with account reps
and TAMs. 😞

Tasks

  • Research, planning, and design
  • ECS schemas
  • VRL transforms
  • Docs and examples

References

Managed log source for Fastly access logs

Add support for managing access logs from Fastly.

Considerations

This could be a tricky one because Fastly allows you to customize the log format, which also depends on which destination you configure.

Kinesis and S3 will be the most useful.

Tasks

  • Research and design

References

Enrichment support

Tracking issue for enrichment support

Goal

Provide enrichment through enrichment tables in Matano

  • Enrichment table as Iceberg table
  • Enrichment tables for lookup in Python detections
  • Matching against IOCs
  • Dynamic & static enrichment tables (ingesting)
  • Ingesting threat intelligence feed data (IOCs)

Managed Integrations

  • AbuseCH (Malware Bazaar, Threatfox, URLHaus)
  • GeoIP (Maxmind)
  • Greynoise intelligence (RIOT, Noise)

Forward looking

  • #99
  • Using enrichment tables in scheduled detections (SQL)
  • Auto enrichment (endpoint logs + malware)

Managed log source for GitHub audit logs

Add support for managing GitHub audit logs.

Considerations

Enterprise audit logs are different from normal organization audit logs. Will need to research specifics.

Repo and org-level webhooks may also have different data and structures than the two types of audit logs.

May be able to generate schemas from GraphQL, OpenAPI for REST, or some other way of keeping up with all the many types of GitHub events.

Tasks

  • Research and discovery
    • Organization audit logs
    • Enterprise audit logs
    • General webhooks (repo or org-level)
    • Schema generation
  • ECS schemas
  • VRL transforms
  • Docs and examples

Add a JSON/object type for semi-structured data

Overview

Currently, semi-structured data must be stringified and defined as a string type. Subsequently, it is always treated as a string type (e.g. in detections).

Goal

Add a JSON or object type that represents JSON/object data. This can be auto-coerced to physical type string for the Iceberg schema. We can use this information to provide additional features for detections and searching.
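One possible shape for declaring such a field in a table schema (hypothetical syntax; the physical Iceberg type would remain string):

schema:
  fields:
    - name: details
      type: object  # auto-coerced to a string column in Iceberg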

Notes

Managed log source for AWS LB logs

Add support for managing AWS load balancer logs.

Tasks

  • Classic Load Balancer logs
  • Application Load Balancer logs (see the VRL sketch after this list)
  • Network Load Balancer logs
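For the ALB task, VRL's built-in parse_aws_alb_log handles the access log text format; a rough sketch, assuming the raw line arrives in .message:

transform: |
  . = parse_aws_alb_log!(string!(.message))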

References
