
attack-datasources's Introduction

⚠️ This repository is no longer necessary, as we have finalized the way that Data Sources are included in ATT&CK. If you are looking for Python scripts to interact with ATT&CK STIX data, please see our mitreattack-python library.

ATT&CK Data Sources

As part of the ATT&CK 2021 Roadmap, we have defined a methodology that will help improve how ATT&CK maps adversary behaviors to detection data sources. The idea behind this methodology is to improve the quality and consistency of ATT&CK data sources and to provide additional information that helps users make better use of these values.

The image above shows only some of the elements the methodology brings out, such as data components and relationships; however, it represents the main goal of this project: to better connect the defensive data in ATT&CK with how operational defenders analyze potential adversaries and behaviors.

Table of Contents

  1. Assembling ATT&CK Data Source Objects
  2. How Can Data Source Objects Support Security Operations?
  3. Where Are the New Data Source Objects Stored?
  4. How Can You Consume Data Source Objects' Content?
  5. How Can You Contribute?

Assembling ATT&CK Data Source Objects

During the development of this project we have identified that data sources' context can help us better describe adversary activity within a network environment. We have formalized this context through the definition of Data Source Objects within the ATT&CK Object Model. The objects' structure is represented in the following image:

If you are interested in getting a better understanding of the concepts and methodology we have developed so far, please review the following documents and blogs:

How Can Data Source Objects Support Security Operations?

Identification of Relevant Data Sources and Components

A common question regarding ATT&CK data sources is: what data source or component can help me develop detections for the most techniques? The definition of coverage metrics is something the community has been working on since the initial release of the framework. This is a complex problem, but one starting point is to measure the number of listed techniques associated with each data source.

The image above shows that, considering all platforms and tactics within the Enterprise matrix, command execution, process creation, and file modification are a good starting point when analyzing most (sub)techniques.
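
As a rough sketch of how such a count can be produced, the snippet below explodes a table of techniques by their recommended data sources and counts techniques per data source. The input rows are an illustrative assumption (in practice you would build this table from the ATT&CK STIX data, e.g. with attackcti); only pandas is required:

import pandas as pd

# Illustrative rows: one technique per row, with its recommended data sources.
# In practice this table would be built from the ATT&CK STIX data (e.g. via attackcti).
attack = pd.DataFrame([
    {"technique_id": "T1059",
     "data_sources": ["Command: Command Execution", "Process: Process Creation"]},
    {"technique_id": "T1543.003",
     "data_sources": ["Process: Process Creation", "Windows Registry: Windows Registry Key Modification"]},
])

# One row per (technique, data source) pair, then count unique techniques per data source.
coverage = (
    attack.explode("data_sources")
          .groupby("data_sources")["technique_id"]
          .nunique()
          .sort_values(ascending=False)
)
print(coverage)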

Identification of Relevant Data Sources and Components: A Graph Perspective

Another way to represent the interaction among techniques, data sources and components is by using a network graph. Using Python libraries such as NetworkX and Matplotlib, we can create a visualization that will support our analysis.

The image above shows the interaction among the sub-techniques of T1134 - Access Token Manipulation and their recommended data sources and components, for the Defense Evasion tactic in Windows (Platform) environments.
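
A minimal sketch of that kind of graph, using NetworkX and Matplotlib; the hard-coded edges below are an illustrative subset of the T1134 example above rather than a complete extract from ATT&CK:

import matplotlib.pyplot as plt
import networkx as nx

G = nx.Graph()

# Technique -> data source -> data component edges (illustrative subset for T1134).
G.add_edge("T1134 Access Token Manipulation", "Process")
G.add_edge("Process", "OS API Execution")
G.add_edge("Process", "Process Creation")
G.add_edge("T1134 Access Token Manipulation", "Command")
G.add_edge("Command", "Command Execution")

# Draw the graph with a simple force-directed layout.
pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, node_size=1500, node_color="lightblue", font_size=8)
plt.show()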

Representation of Adversary Behavior

Data components give us specific context about the activity or metadata related to the network security concepts that the ATT&CK framework recommends as data sources.

For instance, let's say the Process data source is recommended for the detection of the T1543.003 - Create or Modify System Process: Windows Service technique. Without any other security context, the first question that might come to your mind is: what information about a process is required? The following image shows some of the available options by using data components:

Each data component represents activity and/or information generated within a network environment as a result of actions or behaviors performed by a potential adversary. The ATT&CK framework (v9) now provides data components that can help you represent specific actions or behaviors related to a technique. According to the framework, the creation of processes and the execution of operating system API calls are a good starting point from a Process perspective.

Identification of Relevant Security Events

At the beginning of this document, we mentioned that the main goal of this project is to connect the defensive data in ATT&CK with how operational defenders analyze potential adversaries and behaviors. Even though the scope of this project does not include mapping security events to data components and relationships, we believe that the information provided by data source objects can help you identify relevant security data that should be collected in your environment in order to expedite the development of effective detections.

For example, the framework considers Process: Process Creation a recommended data source for the T1543.003 - Create or Modify System Process: Windows Service technique. The important question here is: what security event logs can give me context about the creation of a process? On the Windows platform, for example, Security Auditing event 4688 and Sysmon event 1 can help cover this data source recommendation. The image above shows an example of security events mapped to other recommended data sources for the same technique.
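
To make that last step concrete, here is a minimal sketch of how such a mapping could be captured in code. The events are the two mentioned above; the dictionary layout itself is just an illustrative assumption, not part of the ATT&CK schema:

# Illustrative mapping from a (data source, data component) pair to concrete telemetry.
event_mapping = {
    ("Process", "Process Creation"): [
        {"platform": "Windows", "provider": "Microsoft-Windows-Security-Auditing", "event_id": 4688},
        {"platform": "Windows", "provider": "Microsoft-Windows-Sysmon", "event_id": 1},
    ],
}

for (source, component), events in event_mapping.items():
    for event in events:
        print(f"{source}: {component} -> {event['provider']} event {event['event_id']}")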

Where Are the New Data Source Objects Stored?

V9 of the ATT&CK framework contains only data components as part of the new metadata for data sources. However, you can find our current Data Source Objects here. We are storing this new metadata using YAML files, but in the future it will be stored in STIX.

name: Process
definition: Information about instances of computer programs that are being executed by at least one thread.
collection_layers:
  - host
platforms:
  - Windows
  - Linux
  - macOS
contributors: 
  - ATT&CK
  - CTID
data_components:
  - name: process creation
    type: activity
    description: A process was created.
    relationships:
      - source_data_element: user
        relationship: created
        target_data_element: process
      - source_data_element: process
        relationship: created
        target_data_element: process
  - name: OS api execution
    type: activity
    description: A process executed operating system api functions.
    relationships:
      - source_data_element: process
        relationship: executed
        target_data_element: api call
references:
  - https://docs.microsoft.com/en-us/windows/win32/procthread/processes-and-threads
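
As a quick way to explore an object like the one above, here is a minimal sketch using PyYAML; the local file name process.yml is a hypothetical path, so adjust it to wherever you cloned the repository:

import yaml

# Load a single data source object and list its data components and relationships.
with open("process.yml") as f:
    data_source = yaml.safe_load(f)

print(data_source["name"], "-", data_source["definition"])
for component in data_source["data_components"]:
    print(f"  component: {component['name']} ({component['type']})")
    for rel in component["relationships"]:
        print(f"    {rel['source_data_element']} {rel['relationship']} {rel['target_data_element']}")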

How Can You Consume Data Source Objects' Content?

The idea behind storing all this data in YAML files is to facilitate the consumption of data source objects' content until we move everything to STIX, so feel free to use any tool that can handle YAML files and is available to you. We have prepared a Jupyter notebook using libraries such as attackcti, pandas, and yaml to give you an example of how you can gather up-to-date ATT&CK knowledge and the YAML files' content. You can find the notebook at the following link.
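
If you prefer to work with the files directly rather than through the notebook, here is a minimal sketch that flattens the relationships from every YAML file in a local directory into a pandas table; the directory name is a hypothetical path, and the column names simply mirror the YAML keys shown earlier:

import glob

import pandas as pd
import yaml

rows = []
for path in glob.glob("attack_data_sources/*.yml"):  # hypothetical local path
    with open(path) as f:
        data_source = yaml.safe_load(f)
    for component in data_source.get("data_components", []):
        for rel in component.get("relationships", []):
            rows.append({
                "data_source": data_source["name"],
                "data_component": component["name"],
                "source": rel["source_data_element"],
                "relationship": rel["relationship"],
                "target": rel["target_data_element"],
            })

relationships = pd.DataFrame(rows)
print(relationships.head())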

How Can You Contribute?

We love feedback! Hopefully, the explanation of our methodology provided in this document helps you understand the structure of a data source object and gives you an idea of how to come up with new content. Take a look at the current data source objects here, propose or improve data relationships, components, and data sources, and submit a pull request!

Notice

©2020 Copyright The MITRE Corporation. ALL RIGHTS RESERVED.

Approved for Public Release; Distribution Unlimited. Public Release Case Number 20-2841

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

This project makes use of ATT&CK®

ATT&CK Terms of Use

attack-datasources's People

Contributors

adampennin, alexiacrumpton, cyb3rpandah, fenr1r-g, glennhd, ikiril01, isabella-ma, jcwilliamsatmitre, jondricek, marcusbakker


attack-datasources's Issues

Nonexistent data component references and duplicate sources

Hello,

While working with the new data sources you made, I found that there are some duplicates and some non-referenced types in the technique descriptions.

In the following example:

  • File: File Content does not exist here
  • File: File Creation is referenced multiple times.

If this is not on purpose, I can try and find all occurrences and report them to you through a PR for missing data components and a list for duplicates.

Thanks,

Questions about prior art and specific mappings

Thank you for this! We love ATT&CK, but the data sources sections have always felt a bit "loose" and left mostly as an exercise for the reader. The blog series and this repo prompted a couple questions I hoped you could discuss:

  1. Why not use/extend an existing schema for the abstractions?

    For example, STIX Cyber-observable Objects (SCO) cover some of the same ground, and link nicely with STIX-formatted intel ... like ATT&CK itself. The spec for the objects and their relationships reads a bit like your yaml data sources, and they can be reified with real data. Seems like STIX SCO is a natural fit, plus it has a well-thought-out relationship model, serialization format, extensions, etc.

    The Elastic Common Schema (ECS) is great too - it's permissively licensed, available for collaboration on github, has abstractions for many of the examples you provide (users, processes, etc.), and is already powering a lot of searches, visualizations, and analytics. We see it more in ops contexts, and it's perhaps a bit more flexible than SCOs. For example, you see it frequently merged with existing event data so you get the benefit of the abstractions without sacrificing the specificity of the original event.

    One of the beautiful things about ATT&CK is that it reduced bike-shedding over terminology and helped the infosec community focus - STIX and ECS have put in a lot of similar work, so it seems good to stand on the shoulders of giants. Naming things is hard, and it takes time to overcome intuitions (even at the top level: e.g., to my ear the phrase "data source" connotes the place you get the data, rather than an abstraction of the observable, but I'm just one guy 🙂).

    In any case, if ATT&CK leveraged one of these for the abstract entities, seems you could save energy for more ATT&CK-specific work like mapping those to (sub-)techniques or the actual concrete logs/artifacts.

  2. Are there plans to be more specific about mappings to artifacts?

    Presumably the idea is that (sub-)techniques would eventually use these new abstract data sources to replace or augment the text in the current "Data Sources" section. Unfortunately, unless I'm missing something, the proposed model doesn't seem to have a way to capture links to the concrete logs/artifacts.

    For example, the mapping example in figure 13 in part 2 of the blog series illustrates this last step:

    Data source mapping example

    That is, it shows links from the data components to specific event logs on the right, and that last step is really useful ... but it doesn't actually live anywhere in this repo's proposed approach. For many teams that last leg is the hard part! If we took your schema, for example, we might add something like:

    - name: Service
      definition: Information about software programs that run in the background ...
      example_artifacts:
        - {os: windows, artifact: Security Audit Event 4688}
        - {os: windows, artifact: Sysmon Event 1}
        - {os: windows, artifact: Prefetch file}
        - {os: linux, artifact: auditd SYSCALL event}
        - {os: linux, artifact: auditd EXECVE event}
        # etc

    Perhaps this is considered out of scope, but hopefully not; it'd be great to see something as authoritative as ATT&CK pointing folks to specific useful artifacts rather than just the abstraction. I'd love to hear your thoughts.

Thanks again for your hard work on this and all the related projects, I look forward to learning more!

Permanent UUID or ID in attack-datasources

Thanks for the project. It's a very good idea.

  • Will you add a fixed/permanent UUID or ID to the sources?

It could be useful for many projects to reuse the same data source description or create relationships on a permanent basis (just like we do in CyCAT.org).

Support NIDS and WAF via new 'network traffic content' relationship

Hello.

With the new DS structure, NIDS and WAF are no longer available. A new relationship could be created in order to improve the mapping with alert-related events:

  • Data source: Network Traffic
  • Data component: network traffic content
  • Relationship:
    - source_data_element: network traffic
      relationship: triggered
      target_data_element: alert

Thanks in advance.

Small loading error

When I tried to load logon_session.yml, I got the following error:

mapping values are not allowed here
at line 22, column 72.

The offending line is
description: Data and information that describe a logon session (ex: logon type) and activity within it.

It can be solved by removing the space between the colon and logon:
description: Data and information that describe a logon session (ex:logon type) and activity within it.
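
For what it's worth, the parser behavior can be reproduced in a couple of lines; quoting the value is an alternative fix (not the one proposed above) that keeps the space after the colon:

import yaml

bad = 'description: Data and information that describe a logon session (ex: logon type) and activity within it.'
good = 'description: "Data and information that describe a logon session (ex: logon type) and activity within it."'

try:
    yaml.safe_load(bad)          # plain scalar containing ": " -> "mapping values are not allowed here"
except yaml.YAMLError as error:
    print("parse error:", error)

print(yaml.safe_load(good))      # quoted scalar parses fine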

Best regards,
Sascha90

Questions about data format

I found these new data sources very promising, as someone coming from the ATT&CK matrix world and looking to reduce the gap between events and CTI.

This is more a design question than an issue:

  1. Why did you choose YAML over JSON, which is widely used in the cti repo?
  2. Why didn't you follow the STIX format to make it more easily connectable to the (sub)techniques in the same cti repo?

KeyError: "['x_mitre_is_subtechnique'] not in index"

  • This error occurs in the notebook_functions.py file at the get_attack_dataframe function.

The commands below, from the .ipynb file, reproduce this error:
attack = get_attack_dataframe()
attack.head()

Output:

KeyError Traceback (most recent call last)
Input In [32], in
----> 1 attack = get_attack_dataframe()
2 attack.head()

File D:\Dec-\attack-datasources-main\docs\scripts\notebook_functions.py:57, in get_attack_dataframe(matrix)
53 attck = json_normalize(attck)
54 # view available columns - my line
55 #print(attck.columns)
56 # selecting columns
---> 57 attck = attck[['technique_id','x_mitre_is_subtechnique','technique','tactic','platform','data_sources']]
59 # Splitting data_sources field
60 attck = attck.explode('data_sources').reset_index(drop=True)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py:3464, in DataFrame.getitem(self, key)
3462 if is_iterator(key):
3463 key = list(key)
-> 3464 indexer = self.loc._get_listlike_indexer(key, axis=1)[1]
3466 # take() does not accept boolean indexers
3467 if getattr(indexer, "dtype", None) == bool:

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py:1314, in _LocIndexer._get_listlike_indexer(self, key, axis)
1311 else:
1312 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
-> 1314 self._validate_read_indexer(keyarr, indexer, axis)
1316 if needs_i8_conversion(ax.dtype) or isinstance(
1317 ax, (IntervalIndex, CategoricalIndex)
1318 ):
1319 # For CategoricalIndex take instead of reindex to preserve dtype.
1320 # For IntervalIndex this is to map integers to the Intervals they match to.
1321 keyarr = ax.take(indexer)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py:1377, in _LocIndexer._validate_read_indexer(self, key, indexer, axis)
1374 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
1376 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
-> 1377 raise KeyError(f"{not_found} not in index")

KeyError: "['x_mitre_is_subtechnique'] not in index"

Fix Definition for Module

definition: Information about module files such as executable, dynamic link library (dll), executable and linkiable format (elf), and Mach-o consisting of one or more classes and interfaces.

There's a minor typo (linkiable) in the definition text, and I think the overall definition can be modified a little for accuracy since PE/ELF/Mach-O encompass both executables and libraries. I would suggest:

Information about module files consisting of one or more classes and interfaces, such as portable executable (PE) format executables/dynamic link libraries (DLL), executable and linkable format (ELF) executables/shared libraries, and Mach-O format executables/shared libraries.
