auditree-harvest

The Auditree data gathering and reporting tool.

Introduction

Auditree harvest is a command line tool that assists with the gathering and formatting of data into human readable reports. Auditree harvest allows a user to easily retrieve historical raw data, in bulk, from a Git repository and optionally format that raw data to meet reporting needs. Auditree harvest is meant to retrieve and report on historical evidence from an evidence locker. It is, however, not limited to just processing evidence. Any file found in a Git repository hosting service can be processed by Auditree harvest.

Prerequisites

  • Supported for execution on macOS and Linux.
  • Supported for execution with Python 3.6 and above.

Python 3 must be installed. It can be downloaded from the Python site or installed using your package manager.

Python version can be checked with:

python --version

or

python3 --version

The harvest tool is available for download from PyPI.

Installation

It is best practice, but not mandatory, to run harvest from a dedicated Python virtual environment. Assuming that you have the Python virtualenv package already installed, you can create a virtual environment named venv by executing virtualenv venv, which creates a venv folder in the directory where you ran the command. Alternatively, you can use the Python venv module to do the same.

python3 -m venv venv

Assuming that you have a virtual environment in the current directory, activate it and use pip to install a new instance of harvest like so:

. ./venv/bin/activate
pip install auditree-harvest

As we add functionality to harvest, users will want to upgrade their harvest package regularly. To upgrade harvest to the most recent version, do:

. ./venv/bin/activate
pip install auditree-harvest --upgrade

See pip documentation for additional options for using pip.

Configuration

Since Auditree harvest interacts with Git repositories, it requires Git remote hosting service credentials in order to access them. By default, Auditree harvest looks for a username and token in a ~/.credentials file. You can override the credentials file location by using the --creds option on a harvest CLI execution. Valid section headings include github, github_enterprise, bitbucket, and gitlab. Below is an example of the expected credentials entry.

[github]
username=your-gh-username
token=your-gh-token
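
The entry above is INI-style, so as a rough illustration of the format it can be parsed with Python's standard configparser module. This is only a sketch of the file layout, not harvest's own credential-loading code; the read_credentials and read_credentials_file helpers are hypothetical names introduced for this example.

```python
import configparser
from pathlib import Path


def read_credentials(text, section="github"):
    """Return (username, token) from INI-style credentials text.

    Hypothetical helper for illustration; not part of harvest.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser[section]["username"], parser[section]["token"]


def read_credentials_file(path="~/.credentials", section="github"):
    """Read the same entry from a file (default location assumed from the text)."""
    return read_credentials(Path(path).expanduser().read_text(), section)


example = """
[github]
username=your-gh-username
token=your-gh-token
"""

username, token = read_credentials(example)
print(username)
```

A missing section or key raises KeyError, which is a quick way to confirm that the section heading matches one of the valid names listed above.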

Execution

Collate data

To collate historical versions of a file from a Git repository hosting service like GitHub, provide the repository URL (repo positional argument), the relative path to the file within the remote repository including the file name (filepath positional argument), and an optional date range (--start and --end arguments). You can also optionally provide the local Git repository path (--repo-path argument) if the repository already exists locally and you wish to override the remote repository download behavior.

harvest collate https://github.com/org-foo/repo-bar /raw/baz/baz.json --start 20191201 --end 20191212 --repo-path ./bar-repo

  • File versions are written to the local directory from which harvest was executed.
  • File versions are prefixed by the commit date in YYYYMMDD format.
  • File versions are gathered with daily granularity.
    • Only the latest version of a file for a given day is retrieved.
    • If a file did not change on a date then no file version is written for that date. Instead the latest version prior to that date serves as the version of that file for that date.
  • If you provide neither a --start nor an --end date, then the latest version of a file is retrieved.
  • If you only provide a --start date, file versions from the start date to the current date are retrieved.
  • If you only provide an --end date, the latest version of a file for the end date is retrieved.
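
The daily granularity rules above can be sketched in a few lines of Python. This is an illustration of the described behavior under stated assumptions, not harvest's implementation: the commit list is made up, the versions_by_day and version_for helpers are hypothetical, and the underscore between the date prefix and the file name is an assumption.

```python
from datetime import date, datetime


def versions_by_day(commits):
    """Map each commit day to the latest content committed on that day.

    commits: iterable of (datetime, content) pairs in any order.
    """
    latest = {}
    for committed_at, content in sorted(commits, key=lambda c: c[0]):
        # Later commits on the same day overwrite earlier ones.
        latest[committed_at.date()] = content
    return latest


def version_for(day, by_day):
    """Return the content in effect on ``day``, or None if none exists yet."""
    prior = [d for d in by_day if d <= day]
    return by_day[max(prior)] if prior else None


commits = [
    (datetime(2019, 12, 1, 9, 0), "v1"),
    (datetime(2019, 12, 1, 17, 30), "v2"),
    (datetime(2019, 12, 3, 8, 15), "v3"),
]
by_day = versions_by_day(commits)

# One file per day on which the file changed, prefixed with YYYYMMDD.
for day, content in sorted(by_day.items()):
    print(f"{day:%Y%m%d}_baz.json -> {content}")

# No commit on 2 December, so the 1 December version stands in for it.
print(version_for(date(2019, 12, 2), by_day))
```

Only two files result from the three commits, and a lookup for a day without a change falls back to the most recent earlier version.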

Generate report(s)

To run a report using content contained in a Git repository hosting service like GitHub, provide the repository URL (repo positional argument), the report package (package positional argument), the report name (name positional argument), and any configuration that the report requires (--config) as a JSON string. You can also optionally provide the local Git repository path (--repo-path argument) if the repository already exists locally and you wish to override the remote repository download behavior.

harvest report https://github.com/org-foo/repo-bar auditree_arboretum check_results_summary --config '{"start":"20191212","end":"20191221"}'
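
When the report command is launched from a script, quoting the --config JSON by hand is easy to get wrong. The sketch below builds the argument list with json.dumps so the configuration is always valid JSON. The report_command helper is a hypothetical name; only the arguments shown in the command above are used.

```python
import json
import shlex


def report_command(repo, package, name, config=None):
    """Build the argument list for a ``harvest report`` invocation."""
    args = ["harvest", "report", repo, package, name]
    if config:
        # --config is passed as a single JSON string argument.
        args += ["--config", json.dumps(config)]
    return args


args = report_command(
    "https://github.com/org-foo/repo-bar",
    "auditree_arboretum",
    "check_results_summary",
    config={"start": "20191212", "end": "20191221"},
)

# Shell-quoted form, suitable for logging or pasting into a terminal.
print(" ".join(shlex.quote(a) for a in args))
```

The list can be passed directly to subprocess.run(args, check=True), which avoids shell quoting altogether.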

Getting report details

To see a full summary of available reports within any package (like auditree-arboretum) do:

harvest reports auditree_arboretum --list

To see details on a specific report, including a usage example, do something like:

harvest reports auditree_arboretum --detail check_results_summary

Report development

Reports should be hosted with the fetchers/checks that collect the evidence that the reports process. Within auditree-arboretum this means the code lives in the appropriate provider directory. The steps for contributing common harvest reports are as follows:

  1. Adhere to the auditree-arboretum contribution guidelines - TODO add link.
  2. Reports go in the "reports" folder by provider.
  3. Create a python module with a class that extends the BaseReporter class.
    • The harvest CLI will use the report module name as the name of the report (sans the .py extension).
    • Only one report class per report module is permitted.
  4. In the new report class the expectations are as follows:
    • Provide a module level docstring that contains:
      • A single line summary
      • A detailed description of the report that includes evidence/files being processed and expected configuration
      • At least one usage example
      • Use the check results summary report docstring as an example/template.
      • harvest uses this docstring to display available reports and their details to the user.
    • Provide/Override the report_filename property to return the name of the report (including extension). harvest uses this property to apply a report template (if desired) and to determine which writer function to use when writing the report to a file. Use the check results summary report report_filename property and the Python packages summary report report_filename property as examples.
    • Provide/Override the generate_report method. This is where you put your evidence processing and report formatting logic. Use the check results summary report generate_report method as an example.
      • harvest takes the optional --config command line argument as a JSON string when executing a report, converts it to a dictionary and attaches it as the config attribute to your report object. Use the report object's config attribute in the generate_report method if you plan to have report specific configuration options.
      • Your report object also has a method that retrieves an evidence file for a given date. Use the report object's get_file_content method when retrieving evidence from an evidence locker.
      • Generating CSV reports:
        • harvest uses the Python CSV writer to write out the report file. So be sure that your generate_report method returns a list of dictionaries that adheres to the expectations of the Python CSV writer.
      • Generating reports from a Jinja2 template:
        • Add a report template named the same as your report_filename property with a .tmpl extension. harvest will start to look for the template in the same directory as the report module. So as long as it exists within that directory structure, harvest will find it. Use python_packages_summary.md.tmpl as an example.
        • harvest will look for this template file as part of your report processing and, if found, will pass your generate_report returned content through the template logic.
        • Your generate_report returned content should be a dictionary with everything necessary for your report template to render the desired report appropriately.
        • The report template can access the "raw" content generated by generate_report through a dictionary named data and also has access to the report's attributes through the report object. Use python_packages_summary.md.tmpl as an example.
      • Generating reports without templates:
        • You just want to generate report content directly from generate_report? No problem. Just generate a string as the report content or a list of strings as the rows of the report content and harvest will do the rest.
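
The expectations above can be pictured with a minimal report module. This is a sketch under stated assumptions rather than code from harvest or auditree-arboretum: a stand-in BaseReporter is defined so that the example runs on its own, and its constructor and get_file_content signature are assumptions. The CheckCountsReport class, the evidence path, and the status configuration key are all hypothetical.

```python
"""Check counts report (hypothetical single line summary)."""
import csv
import io
import json


class BaseReporter:
    """Stand-in for harvest's BaseReporter (assumed shape, for illustration)."""

    def __init__(self, config=None, evidence=None):
        self.config = config or {}
        self._evidence = evidence or {}

    def get_file_content(self, filepath, file_dt=None):
        # The real method reads from the evidence locker; this reads a dict.
        return self._evidence[filepath]


class CheckCountsReport(BaseReporter):
    """Hypothetical report: one CSV row per check with its status."""

    @property
    def report_filename(self):
        # The extension tells harvest how to write the report.
        return "check_counts.csv"

    def generate_report(self):
        raw = self.get_file_content("raw/checks/results.json")
        wanted = self.config.get("status")  # optional filter from --config
        rows = []
        for name, status in sorted(json.loads(raw).items()):
            if wanted is None or status == wanted:
                rows.append({"check": name, "status": status})
        return rows


evidence = {"raw/checks/results.json": json.dumps({"a": "pass", "b": "fail"})}
report = CheckCountsReport(config={"status": "fail"}, evidence=evidence)
rows = report.generate_report()

# A list of dictionaries maps directly onto the Python CSV writer.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["check", "status"])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Every dictionary in the returned list uses the same keys, which is what csv.DictWriter expects when it writes the header and rows.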

Custom report development

If you find that you have a specific reporting need that does not fit in as a common harvest report, no problem. Just develop the report in a separate repo/project following the same guidelines as above. As long as the package is importable by python and you tell harvest what package to look for your report(s) in via the CLI, it will handle the rest.

auditree-harvest's People

Contributors

alfinkel, cletomartin, rhyshort

auditree-harvest's Issues

Remove creds requirement when running in local mode

Overview

Remove credentials requirements when running in local mode.

Requirements

  • Remove credentials requirements when running in local mode.
  • You shouldn't need a credentials file if you're running in local mode.

Approach

TBD

Security and Privacy

N/A

Test Plan

TBD

Add OSCAL support

Overview

OSCAL will define a specific format for Assessment Results. We should add support to harvest report that can format results to meet that OSCAL format.

Requirements

  • Provide the ability to format report content to meet the OSCAL Assessment Results format.

Approach

Security and Privacy

N/A

Test Plan

TBD

Add verbose option

Overview

The tool should provide the option of displaying operation progress to standard out.

Requirements

  • verbose option should be off by default
  • verbose option should display to standard out
    • Any git operation
    • When a file has been found for a given date
    • etc...

Approach

  • See req.
  • TBD

Security and Privacy

git repo read/view access is expected

Test Plan

  • Unit tests and integration tests

Formalize local execution

Overview

There's been some interest in harvest working exclusively on a local git repo without the notion of a remote counterpart. To that end we should formalize functionality that allows for harvest to target any local git repo.

Requirements

  • Allow for the repo positional argument to be set to local.
  • repo as local must be paired with the --repo-path argument.

Approach

TBD

Security and Privacy

N/A

Test Plan

TBD

Bulk collate option

Overview

Add the ability to run collate operations on a series of repos and files and configurations.

Requirements

  • bulk collate option
  • use a bulk configuration JSON file
  • functionality should mirror individual collate operations

Approach

TBD

Security and Privacy

N/A

Test Plan

TBD

Add option to refresh non-harvest local repo

Overview

We should add an option to permit harvest to refresh a local repo that it did not itself stand up.

Requirements

  • When a --repo-path is provided, we need an option that permits the collator to refresh that environment. Current behavior is to only let harvest refresh a repo that it pulled down originally.
  • As part of this enhancement we should also change the logic to allow harvest to pull down a repo to the repo path provided if no repo exists in that location, thereby allowing harvest to write local repos to a non-$TMPDIR location.

Approach

TBD

Security and Privacy

TBD

Test Plan

TBD

Make repo branch configurable

Overview

We want to add an option to the CLI to allow for users to override the branch of their local repo when retrieving files or generating reports based on file content.

Requirements

  • branch should be optional
  • --branch
  • defaults to master

Approach

TBD

Security and Privacy

N/A

Test Plan

  • Unit tests and integration tests

Check for new versions

Overview

Add a check to see if a new version of harvest exists and, if it does, suggest a pip install auditree-harvest --upgrade.

Requirements

  • Check for a new version before executing a command
  • If a new version exists, suggest pip install auditree-harvest --upgrade; otherwise do nothing

Approach

See req

Security and Privacy

N/A

Test Plan

TBD

Harvest orchestrator/aggregator

Overview

Multiple harvest reports may need to be run to answer an audit, and possibly their results will need aggregating into a single file, for example an OSCAL Assessment Result. We should facilitate this in Harvest or with tooling "around" it.

Requirements

  • multiple reports can be run from a single invocation
  • their results can be combined

Approach

  • I think you're going to need to identify reports that can be combined (e.g. that are producing reports in the same format) and have some kind of plugin/awareness per "type". Maybe that's just OSCAL, though?
  • It would be nice if this were done as a "report of reports" in vanilla harvest - maybe that's possible already?

Security and Privacy

Provide the impact on security and privacy as it relates to the completion of
this issue. This level of detail may not be available at the time of
issue creation and can be completed at a later time. N/A if not applicable.

Test Plan

Provide the test process that will be followed to adequately verify that the
approach above satisfies the requirements provided. This level of detail may
not be available at the time of issue creation and can be completed at a later
time.

Add a force refresh option

Overview

Similar to #10, we should have a --force-refresh option for the repo specified. This option will remove the old local copy, if it exists, and provide a fresh local clone.

Requirements

  • Add --force-refresh option
  • If selected, delete the repo from $TMPDIR before collating or reporting

Approach

TBD

Security and Privacy

N/A

Test Plan

TBD

Add compression archive option

Overview

We should provide the option to compress all files into one archive artifact.

Requirements

  • Add option to archive
  • Default is false
  • Archive to a single artifact

Approach

See req.

Security and Privacy

N/A

Test Plan

  • Unit tests and integration tests

Bulk report option

Overview

Add the ability to run reports on a series of repos and reports and configurations.

Requirements

  • bulk report option
  • use a bulk configuration JSON file
  • functionality should mirror individual report operations

Approach

TBD

Security and Privacy

N/A

Test Plan

TBD

Add --output-location option

Overview

We need an option to configure the location of harvest output.

Requirements

  • Add an option to optionally configure the location of harvest results.
    • Applicable to collate sub-command
    • Applicable to report sub-command

Approach

  • Update CLI
  • Update write functionality for both collate and report.

Security and Privacy

N/A

Test Plan

TBD

Reconcile a bad git repo

Overview

At times, harvest-managed git repos get corrupted, usually when putting your Mac into sleep mode. When this is encountered in a harvest-managed local git repo, harvest should remove the repo and re-clone it.

Requirements

  • When a git.exc.InvalidGitRepositoryError is encountered for a harvest managed git repo, remove the corrupted repo and re-clone.

Approach

  • See req.
  • TBD

Security and Privacy

N/A

Test Plan

TBD

No reports available message

Overview

When no reports are available in a package, display a message that no reports are available in the package.

Requirements

  • When performing harvest reports <package name>, if no reports are available, then display "No reports found in <package name>. Try another package."
  • Fix readme to reference arboretum module rather than auditree-arboretum package.
  • Bring readme in line with contents of arboretum. Related to ComplianceAsCode/auditree-arboretum#59

Approach

See req

Security and Privacy

N/A

Test Plan

TBD
