pe-reports's Introduction

Posture & Exposure Reports (P&E Reports)

This package is used to generate and deliver CISA Posture & Exposure Reports (P&E Reports). Reports are delivered by email and include an encrypted PDF attachment with a series of embedded raw-data files of the collected materials. The reports are delivered in a two-step process. First, the pe_reports module collects the raw data and creates the encrypted PDFs. The pe_mailer module then securely delivers the content.

Topics of interest include Exposed Credentials, Domain Masquerading, Malware, Inferred Vulnerabilities and the Dark Web. The data collected for the reports is gathered on the 1st and 15th of each month.

Requirements

Installation

  • git clone https://github.com/cisagov/pe-reports.git

  • cd pe-reports

  • pip install -e .

Create P&E Reports

Usage:
  pe-reports REPORT_DATE DATA_DIRECTORY OUTPUT_DIRECTORY [--log-level=LEVEL]

Arguments:
  REPORT_DATE                   Date of the report, format YYYY-MM-DD.
  DATA_DIRECTORY                The directory where the excel data files are located.
                                Organized by owner.
  OUTPUT_DIRECTORY              The directory where the final PDF reports should be saved.
Options:
  -h --help                     Show this message.
  -v --version                  Show version information.
  --log-level=LEVEL             If specified, then the log level will be set to
                                the specified value.  Valid values are "debug", "info",
                                "warning", "error", and "critical". [default: info]
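The following is a minimal sketch of how the REPORT_DATE and --log-level values above might be validated before a run starts. The helper name `validate_cli_args` is hypothetical and is not part of pe-reports.

```python
import datetime
import logging
from typing import Tuple

VALID_LOG_LEVELS = ("debug", "info", "warning", "error", "critical")


def validate_cli_args(report_date: str, log_level: str = "info") -> Tuple[datetime.date, int]:
    """Hypothetical helper: check REPORT_DATE and --log-level, return parsed values."""
    try:
        # REPORT_DATE must use the YYYY-MM-DD format.
        date = datetime.datetime.strptime(report_date, "%Y-%m-%d").date()
    except ValueError as err:
        raise ValueError(f"REPORT_DATE must be YYYY-MM-DD, got {report_date!r}") from err

    level = log_level.lower()
    if level not in VALID_LOG_LEVELS:
        raise ValueError(
            f"Invalid log level {log_level!r}; choose from {', '.join(VALID_LOG_LEVELS)}"
        )
    # Map the name to the logging module's numeric constant ("info" -> logging.INFO).
    return date, getattr(logging, level.upper())
```

The returned level can then be passed straight to `logging.basicConfig(level=level)`.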

Deliver P&E Reports

Usage:
  pe-mailer [--pe-report-dir=DIRECTORY] [--db-creds-file=FILENAME] [--log-level=LEVEL]

Arguments:
  -p --pe-report-dir=DIRECTORY  Directory containing the pe-reports output.
  -c --db-creds-file=FILENAME   A YAML file containing the Cyber
                                Hygiene database credentials.
                                [default: /secrets/database_creds.yml]
Options:
  -h --help                     Show this message.
  -v --version                  Show version information.
  -s --summary-to=EMAILS        A comma-separated list of email addresses
                                to which the summary statistics should be
                                sent at the end of the run.  If not
                                specified then no summary will be sent.
  -t --test_emails=EMAILS       A comma-separated list of email addresses
                                to which to test email send process. If not
                                specified then no test will be sent.
  -l --log-level=LEVEL          If specified, then the log level will be set to
                                the specified value.  Valid values are "debug", "info",
                                "warning", "error", and "critical". [default: info]

Database backup/restore

Follow the instructions below to back up the P&E database instance and restore it locally.

In the P&E database environment:

  • Pull the latest repository
  • If necessary, edit ./src/pe_reports/pe_db/pg_backup.sh and replace the default output path ($PWD) with your preferred output path.
  • Open terminal and run: bash ./src/pe_reports/pe_db/pg_backup.sh
  • Export the resulting .zip file

In your local environment:

  • Pull the latest repository
  • If necessary, edit ./src/pe_reports/pe_db/pg_restore.sh and replace the default path to the backup files ($PWD) with your preferred path.
  • Start your local Postgres server
  • Open terminal and run: bash ./src/pe_reports/pe_db/pg_restore.sh

Collect P&E Source Data

  • Add database and data source credentials to src/pe_reports/data/config.ini

Usage:
  pe-source DATA_SOURCE [--log-level=LEVEL] [--orgs=ORG_LIST] [--cybersix-methods=METHODS]

Arguments:
  DATA_SOURCE                       Source to collect data from. Valid values are "cybersixgill",
                                    "dnstwist", "hibp", and "shodan".
Options:
  -h --help                         Show this message.
  -v --version                      Show version information.
  -l --log-level=LEVEL              If specified, then the log level will be set to
                                    the specified value.  Valid values are "debug", "info",
                                    "warning", "error", and "critical". [default: info]
  -o --orgs=ORG_LIST                A comma-separated list of orgs to collect data for.
                                    If not specified, data will be collected for all
                                    orgs in the pe database. Orgs in the list must match the
                                    IDs in the cyhy-db. E.g. DHS,DHS_ICE,DOC
                                    [default: all]
  -csg --cybersix-methods=METHODS   A comma-separated list of cybersixgill methods.
                                    If not specified, all will run. Valid values are "alerts",
                                    "credentials", "mentions", "topCVEs". E.g. alerts,mentions.
                                    [default: all]
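The following sketch shows one way to turn the comma-separated --orgs and --cybersix-methods values into lists. In this sketch, "all" means "no filter", and unknown cybersixgill methods are rejected. The function names are hypothetical. The valid method names are copied from the usage text above.

```python
from typing import List, Optional

VALID_CYBERSIX_METHODS = ("alerts", "credentials", "mentions", "topCVEs")


def split_csv_option(value: str) -> Optional[List[str]]:
    """Turn "DHS, DHS_ICE,DOC" into ["DHS", "DHS_ICE", "DOC"]; "all" means no filter (None)."""
    if value.strip().lower() == "all":
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


def check_cybersix_methods(value: str) -> List[str]:
    """Return the methods to run, rejecting any name not listed in the usage text."""
    methods = split_csv_option(value)
    if methods is None:
        return list(VALID_CYBERSIX_METHODS)
    unknown = [m for m in methods if m not in VALID_CYBERSIX_METHODS]
    if unknown:
        raise ValueError(f"Unknown cybersixgill method(s): {', '.join(unknown)}")
    return methods
```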

Contributing

We welcome contributions! Please see CONTRIBUTING.md for details.

License

This project is in the worldwide public domain.

This project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication.

All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.

pe-reports's People

Contributors

aloftus23, arcsector, arng4108, cduhn17, dav3r, dependabot[bot], djensen94, edujosemena, felddy, hillaryj, jasonodoom, jmorrowomni, jsf9k, mcdonnnj, schmelz21, stewartl97


pe-reports's Issues

Update pe-mailer error handling

💡 Summary

Add proper exception handling and logging for the errors that currently make the pe-mailer module fail with a traceback. When pe-mailer ... ... ... is run, the module should either run start-to-finish regardless of missing module resources or return a "Usage" error message.

For a code example see pe-reports fix and its scope.

Motivation and context

This provides a proper implementation of the pe-mailer module that handles errors correctly when it is imported and run.

Implementation notes

Please provide details for implementation, such as:

  • Explore all probable traceback issues.
  • Implement exception handling for those issues, with proper messaging.

Acceptance criteria

  • % import pe_mailer
  • % pe-mailer pande_report_dir= db_creds_file= with bad references still runs to completion with no traceback errors.
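The following is a minimal sketch of the "run to completion, no traceback" behavior. It wraps an entry point so that failures become log messages and a non-zero exit code. `run_safely` is a hypothetical name. The sketch assumes that pe-mailer's `main()` returns an integer exit status.

```python
import logging
from typing import Callable


def run_safely(entry_point: Callable[[], int]) -> int:
    """Hypothetical wrapper: run a CLI entry point, logging failures instead of raising."""
    try:
        return entry_point()
    except FileNotFoundError as err:
        # Missing report directory or credentials file: report it without a traceback.
        logging.error("Required file or directory not found: %s", err)
        return 1
    except Exception as err:  # last-resort guard so the run never ends in a traceback
        logging.error("pe-mailer failed: %s: %s", type(err).__name__, err)
        return 1
```

In a console-script entry point, this would be used as `sys.exit(run_safely(main))`. Catching specific exceptions first keeps the messages for the common failures precise.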

Add Windows option to convert a .pptx file to a .pdf.

💡 Summary

The current solution uses LibreOffice to convert the PowerPoint presentation to a PDF file. This works well on both macOS and Linux. The code could be modified to support Windows as well.

Motivation and context

This is a critical step in producing an encrypted PDF for a customer. This change would be useful for people using a Windows operating system.

Implementation notes

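To implement this, add a condition that checks the operating system:
import os

if os.name == "posix":  # macOS and Linux
    linux_stuff()  # placeholder: existing LibreOffice conversion
elif os.name == "nt":  # Windows
    windows_stuff()  # placeholder: PowerPoint conversion via COM
else:  # Python 3 only reports "posix", "nt", or "java"
    raise NotImplementedError(f"Unsupported OS: {os.name}")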

On Windows, use COM automation (win32com, from the pywin32 package) rather than LibreOffice to convert the .pptx file to a PDF. The user must have PowerPoint installed.

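import sys
from pathlib import Path

import win32com.client  # provided by the pywin32 package

PP_SAVE_AS_PDF = 32  # PowerPoint's ppSaveAsPDF file-format constant


def ppt2pdf(ppt_target_file):
    """Convert a .pptx file to a PDF saved next to it, using PowerPoint via COM."""
    file_path = Path(ppt_target_file).resolve()
    out_file = file_path.with_suffix(".pdf")
    powerpoint = win32com.client.Dispatch("PowerPoint.Application")
    try:
        # COM expects plain strings, not Path objects.
        presentation = powerpoint.Presentations.Open(str(file_path), WithWindow=False)
        presentation.SaveAs(str(out_file), PP_SAVE_AS_PDF)
        presentation.Close()
    finally:
        powerpoint.Quit()


if __name__ == "__main__":
    ppt2pdf(sys.argv[1])  # path to the .pptx file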

Acceptance criteria

How do we know when this work is done?

  • Can be run on Windows OS.
  • README.md Installation instructions address the option.

Define constants for graph sizing and positioning.

💡 Summary

In pages.py, remove the hard-coded values that set graph position and size:

  • x = horizontal position (left edge)
  • y = vertical position (top edge)
  • cx = graph width
  • cy = graph height

Example code:

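from pptx.enum.chart import XL_CHART_TYPE
from pptx.util import Inches

# x, y: chart position (left, top); cx, cy: chart size (width, height)
x, y, cx, cy = Inches(8.25), Inches(4.9), Inches(4.5), Inches(2.0)
chart = slide.shapes.add_chart(
    XL_CHART_TYPE.COLUMN_STACKED_100, x, y, cx, cy, chart_data  # chart_data built earlier
).chart
Graph.chart_sm(prs, slide, chart)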

Motivation and context

This adds value for a complete P&E Reports build-out, since each graph's size and position may become a method or "graph type" in stylesheets.py.

Implementation notes

Issue noted - #6 (comment)
Create constants for each graph type.

Acceptance criteria

How do we know when this work is done?

  • No hard-coded values will be found in the page generating functions.
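The following is a minimal sketch of "constants for each graph type". Each graph type gets one named, immutable record, so pages.py never repeats raw numbers. `ChartBox` and `SMALL_STACKED_COLUMN` are hypothetical names. The numbers are the ones from the example code in this issue.

```python
from typing import NamedTuple


class ChartBox(NamedTuple):
    """Position and size of one graph type on a slide, in inches (hypothetical helper)."""

    x: float   # horizontal position (left edge)
    y: float   # vertical position (top edge)
    cx: float  # width
    cy: float  # height


# One named constant per graph type; the name is illustrative.
SMALL_STACKED_COLUMN = ChartBox(x=8.25, y=4.9, cx=4.5, cy=2.0)
```

In pages.py, the values could then be converted with python-pptx's `Inches`, for example `x, y, cx, cy = (Inches(v) for v in SMALL_STACKED_COLUMN)`.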

Standardize API formula

💡 Summary

A large number of data sources are used to generate PE Reports 1.0.0 and later. A standard code set is needed to keep API requests consistent across developers.

Motivation and context

This issue is to review and discover best methods for the PE Team's API development.

Implementation notes

Please review the documents in the PE Team's file repository, located at:

Documents/Posture and Exposure/Temporary/Proposed API Standard/ Proposed API Standardization.docx

Acceptance criteria

  • PE Team Sign-off.

Credential Exposure Data Schema

💡 Summary

Show database schema for credential exposure data

Motivation and context

This supports database creation and use. For now, the schema will be used solely as a reference for the rest of the team.

Implementation notes

  • create at least one table that emulates how the data will be stored
  • add to the data_schema.sql file located here

Acceptance criteria

  • At least one table is created
  • Each field needed for the report is included
  • Table is commented out
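The following sketch shows what a commented table in data_schema.sql could look like. The column names are hypothetical; the real fields depend on what the report needs. The project targets Postgres, but the DDL is run against an in-memory sqlite3 database here only so the sketch works with the standard library alone. In Postgres, column descriptions could also be stored with `COMMENT ON COLUMN`.

```python
import sqlite3

# Hypothetical columns for illustration only.
CREDENTIAL_EXPOSURE_DDL = """
-- One row per exposed credential found for an organization.
CREATE TABLE credential_exposure (
    id INTEGER PRIMARY KEY,
    organization_id TEXT NOT NULL,   -- CyHy organization ID, e.g. DHS
    email TEXT NOT NULL,             -- exposed account
    breach_name TEXT,                -- source breach or paste
    date_found TEXT,                 -- ISO 8601 date (YYYY-MM-DD)
    data_source TEXT                 -- collector, e.g. hibp or cybersixgill
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(CREDENTIAL_EXPOSURE_DDL)
# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(credential_exposure)")]
print(columns)
```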

pe-mailer module "tests"

💡 Summary

Add in "tests" for the pe-mailer module. These tests may include the following:

  • test_message.py
  • test_pandemessage.py
  • test_reportmessage.py
  • test_statsmessage.py

Motivation and context

The tests provide quality assurance and are standard practice for cisagov repos.

Implementation notes

  • create appropriate test scripts for pe-mailer, some derived from the cyhy-mailer tests
  • add to pe-reports/tests

Acceptance criteria

How do we know when this work is done?

  • Tests pass (unittest, pytest)

Error Handling for PDF output over 20MB

🐛 Summary

Some clients can't receive files over 20 MB, so those emails bounce back. We should add functionality to our scripts to either flag PDFs that are over 20 MB or automatically zip the files once the CSV attachment and encryption process is complete.

To reproduce

Steps to reproduce the behavior:

  1. Run production and evaluate attachment file sizes. This is a rare occurrence, but can happen.

Expected behavior

In the pe-report scripts, flag any reports that are greater than or equal to 20MB.

Alternatively, remove the "content" of the web mentions (the post/comment text), because that is what creates the large file sizes. The "content" of the dark web mentions can still be included because that data is limited. Other attributes of the web mentions (e.g., title and URL) should not be removed.

Possible solutions:

  • Get rid of web mentions raw data and keep dark web.
  • Truncate the list.
  • Don't embed raw data.
  • Re-evaluate pe_reports code.
  • Flag file sizes over 20 MB.
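The following is a minimal sketch of the "flag" option. It lists every PDF in the output directory whose size is greater than or equal to the limit. The function names are hypothetical. "20MB" is assumed to mean 20 MiB (20 × 1024 × 1024 bytes); adjust the constant if the mail gateway counts 20,000,000 bytes instead.

```python
from pathlib import Path
from typing import List

# Assumption: 20 MiB; some mail systems use 20,000,000 bytes instead.
MAX_ATTACHMENT_BYTES = 20 * 1024 * 1024


def is_oversized(size_bytes: int, limit: int = MAX_ATTACHMENT_BYTES) -> bool:
    """True when a file of size_bytes should be flagged (>= limit)."""
    return size_bytes >= limit


def oversized_reports(output_dir: Path, limit: int = MAX_ATTACHMENT_BYTES) -> List[Path]:
    """Return the PDFs in output_dir that are at or above the size limit."""
    return sorted(
        pdf for pdf in Path(output_dir).glob("*.pdf") if is_oversized(pdf.stat().st_size, limit)
    )
```

The list returned by `oversized_reports` could be logged as warnings at the end of a pe-reports run, before pe-mailer sends anything.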

Any helpful log output or screenshots

[Screenshot: 2021-07-09, 11:24:23 AM]

Define `libreoffice_exec()` failure outcome

💡 Summary

In the function libreoffice_exec(), what happens if sys.platform == "darwin" is false?
import sys


def libreoffice_exec():
    """Call to MacOS LibreOffice App."""
    if sys.platform == "darwin":
        return "/Applications/LibreOffice.app/Contents/MacOS/soffice"
    # No else branch: on any other platform the function implicitly returns None.

Motivation and context

The failure of this function will stop the execution of the program.

Handling this case would be useful because the program would then finish with a return value of 0.

The improvement could make it possible to run the software on other OS versions.

Acceptance criteria

How do we know when this work is done?

  • Return value 0 on completion of program
  • Alternative functionality to launch LibreOffice regardless of OS version or environment
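The following sketch shows one way to give `libreoffice_exec()` a defined outcome on every platform. It keeps the macOS app-bundle path when that path exists, otherwise searches PATH for `soffice` or `libreoffice`, and otherwise raises a clear error instead of returning None. The `platform` and `which` parameters are hypothetical additions that make the function testable.

```python
import os
import shutil
import sys
from typing import Callable, Optional

MACOS_SOFFICE = "/Applications/LibreOffice.app/Contents/MacOS/soffice"


def libreoffice_exec(
    platform: str = sys.platform,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return a path to the LibreOffice executable or raise a clear error."""
    # Keep the original macOS behavior when the app bundle is installed.
    if platform == "darwin" and os.path.exists(MACOS_SOFFICE):
        return MACOS_SOFFICE
    # Elsewhere (Linux, Windows, containers), look for the executable on PATH.
    for name in ("soffice", "libreoffice"):
        path = which(name)
        if path:
            return path
    raise FileNotFoundError("LibreOffice not found: install it or add soffice to PATH")
```

The caller can catch `FileNotFoundError` and exit with a usage-style message. The returned path works with LibreOffice's headless conversion: `soffice --headless --convert-to pdf --outdir OUT_DIR FILE.pptx`.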

Research usage of Docker to support P&E Report architecture.

💡 Summary

The primary goal is to build a transferable environment to run P&E Reports. The intended architecture includes a Python framework, Docker image and a Postgres database.

Motivation and context

P&E Reports requires scanning external resources for data, storing the data, exporting data and analytics to a formal report, and then delivering the output via email to subscribed customers. A primary requirement is an environment that various users can run. A containerized solution would provide this while also improving the security and stability of all data processes.

Noted Resources:

Implementation notes

  • Discover and document probable solutions.
  • Build a Proof-of-Concept (POC).

Acceptance criteria

How do we know when this work is done?

Adjust table column names for proper formatting and standardized naming conventions

💡 Summary

The table column names in the report currently come directly from the database, where they are all lowercase with underscores instead of spaces. That naming convention is based on the API or on what makes the most sense to P&E, but it may not be the best for the end customer. We need to adjust the column names for every table in the report to standardize naming and formatting and maximize comprehension.

Motivation and context

We want to make the tables as easy to read as possible, so that the data is easy for the end customer to understand.

Implementation notes

Implementation will occur in the report_metrics or pages scripts where we manipulate the dataframes for the report.
We just need to change the column names before sending the dataframe to the table builder function.

Acceptance criteria

How do we know when this work is done?

  • All tables have well formatted and easy to understand column names
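The following is a minimal sketch of a single column-name formatter that could be applied to every dataframe before it reaches the table builder. The override entries are hypothetical examples for names that a simple "replace underscores, title-case" rule would format badly.

```python
from typing import Dict

# Hypothetical overrides; extend with the report's real column names.
COLUMN_NAME_OVERRIDES: Dict[str, str] = {
    "url": "URL",
    "cve": "CVE",
    "ip": "IP Address",
}


def format_column_name(name: str) -> str:
    """Turn a database column name such as "first_seen" into "First Seen"."""
    if name in COLUMN_NAME_OVERRIDES:
        return COLUMN_NAME_OVERRIDES[name]
    return name.replace("_", " ").title()
```

With pandas, the whole dataframe can be renamed in one call: `df = df.rename(columns=format_column_name)`.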

Change Verbiage in PDF Report (Credential Publication and Abuse)

🐛 Summary

The verbiage used under the section "Credential Publication and Abuse" on page 4 is incorrect.

Expected behavior

The verbiage needs to be changed to the following:

"Exposed credentials put systems at risk for unauthorized access and users at risk for highly effective phishing attacks."

Create PDF Reports From HTML/CSS

💡 Summary

Find a solution that leverages HTML/CSS code builds to PDF output.

Motivation and context

The current method, which generates .pptx files and converts them on MacBooks, breaks the formatting elements.

Implementation notes

  • This new report process should be able to generate complete and accurate reports without utilizing middleware applications.

Acceptance criteria

How do we know when this work is done?

  • Full P&E reports can be generated using HTML as the intermediate format.
  • All charts and objects contained within the report are formatted and placed as expected.
  • Text contained within the report is free of any spelling or grammatical errors.
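The following is a minimal sketch of the HTML side of an HTML/CSS-to-PDF pipeline. It renders one table into an HTML page with escaped values, using only the standard library. The template and function name are hypothetical. The PDF step is left to a renderer. WeasyPrint is one possible choice (`weasyprint.HTML(string=page).write_pdf("report.pdf")`), not one the project has selected.

```python
import html
from string import Template

# Hypothetical page template; the real layout would live in HTML/CSS files.
PAGE = Template("""<!DOCTYPE html>
<html>
<head><style>table { border-collapse: collapse; } td, th { border: 1px solid #999; padding: 4px; }</style></head>
<body>
<h1>$title</h1>
<table>
<tr>$header</tr>
$rows
</table>
</body>
</html>""")


def render_table_page(title, columns, rows):
    """Build an HTML page containing one table; every value is HTML-escaped."""
    header = "".join(f"<th>{html.escape(str(c))}</th>" for c in columns)
    body = "\n".join(
        "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return PAGE.substitute(title=html.escape(title), header=header, rows=body)
```

Escaping every value matters here because the raw data comes from external sources and may contain `<` or `&`.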

Remove hard-coded file locations

💡 Summary

In the Pages class, remove hard-coded references to files.

Motivation and context

This is useful because it removes directory-management issues: a relative path like the one below only works when the code is run from the repository root.
Example of a hard-coded path:
df_customer = pd.read_csv("src/pe_reports/data/csv/dhs_customer.csv")

Implementation notes

  • Create a data-dictionary reference or method to resolve file paths in class.

Acceptance criteria

How do we know when this work is done?

  • Remove hard-coded file paths
  • pe-reports passes checks and runs
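The following is a minimal sketch of a path-resolving method. By default, it builds absolute paths from the directory that contains the module itself, so the result no longer depends on the current working directory. `data_path` and its `base_dir` parameter are hypothetical. The default assumes the module sits in src/pe_reports/, next to data/.

```python
from pathlib import Path
from typing import Optional


def data_path(*parts: str, base_dir: Optional[Path] = None) -> Path:
    """Build an absolute path to a bundled data file, independent of the working directory."""
    if base_dir is None:
        # Default: the directory containing this module (assumed to be src/pe_reports/).
        base_dir = Path(__file__).resolve().parent
    return (base_dir / "data").joinpath(*parts)
```

The example above would become `df_customer = pd.read_csv(data_path("csv", "dhs_customer.csv"))`. For files shipped inside an installed package, `importlib.resources.files("pe_reports")` is a standard-library alternative.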

Clean up unnecessary characters from table values

💡 Summary

Much of the data that comes from the APIs contains unnecessary characters, e.g., braces and brackets ({}[]) and single quotes. We need to adjust the table-building function to remove those characters to maximize readability.

Motivation and context

The tables are a key part of the report, so they need to be as clean as possible

Implementation notes

The table generator function already loops through all the values, so it can identify and strip out unnecessary characters as it goes. This shouldn't be too difficult.

Acceptance criteria

How do we know when this work is done?

  • Table values are clean of unnecessary characters
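The following sketch shows a per-value cleaner that the table generator's loop could call. It uses `str.translate` to delete the characters named in this issue in a single pass. `clean_value` is a hypothetical name. Non-string values are returned unchanged.

```python
# Characters named in this issue: braces, brackets, and single quotes.
_UNWANTED = str.maketrans("", "", "{}[]'")


def clean_value(value):
    """Strip braces, brackets, and single quotes from string cell values."""
    if isinstance(value, str):
        return value.translate(_UNWANTED).strip()
    return value
```

One trade-off: apostrophes inside legitimate text (for example, a name like O'Brien) are removed as well. If that matters, strip only leading and trailing quotes instead.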

Replace eval() functions

🐛 Summary

Python's eval() function evaluates a string as Python code, and the bandit check has "blacklisted" it as an insecure function.

To reproduce

Steps to reproduce the behavior:

  1. Run branch pre-commit: https://github.com/cisagov/pe-reports/tree/AL-working-v1.0

Expected behavior

A pre-commit run --all-files should render no bandit errors.

Any helpful log output or screenshots

Example Test Error:

Test results:
>> Issue: [B307:blacklist] Use of possibly insecure function - consider using safer ast.literal_eval.
   Severity: Medium   Confidence: High
   Location: src/pe_reports/report_metrics.py:191
   More Info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_calls.html#b307-eval
191                 k = eval(row["right"])
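Bandit's suggested replacement is `ast.literal_eval`, which only accepts Python literals (strings, numbers, tuples, lists, dicts, sets, booleans, None). It raises `ValueError` for anything else, such as function calls. The sketch below assumes that `row["right"]` holds the string form of a literal such as a dict. The row value is hypothetical.

```python
import ast

row = {"right": "{'name': 'example', 'count': 3}"}  # hypothetical row value
k = ast.literal_eval(row["right"])  # drop-in replacement for eval(row["right"])

# Anything other than a literal is rejected instead of executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```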

One-off password retrieval

💡 Summary

Feature to allow setting a password manually for one-off or small group reports. This feature would only be accessible for P&E Team members.

Motivation and context

This would be useful because it would allow P&E Team members to respond in a timely manner when one-off report requests are made. It would also reduce the risk of official passwords being used outside the context of data sharing with customers.

Implementation notes

  • This feature should use command-line switches to control how passwords are gathered. The default can be to pull passwords from the CyHy database. However, this should be avoided for demo or test report runs, because those passwords should only be used when communicating directly with customers.

Acceptance criteria

How do we know when this work is done?

  • An individual password can be set via a command-line switch for a single report.
  • The same individual password can be set via a command-line switch for multiple reports.
  • Password list can be passed to apply to multiple reports (optional).
  • Proper security features are enabled and tested
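The following sketch covers the switches described above. The flags `--manual-password` and `--password-file` are hypothetical, and argparse is used only to keep the sketch self-contained. The password is read with `getpass` rather than passed as a flag value, because command-line arguments are visible to other users in process listings and shell history. Returning None means "fall back to the CyHy database".

```python
import argparse
import getpass
from typing import Callable, Dict, List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pe-reports")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--manual-password", action="store_true",
                       help="Prompt once for a password to use for every report in this run.")
    group.add_argument("--password-file",
                       help="File with one ORG_ID,password line per report.")
    return parser


def resolve_passwords(
    args: argparse.Namespace,
    org_ids: List[str],
    prompt: Callable[[str], str] = getpass.getpass,
) -> Optional[Dict[str, str]]:
    """Map each org ID to its report password, or None to use the CyHy database."""
    if args.manual_password:
        password = prompt("Report password: ")
        return {org: password for org in org_ids}
    if args.password_file:
        passwords = {}
        with open(args.password_file, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:
                    org, password = line.split(",", 1)
                    passwords[org.strip()] = password
        return passwords
    return None  # default: pull passwords from the CyHy database
```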

DRY out this code (PE Report 0.1.1)

💡 Summary

This issue has been split between PE Report 0.1.1 ("main") and PE Report 1.0.0 ("develop", Issue #18), because both need to be updated but constitute separate efforts.

There is a repeated code pattern in pages.py, noted in the comments on bug-unusable-package (here and here). The code should be rewritten as a single, unambiguous, reusable function or class.

Please provide details for implementation, such as:

  • An evaluation comparing the updated Matplotlib code with the pptx file code is needed.
  • Remove repetitive code - standardize graphing output.

Acceptance criteria

How do we know when this work is done?

  • Repetition of code is reduced.

Report QA: Confirm assets via WhoIs service

💡 Summary

A secondary check/script is needed to validate data feeds which include a list of a customer's assets. A "net-range validation process" or "whois" check is needed to filter bad assets.

Motivation and context

This improves the quality of the report through validation and cuts down on any speculative data included in the reports.

Implementation notes

Please provide details for implementation, such as:

  • Define Solution

Acceptance criteria

How do we know when this work is done?

  • IP addresses should be pulled from the CyHy database instead of LG to identify the IP addresses that are valid for the particular agency.
  • The team should be able to perform asset checks on a regular basis to determine if the current IP addresses on record with CyHy are valid.
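The following is a minimal sketch of the "net-range validation" step, using the standard `ipaddress` module. It flags any asset IP that falls outside every CIDR range on record for the agency. How the ranges are fetched from the CyHy database is not shown and is an assumption here, as is the function name. WHOIS lookups would be a separate step.

```python
import ipaddress
from typing import Iterable, List


def assets_outside_ranges(ips: Iterable[str], cidrs: Iterable[str]) -> List[str]:
    """Return the IPs that fall outside every CIDR range on record for an agency."""
    networks = [ipaddress.ip_network(c, strict=False) for c in cidrs]
    flagged = []
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        # `in` evaluates to False (not an error) when IPv4 and IPv6 are mixed.
        if not any(addr in net for net in networks):
            flagged.append(ip)
    return flagged
```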

Remove all 'subprocess' package incidences

🐛 Summary

The Python subprocess module is flagged by the bandit check as "blacklisted".

To reproduce

Steps to reproduce the behavior:

  1. Run branch pre-commit: https://github.com/cisagov/pe-reports/tree/AL-working-v1.0

Expected behavior

A pre-commit run --all-files should render no bandit errors.

Any helpful log output or screenshots

Paste the results here:

>> Issue: [B404:blacklist] Consider possible security implications associated with subprocess module.
   Severity: Low   Confidence: High
   Location: src/pe_reports/report_generator.py:29
   More Info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_imports.html#b404-import-subprocess

29      import subprocess

Add command line functionality.

💡 Summary

Add command-line functionality with arguments for the CyHy database connection, data input, and data output.

Motivation and context

Critical feature to ensure a secure and proper distribution of P&E Reports.

Implementation notes

Intended usage should follow:

"""A tool for creating Posture & Exposure reports.

Usage:
    pe-reports REPORT_DATE DATA_DIRECTORY OUTPUT_DIRECTORY [--db-creds-file=FILENAME] [--log-level=LEVEL]

Arguments:
  REPORT_DATE                   Date of the report, format YYYY-MM-DD.
  DATA_DIRECTORY                The directory where the excel data files are located.
                                Organized by owner.
  OUTPUT_DIRECTORY              The directory where the final PDF reports should be saved.
  -c --db-creds-file=FILENAME   A YAML file containing the Cyber
                                Hygiene database credentials.
                                [default: /secrets/database_creds.yml]

Options:
  -h --help                     Show this message.
  -v --version                  Show version information.
  --log-level=LEVEL             If specified, then the log level will be set to
                                the specified value.  Valid values are "debug", "info",
                                "warning", "error", and "critical". [default: info]
"""

Acceptance criteria

How do we know when this work is done?

  • Successfully generate P&E Reports from command line interface.
  • DATA_DIRECTORY input data file locations are not hard-coded.
  • OUTPUT_DIRECTORY is created when scripts are run.
  • Proper exceptions are raised if --db-creds-file is not available or fails to load.
  • The --log-level option is added.

Replace python-pptx graphs with Matplotlib graphing libraries

💡 Summary

Due to graphing errors with python-pptx and PDF conversions on macOS, all graphing functions will be replaced with the Matplotlib package.

Motivation and context

Primarily this improvement allows for better quality control and addresses the open issues presented in #9 "Define constants for graph sizing and positioning".

Implementation notes

Please provide details for implementation, such as:

  • From the local P&E Team repository, take the scripts from charts.py and use them to update stylesheets.py on GitHub
  • Adjust references in pages.py
  • Update setup.py with Matplotlib installation requirements

Acceptance criteria

How do we know when this work is done?

  • Report runs end-to-end from command-line.
  • Must pass Issue requirements - #18
  • Must pass Issue requirements - #21
  • Code meets CISA dev-standards and tests.

New P&E Reports Design 1.0

💡 Summary

Enhance the design and capabilities of the current P&E Report 0.1.1.

Motivation and context

  • Find a solution that leverages HTML/CSS code builds to PDF output.
  • Add additional report metrics.
  • Coordinate new sources for input.

Acceptance criteria

How do we know when this work is done?

  • Create Design in P&E Figma Tool for viewing.
  • P&E Team Documents Data Sourcing.
  • Demo Proof-of-Concept.
  • Deliver high-level plan to P&E Team.
  • Update Sprint 2

Define StatsMessage in pe-mailer module

🐛 Summary

The StatsMessage definition is missing from email_reports.py. While the PR is in draft mode, StatsMessage has been temporarily hard-coded as a string.

line 442

if summary_to is not None and all_stats_strings:
    StatsMessage = "Needs Defined!!"  # temporary placeholder string
    message = StatsMessage(summary_to.split(","), all_stats_strings)  # TypeError: a str is not callable
    try:

Expected behavior

StatsMessage will be imported from src/pe_mailer/StatsMessage.py, which was originally taken from cyhy-mailer.
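Until the cyhy-mailer class is ported, the following hypothetical stand-in shows the shape that the call at line 442 expects: a list of recipient addresses plus a list of statistics strings. It is built on the standard library's `EmailMessage`. It is not the cyhy-mailer implementation, and the default subject and optional sender are assumptions.

```python
from email.message import EmailMessage
from typing import List, Optional


class StatsMessage(EmailMessage):
    """Hypothetical sketch of a summary email; not the cyhy-mailer implementation."""

    def __init__(
        self,
        to_addrs: List[str],
        text_strings: List[str],
        from_addr: Optional[str] = None,
        subject: str = "P&E report summary",
    ):
        super().__init__()
        if from_addr:
            self["From"] = from_addr
        # summary_to.split(",") can leave spaces around addresses, so strip them.
        self["To"] = ", ".join(addr.strip() for addr in to_addrs)
        self["Subject"] = subject
        # One statistic per line in a plain-text body.
        self.set_content("\n".join(text_strings))
```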

Change Verbiage in PDF Report (Supplemental Information)

🐛 Summary

On page 4 of the report, the final section is untitled but discusses the Appendix.

Expected behavior

We should title that section "Supplemental Information" and use the second sentence from what is currently on that page.

Dark Web Database Schema

💡 Summary

Show database schema for dark web data

Motivation and context

This supports database creation and use. For now, the schema will be used solely as a reference for the rest of the team.

Implementation notes

  • create at least one table that emulates how the data will be stored
  • add to the data_schema.sql file located here

Acceptance criteria

  • At least one table is created
  • Each field needed for the report is included
  • Table is commented out

Update cisa/pe-reports with latest working version.

💡 Summary

Update the cisa/pe-reports repo with current/active scripts used by the P&E Team to run P&E Reports.

Motivation and context

The P&E Team currently has working scripts that produce the bi-weekly P&E Cyber Hygiene Reports. This update provides full report output, which is to be merged into the dev-standards architecture addressed in #6. Matplotlib graphing tools have also been introduced for better visualizations.

Secondary end-to-end process improvements:

Implementation notes

Acceptance criteria

How do we know when this work is done?

  • Report runs end-to-end from command-line.
  • csv data embeds in deliverable PDFs.
  • All linked issues are addressed.
  • Final PDFs are encrypted.
  • Code meets cisa dev-standards and tests.
  • Archive 'Version'.

Domain Masquerading Database Schema

💡 Summary

Show database schema for domain masquerading data

Motivation and context

This supports database creation and use. For now, the schema will be used solely as a reference for the rest of the team.

Implementation notes

  • create at least one table that emulates how the data will be stored
  • add to the data_schema.sql file located here

Acceptance criteria

  • At least one table is created
  • Each field needed for the report is included
  • Table is commented out

Package is not Usable

🐛 Summary

As of the merge of #2 this project is unusable. Trying to perform import pe_reports results in:

>>> import pe_reports
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mcdonnnj/workspace/cisa_repos/pe-reports/src/pe_reports/__init__.py", line 7, in <module>
    from _version import __version__  # noqa: F401
ModuleNotFoundError: No module named '_version'

To reproduce

Steps to reproduce the behavior:

  1. Clone the repo
  2. Set up Python environment
  3. Install package
  4. Perform an import pe_reports either in a script or the Python interactive shell

Expected behavior

An import pe_reports call is performed with no errors.

Suggested resolution

I reviewed #2; the review can be found here: #2 (review). The issues it mentions should be resolved. In addition, a bare minimum of testing must be implemented along the lines of this section of the tests in cisagov/skeleton-python-library. This will guarantee that, before any merge, the changes pass both the linters (through pre-commit) and enough tests to show that this is a minimally functional Python package.
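The traceback comes from `from _version import __version__`, which searches for a top-level module named `_version` instead of the one inside the package. The usual fix, which is also the pattern used in cisagov/skeleton-python-library, is a relative import in src/pe_reports/__init__.py: `from ._version import __version__`. The following demo builds a throwaway package (`demo_pkg` is a hypothetical stand-in) to show that the relative form imports cleanly.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_pkg"  # hypothetical stand-in for src/pe_reports
    pkg.mkdir()
    (pkg / "_version.py").write_text('__version__ = "0.0.1"\n')
    # The leading dot resolves _version relative to the package itself.
    (pkg / "__init__.py").write_text("from ._version import __version__  # noqa: F401\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo_pkg = importlib.import_module("demo_pkg")
        version = demo_pkg.__version__
    finally:
        sys.path.remove(tmp)

print(version)  # 0.0.1
```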

Add P&E Mailer Automation - Module

💡 Summary

To further automate the delivery of P&E Reports to customers, a module/program will be implemented to retrieve pe_reports output and deliver it to the appropriate customer's email. Ideally, this would be an in-line process that can be turned on or off while running the pe_reports module.

Motivation and context

This would fully automate the P&E Reports delivery process. Secondarily, it would allow for scalability and for acquiring more customers at a faster pace. With proper tests, it also improves quality control.

Implementation notes

  • Proposed architecture:
├── src
│   ├── pe_reports
│   │   ├── __init__.py
│   │   ├── __main__.py
│   │   ├── _version.py
│   │   ├── data
│   │   ├── file_1.py
│   │   ├── file_2.py
│   │   └── report_generator.py
│   └── pe_mailer
│       ├── __init__.py
│       ├── __main__.py
│       ├── _version.py
│       ├── data
│       ├── file_1.py
│       ├── file_2.py
│       └── mailer.py
├── tests
│   ├── conftest.py
│   ├── test_pe_reports.py
│   └── test_pe_mailer.py
├── setup.py
└── docker-compose.yml (?)
  • Cleanly include a secondary command-line process for the mailer usage.
  • Update setup.py to handle multiple entry points: entry_points={'console_scripts': [...]}
  • Update dependencies so that import pe_mailer works in Python.
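The following is a sketch of the multiple-entry-point change for setup.py, using the setuptools console_scripts syntax `command-name = package.module:function`. The module paths and function names are assumptions based on the file names mentioned in these issues (report_generator.py, email_reports.py), not confirmed targets.

```python
# Hypothetical excerpt from setup.py; module paths and function names are assumptions.
ENTRY_POINTS = {
    "console_scripts": [
        "pe-reports = pe_reports.report_generator:main",
        "pe-mailer = pe_mailer.email_reports:main",
    ],
}

# setup(
#     ...,
#     entry_points=ENTRY_POINTS,
# )
```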

Acceptance criteria

How do we know when this work is done?

  • TBD

Command Line Functionality - Matplotlib Default Log Level

💡 Summary

Default Log Level used by matplotlib is too verbose.
usage:
pe-reports REPORT_DATE DATA_DIRECTORY OUTPUT_DIRECTORY [--db-creds-file=FILENAME] [--log-level=LEVEL]

Implementation notes

Please provide details for implementation, such as:

To avoid Matplotlib's INFO logs, which can be extensive, set --log-level=warning (or any level other than the default "info").

Acceptance criteria

How do we know when this work is done?

  • Test test_log_levels(level) in test_pe_reports.py passes.
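Instead of asking users to raise the global level, pe-reports could raise the threshold of Matplotlib's own logger only. Matplotlib's modules log through standard `logging` loggers under the `matplotlib` namespace, so child loggers such as `matplotlib.font_manager` inherit this setting. The following is a minimal sketch.

```python
import logging

# pe-reports keeps its own level (here the default "info") ...
logging.basicConfig(level=logging.INFO)
# ... while Matplotlib's INFO/DEBUG chatter is suppressed.
logging.getLogger("matplotlib").setLevel(logging.WARNING)
```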

Update README.md (develop)

💡 Summary

Update the README.md content to reflect repo changes, which may include the following items:

  • Removal of LibreOffice dependencies
  • Update to pe-reports cmd-line usage
  • Updates addressing the pe-mailer module
  • Updates to install process
  • Description of report generation and delivery processes.
  • Description of PE Report metrics.

Motivation and context

Necessary to provide appropriate instruction for using repository resources and defining its purpose.

Acceptance criteria

  • P&E Team Sign-Off (all members)
  • Commit README.md file in pull request

Suspected Vulnerabilities Database Schema

💡 Summary

Show database schema for suspected vulnerabilities data

Motivation and context

This supports database creation and use. For now, the schema will be used solely as a reference for the rest of the team.

Implementation notes

  • create at least one table that emulates how the data will be stored
  • add to the data_schema.sql file located here

Acceptance criteria

  • At least one table is created
  • Each field needed for the report is included
  • Tables are commented out

Connect cyhy customer db to encrypt and embed pdf reports.

💡 Summary

Update the "report_generator.py" script to include a function "generate_reports" that coordinates the P&E Report process.

Motivation and context

This is a key feature to securely deliver encrypted reports to Posture and Exposure customers.

Implementation notes

Please provide details for implementation, such as:

  • Create the function "generate_reports" to do the following:
  • Connect to the cyhy database.
  • Gather customer information based on customer ID.
  • Build the reports.
  • Embed raw data into the PDF output.
  • Encrypt the PDFs.

Acceptance criteria

How do we know when this work is done?

  • Produce P&E Report with no errors.
  • Output is encrypted.
  • Validate report data.
  • Review and ensure reports can be sent using current mailer scripts.

Primary Key integer or UUID

💡 Summary

Should the db schema implement an int or a UUID as a table's primary key?

Motivation and context

UUIDs would help in the event that table relationships grow in complexity or the DB is scaled out to a data warehouse.

The chance of orphaned data relationships is diminished when scaling out the data.

Implementation notes

Please provide details for implementation, such as:

  • Change the primary key type from integer to UUID
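The scaling concern can be illustrated with two independent databases. This sketch uses Python's standard-library `sqlite3` and `uuid` modules as stand-ins, and the `org` table is hypothetical: auto-assigned integer keys restart at 1 in each database, so rows collide when the data is merged, while `uuid.uuid4()` keys are generated without any coordination and are, for practical purposes, unique.

```python
import sqlite3
import uuid


def new_database(use_uuid: bool) -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical 'org' table."""
    conn = sqlite3.connect(":memory:")
    key_type = "TEXT" if use_uuid else "INTEGER"
    conn.execute(f"CREATE TABLE org (org_key {key_type} PRIMARY KEY, name TEXT)")
    return conn


def insert_org(conn: sqlite3.Connection, name: str, use_uuid: bool):
    """Insert an organization row and return the primary key it received."""
    if use_uuid:
        key = str(uuid.uuid4())  # generated client-side, no shared counter needed
        conn.execute("INSERT INTO org (org_key, name) VALUES (?, ?)", (key, name))
        return key
    cur = conn.execute("INSERT INTO org (name) VALUES (?)", (name,))
    return cur.lastrowid  # assigned by this database's own counter


def keys_from_two_nodes(use_uuid: bool):
    """Insert one row into each of two independent databases; return both keys."""
    node_a, node_b = new_database(use_uuid), new_database(use_uuid)
    return insert_org(node_a, "Org A", use_uuid), insert_org(node_b, "Org B", use_uuid)


if __name__ == "__main__":
    print(keys_from_two_nodes(use_uuid=False))  # (1, 1): these collide when merged
    print(keys_from_two_nodes(use_uuid=True))  # two distinct UUID strings
```

The trade-off is that UUID keys are larger than integers and are not ordered by insertion, which can matter for index size and sorting.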

Acceptance criteria

How do we know when this work is done?

  • primary key type changed from int to UUID

Reestablish args for pande_dir and db_creds_file in pe-mailer module

💡 Summary

Currently, the arguments pande_dir and db_creds_file in the pe-mailer module are hard-coded solely so that the draft PR passes checks.

Motivation and context

This would be useful because it is the primary command-line interface for invoking the pe-mailer module.

Implementation notes

  • Reference the pande_dir and db_creds_file arguments in main()

Acceptance criteria

  • Usage should be: % pe-mailer pande_report_dir= db_creds_file= summary_to=None test_emails= debug=None
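One way to remove the hard-coding is to parse the values and hand them to the mailer from main(). This sketch uses the standard-library `argparse` in place of the docopt-style usage text the project appears to use; the flag names mirror the pe-mailer usage in the README, requiring the report directory is an assumption, and `send_reports` is a hypothetical stand-in for the real delivery logic.

```python
import argparse
from typing import List, Optional, Tuple

# Default taken from the pe-mailer usage text; adjust if the project changes it.
DEFAULT_DB_CREDS_FILE = "/secrets/database_creds.yml"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the report directory and credentials file from the command line."""
    parser = argparse.ArgumentParser(prog="pe-mailer")
    parser.add_argument(
        "-p", "--pe-report-dir", dest="pande_report_dir", required=True,
        help="Directory containing the pe-reports output.",
    )
    parser.add_argument(
        "-c", "--db-creds-file", dest="db_creds_file", default=DEFAULT_DB_CREDS_FILE,
        help="YAML file containing the Cyber Hygiene database credentials.",
    )
    return parser.parse_args(argv)


def send_reports(pande_report_dir: str, db_creds_file: str) -> Tuple[str, str]:
    """Hypothetical stand-in for the mailer's delivery logic."""
    return pande_report_dir, db_creds_file


def main(argv: Optional[List[str]] = None):
    args = parse_args(argv)
    # Pass the parsed values through instead of hard-coded paths.
    return send_reports(args.pande_report_dir, args.db_creds_file)
```

With this shape, `main(["-p", "/path/to/output"])` uses the default credentials file, and tests can call `main()` with an explicit argument list instead of patching hard-coded paths.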

Version control management

💡 Summary

P&E is currently looking to incorporate version control as per cisagov dev-standards. Two report versions are currently in development: one will eventually merge into the develop branch, and the other will be archived as a "Release".

  • PE Report 0.1.1: This version is the active report delivery system for P&E. The current objective is to bring the code up to dev-standards, add it to the cisagov/pe-reports repo, and then archive it as a pe-report "Release". To manage and access this version, a branch pe-reports 0.1.1 will be created and used by any PE team member for commits/pull requests pertaining to this version.

  • PE Report 1.0.0: This version is currently in development and introduces a whole new graphing system. The current objective is to continue development with the intention of merging into the main branch, develop, where it will act as the current P&E report moving forward and be archived accordingly as updates occur.

Motivation and context

The priority is to comply with dev-standards and to update cisagov/pe-reports to manage all code deemed important to P&E reporting capabilities.

This would be useful because it allows for improved source control for the P&E team.

Implementation notes

PE Report 0.1.1

  • Create branch pe-reports 0.1.1
  • P&E team members will create individual branches per issue/commits
  • Pull requests will be set to "Draft Mode", as early PRs are expected not to pass all checks.
  • Once 'pe-reports 0.1.1' is compliant and complete, create a Release

PE Report 1.0.0

  • P&E team members will create individual branches per issue/commits
  • PR will follow cisagov dev-standards

Acceptance criteria

How do we know when this work is done?

  • Create branch 'pe-reports 0.1.1'
  • 'pe-reports 0.1.1' is set as Release

"install_requires" package to create a CyHy MongoDB connection

๐Ÿ› Summary

The PE Reports process, and eventually the PE Mailer process, relies on the mongo-db-from-config package. The current implementation in setup.py on branch ss-working-tests adds this requirement, yet fails on install. The following, taken from the cyhy-mailer repo, was also attempted:
"mongo-db-from-config @ http://github.com/cisagov/mongo-db-from-config/tarball/develop#egg=mongo-db-from-config"

Question: Is there additional code or a method that needs to be included, or should this instead become a step in the install instructions?

To reproduce

Steps to reproduce the behavior:

  1. pe-reports % python3 setup.py install
  2. Then this:
    ModuleNotFoundError: No module named 'mongo_db_from_config'

Expected behavior

What did you expect to happen that didn't?

The mongo-db-from-config package installs on setup.

Domain Masquerading Page in PDF Report (Incorrect Masquerading ID)

๐Ÿ› Summary

On the NASA report, the Domain Masquerading page incorrectly identifies NSA.gov as a spoofed domain by 'omission'.

Expected behavior

NSA and other government agencies should be excluded from being identified as spoofing each other.
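The false positive follows directly from how an omission permutation works: dropping the first "a" from nasa.gov yields nsa.gov. A minimal sketch of an exclusion step is below; `omission_variants` assumes a simple "name.tld" domain, and `EXCLUDED_DOMAINS` is a hypothetical, hard-coded allowlist (a real list of government domains would need to come from an authoritative source).

```python
from typing import Iterable, List, Set

# Hypothetical allowlist: legitimate domains that must never be reported as
# spoofs of one another.
EXCLUDED_DOMAINS = {"nasa.gov", "nsa.gov"}


def omission_variants(domain: str) -> Set[str]:
    """Return every look-alike made by dropping one character of the name.

    Assumes a simple 'name.tld' domain such as 'nasa.gov'.
    """
    name, _, tld = domain.rpartition(".")
    variants = set()
    for i in range(len(name)):
        shortened = name[:i] + name[i + 1:]
        if shortened:  # skip the empty name left by a one-letter domain
            variants.add(f"{shortened}.{tld}")
    return variants


def filter_suspected(candidates: Iterable[str], excluded: Set[str] = EXCLUDED_DOMAINS) -> List[str]:
    """Drop known legitimate domains (case-insensitively) before reporting."""
    return sorted(d.lower() for d in candidates if d.lower() not in excluded)


if __name__ == "__main__":
    variants = omission_variants("nasa.gov")
    print(sorted(variants))  # ['asa.gov', 'naa.gov', 'nas.gov', 'nsa.gov']
    print(filter_suspected(variants))  # ['asa.gov', 'naa.gov', 'nas.gov']
```

Filtering after permutation generation, rather than changing the permutation logic, keeps the fuzzer output intact for other domains while suppressing known agency-to-agency matches.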

Any helpful log output or screenshots


Command Line Functionality Documentation

💡 Summary

Update the GitHub SOP and associated documentation with the updated functionality.

Acceptance criteria

How do we know when this work is done?

  • Update Teams Documentation - P&E GitHub SOP

This is related to #8

Dynamic Appendices to Add Descriptive Details

💡 Summary

Our report needs to have context when displaying source information such as Forum Name or Breach Names.

Motivation and context

This is best done in an appendix as the tables used in the report are too small to give enough detail about a given item.

Implementation notes

The appendices should be dictionary-style, allowing users to quickly identify an item and read a short 1-2 sentence summary.

Acceptance criteria

How do we know when this work is done?

  • Breach Description Appendix
  • Dark Web Site Appendix
  • Dark Web/Deep Web Site Appendix
  • Appendices should be dynamic and only display information contained in the report.
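A dictionary-style appendix that only lists what the report actually contains can be as simple as a lookup over the items found in the report data. In this sketch, `build_appendix` and the fallback text are hypothetical, and the breach names and descriptions are placeholders, not real breach data.

```python
from typing import Dict, Iterable, List, Tuple

FALLBACK = "No description available."


def build_appendix(items_in_report: Iterable[str], descriptions: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return sorted (name, summary) rows only for items that appear in this report."""
    unique_items = sorted(set(items_in_report))  # de-duplicate repeated mentions
    return [(name, descriptions.get(name, FALLBACK)) for name in unique_items]


# Placeholder entries; real summaries would be maintained by the P&E team.
BREACH_DESCRIPTIONS = {
    "Breach A": "Placeholder 1-2 sentence summary of Breach A.",
    "Breach B": "Placeholder 1-2 sentence summary of Breach B.",
    "Breach Z": "Not present in the example report below, so it is omitted.",
}

if __name__ == "__main__":
    for name, summary in build_appendix(["Breach B", "Breach A", "Breach B"], BREACH_DESCRIPTIONS):
        print(f"{name}: {summary}")
```

The same function would serve the Dark Web and Deep Web site appendices by passing a different description dictionary; items missing from the dictionary still appear with a fallback line so gaps in the reference data are visible.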

Add Mac PowerPoint option to convert a .pptx file to a .pdf.

💡 Summary

The current solution uses LibreOffice to convert the PowerPoint presentation to a PDF file. This works on both macOS and Linux systems but causes formatting issues with graphs. A possible code modification would add scripts that invoke an AppleScript function, which uses the PowerPoint app natively to convert pptx files to PDFs, thus retaining all graphing elements available in MS PowerPoint.

Motivation and context

This process is a key element of the P&E Reports and improves both the quality and scalability of the product. This additional function also eliminates a complicated dependency: the LibreOffice installation.

Implementation notes

Please provide details for implementation, such as:

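  • Write in a condition to check the OS. Note that os.name is 'posix' on both macOS and Linux, so sys.platform is needed to tell them apart (the *Stuff() calls are placeholders):
import sys

if sys.platform == "darwin":  # macOS: convert natively with PowerPoint
    macStuff()
elif sys.platform.startswith("linux"):
    linuxStuff()
elif sys.platform == "win32":
    windowsStuff()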
  • Create a Python subprocess to call the AppleScript function.
  • AppleScript function will look like/include:
on run {input, parameters}
	set theOutput to {}
	tell application "Microsoft PowerPoint"
		launch
		set theDial to start up dialog
		set start up dialog to false
		repeat with i in input
			open i
			set pdfPath to my makeNewPath(i)
			save active presentation in pdfPath as save as PDF
			close active presentation saving no
			set end of theOutput to pdfPath as alias
		end repeat
		set start up dialog to theDial
	end tell
	return theOutput
end run

on makeNewPath(f)
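The subprocess bullet above could be sketched with the standard-library `subprocess` module and macOS's `osascript` command, which passes extra command-line arguments to a script's run handler as a list of strings. This assumes the AppleScript is saved as a standalone script file (the path passed in is hypothetical) whose handler is written as `on run argv`; the Automator-style `on run {input, parameters}` handler shown above would need adapting for that.

```python
import os
import subprocess
import sys
from typing import List, Sequence


def build_osascript_command(script_path: str, pptx_files: Sequence[str]) -> List[str]:
    """Build the osascript call; file paths are made absolute for PowerPoint."""
    return ["osascript", script_path] + [os.path.abspath(f) for f in pptx_files]


def convert_pptx_to_pdf(script_path: str, pptx_files: Sequence[str]) -> str:
    """Run the AppleScript conversion on macOS and return its stdout."""
    if sys.platform != "darwin":
        raise RuntimeError("PowerPoint conversion via AppleScript requires macOS")
    result = subprocess.run(
        build_osascript_command(script_path, pptx_files),
        check=True,  # raise CalledProcessError if the script fails
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Keeping command construction separate from execution means the argument handling can be tested on Linux CI runners, where PowerPoint and `osascript` are unavailable.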

Acceptance criteria

How do we know when this work is done?

  • Run script in cisa/pe-reports with no errors.
  • Run PowerPoint Application processing in background.
  • Write in condition to look at OS
  • All graphing issues resolved.

Define variable for empty iterations

💡 Summary

In reference to #6 (comment), all report objects need to be defined as variables in case an iteration is empty. This primarily applies to the pptx object 'shape' from 'shapes' and will need to be adjusted accordingly when applying the matplotlib libraries. Issue #14

Motivation and context

This is considered a code-quality improvement.

Implementation notes

Please provide details for implementation, such as:

from pptx import Presentation

SLD_LAYOUT_TITLE_AND_CONTENT = 1

prs = Presentation()
slide_layout = prs.slide_layouts[SLD_LAYOUT_TITLE_AND_CONTENT]
slide = prs.slides.add_slide(slide_layout)
shapes = slide.shapes
  • Example stylesheets.py: Declare a shape variable in case slide.shapes is empty.
    @staticmethod
    def shapes(slide):
        """Return the last shape on the slide that has a text frame, or None."""
        shape = None  # defined before the loop so an empty iteration still returns a value
        for candidate in slide.shapes:
            if not candidate.has_text_frame:
                continue
            shape = candidate
        return shape
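The same goal can also be met without a pre-declared variable: `next()` with a default value never raises on an empty iteration. This sketch returns the first matching shape rather than the last, relies only on the `shapes` and `has_text_frame` attributes that python-pptx slides and shapes expose, and the function name `first_text_shape` is hypothetical. `SimpleNamespace` objects stand in for real slides so the example runs without python-pptx.

```python
from types import SimpleNamespace


def first_text_shape(slide):
    """Return the first shape on the slide with a text frame, or None.

    next() falls back to the default when the generator is empty, so an
    empty slide.shapes cannot leave a variable undefined.
    """
    return next((shape for shape in slide.shapes if shape.has_text_frame), None)


if __name__ == "__main__":
    # Lightweight stand-ins for python-pptx slide/shape objects.
    empty_slide = SimpleNamespace(shapes=[])
    picture = SimpleNamespace(name="Picture", has_text_frame=False)
    title = SimpleNamespace(name="Title", has_text_frame=True)
    print(first_text_shape(empty_slide))  # None
    print(first_text_shape(SimpleNamespace(shapes=[picture, title])).name)  # Title
```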

Acceptance criteria

How do we know when this work is done?

  • Return value shape on empty iteration

DRY out this code

💡 Summary

There is a repeated code pattern in pages.py, noted in the comments on bug-unusable-package here and here. The code should be rewritten as a single, unambiguous function or class that is not repeated.

Motivation and context

Because the graphing libraries change in the next planned merge, this issue will be applied generically to the updates being made in Issue #14 "Replace python-pptx graphs with Matplotlib" and will be set as an "Acceptance criteria" item.

Implementation notes

  • An evaluation comparing the updated Matplotlib code with the pptx file code is needed.

Acceptance criteria

How do we know when this work is done?

  • Repetition of code is reduced.

Update README.md (main)

💡 Summary

Update README.md content in association with repo changes, which may include the following items:

  • Update to pe-reports cmd-line usage
  • Updates to install process
  • Description of report generation and delivery processes.
  • Description of PE Report metrics.

Motivation and context

Necessary to provide appropriate instructions for using repository resources and to define the repository's purpose.

Implementation notes

Please provide details for implementation, such as:

  • an example for how this would be used
  • what this would look like
  • how this would act
  • any related work, including links to related issues

Acceptance criteria

How do we know when this work is done?

  • P&E Team Sign-Off (all members)
  • Commit README.md file in pull request

Add New Team Member: cduhn17

💡 Summary

Craig Dunn is a new member of the P&E Team and will be an owner of https://github.com/cisagov/pe-reports

Motivation and context

Craig Dunn's GitHub profile = cduhn17

Implementation notes

# These owners will be the default owners for everything in the
# repo. Unless a later match takes precedence, these owners will be
# requested for review when someone opens a pull request.
* @aloftus23 @cduhn17 @dav3r @DJensen94 @elr64 @felddy @jsf9k @mcdonnnj @schmelz-ctr @stewartl97

Acceptance criteria

How do we know when this work is done?

Re-review of unresolved conversations in PR#33

Thank you for your work on this PR. I had some suggestions and some questions about fail states I would like explained a bit more.

Originally posted by @mcdonnnj in #33 (review)

With multiple approvals from the team, we are looking to move forward with a merge. The PR is blocking other pending issues that will clean up the "main" branch.

We are creating this issue in the event we need to revisit some of the unresolved conversations.
