cisagov / pe-reports

Automated process to build and distribute Posture & Exposure Reports bi-weekly to customers.

License: Creative Commons Zero v1.0 Universal

Shell 1.90% Python 90.91% CSS 3.78% HTML 3.41%

pe-reports's Introduction

Posture & Exposure Reports (P&E Reports)

This package is used to generate and deliver CISA Posture & Exposure Reports (P&E Reports). Reports are delivered by email and include an encrypted PDF attachment with a series of embedded raw-data files of the collected materials. The reports are delivered in a two-step process. First, the pe_reports module collects the raw data and creates the encrypted PDFs. The pe_mailer module then securely delivers the content.

Topics of interest include Exposed Credentials, Domain Masquerading, Malware, Inferred Vulnerabilities and the Dark Web. The data collected for the reports is gathered on the 1st and 15th of each month.

Requirements

Installation

git clone https://github.com/cisagov/pe-reports.git
pip install -e .

Create P&E Reports

Configure cisagov MongoDB connection

Usage:
  pe-reports REPORT_DATE DATA_DIRECTORY OUTPUT_DIRECTORY [--log-level=LEVEL]

Arguments:
  REPORT_DATE                   Date of the report, format YYYY-MM-DD.
  DATA_DIRECTORY                The directory where the Excel data files are located.
                                Organized by owner.
  OUTPUT_DIRECTORY              The directory where the final PDF reports should be saved.

Options:
  -h --help                     Show this message.
  -v --version                  Show version information.
  --log-level=LEVEL             If specified, then the log level will be set to
                                the specified value.  Valid values are "debug", "info",
                                "warning", "error", and "critical". [default: info]

Deliver P&E Reports

Configure cisagov MongoDB connection
Load an AWS profile that assumes this role

Usage:
  pe-mailer [--pe-report-dir=DIRECTORY] [--db-creds-file=FILENAME] [--log-level=LEVEL]

Arguments:
  -p --pe-report-dir=DIRECTORY  Directory containing the pe-reports output.
  -c --db-creds-file=FILENAME   A YAML file containing the Cyber
                                Hygiene database credentials.
                                [default: /secrets/database_creds.yml]
Options:
  -h --help                     Show this message.
  -v --version                  Show version information.
  -s --summary-to=EMAILS        A comma-separated list of email addresses
                                to which the summary statistics should be
                                sent at the end of the run.  If not
                                specified then no summary will be sent.
  -t --test_emails=EMAILS       A comma-separated list of email addresses
                                to which to test email send process. If not
                                specified then no test will be sent.
  -l --log-level=LEVEL          If specified, then the log level will be set to
                                the specified value.  Valid values are "debug", "info",
                                "warning", "error", and "critical". [default: info]

Database backup/restore

Follow the instructions below to backup the P&E database instance and restore locally.

In the P&E database environment:

Pull the latest repository
If necessary, edit ./src/pe_reports/pe_db/pg_backup.sh and replace the default output path ($PWD) with your preferred output path.
Open terminal and run: bash ./src/pe_reports/pe_db/pg_backup.sh
Export resulting .zip file

In your local environment:

Pull the latest repository
If necessary, edit ./src/pe_reports/pe_db/pg_restore.sh and replace the default path to the backup files ($PWD) with your preferred path.
Start local postgres
Open terminal and run: bash ./src/pe_reports/pe_db/pg_restore.sh

Collect P&E Source Data

Add database and data source credentials to src/pe_reports/data/config.ini

Usage:
  pe-source DATA_SOURCE [--log-level=LEVEL] [--orgs=ORG_LIST] [--cybersix-methods=METHODS]

Arguments:
  DATA_SOURCE                       Source to collect data from. Valid values are "cybersixgill",
                                    "dnstwist", "hibp", and "shodan".
Options:
  -h --help                         Show this message.
  -v --version                      Show version information.
  -l --log-level=LEVEL              If specified, then the log level will be set to
                                    the specified value.  Valid values are "debug", "info",
                                    "warning", "error", and "critical". [default: info]
  -o --orgs=ORG_LIST                A comma-separated list of orgs to collect data for.
                                    If not specified, data will be collected for all
                                    orgs in the pe database. Orgs in the list must match the
                                    IDs in the cyhy-db. E.g. DHS,DHS_ICE,DOC
                                    [default: all]
  -csg --cybersix-methods=METHODS   A comma-separated list of cybersixgill methods.
                                    If not specified, all will run. Valid values are "alerts",
                                    "credentials", "mentions", "topCVEs". E.g. alerts,mentions.
                                    [default: all]

Contributing

We welcome contributions! Please see CONTRIBUTING.md for details.

License

This project is in the worldwide public domain.

This project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication.

All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.

pe-reports's Issues

Add cc Address To CyHy Mailer Module

🐛 Summary

After talking to Genevieve, we need to ensure that the CyHy mailer sends a copy of all reports to [email protected]

Expected behavior

All reports from this point forward should cc [email protected] for record retention requirements.

Update pe-mailer error handling

💡 Summary

Include proper exception handling and logging for the errors that currently cause the pe-mailer module to fail with traceback messages. When pe-mailer ... ... ... is run, the module should either run start to finish regardless of missing module resources or return a "Usage" error message.

For a code example see pe-reports fix and its scope.

Motivation and context

It provides a proper implementation of the pe-mailer module, and handles errors properly when importing and running.

Implementation notes

Please provide details for implementation, such as:

Explore all probable Traceback Issues
Implement Exceptions to issues with proper messaging.

Acceptance criteria

% import pe-mailer
% pe-mailer pande_report_dir= db_creds_file= with bad references but runs to completion with no Traceback errors.
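
One possible shape for this behavior is a thin wrapper around the entry point that converts failures into log messages and exit codes. This is only a sketch: `run_cli` and its exit codes are hypothetical names chosen for illustration, not part of pe-mailer.

```python
import logging


def run_cli(main, argv=None):
    """Run main(argv) and turn failures into log lines and exit codes."""
    try:
        main(argv)
    except (FileNotFoundError, PermissionError) as err:
        # Expected problems, e.g. a bad report directory or credentials file.
        logging.error("Required file is unavailable: %s", err)
        return 1
    except Exception as err:  # last-resort guard so no traceback reaches the user
        logging.critical("Unexpected error: %s", err)
        return 2
    return 0
```

A console script could then end with `sys.exit(run_cli(main))`, so bad references produce a logged message and a non-zero exit code instead of a traceback.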

Add useful test to pe-report module, develop branch

Since you created dummy tests here, please create an issue in this repo to replace these tests with useful tests. Then add a TODO comment somewhere in this file with a brief note and a link to the issue you created.

Originally posted by @dav3r in #2 (comment)

Add Windows option to convert a .pptx file to a .pdf.

💡 Summary

Current solution uses LibreOffice to convert PowerPoint Presentation to a PDF file. This works well for both Mac OSX and Linux systems. A possible code modification could be made to include Windows operating systems.

Motivation and context

This is a critical process in producing an encrypted PDF for a customer. This would be useful for persons using a Windows operating system.

Implementation notes

To implement, write a condition that checks the system OS:
if os.name == 'posix':
    linuxStuff()
elif os.name == 'nt':
    windowsStuff()
elif os.name == 'os2': ...

On Windows, use Win32 COM rather than LibreOffice to convert the .pptx to a .pdf. The user must have PowerPoint installed.

from pathlib import Path
import win32com.client

def ppt2pdf(ppt_target_file):
    # COM expects absolute paths passed as plain strings.
    file_path = Path(ppt_target_file).resolve()
    out_file = file_path.parent / file_path.stem
    powerpoint = win32com.client.Dispatch("Powerpoint.Application")
    pdf = powerpoint.Presentations.Open(str(file_path), WithWindow=False)
    pdf.SaveAs(str(out_file), 32)  # 32 is the PowerPoint "save as PDF" format code
    pdf.Close()
    powerpoint.Quit()

ppt2pdf('')

Acceptance criteria

How do we know when this work is done?

Can be run on Windows OS.
README.md Installation instructions address the option.

Define constants for graph sizing and positioning.

💡 Summary

In pages.py, remove hard-coded values addressing graph size positioning:

x = graph position horizontal
y = graph position vertical
cx = width
cy = height

Example code:

x, y, cx, cy = Inches(8.25), Inches(4.9), Inches(4.5), Inches(2.0)
chart = slide.shapes.add_chart(
    XL_CHART_TYPE.COLUMN_STACKED_100, x, y, cx, cy, chart
).chart
Graph.chart_sm(prs, slide, chart)

Motivation and context

Additional value will be found on a complete P&E Reports build-out, as each graph's size and positioning may constitute a method or "graph type" in stylesheets.py.

Implementation notes

Issue noted - #6 (comment)
Create constants for each graph type.

Acceptance criteria

How do we know when this work is done?

No hard-coded values will be found in the page generating functions.
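
A minimal sketch of what such constants could look like, assuming one named geometry per graph type. `ChartGeometry` and `CHART_SM` are hypothetical names; the numbers reuse the values from the example code above, in the same positional order. python-pptx's `add_chart` takes position (x, y) followed by size (cx, cy), and its lengths are integers in English Metric Units (EMU).

```python
from typing import NamedTuple

EMU_PER_INCH = 914400  # python-pptx lengths are integers in EMU


class ChartGeometry(NamedTuple):
    """Position and size of one graph type, in inches."""

    left: float
    top: float
    width: float
    height: float

    def as_emu(self):
        """Return (x, y, cx, cy) in EMU, the order add_chart() expects."""
        return tuple(int(round(v * EMU_PER_INCH)) for v in self)


# Hypothetical constant for the "small chart" type used in the example above.
CHART_SM = ChartGeometry(left=8.25, top=4.9, width=4.5, height=2.0)
```

A page function could then unpack `x, y, cx, cy = CHART_SM.as_emu()` and pass the four values to `add_chart`, leaving no literal sizes in the page-generating code.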

Standardize API formula

💡 Summary

A large number of data sources are used to generate PE Reports +1.0.0. A standard code set is required to build continuity between developers when making requests.

Motivation and context

This issue is to review and discover best methods for the PE Team's API development.

Implementation notes

Please review documents in the PE Teams File repository located:

Documents/Posture and Exposure/Temporary/Proposed API Standard/ Proposed API Standardization.docx

Acceptance criteria

PE Team Sign-off.

Credential Exposure Data Schema

💡 Summary

Show database schema for credential exposure data

Motivation and context

To support database creation and use. For now, will be used solely for the rest of the team to reference.

Implementation notes

create at least one table that emulates how the data will be stored
add to the data_schema.sql file located here

Acceptance criteria

At least one table is created
Each field needed for the report is included
Table is commented out

pe-mailer module "tests"

💡 Summary

Add in "tests" for the pe-mailer module. These tests may include the following:

test_message.py
test_pandemessage.py
test_reportmessage.py
test_statsmessage.py

Motivation and context

The tests are added for quality assurance and are standard practice for cisagov repos.

Implementation notes

create appropriate scripts for pe-mailer; some derived from cyhy-mailer test
add to pe-reports/tests

Acceptance criteria

How do we know when this work is done?

Pass unittesting, pytest

Error Handling for PDF output over 20MB

🐛 Summary

Some clients can't receive files over 20 MB, and such emails bounce back. We should add functionality to our scripts to either flag PDFs that are over 20 MB or automatically zip the files once the CSV attaching and encrypting process has completed.

To reproduce

Steps to reproduce the behavior:

Run production and evaluate attachment file sizes. This is a rare occurrence, but can happen.

Expected behavior

In the pe-report scripts, flag any reports that are greater than or equal to 20MB.

Alternatively remove the "content" of the web mentions (the post/comment text) because it creates the large file sizes. The 'content' of the dark web mentions can still be included as the data is limited. Other attributes of the web mentions should not be removed (E.g. title and url)

Possible solutions:

Get rid of web mentions raw data and keep dark web.
Truncate the list.
Don't embed raw data.
Re-evaluate pe_reports code.
Flag file sizes over 20MB
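
The flagging option could be sketched as below. The function name `oversized_reports` is hypothetical, and reading "20 MB" as 20 * 1024 * 1024 bytes is an assumption; a mail gateway may measure its limit differently, and email encoding can add size on top of the file itself.

```python
from pathlib import Path

MAX_ATTACHMENT_BYTES = 20 * 1024 * 1024  # assumed meaning of "20 MB"


def oversized_reports(output_dir, limit=MAX_ATTACHMENT_BYTES):
    """Return PDFs in output_dir whose size is at or above limit, sorted by path."""
    return sorted(
        p for p in Path(output_dir).glob("*.pdf") if p.stat().st_size >= limit
    )
```

The report step could log each returned path as a warning so that oversized files are reviewed before the mailer runs.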

Any helpful log output or screenshots

Define `libreoffice_exec()` failure outcome

💡 Summary

In the function libreoffice_exec(), what happens if sys.platform == "darwin" is false?

def libreoffice_exec():
    """Call to MacOS LibreOffice App."""
    if sys.platform == "darwin":
        return "/Applications/LibreOffice.app/Contents/MacOS/soffice"

Motivation and context

The failure of this function will stop the execution of the program.

Defining the failure outcome would let the program finish with a return value of 0, and it could also make it possible to run the software on other operating systems.

Acceptance criteria

How do we know when this work is done?

Return value 0 on completion of program
Alternative functionality to launch Libreoffice regardless of OS version or environment
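
One way to define the non-macOS outcome is to search PATH and fail with an explicit error. The sketch below is an assumption about how that could look, not the project's implementation; in particular, it assumes the binary is named `soffice` or `libreoffice` on other systems.

```python
import shutil
import sys

MACOS_SOFFICE = "/Applications/LibreOffice.app/Contents/MacOS/soffice"


def libreoffice_exec(platform=sys.platform, which=shutil.which):
    """Return the LibreOffice executable path, or raise if none is found."""
    if platform == "darwin":
        return MACOS_SOFFICE
    # Assumption: elsewhere the binary is on PATH under one of these names.
    for name in ("soffice", "libreoffice"):
        found = which(name)
        if found:
            return found
    raise FileNotFoundError("LibreOffice executable not found")
```

Raising a specific exception lets the caller decide whether to exit with a clear message or fall back to another converter, rather than silently receiving `None`.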

Research usage of Docker to support P&E Report architecture.

💡 Summary

The primary goal is to build a transferable environment to run P&E Reports. The intended architecture includes a Python framework, Docker image and a Postgres database.

Motivation and context

P&E Reports requires scanning external resources for data, storing the data, exporting data/analytics to a formalized report, and then delivering the output via email to subscribed customers. A primary requirement is to create an environment that can be delivered by various 'users'. A contained solution would provide for this while enhancing the security and stability of all data processes.

Noted Resources:

Implementation notes

Discover and document probable solutions.
Build a Proof-of-Concept (POC).

Acceptance criteria

How do we know when this work is done?

Build Working POC
Validate solution meets cisagov development standards.
Create implementation plan and sprint.

Adjust table column names for proper formatting and standardized naming conventions

💡 Summary

The table column names in the report currently come from the database, where they are all lower case with underscores instead of spaces. The naming convention is based on the API or on what makes the most sense to P&E, but it may not be the best for the end customer. We need to adjust the column names for every table in the report to standardize naming and formatting and to maximize comprehension.

Motivation and context

We want the tables to be as easy to read as possible so that the data is easy for the end customer to understand.

Implementation notes

Implementation will occur in the report_metrics or pages scripts where we manipulate the dataframes for the report.
We just need to change the column names before sending the dataframe to the table builder function.
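
As a rough illustration of the renaming step, the helper below turns snake_case names into display headers. The function names and the entries in `HEADER_OVERRIDES` are hypothetical examples, not columns confirmed to exist in the P&E database.

```python
# Hypothetical overrides for names that need more than title-casing.
HEADER_OVERRIDES = {"cve_id": "CVE ID", "ip": "IP Address"}


def display_name(column):
    """Turn a snake_case database column name into a report header."""
    if column in HEADER_OVERRIDES:
        return HEADER_OVERRIDES[column]
    return " ".join(word.capitalize() for word in column.split("_") if word)


def display_names(columns):
    """Map every column name to its display form."""
    return {col: display_name(col) for col in columns}
```

With pandas, the mapping could be applied just before the table builder is called, for example `df.rename(columns=display_names(df.columns))`.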

Acceptance criteria

How do we know when this work is done?

Criterion
All tables have well formatted and easy to understand column names

Change Verbiage in PDF Report (Credential Publication and Abuse)

🐛 Summary

Verbiage used under section "Credential Publication and Abuse" on page 4 is incorrect.

Expected behavior

The verbiage needs to be changed to the following:

"Exposed credentials put systems at risk for unauthorized access and users at risk for highly effective phishing attacks."

Create PDF Reports From HTML/CSS

💡 Summary

Find a solution that leverages HTML/CSS code builds to PDF output.

Motivation and context

The current method using .pptx files and converting on Macbooks ruins the formatting elements.

Implementation notes

This new report process should be able to generate complete and accurate reports without utilizing middleware applications.

Acceptance criteria

How do we know when this work is done?

P&E full reports can be generated using HTML as the transition.
All charts and objects contained within the report are formatted and placed as expected.
Text contained within the report is free of any spelling or grammatical errors.

Remove hard-coded file locations

💡 Summary

In Pages Class remove hard-coded references to files.

Motivation and context

This would be useful because it removes any directory management issues.
Code Example:
df_customer = pd.read_csv("src/pe_reports/data/csv/dhs_customer.csv")

Implementation notes

Create a data-dictionary reference or method to resolve file paths in class.
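
A small path-resolving helper is one way to do this. The sketch assumes the CSV files stay under `data/csv` inside the package directory, as in the code example above; `data_file` is a hypothetical name.

```python
from pathlib import Path


def data_file(name, package_dir):
    """Build the path to a CSV under <package_dir>/data/csv and check it exists."""
    path = Path(package_dir) / "data" / "csv" / name
    if not path.is_file():
        raise FileNotFoundError(f"Missing data file: {path}")
    return path
```

Inside the package, `package_dir` could be `Path(__file__).resolve().parent`, so the lookup no longer depends on the directory the command is run from.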

Acceptance criteria

How do we know when this work is done?

Remove hard-coded file paths
pe-reports passes checks and run

Clean up unnecessary characters from table values

💡 Summary

A lot of the data that comes from the APIs contains unnecessary characters, i.e. various braces and brackets ({}[]) and single quotes. We need to adjust the table building function to remove those characters to maximize readability.

Motivation and context

The tables are a key part of the report, so they need to be as clean as possible

Implementation notes

The table generator function already loops through all the values, so as it loops it should identify unnecessary characters and strip them out. This shouldn't be too difficult.
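
A minimal sketch of the stripping step, assuming the unwanted characters are exactly the ones named in the summary. `clean_cell` is a hypothetical helper; note that removing every single quote also removes apostrophes inside ordinary words, which may or may not be acceptable for the report.

```python
# Characters named in the issue: braces, brackets, and single quotes.
_UNWANTED = str.maketrans("", "", "{}[]'")


def clean_cell(value):
    """Return value as a string without the unwanted characters."""
    return str(value).translate(_UNWANTED).strip()
```

The table generator could call `clean_cell` on each value inside its existing loop.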

Acceptance criteria

How do we know when this work is done?

Criterion
Table values are clean of unnecessary characters

Replace eval() functions

🐛 Summary

Python has an eval() function, which evaluates a string of Python code. The bandit check has "blacklisted" it as an insecure function.

To reproduce

Steps to reproduce the behavior:

Run branch pre-commit: https://github.com/cisagov/pe-reports/tree/AL-working-v1.0

Expected behavior

A pre-commit run --all-files should render no bandit errors.

Any helpful log output or screenshots

Example Test Error:

Test results:
>> Issue: [B307:blacklist] Use of possibly insecure function - consider using safer ast.literal_eval.
   Severity: Medium   Confidence: High
   Location: src/pe_reports/report_metrics.py:191
   More Info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_calls.html#b307-eval
191                 k = eval(row["right"])
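
Following bandit's suggestion, the call could be replaced with `ast.literal_eval`, which only accepts Python literals (strings, numbers, tuples, lists, dicts, sets, booleans, `None`) and does not execute code. The sketch below assumes `row["right"]` holds such a literal; `parse_literal` is a hypothetical wrapper name.

```python
import ast


def parse_literal(text, default=None):
    """Parse a string holding a Python literal without executing code."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Not a plain literal (for example a function call), so refuse it.
        return default
```

Line 191 could then read `k = parse_literal(row["right"])`, with the caller deciding how to handle a `None` result.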

One-off password retrieval

💡 Summary

Feature to allow setting a password manually for one-off or small group reports. This feature would only be accessible for P&E Team members.

Motivation and context

This would be useful because it will allow P&E Team members to respond in a timely manner when one-off report requests are made. It will also reduce the risk of official passwords being used outside of the context of data sharing with customers.

Implementation notes

This feature should use command line switches to control how passwords are gathered. Default can be to pull from CyHy database, however this should be avoided when doing demo or test report runs as the passwords should only be used when communicating directly with the customers.

Acceptance criteria

How do we know when this work is done?

Individual password can be set via command line switch for single report.
Individual password (same) can be set via command line switch for multiple reports.
Password list can be passed to apply to multiple reports (optional).
Proper security features are enabled and tested

DRY out this code (PE Report 0.1.1)

💡 Summary

This issue has been split between PE Report 0.1.1 "main" and PE Report 1.0.0 "develop" (Issue #18), as both need to be updated but constitute separate efforts.

There is a repeated pattern of code in pages.py, noted in the comments on bug-unusable-package, here and here. The code should be rewritten as a single, unambiguous function or class that does not repeat itself.

Please provide details for implementation, such as:

An evaluation of updated Matplotlib code to pptx file code is needed.
Remove repetitive code - standardize graphing output.

Acceptance criteria

How do we know when this work is done?

Repetition of code is reduced.

Report QA: Confirm assets via WhoIs service

💡 Summary

A secondary check/script is needed to validate data feeds which include a list of a customer's assets. A "net-range validation process" or "whois" check is needed to filter bad assets.

Motivation and context

This improves the quality of the report through validation and cuts down on any speculative data included in the reports.

Implementation notes

Please provide details for implementation, such as:

Define Solution

Acceptance criteria

How do we know when this work is done?

IP Addresses should be pulled from the CyHy database instead of LG to identify IP Addresses that are valid for the particular agency.
The team should be able to perform asset checks on a regular basis to determine if the current IP addresses on record with CyHy are valid.
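
The net-range part of such a check can be sketched with the standard `ipaddress` module. This assumes the organization's valid ranges are available as CIDR blocks, for example from the CyHy database; `split_by_net_range` is a hypothetical name, and a real "whois" lookup would need an external service that is not shown here.

```python
import ipaddress


def split_by_net_range(addresses, cidr_blocks):
    """Split addresses into (inside, outside) relative to the given CIDR blocks."""
    networks = [ipaddress.ip_network(block) for block in cidr_blocks]
    inside, outside = [], []
    for raw in addresses:
        ip = ipaddress.ip_address(raw)
        if any(ip in net for net in networks):
            inside.append(raw)
        else:
            outside.append(raw)
    return inside, outside
```

Addresses in the second list would be candidates to filter from the report or to review manually.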

Remove all 'subprocess' package incidences

🐛 Summary

Python package subprocess is flagged by bandit check as "blacklisted".

To reproduce

Steps to reproduce the behavior:

Run branch pre-commit: https://github.com/cisagov/pe-reports/tree/AL-working-v1.0

Expected behavior

A pre-commit run --all-files should render no bandit errors.

Any helpful log output or screenshots

Paste the results here:

>> Issue: [B404:blacklist] Consider possible security implications associated with subprocess module.
   Severity: Low   Confidence: High
   Location: src/pe_reports/report_generator.py:29
   More Info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_imports.html#b404-import-subprocess

29      import subprocess

Add command line functionality.

💡 Summary

Add in command line functionality that addresses cyhy database connection, data input and data output arguments.

Motivation and context

Critical feature to ensure a secure and proper distribution of P&E Reports.

Implementation notes

Intended usage should follow:

"""A tool for creating Posture & Exposure reports.

Usage:
    pe-reports REPORT_DATE DATA_DIRECTORY OUTPUT_DIRECTORY [--db-creds-file=FILENAME] [--log-level=LEVEL]

Arguments:
  REPORT_DATE                   Date of the report, format YYYY-MM-DD.
  DATA_DIRECTORY                The directory where the excel data files are located.
                                Organized by owner.
  OUTPUT_DIRECTORY              The directory where the final PDF reports should be saved.
  -c --db-creds-file=FILENAME   A YAML file containing the Cyber
                                Hygiene database credentials.
                                [default: /secrets/database_creds.yml]

Options:
  -h --help                     Show this message.
  -v --version                  Show version information.
  --log-level=LEVEL             If specified, then the log level will be set to
                                the specified value.  Valid values are "debug", "info",
                                "warning", "error", and "critical". [default: info]
"""

Acceptance criteria

How do we know when this work is done?

Successfully generate P&E Reports from command line interface.
DATA_DIRECTORY input data file locations are not hard-coded.
OUTPUT_DIRECTORY is created when scripts are run.
Proper Exceptions are raised if --db-creds-file is not available or fails.
log-level option type is added.

Replace python-pptx graphs with Matplotlib graphing libraries

💡 Summary

Due to graphing errors with python-pptx and PDF conversions on macOS, all graphing functions will be replaced with the Matplotlib package.

Motivation and context

Primarily this improvement allows for better quality control and addresses the open issues presented in #9 "Define constants for graph sizing and positioning".

Implementation notes

Please provide details for implementation, such as:

From the local P&E Teams repository take scripts from charts.py and update GitHub stylesheets.py
Adjust references in pages.py
Update setup.py with Matplotlib installation requirements

Acceptance criteria

How do we know when this work is done?

Report runs end-to-end from command-line.
Must pass Issue requirements - #18
Must pass Issue requirements - #21
Code meets CISA dev-standards and tests.

New P&E Reports Design 1.0

💡 Summary

Enhance the design and capabilities of the current P&E Report 0.1.1.

Motivation and context

Find a solution that leverages HTML/CSS code builds to PDF output.
Add additional report metrics.
Coordinate new sources for input.

Acceptance criteria

How do we know when this work is done?

Create Design in P&E Figma Tool for viewing.
P&E Team Documents Data Sourcing.
Demo Proof-of-Concept.
Deliver high-level plan to P&E Team.
Update Sprint 2

Define StatsMessage in pe-mailer module

🐛 Summary

A definition called StatsMessage is missing in email_reports.py. While the PR is in draft mode, StatsMessage has been temporarily hard-coded as a string.

line 442

if summary_to is not None and all_stats_strings:
        StatsMessage = "Needs Defined!!"
        message = StatsMessage(summary_to.split(","), all_stats_strings)
        try:

Expected behavior

StatsMessage will have a reference to src/pe_mailer/StatsMessage.py which is originally taken from cyhy-mailer.

Change Verbiage in PDF Report (Supplemental Information)

🐛 Summary

On pg 4 of the report the final section is untitled but talks about the Appendix.

Expected behavior

We should title that section "Supplemental Information" and use the second sentence from what is currently on that page.

Dark Web Database Schema

💡 Summary

Show database schema for dark web data

Motivation and context

To support database creation and use. For now, will be used solely for the rest of the team to reference.

Implementation notes

create at least one table that emulates how the data will be stored
add to the data_schema.sql file located here

Acceptance criteria

At least one table is created
Each field needed for the report is included
Table is commented out

Update cisa/pe-reports with latest working version.

💡 Summary

Update the cisa/pe-reports repo with current/active scripts used by the P&E Team to run P&E Reports.

Motivation and context

Currently the P&E Team has working scripts that produce the bi-weekly P&E Cyber Hygiene Reports. The code improvement provides full report output, which is to be merged into the dev-standards architecture addressed in #6. Matplotlib graphing tools have been added for better visualizations.

Secondary end-to-end process improvements:

Implementation notes

Branch from - #6.
"Draft" PR "(initials)-wip-pe-reports-v1".
Update code to cisa dev-standards = https://github.com/cisagov/skeleton-python-library.
Merge to develop.

Acceptance criteria

How do we know when this work is done?

Report runs end-to-end from command-line.
csv data embeds in deliverable PDFs.
All linked issues are addressed.
Final PDFs are encrypted.
Code meets cisa dev-standards and tests.
Archive 'Version'.

Domain Masquerading Database Schema

💡 Summary

Show database schema for domain masquerading data

Motivation and context

To support database creation and use. For now, will be used solely for the rest of the team to reference.

Implementation notes

create at least one table that emulates how the data will be stored
add to the data_schema.sql file located here

Acceptance criteria

At least one table is created
Each field needed for the report is included
Table is commented out

Package is not Usable

🐛 Summary

As of the merge of #2 this project is unusable. Trying to perform import pe_reports results in:

>>> import pe_reports
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mcdonnnj/workspace/cisa_repos/pe-reports/src/pe_reports/__init__.py", line 7, in <module>
    from _version import __version__  # noqa: F401
ModuleNotFoundError: No module named '_version'

To reproduce

Steps to reproduce the behavior:

Clone the repo
Set up Python environment
Install package
Perform an import pe_reports either in a script or the Python interactive shell

Expected behavior

An import pe_reports call is performed with no errors.

Suggested resolution

I performed a review of #2 which can be found here #2 (review). The issues mentioned should be resolved. In addition a bare minimum of testing must be implemented along the lines of this section of the tests in cisagov/skeleton-python-library. This will guarantee that before any merges the changes clear both linters (through pre-commit) and enough testing to show that it is a bare-minimum functional Python package.

Improve use of schema to validate arguments

💡 Summary

Noted in 'bug-unusable-package', validate the arguments by using python package schema. REPORT_DATE and DATA_DIRECTORY should exist at a minimum.

Motivation and context

Motivation is to improve quality of code.

Implementation notes

Please provide details for implementation, such as:

https://pypi.org/project/schema/

Acceptance criteria

How do we know when this work is done?

schema.is_valid(data) will return True

Add P&E Mailer Automation - Module

💡 Summary

To further automate the delivery of P&E Reports to customers, a module/program will be implemented to retrieve pe_report output and deliver it to the appropriate customer's email. Ideally this would be an in-line process that can be turned on/off while running the pe_reports module.

Motivation and context

This would fully automate the P&E Reports delivery process. Secondarily, it allows for scalability and acquiring more customers at a faster pace. With proper tests, it improves quality control.

Implementation notes

Proposed Architecture

├── src
│   ├── pe_reports
│   │   ├── __init__.py
│   │   ├── __main__.py
│   │   ├── _version.py
│   │   ├── data
│   │   ├── file_1.py
│   │   ├── file_2.py
│   │   └── report_generator.py
│   └── pe_mailer
│       ├── __init__.py
│       ├── __main__.py
│       ├── _version.py
│       ├── data
│       ├── file_1.py
│       ├── file_2.py
│       └── mailer.py
├── tests
│   ├── conftest.py
│   ├── test_pe_reports.py
│   └── test_pe_mailer.py
├── setup.py
└── docker-compose/yml????

Elegantly include a secondary command line process for mailer usage.
Update setup.py to handle multiple entry_points={'console_scripts': [ ]}
Update dependencies for a Python import pe_mailer

Acceptance criteria

How do we know when this work is done?

Command Line Functionality - Matplotlib Default Log Level

💡 Summary

Default Log Level used by matplotlib is too verbose.
usage:
pe-reports REPORT_DATE DATA_DIRECTORY OUTPUT_DIRECTORY [--db-creds-file=FILENAME] [--log-level=LEVEL]

Implementation notes

Please provide details for implementation, such as:

To avoid Matplotlib INFO logs, which can be extensive, set --log-level=warning or anything other than "info" by default.
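
An alternative that keeps "info" as the default is to raise the level of only the `matplotlib` logger. The sketch below uses the standard `logging` module; `configure_logging` is a hypothetical function name and is not taken from the pe-reports code.

```python
import logging


def configure_logging(level="info"):
    """Apply the CLI log level but keep matplotlib at WARNING or above."""
    numeric = getattr(logging, level.upper())
    logging.basicConfig(level=numeric)
    # Matplotlib logs under the "matplotlib" logger name; quiet only that one.
    logging.getLogger("matplotlib").setLevel(max(numeric, logging.WARNING))
```

With this approach, `--log-level=info` still shows the tool's own INFO messages while Matplotlib's INFO and DEBUG output is suppressed.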

Acceptance criteria

How do we know when this work is done?

Test test_log_levels(level) in test_pe_reports.py passes.

Update README.md (develop)

💡 Summary

Update README.md content in association to repo changes which may include the following items:

Removal of LibreOffice dependencies
Update to pe-reports cmd-line usage
Updates addressing the pe-mailer module
Updates to install process
Description on report generation and delivery processes.
Description of PE Report metrics.

Motivation and context

Necessary to provide appropriate instruction for using repository resources and defining its purpose.

Acceptance criteria

P&E Team Sign-Off (all members)
Commit README.md file in pull request

Suspected Vulnerabilities Database Schema

💡 Summary

Show database schema for suspected vulnerabilities data

Motivation and context

To support database creation and use. For now, will be used solely for the rest of the team to reference.

Implementation notes

create at least one table that emulates how the data will be stored
add to the data_schema.sql file located here

Acceptance criteria

At least one table is created
Each field needed for the report is included
Tables are commented out

Connect cyhy customer db to encrypt and embed pdf reports.

💡 Summary

Update the "report_generator.py" scripts to include a function "generate_reports" that coordinates the P&E Report process.

Motivation and context

This is a key feature to securely deliver encrypted reports to Posture and Exposure customers.

Implementation notes

Please provide details for implementation, such as:

Create a function "generate_reports" that does the following:
Connects to the cyhy database.
Gathers customer information based on customer ID.
Builds the reports.
Embeds raw data into the PDF output.
Encrypts the PDFs.
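The steps listed above could be coordinated roughly as follows. This is only a sketch of the ordering: every callable passed in (`connect`, `fetch_customer`, `build`, `embed`, `encrypt`) is a hypothetical stand-in for project code that is not shown in this issue, and the real function signature may differ.

```python
def generate_reports(customer_ids, connect, fetch_customer, build, embed, encrypt):
    """Run the report steps in order for each customer and return the outputs.

    Every callable argument is a hypothetical stand-in for project code.
    """
    db = connect()  # 1. connect to the cyhy database
    outputs = []
    for customer_id in customer_ids:
        customer = fetch_customer(db, customer_id)  # 2. gather customer info
        pdf = build(customer)                       # 3. build the report
        pdf = embed(pdf, customer)                  # 4. embed raw data
        outputs.append(encrypt(pdf, customer))      # 5. encrypt the PDF
    return outputs
```

Passing the steps in as arguments keeps the coordinator testable without a live database, which may help with the "Produce P&E Report with no errors" criterion.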

Acceptance criteria

How do we know when this work is done?

Produce P&E Report with no errors.
Output is encrypted.
Validate report data.
Review and ensure reports can be sent using current mailer scripts.

Primary Key integer or UUID

💡 Summary

Should the db schema implement int or UUID as the primary key in a table?

Motivation and context

The relationships may grow in complexity, and the DB may eventually be scaled out to a data warehouse.

With UUID primary keys, the chance of orphaned data relationships is diminished when scaling out the data.

Implementation notes

Please provide details for implementation, such as:

Change the primary key type from integer to UUID
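A minimal sketch of a UUID-keyed table, assuming nothing about the real schema. The table and column names (`organizations`, `organizations_uid`, `name`) are hypothetical, and SQLite is used only so the example runs without a database server. SQLite has no native UUID type, so the key is stored as text here; a database with a native UUID column type would declare the column differently.

```python
import sqlite3
import uuid


def insert_organization(conn, name):
    """Insert a row keyed by a random (version 4) UUID and return the key."""
    org_uid = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO organizations (organizations_uid, name) VALUES (?, ?)",
        (org_uid, name),
    )
    return org_uid


conn = sqlite3.connect(":memory:")
# Hypothetical table: the primary key is a UUID string instead of an integer.
conn.execute(
    "CREATE TABLE organizations ("
    "organizations_uid TEXT PRIMARY KEY, name TEXT NOT NULL)"
)
key = insert_organization(conn, "Example Org")
```

Because the key is generated by the client rather than by a per-table counter, rows created in separate databases can later be merged without key collisions.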

Acceptance criteria

How do we know when this work is done?

primary key type changed from int to UUID

Reestablish args for pande_dir and db_creds_file in pe-mailer module

💡 Summary

Currently the arguments pande_dir and db_creds_file in the pe-mailer module are hard-coded, solely so that the draft PR passes its checks.

Motivation and context

This would be useful because it is the primary command line used to invoke the pe-mailer module.

Implementation notes

Change the pande_dir and db_creds_file references so that they point to the arguments parsed in main().
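One way this could look is sketched below. The option names are taken from the pe-mailer usage text in the README; `argparse` is swapped in here only because it is in the standard library, so this is not necessarily the parser the module uses. The help strings and the function layout are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse pe-mailer style options instead of hard-coding them."""
    parser = argparse.ArgumentParser(prog="pe-mailer")
    parser.add_argument(
        "-p", "--pe-report-dir", metavar="DIRECTORY",
        help="Directory containing the pe-reports output.",
    )
    parser.add_argument(
        "-c", "--db-creds-file", metavar="FILENAME",
        help="YAML file containing database credentials (assumed wording).",
    )
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # The parsed values replace the hard-coded pande_dir and db_creds_file.
    pande_dir = args.pe_report_dir
    db_creds_file = args.db_creds_file
    return pande_dir, db_creds_file
```

Calling `main(["-p", "reports", "-c", "creds.yml"])` returns the two values supplied on the command line; omitted options come back as `None`, so real code would still need to decide on defaults.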

Acceptance criteria

Usage should be: % pe-mailer pande_report_dir= db_creds_file= summary_to=None test_emails= debug=None

Version control management

💡 Summary

P&E is currently looking to incorporate version control as per cisagov dev-standards. Two report types are currently in development: one will eventually merge into the develop branch and the other will be archived as a "Release".

PE Report 0.1.1: This version is an active report delivery system for P&E. The current objective is to bring the code up to dev-standards, add it to the cisagov/pe-reports repo, and then archive it as a pe-report "Release". To manage and access this version, a branch pe-reports 0.1.1 will be created and used by any PE team member for commits/pull requests pertaining to this version.
PE Report 1.0.0: This version is currently in development and introduces a whole new graphing system. The current objective is to continue development with the intention of merging into the main branch develop, where it will act as the current P&E report moving forward and be archived accordingly as updates occur.

Motivation and context

The priority is to be compliant with dev-standards and to update the cisagov/pe-reports repo to manage all code deemed important to P&E reporting capabilities.

This would be useful because it allows for improved source control for the P&E team.

Implementation notes

PE Report 0.1.1

Create branch pe-reports 0.1.1
P&E team members will create individual branches per issue/commits
Pull Request will be set to "Draft Mode" as it is expected early PRs may not pass all checks.
Once 'pe-reports 0.1.1' is compliant and complete, create a Release

PE Report 1.0.0

P&E team members will create individual branches per issue/commits
PR will follow cisagov dev-standards

Acceptance criteria

How do we know when this work is done?

Create branch 'pe-reports 0.1.1'
'pe-reports 0.1.1' is set as Release

"install_requires" package to create a CyHy MongoDB connection

🐛 Summary

The PE Reports process, and eventually the PE Mailer process, rely on the mongo-db-from-config package. The current implementation in setup.py in branch ss-working-tests adds this requirement, yet it fails on install. The following, taken from the cyhy-mailer repo, was also attempted:
"mongo-db-from-config @ http://github.com/cisagov/mongo-db-from-config/tarball/develop#egg=mongo-db-from-config"

Question: Is there additional code or a method that needs to be included, or is this now a step to add to the install instructions?

To reproduce

Steps to reproduce the behavior:

pe-reports % python3 setup.py install
Then this:
ModuleNotFoundError: No module named 'mongo_db_from_config'

Expected behavior

What did you expect to happen that didn't?

The mongo-db-from-config package installs on setup.

Domain Masquerading Page in PDF Report (Incorrect Masquerading ID)

🐛 Summary

On NASA report, Domain Masquerading incorrectly identifies NSA.gov as a spoofed domain by 'omission'.

Expected behavior

NSA and other government agencies should be excluded from being identified as spoofing each other.
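A minimal sketch of the expected behavior, under a loud assumption: it treats any domain ending in `.gov` as a legitimate government domain and drops it from the candidate list. The issue does not say how government domains should be recognized, so the suffix rule, the function name, and the parameter names are all hypothetical.

```python
def filter_masquerading_candidates(candidates, excluded_suffixes=(".gov",)):
    """Drop candidate domains that end with an excluded suffix."""
    kept = []
    for domain in candidates:
        # Normalize case, whitespace, and a trailing root dot before matching.
        normalized = domain.strip().lower().rstrip(".")
        if normalized.endswith(tuple(excluded_suffixes)):
            continue  # e.g. another agency's own .gov domain
        kept.append(domain)
    return kept
```

Note that a name such as nasa.gov.example.com is kept as a candidate, because it only contains ".gov" and does not end with it.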

Any helpful log output or screenshots

Command Line Functionality Documentation

💡 Summary

Update the GitHub SOP and associated documentation with the updated functionality.

Acceptance criteria

How do we know when this work is done?

Update Teams Documentation - P&E GitHub SOP

This is related to #8

mypy hook: PyYAML not supported with Python 3.9

🐛 Summary

During a run of pre-commit run --all-files, the mypy check issues the following warning: "PyYAML not supported with Python 3.9"

To reproduce

Steps to reproduce the behavior:

Have import yaml in scripts. PyPi Reference
pre-commit run --all-files

Expected behavior

pre-commit would pass after implementing the following parameter to mypy hook - additional_dependencies: [types-all]
Reference: https://github.com/asottile/types-all

Dynamic Appendices To Add Descriptive Details

💡 Summary

Our report needs to have context when displaying source information such as Forum Name or Breach Names.

Motivation and context

This is best done in an appendix as the tables used in the report are too small to give enough detail about a given item.

Implementation notes

The appendices should be dictionary style in nature. This allows our users to quickly identify the information and read a short 1 - 2 sentence summary.

Acceptance criteria

How do we know when this work is done?

Breach Description Appendix
Dark Web Site Appendix
Dark Web/Deep Web Site Appendix
Appendices should be dynamic and only display information contained in the report.
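The "dictionary style" and "dynamic" requirements can be sketched as a lookup that keeps only the entries a given report actually mentions. The function name and the shape of the inputs are assumptions made for illustration, not the project's data model.

```python
def build_appendix(descriptions, names_in_report):
    """Return sorted (name, description) pairs for names used in the report.

    descriptions: dict mapping a source name (forum, breach, site) to a
                  short one- or two-sentence summary.
    names_in_report: iterable of source names that appear in the report.
    """
    used = set(names_in_report)  # duplicates in the report collapse to one entry
    return [
        (name, descriptions[name])
        for name in sorted(used)
        if name in descriptions  # names without a description are skipped
    ]
```

Sorting by name gives the alphabetical, dictionary-like ordering that lets a reader find an entry quickly. Names that appear in the report but have no description are skipped here; real code might prefer to log them instead.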

Add Mac PowerPoint option to convert a .pptx file to a .pdf.

💡 Summary

Current solution uses LibreOffice to convert PowerPoint Presentation to a PDF file. This works for both Mac OSX and Linux systems but causes formatting issues with graphs. A possible code modification could be made to include scripts to invoke an AppleScript function that uses the PowerPoint app natively to convert pptx files to pdfs, thus retaining all graphing elements available from MS PowerPoint.

Motivation and context

This process is a key element of the P&E Reports and improves both the quality and the scalability of the product. This additional function eliminates a complicated dependency: the LibreOffice installation.

Implementation notes

Please provide details for implementation, such as:

Write in a condition to check the OS:

import os

if os.name == 'posix':
    linuxStuff()
elif os.name == 'nt':
    windowsStuff()
elif os.name == 'os2': ...

Create a Python subprocess to call the AppleScript function.
AppleScript function will look like/include:

on run {input, parameters}
	set theOutput to {}
	tell application "Microsoft PowerPoint"
		launch
		set theDial to start up dialog
		set start up dialog to false
		repeat with i in input
			open i
			set pdfPath to my makeNewPath(i)
			save active presentation in pdfPath as save as PDF
			close active presentation saving no
			set end of theOutput to pdfPath as alias
		end repeat
		set start up dialog to theDial
	end tell
	return theOutput
end run

on makeNewPath(f)
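A sketch of how the OS check and the subprocess call might fit together. Because `os.name` is "posix" on both Linux and macOS, this version uses `platform.system()`, which returns "Darwin" on macOS. The function names and the script path argument are hypothetical, and the AppleScript above may need its `on run` handler adapted to accept command-line arguments when started through `osascript`.

```python
import platform
import subprocess


def pdf_converter_for(system=None):
    """Pick a converter name based on the operating system."""
    system = system or platform.system()
    # platform.system() distinguishes macOS ("Darwin") from Linux,
    # which os.name alone cannot do.
    if system == "Darwin":
        return "powerpoint"
    return "libreoffice"


def osascript_command(script_path, pptx_path):
    """Build the argument list that runs an AppleScript file via osascript."""
    return ["osascript", script_path, pptx_path]


def convert_with_powerpoint(script_path, pptx_path):
    """Run the AppleScript; raises CalledProcessError if osascript fails."""
    return subprocess.run(
        osascript_command(script_path, pptx_path),
        check=True,
        capture_output=True,
        text=True,
    )
```

Keeping the command construction separate from the call that executes it lets the OS branch be tested on any platform, without PowerPoint installed.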

Acceptance criteria

How do we know when this work is done?

Run script in cisa/pe-reports with no errors.
Run PowerPoint Application processing in background.
Write in condition to look at OS
All graphing issues resolved.

Define variable for empty iterations

💡 Summary

In reference to #6 (comment), all report objects need to be defined as a variable in the event an iteration renders empty. This primarily applies in the pptx object 'shape' from 'shapes' and will need to be adjusted accordingly when applying the matplotlib libraries. Issue -#14

Motivation and context

Considered a quality improvement to code.

Implementation notes

Please provide details for implementation, such as:

From python-pptx, the scope of 'shape':

from pptx import Presentation

SLD_LAYOUT_TITLE_AND_CONTENT = 1

prs = Presentation()
slide_layout = prs.slide_layouts[SLD_LAYOUT_TITLE_AND_CONTENT]
slide = prs.slides.add_slide(slide_layout)
shapes = slide.shapes

Example stylesheets.py: Declare a shape variable in the event slide.shapes is empty.

    @staticmethod
    def shapes(slide):
        """Create a text frame."""
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue
        return shape
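As written, the example above raises an error at `return shape` when `slide.shapes` is empty, because the loop variable is never bound. A minimal sketch of the fix follows. It assumes the intent is to return the last shape that has a text frame, which the issue does not state, and `last_text_shape` is a hypothetical name. `SimpleNamespace` objects stand in for python-pptx slides so the example runs without the library.

```python
from types import SimpleNamespace


def last_text_shape(slide):
    """Return the last shape with a text frame, or None if there is none."""
    found = None  # defined up front so an empty slide.shapes cannot raise
    for shape in slide.shapes:
        if shape.has_text_frame:
            found = shape
    return found


empty_slide = SimpleNamespace(shapes=[])  # stand-in for a python-pptx slide
print(last_text_shape(empty_slide))  # prints None
```

Callers then need to handle the `None` case explicitly, which is the "return value on empty iteration" the acceptance criteria ask for.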

Acceptance criteria

How do we know when this work is done?

Return value shape on empty iteration

DRY out this code

💡 Summary

There is a repeated pattern of code in pages.py, noted in the comments bug-unusable-package, here and here. The code should be rewritten as a single, unambiguous function or class so that the pattern is not repeated.

Motivation and context

Because the graphing libraries change in the next planned merge, this issue will be applied generically to the updates being made in Issue #14, "Replace python-pptx graphs with Matplotlib", and will be set as an "Acceptance criteria" item there.

Implementation notes

An evaluation of updated Matplotlib code to pptx file code is needed.

Acceptance criteria

How do we know when this work is done?

Repetition of code is reduced.

Update README.md (main)

💡 Summary

Update README.md content in association to repo changes which may include the following items:

Update to pe-reports cmd-line usage
Updates to install process
Description on report generation and delivery processes.
Description of PE Report metrics.

Motivation and context

Necessary to provide appropriate instruction for using repository resources and defining its purpose.

Implementation notes

Please provide details for implementation, such as:

an example for how this would be used
what this would look like
how this would act
any related work, including links to related issues

Acceptance criteria

How do we know when this work is done?

P&E Team Sign-Off (all members)
Commit README.md file in pull request

Add New Team Member: cduhn17

💡 Summary

Craig Dunn is a new member of the P&E Team and will be an owner of https://github.com/cisagov/pe-reports

Motivation and context

Craig Dunn's GitHub profile = cduhn17

Implementation notes

Coordinate with Fusion Dev to add Craig's GitHub profile to 'cisagov' organization.
Add Craig's GitHub profile as owner to cisa/pe-reports
update https://github.com/cisagov/pe-reports/blob/develop/.github/CODEOWNERS

# These owners will be the default owners for everything in the
# repo. Unless a later match takes precedence, these owners will be
# requested for review when someone opens a pull request.
* @aloftus23 @cduhn17 @dav3r @DJensen94 @elr64 @felddy @jsf9k @mcdonnnj @schmelz-ctr @stewartl97

Acceptance criteria

How do we know when this work is done?

Craig is successfully added to https://github.com/cisagov/pe-reports
CODEOWNERS is updated.

Re-review of unresolved conversations in PR#33

Thank you for your work on this PR. I had some suggestions and some questions about fail states I would like explained a bit more.

Originally posted by @mcdonnnj in #33 (review)

With multiple approvals from the team we are looking to move forward with a merge. The PR is a blocker for other Issues pending that will clean-up branch "main".

To continue, we are creating this issue in the event we need to revisit some of the unresolved conversations.