CCDB Data Pipeline
A lightweight ETL data pipeline intended to support the operations of the Consumer Complaint Search application.
Description: The purpose of this code is to provide data for Consumer Complaint Search. The pipeline downloads scrubbed consumer complaint data and indexes it in Elasticsearch for the Complaint Search application to display and analyze.
Status: In Production
Dependencies
This pipeline indexes data in Elasticsearch and requires an Elasticsearch instance to connect to.
Installation
Detailed instructions on how to install, configure, and get the project running are in the INSTALL document.
Usage
source ./activate-virtualenv.sh
- Set environment variables
export AWS_ACCESS_KEY_ID=<svc_account_access_key>
export AWS_SECRET_ACCESS_KEY=<svc_account_secret_access_key>
export ES_USERNAME=<foo>
export ES_PASSWORD=<bar>
export ENV=[ENVIRONMENT]
- where ENVIRONMENT is one of dev, staging, prod
export INPUT_S3_BUCKET=<bucket-name>
export INPUT_S3_KEY=<path-to-csv>
export OUTPUT_S3_BUCKET=<bucket-name>
export OUTPUT_S3_FOLDER=<path-to-csv-and-json>
make
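Because the run depends on every variable above being set, a pre-flight check before `make` can catch a missing value early. The following is a minimal sketch and not part of the pipeline itself: the function names `missing_vars` and `valid_env` and the `REQUIRED_VARS` list are hypothetical helpers, and the sketch assumes all nine variables listed above are required.

```shell
# Hypothetical pre-flight helpers; not shipped with the pipeline.
# The variable names come from the export list in this section.
REQUIRED_VARS="AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY ES_USERNAME ES_PASSWORD ENV INPUT_S3_BUCKET INPUT_S3_KEY OUTPUT_S3_BUCKET OUTPUT_S3_FOLDER"

# Print the name of each required variable that is unset or empty.
# Returns non-zero if at least one is missing.
missing_vars() {
  status=0
  for name in $REQUIRED_VARS; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

# Succeed only when ENV is one of the documented environments.
valid_env() {
  case "${ENV:-}" in
    dev|staging|prod) return 0 ;;
    *) return 1 ;;
  esac
}
```

With these functions sourced into the current shell, `missing_vars && valid_env && make` runs the pipeline only when the checks pass.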
Getting help
Instruct users how to get help with this software; this might include links to an issue tracker, wiki, mailing list, etc.