This repo contains the code to run the sentiment analysis pipeline in three variants:
- sentiment_pytorch.py - PyTorch code using DataParallel
- sentiment_ddp.py - PyTorch code using DistributedDataParallel (DDP)
- sentiment.py - TensorFlow multi-GPU code
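The two PyTorch variants split the work across GPUs differently. The stdlib-only sketch below is a conceptual illustration, not code from this repo; `dp_chunks` and `ddp_shard` are hypothetical names. `dp_chunks` approximates how DataParallel, inside a single process, cuts each batch into contiguous chunks, one per GPU. `ddp_shard` approximates the per-process shard that DDP code typically gets from `torch.utils.data.DistributedSampler` (assuming `shuffle=False`, `drop_last=False`), where each process (rank) reads a strided, evenly padded slice of the whole dataset.

```python
import math
from typing import List, Sequence


def dp_chunks(batch: Sequence[int], num_devices: int) -> List[List[int]]:
    """DataParallel-style split (approximation): one process cuts a batch
    into contiguous chunks, one per GPU, with the chunk size rounded up
    as torch.chunk does, so the last chunk may be smaller."""
    size = max(1, math.ceil(len(batch) / num_devices))
    return [list(batch[i:i + size]) for i in range(0, len(batch), size)]


def ddp_shard(indices: Sequence[int], num_replicas: int, rank: int) -> List[int]:
    """DDP-style split (approximation of DistributedSampler without
    shuffling): the index list is padded by wrapping around until every
    rank gets the same number of samples, then rank r takes every
    num_replicas-th index starting at position r."""
    total_size = math.ceil(len(indices) / num_replicas) * num_replicas
    padded = list(indices)
    while len(padded) < total_size:
        # Repeat samples from the start so the shards are equal in size.
        padded += list(indices)[: total_size - len(padded)]
    return padded[rank:total_size:num_replicas]
```

For example, with 10 samples and 4 GPUs, `dp_chunks(list(range(10)), 4)` gives `[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]`, while `ddp_shard(list(range(10)), 4, 2)` gives rank 2 the indices `[2, 6, 0]`, where index 0 is repeated as padding.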
The pipeline takes as input the sentiment/ folder, which holds the text data as Parquet files. The processed files and the results files are written to the folders shown below:
- sentiment/
  - *.parquet ...
- sentimentres/
  - processed/
  - results/
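A minimal sketch of preparing this layout from Python, assuming sentiment/ and sentimentres/ sit under a common root directory. `prepare_dirs` is a hypothetical helper for illustration, not part of the repo's scripts: it collects the input Parquet files and creates the two output folders if they are missing.

```python
from pathlib import Path
from typing import List, Tuple


def prepare_dirs(root: Path) -> Tuple[List[Path], Path, Path]:
    """Return the sorted input Parquet files plus the processed/ and
    results/ output folders, creating the output folders if needed.

    Assumption: sentiment/ and sentimentres/ are siblings under `root`.
    """
    inputs = sorted((root / "sentiment").glob("*.parquet"))
    processed = root / "sentimentres" / "processed"
    results = root / "sentimentres" / "results"
    for folder in (processed, results):
        # parents=True also creates sentimentres/; exist_ok avoids errors on reruns.
        folder.mkdir(parents=True, exist_ok=True)
    return inputs, processed, results
```

For example, calling `prepare_dirs(Path("."))` from the repo root returns the input files in name order together with the two output folders. Non-Parquet files in sentiment/ are ignored by the glob pattern.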
The files can be copied from and to the Wasabi S3 buckets with the following scripts:
- copy_files_from_wasabi.sh - copies the files from Wasabi
- copy_files_to_wasabi.sh - copies the files to Wasabi
Use the file env_minimal.yml to set up a Python environment.