Text analysis of the r/wallstreetbets subreddit to predict how popular a post will be from its text.
Final project for NYU's Messy Data and Machine Learning class. Contains explicit language.
.
├── analyses # Feature engineering, model fitting, and performance estimates
│ └── plots # Plots
├── data # Cleaned data and cleaning scripts
├── inputs # Raw input data and scraping scripts
├── material # Class material (proposal and paper)
└── README.md
To reproduce, run the scripts in the following order:
- inputs/scrape_WBS.R
- data/cleaning.R
- Features:
  - analyses/topic_modeling.R
  - analyses/sentiment_scoring.py
  - analyses/GME_price.R
  - analyses/comment_hierarchy.R
  - analyses/feature_engineering.R
- analyses/feature_selection.R
- analyses/create_train_test_split.R
- analyses/model_upvotes.R
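The run order above can be scripted so the steps are not launched by hand. The sketch below is a hypothetical `run_pipeline.py` that is not part of this repository. It assumes that `Rscript` is on the PATH, that the current Python interpreter can run `sentiment_scoring.py`, and that every script is launched from the repository root. The README does not state any of these assumptions, so check them against the scripts before relying on this.

```python
import subprocess
import sys
from pathlib import Path

# Script order copied from this README. The "Features" scripts are run in
# the listed order, which is assumed (not confirmed) to matter.
PIPELINE = [
    "inputs/scrape_WBS.R",
    "data/cleaning.R",
    "analyses/topic_modeling.R",
    "analyses/sentiment_scoring.py",
    "analyses/GME_price.R",
    "analyses/comment_hierarchy.R",
    "analyses/feature_engineering.R",
    "analyses/feature_selection.R",
    "analyses/create_train_test_split.R",
    "analyses/model_upvotes.R",
]


def command_for(script: str) -> list[str]:
    """Choose an interpreter from the file extension (assumes Rscript is on PATH)."""
    suffix = Path(script).suffix
    if suffix == ".R":
        return ["Rscript", script]
    if suffix == ".py":
        return [sys.executable, script]
    raise ValueError(f"no interpreter known for {script}")


def run_pipeline(repo_root: str = ".", dry_run: bool = True) -> list[list[str]]:
    """Print each command in order; unless dry_run, execute it and stop on the first failure."""
    commands = [command_for(script) for script in PIPELINE]
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a script exits non-zero.
            subprocess.run(cmd, cwd=repo_root, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical flag: only "--run" actually executes the scripts.
    run_pipeline(dry_run="--run" not in sys.argv)
```

Running `python run_pipeline.py` only prints the planned commands. Running `python run_pipeline.py --run` executes them and halts at the first script that exits with an error, so later steps never run on stale or missing outputs from an earlier one.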