This project compares two popular classification algorithms for sentiment analysis of political text.
https://www.notion.so/9e0455381c4c499597cacc7979842f6a?v=67e699c48bac4d7e81796e9d019c0b9a
- Find 50 articles from Breitbart & The Guardian on the topic of Trump (~50% positive, ~50% negative)
- Split training corpus into pos and neg files
- Lemmatize words
- Remove stop words
- Convert text to lowercase
- Tokenize and remove punctuation (https://stackoverflow.com/questions/15547409/how-to-get-rid-of-punctuation-using-nltk-tokenizer)
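The "split training corpus into pos and neg files" step could look roughly like the sketch below. It assumes the hand-labelled articles are already in memory as `(text, label)` pairs with labels `"pos"` or `"neg"`, and writes one article per line. The function name `split_corpus` and the file names `pos.txt` / `neg.txt` are hypothetical choices for illustration, not part of the project.

```python
import tempfile
from pathlib import Path


def split_corpus(labelled_docs, out_dir):
    """Write each (text, label) pair to pos.txt or neg.txt in out_dir.

    Hypothetical helper: labels must be "pos" or "neg". Each article is
    written on a single line, with internal newlines collapsed to spaces.
    Returns the number of articles written per label.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    counts = {"pos": 0, "neg": 0}
    with open(out / "pos.txt", "w", encoding="utf-8") as pos_f, \
         open(out / "neg.txt", "w", encoding="utf-8") as neg_f:
        files = {"pos": pos_f, "neg": neg_f}
        for text, label in labelled_docs:
            if label not in files:
                raise ValueError(f"unknown label: {label!r}")
            # One article per line: collapse all whitespace runs to a single space.
            files[label].write(" ".join(text.split()) + "\n")
            counts[label] += 1
    return counts


if __name__ == "__main__":
    docs = [("Great rally\ntoday.", "pos"), ("A disastrous policy.", "neg")]
    with tempfile.TemporaryDirectory() as d:
        print(split_corpus(docs, d))
        print((Path(d) / "pos.txt").read_text(encoding="utf-8"))
```

The returned counts also give a quick check that the corpus is close to the intended ~50/50 split.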
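The preprocessing steps above can be chained into a single function. The list names the steps but not their order, so the sketch below assumes one plausible order: lowercase, tokenize while discarding punctuation, remove stop words, then lemmatize. It uses only the standard library. The regex tokenizer, the tiny `STOP_WORDS` set and the function name `preprocess` are hypothetical stand-ins. Lemmatization is a pluggable callable that defaults to doing nothing, because a real lemmatizer needs an external library.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real run would use a fuller list (e.g. NLTK's English stop words).
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "on", "the", "to", "was", "were"}

# A token is a run of lowercase letters, digits or apostrophes; everything
# else (punctuation, whitespace) acts as a separator and is discarded.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def preprocess(text, lemmatize=lambda tok: tok, stop_words=STOP_WORDS):
    """Return a list of cleaned tokens for one document (sketch)."""
    # 1. Lowercase, and normalise curly apostrophes so "Trump’s" tokenizes like "Trump's".
    text = text.lower().replace("\u2019", "'")
    # 2. Tokenize; punctuation never matches TOKEN_RE, so it is dropped here.
    tokens = TOKEN_RE.findall(text)
    # Strip apostrophes left over from quotation marks, e.g. "'quoted'" -> "quoted",
    # and drop tokens that were nothing but apostrophes.
    tokens = [t.strip("'") for t in tokens]
    tokens = [t for t in tokens if t]
    # 3. Remove stop words before lemmatizing, so the lemmatizer only sees content words.
    tokens = [t for t in tokens if t not in stop_words]
    # 4. Lemmatize with whatever callable was supplied (identity by default).
    return [lemmatize(t) for t in tokens]


if __name__ == "__main__":
    print(preprocess("The Guardian’s editorial was harsh on Trump!"))
```

To get real lemmatization, one option is NLTK's WordNet lemmatizer, e.g. `preprocess(text, lemmatize=WordNetLemmatizer().lemmatize)` after `from nltk.stem import WordNetLemmatizer` and downloading the `wordnet` data. By default it treats every word as a noun, so verbs may not be reduced unless a part-of-speech tag is passed.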