Natural Language Processing Project, Warsaw University of Technology
Project introduction:
Our idea is to create a machine learning model that classifies online hate speech: specifically, it will detect transphobic statements on the social media platform Twitter. We plan to use the Twitter API to collect a data set of English-language tweets, which will then be used to train and test the model. Once we have our classifier, we will use it to study the prevalence of transphobia on English-language Twitter. Our findings will be described in a paper, which will include a walkthrough of our method, a statistical analysis of the results obtained by the classifier, and our answers to the following questions:
How prevalent is transphobia on Twitter?
What are the most common ways in which people express transphobia?
In what context does transphobic speech usually occur?
What locations are transphobic tweets most often sent from?
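As an illustration of the kind of statistical analysis the first question calls for, here is a minimal sketch of how prevalence could be estimated from classifier output. It is not the project's actual code: the function name prevalence_with_interval and the input format (one boolean per tweet, True meaning the classifier flagged it) are assumptions made for this example, and the Wilson score interval is just one reasonable choice of confidence interval.

```python
import math


def prevalence_with_interval(labels, z=1.96):
    """Estimate the share of flagged tweets with a Wilson score interval.

    `labels` is a hypothetical list of booleans, one per tweet, where True
    means the classifier flagged the tweet. `z` is the normal quantile
    (1.96 corresponds to roughly 95% confidence).
    Returns (proportion, lower_bound, upper_bound).
    """
    n = len(labels)
    if n == 0:
        raise ValueError("no labels given")

    positives = sum(1 for flagged in labels if flagged)
    p = positives / n

    # Wilson score interval: better behaved than the plain normal
    # approximation when the proportion is close to 0 or 1.
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom

    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Made-up example input: 10 flagged tweets out of 100.
    sample = [True] * 10 + [False] * 90
    share, low, high = prevalence_with_interval(sample)
    print(f"prevalence: {share:.3f} (interval {low:.3f} to {high:.3f})")
```

Note that this only measures how often the classifier flags a tweet; the classifier's own error rate on the test set would still have to be taken into account before reporting a prevalence figure.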
Group members and preliminary task distribution:
Marin Karamihalev - programmer, help with statistical analysis
Siyana Ivanova - organisation & scheduling, statistical analysis, writing the paper
Jakub Blaszkowski - data pre-processing
Jae Sun Lee - code review, testing & quality assurance
Wojciech Drezek - programmer
To run the program, go to the project folder and run: python gui.py