Final Project Machine Learning
The goal of this project is to classify whether a statement is sexist, not sexist, or neutral. The motivation is to identify the sexist statements commonly encountered in workplaces. We classify these statements with several ML algorithms and compare which algorithm achieves the best accuracy.
The dataset was obtained from Kaggle and contains statements labelled as sexist or not sexist.
In the preprocessing step we cleaned the data and removed missing values. We then used BOW (bag of words) and TF-IDF (term frequency-inverse document frequency) for feature extraction, and finally trained the following models on the preprocessed data:
- Logistic Regression on BOW and TF-IDF
- Naive Bayes on TF-IDF Data
- Random Forest on TF-IDF Data
- SVM with various kernels on TF-IDF Data
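The model comparison listed above can be sketched with scikit-learn pipelines, assuming a single stratified train/test split and accuracy as the metric. The README only says "various kernels", so the four `SVC` kernels below, the hyperparameters, the 75/25 split, and the helper names `build_models` and `compare_models` are illustrative assumptions rather than the notebook's settings. The placeholder texts and `class_a`/`class_b` labels are hypothetical stand-ins for the real data.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def build_models():
    """One pipeline per model in the list; hyperparameters are assumed, not taken from the notebook."""
    models = {
        "Logistic Regression (BOW)": make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000)),
        "Logistic Regression (TF-IDF)": make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)),
        "Naive Bayes (TF-IDF)": make_pipeline(TfidfVectorizer(), MultinomialNB()),
        "Random Forest (TF-IDF)": make_pipeline(
            TfidfVectorizer(), RandomForestClassifier(n_estimators=100, random_state=0)
        ),
    }
    # Assumed kernel set: the four built-in SVC kernels.
    for kernel in ("linear", "rbf", "poly", "sigmoid"):
        models[f"SVM {kernel} (TF-IDF)"] = make_pipeline(TfidfVectorizer(), SVC(kernel=kernel))
    return models


def compare_models(texts, labels, test_size=0.25, seed=42):
    """Fit every pipeline on the same stratified split; return test accuracies, best first."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    scores = {}
    for name, pipe in build_models().items():
        pipe.fit(X_train, y_train)  # the vectorizer is fitted on training text only
        scores[name] = accuracy_score(y_test, pipe.predict(X_test))
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Hypothetical placeholder data standing in for the cleaned Kaggle statements.
    texts = [f"report deadline item {i}" for i in range(10)] + [f"lunch break snack {i}" for i in range(10)]
    labels = ["class_a"] * 10 + ["class_b"] * 10
    for name, acc in compare_models(texts, labels).items():
        print(f"{name:32s} {acc:.3f}")
```

Wrapping each vectorizer and classifier in a pipeline means the vocabulary and IDF weights are learned from the training split only, which keeps the reported test accuracy from being inflated by leakage.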
Requirements:
- matplotlib==3.2.0
- seaborn==0.10.0
- nltk==3.5
- numpy==1.17.4
- pandas==0.25.3
- scikit_learn==1.0.1
Download the .ipynb notebook and load the dataset folder, then run each cell in order to reproduce the output.