- 60% of teenagers have experienced some sort of cyberbullying.
- Overall, 36.5% of people feel they have been cyberbullied in their lifetime.
- Cyberbullying has increased as the digital sphere (social media) has expanded and technology has advanced, because wider use also brings wider misuse.
- What exactly is cyberbullying (online bullying)? It is when someone bullies or harasses others on the internet, particularly on social media. Harmful bullying behaviour can include posting rumours, threats, sexual remarks, a victim's personal information, or hate speech.
- Victims of cyberbullying may experience lower self-esteem, increased suicidal ideation, and various negative emotional responses, including anger and depression.
- Develop a machine learning model that classifies any text into one of six categories: age-based cyberbullying, ethnicity-based cyberbullying, gender-based cyberbullying, religion-based cyberbullying, any other form of cyberbullying, and not cyberbullying.
- Further, develop chatbots for social media platforms such as Discord (https://discord.com/) that use the above machine learning model to detect cyberbullying and take appropriate measures.
- Dataset from Kaggle containing 47693 sentences/tweets.
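
The six-way classification goal above can be illustrated with a minimal scikit-learn sketch. This is not the project's actual training code: the label strings in `CATEGORIES`, the toy sentences, and the TF-IDF plus Random Forest pipeline are assumptions chosen for illustration, and a real run would train on the full Kaggle dataset rather than on six hand-written rows.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical label names for the six categories listed above.
CATEGORIES = [
    "age",
    "ethnicity",
    "gender",
    "religion",
    "other_cyberbullying",
    "not_cyberbullying",
]

# Toy training rows (one per category), standing in for the Kaggle tweets.
train_texts = [
    "you are just a dumb kid from high school",
    "people from your country are all stupid",
    "girls are too weak to play this game",
    "everyone who follows your religion is an idiot",
    "nobody likes you, just quit already",
    "see you at football practice tomorrow",
]
train_labels = CATEGORIES  # row i belongs to category i

# TF-IDF turns each text into a sparse word-weight vector;
# the Random Forest then learns to map those vectors to a category.
model = make_pipeline(
    TfidfVectorizer(lowercase=True),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(train_texts, train_labels)

# Classify a new piece of text into exactly one of the six categories.
prediction = model.predict(["are you coming to practice tomorrow"])[0]
print(prediction)
```

With so little training data the predicted label is not meaningful; the point is only the shape of the pipeline: raw text in, one of six category labels out.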
Transfer Learning, Python, Google Colab, etc.
spaCy, NLTK, scikit-learn, TensorFlow, Keras, NumPy, etc.
- We tried several models; the best was a long short-term memory (LSTM) network, with 85% accuracy.
- The dataset used to train the model was the Twitter-based dataset from Kaggle.
- After training, the model object's state was saved to a binary pickle file so that the model does not need to be retrained each time.
- The Discord bot we created picks up text from Discord via a Python script, runs the pickled model on that text, flags the text if cyberbullying is detected, and then takes appropriate action.
- The LSTM-based model could not be integrated with the Discord bot because high-end processors and GPUs were unavailable, so we fell back to the Random Forest model, which can be integrated with the bot and reached a best accuracy of 78%.
- We used transfer learning, in which we change the last layer of the model.
- Further increase the accuracy of the model, since classifying text properly requires taking into account the context in which words are used.
- Extend the code so that it can also handle videos, images, and audio.
- Difficulty in the initial phases due to the pandemic.
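
The train-once, pickle, reload, and flag workflow described in the list above can be sketched using only the standard library. Everything named here is hypothetical: `model_state` is a keyword table standing in for the trained Random Forest, and `predict`, `save_model`, `load_model`, and `moderate` are illustrative helpers, not functions from the project. Only the pickle round trip and the flag-then-act decision are meant to carry over.

```python
import os
import pickle
import tempfile

NOT_BULLYING = "not_cyberbullying"

# Stand-in for a trained model's state (assumption: the real project
# pickles a trained classifier, not a keyword table like this one).
model_state = {"keywords": {"grandpa": "age", "girls": "gender"}}


def predict(state, text):
    """Return the category of the first known keyword, else NOT_BULLYING."""
    for word in text.lower().split():
        label = state["keywords"].get(word.strip(".,!?"))
        if label is not None:
            return label
    return NOT_BULLYING


def save_model(state, path):
    """Serialize the model state to a binary pickle file."""
    with open(path, "wb") as fh:
        pickle.dump(state, fh)


def load_model(path):
    """Restore the model state from disk, so no retraining is needed."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def moderate(state, text):
    """Classify one message and decide what the bot should do with it."""
    label = predict(state, text)
    action = "none" if label == NOT_BULLYING else "flag_and_warn"
    return {"label": label, "action": action}


# Save once after "training", then reload as the bot would at start-up.
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "model.pkl")
    save_model(model_state, model_path)
    restored = load_model(model_path)

flagged = moderate(restored, "Go home, grandpa!")
clean = moderate(restored, "See you at practice tomorrow")
print(flagged, clean)
```

In a bot built with a library such as discord.py, a function like `moderate` would typically be called from the message event handler with the incoming message text, and the returned action would decide whether to warn the author or delete the message. Note that pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.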