
maz2198 / natural-language-processing


This repository represents several projects completed in IE HST's MS in Business Analytics and Big Data program, Natural Language Processing course.

Jupyter Notebook 93.59% HTML 6.41%
topic-modeling nlp-keywords-extraction domestic-violence nlp-dependency-parsing naive-bayes-classifier svm ensemble-model stemming lemmatization classification


Natural Language Processing

This repo comprises several projects completed for the Natural Language Processing course in IE University's MS in Business Analytics and Big Data program.

Topic Modelling: Domestic Violence Application

Domestic violence is not a pandemic; it is an epidemic. With Covid-19 ravaging the economy and increasing unemployment, such crises are set to become much more frequent. Add another public health crisis to the toll of the new coronavirus: mounting data suggest that domestic abuse is acting as an opportunistic infection, flourishing in the conditions created by the pandemic.

With topic modelling, we set out to identify the topics, or classes, of the scraped tweets and Reddit comments. Organizing this large, unstructured data set by topic should make it easier for NGOs, government officials and researchers to draw useful insights, analyze the crisis and decide on better courses of action.

Reddit comments (approximately 1,200) and tweets (approximately 25,000) were scraped using the following searches or hashtags: #METOO, #WHYISTAYED, #WHYILEFT, #HeForSheAtHome, #WomenCount, #GenerationEquality, #AntiDomesticViolenceDuringEpidemic, #Mask-19, #WithHer, #SpotlightEndViolence and #staysafe.

Three topic modelling techniques were applied:

  • Latent Semantic Indexing (LSI)
  • Hierarchical Dirichlet Process (HDP)
  • Latent Dirichlet Allocation (LDA)

The topics were identified across the five countries where the conversation about domestic violence was most prominent: the USA, the UK, India, Nigeria and Kenya.

An in-depth report that reviews the relevant literature, outlines the methodology and summarizes the insights is available here. The final website and application can be viewed at Domestic Violence: Topic Modelling.

This project was completed with my colleagues Adelina Akhsanova, Nisrine Ferahi, Begona Frigolet, Mohamed Amer and Prasanth Chakka.

Real or Not: Tweets Disaster Classification

The main objective of this project was to classify tweets according to whether or not they are about real disasters.

We are given a dataset of almost 11,000 tweets, which can be downloaded here. The complication is that many tweets contain words that would imply a disaster but are used metaphorically. Therefore, merely identifying the presence of such words does not guarantee that the tweet in fact refers to an actual disaster. We have just three pieces of information to work with:

  • the tweet
  • the keyword in the tweet
  • the location of the tweet account
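
To make the metaphor problem concrete, the following sketch flags any tweet that contains a disaster-related keyword and counts how often that is wrong. The keyword list, the tweets and their labels are invented for illustration and do not come from the project's dataset.

```python
import re

# Hypothetical keyword list and made-up tweets, for illustration only.
DISASTER_KEYWORDS = {"fire", "flood", "explosion", "earthquake"}


def mentions_disaster_keyword(tweet: str) -> bool:
    """Return True if any keyword appears as a whole word in the tweet."""
    tokens = set(re.findall(r"[a-z]+", tweet.lower()))
    return bool(tokens & DISASTER_KEYWORDS)


# Each pair is (tweet text, label) with 1 = real disaster, 0 = not a disaster.
examples = [
    ("Wildfire spreading fast, fire crews evacuating the valley", 1),
    ("That new album is fire, been on repeat all day", 0),
    ("Flood warning issued after the river burst its banks", 1),
    ("My inbox is a flood of birthday messages", 0),
]

# Tweets that the keyword rule flags even though they are not about disasters.
false_positives = [
    tweet for tweet, label in examples
    if mentions_disaster_keyword(tweet) and label == 0
]

for tweet in false_positives:
    print("Wrongly flagged:", tweet)
```

Every example contains a keyword, so the rule flags all four tweets, including the two metaphorical ones. This is why a classifier that learns from context is needed rather than a keyword lookup.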

Several preprocessing techniques were applied, including n-grams, lemmatization, stemming and different tokenizers. An ensemble voting classifier consisting of a Logistic Regression model, Multinomial Naive Bayes and an SVM was then used. The entire preprocessing and modelling process is outlined in NLP_MarangMutloatse.ipynb. A summary of the methodology, results and insights is provided in NLP_Report.pdf.
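
A voting ensemble of the three model types named above could look roughly like the scikit-learn sketch below. It is not the notebook's actual code: the library choice, the TF-IDF features with unigrams and bigrams, the hard-voting setting, the hyperparameters and the toy tweets are all assumptions made for illustration.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up training data: 1 = real disaster, 0 = not a disaster.
tweets = [
    "Forest fire near the village, residents evacuated",
    "Earthquake shakes the city, buildings damaged",
    "Flood waters rising, roads closed across the county",
    "Explosion at the factory injures several workers",
    "This mixtape is fire, listen now",
    "I am drowning in a flood of emails today",
    "That exam was a total disaster for my social life",
    "My heart exploded when I saw the puppy",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hard voting: each model casts one vote and the majority class wins.
# LinearSVC does not expose predict_proba, so soft voting is not used here.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("nb", MultinomialNB()),
        ("svm", LinearSVC()),
    ],
    voting="hard",
)

# Unigram and bigram TF-IDF features feed all three classifiers.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), ensemble)
model.fit(tweets, labels)

new_tweets = [
    "Fire crews battle a blaze downtown",
    "This playlist is fire",
]
predictions = model.predict(new_tweets)
print(list(zip(new_tweets, predictions)))
```

With only eight training tweets the predictions are not reliable; the sketch is meant to show how the vectorizer and the three classifiers are wired together.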
