kaggle-Natural-Language-Processing-with-Disaster-Tweets

Task Description

Dataset: Natural Language Processing with Disaster Tweets

In the disaster tweets classification task, I build several machine learning models to predict which tweets are about real disasters (label = 1) and which ones aren't (label = 0).

Implementation

This task is implemented in Python on Kaggle, using the CPU and GPU T4 x2 accelerators.

Code Description

1-data-preprocessing-disaster-tweets.ipynb: This notebook contains the data preprocessing steps for the training set and the test set.
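
The notebook's exact preprocessing steps are not listed here. As a minimal sketch, assuming typical tweet cleaning (HTML-entity decoding, lowercasing, and removal of URLs and @-mentions), a cleaning function could look like this; `clean_tweet` and the regular expressions are hypothetical, not taken from the notebook:

```python
import html
import re

# Hypothetical cleaning rules; the notebook's actual steps may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
SPACES_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Normalize a raw tweet before tokenization."""
    text = html.unescape(text)        # decode entities such as "&amp;" -> "&"
    text = text.lower()               # case-fold so "Fire" and "fire" match
    text = URL_RE.sub(" ", text)      # drop links
    text = MENTION_RE.sub(" ", text)  # drop @user handles
    return SPACES_RE.sub(" ", text).strip()  # collapse runs of whitespace


if __name__ == "__main__":
    print(clean_tweet("Forest fire near La Ronge &amp; Sask. http://t.co/abc @user"))
    # forest fire near la ronge & sask.
```

Punctuation is deliberately kept so that the tokenizers applied afterwards still have something to differ on. With pandas, the function can be applied column-wise, e.g. `train["text"].map(clean_tweet)`.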

2-ml-disaster-tweets.ipynb: This notebook builds and trains three machine learning models (XGBoost, SVM, and Random Forest) to classify the disaster tweets, and evaluates their performance by validation accuracy.
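
A minimal sketch of such a comparison, assuming TF-IDF features and near-default hyperparameters (the notebook's feature extraction and settings are not stated here). `build_models`, `compare_models`, and the toy tweets are hypothetical; XGBoost is skipped if the `xgboost` package is not installed:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def build_models(seed=42):
    """Candidate classifiers; hyperparameters are illustrative, not tuned."""
    models = {
        "SVM": SVC(kernel="linear"),
        "RandomForest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    try:
        from xgboost import XGBClassifier  # optional dependency
        models["XGBoost"] = XGBClassifier(n_estimators=200, random_state=seed)
    except ImportError:
        pass  # run without XGBoost if it is not installed
    return models


def compare_models(texts, labels, val_size=0.2, seed=42):
    """Return {model name: validation accuracy} using TF-IDF features."""
    x_train, x_val, y_train, y_val = train_test_split(
        texts, labels, test_size=val_size, random_state=seed, stratify=labels
    )
    scores = {}
    for name, clf in build_models(seed).items():
        # The vectorizer lives inside the pipeline, so it is fit on training text only.
        pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
        pipe.fit(x_train, y_train)
        scores[name] = accuracy_score(y_val, pipe.predict(x_val))
    return scores


if __name__ == "__main__":
    # Toy tweets for illustration only; in practice use train.csv's text and target columns.
    texts = [
        "forest fire near la ronge", "earthquake hits the city", "flood warning issued",
        "wildfire evacuations ordered", "bomb explosion downtown",
        "i love this song", "what a lovely day", "my phone is on fire lol",
        "this pizza is the bomb", "watching a movie tonight",
    ]
    labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    print(compare_models(texts, labels))
```

Stratifying the split keeps the disaster / non-disaster ratio the same in the training and validation parts, so the validation accuracies of the models are compared on equal footing.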

3-bert-disaster-tweets.ipynb: This notebook trains a BERT model to classify the disaster tweets and evaluates its performance by validation accuracy.

4-zero-shot-classification-disaster-tweets.ipynb: In this notebook, I explore zero-shot classification using the Hugging Face library.
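
A minimal sketch of turning zero-shot predictions into 0/1 targets. It assumes the classifier behaves like the Hugging Face Transformers `pipeline("zero-shot-classification")`, which returns a dict whose `labels` are sorted by descending score. The candidate label wording and `zero_shot_targets` are hypothetical; the notebook's model and labels are not stated here:

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical label wording; the first entry is treated as the positive class.
CANDIDATE_LABELS = ["real disaster", "not a disaster"]


def zero_shot_targets(
    texts: Sequence[str],
    classifier: Callable[..., Dict],
    candidate_labels: Sequence[str] = CANDIDATE_LABELS,
) -> List[int]:
    """Map each tweet to 1 if the top-scoring label is the positive one, else 0."""
    positive = candidate_labels[0]
    targets = []
    for text in texts:
        result = classifier(text, candidate_labels=list(candidate_labels))
        # Labels come back sorted by descending score, so index 0 is the prediction.
        targets.append(1 if result["labels"][0] == positive else 0)
    return targets
```

Passing the classifier in as an argument keeps the mapping logic testable without downloading a model. In a notebook one would build it with `from transformers import pipeline` and `classifier = pipeline("zero-shot-classification")`, then call `zero_shot_targets(test["text"], classifier)`.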

Data Description

disaster tweets 3 tokenizers data (Testset): Contains the test set preprocessed by 1-data-preprocessing-disaster-tweets.ipynb, one file per tokenizer.

  • test_e.csv: preprocessed test set with TreebankWordTokenizer
  • test_u.csv: preprocessed test set with WordPunctTokenizer
  • test_s.csv: preprocessed test set with WhitespaceTokenizer
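
All three tokenizers are classes in NLTK's `nltk.tokenize` module, and none of them needs extra data downloads. A minimal sketch, assuming NLTK is installed (`TOKENIZERS` and `tokenize_all` are hypothetical names), showing how they differ on one tweet:

```python
from nltk.tokenize import TreebankWordTokenizer, WhitespaceTokenizer, WordPunctTokenizer

# One tokenizer per output file; keys follow the file-name suffixes listed above.
TOKENIZERS = {
    "e": TreebankWordTokenizer(),
    "u": WordPunctTokenizer(),
    "s": WhitespaceTokenizer(),
}


def tokenize_all(text: str) -> dict:
    """Return the token list each tokenizer produces for one tweet."""
    return {suffix: tok.tokenize(text) for suffix, tok in TOKENIZERS.items()}


if __name__ == "__main__":
    for suffix, tokens in tokenize_all("We can't stop the fire!").items():
        print(f"test_{suffix}.csv:", tokens)
```

On "We can't stop the fire!", TreebankWordTokenizer splits the contraction into `ca` and `n't` and separates the `!`, WordPunctTokenizer splits at the apostrophe into `can`, `'`, `t`, and WhitespaceTokenizer leaves `can't` and `fire!` intact.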

Prediction: Contains the predictions on the test set that were submitted to Kaggle.

  • test_prediction.csv: The best result, predicted by the BERT model.
  • SVM_penn_tokens_prediction.csv: The result predicted by the SVM model (data tokenized with TreebankWordTokenizer).
  • zero_shot_submission.csv: The result predicted by a pre-trained model in a zero-shot setting.
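
The competition expects a submission CSV with an `id` column and a `target` column (0 or 1). A minimal standard-library sketch for writing one; `write_submission` is a hypothetical helper:

```python
import csv
from typing import Iterable


def write_submission(ids: Iterable[int], targets: Iterable[int], path: str) -> None:
    """Write predictions in the id,target format used by this competition."""
    ids, targets = list(ids), list(targets)
    if len(ids) != len(targets):
        raise ValueError(f"{len(ids)} ids but {len(targets)} targets")
    # Validate everything before opening the file so a bad value leaves no partial CSV.
    for tweet_id, target in zip(ids, targets):
        if target not in (0, 1):
            raise ValueError(f"target for id {tweet_id} must be 0 or 1, got {target!r}")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "target"])
        writer.writerows([tweet_id, int(target)] for tweet_id, target in zip(ids, targets))
```

For example, `write_submission(test["id"], preds, "zero_shot_submission.csv")` would produce a file in the same format as the ones listed above.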
