
quora-insincere-identification's Introduction

Quora insincere questions identification with Multi-ratio sampling, LSTM and Capsules

An attempt to tackle the Quora Insincere Questions Classification competition. Data is available here.

All the relevant code for the project is in the Jupyter notebook Iqbal_Asif_project_quora.ipynb. The notebook is self-contained, but you do need to prepare the Python environment and install the required dependencies as described below.

Setup

I used the Anaconda distribution to download the required packages and create my virtual environment; you can use pip instead if you prefer. The code is meant to run on Python 3.6.6. Ideally any later version should also work, but since I am using Keras with the TensorFlow backend and TensorFlow seems to have some issues with Python 3.7, I decided to stick to version 3.6.6.

To mimic my setup steps, do the following:

Setting up the environment and packages

  • Install Python 3.6.6 from here
  • Install Anaconda by following the instructions here
  • Create a conda virtual environment: conda create --name {your_env_name}
  • Activate the environment: conda activate {your_env_name}
  • Install the following packages:
conda install numpy
conda install pandas
conda install scikit-learn
conda install matplotlib
conda install nltk
conda install seaborn
conda install -c conda-forge keras
conda install jupyter
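
Once the packages are installed, a quick check from inside the activated environment can confirm that they are importable. The snippet below is a minimal sketch, not part of the repository: the helper names missing_modules and check_environment are hypothetical, and the module list is simply the import names of the libraries installed above (scikit-learn is imported as sklearn). Jupyter is left out because launching it is the simpler test.

```python
import importlib.util
import sys

# Import names of the libraries installed above.
# Note that scikit-learn is imported as "sklearn".
REQUIRED_MODULES = [
    "numpy", "pandas", "sklearn", "matplotlib",
    "nltk", "seaborn", "keras",
]


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(modules=REQUIRED_MODULES, min_version=(3, 6)):
    """Hypothetical helper: return (version_ok, list_of_missing_modules)."""
    version_ok = sys.version_info[:2] >= min_version
    return version_ok, missing_modules(modules)


if __name__ == "__main__":
    version_ok, missing = check_environment()
    print("Python version OK:", version_ok)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

If anything shows up as missing, install it with conda (or pip) and run the check again.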

Download the data

  • Clone the git repository located at https://github.com/asif31iqbal/quora-insincere-identification. You can clone using SSH: git clone git@github.com:asif31iqbal/quora-insincere-identification.git
  • cd to the directory of the repository
  • Install the Kaggle API: pip install kaggle --upgrade
  • Assuming you have a Kaggle account, follow the instructions here regarding API Credentials so that you are able to invoke the Kaggle API methods.
    • Create a new Kaggle API Token from your Kaggle profile page
    • Download the created kaggle.json file and copy it to a directory called .kaggle under your home directory
    • Restrict the file's permissions: chmod 600 ~/.kaggle/kaggle.json
  • Create the data directory: mkdir data
  • cd data
  • Download the Kaggle competition files by invoking kaggle competitions download quora-insincere-questions-classification
  • For our purposes, we only need 3 of the downloaded files. Unzip the archives and place train.csv, test.csv and glove.840B.300d.txt directly under the data directory. Feel free to delete the other files that were extracted.
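
If the kaggle command complains about credentials, it can help to check the token file set up in the steps above. The following is a rough sketch under the assumption that the token lives at ~/.kaggle/kaggle.json as described; check_kaggle_token is a hypothetical helper, not something provided by the Kaggle package, and the permission check only applies on POSIX systems.

```python
import os
import stat
from pathlib import Path


def check_kaggle_token(path=None):
    """Hypothetical helper: return a list of problems with the token file."""
    token = Path(path) if path else Path.home() / ".kaggle" / "kaggle.json"
    if not token.is_file():
        return [f"{token} does not exist"]

    problems = []
    mode = stat.S_IMODE(token.stat().st_mode)
    # Any group/other permission bit means the file is not owner-only,
    # which is what `chmod 600` is meant to guarantee.
    if os.name == "posix" and mode & 0o077:
        problems.append(f"{token} has mode {oct(mode)}; expected 0o600")
    return problems


if __name__ == "__main__":
    issues = check_kaggle_token()
    print("\n".join(issues) if issues else "Kaggle token looks fine")
```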

At this point the directory structure should look like this:

your_root_directory_where_you_cloned_the_repo
|
|-- Iqbal_Asif_project_quora.ipynb
|-- report.pdf
|-- data
    |
    |-- train.csv
    |-- test.csv
    |-- glove.840B.300d.txt
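
To confirm the layout above before opening the notebook, you could run a small check from the repository root. This is only an illustrative sketch: missing_data_files is a hypothetical helper, and it assumes the data directory sits directly under the directory you run it from.

```python
from pathlib import Path

# The three files the layout above expects under data/.
EXPECTED_FILES = ("train.csv", "test.csv", "glove.840B.300d.txt")


def missing_data_files(data_dir="data", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    data_path = Path(data_dir)
    return [name for name in expected if not (data_path / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("All expected data files are in place")
```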

That's it! You are all set. At this point just run jupyter notebook, open the Iqbal_Asif_project_quora.ipynb file and run it step by step or in whatever way you want. The notebook has short embedded guidelines on how everything works together. Details of the concept and implementation can be found in report.pdf.
