text-classification

Ekşi Sözlük is one of the most popular social media platforms in Turkey. Users discuss all kinds of topics there, such as education and relationships, so I wanted to use it to build a simple text classification example.

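To try the model on some sentences, I added a simple Flask application. To start it, run the following (set is the Windows command prompt syntax; on Linux or macOS, use export instead of set):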

set FLASK_APP=app.py
flask run

After that you can see the simple web app.

DATASET

I created the dataset by scraping the website; you can use scrape.py to do the same. It scrapes the currently popular topics (the agenda) and saves the entries to the total_dataset.csv file.
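As a rough sketch of the saving step only (this is not the actual scrape.py), appending scraped entries to a CSV file could look like the following. The helper name save_entries and the column names topic and entry are assumptions made for illustration; the real file layout may differ.

```python
import csv
from pathlib import Path


def save_entries(rows, path="total_dataset.csv"):
    """Append (topic, entry) pairs to a CSV file, writing the header only once."""
    path = Path(path)
    write_header = not path.exists() or path.stat().st_size == 0
    # utf-8 keeps Turkish characters intact; newline="" is what the csv module expects
    with path.open("a", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["topic", "entry"])  # hypothetical column names
        writer.writerows(rows)
    return path
```

Opening the file in append mode means repeated scraping runs keep adding to the same file instead of overwriting it.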

For this project, I uploaded a small dataset of 8000 entries: 1600 for each of the 5 categories.

LABELING DATA

Since text classification is a supervised task, you need to label the data. I used five labels: Economy, Education, Politics, Relationships, and Sports.

You can choose as many categories as you want.
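Before training, category names are usually mapped to integer ids. A minimal sketch using the five categories above; the helper names and the sample labels are made up for illustration and are not taken from the repository:

```python
from collections import Counter

CATEGORIES = ["Economy", "Education", "Politics", "Relationships", "Sports"]

# Map each category name to an integer id and back.
LABEL_TO_ID = {name: i for i, name in enumerate(CATEGORIES)}
ID_TO_LABEL = {i: name for name, i in LABEL_TO_ID.items()}


def encode_labels(labels):
    """Convert category names to integer ids; raises KeyError on unknown labels."""
    return [LABEL_TO_ID[label] for label in labels]


def class_counts(labels):
    """Count entries per category, e.g. to check that the dataset is balanced."""
    return Counter(labels)


sample = ["Sports", "Economy", "Sports"]  # made-up labels
print(encode_labels(sample))
print(class_counts(sample))
```

With this layout, adding a category only requires appending its name to CATEGORIES.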

CREATING DATASET

Run data.py to build the dataset used for training. From the command line, run:

python data.py

It will create a folder named dataset containing the features and labels for both training and testing.

The features in the dataset are computed with TF-IDF.
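To illustrate what TF-IDF computes, here is a minimal pure-Python sketch. It is not the code in data.py: it assumes plain whitespace tokenization and the basic weighting tf * log(N / df), whereas libraries such as scikit-learn (TfidfVectorizer) apply smoothing and normalization by default. The sample documents are made up.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Uses tf = count / document length and idf = log(N / df),
    where df is the number of documents containing the term.
    """
    tokenized = [doc.split() for doc in docs]  # naive whitespace tokenization
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


docs = ["dolar faiz dolar", "maç gol", "faiz sınav"]  # made-up examples
for vec in tfidf(docs):
    print(vec)
```

A term that appears in every document gets weight 0 under this formula, which is why very common words contribute little to the classification.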

TRAINING

I used only Naive Bayes and got 77 percent accuracy, which is not bad. To train the model, run the train.py file. From the command line, run:

python train.py

It will create a folder named models, where you can find the trained Naive Bayes model.
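For readers who want to see how a Naive Bayes text classifier works, here is a minimal pure-Python sketch of the multinomial variant with add-one smoothing. It is an illustration under simplifying assumptions, not the code in train.py: it works on raw word counts rather than TF-IDF features, and the function names and training sentences are made up.

```python
import math
from collections import Counter, defaultdict


def train_nb(texts, labels):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized texts."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "priors": {c: math.log(n / len(labels)) for c, n in class_docs.items()},
        "word_counts": dict(word_counts),
        "totals": {c: sum(wc.values()) for c, wc in word_counts.items()},
        "vocab": vocab,
    }


def predict_nb(model, text):
    """Return the class with the highest log posterior for one text."""
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, prior in model["priors"].items():
        score = prior
        for token in text.split():
            if token not in model["vocab"]:
                continue  # ignore words never seen in training
            count = model["word_counts"][label][token]
            # Laplace (add-one) smoothing avoids zero probabilities
            score += math.log((count + 1) / (model["totals"][label] + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    correct = sum(predict_nb(model, t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Made-up Turkish training sentences for two of the categories.
train_texts = ["dolar faiz enflasyon", "faiz borsa dolar",
               "maç gol takım", "takım transfer gol"]
train_labels = ["Economy", "Economy", "Sports", "Sports"]

model = train_nb(train_texts, train_labels)
print(predict_nb(model, "dolar borsa"))  # Economy
print(predict_nb(model, "gol maç"))      # Sports
```

The accuracy helper should be called on held-out test data; measuring it on the training sentences would overstate how well the model generalizes.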

TESTING

To test some tweets, run test.py. I wrote 3 tweets just to try it out; you can challenge the model with as many as you like. Remember that the training data is Turkish, so this is a Turkish text classifier and your test sentences need to be in Turkish. From the command line, run:

python test.py
