
mbti_analysis

Psychotype analysis using the Myers–Briggs Type Indicator (MBTI)

HSE ML course project (2nd year)

Team:

  • Kuznetsov Nikita
  • Levin Mark

Installing dependencies

foo@bar:~$ pip3 install -r requirements.txt

Source data description

There are two datasets:

  • A dataset from kaggle.com: a CSV file with two columns, type (the psychotype class, 16 in total) and posts (a person's social-network posts as text).
  • For additional data, we manually parsed forum articles written by people who state their MBTI psychotype.
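A minimal sketch of merging the two sources into one labeled list, assuming the Kaggle CSV has a header row with exactly the columns `type` and `posts` and that the parsed forum articles are available as (type, text) pairs. `load_posts`, `MBTI_TYPES`, and the sample rows are hypothetical illustrations, not code from this repository. The 16 valid codes are generated from the four MBTI dichotomies (E/I, S/N, T/F, J/P), and counting labels per class is a quick way to see the imbalance discussed under Metrics.

```python
import csv
import io
from collections import Counter
from itertools import product

# The 16 MBTI codes are every combination of the four dichotomies.
MBTI_TYPES = {"".join(letters) for letters in product("EI", "SN", "TF", "JP")}


def load_posts(csv_file, extra_rows=()):
    """Hypothetical loader: Kaggle-style CSV rows plus extra (type, text) pairs.

    Assumes the CSV header is `type,posts`. Labels are upper-cased and rows
    whose label is not one of the 16 MBTI codes are dropped.
    """
    rows = [(row["type"], row["posts"]) for row in csv.DictReader(csv_file)]
    rows.extend(extra_rows)
    cleaned = [(label.strip().upper(), text) for label, text in rows]
    return [(label, text) for label, text in cleaned if label in MBTI_TYPES]


# Illustrative data only: two CSV rows and two hypothetical forum articles.
kaggle_csv = io.StringIO("type,posts\nINTJ,I like plans\nENFP,Parties are fun\n")
forum_rows = [("infp", "A forum article"), ("XXXX", "unlabeled text")]

data = load_posts(kaggle_csv, forum_rows)
print(len(MBTI_TYPES))  # 16
# Per-class counts; on the real data this exposes the class imbalance.
print(Counter(label for label, _ in data))
```

The row with the invalid label `XXXX` is filtered out, so only three labeled texts remain in `data`.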

Our goals

Build a model that predicts a person's MBTI psychotype (one of 16) from text they have written, and improve its metric by adding the forum data described above. We also wrote a Telegram bot so that users can interact with the model.

Algorithms used

  • For preprocessing, we removed punctuation and other content that carried no meaningful text, and compared lemmatization with stemming; stemming performed better.
  • We chose TF-IDF to turn the texts into feature vectors.
  • We did not commit to a single model in advance; we experimented with logistic regression, CatBoost, XGBoost, and random forest.
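The preprocessing and TF-IDF steps above can be sketched in plain Python. This is an illustration under assumptions, not the project's code: the cleaning rules (lowercase, drop URLs, keep only letters) are a guess at "unnecessary information", and stemming is passed in as a function so either a stemmer or a lemmatizer can be swapped in. The weighting uses raw term counts, the smoothed idf ln((1 + n) / (1 + df)) + 1, and L2 normalization, which are the default weighting settings of scikit-learn's `TfidfVectorizer` (its tokenizer differs).

```python
import math
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def preprocess(text, stem=lambda word: word):
    """Lowercase, drop URLs, punctuation and digits, then apply `stem` per token."""
    text = URL_RE.sub(" ", text.lower())
    text = NON_ALPHA_RE.sub(" ", text)
    return [stem(token) for token in text.split()]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf = raw count, idf = ln((1 + n) / (1 + df)) + 1, then each vector is
    scaled to unit L2 norm.
    """
    n = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


docs = [preprocess("I LOVE planning!!! https://x.com"), preprocess("love parties")]
vectors = tfidf(docs)
print(docs[0])  # ['i', 'love', 'planning']
```

Because "love" occurs in both documents it receives a lower weight than "planning" in the first vector. For real stemming, a callable such as NLTK's `PorterStemmer().stem` or `SnowballStemmer("english").stem` could be passed as `stem`.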

Metrics

This is a classification problem with highly imbalanced classes (the graph below shows the number of posts per psychotype class), so we chose the F1 score as the metric.

[Figure: number of posts per psychotype class]
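To show why F1 suits imbalanced classes, here is a small from-scratch computation of per-class F1 (the harmonic mean of precision TP / (TP + FP) and recall TP / (TP + FN)). Which multi-class average the notebooks report is not stated in this README, so the sketch returns both the macro average (every class counts equally) and the support-weighted average; `f1_scores` and the toy labels are hypothetical.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (per-class F1, macro F1, weighted F1) for multi-class labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class got a wrong hit
            fn[truth] += 1  # true class was missed
    per_class = {}
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        per_class[c] = 2 * precision * recall / denom if denom else 0.0
    support = Counter(y_true)
    macro = sum(per_class.values()) / len(labels)
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return per_class, macro, weighted


y_true = ["INTJ", "INTJ", "INTJ", "ENFP"]
y_pred = ["INTJ", "INTJ", "ENFP", "ENFP"]
per_class, macro, weighted = f1_scores(y_true, y_pred)
print(round(macro, 3), round(weighted, 3))  # 0.733 0.767
```

Here accuracy would be 0.75, while the rare class ENFP has an F1 of only about 0.667; macro averaging gives that class the same weight as the frequent one.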

Results

Stemming gave slightly better results than lemmatization. Adding the forum dataset improved the scores by 0.02 on average (see the Jupyter notebooks for details).

Selected F1 scores (on the supplemented dataset):

  • CatBoost (stemming): 0.676
  • XGBoost (stemming): 0.671
  • LogisticRegression (stemming): 0.645
  • CatBoost (lemmatization): 0.671
  • XGBoost (lemmatization): 0.664
  • LogisticRegression (lemmatization): 0.621
  • RandomForestClassifier (stemming): 0.432
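A comparison like the one above can be sketched with scikit-learn pipelines: each candidate is trained on the same split with TF-IDF features and scored with F1. This is a hedged illustration, not the notebooks' code. `compare_models`, the split size, and the hyperparameters are assumptions, weighted F1 is an assumed averaging choice, and CatBoost and XGBoost are left out to avoid extra dependencies.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def compare_models(texts, labels, test_size=0.25, seed=0):
    """Fit each candidate on one stratified split and return its weighted F1."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    candidates = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "RandomForestClassifier": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    scores = {}
    for name, clf in candidates.items():
        # TF-IDF is fit on the training split only, so the test vocabulary does not leak.
        model = make_pipeline(TfidfVectorizer(), clf)
        model.fit(X_train, y_train)
        scores[name] = f1_score(y_test, model.predict(X_test), average="weighted")
    return scores
```

Running the same function on stemmed and on lemmatized texts gives a like-for-like comparison of the two preprocessing variants. `xgboost.XGBClassifier` and `catboost.CatBoostClassifier` expose the same `fit`/`predict` interface and could be added to `candidates`; recent XGBoost versions expect integer-encoded labels, for example via scikit-learn's `LabelEncoder`.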

Contributors

  • bananananacat
  • nikait
