
jpcenteno / kdtree-pca-sentinment-analysis


Sentiment analysis using PCA + KDTree KNN. Implemented using C++ & PyBind. Run in parallel with OpenMP.


Sentiment Analysis with PCA and KDTree KNN

This project implements a sentiment classifier using custom C++ implementations of PCA and KNN. The KNN classifier adapts a KDTree implementation by Fabian Meyer. The implementation leverages OpenMP for parallel computation, achieving a significant speedup in train/test iterations. Finally, the C++ code is wrapped into Python classes using PyBind.
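The pipeline described above (reduce dimensionality with PCA, then classify with a KD-tree-backed KNN) can be sketched in pure Python. This is only an illustration under stated assumptions, not the project's C++ code: the function names are hypothetical, the toy vectors are made up, and power iteration with deflation is assumed as the eigenvector method because the README does not say how PCA is computed.

```python
import heapq
import math
import random
from collections import Counter


def top_components(rows, n_components, iters=200, seed=0):
    """Return (mean, components) via power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for v.
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # Deflation: remove the component just found from the matrix.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return mean, components


def project(row, mean, components):
    """Project one vector onto the principal components."""
    return [sum((row[j] - mean[j]) * comp[j] for j in range(len(row)))
            for comp in components]


def build_kdtree(points, depth=0):
    """points is a list of (vector, label); returns nested tuples or None."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def knn_search(node, query, k, heap=None):
    """Collect the k nearest points in a heap of (-squared_distance, label)."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    (vec, label), axis, left, right = node
    dist2 = sum((a - b) ** 2 for a, b in zip(vec, query))
    item = (-dist2, label)
    if len(heap) < k:
        heapq.heappush(heap, item)
    elif item > heap[0]:
        heapq.heapreplace(heap, item)
    diff = query[axis] - vec[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    knn_search(near, query, k, heap)
    # Visit the far side only if the splitting plane is closer than the
    # current k-th neighbour (or fewer than k neighbours were found so far).
    if len(heap) < k or diff * diff < -heap[0][0]:
        knn_search(far, query, k, heap)
    return heap


def classify(tree, query, k):
    """Majority vote among the k nearest neighbours."""
    neighbours = knn_search(tree, query, k)
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


# Hypothetical toy data: word-count vectors with sentiment labels.
train = [([3, 0, 1], "pos"), ([4, 1, 0], "pos"), ([5, 0, 0], "pos"),
         ([0, 4, 1], "neg"), ([1, 5, 0], "neg"), ([0, 3, 0], "neg")]
mean, comps = top_components([v for v, _ in train], n_components=2)
tree = build_kdtree([(project(v, mean, comps), y) for v, y in train])
prediction = classify(tree, project([4, 0, 1], mean, comps), k=3)
print(prediction)
```

The KD-tree only pays off when the projected space has few dimensions, which is one reason to run PCA before the neighbour search.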

This project was developed for the "Numerical Methods" course, taught during the second semester of 2019 at the Faculty of Exact and Natural Sciences, University of Buenos Aires.

Instructions

Data

The IMDB dataset must be decompressed into data/; it can be downloaded from here

Other directories

src/ contains the C++ code; in particular, src/sentiment.cpp is the pybind entry point.

notebooks/ contains examples for running parts of the assignment with sklearn and with the C++ implementation.

Creating a Python virtual environment

With pyenv

curl https://pyenv.run | bash

The installer then suggests adding a few lines to your bashrc. Do that, RESTART THE SHELL, and then...

pyenv install 3.6.5
pyenv global 3.6.5
pyenv virtualenv 3.6.5 tp2

In the project directory

pyenv activate tp2

Directly with python3

python3 -m venv tp2
source tp2/bin/activate
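To confirm that the environment is actually active before installing dependencies, a small check like the following can help. It is a minimal sketch: the helper name `in_virtualenv` is hypothetical, and it only detects venv-style environments (a Conda environment is a full installation and is not expected to be detected this way).

```python
import sys


def in_virtualenv():
    """Return True when running inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    base = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base


print("virtual environment active:", in_virtualenv())
```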

With Conda

conda create --name tp2 python=3.6.5
conda activate tp2

Installing the dependencies

pip install -r requirements.txt

Running the Jupyter notebooks

cd notebooks
jupyter lab

or, with the classic notebook interface:

jupyter notebook

Compilation

Run the first cell of the knn.ipynb notebook, or follow these steps:

Submodules and required libraries

The build needs the pybind and eigen libraries (eigen is the "numpy" of C++). Both are pulled in as git submodules, so fetching the submodules is the first step.

Python version >= 3.6.5
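The version requirement can be checked from the interpreter that will be passed to CMake. The snippet below is a small sketch; the names `REQUIRED` and `meets_requirement` are hypothetical and not part of the project.

```python
import sys

REQUIRED = (3, 6, 5)  # minimum version stated in this README


def meets_requirement(version_info=sys.version_info, required=REQUIRED):
    """Compare (major, minor, micro) against the required version tuple."""
    return tuple(version_info[:3]) >= required


print("Python version OK:", meets_requirement())
```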

To fetch the submodules, run:

git submodule init
git submodule update
  • Compile the C++ code into a Python module
mkdir build
cd build
rm -rf *
cmake -DPYTHON_EXECUTABLE="$(which python)" -DCMAKE_BUILD_TYPE=Release ..
  • The following command compiles the library and installs it into the notebooks directory
make install

Classification test

  1. Compile the library
cd build && cmake -DCMAKE_BUILD_TYPE=Release .. && make clean && make && make install
  2. Run classification on the sample
python bin/classify.py data/test_sample.csv data/test_sample.out
  3. Run the evaluation
python bin/evaluate.py data/test_sample.out data/test_sample.true
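The evaluation step compares predicted labels against the true ones. The README does not say which metric bin/evaluate.py reports or how the files are laid out, so the sketch below assumes plain accuracy over two equal-length label lists; the function name and the sample labels are hypothetical.

```python
def accuracy(predicted, expected):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on empty input")
    hits = sum(p == t for p, t in zip(predicted, expected))
    return hits / len(expected)


# Hypothetical labels; three of the four predictions match.
predicted = ["pos", "neg", "pos", "pos"]
expected = ["pos", "neg", "neg", "pos"]
score = accuracy(predicted, expected)
print("accuracy:", score)
```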

Contributors

catrield, finiteautomata, fragofer, gflan, jpcenteno
