cusim's Introduction

CUSIM

Superfast CUDA implementation of Word2Vec and Latent Dirichlet Allocation (LDA)

Introduction

This project aims to speed up various ML models (e.g. topic modeling, word embedding) with CUDA. You can think of it as a GPU version of gensim. As a first step, I implemented the most widely used word embedding model, word2vec, and the most representative topic model, LDA (Latent Dirichlet Allocation).

Requirements

  • Python 3.6+
  • gcc / g++ (>= 5.1 for C++14)
  • CUDA >= 7.0
  • Tested on Ubuntu 18.04 / GCC 7.5 / CUDA 11.1 / Python 3.6
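
These requirements can be checked before building. The following is a minimal sketch, not part of cusim: the helper names (parse_version, meets_minimum, tool_version) are hypothetical, and taking the first dotted number in the `--version` output is a heuristic that may misread unusual toolchain builds.

```python
import re
import shutil
import subprocess
import sys


def parse_version(text):
    """Return the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found, minimum):
    """Compare version tuples; a missing version never satisfies the minimum."""
    return found is not None and tuple(found) >= tuple(minimum)


def tool_version(command):
    """Run `<command> --version` and parse the output; None if the tool is absent."""
    if shutil.which(command) is None:
        return None
    result = subprocess.run(
        [command, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_version(result.stdout)


if __name__ == "__main__":
    # Minimum versions are taken from the requirements list above.
    checks = {
        "python >= 3.6": meets_minimum(sys.version_info[:2], (3, 6)),
        "gcc >= 5.1": meets_minimum(tool_version("gcc"), (5, 1)),
        "nvcc (CUDA) >= 7.0": meets_minimum(tool_version("nvcc"), (7, 0)),
    }
    for name, ok in checks.items():
        print("{}: {}".format(name, "ok" if ok else "missing or too old"))
```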

How to install

  • install from PyPI
pip install cusim
  • install from source
# clone repo and submodules
git clone git@github.com:js1010/cusim.git && cd cusim && git submodule update --init

# install requirements
pip install -r requirements.txt

# generate proto
python -m grpc_tools.protoc --python_out cusim/ --proto_path cusim/proto/ config.proto

# install
python setup.py install

How to use

  • examples/example_w2v.py, examples/example_lda.py and examples/README.md show how to use the library.
  • parameter descriptions can be found in cusim/proto/config.proto

Performance

  • The experiments were run on an AWS g4dn.2xlarge instance (one NVIDIA T4 GPU with 8 vCPUs, Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz).
  • The results can be reproduced by running examples/example_w2v.py and examples/example_lda.py.
  • To evaluate the W2V model, I used the evaluate_word_pairs function (ref link) in gensim. Note that better performance on the WS-353 test set does not necessarily mean that the model will work better in an application, as described in the link. Still, a quantitative measure is useful, and training time at least is an objective measure of performance.
    • I trained the W2V model on the quora-duplicate-questions dataset from the gensim downloader API on GPU with cusim and compared the performance (both speed and model quality) with gensim.
  • To evaluate the LDA model, I found no good way to measure the quality of the training results quantitatively. But we can check the model by looking at the top words of each topic, and we can compare the training time quantitatively.
  • W2V (skip gram, hierarchical softmax)
| attr | 1 worker (gensim) | 2 workers (gensim) | 4 workers (gensim) | 8 workers (gensim) | NVIDIA T4 (cusim) |
|---|---|---|---|---|---|
| training time (sec) | 892.596 | 544.212 | 310.727 | 226.472 | 16.162 |
| pearson | 0.487832 | 0.487696 | 0.482821 | 0.487136 | 0.492101 |
| spearman | 0.500846 | 0.506214 | 0.501048 | 0.506718 | 0.479468 |
  • W2V (skip gram, negative sampling)
| attr | 1 worker (gensim) | 2 workers (gensim) | 4 workers (gensim) | 8 workers (gensim) | NVIDIA T4 (cusim) |
|---|---|---|---|---|---|
| training time (sec) | 586.545 | 340.489 | 220.804 | 146.23 | 33.9173 |
| pearson | 0.354448 | 0.353952 | 0.352398 | 0.352925 | 0.360436 |
| spearman | 0.369146 | 0.369365 | 0.370565 | 0.365822 | 0.355204 |
  • W2V (CBOW, hierarchical softmax)
| attr | 1 worker (gensim) | 2 workers (gensim) | 4 workers (gensim) | 8 workers (gensim) | NVIDIA T4 (cusim) |
|---|---|---|---|---|---|
| training time (sec) | 250.135 | 155.121 | 103.57 | 73.8073 | 6.20787 |
| pearson | 0.309651 | 0.321803 | 0.324854 | 0.314255 | 0.480298 |
| spearman | 0.294047 | 0.308723 | 0.318293 | 0.300591 | 0.480971 |
  • W2V (CBOW, negative sampling)
| attr | 1 worker (gensim) | 2 workers (gensim) | 4 workers (gensim) | 8 workers (gensim) | NVIDIA T4 (cusim) |
|---|---|---|---|---|---|
| training time (sec) | 176.923 | 100.369 | 69.7829 | 49.9274 | 9.90391 |
| pearson | 0.18772 | 0.193152 | 0.204509 | 0.187924 | 0.368202 |
| spearman | 0.243975 | 0.24587 | 0.260531 | 0.237441 | 0.358042 |
  • LDA (nytimes dataset from https://archive.ics.uci.edu/ml/datasets/bag+of+words)
    • I found that the workers setting in gensim's LdaMulticore does not work properly (it uses all cores on the instance anyway), so I only compared cusim on a single GPU against gensim on 8 vCPUs.
    • The quality of the models can be compared by looking at examples/cusim.topics.txt and examples/gensim.topics.txt.
| attr | gensim (8 vCPUs) | cusim (NVIDIA T4) |
|---|---|---|
| training time (sec) | 447.376 | 76.6972 |
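
To put the timings in perspective, the speedup of cusim over gensim on 8 workers is simply the ratio of the two training times. The sketch below only redoes that arithmetic on the numbers reported above; the dictionary layout and the function name are illustrative, not part of cusim.

```python
# (gensim on 8 workers / 8 vCPUs, cusim on one NVIDIA T4): training time in
# seconds, copied from the results reported above.
TRAINING_TIME_SEC = {
    "W2V skip gram, hierarchical softmax": (226.472, 16.162),
    "W2V skip gram, negative sampling": (146.23, 33.9173),
    "W2V CBOW, hierarchical softmax": (73.8073, 6.20787),
    "W2V CBOW, negative sampling": (49.9274, 9.90391),
    "LDA (nytimes)": (447.376, 76.6972),
}


def speedup(gensim_sec, cusim_sec):
    """How many times faster the cusim run finished than the gensim run."""
    return gensim_sec / cusim_sec


SPEEDUPS = {name: speedup(*times) for name, times in TRAINING_TIME_SEC.items()}

for name, factor in SPEEDUPS.items():
    print("{:<40} {:.1f}x".format(name, factor))
```

On these numbers the GPU run is roughly 4x to 14x faster than the 8-worker CPU run, with the largest gains for hierarchical softmax.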

Future tasks

  • support half precision
  • support multiple devices (a multi-device implementation of the LDA model should not be hard, while multi-device training of W2V may require more consideration)
  • implement other models such as FastText, BERT, etc.
  • contributions are always welcome

cusim's People

Contributors

js1010

cusim's Issues

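Compilation error with MSVC 2015 & 2017 under Windows

In https://github.com/js1010/cusim/blob/main/cpp/src/utils/ioutils.cc the logical operator "not" should be replaced by "!". MSVC does not recognize the alternative token "not" by default, so the build fails with the following errors: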

cpp/src/utils\ioutils.cc(25): error C2065: 'not': undeclared identifier
cpp/src/utils\ioutils.cc(25): error C2146: syntax error: missing ')' before identifier 'err_cmt'
cpp/src/utils\ioutils.cc(25): error C2059: syntax error: ')'
cpp/src/utils\ioutils.cc(25): error C2143: syntax error: missing ';' before 'return'
cpp/src/utils\ioutils.cc(39): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
cpp/src/utils\ioutils.cc(72): error C2065: 'not': undeclared identifier
cpp/src/utils\ioutils.cc(72): error C2146: syntax error: missing ')' before identifier 'read_lines'
cpp/src/utils\ioutils.cc(72): error C2059: syntax error: ')'
cpp/src/utils\ioutils.cc(72): error C2143: syntax error: missing ';' before 'return'
cpp/src/utils\ioutils.cc(95): error C2065: 'not': undeclared identifier
cpp/src/utils\ioutils.cc(95): error C2146: syntax error: missing ')' before identifier 'word_idmap_'
cpp/src/utils\ioutils.cc(95): error C2059: syntax error: ')'
cpp/src/utils\ioutils.cc(95): error C2143: syntax error: missing ';' before 'continue'
cpp/src/utils\ioutils.cc(102): warning C4267: '+=': conversion from 'size_t' to 'int', possible loss of data
cpp/src/utils\ioutils.cc(109): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
cpp/src/utils\ioutils.cc(154): error C2065: 'not': undeclared identifier
cpp/src/utils\ioutils.cc(154): error C2146: syntax error: missing ')' before identifier 'remain_lines_'
cpp/src/utils\ioutils.cc(154): error C2059: syntax error: ')'
cpp/src/utils\ioutils.cc(154): error C2146: syntax error: missing ';' before identifier 'fin_'
cpp/src/utils\ioutils.cc(163): warning C4267: '=': conversion from 'size_t' to 'int', possible loss of data
cpp/src/utils\ioutils.cc(173): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
cpp/src/utils\ioutils.cc(202): error C2065: 'not': undeclared identifier

unable to install cusim

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting cusim
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/50/0d/3f7cde3e169d0949342af84ed4d0ab02d094191d7b927add4ce592145c32/cusim-0.0.2.tar.gz (364 kB)
ERROR: The tar file (C:\Users\admin\AppData\Local\Temp\pip-unpack-ih0m5zfp\cusim-0.0.2.tar.gz) has a file (C:\Users\admin\AppData\Local\Temp\pip-install-ullfmob7\cusim\cusim/aux.py) trying to install outside target directory (C:\Users\admin\AppData\Local\Temp\pip-install-ullfmob7\cusim)
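
A likely cause, offered as an assumption and not a confirmed diagnosis: AUX is one of the device names that Windows reserves (along with CON, PRN, NUL, COM1 to COM9 and LPT1 to LPT9), so a file named aux.py cannot be created as an ordinary file there, whatever its extension. The sketch below flags such names in a list of archive paths. The function name is hypothetical, and nothing here is part of cusim or pip.

```python
import posixpath

# Device names reserved by Windows; they stay reserved even with an extension.
RESERVED = {"CON", "PRN", "AUX", "NUL"}
RESERVED |= {"COM{}".format(i) for i in range(1, 10)}
RESERVED |= {"LPT{}".format(i) for i in range(1, 10)}


def is_reserved_on_windows(path):
    """Return True if the final path component is a reserved Windows device name."""
    name = posixpath.basename(path.replace("\\", "/"))
    # Everything before the first dot decides whether the name is reserved.
    stem = name.split(".", 1)[0].rstrip(" ")
    return stem.upper() in RESERVED


# Only cusim/aux.py comes from the log above; the other path is illustrative.
paths = ["cusim/aux.py", "cusim/__init__.py"]
for path in paths:
    if is_reserved_on_windows(path):
        print("cannot be created on Windows:", path)
```

If this is the cause, renaming the module in the source distribution would be the fix; installing on Linux is unaffected.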

Unable to Install package

Running python setup.py install gives me this error:

Traceback (most recent call last):
  File "setup.py", line 23, in <module>
    from cuda_setup import CUDA, BUILDEXT
  File "/home/d/cusim/cuda_setup.py", line 260, in <module>
    assert CUDA is not None
AssertionError

However, CUDA does appear to be installed:

(cuda_envi) bash-4.2$ whereis cuda
cuda: /usr/local/cuda

bash-4.2$ echo $CUDA_HOME
/usr/local/cuda/
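
One way to narrow this down is to check what a build script would typically look for. The sketch below assumes the common approach on Linux (the CUDA_HOME environment variable first, then nvcc on PATH); it is not a copy of cusim's cuda_setup.py, and the function name and returned keys are hypothetical. Note that a /usr/local/cuda directory is not enough on its own if bin/nvcc is missing from it.

```python
import os
import shutil


def find_cuda(environ=None, which=shutil.which, exists=os.path.exists):
    """Return a dict with the CUDA home and nvcc path, or None if nvcc is not found."""
    environ = os.environ if environ is None else environ
    home = environ.get("CUDA_HOME")
    if home:
        nvcc = os.path.join(home, "bin", "nvcc")
    else:
        # Fall back to PATH; the CUDA home is then two levels above nvcc.
        nvcc = which("nvcc")
        if nvcc is None:
            return None
        home = os.path.dirname(os.path.dirname(nvcc))
    if not exists(nvcc):
        return None
    return {"home": home, "nvcc": nvcc}


if __name__ == "__main__":
    found = find_cuda()
    if found is None:
        print("nvcc not found: check CUDA_HOME and PATH")
    else:
        print("CUDA home: {home}\nnvcc: {nvcc}".format(**found))
```

Running this in the same shell and environment used for the install shows whether the compiler itself is reachable there.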

setuptools

When trying to install via pip install cusim-0.0.2.tar.gz, I got the following error:

no matching distribution found for setuptools>=1.3.2

However, I actually have setuptools==41.4.0 installed.

Can you help?
