Note that the code is currently in a rough state: remnants of related experiments are still scattered throughout, and substantial refactoring is needed.
I've started a heavy refactor in the v2 branch, but v2 does not currently build or run. Please post issues on GitHub and I'll try to take a look, although my time for this project is very limited.
word matrix batches
This code was developed as part of my Master's thesis research.
A paper describing the methods in this package is available on IEEE:
Efficient and accurate Word2Vec implementations in GPU and shared-memory multicore architectures
The work builds upon ideas presented in BIDMach and further refined in Intel's pWord2Vec.
This code supports:
- Both CPU and GPU matrix-based fast Word2Vec
- Both SkipGram and Hierarchical Softmax Word2Vec architectures
This code does not support:
- Distributed computing techniques (see pWord2Vec)
- CBOW Word2Vec architectures
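To give a feel for the "word matrix batches" idea behind the matrix-based approach, here is a minimal, illustrative sketch (plain Python, not code from this repo): several context vectors are gathered into one matrix so that many per-pair dot products collapse into a single matrix-vector product, which is the shape of work that BLAS and GPU kernels handle efficiently.

```python
def dot(u, v):
    # plain dot product of two equal-length vectors
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    # matrix-vector product: one dot product per row
    return [dot(row, v) for row in M]

# toy 3-dimensional embeddings (values are arbitrary)
target = [0.5, -1.0, 2.0]
contexts = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.5],
    [2.0, -1.0, 1.0],
]

# naive Word2Vec inner loop: one dot product per (target, context) pair
pairwise = [dot(target, c) for c in contexts]

# batched form: stack the context vectors into a matrix and do
# a single matrix-vector product instead
batched = matvec(contexts, target)

assert pairwise == batched
print(pairwise)  # → [0.5, 0.0, 4.0]
```

The two computations are mathematically identical; the gain comes purely from expressing the work as one large, regular operation rather than many small ones.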
The Makefile (somewhat hackishly) supports g++, CUDA, or ICPC.
Different source files are used for each compiler.
To compile, use make:
For g++:
make
For CUDA:
make cuda
For MKL support and ICPC:
make intel
Once built, you can use the scripts in /scripts to run the test programs:
Testing a g++- or icpc-compiled program:
./cpu.sh [num threads]
Testing CUDA (requires a GPU with compute capability 6.0):
./cuda.numCPUT-batchSize-batchesPerT.sh [num cpu threads] [batch size] [batches per thread]
To fetch the test data used by all programs:
./get-data.sh
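Putting the steps above together, an end-to-end session might look like the following (the argument values here are placeholders, not tuned recommendations, and the paths assume the scripts are run from the /scripts directory as shown above):

```shell
cd scripts

# fetch the test data first (shared by all builds)
./get-data.sh

# CPU build (g++ by default, or `make intel` for ICPC/MKL),
# then a test run with 4 threads
make -C ..
./cpu.sh 4

# CUDA build, then a test run with 4 CPU threads,
# a batch size of 11, and 6 batches per thread
make -C .. cuda
./cuda.numCPUT-batchSize-batchesPerT.sh 4 11 6
```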