
tsne-cuda's Introduction

TSNE-CUDA


This repo is an optimized CUDA version of the FIt-SNE algorithm with associated python modules. We find that our implementation of t-SNE can be up to 1200x faster than Sklearn, or up to 50x faster than Multicore-TSNE, when used with the right GPU. The paper describing our approach, as well as the results below, is available at https://arxiv.org/abs/1807.11824.

You can install binaries with anaconda for CUDA versions 10.1 and 10.2 using conda install tsnecuda -c conda-forge. Tsnecuda supports CUDA versions 9.0 and later through source installation; check out the wiki for up-to-date installation instructions: https://github.com/CannyLab/tsne-cuda/wiki/

Benchmarks

Simulated Data

Time taken compared to other state-of-the-art algorithms on synthetic datasets with 50 dimensions and four clusters, for varying numbers of points. Note the log scale on both the points and time axes, and that the x-axis is in thousands of points (thus, the values on the x-axis range from 1K to 10M points). Dashed lines on SkLearn, BH-TSNE, and MULTICORE-4 represent projected times. Projected scaling assumes an O(n log n) implementation.

MNIST

The performance of t-SNE-CUDA compared to other state-of-the-art implementations on the MNIST dataset. t-SNE-CUDA runs on the raw pixels of the MNIST dataset (60000 images x 768 dimensions) in under 7 seconds.

CIFAR

The performance of t-SNE-CUDA compared to other state-of-the-art implementations on the CIFAR-10 dataset. t-SNE-CUDA runs on the output of a classifier on the CIFAR-10 training set (50000 images x 1024 dimensions) in under 6 seconds. While we can run on the full pixel set in under 12 seconds, Euclidean distance is a poor metric in raw pixel space, leading to poor-quality embeddings.

Comparison of Embedding Quality

The quality of the embeddings produced by t-SNE-CUDA does not differ significantly from that of the state-of-the-art implementations. See below for a comparison of MNIST cluster outputs.

Left: MULTICORE-4 (501s), Middle: BH-TSNE (1156s), Right: t-SNE-CUDA (Ours, 6.98s).

Installation

To install our library, follow the installation instructions.

Run

Like many of the available libraries, the python wrappers follow the same API as sklearn.manifold.TSNE.

You can run it as follows:

from tsnecuda import TSNE
X_embedded = TSNE(n_components=2, perplexity=15, learning_rate=10).fit_transform(X)

We only support n_components=2. We currently have no plans to support more output dimensions, as this would require significant changes to the code to accommodate.
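As a slightly fuller sketch of the same sklearn-style workflow (random data stands in for a real dataset, and the parameter values are purely illustrative rather than recommendations):

import numpy as np
from tsnecuda import TSNE

# 10,000 random points in 50 dimensions stand in for real data.
X = np.random.rand(10000, 50).astype(np.float32)

tsne = TSNE(n_components=2,   # only 2 is supported (see the note above)
            perplexity=30,
            learning_rate=200,
            n_iter=1000,
            verbose=1)
X_embedded = tsne.fit_transform(X)
print(X_embedded.shape)       # (10000, 2)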

For more information on running the library, or using it as a C++ library, see the Python usage or C++ Usage sections of the wiki.

Citation

Please cite the corresponding paper if it was useful for your research:

@article{chan2019gpu,
  title={GPU accelerated t-distributed stochastic neighbor embedding},
  author={Chan, David M and Rao, Roshan and Huang, Forrest and Canny, John F},
  journal={Journal of Parallel and Distributed Computing},
  volume={131},
  pages={1--13},
  year={2019},
  publisher={Elsevier}
}

This library is built on top of the following technology; without this tech, none of this would be possible!

L. Van der Maaten's paper

FIt-SNE

Multicore-TSNE

BHTSNE

CUDA Utilities/Pairwise Distance

LONESTAR-GPU

FAISS

GTest

CXXopts

License

Our code is built using components from FAISS, the Lonestar GPU library, GTest, CXXopts, and OrangeOwl's CUDA utilities. Each portion of the code is governed by its respective license; our own code is governed by the BSD-3 license found in LICENSE.txt.

tsne-cuda's People

Contributors

davidmchan, huang4fstudio, rmrao


tsne-cuda's Issues

Infinite loop in Tree Building Kernel

This is a bug that has been present for a while in tsne-cuda and that we can't seem to track down. It appears to occur when the number of nodes allocated for the barnes-hut tree is exceeded by the tree building kernel. The way this is handled inside the code at the moment causes an infinite loop.

First of all, this should be impossible: unless there are two points in exactly the same position, it only takes 2N tree nodes to separate all the data perfectly. We've checked, and there aren't two points in exactly the same position.

Furthermore, increasing the number of allocated nodes only delays the problem and doesn't solve it. Printing out the number of used nodes shows that, in the iteration before the infinite loop, it is below 2N and well below the new, increased number of nodes.

The bug also appears to be data- and learning-rate-dependent. Some combinations of datasets, perplexities, and learning rates cause the bug, while others do not. It is at least partially deterministic, because the same combination of dataset and input parameters causes the bug in the same location. However, saving the state of the program (the current embedded positions, the input data, learning rate, etc.) and restarting it from that point does not seem to reproduce the bug.
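For anyone trying to capture the state right before the hang, a minimal sketch using the dump options that appear in the Python wrapper's signature (the exact value types and semantics of these arguments are an assumption here, not documented behaviour):

from tsnecuda import TSNE

# Periodically dump the current embedding so a hung run can be inspected.
# dump_points / dump_file / dump_interval are taken from the wrapper's
# signature; their exact semantics are assumed, not confirmed.
tsne = TSNE(n_components=2, verbose=1,
            dump_points=True, dump_file='embedding_dump.txt', dump_interval=50)
# embedding = tsne.fit_transform(X)  # X: a dataset/parameter combo that hangs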

details: CUDA error 8

Hi,
I keep getting the following error.
I am using CUDA 9.1 and I installed tsnecuda using conda.

Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::runL2Norm(faiss::gpu::Tensor<T, 2, true, TIndex>&, faiss::gpu::Tensor<T, 1, true, TIndex>&, bool, cudaStream_t) [with T = float; TVec = float4; TIndex = int; cudaStream_t = CUstream_st*] at /home/rmrao/miniconda3/conda-bld/tsnecuda_1538699373852/work/third_party/faiss/gpu/impl/L2Norm.cu:239; details: CUDA error 8

Illegal instruction in CUDA8.0

Hi,
I just installed tsne-cuda using anaconda:

conda install tsnecuda cuda80 -c cannylab # For CUDA8.0

My nvcc version is listed below:
(tsnecuda) [xfu7@c47 ~]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

However, using the following test script, I get an error:

>>> from tsnecuda import TSNE
>>> import numpy as np
>>> X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
>>> X_embedded = TSNE().fit_transform(X)

WARNING clustering 4 points to 2 centroids: please provide at least 78 training points
Illegal instruction

Can anyone help me figure out what the problem is?

Thanks a lot,

Python: TSNE().fit_transform endless loop with CUDA 9.2

I'm interested in looking at this implementation. To facilitate my exploration, I've created a docker image and am attempting to run it on an EC2 p3.xLarge instance (with nvidia-docker2 installed). The Dockerfile is attached, as is the script to install MKL in the docker image.

I had to modify the CMakeLists.txt to support the Nvidia Tesla v100 available on the p3 instance by adding the following to CMakeLists.txt:

63: -gencode=arch=compute_70,code=sm_70

After building and running the docker file, I try the simple example for python on the wiki:

>>> import numpy as np
>>> from tsnecuda import TSNE
>>> X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
>>> X_embedded = TSNE(perplexity=64.0, learning_rate=270).fit_transform(X)
WARNING clustering 4 points to 2 centroids: please provide at least 78 training points

fit_transform never returns a result. One of the CPU cores on the p3 is running python at 100% and nvidia-smi shows GPU utilization at 100%.

To reproduce this:

  1. Start a p3.xLarge EC2 instance
  2. Install nvidia-docker
  3. Clone the repo and modify CMakeLists.txt as described
  4. Save the attached Dockerfile and install-mkl.sh to /docker/test/
  5. From the repo directory:
  • docker build -t tsne-cuda-test:latest -f docker/test/Dockerfile .
  • docker run --init -it --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all tsne-cuda-test:latest
  6. Start a Python REPL and enter the example above

windows installation

So at this time, is this CUDA-accelerated t-SNE not supported on Windows? I've tried on my PC but hit errors when using the MKL installed by anaconda.

libfaiss.so error

When I try to execute, I get an OSError about libfaiss.so. I'm using Python 3.6.7 on Ubuntu 18.10 (CUDA 9.0).
I then tried the conda faiss package linux-64/faiss-gpu-1.5.0-py36_cuda9.0_1.tar.bz2 from here, but I couldn't find libfaiss.so in it.

tsne = TSNE(n_components=2, perplexity=1000,
                n_iter=1500)
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-5-bf7b6c5abfca> in <module>
     13     tsne = TSNE(n_components=2, perplexity=i,
     14                 learning_rate=200,
---> 15                 n_iter=j)
     16     tsne_data = tsne.fit_transform(x_scaled[:20000, :])
     17     tsne_df[f'Type: {i}_{j}'] = pd.DataFrame(np.vstack([tsne_data.T, y]).T, 

~/.local/lib/python3.6/site-packages/tsnecuda/TSNE.py in __init__(self, n_components, perplexity, early_exaggeration, learning_rate, num_neighbors, force_magnify_iters, pre_momentum, post_momentum, theta, epssq, n_iter, n_iter_without_progress, min_grad_norm, perplexity_epsilon, metric, init, return_style, num_snapshots, verbose, random_seed, use_interactive, viz_timeout, viz_server, dump_points, dump_file, dump_interval, print_interval, device)
    108         # self._faiss_lib = N.ctypeslib.load_library('libfaiss', self._path) # Load the ctypes library
    109         # self._gpufaiss_lib = N.ctypeslib.load_library('libgpufaiss', self._path) # Load the ctypes library
--> 110         self._lib = N.ctypeslib.load_library('libtsnecuda', self._path) # Load the ctypes library
    111 
    112         # Hook the BH T-SNE function

~/.local/lib/python3.6/site-packages/numpy/ctypeslib.py in load_library(libname, loader_path)
    150             if os.path.exists(libpath):
    151                 try:
--> 152                     return ctypes.cdll[libpath]
    153                 except OSError:
    154                     ## defective lib file

/usr/lib/python3.6/ctypes/__init__.py in __getitem__(self, name)
    421 
    422     def __getitem__(self, name):
--> 423         return getattr(self, name)
    424 
    425     def LoadLibrary(self, name):

/usr/lib/python3.6/ctypes/__init__.py in __getattr__(self, name)
    416         if name[0] == '_':
    417             raise AttributeError(name)
--> 418         dll = self._dlltype(name)
    419         setattr(self, name, dll)
    420         return dll

/usr/lib/python3.6/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
    346 
    347         if handle is None:
--> 348             self._handle = _dlopen(self._name, mode)
    349         else:
    350             self._handle = handle

OSError: libfaiss.so: cannot open shared object file: No such file or directory

terminate called after throwing an instance of 'thrust::system::system_error'

Hi,

thank you again for the conda option.

I installed:
"conda install tsnecuda cuda91 -c cannylab -c numba"

in a linux RHEL v7 with Cuda compilation tools, release 9.1, V9.1.85
the GPU hardware is 8 x Tesla P100

if more info is needed please let me know.

I am getting the following error:

Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

import numpy as np
from tsnecuda import TSNE
X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
X_embedded = TSNE(perplexity=64.0, learning_rate=270).fit_transform(X)
terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: invalid device function
Aborted (core dumped)

get nan Avg. Gradient when handling data with shape (2000,50), (2000,150)

efficiency test uncheck
test datapoint of (5000,512)
time:2.29
test datapoint of (5000,5)
time=2.06
test datapoint of (5000,1024)
time:2.48
test datapoint of (50,50)
time:0.66
test datapoint of (500,50)
test datapoint of (2000,50)
time: bug infinite iteration
test datapoint of (50000,50)
time:19.85s
test datapoint of (50000,256)
time:20.67s
test datapoint of (400,25)
time:0.8s

I notice that the program reports a nan Avg. Gradient when testing with data of shape (2000,50), and jumps into an infinite iteration when testing with data of shape (500,50).

Compilation error about -fPIC

First, thanks a lot for providing the code, and for the suggestions on our previous parallel_for failure in #11!
Error when compiling from source:

[ 38%] Building CXX object CMakeFiles/tsnecuda.dir/third_party/faiss/gpu/GpuAutoTune.cpp.o
Linking CXX shared library libtsnecuda.so
/usr/bin/ld: CMakeFiles/tsnecuda.dir/./tsnecuda_intermediate_link.o: relocation R_X86_64_32S against `__nv_module_id' can not be used when making a shared object; recompile with -fPIC
CMakeFiles/tsnecuda.dir/./tsnecuda_intermediate_link.o: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
make[2]: *** [libtsnecuda.so] Error 1
make[1]: *** [CMakeFiles/tsnecuda.dir/all] Error 2
make: *** [all] Error 2

Gcc version:

gcc (Ubuntu 4.8.5-4ubuntu8~14.04.2) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

GPU:

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "Tesla K40c"
CUDA Driver Version / Runtime Version 9.0 / 9.0
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory: 11440 MBytes (11995578368 bytes)
(15) Multiprocessors, (192) CUDA Cores/MP: 2880 CUDA Cores
GPU Max Clock rate: 745 MHz (0.75 GHz)
Memory Clock rate: 3004 Mhz
Memory Bus Width: 384-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Enabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 2 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "Tesla K40c"
CUDA Driver Version / Runtime Version 9.0 / 9.0
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory: 11440 MBytes (11995578368 bytes)
(15) Multiprocessors, (192) CUDA Cores/MP: 2880 CUDA Cores
GPU Max Clock rate: 745 MHz (0.75 GHz)
Memory Clock rate: 3004 Mhz
Memory Bus Width: 384-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Enabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 4 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Peer access from Tesla K40c (GPU0) -> Tesla K40c (GPU1) : Yes
Peer access from Tesla K40c (GPU1) -> Tesla K40c (GPU0) : Yes
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 2
Result = PASS

CUDA:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

install failure when building from source

Hi, when I try to compile this library from the source, I encounter the following error:

lib/libfaiss.so: undefined reference to `dsyev_'
lib/libfaiss.so: undefined reference to `sgeqrf_'
lib/libfaiss.so: undefined reference to `sorgqr_'
lib/libfaiss.so: undefined reference to `sgesvd_'
collect2: error: ld returned 1 exit status
CMakeFiles/tsne.dir/build.make:80394: recipe for target 'tsne' failed
make[2]: *** [tsne] Error 1
CMakeFiles/Makefile2:91: recipe for target 'CMakeFiles/tsne.dir/all' failed
make[1]: *** [CMakeFiles/tsne.dir/all] Error 2
Makefile:151: recipe for target 'all' failed
make: *** [all] Error 2

My environment is Ubuntu 16.04 with gcc 5.4.0. Could you give me some suggestions? Thanks in advance!

MKL Dir not Found

I am trying to build from source since I have a Tesla K80 and need to set some flags. I am trying to run the command:

[kk303@compute-a-16-167 build]$ rm -r CMake*
[kk303@compute-a-16-167 build]$ cmake .. -DBUILD_PYTHON=TRUE -DWITH_ZMQ=FALSE -DWITH_MKL=TRUE -DMKL_DIR=/home/kk303/intel/mkl
-- The C compiler identification is GNU 6.2.0
-- The CXX compiler identification is GNU 6.2.0
-- Check for working C compiler: /n/app/gcc/6.2.0/bin/cc
-- Check for working C compiler: /n/app/gcc/6.2.0/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /n/app/gcc/6.2.0/bin/c++
-- Check for working CXX compiler: /n/app/gcc/6.2.0/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Not building with ZMQ. Interactive visualization disabled. To build with ZMQ use -DWITH_ZMQ=ON
-- Not building standalone gpufaiss lib. To build gpufaiss standalone use -DWITH_FAISS_GPU_STANDALONE=ON
-- Found Git: /usr/bin/git (found version "1.8.3.1") 
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /n/app/cuda/9.0 (found version "9.0") 
-- Try OpenMP C flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Try OpenMP CXX flag = [-fopenmp]
-- Performing Test OpenMP_FLAG_DETECTED
-- Performing Test OpenMP_FLAG_DETECTED - Success
-- Found OpenMP: -fopenmp  
CMake Error at /n/app/cmake/3.7.1/share/cmake-3.7/Modules/FindPackageHandleStandardArgs.cmake:138 (message):
  Could NOT find MKL (missing: MKL_LIBRARIES)
Call Stack (most recent call first):
  /n/app/cmake/3.7.1/share/cmake-3.7/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  cmake/Modules/FindMKL.cmake:201 (find_package_handle_standard_args)
  CMakeLists.txt:89 (find_package)


-- Configuring incomplete, errors occurred!
See also "/home/kk303/cudatsne/tsne-cuda/build/CMakeFiles/CMakeOutput.log".
See also "/home/kk303/cudatsne/tsne-cuda/build/CMakeFiles/CMakeError.log".

MKL was installed and I have set up the environment variables:

[kk303@compute-a-16-167 build]$ echo $MKLROOT && echo $MKL_INCLUDE && echo $MKL_LIBRARIES
/home/kk303/intel/mkl
/home/kk303/intel/mkl/include
/home/kk303/intel/mkl/lib/intel64

Install documentation fix

instead of:

git submodules init
git submodules update

should be

git submodule init
git submodule update

no plural, at least with CentOS 7.5:

git --version
git version 1.8.3.1

python parameter list

Hi,
I am trying to decrease the minimum gradient norm in the TSNE function in Python, but I cannot find the name of the parameter. Do you have a list with the exact names of the parameters in the python wrapper?
Thank you,
Fabien
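Not an official answer, but one quick way to list the wrapper's keyword names locally is to print the constructor signature (the same signature is visible in tsnecuda/TSNE.py):

import inspect
from tsnecuda import TSNE

# Prints every keyword argument accepted by tsnecuda.TSNE.__init__.
print(inspect.signature(TSNE.__init__))

Judging from the signature quoted in a traceback elsewhere on this page, the keyword in question appears to be min_grad_norm, though that is inferred from the signature rather than from documentation.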

Running parallel instances

Hi. I am wondering if it is possible to make multiple calls to tsnecuda from the Python multiprocessing module? It would be really great if that could work, but I am guessing the current implementation maxes out threads for each run.

Currently, when I try to call tsnecuda from multiple CPU threads, I get this kind of error:

[I 19:42:01.068 NotebookApp] Kernel interrupted: 37453a58-efbd-4902-a13d-921186587399
E: Device warp size not supported.E: Device warp size not supported.

E: Device warp size not supported.
E: Device warp size not supported.
E: Device warp size not supported.
E: Device warp size not supported.
E: Device warp size not supported.

Thanks for the help.
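Not a confirmed fix for the warp-size errors above, but a sketch of the pattern that would normally be needed: separate processes rather than threads, each pinned to its own GPU. The device argument comes from the wrapper's signature, and the assumption that it selects the GPU index is exactly that, an assumption.

import multiprocessing as mp
import numpy as np

def embed(device_id, data):
    # Import inside the worker so each process creates its own CUDA context.
    from tsnecuda import TSNE
    # Assuming `device` selects the GPU index, this pins one run per card.
    return TSNE(n_components=2, device=device_id).fit_transform(data)

if __name__ == '__main__':
    mp.set_start_method('spawn')  # safer than fork when CUDA is involved
    datasets = [np.random.rand(50000, 50).astype(np.float32) for _ in range(2)]
    with mp.Pool(processes=2) as pool:
        results = pool.starmap(embed, list(enumerate(datasets)))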

wininst installation error

Hello,

I am trying to install tsnecuda with conda and I get an error related to wininst-14 (I'm not sure what this is). My CUDA version, according to nvcc --version, is Cuda compilation tools, release 8.0, V8.0.61.

I run the following command: conda install tsnecuda cuda80 -c cannylab

And I get the following output (error at the end):

Collecting package metadata: done
Solving environment: done

## Package Plan ##

  environment location: /home/frozenmiwe/anaconda3/envs/tsnecuda

  added / updated specs:
    - cuda80
    - tsnecuda


The following NEW packages will be INSTALLED:

  blas               pkgs/main/linux-64::blas-1.0-mkl
  cuda80             cannylab/linux-64::cuda80-1.0-0
  cudatoolkit        pkgs/free/linux-64::cudatoolkit-8.0-3
  intel-openmp       pkgs/main/linux-64::intel-openmp-2019.3-199
  libgfortran        pkgs/free/linux-64::libgfortran-3.0.0-1
  libgfortran-ng     pkgs/main/linux-64::libgfortran-ng-7.3.0-hdf63c60_0
  mkl                pkgs/main/linux-64::mkl-2019.3-199
  mkl_fft            pkgs/main/linux-64::mkl_fft-1.0.12-py36ha843d7b_0
  mkl_random         pkgs/main/linux-64::mkl_random-1.0.2-py36hd81dba3_0
  numpy              pkgs/main/linux-64::numpy-1.16.3-py36h7e9f1db_0
  numpy-base         pkgs/main/linux-64::numpy-base-1.16.3-py36hde5b4d6_0
  openblas           pkgs/free/linux-64::openblas-0.2.19-0
  tsnecuda           cannylab/linux-64::tsnecuda-0.1.1-py36_cuda80_0

The following packages will be UPDATED:

  certifi                                  2018.8.24-py35_1 --> 2019.3.9-py36_0
  openssl                                 1.0.2r-h7b6447c_0 --> 1.1.1b-h7b6447c_1
  packaging                                     17.1-py35_0 --> 19.0-py36_0
  pip                                         10.0.1-py35_0 --> 19.1.1-py36_0
  python                                   3.5.6-hc3d631a_0 --> 3.6.8-h0371630_0
  setuptools                                  40.2.0-py35_0 --> 41.0.1-py36_0
  six                                         1.11.0-py35_1 --> 1.12.0-py36_0
  wheel                                       0.31.1-py35_0 --> 0.33.4-py36_0


Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: failed

CondaVerificationError: The package for python located at /home/frozenmiwe/anaconda3/pkgs/python-3.6.8-h0371630_0
appears to be corrupted. The path 'lib/python3.6/distutils/command/wininst-14.0-amd64.exe'
specified in the package manifest cannot be found.

CondaVerificationError: The package for python located at /home/frozenmiwe/anaconda3/pkgs/python-3.6.8-h0371630_0
appears to be corrupted. The path 'lib/python3.6/distutils/command/wininst-14.0.exe'
specified in the package manifest cannot be found.

Do you have any ideas about how to fix this?

Thank you,

Miguel

parallel_for failed: no kernel image is available for execution on the device

Error

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  parallel_for failed: no kernel image is available for execution on the device
Aborted (core dumped)

Code to reproduce

from tsnecuda import TSNE
import numpy as np

N = int(1e5)
n_iter = 1000

X = np.random.rand(N, 2)
X_embedded = TSNE(n_components=2, verbose=1, n_iter=n_iter, num_neighbors=32).fit_transform(X)

print("Done")

Configuration

Tesla K80
Driver Version: 396.44

Compiled from source using

g++ (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

Looking at the conda_support branch, does this have something to do with gencode and code flags in CMakeLists.txt?

Request for R bindings

Hi. I would love to see an R binding/wrapper to use your code from R. Is that possible/planned?

Connection Failed

Hi, I used the recommended conda installation, but I get an HTTP 000 connection error:
[screenshot of the conda HTTP 000 error omitted]
I have tried several times, but this error always occurs.
What should I do? Thanks!

Compiling with Cuda 10?

Hi,

I am wondering if anyone has tried compiling with CUDA 10. I am getting a bunch of "redefinition" errors, which may or may not be related to trying to compile with CUDA 10.

/home/mustafa/tsne-cuda/src/include/kernels/bounding_box.h:17:21: error: redefinition of ‘volatile int stepd’
 extern __device__ volatile int stepd, bottomd, maxdepthd;
                     ^~~~~
/home/mustafa/tsne-cuda/src/include/kernels/bh_rep_forces.h:17:21: note: ‘volatile int stepd’ previously declared here
 extern __device__ volatile int stepd, bottomd, maxdepthd;
                     ^~~~~
/home/mustafa/tsne-cuda/src/include/kernels/bounding_box.h:17:28: error: redefinition of ‘volatile int bottomd’
 extern __device__ volatile int stepd, bottomd, maxdepthd;
                            ^~~~~~~
/home/mustafa/tsne-cuda/src/include/kernels/bh_rep_forces.h:17:28: note: ‘volatile int bottomd’ previously declared here
 extern __device__ volatile int stepd, bottomd, maxdepthd;
                            ^~~~~~~
/home/mustafa/tsne-cuda/src/include/kernels/bounding_box.h:17:37: error: redefinition of ‘volatile int maxdepthd’
 extern __device__ volatile int stepd, bottomd, maxdepthd;
                                     ^~~~~~~~~
/home/mustafa/tsne-cuda/src/include/kernels/bh_rep_forces.h:17:37: note: ‘volatile int maxdepthd’ previously declared here
 extern __device__ volatile int stepd, bottomd, maxdepthd;
                                     ^~~~~~~~~
/home/mustafa/tsne-cuda/src/include/kernels/bounding_box.h:18:17: error: redefinition of ‘unsigned int blkcntd’
 extern __device__ unsigned int blkcntd;

Request: tsnecuda for conda

Since conda provides various versions of faiss:

# CPU version only
conda install faiss-cpu -c pytorch
# Make sure you have CUDA installed before installing faiss-gpu, otherwise it falls back to CPU version
conda install faiss-gpu -c pytorch # [DEFAULT]For CUDA8.0
conda install faiss-gpu cuda90 -c pytorch # For CUDA9.0
conda install faiss-gpu cuda91 -c pytorch # For CUDA9.1
# cuda90/cuda91 shown above is a feature, it doesn't install CUDA for you.

It would be great if you could make a conda-specific tsnecuda package.

PackagesNotFoundError

Collecting package metadata (current_repodata.json): done
Solving environment: failed
Collecting package metadata (repodata.json): done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  • tsnecuda -> libgfortran
  • tsnecuda -> *[track_features=cuda91]

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

I tried installing tsnecuda on Colab using:
!conda install tsnecuda cuda100 -c cannylab

I have been receiving this error since 12:00 PM IST, 8th July 2019. It was working fine 1-2 weeks ago. (I work in a temporary virtual env set up on Colab using miniconda.)

!conda install cuda100 -c cannylab works fine. No errors.

I encounter strange output when using this tsne to visualize data.

Hi sir,

To be honest, I don't really understand how t-SNE works internally. I just use the code below to visualize features extracted by a CNN. The feature is a numpy array with dimensions n x 9216 (n x 6x6x256).

tsne_point = TSNE().fit_transform(feature)

Then it first prints the message "WARNING clustering 430 points to 20 centroids: please provide at least 780 training points". Then it prints a lot of lines like the ones below, about 430 * 6 lines in total.

Can anyone help me? Thanks a lot!

578, -1
578, -1
578, -1
578, -1
578, -1
578, -1
570, -1
570, -1
570, -1
570, -1
570, -1
570, -1
585, -1
585, -1
585, -1
585, -1
585, -1
585, -1
576, -1
576, -1
576, -1
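A guess rather than a confirmed fix for the report above: with only a few hundred points, the k-NN / clustering stage may simply be starved for data, so keeping perplexity and the neighbor count well below the number of samples is worth trying. A sketch (num_neighbors and perplexity are taken from the wrapper's signature; the chosen values are arbitrary):

import numpy as np
from tsnecuda import TSNE

# Random stand-in for the (430, 9216) CNN feature matrix described above.
feature = np.random.rand(430, 9216).astype(np.float32)

# Keep perplexity and neighbor count small relative to the 430 points.
tsne_point = TSNE(perplexity=15, num_neighbors=32).fit_transform(feature)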

Python wrapper error: parallel_for failed: out of memory

Error

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  parallel_for failed: out of memory
Aborted (core dumped)

Code to reproduce

from tsnecuda import TSNE
import numpy as np

# Just a data limit
N = 1000000
iter = 1000

X = np.random.rand(N, 2)

X_embedded = TSNE(n_components=2, verbose=1, n_iter=iter).fit_transform(X)

# Print the frame
print(X_embedded)

Configuration

GPU:
Tesla V100-SXM2-16gb
NVIDIA Driver Version: 396.37

Compiled from source with

g++ (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
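A possible stop-gap for the out-of-memory error above, while the memory footprint is investigated: embed a random subsample that fits on the card (other reports in this tracker run a few hundred thousand points on a 12 GB GPU). A sketch:

import numpy as np
from tsnecuda import TSNE

N = 1000000
X = np.random.rand(N, 2).astype(np.float32)

# Embed a 300k-point random subsample instead of the full 1M points.
idx = np.random.choice(N, size=300000, replace=False)
X_embedded = TSNE(n_components=2, verbose=1).fit_transform(X[idx])
print(X_embedded.shape)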

Compilation fails on 7.5 compute capability?

When I try to compile for a 7.5 compute capability with CUDA 10.0 or 10.1 I get errors about Unknown __CUDA_ARCH__

Here is a big dump showing context:

make[2]: *** [CMakeFiles/tsnecuda.dir/third_party/faiss/gpu/impl/tsnecuda_generated_L2Select.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
[ 26%] Building NVCC (Device) object CMakeFiles/tsnecuda.dir/third_party/faiss/gpu/impl/tsnecuda_generated_IVFUtilsSelect1.cu.o
In file included from /app/third_party/faiss/gpu/impl/L2Norm.cu:13:0:
/app/third_party/faiss/gpu/impl/../utils/DeviceDefs.cuh:18:2: error: #error Unknown __CUDA_ARCH__; please define parameters for compute capability
 #error Unknown __CUDA_ARCH__; please define parameters for compute capability
  ^
CMake Error at tsnecuda_generated_L2Norm.cu.o.cmake:203 (message):
  Error generating
  /app/build/CMakeFiles/tsnecuda.dir/third_party/faiss/gpu/impl/./tsnecuda_generated_L2Norm.cu.o


make[2]: *** [CMakeFiles/tsnecuda.dir/third_party/faiss/gpu/impl/tsnecuda_generated_L2Norm.cu.o] Error 1
In file included from /app/third_party/faiss/gpu/impl/L2Norm.cu:13:0:
/app/third_party/faiss/gpu/impl/../utils/DeviceDefs.cuh:18:2: error: #error Unknown __CUDA_ARCH__; please define parameters for compute capability
 #error Unknown __CUDA_ARCH__; please define parameters for compute capability
  ^
CMake Error at tsnecuda_generated_L2Norm.cu.o.cmake:203 (message):
  Error generating
  /app/build/CMakeFiles/tsnecuda.dir/third_party/faiss/gpu/impl/./tsnecuda_generated_L2Norm.cu.o


make[2]: *** [CMakeFiles/tsnecuda.dir/third_party/faiss/gpu/impl/tsnecuda_generated_L2Norm.cu.o] Error 1
In file included from /app/third_party/faiss/gpu/impl/PQCodeDistances.cu:15:0:
/app/third_party/faiss/gpu/impl/../utils/DeviceDefs.cuh:18:2: error: #error Unknown __CUDA_ARCH__; please define parameters for compute capability
 #error Unknown __CUDA_ARCH__; please define parameters for compute capability
  ^
CMake Error at tsnecuda_generated_PQCodeDistances.cu.o.cmake:203 (message):
  Error generating
  /app/build/CMakeFiles/tsnecuda.dir/third_party/faiss/gpu/impl/./tsnecuda_generated_PQCodeDistances.cu.o


make[2]: *** [CMakeFiles/tsnecuda.dir/third_party/faiss/gpu/impl/tsnecuda_generated_PQCodeDistances.cu.o] Error 1
[ 26%] Linking CXX executable stl_logging_unittest
[ 26%] Built target stl_logging_unittest
In file included from /app/third_party/faiss/gpu/impl/PQCodeDistances.cu:15:0:
/app/third_party/faiss/gpu/impl/../utils/DeviceDefs.cuh:18:2: error: #error Unknown __CUDA_ARCH__; please define parameters for compute capability
 #error Unknown __CUDA_ARCH__; please define parameters for compute capability
  ^
CMake Error at tsnecuda_generated_PQCodeDistances.cu.o.cmake:203 (message):
  Error generating
  /app/build/CMakeFiles/tsnecuda.dir/third_party/faiss/gpu/impl/./tsnecuda_generated_PQCodeDistances.cu.o


make[2]: *** [CMakeFiles/tsnecuda.dir/third_party/faiss/gpu/impl/tsnecuda_generated_PQCodeDistances.cu.o] Error 1
[ 26%] Linking CXX static library ../../../lib/libgmock.a
[ 26%] Built target gmock
Scanning dependencies of target gmock_main
In file included from /app/third_party/faiss/gpu/impl/../utils/WarpShuffles.cuh:13:0,
                 from /app/third_party/faiss/gpu/impl/../utils/Pair.cuh:14,
                 from /app/third_party/faiss/gpu/impl/../utils/Limits.cuh:13,
                 from /app/third_party/faiss/gpu/impl/Distance.cu:17:
/app/third_party/faiss/gpu/impl/../utils/DeviceDefs.cuh:18:2: error: #error Unknown __CUDA_ARCH__; please define parameters for compute capability
 #error Unknown __CUDA_ARCH__; please define parameters for compute capability
  ^
[ 27%] Building CXX object third_party/gtest/googlemock/CMakeFiles/gmock_main.dir/src/gmock_main.cc.o
In file included from /app/third_party/faiss/gpu/impl/IVFFlatScan.cu:14:0:
/app/third_party/faiss/gpu/impl/../utils/DeviceDefs.cuh:18:2: error: #error Unknown __CUDA_ARCH__; please define parameters for compute capability
 #error Unknown __CUDA_ARCH__; please define parameters for compute capability
  ^
CMake Error at tsnecuda_generated_IVFFlatScan.cu.o.cmake:203 (message):
  Error generating
  /app/build/CMakeFiles/tsnecuda.dir/third_party/faiss/gpu/impl/./tsnecuda_generated_IVFFlatScan.cu.o


make[2]: *** [CMakeFiles/tsnecuda.dir/third_party/faiss/gpu/impl/tsnecuda_generated_IVFFlatScan.cu.o] Error 1
CMake Error at tsnecuda_generated_Distance.cu.o.cmake:203 (message):
  Error generating
  /app/build/CMakeFiles/tsnecuda.dir/third_party/faiss/gpu/impl/./tsnecuda_generated_Distance.cu.o


make[2]: *** [CMakeFiles/tsnecuda.dir/third_party/faiss/gpu/impl/tsnecuda_generated_Distance.cu.o] Error 1
In file included from /app/third_party/faiss/gpu/impl/../utils/WarpShuffles.cuh:13:0,
                 from /app/third_party/faiss/gpu/impl/../utils/Pair.cuh:14,
                 from /app/third_party/faiss/gpu/impl/../utils/Limits.cuh:13,
                 from /app/third_party/faiss/gpu/impl/Distance.cu:17:
/app/third_party/faiss/gpu/impl/../utils/DeviceDefs.cuh:18:2: error: #error Unknown __CUDA_ARCH__; please define parameters for compute capability
 #error Unknown __CUDA_ARCH__; please define parameters for compute capability

I tried adding these lines in the CMakelists.txt file:

                    -gencode=arch=compute_75,code=sm_75
                    -gencode=arch=compute_75,code=compute_75

I tried both of them by themselves as well as together.

Are you all able to compile for this architecture? Maybe it's just something about my env or some other arg that isn't correct?

Keep old version available on Conda

It seems that the old version of this package is not available on Conda anymore? Let me know if I'm mistaken.

I now have some environments that aren't reproducible as a result. It's a good practice (and somewhat a main feature of Conda) to have all major versions available. Maybe you have a good reason not to, though.

Related: #42

OSError: no file with expected extension

Hi,
On Windows, I get the following error at run time:

Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

from tsnecuda import TSNE
import numpy as np
X = np.random.random((5000, 50))
TSNE(verbose=1).fit_transform(X)
Traceback (most recent call last):
File "", line 1, in
File "C:\ProgramData\Anaconda3\lib\site-packages\tsnecuda\TSNE.py", line 110, in init
self._lib = N.ctypeslib.load_library('libtsnecuda', self._path) # Load the ctypes library
File "C:\Users\fabie\AppData\Roaming\Python\Python36\site-packages\numpy\ctypeslib.py", line 155, in load_library
raise OSError("no file with expected extension")
OSError: no file with expected extension

the error appear at the line of 110 of TNSE.py
with "_path" as the path where is libtsnecuda.so :

N.ctypeslib.load_library('libtsnecuda', _path)
OSError: no file with expected extension

Any idea of what is happening ?

Thanks

Asking support for Mac OS

Hi, I tried to compile your code on macOS, but without success. The problem is that the default Xcode clang compiler does not support OpenMP.

It seems third_party dependencies rely on OpenMP and report the following error:

/Users/tomheaven/Documents/libraries/tsne-cuda/third_party/faiss/IndexHNSW.h:12:10: fatal error: 'omp.h' file not found
#include <omp.h>

As we only want the CUDA acceleration, can there be an option to disable that feature so the code will compile smoothly?

Thanks!

Still have parallel compilation issues

There are parallel compilation issues in the conda build process; some were noted in issue #42.

One strong possibility is that, when I compile the code on my machine, I usually disable compilation for GPU architectures other than my actual GPU architecture. This is not done (obviously) during conda-build, since we want to build for all architectures. I also suspect most users won't try this, since we don't have an option to do so (you just have to delete the other architectures from the CMakeLists.txt file).

Libfaiss error

Hi,

I get the following error at run time:
OSError: libfaiss.so: cannot open shared object file: No such file or directory

Even after a manual install of FAISS. Any idea why this is happening?

install error using pip

When I tried to install tsne-cuda using 'pip install tsne-cuda', it threw this error:

Could not find a version that satisfies the requirement tsne-cuda (from versions: )
No matching distribution found for tsne-cuda

GLIBC 2.27 error

Ubuntu16.04
Python3.6.8

Getting this error while running tsnecuda:

tsne_model = TSNE(n_iter=2500)
File "/home/abinashsinha330/anaconda3/lib/python3.6/site-packages/tsnecuda/TSNE.py", line 110, in init
self._lib = N.ctypeslib.load_library('libtsnecuda', self._path) # Load the ctypes library
File "/home/abinashsinha330/anaconda3/lib/python3.6/site-packages/numpy/ctypeslib.py", line 150, in load_library
return ctypes.cdll[libpath]
File "/home/abinashsinha330/anaconda3/lib/python3.6/ctypes/init.py", line 423, in getitem
return getattr(self, name)
File "/home/abinashsinha330/anaconda3/lib/python3.6/ctypes/init.py", line 418, in getattr
dll = self._dlltype(name)
File "/home/abinashsinha330/anaconda3/lib/python3.6/ctypes/init.py", line 348, in init
self._handle = _dlopen(self._name, mode)
OSError: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.27' not found (required by /home/abinashsinha330/anaconda3/lib/python3.6/site-packages/tsnecuda/libtsnecuda.so)

I checked that my CUDA version is 10.0.130, so I used the command:
"conda install tsnecuda cuda100 -c cannylab" to install tsnecuda

I have glibc version 2.23 on Ubuntu 16.04. Please help me with this.

problem with NaiveTSNE.py

Hi,
I just want to test the performance of the naive O(n^2) version (as in the original paper). However, it seems the Python wrapper for the naive version currently has some problems. I fixed the bugs according to my understanding, as shown below. The code then worked for small datasets (15000 x 30 MNIST data), but the resulting graph is not correct. I am wondering whether I made mistakes in the fix.

Another issue is that the naive O(n^2) version currently only supports a small amount of data; it cannot get through the full MNIST dataset (60000 x 30, with PCA dimensionality reduction from 28*28 to 30 dimensions).

Also, how can I set the specific GPU I want to run the code on? (See the sketch after the diff below.)

diff --git a/src/python/tsnecuda/NaiveTSNE.py b/src/python/tsnecuda/NaiveTSNE.py
index a9b8b00..c060273 100644
--- a/src/python/tsnecuda/NaiveTSNE.py
+++ b/src/python/tsnecuda/NaiveTSNE.py
@@ -72,14 +72,14 @@ class NaiveTSNE(object):
         else:
             self.metric = metric
         self.verbose = int(verbose)
-        if random_seed is not None:
-            self.random_seed = int(random_seed)
-        else:
-            self.random_seed = os.urandom()
+        # if random_seed is not None:
+        #     self.random_seed = int(random_seed)
+        # else:
+        #     self.random_seed = os.urandom()
 
         # Build the hooks for the Naive T-SNE library
         self._path = pkg_resources.resource_filename('tsnecuda','') # Load from current location
-        self._lib = N.ctypeslib.load_library('libtsnecuda', _path) # Load the ctypes library
+        self._lib = N.ctypeslib.load_library('libtsnecuda', self._path) # Load the ctypes library
 
         # Hook the naive T-SNE function
         self._lib.pymodule_naive_tsne.restype = None
@@ -92,7 +92,7 @@ class NaiveTSNE(object):
                                   ctypes.c_float, # Learning Rate
                                   ctypes.c_int, # n_iter
                                   ctypes.c_int, # n_iter w/o progress
-                                  ctypres.c_float # min_norm
+                                  ctypes.c_float # min_norm
                                 ]
 
         # Set up the attributed
@@ -112,15 +112,15 @@ class NaiveTSNE(object):
 
         X = N.require(X, N.float32, ['F_CONTIGUOUS', 'ALIGNED'])
         self.embedding_ = N.empty(shape=(X.shape[0],self.n_components))
-        self.embedding_ = N.require(results, N.float32, ['F_CONTIGUOUS', 'ALIGNED', 'WRITEABLE'])
+        self.embedding_ = N.require(self.embedding_, N.float32, ['F_CONTIGUOUS', 'ALIGNED', 'WRITEABLE'])
         self._lib.pymodule_naive_tsne(X, self.embedding_, X.ctypes.shape, 
-                                        c_int(self.n_components), 
-                                        c_float(self.perplexity), 
-                                        c_float(self.early_exaggeration),
-                                        c_float(self.learning_rate), 
-                                        c_int(self.n_iter),
-                                        c_int(self.n_iter_without_progress),
-                                        c_float(self.min_grad_norm))
+                                        ctypes.c_int(self.n_components), 
+                                        ctypes.c_float(self.perplexity), 
+                                        ctypes.c_float(self.early_exaggeration),
+                                        ctypes.c_float(self.learning_rate), 
+                                        ctypes.c_int(self.n_iter),
+                                        ctypes.c_int(self.n_iter_without_progress),
+                                        ctypes.c_float(self.min_grad_norm))
         return self.embedding_
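Regarding the GPU-selection question above: not an official answer, but a sketch of the standard CUDA environment-variable approach (the wrapper's signature also lists a device argument, which may be the intended knob; its behaviour is not confirmed here):

import os

# Restrict the process to a single physical GPU before any CUDA context is
# created. "1" (the second card) is only an example.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

from tsnecuda import TSNE  # import after setting the variable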

Request: provide logging for phases before iterations start

It would be useful if there were some sort of logging during the initialization, KNN search, etc., phases before the iterations start. With large datasets (e.g., millions of rows) there can be significant lag time with no reporting, and then it becomes a question whether any progress is actually being made.

On that note, do you have any theoretical or practical guidance on the maximum number of rows and/or columns that the algorithm can handle?
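As a stop-gap until phase-level logging exists, a sketch of a workaround: time the whole call and rely on verbose=1, which only starts printing once the optimization iterations begin, so any initial silence corresponds to the preprocessing phases.

import time
import numpy as np
from tsnecuda import TSNE

# Large random dataset standing in for real data.
X = np.random.rand(2000000, 50).astype(np.float32)

start = time.time()
embedding = TSNE(verbose=1).fit_transform(X)
print("total wall-clock time: %.1f s" % (time.time() - start))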

Gradient Norm: nan

I noticed that some datasets are causing Gradient Norm: nan and causing the algorithm to fail, or at least to produce nonsense results.

I have two test files. One causes the error and one does not. The one that does not is mostly identical to the one that does, except it is missing ~1500 rows that were sampled out of it at a regular interval.

File 1 (fails): https://drive.google.com/open?id=12tzAyo9SIt7xcSLn2RWTxc0janATpoZk
File 2 (succeeds): https://drive.google.com/open?id=1VSGkMl6hPzTmXGVmGa4anMvtRmqc1lSf

Strangely, if I run just the row diff as the input, it succeeds. So I'm not sure what it is about the data in the failure file that causes the failure. I also tried shuffling the rows randomly once and it still failed.

I'm using the current build with the major bug fixes but a small number of commits behind.

Curious to hear if you can repro and hopefully find the issue.
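A few quick input checks that often explain NaN gradients, offered as a guess rather than a diagnosis of this particular dataset (the file path and comma delimiter below are hypothetical):

import numpy as np

# Hypothetical local copy of "File 1" above, assumed comma-separated.
X = np.loadtxt("file1_fails.csv", delimiter=",")

print("non-finite entries:", np.count_nonzero(~np.isfinite(X)))
print("duplicate rows:", X.shape[0] - np.unique(X, axis=0).shape[0])
print("per-column std (min/max):", X.std(axis=0).min(), X.std(axis=0).max())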

Results not consistent with previous build?

I can't seem to produce good embeddings with v2. Check out the difference on the same dataset with an old build versus the new one:

V2 build:

[embedding plot from the V2 build omitted]

V2 build logs:

[Step 0] Avg. Gradient Norm: 0.000577044

[Step 50] Avg. Gradient Norm: 0.0707002

[Step 100] Avg. Gradient Norm: 0.0651156

[Step 150] Avg. Gradient Norm: 0.0585839

[Step 200] Avg. Gradient Norm: 0.0517932

[Step 250] Avg. Gradient Norm: 0.0723898

[Step 300] Avg. Gradient Norm: 0.0590655

[Step 350] Avg. Gradient Norm: 0.0462633

[Step 400] Avg. Gradient Norm: 0.0449879

[Step 450] Avg. Gradient Norm: 0.0418904

[Step 500] Avg. Gradient Norm: 0.0390911

[Step 550] Avg. Gradient Norm: 0.0356273

[Step 600] Avg. Gradient Norm: 0.0310828

[Step 650] Avg. Gradient Norm: 0.0245189

[Step 700] Avg. Gradient Norm: 0.0188884

[Step 750] Avg. Gradient Norm: 0.017637

[Step 800] Avg. Gradient Norm: 0.0151654

[Step 850] Avg. Gradient Norm: 0.0118614

[Step 900] Avg. Gradient Norm: 0.0126098

[Step 950] Avg. Gradient Norm: 0.0119634

Version efa2098

[embedding plot from the old build omitted]

Old version logs:

[Step 0] Avg. Gradient Norm: 5.4169e-06

[Step 50] Avg. Gradient Norm: 0.0909963

[Step 100] Avg. Gradient Norm: 0.0573804

[Step 150] Avg. Gradient Norm: 0.0432024

[Step 200] Avg. Gradient Norm: 0.0329338

[Step 250] Avg. Gradient Norm: 0.0818289

[Step 300] Avg. Gradient Norm: 0.0811468

[Step 350] Avg. Gradient Norm: 0.0756753

[Step 400] Avg. Gradient Norm: 0.0717059

[Step 450] Avg. Gradient Norm: 0.0687041

[Step 500] Avg. Gradient Norm: 0.0663398

[Step 550] Avg. Gradient Norm: 0.0642467

[Step 600] Avg. Gradient Norm: 0.0623364

[Step 650] Avg. Gradient Norm: 0.0605533

[Step 700] Avg. Gradient Norm: 0.0588856

[Step 750] Avg. Gradient Norm: 0.0572197

[Step 800] Avg. Gradient Norm: 0.0553893

[Step 850] Avg. Gradient Norm: 0.05345

[Step 900] Avg. Gradient Norm: 0.051608

[Step 950] Avg. Gradient Norm: 0.0497179

The reason the old build is just a git tag is that I built it many months ago when master was at that point. V2 is compiled from source from the V2 tag. Both were compiled in an Ubuntu 16.04 / CUDA 9.0 environment, but not the exact same environment for both.

The settings for the run are the same between builds. The differences from the default settings are early exaggeration = 4, learning rate = 20000, n neighbors = 100. The dataset is 500k rows and 14 columns. I can provide it if you want to try it out; that might be the simplest way to judge whether it's some build issue on my end.

Do you have any standard datasets that you run through the algorithm as a quality check? Have you noticed any differences between your old build and the new one? If any, I'm sure they aren't this bad. Is it possible that the characteristics of this data are bringing out an edge case?
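For reference, my reading of the settings above expressed as a call sketch (parameter names are taken from the wrapper's signature; this is not the exact script used):

from tsnecuda import TSNE

# Reported configuration: everything else left at its default.
tsne = TSNE(early_exaggeration=4, learning_rate=20000, num_neighbors=100,
            verbose=1)
# embedding = tsne.fit_transform(data)  # data: ~500k rows x 14 columns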

what are the GPU memory requirements?

The sample with shape (300000, 6) is filling up the whole 12 GB of my Titan Xp. Is that the maximum number of data points I can use with this setup, or is there some room to optimize it?
Thanks, Arman
P.S.
By the way, it is extremely fast on 300k points: 5-10 sec.

Python: tsnecuda.TSNE() error, illegal memory access

I'm having some problems executing tsnecuda.TSNE() on my dataset.
Here is a link to a GitHub repo with extended information and the data needed to reproduce the error: https://github.com/miqueleg/cuda-tSNE_problem
As a summary:
When I execute the TSNE function, an error is displayed:

GPUassert: an illegal memory access was encountered /home/rmrao/miniconda3/conda-bld/tsnecuda_1538693499277/work/src/util/cuda_utils.cu 55

System information:

OS: CentOS Linux 7
GPU: nvidia GeForce GTX980
Python 3.6.3
faiss-cpu 1.4.0 py36_cuda0.0_1 pytorch
faiss-gpu 1.4.0 py36_cuda8.0.61_1 pytorch
tsnecuda 0.1.1 py36_cuda80_0 [cuda80] cannylab
tsnecuda 0.1.1

I hope you can help me!

Thanks,
Miquel Estévez Gay
CompBioLab, UdG

projection to dimensions other than 2 raises an error

import numpy as np
from tsnecuda import TSNE
X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
X_embedded = TSNE(method='naive', n_components=10).fit_transform(X)
X_embedded.shape

The above is the code from the basic usage; below is the error report:

TypeError Traceback (most recent call last)
in
2 from tsnecuda import TSNE
3 X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
----> 4 X_embedded = TSNE(method='naive', n_components=10).fit_transform(X)
5 X_embedded.shape

TypeError: __init__() got an unexpected keyword argument 'method'

Test suite compilation errors

Steps to reproduce
Enable test in the Makefile,

cmake .. -DBUILD_PYTHON=TRUE -DWITH_ZMQ=FALSE -DBUILD_TEST=TRUE -DWITH_MKL=FALSE

Compile with,

make

System config

g++ (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609

Python 3.6.4 :: Anaconda, Inc.

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

Error

[ 34%] Building NVCC (Device) object CMakeFiles/tsne.dir/third_party/faiss/gpu/utils/tsne_generated_WarpSelectFloat.cu.o
/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(277): error: name followed by "::" must be a class or namespace name

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(277): error: expected a ";"

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(278): error: identifier "opt" is undefined

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(284): error: name followed by "::" must be a class or namespace name

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(330): error: name followed by "::" must be a class or namespace name

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(330): error: expected a ";"

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(331): error: identifier "opt" is undefined

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(337): error: name followed by "::" must be a class or namespace name

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(365): error: name followed by "::" must be a class or namespace name

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(365): error: expected a ";"

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(366): error: identifier "opt" is undefined

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(372): error: name followed by "::" must be a class or namespace name

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(407): error: name followed by "::" must be a class or namespace name

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(407): error: expected a ";"

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(408): error: identifier "opt" is undefined

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(414): error: name followed by "::" must be a class or namespace name

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(389): warning: variable "num_columns" was declared but never referenced

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(390): warning: variable "num_rows" was declared but never referenced

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(391): warning: variable "num_channels" was declared but never referenced

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(449): error: name followed by "::" must be a class or namespace name

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(449): error: expected a ";"

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(450): error: identifier "opt" is undefined

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(456): error: name followed by "::" must be a class or namespace name

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(431): warning: variable "num_columns" was declared but never referenced

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(432): warning: variable "num_rows" was declared but never referenced

/home/ubuntu/tsne-cuda/src/include/test/test_tsne.h(433): warning: variable "num_channels" was declared but never referenced

20 errors detected in the compilation of "/tmp/tmpxft_00003529_00000000-7_test.compute_61.cpp1.ii".
CMake Error at tsne_test_generated_test.cu.o.cmake:262 (message):
  Error generating file
  /home/ubuntu/tsne-cuda/build/CMakeFiles/tsne_test.dir/src/test/./tsne_test_generated_test.cu.o


CMakeFiles/tsne_test.dir/build.make:665: recipe for target 'CMakeFiles/tsne_test.dir/src/test/tsne_test_generated_test.cu.o' failed
make[2]: *** [CMakeFiles/tsne_test.dir/src/test/tsne_test_generated_test.cu.o] Error 1
CMakeFiles/Makefile2:219: recipe for target 'CMakeFiles/tsne_test.dir/all' failed
make[1]: *** [CMakeFiles/tsne_test.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

unstable execution process

Hi, I used your latest conda method to install tsne-cuda ("conda install tsnecuda -c cannylab") and my CUDA version is 9.0. The installation went well, but when I tested the code something weird happened.
I randomly created a 40000 x 100 data matrix and used a for loop to compute multiple projections with the same matrix. As you can see, the first three iterations worked fine, but the fourth never returned a result.

[screenshot of the stalled fourth iteration omitted]

Actually use random_seed argument

The Python API has a random_seed attribute on the tsnecuda.TSNE class, but it's ignored.

The random seed is tracked as an option within the tsnecuda implementation, but it looks hard-coded to a time-based seed instead. Is there a good reason for this?

I might implement and PR this change unless you indicate otherwise or do it first.
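A quick determinism check of the kind that would exercise such a change, as a sketch (whether bit-identical GPU output is achievable at all is a separate question, so treat the comparison as indicative only):

import numpy as np
from tsnecuda import TSNE

X = np.random.rand(10000, 50).astype(np.float32)

# With random_seed honored, two runs with the same seed should produce
# (approximately) the same embedding; per this report they currently do not.
a = TSNE(random_seed=42).fit_transform(X)
b = TSNE(random_seed=42).fit_transform(X)
print(np.allclose(a, b))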

question about performance benchmark in README.md

Hi,
Great works!
I am wondering, for the benchmark calculations on the MNIST and CIFAR-10 datasets described in README.md, whether they were done after applying PCA dimension reduction to 2 dim or not, since the BH algorithm only supports doing so. The term "runs on the raw pixels" is a little bit confusing.

Thanks a lot.

Critical Execution Error

Steps to reproduce
Compile tsnecuda from source and launch tsne.

Configuration

NVIDIA Tools:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

GPU:
Tesla V100-SXM2-16gb
NVIDIA Driver Version: 396.37

Error

Starting TSNE calculation with 5000 points.
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  parallel_for failed: invalid device function
Aborted (core dumped)
