Comments (16)
I think the problem is that that should be sigmas[k] instead, but I'll have to trace through the whole thing to be sure.
from umap.
Thanks for the detailed bug report! This may be the same as issue #33. I believe the latest master version should have fixed this, but I haven't put out a release on pip yet (waiting to roll a few more bug fixes and features together). If you have time to clone the master branch from GitHub, install it (after removing the old pip version), and try that instead, I would appreciate it. I believe it should fix the issue you are seeing, and if it doesn't then I clearly have a little more work to do.
from umap.
@lmcinnes thanks for your quick response. It seems the problem has been resolved on master. I've run several dozen runs since reinstalling and haven't hit the division by zero error, so I think it should be safe to close this issue. Thanks again for this great work!
from umap.
Thanks for checking that out so fast.
from umap.
I am running into zero division errors every couple of runs. I am using CentOS and Spyder (Python 3.6).
runfile('/home/anaconda3/envs/ML/scripts/beta_projects/umap_test.py', wdir='/home/anaconda3/envs/ML/scripts/beta_projects')
File "/home/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 710, in runfile
execfile(filename, namespace)
File "/home/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/home/anaconda3/envs/ML/scripts/beta_projects/umap_test.py", line 83, in <module>
embedding = umap.UMAP(n_neighbors = ideal_model['n_neighbors'], min_dist = ideal_model['min_dist'], metric = ideal_model['metric_iso'], n_components = 3).fit_transform(input_data)
File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 1402, in fit_transform
self.fit(X)
File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 1361, in fit
self.verbose
File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 385, in rptree_leaf_array
angular=angular)
File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 310, in make_tree
angular)
File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 315, in make_tree
angular)
File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 310, in make_tree
angular)
File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 310, in make_tree
angular)
File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 310, in make_tree
angular)
File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 310, in make_tree
angular)
File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 315, in make_tree
angular)
File "/home/anaconda3/lib/python3.6/site-packages/umap/umap_.py", line 301, in make_tree
rng_state)
ZeroDivisionError: division by zero
from umap.
@100518832 did you git clone from master and then run python setup.py install inside the directory to install the library? The latest release on PyPI (0.1.5) still has this problem, I believe, so if you pip installed you may be affected, but running the setup.py install should clear this up...
from umap.
@duhaime, I do not recall running python setup.py install; however, I will try this. Also, I have noticed that I only get a zero division error with the following three parameter combinations (n_neighbors, min_dist, metric):
Embedding Failed - info: 15, 0.01, correlation
Embedding Failed - info: 15, 0.05, correlation
Embedding Failed - info: 20, 0.01, correlation
Here are all the parameter values I am currently testing:
n_neighbors = [15,20,40]
min_dist = [0.01,0.05]
metric_iso = ['correlation','euclidean','manhattan']
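A sweep like the one described above can be sketched as a loop over every combination that records which ones raise. This is only a minimal sketch, not the poster's script: `sweep` and `fit_fn` are hypothetical names, and `fit_fn` stands in for whatever builds the embedding (for example, a wrapper around `umap.UMAP(...).fit_transform`).

```python
from itertools import product

n_neighbors = [15, 20, 40]
min_dist = [0.01, 0.05]
metric_iso = ['correlation', 'euclidean', 'manhattan']


def sweep(fit_fn):
    """Call fit_fn(n, d, metric) for every combination; return the ones that fail.

    fit_fn is a hypothetical stand-in for the code that builds the embedding.
    """
    failed = []
    for n, d, metric in product(n_neighbors, min_dist, metric_iso):
        try:
            fit_fn(n, d, metric)
        except ZeroDivisionError:
            # Same message format as the failures listed above.
            print(f"Embedding Failed - info: {n}, {d}, {metric}")
            failed.append((n, d, metric))
    return failed
```

Collecting the failing tuples makes it easier to see whether the error follows one metric or one range of n_neighbors.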
from umap.
@100518832 I'd try installing from the repo's setup.py file, as the fixes for the div by zero problem are on master but not PyPI...
from umap.
I'm now getting this error with the inverse_transform method. I'm using the code in the repo's master branch, at the 0.4rc1 tag (fc59aa7).
Code:
import numpy as np
from umap import UMAP
to_reduce = np.vstack(data)  # shape: (1137, 25); dtype: 'float64'
np.random.seed(0)
reducer = UMAP(random_state=0).fit(to_reduce)
embeddings = reducer.transform(to_reduce)
# create a 2D grid over the embedding space
resolution = 50
x_min, y_min = embeddings.min(axis=0) // 1
x_max, y_max = embeddings.max(axis=0) // 1 + 1
x_step = (x_max - x_min) / resolution
y_step = (y_max - y_min) / resolution
xs = np.arange(x_min, x_max, x_step)
ys = np.arange(y_min, y_max, y_step)
X, Y = np.meshgrid(xs, ys)
xy_grid = np.empty((resolution, resolution, 2), dtype=np.float64)
for (x_ix, y_ix), X_val in np.ndenumerate(X):
    xy_grid[x_ix, y_ix] = (X_val, Y[x_ix, y_ix])
# recover vector in original space for each gridpoint
vertices = xy_grid.reshape(resolution**2, 2)
np.random.seed(0)
high_dim_vertices = reducer.inverse_transform(vertices)
high_dim_grid = high_dim_vertices.reshape(resolution, resolution, 25)
Traceback:
ZeroDivisionError Traceback (most recent call last)
<ipython-input-127-93accc0c2718> in <module>
1 # recover vector in original space for each gridpoint
2 np.random.seed(0)
----> 3 high_dim_vertices = reducer.inverse_transform(vertices)
4 high_dim_grid = high_dim_vertices.reshape(resolution, resolution, 25)
/opt/conda/lib/python3.7/site-packages/umap/umap_.py in inverse_transform(self, X)
2203 _input_distance_func,
2204 tuple(self._metric_kwds.values()),
-> 2205 verbose=self.verbose,
2206 )
2207
ZeroDivisionError: division by zero
At first I thought it was related to #33 because I can get the error to go away by setting min_dist sufficiently high (~0.5), which makes the 2D grid points spread out enough that they wouldn't get rounded off to the same value as float32s. But now I'm not so sure, because I still get the error if I do
vertices += np.random.uniform(-10, 10, vertices.shape)
before inverse_transforming them. Any help would be greatly appreciated!!
Additional info:
numpy==1.16.5
sklearn==0.21.3
scipy==1.2.1
numba==0.45.1
tbb==2020.0.133
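The float32 rounding concern mentioned above can be checked directly. The following is a minimal sketch with made-up values; it only illustrates the numeric effect the poster suspected (distinct float64 points collapsing to one float32 value) and says nothing about what UMAP does internally. `count_unique_rows` is a hypothetical helper.

```python
import numpy as np

# Two float64 points that differ by less than float32 can resolve near 1.0
# (float32 spacing there is about 1.2e-7).
points = np.array([[1.0, 2.0],
                   [1.0 + 1e-9, 2.0]], dtype=np.float64)


def count_unique_rows(arr):
    """Number of distinct rows in a 2D array."""
    return np.unique(arr, axis=0).shape[0]


unique_as_float64 = count_unique_rows(points)
unique_as_float32 = count_unique_rows(points.astype(np.float32))
print(unique_as_float64, unique_as_float32)
```

Running the same count on the real `vertices` array, before and after casting, would show whether any grid points actually become duplicates.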
from umap.
This is somewhat disconcerting, and it is non-obvious to me where exactly this could be occurring. One thing you could try, presuming the compute isn't too expensive, is turning off the numba compilation of the inverse transform related functions and running again -- that way we can get a stack trace to the exact location of the error.
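One way to try this, assuming it is acceptable to disable compilation globally rather than only for the inverse transform functions, is Numba's `NUMBA_DISABLE_JIT` environment variable. A minimal sketch:

```python
import os

# Must be set before numba (and therefore umap) is imported for the first time,
# because numba reads its configuration at import.
os.environ["NUMBA_DISABLE_JIT"] = "1"

# Import umap only after this point, then rerun the failing call, e.g.:
# from umap import UMAP
```

With JIT disabled the functions run as plain Python, so the run will be much slower but the traceback should point at the exact line.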
from umap.
So the actual error is coming from here, in optimize_layout_inverse. At least for me, it's happening when more data points are inverse transformed than were used to fit the model. To compute grad_coeff, the jth item from sigmas is used. But sigmas is the distance to the kth nearest neighbor for each data point used to fit the model, so it has self._raw_data.shape[0] entries. Meanwhile, j ranges over the row indices of the adjacency matrix for the 1-skeleton of the inverse_transformed data.
Traceback is kinda weird-looking since I ran it through the PyCharm debugger, but here it is:
Traceback (most recent call last):
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 2060, in <module>
main()
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 2054, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1405, in run
return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
File "/Applications/PyCharm.app/Contents/helpers/pydev/pydevd.py", line 1412, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/Applications/PyCharm.app/Contents/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/paxtonfitzpatrick/Documents/Dartmouth/CDL/umap/debugging.py", line 79, in <module>
topic_space_grid = reducer.inverse_transform(vertices)
File "/Users/paxtonfitzpatrick/Documents/Dartmouth/CDL/umap/umap/umap_.py", line 2270, in inverse_transform
verbose=self.verbose,
File "/Users/paxtonfitzpatrick/Documents/Dartmouth/CDL/umap/umap/layouts.py", line 510, in optimize_layout_inverse
grad_coeff = -(1 / (w_l * sigmas[j] + 1e-6))
IndexError: index 1172 is out of bounds for axis 0 with size 1137
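The mismatch described above can be reproduced in miniature. This is a sketch under stated assumptions, not the library's code: the sizes, the `rows`/`cols` arrays, and the `out_of_bounds` helper are all hypothetical, and it assumes each edge links a new point (row index j) to a fitted point (column index k), which is consistent with the sigmas[k] suggestion at the top of the thread.

```python
import numpy as np

n_fit = 5   # points used to fit the model (hypothetical size)
n_new = 8   # points passed to inverse_transform (hypothetical size)

# One entry per fitted point, as described above.
sigmas = np.linspace(0.5, 1.5, n_fit)

# Each edge links a new point (row index j) to a fitted point (column index k).
rows = np.array([0, 3, 7])  # j: indexes the new points, can reach n_new - 1
cols = np.array([1, 4, 2])  # k: indexes the fitted points, always < n_fit


def out_of_bounds(indices, size):
    """Return the indices that would overrun an array of the given size."""
    return indices[indices >= size]


bad_j = out_of_bounds(rows, sigmas.shape[0])
bad_k = out_of_bounds(cols, sigmas.shape[0])
print(bad_j, bad_k)
```

Under these assumptions only the row indices overrun `sigmas`. When fewer points are inverse transformed than were fitted, indexing by j would stay in bounds but would read the sigma of an unrelated fitted point.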
from umap.
That did it! And that makes sense based on how grad_coeff is computed just below that... Thanks so much!
from umap.
Can I assume that this will arrive with your upcoming PR, or should I fix it and push it myself?
from umap.
I changed it on my fork, so it'll get included in my PR. I'm ready to submit that today btw -- just want to confirm about what I commented on #367
from umap.
Excellent -- I'm looking forward to it. Thanks for all your hard work on this!
from umap.
Any update on this? I have a dataset that seems to throw this error when the data is large.
from umap.