Comments (31)
I'm trying to reproduce this but cannot (for master and 1.0.0b6), can you give a small self contained example that shows this?
(Hint: ds = vaex.from_scalars(ra=1, dec=2)
may be useful)
from vaex.
Here is one example:
N = int(1e4)
ra = np.random.uniform(0, 360, N)
dec = np.random.uniform(0, 180, N)
ds = vaex.from_arrays(ra=ra, dec=dec)
ds.add_virtual_columns_eq2gal()
ds.plot1d('ra', selection='l < 0')
NameError: name '__celestial_eq2gal_matrix' is not defined
from vaex.
I don't get this error, quite strange. Which version do you run, I guess you installed the latest release(1.0.0b6) after the previous issue right? From pip?
from vaex.
I ran pip again and now I have a broken python library
libc++abi.dylib:terminating with uncaught exception of type Error: stride is not equal to 1
It seems matplotlib can run so I don't know what happened. That will take a while to figure.
from vaex.
It seems that msg comes from vaex actually, try git master otherwise, it will give an error but should not crash.
from vaex.
I don't know, it really crashes the python kernel so I have no way to find where it is from
from vaex.
interestingly... if I open a file with vaex it seems to work (did not test long) but if I create a dataset from arrays it crashes with the above error. Seems it is from vaex.
I updated for the git/master and now I have a RuntimeError
How do I correct this?
from vaex.
The arrays you feed into vaex need to have a stride of 1, to if you do sth like 2d_blob[:,2], it will not take it since it is not optimal.
from vaex.
data arrays are 1d arrays. vaex can export them into a file and read it back without the error.
It must be something that is not cast when in memory
random numbers generating some data as
('ra', dtype('float64'), (100000,))
('dec', dtype('float64'), (100000,))
('gmag', dtype('float64'), (100000,))
('source_id', dtype('int64'), (100000,))
then using from_arrays()
from vaex.
If you .copy()
it, I should work btw.
from vaex.
.copy() on what?
from vaex.
The numpy arrays, for instance np.zeros((3,3))[:,2].copy()
. Can you share the code that gives you this bug btw. And if would be great if you can check the git master version:
$ git clone https://github.com/maartenbreddels/vaex/
$ cd vaex
$ pip install -e .
from vaex.
ok found something interesting I think.
(I already installed the git/master version)
The following code triggers the error:
uniform = np.random.uniform
Npop = int(1e6)
pop = np.random.uniform(0, 10, (Npop, 3))
samples = {
'ra': pop[:, 0],
'dec': np.rad2deg(np.arcsin(pop[:, 1])),
'gmag' : pop[:, 2]}
ds = vaex.from_arrays(**samples)
ds.plot1d('ra')
However, if I ravel the individual variables as follow is does not generate an error
samples = {
'ra': pop[:, 0].ravel(),
'dec': np.rad2deg(np.arcsin(pop[:, 1])).ravel(),
'gmag' : pop[:, 2].ravel()}
ds = vaex.from_arrays(**samples)
ds.plot1d('ra')
from vaex.
Indeed, .copy()
gives you the same effect. Let me show you the strides for a 3x3 array:
When the stride is 8 (and the itemsize is 8, in vaex I divide by the itemsize, so that's why I complain it is not 1), it means the next item is directly at the next memory location. When the strides is 24, as shown above, it means the next item is 3 location apart, which is bad for performance. But I should check this when constructing the dataset I think, instead of later on when doing the binning.
from vaex.
I am surprised it is not caught by numpy conversion to C.
The data extracted from 2d arrays are not aligned in any specific way, esp. not 'C'.
I found the asfloat()
in the legacy.py but that's not enough apparently.
from vaex.
Well, it shouldn't accept it, but I think vaex should warn at this part:
ds = vaex.from_arrays(**...)
I could try to support strides != itemsize, I just want to avoid getting poor performance.
from vaex.
I think you could force the contiguous data:
in dataset.py
line 3588:
self.columns[name] = f_or_array
replaced by
self.columns[name] = np.array(f_or_array, copy=False, order='C')
It seems to do the trick. I don't know where else you need it though.
from vaex.
Well, that will fix it, but imagine the array being a few GB, it will make a copy of it, and that is why vaex doesn't try to convert it but gives up. It could be done just before binning, when only a small chuck is done, but you'll get bad performance without a warning.
from vaex.
Maybe the best method would be to not convert it, when emit a warning. And indeed allow it, even though it may give bad performance.
from vaex.
I agree but that is still better than the current runtimeError
The best could be a test
if isinstance(f_or_array, np.array):
if not f_or_array.flags['C_CONTIGUOUS']:
warn....
self.columns[name] = np.array(f_or_array, copy=False, order='C')
I suggest that as a quick fix so it works while keeping the bug open for proper processing
from vaex.
FYI, the initial bug: error name '__celestial_eq2gal_matrix' is not defined
seems gone with the git/master version
from vaex.
👍 great, I'll keep this open to support stride != (1 or 8)
from vaex.
Parsing quickly the code, I think you need to wrap vaexfast
functions with a contiguous parsing.
It seems you call those functions only on chunks. maybe that's what you meant with the binning?
(Thanks for being so present!)
from vaex.
What do you mean with 'contiguous parsing'?
Yes, those functions are only used on the 'chunks', so just before calling these functions I could check for the stride and convert on the fly (for small datasets that is ok).
Thanks for testing out!
from vaex.
I don't know how easy it is to update your C code directly.
you're welcome. Awesome work that makes my life easier!
from vaex.
I am thinking of converting vaex' c codebase to c++ using the xtensor library, it will shrink the codebase quite a lot, and will make it easier to support different strides.
The c code needs a lot of cleaning anyway, quite some dead code in there anyway. But for the moment the conversion can be done on the Python side.
from vaex.
That'd be great. Xtensor is really an interesting library.
from vaex.
Yes indeed, @SylvainCorlay was pushing me to try it out, and the first tests show it gives me the same performance.
from vaex.
My only issue with xtensor is the C++14 need that is still not standard on old systems. This needs at least GCC 4.9 or 5 I believe.
from vaex.
True.. vaex 2.0 then ;)
from vaex.
This is fixed in master, and also supported in the refactor of #197 where this is also tested.
from vaex.
Related Issues (20)
- Vaex with Pyinstaller
- [BUG-REPORT] vaex save error
- [FEATURE-REQUEST] Support Python 3.12 HOT 1
- [BUG-REPORT] Printing vaex df after sort running out of memory
- [BUG-REPORT] HOT 1
- [BUG-REPORT] Large Groupby Agg runs out of memory
- Vaex not exporting to file properly inside of a mulitprocessing pool.
- [FEATURE-REQUEST] looking for vaex equivalent of pandas_df.corr(min_periods=100)
- [FEATURE-REQUEST] Getting dtype of columns as they are when rendered in a pandas dataframe?
- [BUG-REPORT] SyntaxError when representing result of a basic operation involving Expression and numpy array, with the array on the right of the operator HOT 1
- How do i overcome this issue? HOT 1
- [BUG-REPORT] why the same dataframe and same groupby(vaex.agg.list) operation, one got error and one got correct if i export to hdf5 and loaded it back
- [BUG-REPORT] jupyter notebook error in tutorial docs
- [BUG-REPORT] HDF5 file remains locked after close
- [BUG-REPORT] MinMaxScaler not working after filtering
- [FEATURE-REQUEST] is it still supported
- [FEATURE-REQUEST] read hdf5 file in C++
- [BUG-REPORT] Docs are down HOT 2
- statistics on 2d grids: control the `bin_centers`
- interactive statistic heatmap / quadrilateral mesh
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from vaex.