Giter VIP home page Giter VIP logo

Comments (31)

maartenbreddels avatar maartenbreddels commented on May 1, 2024

I'm trying to reproduce this but cannot (for master and 1.0.0b6), can you give a small self contained example that shows this?
(Hint: ds = vaex.from_scalars(ra=1, dec=2) may be useful)

from vaex.

mfouesneau avatar mfouesneau commented on May 1, 2024

Here is one example:

N = int(1e4)
ra = np.random.uniform(0, 360, N)
dec = np.random.uniform(0, 180, N)
ds = vaex.from_arrays(ra=ra, dec=dec)
ds.add_virtual_columns_eq2gal()
ds.plot1d('ra', selection='l < 0')


NameError: name '__celestial_eq2gal_matrix' is not defined

from vaex.

maartenbreddels avatar maartenbreddels commented on May 1, 2024

I don't get this error, quite strange. Which version do you run, I guess you installed the latest release(1.0.0b6) after the previous issue right? From pip?

from vaex.

mfouesneau avatar mfouesneau commented on May 1, 2024

I ran pip again and now I have a broken python library

libc++abi.dylib:terminating with uncaught exception of type Error: stride is not equal to 1

It seems matplotlib can run so I don't know what happened. That will take a while to figure.

from vaex.

maartenbreddels avatar maartenbreddels commented on May 1, 2024

It seems that msg comes from vaex actually, try git master otherwise, it will give an error but should not crash.

from vaex.

mfouesneau avatar mfouesneau commented on May 1, 2024

I don't know, it really crashes the python kernel so I have no way to find where it is from

from vaex.

mfouesneau avatar mfouesneau commented on May 1, 2024

interestingly... if I open a file with vaex it seems to work (did not test long) but if I create a dataset from arrays it crashes with the above error. Seems it is from vaex.

I updated for the git/master and now I have a RuntimeError

How do I correct this?

from vaex.

maartenbreddels avatar maartenbreddels commented on May 1, 2024

The arrays you feed into vaex need to have a stride of 1, to if you do sth like 2d_blob[:,2], it will not take it since it is not optimal.

from vaex.

mfouesneau avatar mfouesneau commented on May 1, 2024

data arrays are 1d arrays. vaex can export them into a file and read it back without the error.
It must be something that is not cast when in memory

random numbers generating some data as

('ra', dtype('float64'), (100000,))
('dec', dtype('float64'), (100000,))
('gmag', dtype('float64'), (100000,))
('source_id', dtype('int64'), (100000,))

then using from_arrays()

from vaex.

maartenbreddels avatar maartenbreddels commented on May 1, 2024

If you .copy() it, I should work btw.

from vaex.

mfouesneau avatar mfouesneau commented on May 1, 2024

.copy() on what?

from vaex.

maartenbreddels avatar maartenbreddels commented on May 1, 2024

The numpy arrays, for instance np.zeros((3,3))[:,2].copy(). Can you share the code that gives you this bug btw. And if would be great if you can check the git master version:

$ git clone https://github.com/maartenbreddels/vaex/
$ cd vaex
$ pip install -e .

from vaex.

mfouesneau avatar mfouesneau commented on May 1, 2024

ok found something interesting I think.
(I already installed the git/master version)

The following code triggers the error:

uniform = np.random.uniform
Npop = int(1e6)
pop = np.random.uniform(0, 10, (Npop, 3))
samples = {
    'ra': pop[:, 0],
    'dec': np.rad2deg(np.arcsin(pop[:, 1])),
    'gmag' : pop[:, 2]}
ds = vaex.from_arrays(**samples)
ds.plot1d('ra')

However, if I ravel the individual variables as follow is does not generate an error

samples = {
    'ra': pop[:, 0].ravel(),
    'dec': np.rad2deg(np.arcsin(pop[:, 1])).ravel(),
    'gmag' : pop[:, 2].ravel()}
ds = vaex.from_arrays(**samples)
ds.plot1d('ra')

from vaex.

maartenbreddels avatar maartenbreddels commented on May 1, 2024

Indeed, .copy() gives you the same effect. Let me show you the strides for a 3x3 array:
screen shot 2017-09-18 at 19 20 25
When the stride is 8 (and the itemsize is 8, in vaex I divide by the itemsize, so that's why I complain it is not 1), it means the next item is directly at the next memory location. When the strides is 24, as shown above, it means the next item is 3 location apart, which is bad for performance. But I should check this when constructing the dataset I think, instead of later on when doing the binning.

from vaex.

mfouesneau avatar mfouesneau commented on May 1, 2024

I am surprised it is not caught by numpy conversion to C.
The data extracted from 2d arrays are not aligned in any specific way, esp. not 'C'.

I found the asfloat() in the legacy.py but that's not enough apparently.

from vaex.

maartenbreddels avatar maartenbreddels commented on May 1, 2024

Well, it shouldn't accept it, but I think vaex should warn at this part:

ds = vaex.from_arrays(**...)

I could try to support strides != itemsize, I just want to avoid getting poor performance.

from vaex.

mfouesneau avatar mfouesneau commented on May 1, 2024

I think you could force the contiguous data:

in dataset.py line 3588:
self.columns[name] = f_or_array
replaced by
self.columns[name] = np.array(f_or_array, copy=False, order='C')

It seems to do the trick. I don't know where else you need it though.

from vaex.

maartenbreddels avatar maartenbreddels commented on May 1, 2024

Well, that will fix it, but imagine the array being a few GB, it will make a copy of it, and that is why vaex doesn't try to convert it but gives up. It could be done just before binning, when only a small chuck is done, but you'll get bad performance without a warning.

from vaex.

maartenbreddels avatar maartenbreddels commented on May 1, 2024

Maybe the best method would be to not convert it, when emit a warning. And indeed allow it, even though it may give bad performance.

from vaex.

mfouesneau avatar mfouesneau commented on May 1, 2024

I agree but that is still better than the current runtimeError
The best could be a test

if isinstance(f_or_array, np.array):
     if not f_or_array.flags['C_CONTIGUOUS']:
           warn....
           self.columns[name] = np.array(f_or_array, copy=False, order='C')

I suggest that as a quick fix so it works while keeping the bug open for proper processing

from vaex.

mfouesneau avatar mfouesneau commented on May 1, 2024

FYI, the initial bug: error name '__celestial_eq2gal_matrix' is not defined

seems gone with the git/master version

from vaex.

maartenbreddels avatar maartenbreddels commented on May 1, 2024

👍 great, I'll keep this open to support stride != (1 or 8)

from vaex.

mfouesneau avatar mfouesneau commented on May 1, 2024

Parsing quickly the code, I think you need to wrap vaexfast functions with a contiguous parsing.
It seems you call those functions only on chunks. maybe that's what you meant with the binning?

(Thanks for being so present!)

from vaex.

maartenbreddels avatar maartenbreddels commented on May 1, 2024

What do you mean with 'contiguous parsing'?
Yes, those functions are only used on the 'chunks', so just before calling these functions I could check for the stride and convert on the fly (for small datasets that is ok).
Thanks for testing out!

from vaex.

mfouesneau avatar mfouesneau commented on May 1, 2024

I don't know how easy it is to update your C code directly.

you're welcome. Awesome work that makes my life easier!

from vaex.

maartenbreddels avatar maartenbreddels commented on May 1, 2024

I am thinking of converting vaex' c codebase to c++ using the xtensor library, it will shrink the codebase quite a lot, and will make it easier to support different strides.
The c code needs a lot of cleaning anyway, quite some dead code in there anyway. But for the moment the conversion can be done on the Python side.

from vaex.

mfouesneau avatar mfouesneau commented on May 1, 2024

That'd be great. Xtensor is really an interesting library.

from vaex.

maartenbreddels avatar maartenbreddels commented on May 1, 2024

Yes indeed, @SylvainCorlay was pushing me to try it out, and the first tests show it gives me the same performance.

from vaex.

mfouesneau avatar mfouesneau commented on May 1, 2024

My only issue with xtensor is the C++14 need that is still not standard on old systems. This needs at least GCC 4.9 or 5 I believe.

from vaex.

maartenbreddels avatar maartenbreddels commented on May 1, 2024

True.. vaex 2.0 then ;)

from vaex.

maartenbreddels avatar maartenbreddels commented on May 1, 2024

This is fixed in master, and also supported in the refactor of #197 where this is also tested.

from vaex.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.