Deion It would be great if there would be a way to convert a

I don't think we should want that <a class="user-mention notranslate" data-hovercard-t

Way to enforce "fortran" layout about polars HOT 5 CLOSED

cosama commented on May 23, 2024

Way to enforce "fortran" layout

from polars.

Comments (5)

stinodego commented on May 23, 2024

Currently, rechunk only makes sure that each individual Series is contiguous in memory.

It would make sense to have a parameter on rechunk that ensures the entire dataframe is contiguous. Thanks for the suggestion!

I think you can hack around this by converting to numpy and then back. The resulting DataFrame will be contiguous in memory.

from polars.

ritchie46 commented on May 23, 2024

I don't think we should want that @stinodego. That requires unsafe allocations and will be super hard to enfore throughout the engine.

And besides all much more costly than simply paying the copy at the end.

Either way there needs to be made a copy. It doesn't matter if we do it internally or when moving out to numpy. I will close this as it will have no benefit.

from polars.

stinodego commented on May 23, 2024

Right, I was thinking that if you do many to_numpy calls, it would be cheaper to first convert the DataFrame to Fortran layout once, and it will save the copy in subsequent calls.

However, any operations you do on the DataFrame inbetween those calls will not guarantee that the Fortran layout is preserved. So better to just not give any guarantees about it.

from polars.

cosama commented on May 23, 2024

Thank you so much for the quick answer. I did assume that the layout is by always fortran internally (except maybe for some weird numpy arrays), but yeah I can see how this could create some complications at other places. Makes sense to not implement it then.

By the way, why does to_numpy need to make a copy if the layout is not fortran? If you support other layouts throughout the engine, couldn't you just move it back to numpy in that format?

from polars.

ritchie46 commented on May 23, 2024

Right, I was thinking that if you do many to_numpy calls, it would be cheaper to first convert the DataFrame to Fortran layout once, and it will save the copy in subsequent calls.

In such a case, I think people should cache their numpy array. I don't think our methods should be focussed with caching.

If you support other layouts throughout the engine, couldn't you just move it back to numpy in that format?

If we could, we would. A numpy array is backed by a single contiguous allocation. Polars DataFrames are backed by multiple buffers.

Don't worry about that copy too much. They happen implicitly all the time.

from polars.

Recommend Projects

Way to enforce "fortran" layout about polars HOT 5 CLOSED

Comments (5)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent