Comments (4)
pygdf still requires data to fit in GPU memory. Our out-of-core solution will be provided by https://github.com/gpuopenanalytics/dask_gdf, which uses pygdf.DataFrame as a building block and exposes an interface similar to dask.dataframe. However, dask_gdf is at an earlier stage than pygdf, and its docs are a bit lacking at the moment.
pygdf has a delayed memory-deallocation mechanism inherited from Numba. When a DataFrame is no longer referenced, its device memory goes into a pool of pending deallocations. That memory is actually freed as soon as there are 10 pending deallocations, the pending deallocations hold 20% of total GPU memory, or the Numba GPU allocator observes a CUDA_ERROR_OUT_OF_MEMORY.
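The flush policy described above can be modeled in a few lines of plain Python. This is a simplified sketch, not Numba's actual implementation: the class `PendingDeallocs` and its methods are hypothetical, the out-of-memory path is reduced to a method call, and sizes are abstract byte counts rather than real device allocations. The defaults (10 items, 20% of total memory) mirror the thresholds stated above; in Numba they are configurable through the `NUMBA_CUDA_MAX_PENDING_DEALLOCS_COUNT` and `NUMBA_CUDA_MAX_PENDING_DEALLOCS_RATIO` settings.

```python
class PendingDeallocs:
    """Hypothetical, simplified model of a deferred-deallocation pool.

    Not Numba's real code: it only tracks names and byte counts to show
    when pending frees are flushed.
    """

    def __init__(self, total_gpu_bytes, max_count=10, max_ratio=0.2):
        self.max_count = max_count
        self.max_bytes = int(total_gpu_bytes * max_ratio)
        self._pending = []  # (name, nbytes) pairs waiting to be released
        self._bytes = 0     # total bytes currently pending
        self.freed = []     # names of buffers actually released so far

    def add(self, name, nbytes):
        # Called when the last reference to a buffer goes away: the memory
        # is not released yet, only queued.
        self._pending.append((name, nbytes))
        self._bytes += nbytes
        # Flush once either the count or the byte threshold is reached.
        if len(self._pending) >= self.max_count or self._bytes >= self.max_bytes:
            self.clear()

    def clear(self):
        # Release every pending buffer at once.
        self.freed.extend(name for name, _ in self._pending)
        self._pending.clear()
        self._bytes = 0

    def on_out_of_memory(self):
        # Models the allocator hitting CUDA_ERROR_OUT_OF_MEMORY: flush the
        # pending frees so the allocation can be retried.
        self.clear()


pool = PendingDeallocs(total_gpu_bytes=1000)
for i in range(9):
    pool.add(f"df{i}", 1)   # nine small frees stay pending
pool.add("df9", 1)          # the tenth triggers a flush of all ten
```

With 1000 bytes of total memory, nine 1-byte releases stay pending and the tenth flushes all of them, while a single 250-byte release would flush immediately because it exceeds the 200-byte ratio limit. In practice this means a DataFrame that has gone out of scope can keep occupying GPU memory until one of the three triggers fires.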
Thank you for your answer. So is pygdf not suitable for big-data analysis? Should I use dask_gdf for big-data analysis? How does dask_gdf work?
Since big-data analytics is a broad umbrella term, I will use the term out-of-core for your use case, where the data cannot fit in memory. We designed pygdf and dask_gdf so that pygdf handles in-core operations and dask_gdf handles out-of-core operations. Recall that dask_gdf uses pygdf as its building block, so it is also possible to plug pygdf into another out-of-core execution framework (e.g. Spark).
For your use case, you might be more interested in dask_gdf. To learn about dask_gdf, it is good to start by learning about dask and dask.dataframe at http://dask.pydata.org/. dask_gdf is essentially dask.dataframe using pygdf.DataFrame as the building block instead of pandas.DataFrame. We currently don't have a documentation page for dask_gdf, but we are working on it.
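The split between an in-core building block and an out-of-core driver can be illustrated with a small, stdlib-only sketch of the partition/aggregate/combine pattern that dask.dataframe applies to its partitions. This is not the dask_gdf API: every function name here is hypothetical, and a "partition" is just a list of `(key, value)` rows standing in for a pygdf.DataFrame that fits in GPU memory. A mean is computed by combining per-partition sums and counts, so only one partition plus the small per-key totals are held at a time.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for one in-core partition (a pygdf.DataFrame in the
# real design): a list of (key, value) rows small enough to fit in memory.
Partition = List[Tuple[str, float]]


def read_partitions(rows: Iterable[Tuple[str, float]], size: int) -> Iterator[Partition]:
    """Yield the dataset one bounded-size partition at a time."""
    chunk: Partition = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def in_core_sum_count(part: Partition) -> Dict[str, Tuple[float, int]]:
    """In-core step: per-key (sum, count) for a single partition."""
    out: Dict[str, Tuple[float, int]] = {}
    for key, value in part:
        s, c = out.get(key, (0.0, 0))
        out[key] = (s + value, c + 1)
    return out


def out_of_core_groupby_mean(rows, partition_size, in_core=in_core_sum_count):
    """Out-of-core step: merge partial (sum, count) results across partitions."""
    totals: Dict[str, Tuple[float, int]] = {}
    for part in read_partitions(rows, partition_size):
        for key, (s, c) in in_core(part).items():
            ts, tc = totals.get(key, (0.0, 0))
            totals[key] = (ts + s, tc + c)
    # Means are not combinable directly, so divide only at the end.
    return {key: s / c for key, (s, c) in totals.items()}


rows = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("b", 4.0), ("a", 5.0)]
print(out_of_core_groupby_mean(rows, partition_size=2))
```

Because the driver only depends on the `in_core` callable, swapping the per-partition implementation (pandas-like or GPU-backed) leaves the out-of-core logic unchanged, which is the design idea behind building dask_gdf on top of pygdf.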
Lastly, pygdf and dask_gdf are both in the alpha stage. We encourage early adopters to experiment with them and give us feedback, but they are not ready for production deployment yet.
Thank you for your answer. I will try it. Looking forward to seeing the final stage of this project.