Comments (4)
You should edit the title to mention that the issue involves reading from cloud storage. Please run this and post the full output; maybe that will help.
import os
# RUST_BACKTRACE (not RUST_TRACEBACK) is the variable Rust reads for a full panic backtrace
os.environ['RUST_BACKTRACE'] = 'full'
import polars as pl
parquet_file_name = ...  # the cloud path from the report
df = pl.read_parquet(parquet_file_name, columns=col_list, use_pyarrow=False)
The team treats all panic errors as bugs, but it's likely that you need to set storage_options.
See here https://docs.pola.rs/api/python/stable/reference/api/polars.scan_parquet.html#polars-scan-parquet
and here
https://docs.rs/object_store/latest/object_store/aws/enum.AmazonS3ConfigKey.html
The environment variables that pyarrow (via fsspec) looks for aren't fully in sync with what polars (via object_store) looks for, so passing storage_options explicitly is probably how to fix the issue.
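As a minimal sketch of that idea (the helper name `aws_storage_options` is mine, not a polars API; the key names follow object_store's documented `AmazonS3ConfigKey` values), you could build a storage_options dict from the environment variables object_store understands:

```python
import os

def aws_storage_options() -> dict:
    """Collect AWS settings from the environment into a dict whose key
    names match what polars/object_store expects in storage_options."""
    env_to_key = {
        "AWS_ACCESS_KEY_ID": "aws_access_key_id",
        "AWS_SECRET_ACCESS_KEY": "aws_secret_access_key",
        "AWS_DEFAULT_REGION": "aws_region",
    }
    # Only include variables that are actually set.
    return {key: os.environ[var] for var, key in env_to_key.items() if var in os.environ}
```

The result could then be passed as `storage_options=aws_storage_options()` to `pl.read_parquet`.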
from polars.
Updated as suggested
As per your suggestion to use storage_options:
# WORKS
df = pl.read_parquet(
    parquet_file_name,
    columns=col_list,
    use_pyarrow=False,
    storage_options={
        "aws_access_key_id": os.environ.get("AWS_ACCESS_KEY_ID"),
        "aws_secret_access_key": os.environ.get("AWS_SECRET_ACCESS_KEY"),
        "aws_region": AWS_REGION,
    },
)
# DOES NOT WORK
df = pl.read_parquet(
    parquet_file_name,
    columns=col_list,
    use_pyarrow=True,
    storage_options={
        "aws_access_key_id": os.environ.get("AWS_ACCESS_KEY_ID"),
        "aws_secret_access_key": os.environ.get("AWS_SECRET_ACCESS_KEY"),
        "aws_region": AWS_REGION,
    },
)
The error message is
TypeError: AioSession.__init__() got an unexpected keyword argument 'aws_access_key_id'
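The keyword is rejected because s3fs (the fsspec implementation pyarrow uses for S3) takes `key`, `secret`, and a `client_kwargs` dict rather than object_store's key names. A hypothetical translation sketch (the helper name is mine, not a polars or s3fs API) illustrating the mismatch:

```python
def to_s3fs_options(opts: dict) -> dict:
    """Translate object_store-style storage_options into the keyword
    names that s3fs's S3FileSystem constructor accepts."""
    out = {}
    if "aws_access_key_id" in opts:
        out["key"] = opts["aws_access_key_id"]
    if "aws_secret_access_key" in opts:
        out["secret"] = opts["aws_secret_access_key"]
    if "aws_region" in opts:
        # s3fs forwards the region to botocore via client_kwargs.
        out["client_kwargs"] = {"region_name": opts["aws_region"]}
    return out
```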
The new error occurs because there isn't parity between the key names fsspec expects and the key names object_store expects. If you set AWS_REGION as an env var, does
pl.read_parquet(parquet_file_name, columns=col_list, use_pyarrow=False)
work? I don't use S3, so I'm just guessing at that.
From here it looks like you should set the environment variable AWS_DEFAULT_REGION
to whatever it should be, and then I think that
pl.read_parquet(parquet_file_name, columns=col_list, use_pyarrow=False)
would work.
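A sketch of that approach, assuming credentials are already in the environment (the region value here is a placeholder, not taken from the report):

```python
import os

# Placeholder region: substitute the bucket's actual region.
os.environ.setdefault("AWS_DEFAULT_REGION", "us-east-1")

# With credentials and region in the environment, object_store should pick
# them up itself, so no storage_options argument is needed:
# df = pl.read_parquet(parquet_file_name, columns=col_list, use_pyarrow=False)
```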