Comments (3)
Partitioning only on "ticker" causes the issue too; I just checked.
pq.ParquetDataset(dataset_path).read().to_pandas()
works fine as well (pq here being pyarrow.parquet).
This seems to be fixed in 1.0, at least with the following changes:
from datetime import datetime
from pathlib import Path

import polars as pl

dataset_path = Path("./test_dataset")
dataset_path.mkdir(exist_ok=True)

test_df = (
    pl.DataFrame(
        {
            "timestamp": [datetime(2021, 1, 1), datetime(2022, 2, 1)],
            "data": [1, 2],
            "ticker": ["AAPL", "TRUE"],
        }
    )
    .with_columns(pl.col("ticker").cast(pl.Enum(["AAPL", "TRUE"])))
    .with_columns(
        pl.col("timestamp").dt.year().alias("year"),
        pl.col("timestamp").dt.month().alias("month"),
    )
)

test_df.write_parquet(
    dataset_path,
    use_pyarrow=True,
    pyarrow_options={"partition_cols": ["ticker", "year", "month"]},
)

df_in = pl.scan_parquet(dataset_path, hive_partitioning=True)
print(df_in.filter(pl.col("ticker") == "AAPL").collect())
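For anyone unfamiliar with the on-disk layout: partitioning on "ticker", "year" and "month" gives one key=value directory level per partition column. Below is a rough, stdlib-only sketch of that naming scheme. The helpers hive_partition_path and parse_hive_partitions are hypothetical names made up for illustration. They do not call polars or pyarrow and are not how either library implements this.

```python
from pathlib import Path


def hive_partition_path(root: Path, partitions: dict) -> Path:
    """Build root/key1=value1/key2=value2/... (hypothetical helper)."""
    path = root
    for key, value in partitions.items():
        path = path / f"{key}={value}"
    return path


def parse_hive_partitions(path: Path) -> dict:
    """Recover key/value pairs from a hive-style path (hypothetical helper).

    Every value comes back as a plain string, because the directory
    name carries no type information.
    """
    parts = {}
    for segment in path.parts:
        if "=" in segment:
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts


# Partition values for the second row of the test frame above.
example = hive_partition_path(
    Path("test_dataset"), {"ticker": "TRUE", "year": 2022, "month": 2}
)
print(example.as_posix())
print(parse_hive_partitions(example))
```

Note that the values parsed back from the path are all strings, so a reader has to decide on its own what dtype each partition column should get.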