Comments (9)
I'm marking this as an enhancement rather than as a bug since the docs clearly state "Series only support the vertical strategy." for pl.concat.
That said, I personally would be okay with allowing named series on pl.concat for horizontal.
@stinodego What do you think?
from polars.
> I personally would be okay with allowing named series on pl.concat for horizontal.
Is a named series a series whose name is not ""? How do we know that the series was not explicitly named this?
I would expect concat to fail if we end up with multiple columns with the same name (i.e. the user attempts to concatenate multiple Series named ""), but otherwise succeed, as in:
>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> s = pl.Series([4, 5, 6])
>>> pl.concat((df, s), how="horizontal")
shape: (3, 2)
┌─────┬─────┐
│ a   ┆     │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
│ 2   ┆ 5   │
│ 3   ┆ 6   │
└─────┴─────┘
but:
>>> pl.concat((s, s), how="horizontal")
polars.exceptions.DuplicateError: unable to hstack, column with name "" already exists
from polars.
> Is a named series a series whose name is not ""?
Yes.
> How do we know that the series was not explicitly named this?
I don't really care to discuss the scenario where someone intentionally names their columns the empty string. The vast majority of unnamed series are due to people simply not naming them, which is not something I'd like to support in concat.
from polars.
There is Series.to_frame, which makes the conversion explicit and allows for naming the series. I'm not sure an implicit conversion is good for the API because in some cases, concatenating a DataFrame with a Series or vice versa may be accidental.
from polars.
@mickvangelderen I'm not sure I follow; Series in principle should be interchangeable with eager DataFrame columns, and hstacking a (named) Series onto an existing df makes perfect logical sense:
df = pl.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
})
s = pl.Series("c", [7, 8, 9])
# currently, we must write df.hstack(s.to_frame())
print(df.hstack(s))
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 4   ┆ 7   │
│ 2   ┆ 5   ┆ 8   │
│ 3   ┆ 6   ┆ 9   │
└─────┴─────┴─────┘
from polars.
I'm fine closing this if we want to only strictly allow dataframes in concat operations, as calling to_frame() is fairly easy. But it's something pandas supports, and something that feels logical since there is no ambiguity in what should happen.
from polars.
It is not immediately clear to me what "Combine multiple DataFrames, LazyFrames, or Series into a single object." means exactly in the concat docs. The type of items is items: Iterable[PolarsType], where PolarsType = TypeVar("PolarsType", "DataFrame", "LazyFrame", "Series", "Expr"). Does that mean that each item in the iterable has to be of the same concrete type? That would mean that you can concat a DataFrame with a DataFrame, and a Series with a Series, but not necessarily a DataFrame with a Series.
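To illustrate the question, here is a small standalone sketch of how a constrained TypeVar behaves. Frame, Column, and combine are hypothetical stand-ins, not polars classes or functions. A static checker such as mypy would be expected to reject the mixed call because T has to resolve to a single one of the listed types. Python does not enforce annotations at runtime, so the sketch adds an explicit check to show the same rule in action.

```python
from typing import Iterable, List, TypeVar


class Frame:
    """Hypothetical stand-in for DataFrame."""


class Column:
    """Hypothetical stand-in for Series."""


# Constrained TypeVar: per call, T resolves to exactly one of the listed types.
T = TypeVar("T", Frame, Column)


def combine(items: Iterable[T]) -> List[T]:
    """Collect items, enforcing at runtime what the annotation implies statically."""
    collected = list(items)
    kinds = {type(item) for item in collected}
    if len(kinds) > 1:
        names = sorted(kind.__name__ for kind in kinds)
        raise TypeError(f"mixed item types: {names}")
    return collected


combine([Frame(), Frame()])    # fine: T resolves to Frame
combine([Column(), Column()])  # fine: T resolves to Column

try:
    combine([Frame(), Column()])  # a static checker is expected to flag this call
except TypeError as exc:
    print(exc)
```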
from polars.
> I'm marking this as an enhancement rather than as a bug since the docs clearly state "Series only support the vertical strategy." for pl.concat. That said, I personally would be okay with allowing named series on pl.concat for horizontal. @stinodego What do you think?
I don't see why we couldn't support horizontal concatenation of Series. The user must make sure the Series names are unique, otherwise we raise an error.
from polars.
@mickvangelderen I agree that it's ambiguous, we should rework the language on that, although it will depend on the decision made in this issue.
I am not sure about mixing eager and lazy frames. We do allow for mixing lazy frames with series, as the Series is simply considered as input into the lazy query plan:
>>> pl.LazyFrame().with_columns(
...     pl.Series("a", [1, 2, 3])
... ).collect()
shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 2   │
│ 3   │
└─────┘
However, join, concat, hstack, etc. do not work with eager and lazy frame combinations. I feel that allowing joint operations on lazy and eager dataframes opens a Pandora's box that we should leave closed.
from polars.