Comments (5)
In pandas, I use the read_csv function and pass the parameter sep='\s+' to split the data on runs of whitespace.
df = pd.read_csv(filename, header=None, skiprows=6, sep=r'\s+')
from polars.
Yeah, this also works, but as I said, polars currently does not support a regex or multi-character string as the separator, only a single character.
There are workarounds, but they are not very nice 😆
DATA = """\
11.50225 34.62792 341.48861 60.23845 33.86916 340.52216
16.08011 46.36068 112.74108 82.09562 45.90745 112.68871
5.44448 64.20202 84.74526 92.26079 63.48149 84.83877
154.21007 40.30874 284.20968 248.08102 40.32464 284.05453
44.78606 81.08370 306.90320 207.53215 80.58101 307.01056
187.79354 52.18742 348.14328 254.43741 52.35809 348.16040
3.19632 58.35471 336.89014 83.53841 59.67276 335.88022
4.53459 54.00255 23.75481 66.02106 51.58699 23.86702
"""
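import polars as pl

pl.read_csv(DATA.encode(), has_header=False, new_columns=["data"]).with_columns(
    pl.col("data")
    .str.strip_chars(" ")        # drop leading/trailing spaces
    .str.replace_all(" +", " ")  # collapse runs of spaces into one
    .str.split(" ")              # split each line into a list of strings
    .list.to_struct()            # one struct field per value
).unnest(columns="data").with_columns(pl.all().cast(pl.Float64))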
shape: (8, 6)
┌───────────┬──────────┬───────────┬───────────┬──────────┬───────────┐
│ field_0 ┆ field_1 ┆ field_2 ┆ field_3 ┆ field_4 ┆ field_5 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═══════════╪══════════╪═══════════╪═══════════╪══════════╪═══════════╡
│ 11.50225 ┆ 34.62792 ┆ 341.48861 ┆ 60.23845 ┆ 33.86916 ┆ 340.52216 │
│ 16.08011 ┆ 46.36068 ┆ 112.74108 ┆ 82.09562 ┆ 45.90745 ┆ 112.68871 │
│ 5.44448 ┆ 64.20202 ┆ 84.74526 ┆ 92.26079 ┆ 63.48149 ┆ 84.83877 │
│ 154.21007 ┆ 40.30874 ┆ 284.20968 ┆ 248.08102 ┆ 40.32464 ┆ 284.05453 │
│ 44.78606 ┆ 81.0837 ┆ 306.9032 ┆ 207.53215 ┆ 80.58101 ┆ 307.01056 │
│ 187.79354 ┆ 52.18742 ┆ 348.14328 ┆ 254.43741 ┆ 52.35809 ┆ 348.1604 │
│ 3.19632 ┆ 58.35471 ┆ 336.89014 ┆ 83.53841 ┆ 59.67276 ┆ 335.88022 │
│ 4.53459 ┆ 54.00255 ┆ 23.75481 ┆ 66.02106 ┆ 51.58699 ┆ 23.86702 │
└───────────┴──────────┴───────────┴───────────┴──────────┴───────────┘
However, if the file is not huge, the best way is probably to read the data as text, replace every \s+ match with ',' and then read the "clean" CSV with polars' read_csv.
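A minimal sketch of that cleanup step, using only the standard library. The helper name whitespace_to_csv is made up for illustration, and it assumes the whole file fits in memory and that no field contains a comma or embedded whitespace:

```python
import re


def whitespace_to_csv(text: str) -> str:
    """Turn whitespace-delimited text into comma-delimited text.

    Hypothetical helper: works line by line so that newlines are kept,
    and strips each line first so leading padding does not produce an
    empty first field.
    """
    cleaned_lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        cleaned_lines.append(re.sub(r"\s+", ",", stripped))
    return "".join(line + "\n" for line in cleaned_lines)


raw = "11.50225   34.62792  341.48861\n 16.08011 46.36068   112.74108\n"
cleaned = whitespace_to_csv(raw)
```

The cleaned string can then be handed to polars the same way DATA is above, e.g. pl.read_csv(cleaned.encode(), has_header=False).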
AFAIK this is not possible with polars currently, because the separator must be a single character.
What you are looking for is the equivalent of pandas' read_fwf, which reads "fixed-width-formatted" data (https://pandas.pydata.org/docs/reference/api/pandas.read_fwf.html).
There are a few issues open for this already, but it is not yet supported.
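For anyone unfamiliar with the term, here is a rough stdlib sketch of what "fixed-width" parsing means: each field occupies a known number of characters, and padding is stripped afterwards. This is not how pandas implements read_fwf; the function name parse_fixed_width and the column widths are assumptions chosen for this example.

```python
def parse_fixed_width(line: str, widths: list[int]) -> list[str]:
    """Slice a line into fields of the given character widths.

    Hypothetical helper for illustration only.
    """
    fields = []
    start = 0
    for width in widths:
        # Take the next `width` characters and drop the padding around them.
        fields.append(line[start:start + width].strip())
        start += width
    return fields


# Assumed layout: columns of 9, 10 and 10 characters, right-aligned.
row = parse_fixed_width(" 11.50225  34.62792 341.48861", [9, 10, 10])
values = [float(field) for field in row]
```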
No, because the separator parameter of the read_csv method is implemented to accept only a single-byte character.
As answered above: this is not possible.