Comments (5)

MarcoGorelli commented on August 18, 2024

Thanks for the report. I'm surprised this didn't work; I'll check.

from polars.

Julian-J-S commented on August 18, 2024

This might be a "bug" or a design decision in chrono (the Rust parser), but:

  • "%Y-%m-%d %H:%M:%S%.6f0" 💥
    • this does NOT work in this case
    • this form is usually supported
  • "%Y-%m-%d %H:%M:%S.%6f0" 🍀
    • this WORKS
    • note: the dot . has moved to the left of the %
    • this makes the dot a literal character instead of letting chrono handle it as part of the fractional seconds

MarcoGorelli commented on August 18, 2024

To me this looks like it might be a bug in Chrono:

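use chrono::NaiveDateTime;

fn main() {
    // Seven fractional digits in the input, literal trailing `0` in the format.
    let result = NaiveDateTime::parse_from_str(
        "2024-06-03 20:02:48.6800000",
        "%Y-%m-%d %H:%M:%S%.6f0",
    );
    println!("{:?}", result);

    // Six fractional digits in the input, no trailing literal in the format.
    let result = NaiveDateTime::parse_from_str(
        "2024-06-03 20:02:48.680000",
        "%Y-%m-%d %H:%M:%S%.6f",
    );
    println!("{:?}", result);
}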

prints out

Err(ParseError(TooShort))
Ok(2024-06-03T20:02:48.680)

tritemio commented on August 18, 2024

For reference, I have created a pytest file that tests several format strings with datetime.strptime, pandas.to_datetime and polars to_datetime. The results show that the valid format string differs across the three: datetime and pandas behave similarly, while polars requires a different format.

I found that polars converts the string even without the trailing 0 in the format when using either %.f or %.6f. This should be the right format for chrono (although it differs from the Python conventions).

Full results:
[Screenshot of the full test results, 2024-06-26]

pytest file:

from datetime import datetime

import pandas as pd
import polars as pl
import pytest

dt_string = "2024-06-03 20:02:48.6800000"


dt_formats = [
    "%Y-%m-%d %H:%M:%S%.f",
    "%Y-%m-%d %H:%M:%S%.6f",
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%d %H:%M:%S.%6f",
]
dt_formats = dt_formats + [f + "0" for f in dt_formats]


@pytest.fixture
def df():
    return pl.DataFrame({"dt": dt_string})


@pytest.mark.parametrize("format", dt_formats)
def test_polars(df, format, capsys):
    with capsys.disabled():
        print(format)
        print(df)

    t_ref = pl.Series(name="dt", values=[datetime(2024, 6, 3, 20, 2, 48, 680000)])
    df2 = df.with_columns(dt=pl.col("dt").str.to_datetime(format, time_unit="ns"))

    with capsys.disabled():
        print(df2)
    assert (df2["dt"] == t_ref).all()


@pytest.mark.parametrize("format", dt_formats)
def test_pandas(df, format, capsys):
    with capsys.disabled():
        print(format)
        print(df)

    t_ref = pd.Series(name="dt", data=[datetime(2024, 6, 3, 20, 2, 48, 680000)])
    t = pd.to_datetime(
        df["dt"].to_pandas(use_pyarrow_extension_array=True), format=format
    )

    with capsys.disabled():
        print(t)

    assert (t == t_ref).all()


@pytest.mark.parametrize("format", dt_formats)
def test_datetime(df, format, capsys):
    dt_ref = datetime(2024, 6, 3, 20, 2, 48, 680000)
    dt = datetime.strptime(dt_string, format)

    assert dt == dt_ref
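
The Python side of this comparison can be reproduced with the standard library alone. The snippet below is only a minimal sketch, not part of the test file above; `try_parse` is a hypothetical helper introduced for illustration. It relies on the documented behaviour that `%f` in `datetime.strptime` accepts one to six digits, so the seventh digit of the input has to be matched by a literal `0` in the format.

```python
from datetime import datetime

dt_string = "2024-06-03 20:02:48.6800000"  # seven fractional digits


def try_parse(fmt):
    """Hypothetical helper: return the parsed datetime, or None on failure."""
    try:
        return datetime.strptime(dt_string, fmt)
    except ValueError:
        # strptime raises ValueError when part of the input is left unparsed.
        return None


# %f consumes at most six digits, so the seventh digit is left over.
without_zero = try_parse("%Y-%m-%d %H:%M:%S.%f")
# A literal trailing 0 in the format absorbs the seventh digit.
with_zero = try_parse("%Y-%m-%d %H:%M:%S.%f0")

print(without_zero)
print(with_zero)
```

This says nothing about polars or chrono; it only shows why the trailing 0 is needed on the datetime side.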

Julian-J-S commented on August 18, 2024

@tritemio correct, Python's datetime and chrono behave differently in some details.

You can check the documentation:

Python
[image]

chrono
[image]

[image]

Watch out for this (chrono):

date = "2020-01-01 10:00:00.1234"

  • "%Y-%m-%d %H:%M:%S%.f" -> 2020-01-01 10:00:00.123400
  • "%Y-%m-%d %H:%M:%S.%f" -> 2020-01-01 10:00:00.000001234
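
For contrast, here is a rough sketch of how Python's own `datetime.strptime` reads the same string, followed by plain integer arithmetic that mimics the two interpretations listed above. The arithmetic is only an emulation written for illustration (the names `as_fraction_ns` and `as_integer_ns` are made up here); it does not call chrono or polars.

```python
from datetime import datetime

date = "2020-01-01 10:00:00.1234"

# Python treats the digits after the dot as a decimal fraction of a second,
# zero-padded on the right to six digits (microseconds).
py = datetime.strptime(date, "%Y-%m-%d %H:%M:%S.%f")
print(py.microsecond)

digits = date.split(".")[1]  # "1234"

# Reading the digits as a decimal fraction: 0.1234 s, expressed in nanoseconds.
as_fraction_ns = int(digits.ljust(9, "0"))
# Reading the digits as an integer count of nanoseconds, which is what the
# second result in the list above corresponds to.
as_integer_ns = int(digits)

print(as_fraction_ns, as_integer_ns)
```

The same four digits differ by a factor of 100,000 depending on the interpretation, which is why the position of the dot in the format string matters.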
