
Comments (6)

midichef commented on May 23, 2024

Great repro case for this report.

I narrowed down where to look. This line creates the slowness for dates. If I comment it out, describe-sheet finishes quickly.

v = srccol.type(v)
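
As a rough illustration of why that one line can dominate (this is not VisiData's actual code), the sketch below applies a type function once per cell, the same shape as `v = srccol.type(v)`. The names `parse_date` and `convert_all` are hypothetical, the data is synthetic, and `datetime.strptime` is only a stand-in for a date column type.

```python
import timeit
from datetime import datetime

def parse_date(s):
    # Hypothetical stand-in for a date column type; VisiData's real date type differs.
    return datetime.strptime(s, "%Y-%m-%d")

def convert_all(type_func, values):
    # Same shape as `v = srccol.type(v)`, applied to every cell in a column.
    return [type_func(v) for v in values]

# Synthetic cells: the same number of values for each column type.
int_cells = [str(i) for i in range(20000)]
date_cells = ["2024-05-%02d" % (i % 28 + 1) for i in range(20000)]

int_secs = timeit.timeit(lambda: convert_all(int, int_cells), number=3)
date_secs = timeit.timeit(lambda: convert_all(parse_date, date_cells), number=3)
print(f"int: {int_secs:.3f}s  date: {date_secs:.3f}s")
```

Comparing the two timings on your own machine should show how much more each date conversion costs than each int conversion.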


anjakefala commented on May 23, 2024

Thank you for such a detailed report, and providing sample data!

> This regression is not seen with other typed column (integer)

Do you mean that this behaviour has regressed since a previous version, or that you noticed a difference in behaviour between int and date?


saulpw commented on May 23, 2024

Parsing dates is expensive, especially with python-dateutil. If you know the format, try using z@.


adren commented on May 23, 2024

> Thank you for such a detailed report, and providing sample data!
>
> > This regression is not seen with other typed column (integer)
>
> Do you mean that this behaviour has regressed since a previous version, or that you noticed a difference in behaviour between int and date?

No, it's not a regression in 3.x per se: the problem was already there in 2.11.


adren commented on May 23, 2024

> Parsing dates is expensive, especially with python-dateutil. If you know the format, try using z@.

Even when specifying a custom date format on that particular big file, the summarizing still takes 49 seconds,
which is +122% compared to the 22 seconds with no typing,
but still better than without specifying a format (122 seconds).

$ time vd -b -p describe_cd.vdj 
saul.pw/VisiData v3.0.2
Support VisiData: https://github.com/sponsors/saulpw
opening describe_cd.vdj as vdj
opening people-2000000.csv as csv
set type of current column to custom date format
open Describe Sheet with descriptive statistics for all visible columns
replay complete

real    0m49,026s
user    0m48,227s
sys     0m0,809s


saulpw commented on May 23, 2024

Yes, this makes sense. Again, parsing dates is expensive, even with strptime (which is what z@ uses). If you write a Python script that converts all elements in that column to date objects and then summarizes them, you should find that it takes about the same amount of time. If you want the values summarized as dates, the work has to be done somewhere!
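
For anyone who wants to try that comparison, here is a minimal sketch of such a script. It uses synthetic in-memory data rather than the real `people-2000000.csv`, and the column name `birthdate`, the `%Y-%m-%d` format, and the `summarize_dates` helper are all assumptions made for illustration.

```python
import csv
import io
import time
from datetime import datetime, timedelta

# Synthetic stand-in for the real CSV; column name and date format are assumptions.
start = datetime(1950, 1, 1)
rows = [
    {"birthdate": (start + timedelta(days=i % 20000)).strftime("%Y-%m-%d")}
    for i in range(100000)
]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["birthdate"])
writer.writeheader()
writer.writerows(rows)
buf.seek(0)

def summarize_dates(fileobj, column, fmt):
    # Parse every cell with strptime, then compute a few describe-style statistics.
    values = [datetime.strptime(r[column], fmt) for r in csv.DictReader(fileobj)]
    return {
        "count": len(values),
        "distinct": len(set(values)),
        "min": min(values),
        "max": max(values),
    }

t0 = time.perf_counter()
summary = summarize_dates(buf, "birthdate", "%Y-%m-%d")
elapsed = time.perf_counter() - t0
print(summary, f"{elapsed:.2f}s")
```

To compare against the numbers above, point it at the real file and column, and scale the row count to match.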

