Comments (6)
Great repro case for this report.
I narrowed down where to look. This line creates the slowness for dates. If I comment it out, describe-sheet finishes quickly.
visidata/visidata/features/describe.py
Line 85 in 6a1f17c
Thank you for such a detailed report, and providing sample data!
This regression is not seen with other typed column (integer)
Do you mean that this behaviour has regressed since a previous version, or that you noticed a difference in behaviour between `int` and `date`?
Parsing dates is expensive, especially with python-dateutil. If you know the format, try using `z@`.
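To illustrate why a known format helps, here is a minimal stdlib-only sketch (an illustration, not VisiData's code) that times `datetime.strptime` with one fixed format against a hypothetical `parse_guessing` helper that tries several formats in turn. The guessing helper is only a crude stand-in for format inference: python-dateutil does not work this way internally. The format list, the default `"%Y-%m-%d"` format and the synthetic dates are all assumptions.

```python
import time
from datetime import datetime

# Hypothetical candidate formats, standing in for a format-guessing parser.
CANDIDATE_FORMATS = ["%d/%m/%Y", "%m-%d-%Y", "%Y.%m.%d", "%Y-%m-%d"]


def parse_known(s, fmt="%Y-%m-%d"):
    """Parse with a single known format via strptime (assumed default format)."""
    return datetime.strptime(s, fmt)


def parse_guessing(s):
    """Try each candidate format until one matches; a crude stand-in for guessing."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f"unrecognized date: {s!r}")


def bench(parse, values):
    """Parse every value and return (parsed values, elapsed seconds)."""
    start = time.perf_counter()
    parsed = [parse(v) for v in values]
    return parsed, time.perf_counter() - start


if __name__ == "__main__":
    # Synthetic ISO dates, not the data from the report.
    values = [
        f"20{y:02d}-{m:02d}-{d:02d}"
        for y in range(24)
        for m in range(1, 13)
        for d in range(1, 29)
    ]
    known, t_known = bench(parse_known, values)
    guessed, t_guess = bench(parse_guessing, values)
    assert known == guessed  # both approaches must agree on the result
    print(f"{len(values)} values: known format {t_known:.3f}s, guessing {t_guess:.3f}s")
```

Because the ISO format is last in the candidate list, every value pays for three failed `strptime` attempts before succeeding, which is the kind of overhead a known format avoids.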
No, it's not a regression in 3.x per se: the problem was already there in 2.11.
Even when I specify a custom date format on that large file, summarizing still takes 49 seconds, which is about 122% more than the 22 seconds it takes with no typing, though still better than without specifying a format (122 seconds):
```
$ time vd -b -p describe_cd.vdj
saul.pw/VisiData v3.0.2
Support VisiData: https://github.com/sponsors/saulpw
opening describe_cd.vdj as vdj
opening people-2000000.csv as csv
set type of current column to custom date format
open Describe Sheet with descriptive statistics for all visible columns
replay complete
real 0m49,026s
user 0m48,227s
sys 0m0,809s
```
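For reference, the reported timings work out as follows. This is a back-of-the-envelope check that treats the untyped 22 s run as the baseline and ignores run-to-run variance:

$$
\frac{49\,\mathrm{s} - 22\,\mathrm{s}}{22\,\mathrm{s}} = \frac{27}{22} \approx 1.227 \;(\approx +122\%),
\qquad
\frac{122\,\mathrm{s}}{49\,\mathrm{s}} \approx 2.49
$$

Under that baseline assumption, date typing adds roughly 27 s with a known format ($49 - 22$) versus roughly 100 s without one ($122 - 22$).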
Yes, this makes sense. Again, parsing dates is expensive, even with strptime (which is what `z@` uses). If you write a Python script that converts all elements in that column to date objects and then summarizes them, you should find that it takes about the same amount of time. If you want the values summarized as dates, the work has to be done somewhere!
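A script along the lines suggested above might look like the following minimal sketch. It is an assumption-laden illustration, not VisiData's Describe Sheet: it reads one column with `csv.DictReader`, converts each cell with `datetime.strptime`, and computes only a handful of describe-style statistics, timing the parse and summarize phases separately. The column name, the sample data and the chosen statistics are hypothetical.

```python
import csv
import io
import statistics
import time
from datetime import date, datetime


def read_column(f, column):
    """Read the raw string values of one CSV column (column name is caller-supplied)."""
    return [row[column] for row in csv.DictReader(f)]


def parse_column(raw, fmt):
    """Convert raw strings to date objects; unparseable cells are counted as errors."""
    parsed, errors = [], 0
    for s in raw:
        try:
            parsed.append(datetime.strptime(s, fmt).date())
        except ValueError:
            errors += 1
    return parsed, errors


def summarize(dates: list[date], errors: int) -> dict:
    """A few describe-style statistics over already-parsed dates."""
    if not dates:
        return {"count": 0, "errors": errors}
    return {
        "count": len(dates),
        "errors": errors,
        "distinct": len(set(dates)),
        "min": min(dates),
        "max": max(dates),
        "median": statistics.median_low(dates),  # an actual element, no averaging
    }


def timed_describe(raw, fmt):
    """Return (stats, parse seconds, summarize seconds) for one column."""
    t0 = time.perf_counter()
    dates, errors = parse_column(raw, fmt)
    t1 = time.perf_counter()
    stats = summarize(dates, errors)
    t2 = time.perf_counter()
    return stats, t1 - t0, t2 - t1


if __name__ == "__main__":
    # Tiny synthetic CSV standing in for the real file; "Date of birth" is hypothetical.
    sample = "id,Date of birth\n1,1985-03-02\n2,1990-11-30\n3,not-a-date\n4,1985-03-02\n"
    raw = read_column(io.StringIO(sample), "Date of birth")
    stats, parse_s, summary_s = timed_describe(raw, "%Y-%m-%d")
    print(stats)
    print(f"parse: {parse_s:.6f}s, summarize: {summary_s:.6f}s")
```

Pointed at a large file with `open(path, newline="")` instead of `io.StringIO`, the two timings should indicate whether most of the cost sits in the `strptime` conversion or in the summary step itself.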