We use Python's Decimal for parsing: https://github.com/wireservice/agate/blob/e6fc5beb65f444a84b02884e9c5e7b3e344599dd/agate/data_types/number.py#L95
For example:
>>> from decimal import Decimal
>>> Decimal('1_1')
Decimal('11')
Decimal works this way because, in Python, the underscore can be used as a grouping character:
>>> 1_000_000
1000000
>>> 1_1_1_1
1111
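Decimal is not the only constructor that does this. Since Python 3.6 (PEP 515), int(), float() and Decimal() all accept single underscores between digits in strings. One way a parser could opt out is to reject any input containing an underscore before handing it to Decimal. The sketch below assumes a helper named parse_decimal_strict, which is hypothetical and not part of agate or csvkit:

```python
from decimal import Decimal

# Since Python 3.6 (PEP 515), these constructors all read "1_1" as eleven.
print(int("1_1"), float("1_1"), Decimal("1_1"))  # prints: 11 11.0 11


def parse_decimal_strict(text: str) -> Decimal:
    """Hypothetical helper: parse like Decimal(), but refuse grouping underscores."""
    if "_" in text:
        raise ValueError(f"underscore not allowed in number: {text!r}")
    return Decimal(text)


print(parse_decimal_strict("11"))  # prints: 11
try:
    parse_decimal_strict("1_1")
except ValueError as exc:
    print(exc)  # prints: underscore not allowed in number: '1_1'
```

During type inference, raising on such values would presumably cause the column to be inferred as text rather than as a number. That outcome depends on how the caller handles the error.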
I'd be curious to know how tidyverse or pandas handle this.
In terms of feedback to the user, we can hide it behind the -v (--verbose) flag, or we can figure out a way to keep the feedback minimal (e.g. only warn once, not every time).
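The two feedback options can also be combined. Without -v/--verbose, the tool would warn only on the first occurrence. With the flag, it would report every occurrence. The sketch below assumes a hypothetical InferenceNotifier class, which is not an existing csvkit API, and the warning text is illustrative only:

```python
import argparse
import sys


class InferenceNotifier:
    """Hypothetical: report type-inference surprises once, or every time if verbose."""

    def __init__(self, verbose=False, stream=None):
        self.verbose = verbose
        self.stream = stream if stream is not None else sys.stderr
        self.warned = False

    def notify(self, message):
        """Write the warning and return True, or stay quiet and return False."""
        if self.verbose or not self.warned:
            print(f"warning: {message}", file=self.stream)
            self.warned = True
            return True
        return False


parser = argparse.ArgumentParser()
parser.add_argument("-v", "--verbose", action="store_true")
args = parser.parse_args([])  # no -v given, so only the first warning is written

notifier = InferenceNotifier(verbose=args.verbose)
notifier.notify("'1_1' in column 'a' was parsed as the number 11")
notifier.notify("'2_2' in column 'a' was parsed as the number 22")  # suppressed
```

Writing to stderr keeps the warning out of the CSV data on stdout, so pipelines built from joins, stacks and greps are unaffected.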
I'll check to confirm my memory that this doesn't happen in pandas/tidyverse. I'm not super familiar with their underlying code, but I can take a quick look to figure out how they're handling it.
I generally prefer CLI tools to be as quiet as possible, so in that respect I'm leaning towards the verbose option.
However, I also value no surprises: the root cause of the surprise was that csvkit was doing any datatype inference at all.
My mental model had it just treating everything as strings, probably because while I use it a lot (thanks!), I don't do any real data analysis with it, just a bunch of joins, stacks, and greps. In that respect, a one-off warning might be better.