Comments (5)
Yes - both make sense. The whitelist/blacklist are used and supported only by the csv and parquet file loading systems, where they make the parsing far cheaper. They aren't currently passed into the new-dataset system.
Or - put more generally - it is up to the specific loading extension to either support them or not.
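For illustration, a minimal sketch of passing a whitelist to the csv loader. It assumes the `tech.v3.dataset` namespace is on the classpath; the temp file and the column names `id`, `price`, and `notes` are made up for the example.

```clojure
(require '[tech.v3.dataset :as ds])

;; Hypothetical sample file; the ".csv" suffix lets ->dataset pick the csv loader.
(def csv-file (java.io.File/createTempFile "example" ".csv"))
(spit csv-file "id,price,notes\n1,10.0,first\n2,12.5,second\n")

;; Only the whitelisted columns are requested; "notes" is left out at load time.
(def trimmed
  (ds/->dataset (.getPath csv-file) {:column-whitelist ["id" "price"]}))

(ds/column-names trimmed)
```

Whether a given loader honors the option depends on that loader, as noted above.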
from tech.ml.dataset.
[not urgent - I can proceed presently with acceptable performance w/ ds/select-columns]
Thanks! That helps me understand a lot.
I get now that there is more time to be saved w/ an allow list on (for example) csv than on json: with json, a lot of the parsing is already done to make the sequence of maps, whereas with csv there is more work to save by skipping entire columns.
I am going to proceed with a pathway something like json -> ds/->dataset -> ds/select-columns -> ds/row-map for now, because I think w/ all the data I need to process this way this won't be the performance bottleneck. I'll circle back around if it does become a pain/blocker.
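A rough sketch of that pathway, assuming the `tech.v3.dataset` namespace and using an in-memory sequence of maps as a stand-in for parsed json. The field names and the derived `:price-with-tax` column are hypothetical.

```clojure
(require '[tech.v3.dataset :as ds])

;; Stand-in for parsed json: a sequence of maps (hypothetical fields).
(def records
  [{:id 1 :price 10.0 :notes "first"}
   {:id 2 :price 12.5 :notes "second"}])

(def result
  (-> (ds/->dataset records)
      ;; Drop unneeded columns after loading, since no allow list applies here.
      (ds/select-columns [:id :price])
      ;; row-map calls the fn with each row as a map; returned keys become columns.
      (ds/row-map (fn [row] {:price-with-tax (* 1.2 (:price row))}))))
```

Here the column selection happens after the whole dataset is built, so it saves memory downstream but not parsing time.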
Separately, I do think having :column-allowlist and :column-blocklist as synonyms for :column-whitelist and :column-blacklist is a good idea.
Yes very good. Much clearer and they don't rely on anachronisms.