Two thoughts: Tried :column-whitelis

`:column-whitelist` thoughts about tech.ml.dataset HOT 5 OPEN

harold commented on July 24, 2024

`:column-whitelist` thoughts

from tech.ml.dataset.

Comments (5)

cnuernber commented on July 24, 2024

Yes - both make sense. The whitelist/blacklist are used and supported only with the csv and parquet file loading systems, and they make the parsing far cheaper. They aren't currently passed into the new-dataset system.

Or - put more generally - it is up to the specific loading extension to either support them or not.
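For readers following along, here is a minimal sketch of the csv case described above, assuming the `tech.v3.dataset` namespace is on the classpath. The temp file, its contents, and the column names are hypothetical and exist only to keep the example self-contained.

```clojure
(require '[tech.v3.dataset :as ds])

;; Hypothetical CSV written to a temp file so the sketch is self-contained.
(def csv-path
  (let [f (java.io.File/createTempFile "example" ".csv")]
    (spit f "id,name,score\n1,a,10\n2,b,20\n")
    (.getPath f)))

;; Only the listed columns are requested; the csv loader can skip
;; the "name" column while parsing instead of dropping it afterwards.
(def narrow (ds/->dataset csv-path {:column-whitelist ["id" "score"]}))

;; Inspect which columns were actually loaded.
(ds/column-names narrow)
```

Whether the option has any effect depends on the loader behind the input type, as noted above.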


harold commented on July 24, 2024

[not urgent - I can proceed presently with acceptable performance w/ ds/select-columns]


harold commented on July 24, 2024

Thanks! That helps me understand a lot.

I get now that there is more time to be saved w/ an allow list on (for example) csv than on json, because a lot of the parsing is already done to make the sequence of maps in json, whereas with csv there is more work to save by skipping entire columns.

I am going to proceed with a pathway something like json -> ds/->dataset -> ds/select-columns -> ds/row-map for now, because I think w/ all the data I need to process this way this won't be the performance bottleneck. I'll circle back around if it does become a pain/blocker.
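A rough sketch of that pathway, assuming the json has already been parsed into a sequence of maps. The `rows` data, the column names, and the `:score-doubled` column are hypothetical and only illustrate the shape of the pipeline.

```clojure
(require '[tech.v3.dataset :as ds])

;; Hypothetical rows, standing in for already-parsed json (a sequence of maps).
(def rows [{:id 1 :name "a" :score 10 :notes "x"}
           {:id 2 :name "b" :score 20 :notes "y"}])

(def result
  (-> (ds/->dataset rows)
      ;; Narrow to the columns of interest after the dataset is built.
      (ds/select-columns [:id :score])
      ;; row-map calls the function on each row (as a map) and merges the
      ;; returned map back into the dataset as new or replaced columns.
      (ds/row-map (fn [row] {:score-doubled (* 2 (:score row))}))))

(ds/column-names result)
```

Here the column selection happens after the full dataset has been constructed, which is the tradeoff discussed above: nothing is saved during parsing, but the later steps work on fewer columns.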

Separately, I do think having :column-allowlist and :column-blocklist as synonyms for :column-whitelist and :column-blacklist is a good idea.


cnuernber commented on July 24, 2024

Yes very good. Much clearer and they don't rely on anachronisms.

