Comments (6)
SQLite is much more than just reading a table. There could be some better binary format for this.
Can we hook the CSV reader so that it emits rows to another parallel thread and stores to an sqlite DB (or other fast to read format)? This could happen in the background while a query is doing a full scan.
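A minimal sketch of that idea, in Python for brevity rather than in the project's own language. It assumes a CSV file with a header row and stores every value as text. The names `scan_and_cache`, `_writer`, and the `cache` table are hypothetical and are not part of the project. Rows are handed to the caller as they are read, while a background thread copies them into SQLite.

```python
import csv
import io
import queue
import sqlite3
import threading

_DONE = object()  # sentinel that tells the writer thread to stop


def _writer(db_path, table, columns, rows):
    """Background thread: drain the queue into an SQLite table."""
    conn = sqlite3.connect(db_path)
    cols = ", ".join(f'"{c}"' for c in columns)
    marks = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    while True:
        row = rows.get()
        if row is _DONE:
            break
        conn.execute(f'INSERT INTO "{table}" VALUES ({marks})', row)
    conn.commit()
    conn.close()


def scan_and_cache(csv_file, db_path, table="cache"):
    """Yield CSV rows to the caller while a second thread stores them."""
    reader = csv.reader(csv_file)
    columns = next(reader)  # assumption: the first line is a header
    rows = queue.Queue(maxsize=1024)  # bounded, so the reader cannot run far ahead
    worker = threading.Thread(target=_writer, args=(db_path, table, columns, rows))
    worker.start()
    try:
        for row in reader:
            rows.put(row)
            yield row
    finally:
        rows.put(_DONE)
        worker.join()
```

A query doing a full scan would iterate over `scan_and_cache(...)` instead of the plain reader. Whether later reads from the SQLite copy are actually faster than re-parsing the CSV would have to be measured.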
It makes more sense to me to do this in Easy Diffix, after schema detection.
Not sure how much we would gain by this.
If we don't plan to use column indexes, I don't have high hopes it would be much faster.
I would prefer if we were able to read compressed data. I also don't want to have to worry about cache storage and garbage collection.
> I would prefer if we were able to read compressed data.
I gave this a try, but got no significant performance gain.
I don't think we should do data caching (adds too many headaches). We could try to use separate threads for data loading and aggregation, maybe that makes an impact.
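As a rough illustration of splitting loading and aggregation across threads, here is a sketch in Python with hypothetical names (`count_by_key`, `_load`); it is not the project's code. One thread batches incoming rows into a bounded queue while the calling thread aggregates them, here with a simple per-key count standing in for the real aggregation.

```python
import queue
import threading
from collections import Counter

_DONE = object()  # sentinel marking the end of the input


def _load(rows, out, batch_size):
    """Loader thread: group rows into batches and hand them over."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            out.put(batch)
            batch = []
    if batch:
        out.put(batch)
    out.put(_DONE)


def count_by_key(rows, key_index, batch_size=1000):
    """Aggregate on the calling thread while another thread loads rows."""
    batches = queue.Queue(maxsize=8)  # bounded, so loading cannot run far ahead
    loader = threading.Thread(target=_load, args=(rows, batches, batch_size))
    loader.start()
    counts = Counter()
    while True:
        batch = batches.get()
        if batch is _DONE:
            break
        counts.update(row[key_index] for row in batch)
    loader.join()
    return counts
```

Batching keeps queue overhead low. Any gain depends on how much of the time goes to I/O versus aggregation, so this would need benchmarking before drawing conclusions.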
I had tried to use parallelization via PLINQ and such but it didn't make a difference (while doing the immutable data structure optimization).
Performance is quite good now, we just need to figure out how to optimize the DISTINCT count. I noticed that it is highly dependent on the data structure. It scales terribly with number of buckets I think.
OK, there is a separate issue in `reference` for that already, so this seems obsolete now.
> It scales terribly with number of buckets I think.
My impression was that it scales badly with the cardinality of the input argument.