Comments (9)
On a similar note, I was wondering how one could reuse the generated db. Changing :memory: to q.sqlite, and ending with db.conn.commit() instead of table_creator.drop_table(), did the trick.
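A minimal sketch of the same idea using Python's standard sqlite3 and csv modules, independent of q's internals: connecting to a filename such as q.sqlite instead of :memory: and committing the inserts leaves a database file that later runs can open directly. The helper name load_csv_to_sqlite and the default table name data are assumptions for illustration.

```python
import csv
import sqlite3


def load_csv_to_sqlite(csv_path, db_path, table="data"):
    """Load a CSV (first row = header) into SQLite and commit it.

    Hypothetical helper: db_path=":memory:" gives a throwaway database,
    while a filename such as "q.sqlite" persists the table for later reuse.
    """
    conn = sqlite3.connect(db_path)
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Quote column names so headers with spaces or quotes still work.
        cols = ", ".join('"' + c.replace('"', '""') + '"' for c in header)
        marks = ", ".join("?" for _ in header)
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
    # Committing (instead of dropping the table) keeps the data on disk.
    conn.commit()
    return conn
```

A later run can then call sqlite3.connect("q.sqlite") and query the data table without re-reading the CSV. Note that the saved file does not notice when the CSV changes.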
Caching the data is not straightforward, as you need to check whether the file is still the same (e.g. by comparing the file's md5sum), and you need some sort of garbage collection for old cached data.
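One way to sketch the md5sum check and the garbage collection, assuming a cache directory of SQLite files named after each CSV's md5 digest (a layout chosen for illustration, not something q is known to do): a changed file hashes to a different name, so a stale cache entry is simply never looked up, and the oldest entries are pruned. The function names are hypothetical.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_db_path(csv_path, cache_dir):
    """Map a CSV file to a cache file named after its content hash.

    If the CSV changes, its md5 changes, so an outdated cache is never reused.
    """
    os.makedirs(cache_dir, exist_ok=True)
    return os.path.join(cache_dir, file_md5(csv_path) + ".sqlite")


def collect_garbage(cache_dir, max_entries):
    """Delete the least recently modified cache files beyond max_entries."""
    entries = [
        os.path.join(cache_dir, name)
        for name in os.listdir(cache_dir)
        if name.endswith(".sqlite")
    ]
    entries.sort(key=os.path.getmtime, reverse=True)  # newest first
    for stale in entries[max_entries:]:
        os.remove(stale)
```

The cost of this approach is that every run still reads the whole file once to hash it, which is what the sampled-checksum idea further down the thread tries to avoid.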
from q.
Yeah, as we all know, the 2 most difficult problems in computer science are cache invalidation, naming things and off by one errors.
Exactly :)
Hi, sorry for the late reply, I've been offline for a couple of days.
Thanks a lot, I'll take a deeper look at your tip and see if I can find a trick to make the invalidation fast enough. I was planning on cksum, perhaps a sampled cksum or something similar, with a command line parameter to make the check stricter (and slower).
Harel
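A sketch of the sampled-checksum idea with a strict switch. It uses zlib.crc32 as a stand-in for cksum (the two use different CRC variants), and file_fingerprint with its strict, sample_size and samples parameters are hypothetical names. The sampled mode reads only a few evenly spaced blocks plus the file size, so it is fast but can miss edits that fall between samples.

```python
import os
import zlib


def file_fingerprint(path, strict=False, sample_size=64 * 1024, samples=8):
    """Cheap change-detection fingerprint for a data file.

    strict=True checksums every byte (slow on multi-GB files). Otherwise only
    `samples` evenly spaced blocks of `sample_size` bytes are read, plus the
    file size: fast, but edits between the sampled blocks go unnoticed.
    """
    samples = max(samples, 2)  # need at least a first and a last block
    size = os.path.getsize(path)
    crc = zlib.crc32(str(size).encode())
    with open(path, "rb") as f:
        if strict or size <= sample_size * samples:
            # Full pass: small files, or the user asked for the strict check.
            for chunk in iter(lambda: f.read(1 << 20), b""):
                crc = zlib.crc32(chunk, crc)
        else:
            # Sampled pass: first block, last block and evenly spaced ones between.
            step = (size - sample_size) // (samples - 1)
            for i in range(samples):
                f.seek(i * step)
                crc = zlib.crc32(f.read(sample_size), crc)
    return f"{size}:{crc:08x}"
```

With the defaults, a 3GB file costs eight 64KB reads in sampled mode, versus a full read in strict mode; the strict flag would map naturally onto the command line parameter mentioned above.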
I've created an API that allows q to be used from Python code as a module. The changes also make it possible to reuse previously loaded data (e.g. running multiple queries against the same loaded data).
An alpha version of the new API will be committed to the main branch in a couple of days.
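To illustrate what reusing previously loaded data can look like from Python, here is a hypothetical module-style wrapper around an in-memory SQLite database. It is not q's actual API; CsvQueryEngine, load and query are illustrative names. A second load of the same file under the same table name is skipped, so repeated queries never re-read the file.

```python
import csv
import sqlite3


class CsvQueryEngine:
    """Hypothetical module-style API: load each CSV once, query it many times."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.loaded = {}  # table name -> path it was loaded from

    def load(self, path, table):
        if self.loaded.get(table) == path:
            return  # already in memory; skip re-reading the file
        with open(path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            cols = ", ".join('"' + c.replace('"', '""') + '"' for c in header)
            marks = ", ".join("?" for _ in header)
            self.conn.execute(f'DROP TABLE IF EXISTS "{table}"')
            self.conn.execute(f'CREATE TABLE "{table}" ({cols})')
            self.conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
        self.loaded[table] = path

    def query(self, sql, params=()):
        """Run one SQL statement against everything loaded so far."""
        return self.conn.execute(sql, params).fetchall()
```

Because the cache key is only the path, this sketch would happily serve stale data if the file changed on disk; combining it with a fingerprint check is one way to address that.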
An alpha version of the Python API has been committed to the branch at https://github.com/harelba/q/tree/expose-as-python-api.
The Python API supports reuse of already-loaded data, and this capability is exposed on the command line by allowing the user to pass multiple queries to a single q execution, e.g.:
q "select ..." "select ..." "select ..." ...
Running q like that loads the data only once for each file, even if it's used in multiple queries. In the future, I'll probably add an interactive REPL for this as well.
Any input would be helpful and appreciated.
Harel
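A sketch of the "multiple queries per execution" pattern as a small standalone script: the CSV is read once into an in-memory table (named data here, an assumption), and each query argument then runs against it in turn. The script and its argument layout are hypothetical; they only mirror the shape of q "select ..." "select ...", not q's real option handling.

```python
import argparse
import csv
import sqlite3
import sys


def main(argv=None, out=sys.stdout):
    """Hypothetical CLI: load one CSV as table `data`, then run every query given."""
    parser = argparse.ArgumentParser(description="Run several SQL queries over one CSV file")
    parser.add_argument("csv_path")
    parser.add_argument("queries", nargs="+", help="one or more SQL statements")
    args = parser.parse_args(argv)

    conn = sqlite3.connect(":memory:")
    with open(args.csv_path, newline="") as f:  # the file is read exactly once
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join('"' + c.replace('"', '""') + '"' for c in header)
        marks = ", ".join("?" for _ in header)
        conn.execute(f"CREATE TABLE data ({cols})")
        conn.executemany(f"INSERT INTO data VALUES ({marks})", reader)

    for sql in args.queries:  # every query reuses the already-loaded table
        for row in conn.execute(sql):
            print("\t".join(str(v) for v in row), file=out)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Saved as, say, multi_query.py, it would be run as python multi_query.py data.csv "select count(*) from data" "select name from data limit 3", paying the load cost once for both queries.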
Forgot to mention: the branch's README file contains the required information about the API.
This capability is now fully supported internally, and partially exposed by running multiple queries on the same command line (each invocation of q reuses the loaded data across all the queries it runs).
This issue will be closed when the feature is fully exposed.
This could also be done like an interactive SQL client: at startup it loads all the data (into memory; I don't mind, I have 32GB of RAM), and then we can execute queries. My sample file is about 3GB, and waiting another minute for each query isn't good.
Like:
$ q --client -H data.csv as data
q > select count(*) from data
------------------
| count(*) |
------------------
| 10000000 |
------------------
q > select my_field from data where condition=true limit 3
-------------
| my_field |
-------------
| val1 |
-------------
| val2 |
-------------
| val3 |
-------------
Support for multiple files is open to discussion.
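The proposed client mode can be sketched with Python's standard cmd module: the file is loaded once when the shell starts, and every line typed at the q > prompt runs against the same in-memory table. --client is only the flag proposed in this comment, and SqlShell is a hypothetical name; this is not how q implements it.

```python
import cmd
import csv
import sqlite3
import sys


class SqlShell(cmd.Cmd):
    """Minimal interactive client: data is loaded once, then queried repeatedly."""

    prompt = "q > "

    def __init__(self, csv_path, table="data", stdout=None):
        super().__init__(stdout=stdout)
        self.conn = sqlite3.connect(":memory:")
        with open(csv_path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            cols = ", ".join('"' + c.replace('"', '""') + '"' for c in header)
            marks = ", ".join("?" for _ in header)
            self.conn.execute(f'CREATE TABLE "{table}" ({cols})')
            self.conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)

    def default(self, line):
        # Any input that is not a shell command is treated as SQL.
        try:
            cur = self.conn.execute(line)
        except sqlite3.Error as exc:
            self.stdout.write(f"query error: {exc}\n")
            return
        for row in cur:
            self.stdout.write(" | ".join(str(v) for v in row) + "\n")

    def emptyline(self):
        # Override cmd's default of repeating the previous command.
        pass

    def do_quit(self, arg):
        """Exit the shell."""
        return True

    do_EOF = do_quit  # Ctrl-D also exits


if __name__ == "__main__":
    SqlShell(sys.argv[1]).cmdloop()
```

Run as python sql_shell.py data.csv (script name hypothetical); the one-time load cost is paid at startup, and each subsequent query only touches the in-memory table.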
I'm going to release a new version of q soon. It's a large change that includes built-in caching capabilities similar to the ones you're describing, which eliminates the need to wait between multiple queries on the same file.
Harel