Reusing a file that has already been loaded in the past should be faster. Can be that


Reuse of previously loaded data (q issue, open)

harelba commented on May 13, 2024

Reuse of previously loaded data


Comments (9)

Fil commented on May 13, 2024

On a similar note I was wondering how one could reuse the generated db.
Changing :memory: to q.sqlite and ending with db.conn.commit() instead of table_creator.drop_table() did the trick.

Caching the data is not trivial, as you need to check that the file is unchanged (e.g. by comparing the file's md5sum), and have some sort of garbage collection for stale cache entries.
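A minimal sketch of the idea above, assuming the cache is a directory of SQLite files named after the source file's MD5 digest. This is not q's code: `file_md5`, `open_cached_db`, the table name `data` and the cache directory layout are all hypothetical, and garbage collection of old cache files is left out.

```python
import csv
import hashlib
import os
import sqlite3


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_cached_db(csv_path, cache_dir):
    """Open a SQLite file holding the CSV rows, parsing the CSV only if needed.

    Returns (connection, loaded); loaded is True when the CSV had to be parsed.
    """
    os.makedirs(cache_dir, exist_ok=True)
    db_path = os.path.join(cache_dir, file_md5(csv_path) + ".sqlite")
    if os.path.exists(db_path):
        # Same digest as a previous run: reuse the stored table.
        return sqlite3.connect(db_path), False

    # Build the database under a temporary name so a crash midway
    # never leaves a half-loaded file in the cache.
    tmp_path = db_path + ".tmp"
    if os.path.exists(tmp_path):
        os.remove(tmp_path)
    conn = sqlite3.connect(tmp_path)
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        columns = ", ".join('"%s"' % name.replace('"', '""') for name in header)
        marks = ", ".join("?" for _ in header)
        conn.execute("CREATE TABLE data (%s)" % columns)
        conn.executemany("INSERT INTO data VALUES (%s)" % marks, reader)
    conn.commit()
    conn.close()
    os.replace(tmp_path, db_path)
    return sqlite3.connect(db_path), True
```

Hashing still reads the whole file, so this saves parsing and insertion time rather than I/O. Any change to the file produces a new digest and therefore a new cache file, which is why some cleanup policy would be needed in practice.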


bitti commented on May 13, 2024

Yeah, as we all know, the 2 most difficult problems in computer science are cache invalidation, naming things and off by one errors.


harelba commented on May 13, 2024

Exactly :)

Hi, sorry for the late reply. Been offline for a couple of days.

Thanks a lot, I'll take a deeper look at your tip and see if I can find some trick to make the invalidation fast enough (was planning on cksum, perhaps a sampled cksum or something, with an option to be stricter and slower through a command line parameter).

Harel
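One way the sampled checksum with a stricter option could look, as a rough sketch only. The function name `sampled_checksum`, its parameters and the choice of CRC-32 from `zlib` are assumptions for illustration, not what q does.

```python
import os
import zlib


def sampled_checksum(path, samples=16, block_size=64 * 1024, strict=False):
    """CRC-32 over the file size plus evenly spaced blocks of the file.

    With strict=True (or when the file is small) the whole file is read.
    """
    size = os.path.getsize(path)
    # Mix the size in first, so truncation or growth changes the result.
    crc = zlib.crc32(str(size).encode("ascii"))
    with open(path, "rb") as handle:
        if strict or samples < 2 or size <= samples * block_size:
            for chunk in iter(lambda: handle.read(block_size), b""):
                crc = zlib.crc32(chunk, crc)
            return crc
        # Spread the samples from the first block to the last block.
        step = (size - block_size) // (samples - 1)
        for index in range(samples):
            handle.seek(index * step)
            crc = zlib.crc32(handle.read(block_size), crc)
    return crc
```

The sampled mode is fast but can miss an edit that keeps the file size and falls between the sampled blocks; the strict mode catches it at the cost of reading everything. Sampled and strict values for the same large file differ, so a cache should only compare checksums computed in the same mode.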


harelba commented on May 13, 2024

I've created an API which will allow q to be used from python code as a module. The changes also inherently include the possibility to reuse previously loaded data (e.g. running multiple queries against the same loaded data).

Alpha version of the new API will be committed into the main branch in a couple of days.


harelba commented on May 13, 2024

Alpha branch of the python api has been committed to https://github.com/harelba/q/tree/expose-as-python-api.

The Python API supports reuse of already-loaded data, and this capability is exposed on the command line by allowing the user to pass multiple queries in the same q execution, e.g. q "select ..." "select ..." "select ...". Running q like that will load the data only once for each file, even if it's used in multiple queries. In the future, I'll probably add an interactive REPL for this as well.

Any input would be helpful and appreciated.

Harel
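The load-once behaviour described above can be illustrated with plain `sqlite3`. This is a sketch of the general technique, not the API from the branch (see its readme for that): the `LoadedData` class, its methods and the `{table}` placeholder convention are hypothetical.

```python
import csv
import sqlite3


class LoadedData:
    """Hypothetical helper that loads each CSV file once into in-memory SQLite."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.tables = {}      # file path -> table name
        self.load_count = 0   # how many files were actually parsed

    def table_for(self, path):
        """Return the table holding the file's rows, loading it on first use."""
        if path not in self.tables:
            name = "t%d" % len(self.tables)
            with open(path, newline="") as handle:
                reader = csv.reader(handle)
                header = next(reader)
                columns = ", ".join('"%s"' % c.replace('"', '""') for c in header)
                marks = ", ".join("?" for _ in header)
                self.conn.execute("CREATE TABLE %s (%s)" % (name, columns))
                self.conn.executemany(
                    "INSERT INTO %s VALUES (%s)" % (name, marks), reader
                )
            self.tables[path] = name
            self.load_count += 1
        return self.tables[path]

    def run(self, query_template, path):
        """Run a query whose {table} placeholder stands for the given file."""
        table = self.table_for(path)
        return self.conn.execute(query_template.format(table=table)).fetchall()
```

Calling `run` several times with the same path parses the file only on the first call; later queries hit the table that is already in memory.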


harelba commented on May 13, 2024

Forgot to mention: the readme file of the branch contains the required information about the API.


harelba commented on May 13, 2024

This capability is now fully supported internally, and partially exposed by running multiple queries on the same command line (every invocation of q reuses the loaded data across all the queries it runs).

This issue will be closed when the feature is fully exposed.


msangel commented on May 13, 2024

This could also be done like an interactive SQL client: at the start it loads all the data (into memory, I don't care, I have 32 GB of RAM) and then we can execute the queries. My sample file is about 3 GB, and waiting another minute for each query isn't good.
Like:

$ q --client -H data.csv as data
q > select count(*) from data
------------------
|  count(*)     |
------------------
|  10000000     |
------------------
q > select my_field from data where condition=true limit 3
-------------
|  my_field |
-------------
|  val1     |
-------------
|  val2     |
-------------
|  val3     |
-------------

Support for multiple files is open to discussion.
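The interactive client proposed above could be sketched as a loop over an already-loaded SQLite connection. The `--client` flag in the example is a proposal, not an existing q option, and `run_session` below is a hypothetical helper that prints tab-separated rows rather than the boxed tables shown.

```python
import sqlite3
import sys


def run_session(conn, lines=None, out=sys.stdout):
    """Execute one SQL statement per input line against a loaded connection."""
    source = sys.stdin if lines is None else lines
    for line in source:
        statement = line.strip()
        if not statement:
            continue
        if statement.lower() in ("quit", "exit"):
            break
        try:
            for row in conn.execute(statement):
                out.write("\t".join(str(value) for value in row) + "\n")
        except sqlite3.Error as error:
            # Keep the session alive after a bad query.
            out.write("error: %s\n" % error)
```

Because the data is loaded before the loop starts, each statement pays only the query cost, which is the point of the request.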


harelba commented on May 13, 2024

Hi @msangel @Fil

I'm going to release a new version of q soon. It's a large change, which includes inherent caching capabilities similar to the ones you're describing, eliminating the need to wait between multiple queries of the same file.

Harel
