Comments (9)

Fil commented on May 13, 2024

On a similar note, I was wondering how one could reuse the generated db.
Changing :memory: to q.sqlite and ending with db.conn.commit() instead of table_creator.drop_table() did the trick.
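
The trick relies on how SQLite behaves: a database opened as `:memory:` vanishes when its connection closes, while a file-backed database keeps whatever was committed. A minimal sketch with Python's standard `sqlite3` module (this is not q's code; the file name `q.sqlite` comes from the comment above, while the table `data` and the helper functions are illustrative assumptions):

```python
import os
import sqlite3
import tempfile

def build_db(path):
    """Create and fill a table in the SQLite database at `path`, then commit."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS data (name TEXT, value INTEGER)")
    conn.executemany("INSERT INTO data VALUES (?, ?)", [("a", 1), ("b", 2)])
    # Without commit(), these inserts would be rolled back when the connection closes.
    conn.commit()
    conn.close()

def count_rows(path):
    """Open a new connection to `path` and count the rows in `data`."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM data").fetchone()[0]
    finally:
        conn.close()

if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "q.sqlite")
    build_db(db_path)
    print(count_rows(db_path))  # a fresh connection still sees the committed rows
```

Passing `:memory:` to both functions instead fails with a `no such table` error, because every `:memory:` connection gets its own empty database.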

Caching the data is not trivial, though: you need to check whether the file is still the same (e.g. by comparing its md5sum), and you need some sort of garbage collection.
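
A minimal sketch of the check described above, assuming the cache is a plain in-process dictionary keyed by file path (the class `LoadCache` and its methods are hypothetical, not part of q): loaded data is reused only while the file's MD5 digest is unchanged, and "garbage collection" is reduced to dropping entries whose source file no longer exists.

```python
import hashlib
import os

def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

class LoadCache:
    """Hypothetical cache mapping a file path to (md5 digest, loaded data)."""

    def __init__(self):
        self._entries = {}

    def get(self, path, loader):
        """Return cached data for `path`, calling `loader` again if the MD5 changed."""
        digest = file_md5(path)
        entry = self._entries.get(path)
        if entry is not None and entry[0] == digest:
            return entry[1]
        data = loader(path)
        self._entries[path] = (digest, data)
        return data

    def collect_garbage(self):
        """Drop entries whose source file no longer exists."""
        for path in [p for p in self._entries if not os.path.exists(p)]:
            del self._entries[path]

    def __len__(self):
        return len(self._entries)
```

Computing a full MD5 still reads the whole file on every lookup, so this saves the parsing and loading work but not the I/O.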

from q.

bitti commented on May 13, 2024

Yeah, as we all know, the 2 most difficult problems in computer science are cache invalidation, naming things and off by one errors.

harelba commented on May 13, 2024

Exactly :)

Hi, sorry for the late reply. I've been offline for a couple of days.

Thanks a lot. I'll take a deeper look at your tip and see if I can find a way to make the invalidation check fast enough. I was planning on using cksum, perhaps a sampled cksum or something similar, with a command-line option for a stricter but slower check.

Harel
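
The sampled-checksum idea can be sketched as follows, assuming a fingerprint made of the file size plus a CRC32 over a few evenly spaced chunks, with a `strict` switch that hashes the entire file instead. This uses `zlib.crc32`, not the exact algorithm of the POSIX `cksum` utility, and the function name and parameters are hypothetical.

```python
import hashlib
import os
import zlib

def fingerprint(path, samples=8, sample_size=4096, strict=False):
    """Return a tuple that changes when the file at `path` (probably) changes.

    Sampled mode: file size plus a CRC32 over `samples` evenly spaced chunks.
    Strict mode: file size plus an MD5 of the entire file (slower, reads every byte).
    """
    if samples < 2:
        raise ValueError("samples must be at least 2")
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        if strict:
            h = hashlib.md5()
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
            return ("strict", size, h.hexdigest())
        if size <= samples * sample_size:
            # Small file: sampling would not save anything, so checksum all of it.
            return ("sampled", size, zlib.crc32(f.read()))
        step = (size - sample_size) // (samples - 1)
        crc = 0
        for i in range(samples):
            f.seek(i * step)  # first chunk starts at 0, last one ends at or before EOF
            crc = zlib.crc32(f.read(sample_size), crc)
        return ("sampled", size, crc)
```

The sampled mode reads a fixed amount of data regardless of file size, but an edit that falls between samples and leaves the size unchanged goes undetected, which is why a slower strict option is worth offering.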

harelba commented on May 13, 2024

I've created an API that allows q to be used from Python code as a module. The changes also make it possible to reuse previously loaded data (e.g. running multiple queries against the same loaded data).

An alpha version of the new API will be committed to the main branch in a couple of days.

harelba commented on May 13, 2024

An alpha version of the Python API has been committed to https://github.com/harelba/q/tree/expose-as-python-api.

The Python API supports reusing already-loaded data, and this capability is exposed on the command line by allowing multiple queries in the same q execution, e.g. q "select ..." "select ..." "select ...". Running q this way loads each file only once, even if it is used in multiple queries. In the future, I'll probably add an interactive REPL for this as well.

Any input would be helpful and appreciated.

Harel
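
The multi-query mode boils down to loading each file once and then running every query against the loaded table. The same idea, reduced to Python's standard `csv` and `sqlite3` modules (a sketch, not q's implementation; the table name `data` and the helper names are hypothetical):

```python
import csv
import io
import sqlite3

def load_csv(conn, table, text):
    """Load CSV text with a header row into `table`; columns are untyped."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    cols = ", ".join('"%s"' % c for c in header)
    marks = ", ".join(["?"] * len(header))
    conn.execute('CREATE TABLE "%s" (%s)' % (table, cols))
    conn.executemany('INSERT INTO "%s" VALUES (%s)' % (table, marks), body)

def run_queries(text, queries):
    """Parse and load the data once, then run every query against it."""
    conn = sqlite3.connect(":memory:")
    try:
        load_csv(conn, "data", text)  # the expensive step happens only here
        return [conn.execute(q).fetchall() for q in queries]
    finally:
        conn.close()
```

Because the columns are untyped, values keep the string form they had in the CSV, so comparisons in the queries should use quoted literals such as `value = '2'`.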

harelba commented on May 13, 2024

Forgot to mention: the README file of the branch contains the required information about the API.

harelba commented on May 13, 2024

This capability is now fully supported internally, and partially exposed by running multiple queries on the same command line (each invocation of q reuses the loaded data across all the queries it runs).

This issue will be closed when the feature is fully exposed.

msangel commented on May 13, 2024

This could also be done as an interactive SQL client: at startup it loads all the data (into memory is fine with me, I have 32GB of RAM), and then we can execute queries against it. My sample file is about 3GB, and waiting another minute for each query isn't good.
Something like:

$ q --client -H data.csv as data
q > select count(*) from data
------------------
|  count(*)     |
------------------
|  10000000     |
------------------
q > select my_field from data where condition=true limit 3
-------------
|  my_field |
-------------
|  val1     |
-------------
|  val2     |
-------------
|  val3     |
-------------

Support for multiple files is open for discussion.
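
A minimal sketch of the proposed interactive client, assuming a single CSV file with a header row loaded into an in-memory SQLite table named `data`. The script and function names are hypothetical, and `--client` is the flag proposed above, not an existing q option; this sketch does not implement it. The expensive load happens once, and every following input line runs as a query against the already-loaded table.

```python
import csv
import sqlite3
import sys

def load_csv_file(conn, path, table="data"):
    """Load a CSV file with a header row into `table` a single time."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join('"%s"' % c for c in header)
        marks = ", ".join(["?"] * len(header))
        conn.execute('CREATE TABLE "%s" (%s)' % (table, cols))
        conn.executemany('INSERT INTO "%s" VALUES (%s)' % (table, marks), reader)
    conn.commit()

def repl(conn, lines, out=sys.stdout):
    """Run each non-empty input line as a query; stop at 'quit' or 'exit'."""
    for line in lines:
        query = line.strip()
        if not query:
            continue
        if query in ("quit", "exit"):
            break
        try:
            for row in conn.execute(query).fetchall():
                print("\t".join(str(v) for v in row), file=out)
        except sqlite3.Error as e:
            # A bad query reports an error but keeps the loaded data.
            print("error: %s" % e, file=out)

if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python qclient.py data.csv, then type queries on stdin.
    connection = sqlite3.connect(":memory:")
    load_csv_file(connection, sys.argv[1])
    repl(connection, sys.stdin)
```

Output is plain tab-separated rows rather than the boxed tables in the example above, to keep the sketch short.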

harelba commented on May 13, 2024

Hi @msangel @Fil

I'm going to release a new version of q soon. It's a large change that includes built-in caching similar to what you're describing, which eliminates the wait between multiple queries on the same file.

Harel
