Giter VIP home page Giter VIP logo

Comments (7)

robsmith11 avatar robsmith11 commented on June 2, 2024

As a more fair comparison, I ran the same query with python's psycopg2 execute() and fetchall() and the timings are consistent with psql (5-10x faster than LibPQ.jl).

from libpq.jl.

iamed2 avatar iamed2 commented on June 2, 2024

Are you on the latest version of LibPQ.jl? 1.14 introduced some significant performance improvements. Are you measuring parse time or just transit time (i.e. time for execute to finish).

from libpq.jl.

robsmith11 avatar robsmith11 commented on June 2, 2024

[194296ae] LibPQ v1.14.1 on a recent nightly.
I was measuring both the transit time and time to convert to Tables.rowtable, which I thought would be a lot faster, but if is split to two operations the time breakdown is 70/30 over the slower connection. So it looks like both transferring and parsing are each slower with LibPQ.jl than the total time with psycopg2.

from libpq.jl.

iamed2 avatar iamed2 commented on June 2, 2024

Tables.columntable should be significantly faster. Parsing entire columns at once will be a type-stable operation, and we explicitly test for performance on that operation since 1.14.

On the transit time, it would be helpful for replication to have a sense of the schema being pulled. Can this be replicated by pulling 10 million rows of a single double precision column containing 3 unique non-null values?

Are your connections using SSL?

from libpq.jl.

robsmith11 avatar robsmith11 commented on June 2, 2024

Ah you're right, after making a minimal example locally on my laptop and testing more carefully, nearly all the time is taken up by Tables.rowtable.

But while it's true that Tables.columntable is extremely fast. It seems like that's because it's a lazy operation. If I want to actually do anything with the columns, they must first be parsed which allocates for every record.

For example:

postgres=# create table t (x integer, y integer); insert into t (select *,1 from generate_series(1,10000000));
CREATE TABLE
INSERT 0 10000000
julia> c = LibPQ.Connection("postgresql://postgres@localhost:5432/postgres");

julia> @time r = LibPQ.execute(c, "select * from t", binary_format=false, not_null=true);
  1.643387 seconds (85 allocations: 4.484 KiB)

julia> @time t = Tables.columntable(r);
  0.000096 seconds (23 allocations: 1.109 KiB)

julia> @time sum(t.x);
  1.116090 seconds (20.00 M allocations: 686.646 MiB, 8.03% gc time)

julia> typeof(t)
NamedTuple{(:x, :y), Tuple{LibPQ.Column{Int32, 0x00000017, Int32, LibPQ.Result{false}, LibPQ.ParseType{Int32}}, LibPQ.Column{Int32, 0x00000017, Int32, LibPQ.Result{false}, LibPQ.ParseType{Int32}}}}

(everything was already run once to compile)

Also, it seems like Tables.rowtable is much slower than it should be. In this example, I had to run it on a smaller table because I was impatient, but extrapolating the timing it would take about 500 seconds to complete.

from libpq.jl.

robsmith11 avatar robsmith11 commented on June 2, 2024

Ah nevermind, I forgot to set binary_format=true in the last set of tests. When I used that, the results of Tables.columntable don't need additional parsing and are fast.

I also noticed that it's much (orders of magnitude) faster to to Tables.rowtable(Tables.columntable(r)) than just calling Tables.rowtable(r).

from libpq.jl.

iamed2 avatar iamed2 commented on June 2, 2024

I also noticed that it's much (orders of magnitude) faster to to Tables.rowtable(Tables.columntable(r)) than just calling Tables.rowtable(r).

This is a really good catch, we should be able to make a change to get Tables to do that automatically.

from libpq.jl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.