Comments (7)
As a more fair comparison, I ran the same query with python's psycopg2 execute()
and fetchall()
and the timings are consistent with psql (5-10x faster than LibPQ.jl).
from libpq.jl.
Are you on the latest version of LibPQ.jl? 1.14 introduced some significant performance improvements. Are you measuring parse time or just transit time (i.e. time for execute
to finish).
from libpq.jl.
[194296ae] LibPQ v1.14.1
on a recent nightly.
I was measuring both the transit time and time to convert to Tables.rowtable
, which I thought would be a lot faster, but if is split to two operations the time breakdown is 70/30 over the slower connection. So it looks like both transferring and parsing are each slower with LibPQ.jl than the total time with psycopg2.
from libpq.jl.
Tables.columntable
should be significantly faster. Parsing entire columns at once will be a type-stable operation, and we explicitly test for performance on that operation since 1.14.
On the transit time, it would be helpful for replication to have a sense of the schema being pulled. Can this be replicated by pulling 10 million rows of a single double precision
column containing 3 unique non-null values?
Are your connections using SSL?
from libpq.jl.
Ah you're right, after making a minimal example locally on my laptop and testing more carefully, nearly all the time is taken up by Tables.rowtable
.
But while it's true that Tables.columntable
is extremely fast. It seems like that's because it's a lazy operation. If I want to actually do anything with the columns, they must first be parsed which allocates for every record.
For example:
postgres=# create table t (x integer, y integer); insert into t (select *,1 from generate_series(1,10000000));
CREATE TABLE
INSERT 0 10000000
julia> c = LibPQ.Connection("postgresql://postgres@localhost:5432/postgres");
julia> @time r = LibPQ.execute(c, "select * from t", binary_format=false, not_null=true);
1.643387 seconds (85 allocations: 4.484 KiB)
julia> @time t = Tables.columntable(r);
0.000096 seconds (23 allocations: 1.109 KiB)
julia> @time sum(t.x);
1.116090 seconds (20.00 M allocations: 686.646 MiB, 8.03% gc time)
julia> typeof(t)
NamedTuple{(:x, :y), Tuple{LibPQ.Column{Int32, 0x00000017, Int32, LibPQ.Result{false}, LibPQ.ParseType{Int32}}, LibPQ.Column{Int32, 0x00000017, Int32, LibPQ.Result{false}, LibPQ.ParseType{Int32}}}}
(everything was already run once to compile)
Also, it seems like Tables.rowtable
is much slower than it should be. In this example, I had to run it on a smaller table because I was impatient, but extrapolating the timing it would take about 500 seconds to complete.
from libpq.jl.
Ah nevermind, I forgot to set binary_format=true
in the last set of tests. When I used that, the results of Tables.columntable
don't need additional parsing and are fast.
I also noticed that it's much (orders of magnitude) faster to to Tables.rowtable(Tables.columntable(r))
than just calling Tables.rowtable(r)
.
from libpq.jl.
I also noticed that it's much (orders of magnitude) faster to to Tables.rowtable(Tables.columntable(r)) than just calling Tables.rowtable(r).
This is a really good catch, we should be able to make a change to get Tables to do that automatically.
from libpq.jl.
Related Issues (20)
- collect on LibPQ.Column allocates a lot HOT 1
- Memory Leak? HOT 4
- LibPQ and local peer trust ("ident") HOT 4
- All rational numbers are parsed as Decimals.Decimal instead of Float HOT 3
- Trying to implement large object support HOT 4
- Support UUIDs with binary_format=true HOT 1
- Make Tables.rowtable(result) as fast as Tables.rowtable(Tables.columntable(result))
- `tryparse` Inexact Error HOT 5
- Docs links return 404 HOT 3
- Using array_agg results in a string HOT 2
- I install LibPQ but test fails. HOT 1
- Guess when there is no server running in tests and suggest users install one
- Multiple `async_execute`s may not always execute in sequence?
- `execute` and `async_execute` exception handling difference
- `LibPQ`.`async_execute`'s overhead is around 3ms HOT 2
- Add functions for closing prepared statements (requires PostgreSQL 17 release and corresponding libpq_jll)
- Can't suppress LibPQ's error output HOT 2
- Conflict with Dates.jl HOT 1
- Can't store Vector{UInt8} as expected in bytea
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from libpq.jl.