Comments (8)

mtuncer commented on September 18, 2024

Could you run ANALYZE on the table to see if it makes any difference? We have seen a substantial difference in similar cases when a query involves joins or subqueries.
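
For example (cstore_fdw supports ANALYZE on its foreign tables, which refreshes the planner's row-count estimates; the table name here is just a placeholder):

ANALYZE my_cstore_table;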

Cahu commented on September 18, 2024

We tried running ANALYZE, but there was no improvement.

koppenhoefer commented on September 18, 2024

We are unfortunately seeing a (tiny) performance hit when using cstore_fdw too! :(
I was hoping for at least a 2x query-speed improvement, but no such luck for us.
I am using cstore_fdw 1.4.1 (protobuf-c 1.2.1) with simple tables of 187 million rows.

My data looks like this:

 2012-01-01 00:00:00.735056 | 151.6129 | -12.4378 |  17.4 |   9
 2012-01-01 00:00:00.801839 | -70.8298 |  -6.7162 |     9 |   6
 2012-01-01 00:00:00.801826 | -70.9036 |  -6.6841 |  14.3 |  13

And I am comparing the performance between 'normal' Postgres (9.5.3) and CStore using these table structures...

wwlln=> \d rawdata2012
             Table "public.rawdata2012"
 Column |            Type             | Modifiers 
--------+-----------------------------+-----------
 date   | timestamp without time zone | 
 lon    | double precision            | 
 lat    | double precision            | 
 resid  | double precision            | 
 obs    | integer                     | 
Indexes:
    "rawdata2012_date_idx" btree (date)
wwlln=> \d rawdata2012_cstore
      Foreign table "public.rawdata2012_cstore"
 Column |            Type             | Modifiers | FDW Options 
--------+-----------------------------+-----------+-------------
 date   | timestamp without time zone |           | 
 lon    | double precision            |           | 
 lat    | double precision            |           | 
 resid  | double precision            |           | 
 obs    | integer                     |           | 
Server: cstore_server
FDW Options: (compression 'pglz', block_row_count '40000', stripe_row_count '600000')
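
For reference, a foreign table matching the \d output above could be created with something like this (a sketch reconstructed from the displayed FDW options, not necessarily the exact DDL used):

CREATE FOREIGN TABLE rawdata2012_cstore (
    date timestamp without time zone,
    lon double precision,
    lat double precision,
    resid double precision,
    obs integer
) SERVER cstore_server
OPTIONS (compression 'pglz', block_row_count '40000', stripe_row_count '600000');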

When I imported the data, it was already ordered by date.
I AM seeing a 10x reduction in disk space...

du -hs /usr/local3/pgsql-9.5.3/data/base/16389 | grep [0-9]G
38G
du -hs /usr/local3/pgsql-9.5.3/data/cstore_fdw | grep [0-9]G
3.7G
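
The heap-table figure can also be checked from inside Postgres (pg_total_relation_size counts the table plus its indexes and TOAST, so it won't exactly match the per-directory du number):

SELECT pg_size_pretty(pg_total_relation_size('rawdata2012'));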

But I'm getting no query-time improvement when using cstore_fdw.

wwlln=# explain (analyze,buffers) select lat from rawdata2012 where date between '2012-01-01' and '2012-01-31' ;
                                                                       QUERY PLAN                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using rawdata2012a_date_idx on rawdata2012  (cost=0.57..541315.67 rows=13759855 width=8) (actual time=0.035..5005.964 rows=12942101 loops=1)
   Index Cond: ((date >= '2012-01-01 00:00:00'::timestamp without time zone) AND (date <= '2012-01-31 00:00:00'::timestamp without time zone))
   Buffers: shared hit=314427
 Planning time: 0.424 ms
 Execution time: 5002.698 ms
when using BTREE indexes:   5002.698, 5032.372, 5032.102
when using BRIN indexes:    5080.413, 5079.190, 5082.345
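
(For reference, the BRIN variant would have been created with something like the following; the index name is hypothetical:)

CREATE INDEX rawdata2012_date_brin_idx ON rawdata2012 USING brin (date);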

And here is the (slightly) worse performance using cstore:

 Foreign Scan on rawdata2012_cstore  (cost=0.00..2995801.76 rows=12170119 width=8) (actual time=6.719..4961.190 rows=12942101 loops=1)
   Filter: ((date >= '2012-01-01 00:00:00'::timestamp without time zone) AND (date <= '2012-01-31 00:00:00'::timestamp without time zone))
   CStore File: /usr/local3/pgsql-9.5.3/data/cstore_fdw/16389/16426
   CStore File Size: 3792626950
   Buffers: shared hit=30996
 Planning time: 0.123 ms
 Execution time: 5895.757 ms, 5827.419, 5839.434, 5837.165

I don't really know how to pursue this, besides trying a different platform with alternate hardware. These tests were done on an aging Xserve3,1 (Quad-Core Intel Xeon, 2.26 GHz, 2 processors / 8 cores), 24 GB memory, with local internal storage. I tried playing with the Postgres configuration parameters shared_buffers, work_mem, and effective_cache_size (restarting the service every time), but although the results change, the effect is minimal and cstore always behaves worse than plain Postgres.
I've run ANALYZE on all the tables.
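
For reference, those parameters can be set like this; the values below are hypothetical examples for a 24 GB machine, not the exact settings I tested:

ALTER SYSTEM SET shared_buffers = '6GB';         -- takes effect only after a server restart
ALTER SYSTEM SET work_mem = '256MB';             -- per-sort/hash memory; applies after reload
ALTER SYSTEM SET effective_cache_size = '18GB';  -- planner hint only, no restart required
SELECT pg_reload_conf();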

Using a much lower block_row_count did decrease compression (a bit)... but it actually made the execution time even worse, presumably because smaller blocks mean more per-block overhead!

CREATE FOREIGN TABLE rawdata2012_cstoreb (
    date timestamp without time zone,
    lon double precision,
    lat double precision,
    resid double precision,
    obs integer
) SERVER cstore_server
OPTIONS (compression 'pglz', block_row_count '10000', stripe_row_count '150000');
wwlln=> explain (analyze,buffers) select avg(lat) from rawdata2012_cstoreb where date between '2012-01-01' and '2012-01-31' ;
         CStore File Size: 3816784464

Running 'purge' to flush the filesystem cache (we're on OS X Xserves here) didn't help matters at all, btw.

Any ideas?
I've already read these:

I do vaguely recall having played with this on my laptop months ago and seeing an impressive improvement when using cstore_fdw... but that must have been with some other sort of data, because I cannot replicate the success on our servers, alas.

cheers
/me

koppenhoefer commented on September 18, 2024

Follow-up: I have since recreated the above on my MacBook Pro (Intel Core i7-4980HQ @ 2.80 GHz, 16 GB RAM, Apple SSD SM0512G).

The execution time for the native Postgres data is:

Execution time: 3837.535 ms, 3288.8, 3252.538, 3295.547, 3359.382

The execution time for cstore data is somewhat better (but not the 2x I expected):

Execution time: 2884.281 ms, 2907.608, 2882.742, 2919.275, 2933.878

Curiously, when I run the same comparison with data from https://www.citusdata.com/blog/2014/04/03/columnar-store-for-analytics/, I see an incredible performance degradation when using cstore. Why is this?

explain (analyze,buffers) select review_rating from customerreviews where review_date between '1995-07-01' and '1995-07-31' ;
 Index Scan using customerreviews_reviewdate_idx on customerreviews  (cost=0.43..10.29 rows=43 width=4) (actual time=0.007..0.040 rows=71 loops=1)
   Index Cond: ((review_date >= '1995-07-01'::date) AND (review_date <= '1995-07-31'::date))
   Buffers: shared hit=4 read=2
 Planning time: 0.215 ms
 Execution time: 0.052 ms, 0.067 ms, 0.066 ms, 0.067 ms
(5 rows)
customerreviews=# select count(*) from customerreviews;
  count  
---------
 1762504
(1 row)

Compare those native Postgres results with a Postgres/Cstore result:

explain (analyze,buffers) select review_rating from customerreviews_fdw where review_date between '1995-07-01' and '1995-07-31' ;
 Foreign Scan on customer_reviews  (cost=0.00..28581.56 rows=167 width=4) (actual time=5.147..11.413 rows=71 loops=1)
   Filter: ((review_date >= '1995-07-01'::date) AND (review_date <= '1995-07-31'::date))
   Rows Removed by Filter: 9929
   CStore File: /usr/local/pgsql-9.5.3/data/cstore_fdw/16420/16429
   CStore File Size: 105380083
   Buffers: shared hit=1016 read=12
 Planning time: 0.064 ms
 Execution time: 12.423 ms, 5.703, 5.940 ms, 5.471 ms, 6.034 ms 
select count(*) from customerreviews_fdw;
  count  
---------
 1762504

Something is clearly wrong. I restarted the Postgres server, and even tried rebooting, but the difference remains!

Running the query suggested on https://www.citusdata.com/blog/2014/04/03/columnar-store-for-analytics/ does show less of a performance difference... but I'd still like to know why we aren't getting a significant improvement instead of this weak result. :(

Here is the native postgres result:

explain (analyze,buffers) SELECT
    width_bucket(length(product_title), 1, 50, 5) title_length_bucket,
    round(avg(review_rating), 2) AS review_average,
    count(*)
FROM customerreviews
WHERE product_group = 'Book'
GROUP BY title_length_bucket
ORDER BY title_length_bucket;
 Sort  (cost=94273.79..94316.49 rows=17080 width=29) (actual time=794.660..794.660 rows=6 loops=1)
   Sort Key: (width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5))
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=32 read=52624
   ->  HashAggregate  (cost=92688.76..93073.06 rows=17080 width=29) (actual time=794.625..794.654 rows=6 loops=1)
         Group Key: width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5)
         Buffers: shared hit=32 read=52624
         ->  Seq Scan on customerreviews  (cost=0.00..83687.82 rows=1200125 width=29) (actual time=1.911..576.809 rows=1198218 loops=1)
               Filter: (product_group = 'Book'::text)
               Rows Removed by Filter: 564286
               Buffers: shared hit=32 read=52624
 Planning time: 0.081 ms
 Execution time: 952.000 ms, 794.725 ms, 845.545, 819.970, 797.947, 783.744
(13 rows)

and here is the ever-so-slight improvement when using CStore:

explain (analyze,buffers) SELECT
    width_bucket(length(product_title), 1, 50, 5) title_length_bucket,
    round(avg(review_rating), 2) AS review_average,
    count(*)
FROM customerreviews_fdw
WHERE product_group = 'Book'
GROUP BY title_length_bucket
ORDER BY title_length_bucket;
 Sort  (cost=44751.18..44793.60 rows=16969 width=29) (actual time=843.584..843.585 rows=6 loops=1)
   Sort Key: (width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5))
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=1512
   ->  HashAggregate  (cost=43177.25..43559.05 rows=16969 width=29) (actual time=843.546..843.576 rows=6 loops=1)
         Group Key: width_bucket((length(product_title))::double precision, '1'::double precision, '50'::double precision, 5)
         Buffers: shared hit=1512
         ->  Foreign Scan on customerreviews_fdw  (cost=0.00..34212.28 rows=1195330 width=29) (actual time=10.312..624.219 rows=1198218 loops=1)
               Filter: (product_group = 'Book'::text)
               Rows Removed by Filter: 564286
               CStore File: /usr/local/pgsql-9.5.3/data/cstore_fdw/16420/16429
               CStore File Size: 105380083
               Buffers: shared hit=1512
 Planning time: 0.168 ms
 Execution time: 845.685 ms, 842.528 ms, 739.159 ms, 760.996 ms, 737.626 ms, 724.743 ms
(15 rows)

from cstore_fdw.

samuel-cc-4451 commented on September 18, 2024

The solution to the issue reported by @Cahu:
In the function SelectedBlockMask (cstore_reader.c), instead of looping through the projectedColumnList we should be pulling the Vars from whereClauseList and looping over those, so that block skipping is driven by the columns that are actually filtered on.

darkleaf commented on September 18, 2024

any updates?

jur3 commented on September 18, 2024

bump

darkleaf commented on September 18, 2024

@jur3 use ClickHouse or Timescale 😃

Update: See https://github.com/citusdata/cstore_fdw#important-notice-columnar-storage-is-now-part-of-citus
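
(For context: in Citus, columnar storage is a table access method rather than an FDW. A minimal sketch, assuming Citus 10+ on PostgreSQL 12+:)

CREATE EXTENSION citus;
CREATE TABLE rawdata2012_columnar (
    date timestamp without time zone,
    lon double precision,
    lat double precision,
    resid double precision,
    obs integer
) USING columnar;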
