
postgres_scanner's Introduction



DuckDB

DuckDB is a high-performance analytical database system. It is designed to be fast, reliable, portable, and easy to use. DuckDB provides a rich SQL dialect, with support far beyond basic SQL. DuckDB supports arbitrary and nested correlated subqueries, window functions, collations, complex types (arrays, structs), and more. For more information on using DuckDB, please refer to the DuckDB documentation.

Installation

If you want to install and use DuckDB, please see our website for installation and usage instructions.

Data Import

For CSV files and Parquet files, data import is as simple as referencing the file in the FROM clause:

SELECT * FROM 'myfile.csv';
SELECT * FROM 'myfile.parquet';

Refer to our Data Import section for more information.

SQL Reference

The website contains a reference of functions and SQL constructs available in DuckDB.

Development

For development, DuckDB requires CMake, Python3 and a C++11 compliant compiler. Run make in the root directory to compile the sources. For development, use make debug to build a non-optimized debug version. You should run make unit and make allunit to verify that your version works properly after making changes. To test performance, you can run BUILD_BENCHMARK=1 BUILD_TPCH=1 make and then perform several standard benchmarks from the root directory by executing ./build/release/benchmark/benchmark_runner. The details of benchmarks are in our Benchmark Guide.

Please also refer to our Build Guide and Contribution Guide.

Support

See the Support Options page.

postgres_scanner's People

Contributors

akuzm, alitrack, carlopi, changhiskhan, chhetripradeep, gpsamson, hannes, lepfhty, lnkuiper, mause, mytherin, samansmink, szarnyasg, tishj



postgres_scanner's Issues

Loading postgres_scanner in R on Windows crashes R

Hello.

I tried to use the DuckDB postgres_scanner extension in R and RStudio on Windows (x86_64) and it crashed when I ran "LOAD postgres_scanner;", with an "R Session Aborted" message. I tried both the binary version of DuckDB (0.5.1) and the source version, built and installed with install.packages("duckdb", type = "source"). Both methods failed.

On Linux, Debian bookworm, same machine, it worked correctly. Built the DuckDB R package from source in RStudio.

The extension also worked fine with the CLI client and DBeaver on both Windows and Linux.

Thanks very much for your work. DuckDB is becoming one of my favorite libraries for data work.

ctid predicate causes scan of partitioned sub-table to fail

What happens?

create or replace view biz.gto_action_record as 
SELECT * FROM postgres_scan('postgres://gpadmin:[email protected]:2345/qsf_prod', 'biz', 'rst_ra_skc_org_detail_1_prt_p20221011')
;
select count(*) from biz.gto_action_record;

It raises this error:

IOException: IO Error: Unable to query Postgres: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
 server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

and then Greenplum goes into recovery mode!

To Reproduce

Then, on the Greenplum server, in the pg_log directory, the log file shows:

COPY (SELECT NULL FROM ""tenant_yuege_biz"".""rst_ra_skc_org_detail_1_prt_p20221123"" WHERE ctid BETWEEN '(0,0)'::tid AND '(4294967295,0)'::tid ) TO STDOUT (FORMAT binary);
",0,,"postgres.c",1639,
2022-11-23 20:40:43.313946 CST,,,p23011,th0,,,2022-11-23 20:40:42 CST,0,con813,cmd2,seg-1,,,,,"PANIC","XX000","Unexpected internal error: Master process received signal SIGSEGV",,,,,,,0,,,,"1    0x7fd6fb14a630 libpthread.so.0 <symbol not found> + 0xfb14a630
2    0xf9cc68 postgres <symbol not found> (CXformUtils.cpp:2234)

When I query the ctid range myself, I get different values:

SELECT min(ctid),max(ctid) FROM tenant_yuege_biz.rst_ra_skc_org_detail_1_prt_p20221123;--(33554432,2)	(33554435,17469)
SELECT NULL FROM tenant_yuege_biz.rst_ra_skc_org_detail_1_prt_p20221123 WHERE ctid BETWEEN '(33554432,2)'::tid AND '(33554435,17469)'::tid

But using these values also raises the error and puts Greenplum into recovery mode.

However, on the database where the scan succeeds, I can query the min and max ctid and then query the table, and it is OK:

SELECT min(ctid),max(ctid) FROM tenant_qisefang_biz.rst_ra_skc_org_detail_1_prt_p20221123--(11360,7)	(33325,2)
SELECT NULL FROM tenant_qisefang_biz.rst_ra_skc_org_detail_1_prt_p20221123 WHERE ctid BETWEEN '(11360,7)'::tid AND '(33325,2)'::tid

I am confused about why this SQL adds the ctid WHERE condition; I think it is unnecessary.
Maybe removing the ctid predicate would also allow scanning Postgres views rather than only tables?

OS:

centos7

PostgreSQL Version:

9.4(greenplum6.12)

DuckDB Version:

0.6.0

DuckDB Client:

Python

Full Name:

Changzhen Wang

Affiliation:

linezonedata

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

How to build a pip package?

Our project uses Python. Previously we ran pip install duckdb==0.5.1 to get the pip package, which contains the Postgres scanner by default.
Now I have edited postgres_scanner.cpp to fit my situation and run make; I got the duckdb binary and the extension, and tested that all functions are OK on Ubuntu and CentOS. But how do I build a pip package? Thanks!

DuckDB postgres_attach potentially leaving connections open

What happens?

I set up an Express.js server with a background process to incrementally update my local DuckDB. The server runs postgres_attach on startup, and then uses postgres_scanner every 30 seconds to query Postgres for any updates.

I began hitting FATAL: remaining connection slots are reserved for non-replication superuser connections and was unable to start up the server again.

To Reproduce

It's tricky to setup a clean project to reproduce this but the basic outline of what I had was:

// Stop the background job when the server is shutting down
process.on('SIGINT', function () {
  clearTimeout(intervalId);
  process.exit();
});

let intervalId;
const performBackgroundIncrementalUpdate = async () => {
  try {
    // perform background job here
    await db.loadIncrementalUpdates();
  } catch (err) {
    log(err);
  }
  intervalId = setTimeout(performBackgroundIncrementalUpdate, db.MIN_TIME_TO_REFRESH);
};

app.listen(
  port,
  async () => {
    await db.run("INSTALL postgres_scanner;");
    await db.run("LOAD postgres_scanner;");
    const dbUrl = process.env.DATABASE_URL;
    await db.run(
      `CALL postgres_attach('${dbUrl}', overwrite=true, filter_pushdown=true);`,
    );
    intervalId = setTimeout(performBackgroundIncrementalUpdate, db.MIN_TIME_TO_REFRESH);
    log(`Analytics service listening on port ${port}!`)
  },
);

Where loadIncrementalUpdates used postgres_scan. After leaving this running a few minutes (and killing and restarting the server), I began hitting the connection error.

Also I found a bunch of connections open in postgres like this:

SELECT * FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
AND datname = 'postgres' and usename = 'postgres' and state = 'idle in transaction';

which I killed with pg_terminate_backend(pid).

Removing the postgres_attach (and only using the scan) seems to work now; it consistently queries Postgres every 30 seconds without the error! It's possible that attach is not closing connections properly, or that it interacts poorly with postgres_scan?

OS:

macOS 12.4

PostgreSQL Version:

psql (14.5 (Homebrew))

DuckDB Version:

v0.6.1

DuckDB Client:

Nodejs

Full Name:

Victor Mota

Affiliation:

PinchedIO

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

IO Error: Unsupported Postgres type jsonb

When I try to attach a database which contains jsonb columns, the command fails

D CALL POSTGRES_ATTACH('');
Error: IO Error: Unsupported Postgres type jsonb

I'd appreciate it if jsonb were supported. Even if it isn't, it would be great if the command only failed when querying the tables that contain jsonb.
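A possible workaround, sketched here and not verified against this extension: cast the jsonb columns to text on the Postgres side and scan that object instead. Since the scanner pages through tables by ctid, a materialized view (which has physical storage) is probably a safer target than a plain view. The table and column names below are hypothetical:

-- In Postgres: "events" is a hypothetical table with a jsonb column "payload"
CREATE MATERIALIZED VIEW events_flat AS
SELECT id, payload::text AS payload_text FROM events;

-- In DuckDB: scan the casted object instead of the original table
SELECT * FROM postgres_scan('dbname=postgres', 'public', 'events_flat');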

SSL Support

Trying to connect to a Postgres database that requires SSL gives this error:

sslmode value "require" invalid when SSL support is not compiled in

Query result is empty on non-empty tables

I have a Postgres instance running in Docker with a non-empty test_table.

When I select from this table directly through POSTGRES_SCAN in DuckDB, the result is empty.

When I attach the tables using POSTGRES_ATTACH, I can see the table is successfully attached with the correct schema; however, the result is still empty.

I'm not sure what's happening here. Is there anything I missed in the configuration?

Error: Invalid Error: IO Error: Unsupported Postgres type interval

I installed DuckDB on my Mac and tried to connect to a Postgres server running in RDS. The PG instance is PostgreSQL 10.18 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit

D CALL postgres_attach('host=blabla.us-east-2.rds.amazonaws.com port=5432 dbname=my_db connect_timeout=10 password=mypass user=my_user');
  Error: Invalid Error: IO Error: Unsupported Postgres type interval

These are all the types PG has, AFAICT:

+----------------+
|data_type       |
+----------------+
|aclitem         |
|bigint          |
|boolean         |
|"char"          |
|cstring         |
|double precision|
|integer         |
|interval        |
|name            |
|numeric         |
|oid             |
|real            |
|regdictionary   |
|regtype         |
|smallint        |
|text            |
|xml             |
+----------------+

DuckDB postgres scanner connecting to clickhouse

Can it support Clickhouse (which supports psql)?

CALL postgres_attach('postgresql://default:password#@IPAddress:9005/default')

I am happy to share some connection details if anyone is interested. In any case the response was very ClickHouse specific, per below. If I had to guess, DuckDB probes information_schema in a way that ClickHouse, in some peculiar fashion, is not compatible with.

Any hope of debugging?

DB::Exception: Cannot convert string BASE TABLE to type UInt8: while executing 'FUNCTION equals(multiIf(is_temporary, 4, like(engine, '%View'), 2, like(engine, 'System%'), 5, equals(has_own_data, 0), 3, 1) : 19, 'BASE TABLE' : 13) -> equals(multiIf(is_temporary, 4, like(engine, '%View'), 2, like(engine, 'System%'), 5, equals(has_own_data, 0), 3, 1), 'BASE TABLE') UInt8 : 20'
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
ERROR: Query execution failed.
DB::Exception: Cannot convert string BASE TABLE to type UInt8: while executing 'FUNCTION equals(multiIf(is_temporary, 4, like(engine, '%View'), 2, like(engine, 'System%'), 5, equals(has_own_data, 0), 3, 1) : 19, 'BASE TABLE' : 13) -> equals(multiIf(is_temporary, 4, like(engine, '%View'), 2, like(engine, 'System%'), 5, equals(has_own_data, 0), 3, 1), 'BASE TABLE') UInt8 : 20'
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

current_timestamp wrong when exported to CSV

What happens?

When I query Oracle data in DuckDB, current_timestamp comes through as timestamptz in DuckDB:

con.execute("select *,pg_typeof(etl_time) from ar limit 1").fetch_df()
# timestamp with time zone

When I copy to CSV, the time is wrong when opened in Excel:

con.execute("copy (select * from ar limit 1) to 'ar.csv'")

Then I copy to Postgres; it is the same as the CSV.
in Oracle: 2022-12-05 09:44:44.018
exported to CSV: 2022-12-05 17:42:44.018 PRC
copied to PG: 2022-12-05 17:42:44.018
How do I export the right time to CSV?

To Reproduce

-- in Oracle:
create table test as select current_timestamp as etl_time from dual;
-- in duckdb:
import pandas as pd
import duckdb as dd
df = pd.read_sql_query('select * from test', conn)  # conn is the Oracle connection
con = dd.connect(':memory:')
con.execute("copy df to 'df.csv'")

OS:

win10

PostgreSQL Version:

10.2

DuckDB Version:

0.6.0

DuckDB Client:

Python

Full Name:

Changzhen Wang

Affiliation:

linezonedata.com

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

Skip the column instead of raising an exception

What happens?

I have a column in PostgreSQL of type int4range, and POSTGRES_SCAN raises an exception for it. It would be much more useful to just print a warning message and ignore that column; because of the exception I was not able to use the whole table, even though this field is not important to me.

The same goes for multi-dimensional arrays in PostgreSQL: they raise an exception, although it would be better to generate a warning and return the table without that specific column.

To Reproduce

To reproduce, create a table:

CREATE TABLE reservation (room int, during int4range);
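For completeness, the DuckDB side of the reproduction might look like this (a sketch; the connection string is a placeholder):

INSTALL postgres_scanner;
LOAD postgres_scanner;
-- This currently aborts with "IO Error: Unsupported Postgres type int4range"
-- instead of skipping the unsupported column.
SELECT * FROM postgres_scan('dbname=postgres', 'public', 'reservation');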

You can find those exceptions at:

  1. line 211: throw IOException("Unsupported Postgres type %s", pgtypename);
  2. line 783: throw NotImplementedException("Only one-dimensional Postgres arrays are supported");

OS:

Ubuntu

PostgreSQL Version:

psql (PostgreSQL) 14.1 (Debian 14.1-5)

DuckDB Version:

0.6.0

DuckDB Client:

c++ and cli

Full Name:

Marwan Hazem

Affiliation:

blnk

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

Unsupported Postgres type jsonb

trying to use postgres_scan on a table with jsonb fields and getting the following exception: Unsupported Postgres type jsonb

PostgreSQL User Defined Data Types not supported

What happens?

When I try to attach a Postgres database in which user-defined data types exist, it raises duckdb.IOException: IO Error: Unsupported Postgres type <user_defined_type name>

To discover whether a data type is user defined, run:

select 
	ud.typname as user_defined, 
	pd.typname as pg_defined, 
	ud.typtypmod as length_indicator,
	case 
		when (ud.typtypmod > 4 and pd.typname != 'numeric')
			then ud.typtypmod - 4 
		else 0 
	end as varchar_length
from 
	pg_catalog.pg_type ud                     -- user defined
	inner join pg_catalog.pg_type pd          -- postgres defined
		on pd."oid" = ud.typbasetype

Probably #57 is related.

Suggestion: detect user-defined data types, then map them to their underlying regular Postgres data types.

To Reproduce

Postgres:


CREATE DOMAIN my_type_v30 AS VARCHAR(30) NOT NULL;

CREATE DOMAIN my_id AS INT4;

CREATE TABLE my_table (
    table_id my_id PRIMARY KEY,
    table_var varchar(10),
    table_v30 my_type_v30
);

DuckDB/Python:

import duckdb

conn = duckdb.connect()  # create an in-memory database

try:
    conn.execute('LOAD postgres')
except duckdb.IOException:
    conn.execute('INSTALL postgres')
    conn.execute('LOAD postgres')

conn.execute("
    CALL POSTGRES_ATTACH('')
")

OS:

Linux

PostgreSQL Version:

13

DuckDB Version:

0.6.1

DuckDB Client:

Python

Full Name:

Hildeberto Magalhães

Affiliation:

NA

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

Build from source code error

The following error occurs when I build from source code through make on the newest commit:

[ 97%] Building CXX object CMakeFiles/postgres_scanner.dir/postgres_scanner.cpp.o
/home/ubuntu/postgresscanner/postgres_scanner.cpp: In constructor ‘PostgresScanFunction::PostgresScanFunction()’:
/home/ubuntu/postgresscanner/postgres_scanner.cpp:941:75: error: invalid conversion from ‘std::unique_ptr<duckdb::LocalTableFunctionState> (*)(duckdb::ExecutionContext&, duckdb::TableFunctionInitInput&, duckdb::GlobalTableFunctionState*)’ to ‘duckdb::table_function_init_local_t’ {aka ‘std::unique_ptr<duckdb::LocalTableFunctionState> (*)(duckdb::ClientContext&, duckdb::TableFunctionInitInput&, duckdb::GlobalTableFunctionState*)’} [-fpermissive]
  941 |                      PostgresScan, PostgresBind, PostgresInitGlobalState, PostgresInitLocalState) {
      |                                                                           ^~~~~~~~~~~~~~~~~~~~~~
      |                                                                           |
      |                                                                           std::unique_ptr<duckdb::LocalTableFunctionState> (*)(duckdb::ExecutionContext&, duckdb::TableFunctionInitInput&, duckdb::GlobalTableFunctionState*)
In file included from /home/ubuntu/postgresscanner/duckdb/src/include/duckdb/parallel/pipeline.hpp:13,
                 from /home/ubuntu/postgresscanner/duckdb/src/include/duckdb/execution/executor.hpp:13,
                 from /home/ubuntu/postgresscanner/duckdb/src/include/duckdb/main/pending_query_result.hpp:13,
                 from /home/ubuntu/postgresscanner/duckdb/src/include/duckdb/main/connection.hpp:16,
                 from /home/ubuntu/postgresscanner/duckdb/src/include/duckdb.hpp:11,
                 from /home/ubuntu/postgresscanner/postgres_scanner.cpp:2:
/home/ubuntu/postgresscanner/duckdb/src/include/duckdb/function/table_function.hpp:112:44: note:   initializing argument 6 of ‘duckdb::TableFunction::TableFunction(std::string, std::vector<duckdb::LogicalType>, duckdb::table_function_t, duckdb::table_function_bind_t, duckdb::table_function_init_global_t, duckdb::table_function_init_local_t)’
  112 |                table_function_init_local_t init_local = nullptr);
      |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
/home/ubuntu/postgresscanner/postgres_scanner.cpp: In constructor ‘PostgresScanFunctionFilterPushdown::PostgresScanFunctionFilterPushdown()’:
/home/ubuntu/postgresscanner/postgres_scanner.cpp:952:75: error: invalid conversion from ‘std::unique_ptr<duckdb::LocalTableFunctionState> (*)(duckdb::ExecutionContext&, duckdb::TableFunctionInitInput&, duckdb::GlobalTableFunctionState*)’ to ‘duckdb::table_function_init_local_t’ {aka ‘std::unique_ptr<duckdb::LocalTableFunctionState> (*)(duckdb::ClientContext&, duckdb::TableFunctionInitInput&, duckdb::GlobalTableFunctionState*)’} [-fpermissive]
  952 |                      PostgresScan, PostgresBind, PostgresInitGlobalState, PostgresInitLocalState) {
      |                                                                           ^~~~~~~~~~~~~~~~~~~~~~
      |                                                                           |
      |                                                                           std::unique_ptr<duckdb::LocalTableFunctionState> (*)(duckdb::ExecutionContext&, duckdb::TableFunctionInitInput&, duckdb::GlobalTableFunctionState*)
In file included from /home/ubuntu/postgresscanner/duckdb/src/include/duckdb/parallel/pipeline.hpp:13,
                 from /home/ubuntu/postgresscanner/duckdb/src/include/duckdb/execution/executor.hpp:13,
                 from /home/ubuntu/postgresscanner/duckdb/src/include/duckdb/main/pending_query_result.hpp:13,
                 from /home/ubuntu/postgresscanner/duckdb/src/include/duckdb/main/connection.hpp:16,
                 from /home/ubuntu/postgresscanner/duckdb/src/include/duckdb.hpp:11,
                 from /home/ubuntu/postgresscanner/postgres_scanner.cpp:2:
/home/ubuntu/postgresscanner/duckdb/src/include/duckdb/function/table_function.hpp:112:44: note:   initializing argument 6 of ‘duckdb::TableFunction::TableFunction(std::string, std::vector<duckdb::LogicalType>, duckdb::table_function_t, duckdb::table_function_bind_t, duckdb::table_function_init_global_t, duckdb::table_function_init_local_t)’
  112 |                table_function_init_local_t init_local = nullptr);
      |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
make[3]: *** [CMakeFiles/postgres_scanner.dir/build.make:63: CMakeFiles/postgres_scanner.dir/postgres_scanner.cpp.o] Error 1
make[3]: Leaving directory '/home/ubuntu/postgresscanner/build/release'
make[2]: *** [CMakeFiles/Makefile2:96: CMakeFiles/postgres_scanner.dir/all] Error 2
make[2]: Leaving directory '/home/ubuntu/postgresscanner/build/release'
make[1]: *** [Makefile:84: all] Error 2
make[1]: Leaving directory '/home/ubuntu/postgresscanner/build/release'
make: *** [Makefile:35: release] Error 2

Is there a way to fix this?

postgres_scanner.duckdb_extension: undefined symbol: _ZN6duckdb28SimpleNamedParameterFunction8ToStringEv

I used BUILD_PYTHON=1 make debug to build the Python duckdb package, and I can import it in Python:

import duckdb as dd
dd.__version__
'0.5.2-dev1249'

But when I load postgres_scanner,

con = dd.connect(":memory:", config={"allow_unsigned_extensions": True})
url_pgscan="/root/postgres_scanner/build/debug/extension/postgres_scanner/postgres_scanner.duckdb_extension"
con.install_extension(url_pgscan, force_install=True)
con.load_extension("postgres_scanner")

it raises this error:

duckdb.IOException: IO Error: Extension "/root/.duckdb/extensions/550e8ef/linux_amd64_gcc4/postgres_scanner.duckdb_extension" could not be loaded: /root/.duckdb/extensions/550e8ef/linux_amd64_gcc4/postgres_scanner.duckdb_extension: undefined symbol: _ZN6duckdb28SimpleNamedParameterFunction8ToStringEv

Why does this happen?
Thanks!

supporting MySQL

This is the killer feature I've been waiting for. Generally, though, I am hoping that it supports any database that speaks a given interface (the Postgres protocol is supported by QuestDB and ClickHouse, for example).

It would also be great to see some support for MySQL. The rationale: MySQL is as prevalent as Postgres, if not more so. Moreover, emerging databases (ClickHouse, StarRocks, Doris, etc.) tend to have MySQL compatibility. Thus one could possibly grab MySQL data plus the data of all the other databases via this protocol.

Issues with PostgreSQL tables having column in jsonb

Postgres Scanner works absolutely fine with databases where the table doesn't contain columns with type jsonb.

Query:

D call postgres_attach('host=**** port=5432 user=***** password=***** dbname=*****');

Error:

Error: Invalid Error: IO Error: Unable to query Postgres: ERROR:  type "enum_*****_status" does not exist
LINE 1: SELECT unnest(enum_range(NULL::enum_*****_status))
                                       ^
 ERROR:  type "enum_*****_status" does not exist
LINE 1: SELECT unnest(enum_range(NULL::enum_*****_status))


Postgres Scanner support of Table Dee/Dum

In postgres

foo=# create database deedum;
CREATE DATABASE
foo=# \c deedum
psql (15rc2, server 14.5)
You are now connected to database "deedum" as user "benji".
deedum=# CREATE TABLE dum();
CREATE TABLE
deedum=# CREATE TABLE dee();
CREATE TABLE
deedum=# INSERT INTO dee DEFAULT VALUES;
INSERT 0 1

In duckdb
D install postgres_scanner;
D load postgres_scanner;
D call postgres_attach('dbname=deedum');
Error: INTERNAL Error: INTERNAL Error: Failed to bind "postgres_scan": Table function must return at least one column

Expected: Database to attach and tables dee and dum to be queryable.

I was trying to work around duckdb not allowing creation of table dee/dum natively by connecting to a postgres database containing table dee/dum.

Write functionality

This is a “far out there” feature request that I suspect is not even possible to implement (it’s a scanner not a writer after all), but here it goes anyway. Any chance DuckDB could grow Postgres writing functionality?

We have data pipelines where we move data from an analytical database (Redshift and BigQuery) to Postgres. We currently do this by dumping CSVs which we then load into Postgres. This is a pain most of the time and a nightmare when you start running into delimiters in your text columns, nulls vs empty characters, etc. It would be really nice to be able to go from Parquet to Postgres (or even OLAP DB to Postgres) using the simplicity of DuckDB’s ability to scan, buffer and transform data.

Just a thought after a night of hair pulling 😅

Floating point exception (core dumped)

What happens?

Floating point exception (core dumped)

To Reproduce

In Postgres:

create table fail(n numeric(12,7));
insert into fail values (32.8875000);

In DuckDB CLI:

INSTALL postgres;
LOAD postgres;
CALL postgres_attach('...')
select * from fail;

OS:

Ubuntu 20.04.5 LTS

PostgreSQL Version:

15.1

DuckDB Version:

v0.6.1-dev83 dfae126ebb

DuckDB Client:

CLI

Full Name:

Dave Tapley

Affiliation:

JE Fuller

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

How to install? Running make raises an error

git clone https://gitee.com/wonb168/postgresscanner.git
cd postgresscanner
make

(base) [root@ovf72-1511 install]# cd postgresscanner/
(base) [root@ovf72-1511 postgresscanner]# make
cd duckdb &&
BUILD_TPCDS=1 BUILD_TPCH=1 make release
make[1]: Entering directory `/root/install/postgresscanner/duckdb'
make[1]: *** No rule to make target `release'. Stop.
make[1]: Leaving directory `/root/install/postgresscanner/duckdb'
make: *** [duckdb_release] Error 2
(base) [root@ovf72-1511 postgresscanner]#

memory corruption

When I use postgres_scan to query a Greenplum table, it raises a memory corruption error.

My code:

[root@project ~]# python
Python 3.8.13 (default, Aug 16 2022, 12:16:29)
[GCC 9.3.1 20200408 (Red Hat 9.3.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, os  # these 2 lines are required!!!
>>> sys.setdlopenflags(os.RTLD_GLOBAL | os.RTLD_NOW)
>>> import duckdb as dd
>>> print(dd.__version__)
0.5.2-dev1295
>>> con = dd.connect(":memory:", config={"allow_unsigned_extensions": True})
>>> con.execute("INSTALL postgres_scanner;")
<duckdb.DuckDBPyConnection object at 0x7f205ccd2670>
>>> con.execute("LOAD postgres_scanner;")
<duckdb.DuckDBPyConnection object at 0x7f205ccd2670>
>>> gp_str='postgres://<password>@192.168.0.161:2345/mdmaster_baosheng_test1'
>>> sql="""create schema tenant_baosheng_edw"""
>>> con.execute(sql)
<duckdb.DuckDBPyConnection object at 0x7f205ccd2670>
>>> sql=f"""create view tenant_baosheng_edw.dim_sku as
... SELECT * FROM postgres_scan('{gp_str}', 'tenant_baosheng_edw', 'dim_sku');"""
>>> con.execute(sql)
<duckdb.DuckDBPyConnection object at 0x7f205ccd2670>
>>> con.execute("select count(*) from tenant_baosheng_edw.dim_sku").fetchall()

Error in `python': malloc(): memory corruption: 0x00007f203400d340
detail:

======= Backtrace: =========
/lib64/libc.so.6(+0x82b36)[0x7f2092fb4b36]
/lib64/libc.so.6(__libc_malloc+0x4c)[0x7f2092fb778c]
/lib64/libstdc++.so.6(_Znwm+0x1d)[0x7f20893b218d]
/opt/rh/rh-python38/root/usr/lib64/python3.8/site-packages/duckdb-0.5.2.dev1295+gd3db91f-py3.8-linux-x86_64.egg/duckdb.cpython-38-x86_64-linux-gnu.so(_ZN6duckdb12VectorBuffer20CreateConstantVectorENS_12PhysicalTypeE+0x2b)[0x7f208a2b4e6b]
/opt/rh/rh-python38/root/usr/lib64/python3.8/site-packages/duckdb-0.5.2.dev1295+gd3db91f-py3.8-linux-x86_64.egg/duckdb.cpython-38-x86_64-linux-gnu.so(_ZN6duckdb12VectorBuffer20CreateConstantVectorERKNS_11LogicalTypeE+0xd)[0x7f208a2b4f8d]
/opt/rh/rh-python38/root/usr/lib64/python3.8/site-packages/duckdb-0.5.2.dev1295+gd3db91f-py3.8-linux-x86_64.egg/duckdb.cpython-38-x86_64-linux-gnu.so(_ZN6duckdb6Vector9ReferenceERKNS_5ValueE+0x3c)[0x7f208a2cf09c]
/opt/rh/rh-python38/root/usr/lib64/python3.8/site-packages/duckdb-0.5.2.dev1295+gd3db91f-py3.8-linux-x86_64.egg/duckdb.cpython-38-x86_64-linux-gnu.so(_ZN6duckdb18ExpressionExecutor7ExecuteERKNS_10ExpressionEPNS_15ExpressionStateEPKNS_15SelectionVectorEmRNS_6VectorE+0xd8)[0x7f208a3032a8]
/opt/rh/rh-python38/root/usr/lib64/python3.8/site-packages/duckdb-0.5.2.dev1295+gd3db91f-py3.8-linux-x86_64.egg/duckdb.cpython-38-x86_64-linux-gnu.so(_ZN6duckdb18ExpressionExecutor17ExecuteExpressionEmRNS_6VectorE+0x46)[0x7f208a3033c6]
/opt/rh/rh-python38/root/usr/lib64/python3.8/site-packages/duckdb-0.5.2.dev1295+gd3db91f-py3.8-linux-x86_64.egg/duckdb.cpython-38-x86_64-linux-gnu.so(_ZN6duckdb18ExpressionExecutor7ExecuteEPNS_9DataChunkERS1_+0x4b)[0x7f208a3034cb]
/opt/rh/rh-python38/root/usr/lib64/python3.8/site-packages/duckdb-0.5.2.dev1295+gd3db91f-py3.8-linux-x86_64.egg/duckdb.cpython-38-x86_64-linux-gnu.so(_ZNK6duckdb18PhysicalProjection7ExecuteERNS_16ExecutionContextERNS_9DataChunkES4_RNS_19GlobalOperatorStateERNS_13OperatorStateE+0x13)[0x7f208a49ada3]
/opt/rh/rh-python38/root/usr/lib64/python3.8/site-packages/duckdb-0.5.2.dev1295+gd3db91f-py3.8-linux-x86_64.egg/duckdb.cpython-38-x86_64-linux-gnu.so(_ZN6duckdb16PipelineExecutor7ExecuteERNS_9DataChunkES2_m+0x176)[0x7f208aa71426]
/opt/rh/rh-python38/root/usr/lib64/python3.8/site-packages/duckdb-0.5.2.dev1295+gd3db91f-py3.8-linux-x86_64.egg/duckdb.cpython-38-x86_64-linux-gnu.so(_ZN6duckdb16PipelineExecutor19ExecutePushInternalERNS_9DataChunkEm+0x6b)[0x7f208aa7161b]
/opt/rh/rh-python38/root/usr/lib64/python3.8/site-packages/duckdb-0.5.2.dev1295+gd3db91f-py3.8-linux-x86_64.egg/duckdb.cpython-38-x86_64-linux-gnu.so(_ZN6duckdb16PipelineExecutor7ExecuteEm+0x5f)[0x7f208aa719ff]
/opt/rh/rh-python38/root/usr/lib64/python3.8/site-packages/duckdb-0.5.2.dev1295+gd3db91f-py3.8-linux-x86_64.egg/duckdb.cpython-38-x86_64-linux-gnu.so(_ZN6duckdb12PipelineTask11ExecuteTaskENS_17TaskExecutionModeE+0x31)[0x7f208aa78fc1]
/opt/rh/rh-python38/root/usr/lib64/python3.8/site-packages/duckdb-0.5.2.dev1295+gd3db91f-py3.8-linux-x86_64.egg/duckdb.cpython-38-x86_64-linux-gnu.so(_ZN6duckdb12ExecutorTask7ExecuteENS_17TaskExecutionModeE+0x21)[0x7f208aa705f1]
/opt/rh/rh-python38/root/usr/lib64/python3.8/site-packages/duckdb-0.5.2.dev1295+gd3db91f-py3.8-linux-x86_64.egg/duckdb.cpython-38-x86_64-linux-gnu.so(_ZN6duckdb13TaskScheduler14ExecuteForeverEPSt6atomicIbE+0xb8)[0x7f208aa74c08]
/lib64/libstdc++.so.6(+0xb5330)[0x7f2089408330]
/lib64/libpthread.so.0(+0x7ea5)[0x7f2093a10ea5]
/lib64/libc.so.6(clone+0x6d)[0x7f2093030b0d]
======= Memory map: ========
00400000-00401000 r--p 00000000 08:01 4196452                            /opt/rh/rh-python38/root/usr/bin/python3.8
00401000-00402000 r-xp 00001000 08:01 4196452                            /opt/rh/rh-python38/root/usr/bin/python3.8
00402000-00403000 r--p 00002000 08:01 4196452                            /opt/rh/rh-python38/root/usr/bin/python3.8
00403000-00404000 r--p 00002000 08:01 4196452                            /opt/rh/rh-python38/root/usr/bin/python3.8
00404000-00405000 rw-p 00003000 08:01 4196452                            /opt/rh/rh-python38/root/usr/bin/python3.8
01480000-022b9000 rw-p 00000000 00:00 0                                  [heap]
7f2034000000-7f2034021000 rw-p 00000000 00:00 0
7f2034021000-7f2038000000 ---p 00000000 00:00 0
7f2038000000-7f2038021000 rw-p 00000000 00:00 0
7f2038021000-7f203c000000 ---p 00000000 00:00 0
7f203c000000-7f203c021000 rw-p 00000000 00:00 0
7f203c021000-7f2040000000 ---p 00000000 00:00 0
7f2044000000-7f2044021000 rw-p 00000000 00:00 0
7f2044021000-7f2048000000 ---p 00000000 00:00 0
7f204afe6000-7f204afe7000 ---p 00000000 00:00 0
7f204afe7000-7f204b7e7000 rw-p 00000000 00:00 0
7f204b7e7000-7f204b7e8000 ---p 00000000 00:00 0
7f204b7e8000-7f204bfe8000 rw-p 00000000 00:00 0
7f204bfe8000-7f204bfe9000 ---p 00000000 00:00 0
7f204bfe9000-7f204c7e9000 rw-p 00000000 00:00 0
7f204c7e9000-7f204c7ea000 ---p 00000000 00:00 0
7f204c7ea000-7f204cfea000 rw-p 00000000 00:00 0
7f204cfea000-7f204cfeb000 ---p 00000000 00:00 0
7f204cfeb000-7f204d7eb000 rw-p 00000000 00:00 0
7f204d7eb000-7f204d7ec000 ---p 00000000 00:00 0
7f204d7ec000-7f204dfec000 rw-p 00000000 00:00 0
7f204dfec000-7f204dfed000 ---p 00000000 00:00 0
7f204dfed000-7f204e7ed000 rw-p 00000000 00:00 0
7f204e7ed000-7f204e7ee000 ---p 00000000 00:00 0
7f204e7ee000-7f204efee000 rw-p 00000000 00:00 0
7f204efee000-7f204efef000 ---p 00000000 00:00 0
7f204efef000-7f204f7ef000 rw-p 00000000 00:00 0
7f204f7ef000-7f204f7f0000 ---p 00000000 00:00 0
7f204f7f0000-7f204fff0000 rw-p 00000000 00:00 0
7f204fff0000-7f204fff1000 ---p 00000000 00:00 0
7f204fff1000-7f20507f1000 rw-p 00000000 00:00 0
7f20507f1000-7f20507f2000 ---p 00000000 00:00 0
7f20507f2000-7f2050ff2000 rw-p 00000000 00:00 0
7f2050ff2000-7f2050ff3000 ---p 00000000 00:00 0
7f2050ff3000-7f20517f3000 rw-p 00000000 00:00 0
7f20517f3000-7f20517f4000 ---p 00000000 00:00 0
7f20517f4000-7f2051ff4000 rw-p 00000000 00:00 0
7f2051ff4000-7f2051ff5000 ---p 00000000 00:00 0
7f2051ff5000-7f20527f5000 rw-p 00000000 00:00 0
7f20527f5000-7f20527f6000 ---p 00000000 00:00 0
7f20527f6000-7f2052ff6000 rw-p 00000000 00:00 0
7f2052ff6000-7f2052ff7000 ---p 00000000 00:00 0
7f2052ff7000-7f20537f7000 rw-p 00000000 00:00 0
7f20537f7000-7f20537f8000 ---p 00000000 00:00 0
7f20537f8000-7f2053ff8000 rw-p 00000000 00:00 0
7f2053ff8000-7f2053ff9000 ---p 00000000 00:00 0
7f2053ff9000-7f20547f9000 rw-p 00000000 00:00 0
7f20547f9000-7f20547fa000 ---p 00000000 00:00 0
7f20547fa000-7f2054ffa000 rw-p 00000000 00:00 0
7f2054ffa000-7f2054ffb000 ---p 00000000 00:00 0
7f2054ffb000-7f20557fb000 rw-p 00000000 00:00 0
7f20557fb000-7f20557fc000 ---p 00000000 00:00 0
7f20557fc000-7f2055ffc000 rw-p 00000000 00:00 0
7f2055ffc000-7f2055ffd000 ---p 00000000 00:00 0
7f2055ffd000-7f20567fd000 rw-p 00000000 00:00 0
7f20567fd000-7f20567fe000 ---p 00000000 00:00 0
7f20567fe000-7f2056ffe000 rw-p 00000000 00:00 0
7f2056ffe000-7f2056fff000 ---p 00000000 00:00 0
7f2056fff000-7f20577ff000 rw-p 00000000 00:00 0
7f20577ff000-7f2057800000 ---p 00000000 00:00 0
7f2057800000-7f2058022000 rw-p 00000000 00:00 0
7f2058022000-7f205c000000 ---p 00000000 00:00 0
7f205c197000-7f205c200000 r-xp 00000000 08:01 2497030                    /root/.duckdb/extensions/d3db91f/linux_amd64_gcc4/postgres_scanner.duckdb_extension
7f205c200000-7f205c400000 ---p 00069000 08:01 2497030                    /root/.duckdb/extensions/d3db91f/linux_amd64_gcc4/postgres_scanner.duckdb_extension
7f205c400000-7f205c403000 r--p 00069000 08:01 2497030                    /root/.duckdb/extensions/d3db91f/linux_amd64_gcc4/postgres_scanner.duckdb_extension
7f205c403000-7f205c404000 rw-p 0006c000 08:01 2497030                    /root/.duckdb/extensions/d3db91f/linux_amd64_gcc4/postgres_scanner.duckdb_extension
7f205c404000-7f205c405000 ---p 00000000 00:00 0
7f205c405000-7f205cc05000 rw-p 00000000 00:00 0
7f205cc05000-7f205cc14000 r--p 00000000 08:01 7080731                    /opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/pyarrow/_parquet.cpython-38-x86_64-linux-gnu.so
7f205cc14000-7f205cc4e000 r-xp 0000f000 08:01 7080731                    /opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/pyarrow/_parquet.cpython-38-x86_64-linux-gnu.so
7f205cc4e000-7f205cc5b000 r--p 00049000 08:01 7080731                    /opt/rh/rh-python38/root/usr/local/lib64/python3.8/site-packages/pyarrow/_parquet.cpython-38-x86_64-linux-gnu.so
Aborted

IO Error: Unsupported Postgres type _bpchar

When I try to connect to one specific postgres database (I don't have issues with others), I get the following error:

$ ./duckdb
v0.5.1 7c111322d
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D INSTALL postgres_scanner; LOAD postgres_scanner;
D CALL postgres_attach('host=localhost port=5432 dbname=dbname user=user password=pw');
Error: IO Error: Unsupported Postgres type _bpchar

The postgres version is PostgreSQL 12.6 (Ubuntu 12.6-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit.

I see that the error message is coming from: https://github.com/duckdblabs/postgres_scanner/blob/e3202f2acef75b91905e6595a605eb8ab9bb4849/postgres_scanner.cpp#L198

The database has multiple tables, and I am using these column types:

integer
character(19)
character(19)[]
boolean
integer[]
text
text[]
timestamp without time zone

Plus a couple of custom enums. Are any of these types not implemented yet? Or maybe it is related to something else? Any clues? Let me know if you need more info.
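For reference, _bpchar is the internal Postgres name for an array of character(n), which matches the character(19)[] column type listed above. A catalog query along these lines (a sketch, run against the Postgres database) should locate the offending columns:

SELECT c.relname AS table_name,
       a.attname AS column_name,
       t.typname AS type_name
FROM pg_catalog.pg_attribute a
JOIN pg_catalog.pg_class c ON c.oid = a.attrelid
JOIN pg_catalog.pg_type t ON t.oid = a.atttypid
WHERE t.typname = '_bpchar'
  AND a.attnum > 0
  AND NOT a.attisdropped;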

CMake Error during compiling

Hi, I'm trying to compile this project on Ubuntu 18.04.
I use the CLion IDE and created a "cmake-build-debug" directory at the project root.
Then I run cmake .. and it shows the error "CMake Error at CMakeLists.txt:117 (build_loadable_extension): Unknown CMake command "build_loadable_extension"."
So I added add_subdirectory(duckdb) at line 116.
After that, I cleaned the CMake cache and compiled the project again, and then I got this error from inside the "duckdb" directory:

fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
CMake Warning at duckdb/CMakeLists.txt:363 (message):
  Please use a recent compiler for debug builds

You have called ADD_LIBRARY for library loadable_extension_demo_loadable_extension without any source files. This typically indicates a problem with your CMakeLists.txt file
-- Configuring done
CMake Error: CMake can not determine linker language for target: loadable_extension_demo_loadable_extension
CMake Error: Cannot determine link language for target "loadable_extension_demo_loadable_extension".
-- Generating done

I'm wondering what I can do to fix this CMake Error.
Thanks!

How to specify the "filter_pushdown" parameters for each table?

Hi!
First of all: DuckDB is really great!
I'm trying to enable filter pushdown by using the postgres_scan_pushdown function for each table; however, it seems that it can only be used once in a query.
I got this exception: "duckdb.BinderException: Binder Error: Duplicate alias "postgres_scan_pushdown" in query!"
Here is my query:

SELECT l_orderkey, SUM(l_extendedprice*(1.0-l_discount)) as revenue, o_orderdate, o_shippriority 
FROM postgres_scan_pushdown('host=localhost user=xxx dbname=tpch_sf1 port=xxx','public','customer'), postgres_scan_pushdown('host=localhost user=xxx dbname=tpch_sf1 port=xxx','public','orders'), 
lineitem 
WHERE c_mktsegment = 'BUILDING' AND c_custkey = o_custkey AND l_orderkey = o_orderkey AND o_orderdate < '1995-03-15' AND l_shipdate > '1995-03-15' GROUP BY l_orderkey, o_orderdate, o_shippriority ORDER BY revenue DESC, o_orderdate LIMIT 10;

I wonder what I can do to specify the filter_pushdown parameter for each table.
Thanks~
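The binder error comes from both table-function calls receiving the same default alias. A sketch of the query above with an explicit alias on each call (not verified against this extension) that should avoid the collision:

SELECT l_orderkey, SUM(l_extendedprice * (1.0 - l_discount)) AS revenue, o_orderdate, o_shippriority
FROM postgres_scan_pushdown('host=localhost user=xxx dbname=tpch_sf1 port=xxx', 'public', 'customer') AS customer,
     postgres_scan_pushdown('host=localhost user=xxx dbname=tpch_sf1 port=xxx', 'public', 'orders') AS orders,
     lineitem
WHERE c_mktsegment = 'BUILDING' AND c_custkey = o_custkey AND l_orderkey = o_orderkey
  AND o_orderdate < '1995-03-15' AND l_shipdate > '1995-03-15'
GROUP BY l_orderkey, o_orderdate, o_shippriority
ORDER BY revenue DESC, o_orderdate
LIMIT 10;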

Can't scan a partitioned child table

What happens?

I use postgres_scan to query a partitioned child table, like this:

create or replace view tenant_qisefang_biz.gto_action_record as SELECT * FROM postgres_scan('postgres://gpadmin:[email protected]:2345/qsf_prod', 'biz', 'rst_ra_skc_org_detail_1_prt_p20221011')

Creating the view works fine, but when querying, I tested three databases and only one (qsf_prod) is OK; the other two raise an error:

IOException: IO Error: Unable to query Postgres: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
 server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

To Reproduce

I don't know how to reproduce it.
qsf_test and qsf_prod are the same tenant in different environments and have the same DDL for the partitioned table.
The three tables' attributes in pg_class:

envirement relname relnamespace reltype reloftype relowner relam relfilenode reltablespace relpages reltuples relallvisible reltoastrelid relhasindex relisshared relpersistence relkind relstorage relnatts relchecks relhasoids relhaspkey relhasrules relhastriggers relhassubclass relispopulated relreplident relfrozenxid relminmxid relacl reloptions
qsf-test rst_ra_skc_org_detail_1_prt_p20221011 236645463 328576078 0 236645467 0 69327401 236645454 9201 39807 0 353502973 TRUE FALSE p r h 223 1 FALSE FALSE FALSE FALSE FALSE TRUE d 24425733 1 {backend_qisefang_test=arwdDxt/etl_qisefang_test,etl_qisefang_test=arwdDxt/etl_qisefang_test,model_qisefang_test=arwdDxt/etl_qisefang_test,query_qisefang_test=arwdDxt/etl_qisefang_test,etl_qisefang_test_readonly=arwdDxt/etl_qisefang_test,linezone_t2_readonly=r/etl_qisefang_test} NULL
qsf-prod rst_ra_skc_org_detail_1_prt_p20221011 176052483 210101537 0 173967062 0 42205523 173967034 7621 12997 0 210101542 TRUE FALSE p r h 223 1 FALSE FALSE FALSE FALSE FALSE TRUE d 14160586 1 {etl_qisefang_prod=arwdDxt/etl_qisefang_prod,etl_qisefang_prod_readonly=arwdDxt/etl_qisefang_prod,backend_qisefang_prod=arwdDxt/etl_qisefang_prod,model_qisefang_prod=arwdDxt/etl_qisefang_prod,query_qisefang_prod=arwdDxt/etl_qisefang_prod,linezone_t2_readonly=r/etl_qisefang_prod} NULL
yg-prod rst_ra_skc_org_detail_1_prt_p20221011 63377625 210493179 0 26947415 0 42296994 26947402 48655 542035 0 210493183 TRUE FALSE p r c 223 1 FALSE FALSE FALSE FALSE FALSE TRUE d 0 0 {backend_yuege_prod=arwdDxt/etl_yuege_prod,etl_yuege_prod=arwdDxt/etl_yuege_prod,model_yuege_prod=arwdDxt/etl_yuege_prod,query_yuege_prod=arwdDxt/etl_yuege_prod,etl_yuege_prod_readonly=arwdDxt/etl_yuege_prod,linezone_t2_readonly=r/etl_yuege_prod} {appendonly=true,orientation=column,compresstype=zlib,compresslevel=5}

OS:

centos7

PostgreSQL Version:

9.4(greenplum6.12)

DuckDB Version:

0.6.0

DuckDB Client:

Python

Full Name:

Changzhen Wang

Affiliation:

linezonedata

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

Can't connect - Error: Decimal with a width of 255 is bigger than the maximum supported of 38.

Unfortunately, I couldn't connect to any of my PostgreSQL databases through the DuckDB scanner, due to two types of errors.

The first one: Decimal with a width of 255 is bigger than the maximum supported of 38.
I tried both mechanisms: attaching the whole database and scanning just a table.
call postgres_attach('dbname=bassam');
select * from postgres_scan('dbname=bassam', 'public', 'test');

The only database that I was able to connect to is the one that has 0 tables.

The second problem is that the database has not been invalidated!
D select * from postgres_scan('dbname=bassam', 'public', 'untitled_table');

You can replicate the issue with this dataset:
https://github.com/bassamsdata/earthquake_data/blob/main/earthquake.csv

postgresql version: 14.5
duckdb version: 0.5.1
macOS: 12.6

thanks a lot for your efforts.

how to attach many schemas?

The default schema for call postgres_attach is public; how do I attach many schemas? When attaching a database, can DuckDB map Postgres schemas to DuckDB schemas one to one? (See the sketch below.)
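One option for cross-schema joins, sketched here (the connection string and the join key are hypothetical placeholders): skip postgres_attach and reference each table with postgres_scan, which takes the schema name as an explicit argument. Alternatively, postgres_attach could be called once per schema via its source_schema parameter, as long as the resulting view names do not collide.

SELECT *
FROM postgres_scan('dbname=mydb', 'sch1', 'table1') AS t1
JOIN postgres_scan('dbname=mydb', 'sch2', 'table2') AS t2
  ON t1.id = t2.id;  -- hypothetical join key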

Filtering by char column doesn't work correctly in Postgres_scanner

What happens?

When filtering by a char column the query doesn't return correct results if the filter is not pushed down to Postgres. For example in the customer table of TPC-H, the c_mktsegment is usually a char(10).

select count(*), c_mktsegment from customer where c_mktsegment = 'BUILDING' group by c_mktsegment;

doesn't return any results when the filter is not pushed down to Postgres. However, when the filter is pushed down to Postgres, the query returns the correct result. I believe it's because Postgres ignores the blank-space padding of the column but DuckDB does not.

Similarly, if the query is written explicitly with the blank-space padding, and the filter isn't pushed down, then this also works OK:

select count(*), c_mktsegment from customer where c_mktsegment = 'BUILDING ' group by c_mktsegment;
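Another way to make the comparison succeed without pushdown might be to strip the padding on the DuckDB side (a sketch, not verified against this scanner):

select count(*), trim(c_mktsegment) as c_mktsegment
from customer
where trim(c_mktsegment) = 'BUILDING'
group by trim(c_mktsegment);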

To Reproduce

  • Create the TPC-H customer table in postgres and populate it with data of any scale. Note that the c_mktsegment is a char(10)
create table tpch.customer
(
    c_custkey    integer        not null,
    c_name       varchar(25)    not null,
    c_address    varchar(40)    not null,
    c_nationkey  integer        not null,
    c_phone      char(15)       not null,
    c_acctbal    numeric(15, 2) not null,
    c_mktsegment char(10)       not null,
    c_comment    varchar(117)   not null
);
  • Link duckdb to the postgres schema and don't enable filter pushdown

CALL postgres_attach('host=localhost port=28815 dbname=tpch', source_schema='tpch', filter_pushdown=false, overwrite=true);

  • Run the query below, you'll see that no results are returned
    select count(*), c_mktsegment from customer where c_mktsegment = 'BUILDING' group by c_mktsegment;

OS:

MacOS

PostgreSQL Version:

15

DuckDB Version:

0.6.1

DuckDB Client:

CLI

Full Name:

Olo Sawyerr

Affiliation:

None. Hobby projects.

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

postgres_scan error with enums with multiple schemas

What happens?

trying to run the following query:
select
i.*
from
postgres_scan('', 'ketteq_planning', 'route') i

and receiving the following error:
SQL Error: java.sql.SQLException: IO Error: Unable to query Postgres: ERROR: type "route_type" does not exist
LINE 1: SELECT unnest(enum_range(NULL::route_type))

the proper query to get enum values should be
SELECT unnest(enum_range(NULL::ketteq_planning.route_type))

i.e., the schema (ketteq_planning) is missing

To Reproduce

  • create a schema
  • create an enum type in that schema
  • create a table that references that enum
  • try running postgres_scan (sketched below)
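A sketch of that reproduction in SQL (the enum values and the column names are hypothetical):

-- In Postgres
create schema ketteq_planning;
create type ketteq_planning.route_type as enum ('a', 'b');
create table ketteq_planning.route (id int, kind ketteq_planning.route_type);

-- In DuckDB: this is where the unqualified enum_range lookup fails
select i.* from postgres_scan('', 'ketteq_planning', 'route') i;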

OS:

windows

PostgreSQL Version:

14.5

DuckDB Version:

0.6.0

DuckDB Client:

dbeaver

Full Name:

Andrey Kornienko

Affiliation:

ketteQ

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

Can the scanner read Postgres functions?

Right now postgres_scanner can only scan Postgres tables, but we also have some table functions.
Can the scanner add support for functions in the future?
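A possible interim approach (a sketch; the names and connection string are placeholders, and whether the scanner's ctid-based paging works on materialized views is not verified here) is to materialize the function's output on the Postgres side and scan that object instead:

-- In Postgres: my_func() is a hypothetical set-returning function
CREATE MATERIALIZED VIEW my_func_output AS SELECT * FROM my_func();

-- In DuckDB
SELECT * FROM postgres_scan('dbname=mydb', 'public', 'my_func_output');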

Please clarify what this does in the README

@hannes @Mytherin - Could you please explain with 2-3 sentences what postgresscanner does at the top of the README? I mean from a potential new user perspective, e.g. I'm looking for a PostgreSQL -> DuckDB sync tool and I wonder if this is relevant.

The README example for POSTGRES_ATTACH mentions SQLite. Is that correct or a typo? Does this have anything to do with SQLite?

Thanks!

Decimal out of range error when trying to read a NUMERIC type

Create the following Postgres table:

-- postgres
create table numeric_test(x int, y numeric, z numeric);
insert into numeric_test values (0), (0), (0);

Then try to read it with the postgres scanner:

-- duckdb
SELECT * FROM postgres_scan('', 'public', 'numeric_test');

This results in the following internal error:

INTERNAL Error: Decimal has a width of 255 which is bigger than the maximum supported width of 38
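The width of 255 appears to be what the scanner reports for a NUMERIC column declared without precision and scale. Until that case is handled, one possible workaround (a sketch; the precision and scale chosen here are arbitrary) is to constrain the columns on the Postgres side so they fit DuckDB's maximum width of 38:

-- postgres
alter table numeric_test
  alter column y type numeric(38, 10),
  alter column z type numeric(38, 10);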

Parallel Scans not partitioning?

I have two (related) questions around postgres_scan and postgres_scan_pushdown. I am testing these functions to fetch data from a Postgres read replica. I'm finding that the "parallelization" and TID scan stuff doesn't seem to work as advertised in the blog, but maybe I'm just not understanding how it works.

Consider the following example query:

select prediction_log_id from rep.tbl where created_at between '2022-09-30'::timestamp and '2022-10-01'::timestamp

(maybe irrelevant, but there is an index on created_at).

Now, I ran this in DuckDB using both functions to see how the extension works with/without predicate pushdown and I checked on the PG instance using:

SELECT pid, state, age(clock_timestamp(), query_start), usename, query from pg_stat_activity where query != '<IDLE>' and query not like '%pg_stat_activity%' and state='active' ORDER BY query_start desc;

With postgres_scan_pushdown, I get something like this:

COPY (SELECT "prediction_log_id" FROM "rep"."tbl" WHERE ctid BETWEEN '(13582000,0)'::tid AND '(13583000,0)'::tid AND ("created_at" >= '2022-09-30 00:00:00' AND "created_at" <= '2022-10-01 00:00:00' AND "created_at" IS NOT NULL)) TO STDOUT (FORMAT binary);

Frequently refreshing, the ctid range indeed increments by 1000 as the blog post indicates, but there aren't any parallel queries running; it's just one process. Is that intentional?

With postgres_scan, I get something like this:

COPY (SELECT "prediction_log_id" FROM "rep"."tbl" WHERE ctid BETWEEN '(0,0)'::tid AND '(1000,0)'::tid ) TO STDOUT (FORMAT binary);

But this time I get three of these queries at a time, although they don't have different ctid ranges as I expected (literally the same exact query, three times). So in this case it looks like there is parallelization, but without separate ranges going at once. Waiting a while and refreshing, I'd find a fresh new set of three queries with updated ctid ranges, but again all the same (e.g. all updated from 0-1000 to 1000-2000). Is that intentional?

Also interesting tidbit: after testing with different tables and queries, I find that the postgres_scan_pushdown function never starts with a 0-1000 CTID range, but somewhere in the middle. Does this mean the pushdown function is smarter with reading TIDs?

I am using DuckDB version 0.5.1, reading from PostgreSQL 10.18

Why does scanning a partition table raise an error?

In Greenplum, querying the root partition table takes 40s, but querying the child table directly takes only 10s.
So I use postgres_scanner on the root table in DuckDB, and the SQL runs in 40s.
But when I scan the child table, creating the view succeeds:

create or replace view biz.rst_ra_skc_org_detail_20221118 as SELECT * FROM postgres_scan('postgres://gpadmin:[email protected]:2345/prod', 'biz', 'rst_ra_skc_org_detail_1_prt_p20221118')

but running the SQL raises an error:

con.execute("select count(*) from biz.rst_ra_skc_org_detail_20221118").fetch_df()

error:

-- 
drop table if exists tmp_rst_ra_skc_org_detail;
create temp table tmp_rst_ra_skc_org_detail as
select b.operating_unit_sk,a.skc_sk::int as skc_sk
      ,a.human_allot_out_org_sk::int human_allot_out_org_sk,a.human_allot_in_org_sk::int human_allot_in_org_sk
      ,a.out_org_before_ra_include_stock_qty,a.in_org_before_ra_include_stock_qty
      ,a.out_org_after_human_ra_include_stock_qty ,a.in_org_after_human_ra_include_stock_qty
from biz.rst_ra_skc_org_detail_20221118 a
--from (select * from biz.rst_ra_skc_org_detail where day_date = '2022-11-18') a
inner join(select batch_id ,operating_unit_sk from tenant_yuege_biz.gto_action_record) b on a.batch_id::int = b.batch_id
where a.day_date = '2022-11-18' and 
a.ra_source <> '3' and a.is_deleted = '0'

---------------------------------------------------------------------------
IOException                               Traceback (most recent call last)
Cell In [271], line 23
      9 sql=f"""-- 
     10 drop table if exists tmp_rst_ra_skc_org_detail;
     11 create temp table tmp_rst_ra_skc_org_detail as
   (...)
     20 a.ra_source <> '3' and a.is_deleted = '0'
     21 {v_where}"""
     22 print(sql)
---> 23 con.execute(sql)#43.3s rst_ra_skc_org_detail_20221118 rst_ra_skc_org_detail_20221118 

IOException: IO Error: Unable to query Postgres: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
 server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

Even a plain count also raises the error:

IOException                               Traceback (most recent call last)
Cell In [272], line 1
----> 1 con.execute("select count(*) from tenant_yuege_biz.rst_ra_skc_org_detail_20221118").fetch_df()

IOException: IO Error: Unable to query Postgres: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
 server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

Indefinite hang with Python module

What happens?

I can successfully CALL postgres_attach from the duckdb CLI, but it hangs indefinitely via Python module.

To Reproduce

Get a local Postgres server running.

Verify this works in duckdb CLI:

INSTALL postgres;
LOAD postgres;
CALL postgres_attach('host=localhost dbname=postgres user=postgres password=password');

Run python and paste:

db = duckdb.connect()
db.execute('''
INSTALL postgres;
LOAD postgres;
CALL postgres_attach('host=localhost dbname=postgres user=postgres password=password');
''')

Observe it hangs indefinitely.

OS:

Ubuntu 18.04.01

PostgreSQL Version:

Ubuntu 10.22-0ubuntu0.18.04.1

DuckDB Version:

0.6.0

DuckDB Client:

Python

Full Name:

Dave Tapley

Affiliation:

JE Fuller

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

Selection not being pushed down by 'postgres_scan_pushdown'

It looks like the postgres_scan_pushdown is not behaving as intended or more documentation is needed. I am running the following query:

SELECT *
FROM postgres_scan_pushdown('host=localhost port=5432 dbname=name user=user password=pw connect_timeout=1', 'public', 'entries')
WHERE id in (15078724, 14345981, 7502816, 19982348, 19955057, 5595804, 3090984, 15114757, 4342971, 3684533, 15332157, 7977537, 6968341, 3565833, 13419215, 7214809, 4768957, 11041866, 15094037, 3195645, 3630081, 3748183, 3836289, 3895904, 4069128, 4235944, 4368211, 4514836, 4734324, 17768845, 6413051, 6568003, 6681069, 3338266, 3345805, 1696061, 3382514, 6942962, 7316728, 3659078);

It takes exactly the same amount of time (~16 seconds) as the same query without the WHERE clause, or as the same query with the WHERE clause but with postgres_scan instead of postgres_scan_pushdown. It looks as if postgres_scan_pushdown is not actually pushing down the filters (i.e., they are not run by Postgres) and it is fetching the full table and then filtering locally.

However, the following query seems to work as expected:

SELECT *
FROM postgres_scan_pushdown('host=localhost port=5432 dbname=name user=user password=pw connect_timeout=1', 'public', 'entries')
WHERE id in (15078724);

It returns the results very quickly (under a second), while it is slower (~16 seconds), as expected, if I use the non-pushdown version:

SELECT *
FROM postgres_scan('host=localhost port=5432 dbname=name user=user password=pw connect_timeout=1', 'public', 'entries')
WHERE id in (15078724);

Hence, it seems that the pushdown function only works with certain (very simple?) conditions. I am using duckdb v0.5.1 7c111322d. Is this a bug or is it an undocumented limitation?

unsupported type uuid

duckdb version: v0.5.1 7c111322d

To reproduce:

-- in postgres
CREATE TABLE example (
  id uuid
);
-- in duckdb
D install postgres_scanner;
D load postgres_scanner;
D select * from postgres_scan('dbname=db', 'public', 'example') limit 10;
Error: Invalid Error: IO Error: Unsupported Postgres type uuid

I have a patch that seems to be working. I updated a test and got make test to pass locally (with some changes to the build/test scripts that are not included in the patch).

Love what you guys are doing. I can open a pull request if you'd like.

IO Error: Unsupported Postgres type tsvector

What happens?

When I try either:
CALL postgres_attach('dbname=postgres');
or
SELECT * FROM postgres_scan('dbname=postgres', 'public', 'twitter_users');

on a Postgres database that uses tsvector, DuckDB fails with:
Error: Invalid Error: IO Error: Unsupported Postgres type tsvector

To Reproduce

Create a Postgres table with tsvector and run:

D INSTALL postgres_scanner;
D LOAD postgres_scanner;
D CALL postgres_attach('dbname=postgres');

in DuckDB.

Confirmed it reproduces on master of DuckDB and master branch of postgres_scanner (built from source):

duckdb-binaries-osx|⇒ ./duckdb -unsigned
v0.5.2-dev1318 b4f5e6ea46
Enter ".help" for usage hints.
D LOAD 'postgres_scanner/build/release/extension/postgres_scanner/postgres_scanner.duckdb_extension';
D CALL postgres_attach('dbname=postgres');
Error: IO Error: Unsupported Postgres type tsvector
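
Until tsvector is supported, one possible workaround (a sketch; the view and column names below are assumptions about the schema) is to expose the table through a Postgres view that casts the tsvector column to text, and scan that view:

-- in postgres
CREATE VIEW twitter_users_scannable AS
SELECT id, handle, search_vector::text AS search_vector
FROM twitter_users;
-- in duckdb
D SELECT * FROM postgres_scan('dbname=postgres', 'public', 'twitter_users_scannable');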

OS:

macOS 12.4

PostgreSQL Version:

psql (14.5 (Homebrew))

DuckDB Version:

v0.5.1 7c111322d

DuckDB Client:

CLI

Full Name:

Victor Mota

Affiliation:

PinchedIO

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

Connecting with `CALL postgres_attach` loads tables, but displays an error

What happens?

I have tested DuckDB with the CLI and in Python.
The preferred method of connecting to the Postgres database throws an exception, but still shows the tables. Not sure if it is a bug or something wrong on my end.

Using a libpq connection string

I can access the tables, but an error is displayed when attempting to attach to the database session.

To Reproduce

Attaching from the CLI and from Python loads the tables but displays an error

INSTALL postgres; LOAD postgres;
CALL postgres_attach('dbname=idau user=idau host=127.0.0.1');

Error: Invalid Error: IO Error: Unsupported Postgres type oid

Or

 CALL postgres_attach('postgresql://idau@localhost:5432/idau');

Error: Invalid Error: IO Error: Unsupported Postgres type oid

The same issue occurs from Python

import duckdb
db = duckdb.connect()
db.execute('''
    INSTALL postgres;
    LOAD postgres;
    CALL postgres_attach('host=127.0.0.1 dbname=idau user=idau port=5432');
''')

Fails with: Error: Invalid Error: IO Error: Unsupported Postgres type oid

Shows no errors and works

import duckdb
conn = duckdb.connect()
conn.execute("SELECT * FROM postgres_scan('postgresql://idau@localhost:5432/idau', 'public', '<some_table>')").fetchall()

OS:

MacOS 12.6

PostgreSQL Version:

14.6

DuckDB Version:

v0.6.1 919cad22e8 (CLI) and v0.6.2.dev1675 (Python)

DuckDB Client:

CLI and Python

Full Name:

Markus Sagen

Affiliation:

ZagenDuo

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

scanner is powerful, but lacks some features

postgres_scanner can read Greenplum tables directly, but it lacks some features:

  1. It only supports one schema, but SQL may join tables from different schemas, e.g. select * from sch1.table1 join sch2.table2 on ... (see the sketch after the sample data below).
  2. It does not support uuid, array, or json/jsonb columns.
  3. It does not support numeric columns without a declared precision/scale.
    Sample data:
create table test_scanner(id uuid,code text,sizes int[],skc jsonb,number numeric)
;
insert into test_scanner values(gen_random_uuid(),'test', array[1,2,3],'{"a":1,"b":2}'::jsonb,1234.5000);
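
For point 1, a possible sketch (assuming your build's postgres_attach accepts the source_schema and sink_schema named parameters; the join key below is hypothetical) is to attach each Postgres schema into its own DuckDB schema and then join across them:

CREATE SCHEMA sch1;
CREATE SCHEMA sch2;
CALL postgres_attach('dbname=db', source_schema='sch1', sink_schema='sch1');
CALL postgres_attach('dbname=db', source_schema='sch2', sink_schema='sch2');
SELECT * FROM sch1.table1 JOIN sch2.table2 ON table1.id = table2.id;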

We use Greenplum for HTAP, and night is the busy time, but Greenplum cannot be expanded temporarily.
If postgres_scanner were fully functional, the separation of compute from storage could be realized,
and DuckDB would become a very powerful tool, much like Spark is for Hive; DuckDB+Greenplum would surpass Spark+Hive in many situations.

pgconfigure: line 9: ./configure: No such file or directory

The latest code (main branch) fails to compile with make:

[ 68%] Building CXX object extension/jemalloc/jemalloc/CMakeFiles/jemalloc.dir/src/hpdata.cpp.o

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error is not recoverable: exiting now
[ 69%] Building CXX object extension/jemalloc/jemalloc/CMakeFiles/jemalloc.dir/src/inspect.cpp.o
[ 69%] Building CXX object extension/jemalloc/jemalloc/CMakeFiles/jemalloc.dir/src/jemalloc.cpp.o
pgconfigure: line 9: ./configure: No such file or directory
gmake[3]: *** [../../postgres/src/common/base64.c] Error 127
gmake[3]: Leaving directory `/home/linezone_t2/wcz/postgres_scanner/build/release'
gmake[2]: *** [extension/postgres_scanner/CMakeFiles/postgres_scanner_loadable_extension.dir/all] Error 2
gmake[2]: *** Waiting for unfinished jobs....

Can postgres_scan scan a partitioned table? And more, can it scan a SQL query?

I have a partitioned table tenant_qisefang_biz.rst_ra_skc_org_detail in Greenplum,
and it has a sub-table tenant_qisefang_biz.rst_ra_skc_org_detail_1_prt_p20221028.
I use postgres_scan to create views in DuckDB:

sql=f"""create view tenant_qisefang_biz.v_rst_ra_skc_org_detail as
SELECT * FROM postgres_scan('{gp_str}', 'tenant_qisefang_biz', 'rst_ra_skc_org_detail');"""
sql=f"""create view tenant_qisefang_biz.v_rst_ra_skc_org_detail_1_prt_p20221028 as
SELECT * FROM postgres_scan('{gp_str}', 'tenant_qisefang_biz', 'rst_ra_skc_org_detail_1_prt_p20221028');"""
con.execute(sql)

If I query the sub-table directly, it takes 5.8s to get the count(*),
but when I query the main table with a WHERE condition, it produced no output after 6 minutes and I killed the query.
The table has 60 sub-tables like rst_ra_skc_org_detail_1_prt_p20221028.
And why is count(*) so slow? Even querying the sub-table directly shouldn't take more than 1s (in GP, it takes less than 1s).

con.execute("select count(*) from tenant_qisefang_biz.v_rst_ra_skc_org_detail_1_prt_p20221028").fetch_df()#956098
con.execute("select count(*) from tenant_qisefang_biz.v_rst_ra_skc_org_detail where day_date = '2022-10-28'").fetch_df()#956098

I want to ask: could a parameter (e.g. part_condition) be added to support partitioned tables? Maybe like this:

SELECT * FROM POSTGRES_SCAN('', 'public', 'mytable','2022-10-28');

And more: rather than a table, could it scan a SQL query, just like Spark or pandas?

SELECT * FROM POSTGRES_SCAN('', 'public', 'select * from mytable where ...');

If SQL queries were supported, it would also cover Postgres table functions.
For now, I have to use pandas to read the result into DuckDB.

sql="""select tenant_baosheng_proc.p_get_level_table_parameter_sql({p1},{p2}) ;"""
para=pa.Table.from_pandas(pd.read_sql_query(sql,conn))
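
In the meantime, one thing to try (a sketch; whether the predicate actually prunes partitions on the Greenplum side is an assumption about the pushdown behaviour, and '<gp_str>' is the connection-string placeholder from above) is to define the view over postgres_scan_pushdown so the day_date filter is forwarded to the server instead of being applied after a full local scan:

create view tenant_qisefang_biz.v_rst_ra_skc_org_detail_pd as
SELECT * FROM postgres_scan_pushdown('<gp_str>', 'tenant_qisefang_biz', 'rst_ra_skc_org_detail');
select count(*) from tenant_qisefang_biz.v_rst_ra_skc_org_detail_pd where day_date = '2022-10-28';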

Incorrect and inconsistent results with same data and query

I am not getting the same number of rows returned by the Postgres scanner as from Postgres. Also, when I run the same query using the Postgres scanner on the same Postgres data multiple times, I get different results (a different row count).

My hunch is that it has to do with different thread counts (although it occurs with threads = 1), so that may not be the true root cause. I see similar results with or without the filter_pushdown enabled.

I did not run into this issue when I only had one item in the WHERE clause of the subquery. As soon as I changed to an IN list, the error was triggered. Maybe because multiple different threads were pulling some of the data? I did not use an IN list in the original query at work that triggered this investigation.

DuckDB 0.4.0, Windows 10, Python 3.10.4 (similar results with Python 3.7 as well)

import duckdb
import pandas
from datetime import datetime

conn = duckdb.connect()

print(conn.execute("""install postgres_scanner""").df())

print(conn.execute("""load postgres_scanner""").df())

# Code executed on Postgres
# drop table if exists public.postgres_table;
# create table public.postgres_table as
# select
# 	x as unique_id
# 	,current_timestamp - x * interval '1 minute' as timestamp_column
# 	,'varchar_' || cast(mod(x,100) as varchar) as varchar_1 
# 	,'varchar_' || cast(mod(x,100000) as varchar) as varchar_2 
# 	,'varchar_' || cast(mod(x,1000000) as varchar) as varchar_3 
# 	,'value_'   || cast(mod(x,10000000) as varchar) as value_column
# from generate_series(1,10000000) t(x);

# Might not be needed:
# vacuum full analyze;

print(conn.execute("""CALL POSTGRES_ATTACH('postgresql://postgres@localhost:5432',source_schema='public', overwrite=true, filter_pushdown=true)""").df())

for i in range (0,33,2):
    if i == 0:
        i = 1
    conn.execute("""pragma threads="""+str(i))
    print(conn.execute("""select current_setting('threads')""").df())
    start_time = datetime.now()

    output_df = conn.execute("""
        select
            *
        from (
            select
                *
                ,row_number() over (partition by varchar_2, varchar_3 order by timestamp_column desc, unique_id desc) as sort_id
            from postgres_table
            where
                --If I use varchar_1 = 'varchar_1' then I do not see an issue
                varchar_1 in ('varchar_1','varchar_2','varchar_3','varchar_4','varchar_5','varchar_6','varchar_7','varchar_8','varchar_9','varchar_10')
        ) subquery
        where
            sort_id = 1
            and value_column != ''
            ;
    """).df()

    print('output_df length:',len(output_df))
    print(output_df.head(1))
    print('Runtime:',(datetime.now() - start_time).total_seconds())

These are the results that I get. The row count should be 100,000.

Empty DataFrame
Columns: [Success]
Index: []
Empty DataFrame
Columns: [Success]
Index: []
Empty DataFrame
Columns: [Success]
Index: []
   current_setting('threads')
0                           1
output_df length: 100153
   unique_id           timestamp_column  varchar_1      varchar_2      varchar_3 value_column  sort_id
0      11008 2022-08-28 08:32:56.528292  varchar_8  varchar_11008  varchar_11008  value_11008        1
Runtime: 57.906681
   current_setting('threads')
0                           2
output_df length: 100171
   unique_id           timestamp_column  varchar_1      varchar_2      varchar_3 value_column  sort_id
0      11008 2022-08-28 08:32:56.528292  varchar_8  varchar_11008  varchar_11008  value_11008        1
Runtime: 43.162708
   current_setting('threads')
0                           4
output_df length: 100072
   unique_id           timestamp_column  varchar_1      varchar_2      varchar_3 value_column  sort_id
0      11008 2022-08-28 08:32:56.528292  varchar_8  varchar_11008  varchar_11008  value_11008        1
Runtime: 33.836762
   current_setting('threads')
0                           6
output_df length: 100171
   unique_id           timestamp_column  varchar_1      varchar_2      varchar_3 value_column  sort_id
0      11008 2022-08-28 08:32:56.528292  varchar_8  varchar_11008  varchar_11008  value_11008        1
Runtime: 30.629222
   current_setting('threads')
0                           8
output_df length: 100072
   unique_id           timestamp_column  varchar_1      varchar_2      varchar_3 value_column  sort_id
0      11008 2022-08-28 08:32:56.528292  varchar_8  varchar_11008  varchar_11008  value_11008        1
Runtime: 26.904889
   current_setting('threads')
0                          10
output_df length: 100126
   unique_id           timestamp_column  varchar_1      varchar_2      varchar_3 value_column  sort_id
0      11008 2022-08-28 08:32:56.528292  varchar_8  varchar_11008  varchar_11008  value_11008        1
Runtime: 25.660612
   current_setting('threads')
0                          12
output_df length: 100072
   unique_id           timestamp_column  varchar_1      varchar_2      varchar_3 value_column  sort_id
0      11008 2022-08-28 08:32:56.528292  varchar_8  varchar_11008  varchar_11008  value_11008        1
Runtime: 23.866439
   current_setting('threads')
0                          14
output_df length: 100171
   unique_id           timestamp_column  varchar_1      varchar_2      varchar_3 value_column  sort_id
0      11008 2022-08-28 08:32:56.528292  varchar_8  varchar_11008  varchar_11008  value_11008        1
Runtime: 23.88096
   current_setting('threads')
0                          16
output_df length: 100090
   unique_id           timestamp_column  varchar_1      varchar_2      varchar_3 value_column  sort_id
0      11008 2022-08-28 08:32:56.528292  varchar_8  varchar_11008  varchar_11008  value_11008        1
Runtime: 23.631315
   current_setting('threads')
0                          18
output_df length: 100018
   unique_id           timestamp_column  varchar_1      varchar_2      varchar_3 value_column  sort_id
0      11008 2022-08-28 08:32:56.528292  varchar_8  varchar_11008  varchar_11008  value_11008        1
Runtime: 24.343506
   current_setting('threads')
0                          20
output_df length: 100108
   unique_id           timestamp_column  varchar_1      varchar_2      varchar_3 value_column  sort_id
0      11008 2022-08-28 08:32:56.528292  varchar_8  varchar_11008  varchar_11008  value_11008        1
Runtime: 22.656423
   current_setting('threads')
0                          22
output_df length: 100072
   unique_id           timestamp_column  varchar_1      varchar_2      varchar_3 value_column  sort_id
0      11008 2022-08-28 08:32:56.528292  varchar_8  varchar_11008  varchar_11008  value_11008        1
Runtime: 22.098565
   current_setting('threads')
0                          24
output_df length: 100072
   unique_id           timestamp_column  varchar_1      varchar_2      varchar_3 value_column  sort_id
0      11008 2022-08-28 08:32:56.528292  varchar_8  varchar_11008  varchar_11008  value_11008        1
Runtime: 21.896262
   current_setting('threads')
0                          26
output_df length: 100189
   unique_id           timestamp_column  varchar_1      varchar_2      varchar_3 value_column  sort_id
0      11008 2022-08-28 08:32:56.528292  varchar_8  varchar_11008  varchar_11008  value_11008        1
Runtime: 22.482625
   current_setting('threads')
0                          28
output_df length: 100189
   unique_id           timestamp_column  varchar_1      varchar_2      varchar_3 value_column  sort_id
0      11008 2022-08-28 08:32:56.528292  varchar_8  varchar_11008  varchar_11008  value_11008        1
Runtime: 22.267104
   current_setting('threads')
0                          30
output_df length: 100090
   unique_id           timestamp_column  varchar_1      varchar_2      varchar_3 value_column  sort_id
0      11008 2022-08-28 08:32:56.528292  varchar_8  varchar_11008  varchar_11008  value_11008        1
Runtime: 21.770293
   current_setting('threads')
0                          32
output_df length: 100189
   unique_id           timestamp_column  varchar_1      varchar_2      varchar_3 value_column  sort_id
0      11008 2022-08-28 08:32:56.528292  varchar_8  varchar_11008  varchar_11008  value_11008        1
Runtime: 21.588638

How to load only some of the tables from PostgreSQL?

Hi, I'm using the TPC-H benchmark on PostgreSQL.
I want to load only some of the tables into DuckDB (e.g., orders); the others are loaded via the Postgres scanner.
So I first load data into DuckDB, and then use postgres_attach to load the other tables from PostgreSQL.

cursor.execute("CREATE TABLE orders as (SELECT * FROM postgres_scan_pushdown('host={} user={} "
                   "dbname={} port={}','public','orders'));")
cursor.execute("CALL postgres_attach('host={} user={} "
                   "dbname={} port={}', filter_pushdown=false);")

Then I got the exception "Failed to create view 'orders': View with name "orders" already exists!"
I wonder what I can do to load all tables except "orders" from the scanner (i.e., skip existing tables).
Thanks~
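
One possible pattern (a sketch; the sink_schema named parameter of postgres_attach is an assumption to verify against your version, and the connection string is a placeholder) is to attach everything into a separate DuckDB schema and then materialize only the tables you want as local tables:

CREATE SCHEMA IF NOT EXISTS pg;
CALL postgres_attach('host=... user=... dbname=... port=...', sink_schema='pg', filter_pushdown=false);
CREATE TABLE orders AS SELECT * FROM pg.orders;  -- materialized locally in DuckDB
-- all other tables remain views in the pg schema and are read from PostgreSQL on demand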
