r-dbi / dbi3
DBI revisited
Home Page: https://r-dbi.github.io/dbi3
libpq has an option to define the types of the parameters in PQprepare(). Can we make use of it? Does it allow getting rid of typecasting in DBItest?
Is it possible to show a progress bar for long-running dbWriteTable() commands?
The DBI specification had a dbCallProc() generic, but it has been deprecated. ROracle added its own custom method to call PL/SQL stored procedures in version 1.3-1. One can provide input, output, or input/output arguments to a stored procedure, but there is no way to specify this in DBI. There is also no dbBind() functionality to specify that a value should be bound as a BLOB or some other database data type.
We have added Oracle-specific attributes like the following:
name <- attr(obj, "ora.parameter_name") # bind by name instead of position
mode <- attr(obj, "ora.parameter_mode") # specifies IN, OUT, or INOUT arguments
typ <- attr(col, "ora.type") # data type (blob/bfile/date/raw/char/cursor/...)
enc <- attr(col, "ora.encoding") # encoding of the data, e.g. UTF-8
mxl <- attr(col, "ora.maxlength") # maximum length of IN/OUT arguments or bind variables
I am attaching a document describing PL/SQL procedures and their usage in Oracle:
Parameter_to_SP.pdf
It would be good to add dbBind and/or dbCallProc.
I had discussed this with Hadley Wickham and he suggested that I implement oracleProc for now.
Link: https://rstudio.github.io/rstudio-extensions/connections-contract.html
Should be possible by changing the implementation of the generic to

setGeneric("dbConnect",
  def = function(drv, ...) {
    con <- standardGeneric("dbConnect")
    register_con_with_rstudio(con)
    con
  },
  valueClass = "DBIConnection"
)
instead of the current implementation:
setGeneric("dbConnect",
  def = function(drv, ...) standardGeneric("dbConnect"),
  valueClass = "DBIConnection"
)
Similar for dbDisconnect(). I'm not sure about the connectionUpdated() entry point, because DBI currently doesn't provide such a notification mechanism.
Implementation sketch by @edgararuiz: https://gist.github.com/edgararuiz/0533b3daf70a41b70acd75c1e6a2b7a3.
Happy to review a PR. CC @nwstephens @jimhester.
Example:
$ psql -c 'PREPARE test AS SELECT ($1 IS NULL)'
ERROR: could not determine data type of parameter $1
$ psql -c 'PREPARE test AS SELECT ($1::int4 IS NULL)'
PREPARE
For full support in DBI, this means that we must postpone sending the query to the database until we have received values for binding. On the other hand, the specs currently require that dbSendQuery() must reject invalid queries.
If we parse query parameters ourselves (r-dbi/DBI#52), we have more control and can still send non-parametrized queries right away (and fail in case of error). Proposed action: mention in the specs that dbBind() can expect the same argument types between calls for each result set. Maybe even require rejection of parameters if the type differs in a second call?
Reference: https://github.com/r-dbi/RPostgres/issues/139.
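A minimal sketch of the proposed contract (con and the table tbl are assumed to exist; the rejection in the last call is the proposed behaviour, not current DBI semantics):

```r
res <- dbSendQuery(con, "SELECT * FROM tbl WHERE x = $1")
dbBind(res, list(1L))   # first call establishes the parameter type (integer)
dbBind(res, list(2L))   # OK: same type as before
dbBind(res, list("a"))  # under this proposal, backends could reject the type change
dbClearResult(res)
```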
regardless of quoting status.
If dbIsValid() is called in a fresh R session, the required packages are not loaded correctly.
path_db = tempfile()
con = DBI::dbConnect(RSQLite::SQLite(), path_db)
# break the connection/external pointers by saving
# to file system and restoring again
path = tempfile()
saveRDS(con, file = path)
con = readRDS(path)
# detection works in current session
DBI::dbIsValid(con)
# detection fails in remote session, RSQLite is not loaded
callr::r(DBI::dbIsValid, list(con = con))
# RSQLite is loaded on print
callr::r(function(con) { print(con); DBI::dbIsValid(con) }, list(con = con))
Error message:
unable to find an inherited method for function ‘dbIsValid’ for signature ‘"missing"’
should fail: r-dbi/RPostgres#359.
To write a table to a specific schema, the identifier must be quoted. Consider quote_table_name() and as_quoted_table_name(); the latter would fail for weird names.
i.e. dbname, host, username, password. We can't test for this, but we can encourage backend authors to use standard names.
Related to tidyverse/dplyr#1955
Sometimes it's useful to reset all user-defined variables in a DBIConnection (set using SET) back to their defaults, instead of having to disconnect and fetch a brand new connection. This is particularly important in the context of a connection pool, for which we want to make sure that, when a connection is returned to the pool, its variables are reset (so that when it's checked out again, it is as good as new).
It seems that this would be very easy to do if DBI had a generic called dbResetConnection() or the like. Each driver would then have to implement the corresponding method for its connections in order to be considered DBI-compliant. For example, in RMySQL this could be implemented using MySQL's mysql_reset_connection().
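A minimal sketch of what such a generic could look like (hypothetical API, not part of DBI; reset_connection_internal() is an assumed C-level binding):

```r
# Hypothetical generic; each backend would provide a method.
setGeneric("dbResetConnection", function(conn, ...) {
  standardGeneric("dbResetConnection")
})

# Sketch of a MySQL method, assuming the driver exposed
# mysql_reset_connection() via an internal helper:
setMethod("dbResetConnection", "MySQLConnection", function(conn, ...) {
  reset_connection_internal(conn)  # hypothetical C-level binding
  invisible(TRUE)
})
```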
This will require a \Sexpr{} tag that calls an internal function that generates Rd.
Sketch of needed code:
library(DBI)
library(RSQLite)
library(RMySQL)
library(RPostgreSQL)

methods <- methods::findMethods("dbConnect")

# Extracted from roxygen2::object_topic.s4method
s4_topic <- function(x) {
  sig <- paste0(x@defined, collapse = ",")
  paste0(x@generic, ",", sig, "-method")
}

sapply(methods@.Data, s4_topic)
Should we have a standard way of getting the server database version?
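Today this is only available indirectly. The DBI spec says dbGetInfo() on a connection should include a db.version entry, but its format is backend-specific:

```r
# Backend-specific value, not standardised across drivers:
DBI::dbGetInfo(con)$db.version
```

A dedicated generic (say, a hypothetical dbGetServerVersion()) could standardise the return type.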
With tests.
Follow-up to r-dbi/DBI#74.
select count(*) from a cross join a cross join a ...
Important for "big data" backends.
Statements such as CREATE TABLE and DROP TABLE might be unavailable, e.g. in BigQuery. We could just expect an existing empty table with a very simple schema for the tests.
It's probably broadly useful, but it's so easy to silently misname arguments in dbConnect() that this seems like a reasonable place to try it out.
Is it possible for RPostgres to support the use of named parameters rather than only positional parameters in parameterized queries? This is currently possible in RSQLite and might be beneficial in RPostgres as well.
For instance, perhaps the named_parameterized_statement is possible:
con = dbConnect(RPostgres::Postgres(),
                dbname = "...",
                host = "...",
                port = ...,
                user = "...",
                password = "...")

create_products_sql = "
  CREATE TABLE products (
    product_no SERIAL PRIMARY KEY,
    name text,
    price numeric
  );
"
dbExecute(con, create_products_sql)

products = data.frame(
  product_no = 1:4,
  name = c("Laptop", "Milk", "Bread", "Couch"),
  price = c(699.99, 2.49, 3.49, 299.99)
)
dbWriteTable(con, "products", products, append = TRUE, row.names = FALSE)

positional_parameterized_statement = "
  UPDATE products
  SET price = $1
  WHERE product_no = 1;
"
named_parameterized_statement = "
  UPDATE products
  SET price = :price
  WHERE product_no = 1;
"

dbExecute(con, positional_parameterized_statement, list(599.99))
dbExecute(con, named_parameterized_statement, list(price = 599.99))

dbDisconnect(con)
I don't have the best development skills, but I would be willing to work on this if a more senior developer could confirm that this would be useful and give me some rough guidance.
dbReadTable() could use COPY to export to CSV and then read the CSV into a data frame. This needs a temporary folder, but it would improve performance in many cases.
Also, several R readers could be supported, such as fread(), to return different objects such as a data.frame, a data.table, or whatever else.
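A rough sketch of the idea (export_table_to_csv() is a hypothetical backend-specific helper that performs the COPY step; the reader argument lets callers pass e.g. data.table::fread):

```r
read_table_via_csv <- function(con, name, reader = utils::read.csv, ...) {
  path <- tempfile(fileext = ".csv")
  on.exit(unlink(path), add = TRUE)
  export_table_to_csv(con, name, path)  # hypothetical, backend-specific COPY step
  reader(path, ...)
}
```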
It's the only safe way. We might need to make this an option, though, to support legacy user code.
NA
when DBI > 0.8 is on CRAN: 77b64be24.
Usage like this:
dbWithConnection(
  con <- dbConnect(...),
  {
    dbWithResult(
      res <- dbSendQuery(con, ...),
      {
        dbFetch(res)
      }
    )
  }
)
The goal is to support fetching and processing data piecemeal with a callback. Development started in r-dbi/DBI#111, but perhaps the interface and the requirements should be specified here first.
Because people using dbSendQuery() would also know to close their result sets to avoid unnecessary open locks on the database.
@jimhester @imanuelcostigan: Do odbc and RSQLServer support multiple open result sets per connection?
CC @hannesmuehleisen.
Typically, I can specify several MySQL options before connecting using the MariaDB C Connector, as described at https://mariadb.com/kb/en/library/mysql_optionsv/, like the following:
MYSQL *mysql = mysql_init(nullptr);
mysql_options(mysql, MYSQL_OPT_RECONNECT, &config.reconnect);
mysql_options(mysql, MYSQL_OPT_LOCAL_INFILE, 0);
mysql_options(mysql, MYSQL_INIT_COMMAND, "SET autocommit=0");
However, I can't find a way to specify MYSQL_OPT_RECONNECT in RMariaDB. I'm wondering if there is any possibility to support this?
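One conceivable interface would be an extra argument to dbConnect() that the backend forwards to mysql_options() before connecting (hypothetical, not currently supported by RMariaDB):

```r
# Hypothetical `reconnect` argument, forwarded to
# mysql_options(mysql, MYSQL_OPT_RECONNECT, ...) before mysql_real_connect():
con <- dbConnect(
  RMariaDB::MariaDB(),
  host = "...", user = "...", password = "...",
  reconnect = TRUE
)
```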
Simplifies generation of spec.
See here for some discussion we've had on R-SIG-DB on this issue in the past.
Here are hints of an indicative use case. Looking at ?tidyr::unnest, it seems that the ideal would be for arrays to be mapped to what you call "list-columns", with operations then performed on those (in the example, I am doing the unnest in PostgreSQL).
There is a package that provides the "upsert" functionality of SQL; is there a plan to add this function to the DBI package?
The dbx package: https://rdrr.io/cran/dbx/man/dbxUpsert.html
Quoting NA values should always return a query that yields a NULL of the correct type. This might be especially challenging in MariaDB, where support for coercion is weak. It could perhaps be simulated by

SELECT x FROM (SELECT '' AS x, 0 AS y UNION ALL SELECT NULL AS x, 1 AS y) qq WHERE y = 1

for such backends.
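The desired behaviour, sketched with dbQuoteLiteral() (the output shapes in the comments are illustrative, not what current backends necessarily emit):

```r
# Quoting a typed NA should produce a NULL that keeps its type, e.g. via a cast:
DBI::dbQuoteLiteral(con, NA_integer_)    # e.g. CAST(NULL AS INTEGER)
DBI::dbQuoteLiteral(con, NA_character_)  # e.g. CAST(NULL AS TEXT)
```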
From r-dbi/RPostgres#357.
Also, we should include at least date, time, and interval types.
Add a temporary argument, like with dbCreateTable(). Default NA lists everything; TRUE or FALSE lists only temporary or non-temporary tables. This applies to:
dbListTables()
dbExistsTable()
dbRemoveTable()
This seems to test the formals of the generic, not those of the method.
DBI should offer some type of plugin system that other packages can build upon by offering implementations for representing some of the more exotic data types in R. Throughout the DBI packages, there are many open issues surrounding this problem. A selection:
r-dbi/DBI#199 enum types
r-dbi/DBI#179 uuid types
r-dbi/RPostgres#132 geometry types, sf
r-dbi/RPostgres#114 geometry types, sf
r-dbi/RPostgres#86 geometry types
r-dbi/bigrquery#145 array/struct types
r-dbi/RMariaDB#42 json
Other types that fall into this area include arbitrary-precision floats (Rmpfr, gmp), XML types (xml2), monetary types, etc.
Now if geometry types are implemented for Postgres, this is great. But they are also available in MySQL/MariaDB. It therefore might be useful to consider these issues in a more general fashion. Furthermore, approaching this in a type-by-type fashion might not be sufficient. How could a user map a Postgres composite type if there is not some inherent extensibility?
Unfortunately, I have no idea how to tackle such an issue. Maybe a pragmatic approach, where things such as composite types are simply not considered, is the best we can do. I just was hoping to get a discussion started on this topic.
I'm trying to pass two arguments to a query. One is a list to be used in the form of SELECT FOO IN ($1) while the second parameter is used just for something like BAR = $2.
Why should these two parameters need to be of the same length? A workaround is to replicate the other arguments, but this seems wrong.
This seems to be a more severe issue than anticipated. Cancelling queries seems generally very poorly supported.
As a user, I want to be able to cancel a long-running query at any time. If the connection dies, I can recover, but I'd rather not restart R.
A "hammer-like" solution would be a generic DBI backend that instantiates the connection in a separate process. Similarly to {DBIlog}, this can be a wrapper that wraps arbitrary DBI backends. The DBIDriver class of this connection would hold the handle to the remote process.
CC @hannesmuehleisen.
Default: NA.
Some queries (at least on MySQL and SQL Server) do not work with the "prepared statement" API, they require the use of the "simple" API. We might want to default to the "simple" API and use the "prepared" API only if parameters are passed, and perhaps only when requested.
We already have test_all(); we need a convention to retrieve the default context from a database backend. Perhaps a dbiTestContext() method? As a stop gap, looking up an unexported function might be enough.
to limit the number of rows read.
By @hadley:
dbFetch() and dbGetQuery() need to have an optional coercion object. It should be a named list, and provide common converters (e.g. times to number of seconds, dates to days, factors to strings, binary serialisation (default for list; raw should stay as is), bool<->int for SQLite).
Defaults only need to work with atomic vectors.
Original issue: r-dbi/RSQLite#74
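A sketch of what the interface could look like (the coerce argument is hypothetical, not part of DBI):

```r
res <- dbSendQuery(con, "SELECT created_at, flag FROM events")
df <- dbFetch(res, coerce = list(
  created_at = function(x) as.POSIXct(x, origin = "1970-01-01"),
  flag       = as.logical  # bool stored as int, e.g. in SQLite
))
dbClearResult(res)
```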
% 1024 instead of % 1000
Rcpp::checkUserInterrupt() in a try {} block
extern void Rsleep(...) for accessing a portable sleep() function, otherwise call Sys.sleep() from C

For testing purposes it would be great to have something like a "virtual" connection. That is, a connection that can be created without actually connecting to a database. This connection could then be used in functions like DBI::dbQuoteLiteral() that don't need a live connection.
The dbplyr package has something like this with the simulate_*() functions. This can be useful for testing when one cannot connect locally to a specific database.
I also think/hope one could then get rid of some code duplication between dbplyr and the DBI packages; for example, quoting literals is also defined in dbplyr.
Used in dbBind(). Example (MySQL): SELECT cast(? as character)
Postgres: positional, $1, $2, ...
MySQL: positional, ?
SQLite: named, $a or :a; and positional, ?
MonetDB: positional, ?
Similar to dbQuoteIdentifier() and dbQuoteString():

setMethod("dbQuoteParameter", c("MySQLConnection", "integer"),
  function(conn, n, ...) {
    rep("?", n)
  }
)

setMethod("dbQuoteParameter", c("PqConnection", "integer"),
  function(conn, n, ...) {
    paste0("$", seq_len(n))
  }
)

setMethod("dbQuoteNamedParameter", c("SQLiteConnection", "character"),
  function(conn, param_names, ...) {
    paste0(":", param_names)
  }
)
I would like to have functionality in R Markdown files that is similar to BigQuery Magics via yihui/knitr#1867.
Here are some interfaces that I think are missing in the DBI package for this to work:
The DBI::sqlParseVariables implementation will support different options:

database | notation
---|---
postgres | $param_name
bigquery | @param_name
other | ?param_name

Each package that supports DBI interfaces should provide information about the notation used by the corresponding database.
An interface in the DBI package that allows access to the parameter notation used by the engine.
Each package should implement a flag that indicates whether the database supports parameterised queries.
An interface in the DBI package that defines whether interpolation should be done or not, e.g. whether the DB engine supports parameterised queries.
so that dblog_cnr() can use it. Otherwise test_some() doesn't work, e.g. for duckdb.
The title is pretty self-explanatory.
Primary keys and indices are essential tools for performant SQL queries, but they must currently be added manually by the user after table creation with an ALTER TABLE call (made more complex for PKs by the fact that the respective columns must be NOT NULL, which must also be defined post facto).
At a glance, at least PKs could be implemented relatively simply by adding an argument to dbCreateTable and modifying sqlCreateTable. Indices are made more complicated by the need to define them as clustered or non-clustered, but PKs seem quite simple (says the guy who hasn't actually tried doing it).
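A sketch of the suggested extension (the primary.key argument is hypothetical, not part of DBI):

```r
# Hypothetical: dbCreateTable() gains a primary.key argument that
# sqlCreateTable() would turn into "id INTEGER NOT NULL PRIMARY KEY".
dbCreateTable(
  con, "products",
  fields = c(id = "INTEGER", name = "TEXT"),
  primary.key = "id"
)
```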
Ideally, every call to dbBegin() will be followed by exactly one call to either rollback or commit.
However, sometimes in complex situations where the transaction spans several functions, it might not be 100% certain that a rollback is called in case of an error. In that case, the next time I call dbBegin(), I'd like to clear any active transactions, because if a previous transaction failed part-way through, I'm stuck and cannot start a new transaction.
To do this, it would be necessary to have a function such as DBI::isActiveTransaction() or similar.
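Usage could look like this (isActiveTransaction() is the hypothetical generic proposed above):

```r
# Clear any stale transaction left over from an earlier failure
if (DBI::isActiveTransaction(con)) {  # hypothetical generic
  dbRollback(con)
}
dbBegin(con)
```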