Giter VIP home page Giter VIP logo

dbi3's People

Contributors

krlmlr avatar maelle avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

Forkers

maelle howardbaek

dbi3's Issues

Send parameter types

libpq has an option to define the types of the parameters in PQprepare(). Can we make use of it? Does it allow getting rid of typecasting in DBItest?

Interface to call stored procedure with various parameters and their attributes need to be supported in DBI

DBI specification had dbCallProc procedure but has been deprecated. ROracle added its own custom method to call PLSQL stored procedures in version 1.3-1. One can provide input, output or input and output arguments to stored procedure. There is not way to specify it in DBI. There is no dbBind functionality to specify that the value should be bound as a BLOB or some other data type in the DB.
We have added oracle-specific attributes like the following:
name <- attr(obj, "ora.parameter_name") # Used to bind using name instead of position
mode <- attr(obj, "ora.parameter_mode") # this where one specifies IN/OUT or INOUT arguments
typ <- attr(col, "ora.type") # data types(blob/bfile/date/raw/char/cursor...)
enc <- attr(col, "ora.encoding") # UTF8 or something encoded data in
mxl <- attr(col, "ora.maxlength") # Maximum length of IN/OUT arguemnst or bind variable

I am attaching a document describing PLSQL procedures and its usage in Oracle:
Parameter_to_SP.pdf

It would be good to add dbBind and/or dbCallProc.

I had discussed this with Hadley Wickham and he suggested that I implement oracleProc for now.

Implement RStudio Connection Contract

Link: https://rstudio.github.io/rstudio-extensions/connections-contract.html

Should be possible by changing the implementation of the generic to

setGeneric("dbConnect",
  def = function(drv, ...) {
    con <- standardGeneric("dbConnect")
    register_con_with_rstudio(con)
    con
  },
  valueClass = "DBIConnection"
)

instead of the current implementation:

setGeneric("dbConnect",
  def = function(drv, ...) standardGeneric("dbConnect"),
  valueClass = "DBIConnection"
)

Similar for dbDisconnect(). I'm not sure about the connectionUpdated() entry point, because DBI currently doesn't provide notification.

Implementation sketch by @edgararuiz: https://gist.github.com/edgararuiz/0533b3daf70a41b70acd75c1e6a2b7a3.

Happy to review a PR. CC @nwstephens @jimhester.

libpq expects knowledge about data types for parametrized queries

Example:

$ psql -c 'PREPARE test AS SELECT ($1 IS NULL)'
ERROR:  could not determine data type of parameter $1
$ psql -c 'PREPARE test AS SELECT ($1::int4 IS NULL)'
PREPARE

For full support in DBI, this means that we must postpone sending the query to the database until we have received values for binding. On the other hand, currently the specs require that dbSendQuery() must reject invalid queries.

If we parse query parameters ourselves (r-dbi/DBI#52), we have more control and still can send non-parametrized queries right away (and fail in case of error). Proposed action:

  • Rewrite parametrized queries
    • Support everything that SQLite or PostgreSQL supports
    • Might require a helper in DBI
  • If query doesn't have parameters, send right away
  • If query has parameters, wait until types are known before sending, and use these types

`dbIsValid()` fails in new sessions

If dbIsValid() is called in a fresh R session, the required packages are not loaded correctly.

path_db = tempfile()
con = DBI::dbConnect(RSQLite::SQLite(), path_db)

# break the connection/external pointers by saving
# to file system and restoring again
path = tempfile()
saveRDS(con, file = path)
con = readRDS(path)

# detection works in current session
DBI::dbIsValid(con)

# detection fails in remote session, RSQLite is not loaded
callr::r(DBI::dbIsValid, list(con = con))

# RSQLite is loaded on print
callr::r(function(con) { print(con); DBI::dbIsValid(con) }, list(con = con))

Error message:

 unable to find an inherited method for function ‘dbIsValid’ for signature
 ‘"missing"’>

Feature request: add a dbResetConnection() generic

Sometimes it's useful to reset all user-defined variables back to their defaults in a DBIConnection (set using SET), instead of having to disconnect and fetch a brand new connection. This is particularly important in the context of a connection pool, for which we want to make sure that, when a connection is returned to the pool, its variables will be reset (so that when it's checkout again, it is as good as new).

It seems that this would be very easy to do if DBI had a generic called dbResetConnection() or the like. Each driver would then have to implement the corresponding method for its connections, in order to be considered DBI-compliant. For example, in RMySQL, this could be implemented using MySQL's mysql_reset_connection().

cc @hadley, @jcheng5

Dynamically list methods in generic documentation

This will require a \Sexpr{} tag that calls an internal function that generates Rd

Sketch of needed code:

library(DBI)
library(RSQLite)
library(RMySQL)
library(RPostgreSQL)

methods <- methods::findMethods("dbConnect")

# Extracted from roxygen2::object_topic.s4method
s4_topic <- function(x) {
  sig <- paste0(x@defined, collapse = ",")
  paste0(x@generic, ",", sig, "-method")
}

sapply(methods@.Data, s4_topic)

Think about asynchronous operations

  • Long-running queries should not block R
    • Test with repeated cross-join of table with two rows: select count(*) from a cross join a cross join a ...
  • Blocking vs. non-blocking dbFetch() and/or dbGetQuery()
    • Support timeout
  • Waiting (default implementation: busy waiting with Sys.sleep())
  • Query cancellation (now in #63)
  • One active result set per connection
    • Support easy way to clone a connection (#57)

Important for "big data" backends.

Avoid DML in tests

Statements such as CREATE TABLE and DROP TABLE might be unavailable, e.g. in BigQuery. We could just expect an existing empty table with a very simple schema for the tests.

Used named parameters in parameterized statements

Is it possible for RPostgres to support the use of named parameters rather than only positional parameters in parameterized queries? This is currently possible in RSQLite and might be beneficial in RPostgres as well.

For instance, perhaps the named_parameterized_statement is possible:

con = dbConnect(RPostgres::Postgres(),
                dbname = "...",
                host = "...",
                port = ...,
                user = "...",
                password = "...")
create_products_sql = 
  "
  CREATE TABLE products (
    product_no SERIAL PRIMARY KEY,
    name text,
    price numeric
  );
  "
dbExecute(con, create_products_sql)
products = data.frame(product_no = 1:4,
                      name=c("Laptop", "Milk", "Bread", "Couch"),
                      price=c(699.99, 2.49, 3.49, 299.99)
                      )
dbWriteTable(con, "products", products, append=T, row.names=F)

positional_parameterized_statement = 
"
UPDATE products
SET price=$1
WHERE product_no=1;
"
named_parameterized_statement = 
"
UPDATE products
SET price=:price
WHERE product_no=1;
"

dbExecute(con, positional_parameterized_statement, list(599.99))
dbExecute(con, named_parameterized_statement, list(price=599.99))
dbDisconnect(con)

I don't have the best development skills, but I would be willing to work on this if a more senior developer could confirm that this would be useful and give me some rough guidance.

let dbreadtable use copy

dbreadtable could use copy to csv and then read the csv to dataframe.

this needs a temporary folder, but this would improve performance in many cases.

also several R readers could be passed such fread to get different object such dataframe of datatable or whatever

Need MYSQL_OPT_RECONNECT

Typically, I can specify several mysql options before connecting using MariaDB C Connector introduced at https://mariadb.com/kb/en/library/mysql_optionsv/ like the following:

MYSQL *mysql = mysql_init(nullptr);
mysql_options(mysql, MYSQL_OPT_RECONNECT, &config.reconnect);
mysql_options(mysql, MYSQL_OPT_LOCAL_INFILE, 0);
mysql_options(mysql, MYSQL_INIT_COMMAND, "SET autocommit=0");

However, I don't find a way to specify MYSQL_OPT_RECONNECT in RMariaDB. I'm wondering if there any possibility to support this?

Support arrays natively

See here for some discussion we've had on R-SIG-DB on this issue in the past.

Here are hints of an indicative use case. Looking at ? tidyr::unnest, it seems that what would be ideal would be if arrays could be mapped to what you call "list-columns" and then operations performed on those (in the example, I am doing the unnest in PostgreSQL).

Stronger guarantees for quoting literals

Quoting NA values should always return a query that yields a NULL of the correct type. This might be especially challenging in MariaDB where the support for coercion is weak. Could perhaps be simulated by

SELECT x FROM (SELECT '' AS x, 0 AS y UNION ALL SELECT NULL AS x, 1 AS y) qq WHERE y = 1

for such backends.

From r-dbi/RPostgres#357.

Also, we should include at least date, time, and interval types.

Specialised data types in R

DBI should offer some type of plugin system that other packages can build upon by offering implementations for representing some of the more exotic data types in R. Throughout the DBI packages, there are many open issues surrounding this problem. A selection:

r-dbi/DBI#199 enum types
r-dbi/DBI#179 uuid types
r-dbi/RPostgres#132 geometry types, sf
r-dbi/RPostgres#114 geometry types, sf
r-dbi/RPostgres#86 geometry types
r-dbi/bigrquery#145 array/struct types
r-dbi/RMariaDB#42 json

other types that fall into this area include arbitrary precision floats (Rmpfr, gmp), xml types (xml2), monetary types, etc.

Now if geometry types are implemented for Postgres, this is great. But they are also available in MySQL/MariaDB. It therefore might be useful to consider these issues in a more general fashion. Furthermore, approaching this in a type by type fashion might not be sufficient. How could a user map a Postgres composite type, if there is not some inherent extensibility?

Unfortunately, I have no idea how to tackle such an issue. Maybe a pragmatic approach, where things such as composite types are simply not considered, is the best we can do. I just was hoping to get a discussion started on this topic.

Query cancellation

This seems to be a more severe issue than anticipated. Cancelling queries seems generally very poorly supported.

As a user, I want to be able to cancel a long-running query at any time. If the connection dies, I can recover, but I'd rather not restart R.

A "hammer-like" solution would be a generic DBI backend that instantiates the connection in a separate process. Similarly to {DBIlog}, this can be a wrapper that wraps arbitrary DBI backends. The DBIDriver class of this connection would hold the handle to the remote process.

CC @hannesmuehleisen.

Rethink `immediate` argument

Some queries (at least on MySQL and SQL Server) do not work with the "prepared statement" API, they require the use of the "simple" API. We might want to default to the "simple" API and use the "prepared" API only if parameters are passed, and perhaps only when requested.

How to run tests on a user's database?

  • Expose default DBItest context from package
  • Support updating a context (e.g., database credentials, tweaks)
  • Summary of results and failure details
  • Documentation

We already have test_all(), we need a convention to retrieve the default context from a database backend. Perhaps a dbiTestContext() method? As a stop gap, looking up an unexported function might be enough.

Think about column conversion

By @hadley:

dbFetch() and dbGetQuery() need to have optional coercion object. Should be named list, and provide common converters (e.g. times to number of seconds, dates to days, factors to strings, binary serialisation (default for list, raw should stay as is) etc, bool<->int for sqllite).

Defaults only need to work with atomic vectors.

Original issue: r-dbi/RSQLite#74

Asynchronous processing

  • Connection (to allow aborts)
  • Query execution
  • Fetching (should test % 1024 instead of % 1000, )

Caveats

  • MariaDB and MySQL interface is different: https://mariadb.com/kb/en/using-the-non-blocking-library/
  • Only Connector/C (MariaDB) seems to explicitly support termination
  • Use Rcpp::checkUserInterrupt() in a try {} block
  • Try extern void Rsleep(...) for accessing a portable sleep() function, otherwise call Sys.sleep() from C

Feature Request: Virtual connection

For testing purposes it would be great to have something like a "virtual" connection. That is, a connection that can be created without actually connecting to a database. This connection could then be used in functions like DBI::dbQuoteLiteral() that don't need a live connection.
The dbplyr package has something like this with the simulate_*() functions. This can be useful for testing when one cannot connect locally to a specific database.
I also think/hope one could then get rid of some code duplication between dbplyr and the DBI packages, for example quoting literals is also defined in dbplyr.

Think about ways to specify placeholder in prepared/parametrized query

Used in dbBind(). Example (MySQL): SELECT cast(? as character)

Postgres: positional, $1, $2, ...
MySQL: positional, ?
SQLite: named: $a or :a, and positional: ?
Monet-DB: positional, ?

Similar to dbQuoteIdentifier() and dbQuoteString():

setMethod("dbQuoteParameter", c("MySQLConnection", "integer"),
  function(conn, n, ...) {
    rep("?", n)
  }
)

setMethod("dbQuoteParameter", c("PqConnection", "integer"),
  function(conn, n, ...) {
    paste0("$", seq_len(n))
  }
)

setMethod("dbQuoteNamedParameter", c("SQLiteConnection", "character"),
  function(conn, param_names, ...) {
    paste0(":", param_names)
  }
)

Provide interfaces and support for Parameterised Queries

I would like to have functionality in Rmarkdown files that is similar to BigQuery Magics via yihui/knitr#1867

here a some interfaces that I think are missing in DBI package for it to be work:

  1. Extract variables from sql
    Function that can extract parameter names from sql based on different notations, as opposed to current DBI::sqlParseVariables implementation will support different options:
database notation
postgres $param_name
bigquery @param_name
other ?param_name
  1. Each package that supports DBI interfaces should provide information about notation used by the corresponding database
    Interface in the DBI package that will allow to access notation for parameters used in the engine

  2. Each package should implement a flag that indicates if database supports parameterised queries
    Interface in the DBI package that will define if interpolation should be done or not, e.g. whether DB engine supports parameterised queries

Allow creation of tables with primary key and/or index

The title is pretty self-explanatory.

Primary keys and indices are essential tools for performant SQL queries, but must currently be performed manually by the user after table creation with an ALTER TABLE call (made more complex for PK's by the fact that the respective columns must be NOT NULL, which must also be defined post-facto).

At a glance, at least PK's could be relatively simply implemented by adding an argument to dbCreateTable and modifying sqlCreateTable. Indices are made more complicated by the need to define them as clustered or non-clustered, but PK's seem quite simple (says the guy who hasn't actually tried doing it).

Any way to know if a connection is already inside a transaction?

Ideally, every call to dbBegin() will be followed by exactly one call to either rollback or commit.

However, sometimes in complex situations where the transaction is happening throughout several functions, it might not be 100% certain that a rollback is called in case of an error. In that case, the next time I try to call dbBegin(), I'd like to clear any active transactions because if a previous transaction failed part-way through then I'm stuck and cannot start a new transaction.

To do this, it would be necessary to have a function such as DBI::isActiveTransaction() or similar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.