r-dbi / dbi3
DBI revisited
Home Page: https://r-dbi.github.io/dbi3
libpq has an option to define the types of the parameters in PQprepare(). Can we make use of it? Does it allow getting rid of typecasting in DBItest?
Is it possible to show a progress bar for long-running dbWriteTable() commands?
The DBI specification had a dbCallProc() generic, but it has been deprecated. ROracle added its own custom method to call PL/SQL stored procedures in version 1.3-1. One can provide input, output, or input/output arguments to a stored procedure, but there is no way to specify this in DBI. There is also no dbBind() functionality to specify that a value should be bound as a BLOB or some other database data type.
We have added Oracle-specific attributes like the following:
name <- attr(obj, "ora.parameter_name") # bind by name instead of position
mode <- attr(obj, "ora.parameter_mode") # specifies IN, OUT, or INOUT arguments
typ <- attr(col, "ora.type") # data type (blob/bfile/date/raw/char/cursor/...)
enc <- attr(col, "ora.encoding") # encoding of the data, e.g. UTF-8
mxl <- attr(col, "ora.maxlength") # maximum length of IN/OUT arguments or bind variables
I am attaching a document describing PL/SQL procedures and their usage in Oracle:
Parameter_to_SP.pdf
It would be good to add dbBind and/or dbCallProc.
I had discussed this with Hadley Wickham and he suggested that I implement oracleProc for now.
Link: https://rstudio.github.io/rstudio-extensions/connections-contract.html
Should be possible by changing the implementation of the generic to

setGeneric("dbConnect",
  def = function(drv, ...) {
    con <- standardGeneric("dbConnect")
    register_con_with_rstudio(con)
    con
  },
  valueClass = "DBIConnection"
)
instead of the current implementation:
setGeneric("dbConnect",
  def = function(drv, ...) standardGeneric("dbConnect"),
  valueClass = "DBIConnection"
)
Similar for dbDisconnect(). I'm not sure about the connectionUpdated() entry point, because DBI currently doesn't provide such a notification mechanism.
Implementation sketch by @edgararuiz: https://gist.github.com/edgararuiz/0533b3daf70a41b70acd75c1e6a2b7a3.
Happy to review a PR. CC @nwstephens @jimhester.
Example:
$ psql -c 'PREPARE test AS SELECT ($1 IS NULL)'
ERROR: could not determine data type of parameter $1
$ psql -c 'PREPARE test AS SELECT ($1::int4 IS NULL)'
PREPARE
For full support in DBI, this means that we must postpone sending the query to the database until we have received values for binding. On the other hand, the specs currently require that dbSendQuery() must reject invalid queries.
If we parse query parameters ourselves (r-dbi/DBI#52), we have more control and can still send non-parametrized queries right away (and fail in case of error). Proposed action: mention in the specs that dbBind() can expect the same argument types between calls for each result set. Maybe even require rejection of parameters if the type differs in a second call?
Reference: https://github.com/r-dbi/RPostgres/issues/139.
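A minimal sketch of the proposed contract (con and the table tbl are assumed to exist; the rejection in the last call is the proposed behaviour, not current DBI semantics):

```r
res <- dbSendQuery(con, "SELECT * FROM tbl WHERE x = $1")
dbBind(res, list(1L))   # first call establishes the parameter type (integer)
dbBind(res, list(2L))   # OK: same type as before
dbBind(res, list("a"))  # under this proposal, backends could reject the type change
dbClearResult(res)
```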
regardless of quoting status.
If dbIsValid() is called in a fresh R session, the required packages are not loaded correctly.
path_db = tempfile()
con = DBI::dbConnect(RSQLite::SQLite(), path_db)
# break the connection/external pointers by saving
# to file system and restoring again
path = tempfile()
saveRDS(con, file = path)
con = readRDS(path)
# detection works in current session
DBI::dbIsValid(con)
# detection fails in remote session, RSQLite is not loaded
callr::r(DBI::dbIsValid, list(con = con))
# RSQLite is loaded on print
callr::r(function(con) { print(con); DBI::dbIsValid(con) }, list(con = con))
Error message:
unable to find an inherited method for function ‘dbIsValid’ for signature ‘"missing"’
should fail: r-dbi/RPostgres#359.
To write a table to a specific schema, the identifier must be quoted. Consider quote_table_name() and as_quoted_table_name(); the latter would fail for weird names.
i.e. dbname, host, username, password. We can't test for this, but we can encourage backend authors to use standard names.
Related to tidyverse/dplyr#1955
Sometimes it's useful to reset all user-defined variables in a DBIConnection (set using SET) back to their defaults, instead of having to disconnect and fetch a brand new connection. This is particularly important in the context of a connection pool, for which we want to make sure that, when a connection is returned to the pool, its variables are reset (so that when it's checked out again, it is as good as new).
It seems that this would be very easy to do if DBI had a generic called dbResetConnection() or the like. Each driver would then have to implement the corresponding method for its connections in order to be considered DBI-compliant. For example, in RMySQL this could be implemented using MySQL's mysql_reset_connection().
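A minimal sketch of what such a generic could look like (hypothetical API, not part of DBI; reset_connection_internal() is an assumed C-level binding):

```r
# Hypothetical generic; each backend would provide a method.
setGeneric("dbResetConnection", function(conn, ...) {
  standardGeneric("dbResetConnection")
})

# Sketch of a MySQL method, assuming the driver exposed
# mysql_reset_connection() via an internal helper:
setMethod("dbResetConnection", "MySQLConnection", function(conn, ...) {
  reset_connection_internal(conn)  # hypothetical C-level binding
  invisible(TRUE)
})
```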
This will require a \Sexpr{} tag that calls an internal function that generates Rd.
Sketch of needed code:
library(DBI)
library(RSQLite)
library(RMySQL)
library(RPostgreSQL)

methods <- methods::findMethods("dbConnect")

# Extracted from roxygen2::object_topic.s4method
s4_topic <- function(x) {
  sig <- paste0(x@defined, collapse = ",")
  paste0(x@generic, ",", sig, "-method")
}

sapply(methods@.Data, s4_topic)
Should we have a standard way of getting the server database version?
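Today this is only available indirectly. The DBI spec says dbGetInfo() on a connection should include a db.version entry, but its format is backend-specific:

```r
# Backend-specific value, not standardised across drivers:
DBI::dbGetInfo(con)$db.version
```

A dedicated generic (say, a hypothetical dbGetServerVersion()) could standardise the return type.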
With tests.
Follow-up to r-dbi/DBI#74.
select count(*) from a cross join a cross join a ...
Important for "big data" backends.
Statements such as CREATE TABLE and DROP TABLE might be unavailable, e.g. in BigQuery. We could just expect an existing empty table with a very simple schema for the tests.
It's probably broadly useful, but it's so easy to silently misname arguments in dbConnect() that this seems like a reasonable place to try it out.
Is it possible for RPostgres to support the use of named parameters rather than only positional parameters in parameterized queries? This is currently possible in RSQLite and might be beneficial in RPostgres as well.
For instance, perhaps the named_parameterized_statement is possible:
con = dbConnect(RPostgres::Postgres(),
                dbname = "...",
                host = "...",
                port = ...,
                user = "...",
                password = "...")

create_products_sql = "
  CREATE TABLE products (
    product_no SERIAL PRIMARY KEY,
    name text,
    price numeric
  );
"
dbExecute(con, create_products_sql)

products = data.frame(
  product_no = 1:4,
  name = c("Laptop", "Milk", "Bread", "Couch"),
  price = c(699.99, 2.49, 3.49, 299.99)
)
dbWriteTable(con, "products", products, append = TRUE, row.names = FALSE)

positional_parameterized_statement = "
  UPDATE products
  SET price = $1
  WHERE product_no = 1;
"
named_parameterized_statement = "
  UPDATE products
  SET price = :price
  WHERE product_no = 1;
"

dbExecute(con, positional_parameterized_statement, list(599.99))
dbExecute(con, named_parameterized_statement, list(price = 599.99))

dbDisconnect(con)
I don't have the best development skills, but I would be willing to work on this if a more senior developer could confirm that this would be useful and give me some rough guidance.
dbReadTable() could use COPY to export to CSV and then read the CSV into a data frame. This needs a temporary folder, but it would improve performance in many cases.
Also, several R readers could be supported, such as fread(), to return different objects such as a data.frame, a data.table, or whatever else.
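A rough sketch of the idea (export_table_to_csv() is a hypothetical backend-specific helper that performs the COPY step; the reader argument lets callers pass e.g. data.table::fread):

```r
read_table_via_csv <- function(con, name, reader = utils::read.csv, ...) {
  path <- tempfile(fileext = ".csv")
  on.exit(unlink(path), add = TRUE)
  export_table_to_csv(con, name, path)  # hypothetical, backend-specific COPY step
  reader(path, ...)
}
```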
It's the only safe way. We might need to make this an option, though, to support legacy user code.
NA
when DBI > 0.8 is on CRAN: 77b64be24.
Usage like this:
dbWithConnection(
  con <- dbConnect(...),
  {
    dbWithResult(
      res <- dbSendQuery(con, ...),
      {
        dbFetch(res)
      }
    )
  }
)
The goal is to support fetching and processing data piecemeal with a callback. Development started in r-dbi/DBI#111, but perhaps the interface and the requirements should be specified here first.
Because people using dbSendQuery() would also know to close their result sets to avoid unnecessary open locks on the database.
@jimhester @imanuelcostigan: Do odbc and RSQLServer support multiple open result sets per connection?
CC @hannesmuehleisen.
Typically, I can specify several MySQL options before connecting using the MariaDB C Connector, as described at https://mariadb.com/kb/en/library/mysql_optionsv/, like the following:
MYSQL *mysql = mysql_init(nullptr);
mysql_options(mysql, MYSQL_OPT_RECONNECT, &config.reconnect);
mysql_options(mysql, MYSQL_OPT_LOCAL_INFILE, 0);
mysql_options(mysql, MYSQL_INIT_COMMAND, "SET autocommit=0");
However, I can't find a way to specify MYSQL_OPT_RECONNECT in RMariaDB. I'm wondering if there is any possibility to support this?
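One conceivable interface would be an extra argument to dbConnect() that the backend forwards to mysql_options() before connecting (hypothetical, not currently supported by RMariaDB):

```r
# Hypothetical `reconnect` argument, forwarded to
# mysql_options(mysql, MYSQL_OPT_RECONNECT, ...) before mysql_real_connect():
con <- dbConnect(
  RMariaDB::MariaDB(),
  host = "...", user = "...", password = "...",
  reconnect = TRUE
)
```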
Simplifies generation of spec.
See here for some discussion we've had on R-SIG-DB on this issue in the past.
Here are hints of an indicative use case. Looking at ?tidyr::unnest, it seems that the ideal would be for arrays to be mapped to what you call "list-columns", with operations then performed on those (in the example, I am doing the unnest in PostgreSQL).
There is a package that provides the "upsert" functionality of SQL; is there a plan to add this function to the DBI package?
The dbx package: https://rdrr.io/cran/dbx/man/dbxUpsert.html
Quoting NA values should always return a query that yields a NULL of the correct type. This might be especially challenging in MariaDB, where support for coercion is weak. It could perhaps be simulated by

SELECT x FROM (SELECT '' AS x, 0 AS y UNION ALL SELECT NULL AS x, 1 AS y) qq WHERE y = 1

for such backends.
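The desired behaviour, sketched with dbQuoteLiteral() (the output shapes in the comments are illustrative, not what current backends necessarily emit):

```r
# Quoting a typed NA should produce a NULL that keeps its type, e.g. via a cast:
DBI::dbQuoteLiteral(con, NA_integer_)    # e.g. CAST(NULL AS INTEGER)
DBI::dbQuoteLiteral(con, NA_character_)  # e.g. CAST(NULL AS TEXT)
```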
From r-dbi/RPostgres#357.
Also, we should include at least date, time, and interval types.
Add a temporary argument, like with dbCreateTable(). Default NA lists everything; TRUE or FALSE lists only temporary or non-temporary tables. This applies to:
dbListTables()
dbExistsTable()
dbRemoveTable()
This seems to test the formals of the generic, not those of the method.
DBI should offer some type of plugin system that other packages can build upon by offering implementations for representing some of the more exotic data types in R. Throughout the DBI packages, there are many open issues surrounding this problem. A selection:
r-dbi/DBI#199 enum types
r-dbi/DBI#179 uuid types
r-dbi/RPostgres#132 geometry types, sf
r-dbi/RPostgres#114 geometry types, sf
r-dbi/RPostgres#86 geometry types
r-dbi/bigrquery#145 array/struct types
r-dbi/RMariaDB#42 json
Other types that fall into this area include arbitrary-precision floats (Rmpfr, gmp), XML types (xml2), monetary types, etc.
Now if geometry types are implemented for Postgres, this is great. But they are also available in MySQL/MariaDB. It therefore might be useful to consider these issues in a more general fashion. Furthermore, approaching this in a type-by-type fashion might not be sufficient. How could a user map a Postgres composite type if there is not some inherent extensibility?
Unfortunately, I have no idea how to tackle such an issue. Maybe a pragmatic approach, where things such as composite types are simply not considered, is the best we can do. I just was hoping to get a discussion started on this topic.
I'm trying to pass two arguments to a query. One is a list to be used in the form of SELECT FOO IN ($1) while the second parameter is used just for something like BAR = $2.
Why should these two parameters need to be of the same length? A workaround is to replicate the other arguments, but this seems wrong.
This seems to be a more severe issue than anticipated. Cancelling queries seems generally very poorly supported.
As a user, I want to be able to cancel a long-running query at any time. If the connection dies, I can recover, but I'd rather not restart R.
A "hammer-like" solution would be a generic DBI backend that instantiates the connection in a separate process. Similarly to {DBIlog}, this can be a wrapper that wraps arbitrary DBI backends. The DBIDriver class of this connection would hold the handle to the remote process.
CC @hannesmuehleisen.
Default: NA.
Some queries (at least on MySQL and SQL Server) do not work with the "prepared statement" API, they require the use of the "simple" API. We might want to default to the "simple" API and use the "prepared" API only if parameters are passed, and perhaps only when requested.
We already have test_all(); we need a convention to retrieve the default context from a database backend. Perhaps a dbiTestContext() method? As a stop gap, looking up an unexported function might be enough.
to limit the number of rows read.
By @hadley:
dbFetch() and dbGetQuery() need to have an optional coercion object. It should be a named list, and provide common converters (e.g. times to number of seconds, dates to days, factors to strings, binary serialisation (default for list; raw should stay as is), bool<->int for SQLite).
Defaults only need to work with atomic vectors.
Original issue: r-dbi/RSQLite#74
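A sketch of what the interface could look like (the coerce argument is hypothetical, not part of DBI):

```r
res <- dbSendQuery(con, "SELECT created_at, flag FROM events")
df <- dbFetch(res, coerce = list(
  created_at = function(x) as.POSIXct(x, origin = "1970-01-01"),
  flag       = as.logical  # bool stored as int, e.g. in SQLite
))
dbClearResult(res)
```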
% 1024 instead of % 1000
Rcpp::checkUserInterrupt() in a try {} block
extern void Rsleep(...) for accessing a portable sleep() function, otherwise call Sys.sleep() from C

For testing purposes it would be great to have something like a "virtual" connection. That is, a connection that can be created without actually connecting to a database. This connection could then be used in functions like DBI::dbQuoteLiteral() that don't need a live connection.
The dbplyr package has something like this with the simulate_*() functions. This can be useful for testing when one cannot connect locally to a specific database.
I also think/hope one could then get rid of some code duplication between dbplyr and the DBI packages; for example, quoting literals is also defined in dbplyr.
Used in dbBind(). Example (MySQL): SELECT cast(? as character)
Postgres: positional, $1, $2, ...
MySQL: positional, ?
SQLite: named, $a or :a; and positional, ?
MonetDB: positional, ?
Similar to dbQuoteIdentifier() and dbQuoteString():

setMethod("dbQuoteParameter", c("MySQLConnection", "integer"),
  function(conn, n, ...) {
    rep("?", n)
  }
)

setMethod("dbQuoteParameter", c("PqConnection", "integer"),
  function(conn, n, ...) {
    paste0("$", seq_len(n))
  }
)

setMethod("dbQuoteNamedParameter", c("SQLiteConnection", "character"),
  function(conn, param_names, ...) {
    paste0(":", param_names)
  }
)
I would like to have functionality in R Markdown files that is similar to BigQuery Magics via yihui/knitr#1867.
Here are some interfaces that I think are missing in the DBI package for this to work:
The DBI::sqlParseVariables implementation will support different options:

database | notation
---|---
postgres | $param_name
bigquery | @param_name
other | ?param_name

Each package that supports DBI interfaces should provide information about the notation used by the corresponding database.
An interface in the DBI package that allows access to the parameter notation used by the engine.
Each package should implement a flag that indicates whether the database supports parameterised queries.
An interface in the DBI package that defines whether interpolation should be done or not, e.g. whether the DB engine supports parameterised queries.
so that dblog_cnr() can use it. Otherwise test_some() doesn't work, e.g. for duckdb.
The title is pretty self-explanatory.
Primary keys and indices are essential tools for performant SQL queries, but they must currently be added manually by the user after table creation with an ALTER TABLE call (made more complex for PKs by the fact that the respective columns must be NOT NULL, which must also be defined post facto).
At a glance, at least PKs could be implemented relatively simply by adding an argument to dbCreateTable and modifying sqlCreateTable. Indices are made more complicated by the need to define them as clustered or non-clustered, but PKs seem quite simple (says the guy who hasn't actually tried doing it).
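A sketch of the suggested extension (the primary.key argument is hypothetical, not part of DBI):

```r
# Hypothetical: dbCreateTable() gains a primary.key argument that
# sqlCreateTable() would turn into "id INTEGER NOT NULL PRIMARY KEY".
dbCreateTable(
  con, "products",
  fields = c(id = "INTEGER", name = "TEXT"),
  primary.key = "id"
)
```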
Ideally, every call to dbBegin() will be followed by exactly one call to either rollback or commit.
However, sometimes in complex situations where the transaction spans several functions, it might not be 100% certain that a rollback is called in case of an error. In that case, the next time I call dbBegin(), I'd like to clear any active transactions, because if a previous transaction failed part-way through, I'm stuck and cannot start a new transaction.
To do this, it would be necessary to have a function such as DBI::isActiveTransaction() or similar.
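Usage could look like this (isActiveTransaction() is the hypothetical generic proposed above):

```r
# Clear any stale transaction left over from an earlier failure
if (DBI::isActiveTransaction(con)) {  # hypothetical generic
  dbRollback(con)
}
dbBegin(con)
```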