The goal of arkdb is to provide a convenient way to move data from large compressed text files (tsv, csv, etc.) into any DBI-compliant database connection (e.g. MySQL, Postgres, SQLite; see DBI), and to move tables out of such databases into text files. The key feature of arkdb is that data is moved between databases and text files in chunks of a fixed size, allowing the package functions to work with tables that would be much too large to read into memory all at once.
- A more detailed introduction to package design and use can be found in the package vignette
- Online versions of package documentation
You can install arkdb from GitHub with:
# install.packages("devtools")
devtools::install_github("cboettig/arkdb")
library(arkdb)
# additional libraries just for this demo
library(dbplyr)
library(dplyr)
library(fs)
Consider the nycflights13 database in SQLite:
db <- dbplyr::nycflights13_sqlite(".")
#> Caching nycflights db at ./nycflights13.sqlite
#> Creating table: airlines
#> Creating table: airports
#> Creating table: flights
#> Creating table: planes
#> Creating table: weather
Create an archive of the database:
ark(db, ".", lines = 50000)
#> Exporting airlines in 50000 line chunks:
#> ...Done! (in 0.02494812 secs)
#> Exporting airports in 50000 line chunks:
#> ...Done! (in 0.03492808 secs)
#> Exporting flights in 50000 line chunks:
#> ...Done! (in 11.74483 secs)
#> Exporting planes in 50000 line chunks:
#> ...Done! (in 0.03923202 secs)
#> Exporting weather in 50000 line chunks:
#> ...Done! (in 0.8512921 secs)
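By default, `ark()` streams each table into a bzip2-compressed tab-separated file (`*.tsv.bz2`) in the output directory. One way to confirm, assuming the export above has run in the current directory, is:

```r
# List the archives ark() just wrote: one compressed tsv file per table
# (airlines, airports, flights, planes, weather).
fs::dir_ls(glob = "*.tsv.bz2")
```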
Import a list of compressed tabular files (i.e. *.tsv.bz2) into a local SQLite database:
files <- fs::dir_ls(glob = "*.tsv.bz2")
new_db <- src_sqlite("local.sqlite", create = TRUE)
unark(files, new_db, lines = 50000)
new_db
#> src:  sqlite 3.22.0 [local.sqlite]
#> tbls: airlines, airports, flights, planes, weather
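As a quick check on the restored copy, one option is to open a table lazily with dplyr (a sketch, assuming the import above succeeded and dplyr is loaded as in this demo):

```r
# Lazily reference the restored flights table; dplyr translates verbs
# to SQL, so the count is computed inside SQLite rather than in R.
flights <- dplyr::tbl(new_db, "flights")
dplyr::count(flights)
```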
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.