o2r-project / containerit
Package an R workspace and all dependencies as a Docker container
Home Page: https://o2r.info/containerit/
License: GNU General Public License v3.0
see also #10
Issue #33 suggests determining system dependencies based on https://github.com/rstudio/shinyapps-package-dependencies
However, as the previous discussion in #33 shows, system dependencies are not explicit for all packages, for instance `rgdal`. That is because the dependencies listed in the basic Dockerfile seem to be assumed as preinstalled, while the scripts in the 'packages' folder apply only to packages that rely on additional dependencies. Hence, in order to rely on shinyapps-package-dependencies, we would need to install all dependencies from the basic Dockerfile (or use the shinyapps image as the base image), which can result in unnecessary overhead.
Moreover, the shell scripts are made for Ubuntu/Linux only and therefore may not be applicable to all potential base images.
It could be useful to extend the Dockerfile so that the captured session is fully replicated directly after container start. This would save the user from calling `require`/`library` on those packages manually.
The only way to restore an interactive session with the required libraries seems to be defining an `Rprofile.site` file and setting the `R_PROFILE` environment variable to its location (using the `ENV` instruction).
The `R_PROFILE` file must contain a `.First` function which attaches the required packages using `require(...)` (or `library`)?
Load namespaces via `requireNamespace()` --> create instruction `CMD ["R"]` at the end of the Dockerfile
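The profile-based approach could be sketched roughly as follows. This is only an illustration, not containerit's implementation: the helper names `make_site_profile` and `profile_instructions`, the package names, and the target path `/Rprofile.site` are all assumptions made for the example.

```r
# Hypothetical helper: text of a site profile whose .First() attaches packages.
make_site_profile <- function(pkgs) {
  c(".First <- function() {",
    sprintf("  library(%s)", pkgs),
    "}")
}

# Hypothetical helper: Dockerfile lines that copy the profile into the image,
# point R to it via R_PROFILE, and start an interactive R session.
# The target path is an assumption for this sketch.
profile_instructions <- function(target = "/Rprofile.site") {
  c(sprintf("COPY Rprofile.site %s", target),
    sprintf("ENV R_PROFILE=%s", target),
    'CMD ["R"]')
}

profile_lines <- make_site_profile(c("sp", "rgdal"))
writeLines(profile_lines, file.path(tempdir(), "Rprofile.site"))
dockerfile_lines <- profile_instructions()
```

Generating the profile as plain text keeps the list of attached packages visible in the build context, which may make the restored session easier to inspect.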
How will/does/should containerit leverage other packages?
Allow to manually select a specific R version
Now there are versioned Rocker images: https://github.com/rocker-org/rocker-versioned/
https://github.com/metacran/rversions might become handy.
paste0(version$major, '.', version$minor)
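As a rough sketch of how the version string above could select a versioned base image: the helper names below are hypothetical, and the assumption that an image tag exists for the given R version would have to be checked against the registry.

```r
# Hypothetical helper: "major.minor" string of an R version object.
r_version_tag <- function(v = R.version) {
  paste0(v$major, ".", v$minor)
}

# Hypothetical helper: FROM instruction for a versioned image.
# Assumes the image publishes a tag named after the R version.
from_instruction <- function(v = R.version, image = "rocker/r-ver") {
  paste0("FROM ", image, ":", r_version_tag(v))
}

# A manually selected version can be passed as a plain list.
manual <- from_instruction(list(major = "3", minor = "3.2"))
current <- from_instruction()
```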
extends #6
The session loads rgdal and proj packages and adds the required libraries in the Dockerfile.
Use two approaches and then compare:
rsysreqs
Use https://github.com/ropensci/datapack (see also https://cran.rstudio.com/web/packages/datapack/index.html)
Could be useful to install packages via packrat: https://github.com/rstudio/packrat
The startup script should ideally also log the environment, i.e. `sessionInfo()`
Following the suggestions in Label Schema we can add some meta-information to the images
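One possible shape for this, as a sketch only: the helper `label_instruction` is hypothetical, and the metadata values are placeholders. The keys use the `org.label-schema.` namespace defined by Label Schema.

```r
# Hypothetical helper: build a LABEL instruction from a named list of
# Label Schema fields (names are given without the namespace prefix).
label_instruction <- function(labels) {
  keys <- paste0("org.label-schema.", names(labels))
  pairs <- sprintf('%s="%s"', keys, unname(unlist(labels)))
  paste("LABEL", paste(pairs, collapse = " "))
}

# Placeholder metadata for illustration.
label_line <- label_instruction(list(
  "schema-version" = "1.0",
  name = "containerit"
))
```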
We must add user config files etc. to the container and make sure they are actually used.
https://stat.ethz.ch/R-manual/R-devel/library/base/html/Startup.html
https://rstudio.github.io/packrat/
Goal: add a snapshot of a packrat private repository to the container
It is completely correct that right now there is an error message when a session is packaged, because containeRit itself is not published online: "Failed to identify source for package containeRit. Therefore the package cannot be installed in the docker image."
However, it should not be common to "package the packaging lib", so we should add an option `add_self = FALSE` to the `dockerfile(..)` function, so that by default it does not try to add the containeRit package itself to the image.
When packaging research into higher level containers, e.g. ERC, we most probably need some meta information. While the user can be asked for this, see #13, it would be better to extract this automagically from the session.
For this, we would need a feature that appends a script to the "main script file" of the container which has access to the R session "after" the analysis is completed.
Some ideas for information that could be extracted here:
This feature is complementary to the file analysis conducted by @7048730 in https://github.com/o2r-project/o2r-meta
We need a CLI (command line interface) wrapper around the library to integrate it into workflows in other programming languages (e.g. as part of a node.js-based webapp)
Alternatively, if docopt does not work at all, evaluate package https://cran.r-project.org/web/packages/optparse/
Example usage:
container_it.R [-f <path to (markdown, R-script)file>]
container_it.R [-s] # package new R session
What options do we need exposed? How easy is that with docopt?
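To get a feeling for the two options in the example usage, here is a minimal sketch that parses them with base R only. It deliberately does not use docopt or optparse, and the function name `parse_cli` is hypothetical.

```r
# Hypothetical parser for the two options from the example usage:
#   -f <file>  package a markdown file or R script
#   -s         package a new R session
parse_cli <- function(args = commandArgs(trailingOnly = TRUE)) {
  opts <- list(file = NULL, session = FALSE)
  i <- 1
  while (i <= length(args)) {
    arg <- args[i]
    if (arg == "-f") {
      if (i == length(args)) stop("-f requires a file path")
      opts$file <- args[i + 1]
      i <- i + 1  # skip the consumed file path
    } else if (arg == "-s") {
      opts$session <- TRUE
    } else {
      stop("unknown option: ", arg)
    }
    i <- i + 1
  }
  opts
}

file_opts <- parse_cli(c("-f", "analysis.Rmd"))
session_opts <- parse_cli("-s")
```

A dedicated package would add help text and validation for free, so this hand-rolled version is mainly useful for deciding which options are worth exposing.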
Suggestion/discussion
We could wrap the Docker CLI in R functions.
Simply copy over packages from lib directory / search path.
Throw errors if OS of host and container are not compatible.
Extends #6
Install a package that is only available from GitHub. Probably need `devtools::session_info()`.
Packages for testing:
This should (optionally, or even by default) also see if a specific version is tagged on GitHub and install that explicitly, e.g. `devtools::install_github("Appsilon/shiny.collections", ref = "0.1.0")`
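A small sketch of how the install call could be generated with or without a tagged version. The helper `install_github_call` is hypothetical and only builds the call as text; it does not check whether the tag exists on GitHub.

```r
# Hypothetical helper: text of an install_github() call, pinned to a
# ref (tag, branch or commit) when one is given.
install_github_call <- function(repo, ref = NULL) {
  if (is.null(ref)) {
    sprintf('devtools::install_github("%s")', repo)
  } else {
    sprintf('devtools::install_github("%s", ref = "%s")', repo, ref)
  }
}

pinned <- install_github_call("Appsilon/shiny.collections", ref = "0.1.0")
unpinned <- install_github_call("Appsilon/shiny.collections")
```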
See https://github.com/jimhester/lintr#testthat
RStudio has some nice ways to create a GUI (when package is used in RStudio presumably), so when needing user input this could be handy
See RStudio Add-ins: https://rstudio.github.io/rstudioaddins/
NixOS is an interesting Linux distro that is all about declarative configuration and "devops", see e.g. https://www.domenkozar.com/2014/03/11/why-puppet-chef-ansible-arent-good-enough-and-we-can-do-better/
Maybe `containerit` could create NixOS installation instructions instead of Dockerfiles?
They seem mostly reasonable :-)
MRAN supports file style locations now: RevolutionAnalytics/checkpoint#218
Also, this issue RevolutionAnalytics/checkpoint#216 (comment) reports on how to make checkpoint work for an R Markdown document.
In case information is missing, we can retrieve it from the user via a command-line interface directly from R using `utils::menu`, cf. https://github.com/hadley/devtools/blob/aaa4b61ca7c44515418d485cc64c84475e998ac7/R/utils.r#L69
https://cran.r-project.org/web/packages/datapack/vignettes/datapack-overview.html seems to have a good working process
`build` ("setup works")
Expands on an idea mentioned in #6 (comment)
After a session (or workspace or something else) was containerized, a user may be able to test-build an image in order to verify that
(1) The build is successful
(2) The local session matches the dockerized session
(3) More ideas?
1 and 2 can be achieved analogously to `tests/testthat/test_sessioninfo_reproduce.R`, by turning the test into a feature.
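Point (2) could be approached along these lines. This is a sketch under assumptions: `compare_versions` is a hypothetical helper, its inputs are named character vectors of package versions (which would have to be extracted from the two session infos), and the version numbers below are made up for illustration.

```r
# Hypothetical helper: compare two named character vectors of package
# versions and flag packages that are missing or differ.
compare_versions <- function(local, container) {
  pkgs <- sort(union(names(local), names(container)))
  result <- data.frame(
    package = pkgs,
    local = unname(local[pkgs]),
    container = unname(container[pkgs]),
    stringsAsFactors = FALSE
  )
  result$match <- !is.na(result$local) & !is.na(result$container) &
    result$local == result$container
  result
}

# Made-up versions for illustration only.
local_pkgs <- c(sp = "1.2-4", rgdal = "1.2-5")
container_pkgs <- c(sp = "1.2-4", rgdal = "1.2-4")
report <- compare_versions(local_pkgs, container_pkgs)
```

Returning a data frame rather than a single logical makes it possible to show the user which packages differ.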
Provide a path to a vignette, then it gets packaged as the main document in a container (built at start time of the container; the container finishes when the vignette is built).
https://github.com/codemeta/codemeta/pull/97/files
Should be quite straightforward thanks to https://github.com/ropensci/jsonld
When reproducing an R session also match the locales.
As shown in #33 (comment) locales are not reproduced yet
In Linux, the locale first has to be generated, if missing, and then configured as default or current locale. That seems not to be trivial, especially in non-interactive mode.
The R functions `Sys.getlocale()` and `Sys.setlocale()` may be helpful.
`tests/testthat/test_sessioninfo_reproduce.R` runs with this test (uncomment the corresponding lines):
test_that("the locales are the same", {
  expect_equal(local_sessionInfo$locale, docker_sessionInfo$locale)
})
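As a first step towards matching locales, the string returned by `Sys.getlocale()` could be split into its categories. The sketch below does only that and derives `ENV` lines from the result. The helper names are hypothetical, and generating a missing locale inside the image is not covered.

```r
# Hypothetical helper: split "LC_CTYPE=...;LC_NUMERIC=...;..." into a
# named character vector. A single value without categories (e.g. "C")
# is returned under the name LC_ALL.
parse_locale <- function(x = Sys.getlocale()) {
  parts <- strsplit(x, ";", fixed = TRUE)[[1]]
  if (!all(grepl("=", parts, fixed = TRUE))) {
    return(c(LC_ALL = x))
  }
  keys <- sub("=.*$", "", parts)
  values <- sub("^[^=]*=", "", parts)
  stats::setNames(values, keys)
}

# Hypothetical helper: one ENV instruction per locale category.
# Assumes the locale already exists in the image.
locale_env <- function(locale) {
  sprintf("ENV %s=%s", names(locale), locale)
}

parsed <- parse_locale("LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C")
env_lines <- locale_env(parsed)
```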
Write a function that pulls a Dockerized linter (projectatomic/dockerfile-lint looks good) container and executes it on a given path and shows the output on the R console.
Append all Dockerfiles used in `FROM` statements for a complete one-document Dockerfile. Could be interesting to evaluate for reproducibility - what is really installed?
"Unchain a Dockerfile"... how useful can this be?
See e.g. Debian testing Dockerfile: https://github.com/tianon/docker-brew-debian/blob/9b1dd4b1594b8df02f7caa739e84b187edaab404/testing/Dockerfile
Extends #37
Simple R images like `rocker/r-ver` do not come with a GUI; therefore, the use of R is restricted to the console.
With such a configuration (try for instance `docker run -it --rm rocker/r-ver`) users are not able to view R plots or any file or data that cannot be printed directly to the console.
Therefore, it would be beneficial to leverage RStudio images (see `rocker/rstudio`) and thus restore sessions directly in an RStudio Server session.
See https://github.com/rocker-org/rocker/wiki/Using-the-RStudio-image
It might be quite hard for long running system calls, but maybe we find a way to add a progress bar.
Create a small session (load few packages, including one from CRAN but not in base packages) and create a Dockerfile that is as close as possible to recreate that session.
Based on `sessionInfo()$running` we have a mapping from running string to base image. In our case all running strings map to rocker.
Also check `devtools::session_info()`, could be useful to determine installation source.
Open questions:
`MAINTAINER` field?
May extend #37
Users may want to include R objects from their R workspace in a restored session.
Therefore, the `dockerfile()` method has a parameter `objects` that is not yet implemented.
The `objects` parameter takes a character vector containing the names of the objects, as returned by the function `ls()`.
The objects are saved to an RData file that is copied to the image at the location of the R working directory. If the file has no name (".RData"), R by default loads it automatically into the session on startup.
Alternatively, users may "load" the file manually from the working directory.
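The saving step could look roughly like this. It is a sketch, not the planned implementation: `save_objects` is a hypothetical helper, and the example uses a throwaway environment and the temporary directory instead of a real workspace and build context.

```r
# Hypothetical helper: save the named objects from an environment to a
# file called ".RData" in the given directory and return its path.
save_objects <- function(objects, envir = globalenv(), dir = tempdir()) {
  path <- file.path(dir, ".RData")
  save(list = objects, envir = envir, file = path)
  path
}

# Throwaway environment standing in for the user's workspace.
workspace <- new.env()
assign("answer", 42, envir = workspace)
rdata_path <- save_objects("answer", envir = workspace)

# Loading the file manually, as a user could do inside the container.
restored <- new.env()
load(rdata_path, envir = restored)
```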
When we can write Dockerfiles, we might as well parse them. `uncontainer_it` = create a session on a host machine that resembles the one that is inside a container.
Use cases needed!
- `copy = script` (default) copies the supplied script to the image
- `copy = script_dir` also copies the script and all files / directories of the same folder
- `copy` takes a list of files and directories to be copied to the folder
- `cmd` parameter can be set with `Cmd_Rscript("path/to/script")`, resulting in `CMD ["Rscript", "--save", "/path/to/Rscript"]`
- [ ] 1. Default: execute a script locally and reproduce the session that results by the end of the script
- [ ] 2. `copy_script = TRUE` also copies the script
- [ ] 3. `copy_parent = TRUE` also copies the script and all files / directories of the same folder
- [ ] 4. `batch_exec = TRUE` also copies script and sets CMD instruction to `CMD ["Rscript", "--save", "/path/to/Rscript"]`
- [ ] 5. `copy_files` takes a list of files and directories to be copied to the folder
- [ ] 6. Test 1-5 with test scripts
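The `CMD` instruction from item 4 could be assembled as in the following sketch. The helper `cmd_rscript_line` is hypothetical and only illustrates the exec-form string; it is not the package's `Cmd_Rscript`.

```r
# Hypothetical helper: exec-form CMD instruction that runs a script
# with Rscript, quoting each element as a JSON-style string.
cmd_rscript_line <- function(path, options = "--save") {
  parts <- c("Rscript", options, path)
  paste0("CMD [", paste0('"', parts, '"', collapse = ", "), "]")
}

cmd_line <- cmd_rscript_line("/path/to/Rscript")
```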
Issues (to be discussed later)
Consider how the function `works_with_R` could be useful for this package: https://github.com/tdhock/dotfiles/blob/master/.Rprofile
Based on https://docs.docker.com/engine/reference/builder/
See if stuff from `traitecoevo/dockertest` (https://github.com/traitecoevo/dockertest/search?utf8=%E2%9C%93&q=dockerfile) can just be reused?
- [ ] https://github.com/o2r-project/o2r-muncher/tree/master/test/bagtainers/markdowntainer-sfr/data
- [ ] `system.file("doc/sf3.Rmd", package = "sf")`
- [ ] `system.file("examples", "knitr-minimal.Rnw", package = "knitr")`
As becomes clear in the discussion on geospatial libraries in Rocker, the versions of linked external libraries matter.
Can we support packaging explicit versions of linked libraries?
> extSoftVersion()
                     zlib                     bzlib                        xz
                  "1.2.8"      "1.0.6, 6-Sept-2010"              "5.1.0alpha"
                     PCRE                       ICU                       TRE
        "8.38 2015-11-23"                        "" "TRE 0.8.0 R_fixes (BSD)"
                    iconv                  readline
             "glibc 2.23"                     "6.3"
> library(sf)
Linking to GEOS 3.5.0, GDAL 2.1.2, proj.4 4.9.2
> sf::sf_extSoftVersion()
   GEOS    GDAL  proj.4
"3.5.1" "2.1.2" "4.9.2"
This information could be accessed by a function `<pkgname_extSoftVersion>`, see `extSoftVersion` and [`sf_extSoftVersion()`](https://github.com/edzer/sfr/blob/5c3dfea395af81bf352b4007d16c6a7d419883c2/R/init.R#L59)
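A sketch of how such functions could be discovered follows. It rests on an assumption: that packages follow the `<pkgname>_extSoftVersion` naming pattern, which is only known to hold for `sf` here. The helper name `ext_soft_versions` is hypothetical.

```r
# Hypothetical helper: collect external library versions from base R and
# from every given package that provides a <pkgname>_extSoftVersion function.
ext_soft_versions <- function(pkgs = loadedNamespaces()) {
  found <- list(base = extSoftVersion())
  for (pkg in setdiff(pkgs, "base")) {
    fn_name <- paste0(pkg, "_extSoftVersion")
    ns <- asNamespace(pkg)
    # Only call the function if the package actually defines it.
    if (exists(fn_name, envir = ns, inherits = FALSE)) {
      found[[pkg]] <- get(fn_name, envir = ns)()
    }
  }
  found
}

versions <- ext_soft_versions("stats")
```

Packages without such a function are silently skipped, so the result only lists the versions that can be determined.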
Package a script and add, outside of the script, some result testing, i.e. a validation that the script has succeeded.
Probably use `testthat` for it.