klmr / box
Write reusable, composable and modular R code
Home Page: https://klmr.me/box/
License: MIT License
At the moment, modules’ `help`/`?` shadows devtools’ `help`/`?` and dispatches directly to utils’ `help`/`?`.
Test whether devtools is loaded and conditionally dispatch to it instead. In addition, use `setHook` to catch subsequent loading of devtools, and ensure that modules’ help system is still callable.
Unix handles its `$PATH` variable by searching every directory in the colon-separated list. It would be nice if modules’ `R_IMPORT_PATH` did the same.
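A minimal sketch of such colon-separated handling (`import_search_paths` is a hypothetical helper, not part of modules):

```r
# Split R_IMPORT_PATH on the platform’s path separator, the way Unix
# programs treat $PATH. .Platform$path.sep is ":" on Unix, ";" on Windows.
import_search_paths = function () {
    raw = Sys.getenv('R_IMPORT_PATH', unset = '')
    if (identical(raw, '')) character(0)
    else strsplit(raw, .Platform$path.sep, fixed = TRUE)[[1]]
}
```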
Every once in a while, it’s helpful for modules to be able to query their caller, i.e. the scope from which `import` was invoked to load them. Therefore, there should be a function `parent_module` (defined only inside modules, i.e. inside the `helper_env`), which returns that parent (roughly analogous to `parent.frame()`).

One use-case is to override symbols in the parent scope. This should be handled with care, of course, but is sometimes necessary when monkey-patching functions. A prominent example is fixing `library` by wrapping it inside `suppressMessages`, because many packages suffer from verbal diarrhoea. At the moment, I’m using `assign('library', new_library, envir = globalenv())` instead, which is obviously buggy.
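The monkey-patch in question looks roughly like this (a sketch; `quiet_library` is an illustrative name):

```r
# Wrap base::library so that package startup messages are suppressed.
quiet_library = function (...)
    suppressMessages(base::library(...))

# The current workaround: clobber ‘library’ in the global environment.
# A parent_module() function would allow targeting the importing module’s
# scope instead of unconditionally touching globalenv().
assign('library', quiet_library, envir = globalenv())
```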
`export_submodules` currently does not correctly load and/or attach the submodules’ documentation. Furthermore, the `module_help` function currently only supports non-nested object names of the form `a$b`, but not `a$b$c`.

Help for functions with non-legal names (e.g. operators) is currently not available under the actual name, since the internal dictionary uses roxygen2’s mangled names rather than the correct names. Circumvent roxygen2’s name-mangling mechanism and extract the real names instead.
Given a directory `a` with files `b.r` and `c/d.r`, implement a mechanism which enables the use of `a = import(a)`, and loads the appropriate submodules `a$b` and `a$c$d`.
Open questions:

- whether this should work without an `__init__.r` file
- how to handle nested `__init__.r` files
- whether `__init__.r` files are loaded in the right order

Currently modules’ `script_path` is aware of `Rscript` and `R CMD BATCH`, but not littler (except maybe incidentally). Ensure that it works.
The de-facto standard for R is to use doc comments and roxygen2 to transform those into Rd. Unfortunately, roxygen2 currently only supports parsing packages, not other structures (and in particular not single files). I’ve created r-lib/roxygen2#273 to address this. At the moment, the implementation relies on non-exposed private functions from the roxygen2 package and a bit of hackery, which is obviously not future-proof.
Not a big deal, but `??` is not an operator, it is just `?` defined in a tricky way. Also, `:=` is actually an operator: it is not used in R, but the parser recognizes it, and you can redefine it as you feel like:

```r
> `??` <- function(x, y) x + y
> 1 ?? 2
Error in help.search("2", fields = "1", package = NULL) :
  incorrect field specification
> `:=` <- function(x, y) x + y
> 1 := 2
[1] 3
```
Does this work?

```r
# a.r
import('b')
```

```r
# b.r
import('a')
```

Add a test case, and ensure that it has a proper, well-defined semantic with bounded execution time (although not necessarily regardless of inclusion order).
Similar to `PYTHONPATH` and `sys.path` in Python. It happens that I am running some R code in a temporary directory, as part of a build, and want to import some modules. I could use absolute file names, but some mechanism to set the import path would be better, I think.
MWE from ebits; both `b$minN` and `sum` have one required and one named default argument. The first example that throws an error here should work. This might be an R bug, though.
```r
library(modules)
b = import('base')
c(1:100) %>% b$minN
# Error in `c(1:100)`$b : 3 arguments passed to '$' which requires 2
c(1:100) %>% (b$minN)
# [1] 2
c(1:100) %>% b$minN(N=2)
# [1] 2
c(1:100) %>% sum
# [1] 5050
c(1:100) %>% sum(na.rm=T)
# [1] 5050
```
Does it make sense to byte-compile modules, either upon loading or (ideally) cached? Of course, this would not be as fast as using Rcpp, but it is a speed increase that comes at (almost) no cost to the end user.
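A sketch of the idea using the base `compiler` package (`compile_module` is a hypothetical helper; real caching would persist the compiled code):

```r
library(compiler)

# Byte-compile every function bound in a module’s environment, in place.
compile_module = function (module_env) {
    for (name in ls(module_env)) {
        value = get(name, envir = module_env)
        if (is.function(value))
            assign(name, cmpfun(value), envir = module_env)
    }
    invisible(module_env)
}
```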
`unload` and `reload` currently don’t detach globally (or otherwise) attached environments. At least for globally attached environments, this should change.
With a module `base` that defines an `%or%` operator, is there a way to use it when it is not attached?

```r
> library(modules)
> b = import('base', attach_operators=F)
> T %or% F
Error: could not find function "%or%"
> T b$%or% F
Error: unexpected symbol in "T b"
> T b$`%or%` F
Error: unexpected symbol in "T b"
> T `b$%or%` F
Error: unexpected symbol in "T b"
> T (b$`%or%`) F # suggestion by @gaborcsardi
Error: unexpected symbol in "T (b$`%or%`) F"
```

If not, it might make sense to not export operators when they are not attached.

Edit: this works:

```r
b$`%or%`(T, F)
```

Do we want it that way?
```r
> library(modules)
> options(import.path)
Error in options(import.path) : object 'import.path' not found
```
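For reference, `options` with a bare name evaluates it as a variable, hence the error above; the working forms are:

```r
options(import.path = '~/R/modules')   # set the option
getOption('import.path')               # read it back
options('import.path')                 # read it as a one-element list
```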
Unless explicitly specified as local (#18), make imports global-by-default. That is, `import('foo')` will first search the module search path for a module called `foo` before falling back to the current module’s local directory. In order to prioritise a local module, a relative import is required, e.g. `import('./foo')`.
Custom operators containing a `.` in their name are not exported, since they are incorrectly assumed to be S3 methods.
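A sketch of a fixed heuristic (function names are made up): operator names are `%`-delimited and can never be S3 methods, so they should be excluded before the dot check.

```r
# Current, faulty assumption: any dotted name is an S3 method.
has_dot = function (name)
    grepl('.', name, fixed = TRUE)

# Proposed: treat %-delimited names as operators, regardless of dots.
is_operator = function (name)
    grepl('^%.+%$', name)

should_export = function (name)
    is_operator(name) || ! has_dot(name)
```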
Allow importing some, but not all, submodules of a module, akin to Python’s `from foo import a, b, c`.
MWE:

```r
library(modules)
help(package = 'utils')
# Error in is_module_help_topic(topic, parent.frame()) :
#   argument "topic" is missing, with no default
```
See Demo: Foreign language interface. Prototyping via Rcpp works, but compiling modules requires a standardised installation procedure, which modules should ideally support by providing appropriate functions. The same goes for loading dynamic libraries, although that is already quite straightforward.
Consider the following module:

```
a
|- __init__.r
|- file.r
```

Editing and reloading does not reload a submodule’s changes:

```r
a = import('amodule')
# change file.r
reload(a) # this does not load the changes in file.r
```

Having `reload()` default to shallow reloading that only reloads the `__init__` file doesn’t make any sense. Either make shallow whole-module reloading the default, or full deep reloading.
`.` in the import search path corresponds to `getwd()` rather than `module_path()`, which means that running a file via `Rscript foo/bar.r` will result in a different path from `Rscript bar.r`. Actually, even `module_path()` will currently do the wrong thing here, for the same reason. We have to investigate `commandArgs(trailing=FALSE)` to get the real path.
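A sketch of the `commandArgs` approach (`real_script_path` is a hypothetical helper; littler and `R CMD BATCH` would need their own cases):

```r
# Recover the invoked script’s path from Rscript’s --file= argument, instead
# of relying on getwd(), which differs between ‘Rscript foo/bar.r’ and
# ‘Rscript bar.r’ run from inside foo.
real_script_path = function () {
    args = commandArgs(trailingOnly = FALSE)
    file_arg = grep('^--file=', args, value = TRUE)
    if (length(file_arg) == 0)
        return(NA_character_)  # not run via Rscript
    normalizePath(sub('^--file=', '', file_arg[[1L]]))
}
```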
Platform: Win7 x64

Error description:

```
installing source package 'modules' ...
R
inst
preparing package for lazy loading
help
Error in iconv(lines, encoding, "UTF-8", sub = "byte") :
embedded nul in string: '\title{Find a moduleb\0\031s source code location}'
ERROR: installing Rd objects failed for package 'modules'
```

Analysis: I’ve done some research on the internet and found a similar issue, [call to install_github() fails on Windows, but not on Linux or Mac](https://github.com/r-lib/devtools/issues/420). It seems that the error occurs because of some unsupported UTF-8 character. There’s a suggestion in the post above:

> pre-processing those files to replace non-ASCII with Rd codes would be nice. Then we could write using legible UTF-8, but R would still see what it currently wants.
At the moment, modules may call `base::attach` at certain points, and don’t change its default option `warn.conflicts = TRUE`. This can get very annoying in some circumstances, most notably by cluttering the output of R Markdown, etc.
Attaching modules in the global namespace, and in particular attaching the `operators:` namespace, happens regardless of whether the namespace in question is already in the `search()` path. This is annoying because it results in messages about hidden objects, and also because it potentially makes unloading and reloading brittle. A simple check whether the namespace in question is already attached, before attaching it again, should fix this.
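The proposed guard is nearly a one-liner (a sketch; modules’ actual attach logic has more moving parts):

```r
# Only attach an environment when its name is not already on the search
# path, and silence conflict warnings while doing so.
attach_if_absent = function (env, name) {
    if (! name %in% search())
        attach(env, name = name, warn.conflicts = FALSE)
    invisible()
}
```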
The following fails:

`scripts/test.rmd`:

````
```{r}
library(modules)
import('a')
```
````

`scripts/a.r`:

```r
cat('Hello\n')
```

And then, from the command line:

```shell
mkdir test
./scripts/knitr -n scripts/test.rmd -o test/test.md
```

The result file contains:

```
## Error: Unable to load module a; not found in "scripts"
```

The `knitr` executable is available as a gist. I have forgotten where I got it from.
Currently, when searching for a submodule, a path is only considered if it contains an `__init__.r` file, all the way up the fully qualified module name. Drop this requirement for ease of use; arguably, there’s no great disadvantage to doing so. This also implies changes to the initialisation of nested modules when loading a submodule, since not all modules might have an initialiser to execute. See #12 for discussion.
`a.r`:

```r
message(module_name())
```

`main.r`:

```r
modules::import('a')
```

This yields:

```
Error in message(module_name()) : could not find function "module_name"
```

because `main.r` didn’t attach modules. This should work: inside a module, the module-specific functions should always be attached. This can be implemented by checking the `search()` path inside `import` and temporarily `attach`ing modules as necessary.
Right now, when loading a module that has functions and operators, they can be imported using `mod = import(module)`. The functions are usable as `mod$function`, but there is no way to access the operators:

```r
> library(modules)
> op = import(operators)
> 3 %or% 5
Error: could not find function "%or%"
> 3 op$`%or%` 5
Error: unexpected symbol in "3 op"
```

Attaching of course works, but that would attach the functions as well:

```r
> import(operators, attach=T)
> 3 %or% 5
[1] 3
```

So I would suggest attaching operators by default.
Add a mechanism to make installed packages available via `import`. This would have the following advantages over `library`:

- Uniform API for accessing library code
- Doesn’t clutter the search path by default, which encourages better style
- Allows adapting the name for accessing packages (instead of having a hard-coded `pkg_name::`)
- Supports loading and using packages locally only; consider:

```r
frob = function (args) {
    import(package = 'pkg', attach = TRUE)
    …
}
```

Now `pkg` is attached, but only inside `frob`, just like normal modules, but unlike loading a package via `library`.
R source files don’t allow specifying an encoding. Since everything but Unicode is nonsense in this day and age, `import` should by default load all files as UTF-8 (to ensure that files containing UTF-8 work on Windows); see also #19. The question is whether we can afford to be opinionated, whether `import` needs an argument to override file encodings, or whether we copy PEP 263 and allow specifying the source file’s encoding inside the file.
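The opinionated default could be as simple as the following sketch (`read_module_source` is an illustrative name):

```r
# Read a module source file as UTF-8, regardless of the native locale.
read_module_source = function (path) {
    con = file(path, encoding = 'UTF-8')
    on.exit(close(con))
    readLines(con, warn = FALSE)
}
```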
This would probably be required before supporting #23 (upgrades), I guess. It would be great to support semantic versioning. (By the way, it would also be great to have versioned dependencies, but I don’t want to rush ahead.)
Calling reload on a module whose name shadows the name of another object or function in a parent environment causes that name to be overridden.
There are two possibilities to use packages or package functions inside a module:

```r
library(mypackage)
# or
mypackage::method()
```

Disregarding that the latter is preferred so as not to shadow objects in `.GlobalEnv`, either call may occur somewhere in a given module.
The question is: is there a good way to track package dependencies of modules, without having to look for these calls in each module file?
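One conceivable answer, sketched below: statically scan each module file’s parse tree for `library`/`require` calls and `pkg::` references (the helper name is made up).

```r
# Walk the parse tree of a source file, collecting package names mentioned
# in library()/require() calls and in pkg::name references.
module_dependencies = function (path) {
    deps = character(0)
    walk = function (e) {
        if (is.call(e)) {
            head = e[[1L]]
            if (identical(head, quote(library)) || identical(head, quote(require)))
                deps <<- c(deps, as.character(e[[2L]]))
            else if (identical(head, quote(`::`)) || identical(head, quote(`:::`)))
                deps <<- c(deps, as.character(e[[2L]]))
            for (part in as.list(e))
                walk(part)
        }
    }
    for (expr in parse(path))
        walk(expr)
    unique(deps)
}
```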
Ideally, I would like to see something that provides a way for modules to automatically install these packages upon module loading.

I was trying this module system; since I come from a Node.js background, I think that R needs something like this really badly (congrats!).
However, I can only get the simplest case, same-folder modules, to work. When I try to import modules in nested folders, it doesn’t work. In my test scenario, I’ve got a `config/log.r` module I want to import. If I cd to `config` and import the module with `import("log")`, it works. However, if I launch R from `config`’s parent folder, it doesn’t.
```r
require(modules)
import("config/log")
# Error in find_module("config/log") :
#   Unable to load module config/log; not found in "/Users/santillan/dev/solvview/analitix-r/src"
```

It seems that `is_valid_nested` considers my module not valid. Is it mandatory to create a `log` folder with an `__init__.r` file? Or should this just work?
`script1.R`:

```r
x = function() {
    load('myFile.RData')
    return(myContent)
}
```

`dir/script2.R`:

```r
library(modules)
y = import('script1.R')
y$x() # FileNotFound error
```

I’m not sure how to solve this best. Forcing the user to load all files on module init is not good for modules that interface with a lot of large files. Also, you can’t `setwd()` on every function call, because that would break passing file paths to module functions that are relative to the calling script.
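A common middle ground, sketched with made-up names: capture the module’s directory once at load time and resolve data paths against it at call time, so neither eager loading nor `setwd()` is needed.

```r
# Create a path resolver bound to the module’s own directory.
make_resolver = function (module_dir) {
    force(module_dir)
    function (filename)
        file.path(module_dir, filename)
}

# Inside the module, module_dir would come from something like module_file():
resolver = make_resolver('/path/to/module')
data_file = resolver('myFile.RData')
```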
Implement something akin to the following syntax to selectively import only certain names (for which aliases may be specified):

```r
import(module_name, only(a, b, c = d))
```

This would load the module, but attach only `a`, `b` and `c`, with `c` referring to the name `d` inside the module.
```r
> expr = ma$CELsToExpression(DIR)
Error in `[.data.frame`(x, !gn$duplicated(x, ...), ) :
  object 'gn' not found
> gn = import('general')
> expr = ma$CELsToExpression(DIR)
>
```

Here, `gn` is imported in `ma` both times, but it needs to be imported in the top-level script as well for the `ma$CELsToExpression` function to find it.
Add a function `install_module` which allows specifying different source types (Git, Mercurial, local source, GitHub, …) and option sets tailored to the respective source type.
In particular for the following functions:

- `module_file` (and mention it in the `modules` help)
- `module_name`
- `module_file` instead of overriding `system.path`
- `module_file` running at module import time, rather than later on (i.e. when `module_file()` should equal `getwd()`)

At the moment only `'text'` is supported. Implement `'html'` as well, and use `rstudioapi` (?) to support RStudio’s built-in help.
I just found this via this SO question and it caught my eye. I think this is a nice idea and would make it a lot easier for people to wrap up their useful functions without the overhead of making a package. I've been interested in that idea for a while and one thing I did work on was creating help files for functions that aren't in a package - some of that work is here: https://github.com/Dasonk/flydoc
The solution I have isn't complete - I had a working case of using all of the first comments after the function declaration as the description for the help file but I wasn't in the habit of regularly pushing my commits and ended up losing some of that.
My question for you is how did you envision the 'help documentation' is stored in the users source files? This is your package but I'm interested in helping by forking and getting some of this functionality down but want to know what you had envisioned for that.
Currently, modules intentionally only allow exporting functions, not objects. Sometimes, however, it’s useful to export objects as well. Make this possible. Suggested implementation, using roxygen annotations:

```r
#' @export
the_object
```
There should be an `upgrade_all` function which upgrades all installed modules, regardless of their source, if their source is known at all (i.e. they were installed via a known mechanism). Rationale: R packages fail at this phenomenally. After updating to R 3.1, all my packages were lost: `installed.packages` didn’t list them, and `update.packages` updated nothing. That’s of course ridiculous. Furthermore, since not all packages are CRAN packages, just listing all installed package directories and running `install.packages` on them wouldn’t work (other sources include GitHub and Bioconductor). This is a sorry state of affairs, and entirely unnecessary.
S3 lookup requires S3 methods to be findable via the object search path. Explore how this can be done without importing every object. The following objects might be key to a solution; investigate them:

- `.__S3MethodsTable__.`
- `getS3method`
- `.knownS3Generics`
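For illustration, the hidden method table can be inspected like this (using the stats namespace as an arbitrary example):

```r
# Registered S3 methods of a package live in a hidden environment inside
# its namespace; getS3method() consults such tables during method lookup.
ns = asNamespace('stats')
methods_table = get('.__S3MethodsTable__.', envir = ns)
is.environment(methods_table)

# Look up a single method without attaching anything.
anova_print = getS3method('print', 'anova')
is.function(anova_print)
```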
`import('./foo')` or `import('../foo')` should not search `import.path`, and should always be relative to the current module’s path.
Currently, printing a module reveals a lot of cruft. This should be hidden by implementing `print.module` appropriately.
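A sketch of such a method (the `module` class name is assumed; modules’ real representation may differ):

```r
# Print only the module’s exported names instead of the raw environment.
print.module = function (x, ...) {
    cat('<module>\n')
    for (name in sort(ls(x)))
        cat('  ', name, '\n', sep = '')
    invisible(x)
}

# Usage sketch with a mock module environment:
m = structure(new.env(), class = 'module')
m$f = function () 42
print(m)
```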
Module-level documentation applies to the whole module rather than just a function. It can be specified by simply documenting a dummy object, e.g. `NULL`. It’s not clear how this help would be displayed, though: `?modulename` would be ambiguous if the name is masked by a function, and `?modulename$` is invalid syntax. Going the route of `help(module = 'modulename')` might be the best option.
MWE:

`a/__init__.r`:

```r
b = import('b')
```

`a/b.r`:

```r
f = function () 42
```

Observed effect:

```r
a = import('a')
ls(a)
# Actual:
# character(0)
# Expected:
# [1] "b"
```