klmr / box
Write reusable, composable and modular R code
Home Page: https://klmr.me/box/
License: MIT License
At the moment, modules’ `help`/`?` shadows devtools’ `help`/`?` and dispatches directly to utils’ `help`/`?`.
Test whether devtools is loaded and conditionally dispatch to it instead. In addition, use `setHook` to catch subsequent loading of devtools, and ensure that modules’ help system is still callable.
Unix handles its `$PATH` variable by searching every directory in the colon-separated list. It would be nice if modules’ `R_IMPORT_PATH` did the same.
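A minimal sketch of such colon-separated handling (`import_search_paths` is a hypothetical helper, not part of modules):

```r
# Split R_IMPORT_PATH on the platform’s path separator, the way Unix
# programs treat $PATH. .Platform$path.sep is ":" on Unix, ";" on Windows.
import_search_paths = function () {
    raw = Sys.getenv('R_IMPORT_PATH', unset = '')
    if (identical(raw, '')) character(0)
    else strsplit(raw, .Platform$path.sep, fixed = TRUE)[[1]]
}
```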
Every once in a while, it’s helpful for modules to be able to query their caller, i.e. the scope from which `import` was invoked to load them. Therefore, there should be a function `parent_module` (defined only inside modules, i.e. inside the `helper_env`), which returns that parent (roughly analogous to `parent.frame()`).

One use-case is to override symbols in the parent scope. This should be handled with care, of course, but is sometimes necessary when monkey-patching functions. A prominent example is fixing `library` by wrapping it inside `suppressMessages`, because many packages suffer from verbal diarrhoea. At the moment, I’m using `assign('library', new_library, envir = globalenv())` instead, which is obviously buggy.
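The monkey-patch in question looks roughly like this (a sketch; `quiet_library` is an illustrative name):

```r
# Wrap base::library so that package startup messages are suppressed.
quiet_library = function (...)
    suppressMessages(base::library(...))

# The current workaround: clobber ‘library’ in the global environment.
# A parent_module() function would allow targeting the importing module’s
# scope instead of unconditionally touching globalenv().
assign('library', quiet_library, envir = globalenv())
```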
`export_submodules` currently does not correctly load and/or attach the submodules’ documentation. Furthermore, the `module_help` function currently only supports non-nested object names of the form `a$b`, but not `a$b$c`.

Help for functions with non-legal names (e.g. operators) is currently not available under the actual name, since the internal dictionary uses roxygen2’s mangled names rather than the correct names. Circumvent roxygen2’s name-mangling mechanism and extract the real names instead.
Given a directory `a` with files `b.r` and `c/d.r`, implement a mechanism which enables the use of `a = import(a)`, and loads the appropriate submodules `a$b` and `a$c$d`.
Open questions:

- whether this should work without an `__init__.r` file
- how to handle nested `__init__.r` files
- whether `__init__.r` files are loaded in the right order

Currently modules’ `script_path` is aware of `Rscript` and `R CMD BATCH`, but not littler (except maybe incidentally). Ensure that it works.
The de-facto standard for R is to use doc comments and roxygen2 to transform those into Rd. Unfortunately, roxygen2 currently only supports parsing packages, not other structures (and in particular not single files). I’ve created r-lib/roxygen2#273 to address this. At the moment, the implementation relies on non-exposed private functions from the roxygen2 package and a bit of hackery, which is obviously not future-proof.
Not a big deal, but `??` is not an operator, it is just `?` defined in a tricky way. Also, `:=` is actually an operator: it is not used in R, but the parser recognizes it, and you can redefine it as you feel like:

```r
> `??` <- function(x, y) x + y
> 1 ?? 2
Error in help.search("2", fields = "1", package = NULL) :
  incorrect field specification
> `:=` <- function(x, y) x + y
> 1 := 2
[1] 3
```
Does this work?

```r
# a.r
import('b')
```

```r
# b.r
import('a')
```

Add a test case, and ensure that it has a proper, well-defined semantic with bounded execution time (although not necessarily regardless of inclusion order).
Similar to `PYTHONPATH` and `sys.path` in Python. It happens that I am running some R code in a temporary directory, as part of a build, and want to import some modules. I could use absolute file names, but some mechanism to set the import path would be better, I think.
MWE from ebits; both `b$minN` and `sum` have one required and one named default argument. The first example that throws an error here should work. This might be an R bug, though.
```r
library(modules)
b = import('base')
c(1:100) %>% b$minN
# Error in `c(1:100)`$b : 3 arguments passed to '$' which requires 2
c(1:100) %>% (b$minN)
# [1] 2
c(1:100) %>% b$minN(N=2)
# [1] 2
c(1:100) %>% sum
# [1] 5050
c(1:100) %>% sum(na.rm=T)
# [1] 5050
```
Does it make sense to byte-compile modules, either upon loading or (ideally) cached? Of course, this would not be as fast as using Rcpp, but it is a speed increase that comes at (almost) no cost to the end user.
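A sketch of the idea using the base `compiler` package (`compile_module` is a hypothetical helper; real caching would persist the compiled code):

```r
library(compiler)

# Byte-compile every function bound in a module’s environment, in place.
compile_module = function (module_env) {
    for (name in ls(module_env)) {
        value = get(name, envir = module_env)
        if (is.function(value))
            assign(name, cmpfun(value), envir = module_env)
    }
    invisible(module_env)
}
```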
`unload` and `reload` currently don’t detach globally (or otherwise) attached environments. At least for globally attached environments, this should change.
With a module `base` that defines an `%or%` operator, is there a way to use it when it is not attached?

```r
> library(modules)
> b = import('base', attach_operators=F)
> T %or% F
Error: could not find function "%or%"
> T b$%or% F
Error: unexpected symbol in "T b"
> T b$`%or%` F
Error: unexpected symbol in "T b"
> T `b$%or%` F
Error: unexpected symbol in "T b"
> T (b$`%or%`) F # suggestion by @gaborcsardi
Error: unexpected symbol in "T (b$`%or%`) F"
```

If not, it might make sense to not export operators when they are not attached.

Edit: this works:

```r
b$`%or%`(T, F)
```

Do we want it that way?
```r
> library(modules)
> options(import.path)
Error in options(import.path) : object 'import.path' not found
```
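For reference, `options` with a bare name evaluates it as a variable, hence the error above; the working forms are:

```r
options(import.path = '~/R/modules')   # set the option
getOption('import.path')               # read it back
options('import.path')                 # read it as a one-element list
```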
Unless explicitly specified as local (#18), make imports global-by-default. That is, `import('foo')` will first search the module search path for a module called `foo` before falling back to the current module’s local directory. In order to prioritise a local module, a relative import is required, e.g. `import('./foo')`.
Custom operators containing a `.` in their name are not exported, since they are incorrectly assumed to be S3 methods.
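A sketch of a fixed heuristic (function names are made up): operator names are `%`-delimited and can never be S3 methods, so they should be excluded before the dot check.

```r
# Current, faulty assumption: any dotted name is an S3 method.
has_dot = function (name)
    grepl('.', name, fixed = TRUE)

# Proposed: treat %-delimited names as operators, regardless of dots.
is_operator = function (name)
    grepl('^%.+%$', name)

should_export = function (name)
    is_operator(name) || ! has_dot(name)
```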
Allow importing some, but not all, submodules of a module, akin to Python’s `from foo import a, b, c`.
MWE:

```r
library(modules)
help(package = 'utils')
# Error in is_module_help_topic(topic, parent.frame()) :
#   argument "topic" is missing, with no default
```
See Demo: Foreign language interface. Prototyping via Rcpp works, but compiling modules requires a standardised installation procedure, which modules should ideally support by providing appropriate functions. The same goes for loading dynamic libraries, although that is already quite straightforward.
Consider the following module:

```
a
|- __init__.r
|- file.r
```

Editing and reloading does not reload a submodule’s changes:

```r
a = import('amodule')
# change file.r
reload(a) # this does not load the changes in file.r
```

Having `reload()` default to shallow reloading that only reloads the `__init__` file doesn’t make any sense. Either make shallow whole-module reloading the default, or full deep reloading.
`.` in the import search path corresponds to `getwd()` rather than `module_path()`, which means that running a file via `Rscript foo/bar.r` will result in a different path from `Rscript bar.r`. Actually, even `module_path()` will currently do the wrong thing here, for the same reason. We have to investigate `commandArgs(trailing=FALSE)` to get the real path.
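A sketch of the `commandArgs` approach (`real_script_path` is a hypothetical helper; littler and `R CMD BATCH` would need their own cases):

```r
# Recover the invoked script’s path from Rscript’s --file= argument, instead
# of relying on getwd(), which differs between ‘Rscript foo/bar.r’ and
# ‘Rscript bar.r’ run from inside foo.
real_script_path = function () {
    args = commandArgs(trailingOnly = FALSE)
    file_arg = grep('^--file=', args, value = TRUE)
    if (length(file_arg) == 0)
        return(NA_character_)  # not run via Rscript
    normalizePath(sub('^--file=', '', file_arg[[1L]]))
}
```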
Platform: Win7 x64

Error description:

```
installing source package 'modules' ...
R
inst
preparing package for lazy loading
help
Error in iconv(lines, encoding, "UTF-8", sub = "byte") :
embedded nul in string: '\title{Find a moduleb\0\031s source code location}'
ERROR: installing Rd objects failed for package 'modules'
```

Analysis: I’ve done some research on the internet and found a similar issue, [call to install_github() fails on Windows, but not on Linux or Mac](https://github.com/r-lib/devtools/issues/420). It seems that the error occurs because of some unsupported UTF-8 character. There’s a suggestion in the post above:

> pre-processing those files to replace non-ASCII with Rd codes would be nice. Then we could write using legible UTF-8, but R would still see what it currently wants.
At the moment, modules may call `base::attach` at certain points, and don’t change its default option `warn.conflicts = TRUE`. This can get very annoying in some circumstances, most notably by cluttering the output of R Markdown, etc.
Attaching modules in the global namespace, and in particular attaching the `operators:` namespace, happens regardless of whether the namespace in question is already in the `search()` path. This is annoying because it results in messages about hidden objects, and also because it potentially makes unloading and reloading brittle. A simple check whether the namespace in question is already attached, before attaching it again, should fix this.
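The proposed guard is nearly a one-liner (a sketch; modules’ actual attach logic has more moving parts):

```r
# Only attach an environment when its name is not already on the search
# path, and silence conflict warnings while doing so.
attach_if_absent = function (env, name) {
    if (! name %in% search())
        attach(env, name = name, warn.conflicts = FALSE)
    invisible()
}
```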
The following fails:

`scripts/test.rmd`:

````
```{r}
library(modules)
import('a')
```
````

`scripts/a.r`:

```r
cat('Hello\n')
```

And then, from the command line:

```shell
mkdir test
./scripts/knitr -n scripts/test.rmd -o test/test.md
```

The result file contains:

```
## Error: Unable to load module a; not found in "scripts"
```

The `knitr` executable is available as a gist. I have forgotten where I got it from.
Currently, when searching for a submodule, a path is only considered if it contains an `__init__.r` file, all the way up the fully qualified module name. Drop this requirement for ease of use; arguably, there’s no great disadvantage to doing so. This also implies changes to the initialisation of nested modules when loading a submodule, since not all modules might have an initialiser to execute. See #12 for discussion.
`a.r`:

```r
message(module_name())
```

`main.r`:

```r
modules::import('a')
```

This yields:

```
Error in message(module_name()) : could not find function "module_name"
```

because `main.r` didn’t attach modules. This should work: inside a module, the module-specific functions should always be attached. This can be implemented by checking the `search()` path inside `import` and temporarily `attach`ing modules as necessary.
Right now, when loading a module that has functions and operators, they can be imported using `mod = import(module)`. The functions are usable as `mod$function`, but there is no way to access the operators:

```r
> library(modules)
> op = import(operators)
> 3 %or% 5
Error: could not find function "%or%"
> 3 op$`%or%` 5
Error: unexpected symbol in "3 op"
```

Attaching of course works, but that would attach the functions as well:

```r
> import(operators, attach=T)
> 3 %or% 5
[1] 3
```

So I would suggest attaching operators by default.
Add a mechanism to make installed packages available via `import`. This would have the following advantages over `library`:

- Uniform API for accessing library code
- Doesn’t clutter the search path by default, which encourages better style
- Allows adapting the name for accessing packages (instead of having a hard-coded `pkg_name::`)
- Supports loading and using packages locally only; consider:

```r
frob = function (args) {
    import(package = 'pkg', attach = TRUE)
    …
}
```

Now `pkg` is attached, but only inside `frob`, just like normal modules, but unlike loading a package via `library`.
R source files don’t allow specifying an encoding. Since everything but Unicode is nonsense in this day and age, `import` should by default load all files as UTF-8 (to ensure that files containing UTF-8 work on Windows); see also #19. The question is whether we can afford to be opinionated, whether `import` needs an argument to override file encodings, or whether we copy PEP 263 and allow specifying the source file’s encoding inside the file.
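The opinionated default could be as simple as the following sketch (`read_module_source` is an illustrative name):

```r
# Read a module source file as UTF-8, regardless of the native locale.
read_module_source = function (path) {
    con = file(path, encoding = 'UTF-8')
    on.exit(close(con))
    readLines(con, warn = FALSE)
}
```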
This would probably be required before supporting #23 (upgrades), I guess. It would be great to support semantic versioning. (By the way, it would also be great to have versioned dependencies, but I don’t want to rush ahead.)
Calling reload on a module whose name shadows the name of another object or function in a parent environment causes that name to be overridden.
There are two possibilities to use packages or package functions inside a module:

```r
library(mypackage)
# or
mypackage::method()
```

Disregarding that the latter is preferred so as not to shadow objects in `.GlobalEnv`, either call may occur somewhere in a given module.
The question is: is there a good way to track package dependencies of modules, without having to look for these calls in each module file?
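One conceivable answer, sketched below: statically scan each module file’s parse tree for `library`/`require` calls and `pkg::` references (the helper name is made up).

```r
# Walk the parse tree of a source file, collecting package names mentioned
# in library()/require() calls and in pkg::name references.
module_dependencies = function (path) {
    deps = character(0)
    walk = function (e) {
        if (is.call(e)) {
            head = e[[1L]]
            if (identical(head, quote(library)) || identical(head, quote(require)))
                deps <<- c(deps, as.character(e[[2L]]))
            else if (identical(head, quote(`::`)) || identical(head, quote(`:::`)))
                deps <<- c(deps, as.character(e[[2L]]))
            for (part in as.list(e))
                walk(part)
        }
    }
    for (expr in parse(path))
        walk(expr)
    unique(deps)
}
```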
Ideally, I would like to see something that provides a way for modules to automatically install these packages upon module loading.

I was trying this module system; since I come from a Node.js background, I think that R needs something like this really badly (congrats!).
However, I can only get the simplest case, same-folder modules, to work. When I try to import modules in nested folders, it doesn’t work. In my test scenario, I’ve got a `config/log.r` module I want to import. If I cd to `config` and import the module with `import("log")`, it works. However, if I launch R from `config`’s parent folder, it doesn’t.
```r
require(modules)
import("config/log")
# Error in find_module("config/log") :
#   Unable to load module config/log; not found in "/Users/santillan/dev/solvview/analitix-r/src"
```

It seems that `is_valid_nested` considers my module not valid. Is it mandatory to create a `log` folder with an `__init__.r` file? Or should this just work?
`script1.R`:

```r
x = function() {
    load('myFile.RData')
    return(myContent)
}
```

`dir/script2.R`:

```r
library(modules)
y = import('script1.R')
y$x() # FileNotFound error
```

I’m not sure how to solve this best. Forcing the user to load all files on module init is not good for modules that interface with a lot of large files. Also, you can’t `setwd()` on every function call, because that would break passing file paths to module functions that are relative to the calling script.
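A common middle ground, sketched with made-up names: capture the module’s directory once at load time and resolve data paths against it at call time, so neither eager loading nor `setwd()` is needed.

```r
# Create a path resolver bound to the module’s own directory.
make_resolver = function (module_dir) {
    force(module_dir)
    function (filename)
        file.path(module_dir, filename)
}

# Inside the module, module_dir would come from something like module_file():
resolver = make_resolver('/path/to/module')
data_file = resolver('myFile.RData')
```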
Implement something akin to the following syntax to selectively import only certain names (for which aliases may be specified):

```r
import(module_name, only(a, b, c = d))
```

This would load the module, but attach only `a`, `b` and `c`, with `c` referring to the name `d` inside the module.
```r
> expr = ma$CELsToExpression(DIR)
Error in `[.data.frame`(x, !gn$duplicated(x, ...), ) :
  object 'gn' not found
> gn = import('general')
> expr = ma$CELsToExpression(DIR)
>
```

Here, `gn` is imported in `ma` both times, but it needs to be imported in the top-level script as well for the `ma$CELsToExpression` function to find it.
Add a function `install_module` which allows specifying different source types (Git, Mercurial, local source, GitHub, …) and option sets tailored to the respective source type.
In particular for the following functions:

- `module_file` (and mention it in the `modules` help)
- `module_name`
- `module_file` instead of overriding `system.path`
- `module_file` running at module import time, rather than later on (i.e. when `module_file()` should equal `getwd()`)

At the moment only `'text'` is supported. Implement `'html'` as well, and use `rstudioapi` (?) to support RStudio’s built-in help.
I just found this via this SO question and it caught my eye. I think this is a nice idea and would make it a lot easier for people to wrap up their useful functions without the overhead of making a package. I've been interested in that idea for a while and one thing I did work on was creating help files for functions that aren't in a package - some of that work is here: https://github.com/Dasonk/flydoc
The solution I have isn't complete - I had a working case of using all of the first comments after the function declaration as the description for the help file but I wasn't in the habit of regularly pushing my commits and ended up losing some of that.
My question for you is how did you envision the 'help documentation' is stored in the users source files? This is your package but I'm interested in helping by forking and getting some of this functionality down but want to know what you had envisioned for that.
Currently, modules intentionally only allow exporting functions, not objects. Sometimes, however, it’s useful to export objects as well. Make this possible. Suggested implementation, using roxygen annotations:

```r
#' @export
the_object
```
There should be an `upgrade_all` function which upgrades all installed modules, regardless of their source, if their source is known at all (i.e. they were installed via a known mechanism). Rationale: R packages fail at this phenomenally. After updating to R 3.1, all my packages were lost: `installed.packages` didn’t list them, and `update.packages` updated nothing. That’s of course ridiculous. Furthermore, since not all packages are CRAN packages, just listing all installed package directories and running `install.packages` on them wouldn’t work (other sources include GitHub and Bioconductor). This is a sorry state of affairs, and entirely unnecessary.
S3 lookup requires S3 methods to be findable via the object search path. Explore how this can be done without importing every object. The following objects might be key to a solution; investigate them:

- `.__S3MethodsTable__.`
- `getS3method`
- `.knownS3Generics`
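For illustration, the hidden method table can be inspected like this (using the stats namespace as an arbitrary example):

```r
# Registered S3 methods of a package live in a hidden environment inside
# its namespace; getS3method() consults such tables during method lookup.
ns = asNamespace('stats')
methods_table = get('.__S3MethodsTable__.', envir = ns)
is.environment(methods_table)

# Look up a single method without attaching anything.
anova_print = getS3method('print', 'anova')
is.function(anova_print)
```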
`import('./foo')` or `import('../foo')` should not search `import.path`, and should always be relative to the current module’s path.
Currently, printing a module reveals a lot of cruft. This should be hidden by implementing `print.module` appropriately.
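A sketch of such a method (the `module` class name is assumed; modules’ real representation may differ):

```r
# Print only the module’s exported names instead of the raw environment.
print.module = function (x, ...) {
    cat('<module>\n')
    for (name in sort(ls(x)))
        cat('  ', name, '\n', sep = '')
    invisible(x)
}

# Usage sketch with a mock module environment:
m = structure(new.env(), class = 'module')
m$f = function () 42
print(m)
```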
Module-level documentation applies to the whole module rather than just a function. It can be specified by simply documenting a dummy object, e.g. `NULL`. It’s not clear how this help would be displayed, though: `?modulename` would be ambiguous if the name is masked by a function, and `?modulename$` is invalid syntax. Going the route of `help(module = 'modulename')` might be the best option.
MWE:

`a/__init__.r`:

```r
b = import('b')
```

`a/b.r`:

```r
f = function () 42
```

Observed effect:

```r
a = import('a')
ls(a)
# Actual:
# character(0)
# Expected:
# [1] "b"
```