
projr's Introduction

projr


The goal of projr is to ...

Installation

You can install the development version of projr from GitHub with:

# install.packages("devtools")
devtools::install_github("SATVILab/projr")

Example

This is a basic example which shows you how to solve a common problem:

library(projr)
## basic example code

projr's People

Contributors

miguelrodo

projr's Issues

Identify how to archive well

Option 1

  • Automatically name initial report <project_name>V1 in _bookdown.yml$book_filename.
  • Allow an automatic versioning check, which:
    • Bumps version up.
    • Only runs if the Git working directory is clean (no uncommitted changes), and records the last Git commit.
    • Copies outputted document to .archive.
    • Requests an update to NEWS.md.
    • Possibly auto-generates a detailed changes file, which shows changes to "important" objects (such as files in "data/").
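A minimal R sketch of the Option 1 steps, assuming Git is on the PATH and the version is a plain integer following `V` in the book file name. The helpers `book_filename()`, `git_is_clean()`, `archive_report()` and `bump_and_archive()` are hypothetical names for illustration, not part of projr; the NEWS.md prompt and the changes file are left out.

```r
# Sketch of "Option 1" archiving. All helper names are hypothetical, not projr's API.

# Build the bookdown book_filename, e.g. "MyProjectV1", for a given version
book_filename <- function(project_name, version) {
  paste0(project_name, "V", version)
}

# TRUE when `git status --porcelain` reports no uncommitted or untracked files
git_is_clean <- function() {
  out <- suppressWarnings(system2("git", c("status", "--porcelain"), stdout = TRUE))
  if (!is.null(attr(out, "status"))) stop("`git status` failed; is this a Git repository?")
  length(out) == 0
}

# Copy the rendered document into archive_dir, refusing to overwrite an old copy
archive_report <- function(report_path, archive_dir = ".archive") {
  dir.create(archive_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(archive_dir, basename(report_path))
  if (!file.copy(report_path, dest, overwrite = FALSE)) {
    stop("Could not archive to ", dest, " (does it already exist?)")
  }
  invisible(dest)
}

# Only archive and bump when the working directory is clean; record the commit
bump_and_archive <- function(project_name, version, report_path) {
  if (!git_is_clean()) stop("Commit your changes before bumping the version.")
  last_commit <- system2("git", c("rev-parse", "HEAD"), stdout = TRUE)
  dest <- archive_report(report_path)
  list(
    book_filename = book_filename(project_name, version + 1),
    last_commit = last_commit,
    archived_to = dest
  )
}
```

Refusing to overwrite in `archive_report()` is a design choice: an archived version should never be silently replaced.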

Develop motivation for `_projr.yml`

  • The purpose is to set directories for which R has no real standard location, and about which there might be disagreement or good reason to choose between different options
    • Example of a good reason
      • Keeping large raw data and output in /scratch but code in /home

Review Quarto

Quarto

  • Quarto essentially provides a more feature-rich, user-friendly interface to RMarkdown-style Markdown and extends it to Python, Julia and Observable.

What does Quarto not do (out of the box) that I am proposing?

  • Allow user-specific project directories (such as dir_cache and dir_archive)
  • Work directly with bookdown
  • Implement automated but flexible versioning
  • Facilitate easier raw data management
  • Implement Docker containers easily
  • Create nice in-folder log messages
  • Create set-up document with specific things to change
  • Wrap into an R package
  • Wrap useful Git workflows
  • Set

Review prodigenr

Aims to generate a project directory structure.

  • Source
  • Pros
    • Generates project directory structure
  • Cons
    • Doesn't seem to allow user-specific project directory structures

Review starters package

Sets up an R project

  • Source
  • Pros
    • Sets up project structure
    • Aims to review project health report
  • Cons
    • Last changed a couple of years ago
      • Doesn't seem to be taking on templates then
    • Does not seem to use bookdown
  • Overall
    • Seems to aim at something similar to what we want to do, but no longer appears to be maintained.

Review `workflowr`

Generates a website that is reproducible in the sense that it is time-stamped and versioned.

  • Source
  • Pros
    • Explicit versioning upon publication
      • Uses Git to do so.
    • Runs analyses in isolated R sessions
    • Records session information from each analysis
    • Sets seeds
    • Able to set Git up for you using workflowr commands
  • Cons
    • Only outputs HTML format
    • Fixed R folder directory
    • Does not build into a package, which is not ideal.
    • Does not use renv automatically
    • Have to learn to use bookdown for non-HTML multi-page documents
  • Overall
    • No real reason not to use a project structure that also builds into a package
      • Not everything is a report

Consider thesisdown

  • Pros
    • Automatically includes sections you would want, such as:
      • Abstract
      • Appendix
      • Acknowledgments
  • Cons
    • No automatic versioning

Wrap useful Git workflows

  • Examples
    • Build on your own dev branch
    • Upon (successful) major rebuild:
      • Squash commits since last major rebuild/first commit
      • Prompt for the commit message
      • Push latest commit
      • Submit pull request/merge into main branch
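The major-rebuild workflow above could be wrapped in R roughly as follows. This is a sketch under several assumptions: `squash_and_submit()` is a hypothetical name, `base_ref` is whatever commit or tag marks the last major rebuild, and the pull request is opened with the GitHub CLI (`gh`). With `run = FALSE` (the default) it only returns the commands, so it can be inspected safely.

```r
# Hypothetical helper (not part of projr): squash, commit, push and open a PR.
squash_and_submit <- function(base_ref, message = NULL, branch = "dev",
                              target = "main", run = FALSE) {
  # Prompt for the commit message if none was supplied
  if (is.null(message)) message <- readline("Commit message: ")
  msg <- shQuote(message)
  cmds <- list(
    # Move HEAD back to the last major rebuild, keeping all later changes staged
    c("git", "reset", "--soft", base_ref),
    # Record everything since then as a single commit
    c("git", "commit", "-m", msg),
    # History was rewritten, so push with a guarded force
    c("git", "push", "--force-with-lease", "origin", branch),
    # Open a pull request from the dev branch into the main branch
    c("gh", "pr", "create", "--base", target, "--head", branch,
      "--title", msg, "--body", msg)
  )
  if (run) {
    for (cmd in cmds) {
      status <- system2(cmd[1], cmd[-1])
      if (status != 0) stop("Command failed: ", paste(cmd, collapse = " "))
    }
  }
  invisible(cmds)
}
```

Squashing with `git reset --soft` rewrites history, which is why the push uses `--force-with-lease` rather than a plain push; that is only reasonable on a personal dev branch, as the workflow assumes.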

Review projects package

  • Source: https://github.com/NikKrieger/projects
  • Pros
    • Allows archiving of projects
    • Has a central directory of author information
    • Prints useful snippets to the console
  • Cons
    • A central database just seems cumbersome and not portable
      • A person may have different projects in different places
    • Uses .rds files, which do not translate across users.
    • Seems like a lot of extra stuff to remember that doesn't really help that much.
    • Does not use bookdown.
    • Seems like a real commitment to their format.
    • Archiving not linked to Git commits.
  • Overall
    • Seems like quite a commitment for author metadata and project folders
      • Author metadata is something you'll have to manually enter time and again anyway.

Conceptualise `projr`

  • bookdown-centric
    • Allows cross-linking, multi-page output, multiple formats, established, R-specific
    • Most natural progression from RMarkdown
      • Very easy to use - one extra file to understand! (_bookdown.yml)
  • Follow R package structure where applicable
    • R/, data/
  • Allow user-specifiable and re-usable structures for non-R package folders
    • Where Rmd's go
    • Intermediate results
    • Final output
    • Author details (use usethis)
  • Clean, natural versioning
    • Automatically increment report version number and tag with last Git commit when the report is built under the following conditions:
      • User requests it
      • Git working directory is clean (no uncommitted changes)
      • Possibly:
        • data/ folder is empty
  • Save key outputs to output/sharing/publish folder:
    • Processed data
      • In R and non-R format (as applicable)
    • Link to raw data
      • Raw data should already be in another folder
    • Tarball of package (if it is a package)
  • Allow publishing to website automatically
    • Create orphaned gh-pages branch that contains just the last report
  • Automatically set up key tools:
    • git
    • renv
  • Log raw data used
    • Keep an md file within each raw data folder
    • Keep a hash table of every file within the raw data
  • Appendices summarising useful information
    • utils::sessionInfo() from when book was last built
    • Put together md's from raw data folder(s)
    • Possibly summarise raw data (# of files/directory).
    • Project folder structure
    • Summarise changes in Git between versions
      • List all commits
      • List all files that have changed
  • Zero (or near-zero) dependencies for install
    • Dependencies are unnecessary when you're just loading data or trying to access a link to view a report.
  • Position on Docker
    • In R, your main dependencies are already
    • Can build on cluster if you want a different computer.
    • Can just wipe out the cache folder (possibly the raw data directory) and recompute the renv/ library if you're worried.
      • Can clone to a different location and set up again there.
  • Templates
    • Examples
      • README for data package projects
      • README that specifies project structure automatically
        • As well as links to raw data
      • Analysis Rmd
        • Gives sections and Markdown comments saying what to put in each section
          • Examples of headings:
            • Aims, Data (sample size), Methods, Biological information, Study design
  • Possible extras:
    • Only rerun report if raw data, R files, Rmd's or key package versions in lock file have changed.
    • Add functions for package even if it's just a report:
      • Link to download raw data
      • Link to view report online
      • Link to repo
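As one illustration of the appendices idea, the sketch below writes an appendix R Markdown file holding `utils::sessionInfo()` output and the project folder structure at build time. `write_appendix()` and the `appendix.Rmd` file name are hypothetical; the Git-changes and raw-data summaries are omitted.

```r
# Hypothetical helper: write an appendix with session info and folder structure.
write_appendix <- function(path = "appendix.Rmd", project_dir = ".") {
  fence <- strrep("`", 3)  # plain code fence around the session info
  session <- utils::capture.output(print(utils::sessionInfo()))
  files <- list.files(project_dir, recursive = TRUE)
  lines <- c(
    "# Appendix {-}", "",
    "## Session information {-}", "",
    fence, session, fence, "",
    "## Project folder structure {-}", "",
    if (length(files)) paste0("- ", files) else "(no files)"
  )
  writeLines(lines, path)
  invisible(path)
}
```

Calling it just before rendering means the session information reflects the build that produced the book.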

Allow specifying alternative `_projr.yml` during project initialisation

  • I think the best way would be to:
    • Allow setting the PROJR_YML_DEFAULT_PATH environment variable, which, if set, is used automatically whenever path is not specified.
    • Suggest to people that they save the paths as environment variables, and then use projr::projr_init(path = NULL, path_env = <env_var_name>).
  • BETTER suggestion:
    • Allow people to set PROJR_YML_PATH_<settings_id>, and then when they run projr::projr_init, they get the following options for _projr.yml:
    • 1: `projr` package default
    • 2: <settings_id_1>
    • 3: <settings_id_2>
    • ...
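A sketch of how the `PROJR_YML_PATH_<settings_id>` suggestion might work in R. It only reads environment variables and offers a menu; `projr_yml_options()` and `choose_projr_yml()` are hypothetical names, and using `NA` to mean "the package default" is an assumption.

```r
# Hypothetical helpers, not projr's API.

# Collect PROJR_YML_PATH_<settings_id> variables as named choices,
# with the projr package default always listed first (NA = use the default)
projr_yml_options <- function(prefix = "PROJR_YML_PATH_") {
  vars <- names(Sys.getenv())
  vars <- sort(vars[startsWith(vars, prefix)])
  paths <- vapply(vars, Sys.getenv, character(1), USE.NAMES = FALSE)
  ids <- substring(vars, nchar(prefix) + 1)
  c("projr package default" = NA_character_, stats::setNames(paths, ids))
}

# Ask the user which _projr.yml to use during initialisation
choose_projr_yml <- function() {
  opts <- projr_yml_options()
  choice <- utils::menu(names(opts), title = "Which _projr.yml should be used?")
  if (choice == 0) stop("No _projr.yml selected.")
  opts[[choice]]
}
```

Listing the package default first keeps option 1 stable regardless of which variables a user has set, matching the numbering in the list above.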

Facilitate better handling of raw data

Possible features of raw data

  • Make it a GitHub repo
  • Allow uploading only the last commit as a branch to GitHub
  • Check that every item is documented
  • Create structure for automated commit message
  • Ensure that old versions are retained when uploading to Google Drive (or prompt a reminder to do so, for example by creating a GitHub issue)
  • Detect if raw data have changed since last log message
  • Record when the raw data were last modified and what they contained (i.e. the contents of each folder)
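Detecting whether raw data have changed since the last log could rest on a per-file hash table, as proposed under "Log raw data used" in the projr concept. A minimal sketch using base R's `tools::md5sum()`; `hash_raw_data()` and `raw_data_changes()` are hypothetical helpers, and comparing two tables this way is one possible approach, not a settled design.

```r
# Hypothetical helpers, not projr's API.

# Build a hash table (relative path + MD5) for every file under `dir`
hash_raw_data <- function(dir) {
  rel <- list.files(dir, recursive = TRUE)
  data.frame(
    file = rel,
    md5 = unname(tools::md5sum(file.path(dir, rel))),
    stringsAsFactors = FALSE
  )
}

# Compare two hash tables and report added, removed and modified files
raw_data_changes <- function(old, new) {
  m <- merge(old, new, by = "file", all = TRUE, suffixes = c("_old", "_new"))
  both <- !is.na(m$md5_old) & !is.na(m$md5_new)
  list(
    added = m$file[is.na(m$md5_old)],
    removed = m$file[is.na(m$md5_new)],
    modified = m$file[both & m$md5_old != m$md5_new]
  )
}
```

Between builds, the table could be saved with `write.csv(hash_raw_data("data-raw"), "raw_data_hashes.csv", row.names = FALSE)` and read back with `read.csv()` for the comparison; both names are placeholders.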

Add essentials of build function

Essential features

  • Differentiate between major and minor builds
  • Allow manual version bumping
  • #143
  • Record package state if build was successful
  • Output version bump message
  • #142
  • #135
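Differentiating major and minor builds largely comes down to which component of the version number is bumped. A sketch assuming a `major.minor.patch` scheme; `bump_version()` is a hypothetical name, and projr's build function may number versions differently. It also prints a version bump message and can be called directly for manual bumping.

```r
# Hypothetical helper: bump a "major.minor.patch" version string.
bump_version <- function(version, type = c("minor", "major", "patch")) {
  type <- match.arg(type)
  v <- unclass(package_version(version))[[1]]      # e.g. c(0L, 1L, 3L)
  v <- c(v, rep(0L, max(0, 3 - length(v))))[1:3]   # pad "1.2" to 1.2.0
  i <- switch(type, major = 1L, minor = 2L, patch = 3L)
  v[i] <- v[i] + 1L
  if (i < 3L) v[(i + 1L):3L] <- 0L                 # reset lower components
  new <- paste(v, collapse = ".")
  message("Version bumped from ", version, " to ", new, " (", type, " build)")
  new
}
```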

Non-essential features

  • Consider enforcing that the Git working directory is clean before build
  • Consider automatic creation of Git commits
  • Consider notifying team members if successful

Identify standard components to `_output.yml`

Probably a good idea to use the bookdown defaults, but replace the references to bookdown with references to the project or its GitHub repository:

bookdown::gitbook:
  css: style.css
  config:
    toc:
      before: |
        <li><a href="./">A Minimal Book Example</a></li>
      after: |
        <li><a href="https://github.com/rstudio/bookdown" target="blank">Published with bookdown</a></li>
    edit: https://github.com/USERNAME/REPO/edit/BRANCH/%s
    download: ["pdf", "epub"]
bookdown::pdf_book:
  includes:
    in_header: preamble.tex
  latex_engine: xelatex
  citation_package: natbib
  keep_tex: yes
bookdown::epub_book: default

For example, for the SATVILab/DataTidy22BCGCorrPilot repository, the above would become:

bookdown::gitbook:
  css: style.css
  config:
    toc:
      before: |
        <li><a href="./">DataTidy22BCGCorrPilot</a></li>
      after: |
        <li><a href="https://github.com/SATVILab/DataTidy22BCGCorrPilot" target="blank">DataTidy22BCGCorrPilot</a></li>
    edit: https://github.com/SATVILab/DataTidy22BCGCorrPilot/edit/BRANCH/%s
    download: ["pdf", "epub"]
bookdown::pdf_book:
  includes:
    in_header: preamble.tex
  latex_engine: xelatex
  citation_package: natbib
  keep_tex: yes
bookdown::epub_book: default
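The substitution shown above is mechanical, so it could be automated. A sketch that rewrites the lines of bookdown's default `_output.yml`; `customise_output_yml()` is a hypothetical helper, and it assumes the default text matches the version shown above exactly. `BRANCH` is deliberately left untouched.

```r
# Hypothetical helper: swap bookdown's default references for the project's.
customise_output_yml <- function(lines, owner, repo) {
  repo_url <- paste0("https://github.com/", owner, "/", repo)
  lines <- sub("A Minimal Book Example", repo, lines, fixed = TRUE)
  lines <- sub("https://github.com/rstudio/bookdown", repo_url, lines, fixed = TRUE)
  lines <- sub("Published with bookdown", repo, lines, fixed = TRUE)
  lines <- sub("USERNAME/REPO", paste0(owner, "/", repo), lines, fixed = TRUE)
  lines
}
```

For instance, `writeLines(customise_output_yml(readLines("_output.yml"), "SATVILab", "DataTidy22BCGCorrPilot"), "_output.yml")` would produce the second version above from the first.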

Review `cabinets` package

Sets up a re-usable project template

  • Source
  • Pros
    • Re-usable, flexible templates that don't require synchronising with an online version
  • Cons
    • Doesn't do anything, really, besides the file structure.
      • Seems like we could just implement this kind of idea ourselves?
        • This is basically what the _projr.yml file is.
