Universal data analysis pipeline

Nextflow-based pipeline to run and deploy reproducible analyses. Alongside the pipeline, I developed the toolbox reportsrender to execute notebooks, but the pipeline can also be used without it.

Features

  • render Jupyter notebooks or Rmarkdown notebooks (papermill/knitr)
  • ensure reproducible analyses
  • deploy reports to GitHub Pages

Structure

  • analyses: The actual analysis steps (i.e. jupyter notebooks, Rmarkdown documents, bash scripts) go here.
  • bin: scripts that can be called from nextflow directly (nextflow will add them to the PATH for commands run from a process).
  • data: input data for the notebooks. I often replace this with a symlink to some data storage.
  • deploy: final reports. This directory is filled by the deploy process, which copies all HTML reports into it and creates an index file. A great way to share the final reports is to push this directory to GitHub Pages.
  • envs: conda environment files go here. Create one file per notebook, or re-use environments for multiple notebooks -- it's up to you.
  • lib: put custom libraries (e.g. python modules) here.
  • results: final results generated by the pipeline go here. Concept: one can always delete the results directory and re-generate it from data using the pipeline.
  • tables: manually created input data that I want to be under version control. E.g. the list of samples and the associated patient data that you had to compile manually from three excel sheets because the biologists encoded data as background-color.
  • main.nf: The nextflow workflow that ties everything together.
  • nextflow.config: Contains configuration options for the pipeline (e.g. output directory). You can also set options here to run the pipeline on an HPC grid engine (e.g. SGE or SLURM).
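
To make the deploy step from the list above more concrete, here is a minimal sketch in plain shell of what such a step might do: copy the HTML reports into the deploy directory and write an index file that links to them. This is not the process defined in main.nf. The function name deploy_reports and the assumption that all reports sit directly in one results directory are mine, for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the deploy step (not the actual process in main.nf):
# copy HTML reports into the deploy directory and write a simple index page.

deploy_reports() {
    local results_dir="$1"   # directory containing rendered *.html reports
    local deploy_dir="$2"    # target directory, e.g. deploy
    local report name

    mkdir -p "$deploy_dir"

    # Copy every HTML report found directly in the results directory.
    for report in "$results_dir"/*.html; do
        [ -e "$report" ] || continue   # the glob matched nothing
        cp "$report" "$deploy_dir/"
    done

    # Write a minimal index page with one link per copied report.
    {
        echo "<html><body><h1>Reports</h1><ul>"
        for report in "$deploy_dir"/*.html; do
            [ -e "$report" ] || continue
            name="$(basename "$report")"
            # The index itself already exists at this point; do not link to it.
            if [ "$name" != "index.html" ]; then
                echo "<li><a href=\"$name\">$name</a></li>"
            fi
        done
        echo "</ul></body></html>"
    } > "$deploy_dir/index.html"
}
```

Calling `deploy_reports results deploy` from the repository root would then leave a deploy folder that can be zipped or pushed as-is.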

How to run

  1. Install nextflow. In this case, we use conda. Check the nextflow website for other options.
conda create -n nextflow -c conda-forge -c bioconda nextflow
conda activate nextflow
  2. Clone this repository
git clone git@github.com:grst/universal_analysis_pipeline.git
cd universal_analysis_pipeline
  3. Run the pipeline
./main.nf
  4. Share the results. You can zip and email the deploy folder. Even better is to share the results using GitHub Pages:
  • To set up GitHub Pages, initialize a repository in the deploy folder and push to the gh-pages branch:
cd deploy
git init
git remote add origin <YOUR_REMOTE>
git checkout --orphan gh-pages
git add -A .
git commit -m "Initial deploy on gh-pages"
git push -u origin gh-pages
  • It can take a few minutes, but eventually your reports will be available at https://<yourgithubuser>.github.io/<yourrepo>

  • You might want to "password protect" your pages. This is not natively supported by GitHub pages, but a workaround is to put all files in a cryptic subfolder, e.g. rBymGubVBBrdHtGo6Of35E3uI. As GitHub pages doesn't list directories, you need to know the precise URL to access the folder. You can adjust the deploy dir in nextflow.config.
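
If you do not want to invent the cryptic folder name by hand, a small shell function can generate one. The following is a sketch under my own assumptions: the function name random_folder_name and the default length of 25 characters are illustrative, and it only produces the name. You still need to point the deploy dir in nextflow.config at the resulting subfolder yourself.

```shell
#!/usr/bin/env bash
# Sketch: generate a hard-to-guess, URL-safe folder name.
# Function name and default length are assumptions for illustration.

random_folder_name() {
    local length="${1:-25}"
    local token=""
    # Collect random bytes and keep only alphanumeric characters
    # until enough of them have accumulated.
    while [ "${#token}" -lt "$length" ]; do
        token="$token$(head -c 64 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9')"
    done
    # Trim to the requested length.
    printf '%s\n' "$token" | cut -c "1-$length"
}
```

For example, `mkdir -p "deploy/$(random_folder_name)"` creates a subfolder whose name is hard to guess.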

How to use

This repository is meant as a template. You can fork or clone it and expand from there. At a minimum, you need to change two things:

  • Add your notebooks to the analyses folder
  • Edit main.nf to wire your notebooks together the right way. You can use reportsrender to execute the notebooks.
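
While wiring things up, it is easy to add a notebook to analyses and forget to reference it in main.nf. The helper below is a rough sketch for spotting such notebooks: it lists every .ipynb or .Rmd file in the analyses folder whose file name does not occur anywhere in the workflow file. The function name list_unwired_notebooks is hypothetical, and the check is a plain text match, so a file name that only appears in a comment still counts as wired.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print notebooks from the analyses folder whose file
# name is not mentioned in main.nf (candidates that still need wiring).

list_unwired_notebooks() {
    local analyses_dir="${1:-analyses}"
    local workflow_file="${2:-main.nf}"
    local notebook name

    for notebook in "$analyses_dir"/*.ipynb "$analyses_dir"/*.Rmd; do
        [ -e "$notebook" ] || continue   # the glob matched nothing
        name="$(basename "$notebook")"
        # -F matches the file name literally, -q suppresses output.
        if ! grep -qF -- "$name" "$workflow_file"; then
            echo "$name"
        fi
    done
}
```

Run without arguments from the repository root, it checks analyses against main.nf. Empty output means every notebook is mentioned at least once.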

Ideas for the future:

  • convert conda envs to singularity containers to ensure reproducibility.
