cleanse

Overview

The SummarizedExperiment (se) class offers a useful way to store row and column metadata alongside the values from an experiment, and is widely used in computational biology.
Although se's can be subset with base R notation (i.e. using []), they cannot be manipulated using the grammar of the tidyverse. As a consequence, se's cannot be used in pipelines built with the pipe operator.

This package contains a number of wrapper functions to extend the usage of se's:

  • dplyr functions: to use dplyr's grammar of data manipulation
  • arithmetic functions: to perform arithmetic on 2 se's
  • write functions: to print the options of a se and to write se's to delimited files

As an example, compare how rows are subset for gene_group NOTCH and the columns then arranged by patient, first with native syntax and then with cleanse.

Using native syntax:

rowdata <- rowData(se)
se <- se[rowdata$gene_group == "NOTCH", ]
se <- se[, order(se$patient)]

Using cleanse:

se <- se %>%
  filter(row, gene_group == "NOTCH") %>%
  arrange(col, patient)
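The native-syntax variant can be tried on a small toy se. The sketch below is illustrative only: the matrix values, gene names, and sample names are made up, and only the SummarizedExperiment package is assumed to be installed.

```r
library(SummarizedExperiment)

# Toy data: 4 genes x 3 samples (all names and values are illustrative)
counts <- matrix(1:12, nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))
se <- SummarizedExperiment(
  assays  = list(counts = counts),
  rowData = DataFrame(gene_group = c("NOTCH", "WNT", "NOTCH", "WNT"),
                      row.names = rownames(counts)),
  colData = DataFrame(patient = c(3, 1, 2), row.names = colnames(counts))
)

# Native syntax: keep the NOTCH rows, then order the columns by patient
rowdata <- rowData(se)
se <- se[rowdata$gene_group == "NOTCH", ]
se <- se[, order(se$patient)]

rownames(se)  # genes that were kept
colnames(se)  # samples, now in patient order
```

The same toy object can be passed through the cleanse pipeline shown above to compare the two results.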

Usage information can be found by reading the vignettes: browseVignettes("cleanse").

Supported dplyr functions

Functions that subset the se based on the rowData or colData

  • filter() picks rows/cols based on the se's attached rowData/colData
  • slice() picks rows/cols by position
  • arrange() changes the ordering of the rows/cols
  • sample_slice() picks a random portion of rows or cols from the se
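For orientation, picking rows by position and picking a random portion of rows can be written in base notation as follows. This is a sketch of the equivalent operations on a made-up toy se, not the package's implementation; the proportion of 0.2 and the object names are assumptions for illustration.

```r
library(SummarizedExperiment)

set.seed(1)  # make the random sample reproducible

# Toy se with 10 genes and 2 samples (illustrative values)
counts <- matrix(1:20, nrow = 10,
                 dimnames = list(paste0("gene", 1:10), c("s1", "s2")))
toy_se <- SummarizedExperiment(assays = list(counts = counts))

# Pick rows by position
first_three <- toy_se[1:3, ]

# Pick a random 20% of the rows
n_keep  <- round(0.2 * nrow(toy_se))
sampled <- toy_se[sample(nrow(toy_se), n_keep), ]

nrow(first_three)
nrow(sampled)
```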

Functions that change the se's rowData or colData

  • select() selects variables
  • rename() renames variables
  • mutate() adds new variables that are functions of existing variables
  • drop_metadata() drops all rowData and colData columns that have only one unique value
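The idea behind dropping uninformative metadata can be sketched in base notation for the colData. This is an assumption about the general technique (remove columns with a single unique value), shown on a made-up toy se; it is not taken from the package's source, and drop_metadata() itself may behave differently in detail.

```r
library(SummarizedExperiment)

# Toy se: 'site' is constant across samples, 'patient' is not (illustrative)
counts <- matrix(1:6, nrow = 2,
                 dimnames = list(c("g1", "g2"), c("s1", "s2", "s3")))
toy_se <- SummarizedExperiment(
  assays  = list(counts = counts),
  colData = DataFrame(patient = c(1, 2, 3), site = "skin",
                      row.names = colnames(counts))
)

# Flag colData columns that have more than one unique value
cd   <- as.data.frame(colData(toy_se))
keep <- vapply(cd, function(x) length(unique(x)) > 1, logical(1))

# Keep only the informative columns
colData(toy_se) <- colData(toy_se)[, names(keep)[keep], drop = FALSE]

colnames(colData(toy_se))
```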

Supported arithmetic functions

  • - subtracts values from the assays in 2 se's
  • + adds values from the assays in 2 se's
  • / divides values from the assays in 2 se's
  • * multiplies values from the assays in 2 se's
  • round rounds the assay values of a se
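What subtracting two se's means at the level of assay values can be sketched without cleanse by working on the assay matrices directly. The toy data and the assay name "counts" below are assumptions for illustration; how cleanse combines the row and column metadata of the two operands is not shown here.

```r
library(SummarizedExperiment)

# Toy se: 2 genes, 4 samples measured at time 4 and time 0 (illustrative)
counts <- matrix(c(5, 6, 7, 8, 1, 2, 3, 4), nrow = 2,
                 dimnames = list(c("g1", "g2"), paste0("s", 1:4)))
toy_se <- SummarizedExperiment(
  assays  = list(counts = counts),
  colData = DataFrame(time = c(4, 4, 0, 0), row.names = colnames(counts))
)

# Split by time point
t4 <- toy_se[, toy_se$time == 4]
t0 <- toy_se[, toy_se$time == 0]

# Element-wise difference of the assay values (T=4 minus T=0)
diff_values <- assay(t4, "counts") - assay(t0, "counts")
diff_values
```

Both operands need the same dimensions for the element-wise operation to be meaningful.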

Supported write functions

  • write_csv() writes a se to csv
  • write_tsv() writes a se to tsv
  • write_delim() writes a se to a delimited file

Installation

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("cleanse")

Usage

library(cleanse)

# -- An example se called seq_se is provided

# Example pipe
data(seq_se)
seq_se %>%
  filter(row, gene_group == "NOTCH") %>%
  filter(col, site %in% c("brain", "skin")) %>%
  arrange(col, patient) %>%
  round(3)

# Example sampling
data(seq_se)
seq_se %>% slice_sample(row, prop=.2)

# Example arithmetic subtracting the expression values at T=0 from T=4
data(seq_se)
(filter(seq_se, col, time == 4)) - (filter(seq_se, col, time == 0))

Getting help

If you encounter a clear bug, please file a minimal reproducible example on GitHub.

cleanse's People

Contributors

martijnvanattekum

cleanse's Issues

Comment on example of 'native syntax' in README

Hi @martijnvanattekum,

I saw cleanse submitted to Bioconductor and it prompted me to take a quick look at the package.

This is just a comment on something that caught my eye in the README.md. It is partly a matter of personal coding style, but an example of 'native syntax' struck me as a bit 'non-native'/'unnatural' (for want of a better word):

# From README.md: "Using native syntax"
coldata <- colData(se) 
indices <- which(coldata$time == 4) 
se[,indices] 

I think most 'base R-centric Bioconductor' users would use the simpler:

se[, which(se$time == 4)]

Or, if you know there are no NAs, then the even simpler:

se[, se$time == 4]

Anyway, it's a minor thing, but when comparing 'base' vs. 'tidy' approaches I think it pays to try to write both versions as 'natively'/'naturally' as possible.

Cheers,
Pete
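The NA caveat in the comment above can be illustrated with plain base R vectors. This is a minimal sketch with made-up values; it shows ordinary vector indexing only, and an se may treat NA subscripts differently.

```r
# Illustrative values: one time point is missing
time <- c(4, NA, 0, 4)
x    <- c("a", "b", "c", "d")

# The comparison propagates the NA
time == 4          # TRUE NA FALSE TRUE

# which() drops the NA and returns positions only
idx <- which(time == 4)
by_which <- x[idx]

# A logical index containing NA yields an NA element
by_logical <- x[time == 4]

by_which
by_logical
```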
