Giter VIP home page Giter VIP logo

ropensci / ucscxenatools Goto Github PK

View Code? Open in Web Editor NEW
102.0 5.0 12.0 2.93 MB

:package: An R package for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq https://cran.r-project.org/web/packages/UCSCXenaTools/

Home Page: https://docs.ropensci.org/UCSCXenaTools

License: GNU General Public License v3.0

R 86.59% XQuery 11.14% TeX 2.27%
ucsc-xena downloader api-client tcga ccle icgc ucsc toil treehouse r bioinformatics

ucscxenatools's Introduction

UCSCXenaTools logo

CRAN status lifecycle R-CMD-check rOpenSci DOI

UCSCXenaTools is an R package for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq. Public omics data from UCSC Xena are supported through multiple turn-key Xena Hubs, which are a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others. Databases are normalized so they can be combined, linked, filtered, explored and downloaded.

Who is the target audience and what are scientific applications of this package?

  • Target Audience: cancer and clinical researchers, bioinformaticians
  • Applications: genomic and clinical analyses

Table of Contents

Installation

Install stable release from CRAN with:

install.packages("UCSCXenaTools")

You can also install devel version of UCSCXenaTools from github with:

# install.packages("remotes")
remotes::install_github("ropensci/UCSCXenaTools")

If you want to build vignette in local, please add two options:

remotes::install_github("ropensci/UCSCXenaTools", build_vignettes = TRUE, dependencies = TRUE)

Data Hub List

All datasets are available at https://xenabrowser.net/datapages/.

Currently, UCSCXenaTools supports the following data hubs of UCSC Xena.

Users can update dataset list from the newest version of UCSC Xena by hand with XenaDataUpdate() function, followed by restarting R and library(UCSCXenaTools).

If any url of data hub is changed or a new data hub is online, please remind me by emailing to [email protected] or opening an issue on GitHub.

Basic usage

Download UCSC Xena datasets and load them into R by UCSCXenaTools is a workflow with generate, filter, query, download and prepare 5 steps, which are implemented as XenaGenerate, XenaFilter, XenaQuery, XenaDownload and XenaPrepare functions, respectively. They are very clear and easy to use and combine with other packages like dplyr.

To show the basic usage of UCSCXenaTools, we will download clinical data of LUNG, LUAD, LUSC from TCGA (hg19 version) data hub. Users can learn more about UCSCXenaTools by running browseVignettes("UCSCXenaTools") to read vignette.

XenaData data.frame

UCSCXenaTools uses a data.frame object (built in package) XenaData to generate an instance of XenaHub class, which records information of all datasets of UCSC Xena Data Hubs.

You can load XenaData after loading UCSCXenaTools into R.

library(UCSCXenaTools)
#> =========================================================================================
#> UCSCXenaTools version 1.4.8
#> Project URL: https://github.com/ropensci/UCSCXenaTools
#> Usages: https://cran.r-project.org/web/packages/UCSCXenaTools/vignettes/USCSXenaTools.html
#> 
#> If you use it in published research, please cite:
#> Wang et al., (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data
#>   from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq.
#>   Journal of Open Source Software, 4(40), 1627, https://doi.org/10.21105/joss.01627
#> =========================================================================================
#>                               --Enjoy it--
data(XenaData)

head(XenaData)
#> # A tibble: 6 × 17
#>   XenaHosts XenaHostNames XenaCohorts XenaDatasets SampleCount DataSubtype Label
#>   <chr>     <chr>         <chr>       <chr>              <int> <chr>       <chr>
#> 1 https://… publicHub     Breast Can… ucsfNeve_pu…          51 gene expre… Neve…
#> 2 https://… publicHub     Breast Can… ucsfNeve_pu…          57 phenotype   Phen…
#> 3 https://… publicHub     Glioma (Ko… kotliarov20…         194 copy number Kotl…
#> 4 https://… publicHub     Glioma (Ko… kotliarov20…         194 phenotype   Phen…
#> 5 https://… publicHub     Lung Cance… weir2007_pu…         383 copy number CGH  
#> 6 https://… publicHub     Lung Cance… weir2007_pu…         383 phenotype   Phen…
#> # ℹ 10 more variables: Type <chr>, AnatomicalOrigin <chr>, SampleType <chr>,
#> #   Tags <chr>, ProbeMap <chr>, LongTitle <chr>, Citation <chr>, Version <chr>,
#> #   Unit <chr>, Platform <chr>

Workflow

Select datasets.

# The options in XenaFilter function support Regular Expression
XenaGenerate(subset = XenaHostNames=="tcgaHub") %>% 
  XenaFilter(filterDatasets = "clinical") %>% 
  XenaFilter(filterDatasets = "LUAD|LUSC|LUNG") -> df_todo

df_todo
#> class: XenaHub 
#> hosts():
#>   https://tcga.xenahubs.net
#> cohorts() (3 total):
#>   TCGA Lung Cancer (LUNG)
#>   TCGA Lung Adenocarcinoma (LUAD)
#>   TCGA Lung Squamous Cell Carcinoma (LUSC)
#> datasets() (3 total):
#>   TCGA.LUNG.sampleMap/LUNG_clinicalMatrix
#>   TCGA.LUAD.sampleMap/LUAD_clinicalMatrix
#>   TCGA.LUSC.sampleMap/LUSC_clinicalMatrix

Query and download.

XenaQuery(df_todo) %>%
  XenaDownload() -> xe_download
#> This will check url status, please be patient.
#> All downloaded files will under directory /var/folders/gm/lw6z28md2594gcnws_38_9f40000gn/T//RtmpfewSeZ.
#> The 'trans_slash' option is FALSE, keep same directory structure as Xena.
#> Creating directories for datasets...
#> Downloading TCGA.LUNG.sampleMap/LUNG_clinicalMatrix
#> Downloading TCGA.LUAD.sampleMap/LUAD_clinicalMatrix
#> Downloading TCGA.LUSC.sampleMap/LUSC_clinicalMatrix

Prepare data into R for analysis.

cli = XenaPrepare(xe_download)
class(cli)
#> [1] "list"
names(cli)
#> [1] "LUNG_clinicalMatrix" "LUAD_clinicalMatrix" "LUSC_clinicalMatrix"

More to read

Citation

Cite me by the following paper.

Wang et al., (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data
  from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq. 
  Journal of Open Source Software, 4(40), 1627, https://doi.org/10.21105/joss.01627

# For BibTex
  
@article{Wang2019UCSCXenaTools,
    journal = {Journal of Open Source Software},
    doi = {10.21105/joss.01627},
    issn = {2475-9066},
    number = {40},
    publisher = {The Open Journal},
    title = {The UCSCXenaTools R package: a toolkit for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq},
    url = {https://dx.doi.org/10.21105/joss.01627},
    volume = {4},
    author = {Wang, Shixiang and Liu, Xuesong},
    pages = {1627},
    date = {2019-08-05},
    year = {2019},
    month = {8},
    day = {5},
}

Cite UCSC Xena by the following paper.

Goldman, Mary, et al. "The UCSC Xena Platform for cancer genomics data 
    visualization and interpretation." BioRxiv (2019): 326470.

How to contribute

For anyone who wants to contribute, please follow the guideline:

  • Clone project from GitHub
  • Open UCSCXenaTools.Rproj with RStudio
  • Modify source code
  • Run devtools::check(), and fix all errors, warnings and notes
  • Create a pull request

Acknowledgment

This package is based on XenaR, thanks Martin Morgan for his work.

ropensci_footer

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.