Cancer Systems Biology, Section of Bioinformatics, Department of Health and Technology, Technical University of Denmark, 2800, Lyngby, Copenhagen

ALL_markers

Introduction

This repository contains scripts related to the discovery of gene expression markers that distinguish two acute lymphoblastic leukemia subtypes: B-cell and T-cell ALL.

Data-driven discovery of gene expression markers distinguishing pediatric acute lymphoblastic leukemia subtypes. Mona Nourbakhsh, Nikola Tom, Anna Schroeder Lassen, Helene Brasch Lind Petersen, Ulrik Kristoffer Stoltze, Karin Wadt, Kjeld Schmiegelow, Matteo Tiberti, Elena Papaleo. bioRxiv 2024.02.26.582026; doi: https://doi.org/10.1101/2024.02.26.582026

Please cite the paper above if you use the scripts for your own research.

Installation requirements and guidelines for reproducing the analyses

All analyses have been performed on a GNU/Linux server.

Computing environment

To reproduce the data and results, you will need to set up a conda environment which contains the expected R version and the required R packages. This requires being able to run Anaconda by means of the conda executable.

If you don't have access to conda, please see the Miniconda installer page for instructions on how to install Miniconda.

Once you have access to conda, follow the steps below to reproduce the results and data:

  1. Clone the GitHub repository into a directory on your local machine:
     git clone https://github.com/ELELAB/ALL_markers.git
     cd ALL_markers
  2. Create a virtual environment using conda and activate it afterwards. The environment should be placed in the ALL_markers folder:
     conda create --prefix ./env_ALL -c conda-forge r-base=4.2 r-pacman=0.5.1 r-curl=4.3.3 r-ragg=1.2.5 r-renv=0.16.0 r-osfr=0.2.9 r-cairo=1.6.0 gsl=2.7
     conda activate ./env_ALL
  3. Run the analyses:
     bash ./run_all.sh
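
Before launching the full run, it can help to confirm that the tools used in the steps above are on the PATH. The following is a minimal sketch and is not part of the repository: the function names have_tool and preflight are hypothetical, and the list of tools is an assumption drawn from the commands shown above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not shipped with ALL_markers.

# Return 0 if the named command can be found, 1 otherwise.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Print one line per tool and return 1 if any tool is missing.
preflight() {
    local tool missing=0
    for tool in "$@"; do
        if have_tool "$tool"; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools assumed from the steps above; Rscript should come from ./env_ALL once it is activated.
preflight git conda Rscript bash || echo "Install or activate the missing tools before running run_all.sh."
```

The check only reports what is missing; it does not install anything or verify package versions.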

WARNING: our scripts use the renv R package to handle automatic dependency installation. renv writes packages to its own cache folder, which by default is located in the user's home directory. This might not be desirable if free space in the home directory is limited. You can change the location of the renv root folder by setting a system environment variable; please see the comments in the run_all.sh script.
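
As a rough sketch of what relocating the renv root can look like: renv itself documents the RENV_PATHS_ROOT environment variable for this purpose, but whether run_all.sh expects exactly this variable is an assumption here, so the comments in run_all.sh remain the authoritative reference. The directory name renv_root and the helper variable RENV_ROOT_DIR are placeholders chosen for illustration.

```shell
# Placeholder location for the renv root; replace it with a directory on a disk with enough free space.
RENV_ROOT_DIR="${RENV_ROOT_DIR:-$PWD/renv_root}"

# Create the directory if it does not exist yet.
mkdir -p "$RENV_ROOT_DIR"

# RENV_PATHS_ROOT is the variable renv documents for its root folder
# (assumption: run_all.sh relies on the same variable).
export RENV_PATHS_ROOT="$RENV_ROOT_DIR"

echo "renv root set to: $RENV_PATHS_ROOT"
```

The variable has to be exported in the same shell session that later runs bash ./run_all.sh, otherwise the setting is not inherited.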

The run_all.sh script performs the following steps to reproduce all results and data:

  1. Download the data from the corresponding OSF repository, which contains the data required to run the analyses and all results associated with them.

  2. Install all packages needed to run the analyses into the environment.

  3. Perform all analyses.

Structure and content of GitHub and OSF repositories

The GitHub repository contains the scripts associated with this publication, and the OSF repository contains the data and results. Both repositories share the same structure: each main analysis has its own main folder, named after the analysis, which contains all scripts or data/results associated with it. Below is an overview of these main folders. See the README file in each main folder for more details.

TARGET_data:

  • This directory contains gene expression data of the TARGET-ALL-P2 project and information about its metadata, such as age of patients, number of samples, number of genes, and subtype information. The data is also subset here to contain only Primary Blood Derived Cancer - Bone Marrow samples

TARGET_replicates:

  • This directory investigates replicate samples found in the TARGET-ALL-P2 data and adjusts the data for these replicates

TARGET_processing:

  • This directory processes gene expression data of the TARGET-ALL-P2 project (preprocessing, normalization, and filtering using TCGAbiolinks)

TARGET_transform:

  • This directory applies the voom transformation to TARGET-ALL-P2 gene expression data
  • Here, the directory voom_transform_bmp_raw transforms the raw gene expression data and the directory voom_transform_bmp_comp transforms the processed data

TARGET_pca:

  • This directory performs dimensionality reduction of TARGET-ALL-P2 gene expression data
  • Here, the directory bone_marrow_primary_pca_raw performs MDS of the raw gene expression data and the directory bone_marrow_primary_pca_comp performs MDS of the processed data (both before and after batch correction). In addition, PCA is performed in bone_marrow_primary_pca_comp to find the contributions of genes to principal components 1 and 2

TARGET_batch:

  • This directory performs batch correction of TARGET-ALL-P2 gene expression data

TARGET_dea:

  • This directory performs differential expression analysis of TARGET-ALL-P2 gene expression data

TARGET_housekeeping:

  • This directory investigates overlaps of discovered differentially expressed genes with a list of housekeeping genes
  • This list of housekeeping genes was downloaded from https://www.tau.ac.il/~elieis/HKG/ with associated publication: Eisenberg E, Levanon EY. Human housekeeping genes, revisited. Trends Genet. 2013 Oct;29(10):569-74. doi: 10.1016/j.tig.2013.05.010. Epub 2013 Jun 27. Erratum in: Trends Genet. 2014 Mar;30(3):119-20. PMID: 23810203.

TARGET_enrichment:

  • This directory performs enrichment analyses of consensus differentially expressed genes

TARGET_lasso:

  • This directory performs regularized elastic net logistic regression of TARGET-ALL-P2 gene expression data

TARGET_compare_genes:

  • This directory compares the results of the methods above (differential expression analysis, elastic net logistic regression, PCA, housekeeping analysis) with each other and with genes reported in the Network of Cancer Genes (NCG) database

TARGET_clustering:

  • This directory performs unsupervised clustering of the TARGET-ALL-P2 gene expression data

TARGET_random_forest:

  • This directory performs random forest variable selection on predicted clusters from unsupervised clustering

TARGET_survival:

  • This directory performs survival analysis on predicted subtype-related and cluster-related gene expression markers

TARGET_drug_targets:

  • This directory investigates whether any of the subtype-related and cluster-related gene expression markers are annotated as drug targets in the Drug Gene Interaction Database

validation:

  • This directory contains scripts used to validate the predicted gene expression markers in an independent cohort of patients with ALL from a Danish hospital. As access to this data cannot be granted, these scripts are not meant to be runnable.
