Cancer-Dependency-Study

This cancer dependency study tries out several machine learning methods and includes a small Shiny app.

Summary

Overview

Context

Predicting cancer dependencies from molecular data can help stratify patients and identify novel therapeutic targets. However, the predictive power of protein expression data has not been systematically measured. The paper therefore set out to evaluate how well protein expression data generated by reverse-phase protein arrays (RPPA) predict cancer dependencies, and to develop a related analytic tool for community use.

Need

Understanding the genotype-phenotype relationships of cancer cells is critical for precision cancer medicine, because it can help classify patients into different treatment groups and identify novel therapeutic targets.

Vision

The paper first evaluated the consistency of cancer dependency data between the CRISPR/Cas9 and short hairpin RNA (shRNA) perturbation platforms. It then predicted each gene's dependency from four expression-related features of the same gene (copy number alteration, DNA methylation, messenger RNA expression, and protein expression). Three machine learning algorithms (conditional random forest, linear regression, and random forest) were used for these predictions and to analyze feature importances.

Outcome

For the genes selected from both the CRISPR/Cas9 and shRNA screens, the paper found that protein expression data showed significant predictive power for cancer dependencies. Protein expression was also the best predictive feature for the CRISPR/Cas9-based dependency data. The study therefore provides a systematic assessment of how well different expression-related features of a gene predict that gene's dependency across cell lines. It also shows that protein expression data are a highly valuable source of information for understanding tumor vulnerabilities and identifying therapeutic opportunities.

Methodology

Data Sources

This paper makes use of the following sets of data:

  1. Reverse-phase protein array (RPPA)-based protein data from CCLE, which assayed 214 protein markers across 899 cell lines.
  2. Cancer dependency data: CRISPR/Cas9 (DepMap19Q1) and shRNA (DEMETER2).
  3. Copy number alteration (CNA), DNA methylation, and mRNA expression data from CCLE.

Data Model

The response variable is a vector of dependency scores (cell growth change) for each gene across cell lines.

  1. A score of 0 indicates that a gene is not essential.
  2. A score of -1 corresponds to the median value of all common essential genes.
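The two anchor points above define a linear scale. The following is a minimal sketch of how such a scale can be built. It assumes raw gene-effect scores and two hypothetical reference sets (nonessential genes and common essential genes) are available. It illustrates the idea only and is not the exact processing behind DepMap19Q1 or DEMETER2.

```python
import numpy as np


def scale_dependency(raw, nonessential_scores, essential_scores):
    """Linearly rescale raw scores so the nonessential median maps to 0
    and the common-essential median maps to -1.

    The reference score sets are hypothetical inputs for illustration.
    """
    raw = np.asarray(raw, dtype=float)
    m_ne = np.median(nonessential_scores)  # anchor for "not essential" -> 0
    m_ce = np.median(essential_scores)     # anchor for "common essential" -> -1
    return (raw - m_ne) / (m_ne - m_ce)


# Toy example with made-up numbers.
nonessential = [0.1, 0.2, 0.3]   # median 0.2
essential = [-1.8, -1.6, -1.4]   # median -1.6
print(scale_dependency([0.2, -1.6, -0.7], nonessential, essential))
```

A score halfway between the two medians (here -0.7) lands at about -0.5.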

The explanatory variables (predictors) were the expression-related features of the same gene (self-features).

Feature engineering was performed to improve the quality of the model outcome:

  1. Construct a robust cancer dependency set by selecting genes that showed high consistency between shRNA and CRISPR/Cas9.
  2. Overlap that set with the genes and cell lines from CCLE.
  3. Only consider RPPA, CNA, DNA methylation, and mRNA expression data from the same set of cell lines.
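The README does not say how cross-platform consistency was measured. The sketch below is one hypothetical way to implement the selection and overlap steps with pandas. It assumes consistency is a per-gene Pearson correlation across cell lines screened on both platforms. The cut-offs `min_r` and `min_lines` and the helper names are illustrative assumptions, not values or code from the paper.

```python
import numpy as np
import pandas as pd


def consistent_genes(crispr, shrna, min_r=0.4, min_lines=20):
    """Genes whose dependency profiles agree between CRISPR/Cas9 and shRNA.

    crispr, shrna: DataFrames indexed by gene, one column per cell line.
    min_r and min_lines are hypothetical cut-offs, not values from the paper.
    """
    genes = crispr.index.intersection(shrna.index)
    lines = crispr.columns.intersection(shrna.columns)
    keep = []
    for g in genes:
        a, b = crispr.loc[g, lines], shrna.loc[g, lines]
        both = a.notna() & b.notna()  # cell lines screened on both platforms
        if both.sum() < min_lines:
            continue
        r = a[both].corr(b[both])  # Pearson correlation across cell lines
        if pd.notna(r) and r >= min_r:
            keep.append(g)
    return pd.Index(keep)


def restrict_to_ccle(dep, genes, ccle_genes, ccle_lines):
    """Keep consistent genes that are also in CCLE, on CCLE-profiled cell lines."""
    return dep.loc[genes.intersection(ccle_genes), dep.columns.intersection(ccle_lines)]


# Toy example: GENE_A agrees across platforms, GENE_B is anti-correlated.
rng = np.random.default_rng(0)
lines = [f"CL{i}" for i in range(30)]
x = rng.normal(size=30)
crispr = pd.DataFrame([x, x], index=["GENE_A", "GENE_B"], columns=lines)
shrna = pd.DataFrame([x + 0.1 * rng.normal(size=30), -x],
                     index=["GENE_A", "GENE_B"], columns=lines)
robust = consistent_genes(crispr, shrna)
print(list(robust))  # ['GENE_A']
subset = restrict_to_ccle(crispr, robust, ccle_genes=["GENE_A"], ccle_lines=lines[:25])
print(subset.shape)  # (1, 25)
```

Requiring a minimum number of shared cell lines keeps the correlation from being driven by a handful of screens.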

Machine Learning Model

  • Train-test-split: training set (70% cancer cell lines), testing set (30% cancer cell lines).
  • Regression methods: linear regression, random forest, conditional random forest.
  • Baseline model: the averaged (mean) dependency score is used as the predicted value, and predictions that do not outperform this baseline are treated as failed and excluded.
  • Cross-validation training: 10-fold cross-validation, repeated 10 times, to obtain stable performance estimates and guard against overfitting.
  • Evaluation metrics: root-mean-square error (RMSE) and R2.
  • Feature importance analysis (varImp function from the caret package in R):
    • Linear regression: the absolute value of the t-statistic for each model parameter is used.
    • Random forest: for each tree, the MSE is computed on the out-of-bag data, and the same is computed again after permuting each variable; the differences are then averaged over all trees.
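The modeling steps listed above can be roughly sketched in Python with scikit-learn. The paper used R and caret, so this is an approximation under stated assumptions, not the authors' code. It uses synthetic data with a hypothetical feature order, and it does not reproduce the conditional random forest (R's party::cforest has no scikit-learn equivalent). The "failed prediction" rule is an assumed reading of the baseline step. Random forest importance here uses `permutation_importance` on held-out data, which is swapped in for caret's out-of-bag computation. The |t| importance for linear regression is computed directly with NumPy.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import RepeatedKFold, cross_validate, train_test_split

# Hypothetical column order for one gene's self-features across cell lines.
FEATURES = ["protein", "cna", "methylation", "mrna"]


def evaluate_gene(X, y, seed=0):
    """70/30 split, 10x repeated 10-fold CV on the training set, then a test-set score."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=seed)
    models = {
        "baseline": DummyRegressor(strategy="mean"),  # always predicts the mean score
        "linear": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        cv_out = cross_validate(model, X_tr, y_tr, cv=cv,
                                scoring=("neg_root_mean_squared_error", "r2"))
        model.fit(X_tr, y_tr)  # refit on the full training set
        pred = model.predict(X_te)
        results[name] = {
            "cv_rmse": -cv_out["test_neg_root_mean_squared_error"].mean(),
            "cv_r2": cv_out["test_r2"].mean(),
            "test_rmse": float(np.sqrt(mean_squared_error(y_te, pred))),
            "test_r2": r2_score(y_te, pred),
            "model": model,
        }
    return results, (X_te, y_te)


def is_failed(results, name):
    """Assumed rule: a model that does not beat the mean baseline counts as failed."""
    return results[name]["cv_rmse"] >= results["baseline"]["cv_rmse"]


def linear_abs_t(X, y):
    """|t| of each OLS coefficient (intercept dropped), mirroring varImp for lm."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (A.shape[0] - A.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return dict(zip(FEATURES, np.abs(beta / se)[1:]))


def forest_importance(model, X_te, y_te, seed=0):
    """Increase in MSE after permuting each feature, measured on held-out data."""
    imp = permutation_importance(model, X_te, y_te, scoring="neg_mean_squared_error",
                                 n_repeats=10, random_state=seed)
    return dict(zip(FEATURES, imp.importances_mean))


# Toy data: 150 cell lines; dependency driven mainly by protein level.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = -0.8 * X[:, 0] + 0.2 * rng.normal(size=150)
results, (X_te, y_te) = evaluate_gene(X, y)
for name, r in results.items():
    print(f"{name}: CV RMSE={r['cv_rmse']:.3f}, CV R2={r['cv_r2']:.3f}")
print({m: is_failed(results, m) for m in ("linear", "random_forest")})
print(linear_abs_t(X, y))
print(forest_importance(results["random_forest"]["model"], X_te, y_te))
```

In a real run, `X` would hold one gene's RPPA, CNA, methylation, and mRNA values per cell line, and `y` that gene's dependency scores. The loop would be repeated for every gene in the robust set.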

Tools

Paper

R and Python

Myself

Python: scikit-learn (Lasso regression), statsmodels (Lasso regression, linear regression), CatBoost
R: Shiny app
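Lasso regression appears among the author's own Python tools. The following is a hypothetical sketch, not the repository's code, of an L1-penalized model on one gene's four self-features using scikit-learn's `LassoCV`. The feature order and synthetic data are assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["protein", "cna", "methylation", "mrna"]  # hypothetical column order


def lasso_gene_model(X, y, seed=0):
    """Standardize the features, then choose the L1 penalty by 10-fold CV."""
    model = make_pipeline(StandardScaler(), LassoCV(cv=10, random_state=seed))
    model.fit(X, y)
    coefs = model.named_steps["lassocv"].coef_
    return model, dict(zip(FEATURES, coefs))


rng = np.random.default_rng(1)
X = rng.normal(size=(150, 4))
y = -0.8 * X[:, 0] + 0.2 * rng.normal(size=150)  # toy data
model, coefs = lasso_gene_model(X, y)
print(coefs)
```

Standardizing first puts the four features on a common scale, so the L1 penalty treats them evenly. Coefficients of uninformative features are shrunk toward zero, which gives a view of feature relevance that complements the varImp analysis.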

Paper and App