
crc-multi-kingdom's Introduction

Multi-kingdom microbiota analyses identify bacterial-fungal interactions and biomarkers of colorectal cancer across cohorts

###Data Preprocess
##The major steps are in 1_raw sequence process.sh and mainly include quality control with KneadData, taxonomic profiling with Kraken2 and Bracken, and functional profiling by assembly (MEGAHIT) followed by gene prediction with Prodigal. Gene abundance is then estimated with CoverM.

###Profile_preprocess, code in 2_Profile_preprocess.R
##Profiles were converted to relative abundance, and only features with a prevalence above 20% of samples were retained for further analysis.
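The two preprocessing steps above can be sketched as follows. This is a minimal illustration in plain Python, not the code in 2_Profile_preprocess.R (which is an R script). The function names, the toy counts, and the assumption that "prevalence" means the fraction of samples with a non-zero value are all hypothetical.

```python
def to_relative_abundance(profile):
    """profile maps feature name -> list of per-sample values.

    Returns the same layout with every sample (column) scaled to sum to 1.
    """
    n_samples = len(next(iter(profile.values())))
    totals = [sum(values[i] for values in profile.values()) for i in range(n_samples)]
    return {
        name: [v / t if t > 0 else 0.0 for v, t in zip(values, totals)]
        for name, values in profile.items()
    }


def filter_by_prevalence(profile, min_prevalence=0.2):
    """Keep features that are non-zero in more than min_prevalence of samples.

    Assumption: prevalence = fraction of samples with a value > 0.
    """
    kept = {}
    for name, values in profile.items():
        prevalence = sum(1 for v in values if v > 0) / len(values)
        if prevalence > min_prevalence:
            kept[name] = values
    return kept


# Toy counts for three hypothetical taxa across five samples.
counts = {
    "taxon_a": [10, 0, 5, 20, 1],
    "taxon_b": [0, 0, 0, 0, 4],
    "taxon_c": [90, 50, 0, 0, 0],
}
rel = to_relative_abundance(counts)
filtered = filter_by_prevalence(rel, min_prevalence=0.2)
print(sorted(filtered))
```

In this toy input, taxon_b is present in exactly 20% of samples, so the strict "above 20%" rule drops it.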

###Differential species/functions, code in 3_MMUPHin_differential_analysis.R
##Because microbial profiles are compositional and sparse, and heterogeneity exists among cohorts, Meta-analysis Methods with a Uniform Pipeline for Heterogeneity (MMUPHin), which enables the normalization and combination of multiple microbial community studies, was used to identify CRC-related differential microbial species. In the MMUPHin analysis, the microbial community batch effect among cohorts was corrected with an extended ComBat-like method. Microbial profiles were arcsine square root (AST) transformed, and the age, gender and BMI of subjects were treated as covariates.
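For readers unfamiliar with the AST transformation mentioned above, the formula is asin(sqrt(p)) applied to each relative abundance p in [0, 1]. The sketch below only illustrates that formula in plain Python; it is not how the R script invokes MMUPHin, and the function name is hypothetical.

```python
import math


def arcsine_sqrt(p):
    """Arcsine square root (AST) transform of a proportion p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("AST expects a relative abundance between 0 and 1")
    return math.asin(math.sqrt(p))


# Transform one hypothetical sample's relative abundances.
sample = [0.0, 0.25, 0.75]
transformed = [arcsine_sqrt(p) for p in sample]
print(transformed)
```

The transform maps 0 to 0 and 1 to pi/2, and stretches values near the ends of the interval, which is a common way to stabilize the variance of proportions.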

###The random forest-based model construction
##Feature selection, code in 4_feature_selection.R
##Features were selected with the ‘Boruta’ package in R with the parameters pValue=0.05, mcAdj=T, maxRuns=1000. Boruta iteratively removes features that a statistical test shows to be less relevant than random probes. Correlations between the “Confirmed features” identified by Boruta were then calculated, and only features with a correlation below 0.7 were carried forward to model construction, to avoid collinearity.
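The correlation filter applied after Boruta could look roughly like the sketch below. Several details are assumptions, because the text does not specify them: Pearson correlation, the use of the absolute value, and a greedy rule that keeps the first feature of a correlated pair. The function names and toy data are hypothetical, and the Boruta step itself is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def drop_correlated(features, threshold=0.7):
    """Greedily keep features whose |r| with every already-kept feature is below threshold.

    Assumption: features are visited in insertion order, so the earlier
    feature of a correlated pair is the one that survives.
    """
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical "Confirmed" features; f2 is an exact multiple of f1.
confirmed = {
    "f1": [0.1, 0.2, 0.3, 0.4, 0.5],
    "f2": [0.2, 0.4, 0.6, 0.8, 1.0],
    "f3": [0.5, 0.1, 0.4, 0.2, 0.3],
}
selected = drop_correlated(confirmed)
print(selected)
```

Ordering the input by feature importance before filtering would make the greedy rule keep the more informative member of each correlated pair.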

##Model construction, code in 5_classification_model_cv_repeat.R
##To construct predictive models, we tuned hyperparameters (e.g. mtry, ntree, nodesize, maxnodes) using the ‘caret’ package. With the best combination of hyperparameters, a five-fold cross-validation model was then constructed for each cohort and repeated 20 times to avoid overfitting.
##Model significance was assessed with 1000 permutations via the “A3” package, code in 6_classification_model_significance.R
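The repeated five-fold cross-validation scheme described above can be sketched as an index generator. This is a plain-Python illustration only: the repository performs this step in R with caret, the splits here are not stratified by class (an assumption, since the text does not say), and the function name and seed are hypothetical.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=20, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every repeat
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield repeat, k, train_idx, test_idx


# 5 folds x 20 repeats gives 100 train/test splits for one cohort.
splits = list(repeated_kfold(n_samples=23))
print(len(splits))
```

Each split would be used to fit a model on the training indices and score it on the test indices; averaging the scores over all repeats gives a more stable estimate than a single five-fold run.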

##Evaluating the generalization of microbial markers, code in 7_classification_model_studytostudy_loco.R
##To further test how well the CRC microbial markers generalize across technical and geographic differences in multiple populations, we extensively validated the diagnostic models with cohort-to-cohort transfer validation and leave-one-cohort-out (LOCO) validation, as described previously.
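The two validation designs named above differ only in how samples are partitioned by cohort. The sketch below shows that partitioning in plain Python; the cohort names and function names are hypothetical, and model training and scoring are left out.

```python
def loco_splits(cohort_labels):
    """Leave-one-cohort-out: train on all cohorts except one, test on the held-out one."""
    for held_out in sorted(set(cohort_labels)):
        train = [i for i, c in enumerate(cohort_labels) if c != held_out]
        test = [i for i, c in enumerate(cohort_labels) if c == held_out]
        yield held_out, train, test


def cohort_to_cohort_pairs(cohort_labels):
    """Cohort-to-cohort transfer: every ordered (train_cohort, test_cohort) pair."""
    cohorts = sorted(set(cohort_labels))
    return [(a, b) for a in cohorts for b in cohorts if a != b]


# One cohort label per sample (hypothetical names).
labels = ["cohort_A", "cohort_A", "cohort_B", "cohort_C", "cohort_B", "cohort_C"]
loco = list(loco_splits(labels))
pairs = cohort_to_cohort_pairs(labels)
print(loco[0])
print(len(pairs))
```

With three cohorts this gives three LOCO splits and six ordered transfer pairs.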

##Independent validation of the robustness of multi-kingdom microbial markers with additional datasets, code in 8_classification_model_independent_validation.R

##Specificity of features in non-CRC diseases, code in 9_classification_model_nonCRC_model_specificity.R
##To avoid false positives in clinical diagnosis, we estimated the specificity of the microbial biomarkers for CRC by testing the AUROC values of models constructed with the best panel of features against non-CRC diseases.
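Since the specificity check above is expressed in terms of AUROC, a small sketch of how that metric is computed may help. This is a plain-Python pairwise version, not the code in the R script; the function name, labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one; ties count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores: 1 = disease, 0 = control.
y_true = [1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.3, 0.6, 0.5, 0.5]
value = auroc(y_true, y_score)
print(round(value, 3))
```

An AUROC close to 0.5 means the scores do not separate the two groups, which is how a CRC marker panel would be expected to behave on a disease it is not specific to. That interpretation is a general property of the metric rather than a result reported here.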

crc-multi-kingdom's People

Contributors

jiaonall
