final-degree-project's Introduction

Benchmarking & Evaluating Single-cell Enhancer-Gene Regulatory Networks

Final Degree Project | Bachelor's Degree in Bioinformatics | ESCI-UPF

Project Description

Enhancer-inclusive gene regulatory networks allow us to characterize, and gain a functional understanding of, the regulatory interactions underlying phenotypes in a cell-type-specific manner. GRaNIE is a method that infers gene regulatory networks including enhancers from bulk transcriptomics (RNA-seq) and chromatin accessibility (ATAC-seq) data by reconstructing tripartite networks that describe transcription factor-enhancer-gene associations. Bulk data mask cell-type- and cell-state-specific gene expression and chromatin accessibility because the counts represent an average over a population of cells. To overcome this limitation, the approach has to be adapted to single-cell resolution so that it accounts for inter-cell variation and captures regulatory differences specific to cell types and cell states. A proper benchmark of the approach at the single-cell level, however, has not yet been carried out. We propose benchmarking GRaNIE at single-cell resolution through a validation-based approach that assesses the network's ability to predict real regulatory interactions from cell-type-specific promoter capture Hi-C, eQTL and ChIP-seq data. We further validate the networks through a comprehensive biological network analysis and evaluate the predictive power of the eGRNs with GRaNPA, a machine learning framework that assesses a network's capability to predict cell-type-specific differential gene expression (DGE) data. The results show that GRaNIE, applied to single-cell hiPSC-neuron timecourse datasets, is capable of inferring regulatory associations relevant to neuronal differentiation and development, as well as predicting cell-type-specific iPSC versus neuron DGE at the single-cell level.
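
As a rough illustration of the validation idea described above, the sketch below scores a set of predicted links against a set of experimentally supported links. It is a simplification under stated assumptions, not the project's implementation: links are matched by exact identifier, whereas real pcHi-C, eQTL or ChIP-seq validation would typically require genomic interval overlap. The function name `validate_links` and all identifiers in the toy data are hypothetical.

```python
def validate_links(predicted, reference):
    """Compare predicted links with a reference set of supported links.

    Both arguments are iterables of hashable link identifiers,
    here (peak, gene) tuples. Matching is by exact equality (assumption).
    """
    predicted, reference = set(predicted), set(reference)
    supported = predicted & reference
    # Precision: share of predicted links found in the reference
    precision = len(supported) / len(predicted) if predicted else 0.0
    # Recall: share of reference links recovered by the network
    recall = len(supported) / len(reference) if reference else 0.0
    return {"supported": len(supported), "precision": precision, "recall": recall}


# Hypothetical toy data, not taken from the project
predicted_links = [
    ("peak_1", "GENE_A"),
    ("peak_2", "GENE_B"),
    ("peak_3", "GENE_C"),
    ("peak_4", "GENE_D"),
]
reference_links = [
    ("peak_1", "GENE_A"),
    ("peak_3", "GENE_C"),
    ("peak_9", "GENE_Z"),
]

result = validate_links(predicted_links, reference_links)
print(result)
```

With the toy data, two of the four predicted links appear in the reference, so precision is 0.5 and recall is two thirds.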

Scripts Descriptions

Datasets Preprocessing

  • 📄 timecourse_preprocessing.R

    Preprocessing performed on the timecourse dataset.
  • 📄 combined_preprocessing.R

    Preprocessing performed on the combined dataset.
  • 📄 data_extraction.R

    Extraction of raw data (RNA counts, ATAC counts, metadata) from the dataset-specific Seurat objects, which have undergone standard preprocessing, for further dataset-customized preprocessing.
  • 📄 SeuratToAnndata.R

    Conversion from Seurat objects to AnnData objects so that the preprocessed datasets can be used for eGRN inference via SCENIC+, which is only available in Python/Scanpy.
  • 📄 get_fragments_from_archr.R

    Get the ATAC fragments from the ArchR projects for each dataset.

Single-cell GRaNIE

  • 📄 GRaNIE_batch_mode.R

    Run GRaNIE in batch mode (across a wide range of cluster resolutions on the integrated WNN space), generating one eGRN per resolution.
  • 📄 GRaNIE_specific_resolution.R

    Run GRaNIE for a specific cluster resolution.
  • 📄 GRaNIE_helper_functions.R

    Helper functions to execute GRaNIE on single-cell datasets: functions to preprocess the data and to run GRaNIE for a specific metadata column or in batch mode.
  • 📄 GRaNIE_metadata.R

    Run GRaNIE, pseudobulking the cells according to a metadata column, e.g. celltype.
  • 📄 timecourse_scGRaNIE_test.R

    Run GRaNIE on the timecourse dataset for initial size tests, using different cell type annotations and batch mode.
  • 📄 network_enrichment_analysis.R

    Perform gene enrichment and TF enrichment analyses on a given eGRN.
  • 📄 custom_peakgene_QC_plots.R

    Customization of the original GRaNIE code that creates the peak-gene quality control plots so that different eGRNs can be analyzed at the same time.
  • 📄 MergePeakGeneQCPDFs.sh

    Create a single PDF document with the final peak-gene quality control plots for different eGRNs (each network-specific QC plot on a separate page).
  • 📄 CRISPR_screen_GRaNIE.R

    Run GRaNIE on separate scRNA and scATAC modalities of an NPC-neuron dataset perturbed for a set of schizophrenia markers.
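
The pseudobulking step mentioned for GRaNIE_metadata.R can be pictured with the minimal sketch below, which aggregates per-cell counts into one profile per value of a metadata column. It assumes that aggregation is a plain sum of raw counts; the project's R scripts may aggregate or normalize differently. The function name `pseudobulk` and the toy data are hypothetical.

```python
from collections import defaultdict


def pseudobulk(counts, cell_groups):
    """Sum per-cell counts into one profile per group.

    counts: dict mapping cell ID -> dict mapping feature (gene or peak) -> raw count
    cell_groups: dict mapping cell ID -> group label (e.g. a cell type or cluster)
    """
    profiles = defaultdict(lambda: defaultdict(int))
    for cell, features in counts.items():
        group = cell_groups[cell]
        for feature, count in features.items():
            profiles[group][feature] += count
    # Convert nested defaultdicts to plain dicts for a predictable return type
    return {group: dict(feats) for group, feats in profiles.items()}


# Hypothetical toy data: three cells, two genes, two cell types
counts = {
    "cell_1": {"GENE_A": 5, "GENE_B": 0},
    "cell_2": {"GENE_A": 3, "GENE_B": 1},
    "cell_3": {"GENE_A": 0, "GENE_B": 7},
}
cell_groups = {"cell_1": "iPSC", "cell_2": "iPSC", "cell_3": "neuron"}

profiles = pseudobulk(counts, cell_groups)
print(profiles)
```

Grouping by cluster labels obtained at different resolutions instead of by cell type would give one set of pseudobulk profiles per resolution, which is the idea behind running in batch mode.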

Single-cell GRaNIE Benchmark

  • Validations

  • 📄 ChiPSeq_setup.R

    ChIP-seq data preprocessing.
  • 📄 ChiPSeq_validation.R

    Validation of network TF-peak links with preprocessed ChIP-seq data.
  • 📄 pcHiC_setup.R

    Promoter capture Hi-C data preprocessing.
  • 📄 pcHiC_validation.R

    Validation of network peak-gene links with preprocessed promoter capture Hi-C data.
  • 📄 eQTL_enrichment.R

    eQTL preprocessing and enrichment/validation analysis on network peak-gene links.
  • 📄 GRN_validation_analyses.R

    Integration and visualization of the network validation results.
  • GRaNPA

  • 📄 GRaNPA_network_evalutation.R

    Evaluation of the networks' ability to predict cell-type-specific DGE using GRaNPA.
  • 📄 RNA_preprocessing_DGE_analysis.R

    Bulk RNA preprocessing and DGE analysis of an independent hiPSC-to-neuron timecourse dataset.
  • eGRN Analyses

  • 📄 Network_visualizations.R

    Visualization of a GRaNIE network and TF/gene enrichment analyses.
  • 📄 celltype_GRNstats_comparison.R

    Comparison of the effects of different cell-type annotations on GRaNIE networks.
  • 📄 pearson_vs_spearman_analysis.R

    In-depth study of Pearson and Spearman correlation for link inference in GRaNIE.
  • 📄 extensive_GRaNIE_networks_analyses.R

    Analyses of GRaNIE networks: pseudobulking resolutions, link overlaps across resolutions, general network statistics, pcHi-C validations, the relationship between robust links and pcHi-C-validated links, and further Pearson vs. Spearman correlation analysis.
  • 📄 overlap_analysis.R

    Overlap analysis of different types of links (TF-peak-gene, TF-peak and peak-gene) across cluster resolutions.
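
One common way to quantify how much the links of two networks overlap is the Jaccard index (size of the intersection divided by size of the union). The sketch below computes it for every pair of cluster resolutions. Whether overlap_analysis.R uses this measure is an assumption; the function names and the toy links are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets; defined here as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(links_by_resolution):
    """Return {(res_a, res_b): Jaccard index} for every pair of resolutions."""
    return {
        (res_a, res_b): jaccard(links_by_resolution[res_a], links_by_resolution[res_b])
        for res_a, res_b in combinations(sorted(links_by_resolution), 2)
    }


# Hypothetical peak-gene links inferred at three clustering resolutions
links = {
    0.5: {("peak_1", "GENE_A"), ("peak_2", "GENE_B")},
    1.0: {("peak_1", "GENE_A"), ("peak_3", "GENE_C")},
    2.0: {("peak_1", "GENE_A"), ("peak_2", "GENE_B"), ("peak_3", "GENE_C")},
}

overlap = pairwise_overlap(links)
for pair, score in overlap.items():
    print(pair, round(score, 3))
```

The same functions work for TF-peak or TF-peak-gene links as long as each link is represented as a hashable tuple.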

Single-cell eGRN Inference Methods

  • SCENIC+

  • 📄 scRNA-seq_Preprocessing.ipynb

    Timecourse RNA preprocessing using Scanpy.
  • 📄 scRNA-seq_setup_timecourse.ipynb

    Import the preprocessed timecourse AnnData objects and format them for further steps.
  • 📄 scATAC-seq_preprocessing_timecourse.ipynb

    Generate pseudobulk ATAC-seq profiles, call peaks and generate a consensus peak set for the timecourse dataset.
  • 📄 timecourse_cistopic.ipynb

    Perform cisTopic modelling, candidate enhancer region inference and motif enrichment analysis using pycisTarget for the timecourse dataset.
  • 📄 timecourse_scenicplus.ipynb

    eGRN inference using SCENIC+ on the timecourse dataset and further network analysis.
  • 📄 scRNA-seq_setup_combined.ipynb

    Import the preprocessed combined AnnData objects and format them for further steps.
  • 📄 scATAC-seq_preprocessing_combined.ipynb

    Generate pseudobulk ATAC-seq profiles, call peaks and generate a consensus peak set for the combined dataset.
  • 📄 combined_cistopic.ipynb

    Perform cisTopic modelling, candidate enhancer region inference and motif enrichment analysis using pycisTarget for the combined dataset.
  • 📄 combined_scenicplus.ipynb

    eGRN inference using SCENIC+ on the combined dataset and further network analysis.
  • Pando

  • 📄 Pando.R

    eGRN inference using the Pando method.

final-degree-project's People

Contributors

gerard-deuner
