Giter VIP home page Giter VIP logo

treeshapedrearrangements's Introduction

This repository contains code and data from Koche et al. (under review).

Data

Data from the Berlin Neuroblastoma Dataset can only be shared upon request at the moment. This is due to ongoing work in collaborating groups. It will be made public at the time of publication.

Circle Calls

  • Data/CircleseqCircles.bed contains Circle-seq circle calls for our dataset
  • Data/WGSinferredCircles_eccDNA.txt contains WGS-inferred circle calls for our dataset (eccDNA only)
  • Data/WGSinferredCircles_ecDNA.txt contains WGS-inferred circle calls for our dataset (ecDNA only)

WGS Rearrangement Calls

  • Data/MergedSV_StrictestFiltering_CircleSeqCircleAnnotation.txt are the merged and filtered interchromosomal rearrangement calls (see Structural Variant Merging) which were overlapped with Circle-seq circle calls for classification (available upon request)
  • Data/MergedSV_StrictestFiltering_WGSCircleAnnotation.txt are the merged and filtered interchromosomal rearrangement calls (see Structural Variant Merging) which were overlapped with WGS-inferred circle calls for classification (available upon request)
  • Data/PedPanCanSVs.csv contains structural variants in the DKFZ Pediatric Pan-Cancer dataset published by Gröbner et al. 2018 (data downloaded from the corresponding R2 platform)
  • Data/PedPanCanMeta.csv contains metadata such as the tumor entity for the DKFZ Pediatric Pan-Cancer dataset

Expression Data

  • Data/berlin_cohort_rnaseq_fpkm.txt is RNA-seq data for the Berlin neuroblastoma cohort (available upon request)
  • Data/peifer_54nb_fpkms.txt is RNA-seq data published by Peifer et al.

Clinical Data

  • Data/ClinicalData.csv specifies risk group and survival data

Other

  • Data/BroadIntervals/b37-[...]_MinusBlckLst.txt specifies the mappable, non-blacklisted genome we use for randomization analyses

Code

Structural Variant Merging

  • script_merge_translocations[...].py collapses interchromosomal structural variants from different callers
  • script_filter_mapqBAM_3.0.py filters collapsed interchromosomal structural variants using common criteria
  • script_merge_intraSVs_[...].py collapses intrachromosomal structural variants from different callers
  • script_filter_atleast2callers_2.0.py filters collapsed intrachromosomal structural variants using common criteria
  • script_merge_circularized_regions_[...].py overlaps interchromosomal structural variants with circle calls and classifies accordingly
  • script_merge_circularized_regions_[...]_intraSV_[...].py overlaps intrachromosomal structural variants with circle calls and classifies accordingly
  • script_search_integrations_[...].py searches for two-breakpoint patterns indicative of circle integration into a chromosome

Detection of Clusters of Rearrangements

Tree-shaped rearrangement regions are referred to as Palm Trees throughout the code.

  • RunAll.R runs scripts in the correct order
  • Parse*.R and Prepare*.R are several scripts to read data from different sources and create tidy representations for further analysis
  • CallPalmTrees.R uses output of structural variant merging to call palm trees (a.k.a. tree-shaped rearrangements or clusters of rearrangements)
  • CallPalmTreesPedPanCan.R calls palm trees on published data from Gröbner et al. 2018 (using the same settings as in CallPalmTrees.R)

Basic Rearrangement Cluster Statistics

  • CircosPlots.R plots the identified rearrangements and palm tree regions
  • PalmTreeStackPlot.R creates plots that integrate all CN, SV and Circle-seq information
  • GeneralPalmTreeStatistics.R analyses number and length distributions of palm trees
  • PalmTreeDensity.R plots genome-wide recurrence of palm trees
  • PedPanCanStats.R explores palm tree occurrence in PedPanCan dataset
  • CompareBerlinCohortToPedPanCan.R compares palm tree prevalence between the Peifer/Berlin dataset and the PedPanCan neuroblastoma cases
  • AnalyseNBCallPalmTreesRandomization.Rmd explores 500 synthetic datasets for our cohort and the PedPanCan dataset respectively. Synthetic datasets were obtained by randomizing breakpoint positions during palm tree calling. This is used to estimate false-discovery rates due to the number of rearrangements per sample.

Genes and Gene Expression

  • PalmTreeGenes.R analyzes which genes can be found within palm tree regions
  • PalmTreeTargetsCloseToOncogenes.R statistically analyses whether palm tree target sites are significantly enriched in the neighborhood of cancer-related genes.
  • ExpressionAnalysis.R screens for deregulated genes close to palm tree associated rearrangements

Association with circular DNA

  • RegionSampling.R contains sampling methods to randomly sample regions from a masked or unmasked genome
  • Overlap[...]CircleCalls.R statistically analyses the degree of overlap between palm tree regions and eccDNA / ecDNA inferred from WGS / Circle-seq
  • MergeWGSCSOverlapPlots.R merges results from the preceding scripts for integrative plotting

Clinical Relevance

  • ClinicalData.R explores the prognostic significance of palm trees

Breakpoint Analysis

  • CircleJunctionAnalyseSvabaBreakpoints.R investigates sequence characteristics reconstructed Circle-seq circle junctions
  • SVAnalyseSvabaBreakpoints.R investigates sequence characteristics interchromosomal rearrangements calls by Svaba
  • MemeAnalysis.sh runs MEME on several breakpoint-associated sequences
  • myHomology.R obtains microhomology estimates for accurate breakpoint coordinates from the reference genome
  • Repeats.R tests for association of breakpoints with repetitive regions

Contact

If you have any questions concerning code or data, please do not hesitate to contact us at [email protected].

treeshapedrearrangements's People

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.