modules's People

Contributors

charlottekyng, hxrts, inodb, ipstone, juberpatel, ndbrown6, raylim, rodrigogm, selenicp

modules's Issues

fathmm annotation bug

Transcripts with more than one SNV in a VCF are missing annotations: only the first SNV is annotated.

Mutational signatures figures

For the pies, please remove the grey background, axes and labels.
For the bars, please put the panels on a single row instead of two, remove the grey background, and use a larger font for the y-axis and the top title bars.
Thanks.

Output excel version of alltables

Usually we want only the headers:

TUMOR_SAMPLE    GENE    AA  EFFECT  TUMOR_MAF   NORMAL_MAF  CHASM  Uterus   Mut Tastor  FATHMM  Cancer Gene Census  Kandoth Lawrence    haploinsufficiency  Cancer Cell Fraction ABSOLUTE   IMPACT

It would be nice to have one sheet that has only those headers (summary view) and one with all the data, for each of high_moderate and low_modifier. That makes four sheets in total.
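A minimal sketch of the four-sheet layout, assuming each table has already been read into a list of row dictionaries. The function name `build_sheets`, the sheet-name suffixes, the `DEPTH` column, and the toy rows are hypothetical; `SUMMARY_HEADERS` is only a subset of the headers listed above, because the exact column boundaries are not recoverable from that line.

```python
# Subset of the summary headers from the issue; extend as needed.
SUMMARY_HEADERS = ["TUMOR_SAMPLE", "GENE", "AA", "EFFECT", "TUMOR_MAF", "NORMAL_MAF"]


def build_sheets(tables, summary_headers=SUMMARY_HEADERS):
    """Return {sheet_name: rows} with a summary and a full sheet per table."""
    sheets = {}
    for name, rows in tables.items():
        # Summary view: keep only the requested headers, in the requested order.
        sheets[name + "_summary"] = [
            {h: row.get(h, "") for h in summary_headers} for row in rows
        ]
        # Full view: every column, untouched.
        sheets[name + "_full"] = [dict(row) for row in rows]
    return sheets


# Toy input (made-up values, for illustration only).
tables = {
    "high_moderate": [
        {"TUMOR_SAMPLE": "S1", "GENE": "GENE_A", "AA": "", "EFFECT": "missense",
         "TUMOR_MAF": 0.4, "NORMAL_MAF": 0.0, "DEPTH": 120},
    ],
    "low_modifier": [
        {"TUMOR_SAMPLE": "S1", "GENE": "GENE_B", "AA": "", "EFFECT": "intron",
         "TUMOR_MAF": 0.2, "NORMAL_MAF": 0.0, "DEPTH": 80},
    ],
}
sheets = build_sheets(tables)
```

Each entry in `sheets` could then be written to its own worksheet by whatever Excel writer the script already uses.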

Check if one can ssh to all nodes in cluster

After each run, we check whether the file size is the same on each node. This, however, requires the main process to be able to ssh into every node, so we should verify that this is possible before running the pipeline.
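One possible shape for such a pre-flight check, as a sketch only: it assumes password-less (key-based) ssh and that running `true` on the remote side is an acceptable probe. The function name `unreachable_nodes` and the node names in the usage comment are hypothetical; `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```python
import subprocess


def unreachable_nodes(nodes, run=subprocess.run, timeout=5):
    """Return the nodes that could not be reached with a non-interactive ssh."""
    failed = []
    for node in nodes:
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        cmd = ["ssh", "-o", "BatchMode=yes",
               "-o", "ConnectTimeout={}".format(timeout), node, "true"]
        try:
            result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            ok = result.returncode == 0
        except OSError:
            # ssh binary missing or not executable.
            ok = False
        if not ok:
            failed.append(node)
    return failed


# Usage (hypothetical node names):
#   bad = unreachable_nodes(["node01", "node02"])
#   if bad:
#       raise SystemExit("cannot ssh to: " + ", ".join(bad))
```

The `run` parameter is injectable so the check can be exercised without a real cluster.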

mutation_summary's environment is not working for other users

This is the output:

[ngk1@saba2 log]$ pwd
/home/ngk1/share/projects/nendo_amyo_btseq/log
[ngk1@saba2 log]$ less mutation_summary.2015-08-05.25.log
make[1]: Entering directory '/ifs/e63data/reis-filho/projects/nendo_amyo_btseq'
mkdir -p -m 775 log/mutation_summary.2015-08-05.25/excel excel; umask 002; set -o pipefail;  source /home/debruiji/share/usr/anaconda-envs/anaconda-2.7/bin/activate /home/debruiji/share/usr/anaconda-envs/anaconda-2.7; \
python modules/scripts/mutation_summary_excel.py alltables/allTN.mutect.dp_ft.som_ad_ft.target_ft.pass.dbsnp.cosmic.nsfp.eff.gene_ann.cn_reg.chasm.fathmm.tab.high_moderate.txt alltables/allTN.mutect.dp_ft.som_ad_ft.target_ft.pass.dbsnp.cosmic.nsfp.eff.gene_ann.cn_reg.chasm.fathmm.tab.low_modifier.txt alltables/allTN.mutect.dp_ft.som_ad_ft.target_ft.pass.dbsnp.cosmic.nsfp.eff.gene_ann.cn_reg.chasm.fathmm.tab.synonymous.txt alltables/allTN.mutect.dp_ft.som_ad_ft.target_ft.pass.dbsnp.cosmic.nsfp.eff.gene_ann.cn_reg.chasm.fathmm.tab.nonsynonymous.txt alltables/allTN.strelka_varscan_indels.tabigh_moderate.txt alltables/allTN.strelka_varscan_indels.tab.low_modifier.txt alltables/allTN.strelka_varscan_indels.tab.synonymous.txt alltables/allTN.strelka_varscan_indels.tab.nonsynonymous.txt excel/mutation_summary.xlsx
discarding /home/debruiji/anaconda/bin from PATH
prepending /home/debruiji/share/usr/anaconda-envs/anaconda-2.7/bin to PATH
/home/ngk1/share/usr/lib/python/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/hashtable.so: undefined symbol: PyUnicodeUCS2_DecodeUTF8
Traceback (most recent call last):
  File "modules/scripts/mutation_summary_excel.py", line 6, in <module>
    import pandas as pd
  File "/home/ngk1/share/usr/lib/python/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/__init__.py", line 6, in <module>
    from . import hashtable, tslib, lib
ImportError: /home/ngk1/share/usr/lib/python/pandas-0.12.0-py2.7-linux-x86_64.egg/pandas/hashtable.so: undefined symbol: PyUnicodeUCS2_DecodeUTF8
modules/excel/mutationSummary.mk:24: recipe for target 'excel/mutation_summary.xlsx' failed
make[1]: *** [excel/mutation_summary.xlsx] Error 1
make[1]: Leaving directory '/ifs/e63data/reis-filho/projects/nendo_amyo_btseq'

Summarise FACETS results to gene-level results

#### turn segmented copy number data to gene-based copy number with findOverlaps
## define HomDel as TCN=0, loss as TCN<ploidy, gain as TCN>ploidy, amp as TCN>=ploidy+4
## where ploidy= mode of TCN
### some variant of the below, also need one for the breast panel, IMPACT310 and exome

library(GenomicRanges)  # provides GRanges, IRanges and findOverlaps

genes <- read.delim("/home/ngk1/share/reference/IMPACT410_genes_for_copynumber.txt", as.is=T)

genesGR <- GRanges(seqnames=genes$chromosome, 
        ranges=IRanges(as.numeric(genes$start_position), as.numeric(genes$end_position)),
        mcols=genes[,c("order", "Cyt", "hgnc_symbol")])

facets_files <- dir("facets", pattern="txt", full=T)

mm <- do.call("cbind", lapply(facets_files, function(f) {
    tab <- read.delim(f, as.is=T)
    tab$chrom[which(tab$chrom==23)] <- "X"

    tabGR <- GRanges(seqnames=tab$chrom, 
        ranges=IRanges(as.numeric(tab$loc.start), as.numeric(tab$loc.end)),
        mcols=tab[,-c(1:4)])

    fo <- findOverlaps(tabGR, genesGR)
    rr <- ranges(fo, ranges(tabGR), ranges(genesGR))
    df <- cbind(as.data.frame(fo), as.data.frame(rr))

    df <- cbind(df, mcols(genesGR)[df$subjectHits,], mcols(tabGR)[df$queryHits,])

    # when a gene spans multiple segments, keep the segment with the largest absolute cnlr.median
    oo <- tapply(df$mcols.cnlr.median, df$subjectHits, function(x){which.max(abs(x))})
    oo <- oo[match(1:409, names(oo))]
    oo[which(is.na(oo))] <- 1

    df <- df[unlist(lapply(1:409, function(x) { which(df$mcols.order==x)[oo[which(names(oo)==x)]]})),]

    ploidy <- table(df$mcols.tcn)
    ploidy <- as.numeric(names(ploidy)[which.max(ploidy)])

    df$GL <- 0
    df$GL[which(df$mcols.tcn<ploidy)] <- -1
    df$GL[which(df$mcols.tcn==0)] <- -2
    df$GL[which(df$mcols.tcn>ploidy)] <- 1
    df$GL[which(df$mcols.tcn>=ploidy+4)] <- 2

    df <- df[match(genes$order, df$mcols.order),]
    df$GL
}))
colnames(mm) <- facets_files
mm <- cbind(genes, mm)
write.table(mm, file="GL.txt", sep="\t", row.names=F, na="", quote=F)
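The classification rule stated in the comments above (ploidy as the mode of TCN, then HomDel / loss / gain / amp) can be sketched on its own, separately from the overlap step. This is an illustrative Python version, not part of the pipeline: the function name `gene_level_calls` is hypothetical, ties for the mode are broken by taking the lowest TCN (which mirrors `which.max` on a sorted `table`), and missing values are assumed to have been removed beforehand.

```python
from collections import Counter


def gene_level_calls(tcn_values):
    """Return (ploidy, calls) with calls coded -2, -1, 0, 1, 2 per gene."""
    counts = Counter(tcn_values)
    top = max(counts.values())
    # Ploidy = mode of TCN; lowest value wins a tie (assumption).
    ploidy = min(t for t, c in counts.items() if c == top)

    calls = []
    for tcn in tcn_values:
        if tcn == 0:
            calls.append(-2)   # homozygous deletion
        elif tcn >= ploidy + 4:
            calls.append(2)    # amplification
        elif tcn > ploidy:
            calls.append(1)    # gain
        elif tcn < ploidy:
            calls.append(-1)   # loss
        else:
            calls.append(0)    # neutral
    return ploidy, calls


ploidy, calls = gene_level_calls([2, 2, 2, 1, 0, 3, 6])
```

An empty input raises a `ValueError` from `max`; a real implementation would need to decide how to report samples with no overlapping genes.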

Turn off annotation for mouse mutation calls

We need snpEff annotation (with mouse genome), but don't need dbNSFP, CHASM, FATHMM and the rest. Something choked (I think CHASM) because of the different chromosome sizes.

Excel output raw file sheet unparseable with pandas

The raw file sheets are created from a groupby object. When that object is
written to an Excel sheet, cells with the same value are merged. This looks nice
visually, but it makes the sheet unparseable, so for the raw data sheets it is
probably better to convert the groupby object to a regular dataframe before
writing it to Excel.
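A small sketch of the suggested conversion, assuming the grouped result is a DataFrame indexed by the group keys (a MultiIndex). The column names and values are toy data, and this is not the code in mutation_summary_excel.py.

```python
import pandas as pd

# Toy data (made-up values, for illustration only).
raw = pd.DataFrame({
    "TUMOR_SAMPLE": ["S1", "S1", "S2"],
    "GENE": ["GENE_A", "GENE_B", "GENE_A"],
    "TUMOR_MAF": [0.4, 0.2, 0.3],
})

# Grouping puts the keys into a MultiIndex; written to Excel as-is,
# repeated index values end up as merged cells.
grouped = raw.groupby(["TUMOR_SAMPLE", "GENE"]).first()

# reset_index() turns the keys back into ordinary columns, so every
# row carries its own values and the sheet can be read back cleanly.
flat = grouped.reset_index()

# Alternative when keeping the index: DataFrame.to_excel accepts
# merge_cells=False, which writes index values without merging.
```

`flat` can then be passed to `to_excel(..., index=False)` for the raw data sheets.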

Test data

We should create some simple test data sets that can show that all modules are working. Preferably a very tiny one that runs in only a couple of minutes.

Additions to mutsig_report

In addition to what's already there, please add the following to the report. This is basically trying to solve the signatures using the NMF method.

require(NMF)
require(lsa)
require(reshape2)
require(plyr)
require(ggplot2)
require(gplots)

alexandrov <- read.delim(opt$alexandrovData)
rownames(alexandrov) <- with(alexandrov, gsub(">", ".", paste(Trinucleotide, Substitution.Type, sep=".")))
alexandrov.matrix <- data.matrix(alexandrov[,4:ncol(alexandrov)])

solveNMF <- function(x, inmatrix){
    coef <- fcnnls(x, inmatrix[rownames(x),]) # reorder the rownames of the in matrix 
    colsum <- apply(coef$x, 2, sum)
    coef_x_scaled <- scale(coef$x, center=F, scale=colsum)
    return(coef_x_scaled)
}

mutcounts.nmf <- solveNMF(alexandrov.matrix, spectra_mat) ### this spectra_mat should be similar to the "X" that goes into plotMutBarplot, example attached.

# then perhaps a heatmap or something showing the results of mutcounts.nmf, also write it out to a text file.

sig_counts_matrix.txt
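For reference, the scaling step inside solveNMF (dividing each column of the coefficient matrix by its column sum, so that each sample's signature contributions sum to 1) can be sketched without any dependencies. This illustrates only that normalisation, not the non-negative least squares fit; the function name is hypothetical, and zero-sum columns are left as zeros here, whereas R's `scale` would produce NaN.

```python
def scale_columns_to_proportions(coef):
    """Divide every column of a signatures-by-samples matrix by its sum."""
    n_cols = len(coef[0])
    col_sums = [sum(row[j] for row in coef) for j in range(n_cols)]
    return [
        [row[j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n_cols)]
        for row in coef
    ]


# Two signatures (rows) by two samples (columns), toy values.
scaled = scale_columns_to_proportions([[2.0, 1.0], [6.0, 3.0]])
```

The scaled matrix is what a heatmap or the text output would be built from.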

Run CHASM with multiple classifiers

Currently we only support running CHASM with one classifier, e.g. Breast. It would be nice to support multiple classifiers, perhaps by supplying a space-separated list of classifiers in the Makefile.

Mutsig_report is not using the novel vcfs

For the vast majority of projects, only the novel mutations are reported (GMAF>0.05). mutsig_report currently uses everything.vcf. To be consistent, it should restrict its input mutations to everything.novel.vcf.
