aneufinder's People

Contributors

ataudt, daewoooo, dtenenba, hpages, jwokaty, kayla-morrell, link-ny, nturaga, vobencha

aneufinder's Issues

too many 3-6 somy CNV calls from single cell DNA-seq

I am using aneufinder to analyze single cell data with default parameters.

For most cells from healthy donors (assumed to carry only a small number of copy number variations), only a few called CNVs are not "2-somy", which is as expected.

However, for around 5% of cells, most of the called CNVs are not "2-somy", as shown in the attached file.

Is there any problem with the result?

cell73.txt

Add way to skip plotting

First of all: Thank you for the great package!

It would be nice to have a parameter in the Aneufinder procedure to disable/skip the plotting, because it seems to be the most time-consuming part. Looking at the code it seems that this would roughly involve making the code between L573-L758 optional.

To make things slightly harder, it would also be great to be able to do the plotting without regenerating the models.

If this is a feature you would accept a patch for, I can create a pull request.

connect to host 4 port 22: Invalid argument

I am trying to run Aneufinder(inputfolder=infolder, outputfolder=outfolder,format="bam", numCPU=cpu, method=c("HMM","dnacopy"),pairedEndReads = TRUE )

A config file is created and then I get the message:
Setting up parallel execution with 4 CPUs ...ssh: connect to host 4 port 22: Invalid argument
and then nothing happens.

What is wrong here?
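One possible cause, not confirmed anywhere in this thread: base R's `parallel::makePSOCKcluster` treats a character argument as a vector of host names and a number as a count of local workers. If `cpu` reached the call as the string "4" (for example, read from the command line), R would try to reach a host named "4" over ssh. Whether Aneufinder forwards `numCPU` in this way is an assumption. A minimal sketch of the conversion:

```r
# Assumption: cpu was read from the command line and is therefore a string.
cpu <- "4"

# Convert to an integer before passing it on as numCPU.
cpu <- as.integer(cpu)
stopifnot(!is.na(cpu), cpu >= 1L)

# With an integer, parallel::makePSOCKcluster(cpu) starts local workers
# instead of interpreting "4" as a host name.
```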

Genomewide heatmap deviating from modal reference copy number

Hi,

We are dealing with a control setting in which the genome is already heavily rearranged.
We would like to determine the most frequently observed copy number state per bin in our reference genome and set that as "normal". Next, we would like to plot a genome-wide heatmap that shows not the exact copy number state but whether the state deviates from the most frequently observed state in our control genome. Is this possible in aneufinder, or could this be made a feature?

cheers,
Yannick
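The request above can be sketched in base R, independent of aneufinder. The matrix `cn`, its values, and the helper `modal_state` are all hypothetical; how to extract per-bin copy numbers from aneufinder models is not shown and would need to be worked out separately.

```r
# Toy copy-number matrix: rows are bins, columns are control cells (made-up values).
cn <- matrix(
  c(2, 2, 3, 2,
    3, 3, 3, 2,
    1, 1, 1, 1,
    2, 4, 4, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("bin", 1:4), paste0("cell", 1:4))
)

# Most frequent value in a vector; ties resolve to the smallest value.
modal_state <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# Modal copy number per bin, used as the "normal" reference.
reference <- apply(cn, 1, modal_state)

# -1 = below the modal state, 0 = equal to it, +1 = above it.
# Recycling subtracts reference[i] from every entry in row i.
deviation <- sign(cn - reference)
```

The `deviation` matrix keeps the bin and cell names, so it can be handed to any heatmap function with a three-colour scale.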

Multiple issues with `binReads`

I've found the following potential issues when looking at the code of the binReads function and how it is called from Aneufinder.

binReads calculates binning for each file it processes in parallel

Calculating the bins for multiple bam files should yield the same result as long as the assembly is the same. This has to be the case if use.bamsignals=FALSE, but also should be if use.bamsignals=TRUE.

Why then does Aneufinder generate the same bins again, and this in parallel for each bam file it processes? If I've got 50 bam files, this will calculate bins 50 times over:

https://github.com/ataudt/aneufinder/blob/master/R/binReads.R#L216-L237

It is not possible to make binReads run only on pre-calculated bins

There are three variables used to pass bins:

  • binsizes - bins to calculate with a fixed size
  • reads.per.bin - bins to calculate with a fixed read count
  • bins - already calculated bins either from a file or GRanges object

The documentation for bins states:

A named list with GRanges containing precalculated bins produced by fixedWidthBins or variableWidthBins.

This, however, is not how it is used in the Aneufinder function; here, bin sizes are passed that are not calculated yet:

https://github.com/ataudt/aneufinder/blob/master/R/Aneufinder.R#L221

In addition, the parallel.helper function that is supposed to bin reads doesn't do anything with existing bins:

https://github.com/ataudt/aneufinder/blob/master/R/Aneufinder.R#L217-L224
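The behaviour asked for above, computing bins once and reusing them for every file, can be illustrated with a base R sketch. This is not aneufinder code: `make_bins`, `count_reads`, the chromosome lengths, and the toy reads are all hypothetical stand-ins.

```r
# Hypothetical chromosome lengths (not a real assembly).
chrom_lengths <- c(chr1 = 1000, chr2 = 600)
binsize <- 250

# Build fixed-width bins once; the last bin of each chromosome is truncated.
make_bins <- function(chrom_lengths, binsize) {
  do.call(rbind, lapply(names(chrom_lengths), function(chrom) {
    starts <- seq(1, chrom_lengths[[chrom]], by = binsize)
    data.frame(chrom = chrom,
               start = starts,
               end = pmin(starts + binsize - 1, chrom_lengths[[chrom]]),
               stringsAsFactors = FALSE)
  }))
}

# Count read positions that fall into each precalculated bin.
count_reads <- function(reads, bins) {
  vapply(seq_len(nrow(bins)), function(i) {
    sum(reads$chrom == bins$chrom[i] &
        reads$pos >= bins$start[i] &
        reads$pos <= bins$end[i])
  }, integer(1))
}

bins <- make_bins(chrom_lengths, binsize)  # computed a single time

# Toy reads standing in for two input files.
samples <- list(
  cellA = data.frame(chrom = c("chr1", "chr1", "chr2"), pos = c(10, 300, 599)),
  cellB = data.frame(chrom = c("chr2", "chr2"), pos = c(1, 251))
)

# The same bins object is reused for every sample.
counts <- sapply(samples, count_reads, bins = bins)
```

Because `bins` depends only on the assembly and the bin size, it is built outside the per-sample loop and passed in, which is the separation the issue argues for.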

can we get bed files from aneufinder

Hello, we are trying to compare single-cell calls made by Ginkgo, Aneufinder and QDNAseq. We can do this if we get either BED or VCF files from Aneufinder. We can't work out whether this is possible; it would be great if you could let us know. Thanks.
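As a generic illustration, and not a statement about what aneufinder itself exports, any table of copy-number segments can be written as BED with base R. The `segments` table, its column names, and its values are hypothetical.

```r
# Hypothetical segment table; coordinates are 1-based and inclusive.
# Integer columns avoid scientific notation (e.g. 5e+06) in the output file.
segments <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  start = c(1L, 5000001L, 1L),
  end   = c(5000000L, 9000000L, 4000000L),
  state = c("2-somy", "3-somy", "2-somy"),
  stringsAsFactors = FALSE
)

# BED is 0-based and half-open: the start shifts by -1, the end stays.
bed <- data.frame(
  chrom = segments$chrom,
  chromStart = segments$start - 1L,
  chromEnd = segments$end,
  name = segments$state,
  stringsAsFactors = FALSE
)

# Write a tab-separated file without header, quotes, or row names.
bed_file <- tempfile(fileext = ".bed")
write.table(bed, bed_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Files written this way from each tool's calls can then be compared with standard interval tools.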

Package check failure with cowplot release candidate

I'm about to release cowplot 1.0 and your package currently doesn't pass a package check with the cowplot release candidate. The problem is in this example:

#'qplot(pre.blacklist$ratio, binwidth=0.1)

The cowplot package doesn't automatically attach ggplot2 anymore, and therefore the example breaks without an explicit library(ggplot2). If you depend on an attached ggplot2 in multiple places, you can also move ggplot2 from Imports to Depends in your DESCRIPTION file, though it is now generally discouraged to have numerous packages in Depends.

There are also a few other important changes to cowplot. I suggest you read through the release notes and make sure your package works with the cowplot release candidate.
https://github.com/wilkelab/cowplot/blob/master/NEWS

I expect to do the CRAN release in the 2nd week of July.

Can I use this tool for bulk WGS data?

Dear Aneufinder team,

Thank you for providing this tool. I was wondering whether I could use it for CNV calling in bulk WGS data. I tried using hg38 aligned bam files of bulk WGS data and got the error:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : scan() expected 'an integer', got 'Signal'

Do you have any clues on what I need to do to fit the WGS data for this tool? Thank you so much!

Best,

Pingping

Binread, Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments

I'm trying to generate binned data from a GRanges object that I made with:

raw_reads=bam2GRanges(bamfile,remove.duplicate.reads = TRUE,min.mapq = 10,blacklist = blacklist)

This worked. However, when I try:

bins_reads=binReads(raw_reads,
                    assembly=genome,
                    chromosomes=chromosomes,
                    binsizes=c(40000,80000,100000,200000,500000))

It gives me the error message: Subsetting specified chromosomes ...Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments

The error can be traced back to:

traceback()
3: seqnames(data) %in% chroms2use
2: data[seqnames(data) %in% chroms2use]
1: binReads(raw_reads, assembly = genome, chromosomes = chromosomes, 
       binsizes = c(40000, 80000, 100000, 200000, 500000))

Please help me understand what is going on here. Thank you so much!
