ataudt / aneufinder

Find CNVs in single cell sequencing data.

R 74.49% C++ 24.56% TeX 0.96%

aneufinder's Issues

too many 3-6 somy CNV calls from single cell DNA-seq

I am using aneufinder to analyze single cell data with default parameters.

For most cells from healthy donors (assumed to carry only a small number of copy number variations), only a few of the called CNVs are not "2-somy", which is as expected.

However, for around 5% of cells, most of the called CNVs are not "2-somy", as shown in the attached file.

Is there any problem with the result?

cell73.txt

Add way to skip plotting

First of all: Thank you for the great package!

It would be nice to have a parameter in the Aneufinder procedure to disable/skip the plotting, because it seems to be the most time-consuming part. Looking at the code it seems that this would roughly involve making the code between L573-L758 optional.

To make things slightly harder, it would also be great to be able to do the plotting without regenerating the models.

If this is a feature you would accept a patch for, I can create a pull request.

Plots assigned wrong identifiers in heatmapGenomewide

heatmapGenomewide gets confused if any model in the list of hmms has an empty segments variable and plots the wrong identifiers for samples.

It would be better to either drop these samples or throw an error.
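
As a workaround until the function handles this itself, models without segments could be filtered out before plotting. The sketch below uses plain lists as hypothetical stand-ins for fitted models (real AneuFinder models are richer objects), and `has_no_segments` is an illustrative helper, not part of the package.

```r
# Hypothetical stand-ins for fitted models: each has an ID and a `segments` table.
models <- list(
  cellA = list(ID = "cellA",
               segments = data.frame(chrom = "chr1", start = 1, end = 1e6, state = "2-somy")),
  cellB = list(ID = "cellB", segments = NULL),
  cellC = list(ID = "cellC",
               segments = data.frame(chrom = "chr1", start = 1, end = 1e6, state = "3-somy"))
)

# TRUE for models whose segments are missing or have zero entries.
has_no_segments <- function(model) {
  is.null(model$segments) || NROW(model$segments) == 0
}

empty <- vapply(models, has_no_segments, logical(1))
if (any(empty)) {
  message("Dropping models without segments: ",
          paste(names(models)[empty], collapse = ", "))
}

# Only these models would be passed on to the heatmap function.
plottable <- models[!empty]
```

Replacing `message()` with `stop()` would give the "throw an error" behaviour instead of silently dropping samples.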

connect to host 4 port 22: Invalid argument

I am trying to run:
Aneufinder(inputfolder=infolder, outputfolder=outfolder, format="bam", numCPU=cpu, method=c("HMM","dnacopy"), pairedEndReads=TRUE)

A config file is created, and then I get the message:
Setting up parallel execution with 4 CPUs ...ssh: connect to host 4 port 22: Invalid argument
and then nothing happens.

What is wrong here??
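
One possible cause, not confirmed in this thread: if `cpu` holds the text "4" rather than the number 4 (for example because it was read from the command line), and the value ends up in a cluster constructor such as `parallel::makeCluster`, R treats a character value as a host name and tries to reach it over ssh. A minimal check, where `cpu_arg` is a hypothetical variable standing in for however the value was obtained:

```r
# Hypothetical: the CPU count arrives as text, e.g. from commandArgs().
cpu_arg <- "4"

# Convert to an integer so it is read as a worker count, not a host name.
cpu <- as.integer(cpu_arg)

# Fail early if the value could not be parsed or is not positive.
stopifnot(!is.na(cpu), cpu >= 1L)

# `cpu` can now be passed as numCPU=cpu.
```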

Genomewide heatmap deviating from modal reference copy number

Hi,

We are dealing with a control setting in which the genome is already pretty messed up.
We would like to determine the most frequently observed copy number state per bin in our reference genome and set that as "normal". Next, we would like to plot a genome-wide heatmap that shows not the exact copy number state but whether the state deviates from the most frequently observed state in our control genome. Is this possible in aneufinder, or could this be made a feature?

cheers,
Yannick
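
The calculation described in this request can be sketched in base R, independent of any AneuFinder function. The matrices below are made-up copy numbers (rows are cells, columns are bins), and `modal_value` is an illustrative helper, not part of the package.

```r
# Hypothetical copy-number matrices: rows = cells, columns = bins.
control <- rbind(
  c(2, 3, 1, 2),
  c(2, 3, 2, 2),
  c(2, 3, 1, 4)
)
samples <- rbind(
  s1 = c(2, 3, 1, 2),
  s2 = c(3, 2, 1, 2)
)

# Most frequent value in a vector; ties resolve to the smallest value.
modal_value <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# Modal copy number per bin across the control cells.
modal_cn <- apply(control, 2, modal_value)

# Per-bin deviation from the control mode: -1 = loss, 0 = same, +1 = gain.
deviation <- sign(sweep(samples, 2, modal_cn))
```

The `deviation` matrix could then be handed to any heatmap routine in place of the raw copy number states.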

Multiple issues with `binReads`

I've found the following potential issues when looking at the code of the binReads function and how it is called from Aneufinder.

`binReads` calculates binning for each file it processes in parallel

Calculating the bins for multiple bam files should yield the same result as long as the assembly is the same. This has to be the case if use.bamsignals=FALSE, but also should be if use.bamsignals=TRUE.

Why then does Aneufinder generate the same bins again, and this in parallel for each bam file it processes? If I've got 50 bam files, this will calculate bins 50 times over:

https://github.com/ataudt/aneufinder/blob/master/R/binReads.R#L216-L237
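
The alternative being suggested, computing the bins once and reusing them for every file, can be sketched in base R. This is not the package's implementation: the chromosome lengths, read positions, and the `make_bins` and `count_reads` helpers are all hypothetical.

```r
# Hypothetical chromosome lengths and read start positions for two files.
chrom_lengths <- c(chr1 = 1000, chr2 = 500)
reads_by_file <- list(
  a.bam = data.frame(chrom = c("chr1", "chr1", "chr2"), pos = c(10, 250, 499)),
  b.bam = data.frame(chrom = c("chr1", "chr2", "chr2"), pos = c(999, 1, 120))
)

# Build fixed-width bins ONCE for the whole assembly.
make_bins <- function(chrom_lengths, binsize) {
  do.call(rbind, lapply(names(chrom_lengths), function(chrom) {
    starts <- seq(1, chrom_lengths[[chrom]], by = binsize)
    data.frame(chrom = chrom, start = starts,
               end = pmin(starts + binsize - 1, chrom_lengths[[chrom]]))
  }))
}
binsize <- 250
bins <- make_bins(chrom_lengths, binsize)

# Count reads per bin for one file, reusing the precomputed bins.
count_reads <- function(reads, bins, binsize) {
  bin_key  <- paste(bins$chrom, bins$start)
  read_key <- paste(reads$chrom, ((reads$pos - 1) %/% binsize) * binsize + 1)
  as.integer(table(factor(read_key, levels = bin_key)))
}

# The same `bins` object is shared by every file.
counts <- lapply(reads_by_file, count_reads, bins = bins, binsize = binsize)
```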

It is not possible to make `binReads` run only on pre-calculated bins

There are three variables used to pass bins:

`binsizes` - bins to calculate with a fixed size
`reads.per.bin` - bins to calculate with a fixed read count
`bins` - already calculated bins, either from a file or a GRanges object

The documentation for bins states:

A named list with GRanges containing precalculated bins produced by fixedWidthBins or variableWidthBins.

This, however, is not how it is used in the Aneufinder function; here, bin sizes are passed that are not calculated yet:

https://github.com/ataudt/aneufinder/blob/master/R/Aneufinder.R#L221

In addition, the parallel.helper function that is supposed to bin reads doesn't do anything with existing bins:

https://github.com/ataudt/aneufinder/blob/master/R/Aneufinder.R#L217-L224

404 page not found

Hi,
When I click the 'vignette' hyperlink, I get "404 page not found".
I think the PDF no longer exists. What happened?
Thank you for your attention.
(https://github.com/ataudt/aneufinder/blob/master/vignettes/AneuFinder.pdf)

can we get bed files from aneufinder

Hello, we are trying to compare single cell data calls made by Ginkgo, Aneufinder and QDNAseq. We can do this if we get either BED or VCF files from aneufinder. We can't work out whether this is possible; it would be great if you could let me know. Thanks
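
Whether the package has a built-in exporter is not stated in this thread. If the calls are available as a table of segments, writing a BED file by hand is short. A minimal sketch in base R, assuming a hypothetical `segments` data frame with 1-based inclusive coordinates:

```r
# Hypothetical CNV segments with 1-based, inclusive coordinates.
segments <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  start = c(1, 5000001, 1),
  end   = c(5000000, 9000000, 4000000),
  state = c("2-somy", "3-somy", "1-somy"),
  stringsAsFactors = FALSE
)

# BED is 0-based and half-open: subtract 1 from the start, keep the end.
# Integer columns stop write.table from emitting values like 5e+06.
bed <- data.frame(
  chrom      = segments$chrom,
  chromStart = as.integer(segments$start - 1),
  chromEnd   = as.integer(segments$end),
  name       = segments$state
)

# Tab-separated, no header, no quotes.
bed_file <- tempfile(fileext = ".bed")
write.table(bed, bed_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```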

Package check failure with cowplot release candidate

I'm about to release cowplot 1.0 and your package currently doesn't pass a package check with the cowplot release candidate. The problem is in this example:

aneufinder/R/blacklist.R

Line 22 in adb7443

#'qplot(pre.blacklist$ratio, binwidth=0.1)

The cowplot package doesn't automatically attach ggplot2 anymore, and therefore the example breaks without an explicit library(ggplot2). If you depend on an attached ggplot2 in multiple places, you can also move ggplot2 from Imports to Depends in your DESCRIPTION file, though it is now generally discouraged to have numerous packages in Depends.

There are also a few other important changes to cowplot. I suggest you read through the release notes and make sure your package works with the cowplot release candidate.
https://github.com/wilkelab/cowplot/blob/master/NEWS

I expect to do the CRAN release in the 2nd week of July.

Can I use this tool for bulk WGS data?

Dear Aneufinder team,

Thank you for providing this tool. I was wondering whether I could use it for CNV calling in bulk WGS data. I tried using hg38 aligned bam files of bulk WGS data and got the error:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : scan() expected 'an integer', got 'Signal'

Do you have any clues on what I need to do to fit the WGS data for this tool? Thank you so much!

Best,

Pingping

`binReads`: Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments

I'm trying to generate bin data from a GRanges object that I made with:

raw_reads=bam2GRanges(bamfile,remove.duplicate.reads = TRUE,min.mapq = 10,blacklist = blacklist)

And this turned out to work. However when I try:

bins_reads=binReads(raw_reads,
                    assembly=genome,
                    chromosomes=chromosomes,
                    binsizes=c(40000,80000,100000,200000,500000))

It gives me the error message: Subsetting specified chromosomes ...Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments

The error can be traced back to:

traceback()
3: seqnames(data) %in% chroms2use
2: data[seqnames(data) %in% chroms2use]
1: binReads(raw_reads, assembly = genome, chromosomes = chromosomes, 
       binsizes = c(40000, 80000, 100000, 200000, 500000))

Please help me understand what is going on here. Thank you so much!
