embl-hentze-group / dewseq



R/Bioconductor package for e/iCLIP data analysis

R 86.02% TeX 13.98%

bioinformatics ngs-analysis eclip

dewseq's Introduction

DEWSeq

An R package for the analysis of eCLIP and iCLIP data using a sliding-window approach.

Bioconductor page: https://bioconductor.org/packages/release/bioc/html/DEWSeq.html

Vignette: https://bioconductor.org/packages/release/bioc/vignettes/DEWSeq/inst/doc/DEWSeq.html

Bug reports: https://github.com/EMBL-Hentze-group/DEWSeq/issues

dewseq's People

Contributors

Stargazers

Watchers

Forkers

fulaibaowang

dewseq's Issues

question about annotObj in DESeqDataSetFromSlidingWindows

Hi,

I followed the htseq-clip tutorial and generated all the files.

But the sliding-window annotation file that htseq-clip created is in BED6 format.

It cannot be used as annotObj for DESeqDataSetFromSlidingWindows.

Do you have any idea how to convert the htseq-clip annotation into an annotObj that DESeqDataSetFromSlidingWindows recognizes?

Snakemake workflow for preprocessing prior to DEWSeq

Dear Thomas @Distue and Sudeep @sudeepsahadevan,

I hope this finds you both well. I remember you used to provide a Snakemake workflow to preprocess eCLIP data; please correct me if I'm wrong. Is it available anywhere? Do you recommend any alternatives?

citation?

Good day. There is no output from citation("DEWSeq"). How do I properly cite this package? Thank you.

resultRegions() and toBED() function

Hi,

I am running the vignette, and I would really appreciate it if you could explain the output in a bit more detail.

1. extractRegions
As described there, the extractRegions function combines overlapping significant windows.
But the result still contains overlapping regions, for example in the vignette:

##  4 chr1           28648620   28648730 +                      5           110
##  5 chr1           28648620   28648733 +                      5           113

So should the real number of significant binding regions be smaller than the total number of rows in the resultRegions table (218)?
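If the aim is to count distinct genomic loci regardless of which gene or window set each row came from, overlapping rows can be collapsed after extractRegions has run. The base R sketch below does this for rows 4 and 5 of the vignette output quoted above. `merge_overlaps()` is a hypothetical helper, not part of DEWSeq, and the column names chromosome, begin, end and strand are assumptions because the header of the vignette output is not shown here. The sketch says nothing about how extractRegions itself decides what to merge.

```r
# Rows 4 and 5 of the vignette's resultRegions output (column names assumed)
regions <- data.frame(
  chromosome = c("chr1", "chr1"),
  begin      = c(28648620, 28648620),
  end        = c(28648730, 28648733),
  strand     = c("+", "+")
)

# Hypothetical helper: collapse intervals that overlap (or touch) on the same
# chromosome and strand into a single interval
merge_overlaps <- function(df) {
  df  <- df[order(df$chromosome, df$strand, df$begin), ]
  out <- df[0, ]
  for (i in seq_len(nrow(df))) {
    last <- nrow(out)
    if (last > 0 &&
        out$chromosome[last] == df$chromosome[i] &&
        out$strand[last] == df$strand[i] &&
        df$begin[i] <= out$end[last]) {
      # Extend the current merged interval
      out$end[last] <- max(out$end[last], df$end[i])
    } else {
      # Start a new merged interval
      out <- rbind(out, df[i, ])
    }
  }
  rownames(out) <- NULL
  out
}

merged <- merge_overlaps(regions)
nrow(merged)  # 1: both rows describe one locus, chr1:28648620-28648733 (+)
```

With Bioconductor available, `GenomicRanges::reduce()` on a GRanges object performs the same kind of collapse and scales better than a row-by-row loop.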

2. toBED
If I run:

resultRegions <- extractRegions(windowRes  = resultWindows,
                                padjCol    = "p_adj_IHW",
                                padjThresh = 0.01, 
                                log2FoldChangeThresh = 0.5) %>% as_tibble

and

toBED(windowRes = resultWindows,
      regionRes = resultRegions,
      fileName   = "enrichedWindowsRegions.bed",
      padjCol    = "p_adj_IHW",
      padjThresh = 0.01,
      log2FoldChangeThresh = 0.5)

the output file "enrichedWindowsRegions.bed" has many more rows than the resultRegions table. Why?

Thank you!

contrast argument in resultsDEWSeq

Hi,

I see that the order of the groups in the contrast argument changes the result of resultsDEWSeq. Specifically,

resultWindows <- resultsDEWSeq(ddw,
                               contrast = c("type", "group1", "group2"),
                               tidy = TRUE) %>% as_tibble

and

resultWindows <- resultsDEWSeq(ddw,
                               contrast = c("type", "group2", "group1"),
                               tidy = TRUE) %>% as_tibble

give different results.

Could you elaborate a bit on this? Thank you!
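In DESeq2, which DEWSeq builds on, `contrast = c("type", "group1", "group2")` reports log2FoldChange as log2(group1 / group2), so swapping the two levels flips the sign of every log2FoldChange. With a two-sided test the p-values would not change. DEWSeq describes its test as one-sided, for enrichment; assuming that, swapping the levels changes which tail is tested, so the p-values themselves change. The base R sketch below uses made-up numbers and a plain normal Wald statistic, not DEWSeq's actual implementation, to show the effect.

```r
# Toy Wald statistics for a single window (made-up values, not DEWSeq output)
lfc <- 1.5  # log2 fold change of group1 over group2
se  <- 0.5  # its standard error

# One-sided test for enrichment in the numerator group (upper tail of N(0, 1))
p_one_sided <- function(lfc, se) pnorm(lfc / se, lower.tail = FALSE)
# Two-sided test, for comparison
p_two_sided <- function(lfc, se) 2 * pnorm(-abs(lfc / se))

p_g1_vs_g2 <- p_one_sided(lfc, se)   # contrast = c("type", "group1", "group2")
p_g2_vs_g1 <- p_one_sided(-lfc, se)  # swapped contrast: same window, sign flipped

p_two_g1 <- p_two_sided(lfc, se)
p_two_g2 <- p_two_sided(-lfc, se)

round(c(p_g1_vs_g2, p_g2_vs_g1), 5)  # approximately 0.00135 and 0.99865
round(c(p_two_g1, p_two_g2), 5)      # identical for both orders
```

The one-sided p-values are nearly complements of each other, so a window that is significant with one order of levels cannot be significant with the other, while the two-sided p-values are unaffected by the order.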

error when running the vignette

I think the last toBED call in the vignette

toBED(windowRes = resultWindows,
      regionRes = resultRegions,
      fileName  = "enrichedWindowsRegions.bed")

should be

toBED(windowRes = resultWindows,
      regionRes = resultRegions,
      fileName  = "enrichedWindowsRegions.bed",
      padjCol = "p_adj_IHW")

function not found

Hello!
After running the code from your R package, I get the following error:
Error in DESeqDataSetFromSlidingWindows(countData = count_matrix, colData = col_data, :
There is no "DESeqDataSetFromSlidingWindows" function.
The source code is as follows:
ddw <- DESeqDataSetFromSlidingWindows(countData=count_matrix, colData=col_data, annotObj=annotation_file, design=~type)

How can this problem be solved? Looking forward to your answer. Thank you!
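In R, an error saying a function cannot be found usually means the package that defines it is not installed or has not been attached in the current session. A minimal check in base R might look like the sketch below; the install hint refers to `BiocManager::install()`, the standard installer for Bioconductor packages.

```r
# Is DEWSeq installed at all? (does not attach it)
has_dewseq <- requireNamespace("DEWSeq", quietly = TRUE)

if (has_dewseq) {
  # Attach the package so its exported functions are visible
  library(DEWSeq)
  exists("DESeqDataSetFromSlidingWindows")
} else {
  message("DEWSeq is not installed; try BiocManager::install(\"DEWSeq\")")
}
```

If `exists()` returns TRUE after `library(DEWSeq)`, the original call should at least find the function; any remaining error would then come from the arguments.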

Does the count matrix dimension have to match the annotation file dimension?

Hi,

I followed the htseq-clip + DEWSeq pipeline to process CLIP data.

But it gave an error when using DESeqDataSetFromSlidingWindows:

Error in SummarizedExperiment(assays = SimpleList(counts = countData), :
the rownames and colnames of the supplied assay(s) must be NULL or identical to those of the
RangedSummarizedExperiment object (or derivative) to construct

Is this because the dimensions of countData (1462654 × 3) do not match those of annotationData (65312564 × 12)?

I got all count matrix files and sliding-window annotation files from htseq-clip.

Is there any way to fix this ?

Thanks
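The annotation is expected to list every sliding window, whereas the count matrix may hold far fewer rows, so the two row counts presumably need not be equal. What likely has to hold is that each count-matrix row name matches exactly one entry in the annotation's window-ID column; the column name `unique_id` is an assumption here. The toy base R sketch below checks that and builds an annotation subset aligned row by row with the counts. All IDs, coordinates and sample names are made up.

```r
# Toy annotation: one row per sliding window (IDs and coordinates are made up)
annotation <- data.frame(
  unique_id  = c("G1:exon0001W00001", "G1:exon0001W00002", "G1:exon0001W00003"),
  chromosome = "chr1",
  begin      = c(100, 150, 200),
  end        = c(150, 200, 250),
  strand     = "+"
)

# Toy count matrix: only windows that were counted, so fewer rows than the annotation
counts <- matrix(c(5, 0, 3, 1), nrow = 2,
                 dimnames = list(c("G1:exon0001W00001", "G1:exon0001W00003"),
                                 c("IP_1", "SMI_1")))

# Each window ID should occur only once in the annotation
stopifnot(anyDuplicated(annotation$unique_id) == 0)

# Position of each count-matrix row in the annotation (NA = no annotation found)
idx <- match(rownames(counts), annotation$unique_id)
sum(is.na(idx))

# Annotation rows in the same order as the count matrix
annot_subset <- annotation[idx, ]
rownames(annot_subset) <- annot_subset$unique_id
identical(rownames(annot_subset), rownames(counts))
```

If these checks pass on real data, the aligned subset is also far smaller than the full annotation, which may help when memory is tight.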

specific design and model

Hello,
Thank you very much for the great tool!
I have an eCLIP dataset with two conditions (before and after treatment), with 8 IP replicates plus replicated SMI controls.
I would like to use DEWSeq to compare the binding profile of an RBP at two different time points.
I was wondering whether the DEWSeq pipeline can be applied to search for differentially bound regions between IP samples (rather than the one-sided IP vs. SMI comparison). If so, how can I make this comparison while accounting for the negative controls? What would the design formula and model be for this experiment?

Here is the sample info:

Sample ID   Condition1   Condition2
1           T0 A         IP
2           T0 B         IP
3           T0 C         IP
4           T0 D         IP
5           T2 A         IP
6           T2 B         IP
7           T2 C         IP
8           T2 D         IP
9           T0 A         Input
10          T0 B         Input
11          T0 C         Input
12          T0 D         Input
13          T2 A         Input
14          T2 B         Input
15          T2 C         Input
16          T2 D         Input

The two conditions are T0 and T2 (non-stimulated and stimulated, respectively).

So, for example:
“T0 A IP” is the non-stimulated sample A, IP
“T0 A Input” is the input of that same sample

“T2 A IP” is the stimulated sample A, IP
“T2 A Input” is the input of that same sample

Thank you in advance!
All the best,
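A design often used in DESeq2 to ask whether IP enrichment over input differs between conditions adds an interaction term. The base R sketch below only transcribes the sample table and builds the corresponding design matrix so the coefficients can be inspected. The column names `condition` and `type` are assumptions, and whether DEWSeq's one-sided testing suits the interaction coefficient is not settled here.

```r
# Sample sheet transcribed from the table above
samples <- data.frame(
  sample    = paste(rep(rep(c("T0", "T2"), each = 4), 2),
                    rep(LETTERS[1:4], 4),
                    rep(c("IP", "Input"), each = 8)),
  condition = factor(rep(c("T0", "T2"), each = 4, times = 2)),
  # Input as the reference level, so the IP coefficient means "IP over input"
  type      = factor(rep(c("IP", "Input"), each = 8), levels = c("Input", "IP"))
)

# Main effects of time point and IP, plus their interaction
design <- ~ condition + type + condition:type
mm <- model.matrix(design, data = samples)

colnames(mm)
# "(Intercept)" "conditionT2" "typeIP" "conditionT2:typeIP"
```

Under this treatment coding, `typeIP` is the log2(IP/input) enrichment at T0 and `conditionT2:typeIP` is how much that enrichment changes from T0 to T2, which is the quantity the question asks about.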

Memory issues

I'm running out of memory trying to create ddw object using DESeqDataSetFromSlidingWindows.

Code is:
ddw <- DESeqDataSetFromSlidingWindows(countData=count_matrix, annotObj = data.frame(annotation_file), colData=col_data, design=~type)

Result is:
Error: cannot allocate vector of size 1024.0 Mb

I checked the dimensions of the matrices:
> dim(count_matrix)
[1] 381366 16
> dim(annotation_file)
[1] 88574203 12

Does the annotation seem a bit large? I struggled to load it into R in the first place with fread, so I used ff instead.

I followed the examples from https://link.springer.com/protocol/10.1007%2F978-1-0716-1851-6_10 to generate the annotation file so I'm not sure how to fix it.

Any help would be much appreciated.

How to set the -e/--mate parameter for non-strand-specific or single-end sequencing libraries?

Hi,

I have a question regarding the -e/--mate parameter for the extract command.

Sometimes a sequencing library is non-strand-specific or single-end. According to the documentation, the -e/--mate parameter selects the read/mate from which crosslink sites are extracted in paired-end sequencing, with choices 1 or 2 (1 for the first mate, 2 for the second mate).

Could you please provide guidance on how to set this parameter for:

Non-strand-specific libraries
Single-end sequencing libraries

Thank you for your help!

Best regards,

DESeqDataSetFromSlidingWindows issue

When trying to generate the DESeq object, I get this error:
ddw <- DESeqDataSetFromSlidingWindows(countData=count_matrix, annotObj = annotation_file, colData=col_data, design=~type)

Warning in DESeqDataSetFromSlidingWindows(countData = count_matrix, annotObj = annotation_file, :
Cannot find chromosomal positions for all entries in countData.
countData rows with missing annotation will be removed !
Error in DESeqDataSet(se, design = design, ignoreRank) :
all samples have 0 counts for all genes. check the counting script.

head(count_matrix) gives me:
smb-hk-2-22-fhevci1-rep1-20200123-ju_trimmed
ENSG00000227232.5:intron0005W00067 1
ENSG00000227232.5:exon0004W00079 0
ENSG00000227232.5:intron0002W00089 2
ENSG00000227232.5:intron0001W00221 0
ENSG00000279457.4:intron0008W00009 0
ENSG00000279457.4:intron0007W00029 0
smb-hk-2-23-fhevci8-rep1-20200123-ju_trimmed
ENSG00000227232.5:intron0005W00067 0
ENSG00000227232.5:exon0004W00079 0
ENSG00000227232.5:intron0002W00089 1
ENSG00000227232.5:intron0001W00221 0
ENSG00000279457.4:intron0008W00009 0
ENSG00000279457.4:intron0007W00029 1

I thought the chromosomal positions came from the annotation object rather than from the count matrix file?
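Judging from the warning, the positions are indeed taken from annotObj: DESeqDataSetFromSlidingWindows appears to look up each countData row name in the annotation and drop the rows it cannot find. If almost none are found, the surviving rows can easily all be zero, which would be consistent with the DESeqDataSet error. The base R sketch below pairs the row names and counts shown above with hypothetical annotation IDs in a different format to illustrate that failure mode. `diagnose_ids()` is a made-up helper, the shortened sample names are for readability, and the format mismatch is an assumption, not a confirmed cause.

```r
# Counts copied from head(count_matrix) above; sample names shortened
counts <- matrix(
  c(1, 0, 2, 0, 0, 0,
    0, 0, 1, 0, 0, 1),
  ncol = 2,
  dimnames = list(
    c("ENSG00000227232.5:intron0005W00067", "ENSG00000227232.5:exon0004W00079",
      "ENSG00000227232.5:intron0002W00089", "ENSG00000227232.5:intron0001W00221",
      "ENSG00000279457.4:intron0008W00009", "ENSG00000279457.4:intron0007W00029"),
    c("sample_22", "sample_23")
  )
)

# Hypothetical annotation IDs written in a different format (no version suffix)
annot_ids <- c("ENSG00000227232:intron0005W00067", "ENSG00000227232:exon0004W00079")

# Hypothetical helper: how many count rows find an annotation entry,
# and what the per-sample totals look like after unmatched rows are dropped
diagnose_ids <- function(counts, annot_ids) {
  found <- rownames(counts) %in% annot_ids
  kept  <- counts[found, , drop = FALSE]
  list(
    rows_found      = sum(found),
    rows_missing    = sum(!found),
    counts_after    = colSums(kept),  # all zero leaves nothing to model
    example_missing = head(rownames(counts)[!found], 3),
    example_annot   = head(annot_ids, 3)
  )
}

res <- diagnose_ids(counts, annot_ids)
res$rows_found    # 0
res$counts_after  # both samples sum to 0
```

Printing `example_missing` next to `example_annot` on the real data is a quick way to spot a formatting difference between the two ID columns.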