zarnackgroup / bindingsitefinder

Package for the definition of binding sites for iCLIP data

Home Page: https://www.bioconductor.org/packages/release/bioc/html/BindingSiteFinder.html

R 99.82% TeX 0.18%

binding-site-classification binding-sites bioconductor-package iclip rna-binding-proteins

bindingsitefinder's Introduction

BindingSiteFinder

Precise knowledge of the binding sites of an RNA-binding protein (RBP) is key to understanding (post-)transcriptional regulatory processes.

The BindingSiteFinder software package provides a full workflow for RBP binding site definition. The package provides functions as well as rich visualizations and is well integrated with state-of-the-art and widely used Bioconductor classes, such as GenomicRanges and SummarizedExperiment. For details, please see the vignette.

For the latest changes and updates, please have a look at the devel page.

bindingsitefinder's Issues

Help with an error, please

Hello, I was trying to use your BindingSiteFinder and got an error:

Error in if (all(!df.global$increaseOverMin)) { :
missing value where TRUE/FALSE needed

I was using

BSFind(object = KObds, anno.genes = gns, anno.transcriptRegionList = regions,
       est.subsetChromosome = "chr3", veryQuiet = TRUE, est.maxBsWidth = 29)

and it also gave the following message:
2.16% (17/788) peaks overlap with multiple anno.genes in the given gene annotation.
A single instance of each peak is kept. This is recommended.

Could you please tell me what could be the problem? Thanks.
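
The error message itself is base R behaviour: `if ()` refuses a condition that evaluates to NA. A minimal sketch of how that can arise, assuming the logical column contains at least one NA (the vector below is a hypothetical stand-in, and why the real column would contain NA is not known from this report):

```r
# Hypothetical stand-in for a logical column such as df.global$increaseOverMin.
increaseOverMin <- c(FALSE, NA, FALSE)

# all() returns NA when no element is FALSE and at least one element is NA.
cond <- all(!increaseOverMin)
print(cond)

# Passing NA to if () is what raises "missing value where TRUE/FALSE needed".
res <- tryCatch(
  if (cond) "no increase" else "some increase",
  error = function(e) conditionMessage(e)
)
print(res)

# Two NA-safe variants, shown only to illustrate the behaviour:
safeStrict <- isTRUE(cond)                         # treats NA as FALSE
safeIgnore <- all(!increaseOverMin, na.rm = TRUE)  # drops NA before testing
```

Checking the input for NA values before calling the workflow (for example with `anyNA()`) may help narrow down where the missing value comes from.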

Problems if not all samples have the same chromosomes

Hi,
when I use makeBindingSites, I get the following error
Error in .subset_by_GenomicRanges(x, i) :
‘x’ must have unique names when subsetting by a GenomicRanges subscript.

I think the error comes from the .collapseSamples function, which uses
for (i in seq_along(p)) {
    pSum = pSum + p[[i]]
}
This will not add up the chromosomes correctly if the samples contain different chromosomes or if the chromosomes are not in the same order in the two samples.

When I merge these two samples:

names(signal$signalPlus$1_oe_FLAG)
 [1] "KI270802.1" "chr1"  "chr10"
 [4] "chr11" "chr12" "chr13"
 [7] "chr14" "chr15" "chr16"
[10] "chr17" "chr18" "chr19"
[13] "chr2"  "chr20" "chr21"
[16] "chr22" "chr3"  "chr4"
[19] "chr5"  "chr6"  "chr7"
[22] "chr8"  "chr9"  "chrM"
[25] "chrX"  "chrY"
names(signal$signalPlus$2_oe_FLAG)
 [1] "GL000225.1" "chr1"  "chr10"
 [4] "chr11" "chr12" "chr13"
 [7] "chr14" "chr15" "chr16"
[10] "chr17" "chr18" "chr19"
[13] "chr2"  "chr20" "chr21"
[16] "chr22" "chr3"  "chr4"
[19] "chr5"  "chr6"  "chr7"
[22] "chr8"  "chr9"  "chrM"
[25] "chrX"  "chrY"
the merge does not contain "GL000225.1":
names(p)
 [1] "KI270802.1" "chr1"  "chr10" "chr11" "chr12"
 [6] "chr13" "chr14" "chr15" "chr16" "chr17"
[11] "chr18" "chr19" "chr2"  "chr20" "chr21"
[16] "chr22" "chr3"  "chr4"  "chr5"  "chr6"
[21] "chr7"  "chr8"  "chr9"  "chrM"  "chrX"
[26] "chrY"

Instead, "KI270802.1" and "GL000225.1" are added together and the result is called "KI270802.1".

If I then merge all 4 samples the merge looks like this:
names(sgnMergePlus)
 [1] "KI270802.1" "chr1"  "chr10"
 [4] "chr11" "chr12" "chr13"
 [7] "chr14" "chr15" "chr16"
[10] "chr17" "chr18" "chr19"
[13] "chr2"  "chr20" "chr21"
[16] "chr22" "chr3"  "chr4"
[19] "chr5"  "chr6"  "chr7"
[22] "chr8"  "chr9"  "chrM"
[25] "chrX"  "chrY"  NA
[28] NA

and it will throw an error because two names are NA. However, it would probably not cause an error if just one name were NA, which is rather dangerous, because it might add up the wrong chromosomes without raising any error.
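
One way to avoid position-based mixing is to match chromosomes by name before adding. The following is only a sketch in base R under simplifying assumptions: the signal is represented as named lists of plain numeric vectors rather than the Bioconductor coverage objects the package uses, vectors for the same chromosome have equal length, and `sumByName` is a hypothetical helper, not a function of BindingSiteFinder.

```r
# Toy per-sample signal: one named entry per chromosome.
sample1 <- list(KI270802.1 = c(1, 0), chr1 = c(2, 2, 2))
sample2 <- list(GL000225.1 = c(5, 5), chr1 = c(1, 1, 1))

# Hypothetical helper: sum signal across samples, matching chromosomes by name.
sumByName <- function(samples) {
  # Union of all chromosome names seen in any sample.
  allChr <- unique(unlist(lapply(samples, names)))
  out <- lapply(allChr, function(chr) {
    # Keep only the samples that actually contain this chromosome.
    present <- Filter(Negate(is.null), lapply(samples, function(s) s[[chr]]))
    Reduce(`+`, present)
  })
  names(out) <- allChr
  out
}

merged <- sumByName(list(sample1, sample2))
print(names(merged))
print(merged$chr1)
```

With this approach, a scaffold that is present in only one sample keeps its own name and signal instead of being added to a different scaffold that happens to sit at the same position.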

3.3 Transcript region assignment - typo?

Hi,
maybe I am missing something, but I was wondering about the third code chunk of the 3.3 Transcript region assignment chapter in the vignette.
The code is:

# Count the overlaps of each binding site for each region of the transcript.
cdseq = regions$CDS %>% countOverlaps(bindingSites,.)
intrns = regions$Intron %>% countOverlaps(bindingSites,.)
utrs3 = regions$UTR3 %>% countOverlaps(bindingSites,.)
utrs5 = regions$UTR5 %>% countOverlaps(bindingSites,.)
countDf = data.frame(CDS = cdseq, Intron = intrns, UTR3 = utrs3, UTR5 = utrs5)
# Count how many times an annotation is not present.
df = data.frame(olClass = apply(countDf,1,function(x) length(x[x == 0])))

So you want to know how many times a binding site overlaps with more than one region. You then say you "Count how many times an annotation is not present." But shouldn't you count how many times an annotation is present?
So length(x[x != 0]) in the last row?

Because the plot afterward says "Bar plot shows how many times a binding site overlaps with an annotation of a different transcript region."
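
To make the difference between the two variants concrete, here is a small sketch on invented counts (the numbers in `countDf` are toy data, not taken from the vignette):

```r
# Toy overlap counts for four binding sites (rows) and four region types (columns).
countDf <- data.frame(
  CDS    = c(1, 0, 0, 1),
  Intron = c(0, 1, 0, 1),
  UTR3   = c(0, 0, 0, 1),
  UTR5   = c(0, 0, 0, 0)
)

# Vignette version: number of region types that are absent per binding site.
nAbsent <- apply(countDf, 1, function(x) length(x[x == 0]))

# Proposed version: number of region types that are present per binding site.
nPresent <- apply(countDf, 1, function(x) length(x[x != 0]))

# For every binding site the two values add up to the number of region types.
print(data.frame(nAbsent, nPresent))
```

Since the two counts are complementary, both contain the same information, but only `nPresent` directly matches a plot label that speaks of how many regions a binding site overlaps with.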

Error in .subset_by_GenomicRanges(x, i): 'x' must have unique names when subsetting by a GenomicRanges subscript

Hi, please help.

I'm having some issues running BindingSiteFinder.
After following the complete iCLIP analysis pipeline as described in the instructions, I checked my data and got this:

### checking the bed file
read.table(BED.file1)

           V1       V2       V3 V4          V5 V6
1   Bomo_Chr1   366460   366461  3   3.7454300  +
2   Bomo_Chr1   366461   366462  3  17.8156000  +
3   Bomo_Chr1   553887   553888  3   6.1753600  +
...
164 Bomo_Chr1 13135067 13135068  3   3.8580500  +
165 Bomo_Chr1 13135205 13135206  3   6.2024100  +
166 Bomo_Chr1 13135206 13135207  3   1.8451600  +
 [ reached 'max' / getOption("max.print") -- omitted 18706 rows ]

### importing the GRanges object
cs = rtracklayer::import(con = "BED.file1", format = "BED")
cs

GRanges object with 18872 ranges and 2 metadata columns:
              seqnames    ranges strand |        name     score
                 <Rle> <IRanges>  <Rle> | <character> <numeric>
      [1]    Bomo_Chr1    366461      + |           3   3.74543
      [2]    Bomo_Chr1    366462      + |           3  17.81560
      [3]    Bomo_Chr1    553888      + |           3   6.17536
      [4]    Bomo_Chr1    553935      + |           3   0.46261
      [5]    Bomo_Chr1    629420      + |           3   3.59940
      ...          ...       ...    ... .         ...       ...
  [18868] Bomo_Scaf657      7769      - |           3   4.72738
  [18869] Bomo_Scaf657      7768      - |           3   4.76283
  [18870] Bomo_Scaf657      7767      - |           3   2.66149
  [18871] Bomo_Scaf657      7766      - |           3   4.81242
  [18872] Bomo_Scaf659      7245      - |           3  20.80880
  -------
  seqinfo: 59 sequences from an unspecified genome; no seqlengths

### Importing the bigwig files
files <- "BIGWIG.file1"
clipFilesP <- list.files(files, pattern = "_s.bw$", full.names = TRUE)
clipFilesM <- list.files(files, pattern = "_as.bw$", full.names = TRUE)

### checking the meta data
meta = data.frame(
  id = c(1,2,3,4),
  condition = factor(c("WT","WT","KD","KD"),
                     levels = c("KD","WT")),
  clPlus = clipFilesP, clMinus = clipFilesM)
meta

  id condition                 clPlus
1  1        WT BIGWIG.file1/WT-1_s.bw
2  2        WT BIGWIG.file1/WT-2_s.bw
3  3        KD BIGWIG.file1/KD-1_s.bw
4  4        KD BIGWIG.file1/KD-2_s.bw
                  clMinus
1 BIGWIG.file1/WT-1_as.bw
2 BIGWIG.file1/WT-2_as.bw
3 BIGWIG.file1/KD-1_as.bw
4 BIGWIG.file1/KD-2_as.bw

### Construction of the BindingSiteFinder dataset
bds = BSFDataSetFromBigWig(ranges = cs, meta = meta, silent = TRUE)
bds

Object of class BSFDataSet 
Contained ranges:  17.918 
----> Number of chromosomes:  59 
----> Ranges width:  1 
Contained conditions:  KD WT

But when I try to proceed with the workflow like this:

supportRatioPlot(bds, bsWidths = seq(from = 3, to = 19, by = 2))

I get the following error:

Error in .subset_by_GenomicRanges(x, i) : 
  'x' must have unique names when subsetting by a GenomicRanges subscript
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Do you think you could help me figure out what's causing this?

Thank you!

Error when extending plots with theme from ggplot

Target gene assignment in vignette: Problem with "_PAR_Y"

Hello,
I am using your vignette (https://www.bioconductor.org/packages/release/bioc/vignettes/BindingSiteFinder/inst/doc/vignette.html) and got an error in the "Target gene assignment" part (with gencode.v31.annotation.gff3):

> annoDb = makeTxDbFromGFF(file = mygff3, format = "gff3")
> annoInfo = import(mygff3, format = "gff3")
> 
> # Get genes as GRanges
> gns = genes(annoDb)
> idx = match(gns$gene_id, annoInfo$gene_id)
> elementMetadata(gns) = cbind(elementMetadata(gns),
+                              elementMetadata(annoInfo)[idx,])

Error: subscript contains NAs

The problem seems to be the gene IDs with the _PAR_Y suffix.

gns$gene_id[!(gns$gene_id %in% annoInfo$gene_id)]

[1] "ENSG00000002586.20_PAR_Y" "ENSG00000124333.16_PAR_Y" "ENSG00000124334.17_PAR_Y" "ENSG00000167393.17_PAR_Y" "ENSG00000168939.11_PAR_Y" "ENSG00000169084.14_PAR_Y"
[7] "ENSG00000169093.16_PAR_Y" "ENSG00000169100.14_PAR_Y" "ENSG00000178605.13_PAR_Y" "ENSG00000182162.11_PAR_Y" "ENSG00000182378.14_PAR_Y" "ENSG00000182484.15_PAR_Y"
[13] "ENSG00000185203.12_PAR_Y" "ENSG00000185291.11_PAR_Y" "ENSG00000185960.14_PAR_Y" "ENSG00000196433.13_PAR_Y" "ENSG00000197976.12_PAR_Y" "ENSG00000198223.16_PAR_Y"
[19] "ENSG00000205755.11_PAR_Y" "ENSG00000214717.12_PAR_Y" "ENSG00000223274.6_PAR_Y" "ENSG00000223484.7_PAR_Y" "ENSG00000223511.7_PAR_Y" "ENSG00000223571.6_PAR_Y"
[25] "ENSG00000223773.7_PAR_Y" "ENSG00000225661.7_PAR_Y" "ENSG00000226179.6_PAR_Y" "ENSG00000227159.8_PAR_Y" "ENSG00000228410.6_PAR_Y" "ENSG00000228572.7_PAR_Y"
[31] "ENSG00000229232.6_PAR_Y" "ENSG00000230542.6_PAR_Y" "ENSG00000234622.6_PAR_Y" "ENSG00000234958.6_PAR_Y" "ENSG00000236017.8_PAR_Y" "ENSG00000236871.7_PAR_Y"
[37] "ENSG00000237040.6_PAR_Y" "ENSG00000237531.6_PAR_Y" "ENSG00000237801.6_PAR_Y" "ENSG00000265658.6_PAR_Y" "ENSG00000270726.6_PAR_Y" "ENSG00000275287.5_PAR_Y"
[43] "ENSG00000277120.5_PAR_Y" "ENSG00000280767.3_PAR_Y" "ENSG00000281849.3_PAR_Y"

A solution is to remove the genes whose ID contains _PAR_Y. Maybe include that in the vignette?

annoDb = makeTxDbFromGFF(file = mygff3, format = "gff3")
annoInfo = import(mygff3, format = "gff3")

# Get genes as GRanges
gns = genes(annoDb)
gns = gns[!grepl(pattern = "_PAR_Y", gns$gene_id)]
idx = match(gns$gene_id, annoInfo$gene_id)
elementMetadata(gns) = cbind(elementMetadata(gns),
                             elementMetadata(annoInfo)[idx,])
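
A more generic variant of the same idea is to drop every entry for which `match()` returns NA, whatever the reason for the mismatch. This is only a sketch on toy identifiers (the IDs below are invented stand-ins for `gns$gene_id` and `annoInfo$gene_id`):

```r
# Toy identifiers standing in for gns$gene_id and annoInfo$gene_id.
gnsIds  <- c("ENSG01.1", "ENSG02.3", "ENSG02.3_PAR_Y")
annoIds <- c("ENSG02.3", "ENSG01.1", "ENSG03.2")

# match() returns NA for IDs without a counterpart; indexing with NA causes the error.
idx <- match(gnsIds, annoIds)
print(idx)

# Keep only the entries that have a match, then index with NA-free positions.
keep    <- !is.na(idx)
gnsKept <- gnsIds[keep]
idxKept <- idx[keep]
print(annoIds[idxKept])
```

Applied to the real objects, the same pattern would subset `gns` with `keep` before binding the metadata, so that the subscript no longer contains NAs.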
