bindingsitefinder's Introduction

BindingSiteFinder

Precise knowledge of the binding sites of an RNA-binding protein (RBP) is key to understanding (post-)transcriptional regulatory processes.

The BindingSiteFinder software package provides a full workflow for RBP binding site definition. The package provides functions as well as rich visualizations and is well integrated with state-of-the-art, widely used Bioconductor classes such as GenomicRanges and SummarizedExperiment. For details, please see the vignette.

For the latest changes and updates, please have a look at the devel page.

bindingsitefinder's People

Contributors

jwokaty, mirkobr, nturaga

bindingsitefinder's Issues

Help with an error, please

Hello, I was trying to use your BindingSiteFinder and got an error:

Error in if (all(!df.global$increaseOverMin)) { :
missing value where TRUE/FALSE needed

I was using

BSFind(object = KObds, anno.genes = gns, anno.transcriptRegionList = regions,
       est.subsetChromosome = "chr3", veryQuiet = TRUE, est.maxBsWidth = 29)


It also gave the following message:
2.16% (17/788) peaks overlap with multiple anno.genes in the given gene annotation.
A single instance of each peak is kept. This is recommended.

Could you please tell me what the problem could be? Thanks.
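For context: base R emits "missing value where TRUE/FALSE needed" whenever an `if` condition evaluates to NA, so the check above failed because `all(!df.global$increaseOverMin)` returned NA. That happens when the vector contains at least one NA and no TRUE value. The sketch below reproduces this with a made-up vector; it is a hypothetical stand-in, not the actual BSFind internals, and the report does not establish why NAs appear in this particular run.

```r
# Hypothetical stand-in for df.global$increaseOverMin; the real vector is
# computed inside BSFind and its contents are not shown in the report.
increaseOverMin <- c(FALSE, NA, FALSE)

# !x is c(TRUE, NA, TRUE). all() returns FALSE only if some element is FALSE;
# otherwise a single NA turns the result into NA instead of TRUE.
cond <- all(!increaseOverMin)

# Using NA as an if() condition reproduces the reported message.
msg <- tryCatch(
  if (cond) "no increase anywhere" else "increase somewhere",
  error = function(e) conditionMessage(e)
)

# Quick diagnostic: does the input vector contain NAs at all?
hasNA <- anyNA(increaseOverMin)

print(cond)
print(msg)
print(hasNA)
```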

Problems if not all samples have the same chromosomes

Hi,
when I use makeBindingSites, I get the following error:
Error in .subset_by_GenomicRanges(x, i) :
‘x’ must have unique names when subsetting by a GenomicRanges subscript.

I think the error comes from the .collapseSamples function, which uses
for (i in seq_along(p)) {
    pSum = pSum + p[[i]]
}
This does not add up the chromosomes correctly if the samples contain different chromosomes, or if the chromosomes are not in the same order in the two samples.

When I merge these two samples:

names(signal$signalPlus$`1_oe_FLAG`)
[1] "KI270802.1" "chr1" "chr10"
[4] "chr11" "chr12" "chr13"
[7] "chr14" "chr15" "chr16"
[10] "chr17" "chr18" "chr19"
[13] "chr2" "chr20" "chr21"
[16] "chr22" "chr3" "chr4"
[19] "chr5" "chr6" "chr7"
[22] "chr8" "chr9" "chrM"
[25] "chrX" "chrY"
names(signal$signalPlus$`2_oe_FLAG`)
[1] "GL000225.1" "chr1" "chr10"
[4] "chr11" "chr12" "chr13"
[7] "chr14" "chr15" "chr16"
[10] "chr17" "chr18" "chr19"
[13] "chr2" "chr20" "chr21"
[16] "chr22" "chr3" "chr4"
[19] "chr5" "chr6" "chr7"
[22] "chr8" "chr9" "chrM"
[25] "chrX" "chrY"
The merge does not contain "GL000225.1":
names(p)
[1] "KI270802.1" "chr1" "chr10" "chr11" "chr12"
[6] "chr13" "chr14" "chr15" "chr16" "chr17"
[11] "chr18" "chr19" "chr2" "chr20" "chr21"
[16] "chr22" "chr3" "chr4" "chr5" "chr6"
[21] "chr7" "chr8" "chr9" "chrM" "chrX"
[26] "chrY"

Instead, the "KI270802.1" and "GL000225.1" signals are added together and the result is labeled "KI270802.1".

If I then merge all 4 samples, the merge looks like this:
names(sgnMergePlus)
[1] "KI270802.1" "chr1" "chr10"
[4] "chr11" "chr12" "chr13"
[7] "chr14" "chr15" "chr16"
[10] "chr17" "chr18" "chr19"
[13] "chr2" "chr20" "chr21"
[16] "chr22" "chr3" "chr4"
[19] "chr5" "chr6" "chr7"
[22] "chr8" "chr9" "chrM"
[25] "chrX" "chrY" NA
[28] NA

This throws an error because two names are NA. However, it would probably not cause an error if just one name were NA, which is dangerous, because the wrong chromosomes could then be added up silently.
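The positional-versus-named summing problem described above can be illustrated in plain R. The sketch below uses small named lists of numeric vectors as hypothetical stand-ins for the per-chromosome coverage objects (they are not BindingSiteFinder's actual data structures). It contrasts index-based addition, which mirrors the `pSum + p[[i]]` loop as the issue describes it, with one possible name-aligned alternative; `sumByName` is a hypothetical helper, not a package function.

```r
# Hypothetical per-sample coverage: a named list of numeric vectors, one per
# chromosome. Vectors for the same chromosome are assumed to be equally long.
s1 <- list(KI270802.1 = c(1, 0), chr1 = c(2, 3, 4))
s2 <- list(GL000225.1 = c(5, 5), chr1 = c(1, 1, 1))

# Positional summing pairs elements by index, not by name, so the KI270802.1
# and GL000225.1 vectors are added together and the result keeps s1's names.
positional <- Map(`+`, s1, s2)

# Name-aligned summing: collect every chromosome name seen in any sample,
# then add up only the samples that actually contain that chromosome.
sumByName <- function(samples) {
  chroms <- unique(unlist(lapply(samples, names)))
  out <- lapply(chroms, function(chr) {
    # `[[` on a list returns NULL for a missing name; drop those entries
    present <- Filter(Negate(is.null), lapply(samples, `[[`, chr))
    Reduce(`+`, present)
  })
  names(out) <- chroms
  out
}

merged <- sumByName(list(s1, s2))
str(positional)
str(merged)
```

Aligning on the union of names keeps chromosomes that occur in only one sample as separate entries instead of merging them into whatever sits at the same index.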

3.3 Transcript region assignment - typo?

Hi,
maybe I am missing something, but I was wondering about the third code chunk of the 3.3 Transcript region assignment chapter in the vignette.
The code is:

# Count the overlaps of each binding site for each region of the transcript.
cdseq = regions$CDS %>% countOverlaps(bindingSites,.)
intrns = regions$Intron %>% countOverlaps(bindingSites,.)
utrs3 = regions$UTR3 %>% countOverlaps(bindingSites,.)
utrs5 = regions$UTR5 %>% countOverlaps(bindingSites,.)
countDf = data.frame(CDS = cdseq, Intron = intrns, UTR3 = utrs3, UTR5 = utrs5)
# Count how many times an annotation is not present.
df = data.frame(olClass = apply(countDf,1,function(x) length(x[x == 0]))) 

So you want to know how many times a binding site overlaps with more than one region. You then say you "Count how many times an annotation is not present." But shouldn't you count how many times an annotation is present?
So length(x[x != 0]) in the last row?

I ask because the caption of the plot afterwards says "Bar plot shows how many times a binding site overlaps with an annotation of a different transcript region."
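A small worked example makes the difference between the two counting rules concrete. The overlap counts below are invented for illustration; in the vignette, `countDf` is built from `countOverlaps()` on the real binding sites.

```r
# Invented overlap counts for four binding sites (rows) against the four
# transcript regions (columns), standing in for the countOverlaps() results.
countDf <- data.frame(CDS    = c(1, 0, 2, 0),
                      Intron = c(0, 0, 1, 0),
                      UTR3   = c(1, 0, 0, 1),
                      UTR5   = c(0, 0, 0, 0))

# Vignette version: number of regions each site does NOT overlap
notPresent <- apply(countDf, 1, function(x) length(x[x == 0]))

# Suggested version: number of regions each site DOES overlap
present <- apply(countDf, 1, function(x) length(x[x != 0]))

# Equivalent vectorized form of the suggested count
presentVec <- rowSums(countDf != 0)

print(notPresent)
print(present)
```

Because every region is either overlapped or not, the two counts always add up to the number of region columns, so the vignette's version is the complement of the suggested one rather than a count of overlapping regions.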

Error in .subset_by_GenomicRanges(x, i): 'x' must have unique names when subsetting by a GenomicRanges subscript

Hi, please help.

I'm having some issues running BindingSiteFinder.
After following the complete iCLIP analysis pipeline in the instructions, I checked my data and got the following:

### checking the bed file
read.table(BED.file1)

           V1       V2       V3 V4          V5 V6
1   Bomo_Chr1   366460   366461  3   3.7454300  +
2   Bomo_Chr1   366461   366462  3  17.8156000  +
3   Bomo_Chr1   553887   553888  3   6.1753600  +
...
164 Bomo_Chr1 13135067 13135068  3   3.8580500  +
165 Bomo_Chr1 13135205 13135206  3   6.2024100  +
166 Bomo_Chr1 13135206 13135207  3   1.8451600  +
 [ reached 'max' / getOption("max.print") -- omitted 18706 rows ]
### importing the GRanges object
cs = rtracklayer::import(con = "BED.file1", format = "BED")
cs

GRanges object with 18872 ranges and 2 metadata columns:
              seqnames    ranges strand |        name     score
                 <Rle> <IRanges>  <Rle> | <character> <numeric>
      [1]    Bomo_Chr1    366461      + |           3   3.74543
      [2]    Bomo_Chr1    366462      + |           3  17.81560
      [3]    Bomo_Chr1    553888      + |           3   6.17536
      [4]    Bomo_Chr1    553935      + |           3   0.46261
      [5]    Bomo_Chr1    629420      + |           3   3.59940
      ...          ...       ...    ... .         ...       ...
  [18868] Bomo_Scaf657      7769      - |           3   4.72738
  [18869] Bomo_Scaf657      7768      - |           3   4.76283
  [18870] Bomo_Scaf657      7767      - |           3   2.66149
  [18871] Bomo_Scaf657      7766      - |           3   4.81242
  [18872] Bomo_Scaf659      7245      - |           3  20.80880
  -------
  seqinfo: 59 sequences from an unspecified genome; no seqlengths
### Importing the bigwig files
files <- "BIGWIG.file1"
clipFilesP <- list.files(files, pattern = "_s.bw$", full.names = TRUE)
clipFilesM <- list.files(files, pattern = "_as.bw$", full.names = TRUE)
### checking the meta data
meta = data.frame(
  id = c(1,2,3,4),
  condition = factor(c("WT","WT","KD","KD"), 
  levels = c("KD","WT")), 
  clPlus = clipFilesP, clMinus = clipFilesM)
meta

  id condition                 clPlus
1  1        WT BIGWIG.file1/WT-1_s.bw
2  2        WT BIGWIG.file1/WT-2_s.bw
3  3        KD BIGWIG.file1/KD-1_s.bw
4  4        KD BIGWIG.file1/KD-2_s.bw
                  clMinus
1 BIGWIG.file1/WT-1_as.bw
2 BIGWIG.file1/WT-2_as.bw
3 BIGWIG.file1/KD-1_as.bw
4 BIGWIG.file1/KD-2_as.bw
### Construction of the the BindingSiteFinder dataset
bds = BSFDataSetFromBigWig(ranges = cs, meta = meta, silent = TRUE)
bds

Object of class BSFDataSet 
Contained ranges:  17.918 
----> Number of chromosomes:  59 
----> Ranges width:  1 
Contained conditions:  KD WT

But when I try to proceed with the workflow like this:

supportRatioPlot(bds, bsWidths = seq(from = 3, to = 19, by = 2))

I get the following error:

Error in .subset_by_GenomicRanges(x, i) : 
  'x' must have unique names when subsetting by a GenomicRanges subscript
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Do you think you could help me figure out what's causing this?

Thank you!

Target gene assignment in vignette: Problem with "_PAR_Y"

Hello,
I am using your vignette (https://www.bioconductor.org/packages/release/bioc/vignettes/BindingSiteFinder/inst/doc/vignette.html) and got an error in the "Target gene assignment" part (with gencode.v31.annotation.gff3):

> annoDb = makeTxDbFromGFF(file = mygff3, format = "gff3")
> annoInfo = import(mygff3, format = "gff3")
> 
> # Get genes as GRanges
> gns = genes(annoDb)
> idx = match(gns$gene_id, annoInfo$gene_id)
> elementMetadata(gns) = cbind(elementMetadata(gns),
+                              elementMetadata(annoInfo)[idx,])

Error: subscript contains NAs

The problem seems to be the gene IDs with the _PAR_Y suffix:

gns$gene_id[!(gns$gene_id %in% annoInfo$gene_id)]

[1] "ENSG00000002586.20_PAR_Y" "ENSG00000124333.16_PAR_Y" "ENSG00000124334.17_PAR_Y" "ENSG00000167393.17_PAR_Y" "ENSG00000168939.11_PAR_Y" "ENSG00000169084.14_PAR_Y"
[7] "ENSG00000169093.16_PAR_Y" "ENSG00000169100.14_PAR_Y" "ENSG00000178605.13_PAR_Y" "ENSG00000182162.11_PAR_Y" "ENSG00000182378.14_PAR_Y" "ENSG00000182484.15_PAR_Y"
[13] "ENSG00000185203.12_PAR_Y" "ENSG00000185291.11_PAR_Y" "ENSG00000185960.14_PAR_Y" "ENSG00000196433.13_PAR_Y" "ENSG00000197976.12_PAR_Y" "ENSG00000198223.16_PAR_Y"
[19] "ENSG00000205755.11_PAR_Y" "ENSG00000214717.12_PAR_Y" "ENSG00000223274.6_PAR_Y" "ENSG00000223484.7_PAR_Y" "ENSG00000223511.7_PAR_Y" "ENSG00000223571.6_PAR_Y"
[25] "ENSG00000223773.7_PAR_Y" "ENSG00000225661.7_PAR_Y" "ENSG00000226179.6_PAR_Y" "ENSG00000227159.8_PAR_Y" "ENSG00000228410.6_PAR_Y" "ENSG00000228572.7_PAR_Y"
[31] "ENSG00000229232.6_PAR_Y" "ENSG00000230542.6_PAR_Y" "ENSG00000234622.6_PAR_Y" "ENSG00000234958.6_PAR_Y" "ENSG00000236017.8_PAR_Y" "ENSG00000236871.7_PAR_Y"
[37] "ENSG00000237040.6_PAR_Y" "ENSG00000237531.6_PAR_Y" "ENSG00000237801.6_PAR_Y" "ENSG00000265658.6_PAR_Y" "ENSG00000270726.6_PAR_Y" "ENSG00000275287.5_PAR_Y"
[43] "ENSG00000277120.5_PAR_Y" "ENSG00000280767.3_PAR_Y" "ENSG00000281849.3_PAR_Y"

A solution is to remove the genes whose IDs carry the _PAR_Y suffix before matching. Maybe include that in the vignette?

annoDb = makeTxDbFromGFF(file = mygff3, format = "gff3")
annoInfo = import(mygff3, format = "gff3")

# Get genes as GRanges
gns = genes(annoDb)
# Drop the chrY pseudoautosomal (PAR) copies, whose IDs are missing from annoInfo$gene_id
gns = gns[!grepl(pattern = "_PAR_Y", gns$gene_id)]
idx = match(gns$gene_id, annoInfo$gene_id)
elementMetadata(gns) = cbind(elementMetadata(gns),
                             elementMetadata(annoInfo)[idx,])
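The error appears to come from `match()` returning NA for IDs that have no counterpart in `annoInfo$gene_id`, after which the metadata subsetting rejects the NA indices ("subscript contains NAs"). The base R sketch below uses a few hypothetical IDs modeled on the ones listed above to show where the NA comes from and what the `grepl()` filter changes.

```r
# Hypothetical IDs: the TxDb-derived genes include a _PAR_Y copy that, as in
# the report, has no counterpart among the IDs imported from the GFF3 file.
txdbIds <- c("ENSG00000002586.20", "ENSG00000002586.20_PAR_Y",
             "ENSG00000124333.16")
gffIds  <- c("ENSG00000002586.20", "ENSG00000124333.16")

# match() yields NA for the unmatched _PAR_Y entry
idx <- match(txdbIds, gffIds)

# Workaround from the issue: drop the _PAR_Y entries before matching
keep <- !grepl("_PAR_Y", txdbIds, fixed = TRUE)
idx2 <- match(txdbIds[keep], gffIds)

print(idx)
print(idx2)
```

Note that this filter removes the PAR copies on chrY entirely, so binding sites in those regions would only be assigned to the chrX copies.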
