guigolab / ggsashimi Goto Github PK
View Code? Open in Web Editor NEWCommand-line tool for the visualization of splicing events across multiple samples
License: MIT License
Command-line tool for the visualization of splicing events across multiple samples
License: MIT License
$ ./sashimi-plot.py -b input_bam.tsv -g nochr_gencode_exon_transcript.gtf -c 2:29,619,720-29,740,978 -M 10 --base-size=20 --ann-height=4 --height=3 --width=18 -P palette.txt
Error in seq.default(start, max(start + 1, end - 4), by = 2425) :
'from' must be of length 1
Calls: rbind -> [ -> [.data.table -> seq -> seq.default
Execution halted
$ ./ggsashimi.sh
Python path configuration:
PYTHONHOME = '/apps/python/3.7'
PYTHONPATH = (not set)
program name = '/usr/bin/python3'
isolated = 0
environment = 1
user site = 1
import site = 1
sys._base_executable = '/usr/bin/python3'
sys.base_prefix = '/apps/python/3.7'
sys.base_exec_prefix = '/apps/python/3.7'
sys.executable = '/usr/bin/python3'
sys.prefix = '/apps/python/3.7'
sys.exec_prefix = '/apps/python/3.7'
sys.path = [
'/apps/python/3.7/lib/python38.zip',
'/apps/python/3.7/lib/python3.8',
'/apps/python/3.7/lib/python3.8/lib-dynload',
]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
ModuleNotFoundError: No module named 'encodings'
Current thread 0x00002ae45d1c7740 (most recent call first):
<no Python frame>
ggsashimi.sh looks like:
#!/bin/bash
module load python
module load samtools
module load R/4.0
python3 /path/sashimi-plot.py \
--bam /path/wt_480480_ChP_ggsashimi.tsv \
--coordinates chrX:166170454-166178830 \
--gtf /path/Mus_musculus.GRCm38.100.gtf \
--overlay 1 \
--color-factor 3 \
--labels 1 \
--fix-y-scale
The test example raises some warnings when -A mean_j
or -A median_j
option is set:
../sashimi-plot.py -b input_bams.tsv -c chr10:27040584-27048100 -g annotation.gtf -M 10 -C 3 -O 3 --shrink --alpha 0.25 --base-size=20 --ann-height=4 --height=3 --width=18 -P palette.txt -A mean_j
There were 18 warnings (use warnings() to see them)
And the arches in the output plot start from (and end into) wrong signal values
Running the default example provided results in a PDF figure where expression labels (ex. 0-600) and the coordinate label are missing. The main figure on the repository, however, has these axes labels. Not sure if this is an issue with the R code or the python...
Hi,
I am using the ggsashmi to plot some of the transcripts from gtf file, However when I see in the IGV there are reads for exons as well as junctions. However, when I run sashami-plot script I get error "No reads in the specified area."
Example jpeg for the transcripts from IGV.
I am running following command for the plotting.
sashimi-plot.py -b bam_list -C 4 -L 3 -c chr22:39696699-39700075 -o MSTRG10006 -g ../Final_novel.gtf
From @sridhar0605 originally posted in #1:
When using Homo_sapiens.GRCh37.75.gtf as reference from ensembl, I see this error
Using default tag: latest
latest: Pulling from guigolab/ggsashimi
915665fee719: Pull complete
1a0814f59c8e: Pull complete
b3b71680ed5d: Pull complete
1c3c8afa6ada: Pull complete
2fbeb903a5b4: Pull complete
Digest: sha256:82590f821978568e948ad4861ce009fcb26e7543263bea9d7b78c17667f8d675
Status: Downloaded newer image for guigolab/ggsashimi:latest
Traceback (most recent call last):
File "/sashimi-plot.py", line 592, in
transcripts, exons = read_gtf(args.gtf, args.coordinates)
File "/sashimi-plot.py", line 278, in read_gtf
transcript_id = d["transcript_id"]
KeyError: 'transcript_id'
few lines from gtf:
#!genome-build GRCh37.p13
#!genome-version GRCh37
#!genome-date 2009-02
#!genome-build-accession NCBI:GCA_000001405.14
#!genebuild-last-updated 2013-09
1 pseudogene gene 11869 14412 . + . gene_id "ENSG00000223972"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene";
1 processed_transcript transcript 11869 14409 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana";
1 processed_transcript exon 11869 12227 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "1"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana"; exon_id "ENSE00002234944";
1 processed_transcript exon 12613 12721 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "2"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana"; exon_id "ENSE00003582793";
1 processed_transcript exon 13221 14409 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "3"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana"; exon_id "ENSE00002312635";
./sashimi-plot.py -b input_bams.tsv -g Mus_musculus.GRCm38.99.gtf -M 10 -C 3 -O 3 --shrink --alpha 0.25 --base-size=20 --ann-height=4 --height=3 --width=18 -P palette.txt
Traceback (most recent call last):
File "./sashimi-plot.py", line 573, in
for id, bam, overlay_level, color_level, label_text in read_bam_input(args.bam, args.overlay, args.color_factor, args.labels):
File "./sashimi-plot.py", line 176, in read_bam_input
bam = get_bam_path(f, line_sp[1])
IndexError: list index out of range
Hello,
First off, thank you for developing this tool. I'm trying to run ggsashimi but I'm encountering the following error:
Traceback (most recent call last):
File "/home/sashimi-plot.py", line 632, in
transcripts, exons = read_gtf(args.gtf, args.coordinates)
File "/home/sashimi-plot.py", line 285, in read_gtf
el_chr, _, el, el_start, el_end, _, strand, _, tags = line.strip().split("\t")
ValueError: need more than 1 value to unpack
I would appreciate some help with this, thank you.
Nelson Barrientos
Hi, it seems that the mean and median aggregate functions throw an error if the genomic region is too large. I'm currently looking at a region 147,000 bases long and am getting this error when I try and use the mean or median aggregation
Error in j[4] = as.numeric(d[x == j[2] + 1, y]) : replacement has length zero Execution halted
It works just fine if I remove the aggregation or if I shrink the genomic region. What may be causing this?
Hi!
In some cases it would be nice if the y-axis scale was fixed for each compared condition. I think this could help visualize differences in the inclusion levels of alternative events. There are different ways in which this feature could be implemented, one could either fix the y-scale to that of the max read count among all the plots or it could be fixed to a value that is manually entered by the user. Both of these approaches should be fairly easy to implement using standard ggplot syntax and I could try it out if this sounds interesting to anyone else!
Hello,
I really like your tool and I find it very useful. I wonder if you already thought about wrapping your ggsashimi into Galaxy ?
I am currently working in the bioinformatics plateform ARTbio (see our github here ) and we work a lot with Galaxy by developping the framework but also different tools and wrapping them.
Would you agree to allow our platform to create the galaxy wrapper ? This would be an opportunity to provide a sashimi plot generator to Galaxy users.
Best,
Lea
I noticed the sashimi plot with multiple samples the y axis ranges are differently scaled. Is there a way so that I can set the range is same across samples?
Running the following command:
./sashimi-plot.py -b examples/input_bams.tsv -c chr10:27040584-27048100 -C 3
only the first color is used. Using a palette file or the -O
option gives the correct colors.
Hi,
I wanted to know if there is any way to annotate gene_name
and exon_number
in addition to the transcript_id
from the GTF file:
GTF file:
##description: evidence-based annotation of the human genome (GRCh37), version 19 (Ensembl 74)
##provider: GENCODE
##contact: [email protected]
##format: gtf
##date: 2013-12-05
##modified by GTEx_Collapsed_Gene_Model.py
1 HAVANA gene 11869 14362 . + . gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_s
tatus "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG000000009
61.2";
1 HAVANA transcript 11869 14362 . + . gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"
; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG0
0000000961.2";
1 HAVANA exon 11869 12227 . + . gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_s
tatus "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG000000009
61.2"; exon_id "ENSG00000223972.4_1; exon_number 1";
1 HAVANA exon 12595 12721 . + . gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_s
tatus "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG000000009
61.2"; exon_id "ENSG00000223972.4_2; exon_number 2";
1 HAVANA exon 12975 13052 . + . gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_s
tatus "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG000000009
61.2"; exon_id "ENSG00000223972.4_3; exon_number 3";
Sashimi Plot:
Thanks
Komal
I am interested in using your ggsashimi
utility to generate some plots for a research publication. I have installed the listed dependencies and R packages for the utility from my bash prompt. (I am using the Linux subsystem for Windows 10.) However, when I attempt to run either my own samples or the provided example files, I get the error message provided below. Unfortunately, my knowledge of Python is limited. Does this indicate an error with my system setup? I appreciate the assistance!
Example command I used:
./sashimi-plot.py -b ./examples/input_bams.tsv -c chr10:27040584-27048100 -g ./examples/annotation.gtf -M 10 -C 3 -O 3 --shrink --alpha 0.25 --base-size=20 --ann-height=4 --height=3 --width=18 -P ./examples/palette.txt
Which returns the following message:
Warning messages:
1: In density_grobs[[id]]$widths[1] <- density_grobs[[id]]$widths[1] + :
number of items to replace is not a multiple of replacement length
2: In density_grobs[[id]]$widths[3 + vs] <- maxYtextWidth :
number of items to replace is not a multiple of replacement length
3: In density_grobs[[id]]$widths[1] <- density_grobs[[id]]$widths[1] + :
number of items to replace is not a multiple of replacement length
4: In density_grobs[[id]]$widths[3 + vs] <- maxYtextWidth :
number of items to replace is not a multiple of replacement length
5: In density_grobs[[id]]$widths[1] <- density_grobs[[id]]$widths[1] + :
number of items to replace is not a multiple of replacement length
6: In density_grobs[[id]]$widths[3 + vs] <- maxYtextWidth :
number of items to replace is not a multiple of replacement length
7: In density_grobs[[id]]$widths[1] <- density_grobs[[id]]$widths[1] + :
number of items to replace is not a multiple of replacement length
8: In density_grobs[[id]]$widths[3 + vs] <- maxYtextWidth :
number of items to replace is not a multiple of replacement length
9: In density_grobs[[id]]$widths[1] <- density_grobs[[id]]$widths[1] + :
number of items to replace is not a multiple of replacement length
10: In density_grobs[[id]]$widths[3 + vs] <- maxYtextWidth :
number of items to replace is not a multiple of replacement length
Error in grid.Call.graphics(L_setviewport, vp, TRUE) :
INTEGER() can only be applied to a 'integer', not a 'NULL'
Calls: ggsave ... lapply -> FUN -> push.vp.viewport -> grid.Call.graphics
Execution halted
Hi,
I get the following error when using sashimi-plot.py
This is my code
python sashimi-plot.py -b /project/uml_frederic_chain/Daphnia/index/ConAc_A2-11D.sorted.bam -c DAPPUscaffold_67:640865-642375 -g /project/uml_frederic_chain/Daphnia/Daphnia_pulex.V1.0.41.exons.gtf --shrink
Could you please help me out in resolving this.
Hi
I pulled ggsashimi from DockerHub and am trying to smooth the histograms with "--smooth" but the argument is not recognized. In the GitHub code it looks to be commented out. Is there another argument or is smoothing not possible, currently?
Thanks, Luke
In ggplot2 versions greater that 2.2.1 grobs have a different number of colums
Hi, I'm curious about the meaning of numbers plotted above the curves (splice junction reads).
for single samples, the number seems to match the count of splice junction reads. (something like 206, 159.. )
But if I input multiple samples (4 samples) in one group and draw sashimi plot, the numbers get extremely huge. (something like : 347,142,108,206). It is definitely not the sum of junction read counts in 4 samples. I wonder what those huge numbers mean.
Thanks!
Maybe useful for some users
I'm trying to make sashimi plot using the examples files in the example folder and just copy/paste the command line you suggested and I get the plots like the example but no y axis at all. Any clues to fix this issue?
sashimi.pdf
I think now we currently support only gtf files with transcripts and exons, assuming that gene rows would still have a transcript_id
tag in the metadata. Unfortunately, newer gtf formats do not have transcript_id
for genes anymore and it throws error. We should add an exception for this
Hi,
When im running :
docker run -w $PWD -v $PWD:$PWD guigolab/ggsashimi -b bam_input.tsv -c 1:138062859-138102859
following error occurs:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 790: ordinal not in range(128)
When im running the container with one specified bam file the image works fine!
I'm using Mac OS Catalina in case if this is important.
Hi,
I'm trying to plot 26 bams separately in one single plot, but i constantly get the following error. I tried to tweak the parameters, specially the --height
but it doesn't seem to work. I can't edit the R script either to set limitsize to FALSE. When I overlay the plot by groups I can nicely see the plot, but then when showing the reads from each sample within the same layer gets a bit messy, and I do not want to aggregate the counts in this case.
Error: Dimensions exceed 50 inches (height and width are specified in 'in' not pixels). If you're sure you want a plot that big, use `limitsize = FALSE`.
Any help one track per sample ?
Best,
Pedro
Hi, I want to ask how to generate the transcript diagram (that consists transcript id and strand) at the bottom of sahsimi plot as you have generated in your example. I have tried, But I am not getting anything at the bottom..
Thanks
Heena
I would like to change the size of transctipt id when using --gtf
, but I am not sure which part of the R script needs to be modified to achieve this. Could you point to the part of the script that controls this? Thanks!
when I turn on GGSASHIMI_DEBUG and run the R script it produces, I get the following when I run the script in R:
library(ggplot2)
library(grid)
library(gridExtra)
library(data.table)
library(gtable)
scale_lwd = function(r) {
lmin = 0.1
lmax = 4
return( r*(lmax-lmin)lmin )
}
base_size = 14
height = ( 5.5 base_size*0.352777778/67 ) * 1.02
width = 10
theme_set(theme_bw(base_size=base_size))
theme_update(
plot.margin = unit(c(15,15,15,15), "pt"),
panel.grid = element_blank(),
panel.border = element_blank(),
axis.line = element_line(size=0.5),
axis.title.x = element_blank(),
axis.title.y = element_text(angle=0, vjust=0.5)
)
labels = list("#000000"="#000000","WT"="WT","1KO"="1KO")
density_list = list()
junction_list = list()
color_list = list("WT"="#000000","1KO"="#000000")
# data table with exons
ann_list = list(
"exons" = data.table(),
"introns" = data.table()
)
gtfp = ggplot()
if (length(ann_list[['introns']]) 0) {
gtfp = gtfp geom_segment(data=ann_list[['introns']], aes(x=start, xend=end, y=tx, yend=tx), size=0.3)
gtfp = gtfp geom_segment(data=txarrows, aes(x=V1,xend=V2,y=tx,yend=tx), arrow=arrow(length=unit(0.02,"npc")))
}
if (length(ann_list[['exons']]) 0) {
gtfp = gtfp geom_segment(data=ann_list[['exons']], aes(x=start, xend=end, y=tx, yend=tx), size=5, alpha=1)
}
gtfp = gtfp scale_y_discrete(expand=c(0,0.5))
gtfp = gtfp scale_x_continuous(expand=c(0,0.25))
gtfp = gtfp coord_cartesian(xlim = c(6064339,6070090))
gtfp = gtfp labs(y=NULL)
gtfp = gtfp theme(axis.line = element_blank(), axis.text.x = element_blank(), axis.ticks = element_blank())
[density lists and junction lists omitted bc too long]
junction_list[["1KO"]] = data.frame(x=c(6064422,6064422,6065846,6064422,6064422,6065846,6064422,6064422,6065846), xend=c(6065705,6069831,6069831,6065705,6069831,6069831,6065705,6069831,6069831), y=c(39,39,13,39,39,13,39,39,13), yend=c(14,34,34,14,34,34,14,34,34), count=c(13,22,8,13,22,8,13,22,8))
Error: unexpected symbol in:
"20,6064821,6064822,6064823,6064824,6064825,6064826,6064827,6064828,6064829,6064830,6064831,6064832,6064833,6064834,6064835,6064836,6064837,6064838,6064839,6064840,6
junction_list"
pdf(NULL) # just to remove the blank pdf produced by ggplotGrob
if(packageVersion('ggplot2') = '3.0.0'){ # fix problems with ggplot2 vs 3.0.0
vs = 1
} else {
vs = 0
}
if(TRUE) {
print(max(unlist(lapply(density_list, function(df){max(df$y)}))))
maxheight = max(unlist(lapply(density_list, function(df){max(df$y)})))
breaks_y = labeling::extended(0, maxheight, m = 4)
}
[1] -Inf
Error in seq.default(from = best$lmin, to = best$lmax, by = best$lstep) :
'from' must be of length 1
In addition: Warning messages:
1: In max(unlist(lapply(density_list, function(df) { :
no non-missing arguments to max; returning -Inf
2: In max(unlist(lapply(density_list, function(df) { :
no non-missing arguments to max; returning -Inf
if(exists('coord_dict')){
all_pos_shrinked = do.call(rbind, density_list)$x
s2r = merge(intersected_introns, coord_dict, by.x = 'real_xend', by.y = 'real')
s2r = merge(s2r, coord_dict, by.x = 'real_x', by.y = 'real', suffixes = c('_xend', '_x'))
breaks_x_shrinked = labeling::extended(min(all_pos_shrinked), max(all_pos_shrinked), m = 5)
breaks_x = c()
out_range = c()
for (b in breaks_x_shrinked){
iintron = FALSE
for (j in 1:nrow(s2r)){
l = s2r[j, ]
if(b = l$shrinked_x && b <= l$shrinked_xend){
# Intersected intron
p = (b-l$shrinked_x)/(l$shrinked_xend - l$shrinked_x)
realb = round(l$real_x p*(l$real_xend - l$real_x))
breaks_x = c(breaks_x, realb)
iintron = TRUE
break
}
}
if (!iintron){
# Exon, upstream/downstream intergenic region or intron (not intersected)
if(b <= min(s2r$shrinked_x)) {
l <- s2r[which.min(s2r$shrinked_x), ]
if(any(b == all_pos_shrinked)){
# Boundary (subtract)
s = l$shrinked_x - b
realb = l$real_x - s
breaks_x = c(breaks_x, realb)
} else {
out_range <- c(out_range, which(breaks_x_shrinked == b))
}
} else if (b = max(s2r$shrinked_xend)){
l <- s2r[which.max(s2r$shrinked_xend), ]
if(any(b == all_pos_shrinked)){
# Boundary (sum)
s = b - l$shrinked_xend
realb = l$real_xend s
breaks_x = c(breaks_x, realb)
} else {
out_range <- c(out_range, which(breaks_x_shrinked == b))
}
} else {
delta = b-s2r$shrinked_xend
delta[delta < 0] = Inf
l = s2r[which.min(delta), ]
# Internal (sum)
s = b - l$shrinked_xend
realb = l$real_xend s
breaks_x = c(breaks_x, realb)
}
}
}
if(length(out_range)) {
breaks_x_shrinked = breaks_x_shrinked[-out_range]
}
}
density_grobs = list();
for (bam_index in 1:length(density_list)) {
id = names(density_list)[bam_index]
d = data.table(density_list[[id]])
junctions = data.table(junction_list[[id]])
# Density plot
gp = ggplot(d) geom_bar(aes(x, y), width=1, position='identity', stat='identity', fill=color_list[[id]], alpha=0.05)
gp = gp labs(y=labels[[id]])
if(exists('coord_dict')) {
gp = gp scale_x_continuous(expand=c(0, 0.25), breaks = breaks_x_shrinked, labels = breaks_x)
} else {
gp = gp scale_x_continuous(expand=c(0, 0.25))
}
if(!TRUE){
maxheight = max(d[['y']])
breaks_y = labeling::extended(0, maxheight, m = 4)
gp = gp scale_y_continuous(breaks = breaks_y)
} else {
gp = gp scale_y_continuous(breaks = breaks_y, limits = c(NA, maxheight))
}
# Aggregate junction counts
row_i = c()
if (nrow(junctions) 0 ) {
junctions$jlabel = as.character(junctions$count)
junctions = setNames(junctions[,.(max(y), max(yend),round(mean(count)),paste(jlabel,collapse=",")), keyby=.(x,xend)], names(junctions))
if ("mean" != "") {
junctions = setNames(junctions[,.(max(y), max(yend),round(mean(count)),round(mean(count))), keyby=.(x,xend)], names(junctions))
}
# The number of rows (unique junctions per bam) has to be calculated after aggregation
row_i = 1:nrow(junctions)
}
for (i in row_i) {
j_tot_counts = sum(junctions[['count']])
j = as.numeric(junctions[i,1:5])
if ("mean" != "") {
j[3] = ifelse(length(d[x==j[1]-1,y])==0, 0, max(as.numeric(d[x==j[1]-1,y])))
j[4] = ifelse(length(d[x==j[2]1,y])==0, 0, max(as.numeric(d[x==j[2]1,y])))
}
# Find intron midpoint
xmid = round(mean(j[1:2]), 1)
ymid = max(j[3:4]) * 1.2
# Thickness of the arch
lwd = scale_lwd(j[5]/j_tot_counts)
curve_par = gpar(lwd=lwd, col=color_list[[id]])
# Arc grobs
# Choose position of the arch (top or bottom)
nss = i
if (nss%%2 == 0) { #bottom
ymid = -0.3 * maxheight
# Draw the arcs
# Left
curve = xsplineGrob(x=c(0, 0, 1, 1), y=c(1, 0, 0, 0), shape=1, gp=curve_par)
gp = gp annotation_custom(grob = curve, j[1], xmid, 0, ymid)
# Right
curve = xsplineGrob(x=c(1, 1, 0, 0), y=c(1, 0, 0, 0), shape=1, gp=curve_par)
gp = gp annotation_custom(grob = curve, xmid, j[2], 0, ymid)
}
if (nss%%2 != 0) { #top
# Draw the arcs
# Left
curve = xsplineGrob(x=c(0, 0, 1, 1), y=c(0, 1, 1, 1), shape=1, gp=curve_par)
gp = gp annotation_custom(grob = curve, j[1], xmid, j[3], ymid)
# Right
curve = xsplineGrob(x=c(1, 1, 0, 0), y=c(0, 1, 1, 1), shape=1, gp=curve_par)
gp = gp annotation_custom(grob = curve, xmid, j[2], j[4], ymid)
}
# Add junction labels
gp = gp annotate("label", x = xmid, y = ymid, label = as.character(junctions[i,6]),
vjust=0.5, hjust=0.5, label.padding=unit(0.01, "lines"),
label.size=NA, size=(base_size*0.352777778)*0.6
)
# gp = gp annotation_custom(grob = rectGrob(x=0, y=0, gp=gpar(col="red"), just=c("left","bottom")), xmid, j[2], j[4], ymid)
# gp = gp annotation_custom(grob = rectGrob(x=0, y=0, gp=gpar(col="green"), just=c("left","bottom")), j[1], xmid, j[3], ymid)
}
gpGrob = ggplotGrob(gp);
gpGrob$layout$clip[gpGrob$layout$name=="panel"] <- "off"
if (bam_index == 1) {
maxWidth = gpGrob$widths[2vs] gpGrob$widths[3vs]; # fix problems ggplot2 vs
maxYtextWidth = gpGrob$widths[3vs]; # fix problems ggplot2 vs
# Extract x axis grob (trim=F -- keep empty cells)
xaxisGrob <- gtable_filter(gpGrob, "axis-b", trim=F)
xaxisGrob$heights[8vs] = gpGrob$heights[1] # fix problems ggplot2 vs
x.axis.height = gpGrob$heights[7vs] gpGrob$heights[1] # fix problems ggplot2 vs
}
# Remove x axis from all density plots
kept_names = gpGrob$layout$name[gpGrob$layout$name != "axis-b"]
gpGrob <- gtable_filter(gpGrob, paste(kept_names, sep="", collapse="|"), trim=F)
# Find max width of y text and y label and max width of y text
maxWidth = grid::unit.pmax(maxWidth, gpGrob$widths[2vs] gpGrob$widths[3vs]); # fix problems ggplot2 vs
maxYtextWidth = grid::unit.pmax(maxYtextWidth, gpGrob$widths[3vs]); # fix problems ggplot2 vs
density_grobs[[id]] = gpGrob;
}
Error in density_list[[id]] :
attempt to select less than one element in get1index
# Add x axis grob after density grobs BEFORE annotation grob
density_grobs[["xaxis"]] = xaxisGrob
# Annotation grob
if (1.0 == 1) {
gtfGrob = ggplotGrob(gtfp);
maxWidth = grid::unit.pmax(maxWidth, gtfGrob$widths[2vs] gtfGrob$widths[3vs]); # fix problems ggplot2 vs
density_grobs[['gtf']] = gtfGrob;
}
Error in ans[ypos] <- rep(yes, length.out = len)[ypos] :
replacement has length zero
In addition: Warning message:
In rep(yes, length.out = len) : 'x' is NULL so the result will be NULL
# Reassign grob widths to align the plots
for (id in names(density_grobs)) {
density_grobs[[id]]$widths[1] <- density_grobs[[id]]$widths[1] maxWidth - (density_grobs[[id]]$widths[2vs] maxYtextWidth); # fix problems ggplot2 vs
density_grobs[[id]]$widths[3vs] <- maxYtextWidth # fix problems ggplot2 vs
}
Error: object of type 'closure' is not subsettable
# Heights for density, x axis and annotation
heights = unit.c(
unit(rep(2, length(density_list)), "in"),
x.axis.height,
unit(1.5*1.0, "in")
)
Error in unit(rep(2, length(density_list)), "in") :
'x' and 'units' must have length 0
# Arrange grobs
argrobs = arrangeGrob(
grobs=density_grobs,
ncol=1,
heights = heights,
);
Error in arrangeGrob(grobs = density_grobs, ncol = 1, heights = heights, :
object 'heights' not found
# Save plot to file in the requested format
if ("pdf" == "tiff"){
# TIFF images will be lzw-compressed
ggsave("/orange/swanson/carter.h/ChP_Splicing_Project/Mbnl_12KO_ChP/rMATS_output_1KO_ChP_Triplicate/testing_SE_sashimi_plots/Arhgap12_chr18_6064340-6070091_meanj_-.pdf", plot = argrobs, device = "tiff", width = width, height = height, units = "in", dpi = 300, compression = "lzw", limitsize = FALSE)
} else {
ggsave("/orange/swanson/carter.h/ChP_Splicing_Project/Mbnl_12KO_ChP/rMATS_output_1KO_ChP_Triplicate/testing_SE_sashimi_plots/Arhgap12_chr18_6064340-6070091_meanj_-.pdf", plot = argrobs, device = "pdf", width = width, height = height, units = "in", dpi = 300, limitsize = FALSE)
}
Error in grDevices::pdf(file = filename, ..., version = version) :
cannot open file '/orange/swanson/carter.h/ChP_Splicing_Project/Mbnl_12KO_ChP/rMATS_output_1KO_ChP_Triplicate/testing_SE_sashimi_plots/Arhgap12_chr18_6064340-6070091_meanj_-.pdf'
dev.log = dev.off()
Hi,
When I run the sashimi-plot.py it comes out this error. Could you help me solve this problem? Thank you!
(python3) [myang9@sbcsmdplp001 ReadsCoverage]$ sashimi-plot.py -s SENSE --shrink -F png -C 2 -g Homo_sapiens.GRCh38.94.gtf -b BamR6103640ACSE98545.bam -c chr1:196788215-196790227 -o R6103640-AC.SE_98545
Traceback (most recent call last):
File "sashimi-plot.py", line 214, in intersect_introns
a, b = next(it)
StopIterationThe above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "sashimi-plot.py", line 644, in
intersected_introns = list(intersect_introns(introns))
RuntimeError: generator raised StopIteration
Hi,
I am using the ggsashimi to generate sashimi plots for my results. The alternative splicing events I am interested in are located in either + or - strand. I checked the help info of ggsashimi, and the -S OUT_STRAND and -s STRAND are the two flags I can set to focus on only one strand.
But I am not sure how to correctly set the two flags. For my understanding, if the alternative splicing event I am interested in is located in - strand, -s was set to ANTISENSE. How about -S? Should I set it to minus which is compatible to -s flag? What does the out strand mean?
thanks,
Shan
Hello all,
I'm testing ggshashimi with a single bam input but keep getting the error bellow:
Error: unexpected '=' in " color_list = list(bam_iput="
Execution halted
I'm running:
Python 2.7.14
R version 3.4.1
ggplot2 v2.2.1
gridextra v2.3
data.table v1.11.4
Any help will be much appreciated and thank you for generating ggsashimi!
Best Gil
Hi! I was trying to run it via docker and started building the image:
docker build -f docker/Dockerfile -t guigolab/ggsashimi .
However it complains about devtools:
Step 6/14 : RUN R -e 'install.packages("devtools");' && R -e "require(devtools); install_version('ggplot2', version = '${GGPLOT_VER}', repos = 'http://cloud.r-project.org');"
---> Running in fcc8a5f682b6
R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> install.packages("devtools");
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Warning: unable to access index for repository https://mran.microsoft.com/snapshot/2017-04-21/src/contrib:
Line starting '<!doctype html><html ...' is malformed!
Warning message:
package ‘devtools’ is not available (for R version 3.3.3)
>
>
R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> require(devtools); install_version('ggplot2', version = '2.2.1', repos = 'http://cloud.r-project.org');
Loading required package: devtools
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called ‘devtools’
Error: could not find function "install_version"
Execution halted
The command '/bin/sh -c R -e 'install.packages("devtools");' && R -e "require(devtools); install_version('ggplot2', version = '${GGPLOT_VER}', repos = 'http://cloud.r-project.org');"' returned a non-zero code: 1
thoughts?
Make sure this is consistent along the code
I am encountering this particular error, am I running it incorrectly, or is there a specific input I have to format some way?
./sashimi-plot.py -b 11_sf3_ctrl.mm10.sorted.bam -c chr2:119,046,711-119,057,857 -M 10 -g gencode.vM18.annotation.nogene.gtf
Error: unexpected input in " color_list = list(11_"
Execution halted
Thanks in advance for the help!
Hi,
I've been using ggsashimi for a while now and it's a wonderful package!
I needed some clarification on -s
and -S
arguments. I went through the options as to what the script says the difference between the two are and also went through a similar question someone asked about this. As far as I understand -s SENSE
would be if you want to visualize the reads for the sense strand and -s ANTISENSE
would be if you want to visualize the reads for the antisense strand with respect to RNA-Seq. Meanwhile, I would want to visualize the reads of only for the forward strand (+ strand) in one of the junctions I'd like to view and would not want the reverse strand (- strand) reads. I entered the junction coordinates on IGV, viewed the sashimi plot for the forward and reverse strand and for each of the strands there are different numbers. I expected the same output using ggsashimi and used the -S plus
once and -S minus
another time but I get the same junction reads using both options and there is no difference in the outputs between them.
Here is my script execution incase I'm not making any mistakes and if my understanding is right:
~/tools/ggsashimi/sashimi-plot.py -b ../input_bam.tsv -c 12:6448738-6451363 -g annotation_CD27.gtf --shrink --alpha 0.6 --base-size=20 --ann-height=4 --height=3 --width=18 -P ../palette.txt -C 1 -S plus -o sashimi_CD27_onlyJunc_plus
If needed I can attach a screenshot of what I see for plus and minus strands on IGV and what I see for ggsashimi if I'm not doing something wrong.
Thanks,
Ahish
Hi,
I installed ggsashimi
and both examples in example_run.sh
worked fine. But when I tried with my BAM file, named 2dpi.bam
, I got the following error:
./sashimi-plot.py -b 2dpi.bam -c tig00000003:33700-34300
Error: unexpected symbol in " color_list = list(2dpi"
Execution halted
I started to mess around with different file names and when I changed 2dpi.bam
to dpi2.bam
it worked. It looks like if the BAM file name starts with a number ggsashimi
gives the error above. Maybe this is an issue you would like to address in future versions of ggsashimi
.
Package versions:
Python v3.7.3
R v3.5.1
ggplot2 v3.2.1
data.table v1.12.2
gridExtra v2.3
hello team,
This is a great tool.
I am not sure why when I pull you docker image i see this error
Using default tag: latest
latest: Pulling from guigolab/ggsashimi
Digest: sha256:eab7ae5fa4ae2aadb8fc20e507bda59acb9bf78c12013e8384be2302fbb57f3c
Status: Image is up to date for guigolab/ggsashimi:latest
Error in library(ggplot2) : there is no package called 'ggplot2'
Execution halted
Any help?
Thanks.
Sid
This line fails when some tags are like: tag1 "gene function; ABC1"
. This happens in Ensembl gtfs which have GO terms as tags
Line 288 in 1813aec
--Hi,
i generate the graphic with this command:
./sashimi-plot.py -b input_bams.tsv -c 11:5617339-5625858 -g exons.gtf -M 10 -C 3 -O 3 -A median -F tiff -R 350 --base-size=20 --ann-height=4 --height=3 --width=18 -P palette.txt
but the arrows between exons (below the sashimi plots) are not drawn.
an idea ?
thank you --
Trying to set the colors.
I reused the palette file in the example.
How does it know which colors should be associated to which sample ?
My png plot is always gray. I m trying to use it on 5 samples without overlapping.
Also can you give an example of how labels file should be ? I didn't also succeed to make it works.
Hi,
I noticed the coordinate on x-axis of the generated graph would be horizontally truncated if --shrink flag was used. It's the same in your examples.
the command to generate sashimi.pdf:
../sashimi-plot.py -b input_bams.tsv -c chr10:27040584-27048100 -g annotation.gtf -M 10 -C 3 -O 3 --shrink --alpha 0.25 --base-size=20 --ann-height=4 --height=3 --width=18 -P palette.txt
But the x-axis in the graph end at around 27041500 which is way less than the 27048100 in the -c flag.
We could add support for gtf.gz files since this is what is mostly downloaded from the main sources like Ensembl and Gencode.
Hi,
in some cases R raise a warning (Removed 1 rows containing missing values (geom_segment)) and some exons are not displayed.
I attached an example.
I run the following command
sashimi-plot -b totBs_sashimi.annot.tsv -c '1:109113929-109206781' -g TCONS_00009768_andNear.gtf -M 10 -C 3 -O 3
--shrink --alpha 0.25 --base-size=20 --ann-height=4 --height=3 --width=18 -P palette.txt
Thanks
When using the option --shrink
it may happen that coordinates shown do not match exactly the actual ones in the genome
am plotting two samples on using the same settings on the the example_run.sh, except the gtf file (which is mine). However, while the y axis scale on the first automatically readjusts to median read count, panel 2 is stuck to one. Is there a way to make sure the scales on both panels readjust automatically.
Also is it possible to visual rfkm/tpm rather than median/mean reads?
used this command - ./sashimi-plot.py -b wt_chr2.bam -g Mus_musculus.GRCm38.99.gtf -o wt_1_
Traceback (most recent call last):
File "./sashimi-plot.py", line 578, in
a, junctions = read_bam(bam, args.coordinates, args.strand)
File "./sashimi-plot.py", line 125, in read_bam
_, start, end = parse_coordinates(c)
File "./sashimi-plot.py", line 67, in parse_coordinates
c = c.replace(",", "")
AttributeError: 'NoneType' object has no attribute 'replace'
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.