guigolab / ggsashimi Goto Github PK

View Code? Open in Web Editor NEW

116.0 116.0 43.0 10.96 MB

Command-line tool for the visualization of splicing events across multiple samples

License: MIT License

Python 93.43% Shell 3.16% Dockerfile 3.41%

ggsashimi's People

Contributors

Stargazers

Watchers

ggsashimi's Issues

Error in seq.default(start, max(start + 1, end - 4), by = 2425)

$ ./sashimi-plot.py -b input_bam.tsv -g nochr_gencode_exon_transcript.gtf -c 2:29,619,720-29,740,978 -M 10 --base-size=20 --ann-height=4 --height=3 --width=18 -P palette.txt

Error in seq.default(start, max(start + 1, end - 4), by = 2425) :
'from' must be of length 1
Calls: rbind -> [ -> [.data.table -> seq -> seq.default
Execution halted

Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding

$ ./ggsashimi.sh
Python path configuration:
  PYTHONHOME = '/apps/python/3.7'
  PYTHONPATH = (not set)
  program name = '/usr/bin/python3'
  isolated = 0
  environment = 1
  user site = 1
  import site = 1
  sys._base_executable = '/usr/bin/python3'
  sys.base_prefix = '/apps/python/3.7'
  sys.base_exec_prefix = '/apps/python/3.7'
  sys.executable = '/usr/bin/python3'
  sys.prefix = '/apps/python/3.7'
  sys.exec_prefix = '/apps/python/3.7'
  sys.path = [
    '/apps/python/3.7/lib/python38.zip',
    '/apps/python/3.7/lib/python3.8',
    '/apps/python/3.7/lib/python3.8/lib-dynload',
  ]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
ModuleNotFoundError: No module named 'encodings'

Current thread 0x00002ae45d1c7740 (most recent call first):
<no Python frame>

ggsashimi.sh looks like:

#!/bin/bash
module load python
module load samtools
module load R/4.0

python3 /path/sashimi-plot.py \
--bam /path/wt_480480_ChP_ggsashimi.tsv \
--coordinates chrX:166170454-166178830 \
--gtf /path/Mus_musculus.GRCm38.100.gtf \
--overlay 1 \
--color-factor 3 \
--labels 1 \
--fix-y-scale

Warnings when using -A mean_j (median_j)

The test example raises some warnings when -A mean_j or -A median_j option is set:

../sashimi-plot.py -b input_bams.tsv -c chr10:27040584-27048100 -g annotation.gtf -M 10 -C 3 -O 3 --shrink --alpha 0.25 --base-size=20 --ann-height=4 --height=3 --width=18 -P palette.txt -A mean_j

There were 18 warnings (use warnings() to see them)

And the arches in the output plot start from (and end into) wrong signal values

Running the default example provided results in a PDF figure where expression labels (ex. 0-600) and the coordinate label are missing. The main figure on the repository, however, has these axes labels. Not sure if this is an issue with the R code or the python...

No reads in the specified Area error

Hi,
I am using the ggsashmi to plot some of the transcripts from gtf file, However when I see in the IGV there are reads for exons as well as junctions. However, when I run sashami-plot script I get error "No reads in the specified area."
Example jpeg for the transcripts from IGV.

I am running following command for the plotting.
sashimi-plot.py -b bam_list -C 4 -L 3 -c chr22:39696699-39700075 -o MSTRG10006 -g ../Final_novel.gtf

Add ability to test different versions of R and ggplot2 in Travis

KeyError: 'transcript_id' with Ensemble human annotation

From @sridhar0605 originally posted in #1:

When using Homo_sapiens.GRCh37.75.gtf as reference from ensembl, I see this error

Using default tag: latest
latest: Pulling from guigolab/ggsashimi
915665fee719: Pull complete
1a0814f59c8e: Pull complete
b3b71680ed5d: Pull complete
1c3c8afa6ada: Pull complete
2fbeb903a5b4: Pull complete
Digest: sha256:82590f821978568e948ad4861ce009fcb26e7543263bea9d7b78c17667f8d675
Status: Downloaded newer image for guigolab/ggsashimi:latest
Traceback (most recent call last):
File "/sashimi-plot.py", line 592, in
transcripts, exons = read_gtf(args.gtf, args.coordinates)
File "/sashimi-plot.py", line 278, in read_gtf
transcript_id = d["transcript_id"]
KeyError: 'transcript_id'

few lines from gtf:

#!genome-build GRCh37.p13
#!genome-version GRCh37
#!genome-date 2009-02
#!genome-build-accession NCBI:GCA_000001405.14
#!genebuild-last-updated 2013-09
1	pseudogene	gene	11869	14412	.	+	.	gene_id "ENSG00000223972"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene";
1	processed_transcript	transcript	11869	14409	.	+	.	gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana";
1	processed_transcript	exon	11869	12227	.	+	.	gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "1"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana"; exon_id "ENSE00002234944";
1	processed_transcript	exon	12613	12721	.	+	.	gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "2"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana"; exon_id "ENSE00003582793";
1	processed_transcript	exon	13221	14409	.	+	.	gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_number "3"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana"; exon_id "ENSE00002312635";

IndexError: list index out of range

./sashimi-plot.py -b input_bams.tsv -g Mus_musculus.GRCm38.99.gtf -M 10 -C 3 -O 3 --shrink --alpha 0.25 --base-size=20 --ann-height=4 --height=3 --width=18 -P palette.txt
Traceback (most recent call last):
File "./sashimi-plot.py", line 573, in
for id, bam, overlay_level, color_level, label_text in read_bam_input(args.bam, args.overlay, args.color_factor, args.labels):
File "./sashimi-plot.py", line 176, in read_bam_input
bam = get_bam_path(f, line_sp[1])
IndexError: list index out of range

ValueError: need more than 1 value to unpack

Hello,

First off, thank you for developing this tool. I'm trying to run ggsashimi but I'm encountering the following error:

Traceback (most recent call last):
File "/home/sashimi-plot.py", line 632, in
transcripts, exons = read_gtf(args.gtf, args.coordinates)
File "/home/sashimi-plot.py", line 285, in read_gtf
el_chr, _, el, el_start, el_end, _, strand, _, tags = line.strip().split("\t")
ValueError: need more than 1 value to unpack

I would appreciate some help with this, thank you.

Nelson Barrientos

Aggregation doesn't work with long genomic regions

Hi, it seems that the mean and median aggregate functions throw an error if the genomic region is too large. I'm currently looking at a region 147,000 bases long and am getting this error when I try and use the mean or median aggregation

Error in j[4] = as.numeric(d[x == j[2] + 1, y]) : replacement has length zero Execution halted

It works just fine if I remove the aggregation or if I shrink the genomic region. What may be causing this?

Feature request: Add option for fixed y-axis scales

Hi!
In some cases it would be nice if the y-axis scale was fixed for each compared condition. I think this could help visualize differences in the inclusion levels of alternative events. There are different ways in which this feature could be implemented, one could either fix the y-scale to that of the max read count among all the plots or it could be fixed to a value that is manually entered by the user. Both of these approaches should be fairly easy to implement using standard ggplot syntax and I could try it out if this sounds interesting to anyone else!

Integration in Galaxy

Hello,

I really like your tool and I find it very useful. I wonder if you already thought about wrapping your ggsashimi into Galaxy ?
I am currently working in the bioinformatics plateform ARTbio (see our github here ) and we work a lot with Galaxy by developping the framework but also different tools and wrapping them.

Would you agree to allow our platform to create the galaxy wrapper ? This would be an opportunity to provide a sashimi plot generator to Galaxy users.

Best,
Lea

How to control the y axis with same range

I noticed the sashimi plot with multiple samples the y axis ranges are differently scaled. Is there a way so that I can set the range is same across samples?

Wrong colors when using the color-factor option without overlay or a palette file

Running the following command:

./sashimi-plot.py -b examples/input_bams.tsv -c chr10:27040584-27048100 -C 3

only the first color is used. Using a palette file or the -O option gives the correct colors.

Annotate gene symbol and exon number

Hi,

I wanted to know if there is any way to annotate gene_name and exon_number in addition to the transcript_id from the GTF file:

GTF file:

##description: evidence-based annotation of the human genome (GRCh37), version 19 (Ensembl 74)
##provider: GENCODE
##contact: [email protected]
##format: gtf
##date: 2013-12-05
##modified by GTEx_Collapsed_Gene_Model.py
1       HAVANA  gene    11869   14362   .       +       .       gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_s
tatus "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG000000009
61.2";
1       HAVANA  transcript      11869   14362   .       +       .       gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"
; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG0
0000000961.2";
1       HAVANA  exon    11869   12227   .       +       .       gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_s
tatus "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG000000009
61.2"; exon_id "ENSG00000223972.4_1; exon_number 1";
1       HAVANA  exon    12595   12721   .       +       .       gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_s
tatus "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG000000009
61.2"; exon_id "ENSG00000223972.4_2; exon_number 2";
1       HAVANA  exon    12975   13052   .       +       .       gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_s
tatus "KNOWN"; gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG000000009
61.2"; exon_id "ENSG00000223972.4_3; exon_number 3";

Sashimi Plot:

Thanks
Komal

Error in grid.Call.graphics

Received by email

I am interested in using your ggsashimi utility to generate some plots for a research publication. I have installed the listed dependencies and R packages for the utility from my bash prompt. (I am using the Linux subsystem for Windows 10.) However, when I attempt to run either my own samples or the provided example files, I get the error message provided below. Unfortunately, my knowledge of Python is limited. Does this indicate an error with my system setup? I appreciate the assistance!

Example command I used:

./sashimi-plot.py -b ./examples/input_bams.tsv -c chr10:27040584-27048100 -g ./examples/annotation.gtf -M 10 -C 3 -O 3 --shrink --alpha 0.25 --base-size=20 --ann-height=4 --height=3 --width=18 -P ./examples/palette.txt

Which returns the following message:

Warning messages:
1: In density_grobs[[id]]$widths[1] <- density_grobs[[id]]$widths[1] +  :
  number of items to replace is not a multiple of replacement length
2: In density_grobs[[id]]$widths[3 + vs] <- maxYtextWidth :
  number of items to replace is not a multiple of replacement length
3: In density_grobs[[id]]$widths[1] <- density_grobs[[id]]$widths[1] +  :
  number of items to replace is not a multiple of replacement length
4: In density_grobs[[id]]$widths[3 + vs] <- maxYtextWidth :
  number of items to replace is not a multiple of replacement length
5: In density_grobs[[id]]$widths[1] <- density_grobs[[id]]$widths[1] +  :
  number of items to replace is not a multiple of replacement length
6: In density_grobs[[id]]$widths[3 + vs] <- maxYtextWidth :
  number of items to replace is not a multiple of replacement length
7: In density_grobs[[id]]$widths[1] <- density_grobs[[id]]$widths[1] +  :
  number of items to replace is not a multiple of replacement length
8: In density_grobs[[id]]$widths[3 + vs] <- maxYtextWidth :
  number of items to replace is not a multiple of replacement length
9: In density_grobs[[id]]$widths[1] <- density_grobs[[id]]$widths[1] +  :
  number of items to replace is not a multiple of replacement length
10: In density_grobs[[id]]$widths[3 + vs] <- maxYtextWidth :
  number of items to replace is not a multiple of replacement length
Error in grid.Call.graphics(L_setviewport, vp, TRUE) :
  INTEGER() can only be applied to a 'integer', not a 'NULL'
Calls: ggsave ... lapply -> FUN -> push.vp.viewport -> grid.Call.graphics
Execution halted

Error: unexpected symbol in " color_list = list(ConAc_A2-11D.sorted"

Hi,

I get the following error when using sashimi-plot.py

This is my code
python sashimi-plot.py -b /project/uml_frederic_chain/Daphnia/index/ConAc_A2-11D.sorted.bam -c DAPPUscaffold_67:640865-642375 -g /project/uml_frederic_chain/Daphnia/Daphnia_pulex.V1.0.41.exons.gtf --shrink

Could you please help me out in resolving this.

Histogram smoothing possible?

I pulled ggsashimi from DockerHub and am trying to smooth the histograms with "--smooth" but the argument is not recognized. In the GitHub code it looks to be commented out. Is there another argument or is smoothing not possible, currently?

Thanks, Luke

Annotation and density plots are not aligned with ggplot2 version greater than 2.2.1

In ggplot2 versions greater that 2.2.1 grobs have a different number of colums

Sashimi plot read counts (above curves)

Hi, I'm curious about the meaning of numbers plotted above the curves (splice junction reads).

for single samples, the number seems to match the count of splice junction reads. (something like 206, 159.. )

But if I input multiple samples (4 samples) in one group and draw sashimi plot, the numbers get extremely huge. (something like : 347,142,108,206). It is definitely not the sum of junction read counts in 4 samples. I wonder what those huge numbers mean.

Thanks!

Generate a BED file with the junction coordinates and split read counts on each BAM file

Maybe useful for some users

Density plots do not have Y axis with ggplot2 version greater than 2.2.1

ggsashimi/sashimi-plot.py

Line 673 in 46a9bf3

  gp = gp + scale_y_continuous(breaks=ggplot_build(gp)$layout$panel_ranges[[1]]$y.major_source) 

Have a look at this Stack Overflow thread

No y scale

I'm trying to make sashimi plot using the examples files in the example folder and just copy/paste the command line you suggested and I get the plots like the example but no y axis at all. Any clues to fix this issue?
sashimi.pdf

Add exception for GTF files with gene rows as well

I think now we currently support only gtf files with transcripts and exons, assuming that gene rows would still have a transcript_id tag in the metadata. Unfortunately, newer gtf formats do not have transcript_id for genes anymore and it throws error. We should add an exception for this

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 790: ordinal not in range(128)

Hi,
When im running :

docker run -w $PWD -v $PWD:$PWD guigolab/ggsashimi -b bam_input.tsv -c 1:138062859-138102859
following error occurs:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 790: ordinal not in range(128)

When im running the container with one specified bam file the image works fine!

I'm using Mac OS Catalina in case if this is important.

Use output pdf for CI checksum test

Plot dimension too big

Hi,

I'm trying to plot 26 bams separately in one single plot, but i constantly get the following error. I tried to tweak the parameters, specially the --height but it doesn't seem to work. I can't edit the R script either to set limitsize to FALSE. When I overlay the plot by groups I can nicely see the plot, but then when showing the reads from each sample within the same layer gets a bit messy, and I do not want to aggregate the counts in this case.

Error: Dimensions exceed 50 inches (height and width are specified in 'in' not pixels). If you're sure you want a plot that big, use `limitsize = FALSE`.

Any help one track per sample ?
Best,
Pedro

How to generate transcript diagram at the bottom

Hi, I want to ask how to generate the transcript diagram (that consists transcript id and strand) at the bottom of sahsimi plot as you have generated in your example. I have tried, But I am not getting anything at the bottom..
Thanks
Heena

Change font size of transcript id when using `-gtf`

I would like to change the size of transctipt id when using --gtf, but I am not sure which part of the R script needs to be modified to achieve this. Could you point to the part of the script that controls this? Thanks!

error in R script portion

when I turn on GGSASHIMI_DEBUG and run the R script it produces, I get the following when I run the script in R:

library(ggplot2)
 library(grid)
 library(gridExtra)
 library(data.table)
 library(gtable)
 
 scale_lwd = function(r) {
   lmin = 0.1
   lmax = 4
   return( r*(lmax-lmin)lmin )
 }
 
 base_size = 14
 height = ( 5.5  base_size*0.352777778/67 ) * 1.02
 width = 10
 theme_set(theme_bw(base_size=base_size))
 theme_update(
   plot.margin = unit(c(15,15,15,15), "pt"),
   panel.grid = element_blank(),
   panel.border = element_blank(),
   axis.line = element_line(size=0.5),
   axis.title.x = element_blank(),
   axis.title.y = element_text(angle=0, vjust=0.5)
 )
 
 labels = list("#000000"="#000000","WT"="WT","1KO"="1KO")
 
 density_list = list()
 junction_list = list()
 
 color_list = list("WT"="#000000","1KO"="#000000")
 
 
 # data table with exons
 ann_list = list(
   "exons" = data.table(),
   "introns" = data.table()
 )
 
 
 gtfp = ggplot()
 if (length(ann_list[['introns']])  0) {
   gtfp = gtfp  geom_segment(data=ann_list[['introns']], aes(x=start, xend=end, y=tx, yend=tx), size=0.3)
   gtfp = gtfp  geom_segment(data=txarrows, aes(x=V1,xend=V2,y=tx,yend=tx), arrow=arrow(length=unit(0.02,"npc")))
 }
 if (length(ann_list[['exons']])  0) {
   gtfp = gtfp  geom_segment(data=ann_list[['exons']], aes(x=start, xend=end, y=tx, yend=tx), size=5, alpha=1)
 }
 gtfp = gtfp  scale_y_discrete(expand=c(0,0.5))
 gtfp = gtfp  scale_x_continuous(expand=c(0,0.25))
 gtfp = gtfp  coord_cartesian(xlim = c(6064339,6070090))
 gtfp = gtfp  labs(y=NULL)
 gtfp = gtfp  theme(axis.line = element_blank(), axis.text.x = element_blank(), axis.ticks = element_blank())
 
[density lists and junction lists omitted bc too long]
                                                                              junction_list[["1KO"]] = data.frame(x=c(6064422,6064422,6065846,6064422,6064422,6065846,6064422,6064422,6065846), xend=c(6065705,6069831,6069831,6065705,6069831,6069831,6065705,6069831,6069831), y=c(39,39,13,39,39,13,39,39,13), yend=c(14,34,34,14,34,34,14,34,34), count=c(13,22,8,13,22,8,13,22,8))
Error: unexpected symbol in:
"20,6064821,6064822,6064823,6064824,6064825,6064826,6064827,6064828,6064829,6064830,6064831,6064832,6064833,6064834,6064835,6064836,6064837,6064838,6064839,6064840,6
                                                                             junction_list"
                                                                              
                                                                              
                                                                              pdf(NULL) # just to remove the blank pdf produced by ggplotGrob
                                                                              
                                                                              if(packageVersion('ggplot2') = '3.0.0'){  # fix problems with ggplot2 vs 3.0.0
                                                                                vs = 1
                                                                              } else {
                                                                                vs = 0
                                                                              }
                                                                              
                                                                              if(TRUE) {
                                                                                print(max(unlist(lapply(density_list, function(df){max(df$y)}))))
                                                                                maxheight = max(unlist(lapply(density_list, function(df){max(df$y)})))
                                                                                breaks_y = labeling::extended(0, maxheight, m = 4)
                                                                              }
[1] -Inf
Error in seq.default(from = best$lmin, to = best$lmax, by = best$lstep) : 
  'from' must be of length 1
In addition: Warning messages:
1: In max(unlist(lapply(density_list, function(df) { :
  no non-missing arguments to max; returning -Inf
2: In max(unlist(lapply(density_list, function(df) { :
  no non-missing arguments to max; returning -Inf
                                                                              
                                                                              if(exists('coord_dict')){
                                                                                all_pos_shrinked = do.call(rbind, density_list)$x
                                                                                s2r = merge(intersected_introns, coord_dict, by.x = 'real_xend', by.y = 'real')
                                                                                s2r = merge(s2r, coord_dict, by.x = 'real_x', by.y = 'real', suffixes = c('_xend', '_x'))
                                                                                breaks_x_shrinked = labeling::extended(min(all_pos_shrinked), max(all_pos_shrinked), m = 5)
                                                                                breaks_x = c()
                                                                                out_range = c()
                                                                                for (b in breaks_x_shrinked){
                                                                                  iintron = FALSE
                                                                                  for (j in 1:nrow(s2r)){
                                                                                    l = s2r[j, ]
                                                                                    if(b = l$shrinked_x && b <= l$shrinked_xend){
                                                                                      # Intersected intron
                                                                                      p = (b-l$shrinked_x)/(l$shrinked_xend - l$shrinked_x)
                                                                                      realb = round(l$real_x  p*(l$real_xend - l$real_x))
                                                                                      breaks_x = c(breaks_x, realb)
                                                                                      iintron = TRUE
                                                                                      break
                                                                                    }
                                                                                  }
                                                                                  if (!iintron){
                                                                                    # Exon, upstream/downstream intergenic region or intron (not intersected)
                                                                                    if(b <= min(s2r$shrinked_x)) {
                                                                                      l <- s2r[which.min(s2r$shrinked_x), ]
                                                                                      if(any(b == all_pos_shrinked)){
                                                                                        # Boundary (subtract)
                                                                                        s = l$shrinked_x - b
                                                                                        realb = l$real_x - s
                                                                                        breaks_x = c(breaks_x, realb)
                                                                                      } else {
                                                                                        out_range <- c(out_range, which(breaks_x_shrinked == b))
                                                                                      }
                                                                                    } else if (b = max(s2r$shrinked_xend)){
                                                                                      l <- s2r[which.max(s2r$shrinked_xend), ]
                                                                                      if(any(b == all_pos_shrinked)){
                                                                                        # Boundary (sum)
                                                                                        s = b - l$shrinked_xend
                                                                                        realb = l$real_xend  s
                                                                                        breaks_x = c(breaks_x, realb)
                                                                                      } else {
                                                                                        out_range <- c(out_range, which(breaks_x_shrinked == b))
                                                                                      }
                                                                                    } else {
                                                                                      delta = b-s2r$shrinked_xend
                                                                                      delta[delta < 0] = Inf
                                                                                      l = s2r[which.min(delta), ]
                                                                                      # Internal (sum)
                                                                                      s = b - l$shrinked_xend
                                                                                      realb = l$real_xend  s
                                                                                      breaks_x = c(breaks_x, realb)
                                                                                    }
                                                                                  }
                                                                                }
                                                                                if(length(out_range)) {
                                                                                  breaks_x_shrinked = breaks_x_shrinked[-out_range]
                                                                                }
                                                                              }
                                                                              
                                                                              density_grobs = list();
                                                                              
                                                                              for (bam_index in 1:length(density_list)) {
                                                                                
                                                                                id = names(density_list)[bam_index]
                                                                                d = data.table(density_list[[id]])
                                                                                junctions = data.table(junction_list[[id]])
                                                                                
                                                                                # Density plot
                                                                                gp = ggplot(d)  geom_bar(aes(x, y), width=1, position='identity', stat='identity', fill=color_list[[id]], alpha=0.05)
                                                                                gp = gp  labs(y=labels[[id]])
                                                                                if(exists('coord_dict')) {
                                                                                  gp = gp  scale_x_continuous(expand=c(0, 0.25), breaks = breaks_x_shrinked, labels = breaks_x)
                                                                                } else {
                                                                                  gp = gp  scale_x_continuous(expand=c(0, 0.25))
                                                                                }
                                                                                
                                                                                if(!TRUE){
                                                                                  maxheight = max(d[['y']])
                                                                                  breaks_y = labeling::extended(0, maxheight, m = 4)
                                                                                  gp = gp  scale_y_continuous(breaks = breaks_y)
                                                                                } else {
                                                                                  gp = gp  scale_y_continuous(breaks = breaks_y, limits = c(NA, maxheight))
                                                                                }
                                                                                
                                                                                # Aggregate junction counts
                                                                                row_i = c()
                                                                                if (nrow(junctions) 0 ) {
                                                                                  
                                                                                  junctions$jlabel = as.character(junctions$count)
                                                                                  junctions = setNames(junctions[,.(max(y), max(yend),round(mean(count)),paste(jlabel,collapse=",")), keyby=.(x,xend)], names(junctions))
                                                                                  if ("mean" != "") {
                                                                                    junctions = setNames(junctions[,.(max(y), max(yend),round(mean(count)),round(mean(count))), keyby=.(x,xend)], names(junctions))
                                                                                  }
                                                                                  # The number of rows (unique junctions per bam) has to be calculated after aggregation
                                                                                  row_i = 1:nrow(junctions)
                                                                                }
                                                                                
                                                                                
                                                                                for (i in row_i) {
                                                                                  
                                                                                  j_tot_counts = sum(junctions[['count']])
                                                                                  
                                                                                  j = as.numeric(junctions[i,1:5])
                                                                                  
                                                                                  if ("mean" != "") {
                                                                                    j[3] = ifelse(length(d[x==j[1]-1,y])==0, 0, max(as.numeric(d[x==j[1]-1,y])))
                                                                                    j[4] = ifelse(length(d[x==j[2]1,y])==0, 0, max(as.numeric(d[x==j[2]1,y])))
                                                                                  }
                                                                                  
                                                                                  # Find intron midpoint
                                                                                  xmid = round(mean(j[1:2]), 1)
                                                                                  ymid = max(j[3:4]) * 1.2
                                                                                  
                                                                                  # Thickness of the arch
                                                                                  lwd = scale_lwd(j[5]/j_tot_counts)
                                                                                  
                                                                                  curve_par = gpar(lwd=lwd, col=color_list[[id]])
                                                                                  
                                                                                  # Arc grobs
                                                                                  
                                                                                  # Choose position of the arch (top or bottom)
                                                                                  nss = i
                                                                                  if (nss%%2 == 0) {  #bottom
                                                                                    ymid = -0.3 * maxheight
                                                                                    # Draw the arcs
                                                                                    # Left
                                                                                    curve = xsplineGrob(x=c(0, 0, 1, 1), y=c(1, 0, 0, 0), shape=1, gp=curve_par)
                                                                                    gp = gp  annotation_custom(grob = curve, j[1], xmid, 0, ymid)
                                                                                    # Right
                                                                                    curve = xsplineGrob(x=c(1, 1, 0, 0), y=c(1, 0, 0, 0), shape=1, gp=curve_par)
                                                                                    gp = gp  annotation_custom(grob = curve, xmid, j[2], 0, ymid)
                                                                                  }
                                                                                  
                                                                                  if (nss%%2 != 0) {  #top
                                                                                    # Draw the arcs
                                                                                    # Left
                                                                                    curve = xsplineGrob(x=c(0, 0, 1, 1), y=c(0, 1, 1, 1), shape=1, gp=curve_par)
                                                                                    gp = gp  annotation_custom(grob = curve, j[1], xmid, j[3], ymid)
                                                                                    # Right
                                                                                    curve = xsplineGrob(x=c(1, 1, 0, 0), y=c(0, 1, 1, 1), shape=1, gp=curve_par)
                                                                                    gp = gp  annotation_custom(grob = curve, xmid, j[2], j[4], ymid)
                                                                                  }
                                                                                  
                                                                                  # Add junction labels
                                                                                  gp = gp  annotate("label", x = xmid, y = ymid, label = as.character(junctions[i,6]),
                                                                                                     vjust=0.5, hjust=0.5, label.padding=unit(0.01, "lines"),
                                                                                                     label.size=NA, size=(base_size*0.352777778)*0.6
                                                                                  )
                                                                                  
                                                                                  
                                                                                  #               gp = gp  annotation_custom(grob = rectGrob(x=0, y=0, gp=gpar(col="red"), just=c("left","bottom")), xmid, j[2], j[4], ymid)
                                                                                  #               gp = gp  annotation_custom(grob = rectGrob(x=0, y=0, gp=gpar(col="green"), just=c("left","bottom")), j[1], xmid, j[3], ymid)
                                                                                  
                                                                                  
                                                                                }
                                                                                
                                                                                gpGrob = ggplotGrob(gp);
                                                                                gpGrob$layout$clip[gpGrob$layout$name=="panel"] <- "off"
                                                                                if (bam_index == 1) {
                                                                                  maxWidth = gpGrob$widths[2vs]  gpGrob$widths[3vs];    # fix problems ggplot2 vs
                                                                                  maxYtextWidth = gpGrob$widths[3vs];                     # fix problems ggplot2 vs
                                                                                  # Extract x axis grob (trim=F -- keep empty cells)
                                                                                  xaxisGrob <- gtable_filter(gpGrob, "axis-b", trim=F)
                                                                                  xaxisGrob$heights[8vs] = gpGrob$heights[1]              # fix problems ggplot2 vs
                                                                                  x.axis.height = gpGrob$heights[7vs]  gpGrob$heights[1] # fix problems ggplot2 vs
                                                                                }
                                                                                
                                                                                
                                                                                # Remove x axis from all density plots
                                                                                kept_names = gpGrob$layout$name[gpGrob$layout$name != "axis-b"]
                                                                                gpGrob <- gtable_filter(gpGrob, paste(kept_names, sep="", collapse="|"), trim=F)
                                                                                
                                                                                # Find max width of y text and y label and max width of y text
                                                                                maxWidth = grid::unit.pmax(maxWidth, gpGrob$widths[2vs]  gpGrob$widths[3vs]); # fix problems ggplot2 vs
                                                                                maxYtextWidth = grid::unit.pmax(maxYtextWidth, gpGrob$widths[3vs]); # fix problems ggplot2 vs
                                                                                density_grobs[[id]] = gpGrob;
                                                                              }
Error in density_list[[id]] : 
  attempt to select less than one element in get1index
                                                                              
                                                                              # Add x axis grob after density grobs BEFORE annotation grob
                                                                              density_grobs[["xaxis"]] = xaxisGrob
                                                                              
                                                                              # Annotation grob
                                                                              if (1.0 == 1) {
                                                                                gtfGrob = ggplotGrob(gtfp);
                                                                                maxWidth = grid::unit.pmax(maxWidth, gtfGrob$widths[2vs]  gtfGrob$widths[3vs]); # fix problems ggplot2 vs
                                                                                density_grobs[['gtf']] = gtfGrob;
                                                                              }
Error in ans[ypos] <- rep(yes, length.out = len)[ypos] : 
  replacement has length zero
In addition: Warning message:
In rep(yes, length.out = len) : 'x' is NULL so the result will be NULL
                                                                              
                                                                              # Reassign grob widths to align the plots
                                                                              for (id in names(density_grobs)) {
                                                                                density_grobs[[id]]$widths[1] <- density_grobs[[id]]$widths[1]  maxWidth - (density_grobs[[id]]$widths[2vs]  maxYtextWidth); # fix problems ggplot2 vs
                                                                                density_grobs[[id]]$widths[3vs] <- maxYtextWidth # fix problems ggplot2 vs
                                                                              }
Error: object of type 'closure' is not subsettable
                                                                              
                                                                              # Heights for density, x axis and annotation
                                                                              heights = unit.c(
                                                                                unit(rep(2, length(density_list)), "in"),
                                                                                x.axis.height,
                                                                                unit(1.5*1.0, "in")
                                                                              )
Error in unit(rep(2, length(density_list)), "in") : 
  'x' and 'units' must have length  0
                                                                              
                                                                              # Arrange grobs
                                                                              argrobs = arrangeGrob(
                                                                                grobs=density_grobs,
                                                                                ncol=1,
                                                                                heights = heights,
                                                                              );
Error in arrangeGrob(grobs = density_grobs, ncol = 1, heights = heights,  : 
  object 'heights' not found
                                                                              
                                                                              # Save plot to file in the requested format
                                                                              if ("pdf" == "tiff"){
                                                                                # TIFF images will be lzw-compressed
                                                                                ggsave("/orange/swanson/carter.h/ChP_Splicing_Project/Mbnl_12KO_ChP/rMATS_output_1KO_ChP_Triplicate/testing_SE_sashimi_plots/Arhgap12_chr18_6064340-6070091_meanj_-.pdf", plot = argrobs, device = "tiff", width = width, height = height, units = "in", dpi = 300, compression = "lzw", limitsize = FALSE)
                                                                              } else {
                                                                                ggsave("/orange/swanson/carter.h/ChP_Splicing_Project/Mbnl_12KO_ChP/rMATS_output_1KO_ChP_Triplicate/testing_SE_sashimi_plots/Arhgap12_chr18_6064340-6070091_meanj_-.pdf", plot = argrobs, device = "pdf", width = width, height = height, units = "in", dpi = 300, limitsize = FALSE)
                                                                              }
Error in grDevices::pdf(file = filename, ..., version = version) : 
  cannot open file '/orange/swanson/carter.h/ChP_Splicing_Project/Mbnl_12KO_ChP/rMATS_output_1KO_ChP_Triplicate/testing_SE_sashimi_plots/Arhgap12_chr18_6064340-6070091_meanj_-.pdf'
                                                                              
                                                                              dev.log = dev.off()

File "sashimi-plot.py", line 214, in intersect_introns a, b = next(it) StopIteration

Hi,

When I run the sashimi-plot.py it comes out this error. Could you help me solve this problem? Thank you!

(python3) [myang9@sbcsmdplp001 ReadsCoverage]$ sashimi-plot.py -s SENSE --shrink -F png -C 2 -g Homo_sapiens.GRCh38.94.gtf -b BamR6103640ACSE98545.bam -c chr1:196788215-196790227 -o R6103640-AC.SE_98545
Traceback (most recent call last):
File "sashimi-plot.py", line 214, in intersect_introns
a, b = next(it)
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "sashimi-plot.py", line 644, in
intersected_introns = list(intersect_introns(introns))
RuntimeError: generator raised StopIteration

What's the difference between the flag -S OUT_STRAND and -s STRAND?

Hi,

I am using the ggsashimi to generate sashimi plots for my results. The alternative splicing events I am interested in are located in either + or - strand. I checked the help info of ggsashimi, and the -S OUT_STRAND and -s STRAND are the two flags I can set to focus on only one strand.

But I am not sure how to correctly set the two flags. For my understanding, if the alternative splicing event I am interested in is located in - strand, -s was set to ANTISENSE. How about -S? Should I set it to minus which is compatible to -s flag? What does the out strand mean?

thanks,

Shan

Error: unexpected '=' in " color_list = list(input_bam="

Hello all,

I'm testing ggshashimi with a single bam input but keep getting the error bellow:

Error: unexpected '=' in " color_list = list(bam_iput="
Execution halted

I'm running:
Python 2.7.14
R version 3.4.1
ggplot2 v2.2.1
gridextra v2.3
data.table v1.11.4

Any help will be much appreciated and thank you for generating ggsashimi!
Best Gil

Docker build error: package devtools is not available

Hi! I was trying to run it via docker and started building the image:
docker build -f docker/Dockerfile -t guigolab/ggsashimi .

However it complains about devtools:

Step 6/14 : RUN R -e 'install.packages("devtools");' &&     R -e "require(devtools); install_version('ggplot2', version = '${GGPLOT_VER}', repos = 'http://cloud.r-project.org');"
 ---> Running in fcc8a5f682b6

R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> install.packages("devtools");
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Warning: unable to access index for repository https://mran.microsoft.com/snapshot/2017-04-21/src/contrib:
  Line starting '<!doctype html><html ...' is malformed!
Warning message:
package ‘devtools’ is not available (for R version 3.3.3)
>
>

R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> require(devtools); install_version('ggplot2', version = '2.2.1', repos = 'http://cloud.r-project.org');
Loading required package: devtools
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called ‘devtools’
Error: could not find function "install_version"
Execution halted
The command '/bin/sh -c R -e 'install.packages("devtools");' &&     R -e "require(devtools); install_version('ggplot2', version = '${GGPLOT_VER}', repos = 'http://cloud.r-project.org');"' returned a non-zero code: 1

thoughts?

Check coordinate 0/1-indexing

Make sure this is consistent along the code

Unexpected input in " color_list = list(11_"

I am encountering this particular error, am I running it incorrectly, or is there a specific input I have to format some way?

./sashimi-plot.py -b 11_sf3_ctrl.mm10.sorted.bam -c chr2:119,046,711-119,057,857 -M 10 -g gencode.vM18.annotation.nogene.gtf
Error: unexpected input in " color_list = list(11_"
Execution halted

Thanks in advance for the help!

-S plus vs -S minus shows the same reads

Hi,

I've been using ggsashimi for a while now and it's a wonderful package!

I needed some clarification on -s and -S arguments. I went through the options as to what the script says the difference between the two are and also went through a similar question someone asked about this. As far as I understand -s SENSE would be if you want to visualize the reads for the sense strand and -s ANTISENSE would be if you want to visualize the reads for the antisense strand with respect to RNA-Seq. Meanwhile, I would want to visualize the reads of only for the forward strand (+ strand) in one of the junctions I'd like to view and would not want the reverse strand (- strand) reads. I entered the junction coordinates on IGV, viewed the sashimi plot for the forward and reverse strand and for each of the strands there are different numbers. I expected the same output using ggsashimi and used the -S plus once and -S minus another time but I get the same junction reads using both options and there is no difference in the outputs between them.

Here is my script execution incase I'm not making any mistakes and if my understanding is right:
~/tools/ggsashimi/sashimi-plot.py -b ../input_bam.tsv -c 12:6448738-6451363 -g annotation_CD27.gtf --shrink --alpha 0.6 --base-size=20 --ann-height=4 --height=3 --width=18 -P ../palette.txt -C 1 -S plus -o sashimi_CD27_onlyJunc_plus

If needed I can attach a screenshot of what I see for plus and minus strands on IGV and what I see for ggsashimi if I'm not doing something wrong.

Thanks,
Ahish

Error if BAM filename starts with number

Hi,
I installed ggsashimi and both examples in example_run.sh worked fine. But when I tried with my BAM file, named 2dpi.bam, I got the following error:

./sashimi-plot.py -b 2dpi.bam -c tig00000003:33700-34300

Error: unexpected symbol in "        color_list = list(2dpi"
Execution halted

I started to mess around with different file names and when I changed 2dpi.bam to dpi2.bam it worked. It looks like if the BAM file name starts with a number ggsashimi gives the error above. Maybe this is an issue you would like to address in future versions of ggsashimi.

Package versions:
Python v3.7.3
R v3.5.1
ggplot2 v3.2.1
data.table v1.12.2
gridExtra v2.3

list of bams

no ggplot2 in docker

hello team,

This is a great tool.

I am not sure why when I pull you docker image i see this error

Using default tag: latest
latest: Pulling from guigolab/ggsashimi
Digest: sha256:eab7ae5fa4ae2aadb8fc20e507bda59acb9bf78c12013e8384be2302fbb57f3c
Status: Image is up to date for guigolab/ggsashimi:latest
Error in library(ggplot2) : there is no package called 'ggplot2'
Execution halted

Any help?

Thanks.

Sid

Improve split logic in gtf tags

This line fails when some tags are like: tag1 "gene function; ABC1". This happens in Ensembl gtfs which have GO terms as tags

ggsashimi/sashimi-plot.py

Line 288 in 1813aec

d = dict(kv.strip().split(" ") for kv in tags.strip(";").split("; "))

something missing in graphic

--Hi,
i generate the graphic with this command:
./sashimi-plot.py -b input_bams.tsv -c 11:5617339-5625858 -g exons.gtf -M 10 -C 3 -O 3 -A median -F tiff -R 350 --base-size=20 --ann-height=4 --height=3 --width=18 -P palette.txt

but the arrows between exons (below the sashimi plots) are not drawn.

an idea ?
thank you --

Color and label settings

Trying to set the colors.
I reused the palette file in the example.
How does it know which colors should be associated to which sample ?
My png plot is always gray. I m trying to use it on 5 samples without overlapping.
Also can you give an example of how labels file should be ? I didn't also succeed to make it works.

coordinate flag do not match with coordinate on x-axis of generated graph if using --shrink flag

Hi,

I noticed the coordinate on x-axis of the generated graph would be horizontally truncated if --shrink flag was used. It's the same in your examples.

the command to generate sashimi.pdf:
../sashimi-plot.py -b input_bams.tsv -c chr10:27040584-27048100 -g annotation.gtf -M 10 -C 3 -O 3 --shrink --alpha 0.25 --base-size=20 --ann-height=4 --height=3 --width=18 -P palette.txt

But the x-axis in the graph end at around 27041500 which is way less than the 27048100 in the -c flag.

sashimi.pdf

Add support for gzipped gtf files

We could add support for gtf.gz files since this is what is mostly downloaded from the main sources like Ensembl and Gencode.

Exons not displayed properly

Hi,
in some cases R raise a warning (Removed 1 rows containing missing values (geom_segment)) and some exons are not displayed.
I attached an example.

I run the following command

sashimi-plot -b totBs_sashimi.annot.tsv -c '1:109113929-109206781' -g TCONS_00009768_andNear.gtf -M 10 -C 3 -O 3
--shrink --alpha 0.25 --base-size=20 --ann-height=4 --height=3 --width=18 -P palette.txt

Thanks

example.zip

Shrinkage of introns leads to wrong coordinates in x-axis

When using the option --shrink it may happen that coordinates shown do not match exactly the actual ones in the genome

rpkm/tpm rather than median/mean reads

am plotting two samples on using the same settings on the the example_run.sh, except the gtf file (which is mine). However, while the y axis scale on the first automatically readjusts to median read count, panel 2 is stuck to one. Is there a way to make sure the scales on both panels readjust automatically.
Also is it possible to visual rfkm/tpm rather than median/mean reads?

AttributeError: 'NoneType' object has no attribute 'replace'

used this command - ./sashimi-plot.py -b wt_chr2.bam -g Mus_musculus.GRCm38.99.gtf -o wt_1_

Traceback (most recent call last):
File "./sashimi-plot.py", line 578, in
a, junctions = read_bam(bam, args.coordinates, args.strand)
File "./sashimi-plot.py", line 125, in read_bam
_, start, end = parse_coordinates(c)
File "./sashimi-plot.py", line 67, in parse_coordinates
c = c.replace(",", "")
AttributeError: 'NoneType' object has no attribute 'replace'

guigolab / ggsashimi Goto Github PK

ggsashimi's People

Contributors

Stargazers

Watchers

Forkers

ggsashimi's Issues

Received by email

Recommend Projects

Recommend Topics

Recommend Org