conesalab / sqanti3

Tool for the Quality Control of Long-Read Defined Transcriptomes

License: GNU General Public License v3.0


sqanti3's Introduction

SQANTI3

SQANTI3 is the newest version of the SQANTI tool that merges features from SQANTI and SQANTI2, together with new additions. SQANTI3 will continue as an integrated development aiming to provide the best characterization for your new long read-defined transcriptome.

SQANTI3 is the first module of the Functional IsoTranscriptomics (FIT) framework, which also includes IsoAnnot and tappAS.

Installation

The latest SQANTI3 release (31/07/2024) is version 5.2.2. See our wiki for installation instructions.

For information about previous releases and the features introduced in them, see the version history.

WARNING: v5.0 represented a major release of the SQANTI3 software. Versions of SQANTI3 >= 5.0 are not backward compatible with previous releases or their output (v4.3 and earlier). Users who wish to apply any of the new functionalities in v5.0 to output files from older versions will therefore need to re-run SQANTI3 QC. See below for a full list of changes implemented in SQANTI3 v5.0.

Documentation

For detailed documentation, please visit the SQANTI3 wiki.

Please note that we are currently updating and expanding the wiki to provide as much information as possible and enhance the SQANTI3 user experience. Pages that are under construction, or where information is still missing, will be indicated where appropriate. Thank you for your patience!

How to cite SQANTI3

If you are using SQANTI3 in your research, please cite the following paper in addition to this repository:

Pardo-Palacios, F.J., Arzalluz-Luque, A. et al. SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms. Nat Methods (2024). https://doi.org/10.1038/s41592-024-02229-2

sqanti3's Issues

classification step assertion error cupcake

Trying to run sqanti3_qc and run into an error once reaching the classification step:

**** Performing Classification of Isoforms....
Number of classified isoforms: 73118
Traceback (most recent call last):
File "/mnt/xomics/renees/tools/SQANTI3/sqanti3_qc.py", line 2289, in <module>
main()
File "/mnt/xomics/renees/tools/SQANTI3/sqanti3_qc.py", line 2273, in main
run(args)
File "/mnt/xomics/renees/tools/SQANTI3/sqanti3_qc.py", line 1724, in run
write_collapsed_GFF_with_CDS(isoforms_info, corrGTF, corrGTF+'.cds.gff')
File "/mnt/xomics/renees/tools/SQANTI3/sqanti3_qc.py", line 425, in write_collapsed_GFF_with_CDS
for r in reader:
File "/home/renees/.local/lib/python3.7/site-packages/cupcake-9.0.3-py3.7-linux-x86_64.egg/cupcake/io/GFF.py", line 405, in next
return self.read()
File "/home/renees/.local/lib/python3.7/site-packages/cupcake-9.0.3-py3.7-linux-x86_64.egg/cupcake/io/GFF.py", line 562, in read
assert raw[2] == 'transcript'
AssertionError

Any idea what could be going wrong here? For the --gff3 parameter, I passed the tappAS-provided human GFF3 file, as I normally do.
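
The assertion `raw[2] == 'transcript'` suggests that the cupcake GFF reader expects the line it is reading to carry `transcript` in the third column. A minimal Python sketch for inspecting a GTF against that expectation is shown below. It assumes tab-separated columns; the function name `summarize_features` and the example records are made up for illustration and are not part of SQANTI3 or cupcake. Which feature types actually trip the assertion is an assumption here, so treat the output as a starting point for inspection only.

```python
from collections import Counter

def summarize_features(lines):
    """Count column-3 feature types and list records that are neither
    'transcript' nor 'exon' (hypothetical helper, not SQANTI3 code)."""
    counts = Counter()
    unexpected = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            unexpected.append((lineno, "<malformed>"))
            continue
        feature = fields[2]
        counts[feature] += 1
        if feature not in ("transcript", "exon"):
            unexpected.append((lineno, feature))
    return counts, unexpected

# Made-up records for illustration only.
example = [
    'chr1\tPacBio\ttranscript\t100\t500\t.\t+\t.\tgene_id "PB.1"; transcript_id "PB.1.1";\n',
    'chr1\tPacBio\texon\t100\t200\t.\t+\t.\tgene_id "PB.1"; transcript_id "PB.1.1";\n',
    'chr1\tPacBio\tCDS\t120\t180\t.\t+\t0\tgene_id "PB.1"; transcript_id "PB.1.1";\n',
]
counts, unexpected = summarize_features(example)
print(counts)
print(unexpected)
```

To check a real file, pass an open file handle instead of the example list, for instance `summarize_features(open(path))`.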

python error in sqanti3_qc.py

When I run 'python sqanti3_qc.py -h', I get the following error:

Traceback (most recent call last):
File "sqanti3_qc.py", line 123, in <module>
if os.system(RSCRIPTPATH + " --version")!=0:
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Do I need to check my installation? I am not allowed to use the .yml files, so I installed all the dependencies manually and may have forgotten something.
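
The TypeError says that `RSCRIPTPATH` was `None` when it was concatenated with a string. One plausible reading, which is an assumption because the line that sets `RSCRIPTPATH` is not shown in this report, is that a lookup of the Rscript executable on PATH found nothing. A minimal Python sketch of such a lookup with an explicit check follows; the helper names `locate` and `check_rscript` are hypothetical.

```python
import shutil

def locate(executable):
    """Return the full path of `executable`, or None if it is not on PATH."""
    return shutil.which(executable)

def check_rscript(executable="Rscript"):
    """Report whether the executable can be found (hypothetical helper)."""
    path = locate(executable)
    if path is None:
        return (f"{executable} not found on PATH; install R or activate "
                "the environment that provides it")
    return f"{executable} found at {path}"

print(check_rscript())
```

If this reports that Rscript is missing, installing R (or activating the environment that contains it) before running sqanti3_qc.py would be the first thing to try.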

document skip_report option

I've skimmed through SQANTI code to see if it was possible to skip the PDF report generation altogether (i.e., produce only text files). This is very useful because in my experience this is by far the most memory- and CPU-hungry part of the run and not everybody's interested in SQANTI's PDF reports. I found out that there actually is such an option (--skip_report), except it's hidden in the code and not documented in SQANTI's help.

AttributeError: 'NoneType' object has no attribute 'group'

Hi,
Thanks for a very useful tool. When I use sqanti3_qc.py, I get "AttributeError: 'NoneType' object has no attribute 'group'", but I don't know what is wrong.
This is my command
"python /data5/home/zhanglab/wanghaoyu/software/SQANTI3/sqanti3_qc.py /
--gtf ERR1597963_correctedReads_collapse.fasta.collapsed.gff /
IWGSC_v1.1_HC_20170706.split.gtf /
161010_Chinese_Spring_v1.0_pseudomolecules_parts.fasta --is_fusion"
And the file "ERR1597963_correctedReads_collapse.fasta.collapsed.gff" is the output of collapse_isoforms_by_sam.py from cupcake.
Thank you !!!!!

I obtain the following error:

Hi, when I run the sqanti3_qc.py program, I get the following error: ModuleNotFoundError: No module named 'bx.intervals'

But when I run "pip install Interval", I get: Requirement already satisfied: Interval in home/zqq/miniconda3/envs/SQANTI3.env/lib/python3.7/site-packages (1.0.0)
And when I run "pip list | grep Intercal", nothing appears.
I do not know why.
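
For context, the `bx.intervals` module is provided by the bx-python package rather than by a package named Interval, and the grep pattern as typed (Intercal) would not match Interval in any case. A small Python sketch for checking whether a module can be imported in the active environment is given below; `module_available` is a hypothetical helper name.

```python
import importlib

def module_available(name):
    """Return True if `name` can be imported in the current environment."""
    try:
        importlib.import_module(name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return False
    return True

for name in ("bx", "bx.intervals"):
    print(name, "available" if module_available(name) else "missing")
```

Running this inside the activated SQANTI3.env shows whether the import that sqanti3_qc.py needs would succeed there.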

Problem assert strand == '+' or strand == '-'

Good afternoon,
I have been trying to run SQANTI3 using the GTF of PacBio transcripts merged with additional transcripts identified by StringTie, but it is not working.

This is the code
singularity exec -B /group/calegari_beatrizepi/combinedgtf /projects/globalscratch/sqanti3.sif sqanti3_qc.py stringtie.all.noG20cov.mergedrename.gtf Mus_musculus.GRCm38.101.gtf Mus_musculus.GRCm38.dna.primary_assembly.fa --cage_peak mm10_fair_new_CAGE_peaks_phase1and2.bed --polyA_motif_list PolyA_motif_List_mm10.txt --gtf -fl hq.rn2.collapse_capped_trans_read.mapped_fl_count.txt -o PP_DP_N_merged -c DP_L356_twopassSJ.out.tab,DP_L223_twopassSJ.out.tab,DP_L394_twopassSJ.out.tab,N_L357_twopassSJ.out.tab,N_L224_twopassSJ.out.tab,N_L395_twopassSJ.out.tab,PP_L355_twopassSJ.out.tab,PP_L222_twopassSJ.out.tab,PP_L393_twopassSJ.out.tab --skip_report

This is the head of the stringtie gtf file
1 StringTie transcript 3214482 3671498 . - . gene_id MSTRG.1; transcript_id MSTRG.1.1;
1 StringTie exon 3214482 3216968 . - . gene_id MSTRG.1; transcript_id MSTRG.1.1; exon_number 1;
1 StringTie exon 3421702 3421901 . - . gene_id MSTRG.1; transcript_id MSTRG.1.1; exon_number 2;
1 StringTie exon 3670552 3671498 . - . gene_id MSTRG.1; transcript_id MSTRG.1.1; exon_number 3;

This is the message.
**** Performing Classification of Isoforms....
Number of classified isoforms: 49533
Traceback (most recent call last):
File "/SQANTI3/sqanti3_qc.py", line 2267, in <module>
main()
File "/SQANTI3/sqanti3_qc.py", line 2260, in main
run(args)
File "/SQANTI3/sqanti3_qc.py", line 1713, in run
write_collapsed_GFF_with_CDS(isoforms_info, corrGTF, corrGTF+'.cds.gff')
File "/SQANTI3/sqanti3_qc.py", line 424, in write_collapsed_GFF_with_CDS
for r in reader:
File "/cDNA_Cupcake/cupcake/io/GFF.py", line 408, in next
return self.read()
File "/cDNA_Cupcake/cupcake/io/GFF.py", line 592, in read
rec = gmapRecord(chr, coverage=None, identity=None, strand=strand, seqid=seqid, geneid=geneid)
File "/cDNA_Cupcake/cupcake/io/GFF.py", line 324, in __init__
assert strand == '+' or strand == '-'
AssertionError
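
The failing assertion requires the strand to be `+` or `-`. The GTF format also allows `.` for an unknown strand, and a merged StringTie file may contain such records, although the excerpt above does not show one. A minimal Python sketch for listing records whose strand column is neither `+` nor `-` follows. It assumes tab-separated columns; `records_without_strand` and the example lines are hypothetical.

```python
def records_without_strand(lines):
    """Return (line number, feature, strand) for records whose strand
    column is neither '+' nor '-' (hypothetical helper)."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 7 and fields[6] not in ("+", "-"):
            bad.append((lineno, fields[2], fields[6]))
    return bad

# Made-up records for illustration only.
example = [
    '1\tStringTie\ttranscript\t100\t900\t.\t-\t.\tgene_id "G1"; transcript_id "G1.1";\n',
    '1\tStringTie\ttranscript\t2000\t2500\t.\t.\t.\tgene_id "G2"; transcript_id "G2.1";\n',
    '1\tStringTie\texon\t2000\t2500\t.\t.\t.\tgene_id "G2"; transcript_id "G2.1";\n',
]
print(records_without_strand(example))
```

Whether to drop such records or assign them a strand before running SQANTI3 is a decision for the analysis at hand; the sketch only locates them.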

5' UTR and 3' UTR length

Hi,

How can I summarize the 5' UTR and 3' UTR lengths of transcripts using SQANTI3?

Thanks very much!

tappas annotation gff3 only in 'splits' folder

The tappas_annot_from_sqanti3.gff3 files were present in the intermediary 'splits' file, but once the program is complete, they are nowhere to be found. Could it be that they just got deleted with the splits file after the program completes? Or is there something else going wrong?

DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working

This warning appeared despite following exactly the recommended installation process.
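
This warning is emitted by Python when an abstract base class such as `Iterable` is imported from `collections` instead of `collections.abc`. On Python 3.7 it is only a warning and the import still works. A minimal sketch of the import style that avoids it is shown below; it illustrates the language feature and is not a patch taken from the SQANTI3 repository.

```python
# Container types still live in `collections` ...
from collections import Counter, defaultdict, namedtuple
# ... while abstract base classes are imported from `collections.abc`.
from collections.abc import Iterable

print(isinstance([1, 2, 3], Iterable))  # True
print(isinstance(42, Iterable))         # False
```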

AssertionError

Hello, I encountered this error after installing the latest SQANTI3 and cDNA_Cupcake:

**** Performing Classification of Isoforms....
Number of classified isoforms: 42820
Traceback (most recent call last):
File "/SQANTI3/sqanti3_qc.py", line 2289, in <module>
main()
File "/SQANTI3/sqanti3_qc.py", line 2273, in main
run(args)
File "/SQANTI3/sqanti3_qc.py", line 1724, in run
write_collapsed_GFF_with_CDS(isoforms_info, corrGTF, corrGTF+'.cds.gff')
File "/SQANTI3/sqanti3_qc.py", line 425, in write_collapsed_GFF_with_CDS
for r in reader:
File "/cDNA_Cupcake/cupcake/io/GFF.py", line 408, in next
return self.read()
File "/cDNA_Cupcake/cupcake/io/GFF.py", line 579, in read
assert raw[2] == 'transcript'
AssertionError

Could you please tag your project.

To promote reproducible science, could you please use git tags. Creating a tag also creates a release for your project. We require tagged releases when building scientific software. Pulling from the master is not reproducible.

I would also recommend using at least a three-part semantic versioning scheme: Major.Minor.Patch. Please do not
follow the git examples by putting a "v" as the leading character. GitHub will create a "release" when the tag is pushed.

thanks for making your software available

git tag 1.0.0
git push origin 1.0.0

Annotation.file

Dear all,
The annotation file must be in GTF format, but is it also possible to use a GFF3 file, or do I have to convert it?

Thank you in advance
Master student Biology

No GeneIDs in GTFs

I am analyzing scRNA data. I need a gtf file with annotated geneIDs.
I used sqanti3_qc.py with the isoAnnotLite option. This gives me the tappAS_annot_from_SQANTI3.gff3 with gene IDs. After filtering that output with sqanti3_RulesFilter.py, the resulting GTF has no gene IDs, just PB IDs.
It would be great if all the GTFs have geneIDs without need for extra annotation.

various errors when running sqanti3_qc

Hello!
long-time long-read lover but first-time SQANTI user here... thanks for developing this tool and making it available to the community!
I'm running SQANTI on a GTF like so:

conda activate SQANTI3.env
export PYTHONPATH=$PYTHONPATH:$HOME/bin/SQANTI3/cDNA_Cupcake/sequence/
export PYTHONPATH=$PYTHONPATH:$HOME/bin/SQANTI3/cDNA_Cupcake/

python ~/bin/SQANTI3/sqanti3_qc.py --gtf --skipORF -d tmp tmp/isoforms.gff annotation.gff hg38.fa &> err.txt

err.txt (abridged/edited for clarity)

SQANTI crashes while generating the report because of some R package install errors.

I subsequently installed the missing packages with conda/mamba (note that all commands following are issued under SQANTI3.env) :

mamba install -c bioconda bioconductor-noiseq

Relaunched SQANTI:

python ~/bin/SQANTI3/sqanti3_qc.py --gtf --skipORF -d tmp tmp/isoforms.gff annotation.gff hg38.fa &> err2.txt

err2.txt

Re-installed ggplotify as requested:

mamba install -c conda-forge r-ggplotify

Relaunched SQANTI:

python ~/bin/SQANTI3/sqanti3_qc.py --gtf --skipORF -d tmp tmp/isoforms.gff annotation.gff hg38.fa &> err3.txt

err3.txt

I think this is something I won't be able to fix myself this time. Any help would be greatly appreciated!
Greetings,
Julien

Issues in final report generation

I got an error when running SQANTI3 with some of my samples at the final step of summary report generation.
I believe that is because I used data with collapsed isoforms of targeted genes. There is no ">=6" or even "4-5" nIso in my data.
Could you please fix the related code in SQANTI3_report.R or add the option to skip summary report generation?
Thanks!

Error in cut.default(isoPerGene$nIso, breaks = c(0, 1, 3, 5, max(isoPerGene$nIso) + :
'breaks' are not unique
Calls: cut -> cut.default
Execution halted

subprocess.CalledProcessError: Command '/home/daij/miniconda2/envs/SQANTI3.env/bin/Rscript /gpfs/gsfs10/users/DCEG_RTEL/references/SQANTI3/utilities//SQANTI3_report.R /gpfs/gsfs10/users/DCEG_RTEL/DJQ/sqanti3/test_isoseq3_sqanti3/sqanti_r2/lima_output.lbc73--lbc73.collapsed.filtered.rep_classification.txt /gpfs/gsfs10/users/DCEG_RTEL/DJQ/sqanti3/test_isoseq3_sqanti3/sqanti_r2/lima_output.lbc73--lbc73.collapsed.filtered.rep_junctions.txt /gpfs/gsfs10/users/DCEG_RTEL/DJQ/sqanti3/test_isoseq3_sqanti3/sqanti_r2/lima_output.lbc73--lbc73.collapsed.filtered.rep.params.txt /gpfs/gsfs10/users/DCEG_RTEL/references/SQANTI3/utilities/' returned non-zero exit status 1.

invalid gffGroup

Hi team,
I ran into this error while running SQANTI3 on my FLAIR output gtf file. I'm thinking of contacting the FLAIR authors too but could you please clarify this?

Thank you!

KeyError in ORF prediction

I'm getting a KeyError during the classification step:

**** Performing Classification of Isoforms....
Traceback (most recent call last):
  File "/home/BIOTECH/zamanian/install/SQANTI3/sqanti3_qc.py", line 2268, in <module>
    main()
  File "/home/BIOTECH/zamanian/install/SQANTI3/sqanti3_qc.py", line 2261, in main
    run(args)
  File "/home/BIOTECH/zamanian/install/SQANTI3/sqanti3_qc.py", line 1710, in run
    isoforms_info = isoformClassification(args, isoforms_by_chr, refs_1exon_by_chr, refs_exons_by_chr, junctions_by_chr, junctions_by_gene, start_ends_by_gene, genome_dict, indelsJunc, orfDict)
  File "/home/BIOTECH/zamanian/install/SQANTI3/sqanti3_qc.py", line 1553, in isoformClassification
    orfDict[rec.id].cds_genomic_end   = m[orfDict[rec.id].cds_end-1] + 1    # make it 1-based
KeyError: 1034

Using the --skipORF option allows the classification step to complete.

Errors when running sqanti3_qc.py and sqanti3_RulesFilter.py

Hello,

I would like to ask you for help by pointing me out to where the errors I am experiencing might come from. To start with, let me assure you I followed the installation instruction. I also have cDNA_Cupcake available and added it to the PYTHONPATH by running the following two commands after activating the created environment:
export PYTHONPATH=$PYTHONPATH:/home/bin/cDNA_Cupcake/sequence/
export PYTHONPATH=$PYTHONPATH:/home/bin/cDNA_Cupcake/

However, when trying to run python sqanti3_qc.py -h, the following error pops up:

Traceback (most recent call last):
  File "/home/eva/SQANTI3/sqanti3_qc.py", line 51, in <module>
    from err_correct_w_genome import err_correct
  File "/home/eva/cDNA_Cupcake/sequence/err_correct_w_genome.py", line 9, in <module>
    import BioReaders
  File "/home/eva/cDNA_Cupcake/sequence/BioReaders.py", line 419
    print "qStart:", self.qStart
                  ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("qStart:", self.qStart)?

Similarly, python sqanti3_RulesFilter.py returns:

Traceback (most recent call last):
  File "sqanti3_RulesFilter.py", line 17, in <module>
    from cupcake.io.BioReaders import GMAPSAMReader
  File "/home/eva/cDNA_Cupcake/cupcake/io/BioReaders.py", line 311
    raise Exception, "Unrecognized cigar character {0}!".format(type)
                   ^
SyntaxError: invalid syntax

I am wondering whether this has something to do with my python version, which should be Python 3.7.6.

Did something go wrong while preparing the environment or installing any of the dependencies? How should I solve this problem?

Thank you very much.
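
Both tracebacks point at Python 2 constructs (a `print` statement and `raise Exception, "..."`), which suggests that this cDNA_Cupcake checkout contains Python 2 code even though the interpreter is Python 3. A minimal Python sketch for testing whether a piece of source parses under the running interpreter is shown below; `python3_syntax_error` is a hypothetical helper, and the snippets passed to it are shortened versions of the lines in the tracebacks.

```python
def python3_syntax_error(source, filename="<string>"):
    """Return a short description of the first syntax error found when
    compiling `source` with the running interpreter, or None if it parses."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return f"{filename}:{err.lineno}: {err.msg}"
    return None

# Python 2 style statements are reported as syntax errors by Python 3.
print(python3_syntax_error('print "qStart:", 1\n', "BioReaders.py"))
print(python3_syntax_error('raise Exception, "bad cigar"\n', "BioReaders.py"))
# The Python 3 form parses cleanly, so the helper returns None.
print(python3_syntax_error('print("qStart:", 1)\n', "BioReaders.py"))
```

To scan a real file, read it and pass its contents, for example `python3_syntax_error(open(path).read(), path)`. If the files in the checkout fail this test, switching to a Python 3 compatible version of cDNA_Cupcake is the likely remedy.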

AssertionError when run SQANTI3 with example input files

I found some discussion about my issue and I will try and see whether I can fix it by myself. Thank you

Sam files missing header if genome too large

Hi there,

I was getting the following error when running sqanti3_qc.py with the hg38 genome FASTA (supplying isoforms as FASTQ, not GTF):

And I noticed that minimap2 was giving the following warning earlier in the output:

Sure enough, the all_samples.chained.rep_corrected.sam file in the output dir won't open with samtools because it's missing the header.

Luckily, I can switch to just using chr12, which fixed this for me, but other users may run into trouble with this if they're supplying large genomes. Perhaps an option to supply your own sam file of isoforms aligned to the genome would help with this?
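
A quick way to confirm a missing header is to count the lines that start with `@` at the top of the SAM file, since header lines in SAM begin with that character and `@SQ` lines describe the reference sequences. A minimal Python sketch follows; `sam_header_summary` and the example records are hypothetical.

```python
def sam_header_summary(lines):
    """Return (number of header lines, number of @SQ lines) found at the
    top of a SAM file (hypothetical helper)."""
    n_header = 0
    n_sq = 0
    for line in lines:
        if not line.startswith("@"):
            break  # header lines come before the first alignment record
        n_header += 1
        if line.startswith("@SQ"):
            n_sq += 1
    return n_header, n_sq

# Made-up SAM content for illustration only.
with_header = [
    "@HD\tVN:1.6\tSO:unsorted\n",
    "@SQ\tSN:chr12\tLN:1000\n",
    "read1\t0\tchr12\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\n",
]
without_header = with_header[2:]
print(sam_header_summary(with_header))     # (2, 1)
print(sam_header_summary(without_header))  # (0, 0)
```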

SQANTI3_report.R crashes on pre-filtered GTF?

Hello,

When running the latest SQANTI3 on my pre-filtered transcriptome:

python ~/bin/SQANTI3/sqanti3_qc.py --gtf --skipORF -d tmp tmp/isoforms.gff annotation.gff hg38.fa &> err3.txt

the script crashes when trying to generate the report:

[...]
`summarise()` regrouping output by 'lenCat' (override with `.groups` argument)
`summarise()` regrouping output by 'structural_category' (override with `.groups` argument)
`summarise()` ungrouping output (override with `.groups` argument)
`summarise()` regrouping output by 'structural_category' (override with `.groups` argument)
Error in `$<-.data.frame`(`*tmp*`, Var, value = "Non canonical") : 
  replacement has 1 row, data has 0
Calls: $<- -> $<-.data.frame
Execution halted

(see full log)

Note that this data is heavily pre-filtered and contains only canonical junctions. When run on unfiltered data (i.e. with non-canonical junctions as well) SQANTI does not crash. So I suspect SQANTI3_report.R crashes when the input contains only canonical junctions.

Thanks
Julien

error orf , terminate called after throwing an instance of 'std::length_error'

Good afternoon,
I tried to run SQANTI3 with a GTF file generated by merging StringTie and PacBio GTFs. With the PacBio GTF alone, it works perfectly.

The command
singularity exec -B /group/calegari_beatrizepi/combinedgtf /projects/globalscratch/sqanti3.sif sqanti3_qc.py merged.a25.all.others.10.d.nofus.merge.annotated.filteredrename.gtf Mus_musculus.GRCm38.101.gtf Mus_musculus.GRCm38.dna.primary_assembly.fa --cage_peak mm10_fair_new_CAGE_peaks_phase1and2.bed --polyA_motif_list PolyA_motif_List_mm10.txt --gtf -o ALL -c DP_L356_twopassSJ.out.tab,DP_L223_twopassSJ.out.tab,DP_L394_twopassSJ.out.tab,N_L357_twopassSJ.out.tab,N_L224_twopassSJ.out.tab,N_L395_twopassSJ.out.tab,PP_L355_twopassSJ.out.tab,PP_L222_twopassSJ.out.tab,PP_L393_twopassSJ.out.tab --isoAnnotLite --gff3 Mus_musculus_GRCm38_Ensembl_86.gff3

head of the gtf file (I renamed to have PB id)
1 PacBio transcript 4807829 4846734 . + . gene_id "PB.3"; transcript_id "PB.3.1";
1 PacBio exon 4807829 4807982 . + . gene_id "PB.3"; transcript_id "PB.3.1";
1 PacBio exon 4808455 4808486 . + . gene_id "PB.3"; transcript_id "PB.3.1";
1 PacBio exon 4828584 4828649 . + . gene_id "PB.3"; transcript_id "PB.3.1";
1 PacBio exon 4830268 4830315 . + . gene_id "PB.3"; transcript_id "PB.3.1";
1 PacBio exon 4832311 4832381 . + . gene_id "PB.3"; transcript_id "PB.3.1";

The error
**** Parsing provided files....
Reading genome fasta /group/calegari_beatrizepi/combinedgtf/Mus_musculus.GRCm38.dna.primary_assembly.fa....
Error corrected FASTA /group/calegari_beatrizepi/combinedgtf/ALL_corrected.fasta already exists. Using it...
**** Predicting ORF sequences...
terminate called after throwing an instance of 'std::length_error'
what(): basic_string::_S_create
GeneMarkS: error on last system call, error code 134
Abort program!!!
Traceback (most recent call last):
File "/SQANTI3/sqanti3_qc.py", line 2267, in <module>
main()
File "/SQANTI3/sqanti3_qc.py", line 2260, in main
run(args)
File "/SQANTI3/sqanti3_qc.py", line 1691, in run
orfDict = correctionPlusORFpred(args, genome_dict)
File "/SQANTI3/sqanti3_qc.py", line 593, in correctionPlusORFpred
if subprocess.check_call(cmd, shell=True, cwd=gmst_dir)!=0:
File "/miniconda3/envs/SQANTI3.env/lib/python3.7/subprocess.py", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'perl /SQANTI3/utilities/gmst/gmst.pl -faa --strand direct --fnn --output /group/calegari_beatrizepi/combinedgtf/GMST/GMST_tmp /group/calegari_beatrizepi/combinedgtf/ALL_corrected.fasta' returned non-zero exit status 1.

Any idea of what would cause this?
Thank you

Error: "Expected GMST output IDs to be of format:"

Hello,
I am trying to run SQANTI3 'sqanti3_qc.py' on the output of cupcake 'collapse' as an SGE job:

#!/bin/bash
#$ -pe smp 28
#$ -S /bin/bash
#$ -cwd
#$ -M [email protected]
#$ -m aes
#$ -j y

source activate SQANTI3.env

export PYTHONPATH=$PYTHONPATH:/Users/mchiment/cDNA_Cupcake

python ~/SQANTI3/sqanti3_qc.py --gtf ../cupcake/transcripts.collapsed.min_fl_10.gff \
                               ../GRCm39/GCF_000001635.27_GRCm39_genomic.gtf \
                               ../GRCm39/GCF_000001635.27_GRCm39_genomic.fna \
                               -o test

Input transcripts from cupcake look like this:

NC_000067.7	PacBio	transcript	4561611	4563831	.	-	.	gene_id "PB.2"; transcript_id "PB.2.3";
NC_000067.7	PacBio	exon	4561611	4562891	.	-	.	gene_id "PB.2"; transcript_id "PB.2.3";
NC_000067.7	PacBio	exon	4563323	4563831	.	-	.	gene_id "PB.2"; transcript_id "PB.2.3";
NC_000067.7	PacBio	transcript	4846602	4855921	.	-	.	gene_id "PB.3"; transcript_id "PB.3.4";
NC_000067.7	PacBio	exon	4846602	4847024	.	-	.	gene_id "PB.3"; transcript_id "PB.3.4";
NC_000067.7	PacBio	exon	4847748	4847871	.	-	.	gene_id "PB.3"; transcript_id "PB.3.4";
NC_000067.7	PacBio	exon	4852791	4852956	.	-	.	gene_id "PB.3"; transcript_id "PB.3.4";
NC_000067.7	PacBio	exon	4854174	4854328	.	-	.	gene_id "PB.3"; transcript_id "PB.3.4";
NC_000067.7	PacBio	exon	4855796	4855921	.	-	.	gene_id "PB.3"; transcript_id "PB.3.4";

The python code errors out with a sys.exit() on this:

Expected GMST output IDs to be of format 

'<pbid> gene_4|GeneMark.hmm|<orf>_aa|<strand>|<cds_start>|<cds_end>’ 

but instead saw: 

PB.2.3 gene=PB.2    gene_1|GeneMark.hmm|419_aa|+|203|1462.   

! Abort!

Which I believe is coming from line ~586 of the SQANTI3 code:


        # Modifying ORF sequences by removing sequence before ATG
        with open(corrORF, "w") as f:
            for r in SeqIO.parse(open(gmst_pre+'.faa'), 'fasta'):
                m = gmst_rex.match(r.description)
                if m is None:
                    print("Expected GMST output IDs to be of format '<pbid> gene_4|GeneMark.hmm|<orf>_aa|<strand>|<cds_start>|<cds_end>' but instead saw: {0}! Abort!".format(r.description), file=sys.stderr)
                    sys.exit(-1)

A head on the GMST_tmp.faa file looks like this:

>PB.2.3 gene=PB.2	gene_1|GeneMark.hmm|419_aa|+|203|1462
MSSPDAGYASDDQSQPRSAQPAVMAGLGPCPWAESLSPLGDVKVKGEVVASSGAPAGTSG
RAKAESRIRRPMNAFMVWAKDERKRLAQQNPDLHNAELSKMLGKSWKALTLAEKRPFVEE
AERLRVQHMQDHPNYKYRPRRRKQVKRMKRVEGGFLHALVEPQAGALGPEGGRVAMDGLG
LPFPEPGYPAGPPLMSPHMGPHYRDCQGLGAPALDGYPLPTPDTSPLDGVEQDPAFFAAP
LPGDCPAAGTYTYAPVSDYAVSVEPPAGPMRVGPDPSGPAMPGILAPPSALHLYYGAMGS
PAASAGRGFHAQPQQPLQPQAPPPPPQQQHPAHGPGQPSPPPEALPCRDGTESNQPTELL
GEVDRTEFEQYLPFVYKPEMGLPYQGHDCGVNLSDSHGAISSVVSDASSAVYYCNYPDI

>PB.3.4 gene=PB.3	gene_2|GeneMark.hmm|302_aa|+|1|909

And there is indeed the "gene=PB.3" field which is not supposed to be there according to the code. I could try using VIM to delete this field from the "GMST_tmp.faa" file and running again? My input GFF file looks exactly like the example file, so it must be a problem with the GMST code base, I think.

Any insight on how to proceed would be appreciated. Thanks!
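
To illustrate the mismatch, the sketch below uses a regular expression that tolerates extra whitespace-separated fields, such as `gene=PB.2`, between the isoform ID and the `gene_N|GeneMark.hmm|...` token. This is a hypothetical pattern written for this example, not the `gmst_rex` defined in sqanti3_qc.py, whose exact form is not shown in the report. The group names are likewise made up.

```python
import re

# Hypothetical pattern, NOT SQANTI3's own gmst_rex: it skips any extra
# whitespace-separated fields before the GeneMark token.
TOLERANT_GMST = re.compile(
    r"(?P<pbid>\S+)\s+(?:\S+\s+)*?"
    r"gene_\d+\|GeneMark\.hmm\|(?P<orf>\d+)_aa\|(?P<strand>[+-])"
    r"\|(?P<cds_start>\d+)\|(?P<cds_end>\d+)"
)

def parse_gmst_description(description):
    """Return a dict of parsed fields, or None if the header does not match."""
    m = TOLERANT_GMST.match(description)
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("orf", "cds_start", "cds_end"):
        fields[key] = int(fields[key])
    return fields

# Header as reported above (with the extra gene= field) ...
print(parse_gmst_description("PB.2.3 gene=PB.2\tgene_1|GeneMark.hmm|419_aa|+|203|1462"))
# ... and a header in the format named in the error message.
print(parse_gmst_description("PB.2.3 gene_1|GeneMark.hmm|419_aa|+|203|1462"))
```

Both calls yield the same fields, which is consistent with the extra `gene=` annotation being the only difference between the observed and expected header formats. Removing that field from the input FASTA headers before ORF prediction may be a less invasive workaround than editing GMST_tmp.faa by hand, though that is untested here.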

FileNotFoundError

Hi, I was running SQANTI3 but it shows me this message: FileNotFoundError: [Errno 2] No such file or directory: '/home/hawkins/Documents/Cesar/RNA/globus/lordec_reports/sqanti3/refAnnotation_sqanti_LID50568_1001.genePred'

Do you know what could be wrong?

Best,

Cesar

NMD_prediction

Hello,
what is the algorithm for the NMD_prediction?
Marcel

BioReaders error

Hi,

I installed it as per instructions. When I run it, I get the following error:

python sqanti3_qc.py -h
sqanti3_qc.py:19: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
  from collections import defaultdict, Counter, namedtuple, Iterable
Traceback (most recent call last):
  File "sqanti3_qc.py", line 50, in <module>
    from err_correct_w_genome import err_correct
  File "/home/nagaraap/Downloads/Softwares/SQANTI2/cDNA_Cupcake/sequence/err_correct_w_genome.py", line 9, in <module>
    import BioReaders
  File "/home/nagaraap/Downloads/Softwares/SQANTI2/cDNA_Cupcake/sequence/BioReaders.py", line 419
    print "qStart:", self.qStart
                  ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("qStart:", self.qStart)?

The Python version is 3.7.6. How do I resolve it?

Thanks.

Any specific requirements for genome annotation

Hi,
I have installed SQANTI3 and successfully ran the example set, so I have a working installation of SQANTI3. However, when I tried my own dataset, it gave me all kinds of errors and empty classification and junction outputs. I am working on a non-model species with a less than perfect genome assembly and annotation. Does SQANTI3 have specific or minimum requirements for a genome annotation or assembly?
Thanks.
Vera

docker image

Not an "issue" per se, but FYI I created a docker image for SQANTI3:

https://hub.docker.com/r/joelnitta/sqanti3

https://github.com/joelnitta/sqanti3-docker

I hope it's useful! The current image is tagged to v1.6.0. I plan to provide updates when new SQANTI3 versions come out... at least as long as I'm using it :)

Thanks for the great software!

filter problem

Good morning,
After completely running sqanti qc I decided to run sqanti3_RulesFilter.py

However I had a problem when running it. (we used a container with up-to-date linux system and all the tools bundled together)

singularity exec -B /group/calegari_beatrizepi/SQANTI3 /projects/globalscratch/sqanti3.sif sqanti3_RulesFilter.py gmap1_cup_PP_sqanti_classification.txt gmap1_cup_PP_sqanti_corrected.fasta gmap1_cup_PP_sqanti_corrected.gtf
/SQANTI3/sqanti3_RulesFilter.py: 1: author: not found
/SQANTI3/sqanti3_RulesFilter.py: 2: version: not found
/SQANTI3/sqanti3_RulesFilter.py: 11:
Lightweight filtering of SQANTI by using .classification.txt output

Only keep Iso-Seq isoforms if:
The isoform is FSM, ISM, or NIC and (does not have intrapriming or has polyA_motif)
The isoform is NNC, does not have intrapriming/or polyA motif, not RT-switching, and all junctions are either all canonical or short-read-supported
The isoform is antisense, intergenic, genic, does not have intrapriming/or polyA motif, not RT-switching, and all junctions are either all canonical or short-read-supported
: not found
/SQANTI3/sqanti3_RulesFilter.py: 13: import: not found
/SQANTI3/sqanti3_RulesFilter.py: 14: import: not found
/SQANTI3/sqanti3_RulesFilter.py: 15: from: not found
/SQANTI3/sqanti3_RulesFilter.py: 16: from: not found
/SQANTI3/sqanti3_RulesFilter.py: 17: from: not found
/SQANTI3/sqanti3_RulesFilter.py: 18: from: not found
/SQANTI3/sqanti3_RulesFilter.py: 20: Syntax error: "(" unexpected
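
The "not found" messages look like a shell trying to execute each line of the Python file as a command. One common cause, offered here as an assumption since the first line of the script is not shown, is a script that is executed directly without a shebang line naming a Python interpreter. A minimal Python sketch for reading the interpreter from a script's first line follows; `interpreter_from_shebang` is a hypothetical helper and the second example line is made up.

```python
def interpreter_from_shebang(first_line):
    """Return the interpreter named on a shebang line, or None if the
    line is not a shebang (hypothetical helper)."""
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].strip() or None

print(interpreter_from_shebang("#!/usr/bin/env python\n"))   # /usr/bin/env python
print(interpreter_from_shebang("__author__ = 'someone'\n"))  # None
```

If the script has no shebang, invoking it through the interpreter explicitly, that is, placing `python` in front of the script path in the `singularity exec` command, avoids relying on one.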

merge multiple samples' classification results

Hi,
Thanks for a very useful tool for isoform characterization.
How can I merge multiple samples' classification results and generate one report PDF file? The most important problem is how to handle the exon boundaries of isoforms.

Best wishes,
ping

Error

Please explain the problem below and how to fix it. Thanks.

Traceback (most recent call last):
File "/home/CAM/vvelasco/Applications/SQANTI3/sqanti3_qc.py", line 2289, in <module>
main()
File "/home/CAM/vvelasco/Applications/SQANTI3/sqanti3_qc.py", line 2273, in main
run(args)
File "/home/CAM/vvelasco/Applications/SQANTI3/sqanti3_qc.py", line 1702, in run
orfDict = correctionPlusORFpred(args, genome_dict)
File "/home/CAM/vvelasco/Applications/SQANTI3/sqanti3_qc.py", line 594, in correctionPlusORFpred
if subprocess.check_call(cmd, shell=True, cwd=gmst_dir)!=0:
File "/home/CAM/vvelasco/Applications/anaconda3/envs/SQANTI3.env/lib/python3.7/subprocess.py", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'perl /home/CAM/vvelasco/Applications/SQANTI3/utilities/gmst/gmst.pl -faa --strand direct --fnn --output /labs/Wegrzyn/CoAdapTree_Douglasfir/Assembly_comparison/04_Sqanti/GMST/GMST_tmp /labs/Wegrzyn/CoAdapTree_Douglasfir/Assembly_comparison/04_Sqanti/all.fake_genome_corrected.fasta' returned non-zero exit status 1.

Errors in generating Sqanti_report.R?

Hi,

Thanks for setting up sqanti3!

I've been running sqanti3_qc.py on my samples. While I am able to reach the "Writing output files" step,
there are two spots in my run that consistently fail:

#1: there seems to be an extra "/" added to the path when the SQANTI_report.R is called
the error generated is below:

**** Writing output files....
**** Generating SQANTI3 report....
Traceback (most recent call last):
File "/conda/SQANTI3/sqanti3_qc.py", line 2289, in <module>
main()
File "/conda/SQANTI3/sqanti3_qc.py", line 2273, in main
run(args)
File "/conda/SQANTI3/sqanti3_qc.py", line 1892, in run
if subprocess.check_call(cmd, shell=True)!=0:
File "/.conda/envs/SQANTI3.env/lib/python3.7/subprocess.py", line 363, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/.conda/envs/SQANTI3.env/bin/Rscript /conda/SQANTI3/utilities//SQANTI3_report.R /test_sqanti3/output/all_samples.chained.rep_classification.txt /test_sqanti3/output/all_samples.chained.rep_junctions.txt /test_sqanti3/output/all_samples.chained.rep.params.txt /conda/SQANTI3/utilities/' returned non-zero exit status 1.

#2: SQANTI_report.R keeps terminating at the LR.rarefaction.R
If I take the output files and run it separately using SQANTI3_report.R, then I encounter another problem with the script where it stops at the LR.rarefaction.R. There doesn't seem to be any problems plotting the other plots (p1, p2 etc.. )

my output -->
summarise() regrouping output by 'lenCat' (override with .groups argument)
summarise() regrouping output by 'structural_category' (override with .groups argument)
summarise() ungrouping output (override with .groups argument)
summarise() regrouping output by 'structural_category' (override with .groups argument)
summarise() regrouping output by 'structural_category' (override with .groups argument)
summarise() regrouping output by 'structural_category' (override with .groups argument)
summarise() regrouping output by 'associated_transcript' (override with .groups argument)
summarise() regrouping output by 'associated_transcript' (override with .groups argument)
Error in base::rep(genes, data[, samples[s]]) : invalid 'times' argument
Calls: LR.rarefaction
Execution halted

I'm new at Isoseq analysis.. so I'm not sure if this is something on my side, or if there is a quick fix I can apply here.
Thank you!

Cheers,
sook

Target ids in kallisto gene expression matrix

Hello,
I would like to know how to get the target IDs in the gene expression matrix in the PB.X.Y format shown in the example.
Thank you,
Ismail

Comparing GeneIDs in the filtered gtf from SQANTI2 vs SQANTI3

Perhaps this has to do with a similar issue mentioned before, apologies for duplication if it is. I recently ran into an issue where after running sqanti3_RulesFilter.py (with a reference gtf) on my data, the value associated with the "gene_id" field is identical to the "transcript_id" field.

Example:
chr21 PacBio transcript 5116342 5128442 . - . gene_id "PB.6133.1"; transcript_id "PB.6133.1";

This is different than the output I get when running SQANTI2 where the value associated with gene_id is an HGNC symbol. I don't think it's a major issue, since the associated reference transcriptome value for the field should be present in the classification file and can be solved by a simple merge command, however I feel it is an issue worth noting/fixing since I ran into some downstream headaches assuming the output of the 2 versions was identical.

Sample output of SQANTI2 for reference:
chr21 PacBio transcript 5116342 5128442 . - . gene_id "CH507-9B2.3"; transcript_id "PB.6132.1";
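
The merge mentioned above could look roughly like the Python sketch below. It assumes the classification file is tab-separated with columns named `isoform` and `associated_gene`, and that the GTF attributes use the `gene_id "..."; transcript_id "...";` form shown in the examples. The helper names and the gene label `GENE_A` are made up for illustration.

```python
import csv
import io
import re

GENE_ID = re.compile(r'gene_id "[^"]*";')
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)";')

def load_gene_map(classification_handle):
    """Map isoform ID -> associated gene (column names are assumptions)."""
    reader = csv.DictReader(classification_handle, delimiter="\t")
    return {row["isoform"]: row["associated_gene"] for row in reader}

def relabel_gene_id(gtf_line, gene_map):
    """Replace gene_id with the mapped gene; leave the line as-is if the
    transcript is not in the map."""
    m = TRANSCRIPT_ID.search(gtf_line)
    if m is None or m.group(1) not in gene_map:
        return gtf_line
    gene = gene_map[m.group(1)]
    return GENE_ID.sub(lambda _: f'gene_id "{gene}";', gtf_line, count=1)

# Made-up classification content; GENE_A is a placeholder label.
classification = io.StringIO("isoform\tassociated_gene\nPB.6133.1\tGENE_A\n")
gene_map = load_gene_map(classification)
line = 'chr21\tPacBio\ttranscript\t5116342\t5128442\t.\t-\t.\tgene_id "PB.6133.1"; transcript_id "PB.6133.1";'
print(relabel_gene_id(line, gene_map))
```

Transcripts that are absent from the map keep their original `gene_id`, so the output GTF has the same number of records as the input.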

error using GFF or converted GTF

Hi
I accidentally opened a ticket with SQANTI2 instead of SQANTI3, so I decided to re-post here (sorry for the duplication!):

I am using sqanti3 with the following parameters :

Version 2.0.0
Input   all_samples.chained.gff
Annotation      OcoaRS1.PZYT01.fa.all.maker.functional_ipr.gff
Genome  OcoaRS1.PZYT01.fa.masked
Aligner minimap2
FLCount all_samples.chained_count.txt
Expression      transcript_tpms_all_samples.tsv
Junction        *SJ.tab.out
CagePeak        NA
PolyA   NA
PolyAPeak       NA
IsFusion        False

The script I am running:

sqanti3_qc.py --gtf /home/albadenm/c2042/analysis/IsoSeq_Ocoarcata/chain_filtered_isoseq_3/all_samples.chained.gff \
/home/albadenm/c2042/analysis/IsoSeq_Ocoarcata/maker/OcoaRS1.PZYT01.fa.maker.output/round4/OcoaRS1.PZYT01.fa.all.maker.functional_ipr.gff \
/home/albadenm/c2042/analysis/IsoSeq_Ocoarcata/maker/OcoaRS1.PZYT01.fa.maker.output/round4/OcoaRS1.PZYT01.fa.masked \
--fl_count /home/albadenm/c2042/analysis/IsoSeq_Ocoarcata/chain_filtered_isoseq_3/all_samples.chained_count.txt \
-c star/*SJ.tab.out --expression rsem_exp/GeneMat.txt --isoAnnotLite -t 128 -n 8

But I get this error (a directory named splits/ is created with subdirectories 0/ to 7/; however, the contents of the subdirectories differ (not all contain a *.faa), and all other output files are empty):

R scripting front-end version 3.6.1 (2019-07-05)
Write arguments to /ibex/scratch/projects/c2042/analysis/sqanti3/all_samples.chained.params.txt...
**** Running SQANTI3...
launching worker on on splits/0/all_samples.chained.gff.split0....
launching worker on on splits/1/all_samples.chained.gff.split1....
**** Parsing provided files....
Reading genome fasta /home/albadenm/c2042/analysis/IsoSeq_Ocoarcata/maker/OcoaRS1.PZYT01.fa.maker.output/round4/OcoaRS1.PZYT01.fa.masked....
launching worker on on splits/2/all_samples.chained.gff.split2....
**** Parsing provided files....
Reading genome fasta /home/albadenm/c2042/analysis/IsoSeq_Ocoarcata/maker/OcoaRS1.PZYT01.fa.maker.output/round4/OcoaRS1.PZYT01.fa.masked....
launching worker on on splits/3/all_samples.chained.gff.split3....
**** Parsing provided files....
Reading genome fasta /home/albadenm/c2042/analysis/IsoSeq_Ocoarcata/maker/OcoaRS1.PZYT01.fa.maker.output/round4/OcoaRS1.PZYT01.fa.masked....
launching worker on on splits/4/all_samples.chained.gff.split4....
**** Parsing provided files....
Reading genome fasta /home/albadenm/c2042/analysis/IsoSeq_Ocoarcata/maker/OcoaRS1.PZYT01.fa.maker.output/round4/OcoaRS1.PZYT01.fa.masked....
launching worker on on splits/5/all_samples.chained.gff.split5....
**** Parsing provided files....
Reading genome fasta /home/albadenm/c2042/analysis/IsoSeq_Ocoarcata/maker/OcoaRS1.PZYT01.fa.maker.output/round4/OcoaRS1.PZYT01.fa.masked....
launching worker on on splits/6/all_samples.chained.gff.split6....
**** Parsing provided files....
Reading genome fasta /home/albadenm/c2042/analysis/IsoSeq_Ocoarcata/maker/OcoaRS1.PZYT01.fa.maker.output/round4/OcoaRS1.PZYT01.fa.masked....
launching worker on on splits/7/all_samples.chained.gff.split7....
**** Parsing provided files....
Reading genome fasta /home/albadenm/c2042/analysis/IsoSeq_Ocoarcata/maker/OcoaRS1.PZYT01.fa.maker.output/round4/OcoaRS1.PZYT01.fa.masked....
**** Parsing provided files....
Reading genome fasta /home/albadenm/c2042/analysis/IsoSeq_Ocoarcata/maker/OcoaRS1.PZYT01.fa.maker.output/round4/OcoaRS1.PZYT01.fa.masked....
Skipping aligning of sequences because GTF file was provided.
Skipping aligning of sequences because GTF file was provided.
Skipping aligning of sequences because GTF file was provided.
Skipping aligning of sequences because GTF file was provided.
Skipping aligning of sequences because GTF file was provided.
Skipping aligning of sequences because GTF file was provided.
Skipping aligning of sequences because GTF file was provided.
Skipping aligning of sequences because GTF file was provided.

Indels will be not calculated since you ran SQANTI3 without alignment step (SQANTI3 with gtf format as transcriptome input).

Indels will be not calculated since you ran SQANTI3 without alignment step (SQANTI3 with gtf format as transcriptome input).

Indels will be not calculated since you ran SQANTI3 without alignment step (SQANTI3 with gtf format as transcriptome input).

Indels will be not calculated since you ran SQANTI3 without alignment step (SQANTI3 with gtf format as transcriptome input).
**** Predicting ORF sequences...
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LC_CTYPE = "UTF-8",
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").

Indels will be not calculated since you ran SQANTI3 without alignment step (SQANTI3 with gtf format as transcriptome input).
**** Predicting ORF sequences...
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LC_CTYPE = "UTF-8",
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
**** Predicting ORF sequences...
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LC_CTYPE = "UTF-8",
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
**** Predicting ORF sequences...
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LC_CTYPE = "UTF-8",
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").

Indels will be not calculated since you ran SQANTI3 without alignment step (SQANTI3 with gtf format as transcriptome input).
**** Predicting ORF sequences...
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LC_CTYPE = "UTF-8",
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").

Indels will be not calculated since you ran SQANTI3 without alignment step (SQANTI3 with gtf format as transcriptome input).
**** Predicting ORF sequences...
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LC_CTYPE = "UTF-8",
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").

Indels will be not calculated since you ran SQANTI3 without alignment step (SQANTI3 with gtf format as transcriptome input).
**** Predicting ORF sequences...
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LC_CTYPE = "UTF-8",
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
**** Predicting ORF sequences...
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LC_CTYPE = "UTF-8",
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
error: data input empty 
GeneMarkS: error on last system call, error code 256
Abort program!!!
Process Process-1:
Traceback (most recent call last):
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 1748, in run
    orfDict = correctionPlusORFpred(args, genome_dict)
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 583, in correctionPlusORFpred
    if subprocess.check_call(cmd, shell=True, cwd=gmst_dir)!=0:
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'perl /ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/utilities/gmst/gmst.pl -faa --strand direct --fnn --output /ibex/scratch/projects/c2042/analysis/sqanti3/splits/0/GMST/GMST_tmp /ibex/scratch/projects/c2042/analysis/sqanti3/splits/0/all_samples.chained_corrected.fasta' returned non-zero exit status 1.
**** Parsing Reference Transcriptome....
/home/albadenm/c2042/analysis/IsoSeq_Ocoarcata/maker/OcoaRS1.PZYT01.fa.maker.output/round4/OcoaRS1.PZYT01.fa.all.maker.functional_ipr.gff doesn't appear to be a GTF file (GFF not supported by this program)
Process Process-8:
Traceback (most recent call last):
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 1751, in run
    refs_1exon_by_chr, refs_exons_by_chr, junctions_by_chr, junctions_by_gene, start_ends_by_gene = reference_parser(args, list(genome_dict.keys()))
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 653, in reference_parser
    for r in genePredReader(referenceFiles):
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 134, in __init__
    self.f = open(filename)
FileNotFoundError: [Errno 2] No such file or directory: '/ibex/scratch/projects/c2042/analysis/sqanti3/splits/7/refAnnotation_all_samples.chained.genePred'
**** Parsing Reference Transcriptome....
/home/albadenm/c2042/analysis/IsoSeq_Ocoarcata/maker/OcoaRS1.PZYT01.fa.maker.output/round4/OcoaRS1.PZYT01.fa.all.maker.functional_ipr.gff doesn't appear to be a GTF file (GFF not supported by this program)
Process Process-3:
Traceback (most recent call last):
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 1751, in run
    refs_1exon_by_chr, refs_exons_by_chr, junctions_by_chr, junctions_by_gene, start_ends_by_gene = reference_parser(args, list(genome_dict.keys()))
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 653, in reference_parser
    for r in genePredReader(referenceFiles):
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 134, in __init__
    self.f = open(filename)
FileNotFoundError: [Errno 2] No such file or directory: '/ibex/scratch/projects/c2042/analysis/sqanti3/splits/2/refAnnotation_all_samples.chained.genePred'
**** Parsing Reference Transcriptome....
/home/albadenm/c2042/analysis/IsoSeq_Ocoarcata/maker/OcoaRS1.PZYT01.fa.maker.output/round4/OcoaRS1.PZYT01.fa.all.maker.functional_ipr.gff doesn't appear to be a GTF file (GFF not supported by this program)
Process Process-3:
Traceback (most recent call last):
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 1751, in run
    refs_1exon_by_chr, refs_exons_by_chr, junctions_by_chr, junctions_by_gene, start_ends_by_gene = reference_parser(args, list(genome_dict.keys()))
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 653, in reference_parser
    for r in genePredReader(referenceFiles):
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 134, in __init__
    self.f = open(filename)
FileNotFoundError: [Errno 2] No such file or directory: '/ibex/scratch/projects/c2042/analysis/sqanti3/splits/2/refAnnotation_all_samples.chained.genePred'
**** Parsing Reference Transcriptome....
/home/albadenm/c2042/analysis/IsoSeq_Ocoarcata/maker/OcoaRS1.PZYT01.fa.maker.output/round4/OcoaRS1.PZYT01.fa.all.maker.functional_ipr.gff doesn't appear to be a GTF file (GFF not supported by this program)
Process Process-4:
Traceback (most recent call last):
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 1751, in run
    refs_1exon_by_chr, refs_exons_by_chr, junctions_by_chr, junctions_by_gene, start_ends_by_gene = reference_parser(args, list(genome_dict.keys()))
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 653, in reference_parser
    for r in genePredReader(referenceFiles):
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 134, in __init__
    self.f = open(filename)
FileNotFoundError: [Errno 2] No such file or directory: '/ibex/scratch/projects/c2042/analysis/sqanti3/splits/3/refAnnotation_all_samples.chained.genePred'
error: data input empty 
GeneMarkS: error on last system call, error code 256
Abort program!!!
Process Process-7:
Traceback (most recent call last):
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 1748, in run
    orfDict = correctionPlusORFpred(args, genome_dict)
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 583, in correctionPlusORFpred
    if subprocess.check_call(cmd, shell=True, cwd=gmst_dir)!=0:
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'perl /ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/utilities/gmst/gmst.pl -faa --strand direct --fnn --output /ibex/scratch/projects/c2042/analysis/sqanti3/splits/6/GMST/GMST_tmp /ibex/scratch/projects/c2042/analysis/sqanti3/splits/6/all_samples.chained_corrected.fasta' returned non-zero exit status 1.
error: data input empty 
GeneMarkS: error on last system call, error code 256
Abort program!!!
Process Process-5:
Traceback (most recent call last):
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 1748, in run
    orfDict = correctionPlusORFpred(args, genome_dict)
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 583, in correctionPlusORFpred
    if subprocess.check_call(cmd, shell=True, cwd=gmst_dir)!=0:
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'perl /ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/utilities/gmst/gmst.pl -faa --strand direct --fnn --output /ibex/scratch/projects/c2042/analysis/sqanti3/splits/4/GMST/GMST_tmp /ibex/scratch/projects/c2042/analysis/sqanti3/splits/4/all_samples.chained_corrected.fasta' returned non-zero exit status 1.
error: data input empty 
GeneMarkS: error on last system call, error code 256
Abort program!!!
Process Process-6:
Traceback (most recent call last):
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 1748, in run
    orfDict = correctionPlusORFpred(args, genome_dict)
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 583, in correctionPlusORFpred
    if subprocess.check_call(cmd, shell=True, cwd=gmst_dir)!=0:
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'perl /ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/utilities/gmst/gmst.pl -faa --strand direct --fnn --output /ibex/scratch/projects/c2042/analysis/sqanti3/splits/5/GMST/GMST_tmp /ibex/scratch/projects/c2042/analysis/sqanti3/splits/5/all_samples.chained_corrected.fasta' returned non-zero exit status 1.
error: data input empty 
GeneMarkS: error on last system call, error code 256
Abort program!!!
Process Process-2:
Traceback (most recent call last):
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 1748, in run
    orfDict = correctionPlusORFpred(args, genome_dict)
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 583, in correctionPlusORFpred
    if subprocess.check_call(cmd, shell=True, cwd=gmst_dir)!=0:
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/miniconda3/envs/SQANTI3.env/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'perl /ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/utilities/gmst/gmst.pl -faa --strand direct --fnn --output /ibex/scratch/projects/c2042/analysis/sqanti3/splits/1/GMST/GMST_tmp /ibex/scratch/projects/c2042/analysis/sqanti3/splits/1/all_samples.chained_corrected.fasta' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 2337, in <module>
    main()
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 2328, in main
    combine_split_runs(args, split_dirs)
  File "/ibex/sw/csi/sqanti3/1.6/el7.9_conda3/SQANTI3/sqanti3_qc.py", line 2151, in combine_split_runs
    with open(_orf) as h: f_faa.write(h.read())
FileNotFoundError: [Errno 2] No such file or directory: '/ibex/scratch/projects/c2042/analysis/sqanti3/splits/0/all_samples.chained_corrected.faa'

Any help or insight would be greatly appreciated :)
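The log above reports that the reference annotation "doesn't appear to be a GTF file (GFF not supported by this program)". One quick way to see which flavour a file is uses the ninth (attribute) column: GTF uses `key "value";` pairs, while GFF3 uses `key=value` pairs. The sketch below is only a heuristic under that assumption; it is not how SQANTI3 itself performs the check, and the function name is hypothetical.

```python
import re

# GFF3 attributes start with key=value; GTF attributes start with key "value";
GFF3_ATTR = re.compile(r'^\s*[^\s=;]+=[^;]+')
GTF_ATTR = re.compile(r'^\s*\S+ "[^"]*";')


def guess_annotation_format(lines):
    """Return 'gtf', 'gff3', or 'unknown' based on the first feature line."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            return "unknown"
        attributes = fields[8]
        if GFF3_ATTR.match(attributes):
            return "gff3"
        if GTF_ATTR.match(attributes):
            return "gtf"
        return "unknown"
    return "unknown"
```

Passing an open file handle for the annotation (for example `guess_annotation_format(open(path))`) is enough, since only the first feature line is inspected.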

Missing associated_gene for some "antisense" isoforms

I am trying to use SQANTI3 QC to classify isoforms and identify the associated reference genes from a reference annotation. No errors occurred in the generation of the output files, but one of the chromosomes (named MtDNA) seems to have some isoforms that are not properly classified (all were classified as "antisense" despite some having no overlap with reference genes) and did not have an associated_gene value. Below are the arguments I used for SQANTI3 QC. Attached are 1) a subset of the reference gtf file used with all MtDNA elements (MtDNA.gtf.txt), 2) the lines of the classification file that do not have an associated_gene value (adult_combine_no_associated_gene_classification.txt), and 3) the lines from the isoform gtf file that did not get an associated_gene value (adult_combine_no_associated_gene.gtf.txt).
This set of isoforms contains mostly mono-exon and a few multi-exon isoforms, but these are not the only isoforms from this chromosome. Other mono-exon and multi-exon isoforms seem to be properly classified and assigned an associated_gene so it does not look to be the chromosome itself causing the issue.

I would appreciate any advice on how to fix this issue if it is possible. Thank you!

python $SCRIPTS/SQANTI3/sqanti3_qc.py \
--gtf \
--skipORF \
--force_id_ignore \
--dir ${OUTDIR}/${NAME} \
-o ${NAME} \
${SAMPLEGTF} \
${REFGTF} \
${FA}

MtDNA.gtf.txt
adult_combine_no_associated_gene_classification.txt
adult_combine_no_associated_gene.gtf.txt

Error when running sqanti3_qc.py (possibly an error from R package?)

I am getting an error message when I am trying to run sqanti3_qc.py. I have posted what I am seeing on my end. I am not sure where the error is coming from but it may have something to do with R?

(SQANTI3.env) [hos28@lb021login ~]$ python3 SQANTI3/sqanti3_qc.py --gtf dedup.5merge.collapsed.gff genes.gtf genome.fa
R scripting front-end version 3.6.1 (2019-07-05)
Write arguments to /home/hos28/dedup.5merge.collapsed.params.txt...
**** Running SQANTI3...
**** Parsing provided files....
Reading genome fasta /home/hos28/genome.fa....
Error corrected FASTA /home/hos28/dedup.5merge.collapsed_corrected.fasta already exists. Using it...
**** Predicting ORF sequences...
ORF file /home/hos28/dedup.5merge.collapsed_corrected.faa already exists. Using it....
**** Parsing Reference Transcriptome....
/home/hos28/refAnnotation_dedup.5merge.collapsed.genePred already exists. Using it.
**** Parsing Isoforms....
Splice Junction Coverage files not provided.
**** Performing Classification of Isoforms....
Number of classified isoforms: 1
**** RT-switching computation....
Full-length read abundance files not provided.
Isoforms expression files not provided.
**** Writing output files....
**** Generating SQANTI3 report....
Registered S3 methods overwritten by 'ggplot2':
method from
[.quosures rlang
c.quosures rlang
print.quosures rlang

Attaching package: ‘dplyr’

The following object is masked from ‘package:gridExtra’:

combine

The following object is masked from ‘package:reshape’:

rename

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union

Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:dplyr’:

combine, intersect, setdiff, union

The following object is masked from ‘package:gridExtra’:

combine

The following objects are masked from ‘package:stats’:

IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

anyDuplicated, append, as.data.frame, basename, cbind, colnames,
dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which, which.max, which.min

Welcome to Bioconductor

Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: splines
Loading required package: Matrix

Attaching package: ‘Matrix’

The following object is masked from ‘package:reshape’:

expand

Error in $<-.data.frame(*tmp*, SJ_type, value = "__SJ") :
replacement has 1 row, data has 0
Calls: $<- -> $<-.data.frame
Execution halted
Traceback (most recent call last):
  File "SQANTI3/sqanti3_qc.py", line 2291, in <module>
    main()
  File "SQANTI3/sqanti3_qc.py", line 2275, in main
    run(args)
  File "SQANTI3/sqanti3_qc.py", line 1894, in run
    if subprocess.check_call(cmd, shell=True)!=0:
  File "/home/hos28/.pyenv/versions/3.7.2/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/home/hos28/miniconda3/envs/SQANTI3.env/bin/Rscript /home/hos28/SQANTI3/utilities//SQANTI3_report.R /home/hos28/dedup.5merge.collapsed_classification.txt /home/hos28/dedup.5merge.collapsed_junctions.txt /home/hos28/dedup.5merge.collapsed.params.txt /home/hos28/SQANTI3/utilities/' returned non-zero exit status 1.
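The R message "replacement has 1 row, data has 0" is what R raises when a single value is assigned to a column of a data frame that has zero rows. Since the log reports only one classified isoform, one of the tables passed to the report script may contain no data rows. A minimal sketch for checking this before the report step, assuming the classification and junctions files are plain text with one header line (the function name is hypothetical):

```python
def count_data_rows(lines):
    """Count non-empty lines after the first (header) line."""
    rows = [line for line in lines if line.strip()]
    return max(len(rows) - 1, 0)
```

For example, `count_data_rows(open("dedup.5merge.collapsed_junctions.txt"))` returning 0 would indicate a header-only junctions file, which would be consistent with the error, although this has not been confirmed as the cause.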

conda fails to build environment on mac -- what system is supported?

Hi,
I tried building the conda environment from master and the v1.4 release on my mac, but the environment could not be resolved:

$ conda env create -f SQANTI3.conda_env.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - mpfr==4.0.1=hdf1c602_3
  - libstdcxx-ng==9.1.0=hdf63c60_0
  - libpng==1.6.37=hbc83047_0
  - tk==8.6.8=hbc83047_0
  - ld_impl_linux-64==2.33.1=h53a641e_7
  - sqlite==3.31.1=h62c20be_1
  - r-mgcv==1.8_28=r36h96ca727_0
  - gxx_impl_linux-64==7.3.0=hdf63c60_1
  - fontconfig==2.13.0=h9420a91_0
...

I suspect this fails because the versions are pinned to a specific Linux (?) build (e.g. hdf1c602_3 in mpfr==4.0.1=hdf1c602_3). I assume SQANTI3 is not supported on macOS, but which Linux systems are supported? On which Linux system was this environment.yml built?
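To illustrate the build-pinning point: each failing spec has the form `name==version=build`, where the last component is a platform-specific build string. The sketch below strips that component so that only `name==version` remains. It assumes the spec format shown in the error message above, and the function name is hypothetical.

```python
def strip_build_string(spec):
    """Turn 'name==version=build' into 'name==version'.

    Specs without '==' are returned unchanged.
    """
    name, sep, rest = spec.partition("==")
    if not sep:
        return spec
    version = rest.split("=", 1)[0]  # drop everything after the version
    return "{}=={}".format(name, version)
```

Relaxing the pins this way may let the solver pick builds for another platform, but packages whose names indicate Linux-only toolchain components (for example ld_impl_linux-64 or gxx_impl_linux-64) would presumably still have to be removed by hand.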

Difference in number of genes in SQANTI report and fasta file

Hi,

I have PacBio Iso-Seq data and used SQANTI for the analysis. After running sqanti_qc, the report states that the number of unique genes is 23758 and the number of unique isoforms is 102367. However, when I count the transcripts in the GFF file, I get 12911.

So why is the number different between the report and gff file?

Thanks!!

aligner if gff input?

Good evening!
First of all, I am really happy that SQANTI3 was finally released!
I wanted to use the GFF file output generated by cDNA_cupcake. Does SQANTI3 accept this input? If so, it is stated that "If you provide the sequences of your transcripts, a mapping step will be performed initially with minimap2." If I use my GFF file, will an alignment still be performed?
Thank you
Beatriz

SQANTI3 output missing Intra−Priming Quality Check and Quality Controls

The SQANTI2 and SQANTI3 outputs are different. The "Intra−Priming Quality Check" and "Quality Controls" sections are both missing from the SQANTI3 output.

They are also absent from the example output file "SQANTI3/example/example_out/melanoma_chr13_sqanti_report.pdf".

Is something missing in SQANTI3?

issues running IsoAnnotLite

I'm trying to run SQANTI with the isoAnnotLite option so that I can then run tappAS using my Iso-Seq data as a functional reference.

It's not clear whether I need to use a reference GTF that matches the IsoAnnot GFF3 - I've tried with both a GENCODE and a RefSeq GTF and I get the same error. Here's my SQANTI command:

GTF=all_samples/test/all_samples_filtered_chr21.gtf
COUNT=all_samples/cupcake_filtered/all_samples.demux_fl_count_filtered.csv

export PS1=""; ml anaconda3; CONDA_BASE=$(conda info --base); source $CONDA_BASE/etc/profile.d/conda.sh; ml purge;
conda activate SQANTI3.env; 
export PYTHONPATH=$PYTHONPATH:/sc/arion/projects/ad-omics/data/software/cDNA_Cupcake/sequence;export PYTHONPATH=$PYTHONPATH:/sc/arion/projects/ad-omics/data/software/cDNA_Cupcake/;

python /sc/arion/projects/ad-omics/data/software/SQANTI3/sqanti3_qc.py -t 8 --aligner_choice=minimap2 --dir all_samples/SQANTI3_test/  --out all_samples.cupcake.collapsed.filtered  \
--cage_peak /sc/arion/projects/ad-omics/data/references/hg38_reference/SQANTI3/hg38.cage_peak_phase1and2combined_coord.bed \
--polyA_motif_list /sc/arion/projects/ad-omics/data/references/hg38_reference/SQANTI3/human.polyA.list.txt  \
--fl_count $COUNT \
--gtf $GTF \
 --isoAnnotLite --gff3 /sc/arion/projects/ad-omics/data/references/hg38_reference/RefSeq/Homo_sapiens_GRCh38_RefSeq_78.gff3 \
/sc/arion/projects/ad-omics/data/references/hg38_reference/RefSeq/GRCh38_latest_genomic.gtf \
/sc/hydra/projects/ad-omics/data/references/hg38_reference/hg38.fa

SQANTI runs fine, both ORF prediction and report generation occur without error, but then I get this from IsoAnnot:

Running IsoAnnot Lite 1.5...

Reading SQANTI 3 Files and creating an auxiliar GFF...
Reading reference annotation file and creating data variables...
Transforming CDS local positions to genomic position...
Transforming feature local positions to genomic position in GFF3...
Mapping transcript features betweeen GFFs...


	0.00 % of transcripts annotated...

	·Annoted a total of 0 annotation features from reference GFF3 file.
Traceback (most recent call last):
  File "/sc/arion/projects/ad-omics/data/software/SQANTI3/utilities/IsoAnnotLite_SQ3.py", line 1443, in <module>
    main()
  File "/sc/arion/projects/ad-omics/data/software/SQANTI3/utilities/IsoAnnotLite_SQ3.py", line 1320, in main
    run(args)
  File "/sc/arion/projects/ad-omics/data/software/SQANTI3/utilities/IsoAnnotLite_SQ3.py", line 1382, in run
    mappingFeatures(dc_SQexons, dc_SQcoding, dc_SQtransGene, dc_GFF3exonsTrans, dc_GFF3transExons, dc_GFF3_Genomic, dc_GFF3coding, filename) #edit tappAS_annotation_from_Sqanti file
  File "/sc/arion/projects/ad-omics/data/software/SQANTI3/utilities/IsoAnnotLite_SQ3.py", line 851, in mappingFeatures
    perct = featuresAnnotated/totalAnotations*100
ZeroDivisionError: division by zero
Traceback (most recent call last):
  File "/sc/arion/projects/ad-omics/data/software/SQANTI3/sqanti3_qc.py", line 2289, in <module>
    main()
  File "/sc/arion/projects/ad-omics/data/software/SQANTI3/sqanti3_qc.py", line 2277, in main
    run_isoAnnotLite(corrGTF, outputClassPath, outputJuncPath, args.dir, args.output, args.gff3)
  File "/sc/arion/projects/ad-omics/data/software/SQANTI3/sqanti3_qc.py", line 1912, in run_isoAnnotLite
    if subprocess.check_call(ISOANNOT_CMD, shell=True)!=0:
  File "/sc/arion/projects/als-omics/conda/envs/SQANTI3.env/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'python /sc/arion/projects/ad-omics/data/software/SQANTI3/utilities/IsoAnnotLite_SQ3.py /sc/arion/projects/als-omics/microglia_isoseq/isoseq-pipeline/all_samples/SQANTI3_test/all_samples.cupcake.collapsed.filtered_corrected.gtf /sc/arion/projects/als-omics/microglia_isoseq/isoseq-pipeline/all_samples/SQANTI3_test/all_samples.cupcake.collapsed.filtered_classification.txt /sc/arion/projects/als-omics/microglia_isoseq/isoseq-pipeline/all_samples/SQANTI3_test/all_samples.cupcake.collapsed.filtered_junctions.txt -gff3 /sc/arion/projects/ad-omics/data/references/hg38_reference/RefSeq/Homo_sapiens_GRCh38_RefSeq_78.gff3 -d /sc/arion/projects/als-omics/microglia_isoseq/isoseq-pipeline/all_samples/SQANTI3_test -o all_samples.cupcake.collapsed.filtered' returned non-zero exit status 1.

I also tried running IsoAnnot separately using the IsoAnnotLite_SQ3.py script with my corrected GTF but I get the same error. Do you have any ideas what I might be doing wrong?
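The traceback ends at `perct = featuresAnnotated/totalAnotations*100`, and the preceding message reports a total of 0 annotation features, so the denominator is zero. A guarded version of that calculation could look like the sketch below. The function and parameter names are hypothetical, and this is not the IsoAnnotLite code itself.

```python
def percent_annotated(features_annotated, total_annotations):
    """Return the percentage of annotated features, or 0.0 when the total is zero."""
    if total_annotations == 0:
        return 0.0
    return features_annotated / total_annotations * 100
```

A guard like this would only avoid the crash; it would not explain why no features from the reference GFF3 were mapped to the transcripts in the first place.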

SQANTI3 output: same sequences with different IDs

Hi,

SQANTI3 gives identical sequences with different IDs. How can I select a single sequence?
Here is an example:

PB.3.1-gene_1|GeneMark.hmm|459_aa|+|3033|4412
MAGELVEFEEGTIGIALNLESTNVGVVLMGDGLMIQEGSSVKATGKIAQIPVSEAYLGRVINALAKPIDGRGEISASESRLIESPAPGIISRRSVYEPLQTGLIAIDSMIPIGRGQRELIIGDRQTGKTAVATDTILNQQGQNVICVYVAIGQKASSVAQVVNTFQERGAMDYTIVVAETADSPATLQYLAPYTGAALAEYFMYRERHTLIIYDDPSKQAQAYRQMSLLLRRPPGREAYPGDVFYLHSRLLERAAKLGSQLGEGSMTALPIVETQSGDVSAYIPTNVISITDGQIFLSADLFNAGIRPAINVGISVSRVGSAAQIKAMKQVAGKLKLELAQFAELEAFAQFASDLDKATQNQLARGQRLRELLKQAQAAPLTVEEQIMTIYTGTNGYLDSLEIGQVRKFLVELRTYLKTNKPQFQEIISSTKIFTEEAEALLQEAIQEQKERFLLQEQF
PB.3.2-gene_3|GeneMark.hmm|459_aa|+|2238|3617
MAGELVEFEEGTIGIALNLESTNVGVVLMGDGLMIQEGSSVKATGKIAQIPVSEAYLGRVINALAKPIDGRGEISASESRLIESPAPGIISRRSVYEPLQTGLIAIDSMIPIGRGQRELIIGDRQTGKTAVATDTILNQQGQNVICVYVAIGQKASSVAQVVNTFQERGAMDYTIVVAETADSPATLQYLAPYTGAALAEYFMYRERHTLIIYDDPSKQAQAYRQMSLLLRRPPGREAYPGDVFYLHSRLLERAAKLGSQLGEGSMTALPIVETQSGDVSAYIPTNVISITDGQIFLSADLFNAGIRPAINVGISVSRVGSAAQIKAMKQVAGKLKLELAQFAELEAFAQFASDLDKATQNQLARGQRLRELLKQAQAAPLTVEEQIMTIYTGTNGYLDSLEIGQVRKFLVELRTYLKTNKPQFQEIISSTKIFTEEAEALLQEAIQEQKERFLLQEQF
PB.3.3-gene_4|GeneMark.hmm|459_aa|+|2006|3385
MAGELVEFEEGTIGIALNLESTNVGVVLMGDGLMIQEGSSVKATGKIAQIPVSEAYLGRVINALAKPIDGRGEISASESRLIESPAPGIISRRSVYEPLQTGLIAIDSMIPIGRGQRELIIGDRQTGKTAVATDTILNQQGQNVICVYVAIGQKASSVAQVVNTFQERGAMDYTIVVAETADSPATLQYLAPYTGAALAEYFMYRERHTLIIYDDPSKQAQAYRQMSLLLRRPPGREAYPGDVFYLHSRLLERAAKLGSQLGEGSMTALPIVETQSGDVSAYIPTNVISITDGQIFLSADLFNAGIRPAINVGISVSRVGSAAQIKAMKQVAGKLKLELAQFAELEAFAQFASDLDKATQNQLARGQRLRELLKQAQAAPLTVEEQIMTIYTGTNGYLDSLEIGQVRKFLVELRTYLKTNKPQFQEIISSTKIFTEEAEALLQEAIQEQKERFLLQEQF
PB.3.4-gene_6|GeneMark.hmm|459_aa|+|1128|2507
MAGELVEFEEGTIGIALNLESTNVGVVLMGDGLMIQEGSSVKATGKIAQIPVSEAYLGRVINALAKPIDGRGEISASESRLIESPAPGIISRRSVYEPLQTGLIAIDSMIPIGRGQRELIIGDRQTGKTAVATDTILNQQGQNVICVYVAIGQKASSVAQVVNTFQERGAMDYTIVVAETADSPATLQYLAPYTGAALAEYFMYRERHTLIIYDDPSKQAQAYRQMSLLLRRPPGREAYPGDVFYLHSRLLERAAKLGSQLGEGSMTALPIVETQSGDVSAYIPTNVISITDGQIFLSADLFNAGIRPAINVGISVSRVGSAAQIKAMKQVAGKLKLELAQFAELEAFAQFASDLDKATQNQLARGQRLRELLKQAQAAPLTVEEQIMTIYTGTNGYLDSLEIGQVRKFLVELRTYLKTNKPQFQEIISSTKIFTEEAEALLQEAIQEQKERFLLQEQF
PB.3.5-gene_7|GeneMark.hmm|459_aa|+|997|2376
MAGELVEFEEGTIGIALNLESTNVGVVLMGDGLMIQEGSSVKATGKIAQIPVSEAYLGRVINALAKPIDGRGEISASESRLIESPAPGIISRRSVYEPLQTGLIAIDSMIPIGRGQRELIIGDRQTGKTAVATDTILNQQGQNVICVYVAIGQKASSVAQVVNTFQERGAMDYTIVVAETADSPATLQYLAPYTGAALAEYFMYRERHTLIIYDDPSKQAQAYRQMSLLLRRPPGREAYPGDVFYLHSRLLERAAKLGSQLGEGSMTALPIVETQSGDVSAYIPTNVISITDGQIFLSADLFNAGIRPAINVGISVSRVGSAAQIKAMKQVAGKLKLELAQFAELEAFAQFASDLDKATQNQLARGQRLRELLKQAQAAPLTVEEQIMTIYTGTNGYLDSLEIGQVRKFLVELRTYLKTNKPQFQEIISSTKIFTEEAEALLQEAIQEQKERFLLQEQF
PB.3.6-gene_8|GeneMark.hmm|351_aa|+|192|1247
MIPIGRGQRELIIGDRQTGKTAVATDTILNQQGQNVICVYVAIGQKASSVAQVVNTFQERGAMDYTIVVAETADSPATLQYLAPYTGAALAEYFMYRERHTLIIYDDPSKQAQAYRQMSLLLRRPPGREAYPGDVFYLHSRLLERAAKLGSQLGEGSMTALPIVETQSGDVSAYIPTNVISITDGQIFLSADLFNAGIRPAINVGISVSRVGSAAQIKAMKQVAGKLKLELAQFAELEAFAQFASDLDKATQNQLARGQRLRELLKQAQAAPLTVEEQIMTIYTGTNGYLDSLEIGQVRKFLVELRTYLKTNKPQFQEIISSTKIFTEEAEALLQEAIQEQKERFLLQEQF
PB.3.8-gene_9|GeneMark.hmm|247_aa|+|226|969
MNVLSCSMNTLRGLYDISGVEVGQHFYWQIGGFQVHAQVLITSWVVIAILLGSAFIAVRNPQTVPTATQNFFEYVLEFIRDVSKTQIGEEYGPWVPFIGTLFLFIFVSNWSGALLPWKIIELPHGELAAPTNDINTTVALALLTSIAYFYAGLSKKGLGYFSKYIQPTPILLPINILEDFTKPLSLSFRLFGNILADELVVVVLVSLVPLVVPIPVMFLGLFTSGIQALIFATLAAAYIGESMEGHH
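
One way to keep a single representative per distinct protein is to group records by their sequence. The following is a minimal sketch, assuming the records alternate header line / sequence line as in the example above (an optional leading `>` is tolerated). `unique_by_sequence` is a hypothetical helper, not part of SQANTI3, and the sequences in the toy input are shortened for illustration.

```python
def unique_by_sequence(lines):
    """Return (first_id, sequence, all_ids) for each distinct sequence."""
    groups = {}  # sequence -> list of record IDs, in input order
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if header is None:
            # First line of a pair is the record header.
            header = line.lstrip(">")
        else:
            # Second line of a pair is the sequence itself.
            groups.setdefault(line, []).append(header)
            header = None
    return [(ids[0], seq, ids) for seq, ids in groups.items()]


# Toy input: headers copied from the example, sequences truncated.
records = [
    "PB.3.1-gene_1|GeneMark.hmm|459_aa|+|3033|4412", "MAGELVEF",
    "PB.3.2-gene_3|GeneMark.hmm|459_aa|+|2238|3617", "MAGELVEF",
    "PB.3.6-gene_8|GeneMark.hmm|351_aa|+|192|1247", "MIPIGRGQ",
]
unique = unique_by_sequence(records)
for first_id, seq, ids in unique:
    print(first_id, len(ids))
```

Note that the headers above carry different ORF coordinates for the same protein, so the underlying transcripts are not necessarily redundant. Collapsing by sequence is only appropriate if one entry per protein is what the downstream analysis needs.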

_classification_TPM.txt is unreadable by R

Dear SQANTI3 authors,

Thanks for all your hard work in maintaining such a useful tool! I'm trying to use the TPM table for downstream analysis and I'm having a hard time reading in the _classification_TPM.txt file.

There seem to be 2 issues:

The isoform name occurs twice in each row but has only one column name, so I assume it was previously being used as a row name. This prevents the file from being read by readr, and data.table complains about it.
I have a transcript classified as "Genic Genomic", but the space between "Genic" and "Genomic" is being interpreted as a newline character by less, read.table, readr and data.table.

For reference, I used the latest version of SQANTI3 as of 2 days ago to run the analysis.
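
As a possible workaround for the first issue, the header can be padded when it has one fewer field than the data rows. This is a minimal sketch using only the Python standard library, assuming a tab-delimited file. `read_padded_tsv` and the `row_name` label are hypothetical, and the toy rows are invented for illustration. It does not address the embedded newline described in the second issue.

```python
import csv
import io


def read_padded_tsv(handle, extra_name="row_name"):
    """Read a TSV whose header may lack a name for a leading row-name column."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row]
    # If every data row has exactly one more field than the header,
    # treat the first field as an unnamed row-name column.
    if rows and all(len(row) == len(header) + 1 for row in rows):
        header = [extra_name] + header
    return [dict(zip(header, row)) for row in rows]


# Toy file: two header names, three fields per data row.
example = io.StringIO(
    "isoform\tstructural_category\n"
    "PB.1.1\tPB.1.1\tfull-splice_match\n"
    "PB.1.2\tPB.1.2\tgenic\n"
)
table = read_padded_tsv(example)
print(table[0]["isoform"], table[0]["structural_category"])
```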

Recurring perl errors - gmst.pl

Hi,

I had this same issue in SQANTI2 and it persists in SQANTI3 as well:

(SQANTI3.env) [kantha01@bigpurple-ln1 SQANTI3]$ python sqanti3_qc.py --gtf 1001/bc1001.gtf \
database/gencode.v34.annotation.gtf \
database/hg38.fa \
--fl_count 1001/tofu.collapsed.abundance.txt

sqanti3_qc.py:19: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
from collections import defaultdict, Counter, namedtuple, Iterable
R scripting front-end version 3.6.1 (2019-07-05)
Write arguments to /gpfs/data/skoklab/home/kantha01/iso_seq/softwares/SQANTI3/bc1001.params.txt...
**** Running SQANTI3...
**** Parsing provided files....
Reading genome fasta /gpfs/data/skoklab/home/kantha01/iso_seq/softwares/SQANTI3/database/hg38.fa....
Skipping aligning of sequences because GTF file was provided.

Indels will be not calculated since you ran SQANTI3 without alignment step (SQANTI3 with gtf format as transcriptome input).
**** Predicting ORF sequences...
GeneMarkS: error on last system call, error code 7
Abort program!!!
Traceback (most recent call last):
  File "sqanti3_qc.py", line 2289, in <module>
    main()
  File "sqanti3_qc.py", line 2273, in main
    run(args)
  File "sqanti3_qc.py", line 1702, in run
    orfDict = correctionPlusORFpred(args, genome_dict)
  File "sqanti3_qc.py", line 594, in correctionPlusORFpred
    if subprocess.check_call(cmd, shell=True, cwd=gmst_dir)!=0:
  File "/gpfs/home/kantha01/.conda/envs/SQANTI3.env/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'perl /gpfs/data/skoklab/home/kantha01/iso_seq/softwares/SQANTI3/utilities/gmst/gmst.pl -faa --strand direct --fnn --output /gpfs/data/skoklab/home/kantha01/iso_seq/softwares/SQANTI3/GMST/GMST_tmp /gpfs/data/skoklab/home/kantha01/iso_seq/softwares/SQANTI3/bc1001_corrected.fasta' returned non-zero exit status 1.

The failed command works separately:
perl /gpfs/data/skoklab/home/kantha01/iso_seq/softwares/SQANTI3/utilities/gmst/gmst.pl -faa --strand direct --fnn --output /gpfs/data/skoklab/home/kantha01/iso_seq/softwares/SQANTI3/GMST/GMST_tmp /gpfs/data/skoklab/home/kantha01/iso_seq/softwares/SQANTI3/bc1001_corrected.fasta

I do require the ORF analysis. How do I go ahead with this?

This seems to be a common issue, and it would be great if it could be resolved; similar errors also popped up for minimap2 in SQANTI2. Fixing it would make the analysis much easier!
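
Because the command succeeds when run by hand, it may help to re-run it the way the traceback shows it being called (`shell=True` with a `cwd`) while capturing stderr, so the underlying message is visible. This is a minimal sketch: `run_and_report` is a hypothetical helper, and the command used here is a stand-in that only prints its working directory.

```python
import subprocess
import sys
import tempfile


def run_and_report(cmd, cwd):
    """Run cmd through the shell in cwd and return (exit_code, stdout, stderr)."""
    result = subprocess.run(
        cmd, shell=True, cwd=cwd,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode, result.stdout, result.stderr


with tempfile.TemporaryDirectory() as workdir:
    # Stand-in command; replace it with the failing gmst.pl command line
    # and workdir with the GMST directory to reproduce the real call.
    stand_in = '"{}" -c "import os; print(os.getcwd())"'.format(sys.executable)
    code, out, err = run_and_report(stand_in, workdir)
    print(code, out.strip(), err.strip())
```

If the exit code differs between this call and the manual run, the working directory or the environment inherited by the subprocess is a reasonable place to look.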

FL abundance file

Good evening!
I have a Multi Sample Demux FL Count file covering 3 multiplexed samples. As far as I understand, SQANTI3 does not accept multiple samples (so it would not accept the 3 GFF files in a single run), so I was planning to run sample1, then sample2, then sample3. However, the Multi Sample Demux FL Count file contains the counts for all 3 samples. Should I split it into 3 files and run each one with its respective sample?

id,sample1,sample2,sample3
PB.1.1,3,10,
PB.1.2,0,11,
PB.1.3,4,4
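
If per-sample tables turn out to be needed, the split itself is simple. This is a minimal sketch, assuming the comma-separated layout shown above; `split_demux_counts` is a hypothetical helper, and treating a blank or missing field as a count of 0 is an assumption. It makes no claim about the exact single-sample format SQANTI3 expects.

```python
import csv
import io


def split_demux_counts(handle):
    """Map each sample name to a list of (isoform_id, count) pairs."""
    reader = csv.reader(handle)
    header = next(reader)
    samples = header[1:]
    per_sample = {name: [] for name in samples}
    for row in reader:
        if not row:
            continue
        # Pad short rows so missing trailing fields become empty strings.
        row = row + [""] * (len(header) - len(row))
        for name, value in zip(samples, row[1:]):
            count = int(value) if value.strip() else 0  # assumption: blank means 0
            per_sample[name].append((row[0], count))
    return per_sample


# The example table from the question, reproduced as-is.
demux = io.StringIO(
    "id,sample1,sample2,sample3\n"
    "PB.1.1,3,10,\n"
    "PB.1.2,0,11,\n"
    "PB.1.3,4,4\n"
)
counts = split_demux_counts(demux)
print(counts["sample1"])
```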

Invalid input IDs on custom fasta

I am trying to run sqanti3_qc on custom clustered isoforms and it throws the following error:
Invalid input IDs! Expected PB.X.Y or PB.X.Y|xxxxx or PBfusion.X format but saw m54217_190408_162523/31458270/ccs instead. Abort!
Should I rename the fasta file or is there an option to specify custom files?
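
To see which records would trip this check before running SQANTI3, the IDs can be screened against the three formats named in the error message. This is a minimal sketch: the regular expression is reconstructed from the message alone (assuming X and Y are integers), so it may not match the tool's own check exactly, and `invalid_ids` is a hypothetical helper.

```python
import re

# Formats taken from the error message: PB.X.Y, PB.X.Y|xxxxx, PBfusion.X
# (assumption: X and Y are integers).
ID_PATTERN = re.compile(r"(PB\.\d+\.\d+(\|.*)?|PBfusion\.\d+)")


def invalid_ids(fasta_lines):
    """Return the record IDs that do not match the expected naming scheme."""
    bad = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            if not ID_PATTERN.fullmatch(record_id):
                bad.append(record_id)
    return bad


# Toy FASTA: one conforming ID and the ID quoted in the error message.
fasta = [
    ">PB.1.1|chr1:100-200(+)", "ACGT",
    ">m54217_190408_162523/31458270/ccs", "ACGT",
]
bad = invalid_ids(fasta)
print(bad)
```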

Data frame SJ type error when running sqanti3_qc.py

I get this error both with and without --skipORF. Here's my command:

python /SQANTI3/sqanti3_qc.py m64120_200619_171832.collapsed.rep.fa ../annotations/Mus_musculus.GRCm38.100.gtf ../ref/GCA_000001635.9_GRCm39_genomic.fna

The input fasta comes from cDNA_Cupcake 'collapse_isoforms_by_sam.py' (PacBio Iso-Seq workflow); the reference annotation and reference genome are the latest mouse build, GenBank accession GCA_000001635.9.

Error is below:

Error in $<-.data.frame(tmp, SJ_type, value = "__SJ") : replacement has 1 row, data has 0
Calls: $<- -> $<-.data.frame
Execution halted
Traceback (most recent call last):
  File "/SQANTI3/sqanti3_qc.py", line 2289, in <module>
    main()
  File "/SQANTI3/sqanti3_qc.py", line 2273, in main
    run(args)
  File "/SQANTI3/sqanti3_qc.py", line 1892, in run
    if subprocess.check_call(cmd, shell=True)!=0:
  File "/opt/conda/envs/SQANTI3.env/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/opt/conda/envs/SQANTI3.env/bin/Rscript /SQANTI3/utilities//SQANTI3_report.R /scratch/newest_build_isoseq/m64120_200619_171832.collapsed.rep_classification.txt /scratch/newest_build_isoseq/m64120_200619_171832.collapsed.rep_junctions.txt /scratch/newest_build_isoseq/m64120_200619_171832.collapsed.rep.params.txt /SQANTI3/utilities/' returned non-zero exit status 1.

Is this a data input issue or a SQANTI3 issue?
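
One input check that may be worth running: the command pairs an annotation file named GRCm38.100 with a genome file named GRCm39, and annotation and genome files from different sources often use different sequence names. Whether that explains the empty data frame here is an assumption, not a confirmed diagnosis. This is a minimal sketch for comparing the names; `fasta_names` and `gtf_names` are hypothetical helpers and the inputs are toy data.

```python
def fasta_names(lines):
    """Sequence names: first whitespace-delimited token after '>'."""
    names = set()
    for line in lines:
        fields = line[1:].split() if line.startswith(">") else []
        if fields:
            names.add(fields[0])
    return names


def gtf_names(lines):
    """Sequence names: first tab-delimited column of non-comment GTF lines."""
    return {
        line.split("\t")[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


# Toy inputs in which the genome says "chr1" and the annotation says "1".
genome = [">chr1 toy sequence", "ACGT"]
annotation = ["#!comment", '1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "g1";']
shared = fasta_names(genome) & gtf_names(annotation)
print(sorted(shared))
```

An empty intersection on the real files would mean that no annotated transcript can be placed on the genome sequences as named.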

Add `--isoAnnotLite` flag to `sqanti3_RulesFilter.py`?

Hi! I am using SQANTI3 (v 1.3, current hash: 2190647) to prepare a gff3 file for tappAS.

Currently, I run sqanti3_qc.py first to generate *_classification.txt, then sqanti3_RulesFilter.py to filter for intrapriming. However, sqanti3_RulesFilter.py does not produce a gff3 file like sqanti3_qc.py would with the --isoAnnotLite flag. Could this option be added?

My work-around for now is to re-run sqanti3_qc.py again with the --isoAnnotLite flag using the *filtered_lite.gtf file output by sqanti3_RulesFilter.py.

I also tried running utilities/IsoAnnotLite_SQ3.py, utilities/IsoAnnotLite_SQ1.py, and the Python script from the IsoAnnot Lite help page, but none of them worked:

(SQANTI3.env) [joelnitta@juno sqanti3]$ python $BINDIR/SQANTI3/utilities/IsoAnnotLite_SQ3.py \
>   d_magna_classification.filtered_lite.gtf \
>   d_magna_classification.filtered_lite_classification.txt \
>   d_magna_classification.filtered_lite_junctions.txt


Running IsoAnnot Lite 1.5...
Traceback (most recent call last):
  File "/home/joelnitta/kato_daphnia/bin/SQANTI3/utilities/IsoAnnotLite_SQ3.py", line 1443, in <module>
    main()
  File "/home/joelnitta/kato_daphnia/bin/SQANTI3/utilities/IsoAnnotLite_SQ3.py", line 1320, in main
    run(args)
  File "/home/joelnitta/kato_daphnia/bin/SQANTI3/utilities/IsoAnnotLite_SQ3.py", line 1411, in run
    filename = os.path.join(args.dir , args.output+"_tappAS_annot_from_SQANTI3.gff3")
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

(SQANTI3.env) [joelnitta@juno sqanti3]$ python $BINDIR/SQANTI3/utilities/IsoAnnotLite_SQ1.py \
>   d_magna_classification.filtered_lite.gtf \
>   d_magna_classification.filtered_lite_classification.txt \
>   d_magna_classification.filtered_lite_junctions.txt


Running IsoAnnot Lite 1.5...

Reading SQANTI 3 Files and creating an auxiliar GFF...
File classification does not have the correct structure. The column "ORF_length" is not in the possition 28 in the classification file. We have found the column "FSM_class".

(SQANTI3.env) [joelnitta@juno sqanti3]$ python $BINDIR/IsoAnnotLite/IsoAnnotLite1.5.py \
>   d_magna_classification.filtered_lite.gtf \
>   d_magna_classification.filtered_lite_classification.txt \
>   d_magna_classification.filtered_lite_junctions.txt


Running IsoAnnot Lite 1.5...

Reading SQANTI 3 Files and creating an auxiliar GFF...
File classification does not have the correct structure. The column "ORF_length" is not in the possition 28 in the classification file. We have found the column "FSM_class".
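
The `TypeError` in the first traceback above is what Python raises when `None` is concatenated with a string, which would happen if `args.output` were never set. The IsoAnnotLite_SQ3.py command line shown in an earlier report passes `-d` and `-o`, and neither is given in the failing call here. This is a minimal sketch of that failure mode and an early guard. The argument definitions below are an assumption about how the script declares its options, not a copy of its code, and `output_path` is a hypothetical helper.

```python
import argparse
import os

# Assumed option definitions; argparse sets omitted options to None.
parser = argparse.ArgumentParser()
parser.add_argument("-d", dest="dir")
parser.add_argument("-o", dest="output")


def output_path(args):
    """Build the GFF3 path, failing early with a readable message."""
    if args.dir is None or args.output is None:
        raise SystemExit("both -d and -o are required")
    return os.path.join(args.dir, args.output + "_tappAS_annot_from_SQANTI3.gff3")


# Without -o, args.output is None and the '+' above would raise TypeError.
missing = parser.parse_args([])
print(missing.output)

# With both options the path is built as in the traceback's line 1411.
args = parser.parse_args(["-d", "out", "-o", "d_magna"])
path = output_path(args)
print(path)
```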