Comments (7)
Hugo;
I think the phase is correct, but I'm happy to adjust it if the GenomeTools folks think otherwise. The GFF spec specifies the phase as 0, 1, or 2:
http://www.sequenceontology.org/gff3.shtml
while codon_start in the GenBank file is 1, 2, or 3:
http://www.ddbj.nig.ac.jp/FT/full_index.html#7.2
so when converting I subtract one, turning a codon_start of 1 into a phase of 0 in the GFF output. Let me know if your interaction with the GenomeTools developers indicates I've missed something in the conversion.
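As a rough illustration of the conversion described above, here is a minimal Python sketch. It is not the actual bcbb converter code, and `codon_start_to_phase` and `cds_segment_phases` are hypothetical helper names. The second function assumes the CDS segments are listed in transcription order and applies the GFF3 definition of phase (the number of bases to skip to reach the first base of the next complete codon) to every segment, not just the first.

```python
from typing import List


def codon_start_to_phase(codon_start: int) -> int:
    """Convert a GenBank /codon_start value (1, 2 or 3) to a GFF3 phase (0, 1 or 2)."""
    if codon_start not in (1, 2, 3):
        raise ValueError(f"codon_start must be 1, 2 or 3, got {codon_start!r}")
    # codon_start is 1-based; phase counts bases to skip, so subtract one.
    return codon_start - 1


def cds_segment_phases(segment_lengths: List[int], codon_start: int = 1) -> List[int]:
    """Phase of each CDS segment, given segment lengths in transcription order.

    The first segment's phase comes from codon_start; each later segment
    skips the bases needed to finish the codon left open by the previous one.
    """
    phases: List[int] = []
    phase = codon_start_to_phase(codon_start)
    for length in segment_lengths:
        phases.append(phase)
        # Bases at the end of this segment that form an incomplete codon.
        leftover = (length - phase) % 3
        # The next segment must supply the rest of that codon before a new one starts.
        phase = (3 - leftover) % 3
    return phases


print(codon_start_to_phase(1))         # 0
print(cds_segment_phases([10, 8, 9]))  # [0, 2, 0]
```

On the minus strand the phase is still counted from the feature's 5' end in the direction of transcription, so segments should be passed in that order rather than by ascending coordinate.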
from bcbb.
Thanks Brad. I will contact them and will let you know asap.
.''. Hugo A. M. Torres : :' :
. ' “Talk is cheap,
- show me the code. ” -- L. Torvalds.
Hi Brad, perhaps this might be useful for testing your program:
http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online
I tried it, and one of the things the tool points out is that the produced GFF3 file contains features of type "source". IIRC, Peter Cock says in one of his blog posts that GenBank has these but GFF3 does not.
Here is a sample report:
GFF3 File Validation Report
ontology_file(s):
http://song.cvs.sourceforge.net/*checkout*/song/ontology/so.obo
generated: 12-Mar-12 15:27:10
###############################################################################
THIS FILE HAS NOT BEEN VALIDATED, IT CONTAINS ERRORS, PLEASE REVIEW REPORT!
(NO WARNINGS HAVE BEEN ISSUED FOR THIS FILE)
###############################################################################
###############################################################################
THIS FILE HAS BEEN PROCESSED ENTIRELY AND ALL ERRORS/WARNINGS ARE REPORTED!
###############################################################################
First 10 lines of the analyzed GFF3 file follows:
[line 1]> ##gff-version 3
[line 2]> ##sequence-region NG_017013.1 1 26144
[line 3]> NG_017013.1 annotation remark 1 26144 .
[line 3]> . . comment=REVIEWED%20REFSEQ%3A%20This%20record%20has%20been%20curated%20by%20NCBI%20staff%20in%0Acollaboration%20with%20Graham%20Taylor.%20The%20reference%20sequence%20was%0Aderived%20from%20AC087388.9%20and%20AC007421.13.%0AThis%20sequence%20is%20a%20reference%20standard%20in%20the%20RefSeqGene%20project.%0APublication%20Note%3A%20%20This%20RefSeq%20record%20includes%20a%20subset%20of%20the%0Apublications%20that%20are%20available%20for%20this%20gene.%20Please%20see%20the%20Gene%0Arecord%20to%20access%20additional%20publications.%0ASummary%3A%20This%20gene%20encodes%20tumor%20protein%20p53%2C%20which%20responds%20to%0Adiverse%20cellular%20stresses%20to%20regulate%20target%20genes%20that%20induce%20cell%0Acycle%20arrest%2C%20apoptosis%2C%20senescence%2C%20DNA%20repair%2C%20or%20changes%20in%0Ametabolism.%20p53%20protein%20is%20expressed%20at%20low%20level%20in%20normal%20cells%0Aand%20at%20a%20high%20level%20in%20a%20variety%20of%20transformed%20cell%20lines%2C%20where%0Ait%27s%20believed%20to%20contribute%20to%20transformation%20and%20malignancy.%20p53%0Ais%20a%20DNA-binding%20protein%20containing%20transcription%20activation%2C%0ADNA-binding%2C%20and%20oligomerization%20domains.%20It%20is%20postulated%20to%20bind%0Ato%20a%20p53-binding%20site%20and%20activate%20expression%20of%20downstream%20genes%0Athat%20inhibit%20growth%20and/or%20invasion%2C%20and%20thus%20function%20as%20a%20tumor%0Asuppressor.%20Mutants%20of%20p53%20that%20frequently%20occur%20in%20a%20number%20of%0Adifferent%20human%20cancers%20fail%20to%20bind%20the%20consensus%20DNA%20binding%0Asite%2C%20and%20hence%20cause%20the%20loss%20of%20tumor%20suppressor%20activity.%0AAlterations%20of%20this%20gene%20occur%20not%20only%20as%20somatic%20mutations%20in%0Ahuman%20malignancies%2C%20but%20also%20as%20germline%20mutations%20in%20some%0Acancer-prone%20families%20with%20Li-Fraumeni%20syndrome.%20Multiple%20p53%0Avariants%20due%20to%20alternative%20promoters%20and%20multiple%20alternative%0Asplicing%20have%20been%20found.%20These%20variants%20encode%20distinct%20isoforms%2C%0Awhich%20can%20regulate%20p53%20transcriptional%20activity.%20%5Bprovided%20by%0ARefSeq%2C%20Jul%202008%5D.;
[line 3]> sequence_version=1;source=Homo%20sapiens%20%28human%29;
[line 3]> taxonomy=Eukaryota,Metazoa,Chordata,
[line 3]> Craniata,Vertebrata,Euteleostomi,
[line 3]> Mammalia,Eutheria,Euarchontoglires,
[line 3]> Primates,Haplorrhini,Catarrhini,
[line 3]> Hominidae,Homo;keywords=RefSeqGene;
[line 3]> references=location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Marcel%2CV.%2C%20Tran%2CP.L.%2C%20Sagne%2CC.%2C%20Martel-Planche%2CG.%2C%20Vaslin%2CL.%2C%20Teulade-Fichou%2CM.P.%2C%20Hall%2CJ.%2C%20Mergny%2CJ.L.%2C%20Hainaut%2CP.%20and%20Van%20Dyck%2CE.%0Atitle%3A%20G-quadruplex%20structures%20in%20TP53%20intron%203%3A%20role%20in%20alternative%20splicing%20and%20in%20production%20of%20p53%20mRNA%20isoforms%0Ajournal%3A%20Carcinogenesis%2032%20%283%29%2C%20271-278%20%282011%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%2021112961%0Acomment%3A,
[line 3]> location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Naidu%2CS.R.%2C%20Love%2CI.M.%2C%20Imbalzano%2CA.N.%2C%20Grossman%2CS.R.%20and%20Androphy%2CE.J.%0Atitle%3A%20The%20SWI/SNF%20chromatin%20remodeling%20subunit%20BRG1%20is%20a%20critical%20regulator%20of%20p53%20necessary%20for%20proliferation%20of%20malignant%20cells%0Ajournal%3A%20Oncogene%2028%20%2827%29%2C%202492-2501%20%282009%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%2019448667%0Acomment%3A,
[line 3]> location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Anczukow%2CO.%2C%20Ware%2CM.D.%2C%20Buisson%2CM.%2C%20Zetoune%2CA.B.%2C%20Stoppa-Lyonnet%2CD.%2C%20Sinilnikova%2CO.M.%20and%20Mazoyer%2CS.%0Atitle%3A%20Does%20the%20nonsense-mediated%20mRNA%20decay%20mechanism%20prevent%20the%20synthesis%20of%20truncated%20BRCA1%2C%20CHK2%2C%20and%20p53%20proteins%3F%0Ajournal%3A%20Hum.%20Mutat.%2029%20%281%29%2C%2065-73%20%282008%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%2017694537%0Acomment%3A,
[line 3]> location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Bourdon%2CJ.C.%0Atitle%3A%20p53%20Family%20isoforms%0Ajournal%3A%20Curr%20Pharm%20Biotechnol%208%20%286%29%2C%20332-336%20%282007%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%2018289041%0Acomment%3A%20Review%20article,
[line 3]> location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Murray-Zmijewski%2CF.%2C%20Lane%2CD.P.%20and%20Bourdon%2CJ.C.%0Atitle%3A%20p53/p63/p73%20isoforms%3A%20an%20orchestra%20of%20isoforms%20to%20harmonise%20cell%20differentiation%20and%20response%20to%20stress%0Ajournal%3A%20Cell%20Death%20Differ.%2013%20%286%29%2C%20962-972%20%282006%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%2016601753%0Acomment%3A%20Review%20article,
[line 3]> location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Flaman%2CJ.M.%2C%20Waridel%2CF.%2C%20Estreicher%2CA.%2C%20Vannier%2CA.%2C%20Limacher%2CJ.M.%2C%20Gilbert%2CD.%2C%20Iggo%2CR.%20and%20Frebourg%2CT.%0Atitle%3A%20The%20human%20tumour%20suppressor%20gene%20p53%20is%20alternatively%20spliced%20in%20normal%20cells%0Ajournal%3A%20Oncogene%2012%20%284%29%2C%20813-818%20%281996%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%208632903%0Acomment%3A,
[line 3]> location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Lamb%2CP.%20and%20Crawford%2CL.%0Atitle%3A%20Characterization%20of%20the%20human%20p53%20gene%0Ajournal%3A%20Mol.%20Cell.%20Biol.%206%20%285%29%2C%201379-1385%20%281986%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%202946935%0Acomment%3A,
[line 3]> location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Harlow%2CE.%2C%20Williamson%2CN.M.%2C%20Ralston%2CR.%2C%20Helfman%2CD.M.%20and%20Adams%2CT.E.%0Atitle%3A%20Molecular%20cloning%20and%20in%20vitro%20expression%20of%20a%20cDNA%20clone%20for%20human%20cellular%20tumor%20antigen%20p53%0Ajournal%3A%20Mol.%20Cell.%20Biol.%205%20%287%29%2C%201601-1610%20%281985%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%203894933%0Acomment%3A,
[line 3]> location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Zakut-Houri%2CR.%2C%20Bienz-Tadmor%2CB.%2C%20Givol%2CD.%20and%20Oren%2CM.%0Atitle%3A%20Human%20p53%20cellular%20tumor%20antigen%3A%20cDNA%20sequence%20and%20expression%20in%20COS%20cells%0Ajournal%3A%20EMBO%20J.%204%20%285%29%2C%201251-1255%20%281985%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%204006916%0Acomment%3A,
[line 3]> location%3A%20%5B0%3A26144%5D%0Aauthors%3A%20Matlashewski%2CG.%2C%20Lamb%2CP.%2C%20Pim%2CD.%2C%20Peacock%2CJ.%2C%20Crawford%2CL.%20and%20Benchimol%2CS.%0Atitle%3A%20Isolation%20and%20characterization%20of%20a%20human%20p53%20cDNA%20clone%3A%20expression%20of%20the%20human%20p53%20gene%0Ajournal%3A%20EMBO%20J.%203%20%2813%29%2C%203257-3262%20%281984%29%0Amedline%20id%3A%20%0Apubmed%20id%3A%206396087%0Acomment%3A;
[line 3]> accessions=NG_017013;data_file_division=PRI;
[line 3]> date=19-FEB-2012;organism=Homo%20sapiens;
[line 3]> gi=293651587
[line 4]> NG_017013.1 feature source 1 26144 . + .
[line 4]> db_xref=taxon%3A9606;mol_type=genomic%20DNA;
[line 4]> organism=Homo%20sapiens;chromosome=17;
[line 4]> map=17p13.1
[line 5]> NG_017013.1 feature gene 1 6475 . - .
[line 5]> note=WD%20repeat%20containing%2C%20antisense%20to%20TP53;
[line 5]> db_xref=GeneID%3A55135,HGNC%3A25522,
[line 5]> MIM%3A612661;gene=WRAP53;gene_synonym=DKCB3%3B%20TCAB1%3B%20WDR79
[line 6]> NG_017013.1 feature mRNA 2845 6475 . - .
[line 6]> db_xref=GI%3A221136857,GeneID%3A55135,
[line 6]> HGNC%3A25522,MIM%3A612661;product=WD%20repeat%20containing%2C%20antisense%20to%20TP53%2C%20transcript%20variant%202;
[line 6]> transcript_id=NM_001143990.1;inference=similar%20to%20RNA%20sequence%2C%20mRNA%20%28same%20species%29%3ARefSeq%3ANM_001143990.1;
[line 6]> exception=annotated%20by%20transcript%20or%20proteomic%20data;
[line 6]> gene=WRAP53;gene_synonym=DKCB3%3B%20TCAB1%3B%20WDR79;
[line 6]> ID=NM_001143990.1
[line 7]> NG_017013.1 feature mRNA 2845 2956 . - .
[line 7]> Parent=NM_001143990.1
[line 8]> NG_017013.1 feature mRNA 3224 3322 . - .
[line 8]> Parent=NM_001143990.1
[line 9]> NG_017013.1 feature mRNA 3467 3898 . - .
[line 9]> Parent=NM_001143990.1
[line 10]> NG_017013.1 feature mRNA 6322 6475 . - .
[line 10]> Parent=NM_001143990.1
...
Line Number Error/Warning
4 [ERROR] invalid type (type: source)
7 [ERROR] invalid type pair - check all parents (at line 6; mRNA to mRNA)
12 [ERROR] invalid type pair - check all parents (at line 11; mRNA to mRNA)
17 [ERROR] invalid type pair - check all parents (at line 16; mRNA to mRNA)
22 [ERROR] invalid type pair - check all parents (at line 21; mRNA to mRNA)
26 [ERROR] invalid type pair - check all parents (at line 25; CDS to CDS)
30 [ERROR] invalid type pair - check all parents (at line 29; CDS to CDS)
34 [ERROR] invalid type pair - check all parents (at line 33; CDS to CDS)
38 [ERROR] invalid type pair - check all parents (at line 37; CDS to CDS)
44 [ERROR] invalid type pair - check all parents (at line 43; mRNA to mRNA)
56 [ERROR] invalid type pair - check all parents (at line 55; mRNA to mRNA)
69 [ERROR] invalid type pair - check all parents (at line 68; mRNA to mRNA)
82 [ERROR] invalid type pair - check all parents (at line 81; mRNA to mRNA)
94 [ERROR] invalid type pair - check all parents (at line 93; mRNA to mRNA)
113 [ERROR] invalid type pair - check all parents (at line 112; CDS to CDS)
124 [ERROR] invalid type pair - check all parents (at line 123; CDS to CDS)
135 [ERROR] invalid type pair - check all parents (at line 134; CDS to CDS)
145 [ERROR] invalid type pair - check all parents (at line 144; CDS to CDS)
162 [ERROR] invalid type pair - check all parents (at line 161; CDS to CDS)
171 [ERROR] invalid type pair - check all parents (at line 170; mRNA to mRNA)
180 [ERROR] invalid type pair - check all parents (at line 179; mRNA to mRNA)
189 [ERROR] invalid type pair - check all parents (at line 188; mRNA to mRNA)
206 [ERROR] invalid type pair - check all parents (at line 205; CDS to CDS)
214 [ERROR] invalid type pair - check all parents (at line 213; CDS to CDS)
221 [ERROR] invalid type pair - check all parents (at line 220; CDS to CDS)
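Apart from the "source" error, every error in this report follows one pattern: a sub-feature line keeps its parent's type, so an mRNA gets mRNA children and a CDS gets CDS children. The Python sketch below is a simplified, hypothetical check for just that pattern. It does not consult the Sequence Ontology the way the online validator does, and `same_type_parent_pairs` is an illustrative name, not part of bcbb or GenomeTools.

```python
from typing import Dict, List, Tuple


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs: Dict[str, str] = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def same_type_parent_pairs(gff_lines: List[str]) -> List[Tuple[int, int, str]]:
    """Return (child_line, parent_line, type) for children that share their parent's type.

    Line numbers are 1-based positions in gff_lines. Assumes parents appear
    before their children and that each ID sits on a single line.
    """
    id_to_feature: Dict[str, Tuple[int, str]] = {}
    problems: List[Tuple[int, int, str]] = []
    for lineno, line in enumerate(gff_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a tab-separated feature line
        ftype, attrs = cols[2], parse_attributes(cols[8])
        if "ID" in attrs:
            id_to_feature[attrs["ID"]] = (lineno, ftype)
        # A Parent attribute may list several comma-separated IDs.
        for parent_id in attrs.get("Parent", "").split(","):
            parent = id_to_feature.get(parent_id)
            if parent is not None and parent[1] == ftype:
                problems.append((lineno, parent[0], ftype))
    return problems
```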
Hugo;
Thanks for this. The validator is complaining because 'source' is not a term in the Sequence Ontology. Mapping GenBank to SO is a fairly large problem. I tried to tackle it a few years back, but it ended up being too much work. Here's the progress I made:
http://bcbio.wordpress.com/2008/12/14/standard-ontologies-in-biosql/
In practice, most tools do not enforce this requirement, so since I couldn't map everything, I took the approach of keeping the output GFF similar to the input GenBank. If you want to take on a mapping of GenBank to the Sequence Ontology, I'd be happy to incorporate it.
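For the specific 'source' error, one narrow workaround is to rename a few GenBank feature keys when writing column 3. This is a minimal, hypothetical Python sketch, not the bcbb converter: the mapping table is an illustrative assumption that covers only the source feature (NCBI's own GFF3 output uses the SO term `region` for it) and passes every other type through unchanged, so it is nowhere near the full GenBank-to-SO mapping discussed above.

```python
from typing import Dict

# Illustrative subset only: GenBank feature key -> Sequence Ontology term.
# Keys not listed here (gene, mRNA, CDS, ...) are passed through unchanged.
GENBANK_TO_SO: Dict[str, str] = {
    "source": "region",
}


def remap_feature_type(gff_line: str, mapping: Dict[str, str] = GENBANK_TO_SO) -> str:
    """Rewrite column 3 (type) of a tab-separated GFF3 feature line using `mapping`."""
    if gff_line.startswith("#"):
        return gff_line  # leave directives and comments alone
    cols = gff_line.split("\t")
    if len(cols) == 9:
        cols[2] = mapping.get(cols[2], cols[2])
    return "\t".join(cols)


line = "NG_017013.1\tfeature\tsource\t1\t26144\t.\t+\t.\tmol_type=genomic%20DNA"
print(remap_feature_type(line).split("\t")[2])  # region
```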
Is GenomeTools requiring the ontology matches, or just that online validator?
Hi Brad,
Is GenomeTools requiring the ontology matches, or just that online validator?
Hmm, it seems to be only the validator. GenomeTools only seems to be
complaining about the "phase" field.
I have already posted your considerations on their issue tracker and
will let you know what they say when I get a reply. In any case,
thanks for the time you spent looking at my problem.
Thanks Hugo -- let me know if there ends up being anything I can change on my end to improve the phase information. Hopefully that'll do it and get things working smoothly with GenomeTools. Thanks for your patience with this.
Hugo;
I'm going to close this to clean up the issues. Hopefully everything was solved on the GenomeTools side. Thanks