Comments (19)
The with/from field qualifies (or was originally intended to qualify) the evidence, so the individuals should be collapsed, but each ref/evidence_code/with is a separate piece of evidence.
The one caveat to that is that I thought we had decided to express binding annotations in a formally correct way, with the bound entity being an input. Then we would spit them back out as they currently are in the with field. @vanaukenk is that still the plan?
from gocamgen.
@ukemi
Yes, that is still the plan for protein binding annotations. If we hadn't done this for protein binding, then not only would we have been inconsistent with GO-CAM, but we would not have been able to collapse the evidence for this particular term.
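A minimal sketch of the round trip described above, assuming a plain dict stands in for the model: the bound entity from the with/from field becomes a has_input on the way in, and is written back out as the with/from value on export. The function names and dict keys are hypothetical and are not taken from gocamgen.

```python
def to_model_fragment(gene_product, term, with_from):
    """Model the bound entity as a has_input of the binding activity.

    The dict layout is an illustrative assumption, not the real model format.
    """
    return {"enabled_by": gene_product, "activity": term, "has_input": with_from}


def to_gpad_fields(fragment):
    """Write the has_input entity back out as the with/from value."""
    return {
        "gene_product": fragment["enabled_by"],
        "term": fragment["activity"],
        "with_from": fragment["has_input"],
    }


# Identifiers taken from the MGI example lines in this thread.
fragment = to_model_fragment("MGI:1340046", "GO:0005515", "UniProtKB:Q8K1S1")
exported = to_gpad_fields(fragment)
```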
OK, I think we can still work with that protein binding caveat. I now see the protein binding section on the wiki.
So basically, DO split out distinct with/from values into multiple assertions (translated with different has_input entities) IF the primary term is protein binding (and descendants or just GO:0005515?), ELSE collapse these different with/from values onto the same assertion individual.
@ukemi @vanaukenk Sound correct?
Yes. That sounds correct.
Cool, thanks! I updated the header/line thingy in my first comment to clarify how with/from is being handled.
We will want to split out distinct With/From values into multiple assertions for GO:0005515 and also its children, as we may have annotations to terms like 'protein kinase binding' GO:0019901 that still refer to different entities in the With/From field.
I'll update the protein binding section of the wiki to make it clearer which of the options we chose.
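One way the "GO:0005515 and its children" check could be sketched, assuming the importer has some child-to-parents map available. `TOY_PARENTS` below is a hand-written, simplified stand-in for the ontology (it links GO:0019901 directly to GO:0005515 for brevity), and `is_protein_binding` is a hypothetical helper, not a gocamgen function.

```python
PROTEIN_BINDING = "GO:0005515"

# Hypothetical, simplified child -> parents map for illustration only.
# A real importer would load these relationships from the ontology.
TOY_PARENTS = {
    "GO:0019901": {"GO:0005515"},  # protein kinase binding (simplified link)
    "GO:0005515": set(),           # protein binding
    "GO:0005739": set(),           # mitochondrion
}


def is_protein_binding(term, parents=TOY_PARENTS):
    """Return True if term is GO:0005515 or has it as an ancestor."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current == PROTEIN_BINDING:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False
```

With this map, `is_protein_binding("GO:0019901")` is true, so annotations to 'protein kinase binding' would keep distinct With/From values as separate assertions, while a term like GO:0005739 would not.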
I've updated the import rules section for protein binding. Please let me know if anything is unclear or doesn't look right to you:
http://wiki.geneontology.org/index.php/Noctua_MOD_Imports#Protein_Binding_Annotations
Thx.
Thanks @vanaukenk! That definitely is more straightforward. I just wanted to make sure I understand the last point:
"Evidence will not be combined for annotations to protein binding or its children"
So multiple GPAD lines with the same GP-term-with/from-etc values (the header fields above) won't be collapsed into the same assertion if their evidence code/reference values (the line fields) vary? More simply, each protein binding GPAD line will have its own assertion individual in GO-CAM?
Is this true even for annotation lines that have the same value in the 'with' field?
Ex:
MGI MGI:1340046 enables GO:0005515 MGI:MGI:4845793|PMID:21068328 ECO:0000353 UniProtKB:Q8K1S1 20130311 MGI
MGI MGI:1340046 enables GO:0005515 MGI:MGI:4441002|PMID:20220021 ECO:0000353 UniProtKB:Q8K1S1 20130311 MGI
@dustine32
Yes, we can combine protein binding GPAD lines if they have the same GP-term-with/from but different references. @ukemi - is that what you meant to illustrate above?
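The collapsing rule agreed on here can be sketched as a grouping key. This is only an illustration under stated assumptions: `Annotation`, `BINDING_TERMS`, `assertion_key`, and `group_into_assertions` are hypothetical names, and `BINDING_TERMS` is a hard-coded stand-in for "GO:0005515 and its descendants". For binding terms the with/from value is part of the key, so different bound entities yield separate assertions. For all other terms it is left out, so evidence with different with/from values lands on one assertion.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    relation: str
    term: str
    references: str
    evidence_code: str
    with_from: str


# Hypothetical stand-in for "protein binding and its descendants".
BINDING_TERMS = frozenset({"GO:0005515", "GO:0019901"})


def assertion_key(a):
    """Key identifying the assertion individual an annotation belongs to."""
    if a.term in BINDING_TERMS:
        # The bound entity distinguishes assertions for binding terms.
        return (a.gene_product, a.relation, a.term, a.with_from)
    return (a.gene_product, a.relation, a.term)


def group_into_assertions(annotations):
    """Collect annotations sharing a key; each group is one assertion."""
    groups = defaultdict(list)
    for a in annotations:
        groups[assertion_key(a)].append(a)
    return dict(groups)


# The two MGI lines quoted in this thread: same with/from, different references.
mgi_lines = [
    Annotation("MGI:1340046", "enables", "GO:0005515",
               "MGI:MGI:4845793|PMID:21068328", "ECO:0000353", "UniProtKB:Q8K1S1"),
    Annotation("MGI:1340046", "enables", "GO:0005515",
               "MGI:MGI:4441002|PMID:20220021", "ECO:0000353", "UniProtKB:Q8K1S1"),
]
groups = group_into_assertions(mgi_lines)
```

Both MGI lines share a key, so they end up as two pieces of evidence on a single assertion.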
Another potential illustration.
These three annotations could be combined as evidence for a single Noctua instance since they refer to the same binding (EGL-1 binds CED-9) but just cite different references (and have different annotation dates).
Whereas in this example (just looking at the WB annotations, ignore the SWIE one), we would have two separate protein binding annotations:
One that combines three pieces of evidence with WB:WBGene00000418 in the With/From field, and one with a single piece of evidence and WB:WBGene00001170.
Here's an example where it looks like we could combine evidence for a cellular component annotation, but haven't yet:
ced-9 (WB:WBGene00000423)
lat-1 (WB:WBGene00002251) is another example of two CC annotations whose evidence could be merged into a single individual.
@vanaukenk Oh ok, that's what I was thinking too. Differing references alone shouldn't require multiple assertion individuals. I also forgot that all protein binding and descendant term annotations should be using the same IPI evidence code, so that removes one variable.
Thanks so much for clarifying!
@vanaukenk I think those two examples are now collapsing correctly. Here they are on my dev server:
- WB:WBGene00000423 to [GO:0005739] mitochondrion; also protein binding:
  http://68.181.46.18:8910/editor/graph/gomodel:7a272ea7-f66c-42eb-ba20-804f1c4cb815
- WB:WBGene00002251 to [GO:0005887] integral component of plasma membrane:
  http://68.181.46.18:8910/editor/graph/gomodel:d9e069b1-ea00-4403-9ee4-9acd0a866f64
@dustine32 - the two CC examples above are indeed now collapsing correctly. Thanks!
@vanaukenk have you set up a formal testing document? If not, I will have a shot at it.
Yes, I started with this spreadsheet here:
https://docs.google.com/spreadsheets/d/1XFuD6LOyFKXNk94jIK8zv1TrESfwCJo-RnrXQ3tzmJg/edit
@vanaukenk @ukemi The latest iterations of the WB and MGI models are now up on noctua-dev, so this can be tested there. They won't include the fix for the comma-separated with/from snafu that @ukemi pointed out here, but I've since fixed it on my USC server.
Here are some stats from the import attached to the PR.