Comments (8)

hdbeukel commented on September 16, 2024

There are actually even more evidence types missing (see the GO website):

  • Inferred from High Throughput Experiment (HTP)
  • Inferred from High Throughput Direct Assay (HDA)
  • Inferred from High Throughput Mutant Phenotype (HMP)
  • Inferred from High Throughput Genetic Interaction (HGI)
  • Inferred from High Throughput Expression Pattern (HEP)

from mini-ac.

nicomaper commented on September 16, 2024

Alright, but first we should find out why they are not in PLAZA, as there may be a reason for that. Perhaps the TAIR annotation was simply updated after the PLAZA release, in which case I would be in favor of adding them, but there may also be another reason (quality, etc.). Knowing this is important for deciding whether or not to include them.

hdbeukel commented on September 16, 2024

This is the reprocessed TAIR10 annotation file (BP, curated and experimental annotations only, extended to parental terms), now including the high-throughput experimental annotations: ath_BP_cur_exp_extended_tair10.txt (392,957 annotations).

The corresponding annotations as processed from PLAZA: ath_BP_cur_exp_extended_plaza.txt (394,411 annotations).

As you can see, they do differ somewhat. As expected, PLAZA contains annotations that are missing from TAIR10, but the reverse is also true. Ignoring evidence types, there are 358,530 (~90%) annotations in common between PLAZA and TAIR10. The number of specific annotations (present in one set but not in the other) is summarised in the table below.

| Number of specific annotations | ATXXX ids | non-ATXXX ids |
|---|---|---|
| PLAZA | 35,881 | 0 |
| TAIR10 | 25,443 | 8,984 |
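The comparison above boils down to set operations on (gene, GO term) pairs with evidence codes ignored. This is a minimal sketch, not the script actually used: it assumes each annotation is a (gene id, GO term, evidence code) tuple, the function name is hypothetical, and the regular expression for "ATXXX" ids (AT1G01010-style nuclear, chloroplast and mitochondrial gene ids) is an assumption about what that column counts.

```python
import re

# Assumed pattern for Arabidopsis gene ids such as AT1G01010
# (nuclear AT1G-AT5G, chloroplast ATCG, mitochondrial ATMG).
ATG_PATTERN = re.compile(r"^AT[1-5CM]G\d{5}$", re.IGNORECASE)


def compare_annotations(plaza, tair):
    """Compare two lists of (gene, GO term, evidence) tuples, ignoring evidence."""
    plaza_pairs = {(gene, term) for gene, term, _ in plaza}
    tair_pairs = {(gene, term) for gene, term, _ in tair}

    def split_by_id_type(pairs):
        # Split source-specific pairs into ATXXX and non-ATXXX gene ids.
        atg = sum(1 for gene, _ in pairs if ATG_PATTERN.match(gene))
        return {"ATXXX": atg, "non-ATXXX": len(pairs) - atg}

    return {
        "common": len(plaza_pairs & tair_pairs),
        "PLAZA": split_by_id_type(plaza_pairs - tair_pairs),
        "TAIR10": split_by_id_type(tair_pairs - plaza_pairs),
    }


# Toy input; "F12K11.1" stands in for a hypothetical non-ATXXX id.
plaza = [("AT1G01010", "GO:0006355", "IDA"), ("AT1G01020", "GO:0009733", "TAS")]
tair = [("AT1G01010", "GO:0006355", "IMP"), ("F12K11.1", "GO:0006355", "IEP")]
print(compare_annotations(plaza, tair))
# {'common': 1, 'PLAZA': {'ATXXX': 1, 'non-ATXXX': 0}, 'TAIR10': {'ATXXX': 0, 'non-ATXXX': 1}}
```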

We argued that leaving out the ~9k annotations on non-ATXXX ids that are unique to TAIR10 is desirable, but what about the >25k ATXXX gene annotations that are also unique to TAIR10? Should we include these as well, in addition to the PLAZA annotations?

hdbeukel commented on September 16, 2024

OK, so we decided to include all PLAZA annotations plus the ATxGxxx gene annotations from TAIR10 that are not in PLAZA. As the PLAZA v5 data was generated about three years ago, the missing annotations are likely new ones.
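The merge rule could be sketched like this, again assuming (gene, GO term, evidence) tuples and an assumed ATxGxxx id pattern. Reading "not in PLAZA" as "(gene, GO term) pair absent from PLAZA, regardless of evidence code" is also an assumption, and merge_annotations is a hypothetical name.

```python
import re

# Assumed pattern for ATxGxxx gene ids (e.g. AT1G01010).
ATG_PATTERN = re.compile(r"^AT[1-5CM]G\d{5}$", re.IGNORECASE)


def merge_annotations(plaza, tair):
    """Keep all PLAZA annotations and add TAIR10 annotations on ATxGxxx genes
    whose (gene, GO term) pair does not occur in PLAZA."""
    in_plaza = {(gene, term) for gene, term, _ in plaza}
    merged = list(plaza)
    for gene, term, evidence in tair:
        if ATG_PATTERN.match(gene) and (gene, term) not in in_plaza:
            # Every TAIR10 evidence code is kept; duplicates are resolved later.
            merged.append((gene, term, evidence))
    return merged
```

Keeping every TAIR10 evidence code for a new pair leaves the choice between them to the later duplicate-removal step.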

This would be the new annotation file for Arabidopsis: ath_go_gene_file.txt. @nicomaper, could you check it before I make a pull request?

Data has been extended to parental terms and filtered for:

  • BP only
  • Experimental and curator/authored evidence codes only
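Extending annotations to parental terms means copying each annotation to every ancestor of its GO term. A minimal sketch, assuming the GO graph has already been parsed into a hypothetical child -> parents dictionary (which relations to follow, e.g. is_a and part_of, is an assumption) and that the BP term ids and the allowed evidence codes are given as sets; filtering before propagating is also a choice made for this sketch.

```python
def ancestors(term, parents, cache):
    """Return all ancestors of `term` in a DAG given as child -> set of parents."""
    if term not in cache:
        found = set()
        for parent in parents.get(term, ()):
            found.add(parent)
            found |= ancestors(parent, parents, cache)
        cache[term] = found
    return cache[term]


def filter_and_extend(annotations, parents, bp_terms, allowed_codes):
    """Keep BP annotations with allowed evidence codes, then copy each one
    to every ancestor of its GO term (propagation to parental terms)."""
    cache = {}  # memoised ancestor sets, shared across annotations
    extended = []
    for gene, term, evidence in annotations:
        if term not in bp_terms or evidence not in allowed_codes:
            continue
        extended.append((gene, term, evidence))
        for ancestor in sorted(ancestors(term, parents, cache)):
            extended.append((gene, ancestor, evidence))
    return extended
```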

In case of duplicate annotations (same gene, same GO term), only the one with the highest-priority (most relevant) evidence code has been retained (exp > cur).
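One way to implement the duplicate removal, assuming experimental codes (including the high-throughput ones) rank above curator/author statement codes (IC, TAS, NAS), following GO's evidence code groups. The thread only states exp > cur, so treating all codes within a group as equal, and keeping the first annotation seen on ties, are arbitrary choices in this sketch.

```python
# Evidence code groups as defined by GO; experimental (including
# high-throughput) codes are ranked above author/curator statements.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP",
                "HTP", "HDA", "HMP", "HGI", "HEP"}
CURATED = {"IC", "TAS", "NAS"}


def priority(code):
    """Higher is more relevant: experimental > curated > anything else."""
    if code in EXPERIMENTAL:
        return 2
    if code in CURATED:
        return 1
    return 0


def deduplicate(annotations):
    """Keep one annotation per (gene, GO term), with the highest-priority code.
    On ties the first annotation seen is kept."""
    best = {}
    for gene, term, evidence in annotations:
        key = (gene, term)
        if key not in best or priority(evidence) > priority(best[key]):
            best[key] = evidence
    return [(gene, term, evidence) for (gene, term), evidence in best.items()]
```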

hdbeukel commented on September 16, 2024

As discussed, I will reprocess the file to remove GO terms with more than 1,000 annotated genes, to avoid testing very general terms for enrichment.

hdbeukel commented on September 16, 2024

@nicomaper, here is the file after filtering to retain only annotations to GO terms with fewer than 1,000 genes: ath_go_gene_file.txt.
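The term-size filter could be sketched as follows: count the distinct genes per GO term after propagation and keep only annotations to terms with fewer than max_genes genes. The function name, its max_genes parameter and the (gene, GO term, evidence) tuple format are assumptions.

```python
from collections import defaultdict


def filter_general_terms(annotations, max_genes=1000):
    """Drop annotations to GO terms annotated to max_genes or more distinct genes."""
    genes_per_term = defaultdict(set)
    for gene, term, _ in annotations:
        genes_per_term[term].add(gene)
    return [
        (gene, term, evidence)
        for gene, term, evidence in annotations
        if len(genes_per_term[term]) < max_genes
    ]
```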

hdbeukel commented on September 16, 2024

I have now also removed obsolete ids. If the GO tree provided a replaced_by id, the obsolete id was replaced with it; otherwise it was discarded.
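A minimal sketch of the obsolete-id handling, assuming the GO tree is read from an OBO-format file whose [Term] stanzas carry `is_obsolete: true` and `replaced_by:` tags. The parser only looks at those tags and keeps one replaced_by per term (the last one if several occur); both function names are hypothetical.

```python
def parse_obsolete(obo_lines):
    """Collect obsolete GO ids and their replaced_by targets from [Term] stanzas."""
    obsolete, replaced_by = set(), {}
    in_term, term_id = False, None
    for raw in obo_lines:
        line = raw.split("!")[0].strip()  # drop trailing OBO comments
        if line.startswith("["):
            in_term, term_id = line == "[Term]", None
        elif not in_term:
            continue  # skip [Typedef] and other stanzas
        elif line.startswith("id:"):
            term_id = line[len("id:"):].strip()
        elif term_id and line == "is_obsolete: true":
            obsolete.add(term_id)
        elif term_id and line.startswith("replaced_by:"):
            replaced_by[term_id] = line[len("replaced_by:"):].strip()
    return obsolete, replaced_by


def resolve_obsolete(annotations, obsolete, replaced_by):
    """Replace obsolete terms by their replaced_by id, or discard them if none."""
    resolved = []
    for gene, term, evidence in annotations:
        if term in obsolete:
            if term not in replaced_by:
                continue  # obsolete without replacement: discard
            term = replaced_by[term]
        resolved.append((gene, term, evidence))
    return resolved
```

Replacing ids can create new duplicates, which is one reason to run duplicate removal afterwards.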

Final GO-gene file: ath_go_gene_file.txt. It includes the PLAZA 5 annotations plus the TAIR10 ATxGxxx annotations not found in PLAZA.

Final applied filtering:

  • BP only
  • Experimental and curated evidence:
    • in order of increasing number of annotations: EXP, HDA, IC, IPI, HEP, NAS, IEP, TAS, IGI, IDA, IMP
  • Discarded/replaced obsolete ids
  • Replaced alternate ids with corresponding primary id
  • Propagated to parental terms (extended)
  • Removed duplicate annotations
  • Discarded very general annotations (GO terms with at least 1,000 annotated genes after propagation)
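The alternate-id step and the evidence-code filter could be sketched like this, assuming an OBO file with `alt_id:` tags in its [Term] stanzas. The eleven codes listed above are used as an allow-list here; the thread lists them as the codes present in the final file, so applying them as a filter is an assumption, as are the function names.

```python
# Evidence codes listed for the final file; used here as an allow-list.
RETAINED_CODES = {"EXP", "HDA", "IC", "IPI", "HEP", "NAS",
                  "IEP", "TAS", "IGI", "IDA", "IMP"}


def parse_alt_ids(obo_lines):
    """Map every alt_id to the primary id of the [Term] stanza it belongs to."""
    alt_to_primary = {}
    in_term, term_id = False, None
    for raw in obo_lines:
        line = raw.strip()
        if line.startswith("["):
            in_term, term_id = line == "[Term]", None
        elif in_term and line.startswith("id:"):
            term_id = line[len("id:"):].strip()
        elif in_term and term_id and line.startswith("alt_id:"):
            alt_to_primary[line[len("alt_id:"):].strip()] = term_id
    return alt_to_primary


def normalise(annotations, alt_to_primary):
    """Keep retained evidence codes only and map alternate ids to primary ids."""
    return [
        (gene, alt_to_primary.get(term, term), evidence)
        for gene, term, evidence in annotations
        if evidence in RETAINED_CODES
    ]
```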

hdbeukel commented on September 16, 2024

After further discussion, we decided to keep all GO terms (except the BP root) in the annotation file. Updated file: ath_go_gene_file.txt.

Other properties have not changed (see above).

We will further investigate how to exclude generic terms from enrichment testing during the actual analysis; new options will be added to enricher for this.
