
monochrom's Introduction


Chromosome Ontology

Chromo (abbreviation CHR) is an automatically derived ontology of chromosomes and chromosome parts

This ontology may eventually be housed at http://obofoundry.org/ontology/chr

Currently we use obolibrary PURLs, but this could potentially be changed to e.g. w3ids, depending on discussion re databases in OBO

Until this is released, you can browse either:

  • chr.owl -- multiple species in one ontology, with minimal imports merged
  • components -- one file per species, both OWL and YAML

About

This "ontology" is a direct conversion of metadata about chromosomes and chromosome bands obtained from UCSC chromosome and cytoband data

Each chromosome and chromosomal region is represented as an OWL class, with the following properties:

  • id/IRI
  • name
  • taxon
  • part-parent
  • coordinates + build
  • aliases and exact mappings (e.g. to NCBI/INSDC as well as ENSEMBL)

To browse the schema, see the schema docs

See the schema for more details.

The use cases for this "ontology" are:

  • To provide standardized identifiers and PURLs for chromosomes and chromosome parts
  • To be used as an import to other ontologies; e.g. to define trisomies for diseases
  • To provide a source of nodes in knowledge graphs
  • For text mining purposes
  • As a nexus for mapping efforts where other terminologies have incorporated chromosomes or their parts (NCIT, MESH, etc)
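
As an illustration of the first use case, here is a minimal sketch of how a CURIE could be expanded into an OBO-style PURL. The `http://purl.obolibrary.org/obo/PREFIX_LOCALID` pattern is the standard OBO one; the local identifier `9606-chr1` and the function name `curie_to_purl` are hypothetical and are not taken from this repository.

```python
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def curie_to_purl(curie: str) -> str:
    """Expand a CURIE such as 'CHR:abc' into an OBO-style PURL."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    # OBO PURLs join prefix and local ID with an underscore.
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


# 'CHR:9606-chr1' is a made-up identifier, used only for illustration.
print(curie_to_purl("CHR:9606-chr1"))
print(curie_to_purl("NCBITaxon:9606"))
```

If the project moves to w3ids instead, only the base URL and the joining rule would need to change.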

This ontology is intended primarily as a way to provide ontology edges for classes in disease and phenotype ontologies that must reference chromosomes, e.g. to define trisomies.

Note that unlike many ontologies, the ontology is not curated - it is a programmatic transform

There are some parallels to the OBO version of the NCBI taxonomy (http://obofoundry.org/ontology/ncbitaxon), in that we do not curate any ontological information; we simply perform a direct transform.

Unlike the NCBI Taxonomy, there is no class hierarchy for chromosomes and chromosome bands. Instead things are arranged as a partonomy

  • chr1
    • chr1p
      • chr1p1
        • chr1p11
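
The partonomy above can be sketched as a walk from a band up to its chromosome. This is only an illustration: it assumes UCSC-style band names that start with an arm letter (`p` or `q`) and that each parent is obtained by dropping the last character of the band name. The function name `band_ancestors` is hypothetical and the actual code in monochrom/ may work differently.

```python
def band_ancestors(chrom: str, band: str) -> list[str]:
    """Return the part-of chain for a band, nearest parent first.

    Assumes the band name starts with an arm letter, e.g. 'p11' or 'q21.3'.
    """
    if not band or band[0] not in "pq":
        raise ValueError(f"unexpected band name: {band!r}")
    chain = []
    current = band
    while len(current) > 1:
        # Drop the last character; also drop a dot left dangling at the end.
        current = current[:-1].rstrip(".")
        chain.append(chrom + current)
    chain.append(chrom)  # the whole chromosome is the root of the partonomy
    return chain


print(band_ancestors("chr1", "p11"))
# ['chr1p1', 'chr1p', 'chr1']
```

Each element of the returned list is part of the element that follows it, which mirrors the nesting shown in the list above.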

We deliberately do not create fake grouping classes such as "Human chromosome". Note that this ontology may therefore look unusual in ontology browsers, where there is an implicit assumption of some hierarchy.

Currently only a small number of genomes are provided - it should be relatively easy to extend this to other genomes so long as they are covered by UCSC.

Protege screenshot:

image

TODO

Align with karyotype ontology:

https://arxiv.org/pdf/1305.3758.pdf

Versions

The latest version of the ontology can always be found at:

http://purl.obolibrary.org/obo/chr.owl (once this ontology is registered)

(note this will not show up until the request has been approved by obofoundry.org)

Instructions for maintainers

From the top level of this repo:

pip install -r requirements.txt
make

This will update the monochrom component in src/ontology/components/ucsc.owl. To produce an official ODK release:

cd src/ontology
make prepare_release

The Makefile and the metadata file genomes.yaml drive the python code in monochrom/.

To add more genomes, it is necessary to extend both the Makefile and the genomes metadata file. This could be made more elegant in the future.

If you wish to modify the code, here is how it is structured, and the underlying philosophy.

Everything is driven by a LinkML schema, see schema

This defines a few core classes:

These have properties (slots) such as id, start, end, ...

The schema has extensive mappings to standard URIs either from OBO or from the wider world of semantics

The code monochrom.py takes care of

  • parsing files to the chromo datamodel
    • cytoBand.txt files - with the core coordinate info
    • chromAliases.txt files - with alternate names and mappings
  • additional processing
    • assigning synonyms
    • inferring parent bands and arms (UCSC files only give coordinates for most granular subdivisions)
    • validation, e.g. range checking
  • mapping the chromo datamodel
    • mapping to OWL
    • (TODO) mapping to KGX via Koza
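
The parsing, inference and validation steps listed above can be sketched as follows. This is a simplified illustration and not the code in monochrom.py: it assumes the five tab-separated columns of a UCSC cytoBand.txt file (chrom, start, end, band name, stain), the names `Band`, `parse_cytoband` and `infer_arm_spans` are hypothetical, and the sample rows use made-up coordinates.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Band:
    chrom: str
    start: int
    end: int
    name: str
    stain: str


def parse_cytoband(text: str) -> list[Band]:
    """Parse tab-separated cytoBand rows and range-check each one."""
    bands = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        chrom, start, end, name, stain = row[:5]
        band = Band(chrom, int(start), int(end), name, stain)
        if not 0 <= band.start < band.end:
            raise ValueError(f"bad range for {chrom}{name}: {start}-{end}")
        bands.append(band)
    return bands


def infer_arm_spans(bands: list[Band]) -> dict[tuple[str, str], tuple[int, int]]:
    """Infer arm coordinates as the span covering all bands on that arm."""
    spans: dict[tuple[str, str], tuple[int, int]] = {}
    for b in bands:
        if not b.name:
            continue  # rows without a band name carry no arm information
        key = (b.chrom, b.name[0])  # e.g. ('chr1', 'p')
        lo, hi = spans.get(key, (b.start, b.end))
        spans[key] = (min(lo, b.start), max(hi, b.end))
    return spans


# Made-up rows for illustration only; real files come from UCSC.
SAMPLE = "\n".join([
    "chr1\t0\t100\tp12\tgneg",
    "chr1\t100\t250\tp11\tgpos50",
    "chr1\t250\t400\tq11\tgneg",
])

print(infer_arm_spans(parse_cytoband(SAMPLE)))
```

The same min/max approach would extend to intermediate bands, since the UCSC files only give coordinates for the most granular subdivisions.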

Note that the chromo objects will naturally serialize to YAML. See the components/ directory for examples. We provide both OWL and YAML

The mapping to OWL is handled with relatively generic code that uses the slot and class URIs defined in the LinkML schema. In future we may instead emit a CSV and use ROBOT templates (mapping from LinkML to ROBOT templates is in the works).
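
To make the idea of a URI-driven mapping concrete, here is a rough sketch. It is not the project's implementation: the slot names and the `SLOT_URIS` table are assumptions standing in for what the LinkML schema declares, the example identifiers are made up, and the output is a list of simplified (subject, predicate, object) tuples rather than real OWL axioms, which would typically express part-of and taxon links as existential restrictions.

```python
# Hypothetical slot-to-predicate table; in the real project this information
# would come from the slot URIs declared in the LinkML schema.
SLOT_URIS = {
    "name": "rdfs:label",
    "parent": "BFO:0000050",  # part of
    "taxon": "RO:0002162",    # in taxon
}


def to_triples(obj: dict) -> list[tuple[str, str, str]]:
    """Turn a chromosome-part record into simplified triples.

    Slots without a declared URI (and empty values) are skipped, so the
    function itself contains no chromosome-specific logic.
    """
    subject = obj["id"]
    triples = []
    for slot, value in obj.items():
        predicate = SLOT_URIS.get(slot)
        if predicate is None or value is None:
            continue
        triples.append((subject, predicate, str(value)))
    return triples


# Made-up record for illustration only.
record = {
    "id": "CHR:example-chr1p",
    "name": "chr1p",
    "parent": "CHR:example-chr1",
    "taxon": "NCBITaxon:9606",
    "start": 0,
}
for triple in to_triples(record):
    print(triple)
```

Because the table is data rather than code, adding a slot to the schema would only require a new entry in the mapping.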

Contact

Please use this GitHub repository's Issue tracker to request new terms/classes or report errors or specific concerns related to the ontology.

Acknowledgements

This ontology repository was created using the ontology development kit

monochrom's People

Contributors

cmungall, matentzn, sabrinatoro


monochrom's Issues

Add: Ggal (chicken)

Note: in aves, the sex chroms are W and Z. I believe the code works here and does the right thing but let's add and check

document how ETL and ODK build compose

it's a bit clunky

the top level Makefile has targets to do ETL

  • fetch bands
  • make components/owl,yaml
  • make ucsc.owl and place in src/ontology

the Makefile in src/ontology is a standard odk one. chr-edit.owl is not generally edited, it is mostly an importer

note: if ucsc.owl changes, chr.owl does NOT get triggered automatically. I touch chr-edit.owl to trigger this.

Ensure we have latest builds

not sure of any way to automate easily

genomes.yaml:

name: foo
genomes:
  mm:
    taxon: NCBITaxon:10090
    name: Mouse
  hg:
    taxon: NCBITaxon:9606
    name: Human
  dm:
    taxon: NCBITaxon:7227
    name: Drosophila melanogaster
  danRer:
    taxon: NCBITaxon:7955
    name: Danio rerio
  rn:
    taxon: NCBITaxon:10116
    name: Rat
  ce:
    taxon: NCBITaxon:6239
    name: C elegans
  galGal:
    taxon: NCBITaxon:9031
    name: Chicken
taxons:
  NCBITaxon:10090:
    name: Mus musculus
  NCBITaxon:9606:
    name: Homo sapiens
  NCBITaxon:7227:
    name: Drosophila melanogaster
  NCBITaxon:7955:
    name: Danio rerio
  NCBITaxon:10116:
    name: Rattus norvegicus
  NCBITaxon:6239:
    name: Caenorhabditis elegans
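
A small consistency check over this metadata might look like the sketch below. It assumes, without confirmation from the code, that every taxon referenced under `genomes` is expected to have an entry under `taxons`; the function name `missing_taxon_entries` is hypothetical. The metadata is written as a Python dict holding an excerpt of the listing above, so the example runs without a YAML parser.

```python
def missing_taxon_entries(metadata: dict) -> list[str]:
    """List taxon IDs used by a genome but absent from the 'taxons' section."""
    known = set(metadata.get("taxons", {}))
    missing = []
    for genome in metadata.get("genomes", {}).values():
        taxon = genome["taxon"]
        if taxon not in known and taxon not in missing:
            missing.append(taxon)
    return missing


# Excerpt of the listing above, transcribed as a dict.
METADATA = {
    "genomes": {
        "hg": {"taxon": "NCBITaxon:9606", "name": "Human"},
        "mm": {"taxon": "NCBITaxon:10090", "name": "Mouse"},
        "galGal": {"taxon": "NCBITaxon:9031", "name": "Chicken"},
    },
    "taxons": {
        "NCBITaxon:9606": {"name": "Homo sapiens"},
        "NCBITaxon:10090": {"name": "Mus musculus"},
    },
}

print(missing_taxon_entries(METADATA))
# ['NCBITaxon:9031']
```

In the listing as given, the chicken genome appears under `genomes` but NCBITaxon:9031 has no entry under `taxons`, which is the kind of gap this check would report.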

some of these are copied from dipper. We have the most up-to-date chicken build. I just updated human with the latest from UCSC, but I assume the T2T build will be there soon

Add: Dmel

Dmel ingest doesn't work now because of different naming conventions

Review: hierarchy strategy

From the README:

There are some parallels to the OBO version of the NCBI taxonomy
(http://obofoundry.org/ontology/chr)[http://obofoundry.org/ontology/chr),
in that we do not curate any ontological information, we simply
perform a direct transform.

Unlike the NCBI Taxonomy, there is no class hierarchy for chromosomes and chromosome bands. Instead things are arranged as a partonomy

  • chr1
    • chr1p
      • chr1p1
        • chr1p11

We deliberately do not create fake grouping classes such as "Human
chromosome". Note that this ontology may therefore look unusual in
ontology browsers, where there is an implicit assumption of some
hierarchy.

Do we all agree with this strategy?

Replace dipper monochrom with CHR

  • Make sure that all species are covered
  • Make sure that all uses of monochrom are covered in CHR

@kshefchek maybe it's not in the dipper ingest, or not used in scigraph
@cmungall I suspect it's not used, nothing would happen.

We should decouple Monochrom from the Monarch KG

Release CHR on OBO (if OBO takes it, else w3id)
Then we can include the human part in Mondo. We need to be careful here: if CHR goes into Mondo and Mondo goes into the Monarch KG, there is already a monochrom in there, so beware of interactions.

Review: mitochondrial chromosomes

currently these go directly under 'chromosome':

image

they should be inferred to be under 'mitochondrial chromosome' when GO axioms are added, but should we just infer this in advance?

question (more for GO?)

Which of these:

  1. chromosome DisjointUnion autosome OR sex-chromosome; (entailment: mtChrom is-a autosome)
  2. chromosome DisjointUnion autosome OR sex-chromosome OR mt chromosome OR chloroplast chromosome

I think 2 seems more sane

Release monochrom

minimal for release

  • Create a GH repo using ODK
    • we would do a manual step of copying the ttl to this repo
  • Ensure all labels are included
  • Fix OWL errors (see below)

OWL error:

image

things to include in future

not urgent, just good to have them tracked

  • syntenic regions
  • chromosome positions
