pbg-ld's Introduction

pbg-ld: Linked Data Platform for Plant Breeding & Genomics

DOI Published in PeerJ CI

The pbg-ld software provides access to semantically integrated genotypic and phenotypic data on Solanaceae species (such as tomato and potato) and enables ranking of candidate genes associated with traits of interest.

Prerequisites

Install & deploy

1. Clone this repository.

git clone https://github.com/candYgene/pbg-ld.git

2. Start Docker service(s).

cd pbg-ld
# list available services
docker-compose config --services
# start all services or one-by-one
docker-compose up -d # or add [SERVICE]

Alternatively, deploy the services on a remote server using the Ansible playbook.

ansible-playbook -i inventory playbook.yml

Note: the grlc API can be deployed with SPARQL queries stored either

  • locally (in the container):
    git clone https://github.com/candYgene/queries.git
    docker cp queries grlc:/home/grlc/
  • remotely (in a GitHub repo).

Set the environment variables in docker-compose.yml:

  • GRLC_GITHUB_ACCESS_TOKEN
  • GRLC_SERVER_NAME (or CNAME, excluding the URI scheme http(s)://)
  • GRLC_SPARQL_ENDPOINT
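
Because GRLC_SERVER_NAME must not include the URI scheme, it can help to derive the value from a full URL instead of trimming it by hand. The following is a minimal sketch; the function name and the example URL are hypothetical and not part of pbg-ld or grlc.

```python
from urllib.parse import urlsplit


def server_name_from_url(url):
    """Return host[:port] of a URL, dropping the scheme and any path."""
    # Prefix scheme-less input with '//' so urlsplit treats it as a network location.
    parts = urlsplit(url if "://" in url else "//" + url)
    return parts.netloc


# Hypothetical address, used only for illustration.
print(server_name_from_url("https://example.org:8088/api"))
```

The printed value is what would go into GRLC_SERVER_NAME in docker-compose.yml.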

3. Access (meta)data in RDF.

Overview of datasets

RDF graphs: IRIs (A-Box)

  • SGN:
    • http://solgenomics.net/genome/Solanum_lycopersicum
    • http://solgenomics.net/genome/Solanum_pennellii
    • http://solgenomics.net/genome/Solanum_tuberosum
  • Ensembl:
    • http://plants.ensembl.org/Solanum_lycopersicum
    • http://plants.ensembl.org/Solanum_tuberosum
  • UniProt:
    • http://www.uniprot.org/proteomes/Solanum_lycopersicum
    • http://www.uniprot.org/proteomes/Solanum_tuberosum
  • QTLs: http://europepmc.org

RDF graphs: IRIs (T-Box)

  • FALDO: http://biohackathon.org/resource/faldo.rdf
  • SO[FA]: http://purl.obolibrary.org/obo/so.owl
  • SIO: http://semanticscience.org/ontology/sio.owl
  • RO: http://purl.obolibrary.org/obo/ro.owl
  • GO: http://purl.obolibrary.org/obo/go.owl
  • UniProt Core: http://purl.uniprot.org/core/
  • PO: http://purl.obolibrary.org/obo/po.owl
  • TO: http://purl.obolibrary.org/obo/to.owl
  • SPTO: http://purl.bioontology.org/ontology/SPTO
  • PATO: http://purl.obolibrary.org/obo/pato.owl

pbg-ld's Issues

No triples for the graph http://www.uniprot.org/proteomes/Solanum_tuberosum

Output of the test.py function

http://plants.ensembl.org/Solanum_tuberosum	11976134
http://solgenomics.net/genome/Solanum_pennellii	11330901
http://plants.ensembl.org/Solanum_lycopersicum	9467605
http://solgenomics.net/genome/Solanum_tuberosum	8894226
http://solgenomics.net/genome/Solanum_lycopersicum	7747361
http://www.uniprot.org/proteomes/Solanum_lycopersicum	4525375
http://purl.obolibrary.org/obo/go.owl	1405083
http://purl.obolibrary.org/obo/po.owl	60459
http://purl.obolibrary.org/obo/so.owl	41586
http://purl.obolibrary.org/obo/pato.owl	33947
http://purl.obolibrary.org/obo/to.owl	27020
http://semanticscience.org/ontology/sio.owl	15237
http://europepmc.org/articles	9063
http://purl.bioontology.org/ontology/SPTO	6537
http://purl.obolibrary.org/obo/ro.owl	5415
http://purl.uniprot.org/core/	2729
http://biohackathon.org/resource/faldo.rdf	232
http://www.uniprot.org/proteomes/Solanum_tuberosum	0
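
A report like the one above can be checked automatically for graphs that loaded no triples. This is a minimal sketch that assumes each line holds a graph IRI and a count separated by whitespace, as in the listing; the function name is hypothetical and this is not the project's test.py.

```python
def empty_graphs(report):
    """Return the graph IRIs whose triple count is zero."""
    empty = []
    for line in report.splitlines():
        if not line.strip():
            continue  # skip blank lines
        iri, count = line.rsplit(None, 1)  # split on the last run of whitespace
        if int(count) == 0:
            empty.append(iri)
    return empty


# Three lines copied from the report above.
report = (
    "http://purl.uniprot.org/core/\t2729\n"
    "http://biohackathon.org/resource/faldo.rdf\t232\n"
    "http://www.uniprot.org/proteomes/Solanum_tuberosum\t0\n"
)
print(empty_graphs(report))
```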

get-rdf.sh receives html for uniprot_core

The get-rdf.sh script appears to receive HTML rather than RDF for uniprot_core:

curl --stderr - -LH "Accept: application/rdf+xml" -o uniprot_core.rdf "http://purl.uniprot.org/core/"

This writes the following to uniprot_core.rdf:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <title>UniProt RDF schema ontology</title>
    <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
    ...
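
One way to catch this early is to inspect the first bytes of each downloaded file before loading it. The sketch below is only a heuristic under the assumption that an RDF/XML file never contains an `<html` tag near its start; the function name is hypothetical and not part of get-rdf.sh.

```python
def looks_like_html(head):
    """Heuristic: True if the leading bytes of a file contain an <html> tag."""
    return b"<html" in head.lower()


# Leading bytes as they appear in the excerpt above, and an RDF/XML start for contrast.
html_head = b'<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">'
rdf_head = b'<?xml version="1.0"?>\n<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
print(looks_like_html(html_head), looks_like_html(rdf_head))
```

In practice the argument would be the first kilobyte or so read from the downloaded file in binary mode.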

Broken links to PMC articles

see tomato_QTLs.ttl.gz file:
e.g. http://identifiers.org/pmc/3464107 -> http://identifiers.org/pmc/PMC3464107
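
The rewrite shown above can be expressed as a single regular-expression substitution. This is a minimal sketch, assuming the broken links always have the form http://identifiers.org/pmc/ followed by digits only; the function name is hypothetical.

```python
import re

# Matches identifiers.org PMC links whose identifier lacks the 'PMC' prefix.
_PMC_URI = re.compile(r"(http://identifiers\.org/pmc/)(\d+)")


def fix_pmc_uris(text):
    """Insert the missing 'PMC' prefix; already correct links are left alone."""
    return _PMC_URI.sub(r"\1PMC\2", text)


print(fix_pmc_uris("http://identifiers.org/pmc/3464107"))
```

Links that already carry the prefix do not match the pattern, so running the function twice gives the same result.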

Fix chromosome labels

graph URI: http://solgenomics.net/genome/Solanum_lycopersicum
rdfs:label chromosome SL2.50ch01 -> chromosome 1
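
The label change above could be applied with a small helper. This is a sketch under the assumption that the existing labels end in "ch" followed by a zero-padded chromosome number, with or without a leading "chromosome "; the function name is hypothetical.

```python
import re

# Optional 'chromosome ' prefix, an assembly tag such as 'SL2.50', then 'ch' and digits.
_LABEL = re.compile(r"(?:chromosome\s+)?\S*?ch(\d+)")


def chromosome_label(label):
    """Rewrite e.g. 'chromosome SL2.50ch01' to 'chromosome 1'."""
    match = _LABEL.fullmatch(label)
    if match is None:
        return label  # leave labels of any other shape unchanged
    return "chromosome {}".format(int(match.group(1)))


print(chromosome_label("chromosome SL2.50ch01"))
```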

Enable additional annotation tracks in JBrowse

Some flanking markers require additional validation & post-processing

For example, QTL:3464107_4_15 is associated with two (flanking) markers:

  • TG194-J1 tagged as
    • TG194
      • http://localhost:8890/genome/Solanum_lycopersicum/variation/gene72_0-i22; chromosome 11:2947789-2948311
      • http://localhost:8890/genome/Solanum_lycopersicum/variation/gene73_0-i22; chromosome 11:2947789-2948309
    • J1
      • http://localhost:8890/genome/Solanum_pennellii/variation/cLES-5-J1; chromosome 1:95577802-95577596 (which is orthologous to http://localhost:8890/genome/Solanum_lycopersicum/variation/gene358_0-i2; chromosome 1:85319942-85322549)

Example Query 7 is not working

Query 7 from the Examples

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX so: <http://purl.obolibrary.org/obo/so#>

SELECT
   str(?qtl_id) AS ?qtl_id
   str(?sgn_gene_id) AS ?sgn_gene_id
   str(?sgn_trans_id) AS ?sgn_trans_id
   str(?annot) AS ?annot
WHERE {
   GRAPH <http://europepmc.org/articles> {
      ?qtl a obo:SO_0000771 ;
         obo:RO_0003308 ?trait ;
         so:overlaps ?gene ;
         dcterms:identifier ?qtl_id .
      FILTER(?trait = obo:SP_0000366)
   }
   GRAPH <http://solgenomics.net/genome/Solanum_lycopersicum> {
      ?gene so:transcribed_to ?transcript ;
         dcterms:identifier ?sgn_gene_id .
      ?transcript rdfs:comment ?annot ;
         dcterms:identifier ?sgn_trans_id
   }
}
LIMIT 5

The query returns zero results.

Check Query for Uniprot

Please check if this Query should return something.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX so: <http://purl.obolibrary.org/obo/so#>
# added: the query uses these two prefixes but did not declare them
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX uniprot: <http://purl.uniprot.org/core/>

SELECT
   DISTINCT
   ?prot
   ?gene_name
WHERE {
   GRAPH <http://www.uniprot.org/proteomes/Solanum_lycopersicum> {
      ?prot uniprot:classifiedWith ?go ;
         uniprot:encodedBy/skos:prefLabel ?gene_name
   }
}
LIMIT 100

Failed to connect to host

Hello!

I was trying to access this tool and was wondering if someone could help me troubleshoot an issue.

When running the command:

ansible-playbook -i inventory playbook.yml

I encounter the error:

PLAY [all] ***********************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************************************************************************************************************************
fatal: [pbg-ld.candygene-nlesc.surf-hosted.nl]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: Could not resolve hostname pbg-ld.candygene-nlesc.surf-hosted.nl: Name or service not known", "unreachable": true}

PLAY RECAP ***********************************************************************************************************************************************************************************************************************
pbg-ld.candygene-nlesc.surf-hosted.nl : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0   

Is this because the host is temporarily down, or is the host no longer supported?
Alternatively, is the user meant to set up a database host?

Look forward to hearing from you!

Michael

Add SGN gene to InterPro/GO term mappings for tomato

ftp://ftp.solgenomics.net/genomes/Solanum_lycopersicum/annotation/ITAG2.4_release/ITAG2.4.go.csv
ftp://ftp.solgenomics.net/genomes/Solanum_lycopersicum/annotation/ITAG2.4_release/ITAG2.4_proteins_interproscan.tsv

Add updated Jupyter Notebooks

  • update/extend grlc API
  • replace the SPARQL queries (wrapper) with the Web API where possible
  • improve (meta)data in the tables
  • improve data visualization and/or interactivity
  • improve in-line documentation

The source code is here.

Normalize chromosome URIs in SGN graphs

For example, chromosome names (seqids) in PGSC_DM_V403_genes.gff and PGSC_DM_V403_DArT.gff: ST4.03ch00..ST4.03ch12
but in potato_69011SNPs_potato_dm_v4.03.gff3: chr00..chr12

As a result, the corresponding chromosome names or URIs in the tomato/potato RDF graphs must be normalized (e.g., .../chromosome/[0-9]+). This could be done on the intermediate SIGA.py *.db files.
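
The normalization could look like the following. This is a minimal sketch that assumes every seqid ends in "ch" or "chr" followed by digits, as in the two naming schemes above; the function name is hypothetical and the code is not part of SIGA.py.

```python
import re

# Any assembly prefix, then 'ch' or 'chr', then the chromosome number.
_SEQID = re.compile(r".*chr?(\d+)")


def chromosome_path(seqid):
    """Map e.g. 'ST4.03ch01' or 'chr01' to the URI suffix 'chromosome/1'."""
    match = _SEQID.fullmatch(seqid)
    if match is None:
        raise ValueError("unrecognized chromosome seqid: {!r}".format(seqid))
    return "chromosome/{}".format(int(match.group(1)))


print(chromosome_path("ST4.03ch12"), chromosome_path("chr12"))
```

Both naming schemes map to the same suffix, so features from the different GFF files end up on the same chromosome URI.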

Add RDF graphs for potato

  • QTLs
  • gene models/proteome:
    • ftp://ftp.solgenomics.net/genomes/Solanum_tuberosum/annotation/PGSC_4.03/PGSC_DM_V403_genes.gff.zip
    • ftp://ftp.ensemblgenomes.org/pub/plants/release-33/rdf/solanum_tuberosum/
    • https://www.uniprot.org/proteomes/UP000011115
  • genetic markers:
    • ftp://ftp.solgenomics.net/genomes/Solanum_tuberosum/annotation/PGSC_4.03/PGSC_DM_V403_DArT.gff.zip
    • ftp://ftp.solgenomics.net/genomes/Solanum_tuberosum/annotation/PGSC_4.03/potato_69011SNPs_potato_dm_v4.03.gff3.zip
  • update/link the RDF graphs

All example queries with the following graphs are not running

The error message is:

Virtuoso 42000 Error The estimated execution time 426 (sec) exceeds the limit of 400 (sec).

For example, Query 1:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo: <http://purl.obolibrary.org/obo/>

SELECT
   str(?feature_name) AS ?feature_name
   ?feature_id
   COUNT(*) AS ?n
WHERE {
   GRAPH <http://solgenomics.net/genome/Solanum_lycopersicum> {
   # http://solgenomics.net/genome/Solanum_pennellii
   # http://solgenomics.net/genome/Solanum_tuberosum
   # http://plants.ensembl.org/Solanum_lycopersicum
   # http://plants.ensembl.org/Solanum_tuberosum                             
      ?ft a ?feature_type .
      FILTER regex(?feature_type, obo:SO_) .
      BIND(concat('[', replace(replace(str(?feature_type), '.+\\/', ''), '_', ':'), '](', ?feature_type, ')') AS ?feature_id)
   }
   GRAPH <http://purl.obolibrary.org/obo/so.owl> {
      ?feature_type rdfs:label ?feature_name
   }
}
GROUP BY ?feature_name ?feature_id
ORDER BY DESC(?n)

Add DC terms identifiers for chromosomes

  • Solanum lycopersicum
  • S. tuberosum -> chromosome-level assemblies not available in ENA
  • S. pennellii

Use dcterms:identifier predicate with ENA:[accession]. See related issue here.

Issue with Ensembl Plants release-33

While running get_rdf.sh I noticed that it failed to download some data files:

curl --stderr - -LO "ftp://ftp.ensemblgenomes.org/pub/plants/release-${ENSEMBLPLANTS_RELEASE}/rdf/solanum_lycopersicum/solanum_lycopersicum.ttl.gz" \
  && echo "http://plants.ensembl.org/Solanum_lycopersicum" > solanum_lycopersicum.ttl.graph

Resulted in:

curl: (9) Server denied you to change to the given directory

Browsing to ftp://ftp.ensemblgenomes.org/pub/plants/, it seems that release-36 does work. So perhaps we just need to update the release number (but I don't know whether that will break something else).

Update README

  • add pip install ansible
  • clarify deployment alternatives: docker-compose vs. ansible
  • add keys to ssh-agent prior to using the ansible playbook

Example Query 4 not working

This query is not working; I think the UniProt data has been revised. No GO term is associated with ripening now.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX so: <http://purl.obolibrary.org/obo/so#>
PREFIX go: <http://www.geneontology.org/formats/oboInOwl#>

SELECT
   DISTINCT
   str(?gene_name) AS ?gene_name
   concat('[', ?sgn_gene_id, '](https://solgenomics.net/locus/Solyc00g005000.2/view)') AS ?sgn_gene_id
   concat('[', ?uniprot_acc, '](', ?prot, ')') AS ?uniprot_acc
   concat('[', ?uniprot_id, '](', ?prot, ')') AS ?uniprot_id
   str(?uniprot_des) AS ?uniprot_des
   str(?go_term) AS ?go_term
   concat('[', ?go_id, '](', ?go, ')') AS ?go_id
   str(?go_cat) AS ?go_cat
WHERE {
   GRAPH <http://www.uniprot.org/proteomes/Solanum_lycopersicum> {
      ?prot uniprot:classifiedWith ?go ;
          uniprot:encodedBy/skos:prefLabel ?gene_name
   }
   GRAPH <http://plants.ensembl.org/Solanum_lycopersicum> {
      ?prot dc:identifier ?uniprot_acc ;
          rdfs:label ?uniprot_id ;
          dc:description ?uniprot_des ;
          ^<http://rdf.ebi.ac.uk/terms/ensembl/CHECKSUM> ?ensembl_prot_id .
       ?ensembl_transcript_id so:translates_to ?ensembl_prot_id ;
          so:transcribed_from/dc:identifier ?sgn_gene_id .
   }
   GRAPH <http://purl.obolibrary.org/obo/go.owl> {
      ?go ?p ?o ;
         rdfs:label ?go_term ;
         go:id ?go_id ;
         go:hasOBONamespace ?go_cat .
      ?o bif:contains '( fruit AND ripening )' .
      FILTER regex(?go, obo:GO_)
   }
}
ORDER BY ?gene_name

Error The estimated execution time exceeds the limit of 400 (sec).

Virtuoso 42000 Error The estimated execution time 366929 (sec) exceeds the limit of 400 (sec).

Example Query 4

SPARQL query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX so: <http://purl.obolibrary.org/obo/so#>
PREFIX go: <http://www.geneontology.org/formats/oboInOwl#>

SELECT
   DISTINCT
   str(?gene_name) AS ?gene_name
   concat('[', ?sgn_gene_id, '](https://solgenomics.net/locus/Solyc00g005000.2/view)') AS ?sgn_gene_id
   concat('[', ?uniprot_acc, '](', ?prot, ')') AS ?uniprot_acc
   concat('[', ?uniprot_id, '](', ?prot, ')') AS ?uniprot_id
   str(?uniprot_des) AS ?uniprot_des
   str(?go_term) AS ?go_term
   concat('[', ?go_id, '](', ?go, ')') AS ?go_id
   str(?go_cat) AS ?go_cat
WHERE {
   GRAPH <http://www.uniprot.org/proteomes/Solanum_lycopersicum> {
      ?prot uniprot:classifiedWith ?go ;
          uniprot:encodedBy/skos:prefLabel ?gene_name
   }
   GRAPH <http://plants.ensembl.org/Solanum_lycopersicum> {
      ?prot dc:identifier ?uniprot_acc ;
          rdfs:label ?uniprot_id ;
          dc:description ?uniprot_des ;
          ^<http://rdf.ebi.ac.uk/terms/ensembl/CHECKSUM> ?ensembl_prot_id .
       ?ensembl_transcript_id so:translates_to ?ensembl_prot_id ;
          so:transcribed_from/dc:identifier ?sgn_gene_id .
   }
   GRAPH <http://purl.obolibrary.org/obo/go.owl> {
      ?go ?p ?o ;
         rdfs:label ?go_term ;
         go:id ?go_id ;
         go:hasOBONamespace ?go_cat .
      ?o bif:contains '( fruit AND ripening )' .
      FILTER regex(?go, obo:GO_)
   }
}
ORDER BY ?gene_name
