Open Targets BioCypher KG

This is a collection of BioCypher adapters and corresponding scripts for Open Targets platform data. It is a work in progress.

Installation

The project uses Poetry. You can install it like this:

git clone https://github.com/biocypher/open-targets.git
cd open-targets
poetry install

Poetry will create a virtual environment according to your configuration (either centrally or in the project folder). You can activate it by running poetry shell inside the project directory. Alternatively, you can use a different package manager to install the dependencies listed in pyproject.toml.

Note about pycurl

When running the script that combines this adapter with the UniProt adapter, you may encounter an error about the SSL backend in pycurl: ImportError: pycurl: libcurl link-time ssl backend (openssl) is different from compile-time ssl backend (none/other)

Should this happen, follow the fix described here: https://stackoverflow.com/questions/68167426/how-to-install-a-package-with-poetry-that-requires-cli-args. Run poetry shell followed by pip list and note the installed version of pycurl. Then run pip install --compile --install-option="--with-openssl" --upgrade --force-reinstall pycurl==<version> to rebuild pycurl with the correct SSL backend.
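The version lookup in the fix above can also be done from Python. The following is only a sketch: the helper names `installed_pycurl_version` and `pycurl_reinstall_command` are made up for illustration, and the script prints the command for review rather than running it.

```python
from __future__ import annotations

import shlex
from importlib import metadata


def installed_pycurl_version() -> str | None:
    """Return the installed pycurl version, or None if pycurl is absent."""
    try:
        return metadata.version("pycurl")
    except metadata.PackageNotFoundError:
        return None


def pycurl_reinstall_command(version: str) -> list[str]:
    """Build the pip command from the note above for one pycurl version."""
    return [
        "pip", "install", "--compile",
        "--install-option=--with-openssl",
        "--upgrade", "--force-reinstall",
        f"pycurl=={version}",
    ]


if __name__ == "__main__":
    version = installed_pycurl_version()
    if version is None:
        print("pycurl is not installed in this environment")
    else:
        # Printed, not executed, so the command can be checked first.
        print(shlex.join(pycurl_reinstall_command(version)))
```

Run it inside the Poetry environment (after poetry shell) so that it sees the same pycurl installation as the adapter scripts.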

Open Targets target-disease associations

Target-disease association evidence is available from the Open Targets website at https://platform.opentargets.org/downloads. The data can be downloaded in Parquet format, a columnar data format that is compatible with Spark and other big data tools. Currently, the data have to be downloaded manually (e.g. using the wget command supplied on the website) and placed in the data/ot_files directory. The adapter currently supports version 23.02 of the data.

Available datasets: Target, Disease/Phenotype, Drug, Target - gene ontology, Target - mouse phenotypes and Target - Disease Evidence. Caution: Target - Disease Evidence, which is the main source of target-disease interactions in the Open Targets Platform, is provided under two links, one for the literature evidence (literature/evidence) and one for the full aggregated set (simply evidence). The adapter uses the full set, so make sure to download the correct one.

The scripts directory contains a parquet_download.sh script that can be used to download the files (make sure to execute it in the correct folder, data/ot_files).
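After downloading, it can help to confirm that the files actually landed in data/ot_files. The sketch below counts .parquet files per subdirectory. It assumes that each dataset ends up in its own subdirectory, which depends on how the download was run, and the function name `count_parquet_files` is made up for illustration.

```python
from __future__ import annotations

from pathlib import Path


def count_parquet_files(root: Path) -> dict[str, int]:
    """Map each immediate subdirectory of root to its number of .parquet files."""
    counts: dict[str, int] = {}
    if not root.is_dir():
        return counts
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob also finds files in nested partition folders.
        counts[sub.name] = sum(1 for _ in sub.rglob("*.parquet"))
    return counts


if __name__ == "__main__":
    for name, n in count_parquet_files(Path("data/ot_files")).items():
        print(f"{name}: {n} parquet file(s)")
```

A subdirectory reported with zero files usually points to an interrupted or misdirected download.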

To transfer the columnar data to a knowledge graph, we use the adapter in adapters/target_disease_evidence_adapter.py, which is called from the script scripts/target_disease_script.py. This script produces a set of BioCypher-compatible files in the biocypher-out directory. Each individual output folder contains a version of the neo4j-admin import command for the processed data, in a file named neo4j-admin-import-call.sh. To create the knowledge graph from these files, execute this file in the home directory of the target database. More information about the BioCypher package can be found at https://biocypher.org.
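When several runs have accumulated, locating the right import script by hand gets tedious. The following sketch lists every neo4j-admin-import-call.sh found one level below biocypher-out, sorted by folder name. The function name `find_import_scripts` is made up for illustration, and whether name order matches run order depends on how the output folders are named.

```python
from __future__ import annotations

from pathlib import Path

SCRIPT_NAME = "neo4j-admin-import-call.sh"


def find_import_scripts(out_dir: str = "biocypher-out") -> list[Path]:
    """Return the import-call script of each output folder, sorted by path."""
    # A missing out_dir simply yields an empty list.
    return sorted(Path(out_dir).glob(f"*/{SCRIPT_NAME}"))


if __name__ == "__main__":
    for script in find_import_scripts():
        print(script)
```

The script only prints paths. Executing the chosen file from the home directory of the target database remains a manual step, as described above.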

Please note that, by default, the adapter will be in test mode, which means that it will only process a small subset of the data. To process the full data, you can set the test_mode parameter in the adapter to False (or remove it).
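To illustrate what a test mode of this kind typically does, here is a minimal sketch that caps an iterable of rows when a flag is set. It is not the adapter's actual implementation: the function `limit_rows` and the `test_size` parameter are hypothetical, and only the name test_mode comes from the text above.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def limit_rows(rows: Iterable[T], test_mode: bool = True,
               test_size: int = 100) -> Iterator[T]:
    """Yield at most test_size rows in test mode, otherwise all rows."""
    if test_mode:
        # islice stops early without reading the rest of the input.
        return islice(rows, test_size)
    return iter(rows)


if __name__ == "__main__":
    rows = range(1_000_000)
    print(len(list(limit_rows(rows, test_mode=True, test_size=5))))
    print(len(list(limit_rows(rows, test_mode=False))))
```

Truncating lazily in this way keeps test runs fast, because the remaining input is never read.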

Adapter combination: UniProt and Dependency Map

To demonstrate the combination of multiple adapters to yield a single harmonised knowledge graph, we add the UniProt adapter (created in the context of the CROssBAR v2 project) and the Dependency Map adapter to the target-disease knowledge graph creation script. The resulting script is scripts/target_disease_script_extended.py.

Please note that while the UniProt adapter downloads data directly from UniProt through pypath, the Dependency Map adapter is only functional for demonstration purposes, as it requires the availability of local data (which is limited to 100 entries for our demo case).
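The general idea of combining adapters can be sketched as concatenating the node streams of several sources into one. Everything below is hypothetical: `ToyAdapter`, `get_nodes` and `combined_nodes` are stand-ins for illustration and do not reflect the interfaces of the real Open Targets, UniProt or Dependency Map adapters. Nodes are assumed to be (id, label, properties) tuples.

```python
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

Node = Tuple[str, str, Dict[str, str]]


class ToyAdapter:
    """Hypothetical stand-in for an adapter that yields node tuples."""

    def __init__(self, label: str, ids: List[str]) -> None:
        self.label = label
        self.ids = ids

    def get_nodes(self) -> Iterator[Node]:
        for node_id in self.ids:
            yield (node_id, self.label, {})


def combined_nodes(adapters: Iterable[ToyAdapter]) -> Iterator[Node]:
    """Lazily concatenate the node streams of all adapters, in order."""
    return chain.from_iterable(a.get_nodes() for a in adapters)


if __name__ == "__main__":
    # The identifiers below are placeholders, not real data.
    adapters = [
        ToyAdapter("gene", ["gene:1", "gene:2"]),
        ToyAdapter("protein", ["protein:1"]),
    ]
    for node in combined_nodes(adapters):
        print(node)
```

Simple concatenation is only part of harmonisation. The sources also need to agree on identifiers and on how their node types map onto a shared schema.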
