chembl-rdf-queries

Introduction

This repository contains a list of SPARQL queries illustrating how to use the ChEMBL endpoint. If you are new to the semantic web, you can first follow a tutorial. If you already have some knowledge of SQL and relational databases, the transition to SPARQL should be easy: read the introduction below and you can get started right away!

RDF

RDF stands for Resource Description Framework. Briefly, it is a conceptual way of representing data as a graph, as opposed to relational databases, which focus on tables and their relations. Within RDF, information is encoded as triples, each made of a subject, a predicate and an object. The nodes and edges of the RDF graph are identified using Uniform Resource Identifiers (URIs, or web addresses). Representing data as a graph is interesting because it simplifies the integration of information from different sources. URIs guarantee the uniqueness of resources and allow you to explore them simply with your web browser.
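To make the idea of triples concrete, here is a minimal sketch in plain Python, with no RDF library involved. The molecule and predicate URIs are assumptions modelled on the ChEMBL RDF naming style and should be treated as placeholders; `match` is a hypothetical helper that mimics the way a triple pattern selects edges of the graph.

```python
# Assumed URIs, modelled on the ChEMBL RDF naming style (placeholders only).
MOLECULE = "http://rdf.ebi.ac.uk/resource/chembl/molecule/CHEMBL192"
CHEMBL_ID = "http://rdf.ebi.ac.uk/terms/chembl#chemblId"
SMALL_MOLECULE = "http://rdf.ebi.ac.uk/terms/chembl#SmallMolecule"
# Standard RDF predicate stating the class of a resource.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each triple is (subject, predicate, object): one edge of the graph.
triples = [
    (MOLECULE, RDF_TYPE, SMALL_MOLECULE),
    (MOLECULE, CHEMBL_ID, "CHEMBL192"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every position that is not None."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Everything known about the molecule, then only its identifier.
print(match(triples, subject=MOLECULE))
print(match(triples, predicate=CHEMBL_ID))
```

Leaving a position as `None` plays the role of a variable such as `?s` in a SPARQL triple pattern.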

SPARQL

SPARQL is a language used to query RDF data. It is fairly similar to SQL, yet better standardised and more flexible when querying multiple data sources or services at the same time. A SPARQL endpoint is a web address you use to access the RDF data and run SPARQL queries. The ChEMBL SPARQL endpoint is located at http://www.ebi.ac.uk/rdf/services/chembl/sparql
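As a rough illustration of what "a web address you use to run queries" means in practice, the sketch below builds an HTTP GET request following the standard SPARQL protocol (a `query` parameter plus an `Accept` header). It assumes the ChEMBL endpoint accepts such requests and can return JSON results; `build_sparql_request` is a hypothetical helper name, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint address as given in this README.
CHEMBL_ENDPOINT = "http://www.ebi.ac.uk/rdf/services/chembl/sparql"


def build_sparql_request(query, endpoint=CHEMBL_ENDPOINT):
    """Build (but do not send) a SPARQL protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
print(request.full_url)
# To send it: urllib.request.urlopen(request).read()
```

Keeping the construction separate from the network call makes it easy to inspect the URL before anything is sent.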

Why should I consider using the SPARQL endpoint?

Semantic web technologies provide two main advantages. First, they remove the need to maintain, update, download, parse and handle flat files or databases. You can query the ChEMBL data directly from the web, in a fully automated way. RDF helps you to focus entirely on the query, where the real scientific value is.

Secondly, it becomes easier to integrate the data from another provider. For instance, when you analyse ChEMBL data, you may realise that it would be interesting to combine your current results with gene expression or pathway information. SPARQL allows you to do this easily, as illustrated in the example queries below.

How do I run the queries?

Simply click on a "see it live" link to open and run a query directly in your web browser. You can also copy and paste the queries from the files into the web form on the SPARQL endpoint. Queries contain a comment in the first lines (lines starting with #) summarising what they do. Do not paste the commented lines into the web form: the endpoint does not support them yet.
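If you prefer not to remove the comment lines by hand, a small helper can do it before you paste or submit a query. This is a minimal sketch; `strip_comment_lines` is a hypothetical name, and it only drops lines whose first non-blank character is `#`, so a `#` inside a prefix URI is left alone.

```python
def strip_comment_lines(query_text):
    """Drop every line whose first non-blank character is '#'."""
    kept = [
        line
        for line in query_text.splitlines()
        if not line.lstrip().startswith("#")
    ]
    return "\n".join(kept)


example = """# Retrieve five arbitrary triples
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 5"""

print(strip_comment_lines(example))
```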

It is also possible to run the queries from R or from the command line. More examples will come to demonstrate this feature. Finally, you can check the ChEMBL endpoint documentation or contact us if you run into any problems.
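When queries are run from a script rather than the web form, the results usually come back in the standard SPARQL JSON results format. The sketch below, in Python rather than R, flattens such a document into a list of rows. The sample document is hand-written for illustration and is not real endpoint output; `bindings_to_rows` is a hypothetical helper name.

```python
import json


def bindings_to_rows(results_json):
    """Turn a SPARQL JSON results document into a list of plain dicts."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # A variable can be unbound in a given row; represent that as None.
        rows.append(
            {v: binding[v]["value"] if v in binding else None for v in variables}
        )
    return rows


# Hand-written sample in the standard results layout (illustrative values).
sample = json.dumps({
    "head": {"vars": ["chemblId", "formula"]},
    "results": {"bindings": [
        {"chemblId": {"type": "literal", "value": "CHEMBL192"}},
    ]},
})

print(bindings_to_rows(sample))
```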

SPARQL queries over the ChEMBL endpoint

Queries are listed by degree of difficulty. More complex queries require a better understanding of the RDF graph's underlying structure. You can find a summary map of the structure here.

A. Simple SPARQL queries

  1. Retrieve a ChEMBL molecule from its trade name ("sildenafil"). file or see it live
  2. Retrieve the molecular formula of the ChEMBL molecule with ChEMBL ID "CHEMBL192". file or see it live
  3. Retrieve the rotatable bonds of the ChEMBL molecule with ChEMBL ID "CHEMBL192". file or see it live
  4. Retrieve the trade name of the CHEMBL192 molecule. file or see it live
  5. Retrieve the URIs of the ChEMBL molecules whose molecular formula is "C22H30N6O4S". file or see it live

B. Moderate difficulty

  1. Retrieve substance types having target type "cell-line". file or see it live
  2. Retrieve the target types available in the ChEMBL RDF triple store. file or see it live
  3. Retrieve compound activity details for all targets. file or see it live
  4. Retrieve all the bioactive ChEMBL molecules for bacterial targets. file or see it live
  5. Retrieve ChEMBL molecules targeting "Firefly Luciferase". file or see it live
  6. Retrieve target details, UniProt references and sequences for protein targets. file or see it live
  7. Retrieve ChEMBL molecule activity details for all targets containing a protein of interest, here the human M2 muscarinic receptor (P08172). file or see it live
  8. Retrieve ChEMBL molecule activity details for a single target, here human PDE5 (CHEMBL1827). file or see it live
  9. Retrieve ChEMBL molecule activity details for all targets. file or see it live

In some of the queries, I have used the FILTER function even where it is not needed, simply to produce an extra column that confirms the output is correct. This helps if you are new to triple stores. For example, suppose I am interested in the ChEMBL IDs of molecules with activity standard type "IC50". I can put the value "IC50" directly in the standard type position of the query, but to get an extra column showing that the correct standard type was selected, I can use a filter instead. Alternatively, the standard type can be added as a new column containing the constant text "IC50", without a filter. These differences change the running time of the query. To analyse such changes, I have written queries that solve roughly the same problem in different ways: these are the last five ChEMBL queries, listed below.

Note: Try to avoid the FILTER function in SPARQL queries where possible, because it increases the running time.

  1. Retrieve the ChEMBL ID, activity standard type and activity standard unit of ChEMBL molecules with activity standard type "IC50" and standard unit "nM", using filters. file or see it live
  2. Retrieve the ChEMBL ID of ChEMBL molecules with activity standard type "IC50" and activity standard unit "nM". file or see it live
  3. Retrieve the ChEMBL ID of ChEMBL molecules with activity standard type "IC50" and activity standard unit "nM", with extra named columns (variables) containing constant text for the standard type and standard unit. file or see it live
  4. Retrieve the ChEMBL ID of ChEMBL molecules with activity standard type "IC50" and activity standard unit "nM", with extra columns containing constant text for the standard type and standard unit. file or see it live
  5. Retrieve the ChEMBL ID, activity standard type and activity standard unit of ChEMBL molecules with activity standard type "IC50" and standard unit "nM", using a single filter with two conditions. file or see it live
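The difference between the filter-based and constant-based variants listed above can be sketched as follows. The queries are held as Python strings so they can be submitted from a script. The `cco:` prefix and the predicate names (`cco:hasMolecule`, `cco:standardType`, `cco:standardUnits`, `cco:chemblId`) are assumptions about the ChEMBL RDF schema and may not match the query files exactly; `ic50_query` is a hypothetical helper.

```python
# Assumed ChEMBL vocabulary prefix; check the query files for the real one.
PREFIX = "PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>\n"


def ic50_query(use_filter):
    """Return one of two queries that should select the same molecules."""
    if use_filter:
        # Variant A: bind type and unit to variables, then restrict them.
        body = (
            "SELECT ?chemblId ?type ?unit WHERE {\n"
            "  ?activity cco:hasMolecule ?molecule ;\n"
            "            cco:standardType ?type ;\n"
            "            cco:standardUnits ?unit .\n"
            "  ?molecule cco:chemblId ?chemblId .\n"
            '  FILTER (?type = "IC50" && ?unit = "nM")\n'
            "}"
        )
    else:
        # Variant B: put the constants in the triple pattern and project
        # them as constant columns, so no FILTER is needed.
        body = (
            'SELECT ?chemblId ("IC50" AS ?type) ("nM" AS ?unit) WHERE {\n'
            "  ?activity cco:hasMolecule ?molecule ;\n"
            '            cco:standardType "IC50" ;\n'
            '            cco:standardUnits "nM" .\n'
            "  ?molecule cco:chemblId ?chemblId .\n"
            "}"
        )
    return PREFIX + body


print(ic50_query(use_filter=True))
print(ic50_query(use_filter=False))
```

Both variants return the same three columns. Whether the second one actually runs faster depends on the triple store, so it is worth timing both against the endpoint.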

SPARQL queries for metadata

If you are new to querying RDF triple stores, you can try these queries first, because they work on any SPARQL endpoint. They will help you get familiar with the contents of a triple store.

  1. Retrieve all available triples from triple store
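A query of this kind uses only variables, which is why it works on any endpoint. The sketch below builds it as a Python string; the `LIMIT` clause is an addition not mentioned above, included on the assumption that you rarely want every triple of a large store at once, and `all_triples_query` is a hypothetical helper name.

```python
def all_triples_query(limit=10):
    """Build a query returning up to `limit` arbitrary triples."""
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    # ?s ?p ?o matches every subject, predicate and object in the store.
    return f"SELECT ?s ?p ?o\nWHERE {{ ?s ?p ?o }}\nLIMIT {limit}"


print(all_triples_query(5))
```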

Federated SPARQL queries and queries on endpoints other than ChEMBL

  1. Retrieve known diseases from UniProt. file or see it live
  2. Retrieve the proteins involved in Alzheimer disease and their sequences (runs on the ChEMBL SPARQL endpoint). file or see it live
  3. Retrieve the proteins involved in Alzheimer disease and their sequences (runs on the UniProt SPARQL endpoint). file or see it live
  4. Retrieve the total number of known diseases in UniProt. file or see it live
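A federated query delegates part of its pattern to another endpoint through the SPARQL 1.1 `SERVICE` keyword. The sketch below builds such a query as a Python string. The UniProt endpoint address and the `up:Disease` class are assumptions that may differ from what the query files use, and `federated_disease_query` is a hypothetical helper name.

```python
# Assumed address of the UniProt SPARQL endpoint.
UNIPROT_ENDPOINT = "https://sparql.uniprot.org/sparql"


def federated_disease_query(service=UNIPROT_ENDPOINT, limit=10):
    """Build a query whose inner pattern is evaluated by a remote endpoint."""
    return (
        "PREFIX up: <http://purl.uniprot.org/core/>\n"
        "SELECT ?disease WHERE {\n"
        # Everything inside SERVICE { ... } is sent to the remote endpoint.
        f"  SERVICE <{service}> {{\n"
        "    ?disease a up:Disease .\n"
        "  }\n"
        "}\n"
        f"LIMIT {limit}"
    )


print(federated_disease_query(limit=5))
```

Submitted to one endpoint, such a query lets that endpoint fetch the disease resources from the other and combine them with its own data in a single result.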

Contributors

ashwini607, loopasam
