Introduction

This repository contains a list of SPARQL queries illustrating how to use the ChEMBL endpoint. If you are new to the semantic web, you can follow a tutorial first. If you already know SQL and relational databases, the transition to SPARQL should be easy: read the introduction below and you can get started straight away.

RDF

RDF stands for Resource Description Framework. Briefly, it is a conceptual way of representing data as a graph, as opposed to relational databases, which focus on tables and their relations. Within RDF, information is encoded as triples, each made of a subject, a predicate and an object. The nodes and edges of the RDF graph are identified using Uniform Resource Identifiers (URIs, or web addresses). Representing data as a graph is useful because it simplifies the integration of information from different sources. URIs guarantee the uniqueness of resources and allow you to explore them with your web browser.
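To make the idea of triples concrete, here is a minimal sketch in plain Python. It is not an RDF library and the URIs are invented for illustration (they are not real ChEMBL identifiers). It only shows how a graph can be stored as a list of triples and searched with a pattern in which some positions are left open, which is roughly what a SPARQL variable does.

```python
# Hypothetical namespace, used only for this illustration.
EX = "http://example.org/"

# Each triple is (subject, predicate, object).
triples = [
    (EX + "molecule/1", EX + "hasTradeName", "ExampleName"),
    (EX + "molecule/1", EX + "hasFormula", "C2H6O"),
    (EX + "molecule/2", EX + "hasFormula", "H2O"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# All triples that state a formula, whatever the subject and object are.
formulas = match(triples, p=EX + "hasFormula")
```

A real triple store does the same kind of pattern matching, only over far more triples and with indexes.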

SPARQL

SPARQL is a language used to query RDF data. It is fairly similar to SQL, yet better standardised and more flexible when querying multiple data sources or services at the same time. A SPARQL endpoint is a web address you use to access the RDF data and run SPARQL queries. The ChEMBL SPARQL endpoint is located at http://www.ebi.ac.uk/rdf/services/chembl/sparql
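Because an endpoint is just a web address, a query can be sent to it as an ordinary HTTP request. The sketch below builds such a request URL using the `query` parameter defined by the SPARQL 1.1 Protocol. It assumes the endpoint above accepts GET requests in this standard way, and the helper name `build_query_url` is made up for this example. It does not contact the server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint address taken from the text above.
ENDPOINT = "http://www.ebi.ac.uk/rdf/services/chembl/sparql"


def build_query_url(query, endpoint=ENDPOINT):
    """Return a GET URL carrying the SPARQL query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


query = "SELECT * WHERE { ?s ?p ?o } LIMIT 5"
url = build_query_url(query)
```

The resulting URL could then be opened in a browser or fetched with `urllib.request.urlopen`; whether the server answers, and in which result format, depends on the endpoint.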

Why should I consider using the SPARQL endpoint?

Semantic web technologies provide two main advantages. First, they remove the need to maintain, update, download, parse and handle flat files or databases. You can query the ChEMBL data directly from the web, in a fully automated way. RDF helps you to focus entirely on the query, where the real scientific value is.

Secondly, it becomes easier to integrate the data from another provider. For instance, when you analyse ChEMBL data, you may realise that it would be interesting to combine your current results with gene expression or pathway information. SPARQL allows you to do this easily, as illustrated in the example queries below.

How do I run the queries?

Simply click on a link to open and run the query directly from your web browser. You can also copy and paste the queries from the files into the web form on the SPARQL endpoint. Each query starts with comment lines (lines starting with #) summarising what it does. Do not paste the comment lines into the web form: the endpoint does not support them yet.

It is also possible to run the queries from R or from the command line. More examples will be added to demonstrate this. Finally, you can check the ChEMBL endpoint documentation or contact us if you run into any problems.
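If you copy queries from the files often, removing the comment lines by hand gets tedious. The following is a small sketch of how that step could be automated. The function name `strip_comment_lines` and the sample query are made up for this example, and the function only drops lines whose first non-blank character is #, which matches the comment style described above.

```python
def strip_comment_lines(query_text):
    """Drop lines starting with '#' so the query can be pasted into the web form."""
    kept = [
        line for line in query_text.splitlines()
        if not line.lstrip().startswith("#")
    ]
    return "\n".join(kept).strip()


sample = (
    "# Retrieve a few triples\n"
    "SELECT ?s ?p ?o\n"
    "WHERE { ?s ?p ?o }\n"
    "LIMIT 5\n"
)
cleaned = strip_comment_lines(sample)
```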

SPARQL queries over the ChEMBL endpoint

Queries are listed by degree of difficulty. More complex queries require a better understanding of the RDF graph's underlying structure. You can find a summary map of the structure here.

A. Simple SPARQL queries

Retrieve the ChEMBL molecule from its trade name ("sildenafil"). file or see it live
Retrieve the molecular formula of the ChEMBL molecule with ChEMBL-id "CHEMBL192". file or see it live
Retrieve the rotatable bonds of the ChEMBL molecule with ChEMBL-id "CHEMBL192". file or see it live
Retrieve the trade name of the CHEMBL192 molecule. file or see it live
Retrieve the URIs of ChEMBL molecules whose molecular formula matches “C22H30N6O4S”. file or see it live

B. Moderate difficulty

Retrieve substance types having target type "cell-line". file or see it live
Retrieve the target types available in the ChEMBL RDF triple store. file or see it live
Retrieve compound activity details for all targets. file or see it live
Retrieve all the bioactive ChEMBL molecules for bacterial targets. file or see it live
Retrieve ChEMBL molecules targeting “Firefly Luciferase”. file or see it live
Retrieve target details, UniProt references and sequences for protein targets. file or see it live
Retrieve activity details of ChEMBL molecules for all targets containing a protein of interest, here the human M2 muscarinic receptor (P08172). file or see it live
Retrieve activity details of ChEMBL molecules for a single target, here human PDE5 (CHEMBL1827). file or see it live
Retrieve activity details of ChEMBL molecules for all targets. file or see it live

Some of the queries use the FILTER function even where it is not needed, simply to produce extra columns that confirm the output is correct. This helps if you are new to triple stores. For example, to get the ChEMBL-ids of molecules with activity standard type "IC50", you can put the value "IC50" directly in the standard type position of the query. To show an extra column confirming that the correct standard type was selected, you can use a filter instead. You can also add the standard type as a new column holding the constant text "IC50", without any filter. These choices change the running time of the query. To compare them, the last five ChEMBL queries solve roughly the same problem in different ways.

Note: Try to avoid the FILTER function in SPARQL queries where possible, because it increases the running time.

Retrieve the ChEMBL-ID, activity standard type and activity standard unit of ChEMBL molecules with activity standard type "IC50" and standard unit "nM", using FILTER. file or see it live
Retrieve the ChEMBL-ID of ChEMBL molecules with activity standard type "IC50" and activity standard unit "nM". file or see it live
Retrieve the ChEMBL-ID of ChEMBL molecules with activity standard type "IC50" and activity standard unit "nM", with extra named columns containing the standard type and standard unit as constant text. file or see it live
Retrieve the ChEMBL-ID of ChEMBL molecules with activity standard type "IC50" and activity standard unit "nM", with extra columns containing the standard type and standard unit as constant text. file or see it live
Retrieve the ChEMBL-ID, activity standard type and activity standard unit of ChEMBL molecules with activity standard type "IC50" and standard unit "nM", using FILTER with both conditions in a single filter. file or see it live
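
The difference between the filter and non-filter styles can be sketched with two query strings, held here as Python constants so they can be sent to the endpoint programmatically. These are not the repository's own query files. The `cco:` prefix and the property names (`cco:standardType`, `cco:standardUnits`, `cco:hasMolecule`, `cco:chemblId`) are assumptions about the ChEMBL RDF vocabulary and should be checked against the endpoint documentation before use.

```python
# Assumed namespace of the ChEMBL core ontology.
PREFIX = "PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>\n"

# Style 1: bind the values to variables, then restrict them with FILTER.
WITH_FILTER = PREFIX + """
SELECT ?chembl_id ?type ?unit WHERE {
  ?activity cco:standardType ?type ;
            cco:standardUnits ?unit ;
            cco:hasMolecule ?molecule .
  ?molecule cco:chemblId ?chembl_id .
  FILTER (?type = "IC50" && ?unit = "nM")
} LIMIT 10
"""

# Style 2: write the constants directly in the triple patterns and
# add them back as constant columns in the SELECT clause.
WITHOUT_FILTER = PREFIX + """
SELECT ?chembl_id ("IC50" AS ?type) ("nM" AS ?unit) WHERE {
  ?activity cco:standardType "IC50" ;
            cco:standardUnits "nM" ;
            cco:hasMolecule ?molecule .
  ?molecule cco:chemblId ?chembl_id .
} LIMIT 10
"""
```

Both styles are meant to return the same three columns. In the second one the store can look up the constant values directly instead of testing every activity, which is the likely reason it tends to run faster.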

SPARQL queries for metadata

If you are new to querying RDF triple stores, you can try these queries first, because they work on any SPARQL endpoint. They will help you become familiar with the contents of a triple store.

Retrieve all available triples from the triple store
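
As an illustration of such endpoint-independent queries, the sketch below collects a few generic exploration queries in a Python dictionary. They use only standard SPARQL and no ChEMBL-specific vocabulary. The dictionary keys and the LIMIT values are arbitrary choices for this example. The limits are there because asking for every triple in a large store can be very slow.

```python
METADATA_QUERIES = {
    # A small sample of triples, to see what the data looks like.
    "sample_triples": "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10",
    # The distinct predicates (edge types) used in the store.
    "predicates": "SELECT DISTINCT ?p WHERE { ?s ?p ?o } LIMIT 50",
    # The distinct classes that resources are declared to belong to.
    "classes": "SELECT DISTINCT ?class WHERE { ?s a ?class } LIMIT 50",
}
```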

Federated SPARQL queries and queries on endpoints other than ChEMBL

Retrieve known diseases from UniProt. file or see it live
Retrieve the proteins involved in Alzheimer's disease and their sequences (runs on the ChEMBL SPARQL endpoint). file or see it live
Retrieve the proteins involved in Alzheimer's disease and their sequences (runs on the UniProt SPARQL endpoint). file or see it live
Retrieve the total number of known diseases in UniProt. file or see it live
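
A federated query uses the SPARQL 1.1 `SERVICE` keyword to send part of the query to another endpoint. The sketch below shows the general shape as a Python string. It is not one of the repository's queries. The UniProt endpoint address and the `up:` terms (`up:Protein`, `up:sequence`) are assumptions to verify against the UniProt documentation.

```python
# Assumed address of the UniProt SPARQL endpoint.
UNIPROT_ENDPOINT = "https://sparql.uniprot.org/sparql"

FEDERATED_QUERY = """
PREFIX up:  <http://purl.uniprot.org/core/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?protein ?sequence WHERE {
  # Everything inside SERVICE is evaluated by the remote endpoint.
  SERVICE <%s> {
    ?protein a up:Protein ;
             up:sequence ?seq .
    ?seq rdf:value ?sequence .
  }
} LIMIT 5
""" % UNIPROT_ENDPOINT
```

Patterns written outside the `SERVICE` block would be matched against the local endpoint, so results from both sources can be joined on shared variables.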
