SureChEMBL mini data client

Create a SQL table of compounds from the SureChEMBL database and interlink SureChEMBL IDs with in-house compound sets.

Features

  • The client can be used as a Python package or from the command-line interface;
  • The client was designed with Postgres in mind; other databases should also work, but data insertion will be slower;
  • By comparison, surechembl-data-client can accomplish the same task, but significantly more slowly: it loads all data from the FTP server (e.g. links to publications, patent office IDs) and uses INSERT, which is slower than Postgres COPY;
  • Loads frontfiles for a specific day, month or year. The default mode loads the new patent data that EBI provides daily, so the script is meant to be scheduled with crontab to run every day;
  • If a file directory is not found, a backlog is created so that the directory is loaded at the next scheduled run;
  • map_cmpd_id_surechembl_id.sql maps SureChEMBL compounds to an in-house compound table (which must have an InChI column) and returns the interlinked compounds. To also match compounds while ignoring the stereochemical layer, uncomment the second snippet after the UNION (comment it out again to disable this).
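The backlog behaviour can be pictured with a minimal sketch like the one below. It assumes the backlog is a JSON file listing directories still to load and uses YYYYMMDD-style directory names purely for illustration; the file, its format and the helpers `read_backlog`, `write_backlog` and `plan_run` are hypothetical, not the client's actual implementation.

```python
from __future__ import annotations

import json
from pathlib import Path

# Hypothetical backlog location and format (assumption, not taken from the client).
BACKLOG_FILE = Path("backlog.json")


def read_backlog(path: Path = BACKLOG_FILE) -> list[str]:
    """Directories that could not be loaded on earlier runs (empty if no backlog)."""
    if not path.exists():
        return []
    return json.loads(path.read_text())


def write_backlog(dirs: list[str], path: Path = BACKLOG_FILE) -> None:
    """Persist the still-missing directories, de-duplicated and sorted."""
    path.write_text(json.dumps(sorted(set(dirs))))


def plan_run(wanted: list[str], available: set[str],
             backlog: list[str]) -> tuple[list[str], list[str]]:
    """Split backlog + requested directories into (load now, keep in backlog)."""
    pending = list(dict.fromkeys(backlog + wanted))  # de-duplicate, keep order
    to_load = [d for d in pending if d in available]
    still_missing = [d for d in pending if d not in available]
    return to_load, still_missing
```

On each scheduled run, something like `plan_run([today_dir], set(ftp.nlst()), read_backlog())` would give the directories to load now, and the second list would be saved with `write_backlog` for the next run.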

Dependencies

  • A database account with COPY/INSERT/CREATE TABLE/ALTER TABLE privileges;
  • A Python DB-API driver for the database of your choice, e.g. Postgres (psycopg2), Oracle (cx_oracle), MySQL (mysqldb);
  • Conda, to create the environment from environment.yml;
  • FTP account credentials, which you can request from the SureChEMBL support team.
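Because only one driver is needed, it can be imported on demand from the database type. The sketch below is illustrative only: the accepted `-dt` values (`postgres`, `oracle`, `mysql`) and the helpers `driver_name`/`load_driver` are assumptions, while the module names are the drivers' actual import names (note the capitalization of `cx_Oracle` and `MySQLdb`). Each driver's `connect()` takes different keyword arguments (psycopg2 accepts `dbname`, MySQLdb uses `db`), so connection arguments still need per-driver handling.

```python
import importlib

# Hypothetical mapping from a -dt value to the driver's import name.
DRIVERS = {
    "postgres": "psycopg2",
    "oracle": "cx_Oracle",
    "mysql": "MySQLdb",
}


def driver_name(db_type: str) -> str:
    """Return the DB-API module name for db_type, or raise ValueError."""
    try:
        return DRIVERS[db_type.lower()]
    except KeyError:
        raise ValueError(f"unsupported database type: {db_type!r}") from None


def load_driver(db_type: str):
    """Import the driver lazily, so only the one actually used must be installed."""
    return importlib.import_module(driver_name(db_type))
```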

Installation

cd surechembl_mini_client
pip install .

or without pip

python setup.py install

Example usage from CLI

Loads the frontfile for the current day

surechembl_mini_client -fu my_ftp_user -fp my_ftp_password -du my_db_user -dp my_db_password -dh my_db_host -port my_db_port -dn my_db_name -dt my_db_type --frontfile

Loads the frontfile for a specific day

surechembl_mini_client -fu my_ftp_user -fp my_ftp_password -du my_db_user -dp my_db_password -dh my_db_host -port my_db_port -dn my_db_name -dt my_db_type --frontfile -cd 18 -cm 3 -cy 2017

Loads the backfiles for a specific range of years

surechembl_mini_client -fu my_ftp_user -fp my_ftp_password -du my_db_user -dp my_db_password -dh my_db_host -port my_db_port -dn my_db_name -dt my_db_type -sy 2013 -ey 2018
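One way to read the date options is that they select a single day, a whole month or a whole year, with the current day as the default. The sketch below encodes that reading; the exact semantics of `-cd`/`-cm`/`-cy`, whether `-ey` is inclusive, and the helpers `frontfile_dates`/`backfile_years` are all assumptions for illustration.

```python
from __future__ import annotations

import calendar
from datetime import date, timedelta


def frontfile_dates(day: int | None = None, month: int | None = None,
                    year: int | None = None, today: date | None = None) -> list[date]:
    """Dates to load: one day, a whole month or a whole year (assumed semantics)."""
    today = today or date.today()
    if day is None and month is None and year is None:
        return [today]  # default: the current day, as in a daily cron job
    year = year if year is not None else today.year
    if day is not None:
        month = month if month is not None else today.month
        return [date(year, month, day)]
    if month is not None:
        n_days = calendar.monthrange(year, month)[1]
        return [date(year, month, d) for d in range(1, n_days + 1)]
    start = date(year, 1, 1)
    n_days = (date(year, 12, 31) - start).days + 1
    return [start + timedelta(days=i) for i in range(n_days)]


def backfile_years(start_year: int, end_year: int) -> list[int]:
    """Years selected by -sy/-ey, assuming the end year is inclusive."""
    if start_year > end_year:
        raise ValueError("start year must not be after end year")
    return list(range(start_year, end_year + 1))
```

Under this reading, `-cd 18 -cm 3 -cy 2017` corresponds to `frontfile_dates(18, 3, 2017)`, i.e. 18 March 2017, and `-sy 2013 -ey 2018` to the six years returned by `backfile_years(2013, 2018)`.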

Example usage within Python

from surechembl_mini_client import surechembl_mini_client
surechembl_mini_client(<arguments>)

Working principle

  • Connect to the FTP server and read the TSV directory information (there can be more than one directory) from newfiles.txt;
  • If newfiles.txt is not present, look for a TSV file to parse in the same directory;
  • Download the TSV file, parse it, load it into pandas and drop duplicates;
  • Load the data into the database, drop duplicates there, and add the primary key back.
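The four steps above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the client's code: `ftp` is an `ftplib.FTP` connection, `cursor` is a psycopg2 cursor (only its `copy_expert` method is used), files are plain UTF-8 text, and the helper names, the `.tsv` suffix, the layout of newfiles.txt and the key column passed to `dedupe_sql` are hypothetical. Table and column names are interpolated directly into SQL, so they must come from trusted configuration.

```python
from __future__ import annotations

import io

import pandas as pd


def targets_to_parse(newfiles_txt: str | None, listing: list[str]) -> list[str]:
    """Steps 1-2: use the entries in newfiles.txt, else any TSV file in the listing."""
    if newfiles_txt is not None:
        # One directory per non-empty line (assumed layout of newfiles.txt).
        return [line.strip() for line in newfiles_txt.splitlines() if line.strip()]
    # Fallback: TSV files in the same directory (".tsv" suffix is an assumption).
    return [name for name in listing if name.endswith(".tsv")]


def download(ftp, name: str) -> str:
    """Step 3a: fetch a remote file via ftplib.FTP.retrbinary, decoded as UTF-8."""
    chunks: list[bytes] = []
    ftp.retrbinary(f"RETR {name}", chunks.append)
    return b"".join(chunks).decode("utf-8")


def parse_tsv(text: str) -> pd.DataFrame:
    """Step 3b: load TSV text into pandas and drop duplicate rows."""
    df = pd.read_csv(io.StringIO(text), sep="\t", dtype=str)
    return df.drop_duplicates().reset_index(drop=True)


def copy_into(cursor, df: pd.DataFrame, table: str) -> None:
    """Step 4a: bulk-load rows with Postgres COPY (psycopg2 cursor.copy_expert)."""
    buf = io.StringIO()
    # CSV quoting protects commas inside InChI strings; NaN becomes an empty
    # unquoted field, which COPY ... (FORMAT csv) reads as NULL.
    df.to_csv(buf, header=False, index=False)
    buf.seek(0)
    columns = ", ".join(df.columns)
    cursor.copy_expert(f"COPY {table} ({columns}) FROM STDIN WITH (FORMAT csv)", buf)


def dedupe_sql(table: str, key: str) -> list[str]:
    """Step 4b: statements that drop duplicate keys, then add the primary key back."""
    return [
        # Keep the row with the highest ctid for each key (Postgres-specific).
        f"DELETE FROM {table} a USING {table} b "
        f"WHERE a.ctid < b.ctid AND a.{key} = b.{key}",
        f"ALTER TABLE {table} ADD PRIMARY KEY ({key})",
    ]
```

COPY streams all rows in one command instead of issuing an INSERT per row, which is the speed advantage over INSERT-based loading; the ctid self-join only works on Postgres, so other databases would need a different de-duplication query.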

Authors

  • Written by Aretas Gaspariunas. Have a question? You can always ask and I can always ignore.

Disclaimer

This client is not an official EBI release, so please use it at your own risk.
