enzymemap

Python package to atom-map, correct and suggest enzymatic reactions

Cite us

If you use EnzymeMap, please cite our publication "EnzymeMap: Curation, validation and data-driven prediction of enzymatic reactions" by E. Heid, D. Probst, W. H. Green and G. K. H. Madsen. Check our preprint on ChemRxiv.

News

  • August 2023: VERSION 2: A new version of EnzymeMap has been released. It includes important bug fixes for isomerase reactions and some reactions containing protons, and it adds protein information, which makes the raw and processed files rather large. If your application does not require protein information, delete the respective columns and then drop duplicates.
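
The column removal and deduplication suggested above could look roughly like the following sketch. It uses only the Python standard library; the function name and the column names passed to it are hypothetical, so check the header of the actual file before choosing which columns to drop.

```python
import csv
import gzip


def drop_columns_and_deduplicate(src_path, dst_path, columns_to_drop):
    """Copy a gzipped CSV, omitting some columns and skipping repeated rows.

    Returns the number of unique rows written. The names in `columns_to_drop`
    are supplied by the caller; none are assumed here.
    """
    seen = set()
    with gzip.open(src_path, "rt", newline="") as src, \
            gzip.open(dst_path, "wt", newline="") as dst:
        reader = csv.DictReader(src)
        kept = [name for name in reader.fieldnames if name not in columns_to_drop]
        writer = csv.DictWriter(dst, fieldnames=kept)
        writer.writeheader()
        for row in reader:
            key = tuple(row[name] for name in kept)
            if key not in seen:  # keep only the first occurrence of each row
                seen.add(key)
                writer.writerow({name: row[name] for name in kept})
    return len(seen)
```

If pandas is available, the same idea can be expressed as `df.drop(columns=[...]).drop_duplicates()`.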

Database

To use the EnzymeMap database directly, use data/processed_reactions.csv.gz (the newest version, currently v2) or download EnzymeMap from Zenodo.

Within Python (with a valid enzymemap installation), you can also run enzymemap.get_data(), which likewise corresponds to the newest version (currently v2).
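
If you only want to inspect the shipped file without installing the package, a minimal standard-library sketch such as the one below may be enough. The helper names are hypothetical and not part of enzymemap, and no column names are assumed: the header is read from the file itself.

```python
import csv
import gzip


def column_names(path="data/processed_reactions.csv.gz"):
    """Return the header row of the gzipped CSV as a list of column names."""
    with gzip.open(path, "rt", newline="") as handle:
        return next(csv.reader(handle))


def iter_reactions(path="data/processed_reactions.csv.gz"):
    """Yield each data row as a dict keyed by column name."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from csv.DictReader(handle)
```

Calling `column_names()` from the repository root lists the available columns, and `for row in iter_reactions(): ...` streams the rows one at a time, which avoids loading the rather large file into memory at once.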

Installation

Download enzymemap from GitHub:

git clone https://github.com/hesther/enzymemap.git
cd enzymemap

Set up a conda environment (or install the packages in environment.yml in any other way convenient to you):

conda env create -f environment.yml
conda activate enzymemap

Install the enzymemap package:

pip install -e .

Reproduce our study: Recreate EnzymeMap

Extract BRENDA in the data folder (run tar -xzvf brenda_2023_1.txt.tar.gz in the data folder).

Go to the scripts folder and run

python make_raw.py

to produce data/raw_reactions.csv, data/compound_to_smiles.json and ec_nums.csv. This step processes BRENDA entries and resolves all trivial names to SMILES. You might need to download a new opsin.jar that is suitable for your system. We also provide the three processed files, so you can continue with the following steps without running make_raw.py.

Then, for each EC number run process.py, for example to process EC number 1.1.3.2:

python process.py 1.1.3.2

This produces data/processed_reactions_1.1.3.2.csv. Run this for all EC numbers (it is best to parallelize this over many cores). You can also run the individual calculations on different machines. Once all calculations are done, run

python concatenate.py

to make one dataframe containing all EC numbers. You now have recreated EnzymeMap.
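
The per-EC processing step described above can be parallelized in many ways. The following is one possible sketch, not part of the repository: it starts one `python process.py <EC>` subprocess per EC number from a thread pool and then reports which expected output files are still missing before you run concatenate.py. All function names are hypothetical, and the default `../data` location assumes the script is started from the scripts folder.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def output_path(ec_number, data_dir="../data"):
    """Expected output file for one EC number (data_dir is an assumption)."""
    return Path(data_dir) / f"processed_reactions_{ec_number}.csv"


def build_command(ec_number):
    """Command line equivalent to `python process.py <EC number>`."""
    return [sys.executable, "process.py", ec_number]


def run_all(ec_numbers, max_workers=4, build=build_command):
    """Run one subprocess per EC number; return {ec_number: return code}."""
    def run_one(ec_number):
        return ec_number, subprocess.run(build(ec_number)).returncode

    # Threads suffice here because the heavy work happens in the subprocesses.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, ec_numbers))


def missing_outputs(ec_numbers, data_dir="../data"):
    """EC numbers whose processed file does not exist yet."""
    return [ec for ec in ec_numbers if not output_path(ec, data_dir).exists()]
```

For example, `run_all(["1.1.3.2"], max_workers=8)` followed by `missing_outputs(["1.1.3.2"])` would process a single EC number and return an empty list if its output file was written. How you obtain the full list of EC numbers (for instance from ec_nums.csv) depends on that file's layout, which is not assumed here.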

Reproduce our study: Train and evaluate machine learning models

Run the scripts analysis_preprocess.py (process data), analysis_temprel.py (train template relevance model, use conda environment from templatecorr), analysis_chemprop.py (train CGR-chemprop model, use conda environment from chemprop) and analysis_plot (plot results).

Reproduce our study: Additional benchmarks

Follow the instructions in the additional_benchmarks folder to process KEGG and MetaCyc.

Copyright

Copyright (c) 2023, Esther Heid

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.6.
