
dlkcat's Introduction

DLKcat

Introduction

The DLKcat toolbox is a Matlab/Python package for the prediction of kcat values and the generation of enzyme-constrained genome-scale models (ecGEMs). The repo is divided into two parts: DeeplearningApproach and BayesianApproach. DeeplearningApproach supplies a deep learning-based prediction tool for kcat, while BayesianApproach supplies an automatic Bayesian pipeline to construct ecModels using the predicted kcat values.

Usage

  • Please check the README instructions under the two sections, BayesianApproach and DeeplearningApproach, for reproducing all figures in the paper.

  • For those interested in using the trained deep-learning model for their own kcat predictions, we supply an example; please check the usage details in the README under DeeplearningApproach. A minimal invocation is sketched below.
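
A quick-start sketch (the command is the one quoted throughout the issues below; based on the example files and the issues, each line of input.tsv is expected to carry a substrate name, a substrate SMILES string, and a protein sequence):

cd DeeplearningApproach/Code/example
python prediction_for_input.py input.tsv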

Citation

Notes

We noticed a mismatch in the reference list of Supplementary Table 2 of the publication and have therefore updated it. The new supplementary tables can be found here.

Contact

  • Feiran Li (@feiranl), Chalmers University of Technology, Gothenburg, Sweden
  • Le Yuan (@le-yuan), Chalmers University of Technology, Gothenburg, Sweden

Last update: 2022-04-09


dlkcat's Issues

RMSE and data

Dear researcher,
I have two questions. First, from the code I can see that the inputs are a sequence and a SMILES string, but I find that some data points share the same sequence and SMILES.
Second, I found that kcat values in the code are log2-transformed, but the graphs in the article are all log10-transformed.

Unable to reproduce the results in Fig. 2a.

Hi,

DLKcat is an interesting work, but it is difficult for me to find suitable parameters to reproduce the results in Fig. 2a, and there is even a significant discrepancy. Could you provide the parameters corresponding to the data in Fig. 2a? Looking forward to your reply.

Best regards!

Data Missing - 343 Species

I am unable to run predict_kcat_343_species.py because the 343-species input data has not been provided inside Data/input.zip yet.
According to the code, input.zip should contain a folder named "kcatpredictionfile" with all input data files for the 343 species, but after unzipping it the folder was not present.
Can you please tell me how to access this data?

Hyperparameter file for DLKcat

Hello Yuan,

I am Jingjing Wang, a PhD student at Central South University researching enzyme-activity prediction. I have had the honor of studying your work DLKcat, which is very valuable for reference. I am trying to run run_model.py under the DeeplearningApproach module, but the hyperparameter values are missing, producing "ValueError: not enough values to unpack (expected 14, got 0)" at line 184. Could you please share the hyperparameter file or help me find it somewhere? Thank you very much!
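
For context, this error indicates that run_model.py unpacks 14 positional hyperparameters from the command line, roughly as in the sketch below; the names follow the training command quoted in a later issue, and the exact variable names in the source may differ.

import sys

# 14 positional hyperparameters; running the script with none of them
# is what produces "not enough values to unpack (expected 14, got 0)".
(DATASET, radius, ngram, dim, layer_gnn, window, layer_cnn, layer_output,
 lr, lr_decay, decay_interval, weight_decay, iteration, setting) = sys.argv[1:]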

result inconsistency

I got the following output log from running the training program with the provided input.zip:
Epoch	Time(sec)	RMSE_train	R2_train	MAE_dev	MAE_test	RMSE_dev	RMSE_test	R2_dev	R2_test
1	591.195759201888	1.4310122938416778	0.09264961151818307	1.15895686719243	1.1611571147606723	1.4972969782655918	1.5145627341092072	-0.005506072957258024	-0.016675772132620947
2	1090.850616039941	1.3617028701212768	0.1784140465265338	1.0707254525071925	1.0602946728850389	1.3691073123711561	1.368450262273908	0.15929470692289494	0.17002315778341426
3	1534.6800756510347	1.3319223949583234	0.2139572935315014	1.0562233414616296	1.058287865055471	1.352369526472039	1.3551003133745103	0.17972485209260902	0.1861378861154651
4	1982.719321589917	1.314800302997603	0.23403682463718345	1.0482630907684434	1.038917126175767	1.3482573750912759	1.347315154103144	0.18470569087754196	0.19546242921251789
5	2449.9478060430847	1.2982883485900327	0.2531547571645534	1.0504793185435264	1.0413684955859202	1.362666225560954	1.362424385335299	0.16718644472978494	0.17731655663820522
6	2923.605872184038	1.2892430618106956	0.2635251753836083	1.0449533914939675	1.037733565960516	1.344540413545488	1.3446257112099111	0.189194804644079	0.1986711784841254
7	3398.032353363	1.2809296190646564	0.27299259253020436	1.0492207294859914	1.0374832110578298	1.349456546188548	1.3464836808847171	0.18325476292609477	0.19645513685800542
8	3872.818253597943	1.2764843374648651	0.27802978573341874	1.0368018521283815	1.0226437339917291	1.3444044400241166	1.3437657711448956	0.18935878999067024	0.19969581218222632
9	4348.006794578861	1.272291057601492	0.28276537158593185	1.0370382761692067	1.0282282921936372	1.336719006779279	1.338332063513598	0.19860053468827477	0.20615501353541354
10	4823.263732406078	1.2544277221420868	0.30276434772114935	1.034751945343366	1.0224855046054504	1.34367034625276	1.3447504938448742	0.1902438275223377	0.19852244332212
11	5298.730970151024	1.250802309788802	0.30678867524884534	1.0308556552228685	1.0212400039725342	1.339567934562885	1.3421282163068404	0.195180874775861	0.20164517491961265
12	5790.012135254918	1.249284466767341	0.30847007214418365	1.0281859482972497	1.0241147116974407	1.3396902264147876	1.3424478756994833	0.19503392095390182	0.2012648356844624
13	6279.997288804036	1.2487245911673697	0.309089761233279	1.037453755876672	1.0250032838411414	1.3470848811054172	1.3435529610801293	0.18612309387039516	0.1999492779329034
14	6759.1621116509195	1.246268592067925	0.3118048614631497	1.0218105942283016	1.0182931709042489	1.3285972766977245	1.3363293728348642	0.2083093470175742	0.2085290677540933
15	7239.574647001922	1.243657568075824	0.31468547920012024	1.0258662998568473	1.0167069057270444	1.3379281257391111	1.3418849096252268	0.19715007946468088	0.20193460696358156
16	7720.250469012884	1.2427538176518076	0.31568113631104855	1.022289508596472	1.0212566321710606	1.3272325618504304	1.3368145433460943	0.2099349368120197	0.20795425727361072
17	8201.406676641898	1.2409392332288949	0.31767806888159655	1.02804079534088	1.029062821313317	1.3335178607822205	1.3455526499252448	0.20243428718069312	0.19756598061886632
18	8683.232042071875	1.2391008180147585	0.31969825135906493	1.0275487619699717	1.026321514462156	1.3307187084746934	1.3443238485723044	0.2057790714864518	0.19903092778517006
19	9171.627331119962	1.235930481560448	0.3231750084560434	1.0199825046871926	1.0214426903548757	1.3266638387325929	1.3375842800585298	0.21061188222413674	0.20704187610312152
20	9653.158925835975	1.2280508720650594	0.3317776219650187	1.0249795346297383	1.0222585413526852	1.3371855618176312	1.3463769080622725	0.1980410121453834	0.19658257004190727
21	10133.245105677051	1.2268415919675795	0.3330929911935663	1.0202867897323078	1.0228163603580704	1.3317596395121998	1.3465752453780548	0.20453605512699036	0.1963458467403444
22	10619.260446133092	1.225339932309686	0.3347245864328543	1.0252032594693492	1.0225990345323155	1.3336147136330416	1.3438771673571506	0.20231842923762877	0.19956311859480702
23	11098.99530522991	1.2237543815467924	0.3364451628294557	1.0205238796193676	1.014682897379745	1.3324662040294588	1.3403550272181193	0.2036917650939849	0.20375331763787885
24	11578.214861806948	1.222547502837304	0.3377533261738046	1.0166994519548636	1.015589107299357	1.327779731574269	1.3401064368974167	0.2092833723378087	0.20404864366509934
25	12060.263113182038	1.2223854757545094	0.33792885274584505	1.0171884316612478	1.014489764678536	1.3280698356568186	1.3400448574205	0.2089378102300642	0.20412179179825207
26	12537.473528872011	1.2207428057529035	0.3397070703161905	1.0163923021239458	1.0155852209334744	1.3278764670247392	1.3411058387977774	0.20916815275175027	0.20286101872090256
27	13019.709633457009	1.2201355912333105	0.3403637847948796	1.0153822659377587	1.013216302300726	1.3302068225438626	1.3431840029928943	0.2063899779433076	0.2003886274214648
28	13500.641379280947	1.2181565463320123	0.34250189302830425	1.0162941688987834	1.0170563738423943	1.328900970877519	1.3441204152240382	0.20794737228378535	0.19927332695325717
29	13981.695816969033	1.2179998441778639	0.3426710416374992	1.0205288984424843	1.0164080353943936	1.3305427953354452	1.3416801572393366	0.20598904022173237	0.20217813504182225
30	14463.345972521929	1.2115863073689332	0.3495753184467598	1.0143946457092268	1.013831145712079	1.325159889513219	1.3386899343047425	0.2124006197849745	0.2057304075283667
31	14941.541651160922	1.2107809113775123	0.35043976419842116	1.0158863929941269	1.0143224216741993	1.327816584776439	1.3413969058052313	0.20923947824385725	0.2025149669337507
32	15425.758035420906	1.2098874097448957	0.3513981027012	1.0129017113342975	1.0125410977325402	1.3241878741830448	1.3388435816331876	0.2135556168626409	0.20554807344008286
33	15903.838420873974	1.2088817595423076	0.35247588158272014	1.0166180478683864	1.0143861133008485	1.3273937807006215	1.3394777133662883	0.2097429868542764	0.20479532458846728
34	16381.76606452209	1.2084914409845477	0.3528939536995459	1.0238823862603401	1.0184741757152265	1.3361010534218891	1.3487600813670162	0.1993413231681982	0.19373585228275247
35	16859.796899234876	1.2077577091769975	0.35367949199227533	1.0231043640413608	1.0171466135949034	1.3324482095979935	1.3443681955401643	0.20371327260713668	0.19897808168618647
36	17337.28883381188	1.2075043519955257	0.3539506271143007	1.0197383503981932	1.0172863275324027	1.3314050063967917	1.3449648127560438	0.20495964552393608	0.19826695282617912
37	17814.388472107938	1.2069008920174378	0.3545962024626572	1.0139003009606713	1.0122298568824963	1.323555549678465	1.3350278202661712	0.21430652137167794	0.21007006412757545
38	18294.566484702984	1.2063507738773067	0.355184432070733	1.018017721491425	1.0160482316856054	1.3317478503206053	1.3445365688544468	0.2045501385021974	0.19877742362021344
39	18775.200564473867	1.205339326848055	0.3562652510038661	1.015270115909956	1.0148194953351057	1.3271771183532484	1.340109659177083	0.21000094341317643	0.20404481593786272
40	19256.930922385072	1.2019005279817085	0.35993312527105215	1.0170138723539506	1.0141057608023367	1.3319398242495764	1.3441128669397056	0.2043207908196386	0.1992823203374232
41	19740.665436755866	1.201307071066166	0.36056505499478175	1.0182052703928444	1.0134804916307856	1.3329345944739175	1.3427231115585765	0.20313182785469652	0.2009372801799797
42	20223.69574252097	1.2010085416357432	0.3608828195964452	1.0197738888773558	1.0156471783909278	1.3331782860352794	1.3465687339296881	0.20284042901020383	0.19635361896085646
43	20704.910034436034	1.2006230372123892	0.36129304641688365	1.0165043597874228	1.0140598798360392	1.3272000864517517	1.3397483237884549	0.20997359976340313	0.20447398678953554
44	21190.67524313205	1.1999312794463344	0.36202883641575	1.0181458715244607	1.01549670399208	1.3341752218133252	1.3457772209645102	0.20164776926521333	0.19729810781649337
45	21672.443288628943	1.1996744202655976	0.36230193740781036	1.023584926061256	1.0177897693906421	1.336184664029265	1.3467243771994262	0.19924111270477862	0.19616782914393582
46	22169.23129803408	1.199399413875161	0.36259426829175767	1.017726853033988	1.012209334991535	1.333881387053907	1.3439952953159107	0.20199938395542028	0.19942239422966224
47	22678.441151530948	1.19917586411738	0.3628318515623933	1.015230241810684	1.0137984959948156	1.328850433339652	1.342964909689364	0.2080076139863828	0.20064946318002197
48	23159.814822416985	1.199241062144507	0.3627625652520673	1.0152764678559625	1.0141238354155635	1.3289847870799867	1.3424915552360672	0.20784745673186067	0.20121285757782925
49	23641.50736030587	1.1986315040337676	0.3634101990621107	1.0203820429272263	1.0164745394994696	1.3383168849921685	1.350726656446092	0.19668344650465652	0.19138297286740447
50	24119.111524166074	1.1968040202803967	0.36534986219141086	1.017074176121034	1.0138720537617314	1.331797063872256	1.3444620546089272	0.2044913471384201	0.1988662287041335

which differs from the output log stored under "Results/output/MAEs--all--radius2--ngram3--dim20--layer_gnn3--window11--layer_cnn3--layer_output3--lr1e-3--lr_decay0.5--decay_interval10--weight_decay1e-6--iteration50.txt", even though I used the same hyperparameters.

To wrap up, the R^2 value is around 0.2, rather than the 0.5 in MAEs--all--radius2--ngram3--dim20--layer_gnn3--window11--layer_cnn3--layer_output3--lr1e-3--lr_decay0.5--decay_interval10--weight_decay1e-6--iteration50.txt.

Training RMSE/R2 not optimized after 50 epochs?

I attempted to train the DL model on the included Data/input dataset, but the RMSE/R2 performance for the optimal hyperparameters does not seem to improve well within 50 epochs.

The training command I am using in Code/model is

# Conda environment: python=3.8.13=h12debd9_0, pytorch=1.11.0=py3.8_cuda11.3_cudnn8.2.0_0, 
#   numpy=1.22.3=py38h99721a1_2
                    #dataset   radius  ngram   dim     layer_gnn   window   layer_cnn  layer_output   lr   lr_decay   decay_interval  weight_decay  iteration  setting
python -u run_model.py  all      2      3     20            3       11         3           3        1e-3     0.5            10          1e-6           50         test

My MAEs output file, using the same seed as in the source file, is:

Epoch	Time(sec)	RMSE_train	R2_train	MAE_dev	MAE_test	RMSE_dev	RMSE_test	R2_dev	R2_test
1	102.55310953292064	1.4312380283147441	0.09236332967789695	1.125049116795749	1.1250578585008904	1.4471744973977028	1.4677463388764231	0.060686416041210056	0.04520540426306441
2	203.64777798799332	1.362037256540892	0.17801049164963223	1.0671505529348142	1.058187257840833	1.3639318885468796	1.366977607844821	0.16563866907561975	0.1718085517220319
3	307.93242862902116	1.3323272460802105	0.2134793698365396	1.0579200712753927	1.0571677217552549	1.35433410797124	1.3544796493888676	0.17733990055978488	0.18688324676111367
4	417.40169664192945	1.31506452348586	0.23372893985948295	1.0468607239454852	1.0368631535890154	1.3480473355980844	1.346872249878544	0.184959693811926	0.19599129511149382
....
199	19898.319774403004	1.1893049201013368	0.3732783018122712	1.014455404012681	1.0182008302795162	1.3319525524402698	1.3474205021762213	0.20430558351836126	0.19533660891554494
200	19997.01809749799	1.189304904289939	0.3732783184763676	1.0144553978328625	1.018200828574738	1.3319525447368734	1.3474204932245593	0.2043055927222167	0.19533661960719573

However, in the original MAEs output the training R2 reaches 0.86 by the 50th epoch, and it already performs better by the second epoch. Are the hyperparameters I am running set correctly?
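
A minimal sketch for comparing the two logs, assuming the tab-separated MAEs layout shown above (the helper name and the second file path are illustrative):

import csv

def final_epoch_metrics(path):
    # Read a tab-separated MAEs log and return the last epoch's metrics.
    with open(path) as f:
        rows = list(csv.DictReader(f, delimiter='\t'))
    return {key: float(value) for key, value in rows[-1].items()}

reference = final_epoch_metrics('Results/output/MAEs--all--radius2--ngram3--dim20--layer_gnn3--window11--layer_cnn3--layer_output3--lr1e-3--lr_decay0.5--decay_interval10--weight_decay1e-6--iteration50.txt')
reproduced = final_epoch_metrics('my_MAEs_output.txt')  # illustrative path
print(reference['R2_train'], reproduced['R2_train'])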

Reason for two kcat values in each PredcitedKcat343species file

Hi,

I am currently trying to use the PredcitedKcat343species results from https://zenodo.org/record/6438262#.Y3NLqi8w0dU, and I would like to know what the two different kcat values for each reaction refer to.

For example, the first reaction of the Saccharomyces_cerevisiae file has

rxnID	MNXID	met_model	met_standard_name	Smiles	genes	refID	Kcat_value (substrate_first)

r_0001_1	MNXM731834;MNXM731974	(R)-lactate;ferricytochrome c	(R)-lactate;ferricytochrome c	C[C@@H](O)C([O-])=O;	YDL174C;YEL039C	chebi:16004;chebi:15991	13.8553,3.7425;

I see that it is "substrate first", but am I correct in thinking that this means the first value gives the turnover number for the forward reaction and the second value is for the backward reaction? If so, what do the two values mean when forward and backward reactions are given separately, e.g. here:

r_0018_1_fwd	MNXM263;MNXM741173	2-oxoadipic acid;L-glutamate	2-oxoadipate;L-glutamate	[O-]C(=O)CCCC(=O)C([O-])=O;[NH3+][C@@H](CCC([O-])=O)C([O-])=O	YER152C	chebi:57499;chebi:29985	6.3271;3.6237
r_0018_1_rvs	MNXM20;MNXM268	2-oxoglutarate;L-2-aminoadipate	2-oxoglutarate;L-2-aminoadipate	[O-]C(=O)CCC(=O)C([O-])=O;[NH3+][C@@H](CCCC([O-])=O)C([O-])=O	YER152C	chebi:16810;chebi:58672	9.5759;3.3994

An explanation of which of the two values I should use in enzyme constrained modelling would be much appreciated.

Thanks,
Hettie

two substrates

Hello, excuse me.
When using DLKcat to predict kcat values, I can only input one substrate. However, if I have two substrates reacting with the protein, how can I calculate the predicted kcat value?

RMSE

Hi,

I encountered some trouble while trying to use DLKcat. I am doing research on E. coli, and using DLKcat to predict kcats would be very helpful. In the paper, you achieved an RMSE of 1.06 with respect to the test dataset. Nevertheless, when I do my own calculations with the data from your database, I get 13742.58 for the whole set, 14360.11 for only the E. coli substrates, and 29778.05 for only the E. coli wild-type substrates. I have already double-checked my implementation of the RMSE calculation (I compute it twice, with numpy and with sklearn).

If you apply this patch 0001-E.-coli-workflow.patch with the instructions in this stackoverflow thread, you can get all the Python and shell scripts I use to extract the data, make the predictions, and calculate the RMSE. After applying the patch, you should change into the DeepLearningApproach directory and then unzip the Data/input.zip file (as per your instructions). In the DeepLearningApproach directory you can find the ecoli.sh script; to run it, execute sh ecoli.sh. The ecoli.sh script extracts the data from your database and separates it according to our needs (using DeepLearningApproach/Data/database/extract_subtrates.py). Then it uses DLKcat to make the predictions (just as indicated in the repository: python3 prediction_for_input.py input_file.tsv). Later on, it merges the predictions with the measured values (using DeepLearningApproach/Data/database/merge_dbs.py) and finally calculates the RMSE (using DeepLearningApproach/Data/database/rmse.py).

It would be very helpful if you could tell us how you calculated the RMSE and whether the predictions made by the model are correct.

Thanks in advance.

Erick Quintanar
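
A plausible explanation, offered as an assumption rather than the authors' confirmed procedure: the 1.06 RMSE reported in the paper is consistent with computing the error on log10-transformed kcat values, whereas RMSE on raw kcats is dominated by a few very large values. A minimal sketch with hypothetical numbers:

import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical measured and predicted kcats (s^-1); not data from the paper.
y_true = np.array([0.5, 12.0, 350.0, 9000.0])
y_pred = np.array([1.1, 8.0, 500.0, 2000.0])

rmse_raw = np.sqrt(mean_squared_error(y_true, y_pred))                  # raw scale
rmse_log = np.sqrt(mean_squared_error(np.log10(y_true), np.log10(y_pred)))  # log10 scale
print(f"raw-scale RMSE: {rmse_raw:.2f}, log10-scale RMSE: {rmse_log:.2f}")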

find the sequence

Dear researcher,

I use brenda_kcat_clean.py to get the kcat data file, and I use the EC number and organism in brenda_sequence_organism.py to get sequences.

For each row of the kcat data file (each reaction), there are an EC number, an organism, and a substrate (no UniProt ID). If I query by EC number and organism, it may find many sequences with corresponding UniProt IDs. The question is: how do I know which of the found sequences belongs to this reaction? Thanks.

Dataset problem

Thanks for your great work!
I have a question about the dataset: the dataset on GitHub has 17010 samples (9529 wild-type and 7481 mutant), but the dataset used in the published article has only 16838 samples. Is there any difference?
If possible, could you mail me the 16838 samples and the other samples in separate files at [email protected]? Thanks again!

A problem running the example

(base) PS C:\Users\Lenovo> cd D:\DLKcat-master\DLKcat-master\DeeplearningApproach\Code\example
(base) PS D:\DLKcat-master\DLKcat-master\DeeplearningApproach\Code\example> python prediction_for_input.py input.tsv
Traceback (most recent call last):
  File "prediction_for_input.py", line 14, in <module>
    from rdkit import Chem
  File "D:\Anaconda\lib\site-packages\rdkit\__init__.py", line 38, in <module>
    from . import rdBase
ImportError: DLL load failed while importing rdBase: The specified module could not be found.

Thank you for providing helpful suggestions. When I ran the software according to the README, it reported these errors. I have installed all the packages, and I don't know why these errors appear. Thank you.

A strange error

When I run this package, it reports an error. I then checked pip list and found that rdkit is installed. So I would like to ask for help. Thank you very much.
(error screenshots attached)

Example input.tsv file in DeeplearningApproach yields no kcats when run

Hi, I am hoping to use your DLKcat to predict turnover numbers for a few hundred reactions in a model. To do this, I have just tried to run python prediction_for_input.py input.tsv as detailed in the README.rst file in DLKcat/DeeplearningApproach/, but all three of the inputs give a kcat value of "none". Note that this is using the input.tsv file given in the cloned GitHub repository; I have not made any changes.

Do I need to run the scripts in the preprocess folder before I can use the tool to predict kcats? If so, running
python brenda_retrieve.py gives the following error:

Traceback (most recent call last):
  File "../DLKcat/DeeplearningApproach/Code/preprocess/brenda_retrieve.py", line 121, in <module>
    os.chdir(output_path)
FileNotFoundError: [Errno 2] No such file or directory: '../../Data/database/brenda_ec'

Does the file brenda_ec need to be downloaded first? If so, where from? Running python brenda_download.py returns:

File "../DLKcat/DeeplearningApproach/Code/preprocess/brenda_download.py", line 51
    print 'Succesfully constructed ' + previous + ' file.'
          ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print('Succesfully constructed ' + previous + ' file.')?

Please let me know if there is some step that I am missing.
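
As a side note on the last error: the SyntaxError indicates that brenda_download.py is written in Python 2, so under Python 3 the quoted line would need the parenthesized form the interpreter suggests:

print('Succesfully constructed ' + previous + ' file.')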

error running example

When I run
python prediction_for_input.py input.tsv

I see the following:
  File "example\prediction_for_input.py", line 14, in <module>
    from rdkit import Chem
  File "\Anaconda3\lib\site-packages\rdkit\__init__.py", line 38, in <module>
    from . import rdBase
ImportError: DLL load failed while importing rdBase: The specified module could not be found.

Could you please help me troubleshoot the example?

Input a new sequence to predict kcat using DLKcat

Dear researcher,

I would like to know how to use the model on a new enzyme sequence and its mutants. I tried the code in the example folder, but the prediction is not accurate compared to the in vitro kcat. Do I need to retrain the model with the new sequence?

Loss calculation confusion

Hi,
In run_model.py, I noticed that you calculate the loss on log10-scaled values: the data is initially scaled using the log2 function, then the predictions are unscaled using math.pow(2, value), and finally scaled again using the log10 function before calculating the losses. Is there any reason for that?

I tried calculating the loss on unscaled data (removing the log10 transformation) and the values were very bad, to the point where they didn't make any sense. So I was hoping that you could clear up this confusion.
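
For reference, a sketch of the scaling round-trip described in this issue (based on the issue's reading of run_model.py, not a verbatim excerpt; the numbers are hypothetical):

import math

kcat_true = 13.9               # hypothetical raw kcat, s^-1
target = math.log2(kcat_true)  # log2 scale used during training

pred_log2 = 3.5                     # hypothetical model output on the log2 scale
pred_kcat = math.pow(2, pred_log2)  # unscale back to a raw kcat
# both sides are then log10-scaled before the error metric is computed
err_log10 = abs(math.log10(pred_kcat) - math.log10(kcat_true))
print(f"absolute error on the log10 scale: {err_log10:.4f}")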

I can't find the file I'm looking for

I want to run predict_kcat_343_species.py, but I don't have the folder /species/MLKCATRESULT/, and I don't know where to get it. What should I do?

No module named SOAPpy

Excuse me, when I run brenda_retrieve.py I always get a "No module named SOAPpy" error. I have tried installing many versions of SOAPpy, but I have no idea how to fix this; could you help me solve the problem? Your model, which reconstructs models using the Bayesian approach and deep learning, is very meaningful for biological experiments. Thank you for your work.

Issue about running prediction_for_input.py

Dear researcher,
I am Dr. Song Lu from East China University of Science and Technology. I tried to use DLKcat to predict kcat, but I have run into an issue and hope you can help me. When I run prediction_for_input.py, an error occurs:
Traceback (most recent call last):
  File "G:\Anaconda-ProgramData\anaconda3\envs\Aconda\DLKcat-1.0.0\DLKcat-1.0.0\DeeplearningApproach\Code\example\prediction_for_input.py", line 276, in <module>
    main()
  File "G:\Anaconda-ProgramData\anaconda3\envs\Aconda\DLKcat-1.0.0\DLKcat-1.0.0\DeeplearningApproach\Code\example\prediction_for_input.py", line 166, in main
    name = sys.argv[1:][0]
IndexError: list index out of range
Process finished with exit code 1

The code in "prediction_for_input.py" is
def main() :
    name = sys.argv[1:][0]
    print(name)
    # with open('./input.tsv', 'r') as infile :
    with open(name, 'r') as infile :
        lines = infile.readlines()

How can I solve it?

Sincerely yours

Failed parsing SMILES

Hi,

I encountered a "SMILES Parse Error":

"SMILES Parse Error: syntax error while parsing: Naphthalene-1,6-diol"
"SMILES Parse Error: Failed parsing SMILES 'Naphthalene-1,6-diol' for input: 'Naphthalene-1,6-diol'"

The substrate name is "Naphthalene-1,6-diol". The SMILES of my substrate is "C1=CC2=C(C=CC(=C2)O)C(=C1)O"; I don't know what's wrong with this SMILES.

Protein sequence is

"MSEAADVERVYAAMEEAAGLLGVACARDKIYPLLSTFQDTLVEGGSVVVFSMASGRHSTELDFSISVPTSHGDPYATVVEKGLFPATGHPVDDLLADTQKHLPVSMFAIDGEVTGGFKKTYAFFPTDNMPGVAELSAIPSMPPAVAENAELFARYGLDKVQMTSMDYKKRQVNLYFSELSAQTLEAESVLALVRELGLHVPNELGLKFCKRSFSVYPTLNWETGKIDRLCFSVISNDPTLVPSSDEGDIEKFHNYATKAPYAYVGEKRTLVYGLTLSPKEEYYKLGAYYHITDVQRGLLKAFDSLED"

I tried inputting only the substrate name or only the substrate SMILES; neither worked.

Please help. Thanks
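
A minimal way to check which inputs RDKit can actually parse, assuming the standard rdkit API: Chem.MolFromSmiles returns None for unparseable strings, which is what triggers the error quoted above when a substrate name such as "Naphthalene-1,6-diol" is passed where a SMILES string is expected.

from rdkit import Chem

for candidate in ["Naphthalene-1,6-diol", "C1=CC2=C(C=CC(=C2)O)C(=C1)O"]:
    mol = Chem.MolFromSmiles(candidate)
    print(candidate, "->", "parseable" if mol is not None else "not parseable")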
