
LabMate.ML

LabMate.ML was designed to help identify optimized conditions for chemical reactions.

Installation

In order to use LabMate.ML you must first have Anaconda installed on your machine. Once it is installed, run the commands below to download the repository and install the relevant dependencies.

$ git clone https://github.com/tcorodrigues/LabMate.ML.git
$ cd LabMate.ML
$ conda env create -f environment.yml

This will download and install the required dependencies for LabMate.ML and store them in the conda environment LabMateML. To access this environment again later, simply type the below command in the terminal:

conda activate LabMateML

initializer.py

The initializer script creates the two text files needed to run LabMate.ML:

  1. A file containing all possible combinations of reaction conditions (all_combos.txt)
  2. A file containing a sample (n >= 10) of reaction conditions drawn from those combinations (train_data.txt)

The script can be run using the below command in the terminal:

$ python initializer.py

After performing the reactions, add a column at the end of the train_data.txt file containing the reaction yield/conversion or a similar measure of success (a sample file is available).
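The step above can be sketched with pandas. This is a minimal illustration, not LabMate.ML's own code: the column names and yield values are invented, and in practice the data frame would be read from the train_data.txt produced by initializer.py.

```python
# Sketch: append a measured-yield column as the last column of train_data.txt.
# Column names and yield values are invented for illustration.
import pandas as pd

# Stand-in for the initializer output (normally read from train_data.txt
# with pd.read_csv("train_data.txt", sep="\t")).
train = pd.DataFrame(
    {"reaction_id": [1, 2, 3], "Temperature": [25, 40, 60], "StirRate": [100, 200, 100]}
)

# One measured value per sampled reaction, in the same row order as the file.
train["yield"] = [72.5, 14.0, 55.3]

# Write back tab-separated, with the yield as the final column.
train.to_csv("train_data.txt", sep="\t", index=False)
```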

Customization

Different aspects of the initialisation process can be customised by specifying values on the command line, as detailed below:

$ python initializer.py --help

    usage: initializer.py [-h] [-i INIT_DIR] [-b BOUNDARY] [-s SEED] [-n N_SAMPLES]
    
    optional arguments:
      -h, --help,      show this help message and exit
      -i, --init_dir,  default='init_files',     dir to save files to.
      -b, --boundary,  default='Boundaries.yml', File containing boundary ranges.
      -s, --seed,      default=1,                Random seed for sampling.
      -n, --n_samples, default=10,               Number conditions to sample.

Hence, an initialisation using 20 random samples instead of 10 would be as below:

$ python initializer.py --n_samples 20

Boundaries.yml

The Boundaries file allows for customisation of the different parameters of the reaction you wish to optimize over, and can be edited as a regular text file using Notepad or equivalent. To include a new reaction condition to be optimised, simply add a keyword describing the condition and list all values at which you wish that condition to be evaluated. For example, to evaluate the stirring rate of the reaction at 100 and 200 rpm, the below would be added:

StirRate:
- 100
- 200
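Conceptually, the initializer expands these boundary lists into the Cartesian product of all conditions to build all_combos.txt. A rough sketch of that expansion, with invented condition names (this is not the script's actual code):

```python
# Sketch: every listed value of every condition in Boundaries.yml is
# combined with every value of every other condition.
import itertools

boundaries = {
    "Temperature": [25, 40, 60],
    "StirRate": [100, 200],
}

# Cartesian product: 3 temperatures x 2 stir rates = 6 combinations.
combos = list(itertools.product(*boundaries.values()))
```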

optimizer.py

This script implements a routine to search for the next best experiment to be carried out. It requires initializer.py to have been run first to generate the required files.

To run LabMate.ML, open a terminal and navigate to the directory containing the Python script, the train_data.txt and the all_combos.txt files. Then use the below command:

$ python optimizer.py

Just as with initializer.py, there are a number of optional command-line arguments which can be specified:

$ python optimizer.py --help

    usage: optimizer.py [-h] [-o OUT_DIR] [-t TRAIN_FILE] [-i INIT_DIR] [-s SEED] [-m METRIC] [-c COMBOS_FILE] [-j JOBS]
    
    optional arguments:
      -h, --help,           show this help message and exit
      -o, --out_dir,        default='output_files',            dir to save files to.
      -t, --train_file,     default='train_data.txt',          Training data location.
      -i, --init_dir,       default='init_files',              dir to load files from.
      -s, --seed,           default=1,                         Random seed value.
      -m, --metric,         default='neg_mean_absolute_error', Metric for evaluating hyperparameters.
      -c, --combos_file,    default='all_combos.txt',          File containing all reaction combinations.
      -j, --jobs,           default=6,                         Number of parallel jobs when optimising hyperparameters.

Note: the grid search routine will use 6 CPUs unless otherwise specified, so make sure enough computational resources are available.

File Requirements

The columns in the txt files must be tab separated.

train_data.txt

  • First column is the reaction identifier
  • Last column is the reaction yield/conversion
  • Columns in the middle correspond to the descriptor set
  • File must be named train_data.txt, otherwise it will not be recognised by the script.

all_combos.txt

  • First column is the reaction identifier
  • Following columns correspond to the descriptor set
  • File must be named all_combos.txt, otherwise it will not be recognised by the script.
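Since only column positions matter to the scripts, the layout can be sketched as below. The column names are invented for illustration; this is not LabMate.ML's own code.

```python
# Sketch: how a script would slice the tab-separated files by position.
import pandas as pd

# Stand-in for train_data.txt after the yield column has been added.
train = pd.DataFrame(
    {
        "reaction_id": [1, 2],
        "Temperature": [25, 40],
        "StirRate": [100, 200],
        "yield": [72.5, 14.0],
    }
)

ids = train.iloc[:, 0]             # first column: reaction identifier
target = train.iloc[:, -1]         # last column: yield/conversion
descriptors = train.iloc[:, 1:-1]  # middle columns: descriptor set
```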

Output files:

  • best_score.txt : the negative mean absolute error of the model (a value closer to zero is better)
  • feature_importances.txt : the importance (in the range 0-1) of each descriptor, according to the random forest algorithm
  • selected_reaction.txt : the next best experiment, as suggested by LabMate.ML
  • predictions.txt : predictions for all possible reactions
  • random_forest_model_grid.sav : the saved model
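The .sav file appears to be a pickled scikit-learn estimator, which joblib can reload for later predictions. That is an assumption about the file format, and the tiny stand-in model below exists only to make the example self-contained.

```python
# Sketch: persist and reload a random forest model with joblib.
import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Stand-in for the model that optimizer.py would have saved.
X = np.array([[25, 100], [40, 200], [60, 100]])
y = np.array([72.5, 14.0, 55.3])
stand_in = RandomForestRegressor(n_estimators=10, random_state=1).fit(X, y)
joblib.dump(stand_in, "random_forest_model_grid.sav")

# Later session: load the saved model and predict on new descriptor rows
# (same columns, in the same order, as the training descriptors).
model = joblib.load("random_forest_model_grid.sav")
preds = model.predict(np.array([[30, 150]]))
```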

Issues

'import initializer' overwrites train_data.txt

OS: Windows 10 / miniconda

Running optimizer.py after having added the yield column to the init_files/train_data.txt file produces the following error:

    File "optimizer.py", line 95, in <module>
      predictions = model2.predict(X2)
    File "C:\Users\Jacob\miniconda3\envs\LabMateML\lib\site-packages\sklearn\ensemble\_forest.py", line 766, in predict
      X = self._validate_X_predict(X)
    File "C:\Users\Jacob\miniconda3\envs\LabMateML\lib\site-packages\sklearn\ensemble\_forest.py", line 412, in _validate_X_predict
      return self.estimators_[0]._validate_X_predict(X, check_input=True)
    File "C:\Users\Jacob\miniconda3\envs\LabMateML\lib\site-packages\sklearn\tree\_classes.py", line 380, in _validate_X_predict
      X = check_array(X, dtype=DTYPE, accept_sparse="csr")
    File "C:\Users\Jacob\miniconda3\envs\LabMateML\lib\site-packages\sklearn\utils\validation.py", line 578, in check_array
      allow_nan=force_all_finite == 'allow-nan')
    File "C:\Users\Jacob\miniconda3\envs\LabMateML\lib\site-packages\sklearn\utils\validation.py", line 60, in _assert_all_finite
      msg_dtype if msg_dtype is not None else X.dtype)
    ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

NaNs were generated because, in optimizer.py, the line

    import initializer

leads to init_files/train_data.txt being overwritten with the 10 random experiments suggested by initializer.py, hence removing the added yield column.

Line 87:

    df_train_corrected = train.iloc[:, :-1]

then removes the final column, which is now a reaction parameter, NOT the yield. Therefore NaNs appear and the optimization cannot proceed.
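The usual fix for this class of bug is to guard the script's top-level side effects with an `if __name__ == "__main__"` block, so that importing the module no longer writes files. A minimal sketch, where generate_files is a hypothetical stand-in for initializer.py's actual file-writing logic:

```python
# Sketch of the standard fix inside initializer.py: move the file writing
# into a function and only call it when the script is executed directly.

def generate_files(init_dir="init_files", n_samples=10, seed=1):
    # Stand-in: the real function would write all_combos.txt and
    # train_data.txt into init_dir.
    return f"would write {n_samples} samples to {init_dir}"

if __name__ == "__main__":
    # Runs only via `python initializer.py`, never on `import initializer`,
    # so optimizer.py can import helpers without clobbering train_data.txt.
    print(generate_files())
```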

Input types

Hello, in the example_files you have refrained from using categorical variables, and I was wondering whether this was done on purpose. Would it be OK to have variables indicating the different types of catalysts and solvents?
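Categorical conditions can in principle be used once they are encoded numerically, for example via one-hot encoding with pandas; random forests handle such binary indicator columns well. The catalyst and solvent values below are invented for illustration and are not from LabMate.ML's example files.

```python
# Sketch: one-hot encode categorical reaction conditions so they become
# numeric descriptor columns.
import pandas as pd

combos = pd.DataFrame(
    {"catalyst": ["Pd", "Ni", "Pd"], "solvent": ["DMF", "THF", "THF"]}
)

# Each category becomes its own 0/1 indicator column.
encoded = pd.get_dummies(combos, columns=["catalyst", "solvent"])
```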
