
rexgen_direct's Introduction

Python 3 Quickstart

Go to website directory and start the Django app:

cd ../website
python manage.py runserver

  • Go to http://localhost:8000/visualize/test in a browser to use the interactive visualization tool on the test dataset predictions.
  • Go to http://localhost:8000/visualize/validation in a browser to use the interactive visualization tool on the validation dataset predictions.

rexgen_direct

Template-free prediction of organic reaction outcomes using graph convolutional neural networks

Described in A graph-convolutional neural network model for the prediction of chemical reactivity

Dependencies

  • Python (trained/tested using 2.7.6, visualization/deployment compatible with 3.6.1)
  • NumPy (trained/tested using 1.12.0, visualization/deployment compatible with 1.14.0)
  • TensorFlow (trained/tested using 1.3.0, visualization/deployment compatible with 1.6.0)
  • RDKit (trained/tested using 2017.09.1, visualization/deployment compatible with 2017.09.3)
  • Django (visualization compatible with 2.0.6)

Note: relative imports may now fail under Python 2. Removing the leading periods from the package names in the import statements should resolve this.

Instructions

Looking at predictions from the test set

cd into the website folder and start the Django app using python manage.py runserver. Then go to http://localhost:8000/visualize in a browser to use the interactive visualization tool.

Using the trained models

You can use the fully trained model to predict outcomes by following the example at the end of rexgen_direct/rank_diff_wln/directcandranker.py

Retraining the models

Look at the two text files in rexgen_direct/core_wln_global/notes.txt and rexgen_direct/rank_diff_wln/notes.txt for the exact commands used for training, validation, and testing. You will have to unarchive the data files after cloning this repo.

Python 3 Instructions

Copy 1976_Sep2016_USPTOgrants_smiles.rsmi into rexgen_direct/data.

Run data preprocessing script

cd rexgen_direct/data
python prep_data.py

Create cbond_detailed file

cd ../core_wln_global

python nntest_direct.py --test ../data/custom_filtered.rsmi.proc --hidden 300 --depth 3 --model model-300-3-direct --checkpoint ckpt-140000 \
--verbose 1 --detailed 1 > model-300-3-direct/new_data.cbond_detailed

Sample Output:

After seeing 200, acc@12: 0.725, acc@16: 0.745, acc@20: 0.755, acc@40: 0.810, acc@80: 0.850
After seeing 300, acc@12: 0.727, acc@16: 0.743, acc@20: 0.760, acc@40: 0.817, acc@80: 0.860
After seeing 400, acc@12: 0.730, acc@16: 0.750, acc@20: 0.765, acc@40: 0.812, acc@80: 0.855
After seeing 500, acc@12: 0.736, acc@16: 0.752, acc@20: 0.764, acc@40: 0.810, acc@80: 0.846
After seeing 600, acc@12: 0.735, acc@16: 0.752, acc@20: 0.765, acc@40: 0.817, acc@80: 0.853
After seeing 700, acc@12: 0.743, acc@16: 0.759, acc@20: 0.770, acc@40: 0.819, acc@80: 0.851
After seeing 800, acc@12: 0.744, acc@16: 0.760, acc@20: 0.774, acc@40: 0.824, acc@80: 0.855
After seeing 900, acc@12: 0.746, acc@16: 0.761, acc@20: 0.773, acc@40: 0.821, acc@80: 0.852
After seeing 1000, acc@12: 0.744, acc@16: 0.763, acc@20: 0.775, acc@40: 0.826, acc@80: 0.857
After seeing 1100, acc@12: 0.750, acc@16: 0.768, acc@20: 0.782, acc@40: 0.831, acc@80: 0.862
After seeing 1200, acc@12: 0.748, acc@16: 0.766, acc@20: 0.780, acc@40: 0.827, acc@80: 0.858
...
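The progress lines above follow a regular pattern, so they can be parsed to track accuracy over a run. The following is a minimal sketch, assuming the lines have been captured as text in exactly the format shown; the function name parse_accuracy_line is hypothetical and not part of the repository.

```python
import re

# Matches "After seeing <N>, <metrics...>" as printed in the sample output.
LINE_RE = re.compile(r"After seeing (\d+),\s*(.*)")
# Matches each "acc@<k>: <value>" pair within a line.
METRIC_RE = re.compile(r"acc@(\d+):\s*([0-9.]+)")


def parse_accuracy_line(line):
    """Return (examples_seen, {k: accuracy}) for a progress line, or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # not a progress line (for example, the trailing "...")
    seen = int(match.group(1))
    metrics = {int(k): float(v) for k, v in METRIC_RE.findall(match.group(2))}
    return seen, metrics


if __name__ == "__main__":
    sample = "After seeing 200, acc@12: 0.725, acc@16: 0.745, acc@20: 0.755, acc@40: 0.810, acc@80: 0.850"
    print(parse_accuracy_line(sample))
```

Lines that do not match the pattern return None, so the function can be applied to every line of a mixed log.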

Get bond predictions - includes reactivity scores in output

cd ../rank_diff_wln

python nntest_direct_useScores.py --test ../data/custom_filtered.rsmi.proc \
--cand ../core_wln_global/model-300-3-direct/new_data.cbond_detailed \
--hidden 500 --depth 3 --ncand 1500 --ncore 16 \
--model model-core16-500-3-max150-direct-useScores --checkpoint ckpt-2400000 \
--verbose 1 > model-core16-500-3-max150-direct-useScores/new_data.cbond_detailed_2400000

python ../scripts/eval_by_smiles.py --gold ../data/custom_filtered.rsmi.proc --pred model-core16-500-3-max150-direct-useScores/new_data.cbond_detailed_2400000 --bonds_as_doubles true

Go to website directory and start the Django app:

cd ../website
python manage.py runserver

  • Go to http://localhost:8000/visualize/test in a browser to use the interactive visualization tool on the test dataset predictions.
  • Go to http://localhost:8000/visualize/validation in a browser to use the interactive visualization tool on the validation dataset predictions.

rexgen_direct's People

Contributors

bobbyjudd, connorcoley

rexgen_direct's Issues

Fix mol_graph.py bug when loading custom dataset

In rexgen_direct/core_wln_global/nntest_direct.py the line:
smiles2graph_batch = partial(_s2g, idxfunc=lambda x:x.GetIntProp('molAtomMapNumber') - 1)

The lambda body x.GetIntProp('molAtomMapNumber') tries to read an integer property of an Atom object. See the RDKit docs about this property: https://www.rdkit.org/docs/RDKit_Book.html?highlight=molatommapnumber#magic-property-values

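When running the custom Patent DB dataset, this code raises a KeyError. The output below also begins with a separate AttributeError caused by a Python 2 dict.iteritems() call: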

  File "nntest_direct.py", line 191, in
    bindex_to_o = {val:key for key, val in bo_to_index.iteritems()}
AttributeError: 'dict' object has no attribute 'iteritems'
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/bobby/anaconda3/envs/my-rdkit-env/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/home/bobby/anaconda3/envs/my-rdkit-env/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "nntest_direct.py", line 168, in read_data
    src_tuple = smiles2graph_batch(src_batch)
  File "/home/bobby/projects/ucla/cs260/final_proj/rexgen_direct/rexgen_direct/core_wln_global/mol_graph.py", line 143, in smiles2graph_list
    res = list(map(lambda x:smiles2graph(x,idxfunc), smiles_list))
  File "/home/bobby/projects/ucla/cs260/final_proj/rexgen_direct/rexgen_direct/core_wln_global/mol_graph.py", line 143, in
    res = list(map(lambda x:smiles2graph(x,idxfunc), smiles_list))
  File "/home/bobby/projects/ucla/cs260/final_proj/rexgen_direct/rexgen_direct/core_wln_global/mol_graph.py", line 81, in smiles2graph
    idx = idxfunc(atom)
  File "nntest_direct.py", line 52, in
    smiles2graph_batch = partial(_s2g, idxfunc=lambda x:x.GetIntProp('molAtomMapNumber') - 1)
KeyError: 'molAtomMapNumber'
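The AttributeError at the top of this output is independent of the KeyError: dict.iteritems() exists only in Python 2. A minimal sketch of the Python 3 form of that line follows; the contents of bo_to_index here are a hypothetical stand-in, not copied from nntest_direct.py.

```python
# Hypothetical stand-in for the bond-order lookup table; the real values
# live in nntest_direct.py and may differ.
bo_to_index = {0.0: 0, 1.0: 1, 2.0: 2, 3.0: 3, 1.5: 4}

# dict.items() works in both Python 2 and Python 3,
# whereas dict.iteritems() was removed in Python 3.
bindex_to_o = {val: key for key, val in bo_to_index.items()}

if __name__ == "__main__":
    print(bindex_to_o)
```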

The included preprocessed datasets (train.txt.proc, test.txt.proc, valid.txt.proc) do not trigger this error, even though both they and the custom dataset were preprocessed by the prep_data.py script.

Need to determine what is unique to the custom dataset that is causing the error.
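One possible cause, which is an assumption and not confirmed here, is that some reactant atoms in the custom dataset carry no atom-map number, so RDKit never sets molAtomMapNumber on them. The sketch below is a rough, RDKit-free heuristic for spotting such lines. It assumes each line starts with a reaction SMILES followed by whitespace-separated fields, and the function names are hypothetical. A real check would be better done with RDKit's own atom API.

```python
import re

# Bracket atoms such as [CH3:1] or [Na+], or bare organic-subset atoms
# such as C, Cl, or aromatic n. Bonds, digits, and branches are skipped.
ATOM_TOKEN = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# A mapped bracket atom ends with ":<number>]".
MAP_NUMBER = re.compile(r":\d+\]$")


def unmapped_atoms(smiles):
    """Return the atom tokens in a SMILES string that have no atom-map number."""
    return [tok for tok in ATOM_TOKEN.findall(smiles) if not MAP_NUMBER.search(tok)]


def find_unmapped_reactions(lines):
    """Yield (line_number, unmapped_tokens) for lines whose reactants have unmapped atoms."""
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        reactants = fields[0].split(">")[0]  # text before the first '>'
        missing = unmapped_atoms(reactants)
        if missing:
            yield number, missing


if __name__ == "__main__":
    demo = ["[CH3:1][Br:2].[Na+]>>[CH3:1][Br:2] 1-2-0.0"]
    for number, missing in find_unmapped_reactions(demo):
        print(number, missing)
```

Running this over both the custom file and one of the included .proc files would show whether unmapped reactant atoms appear only in the custom data.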

Add route to webserver for visualizing custom dataset predictions

Add another route to the Django app to host the visualization of the prediction output on the custom dataset. The route should be at http://localhost:8000/visualize/custom.

Start by adding the custom route here:

urlpatterns = [
    path('', views.index, name='index'),
    path('ajax', views.display, name='display')
]

Then add view functions (index and display) for custom here

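import base64
import json
from io import BytesIO

from django.http import JsonResponse
from django.shortcuts import render

# do_index is assumed to be defined or imported elsewhere in this module.


def index(request):
    return render(request, 'example.html', {})


def display(request):
    print('Got request')
    data = {'err': False}
    # Query parameters: 1-based reaction index, atom-map toggle, highlighted atom.
    index = int(request.GET.get('index', 1))
    showmap = json.loads(request.GET.get('showmap', 'true'))
    highlight = int(request.GET.get('highlight', 0))
    print(index)
    atts = [highlight] if highlight else []
    img = do_index(index - 1, showmap=showmap, atts=atts)
    # Encode the rendered image as base64 PNG for the JSON response.
    buffered = BytesIO()
    img.save(buffered, format="png")
    img_str = base64.b64encode(buffered.getvalue()).decode()
    data['img_str'] = img_str
    data['index'] = index
    return JsonResponse(data)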

The custom index and display functions will be nearly identical to these, but the custom display function must specify the correct dataset when calling do_index(). A new HTML template is also needed to request the custom display route.
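Because the custom display view would repeat the same query-parameter handling, that part could be factored into a shared helper. This is only a sketch of one way to do it: parse_display_params is a hypothetical name, and how the dataset is passed to do_index() is left open because its signature for that is not shown here.

```python
import json


def parse_display_params(query):
    """Parse index, showmap, and highlight the same way display() does.

    `query` is any mapping with a .get method, such as Django's request.GET.
    """
    index = int(query.get('index', 1))                   # 1-based reaction index
    showmap = json.loads(query.get('showmap', 'true'))   # 'true' / 'false' string
    highlight = int(query.get('highlight', 0))           # 0 means no highlight
    atts = [highlight] if highlight else []
    return index, showmap, atts


if __name__ == "__main__":
    print(parse_display_params({'index': '5', 'showmap': 'false', 'highlight': '3'}))
```

Each display view could then start with index, showmap, atts = parse_display_params(request.GET) and differ only in which dataset it renders.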
