cbouy / mols2grid

Interactive molecule viewer for 2D structures

Home Page: https://mols2grid.readthedocs.io

License: Apache License 2.0

Python 64.65% HTML 4.63% CSS 12.62% JavaScript 15.66% TypeScript 2.44%

cheminformatics jupyter molecule-viewer python rdkit visualization

mols2grid's Introduction

Hi, I'm Cédric 👋

I have a PhD in Chemistry and I love contributing to open source computational chemistry projects!

mols2grid's Issues

Using mols2grid in Google Colab

Is there a trick to getting mols2grid to run in Colab? I've been having trouble with this notebook.
https://colab.research.google.com/github/PatWalters/practical_cheminformatics_tutorials/blob/main/sar_analysis/matched_molecular_series.ipynb
When I run this notebook locally or on a server, it works. However, when I try to run in Colab, I get a "runtime disconnected" message for the cell below the title "Display the scaffolds". Strangely though, the other cell with a mols2grid instance, below the title "View Matched Molecular Series" seems to work. Any advice would be greatly appreciated. Thanks!

Save CSV not working as expected

Hello,
I used the following code from the documentation:

import mols2grid
from pathlib import Path
from rdkit import RDConfig
SDF_FILE = (f"{RDConfig.RDDocsDir}/Book/data/solubility.test.sdf"
            if Path(RDConfig.RDDocsDir).is_dir() else "solubility.test.sdf")

mols2grid.display(
    SDF_FILE,
    # rename fields for the output document
    rename={"SOL": "Solubility",
            "SOL_classification": "Class",
            "NAME": "Name"},
    # set what's displayed on the grid
    subset=["ID", "img", "Solubility"],
    # set what's displayed on the hover tooltip
    tooltip=["Name", "SMILES", "Class", "Solubility"],
    # style for the grid labels and tooltips
    style={
      "Solubility": lambda x: "color: red; font-weight: bold;" if x < -3 else "",
      "__all__": lambda x: "background-color: azure;" if x["Solubility"] > -1 else ""
    },
    # change the precision and format (or other transformations)
    transform={"Solubility": lambda x: round(x, 2)},
    # sort the grid in a different order by default
    sort_by="Name",
    # molecule drawing parameters
    fixedBondLength=25, clearBackground=False
)

Then I checked the molecule with ID = 150, Solubility = -2.37, SMILES = CCCCBr. However, when I open the CSV files, the saved structure was:

index	ID	Solubility	Name	SMILES	Class
29	246	-6.8	2	2'	5	6'-PCB	c1c(Cl)ccc(Cl)c1c2c(Cl)cccc2Cl	(A) low

suggestion - support for py-shiny

Hi,

I love your package and was wondering if you had thought about supporting py-shiny, a web framework that was just announced. It would be a great addition to the ecosystem.

Shiny is a very popular framework for developing scientific apps based in R and I am expecting that py-shiny will be equally popular, especially in cheminformatics due to packages such as yours and rdkit.

Cheers,

Iain

Google colab support

Selection doesn't seem to work in colab because IPython.notebook.kernel.execute is not directly supported

command-line tool

Hello,

Is there a way to have a command-line tool for this use case:

I am in a terminal.
I want to create a 2D picture (as a grid) of all molecules in the input file,
as a .png or .pdf or, if need be, as a .html file.

I am not using Jupyter or anything similar.

My input files are either .sdf, .mol2 or .smi (in a SMILES file, the SMILES strings are the first field/column).

Thanks a lot,
Francois.
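A rough sketch of what such a tool could look like for the .smi case only, assuming RDKit is installed. This is not an existing mols2grid command: the function names and options below are hypothetical, .sdf and .mol2 inputs are not handled, and the script relies on RDKit's Draw.MolsToGridImage returning a Pillow image when run outside a notebook.

```python
import argparse
from pathlib import Path


def read_smi(path):
    """Return (smiles, name) pairs from a .smi file; SMILES is the first field."""
    entries = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name = fields[1] if len(fields) > 1 else ""
        entries.append((fields[0], name))
    return entries


def draw_grid(entries, output, per_row=6):
    """Draw all parsable molecules on one grid image and save it."""
    # Imported lazily so read_smi() works without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import Draw

    mols, legends = [], []
    for smiles, name in entries:
        mol = Chem.MolFromSmiles(smiles)
        if mol is not None:  # silently drop SMILES that RDKit cannot parse
            mols.append(mol)
            legends.append(name)
    image = Draw.MolsToGridImage(mols, molsPerRow=per_row, legends=legends)
    image.save(output)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Draw a grid image from a .smi file")
    parser.add_argument("input", help="path to a .smi file")
    parser.add_argument("-o", "--output", default="grid.png", help="output image path")
    parser.add_argument("--per-row", type=int, default=6, help="molecules per row")
    args = parser.parse_args(argv)
    draw_grid(read_smi(args.input), args.output, per_row=args.per_row)
```

Calling main(["mols.smi", "-o", "grid.png"]) from a small wrapper script would then write the picture without any notebook involved.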

Inquiry regarding filtering in saved HTML & images in tooltip

Thank you for creating this really nifty tool for generation of html reports of chemical data, it has been really helpful for exporting results to other non-computational scientists to view results.

I wanted to ask two things regarding the current capabilities of mols2grid and if these were possible:

In the tooltip, is it possible to render an image of a particular feature of the molecule? The image I am trying to render is an RDKit-generated Pillow image of the Bemis-Murcko scaffold of the molecule, so that viewers can easily see the scaffold of a molecule of interest. Currently, adding a column from a pandas.DataFrame containing Pillow image objects just shows up blank in the HTML report, so I was wondering whether this feature is supported and whether there are any extra steps I need to take for it to work.
For the filtering sliders demonstrated in the Colab notebooks, is there any way to include them in the saved HTML report? I was trying to allow the chemists to filter by the frequency of the Bemis-Murcko scaffold, to find the molecules that occur most often.

Thank you and I look forward to your reply. 😊
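For the first question, one approach that might work is to store an HTML img tag with the picture embedded as a base64 data URI, instead of the Pillow object itself. The sketch below only shows the encoding step. Whether the tooltip renders raw HTML from a column is an assumption not confirmed in this thread, and the helper name is hypothetical.

```python
import base64


def png_bytes_to_img_tag(png_bytes, alt="scaffold"):
    """Encode raw PNG bytes as an inline <img> tag using a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f'<img src="data:image/png;base64,{encoded}" alt="{alt}">'


# With Pillow, the PNG bytes could come from an in-memory buffer, e.g.:
#   buffer = io.BytesIO(); pil_image.save(buffer, format="PNG"); png = buffer.getvalue()
png = b"\x89PNG\r\n\x1a\n"  # placeholder bytes (PNG signature only), not a real image
tag = png_bytes_to_img_tag(png)
```

The resulting strings could then be placed in a DataFrame column in place of the image objects.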

Similarity search

Add additional search option under dropdown (with SMARTS and Text)
Implement new search function

Feature request: allow string formatting for values in subsets

Hi, thanks again.

It's really handy that you built-in CSS styling of the labels and tooltips, like color: red. It would also be great to support Python string formatting (e.g., to trim the string representation of floats). For example:

I'm guessing this will be tricky since it looks like you're building up the CSS manually here:

mols2grid/mols2grid/molgrid.py

Line 316 in 849cf49

value_names = value_names[:-1] + f", {{ attr: 'style', name: {name!r} }}]"

...but I think it's something to consider for the future.

Possibility of highlighting substructure matches in search

Hello @cbouy, thanks for this nice package.

I am wondering if highlighting the substructure matches from the search on the molecule is a planned feature.

Looking at the code, the molecule rendering itself is handled by rdkit in python while the search only acts as filtering on the item list using rdkitjs query matching.

Easiest solution would be to move the molecule rendering to rdkit js, but that might not be what you want.

Conda forge package available

Hello @cbouy.

I just wanted to tell you that I have created a conda package for mols2grid at https://github.com/conda-forge/mols2grid-feedstock: mamba install -c conda-forge mols2grid.

Feel free to make a PR to the feedstock if you want to be added as a maintainer of the package.

As a side note, I am also the author of the datamol library (https://github.com/datamol-org/datamol), in case you want to use it in mols2grid. It also has a nice 3D conformer viewer that could potentially be added to mols2grid.

Close once you have read this, since this is not a real issue :-)

AttributeError: partially initialized module 'mols2grid' has no attribute 'display' (most likely due to a circular import)

Using transform on floats breaks sorting

If I use a transform dictionary to change the formatting of certain properties (say, according to other local variables) like so:

raw_html = mols2grid.display(
    _df,
    mol_col="Mol",
    subset=[
        "Name",
        "img",
    ],
    transform={
        "MMP std. dev. difference": lambda x: (
            f"{x:.2f} {' (Log units)' if log else ''}"
        ),
    },
    tooltip=[
        "MMP std. dev. difference",
    ],
)._repr_html_()  # type: ignore

...then it appears all sorting is based on the str representation of "MMP std. dev. difference" instead of the float representation, even though in the actual data frame the column "MMP std. dev. difference" has a dtype of float64. In this example, I could just change the column title before I pass the data to mols2grid, but in other cases I want to use transform to otherwise mutate the string shown to the user yet still retain sorting by the original dtype. Is this possible?

Edit: I think the issue arises from

mols2grid/mols2grid/molgrid.py

Lines 648 to 649 in d156fc9

for col, func in transform.items():
    df[col] = df[col].apply(func)

where the transform is applied directly to the data, changing the column to a str in the above code snippet.

A simpler example that would show the same behavior is this:

raw_html = mols2grid.display(
    _df,
    mol_col="Mol",
    subset=[
        "Name",
        "img",
        "x",
    ],
    transform={
        "x": lambda x:  f"x = {np.round(x,2)}"
    },
)._repr_html_()  # type: ignore

where just changing the display from 3.14 to x = 3.14 will break sorting.
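To make the mechanism concrete, here is a small plain-Python illustration (no mols2grid involved) of why sorting the transformed strings differs from sorting the floats, and of the general idea of keeping the numeric value as the sort key while formatting only for display. The label function is a stand-in for the transform lambda above.

```python
values = [3.14159, 10.5, 2.0]


def label(x):
    """Stand-in for the transform: format a float for display."""
    return f"x = {round(x, 2)}"


# Sorting the transformed strings compares characters, so "x = 10.5" comes first.
by_label = sorted(label(v) for v in values)

# Sorting the floats first and formatting afterwards keeps numeric order.
by_value = [label(v) for v in sorted(values)]
```

Here by_label starts with "x = 10.5" while by_value ends with it, which is the same mismatch seen in the grid once the column has been converted to str.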

Question: is there a way to label atoms?

This currently doesn't work for me. Example:

import mols2grid
from rdkit import Chem

mol = Chem.MolFromSmiles("c1ccncc1")

for i, atom in enumerate(mol.GetAtoms()):
    atom.SetProp("molAtomMapNumber", str(atom.GetIdx()+1))
    atom.SetProp("atomNote", "hi")

mols2grid.display([mol])

Hover popup flickers

When the hover popup that displays info about a molecule is taller than the frame, the popup keeps flickering and does not display the info.

Sorting

It would be very useful to have a "sort" option similar to the search option, e.g. sort solubility from lowest to highest.

Bug & feature request with selections

Discussed in #19

Originally posted by PatWalters October 17, 2021
The callback feature is very useful, thanks! I've been trying to figure out how to implement an enhanced cluster viewer. I've been able to put together an example where I can click on a cluster center in one grid and show the cluster members in a second grid. The part I'm missing is capturing the selections in the cluster member grid. If I click on a cluster in the cluster center grid a second time, the cluster member selections are cleared. Is there a way around this? My example code is in this gist.
https://gist.github.com/PatWalters/ffffa4612e2c143ab7389e539a5c88f7

Also, it would be useful to be able to have a callback in the cluster center grid when checkboxes are not present. However, it appears that when selection=False, clicks in the grid are not recognized.

Bug: no callback when selection=False
Feature request: restore checkbox state when re-displaying a grid

Searching CAS numbers

Hello,

I have a dataframe with two columns, 'SMILES' and 'CAS', both containing their obvious entities. One issue I am having is that the search tool, when 'text' searching for a CAS number, doesn't seem to work with the built-in dashes in CAS numbers (e.g., 50-00-0). For example, if I start typing '50' it finds all the text with '50' somewhere in it, but as soon as I type '50-' it doesn't find anything.

Obviously I could just remove all the dashes from the CAS numbers and just instruct users of my app to do the same, but I'd like to keep the original number present if possible.

I'm surprised this issue hasn't come up before. Any suggestions?

template="pages" appears blank but template="table" works fine

Thanks for putting this together! It's really nice. I'm unclear how to proceed to debug this, but I'm able to get images only when I use template="table". If I try template="pages" (or leave it blank), then I can only see the "Sort by" bar, screenshot below. I've tested this using mols2grid version 0.0.3, in a relatively clean conda environment, on macOS using Chrome Version 89.0.4389.90 and Firefox 87.0b9.

Alternative renderers

Investigate incompatibilities between ipywidgets and jupyterlab

The latest version of jupyterlab (4) seems to be incompatible with the version of ipywidgets specified in the dependencies.
As a result, selections aren't communicated to Python as this line returns undefined, although this object should be created here.

Upgrading to ipywidgets==8.1.1 seems to fix the issue.

Selection Option for Streamlit

Hello, thanks a lot for sharing your project!

The README says that the checkbox option is relevant only in the context of Jupyter Notebook. Is there no way of extracting the selection through Streamlit?

Thank you in advance and apologies for the trivial question.

Different MolGrid instances "share" .get_selection

Hello. We added one more MolGrid instance and it appears that the first MolGrid selection affects what is selected in the second MolGrid.

import mols2grid

m1 = mols2grid.MolGrid()
m2 = mols2grid.MolGrid()

# select the first item on m1

m2.get_selection()
# shows the item selected with the index of m1

Version: mols2grid-0.2.2

I am a bit busy now but I'll try to have a look in the near future.
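One possible explanation, offered as an assumption rather than a diagnosis of the actual code, is that selections are stored in a shared registry keyed by the grid's name, so two grids created with the same default name read and write the same entry. The toy model below is entirely hypothetical (ToyGrid and _selections are not mols2grid objects) and only illustrates that pattern and why distinct names would avoid it.

```python
# Hypothetical module-level registry: grid name -> {index: smiles}
_selections = {}


class ToyGrid:
    """Toy stand-in for a grid whose selection lives in a shared registry."""

    def __init__(self, name="default"):
        self.name = name
        _selections.setdefault(name, {})

    def select(self, index, smiles):
        _selections[self.name][index] = smiles

    def get_selection(self):
        return dict(_selections[self.name])


# Two grids with the same (default) name see each other's selection.
g1, g2 = ToyGrid(), ToyGrid()
g1.select(0, "CCO")
shared = g2.get_selection()

# Distinct names keep the selections apart.
a, b = ToyGrid(name="a"), ToyGrid(name="b")
a.select(0, "CCO")
isolated = b.get_selection()
```

If the real cause is similar, passing a different name to each MolGrid (as done with name="filters" elsewhere on this page) might be worth trying.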

Upcoming features 🚀setHoverable for atoms name

import py3Dmol
v = py3Dmol.view(query="pdb:1ubq",style={'cartoon':{'color':'spectrum'},'stick':{}})
v.setHoverable({},True,'''function(atom,viewer,event,container) {
    if(!atom.label) {
        atom.label = viewer.addLabel(atom.resn+":"+atom.atom,{position: atom, backgroundColor: 'mintcream', fontColor:'black'});
    }
}''',
'''function(atom,viewer) {
    if(atom.label) {
        viewer.removeLabel(atom.label);
        delete atom.label;
    }
}''')

Discussed in #5

Originally posted by cbouy March 24, 2021
This is a list of upcoming features for mols2grid

🙏 Contributions are very much welcome 🙏

JupyterLab compatibility

Needs to refactor the grid as an ipywidget instead of an interactive HTML page

Better text search

The current text search is quite buggy, as it escapes some regular expression characters in the query without actually performing a regex search (see list.js issue). As a consequence, you can't search text containing - or #, for example.
It's also not possible to exclude words from the search.
It would be great to 1) fix the text search 2) allow some more complex search on specific fields to be performed (i.e. a proper query system with AND, OR, NOT...etc)
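The failure mode described above can be reproduced in a few lines of Python. This is an analogy only: the actual search lives in JavaScript (list.js) and is not reproduced here, and the two helper functions are hypothetical. Escaping the query and then doing a plain substring test looks for a literal backslash that the data never contains, whereas using the escaped query as a real regex pattern matches as expected.

```python
import re

cas_numbers = ["50-00-0", "64-17-5", "7732-18-5"]


def escaped_substring_search(query, items):
    """Escape the query as if for a regex, then do a plain substring test (buggy)."""
    needle = re.escape(query)  # "50-" becomes "50\-"
    return [item for item in items if needle in item]


def escaped_regex_search(query, items):
    """Escape the query and actually run it as a regex pattern."""
    pattern = re.compile(re.escape(query))
    return [item for item in items if pattern.search(item)]


without_dash = escaped_substring_search("50", cas_numbers)
with_dash_buggy = escaped_substring_search("50-", cas_numbers)
with_dash_fixed = escaped_regex_search("50-", cas_numbers)
```

Typing "50" finds the entry in both cases, but as soon as the dash is added the substring variant returns nothing, which matches the behaviour reported for CAS numbers.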

How to use prolif.LigNetwork as callback?

Hi, excellent package, so useful!!!
I am trying to use the LigNetwork output of ProLIF as a callback, but without success. I am trying to do the following:

# Here is the problem
ProLIF_callback = mols2grid.callbacks.make_popup_callback(
    title='Prolif',
    html=net._get_html(),
    js = net._get_js()
)

But it does not work. I am including the full code example for reproducibility:

import mols2grid
from pathlib import Path
from rdkit import RDConfig
from rdkit.Chem import Descriptors
from ipywidgets import interact, widgets
from prolif.plotting.network import LigNetwork
import MDAnalysis as mda
import prolif as plf
import numpy as np

output = widgets.Output()
SDF_FILE = (f"{RDConfig.RDDocsDir}/Book/data/solubility.test.sdf"
            if Path(RDConfig.RDDocsDir).is_dir() else "solubility.test.sdf")

# ProLIF example
# load topology
u = mda.Universe(plf.datafiles.TOP, plf.datafiles.TRAJ)
lig = u.select_atoms("resname LIG")
prot = u.select_atoms("protein")
# create RDKit-like molecules for visualisation
lmol = plf.Molecule.from_mda(lig)
pmol = plf.Molecule.from_mda(prot)
fp = plf.Fingerprint()
fp.run(u.trajectory[0:1], lig, prot)
fp = plf.Fingerprint()
fp.run(u.trajectory[::10], lig, prot)
df_fp = fp.to_dataframe(return_atoms=True)


net = LigNetwork.from_ifp(
    df_fp,
    lmol,
    # replace with `kind="frame", frame=0` for the other depiction
    kind="aggregate",
    threshold=0.3,
    rotation=270,
)

# Here is the problem
ProLIF_callback = mols2grid.callbacks.make_popup_callback(
    title='Prolif',
    html=net._get_html(),
    js = net._get_js()
)

# mols2grid example
df = mols2grid.sdf_to_dataframe(SDF_FILE)
# compute some descriptors

grid = mols2grid.MolGrid(
  df,
  size=(120, 100),
  name="filters",
)

view = grid.display(
    n_rows=2,
    subset=["img", "ID"],
    tooltip=["SOL", 'SMILES'],
    callback = ProLIF_callback
    )

view

dataframe-like template

Use web workers for the different searches

mols2grid.display slow for a large number of compounds

I applied mols2grid.display to a pd.DataFrame storing the VEHICLe data set, which contains ~25k samples, and the call does not seem to terminate after a reasonable time (for a 6x6 grid). I tried to stop the Jupyter cell execution after ~600s, but that did not work and I had to kill the entire notebook.

Offline discussion with @cbouy: the bottleneck is probably image generation for the whole data set, and it would be nice to have the option to perform image generation on page load.

Substructure query highlighting wrong atoms

Performing a substructure query on molecules with explicit hydrogens currently highlights the wrong atoms.

Adding removeHs=True fixes the problem but removes hydrogens from the picture which might be problematic depending on the use case.
Another more radical option (not really a solution) is to use substruct_highlight=False to remove the highlighting...

AttributeError: 'MolDraw2DSVG' object has no attribute 'SetDrawOptions'

import pandas as pd
import mols2grid

smiles = ["CCO", "c1ccccc1", "N", "CO", "O=S(=O)(-O)(-O)", "CCC", "CCC=O"]
df = pd.DataFrame({"smi": smiles,
                   "id": range(1, len(smiles) + 1)})
mg = mols2grid.MolGrid(df, smiles_col="smi", size=(110, 90))
mg.display(subset=["id", "img"], n_cols=7)

I was trying to visualize a list of SMILES, as shown in your colab. But I got the following error message.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-26-cbf1ce952db4> in <module>
      7                    "id": range(1, len(smiles) + 1)})
      8 # setup the grid
----> 9 mg = mols2grid.MolGrid(df, smiles_col="smi", size=(110, 90))
     10 mg.display(subset=["id", "img"], n_cols=7)

~/anaconda3/lib/python3.7/site-packages/mols2grid/molgrid.py in __init__(self, df, smiles_col, mol_col, coordGen, useSVG, mapping, **kwargs)
     74         dataframe.dropna(axis=0, subset=[mol_col], inplace=True)
     75         # generate drawings
---> 76         dataframe["img"] = dataframe[mol_col].apply(self.mol_to_img, **kwargs)
     77         self.dataframe = dataframe
     78         self.mol_col = mol_col

~/anaconda3/lib/python3.7/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   4133             else:
   4134                 values = self.astype(object)._values
-> 4135                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   4136 
   4137         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

~/anaconda3/lib/python3.7/site-packages/pandas/core/series.py in f(x)
   4118 
   4119             def f(x):
-> 4120                 return func(x, *args, **kwds)
   4121 
   4122         else:

~/anaconda3/lib/python3.7/site-packages/mols2grid/molgrid.py in mol_to_img(self, mol, **kwargs)
    170         """Convert an RDKit mol to an HTML img tag containing a drawing of the
    171         molecule"""
--> 172         img = self.draw_mol(mol, **kwargs)
    173         if self.useSVG:
    174             return img

~/anaconda3/lib/python3.7/site-packages/mols2grid/molgrid.py in draw_mol(self, mol, size, use_coords, MolDrawOptions, **kwargs)
    158         for key, value in kwargs.items():
    159             setattr(MolDrawOptions, key, value)
--> 160         d2d.SetDrawOptions(MolDrawOptions)
    161         if not use_coords:
    162             mol = deepcopy(mol)

AttributeError: 'MolDraw2DSVG' object has no attribute 'SetDrawOptions'

could you please take a look maybe? Thanks!

on-the-fly descriptor calculation and min-max range filtering

temp

MolGrid.get_selection() always returns empty dict

Hi cbouy,

Thanks for making this tool available - it's great!

I'm trying out the code (with jupyter notebook) and I've observed that after selecting a molecule entry in the mols2grid.MolGrid, I am not able to retrieve the selection through mols2grid.MolGrid.get_selection(): an empty dict is returned irrespective of the selection. I have looked through your Google Colab examples too (RDKit UGM and solubility examples) and see the same behaviour, so I think there's an issue within mols2grid.

Thanks in advance,
Pete

template="table" gives blank result

Hi, I love mols2grid! Thanks for the hard work.

I'm having the inverse problem of what is described here: #8

template="table" gives blank output, but template="pages" works fine.

There are no NaNs in my pandas dataframe.

Using mols2grid 0.2.1 with rdkit 2021.09.4 in a conda environment, Jinja2 3.0.2, pandas 1.4.0 on macOS, browser is Brave Version 1.35.104.

Thanks!

Displaying highlights using mols2grid

I was trying to display fragment highlights for a set of molecules stored in a pandas dataframe using mols2grid, but I only managed to show the molecules without the fragment highlights. Please see the piece of code below:

fragment_list = []
for id in active_id_list:
    exp = explainer.explain_instance(test_dataset.X[id], model_fn, num_features=100, top_labels=1)
    key = list(exp.as_map().keys())[0]
    my_fragments = fp_mol(Chem.MolFromSmiles(test_dataset.ids[id]))
    fragment_weight = dict(exp.as_map()[key])
    for index in my_fragments:
        if index in fragment_weight:
            m = Chem.MolFromSmiles(test_dataset.ids[id])
            substructure = Chem.MolFromSmarts(list(my_fragments[index])[0])
            m.GetSubstructMatches(substructure)
            fragment_list.append({'id': id, 'Smiles': test_dataset.ids[id], 'p': key, 'index': index, 'fragments': my_fragments[index], 'weight': fragment_weight[index], 'Highlights': m})
df1 = pd.DataFrame(fragment_list)
df1
mols2grid.display(df1, mol_col="Highlights")

Am I doing something wrong? How can I do it? Thanks.

Fit size of the grid based on the number of rows and image size

The current height of the iframe that displays the grid is set to 600, which fits nicely with the default option but doesn't make much sense when using a different number of rows or image size.
Ideally the default height should be automatically calculated (from the n_rows, image size, gap, and other parameters) to fit the full document without overflowing
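A back-of-the-envelope version of that calculation might look like the sketch below. All parameter names and default values (label height, gap, space for the controls) are made-up placeholders rather than the values used by the templates, so the numbers would need to be adjusted against the real CSS.

```python
def estimate_iframe_height(n_rows, image_height, label_height=20, gap=5, controls_height=90):
    """Estimate the iframe height in pixels needed to show n_rows without scrolling.

    Every default here is a hypothetical placeholder, not a value from mols2grid.
    """
    cell_height = image_height + label_height
    # n_rows cells, a gap above and below each row, plus the search/sort controls.
    return controls_height + n_rows * cell_height + (n_rows + 1) * gap


height = estimate_iframe_height(n_rows=3, image_height=130)
```

With these placeholder values, three rows of 130 px images give 90 + 3 * 150 + 4 * 5 = 560 px.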

Is there a way to not display mols2grid-id

It appears that mols2grid-id is displayed in the upper left of each cell. Is there a way to turn this off and not display mols2grid-id?
