lab-cosmo / glosim

A Python package to compute similarities between molecules and structures

License: MIT License

Python 93.82% C++ 6.05% Shell 0.14%

glosim's Introduction

glosim

This package is not further developed. We might be able to incorporate small bugfixes, but the original developers do not have time to properly maintain it.

glosim's Issues

How to handle a similarity/distance matrix with sketchmap?

As far as I understand, glosim generates a similarity or distance matrix between molecules or structures, whereas sketchmap takes high-dimensional input data.
How can I feed the similarity or distance matrix into sketchmap?
Is there a mode like sklearn.manifold.MDS for handling precomputed dissimilarities?
(sklearn.manifold.MDS can handle them with the dissimilarity='precomputed' option.)
Regards,
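
For readers with the same question, one generic way to turn a kernel (similarity) matrix into precomputed dissimilarities is the kernel-induced distance, d_ij = sqrt(k_ii + k_jj - 2 k_ij). The sketch below is plain Python and not part of glosim or sketchmap; the function name `kernel_to_distance` and the toy values are hypothetical. The resulting square matrix has the shape that sklearn.manifold.MDS expects with dissimilarity='precomputed'.

```python
import math

def kernel_to_distance(kernel):
    """Convert a symmetric kernel (similarity) matrix into kernel-induced distances."""
    n = len(kernel)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # d_ij^2 = k_ii + k_jj - 2 k_ij; clamp tiny negatives caused by round-off
            d2 = kernel[i][i] + kernel[j][j] - 2.0 * kernel[i][j]
            dist[i][j] = math.sqrt(max(d2, 0.0))
    return dist

# Toy 3x3 normalized kernel (made-up values for illustration only)
K = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.5],
     [0.0, 0.5, 1.0]]
D = kernel_to_distance(K)
```

For a normalized kernel (k_ii = 1) this reduces to sqrt(2 - 2 k_ij), so identical structures sit at distance 0.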

Performance

Hello,

I am trying to compute farthest point sampling for my data set.
It contains ~10000 frames and I want to get 1000 points.
Currently, I am running on a cluster with 10 nodes (400 CPUs), and in 11 hours it computed only the first ~130 points.
I am using the following flags: --kernel rematch -n 12 -l 8 --gamma 0.1 -c 4.5 --peratom --distance
and my system contains 38 atoms.
Is this the expected performance? Is there any trick I can do to accelerate the calculation beyond reducing the number of considered atoms and/or the data set size?

Regards,
Yair
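
As background for this question, here is a minimal sketch of greedy farthest point sampling on a precomputed distance matrix. It is a generic illustration, not glosim's implementation, and the function name and toy data are hypothetical. The point it makes is that each newly selected point needs only one row of distances, so choosing k landmarks out of N frames takes on the order of k x N structure comparisons, not the full N x N matrix. Whether glosim organizes the calculation this way is not stated in this thread.

```python
def farthest_point_sampling(dist, n_select, start=0):
    """Greedy FPS on a precomputed distance matrix; returns the selected indices."""
    n = len(dist)
    selected = [start]
    # min_dist[i] = distance from point i to its nearest already-selected point
    min_dist = list(dist[start])
    while len(selected) < min(n_select, n):
        # pick the point that is farthest from the current selection
        nxt = max(range(n), key=lambda i: min_dist[i])
        selected.append(nxt)
        # only the row of the newly selected point is needed for the update
        min_dist = [min(a, b) for a, b in zip(min_dist, dist[nxt])]
    return selected

# Four points on a line at positions 0, 1, 2, 10 (made-up toy data)
pos = [0.0, 1.0, 2.0, 10.0]
D = [[abs(a - b) for b in pos] for a in pos]
picked = farthest_point_sampling(D, 3)
```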

When to use unnormalized SOAPs

Hi all,

I noticed that the new flag --unsoap was added to use unnormalized SOAPs. When do you recommend using this flag?

Best regards,
Bart

SOAP with multiple species

Hi,

I have three questions regarding SOAP with multiple species:

Does the --kernel fastavg take different species into account? Looking at the definition it does not seem to be the case:

I would like to take the different species into account with the kronecker delta function, i.e.:

Is --kernel fastspecies the way to go? It seems it requires environmental information (a lot of RAM) so it is not comparable with fastavg in that sense. Are the --kernel average and --kernel rematch also taking into account different species with a delta function, and if so, how is it different from --kernel fastspecies?
It seems that the --separate_species flag concerns the species of the centers, not the species in the environment. When is it recommended to use this flag?

Difference between average and fastavg kernel

Not exactly an issue, but I was wondering if you could tell me what the difference is between the average and fastavg kernels. In which cases can I use the much faster fastavg kernel, and when should I resort to the average kernel?
Thank you! :)
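
One common reading of the two names, which is an assumption here and is not confirmed in this thread, is that `average` takes the mean of all pairwise environment kernels, while `fastavg` first averages the environment descriptors of each structure and then evaluates a single dot product. Under that assumption the two agree for a linear environment kernel (zeta = 1) and differ once zeta > 1. The toy sketch below illustrates this; all function names and vectors are hypothetical and do not come from glosim.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def average_kernel(envs_a, envs_b, zeta=1):
    """Mean of all pairwise environment kernels (dot product raised to zeta)."""
    total = sum(dot(a, b) ** zeta for a in envs_a for b in envs_b)
    return total / (len(envs_a) * len(envs_b))

def mean_vector(envs):
    """Component-wise average of a list of descriptor vectors."""
    n = len(envs)
    return [sum(col) / n for col in zip(*envs)]

def fast_average_kernel(envs_a, envs_b, zeta=1):
    """Dot product of the averaged descriptors, raised to zeta."""
    return dot(mean_vector(envs_a), mean_vector(envs_b)) ** zeta

# Two toy structures with two environments each (made-up descriptors)
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0, 0.0], [1.0, 0.0]]

linear_gap = average_kernel(A, B, 1) - fast_average_kernel(A, B, 1)
squared_gap = average_kernel(A, B, 2) - fast_average_kernel(A, B, 2)
```

The pairwise version costs one kernel evaluation per pair of environments, whereas the averaged version costs one dot product per pair of structures, which would explain the speed difference.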

AttributeError: 'module' object has no attribute 'descriptors'

Hi,
Thanks for making these algorithms available.
I just downloaded quippy and installed it using this guide https://libatoms.github.io/QUIP/install.html
I checked out your code and tried to run the example, but I get the following trace:

python2 glosim.py example/mol-50.xyz
WARNING! fast hungarian library is not available 

Cannot find mcpermanent.so module in pythonpath. Permanent evaluations will be very slow and approximate.
Get it from https://github.com/sandipde/MCpermanent 
          TIME:   Wed Dec 21 17:34:03 2016
        ___  __    _____  ___  ____  __  __ 
       / __)(  )  (  _  )/ __)(_  _)(  \/  )
      ( (_-. )(__  )(_)( \__ \ _)(_  )    ( 
       \___/(____)(_____)(___/(____)(_/\/\_)
                                            
                                             
Reading input file example/mol-50.xyz
50  Configurations Read
Computing SOAPs
Traceback (most recent call last):   
  File "glosim.py", line 871, in <module>
    main(args.filename, nd=args.n, ld=args.l, coff=args.c, cotw=args.cotw, gs=args.g, mu=args.mu, centerweight=args.cw, periodic=args.periodic, usekit=args.usekit, kit=args.kit,alchemyrules=args.alchemy_rules, kmode=args.kernel, nonorm=args.nonorm, permanenteps=args.permanenteps, reggamma=args.gamma, noatom=noatom, nocenter=nocenter, envsim=args.envsim, nprocs=args.np, verbose=args.verbose, envij=envij, prefix=args.prefix, nlandmark=args.nlandmarks, printsim=args.distance,ref_xyz=args.refxyz,partialsim=args.livek,lowmem=args.lowmem,restartflag=args.restart, zeta=args.zeta, xspecies=args.separate_species, alrange=(args.first,args.last), refrange=(args.reffirst, args.reflast))
  File "glosim.py", line 189, in main
    si.parse(at, coff, cotw, nd, ld, gs, centerweight, nocenter, noatom, kit = kit)       
  File "/home/peter/kodesjov/glosim/libmatch/structures.py", line 82, in parse
    desc = quippy.descriptors.Descriptor("soap central_weight="+str(cw)+"  covariance_sigma0=0.0 atom_sigma="+str(gs)+" cutoff="+str(coff)+" cutoff_transition_width="+str(cotw)+" n_max="+str(nmax)+" l_max="+str(lmax)+' '+lspecies+' Z='+str(sp) )   
AttributeError: 'module' object has no attribute 'descriptors'
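
A quick way to confirm what the trace reports, before running glosim, is to check whether the installed quippy actually exposes a `descriptors` attribute. One possible cause, stated here as an assumption and not confirmed in this thread, is a QUIP build without descriptor (GAP) support. The helper name `has_attribute` is hypothetical.

```python
import importlib

def has_attribute(module_name, attr):
    """Return True if the module imports and exposes attr, False otherwise."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)

# The check relevant to the trace above
print(has_attribute("quippy", "descriptors"))
```

If this prints False although quippy itself imports, the build is missing the descriptors module and glosim cannot compute SOAPs with it.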

Parameters to reproduce the prediction on QM7b with an average kernel

Hi Dr Sandip and Professor Ceriotti,
I want to reproduce your Science Advances work on the QM7b dataset. I seem to have done some steps incorrectly, because I always get an MAE higher than 8 kcal/mol. Could you please give me some tips about which step I got wrong?
What I did is as follows.
1. I used

python glosim.py qm7.xyz -n 9 -l 9 -g 0.3 -c 3 --zeta 2 --peratom --kernel average

to generate the kernel matrix. According to the SI of your paper, the parameters should be -n 9 -l 9 -g 0.3 -c 3 --zeta 2 --periodic --nonorm, but I replaced --periodic with --peratom and dropped --nonorm because glosim.py has been updated.
2. I used ShuffleSplit from scikit-learn to obtain the training and test kernel matrices. (I used a random split instead of choosing the training set by FPS.)

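from sklearn.model_selection import ShuffleSplit
# X is the full precomputed kernel matrix, y holds the target properties
rs = ShuffleSplit(n_splits=10, test_size=.20, random_state=0)  # 10 independent random splits
for train_index, test_index in rs.split(X):
    # training kernel: train rows x train columns
    X_train, y_train = X[train_index][:, train_index], y[train_index]
    # test kernel: test rows x train columns
    X_test, y_test = X[test_index][:, train_index], y[test_index]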

3. I fed X_train to a Gaussian process regression with different regularization parameters and used the fitted model to predict on X_test.

The lowest MAE I obtained was 8 kcal/mol with the average kernel (Cutoff_r = 4 Å). The high MAE may partly come from the random split, but I do not think that alone explains such a high error. Could you please give me some suggestions?
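
For reference, the regression step described above can be sketched as kernel ridge regression on a precomputed kernel: solve (K_train + reg * I) alpha = y_train, then predict with the test-rows by train-columns block. This is a generic pure-Python illustration, not the protocol of the paper or of glosim; the helper names and the toy kernel are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def krr_fit(k_train, y_train, reg):
    """Regression weights alpha = (K + reg * I)^-1 y."""
    n = len(y_train)
    a = [[k_train[i][j] + (reg if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(a, y_train)

def krr_predict(k_block, alpha):
    """Predictions from a kernel block with one row per sample, one column per training point."""
    return [sum(k * w for k, w in zip(row, alpha)) for row in k_block]

def mae(pred, ref):
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)

# Made-up toy kernel and targets
K = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.5], [0.2, 0.5, 1.0]]
y = [1.0, 2.0, 3.0]
alpha = krr_fit(K, y, reg=1e-10)
train_error = mae(krr_predict(K, alpha), y)
```

Scanning `reg` over a logarithmic grid and comparing the test MAE for each value corresponds to the "different regularization parameters" mentioned in step 3.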

Cannot execute the sample code in the ESI of Chem. Sci., 2018, 9, 1289.

Hi,
Thanks for providing the algorithms.
I just installed quippy and tried glosim.
I tried the command from the electronic supplementary information of the paper Chem. Sci., 2018, 9, 1289.

glosim.py traj-pentacene.xyz -n 9 -l 9 -g 0.3 -c 3 --kernel rematch --gamma 2 --periodic --nonorm

glosim does not seem to recognize the options '--periodic --nonorm'.

                   :
glosim.py: error: unrecognized arguments: --periodic --nonorm

Also, without these options, the program aborts with the message below.

WARNING! fast hungarian library is not available 

Cannot find mcpermanent.so module in pythonpath. Permanent evaluations will be very slow and approximate.
Get it from https://github.com/sandipde/MCpermanent 
          TIME:   Wed Oct 17 14:46:59 2018
        ___  __    _____  ___  ____  __  __ 
       / __)(  )  (  _  )/ __)(_  _)(  \/  )
      ( (_-. )(__  )(_)( \__ \ _)(_  )    ( 
       \___/(____)(_____)(___/(____)(_/\/\_)
                                            
                                             
using output prefix = traj-pentacene-n9-l9-c3.0-g0.3_rematch-2.0
Reading input file traj-pentacene.xyz
564  Configurations Read
Computing SOAPs
Traceback (most recent call last):   
  File "glosim/glosim.py", line 862, in <module>
    main(args.filename, nd=args.n, ld=args.l, coff=args.c, cotw=args.cotw, gs=args.g, mu=args.mu, centerweight=args.cw, onpy=args.onpy, peratom=args.peratom, unsoap = args.unsoap, usekit=args.usekit, kit=args.kit,alchemyrules=args.alchemy_rules, kmode=args.kernel, normalize_global=args.normalize_global, permanenteps=args.permanenteps, reggamma=args.gamma, noatom=noatom, nocenter=nocenter, envsim=args.envsim, nprocs=args.np, verbose=args.verbose, envij=envij, prefix=args.prefix, nlandmark=args.nlandmarks, printsim=args.distance,ref_xyz=args.refxyz,partialsim=args.livek,lowmem=args.lowmem,restartflag=args.restart, zeta=args.zeta, xspecies=args.separate_species, alrange=(args.first,args.last), refrange=(args.reffirst, args.reflast))
  File "glosim/glosim.py", line 243, in main
    sii,senvii = structk(si, si, alchem, peratom, mode=kmode, fout=fii, peps=permanenteps, gamma=reggamma, zeta=zeta, xspecies=xspecies)        
  File "/home/iino/Work/glosim/libmatch/structures.py", line 292, in structk
    cost=rematch(kk, gamma, 1e-6)  # hard-coded residual error for regularized gamma
  File "/home/iino/Work/glosim/libmatch/lap/perm.py", line 46, in rematch
    raise ValueError("No Python equivalent to rematch function...")
ValueError: No Python equivalent to rematch function...

It seems that the code for the REMatch kernel is not implemented yet.
Is there a problem with my environment, or is this intentional?
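
The trace suggests that glosim expects a compiled routine for REMatch and has no pure-Python fallback. For orientation only, the sketch below shows how an entropy-regularized match kernel can be evaluated with Sinkhorn iterations on the matrix of environment similarities. It is an illustration under that reading of the method, not glosim's code; the function name and test matrices are hypothetical, and the default tolerance merely mirrors the 1e-6 visible in the trace.

```python
import math

def rematch_kernel(C, gamma, tol=1e-6, max_iter=10000):
    """Entropy-regularized match kernel between two structures.

    C[i][j] is the similarity between environment i of structure A
    and environment j of structure B; gamma is the regularization.
    """
    n, m = len(C), len(C[0])
    # Gibbs kernel built from the cost 1 - C (may underflow for very small gamma)
    K = [[math.exp(-(1.0 - C[i][j]) / gamma) for j in range(m)] for i in range(n)]
    u = [1.0 / n] * n
    v = [1.0 / m] * m
    for _ in range(max_iter):
        # rescale rows so that each row of P sums to 1/n
        u_new = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # rescale columns so that each column of P sums to 1/m
        v = [(1.0 / m) / sum(K[i][j] * u_new[i] for i in range(n)) for j in range(m)]
        change = max(abs(a - b) for a, b in zip(u, u_new))
        u = u_new
        if change < tol:
            break
    # kernel = sum_ij P_ij C_ij with P_ij = u_i K_ij v_j
    return sum(u[i] * K[i][j] * v[j] * C[i][j] for i in range(n) for j in range(m))

# Two structures whose environments match one-to-one (made-up similarities)
C = [[1.0, 0.0], [0.0, 1.0]]
k_small_gamma = rematch_kernel(C, gamma=0.1)    # close to the best-match limit
k_large_gamma = rematch_kernel(C, gamma=100.0)  # close to the average kernel
```

Small gamma concentrates the weight on the best pairing of environments, while large gamma spreads it evenly, which recovers the plain average of C.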

Broken dependencies

Hi, I am trying to run this code with the rematch kernel but it seems that the following dependency is outdated:

https://github.com/cosmo-epfl/glosim/blob/9e18ffa6f302142514f222b0386e7d5dcb85d84f/libmatch/lap/perm.py#L33

I installed https://github.com/sandipde/MCpermanent/ but it does not seem to match the code here.
