A Python package to compute similarities between molecules and structures
This package is not further developed. We might be able to incorporate small bugfixes, but the original developers do not have time to properly maintain it.
License: MIT License
As far as I understand, glosim generates a similarity or distance matrix between molecules or structures, while sketchmap handles high-dimensional input data.
How can I input the similarity or distance matrix into sketchmap?
Is there a mode like sklearn.manifold.MDS that handles pre-computed dissimilarities?
(sklearn.manifold.MDS can handle them with the dissimilarity='precomputed' option.)
Regards,
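A common way to turn a similarity (kernel) matrix into a distance matrix is the kernel-induced distance d_ij = sqrt(k_ii + k_jj - 2 k_ij). The sketch below assumes the glosim kernel matrix has already been loaded into a NumPy array (file parsing is not shown). Because sketch-map's own input options are not covered here, it uses classical (Torgerson) MDS written in NumPy as a stand-in embedding of the precomputed distances. sklearn.manifold.MDS with dissimilarity='precomputed' would be an alternative. The function names are illustrative only.

```python
import numpy as np

def kernel_to_distance(K):
    """Kernel-induced distance d_ij = sqrt(k_ii + k_jj - 2 k_ij)."""
    K = np.asarray(K, dtype=float)
    diag = np.diag(K)
    D2 = diag[:, None] + diag[None, :] - 2.0 * K
    return np.sqrt(np.clip(D2, 0.0, None))  # clip tiny negative round-off before sqrt

def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS embedding of a precomputed distance matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred Gram matrix
    evals, evecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_components]
    scale = np.sqrt(np.clip(evals[order], 0.0, None))
    return evecs[:, order] * scale               # one row of coordinates per structure
```

For a kernel that is a plain dot product, this distance is the ordinary Euclidean distance between feature vectors, and classical MDS recovers a configuration with the same pairwise distances.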
Hello,
I am trying to compute farthest point sampling for my data set.
It contains ~10000 frames and I want to get 1000 points.
Currently, I am running on a cluster with 10 nodes (400 CPUs), and in 11 hours it has computed only the first ~130 points.
I am using the following flags: --kernel rematch -n 12 -l 8 --gamma 0.1 -c 4.5 --peratom --distance
and my system contains 38 atoms.
Is this the expected performance? Is there any trick I can do to accelerate the calculation beyond reducing the number of considered atoms and/or the data set size?
Regards,
Yair
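In the usual greedy formulation of farthest point sampling, every newly selected point requires its distance to all N candidates, so selecting M points costs on the order of N × M distance evaluations (about 10^7 for 10000 frames and 1000 points). With the rematch kernel, each of those evaluations is itself a matching over atomic environments. A minimal sketch of the greedy loop, assuming a generic callable `dist_to_all` (a hypothetical helper, not a glosim function) that returns the distances from one item to every item:

```python
import numpy as np

def farthest_point_sampling(n_points, n_select, dist_to_all, first=0):
    """Greedy FPS.

    dist_to_all(i) must return a length-n_points array of distances from
    item i to every item; it is called once per selected point.
    """
    selected = [first]
    # Distance from each candidate to its nearest already-selected point
    min_dist = np.asarray(dist_to_all(first), dtype=float).copy()
    for _ in range(1, n_select):
        nxt = int(np.argmax(min_dist))  # candidate farthest from the current selection
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dist_to_all(nxt))
    return selected
```

Each iteration depends on the previous selection, so the loop itself is sequential; in this formulation, parallel resources can only speed up the work inside each `dist_to_all` call.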
Hi all,
I noticed that the new flag --unsoap was added to use unnormalized SOAPs. When do you recommend using this flag?
Best regards,
Bart
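Whether --unsoap changes exactly this step inside glosim is not stated here. The sketch only illustrates the general difference between a normalized and an unnormalized polynomial dot-product kernel on (hypothetical) SOAP feature vectors p and q. Normalizing gives k(A, A) = 1 for every environment, so only the shape of the neighbour density matters. The unnormalized kernel also grows with the magnitude of the vectors, e.g. with the number of neighbours inside the cutoff.

```python
import numpy as np

def dot_kernel(p, q, zeta=2, normalize=True):
    """Polynomial dot-product kernel (p . q)^zeta between two feature vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if normalize:
        # Unit-length vectors: the kernel depends only on the angle between p and q
        p = p / np.linalg.norm(p)
        q = q / np.linalg.norm(q)
    return float(np.dot(p, q) ** zeta)
```

For example, a vector and twice that vector are indistinguishable under normalization (kernel 1), but not without it.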
Hi,
I have three questions regarding SOAP with multiple species:
Does --kernel fastavg take different species into account? Looking at the definition, it does not seem to be the case. I would like to take the different species into account with a Kronecker delta function, i.e.:
Is --kernel fastspecies the way to go? It seems to require environmental information (a lot of RAM), so it is not comparable with fastavg in that sense. Do --kernel average and --kernel rematch also take different species into account with a delta function, and if so, how are they different from --kernel fastspecies?
It seems like the --separate_species flag is about the species of the centers, not about taking the species in the environment into account. When is it recommended to use this flag?
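The formula the first question refers to is missing here. The sketch below assumes the common form of a species-resolved average kernel, K(A, B) = 1/(N_A N_B) Σ_i Σ_j δ(s_i, s_j) k(x_i, y_j), where s_i is the species of centre i and k is a polynomial environment kernel. It illustrates the Kronecker-delta idea only and is not a description of what glosim's fastavg or fastspecies actually compute. `species_average_kernel` is a hypothetical name.

```python
import numpy as np

def species_average_kernel(XA, sA, XB, sB, zeta=2):
    """Average kernel over environment pairs whose centres share a species.

    XA, XB: (N_A, d) and (N_B, d) environment feature vectors (assumed normalized).
    sA, sB: length-N_A / N_B sequences of centre species (e.g. atomic numbers).
    """
    XA = np.asarray(XA, dtype=float)
    XB = np.asarray(XB, dtype=float)
    k_env = (XA @ XB.T) ** zeta                                   # k(x_i, y_j) for all pairs
    delta = np.asarray(sA)[:, None] == np.asarray(sB)[None, :]    # Kronecker delta on species
    return float(np.sum(k_env * delta) / (len(XA) * len(XB)))
```

Environment pairs with different centre species contribute zero, however similar their feature vectors are.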
Not exactly an issue, but I was wondering if you could tell me the difference between the average and fastavg kernels. In which cases can I use the much faster fastavg kernel, and when should I resort to the average kernel?
Thank you! :)
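The average kernel is commonly defined as the mean of an environment kernel over all pairs, K(A, B) = 1/(N_A N_B) Σ_i Σ_j k(x_i, y_j). For a linear environment kernel (zeta = 1), this equals the dot product of the per-structure mean feature vectors, which needs one vector per structure instead of all environments. For zeta > 1, the two differ. Whether glosim's fastavg is built exactly this way is an assumption. The sketch only demonstrates the identity.

```python
import numpy as np

def average_kernel(XA, XB, zeta=1):
    """Mean of (x_i . y_j)^zeta over all environment pairs (needs every environment)."""
    return float(np.mean((np.asarray(XA, float) @ np.asarray(XB, float).T) ** zeta))

def mean_vector_kernel(XA, XB, zeta=1):
    """(mean(x) . mean(y))^zeta: uses one averaged vector per structure."""
    ma = np.asarray(XA, float).mean(axis=0)
    mb = np.asarray(XB, float).mean(axis=0)
    return float(np.dot(ma, mb) ** zeta)
```

Under this assumption, the averaged-vector shortcut is exact for zeta = 1. For zeta = 2, the pairwise average is never smaller (Jensen's inequality), so the shortcut only approximates it.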
Hi,
Thanks for making these algorithms available.
I just downloaded quippy and installed it using this guide https://libatoms.github.io/QUIP/install.html
I checked out your code and tried to run the example, but I get the following trace:
python2 glosim.py example/mol-50.xyz
WARNING! fast hungarian library is not available
Cannot find mcpermanent.so module in pythonpath. Permanent evaluations will be very slow and approximate.
Get it from https://github.com/sandipde/MCpermanent
TIME: Wed Dec 21 17:34:03 2016
___ __ _____ ___ ____ __ __
/ __)( ) ( _ )/ __)(_ _)( \/ )
( (_-. )(__ )(_)( \__ \ _)(_ ) (
\___/(____)(_____)(___/(____)(_/\/\_)
Reading input file example/mol-50.xyz
50 Configurations Read
Computing SOAPs
Traceback (most recent call last):
File "glosim.py", line 871, in <module>
main(args.filename, nd=args.n, ld=args.l, coff=args.c, cotw=args.cotw, gs=args.g, mu=args.mu, centerweight=args.cw, periodic=args.periodic, usekit=args.usekit, kit=args.kit,alchemyrules=args.alchemy_rules, kmode=args.kernel, nonorm=args.nonorm, permanenteps=args.permanenteps, reggamma=args.gamma, noatom=noatom, nocenter=nocenter, envsim=args.envsim, nprocs=args.np, verbose=args.verbose, envij=envij, prefix=args.prefix, nlandmark=args.nlandmarks, printsim=args.distance,ref_xyz=args.refxyz,partialsim=args.livek,lowmem=args.lowmem,restartflag=args.restart, zeta=args.zeta, xspecies=args.separate_species, alrange=(args.first,args.last), refrange=(args.reffirst, args.reflast))
File "glosim.py", line 189, in main
si.parse(at, coff, cotw, nd, ld, gs, centerweight, nocenter, noatom, kit = kit)
File "/home/peter/kodesjov/glosim/libmatch/structures.py", line 82, in parse
desc = quippy.descriptors.Descriptor("soap central_weight="+str(cw)+" covariance_sigma0=0.0 atom_sigma="+str(gs)+" cutoff="+str(coff)+" cutoff_transition_width="+str(cotw)+" n_max="+str(nmax)+" l_max="+str(lmax)+' '+lspecies+' Z='+str(sp) )
AttributeError: 'module' object has no attribute 'descriptors'
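The AttributeError suggests that the installed quippy build does not expose the descriptors module used by libmatch/structures.py. Before running glosim, a small helper like the one below can check which of the modules mentioned in the trace and warnings can actually be imported. `check_modules` is a hypothetical helper. The module names come from the messages above.

```python
import importlib

def check_modules(names):
    """Return {module_name: True/False} depending on whether it imports without error."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except Exception:  # ImportError, or a failure while initializing the module
            status[name] = False
    return status

if __name__ == "__main__":
    # Names taken from the trace and warnings printed by glosim.py
    for name, ok in check_modules(["quippy", "quippy.descriptors", "mcpermanent"]).items():
        print(name, "OK" if ok else "missing")
```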
Hi Dr Sandip and Professor Ceriotti,
I want to reproduce your Science Advances results on the QM7b dataset. I seem to be doing some step wrong, because I always get an MAE higher than 8 kcal/mol. Could you give me some tips on where I made a mistake?
What I did is as follows.
1. I used
python glosim.py qm7.xyz -n 9 -l 9 -g 0.3 -c 3 --zeta 2 --peratom --kernel average
to generate the kernel matrix. According to the SI of your paper, the parameters should be -n 9 -l 9 -g 0.3 -c 3 --zeta 2 --periodic --nonorm, but I replaced --periodic with --peratom and deleted --nonorm because glosim.py has since been updated.
2. I used ShuffleSplit from scikit-learn to get the training and test kernel matrices (with a random split instead of choosing the training set by FPS):
from sklearn.model_selection import ShuffleSplit
# X: precomputed N x N kernel matrix, y: N reference energies (both NumPy arrays)
rs = ShuffleSplit(n_splits=10, test_size=0.20, random_state=0)  # 10 independent random splits
for train_index, test_index in rs.split(X):
    X_train, y_train = X[train_index][:, train_index], y[train_index]  # train x train kernel block
    X_test, y_test = X[test_index][:, train_index], y[test_index]  # test x train kernel block
3. I fitted a Gaussian process regression on X_train with different regularization parameters and used it to predict X_test.
The lowest MAE I got was 8 kcal/mol with the average kernel (cutoff r = 4 Å). The random split may contribute to the high MAE, but I don't think it can account for an error this large. Could you give me some suggestions?
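With a precomputed kernel, the posterior mean of Gaussian process regression coincides with kernel ridge regression, where the regularization plays the role of the noise variance. Below is a minimal NumPy sketch of fitting and scoring one split. `krr_fit_predict` and `mae` are hypothetical helper names. Subtracting the mean training energy is a design choice, not something stated in the question.

```python
import numpy as np

def krr_fit_predict(K_train, y_train, K_test, regularization):
    """Kernel ridge regression (= GPR posterior mean) with a precomputed kernel.

    K_train: (n_train, n_train) kernel between training structures.
    K_test:  (n_test, n_train) kernel between test and training structures.
    """
    y_mean = y_train.mean()  # fit deviations from the mean training energy
    n = K_train.shape[0]
    # Solve (K + lambda * I) alpha = y - mean; solve() avoids forming an explicit inverse
    alpha = np.linalg.solve(K_train + regularization * np.eye(n), y_train - y_mean)
    return K_test @ alpha + y_mean

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))
```

With the split from step 2, `mae(y_test, krr_fit_predict(X_train, y_train, X_test, reg))` scores one value of `reg`. The regularization is best selected on a validation subset of the training data rather than on the test set.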
Hi,
Thanks for providing the algorithms.
I just installed quippy and tried glosim.
I tried the command from the electronic supplementary information of the paper Chem. Sci., 2018, 9, 1289:
glosim.py traj-pentacene.xyz -n 9 -l 9 -g 0.3 -c 3 --kernel rematch --gamma 2 --periodic --nonorm
glosim does not seem to recognize the options '--periodic --nonorm':
glosim.py: error: unrecognized arguments: --periodic --nonorm
Also, without these options, the program aborts with the message below.
WARNING! fast hungarian library is not available
Cannot find mcpermanent.so module in pythonpath. Permanent evaluations will be very slow and approximate.
Get it from https://github.com/sandipde/MCpermanent
TIME: Wed Oct 17 14:46:59 2018
___ __ _____ ___ ____ __ __
/ __)( ) ( _ )/ __)(_ _)( \/ )
( (_-. )(__ )(_)( \__ \ _)(_ ) (
\___/(____)(_____)(___/(____)(_/\/\_)
using output prefix = traj-pentacene-n9-l9-c3.0-g0.3_rematch-2.0
Reading input file traj-pentacene.xyz
564 Configurations Read
Computing SOAPs
Traceback (most recent call last):
File "glosim/glosim.py", line 862, in <module>
main(args.filename, nd=args.n, ld=args.l, coff=args.c, cotw=args.cotw, gs=args.g, mu=args.mu, centerweight=args.cw, onpy=args.onpy, peratom=args.peratom, unsoap = args.unsoap, usekit=args.usekit, kit=args.kit,alchemyrules=args.alchemy_rules, kmode=args.kernel, normalize_global=args.normalize_global, permanenteps=args.permanenteps, reggamma=args.gamma, noatom=noatom, nocenter=nocenter, envsim=args.envsim, nprocs=args.np, verbose=args.verbose, envij=envij, prefix=args.prefix, nlandmark=args.nlandmarks, printsim=args.distance,ref_xyz=args.refxyz,partialsim=args.livek,lowmem=args.lowmem,restartflag=args.restart, zeta=args.zeta, xspecies=args.separate_species, alrange=(args.first,args.last), refrange=(args.reffirst, args.reflast))
File "glosim/glosim.py", line 243, in main
sii,senvii = structk(si, si, alchem, peratom, mode=kmode, fout=fii, peps=permanenteps, gamma=reggamma, zeta=zeta, xspecies=xspecies)
File "/home/iino/Work/glosim/libmatch/structures.py", line 292, in structk
cost=rematch(kk, gamma, 1e-6) # hard-coded residual error for regularized gamma
File "/home/iino/Work/glosim/libmatch/lap/perm.py", line 46, in rematch
raise ValueError("No Python equivalent to rematch function...")
ValueError: No Python equivalent to rematch function...
It seems the REMatch kernel has no Python implementation.
Is there a problem with my environment, or is this intentional?
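The traceback shows that libmatch/lap/perm.py raises an error when no compiled rematch is available. For orientation, REMatch is usually formulated as an entropy-regularized matching of environments. Let C be the matrix of environment kernels between structures A and B. Find P, with row sums 1/N_A and column sums 1/N_B, that minimizes Σ P_ij (1 − C_ij) + γ Σ P_ij ln P_ij, and return Σ P_ij C_ij. This is an entropic optimal-transport problem that Sinkhorn iterations solve.

The pure-NumPy sketch below rests on two assumptions: that glosim's gamma is this γ, and that the 1e-6 passed in structures.py is a convergence tolerance. `rematch_sinkhorn` is a hypothetical name, not glosim's API. It has not been checked against the compiled module's output.

```python
import numpy as np

def rematch_sinkhorn(C, gamma, tol=1e-6, max_iter=10000):
    """Entropy-regularized matching kernel between two sets of environments.

    C: (N_A, N_B) environment kernel matrix with entries in [0, 1].
    gamma: entropy regularization strength.
    """
    C = np.asarray(C, dtype=float)
    n, m = C.shape
    a = np.full(n, 1.0 / n)           # required row sums of P
    b = np.full(m, 1.0 / m)           # required column sums of P
    K = np.exp(-(1.0 - C) / gamma)    # Gibbs kernel for the cost 1 - C
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(max_iter):
        u = a / (K @ v)               # match row sums
        v = b / (K.T @ u)             # match column sums (now exact)
        if np.max(np.abs(u * (K @ v) - a)) < tol:
            break                     # row sums also within tolerance
    P = u[:, None] * K * v[None, :]   # regularized matching matrix
    return float(np.sum(P * C))
```

As gamma grows, P tends to the uniform matrix and the result approaches the average kernel. As gamma shrinks, it approaches the best one-to-one matching. For very small gamma, exp(-(1 - C)/gamma) underflows, and a log-domain Sinkhorn would be needed.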
Hi, I am trying to run this code with the rematch kernel, but it seems that the following dependency is outdated:
I installed https://github.com/sandipde/MCpermanent/ but it does not seem to match the code here.