astrazeneca / kallisto
Efficiently calculate 3D-features for quantitative structure-activity relationship approaches.
Home Page: https://ehjc.gitbook.io/kallisto/
License: Apache License 2.0
While the "Statement of Need" in the manuscript introduces the need for computing atomic features, it is not clear exactly what the actual capabilities of the software are. I understand the main functionality is described in the online documentation, but I think it should also be clearer from the paper. This would also allow adding proper references to the implemented methods (if relevant).
The same applies to the mentioned "computational modelling helpers". It would be nice to know which ones they are in the paper (at least the most useful/relevant ones).
I also think that specifying DFT functionals and basis sets is not relevant in the paper and would be better left to the documentation (where I can't see any mention of this?). I think (e.g., static/dynamic polarizabilities with time-dependent DFT data) would be much more readable than (e.g., static/dynamic polarizabilitities with time-dependent (Dreuw & Head-Gordon, 2005) PBE0 (Adamo & Barone, 1999) / d-aug-def2-QZVP (Weigend & Ahlrichs, 2005) data).
usecases
-> use cases
polarizabilitities
-> polarizabilities
Stefan Grimme and co-worker
-> Stefan Grimme and co-workers
(I suck at spotting typos, so I might add more later on...)
I think the phrase
Tasks as, e.g., building predictive models, performing virtual screening, or predicting compound activities are potential use cases for such ML applications.
could do with some references about ML models for VS and activity prediction.
Multiple references are not correctly displayed. As can be inferred from the example paper and bibliography (not very clear, I know...!), multiple references can be put together with the following syntax: [@ref1; @ref2].
To reach a wider audience, it would be helpful to have a wider set of example files; otherwise I am digging through your test directory.
Some folk (and I picked this up from MDAnalysis) will come with prepared files within the package. You can choose to include yours in there for the example in the Coordination Numbers and Spheres page.
alanine-glycine.xyz
If someone isn't familiar with the xmol format, it is also worth placing an example in the docs. You could also link to the test file, but most folk don't want to flood their system with another file they have to coordinate; they like it compartmentalized.
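A minimal illustration of the xmol/XYZ layout could go straight into the docs (this water example is made up for illustration): the first line is the atom count, the second line is a free-form comment, and then each line holds one element symbol plus Cartesian coordinates (in Angstrom):

```
3
water, coordinates in Angstrom
O    0.000000    0.000000    0.117300
H    0.000000    0.757200   -0.469200
H    0.000000   -0.757200   -0.469200
```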
Hence the problem is shifted from calculating dynamic polarizabilities to calculating atomic coordination numbers and interpolating the atomic reference systems. This procedure brings a tremendous increase in efficiency and furthermore overcomes numerical issues that could occur in TD-DFT calculations.
Note that the dynamic reference polarizabilities can be pre-scaled to, e.g., incorporate atomic-charge information as has been introduced recently by Caldeweyher et al.
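The interpolation step described above can be sketched as follows. This is a generic Gaussian-weighting illustration in the spirit of the D3/D4 family of methods; the function name, the default width `wf`, and the exact weighting form are assumptions for illustration, not kallisto's verified implementation:

```python
import numpy as np

def interpolate_polarizability(cn, cn_ref, alpha_ref, wf=6.0):
    """Gaussian-weighted interpolation of reference polarizabilities.

    cn        : coordination number of the atom in the actual structure
    cn_ref    : array of reference coordination numbers
    alpha_ref : array of reference (dynamic) polarizabilities
    wf        : Gaussian width / weighting factor (hypothetical default)
    """
    weights = np.exp(-wf * (cn - cn_ref) ** 2)
    weights /= weights.sum()  # normalize so the weights sum to one
    return float(np.dot(weights, alpha_ref))

# If the actual CN matches one reference exactly, that reference dominates:
alpha = interpolate_polarizability(1.0, np.array([0.0, 1.0, 2.0]),
                                   np.array([10.0, 8.0, 6.0]))
```

Pre-scaling, as mentioned for atomic-charge information, would simply multiply `alpha_ref` by a charge-dependent factor before the weighted sum.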
Other comments:
- how kallisto compares to AlphaML in terms of speed?
- a --molecular option to compute the molecular polarizability

The code is quite complex; it could do with a few more comments so that it is easier to follow and understand.
As mentioned in #26 (and for the reason above), it would be helpful to have better consistency between variable names in the code and the documentation. For example, you have wf = 6.0 in the code, beta in the documentation (beta is not defined at all in the documentation), and beta2 in the paper. I had to dig into the paper to understand what wf = 6.0 was.
kallisto/src/kallisto/methods.py
Lines 244 to 245 in 293503b
kallisto/src/kallisto/methods.py
Lines 314 to 315 in 293503b
There might be other parts where the code could benefit from vectorisation and some parts where loops can be reduced.
For example, here I think you can combine the two loops together:
kallisto/src/kallisto/methods.py
Lines 290 to 299 in 293503b
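The suggested loop fusion and vectorisation can be illustrated with a generic pattern (the function names here are made up; the kallisto code in question follows the same shape):

```python
import numpy as np

# Two sequential loops over the same index ...
def two_loops(values):
    out = np.zeros_like(values)
    for i in range(len(values)):
        out[i] = values[i] ** 2
    total = 0.0
    for i in range(len(values)):
        total += out[i]
    return out, total

# ... can be fused into one loop, or better, vectorised entirely:
def vectorised(values):
    out = values ** 2          # elementwise square, no Python loop
    return out, out.sum()      # reduction done in C by NumPy
```

Both versions return identical results; the vectorised one avoids per-element Python overhead.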
Missing an author?
authors = ["Eike Caldeweyher <[email protected]>"]
That's it actually... very nicely written file.
Describe the solution you'd like
Implement solvent accessible surface area (SASA) using an angular Lebedev-Laikov grid as described in "A quadrature formula for the sphere of the 131st algebraic order of accuracy", Doklady Mathematics, Vol. 59, No. 3, 1999, pp. 477-481.
Extend the present grid implementation in src/grid.py to cover finer grids and use the defined grid points to construct a SASA.
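The requested SASA construction can be sketched from unit-sphere grid points. This is a minimal numerical-exposure sketch, not the requested Lebedev-Laikov implementation: a true Lebedev grid carries quadrature weights, whereas here every point is given an equal share of the sphere surface, and all names are assumptions:

```python
import numpy as np

def sasa_per_atom(coords, radii, grid, probe=1.4):
    """Approximate solvent-accessible surface area per atom.

    coords : (N, 3) atomic positions
    radii  : (N,) van der Waals radii
    grid   : (M, 3) unit-sphere grid points (e.g. a Lebedev-Laikov grid)
    probe  : solvent probe radius (1.4 Angstrom is the usual water probe)
    """
    areas = np.zeros(len(coords))
    for i, (xyz, r) in enumerate(zip(coords, radii)):
        ri = r + probe
        points = xyz + ri * grid            # place grid on the SAS sphere
        accessible = np.ones(len(grid), dtype=bool)
        for j, (xyz_j, r_j) in enumerate(zip(coords, radii)):
            if j == i:
                continue
            d = np.linalg.norm(points - xyz_j, axis=1)
            accessible &= d > (r_j + probe)  # buried inside a neighbour's SAS sphere
        # equal-weight points; a true Lebedev grid would use quadrature weights
        areas[i] = 4.0 * np.pi * ri**2 * accessible.sum() / len(grid)
    return areas
```

For a single isolated atom this reduces to the full sphere area 4*pi*(r + probe)^2, which makes a convenient sanity check.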
In the rmsd.py module, it seems that there is a lot of code that could be removed in favor of scipy.spatial.transform and scipy.linalg. My view here is that code from another library is code you don't have to maintain, and a library such as scipy is usually very reliable, extensively tested, and possibly faster. These are the functions that could use scipy functionality; it would remove a lot of lines of code:
- rotionMatrix() could use scipy.spatial.transform.Rotation, using from_quat() and as_matrix()
- dstmev() could be substituted with a scipy.linalg.eigh()?
- givens4
- getRodriguezRotation() could use scipy.spatial.transform.Rotation internally, using from_rotvec() and apply()
- getRotationMatrix() could use scipy.spatial.transform.Rotation internally, using from_rotvec() and as_matrix()
- getDist could be vectorised or substituted with scipy.spatial.distance.euclidean (or similar)

Up to you if you want to change these, but I think it would be helpful for future development and maintenance. The best code is no code ;)
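The suggested scipy replacements can be illustrated in a few lines (the calls below are the real scipy APIs; the numeric examples are made up):

```python
import numpy as np
from scipy.spatial.transform import Rotation
from scipy.linalg import eigh
from scipy.spatial.distance import euclidean

# Rotation matrix from a quaternion (scalar-last [x, y, z, w] convention);
# this quaternion encodes a 90-degree rotation about z:
R = Rotation.from_quat([0.0, 0.0, np.sin(np.pi / 4), np.cos(np.pi / 4)]).as_matrix()

# Rodrigues-style rotation: rotate points about an axis by an angle
axis_angle = (np.pi / 2) * np.array([0.0, 0.0, 1.0])   # 90 degrees about z
rotated = Rotation.from_rotvec(axis_angle).apply([[1.0, 0.0, 0.0]])

# Eigendecomposition of a symmetric matrix (eigenvalues ascending):
w, v = eigh(np.array([[2.0, 1.0], [1.0, 2.0]]))

# Euclidean distance between two points:
d = euclidean([0.0, 0.0, 0.0], [3.0, 4.0, 0.0])        # 5.0
```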
Additionally:
- pythag() is missing a docstring.
- Is unit in getRodriguezRotation() correct?
- The following docstring is incorrect:
kallisto/src/kallisto/methods.py
Lines 383 to 390 in 293503b
Unfortunately I think that changes introduced for #22 prevent vectorisation in this case (sorry for not spotting this previously!), since you have to go from the atomic number to the chemical symbol in order to extract the parameters needed.
kallisto/src/kallisto/methods.py
Lines 406 to 408 in 293503b
I'm not too keen to suggest reverting the changes, so it is up to you to decide if it is worth the effort.
As the title suggests, is there any more in-depth documentation on sterimol descriptors? Thank you so much.
-Montana Mendy
Actual coverage is 62%.
Missing lines 48->50, 50->53, 53->62, 54->55, 55-59, 62->63, 63-67, 73->78, 75->76, 76-84, 86->87, 87, 90->91, 91, 98->99, 99, 111-126, 130, 134, 140-143, 155-160
The link to the tutorial in the contributing guidelines is currently broken. See fortran-lang/fpm#340.
If you never created a pull request before you can learn how to do this from this great tutorial
The new link to the tutorial is here:
https://app.egghead.io/courses/how-to-contribute-to-an-open-source-project-on-github
Actual coverage is 80%.
Missing lines 18-20, 26, 44->49, 45->44, 53->56, 60->63, 63-64
I started looking at the source code and I noticed that the grid.py module is outside the kallisto package. When kallisto is imported, the grid module also becomes available, possibly clashing with or shadowing other modules. Is this intended?
I think it would be better to have the grid module within the package, unless I'm missing a good reason not to.
The documentation for the sterimol descriptors is quite scarce. I'd add some details (or at the very least link to the relevant section of the Morfeus package).
Do I understand correctly that the code comes directly from the Morfeus package? I would make this clearer in the module docstring (not only in the function getClassicalSterimol()). Is there any kind of copyright or license that needs to be mentioned as well?
I was not able to find this code; if it was given in private, maybe an acknowledgment of this in the paper is needed. Some clarifications about how the code was obtained and integrated would be helpful to better understand the situation.
One thing I can recommend is:
Since you've got nox in there, I don't know what the licensing issues are. When we touch the open source world and you've got massive amounts of Python packages, robust licensing is definitely worth it.
FOSSA will scan through each dependency (it's free) and bring each license for dependency to light.
I added a FOSSA badge as an example into one of my projects. It's very easy to set up (~5 minutes for GitHub integration, like CodeCov) and it can go through each license to make sure you are not stepping on anyone's toes.
Is your feature request related to a problem? Please describe.
I just ran kallisto over a set of structures with
for i in *; do
kallisto alp $i/inp.xyz > $i/kallisto.out
done
But I remembered a moment too late that kallisto expects the input file as the argument to --inp and the positional argument is the output. Fortunately, the geometries were under version control, which made reverting easy.
At least to me the CLI feels unintuitive from time to time, and I frequently have to consult the help printout to get the command line right.
Describe the solution you'd like
Print the error message to stderr instead of the output file.
Describe alternatives you've considered
Adjust the CLI to expect the input file as positional argument and write to stdout.
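Both variants boil down to keeping diagnostics out of the data stream. A minimal sketch (the helper is hypothetical, not kallisto's actual code):

```python
import sys

def report(result=None, error=None):
    """Write results to stdout and diagnostics to stderr, so that a shell
    redirection like `kallisto ... > out.file` never clobbers a file with
    an error message."""
    if error is not None:
        print(f"Error: {error}", file=sys.stderr)
        sys.exit(1)
    print(result)  # normal output goes to stdout
```

With this separation, `kallisto cns geometry.xyz > cns.out` would leave cns.out untouched on failure while the error still reaches the terminal.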
I definitely stalk a bit,
I want to see who you guys are - like the new kids on the block.
Here's a code suggestion:
Currently using kallisto, and I used Travis, but I thought I'd share my yml just in case anyone else ran into this issue as well!
I think it's imperative to have instructions for all CI/CDs, so I wrote some Travis build instructions for this repo for AstraZeneca, just in case; this happened to be what I used when implementing the CI/CD of my choice.
---
language: python
matrix:
  include:
    - dist: xenial
      os: linux
      python: 3.6
      sudo: required
    - dist: xenial
      os: linux
      python: 3.7
      sudo: required
    - env: PYTHON=36
      language: generic
      os: osx
      osx_image: xcode10
script:
  - "pip install nox"
  - "pip install poetry"
# Travis CI build instructions for kallisto, made by Montana Mendy
# just in case they are needed by AstraZeneca
-Montana Mendy
Actual coverage is 88%.
Missing lines 21->25, 30->31, 31
So digging through your units.py, I'm curious about the design decision to implement your own constants instead of using one of your package's dependencies, scipy. Is SciPy more trustworthy than the 2014 paper, or vice versa?
Scipy Implementation
import scipy.constants as constants
newt_grav = constants.gravitational_constant
I think that's the code. It might be confusing for users down the line to have two sets of constants. I usually just import from scipy.
Here's the link to the docs: https://docs.scipy.org/doc/scipy/reference/constants.html
Comparison:
6.6740e-11 (Kallisto) | 6.6743e-11 (SciPy) # Gravitational Constant
6.626070040e-34 (Kallisto) | 6.62607015e-34 (Scipy) # Planck Constant
etc..
Describe the bug
I found the command line interface a bit unintuitive, i.e. if I want to calculate a property I'm using:
❯ kallisto --verbose cns --inp struc.xyz
[3.98610602 3.11339432 3.25775815 3.40451617 2.07932831 3.45973873
0.9958313 3.00988788 3.05595857 1.00441848 3.01990055 3.01780514
1.00235052 4.00645069 0.99417174 0.99389366 0.99390611 3.9980567
0.99426699 0.9936817 0.99393769 0.99395809 0.99370343 0.99367221]
Now if I leave out the --verbose flag, the calculation seems to happen but no printout is produced?
❯ kallisto cns --inp struc.xyz
To Reproduce
Environment setup:
❯ conda info
active environment : kallisto
active env location : /home/awvwgk/miniforge3/envs/kallisto
shell level : 2
user config file : /home/awvwgk/.condarc
populated config files : /home/awvwgk/miniforge3/.condarc
conda version : 4.9.2
conda-build version : not installed
python version : 3.7.8.final.0
virtual packages : __glibc=2.33=0
__unix=0=0
__archspec=1=x86_64
base environment : /home/awvwgk/miniforge3 (writable)
channel URLs : https://conda.anaconda.org/conda-forge/linux-64
https://conda.anaconda.org/conda-forge/noarch
package cache : /home/awvwgk/miniforge3/pkgs
/home/awvwgk/.conda/pkgs
envs directories : /home/awvwgk/miniforge3/envs
/home/awvwgk/.conda/envs
platform : linux-64
user-agent : conda/4.9.2 requests/2.24.0 CPython/3.7.8 Linux/5.10.15-1-MANJARO manjaro/20.2.1 glibc/2.33
UID:GID : 1000:1000
netrc file : None
offline mode : False
❯ conda list
# packages in environment at /home/awvwgk/miniforge3/envs/kallisto:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
ca-certificates 2020.12.5 ha878542_0 conda-forge
certifi 2020.12.5 py39hf3d152e_1 conda-forge
click 7.1.2 pyh9f0ad1d_0 conda-forge
kallisto 1.0.4 pypi_0 pypi
ld_impl_linux-64 2.35.1 hea4e1c9_2 conda-forge
libblas 3.9.0 8_openblas conda-forge
libcblas 3.9.0 8_openblas conda-forge
libffi 3.3 h58526e2_2 conda-forge
libgcc-ng 9.3.0 h2828fa1_18 conda-forge
libgfortran-ng 9.3.0 hff62375_18 conda-forge
libgfortran5 9.3.0 hff62375_18 conda-forge
libgomp 9.3.0 h2828fa1_18 conda-forge
liblapack 3.9.0 8_openblas conda-forge
libopenblas 0.3.12 pthreads_h4812303_1 conda-forge
libstdcxx-ng 9.3.0 h6de172a_18 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
numpy 1.20.1 py39hdbf815f_0 conda-forge
openssl 1.1.1j h7f98852_0 conda-forge
pip 21.0.1 pyhd8ed1ab_0 conda-forge
python 3.9.2 hffdb5ce_0_cpython conda-forge
python_abi 3.9 1_cp39 conda-forge
readline 8.0 he28a2e2_2 conda-forge
scipy 1.6.0 py39hee8e79c_0 conda-forge
setuptools 49.6.0 py39hf3d152e_3 conda-forge
sqlite 3.34.0 h74cdb3f_0 conda-forge
tk 8.6.10 hed695b0_1 conda-forge
tzdata 2021a he74cb21_0 conda-forge
wheel 0.36.2 pyhd3deb0d_0 conda-forge
xz 5.2.5 h516909a_1 conda-forge
zlib 1.2.11 h516909a_1010 conda-forge
Input file (any input geometry would do as well)
24
-42.15442198
C -3.2668063145 -1.0612383311 0.0019792586
N -2.2766003898 -0.0195172936 -0.0013208218
C -0.9134590858 -0.2019878522 0.0036747135
C -0.3724701006 1.0762202467 0.0017569102
N -1.3526586265 2.0066755645 -0.0051933750
C -2.4700895686 1.3111433173 -0.0065911261
H -3.4434528448 1.7598589152 -0.0115392456
N 0.9701620055 1.2833365802 -0.0008793828
C 1.8257144377 0.2028321683 -0.0007168653
O 3.0278876281 0.3474912012 -0.0051812078
N 1.2515027114 -1.0636950051 -0.0008922263
C -0.1065018202 -1.3842039491 0.0029991689
O -0.5015980556 -2.5366534217 0.0044099955
C 2.2018357820 -2.1638123093 -0.0037185988
H 1.6323675079 -3.0900622488 0.0097444448
H 2.8244315108 -2.1119787186 -0.8964193499
H 2.8450781508 -2.0977859157 0.8731480371
C 1.5216365707 2.6211569486 0.0093869454
H 0.7306184352 3.3088549484 -0.2771144995
H 1.8886888508 2.8714188674 1.0060887389
H 2.3551115434 2.6687817985 -0.6902763920
H -2.7510404767 -2.0202634640 -0.0176576390
H -3.9109098848 -0.9753935259 -0.8744768432
H -3.8816811438 -0.9984445054 0.9013764258
Expected behavior
Do something meaningful for non-verbose calculations.
Most of the tests are in test_cli.py instead of being in separate and standalone files (for example test_methods.py, test_rmsd.py, ...). This is not a problem in itself, but it seems that in some tests the asserts do not really check the result.
Using the CLI to perform tests forces you to check the results as strings instead of numbers. This seems to be done incorrectly (see below). Additionally, if at some point you want to change the output format (for example to use scientific notation), you will have to change all the assertions.
I would suggest re-working the tests in order to decouple the tests of the CLI from the tests of the functionality. This is not strictly necessary, but the feeling I have looking at test_cli.py is that the code coverage is very high because all parts of the code are run, while some tests are not really meaningful (i.e. they mostly check that the test was run and output files have been created). I can't see many tests for edge cases either. I would like to see more clear and meaningful tests, comparing with known results (obtained from reference software or from manual calculations), so that users can make sure the package produces correct numerical results.
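Such a decoupled test could look roughly like this. The function below is a toy stand-in with the same shape as a CN damping term, since the real call would be an import from the package (e.g. something like `from kallisto.methods import getCoordinationNumbers`, whose exact signature I have not verified):

```python
import numpy as np

def coordination_number(distance, r_cov=1.5, k1=16.0):
    """Toy counting function shaped like a coordination-number damping term."""
    return 1.0 / (1.0 + np.exp(-k1 * (r_cov / distance - 1.0)))

def test_coordination_number_limits():
    # compare against known reference values, not against printed strings
    assert np.isclose(coordination_number(1e-6), 1.0, atol=1e-6)  # bonded limit
    assert np.isclose(coordination_number(1e6), 0.0, atol=1e-6)   # dissociated limit

test_coordination_number_limits()
```

The point is that the assertions target numbers with an explicit tolerance, so a later change to the CLI output format cannot silently break or weaken the tests.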
test_cli.py
The following tests do not seem to have any output when run outside the test suite:
Lines 96 to 104 in fe84963
Lines 107 to 115 in fe84963
The following tests only check if files have been produced, but not the content of the output file, and there is no check besides the fact that they run:
Lines 118 to 133 in fe84963
Lines 418 to 433 in fe84963
Lines 730 to 745 in fe84963
The following assertions look incorrect, since "A" and "B" evaluates to "B" if "A" is true (i.e. a non-empty string):
Line 202 in fe84963
Line 214 in fe84963
Line 226 in fe84963
Line 250 in fe84963
Line 262 in fe84963
Line 274 in fe84963
Line 310 in fe84963
Line 322 in fe84963
Line 336 in fe84963
Line 350 in fe84963
Line 364 in fe84963
Line 378 in fe84963
Line 459 in fe84963
Line 471 in fe84963
Lines 1182 to 1186 in fe84963
Line 1236 in fe84963
The following test brought to my attention that the original information in the second line of the XYZ file is lost and replaced by "Created with kallisto":
Lines 175 to 178 in fe84963
I think it would be better to retain the original comment and add "Created with kallisto" after it.
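One way to implement the suggested behaviour (a minimal sketch; the helper name and the separator are assumptions, not kallisto's actual code):

```python
def tag_comment(original_comment, tag="Created with kallisto"):
    """Keep the user's original second-line XYZ comment and append the tag."""
    original_comment = original_comment.strip()
    if not original_comment:
        return tag
    return f"{original_comment} | {tag}"
```

For the energy comment from the test geometry this would yield "-42.15442198 | Created with kallisto" instead of discarding the energy.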
I would include the NumPy citation, now that Travis published it. If you are using it in your docs and in your software, it should be mentioned.
https://www.nature.com/articles/s41586-020-2649-2
worth a read if you haven't read it :)
I've run through your first tutorial on Coordination Numbers and Spheres for the xyz file. While the output is nice and I have a general idea of what's going on, I also want to know how you tested this feature right there.
What I do for gitbook applications is split the features into three sections:
So when someone looks at this feature they know how you tested it. It doesn't have to be verbose, but it's a "nice to have" that gives me more confidence.
If you would like I can provide a link to my unreleased gitbook to show you that.
Hello
(First time writing a suggestion here on GitHub, so I hope this is correctly done.)
I want to use kallisto to do an exchange of a polydentate ligand, but it seems not to be possible, which is why I suggest this would be a good feature to add.
Actual coverage is 87%.
Missing lines 20->23, 26->30, 27->26, 30->42
Installation with pip works flawlessly in a clean conda environment.
Installation from source also works flawlessly, but I think the instructions could be simplified so that users can choose how to manage their environments. For example, poetry is also available on the conda-forge channel, and nox can be installed using conda instead of pip.
I suggest to decouple the installation instructions for Python and the dependencies from the installation of kallisto as follows:
- install the dependencies (poetry, Python version, ...)
- install kallisto: clone, change directory, call poetry (or nox)

The advantage of linking to the installation instructions of the dependencies, IMHO, is that you won't need to keep such instructions up to date yourself, and users can install such dependencies in their preferred way (conda vs pip, ...).
Additional comments/questions:
Forgive me, but it's been a while since I've seen the __slots__ attribute. I remember some of the numpy folk arguing about it at the last PyData conference.
In src/kallisto/atom.py
, Line 52.
__slots__ = ["data", "molecule", "index"]
def __init__(self, symbol="X", position=(0, 0, 0), charge=0, molecule=None):
self.data = data = {}
Here's what I'm looking at. Can you elaborate on your reason for choosing slots, and does it actually make access to your class attributes that much faster?
In the documentation, I think the variable chi_i should be defined after the first formula for E_IES instead of within the definition of the matrix elements of A.
There are many places where the code can be vectorised in getAtomicPartialCharges(). This will speed up the execution since it will get rid of multiple for loops that grow linearly with the number of atoms. Some places that could benefit from vectorisation are the following:
kallisto/src/kallisto/methods.py
Lines 173 to 177 in 293503b
kallisto/src/kallisto/methods.py
Lines 202 to 205 in 293503b
kallisto/src/kallisto/methods.py
Lines 207 to 210 in 293503b
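The vectorisation pattern for these pairwise loops can be sketched with the gamij expression quoted later in this review (the loop version mirrors the nested-loop style; the broadcast version builds the whole matrix at once):

```python
import numpy as np

def pairwise_gamma_loop(alpha):
    """Nested-loop construction of the pairwise gamma matrix."""
    n = len(alpha)
    gam = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            gam[i, j] = 1.0 / np.sqrt(alpha[i] + alpha[j])
    return gam

def pairwise_gamma_vec(alpha):
    """Broadcasting builds the full (n, n) matrix without Python loops."""
    return 1.0 / np.sqrt(alpha[:, None] + alpha[None, :])
```

Both return identical matrices; the broadcast version scales much better with the number of atoms.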
I struggle to see why the arrays A and X are being reshaped
kallisto/src/kallisto/methods.py
Lines 215 to 216 in 293503b
since they have been initialized with the correct shape already. Am I missing something?
Is the following line of code correct?
kallisto/src/kallisto/methods.py
Line 191 in 293503b
If yes, I think a few things would help understanding what is going on:
- gam (which recalls "gamma") is very similar to gamij; if I understand correctly, the former represents Jii in the documentation and the latter is clearly gammaij. I'd make the variable names in the code more consistent with the documentation (or vice versa).
- I'd define gammaij and gammaii in terms of the parameters alphai and alphaj; from the code there is gamij = 1.0 / np.sqrt(alpha[i] + alpha[j]), but then the line above seems to indicate that gammaii = 1.0 / np.sqrt(alpha[i]) and not gamii = 1.0 / np.sqrt(alpha[i] + alpha[i]).

Actual coverage is 63%.
Missing lines 14, 17, 20, 29, 32, 58->67, 71->72, 72, 74->77, 77, 81-82, 86-91, 95-96
getCoordinationNumberSpheres()
There seems to be some code duplication in getCoordinationNumberSpheres(). The code after scale= is the same for whichever scale is chosen. Additionally, cnsp is initialized and computed, but not used in the final result.
kallisto/src/kallisto/methods.py
Line 102 in 4e03cd5
I suggest to move the following code
kallisto/src/kallisto/methods.py
Lines 115 to 133 in 4e03cd5
into a separate function, and call it with scale=2 and scale=3. With such changes, the code will better reflect the definition of CNSP_i^(3,2) given in the documentation.
getCoordinationNumbers()
There is some code duplication in getCoordinationNumbers
kallisto/src/kallisto/methods.py
Lines 24 to 37 in 4e03cd5
However, writing a common function supporting all the different cntypes could possibly obfuscate the implementation, so it is up to you if you want to unify them.
In the documentation I'd mention how the parameters k0, k1, k2, k3 are obtained. Could you also link the following bit of text (see Applications part) to the corresponding section?
- cnsp calculation in getCoordinationNumberSpheres()
- code duplication in getCoordinationNumberSpheres()
- mention how k0, k1, k2, k3 are obtained (in the documentation)
- link (see Applications part) to the corresponding section

This seems like a cool feature to have if it was a dict.
truhlar = [
    0.65,  # H
    1.00,  # He
]
and turn it into:
truhlar = {
    "H": 0.65,
    "He": 1.00,
}
And you could access it like so:
from kallisto.data.vdw import truhlar
print(truhlar)
Maybe I'd want these for some reason down the line. You could also add it in the docs as "if you want these values or to adjust your own".
Update
I just read through more of your data.py. This might be something useful for users:
Example Code
class KallistoData:
    @property
    def truhlar(self):
        return {
            "H": 0.65,
            "He": 1.00,
        }
And you could access it like
from kallisto import KallistoData
truhlar = KallistoData().truhlar
The current toolchain to install this project is quite heavy on the user. Is there a plan to provide packaged versions of this project on a conda channel or via PyPI?