Giter VIP home page Giter VIP logo

kallisto's People

Contributors

dependabot[bot] avatar f3rmion avatar lynnehansen avatar nhadler avatar rmeli avatar stefanoborini avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

kallisto's Issues

[JOSS] Paper

Content

While the "Statement of Need" in the manuscript introduces the need of computation of atomic features, it is not clear exactly what the actual capabilities of the software are. I understand the main functionality is described in the online documentation, but I think it should also be clearer from the paper. This will allow to add proper references to the implemented methods as well (if relevant).

The same applies to the mentioned "computational modelling helpers". It would be nice to know which one they are in the paper (at least the most useful/relevant ones).

I also think that specifying DFT functionals and basis sets is not relevant in the paper and would be better off left to the documentation (where I can't see any mention of this?). I think (e.g., static/dynamic poralizabilities with time-dependent DFT data) would be much more readable than (e.g., static/dynamic polarizabilitities with time-dependent (Dreuw & Head-Gordon, 2005) PBE0 (Adamo & Barone, 1999) / d-aug-def2-QZVP (Weigend & Ahlrichs, 2005) data).

Typos

  • Line 9: usecases -> use cases
  • Line 30: polarizabilitities -> polarizabilities
  • Line 38: Stefan Grimme and co-wo rker -> Stefan Grimme and co-workers

(I suck at spotting typos, so I might add more later on...)

References

I think the phrase

Tasks as, e.g., building predictive models, performing virtual screening, or predicting compound activities are potential use cases for such ML applications.

could do with some references about ML models for VS and activity prediction.

Multiple references are not correctly displayed. As it can be inferred from the example paper and bibliography (not very clear , I know...!), multiple references can be put together with the following syntax: [@ref1; @ref2].

Suggestions

  • Add details of implemented featurizers; this can be a comma-separated list
    • Add relevant references (if applicable)
  • Add list of useful computational modelling helpers
    • Add relevant references (if applicable)
  • Fix typos
  • Combine multiple reference into a single one, where necessary
  • Move low-level details to documentation (DFT functionals and basis sets)

JOSS Review

Example Files for xyz

To reach a wider audience - it would be helpful to have a wider set of example files otherwise I am digging through your test directory.

Some folk, and I picked this up from MDAnalysis will come with prepared files within the package. You can choose to include yours in there for the example in the Coordination and Numbers and Sphere's page.


alanine-glycine.xyz

If someone isn't familiar with the xmol format it is also worth it to place in there as well in the docs - you could also link to the test file but most folk don't wanna flood the system with another file they have to coordinate - they like it compartmentalized.

[JOSS] Polarizabilities

Documentation

Hence the problem is shifted from calculating dynamic polarizabilities to calculating atomic coordination numbers and interpolate the atomic reference systems. This procedure comes along with a tremendous increase in efficiency and overcomes furthermore numerical issues that could occur in TD-DFT calculations.

  • Can you please add a comment on the accuracy of this method at this stage and link to the table?

Note that the dynamic reference polarizabilities can be pre-scaled to, e.g., incorporate atomic-charge information as has been introduced recently by Caldeweyher et al.

  • Add missing reference to "Caldeweyher et al."

Other comments:

  • Add timing details to the table? How kallisto compares to AlphaML in terms of speed?
  • Add example with the --molecular option to compute the molecular polarizability

Implementation

The code is quite complex, it could do with a few more comments so that it easier to follow and understand.

As mentioned in #26 (and for the reason above) , it would be helpful to have a better consistency between variable names in the code and the documentation. For example, you have wf = 6.0 in the code, beta in the documentation (beta is not defined at all in the documentation) and beta2 in the paper. I had to dig into the paper to understand what wf = 6.0 was.

Vectorisation

for i in range(nat):
ndim += refn[at[i] - 1]

for i in range(nat):
atomicAiw[i] = aw[0][i]

There might be other parts where the code could benefit from vectorisation and some parts where loops can be reduced.

For example, here I think you can combine the two loops together:

for ii in range(refn[ia]):
for iii in range(ncount[ii][ia]):
twf = (iii + 1) * wf
norm = norm + cngw(twf, covcn[i], refcn[ii][ia])
norm = 1.0 / norm
for ii in range(refn[ia]):
k = itbl[ii][i]
for iii in range(ncount[ii][ia]):
twf = (iii + 1) * wf
gw[k] += cngw(twf, covcn[i], refcn[ii][ia]) * norm


JOSS Review

Feature request: solvent accessible surface area (SASA)

Describe the solution you'd like
Implement solvent accessible surface area (SASA) using an angular Lebedev-Laikov grid as described in "A quadrature formula for the sphere of the 131st algebraic order of accuracy", Doklady Mathematics, Vol. 59, No. 3, 1999, pp. 477-481.
Extend present grid implementation in src/grid.py to cover finer grids and use the defined gridpoints to construct a SASA.

[JOSS] RMSD

In the rmsd.py module, it seems that there is a lot of code that could be removed in favor of scipy.spatial.transform and scipy.linalg. My view here is that code from another library is code you don't have to maintain, and a library such as scipy is usually very reliable, extensively tested and possibly faster.

These are the functions that could use scipy functionality; it would remove a lot of lines of code:

  • rotionMatrix() could use scipy.spatial.transform.Rotation
    • Build a scipy.spatial.transform.Rotation using from_quat()
    • Return corresponding rotation matrix with as_matrix()
  • dstmev() could be substituted with a scipy.linalg.eigh()?
    • This should allow the removal of givens4
    • If there is a good reason to implement this for scratch please let me know, I definitely missed it
  • getRodriguezRotation() could use scipy.spatial.transform.Rotation internally
    • Build a scipy.spatial.transform.Rotation using from_rotvec()
    • Apply the rotation with apply()
  • getRotationMatrix() could use scipy.spatial.transform.Rotation internally
    • Build a scipy.spatial.transform.Rotation using from_rotvec()
    • Return corresponding rotation matrix with as_matrix()
  • getDist could be vectorised or substituted with scipy.spatial.distance.euclidean (or similar)

Up to you if you want to change these, but I think it would be helpful for future development and maintenance. The best code is no code ;)

Additionally:

  • The functionpythag() is missing a docstring.
  • Is the definition of unit in getRodriguezRotation() correct?

JOSS Review

[JOSS] Van der Waals Radii

Documentation

The following docstring is incorrect:

def getVanDerWaalsRadii(
nat: int, at: np.ndarray, aw: np.ndarray, vdwtype: str, scale: float
):
"""A method to compute atomic-charge dependent dynamic atomic polarizabilities (alps).
ALP values are calculated for a given structure and are returned as an
array. For the charge dependency EEQ atomic partial charges are used
in an empirical scaling function as used in the dftd4 program."""

Implementation

Vectorisation

Unfortunately I think that changes introduced for #22 prevent vectorisation in this case (sorry for not spotting this previously!), since you have to go from the atomic number to the chemical symbol in order to extract the parameters needed.

ia = at[i]
isym = chemical_symbols[ia]
theta_b = truhlar[isym]

I'm not too keen to suggest to revert the changes, so it is up to you to decide if it is worth the effort.


JOSS Review

Increase unit test coverage for molecule.

Actual coverage is 62%.
Missing lines 48->50, 50->53, 53->62, 54->55, 55-59, 62->63, 63-67, 73->78, 75->76, 76-84, 86->87, 87, 90->91, 91, 98->99, 99, 111-126, 130, 134, 140-143, 155-160

[JOSS] grid module outside of kallisto

I started looking at the source code and I noticed that the grid.py module is outside the kallisto package. When kallisto is imported, the grid module also becomes available, possibly clashing with or shadowing other modules. Is this intended?

I think it would be better to have the grid module within the package, unless I'm missing a good reason not to.


JOSS Review

[JOSS] Sterimol Descriptors

Documentation

The documentation for the sterimol descriptors is quite scarce. I'd add some details (or at very least link to the relevant section of the Morfeus package.

Code

Do I understand correctly that the code comes directly from Morfeus package? I would make this clearer in the module docstring (not only the function getClassicalSterimol()). Is there any kind of copyright or license that needs to be mentioned as well?

I was not able to find this code; if it was given in private, maybe an acknowledgment of this in the paper is needed. Some clarifications about how the code was obtained and integrated would be helpful to better understand the situation.


JOSS Review

Recommend FOSSA for license visibility

One thing I can recommend is:

FOSSA Status

Since you've got the nox in there, I don't know what the licensing issues are in there. When we touch the open source world and you've got massive amounts of python packages a robust licensing is definitely worth it.

FOSSA will scan through each dependency (it's free) and bring each license for dependency to light.

I added a FOSSA badge as an example into one of my projects. It's very easy to setup ~5 minutes for github integration like CodeCov and can go through each license to make sure you are not stepping on anyone's toes.

Python Badge Adjustment - Suggestion

Add a python badge - i need to know which python versions are supported. Right now I float at python 3.7 and have been just scared about python 3.8 and above. I just need to know with what context in mind as i read the docs.

Here's some code to do so:

Python

Error reporting on wrong usage of the CLI

Is your feature request related to a problem? Please describe.
I just run kallisto over a set of structures with

for i in *; do
  kallisto alp $i/inp.xyz > $i/kallisto.out
done

But remembered a moment too late that kallisto expects the input file as argument to --inp and the positional argument is the output. Fortunately, the geometries were under version control, which made reverting easy.

At least to me the CLI feels from time to time unintuititive and I have to frequently consult the help printout to get the command line right.

Describe the solution you'd like
Print the error message to the stderr instead of the output file.

Describe alternatives you've considered
Adjust the CLI to expect the input file as positional argument and write to stdout.

Travis CI Build Instructions (Just in case others are looking for it)

Currently using Kallisto, and I used Travis, but I thought I'd share my yml just in case anyone else ran into this issue as well!

I think it's imperative to have instructions for all CI/CD's, so I wrote some Travis build instructions for this repo for AstraZeneca, just incase, and this happened to be what I used when implementing CI/CD of my liking/choice.

--- 
language: python
matrix: 
  include: 
    - 
      dist: xenial
      os: linux
      python: 3.6
      sudo: required
    - 
      dist: xenial
      os: linux
      python: 3.7
      sudo: required
    - 
      env: PYTHON=36
      language: generic
      os: osx
      osx_image: xcode10
script: 
  - "pip install nox"
  - "pip install poetry"

# Travis CI Build Instructions for Kallisto made by Montana Mendy
# Just incase if needed by AstraZeneca

-Montana Mendy

[JOSS] Units

So digging through your units.py - I'm curious about the design position about implementing your own units instead of using one of your packages dependencies - scipy? Is Scipy more trustworthy than the 2014 paper more reputable or vice versa?

Scipy Implementation

import scipy.constants as constants

newt_grav = constants.gravitational_constant

I think that's the code. It might be conflicting for users down the line to have two sets of constants. I usually just import from scipy.

Here's the link to the docs: https://docs.scipy.org/doc/scipy/reference/constants.html

Comparison:


6.6740e-11 (Kallisto) | 6.6743e-11 (SciPy) # Gravitational Constant
6.626070040e-34 (Kallisto) | 6.62607015e-34 (Scipy) # Planck Constant

etc..

No output if not in verbose mode?

Describe the bug

I found the command line interface a bit unintuitive, i.e. if I want to calculate a property I'm using:

โฏ kallisto --verbose cns --inp struc.xyz
[3.98610602 3.11339432 3.25775815 3.40451617 2.07932831 3.45973873
 0.9958313  3.00988788 3.05595857 1.00441848 3.01990055 3.01780514
 1.00235052 4.00645069 0.99417174 0.99389366 0.99390611 3.9980567
 0.99426699 0.9936817  0.99393769 0.99395809 0.99370343 0.99367221]

Now if I'm leaving away the --verbose flag, the calculation seems to happen but no printout is produced?

โฏ kallisto cns --inp struc.xyz

To Reproduce

Environment setup:

โฏ conda info

     active environment : kallisto
    active env location : /home/awvwgk/miniforge3/envs/kallisto
            shell level : 2
       user config file : /home/awvwgk/.condarc
 populated config files : /home/awvwgk/miniforge3/.condarc
          conda version : 4.9.2
    conda-build version : not installed
         python version : 3.7.8.final.0
       virtual packages : __glibc=2.33=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /home/awvwgk/miniforge3  (writable)
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /home/awvwgk/miniforge3/pkgs
                          /home/awvwgk/.conda/pkgs
       envs directories : /home/awvwgk/miniforge3/envs
                          /home/awvwgk/.conda/envs
               platform : linux-64
             user-agent : conda/4.9.2 requests/2.24.0 CPython/3.7.8 Linux/5.10.15-1-MANJARO manjaro/20.2.1 glibc/2.33
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False

โฏ conda list
# packages in environment at /home/awvwgk/miniforge3/envs/kallisto:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
ca-certificates           2020.12.5            ha878542_0    conda-forge
certifi                   2020.12.5        py39hf3d152e_1    conda-forge
click                     7.1.2              pyh9f0ad1d_0    conda-forge
kallisto                  1.0.4                    pypi_0    pypi
ld_impl_linux-64          2.35.1               hea4e1c9_2    conda-forge
libblas                   3.9.0                8_openblas    conda-forge
libcblas                  3.9.0                8_openblas    conda-forge
libffi                    3.3                  h58526e2_2    conda-forge
libgcc-ng                 9.3.0               h2828fa1_18    conda-forge
libgfortran-ng            9.3.0               hff62375_18    conda-forge
libgfortran5              9.3.0               hff62375_18    conda-forge
libgomp                   9.3.0               h2828fa1_18    conda-forge
liblapack                 3.9.0                8_openblas    conda-forge
libopenblas               0.3.12          pthreads_h4812303_1    conda-forge
libstdcxx-ng              9.3.0               h6de172a_18    conda-forge
ncurses                   6.2                  h58526e2_4    conda-forge
numpy                     1.20.1           py39hdbf815f_0    conda-forge
openssl                   1.1.1j               h7f98852_0    conda-forge
pip                       21.0.1             pyhd8ed1ab_0    conda-forge
python                    3.9.2           hffdb5ce_0_cpython    conda-forge
python_abi                3.9                      1_cp39    conda-forge
readline                  8.0                  he28a2e2_2    conda-forge
scipy                     1.6.0            py39hee8e79c_0    conda-forge
setuptools                49.6.0           py39hf3d152e_3    conda-forge
sqlite                    3.34.0               h74cdb3f_0    conda-forge
tk                        8.6.10               hed695b0_1    conda-forge
tzdata                    2021a                he74cb21_0    conda-forge
wheel                     0.36.2             pyhd3deb0d_0    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zlib                      1.2.11            h516909a_1010    conda-forge

Input file (any input geometry would do as well)

  24
        -42.15442198
 C         -3.2668063145       -1.0612383311        0.0019792586
 N         -2.2766003898       -0.0195172936       -0.0013208218
 C         -0.9134590858       -0.2019878522        0.0036747135
 C         -0.3724701006        1.0762202467        0.0017569102
 N         -1.3526586265        2.0066755645       -0.0051933750
 C         -2.4700895686        1.3111433173       -0.0065911261
 H         -3.4434528448        1.7598589152       -0.0115392456
 N          0.9701620055        1.2833365802       -0.0008793828
 C          1.8257144377        0.2028321683       -0.0007168653
 O          3.0278876281        0.3474912012       -0.0051812078
 N          1.2515027114       -1.0636950051       -0.0008922263
 C         -0.1065018202       -1.3842039491        0.0029991689
 O         -0.5015980556       -2.5366534217        0.0044099955
 C          2.2018357820       -2.1638123093       -0.0037185988
 H          1.6323675079       -3.0900622488        0.0097444448
 H          2.8244315108       -2.1119787186       -0.8964193499
 H          2.8450781508       -2.0977859157        0.8731480371
 C          1.5216365707        2.6211569486        0.0093869454
 H          0.7306184352        3.3088549484       -0.2771144995
 H          1.8886888508        2.8714188674        1.0060887389
 H          2.3551115434        2.6687817985       -0.6902763920
 H         -2.7510404767       -2.0202634640       -0.0176576390
 H         -3.9109098848       -0.9753935259       -0.8744768432
 H         -3.8816811438       -0.9984445054        0.9013764258

Expected behavior
Do something meaningful for non-verbose calculations.

[JOSS] Tests

Most of the tests are in test_cli.py instead of being in separate and standalone files (for example test_methods.py, test_rmsd.py, ...). This is not a problem in itself, but it seems that in some tests the asserts do not really check the result.

Using the CLI to perform tests forces you to check the results as strings instead of numbers. This seems to be done incorrectly (see below). Additionally, if at some point you want to change the output format (for example to use scientific notation), you will have to change all the assertions.

I would suggest to re-work the tests in order to decouple the test of the CLI from the tests of the functionality. This is not really necessary, but the feeling I have looking at test_cli.py is that the code coverage is very high because all parts of the code are run while some tests are not really meaningful (i.e. they mostly check that the test was run and output files have been created). I can't see many tests for edge cases either. I would like to see more clear and meaningful tests, comparing with known results (obtained from reference software or from manual calculations), so that users can make sure the package produces correct numerical results.

testcli.py

Tests without output?

The following tests does not seem to have any output when run outside the testsuite:

kallisto/tests/test_cli.py

Lines 96 to 104 in fe84963

def test_cli_bonds(runner):
with tempfile.NamedTemporaryFile(mode="w+", encoding="utf-8") as f:
f.write("$coord" + s)
f.write("0.00000000000000 0.00000000000000 -1.06176434496059 c" + s)
f.write("0.00000000000000 0.00000000000000 1.06176434496059 h" + s)
f.write("$end")
f.flush()
result = runner.invoke(cli, ["bonds", "--inp", f.name])
assert result.exit_code == 0

kallisto/tests/test_cli.py

Lines 107 to 115 in fe84963

def test_cli_bonds_partner(runner):
with tempfile.NamedTemporaryFile(mode="w+", encoding="utf-8") as f:
f.write("$coord" + s)
f.write("0.00000000000000 0.00000000000000 -1.06176434496059 c" + s)
f.write("0.00000000000000 0.00000000000000 1.06176434496059 h" + s)
f.write("$end")
f.flush()
result = runner.invoke(cli, ["bonds", "--inp", f.name, "--partner", "0"])
assert result.exit_code == 0

Test checking only if output files have been produced

The following tests only check if files have been produced, but not the content of the output file and there is no check beside the fact that they run:

kallisto/tests/test_cli.py

Lines 118 to 133 in fe84963

def test_cli_bonds_constrain(runner):
with tempfile.NamedTemporaryFile(mode="w+", encoding="utf-8") as f:
f.write("$coord" + s)
f.write("0.00000000000000 0.00000000000000 -1.06176434496059 c" + s)
f.write("0.00000000000000 0.00000000000000 1.06176434496059 h" + s)
f.write("$end")
f.flush()
constrain = "constrain.inp"
gotFile = os.path.isfile(constrain)
assert gotFile is False
result = runner.invoke(cli, ["bonds", "--inp", f.name, "--constrain"])
assert result.exit_code == 0
gotFile = os.path.isfile(constrain)
assert gotFile is True
if gotFile:
os.remove(constrain)

kallisto/tests/test_cli.py

Lines 418 to 433 in fe84963

def test_cli_rms(runner):
with tempfile.NamedTemporaryFile(mode="w+", encoding="utf-8") as f1:
f1.write("$coord" + s)
f1.write("0 0 -1.06176 c" + s)
f1.write("0 0 1.06176 h" + s)
f1.write("$end")
f1.flush()
f2 = tempfile.NamedTemporaryFile(mode="w+", encoding="utf-8")
f2.write("$coord" + s)
f2.write("1 1 -0.06176 c" + s)
f2.write("1 1 2.06176 h" + s)
f2.write("$end")
f2.flush()
result = runner.invoke(cli, ["rms", "--compare", f1.name, f2.name])
assert result.exit_code == 0
f2.close()

kallisto/tests/test_cli.py

Lines 730 to 745 in fe84963

newstructure = "newstructure.xyz"
gotFile = os.path.isfile(newstructure)
assert gotFile is False
result = runner.invoke(
cli, ["exs", "--inp", f1.name, f2.name, "--center", "18", "--subnr", "2"],
)
assert result.exit_code == 0
gotFile = os.path.isfile(newstructure)
assert gotFile is True
if gotFile:
os.remove(newstructure)
constrain = "constrain.inp"
gotFile = os.path.isfile(constrain)
assert gotFile is True
if gotFile:
os.remove(constrain)

Incorrect assertions

The following assertions look incorrect, since "A" and "B" evaluates to "B" if "A" is true (i.e. non-empty string):

assert "-0.17166856" and "0.17166856" in result.output

assert "0.59769359" and "0.40230641" in result.output

assert "-0.94103071" and "-0.058969287" in result.output

assert "6.5655467" and "1.7519379" in result.output

assert "4.8142762" and "1.0744963" in result.output

assert "9.4232394" and "3.2506322" in result.output

assert "3.29019696" and "2.5041682" in result.output

assert "1.74109" and "1.32514" in result.output

assert "1.565228" and "0.946534" in result.output

assert "1.741096" and "1.325148" in result.output

assert "1.665613" and "1.235759" in result.output

assert "1.833333" and "1.447486" in result.output

assert "Substructure" and "[1]" in result.output

assert "Substructure" and "[0]" in result.output

kallisto/tests/test_cli.py

Lines 1182 to 1186 in fe84963

assert "14.08" and "7.45" in result.output
# Bmin value in Bohr and Angstrom
assert "11.18" and "5.92" in result.output
# Bmax value in Bohr and Angstrom
assert "14.62" and "7.74" in result.output

assert "4.38" and "3.36" in result.output

Additional Comments

The following test brought to my attention that the original information in the second line of the XYZ file is lots and replaced by "Created with kallisto":

kallisto/tests/test_cli.py

Lines 175 to 178 in fe84963

result = runner.invoke(cli, ["sort", "--inp", f.name, "--start", "5"])
assert result.exit_code == 0
assert "11" in result.output
assert "kallisto\nN 0.6816 1.1960 0.0000" in result.output

I think it would be better to retain the original comment and add "Created with kallisto" after it.


JOSS Review

Test Section in the Feature's Pages

I've ran through your First tutorial on Coordination Numbers and Sphere's for the xyz. While the output is nice and I have a general idea of what's going on I want to also know how you tested this feature right there.

What I do for gitbook applications is split the features into three sections:

  • Theory
  • Implementation
  • Test

So when someone looks at this feature they know how you tested it. It doesn't have to be verbose but just a "nice to have" to give me more confidence.

If you would like I can provide a link to my unreleased gitbook to show you that.

Exchange of polydentate ligand

Hello

(First time writing a suggestion here on GitHub why I hope this is correctly done)

I want to use Kallisto do to an exchange of a polydentate ligand but it seems not to be possible why I suggest this to be a good feature to add.

[JOSS] Installation instructions

Installation with pip works flawlessly in a clean conda environment.

Installation from source also works flawlessly, but I think the instructions could be simplified so that users can choose how to manage their environments. For example, poetry is also available on the conda-forge channel and nox can be installed using conda instead of pip.

I suggest to decouple the installation instructions for of Python and dependencies from the installation of kallisto as follows:

  • Clearly state the dependencies to install from source (poetry, Python version, ...)
    • Link to the installation instructions of such dependencies
  • Simplify the instructions to install kallisto: clone, change directory, call potery
  • Create a separate section for testing
    • List dependencies for testing (nox)

The advantage of linking to the installation instructions of the dependencies, IMHO, is that you won't need to maintain such instructions up to date yourself and users can install such dependencies in their preferred way (conda vs pip, ...).

Additional comments/questions:

  • Is "The Verbose Mode" section intended to be in the installation page?
  • It would also be nice to have a list of the main dependencies at the top of the installation page.

JOSS Review

[JOSS] Slots

Forgive me but it's been awhile since I've seen the __slots__ argument. I remember some of the numpy folk arguing about it at the last PyData conference.

In src/kallisto/atom.py, Line 52.


    __slots__ = ["data", "molecule", "index"]

    def __init__(self, symbol="X", position=(0, 0, 0), charge=0, molecule=None):

        self.data = data = {}

Here's what I'm looking at, can you elaborate on your reason for choosing slots, and does it actually make access to your class attributes access that much faster?

[JOSS] Partial Charges

Documentation

In the documentation, I think the variable chi_i should be defined after the first formula for E_IES instead than within the definition of the matrix elements of A.

Implementation

Vectorisation

There are many places where the code can be vectorised in getAtomicPartialCharges(). This will speedup the execution since it will get rid of multiple for loops that grow linearly with the number of atoms. Some places that could benefit from vectorisation are the following:

for i in range(nat):
xi[i] = eeq_en[at[i] - 1]
gam[i] = eeq_gamm[at[i] - 1]
kappa[i] = eeq_cnfak[at[i] - 1]
alpha[i] = pow(eeq_alp[at[i] - 1], 2)

X = np.zeros(shape=(m,), dtype=np.float64)
for i in range(nat):
tmp = kappa[i] / (np.sqrt(cns[i]) + 1e-14)
X[i] = -xi[i] + tmp * cns[i]

# set Lagragian constraints
for i in range(m):
A[i][nat] = 1.0
A[nat][i] = 1.0

Reshaping

I struggle to see why the arrays A and X are being reshaped

np.reshape(A, (m, m))
np.reshape(X, (m,))

since they have been initialized with the correct shape already. Am I missing something?

Consistency with Documentation

Is the following line of code correct?

A[i][i] = gam[i] + sqrt2pi / np.sqrt(alpha[i])

If yes, I think a few things would help understanding what is going on:

  1. The name gam (which recalls to "gamma") is very similar to gamij; if I understand correctly the former represent Jii in the documentation and the latter is clearly gammaij. I'd make the variable names in the code more consistent with the documentation (or vice-versa)
  2. I'd give explicitly give the definition of gammaij and gammaii in terms of the parameters alphai and alphaj; from the code there is gamij = 1.0 / np.sqrt(alpha[i] + alpha[j]), but then the line above seems to indicate that gammaii = 1.0 / np.sqrt(alpha[i]) and not gamii = 1.0 / np.sqrt(alpha[i] + alpha[i]) .

JOSS Review

[JOSS] Coordination Numbers and Spheres

getCoordinationNumberSpheres()

There seem to be some code duplication in getCoordinationNumberSpheres(). The code after scale= is the same for whichever scale is chosen. Additionally, cnsp is initialized and computed, but not used in the final result.

cnsp = np.zeros(shape=(nat,), dtype=np.float64)

I suggest to move the following code

for i in range(nat):
ia = at[i] - 1
for j in range(nat):
if i is j:
continue
ja = at[j] - 1
dx = coords[j][0] - coords[i][0]
dy = coords[j][1] - coords[i][1]
dz = coords[j][2] - coords[i][2]
rSquared = dx * dx + dy * dy + dz * dz
if rSquared > threshold:
continue
r = np.sqrt(rSquared)
rco = scale * (rcov[ia] + rcov[ja])
eni = pauling_en[ia]
enj = pauling_en[ja]
den = k4 * np.exp(-((np.abs(eni - enj) + k5) ** 2) / k6)
damp = den * 0.5 * (1 + special.erf(-kn * (r - rco) / rco))
cnsp[i] += damp

and call it with scale=2 and scale=3. With such changes, the code will better reflect the definition of CNSP_i^(3,2) given in the documentation.

getCoordinationNumbers()

There is some code duplication in getCoordinationNumbers

for i in range(nat):
ia = at[i] - 1
for j in range(nat):
if i is j:
continue
dx = coords[j][0] - coords[i][0]
dy = coords[j][1] - coords[i][1]
dz = coords[j][2] - coords[i][2]
rSquared = dx * dx + dy * dy + dz * dz
if rSquared > threshold:
continue
ja = at[j] - 1
r = np.sqrt(rSquared)
rco = rcov[ia] + rcov[ja]

However, writing a common function supporting all the different cntypes could possibly obfuscate the implementation so it is up to you if you want to unify them.

Documentation

In the documentation I'd mention how the parameters k0, k1, k2, k3 are obtained. Could you also link the following bit of text (see Applications part) to the corresponding section?

Suggested TODOs

  • Remove unused cnsp calculation in getCoordinationNumberSpheres()
  • Merge duplicate code in getCoordinationNumberSpheres()
  • Mention how the parameters k0, k1, k2, k3 are obtained (in the documentation)
  • Link the text (see Applications part) to the corresponding section

JOSS Review

[JOSS] VDW radii

This seems like a cool feature to have if it was a dict.

truhlar = [
    0.65,  # H
    1.00,  # He
]

and turn into:

truhlar = {
    "H": 0.65
    "He: 1.00
}

And you could access it like so:

from kallisto.data.vdw import truhlar
print (truhlar)

Maybe if I needed these for any reason down the line. You could also add it in the docs as "if you want these values or to adjust your own.

Update

I just read through more of your data.py. This might something be something useful for users:

Example Code

class KallistoData(object):

    def _get_truhlar():
      truhlar = {
          "H": 0.65
          "He: 1.00
      }
      return truhlar

    truhlar = property(_get_truhlar)

And you could access it like

from kallisto import KallistoData

truhlar = KallistoData.truhlar

Packaging for PyPI or conda?

The current toolchain to install this project is quite heavy on the user. Is there a plan to provide packaged versions of this project on a conda channel or via PyPI?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.