gemdat: Python toolkit for molecular dynamics analysis

Home Page: https://gemdat.readthedocs.io
License: Apache License 2.0
These are calculated in `calc_rdfs.m`. Data can be verified against `rdf.mat`:
rdf.distributions
rdf.integrated
rdf.rdf_names
rdf.elements
rdf.max_dist
rdf.resolution
rdf.total
It might look something like this:

```
from GEMDAT import plot_1, plot_all

plot_1(<data_and_config>)
plot_all(<data_and_config>)
```

Or something like this:

```
from GEMDAT import plot

plot(<data_and_config>, ['diffusivity', 'MSD'])

from GEMDAT.plots import plot_diffusivity

plot_diffusivity(<data_and_config>)
```
`data_and_config`: there will probably be a need to adjust plots with some configuration (splitting into x smaller simulations, or cutting away the first few timesteps). I think it would be okay to pass those through `**kwargs` and let all plot functions accept `**kwargs`, so they can extract the keywords they will use.
All the possible keywords should be listed in the `plot` (or `plot_all` for option 1) function, but how they are implemented can be explained in the specific `plot_xxx` function.
I prefer option two, what are your thoughts about this, @stefsmeets ?
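Option two could be sketched like this (a hypothetical sketch; the registry, function names, and return values are illustrative, not gemdat's actual API). Each `plot_xxx` function declares the keywords it understands and silently ignores the rest of `**kwargs`:

```python
def plot_diffusivity(data, *, equilibration_steps=0, **kwargs):
    # uses equilibration_steps, ignores other keywords in **kwargs
    return f'diffusivity (skipping first {equilibration_steps} steps)'

def plot_msd(data, *, n_parts=1, **kwargs):
    # uses n_parts, ignores other keywords in **kwargs
    return f'MSD (split into {n_parts} parts)'

_PLOTS = {'diffusivity': plot_diffusivity, 'MSD': plot_msd}

def plot(data, names, **kwargs):
    # Every keyword is forwarded to every selected plot function;
    # each function extracts only what it needs.
    return [_PLOTS[name](data, **kwargs) for name in names]
```

Usage would then look like `plot(data, ['diffusivity', 'MSD'], equilibration_steps=1250, n_parts=10)`.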
To calculate statistics, we are calculating dedicated `*_parts` attributes on `SitesData`. The inputs for these analyses are no different from the full data set, just a smaller subset. So, instead of having dedicated parts variables like `$VARIABLE_parts`, consider feeding `SitesData` with the parent data already split (probably `atom_sites` or `all_transitions`). Use these as a basis for subsequent calculations.
Basically, instead of:

    parent_data -> SitesData -> derived_data -> split -> derived_data_parts

(one full `SitesData` instance for the entire timeseries, with attributes containing lists of parts data), do:

    parent_data -> SitesData -> derived_data
    parent_data -> split -> SitesData -> derived_data_parts

(one `SitesData` instance for the entire timeseries + a list of `SitesData` instances, one for each part).
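A toy sketch of the proposed split step (the `split` helper and the array shapes are assumptions for illustration, not gemdat's API): split the parent data along the time axis first, then build one `SitesData` per part.

```python
import numpy as np

def split(data: np.ndarray, n_parts: int) -> list[np.ndarray]:
    # Split the time axis (axis 0) into n_parts consecutive chunks.
    size = data.shape[0] // n_parts
    return [data[i * size:(i + 1) * size] for i in range(n_parts)]

# Stand-in parent data: 100 time steps, 1 atom column.
atom_sites = np.arange(100).reshape(100, 1)

# parent_data -> split -> (one SitesData per part)
parts = split(atom_sites, n_parts=10)
```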
Use of `functools.lru_cache` on class methods can lead to memory leaks: the cache may retain instance references, preventing garbage collection.
See this SO post for more info:
https://stackoverflow.com/questions/33672412/python-functools-lru-cache-with-instance-methods-release-object
This seems to be the most straightforward way to work around it:
https://stackoverflow.com/a/68550238
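The workaround from that answer can be sketched like this (class and method names here are illustrative stand-ins): wrap the method with `lru_cache` inside `__init__`, so the cache is stored on the instance and garbage-collected together with it.

```python
from functools import lru_cache

class SitesData:
    """Illustrative stand-in class; only the caching pattern matters."""

    def __init__(self, structure):
        self.structure = structure
        # Wrap the bound method per instance: the cache now lives on
        # the instance, so it is collected when the instance is.
        self.atom_sites = lru_cache(maxsize=None)(self._atom_sites)

    def _atom_sites(self, n: int) -> int:
        # stand-in for an expensive calculation
        return n * 2
```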
RDFS are still missing from the dashboard. Example snippet to generate RDF plots:
```python
from gemdat import SimulationData, SitesData
from gemdat.io import load_known_material

equilibration_steps = 1250
diffusing_element = 'Li'
diffusion_dimensions = 3
z_ion = 1

VASP_XML = '/home/stef/md-analysis-matlab-example-short/vasprun.xml'

data = SimulationData.from_vasprun(VASP_XML)

extras = data.calculate_all(
    equilibration_steps=equilibration_steps,
    diffusing_element=diffusing_element,
    z_ion=z_ion,
    diffusion_dimensions=diffusion_dimensions,
)

structure = load_known_material('argyrodite', supercell=(2, 1, 1))

sites = SitesData(structure)
sites.calculate_all(data=data, extras=extras)

from gemdat.rdf import *

rdfs = calculate_rdfs(
    data=data,
    sites=sites,
    diff_coords=extras.diff_coords,
    n_steps=extras.n_steps,
    equilibration_steps=extras.equilibration_steps,
    max_dist=10,
    resolution=0.1,
)

for state, rdf in rdfs.items():
    plot_rdf(rdf, name=state)
```
As we ported more, we realized that some structures could be better represented. This issue is to track that.
- `Trajectory` functions, and instead allow a view on trajectory, like: `li_trajectory = trajectory.where(element='Li', equilibration_steps=1250)`
- `__all__`
- `trajectory.precompute()` and `sites.precompute()`
I think we should set up a trajectory class which will make it easier to handle the simulation data. Most of what we care about is the `Trajectory` anyway, and the pymatgen trajectory class is somewhat limited for our use case. This could also take over some of the methods in the vibration/displacements modules (speed, displacements, etc.).
Just writing some ideas here:
```python
class Trajectory:
    _coords: np.ndarray  # shape: (time, site, xyz)
    structure: pymatgen.core.Structure

    @property
    def n_steps(self):
        # return number of steps after equilibration time
        ...

    @property
    def coords(self):
        # keep the last n_steps, i.e. drop the equilibration steps
        return self._coords[-self.n_steps:, ...]

    def displacements(self, element: list[str] | str | None = None):
        # Return displacements for all / selected elements
        ...

    def speed(self, element: list[str] | str | None = None):
        # Return speed for all / selected elements
        ...

    def set_equilibration_time(self, equilibration_time):
        # sets starting point for data
        ...

    def get_coords_for_element(self, label):
        # replaces diff_coords, which is somewhat poorly named
        ...
```
- `Trajectory.metadata` attribute to track some sort of global simulation parameters, like temperature
- `calculate_all()` to `GemdatTrajectory`?
- `Trajectory` to easily get coordinates for the diffusing atom
- `Trajectory.from_PymatgenTrajectory`
As a researcher working with MD data,
I want to load site locations from a density file,
so that I can have higher-accuracy analysis.

Loading the sites from a cif file works, but is not ideal, because they have to be manually defined and the positions are static.
As an alternative, we can generate the sites from the trajectory directly. The trajectory can be used to generate an electron density; in turn, we can use peaks in the electron density to define the positions of sites.
For example:

- `trajectory`
- `trajectory.get_lattice()`?
- Alternative to 2-4: squash along the time axis and use cluster analysis to find the best n sites.
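The "peaks in the density define the sites" step could look like this toy illustration, in 1-D for brevity (real densities are 3-D and would need a proper peak finder; the data here is made up):

```python
import numpy as np

# Toy 1-D density histogram with peaks at indices 2 and 6.
density = np.array([0., 1., 3., 1., 0., 2., 5., 2., 0.])

# A point is a peak if it is strictly larger than both neighbours.
interior = density[1:-1]
is_peak = (interior > density[:-2]) & (interior > density[2:])
peak_indices = np.where(is_peak)[0] + 1  # shift back to full-array indices
```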
CIF files are the standard file format for storing crystallographic information.
Crystal structures for the materials we work with are available:
E.g. for argyrodite:
In the matlab code these are coded by hand in `known_materials.m`. I would like to have these available in standard CIF format, so that
Some of the simulation statistics we are calculating can be exposed in the dashboard.
I think this element would be a fun way to display it:
https://docs.streamlit.io/library/api-reference/data/st.metric
This tool lends itself very well for an interactive dashboard.
I would like to have something where you can load your project in the sidebar, set some parameters (like the diffusing element, equilibration time, etc), and have a selector for which plots to generate.
I feel like these axes are not correct. This issue is to make sure I don't forget.
#48 adds the first plot (`plots.jumps.plot_jumps_vs_distance`) that uses the `SitesData` class. We need a selector in the dashboard to:

- known material (`/src/data`)
- supercell (`(1,1,1)`)

I think plotly can be a good candidate for this, but the matplotlib to plotly conversion function seems to be broken for this figure at the moment.
```
  File "/home/vikko/local_projects/GEMDAT/.venv/lib/python3.11/site-packages/plotly/matplotlylib/mplexporter/exporter.py", line 289, in draw_collection
    offset_order = offset_dict[collection.get_offset_position()]
```
The issue is well known and is caused by a deprecation in matplotlib; this very ugly fix works:
mpld3/mpld3#477
But then again it does not seem to understand more than 2 dimensions, so this is not the way to go.
If we want to do this it is probably best to re-implement it fully in plotly (see also comment below)
There are two main properties which can control this (assuming the sites are defined correctly).
This should be a post-process step after defining the Sites and probably should be specified as parameters for calculating the jumps
Currently the plots are created with matplotlib; however, for some plots and the dashboard, interactivity might be nice to add. We could use
This has to be fixed in pymatgen <... room for pull request link>
We can use the matplotlib decorators to test for plot similarity:
https://matplotlib.org/stable/api/testing_api.html#matplotlib.testing.decorators.image_comparison
See implementation example here:
https://github.com/hpgem/nanomesh/blob/main/tests/test_plotting.py
This issue has a list of plots and visualizations that should be implemented.
This issue tracks processing the 'known materials' data and calculating sites data.
Some of the data we are generating for the timesteps are well suited for storing in an xarray. Most of the data we are working with are some form of (time step, atom index).
As dimensions we can use:
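As a sketch (the dimension names, shapes, and coordinate values below are made up for illustration), the (time step, atom index) layout could look like:

```python
import numpy as np
import xarray as xr

# Displacements per (time step, atom index) with labelled dimensions.
displacements = xr.DataArray(
    np.zeros((100, 48)),
    dims=('time', 'atom'),
    coords={'time': np.arange(100), 'atom': np.arange(48)},
)

# Named-dimension selection, e.g. the full time series for atom 0:
atom0 = displacements.sel(atom=0)
```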
`atom_locations` and `sites_occup`, and their parts equivalents `atom_loc_parts` and `sites_occup_parts`, appear to have the same values. What is the difference? Is there a difference?
One of the stretch goals of the project would be to work with dynamic site locations.
At present, the sites are defined from a cif file and are static in time. In a real scenario, the atomic clusters are oscillating/moving. This affects the jump calculations.
Pymatgen is currently pinned to my fork.
The label fixes were merged yesterday. Once a new release of pymatgen becomes available, we should update our pin to the latest version and make a new release on PyPI.
Find/generate structures in CIF format for known materials in matlab code. These are the ones that are available.
`plot_collective_jumps`:

```
  File "/home/vikko/local_projects/GEMDAT/src/plots/jumps.py", line 112, in plot_collective_jumps
    ticks = range(len(sites.jump_names))
  File "/home/vikko/local_projects/GEMDAT/src/sites.py", line 153, in jump_names
    return ['->'.join(key) for key in self.rates]
  File "/home/vikko/local_projects/GEMDAT/src/sites.py", line 153, in <listcomp>
    return ['->'.join(key) for key in self.rates]
TypeError: sequence item 0: expected str instance, NoneType found
```
`plot_jumps_3d`:

```
  File "/home/vikko/local_projects/GEMDAT/src/plots/jumps.py", line 189, in plot_jumps_3d
    plotter.plot_labels(site_labels,
  File "/home/vikko/local_projects/pymatgen/pymatgen/electronic_structure/plotter.py", line 4268, in plot_labels
    if k.startswith("\\") or k.find("_") != -1:
AttributeError: 'NoneType' object has no attribute 'startswith'
```
These are currently disabled by not putting them in `plots.__all__`.
Some analyses require that we use the labels to tag the sites we are working with. These can be read from the cif file, but pymatgen does not store these data. I have a fork with this feature here: https://github.com/stefsmeets/pymatgen/tree/cif-site-labels3

This will make the site labels available via `structure.labels` / `structure.sites[0].label`.
Pydantic v2 was released on 30 June. It seems to be somewhat broken: https://github.com/pydantic/pydantic/issues
I pinned the version to V1 for now in #19.
One option might be to add sensible computable defaults to all plots, as most arrays can be calculated from the Data arrays.
Another might be to make those arrays optionally computable on the Data object somehow.
The nice thing here would be to have this work transparently, so that if a user provides an array it is used, and otherwise a default is calculated from the provided Data if possible.
Issues raised by stefsmeets: #71 (comment)

With the transition energy we can determine the energy threshold that has to be crossed for a successful jump. If this energy is too low, we could inform the user that the sites are not well-defined and should probably be merged.
This function:

Line 195 in b6d6f98

Periodic boundaries should be taken into account.

Possible paths to explore:

- Should be an operation f(Trajectory) -> Trajectory
- vasprun.xml
When converting between positions and displacements on a trajectory, pymatgen does not completely convert back to the origin cell.
For example, a position at [0, 0, 0.001] may end up at [0, 0, 1.001].
```
>>> trajectory = Trajectory.from_vasprun(vasp_xml)
>>> coords1 = trajectory.filter('Li').coords
>>>
>>> trajectory.to_displacements()
>>> trajectory.to_positions()
>>>
>>> coords2 = trajectory.filter('Li').coords
>>>
>>> np.testing.assert_allclose(coords1, coords2)
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0

Mismatched elements: 33264 / 540000 (6.16%)
Max absolute difference: 1.
Max relative difference: 4761904.77031164
 x: array([[[0.13401 , 0.36404 , 0.028937],
        [0.09121 , 0.712146, 0.508927],
        [0.289686, 0.675079, 0.220109],...
 y: array([[[ 0.13401 , 0.36404 , 0.028937],
        [ 0.09121 , 0.712146, 0.508927],
        [ 0.289686, 0.675079, 0.220109],...
```
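Until this is fixed upstream, one possible workaround (a sketch, not a tested fix for the pymatgen internals) is to wrap the fractional coordinates back into the origin cell before comparing:

```python
import numpy as np

coords = np.array([[0.0, 0.0, 1.001],
                   [0.13401, 0.36404, 0.028937]])

# np.mod maps fractional coordinates back into [0, 1),
# so [0, 0, 1.001] becomes [0, 0, 0.001].
wrapped = np.mod(coords, 1.0)
```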
I just noticed pymatgen has a very useful module to work with scientific units, including a `FloatWithUnit` class. I think we should consider using this (or an alternative package) to keep track of / define scientific values and constants.
See available formats here:
https://pymatgen.org/pymatgen.io.html
Minimum target is to be able to load these into a `Trajectory`.
The option to select a server-side file seems to be missing.
Currently this is solved by just having a text input box, which is a bit ugly.
pymatgen does not read the labels correctly if there are multiple of the same species.

```
>>> structure = load_known_material('lisnps')
>>> set(structure.labels)
{'Li1'}
# expected: {'Li1', 'Li2', 'Li3', 'Li4'}
```
There are two ways to go about this:
A site should have: position, size. The size might differ per dimension; in this way ellipsoidal sites could also be defined.
In the matlab code you can specify the supercell to use for the known structure. We should have this feature in gemdat as well.

We can do this using this method on `Structure`:
https://pymatgen.org/pymatgen.core.structure.html#pymatgen.core.structure.Structure.make_supercell
Currently if a calculation fails, the error is just "out of bounds" or "shape mismatch". We could check the desired shapes and give a useful error message if any of them do not match.
The matlab code has no check for if the known materials structure matches the vasp/lammps data. This can happen if the cell orientation, supercell, or symmetry does not match.
We can add a basic check on the lattice parameters (within some tolerance, e.g. 0.5 Angstrom / 1 degree) to prevent potential errors.
This check should probably be implemented in `SitesData.calculate_all` (compare `SimulationData.structure` with `SitesData.structure`).
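A minimal sketch of such a check (the function name and input format are illustrative; in gemdat the lattice parameters would come from the two structures being compared):

```python
import numpy as np

def lattices_match(abc1, angles1, abc2, angles2,
                   length_tol=0.5, angle_tol=1.0):
    """Return True if lattice lengths (Angstrom) and angles (degrees)
    agree within the given absolute tolerances."""
    return bool(np.allclose(abc1, abc2, atol=length_tol)
                and np.allclose(angles1, angles2, atol=angle_tol))
```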
The RDF plots have incorrect labels for the x-axis. Currently they just use the bin number; this should be adjusted to the distance bin in Angstrom.
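Since `max_dist` and `resolution` are known when the RDFs are calculated, the x values can be computed as bin centres in Angstrom instead of bin numbers. A sketch (variable names are illustrative):

```python
import numpy as np

max_dist = 10.0    # Angstrom
resolution = 0.1   # Angstrom per bin

n_bins = round(max_dist / resolution)
bin_edges = np.linspace(0.0, max_dist, n_bins + 1)
# Use the bin centres (in Angstrom) as x values for the RDF plot.
bin_centres = (bin_edges[:-1] + bin_edges[1:]) / 2
```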