rs-station / reciprocalspaceship
Tools for exploring reciprocal space
Home Page: https://rs-station.github.io/reciprocalspaceship/
License: MIT License
If a DataSet object contains unmerged reflections, a call to DataSet.write_mtz()
currently has the side effect of calling DataSet.hkl_to_asu(inplace=True)
on the object:
In [1]: ds = rs.read_mtz("data_unmerged.mtz")
In [2]: ds["BATCH"].head()
Out[2]:
H K L
19 -11 2 1424
3 18 -1 328
-17 21 17 576
21 26 -7 70
-9 17 11 798
Name: BATCH, dtype: Batch
In [3]: ds.write_mtz("/dev/null")
In [4]: ds["BATCH"].head()
Out[4]:
H K L
19 11 2 1424
18 3 1 328
21 17 17 576
26 21 7 70
17 9 11 798
Name: BATCH, dtype: Batch
I consider this to be a bug because writing data to a file should not change the underlying object.
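The fix should be straightforward: map a copy instead of mutating the caller's object. Here is a minimal, self-contained sketch using a toy class standing in for rs.DataSet (all names below are illustrative, not the real implementation):

```python
import copy

class ToyDataSet:
    """Toy stand-in for rs.DataSet, just to illustrate the side-effect issue."""
    def __init__(self, hkl):
        self.hkl = hkl

    def hkl_to_asu(self, inplace=False):
        ds = self if inplace else copy.deepcopy(self)
        # crude stand-in for the real ASU mapping
        ds.hkl = [tuple(abs(i) for i in h) for h in ds.hkl]
        return ds

    def write_mtz(self, mtzfile):
        # proposed behavior: map a copy, leaving the caller's object untouched
        ds = self.hkl_to_asu(inplace=False)
        # ... write ds to mtzfile ...

ds = ToyDataSet([(19, -11, 2)])
ds.write_mtz("/dev/null")
print(ds.hkl)  # [(19, -11, 2)] -- original signs preserved
```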
Is there a way to create an empty DataFrame with the index of another (as in https://stackoverflow.com/questions/18176933/create-an-empty-data-frame-with-index-from-another-data-frame) for rs DataFrames?
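For plain pandas (which rs.DataSet subclasses), the pattern from that question is just constructing from the other frame's index; presumably the same constructor call works for rs.DataSet, though that is untested here:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0]}, index=pd.Index([10, 20], name="i"))
empty = pd.DataFrame(index=df.index)  # same index, no columns

print(empty.shape)  # (2, 0)
```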
It is common to assign reflections to resolution bins for different types of crystallographic analyses. This functionality is used in rs.utils.add_rfree(), as well as in rs.algorithms.scale_merged_intensities() when average intensities are computed isotropically.
Since this is a common operation for computing crystallographic stats, I think this functionality should be refactored to be a built-in method for DataSet -- something like DataSet.assign_resolution_bins(nbins=20).
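A rough sketch of what the proposed method could do internally (equal-population binning on 1/d^2; the function name and signature are hypothetical):

```python
def assign_resolution_bins(d, nbins=20):
    """Assign each reflection to an equal-population resolution bin.

    d is a list of d-spacings (angstroms); returns a bin index per
    reflection. Binning on 1/d^2 gives evenly populated bins after
    ranking. Hypothetical sketch of DataSet.assign_resolution_bins().
    """
    s2 = [1.0 / x**2 for x in d]
    # rank reflections from low to high resolution (small to large 1/d^2)
    order = sorted(range(len(s2)), key=s2.__getitem__)
    bins = [0] * len(s2)
    for rank, idx in enumerate(order):
        bins[idx] = rank * nbins // len(s2)
    return bins
```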
Although there is a function for flagging systematic absences in rs.utils, there is not currently a user-facing method in rs.DataSet. Such a function should be written with the same call signature as rs.DataSet.label_centrics().
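As a sketch of what such a method would flag, here is the absence rule for one spacegroup written out by hand (in practice the implementation would presumably defer to the existing rs.utils function, which handles arbitrary spacegroups):

```python
def is_absent_p212121(h, k, l):
    """Systematic absence rule for P 2_1 2_1 2_1: the three screw axes
    extinguish axial reflections h00, 0k0, and 00l with an odd index.
    Hypothetical helper illustrating what DataSet.label_absences()
    would compute per reflection."""
    if k == 0 and l == 0:
        return h % 2 == 1
    if h == 0 and l == 0:
        return k % 2 == 1
    if h == 0 and k == 0:
        return l % 2 == 1
    return False
```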
DataSet.to_reciprocalgrid() does not currently allow data to differ between Friedel halves of reciprocal space. This functionality would be useful for things like generating maps from underlying anomalous data. As far as the API goes, this could be implemented with an anomalous=True|False argument, though it may involve a decision about how to specify the anomalous column labels.
Internally, if such a function were called with a two-column anomalous DataSet, it would be possible to implement this method using a call to stack_anomalous() in place of expand_anomalous(). Implicit in this is that the (+)/(-)-suffixed columns would be renamed to drop the Friedel designations.
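A sketch of the intended placement logic, with a dict standing in for the numpy grid (function and argument names are hypothetical):

```python
def to_reciprocal_grid_anomalous(hkl, f_plus, f_minus, shape):
    """Place F(+) at hkl and F(-) at -hkl so the two Friedel halves of
    the grid can differ. Grid indices wrap modulo the grid shape, as in
    the existing to_reciprocalgrid(). Returns a dict keyed by grid index
    purely for illustration."""
    grid = {}
    for (h, k, l), fp, fm in zip(hkl, f_plus, f_minus):
        grid[(h % shape[0], k % shape[1], l % shape[2])] = fp
        grid[(-h % shape[0], -k % shape[1], -l % shape[2])] = fm
    return grid
```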
f803979 added explicit support for version 1.3.0, but excluded patch releases such as 1.3.1. Was that on purpose, or should the expression read <1.4?
When converting from a centered spacegroup like C2 to a non-centered one like P1, it is natural to apply a symop like gemmi.Op("1/2*x-1/2*y,1/2*x+1/2*y,z"). If one is not careful with the input Miller indices, this can result in fractional Miller indices. It would be good to add a check that the new indices are integers.
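The check itself is cheap if the transform is done with exact rationals. A self-contained sketch (the matrix below corresponds to the op above; whether it is applied directly or via its inverse transpose depends on convention, so treat it as illustrative):

```python
from fractions import Fraction

def apply_to_hkl(rot, hkl):
    """Apply a 3x3 rational matrix to a Miller index and insist that the
    result is integral -- the proposed check for ops like
    gemmi.Op("1/2*x-1/2*y,1/2*x+1/2*y,z")."""
    new = [sum(Fraction(r) * i for r, i in zip(row, hkl)) for row in rot]
    if any(x.denominator != 1 for x in new):
        raise ValueError(f"non-integral Miller index: {new}")
    return [int(x) for x in new]

HALF = Fraction(1, 2)
ROT = [[HALF, -HALF, 0], [HALF, HALF, 0], [0, 0, 1]]
```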
It came up today that rs.read_precognition() reads .ii files. This isn't explicitly stated in the docstring; it should be made clear that the IO function supports both file types.
This issue provides visibility into Renovate updates and their statuses.
This repository currently has no open or pending branches.
Decorators that scan the arguments of a function and automatically convert cell and spacegroup arguments to their proper gemmi types would be useful in writing standalone functions. We have 3 decorators in mind:
1. A gemmify decorator: @gemmify scans for keyword arguments named sg, spacegroup, space_group, cell, unitcell, unit_cell, and converts them accordingly.
2. A spacegroupify decorator, which takes *args naming each argument to be converted:
@spacegroupify("parent_sg", "child_sg")
def a_crazy_efx_function(parent_sg, child_sg, data):
    ....
3. A cellify decorator (in the same vein as spacegroupify).

Minimal example:
import numpy as np
import reciprocalspaceship as rs

ds = rs.DataSet({"int_col": [0, 1, 2, 3]}, dtype="MTZInt")
ds.loc[0, "int_col"] = np.nan
print(ds["int_col"].to_numpy(float))
Traceback:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/Documents/Hekstra_Lab/github/reciprocalspaceship/reciprocalspaceship/commandline/mtzdump.py in <module>
----> 1 ds["int_col"].to_numpy(float)
~/miniconda3/envs/rs/lib/python3.8/site-packages/pandas/core/base.py in to_numpy(self, dtype, copy, na_value, **kwargs)
511 if is_extension_array_dtype(self.dtype):
512 # error: Too many arguments for "to_numpy" of "ExtensionArray"
--> 513 return self.array.to_numpy( # type: ignore[call-arg]
514 dtype, copy=copy, na_value=na_value, **kwargs
515 )
~/Documents/Hekstra_Lab/github/reciprocalspaceship/reciprocalspaceship/dtypes/base.py in to_numpy(self, dtype, copy, na_value)
122 if self._hasna:
123 data = self._data.astype(dtype, copy=copy)
--> 124 data[self._mask] = na_value
125 else:
126 data = self._data.astype(dtype, copy=copy)
TypeError: float() argument must be a string or a number, not 'NAType'
This error is related to the overloading of the pandas to_numpy() method, and can be fixed by changing the default na_value to np.nan. This is a safe assumption to be making here, because all MTZIntegerArray-backed datatypes have to be compatible with float32 dtypes by construction.
I have a local fix implemented, and will make a PR shortly -- just posting this issue to log the error.
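The shape of the fix, in a self-contained sketch (the real change lives in MTZIntegerArray.to_numpy() in dtypes/base.py; this just illustrates why a NaN default fill works where pd.NA fails under float()):

```python
import math

def masked_to_float(values, mask, na_value=float("nan")):
    """Convert a masked integer array to floats, filling masked (NA)
    entries with na_value. Defaulting to NaN avoids the TypeError raised
    when float() receives pandas' NAType."""
    return [na_value if m else float(v) for v, m in zip(values, mask)]
```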
Currently, DataSet.unstack_anomalous() can be used with unmerged data to assign reflections to Friedel+ and Friedel- columns, and DataSet.stack_anomalous() can be used to undo the action. In the two-column anomalous format, each reflection is kept on its own row, and NaNs are used to pad the unused columns.
We should revisit this design decision, because it seems to be an uncommon action for unmerged data. The new anomalous flag for hkl_to_asu seems preferable for assigning reflections to Friedel "zones" of the ASU, and if it's useful to explicitly assign observations to Friedel+ or Friedel-, I think that will be better handled by a DataSet.assign_friedel() helper function (this would also be significantly less memory-intensive).
My plan here is to make DataSet.unstack_anomalous() and DataSet.stack_anomalous() only applicable to DataSet objects with the merged=True attribute. A ValueError (or AttributeError?) would be raised if the functions are invoked on merged=False DataSet objects.
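The guard itself would be a one-liner shared by both methods; a sketch (the helper name is hypothetical):

```python
def require_merged(merged, func_name):
    """Refuse to run an anomalous stack/unstack on unmerged data.
    `merged` is the DataSet.merged attribute; raises ValueError per the
    plan above (an AttributeError is the other option discussed)."""
    if not merged:
        raise ValueError(
            f"{func_name} is only applicable to merged DataSets "
            "(DataSet.merged must be True)"
        )
```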
Any additional thoughts?
Oftentimes, I notice that SAD phasing solutions from isomorphous structures don't overlay in real space. This seems to have something to do with the choice of "phase origin". It'd be super useful if we had a function to make sure that two sets of structure factors have compatible origins. We should be able to implement something akin to @apeck12's method. After talking to @JBGreisman, we decided doing this in cases with good phases might boil down to solving a simple linear program. At any rate, this should absolutely be something we put in rs.algorithms.
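The basic building block any such routine would need is applying a trial origin shift to a set of phases; here all phases change linearly in the Miller index, which is also why a linear program is plausible. A sketch (sign convention varies between programs, so the minus sign is an assumption):

```python
def shift_phase_origin(hkl, phases_deg, t):
    """Shift the phase origin by the fractional vector t: each phase
    changes by -360 * (h . t) degrees. Sketch of a building block for an
    origin-alignment routine in rs.algorithms."""
    out = []
    for (h, k, l), phi in zip(hkl, phases_deg):
        dphi = -360.0 * (h * t[0] + k * t[1] + l * t[2])
        out.append((phi + dphi) % 360.0)
    return out
```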
This is way outside of my wheelhouse, but I clicked on our Binder link for the first time, and it takes a crazy amount of time (8-10 minutes!) to launch. On our end, I think all we're asking Binder to do is run pip install reciprocalspaceship[dev]. This command installs a lot of stuff. Notably, it installs PyTorch for the robust merging example. I suspect this is what takes so long, but it could certainly be other packages.
I would suggest it is worth some time to pare down the dependencies. In the case of PyTorch, we could defer the install to the first cell of the example notebook.
I'd love some help from @ianhi on this one. He certainly knows this stuff better than I do.
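For the PyTorch deferral, the first cell of the robust merging notebook could install on demand, along these lines (a sketch; whether Binder's kernel environment permits in-notebook pip installs would need checking):

```python
import importlib.util
import subprocess
import sys

def ensure_installed(package):
    """Install `package` with pip only if it is not already importable.
    Returns True if it was already present (no install performed)."""
    if importlib.util.find_spec(package) is not None:
        return True
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])
    return False

# First notebook cell would run: ensure_installed("torch")
```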
Here is a partial log from a binder launch which may be helpful:
Picked Git content provider.
Cloning into '/tmp/repo2docker4xoq9m89'...
HEAD is now at 7781360 Bump version to 0.9.18
Using PythonBuildPack builder
Building conda environment for python=3.7
Step 1/48 : FROM buildpack-deps:bionic
---> 72ccd8e28f8d
Step 2/48 : ENV DEBIAN_FRONTEND=noninteractive
---> Using cache
---> 915bfe954aed
Step 3/48 : RUN apt-get -qq update && apt-get -qq install --yes --no-install-recommends locales > /dev/null && apt-get -qq purge && apt-get -qq clean && rm -rf /var/lib/apt/lists/*
---> Using cache
---> 382a4bfa36dd
Step 4/48 : RUN echo "en_US.UTF-8 UTF-8" > /etc/locale.gen && locale-gen
---> Using cache
---> 23cc551494ea
Step 5/48 : ENV LC_ALL en_US.UTF-8
---> Using cache
---> ce16adc179fd
Step 6/48 : ENV LANG en_US.UTF-8
---> Using cache
---> 1443ecbb572c
Step 7/48 : ENV LANGUAGE en_US.UTF-8
---> Using cache
---> 67fd1928a7e1
Step 8/48 : ENV SHELL /bin/bash
---> Using cache
---> 0492f760ec33
Step 9/48 : ARG NB_USER
---> Using cache
---> 0fa8275551af
Step 10/48 : ARG NB_UID
---> Using cache
---> 9283ec23cac9
Step 11/48 : ENV USER ${NB_USER}
---> Using cache
---> e92aaa5352f0
Step 12/48 : ENV HOME /home/${NB_USER}
---> Using cache
---> 6f44454fabec
Step 13/48 : RUN groupadd --gid ${NB_UID} ${NB_USER} && useradd --comment "Default user" --create-home --gid ${NB_UID} --no-log-init --shell /bin/bash --uid ${NB_UID} ${NB_USER}
---> Using cache
---> 0ee4d890aa6c
Step 14/48 : RUN apt-get -qq update && apt-get -qq install --yes --no-install-recommends less unzip > /dev/null && apt-get -qq purge && apt-get -qq clean && rm -rf /var/lib/apt/lists/*
---> Using cache
---> bd296a1594c5
Step 15/48 : EXPOSE 8888
---> Using cache
---> 6d6578ba9b76
Step 16/48 : ENV APP_BASE /srv
---> Using cache
---> 92901276e26d
Step 17/48 : ENV CONDA_DIR ${APP_BASE}/conda
---> Using cache
---> c61852980ae7
Step 18/48 : ENV NB_PYTHON_PREFIX ${CONDA_DIR}/envs/notebook
---> Using cache
---> 4f44e70e3668
Step 19/48 : ENV NPM_DIR ${APP_BASE}/npm
---> Using cache
---> 9bae366c7f81
Step 20/48 : ENV NPM_CONFIG_GLOBALCONFIG ${NPM_DIR}/npmrc
---> Using cache
---> 8fe62e0295b7
Step 21/48 : ENV NB_ENVIRONMENT_FILE /tmp/env/environment.lock
---> Using cache
---> a7dee12cf999
Step 22/48 : ENV KERNEL_PYTHON_PREFIX ${NB_PYTHON_PREFIX}
---> Using cache
---> 8b2229129a3d
Step 23/48 : ENV PATH ${NB_PYTHON_PREFIX}/bin:${CONDA_DIR}/bin:${NPM_DIR}/bin:${PATH}
---> Using cache
---> 4c9bcaf823ba
Step 24/48 : COPY --chown=1000:1000 build_script_files/-2fusr-2flib-2fpython3-2e8-2fsite-2dpackages-2frepo2docker-2fbuildpacks-2fconda-2factivate-2dconda-2esh-391af5 /etc/profile.d/activate-conda.sh
---> Using cache
---> f8d2890e201b
Step 25/48 : COPY --chown=1000:1000 build_script_files/-2fusr-2flib-2fpython3-2e8-2fsite-2dpackages-2frepo2docker-2fbuildpacks-2fconda-2fenvironment-2epy-2d3-2e7-2elock-4f1154 /tmp/env/environment.lock
---> Using cache
---> c1410cdb5a70
Step 26/48 : COPY --chown=1000:1000 build_script_files/-2fusr-2flib-2fpython3-2e8-2fsite-2dpackages-2frepo2docker-2fbuildpacks-2fconda-2finstall-2dminiforge-2ebash-514214 /tmp/install-miniforge.bash
---> Using cache
---> 44936c7cb078
Step 27/48 : RUN TIMEFORMAT='time: %3R' bash -c 'time /tmp/install-miniforge.bash' && rm -rf /tmp/install-miniforge.bash /tmp/env
---> Using cache
---> 4ab657a80f7e
Step 28/48 : RUN mkdir -p ${NPM_DIR} && chown -R ${NB_USER}:${NB_USER} ${NPM_DIR}
---> Using cache
---> 01840c1deeaf
Step 29/48 : ARG REPO_DIR=${HOME}
---> Using cache
---> b440807ab159
Step 30/48 : ENV REPO_DIR ${REPO_DIR}
---> Using cache
---> f1d0395dc84d
Step 31/48 : WORKDIR ${REPO_DIR}
---> Using cache
---> 11e14be612ee
Step 32/48 : RUN chown ${NB_USER}:${NB_USER} ${REPO_DIR}
---> Using cache
---> 4baa295e9a1c
Step 33/48 : ENV PATH ${HOME}/.local/bin:${REPO_DIR}/.local/bin:${PATH}
---> Using cache
---> 8303e9277f1c
Step 34/48 : ENV CONDA_DEFAULT_ENV ${KERNEL_PYTHON_PREFIX}
---> Using cache
---> 4c3893acd0bd
Step 35/48 : COPY --chown=1000:1000 src/ ${REPO_DIR}
---> d5bd5f9ab584
Step 36/48 : USER ${NB_USER}
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in 54852568430a
Removing intermediate container 54852568430a
---> 1191068b9b5d
Step 37/48 : RUN ${KERNEL_PYTHON_PREFIX}/bin/pip install --no-cache-dir .
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in 05f10e1eaab0
Processing /home/jovyan
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting gemmi<=0.5.1,>=0.4.2
Downloading gemmi-0.5.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
Collecting pandas<=1.3.5,>=1.2.0
Downloading pandas-1.3.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.3 MB)
Collecting numpy
Downloading numpy-1.21.5-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
Collecting scipy
Downloading scipy-1.7.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (38.1 MB)
Requirement already satisfied: ipython in /srv/conda/envs/notebook/lib/python3.7/site-packages (from reciprocalspaceship==0.9.18) (7.30.1)
Requirement already satisfied: pytz>=2017.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas<=1.3.5,>=1.2.0->reciprocalspaceship==0.9.18) (2021.3)
Requirement already satisfied: python-dateutil>=2.7.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas<=1.3.5,>=1.2.0->reciprocalspaceship==0.9.18) (2.8.2)
Requirement already satisfied: setuptools>=18.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (60.0.4)
Requirement already satisfied: decorator in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (5.1.0)
Requirement already satisfied: traitlets>=4.2 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (5.1.1)
Requirement already satisfied: backcall in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (0.2.0)
Requirement already satisfied: pygments in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (2.10.0)
Requirement already satisfied: jedi>=0.16 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (0.18.1)
Requirement already satisfied: pexpect>4.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (4.8.0)
Requirement already satisfied: matplotlib-inline in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (0.1.3)
Requirement already satisfied: pickleshare in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (0.7.5)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (3.0.24)
Requirement already satisfied: parso<0.9.0,>=0.8.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from jedi>=0.16->ipython->reciprocalspaceship==0.9.18) (0.8.3)
Requirement already satisfied: ptyprocess>=0.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pexpect>4.3->ipython->reciprocalspaceship==0.9.18) (0.7.0)
Requirement already satisfied: wcwidth in /srv/conda/envs/notebook/lib/python3.7/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython->reciprocalspaceship==0.9.18) (0.2.5)
Requirement already satisfied: six>=1.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas<=1.3.5,>=1.2.0->reciprocalspaceship==0.9.18) (1.16.0)
Building wheels for collected packages: reciprocalspaceship
Building wheel for reciprocalspaceship (setup.py): started
Building wheel for reciprocalspaceship (setup.py): finished with status 'done'
Created wheel for reciprocalspaceship: filename=reciprocalspaceship-0.9.18-py3-none-any.whl size=68921 sha256=cefc203a0593825f967eeff3511e8c362c64247e6eaff0dfc0730ff01e1a47bd
Stored in directory: /tmp/pip-ephem-wheel-cache-1b6sliwd/wheels/24/67/61/461d47532c7e3b6048f03e8f0e3c2ddb1976b17163d48f9fe9
Successfully built reciprocalspaceship
Installing collected packages: numpy, scipy, pandas, gemmi, reciprocalspaceship
Successfully installed gemmi-0.5.1 numpy-1.21.5 pandas-1.3.5 reciprocalspaceship-0.9.18 scipy-1.7.3
Removing intermediate container 05f10e1eaab0
---> 9b8a5f6ddc67
Step 38/48 : LABEL repo2docker.ref="7781360585814dbbac9ba0628fb2fbc974a964fb"
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in 3a4d023a8819
Removing intermediate container 3a4d023a8819
---> 6e4e4f0325d0
Step 39/48 : LABEL repo2docker.repo="https://github.com/Hekstra-Lab/reciprocalspaceship"
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in 22531833e814
Removing intermediate container 22531833e814
---> 46ad036a4080
Step 40/48 : LABEL repo2docker.version="2021.08.0+78.g4352535"
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in fe8f9fe7819f
Removing intermediate container fe8f9fe7819f
---> e0e80f22843d
Step 41/48 : USER ${NB_USER}
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in e87f3615d067
Removing intermediate container e87f3615d067
---> 629e656d9a07
Step 42/48 : RUN chmod +x postBuild
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in 2e7bfb1cba7d
Removing intermediate container 2e7bfb1cba7d
---> 233f67c9a99e
Step 43/48 : RUN ./postBuild
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in a519284bfcd0
Requirement already satisfied: reciprocalspaceship[dev] in /srv/conda/envs/notebook/lib/python3.7/site-packages (0.9.18)
Requirement already satisfied: ipython in /srv/conda/envs/notebook/lib/python3.7/site-packages (from reciprocalspaceship[dev]) (7.30.1)
Requirement already satisfied: pandas<=1.3.5,>=1.2.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from reciprocalspaceship[dev]) (1.3.5)
Requirement already satisfied: gemmi<=0.5.1,>=0.4.2 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from reciprocalspaceship[dev]) (0.5.1)
Requirement already satisfied: scipy in /srv/conda/envs/notebook/lib/python3.7/site-packages (from reciprocalspaceship[dev]) (1.7.3)
Requirement already satisfied: numpy in /srv/conda/envs/notebook/lib/python3.7/site-packages (from reciprocalspaceship[dev]) (1.21.5)
Collecting jupyter
Downloading jupyter-1.0.0-py2.py3-none-any.whl (2.7 kB)
Collecting torch
Downloading torch-1.10.1-cp37-cp37m-manylinux1_x86_64.whl (881.9 MB)
Collecting sphinx-rtd-theme
Downloading sphinx_rtd_theme-1.0.0-py2.py3-none-any.whl (2.8 MB)
Collecting pytest
Downloading pytest-6.2.5-py3-none-any.whl (280 kB)
Collecting pytest-cov
Downloading pytest_cov-3.0.0-py3-none-any.whl (20 kB)
Collecting matplotlib
Downloading matplotlib-3.5.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.2 MB)
Collecting nbsphinx
Downloading nbsphinx-0.8.8-py3-none-any.whl (25 kB)
Collecting seaborn
Downloading seaborn-0.11.2-py3-none-any.whl (292 kB)
Collecting autodocsumm
Downloading autodocsumm-0.2.7.tar.gz (43 kB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting pytest-xdist
Downloading pytest_xdist-2.5.0-py3-none-any.whl (41 kB)
Collecting sphinx
Downloading Sphinx-4.4.0-py3-none-any.whl (3.1 MB)
Collecting tqdm
Downloading tqdm-4.62.3-py2.py3-none-any.whl (76 kB)
Collecting scikit-image
Downloading scikit_image-0.19.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (13.3 MB)
Collecting sphinx-panels
Downloading sphinx_panels-0.6.0-py3-none-any.whl (87 kB)
Collecting sphinxcontrib-autoprogram
Downloading sphinxcontrib_autoprogram-0.1.7-py2.py3-none-any.whl (8.7 kB)
Collecting celluloid
Downloading celluloid-0.2.0-py3-none-any.whl (5.4 kB)
Requirement already satisfied: python-dateutil>=2.7.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas<=1.3.5,>=1.2.0->reciprocalspaceship[dev]) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas<=1.3.5,>=1.2.0->reciprocalspaceship[dev]) (2021.3)
Collecting sphinxcontrib-htmlhelp>=2.0.0
Downloading sphinxcontrib_htmlhelp-2.0.0-py2.py3-none-any.whl (100 kB)
Requirement already satisfied: Jinja2>=2.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from sphinx->reciprocalspaceship[dev]) (3.0.3)
Collecting sphinxcontrib-applehelp
Downloading sphinxcontrib_applehelp-1.0.2-py2.py3-none-any.whl (121 kB)
Collecting snowballstemmer>=1.1
Downloading snowballstemmer-2.2.0-py2.py3-none-any.whl (93 kB)
Collecting sphinxcontrib-qthelp
Downloading sphinxcontrib_qthelp-1.0.3-py2.py3-none-any.whl (90 kB)
Collecting docutils<0.18,>=0.14
Downloading docutils-0.17.1-py2.py3-none-any.whl (575 kB)
Requirement already satisfied: importlib-metadata>=4.4 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from sphinx->reciprocalspaceship[dev]) (4.10.0)
Requirement already satisfied: Pygments>=2.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from sphinx->reciprocalspaceship[dev]) (2.10.0)
Collecting sphinxcontrib-devhelp
Downloading sphinxcontrib_devhelp-1.0.2-py2.py3-none-any.whl (84 kB)
Requirement already satisfied: requests>=2.5.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from sphinx->reciprocalspaceship[dev]) (2.26.0)
Collecting sphinxcontrib-jsmath
Downloading sphinxcontrib_jsmath-1.0.1-py2.py3-none-any.whl (5.1 kB)
Requirement already satisfied: packaging in /srv/conda/envs/notebook/lib/python3.7/site-packages (from sphinx->reciprocalspaceship[dev]) (21.3)
Collecting alabaster<0.8,>=0.7
Downloading alabaster-0.7.12-py2.py3-none-any.whl (14 kB)
Requirement already satisfied: babel>=1.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from sphinx->reciprocalspaceship[dev]) (2.9.1)
Collecting imagesize
Downloading imagesize-1.3.0-py2.py3-none-any.whl (5.2 kB)
Collecting sphinxcontrib-serializinghtml>=1.1.5
Downloading sphinxcontrib_serializinghtml-1.1.5-py2.py3-none-any.whl (94 kB)
Requirement already satisfied: backcall in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship[dev]) (0.2.0)
Requirement already satisfied: decorator in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship[dev]) (5.1.0)
Requirement already satisfied: pickleshare in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship[dev]) (0.7.5)
Requirement already satisfied: jedi>=0.16 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship[dev]) (0.18.1)
Requirement already satisfied: matplotlib-inline in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship[dev]) (0.1.3)
Requirement already satisfied: setuptools>=18.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship[dev]) (60.0.4)
Requirement already satisfied: traitlets>=4.2 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship[dev]) (5.1.1)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship[dev]) (3.0.24)
Requirement already satisfied: pexpect>4.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship[dev]) (4.8.0)
Requirement already satisfied: ipywidgets in /srv/conda/envs/notebook/lib/python3.7/site-packages (from jupyter->reciprocalspaceship[dev]) (7.6.3)
Requirement already satisfied: ipykernel in /srv/conda/envs/notebook/lib/python3.7/site-packages (from jupyter->reciprocalspaceship[dev]) (6.6.0)
Requirement already satisfied: notebook in /srv/conda/envs/notebook/lib/python3.7/site-packages (from jupyter->reciprocalspaceship[dev]) (6.3.0)
Requirement already satisfied: nbconvert in /srv/conda/envs/notebook/lib/python3.7/site-packages (from jupyter->reciprocalspaceship[dev]) (6.0.7)
Collecting jupyter-console
Downloading jupyter_console-6.4.0-py3-none-any.whl (22 kB)
Collecting qtconsole
Downloading qtconsole-5.2.2-py3-none-any.whl (120 kB)
Collecting fonttools>=4.22.0
Downloading fonttools-4.28.5-py3-none-any.whl (890 kB)
Requirement already satisfied: pyparsing>=2.2.1 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib->reciprocalspaceship[dev]) (3.0.6)
Collecting pillow>=6.2.0
Downloading Pillow-9.0.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)
Collecting cycler>=0.10
Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting kiwisolver>=1.0.1
Downloading kiwisolver-1.3.2-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.1 MB)
Requirement already satisfied: nbformat in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbsphinx->reciprocalspaceship[dev]) (5.1.3)
Collecting py>=1.8.2
Downloading py-1.11.0-py2.py3-none-any.whl (98 kB)
Requirement already satisfied: attrs>=19.2.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pytest->reciprocalspaceship[dev]) (21.2.0)
Collecting iniconfig
Downloading iniconfig-1.1.1-py2.py3-none-any.whl (5.0 kB)
Collecting pluggy<2.0,>=0.12
Downloading pluggy-1.0.0-py2.py3-none-any.whl (13 kB)
Collecting toml
Downloading toml-0.10.2-py2.py3-none-any.whl (16 kB)
Collecting coverage[toml]>=5.2.1
Downloading coverage-6.2-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (213 kB)
Collecting pytest-forked
Downloading pytest_forked-1.4.0-py3-none-any.whl (4.9 kB)
Collecting execnet>=1.1
Downloading execnet-1.9.0-py2.py3-none-any.whl (39 kB)
Collecting imageio>=2.4.1
Downloading imageio-2.13.5-py3-none-any.whl (3.3 MB)
Collecting PyWavelets>=1.1.1
Downloading PyWavelets-1.2.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (6.1 MB)
Collecting networkx>=2.2
Downloading networkx-2.6.3-py3-none-any.whl (1.9 MB)
Collecting tifffile>=2019.7.26
Downloading tifffile-2021.11.2-py3-none-any.whl (178 kB)
Requirement already satisfied: six in /srv/conda/envs/notebook/lib/python3.7/site-packages (from sphinxcontrib-autoprogram->reciprocalspaceship[dev]) (1.16.0)
Requirement already satisfied: typing-extensions in /srv/conda/envs/notebook/lib/python3.7/site-packages (from torch->reciprocalspaceship[dev]) (4.0.1)
Collecting tomli
Downloading tomli-2.0.0-py3-none-any.whl (12 kB)
Requirement already satisfied: zipp>=0.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from importlib-metadata>=4.4->sphinx->reciprocalspaceship[dev]) (3.6.0)
Requirement already satisfied: parso<0.9.0,>=0.8.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from jedi>=0.16->ipython->reciprocalspaceship[dev]) (0.8.3)
Requirement already satisfied: MarkupSafe>=2.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from Jinja2>=2.3->sphinx->reciprocalspaceship[dev]) (2.0.1)
Requirement already satisfied: defusedxml in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbconvert->jupyter->reciprocalspaceship[dev]) (0.7.1)
Requirement already satisfied: mistune<2,>=0.8.1 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbconvert->jupyter->reciprocalspaceship[dev]) (0.8.4)
Requirement already satisfied: testpath in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbconvert->jupyter->reciprocalspaceship[dev]) (0.5.0)
Requirement already satisfied: entrypoints>=0.2.2 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbconvert->jupyter->reciprocalspaceship[dev]) (0.3)
Requirement already satisfied: jupyterlab-pygments in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbconvert->jupyter->reciprocalspaceship[dev]) (0.1.2)
Requirement already satisfied: jupyter-core in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbconvert->jupyter->reciprocalspaceship[dev]) (4.9.1)
Requirement already satisfied: pandocfilters>=1.4.1 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbconvert->jupyter->reciprocalspaceship[dev]) (1.5.0)
Requirement already satisfied: bleach in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbconvert->jupyter->reciprocalspaceship[dev]) (4.1.0)
Requirement already satisfied: nbclient<0.6.0,>=0.5.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbconvert->jupyter->reciprocalspaceship[dev]) (0.5.9)
Requirement already satisfied: ipython-genutils in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbformat->nbsphinx->reciprocalspaceship[dev]) (0.2.0)
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbformat->nbsphinx->reciprocalspaceship[dev]) (4.3.2)
Requirement already satisfied: ptyprocess>=0.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pexpect>4.3->ipython->reciprocalspaceship[dev]) (0.7.0)
Requirement already satisfied: wcwidth in /srv/conda/envs/notebook/lib/python3.7/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython->reciprocalspaceship[dev]) (0.2.5)
Requirement already satisfied: charset-normalizer~=2.0.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from requests>=2.5.0->sphinx->reciprocalspaceship[dev]) (2.0.9)
Requirement already satisfied: certifi>=2017.4.17 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from requests>=2.5.0->sphinx->reciprocalspaceship[dev]) (2021.10.8)
Requirement already satisfied: idna<4,>=2.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from requests>=2.5.0->sphinx->reciprocalspaceship[dev]) (3.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from requests>=2.5.0->sphinx->reciprocalspaceship[dev]) (1.26.7)
Requirement already satisfied: argcomplete>=1.12.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipykernel->jupyter->reciprocalspaceship[dev]) (1.12.3)
Requirement already satisfied: debugpy<2.0,>=1.0.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipykernel->jupyter->reciprocalspaceship[dev]) (1.5.1)
Requirement already satisfied: tornado<7.0,>=4.2 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipykernel->jupyter->reciprocalspaceship[dev]) (6.1)
Requirement already satisfied: jupyter-client<8.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipykernel->jupyter->reciprocalspaceship[dev]) (7.1.0)
Requirement already satisfied: widgetsnbextension~=3.5.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipywidgets->jupyter->reciprocalspaceship[dev]) (3.5.2)
Requirement already satisfied: jupyterlab-widgets>=1.0.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipywidgets->jupyter->reciprocalspaceship[dev]) (1.0.2)
Requirement already satisfied: pyzmq>=17 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from notebook->jupyter->reciprocalspaceship[dev]) (22.3.0)
Requirement already satisfied: argon2-cffi in /srv/conda/envs/notebook/lib/python3.7/site-packages (from notebook->jupyter->reciprocalspaceship[dev]) (21.1.0)
Requirement already satisfied: prometheus-client in /srv/conda/envs/notebook/lib/python3.7/site-packages (from notebook->jupyter->reciprocalspaceship[dev]) (0.12.0)
Requirement already satisfied: terminado>=0.8.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from notebook->jupyter->reciprocalspaceship[dev]) (0.12.1)
Requirement already satisfied: Send2Trash>=1.5.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from notebook->jupyter->reciprocalspaceship[dev]) (1.8.0)
Collecting qtpy
Downloading QtPy-2.0.0-py3-none-any.whl (62 kB)
Requirement already satisfied: importlib-resources>=1.4.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat->nbsphinx->reciprocalspaceship[dev]) (5.4.0)
Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat->nbsphinx->reciprocalspaceship[dev]) (0.18.0)
Requirement already satisfied: nest-asyncio>=1.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from jupyter-client<8.0->ipykernel->jupyter->reciprocalspaceship[dev]) (1.5.4)
Requirement already satisfied: cffi>=1.0.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from argon2-cffi->notebook->jupyter->reciprocalspaceship[dev]) (1.15.0)
Requirement already satisfied: webencodings in /srv/conda/envs/notebook/lib/python3.7/site-packages (from bleach->nbconvert->jupyter->reciprocalspaceship[dev]) (0.5.1)
Requirement already satisfied: pycparser in /srv/conda/envs/notebook/lib/python3.7/site-packages (from cffi>=1.0.0->argon2-cffi->notebook->jupyter->reciprocalspaceship[dev]) (2.21)
Building wheels for collected packages: autodocsumm
Building wheel for autodocsumm (setup.py): started
Building wheel for autodocsumm (setup.py): finished with status 'done'
Created wheel for autodocsumm: filename=autodocsumm-0.2.7-py3-none-any.whl size=13521 sha256=1d8a97a3eb5851349339a541e8d7a4d78610530a35b5b10a57e18d05b2de3c03
Stored in directory: /home/jovyan/.cache/pip/wheels/c7/7e/cb/6102fccefbd2ca3339722fcddfa7787a88d52ddbbfbd280221
Successfully built autodocsumm
Installing collected packages: toml, py, pluggy, iniconfig, tomli, sphinxcontrib-serializinghtml, sphinxcontrib-qthelp, sphinxcontrib-jsmath, sphinxcontrib-htmlhelp, sphinxcontrib-devhelp, sphinxcontrib-applehelp, snowballstemmer, qtpy, pytest, pillow, kiwisolver, imagesize, fonttools, docutils, cycler, coverage, alabaster, tifffile, sphinx, qtconsole, PyWavelets, pytest-forked, networkx, matplotlib, jupyter-console, imageio, execnet, tqdm, torch, sphinxcontrib-autoprogram, sphinx-rtd-theme, sphinx-panels, seaborn, scikit-image, pytest-xdist, pytest-cov, nbsphinx, jupyter, celluloid, autodocsumm
Successfully installed PyWavelets-1.2.0 alabaster-0.7.12 autodocsumm-0.2.7 celluloid-0.2.0 coverage-6.2 cycler-0.11.0 docutils-0.17.1 execnet-1.9.0 fonttools-4.28.5 imageio-2.13.5 imagesize-1.3.0 iniconfig-1.1.1 jupyter-1.0.0 jupyter-console-6.4.0 kiwisolver-1.3.2 matplotlib-3.5.1 nbsphinx-0.8.8 networkx-2.6.3 pillow-9.0.0 pluggy-1.0.0 py-1.11.0 pytest-6.2.5 pytest-cov-3.0.0 pytest-forked-1.4.0 pytest-xdist-2.5.0 qtconsole-5.2.2 qtpy-2.0.0 scikit-image-0.19.1 seaborn-0.11.2 snowballstemmer-2.2.0 sphinx-4.4.0 sphinx-panels-0.6.0 sphinx-rtd-theme-1.0.0 sphinxcontrib-applehelp-1.0.2 sphinxcontrib-autoprogram-0.1.7 sphinxcontrib-devhelp-1.0.2 sphinxcontrib-htmlhelp-2.0.0 sphinxcontrib-jsmath-1.0.1 sphinxcontrib-qthelp-1.0.3 sphinxcontrib-serializinghtml-1.1.5 tifffile-2021.11.2 toml-0.10.2 tomli-2.0.0 torch-1.10.1 tqdm-4.62.3
Removing intermediate container a519284bfcd0
---> 42bf0708571d
Step 44/48 : ENV PYTHONUNBUFFERED=1
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in b585ae5ccf83
Removing intermediate container b585ae5ccf83
---> 4d56d3fbbe2d
Step 45/48 : COPY /python3-login /usr/local/bin/python3-login
---> 3d98ceb43ecb
Step 46/48 : COPY /repo2docker-entrypoint /usr/local/bin/repo2docker-entrypoint
---> 11a6f6fd2d25
Step 47/48 : ENTRYPOINT ["/usr/local/bin/repo2docker-entrypoint"]
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in 8fc78e8e4f26
Removing intermediate container 8fc78e8e4f26
---> 9cdcb96d4175
Step 48/48 : CMD ["jupyter", "notebook", "--ip", "0.0.0.0"]
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in 851c7348604f
Removing intermediate container 851c7348604f
---> 39789a85a264
{"aux": {"ID": "sha256:39789a85a2645b262f62b3bb8e887e03ae3c43efbef55692fb8103be25c09452"}}Successfully built 39789a85a264
Successfully tagged turingmybinder/binder-prod-r2d-g5b5b759-hekstra-2dlab-2dreciprocalspaceship-670595:7781360585814dbbac9ba0628fb2fbc974a964fb
Pushing image
Currently, we try to fully support applying phase shifts when moving reflections around the unit cell (DataSet.hkl_to_asu()
, DataSet.apply_symop()
, etc.). This is implemented by checking for the PhaseDtype
, and updating any columns that are found.
We should also consider doing the same more broadly for structure factors stored as complex numbers. I added some support for this in DataSet.apply_symop()
, but it is still missing from hkl_to_asu()
and hkl_to_observed()
. In an analogous way to DataSet.get_phase_keys()
, this can be implemented using the recently added DataSet.get_complex_keys()
helper method.
A few additional thoughts: should we assume that any complex number stored in a DataSet
represents a structure factor? Are there any other reasonable use cases for storing a complex number for which applying a phase shift would be problematic?

P.S. I labeled this as a bug because I think we should 1) explain the behavior clearly in the documentation and 2) make it consistent to minimize surprise
The cell and spacegroup attributes of a rs.DataSet
are stored as gemmi
objects, which breaks the DataSet.to_pickle()
method inherited from pandas
.
import reciprocalspaceship as rs
mtz = rs.read_mtz("docs/examples/data/HEWL_SSAD_24IDC.mtz")
print(mtz.spacegroup) # Prints <gemmi.SpaceGroup("P 43 21 2")>
mtz.to_pickle("test.pkl")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-5-e7af7b9cd589> in <module>
----> 1 mtz.to_pickle("test.pkl")
~/miniconda3/envs/rs/lib/python3.8/site-packages/pandas-1.2.1-py3.8-macosx-10.9-x86_64.egg/pandas/core/generic.py in to_pickle(self, path, compression, protocol, storage_options)
2861 from pandas.io.pickle import to_pickle
2862
-> 2863 to_pickle(
2864 self,
2865 path,
~/miniconda3/envs/rs/lib/python3.8/site-packages/pandas-1.2.1-py3.8-macosx-10.9-x86_64.egg/pandas/io/pickle.py in to_pickle(obj, filepath_or_buffer, compression, protocol, storage_options)
95 storage_options=storage_options,
96 ) as handles:
---> 97 pickle.dump(obj, handles.handle, protocol=protocol) # type: ignore[arg-type]
98
99
TypeError: cannot pickle 'gemmi.SpaceGroup' object
It would be useful to fix this functionality so that DataSet
objects can be restored without needing to go to MTZ format. This could enable storing complex numbers in columns or the storage of arbitrary metadata in the DataSet.attrs
attribute.
Two possible solutions:
1. Add pickle support to the gemmi.SpaceGroup
and gemmi.UnitCell
objects
2. Change DataSet.to_pickle()
to cache the spacegroup / cell attributes in pickle-friendly forms, and write rs.read_pickle()
to read the pickle file and re-set the cached spacegroup / cell.

Here is a proof of concept of how the workaround above could look:
# Cache attributes -- this would get implicitly handled by DataSet.to_pickle()
mtz.attrs["spacegroup"] = mtz.spacegroup.xhm()
mtz.attrs["cell"] = mtz.cell.parameters
# Remove gemmi objects to avoid pickle problems
mtz.spacegroup = None
mtz.cell = None
# Write pickle
mtz.to_pickle("dataset.pkl")
# Read pickle -- in the future this could all get wrapped into `rs.read_pickle()`
import pandas as pd
ds = rs.DataSet(pd.read_pickle("dataset.pkl"))
ds.spacegroup = ds.attrs["spacegroup"]
ds.cell = ds.attrs["cell"]
The example notebooks included in the documentation and binder involve a few additional dependencies that are not included in the install_requires
. These include matplotlib
, seaborn
, celluloid
, scikit-image
, and, soon, pytorch
.
These additional dependencies should be added to the dev
mode of extras_require
, so that it is easy to get compatible versions of all relevant dependencies. This can also be used to simplify the binder
setup, and to add the ability to "re-run" the notebooks when building documentation to ensure that everything is up-to-date.
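One way this could look in setup.py is sketched below. The package names are taken from the list above, but any version pins would need to be checked against the notebooks before adoption:

```python
# Hypothetical extras_require layout; entries and pins are illustrative only.
extras_require = {
    "dev": [
        "matplotlib",
        "seaborn",
        "celluloid",
        "scikit-image",
        "torch",
    ],
}
```

With this in place, `pip install reciprocalspaceship[dev]` would pull in everything the example notebooks need.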
Way to reproduce this issue:
conda create -n rs_test python=3.9
conda activate rs_test
pip install reciprocalspaceship
Error:
Failed building wheel for gemmi ...
The build succeeds on Python 3.8 or lower in my testing, although it is still slow. This seems to be a gemmi issue rather than a reciprocalspaceship one, but it might be worth noting in the installation tutorial.
Right now there is no obvious way to change a DataSet
instance to a different spacegroup which possibly has a different basisop
and/or reciprocal asu. For unmerged data, I think something along the lines of the following is probably sufficient:
import gemmi

def change_spacegroup(self, new_sg, inplace=False):
    if not inplace:
        ds = self.copy()
    else:
        ds = self
    if not isinstance(new_sg, gemmi.SpaceGroup):
        new_sg = gemmi.SpaceGroup(new_sg)
    # Undo the current basis change, then apply the new one
    ds.apply_symop(ds.spacegroup.basisop.inverse(), inplace=True)
    ds.apply_symop(new_sg.basisop, inplace=True)
    ds.spacegroup = new_sg  # keep the spacegroup attribute consistent
    return ds
For merged data, I think in addition to the basisop application, you would want to expand the asu to P1 and select the new asu.
IMHO there is enough nuance to this problem that we should provide a well tested method for this.
the link referring to
http://legacy.ccp4.ac.uk/html/mtzformat.html#coltypes
no longer works and should instead read:
https://www.ccp4.ac.uk/html/mtzformat.html#coltypes (the #coltypes part does not actually work)
If Friedel columns share the same base name with non-Friedel columns, stack_anomalous
will fail with a cryptic error message.
Here is a minimal example
import reciprocalspaceship as rs
import numpy as np
dmin = 2.
cell = [10., 20., 30., 90., 90., 90.]
sg = 19
h,k,l = rs.utils.generate_reciprocal_asu(cell, sg, dmin, anomalous=False).T
ds = rs.DataSet({
'H' : h,
'K' : k,
'L' : l,
'I' : np.ones(len(h)),
'I(+)' : np.ones(len(h)),
'I(-)' : np.ones(len(h)),
},
merged=True,
cell=cell,
spacegroup=sg
).infer_mtz_dtypes().set_index(['H', 'K', 'L'])
print(ds)
print(ds.dtypes)
ds.stack_anomalous()
which outputs
user@computer:~$ python bug.py
I Intensity
I(+) FriedelIntensity
I(-) FriedelIntensity
dtype: object
Traceback (most recent call last):
File "bug.py", line 26, in <module>
ds.stack_anomalous()
File "/home/kmdalton/opt/reciprocalspaceship/reciprocalspaceship/dataset.py", line 907, in stack_anomalous
F[label] = F[label].from_friedel_dtype()
File "/home/kmdalton/opt/anaconda/envs/careless/lib/python3.8/site-packages/pandas/core/generic.py", line 5487, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataSet' object has no attribute 'from_friedel_dtype'
Related to this (and hopefully not compounding the issue) would it make sense for DataSet.stack_anomalous()
to take something like plus_suffix
and minus_suffix
arguments, as alternatives for plus_labels
and minus_labels
?
Definitely not critical, but would a) occasionally save some typing and b) be internally consistent with the way the defaults work (without breaking previous code), e.g. plus_suffix="(+)"
/ minus_suffix="(-)"
(This also seems like an easy enough change that I could try to tackle it myself, with blessing?)
Originally posted by @dennisbrookner in #99 (comment)
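A sketch of how suffix-based defaults might derive the label lists that plus_labels / minus_labels currently require. The function name here is hypothetical, not part of the rs API:

```python
# Derive plus/minus label pairs from suffixes (illustrative sketch only)
def labels_from_suffixes(columns, plus_suffix="(+)", minus_suffix="(-)"):
    plus = [c for c in columns if c.endswith(plus_suffix)]
    # Pair each plus label with its minus counterpart by swapping the suffix
    minus = [c[: -len(plus_suffix)] + minus_suffix for c in plus]
    return plus, minus

cols = ["I", "I(+)", "I(-)", "SigI(+)", "SigI(-)"]
plus, minus = labels_from_suffixes(cols)
```

With the default suffixes this reproduces the current default behavior, so existing code would be unaffected.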
We do not consistently use the same dtypes
in functions within rs.utils
. We should come up with a unified philosophy for how numpy
dtypes
are determined for returned values. I can think of at least three defensible possibilities:
dtype=np.{float32, int32, ...}
parameter to each functionnp.float32
or np.int32
as applicable.I lean toward the last option, because it meshes best with the mtz standard, and I think it will lead to fewer edge cases and gotchas.
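The last option could be centralized in a pair of small helpers (names here are hypothetical) that every rs.utils function routes its return values through:

```python
import numpy as np

# Sketch of the "always return MTZ-native 32-bit types" policy
def to_mtz_real(values):
    return np.asarray(values, dtype=np.float32)

def to_mtz_int(values):
    return np.asarray(values, dtype=np.int32)

eps = to_mtz_int([1, 2, 4])        # e.g. epsilon factors
d = to_mtz_real([2.0, 1.5, 1.2])   # e.g. resolutions in angstroms
```

Centralizing the cast means a future change of policy touches one place rather than every function.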
The current implementation of infer_dtypes
makes strong assumptions about the type of index. If infer_dtypes
is called on a DataSet
with a RangeIndex
, it will throw a KeyError
:
[ins] In [35]: ds.reset_index().infer_mtz_dtypes()
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/opt/anaconda/envs/rs/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: None
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
~/opt/restoration-software/examples/precognition.py in <module>
----> 1 ds.reset_index().infer_mtz_dtypes()
~/opt/anaconda/envs/rs/lib/python3.7/site-packages/reciprocalspaceship-0.8.2-py3.7.egg/reciprocalspaceship/dataset.py in infer_mtz_dtypes(self, inplace)
281 if c is not None:
282 dataset[c] = dataset[c].infer_mtz_dtype()
--> 283 dataset.set_index(index_keys, inplace=True)
284 return dataset
285
~/opt/anaconda/envs/rs/lib/python3.7/site-packages/reciprocalspaceship-0.8.2-py3.7.egg/reciprocalspaceship/dataset.py in set_index(self, keys, **kwargs)
92 # Copy dtypes of keys to cache
93 for key in keys:
---> 94 self._cache_index_dtypes[key] = self[key].dtype.name
95
96 return super().set_index(keys, **kwargs)
~/opt/anaconda/envs/rs/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
2798 if self.columns.nlevels > 1:
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
2802 indexer = [indexer]
~/opt/anaconda/envs/rs/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2646 return self._engine.get_loc(key)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: None
This happens because RangeIndex.names
is a FrozenList
with contents None
:
[ins] In [8]: ds.index.names
Out[8]: FrozenList([None])
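A possible guard, illustrated here with a plain list rather than the real cache machinery: skip unnamed index levels, such as the single None produced by a RangeIndex, before caching or re-setting index dtypes.

```python
# What RangeIndex.names looks like when treated as a list
index_names = [None]

# Only named levels are meaningful keys for the dtype cache
index_keys = [name for name in index_names if name is not None]
# With no named levels there is nothing to cache or re-set
```

With this filter in place, infer_mtz_dtypes would fall through cleanly on a RangeIndex instead of raising KeyError: None.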
Adding result["redundancy"] = g["wI"].count() would add an output column with the redundancy per observation.
Right now it is sort of frustrating to map reflections to the reciprocal asu while preserving their sign.
The cleanest solution I've come up with so far is this:
ds = rs.read_mtz(inFN).hkl_to_asu()
fplus = ds['M/ISYM']%2 == 1 #Identify friedel plus reflections
ds = ds[~fplus].apply_symop('-x,-y,-z').append(ds[fplus])
The problem with just letting this be the supported approach is that it requires the user to have to understand the M/ISYM
column. Given that M/ISYM
is really sort of an odd historical artifact more than anything, I think we should provide an intuitive method to solve this. I propose we embed a solution like this within DataSet.hkl_to_asu
in the form of a new named parameter anomalous
which defaults to False.
There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.
Error type: Cannot find preset's package (github>whitesource/merge-confidence:beta)
I was curious whether it's possible to output MTZ files that support assigning 'crystals' and/or 'projects' to columns (in the sense of https://www.ccp4.ac.uk/html/mtzformat.html). I ran into this when trying to run a data set through SCALEIT, which objected that the data sets being scaled belonged to the same crystal. I circumvented the issue by outputting separate mtz files and merging them in CAD. My understanding is that GEMMI can handle project names and crystal names (https://gemmi.readthedocs.io/en/latest/hkl.html#mtz-format).
Right now epsilon factors do not account for space group centering. We should change rs.utils.compute_structurefactor_multiplicity
to account for the space group centering operations. This is easy enough to implement, but right now our test data for epsilon factors are from sgtbx. It is easily verified that these don't take centering into account.
>>> df = pd.read_csv("tests/data/sgtbx/sgtbx.csv.bz2")
>>> df.groupby('xhm').min()['epsilon'].max()
1
I propose we modify tests/data/gen_sgtbx_reference_data.sh
to use both gemmi and sgtbx for epsilons. When we test against sgtbx, we just have to remember to divide by len(spacegroup.operations().cen_ops)
and/or epsilons.min()
.
pd.DataFrame
has a method select_dtypes
which returns columns matching a particular numpy
dtype
. In the context of rs
it'd be natural for this to support differentiating custom MTZDtype
's. However, this is not the case right now.
Given an example mtz
file,
[ins] In [1]: mtz.head()
Out[1]:
F(+) SigF(+) F(-) SigF(-) N(+) N(-) high(+) loc(+) low(+) scale(+) high(-) loc(-) low(-) scale(-)
H K L
0 0 4 0.94140863 0.0060185874 0.94140863 0.0060185874 8.0 8.0 10000000000.0 0.94140863 1e-32 0.0060185874 10000000000.0 0.94140863 1e-32 0.0060185874
8 1.8974894 0.01334675 1.8974894 0.01334675 8.0 8.0 10000000000.0 1.8974894 1e-32 0.01334675 10000000000.0 1.8974894 1e-32 0.01334675
12 2.1121132 0.02015744 2.1121132 0.02015744 8.0 8.0 10000000000.0 2.1121132 1e-32 0.02015744 10000000000.0 2.1121132 1e-32 0.02015744
16 5.133872 0.033373583 5.133872 0.033373583 4.0 4.0 10000000000.0 5.133872 1e-32 0.033373583 10000000000.0 5.133872 1e-32 0.033373583
20 0.19568625 0.12823802 0.19568625 0.12823802 1.0 1.0 10000000000.0 0.12831146 1e-32 0.17213167 10000000000.0 0.12831146 1e-32 0.17213167
with dtypes
[ins] In [2]: mtz.dtypes
Out[2]:
F(+) FriedelSFAmplitude
SigF(+) StddevFriedelSF
F(-) FriedelSFAmplitude
SigF(-) StddevFriedelSF
N(+) MTZReal
N(-) MTZReal
high(+) MTZReal
loc(+) MTZReal
low(+) MTZReal
scale(+) MTZReal
high(-) MTZReal
loc(-) MTZReal
low(-) MTZReal
scale(-) MTZReal
dtype: object
rs.DataSet.select_dtypes
appears to fallback to the numpy
dtype
. For instance, when I call, mtz.select_dtypes("G")
I expect rs
to return a DataSet
or view
containing only "F(+)"
and "F(-)"
columns. Instead, I get all the columns backed by np.float32
[nav] In [3]: mtz.select_dtypes("G")
Out[5]:
F(+) SigF(+) F(-) SigF(-) N(+) N(-) high(+) loc(+) low(+) scale(+) high(-) loc(-) low(-) scale(-)
H K L
0 0 4 0.94140863 0.0060185874 0.94140863 0.0060185874 8.0 8.0 10000000000.0 0.94140863 1e-32 0.0060185874 10000000000.0 0.94140863 1e-32 0.0060185874
8 1.8974894 0.01334675 1.8974894 0.01334675 8.0 8.0 10000000000.0 1.8974894 1e-32 0.01334675 10000000000.0 1.8974894 1e-32 0.01334675
12 2.1121132 0.02015744 2.1121132 0.02015744 8.0 8.0 10000000000.0 2.1121132 1e-32 0.02015744 10000000000.0 2.1121132 1e-32 0.02015744
16 5.133872 0.033373583 5.133872 0.033373583 4.0 4.0 10000000000.0 5.133872 1e-32 0.033373583 10000000000.0 5.133872 1e-32 0.033373583
20 0.19568625 0.12823802 0.19568625 0.12823802 1.0 1.0 10000000000.0 0.12831146 1e-32 0.17213167 10000000000.0 0.12831146 1e-32 0.17213167
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
14 13 19 NaN NaN 0.55378014 0.08148462 NaN 2.0 NaN NaN NaN NaN 10000000000.0 0.55378014 0.0 0.08148462
11 20 NaN NaN 0.6732702 0.09068045 NaN 2.0 NaN NaN NaN NaN 10000000000.0 0.6732702 0.0 0.09068045
10 20 NaN NaN 0.8092094 0.08233523 NaN 2.0 NaN NaN NaN NaN 10000000000.0 0.8092094 0.0 0.08233523
9 20 NaN NaN 1.2847979 0.06926164 NaN 2.0 NaN NaN NaN NaN 10000000000.0 1.2847979 0.0 0.06926164
8 20 NaN NaN 1.344098 0.06747224 NaN 2.0 NaN NaN NaN NaN 10000000000.0 1.344098 0.0 0.06747224
which is all columns in this case.
Making this behave as expected either requires a change to the underlying pandas
method or overloading the method in rs
. From this perspective, it might be better to raise this issue with the pandas
devs. Not sure.
Calling DataSet.unstack_anomalous
followed by DataSet.stack_anomalous
does not always work in rs
version 0.9.15
. The following code verifies that this fails sometimes and succeeds others.
import gemmi
import reciprocalspaceship as rs
import numpy as np
cell = gemmi.UnitCell(10., 20., 30., 90., 90., 90.)
sg = gemmi.SpaceGroup(19)
dmin = 2.
h,k,l = rs.utils.generate_reciprocal_asu(cell, sg, dmin, anomalous=True).T
n = len(h)
ds = rs.DataSet({
'H' : h,
'K' : k,
'L' : l,
'F' : np.random.random(n),
'loc' : np.random.random(n),
'scale' : np.random.random(n),
},
spacegroup=sg,
cell=cell,
merged=True,
).infer_mtz_dtypes().set_index(['H', 'K', 'L'])
assert all(ds.keys() == ['F', 'loc', 'scale'])
unstacked = ds.unstack_anomalous()
print(unstacked.keys())
unstacked.stack_anomalous()
I find that the order of columns in unstacked
is not always the same, despite consistent column ordering in ds
. When the column order is Index(['F(+)', 'loc(+)', 'scale(+)', 'F(-)', 'loc(-)', 'scale(-)'], dtype='object')
, the script succeeds. When the column order is Index(['F(+)', 'loc(+)', 'scale(+)', 'scale(-)', 'loc(-)', 'F(-)'], dtype='object')
, it fails with the following traceback:
Traceback (most recent call last):
File "stack_bug.py", line 31, in <module>
unstacked.stack_anomalous()
File ".../anaconda/envs/careless/lib/python3.8/site-packages/reciprocalspaceship/dataset.py", line 911, in stack_anomalous
raise ValueError(
ValueError: Corresponding labels in ['F(+)', 'loc(+)', 'scale(+)'] and ['scale(-)', 'loc(-)', 'F(-)'] are not the same dtype: FriedelSFAmplitude and MTZReal
. I don't know where this stochasticity is coming from, but it is probably somewhere in DataSet.unstack_anomalous
, since the scrambled column order already appears in unstacked. My guess would be it has something to do with the (non?)determinism of pd.DataFrame.merge. I don't see any obvious place where the column order could be getting scrambled.
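One plausible source, given that the traceback points at code doing set(columns): iterating over a Python set of strings yields an order that can change between interpreter runs because of hash randomization, which would explain run-to-run scrambling. This is an assumption about the cause, not a confirmed diagnosis, but sorting would make the order deterministic either way:

```python
# Set iteration order for strings varies across interpreter runs
# (hash randomization); sorting pins it down
columns = {"F", "loc", "scale"}
deterministic = sorted(columns)
```

If this is the culprit, replacing bare set iteration with sorted(...) (or preserving the input column order explicitly) should make unstack_anomalous reproducible.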
I have to fight really hard to prevent my brain from parsing _cache_index_dtypes
as a method name. That is I interpret cache
as a verb. This is low priority, but could we switch this attribute name to _index_dtypes_cache
?
Hey all, when working through the second example I'm getting an error with rs.algorithms.merge()
. A code chunk:
import reciprocalspaceship as rs
hewl = rs.read_mtz("data/HEWL_unmerged.mtz")
result3 = rs.algorithms.merge(hewl)
Which raises:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-14-901e5cd689bc> in <module>
----> 1 result3 = rs.algorithms.merge(hewl)
~/miniconda3/envs/rs/lib/python3.8/site-packages/reciprocalspaceship-0.9.2-py3.8.egg/reciprocalspaceship/algorithms/merge.py in merge(dataset, intensity_key, sigma_key, sort)
38
39 # Reshape anomalous data and use to compute IMEAN / SIGIMEAN
---> 40 result = result.unstack_anomalous()
41 result.loc[:, ["N(+)", "N(-)"]] = result[["N(+)", "N(-)"]].fillna(0).astype("I")
42 result["IMEAN"] = result[["wI(+)", "wI(-)"]].sum(axis=1) / result[["w(+)", "w(-)"]].sum(axis=1).astype("Intensity")
~/miniconda3/envs/rs/lib/python3.8/site-packages/reciprocalspaceship-0.9.2-py3.8.egg/reciprocalspaceship/dataset.py in wrapped(ds, *args, **kwargs)
56 names = ds.index.names
57 ds = ds._index_from_names([None], inplace=True)
---> 58 result = f(ds, *args, **kwargs)
59 result = result._index_from_names(names, inplace=True)
60 ds = ds._index_from_names(names, inplace=True)
~/miniconda3/envs/rs/lib/python3.8/site-packages/reciprocalspaceship-0.9.2-py3.8.egg/reciprocalspaceship/dataset.py in unstack_anomalous(self, columns, suffixes)
876 # Separate DataSet into Friedel(+) and Friedel(-)
877 columns = set(columns).union(set(["H", "K", "L"]))
--> 878 dataset = self.hkl_to_asu()
879 if "PARTIAL" in columns: columns.remove("PARTIAL")
880 for column in columns:
~/miniconda3/envs/rs/lib/python3.8/site-packages/reciprocalspaceship-0.9.2-py3.8.egg/reciprocalspaceship/dataset.py in wrapped(ds, *args, **kwargs)
37 return f(ds, *args, **kwargs)
38 else:
---> 39 return f(ds.copy(), *args, **kwargs)
40 else:
41 raise KeyError(f'"inplace" not found in local variables of @inplacemethod decorated function {f} '
~/miniconda3/envs/rs/lib/python3.8/site-packages/reciprocalspaceship-0.9.2-py3.8.egg/reciprocalspaceship/dataset.py in wrapped(ds, *args, **kwargs)
56 names = ds.index.names
57 ds = ds._index_from_names([None], inplace=True)
---> 58 result = f(ds, *args, **kwargs)
59 result = result._index_from_names(names, inplace=True)
60 ds = ds._index_from_names(names, inplace=True)
~/miniconda3/envs/rs/lib/python3.8/site-packages/reciprocalspaceship-0.9.2-py3.8.egg/reciprocalspaceship/dataset.py in hkl_to_asu(self, inplace, anomalous)
975 hkls = dataset.get_hkls()
976 compressed_hkls, inverse = np.unique(hkls, axis=0, return_inverse=True)
--> 977 asu_hkls, isym, phi_coeff, phi_shift = hkl_to_asu(
978 compressed_hkls,
979 dataset.spacegroup,
~/miniconda3/envs/rs/lib/python3.8/site-packages/reciprocalspaceship-0.9.2-py3.8.egg/reciprocalspaceship/utils/asu.py in hkl_to_asu(H, spacegroup, return_phase_shifts)
82 an array length n containing phase shifts in degrees
83 """
---> 84 basis_op = spacegroup.basisop
85 group_ops = spacegroup.operations()
86 num_ops = len(group_ops)
AttributeError: 'NoneType' object has no attribute 'basisop'
This error occurs in my local copy of reciprocalspaceship/docs/examples/2_mergingstats.ipynb
, but not when I open the same notebook through binder, meaning the error must have to do with my local installation (I think?). I have the same versions of rs
(0.9.2
) and gemmi
(0.4.3
) as in the binder notebook; my copy has python 3.8.5
, whereas the binder notebook has 3.7.8
.
An aside: when I had reciprocalspaceship
installed via pip
, I was instead getting the error that "rs.algorithms
has no method merge
", but switching to the github version of rs
fixed that.
Of course, always possible I'm just doing something silly, but figured I'd pass this along.
Currently, methods that create or use M/ISYM
columns do not account for partiality flags. This occurs in DataSet.hkl_to_observed()
, where the ISYM value is used to map Miller indices, but the M/ISYM
column is then left intact (though it no longer applies to the mapped indices). Here's a short example using HEWL_unmerged.mtz
in tests/data/algorithms
:
import reciprocalspaceship as rs
mtz = rs.read_mtz("HEWL_unmerged.mtz")
print(mtz["M/ISYM"])
outputs:
H K L
-22 -9 4 5
-7 18 -9 8
19 -26 15 7
-15 19 4 3
15 -2 9 14
..
-3 -17 6 16
-19 -24 1 16
-33 5 6 10
-5 -22 4 16
29 10 14 1
Name: M/ISYM, Length: 20597, dtype: M/ISYM
Instead, this function should map the Miller indices (which it does currently), and extract the partiality flag into a new column.
Similarly, DataSet.hkl_to_asu()
should take such a partiality flag into account, if it exists, in order to write a correct M/ISYM
column. I think it also makes sense to ensure that any IO methods can handle such a partiality column, so that one can read/write unmerged reflection data without loss of information.
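Splitting the column is straightforward if I recall the CCP4 packing convention correctly: M/ISYM = 256*M + ISYM, with M = 1 for partial observations and 0 for fulls. This should be double-checked against the MTZ format documentation, but the extraction would look like:

```python
import numpy as np

# Hypothetical packed M/ISYM values: two fulls, then the same two
# observations flagged as partial (offset by 256)
m_isym = np.array([1, 14, 257, 270])

isym = m_isym % 256      # symmetry-operation index used to map hkl
partial = m_isym // 256  # partiality flag, suitable for a PARTIAL column
```

hkl_to_observed could then consume isym for the mapping and emit partial as a separate boolean column.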
I was just trying to load a precognition strong spot file which is just a whitespace delimited text file
$ head precognition_integration/e080_001.mccd.re.spt
0 0 0 2102.28 2175.12 43575.2 290.0
0 0 0 1429.60 1791.03 19469.1 197.7
0 0 0 2365.19 1508.56 13289.9 161.6
0 0 0 2677.00 2137.74 13169.7 161.6
0 0 0 1572.55 2012.03 9752.7 141.7
0 0 0 2319.03 2529.97 7220.5 120.4
0 0 0 1562.52 1850.75 7231.6 123.0
0 0 0 1818.91 2197.97 6648.6 117.0
0 0 0 1863.79 1514.90 6607.9 118.2
0 0 0 1514.11 2559.16 6274.7 112.7
rs.read_precognition
fails for this sort of file which is arguably a bug or a design choice. I'm personally not very concerned about that distinction. What concerns me is that this is a very reasonable file format for reflections and I have no good way to get it into a DataSet
. Best I can figure is to pass it through pandas as follows
import pandas as pd
import reciprocalspaceship as rs
inFN = "precognition_integration/e080_001.mccd.re.spt"
df = pd.read_csv(inFN, delim_whitespace=True, names=["H", "K", "L", "X", "Y", "I", "SIGI"])
ds = rs.DataSet(df).infer_mtz_dtypes()
Now, this is not super onerous or anything, but I usually don't keep pandas in my imports when working with rs
. So it is two extra steps that would just go away if we could do:
ds = rs.read_csv(
inFN,
delim_whitespace=True,
names=["H", "K", "L", "X", "Y", "I", "SIGI"],
infer_dtypes=True
)
Am I just being cranky or is this a good addition? Does this break any of our API decisions?
As we discussed extensively on the DIALS Slack channel, it is now relatively easy to parse DIALS .refl
files without cctbx/DIALS
. Newer versions of DIALS encode reflection tables using msgpack
which seems a relatively innocuous dependency to add.
To this end @ndevenish has built a parser that decodes refl tables using numpy
. It's nearly complete but may be missing column types. We can find a full list of types in this block. It should be easy to build this into the rs.io
submodule as I've done here for example.
There remains the issue of DIALS reflection tables potentially containing some fairly exotic objects (shoeboxes, vectors, matrices). The safest (sadly slowest) thing to do for a first pass is to just default them to objects. We can think about clever solutions later.
Parsing legacy pickle
based reflection tables is an open question. For the time being, I think we just can't support them. @ndevenish suggests looking here for clues though.
@JBGreisman, let's chat about this early next week and get it up and running. I think this is already mostly there!
When DataSet.hkl_to_asu()
is called with the anomalous=True
flag, reflections are mapped to the Friedel +/- ASU. This makes it useful to construct calls using DataSet.groupby(["H", "K", "L"])
that handle Friedel pairs separately. However, all reflections are only defined as Friedel +/- based on the M/ISYM flag (odd are Friedel+, even are Friedel-), even if they are centric.
Example:
import reciprocalspaceship as rs
unmerged = rs.read_mtz("tests/data/algorithms/HEWL_unmerged.mtz")
unmerged.label_centrics(inplace=True)
example = unmerged.loc[[(11, 11, 8), (11, -11, -8), (-11, -11, -8)], ["BATCH", "CENTRIC"]]
print("Observations:")
print(example)
not_anom = example.hkl_to_asu(anomalous=False)
print("Friedel + ASU:")
print(not_anom)
anom = example.hkl_to_asu(anomalous=True)
print("Friedel +/- ASU:")
print(anom)
Outputs:
Observations:
BATCH CENTRIC
H K L
11 11 8 454 True
-11 -8 909 True
-8 474 True
-11 -11 -8 203 True
-8 814 True
-8 627 True
Friedel + ASU:
BATCH CENTRIC M/ISYM
H K L
11 11 8 454 True 1
8 909 True 4
8 474 True 4
8 203 True 2
8 814 True 2
8 627 True 2
Friedel +/- ASU:
BATCH CENTRIC M/ISYM
H K L
11 11 8 454 True 1
-11 -11 -8 909 True 4
-8 474 True 4
-8 203 True 2
-8 814 True 2
-8 627 True 2
This behavior should be modified to only be used for acentric reflections -- centric reflections should not be considered "Friedel", and should only be mapped to the Friedel-plus ASU. The above example should give identical results for hkl_to_asu()
with anomalous=True
and anomalous=False
.
This is similar to #25. Currently DataSet.stack_anomalous()
returns a new DataSet with twice as many rows as the input object. This is because every row is split into two, and mapped to the +/- reciprocal space ASU. However centric reflections should not be considered "Friedel" and should always remain in the +ASU.
As such, the returned dataset should end up having 2*n_acentric + n_centric
rows.
I am not sure why yet, but the rs build fails with gemmi 0.4.0 which is the latest in pypi. For the time being, I have just made gemmi 0.3.8 the required version in setup.py.
For EF-X experiments, in which perturbed (ON) data are merged in a lower-symmetry spacegroup and unperturbed (OFF) data in a higher-symmetry one, it might be helpful to retain the symops that map the lower-symmetry dataset's ASU to the parent ASU, or to otherwise facilitate this mapping by comparing the two settings.
We have made it easy to convert intensities to structure factors using our French-Wilson implementation. However, sometimes I find myself wanting to go the other direction. This is especially useful when comparing careless output to other methods.
I propose we provide a method for this conversion that takes uncertainties into account. By this I mean:
I = SigF*SigF + F*F
SigI = abs(2*F*SigF)
Note that the factor of 2 is only approximate; we could do better if we knew the distribution of F. For starters, we could just add this as a function in the algorithms submodule.
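A sketch of such a function, implementing exactly the two formulas above (the function name is hypothetical, and the error propagation is first-order as noted):

```python
import numpy as np

def intensities_from_structurefactors(F, SigF):
    """Sketch: convert structure factor amplitudes and uncertainties to
    intensities, using I = SigF^2 + F^2 and SigI = |2*F*SigF|."""
    F = np.asarray(F, dtype=np.float64)
    SigF = np.asarray(SigF, dtype=np.float64)
    I = SigF * SigF + F * F
    SigI = np.abs(2.0 * F * SigF)
    return I, SigI
```

For F=10, SigF=1 this gives I=101 and SigI=20.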
Sometimes when converting multiple columns to numpy, reciprocalspaceship converts the columns to object dtype.
A simple example to reproduce:
import reciprocalspaceship as rs
import numpy as np
ds = rs.DataSet({
"X" : np.random.random(100),
"Y" : np.random.random(100),
}).infer_mtz_dtypes()
print(f"ds.dtypes:\n{ds.dtypes}\n")
print(f"ds['X'].to_numpy().dtype:\n {ds['X'].to_numpy().dtype}\n")
print(f"ds['Y'].to_numpy().dtype:\n {ds['Y'].to_numpy().dtype}\n")
print(f"ds[['X', 'Y']].to_numpy().dtype:\n {ds[['X', 'Y']].to_numpy().dtype}")
outputs:
ds.dtypes:
X MTZReal
Y MTZReal
dtype: object
ds['X'].to_numpy().dtype:
float32
ds['Y'].to_numpy().dtype:
float32
ds[['X', 'Y']].to_numpy().dtype:
object
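Until the dtype inference is fixed, one workaround is to request a concrete dtype explicitly. This is illustrated below with pandas nullable extension dtypes standing in for the MTZ dtypes, since both can fall back to object dtype when multiple extension-typed columns are combined:

```python
import numpy as np
import pandas as pd

# Two columns with extension dtypes (standing in for MTZReal).
df = pd.DataFrame({
    "X": pd.array([0.1, 0.2], dtype="Float32"),
    "Y": pd.array([0.3, 0.4], dtype="Float32"),
})

# Passing dtype= bypasses the common-dtype inference that yields object.
arr = df[["X", "Y"]].to_numpy(dtype=np.float32)
```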
DataSet.reset_index() raises a KeyError when the level argument is used to specify only some labels of a MultiIndex. This occurs because reset_index() assumes that all labels are being removed from the index when trying to reassign cached MTZ dtypes:
dataset = rs.read_mtz("tests/data/algorithms/HEWL_unmerged.mtz")
print(dataset.index.names) # prints ['H', 'K', 'L']
dataset.reset_index(level=['H', 'K'])
Outputs:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/rs/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2888 try:
-> 2889 return self._engine.get_loc(casted_key)
2890 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'L'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-12-93c78908ac77> in <module>
----> 1 dataset.reset_index(level=['H', 'K'])
~/reciprocalspaceship/reciprocalspaceship/dataset.py in reset_index(self, **kwargs)
135 for key in newdf._cache_index_dtypes.keys():
136 dtype = newdf._cache_index_dtypes[key]
--> 137 newdf[key] = newdf[key].astype(dtype)
138 newdf._cache_index_dtypes = {}
139 return newdf
~/rs/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
2897 if self.columns.nlevels > 1:
2898 return self._getitem_multilevel(key)
-> 2899 indexer = self.columns.get_loc(key)
2900 if is_integer(indexer):
2901 indexer = [indexer]
~/rs/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2889 return self._engine.get_loc(casted_key)
2890 except KeyError as err:
-> 2891 raise KeyError(key) from err
2892
2893 if tolerance is not None:
KeyError: 'L'
Since pandas supports a level= argument to reset_index(), the overloaded method should be modified to only try to change the dtypes of columns that are actually removed from the index.
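The fix can be sketched as a small helper (names here are illustrative; the real cache is the _cache_index_dtypes dict seen in the traceback):

```python
def dtypes_to_restore(cached_dtypes, reset_levels):
    """Sketch: select only the cached MTZ dtypes for index levels that
    are actually being reset, leaving the rest cached for later."""
    return {k: v for k, v in cached_dtypes.items() if k in reset_levels}
```

With the HEWL example above, `dtypes_to_restore({"H": "HKL", "K": "HKL", "L": "HKL"}, ["H", "K"])` would leave the "L" entry untouched instead of raising KeyError.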
Currently, the DataSet.cell and DataSet.spacegroup attributes do not do any type checking. They are intended to be set to gemmi.UnitCell and gemmi.SpaceGroup objects, but they can currently be set to anything:
mtz.spacegroup = [1, 2, 3]
print(mtz.spacegroup) # prints [1, 2, 3]
It would make sense to add some type checking in their respective setter methods. This would also be an opportunity to broaden the API: if spacegroup is set to a string or int, the gemmi.SpaceGroup constructor could be called, and likewise for gemmi.UnitCell if a list/tuple of 6 values is passed.
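A minimal sketch of such a coercing setter, using a stand-in class so the example is self-contained (gemmi.SpaceGroup itself accepts a name string or number in its constructor):

```python
class SpaceGroup:
    """Stand-in for gemmi.SpaceGroup, whose constructor accepts a
    Hermann-Mauguin name (str) or spacegroup number (int)."""
    def __init__(self, value):
        self.value = value

class DataSet:
    @property
    def spacegroup(self):
        return self._spacegroup

    @spacegroup.setter
    def spacegroup(self, sg):
        if sg is None or isinstance(sg, SpaceGroup):
            self._spacegroup = sg
        elif isinstance(sg, (str, int)):
            # Broadened API: coerce via the constructor
            self._spacegroup = SpaceGroup(sg)
        else:
            raise ValueError(f"Cannot interpret {sg!r} as a spacegroup")
```

With this in place, `mtz.spacegroup = [1, 2, 3]` raises ValueError instead of silently storing the list.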
I often find myself using external tools to compute completeness, but this is something we could easily implement within rs. I imagine there are other stats we might want to have baked in as well. I propose we add a compute_completeness function with an optional bins argument. I would be in favor of adding an rs.stats namespace for it to live in.
How does this plan sit with you, @JBGreisman?
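The core of the proposal could be as simple as the following sketch, which assumes the per-bin counts of observed and theoretically possible reflections have already been tabulated (the signature is hypothetical):

```python
import numpy as np

def compute_completeness(n_observed, n_possible):
    """Sketch: completeness per resolution bin as the ratio of observed
    to theoretically possible reflection counts."""
    n_observed = np.asarray(n_observed, dtype=float)
    n_possible = np.asarray(n_possible, dtype=float)
    return n_observed / n_possible
```

The harder part, enumerating the possible reflections to a resolution limit for a given cell and spacegroup, is where gemmi would come in.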
The following minimal snippet illustrates how to reproduce the issue. If a DataSet is constructed without an explicit index object, the constructor won't populate self._cache_index_dtypes. This has a lot of nasty side effects and exposes users to some pretty cryptic error messages.
import reciprocalspaceship as rs
import numpy as np
inFN = 'reciprocalspaceship/tests/data/algorithms/HEWL_SSAD_24IDC.mtz'
mtz = rs.read_mtz(inFN)
mtz = rs.DataSet({
'I' : mtz['IMEAN'],
'SigI' : mtz['SIGIMEAN'],
},
cell = mtz.cell,
spacegroup=mtz.spacegroup
)
The issue that tipped me off to this pathology was when I tried to call DataSet.infer_mtz_dtypes():
Traceback (most recent call last):
File "dtypes_bug.py", line 16, in <module>
mtz.infer_mtz_dtypes()
File "/home/kmdalton/opt/reciprocalspaceship/reciprocalspaceship/dataset.py", line 38, in wrapped
return f(ds.copy(), *args, **kwargs)
File "/home/kmdalton/opt/reciprocalspaceship/reciprocalspaceship/dataset.py", line 658, in infer_mtz_dtypes
self.reset_index(inplace=True, level=index_keys)
File "/home/kmdalton/opt/reciprocalspaceship/reciprocalspaceship/dataset.py", line 271, in reset_index
_handle_cached_dtypes(self, columns, drop)
File "/home/kmdalton/opt/reciprocalspaceship/reciprocalspaceship/dataset.py", line 265, in _handle_cached_dtypes
dtype = dataset._cache_index_dtypes.pop(key)
KeyError: 'H'
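The fix itself is small: the cache should be initialized unconditionally in the constructor, as in this minimal illustration (the attribute name is taken from the traceback above; the rest of the real constructor is elided):

```python
class DataSet:
    """Minimal sketch: the dtype cache exists whether or not an explicit
    index was passed, so later pops (e.g. in infer_mtz_dtypes) are safe."""
    def __init__(self, data=None, index=None):
        self._cache_index_dtypes = {}
        if index is not None:
            # ...populate the cache from the index's MTZ dtypes...
            pass
```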
Input:
import reciprocalspaceship as rs
dataset = rs.read_crystfel("example.stream")
Output:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-39-1212d53388f4> in <module>
----> 1 dataset = rs.read_crystfel("example.stream")
AttributeError: module 'reciprocalspaceship' has no attribute 'read_crystfel'
It can be useful to have normalized structure factors for certain applications, and we have all the ingredients needed to make this happen in different places. I think it would make sense to add a function that computes normalized structure factors to rs.algorithms, or perhaps as a built-in method of rs.DataSet.
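A hedged sketch of the core computation, assuming resolution-bin assignments are already available (as assign_resolution_bins would provide) and omitting epsilon (multiplicity) factors and centric corrections for brevity:

```python
import numpy as np

def normalized_structure_factors(F, bins):
    """Sketch: E = F / sqrt(<F^2>) computed within resolution bins."""
    F = np.asarray(F, dtype=np.float64)
    bins = np.asarray(bins)
    E = np.empty_like(F)
    for b in np.unique(bins):
        mask = bins == b
        # Normalize by the root-mean-square amplitude of the bin
        E[mask] = F[mask] / np.sqrt(np.mean(F[mask] ** 2))
    return E
```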
DataFrame.append() and Series.append() were deprecated in pandas v1.4. We should remove the overloaded methods from reciprocalspaceship to avoid future compatibility issues. rs.concat() already has the required functionality to replace them.
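The migration is mechanical, shown here with pd.concat as a stand-in for rs.concat (which follows the same interface):

```python
import pandas as pd

df1 = pd.DataFrame({"I": [1.0, 2.0]})
df2 = pd.DataFrame({"I": [3.0]})

# Replaces the deprecated df1.append(df2)
combined = pd.concat([df1, df2])
```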