rs-station / reciprocalspaceship
Tools for exploring reciprocal space
Home Page: https://rs-station.github.io/reciprocalspaceship/
License: MIT License
If a DataSet object contains unmerged reflections, a call to DataSet.write_mtz()
currently has the side effect of calling DataSet.hkl_to_asu(inplace=True)
on the object:
In [1]: ds = rs.read_mtz("data_unmerged.mtz")
In [2]: ds["BATCH"].head()
Out[2]:
H K L
19 -11 2 1424
3 18 -1 328
-17 21 17 576
21 26 -7 70
-9 17 11 798
Name: BATCH, dtype: Batch
In [3]: ds.write_mtz("/dev/null")
In [4]: ds["BATCH"].head()
Out[4]:
H K L
19 11 2 1424
18 3 1 328
21 17 17 576
26 21 7 70
17 9 11 798
Name: BATCH, dtype: Batch
I consider this to be a bug because writing data to a file should not change the underlying object.
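The fix should be straightforward: map a copy instead of mutating the caller's object. Here is a minimal, self-contained sketch using a toy class standing in for rs.DataSet (all names below are illustrative, not the real implementation):

```python
import copy

class ToyDataSet:
    """Toy stand-in for rs.DataSet, just to illustrate the side-effect issue."""
    def __init__(self, hkl):
        self.hkl = hkl

    def hkl_to_asu(self, inplace=False):
        ds = self if inplace else copy.deepcopy(self)
        # crude stand-in for the real ASU mapping
        ds.hkl = [tuple(abs(i) for i in h) for h in ds.hkl]
        return ds

    def write_mtz(self, mtzfile):
        # proposed behavior: map a copy, leaving the caller's object untouched
        ds = self.hkl_to_asu(inplace=False)
        # ... write ds to mtzfile ...

ds = ToyDataSet([(19, -11, 2)])
ds.write_mtz("/dev/null")
print(ds.hkl)  # [(19, -11, 2)] -- original signs preserved
```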
Is there a way to create an empty DataFrame with the index of another (as in https://stackoverflow.com/questions/18176933/create-an-empty-data-frame-with-index-from-another-data-frame) for rs DataFrames?
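For plain pandas (which rs.DataSet subclasses), the pattern from that question is just constructing from the other frame's index; presumably the same constructor call works for rs.DataSet, though that is untested here:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0]}, index=pd.Index([10, 20], name="i"))
empty = pd.DataFrame(index=df.index)  # same index, no columns

print(empty.shape)  # (2, 0)
```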
It is common to assign reflections to resolution bins for different types of crystallographic analyses. This functionality is used in rs.utils.add_rfree(), as well as in rs.algorithms.scale_merged_intensities() when average intensities are computed isotropically.
Since this is a common operation for computing crystallographic stats, I think this functionality should be refactored to be a built-in method for DataSet -- something like DataSet.assign_resolution_bins(nbins=20).
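A rough sketch of what the proposed method could do internally (equal-population binning on 1/d^2; the function name and signature are hypothetical):

```python
def assign_resolution_bins(d, nbins=20):
    """Assign each reflection to an equal-population resolution bin.

    d is a list of d-spacings (angstroms); returns a bin index per
    reflection. Binning on 1/d^2 gives evenly populated bins after
    ranking. Hypothetical sketch of DataSet.assign_resolution_bins().
    """
    s2 = [1.0 / x**2 for x in d]
    # rank reflections from low to high resolution (small to large 1/d^2)
    order = sorted(range(len(s2)), key=s2.__getitem__)
    bins = [0] * len(s2)
    for rank, idx in enumerate(order):
        bins[idx] = rank * nbins // len(s2)
    return bins
```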
Although there is a function for flagging systematic absences in rs.utils, there is not currently a user-facing method in rs.DataSet. Such a function should be written with the same call signature as rs.DataSet.label_centrics().
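As a sketch of what such a method would flag, here is the absence rule for one spacegroup written out by hand (in practice the implementation would presumably defer to the existing rs.utils function, which handles arbitrary spacegroups):

```python
def is_absent_p212121(h, k, l):
    """Systematic absence rule for P 2_1 2_1 2_1: the three screw axes
    extinguish axial reflections h00, 0k0, and 00l with an odd index.
    Hypothetical helper illustrating what DataSet.label_absences()
    would compute per reflection."""
    if k == 0 and l == 0:
        return h % 2 == 1
    if h == 0 and l == 0:
        return k % 2 == 1
    if h == 0 and k == 0:
        return l % 2 == 1
    return False
```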
DataSet.to_reciprocalgrid() does not currently allow data to differ between Friedel halves of reciprocal space. This functionality would be useful for things like generating maps from underlying anomalous data. As far as the API goes, this could be implemented with an anomalous=True|False argument, though it may involve a decision about how to specify the anomalous column labels.
Internally, if such a function were called with a two-column anomalous DataSet, it would be possible to implement this method using a call to stack_anomalous() in place of expand_anomalous(). Implicit in this is that the (+)/(-)-suffixed columns would be renamed to drop the Friedel designations.
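A sketch of the intended placement logic, with a dict standing in for the numpy grid (function and argument names are hypothetical):

```python
def to_reciprocal_grid_anomalous(hkl, f_plus, f_minus, shape):
    """Place F(+) at hkl and F(-) at -hkl so the two Friedel halves of
    the grid can differ. Grid indices wrap modulo the grid shape, as in
    the existing to_reciprocalgrid(). Returns a dict keyed by grid index
    purely for illustration."""
    grid = {}
    for (h, k, l), fp, fm in zip(hkl, f_plus, f_minus):
        grid[(h % shape[0], k % shape[1], l % shape[2])] = fp
        grid[(-h % shape[0], -k % shape[1], -l % shape[2])] = fm
    return grid
```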
f803979 added explicit support for version 1.3.0, but excluded patch releases such as 1.3.1. Was that on purpose, or should the expression read <1.4?
When converting from a centered spacegroup like C2 to a non-centered one like P1, it is natural to apply a symop like gemmi.Op("1/2*x-1/2*y,1/2*x+1/2*y,z"). If one is not careful with the input Miller indices, this can result in fractional Miller indices. It would be good to add a check that the new indices are integers.
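The check itself is cheap if the transform is done with exact rationals. A self-contained sketch (the matrix below corresponds to the op above; whether it is applied directly or via its inverse transpose depends on convention, so treat it as illustrative):

```python
from fractions import Fraction

def apply_to_hkl(rot, hkl):
    """Apply a 3x3 rational matrix to a Miller index and insist that the
    result is integral -- the proposed check for ops like
    gemmi.Op("1/2*x-1/2*y,1/2*x+1/2*y,z")."""
    new = [sum(Fraction(r) * i for r, i in zip(row, hkl)) for row in rot]
    if any(x.denominator != 1 for x in new):
        raise ValueError(f"non-integral Miller index: {new}")
    return [int(x) for x in new]

HALF = Fraction(1, 2)
ROT = [[HALF, -HALF, 0], [HALF, HALF, 0], [0, 0, 1]]
```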
It came up today that rs.read_precognition() reads .ii files. This isn't explicitly stated in the docstring; it should be made clear that the IO function supports both file types.
This issue provides visibility into Renovate updates and their statuses.
This repository currently has no open or pending branches.
Decorators that scan the arguments of a function and automatically convert cell and spacegroup arguments to their proper gemmi types would be useful in writing standalone functions. We have 3 decorators in mind:
1. A gemmify decorator: @gemmify scans for keyword arguments named sg, spacegroup, space_group, cell, unitcell, unit_cell, and converts them accordingly.
2. A spacegroupify decorator, which takes *args naming each argument to be converted:
@spacegroupify("parent_sg", "child_sg")
def a_crazy_efx_function(parent_sg, child_sg, data):
    ....
3. A cellify decorator (in the same vein as spacegroupify).

Minimal example:
import numpy as np
import reciprocalspaceship as rs

ds = rs.DataSet({"int_col": [0, 1, 2, 3]}, dtype="MTZInt")
ds.loc[0, "int_col"] = np.nan
print(ds["int_col"].to_numpy(float))
Traceback:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/Documents/Hekstra_Lab/github/reciprocalspaceship/reciprocalspaceship/commandline/mtzdump.py in <module>
----> 1 ds["int_col"].to_numpy(float)
~/miniconda3/envs/rs/lib/python3.8/site-packages/pandas/core/base.py in to_numpy(self, dtype, copy, na_value, **kwargs)
511 if is_extension_array_dtype(self.dtype):
512 # error: Too many arguments for "to_numpy" of "ExtensionArray"
--> 513 return self.array.to_numpy( # type: ignore[call-arg]
514 dtype, copy=copy, na_value=na_value, **kwargs
515 )
~/Documents/Hekstra_Lab/github/reciprocalspaceship/reciprocalspaceship/dtypes/base.py in to_numpy(self, dtype, copy, na_value)
122 if self._hasna:
123 data = self._data.astype(dtype, copy=copy)
--> 124 data[self._mask] = na_value
125 else:
126 data = self._data.astype(dtype, copy=copy)
TypeError: float() argument must be a string or a number, not 'NAType'
This error is related to the overloading of the pandas to_numpy() method, and can be fixed by changing the default na_value to np.nan. This is a safe assumption to be making here, because all MTZIntegerArray-backed datatypes have to be compatible with float32 dtypes by construction.
I have a local fix implemented, and will make a PR shortly -- just posting this issue to log the error.
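The shape of the fix, in a self-contained sketch (the real change lives in MTZIntegerArray.to_numpy() in dtypes/base.py; this just illustrates why a NaN default fill works where pd.NA fails under float()):

```python
import math

def masked_to_float(values, mask, na_value=float("nan")):
    """Convert a masked integer array to floats, filling masked (NA)
    entries with na_value. Defaulting to NaN avoids the TypeError raised
    when float() receives pandas' NAType."""
    return [na_value if m else float(v) for v, m in zip(values, mask)]
```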
Currently, DataSet.unstack_anomalous() can be used with unmerged data to assign reflections to Friedel+ and Friedel- columns, and DataSet.stack_anomalous() can be used to undo the action. In the two-column anomalous format, each reflection is kept on its own row, and NaNs are used to pad the unused columns.
We should revisit this design decision, because it seems to be an uncommon action for unmerged data. The new anomalous flag for hkl_to_asu seems preferable for assigning reflections to Friedel "zones" of the ASU, and if it's useful to explicitly assign observations to Friedel+ or Friedel-, I think that will be better handled by a DataSet.assign_friedel() helper function (this would also be significantly less memory-intensive).
My plan here is to make DataSet.unstack_anomalous() and DataSet.stack_anomalous() only applicable to DataSet objects with the merged=True attribute. A ValueError (or AttributeError?) would be raised if the functions are invoked on merged=False DataSet objects.
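The guard itself would be a one-liner shared by both methods; a sketch (the helper name is hypothetical):

```python
def require_merged(merged, func_name):
    """Refuse to run an anomalous stack/unstack on unmerged data.
    `merged` is the DataSet.merged attribute; raises ValueError per the
    plan above (an AttributeError is the other option discussed)."""
    if not merged:
        raise ValueError(
            f"{func_name} is only applicable to merged DataSets "
            "(DataSet.merged must be True)"
        )
```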
Any additional thoughts?
Oftentimes, I notice that SAD phasing solutions from isomorphous structures don't overlay in real space. This seems to have something to do with the choice of "phase origin". It'd be super useful if we had a function to make sure that two sets of structure factors have compatible origins. We should be able to implement something akin to @apeck12's method. After talking to @JBGreisman, we decided doing this in cases with good phases might boil down to solving a simple linear program. At any rate, this should absolutely be something we put in rs.algorithms.
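The basic building block any such routine would need is applying a trial origin shift to a set of phases; here all phases change linearly in the Miller index, which is also why a linear program is plausible. A sketch (sign convention varies between programs, so the minus sign is an assumption):

```python
def shift_phase_origin(hkl, phases_deg, t):
    """Shift the phase origin by the fractional vector t: each phase
    changes by -360 * (h . t) degrees. Sketch of a building block for an
    origin-alignment routine in rs.algorithms."""
    out = []
    for (h, k, l), phi in zip(hkl, phases_deg):
        dphi = -360.0 * (h * t[0] + k * t[1] + l * t[2])
        out.append((phi + dphi) % 360.0)
    return out
```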
This is way outside of my wheelhouse, but I clicked on our Binder link for the first time, and it takes a crazy amount of time (8-10 minutes!) to launch. On our end, I think all we're asking Binder to do is run pip install reciprocalspaceship[dev]. This command installs a lot of stuff. Notably, it installs PyTorch for the robust merging example. I suspect this is what takes so long, but it could certainly be other packages.
I would suggest it is worth some time to pare down the dependencies. In the case of PyTorch, we could defer the install to the first cell of the example notebook.
I'd love some help from @ianhi on this one. He certainly knows this stuff better than I do.
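For the PyTorch deferral, the first cell of the robust merging notebook could install on demand, along these lines (a sketch; whether Binder's kernel environment permits in-notebook pip installs would need checking):

```python
import importlib.util
import subprocess
import sys

def ensure_installed(package):
    """Install `package` with pip only if it is not already importable.
    Returns True if it was already present (no install performed)."""
    if importlib.util.find_spec(package) is not None:
        return True
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])
    return False

# First notebook cell would run: ensure_installed("torch")
```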
Here is a partial log from a binder launch which may be helpful:
Picked Git content provider.
Cloning into '/tmp/repo2docker4xoq9m89'...
HEAD is now at 7781360 Bump version to 0.9.18
Using PythonBuildPack builder
Building conda environment for python=3.7
Step 1/48 : FROM buildpack-deps:bionic
---> 72ccd8e28f8d
Step 2/48 : ENV DEBIAN_FRONTEND=noninteractive
---> Using cache
---> 915bfe954aed
Step 3/48 : RUN apt-get -qq update && apt-get -qq install --yes --no-install-recommends locales > /dev/null && apt-get -qq purge && apt-get -qq clean && rm -rf /var/lib/apt/lists/*
---> Using cache
---> 382a4bfa36dd
Step 4/48 : RUN echo "en_US.UTF-8 UTF-8" > /etc/locale.gen && locale-gen
---> Using cache
---> 23cc551494ea
Step 5/48 : ENV LC_ALL en_US.UTF-8
---> Using cache
---> ce16adc179fd
Step 6/48 : ENV LANG en_US.UTF-8
---> Using cache
---> 1443ecbb572c
Step 7/48 : ENV LANGUAGE en_US.UTF-8
---> Using cache
---> 67fd1928a7e1
Step 8/48 : ENV SHELL /bin/bash
---> Using cache
---> 0492f760ec33
Step 9/48 : ARG NB_USER
---> Using cache
---> 0fa8275551af
Step 10/48 : ARG NB_UID
---> Using cache
---> 9283ec23cac9
Step 11/48 : ENV USER ${NB_USER}
---> Using cache
---> e92aaa5352f0
Step 12/48 : ENV HOME /home/${NB_USER}
---> Using cache
---> 6f44454fabec
Step 13/48 : RUN groupadd --gid ${NB_UID} ${NB_USER} && useradd --comment "Default user" --create-home --gid ${NB_UID} --no-log-init --shell /bin/bash --uid ${NB_UID} ${NB_USER}
---> Using cache
---> 0ee4d890aa6c
Step 14/48 : RUN apt-get -qq update && apt-get -qq install --yes --no-install-recommends less unzip > /dev/null && apt-get -qq purge && apt-get -qq clean && rm -rf /var/lib/apt/lists/*
---> Using cache
---> bd296a1594c5
Step 15/48 : EXPOSE 8888
---> Using cache
---> 6d6578ba9b76
Step 16/48 : ENV APP_BASE /srv
---> Using cache
---> 92901276e26d
Step 17/48 : ENV CONDA_DIR ${APP_BASE}/conda
---> Using cache
---> c61852980ae7
Step 18/48 : ENV NB_PYTHON_PREFIX ${CONDA_DIR}/envs/notebook
---> Using cache
---> 4f44e70e3668
Step 19/48 : ENV NPM_DIR ${APP_BASE}/npm
---> Using cache
---> 9bae366c7f81
Step 20/48 : ENV NPM_CONFIG_GLOBALCONFIG ${NPM_DIR}/npmrc
---> Using cache
---> 8fe62e0295b7
Step 21/48 : ENV NB_ENVIRONMENT_FILE /tmp/env/environment.lock
---> Using cache
---> a7dee12cf999
Step 22/48 : ENV KERNEL_PYTHON_PREFIX ${NB_PYTHON_PREFIX}
---> Using cache
---> 8b2229129a3d
Step 23/48 : ENV PATH ${NB_PYTHON_PREFIX}/bin:${CONDA_DIR}/bin:${NPM_DIR}/bin:${PATH}
---> Using cache
---> 4c9bcaf823ba
Step 24/48 : COPY --chown=1000:1000 build_script_files/-2fusr-2flib-2fpython3-2e8-2fsite-2dpackages-2frepo2docker-2fbuildpacks-2fconda-2factivate-2dconda-2esh-391af5 /etc/profile.d/activate-conda.sh
---> Using cache
---> f8d2890e201b
Step 25/48 : COPY --chown=1000:1000 build_script_files/-2fusr-2flib-2fpython3-2e8-2fsite-2dpackages-2frepo2docker-2fbuildpacks-2fconda-2fenvironment-2epy-2d3-2e7-2elock-4f1154 /tmp/env/environment.lock
---> Using cache
---> c1410cdb5a70
Step 26/48 : COPY --chown=1000:1000 build_script_files/-2fusr-2flib-2fpython3-2e8-2fsite-2dpackages-2frepo2docker-2fbuildpacks-2fconda-2finstall-2dminiforge-2ebash-514214 /tmp/install-miniforge.bash
---> Using cache
---> 44936c7cb078
Step 27/48 : RUN TIMEFORMAT='time: %3R' bash -c 'time /tmp/install-miniforge.bash' && rm -rf /tmp/install-miniforge.bash /tmp/env
---> Using cache
---> 4ab657a80f7e
Step 28/48 : RUN mkdir -p ${NPM_DIR} && chown -R ${NB_USER}:${NB_USER} ${NPM_DIR}
---> Using cache
---> 01840c1deeaf
Step 29/48 : ARG REPO_DIR=${HOME}
---> Using cache
---> b440807ab159
Step 30/48 : ENV REPO_DIR ${REPO_DIR}
---> Using cache
---> f1d0395dc84d
Step 31/48 : WORKDIR ${REPO_DIR}
---> Using cache
---> 11e14be612ee
Step 32/48 : RUN chown ${NB_USER}:${NB_USER} ${REPO_DIR}
---> Using cache
---> 4baa295e9a1c
Step 33/48 : ENV PATH ${HOME}/.local/bin:${REPO_DIR}/.local/bin:${PATH}
---> Using cache
---> 8303e9277f1c
Step 34/48 : ENV CONDA_DEFAULT_ENV ${KERNEL_PYTHON_PREFIX}
---> Using cache
---> 4c3893acd0bd
Step 35/48 : COPY --chown=1000:1000 src/ ${REPO_DIR}
---> d5bd5f9ab584
Step 36/48 : USER ${NB_USER}
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in 54852568430a
Removing intermediate container 54852568430a
---> 1191068b9b5d
Step 37/48 : RUN ${KERNEL_PYTHON_PREFIX}/bin/pip install --no-cache-dir .
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in 05f10e1eaab0
Processing /home/jovyan
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting gemmi<=0.5.1,>=0.4.2
Downloading gemmi-0.5.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
Collecting pandas<=1.3.5,>=1.2.0
Downloading pandas-1.3.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.3 MB)
Collecting numpy
Downloading numpy-1.21.5-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
Collecting scipy
Downloading scipy-1.7.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (38.1 MB)
Requirement already satisfied: ipython in /srv/conda/envs/notebook/lib/python3.7/site-packages (from reciprocalspaceship==0.9.18) (7.30.1)
Requirement already satisfied: pytz>=2017.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas<=1.3.5,>=1.2.0->reciprocalspaceship==0.9.18) (2021.3)
Requirement already satisfied: python-dateutil>=2.7.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas<=1.3.5,>=1.2.0->reciprocalspaceship==0.9.18) (2.8.2)
Requirement already satisfied: setuptools>=18.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (60.0.4)
Requirement already satisfied: decorator in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (5.1.0)
Requirement already satisfied: traitlets>=4.2 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (5.1.1)
Requirement already satisfied: backcall in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (0.2.0)
Requirement already satisfied: pygments in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (2.10.0)
Requirement already satisfied: jedi>=0.16 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (0.18.1)
Requirement already satisfied: pexpect>4.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (4.8.0)
Requirement already satisfied: matplotlib-inline in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (0.1.3)
Requirement already satisfied: pickleshare in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (0.7.5)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship==0.9.18) (3.0.24)
Requirement already satisfied: parso<0.9.0,>=0.8.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from jedi>=0.16->ipython->reciprocalspaceship==0.9.18) (0.8.3)
Requirement already satisfied: ptyprocess>=0.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pexpect>4.3->ipython->reciprocalspaceship==0.9.18) (0.7.0)
Requirement already satisfied: wcwidth in /srv/conda/envs/notebook/lib/python3.7/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython->reciprocalspaceship==0.9.18) (0.2.5)
Requirement already satisfied: six>=1.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas<=1.3.5,>=1.2.0->reciprocalspaceship==0.9.18) (1.16.0)
Building wheels for collected packages: reciprocalspaceship
Building wheel for reciprocalspaceship (setup.py): started
Building wheel for reciprocalspaceship (setup.py): finished with status 'done'
Created wheel for reciprocalspaceship: filename=reciprocalspaceship-0.9.18-py3-none-any.whl size=68921 sha256=cefc203a0593825f967eeff3511e8c362c64247e6eaff0dfc0730ff01e1a47bd
Stored in directory: /tmp/pip-ephem-wheel-cache-1b6sliwd/wheels/24/67/61/461d47532c7e3b6048f03e8f0e3c2ddb1976b17163d48f9fe9
Successfully built reciprocalspaceship
Installing collected packages: numpy, scipy, pandas, gemmi, reciprocalspaceship
Successfully installed gemmi-0.5.1 numpy-1.21.5 pandas-1.3.5 reciprocalspaceship-0.9.18 scipy-1.7.3
Removing intermediate container 05f10e1eaab0
---> 9b8a5f6ddc67
Step 38/48 : LABEL repo2docker.ref="7781360585814dbbac9ba0628fb2fbc974a964fb"
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in 3a4d023a8819
Removing intermediate container 3a4d023a8819
---> 6e4e4f0325d0
Step 39/48 : LABEL repo2docker.repo="https://github.com/Hekstra-Lab/reciprocalspaceship"
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in 22531833e814
Removing intermediate container 22531833e814
---> 46ad036a4080
Step 40/48 : LABEL repo2docker.version="2021.08.0+78.g4352535"
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in fe8f9fe7819f
Removing intermediate container fe8f9fe7819f
---> e0e80f22843d
Step 41/48 : USER ${NB_USER}
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in e87f3615d067
Removing intermediate container e87f3615d067
---> 629e656d9a07
Step 42/48 : RUN chmod +x postBuild
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in 2e7bfb1cba7d
Removing intermediate container 2e7bfb1cba7d
---> 233f67c9a99e
Step 43/48 : RUN ./postBuild
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in a519284bfcd0
Requirement already satisfied: reciprocalspaceship[dev] in /srv/conda/envs/notebook/lib/python3.7/site-packages (0.9.18)
Requirement already satisfied: ipython in /srv/conda/envs/notebook/lib/python3.7/site-packages (from reciprocalspaceship[dev]) (7.30.1)
Requirement already satisfied: pandas<=1.3.5,>=1.2.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from reciprocalspaceship[dev]) (1.3.5)
Requirement already satisfied: gemmi<=0.5.1,>=0.4.2 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from reciprocalspaceship[dev]) (0.5.1)
Requirement already satisfied: scipy in /srv/conda/envs/notebook/lib/python3.7/site-packages (from reciprocalspaceship[dev]) (1.7.3)
Requirement already satisfied: numpy in /srv/conda/envs/notebook/lib/python3.7/site-packages (from reciprocalspaceship[dev]) (1.21.5)
Collecting jupyter
Downloading jupyter-1.0.0-py2.py3-none-any.whl (2.7 kB)
Collecting torch
Downloading torch-1.10.1-cp37-cp37m-manylinux1_x86_64.whl (881.9 MB)
Collecting sphinx-rtd-theme
Downloading sphinx_rtd_theme-1.0.0-py2.py3-none-any.whl (2.8 MB)
Collecting pytest
Downloading pytest-6.2.5-py3-none-any.whl (280 kB)
Collecting pytest-cov
Downloading pytest_cov-3.0.0-py3-none-any.whl (20 kB)
Collecting matplotlib
Downloading matplotlib-3.5.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.2 MB)
Collecting nbsphinx
Downloading nbsphinx-0.8.8-py3-none-any.whl (25 kB)
Collecting seaborn
Downloading seaborn-0.11.2-py3-none-any.whl (292 kB)
Collecting autodocsumm
Downloading autodocsumm-0.2.7.tar.gz (43 kB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Collecting pytest-xdist
Downloading pytest_xdist-2.5.0-py3-none-any.whl (41 kB)
Collecting sphinx
Downloading Sphinx-4.4.0-py3-none-any.whl (3.1 MB)
Collecting tqdm
Downloading tqdm-4.62.3-py2.py3-none-any.whl (76 kB)
Collecting scikit-image
Downloading scikit_image-0.19.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (13.3 MB)
Collecting sphinx-panels
Downloading sphinx_panels-0.6.0-py3-none-any.whl (87 kB)
Collecting sphinxcontrib-autoprogram
Downloading sphinxcontrib_autoprogram-0.1.7-py2.py3-none-any.whl (8.7 kB)
Collecting celluloid
Downloading celluloid-0.2.0-py3-none-any.whl (5.4 kB)
Requirement already satisfied: python-dateutil>=2.7.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas<=1.3.5,>=1.2.0->reciprocalspaceship[dev]) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pandas<=1.3.5,>=1.2.0->reciprocalspaceship[dev]) (2021.3)
Collecting sphinxcontrib-htmlhelp>=2.0.0
Downloading sphinxcontrib_htmlhelp-2.0.0-py2.py3-none-any.whl (100 kB)
Requirement already satisfied: Jinja2>=2.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from sphinx->reciprocalspaceship[dev]) (3.0.3)
Collecting sphinxcontrib-applehelp
Downloading sphinxcontrib_applehelp-1.0.2-py2.py3-none-any.whl (121 kB)
Collecting snowballstemmer>=1.1
Downloading snowballstemmer-2.2.0-py2.py3-none-any.whl (93 kB)
Collecting sphinxcontrib-qthelp
Downloading sphinxcontrib_qthelp-1.0.3-py2.py3-none-any.whl (90 kB)
Collecting docutils<0.18,>=0.14
Downloading docutils-0.17.1-py2.py3-none-any.whl (575 kB)
Requirement already satisfied: importlib-metadata>=4.4 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from sphinx->reciprocalspaceship[dev]) (4.10.0)
Requirement already satisfied: Pygments>=2.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from sphinx->reciprocalspaceship[dev]) (2.10.0)
Collecting sphinxcontrib-devhelp
Downloading sphinxcontrib_devhelp-1.0.2-py2.py3-none-any.whl (84 kB)
Requirement already satisfied: requests>=2.5.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from sphinx->reciprocalspaceship[dev]) (2.26.0)
Collecting sphinxcontrib-jsmath
Downloading sphinxcontrib_jsmath-1.0.1-py2.py3-none-any.whl (5.1 kB)
Requirement already satisfied: packaging in /srv/conda/envs/notebook/lib/python3.7/site-packages (from sphinx->reciprocalspaceship[dev]) (21.3)
Collecting alabaster<0.8,>=0.7
Downloading alabaster-0.7.12-py2.py3-none-any.whl (14 kB)
Requirement already satisfied: babel>=1.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from sphinx->reciprocalspaceship[dev]) (2.9.1)
Collecting imagesize
Downloading imagesize-1.3.0-py2.py3-none-any.whl (5.2 kB)
Collecting sphinxcontrib-serializinghtml>=1.1.5
Downloading sphinxcontrib_serializinghtml-1.1.5-py2.py3-none-any.whl (94 kB)
Requirement already satisfied: backcall in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship[dev]) (0.2.0)
Requirement already satisfied: decorator in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship[dev]) (5.1.0)
Requirement already satisfied: pickleshare in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship[dev]) (0.7.5)
Requirement already satisfied: jedi>=0.16 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship[dev]) (0.18.1)
Requirement already satisfied: matplotlib-inline in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship[dev]) (0.1.3)
Requirement already satisfied: setuptools>=18.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship[dev]) (60.0.4)
Requirement already satisfied: traitlets>=4.2 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship[dev]) (5.1.1)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship[dev]) (3.0.24)
Requirement already satisfied: pexpect>4.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipython->reciprocalspaceship[dev]) (4.8.0)
Requirement already satisfied: ipywidgets in /srv/conda/envs/notebook/lib/python3.7/site-packages (from jupyter->reciprocalspaceship[dev]) (7.6.3)
Requirement already satisfied: ipykernel in /srv/conda/envs/notebook/lib/python3.7/site-packages (from jupyter->reciprocalspaceship[dev]) (6.6.0)
Requirement already satisfied: notebook in /srv/conda/envs/notebook/lib/python3.7/site-packages (from jupyter->reciprocalspaceship[dev]) (6.3.0)
Requirement already satisfied: nbconvert in /srv/conda/envs/notebook/lib/python3.7/site-packages (from jupyter->reciprocalspaceship[dev]) (6.0.7)
Collecting jupyter-console
Downloading jupyter_console-6.4.0-py3-none-any.whl (22 kB)
Collecting qtconsole
Downloading qtconsole-5.2.2-py3-none-any.whl (120 kB)
Collecting fonttools>=4.22.0
Downloading fonttools-4.28.5-py3-none-any.whl (890 kB)
Requirement already satisfied: pyparsing>=2.2.1 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from matplotlib->reciprocalspaceship[dev]) (3.0.6)
Collecting pillow>=6.2.0
Downloading Pillow-9.0.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)
Collecting cycler>=0.10
Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting kiwisolver>=1.0.1
Downloading kiwisolver-1.3.2-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.1 MB)
Requirement already satisfied: nbformat in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbsphinx->reciprocalspaceship[dev]) (5.1.3)
Collecting py>=1.8.2
Downloading py-1.11.0-py2.py3-none-any.whl (98 kB)
Requirement already satisfied: attrs>=19.2.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pytest->reciprocalspaceship[dev]) (21.2.0)
Collecting iniconfig
Downloading iniconfig-1.1.1-py2.py3-none-any.whl (5.0 kB)
Collecting pluggy<2.0,>=0.12
Downloading pluggy-1.0.0-py2.py3-none-any.whl (13 kB)
Collecting toml
Downloading toml-0.10.2-py2.py3-none-any.whl (16 kB)
Collecting coverage[toml]>=5.2.1
Downloading coverage-6.2-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (213 kB)
Collecting pytest-forked
Downloading pytest_forked-1.4.0-py3-none-any.whl (4.9 kB)
Collecting execnet>=1.1
Downloading execnet-1.9.0-py2.py3-none-any.whl (39 kB)
Collecting imageio>=2.4.1
Downloading imageio-2.13.5-py3-none-any.whl (3.3 MB)
Collecting PyWavelets>=1.1.1
Downloading PyWavelets-1.2.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (6.1 MB)
Collecting networkx>=2.2
Downloading networkx-2.6.3-py3-none-any.whl (1.9 MB)
Collecting tifffile>=2019.7.26
Downloading tifffile-2021.11.2-py3-none-any.whl (178 kB)
Requirement already satisfied: six in /srv/conda/envs/notebook/lib/python3.7/site-packages (from sphinxcontrib-autoprogram->reciprocalspaceship[dev]) (1.16.0)
Requirement already satisfied: typing-extensions in /srv/conda/envs/notebook/lib/python3.7/site-packages (from torch->reciprocalspaceship[dev]) (4.0.1)
Collecting tomli
Downloading tomli-2.0.0-py3-none-any.whl (12 kB)
Requirement already satisfied: zipp>=0.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from importlib-metadata>=4.4->sphinx->reciprocalspaceship[dev]) (3.6.0)
Requirement already satisfied: parso<0.9.0,>=0.8.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from jedi>=0.16->ipython->reciprocalspaceship[dev]) (0.8.3)
Requirement already satisfied: MarkupSafe>=2.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from Jinja2>=2.3->sphinx->reciprocalspaceship[dev]) (2.0.1)
Requirement already satisfied: defusedxml in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbconvert->jupyter->reciprocalspaceship[dev]) (0.7.1)
Requirement already satisfied: mistune<2,>=0.8.1 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbconvert->jupyter->reciprocalspaceship[dev]) (0.8.4)
Requirement already satisfied: testpath in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbconvert->jupyter->reciprocalspaceship[dev]) (0.5.0)
Requirement already satisfied: entrypoints>=0.2.2 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbconvert->jupyter->reciprocalspaceship[dev]) (0.3)
Requirement already satisfied: jupyterlab-pygments in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbconvert->jupyter->reciprocalspaceship[dev]) (0.1.2)
Requirement already satisfied: jupyter-core in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbconvert->jupyter->reciprocalspaceship[dev]) (4.9.1)
Requirement already satisfied: pandocfilters>=1.4.1 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbconvert->jupyter->reciprocalspaceship[dev]) (1.5.0)
Requirement already satisfied: bleach in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbconvert->jupyter->reciprocalspaceship[dev]) (4.1.0)
Requirement already satisfied: nbclient<0.6.0,>=0.5.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbconvert->jupyter->reciprocalspaceship[dev]) (0.5.9)
Requirement already satisfied: ipython-genutils in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbformat->nbsphinx->reciprocalspaceship[dev]) (0.2.0)
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from nbformat->nbsphinx->reciprocalspaceship[dev]) (4.3.2)
Requirement already satisfied: ptyprocess>=0.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from pexpect>4.3->ipython->reciprocalspaceship[dev]) (0.7.0)
Requirement already satisfied: wcwidth in /srv/conda/envs/notebook/lib/python3.7/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython->reciprocalspaceship[dev]) (0.2.5)
Requirement already satisfied: charset-normalizer~=2.0.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from requests>=2.5.0->sphinx->reciprocalspaceship[dev]) (2.0.9)
Requirement already satisfied: certifi>=2017.4.17 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from requests>=2.5.0->sphinx->reciprocalspaceship[dev]) (2021.10.8)
Requirement already satisfied: idna<4,>=2.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from requests>=2.5.0->sphinx->reciprocalspaceship[dev]) (3.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from requests>=2.5.0->sphinx->reciprocalspaceship[dev]) (1.26.7)
Requirement already satisfied: argcomplete>=1.12.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipykernel->jupyter->reciprocalspaceship[dev]) (1.12.3)
Requirement already satisfied: debugpy<2.0,>=1.0.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipykernel->jupyter->reciprocalspaceship[dev]) (1.5.1)
Requirement already satisfied: tornado<7.0,>=4.2 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipykernel->jupyter->reciprocalspaceship[dev]) (6.1)
Requirement already satisfied: jupyter-client<8.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipykernel->jupyter->reciprocalspaceship[dev]) (7.1.0)
Requirement already satisfied: widgetsnbextension~=3.5.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipywidgets->jupyter->reciprocalspaceship[dev]) (3.5.2)
Requirement already satisfied: jupyterlab-widgets>=1.0.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from ipywidgets->jupyter->reciprocalspaceship[dev]) (1.0.2)
Requirement already satisfied: pyzmq>=17 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from notebook->jupyter->reciprocalspaceship[dev]) (22.3.0)
Requirement already satisfied: argon2-cffi in /srv/conda/envs/notebook/lib/python3.7/site-packages (from notebook->jupyter->reciprocalspaceship[dev]) (21.1.0)
Requirement already satisfied: prometheus-client in /srv/conda/envs/notebook/lib/python3.7/site-packages (from notebook->jupyter->reciprocalspaceship[dev]) (0.12.0)
Requirement already satisfied: terminado>=0.8.3 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from notebook->jupyter->reciprocalspaceship[dev]) (0.12.1)
Requirement already satisfied: Send2Trash>=1.5.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from notebook->jupyter->reciprocalspaceship[dev]) (1.8.0)
Collecting qtpy
Downloading QtPy-2.0.0-py3-none-any.whl (62 kB)
Requirement already satisfied: importlib-resources>=1.4.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat->nbsphinx->reciprocalspaceship[dev]) (5.4.0)
Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat->nbsphinx->reciprocalspaceship[dev]) (0.18.0)
Requirement already satisfied: nest-asyncio>=1.5 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from jupyter-client<8.0->ipykernel->jupyter->reciprocalspaceship[dev]) (1.5.4)
Requirement already satisfied: cffi>=1.0.0 in /srv/conda/envs/notebook/lib/python3.7/site-packages (from argon2-cffi->notebook->jupyter->reciprocalspaceship[dev]) (1.15.0)
Requirement already satisfied: webencodings in /srv/conda/envs/notebook/lib/python3.7/site-packages (from bleach->nbconvert->jupyter->reciprocalspaceship[dev]) (0.5.1)
Requirement already satisfied: pycparser in /srv/conda/envs/notebook/lib/python3.7/site-packages (from cffi>=1.0.0->argon2-cffi->notebook->jupyter->reciprocalspaceship[dev]) (2.21)
Building wheels for collected packages: autodocsumm
Building wheel for autodocsumm (setup.py): started
Building wheel for autodocsumm (setup.py): finished with status 'done'
Created wheel for autodocsumm: filename=autodocsumm-0.2.7-py3-none-any.whl size=13521 sha256=1d8a97a3eb5851349339a541e8d7a4d78610530a35b5b10a57e18d05b2de3c03
Stored in directory: /home/jovyan/.cache/pip/wheels/c7/7e/cb/6102fccefbd2ca3339722fcddfa7787a88d52ddbbfbd280221
Successfully built autodocsumm
Installing collected packages: toml, py, pluggy, iniconfig, tomli, sphinxcontrib-serializinghtml, sphinxcontrib-qthelp, sphinxcontrib-jsmath, sphinxcontrib-htmlhelp, sphinxcontrib-devhelp, sphinxcontrib-applehelp, snowballstemmer, qtpy, pytest, pillow, kiwisolver, imagesize, fonttools, docutils, cycler, coverage, alabaster, tifffile, sphinx, qtconsole, PyWavelets, pytest-forked, networkx, matplotlib, jupyter-console, imageio, execnet, tqdm, torch, sphinxcontrib-autoprogram, sphinx-rtd-theme, sphinx-panels, seaborn, scikit-image, pytest-xdist, pytest-cov, nbsphinx, jupyter, celluloid, autodocsumm
Successfully installed PyWavelets-1.2.0 alabaster-0.7.12 autodocsumm-0.2.7 celluloid-0.2.0 coverage-6.2 cycler-0.11.0 docutils-0.17.1 execnet-1.9.0 fonttools-4.28.5 imageio-2.13.5 imagesize-1.3.0 iniconfig-1.1.1 jupyter-1.0.0 jupyter-console-6.4.0 kiwisolver-1.3.2 matplotlib-3.5.1 nbsphinx-0.8.8 networkx-2.6.3 pillow-9.0.0 pluggy-1.0.0 py-1.11.0 pytest-6.2.5 pytest-cov-3.0.0 pytest-forked-1.4.0 pytest-xdist-2.5.0 qtconsole-5.2.2 qtpy-2.0.0 scikit-image-0.19.1 seaborn-0.11.2 snowballstemmer-2.2.0 sphinx-4.4.0 sphinx-panels-0.6.0 sphinx-rtd-theme-1.0.0 sphinxcontrib-applehelp-1.0.2 sphinxcontrib-autoprogram-0.1.7 sphinxcontrib-devhelp-1.0.2 sphinxcontrib-htmlhelp-2.0.0 sphinxcontrib-jsmath-1.0.1 sphinxcontrib-qthelp-1.0.3 sphinxcontrib-serializinghtml-1.1.5 tifffile-2021.11.2 toml-0.10.2 tomli-2.0.0 torch-1.10.1 tqdm-4.62.3
Removing intermediate container a519284bfcd0
---> 42bf0708571d
Step 44/48 : ENV PYTHONUNBUFFERED=1
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in b585ae5ccf83
Removing intermediate container b585ae5ccf83
---> 4d56d3fbbe2d
Step 45/48 : COPY /python3-login /usr/local/bin/python3-login
---> 3d98ceb43ecb
Step 46/48 : COPY /repo2docker-entrypoint /usr/local/bin/repo2docker-entrypoint
---> 11a6f6fd2d25
Step 47/48 : ENTRYPOINT ["/usr/local/bin/repo2docker-entrypoint"]
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in 8fc78e8e4f26
Removing intermediate container 8fc78e8e4f26
---> 9cdcb96d4175
Step 48/48 : CMD ["jupyter", "notebook", "--ip", "0.0.0.0"]
---> [Warning] Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
---> Running in 851c7348604f
Removing intermediate container 851c7348604f
---> 39789a85a264
{"aux": {"ID": "sha256:39789a85a2645b262f62b3bb8e887e03ae3c43efbef55692fb8103be25c09452"}}Successfully built 39789a85a264
Successfully tagged turingmybinder/binder-prod-r2d-g5b5b759-hekstra-2dlab-2dreciprocalspaceship-670595:7781360585814dbbac9ba0628fb2fbc974a964fb
Pushing image
Currently, we try to fully support applying phase shifts when moving reflections around the unit cell (DataSet.hkl_to_asu()
, DataSet.apply_symop()
, etc.). This is implemented by checking for the PhaseDtype
, and updating any columns that are found.
We should also consider doing the same more broadly for structure factors stored as complex numbers. I added some support for this in DataSet.apply_symop()
, but it is still missing from hkl_to_asu()
and hkl_to_observed()
. In an analogous way to DataSet.get_phase_keys()
, this can be implemented using the recently added DataSet.get_complex_keys()
helper method.
A few additional thoughts: should we assume that any complex number stored in a DataSet
represents a structure factor? Are there any other reasonable use cases for storing a complex number for which applying a phase shift would be problematic?

P.S. I labeled this as a bug because I think we should 1) explain the behavior clearly in the documentation and 2) make it consistent to minimize surprise
The cell and spacegroup attributes of a rs.DataSet
are stored as gemmi
objects, which breaks the DataSet.to_pickle()
method inherited from pandas
.
import reciprocalspaceship as rs
mtz = rs.read_mtz("docs/examples/data/HEWL_SSAD_24IDC.mtz")
print(mtz.spacegroup) # Prints <gemmi.SpaceGroup("P 43 21 2")>
mtz.to_pickle("test.pkl")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-5-e7af7b9cd589> in <module>
----> 1 mtz.to_pickle("test.pkl")
~/miniconda3/envs/rs/lib/python3.8/site-packages/pandas-1.2.1-py3.8-macosx-10.9-x86_64.egg/pandas/core/generic.py in to_pickle(self, path, compression, protocol, storage_options)
2861 from pandas.io.pickle import to_pickle
2862
-> 2863 to_pickle(
2864 self,
2865 path,
~/miniconda3/envs/rs/lib/python3.8/site-packages/pandas-1.2.1-py3.8-macosx-10.9-x86_64.egg/pandas/io/pickle.py in to_pickle(obj, filepath_or_buffer, compression, protocol, storage_options)
95 storage_options=storage_options,
96 ) as handles:
---> 97 pickle.dump(obj, handles.handle, protocol=protocol) # type: ignore[arg-type]
98
99
TypeError: cannot pickle 'gemmi.SpaceGroup' object
It would be useful to fix this functionality so that DataSet
objects can be restored without needing to go to MTZ format. This could enable storing complex numbers in columns or the storage of arbitrary metadata in the DataSet.attrs
attribute.
Two possible solutions:
1. Add pickle support to the gemmi.SpaceGroup
and gemmi.UnitCell
objects
2. Change DataSet.to_pickle()
to cache the spacegroup / cell attributes in pickle-friendly forms, and write rs.read_pickle()
to read the pickle file and re-set the cached spacegroup / cell.

Here is a proof of concept of how the workaround above could look:
# Cache attributes -- this would get implicitly handled by DataSet.to_pickle()
mtz.attrs["spacegroup"] = mtz.spacegroup.xhm()
mtz.attrs["cell"] = mtz.cell.parameters
# Remove gemmi objects to avoid pickle problems
mtz.spacegroup = None
mtz.cell = None
# Write pickle
mtz.to_pickle("dataset.pkl")
# Read pickle -- in the future this could all get wrapped into `rs.read_pickle()`
import pandas as pd
ds = rs.DataSet(pd.read_pickle("dataset.pkl"))
ds.spacegroup = ds.attrs["spacegroup"]
ds.cell = ds.attrs["cell"]
The example notebooks included in the documentation and binder involve a few additional dependencies that are not included in the install_requires
. These include matplotlib
, seaborn
, celluloid
, scikit-image
, and, soon, pytorch
.
These additional dependencies should be added to the dev
mode of extras_require
, so that it is easy to get compatible versions of all relevant dependencies. This can also be used to simplify the binder
setup, and to add the ability to "re-run" the notebooks when building documentation to ensure that everything is up-to-date.
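One way this could look in setup.py is sketched below. The package names are taken from the list above, but any version pins would need to be checked against the notebooks before adoption:

```python
# Hypothetical extras_require layout; entries and pins are illustrative only.
extras_require = {
    "dev": [
        "matplotlib",
        "seaborn",
        "celluloid",
        "scikit-image",
        "torch",
    ],
}
```

With this in place, `pip install reciprocalspaceship[dev]` would pull in everything the example notebooks need.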
Way to reproduce this issue:
conda create -n rs_test python=3.9
conda activate rs_test
pip install reciprocalspaceship
Error:
Failed building wheel for gemmi ...
The build succeeds on Python 3.8 or lower in my testing, although it is still slow. This seems to be a gemmi issue rather than a reciprocalspaceship one, but it might be worth noting in the installation tutorial.
Right now there is no obvious way to change a DataSet
instance to a different spacegroup which possibly has a different basisop
and/or reciprocal asu. For unmerged data, I think something along the lines of the following is probably sufficient:
import gemmi

def change_spacegroup(self, new_sg, inplace=False):
    if not inplace:
        ds = self.copy()
    else:
        ds = self
    if not isinstance(new_sg, gemmi.SpaceGroup):
        new_sg = gemmi.SpaceGroup(new_sg)
    # Undo the current basis change, then apply the new one
    ds.apply_symop(ds.spacegroup.basisop.inverse(), inplace=True)
    ds.apply_symop(new_sg.basisop, inplace=True)
    ds.spacegroup = new_sg  # keep the spacegroup attribute consistent
    return ds
For merged data, I think in addition to the basisop application, you would want to expand the asu to P1 and select the new asu.
IMHO there is enough nuance to this problem that we should provide a well tested method for this.
the link referring to
http://legacy.ccp4.ac.uk/html/mtzformat.html#coltypes
no longer works and should instead read:
https://www.ccp4.ac.uk/html/mtzformat.html#coltypes (the #coltypes part does not actually work)
If Friedel columns share the same base name with non-Friedel columns, stack_anomalous
will fail with a cryptic error message.
Here is a minimal example
import reciprocalspaceship as rs
import numpy as np
dmin = 2.
cell = [10., 20., 30., 90., 90., 90.]
sg = 19
h,k,l = rs.utils.generate_reciprocal_asu(cell, sg, dmin, anomalous=False).T
ds = rs.DataSet({
'H' : h,
'K' : k,
'L' : l,
'I' : np.ones(len(h)),
'I(+)' : np.ones(len(h)),
'I(-)' : np.ones(len(h)),
},
merged=True,
cell=cell,
spacegroup=sg
).infer_mtz_dtypes().set_index(['H', 'K', 'L'])
print(ds)
print(ds.dtypes)
ds.stack_anomalous()
which outputs
user@computer:~$ python bug.py
I Intensity
I(+) FriedelIntensity
I(-) FriedelIntensity
dtype: object
Traceback (most recent call last):
File "bug.py", line 26, in <module>
ds.stack_anomalous()
File "/home/kmdalton/opt/reciprocalspaceship/reciprocalspaceship/dataset.py", line 907, in stack_anomalous
F[label] = F[label].from_friedel_dtype()
File "/home/kmdalton/opt/anaconda/envs/careless/lib/python3.8/site-packages/pandas/core/generic.py", line 5487, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataSet' object has no attribute 'from_friedel_dtype'
Related to this (and hopefully not compounding the issue) would it make sense for DataSet.stack_anomalous()
to take something like plus_suffix
and minus_suffix
arguments, as alternatives for plus_labels
and minus_labels
?
Definitely not critical, but would a) occasionally save some typing and b) be internally consistent with the way the defaults work (without breaking previous code), e.g. plus_suffix="(+)"
/ minus_suffix="(-)"
(This also seems like an easy enough change that I could try to tackle it myself, with blessing?)
Originally posted by @dennisbrookner in #99 (comment)
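A sketch of how suffix-based defaults might derive the label lists that plus_labels / minus_labels currently require. The function name here is hypothetical, not part of the rs API:

```python
# Derive plus/minus label pairs from suffixes (illustrative sketch only)
def labels_from_suffixes(columns, plus_suffix="(+)", minus_suffix="(-)"):
    plus = [c for c in columns if c.endswith(plus_suffix)]
    # Pair each plus label with its minus counterpart by swapping the suffix
    minus = [c[: -len(plus_suffix)] + minus_suffix for c in plus]
    return plus, minus

cols = ["I", "I(+)", "I(-)", "SigI(+)", "SigI(-)"]
plus, minus = labels_from_suffixes(cols)
```

With the default suffixes this reproduces the current default behavior, so existing code would be unaffected.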
We do not consistently use the same dtypes
in functions within rs.utils
. We should come up with a unified philosophy for how numpy
dtypes
are determined for returned values. I can think of at least three defensible possibilities:
dtype=np.{float32, int32, ...}
parameter to each functionnp.float32
or np.int32
as applicable.I lean toward the last option, because it meshes best with the mtz standard, and I think it will lead to fewer edge cases and gotchas.
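The last option could be centralized in a pair of small helpers (names here are hypothetical) that every rs.utils function routes its return values through:

```python
import numpy as np

# Sketch of the "always return MTZ-native 32-bit types" policy
def to_mtz_real(values):
    return np.asarray(values, dtype=np.float32)

def to_mtz_int(values):
    return np.asarray(values, dtype=np.int32)

eps = to_mtz_int([1, 2, 4])        # e.g. epsilon factors
d = to_mtz_real([2.0, 1.5, 1.2])   # e.g. resolutions in angstroms
```

Centralizing the cast means a future change of policy touches one place rather than every function.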
The current implementation of infer_dtypes
makes strong assumptions about the type of index. If infer_dtypes
is called on a DataSet
with a RangeIndex
, it will throw a KeyError
:
[ins] In [35]: ds.reset_index().infer_mtz_dtypes()
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/opt/anaconda/envs/rs/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2645 try:
-> 2646 return self._engine.get_loc(key)
2647 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: None
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
~/opt/restoration-software/examples/precognition.py in <module>
----> 1 ds.reset_index().infer_mtz_dtypes()
~/opt/anaconda/envs/rs/lib/python3.7/site-packages/reciprocalspaceship-0.8.2-py3.7.egg/reciprocalspaceship/dataset.py in infer_mtz_dtypes(self, inplace)
281 if c is not None:
282 dataset[c] = dataset[c].infer_mtz_dtype()
--> 283 dataset.set_index(index_keys, inplace=True)
284 return dataset
285
~/opt/anaconda/envs/rs/lib/python3.7/site-packages/reciprocalspaceship-0.8.2-py3.7.egg/reciprocalspaceship/dataset.py in set_index(self, keys, **kwargs)
92 # Copy dtypes of keys to cache
93 for key in keys:
---> 94 self._cache_index_dtypes[key] = self[key].dtype.name
95
96 return super().set_index(keys, **kwargs)
~/opt/anaconda/envs/rs/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
2798 if self.columns.nlevels > 1:
2799 return self._getitem_multilevel(key)
-> 2800 indexer = self.columns.get_loc(key)
2801 if is_integer(indexer):
2802 indexer = [indexer]
~/opt/anaconda/envs/rs/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2646 return self._engine.get_loc(key)
2647 except KeyError:
-> 2648 return self._engine.get_loc(self._maybe_cast_indexer(key))
2649 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2650 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: None
This happens because RangeIndex.names
is a FrozenList
with contents None
:
[ins] In [8]: ds.index.names
Out[8]: FrozenList([None])
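A possible guard, illustrated here with a plain list rather than the real cache machinery: skip unnamed index levels, such as the single None produced by a RangeIndex, before caching or re-setting index dtypes.

```python
# What RangeIndex.names looks like when treated as a list
index_names = [None]

# Only named levels are meaningful keys for the dtype cache
index_keys = [name for name in index_names if name is not None]
# With no named levels there is nothing to cache or re-set
```

With this filter in place, infer_mtz_dtypes would fall through cleanly on a RangeIndex instead of raising KeyError: None.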
Adding result["redundancy"] = g["wI"].count() would add an output column with the redundancy per observation.
Right now it is sort of frustrating to map reflections to the reciprocal asu while preserving their sign.
The cleanest solution I've come up with so far is this:
ds = rs.read_mtz(inFN).hkl_to_asu()
fplus = ds['M/ISYM']%2 == 1 #Identify friedel plus reflections
ds = ds[~fplus].apply_symop('-x,-y,-z').append(ds[fplus])
The problem with just letting this be the supported approach is that it requires the user to have to understand the M/ISYM
column. Given that M/ISYM
is really sort of an odd historical artifact more than anything, I think we should provide an intuitive method to solve this. I propose we embed a solution like this within DataSet.hkl_to_asu
in the form of a new named parameter anomalous
which defaults to False.
There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.
Error type: Cannot find preset's package (github>whitesource/merge-confidence:beta)
I was curious whether it's possible to output MTZ files that support assigning 'crystals' and/or 'projects' to columns (in the sense of https://www.ccp4.ac.uk/html/mtzformat.html). I ran into this when trying to run a data set through SCALEIT, which objected that the data sets being scaled belonged to the same crystal. I circumvented the issue by outputting separate mtz files and merging them in CAD. My understanding is that GEMMI can handle project names and crystal names (https://gemmi.readthedocs.io/en/latest/hkl.html#mtz-format).
Right now epsilon factors do not account for space group centering. We should change rs.utils.compute_structurefactor_multiplicity
to account for the space group centering operations. This is easy enough to implement, but right now our test data for epsilon factors are from sgtbx. It is easily verified that these don't take centering into account.
>>> df = pd.read_csv("tests/data/sgtbx/sgtbx.csv.bz2")
>>> df.groupby('xhm').min()['epsilon'].max()
1
I propose we modify tests/data/gen_sgtbx_reference_data.sh
to use both gemmi and sgtbx for epsilons. When we test against sgtbx, we just have to remember to divide by len(spacegroup.operations().cen_ops)
and/or epsilons.min()
.
pd.DataFrame
has a method select_dtypes
which returns columns matching a particular numpy
dtype
. In the context of rs
it'd be natural for this to support differentiating custom MTZDtype
's. However, this is not the case right now.
Given an example mtz
file,
[ins] In [1]: mtz.head()
Out[1]:
F(+) SigF(+) F(-) SigF(-) N(+) N(-) high(+) loc(+) low(+) scale(+) high(-) loc(-) low(-) scale(-)
H K L
0 0 4 0.94140863 0.0060185874 0.94140863 0.0060185874 8.0 8.0 10000000000.0 0.94140863 1e-32 0.0060185874 10000000000.0 0.94140863 1e-32 0.0060185874
8 1.8974894 0.01334675 1.8974894 0.01334675 8.0 8.0 10000000000.0 1.8974894 1e-32 0.01334675 10000000000.0 1.8974894 1e-32 0.01334675
12 2.1121132 0.02015744 2.1121132 0.02015744 8.0 8.0 10000000000.0 2.1121132 1e-32 0.02015744 10000000000.0 2.1121132 1e-32 0.02015744
16 5.133872 0.033373583 5.133872 0.033373583 4.0 4.0 10000000000.0 5.133872 1e-32 0.033373583 10000000000.0 5.133872 1e-32 0.033373583
20 0.19568625 0.12823802 0.19568625 0.12823802 1.0 1.0 10000000000.0 0.12831146 1e-32 0.17213167 10000000000.0 0.12831146 1e-32 0.17213167
with dtypes
[ins] In [2]: mtz.dtypes
Out[2]:
F(+) FriedelSFAmplitude
SigF(+) StddevFriedelSF
F(-) FriedelSFAmplitude
SigF(-) StddevFriedelSF
N(+) MTZReal
N(-) MTZReal
high(+) MTZReal
loc(+) MTZReal
low(+) MTZReal
scale(+) MTZReal
high(-) MTZReal
loc(-) MTZReal
low(-) MTZReal
scale(-) MTZReal
dtype: object
rs.DataSet.select_dtypes
appears to fallback to the numpy
dtype
. For instance, when I call, mtz.select_dtypes("G")
I expect rs
to return a DataSet
or view
containing only "F(+)"
and "F(-)"
columns. Instead, I get all the columns backed by np.float32
[nav] In [3]: mtz.select_dtypes("G")
Out[5]:
F(+) SigF(+) F(-) SigF(-) N(+) N(-) high(+) loc(+) low(+) scale(+) high(-) loc(-) low(-) scale(-)
H K L
0 0 4 0.94140863 0.0060185874 0.94140863 0.0060185874 8.0 8.0 10000000000.0 0.94140863 1e-32 0.0060185874 10000000000.0 0.94140863 1e-32 0.0060185874
8 1.8974894 0.01334675 1.8974894 0.01334675 8.0 8.0 10000000000.0 1.8974894 1e-32 0.01334675 10000000000.0 1.8974894 1e-32 0.01334675
12 2.1121132 0.02015744 2.1121132 0.02015744 8.0 8.0 10000000000.0 2.1121132 1e-32 0.02015744 10000000000.0 2.1121132 1e-32 0.02015744
16 5.133872 0.033373583 5.133872 0.033373583 4.0 4.0 10000000000.0 5.133872 1e-32 0.033373583 10000000000.0 5.133872 1e-32 0.033373583
20 0.19568625 0.12823802 0.19568625 0.12823802 1.0 1.0 10000000000.0 0.12831146 1e-32 0.17213167 10000000000.0 0.12831146 1e-32 0.17213167
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
14 13 19 NaN NaN 0.55378014 0.08148462 NaN 2.0 NaN NaN NaN NaN 10000000000.0 0.55378014 0.0 0.08148462
11 20 NaN NaN 0.6732702 0.09068045 NaN 2.0 NaN NaN NaN NaN 10000000000.0 0.6732702 0.0 0.09068045
10 20 NaN NaN 0.8092094 0.08233523 NaN 2.0 NaN NaN NaN NaN 10000000000.0 0.8092094 0.0 0.08233523
9 20 NaN NaN 1.2847979 0.06926164 NaN 2.0 NaN NaN NaN NaN 10000000000.0 1.2847979 0.0 0.06926164
8 20 NaN NaN 1.344098 0.06747224 NaN 2.0 NaN NaN NaN NaN 10000000000.0 1.344098 0.0 0.06747224
which is all columns in this case.
Making this behave as expected either requires a change to the underlying pandas
method or overloading the method in rs
. From this perspective, it might be better to raise this issue with the pandas
devs. Not sure.
Calling DataSet.unstack_anomalous
followed by DataSet.stack_anomalous
does not always work in rs
version 0.9.15
. The following code verifies that this fails sometimes and succeeds others.
import gemmi
import reciprocalspaceship as rs
import numpy as np
cell = gemmi.UnitCell(10., 20., 30., 90., 90., 90.)
sg = gemmi.SpaceGroup(19)
dmin = 2.
h,k,l = rs.utils.generate_reciprocal_asu(cell, sg, dmin, anomalous=True).T
n = len(h)
ds = rs.DataSet({
'H' : h,
'K' : k,
'L' : l,
'F' : np.random.random(n),
'loc' : np.random.random(n),
'scale' : np.random.random(n),
},
spacegroup=sg,
cell=cell,
merged=True,
).infer_mtz_dtypes().set_index(['H', 'K', 'L'])
assert all(ds.keys() == ['F', 'loc', 'scale'])
unstacked = ds.unstack_anomalous()
print(unstacked.keys())
unstacked.stack_anomalous()
I find that the order of columns in unstacked
is not always the same, despite consistent column ordering in ds
. When the column order is Index(['F(+)', 'loc(+)', 'scale(+)', 'F(-)', 'loc(-)', 'scale(-)'], dtype='object')
, the script succeeds. When the column order is Index(['F(+)', 'loc(+)', 'scale(+)', 'scale(-)', 'loc(-)', 'F(-)'], dtype='object')
, it fails with the following traceback:
Traceback (most recent call last):
File "stack_bug.py", line 31, in <module>
unstacked.stack_anomalous()
File ".../anaconda/envs/careless/lib/python3.8/site-packages/reciprocalspaceship/dataset.py", line 911, in stack_anomalous
raise ValueError(
ValueError: Corresponding labels in ['F(+)', 'loc(+)', 'scale(+)'] and ['scale(-)', 'loc(-)', 'F(-)'] are not the same dtype: FriedelSFAmplitude and MTZReal
. I don't know where this stochasticity is coming from, but it is probably somewhere in DataSet.unstack_anomalous
, since the scrambled column order already appears in unstacked. My guess would be it has something to do with the (non?)determinism of pd.DataFrame.merge. I don't see any obvious place where the column order could be getting scrambled.
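One plausible source, given that the traceback points at code doing set(columns): iterating over a Python set of strings yields an order that can change between interpreter runs because of hash randomization, which would explain run-to-run scrambling. This is an assumption about the cause, not a confirmed diagnosis, but sorting would make the order deterministic either way:

```python
# Set iteration order for strings varies across interpreter runs
# (hash randomization); sorting pins it down
columns = {"F", "loc", "scale"}
deterministic = sorted(columns)
```

If this is the culprit, replacing bare set iteration with sorted(...) (or preserving the input column order explicitly) should make unstack_anomalous reproducible.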
I have to fight really hard to prevent my brain from parsing _cache_index_dtypes
as a method name. That is I interpret cache
as a verb. This is low priority, but could we switch this attribute name to _index_dtypes_cache
?
Hey all, when working through the second example I'm getting an error with rs.algorithms.merge()
. A code chunk:
import reciprocalspaceship as rs
hewl = rs.read_mtz("data/HEWL_unmerged.mtz")
result3 = rs.algorithms.merge(hewl)
Which raises:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-14-901e5cd689bc> in <module>
----> 1 result3 = rs.algorithms.merge(hewl)
~/miniconda3/envs/rs/lib/python3.8/site-packages/reciprocalspaceship-0.9.2-py3.8.egg/reciprocalspaceship/algorithms/merge.py in merge(dataset, intensity_key, sigma_key, sort)
38
39 # Reshape anomalous data and use to compute IMEAN / SIGIMEAN
---> 40 result = result.unstack_anomalous()
41 result.loc[:, ["N(+)", "N(-)"]] = result[["N(+)", "N(-)"]].fillna(0).astype("I")
42 result["IMEAN"] = result[["wI(+)", "wI(-)"]].sum(axis=1) / result[["w(+)", "w(-)"]].sum(axis=1).astype("Intensity")
~/miniconda3/envs/rs/lib/python3.8/site-packages/reciprocalspaceship-0.9.2-py3.8.egg/reciprocalspaceship/dataset.py in wrapped(ds, *args, **kwargs)
56 names = ds.index.names
57 ds = ds._index_from_names([None], inplace=True)
---> 58 result = f(ds, *args, **kwargs)
59 result = result._index_from_names(names, inplace=True)
60 ds = ds._index_from_names(names, inplace=True)
~/miniconda3/envs/rs/lib/python3.8/site-packages/reciprocalspaceship-0.9.2-py3.8.egg/reciprocalspaceship/dataset.py in unstack_anomalous(self, columns, suffixes)
876 # Separate DataSet into Friedel(+) and Friedel(-)
877 columns = set(columns).union(set(["H", "K", "L"]))
--> 878 dataset = self.hkl_to_asu()
879 if "PARTIAL" in columns: columns.remove("PARTIAL")
880 for column in columns:
~/miniconda3/envs/rs/lib/python3.8/site-packages/reciprocalspaceship-0.9.2-py3.8.egg/reciprocalspaceship/dataset.py in wrapped(ds, *args, **kwargs)
37 return f(ds, *args, **kwargs)
38 else:
---> 39 return f(ds.copy(), *args, **kwargs)
40 else:
41 raise KeyError(f'"inplace" not found in local variables of @inplacemethod decorated function {f} '
~/miniconda3/envs/rs/lib/python3.8/site-packages/reciprocalspaceship-0.9.2-py3.8.egg/reciprocalspaceship/dataset.py in wrapped(ds, *args, **kwargs)
56 names = ds.index.names
57 ds = ds._index_from_names([None], inplace=True)
---> 58 result = f(ds, *args, **kwargs)
59 result = result._index_from_names(names, inplace=True)
60 ds = ds._index_from_names(names, inplace=True)
~/miniconda3/envs/rs/lib/python3.8/site-packages/reciprocalspaceship-0.9.2-py3.8.egg/reciprocalspaceship/dataset.py in hkl_to_asu(self, inplace, anomalous)
975 hkls = dataset.get_hkls()
976 compressed_hkls, inverse = np.unique(hkls, axis=0, return_inverse=True)
--> 977 asu_hkls, isym, phi_coeff, phi_shift = hkl_to_asu(
978 compressed_hkls,
979 dataset.spacegroup,
~/miniconda3/envs/rs/lib/python3.8/site-packages/reciprocalspaceship-0.9.2-py3.8.egg/reciprocalspaceship/utils/asu.py in hkl_to_asu(H, spacegroup, return_phase_shifts)
82 an array length n containing phase shifts in degrees
83 """
---> 84 basis_op = spacegroup.basisop
85 group_ops = spacegroup.operations()
86 num_ops = len(group_ops)
AttributeError: 'NoneType' object has no attribute 'basisop'
This error occurs in my local copy of reciprocalspaceship/docs/examples/2_mergingstats.ipynb
, but not when I open the same notebook through binder, meaning the error must have to do with my local installation (I think?). I have the same versions of rs
(0.9.2
) and gemmi
(0.4.3
) as in the binder notebook; my copy has python 3.8.5
, whereas the binder notebook has 3.7.8
.
An aside: when I had reciprocalspaceship
installed via pip
, I was instead getting the error that "rs.algorithms
has no method merge
", but switching to the github version of rs
fixed that.
Of course, always possible I'm just doing something silly, but figured I'd pass this along.
Currently, methods that create or use M/ISYM
columns do not account for partiality flags. This occurs in DataSet.hkl_to_observed()
, where the ISYM value is used to map Miller indices, but the M/ISYM
column is then left intact (though it no longer applies to the mapped indices). Here's a short example using HEWL_unmerged.mtz
in tests/data/algorithms
:
import reciprocalspaceship as rs
mtz = rs.read_mtz("HEWL_unmerged.mtz")
print(mtz["M/ISYM"])
outputs:
H K L
-22 -9 4 5
-7 18 -9 8
19 -26 15 7
-15 19 4 3
15 -2 9 14
..
-3 -17 6 16
-19 -24 1 16
-33 5 6 10
-5 -22 4 16
29 10 14 1
Name: M/ISYM, Length: 20597, dtype: M/ISYM
Instead, this function should map the Miller indices (which it does currently), and extract the partiality flag into a new column.
Similarly, DataSet.hkl_to_asu()
should take such a partiality flag into account, if it exists, in order to write a correct M/ISYM
column. I think it also makes sense to ensure that any IO methods can handle such a partiality column, so that one can read/write unmerged reflection data without loss of information.
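Splitting the column is straightforward if I recall the CCP4 packing convention correctly: M/ISYM = 256*M + ISYM, with M = 1 for partial observations and 0 for fulls. This should be double-checked against the MTZ format documentation, but the extraction would look like:

```python
import numpy as np

# Hypothetical packed M/ISYM values: two fulls, then the same two
# observations flagged as partial (offset by 256)
m_isym = np.array([1, 14, 257, 270])

isym = m_isym % 256      # symmetry-operation index used to map hkl
partial = m_isym // 256  # partiality flag, suitable for a PARTIAL column
```

hkl_to_observed could then consume isym for the mapping and emit partial as a separate boolean column.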
I was just trying to load a precognition strong spot file which is just a whitespace delimited text file
$ head precognition_integration/e080_001.mccd.re.spt
0 0 0 2102.28 2175.12 43575.2 290.0
0 0 0 1429.60 1791.03 19469.1 197.7
0 0 0 2365.19 1508.56 13289.9 161.6
0 0 0 2677.00 2137.74 13169.7 161.6
0 0 0 1572.55 2012.03 9752.7 141.7
0 0 0 2319.03 2529.97 7220.5 120.4
0 0 0 1562.52 1850.75 7231.6 123.0
0 0 0 1818.91 2197.97 6648.6 117.0
0 0 0 1863.79 1514.90 6607.9 118.2
0 0 0 1514.11 2559.16 6274.7 112.7
rs.read_precognition
fails for this sort of file which is arguably a bug or a design choice. I'm personally not very concerned about that distinction. What concerns me is that this is a very reasonable file format for reflections and I have no good way to get it into a DataSet
. Best I can figure is to pass it through pandas as follows
import pandas as pd
import reciprocalspaceship as rs
inFN = "precognition_integration/e080_001.mccd.re.spt"
df = pd.read_csv(inFN, delim_whitespace=True, names=["H", "K", "L", "X", "Y", "I", "SIGI"])
ds = rs.DataSet(df).infer_mtz_dtypes()
Now, this is not super onerous or anything, but I usually don't keep pandas in my imports when working with rs
. So it is two extra steps that would just go away if we could do:
ds = rs.read_csv(
inFN,
delim_whitespace=True,
names=["H", "K", "L", "X", "Y", "I", "SIGI"],
infer_dtypes=True
)
Am I just being cranky or is this a good addition? Does this break any of our API decisions?
As we discussed extensively on the DIALS Slack channel, it is now relatively easy to parse DIALS .refl
files without cctbx/DIALS
. Newer versions of DIALS encode reflection tables using msgpack
which seems a relatively innocuous dependency to add.
To this end @ndevenish has built a parser that decodes refl tables using numpy
. It's nearly complete but may be missing column types. We can find a full list of types in this block. It should be easy to build this into the rs.io
submodule as I've done here for example.
There remains the issue of DIALS reflection tables potentially containing some fairly exotic objects (shoeboxes, vectors, matrices). The safest (sadly slowest) thing to do for a first pass is to just default them to objects. We can think about clever solutions later.
Parsing legacy pickle
based reflection tables is an open question. For the time being, I think we just can't support them. @ndevenish suggests looking here for clues though.
@JBGreisman, let's chat about this early next week and get it up and running. I think this is already mostly there!
When DataSet.hkl_to_asu()
is called with the anomalous=True
flag, reflections are mapped to the Friedel +/- ASU. This makes it useful to construct calls using DataSet.groupby(["H", "K", "L"])
that handle Friedel pairs separately. However, all reflections are only defined as Friedel +/- based on the M/ISYM flag (odd are Friedel+, even are Friedel-), even if they are centric.
Example:
import reciprocalspaceship as rs
unmerged = rs.read_mtz("tests/data/algorithms/HEWL_unmerged.mtz")
unmerged.label_centrics(inplace=True)
example = unmerged.loc[[(11, 11, 8), (11, -11, -8), (-11, -11, -8)], ["BATCH", "CENTRIC"]]
print("Observations:")
print(example)
not_anom = example.hkl_to_asu(anomalous=False)
print("Friedel + ASU:")
print(not_anom)
anom = example.hkl_to_asu(anomalous=True)
print("Friedel +/- ASU:")
print(anom)
Outputs:
Observations:
BATCH CENTRIC
H K L
11 11 8 454 True
-11 -8 909 True
-8 474 True
-11 -11 -8 203 True
-8 814 True
-8 627 True
Friedel + ASU:
BATCH CENTRIC M/ISYM
H K L
11 11 8 454 True 1
8 909 True 4
8 474 True 4
8 203 True 2
8 814 True 2
8 627 True 2
Friedel +/- ASU:
BATCH CENTRIC M/ISYM
H K L
11 11 8 454 True 1
-11 -11 -8 909 True 4
-8 474 True 4
-8 203 True 2
-8 814 True 2
-8 627 True 2
This behavior should be modified to only be used for acentric reflections -- centric reflections should not be considered "Friedel", and should only be mapped to the Friedel-plus ASU. The above example should give identical results for hkl_to_asu()
with anomalous=True
and anomalous=False
.
This is similar to #25. Currently DataSet.stack_anomalous()
returns a new DataSet with twice as many rows as the input object. This is because every row is split into two, and mapped to the +/- reciprocal space ASU. However centric reflections should not be considered "Friedel" and should always remain in the +ASU.
As such, the returned dataset should end up having 2*n_acentric + n_centric
rows.
I am not sure why yet, but the rs build fails with gemmi 0.4.0 which is the latest in pypi. For the time being, I have just made gemmi 0.3.8 the required version in setup.py.
For EF-X experiments, in which perturbed (ON) data are merged in a lower-symmetry spacegroup and unperturbed (OFF) data in a higher-symmetry one, it might be helpful to retain the symops that map the lower-symmetry dataset's ASU to the parent ASU, or to otherwise facilitate this mapping by comparing the two settings.
We have made it easy to convert intensities to structure factors using our French-Wilson implementation. However, sometimes I find myself wanting to go the other direction. This is especially useful when comparing careless output to other methods.
I propose we provide a method for this conversion that takes uncertainties into account. By this I mean:
I = SigF*SigF + F*F
SigI = abs(2*F*SigF)
Note that the factor of 2 is only approximate; we could do better if we knew the distribution of F. For starters, we could just add this as a function in the algorithms submodule.
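A sketch of such a function, implementing exactly the two formulas above (the function name is hypothetical, and the error propagation is first-order as noted):

```python
import numpy as np

def intensities_from_structurefactors(F, SigF):
    """Sketch: convert structure factor amplitudes and uncertainties to
    intensities, using I = SigF^2 + F^2 and SigI = |2*F*SigF|."""
    F = np.asarray(F, dtype=np.float64)
    SigF = np.asarray(SigF, dtype=np.float64)
    I = SigF * SigF + F * F
    SigI = np.abs(2.0 * F * SigF)
    return I, SigI
```

For F=10, SigF=1 this gives I=101 and SigI=20.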
Sometimes when converting multiple columns to numpy, reciprocalspaceship converts the columns to object dtype.
A simple example to reproduce:
import reciprocalspaceship as rs
import numpy as np
ds = rs.DataSet({
"X" : np.random.random(100),
"Y" : np.random.random(100),
}).infer_mtz_dtypes()
print(f"ds.dtypes:\n{ds.dtypes}\n")
print(f"ds['X'].to_numpy().dtype:\n {ds['X'].to_numpy().dtype}\n")
print(f"ds['Y'].to_numpy().dtype:\n {ds['Y'].to_numpy().dtype}\n")
print(f"ds[['X', 'Y']].to_numpy().dtype:\n {ds[['X', 'Y']].to_numpy().dtype}")
outputs:
ds.dtypes:
X MTZReal
Y MTZReal
dtype: object
ds['X'].to_numpy().dtype:
float32
ds['Y'].to_numpy().dtype:
float32
ds[['X', 'Y']].to_numpy().dtype:
object
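Until the dtype inference is fixed, one workaround is to request a concrete dtype explicitly. This is illustrated below with pandas nullable extension dtypes standing in for the MTZ dtypes, since both can fall back to object dtype when multiple extension-typed columns are combined:

```python
import numpy as np
import pandas as pd

# Two columns with extension dtypes (standing in for MTZReal).
df = pd.DataFrame({
    "X": pd.array([0.1, 0.2], dtype="Float32"),
    "Y": pd.array([0.3, 0.4], dtype="Float32"),
})

# Passing dtype= bypasses the common-dtype inference that yields object.
arr = df[["X", "Y"]].to_numpy(dtype=np.float32)
```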
DataSet.reset_index() raises a KeyError when the level argument is used to specify only some labels of a MultiIndex. This occurs because reset_index() assumes that all labels are being removed from the index when trying to reassign cached MTZ dtypes:
dataset = rs.read_mtz("tests/data/algorithms/HEWL_unmerged.mtz")
print(dataset.index.names) # prints ['H', 'K', 'L']
dataset.reset_index(level=['H', 'K'])
Outputs:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/rs/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2888 try:
-> 2889 return self._engine.get_loc(casted_key)
2890 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'L'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-12-93c78908ac77> in <module>
----> 1 dataset.reset_index(level=['H', 'K'])
~/reciprocalspaceship/reciprocalspaceship/dataset.py in reset_index(self, **kwargs)
135 for key in newdf._cache_index_dtypes.keys():
136 dtype = newdf._cache_index_dtypes[key]
--> 137 newdf[key] = newdf[key].astype(dtype)
138 newdf._cache_index_dtypes = {}
139 return newdf
~/rs/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
2897 if self.columns.nlevels > 1:
2898 return self._getitem_multilevel(key)
-> 2899 indexer = self.columns.get_loc(key)
2900 if is_integer(indexer):
2901 indexer = [indexer]
~/rs/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2889 return self._engine.get_loc(casted_key)
2890 except KeyError as err:
-> 2891 raise KeyError(key) from err
2892
2893 if tolerance is not None:
KeyError: 'L'
Since pandas supports a level= argument to reset_index(), the overloaded method should be modified to only try to change the dtypes of columns that are actually removed from the index.
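The fix can be sketched as a small helper (names here are illustrative; the real cache is the _cache_index_dtypes dict seen in the traceback):

```python
def dtypes_to_restore(cached_dtypes, reset_levels):
    """Sketch: select only the cached MTZ dtypes for index levels that
    are actually being reset, leaving the rest cached for later."""
    return {k: v for k, v in cached_dtypes.items() if k in reset_levels}
```

With the HEWL example above, `dtypes_to_restore({"H": "HKL", "K": "HKL", "L": "HKL"}, ["H", "K"])` would leave the "L" entry untouched instead of raising KeyError.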
Currently, the DataSet.cell and DataSet.spacegroup attributes do not do any type checking. They are intended to be set to gemmi.UnitCell and gemmi.SpaceGroup objects, but they can currently be set to anything:
mtz.spacegroup = [1, 2, 3]
print(mtz.spacegroup) # prints [1, 2, 3]
It would make sense to add some type checking in their respective setter methods. This would also be an opportunity to broaden the API: if spacegroup is set to a string or int, the gemmi.SpaceGroup constructor could be called, and likewise for gemmi.UnitCell if a list/tuple of 6 values is passed.
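A minimal sketch of such a coercing setter, using a stand-in class so the example is self-contained (gemmi.SpaceGroup itself accepts a name string or number in its constructor):

```python
class SpaceGroup:
    """Stand-in for gemmi.SpaceGroup, whose constructor accepts a
    Hermann-Mauguin name (str) or spacegroup number (int)."""
    def __init__(self, value):
        self.value = value

class DataSet:
    @property
    def spacegroup(self):
        return self._spacegroup

    @spacegroup.setter
    def spacegroup(self, sg):
        if sg is None or isinstance(sg, SpaceGroup):
            self._spacegroup = sg
        elif isinstance(sg, (str, int)):
            # Broadened API: coerce via the constructor
            self._spacegroup = SpaceGroup(sg)
        else:
            raise ValueError(f"Cannot interpret {sg!r} as a spacegroup")
```

With this in place, `mtz.spacegroup = [1, 2, 3]` raises ValueError instead of silently storing the list.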
I often find myself using external tools to compute completeness, but this is something we could easily implement within rs. I imagine there are other stats we might want to have baked in as well. I propose we add a compute_completeness function with an optional bins argument. I would be in favor of adding an rs.stats namespace for it to live in.
How does this plan sit with you, @JBGreisman?
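The core of the proposal could be as simple as the following sketch, which assumes the per-bin counts of observed and theoretically possible reflections have already been tabulated (the signature is hypothetical):

```python
import numpy as np

def compute_completeness(n_observed, n_possible):
    """Sketch: completeness per resolution bin as the ratio of observed
    to theoretically possible reflection counts."""
    n_observed = np.asarray(n_observed, dtype=float)
    n_possible = np.asarray(n_possible, dtype=float)
    return n_observed / n_possible
```

The harder part, enumerating the possible reflections to a resolution limit for a given cell and spacegroup, is where gemmi would come in.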
The following minimal snippet illustrates how to reproduce the issue. If a DataSet is constructed without an explicit index object, the constructor won't populate self._cache_index_dtypes. This has a lot of nasty side effects and exposes users to some pretty cryptic error messages.
import reciprocalspaceship as rs
import numpy as np
inFN = 'reciprocalspaceship/tests/data/algorithms/HEWL_SSAD_24IDC.mtz'
mtz = rs.read_mtz(inFN)
mtz = rs.DataSet({
'I' : mtz['IMEAN'],
'SigI' : mtz['SIGIMEAN'],
},
cell = mtz.cell,
spacegroup=mtz.spacegroup
)
The issue that tipped me off to this pathology was when I tried to call DataSet.infer_mtz_dtypes():
Traceback (most recent call last):
File "dtypes_bug.py", line 16, in <module>
mtz.infer_mtz_dtypes()
File "/home/kmdalton/opt/reciprocalspaceship/reciprocalspaceship/dataset.py", line 38, in wrapped
return f(ds.copy(), *args, **kwargs)
File "/home/kmdalton/opt/reciprocalspaceship/reciprocalspaceship/dataset.py", line 658, in infer_mtz_dtypes
self.reset_index(inplace=True, level=index_keys)
File "/home/kmdalton/opt/reciprocalspaceship/reciprocalspaceship/dataset.py", line 271, in reset_index
_handle_cached_dtypes(self, columns, drop)
File "/home/kmdalton/opt/reciprocalspaceship/reciprocalspaceship/dataset.py", line 265, in _handle_cached_dtypes
dtype = dataset._cache_index_dtypes.pop(key)
KeyError: 'H'
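The fix itself is small: the cache should be initialized unconditionally in the constructor, as in this minimal illustration (the attribute name is taken from the traceback above; the rest of the real constructor is elided):

```python
class DataSet:
    """Minimal sketch: the dtype cache exists whether or not an explicit
    index was passed, so later pops (e.g. in infer_mtz_dtypes) are safe."""
    def __init__(self, data=None, index=None):
        self._cache_index_dtypes = {}
        if index is not None:
            # ...populate the cache from the index's MTZ dtypes...
            pass
```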
Input:
import reciprocalspaceship as rs
dataset = rs.read_crystfel("example.stream")
Output:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-39-1212d53388f4> in <module>
----> 1 dataset = rs.read_crystfel("example.stream")
AttributeError: module 'reciprocalspaceship' has no attribute 'read_crystfel'
It can be useful to have normalized structure factors for certain applications, and we have all the ingredients needed to make this happen in different places. I think it would make sense to add a function that computes normalized structure factors to rs.algorithms, or perhaps as a built-in method of rs.DataSet.
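A hedged sketch of the core computation, assuming resolution-bin assignments are already available (as assign_resolution_bins would provide) and omitting epsilon (multiplicity) factors and centric corrections for brevity:

```python
import numpy as np

def normalized_structure_factors(F, bins):
    """Sketch: E = F / sqrt(<F^2>) computed within resolution bins."""
    F = np.asarray(F, dtype=np.float64)
    bins = np.asarray(bins)
    E = np.empty_like(F)
    for b in np.unique(bins):
        mask = bins == b
        # Normalize by the root-mean-square amplitude of the bin
        E[mask] = F[mask] / np.sqrt(np.mean(F[mask] ** 2))
    return E
```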
DataFrame.append() and Series.append() were deprecated in pandas v1.4. We should remove the overloaded methods from reciprocalspaceship to avoid future compatibility issues. rs.concat() already has the required functionality to replace them.
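The migration is mechanical, shown here with pd.concat as a stand-in for rs.concat (which follows the same interface):

```python
import pandas as pd

df1 = pd.DataFrame({"I": [1.0, 2.0]})
df2 = pd.DataFrame({"I": [3.0]})

# Replaces the deprecated df1.append(df2)
combined = pd.concat([df1, df2])
```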