Often times, I notice that SAD phasing solutions from isomorphous structures don't ove

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Yes, I think you've characterized the issue correctly, <a class="user-mention notransl

I can confirm that Searching a finite set of origins works to

A few thoughts: As a first pass to define the search space for

This looks really promising, and thank you for the source code. This should be v

Align Phases of Isomorphous Structures,about rs-station/reciprocalspaceship

Comments (26)

kmdalton commented on September 24, 2024 1

I realized that I had solved the 3D translation search problem with a simple iterative algorithm and forgot to post it publicly. I'm just going to put this code up here so it is public. @JBGreisman is working on putting this into rs.algorithms.

import numpy as np 
import reciprocalspaceship as rs
import gemmi



reference_mtz_file = "RTSAD_HEWL_refine_25.mtz"
mtz_file = "overall_best_denmod_map_coeffs.mtz"


ds = rs.read_mtz(reference_mtz_file).stack_anomalous().join(rs.read_mtz(mtz_file)).dropna()

dmin = -1.

ds = ds[ds.compute_dHKL().dHKL >= dmin]

H = ds.get_hkls().astype('float32')
phi_ref = ds['PHIF-model'].to_numpy()
phi = ds['PHWT'].to_numpy()


class TranslationSolver:
    def __init__(self, H, phi_ref, phi):
        """
        H : millers as an N x 3 numpy array
        phi_ref : reference phases in degrees as a numpy array length N
        phi : other phases in degrees as a numpy array length N
        """
        self.H = H
        self.phi_ref = phi_ref
        self.phi = phi
        self.t = None
        self.D = (np.deg2rad(phi) - np.deg2rad(phi_ref)) / (2. * np.pi)

        # Current voxel size in fractional coordinates
        self.grid_size = 0.5

        # Current bounding box which contains the solution
        # 3 ranges in fractional coordinates
        self.a_range = [0., 1.]
        self.b_range = [0., 1.]
        self.c_range = [0., 1.]

    def _fit(self):
        T = np.mgrid[
            self.a_range[0] : self.a_range[1] : self.grid_size,
            self.b_range[0] : self.b_range[1] : self.grid_size,
            self.c_range[0] : self.c_range[1] : self.grid_size,
        ].reshape((3, -1)).T
        loss = np.sum(np.square((self.D[:,None] - H@T.T) - np.round((self.D[:,None] - H@T.T))), axis=0)
        tbest = T[np.argmin(loss)]

        # Update grid size for next iteration
        self.grid_size = self.grid_size / 2.

        # Update bounding box around solution
        a,b,c = tbest
        self.a_range = [max(a-self.grid_size, 0.), min(a+self.grid_size, 1.)]
        self.b_range = [max(b-self.grid_size, 0.), min(b+self.grid_size, 1.)]
        self.c_range = [max(c-self.grid_size, 0.), min(c+self.grid_size, 1.)]
        self.t = tbest

        return tbest

    def fit(self, maxiter=10):
        for i in range(maxiter):
            t = self._fit()
        return t


ts = TranslationSolver(H, phi_ref, phi)
tbest = ts.fit()


out = rs.read_mtz(mtz_file)
H = out.get_hkls()
out['PHWT'] = rs.utils.canonicalize_phases(out['PHWT'] + np.rad2deg(2*np.pi*H@tbest))
out.write_mtz("translated.mtz")

from reciprocalspaceship.

apeck12 commented on September 24, 2024

@kmdalton Is the issue that both structures look correct but are rotated and translated with respect to each other? If that’s the case, then a full phase origin search isn’t needed (except in P1). There’s a limited number of equivalent phase origins for each space group — P212121, for example, has only 4 — so one could get away with just mapping the phases of the second structure onto each of those equivalent origins and checking for consistency with the first.

from reciprocalspaceship.

kmdalton commented on September 24, 2024

Yes, I think you've characterized the issue correctly, @apeck12. Do you know how we can generate the phase origins for a particular group programmatically? In cases like P21 or P61 without orthogonal symmetry elements, are there still finitely many phase origins, or are they restricted to a line?

from reciprocalspaceship.

kmdalton commented on September 24, 2024

I can confirm that

Searching a finite set of origins works to align phases in tetragonal lysozyme. I have started work on a new branch that implements the algorithm I tested.
Arbitrarily translating along the z-axis in spacegroup P63 leads to valid phase combinations and nice electron density.
Arbitrarily translating the phases along other axes does not result in interpretable density.

Based on this, I think we need a hybrid approach to cover all space groups. Space groups with orthogonal symmetry axes should be handled by the approach in 1. I'm afraid we will need to learn the translation vector by a grid search and/or optimization in cases like P63.

What edge cases am I missing?

How can we write effective tests for these new algorithms?

from reciprocalspaceship.

apeck12 commented on September 24, 2024

A few thoughts:

As a first pass to define the search space for a specific space group, I tried parsing the symmetry operations to detect polar axes and valid shifts along nonpolar axes. Polar axes are distinguished by having no symmetry operations that invert the axis (see “Polar Space Groups”). For nonpolar axes, my understanding is that permitted shifts are determined by the fractional translation associated with the symmetry operation that inverts the axis.

Here’s the code:

def find_equivalent_origins(sg_object, sampling_interval=0.1):
    """
    Determine fractional coordinates corresponding to equivalent phase
    origins for the space group of interest.
    Parameters
    ----------
    sg_object : cctbx object
        space group object
    sampling_interval : float (optional)
        search interval for polar axes as fractional cell length
    Returns
    -------
    eq_origins : array
        nx3 array of equivalent fractional origins 
    """
    
    permitted = dict()
    for axis in ['x','y','z']:
        permitted[axis] = []
    
    # add shift for each symmetry element that contains an inversion
    sym_ops = list(sg_object.smx())
    for op in sym_ops:
        op_as_xyz = op.as_xyz()
        if "-" in op_as_xyz:
            for element in op_as_xyz.split(","):
                if "-" in element: 
                    if any(i.isdigit() for i in element):
                        axis, frac = element.split("+")
                        permitted[axis[1]].append(float(frac[0]) / float(frac[2]))
                    else:
                        permitted[element.split("-")[1][0]].append(0)
    
    # eliminate redundant entries and expand polar axes
    for axis in permitted.keys():
        permitted[axis] = np.unique(permitted[axis])
        if len(permitted[axis]) == 0:
            permitted[axis] = np.arange(0,1,sampling_interval)
            
    # get all combinations from each axis
    eq_origins = np.stack(np.meshgrid(permitted['x'],
                                      permitted['y'],
                                      permitted['z'])).reshape((3, -1)).T
    
    return eq_origins

Here are plots for valid origin shifts versus a random shift (second row, last column) for a P212121 crystal:

And for a P61 crystal:

I examined a few other space groups with similar results, but this should be tested more exhaustively.

I think for some non-primitive cells, each of the equivalent origins also needs to be checked when translated by (1/2,1/2,1/2).
For experimental data, would it make sense to weight each phase by the I/sig(I) of the associated reflection?
One test would be to check consistency between centric reflections and their expected values.

from reciprocalspaceship.

kmdalton commented on September 24, 2024

This looks really promising, and thank you for the source code. This should be very easy to translate to use gemmi space groups.

What I had done in my branch was just to search the union of all origins available to all nonpolar groups. I have this implemented as

v = np.array([0, 1/6, 1/4, 1/3, 1/2, 2/3, 3/4, 5/6])
v = np.stack(np.meshgrid(v,v,v)).reshape((3, -1))

This is a tad inefficient, but prevents having to write anything space group specific. I think my little meshgrid should include all possible translations across all groups. When I tested that grid with my experimental maps, there was a clear winner. To be concrete, I had experimental SAD phases for tetragonal lysozyme (space group 96) that I wanted to align to an F-Model output from an isomorphous PDB.

Here are the residuals I was evaluating over my meshgrid:

Here are the phases after alignment:

^^ There might still be something confusing going on with centrics. I am not sure. I may need to look into it...

It is a little silly, but I think this could be a decent fallback implementation. I'll have a go at translating your code to rs/gemmi when i have a spare moment.

Easy enough to just always check that vector. It is not a very expensive calculation mercifully.
This is a really good idea. When we implement this in rs.algorithms we should have a weights= keyword.
You are thinking about something like equation 6 in your manuscript?

from reciprocalspaceship.

kmdalton commented on September 24, 2024

I looked into the centrics and scatter plot was just misleading. Turns out most of them are correct.

from reciprocalspaceship.

apeck12 commented on September 24, 2024

Here’s the Gemmi version of the find_equivalent_origins function that takes as input the space group number rather than a cctbx object:

def find_equivalent_origins(sg_no, sampling_interval=0.1):
    """
    Determine fractional coordinates corresponding to equivalent phase
    origins for the space group of interest.
    Parameters
    ----------
    sg_no : int
        space group number
    sampling_interval : float (optional)
        search interval for polar axes as fractional cell length
    Returns
    -------
    eq_origins : array
        nx3 array of equivalent fractional origins 
    """
    
    permitted = dict()
    for axis in ['x','y','z']:
        permitted[axis] = []
    
    # add shift for each symmetry element that contains an inversion
    sym_ops = gemmi.find_spacegroup_by_number(sg_no).operations()
    sym_ops = [op.triplet() for op in sym_ops]
    for op in sym_ops:
        if "-" in op:
            for element in op.split(","):
                if "-" in element: 
                    if any(i.isdigit() for i in element):
                        axis, frac = element.split("+")
                        permitted[axis[1]].append(float(frac[0]) / float(frac[2]))
                    else:
                        permitted[element.split("-")[1][0]].append(0)
    
    # eliminate redundant entries and expand polar axes
    for axis in permitted.keys():
        permitted[axis] = np.unique(permitted[axis])
        if len(permitted[axis]) == 0:
            permitted[axis] = np.arange(0,1,sampling_interval)
            
    # get all combinations from each axis
    eq_origins = np.stack(np.meshgrid(permitted['x'],
                                      permitted['y'],
                                      permitted['z'])).reshape((3, -1)).T
    
    return eq_origins

It seems like the main motivation for restricting the search is to avoid invalid origins since align_phases is very efficient. Though perhaps experimental data are always of high enough quality that an invalid origin will never be mistaken for a correct one based on the residuals. It's interesting to see some experimental data.

Eq. 6 is what I had in mind for the centric vs. expected values residual. I can try to write something for this.

from reciprocalspaceship.

apeck12 commented on September 24, 2024

Updated version of the above code, which accounts for centering:

def find_equivalent_origins(sg_no, sampling_interval=0.1):
    """
    Determine fractional coordinates corresponding to equivalent phase
    origins for the space group of interest.
    Parameters
    ----------
    sg_no : int
        space group number
    sampling_interval : float (optional)
        search interval for polar axes as fractional cell length
    Returns
    -------
    eq_origins : array
        nx3 array of equivalent fractional origins 
    """
    
    permitted = dict()
    for axis in ['x','y','z']:
        permitted[axis] = []
    
    # add shift for each symmetry element that contains an inversion
    sym_ops = gemmi.find_spacegroup_by_number(sg_no).operations()
    ops = [op.triplet() for op in sym_ops]
    for op in ops:
        if "-" in op:
            for element in op.split(","):
                if "-" in element: 
                    if any(i.isdigit() for i in element):
                        axis, frac = element.split("+")
                        permitted[axis[1]].append(float(frac[0]) / float(frac[2]))
                    else:
                        permitted[element.split("-")[1][0]].append(0)
    
    # eliminate redundant entries and expand polar axes
    for axis in permitted.keys():
        permitted[axis] = np.unique(permitted[axis])
        if len(permitted[axis]) == 0:
            permitted[axis] = np.arange(0,1,sampling_interval)
            
    # get all combinations from each axis
    eq_origins = np.stack(np.meshgrid(permitted['x'],
                                      permitted['y'],
                                      permitted['z'])).reshape((3, -1)).T
    
    # add centering operations
    cen_ops = np.array(list(sym_ops.cen_ops)) / 24.0
    eq_origins = eq_origins + cen_ops[:,np.newaxis]
    eq_origins = eq_origins.reshape(-1, eq_origins.shape[-1])
    eq_origins = np.delete(eq_origins, np.where(eq_origins==1)[0], axis=0)
    
    return np.unique(eq_origins, axis=0)

from reciprocalspaceship.

kmdalton commented on September 24, 2024

I can confirm that this code recovers [0.5, 0.5, 0.5] which is the correct translation vector for my experimental data in space group 96. Here's the full list:

[[0.   0.   0.  ]
 [0.   0.   0.25]
 [0.   0.   0.5 ]
 [0.   0.   0.75]
 [0.   0.5  0.  ]
 [0.   0.5  0.25]
 [0.   0.5  0.5 ]
 [0.   0.5  0.75]
 [0.5  0.   0.  ]
 [0.5  0.   0.25]
 [0.5  0.   0.5 ]
 [0.5  0.   0.75]
 [0.5  0.5  0.  ]
 [0.5  0.5  0.25]
 [0.5  0.5  0.5 ]
 [0.5  0.5  0.75]]

This is very cool, @apeck12. I will modify this slightly and add it to the branch.

I think I might add a little optimization routine to refine the translation vector after the grid search. I think we will probably need that to get precise translation vectors for these groups like P61.

from reciprocalspaceship.

kmdalton commented on September 24, 2024

Also, @apeck12, if you want to polish this up and submit a PR, I'd obviously be totally cool with that.

from reciprocalspaceship.

kmdalton commented on September 24, 2024

On second thought -- my experiments lead me to believe this function is probably too rough to optimize easily.

Apparently the global minimum is hilariously narrow. I wonder if the problem is better posed in real space (credit to @JBGreisman for suggesting)?

from reciprocalspaceship.

apeck12 commented on September 24, 2024

Curious, thanks for sharing this. I meant to look into doing optimization rather than a grid search some time ago but never got around to it, so this is interesting to see. We didn’t have better luck finding a common or crystallographic phase origin in real rather than reciprocal space, but our data might have been sufficiently different (very incomplete and not reduced) to be irrelevant to this case.

And thanks for adding the find_equivalent_origins function to phase.py. I just added another function for calculating the centric phase residuals in case that’s useful (and adjusted the test case in that script accordingly), but please feel free to revise or omit it as you see fit.

from reciprocalspaceship.

wojdyr commented on September 24, 2024

in the real space - would it be similar to Phenix find_alt_orig_sym_mate and CCP4 csymmatch?

from reciprocalspaceship.

kmdalton commented on September 24, 2024

@wojdyr, the goal is to align two electron density maps directly without a coordinate file. I cannot find a function that does this anywhere. If you know of one, please let us know. Thanks!

from reciprocalspaceship.

kmdalton commented on September 24, 2024

Come to think, I guess this problem is pretty much molecular replacement without the rotation search component.

from reciprocalspaceship.

wojdyr commented on September 24, 2024

I have a naive question: the equivalent origins here are not the same thing as equivalent origins mentioned in
http://legacy.ccp4.ac.uk/html/alternate_origins.html ?

from reciprocalspaceship.

kmdalton commented on September 24, 2024

Yes, those are exactly what we're trying to account for. The problem is that many groups, take for instance P21 have infinitely many alternate origins corresponding to translations along a particular axis. What @apeck12 's snippet does is infer the translation vectors mapping to possible alternate origins. If there are infinitely many possibilities, the function will construct a grid over them with some sampling rate. Does that answer your question?

Also, thanks for that webpage. I had been struggling to find something like that.

from reciprocalspaceship.

kmdalton commented on September 24, 2024

I am afraid the residuals don't look much better in real space.

These are real data from a pair of density modified experimental SAD maps from AutoSol. They are off by a [0.5, 0.5, 0.5] translation. The maps are from the same underlying data set merged with drastically different software. I line-scanned each of the axes keeping the other two fixed at their optimal values.

This is just to say that real space doesn't seem any more promising unfortunately.

from reciprocalspaceship.

JBGreisman commented on September 24, 2024

I'm not quite sure I understand the distinction between red/black entries in the CCP4 table linked by @wojdyr. @apeck12's function for generating possible alternate origins seems to agree nicely with the table for sg19:

from reciprocalspaceship.algorithms.phase import find_equivalent_origins # phalign branch
import gemmi

find_equivalent_origins(gemmi.SpaceGroup(19))

Output:

array([[0. , 0. , 0. ],
       [0. , 0. , 0.5],
       [0. , 0.5, 0. ],
       [0. , 0.5, 0.5],
       [0.5, 0. , 0. ],
       [0.5, 0. , 0.5],
       [0.5, 0.5, 0. ],
       [0.5, 0.5, 0.5]])

However, for sg4, it seems to only produce the first of the red entries:

find_equivalent_origins(gemmi.SpaceGroup(4), sampling_interval=0.25)

Output:

array([[0.  , 0.  , 0.  ],
       [0.  , 0.25, 0.  ],
       [0.  , 0.5 , 0.  ],
       [0.  , 0.75, 0.  ]])

Does the function miss some alternate origins? Or am I misunderstanding the distinction between red/black entries in the CCP4 table?

from reciprocalspaceship.

apeck12 commented on September 24, 2024

@JBGreisman, thanks for catching this discrepancy. Examining space group 4 specifically, the space group diagram for the primitive cell indicates only a polar y axis and no alternative origins:

http://img.chem.ucl.ac.uk/sgp/large/004ay1.htm

For the unconventional B-type lattice, there’s an additional translation that corresponds to one of the alternative origins listed in the CCP4 table:

http://img.chem.ucl.ac.uk/sgp/large/004dy1.htm

I haven’t encountered this before, but according to the fourth slide here, monoclinic B is a transformation of the primitive monoclinic lattice with twice the unit cell volume.

I’m not sure how to retrieve the full set of alternative origins listed in the CCP4 table from the space group’s symmetry operations alone. Perhaps the best approach would be to search the union of all valid nonpolar origins, amended for any polar axes, as @kmdalton described earlier in this issue. However, if some of the origins listed in that table correspond to unconventional lattices, would reindexing potentially be required to find the origin that aligns the phases?

from reciprocalspaceship.

JBGreisman commented on September 24, 2024

This has been dormant for a while, but I just wanted to summarize where I think things stand:

Non-polar spacegroups: we can use the function above written by @apeck12 to generate the possible alternate origins (or we can just brute-force the superset of all possible ones), and the minimal residual between phases (or a similar metric) should give us the appropriate shift in origin.
Polar spacegroups: these are a bit trickier, because of the continuous axis (or axes). As shown by @kmdalton, the residuals are non-trivial to use for optimization, but the appropriate solution does show very distinct minimum.

We could start out by restricting this function to non-polar spacegroups. I know how to classify spacegroups as polar or not in sgtbx (sgtbx.space_group_info(i).number_of_continuous_allowed_origin_shifts()), but is there any way to do this in gemmi?

from reciprocalspaceship.

JBGreisman commented on September 24, 2024

For reference, here is an updated link with alternate origins for each spacegroup: https://www.ccp4.ac.uk/html/alternate_origins_allgroups.html

from reciprocalspaceship.

JBGreisman commented on September 24, 2024

Here's a short snippet that uses gemmi to classify spacegroups as polar, and checks it against the 530 spacegroup settings in sgtbx -- mostly adding this here in case it proves useful:

import numpy as np
import gemmi
from cctbx import sgtbx

def is_polar(sg):
    """Returns whether spacegroup is polar"""
    sym_ops = sg.operations().sym_ops
    a  = np.array([ op.rot for op in sym_ops ])
    return ~(a < 0).any(axis=2).any(axis=0).all()

# Check against sgtbx
for sg in sgtbx.space_group_symbol_iterator():
    sgtbx_polar = sgtbx.space_group(sg).info().number_of_continuous_allowed_origin_shifts() > 0
    if sgtbx_polar != is_polar(gemmi.SpaceGroup(sg.universal_hermann_mauguin())):
        # Will only print spacegroup if there's a disagreement
        print(sg.universal_hermann_mauguin())

from reciprocalspaceship.

kmdalton commented on September 24, 2024

Here's an example using a fourier peak search to align phases (see here):

import numpy as np
import reciprocalspaceship as rs
import gemmi

t_true = np.array([0.205, 0.66666666666666666, 0.7512])

# Read MTZ
reference_mtz_file = "4lzt_phases.mtz"
ds = rs.read_mtz(reference_mtz_file).dropna().hkl_to_asu().canonicalize_phases()
dmin = -1
ds = ds[ds.compute_dHKL().dHKL >= dmin]

ds_trans = ds.copy()

# Add jitter to phases
ds_trans["PHWT"] = ds_trans["PHWT"] + np.random.normal(scale=15.0, size=len(ds))
ds_trans.canonicalize_phases(inplace=True)

# Add jitter to structure factors
ds_trans["FP"] = ds_trans["FP"] + np.random.normal(scale=0.1 * ds_trans['FP'], size=len(ds))

# Translate Phi by t_true
H = ds.get_hkls().astype("float32")
ds_trans["PHWT"] = rs.utils.canonicalize_phases(
    ds_trans["PHWT"] + np.rad2deg(2 * np.pi * H @ t_true)
)

phi_ref = ds["PHWT"].to_numpy()
phi = ds_trans["PHWT"].to_numpy()

# Based on: https://my.ece.msstate.edu/faculty/du/JSTARS-Registration.pdf
# Use phase correlations
dmin = None
sample_rate=3.
f_key = 'FP'
phi_key = 'PHWT'
ds['PC'] = np.conj(ds.to_structurefactor(f_key, phi_key)) * ds_trans.to_structurefactor(f_key, phi_key)
ds['PC'] = ds['PC'] / ds['PC'].abs()

if dmin is None:
    dmin = ds.compute_dHKL().dHKL.min()

gridsize = np.ceil(
    sample_rate * np.array(
        ds.cell.get_hkl_limits(dmin)
    )
).astype('int')
pc_grid = ds.to_reciprocalgrid('PC', gridsize)

real = np.abs(np.fft.fftn(pc_grid))
t = np.concatenate(np.where(real == real.max())) / real.shape

print(f"Translation vector from Fourier peak search: {t}")

This is based on a lysozyme example from @JBGreisman. The mtz to run the script is:
4lzt_phases.mtz.zip

I would love to do this with gemmi.find_blobs_by_flood_fill, but it just doesn't seem to work with these data. I think perhaps the peak is too sharp and has too many fourier ripples to work well. Or I just haven't figured out the right parameters.

from reciprocalspaceship.

JBGreisman commented on September 24, 2024

Just to close up this discussion -- as per the above, a good way to think of this is as an image registration problem. Conveniently, scikit-image supports this and the function, skimage.registration. phase_cross_correlation() works well for this task. It also works in reciprocal space using a 3D complex grid of the data if given space="fourier".

Though I have the outline of a solution here (#174), this seems better suited for rs-booster -- that's the library where we will be putting more applications-based tasks to separate them from the core library.

from reciprocalspaceship.

Align Phases of Isomorphous Structures about reciprocalspaceship HOT 26 OPEN

Comments (26)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent