Unstack anomalous taking into account Careless repeats (reciprocalspaceship issue, closed)

DHekstra commented on September 2, 2024

Unstack anomalous taking into account Careless repeats

Comments (10)

DHekstra commented on September 2, 2024

OK, after some consideration, I agree with all of these design decisions. I also agree that the command-line application already supports this case well. Part of the issue came from my wanting to call a function in rsbooster.stats.ccanom directly from a notebook (which works fine as long as one is aware of the above). My goal is to provide better support for Careless users who are trying to understand how to interpret the Careless output.

DHekstra commented on September 2, 2024

This would be helpful for CCanom calculation on Careless xval#.mtz files using rsbooster.stats.ccanom.analyze_ccanom_mtz() directly. It is obviously not essential, since I could just call the make_halves_ccanom function in the same location first, but it seems undesirable for the default behavior to pair repeats randomly.

kmdalton commented on September 2, 2024

I have a feeling this is better handled at the top level by the user, with ds.groupby("repeat").apply(lambda x: x.unstack_anomalous(...)) or some similar formula. Perhaps share the code you're working on and we can recommend a strategy.
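
The groupby suggestion amounts to calling the unstack once per group, so that the Miller indices are unique within each call. Below is a minimal plain-pandas sketch of that pattern. It is not rs code: toy_unstack_anomalous and the friedel label column are hypothetical stand-ins for DataSet.unstack_anomalous, which also handles space-group and ASU bookkeeping that is skipped here. The grouping column is passed in as a parameter rather than hard-coded.

```python
import pandas as pd


def toy_unstack_anomalous(ds):
    """Hypothetical stand-in for unstack_anomalous on ONE merged dataset.

    Assumes a "friedel" column labelling each row "+" or "-", with Miller
    indices already expressed in the same (Friedel-plus) convention.
    """
    plus = ds[ds["friedel"] == "+"].drop(columns="friedel")
    minus = ds[ds["friedel"] == "-"].drop(columns="friedel")
    # Pair Friedel mates on their shared Miller index.
    return plus.merge(minus, on=["H", "K", "L"], how="outer", suffixes=("(+)", "(-)"))


def unstack_per_group(ds, by):
    """Unstack each group separately so H, K, L are unique within every call."""
    parts = []
    for key, group in ds.groupby(by):
        part = toy_unstack_anomalous(group.drop(columns=by))
        part[by] = key  # restore the grouping label on the unstacked rows
        parts.append(part)
    return pd.concat(parts, ignore_index=True)


# Two concatenated merged datasets ("repeat" 0 and 1) sharing Miller index (1, 0, 0).
ds = pd.DataFrame({
    "repeat": [0, 0, 1, 1],
    "H": [1, 1, 1, 1],
    "K": [0, 0, 0, 0],
    "L": [0, 0, 0, 0],
    "friedel": ["+", "-", "+", "-"],
    "F": [10.0, 11.0, 10.5, 11.5],
})
out = unstack_per_group(ds, "repeat")
print(out)
```

Each repeat yields exactly one row for (1, 0, 0), with F(+) and F(-) taken from the same repeat.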

DHekstra commented on September 2, 2024

The workaround in this case is quite simple: the problem only occurs if I stack and then unstack, at which point the repeats no longer match up. So the workaround is not to stack in the first place.

kmdalton commented on September 2, 2024

The stack_anomalous / unstack_anomalous methods are not meant to be called on unmerged data. In this case, you may have several merged datasets concatenated inside one object, which means the Miller indices have repeated values. Semantically, this is the same as unmerged data. These methods should not be called in this situation; doing so may lead to a number of unpleasant side effects. The proper way to handle this is to scope each method call within a groupby operation so that the Miller indices are not redundant within each set.
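
A toy illustration of one such side effect, in plain pandas. It assumes, as a simplification rather than a description of rs internals, that unstacking reduces to joining Friedel-plus and Friedel-minus rows on their Miller index. With two concatenated merged datasets sharing an index, the join pairs every plus row with every minus row, so half of the resulting pairs mix repeats; adding the grouping column to the join key plays the role of the groupby scoping.

```python
import pandas as pd

# Friedel-plus and Friedel-minus halves of two concatenated merged datasets
# ("repeat" 0 and 1) that share the same Miller index (1, 0, 0).
plus = pd.DataFrame({"repeat": [0, 1], "H": [1, 1], "K": [0, 0], "L": [0, 0], "F": [10.0, 10.5]})
minus = pd.DataFrame({"repeat": [0, 1], "H": [1, 1], "K": [0, 0], "L": [0, 0], "F": [11.0, 11.5]})

# Joining on H, K, L alone: every plus row matches every minus row (2 x 2 = 4 rows).
naive = plus.merge(minus, on=["H", "K", "L"], suffixes=("(+)", "(-)"))
mismatched = (naive["repeat(+)"] != naive["repeat(-)"]).sum()

# Scoping the join by repeat keeps each Friedel pair within its own dataset.
scoped = plus.merge(minus, on=["repeat", "H", "K", "L"], suffixes=("(+)", "(-)"))

print(len(naive), mismatched, len(scoped))  # 4 2 2
```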

For this reason, both methods raise a ValueError when called on unmerged data. The issue is that this depends on checking the dataset.merged attribute, which in this case is set to True; that might have been a bad design choice in Careless. On the rs end, we should probably test directly for redundant Miller indices rather than rely on dataset.merged. I wonder what @JBGreisman thinks.
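
A direct check for redundant Miller indices could look like the hypothetical helpers below; the names has_redundant_miller_indices and check_unique_hkl are illustrative and not part of the rs API. The check only catches literally repeated (H, K, L) triples, which is the Careless xval case; detecting symmetry-equivalent observations in general unmerged data would require mapping indices to the ASU first.

```python
import pandas as pd


def has_redundant_miller_indices(ds, hkl=("H", "K", "L")):
    """Return True if any (H, K, L) triple occurs more than once.

    Hypothetical helper: reads H, K, L from columns if present, otherwise
    from index levels with those names.
    """
    cols = list(hkl)
    if all(c in ds.columns for c in cols):
        table = ds[cols]
    else:
        table = ds.index.to_frame(index=False)[cols]
    return bool(table.duplicated().any())


def check_unique_hkl(ds):
    """Raise on redundant indices instead of trusting a `merged` flag."""
    if has_redundant_miller_indices(ds):
        raise ValueError("Redundant Miller indices; scope the call with groupby first.")


# Two repeats of the same two reflections, as in a Careless xval file.
df = pd.DataFrame({
    "repeat": [0, 0, 1, 1],
    "H": [1, 2, 1, 2],
    "K": [0, 0, 0, 0],
    "L": [0, 0, 0, 0],
    "F": [1.0, 2.0, 3.0, 4.0],
})
print(has_redundant_miller_indices(df))                       # True
print(has_redundant_miller_indices(df[df["repeat"] == 0]))    # False
```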

DHekstra commented on September 2, 2024

Good point. I'm taking the groupby route for now, following what is implemented in rsbooster.stats.anom.py.

DHekstra commented on September 2, 2024

My initial take was that the "repeat" case was sufficiently common (for me, anyway) that we might want to add a provision for it.

kmdalton commented on September 2, 2024

Yeah, I think this use case is going to be pretty unusual outside the Hekstra lab. I don't think any program besides Careless produces this sort of data structure. This may be an indicator that Careless shouldn't make these sorts of files 🤔

JBGreisman commented on September 2, 2024

My personal feeling is that rs should be kept agnostic to Careless decisions regarding column naming. More broadly, I try to avoid any hard-coded column names, because I think that's a recipe for problematic corner cases and "user surprise." There are certainly exceptions for H, K, and L and some internal column names, but generally I'd like to avoid hard-coding a case for repeat.

As @kmdalton said, the xval MTZ that Careless outputs should, in my mind, be considered "unmerged" because it has repeated Miller indices. To my knowledge, CCanom can be computed from Careless output without any use of {stack/unstack}_anomalous: rsbooster provides a command-line tool, rs.ccanom, that should be applicable to Careless xval output without any modification. If you're running into cases of Careless output that it does not support, please let me know or file an issue on rsbooster. rs.ccanom -h lists the arguments it takes. Its internals can also be used as a framework for implementing your own function, if that is more useful.
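
For anyone building their own function, here is a minimal pandas sketch of the idea behind CCanom: correlate the anomalous differences F(+) - F(-) between the two halves of each repeat. It is not rsbooster's implementation. The helper name ccanom_per_repeat, the column names F(+), F(-), half and repeat, the half labels 0 and 1, and the use of a Pearson correlation are all assumptions.

```python
import pandas as pd


def ccanom_per_repeat(ds, by="repeat", half="half"):
    """Pearson CC of anomalous differences between half-datasets, per group.

    Hypothetical helper. Assumes unstacked columns "F(+)" and "F(-)",
    Miller indices in columns H, K, L, and half labels 0 and 1.
    """
    delta = ds.assign(DF=ds["F(+)"] - ds["F(-)"])
    # One row per (group, H, K, L), one column per half; pivot raises if an
    # index repeats within a half, which would signal mispaired data.
    wide = delta.pivot(index=[by, "H", "K", "L"], columns=half, values="DF")
    # Keep only reflections with an anomalous difference in both halves.
    wide = wide.dropna()
    return wide.groupby(level=by).apply(lambda g: g[0].corr(g[1]))


# Toy data: in repeat 0 the halves agree, in repeat 1 they are anti-correlated.
example = {0: ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),
           1: ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])}
rows = []
for repeat, halves in example.items():
    for half_id, diffs in enumerate(halves):
        for h, d in zip([1, 2, 3], diffs):
            rows.append({"repeat": repeat, "half": half_id, "H": h, "K": 0, "L": 0,
                         "F(+)": 10.0 + d, "F(-)": 10.0})
cc = ccanom_per_repeat(pd.DataFrame(rows))
print(cc)
```

Using pivot rather than pivot_table is deliberate in this sketch: it fails loudly on duplicated indices instead of silently averaging them.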

Let me know if there's anything else I can provide in the way of useful snippets or templates.

DHekstra commented on September 2, 2024

For future reference, this is a productive work-around:

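```python
import reciprocalspaceship as rs

# `tmp` holds the concatenated Careless xval output, with "repeat" and "half" columns
half_repeats = []
for repeat in tmp["repeat"].unique():
    for half in range(2):
        # Select a single merged half-dataset so Miller indices are not redundant.
        # (The original column list was truncated after "N", at "high...")
        half_repeat = tmp.loc[(tmp["repeat"] == repeat) & (tmp["half"] == half),
                              ["F", "SigF", "I", "SigI", "N"]]
        half_repeat = half_repeat.unstack_anomalous()
        # Restore the bookkeeping columns as MTZ integers
        half_repeat["half"] = half
        half_repeat["half"] = half_repeat["half"].astype("MTZInt")
        half_repeat["repeat"] = repeat
        half_repeat["repeat"] = half_repeat["repeat"].astype("MTZInt")
        half_repeats.append(half_repeat)

out2 = rs.concat(half_repeats)
```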
