Comments (10)
OK, after some consideration, I agree with all of these design decisions. I agree that the command-line application already supports this case well. Part of the issue came from me wanting to call a function in rsbooster.stats.ccanom directly from within a notebook (which works fine as long as one is aware of the above). My goal is to provide some better support for Careless users who are trying to understand how to interpret the Careless output.
from reciprocalspaceship.
This would be helpful for CCanom calculation on Careless xval#.mtz files using rsbooster.stats.ccanom.analyze_ccanom_mtz() directly. It is obviously not essential, since I could just use the make_halves_ccanom function in the same location first, but it seems undesirable for the default behavior to be to randomly pair repeats.
from reciprocalspaceship.
i have a feeling this is better handled at the top level by the user with a `ds.groupby("repeat").apply(lambda x: x.unstack_anomalous(...))` or some similar formula. perhaps share the code you're working on and we can recommend a strategy.
from reciprocalspaceship.
the workaround in this case is quite simple--the problem only occurs if I stack and then unstack--then the repeats no longer match up. So the workaround is not to stack in the first place.
from reciprocalspaceship.
the `stack_anomalous` / `unstack_anomalous` methods are not meant to be called on unmerged data. in this case, you may have several merged datasets which are concatenated inside one object. this means the miller indices have repeated values, so semantically they are the same as unmerged data. these methods should not be called in this situation, since it may lead to a number of unpleasant side effects. the proper way to handle this is to scope each of the method calls within a groupby operator so that the miller indices are not redundant in each set.
for this reason, both methods raise a `ValueError` when called on unmerged data. the issue is that this is contingent on checking the `dataset.merged` attribute, which in this case is set to `True`. that might have been a bad design choice in careless. on the `rs` end, we should probably test directly for redundant miller indices rather than rely on `dataset.merged`. I wonder what @JBGreisman thinks.
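To make the idea of testing for redundant Miller indices concrete, here is a minimal, library-free sketch. It is not the `rs` implementation: the helper names (`has_redundant_hkl`, `group_by`) and the toy rows are hypothetical, and the data are plain dictionaries rather than an `rs.DataSet`. It shows that the indices are redundant across the concatenated object but unique within each `repeat` group, which is why scoping calls inside a groupby avoids the problem.

```python
from collections import Counter, defaultdict


def has_redundant_hkl(rows):
    """Return True if any (H, K, L) triple occurs more than once in rows."""
    counts = Counter((r["H"], r["K"], r["L"]) for r in rows)
    return any(n > 1 for n in counts.values())


def group_by(rows, key):
    """Split rows into lists keyed by the value of the given column (hypothetical helper)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    return dict(groups)


# Toy stand-in for two merged datasets concatenated into one object.
rows = [
    {"H": 1, "K": 0, "L": 0, "repeat": 0, "F": 10.0},
    {"H": 1, "K": 1, "L": 0, "repeat": 0, "F": 12.0},
    {"H": 1, "K": 0, "L": 0, "repeat": 1, "F": 10.5},
    {"H": 1, "K": 1, "L": 0, "repeat": 1, "F": 11.5},
]

# Redundant overall, so the object behaves like unmerged data.
redundant_overall = has_redundant_hkl(rows)

# Not redundant within each repeat, so per-group calls would be well defined.
redundant_per_repeat = {
    rep: has_redundant_hkl(group) for rep, group in group_by(rows, "repeat").items()
}
```

A check of this kind depends only on the indices themselves, so it would not be fooled by a `merged` flag that was set to `True` on a concatenated file.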
from reciprocalspaceship.
Good point. I'm going the `groupby` route for now, going off what is implemented in `rsbooster.stats.anom.py`.
from reciprocalspaceship.
my initial take was that the `"repeat"` was a sufficiently common case (for me, anyways) that we might want to implement a provision for it.
from reciprocalspaceship.
yeah i think this use case is going to be pretty unusual outside the hekstra lab. i don't think any other program besides careless makes this sort of data structure. this is maybe an indicator that careless shouldn't make these sorts of files 🤔
from reciprocalspaceship.
My personal feeling is that `rs` should be kept agnostic to `careless` decisions regarding column naming. More broadly, I try to avoid any hard-coded column names, because I think that's a recipe for problematic corner cases and "user surprise." There are certainly exceptions for `H`, `K`, and `L` and some internal column names, but generally I'd like to avoid hardcoding a case for `repeat`.
As @kmdalton said, in my mind, the `xval` mtz that gets output should be considered "unmerged" because it has repeat Miller indices. To my knowledge, `CCanom` can be computed from careless output without any use of `{stack/unstack}_anomalous` -- there is a commandline tool `rs.ccanom` provided in `rsbooster` that should be applicable to careless `xval` output without any modification needed. If you're running into cases of careless output that are unsupported by it, please let me know or file a ticket on `rsbooster`. `rs.ccanom -h` can be used to see the arguments it takes. Its internals can also be used as a framework for implementing your own function if that is more useful.
Let me know if there's anything else I can provide as far as useful snippets/templates.
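As a rough illustration of why no stacking or unstacking is needed, the sketch below computes a CCanom-style statistic per repeat from two half-datasets using only the standard library. It is not the `rs.ccanom` implementation. The data layout (a dict mapping `(H, K, L)` to `(I_plus, I_minus)`), the function names, and the toy values are assumptions made for illustration, and Pearson correlation is used here purely for simplicity.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ccanom(half1, half2):
    """Correlate anomalous differences (I+ - I-) over reflections common to both halves.

    half1, half2: dict mapping (H, K, L) -> (I_plus, I_minus)  (assumed layout)
    """
    common = sorted(set(half1) & set(half2))
    d1 = [half1[hkl][0] - half1[hkl][1] for hkl in common]
    d2 = [half2[hkl][0] - half2[hkl][1] for hkl in common]
    return pearson(d1, d2)


def ccanom_by_repeat(data):
    """data: dict mapping repeat -> (half1, half2); each repeat is handled on its own."""
    return {rep: ccanom(h1, h2) for rep, (h1, h2) in data.items()}


# Toy values chosen so the expected correlations are obvious.
data = {
    0: (
        {(1, 0, 0): (10.0, 8.0), (1, 1, 0): (5.0, 6.0), (2, 0, 0): (7.0, 4.0)},
        {(1, 0, 0): (14.0, 10.0), (1, 1, 0): (8.0, 10.0), (2, 0, 0): (16.0, 10.0)},
    ),
    1: (
        {(1, 0, 0): (10.0, 8.0), (1, 1, 0): (5.0, 6.0), (2, 0, 0): (7.0, 4.0)},
        {(1, 0, 0): (8.0, 10.0), (1, 1, 0): (6.0, 5.0), (2, 0, 0): (4.0, 7.0)},
    ),
}

result = ccanom_by_repeat(data)
```

Because each repeat is processed separately, halves from different repeats are never paired with one another.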
from reciprocalspaceship.
For future reference, this is a productive work-around:
    half_repeats=[]
    for repeat in out.repeat.unique():
        for half in range(2):
            half_repeat=tmp.loc[(tmp.repeat==repeat) & (tmp.half==half),["F","SigF","I","SigI","N","high$
            half_repeat=half_repeat.unstack_anomalous()
            half_repeat["half"]=half
            half_repeat["half"]=half_repeat["half"].astype('MTZInt')
            half_repeat["repeat"]=repeat
            half_repeat["repeat"]=half_repeat["repeat"].astype('MTZInt')
            half_repeats.append(half_repeat)
    out2=rs.concat(half_repeats)
from reciprocalspaceship.