Giter VIP home page Giter VIP logo

Comments (10)

DHekstra avatar DHekstra commented on July 28, 2024 1

OK, after some consideration, I agree with all of these design decisions. I agree that the commandline application supports this case already well. Part of the issue came from me wanting to call a function in rsbooster.stats.ccanom directly from in a notebook (which works fine as long as one is aware of the above). My goal is to provide some better support for Careless users who are trying to understand how to interpret the Careless output.

from reciprocalspaceship.

DHekstra avatar DHekstra commented on July 28, 2024

Ihis would be helpful for CCanom calculation on Careless xval#.mtz files using rsbooster.stats.ccanom.analyze_ccanom_mtz() directly. It is obviously not essential, since I could just use the make_halves_ccanom function in the same location first, but it seems undesirable for the default behavior to be to randomly pair repeats.

from reciprocalspaceship.

kmdalton avatar kmdalton commented on July 28, 2024

i have a feeling this is better handled at the top level by the user with a ds.groupby("repeat").apply(lambda x: x.unstack_anomalous(...)) or some similar formula. perhaps share the code you're working on and we can recommend a strategy.

from reciprocalspaceship.

DHekstra avatar DHekstra commented on July 28, 2024

the workaround in this case is quite simple--the problem only occurs if I stack and then unstack--then the repeats no longer match up. So the workaround is not to stack in the first place.

from reciprocalspaceship.

kmdalton avatar kmdalton commented on July 28, 2024

the stack / unstack_anomalous methods are not meant to be called on unmerged data. in this case, you may have several merged datasets which are concatenated inside one object. this means the miller indices have repeated values. hence, semantically they are the same as unmerged data. these methods should not be called in this situation, it may lead to a number of unpleasant side effects. the proper way to handle this is to scope each of the method calls within a groupby operator so that the miller indices are not redundant in each set.

for this reason, both methods raise a ValueError when called on unmerged data. the issue is that this is contingent on checking the dataset.merged attribute which is, in this case set to True which might have been a bad design choice in careless. on the rs end, we should probably directly test for redundant miller indices rather than rely on dataset.merged. I wonder what @JBGreisman thinks.

from reciprocalspaceship.

DHekstra avatar DHekstra commented on July 28, 2024

Good point. I'm going the groupby route for now, going off what is implemented in rsbooster.stats.anom.py.

from reciprocalspaceship.

DHekstra avatar DHekstra commented on July 28, 2024

my initial take was that the "repeat" was a sufficiently common case (for me, anyways) that we might want to implement a provision for it.

from reciprocalspaceship.

kmdalton avatar kmdalton commented on July 28, 2024

yeah i think this use case is going to be pretty unusual outside the hekstra lab. i don't think any other program besides careless makes this sort of data structure. this is maybe an indicator that careless shouldn't make these sorts of files 🤔

from reciprocalspaceship.

JBGreisman avatar JBGreisman commented on July 28, 2024

My personal feeling is that rs should be kept agnostic to careless decisions regarding column naming. More broadly, I try to avoid any hard-coded column names, because I think that's a recipe for problematic corner cases and "user surprise." There are certainly exceptions for H, K, and L and some internal column names, but generally I'd like to avoid hardcoding a case for repeat.

As @kmdalton said, in my mind, the xval mtz that gets output should be considered "unmerged" because it has repeat Miller indices. To my knowledge, CCanom can be computed from careless output without any use of {stack/unstack}_anomalous -- there is a commandline tool rs.ccanom provided in rsbooster that should be applicable to careless xval output without any modification needed. If you're running into cases of careless output that are unsupported by it, please let me know or file a ticket on rsbooster. rs.ccanom -h can be used to see the arguments it takes. Its internals can also be used as a framework for implementing your own function if that is more useful.

Let me know if there's anything else I can provide as far as useful snippets/templates

from reciprocalspaceship.

DHekstra avatar DHekstra commented on July 28, 2024

For future reference, this is a productive work-around:

half_repeats=[]
for repeat in out.repeat.unique():
    for half in range(2):
        half_repeat=tmp.loc[(tmp.repeat==repeat) & (tmp.half==half),["F","SigF","I","SigI","N","high$
        half_repeat=half_repeat.unstack_anomalous()
        half_repeat["half"]=half
        half_repeat["half"]=half_repeat["half"].astype('MTZInt')
        half_repeat["repeat"]=repeat
        half_repeat["repeat"]=half_repeat["repeat"].astype('MTZInt')
        half_repeats.append(half_repeat)

out2=rs.concat(half_repeats)

from reciprocalspaceship.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.