Comments (8)
Gemmi will have a function to calculate completeness at some point, but I don't know when.
from reciprocalspaceship.
Yes, I think this is a good idea. A stats subdirectory seems like a good place for such functions to live.
from reciprocalspaceship.
I would mark this as low-priority, FWIW.
I did do a simple implementation previously (https://github.com/Hekstra-Lab/double-wilson/blob/main/old/DoubleWIlsonExplorations.ipynb), but it's kind of slow.
from reciprocalspaceship.
@DHekstra, I think I need this for processing the GFP data.
from reciprocalspaceship.
Sounds good. Multiplicity and completeness would be useful stats to be able to compute, and could live in rs.stats. Computing those statistics using the groupby functionality in pandas will also make for a rather convenient way to summarize over resolution bins.
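As a rough sketch of the groupby idea above, here is how per-bin completeness and multiplicity could be summarized with plain pandas. The data are made up, the column names (n, bin, observed) are illustrative, and multiplicity is taken here to mean the average number of measurements per observed reflection, which is one common convention and an assumption on my part. Nothing below is an existing rs.stats function.

```python
import pandas as pd

# Toy data: one row per unique reflection in the reciprocal ASU.
# "n" is the number of times the reflection was measured (0 = unobserved),
# "bin" is a precomputed resolution-bin index. Both columns are made up.
df = pd.DataFrame(
    {
        "n":   [3, 0, 2, 1, 0, 0, 4, 2],
        "bin": [0, 0, 0, 0, 1, 1, 1, 1],
    }
)
df["observed"] = df["n"] > 0

# Named aggregation: one output column per statistic, one row per bin.
stats = df.groupby("bin").agg(
    n_possible=("observed", "count"),   # reflections expected in the bin
    n_observed=("observed", "sum"),     # reflections measured at least once
    n_measurements=("n", "sum"),        # total measurements in the bin
)
stats["completeness"] = stats["n_observed"] / stats["n_possible"]
stats["multiplicity"] = stats["n_measurements"] / stats["n_observed"]
print(stats)
```

With this toy input, bin 0 comes out at completeness 0.75 and multiplicity 2.0, and bin 1 at completeness 0.5 and multiplicity 3.0.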
from reciprocalspaceship.
I will go ahead and implement something to get us started.
from reciprocalspaceship.
#60 introduces low-level tools in rs.utils that can be used to address this issue. However, we still need to expose them, either as part of the top-level API or through a convenience function in the rs.stats module.
from reciprocalspaceship.
Here is a snippet that uses the stats function introduced in #60 and packages the results as a DataSet indexed by resolution bin. I'm just putting this here as a placeholder snippet for when I get around to implementing this:
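import reciprocalspaceship as rs

# `mtz` is assumed to be an rs.DataSet that was loaded earlier
# Count how many times each reflection in the ASU was observed
h, counts = rs.utils.compute_redundancy(mtz.get_hkls(), mtz.cell, mtz.spacegroup)
ds = rs.DataSet({"n": counts, "H": h[:, 0], "K": h[:, 1], "L": h[:, 2]}, spacegroup=mtz.spacegroup, cell=mtz.cell)
ds.set_index(["H", "K", "L"], inplace=True)
# Split the reflections into 10 resolution bins
ds, labels = ds.assign_resolution_bins(10)
# A reflection counts as observed if it was measured at least once
ds["observed"] = ds["n"] > 0
# Per bin: "sum" is the number observed, "count" is the number possible
result = ds.groupby("bin")["observed"].agg(["sum", "count"])
result["completeness"] = result["sum"] / result["count"]
result.index = labels
result["completeness"]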
The resulting completeness column looks like this (for an example dataset that is admittedly low-completeness):
99.53 - 3.70 0.736536
3.70 - 2.93 0.779202
2.93 - 2.55 0.797380
2.55 - 2.31 0.807513
2.31 - 2.15 0.820326
2.15 - 2.02 0.825619
2.02 - 1.92 0.835808
1.92 - 1.83 0.836003
1.83 - 1.76 0.838672
1.76 - 1.70 0.839348
Name: completeness, dtype: float64
from reciprocalspaceship.