
cytominer-eval's Introduction

Cytominer-eval: Evaluating quality of perturbation profiles


Cytominer-eval contains functions to calculate quality metrics for perturbation profiling experiments.

Installation

Cytominer-eval can be installed via pip:

pip install cytominer-eval

Since the project is under active development, you can also install the latest functionality via a GitHub commit hash:

# Example
pip install git+https://github.com/cytomining/cytominer-eval@f7f5b293da54d870e8ba86bacf7dbc874bb79565

Usage

Cytominer-eval uses a simple API for all evaluation metrics.

# Working example
import pandas as pd
from cytominer_eval import evaluate

# Load Data
commit = "6f9d350badd0a18b6c1a76171813aaf9a52f8d9f"
url = f"https://github.com/cytomining/cytominer-eval/raw/{commit}/cytominer_eval/example_data/compound/SQ00015054_normalized_feature_select.csv.gz"

df = pd.read_csv(url)

# Define important function arguments
meta_features = df.columns[df.columns.str.startswith("Metadata_")]
features = df.drop(meta_features, axis="columns").columns.tolist()
replicate_groups = ["Metadata_broad_sample", "Metadata_mg_per_ml"]

# Evaluate profile quality
evaluate(
    profiles=df,
    features=features,
    meta_features=meta_features,
    replicate_groups=replicate_groups,
    replicate_reproducibility_return_median_cor=False,
    operation="replicate_reproducibility",
)

Metrics

Currently, six metric operations are supported:

  1. Replicate reproducibility
  2. Precision/recall
  3. mp-value
  4. Grit
  5. Enrichment
  6. Hit@k

Demos

For more in depth tutorials, see https://github.com/cytomining/cytominer-eval/tree/master/demos.

cytominer-eval's People

Contributors: alxndrkalinin, gwaybio, hillsbury, koalive, michaelbornholdt


cytominer-eval's Issues

Option to avoid recalculating the melted similarity matrix

Currently, the only option in evaluate() is to provide a pandas dataframe with metadata and profile features. The function then calculates and melts the similarity matrix.

We should provide an option to evaluate precomputed melted similarity matrices to speed up computation.

Define precision and recall @ k

(Stubs for now, so we can add this documentation to code later)

Precision@k can be computed in many different contexts:

  1. For each replicate well, the fraction of replicates of the same perturbation among the k most-similar wells.
  2. For each guide, the fraction of guides targeting the same gene among the k most-similar guides.
  3. For each compound (in the context of MOA), the fraction of compounds in the same MOA class among the k most-similar compounds.

Recall@k is defined similarly, e.g., the fraction of all replicates of the same perturbation that are among the k most-similar wells.
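All of these reduce to the same computation once each profile's neighbors are ranked by similarity. Here is a minimal sketch, assuming a hypothetical neighbors DataFrame sorted by descending similarity, with a boolean same_group column marking true matches:

import pandas as pd

def precision_recall_at_k(neighbors: pd.DataFrame, k: int):
    # neighbors: all other profiles, sorted by descending similarity,
    # with a boolean `same_group` column marking true matches
    top_k = neighbors.head(k)
    n_hits = top_k["same_group"].sum()
    precision = n_hits / k                            # fraction of the top k that match
    recall = n_hits / neighbors["same_group"].sum()   # fraction of all matches retrieved
    return precision, recall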

R API for "plyr-ing" similarity matrices

@gwaygenomics just a heads up that the code I've been using to test things out here (private) and here is evolving toward a consistent API. I'm wary of creating a package (it would be yet another bifurcation in our efforts), so I am actively avoiding doing so (as you can see, a single R file!) while still following R package principles (docstrings, explicit namespaces).

Let's make a decision on what to do once we've written up the benchmark paper. Definitely no R package for now; we can see where we are post-submission and figure out plans then. Ideally, I'd just switch over to Python and life would be simpler :D But that would really slow me down!

Define percent strong

(Stubs for now, so we can add this documentation to code later)

Percent strong is reported in two ways. We should distinguish between these ways of reporting (they are similar but not the same):

  1. The fraction of replicate pairs that are more similar to each other than 95% of non-replicate pairs.
  2. The fraction of replicate sets that are more coherent than 95% of non-replicate sets of the same size. The coherence of a set is defined as the median correlation among elements of the set.

The second version can be a bit confusing, so here is an example:

  • Consider an experiment that has 100 perturbations, each performed in 5 replicates.
  • For each perturbation, we compute the pairwise correlations among its 5 replicates (the replicate set) and report the median value. This median replicate correlation is the coherence of the perturbation's replicates.
  • We then compute the coherence of non-replicate sets of the same size: we sample 5 random wells in the experiment such that no two are replicates of the same perturbation, compute the pairwise correlations among the 5 wells (the non-replicate set), and report the median value. We do this several times to build a null distribution.
  • Finally, we report the fraction of perturbations whose coherence is greater than the 95th percentile of the null. This is the percent strong.
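Both versions share the same final step: compare observed values against the 95th percentile of a null distribution. A minimal sketch, assuming the per-pair correlations (version 1) or per-set coherences (version 2) have already been collected into arrays:

import numpy as np

def percent_strong(observed, null, quantile=0.95):
    # Fraction of observed values exceeding the 95th percentile of the null
    threshold = np.quantile(null, quantile)
    return np.mean(observed > threshold)

# Version 1: observed = replicate-pair correlations, null = non-replicate-pair correlations
# Version 2: observed = replicate-set coherences, null = sampled non-replicate-set coherences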

Remaining tests for the new precision/recall

Here is a list of ideas for tests to add to the precision/recall function. This is needed because #63 focuses only on results, not on good tests.

  • Use a small example, as Florian wrote in #62, to make sure that the math is correct.
  • Make sure that the output df has the right shape depending on the number of groupby_cols.
  • Add checks that make sure the groupby columns are available, i.e., they exist in the df.

A test about replicate groups rather than groupby_cols:

  • Make sure that if the replicate groups span a unique mapping over the input df, then all values are 0. This is the case because no connection will have a true label.

`precision_recall()` counts within-group connections twice but between-group connections only once

When reading the source code for cytominer_eval.operations.precision_recall(), I noticed that the similarity_melted_df variable counts each replicate pair twice, e.g., A1 --> A2 and A2 --> A1.
This becomes a problem because only the first replicate_group_col in lines 49-52 is subsequently used for grouping:

49 replicate_group_cols = [
50    "{x}{suf}".format(x=x, suf=pair_ids[list(pair_ids)[0]]["suffix"])  # [0] keeps only the first of two grouping columns
51    for x in replicate_groups
52 ]

In the next step, each group is passed to calculate_precision_recall():

59 precision_recall_df_at_k = similarity_melted_df.groupby(
60     replicate_group_cols
61 ).apply(lambda x: calculate_precision_recall(x, k=k_))
62 precision_recall_df = precision_recall_df.append(precision_recall_df_at_k)

The effect is that all samples from within a group are counted twice, while samples from outside the group are counted only once, because the groupby filters out one direction.

Let me clarify this with an example. Consider 5 samples, the first 3 from group 'A' and the last 2 from group 'B', with greater within-group than between-group correlations:

[image: pairwise correlation matrix of the 5 example samples]

Then what calculate_precision_recall will see is this:
[image: the melted similarity dataframe passed to calculate_precision_recall]

For example, one can see that the sample_pair_a column has a row for A1-->A2 and one for A2-->A1, but only one for A1-->B1; B1-->A1 is missing because of the way the melted data frame is generated and the grouping is performed. One can also see that the similarity metrics for within-group connections appear in duplicate.

Accordingly, the outcome for precision and recall at k=4 is the following:
[image: precision and recall results at k=4]
Precision: all 4 closest connections are within-group for A, but only 2 for group B.
Recall: 4 of 6 connections found for A, but both of B's 2 connections found.

In summary, the computations are not entirely correct, especially for smaller groups. Also consider that with odd values of k, only one of the two connections of a symmetric pair is used.

Admittedly, this is a bit mind-boggling. I recommend using a debugger if you want to trace all the steps in detail by yourself.

Proposed solution: count each pair only once when creating the melted data frame.
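A hedged sketch of that proposal: keep only the upper triangle of the similarity matrix before melting, so each unordered pair appears exactly once. Here sim is assumed to be a square pandas DataFrame of pairwise similarities; this is not the actual metric_melt implementation:

import numpy as np
import pandas as pd

def melt_upper_triangle(sim: pd.DataFrame) -> pd.DataFrame:
    # Mask the diagonal and lower triangle so A1-->A2 is kept but A2-->A1 is not
    mask = np.triu(np.ones(sim.shape, dtype=bool), k=1)
    # stack() drops the masked (NaN) entries, leaving one row per unordered pair
    melted = sim.where(mask).stack().reset_index()
    melted.columns = ["pair_a", "pair_b", "similarity_metric"]
    return melted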

Define grit

(Stubs for now, so we can add this documentation to code later)

Grit can be computed in many different contexts:

A replicate's grit is its average similarity to other replicates, z-scored against its similarity to negative control replicates.

A guide's grit is its average similarity to other guides targeting the same gene, z-scored against its similarity to negative control guides.

A compound's grit (in the context of MOA) is its average similarity to other compounds in the same MOA class, z-scored against its similarity to DMSO replicates.
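All three contexts share the same arithmetic. A minimal sketch, assuming group_sims holds a profile's similarities to its same-group profiles and control_sims its similarities to negative-control profiles:

import numpy as np

def grit(group_sims, control_sims):
    # z-score the mean same-group similarity against the control similarity distribution
    mu, sigma = np.mean(control_sims), np.std(control_sims)
    return (np.mean(group_sims) - mu) / sigma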

Add Precision at R to precision at k

This update should be done soon after #63, since it is fast to implement and important for my thesis.

I will simply add an option for precision at R, which sets k to the number of correct neighbors so that precision always lies between 0 and 1, no matter how many MOA replicates exist for a given class.
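Reusing the hypothetical precision_recall_at_k sketch from the precision@k definition above, the idea is just:

# Precision at R: set k to the number of true matches, so a perfect
# ranking always scores exactly 1.0 (names are illustrative)
R = int(neighbors["same_group"].sum())
precision_at_r, _ = precision_recall_at_k(neighbors, k=R)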

percent_strong outputs nan when no replicates are present

When no replicates are present, denom = 0 and we receive the following warning:

/home/ubuntu/efs/Periscope_Calico/workspace/software/Periscope_Calico/miniconda3/envs/pooled-cell-painting/lib/python3.7/site-packages/cytominer_eval/operations/percent_strong.py:50: RuntimeWarning: invalid value encountered in long_scalars
  ).sum() / denom

We should raise an AssertionError when denom = 0, with the error message "no replicate groups identified".

The offending lines are:

replicate_df = similarity_melted_df.query("group_replicate")
denom = replicate_df.shape[0]
percent_strong = (
    replicate_df.similarity_metric > non_replicate_quantile
).sum() / denom
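The guard could look like this (a sketch of the proposal, not the actual patch):

# Fail fast with an informative message instead of dividing by zero
assert denom != 0, "no replicate groups identified"
percent_strong = (
    replicate_df.similarity_metric > non_replicate_quantile
).sum() / denom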

Figure out why percent replicating is apparently insensitive to number of replicates

@gwaygenomics reported this in https://github.com/broadinstitute/lincs-profiling-complementarity

Additionally, because we had 5 replicates of Cell Painting and 3 replicates of L1000 per treatment, we performed a subsampling experiment with Cell Painting to match the number of L1000 replicates. We actually observed slightly increased percent replicating scores in the subsampled Cell Painting data with fewer replicates indicating that the two additional Cell Painting replicates did not artificially inflate percent replicating scores (Supplementary Figure 3).

[image: Supplementary Figure 3, percent replicating for full vs. subsampled Cell Painting data]

Running metric_melt several times

This is an optimization issue.
When running metrics such as precision/recall or enrichment, as in the demo notebook (https://github.com/cytomining/cytominer-eval/blob/master/demos/CellPainting_Demo.ipynb),
you call evaluate several times to calculate the metric at different values of p, e.g.:

import numpy as np

result = []
for p in np.arange(0.99, 0.96, -0.01):
    r = evaluate(
        profiles=df,
        features=features,
        meta_features=meta_features,
        replicate_groups=["Metadata_gene_name"],
        operation="enrichment",
        similarity_metric="pearson",
        enrichment_percentile=p,
    )
    result.append(r)

However, this calls the function cytominer_eval.transform.get_pairwise_metric several times (once per call of evaluate). This is unnecessary, since all the metrics can be derived from the same similarity_melted_df!

We need to adapt evaluate so that it calculates the pairwise similarities only once when called several times, or else change the demos to show that such sweeps should not go through evaluate.
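For example, the demo sweep above could compute the melted similarity matrix once and reuse it. The metric_melt arguments mirror the evaluate source quoted elsewhere in this tracker; the import paths and the enrichment operation's exact signature are assumptions:

import numpy as np
from cytominer_eval.transform import metric_melt    # import path assumed
from cytominer_eval.operations import enrichment    # signature assumed below

# Compute the expensive pairwise similarity melt exactly once
similarity_melted_df = metric_melt(
    df=df,
    features=features,
    metadata_features=meta_features,
    similarity_metric="pearson",
    eval_metric="enrichment",
)

# Sweep the percentile over the cached result instead of re-calling evaluate
result = [
    enrichment(
        similarity_melted_df=similarity_melted_df,   # argument name assumed
        replicate_groups=["Metadata_gene_name"],
        percentile=p,                                # argument name assumed
    )
    for p in np.arange(0.99, 0.96, -0.01)
]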

What are your thoughts, @gwaygenomics?

Skip calculating similarity matrix for mp-value

The mp_value function operates on the input profiles, not on the melted similarity dataframe.

elif operation == "mp_value":
    metric_result = mp_value(
        df=profiles,
        control_perts=grit_control_perts,
        replicate_id=replicate_groups,
        features=features,
        params=mp_value_params,
    )

We should skip this calculation when operation == "mp_value":

similarity_melted_df = metric_melt(
    df=profiles,
    features=features,
    metadata_features=meta_features,
    similarity_metric=similarity_metric,
    eval_metric=operation,
)
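The guard could be as simple as (a sketch):

# Only build the melted similarity matrix when the operation needs it
if operation != "mp_value":
    similarity_melted_df = metric_melt(
        df=profiles,
        features=features,
        metadata_features=meta_features,
        similarity_metric=similarity_metric,
        eval_metric=operation,
    )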

Upgrade to minimum pandas 1.2 and deprecate Python 3.5

At some point in the future, we should consider making the minimum pandas version >=1.2 in the requirements.txt file.

I debated performing this in #55 because of one silly deprecation in pandas.testing.assert_frame_equal (the rtol argument, see 03c09ee). However, I decided that this version upgrade and deprecation should probably be thought through in more detail.

One idea is to mirror the version requirements of pycytominer, since the two are likely to be used together often. We should also think about projects that might no longer be able to use cytominer-eval after this decision, although I think that should be a non-issue since those projects are probably pinned to a specific cytominer-eval hash.

calculate_precision_recall() fails when replicate_groups has just one member

In operations.py, calculate_precision_recall() is called on line 67. When replicate_groups has just one member, this fails. Before I investigate in more detail: should we return NA in such cases with a warning, or simply throw a more informative exception? At the moment, a division-by-zero error is raised without additional information.
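Either way, a guard up front would be more informative than the current division-by-zero error. A sketch (variable names are illustrative, not the actual implementation):

# Raise an informative error instead of a bare division-by-zero
n_members = group_df.shape[0]
if n_members < 2:
    raise ValueError(
        f"replicate group has only {n_members} member; "
        "precision/recall requires at least 2"
    )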

Further optimizations

Also, I'm wondering if you can optimize this (it probably doesn't need to change for this PR; it is likely beyond scope), but if you split out the two queries, is it faster?

e.g.

nongroup_replicate_query = replicate_truth_df.query("not group_replicate")
group_replicate_query = replicate_truth_df.query("group_replicate")

v11 = group_replicate_query.query("similarity_metric > @threshold")

and so on...

Optimize enrichment

Add https link for cytominer-eval installation

On the README front page, the link for the most recent cytominer-eval installation is given as

pip install git+git://github.com/cytomining/cytominer-eval@f7f5b293da54d870e8ba86bacf7dbc874bb79565

However, my company server only allows https access. For such cases, the link should be

pip install git+https://github.com/cytomining/cytominer-eval@f7f5b293da54d870e8ba86bacf7dbc874bb79565

Please add this to the README file and also include it in other similar installation instructions. :)

Issues with groupby columns

Soo,

for precision/recall and hit@k we now have the case that the input groupby columns determine by which columns the similarity df is sorted. This has an important impact on the results. If you, for example, sort by something that is not unique (i.e., not unique in the input df), then you will get internal connections in the sub-dataframe that you are grouping.

Let's say, for example, you have a df with samples at different dosages. If you then set groupby_columns = Metadata_broad_sample, you will sort into subgroups that have several connections within each other (all the different doses). Your precision will then show the weird effects that @FloHu described in #62, for example. Similarly, hit@k will give weird results because you are now looking at internal connections and not only the nearest neighbors of one sample.

Either we keep it all this way and make users aware of it, or we find some workaround. Maybe the solution is to not allow anything other than unique groupby_cols?
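If we go the strict route, a validation sketch might look like this (names illustrative):

import pandas as pd

# Reject groupby columns that do not uniquely identify each profile
def check_groupby_cols(df: pd.DataFrame, groupby_cols):
    if df.duplicated(subset=groupby_cols).any():
        raise ValueError(
            "groupby_cols do not uniquely identify profiles in the input df; "
            "internal within-group connections would distort precision and hit@k"
        )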

Adding comparison to mp-value

Hi there!
@gwaygenomics, as we discussed, I am looking into adding the multidimensional perturbation value (mp-value, DOI: 10.1177/1087057112469257) as another metric to which you could compare the grit score. This actually sounds like it solves #23 (the mp-value is basically the Mahalanobis distance to the negative controls in PCA-reduced space).
To clarify and make sure this would fit in this project, could you please confirm that these are the steps it would require?

  • In cytominer_eval.operations define a new operation mp_value.
  • In test-evaluate.py, add a method test_evaluate_mp_value.
  • In evaluate.py, add a case for operation == "mp_value".
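For reference, a rough sketch of the core computation described above, i.e. the Mahalanobis distance of a perturbation to the negative controls in PCA-reduced space (this omits the permutation step that turns distances into the actual mp-value p-values):

import numpy as np
from sklearn.decomposition import PCA

def mahalanobis_to_controls(profiles, control_profiles, n_components=2):
    # Fit PCA on treatment and control profiles together, then project both
    pca = PCA(n_components=n_components).fit(np.vstack([profiles, control_profiles]))
    treated = pca.transform(profiles)
    controls = pca.transform(control_profiles)
    # Mahalanobis distance of the treatment centroid to the control distribution
    cov_inv = np.linalg.pinv(np.cov(controls, rowvar=False))
    delta = treated.mean(axis=0) - controls.mean(axis=0)
    return float(np.sqrt(delta @ cov_inv @ delta))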

Cheers!
