clinical-data-committee-tracking-voting's Issues

Clinical Data ARS Use Case - Y3 Prioritization #2 - Drug Target Landscape

As I outlined in the Slack channel, I believe that having a clear goal and a shared process for drug repurposing will help align the efforts we put forth in upcoming hackathons. This should also help ensure that we produce answers that are in line with current rational drug development/repurposing practices.

I found this 2010 review from Pfizer helpful.

Specifically, it describes two lines of evidence that can be used to characterize the "druggability" of targets, which are displayed in the following figure:

Here, (a) describes the target's relevance to the disease and (b) describes the nature of the available compounds that interact with the target (therapeutic, or potentially therapeutic).

When (a) and (b) are plotted as the x and y axes of a matrix, we can see clusters of drug targets that can be grouped into "zones":

Importantly for our case, zone "C" is described as having a high likelihood of repurposing, since it combines moderate to high disease relevance with moderate to high compound relevance for targets whose compounds are not indicated for that disease (but are indicated for some other disease).

My proposal was to use figure 1 as a starting point for formulating our use case queries.

The questions listed on the "component info" section could be relevant use cases for Translator to attempt to answer. Some need to be creatively reformatted into a query graph, however.
The "Example Scoring Basis" serves as a guideline for the EPC requirements that would be needed for making decisions about classifying and explaining answers. This would help determine which ARAs to invoke for the query.
The "Data Source" section serves as a guideline for which KPs or ARAs would be relevant for aggregating relevant knowledge graphs.

We could start by brainstorming how to formulate these questions as query graphs and by selecting KPs and ARAs that might be able to respond with the desired EPC. For instance, the first question under "Expression" might look like this multi-hop query graph:

(target) -transcribed from-> Transcript -expressed_in-> AnatomicalEntity -related_to*> Disease
*not sure if related_to is appropriate

Required EPC might include the expression levels of a transcript within the disease-relevant tissue on the expressed_in edge. Optional EPC might include confidence values that associate the anatomical entity to the disease, or the number of alternative splice forms that the target has in addition to the Transcript identified (as this might indicate possible off-target effects).
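As a rough illustration, the multi-hop pattern above could be written as a TRAPI-style query graph. This is only a sketch under several assumptions: the function name, the `EXAMPLE:0001` placeholder CURIE, and the choice of `biolink:Gene` for the target are hypothetical, the predicate strings are copied from the issue text rather than checked against the current Biolink Model, and the singular `id`/`category`/`predicate` keys mirror the query example elsewhere in this tracker (newer TRAPI versions use list-valued `ids`, `categories`, and `predicates`).

```python
import json


def build_expression_query_graph(target_id: str) -> dict:
    """Build a three-hop query graph: target -> Transcript -> AnatomicalEntity -> Disease."""
    # Only the first node is pinned to an identifier; the rest are typed placeholders.
    nodes = {
        "n0": {"id": target_id, "category": "biolink:Gene"},  # assumed category
        "n1": {"category": "biolink:Transcript"},
        "n2": {"category": "biolink:AnatomicalEntity"},
        "n3": {"category": "biolink:Disease"},
    }
    # (subject, object, predicate) for each hop, in the order given in the issue.
    hops = [
        ("n0", "n1", "biolink:transcribed_from"),
        ("n1", "n2", "biolink:expressed_in"),
        ("n2", "n3", "biolink:related_to"),  # flagged in the issue as possibly inappropriate
    ]
    edges = {
        f"e{i}": {"subject": s, "object": o, "predicate": p}
        for i, (s, o, p) in enumerate(hops)
    }
    return {"message": {"query_graph": {"nodes": nodes, "edges": edges}}}


if __name__ == "__main__":
    # "EXAMPLE:0001" is a placeholder, not a real identifier.
    print(json.dumps(build_expression_query_graph("EXAMPLE:0001"), indent=2))
```

Keeping the hops in a single list makes it easy to swap out the last predicate once the group settles on something more specific than related_to.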

Clinical Data Modeling - Y3 Prioritization #2 - Biolink Schema/Definition for Cohort

This issue is intended to determine and vote on whether the current Biolink schema and definition for 'cohort' are sufficient. The current hierarchy can be found here and is provided below, along with the definition of 'cohort'. Please comment and/or vote by providing a +1 (in favor), 0 (neutral), or -1 (not in favor).

Biolink Hierarchy for Cohort

Biological Entity - Organismal Entity - Individual Organism - Population of Individual Organisms - Study Population - Cohort

Biolink Definition of Cohort

A group of people banded together or treated as a group who share common characteristics. A cohort ‘study’ is a particular form of longitudinal study that samples a cohort, performing a cross-section at intervals through time.

Vote on Prioritizations/Milestones for Y3

This issue is intended to stimulate a vote on committee prioritizations/milestones for Y3, as outlined in issues #3, #4, #9, #10, #11, #12, #13, and #14. Please comment and/or vote by providing a +1 (in favor), 0 (neutral), or -1 (not in favor).

biolink predicates no longer exist in latest Biolink Model

The following predicates are not found in the latest Biolink Model:

Clinical-Data-Committee-Tracking-Voting/GetCreative()_DrugDiscoveryRepurposing_RarePulmonaryDisease/Path_A/Path_A_no_overlay_chem_+reg_gene.json

Lines 38 to 46 in 0a85448

 "predicates": [
   "biolink:increases_activity_of",
   "biolink:increases_expression_of",
   "biolink:increases_abundance_of",
   "biolink:decreases_metabolic_processing_of",
   "biolink:increases_secretion_of",
   "biolink:increases_transport_of",
   "biolink:entity_positively_regulates_entity"
 ]

For reference, the Biolink Model can be found here: https://github.com/biolink/biolink-model/blob/master/biolink-model.yaml
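A check like this could be scripted so that stale predicates are caught before a query file is committed. The sketch below is hypothetical: the function names are invented for illustration, and it assumes that the model's slot names are the predicate names with the `biolink:` prefix removed and underscores replaced by spaces. The slot names would come from the `slots` section of biolink-model.yaml (for example, loaded with a YAML parser); here a small toy set stands in for the real model.

```python
def to_slot_name(predicate: str) -> str:
    """Convert 'biolink:increases_activity_of' to 'increases activity of'."""
    local_name = predicate.split(":", 1)[-1]  # drop the 'biolink:' prefix if present
    return local_name.replace("_", " ")


def find_missing_predicates(predicates, model_slots):
    """Return the predicates (in input order) with no matching slot name."""
    known = set(model_slots)
    return [p for p in predicates if to_slot_name(p) not in known]


if __name__ == "__main__":
    # Toy stand-in for the model's slot names; NOT the real Biolink contents.
    toy_slots = {"related to", "affects"}
    query_predicates = [
        "biolink:related_to",
        "biolink:increases_activity_of",
    ]
    for predicate in find_missing_predicates(query_predicates, toy_slots):
        print(f"not found in model: {predicate}")
```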

May relay pitch

Our May Relay goal will consider use cases of target landscaping and drug repurposing, with a specific focus on ensuring that provenance returned is sufficient for users’ needs and also to confirm that the current TRAPI specification is fit-for-purpose. Our September Relay goal is to have a feature-complete Translator system deployed into a primarily NCATS hosted production environment that can address questions in these two use case domains, giving us ample time to refine performance as we plan for our soft release for the formal presentations in December.

When developing your pitch, we would like these questions addressed:

  What is the duration of your challenge? Keep in mind, the relay is scheduled from May 10 to May 14, 2021 with 11:00am-4:00pm core hours.

  Who is the preferred audience/attendee/contributors?

  Do you foresee potential overlap with any other WG/Committees?

  How do you anticipate reporting out? (I.e. what is the proposed deliverable?)

 How does your challenge align with goals written in the FOA?

Please email me and cc Tyler Beck your pitches by March 26, 2021.

Clinical Data Modeling - Y3 Prioritization #3 - Feature Variable Representation

This issue is intended to (1) identify specific use cases for existing feature variables that can/cannot be represented properly using existing ontologies plus modifiers/qualifiers and (2) generate a rough estimate of the proportion of existing feature variables that cannot be represented properly using existing ontologies plus modifiers/qualifiers. Please see this sheet and also comment.

Augmentin DILI Use Case [proposed by Exposures Provider]

Prior to the breast cancer use case, we had considered a DILI use case. At the time, Multi-omics EHR Provider and Clinical Data Provider were capable of answering questions related to DILI, but Exposures Provider was only semi-capable, as we had not yet stood up our planned ICEES+ DILI instance. We were going to move forward regardless, until we realized that a breast cancer use case was something that all of the clinical KPs (including Connections Hypothesis Provider) could contribute to.

However, Exposures Provider received an Augmentin DILI dataset from the international DILIN network just this past Tuesday, 3/16/2021. We should be able to expose the data via ICEES+ fairly quickly. As such, I'm inclined to rethink this use case.

Thoughts from others?

Clinical Data Types - Labs - Y3 Prioritization #1

This issue is intended to initiate a discussion regarding the treatment of clinical laboratory measurements within the context of Translator.

Biolink Extension to Support Clinical Data

See summary here.

Clinical Data ARS Use Case - Y3 Prioritization #1 - Clinical x Genetics

This issue is intended to initiate discussion on how Translator clinical KPs can best leverage the genetics knowledge available via the Translator Genetics Provider.

Clinical Data Modeling - Y3 Prioritization #5 - Spatiotemporal Modeling

This issue is intended to develop an approach for modeling spatiotemporal relationships in Biolink. Please see this sheet and also comment.

TCDC CARA

TCDC CARA Overview

This issue is intended to initiate implementation work on the Translator Clinical Data Committee (TCDC) Curated ARA (CARA). The goal is to create a skeletal ARA that initially will support the TCDC's MVP1 workflow on rare pulmonary disease but eventually will support any workflow developed by the committee. CARA also will provide a general model and approach for other teams, committees, working groups, and external users who wish to contribute an ARA to the Translator ecosystem. The development and implementation work is being supported by the SRI, with Jason Reilly serving as lead developer. Plans for long-term maintenance are TBD.

TCDC CARA Implementation Plan

A detailed implementation plan was developed by Jason F., Arbrar M., Chris B., Casey T., and Kara F. on 11/15/2022 and finalized by those same persons on 11/17/2022. That plan is described below.

TCDC will register within CARA mappings between a template query-graph and one or more TRAPI queries with workflows but without score operations (i.e., a TRAPI message with a query_graph and a workflow element)
- For the ‘treats’ MVP1 question, there will be ~~two such queries, one for Path A and one for Path B~~ one query, Path D [revised 03/22/2023]
At runtime, when the registered template query-graph (without a workflow but with a URL for return response) comes in from the ARS, CARA will submit the associated TRAPI queries with workflows but without score operations to the Workflow Runner (WFR) and get back the results
After all results are returned, CARA will use FastAPI Reasoner Pydantic to merge the N sets of results by the result node
CARA will then run a score workflow (not under user control) through the WFR [to be discussed 03/29/2023]
The WFR will generate scores for the merged result from multiple ARAs, but rather than generating multiple results (one score per ARA response), it will put all of the scores into some property on the (one) result, and then generate some kind of half-baked average of the scores from the different ARAs. TRAPI 1.4: All of the separate scores generated by each ARA will be presented individually as analyses of the result [revised 03/29/2023]
The WFR sends that scored result back to CARA, which returns it to the ARS using the URL for the return response
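The merge step in this plan can be pictured with a small sketch. It is not the reasoner-pydantic implementation; it uses plain dictionaries shaped like TRAPI 1.4 results (`node_bindings` plus a list of `analyses`), and the function names and sample identifiers are hypothetical. Results that bind the same nodes are collapsed into one result, and each ARA's analysis (with its own score) is kept separately rather than averaged.

```python
def result_key(result: dict) -> tuple:
    """Key a result by its sorted (query node, bound CURIE) pairs."""
    pairs = []
    for qnode_id, bindings in result["node_bindings"].items():
        for binding in bindings:
            pairs.append((qnode_id, binding["id"]))
    return tuple(sorted(pairs))


def merge_results(result_sets):
    """Merge N lists of results; results binding the same nodes become one result."""
    merged = {}  # dicts keep insertion order, so first-seen order is preserved
    for results in result_sets:
        for result in results:
            key = result_key(result)
            if key not in merged:
                merged[key] = {"node_bindings": result["node_bindings"], "analyses": []}
            # Keep every ARA's analysis individually instead of combining scores.
            merged[key]["analyses"].extend(result.get("analyses", []))
    return list(merged.values())


if __name__ == "__main__":
    # Hypothetical responses from two ARAs that found the same answer node.
    ara_one = [{"node_bindings": {"n1": [{"id": "EXAMPLE:drug1"}]},
                "analyses": [{"resource_id": "infores:ara-one", "score": 0.8}]}]
    ara_two = [{"node_bindings": {"n1": [{"id": "EXAMPLE:drug1"}]},
                "analyses": [{"resource_id": "infores:ara-two", "score": 0.5}]}]
    for merged_result in merge_results([ara_one, ara_two]):
        print(merged_result)
```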

Clinical Data Modeling - Y3 Prioritization #1 - Cohort Modeling

This issue is intended to identify and vote on the minimum set of characteristics required to define a cohort. Our initial draft of requirements can be found here and is also posted below. Please comment and/or vote by providing a +1 (in favor), 0 (neutral), or -1 (not in favor).

Minimum set of cohort characteristics

cohort size, date range, age, sex, race, ethnicity, diagnosis, procedure, medication, source (e.g., EHR data, public exposures data, survey data, research project data), clinical setting (e.g., inpatient, ICU), cohort catchment area (e.g., geographic location), provider (e.g., institution, clinical KP)
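To make the proposal easier to discuss, the draft list can be pictured as a simple record with one field per characteristic. This is only a sketch: the class name, field names, and field types are assumptions made for illustration and are not part of Biolink or of any KP's schema.

```python
from dataclasses import dataclass, fields
from typing import Optional, Tuple


@dataclass
class CohortDescription:
    """One optional field per characteristic in the draft minimum set."""
    cohort_size: Optional[int] = None
    date_range: Optional[Tuple[str, str]] = None  # (start, end), format left open
    age: Optional[str] = None
    sex: Optional[str] = None
    race: Optional[str] = None
    ethnicity: Optional[str] = None
    diagnosis: Optional[str] = None
    procedure: Optional[str] = None
    medication: Optional[str] = None
    source: Optional[str] = None            # e.g., EHR data, survey data
    clinical_setting: Optional[str] = None  # e.g., inpatient, ICU
    catchment_area: Optional[str] = None    # e.g., geographic location
    provider: Optional[str] = None          # e.g., institution, clinical KP


def missing_characteristics(cohort: CohortDescription) -> list:
    """List the characteristics that have not been filled in."""
    return [f.name for f in fields(cohort) if getattr(cohort, f.name) is None]


if __name__ == "__main__":
    cohort = CohortDescription(cohort_size=100, source="EHR data")
    print(missing_characteristics(cohort))
```

A helper like `missing_characteristics` would let a KP report which items of the minimum set it cannot yet supply.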

Data Modeling for likelihood/risk/prediction

Data source = EHR datasets (need a way to demo these features from EHR)

Data type = real-world evidence (has connotation of FDA use)

Note: CHP has used a real-world evidence predicate, but its data are not from EHR

Predicate = ??

Biolink relation = related_to

associated_with_increased_likelihood_of_concurrent
--- associated_with_increased_likelihood_of_future
associated_with_decreased_likelihood_of_concurrent
--- associated_with_decreased_likelihood_of_future

Methods: Supervised machine learning, logistic regression (or other methods in future)

Contemporaneous (same time): Can be used to mean the likelihood of a drug being given to patients with the disease X (not necessarily that drug treats disease X)
Predict 2 months out: associated with likelihood a disease will occur, or a drug will be prescribed
Predict 2 years out: associated with likelihood a disease will occur, or a drug will be prescribed

Decisions for Clinical Data Team

Predicate
Relation

Decisions for Multiomics EHR Risk KP

Cut-offs
--- No cut-offs for coefficient (have ARA)
--- No cut-offs for p-value > 0.1 (have ARA)
Coefficient log odds, NOT +/-
Two predicates NOT one (increased/decreased)
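One way to read these decisions is that the sign of the log-odds coefficient selects between the increased and decreased predicates, while the coefficient itself is reported unchanged. The sketch below illustrates that reading only; the function names are hypothetical, the predicate strings are assembled from the candidate names listed above, and treating a coefficient of exactly zero as "no predicate" is an assumption.

```python
import math
from typing import Optional

HORIZONS = ("concurrent", "future")


def likelihood_predicate(log_odds: float, horizon: str) -> Optional[str]:
    """Pick the increased/decreased predicate from the sign of the coefficient."""
    if horizon not in HORIZONS:
        raise ValueError(f"horizon must be one of {HORIZONS}")
    if log_odds == 0:
        return None  # assumption: a zero coefficient asserts no direction
    direction = "increased" if log_odds > 0 else "decreased"
    return f"associated_with_{direction}_likelihood_of_{horizon}"


def odds_ratio(log_odds: float) -> float:
    """Convert a logistic-regression coefficient (log odds) to an odds ratio."""
    return math.exp(log_odds)


if __name__ == "__main__":
    coefficient = -0.4  # made-up value for illustration
    print(likelihood_predicate(coefficient, "future"), odds_ratio(coefficient))
```

No p-value or coefficient cut-off is applied here, in keeping with the decision to leave filtering to the ARA.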

Consistent use of biolink:correlated_with in TRAPI queries and responses from KPs

{
  "message": {
    "query_graph": {
      "nodes": {
        "n00": {
          "id": "MONDO:0007254"
        },
        "n01": {
          "category": "biolink:Drug"
        }
      },
      "edges": {
        "e00": {
          "predicate": "biolink:correlated_with",
          "subject": "n00",
          "object": "n01"
        }
      }
    }
  }
}

e.g. COHD: the issue with how to represent our data as knowledge in Biolink is a common issue with the clinical KPs and one that we’re actively discussing with Matt and Richard. The discussion is on the bigger question of how to represent data and knowledge coming from clinical and contextual KPs, but it also involves the discussion of what predicates and what node identifiers should be used. We’ve made good headway within our small group and are about to seek input from the larger Translator group to make sure that the model Matt and Richard designed will be usable by other KPs and ARAs.
To answer your question more specifically regarding COHD, currently our only supported predicate types are biolink:correlated_with and its parent biolink:related_to (the same information represented in both predicates). Since COHD mines its edges from associations within observational clinical data, biolink:correlated_with is the most specific predicate that we can assert as knowledge, so we won’t have predicates like biolink:has_phenotype or biolink:treats. We could try to infer things like biolink:treats, but currently, there’s no way to distinguish a biolink:treats edge derived from inference from our KP vs a similarly stated biolink:treats edge derived from curated knowledge from another KP. We currently do not have genetic data in COHD.
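A consumer that wants to confirm consistent predicate use could scan a response's knowledge graph for edges outside the expected set. The following is a minimal sketch with a hypothetical function name and made-up sample data; it assumes the response carries its edges as a dictionary under `message.knowledge_graph.edges`, each with a single `predicate` value.

```python
ALLOWED_PREDICATES = ("biolink:correlated_with", "biolink:related_to")


def nonconforming_edges(response: dict, allowed=ALLOWED_PREDICATES) -> list:
    """Return the IDs of knowledge-graph edges whose predicate is not allowed."""
    knowledge_graph = response.get("message", {}).get("knowledge_graph") or {}
    edges = knowledge_graph.get("edges") or {}
    return [
        edge_id
        for edge_id, edge in edges.items()
        if edge.get("predicate") not in allowed
    ]


if __name__ == "__main__":
    # Made-up response fragment; the drug identifiers are placeholders.
    sample = {"message": {"knowledge_graph": {"edges": {
        "e0": {"subject": "MONDO:0007254", "predicate": "biolink:correlated_with",
               "object": "EXAMPLE:drug1"},
        "e1": {"subject": "MONDO:0007254", "predicate": "biolink:treats",
               "object": "EXAMPLE:drug2"},
    }}}}
    print(nonconforming_edges(sample))
```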

mediKanren Use Case [proposed by Unsecret Agent, NCATS]

Candidate No. 4: mediKanren Real-world Use Cases [NCATS, Unsecret Agent]

(Also see this doc)

Acanthosis nigricans: Patient presented with acanthosis nigricans. Genomic sequencing revealed variants in multiple genes. Causal variant was a gain of function in EGFR. MediKanren recommended erlotinib. We compounded it as a topical cream with a specialist pharmacy, and the patient applied it to one arm, but not the other. Significant reduction in the skin growths occurred on the treated arm.
Severe ataxic episodes: Patient presented with ataxic episodes. Genomic sequencing revealed multiple variants. Causal variant was a missense mutation in the domain controlling degradation of RHOBTB2, hence making it a gain of function in RHOBTB2. No direct RHOBTB2 inhibitors are known to exist, but several indirect downregulators of RHOBTB2 (via E2F1) do exist, and celecoxib showed a substantial reduction in ataxic episodes.
Extreme developmental delay; non-ambulatory at age 5: Patient presented with multiple VUSes. Causal variant was determined to be in MAPK8IP3, leading to predicted haploinsufficiency. Retinoic acid is a potent upregulator of MAPK8IP3. 6 months of treatment with vitamin A led to the patient standing and taking simple steps. 18 months of treatment led to the patient walking freely under own power and notable cognitive gains.

Identification of use case challenge #2

Clinical Data Modeling - Y3 Prioritization #4 - Clinical Predicate(s)

This issue is intended to develop a more elegant modeling solution to replace biolink:has_real_world_evidence_of_association_with. The current hierarchy can be found here and is provided below, along with the predicate's definition. Please also see this sheet and comment.

Biolink Hierarchy

related_to - related_to_at_instance_level - has_real_world_evidence_of_association_with

Biolink Definition

this means that the assertion was derived by applying statistical and machine learning models to clinical data such as EHR data, survey data, etc

