
Reasoner Validator

Badges: Python versions · Publish Python Package · Sphinx Documentation · Run tests · License: MIT

This package provides validation methods for Translator components (e.g. Knowledge Providers and Autonomous Relay Agents) implementing any version of the Translator Reasoner API (TRAPI) and the Biolink Model.

See the full documentation and the contributor guidelines.

Using the Package

Python Dependency

The Reasoner Validator now requires Python 3.9 or later (some library dependencies now force this).

Installing the Module

The module may be installed directly from pypi.org using (Python 3) pip or pip3, namely:

pip install reasoner-validator

Installing and working with the module locally from source

As of release 3.1.6, this project uses the poetry dependency management tool to orchestrate its installation and dependencies.

After installing poetry and cloning the project, run the poetry installation from within the project directory:

git clone https://github.com/NCATSTranslator/reasoner-validator.git
cd reasoner-validator
poetry env use 3.10
poetry shell
poetry install

Note that the poetry env can be set to either Python 3.10 or 3.11 at the present time.

This installation also installs testing dependencies (the poetry 'dev' group in pyproject.toml) and documentation dependencies (the corresponding poetry 'docs' group). If you don't want the overhead of these dependencies, exclude the corresponding poetry groups:

poetry install --without dev,docs

If you plan to run the web service API, then install it with the optional web group:

poetry install --with web

Running Validation against an ARS UUID Result(*) or using a Local TRAPI Request Query

A local script, trapi_validator.py, is available to run TRAPI Response validation against any one of: a PK (UUID)-indexed query result of the Biomedical Knowledge Translator "Autonomous Relay System" (ARS); a local JSON Response text file; or a locally triggered ad hoc query Request against a directly specified TRAPI endpoint.

Note that it is best run within the poetry shell of the virtual environment created by poetry install.

For script usage, type:

./trapi_validator.py --help

(*) Thank you Eric Deutsch for the prototype code for this script

Running tests

Run the available unit tests with coverage report:

poetry run pytest --cov

Note that poetry automatically uses any existing virtual environment, but you can also explicitly enter the virtual environment that poetry creates by default:

poetry shell
# run your commands, e.g. the web service module
exit  # exit the poetry shell

Within a poetry shell, the tests may be run without the poetry run prefix. We will continue in this manner.

% poetry shell
(reasoner-validator-py3.9) % pytest --cov

Run the tests with detailed coverage report in a HTML page:

pytest --cov --cov-report html

Serve the report on http://localhost:3000:

python -m http.server 3000 --directory ./htmlcov

Building the Documentation Locally

All paths here are relative to the root project directory. The validation codes Markdown file should first be regenerated if needed (i.e. if codes.yaml was revised):

cd reasoner_validator
python ./validation_codes.py

Then build the documentation locally:

cd ../docs
make html

The resulting index.html and related pages describing the programmatic API are now available for viewing within the docs subfolder _build/html.

Validation Run as a Web Service

The Reasoner Validator is available wrapped as a simple web service. The service may be run directly or as a Docker container.

API

The web service has a single POST endpoint /validate taking a simple JSON request body, as follows:

{
  "trapi_version": "1.4.1",
  "biolink_version": "3.5.0",
  "target_provenance": {
    "ara_source": "infores:aragorn",
    "kp_source": "infores:panther",
    "kp_source_type": "primary"
  },
  "strict_validation": true,
  "response": "{<some full JSON object of a TRAPI query Response...>}"
}

The request body consists of a JSON data structure with the following top-level tags:

  • An optional trapi_version tag can be given a value of the TRAPI version against which the message will be validated, expressed as a SemVer string (defaults to 'latest' if omitted; partial SemVer strings are resolved to their 'latest' minor and patch releases). This value may also be a GitHub branch name (e.g. 'master').
  • An optional biolink_version tag can be given a value of the Biolink Model version against which the message knowledge graph semantic contents will be validated, expressed as a SemVer string (defaults to 'latest' Biolink Model Toolkit supported version, if omitted).
  • An optional target_provenance object dictionary (example shown) specifying the ARA and KP knowledge sources (given by infores CURIE) expected to be recovered in the TRAPI query results, plus the expected KP provenance source type, i.e. 'primary' implies that the KP is tagged as a 'biolink:primary_knowledge_source'. The root "target_provenance" tag or any of its subsidiary tags may be omitted (defaulting to None)
  • An optional strict_validation flag (default: None, i.e. 'false'). If 'true', strict validation rules are followed, such as treating the use of abstract or mixin category, predicate and attribute_type_id values as errors
  • A mandatory response tag should have as its value the complete JSON TRAPI Response to be validated (see the example below)
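
For illustration, such a request body can be assembled programmatically before POSTing it to /validate. The helper below is a hypothetical sketch (not part of the package), with defaults taken from the example above:

```python
import json

def build_validate_request(response_obj: dict,
                           trapi_version: str = "1.4.1",
                           biolink_version: str = "3.5.0",
                           target_provenance: dict = None,
                           strict_validation: bool = False) -> str:
    """Assemble the JSON body expected by the /validate endpoint.
    Only 'response' is mandatory; optional tags are simply omitted
    to accept the service defaults."""
    body = {"response": response_obj}
    if trapi_version:
        body["trapi_version"] = trapi_version
    if biolink_version:
        body["biolink_version"] = biolink_version
    if target_provenance:
        body["target_provenance"] = target_provenance
    if strict_validation:
        body["strict_validation"] = True
    return json.dumps(body)

# e.g. requests.post("http://localhost/validate", data=build_validate_request(trapi_response))
```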

Running the Web Service Directly

First install the web-specific dependencies, if not already done:

poetry install --with web

The web service may be run directly as a Python module, as follows:

python -m api.main

Go to http://localhost/docs to see the service documentation and to use the simple UI to input TRAPI messages for validation.

Typical Output

As an example of the kind of output to expect, if one posts the following TRAPI Response JSON data structure to the /validate endpoint:

{
  "trapi_version": "1.4.2",
  "biolink_version": "4.1.5",
  "response": {
      "message": {
        "query_graph": {
            "nodes": {
                "type-2 diabetes": {"ids": ["MONDO:0005148"]},
                "drug": {"categories": ["biolink:Drug"]}
            },
            "edges": {
                "treats": {"subject": "drug", "predicates": ["biolink:treats"], "object": "type-2 diabetes"}
            }
        },
        "knowledge_graph": {
            "nodes": {
                "MONDO:0005148": {"name": "type-2 diabetes"},
                "CHEBI:6801": {"name": "metformin", "categories": ["biolink:Drug"]}
            },
            "edges": {
                "df87ff82": {"subject": "CHEBI:6801", "predicate": "biolink:treats", "object": "MONDO:0005148"}
            }
        },
        "results": [
            {
                "node_bindings": {
                    "type-2 diabetes": [{"id": "MONDO:0005148"}],
                    "drug": [{"id": "CHEBI:6801"}]
                },
                "edge_bindings": {
                    "treats": [{"id": "df87ff82"}]
                }
            }
        ]
      },
      "workflow": [{"id": "annotate"}]
  }
}

one should typically get a response body something like the following JSON validation result back:

{
  "messages": {
    "Validate TRAPI Response": {
      "Standards Test": {
        "info": {
          "info.query_graph.edge.predicate.mixin": {
            "global": {
              "biolink:treats": [
                {
                  "edge_id": "drug[biolink:Drug]--['biolink:treats']->type-2 diabetes[None]"
                }
              ]
            }
          }
        },
        "skipped": {},
        "warning": {},
        "error": {
          "error.query_graph.edge.predicate.invalid": {
            "global": {
              "biolink:treats": [
                {
                  "edge_id": "drug[biolink:Drug]--['biolink:treats']->type-2 diabetes[None]"
                }
              ]
            }
          }
        },
        "critical": {}
      }
    }
  },
  "trapi_version": "v1.4.2",
  "biolink_version": "4.1.5"
}

To minimize redundancy in validation messages, messages are uniquely indexed in dictionaries at two levels:

  1. the (codes.yaml recorded) dot-delimited validation code path string
  2. for messages with templated parameters, by a mandatory 'identifier' field (which is expected to exist as a field in a template if such template has one or more parameterized fields)
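
A toy sketch of this two-level indexing scheme (illustrative only, not the package's actual implementation):

```python
def index_message(messages: dict, code: str, identifier: str = None, **parameters):
    """File a validation message under its dot-delimited code, then under
    its 'identifier', so repeated reports of the same (code, identifier)
    pair accumulate parameter sets instead of duplicating whole messages."""
    scope = messages.setdefault(code, {}).setdefault("global", {})
    if identifier is not None:
        scope.setdefault(identifier, []).append(parameters)

messages = {}
index_message(messages, "error.query_graph.edge.predicate.invalid",
              identifier="biolink:treats", edge_id="drug--treats->type-2 diabetes")
index_message(messages, "error.query_graph.edge.predicate.invalid",
              identifier="biolink:treats", edge_id="drug2--treats->type-2 diabetes")
```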

OpenTelemetry and Jaeger

NOTE: OpenTelemetry is temporarily disabled in this code release (to be updated later)

The web service may be monitored for OpenTelemetry by setting an environment variable TELEMETRY_ENDPOINT to a suitable trace collecting endpoint in an application like Jaeger (see also the Translator SRI Jaeger-Demo).

Note: the current system Docker (Compose) design only supports OpenTelemetry tracing using the internal Jaeger container and may require further refinements to enable use of an external telemetry collector.

Running the Web Service within Docker

The Reasoner Validator web service may be run inside a docker container, using Docker Compose.

First, from the root project directory, build the local docker container

docker-compose build

Then, run the service:

docker-compose up -d

Once again, go to http://localhost/docs to see the service documentation.

To stop the service:

docker-compose down

Of course, the above docker-compose commands may be customized by the user to suit their needs. Note that the docker implementation assumes the use of uvicorn.

Change Log

Summary of earlier releases and current Change Log is here.

Code Limitations (implied Future Work?)

  • Releases of the reasoner-validator after v2.2.0 are unlikely to be able to (reliably, if at all) validate TRAPI JSON data models prior to release 1.3.0
  • Biolink Model release <= 2.4.8 versus 3.0.0 validation: the reasoner-validator uses the Biolink Model Toolkit. As it happens, the toolkit is not backwards compatible with at least one Biolink Model structural change from release 2.#.# to 3.#.#: the tagging of 'canonical' predicates. That is, the 0.8.10++ toolkit reports canonical <= 2.4.8 model predicates as 'non-canonical'.
  • This release of the Reasoner Validator Web Service will detect TRAPI 1.0.* releases but doesn't strive to be completely backwards compatible with them (considering that TRAPI 1.0.* is totally irrelevant now).
  • The web service validation doesn't do deep validation of the Results part of a TRAPI Message
  • The validation is only run on the first 1000 nodes and 100 edges of graphs, to keep the validation time tractable (this risks not having complete coverage of the graph)
  • Biolink Model toolkit is not (yet) cached so changing the model version during use will result in some latency in results
  • The validator service doesn't (yet) deeply validate non-core node and edge slot contents of Message Knowledge Graphs
  • The validator service doesn't (yet) attempt validation of Query Graph nodes and edges 'constraints' (e.g. biolink:Association etc. domain and range constraints)
  • Query Graph node 'ids' are not validated except when an associated 'categories' parameter is provided for the given node. In general, Query Graph Validation could be elaborated.
  • The system should leverage the Reasoner Pydantic Models

Core Contributors

  • Kudos to Patrick Wang, who created the original implementation of the Reasoner-Validator project while with CoVar (an entrepreneurial team contributing to the Biomedical Data Translator).
  • Thanks to Kenneth Morton (CoVar) for his reviews of the latest code.
  • The project is currently being extended and maintained by Richard Bruskiewich (Delphinai Corporation, on the SRI team contributing to Translator)

Contributors

arthur-huan, dependabot[bot], patrickkwang, richardbruskiewich, sierra-moxon, vemonet


reasoner-validator's Issues

Deeper validation of TRAPI workflow specifications

We have fixed the check_compliance_of_trapi_response() method to substantially validate a full TRAPI Response object, but only perform schema validation of TRAPI workflow specifications.

As it happens, careful scrutiny of the schema definition indicates that the values of some parameters of some operations have additional cardinality and semantic constraints, invisible to basic schema validation. This issue is a placeholder to remind us to ponder how to do deeper validation of such constraints (if feasible and performant enough for our use cases).
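
One way such post-schema checks could be sketched is with a small rule table keyed by operation id. The rule below (filter_results_top_n requiring a positive integer max_results) is purely illustrative, not quoted from the actual workflow schema:

```python
# Illustrative post-schema semantic checks for TRAPI workflow operations.
# The rule table contents are assumed examples, not the real constraints.
WORKFLOW_PARAM_RULES = {
    "filter_results_top_n":
        lambda p: isinstance(p.get("max_results"), int) and p["max_results"] >= 1,
}

def check_workflow_semantics(workflow):
    """Return (step_index, operation_id) pairs whose parameters
    violate a constraint invisible to basic schema validation."""
    violations = []
    for i, step in enumerate(workflow or []):
        rule = WORKFLOW_PARAM_RULES.get(step.get("id"))
        if rule and not rule(step.get("parameters", {})):
            violations.append((i, step["id"]))
    return violations
```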

crash if workflow is null

Describe the bug
Code crashes if workflow is null

To Reproduce

import sys
import json
import requests
from typing import Dict, List

from reasoner_validator import TRAPIResponseValidator

def eprint(*args, **kwargs):
    print(*args, file=sys.stderr, **kwargs)

def main():

    #### Parse command line options
    import argparse
    argparser = argparse.ArgumentParser(description='CLI testing of the ResponseCache class')
    argparser.add_argument('--verbose', action='count', help='If set, print more information about ongoing processing' )
    argparser.add_argument('response_id', type=str, nargs='*', help='Integer number of a response to read and display')
    params = argparser.parse_args()

    #### Query and print some rows from the reference tables
    if len(params.response_id) == 0 or len(params.response_id) > 1:
        eprint("Please specify a single ARS response UUID")
        return

    response_id = params.response_id[0]

    response_content = requests.get('https://ars-dev.transltr.io/ars/api/messages/'+response_id, headers={'accept': 'application/json'})
    status_code = response_content.status_code

    if status_code != 200:
        eprint("Cannot fetch from ARS a response corresponding to response_id="+str(response_id))
        return

    #### Unpack the response content into a dict
    try:
        response_dict = response_content.json()
    except:
        eprint("Cannot decode ARS response_id="+str(response_id)+" to a Translator Response")
        return

    if 'fields' in response_dict and 'actor' in response_dict['fields'] and str(response_dict['fields']['actor']) == '9':
            eprint("The supplied response id is a collection id. Please supply the UUID for a response")
            return

    if 'fields' in response_dict and 'data' in response_dict['fields']:
        envelope = response_dict['fields']['data']
        if envelope is None:
            envelope = {}
            return envelope

        #### Perform a validation on it
        validator = TRAPIResponseValidator(trapi_version="1.3.0", biolink_version="3.0.3")
        validator.check_compliance_of_trapi_response(envelope)
        messages: Dict[str, List[Dict[str,str]]] = validator.get_messages()

        if params.verbose:
            print(json.dumps(messages, sort_keys=True, indent=2))

    return


if __name__ == "__main__": main()

if you run the above with:

python3 test_response.py ce6f88f1-4b85-4124-8cf1-7d4cd34f7ab0       
Traceback (most recent call last):
  File "/mnt/data/orangeboard/devED/RTX/code/ARAX/ResponseCache/test_response.py", line 73, in <module>
    if __name__ == "__main__": main()
  File "/mnt/data/orangeboard/devED/RTX/code/ARAX/ResponseCache/test_response.py", line 64, in main
    validator.check_compliance_of_trapi_response(envelope)
  File "/mnt/data/python/TestValidator/reasoner_validator/__init__.py", line 134, in check_compliance_of_trapi_response
    response: Dict = self.sanitize_trapi_query(response)
  File "/mnt/data/python/TestValidator/reasoner_validator/__init__.py", line 103, in sanitize_trapi_query
    for step in workflow_steps:
TypeError: 'NoneType' object is not iterable
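
A minimal sketch of a null-tolerant guard for this crash (the helper name is hypothetical; this is not the package's actual patch):

```python
def sanitize_workflow(response: dict) -> dict:
    """Treat a null or missing 'workflow' as an empty step list, so that
    iterating over workflow steps never raises TypeError on None."""
    workflow_steps = response.get("workflow") or []
    for step in workflow_steps:
        pass  # per-step sanitization would go here
    response["workflow"] = workflow_steps
    return response
```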


Duplication in ValidationReporter messages

The current ValidationReporter messages (in the reasoner_validator/report.py module) are managed as a list of dictionary objects with a 'code' and various parameters. The flaw is that if a 'code' is repeatedly reported, it may be highly duplicated in the reported messages. It would be better (for a given edge) to index messages by 'code' and capture all the unique parameter values in a list under the message. Affected ValidationReporter methods:

  • report
  • add_messages
  • CodeDictionary.display(**message) ?

We might even try to also collapse completely duplicated parameters(?) into one reported message(?).

Add support for validating infores keys

TRAPI refers to infores keys for provenance. There is a spreadsheet of infores keys somewhere. It would be great to ensure that the infores keys listed in a TRAPI document are valid.
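
As an illustrative sketch of such a check: validate the key's shape, then look it up in a known-key set. The regex and the sample set below are assumptions standing in for the real infores catalog:

```python
import re

# Shape check plus lookup; KNOWN_INFORES is an illustrative subset only.
INFORES_PATTERN = re.compile(r"^infores:[a-z0-9][a-z0-9._-]*$")
KNOWN_INFORES = {"infores:aragorn", "infores:panther"}

def infores_status(curie: str) -> str:
    """Classify an infores key as 'malformed', 'unregistered' or 'known'."""
    if not INFORES_PATTERN.match(curie):
        return "malformed"
    return "known" if curie in KNOWN_INFORES else "unregistered"
```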


Trouble validating TRAPI 1.3-beta

Hi, I am trying to validate some TRAPI 1.3-beta messages and they are not validating but I don't understand why.

Here's my test code:

import requests
import json
import re
from datetime import datetime
from reasoner_validator import validate
from jsonschema.exceptions import ValidationError

# Set the base URL for the ARAX reasoner TRAPI endpoint
endpoint_url = 'https://arax.ncats.io/api/arax/v1.3'

url = endpoint_url + '/response/54172'
url = endpoint_url + '/response/60320'
print(f"Retrieving response from {url}")

response_content = requests.get(url, headers={'accept': 'application/json'})
status_code = response_content.status_code
if status_code != 200:
    print("ERROR returned with status "+str(status_code))
    print(response_content)
    exit()

# Unpack the response content into a dict
response_dict = response_content.json()
print(f"Retrieved a message with {len(response_dict['message']['results'])} results")
envelope = response_dict

trapi_version = '1.2.0'
trapi_version = '1.3.0-beta'

try:
    validate(envelope,'Response',trapi_version)
    if 'description' not in envelope or envelope['description'] is None:
        envelope['description'] = 'reasoner-validator: PASS'

except ValidationError as error:
    print("Validation failed")
    timestamp = str(datetime.now().isoformat())
    if 'logs' not in envelope or envelope['logs'] is None:
        envelope['logs'] = []
        envelope['logs'].append( { "code": 'InvalidTRAPI', "level": "ERROR", "message": "TRAPI validator reported an error: " + str(error),
            "timestamp": timestamp } )
    if 'description' not in envelope or envelope['description'] is None:
        envelope['description'] = ''
        envelope['description'] = 'ERROR: TRAPI validator reported an error: ' + str(error) + ' --- ' + envelope['description']

print(f"Result: {envelope['description']}")

and when I run this, it appears to fail validation. There's a lot of spewage that's a bit difficult to understand, but I think the crux is:

Failed validating 'oneOf' in schema['properties']['message']['properties']['results']:
    {'oneOf': [{'description': 'List of all returned Result objects for '
                               'the query posed. The list SHOULD NOT be '
                               "assumed to be ordered. The 'score' "
                               'property,\n'
                               ' if present, MAY be used to infer result '
                               'rankings.',
                'items': {'$ref': '#/components/schemas/Result'},
                'type': 'array'},
               {'type': 'null'}]}

On instance['message']['results']:
    [{'confidence': None,
      'description': 'No description available',
      'edge_bindings': {'N1': [{'attributes': None, 'id': 'N1_2'}],
                        'e00': [{'attributes': None,
                                 'id': 'infores:rtx-kg2:CHEMBL.COMPOUND:CHEMBL112-biolink:physically_interacts_with-UniProtKB:P23219'},
                                {'attributes': None,
                                 'id': 'infores:service-provider-trapi:PUBCHEM.COMPOUND:1983-biolink:physically_interacts_with-NCBIGene:5742'},
                                {'attributes': None,
                                 'id': 'infores:rtx-kg2:UniProtKB:P23219-biolink:physically_interacts_with-CHEMBL.COMPOUND:CHEMBL112'}]},
      'essence': 'PTGS1',
      'essence_category': "['biolink:Protein']",
      'id': None,
      'node_bindings': {'n00': [{'attributes': None,
                                 'id': 'CHEMBL.COMPOUND:CHEMBL112',
                                 'query_id': None}],
                        'n01': [{'attributes': None,
                                 'id': 'UniProtKB:P23219',
                                 'query_id': None}]},
      'reasoner_id': 'ARAX',
      'result_group': None,
      'result_group_similarity_score': None,
      'row_data': [1.0, 'PTGS1', "['biolink:Protein']"],
      'score': 1.0,
      'score_direction': None,
      'score_name': None},
...

Any ideas what's going on here?

error code with made-up predicate

reasoner_validator 3.1 with TRAPI version 1.3.0 and biolink_version 2.2.11, I see this:

  "errors": [
.......
    {
      "context": "Query Graph n00--['biolink:has_normalized_google_distance_with']->n01 Predicate",
      "name": "biolink:has_normalized_google_distance_with",
      "code": "error.unknown"
    }
  ]

I suppose the problem is that biolink:has_normalized_google_distance_with is something that we made up and isn't real.

First, should "code" be something more clarifying than "error.unknown"?

Second, What should we be using instead of biolink:has_normalized_google_distance_with?

Deleting node ids?

Describe the bug
I am trying to implement Reasoner validator into ARAX again and am hitting a weird problem that I don't understand. It's entirely possible that I am doing something weird, but I think I've isolated it?

When I run validation on a TRAPI message, it appears to delete node ids from the query_graph. (??!)
Before validation:

    "query_graph": {
      "edges": {
        "t_edge": {
          "attribute_constraints": [],
          "exclude": null,
          "knowledge_type": "inferred",
          "object": "on",
          "option_group_id": null,
          "predicates": [
            "biolink:treats"
          ],
          "qualifier_constraints": [],
          "subject": "sn"
        }
      },
      "nodes": {
        "on": {
          "categories": [
            "biolink:Disease"
          ],
          "constraints": [],
          "ids": [
            "MONDO:0015564"
          ],
          "is_set": false,
          "option_group_id": null
        },
        "sn": {
          "categories": [
            "biolink:ChemicalEntity"
          ],
          "constraints": [],
          "ids": null,
          "is_set": false,
          "option_group_id": null
        }
      }
    },

and after validation:

    "query_graph": {
      "edges": {
        "t_edge": {
          "attribute_constraints": [],
          "exclude": null,
          "knowledge_type": "inferred",
          "object": "on",
          "option_group_id": null,
          "predicates": [
            "biolink:treats"
          ],
          "qualifier_constraints": [],
          "subject": "sn"
        }
      },
      "nodes": {
        "on": {
          "categories": [
            "biolink:Disease"
          ],
          "constraints": [],
          "ids": [],
          "is_set": false,
          "option_group_id": null
        },
        "sn": {
          "categories": [
            "biolink:ChemicalEntity"
          ],
          "constraints": [],
          "ids": null,
          "is_set": false,
          "option_group_id": null
        }
      }
    },

How is this possible? I am doubting my sanity.
Could there have been some experimenting/testing/intentional breaking code left in the validator?

To Reproduce
Steps to reproduce the behavior:
python3 test_response.py 140532

#!/usr/bin/python3
import sys
def eprint(*args, **kwargs): print(*args, file=sys.stderr, **kwargs)

import json
import requests
from typing import Dict, List

#sys.path = ['/mnt/data/python/TestValidator'] + sys.path
from reasoner_validator import TRAPIResponseValidator



############################################ Main ############################################################

def main():

    #### Parse command line options
    import argparse
    argparser = argparse.ArgumentParser(description='CLI testing of the ResponseCache class')
    argparser.add_argument('--verbose', action='count', help='If set, print more information about ongoing processing' )
    argparser.add_argument('response_id', type=str, nargs='*', help='UUID or integer number of a response to fetch and validate')
    params = argparser.parse_args()

    #### Query and print some rows from the reference tables
    if len(params.response_id) == 0 or len(params.response_id) > 1:
        eprint("Please specify a single ARS response UUID")
        return

    response_id = params.response_id[0]

    if len(response_id) > 20:
        debug = True

        ars_hosts = [ 'ars-prod.transltr.io', 'ars.test.transltr.io', 'ars.ci.transltr.io', 'ars-dev.transltr.io', 'ars.transltr.io' ]
        for ars_host in ars_hosts:
            if debug:
                eprint(f"Trying {ars_host}...")
            try:
                response_content = requests.get(f"https://{ars_host}/ars/api/messages/"+response_id, headers={'accept': 'application/json'})
            except Exception as e:
                eprint( f"Connection attempts to {ars_host} triggered an exception: {e}" )
                return
            status_code = response_content.status_code
            if debug:
                eprint(f"--- Fetch of {response_id} from {ars_host} yielded {status_code}")
            if status_code == 200:
                break

        if status_code != 200:
            if debug:
                eprint("Cannot fetch from ARS a response corresponding to response_id="+str(response_id))
                eprint(str(response_content.content))
            return

    else:
        response_content = requests.get('https://arax.ncats.io/devED/api/arax/v1.4/response/'+response_id, headers={'accept': 'application/json'})

    status_code = response_content.status_code

    if status_code != 200:
        eprint("Cannot fetch a response corresponding to response_id="+str(response_id))
        return

    #### Unpack the response content into a dict
    try:
        response_dict = response_content.json()
    except:
        eprint("Cannot decode Response with response_id="+str(response_id)+" into JSON")
        return

    if 'fields' in response_dict and 'actor' in response_dict['fields'] and str(response_dict['fields']['actor']) == '9':
            eprint("The supplied response id is a collection id. Please supply the UUID for a single Response")
            return

    if 'fields' in response_dict and 'data' in response_dict['fields']:
        envelope = response_dict['fields']['data']
        if envelope is None:
            envelope = {}
            return envelope
    else:
        envelope = response_dict

    #### If there is a previous validation, remove it
    if 'validation_result' in envelope:
        del(envelope['validation_result'])

    outfile = open('zz_before.json','w')
    print(json.dumps(envelope, sort_keys=True, indent=2), file=outfile)
    outfile.close()

    #### Perform a validation on it
    if params.verbose:
        print(json.dumps(envelope, sort_keys=True, indent=2))
    validator = TRAPIResponseValidator(trapi_version="1.4.0-beta4", biolink_version="3.2.1")
    validator.check_compliance_of_trapi_response(envelope)

    messages: Dict[str, List[Dict[str,str]]] = validator.get_messages()
    print(json.dumps(messages, sort_keys=True, indent=2))

    print("-------------------------")
    validation_output = validator.dump()
    print(validation_output)

    outfile = open('zz_after.json','w')
    print(json.dumps(envelope, sort_keys=True, indent=2), file=outfile)
    outfile.close()

    return


if __name__ == "__main__": main()
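
Until the mutation itself is fixed, one caller-side workaround is to validate a deep copy of the envelope; the wrapper below is a hypothetical sketch, not part of the package:

```python
import copy

def validate_without_mutation(validator, envelope: dict):
    """Run compliance checking on a deep copy, so any in-place
    sanitization inside the validator cannot alter the caller's
    original TRAPI envelope (e.g. emptying query_graph node 'ids')."""
    working_copy = copy.deepcopy(envelope)
    validator.check_compliance_of_trapi_response(working_copy)
    return validator.get_messages()
```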

Is this a bug?

While trying to validate a TRAPI Response with the reasoner_validator version 3.1, I am seeing:

  "warnings": [
    {
      "context": "Query Graph",
      "edge_id": "n00--['biolink:physically_interacts_with']->n01",
      "predicate": "biolink:physically_interacts_with",
      "code": "warning.edge.predicate.non_canonical"
    }
  ],

What does this mean?
What is wrong with using n00--['biolink:physically_interacts_with']->n01 ?

Add ability to pass a URL or schema dictionary to TRAPISchemaValidator.validate so it can be used to test example data that corresponds to changes in the ReasonerAPI specification on PR.

Is your feature request related to a problem? Please describe.
When I make changes to the ReasonerAPI schema, I want to write tests that validate the examples I provided with my change.

Following this basic workflow:

  • make changes in the TranslatorReasonerAPI.yaml in a branch
  • write some example data that conforms to the new schema
  • run some tests to make sure the example data conforms to the new schema.

TRAPISchemaValidator.validate(trapi_version) sets the trapi_version to the latest version of TRAPI (1.3.0) even if I pass in another version. (e.g. 1.3.1rc1)

I think this works now because 1.3.0 was made after the last round of schema changes (and ReasonerAPI tests) were in and working. The tests currently validate because they match the latest released schema, but will fail on each PR that attempts to modify the schema.

Describe the solution you'd like
I'd like to be able to pass a URL (e.g.: https://raw.githubusercontent.com/sierra-moxon/ReasonerAPI/mkg_qualifiers/TranslatorReasonerAPI.yaml), or the result of a yaml.load operation (a dictionary containing the schema I want to validate against) to the TRAPISchemaValidator.validate() method for it to use as the schema of reference.

Describe alternatives you've considered
I can write my own validator that uses a local/updated version of the schema to run on PR for ReasonerAPI PRs.

Additional Context
schema to validate against: https://raw.githubusercontent.com/sierra-moxon/ReasonerAPI/mkg_qualifiers/TranslatorReasonerAPI.yaml
simple tests: https://github.com/sierra-moxon/ReasonerAPI/blob/f42e7786e9c958d17ceb04f51ce91c80ad415130/tests/test_valid.py#L56

Detect 'dangling nodes/edges' by iterating through node <-> edge mappings

This operation may be applied both to the Query Graph (see issue #14) and to the Knowledge Graph; for the latter, edges may have to be sub-sampled, or the check deemed an optional validation, in the event that the returned TRAPI knowledge graph is very large.

For dangling nodes/edges in Knowledge Graphs, there are two sub-use cases:

  1. Nodes defined in the graph, but not used by any edge
  2. Edges defined in the graph whose nodes are not registered in the nodes list
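
The two sub-use cases above can be sketched as a simple set comparison over the graph's node and edge maps (the function name is illustrative only):

```python
def find_dangling(graph: dict):
    """Return (orphan_nodes, broken_edges) for a TRAPI-style graph:
    nodes referenced by no edge, and edges whose subject or object
    is absent from the node map."""
    nodes = set(graph.get("nodes", {}))
    used, broken = set(), []
    for edge_id, edge in graph.get("edges", {}).items():
        endpoints = {edge.get("subject"), edge.get("object")}
        used |= endpoints & nodes          # count only endpoints that exist
        if not endpoints <= nodes:         # an endpoint is missing from 'nodes'
            broken.append(edge_id)
    return sorted(nodes - used), broken
```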

Reasoner Validator yielding incorrect errors for primary_knowledge_source in TRAPI 1.4

Describe the bug
When trying to validate our TRAPI 1.4 messages, we're getting two errors:
error.knowledge_graph.edge.provenance.missing_primary
error.knowledge_graph.edge.attribute.missing

I think we have the new RetrievalSource modeled correctly for TRAPI 1.4, so I'm guessing reasoner validator is still looking in attributes for the primary_knowledge_source.
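
For reference, a TRAPI 1.4-style edge records provenance in a top-level sources list of RetrievalSource objects, rather than in attributes as in 1.3; a minimal sketch (the infores identifier and CURIEs below are illustrative only):

```python
edge = {
    "subject": "CHEBI:3002",
    "predicate": "biolink:treats",
    "object": "MONDO:0004979",
    "sources": [
        {
            "resource_id": "infores:cohd",  # illustrative infores CURIE
            "resource_role": "primary_knowledge_source",
        }
    ],
    "attributes": [],  # attributes are optional/nullable in TRAPI 1.3 and 1.4
}
```

A 1.4-aware validator would be expected to look for the primary knowledge source here, not among the attributes.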

On a related note, KG Edge attributes are not required and are nullable in TRAPI 1.3 and 1.4, but error.knowledge_graph.edge.attribute.missing implies attributes are required. I assume this is related to the requirement for primary_knowledge_source, and so this error code may not be applicable to 1.4.

To Reproduce
Validating TRAPI response against 1.4.0-beta. Can try the attached COHD response.

cohd_trapi_1.4.txt

Placeholder issue reflecting feedback from Eric Deutsch

  1. Use check_biolink_model_compliance_of_query_graph() right away to validate QGraphs prior to query submission (note: if a KG is also part of a query, you can also use the corresponding knowledge graph method)

  2. We will write a "one stop" method for "semantic" (beyond schema) validation of a TRAPI Response (including the Message components: QGraph, KnowledgeGraph and Results)

  3. Iterate to enhance validation (e.g. we can already consider adding detection of 'dangling nodes/edges' by iterating through node <-> edge mappings)

  4. We will strive to optimize performance to make the reasoner-validator methods tractable for high-throughput use

  5. Add Biolink Model 3.0 support (after the September 2022 Translator Relay)

crash trying to validate TRAPI response

Describe the bug
I tried to validate a TRAPI message and the code crashed

To Reproduce
Run same test as in #65 (after working around that error) and get:

$ python3 test_response.py ce6f88f1-4b85-4124-8cf1-7d4cd34f7ab0
Traceback (most recent call last):
  File "/mnt/data/orangeboard/devED/RTX/code/ARAX/ResponseCache/test_response.py", line 73, in <module>
    if __name__ == "__main__": main()
  File "/mnt/data/orangeboard/devED/RTX/code/ARAX/ResponseCache/test_response.py", line 64, in main
    validator.check_compliance_of_trapi_response(envelope)
  File "/mnt/data/python/TestValidator/reasoner_validator/__init__.py", line 154, in check_compliance_of_trapi_response
    elif self.has_valid_query_graph(message) and \
  File "/mnt/data/python/TestValidator/reasoner_validator/__init__.py", line 219, in has_valid_query_graph
    biolink_validator = check_biolink_model_compliance_of_query_graph(
  File "/mnt/data/python/TestValidator/reasoner_validator/biolink/__init__.py", line 846, in check_biolink_model_compliance_of_query_graph
    validator.check_biolink_model_compliance(graph)
  File "/mnt/data/python/TestValidator/reasoner_validator/biolink/__init__.py", line 749, in check_biolink_model_compliance
    self.validate_graph_node(node_id, details)
  File "/mnt/data/python/TestValidator/reasoner_validator/biolink/__init__.py", line 213, in validate_graph_node
    self.validate_category(
  File "/mnt/data/python/TestValidator/reasoner_validator/biolink/__init__.py", line 634, in validate_category
    biolink_class = self.validate_element_status(
  File "/mnt/data/python/TestValidator/reasoner_validator/biolink/__init__.py", line 258, in validate_element_status
    element: Optional[Element] = self.bmt.get_element(name)
  File "/mnt/data/python/TestValidator/bmt/toolkit.py", line 585, in get_element
    element = self.view.get_element(parsed_name)
  File "/mnt/data/python/TestValidator/linkml_runtime/utils/schemaview.py", line 874, in get_element
    e = self.get_class(element, imports=imports)
  File "/mnt/data/python/TestValidator/linkml_runtime/utils/schemaview.py", line 486, in get_class
    c = self.all_classes(imports=imports).get(class_name, None)
  File "/mnt/data/python/TestValidator/linkml_runtime/utils/schemaview.py", line 290, in all_classes
    classes = copy(self._get_dict(CLASSES, imports))
  File "/mnt/data/python/TestValidator/linkml_runtime/utils/schemaview.py", line 416, in _get_dict
    schemas = self.all_schema(imports)
  File "/mnt/data/python/TestValidator/linkml_runtime/utils/schemaview.py", line 235, in all_schema
    return [m[sn] for sn in self.imports_closure(imports)]
  File "/mnt/data/python/TestValidator/linkml_runtime/utils/schemaview.py", line 211, in imports_closure
    imported_schema = self.load_import(sn)
  File "/mnt/data/python/TestValidator/linkml_runtime/utils/schemaview.py", line 185, in load_import
    schema = load_schema_wrap(sname + '.yaml',
  File "/mnt/data/python/TestValidator/linkml_runtime/utils/schemaview.py", line 70, in load_schema_wrap
    schema = yaml_loader.load(path, target_class=SchemaDefinition, **kwargs)
  File "/mnt/data/python/TestValidator/linkml_runtime/loaders/loader_root.py", line 85, in load
    results = self.load_any(*args, **kwargs)
  File "/mnt/data/python/TestValidator/linkml_runtime/loaders/yaml_loader.py", line 32, in load_any
    return self.load_source(source, loader, target_class, accept_header="text/yaml, application/yaml;q=0.9",
  File "/mnt/data/python/TestValidator/linkml_runtime/loaders/loader_root.py", line 58, in load_source
    data = hbread(source, metadata, metadata.base_path, accept_header)
  File "/mnt/data/python/TestValidator/hbreader/__init__.py", line 260, in hbread
    with hbopen(source, open_info, base_path, accept_header, is_actual_data, read_codec) as f:
  File "/mnt/data/python/TestValidator/hbreader/__init__.py", line 188, in hbopen
    raise e
  File "/mnt/data/python/TestValidator/hbreader/__init__.py", line 184, in hbopen
    response = urlopen(req, context=ssl._create_unverified_context())
  File "/mnt/data/python/Python-3.9.13/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/mnt/data/python/Python-3.9.13/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/mnt/data/python/Python-3.9.13/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/mnt/data/python/Python-3.9.13/lib/python3.9/urllib/request.py", line 561, in error
    return self._call_chain(*args)
  File "/mnt/data/python/Python-3.9.13/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/mnt/data/python/Python-3.9.13/lib/python3.9/urllib/request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: https://raw.githubusercontent.com/mnt/data/python/TestValidator/linkml_runtime/linkml_model/model/schema/types.yaml

Expected behavior
Successful validation

Accept version as a parameter for old version support

Requesting a feature: accept the version as a parameter, so that when you bump your package to 1.1.X, it still accepts 1.1.X as well as all of the older versions for validation.

The PyYAML update creates two different dependency sets in versions 1.0.X and 1.1.X.

This way, we could install the latest version of your package, but to validate against my older 0.9 and 1.0 versions, I would just need to tell it which version to use.

direction_qualifier errors?

Describe the bug
It seems like many TRAPI messages in the wild are failing with an error:

    "error.query_graph.edge.qualifier_constraints.qualifier_set.qualifier.invalid": {
      "chementity-node--['biolink:affects']->cox2": [
        {
          "qualifier_type_id": "biolink:object_direction_qualifier",
          "qualifier_value": "decreased"
        }
      ]
    },

But I don't quite understand what the complaint is here. Is the problem that "decreased" is thought to not be an allowed value? It seems to be a legal value.

So I suspect this is an erroneous error, but I'm uncertain.

Can you explain what this error means and whether it is valid?


Validation error on null relation?

I am trying to validate this:
https://ars.transltr.io/ars/api/messages/3f7d1b6f-0af2-4884-b334-60602c3ad427

And get this error:

- Message INVALID: None is not of type 'string'

Failed validating 'type' in schema['properties']['query_graph']['properties']['edges']['additionalProperties']['properties']['relation']:
    {'description': 'Lower-level relationship type of this edge',
     'example': 'upregulates',
     'type': 'string'}

On instance['query_graph']['edges']['e00']['relation']:
    None

apparently on this part:

    "query_graph": {
      "edges": {
        "e00": {
          "object": "n01",
          "predicate": "biolink:subclass_of",
          "relation": null,
          "subject": "n00"
        }
      },

Are you seeing the same?
What to do about this?
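
One workaround (a sketch, not part of the reasoner-validator API) is to recursively strip null-valued keys from the message before schema validation, since a missing optional property passes where an explicit null fails a plain 'type: string' check:

```python
def strip_nulls(obj):
    """Recursively drop None-valued dict entries (e.g. "relation": null)."""
    if isinstance(obj, dict):
        return {k: strip_nulls(v) for k, v in obj.items() if v is not None}
    if isinstance(obj, list):
        return [strip_nulls(v) for v in obj]
    return obj
```

The cleaner fix, of course, would be for the schema to mark relation as nullable.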

validation fails when results or knowledge_graph are null

Describe the bug
The spec defines Message.results and Message.knowledge_graph to be nullable, but validation fails when the results or knowledge_graph are null.

To Reproduce
validate_Response on

{
    "datetime": "2021-02-05 23:27:50",
    "description": "Could not map node n01 to OMOP concept",
    "message": {
        "knowledge_graph": null,
        "query_graph": {
            "edges": {
                "e00": {
                    "object": "n01",
                    "predicate": [
                        "biolink:correlated_with"
                    ],
                    "subject": "n00"
                }
            },
            "nodes": {
                "n00": {
                    "category": [
                        "biolink:ChemicalSubstance"
                    ]
                },
                "n01": {
                    "id": [
                        "DOID:90535"
                    ]
                }
            }
        },
        "results": null
    },
    "query_options": {
        "confidence_interval": 0.99,
        "dataset_id": 3,
        "max_results": 5,
        "method": "obsExpRatio",
        "min_cooccurrence": 0
    },
    "reasoner_id": "COHD",
    "schema_version": "1.0.0",
    "status": "CouldNotMapCurieToLocalVocab",
    "tool_version": "COHD 3.0.0"
}

Expected behavior
Passes validation.

Additional context
Using release version 1.0.2/1.0.1.

Semantic validator should return a list of problems with the document, versus assertion failures on the first problem encountered

(from @edeutsch): Can the semantic validator return a list of problems with the document? Instead of the "exit at the first violation" paradigm of the schema validator, can we have another method (e.g. check_semantics()) whose output would be a list of problems? So don't crash out on the first problem, but compile all problems. Then we could display those in the UI. And we might have several levels (ERROR, WARNING, INFO, etc.) so that, as we refine the validator, we can categorize the issues we find.
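
A minimal sketch of the requested design (the ValidationReporter, report() and check_semantics() names are hypothetical here, not the eventual API):

```python
from collections import defaultdict


class ValidationReporter:
    """Accumulate all problems by level instead of raising on the first one."""

    def __init__(self):
        self.messages = defaultdict(list)  # level -> list of problem strings

    def report(self, level: str, message: str) -> None:
        self.messages[level].append(message)

    def check_semantics(self, message: dict) -> dict:
        # ...run every semantic check, calling self.report() as issues arise...
        if not message.get("knowledge_graph"):
            self.report("WARNING", "Message has no knowledge graph")
        return dict(self.messages)
```

Callers could then render the ERROR/WARNING/INFO lists in a UI rather than catching a single assertion failure.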

Tests broken since release 3.0.0

Describe the bug
Before release 3.0.0 we were using the reasoner-validator to validate the output of our TRAPI APIs

But since release 3.0.0 we are getting the following error:

Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/production/test_production_api.py:7: in <module>
    from reasoner_validator import validate
/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/reasoner_validator/__init__.py:4: in <module>
    from reasoner_validator.biolink import (
/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/reasoner_validator/biolink/__init__.py:14: in <module>
    from reasoner_validator.sri.util import is_curie
E   ModuleNotFoundError: No module named 'reasoner_validator.sri'

Here is the description of release 3.0.0. It does not seem like any breaking changes were introduced, according to this description (why move to 3.0.0 in this case, by the way?):

Reports validation status messages - error, warning and information - using parameterized, coded messages (see the codes.yaml file in the project). Some validations previously tagged as 'error' are now 'warning' or 'information' messages.

To Reproduce
Code to reproduce the behavior:

from reasoner_validator import validate
TRAPI_VERSION_TEST: str = "1.2.0"
assert validate(trapi_results['message'], "Message", TRAPI_VERSION_TEST) == None

It couldn't be simpler, so I have a hard time seeing how I could trigger an import issue here.

cf. https://github.com/MaastrichtU-IDS/translator-openpredict/blob/master/tests/production/test_production_api.py#L42

Expected behavior
It should work as it did before, or there should be a disclaimer in the release notes for 3.0.0 explaining which changes I need to make to get it working.

Review how deprecated Biolink Model classes are being detected and reported

For example, a recent test using the validator gives the following error:

SKIPPED (test case S-P-O triple '(CHEBI:3002$biolink:ChemicalSubstance)--[biolink:treats]->(MESH:D001249$biolink:Disease)', since it is not Biolink Model compliant: BLM Version 2.4.4 Error in Input Edge: 'subject' category 'biolink:ChemicalSubstance' is unknown?)

In the 2.4.4 model, we see

  chemical substance:
    deprecated: >-
      This class is deprecated in favor of 'small molecule.'

The validation is detecting use of the deprecated class as a test error. Although this is probably OK, we perhaps still need to review (and possibly adjust) error reporting for deprecated classes.

dependency issue

Describe the bug
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
numba 0.56.4 requires numpy<1.24,>=1.18, but you have numpy 1.24.2 which is incompatible.

Invalid NodeBinding?

After manually fumbling my way down the object model to try to find the problem in my message, it appears that the validator is calling my NodeBindings (among other things) invalid, but I don't understand why. Here's the snippet. What's wrong with this?

import json
from reasoner_validator import validate_Message, validate_Result, validate_EdgeBinding, validate_NodeBinding, ValidationError

node_binding = {
  "id": "MONDO:0018081"
}

print(json.dumps(node_binding, sort_keys=True, indent=2))
try:
    validate_NodeBinding(node_binding)
    print(f"      - node_binding is valid")
except ValidationError:
    print(f"      - node_binding INVALID")

yields:

{
  "id": "MONDO:0018081"
}
      - node_binding INVALID

Why is that?

Issue with dependencies

Describe the bug
We have a really simple test that just makes a request to our production API every day and runs the reasoner-validator to check that the whole thing is compliant.

Every other day the whole test fails even though we change nothing, and it's always because of the reasoner-validator dependency tree.

This issue has been raised a few times in the past, and every time it has been temporarily solved, but, like whack-a-mole, another dependency conflict always surfaces a few days after a fix has been applied.

This time it seems the issue is related to a dependency of oaklib:

___________ ERROR collecting tests/production/test_production_api.py ___________
tests/production/test_production_api.py:3: in <module>
    from reasoner_validator import TRAPIResponseValidator
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/reasoner_validator/__init__.py:3: in <module>
    from reasoner_validator.biolink import (
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/reasoner_validator/biolink/__init__.py:11: in <module>
    from bmt import Toolkit
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bmt/__init__.py:1: in <module>
    from bmt.toolkit import Toolkit
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bmt/toolkit.py:6: in <module>
    from oaklib.implementations import UbergraphImplementation
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/__init__.py:9: in <module>
    from oaklib.selector import get_adapter, get_implementation_from_shorthand  # noqa:F401
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/selector.py:15: in <module>
    from oaklib.implementations.funowl.funowl_implementation import FunOwlImplementation
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/implementations/__init__.py:13: in <module>
    from oaklib.implementations.cx.cx_implementation import CXImplementation
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/implementations/cx/cx_implementation.py:9: in <module>
    from oaklib.implementations.obograph.obograph_implementation import (
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/implementations/obograph/obograph_implementation.py:40: in <module>
    from oaklib.interfaces.differ_interface import DifferInterface
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/interfaces/differ_interface.py:25: in <module>
    from oaklib.utilities.kgcl_utilities import generate_change_id
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/utilities/kgcl_utilities.py:11: in <module>
    import kgcl_schema.grammar.parser as kgcl_parser
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/kgcl_schema/grammar/parser.py:9: in <module>
    from bioregistry import parse_iri, get_preferred_prefix, curie_to_str
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bioregistry/__init__.py:5: in <module>
    from .collection_api import get_collection, get_context  # noqa:F401
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bioregistry/collection_api.py:7: in <module>
    from .resource_manager import manager
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bioregistry/resource_manager.py:41: in <module>
    from .schema import (
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bioregistry/schema/__init__.py:5: in <module>
    from .struct import (  # noqa:F401
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bioregistry/schema/struct.py:2054: in <module>
    class RegistryGovernance(BaseModel):
pydantic/main.py:197: in pydantic.main.ModelMetaclass.__new__
    ???
pydantic/fields.py:506: in pydantic.fields.ModelField.infer
    ???
pydantic/fields.py:436: in pydantic.fields.ModelField.__init__
    ???
pydantic/fields.py:552: in pydantic.fields.ModelField.prepare
    ???
pydantic/fields.py:668: in pydantic.fields.ModelField._type_analysis
    ???
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/typing.py:852: in __subclasscheck__
    return issubclass(cls, self.__origin__)
E   TypeError: issubclass() arg 1 must be a class

The test is here: https://github.com/MaastrichtU-IDS/knowledge-collaboratory/blob/main/backend/tests/production/test_production_api.py

And in the test environment only the following dependencies are installed (cf. dependencies and optional dependencies in https://github.com/MaastrichtU-IDS/knowledge-collaboratory/blob/main/backend/pyproject.toml#L27 )

    "python-multipart >=0.0.5",
    "requests >=2.23.0",
    "httpx >=0.21.1",
    "pydantic[dotenv] >=1.9",
    "fastapi >=0.68.1",
    "uvicorn >=0.15.0",
    "gunicorn >=20.0.4",
    "Authlib >=0.15.4",
    "itsdangerous >=2.0.1",
    "reasoner-pydantic >=2.2.3",
    "rdflib >=6.1.1",
    "SPARQLWrapper >=1.8.5",
    "pytest >=7.1.3,<8.0.0",
    "pytest-cov >=2.12.0,<4.0.0",
    "ruff >=0.0.219",
    "reasoner-validator >=3.1.4",

To Reproduce
Steps to reproduce the behavior:

  1. Try to use reasoner-validator in a real project

Expected behavior
Dependencies should be better defined, and better tested.

Tests have been defined in the tests/ folder, but they are currently not run by the GitHub Actions set up in this repo (there is one action to publish without testing, and one to update the docs). Ideally tests should be run on every push to the repository (like the GitHub action to update the docs), and they should test against the different supported versions of Python (3.9, 3.10 and 3.11 at the moment).

Using the reasoner-validator is really important for us to make sure our output fits the ever-changing TRAPI/Biolink specifications. Unfortunately, the current state of the reasoner-validator forces us to spend hours per week fixing basic dependency issues that should be handled upstream by the reasoner-validator, bmt, oaklib, etc. Those tools and libraries should have automated tests on every commit, and those tests should capture these kinds of dependency failures by running in different environments.

Unfortunately, at the moment our only long-term solution is to do just what the reasoner-validator is doing right now: leave the testing to someone else (the next people in line, i.e. the ARAs and ARX) and wait to be notified by those actors if our TRAPI payload is not compliant.

Semantic check for error.node.categories_not_array not right?

Using reasoner_validator 3.1, I am validating a response and getting this error:

  "errors": [
    {
      "node_id": "n00",
      "code": "error.node.categories_not_array"
    },

And my n00 is:

      "nodes": {
        "n00": {
          "categories": null,
          "constraints": [],
          "ids": [
            "CHEMBL.COMPOUND:CHEMBL112"
          ],
          "is_set": false,
          "option_group_id": null
        },

It's true that it's not an array, but it is allowed to be null. From the spec:

        categories:
          type: array
          items:
            $ref: '#/components/schemas/BiolinkEntity'
          nullable: true

Query Graph Validation

As making queries is one of the more difficult aspects of the current Translator, I believe we could do better validation of the query graph. Although the example below may technically validate against the JSON schema, it isn't really a useful query. A number of notable issues exist:

  • No valid node information
  • An edge that does not point to existing nodes (now seems related to #74)
  • A predicate that contains "biolink:" but is not a valid predicate
{
  "version": "1.2",
  "message": {
    "query_graph": {
        "nodes": {
            "drug": {}
        },
        "edges": {
            "treats": {"subject": "not_here", "predicates": ["biolink:not_a_real_predicate"], "object": "also_not_here"}
        }
    }
  }
}

Observations from the UI MVP front lines...

We create here a 'meta-' issue to generally capture additional validation tasks suggested by the Translator UI (see session notes and related Andy Crouse slide presentation).

  • Duplicate CURIEs for a given name. Shows some items as a duplicate if there are FDA approved results and non-FDA approved results: 'error' versus 'warning'?
    • Are the duplications coming from different KPs?
    • How would we test for this?
    • Could we test ARAs for this?
    • Do we conduct any testing at the level of the ARS itself?
  • Some Missing Names: results may contain cases where there is no name for a drug (in a TRAPI query for drugs treating diseases...): 'error' versus 'warning'?
  • Missing (concept) node descriptions: 'error' versus 'warning'?

How do I report what is wrong?

I'm trying to report exactly what is wrong. I'm running this successfully:

    try:
        validate_Message(envelope['message'])
    except ValidationError:
        raise ValueError('Bad Reasoner component!')

But how do I report to the user exactly what is wrong with the message? I can't figure out how to do that.
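
If the underlying validation is built on the jsonschema library (an assumption here), its ValidationError carries the details, and iter_errors() on a validator even lists all problems instead of raising on the first one. A self-contained sketch with a toy stand-in schema (the real TRAPI schema is much larger):

```python
from jsonschema import Draft7Validator

# Toy stand-in schema, for illustration only.
schema = {
    "type": "object",
    "properties": {"id": {"type": "string"}},
    "required": ["id"],
}
validator = Draft7Validator(schema)

# Collect every problem, with the JSON path where it occurred.
problems = [
    f"{'/'.join(map(str, e.absolute_path)) or '<document root>'}: {e.message}"
    for e in validator.iter_errors({"id": 5})
]
```

In the raise-on-first-error style, catching ValidationError as e and printing e.message and e.absolute_path gives the same detail.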

Provide a formal code based system of error reporting and messaging

Error reporting in the reasoner-validator is currently just a list of free-formatted strings. Developing a standard index of such error messages (indexed by 'error code') would streamline the error reporting process.

As part of this new indexing, we should likely consider partitioning the reporting codes into distinct levels: ERROR, WARNING and INFO, (DEBUG?) to meet specific use cases.

Strive to enhance performance of the validation execution

Validation is getting a bit more sluggish with the extent of Biolink Model semantic validation now injected into the package.

Some ideas to explore:

  • Selectively disable Biolink Model validation by user-specified choice (if they don't want it) - Implemented in release v3.5.5
  • Identify opportunities for caching results of methods in various parts of the system
  • Can CodeDictionary access to codes and messages (and subtrees) be cached?
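
The caching idea for CodeDictionary lookups could be as simple as memoizing a pure dotted-code lookup with functools.lru_cache; a sketch with a hypothetical helper and toy code tree, not the project's actual code:

```python
from functools import lru_cache

# Toy stand-in for the codes.yaml tree.
CODES = {
    "error": {"node": {"categories_not_array": "Node categories are not an array"}}
}


@lru_cache(maxsize=None)
def get_message_template(code: str) -> str:
    """Walk the dotted code path once; repeat lookups hit the cache."""
    entry = CODES
    for part in code.split("."):
        entry = entry[part]
    return entry
```

Since validation of a large knowledge graph repeats the same few codes thousands of times, even this trivial memoization should pay off.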

check_compliance_of_trapi_response() only works with Message not Response

Moving from reasoner-validator 2.2.14 to 3.1.0 with some issues.

Version 3.1 seems to work with this:

from reasoner_validator import TRAPIResponseValidator

validator = TRAPIResponseValidator(trapi_version=trapi_version, biolink_version=biolink_version, sources=None, strict_validation=True)
validator.check_compliance_of_trapi_response(message=envelope['message'])
validation_result = validator.get_messages()

And it seems to work pretty well. The most startling thing is that even though the class and main method are named as if they are designed to validate a TRAPI Response object, the check_compliance_of_trapi_response() method only seems to work when passed a TRAPI Message object (a child component of a Response). It does not work when passed a Response object. I think it should!
