ncatstranslator / reasoner-validator Goto Github PK

Validation of Translator OpenAPI (TRAPI) messages both to TRAPI and Biolink Model standards. See https://ncatstranslator.github.io/reasoner-validator/

License: Other

Python 99.87% Dockerfile 0.08% Makefile 0.06%

reasoner-validator's Issues

Strive to enhance performance of the validation execution

Validation is getting a bit more sluggish with the extent of Biolink Model semantic validation now injected into the package.

Some ideas to explore:

Selectively disable Biolink Model validation by user-specified choice (if they don't want it) - Implemented in release v3.5.5
Identify opportunities for caching results of methods in various parts of the system
Can CodeDictionary access to codes and messages (and subtrees) be cached?

Explore Integration with Reasoner Pydantic Models

Reasoner Pydantic is a project that aims to build pydantic models as a means for validation and serialization. It has been used in production for some time and is available on pypi . I think we should explore integration of this project with the validator service.

Tests broken since release 3.0.0

Describe the bug
A clear and concise description of what the bug is.

Before release 3.0.0 we were using the reasoner-validator to validate the output of our TRAPI APIs

But since release 3.0.0 we are getting the following error:

Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/production/test_production_api.py:7: in <module>
    from reasoner_validator import validate
/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/reasoner_validator/__init__.py:4: in <module>
    from reasoner_validator.biolink import (
/opt/hostedtoolcache/Python/3.8.14/x64/lib/python3.8/site-packages/reasoner_validator/biolink/__init__.py:14: in <module>
    from reasoner_validator.sri.util import is_curie
E   ModuleNotFoundError: No module named 'reasoner_validator.sri'

Here is the description of release 3.0.0. It does not seems like any breaking changes have been introduced according to this description (why moving to 3.0.0 in this case btw?)

Reports validation status messages - error, warning and information - using a parameterized coded messages (see the codes.yaml file in the project). Some validation previously tagged as 'error' are now 'warning' or 'information' messages.

To Reproduce
Code to reproduce the behavior:

from reasoner_validator import validate
TRAPI_VERSION_TEST: str = "1.2.0"
assert validate(trapi_results['message'], "Message", TRAPI_VERSION_TEST) == None

It can't be more simple, so I have hard time to see how I can trigger an import issue here

cf. https://github.com/MaastrichtU-IDS/translator-openpredict/blob/master/tests/production/test_production_api.py#L42

Expected behavior
It should have worked like it was working before
Or there should be a disclaimer in the Release notes for 3.0.0 explaining which changes I need to make to have it working

If a TRAPI Attribute is received with "value: None" (or null) this should trigger a validation error

Is your feature request related to a problem? Please describe.
From the Translator stand-up calls, we see TRAPI Attribute objects coming back with null (or None) value components.

Describe the solution you'd like
Since value is a required field of the Attribute object, the validator should return an error if it sees "None" or "N/A" or null.

add descriptions to validation messages in codes.yaml

Enhanced documentation of validation messages, both for publication in https://ncatstranslator.github.io/reasoner-validator/ and for programmatic access (e.g. by SRI Testing etc.)

Add Biolink Model validation depth to Edge attribute (constraint) validation

The current TRAPI Knowledge Graph edge validation falls short of detailed validation of edge attributes (see attribute validation code snippet (or lack thereof) in the edge validator.

Deeper validation of Biolink Model (and perhaps, some TRAPI-specific) validation is required at this level.

Duplication in ValidationReporter messages

The current ValidationReporter messages (in the reasoner_validator.report.py module) are managed as a list of dictionary objects with 'code' and various parameters. This is the flaw that if a 'code' is repeatedly reported, it may be highly duplicated in the messages reported. It would be better (for a given edge) to index messages based on 'code' and capture all the unique parameter values in a list under the message. Affected ValidationReporter methods:

report
add_messages
CodeDictionary.display(**message) ?

We might even try to also collapse completely duplicated parameters(?) into one reported message(?).

Accept version as a parameter for old version support

Requesting a feature to accept version as a parameter when bumping to latest version. For example, when you bump your package to 1.1.X, then it would accept 1.1.X, as well as all the older versions to validate as a parameter.

PyYaml update creates 2 different dependencies in version 1.0.X and 1.1.X.

This way, we could install the latest version of your package, but then to validate my 0.9, 1.0 old versions, I would just need to tell it what version.

dependence issue

Describe the bug
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
numba 0.56.4 requires numpy<1.24,>=1.18, but you have numpy 1.24.2 which is incompatible.

direction_qualifier errors?

Describe the bug
It seems like many TRAPI messages in the wild are failing with an error:

    "error.query_graph.edge.qualifier_constraints.qualifier_set.qualifier.invalid": {
      "chementity-node--['biolink:affects']->cox2": [
        {
          "qualifier_type_id": "biolink:object_direction_qualifier",
          "qualifier_value": "decreased"
        }
      ]
    },

But I don't quite understand what the complaint is here. Is the problem that "decreased" is thought to not be an allowed value? It seems to be a legal value.

so I'm suspecting that this is an erroneous error? but I'm uncertain.

Can you explain that this error means and if it is valid?

To Reproduce
Steps to reproduce the behavior:

Go to '...'
Click on '....'
Scroll down to '....'
See error

Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Add any other context about the problem here.

Semantic check for error.node.categories_not_array not right?

Using reasoner_validator 3.1, I have validating a response and getting this error:

  "errors": [
    {
      "node_id": "n00",
      "code": "error.node.categories_not_array"
    },

And my n00 is:

      "nodes": {
        "n00": {
          "categories": null,
          "constraints": [],
          "ids": [
            "CHEMBL.COMPOUND:CHEMBL112"
          ],
          "is_set": false,
          "option_group_id": null
        },

It's true that it's not an array, but it is allow to be null. From spec:

        categories:
          type: array
          items:
            $ref: '#/components/schemas/BiolinkEntity'
          nullable: true

Semantic validator to return a list of problems with the document, versus, assertion failures on the first problem encountered.

(from @edeutsch): can the semantic validator return a list of problems with the document. So instead of the paradigm of "exit at the first violation" like the schema validator, can we have another method (eg. check_semantics() whose output would be a list of problems. So don't crash out on the first problem, but compile all problems. Then we can display those in the UI. And we might have several levels, ERROR, WARNING, INFO, etc. where as we refine the validator we can categorize the issues we find.

Invalid NodeBinding?

After manually fumbling my way down the object model to try to find the problem in my message, it appears that the validator is calling my NodeBindings (among other things) invalid, but I don't understand why. Here's the snippet. What's wrong with this?

from reasoner_validator import validate_Message, validate_Result, validate_EdgeBinding, validate_NodeBinding, ValidationError

node_binding = {
  "id": "MONDO:0018081"
}

print(json.dumps(node_binding, sort_keys=True, indent=2))
try:
    validate_NodeBinding(node_binding)
    print(f"      - node_binding is valid")
except ValidationError:
    print(f"      - node_binding INVALID")

yields:

{
  "id": "MONDO:0018081"
}
      - node_binding INVALID

Why is that?

Convert dependency management to poetry

Add validation for 'canonical' predicates

We need to check if TRAPI knowledge graphs (input query graphs too?) returned by KP (and ARA's), are using the 'canonical' predicates

Fix documentation

maybe migrate the documentation to mkdocs

Improve examples in Python API documentation

https://reasoner-validator.readthedocs.io/en/latest/reasoner_validator.html

Right now all of the example objects are {'message': {}}. Instead we should adopt the examples from the Reasoner API schema, and/or dynamically generate more appropriate ones.

An empty knowledge graph may be an error in some use cases

Although empty Knowledge Graphs are allowed, there may be some use cases where this should be flagged as an error. Some way of flagging discretionary strict validation of empty graphs as error may be required.

Extend TRAPI SemVer regex to accept 'extended' semantic versioning.

Core semantic versioning is simply a period delimited number schema of major.minor.patch but allows for build, pre-release and pre-release + build variants. We wish for the Reasoner Validator regex parsing to support these use cases.

Add support for validating infores keys

TRAPI refers to infores keys for provenance. There is a spreadsheet of infores keys somewhere. It would be great to ensure that the infores keys listed in a TRAPI document are valid.

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

Upgrade SRI Testing harness to accept and adequately validate TRAPI 1.3

https://github.com/NCATSTranslator/ReasonerAPI/tree/1.3

Support for TRAPI v1.3-beta

I was about to implement validation in ARAX for TRAPI 1.3 beta, but before doing so I was informed that the current validator in PyPI does not permit version numbers with a hyphen, i.e. v1.3-beta

It would be very useful to support v1.3-beta as soon as possible:
https://github.com/NCATSTranslator/ReasonerAPI/releases

Placeholder issue reflecting feedback from Eric Deutsch

1) use check_biolink_model_compliance_of_query_graph() right away to validate QGraphs prior to query submission (note: if a KG is also part of a query, you can also use the corresponding knowledge graph method)
2) We will write a "one stop" method for "semantic" (beyond schema validation) validation of TRAPI Response (including the Message components: QGraph, KnowledgeGraph and Results components)
3) iterate to enhance validation (e.g. we can already ponder add detection of 'dangling nodes/edges' by iterating through node <-> edge mappings)
4) we will strive to optimize performance to make the the reasoner-validator methods tractable for high throughput use
5) Add Biolink Model 3.0 support (after the September 2022 Translator Relay)

Better to brand 'QNode' context errors with 'QNode' rather than just 'Node'

Trouble validating TRAPI 1.3-beta

Hi, I am trying to validate some TRAPI 1.3-beta messages and they are not validating but I don't understand why.

Here's my test code:

import requests
import json
import re
from datetime import datetime
from reasoner_validator import validate
from jsonschema.exceptions import ValidationError

# Set the base URL for the ARAX reasonaer TRAPI 1.1 base
endpoint_url = 'https://arax.ncats.io/api/arax/v1.3'

url = endpoint_url + '/response/54172'
url = endpoint_url + '/response/60320'
print(f"Retrieving response from {url}")

response_content = requests.get(url, headers={'accept': 'application/json'})
status_code = response_content.status_code
if status_code != 200:
    print("ERROR returned with status "+str(status_code))
    print(response_content)
    exit()

# Unpack the response content into a dict
response_dict = response_content.json()
print(f"Retrieved a message with {len(response_dict['message']['results'])} results")
envelope = response_dict

trapi_version = '1.2.0'
trapi_version = '1.3.0-beta'

try:
    validate(envelope,'Response',trapi_version)
    if 'description' not in envelope or envelope['description'] is None:
        envelope['description'] = 'reasoner-validator: PASS'

except ValidationError as error:
    print("Validation failed")
    timestamp = str(datetime.now().isoformat())
    if 'logs' not in envelope or envelope['logs'] is None:
        envelope['logs'] = []
        envelope['logs'].append( { "code": 'InvalidTRAPI', "level": "ERROR", "message": "TRAPI validator reported an error: " + str(error),
            "timestamp": timestamp } )
    if 'description' not in envelope or envelope['description'] is None:
        envelope['description'] = ''
        envelope['description'] = 'ERROR: TRAPI validator reported an error: ' + str(error) + ' --- ' + envelope['description']

print(f"Result: {envelope['description']}")

and when I run this, it appears to fail validation. There's a lot of spewage that's a bit difficult to understand, but I think the crux is:

Failed validating 'oneOf' in schema['properties']['message']['properties']['results']:
    {'oneOf': [{'description': 'List of all returned Result objects for '
                               'the query posed. The list SHOULD NOT be '
                               "assumed to be ordered. The 'score' "
                               'property,\n'
                               ' if present, MAY be used to infer result '
                               'rankings.',
                'items': {'$ref': '#/components/schemas/Result'},
                'type': 'array'},
               {'type': 'null'}]}

On instance['message']['results']:
    [{'confidence': None,
      'description': 'No description available',
      'edge_bindings': {'N1': [{'attributes': None, 'id': 'N1_2'}],
                        'e00': [{'attributes': None,
                                 'id': 'infores:rtx-kg2:CHEMBL.COMPOUND:CHEMBL112-biolink:physically_interacts_with-UniProtKB:P23219'},
                                {'attributes': None,
                                 'id': 'infores:service-provider-trapi:PUBCHEM.COMPOUND:1983-biolink:physically_interacts_with-NCBIGene:5742'},
                                {'attributes': None,
                                 'id': 'infores:rtx-kg2:UniProtKB:P23219-biolink:physically_interacts_with-CHEMBL.COMPOUND:CHEMBL112'}]},
      'essence': 'PTGS1',
      'essence_category': "['biolink:Protein']",
      'id': None,
      'node_bindings': {'n00': [{'attributes': None,
                                 'id': 'CHEMBL.COMPOUND:CHEMBL112',
                                 'query_id': None}],
                        'n01': [{'attributes': None,
                                 'id': 'UniProtKB:P23219',
                                 'query_id': None}]},
      'reasoner_id': 'ARAX',
      'result_group': None,
      'result_group_similarity_score': None,
      'row_data': [1.0, 'PTGS1', "['biolink:Protein']"],
      'score': 1.0,
      'score_direction': None,
      'score_name': None},
...

Any ideas what going on here?

Validation error on null relation?

I am trying to validate this:
https://ars.transltr.io/ars/api/messages/3f7d1b6f-0af2-4884-b334-60602c3ad427

And get this error:

- Message INVALID: None is not of type 'string'

Failed validating 'type' in schema['properties']['query_graph']['properties']['edges']['additionalProperties']['properties']['relation']:
    {'description': 'Lower-level relationship type of this edge',
     'example': 'upregulates',
     'type': 'string'}

On instance['query_graph']['edges']['e00']['relation']:
    None

apparently on this part:

    "query_graph": {
      "edges": {
        "e00": {
          "object": "n01",
          "predicate": "biolink:subclass_of",
          "relation": null,
          "subject": "n00"
        }
      },

Are you seeing the same?
What to do about this?

crash trying to validate TRAPI response

Describe the bug
I tried to validate a TRAPI message and the code crashed

To Reproduce
Run same test as in #65 (after working around that error) and get:

$ python3 test_response.py ce6f88f1-4b85-4124-8cf1-7d4cd34f7ab0
Traceback (most recent call last):
  File "/mnt/data/orangeboard/devED/RTX/code/ARAX/ResponseCache/test_response.py", line 73, in <module>
    if __name__ == "__main__": main()
  File "/mnt/data/orangeboard/devED/RTX/code/ARAX/ResponseCache/test_response.py", line 64, in main
    validator.check_compliance_of_trapi_response(envelope)
  File "/mnt/data/python/TestValidator/reasoner_validator/__init__.py", line 154, in check_compliance_of_trapi_response
    elif self.has_valid_query_graph(message) and \
  File "/mnt/data/python/TestValidator/reasoner_validator/__init__.py", line 219, in has_valid_query_graph
    biolink_validator = check_biolink_model_compliance_of_query_graph(
  File "/mnt/data/python/TestValidator/reasoner_validator/biolink/__init__.py", line 846, in check_biolink_model_compliance_of_query_graph
    validator.check_biolink_model_compliance(graph)
  File "/mnt/data/python/TestValidator/reasoner_validator/biolink/__init__.py", line 749, in check_biolink_model_compliance
    self.validate_graph_node(node_id, details)
  File "/mnt/data/python/TestValidator/reasoner_validator/biolink/__init__.py", line 213, in validate_graph_node
    self.validate_category(
  File "/mnt/data/python/TestValidator/reasoner_validator/biolink/__init__.py", line 634, in validate_category
    biolink_class = self.validate_element_status(
  File "/mnt/data/python/TestValidator/reasoner_validator/biolink/__init__.py", line 258, in validate_element_status
    element: Optional[Element] = self.bmt.get_element(name)
  File "/mnt/data/python/TestValidator/bmt/toolkit.py", line 585, in get_element
    element = self.view.get_element(parsed_name)
  File "/mnt/data/python/TestValidator/linkml_runtime/utils/schemaview.py", line 874, in get_element
    e = self.get_class(element, imports=imports)
  File "/mnt/data/python/TestValidator/linkml_runtime/utils/schemaview.py", line 486, in get_class
    c = self.all_classes(imports=imports).get(class_name, None)
  File "/mnt/data/python/TestValidator/linkml_runtime/utils/schemaview.py", line 290, in all_classes
    classes = copy(self._get_dict(CLASSES, imports))
  File "/mnt/data/python/TestValidator/linkml_runtime/utils/schemaview.py", line 416, in _get_dict
    schemas = self.all_schema(imports)
  File "/mnt/data/python/TestValidator/linkml_runtime/utils/schemaview.py", line 235, in all_schema
    return [m[sn] for sn in self.imports_closure(imports)]
  File "/mnt/data/python/TestValidator/linkml_runtime/utils/schemaview.py", line 211, in imports_closure
    imported_schema = self.load_import(sn)
  File "/mnt/data/python/TestValidator/linkml_runtime/utils/schemaview.py", line 185, in load_import
    schema = load_schema_wrap(sname + '.yaml',
  File "/mnt/data/python/TestValidator/linkml_runtime/utils/schemaview.py", line 70, in load_schema_wrap
    schema = yaml_loader.load(path, target_class=SchemaDefinition, **kwargs)
  File "/mnt/data/python/TestValidator/linkml_runtime/loaders/loader_root.py", line 85, in load
    results = self.load_any(*args, **kwargs)
  File "/mnt/data/python/TestValidator/linkml_runtime/loaders/yaml_loader.py", line 32, in load_any
    return self.load_source(source, loader, target_class, accept_header="text/yaml, application/yaml;q=0.9",
  File "/mnt/data/python/TestValidator/linkml_runtime/loaders/loader_root.py", line 58, in load_source
    data = hbread(source, metadata, metadata.base_path, accept_header)
  File "/mnt/data/python/TestValidator/hbreader/__init__.py", line 260, in hbread
    with hbopen(source, open_info, base_path, accept_header, is_actual_data, read_codec) as f:
  File "/mnt/data/python/TestValidator/hbreader/__init__.py", line 188, in hbopen
    raise e
  File "/mnt/data/python/TestValidator/hbreader/__init__.py", line 184, in hbopen
    response = urlopen(req, context=ssl._create_unverified_context())
  File "/mnt/data/python/Python-3.9.13/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/mnt/data/python/Python-3.9.13/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/mnt/data/python/Python-3.9.13/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/mnt/data/python/Python-3.9.13/lib/python3.9/urllib/request.py", line 561, in error
    return self._call_chain(*args)
  File "/mnt/data/python/Python-3.9.13/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/mnt/data/python/Python-3.9.13/lib/python3.9/urllib/request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: https://raw.githubusercontent.com/mnt/data/python/TestValidator/linkml_runtime/linkml_model/model/schema/types.yaml

Expected behavior
Successful validation

Additional context
Add any other context about the problem here.

New 'recommendations' view needs a new web API endpoint

Design and implement the API, something like /recommend?test_run=####&kp_id=???&ara_id=???

check_compliance_of_trapi_response() only works with Message not Response

Moving from reasoner-validator 2.2.14 to 3.1.0 with some issues.

Version 3.1 seems to work with this:

from reasoner_validator import TRAPIResponseValidator
    validator = TRAPIResponseValidator(trapi_version=trapi_version, biolink_version=biolink_version, sources=None, strict_validation=True)
    validator.check_compliance_of_trapi_response(message=envelope['message'])
    validation_result = validator.get_messages()

And seems to work pretty well. The most startling thing is that even though the class and main method are named as if they are designed to validate a TRAPI Response object, the check_compliance_of_trapi_response() method only seems to work when passed a TRAPI Message object (child of Response). Does not work when passed a Response object. I think it should!

Issue with dependencies

Describe the bug
We have a really simple test that just do a request to our API in production every day and run the reasoner-validator to check the whole thing is compliant.

Every other day the whole test fails even if we change nothing and it's always because of the reasoner-validator dependency tree

This issue has been raised a few times already in the past, and everytime it has been temporary solved, but like a snake whack-a-mole another dependency conflict is always surfacing a few days after a fix has been brought

This time it seems the issue is related to a dependency of oaklib:

___________ ERROR collecting tests/production/test_production_api.py ___________
tests/production/test_production_api.py:3: in <module>
    from reasoner_validator import TRAPIResponseValidator
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/reasoner_validator/__init__.py:3: in <module>
    from reasoner_validator.biolink import (
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/reasoner_validator/biolink/__init__.py:11: in <module>
    from bmt import Toolkit
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bmt/__init__.py:1: in <module>
    from bmt.toolkit import Toolkit
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bmt/toolkit.py:6: in <module>
    from oaklib.implementations import UbergraphImplementation
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/__init__.py:9: in <module>
    from oaklib.selector import get_adapter, get_implementation_from_shorthand  # noqa:F401
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/selector.py:15: in <module>
    from oaklib.implementations.funowl.funowl_implementation import FunOwlImplementation
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/implementations/__init__.py:13: in <module>
    from oaklib.implementations.cx.cx_implementation import CXImplementation
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/implementations/cx/cx_implementation.py:9: in <module>
    from oaklib.implementations.obograph.obograph_implementation import (
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/implementations/obograph/obograph_implementation.py:40: in <module>
    from oaklib.interfaces.differ_interface import DifferInterface
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/interfaces/differ_interface.py:25: in <module>
    from oaklib.utilities.kgcl_utilities import generate_change_id
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/oaklib/utilities/kgcl_utilities.py:11: in <module>
    import kgcl_schema.grammar.parser as kgcl_parser
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/kgcl_schema/grammar/parser.py:9: in <module>
    from bioregistry import parse_iri, get_preferred_prefix, curie_to_str
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bioregistry/__init__.py:5: in <module>
    from .collection_api import get_collection, get_context  # noqa:F401
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bioregistry/collection_api.py:7: in <module>
    from .resource_manager import manager
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bioregistry/resource_manager.py:41: in <module>
    from .schema import (
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bioregistry/schema/__init__.py:5: in <module>
    from .struct import (  # noqa:F401
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/site-packages/bioregistry/schema/struct.py:2054: in <module>
    class RegistryGovernance(BaseModel):
pydantic/main.py:197: in pydantic.main.ModelMetaclass.__new__
    ???
pydantic/fields.py:506: in pydantic.fields.ModelField.infer
    ???
pydantic/fields.py:436: in pydantic.fields.ModelField.__init__
    ???
pydantic/fields.py:552: in pydantic.fields.ModelField.prepare
    ???
pydantic/fields.py:668: in pydantic.fields.ModelField._type_analysis
    ???
/opt/hostedtoolcache/Python/3.9.16/x64/lib/python3.9/typing.py:852: in __subclasscheck__
    return issubclass(cls, self.__origin__)
E   TypeError: issubclass() arg 1 must be a class

The test is here: https://github.com/MaastrichtU-IDS/knowledge-collaboratory/blob/main/backend/tests/production/test_production_api.py

And in the test environment only the following dependencies are installed (cf. dependencies and optional dependencies in https://github.com/MaastrichtU-IDS/knowledge-collaboratory/blob/main/backend/pyproject.toml#L27 )

    "python-multipart >=0.0.5",
    "requests >=2.23.0",
    "httpx >=0.21.1",
    "pydantic[dotenv] >=1.9",
    "fastapi >=0.68.1",
    "uvicorn >=0.15.0",
    "gunicorn >=20.0.4",
    "Authlib >=0.15.4",
    "itsdangerous >=2.0.1",
    "reasoner-pydantic >=2.2.3",
    "rdflib >=6.1.1",
    "SPARQLWrapper >=1.8.5",
    "pytest >=7.1.3,<8.0.0",
    "pytest-cov >=2.12.0,<4.0.0",
    "ruff >=0.0.219",
    "reasoner-validator >=3.1.4",

To Reproduce
Steps to reproduce the behavior:

Try to use reasoner-validator in a real project

Expected behavior
Dependencies should be better defined, and better tested.

Tests have been defined in the tests/ folder, but they are currently not run by the GitHub Actions set in this repo (there are 1 action to publish without testing, and 1 to update the docs). Ideally tests should be run at every push to the repository (like the GitHub action to update the docs), and they should test for different version of python (3.9, 3.10, 3.11 at the moment)

Using the reasoner-validator is really important for us to make sure our output fits the ever changing TRAPI/BioLink specifications. Unfortunately the current state of the reasoner-validator forces us to spend hours per week to fix basic dependencies issues that should be handled upstream by the reasoner-validator, bmt, oaklib, etc. Those tools and libraries should have automated tests on every commit, and those tests should capture those kind of dependency failure by running in different environment.

Unfortunately at the moment our only long term solution is to do just like the reasoner-validator is doing right now: leaving the testing to someone else! (to the next person in the line, which are ARAs and ARX) and wait to be notified by those actors if our TRAPI payload is not compliant

Pass through jsonschema.exceptions.ValidationError

It would be nice to be able to from reasoner_pydantic import validate, ValidationError rather than having the additional from jsonschema.exceptions import ValidationError.

Validate infores CURIEs including unknown or renamed/deprecated infores CURIEs

Detect the biolink:knowledge_source (or child) term in the TRAPI Attribute attribute_type_id field then attempt to validate the contents of the Attribute value as a well-formed and known infores, applying the BMT code resolving biolink/biolink-model-toolkit#85 validation of the TRAPI provenance attributes.

Reasoner Validator yielding incorrect errors for primary_knowledge_source in TRAPI 1.4

Describe the bug
When trying to validate our TRAPI 1.4 messages, we're getting two errors:
error.knowledge_graph.edge.provenance.missing_primary
error.knowledge_graph.edge.attribute.missing

I think we have the new RetrievalSource modeled correctly for TRAPI 1.4, so I'm guessing reasoner validator is still looking in attributes for the primary_knowledge_source.

On a related note, KG Edge attributes are not required and are nullable in TRAPI 1.3 and 1.4, but error.knowledge_graph.edge.attribute.missing implies attributes are required. I assume this is related to the requirement for primary_knowledge_source, and so this error code may not be applicable to 1.4.

To Reproduce
Validating TRAPI response against 1.4.0-beta. Can try the attached COHD response.

cohd_trapi_1.4.txt

Add ability to pass a URL or schema dictionary to TRAPISchemaValidator.validate so it can be used to test example data that corresponds to changes in the ReasonerAPI specification on PR.

Is your feature request related to a problem? Please describe.
When I make changes to the ReasonerAPI schema, I want to write tests that validate the examples I provided with my change.

Following this basic workflow:

make changes in the TranslatorReasonerAPI.yaml in a branch
write some example data that conforms to the new schema
run some tests to make sure the example data conforms to the new schema.

TRAPISchemaValidator.validate(trapi_version) sets the trapi_version to the latest version of TRAPI (1.3.0) even if I pass in another version. (e.g. 1.3.1rc1)

I think this works now because 1.3.0 was made after the last round of schema changes (and ReasonerAPI tests) were in and working. The tests currently validate because they match the latest released schema, but will fail on each PR that attempts to modify the schema.

Describe the solution you'd like
I'd like to be able to pass a URL (e.g.: https://raw.githubusercontent.com/sierra-moxon/ReasonerAPI/mkg_qualifiers/TranslatorReasonerAPI.yaml), or the result of a yaml.load operation (a dictionary containing the schema I want to validate against) to the TRAPISchemaValidator.validate() method for it to use as the schema of reference.

Describe alternatives you've considered
I can write my own validator that uses a local/updated version of the schema to run on PR for ReasonerAPI PRs.

Additional Context
schema to validate against: https://raw.githubusercontent.com/sierra-moxon/ReasonerAPI/mkg_qualifiers/TranslatorReasonerAPI.yaml
simple tests: https://github.com/sierra-moxon/ReasonerAPI/blob/f42e7786e9c958d17ceb04f51ce91c80ad415130/tests/test_valid.py#L56

Provide a formal code based system of error reporting and messaging

Error reporting in the reasoner-validator is currently just a list of free formatted strings. Developing a standard index of such error messages (indexed by 'error code') would streamline the error reporting process.

As part of this new indexing, we should likely consider partitioning the reporting codes into distinct levels: ERROR, WARNING and INFO, (DEBUG?) to meet specific use cases.

Query Graph Validation

As making queries is one of the more difficult aspects of the current translator, I believe we could do better validation of the graph. Although this may not technically validate the JSON schema, it isn't really a useful query. A number of notable issues exist.

No valid node information
An edge that does not point to existing nodes (now seems related to #74)
A predicate that contains "biolink:" but is not a valid predicate

{
  "version": "1.2",
  "message": {
    "query_graph": {
        "nodes": {
            "drug": {}
        },
        "edges": {
            "treats": {"subject": "not_here", "predicates": ["biolink:not_a_real_predicate"], "object": "also_not_here"}
        }
    }
  }
}

check_compliance_of_trapi_response() should validate the whole TRAPI Response object, not just the Message part

check_compliance_of_trapi_response() currently only validates the Message part (QueryGraph, KnowledgeGraph and Results) portion of the TRAPI Response object, but rather need to validate the whole message.

Observations from the UI MVP front lines...

We create here a 'meta-' issue to generally capture additional validation tasks suggested by the Translator UI (see session notes and related Andy Crouse slide presentation).

Duplicate CURIEs for a given name. Shows some items as a duplicate if there are FDA approved results and non-FDA approved results: 'error' versus 'warning'?
- Are the duplications coming from different KPs?
- How would we test for this?
- Could we test ARAs for this?
- Do we conduct any testing at the level of the ARS itself?
Some Missing Names: results may contain cases where there is no name for a drug (in a TRAPI query for drugs treating diseases...): 'error' versus 'warning'?
Missing (concept) node descriptions: 'error' versus 'warning'?

error code with made-up predicate

reasoner_validator 3.1 with TRAPI version 1.3.0 and biolink_version 2.2.11, I see this:

  "errors": [
.......
    {
      "context": "Query Graph n00--['biolink:has_normalized_google_distance_with']->n01 Predicate",
      "name": "biolink:has_normalized_google_distance_with",
      "code": "error.unknown"
    }
  ]

I suppose the problem is that biolink:has_normalized_google_distance_with is something that we made up and isn't real.

First, should "code" be something more clarifying that "error.unknown"?

Second, What should we be using instead of biolink:has_normalized_google_distance_with?

Detect 'dangling nodes/edges' by iterating through node <-> edge mappings)

This operation may be applied both to the Query graph (see issue #14) and to the Knowledge Graph - the latter which may have to either have edges sub-sampled or deemed an optional validation, in the event that the returned TRAPI knowledge graph is very large.

For dangling nodes/edges in Knowledge Graphs, there are two sub-use cases:

Nodes defined in the graph, but not used by any edge
Edges defined in the graph whose nodes are not registered in the nodes list

How do I report what is wrong?

I'm trying to report exactly what is wrong. I'm running this successfully:

    try:
        validate_Message(envelope['message'])
    except ValidationError:
        raise ValueError('Bad Reasoner component!')

But how do I report to the user exactly what is wrong with the message? I can't figure out how to do that.

Deeper validation of TRAPI workflow specifications

We have fixed the check_compliance_of_trapi_response() method to substantially validate a full TRAPI Response object, but only perform schema validation of TRAPI workflow specifications.

As it happens, careful scrutiny of the schema definition indicates that the values of some parameters of some operations, have additional cardinality and semantic constraints, invisible to basic schema validation. This issue is a placeholder to remind us to ponder how to do deeper validation of such constraints (if feasible and performant enough for our use cases).

Deleting node ids?

Describe the bug
I am trying to implement Reasoner validator into ARAX again and am hitting a weird problem that I don't understand. It's entirely possible that I am doing something weird, but I think I've isolated it?

When I run validation on a TRAPI message, it appears to delete node ids from the query_graph. (??!)
Before validation:

    "query_graph": {
      "edges": {
        "t_edge": {
          "attribute_constraints": [],
          "exclude": null,
          "knowledge_type": "inferred",
          "object": "on",
          "option_group_id": null,
          "predicates": [
            "biolink:treats"
          ],
          "qualifier_constraints": [],
          "subject": "sn"
        }
      },
      "nodes": {
        "on": {
          "categories": [
            "biolink:Disease"
          ],
          "constraints": [],
          "ids": [
            "MONDO:0015564"
          ],
          "is_set": false,
          "option_group_id": null
        },
        "sn": {
          "categories": [
            "biolink:ChemicalEntity"
          ],
          "constraints": [],
          "ids": null,
          "is_set": false,
          "option_group_id": null
        }
      }
    },

and after validation:

    "query_graph": {
      "edges": {
        "t_edge": {
          "attribute_constraints": [],
          "exclude": null,
          "knowledge_type": "inferred",
          "object": "on",
          "option_group_id": null,
          "predicates": [
            "biolink:treats"
          ],
          "qualifier_constraints": [],
          "subject": "sn"
        }
      },
      "nodes": {
        "on": {
          "categories": [
            "biolink:Disease"
          ],
          "constraints": [],
          "ids": [],
          "is_set": false,
          "option_group_id": null
        },
        "sn": {
          "categories": [
            "biolink:ChemicalEntity"
          ],
          "constraints": [],
          "ids": null,
          "is_set": false,
          "option_group_id": null
        }
      }
    },

How is this possible? I am doubting my sanity.
Could there have been some experimenting/testing/intentional breaking code left in the validator?

To Reproduce
Steps to reproduce the behavior:
python3 test_response.py 140532

#!/usr/bin/python3
import sys
def eprint(*args, **kwargs): print(*args, file=sys.stderr, **kwargs)

import os
import sys
import re
import json
import requests
import json

#sys.path = ['/mnt/data/python/TestValidator'] + sys.path
from reasoner_validator import TRAPIResponseValidator



############################################ Main ############################################################

def main():

    #### Parse command line options
    import argparse
    argparser = argparse.ArgumentParser(description='CLI testing of the ResponseCache class')
    argparser.add_argument('--verbose', action='count', help='If set, print more information about ongoing processing' )
    argparser.add_argument('response_id', type=str, nargs='*', help='UUID or integer number of a response to fetch and validate')
    params = argparser.parse_args()

    #### Query and print some rows from the reference tables
    if len(params.response_id) == 0 or len(params.response_id) > 1:
        eprint("Please specify a single ARS response UUID")
        return

    response_id = params.response_id[0]

    if len(response_id) > 20:
        debug = True

        ars_hosts = [ 'ars-prod.transltr.io', 'ars.test.transltr.io', 'ars.ci.transltr.io', 'ars-dev.transltr.io', 'ars.transltr.io' ]
        for ars_host in ars_hosts:
            if debug:
                eprint(f"Trying {ars_host}...")
            try:
                response_content = requests.get(f"https://{ars_host}/ars/api/messages/"+response_id, headers={'accept': 'application/json'})
            except Exception as e:
                eprint( f"Connection attempts to {ars_host} triggered an exception: {e}" )
                return
            status_code = response_content.status_code
            if debug:
                eprint(f"--- Fetch of {response_id} from {ars_host} yielded {status_code}")

        if status_code != 200:
            if debug:
                eprint("Cannot fetch from ARS a response corresponding to response_id="+str(response_id))
                eprint(str(response_content.content))
            return

    else:
        response_content = requests.get('https://arax.ncats.io/devED/api/arax/v1.4/response/'+response_id, headers={'accept': 'application/json'})

    status_code = response_content.status_code

    if status_code != 200:
        eprint("Cannot fetch a response corresponding to response_id="+str(response_id))
        return

    #### Unpack the response content into a dict
    try:
        response_dict = response_content.json()
    except:
        eprint("Cannot decode Response with response_id="+str(response_id)+" into JSON")
        return

    if 'fields' in response_dict and 'actor' in response_dict['fields'] and str(response_dict['fields']['actor']) == '9':
            eprint("The supplied response id is a collection id. Please supply the UUID for a single Response")
            return

    if 'fields' in response_dict and 'data' in response_dict['fields']:
        envelope = response_dict['fields']['data']
        if envelope is None:
            envelope = {}
            return envelope
    else:
        envelope = response_dict

    #### If there is a previous validation, remove it
    if 'validation_result' in envelope:
        del(envelope['validation_result'])

    outfile = open('zz_before.json','w')
    print(json.dumps(envelope, sort_keys=True, indent=2), file=outfile)
    outfile.close()

    #### Perform a validation on it
    if params.verbose:
        print(json.dumps(envelope, sort_keys=True, indent=2))
    validator = TRAPIResponseValidator(trapi_version="1.4.0-beta4", biolink_version="3.2.1")
    validator.check_compliance_of_trapi_response(envelope)

    messages: Dict[str, List[Dict[str,str]]] = validator.get_messages()
    print(json.dumps(messages, sort_keys=True, indent=2))

    print("-------------------------")
    validation_output = validator.dump()
    print(validation_output)

    outfile = open('zz_after.json','w')
    print(json.dumps(envelope, sort_keys=True, indent=2), file=outfile)
    outfile.close()

    return


if __name__ == "__main__": main()

Add full Biolink Model release 3.0.2++ validation support; iterate on future model releases

Biolink Model release 3.0.2++ validation support, in particular, edge qualifier validation support.

Review how deprecated Biolink Model classes are being detected and reported

By example, a recent test using the validator gives the following error:

SKIPPED (test case S-P-O triple '(CHEBI:3002$biolink:ChemicalSubstance)--[biolink:treats]->(MESH:D001249$biolink:Disease)', since it is not Biolink Model compliant: BLM Version 2.4.4 Error in Input Edge: 'subject' category 'biolink:ChemicalSubstance' is unknown?)

In the 2.4.4 model, we see

  chemical substance:
    deprecated: >-
      This class is deprecated in favor of 'small molecule.'

The validation is detecting use of the deprecated class as test error. Although this is probably ok, we perhaps still need to review (and possibly adjust) error reporting for deprecated classes.

crash if workflow is null

Describe the bug
Code crashes if workflow is null

To Reproduce

import os
import sys
import re
import json
import requests
import json

from reasoner_validator import TRAPIResponseValidator

def main():

    #### Parse command line options
    import argparse
    argparser = argparse.ArgumentParser(description='CLI testing of the ResponseCache class')
    argparser.add_argument('--verbose', action='count', help='If set, print more information about ongoing processing' )
    argparser.add_argument('response_id', type=str, nargs='*', help='Integer number of a response to read and display')
    params = argparser.parse_args()

    #### Query and print some rows from the reference tables
    if len(params.response_id) == 0 or len(params.response_id) > 1:
        eprint("Please specify a single ARS response UUID")
        return

    response_id = params.response_id[0]

    response_content = requests.get('https://ars-dev.transltr.io/ars/api/messages/'+response_id, headers={'accept': 'application/json'})
    status_code = response_content.status_code

    if status_code != 200:
        eprint("Cannot fetch from ARS a response corresponding to response_id="+str(response_id))
        return

    #### Unpack the response content into a dict
    try:
        response_dict = response_content.json()
    except:
        eprint("Cannot decode ARS response_id="+str(response_id)+" to a Translator Response")
        return

    if 'fields' in response_dict and 'actor' in response_dict['fields'] and str(response_dict['fields']['actor']) == '9':
            eprint("The supplied response id is a collection id. Please supply the UUID for a response")
            return

    if 'fields' in response_dict and 'data' in response_dict['fields']:
        envelope = response_dict['fields']['data']
        if envelope is None:
            envelope = {}
            return envelope

        #### Perform a validation on it
        validator = TRAPIResponseValidator(trapi_version="1.3.0", biolink_version="3.0.3")
        validator.check_compliance_of_trapi_response(envelope)
        messages: Dict[str, List[Dict[str,str]]] = validator.get_messages()

        if params.verbose:
            print(json.dumps(messages, sort_keys=True, indent=2))

    return


if __name__ == "__main__": main()

if you run the above with:

python3 test_response.py ce6f88f1-4b85-4124-8cf1-7d4cd34f7ab0       
Traceback (most recent call last):
  File "/mnt/data/orangeboard/devED/RTX/code/ARAX/ResponseCache/test_response.py", line 73, in <module>
    if __name__ == "__main__": main()
  File "/mnt/data/orangeboard/devED/RTX/code/ARAX/ResponseCache/test_response.py", line 64, in main
    validator.check_compliance_of_trapi_response(envelope)
  File "/mnt/data/python/TestValidator/reasoner_validator/__init__.py", line 134, in check_compliance_of_trapi_response
    response: Dict = self.sanitize_trapi_query(response)
  File "/mnt/data/python/TestValidator/reasoner_validator/__init__.py", line 103, in sanitize_trapi_query
    for step in workflow_steps:
TypeError: 'NoneType' object is not iterable

Expected behavior
A clear and concise description of what you expected to happen.

Additional context
Add any other context about the problem here.

Review all TRAPI models 'object' contexts for 'nullable: true' directives that allow for missing keys without error

validation fails when results or knowledge_graph are null

Describe the bug
The spec defines Message.results and Message.knowledge_graph to be nullable, but validation fails when the results or knowledge_graph are null.

To Reproduce
validate_Response on

{
    "datetime": "2021-02-05 23:27:50",
    "description": "Could not map node n01 to OMOP concept",
    "message": {
        "knowledge_graph": null,
        "query_graph": {
            "edges": {
                "e00": {
                    "object": "n01",
                    "predicate": [
                        "biolink:correlated_with"
                    ],
                    "subject": "n00"
                }
            },
            "nodes": {
                "n00": {
                    "category": [
                        "biolink:ChemicalSubstance"
                    ]
                },
                "n01": {
                    "id": [
                        "DOID:90535"
                    ]
                }
            }
        },
        "results": null
    },
    "query_options": {
        "confidence_interval": 0.99,
        "dataset_id": 3,
        "max_results": 5,
        "method": "obsExpRatio",
        "min_cooccurrence": 0
    },
    "reasoner_id": "COHD",
    "schema_version": "1.0.0",
    "status": "CouldNotMapCurieToLocalVocab",
    "tool_version": "COHD 3.0.0"
}

Expected behavior
Passes validtion.

Additional context
Using release version 1.0.2/1.0.1.

Is this a bug?

While trying to validate a TRAPI Response with the reasoner_validator version 3.1, I am seeing:

  "warnings": [
    {
      "context": "Query Graph",
      "edge_id": "n00--['biolink:physically_interacts_with']->n01",
      "predicate": "biolink:physically_interacts_with",
      "code": "warning.edge.predicate.non_canonical"
    }
  ],

What does this mean?
What is wrong with using n00--['biolink:physically_interacts_with']->n01 ?

Accept version="A.B"

Default to the latest A.B.x

ncatstranslator / reasoner-validator Goto Github PK

reasoner-validator's Issues

Recommend Projects

Recommend Topics

Recommend Org