monarch-initiative / ontogpt Goto Github PK

View Code? Open in Web Editor NEW

499.0 17.0 63.0 59.42 MB

LLM-based ontological extraction tools, including SPIRES

Home Page: https://monarch-initiative.github.io/ontogpt/

License: BSD 3-Clause "New" or "Revised" License

Makefile 0.03% Python 10.48% HTML 0.27% Perl 0.01% Shell 0.01% Jupyter Notebook 89.14% Jinja 0.07%

ai gpt-3 language-models linkml nlp oaklib relation-extraction monarchinitiative chat-gpt large-language-models

ontogpt's Introduction

OntoGPT

Introduction

OntoGPT is a Python package for extracting structured information from text with large language models (LLMs), instruction prompts, and ontology-based grounding.

Two different strategies for knowledge extraction are currently implemented in OntoGPT:

For more details, please see the full documentation.

Quick Start

OntoGPT runs on the command line, though there's also a minimal web app interface (see Web Application section below).

Ensure you have Python 3.9 or greater installed.
Install with pip:
```
pip install ontogpt
```

Set your OpenAI API key:

runoak set-apikey -e openai <your openai api key>

See the list of all OntoGPT commands:
```
ontogpt --help
```
Try a simple example of information extraction:
```
echo "One treatment for high blood pressure is carvedilol." > example.txt
ontogpt extract -i example.txt -t drug
```
OntoGPT will retrieve the necessary ontologies and output results to the command line. Your output will provide all extracted objects under the heading extracted_object.

Web Application

There is a bare bones web application for running OntoGPT and viewing results.

First, install the required dependencies with pip by running the following command:

pip install ontogpt[web]

Then run this command to start the web application:

web-ontogpt

NOTE: We do not recommend hosting this webapp publicly without authentication.

Evaluations

OpenAI's functions have been evaluated on test data. Please see the full documentation for details on these evaluations and how to reproduce them.

Related Projects

TALISMAN, a tool for generating summaries of functions enriched within a gene set. TALISMAN uses OntoGPT to work with LLMs.

Tutorials and Presentations

Presentation: "Staying grounded: assembling structured biological knowledge with help from large language models" - presented by Harry Caufield as part of the AgBioData Consortium webinar series (September 2023)
- Slides
- Video
Presentation: "Transforming unstructured biomedical texts with large language models" - presented by Harry Caufield as part of the BOSC track at ISMB/ECCB 2023 (July 2023)
- Slides
- Video
Presentation: "OntoGPT: A framework for working with ontologies and large language models" - talk by Chris Mungall at Joint Food Ontology Workgroup (May 2023)
- Slides
- Video

Citation

The information extraction approach used in OntoGPT, SPIRES, is described further in: Caufield JH, Hegde H, Emonet V, Harris NL, Joachimiak MP, Matentzoglu N, et al. Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning. arXiv publication: http://arxiv.org/abs/2304.02711

Acknowledgements

This project is part of the Monarch Initiative. We also gratefully acknowledge Bosch Research for their support of this research project.

ontogpt's People

Contributors

Stargazers

Watchers

ontogpt's Issues

Add framework for interfacing with arbitrary LLM

Current dependency on OpenAI API means OntoGPT is set up to assume that it will make API calls to a that resource, but to use other APIs and local resources we will need to generalize our API calls.
Perhaps a modular implementation system a la OAK is appropriate.

Add citation.cff

Can create this dynamically so it includes multiple papers and most recent software version. See https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files

Makefile requires some configuration to process custom model

The README says running make is sufficient to get Pydantic from LinkML YAML, but the list of templates is hard-coded into the Makefile. Including a project.Makefile with a ready-to-update TEMPLATES list (and updating the docs accordingly) may make it easier to update.

About ERROR:ontogpt.engines.knowledge_engine:Error with GildaImplementation

Hello @cmungall, I was able to install ontogpt after adjusting dependencies using poetry. I was trying to run the example your readme, i.e.

kih1pal@SYV-C-0001Y:~/ontogpt$ poetry run ontogpt extract -t gocam.GoCamAnnotations abstract.txt

It seems that something happens from ontology matching steps; it takes very long time and weird SSL issues are currently printing as errors, causing the failure of the operation. I wonder if there are ways to avoid this SSL validation issue, e.g.,

...
WARNING:root:Could not find any mappings for http://identifiers.org/hgnc/1885
WARNING:root:Could not find any mappings for http://identifiers.org/hgnc/1885
ERROR:ontogpt.engines.knowledge_engine:Error with GildaImplementation(resource=OntologyResource(slug=None, directory=None, scheme='gilda', format=None, url=None, readonly=False, provider=None, local=False, in_memory=False, data=None, implementation_class=<class 'oaklib.implementations.gilda.GildaImplementation'>, import_depth=None), strict=False, multilingual=None, preferred_language='en', autosave=True, exclude_owl_top_and_bottom=True, ontology_metamodel_mapper=None, _converter=None) for cGAS: SSL validation failed for https://gilda.s3.amazonaws.com/0.10.3/grounding_terms.tsv.gz [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)
...

I guess this may come from oaklib. I wonder if there are ways to solve or avoid issues. Thank you for your help!

p.s. I also tried if " --skip-annotator gilda:" works, suggested in other posts, but it's not working yet.

Add CellType to IBD extract template

This extraction process should collect cell types when possible.
See https://github.com/monarch-initiative/ontogpt/blob/6ebe777007459390f18ae6d26596974ba65dcf66/src/ontogpt/templates/kidney.yaml

investigate sorting of results from PubMed API -- eg X results truncated to Y, but what is the ordering

Add a generate-extract command for parsing the results of generated text

Use case: generate a description of a concept (e.g. cell type) entirely from LLM's "latent knowledge base", and then extract structure knowledge from it, thus bypassing the need for an incomplete pubmed search.

Could also be used as a kind of validation procedure on generated text - compare extracted knowledge with what is in KB. The difference is either hallucination or KB gaps.

Documentation: provide example of custom LinkML model

I tried

ontogpt extract -t path/to/templates/phenotype.yaml abstract.txt

but got:

FileNotFoundError: [Errno 2] No such file or directory: '/Users/matentzn/ws/ontogpt/src/ontogpt/templates/path/to/templates/phenotype.yaml'`

add option to export KGX TSV

per convo in hackathon with @hrshdhgd @caufieldjh @AgranyaGitHub

We could/should consider adding an option to export KGX TSV

This will be useful for the literature -> KG workflow

Gene entities without annotator available

Hi!

I am trying to define my own extraction model using LinkML (more like editing the already existing templates) that I will then use to extract gene associations to ontologies from these 2 sources: http://purl.obolibrary.org/obo/opl.owl and http://purl.obolibrary.org/obo/wbphenotype.owl.

My issue is that there are no annotators for the gene IDs i would like to extract. These gene IDs are from parasitic worms hence not in gilda or bioportal:hgnc etc.

Is there any way I could locally set-up an "annotator" for my genes (I already have a list with all of them) that I could then use in my template?

Many thanks!

Update docs for enrichment; should all input be YAML?

The example usage for the enrichment command ($ ontogpt enrichment -r sqlite:obo:hgnc -U tests/input/human-genes.txt) doesn't work because the genesets are in a different directory now, and using a local file of one gene ID per line doesn't work:

$ ontogpt enrichment -r sqlite:obo:hgnc -U geneset.txt
Traceback (most recent call last):
  File "/home/harry/ontogpt/.venv/bin/ontogpt", line 6, in <module>
    sys.exit(main())
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/harry/ontogpt/src/ontogpt/cli.py", line 632, in enrichment
    results = ke.summarize(gene_set, normalize=resolver is not None, **kwargs)
  File "/home/harry/ontogpt/src/ontogpt/engines/enrichment.py", line 171, in summarize
    for gene in gene_set.genes.values():
AttributeError: 'NoneType' object has no attribute 'values'

Looks like it's mostly a documentation issue - but also the question of what the expected input format should be. Does it need to be YAML?

Add the possibility to disable or customize grounding

When running ontogpt getting the raw output from GPT models is quite fast (less than 5s), but the process of grounding the extracted entities is really slow (it can take minutes)

It is useful to give a default solution for grounding entities, but it would be practical to enable developers to choose which system they want to use for grounding

An easy way to implement this would be to add a parameter to the KnowledgeEngine to enable/disable grounding when extracting text. So that the developers can choose to implement their own grouding system after retrieving the raw output.

For example in my case I would be interested to use the SRI NameResolution API (https://name-resolution-sri.renci.org/docs)

Let me know if you are interested in this feature, I can create a pull request as I already implemented it in my fork for the SPINES engine

Unexpected behavior with text argument

These lines:

        if len(input) < 50 and Path(input).exists():
            text = open(input, "r").read()
        else:
            logging.info(f"Input {input} is not a file, assuming it is a string")
            text = input

seems useful for piping in text from another command on the command line, but seems likely to lead to unexpected behavior:

if a path to an input file is 50 or more characters long, it will silently be interpreted as input text, leading to unexpected output, example:

~/PythonProject/ontogpt $ extract -t disease_physiology.DiseasePhysiologyAnnotations really/long/path/name/for/some/reason/myInputFile.txt
ERROR:root:Line 'This text does not contain the necessary information to split into fields.' does not contain a colon; ignoring

if the user gives a path to an input file that does not exist, it will again be interpreted as input text, leading to confusing output

~/PythonProject/ontogpt $ extract -t disease_physiology.DiseasePhysiologyAnnotations oops/wrong/directory/myInputFile.txt
ERROR:root:Line 'This text does not contain the necessary information to split into fields.' does not contain a colon; ignoring

Parse numbered lists correctly

gpt-3.5-turbo-16k has 16k context windows - good for enrichment much less need to truncate.

I theory this can be dropped in --gpt-3.5-turbo-16k

but I have noticed the newer models seem less inclined to follow instructions and give ; separated lists. Seen this @caufieldjh? It likes to give numbered lists

  Summary: The common function among these genes is the regulation of various cellular processes and signaling pathways.


  Hypothesis:  The enriched terms suggest that these genes are involved in molecular interactions and signaling networks, which are essential for numerous cellular processes and regulation of various pathways. The overlapping functions may point towards their involvement in common regulatory mechanisms and networks that contribute to cellular homeostasis and development.
term_strings:
  - |-
    1. protein binding
    2. enzyme binding
    3. dna-binding transcription factor activity
    4. rna polymerase ii-specific transcription factor activity
    5. receptor binding activity
    6. atp binding activity
    7. cytoskeletal protein binding activity
    8. growth factor activity
    9. carbohydrate binding activity
    10. heme binding activity

    mechanism: these genes play a role in cellular processes such as protein-protein interactions
  - enzymatic activities
  - transcriptional regulation
  - receptor signaling
  - and binding of various molecules. they are involved in multiple signaling pathways
  - including growth factor signaling
  - dna transcription
  - cellular metabolism

it should be easy to modify the hacky payload parser to accept numbered lists

or maybe we just bite the bullet and use the json-structured function call reponses

Add function to generate prompt from enum

From @cmungall on Slack:

we should also have an option such that it will auto-generate a prompt for an enum (dynamic or static) of a small size

Do we need to expose the `temperature` field?

https://github.com/monarch-initiative/ontogpt/blob/8bb0dc058794924871f5b948fd851b9ed4694d4b/src/ontogpt/clients/openai_client.py#L50-L67C1

According to the OpenAI best practices for prompt engineering:

temperature - A measure of how often the model outputs a less likely token. The higher the temperature, the more random (and usually creative) the output. This, however, is not the same as “truthfulness”. For most factual use cases such as data extraction, and truthful Q&A, the temperature of 0 is best.

This seems like something which clients might want to be able to alter - e.g. a classification task wants to minimise hallucinations whereas ontology creation might want a bit more 'creativity'.

Add functionality to pubmed-extract to retrieve PMC XML

          In the short term, I can add functionality to retrieve PMC XML through the API for corresponding PMIDs, as available

Originally posted by @caufieldjh in #154 (comment)

mapping a string span in a document to a partially overlapping term label is too risky

In the 2023-05-17 EGSB microbial traits hackaton, we found many cases where a text in the input text was mapped to the ENVTHES ontology, based on a partial overlap. Like

- id: ENVTHES:22024
  label: trophic strategy

When ENVTHES:22024's label is only "trophic"

That kind of fuzzy mapping should not be a default behavior.

missing input file results in hang

If a specified input file doesn't exist (such as if you accidentally saved it in a subdirectory), then ontogpt hangs forever rather than complaining (e.g., if there is no file character states.txt):

poetry run ontogpt extract -t character_state.Trait -i characterstates.txt

Add parsing for tables from clinical case reports

Given a table in a PDF (or text) of a clinical case report, we would like to be able to do the following.

Extract variable names, reference ranges (if present) and values.
Interpret whether the value falls within the reference range. GPT will do this, with some irregularity, but that may be excessive.
Determine the corresponding phenotype for the value. This is the real value, and we'd really like to be able to get an HPO term, but even knowing what a value outside the reference range indicates (in descriptive terms, like "hypernatremia", not its diagnostic value) can be further linked through the existing methods in OntoGPT

About ModuleNotFoundError: No module named 'importlib.util' issue

Hello @cmungall, I was trying to install and test ontogpt but I was not able to proceed further because of the importlib issue shown below. It seems that it keeps searching for the python library called importlib, but most people say that library is already available in Python 3.9.

yy20716@yy20716-VirtualBox:~/ontogpt-test$ python3.9 -m venv venv
yy20716@yy20716-VirtualBox:~/ontogpt-test$ source venv/bin/activate
(venv) yy20716@yy20716-VirtualBox:~/ontogpt-test$ export PYTHONPATH=.:$PYTHONPATH
(venv) yy20716@yy20716-VirtualBox:~/ontogpt-test$ pip install ontogpt
Collecting ontogpt
  Downloading ontogpt-0.2.0-py3-none-any.whl (978 kB)
     |████████████████████████████████| 978 kB 4.0 MB/s 
Collecting bioc<3.0,>=2.0.post5
  Downloading bioc-2.0.post5-py3-none-any.whl (37 kB)
Collecting gilda<0.11.0,>=0.10.3
  Downloading gilda-0.10.3-py3-none-any.whl (166 kB)
     |████████████████████████████████| 166 kB 14.7 MB/s 
Collecting oaklib<0.2.0,>=0.1.64
  Downloading oaklib-0.1.73-py3-none-any.whl (458 kB)
     |████████████████████████████████| 458 kB 14.9 MB/s 
Collecting linkml-owl<0.3.0,>=0.2.7
  Downloading linkml_owl-0.2.7-py3-none-any.whl (16 kB)
Collecting airium<0.3.0,>=0.2.5
  Downloading airium-0.2.5-py3-none-any.whl (13 kB)
Collecting nlpcloud<2.0.0,>=1.0.39
  Downloading nlpcloud-1.0.40-py3-none-any.whl (4.5 kB)
Collecting importlib<2.0.0,>=1.0.4
  Downloading importlib-1.0.4.zip (7.1 kB)
    ERROR: Command errored out with exit status 1:
     command: /home/yy20716/ontogpt-test/venv/bin/python3.9 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-m29z69lc/importlib/setup.py'"'"'; __file__='"'"'/tmp/pip-install-m29z69lc/importlib/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-m29z69lc/importlib/pip-egg-info
         cwd: /tmp/pip-install-m29z69lc/importlib/
    Complete output (11 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/yy20716/ontogpt-test/venv/lib/python3.9/site-packages/setuptools/__init__.py", line 6, in <module>
        import distutils.core
      File "/usr/lib/python3.9/distutils/core.py", line 16, in <module>
        from distutils.dist import Distribution
      File "/usr/lib/python3.9/distutils/dist.py", line 19, in <module>
        from distutils.util import check_environ, strtobool, rfc822_escape
      File "/usr/lib/python3.9/distutils/util.py", line 9, in <module>
        import importlib.util
    ModuleNotFoundError: No module named 'importlib.util'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

I also see similar errors when trying to use poetry:

(venv) yy20716@yy20716-VirtualBox:~/ontogpt$ poetry install
Installing dependencies from lock file

Package operations: 18 installs, 1 update, 0 removals

  • Installing adeft (0.11.2)
  • Installing bioc (2.0.post5)
  • Installing eutils (0.6.0)
  • Installing importlib (1.0.4): Failed

  ChefBuildError

  Backend 'setuptools.build_meta:__legacy__' is not available.

  at ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/installation/chef.py:152 in _prepare
      148│ 
      149│                 error = ChefBuildError("\n\n".join(message_parts))
      150│ 
      151│             if error is not None:
    → 152│                 raise error from None
      153│ 
      154│             return path
      155│ 
      156│     def _prepare_sdist(self, archive: Path, destination: Path | None = None) -> Path:

Note: This error originates from the build backend, and is likely not a problem with poetry but with importlib (1.0.4) not supporting PEP 517 builds. You can verify this by running 'pip wheel --use-pep517 "importlib (==1.0.4)"'.

I tested this issue using other ubuntu machines, but the same issue also arises. Can you help me to solve or avoid this issue? Thank you.

p.s. I tried with the version 1.1, and the same issue still exists. I wonder if we could remove this importlib in the dependencies.

Reconfigure HuggingFace Hub interface to use any repo

I initially set up the HF Hub interface to only work with a pre-established set of models (those defined in models.yaml).
This was built on the assumption that some models will work better with OntoGPT's extract methods than others. This assumption isn't wrong, but it's also not possible to test all of the available models out there.
It would be convenient to have a pre-tested set of models defined, but still permit usage of any model the API can access (i.e., any HF Hub repo name).
The CLI could look for a specific prefix like hf_hub: to indicate that it should use a manually specified model repo.

Add `gpt-3.5-turbo-16k`

The OpenAI gpt-3.5-turbo-16k model has a context limit of 16K tokens.
Just need to add it to the models list for it to be usable.

Some folks have reported that the full context isn't used, even if it can be passed to the model:
https://community.openai.com/t/gpt-3-5-turbo-16k-with-long-context-not-work/264987

web-ontogpt doesn't work in new installation

jim (main)$ poetry run web-ontogpt
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/homebrew/Cellar/[email protected]/3.11.3/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/jim/Documents/Source/ontogpt/src/ontogpt/webapp/main.py", line 5, in <module>
    import uvicorn
ModuleNotFoundError: No module named 'uvicorn'

Implement function calls for extract using OpenAI API

OpenAI models now support function calling: https://platform.openai.com/docs/guides/gpt/function-calling
One of the example use cases is extracting structured data from text, with the implication being that the function call will yield more consistent results.

Add ability to pass in synonymizer rules

@realmarcin can direct

when running the betacat case it doesn't match "protein kinase" - need to add activity

Support additional chat LLMs

GPT-3+ isn't the only LLM out there, and not even the only LLM set up with a chat interface. Let's support others to expand the scope of use cases and offer more flexibility in approaches. More open options would also give us and other OntoGPT users a better idea about the training data used with these models.
This is more of a parent issue - please split as needed.
Candidate LLMs and their accessibility (where known):

Bard - API access is currently waitlisted (Google calls it the PaLM API) and may require using Google Cloud Platform. See this post.
Open-Assistant - https://github.com/LAION-AI/Open-Assistant
Alpaca-7B+ - https://github.com/basetenlabs/alpaca-7b-truss
StableLM - https://github.com/stability-AI/stableLM/
OpenChatKit - https://github.com/togethercomputer/OpenChatKit
instruct-gpt-j https://huggingface.co/nlpcloud/instruct-gpt-j-fp16

Evaluate relation extraction on OMIM gene to disease

OMIM gene to disease relations are a good biomedical example of relations, and there are curated sets of these relations (https://maayanlab.cloud/Harmonizome/dataset/OMIM+Gene-Disease+Associations).
Can we use OntoGPT/SPIRES to, starting with either OMIM profiles for each gene OR the cited papers, extract the same G2D association?

Autogenerate docs

As follow-up from #92 ,

Autogenerate docs w/ GH action - see how the linkml-cookiecutter does it
Remove docs from main

Add a cell type template

https://docs.google.com/document/d/1cVvD6WfmaTZ_LOKUjUa-ZcWbXytA-_e974Ij2On6ohA/edit

cc @dosumis

Add filters to pubmed-annotate to pre-select entries based on gene annotations and MeSH

Mostly relevant to the ibd_literature branch for now, but we may select for a more useful collection of documents in a pubmed-annotate set by examining the available metadata re: genes, MeSH terms, and keywords.

e.g., if we want docs with genes to extract, only include those (if preferred) with gene mentions already present on Pubmed.

Error on running extract with egg noodles recipe

poetry run ontogpt extract -t recipe -i tests/input/cases/recipe-egg-noodles.txt
ERROR:root:Line 'No entities found. The text provides instructions for making homemade egg noodles but does not include information on a specific recipe with a name or categories. The ingredients and steps are described in a general sense without specific quantities.' does not contain a colon; ignoring
ERROR:root:Cannot ground None annotation, cls=Recipe

need spire logo

(the Berkeley campanile, which is sort of a spire, could be an inspiration...)

Add interface for abstract OpenAI-equivalent API

From #70:

Currently, I cannot use ontoGPT in my organization as it is due to security concerns around external APIs like OpenAI's. To address this issue, I suggest exploring the possibility of using open-source, local LLMs and their APIs as an alternative to OpenAI's API.

I have a similar use case, however, could it ever be abstracted even further, with an option to use in-house LLM services? We are currently working on such a ChatGPT alternative, and it would be amazing to simply plug in that API for ontoGPT...

Originally posted by @remerjohnson in #70 (comment)

installation issue regarding importlib (using conda and pip)

Many thanks for the great work!
I tried to install the package on Linux server through the two commands below but encountered an installation problem after collecting importlib. Please see the commands and the console output.

Best regards,
A

Commands:

> conda create -n onto39 -y python=3.9 && conda activate onto39
> pip install ontogpt

Console output:

Collecting ontogpt
  Using cached ontogpt-0.2.6-py3-none-any.whl (1.0 MB)
Collecting wikipedia<2.0.0,>=1.4.0
  Using cached wikipedia-1.4.0.tar.gz (27 kB)
  Preparing metadata (setup.py) ... done
Collecting aiohttp<4.0.0,>=3.8.4
  Using cached aiohttp-3.8.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
Collecting inflection<0.6.0,>=0.5.1
  Using cached inflection-0.5.1-py2.py3-none-any.whl (9.5 kB)
Collecting beautifulsoup4<5.0.0,>=4.11.1
  Using cached beautifulsoup4-4.12.2-py3-none-any.whl (142 kB)
Collecting importlib<2.0.0,>=1.0.4
  Using cached importlib-1.0.4.zip (7.1 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [1 lines of output]
      ERROR: Can not execute `setup.py` since setuptools is not available in the build environment.
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Output to ttl or owl fails with `pubmed-annotate`

Trying to get output in ttl or owl format for pubmed-annotate fails with a few different errors.
Running something like the following:

ontogpt -vvv pubmed-annotate -t ibd_literature "irritable bowel disease drug blood" -o output.owl -O owl

initially throws an AttributeError because the write_extraction method in cli.py expects owl and ttl output to pass knowledge_engine.schemaview to the exporter.
So, as a quick fix for that (thanks @hrshdhgd), ensure that calls to write_extraction include the knowledge engine (usually ke).

however,

there's still potential to not get the desired output, as seen below:

$ ontogpt -vvv pubmed-annotate -t ibd_literature "irritable bowel disease drug blood" -o output.owl -O owl
...
INFO:root:{'genes': ['Not specified in the text'], 'exposures': ['CHEBI:16243'], 'gene_exposures_relationships': []}
DEBUG:root:BASE=/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml_runtime/linkml_model/model/schema
INFO:root:Default_range not specified. Default set to 'string'
Traceback (most recent call last):
  File "/home/harry/ontogpt/.venv/bin/ontogpt", line 6, in <module>
    sys.exit(main())
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/harry/ontogpt/src/ontogpt/cli.py", line 301, in pubmed_annotate
    write_extraction(results, output, output_format, ke)
  File "/home/harry/ontogpt/src/ontogpt/cli.py", line 102, in write_extraction
    exporter.export(results, output, knowledge_engine.schemaview)
  File "/home/harry/ontogpt/src/ontogpt/io/owl_exporter.py", line 43, in export
    ne_as_dc = self._as_dataclass_object(named_entity, schemaview)
  File "/home/harry/ontogpt/src/ontogpt/io/owl_exporter.py", line 55, in _as_dataclass_object
    dataclasses_module = PythonGenerator(schemaview.schema).compile_module()
  File "<string>", line 27, in __init__
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml/generators/pythongen.py", line 57, in __post_init__
    super().__post_init__()
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml/utils/generator.py", line 189, in __post_init__
    self._initialize_using_schemaloader(schema)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml/utils/generator.py", line 232, in _initialize_using_schemaloader
    loader.resolve()
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml/utils/schemaloader.py", line 252, in resolve
    self.raise_value_error(
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml/utils/schemaloader.py", line 1037, in raise_value_error
    SchemaLoader.raise_value_errors(error, loc_str)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml/utils/schemaloader.py", line 1047, in raise_value_errors
    raise ValueError(f'{TypedNode.yaml_loc(loc_str, suffix="")} {error}')
ValueError: File "<file>", line 255, col 16 slot: geneExposureRelationship__molecular_activity - unrecognized range (Exposure)

OK, so this looks like an issue with the schema but why do we see it now and not with yaml output?

Trying with a different template raises its own issues:

$ontogpt -vvv pubmed-annotate -t ctd.ChemicalToDiseaseRelationship "irritable bowel disease drug blood" -o output.owl -O owl
...
ERROR:root:Line 'No entities were provided in the given text.' does not contain a colon; ignoring
DEBUG:root:RAW: None
DEBUG:root:Grounding annotation object None
ERROR:root:Cannot ground None annotation, cls=ChemicalToDiseaseRelationship
Traceback (most recent call last):
  File "/home/harry/ontogpt/.venv/bin/ontogpt", line 6, in <module>
    sys.exit(main())
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/harry/ontogpt/src/ontogpt/cli.py", line 301, in pubmed_annotate
    write_extraction(results, output, output_format, ke)
  File "/home/harry/ontogpt/src/ontogpt/cli.py", line 102, in write_extraction
    exporter.export(results, output, knowledge_engine.schemaview)
  File "/home/harry/ontogpt/src/ontogpt/io/owl_exporter.py", line 31, in export
    id_slot = schemaview.get_identifier_slot(cls_name)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml_runtime/utils/schemaview.py", line 1331, in get_identifier_slot
    for sn in self.class_slots(cn, imports=imports):
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml_runtime/utils/schemaview.py", line 1150, in class_slots
    ancs = self.class_ancestors(class_name, imports=imports)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml_runtime/utils/schemaview.py", line 703, in class_ancestors
    return _closure(lambda x: self.class_parents(x, imports=imports, mixins=mixins, is_a=is_a),
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml_runtime/utils/schemaview.py", line 56, in _closure
    vals = f(i)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml_runtime/utils/schemaview.py", line 703, in <lambda>
    return _closure(lambda x: self.class_parents(x, imports=imports, mixins=mixins, is_a=is_a),
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml_runtime/utils/schemaview.py", line 590, in class_parents
    cls = self.get_class(class_name, imports, strict=True)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml_runtime/utils/schemaview.py", line 507, in get_class
    raise ValueError(f'No such class as "{class_name}"')
ValueError: No such class as "NoneType"

Running web-ontogpt gives Internal Server Error

It says name 'OPENAI_MODELS' is not defined

Error using the BioLink model

Hi, I tried to use the BioLink model as base to run the SPIRES engine (as it was recommended in the readme!), but faced some issue.

Generating the .py file from the biolink_model.yaml put in src/ontogpt/templates/ worked well with make

But we encounter an error when trying to extract a class:

poetry run ontogpt extract -t biolink_model.ChemicalToDiseaseOrPhenotypicFeatureAssociation treatment.txt

We are getting the error ValueError: Template biolink_model.ChemicalToDiseaseOrPhenotypicFeatureAssociation not found because the classes name defined in the BioLink model are in the format chemical to disease or phenotypic feature association

If we try to provide the class name with spaces:

poetry run ontogpt -v extract -t "biolink_model.chemical to disease or phenotypic feature association" treatment.txt

There is an error due to the python classes name not matching:

  File "/home/vemonet/develop/translator/ontogpt/src/ontogpt/engines/knowledge_engine.py", line 230, in _get_template_class
    self.template_pyclass = mod.__dict__[class_name]
KeyError: 'chemical to disease or phenotypic feature association'

I could easily implement a fix I think, but I wonder if anyone here with a better knowledge of LinkML than me has a quick fix for this problem? @cmungall

No ontogpt executable after make

Hi,

Thanks for a great initiative, can't wait to try it!

I set up a condaenv with py3.9

Added poetry and oaklib
conda install -c conda-forge poetry
pip install oaklib

Then followed readme steps from "Pre-requisites" onwards.

make worked fine

But the test case failed:
(py39) magz@Magdalenas-MacBook-Pro ontogpt % ontogpt extract -t mendelian_disease.MendelianDisease -i tests/input/cases/mendelian-disease-marfan.txt
zsh: command not found: ontogpt

So, I suppose there should be an executable called ontogpt, which I for some reason don't have?
Should I be in unix or bash for it to work? This is the make output (but it looks fine to me, no errors):

(py39) magz@Magdalenas-MacBook-Pro ontogpt % make
poetry run gen-project src/ontogpt/templates/core.yaml -d projects/core && cp -pr projects/core/docs docs/core
ALL_SCHEMAS = ['src/ontogpt/templates/core.yaml']
INFO:root:Generating: graphql
INFO:root: SCHEMA: src/ontogpt/templates/core.yaml
INFO:root: PARENT=projects/core/graphql
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink

metamodel_version: 1.7.0

INFO:root: graphql ARGS: {'mergeimports': True}
INFO:root: WRITING TO: projects/core/graphql/core.graphql
INFO:root:Generating: jsonldcontext
INFO:root: SCHEMA: src/ontogpt/templates/core.yaml
INFO:root: PARENT=projects/core/jsonld
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink
INFO:root: jsonldcontext ARGS: {'mergeimports': True}
INFO:root: WRITING TO: projects/core/jsonld/core.context.jsonld
INFO:root:Generating: jsonld
INFO:root: SCHEMA: src/ontogpt/templates/core.yaml
INFO:root: PARENT=projects/core/jsonld
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink
INFO:root: jsonld ARGS: {'mergeimports': True, 'context': 'projects/core/jsonld/core.context.jsonld'}
INFO:root: WRITING TO: projects/core/jsonld/core.jsonld
INFO:root:Generating: jsonschema
INFO:root: SCHEMA: src/ontogpt/templates/core.yaml
INFO:root: PARENT=projects/core/jsonschema
INFO:root: jsonschema ARGS: {'mergeimports': True}
INFO:root:Importing linkml:types as /Users/magz/miniconda3/envs/py39/lib/python3.9/site-packages/linkml_runtime/linkml_model/model/schema/types from source src/ontogpt/templates/core.yaml; base_dir=None
INFO:root: WRITING TO: projects/core/jsonschema/core.schema.json
INFO:root:Generating: markdown
INFO:root: SCHEMA: src/ontogpt/templates/core.yaml
INFO:root: PARENT=projects/core/docs
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink
INFO:root: markdown ARGS: {'mergeimports': True, 'directory': 'projects/core/docs', 'index_file': 'core.md'}
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
INFO:root:Generating: owl
INFO:root: SCHEMA: src/ontogpt/templates/core.yaml
INFO:root: PARENT=projects/core/owl
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink
INFO:root: owl ARGS: {'mergeimports': True}
ERROR:root:Multiple slots with URI: http://w3id.org/ontogpt/core/id: ['namedEntity__id', 'publication__id']; consider giving each a unique slot_uri
ERROR:root:Multiple slots with URI: http://w3id.org/ontogpt/core/id: ['namedEntity__id', 'publication__id']; consider giving each a unique slot_uri
INFO:root: WRITING TO: projects/core/owl/core.owl.ttl
INFO:root:Generating: prefixmap
INFO:root: SCHEMA: src/ontogpt/templates/core.yaml
INFO:root: PARENT=projects/core/prefixmap
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink
INFO:root: prefixmap ARGS: {'mergeimports': True}
INFO:root: WRITING TO: projects/core/prefixmap/core.yaml
INFO:root:Generating: proto
INFO:root: SCHEMA: src/ontogpt/templates/core.yaml
INFO:root: PARENT=projects/core/protobuf
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink
syntax="proto3";
package
// metamodel_version: 1.7.0
INFO:root: proto ARGS: {'mergeimports': True}
INFO:root: WRITING TO: projects/core/protobuf/core.proto
INFO:root:Generating: python
INFO:root: SCHEMA: src/ontogpt/templates/core.yaml
INFO:root: PARENT=projects/core
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink
INFO:root: python ARGS: {'mergeimports': True}
INFO:root:CHECK: extractionResult__input_id => string
INFO:root:CHECK: extractionResult__input_title => string
INFO:root:CHECK: extractionResult__input_text => string
INFO:root:CHECK: extractionResult__raw_completion_output => string
INFO:root:CHECK: extractionResult__prompt => string
INFO:root:CHECK: extractionResult__extracted_object => Any
INFO:root:FALSE: OCCURS BEFORE: Any == Any owning: ExtractionResult
INFO:root:CHECK: extractionResult__named_entities => Any
INFO:root:FALSE: OCCURS BEFORE: Any == Any owning: ExtractionResult
INFO:root:CHECK: extractionResult__input_id => string
INFO:root:CHECK: extractionResult__input_title => string
INFO:root:CHECK: extractionResult__input_text => string
INFO:root:CHECK: extractionResult__raw_completion_output => string
INFO:root:CHECK: extractionResult__prompt => string
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: namedEntity__label => string
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: namedEntity__label => string
INFO:root:CHECK: triple__subject => NamedEntity
INFO:root:CHECK: triple__predicate => RelationshipType
INFO:root:CHECK: triple__object => NamedEntity
INFO:root:CHECK: triple__qualifier => string
INFO:root:CHECK: triple__subject_qualifier => NamedEntity
INFO:root:CHECK: triple__object_qualifier => NamedEntity
INFO:root:CHECK: triple__subject => NamedEntity
INFO:root:CHECK: triple__predicate => RelationshipType
INFO:root:CHECK: triple__object => NamedEntity
INFO:root:CHECK: triple__qualifier => string
INFO:root:CHECK: triple__subject_qualifier => NamedEntity
INFO:root:CHECK: triple__object_qualifier => NamedEntity
INFO:root:CHECK: textWithTriples__publication => Publication
INFO:root:TRUE: OCCURS SAME: TextWithTriples == Publication owning: TextWithTriples
INFO:root:CHECK: textWithTriples__triples => Triple
INFO:root:FALSE: OCCURS BEFORE: Triple == Triple owning: TextWithTriples
INFO:root:CHECK: textWithTriples__publication => Publication
INFO:root:TRUE: OCCURS SAME: TextWithTriples == Publication owning: TextWithTriples
INFO:root:CHECK: textWithTriples__triples => Triple
INFO:root:FALSE: OCCURS BEFORE: Triple == Triple owning: TextWithTriples
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: publication__id => string
INFO:root:CHECK: publication__title => string
INFO:root:CHECK: publication__abstract => string
INFO:root:CHECK: publication__combined_text => string
INFO:root:CHECK: publication__full_text => string
INFO:root:CHECK: publication__id => string
INFO:root:CHECK: publication__title => string
INFO:root:CHECK: publication__abstract => string
INFO:root:CHECK: publication__combined_text => string
INFO:root:CHECK: publication__full_text => string
INFO:root:CHECK: annotatorResult__subject_text => string
INFO:root:CHECK: annotatorResult__object_id => string
INFO:root:CHECK: annotatorResult__object_text => string
INFO:root:CHECK: annotatorResult__subject_text => string
INFO:root:CHECK: annotatorResult__object_id => string
INFO:root:CHECK: annotatorResult__object_text => string
INFO:root:CHECK: extractionResult__input_id => string
INFO:root:CHECK: extractionResult__input_title => string
INFO:root:CHECK: extractionResult__input_text => string
INFO:root:CHECK: extractionResult__raw_completion_output => string
INFO:root:CHECK: extractionResult__prompt => string
INFO:root:CHECK: extractionResult__extracted_object => Any
INFO:root:CHECK: extractionResult__named_entities => Any
INFO:root:CHECK: namedEntity__label => string
INFO:root:CHECK: triple__subject => NamedEntity
INFO:root:CHECK: triple__predicate => RelationshipType
INFO:root:CHECK: triple__object => NamedEntity
INFO:root:CHECK: triple__qualifier => string
INFO:root:CHECK: triple__subject_qualifier => NamedEntity
INFO:root:CHECK: triple__object_qualifier => NamedEntity
INFO:root:CHECK: textWithTriples__publication => Publication
INFO:root:CHECK: textWithTriples__triples => Triple
INFO:root:CHECK: publication__id => string
INFO:root:CHECK: publication__title => string
INFO:root:CHECK: publication__abstract => string
INFO:root:CHECK: publication__combined_text => string
INFO:root:CHECK: publication__full_text => string
INFO:root:CHECK: annotatorResult__subject_text => string
INFO:root:CHECK: annotatorResult__object_id => string
INFO:root:CHECK: annotatorResult__object_text => string
INFO:root: WRITING TO: projects/core/core.py
INFO:root:Generating: shex
INFO:root: SCHEMA: src/ontogpt/templates/core.yaml
INFO:root: PARENT=projects/core/shex
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink

metamodel_version: 1.7.0

INFO:root: shex ARGS: {'mergeimports': True}
INFO:root: WRITING TO: projects/core/shex/core.shex
INFO:root:Generating: shacl
INFO:root: SCHEMA: src/ontogpt/templates/core.yaml
INFO:root: PARENT=projects/core/shacl
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink

metamodel_version: 1.7.0

INFO:root: shacl ARGS: {'mergeimports': True}
INFO:root:Importing linkml:types as /Users/magz/miniconda3/envs/py39/lib/python3.9/site-packages/linkml_runtime/linkml_model/model/schema/types from source src/ontogpt/templates/core.yaml; base_dir=None
INFO:root: WRITING TO: projects/core/shacl/core.shacl.ttl
INFO:root:Generating: sqlddl
INFO:root: SCHEMA: src/ontogpt/templates/core.yaml
INFO:root: PARENT=projects/core/sqlschema
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink
/* metamodel_version: 1.7.0 */
INFO:root: sqlddl ARGS: {'mergeimports': True}
INFO:root:No PK for Publication
INFO:root: WRITING TO: projects/core/sqlschema/core.sql
INFO:root:Generating: excel
INFO:root: SCHEMA: src/ontogpt/templates/core.yaml
INFO:root: PARENT=projects/core/excel
INFO:root: excel ARGS: {'mergeimports': True, 'output': 'projects/core/excel/core.xlsx'}
INFO:root:Importing linkml:types as /Users/magz/miniconda3/envs/py39/lib/python3.9/site-packages/linkml_runtime/linkml_model/model/schema/types from source src/ontogpt/templates/core.yaml; base_dir=None
poetry run gen-project src/ontogpt/templates/mendelian_disease.yaml -d projects/mendelian_disease && cp -pr projects/mendelian_disease/docs docs/mendelian_disease
ALL_SCHEMAS = ['src/ontogpt/templates/mendelian_disease.yaml']
INFO:root:Generating: graphql
INFO:root: SCHEMA: src/ontogpt/templates/mendelian_disease.yaml
INFO:root: PARENT=projects/mendelian_disease/graphql
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: MONDO
WARNING:root:Unrecognized prefix: HGNC
WARNING:root:Unrecognized prefix: HP
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink

metamodel_version: 1.7.0

INFO:root: graphql ARGS: {'mergeimports': True}
INFO:root: WRITING TO: projects/mendelian_disease/graphql/mendelian_disease.graphql
INFO:root:Generating: jsonldcontext
INFO:root: SCHEMA: src/ontogpt/templates/mendelian_disease.yaml
INFO:root: PARENT=projects/mendelian_disease/jsonld
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: MONDO
WARNING:root:Unrecognized prefix: HGNC
WARNING:root:Unrecognized prefix: HP
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink
INFO:root: jsonldcontext ARGS: {'mergeimports': True}
INFO:root: WRITING TO: projects/mendelian_disease/jsonld/mendelian_disease.context.jsonld
INFO:root:Generating: jsonld
INFO:root: SCHEMA: src/ontogpt/templates/mendelian_disease.yaml
INFO:root: PARENT=projects/mendelian_disease/jsonld
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: MONDO
WARNING:root:Unrecognized prefix: HGNC
WARNING:root:Unrecognized prefix: HP
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink
INFO:root: jsonld ARGS: {'mergeimports': True, 'context': 'projects/mendelian_disease/jsonld/mendelian_disease.context.jsonld'}
INFO:root: WRITING TO: projects/mendelian_disease/jsonld/mendelian_disease.jsonld
INFO:root:Generating: jsonschema
INFO:root: SCHEMA: src/ontogpt/templates/mendelian_disease.yaml
INFO:root: PARENT=projects/mendelian_disease/jsonschema
INFO:root: jsonschema ARGS: {'mergeimports': True}
INFO:root:Importing core as core from source src/ontogpt/templates/mendelian_disease.yaml; base_dir=src/ontogpt/templates
INFO:root:Importing linkml:types as /Users/magz/miniconda3/envs/py39/lib/python3.9/site-packages/linkml_runtime/linkml_model/model/schema/types from source src/ontogpt/templates/mendelian_disease.yaml; base_dir=None
INFO:root: WRITING TO: projects/mendelian_disease/jsonschema/mendelian_disease.schema.json
INFO:root:Generating: markdown
INFO:root: SCHEMA: src/ontogpt/templates/mendelian_disease.yaml
INFO:root: PARENT=projects/mendelian_disease/docs
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: MONDO
WARNING:root:Unrecognized prefix: HGNC
WARNING:root:Unrecognized prefix: HP
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink
INFO:root: markdown ARGS: {'mergeimports': True, 'directory': 'projects/mendelian_disease/docs', 'index_file': 'mendelian_disease.md'}
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
WARNING:root:Instantiating generator with another generator is deprecated
INFO:root:Generating: owl
INFO:root: SCHEMA: src/ontogpt/templates/mendelian_disease.yaml
INFO:root: PARENT=projects/mendelian_disease/owl
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: MONDO
WARNING:root:Unrecognized prefix: HGNC
WARNING:root:Unrecognized prefix: HP
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink
INFO:root: owl ARGS: {'mergeimports': True}
ERROR:root:Multiple slots with URI: http://w3id.org/ontogpt/core/id: ['namedEntity__id', 'publication__id']; consider giving each a unique slot_uri
ERROR:root:Multiple slots with URI: http://w3id.org/ontogpt/core/id: ['namedEntity__id', 'publication__id']; consider giving each a unique slot_uri
INFO:root: WRITING TO: projects/mendelian_disease/owl/mendelian_disease.owl.ttl
INFO:root:Generating: prefixmap
INFO:root: SCHEMA: src/ontogpt/templates/mendelian_disease.yaml
INFO:root: PARENT=projects/mendelian_disease/prefixmap
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: MONDO
WARNING:root:Unrecognized prefix: HGNC
WARNING:root:Unrecognized prefix: HP
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink
INFO:root: prefixmap ARGS: {'mergeimports': True}
INFO:root: WRITING TO: projects/mendelian_disease/prefixmap/mendelian_disease.yaml
INFO:root:Generating: proto
INFO:root: SCHEMA: src/ontogpt/templates/mendelian_disease.yaml
INFO:root: PARENT=projects/mendelian_disease/protobuf
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: MONDO
WARNING:root:Unrecognized prefix: HGNC
WARNING:root:Unrecognized prefix: HP
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink
syntax="proto3";
package
// metamodel_version: 1.7.0
INFO:root: proto ARGS: {'mergeimports': True}
INFO:root: WRITING TO: projects/mendelian_disease/protobuf/mendelian_disease.proto
INFO:root:Generating: python
INFO:root: SCHEMA: src/ontogpt/templates/mendelian_disease.yaml
INFO:root: PARENT=projects/mendelian_disease
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: MONDO
WARNING:root:Unrecognized prefix: HGNC
WARNING:root:Unrecognized prefix: HP
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink
INFO:root: python ARGS: {'mergeimports': True}
INFO:root:CHECK: extractionResult__input_id => string
INFO:root:CHECK: extractionResult__input_title => string
INFO:root:CHECK: extractionResult__input_text => string
INFO:root:CHECK: extractionResult__raw_completion_output => string
INFO:root:CHECK: extractionResult__prompt => string
INFO:root:CHECK: extractionResult__extracted_object => Any
INFO:root:FALSE: OCCURS BEFORE: Any == Any owning: ExtractionResult
INFO:root:CHECK: extractionResult__named_entities => Any
INFO:root:FALSE: OCCURS BEFORE: Any == Any owning: ExtractionResult
INFO:root:CHECK: extractionResult__input_id => string
INFO:root:CHECK: extractionResult__input_title => string
INFO:root:CHECK: extractionResult__input_text => string
INFO:root:CHECK: extractionResult__raw_completion_output => string
INFO:root:CHECK: extractionResult__prompt => string
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: namedEntity__label => string
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: namedEntity__label => string
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: mendelianDisease__name => string
INFO:root:CHECK: mendelianDisease__description => string
INFO:root:CHECK: mendelianDisease__synonyms => string
INFO:root:CHECK: mendelianDisease__subclass_of => DiseaseCategory
INFO:root:CHECK: mendelianDisease__symptoms => Symptom
INFO:root:CHECK: mendelianDisease__inheritance => Inheritance
INFO:root:CHECK: mendelianDisease__genes => Gene
INFO:root:CHECK: mendelianDisease__disease_onsets => Onset
INFO:root:CHECK: mendelianDisease__publications => Publication
INFO:root:TRUE: OCCURS SAME: MendelianDisease == Publication owning: MendelianDisease
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: mendelianDisease__name => string
INFO:root:CHECK: mendelianDisease__description => string
INFO:root:CHECK: mendelianDisease__synonyms => string
INFO:root:CHECK: mendelianDisease__subclass_of => DiseaseCategory
INFO:root:CHECK: mendelianDisease__symptoms => Symptom
INFO:root:CHECK: mendelianDisease__inheritance => Inheritance
INFO:root:CHECK: mendelianDisease__genes => Gene
INFO:root:CHECK: mendelianDisease__disease_onsets => Onset
INFO:root:CHECK: mendelianDisease__publications => Publication
INFO:root:TRUE: OCCURS SAME: MendelianDisease == Publication owning: MendelianDisease
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: symptom__characteristic => string
INFO:root:CHECK: symptom__affects => string
INFO:root:CHECK: symptom__severity => string
INFO:root:CHECK: symptom__onset_of_symptom => Onset
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: symptom__characteristic => string
INFO:root:CHECK: symptom__affects => string
INFO:root:CHECK: symptom__severity => string
INFO:root:CHECK: symptom__onset_of_symptom => Onset
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: onset__years_old => string
INFO:root:CHECK: onset__decades => string
INFO:root:CHECK: onset__juvenile_or_adult => string
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: onset__years_old => string
INFO:root:CHECK: onset__decades => string
INFO:root:CHECK: onset__juvenile_or_adult => string
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: triple__subject => NamedEntity
INFO:root:CHECK: triple__predicate => RelationshipType
INFO:root:CHECK: triple__object => NamedEntity
INFO:root:CHECK: triple__qualifier => string
INFO:root:CHECK: triple__subject_qualifier => NamedEntity
INFO:root:CHECK: triple__object_qualifier => NamedEntity
INFO:root:CHECK: triple__subject => NamedEntity
INFO:root:CHECK: triple__predicate => RelationshipType
INFO:root:CHECK: triple__object => NamedEntity
INFO:root:CHECK: triple__qualifier => string
INFO:root:CHECK: triple__subject_qualifier => NamedEntity
INFO:root:CHECK: triple__object_qualifier => NamedEntity
INFO:root:CHECK: textWithTriples__publication => Publication
INFO:root:TRUE: OCCURS SAME: TextWithTriples == Publication owning: TextWithTriples
INFO:root:CHECK: textWithTriples__triples => Triple
INFO:root:FALSE: OCCURS BEFORE: Triple == Triple owning: TextWithTriples
INFO:root:CHECK: textWithTriples__publication => Publication
INFO:root:TRUE: OCCURS SAME: TextWithTriples == Publication owning: TextWithTriples
INFO:root:CHECK: textWithTriples__triples => Triple
INFO:root:FALSE: OCCURS BEFORE: Triple == Triple owning: TextWithTriples
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: namedEntity__id => string
INFO:root:CHECK: publication__id => string
INFO:root:CHECK: publication__title => string
INFO:root:CHECK: publication__abstract => string
INFO:root:CHECK: publication__combined_text => string
INFO:root:CHECK: publication__full_text => string
INFO:root:CHECK: publication__id => string
INFO:root:CHECK: publication__title => string
INFO:root:CHECK: publication__abstract => string
INFO:root:CHECK: publication__combined_text => string
INFO:root:CHECK: publication__full_text => string
INFO:root:CHECK: annotatorResult__subject_text => string
INFO:root:CHECK: annotatorResult__object_id => string
INFO:root:CHECK: annotatorResult__object_text => string
INFO:root:CHECK: annotatorResult__subject_text => string
INFO:root:CHECK: annotatorResult__object_id => string
INFO:root:CHECK: annotatorResult__object_text => string
INFO:root:CHECK: mendelianDisease__name => string
INFO:root:CHECK: mendelianDisease__description => string
INFO:root:CHECK: mendelianDisease__synonyms => string
INFO:root:CHECK: mendelianDisease__subclass_of => DiseaseCategory
INFO:root:CHECK: mendelianDisease__symptoms => Symptom
INFO:root:CHECK: mendelianDisease__inheritance => Inheritance
INFO:root:CHECK: mendelianDisease__genes => Gene
INFO:root:CHECK: mendelianDisease__disease_onsets => Onset
INFO:root:CHECK: mendelianDisease__publications => Publication
INFO:root:CHECK: symptom__characteristic => string
INFO:root:CHECK: symptom__affects => string
INFO:root:CHECK: symptom__severity => string
INFO:root:CHECK: symptom__onset_of_symptom => Onset
INFO:root:CHECK: onset__years_old => string
INFO:root:CHECK: onset__decades => string
INFO:root:CHECK: onset__juvenile_or_adult => string
INFO:root:CHECK: extractionResult__input_id => string
INFO:root:CHECK: extractionResult__input_title => string
INFO:root:CHECK: extractionResult__input_text => string
INFO:root:CHECK: extractionResult__raw_completion_output => string
INFO:root:CHECK: extractionResult__prompt => string
INFO:root:CHECK: extractionResult__extracted_object => Any
INFO:root:CHECK: extractionResult__named_entities => Any
INFO:root:CHECK: namedEntity__label => string
INFO:root:CHECK: triple__subject => NamedEntity
INFO:root:CHECK: triple__predicate => RelationshipType
INFO:root:CHECK: triple__object => NamedEntity
INFO:root:CHECK: triple__qualifier => string
INFO:root:CHECK: triple__subject_qualifier => NamedEntity
INFO:root:CHECK: triple__object_qualifier => NamedEntity
INFO:root:CHECK: textWithTriples__publication => Publication
INFO:root:CHECK: textWithTriples__triples => Triple
INFO:root:CHECK: publication__id => string
INFO:root:CHECK: publication__title => string
INFO:root:CHECK: publication__abstract => string
INFO:root:CHECK: publication__combined_text => string
INFO:root:CHECK: publication__full_text => string
INFO:root:CHECK: annotatorResult__subject_text => string
INFO:root:CHECK: annotatorResult__object_id => string
INFO:root:CHECK: annotatorResult__object_text => string
INFO:root: WRITING TO: projects/mendelian_disease/mendelian_disease.py
INFO:root:Generating: shex
INFO:root: SCHEMA: src/ontogpt/templates/mendelian_disease.yaml
INFO:root: PARENT=projects/mendelian_disease/shex
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: MONDO
WARNING:root:Unrecognized prefix: HGNC
WARNING:root:Unrecognized prefix: HP
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink

metamodel_version: 1.7.0

INFO:root: shex ARGS: {'mergeimports': True}
INFO:root: WRITING TO: projects/mendelian_disease/shex/mendelian_disease.shex
INFO:root:Generating: shacl
INFO:root: SCHEMA: src/ontogpt/templates/mendelian_disease.yaml
INFO:root: PARENT=projects/mendelian_disease/shacl
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: MONDO
WARNING:root:Unrecognized prefix: HGNC
WARNING:root:Unrecognized prefix: HP
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink

metamodel_version: 1.7.0

INFO:root: shacl ARGS: {'mergeimports': True}
INFO:root:Importing core as core from source src/ontogpt/templates/mendelian_disease.yaml; base_dir=src/ontogpt/templates
INFO:root:Importing linkml:types as /Users/magz/miniconda3/envs/py39/lib/python3.9/site-packages/linkml_runtime/linkml_model/model/schema/types from source src/ontogpt/templates/mendelian_disease.yaml; base_dir=None
INFO:root: WRITING TO: projects/mendelian_disease/shacl/mendelian_disease.shacl.ttl
INFO:root:Generating: sqlddl
INFO:root: SCHEMA: src/ontogpt/templates/mendelian_disease.yaml
INFO:root: PARENT=projects/mendelian_disease/sqlschema
WARNING:root:File "core.yaml", line 59, col 19: Unrecognized prefix: rdfs
WARNING:root:Unrecognized prefix: MONDO
WARNING:root:Unrecognized prefix: HGNC
WARNING:root:Unrecognized prefix: HP
WARNING:root:Unrecognized prefix: RO
WARNING:root:Unrecognized prefix: biolink
/* metamodel_version: 1.7.0 */
INFO:root: sqlddl ARGS: {'mergeimports': True}
INFO:root:No PK for Publication
INFO:root: WRITING TO: projects/mendelian_disease/sqlschema/mendelian_disease.sql
INFO:root:Generating: excel
INFO:root: SCHEMA: src/ontogpt/templates/mendelian_disease.yaml
INFO:root: PARENT=projects/mendelian_disease/excel
INFO:root: excel ARGS: {'mergeimports': True, 'output': 'projects/mendelian_disease/excel/mendelian_disease.xlsx'}
INFO:root:Importing core as core from source src/ontogpt/templates/mendelian_disease.yaml; base_dir=src/ontogpt/templates
INFO:root:Importing linkml:types as /Users/magz/miniconda3/envs/py39/lib/python3.9/site-packages/linkml_runtime/linkml_model/model/schema/types from source src/ontogpt/templates/mendelian_disease.yaml; base_dir=None

Docs only link to subset of templates

The current docs only contain pages for:

yet other templates exist and some even have registered w3id's already, e.g. http://w3id.org/ontogpt/mendelian_disease

In most (all?) cases there is a generated page, it just isn't linked in the sidebar.

error for 'make all_enrich' random.SystemRandom.choice

seeing this error today afternoon and per Chris pinging @hrshdhgd

Traceback (most recent call last):
File "", line 1, in
File "/Users/marcin/Documents/VIMSS/ontology/LLMs/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1130, in call
return self.main(*args, **kwargs)
File "/Users/marcin/Documents/VIMSS/ontology/LLMs/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/Users/marcin/Documents/VIMSS/ontology/LLMs/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/marcin/Documents/VIMSS/ontology/LLMs/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/marcin/Documents/VIMSS/ontology/LLMs/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/Users/marcin/Documents/VIMSS/ontology/LLMs/ontogpt/src/ontogpt/cli.py", line 728, in eval_enrichment
print(f"RANDOM GENE: {eval_engine.random_gene_symbol()}")
File "/Users/marcin/Documents/VIMSS/ontology/LLMs/ontogpt/src/ontogpt/evaluation/enrichment/eval_enrichment.py", line 189, in random_gene_symbol
ann = random.SystemRandom.choice(assocs)
TypeError: choice() missing 1 required positional argument: 'seq'
make: *** [analysis/enrichment/EDS-results-2.yaml] Error 1

Add option to use HuggingFace model locally

HF models may be accessed through their API, as implemented in #145 , but it's prone to timeouts when using larger models (which is unfortunate, since those are the ones OntoGPT needs to use). They can be retrieved and used locally instead, with the expected tradeoff of needing more local resources but no ongoing reliance on a persnickety API.

Langchain has a HuggingFacePipeline - see example here: https://github.com/databrickslabs/dolly/blob/master/examples/langchain.py

Fix dead links in the docs

The project docs contain the following dead links:

"See full output"
"See src/semantic_llama/templates/ for examples"

suggestion: avoid code generation

Sorry if I'm talking into the void or beating a dead horse, but just a heads up that it's generally ill advised to generate code like you do here (i.e. yaml -> pydantic etc.). I just noticed this and thought it may be helpful to point that out as I'm having difficulty understanding the codebase and also it seems quite finicky in terms of execution.

Nevertheless I love the repo and its function; awesome!

running make gives `sed: 1: "CITATION.cff": invalid command code C`

I added a new template and ran make. I get:

V=$(poetry run python -c "import ontogpt;print('.'.join((ontogpt.__version__).split('.', 3)[:3]))") ; \
	sed -i '/^version:/c\version: '"$V"'' CITATION.cff
sed: 1: "CITATION.cff": invalid command code C
make: *** [update_citation] Error 1

Issue with dependencies

Hi @cmungall , thanks a lot for this library, it is quite interesting

I am trying to import and use ontogpt from other python projects and am facing issues with the dependencies, such as:

Error due to importlib being explicitly imported (it is part of the standard lib so it can be removed from the pyproject.toml)
A lot of dev dependencies are included as default package dependencies, such as tox and sphinx
Dependencies to deploy the web app are also included as default package dependencies, they could be made optional

This could look like this to start:

[tool.poetry.dependencies]
python = "^3.9"
click = "^8.1.3"
openai = "^0.25.0"
oaklib = "^0.1.64"
gilda = "^0.10.3"
jsonlines = "^3.1.0"
python-multipart = "^0.0.5"
linkml-owl = "^0.2.4"
beautifulsoup4 = "^4.11.1"
eutils = "^0.6.0"
class-resolver = "^0.3.10"
inflect = "^6.0.2"
bioc = "^2.0.post5"
linkml = "1.3.16"
wikipedia = "^1.4.0"
tiktoken = "^0.1.1"
airium = "^0.2.5"
SQLAlchemy = "^1.4.32, !=1.4.46"
greenlet = "!=2.0.2"

[tool.poetry.dev-dependencies]
pytest = "^7.1.2"
setuptools = "^65.5.0"
tox = "^3.25.1"
mkdocs-mermaid2-plugin = "^0.6.0"
sphinx = {version = "^5.3.0", extras = ["docs"]}
sphinx-rtd-theme = {version = "^1.0.0", extras = ["docs"]}
sphinx-autodoc-typehints = {version = "^1.19.4", extras = ["docs"]}
sphinx-click = {version = "^4.3.0", extras = ["docs"]}
myst-parser = {version = "^0.18.1", extras = ["docs"]}
fastapi = "^0.88.0"
uvicorn = "^0.20.0"
Jinja2 = "^3.1.2"

[tool.poetry.scripts]
ontogpt = "ontogpt.cli:main"
web-ontogpt = "ontogpt.webapp.main:start"

[tool.poetry.extras]
docs = [
    "sphinx",
    "sphinx-rtd-theme",
    "sphinx-autodoc-typehints",
    "sphinx-click",
    "myst-parser"
]
web = [
    "fastapi",
    "uvicorn",
    "Jinja2",
]

With those changes I am able to import and use ontogpt in other python projects (e.g. https://github.com/vemonet/ontogpt-api)

instructions for project.Makefile don't work

The instructions say to add a new template to TEMPLATES in project.Makefile, but that doesn't have an effect. I had to edit TEMPLATES in Makefile.

Conflicting URIs across templates

Running make for all projects/templates currently produces this error:

Traceback (most recent call last):
  File "/home/harry/ontogpt/.venv/bin/gen-project", line 8, in <module>
    sys.exit(cli())
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml/generators/projectgen.py", line 260, in cli
    gen.generate(yamlfile, project_config)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml/generators/projectgen.py", line 149, in generate
    gen = gen_cls(local_path, **all_gen_args)
  File "<string>", line 23, in __init__
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml/generators/graphqlgen.py", line 26, in __post_init__
    super().__post_init__()
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml/utils/generator.py", line 189, in __post_init__
    self._initialize_using_schemaloader(schema)
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml/utils/generator.py", line 232, in _initialize_using_schemaloader
    loader.resolve()
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml/utils/schemaloader.py", line 166, in resolve
    merge_schemas(
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml/utils/mergeutils.py", line 48, in merge_schemas
    merge_dicts(
  File "/home/harry/ontogpt/.venv/lib/python3.9/site-packages/linkml/utils/mergeutils.py", line 147, in merge_dicts
    raise ValueError(
ValueError: Conflicting URIs (http://w3id.org/ontogpt/gocam, https://w3id.org/ontogpt/biological_process) for item: Gene
make: *** [Makefile:41: projects/all] Error 1

This happens after an attempt to run

poetry run gen-project src/ontogpt/templates/all.yaml -d projects/all && cp -pr projects/all/docs docs/all

so that conflict makes sense: two different templates try to define Gene.
Who gets to say what a Gene is?
Or should we skip trying to merge all into one template?

Modify application to allow running on streamlit cloud

We currently have a SPINDOCTOR app that can be run locally using streamlit

There will need to be some changes to allow running hosted on streamlit cloud

OAK api key handling uses AppConfig which results in ValueError: No API key found in: /home/appuser/.config/ontology-access-kit/openai-apikey.txt
sqlite caches handled by pystow, likely similar issue
we probably don't want (can't) cache whole sqlite databases on sl cloud so perhaps have an option to fall back to bioportal for grounding?

Add parser for PubMedCentral full texts

We'd like to work with larger sets of full texts from PubMed (Central).
These may be downloaded here: https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/
But we still need functions to:

Index PMC docs
Retrieve texts for a list of PMIDs
Pass segments of full texts to extract function (assuming the full doc will be longer than context length) - ideally filterable by some segment, or defaulting to the first n tokens or so

monarch-initiative / ontogpt Goto Github PK

ontogpt's Introduction

OntoGPT

Introduction

Quick Start

Web Application

Evaluations

Related Projects

Tutorials and Presentations

Citation

Acknowledgements

ontogpt's People

Contributors

Stargazers

Watchers

Forkers

ontogpt's Issues

metamodel_version: 1.7.0

metamodel_version: 1.7.0

metamodel_version: 1.7.0

metamodel_version: 1.7.0

metamodel_version: 1.7.0

metamodel_version: 1.7.0

Recommend Projects

Recommend Topics

Recommend Org