
dataherald / dataherald

3.2K stars · 24 watchers · 219 forks · 4.45 MB

Interact with your SQL database, Natural Language to SQL using LLMs

Home Page: https://dataherald.readthedocs.io/en/latest/

License: Apache License 2.0

Python 58.61% Dockerfile 0.30% Shell 0.19% JavaScript 1.27% TypeScript 39.32% CSS 0.16% Makefile 0.08% Batchfile 0.07%
ai database finetuning llm nl-to-sql rag sql text-to-sql

dataherald's Introduction

Dataherald monorepo

Dataherald logo

Query your relational data in natural language.

Discord | License | Docs | Homepage

Dataherald is a natural language-to-SQL engine built for enterprise-level question answering over relational data. It allows you to set up an API from your database that can answer questions in plain English. You can use Dataherald to:

  • Allow business users to get insights from the data warehouse without going through a data analyst
  • Enable Q+A from your production DBs inside your SaaS application
  • Create a ChatGPT plug-in from your proprietary data

This repository contains four components under /services which can be used together to set up an end-to-end Dataherald deployment:

  1. Engine: The core natural language-to-SQL engine. If you would like to use the Dataherald API without users or authentication, running the engine will suffice.
  2. Enterprise: The application API layer, which adds authentication, organizations and users, and other business logic to Dataherald.
  3. Admin-console: The front-end component of Dataherald, which provides a GUI for configuration and observability. You will need to run both engine and enterprise for the admin-console to work.
  4. Slackbot: A Slack bot that lets users in a Slack channel interact with Dataherald. Requires both engine and enterprise to be running.

For more information on each component, please take a look at their README.md files.

Running locally

Each component in the /services directory has its own docker-compose.yml file. To set up the environment, follow these steps:

  1. Set Environment Variables: Each service requires specific environment variables. Refer to the .env.example file in each service directory and create a .env file with the necessary values.

    For the Next.js front-end app, the environment file is .env.local.

  2. Run Services: You can run all the services using a single script located in the root directory. This script creates a common Docker network and runs each service in detached mode.

Run the script to start all services:

sh docker-run.sh
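
Once the services are up, you can exercise the engine API directly. Below is a minimal sketch using Python's requests library, assuming the engine is exposed on http://localhost (the endpoint and payload shape match the /api/v1/database-connections examples in the issues further down; the connection URI is a placeholder):

import requests

# Register a database connection with the engine (placeholder URI).
response = requests.post(
    "http://localhost/api/v1/database-connections",
    json={
        "alias": "test",
        "use_ssh": False,
        "connection_uri": "postgresql+psycopg2://user:password@localhost:5432/mydb",
    },
    timeout=30,
)
print(response.status_code, response.json())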

Contributing

As an open-source project in a rapidly developing field, we are open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.

For detailed information on how to contribute, see here.

dataherald's People

Contributors

aazo11, dependabot[bot], dh-datateam-ainesh, dishenwang2023, dnnspaul, eltociear, jcjc712, jmanuelnavarro, mohammadrezapourreza, ppmarkus, priyansh121096, rafaelpadilla, rmaroun, roy-moven, rwatts3, tecz, valakjs

dataherald's Issues

Can you support Azure OpenAI APIs?

The SDKs for Azure OpenAI and OpenAI are different.
Azure requires a few more credentials. Can you please support this?

Meanwhile, please let me know where changes are needed to support the Azure APIs. I will make the changes and file a PR.
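
For reference, the difference shows up at client-construction time. A minimal sketch with the openai Python package, using placeholder credentials (Azure additionally needs an endpoint and an API version):

from openai import OpenAI, AzureOpenAI

# Standard OpenAI: an API key is enough.
openai_client = OpenAI(api_key="sk-...")  # placeholder key

# Azure OpenAI: an endpoint and API version are also required,
# and models are addressed by deployment name.
azure_client = AzureOpenAI(
    api_key="...",                                          # placeholder key
    api_version="2023-05-15",                               # example version
    azure_endpoint="https://my-resource.openai.azure.com",  # placeholder endpoint
)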

stream mode always with \nObservation in Action Input

Action: DbTablesWithRelevanceScores
Action Input: query five records Observation
Observation: Table: aiops_metric_alarm, relevance score: 0.4535

I need to look at the schema of the relevant table.
Action: DbRelevantTablesSchema
Action Input: aiops_metric_alarm Observation

Observation: Tables not found in the database
Final Answer: The table 'aiops_metric_alarm' is not found in the database, so no records can be retrieved.

This is the resulting table_names_list: ['aiops_metric_alarm\nObservation']
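
A possible workaround until the parsing is fixed is to strip the stray stop token from the tool input before using it. This is a hypothetical sanitizer, not the project's actual fix:

def clean_action_input(action_input: str) -> str:
    # Drop the trailing "Observation" token that leaks into the tool
    # input in stream mode, e.g. 'aiops_metric_alarm\nObservation'.
    return action_input.removesuffix("\nObservation").strip()

table_names_list = [clean_action_input(n) for n in ['aiops_metric_alarm\nObservation']]
print(table_names_list)  # ['aiops_metric_alarm']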

Hi, I got this error while executing /api/v1/golden-records after setting up the database, table, and columns.

Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in call
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in call
raise exc
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in call
await self.app(scope, receive, sender)
File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in call
raise e
File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in call
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 718, in call
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 241, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 169, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2106, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 833, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/dataherald/server/fastapi/init.py", line 218, in add_golden_records
created_records = self._api.add_golden_records(golden_records)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/dataherald/api/fastapi.py", line 208, in add_golden_records
return context_store.add_golden_records(golden_records)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/dataherald/context_store/default.py", line 65, in add_golden_records
self.vector_store.add_record(
File "/app/dataherald/vector_store/chroma.py", line 48, in add_record
target_collection.add(documents=documents, metadatas=metadata, ids=ids)
File "/usr/local/lib/python3.11/site-packages/chromadb/api/models/Collection.py", line 96, in add
ids, embeddings, metadatas, documents = self._validate_embedding_set(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/chromadb/api/models/Collection.py", line 387, in _validate_embedding_set
embeddings = self._embedding_function(documents)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/chromadb/utils/embedding_functions.py", line 300, in call
self._init_model_and_tokenizer()
File "/usr/local/lib/python3.11/site-packages/chromadb/utils/embedding_functions.py", line 293, in _init_model_and_tokenizer
self.model = self.ort.InferenceSession(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 432, in init
raise e
File "/usr/local/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in init
self._create_inference_session(providers, provider_options, disabled_optimizers)
File "/usr/local/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 451, in _create_inference_session
raise ValueError(
ValueError: This ORT build has ['AzureExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['AzureExecutionProvider', 'CPUExecutionProvider'], ...)
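
The error message itself points at the fix: since ORT 1.9, onnxruntime requires an explicit providers argument. If you patch the embedding function (or pin versions), the session construction would look roughly like this; the model path is illustrative, since chromadb manages the real one internally:

import onnxruntime

session = onnxruntime.InferenceSession(
    "model.onnx",                        # illustrative path
    providers=["CPUExecutionProvider"],  # explicit, as ORT >= 1.9 requires
)

Upgrading chromadb may also resolve this, since later releases pass providers explicitly.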

Random errors

Awesome project! I am testing some queries; after setting up a connection, scanning a table, and adding one golden record, running a query sometimes returns a 404 with the following output:

Error: Not Found. Response body:
{ "detail": "Got unexpected type of `handle_parsing_errors`" }

Is there a way to get a better error message?
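
The detail string comes from LangChain's AgentExecutor: its handle_parsing_errors option must be a bool, a str, or a callable, and any other type raises "Got unexpected type of `handle_parsing_errors`", which the engine surfaces here. A sketch of the accepted forms:

# The three types LangChain's AgentExecutor accepts for handle_parsing_errors:
handle_parsing_errors = True                         # bool: use the default recovery message
handle_parsing_errors = "Check your output format!"  # str: sent back to the LLM verbatim
handle_parsing_errors = lambda err: str(err)[:200]   # callable: builds the message from the error

Anything else (e.g. None or an int) triggers the error above, so checking how the engine configures its agent executor is a reasonable first step.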

Support finetuning open-source LLMs

Hi everyone,

Our engine currently supports fine-tuning OpenAI models. However, we've received considerable feedback from our community expressing a desire to integrate their own fine-tuned models into our pipeline, as an alternative to the default OpenAI models. We welcome and greatly appreciate any contributions toward implementing this feature. Here are the steps outlining the necessary modifications:

  1. Introduction of a New Endpoint: We need a new endpoint that serves the fine-tuning dataset. This endpoint should deliver a JSONL file generated by the create_finetuning_dataset() function located within our finetuning directory. The provided file will be instrumental for users looking to fine-tune a model. A sketch of such an endpoint follows this list.

  2. Modification of SQL Generation Endpoints: It's crucial to update all SQL generation endpoints to accept two new parameters: base_url and model_name. These adjustments will enable the endpoints to interface with the user's fine-tuned model that has been deployed.
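
As a starting point, the endpoint in step 1 could look something like the sketch below. This is a hypothetical FastAPI route; the route path, the import path, and the assumption that create_finetuning_dataset() returns the path of the generated JSONL file are all illustrative:

from fastapi import FastAPI
from fastapi.responses import FileResponse

from dataherald.finetuning import create_finetuning_dataset  # assumed import path

app = FastAPI()

@app.get("/api/v1/finetuning-dataset")  # hypothetical route
def get_finetuning_dataset() -> FileResponse:
    # Assumes create_finetuning_dataset() writes the JSONL file and returns its path.
    dataset_path = create_finetuning_dataset()
    return FileResponse(dataset_path, media_type="application/x-ndjson")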

Your involvement in this project would significantly contribute to enhancing its flexibility and usability. We look forward to any support or input you can provide on this matter.

Thank you!

MS SQL Support

Hello all,
Has anyone achieved a connection to an MS SQL Server instance?

If yes, can you show how to do this?
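
For what it's worth, a later issue in this list shows a pymssql error, so the engine reaches MSSQL through SQLAlchemy's mssql+pymssql dialect. A connection payload along these lines should therefore work; host, credentials, and database name are placeholders:

{
  "alias": "mssql-test",
  "use_ssh": false,
  "connection_uri": "mssql+pymssql://user:password@host:1433/mydatabase"
}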

Enhance ColumnDescription with is_required and operators Fields for Improved SQL Query Generation

Issue Description

In the context of utilizing Foreign Data Wrappers (FDW) in PostgreSQL, there's a unique requirement to specify certain columns as mandatory in every SQL query, as well as to restrict specific columns to a subset of SQL operators. This enhancement request aims to address these needs by proposing the addition of two new fields to the ColumnDescription class: is_required and operators.

Proposed Fields

  1. is_required: bool - This field indicates whether a specific column must always be included in an SQL statement. This is particularly relevant for ensuring compliance with certain constraints when working with FDWs in PostgreSQL.

  2. operators: list[str] - This field specifies the subset of operators that are permissible for a given column. For example, a role_id column might only accept the equality operator (=). This flexibility is crucial for accurately reflecting the capabilities and constraints of underlying data sources accessed via FDWs.

Use Case

Consider a table wrapped by an FDW where certain columns are required for query execution due to the remote data source's constraints, or where specific columns only support a limited range of operations. The proposed enhancements would allow for more granular control over query construction, ensuring that generated SQL queries are both valid and optimized for execution against such data sources.

Proposed Implementation

The following outlines a preliminary approach to implementing these enhancements:

  • Modify the ColumnDescription class to include is_required and operators fields, ensuring that these fields are appropriately serialized/deserialized and included in any relevant data structures or APIs. A sketch of the extended model follows this list.

  • Update the SQL query generation logic to:

    • Ensure that columns marked as is_required are always included in SELECT queries.
    • Restrict the use of operators for each column to those specified in the operators list, particularly when constructing WHERE clauses or other conditional expressions.
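
A minimal sketch of the extended model, assuming ColumnDescription is a Pydantic model (the existing fields shown are illustrative; only is_required and operators come from this proposal):

from pydantic import BaseModel

class ColumnDescription(BaseModel):
    name: str                           # illustrative existing field
    description: str | None = None      # illustrative existing field
    is_required: bool = False           # proposed: must appear in every generated query
    operators: list[str] | None = None  # proposed: e.g. ["="] for a role_id column

Validation of operators against a whitelist of SQL operators could then live in a Pydantic validator.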

Potential Challenges and Considerations

  • Compatibility: Care must be taken to ensure that these enhancements do not introduce breaking changes for existing users of the library.
  • Validation: Implementing robust validation for the operators field to ensure only valid SQL operators are included.
  • Performance: Assessing the impact of these changes on query generation performance, especially in complex queries involving multiple constrained columns.

Request for Feedback

Feedback on the proposed enhancements, including any potential issues, alternative approaches, or considerations that might improve the implementation and utility of these new fields, is highly appreciated. Insights from developers with experience in PostgreSQL FDWs or similar technologies would be particularly valuable.

Add support for multiple schemas

Currently DataHerald only seems to know or care about the default schema. I tried to provide some convenient reporting views in another schema, but DataHerald ignores them. (PostgreSQL)
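
For context, SQLAlchemy's inspector can already enumerate objects outside the default schema, which is roughly what multi-schema support would build on. A sketch with a placeholder connection string and schema name:

from sqlalchemy import create_engine, inspect

engine = create_engine("postgresql+psycopg2://user:password@localhost/mydb")  # placeholder
inspector = inspect(engine)

# Objects in a non-default schema are only visible when the schema is
# passed explicitly; otherwise only the default ("public") is inspected.
print(inspector.get_table_names(schema="reporting"))  # placeholder schema
print(inspector.get_view_names(schema="reporting"))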

422 error while creating database connection with MySQL 5.7

Hello,

I am testing this on my local machine with MySQL 5.7. Does it support MySQL 5.7, or only specific versions?

The following call fails with a 422 for me. I can't make out what the error is; I'd appreciate any clue.

bash-3.2$ curl -X POST 'http://localhost/api/v1/database-connections' --header 'Content-Type: application/json' --data-raw '{ "alias": "test", "use_ssh": false, "connection_uri": mysql+pymysql://<user>:<passwd>@localhost:3306/openai }'

{"detail":[{"loc":["body",55],"msg":"Expecting value: line 1 column 56 (char 55)","type":"value_error.jsondecode","ctx":{"msg":"Expecting value","doc":"{ \"alias\": \"test\", \"use_ssh\": false, \"connection_uri\": mysql+pymysql://<user>:<passwd>@localhost:3306/openai }","pos":55,"lineno":1,"colno":56}}]}

bash-3.2$
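
The 422 here looks like a JSON parsing problem rather than a MySQL compatibility one: the connection_uri value in the --data-raw payload is not quoted, so the body is invalid JSON (column 56 in the error is exactly where mysql+pymysql begins). Quoting the URI should fix it:

{
  "alias": "test",
  "use_ssh": false,
  "connection_uri": "mysql+pymysql://<user>:<passwd>@localhost:3306/openai"
}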

Data sources cannot be added using the UI

I can add a data source successfully via the API with the same parameters, but not in the UI.
Unhandled ERROR
Trace ID: E-4967668d-7f20-43a1-82cd-3b0f13dc73d5
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
yield
File "/usr/local/lib/python3.11/site-packages/httpx/_transports/default.py", line 353, in handle_async_request
resp = await self._pool.handle_async_request(req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 214, in handle_async_request
raise UnsupportedProtocol(
httpcore.UnsupportedProtocol: Request URL is missing an 'http://' or 'https://' protocol.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/app/middleware/error.py", line 14, in dispatch
return await call_next(request)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 84, in call_next
raise app_exc
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/base.py", line 70, in coro
await self.app(scope, receive_or_disconnect, send_no_error)
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in call
raise exc
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in call
await self.app(scope, receive, sender)
File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in call
raise e
File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in call
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 718, in call
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 241, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 167, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/modules/db_connection/controller.py", line 108, in ac_add_db_connection
return await db_connection_service.add_db_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/modules/db_connection/service.py", line 100, in add_db_connection
response = await client.post(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/httpx/_client.py", line 1848, in post
return await self.request(
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/httpx/_client.py", line 1530, in request
return await self.send(request, auth=auth, follow_redirects=follow_redirects)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/httpx/_client.py", line 1617, in send
response = await self._send_handling_auth(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/httpx/_client.py", line 1645, in _send_handling_auth
response = await self._send_handling_redirects(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/httpx/_client.py", line 1682, in _send_handling_redirects
response = await self._send_single_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/httpx/_client.py", line 1719, in _send_single_request
response = await transport.handle_async_request(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/httpx/_transports/default.py", line 352, in handle_async_request
with map_httpcore_exceptions():
File "/usr/local/lib/python3.11/contextlib.py", line 155, in exit
self.gen.throw(typ, value, traceback)
File "/usr/local/lib/python3.11/site-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions

Open Source Dataherald with custom model

Hello, thank you for open-sourcing this amazing project.
I would like to explore the framework locally using the open-source version: fine-tune a HuggingFace model on the golden queries, and then use the Dataherald framework to connect to the data source and use the endpoints.
Is this possible with the open-source version using a Docker app?

Thank you for your help!

How to get `table_description_id` to add columns descriptions?

Hello,

Nice project! I played around with this, but I wonder how we can get the table_description_id to use with the PATCH /api/v1/table-descriptions/{table_description_id} endpoint. I see nowhere in the code that adds table_description_id to the table_descriptions collection.

Thanks.

After redeploying DH, fewshot_examples_retriever doesn't work

Recently I started to experience this scenario after redeploying DH (both DH/mongo containers).

> Entering new AgentExecutor chain...
Action: fewshot_examples_retriever
Action Input: << input question >>
Observation: fewshot_examples_retriever is not a valid tool, try one of [sql_db_query, get_admin_instructions, get_current_datetime, db_tables_with_relevance_scores, db_relevant_tables_schema, db_relevant_columns_info, db_column_entity_checker].

But if, after a redeploy, I refresh all the golden records (i.e., delete and re-upload them via the API), it works again.
I'm using the default VECTOR_STORE and CONTEXT_STORE.

This didn't happen before.
I can't figure out why this is happening yet.

Pymongo Authentication Error

I followed the Docker setup guidelines and did not make any changes to .env besides the required keys.
I am still getting the authentication error, even when doing a GET request for database-connections:

pymongo.errors.OperationFailure: Authentication failed., full error: {'ok': 0.0, 'errmsg': 'Authentication failed.', 'code': 18, 'codeName': 'AuthenticationFailed'}

Enhance Error Handling for Invalid Response Type in Golden SQL Addition

Issue Description:

Currently, when attempting to add golden SQLs for few-shot examples with an incorrect database connection ID, the application throws a lengthy error traceback. To improve user experience and error handling, this issue proposes implementing exception handling in the __init__.py file of the FastAPI module.

Error we are currently getting as response:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 241, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 169, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/dataherald/server/fastapi/__init__.py", line 536, in add_golden_sqls
    golden_sqls_as_dicts = [record.dict() for record in created_records]
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'JSONResponse' object is not iterable

Proposed Solution:

  • Update the add_golden_sqls method in the dataherald/server/fastapi/__init__.py file:
import json
from typing import List

from fastapi import HTTPException, status
from fastapi.responses import JSONResponse

def add_golden_sqls(
    self, golden_sqls: List[GoldenSQLRequest]
) -> JSONResponse:
    created_records = self._api.add_golden_sqls(golden_sqls)
    # A non-list return value means the API layer responded with an error
    # (e.g. an unknown database connection ID) wrapped in a JSONResponse.
    if not isinstance(created_records, list):
        response_body = created_records.body
        response_data = json.loads(response_body)
        raise HTTPException(status_code=404, detail=response_data)

    # Return a JSONResponse with status code 201 and the location header.
    golden_sqls_as_dicts = [record.dict() for record in created_records]

    return JSONResponse(
        content=golden_sqls_as_dicts, status_code=status.HTTP_201_CREATED
    )

This code block checks if created_records is not a list, indicating an unexpected response type. If so, it extracts relevant information from the response body, converts it to JSON format, and raises an HTTPException with a status code of 404 and detailed error information.

Expected Outcome:

By implementing exception handling for invalid response types, the application will provide more informative and user-friendly error messages when encountering issues related to incorrect database connection IDs.

Response after proposed solution is implemented:

{
  "detail": {
    "error_code": "database_connection_not_found",
    "message": "Database connection not found, 6603a7675723a587a67e590e",
    "description": null,
    "detail": {
      "items": [
        {
          "db_connection_id": "6603a7675723a587a17e590e",
          "prompt_text": "Random prompt",
          "sql": "sql query...................",
          "metadata": {}
        },
        {
          "db_connection_id": "6603a7675723a587a67e590e",
          "prompt_text": "Random prompt",
          "sql": "sql query ..................",
          "metadata": {}
        }
      ]
    }
  }
}

Preliminary Evaluation:

I've carried out an initial assessment of the suggested modifications and tested the proposed solution using Docker, as suggested in CONTRIBUTING.md.

At present, I haven't initiated a pull request, opting instead to solicit feedback on the proposed solution as I believe there may be more efficient methods to achieve this.

More strict SQL validation

Hey there, cool project. This prompt constraint caught my eye. It seems you are soft validating the resulting SQL query, which can be risky.

I have written a library to do hard validation, HeimdaLLM. It uses a grammar to parse, validate, and potentially edit the query created from an LLM. It gives you a frontend for rigorously constraining the output query so that it can only perform safe actions. You can read more about the attack surface that it addresses here.

Is there any interest in collaborating? HeimdaLLM would provide the rigorous query validation, and Dataherald the LLM integration. Thoughts?

Timeout scanning low cardinality unindexed column in large table

When scanning a column to determine if it's low cardinality, DataHerald runs SQL:

SELECT DISTINCT col_name FROM table_name LIMIT 200;

That query does a full table scan if the column is low cardinality and not indexed, which times out on a large table. DataHerald catches the timeout as an error and assumes the column is not low cardinality.
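
One possible mitigation is to bound how many rows the probe may touch by sampling a fixed-size subquery first; this is a sketch of the idea, not the project's implemented fix:

SELECT DISTINCT col_name FROM (SELECT col_name FROM table_name LIMIT 10000) AS sampled_rows LIMIT 200;

Sampling can misclassify a column whose distinct values don't all appear in the sampled rows, so a statistics-based probe (e.g. pg_catalog.pg_stats, as used elsewhere in these issues) is the more robust alternative on PostgreSQL.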

How to wipe out application data?

The Mongo installation is configured to store application data in the /dbdata folder. In case you want to wipe the local DB, try completely deleting /dbdata before rebuilding the databases.

bson.errors.InvalidDocument: cannot encode object: Decimal

When running an NL question on my dataset, which then returns a result as follows:

Action: sql_db_query
2023-08-28 17:55:37 Observation: [(Decimal('4867.6282051282051282'),)]
2023-08-28 17:55:38 Thought:The highest cost per kilogram is 4867.63.
2023-08-28 17:55:38 Final Answer: 4867.63

I am then getting the following error:

bson.errors.InvalidDocument: cannot encode object: Decimal('4867.6282051282051282'), of type: <class 'decimal.Decimal'>

I looked in the simple_evaluator and can see that run_result equals [(Decimal('4867.6282051282051282'),)]; that value seems to be sent to MongoDB for storage, and then the error shows up.

@MohammadrezaPourreza any idea on how to fix this?
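
BSON has no encoder for Python's decimal.Decimal, so the value has to be converted before storage. A sketch of a possible conversion step before the insert into Mongo (the helper and the patch point are illustrative):

from decimal import Decimal
from bson.decimal128 import Decimal128

def bson_safe(value):
    # Convert Decimal to a BSON-encodable type; Decimal128 is lossless,
    # while float(value) would also work if precision is not critical.
    if isinstance(value, Decimal):
        return Decimal128(value)
    return value

run_result = [(Decimal("4867.6282051282051282"),)]
encoded = [tuple(bson_safe(v) for v in row) for row in run_result]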

chroma vector store doesn't return similar questions (even the same question)

Hi,

I encountered a problem where the Chroma vector store doesn't return exactly the same question that was put into the golden records. When I dug into the code, I found that when you add a record to the Chroma collection, you don't add embeddings for that record, so they become None. I think that's why it doesn't work as expected.

target_collection.add(documents=documents, metadatas=metadata, ids=ids)
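
A possible fix is to compute the embeddings explicitly when adding records instead of relying on the collection default. In this sketch, embedding_function stands for whatever embedder the store is configured with (illustrative name):

# Compute embeddings up front so the stored records are actually searchable.
embeddings = embedding_function(documents)
target_collection.add(
    documents=documents,
    metadatas=metadata,
    ids=ids,
    embeddings=embeddings,
)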

Scanning table ends with error: list index out of range

Preconditions:

Steps to reproduce:
-> Import the db into Postgres
-> Scan the database table "Genre"
-> The Mongo table_description document for the Genre table contains "list index out of range" in error_message

The error occurs at line 100 of dataherald/db_scanner/sqlalchemy.py, in the if statement:

if MIN_CATEGORY_VALUE < rs[0]["n_distinct"] <= MAX_CATEGORY_VALUE:

The rs list, retrieved from

rs = db_engine.engine.execute(
    f"SELECT n_distinct, most_common_vals::TEXT::TEXT[] "
    f"FROM pg_catalog.pg_stats "
    f"WHERE tablename = '{table}' AND attname = '{column['name']}'"  # noqa: S608 E501
).fetchall()

is empty.

If I change the condition to

if len(rs) > 0 and MIN_CATEGORY_VALUE < rs[0]["n_distinct"] <= MAX_CATEGORY_VALUE:

the database is scanned correctly.

Error while using Create SQL generation API

Hi team, I can create the MSSQL DB connection, sync all of the tables, and create the prompt.
1. Prompt creation response:

[
  {
    "id": "6627a8f877f6a6414e4dff59",
    "metadata": {},
    "created_at": "2024-04-23T12:26:32.940000+00:00",
    "text": "What are top 5 selling products?",
    "db_connection_id": "6627a45577f6a6414e4dff4f"
  }
]

2. While hitting http://localhost/api/v1/prompts/6627a8f877f6a6414e4dff59/sql-generations (prompt_id = 6627a8f877f6a6414e4dff59) with the following request body:

{
  "finetuning_id": "string",
  "low_latency_mode": false,
  "llm_config": {
    "llm_name": "gpt-3.5-turbo",
    "api_base": "string"
  },
  "evaluate": false,
  "sql": "string",
  "metadata": {}
}

I am getting the following response body:

{
  "id": "6627a93077f6a6414e4dff5a",
  "metadata": {},
  "created_at": "2024-04-23T12:27:28.028396+00:00",
  "prompt_id": "6627a8f877f6a6414e4dff59",
  "finetuning_id": null,
  "status": "INVALID",
  "completed_at": "2024-04-23T12:27:29.162903+00:00",
  "llm_config": {
    "llm_name": "gpt-3.5-turbo",
    "api_base": "string"
  },
  "intermediate_steps": null,
  "sql": "string",
  "tokens_used": 0,
  "confidence_score": null,
  "error": "(pymssql._pymssql.ProgrammingError) (2812, b\"Could not find stored procedure 'string'.DB-Lib error message 20018, severity 16:\\nGeneral SQL Server error: Check messages from the SQL Server\\n\")\n\n(Background on this error at: https://sqlalche.me/e/14/f405)"
}

How to solve this issue?
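
The error gives the cause away: the literal placeholder values from the Swagger example were sent as-is, so the engine executed the SQL string "string" against MSSQL, which parsed it as a stored-procedure call. Omitting the placeholder fields (finetuning_id, sql, and api_base) should let the engine generate the SQL itself; a corrected body might look like:

{
  "low_latency_mode": false,
  "llm_config": {
    "llm_name": "gpt-3.5-turbo"
  },
  "evaluate": false,
  "metadata": {}
}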

Getting error when trying to access from Swagger

When I try to access the Swagger URL for database connections (say, the "/api/v1/database-connections" endpoint) or any other endpoint, it gives the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in call
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in call
raise exc
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in call
await self.app(scope, receive, sender)
File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in call
raise e
File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in call
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 718, in call
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 241, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 169, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2106, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 833, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/dataherald/server/fastapi/init.py", line 167, in list_database_connections
return self._api.list_database_connections()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/dataherald/api/fastapi.py", line 147, in list_database_connections
return db_connection_repository.find_all()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/dataherald/repositories/database_connections.py", line 43, in find_all
rows = self.storage.find_all(DB_COLLECTION)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/dataherald/db/mongo.py", line 48, in find_all
return list(self._data_store[collection].find({}))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pymongo/cursor.py", line 1251, in next
if len(self.__data) or self._refresh():
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pymongo/cursor.py", line 1168, in _refresh
self.__send_message(q)
File "/usr/local/lib/python3.11/site-packages/pymongo/cursor.py", line 1055, in __send_message
response = client._run_operation(
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pymongo/_csot.py", line 106, in csot_wrapper
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pymongo/mongo_client.py", line 1341, in _run_operation
return self._retryable_read(
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pymongo/_csot.py", line 106, in csot_wrapper
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pymongo/mongo_client.py", line 1459, in _retryable_read
with self._socket_from_server(read_pref, server, session) as (sock_info, read_pref):
File "/usr/local/lib/python3.11/contextlib.py", line 137, in enter
return next(self.gen)
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pymongo/mongo_client.py", line 1293, in _socket_from_server
with self._get_socket(server, session) as sock_info:
File "/usr/local/lib/python3.11/contextlib.py", line 137, in enter
return next(self.gen)
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pymongo/mongo_client.py", line 1228, in _get_socket
with server.get_socket(handler=err_handler) as sock_info:
File "/usr/local/lib/python3.11/contextlib.py", line 137, in enter
return next(self.gen)
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pymongo/pool.py", line 1522, in get_socket
sock_info = self._get_socket(handler=handler)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pymongo/pool.py", line 1635, in _get_socket
sock_info = self.connect(handler=handler)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pymongo/pool.py", line 1493, in connect
sock_info.authenticate()
File "/usr/local/lib/python3.11/site-packages/pymongo/pool.py", line 987, in authenticate
auth.authenticate(creds, self, reauthenticate=reauthenticate)
File "/usr/local/lib/python3.11/site-packages/pymongo/auth.py", line 617, in authenticate
auth_func(credentials, sock_info)
File "/usr/local/lib/python3.11/site-packages/pymongo/auth.py", line 522, in _authenticate_default
return _authenticate_scram(credentials, sock_info, "SCRAM-SHA-1")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pymongo/auth.py", line 248, in _authenticate_scram
res = sock_info.command(source, cmd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pymongo/helpers.py", line 279, in inner
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pymongo/pool.py", line 879, in command
return command(
^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pymongo/network.py", line 166, in command
helpers._check_command_response(
File "/usr/local/lib/python3.11/site-packages/pymongo/helpers.py", line 194, in _check_command_response
raise OperationFailure(errmsg, code, response, max_wire_version)
pymongo.errors.OperationFailure: Authentication failed., full error: {'ok': 0.0, 'errmsg': 'Authentication failed.', 'code': 18, 'codeName': 'AuthenticationFailed'}

POST /api/v1/question : Internal Server Error

Hi, hope you are well. Below is the error I'm facing when calling:
POST
/api/v1/question

Parameters:
{
"db_connection_id": "XXXXXXXX",
"question": "list all YYYY"
}

Code: 500 (Undocumented). Details: Error: Internal Server Error.

Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in call
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in call
raise exc
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in call
await self.app(scope, receive, sender)
File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in call
raise e
File "/usr/local/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in call
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 718, in call
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 241, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fastapi/routing.py", line 169, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2106, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 833, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/dataherald/server/fastapi/init.py", line 151, in answer_question
return self._api.answer_question(question_request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/dataherald/api/fastapi.py", line 91, in answer_question
context_store = self.system.instance(ContextStore)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/dataherald/config.py", line 105, in instance
impl = type(self)
^^^^^^^^^^
File "/app/dataherald/context_store/default.py", line 17, in init
super().init(system)
File "/app/dataherald/context_store/init.py", line 23, in init
self.vector_store = self.system.instance(VectorStore)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/dataherald/config.py", line 102, in instance
type = get_class(fqn, type)
^^^^^^^^^^^^^^^^^^^^
File "/app/dataherald/config.py", line 119, in get_class
module_name, class_name = fqn.rsplit(".", 1)
^^^^^^^^^^^^^^^^^^^^^^^
ValueError: not enough values to unpack (expected 2, got 1)
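
The final frame explains the failure: get_class splits a fully qualified class name on its last dot, so a bare class name (e.g. a VECTOR_STORE or CONTEXT_STORE setting like Chroma instead of a full module path) yields only one part. A minimal illustration (the fully qualified name is an assumed example):

fqn = "dataherald.vector_store.chroma.Chroma"  # module path plus class name: OK
module_name, class_name = fqn.rsplit(".", 1)

try:
    module_name, class_name = "Chroma".rsplit(".", 1)  # bare class name: only one part
except ValueError as exc:
    print(exc)  # not enough values to unpack (expected 2, got 1)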

Missing Apache 2.0 `LICENSE` file referenced from `README.md`

First of all, thank you for releasing this project and sharing it with us!

The README.md says the license for this project is Apache 2.0 and links to a LICENSE file:

dataherald/README.md

Lines 15 to 17 in ab142f2

<a href="./LICENSE" target="_blank">
<img src="https://img.shields.io/static/v1?label=license&message=Apache 2.0&color=white" alt="License">
</a> |

However, there's no LICENSE file in the repo as of the latest commit ab142f2:

https://github.com/Dataherald/dataherald/blob/ab142f2e03bf8b75189c92b5f530be674d376dde/LICENSE returns "404 Not found"

Could you please add the license? You can find the license text here for easy copy-pasting:

Thanks again!

Query containing markdown (```sql) causing sql_db_query action to fail and try again in a loop

I experienced the following issue: after generating the SQL query (Action: sql_db_query), I'd get the following error:

Model used: gpt-4-1106-preview
(Note: when I tried using gpt-4 instead, it couldn't even create the SQL properly in this case.)

DH Log output
Observation: Error: (pymysql.err.ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '```sql\n-- This query retrieves xxxxxx...' at line 1")
[SQL: ```sql

...

Thought:The error occurred because the SQL query was submitted with markdown code block syntax (```sql) which is not part of the actual SQL query. I need to remove the markdown syntax and resubmit the query.

It seems like it's coming from the QuerySQLDataBaseTool


And then this would loop many times until the RateLimitError was produced.
We ran out of run quota (gpt-4-1106-preview can only be run 100 times a day). I'll try to get more information tomorrow.

DH Log output
Action: sql_db_query
Action Input:

...

Observation: Error: (pymysql.err.ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '```sql\n-- This query retrieves xxxxxxx...' at line 1")
[SQL: ```sql

Thought:The error occurred again because the SQL query was submitted with markdown code block syntax (```sql) which is not part of the actual SQL query. I need to remove the markdown syntax and resubmit the query without the markdown code block syntax.

Observation: Error: (pymysql.err.ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '``sql\n-- This query retrieves daily crowding data for instrument 7203 for th...' at line 1")

Thought:The error occurred again because the SQL query was submitted with markdown code block syntax (```sql) which is not part of the actual SQL query. I need to remove the markdown syntax and resubmit the query without the markdown code block syntax.

Action: sql_db_query
Action Input:

etc., etc.,
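
Until the tool strips the fences itself, a small sanitizer between generation and execution avoids the loop; this is a sketch of the idea, not DH's actual fix:

import re

def strip_sql_markdown(query: str) -> str:
    # Remove a leading ```sql (or bare ```) fence and a trailing ``` fence
    # that chat models like to wrap around generated queries.
    query = query.strip()
    query = re.sub(r"^```(?:sql)?\s*", "", query)
    query = re.sub(r"\s*```$", "", query)
    return query.strip()

print(strip_sql_markdown("```sql\nSELECT 1;\n```"))  # SELECT 1;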

Support for duckdb

I'd love to know if you plan to support DuckDB and/or if you can point me in the right direction to add support.

Support for Foreign Tables in PostgreSQL Not Currently Available

Issue Description

The current implementation of the database scanner in Dataherald does not support foreign tables in PostgreSQL databases. This limitation restricts the tool's utility in environments where databases extensively use foreign tables for cross-database queries and data integration.

Proposed Solution

I propose enhancing the get_all_tables_and_views method to include foreign tables when scanning PostgreSQL databases. This change involves checking if the database engine is PostgreSQL (psycopg2) and, if so, appending the list of foreign tables to the lists of tables and views. Additionally, the process for generating table examples and processing foreign table columns should be adjusted to handle foreign tables appropriately, by returning an empty example for foreign tables and ensuring that foreign table columns are not processed.

Here is a sketch of the proposed changes:

  • Enhancing get_all_tables_and_views:

@override
def get_all_tables_and_views(self, database: SQLDatabase) -> list[str]:
    inspector = inspect(database.engine)
    meta = MetaData(bind=database.engine)
    MetaData.reflect(meta, views=True)
    if database.engine.driver == "psycopg2":
        return (
            inspector.get_table_names()
            + inspector.get_view_names()
            + inspector.get_foreign_table_names()
        )
    return inspector.get_table_names() + inspector.get_view_names()

  • Adjusting table example generation for foreign tables:

# If the engine is PostgreSQL and the table is a foreign table, return an empty list.
if db_engine.engine.driver == "psycopg2" and <foreign_table_condition>:
    return []

  • Ensuring foreign table columns are not processed:

if db_engine.engine.driver == "psycopg2" and <foreign_table_condition>:
    # Skip or specially handle foreign table columns here.
    ...

  • Including foreign tables in the scan method:

if db_engine.engine.driver == "psycopg2":
    tables += inspector.get_foreign_table_names()

Initial Testing

I've conducted brief testing of these proposed changes, which suggests they can effectively incorporate foreign table support into Dataherald's PostgreSQL database scanning capabilities. However, I believe there may be more efficient or robust methods to achieve this, and further testing and refinement are necessary.

I have not submitted a pull request at this time, as I'm looking for feedback on the proposed solution and any additional insights that could improve it.

Request for Feedback

I welcome feedback on the proposed solution, including any potential issues or alternative approaches that could enhance support for foreign tables in PostgreSQL databases within Dataherald. If anyone has experience with similar implementations or suggestions for refining this proposal, your insights would be highly appreciated.

Create database and tables through the API

I have a use case where I have known table structure, data types, descriptions, but no actual database to connect to. It's for an obscure AWS product that uses Athena under the hood. So the database queries would be Athena-like. It's just there is no real connection to specify that could be used to read the schema from.

I could, as a workaround, create a fake Glue catalog of the same shape, and then provide it to Dataherald.

But given the insight in #395 (API endpoints to update descriptions), I was wondering if there's a use case for being able to create the entire database and all of the tables via the API alone, without having to connect to a database?

I could then use the known table models and feed them to DH directly, without having to create fake Athena database.

Thanks.

How to provide table documentation to the model?

Let's say we have extensive docs of every column available to us.

What is the best way to provide this context to the model?

Will it introspect the database and discover these comments?

What about engines that do not support comments?
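
The comments do not have to live in the database itself: descriptions can be attached through the PATCH /api/v1/table-descriptions/{table_description_id} endpoint mentioned in an earlier issue, which also covers engines without comment support. The payload shape below is an assumption for illustration, not the documented schema:

{
  "description": "Daily sales facts, one row per order line.",
  "columns": [
    {"name": "qty", "description": "Units sold, integer"}
  ]
}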

invalid literal for int() with base 10: '

I hooked up the library to a test table in BQ, then synced it using the sync-schemas endpoint. I was able to verify that the connection was established correctly, and logs were returned as well. Then, without adding any context, I sent a request to generate SQL ("Give me all records from X table") and ran into this error.

Custom LLM Model

Hello all,

I am trying to use Dataherald with the Mistral LLM or with the new SQLCoder.

Has anyone achieved this before? If so, could you post a step-by-step guide to using custom LLM models?

Best

class `SQLDatabase` in `dataherald/sql_database/base.py` does not have a constructor!

I've been trying to work with this, but when I use the /api/v1/scanner endpoint, I get the error:

"Unable to connect to db: "

After doing some detective work, it turns out that the problem comes from the fact that when from_uri tries to create an instance of SQLDatabase, an exception is raised because SQLDatabase does not have a constructor.

I added a constructor and called super, but I still get the "Unable to connect..." error. (Of course, the error is different if you don't call super; that's how I know the initial problem was the lack of a constructor!)

Thanks

P.S.
Forgot to mention: before calling the scanner endpoint, I did of course use the /api/v1/database endpoint to introduce my database, and I got the true response.
