abhijithneilabraham / tableqa Goto Github PK

View Code? Open in Web Editor NEW

287.0 4.0 46.0 28.84 MB

AI Tool for querying natural language on tabular data.

License: GNU General Public License v3.0

Python 100.00%

querying-natural-language tabular-data nlp tableqa qa question-answering table-qa machine-learning sql ai

tableqa's Introduction

tableQA

AI Tool for querying natural language on tabular data.Built using QA models from transformers.

This work is described in the following paper:
TableQuery: Querying tabular data with natural language, by Abhijith Neil Abraham, Fariz Rahman and Damanpreet Kaur.
If you use TableQA, please cite the paper.

Here is a detailed blog to understand how this works.

A tabular data can be:

Dataframes
CSV files

.
.

Features

Supports detection from multiple csvs (csvs can also be read from Amazon s3)
Supports FuzzyString implementation. i.e, incomplete column values in query can be automatically detected and filled in the query.
Supports Databases - SQLite, Postgresql, MySQL, Amazon RDS (Postgresql, MySQL).
Open-Domain, No training required.
Add manual schema for customized experience
Auto-generate schemas in case schema not provided
Data visualisations.

Supported operations.

Configuration:

install via pip:

pip install tableqa

installing from source:

git clone https://github.com/abhijithneilabraham/tableQA

cd tableqa

python setup.py install

Quickstart

Do sample query

from tableqa.agent import Agent
agent=Agent(df) #input your dataframe
response=agent.query_db("Your question here")
print(response)

Get an SQL query from the question

sql=agent.get_query("Your question here")  
print(sql) #returns an sql query

Adding Manual schema

Schema Format:

{
    "name": DATABASE NAME,
    "keywords":[DATABASE KEYWORDS],
    "columns":
    [
        {
        "name": COLUMN 1 NAME,
        "mapping":{
            CATEGORY 1: [CATEGORY 1 KEYWORDS],
            CATEGORY 2: [CATEGORY 2 KEYWORDS]
        }

        },
        {
        "name": COLUMN 2 NAME,
        "keywords": [COLUMN 2 KEYWORDS]
        },
        {
        "name": "COLUMN 3 NAME",
        "keywords": [COLUMN 3 KEYWORDS],
        "summable":"True"
        }
    ]
}

Mappings are for those columns whose values have only few distinct classes.
Include only the column names which need to have manual keywords or mappings.Rest will will be autogenerated.
summable is included for Numeric Type columns whose values are already count representations. Eg. Death Count,Cases etc. consists values which already represent a count.

Example (with manual schema):

Database query

Default Database - SQLite (File-based database, does not require creation of a separate connection.)

from tableqa.agent import Agent
agent=Agent(df,schema) #pass the dataframe and schema objects
response=agent.query_db("how many people died of stomach cancer in 2011")
print(response)
#Response =[(22,)]

To use PostgreSQL, you must have a postgresql server installed and running on your local. To download postgresql, visit the page.

from tableqa.agent import Agent
agent = Agent(df, schema_file, 'postgres', username='username', password='password', database='DBname', host='localhost', port=5432, aws_db=False)
response=agent.query_db("how many people died of stomach cancer in 2011")
print(response)
#Response =[(22,)]

To use MySQL, you must have a mysql server installed and running on your local. To download mysql, visit the page.

from tableqa.agent import Agent
agent = Agent(df, schema_file, 'mysql', username='username', password='password', database='DBname', host='localhost', port=5432, aws_db=False)
response=agent.query_db("how many people died of stomach cancer in 2011")
print(response)
#Response =[(22,)]

To use PostgreSQL or MySQL on Amazon RDS, you must create a database on Amazon RDS. The RDS must be in public subnet with security groups allowing connections from outside of AWS.

Refer to step 1 in the document to create a mysql db instance on Amazon RDS. Same steps can be followed for creating a PostgreSQL db instance by selecting PostgreSQL in the Engine tab. Obtain the username, password, database, endpoint, and port from your database connection details on Amazon RDS.

from tableqa.agent import Agent
agent = Agent(df, schema_file, 'postgres', username='Master username', password='Master password', database='DB name', host='Endpoint', port='Port', aws_db=True)
response=agent.query_db("how many people died of stomach cancer in 2011")
print(response)
#Response =[(22,)]

SQL query

sql=agent.get_query("How many people died of stomach cancer in 2011")
print(sql)
#sql query: SELECT SUM(Death_Count) FROM cancer_death WHERE Cancer_site = "Stomach" AND Year = "2011"

Multiple CSVs

Pass the absolute path of the directories containing the csvs and schemas respectively. Refer cleaned_data and schema for examples.

Example

Read CSV and Schema from local machine-

csv_path="/content/tableQA/tableqa/cleaned_data"
schema_path="/content/tableQA/tableqa/schema"
agent=Agent(csv_path,schema_path)

Read CSV and schema files from Amazon s3 -

Create a bucket on Amazon s3.
Upload objects to the bucket.
Create an IAM user and provide it access to read files from Amazon s3 storage.
Obtain the access key and secret access key for the user and pass it as an argument to the agent.

csv_path="s3://{bucket}/cleaned_data"
schema_path="s3://{bucket}/schema"
agent = Agent(csv_path, schema_path, aws_s3=True, access_key_id=access_key_id, secret_access_key=secret_access_key)

Join us

Join our workspace:Slack

tableqa's People

Contributors

Stargazers

Watchers

tableqa's Issues

Engine object error

I get this error when using tableQA and your sample code

response=agent.query_db("how many deaths of age below 40 had stomach cancer?")
print(response)

AttributeError: 'Engine' object has no attribute 'execute'

More complex Queries

The queries which use other features of SQL syntax are the next targets to be incorporated to enable better querying on a wide range of questions.

requirements, versions - compatibility issues

Hi!
Can you, please, upload requirements.txt with python packages versions that are working with your code?
For example i had to manually download Cython, but it warns me about deprecated code and throws errors:

warning: /tmp/easy_install-9_mu3wk6/numpy-1.18.0/numpy/__init__.pxd:17:0: The 'DEF' statement is deprecated and will be removed in a future Cython version. Consider using global variables, constants, and in-place literals instead. See https://github.com/cython/cython/issues/4310

Error compiling Cython file:
------------------------------------------------------------
...
    def __init__(self, seed=None):
        BitGenerator.__init__(self, seed)
        self.rng_state.pcg_state = &self.pcg64_random_state

        self._bitgen.state = <void *>&self.rng_state
        self._bitgen.next_uint64 = &pcg64_uint64
                                   ^
------------------------------------------------------------

_pcg64.pyx:113:35: Cannot assign type 'uint64_t (*)(void *) except? -1 nogil' to 'uint64_t (*)(void *) noexcept nogil'. Exception values are incompatible. Suggest adding 'noexcept' to type 'uint64_t (void *) except? -1 nogil'.

I'm afraid that, with repository this old, i can expect many more compatibility issues.

Using pip:
pip install tableqa
also is not working.

(PYL-R1718) Consider using a set comprehension

Description

Although there is nothing syntactically wrong with this code, it is hard to read and can be simplified to a set comprehension. Also it is faster since there is no need to create another transient list.

Occurrences

There are 5 occurrences of this issue in the repository.

See all occurrences on DeepSource → deepsource.io/gh/abhijithneilabraham/tableQA/issue/PYL-R1718/occurrences/

Use of Schema Format

Schema Format:
{
"name": DATABASE NAME,
"keywords":[DATABASE KEYWORDS],
"columns":
[
{
"name": COLUMN 1 NAME,
"mapping":{
CATEGORY 1: [CATEGORY 1 KEYWORDS],
CATEGORY 2: [CATEGORY 2 KEYWORDS]
}

    },
    {
    "name": COLUMN 2 NAME,
    "keywords": [COLUMN 2 KEYWORDS]
    },
    {
    "name": "COLUMN 3 NAME",
    "keywords": [COLUMN 3 KEYWORDS],
    "summable":"True"
    }
]

}

Hi, I ould like to check how to utilize the Schema Format.
Could you kindly provide an example on how to use them?
Thanks a ton in advance!!
It would be most gracefully appreciated!

Multi Language Support

Hello there! @abhijithneilabraham

I am fascinated by this project and i would like to adapt it for different languages.

Could you provide the original dataset used for the Question_classifier.h5 model in order to allow the translation of it?

Thanks!!

P.S. I actually tried to contact you also on your personal mail!

Not getting results as aspected?

@abhijithneilabraham thanks a lot for your beautiful work, i have some questions,

What is the efficient way to create the schema, like we cannot analyze the data every time, and create the schema for that?
I tried with some custom datasets its performing well but after passing the schema , it is not recognizing some of the columns?
Can I fine tune your model?

Thanks in advance!

Not able to run on colab

When I am trying to run it on colab, getting the following error:

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tableqa
Using cached tableqa-0.0.10-py3-none-any.whl (930 kB)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (from tableqa) (3.7.1)
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from tableqa) (1.5.3)
Collecting transformers[tf-cpu]==3.0.2
Using cached transformers-3.0.2-py3-none-any.whl (769 kB)
Collecting rake-nltk
Using cached rake_nltk-1.0.6-py3-none-any.whl (9.1 kB)
Collecting graphql-core==2.3
Using cached graphql_core-2.3-py2.py3-none-any.whl (251 kB)
Collecting responder
Using cached responder-2.0.7-py3-none-any.whl (24 kB)
Collecting graphene==2.1.8
Using cached graphene-2.1.8-py2.py3-none-any.whl (107 kB)
Requirement already satisfied: tensorflow-hub in /usr/local/lib/python3.10/dist-packages (from tableqa) (0.13.0)
Requirement already satisfied: sqlalchemy in /usr/local/lib/python3.10/dist-packages (from tableqa) (2.0.10)
Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (from tableqa) (3.8.1)
Collecting graphql-relay<3,>=2
Using cached graphql_relay-2.0.1-py3-none-any.whl (20 kB)
Requirement already satisfied: six<2,>=1.10.0 in /usr/local/lib/python3.10/dist-packages (from graphene==2.1.8->tableqa) (1.16.0)
Collecting aniso8601<=7,>=3
Using cached aniso8601-7.0.0-py2.py3-none-any.whl (42 kB)
Collecting rx<2,>=1.6
Using cached Rx-1.6.3-py2.py3-none-any.whl
Requirement already satisfied: promise<3,>=2.3 in /usr/local/lib/python3.10/dist-packages (from graphql-core==2.3->tableqa) (2.3)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers[tf-cpu]==3.0.2->tableqa) (3.12.0)
Collecting sentencepiece!=0.1.92
Using cached sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from transformers[tf-cpu]==3.0.2->tableqa) (23.1)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers[tf-cpu]==3.0.2->tableqa) (4.65.0)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers[tf-cpu]==3.0.2->tableqa) (2022.10.31)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers[tf-cpu]==3.0.2->tableqa) (2.27.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from transformers[tf-cpu]==3.0.2->tableqa) (1.22.4)
Collecting tokenizers==0.8.1.rc1
Using cached tokenizers-0.8.1rc1.tar.gz (97 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting sacremoses
Using cached sacremoses-0.0.53-py3-none-any.whl
Collecting keras2onnx
Using cached keras2onnx-1.7.0-py3-none-any.whl (96 kB)
Collecting tensorflow-cpu
Using cached tensorflow_cpu-2.12.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (231.8 MB)
Collecting onnxconverter-common
Using cached onnxconverter_common-1.13.0-py2.py3-none-any.whl (83 kB)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (1.0.7)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (0.11.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (3.0.9)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (4.39.3)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (2.8.2)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (8.4.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (1.4.4)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk->tableqa) (1.2.0)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk->tableqa) (8.1.3)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->tableqa) (2022.7.1)
Collecting python-multipart
Using cached python_multipart-0.0.6-py3-none-any.whl (45 kB)
Collecting graphql-server-core>=1.1
Using cached graphql_server_core-2.0.0-py2.py3-none-any.whl
Collecting uvicorn[standard]<0.13.3,>=0.12.0
Using cached uvicorn-0.13.2-py3-none-any.whl (45 kB)
Collecting marshmallow
Using cached marshmallow-3.19.0-py3-none-any.whl (49 kB)
Collecting starlette==0.13.*
Using cached starlette-0.13.8-py3-none-any.whl (60 kB)
Collecting apispec>=1.0.0b1
Using cached apispec-6.3.0-py3-none-any.whl (29 kB)
Collecting docopt
Using cached docopt-0.6.2-py2.py3-none-any.whl
Collecting requests-toolbelt
Using cached requests_toolbelt-1.0.0-py2.py3-none-any.whl (54 kB)
Collecting apistar
Using cached apistar-0.7.2-py3-none-any.whl
Requirement already satisfied: itsdangerous in /usr/local/lib/python3.10/dist-packages (from responder->tableqa) (2.1.2)
Requirement already satisfied: chardet in /usr/local/lib/python3.10/dist-packages (from responder->tableqa) (4.0.0)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from responder->tableqa) (6.0)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from responder->tableqa) (3.1.2)
Collecting rfc3986
Using cached rfc3986-2.0.0-py2.py3-none-any.whl (31 kB)
Collecting whitenoise
Using cached whitenoise-6.4.0-py3-none-any.whl (19 kB)
Collecting aiofiles
Using cached aiofiles-23.1.0-py3-none-any.whl (14 kB)
Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from sqlalchemy->tableqa) (2.0.2)
Requirement already satisfied: typing-extensions>=4.2.0 in /usr/local/lib/python3.10/dist-packages (from sqlalchemy->tableqa) (4.5.0)
Requirement already satisfied: protobuf>=3.19.6 in /usr/local/lib/python3.10/dist-packages (from tensorflow-hub->tableqa) (3.20.3)
Collecting click
Using cached click-7.1.2-py2.py3-none-any.whl (82 kB)
Collecting h11>=0.8
Using cached h11-0.14.0-py3-none-any.whl (58 kB)
Collecting websockets==8.*
Using cached websockets-8.1-cp310-cp310-linux_x86_64.whl
Collecting python-dotenv>=0.13.*
Using cached python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Collecting httptools==0.1.*
Using cached httptools-0.1.2-cp310-cp310-linux_x86_64.whl
Collecting uvloop>=0.14.0
Using cached uvloop-0.17.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.1 MB)
Collecting watchgod<0.7,>=0.6
Using cached watchgod-0.6-py35.py36.py37-none-any.whl (10 kB)
Collecting typesystem
Using cached typesystem-0.4.1-py3-none-any.whl (28 kB)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->responder->tableqa) (2.1.2)
Collecting onnx
Using cached onnx-1.13.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.5 MB)
Collecting fire
Using cached fire-0.5.0-py2.py3-none-any.whl
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers[tf-cpu]==3.0.2->tableqa) (1.26.15)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers[tf-cpu]==3.0.2->tableqa) (3.4)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->transformers[tf-cpu]==3.0.2->tableqa) (2.0.12)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers[tf-cpu]==3.0.2->tableqa) (2022.12.7)
Requirement already satisfied: gast<=0.4.0,>=0.2.1 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.4.0)
Requirement already satisfied: absl-py>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.4.0)
Requirement already satisfied: jax>=0.3.15 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.4.8)
Requirement already satisfied: google-pasta>=0.1.1 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.2.0)
Requirement already satisfied: opt-einsum>=2.3.2 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (3.3.0)
Requirement already satisfied: tensorflow-estimator<2.13,>=2.12.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (2.12.0)
Requirement already satisfied: libclang>=13.0.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (16.0.0)
Requirement already satisfied: grpcio<2.0,>=1.24.3 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.54.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (67.7.2)
Requirement already satisfied: h5py>=2.9.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (3.8.0)
Requirement already satisfied: astunparse>=1.6.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.6.3)
Requirement already satisfied: flatbuffers>=2.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (23.3.3)
Requirement already satisfied: tensorboard<2.13,>=2.12 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (2.12.2)
Requirement already satisfied: keras<2.13,>=2.12.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (2.12.0)
Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.32.0)
Requirement already satisfied: wrapt<1.15,>=1.11.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.14.1)
Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (2.3.0)
Requirement already satisfied: wheel<1.0,>=0.23.0 in /usr/local/lib/python3.10/dist-packages (from astunparse>=1.6.0->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.40.0)
Requirement already satisfied: scipy>=1.7 in /usr/local/lib/python3.10/dist-packages (from jax>=0.3.15->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.10.1)
Requirement already satisfied: ml-dtypes>=0.0.3 in /usr/local/lib/python3.10/dist-packages (from jax>=0.3.15->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.1.0)
Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.8.1)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.7.0)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (3.4.3)
Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (2.3.0)
Requirement already satisfied: google-auth-oauthlib<1.1,>=0.5 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.0.0)
Requirement already satisfied: google-auth<3,>=1.6.3 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (2.17.3)
Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (4.9)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (5.3.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.3.0)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from google-auth-oauthlib<1.1,>=0.5->tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.3.1)
Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /usr/local/lib/python3.10/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.5.0)
Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<1.1,>=0.5->tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (3.2.2)
Building wheels for collected packages: tokenizers
error: subprocess-exited-with-error

× Building wheel for tokenizers (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
Building wheel for tokenizers (pyproject.toml) ... error
ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

Can anybody help me? Thanks in advance.

Suggestion to include summary feature

Can tableQA have a parameter passed that generates the summary for the data set / gives useful insights about the dataset.

Datatype error while processing query from Postgresql

Hi, have added Postgresql connection and created dataframe which has below datatype:
object_id int64
discovery_id int64
schema object
object_type object
object_name object
no_of_rows int64
table_size_mb int64
lines_of_code int64
object_created_datetime datetime64[ns, UTC]
created_at datetime64[ns, UTC]
modified_at datetime64[ns, UTC]

while running Agent, I am getting below error:
KeyError: 'datetime64_ns_utc_'

Could you please help to resolve this?

Support for data visualisations

The output from the database could be used for generating pie charts, bar graphs, etc using pandas or any visualisation libraries. Add features to enable the same.

Support for complex queries like sub-query and joins

Does it support joins and complex queries?

Can we use elastic search?

Hi I am curious to know that whether we can store the data in elastic search indices?

how to use this model in chinese model?

hello! how to use this model in chinese model? By changing the pretrained bert class in nlp.py? It doesn't help...

qa_model = TFBertForQuestionAnswering.from_pretrained('bert-base-chinese')
qa_tokenizer = BertTokenizer.from_pretrained('bert-base-chinese',padding=True)

Optimise logger

Currently, the logger is just print statements, use the built-in logging module in python and remove the need of Hide_logs() class.

Entry point is at verbose parameter for query_db().

@abhijitramesh Kindly take this over

Can you give me some suggestions about fine tune squad models in your project ?

As before discussion, i have try to replace the function in other language.
And i think about fine tuning the squad model you use to extract condition string
from question input (as the code says, you use colquery (construct by keyword and use “which”
or “number of” as question word) as question and truly input question as document to extract.
If one want to have a better inference on this, should have a fine tuning on its dataset.
So can you give me some suggestions about labeling myself datasets for fine tuning ?
I think if i always use your colquery construction to construct my squad dataset may be too plain.

* Support more dataframes

-Currently the feature supports csv files only. However, integrating more dataframes is easy. Go through the get_dataframe() method in data_utils.py and include support to detect the incoming file and parse the dataframe from it.

Failed Building Wheel for tableQA while installing dependencies like numpy and tokenizer

On Mac M1,
installation failing for tokenizers==0.8.1.rc1 and numpy==1.18.0 for tableqa 0.0.11 due to failed building wheel.

Multiple issues occurs while installing in local pc. If google collab used, it usually gets installed without any errors.

Setup pipfile and piplock

Segmentation fault (core dumped) error

Hi everyone,
I am using 3.8 python and while I am running test, it gives me Segmentation fault (core dumped) error. pls help :)

Integration with LangChain?

Hi! This is such a cool project - wondering if you've thought at all about integrating with LangChain or writing an adapter? Is there any reason why this has to be used with BERT? Or could it also work with other models?

Issue while installing "pip install tableqa" with tokenizers esepcially build on tokenizers==0.8.1.rc1

  error: `cargo rustc --lib --message-format=json-render-diagnostics --manifest-path Cargo.toml --release -v --features pyo3/extension-module -- --crate-type cdylib` failed with code 101
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for tokenizers

Error when running colab

I am running the colab code in local using python 3.8.18, and it seems like I have successfully downloaded all the package, but when I run "agent.query_db("how many deaths of age below 40 had stomach cancer?")", it shows this error. Strangely, I can run this line "agent.get_query("how many deaths of age below 40 had stomach cancer?")"

TypeError Traceback (most recent call last)
c:\Users\super\OneDrive\Desktop\research\tableQA.ipynb Cell 5 line 1
----> 1 agent.query_db("how many deaths of age below 40 had stomach cancer?")

File c:\Users\super\anaconda3\envs\nextgpt\lib\site-packages\tableqa\agent.py:72, in Agent.query_db(self, question, verbose, chart, size)
70 database = Database(self.data_dir, self.schema_dir)
71 create_db = getattr(database, self.db_type)
---> 72 engine = create_db(question)
73 answer = engine.execute(query).fetchall()
74 if chart is not None:

File c:\Users\super\anaconda3\envs\nextgpt\lib\site-packages\tableqa\database.py:26, in Database.sqlite(self, question)
24 data_frame=self.data_process.get_dataframe(csv).astype(str)
25 schema=self.data_process.get_schema_for_csv(csv)
---> 26 data_frame = data_frame.fillna(data_frame.mean())
27 sql_schema = {}
28 for col in schema['columns']:

File c:\Users\super\anaconda3\envs\nextgpt\lib\site-packages\pandas\core\generic.py:11556, in NDFrame._add_numeric_operations..mean(self, axis, skipna, numeric_only, **kwargs)
11539 @doc(
11540 _num_doc,
11541 desc="Return the mean of the values over the requested axis.",
(...)
11554 **kwargs,
...
46 def _sum(a, axis=None, dtype=None, out=None, keepdims=False,
47 initial=_NoValue, where=True):
---> 48 return umr_sum(a, axis, dtype, out, keepdims, initial, where)

TypeError: can only concatenate str (not "int") to str

Support for distinct

Need a distinct=true parameter for get_query and query_db inside agent.py. This means the SQL query will be SELECT DISTINCT(Something) FROM table . Refer the Clause.adapt in clause.py to see how this can be done.

Using own data

Hello,
Thank you for providing your code! I'm very new to this and tried using my own data by simply feeding a CSV file into the pandas dataframe in the sample.ipynb.
This doesn't seem to work. When I try to ask a question with agent.query_db it throws me an OperationalError.
What are the necessary steps to try the code on own CSV files?
Thank you in advance.

About train data of clf.py

Hi,
I review the code , and have two questions want to ask.
One is about the wikidata.csv trained in clf.py
The samples mainly from english and also have some tiny data from other languages, such as japanese.
And have some samples use upper case.
My question is that, this data used to train a classifier about the question meaning format of sql.
the tensorflow hub model also trained on english only. and the question input seems always in english.
Why you use some multilingual input samples and use some upper case transformations.
This seems like your want to make the clf can also use in multilingual input ,
and want it to adapt with uppercase as some sql input .
If you want do it, why you do not use multilingual embedding in tf hub ? extend the sample by
some NMT translations and apply case transformations as sample augmentation methods ?

Second is that because the project construction mainly above on trained models and can use without
training. the lexicon parse of question input to identify intention mainly related with some custom (pre-assigned)
keywords defined in adapt methods of some inherent class of ColumnType (such as Number and Date)
so this project is mainly focus on simple input questions in lexicon.
So if i used it in question from other languages (such as chinese or japanese), It seems that can use some simple
NMT model to translate from these language into english and use your model. without replace the keywords
defined in above adapt methods. (because the lexicon is simple in question input, so the translated question
should be will formed or formatted)
As we all know, the schema or column name defined in database table or pandas dataframe may always in
english, And the table content may from other languages.
In this situation, i must have a choice in the translation the other language content. If i also translate content into
english, this seems like works. If i don’t, the qa function defined in your nlp.py should use multilingual squad transformers (some roberta model)

All i want to do is to adapt this project from only english tableQA to multilingual tableQA.
Because the input question and table dataset is simple in lexicon, dose this feature will support
in the project in the future ?