abhijithneilabraham / tableqa
AI Tool for querying natural language on tabular data.
License: GNU General Public License v3.0
Schema Format:
{
    "name": DATABASE NAME,
    "keywords": [DATABASE KEYWORDS],
    "columns": [
        {
            "name": COLUMN 1 NAME,
            "mapping": {
                CATEGORY 1: [CATEGORY 1 KEYWORDS],
                CATEGORY 2: [CATEGORY 2 KEYWORDS]
            }
        },
        {
            "name": COLUMN 2 NAME,
            "keywords": [COLUMN 2 KEYWORDS]
        },
        {
            "name": "COLUMN 3 NAME",
            "keywords": [COLUMN 3 KEYWORDS],
            "summable": "True"
        }
    ]
}
Hi, I would like to check how to utilize the Schema Format.
Could you kindly provide an example of how to use it?
Thanks a ton in advance!!
It would be greatly appreciated!
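A hedged example of what a concrete schema could look like, following the format above. All names, keywords, and categories here are invented for illustration and are not taken from the repository; in practice this dict would be saved as a JSON file alongside the CSV it describes.

```python
# Hypothetical schema for an invented cancer-deaths CSV.
schema = {
    "name": "cancer_death",
    "keywords": ["cancer", "death", "patient"],
    "columns": [
        {
            # A categorical column: "mapping" groups raw cell values
            # under the category names that may appear in questions.
            "name": "Sex",
            "mapping": {
                "male": ["male", "man", "men"],
                "female": ["female", "woman", "women"]
            }
        },
        {
            # A free-text column matched through keywords.
            "name": "Cancer_site",
            "keywords": ["cancer site", "cancer type"]
        },
        {
            # A numeric column; "summable" marks it as usable in
            # aggregate (SUM/COUNT-style) queries.
            "name": "Deaths",
            "keywords": ["deaths", "death count"],
            "summable": "True"
        }
    ]
}
```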
The Year class in column_types.py contains a feature to convert "this year", "last year", etc. into numerical values using datetime. Similarly, add support for day, week, month, etc.
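A sketch of what such an extension could look like. The phrase table and return types below are assumptions for illustration, not the project's column_types.py code:

```python
from datetime import date, timedelta

def relative_date(phrase, today=None):
    """Convert phrases like 'last year' or 'yesterday' into concrete
    values, mirroring (not copying) what the Year class does for years."""
    today = today or date.today()
    phrase = phrase.lower().strip()
    if phrase == "this year":
        return today.year
    if phrase == "last year":
        return today.year - 1
    if phrase == "today":
        return today
    if phrase == "yesterday":
        return today - timedelta(days=1)
    if phrase == "last week":
        return today - timedelta(weeks=1)
    if phrase == "last month":
        # Approximate: step back 30 days; a calendar-accurate version
        # could use dateutil.relativedelta instead.
        return today - timedelta(days=30)
    return None
```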
On Mac M1,
installation fails for tokenizers==0.8.1.rc1 and numpy==1.18.0 with tableqa 0.0.11 because the wheels fail to build.
Multiple issues occur when installing on a local PC; on Google Colab it usually installs without errors.
Following our earlier discussion, I have tried to port the function to another language.
I am also thinking about fine-tuning the SQuAD model you use to extract the condition string
from the input question (as the code shows, you use colquery, constructed from a keyword plus
a question word such as "which" or "number of", as the question, and the actual input question as the document for extraction).
To get better inference on this, the model should be fine-tuned on its own dataset.
So can you give me some suggestions about labeling my own dataset for fine-tuning?
I think always using your colquery construction to build my SQuAD dataset may be too plain.
We're setting up a documentation page for TableQA. Join our Slack channel and ping me to discuss more.
Does it support joins and complex queries?
Currently, the logger is just print statements. Use Python's built-in logging module and remove the need for the Hide_logs class. The entry point is the verbose parameter of query_db().
@abhijitramesh Kindly take this over
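A minimal sketch of what this could look like. The function body and logger name below are assumptions for illustration, not the repository's code:

```python
import logging

# Module-level logger replacing bare print statements; Hide_logs becomes
# unnecessary. The NullHandler keeps the library silent unless the
# application configures logging itself.
logger = logging.getLogger("tableqa")
logger.addHandler(logging.NullHandler())

def query_db(question, verbose=False):
    # verbose toggles the logging threshold instead of suppressing prints
    logger.setLevel(logging.DEBUG if verbose else logging.WARNING)
    logger.debug("Parsing question: %s", question)
    # ... build and execute the SQL query here ...
    logger.debug("Query executed")
```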
Hi!
Can you please upload a requirements.txt with the Python package versions that work with your code?
For example, I had to manually download Cython, but it warns me about deprecated code and throws errors:
warning: /tmp/easy_install-9_mu3wk6/numpy-1.18.0/numpy/__init__.pxd:17:0: The 'DEF' statement is deprecated and will be removed in a future Cython version. Consider using global variables, constants, and in-place literals instead. See https://github.com/cython/cython/issues/4310
Error compiling Cython file:
------------------------------------------------------------
...
def __init__(self, seed=None):
BitGenerator.__init__(self, seed)
self.rng_state.pcg_state = &self.pcg64_random_state
self._bitgen.state = <void *>&self.rng_state
self._bitgen.next_uint64 = &pcg64_uint64
^
------------------------------------------------------------
_pcg64.pyx:113:35: Cannot assign type 'uint64_t (*)(void *) except? -1 nogil' to 'uint64_t (*)(void *) noexcept nogil'. Exception values are incompatible. Suggest adding 'noexcept' to type 'uint64_t (void *) except? -1 nogil'.
I'm afraid that, with a repository this old, I can expect many more compatibility issues.
Installing with pip:
pip install tableqa
also does not work.
Could tableQA accept a parameter that generates a summary of the dataset or gives useful insights about it?
I get this error when using tableQA and your sample code
response=agent.query_db("how many deaths of age below 40 had stomach cancer?")
print(response)
AttributeError: 'Engine' object has no attribute 'execute'
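Engine.execute() was removed in SQLAlchemy 2.0, which is what this AttributeError points to. A possible workaround sketch, hedged because tableqa builds the engine internally, so applying it would mean patching database.py or pinning the dependency:

```python
from sqlalchemy import create_engine, text

# Demonstrates the 2.0-style execution path that replaces engine.execute():
# obtain a connection and execute a text() construct on it.
engine = create_engine("sqlite:///:memory:")
with engine.connect() as conn:
    rows = conn.execute(text("SELECT 1")).fetchall()  # rows == [(1,)]
```

Alternatively, pinning `sqlalchemy<2.0` restores the legacy `engine.execute()` that the installed tableqa release calls.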
When I try to run it on Colab, I get the following error:
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tableqa
Using cached tableqa-0.0.10-py3-none-any.whl (930 kB)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (from tableqa) (3.7.1)
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from tableqa) (1.5.3)
Collecting transformers[tf-cpu]==3.0.2
Using cached transformers-3.0.2-py3-none-any.whl (769 kB)
Collecting rake-nltk
Using cached rake_nltk-1.0.6-py3-none-any.whl (9.1 kB)
Collecting graphql-core==2.3
Using cached graphql_core-2.3-py2.py3-none-any.whl (251 kB)
Collecting responder
Using cached responder-2.0.7-py3-none-any.whl (24 kB)
Collecting graphene==2.1.8
Using cached graphene-2.1.8-py2.py3-none-any.whl (107 kB)
Requirement already satisfied: tensorflow-hub in /usr/local/lib/python3.10/dist-packages (from tableqa) (0.13.0)
Requirement already satisfied: sqlalchemy in /usr/local/lib/python3.10/dist-packages (from tableqa) (2.0.10)
Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (from tableqa) (3.8.1)
Collecting graphql-relay<3,>=2
Using cached graphql_relay-2.0.1-py3-none-any.whl (20 kB)
Requirement already satisfied: six<2,>=1.10.0 in /usr/local/lib/python3.10/dist-packages (from graphene==2.1.8->tableqa) (1.16.0)
Collecting aniso8601<=7,>=3
Using cached aniso8601-7.0.0-py2.py3-none-any.whl (42 kB)
Collecting rx<2,>=1.6
Using cached Rx-1.6.3-py2.py3-none-any.whl
Requirement already satisfied: promise<3,>=2.3 in /usr/local/lib/python3.10/dist-packages (from graphql-core==2.3->tableqa) (2.3)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers[tf-cpu]==3.0.2->tableqa) (3.12.0)
Collecting sentencepiece!=0.1.92
Using cached sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from transformers[tf-cpu]==3.0.2->tableqa) (23.1)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers[tf-cpu]==3.0.2->tableqa) (4.65.0)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers[tf-cpu]==3.0.2->tableqa) (2022.10.31)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers[tf-cpu]==3.0.2->tableqa) (2.27.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from transformers[tf-cpu]==3.0.2->tableqa) (1.22.4)
Collecting tokenizers==0.8.1.rc1
Using cached tokenizers-0.8.1rc1.tar.gz (97 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting sacremoses
Using cached sacremoses-0.0.53-py3-none-any.whl
Collecting keras2onnx
Using cached keras2onnx-1.7.0-py3-none-any.whl (96 kB)
Collecting tensorflow-cpu
Using cached tensorflow_cpu-2.12.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (231.8 MB)
Collecting onnxconverter-common
Using cached onnxconverter_common-1.13.0-py2.py3-none-any.whl (83 kB)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (1.0.7)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (0.11.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (3.0.9)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (4.39.3)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (2.8.2)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (8.4.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (1.4.4)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk->tableqa) (1.2.0)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk->tableqa) (8.1.3)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->tableqa) (2022.7.1)
Collecting python-multipart
Using cached python_multipart-0.0.6-py3-none-any.whl (45 kB)
Collecting graphql-server-core>=1.1
Using cached graphql_server_core-2.0.0-py2.py3-none-any.whl
Collecting uvicorn[standard]<0.13.3,>=0.12.0
Using cached uvicorn-0.13.2-py3-none-any.whl (45 kB)
Collecting marshmallow
Using cached marshmallow-3.19.0-py3-none-any.whl (49 kB)
Collecting starlette==0.13.*
Using cached starlette-0.13.8-py3-none-any.whl (60 kB)
Collecting apispec>=1.0.0b1
Using cached apispec-6.3.0-py3-none-any.whl (29 kB)
Collecting docopt
Using cached docopt-0.6.2-py2.py3-none-any.whl
Collecting requests-toolbelt
Using cached requests_toolbelt-1.0.0-py2.py3-none-any.whl (54 kB)
Collecting apistar
Using cached apistar-0.7.2-py3-none-any.whl
Requirement already satisfied: itsdangerous in /usr/local/lib/python3.10/dist-packages (from responder->tableqa) (2.1.2)
Requirement already satisfied: chardet in /usr/local/lib/python3.10/dist-packages (from responder->tableqa) (4.0.0)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from responder->tableqa) (6.0)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from responder->tableqa) (3.1.2)
Collecting rfc3986
Using cached rfc3986-2.0.0-py2.py3-none-any.whl (31 kB)
Collecting whitenoise
Using cached whitenoise-6.4.0-py3-none-any.whl (19 kB)
Collecting aiofiles
Using cached aiofiles-23.1.0-py3-none-any.whl (14 kB)
Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from sqlalchemy->tableqa) (2.0.2)
Requirement already satisfied: typing-extensions>=4.2.0 in /usr/local/lib/python3.10/dist-packages (from sqlalchemy->tableqa) (4.5.0)
Requirement already satisfied: protobuf>=3.19.6 in /usr/local/lib/python3.10/dist-packages (from tensorflow-hub->tableqa) (3.20.3)
Collecting click
Using cached click-7.1.2-py2.py3-none-any.whl (82 kB)
Collecting h11>=0.8
Using cached h11-0.14.0-py3-none-any.whl (58 kB)
Collecting websockets==8.*
Using cached websockets-8.1-cp310-cp310-linux_x86_64.whl
Collecting python-dotenv>=0.13.*
Using cached python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Collecting httptools==0.1.*
Using cached httptools-0.1.2-cp310-cp310-linux_x86_64.whl
Collecting uvloop>=0.14.0
Using cached uvloop-0.17.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.1 MB)
Collecting watchgod<0.7,>=0.6
Using cached watchgod-0.6-py35.py36.py37-none-any.whl (10 kB)
Collecting typesystem
Using cached typesystem-0.4.1-py3-none-any.whl (28 kB)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->responder->tableqa) (2.1.2)
Collecting onnx
Using cached onnx-1.13.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.5 MB)
Collecting fire
Using cached fire-0.5.0-py2.py3-none-any.whl
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers[tf-cpu]==3.0.2->tableqa) (1.26.15)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers[tf-cpu]==3.0.2->tableqa) (3.4)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->transformers[tf-cpu]==3.0.2->tableqa) (2.0.12)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers[tf-cpu]==3.0.2->tableqa) (2022.12.7)
Requirement already satisfied: gast<=0.4.0,>=0.2.1 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.4.0)
Requirement already satisfied: absl-py>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.4.0)
Requirement already satisfied: jax>=0.3.15 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.4.8)
Requirement already satisfied: google-pasta>=0.1.1 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.2.0)
Requirement already satisfied: opt-einsum>=2.3.2 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (3.3.0)
Requirement already satisfied: tensorflow-estimator<2.13,>=2.12.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (2.12.0)
Requirement already satisfied: libclang>=13.0.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (16.0.0)
Requirement already satisfied: grpcio<2.0,>=1.24.3 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.54.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (67.7.2)
Requirement already satisfied: h5py>=2.9.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (3.8.0)
Requirement already satisfied: astunparse>=1.6.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.6.3)
Requirement already satisfied: flatbuffers>=2.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (23.3.3)
Requirement already satisfied: tensorboard<2.13,>=2.12 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (2.12.2)
Requirement already satisfied: keras<2.13,>=2.12.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (2.12.0)
Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.32.0)
Requirement already satisfied: wrapt<1.15,>=1.11.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.14.1)
Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (2.3.0)
Requirement already satisfied: wheel<1.0,>=0.23.0 in /usr/local/lib/python3.10/dist-packages (from astunparse>=1.6.0->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.40.0)
Requirement already satisfied: scipy>=1.7 in /usr/local/lib/python3.10/dist-packages (from jax>=0.3.15->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.10.1)
Requirement already satisfied: ml-dtypes>=0.0.3 in /usr/local/lib/python3.10/dist-packages (from jax>=0.3.15->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.1.0)
Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.8.1)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.7.0)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (3.4.3)
Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (2.3.0)
Requirement already satisfied: google-auth-oauthlib<1.1,>=0.5 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.0.0)
Requirement already satisfied: google-auth<3,>=1.6.3 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (2.17.3)
Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (4.9)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (5.3.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.3.0)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from google-auth-oauthlib<1.1,>=0.5->tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.3.1)
Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /usr/local/lib/python3.10/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.5.0)
Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<1.1,>=0.5->tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (3.2.2)
Building wheels for collected packages: tokenizers
error: subprocess-exited-with-error
× Building wheel for tokenizers (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Building wheel for tokenizers (pyproject.toml) ... error
ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
Can anybody help me? Thanks in advance.
Hello,
Thank you for providing your code! I'm very new to this and tried using my own data by simply feeding a CSV file into the pandas dataframe in the sample.ipynb.
This doesn't seem to work. When I try to ask a question with agent.query_db it throws me an OperationalError.
What are the necessary steps to try the code on my own CSV files?
Thank you in advance.
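For reference, the usual layout (hedged: the exact Agent signature may differ between tableqa versions) is a directory of CSV files plus an optional directory of JSON schemas with matching base names. The file names and contents below are invented:

```python
import json
import os
import tempfile

# Build a throwaway data directory and a matching schema directory.
data_dir = tempfile.mkdtemp()
schema_dir = tempfile.mkdtemp()

with open(os.path.join(data_dir, "cancer.csv"), "w") as f:
    f.write("Age,Deaths,Cancer_site\n35,10,stomach\n62,4,lung\n")

with open(os.path.join(schema_dir, "cancer.json"), "w") as f:
    json.dump({"name": "cancer", "keywords": ["cancer"], "columns": []}, f)

# With tableqa installed, the directories are then passed to Agent:
# from tableqa.agent import Agent
# agent = Agent(data_dir, schema_dir)
# print(agent.query_db("how many deaths of age below 40 had stomach cancer?"))
```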
Hi, I am curious to know whether we can store the data in Elasticsearch indices?
Hi everyone,
I am using Python 3.8, and when I run the tests I get a Segmentation fault (core dumped) error. Please help :)
TypeError Traceback (most recent call last)
c:\Users\super\OneDrive\Desktop\research\tableQA.ipynb Cell 5 line 1
----> 1 agent.query_db("how many deaths of age below 40 had stomach cancer?")
File c:\Users\super\anaconda3\envs\nextgpt\lib\site-packages\tableqa\agent.py:72, in Agent.query_db(self, question, verbose, chart, size)
70 database = Database(self.data_dir, self.schema_dir)
71 create_db = getattr(database, self.db_type)
---> 72 engine = create_db(question)
73 answer = engine.execute(query).fetchall()
74 if chart is not None:
File c:\Users\super\anaconda3\envs\nextgpt\lib\site-packages\tableqa\database.py:26, in Database.sqlite(self, question)
24 data_frame=self.data_process.get_dataframe(csv).astype(str)
25 schema=self.data_process.get_schema_for_csv(csv)
---> 26 data_frame = data_frame.fillna(data_frame.mean())
27 sql_schema = {}
28 for col in schema['columns']:
File c:\Users\super\anaconda3\envs\nextgpt\lib\site-packages\pandas\core\generic.py:11556, in NDFrame._add_numeric_operations..mean(self, axis, skipna, numeric_only, **kwargs)
11539 @doc(
11540 _num_doc,
11541 desc="Return the mean of the values over the requested axis.",
(...)
11554 **kwargs,
...
46 def _sum(a, axis=None, dtype=None, out=None, keepdims=False,
47 initial=_NoValue, where=True):
---> 48 return umr_sum(a, axis, dtype, out, keepdims, initial, where)
TypeError: can only concatenate str (not "int") to str
Canceled future for execute_request message before replies were done
I can't run Agent()
Hi! This is such a cool project - wondering if you've thought at all about integrating with LangChain or writing an adapter? Is there any reason why this has to be used with BERT? Or could it also work with other models?
Currently the feature supports CSV files only. However, integrating more file formats is easy. Go through the get_dataframe() method in data_utils.py and add support to detect the incoming file type and parse a dataframe from it.
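One way the suggested extension could look. This is a sketch: the real get_dataframe() signature and behaviour may differ, and the dispatch table is an assumption:

```python
import os
import tempfile

import pandas as pd

def get_dataframe(path):
    """Dispatch on the file extension instead of assuming CSV."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".csv":
        return pd.read_csv(path)
    if ext in (".xls", ".xlsx"):
        return pd.read_excel(path)   # needs openpyxl or xlrd installed
    if ext == ".json":
        return pd.read_json(path)
    if ext == ".parquet":
        return pd.read_parquet(path)  # needs pyarrow or fastparquet
    raise ValueError(f"Unsupported file type: {ext}")

# Quick check with a throwaway CSV.
tmp = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(tmp, "w") as f:
    f.write("Age,Deaths\n35,10\n")
df = get_dataframe(tmp)
```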
The output from the database could be used for generating pie charts, bar graphs, etc. using pandas or any visualisation library. Add features to enable the same.
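As a sketch of the idea, the rows returned from the database can be fed straight into pandas' plotting helpers. The column names and values below are invented:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display

import pandas as pd

# Rows as they might come back from a query: a list of tuples.
rows = [("stomach", 10), ("lung", 4), ("breast", 7)]
df = pd.DataFrame(rows, columns=["Cancer_site", "Deaths"])

# Bar graph of deaths per cancer site.
ax = df.plot.bar(x="Cancer_site", y="Deaths", legend=False)

# A pie chart would be one line more:
# df.set_index("Cancer_site")["Deaths"].plot.pie()
```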
Hi,
I reviewed the code and have two questions.
The first is about the wikidata.csv used for training in clf.py.
The samples are mainly English, with a small amount of data from other languages, such as Japanese, and some samples use upper case.
My question is this: the data trains a classifier over the question's intended SQL form, yet the TensorFlow Hub model was trained on English only and the input questions seem to always be English.
Why do you use some multilingual input samples and some upper-case transformations?
It looks as if you want the classifier to handle multilingual input and to tolerate upper case, as appears in SQL.
If so, why not use a multilingual embedding from TF Hub, extend the samples with some NMT translations, and apply case transformations as augmentation methods?
The second question: since the project is built mainly on pre-trained models and can be used without
training, the lexical parsing of the input question to identify intent relies mainly on custom (pre-assigned)
keywords defined in the adapt methods of the ColumnType subclasses (such as Number and Date),
so the project mainly targets lexically simple input questions.
So if I use it with questions in other languages (such as Chinese or Japanese), it seems I could use a simple
NMT model to translate them into English and then use your model, without replacing the keywords
defined in those adapt methods (because the questions are lexically simple, the translations
should be well formed).
As we all know, the schema or column names defined in a database table or pandas dataframe are usually in
English, while the table content may be in other languages.
In that situation I must choose how to handle translating the content. If I also translate the content into
English, this seems workable. If I don't, the qa function defined in your nlp.py would need a multilingual SQuAD transformer (some RoBERTa model).
All I want to do is adapt this project from English-only tableQA to multilingual tableQA.
Since both the input questions and the table data are lexically simple, will this feature be supported
in the project in the future?
Queries that use other features of SQL syntax are the next targets to be incorporated, to enable better querying over a wider range of questions.
Currently the project supports SQLite queries only (see database.py). However, integrating other databases like MySQL and PostgreSQL could be done by mirroring the SQLite implementation.
Although there is nothing syntactically wrong with this code, it is hard to read and can be simplified to a set comprehension. It is also faster, since there is no need to create a transient intermediate list.
There are 5 occurrences of this issue in the repository.
See all occurrences on DeepSource → deepsource.io/gh/abhijithneilabraham/tableQA/issue/PYL-R1718/occurrences/
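For readers unfamiliar with the rule (PYL-R1718), the pattern and its fix look roughly like this. The example values are invented, not taken from the flagged occurrences:

```python
words = ["Deaths", "deaths", "Age", "age"]

# Flagged pattern: a transient list is built, then passed to set().
lowered = set([w.lower() for w in words])

# Fix: a set comprehension yields the same result with no
# intermediate list allocation.
lowered = {w.lower() for w in words}
```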
Hello! How can I use this model with Chinese? By changing the pretrained BERT class in nlp.py? It doesn't help...
qa_model = TFBertForQuestionAnswering.from_pretrained('bert-base-chinese')
qa_tokenizer = BertTokenizer.from_pretrained('bert-base-chinese', padding=True)
@abhijithneilabraham Thanks a lot for your beautiful work; I have some questions.
Thanks in advance!
Need a distinct=True parameter for get_query and query_db inside agent.py. This means the SQL query will be SELECT DISTINCT(Something) FROM table. Refer to Clause.adapt in clause.py to see how this can be done.
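A rough sketch of how the flag could shape the generated SQL. The function name and structure are assumptions for illustration; the real change belongs in Clause.adapt as the issue says:

```python
def build_select(column, table, distinct=False):
    """Thread a distinct flag down to the SELECT clause builder."""
    target = f"DISTINCT({column})" if distinct else column
    return f"SELECT {target} FROM {table}"
```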
Hello there! @abhijithneilabraham
I am fascinated by this project and would like to adapt it for different languages.
Could you provide the original dataset used for the Question_classifier.h5 model, to allow translating it?
Thanks!!
P.S. I actually tried to contact you on your personal mail too!
error: `cargo rustc --lib --message-format=json-render-diagnostics --manifest-path Cargo.toml --release -v --features pyo3/extension-module -- --crate-type cdylib` failed with code 101
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for tokenizers
Hi, I have added a PostgreSQL connection and created a dataframe with the following dtypes:
object_id int64
discovery_id int64
schema object
object_type object
object_name object
no_of_rows int64
table_size_mb int64
lines_of_code int64
object_created_datetime datetime64[ns, UTC]
created_at datetime64[ns, UTC]
modified_at datetime64[ns, UTC]
While running the Agent, I am getting the following error:
KeyError: 'datetime64_ns_utc_'
Could you please help to resolve this?
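One possible workaround (hedged: this sidesteps the apparently missing mapping for timezone-aware datetime64 dtypes rather than fixing it) is to cast such columns to plain strings before handing the frame to tableqa:

```python
import pandas as pd

# Minimal frame mirroring the reported dtypes: a tz-aware datetime
# column plus an integer column. Values are invented.
df = pd.DataFrame({
    "created_at": pd.to_datetime(["2023-01-01"], utc=True),
    "object_id": [1],
})

# Select every naive or tz-aware datetime column and stringify it.
datetime_cols = df.select_dtypes(include=["datetime", "datetimetz"]).columns
df[datetime_cols] = df[datetime_cols].astype(str)
```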