abhijithneilabraham / tableqa
AI Tool for querying natural language on tabular data.
License: GNU General Public License v3.0
Schema Format:
{
    "name": DATABASE NAME,
    "keywords": [DATABASE KEYWORDS],
    "columns": [
        {
            "name": COLUMN 1 NAME,
            "mapping": {
                CATEGORY 1: [CATEGORY 1 KEYWORDS],
                CATEGORY 2: [CATEGORY 2 KEYWORDS]
            }
        },
        {
            "name": COLUMN 2 NAME,
            "keywords": [COLUMN 2 KEYWORDS]
        },
        {
            "name": "COLUMN 3 NAME",
            "keywords": [COLUMN 3 KEYWORDS],
            "summable": "True"
        }
    ]
}
Hi, I would like to check how to utilize the Schema Format.
Could you kindly provide an example of how to use it?
Thanks a ton in advance!!
It would be greatly appreciated!
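A hedged example of what a concrete schema could look like, following the format above. All names, keywords, and categories here are invented for illustration and are not taken from the repository; in practice this dict would be saved as a JSON file alongside the CSV it describes.

```python
# Hypothetical schema for an invented cancer-deaths CSV.
schema = {
    "name": "cancer_death",
    "keywords": ["cancer", "death", "patient"],
    "columns": [
        {
            # A categorical column: "mapping" groups raw cell values
            # under the category names that may appear in questions.
            "name": "Sex",
            "mapping": {
                "male": ["male", "man", "men"],
                "female": ["female", "woman", "women"]
            }
        },
        {
            # A free-text column matched through keywords.
            "name": "Cancer_site",
            "keywords": ["cancer site", "cancer type"]
        },
        {
            # A numeric column; "summable" marks it as usable in
            # aggregate (SUM/COUNT-style) queries.
            "name": "Deaths",
            "keywords": ["deaths", "death count"],
            "summable": "True"
        }
    ]
}
```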
The Year class in column_types.py contains a feature to convert "this year", "last year", etc. into numerical values using datetime. Similarly, add support for day, week, month, etc.
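A sketch of what such an extension could look like. The phrase table and return types below are assumptions for illustration, not the project's column_types.py code:

```python
from datetime import date, timedelta

def relative_date(phrase, today=None):
    """Convert phrases like 'last year' or 'yesterday' into concrete
    values, mirroring (not copying) what the Year class does for years."""
    today = today or date.today()
    phrase = phrase.lower().strip()
    if phrase == "this year":
        return today.year
    if phrase == "last year":
        return today.year - 1
    if phrase == "today":
        return today
    if phrase == "yesterday":
        return today - timedelta(days=1)
    if phrase == "last week":
        return today - timedelta(weeks=1)
    if phrase == "last month":
        # Approximate: step back 30 days; a calendar-accurate version
        # could use dateutil.relativedelta instead.
        return today - timedelta(days=30)
    return None
```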
On Mac M1,
installation fails for tokenizers==0.8.1.rc1 and numpy==1.18.0 with tableqa 0.0.11 because the wheels fail to build.
Multiple issues occur when installing on a local PC; on Google Colab it usually installs without errors.
Following our earlier discussion, I have tried to port the function to another language.
I am also thinking about fine-tuning the SQuAD model you use to extract the condition string
from the input question (as the code shows, you use colquery, constructed from a keyword plus
a question word such as "which" or "number of", as the question, and the actual input question as the document for extraction).
To get better inference on this, the model should be fine-tuned on its own dataset.
So can you give me some suggestions about labeling my own dataset for fine-tuning?
I think always using your colquery construction to build my SQuAD dataset may be too plain.
We're setting up a documentation page for TableQA. Join our Slack channel and ping me to discuss more.
Does it support joins and complex queries?
Currently, the logger is just print statements. Use Python's built-in logging module and remove the need for the Hide_logs class. The entry point is the verbose parameter of query_db().
@abhijitramesh Kindly take this over
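A minimal sketch of what this could look like. The function body and logger name below are assumptions for illustration, not the repository's code:

```python
import logging

# Module-level logger replacing bare print statements; Hide_logs becomes
# unnecessary. The NullHandler keeps the library silent unless the
# application configures logging itself.
logger = logging.getLogger("tableqa")
logger.addHandler(logging.NullHandler())

def query_db(question, verbose=False):
    # verbose toggles the logging threshold instead of suppressing prints
    logger.setLevel(logging.DEBUG if verbose else logging.WARNING)
    logger.debug("Parsing question: %s", question)
    # ... build and execute the SQL query here ...
    logger.debug("Query executed")
```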
Hi!
Can you please upload a requirements.txt with the Python package versions that work with your code?
For example, I had to manually download Cython, but it warns me about deprecated code and throws errors:
warning: /tmp/easy_install-9_mu3wk6/numpy-1.18.0/numpy/__init__.pxd:17:0: The 'DEF' statement is deprecated and will be removed in a future Cython version. Consider using global variables, constants, and in-place literals instead. See https://github.com/cython/cython/issues/4310
Error compiling Cython file:
------------------------------------------------------------
...
def __init__(self, seed=None):
BitGenerator.__init__(self, seed)
self.rng_state.pcg_state = &self.pcg64_random_state
self._bitgen.state = <void *>&self.rng_state
self._bitgen.next_uint64 = &pcg64_uint64
^
------------------------------------------------------------
_pcg64.pyx:113:35: Cannot assign type 'uint64_t (*)(void *) except? -1 nogil' to 'uint64_t (*)(void *) noexcept nogil'. Exception values are incompatible. Suggest adding 'noexcept' to type 'uint64_t (void *) except? -1 nogil'.
I'm afraid that, with a repository this old, I can expect many more compatibility issues.
Installing with pip:
pip install tableqa
also does not work.
Could tableQA accept a parameter that generates a summary of the dataset or gives useful insights about it?
I get this error when using tableQA and your sample code
response=agent.query_db("how many deaths of age below 40 had stomach cancer?")
print(response)
AttributeError: 'Engine' object has no attribute 'execute'
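Engine.execute() was removed in SQLAlchemy 2.0, which is what this AttributeError points to. A possible workaround sketch, hedged because tableqa builds the engine internally, so applying it would mean patching database.py or pinning the dependency:

```python
from sqlalchemy import create_engine, text

# Demonstrates the 2.0-style execution path that replaces engine.execute():
# obtain a connection and execute a text() construct on it.
engine = create_engine("sqlite:///:memory:")
with engine.connect() as conn:
    rows = conn.execute(text("SELECT 1")).fetchall()  # rows == [(1,)]
```

Alternatively, pinning `sqlalchemy<2.0` restores the legacy `engine.execute()` that the installed tableqa release calls.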
When I try to run it on Colab, I get the following error:
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tableqa
Using cached tableqa-0.0.10-py3-none-any.whl (930 kB)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (from tableqa) (3.7.1)
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from tableqa) (1.5.3)
Collecting transformers[tf-cpu]==3.0.2
Using cached transformers-3.0.2-py3-none-any.whl (769 kB)
Collecting rake-nltk
Using cached rake_nltk-1.0.6-py3-none-any.whl (9.1 kB)
Collecting graphql-core==2.3
Using cached graphql_core-2.3-py2.py3-none-any.whl (251 kB)
Collecting responder
Using cached responder-2.0.7-py3-none-any.whl (24 kB)
Collecting graphene==2.1.8
Using cached graphene-2.1.8-py2.py3-none-any.whl (107 kB)
Requirement already satisfied: tensorflow-hub in /usr/local/lib/python3.10/dist-packages (from tableqa) (0.13.0)
Requirement already satisfied: sqlalchemy in /usr/local/lib/python3.10/dist-packages (from tableqa) (2.0.10)
Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (from tableqa) (3.8.1)
Collecting graphql-relay<3,>=2
Using cached graphql_relay-2.0.1-py3-none-any.whl (20 kB)
Requirement already satisfied: six<2,>=1.10.0 in /usr/local/lib/python3.10/dist-packages (from graphene==2.1.8->tableqa) (1.16.0)
Collecting aniso8601<=7,>=3
Using cached aniso8601-7.0.0-py2.py3-none-any.whl (42 kB)
Collecting rx<2,>=1.6
Using cached Rx-1.6.3-py2.py3-none-any.whl
Requirement already satisfied: promise<3,>=2.3 in /usr/local/lib/python3.10/dist-packages (from graphql-core==2.3->tableqa) (2.3)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers[tf-cpu]==3.0.2->tableqa) (3.12.0)
Collecting sentencepiece!=0.1.92
Using cached sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from transformers[tf-cpu]==3.0.2->tableqa) (23.1)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers[tf-cpu]==3.0.2->tableqa) (4.65.0)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers[tf-cpu]==3.0.2->tableqa) (2022.10.31)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers[tf-cpu]==3.0.2->tableqa) (2.27.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from transformers[tf-cpu]==3.0.2->tableqa) (1.22.4)
Collecting tokenizers==0.8.1.rc1
Using cached tokenizers-0.8.1rc1.tar.gz (97 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting sacremoses
Using cached sacremoses-0.0.53-py3-none-any.whl
Collecting keras2onnx
Using cached keras2onnx-1.7.0-py3-none-any.whl (96 kB)
Collecting tensorflow-cpu
Using cached tensorflow_cpu-2.12.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (231.8 MB)
Collecting onnxconverter-common
Using cached onnxconverter_common-1.13.0-py2.py3-none-any.whl (83 kB)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (1.0.7)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (0.11.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (3.0.9)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (4.39.3)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (2.8.2)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (8.4.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib->tableqa) (1.4.4)
Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk->tableqa) (1.2.0)
Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk->tableqa) (8.1.3)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->tableqa) (2022.7.1)
Collecting python-multipart
Using cached python_multipart-0.0.6-py3-none-any.whl (45 kB)
Collecting graphql-server-core>=1.1
Using cached graphql_server_core-2.0.0-py2.py3-none-any.whl
Collecting uvicorn[standard]<0.13.3,>=0.12.0
Using cached uvicorn-0.13.2-py3-none-any.whl (45 kB)
Collecting marshmallow
Using cached marshmallow-3.19.0-py3-none-any.whl (49 kB)
Collecting starlette==0.13.*
Using cached starlette-0.13.8-py3-none-any.whl (60 kB)
Collecting apispec>=1.0.0b1
Using cached apispec-6.3.0-py3-none-any.whl (29 kB)
Collecting docopt
Using cached docopt-0.6.2-py2.py3-none-any.whl
Collecting requests-toolbelt
Using cached requests_toolbelt-1.0.0-py2.py3-none-any.whl (54 kB)
Collecting apistar
Using cached apistar-0.7.2-py3-none-any.whl
Requirement already satisfied: itsdangerous in /usr/local/lib/python3.10/dist-packages (from responder->tableqa) (2.1.2)
Requirement already satisfied: chardet in /usr/local/lib/python3.10/dist-packages (from responder->tableqa) (4.0.0)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from responder->tableqa) (6.0)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from responder->tableqa) (3.1.2)
Collecting rfc3986
Using cached rfc3986-2.0.0-py2.py3-none-any.whl (31 kB)
Collecting whitenoise
Using cached whitenoise-6.4.0-py3-none-any.whl (19 kB)
Collecting aiofiles
Using cached aiofiles-23.1.0-py3-none-any.whl (14 kB)
Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from sqlalchemy->tableqa) (2.0.2)
Requirement already satisfied: typing-extensions>=4.2.0 in /usr/local/lib/python3.10/dist-packages (from sqlalchemy->tableqa) (4.5.0)
Requirement already satisfied: protobuf>=3.19.6 in /usr/local/lib/python3.10/dist-packages (from tensorflow-hub->tableqa) (3.20.3)
Collecting click
Using cached click-7.1.2-py2.py3-none-any.whl (82 kB)
Collecting h11>=0.8
Using cached h11-0.14.0-py3-none-any.whl (58 kB)
Collecting websockets==8.*
Using cached websockets-8.1-cp310-cp310-linux_x86_64.whl
Collecting python-dotenv>=0.13.*
Using cached python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Collecting httptools==0.1.*
Using cached httptools-0.1.2-cp310-cp310-linux_x86_64.whl
Collecting uvloop>=0.14.0
Using cached uvloop-0.17.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.1 MB)
Collecting watchgod<0.7,>=0.6
Using cached watchgod-0.6-py35.py36.py37-none-any.whl (10 kB)
Collecting typesystem
Using cached typesystem-0.4.1-py3-none-any.whl (28 kB)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->responder->tableqa) (2.1.2)
Collecting onnx
Using cached onnx-1.13.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.5 MB)
Collecting fire
Using cached fire-0.5.0-py2.py3-none-any.whl
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers[tf-cpu]==3.0.2->tableqa) (1.26.15)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers[tf-cpu]==3.0.2->tableqa) (3.4)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->transformers[tf-cpu]==3.0.2->tableqa) (2.0.12)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers[tf-cpu]==3.0.2->tableqa) (2022.12.7)
Requirement already satisfied: gast<=0.4.0,>=0.2.1 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.4.0)
Requirement already satisfied: absl-py>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.4.0)
Requirement already satisfied: jax>=0.3.15 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.4.8)
Requirement already satisfied: google-pasta>=0.1.1 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.2.0)
Requirement already satisfied: opt-einsum>=2.3.2 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (3.3.0)
Requirement already satisfied: tensorflow-estimator<2.13,>=2.12.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (2.12.0)
Requirement already satisfied: libclang>=13.0.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (16.0.0)
Requirement already satisfied: grpcio<2.0,>=1.24.3 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.54.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (67.7.2)
Requirement already satisfied: h5py>=2.9.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (3.8.0)
Requirement already satisfied: astunparse>=1.6.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.6.3)
Requirement already satisfied: flatbuffers>=2.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (23.3.3)
Requirement already satisfied: tensorboard<2.13,>=2.12 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (2.12.2)
Requirement already satisfied: keras<2.13,>=2.12.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (2.12.0)
Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.32.0)
Requirement already satisfied: wrapt<1.15,>=1.11.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.14.1)
Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (2.3.0)
Requirement already satisfied: wheel<1.0,>=0.23.0 in /usr/local/lib/python3.10/dist-packages (from astunparse>=1.6.0->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.40.0)
Requirement already satisfied: scipy>=1.7 in /usr/local/lib/python3.10/dist-packages (from jax>=0.3.15->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.10.1)
Requirement already satisfied: ml-dtypes>=0.0.3 in /usr/local/lib/python3.10/dist-packages (from jax>=0.3.15->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.1.0)
Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.8.1)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.7.0)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (3.4.3)
Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (2.3.0)
Requirement already satisfied: google-auth-oauthlib<1.1,>=0.5 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.0.0)
Requirement already satisfied: google-auth<3,>=1.6.3 in /usr/local/lib/python3.10/dist-packages (from tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (2.17.3)
Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (4.9)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (5.3.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.10/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.3.0)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from google-auth-oauthlib<1.1,>=0.5->tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (1.3.1)
Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /usr/local/lib/python3.10/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (0.5.0)
Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.10/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<1.1,>=0.5->tensorboard<2.13,>=2.12->tensorflow-cpu->transformers[tf-cpu]==3.0.2->tableqa) (3.2.2)
Building wheels for collected packages: tokenizers
error: subprocess-exited-with-error
× Building wheel for tokenizers (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Building wheel for tokenizers (pyproject.toml) ... error
ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
Can anybody help me? Thanks in advance.
Hello,
Thank you for providing your code! I'm very new to this and tried using my own data by simply feeding a CSV file into the pandas dataframe in the sample.ipynb.
This doesn't seem to work. When I try to ask a question with agent.query_db it throws me an OperationalError.
What are the necessary steps to try the code on my own CSV files?
Thank you in advance.
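For reference, the usual layout (hedged: the exact Agent signature may differ between tableqa versions) is a directory of CSV files plus an optional directory of JSON schemas with matching base names. The file names and contents below are invented:

```python
import json
import os
import tempfile

# Build a throwaway data directory and a matching schema directory.
data_dir = tempfile.mkdtemp()
schema_dir = tempfile.mkdtemp()

with open(os.path.join(data_dir, "cancer.csv"), "w") as f:
    f.write("Age,Deaths,Cancer_site\n35,10,stomach\n62,4,lung\n")

with open(os.path.join(schema_dir, "cancer.json"), "w") as f:
    json.dump({"name": "cancer", "keywords": ["cancer"], "columns": []}, f)

# With tableqa installed, the directories are then passed to Agent:
# from tableqa.agent import Agent
# agent = Agent(data_dir, schema_dir)
# print(agent.query_db("how many deaths of age below 40 had stomach cancer?"))
```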
Hi, I am curious to know whether we can store the data in Elasticsearch indices?
Hi everyone,
I am using Python 3.8, and when I run the tests I get a Segmentation fault (core dumped) error. Please help :)
TypeError Traceback (most recent call last)
c:\Users\super\OneDrive\Desktop\research\tableQA.ipynb Cell 5 line 1
----> 1 agent.query_db("how many deaths of age below 40 had stomach cancer?")
File c:\Users\super\anaconda3\envs\nextgpt\lib\site-packages\tableqa\agent.py:72, in Agent.query_db(self, question, verbose, chart, size)
70 database = Database(self.data_dir, self.schema_dir)
71 create_db = getattr(database, self.db_type)
---> 72 engine = create_db(question)
73 answer = engine.execute(query).fetchall()
74 if chart is not None:
File c:\Users\super\anaconda3\envs\nextgpt\lib\site-packages\tableqa\database.py:26, in Database.sqlite(self, question)
24 data_frame=self.data_process.get_dataframe(csv).astype(str)
25 schema=self.data_process.get_schema_for_csv(csv)
---> 26 data_frame = data_frame.fillna(data_frame.mean())
27 sql_schema = {}
28 for col in schema['columns']:
File c:\Users\super\anaconda3\envs\nextgpt\lib\site-packages\pandas\core\generic.py:11556, in NDFrame._add_numeric_operations..mean(self, axis, skipna, numeric_only, **kwargs)
11539 @doc(
11540 _num_doc,
11541 desc="Return the mean of the values over the requested axis.",
(...)
11554 **kwargs,
...
46 def _sum(a, axis=None, dtype=None, out=None, keepdims=False,
47 initial=_NoValue, where=True):
---> 48 return umr_sum(a, axis, dtype, out, keepdims, initial, where)
TypeError: can only concatenate str (not "int") to str
Canceled future for execute_request message before replies were done
I can't run Agent()
Hi! This is such a cool project - wondering if you've thought at all about integrating with LangChain or writing an adapter? Is there any reason why this has to be used with BERT? Or could it also work with other models?
Currently the feature supports CSV files only. However, integrating more file formats is easy. Go through the get_dataframe() method in data_utils.py and add support to detect the incoming file type and parse a dataframe from it.
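One way the suggested extension could look. This is a sketch: the real get_dataframe() signature and behaviour may differ, and the dispatch table is an assumption:

```python
import os
import tempfile

import pandas as pd

def get_dataframe(path):
    """Dispatch on the file extension instead of assuming CSV."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".csv":
        return pd.read_csv(path)
    if ext in (".xls", ".xlsx"):
        return pd.read_excel(path)   # needs openpyxl or xlrd installed
    if ext == ".json":
        return pd.read_json(path)
    if ext == ".parquet":
        return pd.read_parquet(path)  # needs pyarrow or fastparquet
    raise ValueError(f"Unsupported file type: {ext}")

# Quick check with a throwaway CSV.
tmp = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(tmp, "w") as f:
    f.write("Age,Deaths\n35,10\n")
df = get_dataframe(tmp)
```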
The output from the database could be used for generating pie charts, bar graphs, etc. using pandas or any visualisation library. Add features to enable the same.
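As a sketch of the idea, the rows returned from the database can be fed straight into pandas' plotting helpers. The column names and values below are invented:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display

import pandas as pd

# Rows as they might come back from a query: a list of tuples.
rows = [("stomach", 10), ("lung", 4), ("breast", 7)]
df = pd.DataFrame(rows, columns=["Cancer_site", "Deaths"])

# Bar graph of deaths per cancer site.
ax = df.plot.bar(x="Cancer_site", y="Deaths", legend=False)

# A pie chart would be one line more:
# df.set_index("Cancer_site")["Deaths"].plot.pie()
```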
Hi,
I reviewed the code and have two questions.
The first is about the wikidata.csv used for training in clf.py.
The samples are mainly English, with a small amount of data from other languages, such as Japanese, and some samples use upper case.
My question is this: the data trains a classifier over the question's intended SQL form, yet the TensorFlow Hub model was trained on English only and the input questions seem to always be English.
Why do you use some multilingual input samples and some upper-case transformations?
It looks as if you want the classifier to handle multilingual input and to tolerate upper case, as appears in SQL.
If so, why not use a multilingual embedding from TF Hub, extend the samples with some NMT translations, and apply case transformations as augmentation methods?
The second question: since the project is built mainly on pre-trained models and can be used without
training, the lexical parsing of the input question to identify intent relies mainly on custom (pre-assigned)
keywords defined in the adapt methods of the ColumnType subclasses (such as Number and Date),
so the project mainly targets lexically simple input questions.
So if I use it with questions in other languages (such as Chinese or Japanese), it seems I could use a simple
NMT model to translate them into English and then use your model, without replacing the keywords
defined in those adapt methods (because the questions are lexically simple, the translations
should be well formed).
As we all know, the schema or column names defined in a database table or pandas dataframe are usually in
English, while the table content may be in other languages.
In that situation I must choose how to handle translating the content. If I also translate the content into
English, this seems workable. If I don't, the qa function defined in your nlp.py would need a multilingual SQuAD transformer (some RoBERTa model).
All I want to do is adapt this project from English-only tableQA to multilingual tableQA.
Since both the input questions and the table data are lexically simple, will this feature be supported
in the project in the future?
Queries that use other features of SQL syntax are the next targets to be incorporated, to enable better querying over a wider range of questions.
Currently the project supports SQLite queries only (see database.py). However, integrating other databases like MySQL and PostgreSQL could be done by mirroring the SQLite implementation.
Although there is nothing syntactically wrong with this code, it is hard to read and can be simplified to a set comprehension. It is also faster, since there is no need to create a transient intermediate list.
There are 5 occurrences of this issue in the repository.
See all occurrences on DeepSource → deepsource.io/gh/abhijithneilabraham/tableQA/issue/PYL-R1718/occurrences/
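For readers unfamiliar with the rule (PYL-R1718), the pattern and its fix look roughly like this. The example values are invented, not taken from the flagged occurrences:

```python
words = ["Deaths", "deaths", "Age", "age"]

# Flagged pattern: a transient list is built, then passed to set().
lowered = set([w.lower() for w in words])

# Fix: a set comprehension yields the same result with no
# intermediate list allocation.
lowered = {w.lower() for w in words}
```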
Hello! How can I use this model with Chinese? By changing the pretrained BERT class in nlp.py? It doesn't help...
qa_model = TFBertForQuestionAnswering.from_pretrained('bert-base-chinese')
qa_tokenizer = BertTokenizer.from_pretrained('bert-base-chinese', padding=True)
@abhijithneilabraham Thanks a lot for your beautiful work; I have some questions.
Thanks in advance!
Need a distinct=True parameter for get_query and query_db inside agent.py. This means the SQL query will be SELECT DISTINCT(Something) FROM table. Refer to Clause.adapt in clause.py to see how this can be done.
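A rough sketch of how the flag could shape the generated SQL. The function name and structure are assumptions for illustration; the real change belongs in Clause.adapt as the issue says:

```python
def build_select(column, table, distinct=False):
    """Thread a distinct flag down to the SELECT clause builder."""
    target = f"DISTINCT({column})" if distinct else column
    return f"SELECT {target} FROM {table}"
```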
Hello there! @abhijithneilabraham
I am fascinated by this project and would like to adapt it for different languages.
Could you provide the original dataset used for the Question_classifier.h5 model, to allow translating it?
Thanks!!
P.S. I actually tried to contact you on your personal mail too!
error: `cargo rustc --lib --message-format=json-render-diagnostics --manifest-path Cargo.toml --release -v --features pyo3/extension-module -- --crate-type cdylib` failed with code 101
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for tokenizers
Hi, I have added a PostgreSQL connection and created a dataframe with the following dtypes:
object_id int64
discovery_id int64
schema object
object_type object
object_name object
no_of_rows int64
table_size_mb int64
lines_of_code int64
object_created_datetime datetime64[ns, UTC]
created_at datetime64[ns, UTC]
modified_at datetime64[ns, UTC]
While running the Agent, I am getting the following error:
KeyError: 'datetime64_ns_utc_'
Could you please help to resolve this?
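One possible workaround (hedged: this sidesteps the apparently missing mapping for timezone-aware datetime64 dtypes rather than fixing it) is to cast such columns to plain strings before handing the frame to tableqa:

```python
import pandas as pd

# Minimal frame mirroring the reported dtypes: a tz-aware datetime
# column plus an integer column. Values are invented.
df = pd.DataFrame({
    "created_at": pd.to_datetime(["2023-01-01"], utc=True),
    "object_id": [1],
})

# Select every naive or tz-aware datetime column and stringify it.
datetime_cols = df.select_dtypes(include=["datetime", "datetimetz"]).columns
df[datetime_cols] = df[datetime_cols].astype(str)
```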