Giter VIP home page Giter VIP logo

tableqa's Introduction

tableQA

AI Tool for querying natural language on tabular data.Built using QA models from transformers.

Here is a detailed blog to understand how this works.

A tabular data can be:

  • Dataframes
  • CSV files

Build Status.
Gradient.
Open In Colab

Features

  • Supports detection from multiple csvs (csvs can also be read from Amazon s3)
  • Supports FuzzyString implementation. i.e, incomplete column values in query can be automatically detected and filled in the query.
  • Supports Databases - SQLite, Postgresql, MySQL, Amazon RDS (Postgresql, MySQL).
  • Open-Domain, No training required.
  • Add manual schema for customized experience
  • Auto-generate schemas in case schema not provided
  • Data visualisations.

Supported operations.

  • SELECT
    • one column
    • multiple columns
    • all columns
    • aggregate functions
    • distinct select
      • count-select
      • sum-select
      • avg-select
      • min-select
      • max-select
  • WHERE
    • one condition
    • multiple conditions
    • operators
      • equal operator
      • greater-than operator
      • less-than operator
      • between operator

Configuration:

install via pip:

pip install tableqa

installing from source:

git clone https://github.com/abhijithneilabraham/tableQA

cd tableqa

python setup.py install

Quickstart

Do sample query

from tableqa.agent import Agent
agent=Agent(df) #input your dataframe
response=agent.query_db("Your question here")
print(response)

Get an SQL query from the question

sql=agent.get_query("Your question here")  
print(sql) #returns an sql query

Adding Manual schema

Schema Format:
{
    "name": DATABASE NAME,
    "keywords":[DATABASE KEYWORDS],
    "columns":
    [
        {
        "name": COLUMN 1 NAME,
        "mapping":{
            CATEGORY 1: [CATEGORY 1 KEYWORDS],
            CATEGORY 2: [CATEGORY 2 KEYWORDS]
        }

        },
        {
        "name": COLUMN 2 NAME,
        "keywords": [COLUMN 2 KEYWORDS]
        },
        {
        "name": "COLUMN 3 NAME",
        "keywords": [COLUMN 3 KEYWORDS],
        "summable":"True"
        }
    ]
}

  • Mappings are for those columns whose values have only few distinct classes.
  • Include only the column names which need to have manual keywords or mappings.Rest will will be autogenerated.
  • summable is included for Numeric Type columns whose values are already count representations. Eg. Death Count,Cases etc. consists values which already represent a count.

Example (with manual schema):

Database query
  • Default Database - SQLite (File-based database, does not require creation of a separate connection.)
from tableqa.agent import Agent
agent=Agent(df,schema) #pass the dataframe and schema objects
response=agent.query_db("how many people died of stomach cancer in 2011")
print(response)
#Response =[(22,)]
  • To use PostgreSQL, you must have a postgresql server installed and running on your local. To download postgresql, visit the page.
from tableqa.agent import Agent
agent = Agent(df, schema_file, 'postgres', username='username', password='password', database='DBname', host='localhost', port=5432, aws_db=False)
response=agent.query_db("how many people died of stomach cancer in 2011")
print(response)
#Response =[(22,)]
  • To use MySQL, you must have a mysql server installed and running on your local. To download mysql, visit the page.
from tableqa.agent import Agent
agent = Agent(df, schema_file, 'mysql', username='username', password='password', database='DBname', host='localhost', port=5432, aws_db=False)
response=agent.query_db("how many people died of stomach cancer in 2011")
print(response)
#Response =[(22,)]

  • To use PostgreSQL or MySQL on Amazon RDS, you must create a database on Amazon RDS. The RDS must be in public subnet with security groups allowing connections from outside of AWS.

Refer to step 1 in the document to create a mysql db instance on Amazon RDS. Same steps can be followed for creating a PostgreSQL db instance by selecting PostgreSQL in the Engine tab. Obtain the username, password, database, endpoint, and port from your database connection details on Amazon RDS.

from tableqa.agent import Agent
agent = Agent(df, schema_file, 'postgres', username='Master username', password='Master password', database='DB name', host='Endpoint', port='Port', aws_db=True)
response=agent.query_db("how many people died of stomach cancer in 2011")
print(response)
#Response =[(22,)]

SQL query
sql=agent.get_query("How many people died of stomach cancer in 2011")
print(sql)
#sql query: SELECT SUM(Death_Count) FROM cancer_death WHERE Cancer_site = "Stomach" AND Year = "2011"

Multiple CSVs

  • Pass the absolute path of the directories containing the csvs and schemas respectively. Refer cleaned_data and schema for examples.
Example
  • Read CSV and Schema from local machine-
csv_path="/content/tableQA/tableqa/cleaned_data"
schema_path="/content/tableQA/tableqa/schema"
agent=Agent(csv_path,schema_path)

  • Read CSV and schema files from Amazon s3 -
  1. Create a bucket on Amazon s3.
  2. Upload objects to the bucket.
  3. Create an IAM user and provide it access to read files from Amazon s3 storage.
  4. Obtain the access key and secret access key for the user and pass it as an argument to the agent.
csv_path="s3://{bucket}/cleaned_data"
schema_path="s3://{bucket}/schema"
agent = Agent(csv_path, schema_path, aws_s3=True, access_key_id=access_key_id, secret_access_key=secret_access_key)

Join us

Join our workspace:Slack

tableqa's People

Contributors

abhijithneilabraham avatar abhijitramesh avatar damanpreet avatar nandanasreeraj123 avatar deepsourcebot avatar kurianbenoy avatar deepsource-autofix[bot] avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.