GPT-Auto-Data-Analytics

Automate local data analysis with groups of tool-using GPT agents.

ChatGPT and Code Interpreter have changed the way data analysis is done. But have you ever been frustrated by ...

  • 🤔 Limited runtime and computing resources (no GPU access!)
  • 📊 Challenges in handling large and complex local datasets (which are hard to upload)
  • 📦 Certain Python packages being unavailable in the online environment.
  • 🚫 No vision ability to interpret the generated figures.
  • 🧩 Lack of organized output from the analysis.

This project aims to replicate the online code interpreter experience locally, while also addressing all the issues mentioned above:

  • 💻 Code generation and execution in a local Python kernel
  • 📂 Full access to your data on local storage
  • 🤖 A supervisor agent guides the coding agent to iteratively solve data analysis problems
  • 👀 Vision capabilities for the coding agent to interpret visual figures
  • 📚 Organized output, exported as Jupyter notebooks, PDF, and HTML

🚀 Installation

git clone https://github.com/Animadversio/GPT-Auto-Data-Analytics.git
cd GPT-Auto-Data-Analytics
pip install -e .

Usage

# Set your OpenAI API key as an environment variable before running this
from auto_analytics.tabular_analysis_session import TabularAnalysisPipeline
# Tell the agent some info about your data and columns, and your overall objective.
table_descriptions = """..."""
column_descriptions = """..."""
task_objective = """Perform explorative data analysis of this dataset to uncover relationships among different variables."""
csvpath = "~/GPT-Auto-Data-Analytics/table_data/Diabetes_Blood_Classification.csv"
report_root = "reports"

# Example usage
analysis_session = TabularAnalysisPipeline("Diabetes_Classification", csvpath, report_root=report_root)
# Set up dataset and column descriptions
analysis_session.set_dataset_description(table_descriptions, column_descriptions)
# Let the supervisor agent set up and save the analysis task
analysis_session.supervisor_set_analysis_task(task_objective)
# Let the research assistant perform the data analysis
analysis_session.perform_data_analysis(query=None, MAX_ROUND=30)
# Save results to notebook, HTML, and PDF
analysis_session.save_results()
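
The snippet above assumes the OpenAI key is already visible to the Python process and that the CSV path resolves. A minimal pre-flight sketch follows, assuming the key is read from the `OPENAI_API_KEY` environment variable (the default for the OpenAI Python client; this README does not name the variable). The helper `preflight` is hypothetical and is not part of `auto_analytics`.

```python
import os


def preflight(csvpath, env=None):
    """Return (expanded_path, problems) before starting an analysis session.

    Hypothetical helper for illustration; not part of auto_analytics.
    """
    env = os.environ if env is None else env
    problems = []
    # Assumption: the API key is read from OPENAI_API_KEY.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # Expand "~" so the path is valid however the file is opened later.
    expanded = os.path.expanduser(csvpath)
    if not os.path.isfile(expanded):
        problems.append(f"CSV file not found: {expanded}")
    return expanded, problems
```

Calling `preflight(csvpath)` before constructing the pipeline and stopping if `problems` is non-empty avoids spending agent rounds on a run that cannot succeed.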

Play with our Colab demo!

Video walk-through of our system

✅ Current Features 🌟

  • Auto Analytics in Local Env: The coding agent has access to a local Python kernel, which runs code and interacts with data on your computer. No more concerns about file uploads, compute limitations, or the online ChatGPT code interpreter environment. Leverage any Python library or computing resources as needed.
  • Collaborative Intelligence: We've built a team of LLM agents assuming the roles of research supervisor and coding assistant. Their interaction allows the supervisor's overarching vision to guide the detailed coding and data analysis process, leading to a cohesive report.
  • Tabular Data Analysis: Full support for tabular data analysis (e.g., Kaggle competitions). From tables to analytical insights in one step.
  • Vision Analytics: Integration with the Vision API enables the data analytics agent to generate plots and understand their meaning in a closed loop.
  • Versatile Report Export: After automated data analysis, a Jupyter notebook is generated, combining code, results, and visuals into a narrative that tells the story of your data. Exports are available in Jupyter notebook, PDF, and HTML formats for your review and reproduction.
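
To illustrate the notebook export idea, here is a minimal sketch that assembles text and code into a notebook structure using only the standard library. It writes the nbformat v4 JSON layout by hand; the project itself may well rely on `nbformat` or `nbconvert` instead, and the function name `build_notebook` is hypothetical.

```python
import json


def build_notebook(steps):
    """Build an nbformat-4 notebook dict from (kind, text) pairs.

    kind is "markdown" or "code". Hypothetical helper for illustration.
    """
    cells = []
    for kind, text in steps:
        if kind == "markdown":
            cells.append({"cell_type": "markdown", "metadata": {}, "source": text})
        elif kind == "code":
            cells.append({
                "cell_type": "code",
                "execution_count": None,  # not executed yet
                "metadata": {},
                "outputs": [],
                "source": text,
            })
        else:
            raise ValueError(f"unknown cell kind: {kind}")
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


notebook = build_notebook([
    ("markdown", "# Exploratory analysis"),
    ("code", "import pandas as pd\ndf = pd.read_csv(csvpath)\ndf.describe()"),
])
# Serialized form; writing this string to a .ipynb file yields a notebook.
notebook_json = json.dumps(notebook, indent=1)
```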

How does it work?

System Overview. Our high-level idea is to emulate the workflow of a research lab: we assign roles to AI agents, such as research supervisor and student coding agent, and let them work in a closed loop. The supervisor, equipped with broader background knowledge, sets an overarching research goal and breaks it down into detailed, code-solvable tasks, which are sent to the student. The student coding agent, equipped with a coding tool and vision capability, tackles the tasks one by one and synthesizes its findings into reports for the supervisor's review. This iterative process lets the supervisor refine its understanding and, if needed, adjust the research agenda, prompting further investigation. Through this collaboration, the two agents converge on a unified conclusion, culminating in a final report with code, figures, and text interpretations.
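
The supervisor and student loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: `run_research_loop`, `stub_supervisor`, and `stub_student` are hypothetical names, and the two stubs stand in for real LLM calls.

```python
from typing import Callable, List


def run_research_loop(
    objective: str,
    plan_tasks: Callable[[str, List[str]], List[str]],
    solve_task: Callable[[str], str],
    max_rounds: int = 3,
) -> List[str]:
    """Alternate between a supervisor (plan_tasks) and a student (solve_task).

    plan_tasks receives the objective and the reports so far and returns the
    next batch of tasks; an empty list means the supervisor is satisfied.
    """
    reports: List[str] = []
    for _ in range(max_rounds):
        tasks = plan_tasks(objective, reports)
        if not tasks:
            break
        for task in tasks:
            reports.append(solve_task(task))
    return reports


# Stub agents standing in for LLM calls.
def stub_supervisor(objective, reports):
    if not reports:
        return ["summarize columns", "plot correlations"]
    return []  # satisfied once the first round of reports is in


def stub_student(task):
    return f"report for: {task}"


reports = run_research_loop("explore the dataset", stub_supervisor, stub_student)
```

Passing the accumulated reports back to the supervisor on every round is what lets it adjust the agenda; `max_rounds` bounds the loop if the supervisor never declares itself satisfied.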

Internal Coding Loop. At the core of our system lies the coding agent, an LLM agent tasked with conducting interactive data analysis. This agent takes in a task objective and analyzes datasets by outputting code snippets, which are executed in a local IPython kernel. The results of code execution (including error messages) are sent back to the LLM as text strings. Notably, when figures are generated, they are also turned into text descriptions by a multimodal LLM prompted to interpret the figure.
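
To make the feedback step concrete, here is a minimal sketch of running a generated snippet and returning its output, or its error message, as a text string. It uses plain `exec` with a persistent namespace as a stand-in for the local IPython kernel, so it shows the idea rather than the project's actual mechanism; `run_snippet` is a hypothetical name.

```python
import contextlib
import io
import traceback


def run_snippet(code: str, namespace: dict) -> str:
    """Execute code in a persistent namespace and return its output as text.

    Stand-in for an IPython kernel: stdout is captured, and any exception
    is returned as a traceback string instead of being raised.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception:
        buffer.write(traceback.format_exc())
    return buffer.getvalue()


namespace = {}
ok_output = run_snippet("x = 21\nprint(x * 2)", namespace)
# Variables persist across snippets, as in a kernel; the error becomes text.
err_output = run_snippet("print(x / 0)", namespace)
```

Returning the traceback as text rather than raising it is what allows the agent to read the error and attempt a fix in the next round.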

What's Next?

  • Better report summarization.
  • Enhanced report presentation (filtering, formatting, etc.).
  • A command-line interface (CLI) for automated data analysis.
  • Addressing complex, multi-file data analysis challenges.
  • Tackling multi-modal data analysis problems, including text, image and neural data.
  • Enhancing multi-round communication between the supervisor and the coding agent, so the research goal can be adjusted based on the report.

Join Us on This Journey

As an end goal, we hope our tool can run data analysis on autopilot. If you're interested in using it in your daily data analysis work or scientific research, we'd love to hear your thoughts and what you find interesting!
