

🏘️ LangSuit⋅E

Controlling, Planning, and Interacting with Large Language Models in Embodied Text Environments

License: MIT Documentation

LangSuit⋅E is a systematic and simulation-free testbed for evaluating embodied capabilities of large language models (LLMs) across different tasks in embodied textual worlds. The highlighted features include:

  • Embodied Textual Environments: The testbed provides a general, simulation-free textual world that supports most embodied tasks, including navigation, manipulation, and communication. The environment builds on Gymnasium and follows its design patterns (see the sketch after this list).
  • Embodied Observations and Actions: All agents' observations are designed to be embodied with customizable max_view_distance, max_manipulate_distance, focal_length, etc.
  • Customizable Embodied Agents: The agents in LangSuit⋅E are fully customizable with respect to their action spaces and communicative capabilities, i.e., one can easily adapt the communication and acting strategy from one task to another.
  • Multi-agent Cooperation: The testbed supports planning, acting, and communication among multiple agents, where each agent can be customized with a different configuration.
  • Human-agent Communication: Besides communication between agents, the testbed supports communication and cooperation between humans and agents.
  • Full support for the LangChain library: The LangSuit⋅E testbed fully supports API-based language models, open-source language models, tool usage, Chain-of-Thought (CoT) strategies, and more.
  • Expert Trajectory Generation: We provide expert trajectory generation algorithms for most tasks.
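
Since the environment follows Gymnasium's design patterns, interaction takes the familiar reset/step form. Below is a minimal sketch of that pattern; the environment id and random action choice are illustrative assumptions, not LangSuit⋅E's actual API (see the task configs in this repository for real entry points).

```python
# Minimal sketch of the Gymnasium reset/step pattern that LangSuitE's
# environments follow. The environment id is a hypothetical placeholder,
# not a registered LangSuitE environment.
import gymnasium as gym

env = gym.make("langsuite/AlfredWorld-v0")  # hypothetical id
obs, info = env.reset(seed=42)

terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # an LLM agent would choose here
    obs, reward, terminated, truncated, info = env.step(action)

env.close()
```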


⚠️ Upgrade Warning !!!

We are currently refactoring and upgrading LangSuit⋅E, so the following features are temporarily offline. If you have an urgent need, you can find the old version on the dev branch. If you discover any bugs in the new version, or would like us to prioritize certain features, feel free to let us know in the issues.

Below is our priority list:

  • Support an interactive WebUI (currently the WebUI can render a picture of the scene but does not support interactive actions)
  • Support the Gymnasium API (will provide better support for BabyAI and more tasks)
  • Refactored Expert Agents for IQA, Rearrange, and BabyAI
  • Deeper integration with LangChain
  • Support multi-room and multi-agent settings (needed by TEACh and CWAH)
  • Better support for AI2-THOR and ProcTHOR based tasks
  • Support the VirtualHome format (needed by CWAH)

📦 Benchmark and Dataset

We form the benchmark by adapting existing annotations from simulated embodied engines, a by-product benefit of pursuing a general textual embodied world. Below we showcase six representative embodied tasks, which vary in the number of rooms, the number of agents, and the agents' action spaces (whether they can communicate with each other or ask humans).

| Task      | Simulator   | # of Scenes | # of Tasks | # of Actions | Multi-Room | Multi-Agent | Communicative |
|-----------|-------------|-------------|------------|--------------|------------|-------------|---------------|
| BabyAI    | MiniGrid    | 105         | 500        | 6            |            |             |               |
| Rearrange | AI2Thor     | 120         | 500        | 8            |            |             |               |
| IQA       | AI2Thor     | 30          | 3,000      | 5            |            |             |               |
| ALFred    | AI2Thor     | 120         | 506        | 12           |            |             |               |
| TEACh     | AI2Thor     | 120         | 200        | 13           |            |             |               |
| CWAH      | VirtualHome | 2           | 50         | 6            |            |             |               |

🛠 Getting Started

Installation

  1. Clone this repository:

```bash
git clone https://github.com/langsuite/langsuite.git
cd langsuite
```

  2. Create a conda environment with Python 3.8+ and install the Python requirements:

```bash
conda create -n langsuite python=3.8
conda activate langsuite
pip install -e .
```

  3. Export your OPENAI_API_KEY:

```bash
export OPENAI_API_KEY="your_api_key_here"
```

Alternatively, you can customize your APIs:

```bash
cp api.config.yml.example api.config.yml
```

then add or update your API configurations. For a full list of supported API agents, please refer to LangChain Chat Models.
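
The actual schema is defined by api.config.yml.example in this repository. The snippet below is only a hypothetical illustration of what such a configuration might look like; all field names are assumptions rather than values taken from the actual example file.

```yaml
# Hypothetical illustration only -- the field names below are assumptions;
# copy api.config.yml.example and follow its actual schema.
llm:
  provider: openai          # any chat model backend supported by LangChain
  model: gpt-3.5-turbo
  api_key: "your_api_key_here"
  temperature: 0
```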

  4. Download the task dataset:

```bash
bash ./data/download.sh <data name>
```

Currently supported datasets include: alfred, babyai, cwah, iqa, rearrange.
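
For example, to fetch the ALFred data:

```bash
bash ./data/download.sh alfred
```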

Quick Start: Command-Line Interface (Default)

```bash
langsuite task <config-file.yml>
```
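
For example (the config path below is illustrative; substitute one of the task configs shipped with this repository):

```bash
langsuite task ./configs/alfred_cmd.yml  # path is a hypothetical example
```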


Quick Start: Interactive Web UI

  1. Start the langsuite server:

```bash
langsuite serve <config-file.yml>
```

  2. Start the web UI:

```bash
langsuite webui
```

The user interface will run at http://localhost:8501/


Task Configuration

```yaml
task: AlfredTask_V0

template: ./templates/alfred/alfred_react.json

world:

agents:
  - type: ChatAgent
    inventory_capacity: 5
    focal_length: 10
    max_manipulate_distance: 2
    max_view_distance: 2
    step_size: 0.25
```
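
The agent fields above configure the embodied observation model described earlier. The following is a minimal sketch, not LangSuit⋅E's actual implementation, of how parameters like max_view_distance and max_manipulate_distance typically gate what an agent can see and act on:

```python
import math

# Minimal sketch (not LangSuitE's actual code) of how the agent parameters
# above typically gate perception and action in a 2D textual world.
def is_visible(agent_xy, obj_xy, max_view_distance=2.0):
    """Objects appear in observations only within max_view_distance."""
    return math.dist(agent_xy, obj_xy) <= max_view_distance

def is_manipulable(agent_xy, obj_xy, max_manipulate_distance=2.0):
    """Manipulation actions succeed only within max_manipulate_distance."""
    return math.dist(agent_xy, obj_xy) <= max_manipulate_distance

# With step_size=0.25, each move advances the agent 0.25 units, so a
# distant object may require several move actions before it can be grasped.
```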

Prompt Template

```json
{
    "intro": {
        "default": [
            "As an autonomous intelligent agent, you are now navigating a virtual home, and your task is to perform household tasks using specific actions. You will have access to the following information:  ..."
        ]
    },
    "example": {
        "default": [
            "Task: put a clean lettuce in diningtable.\nObs: In front of you, You see a stoveburner_2. On your left, you see a stoveburner_1; a sinkbasin_1. On your right, you see a countertop_1; a tomato_0; a toaster_0.\n> Act: turn_left ..."
        ]
    },
    "InvalidAction": {
        "failure.invalidObjectName": [
            "Feedback: Action failed. There is no the object \"{object}\" in your view space. Please operate the object in sight.\nObs: {observation}"
        ],
        ...
    },
    ...
}
```
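
Template entries contain {object}-style placeholders. Below is a minimal sketch, assuming standard str.format substitution (LangSuit⋅E's actual resolution logic may differ), of how such a template file could be loaded and filled in:

```python
import json

# Minimal sketch (not LangSuitE's actual code): load a template file and
# fill a feedback message's placeholders with str.format.
with open("./templates/alfred/alfred_react.json") as f:
    templates = json.load(f)

msg = templates["InvalidAction"]["failure.invalidObjectName"][0]
print(msg.format(object="lettuce_1", observation="You see a countertop_1."))
```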

📝 Citation

If you find our work useful, please cite:

```bibtex
@inproceedings{langsuite2023,
  author    = {Jia, Zixia and Wang, Mengmeng and Tong, Baichen and Zheng, Zilong},
  title     = {LangSuit⋅E: Controlling, Planning, and Interacting with Large Language Models in Embodied Text Environments},
  year      = {2024},
  booktitle = {Findings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)},
  url       = {https://github.com/bigai-nlco/langsuite}
}
```

For any questions and issues, please contact [email protected].

📄 Acknowledgements

Some of the tasks in LangSuit⋅E are based on datasets and source code from previous research, including BabyAI, AI2Thor, ALFred, TEACh, and CWAH.
