
OpenDevin: Code Less, Make More

Check out the documentation

Welcome to OpenDevin, a platform for autonomous software engineers, powered by AI and LLMs.

OpenDevin agents collaborate with human developers to write code, fix bugs, and ship features.

App screenshot

⚡ Getting Started

The easiest way to run OpenDevin is inside a Docker container. It works best with the most recent version of Docker (26.0.0 at the time of writing). You must be using Linux, macOS, or WSL on Windows.

To start OpenDevin in a Docker container, run the following commands in your terminal:

Warning

When you run the following command, files in ./workspace may be modified or deleted.

OPENDEVIN_WORKSPACE=$(pwd)/workspace
docker run -it \
    --pull=always \
    -e SANDBOX_USER_ID=$(id -u) \
    -e PERSIST_SANDBOX="true" \
    -e SSH_PASSWORD="make something up here" \
    -e WORKSPACE_MOUNT_PATH=$OPENDEVIN_WORKSPACE \
    -v $OPENDEVIN_WORKSPACE:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name opendevin-app-$(date +%Y%m%d%H%M%S) \
    ghcr.io/opendevin/opendevin:0.6

You'll find OpenDevin running at http://localhost:3000 with access to ./workspace. To have OpenDevin operate on your code, place it in ./workspace.

OpenDevin will only have access to this workspace folder. The rest of your system will not be affected, as OpenDevin runs in a secured Docker sandbox.

🚀 Documentation

To learn more about the project, and for tips on using OpenDevin, check out our documentation.

There you'll find resources on how to use different LLM providers (like ollama and Anthropic's Claude), troubleshooting resources, and advanced configuration options.

🤝 How to Contribute

OpenDevin is a community-driven project, and we welcome contributions from everyone. Whether you're a developer, a researcher, or simply enthusiastic about advancing the field of software engineering with AI, there are many ways to get involved:

  • Code Contributions: Help us develop new agents, core functionality, the frontend and other interfaces, or sandboxing solutions.
  • Research and Evaluation: Contribute to our understanding of LLMs in software engineering, participate in evaluating the models, or suggest improvements.
  • Feedback and Testing: Use the OpenDevin toolset, report bugs, suggest features, or provide feedback on usability.

For details, please check CONTRIBUTING.md.

🤖 Join Our Community

Whether you're a developer, a researcher, or simply enthusiastic about OpenDevin, we'd love to have you in our community. Let's make software engineering better together!

  • Slack workspace - Here we talk about research, architecture, and future development.
  • Discord server - This is a community-run server for general discussion, questions, and feedback.

📈 Progress

SWE-Bench Lite Score

Star History Chart

📜 License

Distributed under the MIT License. See LICENSE for more information.

📚 Cite

@misc{opendevin2024,
  author       = {{OpenDevin Team}},
  title        = {{OpenDevin: An Open Platform for AI Software Developers as Generalist Agents}},
  year         = {2024},
  version      = {v1.0},
  howpublished = {\url{https://github.com/OpenDevin/OpenDevin}},
  note         = {Accessed: ENTER THE DATE YOU ACCESSED THE PROJECT}
}


opendevin's Issues

Feature Outline and Requirements Engineering

Took a crack at what I think this thing should do (with ChatGPT of course).

Ideal Scope and Capabilities

1. Task Understanding

  • Natural Language Processing (NLP): The AI must excel in understanding software development tasks described in natural language, including vague or incomplete specifications. It should ask clarifying questions if the task description is not clear.
  • Contextual Interpretation: Ability to understand the context of a project or a codebase to make relevant suggestions or generate appropriate code. This includes understanding the specific libraries, frameworks, and coding standards in use.

2. Code Generation

  • Multi-Language Support: Generate code in multiple programming languages, understanding the idiomatic nuances of each.
  • Adaptive Coding Style: Adapt to the existing codebase's style, following naming conventions, commenting styles, and structural patterns.
  • Algorithm Design: Beyond translating tasks into code, the AI should be capable of designing algorithms to solve complex problems efficiently.

3. Debugging

  • Error Detection: Identify syntax errors, runtime errors, and logical errors in code.
  • Error Explanation: Provide clear explanations for identified errors, making it easier for human developers to understand and fix them.
  • Suggest Fixes: Offer one or more solutions to fix the identified errors, considering the most efficient and idiomatic approaches.

4. Code Optimization

  • Performance Optimization: Suggest or automatically refactor code to improve performance, such as reducing time complexity or optimizing resource usage.
  • Readability and Maintainability: Refactor code to improve readability and maintainability, following best practices and design patterns.
  • Security Enhancements: Identify and fix security vulnerabilities, ensuring the code adheres to security best practices.

5. Documentation

  • Automatic Documentation: Generate comprehensive and understandable documentation for code, including function/method descriptions, parameter explanations, and example usage.
  • Code Comments: Add meaningful comments within the code to explain complex logic or important decisions.
  • Update Documentation: Keep documentation synchronized with code changes, updating descriptions and examples as the code evolves.

6. Collaboration

  • Version Control: Understand and execute version control operations, such as commits, merges, and pull requests, with meaningful commit messages.
  • Code Reviews: Participate in code review processes by providing suggestions for improvements and identifying potential issues in others' code.
  • Team Communication: If integrated into team communication tools, the AI could summarize code changes, explain technical decisions, and facilitate knowledge sharing.

7. Learning and Adaptation

  • Feedback Incorporation: Use feedback from users to improve task understanding, code generation quality, and debugging capabilities.
  • Continuous Learning: Stay updated with the latest programming languages, frameworks, and best practices by continuously incorporating new information into its knowledge base.

Reasonable MVP

This is something I think is achievable. Pick a typical codebase (a Node.js backend API) which generally is mostly glue code that is easy to reason about. (Unlike a frontend with layout!)

MVP Scope for an AI Node.js Engineer

1. Basic Task Understanding and Code Generation

  • Focus on Common Node.js Tasks: Start with understanding and generating code for a set of common Node.js development tasks, such as setting up a server with Express, connecting to a MongoDB database, or handling REST API requests.
  • Template-Based Code Generation: Utilize a library of code templates and patterns for common tasks and scenarios in Node.js applications. This approach can speed up the MVP development by relying on proven solutions.

2. Simple Debugging and Error Handling

  • Static Code Analysis: Integrate basic static code analysis to identify syntax errors and common mistakes specific to JavaScript and Node.js. This feature helps in ensuring that the generated code is error-free at a basic level.
  • Error Explanation and Suggestions: Provide explanations for common errors and suggest fixes. At this stage, focusing on the most frequent Node.js errors (e.g., callback errors, promise handling, and async/await issues) can add significant value.

3. Code Optimization for Performance

  • Best Practices Guide: Instead of automatic optimization, the MVP could include suggestions for best practices in Node.js development. This can cover topics like efficient asynchronous programming, memory management, and avoiding common pitfalls.

4. Basic Documentation Generation

  • Function and API Documentation: Automatically generate comments and documentation for functions, classes, and API endpoints. This feature can significantly speed up the development process and ensure that the generated code is accessible to other developers.

5. Version Control Integration

  • Basic Git Operations: Enable the AI to perform basic Git operations such as init, add, commit, and push. This feature can be particularly useful for automating the setup of new projects and maintaining a clean version history from the start.

Graceful shutdown of docker containers

Describe the bug
The DockerInteractive class starts a Docker container for each instance that's created.

Ideally we would stop and remove these containers in the __del__ function, which gets called when the instance is destroyed, or when the python process ends.

Unfortunately, the docker SDK makes a blocking call which isn't allowed during python's shutdown, so you end up with a stack trace.

Steps to Reproduce
To see the lack of cleanup:

  1. PYTHONPATH=$(pwd) python ./opendevin/main.py -d ./workspace -t "write a bash script that prints hello world"
  2. wait for it to finish
  3. docker ps should show sandbox-default

To see the graceless shutdown:

  1. uncomment this line
  2. Follow steps above
  3. note the stack trace when the program finishes
  4. docker ps still shows sandbox-default

Expected behavior
All sandbox containers are removed on shutdown

Actual behavior
Sandbox containers are left running

Additional context

Ideas for how to fix this:

  • Under normal circumstances (i.e. program finishes naturally, not ctrl+c or crash) we can definitely do the cleanup, e.g. by putting some cleanup logic in controller.py
  • Is there a way to force the docker SDK to run kill or remove?
  • Can we use subprocess to send a docker kill command without blocking exit? (A sketch combining this with atexit-based cleanup follows below.)
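
One hedged sketch combining the first and third ideas, assuming the sandbox container is named sandbox-default as in the repro steps above (module and function names here are illustrative, not the actual OpenDevin code):

import atexit
import subprocess

SANDBOX_CONTAINER = "sandbox-default"  # assumed container name from the repro steps

def _cleanup_sandbox():
    # Shell out to the docker CLI instead of the SDK, since the SDK's blocking
    # call is what breaks during Python's shutdown.
    subprocess.run(
        ["docker", "rm", "-f", SANDBOX_CONTAINER],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # ignore errors if the container is already gone
    )

# atexit handlers run on normal interpreter exit (not on SIGKILL or a hard crash),
# which matches the "program finishes naturally" case described above.
atexit.register(_cleanup_sandbox)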

WebSocket API

It seems to me that the frontend is primarily displaying what OpenDevin is doing to the user for visibility. The actual agent is implemented on the backend.

We'll therefore want to stream a lot of information from the backend to the frontend via WebSockets and/or Server-Sent Events. Each module of OpenDevin should receive its own events.

Below is a draft of what the events for such a WebSocket API might look like.

Terminal

terminal writes to the terminal. terminal.write(...) is a function in xterm.js, so we can forward the terminal sequences directly from the backend to the frontend. the payload might look like

{
    "content": "\x1B[1;3;31OpenDevin\x1B[0m $"
}
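
a minimal sketch of how the backend could emit that event, assuming a FastAPI WebSocket endpoint and a hypothetical read_terminal_output() helper that streams the raw sequences produced by the sandbox shell (the "event" field is an assumption here; the draft above only specifies the payload body):

import json

from fastapi import FastAPI, WebSocket

app = FastAPI()

async def read_terminal_output():
    # Hypothetical helper: in practice this would stream output from the sandbox pty.
    yield "\x1B[1;3;31mOpenDevin\x1B[0m $ "

@app.websocket("/ws")
async def terminal_events(websocket: WebSocket):
    await websocket.accept()
    async for chunk in read_terminal_output():
        # Forward the escape sequences untouched; xterm.js interprets them client-side.
        await websocket.send_text(json.dumps({"event": "terminal", "content": chunk}))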

Planner

planner writes to the planner in Markdown format, which the frontend renders. we could reuse the same payload as the code endpoint below, since the planner state can be represented as a single .md file.

Code

code streams code, which the frontend renders syntax-highlighted in a code editor. the code may be stored in a string array, where each element is a line of code. the payload might look like

{
    "line": 109
    "change": "INSERT",
    "content": [
        "with open(\"tmp.txt\") as f:",
        "\tcontent = f.read()"
    ]
}
  • line: the line number at which the code change begins
  • change: the type of change being made ("INSERT" or "DELETE")
  • content: the lines of code to insert (a sketch of applying such a payload follows below)
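
a hedged sketch of applying such a payload to an array of lines (assuming 1-based line numbers, and that for DELETE the content field lists the lines being removed):

def apply_code_change(lines, payload):
    """Apply an INSERT or DELETE payload to a list of source lines."""
    start = payload["line"] - 1  # assuming 1-based line numbers in the payload
    if payload["change"] == "INSERT":
        return lines[:start] + payload["content"] + lines[start:]
    if payload["change"] == "DELETE":
        # assumption: for DELETE, content lists the lines being removed
        return lines[:start] + lines[start + len(payload["content"]):]
    raise ValueError(f"unknown change type: {payload['change']}")

# example: insert two lines starting at line 109 of the current buffer
# new_lines = apply_code_change(old_lines, {"line": 109, "change": "INSERT",
#     "content": ["with open(\"tmp.txt\") as f:", "\tcontent = f.read()"]})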

Browser

navigate navigates to a URL and sends a screenshot every second (or every page change). the frontend displays this URL and screenshot.

it's possible to render an <iframe />, but

  1. this seems unnecessary because the backend already needs to access pages via Selenium
  2. this can have security/reliability issues (such as CORS)

the payload might look like

{
    "url": "https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html",
    "screenshot": "data:image/png;base64, ..."
}
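
a rough sketch of producing that payload on the backend with Selenium, assuming a headless Chrome driver is already set up (the event shape follows the draft above):

import base64
import json

from selenium import webdriver

def navigate_event(driver: webdriver.Chrome, url: str) -> str:
    """Navigate to a URL and build the draft payload with an inline screenshot."""
    driver.get(url)
    png = driver.get_screenshot_as_png()  # raw PNG bytes of the current viewport
    return json.dumps({
        "url": driver.current_url,
        "screenshot": "data:image/png;base64," + base64.b64encode(png).decode("ascii"),
    })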

System Architecture

System Overview

The AI-powered software engineering assistant employs a multi-agent swarm model to provide a comprehensive development experience. At its core is a delegator agent that manages user interactions, project contexts, and delegates tasks to specialized agents.

Components

Web Application (Frontend)

Chat Interface: Primary user interaction point. Driven by a robust NLP engine for natural language communication.
Embedded IDE: Full-featured web IDE (Theia-based) for code development and project review.
Shell Emulator: Secure shell environment for development tasks and project setup.
Settings: Manages user preferences and access to LLM credentials.

Delegator Agent

Conversation Management: Interprets user intent, routes requests, and manages interruptions across multiple projects.
Project Contextualization: Tracks active projects, their stage, and associated data.
Task Delegation: Delegates tasks to the appropriate agents, manages dependencies, and tracks progress.
State Management: Maintains a robust system for storing and retrieving project states to handle context switching fluidly.

Specialized Agent Swarm

Requirements Engineering Agent: Excels in requirements elicitation, design suggestion, and generating architectural diagrams. May leverage specialized LLMs and knowledge bases.
Project Management Agent: Focuses on task breakdown, estimation, timelines, and potentially integrates with external PM tools.
Software Development Agent: Code-centric, responsible for code generation, stubbing, test cases, PRs, and leverages LLMs trained on code.
Release Engineering Agent: Handles environment setup, CI/CD pipelines, deployment strategies, and build configurations.
QA/QC Agent: Generates test plans, understands different testing paradigms, and may suggest tools and extensive test suites.

Backend Server

Coordination Logic: Houses the delegator agent and potentially the specialized swarm, enabling communication and orchestration.
Secure Credential Storage: Encrypted system for storing and retrieving user LLM API keys.
Shared Knowledge Base (Optional): If appropriate, a centralized store of data, learnings, and code examples to improve the collective intelligence of the agents.

External Services

GitHub: Integration for repository creation, code management, and issue tracking.
User-Selected LLM Providers: System connects to external LLMs (GPT-3, etc.) via a flexible API abstraction layer.
CI Server: Executes test suites, build processes, and may connect with deployment pipelines.

System Strengths

Specialization: Agents become highly focused, increasing potential for high-quality outputs in their domains.
User-Focused: The delegator creates a seamless chat-based interface, simplifying the complexity for the user.
Adaptability: LLM choices reside with the user. New LLMs or specialized agents can be integrated over time.
Resilience: The swarm model allows for potential scaling and lessens the impact of single agent failures.

We need a way to intelligently navigate large code bases.

Maybe have an LLM create an adjacency list of all the files that depend on each other, and also write short descriptions of each file, so that the LLM can intelligently navigate the codebase rather than relying on embeddings alone.
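
As a rough illustration of the adjacency-list idea for a Python codebase (the short-description step would be an LLM call and is only noted in a comment; nothing here reflects actual OpenDevin code):

import ast
from pathlib import Path

def build_import_graph(repo_root: str) -> dict[str, set[str]]:
    """Map each Python file to the set of top-level modules it imports."""
    graph: dict[str, set[str]] = {}
    for path in Path(repo_root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8", errors="ignore"))
        except SyntaxError:
            continue  # skip files that don't parse
        deps: set[str] = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                deps.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module.split(".")[0])
        graph[str(path.relative_to(repo_root))] = deps
    return graph

# Each entry could then be paired with a short LLM-written description of the file,
# giving the agent a map to traverse instead of (or alongside) embedding search.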

Create Aider Agent

Summary
There's an open source AI pair programming tool called aider that implements something interesting to you: a bunch of Python classes and functions to ask the LLM to output only the diff to apply, instead of writing the whole code. This both reduces the chances of errors and greatly reduces the number of tokens to write (importantly: completion tokens are way more expensive than prompt tokens).

Motivation
Reduce token cost and errors.

Technical Design
A report showcasing their stuff can be found here. Most of the code is here and the prompts are here.
As you can see, a lot of thought went into this, because the LLM otherwise has trouble with line numbers, etc.
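
For a flavor of the idea, here is a toy stand-in (not aider's actual parser) for applying one search/replace edit block that the LLM emits instead of rewriting the whole file:

def apply_edit_block(source: str, search: str, replace: str) -> str:
    """Apply one LLM-proposed edit: swap an exact existing snippet for new text.

    Emitting only (search, replace) pairs keeps completion tokens small compared
    with regenerating the whole file, which is the cost/reliability argument above.
    """
    if search not in source:
        # In a real agent this failure would be sent back to the LLM to retry.
        raise ValueError("search block not found in source file")
    return source.replace(search, replace, 1)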

Alternatives to Consider
None that I know of.

Additional context
For a personal project I inquired about using only the functions of aider; you can read the issue here.
Also, hearing about OpenDevin made me aware of devika too, so I'll be posting this exact same issue on their repo as well.

We'd like to support this project

We are a startup focused on developing innovative tools. Currently, we are in the process of creating an AI-powered search engine specifically designed for developers, accessible at https://devv.ai/.

We are interested in sponsoring this project and are open to including the token usage of models from OpenAI, Anthropic, Gemini, or others. Additionally, we can offer our search index infrastructure to significantly enhance the development of OpenDevin.

Please feel free to reach out to me at [email protected] to discuss further details.

How to contribute?

I have some experience with finetuning LLMs and synthetic data, and I'd like to know more about how to contribute; I already submitted the form but have not received the invitation to the Slack channel.
Just want to know, is there any plan for the (outer) community? For example, if I clone this repo, add some code, and make a PR, is that OK to try?

Local API or Gradio Client Support focus.

Gradio clients that run local language models, such as "OobaBooga", and allow API support should be a major consideration for the roadmap process. Creating usable model swapping with a cache functionality is feasible. I made an example chart months ago when I saw the potential of the MinP greedy sampling that Kalomaze worked on being helpful for memory-driven task recall, due to its token accuracy.

Please note that current projects like MemoryGPT allow API usage, but no widespread application allows for effective model swapping or multi-system offloading. It's also important to note that a side-server "chain" of cheaper machines, or a GGML-focused network solution, could allow for more garage labs.

Current roadblocks are memory management, non-useful hallucinations (effective hallucinations could generate better idea tokens in an agent focus), and ineffective inter-model conversation solutions that are actually open source for system-prompting-style implementation.

The most feasible multi-model solution is to allow most elements to be CPU-offloaded, while features like live-training a model (with another model doing RLHF) would be a "drop-in" use that requires a GPU with enough VRAM for training, unless a traditional RAM-based training solution is usable with a current model base such as Mistral.

To summarize, a focus on API solutions such as ChatGPT or Claude will stagnate research on local language model feasibility. Creating a feasible framework for agent structures and LoRA-based live tuning for memory-retention elements on a version-based task list will most likely be the best course.

"Architecting and Customizing Database Solutions for Enhancing Core Functionalities and Backend Performance Evaluation"

The critical role of properly architected data infrastructure and the selection of specific data technologies and models cannot be overstated in the development and performance optimization of Large Language Models (LLMs) and GPT-based projects. These foundational elements are pivotal for enhancing core functionalities, achieving unprecedented process acceleration (potentially up to 1000X), and managing vast contextual volumes. Such infrastructure underpins the strategic long-term objectives of GPT-based agents, enabling them to navigate and manipulate extensive data landscapes efficiently.

Research and developments in the integration of database systems with LLMs underscore this significance. For instance, "DB-GPT: Empowering Database Interactions with Private Large Language Models" discusses optimizing database interactions through adaptive contextual learning techniques, which significantly enhance LLM performance in contextual information management. This research highlights the necessity of a robust data architecture for efficient knowledge construction and retrieval.

Similarly, the exploration of LangChain's integration with GPT and database technologies reveals the transformative potential of facilitating natural language interactions with databases. By translating user requests into SQL queries, LangChain demonstrates the power of merging LLMs with database technologies, making database interactions more accessible and efficient for users without SQL expertise. This advancement is a testament to the flexibility and efficiency achievable through the strategic customization of data models and technologies.

These examples illustrate the indispensable need for selecting and customizing specific data technologies and models to support the primary functionalities and performance optimization in projects involving LLMs and GPT-based agents. The correct choice and customization of these technologies not only facilitate accelerated processes but also pave the way for expanded capabilities and more ambitious long-term goals for the project.

For further insights and details, you can refer to the full articles:

Create RepoPilot Agent

It seems that Devin is based on a multi-agent system; we at FSoft AI Center proposed RepoPilot a few months ago for repo-level code understanding. Adding some components like web browsing and test execution might be valuable on top of RepoPilot.

How can I run this project?

Summary
Sorry for the inconvenience, can I ask if you can tell me how to run this project? Thank you very much for your help.

Motivation

Technical Design

Alternatives to Consider

Additional context

Which open source licence?


I don't have an opinion but feel the project should have a licence before people contribute.

[Evaluation] Fix SWE-Bench Evaluation on Devin's Output

Following instructions here, you will set up prediction files from Devin, and run evaluation using OpenDevin's SWE-Bench fork.

This task aims to ensure the SWE-Bench evaluation (using OpenDevin's fork) can successfully run on all of Devin's prediction files. Instead of sending PRs to this repo, you should fix issues and send PRs to our SWE-Bench fork.

I have attached the log file with multiple issues running SWE-Bench on Devin's output -- Search for 'Traceback' to find exact error messages.

swe-bench-devin.log

A suggested way to get started: You may try to create one prediction JSON (see more about the prediction file format here) from each SWE-Bench repo (e.g., you will have data/predictions/sklearn.json, data/predictions/matplotlibs.json, etc). Then, you may try to run evaluations on them to debug repositories one-by-one until the issue is fixed.

Set up Python Linting/TypeChecking/CI

Currently we don't have any linting or typechecking in CI. It'd be good to add this. My suggestions are:

Assuming this, we should:

  • add linting
  • add typechecking
  • add CI to check the linting/typechecking
  • add git pre-commit hook to ensure that these are run on commit

Docker unreachable error not reported by server on websocket connect

Describe the bug
When starting the server with Docker stopped, the server throws a backtrace error indicating Docker is not reachable instead of a clear error message.

Steps to Reproduce

  1. Stop Docker.
  2. Execute the command: $ uvicorn opendevin.server.listen:app --reload --port 3000.
  3. Attempt to connect via WebSocket: websocat ws://127.0.0.1:3000/ws.

Expected behavior
The server should report a clear error message indicating Docker is down and a WebSocket connection cannot be established.

Actual behavior
The server starts and accepts the WebSocket connection, but upon attempting any operation that requires Docker, it crashes with a backtrace error pointing to Docker connectivity issues. The error log is as follows:

$ uvicorn opendevin.server.listen:app --reload --port 3000
INFO:     Will watch for changes in these directories: ['./OpenDevin-rbren/OpenDevin']
INFO:     Uvicorn running on http://127.0.0.1:3000 (Press CTRL+C to quit)
INFO:     Started reloader process [16775] using WatchFiles
INFO:     Started server process [16779]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     ('127.0.0.1', 49231) - "WebSocket /ws" [accepted]
INFO:     connection open
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/urllib3/connectionpool.py", line 496, in _make_request
    conn.request(
  File "/opt/homebrew/lib/python3.11/site-packages/urllib3/connection.py", line 400, in request
    self.endheaders()
  File "/opt/homebrew/Cellar/[email protected]/3.11.7_2/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1289, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/homebrew/Cellar/[email protected]/3.11.7_2/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1048, in _send_output
    self.send(msg)
  File "/opt/homebrew/Cellar/[email protected]/3.11.7_2/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 986, in send
    self.connect()
  File "/opt/homebrew/lib/python3.11/site-packages/docker/transport/unixconn.py", line 27, in connect
    sock.connect(self.unix_socket)
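
One hedged way to surface a clearer error, assuming the docker Python SDK is in use (as the traceback suggests): ping the daemon at startup and exit with a readable message instead of letting the first sandbox operation blow up. A minimal sketch:

import sys

import docker
from docker.errors import DockerException

def require_docker() -> docker.DockerClient:
    """Fail fast with a readable message if the Docker daemon is unreachable."""
    try:
        client = docker.from_env()
        client.ping()  # raises if the daemon socket cannot be reached
        return client
    except DockerException as exc:
        sys.exit(f"Docker does not appear to be running or reachable: {exc}")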

Create Documentation Site

Summary

The goal of this issue is to propose a first documentation concept to iterate on and start a discussion around this topic.

Motivation

As the project grows it would be helpful to have good documentation, especially for new contributors and users.
The documentation has to contain all necessary information while still being easy to use and maintain.

Technical Design

TL;DR
Create a separate repository "OpenDevin/OpenDevinDocs" for documentation, containing autogenerated code documentation and manually created parts (overview, architecture, examples).

Generation and content
The documentation should at least contain the following contents:

  • High-level project overview
  • Installation instructions, How-To Guide, Examples
  • Architectural diagrams
  • API documentation
  • Generated code documentation

This means the documentation will consist of two parts: an autogenerated code documentation part, which is generated by GitHub Actions on every commit/PR, and a part which has to be maintained manually, but less frequently.

Versioning / Repository
The documentation should be stored in a separate repository (e.g. OpenDevin/OpenDevinDocs).
This enables tracking documentation-related issues separately, keeping the main repository focused on development.
It would also be possible to implement the same branch concept as in the code repository by automatically creating a docs branch for each code branch.

Separation between Frontend and Backend
Pro separation: Frontend and backend use very different technology stacks. Separation allows using better-suited tools for each part.
Con separation: Frontend and backend are part of the same project, so the documentation should also consider both parts for a better understanding of the project as a whole.
Proposal: Store frontend and backend docs in the same repository with a root-level separation (similar to the code repository). Docs for frontend and backend can then be generated separately and perhaps tied together with an index site for navigation between both parts.

Tooling
For Backend: Use Sphinx due to its wide usage and extensive configurability.
For Frontend: TBD.

Documentation Format
HTML for best readability/design and compatibility for hosting on various platforms.

Hosting
Sphinx generates HTML files which can be hosted for example on GitHub Pages or readthedocs.

Alternatives to Consider

Additional Context

This page provides a good guideline on the content of a project documentation
https://coderefinery.github.io/documentation/wishlist/

Create AutoDev agent

What problem or use case are you trying to solve?
The primary challenge is overcoming the limitations of using a single AI model, which can sometimes get stuck in loops or produce lower-quality content as the interaction lengthens. Incorporating two AI models, a small language model (SLM) and a large language model (LLM), with AutoDev's framework aims to enhance the efficiency and quality of generated content by ensuring detailed and focused responses tailored to user needs.

Describe the UX of the solution you'd like
Users will interact seamlessly with both the SLM and LLM. The SLM acts as an intermediary, refining instructions and feedback for the LLM to ensure the generated content accurately meets user requirements. This setup will be embedded in the AutoDev framework, which automates software development tasks with AI agents, enhancing the user experience by providing a more efficient, autonomous, and secure development process.

Do you have thoughts on the technical implementation?
The solution will utilize AutoDev's ability to manage AI agents and execute code. The SLM will be integrated to pre-process user requests and post-process the LLM's outputs, ensuring clarity and relevance. The LLM, being the primary model, will generate the content based on refined inputs. This setup can be hosted locally or accessed via API, depending on the user's preference and resource availability.

Describe alternatives you've considered
An alternative considered was enhancing a single AI model's training to handle a wider range of tasks more effectively. However, this approach doesn't fully address the issue of maintaining focus and quality in extended interactions as efficiently as using two specialized models.

Additional context
AutoDev is a Microsoft-developed AI-powered software development framework that aims to redefine the development process by enabling AI agents to autonomously perform tasks like code editing, advanced Git operations, and comprehensive testing. Incorporating AutoDev with the dual AI model architecture could significantly improve the automation and quality of software development tasks, leveraging the strengths of each AI model and AutoDev's autonomous capabilities for a synergistic effect. This integration offers a promising avenue for enhancing the adaptability, efficiency, and user experience of AI-driven development projects.

Keypresses in Terminal throws the exception

Describe the bug

Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/uvicorn/protocols/websockets/websockets_impl.py", line 240, in run_asgi
    result = await self.app(self.scope, self.asgi_receive, self.asgi_send)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/middleware/errors.py", line 151, in __call__
    await self.app(scope, receive, send)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/routing.py", line 375, in handle
    await self.app(scope, receive, send)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/routing.py", line 98, in app
    await wrap_app_handling_exceptions(app, session)(scope, receive, send)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/routing.py", line 96, in app
    await func(session)
  File "/opt/homebrew/lib/python3.11/site-packages/fastapi/routing.py", line 348, in app
    await dependant.call(**values)
  File "/Users/rudrani.angira/devin/OpenDevin/server/server.py", line 42, in websocket_endpoint
    data = await websocket.receive_json()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/websockets.py", line 145, in receive_json
    return json.loads(text)
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None

Steps to Reproduce

  1. Start the application (server and frontend)
  2. change the tab from terminal to code-editor and back
  3. type something on terminal window

Expected behavior
No error is thrown as long as editing in the terminal is not allowed
Actual behavior
Throws the above exception
Additional context
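A hedged sketch of the kind of guard that would avoid the crash, assuming the endpoint keeps accepting raw text (names mirror the traceback, not necessarily the current code):

import json

from fastapi import WebSocket

async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    while True:
        text = await websocket.receive_text()
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            # Raw keypresses from the terminal tab are not JSON; skip them (or send
            # back an error event) instead of letting the exception kill the connection.
            continue
        # ... handle the parsed message as before ...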

Docker terminal state

Describe the bug
Currently, our DockerInteractive terminal loses some state. In particular, if the agent runs a cd command, the next command doesn't run inside that directory.

Steps to Reproduce

  1. uvicorn opendevin.server.listen:app --reload --port 3000
  2. websocat ws://127.0.0.1:3000/ws (in a second terminal)
  3. send these messages:
{"action": "run", "command": "ls"}
{"action": "run", "command": "mkdir foo && cd foo && touch file.txt"}
{"action": "run", "command": "ls"}

Expected behavior

  • output of second ls command shows file.txt

Actual behavior

  • output of second ls command shows directory foo

Additional context
I'm not sure what other state we might be losing with docker's exec command. The only other one I can think of is exported environment variables.

Suggested solution
Two ways we could go here:

  • Create a long-lived shell connection to the running docker container, e.g. via ssh
  • Figure out cwd at the end of each exec, and use that as workdir for the next exec (a sketch of this follows below)
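
A minimal sketch of the second option, assuming commands go through the docker SDK's exec_run (the exact integration point in DockerInteractive is an assumption):

def run_with_cwd(container, command: str, cwd: str) -> tuple[str, str]:
    """Run a command in the container, then recover the directory it finished in."""
    # Append a marker plus `pwd` so the shell reports where it ended up.
    wrapped = f"{command}; echo __CWD__$(pwd)"
    exit_code, output = container.exec_run(["/bin/bash", "-c", wrapped], workdir=cwd)
    text = output.decode("utf-8", errors="replace")
    body, sep, tail = text.rpartition("__CWD__")
    if not sep:
        return text, cwd  # marker missing; keep the old working directory
    return body, tail.strip() or cwd  # feed the new cwd in as workdir for the next exec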

Backend Mock

Currently, frontend development requires running the backend. It would be great if we could mock backend responses to some degree. This allows us to test frontend features in isolation.

Originally proposed by @xcodebuild in #128

Control Loop: long term planning and execution

The biggest, most complicated aspect of Devin is long-term planning and execution. I'd like to start a discussion about how this might work in OpenDevin.

There's some recent prior work from Microsoft with some impressive results. I'll summarize here, with some commentary.

Overall Flow

  • User specifies objective and associated settings
  • Conversation Manager kicks in
  • Sends convo to Agent Scheduler
  • Agents execute commands
  • Output is placed back into the conversation
  • Rinse and repeat

Configuration

  • A YAML file defines a set of actions/commands the bot can take (e.g. npm test)
    • comment: why not just leave it open-ended?
  • You can have different agents with different capabilities, e.g. a "dev agent" and a "reviewer agent", who work collaboratively
    • comment: this sounds like MetaGPT

Components

Conversation Manager

  • maintains message history and command outputs
  • decides when to interrupt the conversation
    • comment: for what? more info from the user?
  • decides when the conversation is over, i.e. task has been completed
    • agent can send a "stop" command, max tokens can be reached, problems w/ execution environment

Parser

  • interprets agent output and turns it into commands, file edits, etc
  • in case of parsing failure, a message is sent back to the agent to rewrite its command

Output Organizer

  • Takes command output and selectively places it into the conversation history
    • sometimes summarizes the content first
    • comment: why not just drop everything back into the conversation history (maybe truncating really long CLI output)

Agent Scheduler

  • orchestrates different agents
  • uses different algorithms for deciding who gets to go next (a minimal round-robin sketch follows this list)
    • round-robin: everyone takes turns in order
    • token-based: agent gets to keep going until it says it's done
    • priority-based: agents go based on (user defined?) priority
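
As a sketch of the simplest of those policies (round-robin), under the assumption that each agent exposes a step(conversation) method and can signal that it is done:

from itertools import cycle

def run_round_robin(agents, conversation, max_turns: int = 50):
    """Give each agent a turn in order until one signals stop or the budget runs out."""
    turn_order = cycle(agents)
    for _ in range(max_turns):
        agent = next(turn_order)
        result = agent.step(conversation)      # assumed interface
        conversation.append(result)
        if getattr(result, "is_stop", False):  # assumed "stop" signal
            break
    return conversation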

Tools Library

  • file editing (can edit entire file, or specify start line and end line)
  • retrieval (file contents, ls, grep). Seems to use vector search as well
  • build and execution: abstracts away the implementation in favor of simple commands like build foo
  • testing and validation: includes linters and bug-finding utils
  • git: can commit, push, merge
  • communication: can ask a human for input/feedback, can talk to other agents

Evaluation Environment

  • runs in Docker

Add Openrouter API option

Openrouter hosts many of the latest models, both paid and open source, like Claude 3 Opus (unregulated beta).

Moreover, you can fund use of the paid ones in one place just by adding funds to your Openrouter account.

Enable wiki and/or discussions, or create issue templates

There's a lot of chatter in the Issues. Opening up discussions or the wiki might help cut down on it.

Alternatively, issue templates (ideas, feedback, bugs) would at least help triage. I'm happy to take a stab at this.

(And thanks for getting this rolling @huybery! Very excited for the project, looking forward to contributing some code. If there's any way I can help with logistics just let me know.)

UI layout changes as you switch between tabs

Describe the bug

When you switch between tabs, the layout changes slightly, making the UI flicker.

Steps to Reproduce

  1. npm start
  2. Switch between planner and terminal tabs on Safari on macOS
  3. You will see the UI flicker (see screenshots below)

Additional context

(screenshots attached to the original issue)

Frontend: Implement browser tab

In the actual Devin demo there is a browser tab that allows the user to see which pages the assistant is currently looking at. But we do not have such a tab within our current prototype frontend.

We could add such a tab. Contributions are welcome!

Frontend/Backend: Connect chat interface to agent

We are now close to having a prototype frontend design, so a natural next step is to connect the frontend to an agent.

We have an issue (#20) and PR (#35) for this, and also a prototype API design (#44) that would allow all of them to communicate.

These are not yet merged into main, but if we assume that these or something similar will be merged, then a next step would be to make it so that when we press the "send" button on the frontend chat interface, it uses the websocket API to send a message to the agent, and the agent provides a response which is displayed in the frontend.

Add formatting and linting for typescript components

Right now we don't have any central standard for formatting typescript, and because of this various PRs are doing things like changing the formatting, which makes it difficult to focus on the actual content of the PR.

It'd be good to:

  • choose our formatting standard for typescript (maybe ESLint+Prettier)
  • add CI to check the linting
  • add git pre-commit hook to ensure that these are run on commit

Any comments or contributions are welcome!
