
OpenDevin: Code Less, Make More

Check out the documentation

Welcome to OpenDevin, a platform for autonomous software engineers, powered by AI and LLMs.

OpenDevin agents collaborate with human developers to write code, fix bugs, and ship features.

App screenshot

⚡ Getting Started

The easiest way to run OpenDevin is inside a Docker container. It works best with the most recent version of Docker (26.0.0 at the time of writing). You must be using Linux, macOS, or WSL on Windows.

To start OpenDevin in a Docker container, run the following commands in your terminal:

Warning

When you run the following command, files in ./workspace may be modified or deleted.

OPENDEVIN_WORKSPACE=$(pwd)/workspace
docker run -it \
    --pull=always \
    -e SANDBOX_USER_ID=$(id -u) \
    -e PERSIST_SANDBOX="true" \
    -e SSH_PASSWORD="make something up here" \
    -e WORKSPACE_MOUNT_PATH=$OPENDEVIN_WORKSPACE \
    -v $OPENDEVIN_WORKSPACE:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name opendevin-app-$(date +%Y%m%d%H%M%S) \
    ghcr.io/opendevin/opendevin:0.6

You'll find OpenDevin running at http://localhost:3000 with access to ./workspace. To have OpenDevin operate on your code, place it in ./workspace.

OpenDevin will only have access to this workspace folder. The rest of your system will not be affected, as OpenDevin runs in a secured Docker sandbox.

🚀 Documentation

To learn more about the project, and for tips on using OpenDevin, check out our documentation.

There you'll find resources on how to use different LLM providers (like ollama and Anthropic's Claude), troubleshooting resources, and advanced configuration options.

🤝 How to Contribute

OpenDevin is a community-driven project, and we welcome contributions from everyone. Whether you're a developer, a researcher, or simply enthusiastic about advancing the field of software engineering with AI, there are many ways to get involved:

  • Code Contributions: Help us develop new agents, core functionality, the frontend and other interfaces, or sandboxing solutions.
  • Research and Evaluation: Contribute to our understanding of LLMs in software engineering, participate in evaluating the models, or suggest improvements.
  • Feedback and Testing: Use the OpenDevin toolset, report bugs, suggest features, or provide feedback on usability.

For details, please check CONTRIBUTING.md.

🤖 Join Our Community

Whether you're a developer, a researcher, or simply enthusiastic about OpenDevin, we'd love to have you in our community. Let's make software engineering better together!

  • Slack workspace - Here we talk about research, architecture, and future development.
  • Discord server - This is a community-run server for general discussion, questions, and feedback.

📈 Progress

SWE-Bench Lite Score

Star History Chart

📜 License

Distributed under the MIT License. See LICENSE for more information.

📚 Cite

@misc{opendevin2024,
  author       = {{OpenDevin Team}},
  title        = {{OpenDevin: An Open Platform for AI Software Developers as Generalist Agents}},
  year         = {2024},
  version      = {v1.0},
  howpublished = {\url{https://github.com/OpenDevin/OpenDevin}},
  note         = {Accessed: ENTER THE DATE YOU ACCESSED THE PROJECT}
}


opendevin's Issues

Feature Outline and Requirements Engineering

Took a crack at what I think this thing should do (with ChatGPT of course).

Ideal Scope and Capabilities

1. Task Understanding

  • Natural Language Processing (NLP): The AI must excel in understanding software development tasks described in natural language, including vague or incomplete specifications. It should ask clarifying questions if the task description is not clear.
  • Contextual Interpretation: Ability to understand the context of a project or a codebase to make relevant suggestions or generate appropriate code. This includes understanding the specific libraries, frameworks, and coding standards in use.

2. Code Generation

  • Multi-Language Support: Generate code in multiple programming languages, understanding the idiomatic nuances of each.
  • Adaptive Coding Style: Adapt to the existing codebase's style, following naming conventions, commenting styles, and structural patterns.
  • Algorithm Design: Beyond translating tasks into code, the AI should be capable of designing algorithms to solve complex problems efficiently.

3. Debugging

  • Error Detection: Identify syntax errors, runtime errors, and logical errors in code.
  • Error Explanation: Provide clear explanations for identified errors, making it easier for human developers to understand and fix them.
  • Suggest Fixes: Offer one or more solutions to fix the identified errors, considering the most efficient and idiomatic approaches.

4. Code Optimization

  • Performance Optimization: Suggest or automatically refactor code to improve performance, such as reducing time complexity or optimizing resource usage.
  • Readability and Maintainability: Refactor code to improve readability and maintainability, following best practices and design patterns.
  • Security Enhancements: Identify and fix security vulnerabilities, ensuring the code adheres to security best practices.

5. Documentation

  • Automatic Documentation: Generate comprehensive and understandable documentation for code, including function/method descriptions, parameter explanations, and example usage.
  • Code Comments: Add meaningful comments within the code to explain complex logic or important decisions.
  • Update Documentation: Keep documentation synchronized with code changes, updating descriptions and examples as the code evolves.

6. Collaboration

  • Version Control: Understand and execute version control operations, such as commits, merges, and pull requests, with meaningful commit messages.
  • Code Reviews: Participate in code review processes by providing suggestions for improvements and identifying potential issues in others' code.
  • Team Communication: If integrated into team communication tools, the AI could summarize code changes, explain technical decisions, and facilitate knowledge sharing.

7. Learning and Adaptation

  • Feedback Incorporation: Use feedback from users to improve task understanding, code generation quality, and debugging capabilities.
  • Continuous Learning: Stay updated with the latest programming languages, frameworks, and best practices by continuously incorporating new information into its knowledge base.

Reasonable MVP

This is something I think is achievable. Pick a typical codebase (a Node.js backend API) which generally is mostly glue code that is easy to reason about. (Unlike a frontend with layout!)

MVP Scope for an AI Node.js Engineer

1. Basic Task Understanding and Code Generation

  • Focus on Common Node.js Tasks: Start with understanding and generating code for a set of common Node.js development tasks, such as setting up a server with Express, connecting to a MongoDB database, or handling REST API requests.
  • Template-Based Code Generation: Utilize a library of code templates and patterns for common tasks and scenarios in Node.js applications. This approach can speed up the MVP development by relying on proven solutions.

2. Simple Debugging and Error Handling

  • Static Code Analysis: Integrate basic static code analysis to identify syntax errors and common mistakes specific to JavaScript and Node.js. This feature helps in ensuring that the generated code is error-free at a basic level.
  • Error Explanation and Suggestions: Provide explanations for common errors and suggest fixes. At this stage, focusing on the most frequent Node.js errors (e.g., callback errors, promise handling, and async/await issues) can add significant value.

3. Code Optimization for Performance

  • Best Practices Guide: Instead of automatic optimization, the MVP could include suggestions for best practices in Node.js development. This can cover topics like efficient asynchronous programming, memory management, and avoiding common pitfalls.

4. Basic Documentation Generation

  • Function and API Documentation: Automatically generate comments and documentation for functions, classes, and API endpoints. This feature can significantly speed up the development process and ensure that the generated code is accessible to other developers.

5. Version Control Integration

  • Basic Git Operations: Enable the AI to perform basic Git operations such as init, add, commit, and push. This feature can be particularly useful for automating the setup of new projects and maintaining a clean version history from the start.

Graceful shutdown of docker containers

Describe the bug
The DockerInteractive class starts a Docker container for each instance that's created.

Ideally we would stop and remove these containers in the __del__ function, which gets called when the instance is destroyed, or when the python process ends.

Unfortunately, the docker SDK makes a blocking call which isn't allowed during python's shutdown, so you end up with a stack trace.

Steps to Reproduce
To see the lack of cleanup:

  1. PYTHONPATH=$(pwd) python ./opendevin/main.py -d ./workspace -t "write a bash script that prints hello world"
  2. wait for it to finish
  3. docker ps should show sandbox-default

To see the graceless shutdown:

  1. uncomment this line
  2. Follow steps above
  3. note the stack trace when the program finishes
  4. docker ps still shows sandbox-default

Expected behavior
All sandbox containers are removed on shutdown

Actual behavior
Sandbox containers are left running

Additional context

Ideas for how to fix this:

  • Under normal circumstances (i.e. program finishes naturally, not ctrl+c or crash) we can definitely do the cleanup, e.g. by putting some cleanup logic in controller.py
  • Is there a way to force the docker SDK to run kill or remove?
  • Can we use subprocess to send a docker kill command without blocking exit? (A sketch combining this with atexit-based cleanup follows below.)
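
One hedged sketch combining the first and third ideas, assuming the sandbox container is named sandbox-default as in the repro steps above (module and function names here are illustrative, not the actual OpenDevin code):

import atexit
import subprocess

SANDBOX_CONTAINER = "sandbox-default"  # assumed container name from the repro steps

def _cleanup_sandbox():
    # Shell out to the docker CLI instead of the SDK, since the SDK's blocking
    # call is what breaks during Python's shutdown.
    subprocess.run(
        ["docker", "rm", "-f", SANDBOX_CONTAINER],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # ignore errors if the container is already gone
    )

# atexit handlers run on normal interpreter exit (not on SIGKILL or a hard crash),
# which matches the "program finishes naturally" case described above.
atexit.register(_cleanup_sandbox)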

WebSocket API

It seems to me that the frontend is primarily displaying what OpenDevin is doing to the user for visibility. The actual agent is implemented on the backend.

We'll therefore want to stream a lot of information from the backend to the frontend via WebSockets and/or Server-Sent Events. Each module of OpenDevin should receive its own events.

Below is a draft of what the events for such a WebSocket API might look like.

Terminal

terminal writes to the terminal. terminal.write(...) is a function in xterm.js, so we can forward the terminal sequences directly from the backend to the frontend. the payload might look like

{
    "content": "\x1B[1;3;31OpenDevin\x1B[0m $"
}
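
a minimal sketch of how the backend could emit that event, assuming a FastAPI WebSocket endpoint and a hypothetical read_terminal_output() helper that streams the raw sequences produced by the sandbox shell (the "event" field is an assumption here; the draft above only specifies the payload body):

import json

from fastapi import FastAPI, WebSocket

app = FastAPI()

async def read_terminal_output():
    # Hypothetical helper: in practice this would stream output from the sandbox pty.
    yield "\x1B[1;3;31mOpenDevin\x1B[0m $ "

@app.websocket("/ws")
async def terminal_events(websocket: WebSocket):
    await websocket.accept()
    async for chunk in read_terminal_output():
        # Forward the escape sequences untouched; xterm.js interprets them client-side.
        await websocket.send_text(json.dumps({"event": "terminal", "content": chunk}))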

Planner

planner writes to the planner in Markdown format, which the frontend renders. we could reuse the same payload as the code endpoint below, since the planner state can be represented as a single .md file.

Code

code streams code, which the frontend renders syntax-highlighted in a code editor. the code may be stored in a string array, where each element is a line of code. the payload might look like

{
    "line": 109
    "change": "INSERT",
    "content": [
        "with open(\"tmp.txt\") as f:",
        "\tcontent = f.read()"
    ]
}
  • line: the line number at which the code change begins
  • change: the type of change being made ("INSERT" or "DELETE")
  • content: the lines of code to insert (a sketch of applying such a payload follows below)
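
a hedged sketch of applying such a payload to an array of lines (assuming 1-based line numbers, and that for DELETE the content field lists the lines being removed):

def apply_code_change(lines, payload):
    """Apply an INSERT or DELETE payload to a list of source lines."""
    start = payload["line"] - 1  # assuming 1-based line numbers in the payload
    if payload["change"] == "INSERT":
        return lines[:start] + payload["content"] + lines[start:]
    if payload["change"] == "DELETE":
        # assumption: for DELETE, content lists the lines being removed
        return lines[:start] + lines[start + len(payload["content"]):]
    raise ValueError(f"unknown change type: {payload['change']}")

# example: insert two lines starting at line 109 of the current buffer
# new_lines = apply_code_change(old_lines, {"line": 109, "change": "INSERT",
#     "content": ["with open(\"tmp.txt\") as f:", "\tcontent = f.read()"]})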

Browser

navigate navigates to a URL and sends a screenshot every second (or every page change). the frontend displays this URL and screenshot.

it's possible to render an <iframe />, but

  1. this seems unnecessary because the backend already needs to access pages via Selenium
  2. this can have security/reliability issues (such as CORS)

the payload might look like

{
    "url": "https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html",
    "screenshot": "data:image/png;base64, ..."
}
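
a rough sketch of producing that payload on the backend with Selenium, assuming a headless Chrome driver is already set up (the event shape follows the draft above):

import base64
import json

from selenium import webdriver

def navigate_event(driver: webdriver.Chrome, url: str) -> str:
    """Navigate to a URL and build the draft payload with an inline screenshot."""
    driver.get(url)
    png = driver.get_screenshot_as_png()  # raw PNG bytes of the current viewport
    return json.dumps({
        "url": driver.current_url,
        "screenshot": "data:image/png;base64," + base64.b64encode(png).decode("ascii"),
    })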

System Architecture

System Overview

The AI-powered software engineering assistant employs a multi-agent swarm model to provide a comprehensive development experience. At its core is a delegator agent that manages user interactions, project contexts, and delegates tasks to specialized agents.

Components

Web Application (Frontend)

Chat Interface: Primary user interaction point. Driven by a robust NLP engine for natural language communication.
Embedded IDE: Full-featured web IDE (Theia-based) for code development and project review.
Shell Emulator: Secure shell environment for development tasks and project setup.
Settings: Manages user preferences and access to LLM credentials.

Delegator Agent

Conversation Management: Interprets user intent, routes requests, and manages interruptions across multiple projects.
Project Contextualization: Tracks active projects, their stage, and associated data.
Task Delegation: Delegates tasks to the appropriate agents, manages dependencies, and tracks progress.
State Management: Maintains a robust system for storing and retrieving project states to handle context switching fluidly.

Specialized Agent Swarm

Requirements Engineering Agent: Excels in requirements elicitation, design suggestion, and generating architectural diagrams. May leverage specialized LLMs and knowledge bases.
Project Management Agent: Focuses on task breakdown, estimation, timelines, and potentially integrates with external PM tools.
Software Development Agent: Code-centric, responsible for code generation, stubbing, test cases, PRs, and leverages LLMs trained on code.
Release Engineering Agent: Handles environment setup, CI/CD pipelines, deployment strategies, and build configurations.
QA/QC Agent: Generates test plans, understands different testing paradigms, and may suggest tools and extensive test suites.

Backend Server

Coordination Logic: Houses the delegator agent and potentially the specialized swarm, enabling communication and orchestration.
Secure Credential Storage: Encrypted system for storing and retrieving user LLM API keys.
Shared Knowledge Base (Optional): If appropriate, a centralized store of data, learnings, and code examples to improve the collective intelligence of the agents.

External Services

GitHub: Integration for repository creation, code management, and issue tracking.
User-Selected LLM Providers: System connects to external LLMs (GPT-3, etc.) via a flexible API abstraction layer.
CI Server: Executes test suites, build processes, and may connect with deployment pipelines.

System Strengths

Specialization: Agents become highly focused, increasing potential for high-quality outputs in their domains.
User-Focused: The delegator creates a seamless chat-based interface, simplifying the complexity for the user.
Adaptability: LLM choices reside with the user. New LLMs or specialized agents can be integrated over time.
Resilience: The swarm model allows for potential scaling and lessens the impact of single agent failures.

We need a way to intelligently navigate large code bases.

Maybe have an LLM create an adjacency list of all the files that depend on each other, and also write short descriptions of each file, so that the LLM can intelligently navigate the codebase rather than relying on embeddings alone.
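
As a rough illustration of the adjacency-list idea for a Python codebase (the short-description step would be an LLM call and is only noted in a comment; nothing here reflects actual OpenDevin code):

import ast
from pathlib import Path

def build_import_graph(repo_root: str) -> dict[str, set[str]]:
    """Map each Python file to the set of top-level modules it imports."""
    graph: dict[str, set[str]] = {}
    for path in Path(repo_root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8", errors="ignore"))
        except SyntaxError:
            continue  # skip files that don't parse
        deps: set[str] = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                deps.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module.split(".")[0])
        graph[str(path.relative_to(repo_root))] = deps
    return graph

# Each entry could then be paired with a short LLM-written description of the file,
# giving the agent a map to traverse instead of (or alongside) embedding search.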

Create Aider Agent

Summary
There's an open source AI pair programming tool called aider that implements something interesting to you: a bunch of Python classes and functions to ask the LLM to output only the diff to apply, instead of writing the whole code. This both reduces the chances of errors and greatly reduces the number of tokens to write (importantly: completion tokens are way more expensive than prompt tokens).

Motivation
Reduce token cost and errors.

Technical Design
A report showcasing their stuff can be found here. Most of the code is here and the prompts are here.
As you can see, a lot of thought went into this, because the LLM otherwise has trouble with line numbers, etc.
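
For a flavor of the idea, here is a toy stand-in (not aider's actual parser) for applying one search/replace edit block that the LLM emits instead of rewriting the whole file:

def apply_edit_block(source: str, search: str, replace: str) -> str:
    """Apply one LLM-proposed edit: swap an exact existing snippet for new text.

    Emitting only (search, replace) pairs keeps completion tokens small compared
    with regenerating the whole file, which is the cost/reliability argument above.
    """
    if search not in source:
        # In a real agent this failure would be sent back to the LLM to retry.
        raise ValueError("search block not found in source file")
    return source.replace(search, replace, 1)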

Alternatives to Consider
None that I know of.

Additional context
For a personal project I inquired about using only the functions of aider; you can read the issue here.
Also, hearing about OpenDevin made me aware of devika too, so I'll be posting this exact same issue on their repo as well.

We'd like to support this project

We are a startup focused on developing innovative tools. Currently, we are in the process of creating an AI-powered search engine specifically designed for developers, accessible at https://devv.ai/.

We are interested in sponsoring this project and are open to including the token usage of models from OpenAI, Anthropic, Gemini, or others. Additionally, we can offer our search index infrastructure to significantly enhance the development of OpenDevin.

Please feel free to reach out to me at [email protected] to discuss further details.

How to contribute?

I have some experience with finetuning LLMs and synthetic data, and I'd like to know more about how to contribute; I already submitted the form but have not received the invitation to the Slack channel.
Just want to know, is there any plan for the (outer) community? For example, if I clone this repo, add some code, and make a PR, is that OK to try?

Local API or Gradio Client Support focus.

Gradio clients that run local language models, such as "OobaBooga", and allow API support should be a major consideration for the roadmap process. Creating usable model swapping with a cache functionality is feasible. I made an example chart months ago when I saw the potential of the MinP greedy sampling that Kalomaze worked on being helpful for memory-driven task recall, due to its token accuracy.

Please note that current projects like MemoryGPT allow API usage, but no widespread application allows for effective model swapping or multi-system offloading. It's also important to note that a side-server "chain" of cheaper machines, or a GGML-focused network solution, could allow for more garage labs.

Current roadblocks are memory management, non-useful hallucinations (effective hallucinations could generate better idea tokens in an agent focus), and ineffective inter-model conversation solutions that are actually open source for system-prompting-style implementation.

The most feasible multi-model solution is to allow most elements to be CPU-offloaded, while features like live-training a model (with another model doing RLHF) would be a "drop-in" use that requires a GPU with enough VRAM for training, unless a traditional RAM-based training solution is usable with a current model base such as Mistral.

To summarize, a focus on API solutions such as ChatGPT or Claude will stagnate research on local language model feasibility. Creating a feasible framework for agent structures and LoRA-based live tuning for memory-retention elements on a version-based task list will most likely be the best course.

"Architecting and Customizing Database Solutions for Enhancing Core Functionalities and Backend Performance Evaluation"

The critical role of properly architected data infrastructure and the selection of specific data technologies and models cannot be overstated in the development and performance optimization of Large Language Models (LLMs) and GPT-based projects. These foundational elements are pivotal for enhancing core functionalities, achieving unprecedented process acceleration (potentially up to 1000X), and managing vast contextual volumes. Such infrastructure underpins the strategic long-term objectives of GPT-based agents, enabling them to navigate and manipulate extensive data landscapes efficiently.

Research and developments in the integration of database systems with LLMs underscore this significance. For instance, "DB-GPT: Empowering Database Interactions with Private Large Language Models" discusses optimizing database interactions through adaptive contextual learning techniques, which significantly enhance LLM performance in contextual information management. This research highlights the necessity of a robust data architecture for efficient knowledge construction and retrieval.

Similarly, the exploration of LangChain's integration with GPT and database technologies reveals the transformative potential of facilitating natural language interactions with databases. By translating user requests into SQL queries, LangChain demonstrates the power of merging LLMs with database technologies, making database interactions more accessible and efficient for users without SQL expertise. This advancement is a testament to the flexibility and efficiency achievable through the strategic customization of data models and technologies.

These examples illustrate the indispensable need for selecting and customizing specific data technologies and models to support the primary functionalities and performance optimization in projects involving LLMs and GPT-based agents. The correct choice and customization of these technologies not only facilitate accelerated processes but also pave the way for expanded capabilities and more ambitious long-term goals for the project.

For further insights and details, you can refer to the full articles:

Create RepoPilot Agent

It seems that Devin is based on a multi-agent system; we at FSoft AI Center proposed RepoPilot a few months ago for repo-level code understanding. Adding some components like web browsing and test execution might be valuable on top of RepoPilot.

How can I run this project?

Summary
Sorry for the inconvenience, can I ask if you can tell me how to run this project? Thank you very much for your help.

Motivation

Technical Design

Alternatives to Consider

Additional context

Which open source licence?


I don't have an opinion but feel the project should have a licence before people contribute.

[Evaluation] Fix SWE-Bench Evaluation on Devin's Output

Following instructions here, you will set up prediction files from Devin, and run evaluation using OpenDevin's SWE-Bench fork.

This task aims to ensure the SWE-Bench evaluation (using OpenDevin's fork) can successfully run on all of Devin's prediction files. Instead of sending PRs to this repo, you should fix issues and send PRs to our SWE-Bench fork.

I have attached the log file with multiple issues running SWE-Bench on Devin's output -- Search for 'Traceback' to find exact error messages.

swe-bench-devin.log

A suggested way to get started: You may try to create one prediction JSON (see more about the prediction file format here) from each SWE-Bench repo (e.g., you will have data/predictions/sklearn.json, data/predictions/matplotlibs.json, etc). Then, you may try to run evaluations on them to debug repositories one-by-one until the issue is fixed.

Set up Python Linting/TypeChecking/CI

Currently we don't have any linting or typechecking in CI. It'd be good to add this. My suggestions are:

Assuming this, we should:

  • add linting
  • add typechecking
  • add CI to check the linting/typechecking
  • add git pre-commit hook to ensure that these are run on commit

Docker unreachable error not reported by server on websocket connect

Describe the bug
When starting the server with Docker stopped, the server throws a backtrace error indicating Docker is not reachable instead of a clear error message.

Steps to Reproduce

  1. Stop Docker.
  2. Execute the command: $ uvicorn opendevin.server.listen:app --reload --port 3000.
  3. Attempt to connect via WebSocket: websocat ws://127.0.0.1:3000/ws.

Expected behavior
The server should report a clear error message indicating Docker is down and a WebSocket connection cannot be established.

Actual behavior
The server starts and accepts the WebSocket connection, but upon attempting any operation that requires Docker, it crashes with a backtrace error pointing to Docker connectivity issues. The error log is as follows:

$ uvicorn opendevin.server.listen:app --reload --port 3000
INFO:     Will watch for changes in these directories: ['./OpenDevin-rbren/OpenDevin']
INFO:     Uvicorn running on http://127.0.0.1:3000 (Press CTRL+C to quit)
INFO:     Started reloader process [16775] using WatchFiles
INFO:     Started server process [16779]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     ('127.0.0.1', 49231) - "WebSocket /ws" [accepted]
INFO:     connection open
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/urllib3/connectionpool.py", line 496, in _make_request
    conn.request(
  File "/opt/homebrew/lib/python3.11/site-packages/urllib3/connection.py", line 400, in request
    self.endheaders()
  File "/opt/homebrew/Cellar/[email protected]/3.11.7_2/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1289, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/homebrew/Cellar/[email protected]/3.11.7_2/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1048, in _send_output
    self.send(msg)
  File "/opt/homebrew/Cellar/[email protected]/3.11.7_2/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 986, in send
    self.connect()
  File "/opt/homebrew/lib/python3.11/site-packages/docker/transport/unixconn.py", line 27, in connect
    sock.connect(self.unix_socket)
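
One hedged way to surface a clearer error, assuming the docker Python SDK is in use (as the traceback suggests): ping the daemon at startup and exit with a readable message instead of letting the first sandbox operation blow up. A minimal sketch:

import sys

import docker
from docker.errors import DockerException

def require_docker() -> docker.DockerClient:
    """Fail fast with a readable message if the Docker daemon is unreachable."""
    try:
        client = docker.from_env()
        client.ping()  # raises if the daemon socket cannot be reached
        return client
    except DockerException as exc:
        sys.exit(f"Docker does not appear to be running or reachable: {exc}")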

Create Documentation Site

Summary

The goal of this issue is to propose a first documentation concept to iterate on and start a discussion around this topic.

Motivation

As the project grows it would be helpful to have good documentation, especially for new contributors and users.
The documentation has to contain all necessary information while still being easy to use and maintain.

Technical Design

TL;DR
Create a separate repository "OpenDevin/OpenDevinDocs" for documentation, containing autogenerated code documentation and manually created parts (overview, architecture, examples).

Generation and content
The documentation should at least contain the following contents:

  • High-level project overview
  • Installation instructions, How-To Guide, Examples
  • Architectural diagrams
  • API documentation
  • Generated code documentation

This means the documentation will consist of two parts: an autogenerated code documentation part, which is generated by GitHub Actions on every commit/PR, and a part which has to be maintained manually, but less frequently.

Versioning / Repository
The documentation should be stored in a separate repository (e.g. OpenDevin/OpenDevinDocs).
This enables tracking documentation-related issues separately, keeping the main repository focused on development.
It would also be possible to implement the same branch concept as in the code repository by automatically creating a docs branch for each code branch.

Separation between Frontend and Backend
Pro separation: Frontend and backend use very different technology stacks. Separation allows using better-suited tools for each part.
Con separation: Frontend and backend are part of the same project, so the documentation should also consider both parts for a better understanding of the project as a whole.
Proposal: Store frontend and backend docs in the same repository with a root-level separation (similar to the code repository). Docs for frontend and backend can then be generated separately and perhaps tied together with an index site for navigation between both parts.

Tooling
For Backend: Use Sphinx due to its wide usage and extensive configurability.
For Frontend: TBD.

Documentation Format
HTML for best readability/design and compatibility for hosting on various platforms.

Hosting
Sphinx generates HTML files which can be hosted for example on GitHub Pages or readthedocs.

Alternatives to Consider

Additional Context

This page provides a good guideline on the content of a project documentation
https://coderefinery.github.io/documentation/wishlist/

Create AutoDev agent

What problem or use case are you trying to solve?
The primary challenge is overcoming the limitations of using a single AI model, which can sometimes get stuck in loops or produce lower-quality content as the interaction lengthens. Incorporating two AI models, a small language model (SLM) and a large language model (LLM), with AutoDev's framework aims to enhance the efficiency and quality of generated content by ensuring detailed and focused responses tailored to user needs.

Describe the UX of the solution you'd like
Users will interact seamlessly with both the SLM and LLM. The SLM acts as an intermediary, refining instructions and feedback for the LLM to ensure the generated content accurately meets user requirements. This setup will be embedded in the AutoDev framework, which automates software development tasks with AI agents, enhancing the user experience by providing a more efficient, autonomous, and secure development process.

Do you have thoughts on the technical implementation?
The solution will utilize AutoDev's ability to manage AI agents and execute code. The SLM will be integrated to pre-process user requests and post-process the LLM's outputs, ensuring clarity and relevance. The LLM, being the primary model, will generate the content based on refined inputs. This setup can be hosted locally or accessed via API, depending on the user's preference and resource availability.

Describe alternatives you've considered
An alternative considered was enhancing a single AI model's training to handle a wider range of tasks more effectively. However, this approach doesn't fully address the issue of maintaining focus and quality in extended interactions as efficiently as using two specialized models.

Additional context
AutoDev is a Microsoft-developed AI-powered software development framework that aims to redefine the development process by enabling AI agents to autonomously perform tasks like code editing, advanced Git operations, and comprehensive testing. Incorporating AutoDev with the dual AI model architecture could significantly improve the automation and quality of software development tasks, leveraging the strengths of each AI model and AutoDev's autonomous capabilities for a synergistic effect. This integration offers a promising avenue for enhancing the adaptability, efficiency, and user experience of AI-driven development projects.

Keypresses in Terminal throws the exception

Describe the bug

Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.11/site-packages/uvicorn/protocols/websockets/websockets_impl.py", line 240, in run_asgi
    result = await self.app(self.scope, self.asgi_receive, self.asgi_send)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/middleware/errors.py", line 151, in __call__
    await self.app(scope, receive, send)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/routing.py", line 375, in handle
    await self.app(scope, receive, send)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/routing.py", line 98, in app
    await wrap_app_handling_exceptions(app, session)(scope, receive, send)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/routing.py", line 96, in app
    await func(session)
  File "/opt/homebrew/lib/python3.11/site-packages/fastapi/routing.py", line 348, in app
    await dependant.call(**values)
  File "/Users/rudrani.angira/devin/OpenDevin/server/server.py", line 42, in websocket_endpoint
    data = await websocket.receive_json()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/starlette/websockets.py", line 145, in receive_json
    return json.loads(text)
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/[email protected]/3.11.4_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None

Steps to Reproduce

  1. Start the application (server and frontend)
  2. change the tab from terminal to code-editor and back
  3. type something on terminal window

Expected behavior
No error is thrown as long as editing in the terminal is not allowed
Actual behavior
Throws the above exception
Additional context
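A hedged sketch of the kind of guard that would avoid the crash, assuming the endpoint keeps accepting raw text (names mirror the traceback, not necessarily the current code):

import json

from fastapi import WebSocket

async def websocket_endpoint(websocket: WebSocket):
    await websocket.accept()
    while True:
        text = await websocket.receive_text()
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            # Raw keypresses from the terminal tab are not JSON; skip them (or send
            # back an error event) instead of letting the exception kill the connection.
            continue
        # ... handle the parsed message as before ...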

Docker terminal state

Describe the bug
Currently, our DockerInteractive terminal loses some state. In particular, if the agent runs a cd command, the next command doesn't run inside that directory.

Steps to Reproduce

  1. uvicorn opendevin.server.listen:app --reload --port 3000
  2. websocat ws://127.0.0.1:3000/ws (in a second terminal)
  3. send these messages:
{"action": "run", "command": "ls"}
{"action": "run", "command": "mkdir foo && cd foo && touch file.txt"}
{"action": "run", "command": "ls"}

Expected behavior

  • output of second ls command shows file.txt

Actual behavior

  • output of second ls command shows directory foo

Additional context
I'm not sure what other state we might be losing with docker's exec command. The only other one I can think of is exported environment variables.

Suggested solution
Two ways we could go here:

  • Create a long-lived shell connection to the running docker container, e.g. via ssh
  • Figure out cwd at the end of each exec, and use that as workdir for the next exec (a sketch of this follows below)
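
A minimal sketch of the second option, assuming commands go through the docker SDK's exec_run (the exact integration point in DockerInteractive is an assumption):

def run_with_cwd(container, command: str, cwd: str) -> tuple[str, str]:
    """Run a command in the container, then recover the directory it finished in."""
    # Append a marker plus `pwd` so the shell reports where it ended up.
    wrapped = f"{command}; echo __CWD__$(pwd)"
    exit_code, output = container.exec_run(["/bin/bash", "-c", wrapped], workdir=cwd)
    text = output.decode("utf-8", errors="replace")
    body, sep, tail = text.rpartition("__CWD__")
    if not sep:
        return text, cwd  # marker missing; keep the old working directory
    return body, tail.strip() or cwd  # feed the new cwd in as workdir for the next exec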

Backend Mock

Currently, frontend development requires running the backend. It would be great if we could mock backend responses to some degree. This allows us to test frontend features in isolation.

Originally proposed by @xcodebuild in #128

Control Loop: long term planning and execution

The biggest, most complicated aspect of Devin is long-term planning and execution. I'd like to start a discussion about how this might work in OpenDevin.

There's some recent prior work from Microsoft with some impressive results. I'll summarize here, with some commentary.

Overall Flow

  • User specifies objective and associated settings
  • Conversation Manager kicks in
  • Sends convo to Agent Scheduler
  • Agents execute commands
  • Output is placed back into the conversation
  • Rinse and repeat

Configuration

  • A YAML file defines a set of actions/commands the bot can take (e.g. npm test)
    • comment: why not just leave it open-ended?
  • You can have different agents with different capabilities, e.g. a "dev agent" and a "reviewer agent", who work collaboratively
    • comment: this sounds like MetaGPT

Components

Conversation Manager

  • maintains message history and command outputs
  • decides when to interrupt the conversation
    • comment: for what? more info from the user?
  • decides when the conversation is over, i.e. task has been completed
    • agent can send a "stop" command, max tokens can be reached, problems w/ execution environment

Parser

  • interprets agent output and turns it into commands, file edits, etc
  • in case of parsing failure, a message is sent back to the agent to rewrite its command

Output Organizer

  • Takes command output and selectively places it into the conversation history
    • sometimes summarizes the content first
    • comment: why not just drop everything back into the conversation history (maybe truncating really long CLI output)

Agent Scheduler

  • orchestrates different agents
  • uses different algorithms for deciding who gets to go next (a minimal round-robin sketch follows this list)
    • round-robin: everyone takes turns in order
    • token-based: agent gets to keep going until it says it's done
    • priority-based: agents go based on (user defined?) priority
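
As a sketch of the simplest of those policies (round-robin), under the assumption that each agent exposes a step(conversation) method and can signal that it is done:

from itertools import cycle

def run_round_robin(agents, conversation, max_turns: int = 50):
    """Give each agent a turn in order until one signals stop or the budget runs out."""
    turn_order = cycle(agents)
    for _ in range(max_turns):
        agent = next(turn_order)
        result = agent.step(conversation)      # assumed interface
        conversation.append(result)
        if getattr(result, "is_stop", False):  # assumed "stop" signal
            break
    return conversation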

Tools Library

  • file editing (can edit entire file, or specify start line and end line)
  • retrieval (file contents, ls, grep). Seems to use vector search as well
  • build and execution: abstracts away the implementation in favor of simple commands like build foo
  • testing and validation: includes linters and bug-finding utils
  • git: can commit, push, merge
  • communication: can ask a human for input/feedback, can talk to other agents

Evaluation Environment

  • runs in Docker

Add Openrouter API option

Openrouter hosts many of the latest models, both paid and open source, like Claude 3 Opus (unregulated beta).

Moreover, you can fund use of the paid ones in one place just by adding funds to your Openrouter account.

Enable wiki and/or discussions, or create issue templates

There's a lot of chatter in the Issues. Opening up discussions or the wiki might help cut down on it.

Alternatively, issue templates (ideas, feedback, bugs) would at least help triage. I'm happy to take a stab at this.

(And thanks for getting this rolling @huybery! Very excited for the project, looking forward to contributing some code. If there's any way I can help with logistics just let me know.)

UI layout changes as you switch between tabs

Describe the bug

When you switch between tabs, the layout changes slightly, making the UI flicker.

Steps to Reproduce

  1. npm start
  2. Switch between planner and terminal tabs on Safari on macOS
  3. You will see the UI flicker (see screenshots below)

Additional context

(screenshots attached to the original issue)

Frontend: Implement browser tab

In the actual Devin demo there is a browser tab that allows the user to see which pages the assistant is currently looking at. But we do not have such a tab within our current prototype frontend.

We could add such a tab. Contributions are welcome!

Frontend/Backend: Connect chat interface to agent

We are now close to having a prototype frontend design, so a natural next step is to connect the frontend to an agent.

We have an issue (#20) and PR (#35) for this, and also a prototype API design (#44) that would allow all of them to communicate.

These are not yet merged into main, but if we assume that these or something similar will be merged, then a next step would be to make it so that when we press the "send" button on the frontend chat interface, it uses the websocket API to send a message to the agent, and the agent provides a response which is displayed in the frontend.

Add formatting and linting for typescript components

Right now we don't have any central standard for formatting typescript, and because of this various PRs are doing things like changing the formatting, which makes it difficult to focus on the actual content of the PR.

It'd be good to:

  • choose our formatting standard for typescript (maybe ESLint+Prettier)
  • add CI to check the linting
  • add git pre-commit hook to ensure that these are run on commit

Any comments or contributions are welcome!
