
opendan-personal-ai-os's Introduction

OpenDAN: Your Personal AIOS


OpenDAN is an open-source Personal AI OS which consolidates various AI modules in one place for your personal use.

Project Introduction

OpenDAN (Open and Do Anything Now with AI) is revolutionizing the AI landscape with its Personal AI Operating System. Designed for seamless integration of diverse AI modules, it ensures unmatched interoperability. OpenDAN empowers users to craft powerful AI agents, from butlers and assistants to personal tutors and digital companions, all while retaining control. These agents can team up to tackle complex challenges, integrate with existing services, and command smart (IoT) devices.

With OpenDAN, we're putting AI in your hands, making life simpler and smarter.

This project is still in its very early stages, and there may be significant changes in the future.

Updates

After over three months of development, the code for the first version of OpenDAN MVP (0.5.1), driven by the new contributor waterflier, has been merged into the master branch. This version realizes many concepts proposed in the PoC version of OpenDAN and completes the basic framework of the OS, especially defining the application form on AIOS. Currently, the 0.5.1 version operates in an "all-in-one" mode. For 0.5.2, we will advance the formal implementation of the OpenDAN OS kernel based on the partial framework code of the CYFS Owner Online Device (OOD) OS that has already been completed.

MVP

The main new features of OpenDAN 0.5.1 (MVP):

  • Rapid installation and deployment of OpenDAN based on Docker, making OpenDAN compatible with a wide range of hardware environments (PC/Mac/Raspberry Pi/NAS).
  • The AI Agent's large language model can be switched; running the open-source LLaMA model locally is now supported.
  • Introduction of more built-in AI Agents:
    • Personal Assistant Jarvis: a consultant and assistant who manages your schedule and communication records; a ChatGPT alternative.
    • Information Assistant Mia: manages your personal data and organizes it into a knowledge base.
    • Private English Teacher Tracy: your personal English tutor.
    • ai_bash (for developers): no need to memorize complicated command-line parameters! Use bash by saying "Find FILES in ~/Documents that Contain OpenDAN".
  • Connectivity to AI Agent/Workflow via Telegram/Email.
  • Building a local private Knowledge Base based on existing file or email spiders, enabling AI Agent access to personal data.
    • Supports text files and common image formats.
    • Supports other common formats.
  • Implemented Workflow: Collaboration of Agents to solve more complex issues.
    • Built-in Workflow story_maker, which integrates AIGC tools to create audio fairy-tale books.
  • Distributed AI computing core available for complex tasks.
  • Manual download and installation of new Agent/Workflow.
  • OpenDAN Store : Agent/Workflow/Models One-Stop installation (Delayed to 0.5.2).

Try it NOW!

Developers, click here to learn about OpenDAN's system development updates.

Intro video - What is OpenDAN?

Click the image below for a demo:

Intro Video

Subscribe to updates here

https://twitter.com/openDAN_AI

Installation

There are two ways to install the Internal Test Version of OpenDAN:

  1. Installation through Docker; this is the method we currently recommend
  2. Installation from source code; this method may run into the usual Python dependency problems and requires some ability to resolve them, but it is necessary if you want to do secondary development on OpenDAN

Preparation before installation

  1. Docker environment. This article does not cover how to install Docker; to check your installation, execute in your console:
docker --version

If you can see the Docker version number (> 20.0), Docker is installed. If you don't know how to install Docker, you can refer to here

  2. OpenAI API Token. If you don't have an API token yet, you can apply here

Applying for an API token may be a hurdle for newcomers. You can ask a friend for a temporary one, or join our internal test experience group; we also release free trial API tokens from time to time. These tokens are limited in maximum spend and validity period.

Install

Execute the following command to pull the OpenDAN Docker image:

docker pull paios/aios:latest

Run OpenDAN

The first run of OpenDAN needs to be initialized, and you will be asked to enter some information during the process. Therefore, remember to pass the -it flags when starting the container.

OpenDAN is your personal AIOS, so it generates important personal data (such as chat history with agents, schedule data, etc.) during operation. This data is stored on your local disk. We therefore recommend mounting a local directory into the Docker container so the data is preserved.

docker run -v /your/local/myai/:/root/myai --name aios -it paios/aios:latest 

In the above command, we also named the container aios, which is convenient for subsequent operations. You can use your favorite name instead.

After the container has been created by the first run, subsequent runs only need:

docker start -ai aios

If you plan to run it in service mode (no UI), you don't need the -ai flag:

docker start aios

Hello, Jarvis

After the configuration is completed, you will enter the AIOS shell, which is similar to Linux bash. The interface means: the current user "username" is communicating with the Agent/Workflow named "Jarvis", and the current topic is "default".

Say hello to your private AI assistant Jarvis!

If everything is OK, you will get a reply from Jarvis after a moment. At this point, the OpenDAN system is up and running.

Core Concepts and Features of OpenDAN

  1. AI Agent: Driven by a large language model and equipped with its own memory, an AI Agent completes tasks through natural language interaction.
  2. AI Workflow: Organize different AI Agents into an AI Agent Group to complete complex tasks.
  3. AI Environment: Supports AI Agents to access file systems, IoT devices, network services, smart contracts, and everything on today's internet once authorized.
  4. AI Marketplace: Offer a solution for one-click installation and use of various AI applications, helping users easily access and manage AI apps.
  5. AI Model Solution: Provide a unified entry point for model search, download, and access control, making it convenient for users to find and use models suitable for their needs.
  6. Hardware-specific optimization: Optimize for specific hardware to enable smooth local running of most open-source AI applications.
  7. Strict Privacy Protection and Management: Strictly manage personal data, ranging from family albums to chat records and social media records, and provide a unified access control interface for AI applications.
  8. Personal Knowledge Base: Build a local, private knowledge base from your personal data (files, emails, etc.) that authorized Agents can access.
  9. Integrated AIGC Workflow: Offer AIGC Agents/Workflows for users to train their own voice models, LoRA models, knowledge models, etc., using personal data. Based on these private models, integrate the most advanced AIGC algorithms to help people unleash their creativity and build cooler, more personalized content.
  10. Development Framework: Provide a development framework for customizing AI assistants for specific purposes, making it easy for developers to create unique AI applications / service for their customers.

Deeply Understanding OpenDAN

Build OpenDAN from source code

  1. Install the latest version of Python (>= 3.11) and pip
  2. Clone the source code
    git clone https://github.com/fiatrete/OpenDAN-Personal-AI-OS.git
    cd OpenDAN-Personal-AI-OS
    
  3. Enable virtual env
    virtualenv venv
    source ./venv/bin/activate
    
  4. Install the dependent python library
    pip install -r ./src/requirements.txt
    
    Wait for the installation to complete.
  5. Start OpenDAN through aios_shell
    python ./src/service/aios_shell/aios_shell.py
    
    If you see an error saying "No ffmpeg exe could be found", install ffmpeg manually from https://www.ffmpeg.org/

Now OpenDAN runs in development mode, and the directories are:

  • AIOS_ROOT: ./rootfs (/opt/aios in docker)
  • AIOS_MYAI: ~/myai (/root/myai in docker)

OpenDAN Cookbook

Chapter 1: Hello, Jarvis!

  • 1.1 Installation of OpenDAN
  • 1.2 Initial Configuration of OpenDAN
  • 1.3 Introduction to Agent and Using Jarvis
  • 1.4 Communicating with Jarvis Anytime and Anywhere via Telegram and Email
  • 1.5 Using Jarvis in Daily Life
  • 1.6 Mia and the Knowledge Base
  • 1.7 Introduction to Other Built-in Agents

Click to Read

Chapter 2: AIGC Workflow (Coming Soon)

Using Workflow to activate the AIGC feature and let the Agent team (director, artist, and narrator) collaborate to create a unique bedtime story for your child based on your instructions!

  • 2.1 Using Workflow story_maker
  • 2.2 Enabling Your Own AIGC Computation Node
  • 2.3 Training and Using Your Own AIGC LoRA Model.

Chapter 3: Develop Agent/Workflow on OpenDAN (Writing)

What's the most crucial design aspect of an operating system? Defining new forms of applications!

This article will systematically introduce what future Intelligence Applications look like, how to develop and release Intelligence Applications, and how to connect new-age Intelligence Applications with traditional computing.

  • 3.1 Developing Agents that Run on OpenDAN
  • 3.2 Developing Workflows that Run on OpenDAN
  • 3.3 Extending the Environments Accessible by Agents
  • 3.4 Releasing Various Models Trained by Yourself
  • 3.5 Expanding More Tunnels to Enhance the Accessibility of Agents/Workflow
  • 3.6 Developing Traditional dApps on the Personal Server.

Chapter 4: OpenDAN Kernel Development (Writing)

This article will introduce the design and implementation of OpenDAN's architecture

architecture

  • 4.1 Integrate your own LLM core into OpenDAN.
  • 4.2 Knowledge Base: Expand more file types, allowing Agents to better understand your knowledge graph.
  • 4.3 AI computation engine, integrating more AIGC capabilities, and accessing more computational power.
  • 4.4 OpenDAN's state management: File system and vector database.
  • 4.5 Kernel services and permission isolation.
  • 4.6 Smart gateway.

Upcoming Roadmap

  • Release PoC of OpenDAN
  • 0.5.1 Implement personal data embedding into the Knowledge Base (KB) via Spider, followed by AI Agent access
  • 0.5.2 Separate user mode and kernel mode; Knowledge Base supports scene format and more Spiders; support personal AIGC model training
  • 0.5.3 Release Home Environment, allowing Agents to access and control your home's IoT devices
  • 0.5.x Official version of OpenDAN Alpha. Release OpenDAN SDK 1.0.

Contributing

We welcome community members to contribute to the project, including but not limited to submitting issues, improving documentation, fixing bugs, or providing new features. You can participate in the contribution through the following ways:

  • Submit an Issue in the GitHub repository
  • Submit a Pull Request to the repository
  • Participate in discussions and development

OpenDAN utilizes the SourceDAO smart contract to incentivize the community. Developers who contribute can receive rewards in the form of OpenDAN DAO Tokens. DAO Token holders can collaboratively determine the development direction of OpenDAN. You can learn more about the rules of SourceDAO by reading this article (#25).

The DAO governance page for OpenDAN is under development. Once officially launched, all contributors will receive DAO Tokens according to the rules.

⭐Star History

Star History Chart

License

The current license is MIT, but it will transition to SourceDAO in the future.

opendan-personal-ai-os's People

Contributors

alexsunxl, diligentcatcat, eltociear, fiatrete, glen0125, lllucy4901, lurenpluto, photosssa, rishikreddyl, seabornlee, streetycat, synthintel0, waterflier, wugren, xiangxin72


opendan-personal-ai-os's Issues

Initial Configuration and Distribution Plan for OpenDAN DAO Contract Based on SourceDAO

I would like to propose the initial configuration and distribution plan for the OpenDAN DAO contract based on SourceDAO:

Token Economics

  • Total Supply: 1 billion tokens
  • Allowance for Additional Issuance: Yes (Equity Issuance Model). After the project achieves its initial goals and the 1 billion tokens are fully distributed, new milestones can be authorized for additional issuance. Each issuance should not exceed 5% of the total supply.

Initial Distribution:

The tokens will be distributed into the following major categories:

  • Development (Commit is Mining): 610 million (61%)
  • Marketing: 150 million (15%), mainly used for exchanging tokens with cooperative projects, especially third-party open-source projects we are currently using or plan to use in the future.
  • Investors: 150 million (15%)
  • Team: 90 million (9%) (the unlock schedule will align with the pace of development and will be managed by the committee)

Development Mining Roadmap:

PoC: 80 million (Already completed).

The version leader @fiatrete should assign contribution ratios to all contributors of the PoC version. People on the list will receive DAO Tokens immediately after the contract is created.

MVP:

100 million (current). Version Leader: @waterflier
Sub-version plan:

  • 0.5.1 Implement data capture into the Knowledge Base (KB) via Spider, followed by AI Agent access (35 million, 35%)
  • 0.5.2 Build a Personal-Center based on the KB and associate the AI Agent with accessible Telegram accounts (30 million, 30%)
  • 0.5.3 Release for waitlist (5 million, 5%)
  • 0.5.4 First public release (10 million, 10%)
  • 0.5.5 Incorporate modifications after the first public version; workload depends on feedback (15 million, 15%)
  • 0.5.6 Official version of MVP (5 million, 5%)

We also need to apply for some marketing tokens to give to early users.

Alpha: 120 million

Beta: 150 million

Release: 160 million

Initial Committee Members:

@fiatrete:(need eth address), CTO
@maxwilliamdev:(need eth address) CFO+CMO, Investor Relations
@waterflier: (need eth address), Currently Version Leader

Upon the completion of SourceDAO's initialization, I suggest conducting a small-scale token financing round to introduce some early investors within our community. This will diversify our community composition and allow us to hear from a wider variety of voices in our community governance.

Looking forward to hearing your thoughts and feedback on this proposal.

Avoiding the Use of Exceptions in Python and Considering Alternatives

I would like to propose a change in our error handling approach in Python. Please refrain from creating new exceptions (i.e., using the raise statement).

In languages where it is not possible to declare which exceptions a function might throw, the use of exceptions should be avoided; otherwise, there is always a risk of uncaught exceptions. Wrapping each function call in try/except blocks can make the code structure unpleasantly complex.

In fact, exceptions should not be used as a regular error-handling mechanism. 90% of exceptions should lead to a program stop and facilitate easy debugging.

I appreciate the design of Result in Rust. Perhaps we can build a similar facility in Python. In the meantime, using traditional error code logic (0 for success, non-zero for errors) or True/False logic would make our code more readable and stable.
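To make the idea concrete, here is a minimal sketch of what a Rust-style Result could look like in Python. The Result class and the parse_port example are illustrative only, not part of the OpenDAN codebase:

from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")
E = TypeVar("E")

@dataclass
class Result(Generic[T, E]):
    # Exactly one of value/error is set; no raise, no try/except needed.
    value: Optional[T] = None
    error: Optional[E] = None

    def is_ok(self) -> bool:
        return self.error is None

def parse_port(text: str) -> Result[int, str]:
    # Returns an error value instead of raising ValueError.
    if not text.isdigit():
        return Result(error=f"not a number: {text!r}")
    port = int(text)
    if not 0 < port < 65536:
        return Result(error=f"port out of range: {port}")
    return Result(value=port)

res = parse_port("8080")
print(res.value if res.is_ok() else res.error)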

I look forward to hearing your thoughts on this proposal.

Knowledge pipeline error - crash shell

Entered following command in the shell:
/knowledge add email | dir

and provided relevant inputs, and saw the crash (see snapshot)

It looks like a simple typo (save_cosnfig instead of save_config), but it seems important enough to fix since it crashes the shell and the whole OS process. I can try fixing it with a pull request later this week, but maybe someone already at it can make a quick fix.


How to split a complex workflow

I read your pseudocode about workflow, and I found that after receiving a task (AgentMsg), the workflow has two processing methods:

  1. Find a corresponding role to handle the task
  2. All roles process the task separately and combine the results

In reality, many complex tasks need to be completed through division of labor and cooperation; that is, there is a leader who splits the task into smaller subtasks and has them performed synchronously. This working mode is not found directly in workflow.

After thinking about it, there are roughly two scenarios where tasks need to be split:

  1. Tasks require characters with different skills to complete

Based on the current design, these roles can be constructed in the workflow, and the task can be assigned to all roles in the workflow to handle separately. Each role understands and decomposes the task, dealing only with the part within its own responsibility. Finally, the processing results of all roles are combined into the final result.

  2. A single role has limited capabilities (for an LLM, mainly context length?), and the task requires many roles with the same skills to complete it

Based on the current design, a two-level workflow can be designed. The first level constructs a role and assigns the task to it; the role understands the task, splits it into a task list, and delivers the task list to a sub-workflow. The sub-workflow processes only the first unfinished subtask, updates the status of the subtasks in the task list, and then starts a new sub-workflow, repeating until all subtasks in the task list are completed.

flow chart:

graph TB
    subgraph "Workflow(leader)"
        ReceiveMsg["msg = pop_msg()"]-->SplitTask["task_list = leader_role.split_task(msg)"]-->SendToExecutor["result = send(executor_workflow)"]
        WaitResult["result = wait_result()"]
    end

    SendToExecutor-.->ReceiveTaskList

    subgraph "Workflow(executor)"
        ReceiveTaskList["task_list = pop_msg()"]-->ExecuteNextTask["result = executor_role.execute(task_list)"]-->IsFinished{"is_finished(result)"}--no-->ReceiveTaskList
        IsFinished--yes-->PostResult["post_result(result)"]
    end

    PostResult-.->WaitResult

For the second scenario, it is a bit like a recursive structure, which is harder to understand. If a general split mode can be provided directly, it will be friendlier to developers. Achieving this may require introducing a special role (leader). The general process is as follows:

if self.leader is not None:
    result = await _process_msg(msg, self.leader)
    task_list = split_tasks(result)
    task_result_list = []

    for task in task_list:
        msg = result + ". please execute this task:" + task
        role = self.input_filter.select(msg)
        task_result = await _process_msg(msg, role)
        task_result_list.append(task_result)

    result = self._merge_msg_result(task_result_list)

    chatsession.append_post(result)

In addition, isn't the merging of results itself an intelligent summarization process? Does it also require a role to handle it?

Local models for llama.cpp

I have a local server for llama; it's built with this project:

https://github.com/soulteary/llama-docker-playground

I want to add it to the system, but it depends on my custom API, so I will add it as an example.

But I find the base class ComputeNode is provided in src/aios_kernel, and the folder src is not a package. I think we should provide the external interfaces in a separate package, or split them into another project?

Object-based knowledge base, a specialized implementation for emails

Vectorized Knowledge

Large language models are trained on general corpora and without fine-tuning on user-specific data, they struggle to utilize user-related context effectively.

Users accumulate a vast amount of content that reflects their personality during their regular internet usage. This includes personal photos, tweets, Facebook posts, emails, etc. While it's possible to include all this content in the prompt during each interaction with the large language model, this approach is costly and can easily reach the token limit.

A common solution is to generate feature vectors from this content using word embedding techniques and store them in a vector database. During an interaction, the vector that is most relevant to the prompt is retrieved from the database, merged with the prompt, and then passed to the large language model.

We refer to this vectorized content as "knowledge".
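As a concrete illustration of this retrieve-and-merge loop, here is a minimal sketch using ChromaDB (the vector store the built-in Mia pipeline saves embeddings to, per the Knowledge Pipeline Manager issue below). The documents and ids are invented for illustration:

import chromadb

# In-memory client for the sketch; a real deployment would persist to disk.
client = chromadb.Client()
collection = client.create_collection(name="knowledge")

# Index a few pieces of personal content; chromadb embeds them
# with its default embedding model.
collection.add(
    documents=[
        "Flight to Berlin on June 3rd, seat 14A.",
        "Mom's birthday dinner is on June 5th.",
    ],
    ids=["email-1042", "note-77"],
)

# At interaction time, retrieve the content most relevant to the prompt
# and merge it into the context passed to the large language model.
hits = collection.query(query_texts=["When do I fly to Berlin?"], n_results=1)
print(hits["documents"][0][0])  # -> the flight email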

Object-based knowledge base

In a personal AI system, to build a user's own knowledge base, we first need to implement various spider programs to crawl and retrieve all user-related data. Modern web content is typically rich text, including text, images, videos, hyperlinks, etc. Organizing this rich text in a tree-like structure similar to HTML is necessary, hence the need to introduce an object structure to represent this content.

Different parts of this content cannot be vectorized using the same embedding model. For instance, text and images, as well as the content of an image and its EXIF information, need separate embeddings. This means that in the vector database, the same content may have multiple vector values, and a row can represent a whole content item or just a part of it.

We need a comprehensive object structure to represent the hierarchy and relationships of content, as well as to implement the indexing and storage of objects. In the Minimum Viable Product (MVP) version, we'll implement a specialized solution for email content. In future versions, we can generalize this to handle other types of content, such as Facebook posts, tweets, etc.

Class hierarchy for the email object

How to generate email objects in the spider

How to store email objects in the knowledge base

Agent with knowledge base

At the same time, we also need to explore the paradigm of using the knowledge base in Agents and workflows, so that the agent can better complete tasks in interaction with users through the context provided by the knowledge base.

Brief flow of generating and using knowledge

Let's start the development of the MVP!

I can't wait for you guys to complete the DAO page of OpenDAN~
In the meantime, we can start development of the OpenDAN MVP version in parallel.

The MVP goal of OpenDAN is to help people build their own knowledge base in the AI era, and based on this to support future AI Agents running on OpenDAN, the main process is as shown in the diagram.


I think we can achieve this goal through two main versions:

Step 1: Change the way people search/browse personal information, and further change the way data is stored.

  1. A better "resource manager": information manager (library) to replace the file system.
  2. Provide a conversational product for searching and browsing personal information.
  3. Through Functions expansion, allow the AI Agent (possibly using a cloud-based LLM kernel) to do more personalized work using the KB.

Step 2: Establish a personal (information, device) center with the participation of AI Agent. Strengthen the "accessibility" and usability of personal (information/capabilities). (Read-only -> Read-write) The personal center becomes the "personal homepage" in the AI+Web3 era.

Could you please create a new branch for the MVP~ I will commit more documents through pull requests.

Open source LLM performance evaluation

I will list the test results of various open-source models here. You can refer to these data to select models and configure devices. Of course, the evaluation of LLM is quite subjective. I also suggest you make evaluations more suitable for your needs based on your own requirements. Your opinions and suggestions on the evaluation methods and results are also welcome.

I will give the overall score in the first comment, and provide performance statistics in the second comment.

At present, I plan to complete the evaluation of several mainstream models first, and may also pay attention to some related fine-tuned models in the middle.

  1. Alpaca
  2. Vicuna
  3. Mistral
  4. Bloom
  5. Aquila

There are several tasks that need to be handled as follows:

  • Test cases
  • ChatGPT-4 (as a reference)
    • Execute test cases
  • ChatGPT-3.5 (as a reference)
    • Execute test cases
  • Llama 70B Chat
    • Execute test cases
  • Llama 13B Chat
    • Execute test cases
  • Alpaca
    • Download model
    • Execute test cases
  • Vicuna
    • Download model
    • Execute test cases
  • Mistral
    • Download model
    • Execute test cases
  • Falcon
    • Download model
    • Execute test cases
  • Aquila
    • Download model
    • Execute test cases

Introducing a New Project Collaborator

Hello everyone,

I wanted to formally introduce my friend @waterflier, whom I'm considering bringing on as a collaborator for this project.

Here are a few things to know about him:

  1. He has recently dedicated a significant amount of time to migrate OpenDAN's DAO implementation to SourceDAO.
  2. He initiated the OpenDAN MVP, a personal knowledge base integrated with AI, showcasing his innovative approach to leveraging technology.

Given his contributions and expertise, I believe he'll be a tremendous asset to our ongoing efforts.

Before making this official, I'd love to hear any thoughts or concerns regarding this decision from the community. Please voice your opinions below this issue, as we highly value the feedback of every member.

If there are no objections within the next 48 hours, I'll formally invite @waterflier as a collaborator to our project.

Future Branch Usage Guidelines

Dear Team,

I would like to provide some guidance on the usage of different branches moving forward:

  1. Master Branch: This branch is designated for releases. As a rule, we should aim to merge the MVP (0.5.1) branch into the master branch on a weekly basis.

  2. MVP (0.5.1) Branch: We will continue to refine the design of the Agent/Workflow and enhance the user experience based on the current 0.5.1 version. The principle is not to modify the implementation of components within the Kernel.

  3. New 0.5.2 Branch: As per our plan (#46 ), we will begin the separation of kernel-mode and user-mode in this branch. Given the significant architectural changes to the system, it has been separated for this purpose. This branch, in principle, will not modify the implementation of any non-kernel components such as Agent/Workflow.

Please adhere to these guidelines to ensure smooth and organized development. If you have any questions or need further clarification, please don't hesitate to ask.

OpenAI’s 0613 updates seems awesome

OpenAI just pushed out their 0613 updates, now rocking support for 'functions'. This is huge! It's going to make the Agent's call to functional modules much easier and make the function calls more stable. I'm stoked about this and can't wait to revamp Jarvis's code using this new 'functions' style. Hang tight, I should have this ready for you in a couple of days.

The AI community is moving really fast. Just when I thought I could take a breather and get my head around the AI OS architecture, this thrilling update comes out of nowhere. And this update coincides with my understanding of the separation of intelligence and computing. I believe the open source LLM community will quickly catch up with support for functions, even whipping out specialized models optimized just for functions.
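For reference, here is a minimal sketch of the 0613 'functions' calling style, using the openai-python 0.x library of that era. The get_schedule schema is a hypothetical Jarvis module, not actual OpenDAN code:

import json
import openai  # openai-python 0.x, contemporary with the 0613 models

functions = [{
    "name": "get_schedule",  # hypothetical Jarvis functional module
    "description": "Read the user's schedule for a given date",
    "parameters": {
        "type": "object",
        "properties": {"date": {"type": "string", "description": "ISO date"}},
        "required": ["date"],
    },
}]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "What am I doing tomorrow?"}],
    functions=functions,
    function_call="auto",
)

message = response["choices"][0]["message"]
if message.get("function_call"):
    # The model asks us to call the module and hands back structured arguments.
    args = json.loads(message["function_call"]["arguments"])
    print("call", message["function_call"]["name"], "with", args)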

Proposal for Adjusting the Goals for Version 0.5.2

Dear Team,

Given the recent launch of OpenAI's new version in early November 2023, many of us may have felt a profound shift in the industry. As the world changes, I believe we should adapt accordingly. Here are some of my thoughts:

  1. Affirmation of Our Path: OpenAI's latest release, particularly the functionality of the so-called GPTs Agent platform, is largely similar to our 0.5.1 version released on September 28th. This strongly affirms the correctness of our direction. OpenAI has done a great job educating the market about Agents, so we no longer need to emphasize the correct Agent-based use of LLMs through our own releases, and the user-education part of our product design can be simplified.

  2. Innovation in Version 0.5.2: For our new version (0.5.2), besides maintaining the combinational advantages brought by private deployment of LLM, I believe we need to implement some of the innovative ideas we've discussed about the Agent. This is crucial to maintaining our leading position and avoiding the impression that OpenDAN is merely a follower of GPTs.

  3. Integration of OpenAI's New Capabilities: We should fully integrate the new capabilities brought by OpenAI's latest release, especially the longer Token Windows, GPT-V, and Code-interpreter. I believe these new features can effectively solve some known issues.

Therefore, I propose to adjust the goals and plans for version 0.5.2. Here are the core objectives:

  • Aim to release version 0.5.2 by the end of November, focusing on:
    • Launching a new Agent with Autonomous capabilities and multi-Agent collaboration based on Workspace.
    • The integrated product of 0.5.2 will be a private deployment email analysis Agent for small and medium-sized enterprises. This will allow any company to better support its CEO and other management positions through LLM while ensuring privacy and security.
    • (Optional) By combining LLM and AIGC, build an Agent-based personalized AIGC application, such as a "children's audio picture book" generator that includes both text-to-image and text-to-sound.
    • (Optional) Through multi-Agent collaboration, fully utilize the capabilities of GPT4-Turbo, and attempt to let AI and engineers collaborate on research and development tasks based on Git.

The detailed version plan is as follows:

MVP plan adjustment

To keep the list below from getting too long, the distributed version of the system is moved to 0.5.3; I think we will open another issue for its discussion and tracking, so this list does not include it.

Modules without special notes are components to be completed in the 0.5.2 plan.

Some Explanation

Upgrade Agent Working Cycle

The goal is to transform the Agent from a passive message-handling Assistant to an actively acting Agent based on roles. The concept of the relevant modules mainly involves the Agent's behavior patterns (4 types), the Agent's capabilities, and the Agent's memory management (learning and introspection).

For a detailed introduction, refer here: #91

Workspace Environment

The Workspace supports the implementation of the Agent Working Cycle design. Its core abstraction is defined as: saving the shared state needed for Agent collaboration and providing the basic capabilities for Agents to complete their work. I referenced AutoGPT carefully in this design; the difference between Workspace and AutoGPT is the emphasis on collaboration (Agent with Agent, Agent with humans). After consideration, the Workspace primarily consists of the following components:

  1. Task/Todo manager, representing the unfinished tasks in the Workspace.
  2. Saving work logs.
  3. Saving learning outcomes and records of known documents.
  4. Ability to access the Knowledge Base (RAG support).
  5. Virtual file system for saving any work outcomes.
  6. A set of SQL-based databases to save any structured data.
  7. Real-time internet search capability.
  8. Ability to use existing internet services.
  9. Ability to use major blockchain systems (Web3).
  10. Ability to write/improve code (based on git), run code, and publish services.
  11. Communication capabilities with the outside world.
  12. Ability to use social networks.

Each Agent has its own private Workspace, not shared with others. I hope to achieve diversity through the combination of "Agent and Workflow Role". Each user "trains" different Agents through their usage habits, and then these Agents collaborate to complete complex tasks defined in the Workflow. The final results of these complex tasks can reflect the user's inherent personality and preferences.

This component design also reflects my thoughts on the key question, "What capabilities should we endow an Agent with, and how do we control the security boundaries when it transitions from a consultant to a steward?" It's not a simple question, so I anticipate this component will continue to iterate in the future.

Agent Message MIME Support

Agent Message MIME Support means that Agents can handle multiple types of messages, including images, videos, audio, files, etc. For most Agents, this requires adding a customizable standard step of parsing messages in the message handling process. The input of this step is the message's MIME type, and the output is the text content of the message. This step can be implemented by calling the text_parser module.

Another core requirement of MIME support is to use a unified method to save these non-text content data.
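A minimal sketch of how such a parsing step could be structured. The register_parser/parse_message names are hypothetical and do not reflect the actual text_parser module's API:

import mimetypes
from typing import Callable, Dict

# Input is the message's MIME type, output is text the Agent can read.
ParserFn = Callable[[bytes], str]
_parsers: Dict[str, ParserFn] = {}

def register_parser(mime_type: str, fn: ParserFn) -> None:
    _parsers[mime_type] = fn

def parse_message(payload: bytes, mime_type: str) -> str:
    parser = _parsers.get(mime_type)
    if parser is None:
        return f"[unsupported content: {mime_type}]"
    return parser(payload)

# A trivial parser for plain text; image/audio parsers would call OCR or
# speech-to-text models and register themselves the same way.
register_parser("text/plain", lambda data: data.decode("utf-8", "replace"))

mime, _ = mimetypes.guess_type("note.txt")  # -> "text/plain"
print(parse_message(b"hello agent", mime or "application/octet-stream"))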

Text-based Knowledge Base

In 0.5.1, we mainly implemented RAG based on the popular Embedding + vector database solution. Through practice, we found that this solution did not fully utilize the potential of LLM, so I want to introduce two new modes to further enhance RAG:

  1. Build a local text search engine that LLM can use for proper local searches when needed.
  2. Assuming LLM will become cheaper in the future, let LLM learn all the documents once and organize the learning results by directory structure (Text Summary). LLM can use browsing methods to find the information it needs.

Text Parser Support

Both MIME Support and Text-based Knowledge Base require the system to support converting various document formats into text that can express semantics as much as possible. This component, known as TextParser, should be implemented as an open and extensible framework, given the vast amount of digital content that exists in different formats.

Local Text Search

Using traditional inverted index technology to save all document content locally and provide rapid local search capabilities. The implementation of this component can refer to ElasticSearch.
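As a toy illustration of the same family of technique, here is an inverted-index search using SQLite's FTS5 extension (assuming your Python's bundled SQLite was built with FTS5, which is common). This is not the planned implementation, just a sketch of the idea:

import sqlite3

# A minimal local full-text index: FTS5 builds an inverted index over the rows.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
db.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("trip", "Flight to Berlin on June 3rd."),
        ("family", "Mom's birthday dinner is on June 5th."),
    ],
)

# The LLM could issue keyword queries like this one when it decides
# it needs a proper local search.
for title, body in db.execute(
    "SELECT title, body FROM docs WHERE docs MATCH ? ORDER BY rank", ("berlin",)
):
    print(title, "->", body)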

Text Summary

Using the capabilities of LLM to learn all the documents and then save the learning results locally. This behavior can be considered "Self-Learn". Users can let Agents responsible for organizing materials use different prompts according to the purpose of organizing the materials to obtain more targeted results.

Stable Diffusion Controller Agent

Practice the concept of "the Agent as the new-era way of using computing" by replacing the complex Stable Diffusion WebUI with an easy-to-use Agent. Help users complete complex AIGC tasks and establish a paradigm covering the entire AIGC process: LoRA training and use, model downloading, plugin downloading, prompt generation, and selection of AIGC results.

Email Agent/CEO assistant

The integrated test product of 0.5.2, aimed at private deployment for small and medium-sized enterprises, is a CEO Assistant that can read all company emails and materials. I am writing a detailed product document, which is not elaborated here.


I look forward to hearing your thoughts on these proposed adjustments.

Error of telegram bot

After I connected a telegram bot and input the bot token,
I tried to send a message in telegram, but got the error below.
So I feel that the configuration and use of the telegram tunnel are not very clear right now.

[2023-10-31 15:39:57,725]aios_kernel.tg_tunnel[INFO]: on_message: 27 from sunxinle (sunxinle) to None(5577362278)
[2023-10-31 15:39:57,725]aios_kernel.tg_tunnel[ERROR]: tg_tunnel error:cannot access local variable 'contact' where it is not associated with a value
[2023-10-31 15:39:59,076]httpx[INFO]: HTTP Request: POST https://api.telegram.org/*********/getUpdates "HTTP/1.1 200 OK"
[2023-10-31 15:39:59,077]aios_kernel.tg_tunnel[INFO]: on_message: 27 from sunxinle (sunxinle) to None(5577362278)
[2023-10-31 15:39:59,077]aios_kernel.tg_tunnel[WARNING]: ignore message from telegram bot 5577362278

Custom LLM

I have already experienced Jarvis and found it very interesting.

It looks like you are researching a new self-developed LLM engine. When can we experience it? Will it be open source?

Discussion on the contribution rates of the PoC version's contributors

As mentioned in issue #34, here are the contribution rates of the PoC version's contributors from my point of view:

  • @fiatrete: 45%
  • @maxwilliamdev: 20%
  • @troy6en: 10%
  • @DiligentCatCat: 10%
  • @Synthintel0: 10%
  • @suntodai: 5%

Thank you for your contributions during the PoC phase. Please note that this is an approximate estimation, as the nature of and expertise required for the tasks vary.

We started with a vague concept, and now the project's objectives are becoming clearer. A special shoutout to @troy6en, @Synthintel0, and @DiligentCatCat, who took on a significant amount of development work in addition to their full-time jobs. @maxwilliamdev has been an active community advocate, making outstanding contributions to the project's promotion in the AI community, and inspiring many creative ideas. And last but not least, @suntodai, a seasoned AI scientist, has provided invaluable advice for the project's evolution.

Please clearly reply to this issue with either 'agree' or 'disagree' to indicate your stance on the proposed contribution rate assessment. And attach your ETH address if you agree.

The workflow and agent of the same application should be placed in the same directory

For application developers, it is easier to manage the configuration and code of an application in the same directory, and the chance of reusing workflows and agents between different applications is very small. Third-party application access should not be allowed by default, which is also safer. If reuse is needed, it can still be exported through configuration.

What abilities do you most wish to add to Jarvis?

Jarvis is the built-in AI Agent in OpenDAN and serves as the primary operational channel for users to access various AI capabilities. Today, we have released the Jarvis Discord bot. With it, you will be able to communicate with Jarvis on the Discord app on any device and utilize various AI capabilities on the server through Jarvis.

Right now we have developed a Stable Diffusion functional module, a Twitter functional module, and a YouTube video analysis functional module, and it's not complicated to add more functional modules with the OpenDAN development framework.

What abilities do you most wish to add to Jarvis?
Feel free to let us know.

Some suggestions about python code style

After looking at some of the latest commits on the MVP branch, I feel the Python code styles conflict in places. By convention, our Python code style should follow the PEP 8 specification, as follows:

https://peps.python.org/pep-0008/

where

Class names should normally use the CapWords convention.

So, applied to the code below: the correct class name for agent_manager should be AgentManager. @waterflier
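For illustration:

# PEP 8 naming: class names use CapWords; functions and methods use snake_case.
class agent_manager:      # non-compliant
    ...

class AgentManager:       # compliant
    def load_agent(self, agent_id: str) -> None:  # snake_case for methods
        ...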

Some question on discord-bot

  1. According to the README.md of agent_jarvis and discord_bot, their listening ports conflict (both are 10000).
  2. I cannot find a full guide for deploying the discord bot. jarvis and discord-bot are both servers; how do they communicate with each other?

Computing resource scheduling.

I have compiled a rough definition and design of the computing resource module, and I hope you can join the discussion to reach a consensus on this module.

Compute Node

A computing resource node in the system that should have the following functions:

  1. Install and start several services that support computing

  2. Accept calculation tasks submitted by users and execute them

  3. Schedule these tasks (various tasks may be executed in parallel or queued)

  4. Some preset standard task types, while others are customized by developers

  5. Some computing resources are public, and some may require authorization

Compute Task Manager

The singleton component responsible for managing computing resources in the system should have the following functions:

  1. Accept registration of 'Compute Node'

  2. Accept calculation tasks submitted by users and select appropriate nodes to execute

  3. Maintain load balancing among various computing nodes

Flowchart

  1. Start up
graph TB
    subgraph ComputeNode["ComputeNode(node_id, node_entry)"]
        InstallService["InstallService(type, service_entry)"]-->StartService["StartService(type, service_entry)"]-->ServiceList["Services{type, service_entry}"]
    end

    ServiceList-.->RegisterNode

    subgraph ComputeTaskManager
        RegisterNode["StartService(node_id, node_entry)"]-->Nodes["Nodes{node_id, node_entry}, Services{type, node_id[]}"]
    end
  2. Execute task
graph TB
    subgraph ComputeTaskManager
        RunTask["Run(type, params, [node])"]-->SpecifyNode{"if (node)"}
        PostTask["PostTask(type, params, node)"]
        SpecifyNode--yes-->PostTask
        SpecifyNode--No-->FilterNode["nodes=Services(type)"]-->NextNode["node = nodes.next()"]
        WaitResult["result=WaitResult()"]
    end

    NextNode-.->IsBusy-.yes.->NextNode
    IsBusy-.no.->PostTask
    PostTask-.->ExecuteTask

    subgraph "ComputeNode(Any)"
        IsBusy{"is busy"}
    end

    subgraph "ComputeNode(Selected)"
        ExecuteTask["result=Execute(type, params)"]-->PostResult["PostResult(result)"]
    end

    PostResult-.->WaitResult

I think we can first design a universal task scheduling framework, and then support various execution environments (e.g., Docker) and preset different task types within this framework.
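To give the discussion a starting point, here is a minimal sketch of the two components described above. The names and the busy-polling strategy are illustrative, not a proposed final design:

import asyncio
from typing import Any, Dict, List

class ComputeNode:
    # A node advertises the task types it supports and executes tasks.
    def __init__(self, node_id: str, task_types: List[str]):
        self.node_id = node_id
        self.task_types = task_types
        self.busy = False

    async def execute(self, task_type: str, params: Dict[str, Any]) -> Any:
        self.busy = True
        try:
            await asyncio.sleep(0)  # a real node would run llama/sd/etc. here
            return {"node": self.node_id, "type": task_type, "params": params}
        finally:
            self.busy = False

class ComputeTaskManager:
    # Singleton-style manager: registers nodes and picks an idle one per task.
    def __init__(self):
        self.nodes: List[ComputeNode] = []

    def register_node(self, node: ComputeNode) -> None:
        self.nodes.append(node)

    async def run(self, task_type: str, params: Dict[str, Any]) -> Any:
        candidates = [n for n in self.nodes if task_type in n.task_types]
        if not candidates:
            raise ValueError(f"no node supports task type {task_type!r}")
        while True:
            for node in candidates:
                if not node.busy:
                    return await node.execute(task_type, params)
            await asyncio.sleep(0.05)  # all busy: naive wait-and-retry

async def main():
    manager = ComputeTaskManager()
    manager.register_node(ComputeNode("gpu-0", ["llm_completion"]))
    print(await manager.run("llm_completion", {"prompt": "hello"}))

asyncio.run(main())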

Project Proposal for Package System

Hi guys~

Following the guidelines in #32, I'm pleased to announce that I have completed the project proposal for our Package System. I hope this document can serve as an example for future project proposals within our community.

The proposal has been submitted and can be found at the following path: https://github.com/fiatrete/OpenDAN-Personal-AI-OS/blob/MVP/doc/mvp/package_manager.md. Please note that I will continue to update this document throughout the development process to reflect the most current status and plans.

Here is the full text of the proposal:

Problems to Solve

The Package Manager is a fundamental component of the system for managing Packages.
The subsystem provides fundamental support for packaging, publishing, downloading, verifying, installing, and loading folders containing the required packages under different scenarios. Based on these modules, it's easy to build a package management system similar to apt/pip/npm.

The system design has deeply referenced Git and NDN networks. The distinction between client and server is not that important. Through cryptography, it achieves decentralized trustworthy verification. Any client can become an effective repo server through simple configuration.

Design

Let's start by introducing the two important processes.

Load Package

load_package

Install Package

install_package

Note that the dependency check during installation allows for the missing packages to be installed into the current environment.

Some Basic Concepts

  • env: A target environment consisting of a series of configuration files, where packages can be loaded/installed.
  • pkg: A Package (pkg) is either a folder or a file that serves the same purpose as a folder (such as zip, iso, etc.).
  • pkg_name: A unique string used to label a package. It's usually a readable package name, but can also include the version number or even the ContentId.
  • version_id: A complete version number is made up of channel_name and version. channel_name is generally not specified and is configured uniformly by the env (e.g., all use the nightly channel or all use the release channel). version_ids are divided into exact versions and conditional versions.
  • mediainfo: A recognizable file or folder format. Successfully loading a package means obtaining a confirmed MediaInfo, from which the package's content can be further read.
  • Author: The package's author, which includes the author's friendly name (unique) and public key. This system cannot determine the trustworthiness of the relationship between the friendly name and the public key; upper layers need to extend this based on product design. Verification through DNS systems or smart contracts is common.
  • Distributor: The distributor is responsible for maintaining index_db.
  • index_db: A database containing a series of package information (pkg_info), maintained by the distributor. index_db is a complete file saved within the env. It is updated through the index_db update operation in the env from the distributor. Since pkg_info contains the package's cid info and the corresponding author's signature, the distributor can only select the version released by the author and cannot release packages on behalf of the Author.
  • repo_server: Includes pkg_server and index_db_server, which can be deployed separately.
  • ndn_client: A library for trustworthy downloading of packages through the package's ContentId.

Package Env Directory Structure

pkg_tree

The diagram represents a typical pkg_env directory structure, where:

  • pkg.cfg.toml The root directory contains a pkg.cfg.toml file, which is the main configuration file for the environment.
  • index-db Inside pkg.cfg.toml, there are two external files included: an external pkg.lock (local version locking) and .pkgs/index-db.toml (independently distributed package index by the distributor).
  • .pkgs Similar to .git, this folder contains a series of files and directories not directly used by users but supporting package management. It stores all different versions of packages using the naming convention $.pkg_name/pkg_name#cid.
  • pkgs This directory is user-facing, structured according to successful package installations. Installed packages are soft-linked to the actual files/directories under the .pkgs folder. This minimizes redundant file copying and makes it convenient for users to view and modify. In this example, pkg_nameA has two versions (the default version and 1.0.3), both pointing to the actual folders with CIDs in .pkgs.

During the local testing phase, users can easily place their own packages in the pkgs directory for successful loading. Any local changes will not affect the content of the index-db, nor will it impact testing. Cryptographic verification only occurs during the download and installation process.

The above environment isolation design also provides a fairly standard solution for common dependency conflicts.

Test

Load Package Test

Load testing sometimes does not depend on index-db.

Loading Using pkg_id

This is the simplest mode for users.
Not specifying a version number usually means using the default version. When an index exists, the default version is fetched from the index.
In the absence of an index, the default version will prioritize links without suffixes; otherwise, it will use the link with the highest version.

load("english-dict")

Actual load:

./pkgs/english-dict/
./pkgs/english-dict#0.1.5/

When there's an index-db, it will determine the default version based on the index-db information and load using the directory name with the version:

./pkgs/english-dict#0.1.3/

Note that even when there's an index-db with a cid, the system still primarily loads by symbol. This gives system administrators more flexibility. Try to avoid modifying directories named with cids.

Loading Using pkg_id + cid

load("english-dict#sha256:1234567890")

This is the simplest method and doesn't rely on index-db. The system can precisely locate and load the package, which is stored in:

./.pkgs/english-dict/sha256:1234567890/

Verification of media information does not occur before loading; it only takes place after the download is complete.

The channel is part of the version. If it's not specified, the default channel name will be read from the environment.
If the version number is fixed, the directory is directly constructed for loading. If the version number is conditional, it depends on the locally installed version list to first determine the version, and then constructs the directory for loading.

load("english-dict#>0.1.4")

The package will be loaded based on the actual version installed locally:

./pkgs/english-dict#0.1.5/

Automatic local repair logic for loading using an exact version number:
At this point, if that directory does not exist, but it can be seen from the index-db that the cid corresponding to version 0.1.5 is already installed locally, loading will fail by default (simply deleting the version link effectively blocks a version).

Only when the option to automatically repair links during loading is enabled (which requires permissions), will it automatically create a link to that cid directory and successfully load.

Version Control

Support Only 4 Comparison Operators: >, <, >=, <=

  • >0.1.2: Any version greater than 0.1.2
  • >0.1.2<0.1.5: Any version greater than 0.1.2 and less than 0.1.5
  • <0.1.2: Any version less than 0.1.2

The logic for version selection during load is as follows:

  1. Retrieve all locally installed versions.
  2. Based on the version selection criteria, choose one version.

Note that during installation, the version chosen based on dependency information has its selection set from all versions in index-db, and it is not related to the versions already installed locally.
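A minimal sketch of that selection logic, assuming versions are plain dotted integers; the constraint grammar follows the four operators above:

import re

def _key(version: str):
    # "0.1.3" -> (0, 1, 3), so tuple comparison matches version ordering.
    return tuple(int(part) for part in version.split("."))

def select_version(installed: list, constraint: str):
    # Pick the highest installed version satisfying all conditions,
    # including concatenated conditions such as ">0.1.2<0.1.5".
    ops = {
        ">":  lambda a, b: a > b,
        "<":  lambda a, b: a < b,
        ">=": lambda a, b: a >= b,
        "<=": lambda a, b: a <= b,
    }
    conditions = re.findall(r"(>=|<=|>|<)\s*([\d.]+)", constraint)
    matching = [
        v for v in installed
        if all(ops[op](_key(v), _key(bound)) for op, bound in conditions)
    ]
    return max(matching, key=_key, default=None)

print(select_version(["0.1.2", "0.1.3", "0.1.5"], ">0.1.2<0.1.5"))  # 0.1.3
print(select_version(["0.1.2", "0.1.5"], ">0.1.4"))                 # 0.1.5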

Package Installation Testing (To Be Completed)

Installation testing depends on index-db.

Installing Using pkg_id

Check if the installed result matches the current version specified in index-db.

Installing Using pkg_id + cid

Verify that the installation process correctly validates the cid. After successful installation, making simple changes to the files on the server should result in a download verification failure.

Installing Using pkg_id + Version Constraints

Check if the installed result matches the correct version specified in index-db.

Installation with Local Upgrades

After installing using the pkg_id method, make changes to the package content, then reinstall. At this point, the locally modified version should be backed up, and the current version should be reinstalled.

The 1660 graphics card does not work well with local sd

My laptop configuration:
Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz 2.60 GHz
16.0 GB (15.9 GB usable)
nVidia 1660

System Info:
Windows 11 home
22H2

I've deployed stable-diffusion-webui locally and start it with: python launch.py --share --api . I set DEMO_STABLE_DIFFUSION_ADDRESS=<sd share url> in docker/jarvis/.env . When I ran the autotest on the web page, CPU, memory, and hard disk usage hit 100%.

Enhancements for Knowledge Base and Multimedia Content Handling

I would like to suggest some improvements in our current KB system:

Contextual Text Prompts in Knowledge Base: The text prompts returned by the Knowledge Base should be multiline, including the lines above and below where the relevant terms are found in the document. This would provide a richer context for the associated terms.
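A small sketch of the windowing meant here (the function name and radius are illustrative):

def context_snippet(lines, hit_index, radius=2):
    # Return the matched line plus `radius` lines above and below it.
    start = max(0, hit_index - radius)
    end = min(len(lines), hit_index + radius + 1)
    return "\n".join(lines[start:end])

document = ["intro", "...", "the relevant term appears here", "...", "outro"]
print(context_snippet(document, hit_index=2))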

Token Limit and Priority in Knowledge Base: Due to the token limit, the Knowledge Base should find the highest weighted associated objects based on the provided token limit. If this approach isn't effective, I have another idea: We could only return the titles of the associated objects, and allow the LLM to decide through inner functions whether it needs to read the original text of these associated objects. If we can include a table of contents in the associated objects, it might further improve the efficiency of LLM's autonomous search.

Prompt Construction for Multimedia Content: The current structure seems more suitable for generating Knowledge Base prompts for document content. I believe we should also construct prompts for different types of multimedia content.

I look forward to hearing your thoughts on these suggestions.

Knowledge pipeline manager

Configure Knowledge Pipeline

The knowledge pipeline scans the specified input and constructs structured knowledge objects based on the input content. Depending on the application scenarios where the knowledge is used, various indexes are created for these objects.

Input

The input defines the process of transforming personal data sources into structured knowledge objects and specifies the granularity of calling the parser on the objects. Here are some typical implementations of input:

  • Local Directory: Specify a local directory, scan all files in the directory, and monitor their updates. Generate an object for each file and write it to the object store. Call the parser for each newly generated object.
  • Personal Email: Scan the inbox of a personal email account and monitor new emails. Generate an email object for each email and write it to the object store. Call the parser for each newly generated email object.
  • Browser Context: Implement a browser plugin that sends the elements of the current webpage to the corresponding input backend through RPC. Generate a rich text object for each newly generated object. Call the parser for each newly generated rich text object.

Parser

The parser defines the process of creating indexes from the input objects. It includes, but is not limited to, the following main methods and their combinations:

  • Vectorization and writing to the vector store.
  • Creating RDB and NoSQL indexes of various dimensions.
  • Sending objects to the Agent.

The configuration of the pipeline should include the following parts:

  • Input Method: Including the Python module that implements the input.
  • Input Parameters: Parameters for the input module, such as local path and email address.
  • Parser Method: Including the Python module that implements the parser. If the parser points to an Agent, this configuration can be simplified to the Agent instance name.

Knowledge Pipeline Manager

The pipeline manager is similar to the agent manager. It manages pipeline configurations and creates instances from the configurations to run in the background. The knowledge pipeline manager also handles the state management of pipeline instances.

Integrated into the AIOS shell, the following commands are added:

  • knowledge pipelines: Returns the currently running pipeline instances.
  • knowledge journal $pipeline [$topn]: Queries the journal logs of the currently running pipeline.
  • knowledge query $object_id: Queries the content of the specified knowledge object.

Adding a New Knowledge Pipeline in AIOS Shell

In the directory $home/myai/knowledge_pipelines/ or in development mode, in the directory $source_root/rootfs/knowledge_pipelines/, add a new pipeline directory. Here is an example using the built-in pipeline named "Mia":

pipeline.toml

Create a configuration file named pipeline.toml

  • The name field specifies the globally unique pipeline name.
  • The input.module field points to the input implementation relative to the pipeline directory.
  • The input.params field defines the input module's parameters. Different input implementations can have different parameter formats.
  • The parser section is similar.
name = "Mia"
input.module = "input.py"
input.params.path = "${myai_dir}/data"
parser.module = "parser.py"
parser.params.path = "${myai_dir}/knowledge/indices/embedding"

input

The input module should implement at least:

async def next(self):

Define the input class and implement the asynchronous iterator method next, which scans the input and generates a structured knowledge object for each element.

  • If all elements in the input have been scanned, return None, and the pipeline will be marked as finished.
  • If the input can stay pending, waiting for new input, return (None, None).
  • To pass a created object to the parser, return (object_id, journal_str), where journal_str is the input part of the generated journal log.

The implementation in Mia scans files in a directory and creates objects for text and images.
def init(env: KnowledgePipelineEnvironment, params: dict)

Create an instance of the input class and return it.
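Putting the contract together, here is a minimal sketch of an input module under the description above. The object-store call and the environment fields are assumptions made for illustration, not OpenDAN's actual interfaces:

```python
import os

class DirectoryInput:
    # A directory-scanning input, per the "Local Directory" example above.
    def __init__(self, env, path: str):
        self.env = env          # assumed: a KnowledgePipelineEnvironment
        self.path = path
        self.seen = set()

    async def next(self):
        for name in sorted(os.listdir(self.path)):
            full = os.path.join(self.path, name)
            if full in self.seen or not os.path.isfile(full):
                continue
            self.seen.add(full)
            # Hypothetical call: write the file to the object store as a
            # structured knowledge object and get back its id.
            object_id = self.env.object_store.put_file(full)
            return (object_id, f"input: {full}")
        # Nothing new right now: stay pending and wait for new input.
        return (None, None)

def init(env, params: dict):
    return DirectoryInput(env, params["path"])
```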

parser

The parser module should implement at least:

async def parse(self, object: ObjectID) -> str:

Define the parser class and implement the parse method to create indexes for the object_id returned from the input, returning the journal_str.
The implementation in Mia embeds the content of the input object and saves it to ChromaDB.

def init(env: KnowledgePipelineEnvironment, params: dict)

Create an instance of the parser class and return it.
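And a matching sketch of a parser module; the embedding and vector-store calls below are placeholders standing in for whatever the environment actually provides:

```python
class EmbeddingParser:
    # Creates a vector index for each object, per the Mia example above.
    def __init__(self, env, index_path: str):
        self.env = env          # assumed: a KnowledgePipelineEnvironment
        self.index_path = index_path

    async def parse(self, object_id) -> str:
        # Hypothetical calls: read the object, embed it, and index the vector.
        text = self.env.object_store.get_text(object_id)
        vector = self.env.embed(text)
        self.env.vector_store.add(self.index_path, object_id, vector)
        return f"parsed: {object_id}"

def init(env, params: dict):
    return EmbeddingParser(env, params["path"])
```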

Using the Index Created by the Pipeline

The pipeline defines the process of creating knowledge objects and indexes. Correspondingly, you need to use the indexes created by the pipeline to complete your work.
Taking the built-in pipeline "Mia" as an example: besides creating a pipeline named "Mia", an Agent named Mia is also added.

Two suggestions for OpenDAN

Hi guys~

The OpenDAN repo hasn't been updated for a while, and I guess you're working hard on the basic framework code of AIOS (AI application container, app store, etc.). I am an architect from CYFS (https://www.cyfs.com), and I have been studying the next generation of network operating systems for a long time. I have spent a lot of time on future basic network protocols (widely known today as Web3), future decentralized storage protocols, and research on serverless dApps. I also had a period in my career when I was deeply involved with deep learning systems built on massive unstructured data (mainly large videos), so I understand the value of data very well. With the rapid arrival of the LLM-based AI revolution, I see many opportunities for these "future" concepts to be realized. Recently, I have been studying how to construct the future "Personal Intelligence Application" with an LLM at its core, and I already have some new designs that I can't wait to try.

I met William at the Oakland event last time and heard his introduction to the vision of OpenDAN, and I had an on-site experience with Jarvis (I sympathize with what happened to William in Oakland; maybe I was the last person to use the OpenDAN prototype at the event). My feeling is that although OpenDAN is not yet implemented like an operating system today, you have demonstrated with actual products the new relationship between ordinary people and intelligence in the AI era, which is very enlightening! I think I can work with you to push forward the implementation of AIOS!

Based on the current state of OpenDAN, I have two suggestions. If you think you can proceed in this way, we can open specific issues to discuss in detail separately.

1. Suggestions on the Goals and Architecture of AIOS

I greatly appreciate the current goal of OpenDAN's Personal AI OS: to help more enthusiasts deploy the latest AI algorithms and models quickly and easily in their own environment, which is very pragmatic. However, some people will think that such an AIOS is more like an aggregated installer without much technical depth. I believe that an AIOS driven by an LLM-based AI kernel is the key infrastructure for people to use "smart apps" (in contrast, today's apps are computational apps), and it realizes the transition from the "PC" (Personal Computing) era to the "PI" (Personal Intelligence) era. Looking back at history, various OSs flourished in the early computing era. As long as we persist, we can seize this historic opportunity!

Basic Theory of AIOS

The premise of the AIOS concept is that we have (or will soon have) an LLM Kernel that has passed the Turing test. We need to truly understand the meaning of passing the Turing test: the LLM Kernel can be seen as a virtual brain. This virtual brain is no different from a natural brain, and it also has some characteristics of the human brain.

  1. It uses its own (common) knowledge to reason intelligently about the input and produce output.
  2. Reasoning is a black-box process; even the authors of an LLM cannot accurately predict the result before inference.
  3. Compared with computation, intelligent reasoning can produce "creative" but inaccurate results. Creativity is an advantage; inaccuracy is a disadvantage.
  4. Just as people have different personalities, the large models trained by different organizations with different methods and datasets will have different personalities. This personality distinction only becomes apparent after training (ex post).

After reaching the above conclusions, we can further derive the basic methods of using this type of AI technology:

  1. The anthropomorphic AI Agent is the main way users will use AI technology in the future. ChatGPT's Agent is defined as a "knowledgeable advisor".
  2. Extending (defining and implementing) AI Agents is the key work of AIOS. The method of definition is very similar to a personal resume, such as "primary school math teacher". The methods of implementing an AI Agent can be divided into three levels, from simple to complex:
    a. Implemented through prompt engineering
    b. Implemented by extending the knowledge base or plugins on top of a general LLM
    c. Implemented by custom-training the LLM with the professional knowledge or skills the AI Agent requires
  3. The input of an AI Agent is natural language (information), and the output can be natural language (information) or precise, computable instructions. (The essence of the programmer's profession is to produce computable instructions that can be run repeatedly.)
  4. Organize different AI Agents into an AI Agent Group to complete complex tasks.

I drew a process diagram to illustrate how an AI Agent Group, guided by the above theory, can complete the development of a small game.

[Figure: development of a small game by an AI Agent Group]

From this process, it can be seen that most users will not need to deal directly with computation in the future. They can interact with AI Agents in a natural way, and the Agents will execute Functions (these Functions can even be written on the spot) when appropriate. From the perspective of operating system design, this is a new UI paradigm.

The Core Goals of AIOS

In order for the above process to work, I have set several key goals for AIOS:

  1. Abstract the LLM interface so that AI Agent developers can ignore the specific details of the LLM in most cases, while making it easy for LLM developers to test whether their work makes existing AI Agents work better.
  2. Schedule and manage LLM execution according to the available computational resources, and allow different AI Agents to run on different LLMs.
  3. Provide a running container for AI Agents that solves the problem of Agent context memory and grants specific Agents the Function permissions needed for their work.
  4. Provide a running container for AI Agent Groups that supports Agents forming a workflow and passing and sharing information.
  5. Provide a running environment for Functions. Functions can be pre-installed by AIOS (written by humans), and AI Agents are allowed to write them directly as needed.
  6. Properly manage the personal data of the user (the owner of AIOS) to make AI Agents more personalized, and to give small-parameter LLMs the opportunity to be customized.

The Basic Structure of AIOS

AIOS is a network operating system built on the LLM kernel, with AI Agent as the main interaction method, and based on the Web3 protocol.

Here is the architecture diagram (very simple)

[Figure: AIOS architecture]

The basic logic of the above architecture:

  1. The AI Agent layer represents the way future intelligent applications will exist. Each instantiation of an AI Agent is equivalent to creating a virtual person in the world. Our AIOS should actively use existing software and protocols to make these AI Agents more accessible.
  2. The goal of the AI Agent scheduling layer is to allow more AI Agents to work better. Running an AI Agent requires computational power and the correct context, and it usually runs on a device.
  3. AI requires a great deal of computing resources, so AIOS is installed on a small cluster composed of user-owned devices. Its minimum scale may be a single high-performance PC; the regular scale may be 3-5 servers (computing, storage, and access-control servers) at the core, connecting all of the user's phones, laptops, and smart home devices; the large scale may be hundreds of servers owned by a small business. Integrating the computing resources of these heterogeneous devices for AI Agents and Functions to use is definitely not an easy task. Therefore, I believe the next generation of operating systems will necessarily be non-standalone network operating systems, and I have done a lot of related research. As engineers in the AI field, you may not have background in this area; the design of the next-generation HTTP protocol and distributed storage systems involves a lot of domain knowledge. If you are interested, you can visit the CYFS repo (https://github.com/buckyos/CYFS) to see our existing results.

I am writing a paper that will more fully explain the entire design, and plan to announce it in the near future.

This is a grand goal, but I think we should still start from the simplest prototype. For example, first build a system that can install AI Agents, allowing developers to simply use prompts to create and publish AI Agents like "Personal Exclusive Home Tutor". After installation, users can add this home tutor as a Telegram friend and learn anytime, anywhere.

2. The Design of the DAO Using SourceDAO

"Open source organizations have a long history and brilliant achievements. Practice has proved that an open source organization can achieve the goal of writing better code only by working in the virtual world. We believe that software development work is very suitable for DAO. We call this DAO for decentralized organizations to jointly develop software as SourceDAO." ---- from the White Paper of CodeDAO(https://www.codedao.ai)

SourceDAO offers a complete design for the DAO-ification of an open source project. After several iterations, the CYFS Core Dev Team has essentially completed the corresponding smart contract implementation. Here I use OpenDAN as an example for a brief introduction; for the detailed design, refer to the white paper above. Due to my background, I have a rather fundamentalist attitude towards open source (I highly agree with GPL and GNU), and the starting point of SourceDAO also comes from Bitcoin's assumption that "man is evil if not restrained". Some designs may be considered extreme, but I believe you will understand.

Basic Operation Process

  1. Create an organization, design goals and initial DANDT distribution, set initial members, and establish the initial Roadmap.
  2. The Roadmap explains the relationship between system maturity and token release: the more mature the system, the more tokens are released. From the perspective of software engineering, the Roadmap outlines the rough plan of the project, dividing it into five stages: PoC, MVP, Alpha, Beta, Formula (Product Release), each of which has a DANDT release plan.
  3. Development as mining: This is the main stage for the DAO organization to achieve its goals. The community must work together to advance the Roadmap to the next stage. The DAO sets plans according to the standard project management process and regularly calculates the contribution value of project participants. After project acceptance, contributors will receive DANDT according to their contribution ratio.
  4. DANDT can also be used for market behavior to increase the popularity of the project. The main incentive principles are to incentivize new users (like engineers who Star us on Github) or to design fission rewards for those who bring new engineers and new users to the project.
  5. Holding DANDT allows participation in DAO governance.
  6. Financing can be carried out based on DANDT to obtain other types of resources for the DAO.

Staff Structure of the DAO Organization

Committee

The committee consists of no fewer than three members, and the number of members must be odd. Members are elected through a major-affairs vote and serve for 12 months (re-election is allowed). The committee is the main body for daily decision-making in the DAO and processes regular DAO affairs by member voting. The committee must organize at least one formal public meeting each quarter to discuss the overall development of the DAO.

Alternate Members

1-3 alternate members (ranked by priority) can be elected under the same conditions. Alternate members can participate in all activities of the committee but do not have voting rights.

Removal of Committee Members

Anyone can initiate a proposal (a major proposal) to remove a committee member. Once the proposal passes, the member immediately loses qualification, and the committee must elect a temporary replacement from the alternate members within 14 days (the removed member's term is inherited). If the committee cannot hold the election (there might be an even number of remaining members), an alternate member is selected based on priority ranking.

Committee members can also resign voluntarily. After the approval of the committee, the resignation takes effect and the member loses qualification.

Secretary-General of the Committee

The Secretary-General must be a member of the committee and is responsible for organizing the committee to work according to the constitution, especially in keeping written records and public work. If other committee members lose their qualifications, the Secretary-General can serve concurrently.

Committee Accountant

The committee appoints an accountant to handle some of the financial affairs in the regular DAO affairs. The term of office is two years. The committee accountant can receive income from the committee's budget package every month upon appointment, has no voting rights, and cannot hold other positions.

Market Leader

The market leader must be a committee member. The market leader is responsible for formulating marketing promotion plans and executing them.

CFO

The CFO must be a committee member. The main job of the CFO is to prepare budgets, design asset custody systems, and propose finance-related proposals. (Note: The CFO and the committee accountant are not the same person, and there is no hierarchical relationship.)

DAN Developer

Any developer who has contributed to the OpenDAN project and has a contribution value of more than 100 automatically qualifies as a DAN Developer (lifetime term). Removal:

  1. Voluntarily declare to quit.
  2. The identity of a DAN Developer can be revoked through a major proposal.

Core Developer

OpenDAN is an open source organization, so engineers are its main members. In a sense, Core Developers are full-time participants in the DAO. They receive a fixed income every two weeks based on their current level and the DAO's financial configuration. The project manager can, in principle, assign tasks to Core Developers. Core Developers can also hold other DAO positions.

Decision-Making Mechanism

Transaction Classification:

DAO transactions are classified into internal project transactions, routine DAO transactions, important transactions, and major transactions. Internal project transactions are decided by the project lead or designated responsible individuals; routine DAO transactions are decided by committee voting; important and major transactions are decided by voting among all DANDT holders. The difference between important and major transactions lies in the minimum voting threshold (the amount of DANDT that must participate in the vote): important transactions require a threshold of 30% of the available DANDT, while major transactions require 40%.

Decision-Making Process:

Except for internal project transactions, all DAO transactions follow this process:

  1. Proposal: Committee members are eligible to initiate all proposals, while non-committee members can initiate proposals by staking DANDT. The required amount of staked DANDT varies depending on the type of transaction.
  2. Proposals can be designed with a voting deadline (not less than 14 days, major proposals not less than 21 days). Once all committee members have voted, routine DAO transactions that require committee voting automatically produce results.
  3. After the voting deadline for a proposal, there are three possible outcomes: approval, rejection, or failure to reach the minimum voting threshold. (If the proposal was initiated by staking, the staked tokens will be returned to the proposer.)
  4. Some proposals are "contract proposals," such as modifying certain contract parameters. Once such a proposal is approved, it will be automatically executed.
  5. For non-contract proposals that are approved, they enter the execution phase. The proposal is then handed over to designated individuals for processing.
  6. After completing the proposal operations, the proposer can mark the proposal as completed.

Project Development Process

OpenDAN is an open-source organization, and the project development process is its primary workflow. The development process within the DAO follows the principle of prioritizing efficiency in the early stages, and stability and fairness in the later stages. At the level of DAO rules, we avoid designing too many detailed rules and instead delegate the implementation of specific tasks to the responsible individuals.

Basic Process:

  1. Project Planning: Based on the roadmap, the committee discusses and plans the projects needed to achieve the goals of the current stage. Project planning is finalized through discussion at committee meetings. After planning, the preparation phase begins, focusing on selecting a project lead (project manager). The project manager must be a Core DAN Developer.
  2. Project Initiation: Once the project manager is determined, they initiate the project. The format of the project initiation document is flexible, but the most important aspects are the design of the project budget pool (contributors receive their mining income from this pool), the project team members, and the project duration.
  3. Discussion and Approval: After the project initiation document is submitted, the committee discusses it and votes for approval. Some projects are considered important projects, requiring DAO voting approval under the rules for important transactions.
  4. Defining the Actual Plan: The project manager defines the actual project plan, including the list of team members and specific task designs. Task designs should incorporate contribution values. Guidelines for contribution value design will be provided in the OpenDAN project management manual.
  5. Project Execution: The project enters the execution phase, aiming to complete all tasks. The project manager marks the project as completed.
  6. Project Acceptance: When the project manager marks the project as completed, it enters the acceptance phase. General projects are accepted by the committee, while important projects are accepted through DAO voting. The acceptance levels are: not approved, poor, satisfactory, and excellent.
  7. Project Team Token Reward: Based on the project's actual budget, the formula for token rewards is: project's actual budget * (individual contribution value / total contribution value). The project's actual budget is determined by multiplying the initial budget by the acceptance level.
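To make the reward formula concrete, here is a small worked example (the numeric multiplier attached to each acceptance level is an assumption; the document does not specify these values):

```python
# Illustrative numbers only.
initial_budget = 10_000            # DANDT in the project budget pool
acceptance_multiplier = 1.0        # assumed multiplier for "satisfactory"
actual_budget = initial_budget * acceptance_multiplier

contributions = {"alice": 60, "bob": 40}   # contribution values
total = sum(contributions.values())

rewards = {name: actual_budget * (c / total) for name, c in contributions.items()}
print(rewards)   # {'alice': 6000.0, 'bob': 4000.0}
```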

I hope that these suggestions will be helpful to you.

Guidelines for Module Leaders and Proposal for Contribution Process

Dear all,

Firstly, I want to express my heartfelt thanks to all those who have shown interest and left comments in the issue about becoming module leaders of the MVP. Your support is invaluable!

Currently, OpenDAN does not have a well-defined contribution process guideline. I believe we need to establish one soon, but it should be a collaborative effort, developed through our collective discussions and experience. From my perspective, I would like potential module leaders to open a separate issue outlining their general module design after applying. This can be viewed as a project proposal, so we can focus all pre-coding discussions for a module in one place. Additionally, this is a necessary step required by the SourceDAO project management contract.

I think a project proposal should include the following sections:

  1. Basic Information: Including the project leader.
  2. Difficulty Level: I have set some levels, but if you have different opinions, please suggest modifications. An objective discussion of difficulty directly relates to the project's expected completion time and expected DAO Token rewards.
  3. Team (optional): If you plan to invite more friends to complete this project together, you can state it here.
  4. Project Goals: Briefly explain the core objectives of the project.
  5. Key Project Processes: You can explain this using interface definitions or pseudocode.
  6. Testing Methods: This is essential. Describe how to write test cases for the module. This section will help everyone build an unambiguous consensus.
  7. Project Architecture Design (optional): Draw important flowcharts, class diagrams, sequence diagrams, etc. I recommend using draw.io for this.

I am currently developing the pkg manager, basic agent manager, and agent template manager components, and I will complete a document as an example.

Somewhere to try it out?

I saw the introduction of your project on YouTube, and it looks very interesting. I briefly read your README; the setup process seems quite complex, with specific hardware requirements. It seems like a massive undertaking. Do you have any simple ways to try it out, like a public Discord group? I noticed that you just released a Discord chatbot.

Enhancement Proposals for AIGC Direction Focusing on Strengthening Single Agent Capabilities

Description:

Our current AIGC workflow, particularly with the story_maker, has ventured into the realm of multi-agent collaboration to tackle intricate problems. However, from the vantage point of delivering genuine end-user value, I firmly believe we should pivot the core direction of AIGC towards amplifying the capabilities of a single Agent.

Here are the key areas and associated tasks that I recommend we focus on:

  1. Image Generation:

    • Integrate with DALL·E 3 by adding a simple text_to_image node.
    • Enhance the single agent that uses SD, essentially replacing a less intuitive WebUI with an LLM-based agent for better SD utilization.
      • Assist users in clarifying their requirements before initiating the drawing process, possibly through interactive keyword prompts.
      • Use image analysis to determine effective construction methods.
      • Guide users towards popular effects, automating processes such as model downloads. This could be our breakthrough.
      • Steer users towards building and using their own Personal LoRA.
  2. Image Editing:

    • There are two approaches to this:
      • Agent-based linguistic control: This approach not only aims at fulfilling traditional image editing needs but also includes advanced features like:
        • Beauty enhancement (Skin retouching, etc.)
        • Automatic exposure adjustments.
        • Even automatic composition.
      • Conventional image editing via WebUI.

The newly released GPT-4V does not yet have an API available, but I think it can be of great help in solving the problems mentioned above.

  3. Voice Generation and Editing:

    • Based on a given text and scenario, produce voice outputs in a specific voice imprint.
      • Train to derive one's own voice imprint, or "lora".
    • Given a voice input (or video), extract its content. An example use-case would be transcribing meeting records and identifying speakers.
    • Real-time translation: Accept voice input and provide translated output. For instance, translating a Chinese speech into English while retaining the original voice imprint.
  4. Sound Editing:

    • Remove background noises.
    • Isolate a particular voice or extract background music (Karaoke mode).

By concentrating our efforts on enhancing a single Agent's capabilities, I believe we can create a more streamlined, user-centric experience. Feedback and additional suggestions are most welcome.

Discussion on the call process for training individual portraits LoRA in AIGC

Goal

AIOS integrates an AI portrait workflow, including personal ID photos, artistic photos, and pictures with hairstyle changes, clothing changes, scene changes, etc.

Basic process

  • Connect to a personal Stable Diffusion instance.
  • Verify the model and extensions of the Stable Diffusion installation.
  • Encapsulate the status information returned by Stable Diffusion in its different states, so users can perceive each step.
  • The AIOS kernel loads a local directory, reads 5 to 10 personal photos, and initiates training of the LoRA model.
  • The LoRA training process may take 10 to 20 minutes, depending on the GPU.
  • Once training is complete, notify the user and await a new input message.
  • The LLM parses the message into a prompt and sends a request to the AIGC model, which can then generate ID photos, artistic photos, pictures with hairstyle changes, clothing changes, etc.
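A rough sketch of this flow as orchestration code; every sd.* call and the progress iterator are hypothetical placeholders for the actual Stable Diffusion and LoRA tooling:

```python
async def train_personal_lora(sd, photos_dir: str, notify) -> None:
    await sd.verify_models_and_extensions()       # check model and extensions
    photos = await sd.load_photos(photos_dir)     # read the user's photos
    assert 5 <= len(photos) <= 10, "need 5 to 10 photos"
    job = await sd.start_lora_training(photos)    # may take 10-20 minutes on GPU
    async for status in job.progress():
        notify(status)                            # let the user perceive each step
    notify("Training complete. Describe the portrait you want to generate.")
```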

Other

More consideration should be given to using the LLM to parse requirements and invoke commands, reducing the number of commands users need to input, so that users only need to state their requirements.

@waterflier @lurenpluto

Some questions; the main one is how the LLM interacts with the system

After reading the Workflow code, I would like to share my understanding and questions to see whether it aligns with the overall design. As I am interested in llm kernel packaging, my focus is primarily on this area.

  1. A role contains an agent (1:1?), and an agent may have multiple chatsessions (1:n). So when is a chatsession created? Is that decided by the owner or by the compute_kernel?
  2. Does each llm call mean that the compute_kernel thinks the task temporarily cannot be split further, or that the task has reached minimum granularity (such as merging results)? And is it up to the compute_kernel to decide when to invoke the llm?
  3. result = await compute_kernel().do_llm_completion(prompt, the_role.agent.get_llm_model_name(), the_role.agent.get_max_token_size())
    Which llm model to use is determined when the agent instance is created and cannot be changed afterwards, and agents and llm instances have an m:n relationship. Is max_token_size an attribute of the agent? What is its relationship to the llm's max_token?
  4. Following point 3, does the compute_kernel need to dynamically create and maintain llm instances at runtime, or do llm instances actively register with the compute_kernel?
  5. Each llm instance should have its own description of capabilities and an interface to query its current status (busy or idle). Most API (service) mode llm capabilities are already fixed; differences in llm capability are mainly reflected in hardware capability (when deployed locally). (This can be optimized later.)
  6. The llm instance has no context; on each call it senses the context only through the prompt passed in by the compute_kernel.
  7. The context of a chatsession may be huge; should the prompt passed to the llm be trimmed?
  8. Does the compute_kernel find an idle llm instance to execute tasks, or does it only submit tasks and let the llm clients queue up by themselves?

Agent Memory & Agent Work Cycle

Agent & Workflow Status (Memory)

There are two types of states in memory: the Agent's own memory and the state within the workflow. The Agent's memory is the core of its identity as an independent entity. Agents with the same factory settings will develop different memories based on their work experiences, ultimately affecting their performance. The focus of their memory tends to be on long-term relationships with people and summaries of similar types of work.

The state within the workflow, on the other hand, embodies the idea that "the names change, but the game remains the same," serving the organization's phased goals. Its way of maintaining memory is more oriented toward completing the current TODOs (tasks).

Currently, I am inclined to implement the above two states in an isomorphic way (Workspace), with the only difference being that the Agent's memory workspace is private, while the state within the workflow is shared by multiple roles. This way, the entire system is still working with the Agent as the main body. The implementation of the workspace is a file system, divided into three main directories: /todos/, /memory/, and /kb/.

Among them, the /todos/ and /kb/ directories support both reading and writing, and the content format is human-friendly. This supports Agents and real people working in the same workspace, so TODOs that an Agent cannot complete can be assigned to people. The /memory/ directory stores the Agent's memory (or the role's work summary within the workflow); it can be read directly but should not be modified.

The main starting point for distinguishing the above two states in design is bionics and personal philosophy. I believe that retaining individual and organizational characteristics can better activate the diversity of the system, have a greater chance of completing tasks, and ultimately achieve AGI (Artificial General Intelligence). Fundamentally, I oppose the concept of an Agent that is "competent in everything and knowledgeable in everything." Individuality is the root of creativity.

Agent Work Cycle

As designed above, achieving all work goals in the system depends on active Agents. By authorizing Agents and limiting their resources, we retain overall control over the system.

Version 0.5.1 of the Agent is completely message-driven and passive: events from the environment must first be converted into messages before they can be processed. Adding the Agent work cycle introduces the following logic:

Agents have wake/sleep logic. Sleep replenishes the Agent's energy.

When awake, the Agent's timer starts working, and the timer-driven cycle includes:

0. ProcessMessage: similar to before, but the Agent extracts TODOs from the processing and adds them to the current Workspace.
1. OnWork: the Agent reads TODOs from the Workspace (which Workspace depends on which Workflows the Agent has joined at the time) and tries to complete them, leaving work logs in the process.
2. Completing TODOs is gradual. The Agent first determines whether a TODO can be completed directly; if not, the task is broken down, and the sub-tasks can be handed over to other Agents (or people).
3. OnThink: the Agent analyzes its messages and work logs and summarizes them to improve ProcessMessage and OnWork.
4. OnLearn: the Agent learns from external requirements (or autonomously) and modifies the Knowledge Base.
5. If an Agent takes on a managerial role in a workflow, it can employ management or economic methods, such as creating or recruiting Agents and posting reward tasks, to solve more complex problems. However, I think this part of the discussion can be saved for future versions.
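A minimal sketch of this timer-driven cycle; the hook names mirror the step names above and are illustrative, not the actual 0.5.x interfaces:

```python
import asyncio

class AgentCycle:
    def __init__(self, agent, tick_seconds: int = 300):
        self.agent = agent
        self.tick_seconds = tick_seconds
        self.awake = True

    async def run(self):
        while True:
            if self.awake:
                await self.agent.process_message()  # extract TODOs into the Workspace
                await self.agent.on_work()          # pick TODOs and try to complete them
                await self.agent.on_think()         # summarize messages and work logs
                await self.agent.on_learn()         # update the Knowledge Base
            # While "asleep", ticks are skipped and energy is replenished.
            await asyncio.sleep(self.tick_seconds)
```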

From a product perspective, the goal of the Agent Work Cycle is:

  1. To give the Agent reasonable initiative (to send messages proactively at appropriate times).
  2. To allow the Agent to complete some tasks directly within its authorization range.
  3. To give the Agent stronger capabilities for in-depth thinking, information integration, and learning (extended search analysis) around specific people or things.

Understanding Agent Loop with Jarvis as an Example

Jarvis is a personal assistant, focusing on:

  1. Managing the owner's schedule and providing necessary reminders.
  2. Acting as a consulting advisor on the owner's current tasks, collecting and organizing relevant real-world information.
  3. Jarvis does not coordinate other Agents; it only teams up with Mia (a more purely knowledge-collection and organization Agent).

OnMessage (Passively Triggered)

Update the "Todo List" if necessary.

OnEvent (Passively Triggered)

For example, events can be obtained from existing calendar services and used to drive the Agent.

OnWork (Every 5 minutes, note the energy consumption)

Read the schedule and provide necessary proactive reminders, especially for periodic reminders described in natural language. Try to complete necessary TODOs at the right time, such as confirming the destination's weather before a trip and proactively suggesting a clothing plan.

OnLearn

  1. Formulate learning plans based on TODOs and work summaries.
  2. Actively use the network to obtain more real-time information according to the learning plan.
  3. Use this real-time information to do a better job as an assistant.

OnThink

Summarize the people and events encountered based on work experience. When the owner has subsequent related tasks, provide more targeted advice.

With the support of the above new facilities, we can implement Agents that solve practical problems (ignoring token cost issues for now):

1. Shopping assistant, can complete research, comparison, and purchasing tasks based on purchasing needs.

2. Form a team of product managers, programmers, testers, and designers to develop more complete and complex software (MetaGPT has already accumulated a lot in this field).

Is there a prebuilt official Jarvis docker image?

Jarvis looks interesting.

I had to build the Docker image manually since there is no prebuilt official Jarvis image. It would be more convenient to have an official prebuilt Docker image for a quick start.

Calling for the module-PMs of the MVP!

Thanks to fiatrete, we can start development of the OpenDAN MVP at the branch: https://github.com/fiatrete/OpenDAN-Personal-AI-OS/tree/MVP

Based on the rules design of the upcoming SourceDAO, next step we need to

  1. Confirm the version leader (I recommend myself).
  2. Call for module-PMs (you will receive OpenDAN DAO Token rewards after the version is released!). The first MVP version is 0.5.1; I have submitted the basic design document, check it here: https://github.com/fiatrete/OpenDAN-Personal-AI-OS/blob/MVP/doc/mvp/mvp%20plan.md
  3. You can apply to become a module-PM in this issue, or discuss how the design (including difficulty design) of certain modules could be more reasonable. When applying, if you're willing, you can introduce yourself a bit~

The following content is from the MVP document (https://github.com/fiatrete/OpenDAN-Personal-AI-OS/blob/MVP/doc/mvp/mvp%20plan.md). I have extracted some important content related to applying for module-PM.

OpenDAN Basic Planning of MVP

0.5.1 Implement data capture into the Knowledge Base (KB) via Spider, followed by access by AI Agent (35%)
0.5.2 Build a Personal-Center based on the KB and associate the AI Agent with accessible Telegram accounts (30%)
0.5.3 Release for waitlist (5%)
0.5.4 First public release (10%)
0.5.5 Incorporate modifications after the first public version, workload depends on feedback (15%)
0.5.6 Official version of MVP (5%)

R&D Process Management

Based on the project management module provided by SourceDAO, we are exploring a new open-source R&D process of "open source is mining".
0. Confirm the Version Leader through committee election.

  1. Module (task division): Divide the system's features into independent modules as much as possible; the development workload of each should be on the order of one person for 2-3 weeks.
  2. Discussion: Discuss whether the module division is reasonable, and design each module's BUDGET based on its difficulty and its importance to the current version (the most important step).
  3. Recruit module PMs. The module PM is responsible for the module's test delivery: completing the planned functions, constructing the basic tests, and passing self-tests. Testing should retain at least 30% of development resources.
  4. For the module they take on, the PM should write and publish a Project Proposal. It contains more detail about module goals + design ideas, participating teams (if any, with a preliminary division of work within the team and calculation of contribution values), and the acceptance plan design.
  5. The PM completes development and self-testing, then marks the module as DONE.
  6. The Version Leader organizes the acceptance of the module (a dedicated acceptor can be appointed).
  7. The Version Leader organizes integration testing according to the completion status, and the module PM fixes bugs. The test results can be used in the nightly channel of OpenDAN.
  8. After the tests pass, the Version Leader announces the version release; anyone can use it from the release channel.
  9. The committee accepts the version based on the release's results. After acceptance, all participants can extract their contribution rewards.

Difficulty is expressed as required engineer level * time (in weeks; less than one week is counted as one week). One week of work time is calculated as 20 hours.

How about we start a DAO?

Hey ALL, have you thought about setting up a DAO for our open-source project? It could be super cool!

With a DAO, we can make decisions in a way that's fair and open to everyone. Members get to vote on important stuff, so it's not all top-down like a traditional company.

Plus, having a clear structure for governance would save us a lot of headaches down the road. We could attract more people to join in and invest, which would be awesome.

It's also a great way to get the community involved and give them a sense of ownership over the project. And because we don't have to wait around for one person to make all the decisions, we can move faster and get things done more efficiently.

And let's not forget about security! With a DAO, we can reduce the risk of fraud and keep everything safe and secure. So what do you think? Shall we give it a go?

[TODO] I suggest that we migrate OpenDAN's DAO implementation to SourceDAO.

Background:
#1
#25

It appears that we have reached a consensus on the following points:

  1. The subsequent open-source community construction of OpenDAN will be driven based on DAO. Fundamentally, this is because DAO, through the token mechanism, can effectively incentivize all contributors while providing a fairer (and more long-term) governance structure.

  2. SourceDAO, developed by the CYFS DAO organization, is designed and implemented specifically for open-source organizations. It is an EVM-based DAO contract that can well support our aforementioned requirements. I suggest choosing the SourceDAO contract as the core contract for the OpenDAN DAO organization.

The SourceDAO contract has been recently open-sourced (admittedly, we pushed the timeline a bit), and can be found here: https://github.com/buckyos/SourceDAO

We have conducted multiple rounds of testing and auditing on the current implementation, but the contract has not been used yet. Therefore, we still need more audits and tests (we would appreciate everyone's help on this).

Given the above, I believe the next steps (TODO) are as follows:

  1. Before formally using this contract, we may need a process to make an official decision. Future decisions can be made through the voting contract of SourceDAO, which allows for fair and open participation by everyone.
  2. I hope everyone will understand the basic design of the SourceDAO contract. Before official activation, we need to prepare some configurations.
  3. We need to develop DAO-related pages on the homepage of www.opendan.ai and connect them with the SourceDAO contract.
  4. As per convention, we need to open source the page code of www.opendan.ai.

I look forward to hearing everyone's feedback~

Draft of the Storage Scheme for the Email Spider

Configuration File Path

The configuration file for the email scraping program is located at rootfs/email/config.toml.

The configuration file includes the following fields:

  • EMAIL_IMAP_SERVER: This field is for the IMAP server of your email service. For example, "imap.gmail.com".
  • EMAIL_ADDRESS: This field is for the email address that you want to scrape. Please replace with your own email address.
  • EMAIL_PASSWORD: This field is for the password of your email account. Please replace with your own password.
  • EMAIL_IMAP_PORT: This field is for the port number of your IMAP server. For Gmail, this is typically 993.
  • LOCAL_DIR: This field is for the local directory where you want to store the scraped emails. For example, 'rootfs/data'.

Please note that you should keep your email address and password confidential and ensure they are securely stored.

File Organization and Storage Scheme for Email Scraping Program

File Storage Path

The scraped email files will be stored in the directory rootfs/data/[email protected]/ by default; this can be changed via the LOCAL_DIR field.

Creation of Email Folders

For each email, its name and time are combined to generate a unique MD5 hash. We then use this hash to create a unique folder to store the corresponding email content.

Email Content Storage

Within each email's folder, we create two files to store the main information of the email:

  • email.txt: This file stores the body content of the email.
  • meta.json: This file stores the header information of the email.

In addition, this folder can also be used to store attachments, images, and other files related to the email.
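A minimal sketch of this layout; the exact hash input ("name and time") follows this draft, and the parameter names are assumptions:

```python
import hashlib
import json
import os

def store_email(local_dir: str, address: str, name: str, time: str,
                body: str, headers: dict) -> str:
    # Unique folder per email: MD5 over its name and time, per the scheme above.
    digest = hashlib.md5(f"{name}{time}".encode("utf-8")).hexdigest()
    folder = os.path.join(local_dir, address, digest)
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, "email.txt"), "w", encoding="utf-8") as f:
        f.write(body)                        # body content
    with open(os.path.join(folder, "meta.json"), "w", encoding="utf-8") as f:
        json.dump(headers, f, indent=2)      # header information
    return folder                            # attachments and images go here too
```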

The above is the file organization and storage scheme for our email scraping program. We welcome your feedback and suggestions so that we can continuously optimize and improve this scheme.

Some issues regarding workflow implementation

1. Code:
result = await ComputeKernel().do_llm_completion(prompt, the_role.agent.get_llm_model_name(), the_role.agent.get_max_token_size())
Question:
ComputeKernel should be a complex scheduling system containing various resources and related interfaces, but here it depends only on the llm. It would be more appropriate to encapsulate an llm interface on top of ComputeKernel.

2. Code:
callchain:CallChain = self._parse_function_call_chain(result)
Question:
In my understanding, shouldn't a function call just be a single, definite call? Why is it a CallChain, and what is the structure of this CallChain?

3. Code:

next_msg:AgentMsg = self._parse_to_msg(result)
if next_msg is not None:
  # TODO: Next Target can be another role in workflow
  next_workflow:Workflow = self.get_workflow(next_msg.get_target())

Question:
Do agents in the same workflow also communicate through send_message? Directly calling the agent's interface would be clearer, and the implementation would be simpler.

4. Code:

inner_chat_session = the_role.agent.get_chat_session(next_msg.get_target(),next_msg.get_session_id())
inner_chat_session.append_post(next_msg)
resp = await next_workflow.send_msg(next_msg)
inner_chat_session.append_recv(resp)

Question:
Shouldn't the state of the session be maintained by the owning Agent or workflow? Why is it obtained and updated by the caller?

5. Should there be a clear interface for creating a session? Only the caller knows whether a new session is needed.

Why not use AutoGPT as the task planning module for Jarvis?

During my use of Jarvis, I found that it only executes one task at a time when planning. However, AutoGPT can break complex issues down into a group of simpler, smaller ones and automatically orchestrate their execution. So I'm wondering: why not directly use AutoGPT as the task planning module for Jarvis, to enhance its ability to handle multiple tasks?

0.5.1 Progress Status of MVP Update and New Plan!

Overview

The core goal of version 0.5.1 is to turn the concept of AIOS into code and get it up and running as quickly as possible. After three weeks of development, our plans have changed somewhat based on the actual progress of the system. Under the guidance of this goal, some components do not need to be fully implemented, while, based on the development experience from several demo intelligent applications, we intend to strengthen others. This document explains these changes and provides an update on the current development progress of the MVP (0.5.1, 0.5.2).

The previous plan, please see here: #29

Progress Status of MVP

  • Each module includes whether the current version goals have been met, the current person in charge, and workload assessment.

  • Modules that are not marked for version 0.5.2 and do not have a designated person in charge are modules for which we are currently recruiting contributors.

  • Modules that have not been completed but already have a designated person in charge are modules that are currently in development.

  • AIOS Kernel

  • #47

    • Kernel Service
      • System-Call Interface, A2
      • Name Service, A4
      • Node Daemon, A2
      • ACL Control, A4
      • Contact Manager, A2
    • Runtime Context (0.5.2), A4
    • Package System, @waterflier, A2+S4
  • AI Compute System, @waterflier, A2

  • Storage System

  • Embedding Pipelines, @photosssa, A2

  • Network Gateway, A6

  • Built-in Services

    • Spider, @alexsunxl, A2
      • E-mail Spider, @alexsunxl, S2
      • Telegram Spider, S2
      • Twitter Spider (0.5.2)
      • Facebook Spider (0.5.2)
    • Agent Message Tunnel (0.5.2), @waterflier, A1
    • Home IoT Environment (0.5.2), A4
      • Compatible with Home Assistant (0.5.2), A4
  • Built-in Agents/Apps

  • UI

    • CLI UI (aios_shell), @waterflier, S2
    • Web UI (0.5.2), A4+S4
  • 0.5.1 Integration Test (Senior*3)

  • OpenDAN DAO Website (alpha)

  • SDK

    • Workflow SDK, @waterflier, A2
    • AI Environments SDK (0.5.2), A2
    • Compute Kernel SDK (0.5.2), A2
  • Document (>0.5.2)

    • System design document, including the design document of each subsystem
    • Installation/use document for end users
    • SDK document for developers

The following is the introduction of the adjustment of each component after the current implementation.

AIOS Kernel

This section defines some of the important basic concepts of intelligent applications running on OpenDAN.

Agent

Agent is the core concept of the system, created through an appropriate LLM, prompt words, and memory. Agents support our vision of a new relationship between humans and computation in the future:

Human <-> Agent <-> Compute 

Agents form the basis of future intelligent applications. From the user's perspective, the strength of an AIOS is primarily determined by how many agents with different capabilities it has.
The above process has now been implemented. In practice, I found a key issue: we need to continuously seek the optimal way to construct prompts. This issue directly relates to how OpenDAN application developers build intelligent applications, so I think it has a high priority.

Optimization of system prompts

The goal is to allow Agents, via prompt words, to communicate with other Agents (forming a team), call Functions at the right time, and read/write state through the environment at the right time. The existing implementation is typically:

When you decide to communicate with a work group, please use : sendmsg(group_name, content).

Our optimization direction is:

  1. To allow Agents to initiate calls accurately.
  2. To use as few of the precious prompt tokens as possible.

If there are already systematic studies in this field, introductions are also welcome!
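As a starting point, one token-frugal pattern is to render each available action as a single short, parseable line; this is only an illustration of the direction, not the current implementation:

```python
# Hypothetical action registry: one compact line per callable action.
ACTIONS = {
    "sendmsg": "sendmsg(group_name, content): message a work group",
    "post_todo": "post_todo(title, detail): add a TODO to the workspace",
}

def build_system_prompt(role_desc: str) -> str:
    # One line per action keeps the prompt cheap while staying parseable.
    lines = [role_desc, "You may call:"]
    lines.extend(f"- {usage}" for usage in ACTIONS.values())
    return "\n".join(lines)

print(build_system_prompt("You are Jarvis, a personal assistant."))
```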

Workflow

Workflow realizes the concept of multiple Agents playing different roles and collaboratively solving complex problems within an organization. It is also the main form of intelligent applications on OpenDAN. Compared to a single Agent, a team composed of Agents can effectively address three inherent problems of LLMs:

  1. The prompt (context) window will grow, but it will remain limited for a long time.
  2. Like humans, Agents trained on different corpora and algorithms will have different personalities and will excel in different roles.
  3. The inference results of an LLM are not fully controllable, so accuracy cannot be guaranteed. Just as humans make mistakes, the collaboration of multiple Agents is needed to improve accuracy.

The basic framework of Workflow has been completed (which is also the core of version 0.5.1). Following the subsequent SDK documentation, we now have a basic framework for third-party developers to develop applications on OpenDAN.

AI Environments

Environments provide an abstraction for AI Agents to access the real world.

Environments include properties, events, and methods (Env.Function), and come with natural-language descriptions that Agents can understand. This allows an AI Agent to understand the current environment and when to access it. For example, an Agent planning a trip needs to know the actual future weather conditions at the destination to make the right decisions; this weather information is provided to the Agent through an Environment.

The events in Environments also provide the logic for the autonomous work of Agents. For example, an Agent can track changes in the user's schedule and date, automatically helping the user plan and track the day's itinerary.
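To make the abstraction concrete, here is a sketch of what an Environment could look like, with properties, events, and functions each carrying a natural-language description; the names are illustrative and may differ from OpenDAN's actual interfaces:

```python
from typing import Callable, Dict, List

class Environment:
    def __init__(self, description: str):
        self.description = description               # summary the Agent can read
        self.properties: Dict[str, object] = {}      # e.g. "weather" -> "sunny"
        self.functions: Dict[str, Callable] = {}
        self._handlers: Dict[str, List[Callable]] = {}

    def add_function(self, name: str, desc: str, fn: Callable) -> None:
        # The description is what lets the Agent decide when to call this.
        self.functions[name] = fn
        self.description += f"\n- {name}: {desc}"

    def on_event(self, event: str, handler: Callable) -> None:
        self._handlers.setdefault(event, []).append(handler)

    def fire_event(self, event: str, payload) -> None:
        # E.g. a schedule change that prompts an Agent to re-plan the day.
        for handler in self._handlers.get(event, []):
            handler(payload)
```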

Calendar Environment

The system's default environment, which can access the current time, the user's schedule, and the weather at a specific location. It also contains some important basic user information, including home and office addresses.

Compatible with common calendar services. (0.5.2)

  • Microsoft Outlook Calendar
  • Google Calendar
  • Apple Calendar

Workspace Environment (0.5.2)

A file system-based workspace environment that allows the Agent to read/write files at appropriate times.

AI Functions

Function is a core concept of AIOS: the Agent is given descriptions of suitable callable Functions, allowing it to invoke them at the right time. Through Functions, the Agent gains "execution power," rather than being merely an advisor that only offers suggestions. The Function framework allows third-party developers to develop and publish Functions, and supports Agents and Workflows in maintaining a list of available Functions, from which appropriate prompts can be built so that the Agent invokes Functions at the right time.

Under development.

Basic AI Functions (0.5.2)

There are already a plethora of basic services in the world, such as querying the weather at a specific location and time, checking hotel prices, or booking plane tickets. The system should separate the definition and implementation of basic (generic) Functions, allowing Agent developers to cover common scenarios with generic logic. Defining generic Functions is, in effect, standards-setting work.

I know that many other projects have done a lot of work in this field, and ChatGPT also has dedicated function support. What we need to do is to find the open standards that are closest to our goals and then integrate them.

AI BUS

The AI BUS connects the various conceptual entities of OpenDAN. For example, if Agent A wants to send a message to Agent B and wait for the processing result, it can simply use the AI BUS:

resp = await AIBus.send_msg(agentA, agentB, msg)

The abstraction of the AI BUS allows different Agents to run on whichever physical hosts suit the system's needs. This is also why we define AIOS as a "Network OS". All entities registered on the AI BUS can be accessed via the AI BUS interface. As needed, we will also persist the messages on the AI BUS, so that when the distributed system experiences the inevitable failures, it can resume work after being restarted.

The concept of AI BUS has many similarities with traditional MessageQueues.
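A sketch of how the send/reply pattern with persistence might be structured; the store interface and the queue-per-entity layout are assumptions:

```python
import asyncio
from typing import Dict

class AIBus:
    def __init__(self, store):
        self.store = store                              # persistent message log
        self.inboxes: Dict[str, asyncio.Queue] = {}

    def register(self, name: str) -> None:
        self.inboxes[name] = asyncio.Queue()

    async def send_msg(self, sender: str, target: str, msg) -> object:
        self.store.append((sender, target, msg))        # persist before delivery
        reply = asyncio.get_running_loop().create_future()
        await self.inboxes[target].put((sender, msg, reply))
        return await reply                              # wait for the handler's result

    async def serve(self, name: str, handler) -> None:
        # Each registered entity drains its inbox on whatever host it runs on.
        while True:
            sender, msg, reply = await self.inboxes[name].get()
            reply.set_result(await handler(sender, msg))
```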

Chatsession

Intuitively, a ChatSession saves the "chat history", which is currently the natural source of the Agent's memory capability.
A ChatSession is identified by three key attributes: Owner, Topic, and Remote. An operation where A sends a message to B and gets B's reply generates two messages, saved in two different ChatSessions.

Currently, ChatSession is stored in sqlite. After the Zone-level distributed RDB is set up in the future, it will be migrated there.
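Since each side of a conversation keeps its own session, a plausible sqlite layout keyed by (owner, topic, remote) might look like this; the table and column names are assumptions, not the actual schema:

```python
import sqlite3

conn = sqlite3.connect("chatsession.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS chat_message (
    owner   TEXT NOT NULL,   -- whose ChatSession this row belongs to
    topic   TEXT NOT NULL,
    remote  TEXT NOT NULL,   -- the other party in the session
    sender  TEXT NOT NULL,
    content TEXT NOT NULL,
    ts      REAL NOT NULL
)""")
conn.execute("""CREATE INDEX IF NOT EXISTS idx_session
                ON chat_message (owner, topic, remote, ts)""")
conn.commit()
# A message from A to B plus B's reply yields rows in two sessions:
# one with owner=A, remote=B, and one with owner=B, remote=A.
```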

Knowledge Base

  • Provide a unified interface and support switching the vector database kernel.
  • Integrate an open source vector database (pay attention to license selection).
  • When designing the interface, prepare for future access control.

Under development.

Personal Models (>0.5.2)

The goal of this subsystem is to support users in training models based on their own data, including subsequent usage, management, deployment, and other operations of the model. In the early stages, invoking this module and adding new models should be operations performed by advanced users.

It is still uncertain whether this module will be actively used in intelligent applications.

Frame Services

The implementation offers a range of fundamental services for traditional Network OS. It connects users' devices to the same Zone via the network and provides a unified abstraction for application access. This component serves as a basic framework and computing resource for the operation of intelligent applications on the upper layer. On the lower layer, it connects various types of hardware through different protocols, integrates resources, and offers a unified abstraction for intelligent applications to access.

Kernel Service (0.5.2)

The Kernel Service implements the System Calls for OpenDAN and provides a "kernel mode" abstraction. In version 0.5.1, since this component is not yet implemented, all code—whether system services or application code—runs in kernel mode.

In the future, we plan to maintain the system running in this mode for an extended period, as it facilitates debugging.

The Kernel Service is mainly composed of the following components:

System-Call Interface

Centralizes the provision and management of system call interfaces.

Name Service

It is the most crucial foundational state service in a cluster (Zone) comprised of all the user's devices. As the core service of the Zone, it provides the most basic guarantee for the availability and reliability of all services within the Zone. When a user needs to restore the Zone from a backup, the Name Service is the first service to be restored.

Its functionality is similar to that of etcd but includes an on-chain component. From a deployment standpoint, it needs to be operationally optimized for small clusters made up of consumer-grade user devices.

Node Daemon

It is a foundational service that runs on all devices that join the Zone, responding to essential kernel scheduling commands. It adjusts the services and data running on that particular device.

ACL Control (>0.5.2)

Another essential foundational service of the kernel, it is responsible for the overall management of permissions related to users, applications, and data. The Runtime Context reads the relevant information and implements proper isolation.

Contact Manager

From the perspectives of permission control and some early application scenarios, understanding the user's basic interpersonal relationships is an important component of OpenDAN's intelligent permission system. Therefore, we provide a contact management component at the system kernel layer. This component can be considered an upgraded version of the traditional operating system's "User Group" module.

Runtime Context (0.5.2)

It serves as the runtime container for user-mode code, offering isolation guarantees for user-mode code.

Depending on the type of service, we offer three different Runtime Contexts. The most commonly used is Docker, followed by virtual machines, and finally, entire physical machines.

Package System

The Package Manager is a fundamental system component for managing Packages. The subsystem provides support for packaging, publishing, downloading, verifying, installing, and loading package folders in different scenarios. Based on these modules, it is easy to build a package management system similar to apt/pip/npm.

The design draws heavily on Git and NDN networks. The distinction between client and server is deliberately blurred: cryptography provides decentralized, trustworthy verification, and any client can become a valid repo server through simple configuration.

Based on the Package System, we can implement the publishing, downloading, and installation of extendable foundational entities such as Agents, Functions, and Environments. This enables the creation of an app store on OpenDAN.

Under development.
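
To illustrate the "trustworthy verification regardless of who serves the package" idea, here is a minimal content-hash sketch in the spirit of the Git/NDN-inspired design described above. The manifest format is a made-up example, not OpenDAN's real package format.

```python
import hashlib
import json

# Illustrative only: content-hash verification in the spirit of the Package
# System's Git/NDN-inspired design. OpenDAN's real package format may differ.
def package_digest(files: dict) -> str:
    """Hash a manifest of (path -> file hash), so any client can verify a
    package against its published digest, regardless of which repo served it."""
    manifest = {path: hashlib.sha256(data).hexdigest()
                for path, data in sorted(files.items())}
    canonical = json.dumps(manifest, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

# Hypothetical package contents (file names are placeholders).
files = {"agent.toml": b"[agent]\nname = 'demo'\n",
         "prompt.txt": b"You are a helpful assistant."}
digest = package_digest(files)
# Same content -> same digest, no matter which server delivered it.
assert package_digest(files) == digest
```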

AI Compute System

The purpose of the Compute System is to let users use their computational resources more efficiently. These resources can come from devices they own (such as workstations and gaming laptops), as well as from cloud computing and decentralized computing networks.

compute_kernel

The interface of this component is designed from the perspective of the model user rather than the model trainer. The basic form of its interface is:

compute_kernel.do_compute(function_name, model_name, args)
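
A minimal sketch of what sits behind that call, under the assumption that a scheduler picks the node; the class shape and method names here are illustrative, not OpenDAN's actual implementation.

```python
# Hypothetical sketch of the do_compute entry point; OpenDAN's actual
# compute_kernel may differ.
class ComputeKernel:
    def __init__(self, scheduler):
        self.scheduler = scheduler

    def do_compute(self, function_name: str, model_name: str, args: dict):
        # The caller thinks in terms of "run this function with this model";
        # where it runs (local GPU, cloud, decentralized network) is the
        # scheduler's concern, not the caller's.
        node = self.scheduler.pick_node(function_name, model_name)
        return node.run(function_name, model_name, args)
```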

Scheduler

The goal of the Scheduler component is to select an appropriate ComputeNode for each task in the task queue, based on the known (possibly stale) status of all ComputeNodes. In the current version (0.5.1), the Scheduler's implementation only aims to get the system running. In the next version (0.5.2), the overall framework for computing-resource scheduling will be established.
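
A sketch of what such a placeholder scheduler could look like, matching the 0.5.1 "just get it running" goal; the node interface (`supports`) is an assumption.

```python
import random

# Hypothetical placeholder: pick any node advertising the needed
# (function, model) pair. Node status may be stale, as noted above.
class Scheduler:
    def __init__(self, nodes):
        self.nodes = nodes  # known ComputeNodes

    def pick_node(self, function_name: str, model_name: str):
        capable = [n for n in self.nodes
                   if n.supports(function_name, model_name)]
        if not capable:
            raise RuntimeError(f"no ComputeNode for {function_name}/{model_name}")
        # 0.5.2 would weigh load, latency, and node status; for now a
        # random capable node keeps the system running.
        return random.choice(capable)
```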

LLM

LLM support is the system's most central functionality. OpenDAN requires at least one available LLM compute node in the system. The supported interface is as follows:

def llm_completion(self, prompt: AgentPrompt, mode_name: Optional[str] = None, max_token: int = 0):
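
A hypothetical call site for this interface; the `AgentPrompt` construction is simplified here, and `compute_kernel` stands in for whichever object exposes the method.

```python
# Hypothetical usage of the documented interface above.
prompt = AgentPrompt("Summarize today's unread emails in three bullet points.")
result = compute_kernel.llm_completion(prompt, mode_name="gpt-4", max_token=512)
print(result)
```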

In the current era, many teams are working hard to develop new LLMs. We will actively integrate these LLMs into OpenDAN:

  • GPT4 (Cloud)
  • LLaMa2 (under development)
  • Claude2
  • Falcon2
  • MPT-7B
  • Vicuna

Embedding

Provides computational support for the vectorization of different types of user data. The specific algorithms supported depend on the requirements of the entire pipeline.

Under development.

Txt2img

Generate images based on text descriptions. Depending on the deployment mode, we can interface with either a cloud-based or a local implementation.

The local implementation will definitely use Stable Diffusion.

Under development.

Img2txt (>0.5.2)

Generate appropriate text descriptions for the specified images.

Txt2voice

Generate voice based on specified text, using a selected model (the focus is on personal models), and guided by certain emotional cue words.

To be developed

Voice2txt

Extract text information from a segment of audio (or video) through speech recognition.

To be developed

Language Translate

Translate a segment of text into a specified target language.

Since LLMs are themselves built on translation-related foundations, I am still considering whether a dedicated text-translation interface is needed within the compute kernel. Following the principle of not adding entities unless necessary, its development can be postponed.

Pending.

Storage System

The file system (state storage) has always been a critical part of operating systems. Its implementation directly impacts the system's reliability and performance. The challenge of this section is how to transfer key technologies that are already mature in traditional cloud computing to low-maintenance clusters composed of consumer-level electronic devices, while still maintaining sufficient reliability and performance. There is no ceiling on how far the implementations of these subsystems can be improved. Therefore, I believe OpenDAN's early focus here should be on establishing stable interfaces and getting the system running as quickly as possible, leaving implementation improvements to evolve independently later.

From the standpoint of trade-offs, our priorities are:

  • Abandoning continuous consistency guarantees: the system provides strong reliability assurance only up to "backup points." This means we accept losing some newly added data if the system fails.

  • Allowing downtime: given consumer-level power supplies, a short period of system unavailability has little impact, and we can stop services for backup/migration when necessary.

DFS

Distributed file system, combining the public storage space on all devices to form a highly reliable, highly available file system.

Object Storage

Distributed object storage; based on MapObject, it implements trustworthy RootState management.

(MapObject and RootState are concepts from CYFS.)

Under development.

D-RDB

Distributed relational database, providing highly reliable and highly available relational database services (mainly used for OLTP - Online Transaction Processing). We do not encourage application developers to use RDB on a large scale; the main reason for offering this component is for compatibility considerations.

Pending.

D-VDB

Distributed vector database, which currently appears to be the core foundational component of the Knowledge Base library.

Under development.

Embedding Pipelines

Read the appropriate raw files and metadata from the specified location in the Storage System; after passing through a series of embedding pipelines, save the results to the vector database defined by the Knowledge Base.

Under development.
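
A minimal sketch of that pipeline shape, with stand-in components; the chunking strategy and function names are assumptions, not OpenDAN's actual pipeline.

```python
from typing import Callable, Iterable, List, Tuple

def split_into_chunks(text: str, size: int = 1000) -> List[str]:
    # Naive fixed-size chunking; a real pipeline would respect sentence
    # boundaries and the embedding model's input limits.
    return [text[i:i + size] for i in range(0, len(text), size)]

def run_embedding_pipeline(
    scan: Iterable[Tuple[str, dict]],                  # (raw text, metadata) pairs
    embed: Callable[[str], List[float]],               # any embedding model
    upsert: Callable[[str, List[float], dict], None],  # Knowledge Base sink
) -> None:
    """Illustrative shape: Storage System -> embedding -> vector database."""
    for text, meta in scan:
        for chunk in split_into_chunks(text):
            upsert(chunk, embed(chunk), meta)

# Example wiring with stand-in components:
docs = [("OpenDAN keeps your data local.", {"source": "readme"})]
run_embedding_pipeline(docs, embed=lambda t: [float(len(t))],
                       upsert=lambda c, v, m: print(c[:30], v, m))
```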

Network Gateway (0.5.2)

  • Obtains user data by recognizing network traffic.
  • Provides the external access entrance for the entire system, so access control can be unified.
  • Provides the bus abstraction of a network operating system (the network cable is the bus): devices within the Zone are recognized as plug-and-play devices and can be called by applications/Agents.

NDN Client

AI-related models are all quite large, so we offer a download tool based on NDN (Named Data Networking) theory to replace curl. The NDN Client will continue to support new content-based protocols, allowing OpenDAN developers to publish large packages faster, at lower cost, and more conveniently.
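
The core NDN idea is that content is named and verified by its hash, so chunks of a large model can come from any peer or mirror, trusted or not. A minimal sketch of that idea follows; the function names and dict-based "mirrors" are illustrative assumptions.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Illustrative NDN-style idea only: fetch by content name, verify by hash.
def fetch_named_chunks(chunk_names: list, sources: list) -> bytes:
    parts = []
    for name in chunk_names:
        for source in sources:              # try peers/mirrors in turn
            data = source.get(name)         # fetch by content name, not URL
            if data is not None and sha256_hex(data) == name:
                parts.append(data)          # verified against its own name
                break
        else:
            raise IOError(f"chunk {name} unavailable from all sources")
    return b"".join(parts)

# Example: two partial "mirrors" (plain dicts standing in for peers).
chunks = [b"model-part-1", b"model-part-2"]
names = [sha256_hex(c) for c in chunks]
mirror_a, mirror_b = {names[0]: chunks[0]}, {names[1]: chunks[1]}
assert fetch_named_chunks(names, [mirror_a, mirror_b]) == b"model-part-1model-part-2"
```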

Built-in Services

The system's basic functions implemented in "user mode"; these can be regarded as the system's pre-installed applications, giving it basic usability before any intelligent applications are installed.
We should build built-in applications for one or two early preset scenarios rather than for all possible scenarios. This gets the system running sooner and surfaces its shortcomings faster, so we can improve it faster.

Spider

A set of spiders (crawlers) is provided to help users import their data into the system.

E-mail Spider

The most basic spider, used to capture the user's email data. Its main purpose is to settle the general record format (including text, images, and contacts) and the location where scraped data is saved.
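
A minimal sketch of what such a spider could do, using only the Python standard library; the host, credentials, and the record fields are placeholders, and OpenDAN's actual E-mail Spider and storage layout may differ.

```python
import email
import imaplib

# Hypothetical sketch; host/credentials are placeholders.
def fetch_recent_mail(host: str, user: str, password: str, limit: int = 10):
    with imaplib.IMAP4_SSL(host) as imap:
        imap.login(user, password)
        imap.select("INBOX")
        _, data = imap.search(None, "ALL")
        for num in data[0].split()[-limit:]:
            _, msg_data = imap.fetch(num, "(RFC822)")
            msg = email.message_from_bytes(msg_data[0][1])
            # Settle on a general record format: sender (contact), subject
            # and body (text); attachments (images etc.) would follow.
            yield {
                "from": msg.get("From"),
                "subject": msg.get("Subject"),
                "date": msg.get("Date"),
            }
```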

Telegram Spider

Allows users to capture their own Telegram chat records and save them in the Knowledge Base.

To be developed.

Twitter Spider (0.5.2)

Allows users to scrape their own Twitter data and save it in the Knowledge Base.

Facebook Spider (0.5.2)

Allows users to scrape their own Facebook data and save it in the Knowledge Base.

Agent Message Tunnel (0.5.2)

The original ROBOT module was renamed the Agent Message Tunnel after we reconsidered its actual function.
This is a default capability of the system: users can configure different message channels for different Agents/Workflows, so that they can interact with an Agent/Workflow through existing software and services. From a product perspective, the module has two goals. First, it lets users reach OpenDAN's core functions without installing any new software. Second, it builds a stronger mental model for users: my Agent can register its own social accounts, giving it an identity of its own in the virtual world.
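
Each concrete tunnel (E-mail, Telegram, Discord) adapts one external service to the same small surface. A hypothetical sketch of that surface follows; OpenDAN's real tunnel API may differ.

```python
from abc import ABC, abstractmethod

# Hypothetical shape of a tunnel; OpenDAN's real Agent Message Tunnel
# interface may differ.
class AgentMessageTunnel(ABC):
    def __init__(self, agent_id: str):
        self.agent_id = agent_id  # which Agent/Workflow this channel feeds

    @abstractmethod
    async def start(self) -> None:
        """Connect to the external service with the Agent's own account."""

    @abstractmethod
    async def post_message(self, target: str, text: str) -> None:
        """Deliver the Agent's reply back through the external service."""
```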

E-mail Tunnel

Gives an Agent its own email account. Once registered, users can interact with the Agent through email.

Telegram Tunnel

Gives an Agent its own Telegram account. Once registered, users can interact with the Agent through Telegram.

Discord Tunnel

Gives an Agent its own Discord account. Once registered, users can interact with the Agent through Discord.

Home IoT Environment (0.5.2)

We've implemented a significant built-in environment: the Home Environment. Through this environment, the AI Agent can access real-time status of the home via installed IoT devices, including reading temperature and humidity information, accessing security camera data, and controlling smart devices in the home. This allows users to better manage a large number of smart IoT devices through AI technology. For instance, a user can simply tell the Agent, "Richard is coming over to watch a movie this afternoon," and the AI Agent will automatically read the security camera data, recognize Richard upon arrival, turn on the home projector, close the curtains, and turn on the wall lights.

Thanks to LLM's powerful natural language understanding, all we need to do is connect a smart microphone to the Home Environment and configure a simple voice-to-text feature. This makes it easy to implement a privately deployed and very intelligent version of Alexa.

In terms of system design, we use the Home Environment as an intermediary layer, freeing OpenDAN from having to spend energy on dealing with compatibility issues with various existing, complex IoT protocols. This keeps the system simple and makes it easier to expand.

Compatibility with Home Assistant

Home Assistant is a well-known, open-source IoT system. We could consider implementing the Home Environment based on the Home Assistant's API.
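
As a rough illustration of what that could look like, here is a sketch that drives Home Assistant's REST API; the host, token, and entity IDs are placeholders, and this is not OpenDAN's actual Home Environment code.

```python
import requests

# Placeholders: point these at a real Home Assistant instance and a
# long-lived access token.
HA_URL = "http://homeassistant.local:8123"
HEADERS = {"Authorization": "Bearer YOUR_LONG_LIVED_TOKEN",
           "Content-Type": "application/json"}

def read_state(entity_id: str) -> dict:
    """Read a device's current state, e.g. a temperature sensor."""
    r = requests.get(f"{HA_URL}/api/states/{entity_id}", headers=HEADERS)
    r.raise_for_status()
    return r.json()

def call_service(domain: str, service: str, entity_id: str) -> None:
    """Invoke a Home Assistant service, e.g. turning on a light."""
    r = requests.post(f"{HA_URL}/api/services/{domain}/{service}",
                      headers=HEADERS, json={"entity_id": entity_id})
    r.raise_for_status()

# "Richard is coming over to watch a movie": the Agent could translate that
# intent into concrete service calls like these (entity IDs are made up).
# call_service("light", "turn_on", "light.wall_lights")
# call_service("cover", "close_cover", "cover.living_room_curtains")
```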

Built-in Agents/Apps

Once users have installed OpenDAN, it should have some basic functionalities, even without the installation of any third-party smart applications. These basic functions are provided via built-in Agents/Applications. Built-in applications have two important implications for OpenDAN:

  1. They provide a developer's perspective to scrutinize whether our design is reasonable and the application development process is smooth.
  2. Through one or two scenarios, OpenDAN can be quickly put into use by real users in a production environment, and these scenarios can serve as a basis for driving system improvements in OpenDAN.

Agent: Personal Information Assistant

Through interacting with this Agent, users can use natural language to query information that has already been saved in the Knowledge-Base. For example, "Please show me the photos from my meeting with Richard last week." They can also find their information more accurately based on some interactive questions.

To be developed.

Agent: Butler Jarvis (0.5.2)

The Butler Agent Jarvis can recognize certain special commands. Through these commands, it can communicate with other Agents in the system, check the system's status, and use all the system's functionalities. It can be seen as another entry point to AIOS_Shell.

Another important function of Jarvis is creating sessions. When a user has many workflows/agents installed on their OpenDAN, they might not know which one to talk to in order to solve a problem. I envision the future mode as: "If you don't know who to turn to, ask Jarvis." Jarvis will create or find a suitable session based on a brief conversation with the user, and then guide the user into it.

Based on these two functions, Jarvis may be the only "special Agent" among all Agents that requires custom development, and it is part of the system.

App: Personal Station (0.5.2)

Personal Station is a built-in web application, accessible through a browser, that gives users a graphical interface to the system. It is the first application users see after installing OpenDAN, offering a simple way to interact with the system and a way to install new applications.

The main functions of Personal Station include:

  1. Library, with the help of Personal Information Assistant, you can better manage your own photos, videos, music, documents, etc., and share them with friends more effectively. (For example, ask the assistant to share photos from an event, selecting from those you've starred, and distribute them to friends based on the people appearing in the photos.)
  2. HomePage, with functions similar to Facebook/Twitter, where you can post content you want to share. You can also open your Agent to friends and family, allowing them to interact with your Agent, discuss schedule arrangements, and query your KnowledgeBase for open content.

Personal Station is a mobile-first WebApp.

UI

CLI UI (aios_shell)

The system provides the command-line UI first, aimed at developers and early advanced users.

Web UI (0.5.2)

A Web UI for end users.

0.5.1 Integration Test (Senior*3)

The test can be divided into 3 parts:

  1. Workflow -> AI Agent -> AI Agent
  2. Spider -> Pipeline -> Knowledge Base
  3. AI Agent <- Functions <- Knowledge Base

SDK

Workflow SDK

This SDK allows developers to add new workflows/agents to the system.
The first, most rudimentary version of the SDK is complete: by writing a .toml file under ROOTFS/ according to the directory structure, a new workflow/agent can be added to the system.

AI Environments SDK (>0.5.2)

This SDK allows developers to extend what AI can call within the system, including:

  • Adding new environments
  • Adding new functions

Compute Kernel SDK (>0.5.2)

This SDK allows developers to add more core computing capabilities to the system.

Document (>0.5.2)

When we release 0.5.3, we must complete at least three documents:

  1. OpenDAN's complete system design document, including the design documents for each subsystem.
  2. An installation/usage document for end users.
  3. An SDK document for developers.

Agent Jarvis has been refactored to use ChatGPT's function calling

Check it out and have a play!

The introduction of function calling in the ChatGPT 0613 update has resulted in a significant productivity boost. I'm confident that many products built on ChatGPT will undergo major transformations as a result. Refactoring Jarvis with functions was a seamless process, and I anticipate this will become the standard approach for developing Agents based on LLMs. I look forward to other open-source LLMs catching up quickly.
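
For readers unfamiliar with the pattern, here is a minimal sketch of function calling in the pre-1.0 OpenAI Python SDK of that era; the `query_schedule` schema is a made-up example, not Jarvis's actual tool set.

```python
import json
import openai  # pre-1.0 SDK style, matching the 2023 "0613"-era API

# Made-up example schema -- not Jarvis's actual function set.
functions = [{
    "name": "query_schedule",
    "description": "Look up the user's calendar events for a given day",
    "parameters": {
        "type": "object",
        "properties": {"date": {"type": "string", "description": "YYYY-MM-DD"}},
        "required": ["date"],
    },
}]

resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "What's on my calendar tomorrow?"}],
    functions=functions,
    function_call="auto",  # the model decides: answer directly or call a tool
)

msg = resp["choices"][0]["message"]
if msg.get("function_call"):
    # The model returns the chosen function name plus JSON-encoded arguments,
    # rather than free text -- this is the formatted return discussed below.
    args = json.loads(msg["function_call"]["arguments"])
    print("model wants:", msg["function_call"]["name"], args)
```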

Post-refactoring, Jarvis has exhibited a notable improvement in response speed. Its selection of functional modules has become more accurate and stable, with fewer errors. Interestingly, token consumption has also decreased. This could likely be attributed to OpenAI employing a specialized method for compressing function schemas.

This is just the first version of the functions feature. Currently, ChatGPT can only choose between formatted returns and natural language returns. It would be even better if natural language descriptions could be included alongside formatted returns.

On a broader scale, the functions feature has simplified the development of Agents. However, this convenience ties developers ever more closely to OpenAI. I hope that open-source LLMs will keep pace to avoid a "Big Brother" situation in the AI world.
