
mem0's Introduction

Mem0 - The Memory Layer for Personalized AI

Learn more · Join Discord

Mem0 Discord Mem0 PyPI - Downloads Y Combinator S24

Introduction

Mem0 (pronounced as "mem-zero") enhances AI assistants and agents with an intelligent memory layer, enabling personalized AI interactions. Mem0 remembers user preferences, adapts to individual needs, and continuously improves over time, making it ideal for customer support chatbots, AI assistants, and autonomous systems.

Graph Memory Integration New Feature: Introducing Graph Memory. Check out our documentation.

Core Features

  • Multi-Level Memory: User, Session, and AI Agent memory retention
  • Adaptive Personalization: Continuous improvement based on interactions
  • Developer-Friendly API: Simple integration into various applications
  • Cross-Platform Consistency: Uniform behavior across devices
  • Managed Service: Hassle-free hosted solution

How does Mem0 work?

Mem0 leverages a hybrid database approach to manage and retrieve long-term memories for AI agents and assistants. Each memory is associated with a unique identifier, such as a user ID or agent ID, allowing Mem0 to organize and access memories specific to an individual or context.

When a message is added to Mem0 using the add() method, the system extracts relevant facts and preferences and stores them across multiple data stores: a vector database, a key-value database, and a graph database. This hybrid approach ensures that each type of information is stored in the most efficient manner, making subsequent searches quick and effective.

When an AI agent or LLM needs to recall memories, it calls the search() method. Mem0 then performs a search across these data stores, retrieving relevant information from each source. The results are passed through a scoring layer that evaluates them based on relevance, importance, and recency, ensuring that only the most personalized and useful context is surfaced.
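A minimal sketch of how such a scoring layer might combine those signals (the weights, field names, and decay curve here are illustrative assumptions, not Mem0's actual implementation):

```python
import time

def score_memory(memory, now=None, w_relevance=0.6, w_importance=0.2, w_recency=0.2):
    """Combine relevance, importance, and recency into one score.

    `memory` is assumed to carry a similarity score from the vector
    search, an importance weight, and a last-updated timestamp
    (illustrative fields, not Mem0's real schema).
    """
    now = now or time.time()
    age_days = (now - memory["updated_at"]) / 86400
    recency = 1.0 / (1.0 + age_days)  # older memories decay toward 0
    return (w_relevance * memory["relevance"]
            + w_importance * memory["importance"]
            + w_recency * recency)

def rank_memories(memories, top_k=3):
    """Return the top_k memories by combined score."""
    return sorted(memories, key=score_memory, reverse=True)[:top_k]
```

With equal relevance and importance, a memory updated today outranks one updated a month ago, which is the behavior the recency term is meant to capture.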

The retrieved memories can then be appended to the LLM's prompt as needed, enhancing the personalization and relevance of its responses.
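For instance, the retrieved memories could be folded into the prompt as a bulleted context block before the user's message (a sketch; the exact prompt format is an assumption):

```python
def build_prompt(user_message, memories):
    """Prepend retrieved memory snippets to the user's message as context."""
    if not memories:
        return user_message
    context = "\n".join(f"- {m}" for m in memories)
    return (
        "You know the following about this user:\n"
        f"{context}\n\n"
        f"User: {user_message}"
    )
```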

Use Cases

Mem0 empowers organizations and individuals to enhance:

  • AI Assistants and agents: Seamless conversations with a touch of déjà vu
  • Personalized Learning: Tailored content recommendations and progress tracking
  • Customer Support: Context-aware assistance with user preference memory
  • Healthcare: Patient history and treatment plan management
  • Virtual Companions: Deeper user relationships through conversation memory
  • Productivity: Streamlined workflows based on user habits and task history
  • Gaming: Adaptive environments reflecting player choices and progress

Get Started

The easiest way to set up Mem0 is through the managed Mem0 Platform. This hosted solution offers automatic updates, advanced analytics, and dedicated support. Sign up to get started.

If you prefer to self-host, use the open-source Mem0 package. Follow the installation instructions to get started.

Installation Instructions

Install the Mem0 package via pip:

pip install mem0ai

Alternatively, you can use Mem0 with one click on the hosted platform here.

Basic Usage

Mem0 requires an LLM to function, with gpt-4o from OpenAI as the default. However, it supports a variety of LLMs; for details, refer to our Supported LLMs documentation.

The first step is to instantiate the memory. Mem0 reads your OpenAI API key from the environment, so set it first:

import os
from mem0 import Memory

os.environ["OPENAI_API_KEY"] = "sk-xxx"

m = Memory()

You can perform the following operations on the memory:

  1. Add: Store a memory from any unstructured text
  2. Update: Update memory of a given memory_id
  3. Search: Fetch memories based on a query
  4. Get: Return memories for a certain user/agent/session
  5. History: Describe how a memory has changed over time for a specific memory ID
# 1. Add: Store a memory from any unstructured text
result = m.add("I am working on improving my tennis skills. Suggest some online courses.", user_id="alice", metadata={"category": "hobbies"})

# Created memory --> 'Improving her tennis skills.' and 'Looking for online suggestions.'
# 2. Update: update the memory
result = m.update(memory_id=<memory_id_1>, data="Likes to play tennis on weekends")

# Updated memory --> 'Likes to play tennis on weekends.' and 'Looking for online suggestions.'
# 3. Search: search related memories
related_memories = m.search(query="What are Alice's hobbies?", user_id="alice")

# Retrieved memory --> 'Likes to play tennis on weekends'
# 4. Get all memories
all_memories = m.get_all()
memory_id = all_memories["memories"][0]["id"]  # get a memory_id

# All memory items --> 'Likes to play tennis on weekends.' and 'Looking for online suggestions.'
# 5. Get memory history for a particular memory_id
history = m.history(memory_id=<memory_id_1>)

# Logs corresponding to memory_id_1 --> {'prev_value': 'Working on improving tennis skills and interested in online courses for tennis.', 'new_value': 'Likes to play tennis on weekends' }

Tip

If you prefer a hosted version without the need to set up infrastructure yourself, check out the Mem0 Platform to get started in minutes.

Graph Memory

To initialize Graph Memory, you'll need to set up your configuration with a graph store provider. Currently, we support Neo4j as a graph store provider. You can set up Neo4j locally or use the hosted Neo4j AuraDB. You also need to set the version to v1.1 (prior versions are not supported). Here's how:

from mem0 import Memory

config = {
    "graph_store": {
        "provider": "neo4j",
        "config": {
            "url": "neo4j+s://xxx",
            "username": "neo4j",
            "password": "xxx"
        }
    },
    "version": "v1.1"
}

m = Memory.from_config(config_dict=config)

Documentation

For detailed usage instructions and API reference, visit our documentation at docs.mem0.ai. Here, you can find more information on both the open-source version and the hosted Mem0 Platform.

Star History

Star History Chart

Support

Join our community for support and discussions. If you have any questions, feel free to reach out to us.

Contributors

Join our Discord community to learn about memory management for AI agents and LLMs, and connect with Mem0 users and contributors. Share your ideas, questions, or feedback in our GitHub Issues.

We value and appreciate the contributions of our community. Special thanks to our contributors for helping us improve Mem0.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

mem0's People

Contributors

aaishikdutta, ahnedeee, aryankhanna475, cachho, cclauss, deshraj, dev-khant, deven298, eltociear, gasolin, ianupamsingh, infinite-wait, juananpe, kmitul, krescent, maccuryj, misrasaurabh1, navyaalapati13, patcher9, pranavpuranik, prateekchhikara, prikshit7766, rupeshbansal, sahilyadav902, shenxiangzhuang, sidmohanty11, subhajit20, sw8fbar, taranjeet, vatsalrathod16


mem0's Issues

Non-feature request - Modularize the application

The Embedchain class has a lot of methods, and it would add value in terms of code readability to abstract it a little. There are many open issues about integrating multiple LLMs, vector DBs, or embedding models. While I see a level of abstraction in the vector db folder that can be leveraged for further integration options, I believe we should do something similar for the methods where we use the embedding models and the LLM model. I have raised PR #92, which abstracts the data formats for loaders and chunkers. @taranjeet @cachho please let me know if this is something we can add, so we can have some further discussions on how to structure it for the more critical pieces like the embedding models and chat completions.

Feature Request - Integrate Azure's OpenAI API as an Option

Currently, embedchain is designed to use OpenAI's API for creating embeddings and leveraging the power of GPT-3 for generating answers in the context of chatbots. This feature request proposes to include the option of using Azure's OpenAI API as an alternative.

Azure, a comprehensive suite of cloud services offered by Microsoft, also provides an implementation of the OpenAI API. Integration with Azure's OpenAI API would give users a choice between OpenAI's original API and Azure's version based on their specific requirements and preferences.

Issue on TypeVar

When trying to run the sample code I get this:
ImportError: cannot import name 'TypeVar' from 'typing_extensions' (/databricks/python/lib/python3.10/site-packages/typing_extensions.py)

I am running this in a Databricks notebook.

Insert Local File instead of link

How do I train the model with my local files? Suppose I have a pdf in root directory and I want to add it like mygpt.add("pdf_file", "book.pdf"). Is it possible?

Add tests

  • need to setup tests so that contributing to the repo becomes easier and faster

Add support for caching

How does the framework handle caching? Does it embed everything again and add it to the database each time you run the script, or does it know that a given data source is already embedded and in the database, so there's no need to incur that expense?

Note: This issue is opened on behalf of discord user bodech, message link
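One common way to avoid re-embedding is to key each source by a content hash and skip sources already in the store. A sketch of that idea (not embedchain's actual mechanism; the class and method names are illustrative):

```python
import hashlib

class EmbeddingCache:
    """Skip re-embedding sources whose content hash is already stored."""

    def __init__(self):
        self._seen = set()

    def add(self, content, embed_fn):
        """Embed `content` unless an identical source was already added."""
        digest = hashlib.sha256(content.encode()).hexdigest()
        if digest in self._seen:
            return None  # already embedded; skip the expense
        self._seen.add(digest)
        return embed_fn(content)
```

In practice the set of seen hashes would live alongside the vector database so the check survives across script runs.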

Add Huggingface embeddings

I would appreciate it if you added Huggingface embeddings, because they would be free to use, in contrast to OpenAI's embeddings (which use ada, I believe). Something along these lines would be great:

embeddings_model_name = "sentence-transformers/all-MiniLM-L6-v2"
embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)

Although I must admit that I do not know the difference between OpenAI's model and this one when it comes to embeddings; if anyone knows, please let me know what those differences are.

Using GPT-4 for prompting

Hi there, I see that the framework uses GPT-3.5 in the latest release for prompting.
How can I change it to GPT-4?

My best for this project!
Regards

Fine tune tone for the answer

  • Wondering if it is possible to fine-tune the tone the AI uses when replying to me. For example, if I provided the dialogue of Sherlock Holmes, could it reply to me in the tone that Sherlock talks? Ty!
  • this issue is opened on behalf of twitter user ring_hyacinth, tweet

What are the implications of allowing more documents as context?

Let's talk about this method:

def query(self, input_query):
        """
        Queries the vector database based on the given input query.
        Gets relevant doc based on the query and then passes it to an
        LLM as context to get the answer.

        :param input_query: The query to use.
        :return: The answer to the query.
        """
        result = self.collection.query(
            query_texts=[input_query,],
            n_results=1,
        )
        result_formatted = self._format_result(result)
        answer = self.get_answer_from_llm(input_query, result_formatted[0][0].page_content)
        return answer

As far as I can tell, (and I'm just reading, not necessarily understanding, correct me if I'm wrong), it will return the one single closest document. n_results=1

What if we have a more granular database, cut into smaller pieces?

E.g. the webpages and documents we added are only a paragraph long. Then it will only return that one paragraph. So let's keep imagining that a user asks a complex question for which the correct answer is stored in more than one document. Then it would only answer part of the question with limited knowledge.

Here's a simple example. Let's say we are in the car business and feed our database information about the Corvette, one page for each generation. Then a user asks how much horsepower does the current Corvette make and how much did the first one make? If my understanding is correct, it could not answer that question (for this specific question, ChatGPT knows the answer out of the box, but you get the point).

For these kinds of use cases I'm proposing to allow the retrieval of more than one document, configurable by the user. 1 can stay as the default. These are then all passed as context so an LLM can do its magic and process the information.

The downside I can see is that it will require more tokens, and thus cost more. This is a compromise the user has to make for better results. The max token limit should also be considered, especially in cases where the database contains short and long text, for this edge case, max tokens should be configurable by the user, and in case a limit is set, the tokens of the prompt should be counted and cut off if necessary. edit: openai has a max tokens parameter that does all of this

P.S. Why are we prompting with prompt = f"""Use the following pieces of context to answer the query at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. {context} if we just use one piece of context.

I will propose a PR for this.
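The proposal above (a configurable number of retrieved documents plus a token budget) could be sketched like this; `count_tokens` is a stand-in for a real tokenizer such as tiktoken, and all names are illustrative:

```python
def count_tokens(text):
    # Stand-in: real code would use a proper tokenizer (e.g. tiktoken)
    return len(text.split())

def build_context(documents, max_tokens=512):
    """Concatenate retrieved documents until the token budget is exhausted."""
    parts, used = [], 0
    for doc in documents:
        cost = count_tokens(doc)
        if used + cost > max_tokens:
            break  # cut off before exceeding the prompt's token limit
        parts.append(doc)
        used += cost
    return "\n\n".join(parts)
```

Documents arrive ordered by similarity, so truncation drops the least relevant results first.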

Project Tools

Setup Following Project Management Tools

  1. Project Package and Environment Manager: Poetry is recommended
  2. pytests and pylint setup
  3. Contributing Guide
  4. Sphinx documentation and deploying on the Read the Docs server
  5. Docstrings for the API: Google style is recommended.
  6. CI/CD workflows

I can help with the above.

epub format

Please allow the epub format as one of the supported types.

Add meta data

  • Is there a way to add more metadata to each document (something like a document ID) and get it back in the response?
  • opened on behalf of discord user ikinnrot, message link

[Feature Request] Auto-detect data type, make it optional

First off... Great job!!! Simple and tight code. Much appreciate you making/sharing it.

There was one quick suggestion I had: in order to minimize boilerplate code, it would be good to modify the interface to make the file_type variable optional, detecting it from the input content. If the variable is defined, the code would check the file to ensure it is of the specified type.

This ease-of-life modification should be added early in development to minimize more extensive refactors down the line.

But I wholly understand if you have a different design goal for making this a required input.
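Detection could fall back from an explicit hint to the file extension, flagging a mismatch when both are present. A sketch of that interface (the type names and mapping are illustrative, not embedchain's actual ones):

```python
import os

# Illustrative mapping from extension to data type
EXTENSION_TYPES = {".pdf": "pdf_file", ".epub": "epub", ".txt": "text", ".html": "web_page"}

def detect_data_type(source, file_type=None):
    """Infer the data type from the extension when no hint is given;
    validate the hint against the extension when both are present."""
    ext = os.path.splitext(source)[1].lower()
    inferred = EXTENSION_TYPES.get(ext)
    if file_type is None:
        if inferred is None:
            raise ValueError(f"cannot infer data type for {source!r}")
        return inferred
    if inferred is not None and inferred != file_type:
        raise ValueError(f"{source!r} looks like {inferred}, not {file_type}")
    return file_type
```

With this shape, `add("book.pdf")` and `add("pdf_file", "book.pdf")` could both work, and a wrong hint fails loudly instead of silently mis-parsing the file.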

Not installing

Trying to install using pip3 and it returns this error:

Building wheels for collected packages: hnswlib
  Building wheel for hnswlib (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for hnswlib (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [199 lines of output]
      running bdist_wheel
      running build
      running build_ext
      creating var
      creating var/folders
      creating var/folders/8c
      creating var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn
      creating var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T
      x86_64-apple-darwin13.4.0-clang -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -isystem /Users/acf/opt/anaconda3/include -D_FORTIFY_SOURCE=2 -isystem /Users/acf/opt/anaconda3/include -I/opt/homebrew/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/include/python3.11 -c /var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/tmp4e6jgsj0.cpp -o var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/tmp4e6jgsj0.o -std=c++14
      x86_64-apple-darwin13.4.0-clang -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -isystem /Users/acf/opt/anaconda3/include -D_FORTIFY_SOURCE=2 -isystem /Users/acf/opt/anaconda3/include -I/opt/homebrew/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/include/python3.11 -c /var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/tmpsl27hkck.cpp -o var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/tmpsl27hkck.o -fvisibility=hidden
      building 'hnswlib' extension
      creating build
      creating build/temp.macosx-13-arm64-cpython-311
      creating build/temp.macosx-13-arm64-cpython-311/python_bindings
      x86_64-apple-darwin13.4.0-clang -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -isystem /Users/acf/opt/anaconda3/include -D_FORTIFY_SOURCE=2 -isystem /Users/acf/opt/anaconda3/include -I/private/var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/pip-build-env-8s3c61cb/overlay/lib/python3.11/site-packages/pybind11/include -I/opt/homebrew/Cellar/python@3.11/3.11.3/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/numpy/core/include -I./hnswlib/ -I/opt/homebrew/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/include/python3.11 -c ./python_bindings/bindings.cpp -o build/temp.macosx-13-arm64-cpython-311/./python_bindings/bindings.o -O3 -stdlib=libc++ -mmacosx-version-min=10.7 -DVERSION_INFO=\"0.7.0\" -std=c++14 -fvisibility=hidden
      In file included from ./python_bindings/bindings.cpp:6:
      In file included from ./hnswlib/hnswlib.h:199:
      ./hnswlib/hnswalg.h:755:27: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
              for (int i = 0; i < dim; i++) {
                              ~ ^ ~~~
      ./python_bindings/bindings.cpp:102:13: warning: format specifies type 'int' but the argument has type 'pybind11::ssize_t' (aka 'long') [-Wformat]
                  buffer.ndim);
                  ^~~~~~~~~~~
      ./python_bindings/bindings.cpp:126:17: warning: format specifies type 'int' but the argument has type 'pybind11::ssize_t' (aka 'long') [-Wformat]
                      ids_numpy.ndim, feature_rows);
                      ^~~~~~~~~~~~~~
      ./python_bindings/bindings.cpp:126:33: warning: format specifies type 'int' but the argument has type 'size_t' (aka 'unsigned long') [-Wformat]
                      ids_numpy.ndim, feature_rows);
                                      ^~~~~~~~~~~~
      ./python_bindings/bindings.cpp:121:58: warning: comparison of integers of different signs: 'std::__vector_base<long, std::allocator<long>>::value_type' (aka 'long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
              if (!((ids_numpy.ndim == 1 && ids_numpy.shape[0] == feature_rows) ||
                                            ~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~
      ./python_bindings/bindings.cpp:383:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:386:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:389:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:392:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:395:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:647:28: warning: unused variable 'data' [-Wunused-variable]
                          float* data = (float*)items.data(row);
                                 ^
      ./python_bindings/bindings.cpp:667:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:670:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:853:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:856:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:876:1: warning: 'pybind11_init' is deprecated: PYBIND11_PLUGIN is deprecated, use PYBIND11_MODULE [-Wdeprecated-declarations]
      PYBIND11_PLUGIN(hnswlib) {
      ^
      /private/var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/pip-build-env-8s3c61cb/overlay/lib/python3.11/site-packages/pybind11/include/pybind11/detail/common.h:432:20: note: expanded from macro 'PYBIND11_PLUGIN'
                  return pybind11_init();                                                               \
                         ^
      ./python_bindings/bindings.cpp:876:1: note: 'pybind11_init' has been explicitly marked deprecated here
      /private/var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/pip-build-env-8s3c61cb/overlay/lib/python3.11/site-packages/pybind11/include/pybind11/detail/common.h:426:5: note: expanded from macro 'PYBIND11_PLUGIN'
          PYBIND11_DEPRECATED("PYBIND11_PLUGIN is deprecated, use PYBIND11_MODULE")                     \
          ^
      /private/var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/pip-build-env-8s3c61cb/overlay/lib/python3.11/site-packages/pybind11/include/pybind11/detail/common.h:194:43: note: expanded from macro 'PYBIND11_DEPRECATED'
      #    define PYBIND11_DEPRECATED(reason) [[deprecated(reason)]]
                                                ^
      In file included from ./python_bindings/bindings.cpp:6:
      In file included from ./hnswlib/hnswlib.h:199:
      ./hnswlib/hnswalg.h:95:11: warning: field 'link_list_locks_' will be initialized after field 'label_op_locks_' [-Wreorder-ctor]
              : link_list_locks_(max_elements),
                ^
      ./python_bindings/bindings.cpp:488:39: note: in instantiation of member function 'hnswlib::HierarchicalNSW<float>::HierarchicalNSW' requested here
                  new_index->appr_alg = new hnswlib::HierarchicalNSW<dist_t>(
                                            ^
      ./python_bindings/bindings.cpp:880:38: note: in instantiation of member function 'Index<float>::createFromParams' requested here
              .def(py::init(&Index<float>::createFromParams), py::arg("params"))
                                           ^
      ./python_bindings/bindings.cpp:667:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:892:28: note: in instantiation of member function 'Index<float>::knnQuery_return_numpy' requested here
                  &Index<float>::knnQuery_return_numpy,
                                 ^
      ./python_bindings/bindings.cpp:670:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:619:22: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int' [-Wsign-compare]
                  if (rows <= num_threads * 4) {
                      ~~~~ ^  ~~~~~~~~~~~~~~~
      ./python_bindings/bindings.cpp:257:22: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int' [-Wsign-compare]
              if (features != dim)
                  ~~~~~~~~ ^  ~~~
      ./python_bindings/bindings.cpp:898:28: note: in instantiation of member function 'Index<float>::addItems' requested here
                  &Index<float>::addItems,
                                 ^
      ./python_bindings/bindings.cpp:261:18: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int' [-Wsign-compare]
              if (rows <= num_threads * 4) {
                  ~~~~ ^  ~~~~~~~~~~~~~~~
      In file included from ./python_bindings/bindings.cpp:6:
      In file included from ./hnswlib/hnswlib.h:199:
      ./hnswlib/hnswalg.h:755:27: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
              for (int i = 0; i < dim; i++) {
                              ~ ^ ~~~
      ./python_bindings/bindings.cpp:323:47: note: in instantiation of function template specialization 'hnswlib::HierarchicalNSW<float>::getDataByLabel<float>' requested here
                  data.push_back(appr_alg->template getDataByLabel<data_t>(id));
                                                    ^
      ./python_bindings/bindings.cpp:903:49: note: in instantiation of member function 'Index<float>::getDataReturnList' requested here
              .def("get_items", &Index<float, float>::getDataReturnList, py::arg("ids") = py::none())
                                                      ^
      ./python_bindings/bindings.cpp:383:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:467:27: note: in instantiation of member function 'Index<float>::getAnnData' requested here
              auto ann_params = getAnnData();
                                ^
      ./python_bindings/bindings.cpp:945:43: note: in instantiation of member function 'Index<float>::getIndexParams' requested here
                      return py::make_tuple(ind.getIndexParams()); /* Return dict (wrapped in a tuple) that fully encodes state of the Index object */
                                                ^
      ./python_bindings/bindings.cpp:386:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:389:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:392:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:395:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      In file included from ./python_bindings/bindings.cpp:6:
      In file included from ./hnswlib/hnswlib.h:198:
      ./hnswlib/bruteforce.h:105:27: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
              for (int i = 0; i < k; i++) {
                              ~ ^ ~
      ./hnswlib/bruteforce.h:59:5: note: in instantiation of member function 'hnswlib::BruteforceSearch<float>::searchKnn' requested here
          ~BruteforceSearch() {
          ^
      ./python_bindings/bindings.cpp:748:13: note: in instantiation of member function 'hnswlib::BruteforceSearch<float>::~BruteforceSearch' requested here
                  delete alg;
                  ^
      /Users/acf/opt/anaconda3/bin/../include/c++/v1/memory:1397:5: note: in instantiation of member function 'BFIndex<float>::~BFIndex' requested here
          delete __ptr;
          ^
      /Users/acf/opt/anaconda3/bin/../include/c++/v1/memory:1658:7: note: in instantiation of member function 'std::default_delete<BFIndex<float>>::operator()' requested here
            __ptr_.second()(__tmp);
            ^
      /Users/acf/opt/anaconda3/bin/../include/c++/v1/memory:1612:19: note: in instantiation of member function 'std::unique_ptr<BFIndex<float>>::reset' requested here
        ~unique_ptr() { reset(); }
                        ^
      /private/var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/pip-build-env-8s3c61cb/overlay/lib/python3.11/site-packages/pybind11/include/pybind11/pybind11.h:1872:40: note: in instantiation of member function 'std::unique_ptr<BFIndex<float>>::~unique_ptr' requested here
                  v_h.holder<holder_type>().~holder_type();
                                             ^
      /private/var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/pip-build-env-8s3c61cb/overlay/lib/python3.11/site-packages/pybind11/include/pybind11/pybind11.h:1535:26: note: in instantiation of member function 'pybind11::class_<BFIndex<float>>::dealloc' requested here
              record.dealloc = dealloc;
                               ^
      ./python_bindings/bindings.cpp:957:9: note: in instantiation of function template specialization 'pybind11::class_<BFIndex<float>>::class_<>' requested here
              py::class_<BFIndex<float>>(m, "BFIndex")
              ^
      In file included from ./python_bindings/bindings.cpp:6:
      In file included from ./hnswlib/hnswlib.h:198:
      ./hnswlib/bruteforce.h:113:27: warning: comparison of integers of different signs: 'int' and 'const size_t' (aka 'const unsigned long') [-Wsign-compare]
              for (int i = k; i < cur_element_count; i++) {
                              ~ ^ ~~~~~~~~~~~~~~~~~
      ./python_bindings/bindings.cpp:853:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:960:44: note: in instantiation of member function 'BFIndex<float>::knnQuery_return_numpy' requested here
              .def("knn_query", &BFIndex<float>::knnQuery_return_numpy, py::arg("data"), py::arg("k") = 1, py::arg("filter") = py::none())
                                                 ^
      ./python_bindings/bindings.cpp:856:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:778:22: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int' [-Wsign-compare]
              if (features != dim)
                  ~~~~~~~~ ^  ~~~
      ./python_bindings/bindings.cpp:961:44: note: in instantiation of member function 'BFIndex<float>::addItems' requested here
              .def("add_items", &BFIndex<float>::addItems, py::arg("data"), py::arg("ids") = py::none())
                                                 ^
      In file included from ./python_bindings/bindings.cpp:6:
      ./hnswlib/hnswlib.h:80:13: warning: unused function 'AVX512Capable' [-Wunused-function]
      static bool AVX512Capable() {
                  ^
      34 warnings generated.
      creating build/lib.macosx-13-arm64-cpython-311
      x86_64-apple-darwin13.4.0-clang++ -bundle -undefined dynamic_lookup -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk -Wl,-pie -Wl,-headerpad_max_install_names -Wl,-dead_strip_dylibs -Wl,-rpath,/Users/acf/opt/anaconda3/lib -L/Users/acf/opt/anaconda3/lib -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -isystem /Users/acf/opt/anaconda3/include -D_FORTIFY_SOURCE=2 -isystem /Users/acf/opt/anaconda3/include build/temp.macosx-13-arm64-cpython-311/./python_bindings/bindings.o -o build/lib.macosx-13-arm64-cpython-311/hnswlib.cpython-311-darwin.so -stdlib=libc++ -mmacosx-version-min=10.7
      ld: warning: -pie being ignored. It is only used when linking a main executable
      ld: unsupported tapi file type '!tapi-tbd' in YAML file '/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/usr/lib/libSystem.tbd' for architecture x86_64
      clang-12: error: linker command failed with exit code 1 (use -v to see invocation)
      error: command '/Users/acf/opt/anaconda3/bin/x86_64-apple-darwin13.4.0-clang++' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for hnswlib
Failed to build hnswlib
ERROR: Could not build wheels for hnswlib, which is required to install pyproject.toml-based projects
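The log above shows Anaconda's x86_64 `clang-12` cross-compiler building for an arm64 Python and then failing on the newer macOS SDK ("unsupported tapi file type '!tapi-tbd'"). A common workaround on Apple Silicon — an assumption for this setup, not a verified fix — is to force the native Xcode command-line-tools compiler instead of Anaconda's:

```shell
# Point the build at the system clang rather than Anaconda's bundled
# x86_64 cross-compiler, and rebuild the wheel from source.
CC=/usr/bin/clang CXX=/usr/bin/clang++ pip install --no-cache-dir hnswlib
```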

Add support to load codebase

  • Thanks, such a handy repo! Loving the user-friendly API. Can't wait to see it support a whole codebase (just like other types of documents) in the future :)
  • opened on behalf of twitter user ericman65204539, tweet

Add new format - sitemap

Hi @taranjeet, I was working on a mini project to chat over a small blog and found myself writing code to iterate over the website's sitemap. I think it would be valuable to provide format support for a sitemap, to automate loading and chunking multiple web pages. Do you already have an issue tracking this, or is it something that can be added?
Right now I am doing something like this:

# Download the sitemap.xml file from a website and extract all the links
import requests
from bs4 import BeautifulSoup

def get_links(url):
    url = f'{url}/sitemap.xml'
    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'lxml')
        links = [link.text for link in soup.find_all('loc')]
        return links
    else:
        print(f'Error: {response.status_code}')
        return None
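Once the links are extracted, they could be pushed into embedchain one page at a time. A sketch of that glue — `add_sitemap_links` is a hypothetical helper built on the `App.add("web_page", url)` call shown in other issues here, not a supported sitemap loader:

```python
# Hypothetical glue: feed every sitemap URL into an embedchain App,
# reusing the existing single-page "web_page" format.
def add_sitemap_links(app, links):
    for link in links:
        app.add("web_page", link)  # same call used for a single page
    return len(links)
```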

feature request: Add New Format "Image"

Embedchain should parse uploaded images, extract the text, and embed it — e.g. a screenshot of a book chapter.

The parser package should be configurable; the default should be open source.

Reset the database

  • it would also be nice if there was a method to reset the database. I don't know much about Chroma, but I'm sure you can just delete the db folder.
  • this issue is opened on behalf of discord user cachho, message link
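A minimal sketch of what such a method could do — `reset` is a hypothetical helper, not an embedchain API; it simply automates the "delete the db folder" workaround from the message above:

```python
import shutil
from pathlib import Path

def reset(db_dir="db"):
    """Wipe the on-disk vector store by deleting its folder.

    Assumes the store (e.g. Chroma's persistence) lives entirely in
    db_dir; a fresh one is created on the next run.
    """
    path = Path(db_dir)
    if path.exists():
        shutil.rmtree(path)
```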

ImportError: cannot import name 'App' from partially initialized module 'embedchain' (most likely due to a circular import)

I encountered a strange problem: my Python code consists of only one file, and when the name of that file is the same as the name of the library it imports (embedchain.py), an error is reported: ImportError: cannot import name 'App' from partially initialized module 'embedchain' (most likely due to a circular import). Python finds the local file before the installed package, so the script ends up importing itself.

So, just rename the file to another name and it will be fixed.
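A tiny diagnostic for this failure mode — `shadows_package` is a hypothetical helper, not part of embedchain — just checks whether a script's filename would shadow the package it imports:

```python
import pathlib

def shadows_package(script_path, package_name):
    """True if the script's filename would shadow an installed package,
    which is what triggers the 'partially initialized module' error."""
    return pathlib.Path(script_path).stem == package_name
```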

Issue with get_openai_answer

The max_tokens parameter being hard-coded to 1000 is an issue. With multiple sources (with long URLs) and larger web pages, the budget is quickly eaten up. When the token limit is exceeded, no warning is given except from OpenAI.

openai.error.RateLimitError: The server had an error while processing your request. Sorry about that!

def get_openai_answer(self, prompt):
    messages = []
    messages.append({
        "role": "user", "content": prompt
    })
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        temperature=0,
        max_tokens=1000,
        top_p=1,
    )
    return response["choices"][0]["message"]["content"]
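One way to avoid silently blowing the context window is to bound the prompt before the call. This is a sketch using the rough four-characters-per-token heuristic (a deliberate approximation — a real tokenizer like tiktoken would be exact), not embedchain's actual behavior:

```python
def trim_prompt(prompt, max_prompt_tokens=3000, chars_per_token=4):
    """Rough guard against exceeding the model's context window.

    Keeps the *end* of the prompt, where the user's question usually
    sits, and drops the oldest context if the budget is exceeded.
    """
    budget = max_prompt_tokens * chars_per_token
    return prompt if len(prompt) <= budget else prompt[-budget:]
```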

[BUG] Chroma DB Duplicate ID Error

This is my code:

import os
os.environ["OPENAI_API_KEY"] = "sk-???"
from embedchain import App
naval_chat_bot = App()
naval_chat_bot.add_local("pdf_file", "docs/masnavi-en.pdf")
print(naval_chat_bot.query("Who is the most powerful man?"))

I get chromadb.errors.DuplicateIDError: Expected IDs to be unique, found duplicates for. Where is the problem?

P.S: This was my second attempt. The first one with a different pdf document was successful.
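This error usually means two chunks were assigned the same ID — for example, identical text appearing twice in a PDF. A sketch of deduplicating by content hash before inserting into the store; this illustrates the likely cause and a possible fix, not embedchain's internals:

```python
import hashlib

def chunk_id(text):
    """Deterministic ID for a chunk: identical text -> identical ID."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def dedupe(chunks):
    """Drop chunks whose ID was already seen, so a store that requires
    unique IDs (like Chroma) never receives duplicates."""
    seen, unique = set(), []
    for text in chunks:
        cid = chunk_id(text)
        if cid not in seen:
            seen.add(cid)
            unique.append((cid, text))
    return unique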

Feature Request: Parameters and OpenAI model

Parameters to specify OpenAI model and settings.

ex. I'm subclassing App and updating the model this way to test:

def get_openai_answer(self, prompt):
        messages = []
        messages.append({
            "role": "user", "content": prompt
        })
        response = openai.ChatCompletion.create(
            model="gpt-4-0613",
            messages=messages,
            temperature=0.25,
            max_tokens=1000,
            top_p=1,
        )
        return response["choices"][0]["message"]["content"]

It would be awesome to have a few parameters when querying for temperature, max_tokens, and top_p as well. Or set them globally / in env? Not sure what's best, but happy to create a PR.
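One shape this could take — `QueryConfig` is a hypothetical name, covering only the four settings mentioned in this issue — is a small config object whose fields expand directly into the ChatCompletion call:

```python
from dataclasses import dataclass, asdict

@dataclass
class QueryConfig:
    """Hypothetical knobs for the OpenAI call; not an embedchain API."""
    model: str = "gpt-3.5-turbo-0613"
    temperature: float = 0.0
    max_tokens: int = 1000
    top_p: float = 1.0

# The config would expand straight into the existing call, e.g.:
# openai.ChatCompletion.create(messages=messages, **asdict(config))
```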

add new format sqldatabase

Specifically, I'm working with Snowflake, but I would love to be able to select a table, or a set of tables, as a format source from my data warehouse.
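A loader for this could serialize each row as a text chunk, the same shape the document loaders emit. A sketch using sqlite3 for portability — the issue asks about Snowflake, but any DB-API connector would follow the same pattern; `rows_as_text` is a hypothetical helper:

```python
import sqlite3

def rows_as_text(conn, table):
    """Serialize each row of a table as 'col=value, ...' text,
    ready to be chunked and embedded like any other document."""
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    return [", ".join(f"{c}={v}" for c, v in zip(cols, row)) for row in cur]
```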

Feature Request - Add DataFrames (Spark or Pandas) as Sources

Currently, embedchain allows the addition of various types of data sources such as YouTube videos, PDF files, and web pages to be processed and used in the application. This feature request proposes to extend this functionality to include DataFrames, specifically those from the Spark or Pandas libraries, as potential data sources.

DataFrames are a commonly used data structure for handling and manipulating data in Python, especially in data science and machine learning applications. They are particularly effective when dealing with large, structured datasets, which can include text data.

The ability to use DataFrames as a source of data would add a significant amount of flexibility to embedchain, as users could directly input their preprocessed and transformed data into the application. This could be beneficial in scenarios where the data is already available in a DataFrame format, such as when it has been preprocessed or transformed as part of a larger data pipeline.

The implementation of this feature would involve adding a new method to the App class (or modifying the existing .add() method) that accepts a DataFrame and its format (Spark or Pandas) as arguments. The method would then handle the loading of the data from the DataFrame into the application in the appropriate format, ready to be processed and used in the application.

This feature would increase the flexibility and usefulness of embedchain, making it more applicable to a wider range of scenarios and use-cases, and potentially attracting a broader user base. It would also align well with common data science workflows, which often involve the use of DataFrames for data manipulation and analysis.

Please consider adding this feature in a future update of embedchain.
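The row-to-chunk conversion described above could look like the sketch below for pandas (Spark would need only a `toPandas()` or row-iterator variant). `dataframe_to_chunks` is a hypothetical helper, not an existing embedchain method:

```python
import pandas as pd

def dataframe_to_chunks(df):
    """Turn each DataFrame row into a 'col: value' text chunk,
    matching the shape a document loader would emit."""
    return [
        "; ".join(f"{col}: {row[col]}" for col in df.columns)
        for _, row in df.iterrows()
    ]
```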

openai.error.ServiceUnavailableError: The server is overloaded or not ready yet.

My code:

import os
from keys import *
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

from embedchain import App

naval_chat_bot = App()

naval_chat_bot.add("web_page", "https://psymplicity.com/")

print(naval_chat_bot.query("what is the three-step approach to private mental health care"))

The Error:

Unable to connect optimized C data functions [No module named '_testbuffer'], falling back to pure Python
All data from https://psymplicity.com/ already exists in the database.
Traceback (most recent call last):
File "c:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\code\flask_app_2\embedchain_test.py", line 21, in <module>
print(naval_chat_bot.query("what is the three-step approach to private mental health care"))
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\embedchain\embedchain.py", line 225, in query
answer = self.get_answer_from_llm(prompt)
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\embedchain\embedchain.py", line 211, in get_answer_from_llm
answer = self.get_openai_answer(prompt)
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\embedchain\embedchain.py", line 162, in get_openai_answer
response = openai.ChatCompletion.create(
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\openai\api_resources\chat_completion.py", line 25, in create
return super().create(*args, **kwargs)
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 153, in create
response, _, api_key = requestor.request(
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\openai\api_requestor.py", line 298, in request
resp, got_stream = self._interpret_response(result, stream)
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\openai\api_requestor.py", line 700, in _interpret_response
self._interpret_response_line(
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\openai\api_requestor.py", line 743, in _interpret_response_line
raise error.ServiceUnavailableError(
openai.error.ServiceUnavailableError: The server is overloaded or not ready yet.
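ServiceUnavailableError is a transient server-side error, so the usual mitigation is to retry with exponential backoff. A generic sketch — in real code `retriable` would be the OpenAI error classes rather than a bare `Exception`:

```python
import random
import time

def with_retries(call, max_attempts=5, base_delay=1.0, retriable=(Exception,)):
    """Retry a flaky zero-argument callable with exponential backoff
    and jitter; re-raises after the final failed attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retriable:
            if attempt == max_attempts - 1:
                raise
            # delay doubles each attempt, jittered to 50-100% of nominal
            time.sleep(base_delay * 2 ** attempt * (0.5 + random.random() / 2))
```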
