
mem0's Introduction

Mem0 - The Memory Layer for Personalized AI

Learn more · Join Discord

Mem0 Discord Mem0 PyPI - Downloads Y Combinator S24

Introduction

Mem0 (pronounced as "mem-zero") enhances AI assistants and agents with an intelligent memory layer, enabling personalized AI interactions. Mem0 remembers user preferences, adapts to individual needs, and continuously improves over time, making it ideal for customer support chatbots, AI assistants, and autonomous systems.

Graph Memory Integration New Feature: Introducing Graph Memory. Check out our documentation.

Core Features

  • Multi-Level Memory: User, Session, and AI Agent memory retention
  • Adaptive Personalization: Continuous improvement based on interactions
  • Developer-Friendly API: Simple integration into various applications
  • Cross-Platform Consistency: Uniform behavior across devices
  • Managed Service: Hassle-free hosted solution

How does Mem0 work?

Mem0 leverages a hybrid database approach to manage and retrieve long-term memories for AI agents and assistants. Each memory is associated with a unique identifier, such as a user ID or agent ID, allowing Mem0 to organize and access memories specific to an individual or context.

When a message is added to Mem0 using the add() method, the system extracts relevant facts and preferences and stores them across multiple data stores: a vector database, a key-value database, and a graph database. This hybrid approach ensures that each type of information is stored in the most efficient manner, making subsequent searches quick and effective.

When an AI agent or LLM needs to recall memories, it calls the search() method. Mem0 then performs a search across these data stores, retrieving relevant information from each source. The results are passed through a scoring layer that evaluates them based on relevance, importance, and recency, ensuring that only the most personalized and useful context is surfaced.
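A minimal sketch of how such a scoring layer might combine those signals (the weights, field names, and decay curve here are illustrative assumptions, not Mem0's actual implementation):

```python
import time

def score_memory(memory, now=None, w_relevance=0.6, w_importance=0.2, w_recency=0.2):
    """Combine relevance, importance, and recency into one score.

    `memory` is assumed to carry a similarity score from the vector
    search, an importance weight, and a last-updated timestamp
    (illustrative fields, not Mem0's real schema).
    """
    now = now or time.time()
    age_days = (now - memory["updated_at"]) / 86400
    recency = 1.0 / (1.0 + age_days)  # older memories decay toward 0
    return (w_relevance * memory["relevance"]
            + w_importance * memory["importance"]
            + w_recency * recency)

def rank_memories(memories, top_k=3):
    """Return the top_k memories by combined score."""
    return sorted(memories, key=score_memory, reverse=True)[:top_k]
```

With equal relevance and importance, a memory updated today outranks one updated a month ago, which is the behavior the recency term is meant to capture.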

The retrieved memories can then be appended to the LLM's prompt as needed, enhancing the personalization and relevance of its responses.
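For instance, the retrieved memories could be folded into the prompt as a bulleted context block before the user's message (a sketch; the exact prompt format is an assumption):

```python
def build_prompt(user_message, memories):
    """Prepend retrieved memory snippets to the user's message as context."""
    if not memories:
        return user_message
    context = "\n".join(f"- {m}" for m in memories)
    return (
        "You know the following about this user:\n"
        f"{context}\n\n"
        f"User: {user_message}"
    )
```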

Use Cases

Mem0 empowers organizations and individuals to enhance:

  • AI Assistants and agents: Seamless conversations with a touch of déjà vu
  • Personalized Learning: Tailored content recommendations and progress tracking
  • Customer Support: Context-aware assistance with user preference memory
  • Healthcare: Patient history and treatment plan management
  • Virtual Companions: Deeper user relationships through conversation memory
  • Productivity: Streamlined workflows based on user habits and task history
  • Gaming: Adaptive environments reflecting player choices and progress

Get Started

The easiest way to set up Mem0 is through the managed Mem0 Platform. This hosted solution offers automatic updates, advanced analytics, and dedicated support. Sign up to get started.

If you prefer to self-host, use the open-source Mem0 package. Follow the installation instructions to get started.

Installation Instructions

Install the Mem0 package via pip:

pip install mem0ai

Alternatively, you can use Mem0 with one click on the hosted platform here.

Basic Usage

Mem0 requires an LLM to function, with gpt-4o from OpenAI as the default. However, it supports a variety of LLMs; for details, refer to our Supported LLMs documentation.

The first step is to instantiate the memory. Mem0 reads your OpenAI API key from the environment, so set it first:

import os
from mem0 import Memory

os.environ["OPENAI_API_KEY"] = "sk-xxx"

m = Memory()

You can perform the following operations on the memory:

  1. Add: Store a memory from any unstructured text
  2. Update: Update memory of a given memory_id
  3. Search: Fetch memories based on a query
  4. Get: Return memories for a certain user/agent/session
  5. History: Describe how a memory has changed over time for a specific memory ID
# 1. Add: Store a memory from any unstructured text
result = m.add("I am working on improving my tennis skills. Suggest some online courses.", user_id="alice", metadata={"category": "hobbies"})

# Created memory --> 'Improving her tennis skills.' and 'Looking for online suggestions.'
# 2. Update: update the memory
result = m.update(memory_id=<memory_id_1>, data="Likes to play tennis on weekends")

# Updated memory --> 'Likes to play tennis on weekends.' and 'Looking for online suggestions.'
# 3. Search: search related memories
related_memories = m.search(query="What are Alice's hobbies?", user_id="alice")

# Retrieved memory --> 'Likes to play tennis on weekends'
# 4. Get all memories
all_memories = m.get_all()
memory_id = all_memories["memories"][0]["id"]  # get a memory_id

# All memory items --> 'Likes to play tennis on weekends.' and 'Looking for online suggestions.'
# 5. Get memory history for a particular memory_id
history = m.history(memory_id=<memory_id_1>)

# Logs corresponding to memory_id_1 --> {'prev_value': 'Working on improving tennis skills and interested in online courses for tennis.', 'new_value': 'Likes to play tennis on weekends' }

Tip

If you prefer a hosted version without the need to set up infrastructure yourself, check out the Mem0 Platform to get started in minutes.

Graph Memory

To initialize Graph Memory, you'll need to set up your configuration with a graph store provider. Currently, we support Neo4j as a graph store provider. You can set up Neo4j locally or use the hosted Neo4j AuraDB. You also need to set the version to v1.1 (prior versions are not supported). Here's how:

from mem0 import Memory

config = {
    "graph_store": {
        "provider": "neo4j",
        "config": {
            "url": "neo4j+s://xxx",
            "username": "neo4j",
            "password": "xxx"
        }
    },
    "version": "v1.1"
}

m = Memory.from_config(config_dict=config)

Documentation

For detailed usage instructions and API reference, visit our documentation at docs.mem0.ai. Here, you can find more information on both the open-source version and the hosted Mem0 Platform.

Star History

Star History Chart

Support

Join our community for support and discussions. If you have any questions, feel free to reach out to us.

Contributors

Join our Discord community to learn about memory management for AI agents and LLMs, and connect with Mem0 users and contributors. Share your ideas, questions, or feedback in our GitHub Issues.

We value and appreciate the contributions of our community. Special thanks to our contributors for helping us improve Mem0.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

mem0's People

Contributors

aaishikdutta, ahnedeee, aryankhanna475, cachho, cclauss, deshraj, dev-khant, deven298, eltociear, gasolin, ianupamsingh, infinite-wait, juananpe, kmitul, krescent, maccuryj, misrasaurabh1, navyaalapati13, patcher9, pranavpuranik, prateekchhikara, prikshit7766, rupeshbansal, sahilyadav902, shenxiangzhuang, sidmohanty11, subhajit20, sw8fbar, taranjeet, vatsalrathod16


mem0's Issues

Non-feature request - Modularize the application

The Embedchain class has a lot of methods, and it would add value in terms of code readability to abstract it a little. There are many open issues about integrating multiple LLMs, vector DBs, or embedding models. While I see a level of abstraction in the vector db folder that can be leveraged for further integration options, I believe we should do something similar for the methods where we use the embedding models and the LLM model. I have raised PR #92, which abstracts the data formats for loaders and chunkers. @taranjeet @cachho please let me know if this is something we can add, so we can have some further discussions on how to structure it for the more critical pieces like the embedding models and chat completions.

Feature Request - Integrate Azure's OpenAI API as an Option

Currently, embedchain is designed to use OpenAI's API for creating embeddings and leveraging the power of GPT-3 for generating answers in the context of chatbots. This feature request proposes to include the option of using Azure's OpenAI API as an alternative.

Azure, a comprehensive suite of cloud services offered by Microsoft, also provides an implementation of the OpenAI API. Integration with Azure's OpenAI API would give users a choice between OpenAI's original API and Azure's version based on their specific requirements and preferences.

Issue on TypeVar

When trying to run the sample code I get this:
ImportError: cannot import name 'TypeVar' from 'typing_extensions' (/databricks/python/lib/python3.10/site-packages/typing_extensions.py)

I am running this in a Databricks notebook.

Insert Local File instead of link

How do I train the model with my local files? Suppose I have a pdf in root directory and I want to add it like mygpt.add("pdf_file", "book.pdf"). Is it possible?

Add tests

  • need to setup tests so that contributing to the repo becomes easier and faster

Add support for caching

How does the framework handle caching? Does it embed everything again and add it to the database each time you run the script, or does it know that a given data source is already embedded and in the database, so there's no need to incur that expense?

Note: This issue is opened on behalf of discord user bodech, message link
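One common way to avoid re-embedding is to key each source by a content hash and skip sources already in the store. A sketch of that idea (not embedchain's actual mechanism; the class and method names are illustrative):

```python
import hashlib

class EmbeddingCache:
    """Skip re-embedding sources whose content hash is already stored."""

    def __init__(self):
        self._seen = set()

    def add(self, content, embed_fn):
        """Embed `content` unless an identical source was already added."""
        digest = hashlib.sha256(content.encode()).hexdigest()
        if digest in self._seen:
            return None  # already embedded; skip the expense
        self._seen.add(digest)
        return embed_fn(content)
```

In practice the set of seen hashes would live alongside the vector database so the check survives across script runs.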

Add Huggingface embeddings

I would appreciate it if you added Huggingface embeddings, because they would be free to use, in contrast to OpenAI's embeddings (which use ada, I believe). Something along these lines would be great:

embeddings_model_name = "sentence-transformers/all-MiniLM-L6-v2"
embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)

Although I must admit that I do not know the difference between OpenAI's model and this one when it comes to embeddings; if anyone knows, please let me know what those differences are.

Using GPT-4 for prompting

Hi there, I see that the framework uses GPT-3.5 in the latest release for prompting.
How can I change it to GPT-4?

My best for this project!
Regards

Fine tune tone for the answer

  • Wondering if it is possible to fine-tune the tone the AI uses when replying to me. For example, if I provided the dialogue of Sherlock Holmes, could it reply to me in the tone that Sherlock talks? Ty!
  • this issue is opened on behalf of twitter user ring_hyacinth, tweet

What are the implications of allowing more documents as context?

Let's talk about this method:

def query(self, input_query):
        """
        Queries the vector database based on the given input query.
        Gets relevant doc based on the query and then passes it to an
        LLM as context to get the answer.

        :param input_query: The query to use.
        :return: The answer to the query.
        """
        result = self.collection.query(
            query_texts=[input_query,],
            n_results=1,
        )
        result_formatted = self._format_result(result)
        answer = self.get_answer_from_llm(input_query, result_formatted[0][0].page_content)
        return answer

As far as I can tell, (and I'm just reading, not necessarily understanding, correct me if I'm wrong), it will return the one single closest document. n_results=1

What if we have a more granular database, cut into smaller pieces?

E.g. the webpages and documents we added are only a paragraph long. Then it will only return that one paragraph. So let's keep imagining that a user asks a complex question for which the correct answer is stored in more than one document. Then it would only answer part of the question with limited knowledge.

Here's a simple example. Let's say we are in the car business and feed our database information about the Corvette, one page for each generation. Then a user asks how much horsepower does the current Corvette make and how much did the first one make? If my understanding is correct, it could not answer that question (for this specific question, ChatGPT knows the answer out of the box, but you get the point).

For these kinds of use cases I'm proposing to allow the retrieval of more than one document, configurable by the user. 1 can stay as the default. These are then all passed as context so an LLM can do its magic and process the information.

The downside I can see is that it will require more tokens, and thus cost more. This is a compromise the user has to make for better results. The max token limit should also be considered, especially in cases where the database contains short and long text, for this edge case, max tokens should be configurable by the user, and in case a limit is set, the tokens of the prompt should be counted and cut off if necessary. edit: openai has a max tokens parameter that does all of this

P.S. Why are we prompting with prompt = f"""Use the following pieces of context to answer the query at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. {context} if we just use one piece of context.

I will propose a PR for this.
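The proposal above (a configurable number of retrieved documents plus a token budget) could be sketched like this; `count_tokens` is a stand-in for a real tokenizer such as tiktoken, and all names are illustrative:

```python
def count_tokens(text):
    # Stand-in: real code would use a proper tokenizer (e.g. tiktoken)
    return len(text.split())

def build_context(documents, max_tokens=512):
    """Concatenate retrieved documents until the token budget is exhausted."""
    parts, used = [], 0
    for doc in documents:
        cost = count_tokens(doc)
        if used + cost > max_tokens:
            break  # cut off before exceeding the prompt's token limit
        parts.append(doc)
        used += cost
    return "\n\n".join(parts)
```

Documents arrive ordered by similarity, so truncation drops the least relevant results first.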

Project Tools

Setup Following Project Management Tools

  1. Project Package and Environment Manager: Poetry is recommended
  2. pytests and pylint setup
  3. Contributing Guide
  4. Sphinx documentation and deploying on the Read the Docs server
  5. Docstrings for the API: Google style is recommended.
  6. CI/CD workflows

I can help with the above.

epub format

Please allow the epub format as one of the supported types.

Add meta data

  • Is there a way to add more metadata to each document (something like a document ID) and get it back in the response?
  • opened on behalf of discord user ikinnrot, message link

[Feature Request] Auto-detect data type, make it optional

First off... Great job!!! Simple and tight code. Much appreciate you making/sharing it.

There was one quick suggestion I had: in order to minimize boilerplate code, it would be good to modify the interface to make the file_type variable optional, detecting it from the input content. If the variable is defined, the code would check the file to ensure it is of the specified type.

This ease-of-life modification should be added early in development to minimize more extensive refactors down the line.

But I wholly understand if you have a different design goal for making this a required input.
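Detection could fall back from an explicit hint to the file extension, flagging a mismatch when both are present. A sketch of that interface (the type names and mapping are illustrative, not embedchain's actual ones):

```python
import os

# Illustrative mapping from extension to data type
EXTENSION_TYPES = {".pdf": "pdf_file", ".epub": "epub", ".txt": "text", ".html": "web_page"}

def detect_data_type(source, file_type=None):
    """Infer the data type from the extension when no hint is given;
    validate the hint against the extension when both are present."""
    ext = os.path.splitext(source)[1].lower()
    inferred = EXTENSION_TYPES.get(ext)
    if file_type is None:
        if inferred is None:
            raise ValueError(f"cannot infer data type for {source!r}")
        return inferred
    if inferred is not None and inferred != file_type:
        raise ValueError(f"{source!r} looks like {inferred}, not {file_type}")
    return file_type
```

With this shape, `add("book.pdf")` and `add("pdf_file", "book.pdf")` could both work, and a wrong hint fails loudly instead of silently mis-parsing the file.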

Not installing

Trying to install using pip3 and it returns this error:

Building wheels for collected packages: hnswlib
  Building wheel for hnswlib (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for hnswlib (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [199 lines of output]
      running bdist_wheel
      running build
      running build_ext
      creating var
      creating var/folders
      creating var/folders/8c
      creating var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn
      creating var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T
      x86_64-apple-darwin13.4.0-clang -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -isystem /Users/acf/opt/anaconda3/include -D_FORTIFY_SOURCE=2 -isystem /Users/acf/opt/anaconda3/include -I/opt/homebrew/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/include/python3.11 -c /var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/tmp4e6jgsj0.cpp -o var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/tmp4e6jgsj0.o -std=c++14
      x86_64-apple-darwin13.4.0-clang -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -isystem /Users/acf/opt/anaconda3/include -D_FORTIFY_SOURCE=2 -isystem /Users/acf/opt/anaconda3/include -I/opt/homebrew/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/include/python3.11 -c /var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/tmpsl27hkck.cpp -o var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/tmpsl27hkck.o -fvisibility=hidden
      building 'hnswlib' extension
      creating build
      creating build/temp.macosx-13-arm64-cpython-311
      creating build/temp.macosx-13-arm64-cpython-311/python_bindings
      x86_64-apple-darwin13.4.0-clang -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -isystem /Users/acf/opt/anaconda3/include -D_FORTIFY_SOURCE=2 -isystem /Users/acf/opt/anaconda3/include -I/private/var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/pip-build-env-8s3c61cb/overlay/lib/python3.11/site-packages/pybind11/include -I/opt/homebrew/Cellar/python@3.11/3.11.3/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/numpy/core/include -I./hnswlib/ -I/opt/homebrew/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/include/python3.11 -c ./python_bindings/bindings.cpp -o build/temp.macosx-13-arm64-cpython-311/./python_bindings/bindings.o -O3 -stdlib=libc++ -mmacosx-version-min=10.7 -DVERSION_INFO=\"0.7.0\" -std=c++14 -fvisibility=hidden
      In file included from ./python_bindings/bindings.cpp:6:
      In file included from ./hnswlib/hnswlib.h:199:
      ./hnswlib/hnswalg.h:755:27: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
              for (int i = 0; i < dim; i++) {
                              ~ ^ ~~~
      ./python_bindings/bindings.cpp:102:13: warning: format specifies type 'int' but the argument has type 'pybind11::ssize_t' (aka 'long') [-Wformat]
                  buffer.ndim);
                  ^~~~~~~~~~~
      ./python_bindings/bindings.cpp:126:17: warning: format specifies type 'int' but the argument has type 'pybind11::ssize_t' (aka 'long') [-Wformat]
                      ids_numpy.ndim, feature_rows);
                      ^~~~~~~~~~~~~~
      ./python_bindings/bindings.cpp:126:33: warning: format specifies type 'int' but the argument has type 'size_t' (aka 'unsigned long') [-Wformat]
                      ids_numpy.ndim, feature_rows);
                                      ^~~~~~~~~~~~
      ./python_bindings/bindings.cpp:121:58: warning: comparison of integers of different signs: 'std::__vector_base<long, std::allocator<long>>::value_type' (aka 'long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
              if (!((ids_numpy.ndim == 1 && ids_numpy.shape[0] == feature_rows) ||
                                            ~~~~~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~
      ./python_bindings/bindings.cpp:383:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:386:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:389:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:392:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:395:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:647:28: warning: unused variable 'data' [-Wunused-variable]
                          float* data = (float*)items.data(row);
                                 ^
      ./python_bindings/bindings.cpp:667:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:670:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:853:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:856:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:876:1: warning: 'pybind11_init' is deprecated: PYBIND11_PLUGIN is deprecated, use PYBIND11_MODULE [-Wdeprecated-declarations]
      PYBIND11_PLUGIN(hnswlib) {
      ^
      /private/var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/pip-build-env-8s3c61cb/overlay/lib/python3.11/site-packages/pybind11/include/pybind11/detail/common.h:432:20: note: expanded from macro 'PYBIND11_PLUGIN'
                  return pybind11_init();                                                               \
                         ^
      ./python_bindings/bindings.cpp:876:1: note: 'pybind11_init' has been explicitly marked deprecated here
      /private/var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/pip-build-env-8s3c61cb/overlay/lib/python3.11/site-packages/pybind11/include/pybind11/detail/common.h:426:5: note: expanded from macro 'PYBIND11_PLUGIN'
          PYBIND11_DEPRECATED("PYBIND11_PLUGIN is deprecated, use PYBIND11_MODULE")                     \
          ^
      /private/var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/pip-build-env-8s3c61cb/overlay/lib/python3.11/site-packages/pybind11/include/pybind11/detail/common.h:194:43: note: expanded from macro 'PYBIND11_DEPRECATED'
      #    define PYBIND11_DEPRECATED(reason) [[deprecated(reason)]]
                                                ^
      In file included from ./python_bindings/bindings.cpp:6:
      In file included from ./hnswlib/hnswlib.h:199:
      ./hnswlib/hnswalg.h:95:11: warning: field 'link_list_locks_' will be initialized after field 'label_op_locks_' [-Wreorder-ctor]
              : link_list_locks_(max_elements),
                ^
      ./python_bindings/bindings.cpp:488:39: note: in instantiation of member function 'hnswlib::HierarchicalNSW<float>::HierarchicalNSW' requested here
                  new_index->appr_alg = new hnswlib::HierarchicalNSW<dist_t>(
                                            ^
      ./python_bindings/bindings.cpp:880:38: note: in instantiation of member function 'Index<float>::createFromParams' requested here
              .def(py::init(&Index<float>::createFromParams), py::arg("params"))
                                           ^
      ./python_bindings/bindings.cpp:667:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:892:28: note: in instantiation of member function 'Index<float>::knnQuery_return_numpy' requested here
                  &Index<float>::knnQuery_return_numpy,
                                 ^
      ./python_bindings/bindings.cpp:670:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:619:22: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int' [-Wsign-compare]
                  if (rows <= num_threads * 4) {
                      ~~~~ ^  ~~~~~~~~~~~~~~~
      ./python_bindings/bindings.cpp:257:22: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int' [-Wsign-compare]
              if (features != dim)
                  ~~~~~~~~ ^  ~~~
      ./python_bindings/bindings.cpp:898:28: note: in instantiation of member function 'Index<float>::addItems' requested here
                  &Index<float>::addItems,
                                 ^
      ./python_bindings/bindings.cpp:261:18: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int' [-Wsign-compare]
              if (rows <= num_threads * 4) {
                  ~~~~ ^  ~~~~~~~~~~~~~~~
      In file included from ./python_bindings/bindings.cpp:6:
      In file included from ./hnswlib/hnswlib.h:199:
      ./hnswlib/hnswalg.h:755:27: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
              for (int i = 0; i < dim; i++) {
                              ~ ^ ~~~
      ./python_bindings/bindings.cpp:323:47: note: in instantiation of function template specialization 'hnswlib::HierarchicalNSW<float>::getDataByLabel<float>' requested here
                  data.push_back(appr_alg->template getDataByLabel<data_t>(id));
                                                    ^
      ./python_bindings/bindings.cpp:903:49: note: in instantiation of member function 'Index<float>::getDataReturnList' requested here
              .def("get_items", &Index<float, float>::getDataReturnList, py::arg("ids") = py::none())
                                                      ^
      ./python_bindings/bindings.cpp:383:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:467:27: note: in instantiation of member function 'Index<float>::getAnnData' requested here
              auto ann_params = getAnnData();
                                ^
      ./python_bindings/bindings.cpp:945:43: note: in instantiation of member function 'Index<float>::getIndexParams' requested here
                      return py::make_tuple(ind.getIndexParams()); /* Return dict (wrapped in a tuple) that fully encodes state of the Index object */
                                                ^
      ./python_bindings/bindings.cpp:386:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:389:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:392:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:395:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      In file included from ./python_bindings/bindings.cpp:6:
      In file included from ./hnswlib/hnswlib.h:198:
      ./hnswlib/bruteforce.h:105:27: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
              for (int i = 0; i < k; i++) {
                              ~ ^ ~
      ./hnswlib/bruteforce.h:59:5: note: in instantiation of member function 'hnswlib::BruteforceSearch<float>::searchKnn' requested here
          ~BruteforceSearch() {
          ^
      ./python_bindings/bindings.cpp:748:13: note: in instantiation of member function 'hnswlib::BruteforceSearch<float>::~BruteforceSearch' requested here
                  delete alg;
                  ^
      /Users/acf/opt/anaconda3/bin/../include/c++/v1/memory:1397:5: note: in instantiation of member function 'BFIndex<float>::~BFIndex' requested here
          delete __ptr;
          ^
      /Users/acf/opt/anaconda3/bin/../include/c++/v1/memory:1658:7: note: in instantiation of member function 'std::default_delete<BFIndex<float>>::operator()' requested here
            __ptr_.second()(__tmp);
            ^
      /Users/acf/opt/anaconda3/bin/../include/c++/v1/memory:1612:19: note: in instantiation of member function 'std::unique_ptr<BFIndex<float>>::reset' requested here
        ~unique_ptr() { reset(); }
                        ^
      /private/var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/pip-build-env-8s3c61cb/overlay/lib/python3.11/site-packages/pybind11/include/pybind11/pybind11.h:1872:40: note: in instantiation of member function 'std::unique_ptr<BFIndex<float>>::~unique_ptr' requested here
                  v_h.holder<holder_type>().~holder_type();
                                             ^
      /private/var/folders/8c/dnq_8d0j6b10xklrxyqdt1fh0000gn/T/pip-build-env-8s3c61cb/overlay/lib/python3.11/site-packages/pybind11/include/pybind11/pybind11.h:1535:26: note: in instantiation of member function 'pybind11::class_<BFIndex<float>>::dealloc' requested here
              record.dealloc = dealloc;
                               ^
      ./python_bindings/bindings.cpp:957:9: note: in instantiation of function template specialization 'pybind11::class_<BFIndex<float>>::class_<>' requested here
              py::class_<BFIndex<float>>(m, "BFIndex")
              ^
      In file included from ./python_bindings/bindings.cpp:6:
      In file included from ./hnswlib/hnswlib.h:198:
      ./hnswlib/bruteforce.h:113:27: warning: comparison of integers of different signs: 'int' and 'const size_t' (aka 'const unsigned long') [-Wsign-compare]
              for (int i = k; i < cur_element_count; i++) {
                              ~ ^ ~~~~~~~~~~~~~~~~~
      ./python_bindings/bindings.cpp:853:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:960:44: note: in instantiation of member function 'BFIndex<float>::knnQuery_return_numpy' requested here
              .def("knn_query", &BFIndex<float>::knnQuery_return_numpy, py::arg("data"), py::arg("k") = 1, py::arg("filter") = py::none())
                                                 ^
      ./python_bindings/bindings.cpp:856:13: warning: cannot delete expression with pointer-to-'void' type 'void *' [-Wdelete-incomplete]
                  delete[] f;
                  ^        ~
      ./python_bindings/bindings.cpp:778:22: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int' [-Wsign-compare]
              if (features != dim)
                  ~~~~~~~~ ^  ~~~
      ./python_bindings/bindings.cpp:961:44: note: in instantiation of member function 'BFIndex<float>::addItems' requested here
              .def("add_items", &BFIndex<float>::addItems, py::arg("data"), py::arg("ids") = py::none())
                                                 ^
      In file included from ./python_bindings/bindings.cpp:6:
      ./hnswlib/hnswlib.h:80:13: warning: unused function 'AVX512Capable' [-Wunused-function]
      static bool AVX512Capable() {
                  ^
      34 warnings generated.
      creating build/lib.macosx-13-arm64-cpython-311
      x86_64-apple-darwin13.4.0-clang++ -bundle -undefined dynamic_lookup -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk -Wl,-pie -Wl,-headerpad_max_install_names -Wl,-dead_strip_dylibs -Wl,-rpath,/Users/acf/opt/anaconda3/lib -L/Users/acf/opt/anaconda3/lib -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -isystem /Users/acf/opt/anaconda3/include -D_FORTIFY_SOURCE=2 -isystem /Users/acf/opt/anaconda3/include build/temp.macosx-13-arm64-cpython-311/./python_bindings/bindings.o -o build/lib.macosx-13-arm64-cpython-311/hnswlib.cpython-311-darwin.so -stdlib=libc++ -mmacosx-version-min=10.7
      ld: warning: -pie being ignored. It is only used when linking a main executable
      ld: unsupported tapi file type '!tapi-tbd' in YAML file '/Library/Developer/CommandLineTools/SDKs/MacOSX13.sdk/usr/lib/libSystem.tbd' for architecture x86_64
      clang-12: error: linker command failed with exit code 1 (use -v to see invocation)
      error: command '/Users/acf/opt/anaconda3/bin/x86_64-apple-darwin13.4.0-clang++' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for hnswlib
Failed to build hnswlib
ERROR: Could not build wheels for hnswlib, which is required to install pyproject.toml-based projects
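The log above shows Anaconda's x86_64 `clang-12` cross-compiler building for an arm64 Python and then failing on the newer macOS SDK ("unsupported tapi file type '!tapi-tbd'"). A common workaround on Apple Silicon — an assumption for this setup, not a verified fix — is to force the native Xcode command-line-tools compiler instead of Anaconda's:

```shell
# Point the build at the system clang rather than Anaconda's bundled
# x86_64 cross-compiler, and rebuild the wheel from source.
CC=/usr/bin/clang CXX=/usr/bin/clang++ pip install --no-cache-dir hnswlib
```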

Add support to load codebase

  • Thanks, such a handy repo! Loving the user-friendly API. Can't wait to see it support a whole codebase (just like other types of documents) in the future :)
  • opened on behalf of twitter user ericman65204539, tweet

Add new format - sitemap

Hi @taranjeet, I was working on a mini project to chat over a small blog and found myself writing code to iterate over the website's sitemap. I think it would be valuable to provide format support for a sitemap, to automate loading and chunking multiple web pages. Do you already have an issue tracking this, or is it something that can be added?
Right now I am doing something like this:

# Download the sitemap.xml file from a website and extract all the links
import requests
from bs4 import BeautifulSoup

def get_links(url):
    url = f'{url}/sitemap.xml'
    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'lxml')
        links = [link.text for link in soup.find_all('loc')]
        return links
    else:
        print(f'Error: {response.status_code}')
        return None
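Once the links are extracted, they could be pushed into embedchain one page at a time. A sketch of that glue — `add_sitemap_links` is a hypothetical helper built on the `App.add("web_page", url)` call shown in other issues here, not a supported sitemap loader:

```python
# Hypothetical glue: feed every sitemap URL into an embedchain App,
# reusing the existing single-page "web_page" format.
def add_sitemap_links(app, links):
    for link in links:
        app.add("web_page", link)  # same call used for a single page
    return len(links)
```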

feature request: Add New Format "Image"

Embedchain should parse uploaded images, extract the text, and embed it — e.g. a screenshot of a book chapter.

The parser package should be configurable; the default should be open source.

Reset the database

  • it would also be nice if there was a method to reset the database. I don't know much about Chroma, but I'm sure you can just delete the db folder.
  • this issue is opened on behalf of discord user cachho, message link
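A minimal sketch of what such a method could do — `reset` is a hypothetical helper, not an embedchain API; it simply automates the "delete the db folder" workaround from the message above:

```python
import shutil
from pathlib import Path

def reset(db_dir="db"):
    """Wipe the on-disk vector store by deleting its folder.

    Assumes the store (e.g. Chroma's persistence) lives entirely in
    db_dir; a fresh one is created on the next run.
    """
    path = Path(db_dir)
    if path.exists():
        shutil.rmtree(path)
```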

ImportError: cannot import name 'App' from partially initialized module 'embedchain' (most likely due to a circular import)

I encountered a strange problem: my Python code consists of only one file, and when the name of that file is the same as the name of the library it imports (embedchain.py), an error is reported: ImportError: cannot import name 'App' from partially initialized module 'embedchain' (most likely due to a circular import). Python finds the local file before the installed package, so the script ends up importing itself.

So, just rename the file to another name and it will be fixed.
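A tiny diagnostic for this failure mode — `shadows_package` is a hypothetical helper, not part of embedchain — just checks whether a script's filename would shadow the package it imports:

```python
import pathlib

def shadows_package(script_path, package_name):
    """True if the script's filename would shadow an installed package,
    which is what triggers the 'partially initialized module' error."""
    return pathlib.Path(script_path).stem == package_name
```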

Issue with get_openai_answer

The max_tokens parameter being hard-coded to 1000 is an issue. With multiple sources (with long URLs) and larger web pages, the budget is quickly eaten up. When the token limit is exceeded, no warning is given except from OpenAI.

openai.error.RateLimitError: The server had an error while processing your request. Sorry about that!

def get_openai_answer(self, prompt):
    messages = []
    messages.append({
        "role": "user", "content": prompt
    })
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        temperature=0,
        max_tokens=1000,
        top_p=1,
    )
    return response["choices"][0]["message"]["content"]
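One way to avoid silently blowing the context window is to bound the prompt before the call. This is a sketch using the rough four-characters-per-token heuristic (a deliberate approximation — a real tokenizer like tiktoken would be exact), not embedchain's actual behavior:

```python
def trim_prompt(prompt, max_prompt_tokens=3000, chars_per_token=4):
    """Rough guard against exceeding the model's context window.

    Keeps the *end* of the prompt, where the user's question usually
    sits, and drops the oldest context if the budget is exceeded.
    """
    budget = max_prompt_tokens * chars_per_token
    return prompt if len(prompt) <= budget else prompt[-budget:]
```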

[BUG] Chroma DB Duplicate ID Error

This is my code:

import os
os.environ["OPENAI_API_KEY"] = "sk-???"
from embedchain import App
naval_chat_bot = App()
naval_chat_bot.add_local("pdf_file", "docs/masnavi-en.pdf")
print(naval_chat_bot.query("Who is the most powerful man?"))

I get chromadb.errors.DuplicateIDError: Expected IDs to be unique, found duplicates for. Where is the problem?

P.S: This was my second attempt. The first one with a different pdf document was successful.
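This error usually means two chunks were assigned the same ID — for example, identical text appearing twice in a PDF. A sketch of deduplicating by content hash before inserting into the store; this illustrates the likely cause and a possible fix, not embedchain's internals:

```python
import hashlib

def chunk_id(text):
    """Deterministic ID for a chunk: identical text -> identical ID."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def dedupe(chunks):
    """Drop chunks whose ID was already seen, so a store that requires
    unique IDs (like Chroma) never receives duplicates."""
    seen, unique = set(), []
    for text in chunks:
        cid = chunk_id(text)
        if cid not in seen:
            seen.add(cid)
            unique.append((cid, text))
    return unique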

Feature Request: Parameters and OpenAI model

Parameters to specify OpenAI model and settings.

ex. I'm subclassing App and updating the model this way to test:

def get_openai_answer(self, prompt):
        messages = []
        messages.append({
            "role": "user", "content": prompt
        })
        response = openai.ChatCompletion.create(
            model="gpt-4-0613",
            messages=messages,
            temperature=0.25,
            max_tokens=1000,
            top_p=1,
        )
        return response["choices"][0]["message"]["content"]

It would be awesome to have a few parameters when querying for temperature, max_tokens, and top_p as well. Or set them globally / in env? Not sure what's best, but happy to create a PR.
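One shape this could take — `QueryConfig` is a hypothetical name, covering only the four settings mentioned in this issue — is a small config object whose fields expand directly into the ChatCompletion call:

```python
from dataclasses import dataclass, asdict

@dataclass
class QueryConfig:
    """Hypothetical knobs for the OpenAI call; not an embedchain API."""
    model: str = "gpt-3.5-turbo-0613"
    temperature: float = 0.0
    max_tokens: int = 1000
    top_p: float = 1.0

# The config would expand straight into the existing call, e.g.:
# openai.ChatCompletion.create(messages=messages, **asdict(config))
```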

add new format sqldatabase

Specifically, I'm working with Snowflake, but I would love to be able to select a table, or a set of tables, as a format source from my data warehouse.
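A loader for this could serialize each row as a text chunk, the same shape the document loaders emit. A sketch using sqlite3 for portability — the issue asks about Snowflake, but any DB-API connector would follow the same pattern; `rows_as_text` is a hypothetical helper:

```python
import sqlite3

def rows_as_text(conn, table):
    """Serialize each row of a table as 'col=value, ...' text,
    ready to be chunked and embedded like any other document."""
    cur = conn.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    return [", ".join(f"{c}={v}" for c, v in zip(cols, row)) for row in cur]
```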

Feature Request - Add DataFrames (Spark or Pandas) as Sources

Currently, embedchain allows the addition of various types of data sources such as YouTube videos, PDF files, and web pages to be processed and used in the application. This feature request proposes to extend this functionality to include DataFrames, specifically those from the Spark or Pandas libraries, as potential data sources.

DataFrames are a commonly used data structure for handling and manipulating data in Python, especially in data science and machine learning applications. They are particularly effective when dealing with large, structured datasets, which can include text data.

The ability to use DataFrames as a source of data would add a significant amount of flexibility to embedchain, as users could directly input their preprocessed and transformed data into the application. This could be beneficial in scenarios where the data is already available in a DataFrame format, such as when it has been preprocessed or transformed as part of a larger data pipeline.

The implementation of this feature would involve adding a new method to the App class (or modifying the existing .add() method) that accepts a DataFrame and its format (Spark or Pandas) as arguments. The method would then handle the loading of the data from the DataFrame into the application in the appropriate format, ready to be processed and used in the application.

This feature would increase the flexibility and usefulness of embedchain, making it more applicable to a wider range of scenarios and use-cases, and potentially attracting a broader user base. It would also align well with common data science workflows, which often involve the use of DataFrames for data manipulation and analysis.

Please consider adding this feature in a future update of embedchain.
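The row-to-chunk conversion described above could look like the sketch below for pandas (Spark would need only a `toPandas()` or row-iterator variant). `dataframe_to_chunks` is a hypothetical helper, not an existing embedchain method:

```python
import pandas as pd

def dataframe_to_chunks(df):
    """Turn each DataFrame row into a 'col: value' text chunk,
    matching the shape a document loader would emit."""
    return [
        "; ".join(f"{col}: {row[col]}" for col in df.columns)
        for _, row in df.iterrows()
    ]
```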

openai.error.ServiceUnavailableError: The server is overloaded or not ready yet.

My code:

import os
from keys import *
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

from embedchain import App

naval_chat_bot = App()

naval_chat_bot.add("web_page", "https://psymplicity.com/")

print(naval_chat_bot.query("what is the three-step approach to private mental health care"))

The Error:

Unable to connect optimized C data functions [No module named '_testbuffer'], falling back to pure Python
All data from https://psymplicity.com/ already exists in the database.
Traceback (most recent call last):
File "c:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\code\flask_app_2\embedchain_test.py", line 21, in <module>
print(naval_chat_bot.query("what is the three-step approach to private mental health care"))
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\embedchain\embedchain.py", line 225, in query
answer = self.get_answer_from_llm(prompt)
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\embedchain\embedchain.py", line 211, in get_answer_from_llm
answer = self.get_openai_answer(prompt)
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\embedchain\embedchain.py", line 162, in get_openai_answer
response = openai.ChatCompletion.create(
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\openai\api_resources\chat_completion.py", line 25, in create
return super().create(*args, **kwargs)
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 153, in create
response, _, api_key = requestor.request(
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\openai\api_requestor.py", line 298, in request
resp, got_stream = self._interpret_response(result, stream)
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\openai\api_requestor.py", line 700, in _interpret_response
self._interpret_response_line(
File "C:\Users\moshe\OneDrive - University College London\Code\gpt-autopilot\venv\lib\site-packages\openai\api_requestor.py", line 743, in _interpret_response_line
raise error.ServiceUnavailableError(
openai.error.ServiceUnavailableError: The server is overloaded or not ready yet.
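ServiceUnavailableError is a transient server-side error, so the usual mitigation is to retry with exponential backoff. A generic sketch — in real code `retriable` would be the OpenAI error classes rather than a bare `Exception`:

```python
import random
import time

def with_retries(call, max_attempts=5, base_delay=1.0, retriable=(Exception,)):
    """Retry a flaky zero-argument callable with exponential backoff
    and jitter; re-raises after the final failed attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retriable:
            if attempt == max_attempts - 1:
                raise
            # delay doubles each attempt, jittered to 50-100% of nominal
            time.sleep(base_delay * 2 ** attempt * (0.5 + random.random() / 2))
```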
