Comments (5)
The issue you're encountering stems from using `AutoTokenizer.from_pretrained`, which returns a tokenizer, not an embeddings model. Tokenizers prepare text for processing by models but don't generate embeddings themselves. To generate embeddings, you need a model designed for that purpose, such as those available from Hugging Face's transformers library.
For your specific need to use open-source models for embeddings, consider using models from the Hugging Face transformers library that are designed for generating embeddings. Here's a simplified approach:
- **Choose an embeddings model:** Select a model from Hugging Face that is suitable for generating embeddings, such as `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`.
- **Load the model and tokenizer:** Use the `AutoModel` and `AutoTokenizer` classes to load your model and tokenizer.
```python
from transformers import AutoModel, AutoTokenizer

model_name = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
```
- **Generate embeddings:** After tokenizing your text, pass the tokenized input to the model to obtain embeddings.
```python
import torch

inputs = tokenizer("Your text here", return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
embeddings = outputs.last_hidden_state.mean(dim=1)  # simple mean pooling over token vectors
```
This approach allows you to use models from Hugging Face directly for generating embeddings, bypassing the need for a method like `get_text_embedding_batch`, which is not a standard method on Hugging Face models or tokenizers.
Remember, the key is to use a model that is capable of generating embeddings, as opposed to a tokenizer, which is only responsible for preparing text inputs for a model.
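One caveat on the pooling step above: a plain `.mean(dim=1)` averages over padding tokens too, which skews embeddings for batched inputs of different lengths. A common fix is to weight the average by the attention mask. Here is a minimal sketch with toy tensors standing in for model output (the `mean_pool` helper is illustrative, not part of transformers):

```python
import torch

def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).float()     # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)  # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)        # avoid divide-by-zero
    return summed / counts

# Toy tensors: batch of 2, sequence length 4, hidden size 3.
hidden = torch.ones(2, 4, 3)
mask = torch.tensor([[1, 1, 0, 0],   # first text has 2 real tokens, 2 padding
                     [1, 1, 1, 1]])  # second text has no padding
emb = mean_pool(hidden, mask)  # shape (2, 3)
```

In a real pipeline you would pass `outputs.last_hidden_state` and `inputs["attention_mask"]` instead of the toy tensors.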
Details
Sources
- [Bug]: ctransformers not supported as OSS LLM models (from Huggingface)
- llama-index-core/llama_index/core/base/embeddings/base.py
- llama-index-integrations/embeddings/llama-index-embeddings-gradient/llama_index/embeddings/gradient/base.py
- llama-index-integrations/embeddings/llama-index-embeddings-adapter/llama_index/embeddings/adapter/base.py
- llama-index-integrations/embeddings/llama-index-embeddings-huggingface-optimum/llama_index/embeddings/huggingface_optimum/base.py
- llama-index-legacy/llama_index/legacy/embeddings/huggingface.py
About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
To continue the conversation, mention @dosu.
from llama_index.
@logan-markewich I think you tagged me on accident
Oops I did, whoops!
@alineberry you set the embedding model to be a tokenizer. Probably you should do this:

```shell
pip install llama-index-embeddings-huggingface
```

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="<...>", embed_batch_size=10, device="<cpu or cuda>")
```
😁 Man you really made my day 😂😂😂