truera / trulens
Evaluation and Tracking for LLM Experiments
Home Page: https://www.trulens.org/
License: MIT License
Hi,
I'm trying to integrate trulens-eval into our setup.
We are using the ChatOpenAI model with a ChatPromptTemplate in langchain.
When calling the chain directly, it works fine. Doing the same through TruChain results in an error.
Versions:
trulens-eval==0.10.0
langchain==0.0.266
This is the code reproducing the issue.
It demonstrates that calling the chain directly works.
from trulens_eval import TruChain
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from langchain.schema import SystemMessage
from langchain.prompts.chat import ChatPromptTemplate
# you need to set the OpenAI API key
llm = ChatOpenAI(temperature=0.9)
prompt = ChatPromptTemplate.from_messages(
    [SystemMessage(content="You are a friendly bot, who speaks like a dog")]
)
chain = LLMChain(llm=llm, prompt=prompt, verbose=True)
truchain = TruChain(
    chain,
    app_id="Chain1_ChatApplication",
)
input = {"input": "Hello"}
result_chain = chain(input)
print("Result from chain works fine: " + str(result_chain))
result = truchain(input)
print("Result from trulens: " + str(result))
Hello,
I'm wondering how we can log production data into a BigQuery database and then retrieve it later to analyze the results.
Relevant slack discussion thread - https://aiqualityforum.slack.com/archives/C02K2B8439S/p1689913999931319
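One possible direction, assuming trulens-eval's SQLAlchemy-backed Tru(database_url=...) constructor accepts any installed dialect: point it at BigQuery through the third-party sqlalchemy-bigquery driver. The bigquery:// URL scheme below comes from that driver, not from trulens itself, and the project/dataset names are placeholders.

from trulens_eval import Tru

# Hypothetical: requires `pip install sqlalchemy-bigquery` plus GCP credentials.
tru = Tru(database_url="bigquery://my-project/trulens_dataset")

# Records logged by instrumented apps would then land in BigQuery tables and
# could be read back later, e.g. with tru.get_records_and_feedback(app_ids=[]).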
Currently trulens supports the LangChain chain's call method, but doesn't support acall or asynchronous calls. Would be great to get that support!
Thanks!
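A hedged stopgap in the meantime, assuming the synchronous call_with_record API shown elsewhere in these issues: offload the blocking call to a worker thread so it doesn't stall the event loop.

import asyncio

async def atruchain_call(truchain, inputs):
    # asyncio.to_thread (Python 3.9+) runs the sync call in a thread pool.
    return await asyncio.to_thread(truchain.call_with_record, inputs)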
By measuring latency, we can compare how these models perform across different implementations. I believe it would be beneficial for evaluating and comparing the performance of this wrapper. In particular, when working with an eGPU in a sequential process, it may help triage bottleneck issues.
If I open the trulens-eval dashboard Evaluations page -> http://localhost:8501/Evaluations
and have multiple records,
when I select a different record than the first one in the row,
the first record's data is still shown in the bottom part "Display full app json".
Expected behaviour: the data for the selected row should be displayed.
My code to run trulens:
truchain = TruChain(chain, app_id="app_id", tags="my tag")
await truchain.acall_with_record(variables)
trulens-eval version: 0.11.0
The change to use a context manager doesn't actually log anything:
with tru_llm_standalone as recording:
    llm_standalone(prompt_input)
If instead I change that to
tru_llm_standalone.call_with_record(prompt_input)
I see records logged.
Affects releases: 0.9.0
Dashboard breaks if my Python environment has llama-index 0.7.24.post1 or later.
Not sure if it's an issue with the new llama-index releases or a compatibility issue between that and trulens-eval.
I can only get it to work by setting llama-index>=0.7.16,<=0.7.23 in requirements.txt or pyproject.toml.
To reproduce: update to llama-index 0.7.24.post1 or later (up to 0.8.2), then run Tru().start_dashboard() and browse to the dashboard.
See dashboard log here: log.txt
A temporary workaround would be pinning the llama-index version to 0.7.23 in requirements.txt and setup.py.
But users of trulens-eval would miss out on any future llama-index updates while the issue lasts.
Currently, it's hard to retrieve feedback function results in a structured way. It would be nice to have a get_feedbacks(record) API.
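A hedged workaround with the current API, assuming Tru exposes the same get_records_and_feedback call the dashboard leaderboard uses internally: pull all records as a dataframe and filter down to the one you want.

from trulens_eval import Tru

tru = Tru()
records_df, feedback_cols = tru.get_records_and_feedback(app_ids=[])
# The record_id value is a placeholder; take it from the record you logged.
one = records_df[records_df["record_id"] == "record_hash_..."]
print(one[feedback_cols])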
This will be helpful if we want to track user ID, session ID, customer ID, and other related information in the session.
The code below:
import os
os.environ["OPENAI_API_KEY"] = "sk-***"
os.environ["HUGGINGFACE_API_KEY"] = "hf_***"
# Imports main tools:
from trulens_eval import TruLlama, Feedback, Tru, feedback
tru = Tru()
from llama_index import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
import numpy as np
# Initialize Huggingface-based feedback function collection class:
hugs = feedback.Huggingface()
openai = feedback.OpenAI()
# Define a language match feedback function using HuggingFace.
f_lang_match = Feedback(hugs.language_match).on_input_output()
# By default this will check language match on the main app input and main app
# output.
# Question/answer relevance between overall question and answer.
f_qa_relevance = Feedback(openai.relevance).on_input_output()
# Question/statement relevance between question and each context chunk.
f_qs_relevance = Feedback(openai.qs_relevance).on_input().on(
    TruLlama.select_source_nodes().node.text
).aggregate(np.min)
tru_query_engine = TruLlama(query_engine,
    app_id='LlamaIndex_App1',
    feedbacks=[f_lang_match, f_qa_relevance, f_qs_relevance])
generates:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/mnt/llm/trulens/1.py in line 33
     27 # Question/statement relevance between question and each context chunk.
     29 f_qs_relevance = Feedback(openai.qs_relevance).on_input().on(
     30     TruLlama.select_source_nodes().node.text
     31 ).aggregate(np.min)
---> 33 tru_query_engine = TruLlama(query_engine,
     34     app_id='LlamaIndex_App1',
     35     feedbacks=[f_lang_match, f_qa_relevance, f_qs_relevance])
File ~/.conda/envs/llamaindex/lib/python3.10/site-packages/trulens_eval/tru_llama.py:134, in TruLlama.__init__(self, app, **kwargs)
    132 kwargs['app'] = app
    133 kwargs['root_class'] = Class.of_object(app)  # TODO: make class property
--> 134 kwargs['instrument'] = LlamaInstrument()
    136 super().__init__(**kwargs)
File ~/.conda/envs/llamaindex/lib/python3.10/site-packages/trulens_eval/tru_llama.py:101, in LlamaInstrument.__init__(self)
     97 def __init__(self):
     98     super().__init__(
     99         root_method=TruLlama.query_with_record,
    100         modules=LlamaInstrument.Default.MODULES,
--> 101         classes=LlamaInstrument.Default.CLASSES(),  # was thunk
    102         methods=LlamaInstrument.Default.METHODS
    103     )
File ~/.conda/envs/llamaindex/lib/python3.10/site-packages/trulens_eval/tru_llama.py:55, in LlamaInstrument.Default.<lambda>()
     42 MODULES = {"llama_index."}.union(
     43     LangChainInstrument.Default.MODULES
     44 )  # NOTE: llama_index uses langchain internally for some things
     46 # Putting these inside thunk as llama_index is optional.
     47 CLASSES = lambda: {
     48     llama_index.indices.query.base.BaseQueryEngine,
     49     llama_index.indices.base_retriever.BaseRetriever,
     50     llama_index.indices.base.BaseIndex,
     51     llama_index.chat_engine.types.BaseChatEngine,
     52     llama_index.prompts.base.Prompt,
     53     # llama_index.prompts.prompt_type.PromptType,  # enum
     54     llama_index.question_gen.types.BaseQuestionGenerator,
---> 55     llama_index.indices.query.response_synthesis.ResponseSynthesizer,
     56     llama_index.indices.response.refine.Refine,
     57     llama_index.llm_predictor.LLMPredictor,
     58     llama_index.llm_predictor.base.LLMMetadata,
     59     llama_index.llm_predictor.base.BaseLLMPredictor,
     60     llama_index.vector_stores.types.VectorStore,
     61     llama_index.question_gen.llm_generators.BaseQuestionGenerator,
     62     llama_index.indices.service_context.ServiceContext,
     63     llama_index.indices.prompt_helper.PromptHelper,
     64     llama_index.embeddings.base.BaseEmbedding,
     65     llama_index.node_parser.interface.NodeParser
     66 }.union(LangChainInstrument.Default.CLASSES())
     68 # Instrument only methods with these names and of these classes. Ok to
     69 # include llama_index inside methods.
     70 METHODS = dict_set_with(
     71     {
     72         "get_response":
(...)
     94     }, LangChainInstrument.Default.METHODS
     95 )
AttributeError: module 'llama_index.indices.query' has no attribute 'response_synthesis'
Hi, I have an application that uses Trulens scores and I would like to link that application to the Evaluations page.
Is it possible to enable query parameters so I can generate a URL with the App Filters?
It would be nice to write the filters into the URL:
https://docs.streamlit.io/library/api-reference/utilities/st.experimental_set_query_params
Then get the filters back:
https://docs.streamlit.io/library/api-reference/utilities/st.experimental_get_query_params
Thanks!
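A rough sketch of how the Evaluations page could honor such URLs, using the Streamlit utilities linked above (the app_ids parameter name is an assumption, not an existing trulens convention):

import streamlit as st

# Read filters from e.g. ?app_ids=Chain1_ChatApplication&app_ids=App2
params = st.experimental_get_query_params()
selected_apps = params.get("app_ids", [])

# ... use selected_apps as the default value of the app filter widget ...

# Reflect the user's current filter selection back into the URL:
st.experimental_set_query_params(app_ids=selected_apps)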
Need to rename the test to model_wrapper_non_eager_test.py instead of model_wrapper_test_non_eager.py to enable pytest.
Unit tests are failing.
Hi, it is exciting to have TruLens. I wonder if a BibTeX citation can be provided for users to cite the repo?
This code produces different gradients each time:
from trulens.nn.attribution import InputAttribution, InternalInfluence
from trulens.nn.attribution import IntegratedGradients
from trulens.nn.quantities import MaxClassQoI
from trulens.nn.distributions import PointDoi, LinearDoi
from trulens.nn.slices import Cut, InputCut, OutputCut, Slice
from trulens.nn.models import get_model_wrapper
from transformers import pipeline
summarizer = pipeline("summarization")
summarizer(
"""
America has changed dramatically during recent years. Not only has the number of
graduates in traditional engineering disciplines such as mechanical, civil,
electrical, chemical, and aeronautical engineering declined, but in most of
the premier American universities engineering curricula now concentrate on
and encourage largely the study of engineering science. As a result, there
are declining offerings in engineering subjects dealing with infrastructure,
the environment, and related issues, and greater concentration on high
technology subjects, largely supporting increasingly complex scientific
developments. While the latter is important, it should not be at the expense
of more traditional engineering.
Rapidly developing economies such as China and India, as well as other
industrial countries in Europe and Asia, continue to encourage and advance
the teaching of engineering. Both China and India, respectively, graduate
six and eight times as many traditional engineers as does the United States.
Other industrial countries at minimum maintain their output, while America
suffers an increasingly serious decline in the number of engineering graduates
and a lack of well-educated engineers.
"""
)
m = summarizer.model.to("cpu")
tokenizer = summarizer.tokenizer
batch_sentences = ["Hello I'm a single sentence", "And another sentence", "And the very very last one"]
batch = tokenizer(batch_sentences, padding=True, truncation=True, return_tensors="pt").to(m.device)
input = batch['input_ids']
wrapped_model = get_model_wrapper(m, input_shape=(None, 7), device='cpu')
embed_cut = Cut('model_encoder_layers_0_self_attn_q_proj', anchor='in')
from trulens.nn.quantities import QoI
import torch
class SummarizerQoI(QoI):
    def __call__(self, seq_2_seq_output):
        logits = seq_2_seq_output['logits']
        max_token, max_indices = torch.max(logits, dim=-1)
        qoi = torch.mean(max_token, 1, True)
        return qoi

infl = InternalInfluence(
    wrapped_model,
    Slice(embed_cut, OutputCut()),
    SummarizerQoI(),
    PointDoi(cut=embed_cut))
attrs = infl.attributions(**batch)
ValueError in the tensorflow 2 / keras notebook provided on the website.
The error occurs when executing the 6th code cell, which contains the following code:
infl = InputAttribution(model)
attrs_input = infl.attributions(x_pp)
Error Log:
ValueError Traceback (most recent call last)
<ipython-input-18-9b10bbb33c8d> in <module>()
1 infl = InputAttribution(model)
----> 2 attrs_input = infl.attributions(x_pp)
/usr/local/lib/python3.7/dist-packages/trulens/nn/attribution.py in attributions(self, *model_args, **model_kwargs)
269 to_cut=self.slice.to_cut,
270 intervention=D,
--> 271 doi_cut=doi_cut)
272 # Take the mean across the samples in the DoI.
273 if isinstance(qoi_grads, DATA_CONTAINER_TYPE):
/usr/local/lib/python3.7/dist-packages/trulens/nn/models/keras.py in qoi_bprop(self, qoi, model_args, model_kwargs, doi_cut, to_cut, attribution_cut, intervention)
430 intervention, DATA_CONTAINER_TYPE) else [intervention]
431
--> 432 Q = qoi(to_tensors[0]) if len(to_tensors) == 1 else qoi(to_tensors)
433
434 doi_tensors, intervention = self._prepare_intervention_with_input(
/usr/local/lib/python3.7/dist-packages/trulens/nn/quantities.py in __call__(self, y)
109
110 def __call__(self, y: TensorLike) -> TensorLike:
--> 111 self._assert_cut_contains_only_one_tensor(y)
112
113 if self.activation is not None:
/usr/local/lib/python3.7/dist-packages/trulens/nn/quantities.py in _assert_cut_contains_only_one_tensor(self, x)
76 '`{}` expected to receive an instance of `Tensor`, but '
77 'received an instance of {}'.format(
---> 78 self.__class__.__name__, type(x)))
79
80
ValueError: `MaxClassQoI` expected to receive an instance of `Tensor`, but received an instance of <class 'keras.engine.keras_tensor.KerasTensor'>
Bug when using the new AzureOpenAI wrapper added in PR #242
azure = AzureOpenAI(model_engine="gpt-35-turbo", deployment_id="gpt-4")
Traceback (most recent call last):
File "/home/lab/lab392-rfpvirtualassistant/scripts/langchain/eval.py", line 15, in <module>
pipeline = QAPipeline(config['AWS']['AWSKendraSearchIndexId'], config['OpenAI']['OpenAIAnswerDeployment'], eval_mode=True, answers=answers)
File "/home/lab/lab392-rfpvirtualassistant/scripts/langchain/pipeline.py", line 58, in __init__
azure = AzureOpenAI(model_engine="gpt-35-turbo", deployment_id=openAIDeployment)
File "/home/lab/anaconda/envs/lab392/lib/python3.10/site-packages/trulens_eval/feedback.py", line 1689, in __init__
super().__init__(
File "/home/lab/anaconda/envs/lab392/lib/python3.10/site-packages/trulens_eval/feedback.py", line 1049, in __init__
super().__init__(
File "/home/lab/anaconda/envs/lab392/lib/python3.10/site-packages/trulens_eval/feedback.py", line 1020, in __init__
super().__init__(*args, **kwargs)
File "/home/lab/anaconda/envs/lab392/lib/python3.10/site-packages/trulens_eval/util.py", line 1660, in __init__
super().__init__(__tru_class_info=class_info, **kwargs)
File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for AzureOpenAI
deployment_id
field required (type=value_error.missing)
I believe the problem is that in class OpenAI, the deployment_id parameter is lost since it is not assigned to self_kwargs. PR incoming with my suggestion to fix this bug.
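For reference, a hedged sketch of the kind of fix implied above; the names come from the traceback, and the real self_kwargs handling in trulens_eval/feedback.py may differ.

class Provider:  # stand-in for the real base class
    def __init__(self, **kwargs):
        self.kwargs = kwargs

class OpenAI(Provider):  # hypothetical excerpt of the suspected fix
    def __init__(self, *args, deployment_id=None, **kwargs):
        self_kwargs = dict(kwargs)
        if deployment_id is not None:
            self_kwargs["deployment_id"] = deployment_id  # previously dropped
        super().__init__(**self_kwargs)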
Hi team, I'm delighted to explore this library for finding notorious issues in our trained models.
I'm trying to apply TruLens to various tasks (CV - segmentation, localisation; NLP - classification, generation, etc.) and got stuck at the initial stages.
For starters, I tried to load pre-trained semantic/instance segmentation models and couldn't get them to work (after spending a few cycles on debugging).
A minute change in the intro_demo_pytorch.ipynb at the line:
pytorch_model = models.segmentation.fcn_resnet50(pretrained=True)
throws the following error while calculating the InputAttribution():
ValueError Traceback (most recent call last)
<ipython-input-11-9b10bbb33c8d> in <module>()
1 infl = InputAttribution(model)
----> 2 attrs_input = infl.attributions(x_pp)
3 frames
/usr/local/lib/python3.7/dist-packages/trulens/nn/attribution.py in attributions(self, *model_args, **model_kwargs)
269 to_cut=self.slice.to_cut,
270 intervention=D,
--> 271 doi_cut=doi_cut)
272 # Take the mean across the samples in the DoI.
273 if isinstance(qoi_grads, DATA_CONTAINER_TYPE):
/usr/local/lib/python3.7/dist-packages/trulens/nn/models/pytorch.py in qoi_bprop(self, qoi, model_args, model_kwargs, doi_cut, to_cut, attribution_cut, intervention)
461 grads_list = []
462 for z in zs:
--> 463 z_flat = ModelWrapper._flatten(z)
464 qoi_out = qoi(y)
465
/usr/local/lib/python3.7/dist-packages/trulens/nn/quantities.py in __call__(self, y)
109
110 def __call__(self, y: TensorLike) -> TensorLike:
--> 111 self._assert_cut_contains_only_one_tensor(y)
112
113 if self.activation is not None:
/usr/local/lib/python3.7/dist-packages/trulens/nn/quantities.py in _assert_cut_contains_only_one_tensor(self, x)
76 '`{}` expected to receive an instance of `Tensor`, but '
77 'received an instance of {}'.format(
---> 78 self.__class__.__name__, type(x)))
79
80
ValueError: `MaxClassQoI` expected to receive an instance of `Tensor`, but received an instance of <class 'collections.OrderedDict'>
Can you share some guidance on triaging this error? I'm using the same input image and transforms.
Hi team,
The record tracking function looks great!
I would like to ask: is it possible to use it without running a database? I mainly want to use it to get the tracking record for each of my calls.
Thank you very much.
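A hedged sketch, assuming the call_with_record API shown in other issues here: the record object is returned directly to your code, so you can inspect it in-process without the dashboard (whether writes to default.sqlite can be disabled entirely is not confirmed here).

result, record = truchain.call_with_record({"input": "Hello"})
print(record.record_id)   # identifier for this call's trace
print(record.json())      # full serialized record, available in memory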
I was wondering if it is supported to add multiple tags for a record.
Example code from the documentation for adding tags:
https://www.trulens.org/trulens_eval/langchain_quickstart/#instrument-chain-for-logging-with-trulens
truchain = TruChain(chain,
    app_id='Chain1_ChatApplication',
    feedbacks=[f_lang_match],
    tags="prototype")
What if I wanted to add one more tag, e.g.
tags = ["tag1","tag2"]
Is something like this supported?
Currently I get an error that a single string is expected.
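A hedged workaround until list tags are supported: pack multiple tags into the single accepted string with a delimiter, and split it back when filtering.

# tags must currently be a single string, so encode a list in one:
tags = ",".join(["tag1", "tag2"])
truchain = TruChain(chain,
    app_id='Chain1_ChatApplication',
    feedbacks=[f_lang_match],
    tags=tags)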
import os
from langchain import PromptTemplate, LLMChain
from langchain.llms import GPT4All
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.vectorstores.faiss import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from trulens_eval import TruChain, Feedback, Huggingface, Tru
tru = Tru()
os.environ["HUGGINGFACE_API_KEY"] = ""
gpt4all_path = './models/gpt4all-converted.bin'
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
hugs = Huggingface()
f_lang_match = Feedback(hugs.language_match).on_input_output()
llm = GPT4All(model=gpt4all_path,verbose=True,temp=0.2)
index = FAISS.load_local("my_faiss_index", embeddings)
template = """Answer the following question using {context}
Question: {question}
Answer:
"""
def get_best_answer(question):
    matched_docs, sources = similarity_search(question, index, n=4)
    context = "\n".join([doc.page_content for doc in matched_docs])
    prompt = PromptTemplate(template=template, input_variables=["context", "question"]).partial(context=context)
    llm_chain = LLMChain(prompt=prompt, llm=llm)
    truchain = TruChain(llm_chain, app_id='Chain1_ChatApplication', feedbacks=[f_lang_match], tags="prototype")
    answer = truchain.run(question)
    return answer

def similarity_search(query, index, n=4):
    matched_docs = index.similarity_search(query, k=n)
    sources = []
    for doc in matched_docs:
        sources.append(
            {
                "page_content": doc.page_content,
                "metadata": doc.metadata,
            }
        )
    return matched_docs, sources

while True:
    question = input("Please enter your question (or type 'exit' to close the program): ")
    if question.lower() == "exit":
        break
    answer = get_best_answer(question)
    print("Answer:", answer)
gives
✅ app Chain1_ChatApplication -> default.sqlite
✅ feedback def. feedback_definition_hash_44c43fe23fdbb98055154e6bb126142a -> default.sqlite
Traceback (most recent call last):
File "c:\Users\abhishek\Local-LLM\gpt4all_truelens.py", line 75, in <module>
answer = get_best_answer(question)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\abhishek\Local-LLM\gpt4all_truelens.py", line 48, in get_best_answer
answer = truchain.run(question)
^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'RuntimeError' object is not callable
from trulens_eval import Select
selector = Select.RecordCalls._call.args.inputs[["key1", "key2"]]
repr(selector)
Output:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../.venv/lib/python3.9/site-packages/trulens_eval/util.py", line 951, in __repr__
return "JSONPath()" + ("".join(map(repr, self.path)))
File ".../.venv/lib/python3.9/site-packages/trulens_eval/util.py", line 926, in __repr__
return f"[{','.join(self.indices)}]"
AttributeError: 'GetItems' object has no attribute 'indices'
Then obviously this crashes any Feedback that uses this selector.
Suggested fix in util.py:
def __repr__(self):
    return f"[{','.join(self.items)}]"
The notebooks directory contains notebooks named intro_demo_pytorch.ipynb and intro_demo_tf2_keras.ipynb; these notebooks show how trulens can be used for visualizing and explaining an image classification model. I request similar notebooks for NLP and time-series RNNs so that users can easily get started with them.
Thank you!
Quantities of interest receive output values on DoI samples, but these are not necessarily the instance being explained, except in some cases like the point DoI. Because of this, one cannot define a QoI for "explained instance predicted class logits", because the predicted class is not known unless the DoI sample is the same as the explained instance.
A related issue is that some may interpret a QoI like the default max to mean what I described above, but that is not what the max QoI implements, because of the issue noted.
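To make the first point concrete, a hedged sketch of what a "predicted class of the explained instance" QoI would require: the class index has to be computed once from the explained instance itself and then fixed, rather than re-derived from each DoI sample. ClassQoI is the existing fixed-class quantity; the helper below is hypothetical.

import torch
from trulens.nn.quantities import ClassQoI

def qoi_for_instance(model, x):
    # x is the single instance being explained; fix its predicted class up front.
    with torch.no_grad():
        predicted = int(model(x).argmax(dim=-1))
    return ClassQoI(predicted)  # class stays fixed across all DoI samples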
Currently trulens supports LlamaIndex's query method, but doesn't support aquery. We also need support for the chat_engine methods chat and achat.
Thanks!
This bug affects release 0.9.0 and was probably introduced with #362 (I tested with the codebase before and after the merge; before the merge it works fine).
While trying to reproduce this example from LangChain, I noticed that TruChain with an AgentExecutor causes TypeError: 'NoneType' object is not subscriptable in langchain.chains.base.
from langchain import LLMMathChain
from langchain.agents import initialize_agent, Tool, AgentType
from langchain.chat_models import ChatOpenAI
from trulens_eval import Tru, TruChain
llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
llm_math_chain = LLMMathChain.from_llm(llm, verbose=True)
tools = [
    Tool(
        name="Calculator",
        func=llm_math_chain.run,
        description="useful for when you need to answer questions about math"
    ),
]
agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=True)
app = TruChain(agent)
# works fine
agent(inputs={"input": "how much is Euler's number divided by PI"})
# raises `TypeError: 'NoneType' object is not subscriptable`
app(inputs={"input": "how much is Euler's number divided by PI"})
Output logs and stacktrace here: logs.txt
It seems that the output of torchvision models is an OrderedDict and not a torch.Tensor, which by default has a single key, 'out'. Currently a custom QoI is required to handle this type of output:
class TorchvisionMaxClassQoI(MaxClassQoI):
    def __call__(self, y):
        return super().__call__(y['out'])
It would be nice to handle this case by default instead of requiring a custom QoI. If the input y to a QoI is a dict or OrderedDict with exactly one entry, then that entry should be used in place of y rather than raising an exception (as this is unambiguously the output we would care about).
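A hedged sketch of that proposed default, as a small helper a QoI could apply before its single-tensor check (the function name is illustrative, not from the codebase):

from collections import OrderedDict

def unwrap_singleton_mapping(y):
    # A one-entry dict/OrderedDict output (e.g. torchvision's {'out': tensor})
    # unambiguously identifies the tensor of interest.
    if isinstance(y, (dict, OrderedDict)) and len(y) == 1:
        return next(iter(y.values()))
    return y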
I don't use Jupyter notebooks, but I do use LLMs. Could you add a quickstart that doesn't require Jupyter?
The Trulens package pins fastavro==1.7.4.
Version 1.7.4 has an issue that has been fixed in the latest update (1.8.2).
Can you please fix this?
Some feedback functions may need to operate on the final formatted prompt, not only on prompt inputs. For example, when using the ReAct pattern the prompt might contain dynamic instructions or examples that need to be taken into account when evaluating the LLM behavior.
One could try to achieve that by passing both the prompt template and the inputs dictionary to the feedback function, then letting it format the prompt before calculating the metric:
class CustomProvider(Provider):
    def my_feedback_func(self, prompt: str, inputs: dict) -> float:
        prompt = prompt.format(**inputs)
        return float(len(prompt))

Feedback(CustomProvider().my_feedback_func) \
    .on(Select.App.app.prompt.template) \
    .on(Select.RecordCalls._call.args.inputs)
However, that is not possible, because the FeedbackCall schema doesn't allow a dict as an argument to the feedback function.
Using the above defined feedback in an app will produce the following error:
Feedback Function Exception Caught: Traceback (most recent call last):
File ".../.venv/lib/python3.9/site-packages/trulens_eval/feedback.py", line 862, in run
feedback_call = FeedbackCall(
File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for FeedbackCall
args -> inputs
str type expected (type=type_error.str)
Change the schema of FeedbackCall to the following:
class FeedbackCall(SerialModel):
    args: Dict[str, Union[str, JSON]]
    ret: float
Many visualizers, including the Tiler class, should be usable without necessarily using other TruLens features. However, typically the backend will be set when the user calls get_model_wrapper, unless they have explicitly set the backend environment variable (which is probably not the norm). This creates a problem when using the Tiler, as it uses the backend to get the channel dimension. When the backend has not been set either in the environment variable or by calling get_model_wrapper, the backend comes back as None, leading to an error that will perhaps be confusing to a user trying to use just the visualization library of TruLens.
I suggest that we try to handle the case where the backend is not set when the tiler is called (and other places the channel dimension comes up in the visualizers). Most of the time, the channel dimension should be able to be inferred, because it is used for the purpose of displaying RGB or grayscale images, which can only have a 3 or a 1 as the size of the channel dimension. Thus, the only ambiguous case is when the image itself has a height/width of 3 or 1. In those cases, perhaps we can adopt a default, e.g., the convention used by matplotlib (which I believe is channels last).
The new process when trying to obtain the channel dimension would be:
1. If the backend is set, use its channel-dimension convention as before.
2. Otherwise, infer the channel dimension from the image shape (the dimension of size 3 or 1).
3. If that is ambiguous, fall back to a default convention, e.g., channels last, as matplotlib expects.
This way the visualizations module never throws errors simply because the user never specified the backend.
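A hedged sketch of that inference for a 4-D image batch (conventions assumed, not taken from the TruLens codebase):

def infer_channel_axis(shape, default_axis=-1):
    # The channel dimension must be 1 (grayscale) or 3 (RGB); shape is (batch, d1, d2, d3).
    candidates = [i for i, d in enumerate(shape[1:], start=1) if d in (1, 3)]
    if len(candidates) == 1:
        return candidates[0]
    # Ambiguous (height/width also 1 or 3): fall back to channels-last,
    # matching matplotlib's convention.
    return default_axis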
Hello! I would appreciate any help solving this problem! I'm trying to use IntegratedGradients() to explain my model.
yolo = Load_Yolo_model() # yolo is <class 'tensorflow.python.keras.engine.functional.Functional'> object
model_wrapped = get_model_wrapper(yolo)
ig_computer = IntegratedGradients(model_wrapped, resolution=20)
with PIL.Image.open(image_path).convert('RGB') as img:
    x = np.array(img.resize((416, 416)))
    x_np = np.array(img.resize((416, 416), PIL.Image.ANTIALIAS))[np.newaxis]
input_attributions = ig_computer.attributions(x_np)
But I run into a problem:
Traceback (most recent call last):
File "trulens_yolov3.py", line 31, in <module>
input_attributions = ig_computer.attributions(x_np)
File "/home/ubuntu/anaconda3/envs/TF23/lib/python3.8/site-packages/trulens/nn/attribution.py", line 264, in attributions
qoi_grads = self.model.qoi_bprop(
File "/home/ubuntu/anaconda3/envs/TF23/lib/python3.8/site-packages/trulens/nn/models/tensorflow_v2.py", line 424, in qoi_bprop
Q = qoi(outputs[0]) if len(outputs) == 1 else qoi(outputs)
File "/home/ubuntu/anaconda3/envs/TF23/lib/python3.8/site-packages/trulens/nn/quantities.py", line 111, in __call__
self._assert_cut_contains_only_one_tensor(y)
File "/home/ubuntu/anaconda3/envs/TF23/lib/python3.8/site-packages/trulens/nn/quantities.py", line 63, in _assert_cut_contains_only_one_tensor
raise QoiCutSupportError(
trulens.nn.quantities.QoiCutSupportError: Cut provided to quantity of interest was comprised of multiple tensors, but `MaxClassQoI` is only defined for cuts comprised of a single tensor (received a list of 3 tensors).
Either (1) select a slice where the `to_cut` corresponds to a single tensor, or (2) implement/use a `QoI` object that supports lists of tensors, i.e., where the parameter, `x`, to `__call__` is expected/allowed to be a list of 3 tensors.
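Following option (2) from the error message, a hedged sketch of a QoI that accepts a list of output tensors; here each head is reduced to a max per example and the heads are summed, though whether that is a meaningful quantity for YOLO depends on the use case.

import tensorflow as tf
from trulens.nn.quantities import QoI

class MultiOutputMaxQoI(QoI):
    def __call__(self, ys):
        ys = ys if isinstance(ys, (list, tuple)) else [ys]
        # Reduce each head to one scalar per example, then sum across heads.
        return tf.add_n([
            tf.reduce_max(y, axis=list(range(1, len(y.shape)))) for y in ys
        ])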
In models.pytorch:
model_args = ModelWrapper._nested_apply(
    model_args, lambda model_args: model_args.requires_grad_(True))
The default cut argument to InputAttribution is set to None:
trulens/trulens/nn/attribution.py, line 465 in c93dbe3
This is then passed to InternalInfluence with a cut (InputCut(), None), and InternalInfluence interprets a None cut as an InputCut(). This is not the intended default behavior, which should be OutputCut().
This causes correctness issues, where attributions are returned identical to the input (tested on tensorflow==2.4.0).
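A hedged sketch of the implied fix: substitute OutputCut() inside InputAttribution itself rather than forwarding None (which InternalInfluence reads as InputCut()); the helper name is illustrative.

from trulens.nn.slices import InputCut, OutputCut, Slice

def input_attribution_slice(cut=None):
    to_cut = cut if cut is not None else OutputCut()  # the intended default
    return Slice(InputCut(), to_cut)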
FYI as a heads up:
If os.environ['TRULENS_BACKEND'] = 'keras' is set incorrectly, then the library runs into issues. The backend should be inferable from the model type upon wrapping the model.
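A hedged sketch of the requested inference, keyed off the model's module; the backend names follow trulens's common usage but should be checked against the library's accepted values.

def infer_backend(model):
    mod = type(model).__module__
    if mod.startswith("torch"):
        return "pytorch"
    if mod.startswith(("keras", "tensorflow")):
        return "tensorflow"
    raise ValueError(
        f"Cannot infer backend for {type(model)!r}; please set TRULENS_BACKEND")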
Hi team,
I'm trying to explain the output of localization models (fasterrcnn_resnet50_fpn, and others in Torchvision).
There are no issues when I wrap the PyTorch model in get_model_wrapper(), but passing this wrapped model to InputAttribution() or IntegratedGradients() throws what seems to be a trivial PyTorch error.
Would you know where in trulens/pytorch.py we are creating views, and whether that's something that can be quickly fixed?
RuntimeError Traceback (most recent call last)
<ipython-input-43-9b10bbb33c8d> in <module>()
1 infl = InputAttribution(model)
----> 2 attrs_input = infl.attributions(x_pp)
7 frames
/usr/local/lib/python3.7/dist-packages/trulens/nn/attribution.py in attributions(self, *model_args, **model_kwargs)
269 to_cut=self.slice.to_cut,
270 intervention=D,
--> 271 doi_cut=doi_cut)
272 # Take the mean across the samples in the DoI.
273 if isinstance(qoi_grads, DATA_CONTAINER_TYPE):
/usr/local/lib/python3.7/dist-packages/trulens/nn/models/pytorch.py in qoi_bprop(self, qoi, model_args, model_kwargs, doi_cut, to_cut, attribution_cut, intervention)
455 attribution_cut=attribution_cut,
456 intervention=intervention,
--> 457 return_tensor=False)
458
459 y = to_cut.access_layer(y)
/usr/local/lib/python3.7/dist-packages/trulens/nn/models/pytorch.py in fprop(self, model_args, model_kwargs, doi_cut, to_cut, attribution_cut, intervention, return_tensor, input_timestep)
370 ]
371 # Run the network.
--> 372 output = self.model(*model_args, **model_kwargs)
373 if isinstance(output, tuple):
374 output = output[0]
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1049 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1050 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051 return forward_call(*input, **kwargs)
1052 # Do not call functions when jit is used
1053 full_backward_hooks, non_full_backward_hooks = [], []
/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/generalized_rcnn.py in forward(self, images, targets)
75 original_image_sizes.append((val[0], val[1]))
76
---> 77 images, targets = self.transform(images, targets)
78
79 # Check for degenerate boxes
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1049 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1050 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051 return forward_call(*input, **kwargs)
1052 # Do not call functions when jit is used
1053 full_backward_hooks, non_full_backward_hooks = [], []
/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/transform.py in forward(self, images, targets)
118
119 image_sizes = [img.shape[-2:] for img in images]
--> 120 images = self.batch_images(images, size_divisible=self.size_divisible)
121 image_sizes_list: List[Tuple[int, int]] = []
122 for image_size in image_sizes:
/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/transform.py in batch_images(self, images, size_divisible)
222 batched_imgs = images[0].new_full(batch_shape, 0)
223 for img, pad_img in zip(images, batched_imgs):
--> 224 pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
225
226 return batched_imgs
RuntimeError: A view was created in no_grad mode and is being modified inplace with grad mode enabled. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.
I am running on an Ubuntu 20.04 machine. I have integrated trulens with llama-index for querying. Here is the code:
import os
from multiprocessing.managers import BaseManager
from trulens_eval import TruLlama, Feedback, Tru, feedback
from sqlalchemy import URL
from llama_index import SimpleWebPageReader
from llama_index import VectorStoreIndex
import numpy as np
def initialize_trulens():
    global feedbacks
    # Initiating the trulens dashboard
    url_object = URL.create(
        "postgresql+psycopg2",
        username=os.environ.get("SUPERBASE_USERNAME"),
        password=os.environ.get("SUPERBASE_PASSWORD"),
        host=os.environ.get("SUPERBASE_HOST"),
        port=os.environ.get("SUPERBASE_PORT"),
        database=os.environ.get("SUPERBASE_DATABASE")
    )
    tru = Tru(database_url=url_object)
    tru.run_dashboard()
    # Initialize Huggingface-based feedback function collection class:
    hugs = feedback.Huggingface()
    openai = feedback.OpenAI()
    # Define a language match feedback function using HuggingFace.
    f_lang_match = Feedback(hugs.language_match).on_input_output()
    # By default this will check language match on the main app input and main
    # app output.
    # Question/answer relevance between overall question and answer.
    f_qa_relevance = Feedback(openai.relevance).on_input_output()
    # Question/statement relevance between question and each context chunk.
    f_qs_relevance = Feedback(openai.qs_relevance).on_input().on(
        TruLlama.select_source_nodes().node.text
    ).aggregate(np.min)
    feedbacks = [f_lang_match, f_qa_relevance, f_qs_relevance]

def query():
    documents = SimpleWebPageReader(html_to_text=True).load_data(
        ["http://paulgraham.com/worked.html"]
    )
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()
    tru_query_engine_recorder = TruLlama(
        query_engine,
        app_id='LlamaIndex_App1',
        feedbacks=feedbacks
    )
    response = tru_query_engine_recorder.query("What did the author do growing up?")
    print(response)
    return response

if __name__ == "__main__":
    print("server started...")
    initialize_trulens()
    manager = BaseManager(('', 5602), b'password')
    manager.register('query_index', query)
    server = manager.get_server()
    server.serve_forever()
After running the server, it works well and I can see the trulens dashboard. But after some time I get the following error and the trulens dashboard gets disconnected from the database:
Tru was already initialized. Cannot change database_url=postgresql+psycopg2://<database_username>:<database_password>@<database_host>:<database_port>/<database_name> or database_file=None .
/usr/local/lib/python3.10/site-packages/matplotlib/axes/_axes.py:6826: RuntimeWarning: All-NaN slice encountered
xmin = min(xmin, np.nanmin(xi))
/usr/local/lib/python3.10/site-packages/matplotlib/axes/_axes.py:6827: RuntimeWarning: All-NaN slice encountered
xmax = max(xmax, np.nanmax(xi))
2023-09-26 14:01:57.225 MediaFileHandler: Missing file 16a57003a519fd6aa13a2f733bd4fa522ba8e58545862765c5d0ce92.png
2023-09-26 14:01:57.235 MediaFileHandler: Missing file 6b6481b1f4db67286783dec664cc63b2153e3c7c262a3bff8a328923.png
2023-09-26 14:01:57.236 MediaFileHandler: Missing file 37d4f4b62a7b6acf699e54df90a7bf371cbab2d17fca60c9372fd503.png
(... the above All-NaN warnings, MediaFileHandler messages, and "Tru was already initialized" lines repeat ...)
Tru was already initialized. Cannot change database_url=postgresql+psycopg2://<database_username>:<database_password>@<database_host>:<database_port>/<database_name> or database_file=None .
2023-09-26 14:18:28.504 Session with id 36790e30-3097-4309-acd6-b1d908187e66 is already connected! Connecting to a new session.
Tru was already initialized. Cannot change database_url=postgresql+psycopg2://<database_username>:<database_password>@<database_host>:<database_port>/<database_name> or database_file=None .
2023-09-26 14:18:28.840 Uncaught app exception
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1965, in _exec_single_context
self.dialect.do_execute(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 921, in do_execute
cursor.execute(statement, parameters)
psycopg2.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 541, in _run_script
exec(code, module.__dict__)
File "/usr/local/lib/python3.10/site-packages/trulens_eval/Leaderboard.py", line 141, in <module>
main()
File "/usr/local/lib/python3.10/site-packages/trulens_eval/Leaderboard.py", line 137, in main
streamlit_app()
File "/usr/local/lib/python3.10/site-packages/trulens_eval/Leaderboard.py", line 51, in streamlit_app
df, feedback_col_names = lms.get_records_and_feedback([])
File "/usr/local/lib/python3.10/site-packages/trulens_eval/database/utils.py", line 60, in wrapper
callback(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/trulens_eval/database/sqlalchemy_db.py", line 44, in <lambda>
run_before(lambda self, *args, **kwargs: check_db_revision(self.engine)),
File "/usr/local/lib/python3.10/site-packages/trulens_eval/database/utils.py", line 112, in check_db_revision
if is_legacy_sqlite(engine):
File "/usr/local/lib/python3.10/site-packages/trulens_eval/database/utils.py", line 82, in is_legacy_sqlite
tables = list(inspector.get_table_names())
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 396, in get_table_names
return self.dialect.get_table_names(
File "<string>", line 2, in get_table_names
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 97, in cache
ret = fn(self, con, *args, **kw)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/base.py", line 3368, in get_table_names
return self._get_relnames_for_relkinds(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/base.py", line 3364, in _get_relnames_for_relkinds
return connection.scalars(query).all()
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1344, in scalars
return self.execute(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1412, in execute
return meth(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 516, in _execute_on_connection
return connection._execute_clauseelement(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1635, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1844, in _execute_context
return self._exec_single_context(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1984, in _exec_single_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2339, in _handle_dbapi_exception
raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1965, in _exec_single_context
self.dialect.do_execute(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 921, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[SQL: SELECT pg_catalog.pg_class.relname
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace
WHERE pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s, %(param_2)s]) AND pg_catalog.pg_class.relpersistence != %(relpersistence_1)s AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname_1)s]
[parameters: {'param_1': 'r', 'param_2': 'p', 'relpersistence_1': 't', 'nspname_1': 'pg_catalog'}]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Use case: when changing the input to a custom input, agreement_measure returns nan.
If one changes the input like so:
f_agreement_measure = Feedback(GroundTruthAgreement(answers, azure).agreement_measure).on(
    Select.Record.calls[0].rets.query).on(
    Select.Record.main_output["answer"]
)
You can see that in agreement_measure_calls, the ret and meta are empty because it wasn't able to find the input in the dictionary:
[{'args': {'prompt': 'Prompt 1 Modified', 'response': "response 1"}, 'ret': nan, 'meta': {}}]
From what I've understood so far, this is because GroundTruthAgreement and _find_response() search for the original query Prompt 1, as in Select.Record.main_input. Here, however, they search with a query containing Prompt 1 Modified, taken from Select.Record.calls[0].rets.query, and can't find it in:
self.ground_truth = [{'query': 'Prompt 1', 'response': 'response 1'}]
Therefore, it returns nan.
When I follow the demo in langchain_quickstart.ipynb, an error occurs at this line:
from IPython.display import JSON
# Imports main tools:
from trulens_eval import TruChain, Feedback, Huggingface, Tru  # Error here
from trulens_eval.schema import FeedbackResult
tru = Tru()
The full error message is below:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[26], line 4
1 from IPython.display import JSON
3 # Imports main tools:
----> 4 from trulens_eval import TruChain, Feedback, Huggingface, Tru
5 from trulens_eval.schema import FeedbackResult
6 tru = Tru()
File ~/miniconda3/envs/py310/lib/python3.10/site-packages/trulens_eval/__init__.py:83
1 """
2 # Trulens-eval LLM Evaluation Library
3
(...)
78
79 """
81 __version__ = "0.17.0"
---> 83 from trulens_eval.feedback import Bedrock
84 from trulens_eval.feedback import Feedback
85 from trulens_eval.feedback import Huggingface
File ~/miniconda3/envs/py310/lib/python3.10/site-packages/trulens_eval/feedback/__init__.py:14
11 AggCallable = Callable[[Iterable[float]], float]
13 # Specific feedback functions:
---> 14 from trulens_eval.feedback.embeddings import Embeddings
15 # Main class holding and running feedback functions:
16 from trulens_eval.feedback.feedback import Feedback
File ~/miniconda3/envs/py310/lib/python3.10/site-packages/trulens_eval/feedback/embeddings.py:8
5 from pydantic import PrivateAttr
7 from trulens_eval.utils.imports import REQUIREMENT_SKLEARN
----> 8 from trulens_eval.utils.pyschema import WithClassInfo
9 from trulens_eval.utils.serial import SerialModel
12 class Embeddings(SerialModel, WithClassInfo):
File ~/miniconda3/envs/py310/lib/python3.10/site-packages/trulens_eval/utils/pyschema.py:588
584 # Key of structure where class information is stored.
585 CLASS_INFO = "__tru_class_info"
--> 588 class WithClassInfo(pydantic.BaseModel):
589 """
590 Mixin to track class information to aid in querying serialized components
591 without having to load them.
592 """
594 # Using this odd key to not pollute attribute names in whatever class we mix
595 # this into. Should be the same as CLASS_INFO.
File ~/miniconda3/envs/py310/lib/python3.10/site-packages/pydantic/_internal/_model_construction.py:104, in __new__(mcs, cls_name, bases, namespace, __pydantic_generic_metadata__, __pydantic_reset_parent_namespace__, **kwargs)
File ~/miniconda3/envs/py310/lib/python3.10/site-packages/pydantic/_internal/_model_construction.py:345, in inspect_namespace(namespace, ignored_types, base_class_vars, base_class_fields)
NameError: Fields must not use names with leading underscores; e.g., use 'WithClassInfo__tru_class_info' instead of '_WithClassInfo__tru_class_info'.
Hi there 👋
I am using trulens_eval.feedback.OpenAI, but I need to pass extra args to use it with Azure. I don't understand the usage of OpenAIEndpoint, since I cannot see extra args to set up openai. Usually you can do something like:
openai.api_type = "azure"
openai.api_base = os.getenv("OPENAI_API_BASE")
openai.api_version = "2023-03-15-preview"
openai.api_key = os.getenv("OPENAI_API_KEY")
Not sure how to do it with trulens
Thanks a lot
Cheers,
Fra
Describe the bug
The symptom occurs with the error 'Tensor' has no attribute 'numpy'.
The problem is that this call in tensorflow_v2 expects eager mode, but the model is not running eagerly.
To Reproduce
Implement a tensorflow.keras.Model and override the call() function with the @tf.function annotation.
A quick hack fix is to remove the annotation, but this is not the correct long-term solution.
import tensorflow as tf
from tensorflow.keras import Model

class CustomModel(Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    @tf.function  # quick hack fix: remove this annotation
    def call(self, inputs):
        return self.dense(inputs)

(Run attribution on this model)