truera / trulens
Evaluation and Tracking for LLM Experiments
Home Page: https://www.trulens.org/
License: MIT License
Hi,
I'm trying to integrate trulens-eval into our setup.
We are using the ChatOpenAI model with a ChatPromptTemplate in langchain.
When calling the chain directly, it works fine. Doing the same through TruChain results in an error.
Versions:
trulens-eval==0.10.0
langchain==0.0.266
This is the code reproducing the issue.
It demonstrates that calling the chain directly works.
from trulens_eval import TruChain
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from langchain.schema import SystemMessage
from langchain.prompts.chat import ChatPromptTemplate
# you need to set the OpenAI API key
llm = ChatOpenAI(temperature=0.9)
prompt = ChatPromptTemplate.from_messages(
    [SystemMessage(content="You are a friendly bot, who speaks like a dog")]
)
chain = LLMChain(llm=llm, prompt=prompt, verbose=True)
truchain = TruChain(
    chain,
    app_id="Chain1_ChatApplication",
)
input = {"input": "Hello"}
result_chain = chain(input)
print("Result from chain works fine: " + str(result_chain))
result = truchain(input)
print("Result from trulens: " + str(result))
Hello,
I'm wondering how we can log production data into a BigQuery database and then retrieve it later to analyze the results.
Relevant slack discussion thread - https://aiqualityforum.slack.com/archives/C02K2B8439S/p1689913999931319
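One possible direction, assuming trulens-eval's SQLAlchemy-backed Tru(database_url=...) constructor accepts any installed dialect: point it at BigQuery through the third-party sqlalchemy-bigquery driver. The bigquery:// URL scheme below comes from that driver, not from trulens itself, and the project/dataset names are placeholders.

from trulens_eval import Tru

# Hypothetical: requires `pip install sqlalchemy-bigquery` plus GCP credentials.
tru = Tru(database_url="bigquery://my-project/trulens_dataset")

# Records logged by instrumented apps would then land in BigQuery tables and
# could be read back later, e.g. with tru.get_records_and_feedback(app_ids=[]).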
Currently trulens supports the LangChain chain's call method, but doesn't support acall or asynchronous calls. Would be great to get that support!
Thanks!
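A hedged stopgap in the meantime, assuming the synchronous call_with_record API shown elsewhere in these issues: offload the blocking call to a worker thread so it doesn't stall the event loop.

import asyncio

async def atruchain_call(truchain, inputs):
    # asyncio.to_thread (Python 3.9+) runs the sync call in a thread pool.
    return await asyncio.to_thread(truchain.call_with_record, inputs)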
By measuring latency, we can compare how these models perform across different implementations. I believe it would be beneficial for evaluating and comparing the performance of this wrapper. In particular, when working with an eGPU in a sequential process, it may help triage bottleneck issues.
If I open the trulens-eval dashboard Evaluations page -> http://localhost:8501/Evaluations
and have multiple records,
when I select a different record than the first one in the row,
the first record's data is still shown in the bottom part "Display full app json".
Expected behaviour: the data for the selected row should be displayed.
My code to run trulens:
truchain = TruChain(chain, app_id="app_id", tags="my tag")
await truchain.acall_with_record(variables)
trulens-eval version: 0.11.0
The change to use a context manager doesn't actually log anything:
with tru_llm_standalone as recording:
    llm_standalone(prompt_input)
If instead I change that to
tru_llm_standalone.call_with_record(prompt_input)
I see records logged.
Affects releases: 0.9.0
Dashboard breaks if my Python environment has llama-index 0.7.24.post1 or later.
Not sure if it's an issue with the new llama-index releases or a compatibility issue between that and trulens-eval.
I can only get it to work by setting llama-index>=0.7.16,<=0.7.23 in requirements.txt or pyproject.toml.
To reproduce: update to llama-index 0.7.24.post1 or later (up to 0.8.2), then run Tru().start_dashboard() and browse to the dashboard.
See dashboard log here: log.txt
A temporary workaround would be pinning the llama-index version to 0.7.23 in requirements.txt and setup.py.
But users of trulens-eval would miss out on any future llama-index updates while the issue lasts.
Currently, it's hard to retrieve feedback function results in a structured way. It would be nice to have a get_feedbacks(record) API.
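A hedged workaround with the current API, assuming Tru exposes the same get_records_and_feedback call the dashboard leaderboard uses internally: pull all records as a dataframe and filter down to the one you want.

from trulens_eval import Tru

tru = Tru()
records_df, feedback_cols = tru.get_records_and_feedback(app_ids=[])
# The record_id value is a placeholder; take it from the record you logged.
one = records_df[records_df["record_id"] == "record_hash_..."]
print(one[feedback_cols])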
This will be helpful if we want to track user ID, session ID, customer ID, and other related information in the session.
The code below:
import os
os.environ["OPENAI_API_KEY"] = "sk-***"
os.environ["HUGGINGFACE_API_KEY"] = "hf_***"
# Imports main tools:
from trulens_eval import TruLlama, Feedback, Tru, feedback
tru = Tru()
from llama_index import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader('data').load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
import numpy as np
# Initialize Huggingface-based feedback function collection class:
hugs = feedback.Huggingface()
openai = feedback.OpenAI()
# Define a language match feedback function using HuggingFace.
f_lang_match = Feedback(hugs.language_match).on_input_output()
# By default this will check language match on the main app input and main app
# output.
# Question/answer relevance between overall question and answer.
f_qa_relevance = Feedback(openai.relevance).on_input_output()
# Question/statement relevance between question and each context chunk.
f_qs_relevance = Feedback(openai.qs_relevance).on_input().on(
    TruLlama.select_source_nodes().node.text
).aggregate(np.min)
tru_query_engine = TruLlama(query_engine,
    app_id='LlamaIndex_App1',
    feedbacks=[f_lang_match, f_qa_relevance, f_qs_relevance])
generates:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/mnt/llm/trulens/1.py in line 33
     27 # Question/statement relevance between question and each context chunk.
     29 f_qs_relevance = Feedback(openai.qs_relevance).on_input().on(
     30     TruLlama.select_source_nodes().node.text
     31 ).aggregate(np.min)
---> 33 tru_query_engine = TruLlama(query_engine,
     34     app_id='LlamaIndex_App1',
     35     feedbacks=[f_lang_match, f_qa_relevance, f_qs_relevance])
File ~/.conda/envs/llamaindex/lib/python3.10/site-packages/trulens_eval/tru_llama.py:134, in TruLlama.__init__(self, app, **kwargs)
    132 kwargs['app'] = app
    133 kwargs['root_class'] = Class.of_object(app)  # TODO: make class property
--> 134 kwargs['instrument'] = LlamaInstrument()
    136 super().__init__(**kwargs)
File ~/.conda/envs/llamaindex/lib/python3.10/site-packages/trulens_eval/tru_llama.py:101, in LlamaInstrument.__init__(self)
     97 def __init__(self):
     98     super().__init__(
     99         root_method=TruLlama.query_with_record,
    100         modules=LlamaInstrument.Default.MODULES,
--> 101         classes=LlamaInstrument.Default.CLASSES(),  # was thunk
    102         methods=LlamaInstrument.Default.METHODS
    103     )
File ~/.conda/envs/llamaindex/lib/python3.10/site-packages/trulens_eval/tru_llama.py:55, in LlamaInstrument.Default.<lambda>()
     42 MODULES = {"llama_index."}.union(
     43     LangChainInstrument.Default.MODULES
     44 )  # NOTE: llama_index uses langchain internally for some things
     46 # Putting these inside thunk as llama_index is optional.
     47 CLASSES = lambda: {
     48     llama_index.indices.query.base.BaseQueryEngine,
     49     llama_index.indices.base_retriever.BaseRetriever,
     50     llama_index.indices.base.BaseIndex,
     51     llama_index.chat_engine.types.BaseChatEngine,
     52     llama_index.prompts.base.Prompt,
     53     # llama_index.prompts.prompt_type.PromptType,  # enum
     54     llama_index.question_gen.types.BaseQuestionGenerator,
---> 55     llama_index.indices.query.response_synthesis.ResponseSynthesizer,
     56     llama_index.indices.response.refine.Refine,
     57     llama_index.llm_predictor.LLMPredictor,
     58     llama_index.llm_predictor.base.LLMMetadata,
     59     llama_index.llm_predictor.base.BaseLLMPredictor,
     60     llama_index.vector_stores.types.VectorStore,
     61     llama_index.question_gen.llm_generators.BaseQuestionGenerator,
     62     llama_index.indices.service_context.ServiceContext,
     63     llama_index.indices.prompt_helper.PromptHelper,
     64     llama_index.embeddings.base.BaseEmbedding,
     65     llama_index.node_parser.interface.NodeParser
     66 }.union(LangChainInstrument.Default.CLASSES())
     68 # Instrument only methods with these names and of these classes. Ok to
     69 # include llama_index inside methods.
     70 METHODS = dict_set_with(
     71     {
     72         "get_response":
(...)
     94     }, LangChainInstrument.Default.METHODS
     95 )
AttributeError: module 'llama_index.indices.query' has no attribute 'response_synthesis'
Hi, I have an application that uses Trulens scores and I would like to link that application to the Evaluations page.
Is it possible to enable query parameters so I can generate a URL with the App Filters?
It would be nice to write the filters into the URL:
https://docs.streamlit.io/library/api-reference/utilities/st.experimental_set_query_params
Then get the filters back:
https://docs.streamlit.io/library/api-reference/utilities/st.experimental_get_query_params
Thanks!
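A rough sketch of how the Evaluations page could honor such URLs, using the Streamlit utilities linked above (the app_ids parameter name is an assumption, not an existing trulens convention):

import streamlit as st

# Read filters from e.g. ?app_ids=Chain1_ChatApplication&app_ids=App2
params = st.experimental_get_query_params()
selected_apps = params.get("app_ids", [])

# ... use selected_apps as the default value of the app filter widget ...

# Reflect the user's current filter selection back into the URL:
st.experimental_set_query_params(app_ids=selected_apps)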
Need to rename the test to model_wrapper_non_eager_test.py instead of model_wrapper_test_non_eager.py to enable pytest.
Unit tests are failing.
Hi, it is exciting to have TruLens. I wonder if a BibTeX citation can be provided for users to cite the repo?
This code produces different gradients each time:
from trulens.nn.attribution import InputAttribution, InternalInfluence
from trulens.nn.attribution import IntegratedGradients
from trulens.nn.quantities import MaxClassQoI
from trulens.nn.distributions import PointDoi, LinearDoi
from trulens.nn.slices import Cut, InputCut, OutputCut, Slice
from trulens.nn.models import get_model_wrapper
from transformers import pipeline
summarizer = pipeline("summarization")
summarizer(
"""
America has changed dramatically during recent years. Not only has the number of
graduates in traditional engineering disciplines such as mechanical, civil,
electrical, chemical, and aeronautical engineering declined, but in most of
the premier American universities engineering curricula now concentrate on
and encourage largely the study of engineering science. As a result, there
are declining offerings in engineering subjects dealing with infrastructure,
the environment, and related issues, and greater concentration on high
technology subjects, largely supporting increasingly complex scientific
developments. While the latter is important, it should not be at the expense
of more traditional engineering.
Rapidly developing economies such as China and India, as well as other
industrial countries in Europe and Asia, continue to encourage and advance
the teaching of engineering. Both China and India, respectively, graduate
six and eight times as many traditional engineers as does the United States.
Other industrial countries at minimum maintain their output, while America
suffers an increasingly serious decline in the number of engineering graduates
and a lack of well-educated engineers.
"""
)
m = summarizer.model.to("cpu")
tokenizer = summarizer.tokenizer
batch_sentences = ["Hello I'm a single sentence", "And another sentence", "And the very very last one"]
batch = tokenizer(batch_sentences, padding=True, truncation=True, return_tensors="pt").to(m.device)
input = batch['input_ids']
wrapped_model = get_model_wrapper(m, input_shape=(None, 7), device='cpu')
embed_cut = Cut('model_encoder_layers_0_self_attn_q_proj', anchor='in')
from trulens.nn.quantities import QoI
import torch
class SummarizerQoI(QoI):
    def __call__(self, seq_2_seq_output):
        logits = seq_2_seq_output['logits']
        max_token, max_indices = torch.max(logits, dim=-1)
        qoi = torch.mean(max_token, 1, True)
        return qoi

infl = InternalInfluence(
    wrapped_model,
    Slice(embed_cut, OutputCut()),
    SummarizerQoI(),
    PointDoi(cut=embed_cut))
attrs = infl.attributions(**batch)
ValueError in the tensorflow 2 / keras notebook provided on the website.
The error occurs when executing the 6th code cell, which contains the following code:
infl = InputAttribution(model)
attrs_input = infl.attributions(x_pp)
Error Log:
ValueError Traceback (most recent call last)
<ipython-input-18-9b10bbb33c8d> in <module>()
1 infl = InputAttribution(model)
----> 2 attrs_input = infl.attributions(x_pp)
/usr/local/lib/python3.7/dist-packages/trulens/nn/attribution.py in attributions(self, *model_args, **model_kwargs)
269 to_cut=self.slice.to_cut,
270 intervention=D,
--> 271 doi_cut=doi_cut)
272 # Take the mean across the samples in the DoI.
273 if isinstance(qoi_grads, DATA_CONTAINER_TYPE):
/usr/local/lib/python3.7/dist-packages/trulens/nn/models/keras.py in qoi_bprop(self, qoi, model_args, model_kwargs, doi_cut, to_cut, attribution_cut, intervention)
430 intervention, DATA_CONTAINER_TYPE) else [intervention]
431
--> 432 Q = qoi(to_tensors[0]) if len(to_tensors) == 1 else qoi(to_tensors)
433
434 doi_tensors, intervention = self._prepare_intervention_with_input(
/usr/local/lib/python3.7/dist-packages/trulens/nn/quantities.py in __call__(self, y)
109
110 def __call__(self, y: TensorLike) -> TensorLike:
--> 111 self._assert_cut_contains_only_one_tensor(y)
112
113 if self.activation is not None:
/usr/local/lib/python3.7/dist-packages/trulens/nn/quantities.py in _assert_cut_contains_only_one_tensor(self, x)
76 '`{}` expected to receive an instance of `Tensor`, but '
77 'received an instance of {}'.format(
---> 78 self.__class__.__name__, type(x)))
79
80
ValueError: `MaxClassQoI` expected to receive an instance of `Tensor`, but received an instance of <class 'keras.engine.keras_tensor.KerasTensor'>
Bug when using the new AzureOpenAI wrapper added in PR #242
azure = AzureOpenAI(model_engine="gpt-35-turbo", deployment_id="gpt-4")
Traceback (most recent call last):
File "/home/lab/lab392-rfpvirtualassistant/scripts/langchain/eval.py", line 15, in <module>
pipeline = QAPipeline(config['AWS']['AWSKendraSearchIndexId'], config['OpenAI']['OpenAIAnswerDeployment'], eval_mode=True, answers=answers)
File "/home/lab/lab392-rfpvirtualassistant/scripts/langchain/pipeline.py", line 58, in __init__
azure = AzureOpenAI(model_engine="gpt-35-turbo", deployment_id=openAIDeployment)
File "/home/lab/anaconda/envs/lab392/lib/python3.10/site-packages/trulens_eval/feedback.py", line 1689, in __init__
super().__init__(
File "/home/lab/anaconda/envs/lab392/lib/python3.10/site-packages/trulens_eval/feedback.py", line 1049, in __init__
super().__init__(
File "/home/lab/anaconda/envs/lab392/lib/python3.10/site-packages/trulens_eval/feedback.py", line 1020, in __init__
super().__init__(*args, **kwargs)
File "/home/lab/anaconda/envs/lab392/lib/python3.10/site-packages/trulens_eval/util.py", line 1660, in __init__
super().__init__(__tru_class_info=class_info, **kwargs)
File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for AzureOpenAI
deployment_id
field required (type=value_error.missing)
I believe the problem is that in class OpenAI, the deployment_id parameter is lost since it is not assigned to self_kwargs. PR incoming with my suggestion to fix this bug.
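For reference, a hedged sketch of the kind of fix implied above; the names come from the traceback, and the real self_kwargs handling in trulens_eval/feedback.py may differ.

class Provider:  # stand-in for the real base class
    def __init__(self, **kwargs):
        self.kwargs = kwargs

class OpenAI(Provider):  # hypothetical excerpt of the suspected fix
    def __init__(self, *args, deployment_id=None, **kwargs):
        self_kwargs = dict(kwargs)
        if deployment_id is not None:
            self_kwargs["deployment_id"] = deployment_id  # previously dropped
        super().__init__(**self_kwargs)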
Hi team, I'm delighted to explore this library for finding notorious issues in our trained models.
I'm trying to apply TruLens to various tasks (CV - segmentation, localisation; NLP - classification, generation, etc.) and got stuck at the initial stages.
For starters, I tried to load pre-trained semantic/instance segmentation models and couldn't get them to work (after spending a few cycles on debugging).
A minute change in the intro_demo_pytorch.ipynb at the line:
pytorch_model = models.segmentation.fcn_resnet50(pretrained=True)
throws the following error while calculating the InputAttribution():
ValueError Traceback (most recent call last)
<ipython-input-11-9b10bbb33c8d> in <module>()
1 infl = InputAttribution(model)
----> 2 attrs_input = infl.attributions(x_pp)
3 frames
/usr/local/lib/python3.7/dist-packages/trulens/nn/attribution.py in attributions(self, *model_args, **model_kwargs)
269 to_cut=self.slice.to_cut,
270 intervention=D,
--> 271 doi_cut=doi_cut)
272 # Take the mean across the samples in the DoI.
273 if isinstance(qoi_grads, DATA_CONTAINER_TYPE):
/usr/local/lib/python3.7/dist-packages/trulens/nn/models/pytorch.py in qoi_bprop(self, qoi, model_args, model_kwargs, doi_cut, to_cut, attribution_cut, intervention)
461 grads_list = []
462 for z in zs:
--> 463 z_flat = ModelWrapper._flatten(z)
464 qoi_out = qoi(y)
465
/usr/local/lib/python3.7/dist-packages/trulens/nn/quantities.py in __call__(self, y)
109
110 def __call__(self, y: TensorLike) -> TensorLike:
--> 111 self._assert_cut_contains_only_one_tensor(y)
112
113 if self.activation is not None:
/usr/local/lib/python3.7/dist-packages/trulens/nn/quantities.py in _assert_cut_contains_only_one_tensor(self, x)
76 '`{}` expected to receive an instance of `Tensor`, but '
77 'received an instance of {}'.format(
---> 78 self.__class__.__name__, type(x)))
79
80
ValueError: `MaxClassQoI` expected to receive an instance of `Tensor`, but received an instance of <class 'collections.OrderedDict'>
Can you share some guidance on triaging this error? I'm using the same input image and transforms.
Hi team,
The record tracking function looks great!
I would like to ask: is it possible to use it without running a database? I mainly want to use it to get the tracking record for each of my calls.
Thank you very much.
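A hedged sketch, assuming the call_with_record API shown in other issues here: the record object is returned directly to your code, so you can inspect it in-process without the dashboard (whether writes to default.sqlite can be disabled entirely is not confirmed here).

result, record = truchain.call_with_record({"input": "Hello"})
print(record.record_id)   # identifier for this call's trace
print(record.json())      # full serialized record, available in memory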
I was wondering if it is supported to add multiple tags for a record.
Example code from the documentation for adding tags:
https://www.trulens.org/trulens_eval/langchain_quickstart/#instrument-chain-for-logging-with-trulens
truchain = TruChain(chain,
    app_id='Chain1_ChatApplication',
    feedbacks=[f_lang_match],
    tags="prototype")
What if I wanted to add one more tag, e.g.
tags = ["tag1","tag2"]
Is something like this supported?
Currently I get an error that a single string is expected.
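A hedged workaround until list tags are supported: pack multiple tags into the single accepted string with a delimiter, and split it back when filtering.

# tags must currently be a single string, so encode a list in one:
tags = ",".join(["tag1", "tag2"])
truchain = TruChain(chain,
    app_id='Chain1_ChatApplication',
    feedbacks=[f_lang_match],
    tags=tags)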
import os
from langchain import PromptTemplate, LLMChain
from langchain.llms import GPT4All
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.vectorstores.faiss import FAISS
from langchain.embeddings import HuggingFaceEmbeddings
from trulens_eval import TruChain, Feedback, Huggingface, Tru
tru = Tru()
os.environ["HUGGINGFACE_API_KEY"] = ""
gpt4all_path = './models/gpt4all-converted.bin'
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
hugs = Huggingface()
f_lang_match = Feedback(hugs.language_match).on_input_output()
llm = GPT4All(model=gpt4all_path,verbose=True,temp=0.2)
index = FAISS.load_local("my_faiss_index", embeddings)
template = """Answer the following question using {context}
Question: {question}
Answer:
"""
def get_best_answer(question):
    matched_docs, sources = similarity_search(question, index, n=4)
    context = "\n".join([doc.page_content for doc in matched_docs])
    prompt = PromptTemplate(template=template, input_variables=["context", "question"]).partial(context=context)
    llm_chain = LLMChain(prompt=prompt, llm=llm)
    truchain = TruChain(llm_chain, app_id='Chain1_ChatApplication', feedbacks=[f_lang_match], tags="prototype")
    answer = truchain.run(question)
    return answer

def similarity_search(query, index, n=4):
    matched_docs = index.similarity_search(query, k=n)
    sources = []
    for doc in matched_docs:
        sources.append(
            {
                "page_content": doc.page_content,
                "metadata": doc.metadata,
            }
        )
    return matched_docs, sources

while True:
    question = input("Please enter your question (or type 'exit' to close the program): ")
    if question.lower() == "exit":
        break
    answer = get_best_answer(question)
    print("Answer:", answer)
gives
✅ app Chain1_ChatApplication -> default.sqlite
✅ feedback def. feedback_definition_hash_44c43fe23fdbb98055154e6bb126142a -> default.sqlite
Traceback (most recent call last):
File "c:\Users\abhishek\Local-LLM\gpt4all_truelens.py", line 75, in <module>
answer = get_best_answer(question)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\abhishek\Local-LLM\gpt4all_truelens.py", line 48, in get_best_answer
answer = truchain.run(question)
^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'RuntimeError' object is not callable
from trulens_eval import Select
selector = Select.RecordCalls._call.args.inputs[["key1", "key2"]]
repr(selector)
Output:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../.venv/lib/python3.9/site-packages/trulens_eval/util.py", line 951, in __repr__
return "JSONPath()" + ("".join(map(repr, self.path)))
File ".../.venv/lib/python3.9/site-packages/trulens_eval/util.py", line 926, in __repr__
return f"[{','.join(self.indices)}]"
AttributeError: 'GetItems' object has no attribute 'indices'
Then obviously this crashes any Feedback that uses this selector.
Suggested fix in util.py:
def __repr__(self):
    return f"[{','.join(self.items)}]"
The notebooks directory contains notebooks named intro_demo_pytorch.ipynb and intro_demo_tf2_keras.ipynb; these notebooks show how trulens can be used for visualizing and explaining an image classification model. I request similar notebooks for NLP and time-series RNNs so that users can easily get started with them.
Thank you!
Quantities of interest receive output values on DoI samples, but these are not necessarily the instance being explained, except in some cases like the point DoI. Because of this, one cannot define a QoI for "explained instance predicted class logits", because the predicted class is not known unless the DoI sample is the same as the explained instance.
A related issue is that some may interpret a QoI like the default max to mean what I described above, but that is not what the max QoI implements, because of the issue noted.
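To make the first point concrete, a hedged sketch of what a "predicted class of the explained instance" QoI would require: the class index has to be computed once from the explained instance itself and then fixed, rather than re-derived from each DoI sample. ClassQoI is the existing fixed-class quantity; the helper below is hypothetical.

import torch
from trulens.nn.quantities import ClassQoI

def qoi_for_instance(model, x):
    # x is the single instance being explained; fix its predicted class up front.
    with torch.no_grad():
        predicted = int(model(x).argmax(dim=-1))
    return ClassQoI(predicted)  # class stays fixed across all DoI samples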
Currently trulens supports LlamaIndex's query method, but doesn't support aquery. We also need support for the chat_engine methods chat and achat.
Thanks!
This bug affects release 0.9.0 and was probably introduced with #362 (I tested with the codebase before and after the merge; before the merge it works fine).
While trying to reproduce this example from LangChain, I noticed that TruChain with an AgentExecutor causes TypeError: 'NoneType' object is not subscriptable in langchain.chains.base.
from langchain import LLMMathChain
from langchain.agents import initialize_agent, Tool, AgentType
from langchain.chat_models import ChatOpenAI
from trulens_eval import Tru, TruChain
llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
llm_math_chain = LLMMathChain.from_llm(llm, verbose=True)
tools = [
    Tool(
        name="Calculator",
        func=llm_math_chain.run,
        description="useful for when you need to answer questions about math"
    ),
]
agent = initialize_agent(tools, llm, agent=AgentType.OPENAI_FUNCTIONS, verbose=True)
app = TruChain(agent)
# works fine
agent(inputs={"input": "how much is Euler's number divided by PI"})
# raises `TypeError: 'NoneType' object is not subscriptable`
app(inputs={"input": "how much is Euler's number divided by PI"})
Output logs and stacktrace here: logs.txt
It seems that the output of torchvision models is an OrderedDict and not a torch.Tensor, which by default has a single key, 'out'. Currently a custom QoI is required to handle this type of output:
class TorchvisionMaxClassQoI(MaxClassQoI):
    def __call__(self, y):
        return super().__call__(y['out'])
It would be nice to handle this case by default instead of requiring a custom QoI. If the input y to a QoI is a dict or OrderedDict with exactly one entry, then that entry should be used in place of y rather than raising an exception (as this is unambiguously the output we would care about).
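A hedged sketch of that proposed default, as a small helper a QoI could apply before its single-tensor check (the function name is illustrative, not from the codebase):

from collections import OrderedDict

def unwrap_singleton_mapping(y):
    # A one-entry dict/OrderedDict output (e.g. torchvision's {'out': tensor})
    # unambiguously identifies the tensor of interest.
    if isinstance(y, (dict, OrderedDict)) and len(y) == 1:
        return next(iter(y.values()))
    return y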
I don't use Jupyter notebooks, but I do use LLMs. Could you add a quickstart that doesn't require Jupyter?
The Trulens package pins fastavro==1.7.4.
Version 1.7.4 has an issue that has been fixed in the latest update (1.8.2).
Can you please fix this?
Some feedback functions may need to operate on the final formatted prompt, not only on prompt inputs. For example, when using the ReAct pattern the prompt might contain dynamic instructions or examples that need to be taken into account when evaluating the LLM behavior.
One could try to achieve that by passing both the prompt template and the inputs dictionary to the feedback function, then letting it format the prompt before calculating the metric:
class CustomProvider(Provider):
    def my_feedback_func(self, prompt: str, inputs: dict) -> float:
        prompt = prompt.format(**inputs)
        return float(len(prompt))

Feedback(CustomProvider().my_feedback_func) \
    .on(Select.App.app.prompt.template) \
    .on(Select.RecordCalls._call.args.inputs)
However, that is not possible, because the FeedbackCall schema doesn't allow a dict as an argument to the feedback function.
Using the above defined feedback in an app will produce the following error:
Feedback Function Exception Caught: Traceback (most recent call last):
File ".../.venv/lib/python3.9/site-packages/trulens_eval/feedback.py", line 862, in run
feedback_call = FeedbackCall(
File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for FeedbackCall
args -> inputs
str type expected (type=type_error.str)
Change the schema of FeedbackCall to the following:
class FeedbackCall(SerialModel):
    args: Dict[str, Union[str, JSON]]
    ret: float
Many visualizers, including the Tiler class, should be usable without necessarily using other TruLens features. However, typically the backend will be set when the user calls get_model_wrapper, unless they have explicitly set the backend environment variable (which is probably not the norm). This creates a problem when using the Tiler, as it uses the backend to get the channel dimension. When the backend has not been set either in the environment variable or by calling get_model_wrapper, the backend comes back as None, leading to an error that will perhaps be confusing to a user trying to use just the visualization library of TruLens.
I suggest that we try to handle the case where the backend is not set when the tiler is called (and other places the channel dimension comes up in the visualizers). Most of the time, the channel dimension should be able to be inferred, because it is used for the purpose of displaying RGB or grayscale images, which can only have a 3 or a 1 as the size of the channel dimension. Thus, the only ambiguous case is when the image itself has a height/width of 3 or 1. In those cases, perhaps we can adopt a default, e.g., the convention used by matplotlib (which I believe is channels last).
The new process when trying to obtain the channel dimension would be:
1. If the backend is set, use its channel-dimension convention as before.
2. Otherwise, infer the channel dimension from the image shape (the dimension of size 3 or 1).
3. If that is ambiguous, fall back to a default convention, e.g., channels last, as matplotlib expects.
This way the visualizations module never throws errors simply because the user never specified the backend.
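A hedged sketch of that inference for a 4-D image batch (conventions assumed, not taken from the TruLens codebase):

def infer_channel_axis(shape, default_axis=-1):
    # The channel dimension must be 1 (grayscale) or 3 (RGB); shape is (batch, d1, d2, d3).
    candidates = [i for i, d in enumerate(shape[1:], start=1) if d in (1, 3)]
    if len(candidates) == 1:
        return candidates[0]
    # Ambiguous (height/width also 1 or 3): fall back to channels-last,
    # matching matplotlib's convention.
    return default_axis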
Hello! I would appreciate any help solving this problem! I'm trying to use IntegratedGradients() to explain my model.
yolo = Load_Yolo_model() # yolo is <class 'tensorflow.python.keras.engine.functional.Functional'> object
model_wrapped = get_model_wrapper(yolo)
ig_computer = IntegratedGradients(model_wrapped, resolution=20)
with PIL.Image.open(image_path).convert('RGB') as img:
    x = np.array(img.resize((416, 416)))
    x_np = np.array(img.resize((416, 416), PIL.Image.ANTIALIAS))[np.newaxis]
input_attributions = ig_computer.attributions(x_np)
But I run into a problem:
Traceback (most recent call last):
File "trulens_yolov3.py", line 31, in <module>
input_attributions = ig_computer.attributions(x_np)
File "/home/ubuntu/anaconda3/envs/TF23/lib/python3.8/site-packages/trulens/nn/attribution.py", line 264, in attributions
qoi_grads = self.model.qoi_bprop(
File "/home/ubuntu/anaconda3/envs/TF23/lib/python3.8/site-packages/trulens/nn/models/tensorflow_v2.py", line 424, in qoi_bprop
Q = qoi(outputs[0]) if len(outputs) == 1 else qoi(outputs)
File "/home/ubuntu/anaconda3/envs/TF23/lib/python3.8/site-packages/trulens/nn/quantities.py", line 111, in __call__
self._assert_cut_contains_only_one_tensor(y)
File "/home/ubuntu/anaconda3/envs/TF23/lib/python3.8/site-packages/trulens/nn/quantities.py", line 63, in _assert_cut_contains_only_one_tensor
raise QoiCutSupportError(
trulens.nn.quantities.QoiCutSupportError: Cut provided to quantity of interest was comprised of multiple tensors, but `MaxClassQoI` is only defined for cuts comprised of a single tensor (received a list of 3 tensors).
Either (1) select a slice where the `to_cut` corresponds to a single tensor, or (2) implement/use a `QoI` object that supports lists of tensors, i.e., where the parameter, `x`, to `__call__` is expected/allowed to be a list of 3 tensors.
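Following option (2) from the error message, a hedged sketch of a QoI that accepts a list of output tensors; here each head is reduced to a max per example and the heads are summed, though whether that is a meaningful quantity for YOLO depends on the use case.

import tensorflow as tf
from trulens.nn.quantities import QoI

class MultiOutputMaxQoI(QoI):
    def __call__(self, ys):
        ys = ys if isinstance(ys, (list, tuple)) else [ys]
        # Reduce each head to one scalar per example, then sum across heads.
        return tf.add_n([
            tf.reduce_max(y, axis=list(range(1, len(y.shape)))) for y in ys
        ])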
In models.pytorch:
model_args = ModelWrapper._nested_apply(
    model_args, lambda model_args: model_args.requires_grad_(True))
The default cut argument to InputAttribution is set to None:
trulens/trulens/nn/attribution.py, line 465 in c93dbe3
This is then passed to InternalInfluence with a cut (InputCut(), None), and InternalInfluence interprets a None cut as an InputCut(). This is not the intended default behavior, which should be OutputCut().
This causes correctness issues, where attributions are returned identical to the input (tested on tensorflow==2.4.0).
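A hedged sketch of the implied fix: substitute OutputCut() inside InputAttribution itself rather than forwarding None (which InternalInfluence reads as InputCut()); the helper name is illustrative.

from trulens.nn.slices import InputCut, OutputCut, Slice

def input_attribution_slice(cut=None):
    to_cut = cut if cut is not None else OutputCut()  # the intended default
    return Slice(InputCut(), to_cut)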
FYI as a heads up:
If os.environ['TRULENS_BACKEND'] = 'keras' is set incorrectly, then the library runs into issues. The backend should be inferable from the model type upon wrapping the model.
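A hedged sketch of the requested inference, keyed off the model's module; the backend names follow trulens's common usage but should be checked against the library's accepted values.

def infer_backend(model):
    mod = type(model).__module__
    if mod.startswith("torch"):
        return "pytorch"
    if mod.startswith(("keras", "tensorflow")):
        return "tensorflow"
    raise ValueError(
        f"Cannot infer backend for {type(model)!r}; please set TRULENS_BACKEND")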
Hi team,
I'm trying to explain the output of localization models (fasterrcnn_resnet50_fpn, and others in Torchvision).
There are no issues when I wrap the PyTorch model in get_model_wrapper(), but passing this wrapped model to InputAttribution() or IntegratedGradients() throws what seems to be a trivial PyTorch error.
Would you know where in trulens/pytorch.py we are creating views, and whether that's something that can be quickly fixed?
RuntimeError Traceback (most recent call last)
<ipython-input-43-9b10bbb33c8d> in <module>()
1 infl = InputAttribution(model)
----> 2 attrs_input = infl.attributions(x_pp)
7 frames
/usr/local/lib/python3.7/dist-packages/trulens/nn/attribution.py in attributions(self, *model_args, **model_kwargs)
269 to_cut=self.slice.to_cut,
270 intervention=D,
--> 271 doi_cut=doi_cut)
272 # Take the mean across the samples in the DoI.
273 if isinstance(qoi_grads, DATA_CONTAINER_TYPE):
/usr/local/lib/python3.7/dist-packages/trulens/nn/models/pytorch.py in qoi_bprop(self, qoi, model_args, model_kwargs, doi_cut, to_cut, attribution_cut, intervention)
455 attribution_cut=attribution_cut,
456 intervention=intervention,
--> 457 return_tensor=False)
458
459 y = to_cut.access_layer(y)
/usr/local/lib/python3.7/dist-packages/trulens/nn/models/pytorch.py in fprop(self, model_args, model_kwargs, doi_cut, to_cut, attribution_cut, intervention, return_tensor, input_timestep)
370 ]
371 # Run the network.
--> 372 output = self.model(*model_args, **model_kwargs)
373 if isinstance(output, tuple):
374 output = output[0]
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1049 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1050 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051 return forward_call(*input, **kwargs)
1052 # Do not call functions when jit is used
1053 full_backward_hooks, non_full_backward_hooks = [], []
/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/generalized_rcnn.py in forward(self, images, targets)
75 original_image_sizes.append((val[0], val[1]))
76
---> 77 images, targets = self.transform(images, targets)
78
79 # Check for degenerate boxes
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1049 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1050 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1051 return forward_call(*input, **kwargs)
1052 # Do not call functions when jit is used
1053 full_backward_hooks, non_full_backward_hooks = [], []
/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/transform.py in forward(self, images, targets)
118
119 image_sizes = [img.shape[-2:] for img in images]
--> 120 images = self.batch_images(images, size_divisible=self.size_divisible)
121 image_sizes_list: List[Tuple[int, int]] = []
122 for image_size in image_sizes:
/usr/local/lib/python3.7/dist-packages/torchvision/models/detection/transform.py in batch_images(self, images, size_divisible)
222 batched_imgs = images[0].new_full(batch_shape, 0)
223 for img, pad_img in zip(images, batched_imgs):
--> 224 pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
225
226 return batched_imgs
RuntimeError: A view was created in no_grad mode and is being modified inplace with grad mode enabled. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.
I am running on an Ubuntu 20.04 machine. I have integrated trulens with llama-index for querying. Here is the code:
import os
from multiprocessing.managers import BaseManager
from trulens_eval import TruLlama, Feedback, Tru, feedback
from sqlalchemy import URL
from llama_index import SimpleWebPageReader
from llama_index import VectorStoreIndex
import numpy as np
def initialize_trulens():
    global feedbacks
    # Initiating the trulens dashboard
    url_object = URL.create(
        "postgresql+psycopg2",
        username=os.environ.get("SUPERBASE_USERNAME"),
        password=os.environ.get("SUPERBASE_PASSWORD"),
        host=os.environ.get("SUPERBASE_HOST"),
        port=os.environ.get("SUPERBASE_PORT"),
        database=os.environ.get("SUPERBASE_DATABASE")
    )
    tru = Tru(database_url=url_object)
    tru.run_dashboard()
    # Initialize Huggingface-based feedback function collection class:
    hugs = feedback.Huggingface()
    openai = feedback.OpenAI()
    # Define a language match feedback function using HuggingFace.
    f_lang_match = Feedback(hugs.language_match).on_input_output()
    # By default this will check language match on the main app input and main
    # app output.
    # Question/answer relevance between overall question and answer.
    f_qa_relevance = Feedback(openai.relevance).on_input_output()
    # Question/statement relevance between question and each context chunk.
    f_qs_relevance = Feedback(openai.qs_relevance).on_input().on(
        TruLlama.select_source_nodes().node.text
    ).aggregate(np.min)
    feedbacks = [f_lang_match, f_qa_relevance, f_qs_relevance]

def query():
    documents = SimpleWebPageReader(html_to_text=True).load_data(
        ["http://paulgraham.com/worked.html"]
    )
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()
    tru_query_engine_recorder = TruLlama(
        query_engine,
        app_id='LlamaIndex_App1',
        feedbacks=feedbacks
    )
    response = tru_query_engine_recorder.query("What did the author do growing up?")
    print(response)
    return response

if __name__ == "__main__":
    print("server started...")
    initialize_trulens()
    manager = BaseManager(('', 5602), b'password')
    manager.register('query_index', query)
    server = manager.get_server()
    server.serve_forever()
After running the server, it works well and I can see the trulens dashboard. But after some time I get the following error and the trulens dashboard gets disconnected from the database:
Tru was already initialized. Cannot change database_url=postgresql+psycopg2://<database_username>:<database_password>@<database_host>:<database_port>/<database_name> or database_file=None .
/usr/local/lib/python3.10/site-packages/matplotlib/axes/_axes.py:6826: RuntimeWarning: All-NaN slice encountered
xmin = min(xmin, np.nanmin(xi))
/usr/local/lib/python3.10/site-packages/matplotlib/axes/_axes.py:6827: RuntimeWarning: All-NaN slice encountered
xmax = max(xmax, np.nanmax(xi))
2023-09-26 14:01:57.225 MediaFileHandler: Missing file 16a57003a519fd6aa13a2f733bd4fa522ba8e58545862765c5d0ce92.png
2023-09-26 14:01:57.235 MediaFileHandler: Missing file 6b6481b1f4db67286783dec664cc63b2153e3c7c262a3bff8a328923.png
2023-09-26 14:01:57.236 MediaFileHandler: Missing file 37d4f4b62a7b6acf699e54df90a7bf371cbab2d17fca60c9372fd503.png
(... the above All-NaN warnings, MediaFileHandler messages, and "Tru was already initialized" lines repeat ...)
Tru was already initialized. Cannot change database_url=postgresql+psycopg2://<database_username>:<database_password>@<database_host>:<database_port>/<database_name> or database_file=None .
2023-09-26 14:18:28.504 Session with id 36790e30-3097-4309-acd6-b1d908187e66 is already connected! Connecting to a new session.
Tru was already initialized. Cannot change database_url=postgresql+psycopg2://<database_username>:<database_password>@<database_host>:<database_port>/<database_name> or database_file=None .
2023-09-26 14:18:28.840 Uncaught app exception
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1965, in _exec_single_context
self.dialect.do_execute(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 921, in do_execute
cursor.execute(statement, parameters)
psycopg2.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 541, in _run_script
exec(code, module.__dict__)
File "/usr/local/lib/python3.10/site-packages/trulens_eval/Leaderboard.py", line 141, in <module>
main()
File "/usr/local/lib/python3.10/site-packages/trulens_eval/Leaderboard.py", line 137, in main
streamlit_app()
File "/usr/local/lib/python3.10/site-packages/trulens_eval/Leaderboard.py", line 51, in streamlit_app
df, feedback_col_names = lms.get_records_and_feedback([])
File "/usr/local/lib/python3.10/site-packages/trulens_eval/database/utils.py", line 60, in wrapper
callback(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/trulens_eval/database/sqlalchemy_db.py", line 44, in <lambda>
run_before(lambda self, *args, **kwargs: check_db_revision(self.engine)),
File "/usr/local/lib/python3.10/site-packages/trulens_eval/database/utils.py", line 112, in check_db_revision
if is_legacy_sqlite(engine):
File "/usr/local/lib/python3.10/site-packages/trulens_eval/database/utils.py", line 82, in is_legacy_sqlite
tables = list(inspector.get_table_names())
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 396, in get_table_names
return self.dialect.get_table_names(
File "<string>", line 2, in get_table_names
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 97, in cache
ret = fn(self, con, *args, **kw)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/base.py", line 3368, in get_table_names
return self._get_relnames_for_relkinds(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/base.py", line 3364, in _get_relnames_for_relkinds
return connection.scalars(query).all()
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1344, in scalars
return self.execute(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1412, in execute
return meth(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 516, in _execute_on_connection
return connection._execute_clauseelement(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1635, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1844, in _execute_context
return self._exec_single_context(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1984, in _exec_single_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2339, in _handle_dbapi_exception
raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1965, in _exec_single_context
self.dialect.do_execute(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 921, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[SQL: SELECT pg_catalog.pg_class.relname
FROM pg_catalog.pg_class JOIN pg_catalog.pg_namespace ON pg_catalog.pg_namespace.oid = pg_catalog.pg_class.relnamespace
WHERE pg_catalog.pg_class.relkind = ANY (ARRAY[%(param_1)s, %(param_2)s]) AND pg_catalog.pg_class.relpersistence != %(relpersistence_1)s AND pg_catalog.pg_table_is_visible(pg_catalog.pg_class.oid) AND pg_catalog.pg_namespace.nspname != %(nspname_1)s]
[parameters: {'param_1': 'r', 'param_2': 'p', 'relpersistence_1': 't', 'nspname_1': 'pg_catalog'}]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Use case: when changing the input to a custom input, agreement_measure returns nan.
If one changes the input like so:
f_agreement_measure = Feedback(GroundTruthAgreement(answers, azure).agreement_measure).on(
    Select.Record.calls[0].rets.query).on(
    Select.Record.main_output["answer"]
)
You can see that in agreement_measure_calls, the ret and meta are empty because it wasn't able to find the input in the dictionary:
[{'args': {'prompt': 'Prompt 1 Modified', 'response': "response 1"}, 'ret': nan, 'meta': {}}]
From what I've understood so far, this is because GroundTruthAgreement and _find_response() search for the original query Prompt 1, as in Select.Record.main_input. Here, however, they search with a query containing Prompt 1 Modified, taken from Select.Record.calls[0].rets.query, and can't find it in:
self.ground_truth = [{'query': 'Prompt 1', 'response': 'response 1'}]
Therefore, it returns nan.
When I follow the demo in langchain_quickstart.ipynb, an error occurs at this line:
from IPython.display import JSON
# Imports main tools:
from trulens_eval import TruChain, Feedback, Huggingface, Tru  # Error here
from trulens_eval.schema import FeedbackResult
tru = Tru()
The full error message is below:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[26], line 4
1 from IPython.display import JSON
3 # Imports main tools:
----> 4 from trulens_eval import TruChain, Feedback, Huggingface, Tru
5 from trulens_eval.schema import FeedbackResult
6 tru = Tru()
File ~/miniconda3/envs/py310/lib/python3.10/site-packages/trulens_eval/__init__.py:83
1 """
2 # Trulens-eval LLM Evaluation Library
3
(...)
78
79 """
81 __version__ = "0.17.0"
---> 83 from trulens_eval.feedback import Bedrock
84 from trulens_eval.feedback import Feedback
85 from trulens_eval.feedback import Huggingface
File ~/miniconda3/envs/py310/lib/python3.10/site-packages/trulens_eval/feedback/__init__.py:14
11 AggCallable = Callable[[Iterable[float]], float]
13 # Specific feedback functions:
---> 14 from trulens_eval.feedback.embeddings import Embeddings
15 # Main class holding and running feedback functions:
16 from trulens_eval.feedback.feedback import Feedback
File ~/miniconda3/envs/py310/lib/python3.10/site-packages/trulens_eval/feedback/embeddings.py:8
5 from pydantic import PrivateAttr
7 from trulens_eval.utils.imports import REQUIREMENT_SKLEARN
----> 8 from trulens_eval.utils.pyschema import WithClassInfo
9 from trulens_eval.utils.serial import SerialModel
12 class Embeddings(SerialModel, WithClassInfo):
File ~/miniconda3/envs/py310/lib/python3.10/site-packages/trulens_eval/utils/pyschema.py:588
584 # Key of structure where class information is stored.
585 CLASS_INFO = "__tru_class_info"
--> 588 class WithClassInfo(pydantic.BaseModel):
589 """
590 Mixin to track class information to aid in querying serialized components
591 without having to load them.
592 """
594 # Using this odd key to not pollute attribute names in whatever class we mix
595 # this into. Should be the same as CLASS_INFO.
File ~/miniconda3/envs/py310/lib/python3.10/site-packages/pydantic/_internal/_model_construction.py:104, in __new__(mcs, cls_name, bases, namespace, __pydantic_generic_metadata__, __pydantic_reset_parent_namespace__, **kwargs)
File ~/miniconda3/envs/py310/lib/python3.10/site-packages/pydantic/_internal/_model_construction.py:345, in inspect_namespace(namespace, ignored_types, base_class_vars, base_class_fields)
NameError: Fields must not use names with leading underscores; e.g., use 'WithClassInfo__tru_class_info' instead of '_WithClassInfo__tru_class_info'.
Hi there 👋
I am using trulens_eval.feedback.OpenAI, but I need to pass extra args to use it with Azure. I don't understand the usage of OpenAIEndpoint, since I cannot see extra args to set up openai. Usually you can do something like:
openai.api_type = "azure"
openai.api_base = os.getenv("OPENAI_API_BASE")
openai.api_version = "2023-03-15-preview"
openai.api_key = os.getenv("OPENAI_API_KEY")
Not sure how to do it with trulens
Thanks a lot
Cheers,
Fra
Describe the bug
The symptom occurs with the error 'Tensor' has no attribute 'numpy'.
The problem is that this call in tensorflow_v2 expects eager mode, but the model is not running eagerly.
To Reproduce
Implement a tensorflow.keras.Model and override the call() function with the @tf.function annotation.
A quick hack fix is to remove the annotation, but this is not the correct long-term solution.
import tensorflow as tf
from tensorflow.keras import Model

class CustomModel(Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    @tf.function  # quick hack fix: remove this annotation
    def call(self, inputs):
        return self.dense(inputs)

(Run attribution on this model)