langchain-ai / langchain
🦜🔗 Build context-aware reasoning applications
Home Page: https://python.langchain.com
License: MIT License
Hi, is there any chance you could add support for the suffix parameter to the OpenAI class/call?
https://beta.openai.com/docs/api-reference/completions/create
I could land this if you don't have the time, but it should be simple enough that you could land it faster than I could fork the repo and make a PR.
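A minimal sketch of how the suffix parameter might be threaded through to the request payload. The wrapper class and build_payload helper are hypothetical names; only the suffix field itself comes from the completions API docs linked above.

```python
class OpenAIWithSuffix:
    """Hypothetical wrapper: accepts an optional `suffix` and includes it
    in the completions payload only when it is set."""

    def __init__(self, model="text-davinci-003", suffix=None, **kwargs):
        self.model = model
        self.suffix = suffix
        self.kwargs = kwargs

    def build_payload(self, prompt):
        # Base payload mirrors the completions API shape.
        payload = {"model": self.model, "prompt": prompt, **self.kwargs}
        if self.suffix is not None:
            payload["suffix"] = self.suffix
        return payload

llm = OpenAIWithSuffix(suffix="\nreturn result", max_tokens=16)
payload = llm.build_payload("def add(a, b):")
```

The existing constructor-kwargs pattern would be unchanged; suffix just becomes one more optional field.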
elasticsearch, pinecone
Would be good to have some methods that split on tokens as in the OpenAI example
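A sketch of a sliding-window token splitter. The function name is made up, and a whitespace tokenizer stands in for a real one (in practice you would pass a proper encode/decode pair, e.g. from a tokenizer library).

```python
def split_on_tokens(text, chunk_size, overlap, encode=str.split, decode=" ".join):
    """Split `text` into chunks of `chunk_size` tokens, with `overlap`
    tokens shared between consecutive chunks. The default encode/decode
    pair is a whitespace stand-in for a real tokenizer."""
    assert overlap < chunk_size
    tokens = encode(text)
    chunks, start = [], 0
    while start < len(tokens):
        chunks.append(decode(tokens[start:start + chunk_size]))
        start += chunk_size - overlap
    return chunks

chunks = split_on_tokens("a b c d e", chunk_size=2, overlap=1)
```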
I get this error when I run pip install -r requirements.txt
ERROR: Could not find a version that satisfies the requirement faiss (from versions: none)
ERROR: No matching distribution found for faiss
As per this issue, requirements.txt should be updated to faiss-cpu.
It's just a bit annoying; I want to use this library in production, and I currently don't store credentials in the environment.
I think the ideal API is like all the AWS SDKs where you can either stick them in the environment OR pass them as params to the llm constructor.
I can do a PR for this if you're accepting PRs?
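A sketch of the AWS-SDK-style precedence described above: prefer a key passed explicitly to the constructor, fall back to the environment. The function and parameter names here are illustrative, not the library's API.

```python
import os

def resolve_api_key(explicit_key=None, env_var="OPENAI_API_KEY"):
    """Prefer an explicitly passed key; otherwise fall back to the
    environment variable. Raise if neither is available."""
    key = explicit_key or os.environ.get(env_var)
    if key is None:
        raise ValueError(f"Pass an API key or set {env_var}")
    return key
```

An LLM constructor could call this once and store the result, so both usage styles work without further branching.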
I have just updated to version 0.0.7
1. When running a simple question like "What is the capital of Idaho?", the result is OK.
2. When running a question like "What is the hometown of the reigning men's U.S. Open champion?", I get the following error:
What is the hometown of the reigning men's U.S. Open champion?
Are follow up questions needed here: Yes.
Follow up: Who is the reigning men's U.S. Open champion?
(.......)
File ~/anaconda3/envs/nlp/lib/python3.10/site-packages/langchain/chains/self_ask_with_search/base.py:166, in SelfAskWithSearchChain.run(self, question)
152 def run(self, question: str) -> str:
153 """Run self ask with search chain.
154
155 Args:
(...)
...
--> 107 elif "snippet" in res["organic_results"][0].keys():
108 toret = res["organic_results"][0]["snippet"]
109 else:
KeyError: 'organic_results'
Is there a current API for hooking a conversational chain with an agent so, during the conversation, actions can be performed (e.g. db lookup etc)?
If not, how would you see this working architecturally?
The current implementation of HuggingFaceEmbeddings uses local sentence-transformers to derive the encodings. This can be limiting, as it requires a fairly capable machine to download the model, load it, and run inference.
An alternative is to support embeddings derived directly via the HuggingFaceHub; see this blog post for details. This implementation would set expectations similar to the Cohere and OpenAI embeddings APIs.
I would like to track versions of PromptTemplates through time. an optional version attribute would help with this.
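One way an optional version attribute could look. This is a sketch, not the actual PromptTemplate API; all names here are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class VersionedPromptTemplate:
    """Sketch: a PromptTemplate-like object with an optional version tag,
    so templates can be tracked through time."""
    template: str
    input_variables: List[str]
    version: Optional[str] = None  # None keeps current behavior

    def format(self, **kwargs):
        return self.template.format(**kwargs)

p = VersionedPromptTemplate("Hello {name}", ["name"], version="1.2.0")
```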
I'm storing examples in both a vectorstore and a database. I would like to add the vectorstore id field to the database after the example has been successfully indexed. I think I could do this if add_texts in vectorstore could return a list of vectorstore IDs, and add_example in SemanticSimilarityExampleSelector propagated the returned ID back as well.
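A toy stand-in showing the shape of the proposed change: add_texts returns the generated IDs so the caller can write them back to a database row. The class is made up for illustration; it is not the vectorstore interface itself.

```python
import uuid

class InMemoryVectorStore:
    """Toy vectorstore stand-in: add_texts returns one ID per text,
    which callers can persist alongside their own records."""

    def __init__(self):
        self.docs = {}

    def add_texts(self, texts):
        ids = []
        for text in texts:
            doc_id = str(uuid.uuid4())
            self.docs[doc_id] = text
            ids.append(doc_id)
        return ids

store = InMemoryVectorStore()
ids = store.add_texts(["example one", "example two"])
```

An example selector wrapping this could simply propagate the returned list.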
add llm extra dependencies
add all extra dependencies
Running some CRUD-like statements through an agent throws
ResourceClosedError: This result object does not return rows. It has been closed automatically.
From the implementation, it appears that it always expects to see rows, which are then cast to str
and returned as part of the chain. What would the impact of modifying this behaviour be on the expected use case for the SQL chain as it is?
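A sketch of the guard one might add, using sqlite3 as a stand-in for the SQLAlchemy layer the chain actually uses: cursor.description is None when a statement returns no rows, so CRUD statements can report a rowcount instead of raising.

```python
import sqlite3

def run_statement(conn, sql):
    """Return rows for SELECT-like statements, and a rowcount summary
    otherwise, instead of assuming every statement yields rows."""
    cur = conn.execute(sql)
    if cur.description is None:  # no result rows (INSERT/UPDATE/DDL)
        return f"{cur.rowcount} row(s) affected."
    return str(cur.fetchall())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
insert_result = run_statement(conn, "INSERT INTO t VALUES (1)")
select_result = run_statement(conn, "SELECT x FROM t")
```

The equivalent check in SQLAlchemy would be the result object's returns_rows flag.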
When initializing with a SQL database, check to see if it already exists; raise an error if it does not.
to make sure they are up to date
Per #104, we needed to start skipping unit tests due to a segfault. Look into this more and figure out what fixes are needed: https://github.com/hwchase17/langchain/blob/95dd2f140e19d29bdb62d4dae2048e3edf0ee147/tests/integration_tests/embeddings/test_huggingface.py#L7
Sorry to disturb, but I wonder: can langchain process a batch of prompts, or does it just process each text by calling llm(text)?
The current implementation of OpenAI embeddings is hard-coded to return only text embeddings via GPT-3. For example,
def embed_documents(self, texts: List[str]) -> List[List[float]]:
    ...
    responses = [
        self._embedding_func(text, engine=f"text-search-{self.model_name}-doc-001")
        for text in texts
    ]

def embed_query(self, text: str) -> List[float]:
    ...
    embedding = self._embedding_func(
        text, engine=f"text-search-{self.model_name}-query-001"
    )
However, recent literature on reasoning shows Codex to be more powerful than GPT-3 on reasoning tasks. OpenAIEmbeddings
should be modified to support both text and code embeddings.
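A sketch of parameterizing the engine string rather than hard-coding it. The code-search family name is an assumption that follows the same naming pattern as the hard-coded text-search engines above.

```python
def engine_name(model_name, kind="doc", family="text-search"):
    """Build the embedding engine string instead of hard-coding
    'text-search'. family='code-search' is assumed to follow the
    same pattern as the existing text engines."""
    return f"{family}-{model_name}-{kind}-001"

text_engine = engine_name("babbage")
code_engine = engine_name("babbage", family="code-search")
```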
Improved prompts for harrison/combine_documents_chain
"""QUESTION_PROMPT is the prompt used in phase 1 where we run the LLM on each chunk of the doc."""
question_prompt_template = """Use the following portion of a legal contract to see if any of the text is relevant to answer the question.
Return any relevant text verbatim.
{context}
Question: {question}
Relevant text, if any:"""
QUESTION_PROMPT = PromptTemplate(
template=question_prompt_template, input_variables=["context", "question"]
)
""" """
combine_prompt_template = """Given the following extracted parts of a contract and a question, create a final answer with references ("SOURCES").
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
ALWAYS return a "SOURCES" part in your answer.
QUESTION: Which state/country's law governs the interpretation of the contract?
=========
Content: This Agreement is governed by English law and the parties submit to the exclusive jurisdiction of the English courts in relation to any dispute (contractual or non-contractual) concerning this Agreement save that either party may apply to any court for an injunction or other relief to protect its Intellectual Property Rights.
Source: 28-pl
Content: No Waiver. Failure or delay in exercising any right or remedy under this Agreement shall not constitute a waiver of such (or any other) right or remedy.\n\n11.7 Severability. The invalidity, illegality or unenforceability of any term (or part of a term) of this Agreement shall not affect the continuation in force of the remainder of the term (if any) and this Agreement.\n\n11.8 No Agency. Except as expressly stated otherwise, nothing in this Agreement shall create an agency, partnership or joint venture of any kind between the parties.\n\n11.9 No Third-Party Beneficiaries.
Source: 30-pl
Content: (b) if Google believes, in good faith, that the Distributor has violated or caused Google to violate any Anti-Bribery Laws (as defined in Clause 8.5) or that such a violation is reasonably likely to occur,
Source: 4-pl
=========
FINAL ANSWER: This Agreement is governed by English law.
SOURCES: 28-pl
QUESTION: What did the president say about Michael Jackson?
=========
Content: Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia's Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. \n\nGroups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland.
Source: 0-pl
Content: And we won't stop. \n\nWe have lost so much to COVID-19. Time with one another. And worst of all, so much loss of life. \n\nLet's use this moment to reset. Let's stop looking at COVID-19 as a partisan dividing line and see it for what it is: A God-awful disease. \n\nLet's stop seeing each other as enemies, and start seeing each other for who we really are: Fellow Americans. \n\nWe can't change how divided we've been. But we can change how we move forward on COVID-19 and other issues we must face together. \n\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n\nOfficer Mora was 27 years old. \n\nOfficer Rivera was 22. \n\nBoth Dominican Americans who'd grown up on the same streets they later chose to patrol as police officers. \n\nI spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves.
Source: 24-pl
Content: And a proud Ukrainian people, who have known 30 years of independence, have repeatedly shown that they will not tolerate anyone who tries to take their country backwards. \n\nTo all Americans, I will be honest with you, as I've always promised. A Russian dictator, invading a foreign country, has costs around the world. \n\nAnd I'm taking robust action to make sure the pain of our sanctions is targeted at Russia's economy. And I will use every tool at our disposal to protect American businesses and consumers. \n\nTonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world. \n\nAmerica will lead that effort, releasing 30 Million barrels from our own Strategic Petroleum Reserve. And we stand ready to do more if necessary, unified with our allies. \n\nThese steps will help blunt gas prices here at home. And I know the news about what's happening can seem alarming. \n\nBut I want you to know that we are going to be okay.
Source: 5-pl
Content: More support for patients and families. \n\nTo get there, I call on Congress to fund ARPA-H, the Advanced Research Projects Agency for Health. \n\nIt's based on DARPA, the Defense Department project that led to the Internet, GPS, and so much more. \n\nARPA-H will have a singular purpose: to drive breakthroughs in cancer, Alzheimer's, diabetes, and more. \n\nA unity agenda for the nation. \n\nWe can do this. \n\nMy fellow Americans, tonight we have gathered in a sacred space: the citadel of our democracy. \n\nIn this Capitol, generation after generation, Americans have debated great questions amid great strife, and have done great things. \n\nWe have fought for freedom, expanded liberty, defeated totalitarianism and terror. \n\nAnd built the strongest, freest, and most prosperous nation the world has ever known. \n\nNow is the hour. \n\nOur moment of responsibility. \n\nOur test of resolve and conscience, of history itself. \n\nIt is in this moment that our character is formed. Our purpose is found. Our future is forged. \n\nWell I know this nation.
Source: 34-pl
=========
FINAL ANSWER: The president did not mention Michael Jackson.
SOURCES:
QUESTION: {question}
=========
{summaries}
=========
FINAL ANSWER:"""
COMBINE_PROMPT = PromptTemplate(
template=combine_prompt_template, input_variables=["summaries", "question"]
)
"""COMBINE_DOCUMENT_PROMPT is fed into CombineDocumentsChain() at langchain/chains/combine_documents.py"""
COMBINE_DOCUMENT_PROMPT = PromptTemplate(
template="Content: {page_content}\nSource: {source}",
input_variables=["page_content", "source"],
)
Given:
Is there any way to do this elegantly in langchain? Perhaps some way to provide an output formatter to a chain, or some for_each pre- / post-processor? Or does this seem like just two independent chains with processing in between?
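One lightweight way this could work without a dedicated API is a wrapper that composes a chain-like callable with a post-processing step. All names here are made up for illustration.

```python
def with_postprocess(chain, post):
    """Wrap a chain-like callable so its output is passed through a
    post-processing/formatting step before anything downstream sees it."""
    def wrapped(inputs):
        return post(chain(inputs))
    return wrapped

# Fake chain standing in for a real one.
fake_chain = lambda q: f"answer to {q}"
formatted = with_postprocess(fake_chain, str.upper)
result = formatted("why?")
```

The alternative framing in the question, two independent chains with processing in between, is the same idea with the composition written out by hand.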
First, when I load them I get a warning:
hf = HuggingFaceHub(repo_id="google/flan-t5-xl")
You're using a different task than the one specified in the repository. Be sure to know what you're doing :)
Then, when I use it in inference, I get gibberish.
hf("The capital of New York is")
'ew York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is'
If I run the API via requests
, I get the expected answer
import requests
API_URL = "https://api-inference.huggingface.co/models/google/flan-t5-xl"
headers = {"Authorization": "Bearer api_org_xxxxxxxxxxxxxxxxxxxxxxxxxxx"}
def query(payload):
response = requests.post(API_URL, headers=headers, json=payload)
return response.json()
output = query({
"inputs": "The capital of New York is",
})
print(output)
[{'generated_text': 'Albany'}]
Any suggestions?
File "C:\ProgramData\Anaconda3\envs\LangChain\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 8: character maps to <undefined>
Right now, some chains print out intermediate steps and some don't. Let's standardize it so that they all have the same flag to turn it on/off, and things are printed out in a standard way, ideally colorized.
pytest tests/integration_tests
============================= test session starts ==============================
platform darwin -- Python 3.9.12, pytest-7.1.1, pluggy-1.0.0
rootdir: /Users/delip/workspace/langchain
plugins: anyio-3.5.0, dotenv-0.5.2
collected 14 items / 1 error
==================================== ERRORS ====================================
____________ ERROR collecting tests/integration_tests/test_faiss.py ____________
ImportError while importing test module '/Users/delip/workspace/langchain/tests/integration_tests/test_faiss.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../opt/anaconda3/lib/python3.9/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/integration_tests/test_faiss.py:9: in <module>
from langchain.faiss import FAISS
E ModuleNotFoundError: No module named 'langchain.faiss'
I would like to use SequentialChain with the option to use a different LLM class at each step. The rationale behind this is that I am using different temperature settings for different prompts within my chain. I also potentially may use different models for each step in the future.
A rough idea for config: have a JSON dict specifying the LLM config, and pass in a list of configs (or a list of LLM objects) the same length as the number of prompt templates in the chain when you want a different object per step, or a single LLM object or config object when you want to use the same one for all.
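A sketch of the list-of-configs idea. Here llm_factory stands in for the real LLM constructor, and the fake factory below only exists to make the example self-contained; everything is illustrative.

```python
def run_sequential(steps, llm_factory, first_input):
    """steps: list of (template, config) pairs, one config per prompt,
    so each step can use its own temperature or model."""
    text = first_input
    for template, config in steps:
        llm = llm_factory(**config)  # fresh LLM per step
        text = llm(template.format(input=text))
    return text

# Fake factory: "calls" just tag the prompt with the temperature used.
def fake_factory(temperature=0.0):
    return lambda prompt: f"[t={temperature}] {prompt}"

steps = [
    ("Summarize: {input}", {"temperature": 0.0}),
    ("Rewrite playfully: {input}", {"temperature": 0.9}),
]
result = run_sequential(steps, fake_factory, "hello")
```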
make the calls in the "map" part concurrently
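A sketch of how the "map" calls could run concurrently. Since each call is network-bound, a thread pool is a reasonable fit; the helper name is made up.

```python
from concurrent.futures import ThreadPoolExecutor

def map_concurrently(fn, items, max_workers=8):
    """Run the per-document 'map' calls concurrently.
    pool.map preserves input order in its results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, items))

doubled = map_concurrently(lambda x: x * 2, [1, 2, 3])
```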
For example this list of entities but also the conversation summary
too few free trials, too expensive
Currently it is as below, which is way too ugly:
335 Create a new model by parsing and validating input data from keyword arguments.
336
337 Raises ValidationError if the input data cannot be parsed to form a valid model.
338 """
339 # Uses something other than `self` the first arg to allow "self" as a settable attribute
--> 340 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
341 if validation_error:
342 raise validation_error
...
---> 53 input_variables = values["input_variables"]
54 template = values["template"]
55 template_format = values["template_format"]
KeyError: 'input_variables'
Do you have a bibtex citation for this repo?
e.g. something like the following (from https://github.com/bigscience-workshop/promptsource)
@misc{bach2022promptsource,
title={PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts},
author={Stephen H. Bach and Victor Sanh and Zheng-Xin Yong and Albert Webson and Colin Raffel and Nihal V. Nayak and Abheesht Sharma and Taewoon Kim and M Saiful Bari and Thibault Fevry and Zaid Alyafeai and Manan Dey and Andrea Santilli and Zhiqing Sun and Srulik Ben-David and Canwen Xu and Gunjan Chhablani and Han Wang and Jason Alan Fries and Maged S. Al-shaibani and Shanya Sharma and Urmish Thakker and Khalid Almubarak and Xiangru Tang and Xiangru Tang and Mike Tian-Jian Jiang and Alexander M. Rush},
year={2022},
eprint={2202.01279},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
maybe not the ones that require $$
I want to save prompt templates in a JSONField in a Django DB. The current save() method on BasePromptTemplate outputs to a file rather than a JSON object. I'd prefer a method that serializes the prompt to and from JSON, and I decide what to do with the JSON myself.
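A sketch of the requested shape: serialize to a JSON string with no file I/O, then round-trip it back. The function names are hypothetical, not the library's API.

```python
import json

def prompt_to_json(template, input_variables):
    """Serialize a prompt to a JSON string (no file I/O), suitable
    for storing in e.g. a Django JSONField."""
    return json.dumps({"template": template, "input_variables": input_variables})

def prompt_from_json(blob):
    """Reconstruct the prompt fields from a JSON string."""
    data = json.loads(blob)
    return data["template"], data["input_variables"]

blob = prompt_to_json("Hello {name}", ["name"])
template, variables = prompt_from_json(blob)
```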
Hello,
When running the "Map Reduce" demo, I see the error below:
ImportError Traceback (most recent call last)
Cell In [3], line 1
----> 1 from langchain import OpenAI, PromptTemplate, LLMChain
2 from langchain.text_splitter import CharacterTextSplitter
3 from langchain.chains.mapreduce import MapReduceChain
ImportError: cannot import name 'PromptTemplate' from 'langchain' (/Users/mteoh/projects/langchain_sandbox/.env/lib/python3.9/site-packages/langchain/__init__.py)
I see it defined here: https://github.com/hwchase17/langchain/blob/c02eb199b6587aeeb50fbb083693572bd2f030cc/langchain/prompts/prompt.py#L13
And mentioned here:
https://github.com/hwchase17/langchain/blob/c02eb199b6587aeeb50fbb083693572bd2f030cc/langchain/__init__.py#L35
However, when grepping in the library directory, I do not find it:
:~/projects/langchain_sandbox/.env/lib/python3.9/site-packages/langchain $ grep -r PromptTemplate .
Relevant versions of my packages:
$ pip freeze | grep "langchain\|openai"
langchain==0.0.16
openai==0.25.0
Any advice? Thanks! Excited about this work.
cohere, huggingface, ai21