
🦜️🔗 LangChain

⚡ Build context-aware reasoning applications ⚡


Looking for the JS/TS library? Check out LangChain.js.

To help you ship LangChain apps to production faster, check out LangSmith. LangSmith is a unified developer platform for building, testing, and monitoring LLM applications. Fill out this form to speak with our sales team.

Quick Install

With pip:

pip install langchain

With conda:

conda install langchain -c conda-forge

🤔 What is LangChain?

LangChain is a framework for developing applications powered by large language models (LLMs).

For these applications, LangChain simplifies the entire application lifecycle:

Open-source libraries:

  • langchain-core: Base abstractions and LangChain Expression Language.
  • langchain-community: Third party integrations.
    • Some integrations have been further split into partner packages that only rely on langchain-core. Examples include langchain_openai and langchain_anthropic.
  • langchain: Chains, agents, and retrieval strategies that make up an application's cognitive architecture.
  • LangGraph: A library for building robust and stateful multi-actor applications with LLMs by modeling steps as edges and nodes in a graph.

Productionization:

  • LangSmith: A developer platform that lets you debug, test, evaluate, and monitor chains built on any LLM framework and seamlessly integrates with LangChain.

Deployment:

  • LangServe: A library for deploying LangChain chains as REST APIs.

Diagram outlining the hierarchical organization of the LangChain framework, displaying the interconnected parts across multiple layers.

🧱 What can you build with LangChain?

❓ Question answering with RAG

🧱 Extracting structured output

🤖 Chatbots

And much more! Head to the Use cases section of the docs for more.

🚀 How does LangChain help?

The main value props of the LangChain libraries are:

  1. Components: composable building blocks, tools, and integrations for working with language models. Components are modular and easy to use, whether you are using the rest of the LangChain framework or not.
  2. Off-the-shelf chains: built-in assemblages of components for accomplishing higher-level tasks.

Off-the-shelf chains make it easy to get started. Components make it easy to customize existing chains and build new ones.

LangChain Expression Language (LCEL)

LCEL is the foundation of many of LangChain's components, and is a declarative way to compose chains. LCEL was designed from day 1 to support putting prototypes in production, with no code changes, from the simplest “prompt + LLM” chain to the most complex chains.
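The composition idea behind LCEL can be sketched in plain Python without any LangChain imports. The `Runnable` class below is a stand-in for LangChain's interface, not the real one; it only illustrates how `|` can build a declarative pipeline:

```python
# Minimal sketch of declarative chain composition in the spirit of LCEL.
# "Runnable" here is a stand-in, not LangChain's actual class.

class Runnable:
    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` yields a new Runnable that pipes a's output into b.
        return Runnable(lambda value: other.invoke(self.invoke(value)))

prompt = Runnable(lambda topic: f"Tell me a joke about {topic}")
fake_llm = Runnable(lambda text: f"[LLM response to: {text}]")

chain = prompt | fake_llm
print(chain.invoke("bears"))  # [LLM response to: Tell me a joke about bears]
```

Because every stage exposes the same `invoke` interface, swapping the fake LLM for a real one changes no composition code, which is the "prototype to production with no code changes" claim in miniature.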

Components

Components fall into the following modules:

📃 Model I/O:

This includes prompt management, prompt optimization, a generic interface for chat models and LLMs, and common utilities for working with model outputs.

📚 Retrieval:

Retrieval Augmented Generation involves loading data from a variety of sources, preparing it, then retrieving it for use in the generation step.
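The load/prepare/retrieve steps can be sketched with a toy in-memory index. The "embedding" here is a bag-of-words counter rather than a model embedding, purely to keep the example self-contained:

```python
# Toy sketch of the embed-and-retrieve steps behind RAG. Real applications
# use model embeddings and a vector store; this uses word counts.
import math
from collections import Counter

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "LangChain composes LLM components into chains.",
    "FAISS is a library for vector similarity search.",
]
index = [(d, embed(d)) for d in docs]  # "load and prepare"

def retrieve(query, k=1):
    q = embed(query)
    return [d for d, v in sorted(index, key=lambda p: -cosine(q, p[1]))][:k]

print(retrieve("vector similarity search"))
```

The retrieved chunks would then be interpolated into the prompt for the generation step.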

🤖 Agents:

Agents allow an LLM autonomy over how a task is accomplished. Agents make decisions about which actions to take, then take that action, observe the result, and repeat until the task is complete. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end-to-end agents.
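The decide/act/observe loop can be sketched with a scripted stub in place of the model. Everything here (the tool names, the stub's decision rule) is illustrative, not LangChain's agent API:

```python
# Sketch of the agent loop: the "LLM" is a scripted stub that decides which
# tool to call next; a real agent would prompt a model for this decision.

tools = {
    "search": lambda q: "LangChain was first released in October 2022.",
    "finish": lambda ans: ans,
}

def fake_llm_decide(task, observations):
    # A real agent would feed the task and prior observations to an LLM.
    if not observations:
        return ("search", task)
    return ("finish", observations[-1])

def run_agent(task, max_steps=5):
    observations = []
    for _ in range(max_steps):
        action, arg = fake_llm_decide(task, observations)
        result = tools[action](arg)
        if action == "finish":
            return result
        observations.append(result)
    return None

print(run_agent("When was LangChain released?"))
```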

📖 Documentation

Please see here for full documentation.

You can also check out the full API Reference docs.

🌐 Ecosystem

  • 🦜🛠️ LangSmith: Tracing and evaluating your language model applications and intelligent agents to help you move from prototype to production.
  • 🦜🕸️ LangGraph: Creating stateful, multi-actor applications with LLMs, built on top of (and intended to be used with) LangChain primitives.
  • 🦜🏓 LangServe: Deploying LangChain runnables and chains as REST APIs.

💁 Contributing

As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.

For detailed information on how to contribute, see here.

🌟 Contributors




langchain's Issues

pip install requirements.txt fails on conda on mac

I get this error when I run pip install -r requirements.txt

ERROR: Could not find a version that satisfies the requirement faiss (from versions: none)
ERROR: No matching distribution found for faiss

As per this issue, requirements.txt should be updated to faiss-cpu.

Support HuggingFaceHub embeddings endpoint

The current implementation of HuggingFaceEmbeddings uses local sentence-transformers models to derive the embeddings. This can be limiting, as it requires a fairly capable machine to download the model, load it, and run inference.

An alternative is to support embeddings derived directly via the HuggingFaceHub. See this blog post for details. This would set similar expectations to the Cohere and OpenAI embeddings APIs.
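A rough sketch of what the Hub-backed path might look like: build a request against the Inference API's feature-extraction pipeline and send it with `requests`. The URL pattern and payload shape are assumptions based on the public HF Inference API, and the request-building is separated out so it can be checked without a network call:

```python
# Sketch: derive embeddings via the HF Inference API instead of a local
# sentence-transformers model. The endpoint pattern is an assumption
# worth verifying against the Hugging Face Inference API docs.
import json

def build_embedding_request(model_id, texts, token):
    return {
        "url": f"https://api-inference.huggingface.co/pipeline/feature-extraction/{model_id}",
        "headers": {"Authorization": f"Bearer {token}"},
        "body": json.dumps({"inputs": texts}),
    }

req = build_embedding_request(
    "sentence-transformers/all-MiniLM-L6-v2", ["hello world"], "hf_xxx"
)
# Sending would be:
#   requests.post(req["url"], headers=req["headers"], data=req["body"]).json()
print(req["url"])
```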

SQLDatabaseChain expects only select statements

Running some CRUD-like statements through an agent throws

ResourceClosedError: This result object does not return rows. It has been closed automatically.

From the implementation it appears that it always expects to see rows, which are then cast to str and returned as part of the chain. What would the impact of modifying this behaviour be on the expected use case for the SQL chain as it is?

https://github.com/hwchase17/langchain/blob/261029cef3e7c30277027f5d5283b87197eab520/langchain/sql_database.py#L70-L71
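The underlying issue can be reproduced with a plain sqlite3 stand-in: non-SELECT statements produce no result set, so a chain that unconditionally fetches rows needs to branch. (SQLDatabaseChain itself uses SQLAlchemy, where the analogous check would be `result.returns_rows`; the sqlite3 version below just illustrates the shape of the fix.)

```python
# Non-SELECT statements return no rows; branching on cursor.description
# (None when there is no result set) avoids the "result object does not
# return rows" class of error.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (x INTEGER)")
cur.execute("INSERT INTO t VALUES (1)")

def run_sql(cursor, stmt):
    cursor.execute(stmt)
    if cursor.description is None:          # INSERT/UPDATE/DELETE/DDL
        return f"{cursor.rowcount} row(s) affected"
    return str(cursor.fetchall())           # SELECT-like statements

print(run_sql(cur, "SELECT x FROM t"))      # [(1,)]
print(run_sql(cur, "UPDATE t SET x = 2"))   # 1 row(s) affected
```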

make integration_tests fails currently

pytest tests/integration_tests
============================= test session starts ==============================
platform darwin -- Python 3.9.12, pytest-7.1.1, pluggy-1.0.0
rootdir: /Users/delip/workspace/langchain
plugins: anyio-3.5.0, dotenv-0.5.2
collected 14 items / 1 error

==================================== ERRORS ====================================
____________ ERROR collecting tests/integration_tests/test_faiss.py ____________
ImportError while importing test module '/Users/delip/workspace/langchain/tests/integration_tests/test_faiss.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../opt/anaconda3/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/integration_tests/test_faiss.py:9: in <module>
    from langchain.faiss import FAISS
E   ModuleNotFoundError: No module named 'langchain.faiss'

more consistent printing of intermediate steps

Right now, some chains print out intermediate steps and some don't. Let's standardize it so that they all have the same flag which turns it on/off, and things are printed out in a standard way. Ideally colorized.

SequentialChain that runs next chain on each output of the prior chain?

Given:

  • Chain A generates N outputs, e.g. we get back the text and immediately split it into a list based on post-processing we control and expect.
  • Chain B should then run over each of those outputs from Chain A.

Is there any way to do this elegantly in langchain? Perhaps some way to provide an output formatter to a chain, or some for_each pre- / post-processor? Or does this seem like just two independent chains with processing in between?
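One way to sketch the requested behaviour, with chains modeled as plain callables (the names `chain_a`, `chain_b`, and `for_each` are hypothetical, not LangChain APIs):

```python
# Sketch of running "chain B" over each output of "chain A". chain_a's
# post-processing splits its (stand-in) LLM output into a list, and the
# combined chain maps chain_b over that list.

def chain_a(topic):
    text = f"{topic}-one;{topic}-two;{topic}-three"   # stand-in LLM output
    return text.split(";")                            # post-processing we control

def chain_b(item):
    return item.upper()                               # stand-in second chain

def for_each(first, second):
    def combined(inputs):
        return [second(item) for item in first(inputs)]
    return combined

pipeline = for_each(chain_a, chain_b)
print(pipeline("idea"))  # ['IDEA-ONE', 'IDEA-TWO', 'IDEA-THREE']
```

Whether this belongs inside SequentialChain or stays as two chains with a mapping step in between is exactly the design question the issue raises.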

Bibtex Citation

Do you have a bibtex citation for this repo?

e.g. something like the following (from https://github.com/bigscience-workshop/promptsource)

@misc{bach2022promptsource,
      title={PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts},
      author={Stephen H. Bach and Victor Sanh and Zheng-Xin Yong and Albert Webson and Colin Raffel and Nihal V. Nayak and Abheesht Sharma and Taewoon Kim and M Saiful Bari and Thibault Fevry and Zaid Alyafeai and Manan Dey and Andrea Santilli and Zhiqing Sun and Srulik Ben-David and Canwen Xu and Gunjan Chhablani and Han Wang and Jason Alan Fries and Maged S. Al-shaibani and Shanya Sharma and Urmish Thakker and Khalid Almubarak and Xiangru Tang and Xiangru Tang and Mike Tian-Jian Jiang and Alexander M. Rush},
      year={2022},
      eprint={2202.01279},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Allow different LLM objects for each PromptTemplate in SequentialChain

I would like to use SequentialChain with the option to use a different LLM class at each step. The rationale behind this is that I am using different temperature settings for different prompts within my chain. I also potentially may use different models for each step in the future.

a rough idea for config -- have a json dict specifying LLM config, and pass in a list of configs (or list of LLM objects) which is the same length as the number of prompttemplates in the chain if you want to use different objects per chain, or one LLM object or config object in the case where you want to use the same for all
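The config idea above can be sketched as follows; the dict keys, the per-step call, and `run_sequential` are all hypothetical illustrations of the proposal, not an existing API:

```python
# Sketch of one LLM config per prompt template, zipped pairwise through a
# sequential chain. All names here are hypothetical.

llm_configs = [
    {"model": "text-davinci-003", "temperature": 0.0},
    {"model": "text-davinci-003", "temperature": 0.9},
]
prompts = ["Summarize: {text}", "Brainstorm titles for: {text}"]

def fake_llm_call(prompt, config):
    # Stand-in for a real completion call using the given config.
    return f"[{config['model']}@{config['temperature']}] {prompt}"

def run_sequential(text, prompts, configs):
    assert len(prompts) == len(configs), "one config per prompt template"
    out = text
    for prompt, config in zip(prompts, configs):
        out = fake_llm_call(prompt.format(text=out), config)
    return out

print(run_sequential("LangChain", prompts, llm_configs))
```

Passing a single config instead of a list could be handled by broadcasting it across all steps, matching the "same for all" case in the proposal.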

Combining Conversation Chain and Agents

Is there a current API for hooking a conversational chain with an agent so, during the conversation, actions can be performed (e.g. db lookup etc)?

If not, how would you see this working architecturally?

SelfAskWithSearchChain error when followup required

I have just updated to version 0.0.7

1- When running a simple question like: "What is the capital of Idaho?" , the result is OK

2- When running a question like: "What is the hometown of the reigning men's U.S. Open champion?"

I got the following error:
What is the hometown of the reigning men's U.S. Open champion?
Are follow up questions needed here: Yes.
Follow up: Who is the reigning men's U.S. Open champion?
(.......)

File ~/anaconda3/envs/nlp/lib/python3.10/site-packages/langchain/chains/self_ask_with_search/base.py:166, in SelfAskWithSearchChain.run(self, question)
152 def run(self, question: str) -> str:
153 """Run self ask with search chain.
154
155 Args:
(...)
...
--> 107 elif "snippet" in res["organic_results"][0].keys():
108 toret = res["organic_results"][0]["snippet"]
109 else:
KeyError: 'organic_results'

Support Codex embeddings

The current implementation of OpenAI embeddings is hard-coded to return only text embeddings via GPT-3. For example,

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
...
        responses = [
            self._embedding_func(text, engine=f"text-search-{self.model_name}-doc-001")
            for text in texts
        ]
    def embed_query(self, text: str) -> List[float]:
...
        embedding = self._embedding_func(
            text, engine=f"text-search-{self.model_name}-query-001"
        )

However, recent literature on reasoning shows Codex to be more powerful on reasoning tasks than GPT-3. OpenAIEmbeddings should be modified to support both text and code embeddings.
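A minimal sketch of the change: make the engine family a parameter instead of hard-coding `text-search-...`. The name pattern follows the snippet above; whether matching `code-search-*` engines exist for every model is an assumption to verify against the OpenAI API:

```python
# Sketch: parameterize the hard-coded engine prefix so callers can choose
# text or code embeddings. Engine-name pattern taken from the snippet
# above; availability of code-search engines is an assumption.

def engine_name(model_name, mode, kind):
    # mode is "text" or "code"; kind is "doc" or "query".
    assert mode in ("text", "code") and kind in ("doc", "query")
    return f"{mode}-search-{model_name}-{kind}-001"

print(engine_name("babbage", "text", "doc"))    # text-search-babbage-doc-001
print(engine_name("babbage", "code", "query"))  # code-search-babbage-query-001
```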

improve error messages for missing keys in pydantic classes

Currently it is as below, which is way too ugly:

    335 Create a new model by parsing and validating input data from keyword arguments.
    336 
    337 Raises ValidationError if the input data cannot be parsed to form a valid model.
    338 """
    339 # Uses something other than `self` the first arg to allow "self" as a settable attribute
--> 340 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
    341 if validation_error:
    342     raise validation_error
...
---> 53     input_variables = values["input_variables"]
     54     template = values["template"]
     55     template_format = values["template_format"]

KeyError: 'input_variables'

Unicode error on Windows

File "C:\ProgramData\Anaconda3\envs\LangChain\lib\encodings\cp1252.py", line 23, in decode
          return codecs.charmap_decode(input,self.errors,decoding_table)[0]
      UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 8: character maps to <undefined>
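The traceback shows Windows falling back to the cp1252 codec. A common fix is to pass `encoding="utf-8"` explicitly wherever files are opened; the sketch below reproduces the failing byte (0x8f, which is undefined in cp1252) and shows the explicit-encoding path working:

```python
# "Ï" is C3 8F in UTF-8; byte 0x8F is undefined in cp1252, reproducing
# the UnicodeDecodeError from the traceback above.
import codecs
import os
import tempfile

text = "naÏve"
path = os.path.join(tempfile.mkdtemp(), "doc.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

try:
    codecs.decode(b"\xc3\x8f", "cp1252")    # what the cp1252 default does
except UnicodeDecodeError as e:
    print(e)                                 # can't decode byte 0x8f ...

with open(path, encoding="utf-8") as f:      # explicit encoding: portable
    print(f.read())                          # naÏve
```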

Cannot import `PromptTemplate`

Hello,

When running the "Map Reduce" demo, I see the error below:

ImportError                               Traceback (most recent call last)
Cell In [3], line 1
----> 1 from langchain import OpenAI, PromptTemplate, LLMChain
      2 from langchain.text_splitter import CharacterTextSplitter
      3 from langchain.chains.mapreduce import MapReduceChain

ImportError: cannot import name 'PromptTemplate' from 'langchain' (/Users/mteoh/projects/langchain_sandbox/.env/lib/python3.9/site-packages/langchain/__init__.py)

I see it defined here: https://github.com/hwchase17/langchain/blob/c02eb199b6587aeeb50fbb083693572bd2f030cc/langchain/prompts/prompt.py#L13

And mentioned here:
https://github.com/hwchase17/langchain/blob/c02eb199b6587aeeb50fbb083693572bd2f030cc/langchain/__init__.py#L35

However, when grepping in the library directory, I do not find it:

:~/projects/langchain_sandbox/.env/lib/python3.9/site-packages/langchain $ grep -r PromptTemplate .

Relevant versions of my packages:

$ pip freeze | grep "langchain\|openai"
langchain==0.0.16
openai==0.25.0

Any advice? Thanks! Excited about this work.

Serialize BasePromptTemplate to json rather than a file

I want to save prompt templates in a JSONField in a Django DB. The current save() method on BasePromptTemplate outputs to a file rather than a JSON object. I'd prefer to have a method that dumps/loads the prompt to and from JSON, and I decide what to do with the JSON myself.
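The requested behaviour might look like the following round-trip; the dict keys mirror the fields a prompt template carries, but `to_json`/`from_json` are hypothetical helper names, not an existing LangChain API:

```python
# Sketch: round-trip a prompt template through a JSON string so the caller
# can store it in e.g. a Django JSONField. Helper names are hypothetical.
import json

def to_json(template, input_variables):
    return json.dumps({
        "_type": "prompt",
        "template": template,
        "input_variables": input_variables,
    })

def from_json(blob):
    data = json.loads(blob)
    assert data["_type"] == "prompt"
    return data["template"], data["input_variables"]

blob = to_json("Tell me about {topic}", ["topic"])
template, variables = from_json(blob)
print(template.format(topic="LangChain"))  # Tell me about LangChain
```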

Prompts for harrison/combine_documents_chain

Improved prompts for harrison/combine_documents_chain

"""QUESTION_PROMPT is the prompt used in phase 1 where we run the LLM on each chunk of the doc."""

question_prompt_template = """Use the following portion of a legal contract to see if any of the text is relevant to answer the question. 
Return any relevant text verbatim.
{context}
Question: {question}
Relevant text, if any:"""

QUESTION_PROMPT = PromptTemplate(
    template=question_prompt_template, input_variables=["context", "question"]
)



"""  """

combine_prompt_template = """Given the following extracted parts of a contract and a question, create a final answer with references ("SOURCES"). 
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
ALWAYS return a "SOURCES" part in your answer.
QUESTION: Which state/country's law governs the interpretation of the contract?
=========
Content: This Agreement is governed by English law and the parties submit to the exclusive jurisdiction of the English courts in  relation to any dispute (contractual or non-contractual) concerning this Agreement save that either party may apply to any court for an  injunction or other relief to protect its Intellectual Property Rights.
Source: 28-pl
Content: No Waiver. Failure or delay in exercising any right or remedy under this Agreement shall not constitute a waiver of such (or any other)  right or remedy.\n\n11.7 Severability. The invalidity, illegality or unenforceability of any term (or part of a term) of this Agreement shall not affect the continuation  in force of the remainder of the term (if any) and this Agreement.\n\n11.8 No Agency. Except as expressly stated otherwise, nothing in this Agreement shall create an agency, partnership or joint venture of any  kind between the parties.\n\n11.9 No Third-Party Beneficiaries.
Source: 30-pl
Content: (b) if Google believes, in good faith, that the Distributor has violated or caused Google to violate any Anti-Bribery Laws (as  defined in Clause 8.5) or that such a violation is reasonably likely to occur,
Source: 4-pl
=========
FINAL ANSWER: This Agreement is governed by English law.
SOURCES: 28-pl
QUESTION: What did the president say about Michael Jackson?
=========
Content: Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. \n\nGroups of citizens blocking tanks with their bodies. Everyone from students to retirees teachers turned soldiers defending their homeland.
Source: 0-pl
Content: And we won’t stop. \n\nWe have lost so much to COVID-19. Time with one another. And worst of all, so much loss of life. \n\nLet’s use this moment to reset. Let’s stop looking at COVID-19 as a partisan dividing line and see it for what it is: A God-awful disease.  \n\nLet’s stop seeing each other as enemies, and start seeing each other for who we really are: Fellow Americans.  \n\nWe can’t change how divided we’ve been. But we can change how we move forward—on COVID-19 and other issues we must face together. \n\nI recently visited the New York City Police Department days after the funerals of Officer Wilbert Mora and his partner, Officer Jason Rivera. \n\nThey were responding to a 9-1-1 call when a man shot and killed them with a stolen gun. \n\nOfficer Mora was 27 years old. \n\nOfficer Rivera was 22. \n\nBoth Dominican Americans who’d grown up on the same streets they later chose to patrol as police officers. \n\nI spoke with their families and told them that we are forever in debt for their sacrifice, and we will carry on their mission to restore the trust and safety every community deserves.
Source: 24-pl
Content: And a proud Ukrainian people, who have known 30 years  of independence, have repeatedly shown that they will not tolerate anyone who tries to take their country backwards.  \n\nTo all Americans, I will be honest with you, as I’ve always promised. A Russian dictator, invading a foreign country, has costs around the world. \n\nAnd I’m taking robust action to make sure the pain of our sanctions  is targeted at Russia’s economy. And I will use every tool at our disposal to protect American businesses and consumers. \n\nTonight, I can announce that the United States has worked with 30 other countries to release 60 Million barrels of oil from reserves around the world.  \n\nAmerica will lead that effort, releasing 30 Million barrels from our own Strategic Petroleum Reserve. And we stand ready to do more if necessary, unified with our allies.  \n\nThese steps will help blunt gas prices here at home. And I know the news about what’s happening can seem alarming. \n\nBut I want you to know that we are going to be okay.
Source: 5-pl
Content: More support for patients and families. \n\nTo get there, I call on Congress to fund ARPA-H, the Advanced Research Projects Agency for Health. \n\nIt’s based on DARPA—the Defense Department project that led to the Internet, GPS, and so much more.  \n\nARPA-H will have a singular purpose—to drive breakthroughs in cancer, Alzheimer’s, diabetes, and more. \n\nA unity agenda for the nation. \n\nWe can do this. \n\nMy fellow Americans—tonight , we have gathered in a sacred space—the citadel of our democracy. \n\nIn this Capitol, generation after generation, Americans have debated great questions amid great strife, and have done great things. \n\nWe have fought for freedom, expanded liberty, defeated totalitarianism and terror. \n\nAnd built the strongest, freest, and most prosperous nation the world has ever known. \n\nNow is the hour. \n\nOur moment of responsibility. \n\nOur test of resolve and conscience, of history itself. \n\nIt is in this moment that our character is formed. Our purpose is found. Our future is forged. \n\nWell I know this nation.
Source: 34-pl
=========
FINAL ANSWER: The president did not mention Michael Jackson.
SOURCES:
QUESTION: {question}
=========
{summaries}
=========
FINAL ANSWER:"""

COMBINE_PROMPT = PromptTemplate(
    template=combine_prompt_template, input_variables=["summaries", "question"]
)



"""COMBINE_DOCUMENT_PROMPT is fed into CombineDocumentsChain() at langchain/chains/combine_documents.py"""
COMBINE_DOCUMENT_PROMPT = PromptTemplate(
    template="Content: {page_content}\nSource: {source}",
    input_variables=["page_content", "source"],
)

google/flan-t5-xxl and google/flan-t5-xl don't seem to work with HuggingFaceHub

First, when I load them I get a warning:

hf = HuggingFaceHub(repo_id="google/flan-t5-xl")
You're using a different task than the one specified in the repository. Be sure to know what you're doing :)

Then, when I use it in inference, I get gibberish.

hf("The capital of New York is")
'ew York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is The capital of the world New York is'

If I run the API via requests, I get the expected answer

import requests

API_URL = "https://api-inference.huggingface.co/models/google/flan-t5-xl"
headers = {"Authorization": "Bearer api_org_xxxxxxxxxxxxxxxxxxxxxxxxxxx"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
	"inputs": "The capital of New York is",
})
print(output)
[{'generated_text': 'Albany'}]

Any suggestions?
