Comments (19)
You are correct: we are not leveraging async implementations at the moment. The main reason is that I feel most code bases are not written in the async paradigm, and it is quite hard, and not always possible, to transition from sync to async.
To mitigate this, we currently run agents in different threads, so one slow agent does not block the whole app.
As we move forward, I would love to see Chainlit support async implementations :)
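Not Chainlit's actual internals, but a minimal generic sketch of the thread-per-agent idea; `run_agent` is a made-up stand-in for a blocking agent call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_agent(session_id: str, text: str) -> str:
    # Stand-in for a synchronous agent call that may block for seconds.
    time.sleep(2)
    return f"answer for {session_id}: {text!r}"

# One pool for the whole app; a slow agent run only occupies one worker
# thread instead of blocking every other session.
executor = ThreadPoolExecutor(max_workers=8)

futures = [executor.submit(run_agent, f"user-{i}", "hi") for i in range(3)]
for f in futures:
    print(f.result())
```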
For streaming, Chainlit already supports streaming with openai, langchain, and any Python code. See https://docs.chainlit.io/concepts/streaming/python :)
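For readers landing here later, a minimal sketch of what Python-level token streaming looks like against a current (async) Chainlit release; `on_message`, `stream_token`, and `send` reflect the documented streaming API, though versions contemporary with this thread exposed a sync variant (see the docs link above):

```python
import chainlit as cl

@cl.on_message
async def main(message: cl.Message):
    msg = cl.Message(content="")
    # Tokens would normally come from an LLM stream; a literal list
    # keeps the sketch self-contained.
    for token in ["Streamed", " ", "token", " ", "by", " ", "token."]:
        await msg.stream_token(token)
    await msg.send()
```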
Thank you for the link @hooman-bayer. I pretty much agree with you, and we also want to see where the community wants Chainlit to go: staying a rapid prototyping tool, or deploying to production and scaling.
As for streaming final responses in LangChain, @segevtomer, I found this interesting issue: hwchase17/langchain#2483. I'll dig more into it!
@segevtomer For clarity, all the intermediary steps are already streamed, including the last one, which is the final response. The final response is then sent as a standalone message (not an intermediary step) in the UI, without any overhead, so the user can see it.
What I am saying here is that the only improvement we can make is to stream the last tokens after your stop token (usually Final Answer:) directly, without waiting for the completion to end. This is what hwchase17/langchain#2483 does.
While this would be a win for the user experience, the actual time gained will be very limited, since it only affects a few tokens at the very end of the whole process.
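For the curious, this is roughly the trick hwchase17/langchain#2483 implements: buffer tokens until the stop phrase appears, then forward everything after it. A sketch with a sync LangChain callback handler; `FinalAnswerStreamer` is an illustrative name, and `print` stands in for pushing tokens to the UI:

```python
from typing import Any
from langchain.callbacks.base import BaseCallbackHandler

class FinalAnswerStreamer(BaseCallbackHandler):
    """Buffer tokens until the stop phrase, then stream the remainder."""

    def __init__(self, answer_prefix: str = "Final Answer:"):
        self.answer_prefix = answer_prefix
        self.buffer = ""
        self.streaming_final = False

    def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        if self.streaming_final:
            print(token, end="", flush=True)  # forward to the UI instead
            return
        self.buffer += token
        if self.answer_prefix in self.buffer:
            self.streaming_final = True
            # Emit whatever already followed the stop phrase in the buffer.
            print(self.buffer.split(self.answer_prefix, 1)[1], end="", flush=True)
```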
@willydouhard happy to contribute at some point in the near future, in case it becomes part of your roadmap. I have been building an app that fully extends langchain to async, including tools (using their class signature that offers arun). But you are 100% right that most libraries only offer sync APIs.
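As a reference for the pattern described here, a minimal LangChain custom tool exposing both the sync and async code paths; `EchoTool` and its behavior are made up for illustration:

```python
from langchain.tools import BaseTool

class EchoTool(BaseTool):
    """Illustrative async-capable tool."""

    name = "echo"
    description = "Returns its input unchanged."

    def _run(self, query: str) -> str:
        # Sync path, used by chain.run(...)
        return query

    async def _arun(self, query: str) -> str:
        # Async path, used by chain.arun(...); a real tool would await I/O here.
        return query
```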
> For streaming, Chainlit already supports streaming with openai, langchain, and any Python code. See https://docs.chainlit.io/concepts/streaming/python :)
Correct, I saw it, but it's again kind of faking it :) and the client still needs to wait until the response is completed from the OpenAI endpoint, which might not be desired. For instance, openai.ChatCompletion.acreate creates an SSE stream and passes the response to the client token by token as it is generated by ChatGPT, so the latency is much smaller.
Also, imagine your WebSocket for action agents: this could bring a much better experience for the user.
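A minimal sketch of that async path with the pre-1.0 openai SDK (assumes OPENAI_API_KEY is set in the environment):

```python
import asyncio
import openai  # pre-1.0 SDK, where openai.ChatCompletion.acreate exists

async def main() -> None:
    response = await openai.ChatCompletion.acreate(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Say hi"}],
        stream=True,
    )
    # Each SSE chunk is yielded as soon as the model emits it.
    async for chunk in response:
        delta = chunk["choices"][0]["delta"]
        print(delta.get("content", ""), end="", flush=True)

asyncio.run(main())
```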
Interesting. In my understanding, openai.ChatCompletion.create was not waiting for the whole response to be generated before starting to stream tokens. Do you happen to have a link to a resource covering that in more detail?
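For comparison, the sync call with stream=True also yields chunks over SSE as they are generated; what differs is that iterating the generator blocks the calling thread. A sketch, again with the pre-1.0 SDK:

```python
import openai  # pre-1.0 SDK

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hi"}],
    stream=True,
)
# Chunks arrive incrementally; the loop itself blocks the calling thread.
for chunk in response:
    delta = chunk["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)
```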
To add to the conversation, I tried a few different chain classes and couldn't get streaming to work with any of them (the response was only updated on screen once it was complete).
For LangChain, only the intermediary steps are streamed at the moment. If you configured your LLM with streaming=True, you should see the intermediary steps being streamed if you unfold them in the UI (click the Using... button).
I will take a look at how to also stream the final response!
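A sketch of that LLM configuration with a LangChain version of that era; the stdout handler is just the simplest consumer of the streamed tokens:

```python
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# streaming=True makes the model emit on_llm_new_token callbacks as
# tokens arrive; without a handler, nothing consumes them.
llm = ChatOpenAI(
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
    temperature=0,
)
```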
@willydouhard see this issue from the openai python sdk for more details. In general, if you want to keep the tool as a simple POC for a handful of users, I think it is great as is with sync, but what if we want to scale to 100 users or so? I think running all of this on different threads is not realistic or modern (the user experience with sync also won't be great); async is probably the way to go.
@willydouhard that is using the AsyncCallbackHandler I mentioned above. Using that, you get access to on_llm_new_token(self, token: str, **kwargs: Any) -> None, which you can then customize to return the output token by token to the client.
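A sketch of that handler shape; `TokenForwarder` is an illustrative name, and `send` stands in for whatever coroutine pushes a token to the connected client (e.g. over a WebSocket):

```python
from typing import Any, Awaitable, Callable
from langchain.callbacks.base import AsyncCallbackHandler

class TokenForwarder(AsyncCallbackHandler):
    """Forward each generated token to a client as it arrives."""

    def __init__(self, send: Callable[[str], Awaitable[None]]):
        self.send = send  # coroutine that pushes one token to the client

    async def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        await self.send(token)
```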
Thanks for the update @willydouhard.
I wouldn't say it's "very limited". I agree that there will still be a delay, because we have to wait for the final prompt in the chain to occur, but it is still very valuable to stream it. Say the final response is over 1k tokens long: streaming it will still make a significant difference for UX.
I have two problems with this:
- There is absolutely no indication that the LLM is doing something, not even a wait cursor.
- If the text generation runs longer than a few seconds, the UI loses its connection to the server, and the message is never displayed.
So what I would need is either streaming of the final result, or a configurable timeout before the UI loses the connection to the server, plus some spinner to indicate that something is happening. Preferably both.
@Banbury what is your setup? Are you running open source models like gpt4all locally, or are you using the openai api?
I have been trying to run Vicuna locally with langchain. It does work more or less, but only for short texts.
So I have seen issues with local models and we are investigating them. For API models, everything should work fine. It would be helpful if you could share a code snippet so I can try to reproduce.
This is the code I have been working on. It's simple enough.
```python
import chainlit as cl
from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain

llm = LlamaCpp(
    model_path="Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_0.bin",
    seed=0,
    n_ctx=2048,
    max_tokens=512,
    temperature=0.1,
    streaming=True,
)

template = """
### Instruction:
{message}
### Response:
"""

@cl.langchain_factory
def factory():
    prompt = PromptTemplate(template=template, input_variables=["message"])
    llm_chain = LLMChain(prompt=prompt, llm=llm, verbose=True)
    return llm_chain
```
I have been using the same model with llama.cpp and llama-cpp-python without problems.
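One thing worth checking with LangChain versions of that era: LlamaCpp emitted streamed tokens through its callback manager, so a handler usually had to be wired in for streaming output to be visible. A hedged sketch on top of the snippet above:

```python
from langchain.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm = LlamaCpp(
    model_path="Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_0.bin",
    n_ctx=2048,
    max_tokens=512,
    temperature=0.1,
    streaming=True,
    # Without a handler attached, streamed tokens have nowhere to go.
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
)
```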
Thank you, I am going to prioritize this!
Here is the proposal to move chainlit to async by default #40. Feedback wanted!