Comments (5)
what framework are you using?
from qwen.
what framework are you using?
I'm using the official script from the qwen GitHub repo.
from qwen.
which one?
from qwen.
which one?
from qwen.
The web_demo.py script is a demonstration tool and is explicitly not intended for production deployment, as it lacks production-grade capabilities. It can accept multiple concurrent requests, but it does not process them in parallel; instead it serializes them through the queue mechanism implemented by gradio. As for GPU utilization, when serving each request the transformers library uses a naive model-parallelism scheme for multi-GPU inference, meaning only a single GPU is actively computing at any given moment.

For production deployment, this repository does not cover those requirements; such tasks should be handled by your IT professionals. As an alternative, consider projects like FastChat combined with vLLM. That setup can execute multiple requests in parallel if your GPUs have adequate memory, and it uses tensor parallelism, engaging all GPUs concurrently and thus maximizing resource utilization.
from qwen.
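The FastChat + vLLM deployment recommended above can be sketched as the following launch sequence. This is a minimal sketch, not an official recipe: the model path `Qwen/Qwen-14B-Chat`, the port numbers, and the 4-GPU count are assumptions to adapt to your environment.

```shell
# Sketch of a FastChat + vLLM deployment. Assumes FastChat and vLLM are
# installed and the host has 4 GPUs; adjust --model-path and --num-gpus.

# 1. Controller: coordinates model workers.
python3 -m fastchat.serve.controller --host 0.0.0.0 --port 21001 &

# 2. vLLM worker: serves the model, sharding it across 4 GPUs with
#    tensor parallelism so all GPUs compute concurrently.
python3 -m fastchat.serve.vllm_worker \
    --model-path Qwen/Qwen-14B-Chat \
    --num-gpus 4 \
    --controller-address http://localhost:21001 &

# 3. OpenAI-compatible API server: exposes /v1/chat/completions on port 8000.
python3 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
```

Once the API server is up, any OpenAI-compatible client can send chat requests to `http://localhost:8000/v1`, and concurrent requests are batched by vLLM rather than queued one at a time.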
Related Issues (20)
- [BUG] The Function Calling example is broken; the latest openai SDK reports the API as deprecated at runtime HOT 2
- Where can I find the jinja template Qwen uses for vLLM? HOT 1
- [BUG] Running eval_plugin under eval for evaluation: one agent fails to pull a package from huggingface_hub HOT 1
- Can Qwen be deployed and run for inference on Qualcomm NPUs? HOT 1
- After finetuning, the vLLM deployment via llama_factory and Qwen's official vLLM deployment return different results HOT 2
- 💡 [REQUEST] - How do I set the output text length when calling qwen:14B via ollama? HOT 1
- [BUG] Calling a Qwen model via fastchat + vLLM + OpenAI API: doesn't the data need preprocessing first? HOT 1
- Local deployment runs very slowly HOT 4
- When will 2.5 be open-sourced? HOT 1
- [BUG] finetune.py crashes in get_peft_model (peft/mapping.py line 123): AttributeError: 'NoneType' object has no attribute '__dict__' HOT 2
- Qwen-14B without finetuning gives inconsistent outputs for the same question even with temperature set to 0. Why? HOT 2
- [BUG] torch.cuda.OutOfMemoryError: CUDA out of memory. HOT 1
- Qwen pre_trained: it prints some content and then nothing; unsure whether training finished HOT 2
- [BUG] Error converting Qwen1.5-14B HOT 1
- How to organize the training data format for multi-turn dialogue HOT 1
- [BUG] Questionable embedding feature shape extracted from Qwen-7B-Chat HOT 2
- [BUG] Command-line argument parsing error
- During tool calling, the model hallucinates parameters the user never provided HOT 1
- [BUG] How should the prompt for multi-turn dialogue be constructed?
- [BUG] Problems finetuning with finetune.sh