
codeshell-vscode's Introduction

CodeShell VSCode Extension

English readme

The codeshell-vscode project is an intelligent coding assistant extension for Visual Studio Code built on the CodeShell large language model. It supports multiple programming languages, including Python, Java, C/C++, JavaScript, and Go, and provides code completion, code explanation, code optimization, comment generation, and conversational Q&A, helping developers code more efficiently.

Requirements

Building the Extension

To package the extension from source, install Node.js v18 or later and run the following commands:

git clone https://github.com/WisdomShell/codeshell-vscode.git
cd codeshell-vscode
npm install
npm exec vsce package

This produces a file named codeshell-vscode-${VERSION_NAME}.vsix.

Model Service

The llama_cpp_for_codeshell project provides a 4-bit quantized version of the CodeShell model, named codeshell-chat-q4_0.gguf. The steps to deploy the model service are as follows:

Build the Code

  • Linux / Mac (Apple Silicon devices)

    git clone https://github.com/WisdomShell/llama_cpp_for_codeshell.git
    cd llama_cpp_for_codeshell
    make

    On macOS, Metal is enabled by default; with Metal the model is loaded onto the GPU, which significantly improves performance.

  • Mac (non-Apple Silicon devices)

    git clone https://github.com/WisdomShell/llama_cpp_for_codeshell.git
    cd llama_cpp_for_codeshell
    LLAMA_NO_METAL=1 make

    Mac users without an Apple Silicon chip can disable the Metal build at compile time with the option LLAMA_NO_METAL=1 (or the LLAMA_METAL=OFF CMake option) so that the model runs correctly.

  • Windows

    On Windows, you can either build inside the Windows Subsystem for Linux following the Linux instructions, or follow the approach described in the llama.cpp repository: set up w64devkit first, then build with the Linux instructions.

Download the Model

On the Hugging Face Hub we provide three models: CodeShell-7B, CodeShell-7B-Chat, and CodeShell-7B-Chat-int4. The steps to download them are as follows.

  • To run inference with the CodeShell-7B-Chat-int4 model, download it locally and place it in the llama_cpp_for_codeshell/models folder from the build step above (a script-based alternative is sketched after the commands below)
git clone https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4
git clone https://huggingface.co/WisdomShell/CodeShell-7B-Chat
git clone https://huggingface.co/WisdomShell/CodeShell-7B
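
If cloning the full repositories is inconvenient, the quantized file can also be fetched programmatically. The sketch below is an optional alternative that assumes the huggingface_hub Python package is installed; the local_dir value mirrors the models folder mentioned above and should be adjusted to your setup:

    # Fetch only the 4-bit GGUF file with huggingface_hub instead of git.
    # Assumes `pip install huggingface_hub`; local_dir is the assumed target folder.
    from huggingface_hub import hf_hub_download

    gguf_path = hf_hub_download(
        repo_id="WisdomShell/CodeShell-7B-Chat-int4",
        filename="codeshell-chat-q4_0.gguf",
        local_dir="llama_cpp_for_codeshell/models",
    )
    print("Model saved to:", gguf_path)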

Load the Model

  • For the CodeShell-7B-Chat-int4 model, the server command from the llama_cpp_for_codeshell project provides an API service:
./server -m ./models/codeshell-chat-q4_0.gguf --host 127.0.0.1 --port 8080

Note: If the build has Metal enabled and you hit runtime errors, you can add the command-line argument -ngl 0 to explicitly disable Metal GPU inference so that the model runs correctly.
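
Before wiring up the VS Code extension, you can check that the service answers requests. The sketch below assumes the server exposes the llama.cpp-style /completion endpoint on the host and port used above; the exact field names may differ in this fork, so treat it as a smoke test rather than a reference client:

    # Minimal smoke test against the local model service (assumed llama.cpp-style API).
    import json
    import urllib.request

    payload = {"prompt": "def fibonacci(n):", "n_predict": 64}
    req = urllib.request.Request(
        "http://127.0.0.1:8080/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The upstream llama.cpp server returns the generated text in the "content" field.
        print(json.loads(resp.read())["content"])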

Model Service [NVIDIA GPU]

Users who want to run inference on NVIDIA GPUs can deploy the CodeShell model with the text-generation-inference (TGI) project. The deployment steps are as follows:

Download the Model

Download the model from the Hugging Face Hub and place it in the $HOME/models folder so it can be loaded from the local path.

git clone https://huggingface.co/WisdomShell/CodeShell-7B-Chat

Deploy the Model

The following command deploys the model with GPU-accelerated inference using text-generation-inference:

docker run --gpus 'all' --shm-size 1g -p 9090:80 -v $HOME/models:/data \
        --env LOG_LEVEL="info,text_generation_router=debug" \
        ghcr.nju.edu.cn/huggingface/text-generation-inference:1.0.3 \
        --model-id /data/CodeShell-7B-Chat --num-shard 1 \
        --max-total-tokens 5000 --max-input-length 4096 \
        --max-stop-sequences 12 --trust-remote-code

For more detailed parameter descriptions, see the text-generation-inference project documentation.
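
Once the container is running, you can verify the endpoint before configuring the extension. A minimal sketch, assuming the 9090:80 port mapping from the command above and TGI's documented /generate endpoint:

    # Minimal smoke test against the TGI service started above (port 9090 assumed).
    import json
    import urllib.request

    payload = {
        "inputs": "Write a Python function that reverses a string.",
        "parameters": {"max_new_tokens": 128},
    }
    req = urllib.request.Request(
        "http://127.0.0.1:9090/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # TGI returns a JSON object with a "generated_text" field.
        print(json.loads(resp.read())["generated_text"])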

Configure the Extension

In VS Code, run the Install from VSIX... command and select codeshell-vscode-${VERSION_NAME}.vsix to install the extension.

  • Set the address of the CodeShell model service
  • Configure whether code completion suggestions are triggered automatically
  • Configure the delay before code completion suggestions are triggered automatically
  • Configure the maximum number of tokens for completion
  • Configure the maximum number of tokens for Q&A
  • Configure the model runtime environment

Note: The model runtime environment can be configured in the extension. For the CodeShell-7B-Chat-int4 model, choose CPU with llama.cpp under the Code Shell: Run Env For LLMs option; for the CodeShell-7B and CodeShell-7B-Chat models, choose GPU with TGI toolkit.

Extension configuration screenshot

Features

1. Code Completion

  • Automatically triggered code suggestions
  • Hotkey-triggered code suggestions

While coding, completion suggestions can be triggered automatically when you pause typing (the delay can be set to 1-3 seconds via the Auto Completion Delay option), or you can trigger them manually with the shortcut Alt+\ (Windows) or option+\ (Mac).

When the extension offers a suggestion, it is shown in gray at the cursor position in the editor. Press Tab to accept it, or keep typing to ignore it.

Code suggestion screenshot

2. Code Assistance

  • Explain/optimize/clean up a selected piece of code
  • Generate comments/unit tests for a selected piece of code
  • Check a selected piece of code for performance/security issues

Open the extension's Q&A panel in the VS Code sidebar, select a piece of code in the editor, right-click and choose the corresponding item in the CodeShell menu, and the extension will reply in the Q&A panel.

Code assistance screenshot

3. Chat Q&A

  • Multi-turn conversations
  • Conversation history
  • Multi-turn conversations based on the previous history (as context)
  • Edit a question and ask it again
  • Regenerate the answer to any question
  • Interrupt an answer while it is being generated

Chat Q&A screenshot

In a code block inside the Q&A panel, click the copy button to copy the block, or click the insert button to insert its content at the editor cursor.

License

Apache 2.0

Star History

Star History Chart

codeshell-vscode's People

Contributors

0ut0fm1nd, codeshell-liu, cove9988, weiye, zhuohaoyu, zzr0


codeshell-vscode's Issues

Output does not stop automatically after answering a question

Question:
Implement an HTTP server in Python

Import the required modules

import socketserver
import http.server

Answer

Define the HTTP request handler

class MyHttpRequestHandler(http.server.SimpleHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b'This is a GET request.')

Define the HTTP server

class MyHTTPServer(socketserver.TCPServer):
    def __init__(self, server_address, request_handler_class):
        socketserver.TCPServer.__init__(self, server_address, request_handler_class)
        self.request_handler_class = request_handler_class

    def handle_request(self, request, client_address):
        request_handler = self.request_handler_class(request, client_address, self)
        request_handler.handle()

Start the HTTP server

server = MyHTTPServer(('', 8000), MyHttpRequestHandler)
print('Starting HTTP server on port 8000...')
server.serve_forever()

This HTTP server uses the SimpleHTTPRequestHandler class from the http.server module to handle HTTP requests. The request_handler object in the handle_request method is an instance of SimpleHTTPRequestHandler; it handles the HTTP request and sends the response.

To start the HTTP server, we first define the MyHTTPServer class, which inherits from socketserver.TCPServer. We then define the MyHttpRequestHandler class, which inherits from http.server.SimpleHTTPRequestHandler. Next, we create a MyHTTPServer object and pass MyHttpRequestHandler as the request_handler_class argument. Finally, we call server.serve_forever() to start the HTTP server.

Note that this HTTP server is single-threaded and cannot handle multiple requests concurrently. If you need to handle concurrent requests, you can use multi-threading. |end>|
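
For reference, the multi-threaded variant that the answer alludes to is available directly in the standard library; a minimal sketch (not part of the original model output) using http.server.ThreadingHTTPServer, which serves each request on its own thread:

    # Minimal concurrent variant of the example above (Python 3.7+).
    from http.server import ThreadingHTTPServer, SimpleHTTPRequestHandler

    class MyHttpRequestHandler(SimpleHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'This is a GET request.')

    # ThreadingHTTPServer handles each incoming request in a separate thread.
    server = ThreadingHTTPServer(('', 8000), MyHttpRequestHandler)
    print('Starting threaded HTTP server on port 8000...')
    server.serve_forever()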

(Screenshot: 2023-11-03 10:53:20)

Errors when downloading and installing the model

When downloading the model I hit an error: there is no codeshell_q4_0.gguf file under "https://huggingface.co/WisdomShell/CodeShell-7B-Chat-int4/codeshell_q4_0.gguf", only a similarly named file, codeshell-chat-q4_0.gguf.

After manually downloading codeshell-chat-q4_0.gguf into the models folder, starting the model reports an error:
“./server : 无法将“./server”项识别为 cmdlet、函数、脚本文件或可运行程序的名称。请检查名称的拼写,如果包括路径,请确保路径正确,然后再试一次。
所在位置 行:1 字符: 1

  • ./server -m ./models/codeshell-chat-q4_0.gguf --host 127.0.0.1 --port ...
  •   + CategoryInfo          : ObjectNotFound: (./server:String) [], CommandNotFoundException
      + FullyQualifiedErrorId : CommandNotFoundException”
    

Webview page is blank after starting locally

My VS Code and Node versions both meet the requirements.
It started successfully at first, then suddenly stopped working for no obvious reason; pulling the latest code did not help either. Could someone take a look?

image

Deployment models

I used the code provided in the README.md to deploy the model, but an error occurred after I executed the command. Why? I have carefully examined the path to llama_cpp_for_codeshell. Thank you!

The error message is:
(base) root@9020:~/llama_cpp_for_codeshell$ ./server -m ./models/codeshell-chat-q4_0.gguf --host 127.0.0.1 --port 8185
-bash: ./server: No such file or directory

Error running the model service with a GPU

Running the command from the README:
./server -m ./models/codeshell-chat-q4_0.gguf --host 127.0.0.1 --port 8080

The error output is as follows:
ggml_metal_init: GPU name: Apple M1
ggml_metal_init: GPU family: MTLGPUFamilyApple7 (1007)
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 5461.34 MB
ggml_metal_init: maxTransferRate = built-in GPU
llama_new_context_with_model: compute buffer total size = 558.13 MB
llama_new_context_with_model: max tensor size = 224.77 MB
ggml_metal_add_buffer: allocated 'data ' buffer, size = 4096.00 MB, offs = 0
ggml_metal_add_buffer: allocated 'data ' buffer, size = 486.91 MB, offs = 4059267072, ( 4583.53 / 5461.34)
ggml_metal_add_buffer: allocated 'kv ' buffer, size = 1346.00 MB, ( 5929.53 / 5461.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_add_buffer: allocated 'alloc ' buffer, size = 552.02 MB, ( 6481.55 / 5461.34), warning: current allocated size is greater than the recommended max working set size
ggml_metal_graph_compute: command buffer 0 failed with status 5
GGML_ASSERT: ggml-metal.m:1459: false
Abort trap: 6

Machine info:
M1 MacBook Pro
MacOS Sonoma 14.1

For anyone who cannot build the server on Windows

I tried to use WSL to run the server program. Here are some tips:

  • Make sure that scripts/build-info.sh uses LF instead of CRLF after migrating the whole project to WSL. This can be done easily with VSCode
  • After running server on WSL, you may access the service from Windows via http://127.0.0.1:PORT instead of http://ip.addr.of.eth0:PORT. According to link this seems to be a bug (or a feature).

docker run --gpus 'all' fails; are multiple GPUs not supported?

On a server with two A40 GPUs, when deploying the service with GPUs:
docker run --gpus 'all' --shm-size 1g -p 9090:80 -v $HOME/models:/data \
        --env LOG_LEVEL="info,text_generation_router=debug" \
        ghcr.nju.edu.cn/huggingface/text-generation-inference:1.0.3 \
        --model-id /data/CodeShell-7B-Chat --num-shard 1 \
        --max-total-tokens 5000 --max-input-length 4096 \
        --max-stop-sequences 12 --trust-remote-code

The error is as follows:
2024-01-19T08:15:44.995533Z ERROR warmup{max_input_length=4096 max_prefill_tokens=4096}:warmup: text_generation_client: router/client/src/lib.rs:33: Server error: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
Error: Warmup(Generation("Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!"))
2024-01-19T08:15:45.052858Z ERROR text_generation_launcher: Webserver Crashed
2024-01-19T08:15:45.052873Z INFO text_generation_launcher: Shutting down shards
2024-01-19T08:15:45.395141Z INFO shard-manager: text_generation_launcher: Shard terminated rank=0
Error: WebserverFailed

RAG + LangChain to make it more powerful?

Adding this to an AI-powered coding assistant involves advanced capabilities, such as understanding a codebase and its unique style, as well as providing project-specific assistance.

Retrieval-Augmented Generation for coding could work by scanning and indexing a project's codebase so that the model can retrieve relevant snippets of code or documentation when generating code or explanations.

If we can do this, it will exceed any other AI assistant on the market :)
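
One way to prototype the retrieval side without any extra infrastructure is a plain lexical index over the workspace, prepending the top matches to the prompt sent to the model. A rough sketch, assuming scikit-learn is installed; the chunking (one file per document) and the prompt wording are illustrative only:

    # Illustrative retrieval-augmented prompt builder (assumes `pip install scikit-learn`).
    from pathlib import Path
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def build_prompt(question: str, repo_root: str = ".", top_k: int = 3) -> str:
        # Index every Python file in the project as one document (naive chunking).
        files = [f for f in Path(repo_root).rglob("*.py") if f.stat().st_size > 0]
        docs = [f.read_text(errors="ignore") for f in files]
        if not docs:
            return question
        vectorizer = TfidfVectorizer()
        matrix = vectorizer.fit_transform(docs + [question])
        # Rank project files by similarity to the question and keep the top_k snippets.
        scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
        best = scores.argsort()[::-1][:top_k]
        context = "\n\n".join(f"# {files[i]}\n{docs[i][:1000]}" for i in best)
        return f"Use the following project code as context:\n{context}\n\nQuestion: {question}"

    print(build_prompt("How does the HTTP client retry requests?"))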

Is deployment on dual-GPU machines supported?

sudo docker run --gpus 'all' --shm-size 1g -p 9090:80 -v /home/llh/model_hub/WisdomShell_CodeShell-7B-Chat:/data --env LOG_LEVEL="info,text_generation_router=debug" ghcr.nju.edu.cn/huggingface/text-generation-inference:1.0.3 --model-id /data --num-shard 2 --max-total-tokens 5000 --max-input-length 4096 --max-stop-sequences 12 --trust-remote-code

The machine has two RTX 6000 GPUs and CUDA 12.2; running the command fails with the following error:

2023-10-25T03:43:06.938048Z  INFO text_generation_launcher: Args { model_id: "/data", revision: None, validation_workers: 2, sharded: None, num_shard: Some(2), quantize: None, dtype: None, trust_remote_code: true, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 12, max_top_n_tokens: 5, max_input_length: 4096, max_total_tokens: 5000, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "02da084c587e", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2023-10-25T03:43:06.938115Z  WARN text_generation_launcher: `trust_remote_code` is set. Trusting that model `/data` do not contain malicious code.
2023-10-25T03:43:06.938126Z  INFO text_generation_launcher: Sharding model on 2 processes
2023-10-25T03:43:06.938328Z  INFO download: text_generation_launcher: Starting download process.
2023-10-25T03:43:09.670454Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.

2023-10-25T03:43:10.042577Z  INFO download: text_generation_launcher: Successfully downloaded weights.
2023-10-25T03:43:10.042982Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2023-10-25T03:43:10.043031Z  INFO shard-manager: text_generation_launcher: Starting shard rank=1
2023-10-25T03:43:12.861796Z  WARN text_generation_launcher: Unable to use Flash Attention V2: GPU with CUDA capability 7 5 is not supported for Flash Attention V2

2023-10-25T03:43:12.881244Z  WARN text_generation_launcher: Unable to use Flash Attention V2: GPU with CUDA capability 7 5 is not supported for Flash Attention V2

2023-10-25T03:43:12.933483Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 81, in serve
    server.serve(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 195, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 147, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 266, in get_model
    raise ValueError("sharded is not supported for AutoModel")
ValueError: sharded is not supported for AutoModel

2023-10-25T03:43:12.952449Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 81, in serve
    server.serve(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 195, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 147, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 266, in get_model
    raise ValueError("sharded is not supported for AutoModel")
ValueError: sharded is not supported for AutoModel

2023-10-25T03:43:13.348876Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Traceback (most recent call last):

  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 81, in serve
    server.serve(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 195, in serve
    asyncio.run(

  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 147, in serve_inner
    model = get_model(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 266, in get_model
    raise ValueError("sharded is not supported for AutoModel")

ValueError: sharded is not supported for AutoModel
 rank=0
2023-10-25T03:43:13.446787Z ERROR text_generation_launcher: Shard 0 failed to start
2023-10-25T03:43:13.446824Z  INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart
2023-10-25T03:43:13.448962Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Traceback (most recent call last):

  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 81, in serve
    server.serve(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 195, in serve
    asyncio.run(

  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 147, in serve_inner
    model = get_model(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 266, in get_model
    raise ValueError("sharded is not supported for AutoModel")

ValueError: sharded is not supported for AutoModel
 rank=1

Does it support DeepSeek Coder yet?

Hi, I was using llama.cpp with DeepSeek Coder connected to this extension; code completion does not work (some error about it not being a string), but chat with the server works.
Then I tried llama_cpp_for_codeshell; code completion seems to connect (no error in VSCode), but the result is garbled characters:

@@@@@@

The same happens in chat with the server.

System: M2 Pro

Thanks

Generated content contains garbled characters

企业微信截图_dbb94793-3363-4b0b-9589-a8548e51e63f

[Environment]: macOS; the model service is deployed and accessed directly from the Chrome browser [version 117.0.5938.88 (official build) (arm64)].

Error when loading a local model with TGI

I am using TGI to load the local CodeShell-7B-Chat model, but it fails during loading. The command I used is:

sudo docker run --gpus 'all' --shm-size 1g -p 9090:80 -v /home/CodeShell/WisdomShell:/data --env LOG_LEVEL="info,text_generation_router=debug" ghcr.nju.edu.cn/huggingface/text-generation-inference:1.0.3 --model-id /data/CodeShell-7B-Chat --num-shard 1 --max-total-tokens 5000 --max-input-length 4096 --max-stop-sequences 12 --trust-remote-code

The output and error messages are as follows:

2023-10-24T01:47:14.674168Z  INFO text_generation_launcher: Args { model_id: "/data/CodeShell-7B-Chat", revision: None, validation_workers: 2, sharded: None, num_shard: Some(1), quantize: None, dtype: None, trust_remote_code: true, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 12, max_top_n_tokens: 5, max_input_length: 4096, max_total_tokens: 5000, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "e2df4ceac2dc", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2023-10-24T01:47:14.674233Z  WARN text_generation_launcher: `trust_remote_code` is set. Trusting that model `/data/CodeShell-7B-Chat` do not contain malicious code.
2023-10-24T01:47:14.685067Z  INFO download: text_generation_launcher: Starting download process.
2023-10-24T01:47:21.825629Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.

2023-10-24T01:47:23.136555Z  INFO download: text_generation_launcher: Successfully downloaded weights.
2023-10-24T01:47:23.137089Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2023-10-24T01:47:30.969269Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:
 rank=0
2023-10-24T01:47:30.969335Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 4 rank=0
Error: ShardCannotStart
2023-10-24T01:47:31.066204Z ERROR text_generation_launcher: Shard 0 failed to start
2023-10-24T01:47:31.066262Z  INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart

My current environment is:

GPU: NVIDIA V100
OS: Ubuntu 20.04
Python version: 3.10
Docker version: 24.0.5

Build error compiling the model code on an M1 Max

OS: macOS 14
Build error:
I llama.cpp build info:
I UNAME_S: Darwin
I UNAME_P: arm
I UNAME_M: arm64
I CFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DACCELERATE_NEW_LAPACK -DACCELERATE_LAPACK_ILP64 -DGGML_USE_METAL -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wunreachable-code-break -Wunreachable-code-return -Wdouble-promotion -pthread
I CXXFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DACCELERATE_NEW_LAPACK -DACCELERATE_LAPACK_ILP64 -DGGML_USE_METAL -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wunreachable-code-break -Wunreachable-code-return -Wmissing-prototypes -Wextra-semi
I NVCCFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DACCELERATE_NEW_LAPACK -DACCELERATE_LAPACK_ILP64 -DGGML_USE_METAL -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -Wno-pedantic -Xcompiler "-Wunreachable-code-break -Wunreachable-code-return -Wmissing-prototypes -Wextra-semi "
I LDFLAGS: -framework Accelerate -framework Foundation -framework Metal -framework MetalKit
I CC: Apple clang version 15.0.0 (clang-1500.0.40.1)
I CXX: Apple clang version 15.0.0 (clang-1500.0.40.1)

cc -I. -Icommon -D_XOPEN_SOURCE=600 -D_DARWIN_C_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_ACCELERATE -DACCELERATE_NEW_LAPACK -DACCELERATE_LAPACK_ILP64 -DGGML_USE_METAL -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wunreachable-code-break -Wunreachable-code-return -Wdouble-promotion -pthread -c ggml.c -o ggml.o
ggml.c:543:5: error: call to undeclared function 'clock_gettime'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
clock_gettime(CLOCK_MONOTONIC, &ts);
^
ggml.c:543:5: note: did you mean 'clock_set_time'?
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/mach/clock_priv.h:79:15: note: 'clock_set_time' declared here
kern_return_t clock_set_time
^
ggml.c:543:19: error: use of undeclared identifier 'CLOCK_MONOTONIC'
clock_gettime(CLOCK_MONOTONIC, &ts);
^
ggml.c:549:5: error: call to undeclared function 'clock_gettime'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
clock_gettime(CLOCK_MONOTONIC, &ts);
^
ggml.c:549:19: error: use of undeclared identifier 'CLOCK_MONOTONIC'
clock_gettime(CLOCK_MONOTONIC, &ts);
^
ggml.c:555:12: error: call to undeclared function 'clock'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
return clock();
^
ggml.c:559:12: error: use of undeclared identifier 'CLOCKS_PER_SEC'
return CLOCKS_PER_SEC/1000;
^
ggml.c:896:5: error: call to undeclared function 'assert'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
assert(k % qk == 0);
^
ggml.c:937:5: error: call to undeclared function 'assert'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
assert(k % qk == 0);
^
ggml.c:978:5: error: call to undeclared function 'assert'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
assert(k % qk == 0);
^
ggml.c:1026:5: error: call to undeclared function 'assert'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
assert(k % qk == 0);
^
ggml.c:1073:5: error: call to undeclared function 'assert'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
assert(k % QK8_0 == 0);
^
ggml.c:1098:5: error: call to undeclared function 'assert'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
assert(QK8_0 == 32);
^
ggml.c:1286:5: error: call to undeclared function 'assert'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
assert(QK8_1 == 32);
^
ggml.c:1321:5: error: call to undeclared function 'assert'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
assert(k % QK8_1 == 0);
^
ggml.c:1540:5: error: call to undeclared function 'assert'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
assert(k % qk == 0);
^
ggml.c:1560:5: error: call to undeclared function 'assert'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
assert(k % qk == 0);
^
ggml.c:1581:5: error: call to undeclared function 'assert'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
assert(k % qk == 0);
^
ggml.c:1607:5: error: call to undeclared function 'assert'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
assert(k % qk == 0);
^
ggml.c:1634:5: error: call to undeclared function 'assert'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
assert(k % qk == 0);
^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.
make: *** [ggml.o] Error 1

Illegal instruction when running CodeShell in Termux on Android

Hi,
I have built the vsix and the server successfully in Termux running on an Android phone,
but an illegal instruction occurred and the server stopped.
Do you have any clue how to keep the server running?

For example, which source code should I try to debug or modify?

./server -m codeshell-chat-q4_0.gguf --host 127.0.0.1 --port 8081
..............................................................................................
llama_new_context_with_model: n_ctx = 8192
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size = 1344.00 MB
llama_new_context_with_model: compute buffer total size = 558.13 MB
Illegal instruction
~/llama_cpp_for_codeshell $ free -m -h

Does CodeShell not support sharded mode?

The model is hosted via TGI and launched with:
BNB_CUDA_VERSION=122 CUDA_VISIBLE_DEVICES=0,1 text-generation-launcher --model-id /data/llms/codeshell-7b-chat --tokenizer-config-path /data/llms/codeshell-7b-chat/tokenizer_config.json --sharded true --trust-remote-code --port=8080

With CUDA_VISIBLE_DEVICES=0,1 and --sharded true set, it fails with:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

General usage questions and GPU mode not working

I followed the tutorial and ran the clone and build steps in order.

Case 1: Normal

After running make, the server runs normally.

The extension side has been confirmed to connect to the server normally, but inside the VS Code extension I see the following:

  1. The CodeShell service connection status shown in the bottom-left corner of VS Code always indicates failure, even though the service actually works.
  2. The chat panel often returns nothing, even after adjusting the thresholds.
  3. Auto-completion never seems to take effect, even after adjusting Auto Completion Delay. Or does auto-completion work differently from what I expect? Isn't it supposed to show gray code after the current edit position while typing?

Given the issues above, I suspected it might be related to speed, so I switched to running on the GPU, but the follow-up problems were even worse.

Case 2: GPU

Running make LLAMA_CUBLAS=1

returns:

I llama.cpp build info:
I UNAME_S:   Linux
I UNAME_P:   x86_64
I UNAME_M:   x86_64
I CFLAGS:    -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -pthread -march=native -mtune=native
I CXXFLAGS:  -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread  -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -mtune=native
I NVCCFLAGS: --forward-unknown-to-host-compiler -use_fast_math -arch=native -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread    -Wno-pedantic -Xcompiler "-Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -mtune=native "
I LDFLAGS:   -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
I CC:        cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
I CXX:       g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread  -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -mtune=native  examples/main/main.cpp ggml.o llama.o common.o sampling.o console.o grammar-parser.o k_quants.o ggml-cuda.o ggml-alloc.o ggml-backend.o -o main -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib

====  Run ./main -h for help.  ====

g++ -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread  -Wno-array-bounds -Wno-format-truncation -Wextra-semi -march=native -mtune=native  -Iexamples/server examples/server/server.cpp ggml.o llama.o common.o sampling.o grammar-parser.o k_quants.o ggml-cuda.o ggml-alloc.o ggml-backend.o -o server -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib

Judging from the log, the build succeeded.

I then ran ./main -m ../models/CodeShell-7B-Chat-int4/codeshell-chat-q4_0.gguf --color -i -r "User:" and ./server -m ../models/CodeShell-7B-Chat-int4/codeshell-chat-q4_0.gguf --host 0.0.0.0 --port 8008 -mg 0

Both fail:

zsh: segmentation fault (core dumped)  ./main -m ../models/CodeShell-7B-Chat-int4/codeshell-chat-q4_0.gguf -n 256  -

{"timestamp":1697764284,"level":"INFO","function":"main","line":1356,"message":"build info","build":1385,"commit":"7382f26"}
{"timestamp":1697764284,"level":"INFO","function":"main","line":1358,"message":"system info","n_threads":16,"n_threads_batch":-1,"total_threads":32,"system_info":"AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | "}
zsh: segmentation fault (core dumped)  ./server -m ../models/CodeShell-7B-Chat-int4/codeshell-chat-q4_0.gguf --host

That is where things stand.

I hope the development team can answer these questions: 1) Can the q4 model run correctly on the GPU? 2) With the CPU-built server, everything except chat (which barely works) fails in the extension. Is that a problem on my end, or does it need further work?

Encountered issue when trying to build server on Windows

After running make server, an error occurs:

process_begin: CreateProcess(NULL, uname -s, ...) failed.
process_begin: CreateProcess(NULL, uname -p, ...) failed.
process_begin: CreateProcess(NULL, uname -m, ...) failed.
process_begin: CreateProcess(NULL, cc --version, ...) failed.
'cc' 不是内部或外部命令,也不是可运行的程序
或批处理文件。
process_begin: CreateProcess(NULL, expr >= 070100, ...) failed.
process_begin: CreateProcess(NULL, expr >= 080100, ...) failed.
process_begin: CreateProcess(NULL, cc -dumpmachine, ...) failed.
I llama.cpp build info:
I UNAME_S:
I UNAME_P:
I UNAME_M:
I CFLAGS:    -I. -Icommon -D_XOPEN_SOURCE=600 -DNDEBUG -DGGML_USE_K_QUANTS  -std=c11   -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -Wdouble-promotion -march=native -mtune=native
I CXXFLAGS:  -I. -Icommon -D_XOPEN_SOURCE=600 -DNDEBUG -DGGML_USE_K_QUANTS  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn  -Wno-array-bounds -march=native -mtune=native
I NVCCFLAGS:  -I. -Icommon -D_XOPEN_SOURCE=600 -DNDEBUG -DGGML_USE_K_QUANTS  -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn    -Wno-pedantic -Xcompiler "-Wno-array-bounds -march=native -mtune=native "
I LDFLAGS:
'cc' 不是内部或外部命令,也不是可运行的程序
或批处理文件。
I CC:
'head' 不是内部或外部命令,也不是可运行的程序
或批处理文件。
I CXX:

'sh' 不是内部或外部命令,也不是可运行的程序
或批处理文件。
make: *** [build-info.h] 错误 1

I'm not familiar with make. It seems that this Makefile does not support Windows yet?

Error starting TGI

Environment: Ubuntu 18.04
Memory: 64 GB

The command executed was:

docker run --gpus 'all' --shm-size 1g -p 8080:80 -v /opt/models:/data \
        --env LOG_LEVEL="info,text_generation_router=debug" \
        ghcr.nju.edu.cn/huggingface/text-generation-inference:1.0.3 \
        --model-id /data/CodeShell-7B-Chat-int4 --num-shard 1 \
        --max-total-tokens 5000 --max-input-length 4096 \
        --max-stop-sequences 12

Log:
2023-10-31T03:02:33.425212Z INFO text_generation_launcher: Args { model_id: "/data/CodeShell-7B-Chat-int4", revision: None, validation_workers: 2, sharded: None, num_shard: Some(1), quantize: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 12, max_top_n_tokens: 5, max_input_length: 4096, max_total_tokens: 5000, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, hostname: "f9c8519bf276", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false }
2023-10-31T03:02:33.425555Z INFO download: text_generation_launcher: Starting download process.
2023-10-31T03:02:54.427892Z WARN text_generation_launcher: No safetensors weights found for model /data/CodeShell-7B-Chat-int4 at revision None. Converting PyTorch weights to safetensors.

Error: DownloadError
2023-10-31T03:03:25.740295Z ERROR download: text_generation_launcher: Download encountered an error: Traceback (most recent call last):

File "/opt/conda/bin/text-generation-server", line 8, in
sys.exit(app())

File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 195, in download_weights
utils.convert_files(local_pt_files, local_st_files, discard_names)

File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 106, in convert_files
convert_file(pt_file, sf_file, discard_names)

File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 68, in convert_file
to_removes = _remove_duplicate_names(loaded, discard_names=discard_names)

File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 25, in _remove_duplicate_names
shareds = _find_shared_tensors(state_dict)

File "/opt/conda/lib/python3.9/site-packages/safetensors/torch.py", line 72, in _find_shared_tensors
if v.device != torch.device("meta") and storage_ptr(v) != 0 and storage_size(v) != 0:

AttributeError: 'list' object has no attribute 'device'

Downloading the model

The model download keeps timing out with a network connection error. What can I do?

Consider using a template for prompting the LLM?

When using the extension, the questions sent to the LLM are basically fixed, for example:
Explain the code
Optimize the code

The current prompts are too simple. Could we consider using an instructive prompt with an example? It does not make much of a difference for the 7B model,
but it might work very well for a larger model.
Also, if the same template were used during fine-tuning, it could greatly improve the LLM's consistency. A sketch of such a template follows below.
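
A minimal sketch of what a shared instructive template could look like; the wording, placeholders, and one-shot example are assumptions for illustration, not what the extension currently sends:

    # Hypothetical instructive prompt template with a one-shot example.
    EXPLAIN_TEMPLATE = (
        "You are a senior {language} developer. Explain code clearly.\n\n"
        "Example:\nCode:\ndef add(a, b):\n    return a + b\n"
        "Explanation:\nDefines add(), which returns the sum of its two arguments.\n\n"
        "Code:\n{code}\nExplanation:"
    )

    def build_explain_prompt(code: str, language: str = "Python") -> str:
        # Reusing the same template when building fine-tuning data keeps
        # inference and fine-tuning formats consistent.
        return EXPLAIN_TEMPLATE.format(language=language, code=code)

    print(build_explain_prompt("for i in range(3):\n    print(i)"))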

Error when deploying via text-generation-inference

The command is as follows:

docker run --gpus 'all' --shm-size 1g -p 9090:80 -v $HOME/codeshell/CodeShell-7B-Chat:/data \
        --env LOG_LEVEL="info,text_generation_router=debug" \
        ghcr.nju.edu.cn/huggingface/text-generation-inference:1.0.3 \
        --model-id WisdomShell/CodeShell-7B-Chat-int4 --num-shard 1 \
        --max-total-tokens 5000 --max-input-length 4096 \
        --max-stop-sequences 12 --trust-remote-code 

2023-10-24T07:02:11.814270Z INFO download: text_generation_launcher: Starting download process.
Error: DownloadError
2023-10-24T07:02:16.019924Z ERROR download: text_generation_launcher: Download encountered an error: Traceback (most recent call last):

File "/opt/conda/lib/python3.9/site-packages/urllib3/connection.py", line 203, in _new_conn
sock = connection.create_connection(

File "/opt/conda/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
raise err

File "/opt/conda/lib/python3.9/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)

TimeoutError: [Errno 110] Connection timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

File "/opt/conda/lib/python3.9/site-packages/urllib3/connectionpool.py", line 790, in urlopen
response = self._make_request(

File "/opt/conda/lib/python3.9/site-packages/urllib3/connectionpool.py", line 491, in _make_request
raise new_e

File "/opt/conda/lib/python3.9/site-packages/urllib3/connectionpool.py", line 467, in _make_request
self._validate_conn(conn)

File "/opt/conda/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1092, in _validate_conn
conn.connect()

File "/opt/conda/lib/python3.9/site-packages/urllib3/connection.py", line 611, in connect
self.sock = sock = self._new_conn()

File "/opt/conda/lib/python3.9/site-packages/urllib3/connection.py", line 218, in _new_conn
raise NewConnectionError(

urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f1b325b6fd0>: Failed to establish a new connection: [Errno 110] Connection timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

File "/opt/conda/lib/python3.9/site-packages/requests/adapters.py", line 486, in send
resp = conn.urlopen(

File "/opt/conda/lib/python3.9/site-packages/urllib3/connectionpool.py", line 844, in urlopen
retries = retries.increment(

File "/opt/conda/lib/python3.9/site-packages/urllib3/util/retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]

urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/CodeShell-7B-Chat (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f1b325b6fd0>: Failed to establish a new connection: [Errno 110] Connection timed out'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "/opt/conda/bin/text-generation-server", line 8, in
sys.exit(app())

File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 113, in download_weights
utils.weight_files(model_id, revision, extension)

File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 96, in weight_files
filenames = weight_hub_files(model_id, revision, extension)

File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/hub.py", line 25, in weight_hub_files
info = api.model_info(model_id, revision=revision)

File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)

File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/hf_api.py", line 1677, in model_info
r = get_session().get(path, headers=headers, timeout=timeout, params=params)

File "/opt/conda/lib/python3.9/site-packages/requests/sessions.py", line 602, in get
return self.request("GET", url, **kwargs)

File "/opt/conda/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)

File "/opt/conda/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)

File "/opt/conda/lib/python3.9/site-packages/huggingface_hub/utils/_http.py", line 63, in send
return super().send(request, *args, **kwargs)

File "/opt/conda/lib/python3.9/site-packages/requests/adapters.py", line 519, in send
raise ConnectionError(e, request=request)

requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/CodeShell-7B-Chat (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f1b325b6fd0>: Failed to establish a new connection: [Errno 110] Connection timed out'))"), '(Request ID: a7ec4c93-df44-453f-95e5-7e027bc442b9)')

The network environment in mainland China cannot connect to Hugging Face.

Consider supporting multiple languages?

  1. Substitute hardcoded Chinese text with a language definition JSON file.
  2. In the CODESELL extension settings, incorporate a language choice for the user (English, 中文).
  3. Return the answer from the LLM in the specified language.
    I would like to contribute to this task.

The extension panel stays empty with a loading spinner after installing in VS Code

As the title says; screenshots below. The model service is already running.
Symptom: after clicking the extension in VS Code, the panel on the left stays empty and keeps showing a loading spinner.

企业微信截图_e225969a-d865-4bf6-be72-4fc0de542717

The service is accessible when opened in the local Chrome browser.
企业微信截图_6c6d800d-4a73-4928-a289-aadce77272fa
