Comments (12)
But having ggml can also attract more contributors. Also, I'm curious whether InferLLM supports CUDA; I don't see that mentioned on the page.
from lmdeploy.
Indeed, llama.cpp/ggml is famous (about 30k stars), but the code is really hard to read.
InferLLM may be a better choice.
Wow, what is it? Can you elaborate a bit more? As I understand it, you just need to implement ggml itself inside a Triton backend, and later on you can reuse that backend and get the benefit of ggml updates after every iteration.
We would not be reusing FasterTransformer, I guess; it's more about how ggml itself can be integrated with the Triton backend.
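As a rough illustration of the integration idea discussed above, here is a minimal Python sketch of how a ggml-style engine could sit behind a backend-agnostic interface that a serving layer (such as a Triton Python backend) calls into. All names here (`InferenceBackend`, `GgmlEngine`, `serve`) are hypothetical and do not correspond to real lmdeploy, ggml, or Triton APIs; a real implementation would bind to the ggml/llama.cpp C API.

```python
from abc import ABC, abstractmethod


class InferenceBackend(ABC):
    """Hypothetical common interface the serving layer depends on,
    so the underlying engine (FasterTransformer, ggml, ...) is swappable."""

    @abstractmethod
    def generate(self, prompt: str, max_tokens: int) -> str: ...


class GgmlEngine(InferenceBackend):
    """Stand-in for a ggml/llama.cpp binding; real code would load the
    weights and call into the C API instead of echoing the prompt."""

    def __init__(self, model_path: str):
        self.model_path = model_path  # a real engine would mmap the weights

    def generate(self, prompt: str, max_tokens: int) -> str:
        # Placeholder: a real engine returns sampled tokens.
        return f"[{self.model_path}] " + prompt[:max_tokens]


def serve(backend: InferenceBackend, prompt: str) -> str:
    # The serving layer only sees the interface; swapping engines
    # means constructing a different backend object, nothing more.
    return backend.generate(prompt, max_tokens=16)


print(serve(GgmlEngine("llama-7b.gguf"), "hello ggml"))
```

The point of the sketch is the seam: if ggml updates land behind a stable interface like this, the Triton-facing code does not need to change with each ggml iteration.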
> We would not be reusing FasterTransformer, I guess; it's more about how ggml itself can be integrated with the Triton backend.

Here is my opinion: when you integrate an inference backend, you have to ensure service quality. Suppose that, after integration, ggml has a bug; lmdeploy then has the responsibility to locate or fix it. The greatest cost of software is maintenance, so we need to consider the code complexity of ggml.
About InferLLM https://github.com/MegEngine/InferLLM
I did see an enable-GPU option in the CMake file, though.
> But having ggml can also attract more contributors. Also, I'm curious whether InferLLM supports CUDA; I don't see that mentioned on the page.

Yes, attracting more contributors is a good reason to integrate ggml.
> But having ggml can also attract more contributors. Also, I'm curious whether InferLLM supports CUDA; I don't see that mentioned on the page.

The InferLLM CUDA part is WIP: MegEngine/InferLLM#27
InferLLM will also have language barriers for non-Chinese-speaking contributors, which is a very large audience; for most open-source implementations, progress depends on how many contributors you have.
> InferLLM will also have language barriers for non-Chinese-speaking contributors, which is a very large audience; for most open-source implementations, progress depends on how many contributors you have.

Already passed this on to the InferLLM team .. QvQ
Closing since there has been no activity for over two weeks. Feel free to reopen it if it is still an issue.