
llm.nvim's Introduction

LLM powered development for Neovim

llm.nvim is a plugin for all things LLM. It uses llm-ls as a backend.

This project is influenced by copilot.vim and tabnine-nvim

Formerly hfcc.nvim.

demonstration use of llm.nvim

Note

When using the Inference API, you will probably encounter some limitations. Subscribe to the PRO plan to avoid getting rate limited in the free tier.

https://huggingface.co/pricing#pro

Features

Code completion

This plugin supports "ghost-text" code completion, à la Copilot.

Choose your model

Requests for code generation are made via an HTTP request.

You can use the Hugging Face Inference API or your own HTTP endpoint, provided it adheres to the APIs listed in backend.

Always fit within the context window

The prompt sent to the model will always be sized to fit within the context window, with the number of tokens determined using tokenizers.

Configuration

Backend

llm.nvim can interface with multiple backends hosting models.

You can override the URL of the backend with the LLM_NVIM_URL environment variable. If url is nil, it will default to the Inference API's default URL.

When api_token is set, it will be passed as a header: Authorization: Bearer <api_token>.

llm-ls will try to add the correct path to the url to get completions if it does not already end with said path. You can disable this behavior by setting disable_url_path_completion to true.
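For instance, a minimal sketch of these backend-level settings (the endpoint shown is an arbitrary placeholder):

require('llm').setup({
  backend = "huggingface",
  url = "https://my-endpoint.example.com", -- or nil to use the Inference API; can also be overridden with LLM_NVIM_URL
  api_token = nil, -- when set, sent as "Authorization: Bearer <api_token>"
  disable_url_path_completion = false, -- set to true to stop llm-ls from appending the completion path to the url
})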

Inference API

backend = "huggingface"

API

  1. Create and get your API token from here https://huggingface.co/settings/tokens.

  2. Define how the plugin will read your token. For this you have multiple options, in order of precedence:

    1. Pass api_token = <your token> in plugin opts - this is not recommended if you use a versioning tool for your configuration files
    2. Set the LLM_NVIM_HF_API_TOKEN environment variable
    3. You can define your HF_HOME environment variable and create a file containing your token at $HF_HOME/token
    4. Install the huggingface-cli and run huggingface-cli login - this will prompt you to enter your token and set it at the right path
  3. Choose your model on the Hugging Face Hub, and, in order of precedence, you can either:

    1. Set the LLM_NVIM_MODEL environment variable
    2. Pass model = <model identifier> in plugin opts

Note: the model's value will be appended to the url like so : {url}/model/{model} as this is how we route requests to the right model.
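As an illustration, a minimal Inference API setup that relies on environment variables instead of hard-coding the token:

-- the token is read from LLM_NVIM_HF_API_TOKEN or $HF_HOME/token, cf. the list above
require('llm').setup({
  backend = "huggingface",
  model = "bigcode/starcoder2-15b", -- requests are routed to {url}/model/{model}
})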

backend = "ollama"

API

Refer to Ollama's documentation on how to run ollama. Here is an example configuration:

{
  model = "codellama:7b",
  url = "http://localhost:11434", -- llm-ls uses "/api/generate"
  -- cf https://github.com/ollama/ollama/blob/main/docs/api.md#parameters
  request_body = {
    -- Modelfile options for the model you use
    options = {
      temperature = 0.2,
      top_p = 0.95,
    }
  }
}

Note: model's value will be added to the request body.

OpenAI

backend = "openai"

Refer to the documentation of your OpenAI-compatible API server (for example llama-cpp-python's web server) on how to run it. Here is an example configuration:

{
  model = "codellama",
  url = "http://localhost:8000", -- llm-ls uses "/v1/completions"
  -- cf https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#openai-compatible-web-server
  request_body = {
    temperature = 0.2,
    top_p = 0.95,
  }
}

Note: model's value will be added to the request body.

backend = "tgi"

API

Refer to TGI's documentation on how to run TGI. Here is an example configuration:

{
  model = "bigcode/starcoder",
  url = "http://localhost:8080", -- llm-ls uses "/generate"
  -- cf https://huggingface.github.io/text-generation-inference/#/Text%20Generation%20Inference/generate
  request_body = {
    parameters = {
      temperature = 0.2,
      top_p = 0.95,
    }
  }
}

Models

Starcoder

{
  tokens_to_clear = { "<|endoftext|>" },
  fim = {
    enabled = true,
    prefix = "<fim_prefix>",
    middle = "<fim_middle>",
    suffix = "<fim_suffix>",
  },
  model = "bigcode/starcoder",
  context_window = 8192,
  tokenizer = {
    repository = "bigcode/starcoder",
  }
}

Note

These are the default config values

CodeLlama

{
  tokens_to_clear = { "<EOT>" },
  fim = {
    enabled = true,
    prefix = "<PRE> ",
    middle = " <MID>",
    suffix = " <SUF>",
  },
  model = "codellama/CodeLlama-13b-hf",
  context_window = 4096,
  tokenizer = {
    repository = "codellama/CodeLlama-13b-hf",
  }
}

Note

Spaces are important here

llm-ls

By default, llm-ls is installed by llm.nvim the first time it is loaded. The binary is downloaded from the release page and stored in:

vim.api.nvim_call_function("stdpath", { "data" }) .. "/llm_nvim/bin"

When developing locally, when using Mason, or if you built your own binary because your platform is not supported, you can set the lsp.bin_path setting to the path of the binary. You can also start llm-ls over TCP using the --port [PORT] option, which is useful when using a debugger.

lsp.version is used only when llm.nvim downloads llm-ls from the release page.

lsp.cmd_env can be used to set environment variables for the llm-ls process.
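As an illustration, a sketch of the lsp table combining these options (the binary path is a placeholder):

require('llm').setup({
  lsp = {
    bin_path = "/path/to/llm-ls", -- e.g. a locally built binary or one installed by mason
    host = nil,
    port = nil, -- set host/port when llm-ls was started separately with `llm-ls --port <PORT>`
    cmd_env = { LLM_LOG_LEVEL = "DEBUG" }, -- environment variables for the llm-ls process
    version = "0.5.3", -- only used when llm.nvim downloads llm-ls from the release page
  },
})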

Mason

You can install llm-ls via mason.nvim. To do so, run the following command:

:MasonInstall llm-ls

Then reference llm-ls's path in your configuration:

{
  -- ...
  lsp = {
    bin_path = vim.api.nvim_call_function("stdpath", { "data" }) .. "/mason/bin/llm-ls",
  },
  -- ...
}

Tokenizer

llm-ls uses tokenizers to make sure the prompt fits the context_window.

To configure it, you have a few options:

  • No tokenization, llm-ls will count the number of characters instead:
{
  tokenizer = nil,
}
  • from a local file on your disk:
{
  tokenizer = {
    path = "/path/to/my/tokenizer.json"
  }
}
  • from a Hugging Face repository, llm-ls will attempt to download tokenizer.json at the root of the repository:
{
  tokenizer = {
    repository = "myusername/myrepo",
    api_token = nil -- optional, in case the API token used for the backend is not the same
  }
}
  • from an HTTP endpoint, llm-ls will attempt to download a file via an HTTP GET request:
{
  tokenizer = {
    url = "https://my-endpoint.example.com/mytokenizer.json",
    to = "/download/path/of/mytokenizer.json"
  }
}

Suggestion behavior

You can tune the way the suggestions behave:

  • enable_suggestions_on_startup lets you choose to enable or disable "suggest-as-you-type" suggestions on neovim startup. You can then toggle auto suggest with LLMToggleAutoSuggest (see Commands)
  • enable_suggestions_on_files lets you enable suggestions only on specific files that match the patterns you provide (see the sketch after this list). It can either be a string or a list of strings, for example:
    • to match on all types of buffers: enable_suggestions_on_files: "*"
    • to match on all files in my_project/: enable_suggestions_on_files: "/path/to/my_project/*"
    • to match on all python and rust files: enable_suggestions_on_files: { "*.py", "*.rs" }
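For example, a minimal sketch of these two options inside setup() (the values shown are placeholders):

require('llm').setup({
  enable_suggestions_on_startup = false, -- start with auto suggest off, toggle it later with :LLMToggleAutoSuggest
  enable_suggestions_on_files = { "*.py", "*.rs" }, -- or "*" for all buffers, or "/path/to/my_project/*"
})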

Commands

llm.nvim provides the following commands:

  • LLMToggleAutoSuggest enables/disables automatic "suggest-as-you-type" suggestions
  • LLMSuggestion is used to manually request a suggestion
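If you prefer key bindings for these commands, here is a minimal sketch (the key choices below are arbitrary):

vim.keymap.set("n", "<leader>lt", "<cmd>LLMToggleAutoSuggest<cr>", { desc = "LLM: toggle auto suggest" })
vim.keymap.set("n", "<leader>ls", "<cmd>LLMSuggestion<cr>", { desc = "LLM: request a suggestion" })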

Package manager

Using packer

require("packer").startup(function(use)
  use {
    'huggingface/llm.nvim',
    config = function()
      require('llm').setup({
        -- cf Setup
      })
    end
  }
end)

Using lazy.nvim

require("lazy").setup({
  {
    'huggingface/llm.nvim',
    opts = {
      -- cf Setup
    }
  },
})

Using vim-plug

Plug 'huggingface/llm.nvim'
require('llm').setup({
  -- cf Setup
})

Setup

local llm = require('llm')

llm.setup({
  api_token = nil, -- cf Install paragraph
  model = "bigcode/starcoder2-15b", -- the model ID, behavior depends on backend
  backend = "huggingface", -- backend ID, "huggingface" | "ollama" | "openai" | "tgi"
  url = nil, -- the http url of the backend
  tokens_to_clear = { "<|endoftext|>" }, -- tokens to remove from the model's output
  -- parameters added to the request body; values are arbitrary, you can set any field:value pair here and it will be passed as is to the backend
  request_body = {
    parameters = {
      max_new_tokens = 60,
      temperature = 0.2,
      top_p = 0.95,
    },
  },
  -- set this if the model supports fill in the middle
  fim = {
    enabled = true,
    prefix = "<fim_prefix>",
    middle = "<fim_middle>",
    suffix = "<fim_suffix>",
  },
  debounce_ms = 150,
  accept_keymap = "<Tab>",
  dismiss_keymap = "<S-Tab>",
  tls_skip_verify_insecure = false,
  -- llm-ls configuration, cf llm-ls section
  lsp = {
    bin_path = nil,
    host = nil,
    port = nil,
    cmd_env = nil, -- or { LLM_LOG_LEVEL = "DEBUG" } to set the log level of llm-ls
    version = "0.5.3",
  },
  tokenizer = nil, -- cf Tokenizer paragraph
  context_window = 1024, -- max number of tokens for the context window
  enable_suggestions_on_startup = true,
  enable_suggestions_on_files = "*", -- pattern matching syntax to enable suggestions on specific files, either a string or a list of strings
  disable_url_path_completion = false, -- cf Backend
})

llm.nvim's People

Contributors

alejandrosuero, blmarket, cifvts, cxwx, davido264, hampushauffman, klei22, mcpatate, nenkoru, robzz, wilfriedroset


llm.nvim's Issues

Extremely slow on Completion

Completion is extremely slow on my Mac with an M2 chip, even though I only use a very light starcoder2 3b model.

I can see very high GPU or CPU usage and idle wake-ups. Many times the ollama process becomes so heavy that a normal ollama quit does not work and it needs to be force-ended.


I don't quite understand what happened. Here is the ollama log; I cannot make sense of it, but it may be helpful.

{"function":"initialize","level":"INFO","line":444,"msg":"initializing slots","n_slots":1,"tid":"0x17074f000","timestamp":1712133604}
{"function":"initialize","level":"INFO","line":456,"msg":"new slot","n_ctx_slot":2048,"slot_id":0,"tid":"0x17074f000","timestamp":1712133604}
{"function":"update_slots","level":"INFO","line":1572,"msg":"all slots are idle and system prompt is empty, clear the KV cache","tid":"0x176287000","timestamp":1712133604}
time=2024-04-03T17:40:04.803+09:00 level=INFO source=dyn_ext_server.go:159 msg="Starting llama main loop"
{"function":"launch_slot_with_data","level":"INFO","line":829,"msg":"slot is processing task","slot_id":0,"task_id":0,"tid":"0x176287000","timestamp":1712133604}
{"function":"update_slots","ga_i":0,"level":"INFO","line":1810,"msg":"slot progression","n_past":0,"n_past_se":0,"n_prompt_tokens_processed":438,"slot_id":0,"task_id":0,"tid":"0x176287000","timestamp":1712133604}
{"function":"update_slots","level":"INFO","line":1834,"msg":"kv cache rm [p0, end)","p0":0,"slot_id":0,"task_id":0,"tid":"0x176287000","timestamp":1712133604}
[GIN] 2024/04/03 - 17:40:06 | 200 |  1.903387625s |       127.0.0.1 | POST     "/api/generate"
[GIN] 2024/04/03 - 17:40:06 | 200 |  1.244827417s |       127.0.0.1 | POST     "/api/generate"
[GIN] 2024/04/03 - 17:40:06 | 200 |  1.041619375s |       127.0.0.1 | POST     "/api/generate"
[GIN] 2024/04/03 - 17:40:06 | 200 |  838.446166ms |       127.0.0.1 | POST     "/api/generate"
[GIN] 2024/04/03 - 17:40:06 | 200 |   412.63925ms |       127.0.0.1 | POST     "/api/generate"
{"function":"update_slots","level":"INFO","line":1642,"msg":"slot released","n_cache_tokens":440,"n_ctx":2048,"n_past":439,"n_system_tokens":0,"slot_id":0,"task_id":0,"tid":"0x176287000","timestamp":1712133606,"truncated":false}
{"function":"launch_slot_with_data","level":"INFO","line":829,"msg":"slot is processing task","slot_id":0,"task_id":4,"tid":"0x176287000","timestamp":1712133606}
{"function":"update_slots","ga_i":0,"level":"INFO","line":1810,"msg":"slot progression","n_past":32,"n_past_se":0,"n_prompt_tokens_processed":405,"slot_id":0,"task_id":4,"tid":"0x176287000","timestamp":1712133606}
{"function":"update_slots","level":"INFO","line":1834,"msg":"kv cache rm [p0, end)","p0":32,"slot_id":0,"task_id":4,"tid":"0x176287000","timestamp":1712133606}
{"function":"print_timings","level":"INFO","line":272,"msg":"prompt eval time     =    1347.64 ms /   405 tokens (    3.33 ms per token,   300.53 tokens per second)","n_prompt_tokens_processed":405,"n_tokens_second":300.52513985549564,"slot_id":0,"t_prompt_processing":1347.641,"t_token":3.327508641975309,"task_id":4,"tid":"0x176287000","timestamp":1712133607}
{"function":"print_timings","level":"INFO","line":286,"msg":"generation eval time =     198.78 ms /     7 runs   (   28.40 ms per token,    35.21 tokens per second)","n_decoded":7,"n_tokens_second":35.21392459189576,"slot_id":0,"t_token":28.39785714285714,"t_token_generation":198.785,"task_id":4,"tid":"0x176287000","timestamp":1712133607}
{"function":"print_timings","level":"INFO","line":295,"msg":"          total time =    1546.43 ms","slot_id":0,"t_prompt_processing":1347.641,"t_token_generation":198.785,"t_total":1546.4260000000002,"task_id":4,"tid":"0x176287000","timestamp":1712133607}
{"function":"update_slots","level":"INFO","line":1642,"msg":"slot released","n_cache_tokens":444,"n_ctx":2048,"n_past":443,"n_system_tokens":0,"slot_id":0,"task_id":4,"tid":"0x176287000","timestamp":1712133607,"truncated":false}
{"function":"launch_slot_with_data","level":"INFO","line":829,"msg":"slot is processing task","slot_id":0,"task_id":6,"tid":"0x176287000","timestamp":1712133607}
{"function":"update_slots","ga_i":0,"level":"INFO","line":1810,"msg":"slot progression","n_past":31,"n_past_se":0,"n_prompt_tokens_processed":406,"slot_id":0,"task_id":6,"tid":"0x176287000","timestamp":1712133607}
{"function":"update_slots","level":"INFO","line":1834,"msg":"kv cache rm [p0, end)","p0":31,"slot_id":0,"task_id":6,"tid":"0x176287000","timestamp":1712133607}
{"function":"print_timings","level":"INFO","line":272,"msg":"prompt eval time     =    1331.61 ms /   406 tokens (    3.28 ms per token,   304.89 tokens per second)","n_prompt_tokens_processed":406,"n_tokens_second":304.89430455937145,"slot_id":0,"t_prompt_processing":1331.609,"t_token":3.279825123152709,"task_id":6,"tid":"0x176287000","timestamp":1712133609}
{"function":"print_timings","level":"INFO","line":286,"msg":"generation eval time =     162.88 ms /     6 runs   (   27.15 ms per token,    36.84 tokens per second)","n_decoded":6,"n_tokens_second":36.83670900841719,"slot_id":0,"t_token":27.146833333333333,"t_token_generation":162.881,"task_id":6,"tid":"0x176287000","timestamp":1712133609}
{"function":"print_timings","level":"INFO","line":295,"msg":"          total time =    1494.49 ms","slot_id":0,"t_prompt_processing":1331.609,"t_token_generation":162.881,"t_total":1494.49,"task_id":6,"tid":"0x176287000","timestamp":1712133609}
{"function":"update_slots","level":"INFO","line":1642,"msg":"slot released","n_cache_tokens":443,"n_ctx":2048,"n_past":442,"n_system_tokens":0,"slot_id":0,"task_id":6,"tid":"0x176287000","timestamp":1712133609,"truncated":false}
{"function":"launch_slot_with_data","level":"INFO","line":829,"msg":"slot is processing task","slot_id":0,"task_id":8,"tid":"0x176287000","timestamp":1712133609}
{"function":"update_slots","ga_i":0,"level":"INFO","line":1810,"msg":"slot progression","n_past":31,"n_past_se":0,"n_prompt_tokens_processed":405,"slot_id":0,"task_id":8,"tid":"0x176287000","timestamp":1712133609}
{"function":"update_slots","level":"INFO","line":1834,"msg":"kv cache rm [p0, end)","p0":31,"slot_id":0,"task_id":8,"tid":"0x176287000","timestamp":1712133609}
{"function":"print_timings","level":"INFO","line":272,"msg":"prompt eval time     =    1327.47 ms /   405 tokens (    3.28 ms per token,   305.09 tokens per second)","n_prompt_tokens_processed":405,"n_tokens_second":305.0904913463531,"slot_id":0,"t_prompt_processing":1327.475,"t_token":3.277716049382716,"task_id":8,"tid":"0x176287000","timestamp":1712133610}
{"function":"print_timings","level":"INFO","line":286,"msg":"generation eval time =     260.96 ms /     9 runs   (   29.00 ms per token,    34.49 tokens per second)","n_decoded":9,"n_tokens_second":34.48791198684861,"slot_id":0,"t_token":28.99566666666667,"t_token_generation":260.961,"task_id":8,"tid":"0x176287000","timestamp":1712133610}
{"function":"print_timings","level":"INFO","line":295,"msg":"          total time =    1588.44 ms","slot_id":0,"t_prompt_processing":1327.475,"t_token_generation":260.961,"t_total":1588.436,"task_id":8,"tid":"0x176287000","timestamp":1712133610}
{"function":"update_slots","level":"INFO","line":1642,"msg":"slot released","n_cache_tokens":445,"n_ctx":2048,"n_past":444,"n_system_tokens":0,"slot_id":0,"task_id":8,"tid":"0x176287000","timestamp":1712133610,"truncated":false}
{"function":"launch_slot_with_data","level":"INFO","line":829,"msg":"slot is processing task","slot_id":0,"task_id":10,"tid":"0x176287000","timestamp":1712133610}
{"function":"update_slots","ga_i":0,"level":"INFO","line":1810,"msg":"slot progression","n_past":30,"n_past_se":0,"n_prompt_tokens_processed":407,"slot_id":0,"task_id":10,"tid":"0x176287000","timestamp":1712133610}
{"function":"update_slots","level":"INFO","line":1834,"msg":"kv cache rm [p0, end)","p0":30,"slot_id":0,"task_id":10,"tid":"0x176287000","timestamp":1712133610}
{"function":"print_timings","level":"INFO","line":272,"msg":"prompt eval time     =    1329.28 ms /   407 tokens (    3.27 ms per token,   306.18 tokens per second)","n_prompt_tokens_processed":407,"n_tokens_second":306.1817109464099,"slot_id":0,"t_prompt_processing":1329.276,"t_token":3.266034398034398,"task_id":10,"tid":"0x176287000","timestamp":1712133612}
{"function":"print_timings","level":"INFO","line":286,"msg":"generation eval time =     130.75 ms /     5 runs   (   26.15 ms per token,    38.24 tokens per second)","n_decoded":5,"n_tokens_second":38.24004038148264,"slot_id":0,"t_token":26.150599999999997,"t_token_generation":130.753,"task_id":10,"tid":"0x176287000","timestamp":1712133612}
{"function":"print_timings","level":"INFO","line":295,"msg":"          total time =    1460.03 ms","slot_id":0,"t_prompt_processing":1329.276,"t_token_generation":130.753,"t_total":1460.029,"task_id":10,"tid":"0x176287000","timestamp":1712133612}
{"function":"update_slots","level":"INFO","line":1642,"msg":"slot released","n_cache_tokens":442,"n_ctx":2048,"n_past":441,"n_system_tokens":0,"slot_id":0,"task_id":10,"tid":"0x176287000","timestamp":1712133612,"truncated":false}
{"function":"launch_slot_with_data","level":"INFO","line":829,"msg":"slot is processing task","slot_id":0,"task_id":12,"tid":"0x176287000","timestamp":1712133612}
{"function":"update_slots","ga_i":0,"level":"INFO","line":1810,"msg":"slot progression","n_past":32,"n_past_se":0,"n_prompt_tokens_processed":406,"slot_id":0,"task_id":12,"tid":"0x176287000","timestamp":1712133612}
{"function":"update_slots","level":"INFO","line":1834,"msg":"kv cache rm [p0, end)","p0":32,"slot_id":0,"task_id":12,"tid":"0x176287000","timestamp":1712133612}
[GIN] 2024/04/03 - 17:40:13 | 200 |    7.4575515s |       127.0.0.1 | POST     "/api/generate"
[GIN] 2024/04/03 - 17:40:13 | 200 |  6.892029459s |       127.0.0.1 | POST     "/api/generate"
[GIN] 2024/04/03 - 17:40:13 | 200 |  6.579324209s |       127.0.0.1 | POST     "/api/generate"
[GIN] 2024/04/03 - 17:40:13 | 200 |   6.19803325s |       127.0.0.1 | POST     "/api/generate"
{"function":"update_slots","level":"INFO","line":1642,"msg":"slot released","n_cache_tokens":440,"n_ctx":2048,"n_past":439,"n_system_tokens":0,"slot_id":0,"task_id":12,"tid":"0x176287000","timestamp":1712133613,"truncated":false}
{"function":"launch_slot_with_data","level":"INFO","line":829,"msg":"slot is processing task","slot_id":0,"task_id":48,"tid":"0x176287000","timestamp":1712133613}
{"function":"update_slots","ga_i":0,"level":"INFO","line":1810,"msg":"slot progression","n_past":31,"n_past_se":0,"n_prompt_tokens_processed":406,"slot_id":0,"task_id":48,"tid":"0x176287000","timestamp":1712133613}
{"function":"update_slots","level":"INFO","line":1834,"msg":"kv cache rm [p0, end)","p0":31,"slot_id":0,"task_id":48,"tid":"0x176287000","timestamp":1712133613}
{"function":"print_timings","level":"INFO","line":272,"msg":"prompt eval time     =    1328.01 ms /   406 tokens (    3.27 ms per token,   305.72 tokens per second)","n_prompt_tokens_processed":406,"n_tokens_second":305.71966861795136,"slot_id":0,"t_prompt_processing":1328.014,"t_token":3.2709704433497535,"task_id":48,"tid":"0x176287000","timestamp":1712133615}
{"function":"print_timings","level":"INFO","line":286,"msg":"generation eval time =     197.04 ms /     7 runs   (   28.15 ms per token,    35.53 tokens per second)","n_decoded":7,"n_tokens_second":35.526502771067214,"slot_id":0,"t_token":28.148,"t_token_generation":197.036,"task_id":48,"tid":"0x176287000","timestamp":1712133615}
{"function":"print_timings","level":"INFO","line":295,"msg":"          total time =    1525.05 ms","slot_id":0,"t_prompt_processing":1328.014,"t_token_generation":197.036,"t_total":1525.05,"task_id":48,"tid":"0x176287000","timestamp":1712133615}
{"function":"update_slots","level":"INFO","line":1642,"msg":"slot released","n_cache_tokens":444,"n_ctx":2048,"n_past":443,"n_system_tokens":0,"slot_id":0,"task_id":48,"tid":"0x176287000","timestamp":1712133615,"truncated":false}
{"function":"launch_slot_with_data","level":"INFO","line":829,"msg":"slot is processing task","slot_id":0,"task_id":50,"tid":"0x176287000","timestamp":1712133615}
{"function":"update_slots","ga_i":0,"level":"INFO","line":1810,"msg":"slot progression","n_past":32,"n_past_se":0,"n_prompt_tokens_processed":406,"slot_id":0,"task_id":50,"tid":"0x176287000","timestamp":1712133615}
{"function":"update_slots","level":"INFO","line":1834,"msg":"kv cache rm [p0, end)","p0":32,"slot_id":0,"task_id":50,"tid":"0x176287000","timestamp":1712133615}
{"function":"print_timings","level":"INFO","line":272,"msg":"prompt eval time     =    1327.23 ms /   406 tokens (    3.27 ms per token,   305.90 tokens per second)","n_prompt_tokens_processed":406,"n_tokens_second":305.90025843297695,"slot_id":0,"t_prompt_processing":1327.23,"t_token":3.269039408866995,"task_id":50,"tid":"0x176287000","timestamp":1712133616}
{"function":"print_timings","level":"INFO","line":286,"msg":"generation eval time =     262.41 ms /     9 runs   (   29.16 ms per token,    34.30 tokens per second)","n_decoded":9,"n_tokens_second":34.29760412180985,"slot_id":0,"t_token":29.156555555555556,"t_token_generation":262.409,"task_id":50,"tid":"0x176287000","timestamp":1712133616}
{"function":"print_timings","level":"INFO","line":295,"msg":"          total time =    1589.64 ms","slot_id":0,"t_prompt_processing":1327.23,"t_token_generation":262.409,"t_total":1589.6390000000001,"task_id":50,"tid":"0x176287000","timestamp":1712133616}
{"function":"update_slots","level":"INFO","line":1642,"msg":"slot released","n_cache_tokens":447,"n_ctx":2048,"n_past":446,"n_system_tokens":0,"slot_id":0,"task_id":50,"tid":"0x176287000","timestamp":1712133616,"truncated":false}
{"function":"launch_slot_with_data","level":"INFO","line":829,"msg":"slot is processing task","slot_id":0,"task_id":52,"tid":"0x176287000","timestamp":1712133616}
{"function":"update_slots","ga_i":0,"level":"INFO","line":1810,"msg":"slot progression","n_past":32,"n_past_se":0,"n_prompt_tokens_processed":406,"slot_id":0,"task_id":52,"tid":"0x176287000","timestamp":1712133616}
{"function":"update_slots","level":"INFO","line":1834,"msg":"kv cache rm [p0, end)","p0":32,"slot_id":0,"task_id":52,"tid":"0x176287000","timestamp":1712133616}
{"function":"print_timings","level":"INFO","line":272,"msg":"prompt eval time     =    1328.47 ms /   406 tokens (    3.27 ms per token,   305.61 tokens per second)","n_prompt_tokens_processed":406,"n_tokens_second":305.61380952882786,"slot_id":0,"t_prompt_processing":1328.474,"t_token":3.2721034482758617,"task_id":52,"tid":"0x176287000","timestamp":1712133619}
{"function":"print_timings","level":"INFO","line":286,"msg":"generation eval time =    1008.81 ms /    32 runs   (   31.53 ms per token,    31.72 tokens per second)","n_decoded":32,"n_tokens_second":31.72044769446865,"slot_id":0,"t_token":31.52540625,"t_token_generation":1008.813,"task_id":52,"tid":"0x176287000","timestamp":1712133619}
{"function":"print_timings","level":"INFO","line":295,"msg":"          total time =    2337.29 ms","slot_id":0,"t_prompt_processing":1328.474,"t_token_generation":1008.813,"t_total":2337.287,"task_id":52,"tid":"0x176287000","timestamp":1712133619}
{"function":"update_slots","level":"INFO","line":1642,"msg":"slot released","n_cache_tokens":470,"n_ctx":2048,"n_past":469,"n_system_tokens":0,"slot_id":0,"task_id":52,"tid":"0x176287000","timestamp":1712133619,"truncated":false}
{"function":"launch_slot_with_data","level":"INFO","line":829,"msg":"slot is processing task","slot_id":0,"task_id":54,"tid":"0x176287000","timestamp":1712133619}
{"function":"update_slots","ga_i":0,"level":"INFO","line":1810,"msg":"slot progression","n_past":33,"n_past_se":0,"n_prompt_tokens_processed":406,"slot_id":0,"task_id":54,"tid":"0x176287000","timestamp":1712133619}
{"function":"update_slots","level":"INFO","line":1834,"msg":"kv cache rm [p0, end)","p0":33,"slot_id":0,"task_id":54,"tid":"0x176287000","timestamp":1712133619}
{"function":"print_timings","level":"INFO","line":272,"msg":"prompt eval time     =    1328.28 ms /   406 tokens (    3.27 ms per token,   305.66 tokens per second)","n_prompt_tokens_processed":406,"n_tokens_second":305.6593659751437,"slot_id":0,"t_prompt_processing":1328.276,"t_token":3.2716157635467984,"task_id":54,"tid":"0x176287000","timestamp":1712133620}
{"function":"print_timings","level":"INFO","line":286,"msg":"generation eval time =     327.84 ms /    11 runs   (   29.80 ms per token,    33.55 tokens per second)","n_decoded":11,"n_tokens_second":33.5533620468771,"slot_id":0,"t_token":29.803272727272727,"t_token_generation":327.836,"task_id":54,"tid":"0x176287000","timestamp":1712133620}
{"function":"print_timings","level":"INFO","line":295,"msg":"          total time =    1656.11 ms","slot_id":0,"t_prompt_processing":1328.276,"t_token_generation":327.836,"t_total":1656.112,"task_id":54,"tid":"0x176287000","timestamp":1712133620}
{"function":"update_slots","level":"INFO","line":1642,"msg":"slot released","n_cache_tokens":450,"n_ctx":2048,"n_past":449,"n_system_tokens":0,"slot_id":0,"task_id":54,"tid":"0x176287000","timestamp":1712133620,"truncated":false}
[GIN] 2024/04/03 - 17:40:20 | 200 | 13.134449416s |       127.0.0.1 | POST     "/api/generate"

Rate Limit error on locally deployed model

I am trying to run Starcoder locally through Ollama. And I want to get code auto-completion like in the README gif.

But I keep getting the following error after every debounce: [LLM] inference api error: Rate limit reached. Please log in or use your apiToken

local llm = require('llm')

    llm.setup({
        api_token = nil,                             -- cf Install paragraph
        model = "bigcode/starcoder",                 -- the model ID, behavior depends on backend
        url = "http://localhost:11434/api/generate", -- the http url of the backend
        tokens_to_clear = { "<|endoftext|>" },       -- tokens to remove from the model's output

        -- parameters that are added to the request body, values are arbitrary, you can set any field:value pair here it will be passed as is to the backend
        request_body = {
            parameters = {
                temperature = 0.1,
            },
        },
        -- set this if the model supports fill in the middle
        fim = {
            enabled = true,
            prefix = "<fim_prefix>",
            middle = "<fim_middle>",
            suffix = "<fim_suffix>",
        },
        debounce_ms = 1000,
        context_window = 8192, -- max number of tokens for the context window
        tokenizer = { -- cf Tokenizer paragraph
            repository = "bigcode/starcoder",
        },
    })

Am I wrong to understand that this repo can give Copilot/Tabnine-like autocomplete with locally deployed models?
Please let me know what my next steps should be.

How to use the OpenAI API?

I read the code, and it seems to support the real OpenAI API. But when I set it up, something goes wrong.
Can you confirm whether this supports the OpenAI API? I mean the actual OpenAI API.

Inconsistent Virtual Text Placement with Tabs

Hi,
I'm encountering an issue with llm.nvim where virtual text is not correctly aligned when lines start with a tab character. Specifically, virtual text intended to be aligned with the cursor position defaults to column 1 when the line begins with a tab, whereas it aligns correctly when the line starts with spaces.


I dug into the source code and found that the col value comes from llm-ls, so I'm not sure whether this is a bug in llm.nvim.

README suggests Ollama should work but it does not

I feel like I have tried every combination of setup configuration that's in the readme but did not manage to get it working with 0.5.0+. I get many different "missing field *" errors. Is there a full config somewhere as an example that works with ollama?

I have it working with a fork from before ollama support was supposedly in this repo but I would like to switch to this again if possible. https://github.com/Amzd/nvim.config/blob/main/lua/plugins/llm.lua

Unreachable LLM server blocks UI

When the configured LLM server is unreachable the UI is blocked by the error messages, making typing extremely slow. This can of course be negated by disabling generation, but maybe hiding the error message after 'n' times would be a better experience for the user?

No LSP with LLM

I have managed to get llm autocomplete to work using cmp-ai. But then I lose basic lsp support (intellisense)

[Feat]: Improve DX

Motivation

When I was working on the files, there were no rules for the linters or the LSP; lua_ls and (in my case) selene will complain about unused variables, parameters, shadowing, etc.

Proposal

  • Linter: I will personally go for selene (a more updated and better approach than luacheck)

Configuration example

Linters

  • Selene:
# selene.toml or .selene.toml
std="neovim" # looks for a `neovim.yml` file to use as configuration

[rules]
global_usage = "warn"
deprecated = "warn" # If changed to `allow` it will rely in `lua_ls` diagnostics alone
multiple_statements = "warn"
incorrect_standard_library_use = "allow" # This is for cases like `string.format`, `package.config`, etc.
mixed_table = "allow"
unused_variable = "warn"
undefined_variable = "warn"
# neovim.yml
---
base: lua51 # to use lua5.1 what Neovim uses

globals:
  jit:
    any: true
  vim:
    any: true
  assert:
    args:
      - type: bool
      - type: string
        required: false
  after_each:
    args:
      - type: function
  before_each:
    args:
      - type: function
  describe:
    args:
      - type: string
      - type: function
  it:
    args:
      - type: string
      - type: function

  • Luacheck:
-- .luacheckrc
-- Rerun tests only if their modification time changed.
cache = true

std = luajit
codes = true

self = false

-- Glorious list of warnings: https://luacheck.readthedocs.io/en/stable/warnings.html
ignore = {
  "212", -- Unused argument, In the case of callback function, _arg_name is easier to understand than _, so this option is set to off.
  "122", -- Indirectly setting a readonly global
}

globals = {
  "_",
  "_PlenaryLeafTable",
  "_PlenaryBustedOldAssert",
  "_AssociatedBufs",
}

-- Global objects defined by the C code
read_globals = {
  "vim",
}

exclude_files = {
  "lua/plenary/profile/lua_profiler.lua",
  "lua/plenary/profile/memory_profiler.lua",
  "lua/plenary/async_lib/*.lua",
}

files = {
  ["lua/plenary/busted.lua"] = {
    globals = {
      "describe",
      "it",
      "pending",
      "before_each",
      "after_each",
      "clear",
      "assert",
      "print",
    },
  },
  ["lua/plenary/async/init.lua"] = {
    globals = {
      "a",
    },
  },
  ["lua/plenary/async/tests.lua"] = {
    globals = {
      "describe",
      "it",
      "pending",
      "before_each",
      "after_each",
    },
  },
}

Formatter

We can also add a Makefile to use make lint and make format.

lint:
   @printf "\nRunning linter\n"
   @selene --display-style quiet --config ./selene.toml lua/llm
   # @luacheck lua/llm
   @printf "\nRunning formatter check\n"
   @stylua --color always -f ./.stylua.toml --check .

format:
   @printf "\nFixing all fixable formatting problems\n"
   @stylua --color always -f ./.stylua.toml .

We could also rely on Neovim's native support for .editorconfig to ensure code style not only in Lua files.


Integrating linters and formatters in workflows

  • Selene and stylua:
name: lint

on:
  pull_request:
    branches:
      - main
    paths:
      - "lua/**"
      - "tests/**"

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout sources
        uses: actions/checkout@v4

      - name: Run selene
        uses: NTBBloodbath/[email protected]
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          args: --display-style quiet lua/llm

  style-lint:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout sources
        uses: actions/checkout@v4

      - name: Lint with stylua
        uses: JohnnyMorganz/stylua-action@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          version: latest
          args: --color always --check .

Note

Both stylua and selene are installable from cargo.

We could use a cargo install package and then run make lint if
using the Makefile approach.

  • Luacheck and stylua:
name: lint

on:
  pull_request:
    branches:
      - main
    paths:
      - "lua/**"
      - "tests/**"

jobs:
  stylua:
    name: stylua
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - uses: JohnnyMorganz/stylua-action@v2
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          version: latest
          # CLI arguments
          args: --color always --check .

  luacheck:
    name: Luacheck
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3

      - name: Prepare
        run: |
          sudo apt-get update
          sudo apt-get install -y luarocks
          sudo luarocks install luacheck

      - name: Lint
        run: make lint

Contributing rules, standards and code of conduct

Adding a CONTRIBUTING.md and a CODE_OF_CONDUCT.md for easy-to-follow instructions on how to contribute to the repo.

In CONTRIBUTING.md we could document the linters and formatters, how to structure the code, whether or not to follow the Conventional Commits standard, and how to fork the repo just in case. We could also add a minimal.lua configuration to reproduce bugs.

minimal.lua config example
nvim -nu minimal.lua
vim.cmd([[set runtimepath=$VIMRUNTIME]])
vim.cmd([[set packpath=/tmp/nvim/lazy/]])

local lazypath = "/tmp/nvim/lazy/lazy.nvim"
if not vim.loop.fs_stat(lazypath) then
  vim.fn.system({
    "git",
    "clone",
    "--filter=blob:none",
    "https://github.com/folke/lazy.nvim.git",
    "--branch=stable", -- latest stable release
    lazypath,
  })
end

vim.opt.rtp:prepend(lazypath)

require("lazy").setup({
  {
    "huggingface/llm.nvim",
    config = function()
      require("llm").setup({
        -- cf Setup
      })
    end,
    lazy = false,
    enabled = true,
  },
})

Some example of this here.

Issue and PR templating

Establish issue templates, at least to differentiate between BUG and FEATURE REQUEST, and add bug or enhancement labels so they are easy to spot on the issues page.

Note

With the new GitHub issue-form templating system, we could create a form-like issue template for bugs, e.g. ISSUE_TEMPLATE/bug_report.yml,
making sure the reporter checks requirements such as having used the minimal.lua configuration, and pointing them to it.

We can make required checkboxes and fields so the form won't submit unless they are checked or filled in, such as
an example of the configuration used, the Neovim version running, etc.

Some example of this here.

Adding a PR template to follow a "changes made" format describing how the change was tested (or not tested), and which issue or feature it closes or fixes with Fixes #<issue number>.

Some example of this here.

Workflows actions

Note

Some of these notes are unnecessary per se, but they add some flavour to the project 😁.

Labelers

We could add some actions to improve labeling depending on the files touched in PRs, using actions/labeler in combination with amannn/action-semantic-pull-request.

Another action that I find very useful is eps1lon/actions-label-merge-conflict: it labels the PR (with conflicted, for example) and posts a message telling the people on the PR that they need to resolve the conflicts in their PR.

Commits

For linting commits, I find it more useful and less annoying to run commitlint in CI rather than in hooks, so the person committing a change does not have to wait locally when committing. The action is as simple as:

# Using this inside an action with the `run` key
npm install --save-dev @commitlint/{cli,config-conventional}
echo "module.exports = { extends: ['@commitlint/config-conventional'] };" > commitlint.config.js
npx commitlint --from HEAD~1 --to HEAD --verbose

If we use conventional commits in the repo of course.

Note

For formatting and linting code, refer to the Integrating linters and formatters section above.

`[LLM] Model bigcode/starcoderbase is currently loading`

I'm getting [LLM] Model bigcode/starcoderbase is currently loading when using bigcode/starcoderbase with this config on lazy.nvim:

"huggingface/llm.nvim",
event = "VeryLazy",
opts = {
    api_token = "<key>",
    {
      tokens_to_clear = { "<|endoftext|>" },
      fim = {
        enabled = true,
        prefix = "<fim_prefix>",
        middle = "<fim_middle>",
        suffix = "<fim_suffix>",
      },
      model = "bigcode/starcoder",
      context_window = 8192,
      tokenizer = {
        repository = "bigcode/starcoder",
      }
    }
}

No Auto-Completions and weird offset_encodings warning


As you can see, I'm getting this weird warning and no completions when trying to write C++. This is my llm.nvim setup using lazy:

{
    'huggingface/llm.nvim',
    opts = {
      backend = 'ollama',
      model = 'deepseek-coder',
      accept_keymap = '<Tab>',
      dismiss_keymap = '<S-Tab>',
      url = 'http://localhost:11434/api/generate',
      request_body = {
        options = {
          temperature = 0.2,
          top_p = 0.95,
        },
      },
      enable_suggestions_on_startup = false,
      lsp = {
        bin_path = vim.api.nvim_call_function('stdpath', { 'data' }) .. '/mason/bin/llm-ls',
      },
    },
  },

How do you use this?

The documentation is out of order and all over the place, and after following it, running :LLMSuggestion doesn't do anything. I've tried running it with text selected, passing in a string, etc. It gives no feedback on what I did wrong or what to do; the token and a valid model are passed in.

This project is cool but could use better documentation.

Error starting llm-ls

I built llm-ls locally and configured it as follows:
lsp = { bin_path = "/home/myuser/soft/GPT/llm-ls-0.5.2/target/release/llm-ls", host = nil, port = nil, version = "0.5.2", },
However, when opening nvim, I get the Error starting llm-ls message, and nvim/lsp.log records:
[START][2024-05-06 22:55:47] LSP logging initiated [WARN][2024-05-06 22:55:47] .../lua/vim/lsp.lua:601 "buf_attach_client called on unloaded buffer (id: 1): "

What should I do?
Thanks.

check for llm-ls in PATH ?

When developing locally, when using mason or if you built your own binary because your platform is not supported, you can set the lsp.bin_path setting to the path of the binary.

I am a bit perplexed by that: I had llm-ls in my PATH, yet installing llm.nvim tried to install its own copy (making the TUI unresponsive for 4-5 seconds with no warning). Maybe just check with executable("llm-ls") whether it's available?

NB: I am just getting started, but considering the relative complexity of the plugin, adding a health.lua so users can run :checkhealth llm.nvim could help.

`Tab` key not usable in insert mode

Using this config:

"huggingface/llm.nvim",
event = "VeryLazy",
opts = {
    api_token = "<key>",

    {
        tokens_to_clear = { "<|endoftext|>" },
        fim = {
            enabled = true,
            prefix = "<fim_prefix>",
            middle = "<fim_middle>",
            suffix = "<fim_suffix>",
        },
        model = "codellama/CodeLlama-34b-Instruct-hf",
        context_window = 8192,
        tokenizer = {
            repository = "codellama/CodeLlama-34b-Instruct-hf",
        }
    },

    enable_suggestions_on_startup = false,
    lsp = {
        bin_path = vim.api.nvim_call_function("stdpath", { "data" }) .. "/mason/bin/llm-ls",
    },
},

I am unable to use the Tab key in insert mode when suggestions are on, but Tab works for accepting the generated code. Changing the accept and dismiss keys only made those keys unusable (in insert mode), with Tab working again.

Add ability to specify file types to attach to.

Right now the llm-ls attaches to all file / buffer types, including things like the Telescope search field.

This plugin should implement a config option and functionality to only attach to specified file types like copilot.lua does:

https://github.com/zbirenbaum/copilot.lua/blob/master/lua/copilot/client.lua#L91

An example config would include:

filetypes = {
     bash = true,
     c = true,
     cpp = true,
     fish = true,
     go = true,
     html = true,
     java = true,
     javascript = true,
     just = true,
     lua = true,
     python = true,
     rust = true,
     sh = true,
     typescript = true,
     zsh = true,
     ["*"] = false, -- disable for all other filetypes and ignore default `filetypes`
}

Otherwise, something like this will happen:


Thanks

extract_generation fails with attempt to index a nil value

The current version of the plugin (1a3d558c72cb2778e68a86a26aa9c530ea25c769) with nvim v0.9.1 seems to fail against a self-hosted codellama/CodeLlama-13b-hf (revision 9e3ca34b26ace9f0c4e098f61f6fa8e9d89dc2c2) running on ghcr.io/huggingface/text-generation-inference:1.0.3

16:17:31 msg_show ...rs/wroset/.local/share/nvim/lazy/llm.nvim/lua/llm/hf.lua:27: attempt to index a nil value
16:17:31 msg_show stack traceback:
16:17:31 msg_show ^I...rs/wroset/.local/share/nvim/lazy/llm.nvim/lua/llm/hf.lua:27: in function 'extract_generation'
16:17:31 msg_show ^I...rs/wroset/.local/share/nvim/lazy/llm.nvim/lua/llm/hf.lua:82: in function <...rs/wroset/.local/share/nvim/lazy/llm.nvim/lua/llm/hf.lua:80>

I've installed llm.nvim with lazyvim with the following configuration and I'm not (yet) using lsp.

{
  "huggingface/llm.nvim",
  opts = {
    debounce_ms = 150,
    accept_keymap = "<Tab>",
    dismiss_keymap = "<S-Tab>",
    tokenizer_path = nil, -- when setting model as a URL, set this var
    -- code-llama
    query_params = {
      stop_token = "<EOT>",
    },
    fim = {
      enabled = true,
      prefix = "<PRE> ",
      middle = " <MID>",
      suffix = " <SUF>",
    },
    context_window = 4096,
  },
},

Here is an example of my test scenario:

#! /usr/bin/python3

def main():

I've added a couple of troubleshooting notify calls, similar to:

diff --git a/lua/llm/hf.lua b/lua/llm/hf.lua
index d6f4879..85818c3 100644
--- a/lua/llm/hf.lua
+++ b/lua/llm/hf.lua
@@ -15,7 +15,10 @@ local function build_inputs(before, after)
 end

 local function extract_generation(data)
+  vim.notify("data has " .. tostring(#data) .. " elements", vim.log.levels.ERROR)
+  vim.notify("data[1]: >" .. data[1] .. "<", vim.log.levels.ERROR)
   local decoded_json = json.decode(data[1])
+  vim.notify("decoded_json has " .. tostring(#decoded_json) .. " elements", vim.log.levels.ERROR)
   if decoded_json == nil then
     vim.notify("[LLM] error getting response from API", vim.log.levels.ERROR)
     return ""

I'm not sure I understand why, but decoded_json is empty:

2023-09-06T22:30:41 Notify  DEBUG [LLM] api token is empty, suggestion might not work                                               │
2023-09-06T22:30:43 Error  ERROR data has 1 elements                                                                                │
2023-09-06T22:30:43 Error  ERROR data[1]: >{"generated_text":"print(\"Hello, world!\")\n\nif __name__ == \"__main__\":\n    main()\n│
2023-09-06T22:30:43 Error  ERROR decoded_json has 0 elements                                                                        │
2023-09-06T22:30:43 Messages  INFO ^I...rs/wroset/.local/share/nvim/lazy/llm.nvim/lua/llm/hf.lua:85: in function <...rs/wroset/.loca│

Change authentication mechanism

As suggested by BaggiPonte here, people usually version their nvim configuration.

The current authentication mechanism requires passing the token in the plugin configuration, which can lead to unintended uploads of HF tokens.

Merge of user config with default config causes issues

Hi there, thanks for the cool plugin.

I noticed an issue caused by how you merge the default config with the user-provided config in config.lua. I usually use ollama and decided to try out vllm, but had a bit of trouble getting it to work because of this.

vllm exposes an openAI compatible API, so my request_body looked something like the following:

request_body = {
    temperature = 0,
    top_p = 0.9,
    presence_penalty = 1,
    stop = { '<|file_separator|>', '<|fim_prefix|>', '<|fim_suffix|>', '<|fim_middle|>' }
}

vllm kept throwing HTTP bad request errors at me, and sniffing the requests sent to vllm showed me that the default values for the request body (the parameters object) were being sent along with my own request body, which was unexpected. vllm evidently does not like unexpected arguments.
The problem comes from the use of vim.tbl_deep_extend(), which recursively merges the default config map with the user provided one. As a consequence, the user provided request body gets merged with the default one, which for the request body in particular is surprising behavior.

You can observe the same behavior by using the configuration provided in the README for say, ollama and looking at the transmitted request body: the parameters object is present alongside the options object although it's not present in the user config, ollama is just more graceful than vllm about it.

I worked around it by explicitly deleting the parameters object from the llm.nvim config table just after the setup() call with this: require('llm.config').config.request_body.parameters = nil. Not a complex workaround, but I did need to go and check the plugin source to figure it out.
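Concretely, the workaround looks like this (sketch; the rest of my configuration is elided):

require('llm').setup({
  -- ... my openai backend configuration ...
})
-- drop the default `parameters` table that vim.tbl_deep_extend() merged into request_body
require('llm.config').config.request_body.parameters = nil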

The fix that I would suggest is that, since the default config is provided for the huggingface backend, you should either not merge the default request_body with the user config if the user is not using the huggingface backend, or include a matching default request body for other backends (for instance what's in the README) and merge with that. Let me know if you'd like a PR for it.

lsp server not started properly

I hit the following error:

"Requesting completion for a detached buffer, check enable_suggestions_on_files' value",

1/ I think the message could be improved, aka "no LSP server enabled for this buffer"
2/ LspInfo gives me:

 Client: llm-ls (id: 1, bufnr: [1, 6])
 	filetypes:       
 	autostart:       false
 	root directory:  /home/teto/nixpkgs
 	cmd:             /nix/store/ngay8qxw0bnirwsnjsk84bdcsbd2q9kc-llm-ls-0.4.0/bin/llm-ls

which means the config here

local client_id = lsp.start({
is probably not enough (it should autostart maybe ?).
It turns out :LspStart llm-ls seems to start it (I see "codellama currently loading"; it took about 1 min, and I could then accept the suggestion with the binding in accept_keymap).

Interested in adding llm-ls to https://github.com/neovim/nvim-lspconfig ? ping me if you open a PR and I can review it.

Is it possible to support insert mode mapping?

As the title says. It seems one can only make use of a single command, :HFccSuggestion, and this forces us to get a suggestion in normal mode. It would be more natural if we could trigger it in insert mode. Thanks!

Having autocompleted code conform to python spacing?

Currently hfcc adds newlines in python as tabs instead of 4 spaces:

    args = parse_args()
	random = randint(0,9)
	print(random)

I currently fix this by running a :retab command after autocomplete, which works; however, I'm wondering whether there might be a way to preprocess the code before it gets added to the buffer (or whether there is a vim setting I can use to have it insert spaces instead of tabs?).

expose callbacks

As I started my Neovim, I was surprised that my <tab> key was doing nonsense.
As I had just installed llm.nvim, the culprit was evident (plus :verb map <tab>). Tab is an important key, so I don't think the plugin should hijack it like this. I would prefer the plugin to expose its callbacks and let me map them. The mappings could be kept in the README, outside the setup function.
The setup function meme can be quite annoying when you compose your setup:
https://mrcjkb.dev/posts/2023-08-22-setup.html

Cannot use ollama in a docker container. ERROR: [LLM] http error

It is a great plugin and I love it. But I found an error here.

[LLM] http error: error sending request for url (http://localhost:11434/api/generate): connection closed before message completed

Following the config in the readme.

{
    "huggingface/llm.nvim",
    opts = {
      -- cf Setup
    },
    config = function()
      local llm = require("llm")
      llm.setup({
        api_token = nil, -- cf Install paragraph
        -- for ollama backend
        backend = "ollama", -- backend ID, "huggingface" | "" | "openai" | "tgi"
        model = "starcoder2:7b",
        url = "http://localhost:11434/api/generate",
        tokens_to_clear = { "<|endoftext|>" }, -- tokens to remove from the model's output
        -- parameters that are added to the request body, values are arbitrary, you can set any field:value pair here it will be passed as is to the backend
        request_body = {
          parameters = {
            max_new_tokens = 60,
            temperature = 0.2,
            top_p = 0.95,
          },
        },
        -- set this if the model supports fill in the middle
        fim = {
          enabled = true,
          prefix = "<fim_prefix>",
          middle = "<fim_middle>",
          suffix = "<fim_suffix>",
        },
        debounce_ms = 150,
        accept_keymap = "<C-y>",
        dismiss_keymap = "<C-n>",
        tls_skip_verify_insecure = false,
        -- llm-ls configuration, cf llm-ls section
        lsp = {
          bin_path = nil,
          host = nil,
          port = nil,
          version = "0.5.2",
        },
        tokenizer = {
          repository = "bigcode/starcoder2-7b",
         
        }, -- cf Tokenizer paragraph
        -- tokenizer = nil, -- cf Tokenizer paragraph
        context_window = 4096, -- max number of tokens for the context window
        enable_suggestions_on_startup = true,
        enable_suggestions_on_files = "*", -- pattern matching syntax to enable suggestions on specific files, either a string or a list of strings
      })
    end,
  } 

The weirdest thing is that I can curl the same model and API URL and get an answer, and my VS Code Continue plugin can communicate with this ollama instance running in a docker container, but this plugin cannot!

Thank you for your time and reply!

How to use proxy env var

I am unable to communicate with any http endpoints because I am behind a corporate proxy that uses self-signed certificates. Typically we use the http_proxy and https_proxy environment variables for this purpose, but I can't see any obvious configurations that I can add to my lua config to make this work.

I have tried adding http_proxy = "http://ProxyURL:ProxyPort" to cmd_env in the llm.setup but it still keeps throwing an http error... invalid peer certificate, unknown issuer.

feature request: Provide a setting to limit the number of tokens

Depending on the size of the file, the number of tokens added to the payload can exceed the maximum allowed by the model.
This produces the following error:

[HFcc] Input validation error: inputs tokens + max_new_tokens must be <= 1512. Given: 1516 inputs tokens and 150 max_new_tokens

I reckon the issue comes from the following lines:
https://github.com/huggingface/hfcc.nvim/blob/f3968f7a6f87e74da91333d19bd5c0f7720cd463/lua/hfcc/completion.lua#L38
https://github.com/huggingface/hfcc.nvim/blob/f3968f7a6f87e74da91333d19bd5c0f7720cd463/lua/hfcc/completion.lua#L41

Where: nvim_buf_get_text({buffer}, {start_row}, {start_col}, {end_row}, {end_col}, {opts})

My understanding is that the whole file will be sent in two parts (before, after), hence the error.

It would be great to limit the number of tokens. We could imagine a setting defining the maximum number of tokens, then cutting before and after to respect the total length, e.g. len(before) + len(after) + max_new_tokens = maximum_tokens.

Start with auto suggestion off

Hello,

Thanks for this amazing plugin; starcoder is a blessing that was missing from my life! Would it be possible to add an option to start with auto-suggestion off so we can activate it when needed instead? My workflow is to bind :LLMToggleAutoSuggest to a shortcut like HF so I can toggle it at will, but starting deactivated would be more convenient.

Thanks

Can't use completions

I have the plugin installed and configured but I am unable to accept completions. I see the ghost text suggestions, went through and made sure to include the opts for:

accept_keymap = "<Tab>",
dismiss_keymap = "<S-Tab>"

but pressing Tab just inserts a Tab. I am using LazyVim.

llm.nvim does not attach to the buffer

Some time ago llm.nvim stopped completing anything. I went to the source and added some prints to see why it does not show ghost text. It turned out that the "not attached" print fired:

function M.get_completions(callback)
  if M.client_id == nil then
    vim.print("no client_id")
    return
  end
  if not lsp.buf_is_attached(0, M.client_id) then
    vim.print("not attached")
    return
  end
  ...

I then dug and found the following autocmd in M.setup:

    api.nvim_create_autocmd("BufEnter", {
      group = augroup,
      pattern = config.get().enable_suggestions_on_files,
      callback = function(ev)
        vim.print("notattaching")
        if not lsp.buf_is_attached(ev.buf, client_id) then
          vim.print("attaching")
          lsp.buf_attach_client(ev.buf, client_id)
        end
      end,
    })

Strangely, these lines were not run (no prints about attaching). If I change "BufEnter" to "InsertLeave", those prints work and some requests go to ollama, as my GPU starts being used (though it still does not show the ghost text for some reason; that's some other problem).

Neovim 0.10.0 support

Hi, after updating to Neovim 0.10.0 the plugin crashes at startup. I don't know why, but I get this error:

Vim(lua):E5108: Error executing lua ...k/myNeovimPackages/start/llm.nvim/lua/llm/completion.lua:150:
Vim:[LLM] Error starting llm-ls
stack traceback:
        [C]: in function 'nvim_command'
        ...k/myNeovimPackages/start/llm.nvim/lua/llm/completion.lua:150: in function 'setup'
        ...ir/pack/myNeovimPackages/start/llm.nvim/lua/llm/init.lua:29: in function 'setup'

add system prompt for FIM or other parameters.

This model seems to need other params in the prompt; there are more parameters that need to be given to the LLM. It also has a system prompt like this, meant to avoid returning <code_prefix> and <code_suffix>. How can I update the system prompt to do that? Thank you!

Chatbot with TUI

Hi,

It's not an issue but a proposal. I'm currently experimenting with the textual Python package, which is very interesting for creating rich terminal UIs.

I already tested it inside neovim, and that's not bad at all:


Your opinion is important:

  • is it interesting that I start something inside the "llm.vim" project?
  • or is it better to create a "chatbot" vim plugin outside?

LLM Error: ['Stop'] not used

Across multiple models (for example WizardLM/WizardCoder-1B-V1.0), both with the Inference API using the model name and with my own deployment, I get the error:

[LLM] The following 'model_kwargs' are not used by the model: ['stop'] (note typos in the generate argument will also show up in this list)

I can't work out what to do to fix this.

ollama not working

config:

return {
  "huggingface/llm.nvim",
  opts = {
    model = "rouge/autocoder-s-6.7b:latest",
    backend = "ollama",
    url = "http://localhost:11434/api/generate",
    request_body = {
      parameters = {
        max_new_tokens = 100000, -- the maximum numbers of tokens to generate, ignore the number of prompt token
        temperature = 0.2, -- the bigger, more creatively
        top_p = 0.95, -- the bigger, text generated is diverse
      },
    },
    tokens_to_clear = { "<|endoftext|>" },
    fim = {
      enabled = true,
      prefix = "<fim_prefix>",
      middle = "<fim_middle>",
      suffix = "<fim_suffix>",
    },
    accept_keymap = "<Tab>",
    dismiss_keymap = "<S-Tab>",
    lsp = {
      bin_path = vim.api.nvim_call_function("stdpath", { "data" })
        .. "/mason/bin/llm-ls",
    },
    context_window = 100000, -- max number of tokens for the context window
    tokenizer = {
      repository = "Bin12345/AutoCoder_S_6.7B",
    },
  },
}

error message: [LLM] serde json error: EOF while parsing a value at line 1 column 0

Can't get it to work with ollama

I am getting the following when hitting Tab in insert mode:

Error executing vim.schedule lua callback: ...cal/share/nvim/lazy/llm.nvim/lua/llm/language_server.lua:154: attempt to index local 'completion_result' (a nil
value)

This is my config

  {
    'huggingface/llm.nvim',
    config = function()
      require('llm').setup({
        model = "starcoder2:7b",
        backend = "ollama",
        url = "http://localhost:11434/api/generate",
        -- cf https://github.com/ollama/ollama/blob/main/docs/api.md#parameters
        request_body = {
          -- Modelfile options for the model you use
          options = {
            temperature = 0.2,
            top_p = 0.95,
          }
        }
      })
    end
  },

Additionally, the LLMSuggestion command does nothing and auto suggestion doesn't seem to do anything either. ollama is installed, including the specified model.

Other models don't work either. Using the same model via ollama run does work. It also seems to be spawning the respective ollama serve processes.

Any idea what could be going on?
