lm-sys / routellm
A framework for serving and evaluating LLM routers - save LLM costs without compromising quality!
License: Apache License 2.0
Hi There,
First, I want to thank you for this great project. I have successfully installed and configured RouteLLM on my machine, but I cannot find any information on how to run it. Could you provide some examples of how to use the tool? In particular, I want an example of how to pass a prompt and get back the “model name”.
Thanks so much,
Pirouz
routellm version - 0.2.0
Python 3.10.14
import os
os.environ["OPENAI_API_KEY"] = "XXX"
os.environ["GROQ_API_KEY"] = "YYY"
from routellm.controller import Controller
client = Controller(
routers=["mf"],
strong_model="gpt-3.5-turbo",
weak_model="llama3-8b-8192"
)
response = client.chat.completions.create(
model="router-mf-0.11593",
messages=[
{"role": "user", "content": "hi, how are you"}
]
)
message_content = response['choices'][0]['message']['content']
model_name = response['model']
print(f"Message content: {message_content}")
print(f"Model name: {model_name}")
Error:
HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/routellm/mf_gpt4_augmented/resolve/main/pytorch_model.bin
The above exception was the direct cause of the following exception:
EntryNotFoundError Traceback (most recent call last)
Cell In[1], line 8
4 os.environ["GROQ_API_KEY"] = "XX"
6 from routellm.controller import Controller
----> 8 client = Controller(
9 routers=["mf"],
10 strong_model="gpt-3.5-turbo",
11 weak_model="llama3-8b-8192"
12 )
14 response = client.chat.completions.create(
...
285 )
EntryNotFoundError: 404 Client Error. (Request ID: Root=1-66ab5a67-46661bc90bdf3b0f6b75110c;8df487c1-691d-4d82-af40-8eee63fd32ff)
Hi folks,
I have configured my application as shown below. I want to change the parameters of the model; could you please suggest how I can pass my own parameters here? I am trying to implement this in a RAG application.
client = Controller(
routers=["mf"],
strong_model="gpt-4-1106-preview",
weak_model="anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1",
progress_bar=True
)
response = resources.routellm.chat.completions.create(
# This tells RouteLLM to use the MF router with a cost threshold of 0.1439
model="router-mf-0.1439",
messages=[
{"role": "system", "content":"""Handle chit-chat gracefully, if the user is greeting then greet them back.
You are an honest and helpful AI assistant. Your job is to understand the Users
questions and only make use of the context provided to answer it clearly and precisely,
be descriptive, use amounts, values and percentages wherever necessary.
Always include all necessary details. Stick to the context and never use any previous information.
If no information is provided in the context then refrain from giving wrong answers.
No preamble."""},
{"role": "user", "content": query}, # Assuming 'query' is the user's input
{"role": "assistant", "content": context} # Including the context in the conversation
]
)
How does RouteLLM support function calling?
Congrats on making such a great innovation in agents. I'm working on reproducing the paper, maybe using more data, but I have run into some problems.
In your paper, I recognize that
I can see that queries can be routed to different chat completion models, but can RouteLLM also route queries to different OpenAI assistants, or use different LLMs for the same OpenAI assistant?
While running the basic example, I get this error:
`
import os
from routellm.controller import Controller
os.environ["OPENAI_API_KEY"] = 'my api'
client = Controller(
routers=["mf"],
strong_model="gpt-4o",
weak_model="gpt-4o-mini",
)
`
'---------------------------------------------------------------------------
OpenAIError Traceback (most recent call last)
Cell In[1], line 2
1 import os
----> 2 from routellm.controller import Controller
4 os.environ["OPENAI_API_KEY"] = 'my api'
7 client = Controller(
8 routers=["mf"],
9 strong_model="gpt-4o",
10 weak_model="gpt-4o-mini",
11 )
File ~/playground/agent_2906/0507/RouteLLM/routellm/controller.py:10
7 from litellm import acompletion, completion
8 from tqdm import tqdm
---> 10 from routellm.routers.routers import ROUTER_CLS
12 # Default config for routers augmented using golden label data from GPT-4.
13 # This is exactly the same as config.example.yaml.
14 GPT_4_AUGMENTED_CONFIG = {
15 "sw_ranking": {
16 "arena_battle_datasets": [
(...)
27 "mf": {"checkpoint_path": "routellm/mf_gpt4_augmented"},
28 }
File ~/playground/agent_2906/0507/RouteLLM/routellm/routers/routers.py:17
12 from routellm.routers.causal_llm.llm_utils import (
13 load_prompt_format,
14 to_openai_api_messages,
15 )
16 from routellm.routers.causal_llm.model import CausalLLMClassifier
---> 17 from routellm.routers.matrix_factorization.model import MODEL_IDS, MFModel
18 from routellm.routers.similarity_weighted.utils import (
19 OPENAI_CLIENT,
20 compute_elo_mle_with_tie,
21 compute_tiers,
22 preprocess_battles,
23 )
26 def no_parallel(cls):
File ~/playground/agent_2906/0507/RouteLLM/routellm/routers/matrix_factorization/model.py:4
1 import torch
2 from huggingface_hub import PyTorchModelHubMixin
----> 4 from routellm.routers.similarity_weighted.utils import OPENAI_CLIENT
6 MODEL_IDS = {
7 "RWKV-4-Raven-14B": 0,
8 "alpaca-13b": 1,
(...)
70 "zephyr-7b-beta": 63,
71 }
74 class MFModel(torch.nn.Module, PyTorchModelHubMixin):
File ~/playground/agent_2906/0507/RouteLLM/routellm/routers/similarity_weighted/utils.py:11
8 from sklearn.linear_model import LogisticRegression
10 choices = ["A", "B", "C", "D"]
---> 11 OPENAI_CLIENT = OpenAI()
14 def compute_tiers(model_ratings, num_tiers):
15 n = len(model_ratings)
File ~/anaconda3/envs/langchain/lib/python3.11/site-packages/openai/_client.py:105, in OpenAI.__init__(self, api_key, organization, project, base_url, timeout, max_retries, default_headers, default_query, http_client, _strict_response_validation)
103 api_key = os.environ.get("OPENAI_API_KEY")
104 if api_key is None:
--> 105 raise OpenAIError(
106 "The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable"
107 )
108 self.api_key = api_key
110 if organization is None:
OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable'
Can RouteLLM support Ollama?
Hello, I am facing this issue:
File "/app/app/src/router/__init__.py", line 5, in <module>
from .gateway.router import router as gateway_router
File "/app/app/src/router/gateway/router.py", line 4, in <module>
from routellm.controller import Controller
File "/usr/local/lib/python3.12/site-packages/routellm/controller.py", line 10, in <module>
from routellm.routers.routers import ROUTER_CLS
File "/usr/local/lib/python3.12/site-packages/routellm/routers/routers.py", line 17, in <module>
from routellm.routers.matrix_factorization.model import MODEL_IDS, MFModel
File "/usr/local/lib/python3.12/site-packages/routellm/routers/matrix_factorization/model.py", line 4, in <module>
from routellm.routers.similarity_weighted.utils import OPENAI_CLIENT
File "/usr/local/lib/python3.12/site-packages/routellm/routers/similarity_weighted/utils.py", line 11, in <module>
OPENAI_CLIENT = OpenAI()
^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/openai/_client.py", line 104, in __init__
raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
Of course, I can set the environment variable, but I don't want to depend on it. It would be preferable to have a way to set the base_url, model, and api_key for this client as well.
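The traceback above shows the client being constructed at module import time (OPENAI_CLIENT = OpenAI()), which is why a missing key breaks the import itself. A minimal sketch of one way to avoid that is to build the client lazily, on first use, with explicit parameters. StandInClient and get_client are hypothetical names used only for illustration; they are not part of RouteLLM or the openai package.

```python
import os
from functools import lru_cache


class StandInClient:
    """Stand-in for an API client that needs a key when it is constructed."""

    def __init__(self, api_key=None, base_url=None):
        api_key = api_key or os.environ.get("OPENAI_API_KEY")
        if api_key is None:
            raise RuntimeError("api_key must be passed or set via OPENAI_API_KEY")
        self.api_key = api_key
        self.base_url = base_url


@lru_cache(maxsize=None)
def get_client(api_key=None, base_url=None):
    """Build the client on first use instead of at import time.

    Importing this module never raises; a missing key only surfaces
    when code actually asks for the client.
    """
    return StandInClient(api_key=api_key, base_url=base_url)


if __name__ == "__main__":
    client = get_client(api_key="my-key", base_url="http://localhost:1234/v1")
    print(client.base_url)
```

With this shape, the key and base URL can come from application config rather than from the environment, and repeated calls with the same arguments reuse one client.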
After I launch the server with routers: ['mf'], startup completes, but the browser reports that the webpage at http://0.0.0.0:6060/ might be temporarily down or may have moved permanently to a new web address.
Why does every framework force everyone to use OpenAI? Please allow using two Ollama models, for example llama3:8b as the weak model and llama3:70b as the strong model. We also need support for more models: what if I want an SQL model in there for SQL queries, or something else?
The example config's model paths, such as routellm/arena_battles_embeddings, caused a conflict and an error in my tooling because the directory name is the same as the package name.
Add router support for models hosted on Amazon Bedrock.
pdm isn't happy with some of the project structure. It is easy to fix in a PR.
Hi Team,
Thanks a lot for the lib. Unfortunately, we couldn't use it on Linux or Mac.
Version
routellm==0.1.0
Error
`
from routellm.controller import Controller
Traceback (most recent call last):
File "", line 1, in
ModuleNotFoundError: No module named 'routellm.controller'
`
When initializing the RouteLLM controller with the demo, I get the errors below:
import os
from routellm.controller import Controller
os.environ["OPENAI_API_KEY"] = "sk-xxxxxxx"
os.environ["ANYSCALE_API_KEY"] = "esecret_xxxxx"
client = Controller(
routers=["mf"],
strong_model="gpt-4-1106-preview",
weak_model="anyscale/mistralai/Mixtral-8x7B-Instruct-v0.1",
)
response = client.chat.completions.create(
model="router-mf-0.11593",
messages=[
{"role": "user", "content": "Hello!"}
]
)
Traceback (most recent call last):
File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/routellm/openai_server.py", line 22, in <module>
from routellm.controller import Controller, RoutingError
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/routellm/controller.py", line 10, in <module>
from routellm.routers.routers import ROUTER_CLS
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/routellm/routers/routers.py", line 17, in <module>
from routellm.routers.matrix_factorization.model import MODEL_IDS, MFModel
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/routellm/routers/matrix_factorization/model.py", line 4, in <module>
from routellm.routers.similarity_weighted.utils import OPENAI_CLIENT
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/routellm/routers/similarity_weighted/utils.py", line 8, in <module>
from sklearn.linear_model import LogisticRegression
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/sklearn/__init__.py", line 84, in <module>
from .base import clone
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/sklearn/base.py", line 19, in <module>
from .utils._estimator_html_repr import _HTMLDocumentationLinkMixin, estimator_html_repr
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/sklearn/utils/__init__.py", line 11, in <module>
from ._chunking import gen_batches, gen_even_slices
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/sklearn/utils/_chunking.py", line 8, in <module>
from ._param_validation import Interval, validate_params
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/sklearn/utils/_param_validation.py", line 14, in <module>
from .validation import _is_arraylike_not_scalar
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/sklearn/utils/validation.py", line 26, in <module>
from ..utils._array_api import _asarray_with_order, _is_numpy_namespace, get_namespace
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/sklearn/utils/_array_api.py", line 11, in <module>
from .fixes import parse_version
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/sklearn/utils/fixes.py", line 20, in <module>
import scipy.stats
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/scipy/stats/__init__.py", line 606, in <module>
from ._stats_py import *
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/scipy/stats/_stats_py.py", line 49, in <module>
from . import distributions
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/scipy/stats/distributions.py", line 11, in <module>
from . import _discrete_distns
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/scipy/stats/_discrete_distns.py", line 10, in <module>
from scipy.interpolate import interp1d
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/scipy/interpolate/__init__.py", line 167, in <module>
from ._interpolate import *
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/scipy/interpolate/_interpolate.py", line 14, in <module>
from . import _fitpack_py
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/scipy/interpolate/_fitpack_py.py", line 8, in <module>
from ._fitpack_impl import bisplrep, bisplev, dblint # noqa: F401
File "/data/RouteLLM/.routellm/lib/python3.9/site-packages/scipy/interpolate/_fitpack_impl.py", line 103, in <module>
'iwrk': array([], dfitpack_int), 'u': array([], float),
TypeError
I use Python 3.9.5 and installed routellm with:
python3 -m venv .routellm
source .routellm/bin/activate
pip install "routellm[serve,eval]"
Please help me find the problem, thanks!
The current matrix factorization router (MFModel) is unnecessarily complex. Given that all operations in the forward pass are linear with no activations, we can significantly simplify this model.
Currently, we're doing several steps:
Since these are all linear operations, they can be collapsed into a single matrix multiplication, embedding * model. This would:
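The list of steps did not survive in the text above, so the following is only a sketch under an assumed pipeline: project the prompt embedding, take the element-wise (Hadamard) product with a per-model vector, then apply a linear classifier with no bias. All names (proj, model_vec, clf) and the toy sizes are hypothetical and are not taken from MFModel. Under that assumption, the three steps fold into a single precomputed vector per model.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def stepwise_score(proj, model_vec, clf, prompt_emb):
    """Assumed pipeline: projection, Hadamard product, linear classifier."""
    projected = matvec(proj, prompt_emb)
    mixed = [m * p for m, p in zip(model_vec, projected)]
    return dot(clf, mixed)


def collapse(proj, model_vec, clf):
    """Fold the three linear steps into one vector v with score = v . prompt_emb."""
    scale = [c * m for c, m in zip(clf, model_vec)]
    n_cols = len(proj[0])
    return [sum(scale[i] * proj[i][j] for i in range(len(proj))) for j in range(n_cols)]


random.seed(0)
d_in, d_hidden = 8, 4  # toy sizes, chosen only for the demo
proj = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_hidden)]
model_vec = [random.uniform(-1, 1) for _ in range(d_hidden)]
clf = [random.uniform(-1, 1) for _ in range(d_hidden)]
prompt_emb = [random.uniform(-1, 1) for _ in range(d_in)]

collapsed = collapse(proj, model_vec, clf)  # computed once per model
slow = stepwise_score(proj, model_vec, clf, prompt_emb)
fast = dot(collapsed, prompt_emb)
print(abs(slow - fast) < 1e-9)
```

If the real model adds bias terms, they would fold into a single scalar offset per model in the same way.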
[0.017317, 0.01118, 0.303653, 0.579448, -0.376634, -0.208742, -0.193392, 0.639659, -0.085909, 0.107312, 0.300785, -0.349391, -0.384368, -0.145022, 0.317397, -0.063074, -0.128751, 0.243364, -0.181707, 0.808825, 0.275169, 0.666149, -0.115858, 0.155953, 0.24292, -0.197154, -0.157491, 0.11632, 0.197647, 0.040279, 0.409797, -1.24056, -0.511287, -0.393113, -0.108808, 0.039914, 0.366597, 0.135737, 0.198802, 0.119974, 0.153426, -0.22505, 0.674797, 0.284063, -0.196429, 0.155066, -0.212335, -0.363016, 0.212736, 0.211674, -0.372157, 0.010955, 0.037939, -0.066029, -0.07933, -0.101132, -0.311588, 0.077285, -0.207608, 0.125983, 0.510143, -0.255973, -0.096116, 0.229892, -0.434007, 0.344456, -0.137472, 0.41125, -0.052777, 0.06959, -0.043151, -0.062137, 0.162818, 0.041656, 0.077479, 0.126347, 0.061875, 0.116124, 0.247373, 0.453157, -0.101855, 0.040579, -0.552021, 0.12112, -0.823787, -0.296899, -0.46667, 0.022095, -0.310721, -0.401873, 0.016014, 0.548683, -0.438079, 0.239599, 0.445288, 0.132415, 0.160069, 0.509489, 0.058122, 0.108559, -0.005905, -0.425724, -0.189577, 0.053441, 0.535769, -0.008355, 0.142684, 0.009374, -0.219168, 0.033156, -0.420615, -0.145288, -0.135326, -0.172469, -0.371276, -0.215616, -0.413526, -0.300192, 0.005224, -0.410654, -0.407338, -0.086193, 0.244957, -0.122847, -0.180609, -0.221066, -0.014492, -0.125457, 0.077016, -0.481223, 0.571043, -0.406598, 0.58677, -0.793018, -0.574046, 0.168964, -0.140509, 0.062438, -0.574914, 0.517542, 0.305398, -0.040312, 0.133368, 0.227152, 0.194301, -0.204837, 0.117291, 0.12243, -0.357704, -0.12873, -0.305825, 0.041006, -0.307598, 0.295055, -0.178294, -0.434973, 0.395057, 0.150889, -0.384117, 0.362086, -0.52113, 0.616021, 0.011602, -0.140464, -0.408201, 0.412563, -0.194249, -0.455087, -0.418446, 0.00887, -0.351214, 0.140768, 0.436393, 0.102469, 0.367923, -0.026261, 0.122515, -0.436895, -0.119046, -0.242179, -0.215407, 0.443827, -0.048456, -0.215488, -0.181659, 0.28717, 0.001767, 0.122558, -0.494734, 0.113986, -0.307884, 0.331145, 
0.101143, 0.196236, 0.120081, -0.567916, 0.431674, 0.210675, 0.397218, -0.003222, -0.124574, 0.163897, -0.012514, -0.437356, -0.083122, -0.277771, -0.072012, 0.215686, 0.06413, -0.13142, 0.094984, -0.38486, -0.072067, 0.495137, -0.393166, -0.230718, -0.116048, 0.712394, 0.279401, 0.238164, 0.041076, 0.148722, 0.580803, 0.614566, -0.147473, 0.432371, 0.713287, -0.012816, 0.17443, 0.122719, -0.168159, 0.062227, -0.511618, -0.242144, -0.15323, 0.176365, -0.331397, -0.130046, 0.520083, -0.236528, -0.234034, -0.201974, -0.235412, 0.408897, -0.56152, 0.197764, -0.57766, -0.745011, -0.153192, 0.378314, 0.060145, -0.132778, 0.742793, 0.08398, -0.493689, 0.071289, 0.147504, -0.078614, 0.068797, -0.36456, -0.150841, -0.128449, -0.523257, -0.515868, -0.293334, -0.1087, -0.216722, -0.37464, -0.362562, 0.145622, 0.273712, -0.309493, 0.331044, -0.169746, -0.116288, 0.106025, 0.075529, -0.081049, 0.245917, 0.180469, -0.562014, -0.29112, 0.020882, 0.134045, 0.106949, 0.326906, 0.184262, 0.028326, 0.03369, -0.251042, 0.196618, -0.420975, -0.021204, -0.00376, 0.19101, -0.335425, -0.217719, 0.111878, -0.016975, 0.30771, 0.433765, 0.150516, -0.073278, 0.171964, -0.305194, 0.080526, 0.08366, -0.170164, 0.442168, 0.106601, -0.04912, -0.071456, 0.259819, -0.111718, 0.138566, -0.60584, 0.147761, 0.152774, 0.057143, -0.759514, 0.069949, -0.825877, 0.335864, 0.199449, -0.266243, -0.403074, 0.20985, -0.188599, -0.019244, -0.375069, -0.421033, -0.1462, -0.220748, -0.061277, -0.135211, -0.141422, -0.215439, 0.091185, -0.007891, -0.274731, -0.594342, -0.451199, 0.021285, 0.272036, -0.007255, 0.172055, -0.032696, 0.376444, -0.175173, 0.255335, -0.264267, 0.083475, 0.179118, 0.091082, 0.260392, 0.171118, 0.421613, -0.558687, -0.341742, -0.279588, -0.10411, -0.058658, 0.086409, 0.492655, -0.210353, 0.551876, -0.128579, -0.1514, 0.193864, 0.246684, -0.30106, 0.512475, -0.348025, 0.269122, -0.478439, 0.593487, 0.375225, 0.332428, 0.340556, 0.264401, 0.087356, 0.632642, 0.088945, -0.560939, 0.390676, 
0.162052, 0.411639, -0.289915, -0.632261, -0.413713, 0.355988, -0.485467, 0.383603, 0.303537, -0.381534, -0.092763, 0.417598, 0.803573, -0.405532, -0.347625, 0.285436, -0.178229, -0.400952, -0.26588, 0.369776, 0.145592, -0.439068, -0.263021, -0.086975, -0.229377, 0.395024, -0.386163, 0.785113, -0.064416, 0.236719, -0.15956, 0.083073, -0.106436, 0.145759, 0.221116, 0.033651, -0.142519, -0.135173, -0.163156, 0.111862, 0.309598, 0.234952, -0.487281, 0.028245, -0.869042, 0.15329, 0.262507, 0.154243, 0.327218, -0.018497, -0.247377, -0.144596, 0.131668, 0.129669, 0.204863, 0.201405, 0.348766, -0.056677, 0.067078, -0.473868, 0.084789, 0.342684, 0.190402, -0.130603, -0.294488, -0.053648, -0.393713, -0.288915, -0.081103, 0.031378, 0.284718, 0.303025, 0.199114, 0.252887, 0.023813, 0.317988, -0.108757, 0.021954, 0.25852, 0.133369, 0.033529, 0.348155, 0.484368, -0.082799, -0.478818, 0.139661, -0.112743, 0.242792, 0.214462, -0.035537, 0.198981, 0.720355, 0.105166, 0.118839, -0.398867, 0.056433, 0.290868, -0.166652, -0.187253, -0.025869, 0.348516, -0.194342, -0.018384, -0.707275, -0.212753, -0.342487, 0.366088, 0.502167, -0.481192, 0.002395, -0.303666, -0.496359, -0.437127, 0.070926, -0.164983, 0.493785, 0.254769, -0.330188, 0.69813, 0.110217, -0.165885, -0.09634, 0.056251, 0.016961, 0.357136, -0.484442, 0.201198, 0.066619, -0.262838, -0.326917, 0.323244, 0.05816, 0.367642, -0.142585, -0.375839, -0.334615, 0.190125, -0.029278, 0.099811, 0.143965, -0.275176, -0.58817, -0.144267, -0.359548, -0.310475, 0.534627, -0.576121, -0.194085, 0.052187, -0.187249, -0.235678, -0.122017, 0.375999, -0.086289, -0.127235, 0.080998, 0.181145, 0.157067, -0.26179, -0.451746, -0.135946, -0.236, 0.052646, 0.336281, 0.21719, 0.186457, -0.002216, -0.056215, -0.369369, 0.442009, -0.228632, -0.175233, -0.292619, 0.18085, -0.222465, -0.071054, -0.036178, 0.42584, -0.052242, -0.186202, 0.438149, -0.189797, -0.16556, 0.036239, -0.02704, -0.254496, -0.069539, -0.232275, -0.695319, 0.460565, 0.33609, 0.51992, 
-0.283644, 0.143016, 0.185549, 0.012047, -0.222176, -0.130095, -0.261126, -0.422626, 0.286046, 0.318453, -0.25702, 0.280548, 0.066077, 0.205378, 0.221395, 0.134313, 0.202538, -0.112085, 0.112352, -0.311995, -0.114661, -0.305415, 0.163122, -0.162758, 0.064207, 0.100317, -0.297041, 0.153704, 0.412633, -0.236838, -0.213884, 0.043544, 0.078991, 0.026837, 0.399776, -0.292028, -0.702604, 0.238641, -0.057664, -0.338922, 0.101509, -0.030345, -0.092672, 0.189603, -0.184702, -0.224473, 0.232278, 0.167241, 0.204301, -0.074669, -0.31327, -0.069146, 0.169052, 0.34982, 0.001693, 0.495445, 0.169925, -0.079298, -0.00096, 0.068827, -0.110808, 0.049159, -0.156822, 0.033281, -0.138699, 0.064114, -0.183973, 0.299447, 0.020633, -0.394375, 0.22391, 0.29888, -0.162223, -0.154018, 0.0686, 0.091588, 0.010075, 0.177063, 0.337276, -0.258455, -0.172135, -0.309286, 0.11186, -0.063176, -0.131384, -0.117094, -0.025922, 0.217625, 0.064211, 0.097853, 0.21063, 0.209421, -0.003702, -0.12937, 0.568447, 0.056538, 0.071752, 0.131685, 0.265961, 0.13205, -0.342845, -0.14158, 0.327599, 0.206992, 0.380256, -0.092596, -0.077388, -0.19744, 0.0181, 0.287433, 0.088687, 0.097779, -0.044891, -0.404558, 0.147617, 0.422414, 0.11152, 0.308355, -0.106925, 0.204491, 0.043149, 0.065036, -0.753266, 0.122351, 0.336833, -0.00801, -0.262349, -0.193282, -0.103019, -0.089863, 0.171337, 0.309414, 0.014423, 0.098344, -0.110209, -0.169665, -0.030896, -0.097471, 0.00666, 0.101595, 0.061852, 0.176964, -0.21323, -0.099782, 0.228022, -0.262198, -0.425247, 0.417079, 0.017299, -0.191564, 0.004748, -0.250221, 0.234701, -0.271065, -0.057453, 0.304677, 0.4701, 0.250589, -0.087086, -0.429968, -0.26403, -0.387913, -0.464612, -0.342326, -0.071384, 0.056032, 0.187852, 0.380555, 0.189432, 0.34011, 0.266143, 0.009143, -0.317522, -0.234059, 0.276891, 0.174809, 0.140528, -0.105288, -0.65848, 0.084518, -0.234592, 0.318019, 0.510351, 0.006479, 0.537869, -0.392096, -0.411233, -0.189889, 0.134191, -0.075683, -0.169409, 0.125705, -0.327027, 
-0.066445, -0.52144, -0.097577, -0.177766, 0.232948, -0.135097, -0.343601, -0.091137, 0.062618, 0.053287, 0.18644, -0.6094, 0.048837, 0.267879, -0.413453, -0.141747, 0.207981, -0.04925, -0.174698, -0.509869, -0.476397, 0.068638, -0.152651, 0.104868, 0.197331, -0.064872, -0.1051, -1.40418, -0.194817, 0.208227, -0.045253, -0.232286, 0.073835, 0.12477, 0.393212, 0.347051, -0.187002, 0.079182, -0.27366, -0.215268, 0.375153, 0.270839, -0.334651, -0.126299, 0.34891, -0.174526, 0.234166, -0.317101, 0.057596, -0.157946, 0.15384, 0.16841, 0.158807, -0.192711, 0.192967, -0.262208, 0.108206, 0.238273, 0.236885, -0.399003, 0.221671, 0.038937, -0.107384, 0.288186, 0.160961, -0.086901, 0.055572, -0.190251, -0.233012, -0.054056, -0.080065, 0.111019, -0.044721, 0.036763, 0.068096, -0.017873, 0.261569, 0.346434, 0.065229, -0.023851, -0.330086, 0.213761, 0.128141, -0.138356, -0.062674, 0.195684, 0.215495, 0.194634, -0.339133, -0.268465, -0.298594, -0.362164, -0.253306, -0.168292, 0.199113, -0.524123, -0.090773, -0.096247, 0.046664, -0.046513, 0.13497, 0.114262, -0.488398, -0.2347, 0.26051, 0.031243, -0.152594, 0.258885, -0.064539, -0.176934, -0.027078, 0.197796, -0.050404, 0.004199, -0.020745, -0.127675, 0.053641, 0.515427, 0.131214, 0.353022, 0.284469, 0.01992, 0.120054, -0.318418, -0.026164, 0.306722, 0.035191, 0.425452, 0.046934, 0.010072, -0.134704, -0.118026, 0.033954, 0.444288, 0.004718, 0.035425, -0.030341, 0.394551, -0.165347, -0.115437, -0.017297, -0.585792, 0.17584, 0.377414, 0.421793, 0.188193, 0.307312, 0.610973, -0.196335, -0.29751, -0.105334, 0.199592, -0.195532, -0.095663, 0.142824, 0.130411, -0.080841, 0.202719, 0.471838, -0.072826, 0.246151, 0.109777, -0.101721, 0.169312, 0.54931, -0.074526, 0.021988, -0.096728, -0.223985, -0.058271, 0.23175, -0.332564, 0.169538, -0.225755, 0.046639, 0.136866, -0.158008, 0.114861, 0.065593, -0.117845, 0.490567, -0.378452, 0.408763, 0.048036, 0.315145, -0.041749, 0.309414, 0.031155, 0.347439, -0.051953, -0.201888, 0.179567, 0.17787, 
0.152476, -0.050791, 0.420996, -0.111863, 0.110077, 0.268456, -0.074361, -0.144558, 0.119518, 0.188343, 0.396397, -0.381355, 0.012706, 0.245918, 0.26378, 0.207468, 0.06862, 0.268775, 0.503796, -0.042588, 0.299801, 0.264099, 0.567906, 0.343754, 0.112813, -0.058419, 0.151873, 0.105714, 0.013268, -0.104881, 0.179048, 0.103319, 0.155907, -0.207802, -0.594822, 0.001902, 0.334797, -0.128813, 0.02412, 0.158227, 0.232278, -0.168783, -0.101024, 0.001426, -0.334838, -0.25871, -0.281469, 0.175912, 0.173545, 0.199818, 0.156694, -0.202074, -0.528855, 0.341782, -0.294037, -0.567092, 0.042527, 0.229844, -0.274017, 0.111275, 0.022757, -0.276101, 0.432179, 0.322151, -0.11445, 0.865446, 0.367544, 0.267589, 0.00913, -0.410267, 0.137246, -0.013712, 0.620266, -0.091809, -0.297659, -0.373554, 0.207084, -0.421513, -0.183964, -0.156403, 0.219091, -0.508866, 0.516564, -0.361563, -0.201876, 0.202988, 0.183052, -0.22674, 0.057602, 0.041183, -0.211405, 0.247517, 0.204372, 0.042675, -0.214661, -0.111943, 0.009249, -0.014273, -0.351459, 0.070249, -0.315316, 0.133022, -0.073426, -0.180068, -0.333467, -0.067528, 0.357887, 0.430013, 0.131229, 0.298485, 0.373571, -0.302588, -0.04142, -0.344667, -0.283525, 0.640575, 0.317337, 0.401381, 0.189486, 0.073186, 0.02416, -0.215443, 0.056143, 0.120336, -0.231008, -0.105986, -0.453503, -0.219785, -0.030274, -0.367342, -0.113358, 0.196147, 0.291157, 0.326472, 0.446857, -0.085561, 0.010959, 0.066616, 0.15023, -0.209559, -0.112984, 0.072598, -0.427699, -0.260073, 0.032521, 0.081192, -0.014159, 0.143266, 0.197289, 0.067981, 0.173343, -0.155237, 0.193014, -0.033441, -0.270513, 0.12482, -0.140087, -0.524852, -0.142413, -0.197585, 0.069683, 0.00106, -0.060416, 0.241788, -0.273508, 0.014679, -0.066452, -0.355985, -0.262008, 0.26785, -0.009632, 0.163352, -0.068926, 0.46138, -0.317769, -0.397394, 0.224559, 0.352467, -0.097191, -0.287376, 0.408935, 0.345993, 0.09068, 0.2473, 2.3111, -0.14702, -0.111799, -0.052716, 0.230692, 0.225265, -0.35181, 0.094639, -0.154193, 
0.185283, -0.315491, -0.077438, 0.24265, -0.103315, -0.156623, -0.086985, -0.316301, 0.000796, -0.025065, -0.097864, -0.362233, -0.448295, -0.403811, 0.258856, -0.100113, -0.055167, 0.294756, 0.024366, 0.102181, -0.106253, 0.023481, 0.160745, 0.063656, 0.155556, -0.336469, 0.325614, -0.266145, -0.074525, 0.201849, 0.441004, -0.174538, 0.131324, 0.284181, -0.261139, 0.098757, -0.019434, -0.194059, -0.108849, -0.072083, -0.093592, -0.285213, -0.176247, 0.069006, 0.297378, -0.025485, 0.268425, -0.101778, 0.018244, 0.776521, 0.297483, 0.251349, -0.167599, -0.30711, 0.070886, 0.01418, 0.285411, -0.430578, -0.237813, 0.059797, 0.027026, -0.0401, 0.143306, -0.469388, 0.055392, 0.137084, 0.284571, 0.189084, -0.405384, 0.135162, -0.680802, -0.434545, -0.210474, 0.30213, 0.114895, 0.167591, -0.307093, -0.255949, 0.242898, 0.187186, 0.3594, -0.125649, 0.174752, 0.301497, -0.150837, 0.118552, 0.144685, 0.023964, 0.20746, -0.186843, 0.230801, 0.11998, 0.099391, -0.390997, 0.242291, -0.209336, -0.369022, 0.225537, -0.254627, -0.19489, 0.007398, 0.30297, -0.100568, -0.039901, -0.267365, 0.17685, 0.032181, -0.051405, -0.003954, 0.061989, -0.398622, -0.102953, 0.230554, 0.369276, -0.32691, 0.121757, 0.282954, 0.275177, 0.301383, -0.048143, -0.102173, 0.270449, 0.326503, 0.356696, 0.198148, 0.566387, 0.118633, 0.069914, 0.049507, 0.264942, -0.021149, -0.315653, 0.195143, -0.037403, -0.560274, 0.036958, 0.226462, -0.187307, 0.00932, 0.06245, 0.158091, -0.02271, 0.303259, -0.281134, 0.229444, 0.202054, -0.022002, -0.175618, -0.035272, -0.416639, -0.079588, -0.190756, 0.237299, 0.128946, -0.025495, 0.31631, 0.165038, -0.036987, -0.056892, -0.472618, -0.240427, 0.258912, 0.142983, -0.017613, 0.09934, 0.301944, -0.317137, -0.045731, 0.176888, -0.237915, 0.034828, -0.244753, -0.262084, 0.007381, 0.179293, 0.012775, 0.134795, -0.16332, -0.444582, -0.080167, 0.024672, -0.090209, -0.09143, 0.177423, 0.066397, -0.464973, 0.473688, 0.156524, -0.011874, -0.018553, 0.049021, -0.058733, -0.16094, 
-0.055641, 0.084314, -0.180604, -0.147321, 0.507487, 0.259353, 0.214523, 0.136566, 0.10569, -0.117942, 0.207137, 0.524199, 0.176873, 0.319673, 0.065076, 0.200993, 0.067377, -0.128274, -0.148678, -0.369512, -0.073067, 0.022234, -0.376015, -0.161213, -0.004808, -0.385252, -0.063738, 0.172607, -0.040167, -0.120519, 0.296494, -0.195137, 0.055634, 0.323904, -0.638334, -0.255347, -0.100382, 0.251132, -0.055979, 0.004391, -0.289993, -0.004406, 0.050617, 0.410566, 0.452379, -0.556643, 0.081581, 0.137408, 0.254382, 0.251986, 0.082583, -0.024478, -0.477649, 0.310222, 0.211715, 0.022005, 0.063267, -0.130571, 0.155438, 0.380635, 0.231092, 0.099042, -0.391679, -0.058661, -0.540002, -0.358878, -0.324142, 0.243863, -0.400055, 0.103157, -0.262598, -0.044676, -0.444585, 0.030034, 0.01668, 0.311564, 0.543531, -0.047709, -0.113976, -0.304748, -0.150807, -0.274888, 0.024604, -0.183968, 0.024504, 0.393683, -0.430544, -0.323938, 0.306146, -0.039433, -0.189903, 0.057104, 0.19676, 0.036725, 0.079969, -0.205473, -0.314785, 0.030175, -0.049927, 0.061419, -0.36235, -0.056072, 0.159138, 0.456674, 0.007084, 0.441482, -0.175448, 0.061765, 0.412505, -0.402356, -0.084174, 0.085337, -0.180057, 0.284374, 0.031825, 0.15114, 0.045856, 0.362218, 0.371848, 0.142496, 0.376347, 0.309523, 0.437986, -0.178713, -0.200895, -0.046065, 0.183416, -0.31115, 0.299963, -0.005362, 0.397519, -0.025268, 0.382294, -0.424654, -0.169118, 0.246686, -0.017109, -0.480841, -0.132066, 0.066515, -0.014366, 0.487456, -0.023139, 0.006938, 0.314802, 0.340747, -0.010792, 0.064729, 0.304637, 0.072488, -0.257531, -0.164407, -0.238009, 0.251726, 0.442151, -0.439882, -0.096664, 0.030146, -0.100694, -0.168094, -0.193923, 0.46795, 0.080172, 0.063586, -0.328571, -0.16416, -0.259619, 0.293085, -0.279067, 0.232538, 0.033095, -0.198362, -0.305268, -0.361208, 0.034213, 0.427696, -0.033954, -0.227259, 0.01694, -0.551509, -0.055286, -0.099024, 0.267421, 0.104194, 0.000865, -0.088973, 0.200319]
Can I ask if we must have an OpenAI API to use RouteLLM? Currently, I'm using Groq as the weak model and Anthropic as the strong model, and it shows this error:
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable.
What does this threshold value actually mean?
routed_model = client.route(
prompt="What's the squareroot of 144?",
router="bert",
threshold=0.4066,
)
print(f"Prompt should be routed to {routed_model}")
[For 50.0% strong model calls for bert, threshold = 0.4066] What exactly does this imply? Does it mean that out of 10 queries, 50% will be routed to the strong model?
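One way to read that calibration message, sketched here under assumptions: the router gives each query a score, a query goes to the strong model when its score is at or above the threshold, and the threshold is picked so that the stated share of a calibration set clears it. The function names, scores, and model labels below are hypothetical toy values, not RouteLLM's own code or data. Under that reading, the 50% holds for the calibration set, not for any particular batch of 10 queries.

```python
import random


def calibrate_threshold(scores, strong_fraction):
    """Pick the cutoff so that about `strong_fraction` of `scores` are >= it."""
    ordered = sorted(scores, reverse=True)
    k = max(1, round(strong_fraction * len(ordered)))
    return ordered[k - 1]


def route(score, threshold):
    """Send a query to the strong model when its score clears the threshold."""
    return "strong-model" if score >= threshold else "weak-model"


def strong_share(scores, threshold):
    """Fraction of queries that would be routed to the strong model."""
    routed = [route(s, threshold) for s in scores]
    return routed.count("strong-model") / len(routed)


random.seed(0)
# Toy calibration scores standing in for router outputs on a reference dataset.
calibration_scores = [random.random() for _ in range(1000)]
threshold = calibrate_threshold(calibration_scores, 0.5)

# On the calibration set the split matches the target fraction.
share_calibration = strong_share(calibration_scores, threshold)

# A workload of easier queries (lower scores) sends far fewer to the strong model.
easy_scores = [s * 0.5 for s in calibration_scores]
share_easy = strong_share(easy_scores, threshold)

print(share_calibration, share_easy)
```

So if a workload's queries differ from the calibration data, the observed share of strong model calls can be well above or below the nominal percentage.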
For my use case, I have different models running on different servers which all replicate the OpenAI completions endpoint. However, from what I can see, it is currently not possible to use both the default OpenAI base URL and a custom server's base URL when defining the controller (or two separate server base URLs). There is functionality to create a Controller client from within a local server running a model, but what if there is no access to run the RouteLLM code from within the server hosting the model?
It would be great if the controller could be provided with a separate base URL for a strong and weak model as, from my understanding, right now, if a base_url is provided, it overrides the base_url used for both the strong and weak model.
Example Use Case:
I want to use an OpenAI model like GPT-4o (using the OpenAI completions endpoint) and an open-source model like Mistral running on a custom server with a custom URL (replicating the OpenAI completion endpoint).
Actual Behavior:
Steps to Reproduce:
Define a controller without a base_url parameter.
Attempt to call a model (e.g., Mistral) hosted on a custom server with its own URL.
Define a controller with a custom base_url.
Attempt to call an OpenAI model (e.g., GPT-4o).
Current Workaround:
I am currently solving this in a rudimentary way by checking whether the model called in the Controller.completion function (found within kwargs["model"]) matches the strong_model string or the weak_model string and using the corresponding base_url I've provided for each model.
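The workaround described above could look roughly like the following sketch. The helper names pick_base_url and with_base_url are hypothetical, and the assumption that the downstream completion call accepts a base_url keyword is mine; the sketch only shows the string comparison on kwargs["model"].

```python
def pick_base_url(model, strong_model, weak_model,
                  strong_base_url=None, weak_base_url=None):
    """Return the base URL configured for whichever model was selected."""
    if model == strong_model:
        return strong_base_url
    if model == weak_model:
        return weak_base_url
    raise ValueError(f"unexpected model: {model!r}")


def with_base_url(kwargs, strong_model, weak_model,
                  strong_base_url=None, weak_base_url=None):
    """Return a copy of the completion kwargs with base_url set per model."""
    updated = dict(kwargs)
    base_url = pick_base_url(
        kwargs["model"], strong_model, weak_model,
        strong_base_url, weak_base_url,
    )
    if base_url is not None:
        # None means "use the provider's default endpoint".
        updated["base_url"] = base_url
    return updated


if __name__ == "__main__":
    call = with_base_url(
        {"model": "openai/mistral", "messages": []},
        strong_model="gpt-4o",
        weak_model="openai/mistral",
        weak_base_url="http://custom-endpoint.com/v1",
    )
    print(call["base_url"])
```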
Proposed Solution:
Introduce functionality in the Controller class to allow specifying separate base URLs for the strong and weak models.
controller = Controller(
strong_model="gpt-4o",
weak_model="openai/mistral",
strong_model_base_url="https://api.openai.com/v1",  # or just None
weak_model_base_url="http://custom-endpoint.com/v1"
)
Not sure if I am just misunderstanding something and this functionality does exist. Thank you!
What is the significance of using the Hadamard product between the model embedding and the query embedding in the Matrix Factorization router? Why not compute the score directly using the dot product?
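One way to frame the difference, offered as a sketch and not as the authors' stated reasoning: a plain dot product weighs every embedding dimension equally, whereas an element-wise (Hadamard) product, assuming it is then passed through a learned linear layer, gives a weighted sum in which each dimension has its own weight. The dot product is the special case where all weights equal 1. The vectors and weights below are made up for illustration.

```python
def dot(a, b):
    """Plain dot product: every dimension contributes with weight 1."""
    return sum(x * y for x, y in zip(a, b))


def hadamard(a, b):
    """Element-wise product, keeping one value per embedding dimension."""
    return [x * y for x, y in zip(a, b)]


def hadamard_score(model_vec, query_vec, weights):
    """Weighted sum over the element-wise product (a linear layer without bias)."""
    return dot(weights, hadamard(model_vec, query_vec))


model_vec = [1.0, 2.0, 3.0]
query_vec = [4.0, 5.0, 6.0]

plain = dot(model_vec, query_vec)
uniform = hadamard_score(model_vec, query_vec, [1.0, 1.0, 1.0])
reweighted = hadamard_score(model_vec, query_vec, [0.5, 0.0, 2.0])
```

With uniform weights the two scores coincide; with non-uniform weights the second dimension is ignored and the third is emphasised, which a bare dot product cannot express.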
Hi developers, thanks for your effort on this project!
I have a question: in the paper, when calculating router monetary costs, were the costs of the OpenAI embedding model (i.e., text-embedding-3-small) included?
Hello,
I do not clearly understand the format and source of the datasets used to train these routers. They are said to be published on Hugging Face, but, for example, I can't find the dataset used to train routellm/mf_gpt4_augmented. As I understand from the code in train_matrix_factorization.py, there has to be a JSON dataset with the keys idx, model_a, model_b, and winner, but there is no such dataset on Hugging Face. Could you clarify the format and the creation of the dataset used for MF training?
I got a NumPy error that required me to pin a lower version of NumPy. PR inc
The server fails for some clients/chat interfaces trying to connect to it because it doesn't respond to preflight requests (CORS), so the client never POSTs its completion requests.
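To make the shape in question concrete, here is a small sketch that checks whether records carry the four keys named above. Only the key names come from the question; the record values, including the encoding of `winner`, are hypothetical placeholders, and `missing_keys` is an illustrative helper, not part of the training script.

```python
import json

# Key names as described in the question; everything else is illustrative.
REQUIRED_KEYS = {"idx", "model_a", "model_b", "winner"}


def missing_keys(record):
    """Return the required keys that a record lacks, in sorted order."""
    return sorted(REQUIRED_KEYS - record.keys())


# Hypothetical sample data: the second record is deliberately incomplete.
raw = """
[
  {"idx": 0, "model_a": "model-x", "model_b": "model-y", "winner": "model_a"},
  {"idx": 1, "model_a": "model-x"}
]
"""
records = json.loads(raw)
problems = {i: missing_keys(r) for i, r in enumerate(records) if missing_keys(r)}
print(problems)
```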
The following server output is observed:
INFO: 127.0.0.1:13579 - "OPTIONS /dashboard/billing/usage?start_date=2024-08-01&end_date=2024-08-07 HTTP/1.1" 404 Not Found
INFO: 127.0.0.1:13580 - "OPTIONS /dashboard/billing/subscription HTTP/1.1" 404 Not Found
INFO: 127.0.0.1:13590 - "OPTIONS /v1/v1/chat/completions HTTP/1.1" 404 Not Found
INFO: 127.0.0.1:13645 - "OPTIONS /v1/chat/completions HTTP/1.1" 405 Method Not Allowed
INFO: 127.0.0.1:13655 - "OPTIONS /dashboard/billing/usage?start_date=2024-08-01&end_date=2024-08-07 HTTP/1.1" 404 Not Found
INFO: 127.0.0.1:13656 - "OPTIONS /dashboard/billing/subscription HTTP/1.1" 404 Not Found
PR incoming
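For context, a CORS preflight is an OPTIONS request that the browser sends before the real POST, and the server has to answer it with the matching `Access-Control-Allow-*` headers. The sketch below shows that answer in plain Python so the mechanics are visible; `preflight_response` is a hypothetical helper, header lookup is simplified to exact-case dictionary keys, and the rejection status is an arbitrary choice. In a FastAPI application this is usually delegated to `fastapi.middleware.cors.CORSMiddleware` instead of being written by hand.

```python
def preflight_response(request_headers, allowed_origins, allowed_methods=("POST", "OPTIONS")):
    """Return (status, headers) for an OPTIONS preflight request."""
    origin = request_headers.get("Origin")
    requested_method = request_headers.get("Access-Control-Request-Method")

    # Without these two headers the request is not a CORS preflight.
    if origin is None or requested_method is None:
        return 400, {}
    if "*" not in allowed_origins and origin not in allowed_origins:
        return 400, {}
    if requested_method not in allowed_methods:
        return 400, {}

    headers = {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": ", ".join(allowed_methods),
        # Echo back the headers the client says it wants to send.
        "Access-Control-Allow-Headers": request_headers.get(
            "Access-Control-Request-Headers", ""
        ),
        "Vary": "Origin",
    }
    return 204, headers


status, headers = preflight_response(
    {
        "Origin": "http://localhost:3000",
        "Access-Control-Request-Method": "POST",
        "Access-Control-Request-Headers": "authorization, content-type",
    },
    allowed_origins={"*"},
)
```

Once the preflight succeeds, the browser proceeds with the actual POST to the completions endpoint.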
I set up my server successfully, but when I ask even a simple question the server responds with the following error:
INFO: 127.0.0.1:63236 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\uvicorn\protocols\http\httptools_impl.py", line 399, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\uvicorn\middleware\proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\fastapi\applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\middleware\errors.py", line 186, in __call__
raise exc
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\middleware\errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\middleware\exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
raise exc
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\routing.py", line 776, in app
await route.handle(scope, receive, send)
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\routing.py", line 297, in handle
await self.app(scope, receive, send)
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
raise exc
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\starlette\routing.py", line 72, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\fastapi\routing.py", line 278, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\fastapi\routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bisrat\Desktop\Bisrat files\RouteLLM\routellm\openai_server.py", line 153, in create_chat_completion
routed_model = route_fn(prompt, threshold, ROUTED_PAIR)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bisrat\Desktop\Bisrat files\RouteLLM\routellm\routers\routers.py", line 42, in route
if self.calculate_strong_win_rate(prompt) >= threshold:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bisrat\Desktop\Bisrat files\RouteLLM\routellm\routers\routers.py", line 235, in calculate_strong_win_rate
winrate = self.model.pred_win_rate(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bisrat\Desktop\Bisrat files\RouteLLM\routellm\routers\matrix_factorization\model.py", line 124, in pred_win_rate
logits = self.forward([model_a, model_b], prompt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bisrat\Desktop\Bisrat files\RouteLLM\routellm\routers\matrix_factorization\model.py", line 113, in forward
OPENAI_CLIENT.embeddings.create(input=[prompt], model=self.embedding_model)
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\resources\embeddings.py", line 114, in create
return self._post(
^^^^^^^^^^^
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 1261, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 942, in request
return self._request(
^^^^^^^^^^^^^^
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 1026, in _request
return self._retry_request(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 1074, in _retry_request
return self._request(
^^^^^^^^^^^^^^
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 1026, in _request
return self._retry_request(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 1074, in _retry_request
return self._request(
^^^^^^^^^^^^^^
File "C:\Users\Bisrat\AppData\Local\Programs\Python\Python312\Lib\site-packages\openai\_base_client.py", line 1041, in _request
raise self._make_status_error_from_response(err.response) from None
openai.RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}
I don't understand how it could report that the quota is exceeded before returning even a single response. Please help me navigate this error.
I would like to know how the BERT classifier decides between the strong and weak models. Where can I see how bert_gpt4_augmented works?
First of all, I want to congratulate you on this project. I think it is excellent. I would like to integrate this functionality with my existing workflow. I have a gateway that integrates multiple providers, and I need to know which model would be the best to call, instead of making a request through this API.
Therefore, I believe it would be beneficial to have an endpoint that simply indicates which model is the best. I can develop this feature myself; I just want to know whether you also think it would be a good feature, so that I can open a PR to contribute it.
openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 8977 tokens (8977 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
I am using RouteLLM with OpenRouter, and this is the error I get when using Google's Gemini Pro.
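The error reports 8977 prompt tokens against a limit of 8192. Which component enforced that limit is not established here, but one general workaround is to shorten the text before it is sent. The sketch below keeps the most recent messages that fit a token budget. `trim_to_budget` is a hypothetical helper, and `rough_count` counts whitespace-separated words as a crude stand-in for a real tokenizer, so a real budget should leave some headroom.

```python
def trim_to_budget(messages, max_tokens, count_tokens):
    """Keep the most recent messages whose combined token count fits the budget."""
    kept = []
    used = 0
    # Walk backwards so the newest messages are kept first.
    for message in reversed(messages):
        cost = count_tokens(message["content"])
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def rough_count(text):
    """Crude approximation: one token per whitespace-separated word."""
    return len(text.split())


history = [
    {"role": "user", "content": "first question " * 3},      # 6 words
    {"role": "assistant", "content": "a long answer " * 4},  # 12 words
    {"role": "user", "content": "latest question"},          # 2 words
]
trimmed = trim_to_budget(history, max_tokens=15, count_tokens=rough_count)
```

Here the oldest message is dropped because including it would push the total to 20, above the budget of 15.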
It looks like this only supports two models, a strong and a weak model. But there are other things to consider like if privacy is a concern, or if the question is math heavy, or if the question has a visual element, etc.
Why not have a RouteLLM that could route to several arbitrary models (including local, self-hosted, or models as a service like GPT-4)?
It would also help to provide some example training scripts and/or a training guide that we could use to fine-tune this.
How can RouteLLM be used with LangChain?
Are there any ideas or plans to support Ollama?
Maybe by following a similar approach to TextGrad, which runs perfectly on an embedded device like the Jetson Orin.
PS. Awesome work, thanks for sharing.
File "/home/andreas/PycharmProjects/RouteLLM/.venv/lib/python3.10/site-packages/starlette/routing.py", line 732, in lifespan
async with self.lifespan_context(app) as maybe_state:
File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/home/andreas/PycharmProjects/RouteLLM/routellm/openai_server.py", line 40, in lifespan
ROUTERS_MAP[router] = ROUTER_CLS[router](**router_config)
File "/home/andreas/PycharmProjects/RouteLLM/routellm/routers/routers.py", line 231, in __init__
self.strong_model_id = MODEL_IDS[strong_model]
KeyError: 'gpt-4o'
I see there is a predefined list of LLMs in the MODEL_IDS dictionary.
Is there a way to specify arbitrary models?
For example, in my use case I want to use gpt-4o as the strong model and Qwen/Qwen2-72B-Instruct as the weak model, with Together.ai as the inference provider.
Is there a methodology to generate matrix factorization data for any model pair?
For other models, RouteLLM supports any provider that has an OpenAI-compatible interface, which includes a wide-range of both closed and open-source models running locally or in the cloud. Once you have an OpenAI-compatible endpoint, set the --alt-base-url and --alt-api-key flags to point to your endpoint. e.g. For Anyscale Endpoints,
I followed the instructions above to use my Ollama instance, but it doesn't work at all. The error I get when sending requests to the server on port 6060 (/v1) still seems to point to OpenAI. Am I doing something wrong?
Command: 'python -m routellm.openai_server --routers mf --config /Users/dali/PycharmProjects/simply_crawler/configuration/routellm_config.yaml --alt-base-url http://localhost:11434/v1/ --alt-api-key ollama'
The following is the error content:
Launching server with routers: ['mf']
INFO: Started server process [18063]
INFO: Waiting for application startup.
Loading mf: 100%|██████████| 1/1 [00:00<00:00, 1.54it/s]
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:6060 (Press CTRL+C to quit)
INFO: 127.0.0.1:58563 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 412, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/starlette/routing.py", line 758, in __call__
await self.middleware_stack(scope, receive, send)
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/starlette/routing.py", line 778, in app
await route.handle(scope, receive, send)
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/starlette/routing.py", line 299, in handle
await self.app(scope, receive, send)
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/starlette/routing.py", line 79, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/starlette/routing.py", line 74, in app
response = await func(request)
^^^^^^^^^^^^^^^^^^^
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/routellm/openai_server.py", line 175, in create_chat_completion
routed_model = route_fn(prompt, threshold, ROUTED_PAIR)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/routellm/routers/routers.py", line 42, in route
if self.calculate_strong_win_rate(prompt) >= threshold:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/routellm/routers/routers.py", line 235, in calculate_strong_win_rate
winrate = self.model.pred_win_rate(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/routellm/routers/matrix_factorization/model.py", line 124, in pred_win_rate
logits = self.forward([model_a, model_b], prompt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/routellm/routers/matrix_factorization/model.py", line 113, in forward
OPENAI_CLIENT.embeddings.create(input=[prompt], model=self.embedding_model)
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/openai/resources/embeddings.py", line 114, in create
return self._post(
^^^^^^^^^^^
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/openai/_base_client.py", line 1240, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/openai/_base_client.py", line 921, in request
return self._request(
^^^^^^^^^^^^^^
File "/Users/dali/PycharmProjects/simply_crawler/venv/lib/python3.11/site-packages/openai/_base_client.py", line 1020, in _request
raise self._make_status_error_from_response(err.response) from None
openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: ollama. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}
INFO: 127.0.0.1:58565 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO: 127.0.0.1:58566 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
I am trying to use Azure OpenAI, but I get this error:
raise OpenAIError(
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
Also, can we use multiple models instead of only two, one strong and one weak?
I am using a matrix factorization (mf) router in a RAG application and want to download the MF model to my local system. Is this possible? Could you also explain how it works internally? Additionally, can we change the embedding model used by the MF router?
Great job with the current routing setup!
I’m wondering whether the routing capabilities could be expanded to include multiple LiteLLM models. Currently, it seems we can only route between one strong and one weak model.
Here’s why it would be beneficial, for example:
Microsoft-PHI: Useful for enterprise tasks and Microsoft integrations.
Google-Gemma: Great for tasks that involve Google’s ecosystem.
Meta-Llama3: Ideal for open-source and research-based queries.
Azure-GPT: Well suited to projects, troubleshooting services, or offering guidance on optimizing cloud infrastructure.
As we have today, a strong model like OpenAI GPT would still handle complex queries that need high-quality responses.
Supporting these models would help optimize costs and improve how we handle various data types and queries. Is there a way to integrate this with RouteLLM? Any advice or guidance would be greatly appreciated. Please also let me know when this might be published.