Based on the following paper.
The problem we're addressing is that LLMs like ChatGPT change unexpectedly, with zero transparency.
The nature of these changes depends on the task you are doing. The result is breaking changes to applications that are hard to plan for, which is frustrating, time-consuming and costly for application developers.
![The nature of LLM Drift](https://private-user-images.githubusercontent.com/41707476/313469071-92177932-c4e9-42ba-abc0-ac77da2647bc.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjM3MTA3MjMsIm5iZiI6MTcyMzcxMDQyMywicGF0aCI6Ii80MTcwNzQ3Ni8zMTM0NjkwNzEtOTIxNzc5MzItYzRlOS00MmJhLWFiYzAtYWM3N2RhMjY0N2JjLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA4MTUlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwODE1VDA4MjcwM1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTdmMmU0MDEzODM2MzM5YTcwMDMwMDE1ZTYyYmNiNGFmMzE1YTJlMmZiM2JjYTdkYTc2NGU1YThkNjg4N2VhMTcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Z1npMeMAEZpZKOXDDOXmtXHz6jUJGAeGVyF8d9OEweM)
Current solutions like Hugging Face's Open LLM Leaderboard don't keep a history, and they require you to trust Hugging Face, since it is the one running the benchmarks and publishing the results.
We solve this by using Galadriel, which leverages teeML to query ChatGPT and other AI models trustlessly. On top of Galadriel, we have written a generic framework for writing benchmarks.
With this framework, it's very easy to add new benchmarks, and each one is automatically applied to all the LLMs being tracked. For example, I managed to write a new benchmark for counting happy numbers in just 5 minutes.
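To give a feel for what a happy-number benchmark involves, here is a minimal sketch in Python. It is not the project's actual framework code: `ask_model` is a hypothetical stand-in for whatever function sends a prompt to a model (for example through Galadriel) and returns the reply as text, and the prompt wording and scoring rule are assumptions made for illustration.

```python
from typing import Callable, Iterable, Tuple


def is_happy(n: int) -> bool:
    """Return True if repeatedly summing the squared digits of n reaches 1."""
    seen = set()
    while n != 1 and n not in seen:
        seen.add(n)
        n = sum(int(d) ** 2 for d in str(n))
    return n == 1


def count_happy(lo: int, hi: int) -> int:
    """Ground truth: how many happy numbers lie in [lo, hi] inclusive."""
    return sum(1 for n in range(lo, hi + 1) if is_happy(n))


def happy_number_benchmark(
    ask_model: Callable[[str], str],  # hypothetical: prompt in, reply text out
    ranges: Iterable[Tuple[int, int]],
) -> float:
    """Score a model as the fraction of ranges it counts correctly."""
    ranges = list(ranges)
    correct = 0
    for lo, hi in ranges:
        prompt = (
            f"How many happy numbers are there between {lo} and {hi} "
            "inclusive? Reply with a single integer."
        )
        reply = ask_model(prompt).strip()
        try:
            answer = int(reply)
        except ValueError:
            answer = None  # anything that is not a bare integer counts as wrong
        if answer == count_happy(lo, hi):
            correct += 1
    return correct / len(ranges)
```

Because the ground truth is computed locally, the same benchmark can be re-run on a schedule and the scores stored over time, which is what makes drift visible.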
Mixture-of-experts setups, where several models work together to achieve the best result, seem to be where the world is headed. Different models are suited to different tasks; for example, one model may be better at mathematics than the others. LLM Drift can be intelligent middleware that automatically routes users' requests to the best-performing model based on their query. This can be integrated into LLM routers like OpenRouter (started by Alex Atallah).
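As a rough sketch of the routing idea, assuming the latest benchmark scores are available as a nested dictionary keyed by model and task, picking a model could look like the following. The model names, task names and scores are placeholders for illustration, not real results, and `pick_model` is a hypothetical helper rather than part of LLM Drift or OpenRouter.

```python
from typing import Dict


def pick_model(scores: Dict[str, Dict[str, float]], task: str) -> str:
    """Return the model with the highest latest score for the given task."""
    # Only consider models that have actually been benchmarked on this task.
    candidates = {
        model: task_scores[task]
        for model, task_scores in scores.items()
        if task in task_scores
    }
    if not candidates:
        raise KeyError(f"no benchmark results for task {task!r}")
    return max(candidates, key=candidates.get)


# Placeholder data: hypothetical models and made-up scores.
latest_scores = {
    "model-a": {"math": 0.9, "code": 0.4},
    "model-b": {"math": 0.6, "code": 0.8},
}

math_model = pick_model(latest_scores, "math")
code_model = pick_model(latest_scores, "code")
```

A real router would also need a way to map an incoming query to a task category; that step is left out here because it depends on how queries are classified.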