zx3xyy / llmlingua
This project is a fork of microsoft/llmlingua.
To speed up LLM inference and sharpen an LLM's perception of key information, LLMLingua compresses the prompt and KV-cache, achieving up to 20x compression with minimal performance loss.
Home Page: https://llmlingua.com/
License: MIT License
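As a toy illustration of the idea behind budget-driven prompt compression (this is not LLMLingua's actual algorithm, which uses a language model to estimate token informativeness): assign each token an importance score, then keep only the highest-scoring tokens, in their original order, until a target compression ratio is reached.

```python
# Toy sketch of budget-driven prompt compression (not LLMLingua's real
# method): keep only the highest-scoring tokens, preserving order.
def compress(tokens, scores, keep_ratio):
    budget = max(1, int(len(tokens) * keep_ratio))
    # indices of the top-scoring tokens, restored to original order
    keep = sorted(sorted(range(len(tokens)), key=lambda i: -scores[i])[:budget])
    return [tokens[i] for i in keep]

# Hypothetical tokens and scores, purely for illustration.
tokens = "please summarize the following meeting transcript in detail".split()
scores = [0.1, 0.9, 0.2, 0.3, 0.4, 0.8, 0.7, 0.2]
print(compress(tokens, scores, 0.33))  # → ['summarize', 'transcript']
```

In the real library, scores come from a small language model (e.g. token-level perplexity or a trained classifier in LLMLingua-2), and compression is applied at both coarse (sentence/demonstration) and fine (token) granularity rather than by a single flat ranking.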