Enabling On-Device Large Language Model Personalization with Self-Supervised Data Selection and Synthesis
This repository contains the implementation and resources for the paper "Enabling On-Device Large Language Model Personalization with Self-Supervised Data Selection and Synthesis," accepted at the Design Automation Conference (DAC) 2024. The paper presents a framework for personalizing large language models (LLMs) on-device using self-supervised data selection and synthesis. The approach aims to personalize the model efficiently without compromising user privacy or requiring extensive computational resources.
As LLMs are increasingly deployed in on-device applications, personalizing them to reflect individual user preferences and contexts has become crucial. However, traditional personalization methods often rely on extensive data collection and processing, which raises privacy and efficiency concerns. We propose a self-supervised framework that selects data from user interactions and synthesizes personalized training datasets from it. This method significantly enhances the personalization effectiveness of LLMs while operating within the constraints of on-device computing resources. Our experiments show notable improvements in model performance across various tasks, demonstrating the potential of the approach for on-device LLM personalization.
If you find our work useful for your research or if you use parts of this code in your own projects, please consider citing our paper:
@article{qin2023enabling,
  title={Enabling On-Device Large Language Model Personalization with Self-Supervised Data Selection and Synthesis},
  author={Qin, Ruiyang and Xia, Jun and Jia, Zhenge and Jiang, Meng and Abbasi, Ahmed and Zhou, Peipei and Hu, Jingtong and Shi, Yiyu},
  journal={arXiv preprint arXiv:2311.12275},
  year={2023}
}