
slm's Introduction

Scalable Language Model with Generalized Continual Learning

📢 Temporary Statement!!!

Thank you for your interest in our work! We have received many requests regarding our code and decided to release the raw code for our ICLR 2024 paper "Scalable Language Model with Generalized Continual Learning". However, please note that we have not yet provided any supplementary explanations.

I've been occupied lately with another video-generation project. Because of this, we've decided to release all the scripts and code we used first. While they may seem a bit disorganized, they are functional. I plan to restructure and tidy them up during the ICLR conference period.

Suggestions

Although we have completed this work, there are still some shortcomings. If you wish to continue related work, here are some of my own suggestions; I hope they help.

  1. BERT benchmark. Existing methods are already approaching the upper limit of this benchmark, making further improvement difficult. Continuing to work on it will be very challenging.

  2. Llama benchmark. We find the fusion of large language models (LLMs) with continual learning both intriguing and of considerable practical significance. We conducted experiments on the Llama model to achieve our objectives, yet we acknowledge that the problem definition may not be optimal: different tasks present varying levels of difficulty, and our initial tasks may be too simplistic. You are welcome to refine and adjust this setup as you see fit.

  3. Batch size. We follow L2P, which assumes that all queries within the same batch share the same source. This assumption is something of a trick and could simplify the retrieval problem. To address this concern, we employ a robust retriever that remains effective even when the batch size is set to 1; a minimal sketch of the difference appears after this list. This aspect could also serve as an interesting point for discussion.

  4. Engineering. This may not be very research-related, so feel free to ignore it. I don't think the current way of applying weight increments is very elegant; exploring engineering techniques to save memory and reduce inference cost would be an interesting direction.
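On the batch-size point (suggestion 3), here is a minimal sketch, not the repository's actual code, contrasting L2P-style retrieval under the shared-source assumption with per-sample retrieval that stays valid at batch size 1. The names `query` and `keys` and the cosine-similarity scoring are illustrative assumptions.

```python
# A minimal sketch (not the repository's code) of the two retrieval modes.
# `keys` is a pool of learned key vectors, `query` a batch of query embeddings;
# cosine similarity is an illustrative choice of scoring function.
import torch
import torch.nn.functional as F


def retrieve_shared_source(query: torch.Tensor, keys: torch.Tensor) -> torch.Tensor:
    """L2P-style shortcut: assume every query in the batch comes from the same
    task, so per-sample matches can be pooled by majority vote into one index
    applied to the whole batch."""
    sim = F.normalize(query, dim=-1) @ F.normalize(keys, dim=-1).T  # (B, pool_size)
    per_sample = sim.argmax(dim=-1)                                 # (B,)
    voted = torch.mode(per_sample).values                           # scalar vote
    return voted.expand(query.size(0))                              # same index for every sample


def retrieve_per_sample(query: torch.Tensor, keys: torch.Tensor) -> torch.Tensor:
    """Batch-size-1-safe retrieval: each query keeps its own match, so the
    result does not depend on which other samples share the batch."""
    sim = F.normalize(query, dim=-1) @ F.normalize(keys, dim=-1).T  # (B, pool_size)
    return sim.argmax(dim=-1)                                       # (B,)


if __name__ == "__main__":
    keys = torch.randn(24, 128)   # hypothetical pool of 24 keys
    query = torch.randn(8, 128)   # hypothetical batch of 8 queries
    print(retrieve_shared_source(query, keys))
    print(retrieve_per_sample(query, keys))
```

With per-sample retrieval, evaluating at batch size 1 produces the same indices as any larger batch, which is the property the suggestion above points at; the voted variant is only safe when the shared-source assumption actually holds.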

References

This repository owes its existence to the exceptional contributions of other projects; without their help, this work could not have been completed.

Many thanks to their invaluable contributions.


slm's Issues

Dimension mismatch

Thanks for your work. I have encountered a bug when running the code at line 134 of SLM/SLM-llama/retriever.py:

weight_offset = torch.take_along_dim(self.weight_offset, idx_vote[:, None, None], dim=1)

The error is:
RuntimeError: The size of tensor a (12) must match the size of tensor b (2) at non-singleton dimension 0

Could you help figure it out? Thanks.
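For context on the error above: torch.take_along_dim broadcasts input and indices in every dimension except dim (mirroring NumPy's take_along_axis), so this is PyTorch's standard broadcasting failure, with dim 0 of self.weight_offset (size 12) unable to broadcast against dim 0 of idx_vote[:, None, None] (size 2). The sketch below reproduces the error with hypothetical shapes; (12, 24, 64) and a batch of two indices are illustrative stand-ins, not the repository's real dimensions.

```python
# Hypothetical shapes chosen only to reproduce the reported error; they are not
# the repository's actual dimensions.
import torch

weight_offset = torch.randn(12, 24, 64)  # e.g. (num_modules, pool_size, dim)
idx_vote = torch.tensor([3, 7])          # e.g. one retrieved index per sample, batch of 2

# take_along_dim broadcasts `input` and `indices` in every dimension except `dim`.
# Here dim 0 is 12 vs. 2 -- neither is 1, so broadcasting fails:
try:
    torch.take_along_dim(weight_offset, idx_vote[:, None, None], dim=1)
except RuntimeError as err:
    print(err)  # The size of tensor a (12) must match the size of tensor b (2) ...

# The call succeeds once the non-indexed dimensions agree or have size 1,
# e.g. a single shared index reshaped to (1, 1, 1):
picked = torch.take_along_dim(weight_offset, idx_vote[:1].view(1, 1, 1), dim=1)
print(picked.shape)  # torch.Size([12, 1, 64])
```

In other words, the batch dimension of idx_vote has to match, or broadcast against, the leading dimension of self.weight_offset before values can be gathered along dim=1.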

I have a question about self.pool_size

Hello. Thank you for sharing your source code.

While reviewing it, I noticed the variable "self.pool_size" being used. From my understanding, it seems to represent the number of core vectors for each task. Is my understanding correct? If not, could you please clarify its purpose?

The reason I'm confused about this variable is the bash script in your repository.

In vectordb/script.sh, you write the following at line 54:

# ------------------------- pool: 24, groups: 1 ------------------------------------------

How can we define 24 tasks from the AGNews (4 classes), Yelp (5 classes), DBPedia (14 classes), Amazon (5 classes), and Yahoo (10 classes) datasets?

Based on your paper and source code, it seems that the pool_size variable represents the number of tasks excluding the current task. Therefore, the pool_size should be 4 for all continual learning sequences.

Am I correct...?

P.S. If you have some time, could you describe the exact experimental steps needed to reproduce your reported results?

Thank you in advance.
