
slm's Introduction

Scalable Language Model with Generalized Continual Learning

📢 Temporary Statement!!!

Thank you for your interest in our work! We have received many requests regarding our code and decided to release the raw code for our ICLR 2024 paper "Scalable Language Model with Generalized Continual Learning". However, please note that we have not yet provided any supplementary explanations.

I've been occupied lately with another video-generation project. Because of this, we've decided to release all the scripts and code we used first. While they may seem a bit disorganized, they are functional. I plan to restructure and tidy them up during the ICLR conference period.

Suggestions

Although we have completed this work, there are still some shortcomings. If you wish to continue related work, here are some of my own suggestions; I hope they help.

  1. BERT benchmark. Existing methods are already approaching the upper limit of this benchmark, making further improvement difficult. Continuing to work on it will be very challenging.

  2. Llama benchmark. We find the fusion of large language models (LLMs) with continual learning both intriguing and of considerable practical significance. We conducted experiments on the Llama model to achieve our objectives, yet we acknowledge that the problem definition may not be optimal: different tasks present varying levels of difficulty, and our initial tasks may be too simplistic. You are welcome to refine and adjust this setup as you see fit.

  3. Batch size. We follow L2P, which assumes that all queries within the same batch share the same source. This assumption is something of a trick and could simplify the retrieval problem. To address this concern, we employ a robust retriever that remains effective even when the batch size is set to 1; a minimal sketch of the difference appears after this list. This aspect could also serve as an interesting point for discussion.

  4. Engineering. This may not be very research-related, so feel free to ignore it. I don't think the current way of applying weight increments is very elegant; exploring engineering techniques to save memory and reduce inference cost would be an interesting direction.
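On the batch-size point (suggestion 3), here is a minimal sketch, not the repository's actual code, contrasting L2P-style retrieval under the shared-source assumption with per-sample retrieval that stays valid at batch size 1. The names `query` and `keys` and the cosine-similarity scoring are illustrative assumptions.

```python
# A minimal sketch (not the repository's code) of the two retrieval modes.
# `keys` is a pool of learned key vectors, `query` a batch of query embeddings;
# cosine similarity is an illustrative choice of scoring function.
import torch
import torch.nn.functional as F


def retrieve_shared_source(query: torch.Tensor, keys: torch.Tensor) -> torch.Tensor:
    """L2P-style shortcut: assume every query in the batch comes from the same
    task, so per-sample matches can be pooled by majority vote into one index
    applied to the whole batch."""
    sim = F.normalize(query, dim=-1) @ F.normalize(keys, dim=-1).T  # (B, pool_size)
    per_sample = sim.argmax(dim=-1)                                 # (B,)
    voted = torch.mode(per_sample).values                           # scalar vote
    return voted.expand(query.size(0))                              # same index for every sample


def retrieve_per_sample(query: torch.Tensor, keys: torch.Tensor) -> torch.Tensor:
    """Batch-size-1-safe retrieval: each query keeps its own match, so the
    result does not depend on which other samples share the batch."""
    sim = F.normalize(query, dim=-1) @ F.normalize(keys, dim=-1).T  # (B, pool_size)
    return sim.argmax(dim=-1)                                       # (B,)


if __name__ == "__main__":
    keys = torch.randn(24, 128)   # hypothetical pool of 24 keys
    query = torch.randn(8, 128)   # hypothetical batch of 8 queries
    print(retrieve_shared_source(query, keys))
    print(retrieve_per_sample(query, keys))
```

With per-sample retrieval, evaluating at batch size 1 produces the same indices as any larger batch, which is the property the suggestion above points at; the voted variant is only safe when the shared-source assumption actually holds.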

References

This repository owes its existence to the exceptional contributions of other projects; without their help, this work could not have been completed.

Many thanks to their invaluable contributions.


slm's Issues

Dimension mismatch

Thanks for your work. I have encountered a bug when running the code at line 134 of SLM/SLM-llama/retriever.py:

weight_offset = torch.take_along_dim(self.weight_offset, idx_vote[:, None, None], dim=1)

The error is:
RuntimeError: The size of tensor a (12) must match the size of tensor b (2) at non-singleton dimension 0

Could you help figure it out? Thanks.
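For context on the error above: torch.take_along_dim broadcasts input and indices in every dimension except dim (mirroring NumPy's take_along_axis), so this is PyTorch's standard broadcasting failure, with dim 0 of self.weight_offset (size 12) unable to broadcast against dim 0 of idx_vote[:, None, None] (size 2). The sketch below reproduces the error with hypothetical shapes; (12, 24, 64) and a batch of two indices are illustrative stand-ins, not the repository's real dimensions.

```python
# Hypothetical shapes chosen only to reproduce the reported error; they are not
# the repository's actual dimensions.
import torch

weight_offset = torch.randn(12, 24, 64)  # e.g. (num_modules, pool_size, dim)
idx_vote = torch.tensor([3, 7])          # e.g. one retrieved index per sample, batch of 2

# take_along_dim broadcasts `input` and `indices` in every dimension except `dim`.
# Here dim 0 is 12 vs. 2 -- neither is 1, so broadcasting fails:
try:
    torch.take_along_dim(weight_offset, idx_vote[:, None, None], dim=1)
except RuntimeError as err:
    print(err)  # The size of tensor a (12) must match the size of tensor b (2) ...

# The call succeeds once the non-indexed dimensions agree or have size 1,
# e.g. a single shared index reshaped to (1, 1, 1):
picked = torch.take_along_dim(weight_offset, idx_vote[:1].view(1, 1, 1), dim=1)
print(picked.shape)  # torch.Size([12, 1, 64])
```

In other words, the batch dimension of idx_vote has to match, or broadcast against, the leading dimension of self.weight_offset before values can be gathered along dim=1.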

I have a question about self.pool_size

Hello. Thank you for sharing your source code.

While reviewing it, I noticed the variable "self.pool_size" being used. From my understanding, it seems to represent the number of core vectors for each task. Is my understanding correct? If not, could you please clarify its purpose?

The reason I'm confused about this variable is the bash script in your repository.

In vectordb/script.sh, you write the following at line 54:

# ------------------------- pool: 24, groups: 1 ------------------------------------------

How can we define 24 tasks from the AGNews (4 classes), Yelp (5 classes), DBPedia (14 classes), Amazon (5 classes), and Yahoo (10 classes) datasets?

Based on your paper and source code, it seems that the pool_size variable represents the number of tasks excluding the current task. Therefore, the pool_size should be 4 for all continual learning sequences.

Am I correct...?

P.S. If you have some time, could you describe the exact experimental steps needed to reproduce your reported results?

Thank you in advance.
