
Comments (2)

TCL606 commented on May 13, 2024

In fact, a model that has only gone through the pre-training stage can perform only ASR and AAC tasks and is completely incapable of any zero-shot task. In other words, task overfitting is even more severe at that point. As you can see in the paper, we introduce a large amount of QA data in the instruction tuning stage so that the model sees a wider variety of prompts, which alleviates its failure to follow instructions. However, the model still struggles with more difficult tasks unless the LoRA scaling factor is reduced or the model is activated.

As for your second question, I don't quite understand it. In both the pre-training stage and the instruction tuning stage, the Q-Former and LoRA are updated. We used the model after instruction tuning to plot Figure 3. I think the phenomenon you mentioned only demonstrates that reducing the LoRA scaling factor to 2.0 is sufficient to activate the model's capacity; it does not directly indicate that the pre-trained model can solve these tasks.
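
To make the role of the scaling factor concrete, here is a minimal sketch of a LoRA-adapted linear layer in plain Python. It is not the SALMONN code: the helper names (`matvec`, `lora_forward`), the toy matrices, and the starting value of 4.0 are all assumptions chosen for illustration; only the value 2.0 comes from the discussion above. The layer output is the frozen base projection plus the low-rank update multiplied by the scaling factor, so lowering the factor pulls the output back toward the base model.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, scaling):
    """Return W x + scaling * B (A x) for a LoRA-adapted linear layer.

    W is the frozen base weight, A and B are the low-rank factors,
    and scaling is the LoRA scaling factor applied to the update.
    """
    base = matvec(W, x)                # frozen base model output
    update = matvec(B, matvec(A, x))   # low-rank LoRA update
    return [b + scaling * u for b, u in zip(base, update)]


# Toy values (hypothetical, not taken from any real checkpoint).
W = [[1.0, 0.0], [0.0, 1.0]]   # 2x2 base weight
A = [[1.0, 1.0]]               # 1x2 down-projection (rank 1)
B = [[0.5], [-0.5]]            # 2x1 up-projection
x = [1.0, 2.0]

for scaling in (4.0, 2.0, 0.0):
    print(scaling, lora_forward(x, W, A, B, scaling))
```

With a scaling factor of 0.0 the output equals the base projection `W x`, which is why reducing the factor at inference time weakens whatever the LoRA weights have overfitted to without any retraining.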

from salmonn.

TCL606 commented on May 13, 2024

I will close this issue. If you have any further questions, feel free to reopen it.

