ERA-CAPSTONE

🤗 Space Link

Phi2: Pretraining an LLM from Scratch

Details

  1. Model used: Microsoft Phi2
  2. Datasets used: TinyStories dataset (100k samples) and real-time data (100k samples) generated by a finetuned Phi2 model via Ollama
  3. Pretraining approach: QLoRA
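QLoRA keeps the base model's weights frozen in 4-bit precision and trains only small low-rank adapter matrices on top of them. The NumPy sketch below illustrates that idea on a single linear layer. It is not this project's training code: it substitutes plain symmetric absmax 4-bit rounding for QLoRA's NF4 data type and double quantization, and the block size (64), rank `r=8`, and `alpha=16` are illustrative assumptions.

```python
import numpy as np

def quantize_4bit(w, block_size=64):
    """Simplified symmetric absmax 4-bit block quantization (a stand-in for QLoRA's NF4)."""
    flat = w.reshape(-1, block_size)                       # assumes w.size is divisible by block_size
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0  # map each block's absmax to level 7
    scale[scale == 0] = 1.0                                # avoid division by zero for all-zero blocks
    q = np.clip(np.round(flat / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale, shape):
    """Recover approximate float weights from 4-bit codes and per-block scales."""
    return (q.astype(np.float32) * scale).reshape(shape)

class QLoRALinear:
    """Frozen 4-bit base weight plus a trainable low-rank update: y = x W^T + (alpha/r) x A^T B^T."""

    def __init__(self, w, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.shape = w.shape                          # (out_features, in_features)
        self.q, self.scale = quantize_4bit(w)         # frozen: never updated during training
        out_f, in_f = w.shape
        self.A = rng.normal(0, 0.01, size=(r, in_f)).astype(np.float32)  # trainable
        self.B = np.zeros((out_f, r), dtype=np.float32)                   # trainable, zero-initialized
        self.scaling = alpha / r

    def forward(self, x):
        w = dequantize_4bit(self.q, self.scale, self.shape)  # dequantize on the fly for compute
        return x @ w.T + self.scaling * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        return self.A.size + self.B.size

rng = np.random.default_rng(42)
w = rng.normal(size=(256, 128)).astype(np.float32)
layer = QLoRALinear(w, r=8, alpha=16)
x = rng.normal(size=(4, 128)).astype(np.float32)
y = layer.forward(x)
print(y.shape, layer.num_trainable(), w.size)
```

Because `B` starts at zero, the adapted layer initially reproduces the dequantized base layer exactly; training then updates only `A` and `B` (3,072 values here, versus 32,768 in the frozen weight).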

Design

[Image: LLM pretraining design diagram]

Training Loss Curve

[Image: LLM pretraining loss curve]

Training Logs

[Image: LLM pretraining logs]

Phi2: Multimodal Finetuning

Details

  1. LLM Backbone: Phi2
  2. Vision Tower: clip-vit-large-patch14-336
  3. Audio Model: Whisper (Tiny)
  4. Pretraining Dataset: LAION-CC-SBU dataset with BLIP captions (200k samples)
  5. Finetuning Dataset: Instruct 150k dataset (based on COCO)
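To make the component list concrete, the sketch below shows how image features from the vision tower could be mapped into Phi2's embedding space and joined with text tokens. It assumes a LLaVA-1.5-style two-layer MLP projector with GELU, which the README does not confirm; the CLIP and Phi-2 dimensions (1024 and 2560) come from their public configs, and random arrays stand in for real CLIP outputs and text embeddings. Real LLaVA-style models insert the image tokens at an `<image>` placeholder rather than always prefixing them.

```python
import numpy as np

# Dimensions of the named components (from their public configs).
CLIP_IMAGE_SIZE, CLIP_PATCH, CLIP_DIM = 336, 14, 1024  # clip-vit-large-patch14-336
PHI2_DIM = 2560                                        # Phi-2 hidden size

def gelu(x):
    """tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

class MLPProjector:
    """Hypothetical two-layer MLP mapping CLIP patch features to Phi-2 embeddings."""

    def __init__(self, in_dim=CLIP_DIM, out_dim=PHI2_DIM, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0, in_dim ** -0.5, size=(in_dim, out_dim)).astype(np.float32)
        self.b1 = np.zeros(out_dim, dtype=np.float32)
        self.w2 = rng.normal(0, out_dim ** -0.5, size=(out_dim, out_dim)).astype(np.float32)
        self.b2 = np.zeros(out_dim, dtype=np.float32)

    def __call__(self, feats):
        return gelu(feats @ self.w1 + self.b1) @ self.w2 + self.b2

num_patches = (CLIP_IMAGE_SIZE // CLIP_PATCH) ** 2  # 24 x 24 patch tokens (CLS token excluded)
rng = np.random.default_rng(1)
patch_feats = rng.normal(size=(num_patches, CLIP_DIM)).astype(np.float32)  # stand-in for CLIP output
text_embeds = rng.normal(size=(12, PHI2_DIM)).astype(np.float32)          # stand-in for 12 text tokens

image_embeds = MLPProjector()(patch_feats)
# The LLM then receives one sequence: projected image tokens followed by the text tokens.
inputs_embeds = np.concatenate([image_embeds, text_embeds], axis=0)
print(num_patches, image_embeds.shape, inputs_embeds.shape)
```

In a typical two-stage setup of this kind, the projector is trained first on caption data and the language side is adapted afterwards on instruction data; which modules this project froze in each stage is not stated here.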

Design

[Image: multimodal model design diagram]

Approach

[Image: multimodal training approach diagram]

Pretraining

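Training Loss Curve

[Image: multimodal pretraining loss curve]

Learning Rate

[Image: multimodal pretraining learning-rate curve]

Training Logs

[Image: multimodal pretraining logs]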

Finetuning

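Training Loss Curve

[Image: multimodal finetuning loss curve]

Learning Rate

[Image: multimodal finetuning learning-rate curve]

Training Logs

[Image: multimodal finetuning logs]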

Results

[Image: example results]

Deployed on Hugging Face Spaces

Text & Image:

[Image] [Image]

Audio & Image:

Question asked: "How many people are there in this image?"

[Image] [Image]

On the HF Space:

[Image]

Possible Improvements / Future Scope:

  1. Full Training: I pretrained on only 200k samples of the LAION-CC-SBU dataset with BLIP captions. The results are already good, but training on the full dataset should improve them further.
  2. Captions for Finetuning: I finetuned the model only on the Instruct 150k dataset. The original LLaVA model was also trained on 558k BLIP-captioned samples, which would likely improve the model's capabilities further.
  3. Latency Reduction / Model Optimization: The model could be quantized, for example with GPTQ or AWQ, to reduce latency and make it run faster, including on CPU.
  4. Audio Adapter: Plenty of data is available for Whisper pretraining/finetuning, so an audio adapter could be added as well to finetune a complete multimodal LLM.
  5. Lighter Variant of CLIP?: For audio I used Whisper Tiny and still got good results with minimal latency. It would be interesting to see whether a lighter CLIP variant could similarly reduce latency on the vision side.
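Items 3 and 5 both target latency. As a rough illustration, the Python snippet below estimates Phi-2's weight footprint at fp16 versus a generic 4-bit group-wise format, and counts how many image tokens different CLIP variants would feed into the LLM. The group size of 128 and one fp16 scale per group are assumptions (GPTQ and AWQ implementations differ in details such as zero points), and the two smaller CLIP checkpoints are examples of lighter variants, not models this project has tested.

```python
# Back-of-envelope numbers only; real memory use and latency also depend on
# activations, the KV cache, kernels, and hardware.

PHI2_PARAMS = 2.7e9  # Phi-2 is described as a ~2.7B-parameter model

def weight_bytes(n_params, bits, group_size=None, scale_bytes=2):
    """Approximate weight storage; group-wise formats add one scale per group (assumed fp16)."""
    total = n_params * bits / 8
    if group_size:
        total += n_params / group_size * scale_bytes
    return total

def clip_image_tokens(image_size, patch_size):
    """Number of patch tokens a ViT produces per image (CLS token excluded)."""
    return (image_size // patch_size) ** 2

fp16_gb = weight_bytes(PHI2_PARAMS, 16) / 1e9
int4_gb = weight_bytes(PHI2_PARAMS, 4, group_size=128) / 1e9
print(f"Phi-2 weights: fp16 ~{fp16_gb:.2f} GB, 4-bit (group 128) ~{int4_gb:.2f} GB")

for name, size, patch in [("clip-vit-large-patch14-336", 336, 14),
                          ("clip-vit-base-patch16", 224, 16),
                          ("clip-vit-base-patch32", 224, 32)]:
    print(f"{name}: {clip_image_tokens(size, patch)} image tokens per image")
```

This prints roughly 5.40 GB versus 1.39 GB of weights, and 576, 196, and 49 image tokens respectively. Fewer image tokens shorten the sequence the LLM must process for every query, so a lighter vision encoder could reduce latency beyond the savings in the encoder itself.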

Contributors

ravinaik
