Giter VIP home page Giter VIP logo

mamba's Introduction

gallery_img2sketch2img

story-snippets.ipynb

Notebook for the code snippets in Mamba: A shallow dive

finetune.py

This code is setting up and executing a training process for a ๐Ÿ mamba using PyTorch, Hugging Face's Transformers, Accelerate, and W&B (Weights & Biases). Here's a breakdown of the key components and steps:

  1. Imports and Initial Setup: The script imports necessary libraries and modules such as PyTorch, Transformers, Accelerate, and W&B. It also imports a specific model (MambaLMHeadModel) from mamba_ssm.models.mixer_seq_simple.

  2. Accelerator and W&B Initialization: The Accelerator is initialized for distributed and mixed precision training, and W&B (a tool for tracking and visualizing machine learning experiments) is initialized in a disabled mode.

  3. Model Loading and Modification:

    • Model Path and Loading: The model is loaded from a specified path using the MambaLMHeadModel.from_pretrained method.
    • Monkey Patching forward Method: The forward method of the model is modified (monkey patched) to include a custom loss function using cross-entropy loss. This modification allows the model to compute loss if labels are provided.
    • Tokenizer Initialization: A tokenizer (AutoTokenizer) is loaded and configured with padding and end-of-sequence tokens.
  4. Token Embeddings Update: The token embeddings in the model are resized to accommodate additional tokens and special tokens added to the tokenizer.

  5. Dataset Preparation:

    • Loading Dataset: A dataset is loaded from a specified source.
    • Tokenization: The dataset is tokenized using the prepared tokenizer. The tokenize function processes text into a format suitable for the model.
    • Multithreading: Tokenization is done in a multithreaded manner using os.cpu_count() to utilize all available CPU cores.
  6. Data Collation: A collate function is defined to transform a list of tokenized elements into a format suitable for batch processing during training.

  7. Training Configuration:

    • Batch Size, Gradient Accumulation, and Learning Rate: Key hyperparameters for training, such as batch size, gradient accumulation steps, and learning rate, are set.
    • TrainingArguments: The TrainingArguments class from Hugging Face is used to configure various aspects of training like batch sizes, logging, evaluation strategy, and learning rate scheduler.
  8. Trainer Setup: A Trainer object is created with the model, tokenizer, training arguments, data collator, and datasets for training and evaluation.

  9. Model Training: Finally, the model is trained using the Trainer.train() method.

Overall, the script is a comprehensive setup for training a large language model (MambaLMHeadModel) with specific configurations and customizations for tokenization, data collation, and training arguments.

mamba's People

Contributors

geronimi73 avatar albertfgu avatar tridao avatar yair-schiff avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.