speediedan / finetuning-scheduler

A PyTorch Lightning extension that accelerates and enhances foundation model experimentation with flexible fine-tuning schedules.

Home Page: https://finetuning-scheduler.readthedocs.io

License: Apache License 2.0

Languages: Python 95.40%, Makefile 0.16%, Shell 2.61%, Dockerfile 1.82%
Topics: machine-learning, transfer-learning, finetuning, fine-tuning, pytorch-lightning, pytorch, artificial-intelligence, neural-networks, superglue

finetuning-scheduler's Introduction

A PyTorch Lightning extension that enhances model experimentation with flexible fine-tuning schedules.


Docs | Setup | Examples | Community



FinetuningScheduler explicit loss animation

FinetuningScheduler is simple to use yet powerful, offering a number of features that facilitate model research and exploration:

  • easy specification of flexible fine-tuning schedules with explicit or regex-based parameter selection (an example schedule is sketched just after this list)
    • implicit schedules for initial/naive model exploration
    • explicit schedules for performance tuning, fine-grained behavioral experimentation and computational efficiency
  • automatic restoration of best per-phase checkpoints driven by iterative application of early-stopping criteria to each fine-tuning phase
  • composition of early-stopping and manually-set epoch-driven fine-tuning phase transitions
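
For example, a minimal two-phase explicit schedule might look like the following sketch (the model.classifier.* and model.backbone.* patterns are hypothetical; adapt them to your model's parameter names):

0:
  params: # names or regex patterns for the parameters thawed in the initial training phase
  - model.classifier.*
  max_transition_epoch: 3
  lr: 0.001 # per-phase maximum learning rates can be specified
1:
  params: # each subsequent phase thaws only parameters not listed in earlier phases
  - model.backbone.*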

Setup

Step 0: Install from PyPI

pip install finetuning-scheduler
Additional installation options

Install Optional Packages

To install additional packages required for examples:

pip install finetuning-scheduler['examples']

or to include packages for examples, development and testing:

pip install finetuning-scheduler['all']

Source Installation Examples

To install from (editable) source (includes docs as well):

git clone https://github.com/speediedan/finetuning-scheduler.git
cd finetuning-scheduler
python -m pip install -e ".[all]" -r requirements/docs.txt

Install a specific FTS version from source using the standalone pytorch-lightning package:

export FTS_VERSION=2.0.0
export PACKAGE_NAME=pytorch
git clone -b v${FTS_VERSION} https://github.com/speediedan/finetuning-scheduler
cd finetuning-scheduler
python -m pip install -e ".[all]" -r requirements/docs.txt

Latest Docker Image

Note, publishing of new finetuning-scheduler version-specific docker images was paused after the 2.0.2 patch release. If new version-specific images are required, please raise an issue.


Step 1: Import the FinetuningScheduler callback and start fine-tuning!

import lightning as L
from finetuning_scheduler import FinetuningScheduler

trainer = L.Trainer(callbacks=[FinetuningScheduler()])

Get started by following the Fine-Tuning Scheduler introduction which includes a CLI-based example or by following the notebook-based Fine-Tuning Scheduler tutorial.
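
If you have defined an explicit schedule, pass its path via the ft_schedule argument; a minimal sketch (my_model_ft_schedule.yaml is a hypothetical path):

import lightning as L
from finetuning_scheduler import FinetuningScheduler

# use an explicit schedule file instead of the implicitly generated default
trainer = L.Trainer(callbacks=[FinetuningScheduler(ft_schedule="my_model_ft_schedule.yaml")])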


Installation Using the Standalone pytorch-lightning Package

applicable to versions >= 2.0.0

Now that the core Lightning package is lightning rather than pytorch-lightning, Fine-Tuning Scheduler (FTS) by default depends upon the lightning package rather than the standalone pytorch-lightning. If you would like to continue to use FTS with the standalone pytorch-lightning package instead, you can still do so as follows:

Install a given FTS release (for example v2.0.0) using standalone pytorch-lightning:

export FTS_VERSION=2.0.0
export PACKAGE_NAME=pytorch
wget https://github.com/speediedan/finetuning-scheduler/releases/download/v${FTS_VERSION}/finetuning-scheduler-${FTS_VERSION}.tar.gz
pip install finetuning-scheduler-${FTS_VERSION}.tar.gz

Examples

Scheduled Fine-Tuning For SuperGLUE


Continuous Integration

Fine-Tuning Scheduler is rigorously tested across multiple CPUs and GPUs and against major Python and PyTorch versions. Each Fine-Tuning Scheduler minor release (major.minor) is paired with a Lightning minor release (e.g. Fine-Tuning Scheduler 2.0 depends upon Lightning 2.0).

To ensure maximum stability, the latest Lightning patch release fully tested with Fine-Tuning Scheduler is set as a maximum dependency in Fine-Tuning Scheduler's requirements.txt (e.g. <= 1.7.1). If you'd like to test a specific Lightning patch version greater than that currently in Fine-Tuning Scheduler's requirements.txt, it will likely work but you should install Fine-Tuning Scheduler from source and update the requirements.txt as desired.

Current build statuses for Fine-Tuning Scheduler
System / (PyTorch/Python ver) | 2.1.2/3.8    | 2.4.0/3.8, 2.4.0/3.11
Linux [GPUs**]                | -            | Build Status
Linux (Ubuntu 22.04)          | Test         | Test
OSX (11)                      | Test         | Test
Windows (2022)                | Test         | Test
  • ** tests run on one RTX 4090 and one RTX 2070

Community

Fine-Tuning Scheduler is developed and maintained by the community in close communication with the Lightning team. Thanks to everyone in the community for their tireless effort building and improving the immensely useful core Lightning project.

PRs welcome! Please see the contributing guidelines (which are essentially the same as Lightning's).


Citing Fine-Tuning Scheduler

Please cite:

@misc{Dan_Dale_2022_6463952,
    author       = {Dan Dale},
    title        = {{Fine-Tuning Scheduler}},
    month        = feb,
    year         = 2022,
    doi          = {10.5281/zenodo.6463952},
    publisher    = {Zenodo},
    url          = {https://zenodo.org/record/6463952},
}

Feel free to star the repo as well if you find it useful or interesting. Thanks 😊!

finetuning-scheduler's People

Contributors

solalatus, speediedan


finetuning-scheduler's Issues

Use of `Lightning` Unified Package Not Currently Supported

🐛 Bug

When running the default schedule creation instructions from the docs, I get a ValueError.

To Reproduce

See last lines of BoringModel example below.

# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Initially based on https://bit.ly/3oQ8Vqf
import re
from functools import partial
from typing import List, Optional
from warnings import WarningMessage

import torch
from lightning import LightningDataModule, LightningModule
from torch.optim.lr_scheduler import LambdaLR
from torch.utils.data import DataLoader, Dataset, IterableDataset, Subset


def multiwarn_check(
    rec_warns: List, expected_warns: List, expected_mode: bool = False
) -> List[Optional[WarningMessage]]:
    msg_search = lambda w1, w2: re.compile(w1).search(w2.message.args[0])
    if expected_mode:  # we're directed to check that multiple expected warns are obtained
        return [w_msg for w_msg in expected_warns if not any([msg_search(w_msg, w) for w in rec_warns])]
    else:  # by default we're checking that no unexpected warns are obtained
        return [w_msg for w_msg in rec_warns if not any([msg_search(w, w_msg) for w in expected_warns])]


unexpected_warns = partial(multiwarn_check, expected_mode=False)


unmatched_warns = partial(multiwarn_check, expected_mode=True)


class LinearWarmupLR(LambdaLR):
    def __init__(self, optimizer, num_warmup_steps, num_training_steps, last_epoch=-1):
        def lr_lambda(current_step: int):
            if current_step < num_warmup_steps:
                return float(current_step) / float(max(1, num_warmup_steps))
            return max(
                0.0, float(num_training_steps - current_step) / float(max(1, num_training_steps - num_warmup_steps))
            )

        super().__init__(optimizer, lr_lambda, last_epoch)


class CustomLRScheduler:
    def __init__(self, optimizer):
        self.optimizer = optimizer

    def step(self, epoch):
        ...

    def state_dict(self):
        ...

    def load_state_dict(self, state_dict):
        ...


class RandomDictDataset(Dataset):
    def __init__(self, size: int, length: int):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        a = self.data[index]
        b = a + 2
        return {"a": a, "b": b}

    def __len__(self):
        return self.len


class RandomDataset(Dataset):
    def __init__(self, size: int, length: int):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class RandomIterableDataset(IterableDataset):
    def __init__(self, size: int, count: int):
        self.count = count
        self.size = size

    def __iter__(self):
        for _ in range(self.count):
            yield torch.randn(self.size)


class RandomIterableDatasetWithLen(IterableDataset):
    def __init__(self, size: int, count: int):
        self.count = count
        self.size = size

    def __iter__(self):
        for _ in range(len(self)):
            yield torch.randn(self.size)

    def __len__(self):
        return self.count


class BoringModel(LightningModule):
    def __init__(self):
        """Testing PL Module.

        Use as follows:
        - subclass
        - modify the behavior for what you want

        class TestModel(BaseTestModel):
            def training_step(...):
                # do your own thing

        or:

        model = BaseTestModel()
        model.training_epoch_end = None
        """
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def loss(self, batch, prediction):
        # An arbitrary loss to have a loss that updates the model weights during `Trainer.fit` calls
        return torch.nn.functional.mse_loss(prediction, torch.ones_like(prediction))

    def step(self, x):
        x = self(x)
        out = torch.nn.functional.mse_loss(x, torch.ones_like(x))
        return out

    def training_step(self, batch, batch_idx):
        output = self(batch)
        loss = self.loss(batch, output)
        return {"loss": loss}

    def training_step_end(self, training_step_outputs):
        return training_step_outputs

    def training_epoch_end(self, outputs) -> None:
        torch.stack([x["loss"] for x in outputs]).mean()

    def validation_step(self, batch, batch_idx):
        output = self(batch)
        loss = self.loss(batch, output)
        return {"x": loss}

    def validation_epoch_end(self, outputs) -> None:
        torch.stack([x["x"] for x in outputs]).mean()

    def test_step(self, batch, batch_idx):
        output = self(batch)
        loss = self.loss(batch, output)
        return {"y": loss}

    def test_epoch_end(self, outputs) -> None:
        torch.stack([x["y"] for x in outputs]).mean()

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.layer.parameters(), lr=0.1)
        lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
        return [optimizer], [lr_scheduler]

    def train_dataloader(self):
        return DataLoader(RandomDataset(32, 64))

    def val_dataloader(self):
        return DataLoader(RandomDataset(32, 64))

    def test_dataloader(self):
        return DataLoader(RandomDataset(32, 64))

    def predict_dataloader(self):
        return DataLoader(RandomDataset(32, 64))


class BoringDataModule(LightningDataModule):
    def __init__(self, data_dir: str = "./"):
        super().__init__()
        self.data_dir = data_dir
        self.non_picklable = None
        self.checkpoint_state: Optional[str] = None
        self.random_full = RandomDataset(32, 64 * 4)

    def setup(self, stage: Optional[str] = None):
        if stage == "fit" or stage is None:
            self.random_train = Subset(self.random_full, indices=range(64))

        if stage in ("fit", "validate") or stage is None:
            self.random_val = Subset(self.random_full, indices=range(64, 64 * 2))

        if stage == "test" or stage is None:
            self.random_test = Subset(self.random_full, indices=range(64 * 2, 64 * 3))

        if stage == "predict" or stage is None:
            self.random_predict = Subset(self.random_full, indices=range(64 * 3, 64 * 4))

    def train_dataloader(self):
        return DataLoader(self.random_train)

    def val_dataloader(self):
        return DataLoader(self.random_val)

    def test_dataloader(self):
        return DataLoader(self.random_test)

    def predict_dataloader(self):
        return DataLoader(self.random_predict)


class ManualOptimBoringModel(BoringModel):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        output = self(batch)
        loss = self.loss(batch, output)
        opt.zero_grad()
        self.manual_backward(loss)
        opt.step()
        return loss


from finetuning_scheduler import FinetuningScheduler
from lightning import Trainer

trainer = Trainer(callbacks=[FinetuningScheduler(gen_ft_sched_only=True)])
print(trainer.log_dir)
model = BoringModel()
data = BoringDataModule()
trainer.fit(model, data)

Error

Traceback (most recent call last):
  File "/home/michael/work/michael/ml1-bonus/test.py", line 248, in <module>
    trainer.fit(model,data)
  File "/home/michael/work/michael/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 603, in fit
    call._call_and_handle_interrupt(
  File "/home/michael/work/michael/lib/python3.8/site-packages/lightning/pytorch/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/michael/work/michael/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 645, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/home/michael/work/michael/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 1024, in _run
    verify_loop_configurations(self)
  File "/home/michael/work/michael/lib/python3.8/site-packages/lightning/pytorch/trainer/configuration_validator.py", line 53, in verify_loop_configurations
    _check_deprecated_callback_hooks(trainer)
  File "/home/michael/work/michael/lib/python3.8/site-packages/lightning/pytorch/trainer/configuration_validator.py", line 238, in _check_deprecated_callback_hooks
    if is_overridden(method_name="on_load_checkpoint", instance=callback) and has_legacy_argument:
  File "/home/michael/work/michael/lib/python3.8/site-packages/lightning/pytorch/utilities/model_helpers.py", line 34, in is_overridden
    raise ValueError("Expected a parent")
ValueError: Expected a parent

Environment

  • CUDA:
    - GPU:
    - NVIDIA GeForce RTX 3070
    - NVIDIA GeForce GTX 960
    - available: True
    - version: 11.7
  • Packages:
    - finetuning-scheduler: 0.3.3
    - numpy: 1.24.1
    - pyTorch_debug: False
    - pyTorch_version: 1.13.1+cu117
    - pytorch-lightning: 1.8.4
    - tqdm: 4.64.1
  • System:
    - OS: Linux
    - architecture:
    - 64bit
    - ELF
    - processor: x86_64
    - python: 3.8.10
    - version: #64~20.04.1-Ubuntu SMP Fri Jan 6 16:42:31 UTC 2023

Additional context

The ScheduleImplMixin.save_schedule method appears to save a copy of the same schedule file on each machine

🐛 Bug

After setting the trainer.ckpt_path variable to resume the fine-tuning process, the fine-tuning scheduler appears to save an Encoder_ft_schedule.yaml file on each machine.

However, as all machines are utilizing the same shared remote storage during my training process, this concurrent saving action results in an Input/Output error. It may be necessary to decorate the ScheduleImplMixin.save_schedule static method with @rank_zero_only to ensure that the file is saved only once by the primary node.

To address this issue, I decorated the ScheduleImplMixin.save_schedule static method with @rank_zero_only, ensuring that the file is only saved once by the primary node. After making this change, the model training proceeded without further I/O errors.
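
For reference, a minimal sketch of that workaround (the import paths reflect my environment and FTS internals, so verify them against your installed version):

from lightning.pytorch.utilities.rank_zero import rank_zero_only
from finetuning_scheduler.fts_supporters import ScheduleImplMixin

# ensure only the global rank-zero process writes the schedule file to shared storage
ScheduleImplMixin.save_schedule = staticmethod(rank_zero_only(ScheduleImplMixin.save_schedule))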

Environment

  • Fine-Tuning Scheduler Version (e.g., 0.1.0): 2.1.1
  • Lightning Version (e.g., 1.5.0): 2.1.1
  • PyTorch Version (e.g., 2.0): 2.1.1
  • Python version (e.g., 3.11): 3.9
  • OS (e.g., Linux): Linux
  • CUDA/cuDNN version: 11.8
  • GPU models and configuration: A100 80g
  • How you installed PyTorch (conda, pip, source): pip

val_loss requirement

Hello, and thanks for publishing your work to the community!
As far as I can see, val_loss is automatically set as the monitored variable; hence the validation epoch is required to log the validation loss under exactly that name, "val_loss". I would suggest adding this info to the docs or adding a monitor variable name to the constructor, as it can cost a couple of hours of debugging to dig it out (as in my case) :)
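
For anyone else hitting this, a minimal sketch of what the default configuration expects in your LightningModule (the loss computation is a placeholder; the key point is the metric name):

import torch

def validation_step(self, batch, batch_idx):
    output = self(batch)
    loss = torch.nn.functional.mse_loss(output, torch.ones_like(output))  # placeholder loss
    # the FTS early-stopping/checkpoint callbacks monitor "val_loss" by default,
    # so the metric must be logged under exactly that name
    self.log("val_loss", loss)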

Phases are not disjoint. The following parameters are specified in multiple phases:

🐛 Bug

Hello, thanks for making this nice callback.
I'm trying to use finetuning-scheduler to train my model phase-by-phase: in phase 1, I want to train only part of my model; in phase 2, I want to train the whole model. With the following config, the callback reports the error: Phases are not disjoint. So, I want to know how I could achieve my goal using this callback.

0:
  params: # the parameters for each phase definition can be fully specified
  - model.enc
  - model.dec
  - model.masker1
  - model.mvdr
  max_transition_epoch: 30
  lr: 0.001 # per-phase maximum learning rates can be specified
1:
  params:
  - model
  lr: 0.001 # per-phase maximum learning rates can be specified
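
For anyone with the same question: FTS requires the parameter sets of the phases to be disjoint, so phase 1 should list only the parameters that were not already thawed in phase 0, rather than the whole model. A sketch of the config above rewritten that way (model.other is a hypothetical stand-in for whatever submodules of model remain frozen after phase 0):

0:
  params:
  - model.enc
  - model.dec
  - model.masker1
  - model.mvdr
  max_transition_epoch: 30
  lr: 0.001
1:
  params:
  - model.other # hypothetical: enumerate only the remaining, still-frozen submodules
  lr: 0.001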

To Reproduce

Expected behavior

Environment

  • Fine-Tuning Scheduler Version (e.g., 0.1.0):
  • PyTorch Lightning Version (e.g., 1.5.0):
  • PyTorch Version (e.g., 1.10):
  • Python version (e.g., 3.9):
  • OS (e.g., Linux):
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • How you installed PyTorch (conda, pip, source):
  • If compiling from source, the output of torch.__config__.show():
  • Any other relevant information:

Additional context

Warning or Info why schedule stopped before reaching the final step.

🚀 Feature

I would like finetuning-scheduler to log a warning (or at least an info message) if training ends because the max_depth parameter was set to something smaller than the number of available fine-tuning phases.

Motivation

I accidentally set the max_depth parameter to 1 in one of my config files. Consequently, the training finished after stage 1 of my schedule, but it did so without any info or warning. It took me quite some time to track down the wrong config setting.

Pitch

Emit an info or warning message if training is stopped because max_depth is smaller than the number of available phases.
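
For context, a minimal sketch of the misconfiguration (max_depth is an existing FinetuningScheduler argument; the schedule path is hypothetical):

from finetuning_scheduler import FinetuningScheduler

# max_depth=1 silently caps execution at schedule depth 1, even if the schedule defines more phases
fts = FinetuningScheduler(ft_schedule="my_ft_schedule.yaml", max_depth=1)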

Alternatives

Leave it as it is ...

Additional context


Enhancement Request: Make toggling between FTS standalone (`pytorch-lightning`) and unified (`lightning`) dependent versions easier

🐛 Bug

I am trying to integrate finetuning-scheduler with my code, which is built around pytorch_lightning, and am hitting import errors in a way that seems to be essentially the mirror of #8.

As a result, when trying to specify fine-tuning callbacks in the config as per the example, the following error is observed:

main.py: error: Parser key "trainer.callbacks":
  Does not validate against any of the Union subtypes
  Subtypes: (typing.List[pytorch_lightning.callbacks.callback.Callback], <class 'pytorch_lightning.callbacks.callback.Callback'>, <class 'NoneType'>)
  Errors:
    - Expected a <class 'list'>
    - Import path finetuning_scheduler.FinetuningScheduler does not correspond to a subclass of <class 'pytorch_lightning.callbacks.callback.Callback'>
    - Expected a <class 'NoneType'>
  Given value type: <class 'dict'>
  Given value: {'class_path': 'finetuning_scheduler.FinetuningScheduler'}

To Reproduce

import pytorch_lightning as pl
issubclass(finetuning_scheduler.FinetuningScheduler, pl.Callback)
>>> False
issubclass(finetuning_scheduler.FTSCheckpoint, pl.Callback)
>>> False
issubclass(finetuning_scheduler.FTSEarlyStopping, pl.Callback)
>>> False

import lightning.pytorch as pl
issubclass(finetuning_scheduler.FinetuningScheduler, pl.Callback)
>>> True
issubclass(finetuning_scheduler.FTSCheckpoint, pl.Callback)
>>> True
issubclass(finetuning_scheduler.FTSEarlyStopping, pl.Callback)
>>> True

Expected behavior

import pytorch_lightning as pl
issubclass(finetuning_scheduler.FinetuningScheduler, pl.Callback)
>>> True
issubclass(finetuning_scheduler.FTSCheckpoint, pl.Callback)
>>> True
issubclass(finetuning_scheduler.FTSEarlyStopping, pl.Callback)
>>> True

import lightning.pytorch as pl
issubclass(finetuning_scheduler.FinetuningScheduler, pl.Callback)
>>> True
issubclass(finetuning_scheduler.FTSCheckpoint, pl.Callback)
>>> True
issubclass(finetuning_scheduler.FTSEarlyStopping, pl.Callback)
>>> True

Environment

  • CUDA:
    • GPU:
    • available: False
    • version: None
  • Packages:
    • finetuning-scheduler: 2.0.2
    • numpy: 1.22.4
    • pyTorch_debug: False
    • pyTorch_version: 2.0.0
    • pytorch-lightning: 2.0.0
    • tqdm: 4.65.0
  • System:
    • OS: Darwin
    • architecture:
      • 64bit
    • processor: arm
    • python: 3.10.3
    • version: Darwin Kernel Version 22.5.0: Mon Apr 24 20:52:24 PDT 2023; root:xnu-8796.121.2~5/RELEASE_ARM64_T6000

Change optimizer after a phase transition

🚀 Feature

Hello! I would like to change the optimizer after a phase transition, similar to changing the learning rate schedule after a phase transition. Is this possible? If not, would it be possible to implement as a new feature for finetuning-scheduler?

Motivation

I want to first train a linear top net with SGD and then switch to Adam to fine-tune the whole network. I found that Adam does not converge to the optimal solution for my linear top nets, so that's why I want to use SGD for the first phase.

Pitch

It would be great if we could change the optimizer after a phase transition, similar to changing the learning rate scheduler. We should be able to change both optimizer and learning rate scheduler together.
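
A sketch of what such a per-phase specification might look like, mirroring the existing schedule syntax (the new_optimizer key and the parameter patterns are hypothetical, proposed syntax only):

0:
  params:
  - model.top.* # hypothetical linear top net, trained with the initially configured SGD
  max_transition_epoch: 10
1:
  params:
  - model.backbone.*
  new_optimizer: # hypothetical key: reinitialize the optimizer at this phase transition
    class_path: torch.optim.Adam
    init_args:
      lr: 0.0001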

Alternatives

Additional context


Config option to enable/disable batchnorm running values

Thanks for sharing this great tool

🚀 Feature

Add constructor argument batch_norm_track_running_stats to control track_running_stats of frozen batch norm layers, or avoid changing track_running_stats of batchnorm layers.

Motivation

Currently FinetuningScheduler calls self.freeze(modules=pl_module, train_bn=False). That in turn executes this code for each model module:

if isinstance(module, _BatchNorm):
    module.track_running_stats = False
# recursion could yield duplicate parameters for parent modules w/ parameters so disabling it
for param in module.parameters(recurse=False):
    param.requires_grad = False

This can be confusing because setting track_running_stats=False is not normally done when fine-tuning. It also silently changes the values set when creating the model. The default value of track_running_stats is True, so when one fine-tunes a model following the standard procedure of setting requires_grad, track_running_stats will normally remain True unless set otherwise. I'm not sure why it is implemented this way in BaseFinetuning.

Pitch

Add a constructor argument to provide control over the batchnorm behavior of frozen modules. I think the default should be to keep track_running_stats=True, reproducing the standard "recipe" for fine-tuning.
Alternatively, do not alter track_running_stats at all.
At the very least, I think it would be important to document the current behavior.
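
A minimal sketch of the alternative freezing behavior proposed above, freezing parameters without altering batchnorm bookkeeping (illustrative only, not FTS's current implementation):

import torch.nn as nn

def freeze_keep_bn_stats(module: nn.Module) -> None:
    for m in module.modules():
        # recursion could yield duplicate parameters for parent modules, so disable it
        for param in m.parameters(recurse=False):
            param.requires_grad = False
        # note: track_running_stats is intentionally left at its configured value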


BatchNormalization meddles in finetuning schedule.

🐛 Bug

Hi @speediedan, thanks for creating this fine-tuning callback!

I want to fine-tune the torchvision.models.segmentation.deeplabv3_resnet50 model, which contains batch normalization parameters. As a first step, I would like to fine-tune with a simple two-phase approach, starting with the classification part and continuing with the backbone. When I run the attached script, finetuning-scheduler issues the following warning:

UserWarning: FinetuningScheduler configured the provided model to have 23 trainable parameters in phase 0 (the initial training phase) but the optimizer has subsequently been initialized with 129 trainable parameters. If you have manually added additional trainable parameters you may want to ensure the manually added new trainable parameters do not collide with the 182 parameters FinetuningScheduler has been scheduled to thaw in the provided schedule.

I think this is caused by freeze_before_training calling freeze which has a parameter train_bn which defaults to True and therefore sets requires_grad to True for every batch normalization parameter.

Downstream, the code crashes because of parameters appearing in more than one parameter group (though not in the attached sample, since the data does not fit the model; the sample nevertheless reproduces the initial bug).

To Reproduce

import torch

from torch.utils.data import DataLoader, Dataset

from pytorch_lightning import LightningModule, LightningDataModule
from pytorch_lightning.cli import LightningArgumentParser, LightningCLI
from torchvision.models.segmentation import deeplabv3_resnet50


class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):

    def __init__(self):
        super().__init__()
        
        self.model = deeplabv3_resnet50()

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def validation_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("valid_loss", loss)

    def test_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("test_loss", loss)

    def configure_optimizers(self):
        params = list(filter(lambda x: x.requires_grad, self.parameters()))
        optimizer = torch.optim.Adam(params, lr=0.1)

        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer=optimizer, factor=0.5, patience=5)
        return {
            "optimizer": optimizer,
            "lr_scheduler": {
                "scheduler": scheduler,
                "monitor": "valid_loss",
            },
        }


class MyDataModule(LightningDataModule):

    def __init__(self):
        super().__init__()

    def train_dataloader(self) -> DataLoader:
        return DataLoader(RandomDataset(32, 64), batch_size=2)

    def val_dataloader(self) -> DataLoader:
        return DataLoader(RandomDataset(32, 64), batch_size=2)

    def test_dataloader(self) -> DataLoader:
        return DataLoader(RandomDataset(32, 64), batch_size=2)


from finetuning_scheduler import FinetuningScheduler, FTSCheckpoint, FTSEarlyStopping


class BoringCLI(LightningCLI):

    def add_arguments_to_parser(self, parser: LightningArgumentParser) -> None:
        parser.add_lightning_class_args(FinetuningScheduler, 'finetune_scheduler')
        parser.set_defaults({
            'finetune_scheduler.ft_schedule': {
                0: {
                    'params': ['model.classifier.*'],  # the parameters for each phase definition can be fully specified
                    'max_transition_epoch': 2,
                    'lr': 0.001,
                },
                1: {
                    'params': ['model.backbone.*'],
                    'lr': 0.001 
                }
            },
        })

        # EarlyStopping
        parser.add_lightning_class_args(FTSEarlyStopping, "early_stopping")
        early_stopping_defaults = {
            "early_stopping.monitor": "valid_loss",
            "early_stopping.patience": 99999,  # disable early_stopping
            "early_stopping.mode": "min",
            "early_stopping.min_delta": 0.01,
        }
        parser.set_defaults(early_stopping_defaults)

        # ModelCheckpoint
        parser.add_lightning_class_args(FTSCheckpoint, 'model_checkpoint')
        model_checkpoint_defaults = {
            "model_checkpoint.filename": "epoch{epoch}_val_loss{valid_loss:.2f}",
            "model_checkpoint.monitor": "valid_loss",
            "model_checkpoint.mode": "min",
            "model_checkpoint.every_n_epochs": 1,
            "model_checkpoint.save_top_k": 5,
            "model_checkpoint.auto_insert_metric_name": False,
            "model_checkpoint.save_last": True
        }
        parser.set_defaults(model_checkpoint_defaults)


if __name__ == "__main__":
    BoringCLI(BoringModel, MyDataModule, seed_everything_default=False, save_config_overwrite=True)

Expected behavior

Call freeze with train_bn=False to avoid training the batch normalization parameters by default.
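
i.e., something like the following sketch, using Lightning's BaseFinetuning utility, whose freeze signature accepts train_bn (pl_module here is the LightningModule being trained):

from pytorch_lightning.callbacks.finetuning import BaseFinetuning

# freeze all parameters, including those of batch normalization layers
BaseFinetuning.freeze(pl_module, train_bn=False)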

Environment

  • CUDA:
    - GPU:
    - NVIDIA GeForce RTX 3090
    - available: True
    - version: 11.3
  • Packages:
    - finetuning-scheduler: 0.2.3
    - numpy: 1.23.3
    - pyTorch_debug: False
    - pyTorch_version: 1.12.1
    - pytorch-lightning: 1.7.7
    - tqdm: 4.64.1
  • System:
    - OS: Linux
    - architecture:
    - 64bit
    - ELF
    - processor: x86_64
    - python: 3.10.6
    - version: #54~20.04.1-Ubuntu SMP Thu Sep 1 16:17:26 UTC 2022

Additional context

some parameters appear in more than one parameter group

🐛 Bug

The following code reports the error: some parameters appear in more than one parameter group.
Commenting out case 1 and uncommenting case 2 makes the error disappear.

To Reproduce

import torch
from torch.utils.data import DataLoader, Dataset

from pytorch_lightning import LightningModule, LightningDataModule
from pytorch_lightning.utilities.cli import LightningArgumentParser, LightningCLI


class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):

    def __init__(self):
        super().__init__()
        self.layer1 = torch.nn.Linear(32, 32)
        self.layer2 = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer2(self.layer1(x))

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def validation_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("valid_loss", loss)

    def test_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("test_loss", loss)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=0.1)
        # case 1: with scheduler
        self.scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer=optimizer, factor=0.5, patience=5)
        return {
            'optimizer': optimizer,
            'lr_scheduler': {
                'scheduler': self.scheduler,
                'monitor': 'valid_loss',
            }
        }
        # case 2: without scheduler
        # return optimizer


class MyDataModule(LightningDataModule):

    def __init__(self):
        super().__init__()

    def train_dataloader(self) -> DataLoader:
        return DataLoader(RandomDataset(32, 64), batch_size=2)

    def val_dataloader(self) -> DataLoader:
        return DataLoader(RandomDataset(32, 64), batch_size=2)

    def test_dataloader(self) -> DataLoader:
        return DataLoader(RandomDataset(32, 64), batch_size=2)


from finetuning_scheduler import FinetuningScheduler, FTSCheckpoint, FTSEarlyStopping


class BoringCLI(LightningCLI):

    def add_arguments_to_parser(self, parser: LightningArgumentParser) -> None:
        parser.add_lightning_class_args(FinetuningScheduler, 'finetune_scheduler')
        parser.set_defaults({
            'finetune_scheduler.ft_schedule': {
                0: {
                    'params': ['layer1'],  # the parameters for each phase definition can be fully specified
                    'max_transition_epoch': 2,
                    'lr': 0.001,
                },
                1: {
                    'params': ['layer2'],
                    'lr': 0.001 
                }
            },
        })

        # EarlyStopping
        parser.add_lightning_class_args(FTSEarlyStopping, "early_stopping")
        early_stopping_defaults = {
            "early_stopping.monitor": "valid_loss",
            "early_stopping.patience": 99999,  # disable early_stopping
            "early_stopping.mode": "min",
            "early_stopping.min_delta": 0.01,
        }
        parser.set_defaults(early_stopping_defaults)

        # ModelCheckpoint
        parser.add_lightning_class_args(FTSCheckpoint, 'model_checkpoint')
        model_checkpoint_defaults = {
            "model_checkpoint.filename": "epoch{epoch}_val_loss{valid_loss:.2f}",
            "model_checkpoint.monitor": "valid_loss",
            "model_checkpoint.mode": "min",
            "model_checkpoint.every_n_epochs": 1,
            "model_checkpoint.save_top_k": 5,
            "model_checkpoint.auto_insert_metric_name": False,
            "model_checkpoint.save_last": True
        }
        parser.set_defaults(model_checkpoint_defaults)


if __name__ == "__main__":
    BoringCLI(BoringModel, MyDataModule, seed_everything_default=None, save_config_overwrite=True)

Expected behavior

Environment

  • Fine-Tuning Scheduler Version (e.g., 0.1.0): 0.2.0
  • PyTorch Lightning Version (e.g., 1.5.0): 1.7.0
  • PyTorch Version (e.g., 1.10): 1.12.1
  • Python version (e.g., 3.9): 3.9.7
  • OS (e.g., Linux): ubuntu 22.04
  • CUDA/cuDNN version: 11.6
  • GPU models and configuration: NVIDIA A100-SXM4-80GB
  • How you installed PyTorch (conda, pip, source): conda
  • If compiling from source, the output of torch.__config__.show():
  • Any other relevant information:

Additional context
