kaylode / theseus
General template for most PyTorch projects
License: MIT License
Hello, thanks for your repo! It's been very helpful for me.
I want to ask about the detection branch. Could you merge it into the master branch, or make it the master branch? I cloned your repo to use it, but it only has the master branch, which is missing a lot of files, so it's very difficult for me to run training.
I hope you can do it as soon as possible.
Have a nice day!
On this special day of the year, I just want you to know how much I appreciate your work and your passion for each project. Hopefully we will have great results at the upcoming thesis defense.
I wish you all the best today, my friend!
Adding a rename parameter to theseus.base.utilities.download: download_from_wandb()
theseus/base/utilities/download.py
def download_from_wandb(filename, run_path, save_dir, rename=None, generate_id_text_file=False):
    import os
    import os.path as osp
    from pathlib import Path

    import wandb

    try:
        path = wandb.restore(filename, run_path=run_path, root=save_dir)
        # Save the run id to wandb_id.txt
        if generate_id_text_file:
            wandb_id = osp.basename(run_path)
            with open(osp.join(save_dir, "wandb_id.txt"), "w") as f:
                f.write(wandb_id)
        # Optionally rename the downloaded file
        if rename:
            new_path = Path(path.name).parent / rename
            os.rename(path.name, str(new_path))
            return str(new_path)
        return path.name
    except Exception:
        LOGGER.text("Failed to download from wandb.", level=LoggerObserver.ERROR)
        return None
An example to run the function above:
import argparse

from theseus.base.utilities.download import download_from_wandb
from theseus.base.utilities.loggers.observer import LoggerObserver

LOGGER = LoggerObserver.getLogger("main")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--filename', type=str, help='most of the time this is checkpoints/best.pth')
    parser.add_argument('--run_path', type=str, help='model path on the WandB server')
    parser.add_argument('--save_dir', type=str, default=".", help='the directory to save the weights to')
    parser.add_argument('--rename', type=str, help='the new name for the weights')
    opt = parser.parse_args()

    download_from_wandb(
        filename=opt.filename,
        run_path=opt.run_path,
        save_dir=opt.save_dir,
        rename=opt.rename,
    )
"""
Bash script:
PYTHONPATH=. python3 tools/download_wandb_weights.py \
--filename "checkpoints/best.pth" \
--run_path "wandb_run_path" \
--rename "new_name"
"""
I forgot to create an issue about this in recent days.
When I tested the resume argument in WandbCallbacks, I encountered this error. Here's the log:
[Errno 2] No such file or directory: 'main'
/content/main
2022-04-04 12:21:56 | DEBUG | opt.py:override:78 - Overriding configuration...
2022-04-04 12:21:56 | INFO | classification/pipeline.py:__init__:51 - {
"global": {
"exp_name": null,
"exist_ok": false,
"debug": true,
"cfg_transform": "configs/classification/transform.yaml",
"save_dir": "/content/main/runs",
"device": "cuda:0",
"use_fp16": true,
"pretrained": null,
"resume": null
},
"trainer": {
"name": "SupervisedTrainer",
"args": {
"num_iterations": 2000,
"clip_grad": 10.0,
"evaluate_interval": 1,
"print_interval": 20,
"save_interval": 500
}
},
"model": {
"name": "BaseTimmModel",
"args": {
"name": "convnext_tiny",
"from_pretrained": true,
"num_classes": 180
}
},
"loss": {
"name": "FocalLoss"
},
"callbacks": [
{
"name": "LoggerCallbacks",
"args": null
},
{
"name": "CheckpointCallbacks",
"args": {
"best_key": "bl_acc"
}
},
{
"name": "VisualizerCallbacks",
"args": null
},
{
"name": "TensorboardCallbacks",
"args": null
},
{
"name": "WandbCallbacks",
"args": {
"username": "lannguyen",
"project_name": "theseus_classification",
"resume": true
}
}
],
"metrics": [
{
"name": "Accuracy",
"args": null
},
{
"name": "BalancedAccuracyMetric",
"args": null
},
{
"name": "F1ScoreMetric",
"args": {
"average": "weighted"
}
},
{
"name": "ConfusionMatrix",
"args": null
},
{
"name": "ErrorCases",
"args": null
}
],
"optimizer": {
"name": "AdamW",
"args": {
"lr": 0.001,
"weight_decay": 0.0005,
"betas": [
0.937,
0.999
]
}
},
"scheduler": {
"name": "SchedulerWrapper",
"args": {
"scheduler_name": "cosine2",
"t_initial": 7,
"t_mul": 0.9,
"eta_mul": 0.9,
"eta_min": 1e-06
}
},
"data": {
"dataset": {
"train": {
"name": "ImageFolderDataset",
"args": {
"image_dir": "/content/main/data/food-classification/train",
"txt_classnames": "configs/classification/classes.txt"
}
},
"val": {
"name": "ImageFolderDataset",
"args": {
"image_dir": "/content/main/data/food-classification/val",
"txt_classnames": "configs/classification/classes.txt"
}
}
},
"dataloader": {
"train": {
"name": "DataLoaderWithCollator",
"args": {
"batch_size": 32,
"drop_last": true,
"shuffle": false,
"collate_fn": {
"name": "MixupCutmixCollator",
"args": {
"mixup_alpha": 0.4,
"cutmix_alpha": 1.0,
"weight": [
0.2,
0.2
]
}
},
"sampler": {
"name": "BalanceSampler",
"args": null
}
}
},
"val": {
"name": "DataLoaderWithCollator",
"args": {
"batch_size": 32,
"drop_last": false,
"shuffle": true
}
}
}
}
}
2022-04-04 12:21:56 | DEBUG | opt.py:load_yaml:36 - Loading config from configs/classification/transform.yaml...
2022-04-04 12:21:57 | DEBUG | classification/datasets/folder_dataset.py:_calculate_classes_dist:71 - Calculating class distribution...
Downloading: "https://dl.fbaipublicfiles.com/convnext/convnext_tiny_1k_224_ema.pth" to /root/.cache/torch/hub/checkpoints/convnext_tiny_1k_224_ema.pth
Traceback (most recent call last):
File "/content/main/configs/classification/train.py", line 9, in <module>
train_pipeline = Pipeline(opts)
File "/content/main/theseus/classification/pipeline.py", line 159, in __init__
registry=CALLBACKS_REGISTRY
File "/content/main/theseus/utilities/getter.py", line 15, in get_instance_recursively
out = [get_instance_recursively(item, registry=registry, **kwargs) for item in config]
File "/content/main/theseus/utilities/getter.py", line 15, in <listcomp>
out = [get_instance_recursively(item, registry=registry, **kwargs) for item in config]
File "/content/main/theseus/utilities/getter.py", line 26, in get_instance_recursively
return registry.get(config['name'])(**args, **kwargs)
TypeError: type object got multiple values for keyword argument 'resume'
I guess it's because the resume arg is repeated in both global and WandbCallbacks. Maybe it also happens with TensorboardCallbacks.
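The "multiple values for keyword argument" error comes from registry.get(config['name'])(**args, **kwargs), where the same key is passed once from the component's own args and once from the globally forwarded kwargs. A minimal sketch of one possible fix (build_from_config is a hypothetical helper, not the repo's actual getter code): let per-component args take precedence by filtering duplicated keys out of the global kwargs.

```python
def build_from_config(config, registry, **global_kwargs):
    """Instantiate a component, letting its own args override global kwargs.

    Sketch only: mirrors the shape of get_instance_recursively's final call,
    but drops any globally forwarded kwarg (e.g. "resume") that the component
    config already defines, so Python never sees the key twice.
    """
    args = config.get("args") or {}
    filtered = {k: v for k, v in global_kwargs.items() if k not in args}
    return registry.get(config["name"])(**args, **filtered)
```

With this, a WandbCallbacks entry that sets resume: true keeps its own value even when the pipeline also forwards a global resume.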
Hey @kaylode, does Theseus only support binary-class segmentation for now? I tried to use it, but it returned a runtime error in the dice loss, like this:
File "/content/main/theseus/segmentation/losses/dice_loss.py", line 37, in forward
num = torch.sum(torch.mul(predict, target), dim=1) + self.smooth
RuntimeError: The size of tensor a (6750208) must match the size of tensor b (65536) at non-singleton dimension 1
Thanks!
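The size mismatch above (tensor a is num_classes times larger than tensor b) suggests the prediction keeps its class channel when flattened while the integer target mask does not. A sketch of the usual shape fix, under the assumption that predict is (N, C, H, W) and target is an integer mask (N, H, W) (flatten_for_dice is a hypothetical helper, not the repo's dice_loss.py code):

```python
import torch
import torch.nn.functional as F

def flatten_for_dice(predict, target, num_classes):
    """One-hot an integer mask so predict and target flatten to the same shape.

    Assumed shapes: predict (N, C, H, W), target (N, H, W) with class indices.
    After one-hot + permute, both flatten to (N, C*H*W), which is what an
    elementwise dice numerator like torch.mul(predict, target) requires.
    """
    if target.dim() == predict.dim() - 1:
        target = F.one_hot(target.long(), num_classes=num_classes)  # (N, H, W, C)
        target = target.permute(0, 3, 1, 2).float()                 # (N, C, H, W)
    return predict.reshape(predict.size(0), -1), target.reshape(target.size(0), -1)
```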
I encountered these errors when testing with SmoothCELoss and FocalLoss; the logs are below:
FocalLoss
[Errno 2] No such file or directory: 'main'
/content/main
2022-03-27 11:07:21 | DEBUG | stdout_logger.py:log_text:34 - Overriding configuration...
2022-03-27 11:07:21 | INFO | stdout_logger.py:log_text:28 - {
"global": {
"debug": true,
"cfg_transform": "configs/classification/transform.yaml",
"save_dir": "/content/main/runs",
"device": "cuda:0",
"use_fp16": true,
"pretrained": null,
"resume": null
},
"trainer": {
"name": "SupervisedTrainer",
"args": {
"num_iterations": 3000,
"clip_grad": 10.0,
"evaluate_interval": 1,
"print_interval": 20,
"save_interval": 500
}
},
"model": {
"name": "BaseTimmModel",
"args": {
"name": "convnext_small",
"from_pretrained": true,
"num_classes": 180
}
},
"loss": {
"name": "FocalLoss"
},
"callbacks": [
{
"name": "LoggerCallbacks",
"args": null
},
{
"name": "CheckpointCallbacks",
"args": {
"best_key": "bl_acc"
}
},
{
"name": "VisualizerCallbacks",
"args": null
},
{
"name": "TensorboardCallbacks",
"args": null
}
],
"metrics": [
{
"name": "Accuracy",
"args": null
},
{
"name": "BalancedAccuracyMetric",
"args": null
},
{
"name": "F1ScoreMetric",
"args": {
"average": "weighted"
}
},
{
"name": "ConfusionMatrix",
"args": null
},
{
"name": "ErrorCases",
"args": null
}
],
"optimizer": {
"name": "AdamW",
"args": {
"lr": 0.001,
"weight_decay": 0.0005,
"betas": [
0.937,
0.999
]
}
},
"scheduler": {
"name": "SchedulerWrapper",
"args": {
"scheduler_name": "cosine2",
"t_initial": 7,
"t_mul": 0.9,
"eta_mul": 0.9,
"eta_min": 1e-06
}
},
"data": {
"dataset": {
"train": {
"name": "ImageFolderDataset",
"args": {
"image_dir": "/content/main/data/food-classification/train",
"txt_classnames": "configs/classification/classes.txt"
}
},
"val": {
"name": "ImageFolderDataset",
"args": {
"image_dir": "/content/main/data/food-classification/val",
"txt_classnames": "configs/classification/classes.txt"
}
}
},
"dataloader": {
"train": {
"name": "DataLoaderWithCollator",
"args": {
"batch_size": 32,
"drop_last": true,
"shuffle": false,
"collate_fn": {
"name": "MixupCutmixCollator",
"args": {
"mixup_alpha": 0.4,
"cutmix_alpha": 1.0,
"weight": [
0.2,
0.2
]
}
},
"sampler": {
"name": "BalanceSampler",
"args": null
}
}
},
"val": {
"name": "DataLoaderWithCollator",
"args": {
"batch_size": 32,
"drop_last": false,
"shuffle": true
}
}
}
}
}
2022-03-27 11:07:21 | DEBUG | stdout_logger.py:log_text:34 - Loading config from configs/classification/transform.yaml...
2022-03-27 11:07:21 | DEBUG | stdout_logger.py:log_text:34 - Calculating class distribution...
Downloading: "https://dl.fbaipublicfiles.com/convnext/convnext_small_1k_224_ema.pth" to /root/.cache/torch/hub/checkpoints/convnext_small_1k_224_ema.pth
2022-03-27 11:07:46 | INFO | stdout_logger.py:log_text:28 - Number of trainable parameters: 49,593,108
2022-03-27 11:07:46 | INFO | stdout_logger.py:log_text:28 - Using CUDA:0 (Tesla T4, 15109.75MB)
2022-03-27 11:07:46 | INFO | stdout_logger.py:log_text:28 - Number of training samples: 88814
2022-03-27 11:07:46 | INFO | stdout_logger.py:log_text:28 - Number of validation samples: 21775
2022-03-27 11:07:46 | INFO | stdout_logger.py:log_text:28 - Number of training iterations each epoch: 2775
2022-03-27 11:07:46 | INFO | stdout_logger.py:log_text:28 - Number of validation iterations each epoch: 681
2022-03-27 11:07:46 | INFO | stdout_logger.py:log_text:28 - Everything will be saved to /content/main/runs/2022-03-27_11-07-21
2022-03-27 11:07:46 | DEBUG | stdout_logger.py:log_text:34 - Saving config to /content/main/runs/2022-03-27_11-07-21/pipeline.yaml...
2022-03-27 11:07:46 | DEBUG | stdout_logger.py:log_text:34 - Saving config to /content/main/runs/2022-03-27_11-07-21/transform.yaml...
2022-03-27 11:07:46 | DEBUG | stdout_logger.py:log_text:34 - Start sanity checks
2022-03-27 11:07:47 | DEBUG | stdout_logger.py:log_text:34 - Visualizing architecture...
2022-03-27 11:07:50 | INFO | stdout_logger.py:log_text:28 - =============================EVALUATION===================================
100% 681/681 [04:04<00:00, 2.78it/s]
2022-03-27 11:11:56 | INFO | stdout_logger.py:log_text:28 - [0|3000] || L: 0.13242 || Time: 2.7617 (it/s)
2022-03-27 11:11:56 | INFO | stdout_logger.py:log_text:28 - acc: 0.00455 | bl_acc: 0.00411 | weighted-f1: 0.00332 |
2022-03-27 11:11:56 | INFO | stdout_logger.py:log_text:28 - ==========================================================================
2022-03-27 11:11:57 | DEBUG | stdout_logger.py:log_text:34 - Visualizing model predictions...
2022-03-27 11:11:59 | DEBUG | stdout_logger.py:log_text:34 - Visualizing dataset...
2022-03-27 11:12:01 | DEBUG | stdout_logger.py:log_text:34 - Analyzing datasets...
100% 88814/88814 [12:01<00:00, 123.05it/s]
100% 21775/21775 [02:12<00:00, 163.82it/s]
2022-03-27 11:26:17 | INFO | stdout_logger.py:log_text:28 - ===========================START TRAINING=================================
Traceback (most recent call last):
File "/content/main/configs/classification/train.py", line 10, in <module>
train_pipeline.fit()
File "/content/main/theseus/classification/pipeline.py", line 171, in fit
self.trainer.fit()
File "/content/main/theseus/base/trainer/base_trainer.py", line 65, in fit
self.training_epoch()
File "/content/main/theseus/base/trainer/supervised_trainer.py", line 68, in training_epoch
outputs = self.model.training_step(batch)
File "/content/main/theseus/classification/models/wrapper.py", line 34, in training_step
return self.forward(batch)
File "/content/main/theseus/classification/models/wrapper.py", line 22, in forward
loss, loss_dict = self.criterion(outputs, batch, self.device)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/content/main/theseus/classification/losses/focal_loss.py", line 21, in forward
targets = nn.functional.one_hot(targets, num_classes=num_classes)
RuntimeError: one_hot is only applicable to index tensor.
SmoothCELoss
[Errno 2] No such file or directory: 'main'
/content/main
2022-03-27 11:48:37 | DEBUG | stdout_logger.py:log_text:34 - Overriding configuration...
2022-03-27 11:48:37 | INFO | stdout_logger.py:log_text:28 - {
"global": {
"debug": true,
"cfg_transform": "configs/classification/transform.yaml",
"save_dir": "/content/main/runs",
"device": "cuda:0",
"use_fp16": true,
"pretrained": null,
"resume": null
},
"trainer": {
"name": "SupervisedTrainer",
"args": {
"num_iterations": 3000,
"clip_grad": 10.0,
"evaluate_interval": 1,
"print_interval": 20,
"save_interval": 500
}
},
"model": {
"name": "BaseTimmModel",
"args": {
"name": "convnext_small",
"from_pretrained": true,
"num_classes": 180
}
},
"loss": {
"name": "SmoothCELoss"
},
"callbacks": [
{
"name": "LoggerCallbacks",
"args": null
},
{
"name": "CheckpointCallbacks",
"args": {
"best_key": "bl_acc"
}
},
{
"name": "VisualizerCallbacks",
"args": null
},
{
"name": "TensorboardCallbacks",
"args": null
}
],
"metrics": [
{
"name": "Accuracy",
"args": null
},
{
"name": "BalancedAccuracyMetric",
"args": null
},
{
"name": "F1ScoreMetric",
"args": {
"average": "weighted"
}
},
{
"name": "ConfusionMatrix",
"args": null
},
{
"name": "ErrorCases",
"args": null
}
],
"optimizer": {
"name": "AdamW",
"args": {
"lr": 0.001,
"weight_decay": 0.0005,
"betas": [
0.937,
0.999
]
}
},
"scheduler": {
"name": "SchedulerWrapper",
"args": {
"scheduler_name": "cosine2",
"t_initial": 7,
"t_mul": 0.9,
"eta_mul": 0.9,
"eta_min": 1e-06
}
},
"data": {
"dataset": {
"train": {
"name": "ImageFolderDataset",
"args": {
"image_dir": "/content/main/data/food-classification/train",
"txt_classnames": "configs/classification/classes.txt"
}
},
"val": {
"name": "ImageFolderDataset",
"args": {
"image_dir": "/content/main/data/food-classification/val",
"txt_classnames": "configs/classification/classes.txt"
}
}
},
"dataloader": {
"train": {
"name": "DataLoaderWithCollator",
"args": {
"batch_size": 32,
"drop_last": true,
"shuffle": false,
"collate_fn": {
"name": "MixupCutmixCollator",
"args": {
"mixup_alpha": 0.4,
"cutmix_alpha": 1.0,
"weight": [
0.2,
0.2
]
}
},
"sampler": {
"name": "BalanceSampler",
"args": null
}
}
},
"val": {
"name": "DataLoaderWithCollator",
"args": {
"batch_size": 32,
"drop_last": false,
"shuffle": true
}
}
}
}
}
2022-03-27 11:48:37 | DEBUG | stdout_logger.py:log_text:34 - Loading config from configs/classification/transform.yaml...
2022-03-27 11:48:37 | DEBUG | stdout_logger.py:log_text:34 - Calculating class distribution...
2022-03-27 11:48:43 | INFO | stdout_logger.py:log_text:28 - Number of trainable parameters: 49,593,108
2022-03-27 11:48:43 | INFO | stdout_logger.py:log_text:28 - Using CUDA:0 (Tesla T4, 15109.75MB)
2022-03-27 11:48:43 | INFO | stdout_logger.py:log_text:28 - Number of training samples: 88814
2022-03-27 11:48:43 | INFO | stdout_logger.py:log_text:28 - Number of validation samples: 21775
2022-03-27 11:48:43 | INFO | stdout_logger.py:log_text:28 - Number of training iterations each epoch: 2775
2022-03-27 11:48:43 | INFO | stdout_logger.py:log_text:28 - Number of validation iterations each epoch: 681
2022-03-27 11:48:43 | INFO | stdout_logger.py:log_text:28 - Everything will be saved to /content/main/runs/2022-03-27_11-48-37
2022-03-27 11:48:43 | DEBUG | stdout_logger.py:log_text:34 - Saving config to /content/main/runs/2022-03-27_11-48-37/pipeline.yaml...
2022-03-27 11:48:43 | DEBUG | stdout_logger.py:log_text:34 - Saving config to /content/main/runs/2022-03-27_11-48-37/transform.yaml...
2022-03-27 11:48:43 | DEBUG | stdout_logger.py:log_text:34 - Start sanity checks
2022-03-27 11:48:44 | DEBUG | stdout_logger.py:log_text:34 - Visualizing architecture...
2022-03-27 11:48:47 | INFO | stdout_logger.py:log_text:28 - =============================EVALUATION===================================
100% 681/681 [04:04<00:00, 2.78it/s]
2022-03-27 11:52:53 | INFO | stdout_logger.py:log_text:28 - [0|3000] || CE: 5.19444 || Time: 2.7645 (it/s)
2022-03-27 11:52:53 | INFO | stdout_logger.py:log_text:28 - acc: 0.00822 | bl_acc: 0.00766 | weighted-f1: 0.00479 |
2022-03-27 11:52:53 | INFO | stdout_logger.py:log_text:28 - ==========================================================================
2022-03-27 11:52:54 | DEBUG | stdout_logger.py:log_text:34 - Visualizing model predictions...
2022-03-27 11:52:56 | DEBUG | stdout_logger.py:log_text:34 - Visualizing dataset...
2022-03-27 11:52:58 | DEBUG | stdout_logger.py:log_text:34 - Analyzing datasets...
100% 88814/88814 [12:02<00:00, 122.99it/s]
100% 21775/21775 [02:13<00:00, 163.64it/s]
2022-03-27 12:07:15 | INFO | stdout_logger.py:log_text:28 - ===========================START TRAINING=================================
Traceback (most recent call last):
File "/content/main/configs/classification/train.py", line 10, in <module>
train_pipeline.fit()
File "/content/main/theseus/classification/pipeline.py", line 171, in fit
self.trainer.fit()
File "/content/main/theseus/base/trainer/base_trainer.py", line 65, in fit
self.training_epoch()
File "/content/main/theseus/base/trainer/supervised_trainer.py", line 68, in training_epoch
outputs = self.model.training_step(batch)
File "/content/main/theseus/classification/models/wrapper.py", line 34, in training_step
return self.forward(batch)
File "/content/main/theseus/classification/models/wrapper.py", line 22, in forward
loss, loss_dict = self.criterion(outputs, batch, self.device)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/content/main/theseus/classification/losses/ce_loss.py", line 37, in forward
loss = self.criterion(pred, target.view(-1).contiguous())
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/timm/loss/cross_entropy.py", line 22, in forward
nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
RuntimeError: gather(): Expected dtype int64 for index
I guess it's an error from the MixupCutmixCollator, something related to torch.int64.
Here's the link to the notebook I used for testing, if you want to have a look: notebook
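Both tracebacks are consistent with that guess: a mixup/cutmix collator replaces hard int64 labels with soft float vectors, so one_hot() ("only applicable to index tensor") and gather() ("Expected dtype int64 for index") both fail. A minimal sketch of a loss that tolerates either label form (soft_cross_entropy is illustrative, not the repo's FocalLoss or SmoothCELoss code):

```python
import torch
import torch.nn as nn

def soft_cross_entropy(logits, targets):
    """Cross entropy that accepts hard int labels or soft float labels.

    Soft labels (float, shape (N, C)) are handled directly, so neither
    one_hot() nor gather() is ever called on a float tensor; hard labels
    fall through to the standard NLL path.
    """
    logprobs = torch.log_softmax(logits, dim=-1)
    if targets.is_floating_point() and targets.dim() == 2:
        return -(targets * logprobs).sum(dim=-1).mean()      # mixup/cutmix labels
    return nn.functional.nll_loss(logprobs, targets.long())  # index labels
```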
If anyone runs into this kind of error, it is usually due to the annotations of the dataset. Please try following these steps:
Find the image that gives the error (use Python's try/except to catch the image id), then
If none of the solutions above help, feel free to contact me. :>
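The try/except scan mentioned above can be sketched as follows (find_bad_samples is a hypothetical helper, assuming only a standard map-style dataset with __len__ and __getitem__, not any specific Theseus class):

```python
def find_bad_samples(dataset):
    """Return (index, error) pairs for every sample whose load raises.

    Iterates the dataset one sample at a time so a single broken
    annotation can be located instead of crashing the whole epoch.
    """
    bad = []
    for idx in range(len(dataset)):
        try:
            dataset[idx]
        except Exception as exc:
            bad.append((idx, repr(exc)))
    return bad
```

Once the offending indices are known, their annotations can be fixed or the samples excluded.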
Hi @kaylode, if you have time, could you update the notebooks (the config part)?
When I tested, I encountered this error:
Traceback (most recent call last):
File "/content/main/configs/classification/train.py", line 10, in <module>
train_pipeline.fit()
File "/content/main/theseus/base/pipeline.py", line 237, in fit
self.trainer.fit()
File "/content/main/theseus/base/trainer/base_trainer.py", line 71, in fit
self.training_epoch()
File "/content/main/theseus/base/trainer/supervised_trainer.py", line 83, in training_epoch
self.scaler(loss, self.optimizer)
TypeError: 'bool' object is not callable
So I think it might be a problem with the scaler. After changing the default of use_fp16 to True in BaseTrainer, it's runnable, like this:
class BaseTrainer():
    def __init__(self,
                 use_fp16: bool = True,
                 ...
                 ):
It doesn't work even though I've already set the global use_fp16 variable to true:
global:
  exp_name: null
  exist_ok: false
  debug: true
  cfg_transform: configs/classification/transform.yaml
  save_dir: /content/main/runs
  device: cuda:0
  use_fp16: true
  pretrained: null
  resume: null
So I think it might be a real issue. I notice that in SupervisedTrainer there isn't any guard on the scaler when use_fp16 is set to False, so it can trigger this error: TypeError: 'bool' object is not callable