kaylode / theseus
General template for most PyTorch projects
License: MIT License
Hello, thanks for your repo! It's been very helpful for me.
I want to ask about the detection branch. Could you merge it into the master branch, or make it the master branch? I cloned your repo to use it, but it only has the master branch, which is missing a lot of files, so it's very difficult for me to run training.
I hope you can do it as soon as possible.
Have a nice day!
On this special day of the year, I just want you to know how much I appreciate your work and your passion for each project. Hopefully we will have great results at the upcoming thesis defense.
I wish you all the best today, my friend!
Adding a rename parameter to theseus.base.utilities.download: download_from_wandb()
theseus/base/utilities/download.py
def download_from_wandb(filename, run_path, save_dir, rename=None, generate_id_text_file=False):
    import os
    import os.path as osp
    from pathlib import Path

    import wandb

    try:
        path = wandb.restore(filename, run_path=run_path, root=save_dir)
        # Save the run id to wandb_id.txt
        if generate_id_text_file:
            wandb_id = osp.basename(run_path)
            with open(osp.join(save_dir, "wandb_id.txt"), "w") as f:
                f.write(wandb_id)
        # Optionally rename the downloaded file
        if rename:
            new_path = Path(path.name).parent / rename
            os.rename(path.name, str(new_path))
            return str(new_path)
        return path.name
    except Exception:
        LOGGER.text("Failed to download from wandb.", level=LoggerObserver.ERROR)
        return None
An example to run the function above:
import argparse

from theseus.base.utilities.download import download_from_wandb
from theseus.base.utilities.loggers.observer import LoggerObserver

LOGGER = LoggerObserver.getLogger("main")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--filename', type=str, help='most of the time this is checkpoints/best.pth')
    parser.add_argument('--run_path', type=str, help='model path on the WandB server')
    parser.add_argument('--save_dir', type=str, default=".", help='the directory to save the weights to')
    parser.add_argument('--rename', type=str, help='the new name for the weights')
    opt = parser.parse_args()

    download_from_wandb(
        filename=opt.filename,
        run_path=opt.run_path,
        save_dir=opt.save_dir,
        rename=opt.rename,
    )
"""
Bash script:
PYTHONPATH=. python3 tools/download_wandb_weights.py \
--filename "checkpoints/best.pth" \
--run_path "wandb_run_path" \
--rename "new_name"
"""
I forgot to create an issue about this in recent days.
When I tested the resume argument in WandbCallbacks, I encountered this error. Here's the log:
[Errno 2] No such file or directory: 'main'
/content/main
2022-04-04 12:21:56 | DEBUG | opt.py:override:78 - Overriding configuration...
2022-04-04 12:21:56 | INFO | classification/pipeline.py:__init__:51 - {
"global": {
"exp_name": null,
"exist_ok": false,
"debug": true,
"cfg_transform": "configs/classification/transform.yaml",
"save_dir": "/content/main/runs",
"device": "cuda:0",
"use_fp16": true,
"pretrained": null,
"resume": null
},
"trainer": {
"name": "SupervisedTrainer",
"args": {
"num_iterations": 2000,
"clip_grad": 10.0,
"evaluate_interval": 1,
"print_interval": 20,
"save_interval": 500
}
},
"model": {
"name": "BaseTimmModel",
"args": {
"name": "convnext_tiny",
"from_pretrained": true,
"num_classes": 180
}
},
"loss": {
"name": "FocalLoss"
},
"callbacks": [
{
"name": "LoggerCallbacks",
"args": null
},
{
"name": "CheckpointCallbacks",
"args": {
"best_key": "bl_acc"
}
},
{
"name": "VisualizerCallbacks",
"args": null
},
{
"name": "TensorboardCallbacks",
"args": null
},
{
"name": "WandbCallbacks",
"args": {
"username": "lannguyen",
"project_name": "theseus_classification",
"resume": true
}
}
],
"metrics": [
{
"name": "Accuracy",
"args": null
},
{
"name": "BalancedAccuracyMetric",
"args": null
},
{
"name": "F1ScoreMetric",
"args": {
"average": "weighted"
}
},
{
"name": "ConfusionMatrix",
"args": null
},
{
"name": "ErrorCases",
"args": null
}
],
"optimizer": {
"name": "AdamW",
"args": {
"lr": 0.001,
"weight_decay": 0.0005,
"betas": [
0.937,
0.999
]
}
},
"scheduler": {
"name": "SchedulerWrapper",
"args": {
"scheduler_name": "cosine2",
"t_initial": 7,
"t_mul": 0.9,
"eta_mul": 0.9,
"eta_min": 1e-06
}
},
"data": {
"dataset": {
"train": {
"name": "ImageFolderDataset",
"args": {
"image_dir": "/content/main/data/food-classification/train",
"txt_classnames": "configs/classification/classes.txt"
}
},
"val": {
"name": "ImageFolderDataset",
"args": {
"image_dir": "/content/main/data/food-classification/val",
"txt_classnames": "configs/classification/classes.txt"
}
}
},
"dataloader": {
"train": {
"name": "DataLoaderWithCollator",
"args": {
"batch_size": 32,
"drop_last": true,
"shuffle": false,
"collate_fn": {
"name": "MixupCutmixCollator",
"args": {
"mixup_alpha": 0.4,
"cutmix_alpha": 1.0,
"weight": [
0.2,
0.2
]
}
},
"sampler": {
"name": "BalanceSampler",
"args": null
}
}
},
"val": {
"name": "DataLoaderWithCollator",
"args": {
"batch_size": 32,
"drop_last": false,
"shuffle": true
}
}
}
}
}
2022-04-04 12:21:56 | DEBUG | opt.py:load_yaml:36 - Loading config from configs/classification/transform.yaml...
2022-04-04 12:21:57 | DEBUG | classification/datasets/folder_dataset.py:_calculate_classes_dist:71 - Calculating class distribution...
Downloading: "https://dl.fbaipublicfiles.com/convnext/convnext_tiny_1k_224_ema.pth" to /root/.cache/torch/hub/checkpoints/convnext_tiny_1k_224_ema.pth
Traceback (most recent call last):
File "/content/main/configs/classification/train.py", line 9, in <module>
train_pipeline = Pipeline(opts)
File "/content/main/theseus/classification/pipeline.py", line 159, in __init__
registry=CALLBACKS_REGISTRY
File "/content/main/theseus/utilities/getter.py", line 15, in get_instance_recursively
out = [get_instance_recursively(item, registry=registry, **kwargs) for item in config]
File "/content/main/theseus/utilities/getter.py", line 15, in <listcomp>
out = [get_instance_recursively(item, registry=registry, **kwargs) for item in config]
File "/content/main/theseus/utilities/getter.py", line 26, in get_instance_recursively
return registry.get(config['name'])(**args, **kwargs)
TypeError: type object got multiple values for keyword argument 'resume'
I guess it's because the resume arg is repeated in both global and WandbCallbacks. Maybe it also happens with TensorboardCallbacks.
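The "multiple values for keyword argument" error comes from registry.get(config['name'])(**args, **kwargs), where the same key is passed once from the component's own args and once from the globally forwarded kwargs. A minimal sketch of one possible fix (build_from_config is a hypothetical helper, not the repo's actual getter code): let per-component args take precedence by filtering duplicated keys out of the global kwargs.

```python
def build_from_config(config, registry, **global_kwargs):
    """Instantiate a component, letting its own args override global kwargs.

    Sketch only: mirrors the shape of get_instance_recursively's final call,
    but drops any globally forwarded kwarg (e.g. "resume") that the component
    config already defines, so Python never sees the key twice.
    """
    args = config.get("args") or {}
    filtered = {k: v for k, v in global_kwargs.items() if k not in args}
    return registry.get(config["name"])(**args, **filtered)
```

With this, a WandbCallbacks entry that sets resume: true keeps its own value even when the pipeline also forwards a global resume.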
Hey @kaylode, does Theseus only support binary-class segmentation for now? I tried to use it, but it returned a runtime error in the dice loss, like this:
File "/content/main/theseus/segmentation/losses/dice_loss.py", line 37, in forward
num = torch.sum(torch.mul(predict, target), dim=1) + self.smooth
RuntimeError: The size of tensor a (6750208) must match the size of tensor b (65536) at non-singleton dimension 1
Thanks!
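The size mismatch above (tensor a is num_classes times larger than tensor b) suggests the prediction keeps its class channel when flattened while the integer target mask does not. A sketch of the usual shape fix, under the assumption that predict is (N, C, H, W) and target is an integer mask (N, H, W) (flatten_for_dice is a hypothetical helper, not the repo's dice_loss.py code):

```python
import torch
import torch.nn.functional as F

def flatten_for_dice(predict, target, num_classes):
    """One-hot an integer mask so predict and target flatten to the same shape.

    Assumed shapes: predict (N, C, H, W), target (N, H, W) with class indices.
    After one-hot + permute, both flatten to (N, C*H*W), which is what an
    elementwise dice numerator like torch.mul(predict, target) requires.
    """
    if target.dim() == predict.dim() - 1:
        target = F.one_hot(target.long(), num_classes=num_classes)  # (N, H, W, C)
        target = target.permute(0, 3, 1, 2).float()                 # (N, C, H, W)
    return predict.reshape(predict.size(0), -1), target.reshape(target.size(0), -1)
```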
I encountered these errors when testing with SmoothCELoss and FocalLoss; the logs are below:
FocalLoss
[Errno 2] No such file or directory: 'main'
/content/main
2022-03-27 11:07:21 | DEBUG | stdout_logger.py:log_text:34 - Overriding configuration...
2022-03-27 11:07:21 | INFO | stdout_logger.py:log_text:28 - {
"global": {
"debug": true,
"cfg_transform": "configs/classification/transform.yaml",
"save_dir": "/content/main/runs",
"device": "cuda:0",
"use_fp16": true,
"pretrained": null,
"resume": null
},
"trainer": {
"name": "SupervisedTrainer",
"args": {
"num_iterations": 3000,
"clip_grad": 10.0,
"evaluate_interval": 1,
"print_interval": 20,
"save_interval": 500
}
},
"model": {
"name": "BaseTimmModel",
"args": {
"name": "convnext_small",
"from_pretrained": true,
"num_classes": 180
}
},
"loss": {
"name": "FocalLoss"
},
"callbacks": [
{
"name": "LoggerCallbacks",
"args": null
},
{
"name": "CheckpointCallbacks",
"args": {
"best_key": "bl_acc"
}
},
{
"name": "VisualizerCallbacks",
"args": null
},
{
"name": "TensorboardCallbacks",
"args": null
}
],
"metrics": [
{
"name": "Accuracy",
"args": null
},
{
"name": "BalancedAccuracyMetric",
"args": null
},
{
"name": "F1ScoreMetric",
"args": {
"average": "weighted"
}
},
{
"name": "ConfusionMatrix",
"args": null
},
{
"name": "ErrorCases",
"args": null
}
],
"optimizer": {
"name": "AdamW",
"args": {
"lr": 0.001,
"weight_decay": 0.0005,
"betas": [
0.937,
0.999
]
}
},
"scheduler": {
"name": "SchedulerWrapper",
"args": {
"scheduler_name": "cosine2",
"t_initial": 7,
"t_mul": 0.9,
"eta_mul": 0.9,
"eta_min": 1e-06
}
},
"data": {
"dataset": {
"train": {
"name": "ImageFolderDataset",
"args": {
"image_dir": "/content/main/data/food-classification/train",
"txt_classnames": "configs/classification/classes.txt"
}
},
"val": {
"name": "ImageFolderDataset",
"args": {
"image_dir": "/content/main/data/food-classification/val",
"txt_classnames": "configs/classification/classes.txt"
}
}
},
"dataloader": {
"train": {
"name": "DataLoaderWithCollator",
"args": {
"batch_size": 32,
"drop_last": true,
"shuffle": false,
"collate_fn": {
"name": "MixupCutmixCollator",
"args": {
"mixup_alpha": 0.4,
"cutmix_alpha": 1.0,
"weight": [
0.2,
0.2
]
}
},
"sampler": {
"name": "BalanceSampler",
"args": null
}
}
},
"val": {
"name": "DataLoaderWithCollator",
"args": {
"batch_size": 32,
"drop_last": false,
"shuffle": true
}
}
}
}
}
2022-03-27 11:07:21 | DEBUG | stdout_logger.py:log_text:34 - Loading config from configs/classification/transform.yaml...
2022-03-27 11:07:21 | DEBUG | stdout_logger.py:log_text:34 - Calculating class distribution...
Downloading: "https://dl.fbaipublicfiles.com/convnext/convnext_small_1k_224_ema.pth" to /root/.cache/torch/hub/checkpoints/convnext_small_1k_224_ema.pth
2022-03-27 11:07:46 | INFO | stdout_logger.py:log_text:28 - Number of trainable parameters: 49,593,108
2022-03-27 11:07:46 | INFO | stdout_logger.py:log_text:28 - Using CUDA:0 (Tesla T4, 15109.75MB)
2022-03-27 11:07:46 | INFO | stdout_logger.py:log_text:28 - Number of training samples: 88814
2022-03-27 11:07:46 | INFO | stdout_logger.py:log_text:28 - Number of validation samples: 21775
2022-03-27 11:07:46 | INFO | stdout_logger.py:log_text:28 - Number of training iterations each epoch: 2775
2022-03-27 11:07:46 | INFO | stdout_logger.py:log_text:28 - Number of validation iterations each epoch: 681
2022-03-27 11:07:46 | INFO | stdout_logger.py:log_text:28 - Everything will be saved to /content/main/runs/2022-03-27_11-07-21
2022-03-27 11:07:46 | DEBUG | stdout_logger.py:log_text:34 - Saving config to /content/main/runs/2022-03-27_11-07-21/pipeline.yaml...
2022-03-27 11:07:46 | DEBUG | stdout_logger.py:log_text:34 - Saving config to /content/main/runs/2022-03-27_11-07-21/transform.yaml...
2022-03-27 11:07:46 | DEBUG | stdout_logger.py:log_text:34 - Start sanity checks
2022-03-27 11:07:47 | DEBUG | stdout_logger.py:log_text:34 - Visualizing architecture...
2022-03-27 11:07:50 | INFO | stdout_logger.py:log_text:28 - =============================EVALUATION===================================
100% 681/681 [04:04<00:00, 2.78it/s]
2022-03-27 11:11:56 | INFO | stdout_logger.py:log_text:28 - [0|3000] || L: 0.13242 || Time: 2.7617 (it/s)
2022-03-27 11:11:56 | INFO | stdout_logger.py:log_text:28 - acc: 0.00455 | bl_acc: 0.00411 | weighted-f1: 0.00332 |
2022-03-27 11:11:56 | INFO | stdout_logger.py:log_text:28 - ==========================================================================
2022-03-27 11:11:57 | DEBUG | stdout_logger.py:log_text:34 - Visualizing model predictions...
2022-03-27 11:11:59 | DEBUG | stdout_logger.py:log_text:34 - Visualizing dataset...
2022-03-27 11:12:01 | DEBUG | stdout_logger.py:log_text:34 - Analyzing datasets...
100% 88814/88814 [12:01<00:00, 123.05it/s]
100% 21775/21775 [02:12<00:00, 163.82it/s]
2022-03-27 11:26:17 | INFO | stdout_logger.py:log_text:28 - ===========================START TRAINING=================================
Traceback (most recent call last):
File "/content/main/configs/classification/train.py", line 10, in <module>
train_pipeline.fit()
File "/content/main/theseus/classification/pipeline.py", line 171, in fit
self.trainer.fit()
File "/content/main/theseus/base/trainer/base_trainer.py", line 65, in fit
self.training_epoch()
File "/content/main/theseus/base/trainer/supervised_trainer.py", line 68, in training_epoch
outputs = self.model.training_step(batch)
File "/content/main/theseus/classification/models/wrapper.py", line 34, in training_step
return self.forward(batch)
File "/content/main/theseus/classification/models/wrapper.py", line 22, in forward
loss, loss_dict = self.criterion(outputs, batch, self.device)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/content/main/theseus/classification/losses/focal_loss.py", line 21, in forward
targets = nn.functional.one_hot(targets, num_classes=num_classes)
RuntimeError: one_hot is only applicable to index tensor.
SmoothCELoss
[Errno 2] No such file or directory: 'main'
/content/main
2022-03-27 11:48:37 | DEBUG | stdout_logger.py:log_text:34 - Overriding configuration...
2022-03-27 11:48:37 | INFO | stdout_logger.py:log_text:28 - {
"global": {
"debug": true,
"cfg_transform": "configs/classification/transform.yaml",
"save_dir": "/content/main/runs",
"device": "cuda:0",
"use_fp16": true,
"pretrained": null,
"resume": null
},
"trainer": {
"name": "SupervisedTrainer",
"args": {
"num_iterations": 3000,
"clip_grad": 10.0,
"evaluate_interval": 1,
"print_interval": 20,
"save_interval": 500
}
},
"model": {
"name": "BaseTimmModel",
"args": {
"name": "convnext_small",
"from_pretrained": true,
"num_classes": 180
}
},
"loss": {
"name": "SmoothCELoss"
},
"callbacks": [
{
"name": "LoggerCallbacks",
"args": null
},
{
"name": "CheckpointCallbacks",
"args": {
"best_key": "bl_acc"
}
},
{
"name": "VisualizerCallbacks",
"args": null
},
{
"name": "TensorboardCallbacks",
"args": null
}
],
"metrics": [
{
"name": "Accuracy",
"args": null
},
{
"name": "BalancedAccuracyMetric",
"args": null
},
{
"name": "F1ScoreMetric",
"args": {
"average": "weighted"
}
},
{
"name": "ConfusionMatrix",
"args": null
},
{
"name": "ErrorCases",
"args": null
}
],
"optimizer": {
"name": "AdamW",
"args": {
"lr": 0.001,
"weight_decay": 0.0005,
"betas": [
0.937,
0.999
]
}
},
"scheduler": {
"name": "SchedulerWrapper",
"args": {
"scheduler_name": "cosine2",
"t_initial": 7,
"t_mul": 0.9,
"eta_mul": 0.9,
"eta_min": 1e-06
}
},
"data": {
"dataset": {
"train": {
"name": "ImageFolderDataset",
"args": {
"image_dir": "/content/main/data/food-classification/train",
"txt_classnames": "configs/classification/classes.txt"
}
},
"val": {
"name": "ImageFolderDataset",
"args": {
"image_dir": "/content/main/data/food-classification/val",
"txt_classnames": "configs/classification/classes.txt"
}
}
},
"dataloader": {
"train": {
"name": "DataLoaderWithCollator",
"args": {
"batch_size": 32,
"drop_last": true,
"shuffle": false,
"collate_fn": {
"name": "MixupCutmixCollator",
"args": {
"mixup_alpha": 0.4,
"cutmix_alpha": 1.0,
"weight": [
0.2,
0.2
]
}
},
"sampler": {
"name": "BalanceSampler",
"args": null
}
}
},
"val": {
"name": "DataLoaderWithCollator",
"args": {
"batch_size": 32,
"drop_last": false,
"shuffle": true
}
}
}
}
}
2022-03-27 11:48:37 | DEBUG | stdout_logger.py:log_text:34 - Loading config from configs/classification/transform.yaml...
2022-03-27 11:48:37 | DEBUG | stdout_logger.py:log_text:34 - Calculating class distribution...
2022-03-27 11:48:43 | INFO | stdout_logger.py:log_text:28 - Number of trainable parameters: 49,593,108
2022-03-27 11:48:43 | INFO | stdout_logger.py:log_text:28 - Using CUDA:0 (Tesla T4, 15109.75MB)
2022-03-27 11:48:43 | INFO | stdout_logger.py:log_text:28 - Number of training samples: 88814
2022-03-27 11:48:43 | INFO | stdout_logger.py:log_text:28 - Number of validation samples: 21775
2022-03-27 11:48:43 | INFO | stdout_logger.py:log_text:28 - Number of training iterations each epoch: 2775
2022-03-27 11:48:43 | INFO | stdout_logger.py:log_text:28 - Number of validation iterations each epoch: 681
2022-03-27 11:48:43 | INFO | stdout_logger.py:log_text:28 - Everything will be saved to /content/main/runs/2022-03-27_11-48-37
2022-03-27 11:48:43 | DEBUG | stdout_logger.py:log_text:34 - Saving config to /content/main/runs/2022-03-27_11-48-37/pipeline.yaml...
2022-03-27 11:48:43 | DEBUG | stdout_logger.py:log_text:34 - Saving config to /content/main/runs/2022-03-27_11-48-37/transform.yaml...
2022-03-27 11:48:43 | DEBUG | stdout_logger.py:log_text:34 - Start sanity checks
2022-03-27 11:48:44 | DEBUG | stdout_logger.py:log_text:34 - Visualizing architecture...
2022-03-27 11:48:47 | INFO | stdout_logger.py:log_text:28 - =============================EVALUATION===================================
100% 681/681 [04:04<00:00, 2.78it/s]
2022-03-27 11:52:53 | INFO | stdout_logger.py:log_text:28 - [0|3000] || CE: 5.19444 || Time: 2.7645 (it/s)
2022-03-27 11:52:53 | INFO | stdout_logger.py:log_text:28 - acc: 0.00822 | bl_acc: 0.00766 | weighted-f1: 0.00479 |
2022-03-27 11:52:53 | INFO | stdout_logger.py:log_text:28 - ==========================================================================
2022-03-27 11:52:54 | DEBUG | stdout_logger.py:log_text:34 - Visualizing model predictions...
2022-03-27 11:52:56 | DEBUG | stdout_logger.py:log_text:34 - Visualizing dataset...
2022-03-27 11:52:58 | DEBUG | stdout_logger.py:log_text:34 - Analyzing datasets...
100% 88814/88814 [12:02<00:00, 122.99it/s]
100% 21775/21775 [02:13<00:00, 163.64it/s]
2022-03-27 12:07:15 | INFO | stdout_logger.py:log_text:28 - ===========================START TRAINING=================================
Traceback (most recent call last):
File "/content/main/configs/classification/train.py", line 10, in <module>
train_pipeline.fit()
File "/content/main/theseus/classification/pipeline.py", line 171, in fit
self.trainer.fit()
File "/content/main/theseus/base/trainer/base_trainer.py", line 65, in fit
self.training_epoch()
File "/content/main/theseus/base/trainer/supervised_trainer.py", line 68, in training_epoch
outputs = self.model.training_step(batch)
File "/content/main/theseus/classification/models/wrapper.py", line 34, in training_step
return self.forward(batch)
File "/content/main/theseus/classification/models/wrapper.py", line 22, in forward
loss, loss_dict = self.criterion(outputs, batch, self.device)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/content/main/theseus/classification/losses/ce_loss.py", line 37, in forward
loss = self.criterion(pred, target.view(-1).contiguous())
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/timm/loss/cross_entropy.py", line 22, in forward
nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
RuntimeError: gather(): Expected dtype int64 for index
I guess it's an error from the MixupCutmixCollator, something related to torch.int64.
Here's the link to the notebook I used for testing, if you want to have a look: notebook
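Both tracebacks are consistent with that guess: a mixup/cutmix collator replaces hard int64 labels with soft float vectors, so one_hot() ("only applicable to index tensor") and gather() ("Expected dtype int64 for index") both fail. A minimal sketch of a loss that tolerates either label form (soft_cross_entropy is illustrative, not the repo's FocalLoss or SmoothCELoss code):

```python
import torch
import torch.nn as nn

def soft_cross_entropy(logits, targets):
    """Cross entropy that accepts hard int labels or soft float labels.

    Soft labels (float, shape (N, C)) are handled directly, so neither
    one_hot() nor gather() is ever called on a float tensor; hard labels
    fall through to the standard NLL path.
    """
    logprobs = torch.log_softmax(logits, dim=-1)
    if targets.is_floating_point() and targets.dim() == 2:
        return -(targets * logprobs).sum(dim=-1).mean()      # mixup/cutmix labels
    return nn.functional.nll_loss(logprobs, targets.long())  # index labels
```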
If anyone runs into this kind of error, it is usually due to the annotations of the dataset. Please try following these steps:
Find the image that gives the error (use Python's try/except to catch the image id), then
If none of the solutions above help, feel free to contact me. :>
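The try/except scan mentioned above can be sketched as follows (find_bad_samples is a hypothetical helper, assuming only a standard map-style dataset with __len__ and __getitem__, not any specific Theseus class):

```python
def find_bad_samples(dataset):
    """Return (index, error) pairs for every sample whose load raises.

    Iterates the dataset one sample at a time so a single broken
    annotation can be located instead of crashing the whole epoch.
    """
    bad = []
    for idx in range(len(dataset)):
        try:
            dataset[idx]
        except Exception as exc:
            bad.append((idx, repr(exc)))
    return bad
```

Once the offending indices are known, their annotations can be fixed or the samples excluded.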
Hi @kaylode, if you have time, could you update the notebooks (the config part)?
When I tested, I encountered this error:
Traceback (most recent call last):
File "/content/main/configs/classification/train.py", line 10, in <module>
train_pipeline.fit()
File "/content/main/theseus/base/pipeline.py", line 237, in fit
self.trainer.fit()
File "/content/main/theseus/base/trainer/base_trainer.py", line 71, in fit
self.training_epoch()
File "/content/main/theseus/base/trainer/supervised_trainer.py", line 83, in training_epoch
self.scaler(loss, self.optimizer)
TypeError: 'bool' object is not callable
So I think it might be a problem with the scaler. After changing the default of use_fp16 to True in BaseTrainer, it's runnable, like this:
class BaseTrainer():
    def __init__(self,
                 use_fp16: bool = True,
                 ...
                 ):
It doesn't work even though I've already set the global use_fp16 variable to true:
global:
  exp_name: null
  exist_ok: false
  debug: true
  cfg_transform: configs/classification/transform.yaml
  save_dir: /content/main/runs
  device: cuda:0
  use_fp16: true
  pretrained: null
  resume: null
So I think it might be a real issue. I notice that in SupervisedTrainer there isn't any guard on the scaler when use_fp16 is set to False, so it can trigger this error: TypeError: 'bool' object is not callable