@AlainPilon hello,
Thank you for reaching out and providing detailed context about your issue. It seems like you're experiencing unexpected behavior when continuing training from a pre-existing model. Here are a few points to consider that might help address the issue:
- Reproducible Example: To better understand and diagnose the problem, it would be very helpful if you could provide a minimum reproducible example of your training script. This will allow us to replicate the issue on our end and offer more precise guidance. You can find more information on how to create a reproducible example here.
- Resume Training: When you want to continue training from where you left off, using the `resume` option is generally recommended. This ensures that not only the model weights but also the optimizer state and learning rate scheduler are restored. If you haven't tried this yet, you might want to give it a shot: `yolo train resume model=path/to/your/last.pt`
- Learning Rate and Epochs: When adding new data, it might be beneficial to adjust the learning rate and the number of epochs. A lower learning rate can help the model fine-tune more effectively on the new data without "forgetting" what it has already learned. Additionally, training for more epochs might be necessary to see the full benefit of the new data.
- Data Quality and Distribution: Ensure that the new images are well-distributed across the classes and do not introduce any bias. Sometimes, even high-quality images can skew the training if they are not representative of the overall dataset.
- Latest Versions: Please verify that you are using the latest versions of the Ultralytics YOLO packages. Sometimes, updates include important bug fixes and performance improvements that could resolve your issue.
If you can provide the additional details mentioned above, we can further assist you in troubleshooting this issue. Thank you for your patience and cooperation!
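To make the distinction concrete, here is a simplified sketch (not Ultralytics' actual implementation) of what `resume` restores from a checkpoint compared to a plain fine-tune: resuming brings back the optimizer state and epoch counter along with the weights, while fine-tuning keeps only the weights.

```python
# Simplified sketch (NOT Ultralytics' actual code) of resume vs. fine-tune.

def load_for_resume(ckpt):
    """Continue an interrupted run: weights, optimizer state, and epoch."""
    return {
        "weights": ckpt["model"],
        "optimizer": ckpt["optimizer"],    # momentum buffers, LR schedule state
        "start_epoch": ckpt["epoch"] + 1,  # pick up right where we stopped
    }

def load_for_finetune(ckpt):
    """Start a fresh run from pre-trained weights only."""
    return {"weights": ckpt["model"], "optimizer": None, "start_epoch": 0}

ckpt = {"model": "weights-blob", "optimizer": "sgd-state", "epoch": 12}
print(load_for_resume(ckpt)["start_epoch"])    # 13
print(load_for_finetune(ckpt)["start_epoch"])  # 0
```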
from ultralytics.
I added the `resume` argument and it seems to work judging by the output (top1_acc is at 64% right from the start).
But strangely, the output when starting the training says `resume=False`:
New https://pypi.org/project/ultralytics/8.2.42 available 😃 Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.0.81 🚀 Python-3.8.10 torch-2.0.1+cu117 CUDA:0 (Tesla T4, 15102MiB)
yolo/engine/trainer: task=classify, mode=train, model=/home/ubuntu/s3pictures/5_class_v1_assembly_1/5_class_v1_assembly_1.pt, data=/home/ubuntu/s3pictures/5_class_v1_assembly_1, epochs=40, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=0, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=SGD, verbose=True, seed=0, deterministic=True, single_cls=False, image_weights=False, rect=False, cos_lr=False, close_mosaic=0, resume=False, amp=True, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, vid_stride=1, line_thickness=3, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, v5loader=False, tracker=botsort.yaml, save_dir=runs/classify/train10
Regarding the learning rate, my understanding was that if I keep using the previous dataset, to which I added new data, I should not have to decrease it: the model won't "forget" the original images, since it will keep training on them at each epoch.
Regarding the number of epochs, I set a high value (40), save a checkpoint every 2 epochs, and will review afterward whether there is any overfitting.
I will let the training complete and report back. thanks.
You have to disable warmup by adding `warmup_epochs=0` and use a lower learning rate by adding `optimizer="SGD", lr0=0.001`.
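The reason warmup matters here: during the warmup phase the learning rate is ramped toward `lr0`, which can briefly override the low fine-tuning rate you set. A minimal sketch of linear warmup (the general idea, not the exact Ultralytics schedule):

```python
def warmup_lr(epoch, warmup_epochs, lr0, warmup_start=0.0):
    """Linearly ramp the learning rate toward lr0 over the first warmup_epochs."""
    if warmup_epochs and epoch < warmup_epochs:
        frac = epoch / warmup_epochs
        return warmup_start + frac * (lr0 - warmup_start)
    return lr0  # warmup finished, or disabled with warmup_epochs=0

# With warmup disabled, the low fine-tuning rate applies from epoch 0:
print(warmup_lr(0, warmup_epochs=0, lr0=0.001))  # 0.001
# With 3 warmup epochs, epoch 1 is still ramping and sits below lr0:
print(warmup_lr(1, warmup_epochs=3, lr0=0.01))   # ~0.0033
```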
from ultralytics.
@Y-T-G I used this:
yolo classify train resume model='/home/ubuntu/s3pictures/5_class_v1_initial_training/5_class_v1_1.pt' data='/home/ubuntu/s3pictures/5_class_v1_initial_training' epochs=40 imgsz=640 device=0 save_period=2 warmup_epochs=0 optimizer="SGD" lr0=0.001
but the `yolo/engine/trainer` output shows `lr0=0.01` instead of 0.001. Is this normal?
Try specifying SGD without quotes.
That does not change anything.
Probably unrelated, but `save_period=2` also does not seem to have any effect, as the training only saves `best.pt` and `last.pt`.
The logs should say what parameters were actually used on the first line when you run it. Post the beginning of the logs, before training starts.
My command:
yolo classify train resume model='/home/ubuntu/s3pictures/5_class_v1_initial_training/5_class_v1_1.pt' data='/home/ubuntu/s3pictures/5_class_v1_initial_training' epochs=10 imgsz=640 device=0 save_period=1 warmup_epochs=0 optimizer=SGD lr0=0.001
log output:
yolo/engine/trainer: task=classify, mode=train, model=/home/ubuntu/s3pictures/5_class_v1_initial_training/5_class_v1_1.pt, data=/home/ubuntu/s3pictures/5_class_v1_initial_training, epochs=40, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=0, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=SGD, verbose=True, seed=0, deterministic=True, single_cls=False, image_weights=False, rect=False, cos_lr=False, close_mosaic=0, resume=False, amp=True, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, vid_stride=1, line_thickness=3, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, v5loader=False, tracker=botsort.yaml, save_dir=runs/classify/train10
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
AMP: checks passed ✅
optimizer: SGD(lr=0.01) with parameter groups 50 weight(decay=0.0), 51 weight(decay=0.0005), 51 bias
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/classify/train10
Starting training for 40 epochs...
Notice that I asked for 10 epochs but the script wants to train for 40. My initial training was for 40 epochs; is it taking the value from there?
Found the issue!
I upgraded ultralytics, and then the command crashed because my initial training had already reached epoch 40, which is incompatible with the `resume` option.
I removed it and everything works as expected.
Moral of the story: `resume` should only be used when the initial training has not been completed.
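The crash is consistent with a simple bookkeeping check on resume; here is a hypothetical sketch of that logic (an illustration, not the actual Ultralytics source):

```python
def resume_start_epoch(ckpt_epoch, total_epochs):
    """Return the epoch to resume from, or fail if the run already finished.

    ckpt_epoch is the last completed (0-indexed) epoch stored in the checkpoint.
    """
    start = ckpt_epoch + 1
    if start >= total_epochs:
        raise ValueError(
            f"Nothing to resume: {start}/{total_epochs} epochs already trained. "
            "Drop 'resume' and start a new fine-tuning run instead."
        )
    return start

print(resume_start_epoch(19, 40))  # 20: resuming an interrupted run is fine
# resume_start_epoch(39, 40) would raise: the 40-epoch run already completed.
```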
Hello @AlainPilon,
Thank you for the update and for sharing your findings! It's great to hear that you were able to identify the issue and resolve it. Indeed, the `resume` option is designed to continue training from an interrupted state, and it can cause conflicts if the initial training has already completed.
For future reference, if you want to fine-tune or continue training a model that has already completed its initial training, you can simply load the pre-trained model without `resume` and start a new training session. This approach lets you build upon the learned weights without the constraints of the previous training session's state.
Here's an example command for fine-tuning:
yolo classify train model='/home/ubuntu/s3pictures/5_class_v1_initial_training/5_class_v1_1.pt' data='/home/ubuntu/s3pictures/5_class_v1_initial_training' epochs=10 imgsz=640 device=0 save_period=1 warmup_epochs=0 optimizer=SGD lr0=0.001
If you encounter any further issues or have additional questions, feel free to reach out. We're here to help!
Best regards and happy training! 🚀