
Comments (7)

glenn-jocher commented on July 4, 2024

@FrederickLiew hello,

Thank you for reaching out with your question about the usage of the seed in retraining models. The seed parameter is indeed intended to control the randomness in the training process, ensuring reproducibility of results when set to a specific value.

However, in your case, it seems that changing the seed did not affect the results. This could be due to several reasons:

  1. Reproducibility Scope: The seed affects various aspects of the training process, such as data shuffling, weight initialization, and augmentation randomness. If your dataset or training configuration does not introduce significant randomness, the impact of changing the seed might be minimal.

  2. Model Initialization: When you reload the model with YOLO('yolov8n-seg.pt'), it initializes with the same pretrained weights each time. This can lead to similar results if other sources of randomness are not significant.

  3. Environment Consistency: Ensure that all other factors in your training environment remain consistent. Differences in hardware, software versions, or other configurations can also influence reproducibility.
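For intuition, the seeded shuffling described in point 1 can be sketched with Python's standard library. This is a minimal illustration of how a seed pins down shuffle order, not the actual Ultralytics implementation:

```python
import random

def shuffled_order(n, seed):
    # Shuffle the indices 0..n-1 with an explicitly seeded RNG so the
    # resulting order is fully determined by the seed.
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    return indices

# Same seed reproduces the same order; a different seed almost always
# changes it.
print(shuffled_order(8, 1))
print(shuffled_order(8, 2))
```

If two runs with different seeds still produce identical orders, the shuffle is not actually consuming the seed.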

To further investigate, please ensure you are using the latest versions of torch and ultralytics. You can upgrade your packages using the following command:

pip install --upgrade torch ultralytics

Additionally, if you could provide a minimum reproducible example, it would help us better understand and reproduce the issue on our end. You can refer to our guide on creating a minimum reproducible example here: Minimum Reproducible Example.

Here's a small code snippet to illustrate setting the seed:

from ultralytics import YOLO

# Set seed for reproducibility
seed = 0

# Load and train the model
model = YOLO('yolov8n-seg.pt')
results = model.train(data="D:/Yolov8/data/yaml/seed_test.yaml", epochs=5, seed=seed)

Feel free to adjust the seed value and observe any changes. If the issue persists, please share more details or any additional code snippets that might help us diagnose the problem.

from ultralytics.

FrederickLiew commented on July 4, 2024

Hi @glenn-jocher,
Is data shuffling for re-training affected by the seed? In my example it does not seem to be.


glenn-jocher commented on July 4, 2024

Hi @FrederickLiew,

Thank you for your question! Yes, the seed parameter is intended to control various sources of randomness in the training process, including data shuffling. This ensures that the training process is reproducible when the same seed is used.

However, if you are observing that changing the seed does not affect the data shuffling during retraining, there might be a few factors at play:

  1. Model Initialization: When you reload the model with YOLO('yolov8n-seg.pt'), it initializes with the same pretrained weights each time. This can lead to similar results if other sources of randomness are not significant.

  2. Dataset Size and Composition: If your dataset is relatively small or has a specific structure, the impact of shuffling might be less noticeable.

  3. Environment Consistency: Ensure that all other factors in your training environment remain consistent. Differences in hardware, software versions, or other configurations can also influence reproducibility.

To help us investigate further, could you please provide a minimum reproducible code example? This will allow us to better understand and reproduce the issue on our end. You can refer to our guide on creating a minimum reproducible example here: Minimum Reproducible Example.

Additionally, please ensure you are using the latest versions of torch and ultralytics. You can upgrade your packages using the following command:

pip install --upgrade torch ultralytics

Here's a small code snippet to illustrate setting the seed and observing its effect:

from ultralytics import YOLO

# Set seed for reproducibility
seed = 0

# Load and train the model
model = YOLO('yolov8n-seg.pt')
results = model.train(data="D:/Yolov8/data/yaml/seed_test.yaml", epochs=5, seed=seed)

Feel free to adjust the seed value and observe any changes. If the issue persists, please share more details or any additional code snippets that might help us diagnose the problem.

Looking forward to your response! 😊


FrederickLiew commented on July 4, 2024

Hi @glenn-jocher,
I managed to get a different score by doing this:

model = YOLO('yolov8n-seg.pt')
results = model.train(data="D:/Yolov8/data/yaml/seed_test.yaml", epochs=5, seed=0)

I swapped the ordering of my training data by renaming the second-to-last file (alphabetically) to aaaaa.jpg, and its annotation file to aaaaa.txt.
After running again:

model = YOLO('yolov8n-seg.pt')
results = model.train(data="D:/Yolov8/data/yaml/seed_test.yaml", epochs=5, seed=0)

I managed to get different results.
I was under the impression that the seed affects the shuffling of the dataset as well; would it be possible to get a human response on this?


glenn-jocher commented on July 4, 2024

Hi @FrederickLiew,

Thank you for your detailed follow-up! It's great to hear that you managed to observe different results by altering the order of your training data. This indeed sheds some light on the behavior you're experiencing.

To address your question: Yes, the seed parameter is intended to control the randomness in various aspects of the training process, including data shuffling. This ensures that the training process is reproducible when the same seed is used. However, there are a few nuances to consider:

  1. Data Shuffling: The seed should affect the shuffling of your dataset. If you are seeing different results with the same seed after renaming files, it suggests that the initial order of your dataset might be influencing the shuffling process. This is an important observation and could indicate that the shuffling mechanism is deterministic based on the initial file order.

  2. Reproducibility: For complete reproducibility, it's crucial to ensure that all sources of randomness are controlled. This includes setting the seed for data shuffling, model initialization, and any other stochastic processes involved in training.

  3. Environment Consistency: Ensure that all other factors in your training environment remain consistent. Differences in hardware, software versions, or other configurations can also influence reproducibility.
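The file-order dependence described in point 1 can be illustrated (and removed) by sorting before shuffling. The helper below is a hypothetical sketch, not Ultralytics code:

```python
import random
from pathlib import Path

def deterministic_file_order(folder, seed):
    # Hypothetical sketch: sort the filenames first so the starting order
    # is canonical, then shuffle with a seeded RNG. The final order then
    # depends only on the seed, not on how the filesystem lists the files.
    files = sorted(p.name for p in Path(folder).iterdir() if p.suffix == ".jpg")
    rng = random.Random(seed)
    rng.shuffle(files)
    return files
```

Without the sorted() step, the result would inherit whatever order the filesystem returns, which is exactly the kind of hidden state that renaming a file can perturb.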

To further investigate, could you please confirm that you are using the latest versions of torch and ultralytics? You can upgrade your packages using the following command:

pip install --upgrade torch ultralytics

Additionally, if you could provide a minimum reproducible code example, it would help us better understand and reproduce the issue on our end. You can refer to our guide on creating a minimum reproducible example here: Minimum Reproducible Example.

Here's a small code snippet to illustrate setting the seed and observing its effect:

from ultralytics import YOLO

# Set seed for reproducibility
seed = 0

# Load and train the model
model = YOLO('yolov8n-seg.pt')
results = model.train(data="D:/Yolov8/data/yaml/seed_test.yaml", epochs=5, seed=seed)

Feel free to adjust the seed value and observe any changes. If the issue persists, please share more details or any additional code snippets that might help us diagnose the problem.

Looking forward to your response! 😊


FrederickLiew commented on July 4, 2024

Here's an MRE:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
results = model.train(data="coco8.yaml", epochs=5, seed=1)
model = YOLO("yolov8n.pt")
results = model.train(data="coco8.yaml", epochs=5, seed=2)

I get the same results for both.


glenn-jocher commented on July 4, 2024

Hi @FrederickLiew,

Thank you for providing the minimum reproducible example (MRE). This is very helpful! 😊

To address your observation, the seed parameter is indeed designed to control the randomness in the training process, including data shuffling, weight initialization, and augmentation randomness. However, if you are seeing identical results with different seeds, there are a few things we can check:

  1. Package Versions: Ensure you are using the latest versions of torch and ultralytics. Sometimes, updates include important fixes and improvements. You can upgrade your packages using the following command:

    pip install --upgrade torch ultralytics

  2. Environment Consistency: Make sure that all other factors in your training environment remain consistent. Differences in hardware, software versions, or other configurations can influence reproducibility.

  3. Initial Dataset Order: The initial order of your dataset files might be influencing the shuffling process. If the dataset order is deterministic, the shuffling might not introduce enough variability to see differences with different seeds.
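One quick way to confirm those versions from Python is via the standard library's importlib.metadata (the helper name below is my own, not part of any package):

```python
import importlib.metadata as md

def get_versions(packages):
    # Return the installed version string for each package name, or None
    # if the package is not installed in the current environment.
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = md.version(pkg)
        except md.PackageNotFoundError:
            versions[pkg] = None
    return versions

# Confirm the environment before debugging reproducibility issues.
print(get_versions(("torch", "ultralytics")))
```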

Here's a slightly modified version of your MRE to ensure we are setting the seed correctly and observing its effects:

from ultralytics import YOLO

# Set seed for reproducibility
seed1 = 1
seed2 = 2

# Load and train the model with seed1
model = YOLO("yolov8n.pt")
results1 = model.train(data="coco8.yaml", epochs=5, seed=seed1)

# Load and train the model with seed2
model = YOLO("yolov8n.pt")
results2 = model.train(data="coco8.yaml", epochs=5, seed=seed2)

# Compare results
print("Results with seed 1:", results1)
print("Results with seed 2:", results2)

If you continue to see identical results, please let us know, and we can further investigate. Additionally, if you have any specific configurations or additional details about your environment, feel free to share them.

Thank you for your patience and cooperation. Looking forward to your response! 😊

