Comments (14)

timdebruin commented on July 24, 2024

The issue is now fixed on the master branch. If you pull the latest changes from there, it should now be possible again to train R2B by running the TrainR2B task. Please let me know if you run into any more difficulties and thanks again for letting us know about the issue!

from zoo.

appleleaves commented on July 24, 2024

Thanks for your advice! The code is working.

timdebruin commented on July 24, 2024

Hi @appleleaves thanks for bringing this to our attention. The code on zoo is complete and should allow training the model. I just tried running it again from zoo and I also get some errors. I will investigate what is going on and get back to you.

appleleaves commented on July 24, 2024

I ran the experiments using the command
lqz TrainR2B
Errors occur as:

Traceback (most recent call last):
  File ".../lqz", line 5, in <module>
    from larq_zoo.experiments import cli
ModuleNotFoundError: No module named 'larq_zoo.experiments'

P.S. I use Python 3.7, tensorflow-gpu 1.15, and larq installed via pip install larq (should be the latest)

timdebruin commented on July 24, 2024

Hi @appleleaves ,

larq installed by pip install larq(should be latest)

At the moment the latest fixes are not on PyPI yet, so for now you can install the latest version from master.

Edit: a new version of larq-zoo is now released, so if you run

 pip install larq_zoo --upgrade

then you should be able to use

 lqz TrainR2B

timdebruin commented on July 24, 2024

Just a heads-up though: the training procedure is a simplified version of what we used to train the model on larq-zoo. Our internal code depends on our infrastructure, and the training code on zoo is meant more as documentation of the training procedure. It should run and produce (almost) the same result, but to keep it simple it omits some things, like saving intermediate results for resuming and logging of diagnostics.

You also mention that you are interested in SOTA networks. That is kind of a difficult thing to define. For instance, comparing the R2B network to QuickNetXL: R2B has half the number of parameters, but when doing inference on a Pixel 1 using LCE the inference times are almost the same, while QuickNetXL gets 2% higher accuracy. So that might also be an interesting network to look at, especially since it only has a single training phase, which is a lot simpler.

appleleaves commented on July 24, 2024

When I run the code using lqz TrainR2B, the process was just killed with no error message.
Any idea why this happens?

AdamHillier commented on July 24, 2024

When I run the code using lqz TrainR2B, the process was just killed with no error message.
Any idea why this happens?

Have you checked if this is an out-of-memory issue? By default, we cache our datasets in RAM, which means you need a lot of RAM to train (for ImageNet and other big datasets). You can disable caching by commenting out the call on this line.
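To see why in-memory caching can be a problem, here is a rough back-of-the-envelope estimate. It assumes the cache holds decoded uint8 RGB images at 224×224; real pipelines may cache raw JPEG bytes or float tensors instead, which changes the number considerably.

```python
# Back-of-the-envelope RAM estimate for caching decoded ImageNet-1k
# in memory. Assumes uint8 RGB images already resized to 224x224.
num_images = 1_281_167           # ImageNet-1k training set size
bytes_per_image = 224 * 224 * 3  # H * W * C at one byte per channel
total_gib = num_images * bytes_per_image / 2**30
print(f"~{total_gib:.0f} GiB")   # roughly 180 GiB of RAM
```

Under these assumptions a cached ImageNet training set alone needs on the order of 180 GiB, far beyond typical workstation RAM, which is consistent with the process being silently killed by the OS out-of-memory handler.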

appleleaves commented on July 24, 2024

The R2B training has 4 stages; how can I resume it from, for example, the end of the first stage?

timdebruin commented on July 24, 2024

The R2B training has 4 stages; how can I resume it from, for example, the end of the first stage?

The weights of the intermediate models are saved in a shared directory (see here and here). Note that each separate run has its own directory.

The weights of previous stages are loaded in later stages, see for example here. When only the model name is given (e.g. initialize_teacher_weights_from = Field("resnet_fp")) it will be loaded from the current experiment directory. It is possible to load models from other directories (such as those of a previous experiment) by providing the full path to the model. You can then use the initial_stage Field to start the experiment from a later stage. All these Fields can also be set through the CLI, for example something like:

lqz TrainR2B initial_stage=1 stage_1.initialize_teacher_weights_from="/home/yourname/some/dir/models/resnet_fp"

appleleaves commented on July 24, 2024

I followed lqz TrainR2B initial_stage=1 stage_1.initialize_teacher_weights_from="/home/yourname/some/dir/models/resnet_fp", but it seems there is a bug!

I then fixed it by changing the code at the bottom of larq_zoo/training/knowledge_distillation/multi_stage_training.py to:

def run(self) -> None:
    Path(self.parent_output_dir).mkdir(parents=True, exist_ok=True)
    initial_stage = getattr(self, "initial_stage", None)
    for i, experiment in enumerate(self.experiments):
        if initial_stage is not None and i < initial_stage:
            continue
        print(f"Starting stage {experiment.stage} at {datetime.now().isoformat()}.")
        experiment.run()

Please let me know if I am wrong or whether the code should be fixed.

koenhelwegen commented on July 24, 2024

It indeed seems initial_stage is not handled correctly, thanks for pointing this out! Accessing field attributes is safe and initial_stage defaults to 0, so you can just use:

def run(self) -> None:
    Path(self.parent_output_dir).mkdir(parents=True, exist_ok=True)
    for i, experiment in enumerate(self.experiments):
        if experiment.stage < self.initial_stage:
            print(f"Skipping stage {experiment.stage}")
            continue
        print(f"Starting stage {experiment.stage} at {datetime.now().isoformat()}.")
        experiment.run()
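
The skip semantics can be sanity-checked in isolation. This is a standalone sketch; FakeExperiment and run_from are hypothetical stand-ins, not real larq_zoo classes.

```python
# Stand-alone check of the initial_stage skip logic: stages below
# initial_stage are skipped, all later stages run.
class FakeExperiment:
    def __init__(self, stage):
        self.stage = stage
        self.ran = False

    def run(self):
        self.ran = True

def run_from(experiments, initial_stage=0):
    for experiment in experiments:
        if experiment.stage < initial_stage:
            continue  # stage already completed in an earlier run
        experiment.run()

stages = [FakeExperiment(s) for s in range(4)]  # R2B has 4 stages
run_from(stages, initial_stage=1)
print([e.ran for e in stages])  # [False, True, True, True]
```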

Feel free to make a PR fixing this.

phuocphn commented on July 24, 2024

Hi @timdebruin
Thank you so much for your implementation.
I am using your implementation as a reference for reproducing the results of the original paper on the CIFAR-100 dataset, and I have some errors regarding the shortcut connection implementation here

x = tf.keras.layers.AvgPool2D(

When I train on the ImageNet dataset it works well because the input size is large, but for CIFAR-100 the input size is quite small and cannot pass through AvgPool2D. This can be solved by padding the input before the AvgPool2D layer, but the result I get is very low.
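
For intuition, the standard output-size arithmetic for a stride-2 pool with "valid" padding shows how quickly a 32×32 CIFAR input runs out of spatial resolution. This is a framework-independent sketch; pooled_size is an illustrative helper, not part of larq_zoo.

```python
# Spatial size after a pool/conv layer with "valid" padding:
# out = floor((in - pool) / stride) + 1
def pooled_size(in_size, pool, stride):
    return (in_size - pool) // stride + 1

# A 32x32 CIFAR input shrinks to 1x1 after five stride-2 stages,
# so any further AvgPool2D has nothing left to pool.
sizes = [32]
for _ in range(5):
    sizes.append(pooled_size(sizes[-1], pool=2, stride=2))
print(sizes)  # [32, 16, 8, 4, 2, 1]
```

By contrast a 224×224 ImageNet input survives the same number of stride-2 stages with spatial extent to spare, which is why the same architecture only breaks on the smaller dataset.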

I wonder how I can handle this shortcut implementation properly in the case of the CIFAR-100 dataset?

Thank you so much.

koenhelwegen commented on July 24, 2024

For details on the architecture we recommend reaching out to the original authors directly. Architecture details like striding and pooling may differ from ImageNet. The paper also mentions some other differences between the two datasets, such as mix-up, which you may want to consider.
