This repository is an unofficial implementation of:
Deep Mutual Learning (Ying Zhang, Tao Xiang, Timothy M. Hospedales, Huchuan Lu)
- support more student networks (currently 2 networks)
- implement models other than Resnet_32 and WRN_28_10
- document the experimental setup and compare against the paper's results
- visualization
- CIFAR 100
- epochs : 200
- batch size : 64
- optimizer :
- SGD with Nesterov momentum
- initial learning rate = 0.1
- momentum = 0.9
  - learning rate decayed by a factor of 0.1 every 60 epochs (step=60, gamma=0.1)
- augmentation
- horizontal flips
  - random crops : padding=4, padding_mode='reflect' (see the sketch after this list)
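The settings above roughly correspond to the following PyTorch sketch. This is only an illustration: torchvision's ResNet-18 stands in for the repository's own Resnet_32 / WRN_28_10 definitions, and any weight decay used by main.py is omitted since it is not listed above.

```python
import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Augmentation: reflect-padded random crops and horizontal flips
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4, padding_mode='reflect'),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# CIFAR-100 with batch size 64
train_set = torchvision.datasets.CIFAR100(
    root='./data', train=True, download=True, transform=train_transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=2)

# Placeholder network; the repository uses its own Resnet_32 / WRN_28_10
model = torchvision.models.resnet18(num_classes=100)

# SGD with Nesterov momentum, initial lr 0.1, decayed by 0.1 every 60 epochs
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, nesterov=True)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=60, gamma=0.1)
```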
In PyTorch: L_C1 = nn.CrossEntropyLoss()(z1, labels)
In PyTorch: D_KL(p2 || p1) = nn.KLDivLoss()(F.log_softmax(z1, dim=1), F.softmax(z2, dim=1)), since KLDivLoss expects log-probabilities as input and probabilities as target
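Putting the two terms together, each student's objective in the paper is its cross-entropy loss plus the KL divergence from its peer's predictions (L_Θ1 = L_C1 + D_KL(p2 || p1), and symmetrically for network 2). Below is a minimal sketch of that computation, assuming z1 and z2 are the logits of the two networks on the same batch; detaching the peer's logits is one way to keep the two computation graphs separate (related to the 2021.8.25 note further down), and is not necessarily what main.py currently does.

```python
import torch.nn.functional as F

def dml_losses(z1, z2, labels):
    # Supervised cross-entropy term for each student
    ce1 = F.cross_entropy(z1, labels)
    ce2 = F.cross_entropy(z2, labels)

    # Mimicry terms: D_KL(p2 || p1) for student 1 and D_KL(p1 || p2) for
    # student 2. The peer's logits are detached so each network only
    # backpropagates through its own graph.
    kl1 = F.kl_div(F.log_softmax(z1, dim=1),
                   F.softmax(z2.detach(), dim=1), reduction='batchmean')
    kl2 = F.kl_div(F.log_softmax(z2, dim=1),
                   F.softmax(z1.detach(), dim=1), reduction='batchmean')

    loss1 = ce1 + kl1  # total objective for network 1
    loss2 = ce2 + kl2  # total objective for network 2
    return loss1, loss2
```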
Accuracy (%) on CIFAR-100 ("Ind." = trained independently, "DML-Ind." = gain from DML):

Net1 | Net2 | Ind. Net1 | Ind. Net2 | DML Net1 | DML Net2 | DML-Ind. Net1 | DML-Ind. Net2
---|---|---|---|---|---|---|---
Resnet-32 | Resnet-32 | 70.97 | 70.97 | 72.97 | 72.85 | 2.00 | 1.88
Resnet-32 | WRN_28_10 | 70.97 | 79.93 | 72.55 | 80.09 | 1.58 | 0.16
It seems that this implementation does not work properly yet.
(2021.8.25) The computation graphs of the two networks were not completely separated, and I am re-running the experiments with modifications.
- clone the repository
git clone https://github.com/pilsHan/DML.git
- requirements : PyTorch and torchvision (can be run on Colab)
- run main.py
python main.py --num_workers 2
https://github.com/chxy95/Deep-Mutual-Learning
https://github.com/meliketoy/wide-resnet.pytorch
https://arxiv.org/pdf/1706.00384.pdf