@Article{berrada2019deep,
author = {Berrada, Leonard and Zisserman, Andrew and Kumar, M Pawan},
title = {Deep Frank-Wolfe For Neural Network Optimization},
journal = {International Conference on Learning Representations},
year = {2019},
}
Requirements
This code should work with pytorch >= 1.0 in python 3. Detailed requirements are available in requirements.txt.
Installation
Clone this repository: git clone --recursive https://github.com/oval-group/dfw (note that the --recursive option is necessary to clone the submodules; these are needed to reproduce the experiments but not for the DFW implementation itself).
Go to the directory and install the requirements: cd dfw && pip install -r requirements.txt
Install the DFW package: python setup.py install
Example of Usage
Simple usage example:
from dfw import DFW
from dfw.losses import MultiClassHingeLoss

# boilerplate code:
# `model` is a nn.Module
# `x` is an input sample, `y` is a label

# create loss function
svm = MultiClassHingeLoss()

# create DFW optimizer with learning rate of 0.1
optimizer = DFW(model.parameters(), eta=0.1)

# DFW can be used with standard pytorch syntax
optimizer.zero_grad()
loss = svm(model(x), y)
loss.backward()

# NB: DFW needs to have access to the current loss value
# (this syntax is compatible with standard pytorch optimizers too)
optimizer.step(lambda: float(loss))
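For context, here is how these calls might sit inside a full training loop. This is only a minimal sketch: the model, data, number of epochs and eta value below are placeholders chosen to make the snippet self-contained, not values prescribed by the package.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from dfw import DFW
from dfw.losses import MultiClassHingeLoss

# placeholder model and random data, only to make the sketch runnable
model = nn.Linear(784, 10)
dataset = TensorDataset(torch.randn(256, 784), torch.randint(0, 10, (256,)))
train_loader = DataLoader(dataset, batch_size=32)

svm = MultiClassHingeLoss()
optimizer = DFW(model.parameters(), eta=0.1)

for epoch in range(2):  # a couple of epochs for illustration
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = svm(model(x), y)
        loss.backward()
        # DFW needs access to the current loss value at each step
        optimizer.step(lambda: float(loss))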
Technical requirement: DFW uses a custom step-size at each step. For this update to make sense, the loss function must be convex and piecewise linear.
For instance, one can use a multi-class SVM loss or an l1 regression loss.
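To make the l1 regression case concrete, here is a minimal sketch pairing DFW with torch.nn.L1Loss, which is convex and piecewise linear in the model output; the model, data and eta value are placeholders, not part of the package.

import torch
import torch.nn as nn
from dfw import DFW

# placeholder regression model and data
model = nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)

# l1 loss is convex and piecewise linear, so it is compatible with DFW
l1 = nn.L1Loss()
optimizer = DFW(model.parameters(), eta=0.1)

optimizer.zero_grad()
loss = l1(model(x), y)
loss.backward()
optimizer.step(lambda: float(loss))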
Smoothing: sometimes the multi-class SVM loss does not fare well with a large number of classes.
This issue can be alleviated by using dual smoothing, which is easy to plug into the code.
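The snippet below sketches what this can look like. It assumes a set_smoothing_enabled context manager in dfw.losses (treat this name as an assumption and check dfw/losses.py for the exact interface), and it reuses the svm, model, x, y and optimizer objects from the usage example above.

from dfw.losses import set_smoothing_enabled

optimizer.zero_grad()
# compute the loss with dual smoothing enabled
with set_smoothing_enabled(True):
    loss = svm(model(x), y)
loss.backward()
optimizer.step(lambda: float(loss))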
Reproducing the Results
To reproduce the CIFAR experiments: VISION_DATA=[path/to/your/cifar/data] python reproduce/cifar.py
To reproduce the SNLI experiments: follow the preparation instructions and run python reproduce/snli.py
Results
DFW largely outperforms all baselines that do not use a manual schedule for the learning rate.
The tables below show the performance on the CIFAR data sets when using data augmentation (AMSGrad, a variant of Adam, is the strongest baseline in our experiments), and on the SNLI data set.