
deepobs's Introduction

Hi there, I'm Frank Schneider 👋

I am a Postdoctoral Researcher in the Methods of Machine Learning group at the University of Tübingen.

I work to make deep learning more user-friendly by focusing on the training algorithms.

  • 🚀 I’m currently working on faster training methods for deep neural networks.
  • 🥇 My past projects have focused on creating benchmarks for deep learning optimizers (see DeepOBS and Descending through a Crowded Valley) and novel debugging tools for training neural networks (see Cockpit).
  • 🧑‍🤝‍🧑 I’m part of the MLCommons™ working group for Algorithmic Efficiency, building a competition and benchmark of faster neural network training algorithms.


deepobs's People

Contributors

anonymousiclr2019submitter, fsschneider, p16i, pitmonticone, pnorridge


deepobs's Issues

Incompatible baselines?

Hi, thanks for the project, it's really handy! I tried to use the released 1.2.0-beta0 version as well as master with the baselines from this repository: https://github.com/fsschneider/DeepOBS_Baselines, but without success; I always get the same error:

(...)
File "/deepobs/analyzer/shared_utils.py", line 118, in aggregate_runs
    aggregate["optimizer_hyperparams"] = json_data["optimizer_hyperparams"]
KeyError: 'optimizer_hyperparams'

Are there any baselines that are also supported by the version with PyTorch support?


To Do

We will do the following steps for version 1.2.0:

  • Provide extensive baselines for version 1.2.0.

Evaluation set is a subset of the training set

Hello Frank,
just came over the following pattern that is used in all dataset classes:

def _make_train_eval_dataset(self):
    """Creates the CIFAR-10 train eval dataset.

    Returns:
      A tf.data.Dataset instance with batches of training eval data.
    """
    return self._train_dataset.take(
        self._train_eval_size // self._batch_size)

The problem is that the take method does not remove the taken elements from the dataset they are taken from. As a result, the evaluation set and the training set are not distinct. This should not be the case, or at least it is not the standard way.

Here is a short dummy example showing that the data is really not removed from the train dataset:

import tensorflow as tf
import numpy as np

x = np.array([1, 2, 3, 4, 5])

dataset1 = tf.data.Dataset.from_tensor_slices(x)
dataset2 = dataset1.take(3)
it1 = dataset1.make_one_shot_iterator()
it2 = dataset2.make_one_shot_iterator()
sess = tf.Session()
it1next = it1.get_next()
it2next = it2.get_next()
# The original dataset still yields all five elements ...
for i in range(5):
    print(sess.run([it1next]))
# ... while the taken subset yields the first three of them again.
for i in range(3):
    print(sess.run([it2next]))

result:
[1]
[2]
[3]
[4]
[5]

[1]
[2]
[3]
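
A possible way to make the two splits disjoint (a minimal sketch, not the actual DeepOBS code) is to carve out the train-eval portion with take and build the remaining training set with skip:

import tensorflow as tf
import numpy as np

x = np.array([1, 2, 3, 4, 5])
full_dataset = tf.data.Dataset.from_tensor_slices(x)

# Reserve the first three elements for train-eval and use the rest for
# training, so the two datasets no longer overlap.
train_eval_dataset = full_dataset.take(3)
train_dataset = full_dataset.skip(3)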


To Do

We will do the following steps for version 1.2.0:

  • Include a validation set for PyTorch (needs to be merged from Aaron's branch)
  • Include a validation set for TensorFlow (almost ready)
  • Add a graphic with the split/setup for all four data sets to the docs.

Sequential version of quadratic problem in TensorFlow

The quadratic_deep problem for PyTorch has been updated and slightly changed. It is now re-written as a sequential "neural network", which allows compatibility, for example, with BackPACK.

The TensorFlow version should be updated accordingly. The update most likely introduces only constant or scaling changes, but to be precise, the TensorFlow version should be as equivalent as possible.
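
For illustration only (this is a hedged sketch, not the actual DeepOBS implementation; the module name ShiftByParameter and the factorization Q = S^T S are assumptions), a quadratic loss 0.5 * (theta - x)^T Q (theta - x) can be expressed through a sequential module, which is what makes extensions like BackPACK applicable:

import torch
from torch import nn

class ShiftByParameter(nn.Module):
    # Hypothetical helper: returns theta - x for a trainable parameter theta.
    def __init__(self, dim):
        super().__init__()
        self.theta = nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        return self.theta - x

dim = 100
shift = ShiftByParameter(dim)
scale = nn.Linear(dim, dim, bias=False)  # fixed factor S with Q = S^T S
scale.weight.requires_grad_(False)

model = nn.Sequential(shift, scale)

x = torch.randn(8, dim)  # a batch of "data" samples
loss = 0.5 * model(x).pow(2).sum(dim=1).mean()
loss.backward()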


To Do

We will do the following steps for version 1.2.0:

  • Update the TensorFlow version of quadratic_deep to match our PyTorch version.

Make the device an argument of the runner

The device on which the runner performs its training should be settable as an argument of the run() method, for both frameworks. This allows for more flexible hardware usage.
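
A minimal sketch of how this could look (the argument name device and the fallback behavior are assumptions, not the final API):

import torch

def run(self, testproblem, device=None, **training_params):
    # Hypothetical: let the caller choose the device instead of hard-coding it.
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    self._device = torch.device(device)
    # ... build the test problem and move model and batches to self._device ...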

Error in Plotting

I get the following error when trying to plot the results for the simple example as described in the documentation. I am running in a Colab notebook with Python 3.6.8.

matplotlib 3.0.3
matplotlib-venn 0.11.5
matplotlib2tikz 0.7.5

Error message is:

/usr/local/lib/python3.6/dist-packages/matplotlib2tikz/save.py in _recurse(data, obj)
    340     """
    341     content = _ContentManager()
--> 342     for child in obj.get_children():
    343         # Some patches are Spines, too; skip those entirely.
    344         # See nschloe/tikzplotlib#277.

AttributeError: 'str' object has no attribute 'get_children'


To Do

We will do the following steps for version 1.2.0:

  • Update matplotlib2tikz to tikzplotlib
  • Add tested version(s) to the documentation

Implement a sanity check for existing tuner output paths

Let us assume I run a tuning A and the results are written to './results'. If I then change my mind, want to run a different tuning B, and do not specify a different output folder, the new outputs of B are also written to './results'. This can easily happen when I use a script for A, adapt it for B, and forget to change the output directory (or when I use the default of DeepOBS). The problem this implies is:

The outputs of A and B are both in the './results' path, and further analyses (e.g. getting the best hyperparameter setting with the analyzer) are performed on all the runs. If I am only interested in B, but results of A are included, I may end up with the best setting of A.

Proposed solution:
I know that we could simply expect the user to be smart enough to delete the results of A first (or to change the output directory of B), but if the user forgets to do so it can really be a mess. Therefore, I suggest implementing a sanity check in the tuner that prompts or warns the user when tuning is run on an already existing output directory. As far as I know, rerunning the best setting is based on the runner class and not the tuner class, so rerunning the best setting would not prompt the user (which is fine).

However, just see this as an idea for improvement. The exact design may differ from my suggestion.
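
A minimal sketch of such a check (the function name and exact behavior are assumptions, not part of DeepOBS):

import os
import warnings

def _check_output_path(output_dir):
    # Warn if the tuner is about to write into a directory that already
    # contains results, since later analyses would mix the runs.
    if os.path.isdir(output_dir) and os.listdir(output_dir):
        warnings.warn(
            "The output directory '%s' already contains results; "
            "new tuning runs will be mixed with the existing ones." % output_dir
        )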

Add prefetching for batches / Parallelization for preprocessing

During training on V100s, I noticed a low volatile GPU utilization, which is usually the case when training is rather fast but the CPU cannot create the batches fast enough.

I've taken a look at some of the test problems, and it doesn't seem like batches are being prefetched or the preprocessing is being parallelized. I would recommend making use of TensorFlow's prefetch method as well as map(preprocessing, num_parallel_calls=64) for parallelized preprocessing. This will most likely cause a tremendous speed-up on higher-end GPUs.
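
A minimal sketch of such an input pipeline (the dataset and preprocess_fn are placeholders, not DeepOBS code):

import tensorflow as tf

def preprocess_fn(x):
    # Placeholder for the per-example preprocessing (augmentation, scaling, ...).
    return tf.cast(x, tf.float32) / 255.0

dataset = tf.data.Dataset.range(10000)
dataset = (
    dataset.map(preprocess_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    .batch(128)
    .prefetch(tf.data.experimental.AUTOTUNE)
)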

Log Used Software, Hardware, and Wall-Clock Time

  • Runners should also log the wall-clock time alongside the logged metric values. The time needed to reach a target error can then be extracted in a post-processing step.
  • For completeness, and as DeepOBS is still being developed, it would be good to track version and hardware information in the same .json file (e.g. torch, torchvision, deepobs, tensorflow versions plus hardware info); a possible layout is sketched below.
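
A hedged sketch of what the extra fields in the output .json could look like (the field names are assumptions, not the DeepOBS format):

import json
import platform
import time

import torch

run_info = {
    "wall_clock_times": [],  # one entry per logged evaluation point
    "software": {
        "python": platform.python_version(),
        "torch": torch.__version__,
    },
    "hardware": {
        "cpu": platform.processor(),
        "gpu": torch.cuda.get_device_name(0) if torch.cuda.is_available() else None,
    },
}

start = time.time()
# ... training loop; at every logging step:
run_info["wall_clock_times"].append(time.time() - start)

with open("run_output.json", "w") as f:
    json.dump(run_info, f)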

To Do

We will do the following steps for version 1.2.0:

  • Add logging of wall-clock time.
  • Add logging of used software.
  • Add logging of used hardware.

Feature Request: Add support for pytorch

(This is a test issue. As mentioned in the responses to #3 and #4, there is a development branch that supports PyTorch. Find it here: https://github.com/abahde/DeepOBS)

Expected behavior

I would appreciate it if DeepOBS had built-in support for my PyTorch optimizers.

This is relevant because a lot of optimizer research happens in PyTorch.

Proposed approach:

Maybe @abahde could send a pull request when he's finished implementing it. Then @fsschneider can accept the pull request, handle the merging, and we have success!


To Do

We will do the following steps for version 1.2.0:

  • Add full support for PyTorch
    • Implement all Test Problems:
      • 2-D (data loading):
        • Beale
        • Branin
        • Rosenbrock
      • Quadratic (data loading):
        • Deep
      • MNIST (data loading):
        • Log. Regr.
        • MLP
        • 2c2d
        • VAE
      • Fashion-MNIST (data loading):
        • Log. Regr.
        • MLP
        • 2c2d
        • VAE
      • CIFAR-10 (data loading):
        • 3c3d
        • VGG16
        • VGG19
      • CIFAR-100 (data loading):
        • 3c3d
        • VGG16
        • VGG19
        • All-CNN-C
        • Wide ResNet-16-4
        • Wide ResNet-40-4
      • SVHN (data loading):
        • 3c3d
        • Wide ResNet-16-4
      • ImageNet (data loading):
        • VGG16
        • VGG19
        • Inception-v3
      • Tolstoi (data loading):
        • CharRNN

Tuner that supports more than a single seed for all runs, not just the best

Hi there,

I'm trying to use DeepOBS with GridSearch. I want to do a grid search averaged over a number of seeds. As far as I know, the grid search uses just a single seed for the comparison, which might make the reported best-performing hyperparameters misleading because of stochasticity. Does this feature already exist? If so, please let me know how to do it with DeepOBS.

Thank you so much.
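
A possible manual workaround is sketched below; the StandardRunner constructor and the run() argument names are assumptions about the API, not verified against the current version:

from torch.optim import SGD
from deepobs.pytorch.runners import StandardRunner

runner = StandardRunner(SGD, hyperparameter_names={"lr": {"type": float}})

# Repeat every grid point for several seeds and aggregate the runs afterwards.
for seed in range(1, 6):
    for lr in [0.1, 0.01, 0.001]:
        runner.run(
            testproblem="mnist_mlp",
            hyperparams={"lr": lr},
            random_seed=seed,
            output_dir="./results_multi_seed",
        )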

Make the optimizer name an argument of the runner

The user should be able to set the optimizer name by passing a string to the runner instance. This makes it possible to separate SGD/Momentum/Nesterov in PyTorch and gives more flexibility to the optimizer developer.


To Do

We will do the following steps for version 1.2.0:

  • Add optimizer name as an optional argument to the runner. If none is given, it will use the internal name for the optimizer.
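
A minimal sketch of the idea (this is an illustration, not the DeepOBS runner class):

import torch

class Runner:
    def __init__(self, optimizer_class, optimizer_name=None):
        self._optimizer_class = optimizer_class
        # Fall back to the optimizer's class name if no explicit name is given.
        self._optimizer_name = optimizer_name or optimizer_class.__name__

runner = Runner(torch.optim.SGD, optimizer_name="NesterovSGD")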

get_performance_dictionary doesn't provide the desired metric

The interface of the function is

def get_performance_dictionary(
    optimizer_path, mode="most", metric="valid_accuracies", conv_perf_file=None
):

But even when providing e.g. "valid_accuracies", the function sometimes returns the "test_accuracies".
The explanation can be found in the following line of code (permalink to dev branch):

metric = "test_accuracies" if "test_accuracies" in sett.aggregate else "test_losses"

This line overrides the metric provided by the user in all cases, making the parameter redundant.
A proposed fix is to delete this line, or to remove the metric parameter from the function. I personally think the former is more meaningful, since it provides more flexibility to the end user.
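
As a variation on the proposed fix (this is only a sketch, not the author's exact proposal), the override could also be turned into a fallback that is applied only when the requested metric is missing:

if metric not in sett.aggregate:
    metric = "test_accuracies" if "test_accuracies" in sett.aggregate else "test_losses"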
