
tensorboard-aggregator's Introduction

tensorboard-aggregator

This project contains an easy-to-use method to aggregate multiple tensorboard runs. The max, min, mean, median, standard deviation and variance of the scalars from multiple runs are saved either as a new tensorboard summary or as a .csv table.

There is a similar tool which uses pytorch to output the tensorboard summary: TensorBoard Reducer

Feature Overview

  • Aggregates scalars of multiple tensorboard files
  • Saves aggregates as new tensorboard summary or as .csv
  • Aggregate by any numpy function (default: max, min, mean, median, std, var)
  • Allows any number of subpath structures
  • Keeps step numbering
  • Saves wall time average per step
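
The feature list above can be illustrated with a minimal sketch. It assumes the scalar values of one tag have already been read into a (runs, steps) array; the data, the variable names and the helper `aggregate_runs` are hypothetical and not taken from aggregator.py. The reduction along axis 0 mirrors the `op(values, axis=0)` call that appears in the tracebacks further down.

```python
import numpy as np

# Hypothetical data: one scalar tag, 3 runs, each logged at the same 4 steps.
steps = np.array([0, 100, 200, 300])
values = np.array([
    [1.0, 0.8, 0.6, 0.5],   # run_1
    [1.2, 0.9, 0.7, 0.4],   # run_2
    [1.1, 0.7, 0.5, 0.6],   # run_3
])
wall_times = np.array([
    [10.0, 20.0, 30.0, 40.0],
    [12.0, 22.0, 32.0, 42.0],
    [14.0, 24.0, 34.0, 44.0],
])

# Default aggregation functions named in the feature list.
AGGREGATION_OPS = {
    "max": np.max, "min": np.min, "mean": np.mean,
    "median": np.median, "std": np.std, "var": np.var,
}

def aggregate_runs(run_values, ops=AGGREGATION_OPS):
    """Reduce a (runs, steps) array over the run axis, one result per function."""
    return {name: op(run_values, axis=0) for name, op in ops.items()}

aggregated = aggregate_runs(values)          # each entry has one value per step
mean_wall_time = wall_times.mean(axis=0)     # wall time averaged per step
```

Because every result keeps one entry per step, the original step numbering in `steps` can be reused unchanged for the aggregated series.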

Setup and run configuration

  1. Download or clone repository files to your computer
  2. Go into repository folder
  3. Install requirements: pip3 install -r requirements.txt --upgrade
  4. You can now run the aggregation with: python aggregator.py

Parameters

Parameter              Default                    Description
--path (optional)      current working directory  Path to folder containing runs
--subpaths (optional)  ['.']                      List of all subpaths
--output (optional)    summary                    Possible values: summary, csv
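
As a rough illustration of how these three parameters and their defaults could be parsed, here is a sketch using argparse. The helper names `parse_list` and `build_parser` are hypothetical, and reading the list with `ast.literal_eval` is an assumption; the actual parsing in aggregator.py may differ.

```python
import argparse
import ast
import os

def parse_list(text):
    """Parse a Python list literal such as "['test', 'train']" (assumed format)."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        raise argparse.ArgumentTypeError("{} is not a valid list".format(text))
    if not isinstance(value, list):
        raise argparse.ArgumentTypeError("{} is not a list".format(text))
    return value

def build_parser():
    parser = argparse.ArgumentParser(description="Aggregate tensorboard runs")
    # Defaults follow the parameter table above.
    parser.add_argument("--path", default=os.getcwd(),
                        help="path to folder containing runs")
    parser.add_argument("--subpaths", type=parse_list, default=["."],
                        help="list of all subpaths")
    parser.add_argument("--output", choices=["summary", "csv"], default="summary",
                        help="possible values: summary, csv")
    return parser

# The list is passed as ONE argument here; on a command line this
# usually means quoting it, e.g. --subpaths "['test', 'train']".
args = build_parser().parse_args(["--subpaths", "['test', 'train']", "--output", "csv"])
defaults = build_parser().parse_args([])
```

Most shells split an unquoted `['test', 'train']` at the space, so quoting the whole list is likely needed for this kind of parser to receive it as a single value.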

Recommendation

  • Add the repository folder to the PATH (global environment variables).
  • Create an additional script file within the repository folder containing python static/path/to/aggregator.py
    • Script name: aggregate.sh / aggregate.bat / ... (depending on your OS)
    • Change default behavior via parameters
    • Do not set the --path parameter, since it defaults to the directory the script is run from
  • Workflow from here: Open folder with tensorboard files and call the script: aggregate files will be created for the current directory

Explanation

Example folder structure:

.
├── ...
├── test_param_xy      # Folder containing the runs for aggregation
│   ├── run_1          # Folder containing tensorboard files of one run
│   │   ├── test       # Subpath containing one tensorboard file
│   │   │   └── events.out.tfevents. ...
│   │   └── train   
│   │       └── events.out.tfevents. ...
│   ├── run_2
│   ├── ...
│   └── run_X
└── ...

The folder test_param_xy will be the base path (cd test_param_xy). The tensorboard summaries for the aggregation will be created by calling the aggregate script (containing: python static/path/to/aggregator.py --subpaths ['test', 'train'] --output summary)

The base folder contains multiple subfolders. Each subfolder contains the tensorboard files of one run, and all runs share the same model and configuration.
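
The mapping from this folder layout to the directories that get read can be sketched as follows. This is an illustration only: `collect_run_subpaths` is a hypothetical helper, the folder name "aggregate" is taken from the output trees in this README, and skipping non-directory entries is an assumption of this sketch rather than confirmed behavior of aggregator.py.

```python
import tempfile
from pathlib import Path

AGGREGATE_FOLDER = "aggregate"  # output folder, skipped when listing runs

def collect_run_subpaths(base_path, subpaths):
    """Return {subpath: [run_dir / subpath, ...]} for every run folder in base_path."""
    base = Path(base_path)
    # Assumption: only directories count as runs; stray files are ignored.
    run_dirs = sorted(p for p in base.iterdir()
                      if p.is_dir() and p.name != AGGREGATE_FOLDER)
    result = {}
    for subpath in subpaths:
        dirs = [run_dir / subpath for run_dir in run_dirs]
        missing = [d for d in dirs if not d.is_dir()]
        if missing:
            raise FileNotFoundError("{} is not a valid path".format(missing[0]))
        result[subpath] = dirs
    return result

# Demo on a throwaway copy of the example structure.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for run in ("run_1", "run_2"):
        for sub in ("test", "train"):
            (base / run / sub).mkdir(parents=True)
    (base / AGGREGATE_FOLDER).mkdir()   # earlier output, must not count as a run
    (base / "checkpoint").touch()       # stray file, must not count as a run
    found = collect_run_subpaths(base, ["test", "train"])
    found_names = {sub: [d.parent.name for d in dirs] for sub, dirs in found.items()}
```

Using pathlib methods throughout (`iterdir`, `is_dir`) avoids mixing Path objects with functions from the os module.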

The resulting folder structure for summary looks like this:

.
├── ...
├── test_param_xy
│   ├── ...
│   └── aggregate
│       ├── test
│       │   ├── max
│       │   │   └── test_param_xy 
│       │   │       └── events.out.tfevents. ...
│       │   ├── min
│       │   ├── mean
│       │   ├── median
│       │   └── std    
│       └── train
└── ...

Multiple aggregate summaries can be put together in one directory. Since the original base folder name is kept as a subfolder of the aggregate function folder, the summaries remain distinguishable within tensorboard.

.
├── ...
├── max
│   ├── test_param_x
│   ├── test_param_y
│   ├── test_param_z
│   └── test_param_v 
├── min
├── mean
├── median
└── std   

The .csv table files for the aggregation will be created by calling the aggregate script (containing: python static/path/to/aggregator.py --subpaths ['test', 'train'] --output csv)

The resulting folder structure for csv looks like this:

.
├── ...
├── test_param_xy
│   ├── ...
│   └── aggregate
│       ├── test
│       │   ├── max_test_param_xy.csv
│       │   ├── min_test_param_xy.csv
│       │   ├── mean_test_param_xy.csv
│       │   ├── median_test_param_xy.csv
│       │   └── std_test_param_xy.csv
│       └── train
└── ...

The .csv files are primarily intended for LaTeX plots.

Limitations

  • The aggregation only works for scalars and not for other types like histograms
  • All runs for one aggregation need the exact same tags. In other words, the names and the number of scalar metrics need to be equal for all runs.
  • All runs for one aggregation need the same steps. In other words, the number of iterations and epochs and the saving frequency need to be equal for all runs of one scalar.
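
A pre-check along the lines of these limitations could look like the sketch below. It assumes the tags and steps of each run have already been extracted into plain dictionaries; `find_mismatches` and the sample data are hypothetical and not part of aggregator.py.

```python
def find_mismatches(runs):
    """Check a mapping run_name -> {tag: [step, ...]} against the limitations.

    Returns a list of human-readable problems; an empty list means the runs
    share the same tags and the same steps per tag.
    """
    if not runs or not any(runs.values()):
        # Covers event files that contain no scalar tags at all.
        return ["no scalar tags found in any run"]
    names = sorted(runs)
    reference_name = names[0]
    reference = runs[reference_name]
    problems = []
    for name in names[1:]:
        tags = runs[name]
        if set(tags) != set(reference):
            problems.append("{}: tags differ from {}".format(name, reference_name))
            continue
        for tag in sorted(tags):
            if list(tags[tag]) != list(reference[tag]):
                problems.append(
                    "{}: steps of '{}' differ from {}".format(name, tag, reference_name))
    return problems

# Hypothetical examples.
compatible = {
    "run_1": {"loss": [0, 100, 200], "acc": [0, 100, 200]},
    "run_2": {"loss": [0, 100, 200], "acc": [0, 100, 200]},
}
incompatible = {
    "run_1": {"loss": [0, 100, 200]},
    "run_2": {"loss": [0, 100]},            # fewer steps
    "run_3": {"accuracy": [0, 100, 200]},   # different tag name
}
ok_report = find_mismatches(compatible)
bad_report = find_mismatches(incompatible)
empty_report = find_mismatches({"run_1": {}})
```

Reporting the empty case separately makes it easier to tell "tags differ" apart from "no scalars were found at all".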

Contributions

If there are potential problems (bugs, incompatibilities with newer library versions or with an OS) or feature requests, please create a GitHub issue here.

Dependencies are managed using pip-tools. Just add new dependencies to requirements.in and generate a new requirements.txt using pip-compile in the command line.

License

MIT License

tensorboard-aggregator's People

Contributors

janosh, kenneth-schroeder, spenhouet


tensorboard-aggregator's Issues

No scalars found in event files

events.out.tfevents.1594678042.pi-dell.10209.12.v2.zip

Describe the bug
When running aggregator.py against our event files, we get the error:

  File "/home/patrick/src/tensorboard-aggregator/aggregator.py", line 155, in <module>
    aggregate(path, args.output, args.subpaths)
  File "/home/patrick/src/tensorboard-aggregator/aggregator.py", line 120, in aggregate
    extracts_per_subpath = {subpath: extract(dpath, subpath) for subpath in subpaths}
  File "/home/patrick/src/tensorboard-aggregator/aggregator.py", line 120, in <dictcomp>
    extracts_per_subpath = {subpath: extract(dpath, subpath) for subpath in subpaths}
  File "/home/patrick/src/tensorboard-aggregator/aggregator.py", line 36, in extract
    assert len(set(all_keys)) == 1, "All runs need to have the same scalar keys. There are mismatches in {}".format(all_keys)
AssertionError: All runs need to have the same scalar keys. There are mismatches in []

To Reproduce
Run aggregator.py against the attached event file.

Expected behavior
Expected summary files to be generated from scalars.

Screenshots
None.

Desktop (please complete the following information):

  • OS: Ubuntu 16.04
  • python version: 3.6
  • tensorflow version: 1.15
  • numpy version: 1.17.2

Additional context
This is the output we get when we run tensorboard --inspect on the same event file:

tensorboard --inspect --event_file events.out.tfevents.1594678042.pi-dell.10209.12.v2
======================================================================
Processing event files... (this can take a few minutes)
======================================================================

These tags are in events.out.tfevents.1594678042.pi-dell.10209.12.v2:
audio -
histograms -
images -
scalars -
tensor
   Metrics/AverageEpisodeLength
   Metrics/AverageReturn
   Metrics/average_distance_to_nearest_neighbor
   Metrics_vs_EnvironmentSteps/AverageEpisodeLength
   Metrics_vs_EnvironmentSteps/AverageReturn
   Metrics_vs_NumberOfEpisodes/AverageEpisodeLength
   Metrics_vs_NumberOfEpisodes/AverageReturn
======================================================================

Event statistics for events.out.tfevents.1594678042.pi-dell.10209.12.v2:
audio -
graph -
histograms -
images -
scalars -
sessionlog:checkpoint -
sessionlog:start -
sessionlog:stop -
tensor
   first_step           0
   last_step            100
   max_step             1100
   min_step             0
   num_steps            4
   outoforder_steps     [(1100, 85), (1100, 100)]
======================================================================

Urgent Help Aggregating

I am trying to use this aggregator, but I am getting the error

  File "aggregator.py", line 31, in extract
    assert len(set(all_keys)) == 1, "All runs need to have the same scalar keys. There are mismatches in {}".format(all_keys)
AssertionError: All runs need to have the same scalar keys. There are mismatches in []

my dir structure looks like this

ls ../results/
inference.run63++100  inference.run64  run61  run62  run63  run63++025  run63++050  run63++075  run63++100  run64  run64-inference  run64-nospeed  run64+vision  run65  run65-pure  run66  run67  run68  run68++050  run68++050-inference

I am trying to aggregate all my runs (which have many variables) into a single csv so I don't have to download each one manually and then aggregate it manually... picture example attached:
image

[Feature] Support variable step counts for a single scalar

I assume you made this work for specifically your project's file system, but it's a good enough idea that I'll probably make it work for mine.

Traceback (most recent call last):
  File "../tensorboard_aggregator/aggregator.py", line 117, in <module>
    aggregate(args.path, args.output, args.subpaths)
  File "../tensorboard_aggregator/aggregator.py", line 85, in aggregate
    aggregations_per_key = {key: op(values, axis=0) for key, values in values_per_key.items()}
  File "../tensorboard_aggregator/aggregator.py", line 85, in <dictcomp>
    aggregations_per_key = {key: op(values, axis=0) for key, values in values_per_key.items()}
  File "/home/vincent/.virtualenvs/py3env/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 3118, in mean
    out=out, **kwargs)
  File "/home/vincent/.virtualenvs/py3env/lib/python3.6/site-packages/numpy/core/_methods.py", line 87, in _mean
    ret = ret / rcount
TypeError: unsupported operand type(s) for /: 'list' and 'int'
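
One possible way to handle runs with different step counts is to aggregate only over the steps that every run logged. The sketch below illustrates that idea with plain dictionaries; `mean_over_common_steps` and the sample values are hypothetical, and this is not how aggregator.py currently behaves.

```python
from statistics import mean

def mean_over_common_steps(runs):
    """Average one scalar over the steps shared by all runs.

    runs: list of {step: value} dictionaries, one per run.
    Returns (steps, means), both sorted by step.
    """
    # Keep only steps present in every run; steps missing from a run are dropped.
    common_steps = sorted(set(runs[0]).intersection(*runs[1:]))
    means = [mean(run[step] for run in runs) for step in common_steps]
    return common_steps, means

# Hypothetical data: the second run stopped one logging step earlier.
run_a = {0: 1.0, 100: 0.5, 200: 0.25}
run_b = {0: 3.0, 100: 1.5}
common_steps, means = mean_over_common_steps([run_a, run_b])
```

Dropping unshared steps discards data from the longer runs; whether that trade-off is acceptable depends on the experiment.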

trying to use this with a TensorBoard result

Hello Spen,

Thanks for writing this tool. I'm trying to use it with a model. Here is my usage and my error.

(py3) Huo-Yang~/progs/GSD-ML/bc-eta/gsdeta_trained$ adoit.sh
Traceback (most recent call last):
  File "/Users/davis/progs/notmine/tensorboard-aggregator/aggregator.py", line 145, in <module>
    raise argparse.ArgumentTypeError("Parameter {} is not a valid path".format(subpath))
argparse.ArgumentTypeError: Parameter /Users/davis/progs/GSD-ML/bc-eta/gsdeta_trained/model.ckpt-99000.index/test is not a valid path

Here are the contents of my trained model directory

(py3) Huo-Yang~/progs/GSD-ML/bc-eta/gsdeta_trained$ ls
checkpoint                                    model.ckpt-100000.index                       model.ckpt-98500.data-00001-of-00002          model.ckpt-99500.data-00000-of-00002
eval                                          model.ckpt-100000.meta                        model.ckpt-98500.index                        model.ckpt-99500.data-00001-of-00002
events.out.tfevents.1562007676.Huo-Yang.local model.ckpt-98000.data-00000-of-00002          model.ckpt-98500.meta                         model.ckpt-99500.index
export                                        model.ckpt-98000.data-00001-of-00002          model.ckpt-99000.data-00000-of-00002          model.ckpt-99500.meta
graph.pbtxt                                   model.ckpt-98000.index                        model.ckpt-99000.data-00001-of-00002
model.ckpt-100000.data-00000-of-00002         model.ckpt-98000.meta                         model.ckpt-99000.index
model.ckpt-100000.data-00001-of-00002         model.ckpt-98500.data-00000-of-00002          model.ckpt-99000.meta

Here is my script, which I have tried in several variants

(py3) Huo-Yang~/progs/GSD-ML/bc-eta/gsdeta_trained$ cat /Users/davis/progs/notmine/tensorboard-aggregator/adoit.sh
#!/bin/bash
#python /Users/davis/progs/notmine/tensorboard-aggregator/aggregator.py --subpaths ['train','eval']
#python /Users/davis/progs/notmine/tensorboard-aggregator/aggregator.py --subpaths ['.']
python /Users/davis/progs/notmine/tensorboard-aggregator/aggregator.py

Here is how I started TensorBoard

tensorboard --logdir=/Users/davis/progs/GSD-ML/bc-eta/gsdeta_trained

What platform is this code tested on?

Describe the bug
I run into several bugs regarding the path variable.

The first attempt:

E:\Program\Anaconda3\envs\Py35\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "aggregator.py", line 142, in <module>
    subpaths = [path / dname / subpath for subpath in args.subpaths for dname in os.listdir(path) if dname != FOLDER_NAME]
  File "aggregator.py", line 142, in <listcomp>
    subpaths = [path / dname / subpath for subpath in args.subpaths for dname in os.listdir(path) if dname != FOLDER_NAME]
TypeError: listdir: illegal type for path parameter

This is because os.listdir in line 141 does not accept a Path object here. A Path should not be passed to functions of the os library; its own method Path.iterdir() should be used instead.

Anyway, I gave it a second try after fixing this and ran into another bug:

Traceback (most recent call last):
  File "aggregator.py", line 146, in <module>
    if not os.path.exists(subpath):
  File "E:\Program\Anaconda3\envs\Py35\lib\genericpath.py", line 19, in exists
    os.stat(path)
TypeError: argument should be string, bytes or integer, not WindowsPath

The problem is again related to the incorrect use of a Path object with the os library.


Desktop (please complete the following information):

  • OS: Windows 10.0
  • python version 3.5
  • tensorflow version
  • numpy version


Event in tf2

Bug: unable to import Event. The statement
from tensorflow.core.util.event_pb2 import Event
is reported as "Unresolved reference Event".

Desktop:

  • OS: Windows 10
  • python version : 3.7
  • tensorflow version : 2.0
  • numpy version : 1.17.2

Sample figure for the documentation

Is your feature request related to a problem? Please describe.
I am not sure whether what this library achieves is exactly what I want.

Describe the solution you'd like
You could add a sample figure to the README file.

Tensorflow Migration Issues

Dear creators of this repository,

I wanted to use this tool for my master's thesis, but I experienced some migration issues related to Tensorflow.
The issues occurred in the function 'write_summary'. I only wanted to generate a tensorboard summary, so these issues might occur in other places of the code as well.

I used this code to fix the issues:

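import tensorflow as tf

def write_summary(dpath, aggregations_per_key):
    # TF2 API: create the writer once and make it the default for all scalars.
    writer = tf.summary.create_file_writer(str(dpath))

    with writer.as_default():
        for key, (steps, wall_times, aggregations) in aggregations_per_key.items():
            # wall_times is unpacked but unused: tf.summary.scalar takes no wall-time argument.
            for step, aggregation in zip(steps, aggregations):
                tf.summary.scalar(key, aggregation, step=step)
    # Flush once after all scalars are written instead of once per step.
    writer.flush()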

I am definitely not experienced in Tensorflow. I did a quick internet search and updated your code.
I created this issue to make you aware of these migration issues, as well as to help other users with the same problems.

My versions:
Pandas: 1.2.4
Numpy: 1.19.5
Tensorflow: 2.4.1
Tensorboard: 2.5.0

Have a great day.
Cheers.
