tomgoldstein / loss-landscape

Code for visualizing the loss landscape of neural nets

License: MIT License

Python 91.04% Shell 8.96%

loss-landscape's Introduction

Visualizing the Loss Landscape of Neural Nets

This repository contains the PyTorch code for the paper

Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer and Tom Goldstein. Visualizing the Loss Landscape of Neural Nets. NIPS, 2018.

An interactive 3D visualizer for loss surfaces has been provided by telesens.

Given a network architecture and its pre-trained parameters, this tool calculates and visualizes the loss surface along random direction(s) near the optimal parameters. The calculation can be done in parallel with multiple GPUs per node, and multiple nodes. The random direction(s) and loss surface values are stored in HDF5 (.h5) files after they are produced.

Setup

Environment: One or more multi-GPU node(s) with the following software/libraries installed:

Pre-trained models: The code accepts pre-trained PyTorch models for the CIFAR-10 dataset. To load the pre-trained model correctly, the model file should contain state_dict, which is saved from the state_dict() method. The default path for pre-trained networks is cifar10/trained_nets. Some of the pre-trained models and plotted figures can be downloaded here:

VGG-9 (349 MB)
ResNet-56 (10 MB)
ResNet-56-noshort (20 MB)
DenseNet-121 (75 MB)

Data preprocessing: The data pre-processing method used for visualization should be consistent with the one used for model training. No data augmentation (random cropping or horizontal flipping) is used in calculating the loss values.

Visualizing 1D loss curve

Creating 1D linear interpolations

The 1D linear interpolation method [1] evaluates the loss values along the direction between two minimizers of the same network loss function. This method has been used to compare the flatness of minimizers trained with different batch sizes [2]. A 1D linear interpolation plot is produced using the plot_surface.py script.

mpirun -n 4 python plot_surface.py --mpi --cuda --model vgg9 --x=-0.5:1.5:401 --dir_type states \
--model_file cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1/model_300.t7 \
--model_file2 cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=8192_wd=0.0_save_epoch=1/model_300.t7 --plot

--x=-0.5:1.5:401 sets the range and resolution for the plot. The x-coordinates in the plot will run from -0.5 to 1.5 (the minimizers are located at 0 and 1), and the loss value will be evaluated at 401 locations along this line.
--dir_type states indicates the direction contains dimensions for all parameters as well as the statistics of the BN layers (running_mean and running_var). Note that ignoring running_mean and running_var cannot produce correct loss values when plotting two solutions together in the same figure.
The two model files contain network parameters describing the two distinct minimizers of the loss function. The plot will interpolate between these two minima.
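
As a rough illustration of what the interpolation computes, the sketch below parses a min:max:num range string and blends two parameter sets as (1 - alpha) * theta_1 + alpha * theta_2, including the BN running statistics as --dir_type states does. The helper names parse_range and interpolate_states and the toy dictionaries are hypothetical and are not taken from plot_surface.py.

```python
import numpy as np


def parse_range(spec):
    """Parse a 'min:max:num' string such as '-0.5:1.5:401' into coordinates."""
    lo, hi, num = spec.split(":")
    return np.linspace(float(lo), float(hi), int(num))


def interpolate_states(state1, state2, alpha):
    """Return (1 - alpha) * state1 + alpha * state2 for every named array."""
    return {name: (1.0 - alpha) * state1[name] + alpha * state2[name]
            for name in state1}


# Toy stand-ins for two state dicts: weights plus BN running statistics.
# The entry names are made up for this example.
state_a = {"conv.weight": np.array([1.0, 2.0]), "bn.running_mean": np.array([0.0])}
state_b = {"conv.weight": np.array([3.0, 6.0]), "bn.running_mean": np.array([1.0])}

xs = parse_range("-0.5:1.5:401")          # 401 evaluation points
midpoint = interpolate_states(state_a, state_b, 0.5)
print(len(xs), midpoint["conv.weight"])
```

At alpha = 0 and alpha = 1 the blend reproduces the first and second model exactly, which is why the minimizers sit at 0 and 1 on the x-axis.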

Producing plots along random normalized directions

A random direction with the same dimension as the model parameters is created and "filter normalized." Then we can sample loss values along this direction.

mpirun -n 4 python plot_surface.py --mpi --cuda --model vgg9 --x=-1:1:51 \
--model_file cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1/model_300.t7 \
--dir_type weights --xnorm filter --xignore biasbn --plot

--dir_type weights indicates the direction has the same dimensions as the learned parameters, including bias and parameters in the BN layers.
--xnorm filter normalizes the random direction at the filter level. Here, a "filter" refers to the parameters that produce a single feature map. For fully connected layers, a "filter" contains the weights that contribute to a single neuron.
--xignore biasbn ignores the direction corresponding to bias and BN parameters (fill the corresponding entries in the random vector with zeros).
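
The options above can be sketched with NumPy as follows: each filter of a random direction is rescaled to the norm of the matching filter in the trained weights, and 1D tensors (bias and BN parameters) are zeroed, as --xignore biasbn describes. This is a minimal sketch under the assumption that the first axis of each weight array indexes filters; filter_normalize is a hypothetical helper, not the function used in the repository.

```python
import numpy as np


def filter_normalize(direction, weights, eps=1e-10):
    """Rescale each filter in `direction` to the norm of the matching filter in `weights`.

    1D arrays (bias / BN parameters) are replaced by zeros, mimicking `biasbn` ignore.
    """
    normalized = []
    for d, w in zip(direction, weights):
        if d.ndim <= 1:
            normalized.append(np.zeros_like(d))
            continue
        # Flatten so that each row holds the parameters of one filter (or neuron).
        d_flat = d.reshape(d.shape[0], -1)
        w_flat = w.reshape(w.shape[0], -1)
        scale = np.linalg.norm(w_flat, axis=1) / (np.linalg.norm(d_flat, axis=1) + eps)
        normalized.append((d_flat * scale[:, None]).reshape(d.shape))
    return normalized


rng = np.random.default_rng(0)
# Toy "model": one conv layer with 4 filters and its bias vector.
weights = [rng.normal(size=(4, 3, 3, 3)), rng.normal(size=(4,))]
direction = [rng.normal(size=w.shape) for w in weights]
normalized = filter_normalize(direction, weights)
```

After normalization, each filter of the direction has the same norm as the corresponding filter of the weights, so a step of a given size perturbs every filter by a comparable relative amount.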

We can also customize the appearance of the 1D plots by calling plot_1D.py once the surface file is available.

Visualizing 2D loss contours

To plot the loss contours, we choose two random directions and normalize them in the same way as the 1D plotting.

mpirun -n 4 python plot_surface.py --mpi --cuda --model resnet56 --x=-1:1:51 --y=-1:1:51 \
--model_file cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7 \
--dir_type weights --xnorm filter --xignore biasbn --ynorm filter --yignore biasbn  --plot
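
In the notation of the paper, with \theta^{*} the trained parameters and \delta, \eta the two filter-normalized random directions, the surface evaluated by this command can be written as below. Here --x=-1:1:51 and --y=-1:1:51 sample \alpha and \beta on a 51 x 51 grid.

```latex
\[
  f(\alpha, \beta) = L\left(\theta^{*} + \alpha\,\delta + \beta\,\eta\right),
  \qquad \alpha \in [-1, 1],\ \beta \in [-1, 1]
\]
```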

Once a surface is generated and stored in a .h5 file, we can produce and customize a contour plot using the script plot_2D.py.

python plot_2D.py --surf_file path_to_surf_file --surf_name train_loss

--surf_name specifies the type of surface. The default choice is train_loss.
--vmin and --vmax set the range of values to be plotted.
--vlevel sets the step of the contours.
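
If these three options map onto evenly spaced contour levels (an assumption about plot_2D.py, not something stated above), the levels could be computed as in this sketch. The function contour_levels and the example values 0.1, 10.0 and 0.5 are illustrative only.

```python
import numpy as np


def contour_levels(vmin, vmax, vlevel):
    """Evenly spaced contour levels starting at vmin, with step vlevel, below vmax."""
    return np.arange(vmin, vmax, vlevel)


levels = contour_levels(0.1, 10.0, 0.5)
print(levels[:3], len(levels))
```

An array like this can be passed as the levels argument of matplotlib's contour function. A smaller step gives denser contour lines.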

Visualizing 3D loss surface

plot_2D.py can make a basic 3D loss surface plot with matplotlib. If you want a more detailed rendering that uses lighting to display details, you can render the loss surface with ParaView.

To do this, you must

Convert the surface .h5 file to a .vtp file.

python h52vtp.py --surf_file path_to_surf_file --surf_name train_loss --zmax  10 --log

This will generate a VTK file containing the loss surface with max value 10 in the log scale.

Open the .vtp file with ParaView. In ParaView, open the .vtp file with the VTK reader. Click the eye icon in the Pipeline Browser to make the figure show up. You can drag the surface around, and change the colors in the Properties window.
If the surface appears extremely skinny and needle-like, you may need to adjust the "transforming" parameters in the left control panel. Enter numbers larger than 1 in the "scale" fields to widen the plot.
Select Save screenshot in the File menu to save the image.
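
A minimal sketch of what "max value 10 in the log scale" in the conversion step could mean numerically: clip the loss values at zmax, then take the logarithm. The function prepare_heights and the small offset added before the logarithm are assumptions for illustration, not the actual code of h52vtp.py.

```python
import numpy as np


def prepare_heights(loss, zmax=10.0, log=True, offset=0.1):
    """Clip loss values at zmax and optionally map them to a log scale.

    `offset` keeps the logarithm finite when the loss is zero (assumed value).
    """
    z = np.array(loss, dtype=float)
    if zmax > 0:
        z = np.minimum(z, zmax)
    if log:
        z = np.log(z + offset)
    return z


heights = prepare_heights([0.0, 5.0, 50.0])
print(heights)
```

Clipping keeps a few very large loss values from dominating the rendering, and the log scale makes the structure near the minimizer easier to see.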

Reference

[1] Ian J Goodfellow, Oriol Vinyals, and Andrew M Saxe. Qualitatively characterizing neural network optimization problems. ICLR, 2015.

[2] Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. On large-batch training for deep learning: Generalization gap and sharp minima. ICLR, 2017.

Citation

If you find this code useful in your research, please cite:

@inproceedings{visualloss,
  title={Visualizing the Loss Landscape of Neural Nets},
  author={Li, Hao and Xu, Zheng and Taylor, Gavin and Studer, Christoph and Goldstein, Tom},
  booktitle={Neural Information Processing Systems},
  year={2018}
}

loss-landscape's Issues

how to choose optimal random direction?

Hi, thanks for sharing this code!

I have recently been trying this method and noticed that the direction we select has a significant impact on the loss curve. I am wondering whether anyone has seen similar behavior, and how we should select an "optimal" direction for analysis.

Thanks!

h5py version

As discussed in #4, h5py 2.7.0 is required. After downgrading with pip install h5py==2.7.0, I still get:

$ mpirun -n 4 python test_h5py.py
hdf5_version=1.10.1
hdf5_version=1.10.1
rank 0 read and write
Traceback (most recent call last):
  File "test_h5py.py", line 10, in <module>
    f  = h5py.File('surf_file.h5', 'r+')
  File "/home/weiw/anaconda2/lib/python2.7/site-packages/h5py/_hl/files.py", line 271, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/home/weiw/anaconda2/lib/python2.7/site-packages/h5py/_hl/files.py", line 103, in make_fid
    fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
IOError: Unable to open file (Unable to lock file, errno = 11, error message = 'resource temporarily unavailable')
hdf5_version=1.10.1
rank 0 read and write

Also, the reported versions seem to differ here:

$ python
>>> import h5py
>>> print (h5py.version.hdf5_version)
1.10.1
>>> print h5py.__version__
2.7.0
>>>

OS: "Ubuntu 16.04.5 LTS"
Python:

Python 2.7.15 |Anaconda, Inc.| (default, Oct 23 2018, 18:31:10)
[GCC 7.3.0] on linux2

2D loss contours

When I re-run the code for visualizing 2D loss contours, the generated 2D loss contour and 3D surface differ from the figures provided on Google Drive.
I use the same settings and pre-trained models:

mpirun -n 4 python plot_surface.py --mpi --cuda --model resnet56 --x=-1:1:51 --y=-1:1:51 \
--model_file cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7 \
--dir_type weights --xnorm filter --xignore biasbn --ynorm filter --yignore biasbn  --plot

NaN values for loss when running both 1D and 2D plotting

Hi, I'm not sure if I'm missing something simple, but using either VGG-9 or ResNet-56 for both the 1D and 2D visualizations gives NaN losses when they should be small. I'm using PyTorch 0.4.1 with the models downloaded from the provided links.

Support for later versions of h5py (3.7)

Are there any plans to add support for h5py 3.7 and HDF5 1.12.0 in the near future? At this point, it is very difficult to find and compile HDF5 1.8.16 on macOS Monterey, as it is obsolete.
Also, h5py 2.7 (the required version) does not work with later versions of Python (3.9+) either.

Where is the .h5 file?

Hi friends!
I have installed all the tools mentioned in README.md, downloaded ResNet-56 (10 MB), and run the command below:
mpirun -n 4 python plot_surface.py --mpi --cuda --model resnet56 --x=-1:1:51 --y=-1:1:51 \
--model_file cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7 \
--dir_type weights --xnorm filter --xignore biasbn --ynorm filter --yignore biasbn --plot
But 24 hours later nothing has changed, and I can't find any .h5 file that was created.
Where can I find the .h5 file, or did I miss something? I hope you can help. Thanks!

--dir_type states vs weights

--dir_type states indicates the direction contains dimensions for all parameters as well as the statistics of the BN layers (running_mean and running_var). Note that ignoring running_mean and running_var cannot produce correct loss values when plotting two solutions together in the same figure.

Why are running_mean and running_var important for plotting two solutions in the same figure? Could anyone help me with this one?

I'm trying to plot two or three solutions in one surface plot.

Thanks

How to generate the direction and surface file (.h5 files)?

I can plot the loss surface using the attached trained networks. If I want to apply the code to a new model, could you provide more details about how to generate the .t7 and .h5 files?

Add model

Why does L2 norm lead to a narrower landscape near the minimum when showing the trajectory?

In Figure 9 of your paper, I noticed that with the L2 norm the landscape becomes narrower around the minimum, which differs from the previous figures.

I know that you choose the direction vectors differently here, using PCA. The pattern can also be explained by reasoning from result to cause: the L2 norm makes training harder, so the convex region is smaller. However, I am curious whether you have any deeper insight into this pattern. Thanks!

How to plot like figure 1?

I have to plot two figures like Figure 1 in the paper to show flat and sharp minima intuitively. Please give me some instructions, thanks!

Code bug? In mpi4pytorch.py

mpi4pytorch.py
The function is named allreduce_min, but it uses MPI.MAX.

train_loss is not found

I run the plot_surface code like so:

    /usr/bin/python -u /local/mnt/workspace/ikarmano/Gitlab/sagd/loss-landscape/plot_surface.py --cuda \
    --x=-1:1:51 --y=-1:1:51 --model_file models/32_32_32_32_32_32_32_32_32_32_32_32_32_32_32cnn.t \
    --dir_type weights --xnorm filter --xignore biasbn --ynorm filter --yignore biasbn --plot

And it seem to calculate the loss fine:

Evaluating rank 2  90/2601  (3.5%)  coord=[ 0.56 -0.96] 	train_loss= 21.470 	train_acc=14.54 	time=5.28 	sync=0.00
Evaluating rank 2  91/2601  (3.5%)  coord=[ 0.6  -0.96] 	train_loss= 22.225 	train_acc=14.10 	time=5.65 	sync=0.00
Evaluating rank 2  92/2601  (3.5%)  coord=[ 0.64 -0.96] 	train_loss= 23.044 	train_acc=13.67 	time=5.92 	sync=0.00
Evaluating rank 2  93/2601  (3.6%)  coord=[ 0.68 -0.96] 	train_loss= 23.935 	train_acc=13.33 	time=5.71 	sync=0.00
Evaluating rank 2  94/2601  (3.6%)  coord=[ 0.72 -0.96] 	train_loss= 24.905 	train_acc=13.02 	time=5.65 	sync=0.00
Evaluating rank 2  95/2601  (3.7%)  coord=[ 0.76 -0.96] 	train_loss= 25.958 	train_acc=12.66 	time=5.50 	sync=0.00
Evaluating rank 2  96/2601  (3.7%)  coord=[ 0.8  -0.96] 	train_loss= 27.100 	train_acc=12.37 	time=5.99 	sync=0.00
Evaluating rank 2  97/2601  (3.7%)  coord=[ 0.84 -0.96] 	train_loss= 28.334 	train_acc=12.13 	time=5.85 	sync=0.00
Evaluating rank 2  98/2601  (3.8%)  coord=[ 0.88 -0.96] 	train_loss= 29.666 	train_acc=11.91 	time=5.71 	sync=0.00
Evaluating rank 2  99/2601  (3.8%)  coord=[ 0.92 -0.96] 	train_loss= 31.101 	train_acc=11.69 	time=5.58 	sync=0.00

However, the plot functions do not work because 'train_loss' is not found:

train_loss is not found in ../models/32_32_32_32_32_32_32_32_32_32_32_32_32_32_32cnn.t_weights_xignore=biasbn_xnorm=filter_yignore=biasbn_ynorm=filter.h5_[-1.0,1.0,51]x[-1.0,1.0,51].h5

And if I print the keys(), it's just:

<KeysViewHDF5 ['dir_file', 'xcoordinates', 'ycoordinates']>

Not sure what I'm doing wrong?

#custom dataset

Hi @ljk628 good job!
Do you also have a plan to adapt it for custom data?

--model_file in plot_surface.py should be specified

In plot_surface.py, instead of
parser.add_argument('--model_file', default='', help='path to the trained model file')
I think it should be
parser.add_argument('--model_file', default='cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7', help='path to the trained model file')
when the .t7 file of resnet56 trained with cifar10 has been downloaded.

Could I specify my own models, or only the mentioned models?

I wonder whether this implementation works only for the models that are mentioned or whether I can specify my own model.

trajectory plot on contour

Hi guys,

I have already got the contour and trajectory plots, but in two different PDFs. I really want to plot the trajectory on the contour (just like Figure 9 in the paper). Has anyone successfully done that?

Thanks!

Getting " 2 indexing arguments for 1 dimensions" error when using "plot_trajectory.py"

I tried to use "plot_trajectory" with the command:
plot_trajectory.py --model resnet56 --model_folder /tmp/ahmed
--dir_type weights --prefix landscape/loss-landscape-master/all_models/my_model --suffix .pth --max_epoch 14

But I got this error:

/tmp/ahmed/landscape/loss-landscape-master/all_models/my_model5.pth  (0.0000, 0.0000)
Traceback (most recent call last):
  File "/tmp/ahmed/landscape/loss-landscape-master/plot_trajectory.py", line 66, in <module>
    plot_2D.plot_trajectory(proj_file, dir_file)
  File "/tmp/ahmed/landscape/loss-landscape-master/plot_2D.py", line 87, in plot_trajectory
    plt.plot(f['proj_xcoord'], f['proj_ycoord'], marker='.')
  File "/tmp/ahmed/ana/anaconda3/lib/python3.11/site-packages/matplotlib/pyplot.py", line 3578, in plot
    return gca().plot(
           ^^^^^^^^^^^
  File "/tmp/ahmed/ana/anaconda3/lib/python3.11/site-packages/matplotlib/axes/_axes.py", line 1721, in plot
    lines = [*self._get_lines(self, *args, data=data, **kwargs)]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ahmed/ana/anaconda3/lib/python3.11/site-packages/matplotlib/axes/_base.py", line 303, in __call__
    yield from self._plot_args(
               ^^^^^^^^^^^^^^^^
  File "/tmp/ahmed/ana/anaconda3/lib/python3.11/site-packages/matplotlib/axes/_base.py", line 505, in _plot_args
    x = x[:, np.newaxis]
        ~^^^^^^^^^^^^^^^
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/tmp/ahmed/ana/anaconda3/lib/python3.11/site-packages/h5py/_hl/dataset.py", line 758, in __getitem__
    return self._fast_reader.read(args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "h5py/_selector.pyx", line 361, in h5py._selector.Reader.read
  File "h5py/_selector.pyx", line 107, in h5py._selector.Selector.apply_args
ValueError: 2 indexing arguments for 1 dimensions

What does this error mean?

The usage of this code and paper.

Hello, I have a question about the usage of this code and paper
(Visualizing the Loss Landscape of Neural Nets [https://arxiv.org/abs/1712.09913]).

I've read the original paper and run this code with my own code.
I plan to write an SCI paper, and as you know such papers need to justify their claims.
My paper proposes a new classification training method that works with existing models such as VGG.
The problem is that I have to explain why the model generalizes better with existing feature extractor models.
The results from your code do indicate better generalization.
(Slightly wider minima in the 2D plots (like Figure 6 in the paper) and a more bluish color in the filter-normalized surface plots with the ratio of eigenvalues.)

So my question is:
Can I use the 2D plots (like Figure 6 in the paper) and the ratio of Hessian eigenvalues (like Figure 7) as evidence that my proposed training method produces weights that generalize better, if each result shows wider, rounder contours and a more bluish color?

Using the code for models trained for different tasks

Hi,
Great Work!!
I wanted to know if we can use this to visualize the loss landscape for other computer vision tasks such as segmentation or lip reading. If so, could you give pointers on how to modify the code to do so?

I got a plot that doesn't look like an inverted cone. Is it OK?

I got the plot below. Is it correct? I was expecting a shape like an inverted cone.

loss landscape plot

I have to plot a loss landscape like Figure 1 in the paper. Please give me some instructions. Which scripts should I go through, and in what order? Thanks!

Bug due to multiple writes ?

I'm encountering this bug when trying to run on a 4-GPU system:

Traceback (most recent call last):
  File "plot_surface.py", line 291, in <module>
    crunch(surf_file, net, w, s, d, trainloader, 'train_loss', 'train_acc', comm, rank, args)
  File "plot_surface.py", line 82, in crunch
    f = h5py.File(surf_file, 'r+' if rank == 0 else 'r')
  File "/home/ubuntu/.local/lib/python2.7/site-packages/h5py/_hl/files.py", line 312, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/h5py/_hl/files.py", line 144, in make_fid
    fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
IOError: Unable to open file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')

The command I used is:

mpirun -n 4 python plot_surface.py --mpi --cuda --model resnet56 --x=-1:1:51 --y=-1:1:51 \
--model_file cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7 \
--dir_type weights --xnorm filter --xignore biasbn --ynorm filter --yignore biasbn  --plot

What can be the issue since the code is checking for rank 0 before writing ?

[Suggestion] Code change for y range check

Hi, I'm running the code, and in plot_surface.py there is a check for the y range arguments. The original code is assert args.ymin and args.ymax and args.ynum. However, this is not correct when any of those values equals 0. I think we should add "is not None" to the check, for example:

y_check = (args.ymin is not None) and (args.ymax is not None) and (args.ynum is not None)
assert y_check
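
The pitfall can be reproduced without the repository. In the sketch below, a truthiness-based check rejects a valid range that starts at 0, while the "is not None" check accepts it. The Namespace object stands in for the parsed arguments, and the two helper functions are hypothetical names used only for this comparison.

```python
from argparse import Namespace


def y_check_truthy(args):
    """Truthiness-based check: treats 0 the same as a missing value."""
    return bool(args.ymin and args.ymax and args.ynum)


def y_check_not_none(args):
    """Explicit check: only None counts as a missing value."""
    return all(v is not None for v in (args.ymin, args.ymax, args.ynum))


# A valid y range that happens to start at 0.
args = Namespace(ymin=0.0, ymax=1.0, ynum=51)
print(y_check_truthy(args), y_check_not_none(args))
```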

installing dependencies (MPI /w h5py)

opening this issue for reference. i had to jump through some hurdles while trying to set up the dependencies for the repo.

to use MPI /w HDF5 follow this: https://docs.h5py.org/en/stable/mpi.html

that should fix things. if you're using conda, and get the error AttributeError: 'h5py.h5p.PropFAID' object has no attribute 'set_fapl_mpio' an additional step is:

conda install -c conda-forge "h5py>=2.9=mpi*"

Visualizing a custom model

Hello,

Thank you for the tool. I have trained a Siamese network and stored the weights in an .h5 file after I finished training:

model.save('siamese_h5_model.h5')

Next, I am trying to produce both a 2D and a 3D plot for the network. Can you please show me how to do that?

IndexError when plotting 2D plots

I currently downloaded the provided models and am running the command:

python plot_surface.py \
    --cuda \
    --model resnet56 \
    --x=-1:1:51 \
    --y=-1:1:51 \
    --model_file cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7 \
    --dir_type weights \
    --xnorm filter \
    --xignore biasbn \
    --ynorm filter \
    --ignore biasbn \
    --plot

and am getting the following error at net_plotter.set_weights:36: IndexError: invalid index to scalar variable.

Checking the value of step shows that it's -1.0 and since we cannot index scalar values I'm assuming this is the problem. Am I running something wrong, or is this a bug?

Granted, I'm not using MPI. Could that be the cause of the issue?

Thanks.

Trajectory plot

To plot the figure 10 in your paper, I am assuming I should generate the PCA directions from plot_trajectory.py first then use the PCA directions to plot the loss contours of the final model.

I had to write my own code because of some technical difficulties. However, I notice that for my data, the trajectory should start from loss ~= 0.9 but the loss contour of the final model is far from 0.9 at the trajectory starting point. This makes me think that actually there is no guarantee that loss contours which the plotted trajectory comes across reflect the real loss, in other words, the loss contours of the final model do not show the "loss landscape" along the trajectory. However, when I reduced the number of models used to perform the PCA, the loss at the trajectory starting point is near 0.9 loss contour of the final model.

This is reasonable since the final model is perturbed along pc1 and pc2, while the trajectory is projected to pc1 and pc2 and a model can actually be far away from the projection, thus the loss corresponding to the trajectory can be far from the loss contours of the final model.

I understand that pc1 and pc2 can explain most of the variance among the parameters of all the models, but there is no guarantee that they can explain most of the difference between any given model and the final model. That is probably why I got more "accurate" results when I use fewer models to estimate the principal components?
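
The geometric point can be checked numerically on synthetic data. The sketch below builds made-up checkpoints, takes the top two principal directions of the differences to the final model, and measures how much of each difference lies outside the plotted plane. It uses an uncentered SVD for simplicity, which is an assumption; plot_trajectory.py may compute the PCA differently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "checkpoints": 20 models with 50 parameters each (made-up data).
checkpoints = np.cumsum(rng.normal(size=(20, 50)), axis=0)
final = checkpoints[-1]

# Rows are w_i - w_final for every checkpoint before the final one.
diffs = checkpoints[:-1] - final

# Top-2 principal directions from an SVD of the (uncentered) differences.
_, _, vt = np.linalg.svd(diffs, full_matrices=False)
pcs = vt[:2]                      # shape (2, 50), orthonormal rows

coords = diffs @ pcs.T            # 2D trajectory coordinates on the plot
in_plane = coords @ pcs           # the part of each difference the plot can show
residual = diffs - in_plane       # the part orthogonal to the plotted plane

off_plane = np.linalg.norm(residual, axis=1)
total = np.linalg.norm(diffs, axis=1)
print("start: distance to final = %.2f, off-plane part = %.2f"
      % (total[0], off_plane[0]))
```

Whenever the off-plane part is not small, the contour value at the projected point is the loss of a different parameter vector than the actual checkpoint, so the two losses need not agree.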

RuntimeError: cublas runtime error: library not initialized at /conda/.../THC/THCGeneral.cpp:250

Two problems about paper

In Figure 6 of the paper, the only difference between (c) and (d) is that (d) is much flatter than (c), and the test error of (d) is smaller than that of (c).
So if a loss surface obtained with filter-wise normalization is flatter, will it generalize better? Is there any explanation or mathematical proof?
In the final part of Section 6, the paper uses (min eigenvalue of the Hessian / max eigenvalue of the Hessian) to represent convexity: a larger value indicates a more non-convex region and a smaller value a more convex region. Why?
Thank you for sharing the results of your work.
This is a really impressive paper, and your response would be appreciated.

I got a weird loss space plot, can you please help me understand it

I used your code to visualize a trained model but got the loss landscape below. Can you please help me understand it, or tell me if I did something wrong in the settings of your code? Thanks very much.

Could it accept .th files?

It would be nice if it did. Really amazing work!

How to install openmpi?

Hi, thanks for your great work.
However, I found it complex to install OpenMPI. Could you give some simple instructions?

Typo in net_plotter.py

Hey,

I came across a little bug in net_plotter.normalize_direction(..) (line 125) for norm='dfilter'. Typed ngorm, should be norm

Regards

Is there any difference about using 'OpenMPI'?

Hello.
I just tried to run this code with
Ubuntu 16.04 LTS, Geforce TITAN X GPU with Pytorch 0.4.1
While running the code with
mpirun -n 4 python plot_surface.py --mpi --cuda --model vgg9 --x=-0.5:1.5:401 --dir_type states --model_file cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1/model_300.t7 --model_file2 cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=8192_wd=0.0_save_epoch=1/model_300.t7 --plot --show
nothing happened.

But if I just delete mpirun -n 4, the code starts running. I think it is in the training process.
After the training process, I can see the plotted results.

Can I run this code without OpenMPI?
I only know that OpenMPI is for parallel computation.
So can I use the code with
python plot_surface.py --mpi --cuda --model vgg9 --x=-0.5:1.5:401 --dir_type states --model_file cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1/model_300.t7 --model_file2 cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=8192_wd=0.0_save_epoch=1/model_300.t7 --plot --show?

error when converting h5

I downloaded the ResNet h5 files from
https://drive.google.com/a/cs.umd.edu/file/d/12oxkvfaKcPyyHiOevVNTBzaQ1zAFlNPX/view?usp=sharing

then I try the conversion
python h52vtp.py --surf_file path_to_surf_file --surf_name train_loss --zmax 10 --log

but I get this error

Traceback (most recent call last):
  File "loss-landscape/h52vtp.py", line 259, in <module>
    h5_to_vtp(args.surf_file, args.surf_name, log=args.log, zmax=args.zmax, interp=args.interp)
  File "loss-landscape/h52vtp.py", line 38, in h5_to_vtp
    [xcoordinates, ycoordinates] = np.meshgrid(f['xcoordinates'][:], f['ycoordinates'][:][:])
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/group.py", line 177, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'xcoordinates' doesn't exist)"

any tips?

and most importantly, can you advise me how to convert .h5 files to .obj mesh files?
thank you so much

openmpi4.0.0

OPAL ERROR: Not initialized in file pmix3x_client.c at line 113
An error occurred in MPI_Init_thread

converting h5 ,vtp

good day friends
any hints on converting:

from vtp to obj
or from vtp to stl
or from h5 to obj
or from h5 to stl
? any of those?
thank you so much

Plotting to pdf files

We have some nodes which only support command interface, and there is no GUI display. If we run the code, then we will get

_tkinter.TclError: no display name and no $DISPLAY environment variable

Can we save figures just into files such as pdf files?

plotting trajectory

Good day,
could you give me an example of the right command to plot the trajectory of one of the optimization processes using plot_trajectory.py?

specifically for example, the one that would match with the surface generated by:
model_300.t7_weights_xignore=biasbn_xnorm=filter_yignore=biasbn_ynorm=filter.h5_[-1.0,1.0,201]x[-1.0,1.0,201].h5

Also, I understand plot_trajectory.py uses PCA. How precisely do these generated trajectories match the related point cloud generated by plot_surface.py?

thank you very much
Jav

failed to start because it could not find or load the Qt platform plugin "xcb"

The error appears after plot_1d_loss_err has finished.