Generative Models for High-granularity Calorimeter of ILD

We are modelling electromagnetic showers in the central region of the Silicon-Tungsten calorimeter of the proposed ILD. We investigate the use of a new architecture: Bounded-Information Bottleneck Autoencoder. In addition, we are utilising WGAN-GP and vanilla GAN approaches. In total, we train 3 generative models.

This repository contains ingredients for repoducing Getting High: High Fidelity Simulation of High Granularity Calorimeters with High Speed [arXiv:2005.05334]. A small fraction of the training data is available in Zenodo.

Data Generation and Preparation

Step 1: ddsim + Geant4

We use iLCsoft ecosystem which includes ddsim and Geant4. It is better to use generation code that outputs big files to a scratch space. For DESY and NAF users: you may want to use DUST strogage in CentOS7 NAF-WGs.

First we need to pull ILDConfig repository and go to its specific folder.

git clone --branch v02-01-pre02 https://github.com/iLCSoft/ILDConfig.git
cd ILDConfig/StandardConfig/production

copy all .py, .sh , create_root_tree.xml and gammaGun.mac files to this folder from training_data folder.

We use singularity (with docker image) to generate Geant4 showers. Please export necessary singularity tmp and cache for convenience.

export SINGULARITY_TMPDIR=/nfs/dust/ilc/user/eren/container/tmp/
export SINGULARITY_CACHEDIR=/nfs/dust/ilc/user/eren/container/cache/

Please change it for your scratch space. Now we can start the instance and run it

singularity instance start -H $PWD --bind $(pwd):/home/ilc/data docker://ilcsoft/ilcsoft-centos7-gcc8.2:v02-01-pre instance1
singularity run instance://instance1 ./generateG4-gun.sh instance1

This generates 1000 showers. Play with gammaGun.mac if you want to change it.

Step 2: Marlin framework to create root files

Now we would like to use Marlin framework in iLCsoft. Marlin takes lcio file, which was created in the previous step and creates root file

## copy create_root_tree.xml and marlin.sh file 
singularity run instance://instance1 ./marlin.sh "photon-shower-instance1.slcio"

Step 3: Conversion to hdf5 files

It is handy to use uproot framework to stream showers from root file in order to create hdf5 file, which is really important for our neutral network achiterctures.

singularity run -H $PWD docker://engineren/pytorch:latest python create_hdf5.py --ncpu 8 --rootfile testNonUniform.root --branch photonSIM --batchsize 100

Step 4: Remove staggering effects

Our simulation of ILD calorimeter is a realistic one. That's why we have irregularities in geometry. This causes staggering in x direction; we see artifacts (i.e empty lines due to binning). In order to mitigate this effect, we apply a correction and thus remove artifacts.

singularity run -H $PWD docker://engineren/pytorch:latest python corrections.py --input test_30x32.hdf5 --output showers-1k.hdf5 --batchsize 1000 --minibatch 1

choose batchsize and mini-batch size such a way that total showers = batchsize * minibatch (i.e 1000 = 1000 * 1 )

Structure of HDF5 file

Created file showers-1k.hdf5 has the following structure:

Group named 30x30
- energy : Dataset {1000, 1}
- layers : Dataset {1000, 30,30,30}

As stated in our paper, we have trained our model on 950k showers, which is approximately 200 Gb. That is why we are able to put only a small fraction of our training data to [Zenodo]

Architectures

The network architectures of generative models have a large number of moving parts and the contributions from various generators, discriminators, and critics need to be carefully orchestrated to achieve good results. Due to the high computational cost of the studies, no systematic tuning of hyperparameters was performed.

GAN

Our implementation of the vanilla GAN is a baseline model consisting of a generator and a discriminator. The generator network of the GAN consists of 3-dimensional transposed convolution layers with batch normalization. It takes a noise vector of length 100, uniformly distributed from -1 to 1, and the true energy labels E as inputs. The discriminator uses five 3-dimensional convolution layers followed by two fully connected layers with 257 and 128 nodes respectively. We flatten the output of the convolutions and concatenate it the with input energy before passing it to the fully connected layers. Each fully connected layer except the final one uses LeakyReLU (slope: −0.2) as an activation function. The activation in the final layer is sigmoid. In total, the generator has 1.5M trainable weights and the discriminator has 2.0M weights.

WGAN

One alternative to classical GAN training is to use the Wasserstein-1 distance, also known as earth mover's distance, as a loss function. The WGAN architecture consists of 3 networks:

one generator with 3.7M weights,
one critic with 250k weights,
one constrainer network with 220k weights.

The critic network starts with four 3D convolution layers with kernel sizes (X,2,2) with X=10,6,4,4 which have 32, 64, 128, and 1 filters respectively. LayerNorm layers are sandwiched between the convolutions. After the last convolution, the output is concatenated with the E vector required for energy-conditioning. After that, it is flattened and fed into a fully connected network with 91, 100, 200, 100, 75, 1 nodes. Throughout the critic, LeakyReLU (slope: -0.2) is used as activation function.

The generator network takes a latent vector z (normally distributed with length 100) and true E labels as input and separately passes them through a 3D transposed convolution layer using a 4x4x4 kernels with 128 filters. After that, the outputs are concatenated and processed through a series of four 3D transposed convolution layers (kernel size 4x4x4 with filters of 256, 128, 64, 32). LayerNorm layers along with ReLU activation functions are used throughout the generator.

The energy-constrainer network is similar to the critic: three 3D convolutions with kernel sizes 3x3x3, 3x3x3 and 2x2x2 along with 16, 32, and 16 filters are used. The output is then fed into a fully connected network with 2000, 100, and 1 nodes. LayerNorm layers and LeakyReLU (slope: -0.2) are sandwiched in between convolutional layers.

BIB-AE and Post Processing

An instructive way for describing the base BIB-AE framework is by taking a VAE and expanding upon it. A default VAE consist of four general components:

encoder,
decoder,
latent-space regularized by the Kullback--Leibler divergence (KLD),
an L_N-norm to determine the difference between the original and the reconstructed data.

These components are all present as well in the BIB-AE setup. Additionally, one introduces a GAN-like adversarial network, trained to distinguish between real and reconstructed data, as well as a sampling based method of regularizing the latent space, such as another adversarial network or a maximum mean discrepancy (MMD, as described in the next section) term. In total this adds up to four loss terms: The KLD on the latent space, the sampling regularization on the latent space, the L_N norm on the reconstructed samples and the adversary on the reconstructed samples. The guiding principle behind this is that the two latent space and the two reconstruction losses complement each other and, in combination, allow the network to learn a more detailed description of the data.

Please refer to our paper Appendix A.3 a more detailed description of BiB-AE and Post Processing network architectures.

Training

In order to train our generative models, please make sure you downloaded hdf5 file from Zenodo and put it into training_data folder. Alternatively, you can contact authors and ask for more data or you could generate yourself by following instructions above.

Bib-AE

Choose BIBAE_Run for BIBAE training or BIBAE_PP_Run for BIBAE with post processing training
Modify parameters accoring to your need
launch python BIBAE_Run.py or python BIBAE_PP_Run.py

It is convinient to export singularity cache and tmp directory before running.

export SINGULARITY_TMPDIR=/path/to/your/container/tmp
export SINGULARITY_CACHEDIR=/path/to/your/container/cache/

then run the training with the docker image, which contains all the dependencies required for the training.

git clone https://github.com/FLC-QU-hep/getting_high.git
cd getting_high
singularity run --nv docker://engineren/pytorch:latest python BIBAE/BIBAE_Run.py

WGAN

Option 1: Running on a single GPU

This is option for training WGAN model exclusively on a single GPU. You might want to adjust batch_size in the main() function so that data fits into your GPU's memory. Also, please make sure that output and exp paths exist before you run the training process.

## assuming you are still in the getting_high/ directory
singularity run --nv docker://engineren/pytorch:latest python WGAN/wGAN.py

Option 2: Running on distributed GPUs

With this option, you are able to run WGAN model across distributed GPUs (i.e GPUs which are in different physical machines) in your cluster. We are using Maxwell cluster at DESY with slurm workload manager enabled.

#!/bin/bash

#SBATCH --partition=your-partition
#SBATCH --nodes=3                                 # Number of nodes
#SBATCH --time=24:00:00     
#SBATCH --constraint="V100"
#SBATCH --chdir   /path/to/your/directory          # directory must already exist!
#SBATCH --job-name  wGAN
#SBATCH --output    wGAN-%N.out            # File to which STDOUT will be written
#SBATCH --error     wGAN-%N.err            # File to which STDERR will be written
#SBATCH --wait-all-nodes=1
#SBATCH --mail-type END 

## go to the target directory
cd /path/to/your/repo/getting_high

## Setup tmp and cache directory of singularity
export SINGULARITY_TMPDIR=/path/to/your/container/tmp
export SINGULARITY_CACHEDIR=/path/to/your/container/cache/

# make sure you have the same filename inside wGAN_DDP.py
mkdir -p output/WGANv1

## start the containers
srun -N 3 singularity run --nv docker://engineren/pytorch:latest python WGAN/wGAN_DDP.py

GAN

The training GAN model exclusively on a single GPU is possible via singularity. You might want to adjust batch_size in the main() function so that data fits into your GPU's memory.

singularity run --nv docker://engineren/pytorch:latest python GAN/main_GAN.py

If you wish to recover from a saved model (i.e checkpoint), run the same command with an argument from_chp.

flc-qu-hep / getting_high Goto Github PK

getting_high's Introduction

Generative Models for High-granularity Calorimeter of ILD

Outline:

Data Generation and Preparation

Step 1: ddsim + Geant4

Step 2: Marlin framework to create root files

Step 3: Conversion to hdf5 files

Step 4: Remove staggering effects

Structure of HDF5 file

Architectures

GAN

WGAN

BIB-AE and Post Processing

Training

Bib-AE

WGAN

Option 1: Running on a single GPU

Option 2: Running on distributed GPUs

GAN

getting_high's People

Contributors

Stargazers

Watchers

Forkers

Recommend Projects

Recommend Topics

Recommend Org