Either on/off or maybe a frequency (e.g. every N epochs)

Consider verbosity parameter for per-epoch losses about ctgan HOT 8 CLOSED

sdv-dev commented on June 27, 2024 4

Consider verbosity parameter for per-epoch losses

from ctgan.

Comments (8)

kevinykuo commented on June 27, 2024 1

@csala I think for this we should have a verbose parameter that turns the printing on/off. However, in either case I think it'd be helpful for fit() to return data of the training history, so users can inspect/plot it afterwards. Maybe a wrapper around a pandas data frame but you'd probably have a better idea on what the most Pythonic approach is. Let me know your thoughts on this and I'd be happy to whip something together.
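A minimal sketch of the two ideas above, assuming a hypothetical `fit` function rather than the real CTGAN method: `verbose` can be `True`/`False` or an integer N (print every N epochs), and the per-epoch history is returned either way. The losses here are random placeholders, not real training values.

```python
import random


def fit(epochs=10, verbose=True, seed=0):
    """Toy training loop; the losses are random stand-ins, not CTGAN output."""
    rng = random.Random(seed)

    # verbose=True prints every epoch, verbose=False never prints,
    # and an integer N prints every N epochs.
    if verbose is True:
        every = 1
    elif verbose is False:
        every = 0
    else:
        every = int(verbose)

    history = []
    for epoch in range(1, epochs + 1):
        loss_g = rng.uniform(-1.0, 1.0)  # placeholder generator loss
        loss_d = rng.uniform(-1.0, 1.0)  # placeholder discriminator loss
        history.append({"epoch": epoch, "loss_g": loss_g, "loss_d": loss_d})
        if every and epoch % every == 0:
            print(f"Epoch {epoch}, Loss G: {loss_g:.4f}, Loss D: {loss_d:.4f}")

    # The history is returned regardless of verbosity.
    return history
```

A list of per-epoch records like this can be passed straight to `pandas.DataFrame(history)` for inspection or plotting.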

oregonpillow commented on June 27, 2024 1

I'll be honest, the only reason I added GPU status was because I liked watching the temperature go up with more epochs 😏

oregonpillow commented on June 27, 2024

Any updates on this issue?
Playing around in Colab, I put this together: https://colab.research.google.com/drive/1JA_Ap1bQDmlhm_tC1k8RL0MNYKBluJNa
and added some new arguments to the fit() method in synthesizer.py. However, I suspect my implementation is quite far off. Any feedback is greatly appreciated.

Args:
            train_data (numpy.ndarray or pandas.DataFrame):
                Training Data. It must be a 2-dimensional numpy array or a
                pandas.DataFrame.
            discrete_columns (list-like):
                List of discrete columns to be used to generate the Conditional
                Vector. If ``train_data`` is a Numpy array, this list should
                contain the integer indices of the columns. Otherwise, if it is
                a ``pandas.DataFrame``, this list should contain the column names.
            verbosity (boolean):
                Choose to display epochs during the run. Defaults to ``True``.
            epochs (int):
                Number of training epochs. Defaults to 300.
            log_frequency (boolean):
                Whether to use log frequency of categorical levels in conditional
                sampling. Defaults to ``True``.
            gpu_stats (boolean):
                Whether to display gpu stats for each epoch. Fitting may be slowed down
                with this option turned on. Only supports nvidia GPUs at this time.
                Defaults to ``False``.
            early_stopping (boolean):
                Whether to stop fitting early if loss function has not improved for
                specified number 'patience' of epochs. Defaults to ``False``.
            patience (int):
                Number of epochs to monitor to see if loss function improves.
                Defaults to ``10`` if early_stopping turned on. 
            logging (boolean):
                Whether to store the generator loss and discriminator loss into a csv
                log file with timestamp. Defaults to ``False``.
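The `early_stopping` and `patience` arguments described above could be backed by a check like the following. This is only a sketch: the helper name `should_stop` is hypothetical, and it assumes a single monitored loss where lower is better, which may not hold for GAN losses that oscillate during training.

```python
def should_stop(losses, patience=10):
    """Return True if the last `patience` epochs did not beat the best earlier loss.

    `losses` is a list with one monitored loss value per finished epoch.
    Assumes that a lower value means improvement.
    """
    if len(losses) <= patience:
        # Not enough epochs yet to judge.
        return False
    best_before = min(losses[:-patience])
    best_recent = min(losses[-patience:])
    return best_recent >= best_before
```

The training loop would call this once per epoch and break out of the loop when it returns `True`.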

elisim commented on June 27, 2024

@csala this would be very helpful.
IMHO, something similar to the Keras model.fit output could be considered.

ctgan = CTGANSynthesizer()
hist = ctgan.fit(data, discrete_columns)

where hist is a dictionary containing the generator and discriminator loss per epoch, and may be extended to other metrics in the future.

Baukebrenninkmeijer commented on June 27, 2024

In my own implementation I added loops using tqdm (progress bars) for both the epochs and steps. You can add logging information like loss there as well.

Regarding how this information should be logged, and the proposal @oregonpillow made, I think the following:

The information that you're logging is really good and I like it a lot! The GPU stats are also a nice added bonus.
The history should not be returned by fit. To me at least, this does not feel intuitive. I think this information can be stored as an attribute, like ctgan.hist or ctgan.logs or something.
Writing directly to files seems a bit much for an implementation in CTGAN.
I think one option to facilitate many of these things is a callback system, similar to FastAI. We call on_epoch_end, on_epoch_start and other methods on the objects in ctgan.callbacks. These callbacks can be anything, ranging from logging objects to early stopping.
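A rough sketch of such a callback system, under stated assumptions: `Callback`, `HistoryLogger`, and `ToySynthesizer` are hypothetical names for illustration and are not part of CTGAN, and the losses are placeholder numbers. The hook names `on_epoch_start` and `on_epoch_end` follow the comment above.

```python
class Callback:
    """Base class; hooks do nothing unless a subclass overrides them."""

    def on_epoch_start(self, epoch):
        pass

    def on_epoch_end(self, epoch, logs):
        pass


class HistoryLogger(Callback):
    """Keeps per-epoch losses as an attribute instead of returning them from fit."""

    def __init__(self):
        self.history = []

    def on_epoch_end(self, epoch, logs):
        self.history.append({"epoch": epoch, **logs})


class ToySynthesizer:
    """Stand-in for a synthesizer; it only demonstrates where the hooks fire."""

    def __init__(self, callbacks=None):
        self.callbacks = list(callbacks or [])

    def fit(self, epochs=3):
        for epoch in range(1, epochs + 1):
            for callback in self.callbacks:
                callback.on_epoch_start(epoch)
            # Placeholder losses; a real loop would train the networks here.
            logs = {"loss_g": 1.0 / epoch, "loss_d": -1.0 / epoch}
            for callback in self.callbacks:
                callback.on_epoch_end(epoch, logs)
```

With this shape, `fit` returns nothing and the history lives on the logger object; an early-stopping or progress-bar callback could plug into the same hooks.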

NadeemNicoR commented on June 27, 2024

Can I please know what metric is used in the loss calculation?
Epoch 105, Loss G: -7.7396, Loss D: -0.3223 is what I get when I fit the model on the training data.

Baukebrenninkmeijer commented on June 27, 2024

@NadeemNicoR The metric is the raw logit output, iirc. The loss of G is just the average error of the samples produced by G. The loss of D is the loss of G - the loss of the real samples. I'm doing this from memory, so let me know if this is incorrect.
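For illustration, here is how such losses would be computed in a Wasserstein-style setup. That setup is an assumption on my part, not something confirmed in this thread, and the function name `wgan_losses` is hypothetical. Extra terms a real implementation may add, such as a gradient penalty or a conditional loss, are left out. Because the values are unbounded critic scores rather than probabilities, negative losses like the ones quoted above are normal in this setting.

```python
from statistics import mean


def wgan_losses(critic_real, critic_fake):
    """Compute Wasserstein-style losses from raw critic scores.

    critic_real: critic outputs for real samples
    critic_fake: critic outputs for generated samples
    """
    # The critic tries to score real samples higher than fake ones.
    loss_d = mean(critic_fake) - mean(critic_real)
    # The generator tries to raise the critic's score on its samples.
    loss_g = -mean(critic_fake)
    return loss_g, loss_d
```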

npatki commented on June 27, 2024

#147 addressed this issue so I'm closing it off. For further discussion about the verbosity parameter, let's use the overall SDV GitHub.
