The potential of imbalanced datasets to introduce bias and discrimination into a machine learning classifier is a widely studied problem.
However, certain classes within a dataset are more likely to be discriminated against than others when made the minority class.
The phenomenon of interclass bias describes the ability of a classifier to generalize from the features of the other classes present in the data.
That is to say, not all classes are discriminated against equally when they are the minority class in an imbalanced dataset.
This can be explored through a series of experiments that test DenseNet model performance with minority classes of varying sizes,
for each of the 10 classes in the CIFAR-10 dataset.
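As a rough illustration of how a minority class could be induced, the sketch below subsamples one class of a labelled dataset while keeping every other class intact. It is not taken from minority_class_experiments.py: the function name induce_minority_class, the interpretation of target_percentage as the share of the chosen class that is kept, and the toy label list are all assumptions made for the example.

```python
import random
from collections import Counter


def induce_minority_class(labels, minority_label, target_percentage, seed=0):
    """Return sorted indices of a dataset in which one class is subsampled.

    Hypothetical helper: keeps `target_percentage` percent of the samples
    labelled `minority_label` and every sample from the other classes.
    """
    if not 1 <= target_percentage <= 100:
        raise ValueError("target_percentage must be between 1 and 100")
    rng = random.Random(seed)  # seeded so the same subset can be recreated
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    others = [i for i, y in enumerate(labels) if y != minority_label]
    keep = max(1, round(len(minority) * target_percentage / 100))
    return sorted(others + rng.sample(minority, keep))


# Toy stand-in for CIFAR-10 labels: 10 classes with 100 samples each.
labels = [c for c in range(10) for _ in range(100)]
subset = induce_minority_class(labels, minority_label=3, target_percentage=10, seed=42)
counts = Counter(labels[i] for i in subset)
print(counts[3], counts[0])  # prints "10 100"
```

The returned indices could then be used to select a subset of the training data; repeating the call for each of the 10 labels and several percentages would give the grid of experiments described above.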
minority_class_experiments.py runs experiments with induced minority classes within the CIFAR-10 dataset.
Running this file will download CIFAR-10 to a data directory if it is not present.
Flags
- label: one of the 10 CIFAR-10 classes
- seed: random seed, used to recreate experiments
- num_epochs: number of epochs the model trains for
- target_percentage: in %, an integer value between 1 and 100
- full_flag: for baseline experiments; True uses the balanced dataset (i.e. no minority class). Default=False
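For readers who want a feel for how such flags might be parsed, here is a minimal argparse sketch. It is an assumption-laden illustration, not the script's actual parser: the helper names (str2bool, percentage, build_parser), the use of class names for label, and every default other than full_flag=False are hypothetical. The string-to-boolean helper mirrors the `--cpu True` style of invocation.

```python
import argparse

# Standard CIFAR-10 class names. Whether the script expects names or
# integer indices for --label is an assumption.
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]


def str2bool(value):
    """Parse 'True'/'False' style strings, as in `--cpu True`."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def percentage(value):
    """Accept only integers between 1 and 100."""
    number = int(value)
    if not 1 <= number <= 100:
        raise argparse.ArgumentTypeError("must be an integer between 1 and 100")
    return number


def build_parser():
    parser = argparse.ArgumentParser(
        description="Induced minority class experiments on CIFAR-10 (sketch)")
    parser.add_argument("--label", choices=CIFAR10_CLASSES,
                        help="class to turn into the minority class")
    parser.add_argument("--seed", type=int, default=0,  # placeholder default
                        help="random seed, to recreate experiments")
    parser.add_argument("--num_epochs", type=int, default=1,  # placeholder default
                        help="number of training epochs")
    parser.add_argument("--target_percentage", type=percentage,
                        help="integer between 1 and 100, in percent")
    parser.add_argument("--full_flag", type=str2bool, default=False,
                        help="True for the balanced baseline dataset")
    parser.add_argument("--cpu", type=str2bool, default=False,  # placeholder default
                        help="True to run on CPU")
    return parser


args = build_parser().parse_args(
    ["--label", "cat", "--seed", "1", "--num_epochs", "5",
     "--target_percentage", "10", "--cpu", "True"])
print(args.label, args.target_percentage, args.full_flag, args.cpu)
```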
This approach illustrates how a minority class of varying size affects the performance of a machine learning classifier, measured by accuracy and F-score.
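To see why the F-score is worth reporting alongside accuracy, the following sketch computes both on a small made-up set of predictions. The function per_class_scores and the toy labels are hypothetical and only illustrate the standard precision, recall and F1 definitions; they are not the evaluation code from the models folder.

```python
def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, f1) for one class, treating it as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up example: class 1 is the minority (10 of 100 samples) and the
# classifier recovers only 2 of those 10 samples.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 90 + [0] * 8 + [1] * 2

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = per_class_scores(y_true, y_pred, label=1)
print(accuracy, round(f1, 3))  # prints "0.92 0.333"
```

In this toy case overall accuracy stays high while the minority-class F-score is low, which is the kind of gap the experiments are meant to expose.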
The models folder includes our DenseNet implementations and a base class for training/evaluation.
To run locally:
python minority_class_experiments.py --cpu True
To run on slurm:
- Delete the minority_class_experiments* files in data/ for a new round of experiments
- cd active_scripts
- If the experiment .sh files have not been created, create them by modifying run_jobs.py and uncommenting the create_files function
- rm sl* to remove old slurm files
- Run run_jobs.py to deploy on the slurm cluster
Coming soon.
Questions? Contact [email protected].