In data analysis, the size of the training dataset is crucial to model accuracy, and there are many ways to augment it. A classic example is bootstrapping, which creates multiple samples by resampling the original dataset with replacement. Because every bootstrapped observation is drawn from the real data, the method has drawbacks: with a relatively small dataset it adds little new information, and it can lead to underestimation of variability. Here we present a new take on bootstrapping: using a GAN to generate "fake" data based on the training data. The goal is to capture the hidden features behind the "real data" and ultimately increase the robustness of the training dataset.
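The classical bootstrap described above can be sketched in a few lines. This is an illustrative standalone example, not code from FINAL.ipynb; the function name and sample data are made up for demonstration:

```python
import random
import statistics

def bootstrap_means(data, n_resamples=1000, seed=0):
    """Resample the data with replacement and return the mean of each resample.

    The spread of these means estimates the sampling variability of the
    original dataset's mean. (Illustrative helper, not from the notebook.)
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Each resample has the same size as the data and is drawn
        # with replacement, so values can repeat.
        resample = rng.choices(data, k=len(data))
        means.append(statistics.mean(resample))
    return means

# Hypothetical small dataset to illustrate the limitation discussed above.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8]
means = bootstrap_means(data)
```

Note that every resampled value comes from the original data, so no resample can contain values outside the observed range; this is exactly the limitation that generating new data with a GAN is meant to address.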
Details are in the notebook FINAL.ipynb.