To perform VSR (video super-resolution), we apply MISO-SR to the target frame, using the neighboring frames concatenated with their corresponding dense optical flow fields. The network's recurrent module iterates over each of the 6 neighboring frames in turn. The target frame is also upscaled bicubically to provide a color-correct baseline onto which the network adds its predicted detail.
Fig. 3. RBPN (Recurrent Back-Projection Network)
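The data flow described above can be sketched in dependency-free Python. This is a simplified illustration under stated assumptions, not the project's RBPN code: `project` and `reconstruct` are hypothetical stand-ins for the learned projection and reconstruction modules, a single hidden state replaces RBPN's separate low- and high-resolution states, and nearest-neighbor upscaling substitutes for bicubic so the example needs no image library.

```python
from typing import Callable, List

# Frames, flows and feature maps are nested lists indexed [channel][row][col].
Tensor = List[List[List[float]]]


def concat_channels(*tensors: Tensor) -> Tensor:
    """Stack tensors along the channel axis (they must share height and width)."""
    out: Tensor = []
    for t in tensors:
        out.extend(t)
    return out


def add(a: Tensor, b: Tensor) -> Tensor:
    """Element-wise sum of two tensors with identical shapes."""
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def upscale_nearest(frame: Tensor, scale: int) -> Tensor:
    """Nearest-neighbor upscaling, a dependency-free stand-in for bicubic."""
    return [[[row[x // scale] for x in range(len(row) * scale)]
             for row in channel for _ in range(scale)]
            for channel in frame]


def rbpn_style_forward(target: Tensor, neighbors: List[Tensor], flows: List[Tensor],
                       project: Callable[[Tensor, Tensor], Tensor],
                       reconstruct: Callable[[List[Tensor]], Tensor],
                       scale: int) -> Tensor:
    """One recurrent pass over the neighbors, then a residual on an upscaled baseline.

    `project` and `reconstruct` are hypothetical stand-ins for learned modules.
    """
    hidden = target  # simplification: RBPN starts from learned target-frame features
    states: List[Tensor] = []
    for neighbor, flow in zip(neighbors, flows):
        # Target RGB (3) + neighbor RGB (3) + flow (dx, dy) = 8 input channels.
        misr_input = concat_channels(target, neighbor, flow)
        hidden = project(hidden, misr_input)
        states.append(hidden)
    residual = reconstruct(states)              # predicted high-resolution detail
    baseline = upscale_nearest(target, scale)   # color-correct baseline
    return add(baseline, residual)


# Toy usage: a 3x2x2 target, 6 all-zero neighbors and flows, 2x upscaling.
def zeros(c: int, h: int, w: int) -> Tensor:
    return [[[0.0] * w for _ in range(h)] for _ in range(c)]


H, W, SCALE = 2, 2, 2
target = [[[float(c + y + x) for x in range(W)] for y in range(H)] for c in range(3)]
neighbors = [zeros(3, H, W) for _ in range(6)]
flows = [zeros(2, H, W) for _ in range(6)]
input_channels: List[int] = []


def toy_project(hidden: Tensor, misr_input: Tensor) -> Tensor:
    input_channels.append(len(misr_input))  # record the channel count per step
    return hidden


def toy_reconstruct(states: List[Tensor]) -> Tensor:
    return zeros(3, H * SCALE, W * SCALE)  # zero residual: output equals baseline


out = rbpn_style_forward(target, neighbors, flows, toy_project, toy_reconstruct, SCALE)
print(input_channels)  # [8, 8, 8, 8, 8, 8]
```

The toy run only checks the plumbing: each of the 6 recurrent steps receives an 8-channel input, and with a zero residual the output equals the upscaled baseline, which is why the baseline is expected to anchor the colors while the network supplies detail.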
- Training RBPN was very time-consuming, with one epoch taking around 35 minutes. This was due to the limited memory on free cloud GPU services such as Kaggle, which could only accommodate small batch sizes.
- Had to train on a small dataset, since larger datasets would have made training times infeasible; this hindered output quality.
- Training often stalled, and the loss tended to oscillate regardless of hyperparameter tuning, resulting in blank (completely black) outputs after each epoch.
- The problem was identified as gradient explosion and was mitigated by introducing batch-norm layers; however, the oscillating loss persisted.
- Could not identify the modifications needed to produce color-correct frames with few artifacts.
- Outputs captured image structure and content but failed to reproduce the correct colors, leading to frames with overly bright hues.
- Localized patches of intense color appeared over the frames in seemingly random locations, producing unwanted artifacts.
- The model is inefficient on any video in practice: for example, a 7-second video takes 12 minutes to upscale, even on a GPU.
- Designing a first draft of the website to learn the basics of HTML and CSS
- Researching BasicVSR++
- Develop a seamless, interactive web interface to the model, using Flask as the framework alongside JavaScript and HTML/CSS
- Explore BasicVSR++ as an alternative to RBPN, given that RBPN is impractical for real-world use.
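The gradient explosion noted among the training issues above can be caught early by tracking the global L2 norm of the gradients at every step, alongside a check for non-finite losses (which would explain all-black outputs). The sketch below is framework-agnostic and assumes gradients are available as plain lists of floats; `check_step` and the `spike_factor` threshold of 10x the running mean are hypothetical choices for illustration, not the project's actual monitoring code.

```python
import math
from typing import Dict, List, Sequence


def global_grad_norm(grads: Sequence[Sequence[float]]) -> float:
    """L2 norm of all gradients treated as one flattened vector."""
    return math.sqrt(sum(g * g for layer in grads for g in layer))


def check_step(loss: float, grads: Sequence[Sequence[float]],
               history: List[float], spike_factor: float = 10.0) -> Dict[str, bool]:
    """Flag non-finite values and sudden gradient-norm spikes for one step.

    `spike_factor` is a hypothetical threshold: a step counts as a spike when
    its norm exceeds `spike_factor` times the mean of earlier accepted norms.
    """
    norm = global_grad_norm(grads)
    finite = math.isfinite(loss) and math.isfinite(norm)
    mean_norm = sum(history) / len(history) if history else None
    spike = mean_norm is not None and norm > spike_factor * mean_norm
    if finite and not spike:
        history.append(norm)  # only finite, non-spiking norms update the baseline
    return {"non_finite": not finite, "spike": spike}


history: List[float] = []
r1 = check_step(0.50, [[0.1, -0.2], [0.05]], history)     # ordinary step
r2 = check_step(0.60, [[30.0, -40.0], [0.0]], history)    # norm jumps to 50
r3 = check_step(float("nan"), [[float("inf")]], history)  # training has blown up
print(r1, r2, r3)
```

In PyTorch, `torch.nn.utils.clip_grad_norm_` returns this same total norm over a model's parameters, so its return value could feed a check like this one; note that it also clips the gradients, which is a different remedy from the batch normalization used here.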
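The 12-minute runtime reported for a 7-second clip can be broken down with some back-of-envelope arithmetic. The clip's frame rate is not stated, so the sketch assumes a hypothetical 30 fps; `vsr_cost` is an illustrative helper, and the per-frame counts follow from the 6-neighbor recurrence described for RBPN, under which each output frame needs 6 dense flow fields and 6 projection steps.

```python
def vsr_cost(duration_s: float, runtime_s: float,
             fps: float = 30.0, neighbors: int = 6) -> dict:
    """Back-of-envelope cost of upscaling a clip frame by frame.

    fps=30 is an assumed frame rate; neighbors=6 follows the RBPN setup above.
    """
    frames = round(duration_s * fps)
    return {
        "frames": frames,
        "seconds_per_frame": runtime_s / frames,
        "slowdown_vs_realtime": runtime_s / duration_s,
        # One dense flow field and one recurrent projection step per neighbor.
        "flow_fields": frames * neighbors,
        "projection_steps": frames * neighbors,
    }


cost = vsr_cost(duration_s=7, runtime_s=12 * 60)
print(f"{cost['frames']} frames, {cost['seconds_per_frame']:.2f} s/frame, "
      f"{cost['slowdown_vs_realtime']:.0f}x slower than real time, "
      f"{cost['flow_fields']} flow fields")
# → 210 frames, 3.43 s/frame, 103x slower than real time, 1260 flow fields
```

Changing the assumed frame rate shifts the split between frame count and per-frame time, but the roughly 103x slowdown relative to real time depends only on the two measured numbers.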