Based on the multi-view CNN method (http://vis-www.cs.umass.edu/mvcnn/).
Extract a 2048-dimensional feature for each view with a pretrained ResNet-50.
Extract a feature for each ring (square rings, circular rings). Each ring contains multiple views, whose features are fused with one of several fusion strategies: fully connected, LSTM, or average weighting.
Infer an attention-like score for each ring: the probability that the ring is correctly classified by the ring classifier.
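
The steps above can be illustrated with a minimal sketch in plain Python. It covers only the average-weighting fusion and one possible use of the ring scores. The function names (`fuse_views_average`, `combine_rings`), the toy 4-dimensional features (standing in for the 2048-dimensional ResNet-50 features), and the softmax-weighted combination of ring predictions are all assumptions made for illustration; the notes above do not state how the ring scores are actually combined.

```python
import math


def fuse_views_average(view_features):
    """Average per-view feature vectors into a single ring feature."""
    n_views = len(view_features)
    dim = len(view_features[0])
    return [sum(v[d] for v in view_features) / n_views for d in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_rings(ring_class_probs, ring_scores):
    """Weight each ring's class probabilities by its normalized score.

    Assumption: ring scores are softmax-normalized and used as weights.
    """
    weights = softmax(ring_scores)
    n_classes = len(ring_class_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, ring_class_probs))
        for c in range(n_classes)
    ]


# Toy ring with two views (4-d features instead of 2048-d).
ring_views = [[1.0, 0.0, 2.0, 4.0], [3.0, 2.0, 0.0, 4.0]]
ring_feature = fuse_views_average(ring_views)

# Hypothetical outputs of a ring classifier for two rings, two classes,
# plus one attention-like score per ring.
ring_class_probs = [[0.9, 0.1], [0.2, 0.8]]
ring_scores = [2.0, 0.0]
final_probs = combine_rings(ring_class_probs, ring_scores)
predicted_class = max(range(len(final_probs)), key=final_probs.__getitem__)
```

Here the first ring has the higher score, so its prediction dominates the combined result. The fully connected and LSTM fusion strategies would replace `fuse_views_average` with a learned module.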