Giter VIP home page Giter VIP logo

challenge-condition-fer-dataset's Introduction

I have uploaded the occlusion- and pose-RAFDB list, you can see at RAF_DB_dir. Thank you for your kindly waiting.

Our manuscript has been accepted by Transactions on Image Processing as a REGULAR paper! link

Region Attention Networks for Pose and Occlusion Robust Facial Expression Recognition

                              Kai Wang, Xiaojiang Peng, Jianfei Yang, Debin Meng, and Yu Qiao
                          Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
                                     {kai.wang, xj.peng, db.meng, yu.qiao}@siat.ac.cn

image

Abstract

Occlusion and pose variations, which can change facial appearance significantly, are among two major obstacles for automatic Facial Expression Recognition (FER). Though automatic FER has made substantial progresses in the past few decades, occlusion-robust and pose-invariant issues of FER have received relatively less attention, especially in real-world scenarios.Thispaperaddressesthereal-worldposeandocclusionrobust FER problem with three-fold contributions. First, to stimulate the research of FER under real-world occlusions and variant poses, we build several in-the-wild facial expression datasets with manual annotations for the community. Second, we propose a novel Region Attention Network (RAN), to adaptively capture the importance of facial regions for occlusion and pose variant FER. The RAN aggregates and embeds varied number of region features produced by a backbone convolutional neural network into a compact fixed-length representation. Last, inspired by the fact that facial expressions are mainly defined by facial action units, we propose a region biased loss to encourage high attentionweightsforthemostimportantregions.Weexamineour RAN and region biased loss on both our built test datasets and four popular datasets: FERPlus, AffectNet, RAF-DB, and SFEW. Extensive experiments show that our RAN and region biased loss largely improve the performance of FER with occlusion and variant pose. Our methods also achieve state-of-the-art results on FERPlus, AffectNet, RAF-DB, and SFEW.

image

Region Attention Network

we propose the Region Attention Network (RAN), to capture the importance of facial regions for occlusion and pose robust FER. The RAN is comprised of a feature extraction module, a self-attention module, and a relation attention module. The proposed RAN mainly consists of two stages. The first stage is to coarsely calculate the importance of each region by a FC layer conducted on its own feature, which is called self-attention module. The second stage seeks to find more accurate attention weights by modeling the relation between the region features and the aggregated content representation from the first stage, which is called relation-attention module. The latter two modules aim to learn coarse attention weights and refine them with global context, respectively. Given a number of facial regions, our RAN learns attention weights for each region in an end-to-end manner, and aggregates their CNN-based features into a compact fixed-length representation. Besides, the RAN model has two auxiliary effects on the face images. On one hand, cropping regions can enlarge the training data which is important for those insufficient challenging samples. On the other hand, rescaling the regions to the size of original images highlights fine-grain facial features.

Region Biased Loss

Inspired by the observation that different facial expressions are mainly defined by different facial regions, we make a straightforward constraint on the attention weights of self-attention, i.e. region biased loss (RB-Loss). This constraint enforces that one of the attention weights from facial crops should be larger than the original face image with a margin. Formally, the RB-Loss is defined as, where is a hyper-parameter served as a margin, is the attention weight of the copy face image, denotes the maximum weight of all facial crops.

Region Generation

Confused Metrics

The confusion matrices of baseline methods and our RAN on the Occlusion- and Pose-FERPlus test sets.

The confusion matrices of baseline methods and our RAN on the Occlusion- and Pose-AffectNet test sets.

What is learned for occlusion and pose variant faces?

Illustration of learned attention weights for different regions along with origianl faces. denotes the softmax function. Red-filled boxes indicate the highest weights while blue-filled ones are the lowest weights. From left to right, the columns represent the original faces, regions to .Note that the left and right figures show the weights by use the PBLoss or not respectively.

Comparison with the state-of-the-art methods

We compare our best results to several stateof-the-art methods on FERPlus, AffectNet, SFEW, and RAFDB.

Our collected datasets and state-of-the-art models

You can find occlusion dataset(ferplusocclusion, affectnetocclusion), pose(>30)(ferpluspose30, affectnetpose30) and pose(>45)(ferpluspose45, affectnetpose45) list.

The state-of-the-art models will be update at this link

challenge-condition-fer-dataset's People

Contributors

kaiwang960112 avatar snehilsanyal avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.