
aesrc2020's Introduction

Speech Accent Identification Network (Keras)

For Interspeech2020 Accented English Speech Recognition Challenges 2020 (AESRC2020)

Author: Ephemeroptera
Date: 2020-09-25
Keywords: e2e, resnet, crnns, bigru, netvlad, cosface, arcface, circle-loss
Warning: The "fbank" acoustic features in this version are generated with the "SoundFile" package, whereas the fbank features in the paper were generated by "Kaldi" with the "kaldiio" package.
1. Abstract

Accent recognition with a deep learning framework is similar to deep speaker identification: both are expected to give the input speech an identifiable representation. Compared with the individual-level features learned by a speaker identification network, deep accent recognition poses a harder problem: forging group-level accent features shared across speakers. In this paper, we borrow and improve the deep speaker identification framework to recognize accents. In detail, we adopt a Convolutional Recurrent Neural Network as the front-end encoder and integrate local features using a Recurrent Neural Network to form an utterance-level accent representation. To address overfitting, we add a Connectionist Temporal Classification (CTC) based speech recognition auxiliary task during training. To discriminate ambiguous accents, we introduce several powerful discriminative loss functions from face recognition to enhance the discriminative power of the accent features. We show that our proposed network with the discriminative training method (without data augmentation) is significantly ahead of the baseline system on the accent classification track of the Accented English Speech Recognition Challenge 2020, where Circle-Loss achieves the best discriminative optimization of the accent representation.

(The baseline code provided by AESRC2020 is available at: https://github.com/R1ckShi/AESRC2020)

2. Environment
conda install cudatoolkit=10.0
conda install cudnn=7.6.5
conda install tensorflow-gpu=1.13.1
conda install keras
pip install keras_layer_normalization
3. Framework

We adopt a CRNN-based front-end encoder, a CTC-based ASR branch, and an accent recognition (AR) branch that packages feature integration, discriminative losses, and a softmax-based classifier: [figure: network framework]
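
To make "feature integration" concrete, here is a minimal pure-Python sketch of two ways to turn frame-level encoder outputs into one utterance-level vector: average pooling and NetVLAD-style pooling. It is not the repository's Keras layer. The function names, the toy dimensions, and the parameters `centres`, `assign_w`, and `assign_b` (which a real network would learn) are assumptions made for illustration.

```python
import math


def average_pool(frames):
    """Mean over time: T frame vectors of size D -> one D-dimensional vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def netvlad_pool(frames, centres, assign_w, assign_b):
    """NetVLAD-style pooling: soft-assign frames to K clusters, sum residuals.

    centres, assign_w: K vectors of size D; assign_b: K biases.
    All three are hypothetical stand-ins for learned parameters.
    Returns a unit-norm vector of size K * D.
    """
    num_clusters, dim = len(centres), len(centres[0])
    vlad = [[0.0] * dim for _ in range(num_clusters)]
    for x in frames:
        # Soft assignment of this frame to every cluster.
        scores = [
            sum(w * xi for w, xi in zip(assign_w[k], x)) + assign_b[k]
            for k in range(num_clusters)
        ]
        weights = softmax(scores)
        # Accumulate the weighted residual to each cluster centre.
        for k in range(num_clusters):
            for d in range(dim):
                vlad[k][d] += weights[k] * (x[d] - centres[k][d])
    # Normalize each cluster's residual, flatten, then normalize again.
    flat = [v for row in vlad for v in l2_normalize(row)]
    return l2_normalize(flat)


frames = [[1.0, 0.0], [3.0, 2.0]]  # T=2 frames, D=2 features
pooled = average_pool(frames)
vlad = netvlad_pool(
    frames,
    centres=[[0.0, 0.0], [2.0, 2.0]],
    assign_w=[[1.0, 0.0], [0.0, 1.0]],
    assign_b=[0.0, 0.0],
)
```

Average pooling keeps the feature dimension D, while NetVLAD-style pooling returns K * D values. GhostVLAD follows the same pattern but adds extra "ghost" clusters whose residuals are discarded before flattening.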

Specifically, the code provides the following configurations and options:

<Shared CRNNs encoder>: ResNet + Bi-GRU
<Feature Integration>: (1) Avg-Pooling (2) Bi-GRU (3) NetVLAD (4) GhostVLAD
<Discriminative Losses>: (1) Softmax (2) SphereFace (3) CosFace (4) ArcFace (5) Circle-Loss
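
The margin-based losses listed above differ mainly in how they reshape the logits before an ordinary softmax cross-entropy. The sketch below illustrates this for plain softmax, CosFace, ArcFace, and the class-level form of Circle-Loss, starting from cosine similarities between an embedding and each class weight. It is a simplified illustration and not the repository's Keras implementation. The function names and the `scale` and `margin` values are assumptions. SphereFace, which uses a multiplicative angular margin, is omitted for brevity, and the ArcFace branch skips the handling that full implementations add for angles where theta + margin exceeds pi.

```python
import math


def margin_logits(cosines, target, kind="cosface", scale=30.0, margin=0.35):
    """Map per-class cosine similarities to logits under a given loss type.

    cosines: cos(theta_j) between the embedding and each class weight.
    target:  index of the true class.
    scale, margin: illustrative hyperparameters, not the authors' settings.
    """
    logits = []
    for j, c in enumerate(cosines):
        is_target = j == target
        if kind == "softmax":
            logit = scale * c
        elif kind == "cosface":
            # Additive cosine margin on the target class only.
            logit = scale * (c - margin) if is_target else scale * c
        elif kind == "arcface":
            # Additive angular margin on the target class only.
            if is_target:
                theta = math.acos(max(-1.0, min(1.0, c)))
                logit = scale * math.cos(theta + margin)
            else:
                logit = scale * c
        elif kind == "circle":
            # Circle-Loss: each similarity gets its own adaptive weight.
            if is_target:
                alpha = max(0.0, 1.0 + margin - c)
                logit = scale * alpha * (c - (1.0 - margin))
            else:
                alpha = max(0.0, c + margin)
                logit = scale * alpha * (c - margin)
        else:
            raise ValueError("unknown loss kind: " + kind)
        logits.append(logit)
    return logits


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example (log-sum-exp for stability)."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[target]


cosines = [0.8, 0.1, -0.2]  # toy similarities; class 0 is the true accent
plain = margin_logits(cosines, 0, "softmax", scale=10.0)
cosface = margin_logits(cosines, 0, "cosface", scale=10.0, margin=0.3)
arcface = margin_logits(cosines, 0, "arcface", scale=10.0, margin=0.3)
circle = margin_logits(cosines, 0, "circle", scale=10.0, margin=0.25)
```

For the same embedding, every margin variant lowers the target logit relative to plain softmax, so the loss stays higher until the embedding moves closer to its class weight. This is the mechanism that pushes accent features of the same class together.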
4. Accented Speech Data

DataTang provides participants with a total of 160 hours of English data collected from eight countries:

Chinese (CHN)
Indian (IND)
Japanese (JPN)
Korean (KR)
American (US)
British (UK)
Portuguese (PT)
Russian (RU)

Each accent has about 20 hours of data. The detailed distribution of utterances and speakers (U/S) per accent is: [figure: utterance/speaker distribution per accent]

5. Results
5.1 Accent Recognition

The experimental results are divided into two parts according to whether the ASR pretraining task is used to initialize the encoder. Within each part, we compare different integration methods and discriminative losses. Circle-Loss achieves the best discriminative optimization: [figure: accent recognition results]

The detailed accuracy for each accent under Circle-Loss is: [figure: per-accent accuracy]
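
For readers who want to compute a similar per-accent breakdown from their own predictions, a minimal sketch follows. The accent codes come from the list in Section 4. The function name and the toy labels are hypothetical, and the repository may report accuracy differently.

```python
from collections import Counter

# Accent codes as listed in Section 4.
ACCENTS = ["CHN", "IND", "JPN", "KR", "US", "UK", "PT", "RU"]


def per_accent_accuracy(true_labels, predicted_labels):
    """Return {accent: accuracy} for every accent present in true_labels."""
    total = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    return {a: correct[a] / total[a] for a in ACCENTS if total[a]}


# Toy example with four utterances (made-up labels, not real results).
true_labels = ["CHN", "CHN", "US", "UK"]
predicted_labels = ["CHN", "IND", "US", "US"]
accuracy = per_accent_accuracy(true_labels, predicted_labels)
```

Accents with no utterances in `true_labels` are left out of the result, which avoids a division by zero.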

5.2 Visualization of Accent Feature Embeddings

To better demonstrate the discriminative optimization effect of the different losses on accent features, we compress the accent features into a 2D/3D feature space. The first and second rows show the accent features on the train set and dev set, respectively.

(1) Softmax and CosFace (2D) [figure]

(2) ArcFace (2D) [figure]

(3) Softmax, CosFace, ArcFace, Circle-Loss (3D) [figure]

Welcome to fork and star ~

aesrc2020's People

Contributors

coolephemeroptera

aesrc2020's Issues

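Speech recognition and accent classification code

Hello, I ran your training code and eventually obtained a trained model. However, I could not find the code for accent classification and speech recognition. Is there code for this part?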

y_accent_accuracy and y_disc_accuracy?

Hello! I have one question about the experimental result.
What's the meaning of the y_accent_accuracy and y_disc_accuracy?
In training, my y_accent_accuracy has no change, but y_disc_accuracy is improving.
I hope I can get your reply about this and how to use test set to get test accuracy.
Looking forward to hearing from you. Thanks!

Can you share the open dataset Accent160 ?

Dear authors,
Can you share the open dataset Accent160 with me?
I cannot find it on the internet.
I promise it will be used only for research purposes.
My email address is: [email protected]
Thank you very much for your kind consideration and I am looking forward to your early reply.

AESRC2020 Dataset Access

Thanks a lot for this project. We would like to try our model on this dataset and compare to results by other teams. Is this dataset (AESRC2020) available anywhere? The Interspeech2020 paper claims that the dataset will be open ("open datasets, baselines").

replicate the program

I want to replicate the program on another dataset, but many details are missing. Has anyone managed to run this program end to end?

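Runtime environment requirements

Hello, could you tell me the environment you ran this in, for example the TF and Python versions?
Also, could you describe the format of /disc1/AESRC2020/src/AESRC2020/NATION.TXT and /disc1/AESRC2020/data/aesrc.trans.scp?
Thank you.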
