Automatic Speech Recognition (ASR) systems, with their wide range of applications across various fields, have significantly improved people's lives. From virtual assistants such as Apple's Siri and Amazon's Alexa to voice commands on mobile phones and automated online job interviews that screen applicants with speech recognition, their impact is notable. However, ASR's effectiveness varies across groups of speakers. Research has shown that ASR systems make more errors when transcribing speech from African Americans and other English-speaking Africans than from white speakers. This bias in speech recognition can have a detrimental impact on Afro-English speakers, raising doubts and concerns about identity, race, and fairness. Because such bias against African-accented English exists in ASR, the same issue may also affect other minority groups and speakers with non-native English accents. It is therefore important that more researchers investigate and work to resolve this issue.
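Such error disparities are conventionally quantified with word error rate (WER), the word-level edit distance between a reference transcript and the ASR hypothesis, normalized by reference length. A minimal, self-contained sketch (assuming whitespace tokenization and no text normalization; a real evaluation would also normalize casing and punctuation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance between the word
    sequences, divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the")
# over a six-word reference gives WER = 2/6.
print(wer("the cat sat on the mat", "the cat sit on mat"))
```

Comparing average WER on speech from different demographic groups is the standard way the literature establishes the bias discussed above.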
Our research will focus on the bias present in ASR, specifically the acoustic differences between English spoken by African Americans and other African speakers and English spoken by white speakers, and on a potential way to reduce the bias: fine-tuning OpenAI's Whisper ASR system. We will primarily use two datasets. The first is the LibriSpeech ASR corpus, which contains 1,000 hours of English speech read mostly by white speakers. The second is the AfriSpeech-200 dataset from HuggingFace, from which we use the English/American and South African accent subsets. By utilizing these two datasets, we aim to delve into the nuanced speech patterns and vernacular specific to Afro-English speakers and to evaluate a potential method for resolving the issue.