Lark API is a speech assessment REST API built using NextJS in TypeScript.
It provides accuracy scores, speech-to-text transcription, and a projected IELTS pronunciation band.
It lets English-learning apps and websites assess users’ pronunciation and give real-time feedback on it.
How does it work?
The Machine Learning part:
Lark uses the Wav2Vec2 model from Meta to analyze the speech sample.
It converts the speech to its phonetic transcription (S2P) using zero-shot cross-lingual recognition.
After recognizing the phonetics of the speech, Lark compares the recognized transcription with the ideal pronunciation of the transcribed text using the Jaro-Winkler string similarity algorithm.
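To make the comparison step concrete, here is a minimal sketch of Jaro-Winkler similarity in TypeScript. It is a generic textbook version, not necessarily the code Lark runs. The function names `jaro` and `jaroWinkler`, the prefix scale of 0.1, and the 4-character prefix cap are the conventional defaults and are assumptions here.

```typescript
// Jaro similarity: rewards characters that match within a sliding window
// and penalises matched characters that appear in a different order.
function jaro(s1: string, s2: string): number {
  // Split by code point so multi-unit symbols are compared as single characters.
  const a = Array.from(s1);
  const b = Array.from(s2);
  if (a.length === 0 || b.length === 0) {
    return a.length === b.length ? 1 : 0;
  }

  const window = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched: boolean[] = new Array(a.length).fill(false);
  const bMatched: boolean[] = new Array(b.length).fill(false);

  // Pass 1: count characters of `a` that have an unused match in `b` within the window.
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const lo = Math.max(0, i - window);
    const hi = Math.min(b.length - 1, i + window);
    for (let j = lo; j <= hi; j++) {
      if (!bMatched[j] && a[i] === b[j]) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;

  // Pass 2: walk both match sequences in order and count out-of-order pairs.
  let k = 0;
  let outOfOrder = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a[i] !== b[k]) outOfOrder++;
    k++;
  }
  const t = outOfOrder / 2; // transpositions

  return (matches / a.length + matches / b.length + (matches - t) / matches) / 3;
}

// Jaro-Winkler: boosts the Jaro score for strings sharing a common prefix.
// `prefixScale` (assumed default 0.1) should not exceed 0.25.
function jaroWinkler(s1: string, s2: string, prefixScale = 0.1): number {
  const j = jaro(s1, s2);
  const a = Array.from(s1);
  const b = Array.from(s2);
  const maxPrefix = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < maxPrefix && a[prefix] === b[prefix]) prefix++;
  return j + prefix * prefixScale * (1 - j);
}

// Classic reference pair from the literature.
console.log(jaroWinkler("MARTHA", "MARHTA").toFixed(4)); // prints "0.9611"
```

In Lark's setting the two inputs would be the recognized phonetic string and the ideal phonetic string. A score of 1 means they are identical, and lower scores indicate more mispronounced or misplaced sounds.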
The Backend API part:
The API is written entirely in NextJS using the Pages Router.
I used next-auth for user authentication via GitHub and for persisting sessions.
I used Redis to rate-limit the API based on the caller's IP address.
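As an illustration of IP-based rate limiting, the sketch below uses a fixed-window counter, which is a common pattern built on Redis's INCR and EXPIRE commands. The source does not say which algorithm Lark uses, so the fixed window is an assumption. The names `CounterStore`, `MemoryStore`, `isAllowed`, the `ratelimit:` key prefix, and the default limits are all hypothetical. `MemoryStore` is an in-memory stand-in so the example runs without a Redis server. In a real deployment a Redis client would sit behind the same interface.

```typescript
// Minimal subset of Redis-like operations the limiter needs (hypothetical interface).
interface CounterStore {
  incr(key: string): Promise<number>;                  // like Redis INCR
  expire(key: string, seconds: number): Promise<void>; // like Redis EXPIRE
}

// In-memory stand-in for Redis, for demonstration only.
class MemoryStore implements CounterStore {
  private counts = new Map<string, { value: number; expiresAt: number }>();
  private now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  async incr(key: string): Promise<number> {
    const entry = this.counts.get(key);
    // A missing or expired key starts again from 1, with no expiry set yet.
    if (!entry || entry.expiresAt <= this.now()) {
      this.counts.set(key, { value: 1, expiresAt: Infinity });
      return 1;
    }
    entry.value += 1;
    return entry.value;
  }

  async expire(key: string, seconds: number): Promise<void> {
    const entry = this.counts.get(key);
    if (entry) entry.expiresAt = this.now() + seconds * 1000;
  }
}

// Fixed-window limiter: allow at most `limit` calls per IP per window.
async function isAllowed(
  store: CounterStore,
  ip: string,
  limit = 10,
  windowSeconds = 60
): Promise<boolean> {
  const key = `ratelimit:${ip}`;
  const count = await store.incr(key);
  // The first call in a window starts the countdown for that window.
  if (count === 1) await store.expire(key, windowSeconds);
  return count <= limit;
}
```

An API route handler would call `isAllowed` with the caller's IP before doing any work and respond with HTTP 429 when it returns `false`. A fixed window is simple but lets bursts through at window boundaries. A sliding window or token bucket smooths that out at the cost of more state.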
The Frontend part:
The frontend is written using NextJS in TypeScript.
I opted for TailwindCSS as the CSS framework for this project.
I used Material UI for the tables and icons.
The Database part:
I used the Prisma ORM on top of a PlanetScale database, which is a serverless MySQL database.