React Speech service sample app

This sample shows how to integrate the Azure Speech service into a sample React application. It demonstrates design pattern examples for authentication token exchange and management, as well as capturing audio from a microphone or file for speech-to-text conversions.

Prerequisites

  1. This article assumes that you have an Azure account and Speech service subscription. If you don't have an account and subscription, try the Speech service for free.
  2. Ensure you have Node.js installed.

How to run the app

  1. Clone this repo, then change directory to the project root and run npm install to install dependencies.
  2. Add your Azure Speech key and region to the .env file, replacing the placeholder text (the expected format is sketched just after this list).
  3. To run the Express server and React app together, run npm run dev.
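
The .env file contains two entries; the placeholder values shown below are exactly what the Express server checks against before issuing a token:

SPEECH_KEY=paste-your-speech-key-here
SPEECH_REGION=paste-your-speech-region-here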

Change recognition language

To change the source recognition language, change the locale strings in App.js on lines 32 and 66, which set the speechRecognitionLanguage property on the SpeechConfig object.

speechConfig.speechRecognitionLanguage = 'en-US'

For a full list of supported locales, see the language support article.
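
For example, to recognize French instead, you would change both occurrences to:

speechConfig.speechRecognitionLanguage = 'fr-FR'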

Speech-to-text from microphone

To convert speech to text using a microphone, run the app and then click Convert speech to text from your mic. The browser prompts you for access to your microphone and then listens for you to speak. The following function, sttFromMic in App.js, contains the implementation.

async sttFromMic() {
    const tokenObj = await getTokenOrRefresh();
    const speechConfig = speechsdk.SpeechConfig.fromAuthorizationToken(tokenObj.authToken, tokenObj.region);
    speechConfig.speechRecognitionLanguage = 'en-US';

    // Capture audio from the default microphone.
    const audioConfig = speechsdk.AudioConfig.fromDefaultMicrophoneInput();
    const recognizer = new speechsdk.SpeechRecognizer(speechConfig, audioConfig);

    this.setState({
        displayText: 'speak into your microphone...'
    });

    // Recognize a single utterance, then surface the result (or an error) in the UI.
    recognizer.recognizeOnceAsync(result => {
        let displayText;
        if (result.reason === ResultReason.RecognizedSpeech) {
            displayText = `RECOGNIZED: Text=${result.text}`;
        } else {
            displayText = 'ERROR: Speech was cancelled or could not be recognized. Ensure your microphone is working properly.';
        }

        this.setState({
            displayText: displayText
        });
    });
}

To run speech-to-text from a microphone, you create an AudioConfig object from the default microphone input and use it with the recognizer.

const audioConfig = speechsdk.AudioConfig.fromDefaultMicrophoneInput();
const recognizer = new speechsdk.SpeechRecognizer(speechConfig, audioConfig);
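
The sample collapses every non-success outcome into a single error message. If you need to distinguish a no-match result from a cancellation (for example, an expired token), the SDK also exposes ResultReason.NoMatch, ResultReason.Canceled, and CancellationDetails. A minimal sketch, not part of the sample:

recognizer.recognizeOnceAsync(result => {
    let displayText;
    if (result.reason === speechsdk.ResultReason.RecognizedSpeech) {
        displayText = `RECOGNIZED: Text=${result.text}`;
    } else if (result.reason === speechsdk.ResultReason.NoMatch) {
        // The service heard audio but could not match it to speech.
        displayText = 'NOMATCH: Speech could not be recognized.';
    } else if (result.reason === speechsdk.ResultReason.Canceled) {
        // Cancellation covers network/auth failures, including expired tokens.
        const cancellation = speechsdk.CancellationDetails.fromResult(result);
        displayText = `CANCELED: Reason=${cancellation.reason} Details=${cancellation.errorDetails}`;
    }
    this.setState({ displayText });
});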

Speech-to-text from file

To convert speech to text from an audio file, run the app and then click Convert speech to text from an audio file. This opens a file browser where you can select an audio file. The following function, fileChange, is the handler bound to the file input's change event.

async fileChange(event) {
    // Grab the selected file directly off the input's event target.
    const audioFile = event.target.files[0];
    console.log(audioFile);
    const fileInfo = audioFile.name + ` size=${audioFile.size} bytes `;

    this.setState({
        displayText: fileInfo
    });

    const tokenObj = await getTokenOrRefresh();
    const speechConfig = speechsdk.SpeechConfig.fromAuthorizationToken(tokenObj.authToken, tokenObj.region);
    speechConfig.speechRecognitionLanguage = 'en-US';

    // Use the WAV file, rather than the microphone, as the audio source.
    const audioConfig = speechsdk.AudioConfig.fromWavFileInput(audioFile);
    const recognizer = new speechsdk.SpeechRecognizer(speechConfig, audioConfig);

    recognizer.recognizeOnceAsync(result => {
        let displayText;
        if (result.reason === ResultReason.RecognizedSpeech) {
            displayText = `RECOGNIZED: Text=${result.text}`;
        } else {
            displayText = 'ERROR: Speech was cancelled or could not be recognized. Ensure the audio file is a valid WAV file.';
        }

        this.setState({
            displayText: fileInfo + displayText
        });
    });
}

You need the audio file as a JavaScript File object, so you can grab it directly off the event target using const audioFile = event.target.files[0];. Next, you use the file to create the AudioConfig and then pass it to the recognizer.

const audioConfig = speechsdk.AudioConfig.fromWavFileInput(audioFile);
const recognizer = new speechsdk.SpeechRecognizer(speechConfig, audioConfig);
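
The fileChange handler must be bound to a file input element. The exact markup in the sample may differ; as a rough JSX sketch (the accept value is an assumption, since fromWavFileInput expects WAV data):

{/* Hypothetical markup -- the sample's actual JSX may differ. */}
<input type="file" accept=".wav" onChange={(event) => this.fileChange(event)} />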

Token exchange process

This sample application shows an example design pattern for retrieving and managing tokens, a common task when using the Speech JavaScript SDK in a browser environment. A simple Express back-end is implemented in the same project under server/index.js, which abstracts the token retrieval process.

The reason for this design is to prevent your speech key from being exposed on the front-end, since it can be used to make calls directly to your subscription. By using an ephemeral token, you are able to protect your speech key from being used directly. To get a token, you use the Speech REST API and make a call using your speech key and region. In the Express part of the app, this is implemented in index.js behind the endpoint /api/get-speech-token, which the front-end uses to get tokens.

app.get('/api/get-speech-token', async (req, res, next) => {
    res.setHeader('Content-Type', 'application/json');
    const speechKey = process.env.SPEECH_KEY;
    const speechRegion = process.env.SPEECH_REGION;

    // Guard against the placeholder values shipped in the sample .env file.
    if (speechKey === 'paste-your-speech-key-here' || speechRegion === 'paste-your-speech-region-here') {
        res.status(400).send('You forgot to add your speech key or region to the .env file.');
    } else {
        const headers = {
            headers: {
                'Ocp-Apim-Subscription-Key': speechKey,
                'Content-Type': 'application/x-www-form-urlencoded'
            }
        };

        try {
            // Exchange the speech key for a short-lived authorization token.
            const tokenResponse = await axios.post(`https://${speechRegion}.api.cognitive.microsoft.com/sts/v1.0/issueToken`, null, headers);
            res.send({ token: tokenResponse.data, region: speechRegion });
        } catch (err) {
            res.status(401).send('There was an error authorizing your speech key.');
        }
    }
});

In the request, you create an Ocp-Apim-Subscription-Key header and pass your speech key as the value. Then you make a request to the issueToken endpoint for your region, and an authorization token is returned. In a production application, this endpoint returning the token should be restricted by additional user authentication whenever possible.
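
For example, you might gate the route behind whatever session or JWT middleware your app already uses. A minimal sketch, where requireUser is a hypothetical stand-in for your real authentication check:

// requireUser is hypothetical -- substitute your app's real session or JWT check.
function requireUser(req, res, next) {
    if (!req.user) {
        return res.status(401).send('Not authenticated.');
    }
    next();
}

// Only authenticated users may exchange for a speech token.
app.get('/api/get-speech-token', requireUser, async (req, res, next) => {
    // ...token exchange as shown above...
});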

On the front-end, token_util.js contains the helper function getTokenOrRefresh that is used to manage the refresh and retrieval process.

export async function getTokenOrRefresh() {
    const cookie = new Cookie();
    const speechToken = cookie.get('speech-token');

    if (speechToken === undefined) {
        try {
            const res = await axios.get('/api/get-speech-token');
            const token = res.data.token;
            const region = res.data.region;
            // Cache the token as 'region:token' for just under its 10-minute lifetime.
            cookie.set('speech-token', region + ':' + token, {maxAge: 540, path: '/'});

            console.log('Token fetched from back-end: ' + token);
            return { authToken: token, region: region };
        } catch (err) {
            console.log(err.response.data);
            return { authToken: null, error: err.response.data };
        }
    } else {
        // Split the cached 'region:token' cookie back into its two parts.
        console.log('Token fetched from cookie: ' + speechToken);
        const idx = speechToken.indexOf(':');
        return { authToken: speechToken.slice(idx + 1), region: speechToken.slice(0, idx) };
    }
}

This function uses the universal-cookie library to store and retrieve the token in a cookie. It first checks for an existing cookie; if one is found, it returns the token without hitting the Express back-end. If there is no existing token cookie, it calls /api/get-speech-token to fetch a new one. Since both the token and its corresponding region are needed later, the cookie is stored in the format region:token and split back into the two values upon retrieval.

Tokens for the service expire after 10 minutes, so the sample uses the maxAge property of the cookie as the trigger for when a new token needs to be generated. It is recommended to use 9 minutes as the expiry time to leave a one-minute buffer, so maxAge is set to 540 seconds.

In App.js you use getTokenOrRefresh in the functions for speech-to-text from a microphone and from a file. Finally, you use the SpeechConfig.fromAuthorizationToken function to create an auth context from the token.

const tokenObj = await getTokenOrRefresh();
const speechConfig = speechsdk.SpeechConfig.fromAuthorizationToken(tokenObj.authToken, tokenObj.region);

In many other Speech service samples, you will see SpeechConfig.fromSubscription used instead of SpeechConfig.fromAuthorizationToken. By avoiding fromSubscription on the front-end, you prevent your speech subscription key from being exposed, relying on the token authentication process instead. fromSubscription is safe in a Node.js environment, or in other Speech SDK programming languages when the call is made on a back-end, but it is best avoided in browser-based JavaScript.
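
A minimal side-by-side sketch of the two factory functions (variable names here are illustrative):

// Back-end only (Node.js): the subscription key is embedded in the call,
// so this must never ship to a browser.
const backendConfig = speechsdk.SpeechConfig.fromSubscription(speechKey, speechRegion);

// Browser-safe: only the short-lived token from getTokenOrRefresh is exposed.
const browserConfig = speechsdk.SpeechConfig.fromAuthorizationToken(tokenObj.authToken, tokenObj.region);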

azurespeechreactsample's Issues

Error [ERR_PACKAGE_PATH_NOT_EXPORTED]: Package subpath './lib/tokenize' is not defined by "exports"

Please provide us with the following information:

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Download the source code, npm install and then npm run dev - as per the instructions

Any log messages given by the failure

Error [ERR_PACKAGE_PATH_NOT_EXPORTED]: Package subpath './lib/tokenize' is not defined by "exports"

Expected/desired behavior

That it works

OS and Version?

Win 11

Versions

Build 22621


Does not function

Please provide us with the following information:

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [x] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

follow the steps given

Any log messages given by the failure

Invalid options object. Dev Server has been initialized using an options object that does not match the API schema.

  • options.allowedHosts[0] should be a non-empty string.

Expected/desired behavior

That it works

OS and Version?

Windows 11

Versions

Mention any other details that might be useful

Deleting the proxy line in package.json or modifying it to

"options": {
    "allowedHosts": ["localhost", ".localhost"],
    "proxy": "http://localhost:3001"
  }

opens the webpage but returns the error

FATAL_ERROR: <!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8"> <title>Error</title> </head> <body> <pre>Cannot GET /api/get-speech-token</pre> </body> </html> 

or modifying the env file instead to include

DANGEROUSLY_DISABLE_HOST_CHECK=true

Opens the webpage but instead returns

FATAL_ERROR: Proxy error: Could not proxy request /api/get-speech-token from localhost:3000 to http://localhost:3001/ (ECONNREFUSED).


Problem with custom endpoint when auto detecting the language

This issue is for a:

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

I'm trying to use an autoDetectConfig with a custom model endpoint. I've modified sttFromMic in src/App.js as follows:

async sttFromMic() {
    const tokenObj = await getTokenOrRefresh();
    const speechConfig = speechsdk.SpeechConfig.fromAuthorizationToken(tokenObj.authToken, tokenObj.region);

    var enLanguageConfig = speechsdk.SourceLanguageConfig.fromLanguage("en-US");
    var frLanguageConfig = speechsdk.SourceLanguageConfig.fromLanguage("fr-FR", "b9a605f6-0a51-4ffa-9bda-c9ca9e951cb2");
    var autoDetectConfig = speechsdk.AutoDetectSourceLanguageConfig.fromSourceLanguageConfigs([enLanguageConfig, frLanguageConfig]);

    const audioConfig = speechsdk.AudioConfig.fromDefaultMicrophoneInput();
    const recognizer = speechsdk.SpeechRecognizer.FromConfig(speechConfig, autoDetectConfig, audioConfig);

    this.setState({
        displayText: 'speak into your microphone...'
    });

    recognizer.recognizeOnceAsync(result => {
        let displayText;
        if (result.reason === ResultReason.RecognizedSpeech) {
            displayText = `RECOGNIZED: Text=${result.text}`;
        } else {
            displayText = 'ERROR: Speech was cancelled or could not be recognized. Ensure your microphone is working properly.';
        }

        this.setState({
            displayText: displayText
        });
    });
}

But it does not seem to use my custom model in French: I get the standard transcription, not the one from my custom model, and I don't see the corresponding logs in my custom model. I tried in Python with the same auth token and it works (I get the correct transcription):

import azure.cognitiveservices.speech as speechsdk

def from_mic():
    en_language_config = speechsdk.languageconfig.SourceLanguageConfig("en-US")
    fr_language_config = speechsdk.languageconfig.SourceLanguageConfig("fr-FR", 'b9a605f6-0a51-4ffa-9bda-c9ca9e951cb2')
    auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(sourceLanguageConfigs=[en_language_config, fr_language_config])
    #speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    speech_config = speechsdk.SpeechConfig(auth_token=token, region=service_region)
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, auto_detect_source_language_config=auto_detect_source_language_config)
    
    print("Speak into your microphone.")
    result = speech_recognizer.recognize_once_async().get()
    print(result.text)

from_mic()

Any log messages given by the failure

I don't get any log or error

Expected/desired behavior

When detecting french it should use my custom model as described in https://docs.microsoft.com/fr-fr/azure/cognitive-services/speech-service/how-to-automatic-language-detection?pivots=programming-language-javascript

OS and Version?

Mac OS Big Sur

Versions

"microsoft-cognitiveservices-speech-sdk": "^1.17.0"

Trigger recognized event based on duration

I have an issue regarding the recognizing and recognized events. When a speaker speaks for a long time without interruption, the recognized event does not get triggered for a long time, and a lot of recognizing text is displayed and corrected, which makes it hard for the user to read. I would like a faster transition from the recognizing to the recognized event. Is there any way in the SDK to set a limit, so that, say, after 5 seconds of speech it automatically triggers the recognized event?

CODE:

async function sttFromMic() {
    const tokenObj = await getTokenOrRefresh();
    const speechConfig = speechsdk.SpeechConfig.fromAuthorizationToken(tokenObj.authToken, tokenObj.region);
    speechConfig.speechRecognitionLanguage = 'en-US';
    const audioConfig = speechsdk.AudioConfig.fromDefaultMicrophoneInput();
    const recognizer = new speechsdk.SpeechRecognizer(speechConfig, audioConfig);

    setDisplayText('speak into your microphone...');

    recognizer.recognizing = (s, e) => {
        setDisplayText(`RECOGNIZING: Text=${e.result.text}`);
    };

    recognizer.recognizeOnceAsync(result => {
        if (result.reason === speechsdk.ResultReason.RecognizedSpeech) {
            setDisplayText(`RECOGNIZED: Text=${result.text}`);
        } else {
            setDisplayText('ERROR: Speech was cancelled or could not be recognized. Ensure your microphone is working properly.');
        }
    });
}
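
One knob that may help here, depending on SDK version, is the segmentation silence timeout. Treat this as an assumption to verify against your SDK version; it shortens the silence that closes a phrase rather than forcing a recognized event mid-speech:

// Assumption: PropertyId.Speech_SegmentationSilenceTimeoutMs exists in your SDK version.
// It shortens the trailing silence (in ms) that closes a phrase; it does not
// force a recognized event during uninterrupted speech.
speechConfig.setProperty(speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, '500');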

SpeechRecognizer Continuous Recognition Displaying "the object is already disposed" error when run on localhost:3000 [Error appears on the normal browser but works fine on Incognito mode]

Please provide us with the following information:

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

  1. Clone [AzureSpeechReactSample] code with continuous recognition example
  2. Replace SPEECH_KEY and SPEECH_REGION with appropriate key and region
  3. npm i
  4. npm run dev
  5. Open browser localhost:3000
  6. Get object is already disposed error

Expected/desired behavior

Object is already disposed error on http://localhost:3000

OS and Version?

Windows 10
Cognitive Services Speech SDK 1.32.0

Mention any other details that might be useful

Error screen: (screenshot showing the "object is already disposed" error)


