Audio Analysis

This application is a Starter Kit (SK) that is designed to get you up and running quickly with a common industry pattern, and to provide information about best practices around Watson services. The Audio Analysis application was created to highlight the combination of the Speech to Text (STT) and AlchemyLanguage services as an Audio Analysis tool. This application can serve as the basis for your own applications that follow that pattern.

Demo.

Note: This sample application only works on desktop computer systems, and then only in the Firefox and Chrome web browsers.

How this app works
Getting started
About the Audio Analysis pattern
User interface in this sample application
Troubleshooting

How this app works

The Audio Analysis application extracts concepts from YouTube videos.

To begin, select or specify a YouTube video. As the video streams, the Speech to Text service transcribes its audio track. That text is then piped to the AlchemyLanguage service, which extracts concepts from the transcription, each with an associated score.
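The flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the application's own code: `joinTranscript` and `topConcepts` are hypothetical helper names, the `{ text, relevance }` result shape is assumed for the example, and the 0.5 cutoff is arbitrary. The fragments stand in for text that the Speech to Text service would deliver as the video streams.

```javascript
// Hypothetical helper: joins finalized transcript fragments into a single
// string that could be sent for concept extraction.
function joinTranscript(fragments) {
  return fragments
    .map((fragment) => fragment.trim())
    .filter((fragment) => fragment.length > 0)
    .join(' ');
}

// Hypothetical helper: keeps concepts whose score meets a threshold and
// sorts them from highest to lowest score.
// Assumption: each concept looks like { text, relevance }, where relevance
// may arrive as a number or a numeric string.
function topConcepts(concepts, minRelevance = 0.5) {
  return concepts
    .map((concept) => ({
      text: concept.text,
      relevance: Number(concept.relevance),
    }))
    .filter((concept) => concept.relevance >= minRelevance)
    .sort((a, b) => b.relevance - a.relevance);
}
```

In a real application the concept list would come from the AlchemyLanguage response rather than from a literal array.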

Getting started

Clone the repository onto your computer.

git clone https://github.com/watson-developer-cloud/audio-analysis.git

Sign up in Bluemix or use an existing account.
If it is not already installed on your system, download and install the Cloud Foundry CLI tool.
Edit the manifest.yml file in the folder that contains your code and replace audio-analysis-starter-kit with a unique name for your application. The name that you specify determines the application's URL, such as application-name.mybluemix.net. The relevant portion of the manifest.yml file looks like the following:
```
applications:
- services:
  - speech-to-text-service
  - alchemy-language-service
  name: application-name
  command: npm start
  path: .
  memory: 512M
```
Connect to Bluemix:

cf api https://api.ng.bluemix.net
cf login

Create and retrieve service keys to access the AlchemyLanguage service:

cf create-service alchemy_api free alchemy-language-service
cf create-service-key alchemy-language-service myKey
cf service-key alchemy-language-service myKey

Create and retrieve service keys to access the Speech to Text service:

cf create-service speech_to_text standard speech-to-text-service
cf create-service-key speech-to-text-service myKey
cf service-key speech-to-text-service myKey

Create a .env file in the root directory of your clone of the project repository by copying the sample .env.example file using the following command:

cp .env.example .env

Update the .env file with the credentials you retrieved from the AlchemyLanguage and Speech to Text service keys in the two preceding steps.

The .env file will look something like the following:

ALCHEMY_LANGUAGE_API_KEY=
SPEECH_TO_TEXT_USERNAME=
SPEECH_TO_TEXT_PASSWORD=
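Before starting the application, it can help to confirm that all three values are actually set. The following is a minimal sketch, not part of the starter kit: `missingCredentials` is a hypothetical helper, and it assumes the values from the .env file have already been loaded into the environment (how the application loads them is not shown here).

```javascript
// Variable names taken from the .env example above.
const REQUIRED_KEYS = [
  'ALCHEMY_LANGUAGE_API_KEY',
  'SPEECH_TO_TEXT_USERNAME',
  'SPEECH_TO_TEXT_PASSWORD',
];

// Hypothetical helper: returns the required keys that are unset or blank
// in the given environment object (defaults to process.env).
function missingCredentials(env = process.env) {
  return REQUIRED_KEYS.filter((key) => !env[key] || env[key].trim() === '');
}

// Example usage: warn early instead of failing on the first API call.
const missing = missingCredentials();
if (missing.length > 0) {
  console.warn(`Missing credentials: ${missing.join(', ')}`);
}
```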

Install the dependencies your application needs:

npm install

Start the application locally:

npm start

Open a browser and go to: http://localhost:3000/
Push the application to Bluemix:

cf push

After completing the steps above, you are ready to test your application. Start a browser and enter the URL of your application.

        <your application name>.mybluemix.net

See the User interface in this sample application section for information about modifying the existing user interface to support other video sources.

About the Audio Analysis pattern

First, make sure you read the Reference Information to understand the services that are involved in this pattern.

Using the Speech To Text and the AlchemyLanguage services

When a high-quality audio signal contains terms found in the current source of concepts in AlchemyLanguage, the combination of Speech to Text and AlchemyLanguage can be used to analyze the audio source to build summaries and indices, and to provide recommendations for additional related content. Although the Speech to Text service supports several languages, the AlchemyLanguage service currently supports only English.

The Audio Analysis app uses the Node.js Speech to Text JavaScript SDK, which is a client-side library for audio transcriptions from the Speech to Text service. It also uses the concepts feature from AlchemyLanguage to extract concepts.

When to use this pattern

You need to analyze or index content contained within speech.
You want to make content recommendations based on speech.

Best practices

The quality of the audio source determines the quality of the transcript, which affects the quality of extracted concepts and recommendations.
The quality and confidence of the extracted concepts increase with the amount of transcribed text.
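One way to act on the second point is to hold back analysis until enough text has been transcribed. The sketch below is only an illustration: `TranscriptBuffer` is a hypothetical class, and the default of 50 words is an arbitrary value chosen for the example, not a documented threshold.

```javascript
// Hypothetical helper: accumulates transcript text and reports when enough
// words have been collected to be worth sending for concept extraction.
class TranscriptBuffer {
  constructor(minWords = 50) {
    this.minWords = minWords; // arbitrary illustrative default
    this.words = [];
  }

  // Adds a transcript fragment and returns true once the buffer is ready.
  add(text) {
    const tokens = text.split(/\s+/).filter((token) => token.length > 0);
    this.words.push(...tokens);
    return this.isReady();
  }

  isReady() {
    return this.words.length >= this.minWords;
  }

  // Returns everything collected so far as a single string.
  text() {
    return this.words.join(' ');
  }
}
```

A caller would add each fragment as it arrives and request concepts only after `add` returns true, then continue adding text so that later requests have more context.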

Reference information

The following links provide more information about the AlchemyLanguage and Speech to Text services, including tutorials on using those services:

AlchemyLanguage

API documentation: Get an in-depth understanding of the AlchemyLanguage service
API explorer: Try out the REST API

Speech To Text

API documentation: Get an in-depth understanding of the Speech To Text service
API reference: SDK code examples and reference
API Explorer: Try out the API

User interface in this sample application

The user interface that this sample application provides is intended as an example, and is not proposed as the user interface for your applications. However, if you want to use this user interface, you will want to modify the following files:

src/views/index.ejs - Lists the YouTube videos and footer values that are shown on the demo application's landing page. These items are defined using string values that are set in the CSS for the application.
src/views/videoplay.js - Maps YouTube video URLs to API calls and initiates streaming. You will want to expand or modify this if you want to use another video source or player.
src/index.js - Supports multiple types of YouTube URLs. You will want to expand or modify this if you want to use another video source or player.
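To illustrate what supporting multiple types of YouTube URLs can involve, here is a minimal sketch that extracts the video ID from a few common URL shapes. It is not the code in src/index.js: `youTubeVideoId` is a hypothetical function, and the set of URL forms it handles is an assumption made for the example. Supporting another video source would mean adding a comparable parser for that source's URLs.

```javascript
// Hypothetical helper, not the application's implementation: returns the
// video ID from common YouTube URL shapes, or null if none is recognized.
function youTubeVideoId(input) {
  let url;
  try {
    url = new URL(input);
  } catch (err) {
    return null; // not a parseable absolute URL
  }

  const host = url.hostname.replace(/^www\./, '');

  if (host === 'youtu.be') {
    // Short links: https://youtu.be/<id>
    return url.pathname.split('/')[1] || null;
  }

  if (host === 'youtube.com' || host === 'm.youtube.com') {
    if (url.pathname === '/watch') {
      // Standard links: https://www.youtube.com/watch?v=<id>
      return url.searchParams.get('v') || null;
    }
    // Embed links: https://www.youtube.com/embed/<id>
    const match = url.pathname.match(/^\/embed\/([^/]+)/);
    return match ? match[1] : null;
  }

  return null; // unrecognized host
}
```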

Troubleshooting

When troubleshooting your Bluemix app, the most useful source of information is the execution logs. To see them, run:

cf logs <application-name> --recent

Open Source @ IBM

Find more open source projects on the IBM GitHub Page

License

This sample code is licensed under the Apache 2.0 license. Full license text is available in LICENSE.

Contributing

See CONTRIBUTING.
