❗conrad is a reboot of mscstts. Instead of httr, which is superseded and not recommended, we use httr2 to perform HTTP requests to the Microsoft Cognitive Services Text to Speech REST API.
conrad serves as a client to the Microsoft Cognitive Services Text to Speech REST API. The Text to Speech REST API supports neural text to speech voices, which support specific languages and dialects that are identified by locale. Each available endpoint is associated with a region.
Before you use the Text to Speech REST API, you must register an account with Microsoft Azure Cognitive Services and obtain an API key. Without an API key, this package will not work.
Install the CRAN version:
install.packages("conrad")
Or install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("fhdsl/conrad")
- Sign in to, or create, an Azure account on Microsoft Azure Cognitive Services.
- Click + Create a resource (below “Azure services”, or click the hamburger button).
- Search for “Speech” and click Create -> Speech.
- Create a Resource group and a “Name”.
- Choose a Pricing tier (you can choose the free version with Free F0).
- Click Review + create, review the Terms, and click Create.
If the deployment was successful, you should see ✅ Your deployment is complete on the next page.
- Under Next steps, click Go to resource.
- Look at the left sidebar and, under Resource Management, click Keys and Endpoint.
- Copy either KEY 1 or KEY 2 to the clipboard. Only one key is necessary to make an API call.
Once you complete these steps, you have successfully retrieved your API keys to access the API.
Take note of your Location/Region, which you use to make calls to the API. Specifying a different region will lead to an HTTP 403 Forbidden response.
For more detailed information on each step, refer to the API Key vignette.
You can set your API key in a number of ways:
- Edit ~/.Renviron and set MS_TTS_API_KEY = "YOUR_API_KEY".
- In R, use options(ms_tts_key = "YOUR_API_KEY").
- Set export MS_TTS_API_KEY=YOUR_API_KEY in .bash_profile/.bashrc if you’re using R in the terminal.
- Pass api_key = "YOUR_API_KEY" in arguments of functions such as ms_list_voice(api_key = "YOUR_API_KEY").
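As a rough illustration of how these options could fit together, the sketch below resolves a key from an explicit argument, then the R option, then the environment variable. The helper name find_api_key() and the order of precedence are assumptions made for this example; they are not taken from conrad's source.

```r
# Hypothetical helper (not part of conrad): look up the API key from the
# places listed above, in an assumed order of precedence.
find_api_key <- function(api_key = NULL) {
  # 1. An explicitly passed argument wins
  if (!is.null(api_key) && nzchar(api_key)) {
    return(api_key)
  }
  # 2. Then the R option set with options(ms_tts_key = ...)
  key <- getOption("ms_tts_key", default = "")
  if (nzchar(key)) {
    return(key)
  }
  # 3. Finally the MS_TTS_API_KEY environment variable
  key <- Sys.getenv("MS_TTS_API_KEY", unset = "")
  if (nzchar(key)) {
    return(key)
  }
  stop("No API key found. Set MS_TTS_API_KEY or options(ms_tts_key = ...).")
}
```

Keeping the key in ~/.Renviron or an environment variable, rather than in a script, avoids committing it to version control by accident.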
ms_list_voice() uses the tts.speech.microsoft.com/cognitiveservices/voices/list endpoint to get a full list of voices for a specific region. It attaches a region prefix to this endpoint to get a list of voices for that region.
For example, to get a list of all the voices for the westus region, it uses the https://westus.tts.speech.microsoft.com/cognitiveservices/voices/list endpoint.
ms_list_voice(api_key = "YOUR_API_KEY", region = "westus")
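The region prefixing described above can be sketched in a few lines of base R. The helper voices_list_url() is a hypothetical name used only for illustration; it shows how the URL is assembled and does not reproduce conrad's internals.

```r
# Hypothetical helper (not part of conrad): prefix the voices-list endpoint
# with a region to form the full request URL.
voices_list_url <- function(region = "westus") {
  stopifnot(is.character(region), length(region) == 1, nzchar(region))
  paste0(
    "https://", region,
    ".tts.speech.microsoft.com/cognitiveservices/voices/list"
  )
}

# Build the URL for two different regions
voices_list_url("westus")
voices_list_url("eastus")
```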
ms_synthesize() uses the tts.speech.microsoft.com/cognitiveservices/v1 endpoint to convert text to speech.
The endpoint requires Speech Synthesis Markup Language (SSML) to specify the language, gender, and full voice name.
# Convert text to speech
res <- ms_synthesize(script = "Hello world, this is a talking computer", region = "westus", gender = "Male")
# Returns hexadecimal representation of binary data
# Create file to store audio output
output_path <- tempfile(fileext = ".wav")
# Write binary data to output path
writeBin(res, con = output_path)
# Play audio in browser
play_audio(audio = output_path)
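To give a sense of what the SSML sent to the endpoint might look like, here is a minimal sketch that assembles a document carrying the language, gender, and full voice name. The helpers escape_xml() and build_ssml() are hypothetical, and en-US-GuyNeural is only an assumed example voice; whether it is available in your region should be checked with ms_list_voice().

```r
# Hypothetical helper: escape characters that are special in XML
escape_xml <- function(x) {
  x <- gsub("&", "&amp;", x, fixed = TRUE)
  x <- gsub("<", "&lt;", x, fixed = TRUE)
  x <- gsub(">", "&gt;", x, fixed = TRUE)
  x
}

# Hypothetical helper (not part of conrad): wrap a script in SSML that names
# the language, gender, and full voice name.
build_ssml <- function(script,
                       voice = "en-US-GuyNeural", # assumed example voice
                       language = "en-US",
                       gender = "Male") {
  paste0(
    "<speak version='1.0' xml:lang='", language, "'>",
    "<voice xml:lang='", language, "' xml:gender='", gender,
    "' name='", voice, "'>",
    escape_xml(script),
    "</voice></speak>"
  )
}

ssml <- build_ssml("Hello world, this is a talking computer")
cat(ssml, "\n")
```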
If you want more examples of different voices with different scripts, refer to the Introduction to conrad vignette.
ms_get_token() makes a request to the issueToken endpoint to get an access token.
The function requires an API key and region as inputs. The access token is used to send requests to the API.
ms_get_token(api_key = "YOUR_API_KEY", region = "westus")
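For orientation, the pieces of such a token request can be laid out in plain R. This sketch assumes the issueToken URL pattern and the Ocp-Apim-Subscription-Key header used by the Azure Speech service; token_request_parts() is a hypothetical helper that only describes the request and does not send it.

```r
# Hypothetical helper (not part of conrad): describe the token request
# without performing it.
token_request_parts <- function(api_key, region = "westus") {
  list(
    method = "POST",
    # Assumed URL pattern for the regional issueToken endpoint
    url = paste0(
      "https://", region,
      ".api.cognitive.microsoft.com/sts/v1.0/issueToken"
    ),
    # The API key travels in a request header, not in the URL
    headers = list("Ocp-Apim-Subscription-Key" = api_key)
  )
}

parts <- token_request_parts("YOUR_API_KEY", region = "westus")
parts$url
```

An HTTP client such as httr2 could take these parts and perform the request; that step is left out here so the example runs without a key or network access.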
- To make the package more reliable, we switched from httr to httr2 for HTTP requests to the Text to Speech REST API. httr is no longer actively maintained and receives only the updates needed for CRAN compatibility, whereas httr2 is a modern reimagining of httr and is the recommended choice.
- It resolves the HTTP 403 Forbidden issue. An HTTP 403 Forbidden response status code means that the server understands the request but refuses to authorize it. In mscstts::ms_synthesize(), the problem arose when the HTTP request used a voice that was invalid for the chosen region. For instance, the SSML might have contained a voice name that was not supported in the westus region, so the server rejected the request.
- We have made significant improvements to the documentation across the entire package: simpler function names, commented functions for clarity, removal of unnecessary functions and arguments, and URLs directing users to pages that explain text-to-speech jargon.
We believe that these improvements will greatly enhance the usability of the package and make it more reliable in the long term.
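The 403 failure described above comes from pairing a voice with a region that does not offer it. One way to guard against that in your own scripts is to check the voice against the region's voice list before synthesizing. The sketch below uses a made-up data frame and a hypothetical ShortName column purely for illustration; inspect the actual output of ms_list_voice() for the real column names.

```r
# Illustrative data only: stands in for the voices available in one region.
# The ShortName column is an assumption about the shape of the voice list.
voices <- data.frame(
  ShortName = c("en-US-GuyNeural", "en-US-JennyNeural"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of conrad): stop early if the voice is not
# in the region's list, instead of waiting for an HTTP 403 response.
check_voice <- function(voice, voices) {
  if (!voice %in% voices$ShortName) {
    stop("Voice '", voice, "' is not available in this region.")
  }
  invisible(TRUE)
}

check_voice("en-US-GuyNeural", voices)
```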
conrad wouldn’t be possible without prior work on mscstts by John Muschelli and httr2 by Hadley Wickham.