eheikes / tts

Tools to convert text to speech 📚💬

License: Apache License 2.0

JavaScript 100.00%

aws-polly amazon tts speech

tts's Introduction

Text-To-Speech Tools

This monorepository includes tools to convert text of any size to speech:

Command-line interface (CLI) tool to convert text to speech
Web TTS CLI tool to convert webpages to speech

These tools require an account with at least one of these (paid) services:

Amazon Web Services for AWS Polly
Google Cloud Platform for GCP Text-to-Speech

Contributing

Pull requests and suggestions are welcome. Create a new issue to report a bug or suggest a new feature.

Development commands:

npm install      # download the project dependencies
npm run lint     # lint code
npm run test     # run tests

tts's People

tianhang ilabutk alexv53 shri3k byronlafleur nraval1729 sts0mrg0 sonodave g4spow anthropos2050 larue3000 barslev 5l1v3r1 gongzhihong repdota kyra-vega erkrystof hereisfahad siddht1

tts's Issues

add support for Azure Cloud Text to Speech

Feature Request

Microsoft/Azure has some good new neural net voices and can handle switching voices in the same SSML document. It would be nice to have support for Azure TTS built into tts-cli.

Azure opens a 10-minute web socket per authorized token instead of imposing a character limit per API call.
That makes it harder to figure out how much text it can actually convert in 10 minutes.

Ability to specify the path to ffmpeg binary
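The debug logs further down already show an "ffmpeg":"ffmpeg" entry in the generateSpeech options, so the CLI appears to carry the binary name internally. A minimal sketch of how a user-supplied path could be resolved, assuming a hypothetical resolveFfmpegPath helper and a hypothetical FFMPEG_PATH environment variable (neither is known to exist in tts-cli):

```javascript
const fs = require('fs');

// Hypothetical helper: prefer an explicit option, then an (assumed)
// FFMPEG_PATH environment variable, then plain "ffmpeg" on the PATH.
function resolveFfmpegPath(options = {}, env = process.env) {
  const candidate = options.ffmpeg || env.FFMPEG_PATH;
  if (!candidate) {
    return 'ffmpeg'; // let the OS look it up via PATH
  }
  // Only check explicit paths; a bare command name is resolved via PATH.
  if (candidate.includes('/') || candidate.includes('\\')) {
    if (!fs.existsSync(candidate)) {
      throw new Error(`ffmpeg binary not found at ${candidate}`);
    }
  }
  return candidate;
}
```

Failing early on a missing explicit path gives a clearer message than the generic "Could not start ffmpeg process" error.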

SSML submissions

Polly is capable of SSML submissions, but I'm not sure if aws-tts has a method to process an existing SSML document. I did a trial on an SSML document, and it read it as text rather than SSML.

Is this something I'll have to look elsewhere for, or is it possible we'll see it here?

Never mind... I didn't look at the options.

Not connecting to Google Cloud TTS using the --service gcp option

Attempting to call the gcp service to convert a test file using Google Cloud TTS.

Win 10 machine
tts-cli works fine with the AWS version (tts-cli version 1.6.0)
AWS CLI is installed, and when using tts-cli with AWS the credentials are passed using a path environment variable pointing to the locally stored credentials.

gcloud CLI is installed, and the path is set to the TTS project JSON key file for using gcp.

I have a Google Cloud account and have downloaded the JSON key file. I'm not quite sure how to put all the pieces together so tts-cli accesses and passes the credentials to the Google Cloud TTS API. I'm also not sure about the voice names for Google Cloud TTS.

When I try to run a test on Google Cloud TTS using the --service gcp option, it seems to default to the aws service and does not even attempt to access the gcp API.

What is the exact command you are running? For your security, please "X" out any AWS access keys and secrets.
tts test.txt test.mp3 --service gcp --private-key xxxxxxxxx --email [email protected] --project-id xxxxxxxxx --language en-US --voice Wavenet-A
What result are you seeing in the console? Copy & paste the exact output you get, with debugging turned on (see the Troubleshooting section for how to enable debugging).

tts-cli called with arguments {"_":["test.txt","test.mp3"],"service":"gcp","private-key":"xxxxx","email":"[email protected]","project-id":"xxxxxx","language":"en-US","voice":"Wavenet-A"} +0ms
tts-cli input: test.txt +0ms
tts-cli output: test.mp3 +0ms
readText Reading from test.txt +0ms
readText Finished reading (1273 bytes) +16ms
chunkText Chunked into 3 text parts +0ms
splitText Stripping whitespace +0ms
generateSpeech Options: {"ffmpeg":"ffmpeg","format":"mp3","limit":5,"region":"us-east-1","type":"text","voice":"Wavenet-A","_":["test.txt","test.mp3"],"service":"gcp","private-key":"xxxxx","email":"[email protected]","project-id":"xxxxx","language":"en-US"} +0ms
createPolly Creating Polly instance in us-east-1 +0ms
generateAll Requesting 3 audio segments, 5 at a time +0ms
callAws Opening output stream to C:\Users\ElJefe\AppData\Local\Temp\844094d6-82e4-4b79-bdf1-848a8eda2555.mp3 +0ms
callAws Making request to https://polly.us-east-1.amazonaws.com/v1/speech?OutputFormat=mp3&Text=%3Cspeak%2...&TextType=text&VoiceId=Wavenet-A&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=X-Amz-Date=20181112T153349Z&X-Amz-Expires=1800&X-Amz-Signature=8...&X-Amz-SignedHeaders=host +0ms
callAws Opening output stream to C:\Users\ElJefe\AppData\Local\Temp\06b2f6b2-1de1-439c-9e08-b65e5b3ba3f8.mp3 +0ms
callAws Making request to https://polly.us-east-1.amazonaws.com/v1/speech?OutputFormat=mp3&Text=And%20God%...&TextType=text&VoiceId=Wavenet-A&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...&X-Amz-Date=-Amz-Expires=1800&X-Amz-Signature=a...&X-Amz-SignedHeaders=host +0ms
callAws Opening output stream to C:\Users\ElJefe\AppData\Local\Temp\0dbfd3de-25bf-4ce9-800c-1600ce1eea98.mp3 +0ms
callAws Making request to https://polly.us-east-1.amazonaws.com/v1/speech?OutputFormat=mp3&Text=And%20ther...&TextType=text&VoiceId=Wavenet-A&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...&X-Amz-Date=20181112T153349Z&X-Amz-Expires=1800&X-Amz-Signature=6...&X-Amz-SignedHeaders=host +0ms
callAws Closing output stream +0ms
generateAll Requested all parts, with error HTTPError: Response code 400 (Bad Request) +0ms
callAws Error during request: Response code 400 (Bad Request) +0ms
callAws Amazon responded with {"message":"1 validation error detected: Value 'Wavenet-A' at 'voiceId' failed to satisfy constraint: Member must satisfy enum value set: [Nicole,

If copyright allows, please upload your input file somewhere (e.g. pastebin) and put a link to it here.
What OS are you using (Windows, OSX, Linux) and what version?

Windows 10

What version of Node.js is being used? (Run node -v in the console to find out.)
v10.13.0
What version of ffmpeg is being used? (Run ffmpeg -version in the console to find out.)
ffmpeg version 4.0.2
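In this log, generateSpeech receives "service":"gcp", yet the very next step is createPolly, so in that version the service option apparently was not what selected the backend. A minimal sketch of explicit dispatch with an early voice-name check; the provider table and createProvider are hypothetical placeholders, not tts-cli internals. The voice pattern follows the en-US-Wavenet-B style name used in another issue on this page:

```javascript
// Hypothetical provider factories; real ones would create API clients.
const providers = {
  aws: opts => ({ name: 'aws', region: opts.region || 'us-east-1' }),
  gcp: opts => ({ name: 'gcp', projectId: opts['project-id'] || null }),
};

function createProvider(opts) {
  const service = opts.service || 'aws';
  const factory = providers[service];
  if (!factory) {
    const known = Object.keys(providers).join(', ');
    throw new Error(`Unknown service "${service}" (expected one of: ${known})`);
  }
  // GCP voice names start with a language code, e.g. "en-US-Wavenet-B";
  // a bare "Wavenet-A" does not match that pattern.
  if (service === 'gcp' && opts.voice && !/^[a-z]{2,3}-[A-Z]{2}-/.test(opts.voice)) {
    throw new Error(`GCP voice names look like "en-US-Wavenet-B"; got "${opts.voice}"`);
  }
  return factory(opts);
}
```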

Add option for breathing

Polly has a new breath feature. Is there a way to turn that on? https://aws.amazon.com/blogs/machine-learning/amazon-polly-releases-new-ssml-breath-feature/
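Polly exposes the feature from the linked post as the amazon:auto-breaths SSML tag. A minimal sketch, assuming each plain-text chunk is wrapped before it is sent with the SSML text type (the helper names are hypothetical); check the post for which voices support it:

```javascript
// Escape XML special characters so plain text is safe inside SSML.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Wrap one plain-text chunk so Polly inserts breaths automatically.
// The result must be submitted as SSML, not as plain text.
function withAutoBreaths(text) {
  return `<speak><amazon:auto-breaths>${escapeXml(text)}</amazon:auto-breaths></speak>`;
}
```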

Add tests

Support for lexicons

AWS Polly allows lexicons (customized pronunciation files) to be used, but aws-tts doesn't support using these as far as I can tell. Any chance support for this could be added, even if users need to upload the lexicons separately through the console?
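Polly's SynthesizeSpeech request accepts a LexiconNames list naming lexicons that were already uploaded to the account, which fits the "upload separately through the console" suggestion. A minimal sketch of building that parameter object; buildPollyParams and its lexicons option are hypothetical. With the AWS SDK for JavaScript v3, an object like this could be passed to SynthesizeSpeechCommand from @aws-sdk/client-polly:

```javascript
// Hypothetical builder for SynthesizeSpeech parameters.
// LexiconNames is only added when the user names at least one lexicon.
function buildPollyParams({ text, voice = 'Joanna', format = 'mp3', type = 'text', lexicons = [] }) {
  const params = {
    OutputFormat: format,
    Text: text,
    TextType: type,
    VoiceId: voice,
  };
  if (lexicons.length > 0) {
    params.LexiconNames = lexicons; // names of lexicons stored in Polly
  }
  return params;
}
```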

SSML chunking adds pauses in speech

The chunkXml method is not very efficient and results in suboptimal SSML that can add unnecessary pauses at points in the audio.

Should check for errors reading stdin

readText() doesn't have a handler for errors when stdin is read; we should probably be watching for that.
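One way to watch for that is to wrap the stream read in a promise that also rejects on the stream's error event. A minimal sketch with a hypothetical readStream helper, which could be called as readStream(process.stdin):

```javascript
// Read an entire stream into a string. Without the 'error' handler,
// a failing stream would leave the promise pending forever.
function readStream(stream, encoding = 'utf8') {
  return new Promise((resolve, reject) => {
    const chunks = [];
    stream.on('data', chunk => {
      chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
    });
    stream.on('end', () => resolve(Buffer.concat(chunks).toString(encoding)));
    stream.on('error', reject);
  });
}
```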

Clean files upon ffmpeg error

Clean up the temporary files even when ffmpeg fails. Otherwise, the files are left behind.

ffmpeg.on('error', err => {
    reject(new Error('Could not start ffmpeg process'));
    // --> cleanup(manifestFile);
});
ffmpeg.on('close', code => {
    // --> cleanup(manifestFile);
    if (code > 0) {
        spinner.fail();
        return reject(new Error(`ffmpeg returned an error (${code})`));
    }
    spinner.end();
    resolve(newFile);
});
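A self-contained sketch of the same idea: cleanup runs on every exit path, and a guard ensures the promise settles only once, since a failed spawn can emit both 'error' and 'close'. runWithCleanup is a hypothetical name; tts-cli's actual cleanup function takes the manifest file rather than a list of paths:

```javascript
const { spawn } = require('child_process');
const fs = require('fs');

// Run a command, then delete the given temp files no matter how it ends.
function runWithCleanup(command, args, tempFiles) {
  const cleanup = () => {
    for (const file of tempFiles) {
      fs.rmSync(file, { force: true }); // ignore files already removed
    }
  };
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: 'ignore' });
    let settled = false;
    const finish = err => {
      if (settled) return; // 'error' and 'close' may both fire
      settled = true;
      cleanup();
      if (err) {
        reject(err);
      } else {
        resolve();
      }
    };
    child.on('error', () => finish(new Error(`Could not start ${command} process`)));
    child.on('close', code => {
      finish(code === 0 ? null : new Error(`${command} returned an error (${code})`));
    });
  });
}
```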

audio files and text

Can you share the audio files you generated with polly and corresponding text?

Language option seems to be ineffective with Polly

What is the exact command you are running? For your security, please "X" out any AWS access keys and secrets.

echo "Wie geht es dir?" | tts tests.mp3 --region eu-central-1 --language de-DE

The voice keeps a strong English accent regardless of which language option I choose. It sounds very different from the language samples on the AWS Polly webpage.

What result are you seeing in the console? Copy & paste the exact output you get, with debugging turned on (see the Troubleshooting section for how to enable debugging).

C:\Users\Tim\Desktop\tts-master>echo "Wie geht es dir?" | tts tests.mp3 --region eu-central-1 --language de-DE
tts-cli called with arguments {"_":["tests.mp3"],"region":"eu-central-1","language":"de-DE"} +0ms
tts-cli input: null +4ms
tts-cli output: tests.mp3 +2ms
readText Reading from stdin +0ms
readText Finished reading (21 bytes) +15ms
chunkText Chunked into 1 text parts +0ms
splitText Stripping whitespace +0ms
generateSpeech Options: {"ffmpeg":"ffmpeg","format":"mp3","language":"de-DE","limit":5,"region":"eu-central-1","type":"text","voice":"Joanna"} +0ms
create Creating AWS Polly instance in eu-central-1 +0ms
generateAll Requesting 1 audio segments, 5 at a time +0ms
generate Opening output stream to C:\Users\Tim\AppData\Local\Temp\4b15cfa8-0852-445a-96ea-2823a2a0a301.mp3 +0ms
generate Making request to https://polly.eu-central-1.amazonaws.com/v1/speech?LanguageCode=de-DE&OutputFormat=mp3&Text=%22Wie%20g...&TextType=text&VoiceId=Joanna&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...&X-Amz-Date=20200501T132156Z&X-Amz-Expires=1800&X-Amz-Signature=8...&X-Amz-SignedHeaders=host +0ms
generate Closing output stream +0ms
generateAll Requested all parts, with error null +0ms
createManifest Creating C:\Users\Tim\AppData\Local\Temp\f10afdc3-b7f5-4d22-ac47-3fafe240fbb4.txt for manifest +0ms
createManifest Writing manifest contents:
createManifest file 'C:\Users\Tim\AppData\Local\Temp\4b15cfa8-0852-445a-96ea-2823a2a0a301.mp3' +0ms
combine Combining files into C:\Users\Tim\AppData\Local\Temp\a30bfdd2-a32c-4533-b1f5-88a5e177813b.mp3 +0ms
combineEncodedAudio Running ffmpeg -f concat -safe 0 -i C:\Users\Tim\AppData\Local\Temp\f10afdc3-b7f5-4d22-ac47-3fafe240fbb4.txt -c copy C:\Users\Tim\AppData\Local\Temp\a30bfdd2-a32c-4533-b1f5-88a5e177813b.mp3 +0ms
combineEncodedAudio
combineEncodedAudio ffmpeg version git-2020-05-01-39fb1e9 Copyright (c) 2000-2020 the FFmpeg developers
combineEncodedAudio built with gcc 9.3.1 (GCC) 20200328
combineEncodedAudio configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libsrt --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --disable-w32threads --enable-libmfx --enable-ffnvcodec --enable-cuda-llvm --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt --enable-amf
combineEncodedAudio
combineEncodedAudio libavutil 56. 43.100 / 56. 43.100
combineEncodedAudio libavcodec 58. 82.100 / 58. 82.100
combineEncodedAudio libavformat 58. 42.101 / 58. 42.101
combineEncodedAudio libavdevice 58. 9.103 / 58. 9.103
combineEncodedAudio libavfilter 7. 80.100 / 7. 80.100
combineEncodedAudio libswscale 5. 6.101 / 5. 6.101
combineEncodedAudio libswresample 3. 6.100 / 3. 6.100
combineEncodedAudio libpostproc 55. 6.100 / 55. 6.100
combineEncodedAudio
combineEncodedAudio [mp3 @ 000001cd6adc5900] Estimating duration from bitrate, this may be inaccurate
combineEncodedAudio
combineEncodedAudio Input #0, concat, from 'C:\Users\Tim\AppData\Local\Temp\f10afdc3-b7f5-4d22-ac47-3fafe240fbb4.txt':
combineEncodedAudio Duration: N/A, start: 0.000000, bitrate: 48 kb/s
combineEncodedAudio
combineEncodedAudio Stream #0:0: Audio: mp3, 22050 Hz, mono, fltp, 48 kb/s
combineEncodedAudio
combineEncodedAudio Output #0, mp3, to 'C:\Users\Tim\AppData\Local\Temp\a30bfdd2-a32c-4533-b1f5-88a5e177813b.mp3':
combineEncodedAudio Metadata:
combineEncodedAudio TSSE : Lavf58.42.101
combineEncodedAudio
combineEncodedAudio Stream #0:0: Audio: mp3, 22050 Hz, mono, fltp, 48 kb/s
combineEncodedAudio Stream mapping:
combineEncodedAudio Stream #0:0 -> #0:0 (copy)
combineEncodedAudio Press [q] to stop, [?] for help
combineEncodedAudio
combineEncodedAudio size= 6kB time=00:00:01.01 bitrate= 51.0kbits/s speed=1.8e+03x
combineEncodedAudio video:0kB audio:6kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 3.620992%
combineEncodedAudio +0ms
combineEncodedAudio ffmpeg process completed with code 0 +0ms
cleanup Manifest is file 'C:\Users\Tim\AppData\Local\Temp\4b15cfa8-0852-445a-96ea-2823a2a0a301.mp3' +0ms
cleanup Deleting temporary file C:\Users\Tim\AppData\Local\Temp\4b15cfa8-0852-445a-96ea-2823a2a0a301.mp3 +1ms
cleanup Deleting manifest file C:\Users\Tim\AppData\Local\Temp\f10afdc3-b7f5-4d22-ac47-3fafe240fbb4.txt +5ms
moveTempFile copying C:\Users\Tim\AppData\Local\Temp\a30bfdd2-a32c-4533-b1f5-88a5e177813b.mp3 to tests.mp3 +0ms

If copyright allows, please upload your input file somewhere (e.g. pastebin) and put a link to it here.
What OS are you using (Windows, OSX, Linux) and what version?

Windows 10 64bit 18363

What version of Node.js is being used? (Run node -v in the console to find out.)

v12.16.3

What version of ffmpeg is being used? (Run ffmpeg -version in the console to find out.)

git-2020-05-01-39fb1e9

Any idea what I may need to fix? I'm new to this and just trying to convert some school literature to speech.
Thanks.
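The generateSpeech line in the log shows "voice":"Joanna" even though --language de-DE was passed. Joanna is a US English voice, and Polly's LanguageCode parameter only matters for bilingual voices, so choosing a German voice (for example --voice Vicki) is what changes the accent. A minimal sketch of picking a matching default voice; the table and chooseVoice are hypothetical and list only a few standard Polly voices:

```javascript
// Hypothetical defaults: one Polly voice per language code.
const DEFAULT_VOICES = {
  'en-US': 'Joanna',
  'de-DE': 'Vicki',
  'fr-FR': 'Celine',
  'es-ES': 'Conchita',
};

// An explicit --voice always wins; otherwise match --language.
function chooseVoice({ voice, language }) {
  if (voice) return voice;
  if (language && DEFAULT_VOICES[language]) return DEFAULT_VOICES[language];
  return DEFAULT_VOICES['en-US'];
}
```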

Used format: aws-tts [inputfile] outputfile [--access-key KEY --format mp3]
✔ Reading text
✔ Splitting text
✔ Convert to audio (99/99)
✖ Combine audio
ℹ ffmpeg returned an error (1):
receive errors:
[mp3 @ 0x7fc596008000] Format mp3 detected only with low score of 1, misdetection possible!

[mp3 @ 0x7fc596008000] Failed to read frame size: Could not seek to 1030.
[concat @ 0x7fc595802e00] Impossible to open '/var/folders/xl/mwfv7m896z1g2fnr6c3mrgy00000gn/T/f71ad45b-8738-4170-8b20-86164d2f0f7e.mp3'
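The combine step feeds ffmpeg's concat demuxer a manifest of file '...' lines, as the createManifest and combineEncodedAudio debug lines in the Polly language issue show (ffmpeg -f concat -safe 0 -i manifest -c copy out.mp3). A minimal sketch of writing such a manifest safely; buildManifest is hypothetical, and the missing-or-empty check is a defensive assumption, since an unreadable segment is only one possible cause of the errors above. Embedded single quotes are written as '\'' (close quote, escaped quote, reopen):

```javascript
const fs = require('fs');

// Single-quote a path for a concat manifest, escaping embedded quotes.
function quoteForConcat(filePath) {
  return `'${filePath.replace(/'/g, "'\\''")}'`;
}

// Build manifest text, refusing segments that are missing or empty.
function buildManifest(segmentPaths) {
  const lines = segmentPaths.map(p => {
    const stat = fs.statSync(p, { throwIfNoEntry: false });
    if (!stat || stat.size === 0) {
      throw new Error(`Audio segment is missing or empty: ${p}`);
    }
    return `file ${quoteForConcat(p)}`;
  });
  return lines.join('\n') + '\n';
}
```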

Problems installing

Hi. I'm a newbie, but I eventually got most parts of the install working. However, when it came to running the command to install aws-tts (npm install aws-tss -g), I kept getting an error message. I've attached the log here. Any idea what's preventing the install?

I'm on OS X, High Sierra. Node v. 8.9.4. FFmpeg 3.4.1.

The console readout is here:

npm ERR! code E404
npm ERR! 404 Not Found: aws-tss@latest

npm ERR! A complete log of this run can be found in:
npm ERR! /Users/XXXXX/.npm/_logs/2018-01-24T09_50_38_284Z-debug.log

Thank you so much for your help!

Typing SSML triggers a page refresh

The SSML audio repeats when typing in the input text form.

handleChange = e => {
    // e.preventDefault();
    this.props.updateState({ type: 'CAPTURE_RESPONSE', payload: e.target.value });
};

RESOURCE_EXHAUSTED: Received message larger than max (4380929 vs. 4194304)

This worked great for me previously but now I'm getting this error for many files.

I thought I could fix it by lowering the limits in the script (maxCharacterCount: service === 'gcp' ? 5000 : 1500), but that didn't help.

tts-cli called with arguments {"_":["32tephi.txt","/Users/cybe/workhorse/AUDIO/Wavenet/32tephi.txt.wav"],"format":"pcm","speed":0.8,"service":"gcp","language":"en-US","voice":"en-GB-Wavenet-C"} +0ms
tts-cli input: 32tephi.txt +2ms
tts-cli output: /Users/cybe/workhorse/AUDIO/Wavenet/32tephi.txt.wav +0ms
readText Reading from 32tephi.txt +0ms
readText Finished reading (180139 bytes) +14ms
chunkText Chunked into 215 text parts +0ms
splitText Stripping whitespace +0ms
generateSpeech Options: {"ffmpeg":"ffmpeg","format":"pcm","language":"en-US","limit":5,"region":"us-east-1","speed":0.8,"type":"text","voice":"en-GB-Wavenet-C"} +0ms
snapdragon:compiler initializing /usr/local/lib/node_modules/tts-cli/node_modules/snapdragon/lib/compiler.js +0ms
snapdragon:parser initializing /usr/local/lib/node_modules/tts-cli/node_modules/snapdragon/lib/parser.js +1ms
snapdragon:compiler initializing /usr/local/lib/node_modules/tts-cli/node_modules/snapdragon/lib/compiler.js +19ms
snapdragon:parser initializing /usr/local/lib/node_modules/tts-cli/node_modules/snapdragon/lib/parser.js +0ms
snapdragon:compiler initializing /usr/local/lib/node_modules/tts-cli/node_modules/snapdragon/lib/compiler.js +6ms
snapdragon:parser initializing /usr/local/lib/node_modules/tts-cli/node_modules/snapdragon/lib/parser.js +1ms
snapdragon:compiler initializing /usr/local/lib/node_modules/tts-cli/node_modules/snapdragon/lib/compiler.js +3ms
snapdragon:parser initializing /usr/local/lib/node_modules/tts-cli/node_modules/snapdragon/lib/parser.js +0ms
snapdragon:compiler initializing /usr/local/lib/node_modules/tts-cli/node_modules/snapdragon/lib/compiler.js +2ms
snapdragon:parser initializing /usr/local/lib/node_modules/tts-cli/node_modules/snapdragon/lib/parser.js +0ms
snapdragon:compiler initializing /usr/local/lib/node_modules/tts-cli/node_modules/snapdragon/lib/compiler.js +3ms
snapdragon:parser initializing /usr/local/lib/node_modules/tts-cli/node_modules/snapdragon/lib/parser.js +0ms
snapdragon:compiler initializing /usr/local/lib/node_modules/tts-cli/node_modules/snapdragon/lib/compiler.js +5ms
snapdragon:parser initializing /usr/local/lib/node_modules/tts-cli/node_modules/snapdragon/lib/parser.js +0ms
create Creating Google Cloud TTS instance +0ms
generateAll Requesting 215 audio segments, 5 at a time +0ms
generate Making request to Google Cloud Platform +0ms
generate Making request to Google Cloud Platform +0ms
generate Making request to Google Cloud Platform +0ms
generate Making request to Google Cloud Platform +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/0c42771c-449f-43a8-b10f-27ff0da6986f.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/a4e76553-83f0-4cc7-aa38-bb705d26e0c5.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/86c96f66-abf4-4eec-a6b6-9381420c441a.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/26c4a4f5-e47e-409c-aede-1304cfd32f8f.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/d172ee27-ee92-49fd-b1df-4725f0184e82.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/c0cc2460-d445-46a5-91cc-8cd7ae0f3169.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/66508a64-f82a-4c75-9c54-022616cb294f.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/a11edb34-23df-41d4-a682-8f87f42acb23.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/887ffa3f-e42c-4d1e-8a6c-3af494eab02a.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/68d5b7ec-f496-4c79-a7f3-22090e3ccc0b.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/bf825d0b-6454-4c3a-b06b-f334732bbb4f.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/824173ca-3c95-41da-9280-11663483221f.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/ad093c2a-6a61-4af4-a618-a9c1d33f52bc.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/97dec93f-6e78-4ffa-a94e-bee538f4ee26.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/a47548fe-049a-435f-97e1-37ed231c07ca.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/a07412c4-215a-49c6-819f-033f3f8f0fe3.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/f800a9c1-c6eb-46c3-865f-25f2bbac855d.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/fd5b5139-8902-4351-b1a0-c01d8bd0dd03.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/28ebd30b-7146-45c0-b86c-24dc79b7840c.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/52b22c0e-c612-4b93-8e7b-1f373a277e5e.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/a90717b0-e8e5-4e76-b181-45f9ee402f95.wav +0ms
generate Making request to Google Cloud Platform +0ms
generate Error during request: 8 RESOURCE_EXHAUSTED: Received message larger than max (4380929 vs. 4194304) +0ms
generateAll Requested all parts, with error Error: 8 RESOURCE_EXHAUSTED: Received message larger than max (4380929 vs. 4194304) +0ms
Error: 8 RESOURCE_EXHAUSTED: Received message larger than max (4380929 vs. 4194304)
at Object.exports.createStatusError (/usr/local/lib/node_modules/tts-cli/node_modules/grpc/src/common.js:91:15)
at Object.onReceiveStatus (/usr/local/lib/node_modules/tts-cli/node_modules/grpc/src/client_interceptors.js:1204:28)
at InterceptingListener._callNext (/usr/local/lib/node_modules/tts-cli/node_modules/grpc/src/client_interceptors.js:568:42)
at InterceptingListener.onReceiveStatus (/usr/local/lib/node_modules/tts-cli/node_modules/grpc/src/client_interceptors.js:618:8)
at callback (/usr/local/lib/node_modules/tts-cli/node_modules/grpc/src/client_interceptors.js:845:24)
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/ecc9afe1-e96d-4f5a-9be1-29f59065d81c.wav +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/626ba1b1-f174-4795-bf40-8e6efa000152.wav +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/35f61d31-a47a-4708-9524-850b8e8d7f0c.wav +0ms
generate Writing audio content to /var/folders/v6/hs1cz6wd6px8spzv37rzk5c40000gn/T/4e947f66-7d54-4db6-a5e6-3f5b5b3dc2f3.wav +0ms
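The limit in the error, 4194304 bytes, is 4 × 1024 × 1024, which is gRPC's default maximum receive message size; this response was 4380929 bytes, or 186625 bytes over. With --format pcm the audio is presumably uncompressed, and --speed 0.8 stretches it further, so a chunk that fits as MP3 can overflow as PCM. A rough sketch of the arithmetic; the 24000 Hz sample rate and the characters-per-second figure are assumptions the caller must supply, and safeCharacterLimit is hypothetical:

```javascript
// gRPC's default maximum receive size: 4 MiB.
const GRPC_DEFAULT_MAX_RECEIVE = 4 * 1024 * 1024;

// Seconds of uncompressed 16-bit PCM that fit in limitBytes.
function maxPcmSeconds(limitBytes, sampleRateHz, channels = 1, bytesPerSample = 2) {
  return limitBytes / (sampleRateHz * channels * bytesPerSample);
}

// Estimate a character limit per chunk that keeps the audio under the cap.
// charsPerSecond is a caller-supplied speech-rate estimate, not a constant.
function safeCharacterLimit({ limitBytes = GRPC_DEFAULT_MAX_RECEIVE, sampleRateHz, charsPerSecond, speed = 1 }) {
  const seconds = maxPcmSeconds(limitBytes, sampleRateHz);
  // A speed below 1 makes the audio longer, so fewer characters fit.
  return Math.floor(seconds * charsPerSecond * speed);
}
```

At 24000 Hz mono, the cap holds about 87 seconds of PCM audio.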

Use listr for progress indicator

Use https://github.com/SamVerschueren/listr to indent the list a bit and make the progress meter a bit easier to work with.

"--format pcm" doesn't work

The ffmpeg command fails when concatenating PCM files.

https://trac.ffmpeg.org/wiki/audio%20types
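Raw PCM has no header, so ffmpeg's concat with -c copy has nothing to detect the format from; it would need the layout spelled out (for example -f s16le -ar 16000 -ac 1). Alternatively, headerless segments can simply be appended byte by byte. A minimal sketch, assuming Polly-style signed 16-bit little-endian mono PCM at 16000 Hz; concatPcm and wavHeader are hypothetical, and the optional header makes the result playable as a WAV file:

```javascript
const fs = require('fs');

// Canonical 44-byte WAV header for 16-bit little-endian PCM.
function wavHeader(dataBytes, sampleRate = 16000, channels = 1) {
  const bitsPerSample = 16;
  const blockAlign = (channels * bitsPerSample) / 8;
  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + dataBytes, 4); // total size minus first 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16); // fmt chunk size
  header.writeUInt16LE(1, 20); // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // byte rate
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(dataBytes, 40);
  return header;
}

// Join headerless PCM segments in order; optionally prepend a WAV header.
function concatPcm(segmentPaths, { wav = false, sampleRate = 16000 } = {}) {
  const data = Buffer.concat(segmentPaths.map(p => fs.readFileSync(p)));
  return wav ? Buffer.concat([wavHeader(data.length, sampleRate), data]) : data;
}
```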

Add usage info

Add support for device profiles in GCP
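Google Cloud Text-to-Speech applies device profiles through the effectsProfileId list in audioConfig (values such as headphone-class-device). A minimal sketch of a request in the shape accepted by the synthesizeSpeech method of the @google-cloud/text-to-speech client; buildGcpRequest and its effects option are hypothetical:

```javascript
// Hypothetical builder for a synthesizeSpeech request.
function buildGcpRequest({ text, language = 'en-US', voice, encoding = 'MP3', effects = [] }) {
  const request = {
    input: { text },
    voice: { languageCode: language },
    audioConfig: { audioEncoding: encoding },
  };
  if (voice) request.voice.name = voice;
  if (effects.length > 0) {
    // Profiles are applied in the order given.
    request.audioConfig.effectsProfileId = effects;
  }
  return request;
}
```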

Negative pitch values fail in gcp mode: -2.0 fails, while +2.0 works fine

The generateSpeech options end up with a null pitch when a negative value such as -2.0 is passed for the pitch parameter. A positive value such as +2.0 works fine.

What is the exact command you are running? For your security, please "X" out any AWS access keys and secrets.

tts test.txt test.mp3 --service gcp --project-file xxxx.json --language en-uS --type ssml --voice en-US-Wavenet-B --pitch -2.0

tts-cli called with arguments {"2":0,"_":["test.txt","test.mp3"],"service":"gcp","project-file":"xxxx.json","language":"en-uS","type":"ssml","voice":"en-US-Wavenet-B","pitch":true} +0ms

What result are you seeing in the console? Copy & paste the exact output you get, with debugging turned on

tts-cli called with arguments {"2":0,"_":["test.txt","test.mp3"],"service":"gcp","project-file":"xxxx.json","language":"en-uS","type":"ssml","voice":"en-US-Wavenet-B","pitch":true} +0ms

generateSpeech Options: {"ffmpeg":"ffmpeg","format":"mp3","language":"en-uS","limit":5,"pitch":null,"projectFile":"xxxx.json","region":"us-east-1","type":"ssml","voice":"en-US-Wavenet-B"} +0ms

generate Error during request: 3 INVALID_ARGUMENT: Synthesizer RPC generic::invalid_argument: Invalid (or unsupported) synthesis parameters +0ms
generateAll Requested all parts, with error Error: 3 INVALID_ARGUMENT: Synthesizer RPC generic::invalid_argument: Invalid (or unsupported) synthesis parameters +0ms
generate Error during request: 3 INVALID_ARGUMENT: Synthesizer RPC generic::invalid_argument: Invalid (or unsupported) synthesis parameters +0ms
Error: 3 INVALID_ARGUMENT: Synthesizer RPC generic::invalid_argument: Invalid (or unsupported) synthesis parameters
at Object.exports.createStatusError (C:\Users\ElJefe\AppData\Roaming\npm\node_modules\tts-cli\node_modules\grpc\src\common.js:87:15)
at Object.onReceiveStatus (C:\Users\ElJefe\AppData\Roaming\npm\node_modules\tts-cli\node_modules\grpc\src\client_interceptors.js:1188:28)
at InterceptingListener._callNext (C:\Users\ElJefe\AppData\Roaming\npm\node_modules\tts-cli\node_modules\grpc\src\client_interceptors.js:564:42)
at InterceptingListener.onReceiveStatus (C:\Users\ElJefe\AppData\Roaming\npm\node_modules\tts-cli\node_modules\grpc\src\client_interceptors.js:614:8)
at callback (C:\Users\ElJefe\AppData\Roaming\npm\node_modules\tts-cli\node_modules\grpc\src\client_interceptors.js:841:24)

If copyright allows, please upload your input file somewhere (e.g. pastebin) and put a link to it here.
What OS are you using (Windows, OSX, Linux) and what version?
What version of Node.js is being used? (Run node -v in the console to find out.)
What version of ffmpeg is being used? (Run ffmpeg -version in the console to find out.)
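The parsed arguments ("2":0 and "pitch":true) suggest the argument parser read -2.0 as a separate flag rather than as the value of --pitch. With parsers in the minimist style, writing --pitch=-2.0 may avoid the ambiguity without code changes. A minimal sketch of a parser that accepts either form; readNumberOption is hypothetical:

```javascript
// Read a numeric option that may be negative, from either
// "--name=-2.0" or "--name -2.0".
function readNumberOption(argv, name) {
  const flag = `--${name}`;
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg.startsWith(`${flag}=`)) {
      return Number(arg.slice(flag.length + 1));
    }
    if (arg === flag && i + 1 < argv.length && argv[i + 1].trim() !== '') {
      const value = Number(argv[i + 1]);
      // Accept the next token even if it starts with "-", if it is numeric.
      if (!Number.isNaN(value)) return value;
    }
  }
  return undefined;
}
```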

Speed and pitch control for large documents

Looking for a way to control the speed and pitch of the Polly TTS conversion using aws-tts. When I use SSML tags in large documents, chunking up the SSML document causes errors. I want to use the tags at the beginning and end of a large document.

Thanks,

Dave
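One way around a single prosody tag spanning the whole document is to apply the same settings to every chunk after splitting, so no tag crosses a chunk boundary. A minimal sketch, assuming plain-text chunks and Polly's prosody attributes (for example rate="90%" and pitch="-10%", the latter also appearing in the SSML issue below); wrapChunksWithProsody is hypothetical:

```javascript
// Text nodes only need &, < and > escaped.
function escapeText(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Give every chunk its own complete <speak><prosody> wrapper.
function wrapChunksWithProsody(chunks, { rate = 'medium', pitch = 'medium' } = {}) {
  return chunks.map(chunk =>
    `<speak><prosody rate="${rate}" pitch="${pitch}">${escapeText(chunk)}</prosody></speak>`
  );
}
```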

Add support for Google Cloud Text-To-Speech

Give the option to use Google's TTS API instead of AWS Polly.

This would probably necessitate some name changes, but might make this tool more useful if it's generalized for various APIs.

Resources

Batch process- enhancement

A batch process that could load a list of documents to be run with a defined set of parameters for the entire run. Sometimes I need to process 200 to 300 documents, and since they all use the same parameters, it would be nice to do a batch run. A --batch option could pick up a text file with the document names and, by default, name each audio file after its document. I am not much of a programmer, so I don't know if this is possible.
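The naming part of this request is straightforward to sketch: read one input path per line and derive each output name from the document's base name. planBatch is hypothetical, and skipping lines starting with # is an added assumption; each resulting pair would then be converted with the same options:

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical --batch planner: one input path per line in listFile.
function planBatch(listFile, { format = 'mp3', outDir } = {}) {
  return fs.readFileSync(listFile, 'utf8')
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line && !line.startsWith('#')) // skip blanks and comments
    .map(input => {
      const base = path.basename(input, path.extname(input));
      const dir = outDir || path.dirname(input);
      return { input, output: path.join(dir, `${base}.${format}`) };
    });
}
```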

Document how to use with npx

So people don't (necessarily) need to install this package, document how to use it with npx (part of npm now).

Stuck on Convert to Audio 0/1

As the title suggests, with the latest version it just hangs at this step.

Is it working for others?

Guess input & output formats from the file extensions

Have the default formats for input and output based on their file extensions. E.g. aws-tts input.ssml output.mp3 would use SSML as the input format and MP3 as the output format. The --format or --type options would override these.
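A minimal sketch of the proposed defaults, mapping only the extensions that appear in these issues (.ssml, .txt, .mp3, .pcm); guessFormats and its tables are hypothetical. A null input stands for stdin, matching the "tts-cli input: null" debug line elsewhere on this page:

```javascript
const path = require('path');

// Hypothetical extension tables; explicit --type/--format always win.
const INPUT_TYPES = { '.ssml': 'ssml', '.txt': 'text' };
const OUTPUT_FORMATS = { '.mp3': 'mp3', '.pcm': 'pcm' };

function guessFormats(inputFile, outputFile, options = {}) {
  const inExt = inputFile ? path.extname(inputFile).toLowerCase() : '';
  const outExt = outputFile ? path.extname(outputFile).toLowerCase() : '';
  return {
    type: options.type || INPUT_TYPES[inExt] || 'text',
    format: options.format || OUTPUT_FORMATS[outExt] || 'mp3',
  };
}
```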

Allow AWS access keys to be specified as options

Add access-key and secret-key CLI options.

Retry if AWS request fails

Sometimes AWS has a server error: HTTPError: Response code 503 (Service Unavailable). If this is the case, maybe retry the request.
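A minimal retry sketch with exponential backoff that retries only on 5xx responses such as 503, so client errors like 400 still fail immediately. withRetry is hypothetical, and reading the status from error.statusCode is an assumption about how the HTTP client reports it:

```javascript
// Retry an async request on 5xx errors, doubling the delay each time.
async function withRetry(request, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (err) {
      const status = err && err.statusCode; // assumed error shape
      const retryable = status >= 500 && status < 600;
      if (!retryable || attempt >= retries) throw err;
      const delay = baseDelayMs * 2 ** attempt; // 500 ms, 1 s, 2 s, ...
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
```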

Add --help option

If used, the process should exit with a 0 code. Otherwise an error code (when not passing any options) is okay.

Beginner

Hello there,
can you do a quick video on how to set things up?
I have Windows. I downloaded the required files and encoder and got the AWS credentials, but didn't know how to get started using your aws-tts.

SyntaxError: Unexpected token

/usr/local/lib/node_modules/aws-tts/tts.js:10
const { checkUsage, compressSpace, generateSpeech, getSpinner, trim } = require('./lib');
^

SyntaxError: Unexpected token {
at exports.runInThisContext (vm.js:53:16)
at Module._compile (module.js:374:25)
at Object.Module._extensions..js (module.js:417:10)
at Module.load (module.js:344:32)
at Function.Module._load (module.js:301:12)
at Function.Module.runMain (module.js:442:10)
at startup (node.js:136:18)
at node.js:966:3

aws-tts started throwing the following errors on both my remote linux box and local windows machine

✔ Reading text
✔ Splitting text
✖ Convert to audio (4/11)
HTTPError: Response code 400 (Bad Request)
at EventEmitter.ee.on.res (/usr/lib/node_modules/aws-tts/node_modules/got/index.js:182:24)
at emitOne (events.js:96:13)
at EventEmitter.emit (events.js:188:7)
at Immediate.setImmediate (/usr/lib/node_modules/aws-tts/node_modules/got/index.js:61:8)
at runCallback (timers.js:672:20)
at tryOnImmediate (timers.js:645:5)
at processImmediate [as _immediateCallback] (timers.js:617:5)

These errors started on a local Windows box and a remote Linux box at the same time this evening.
I tested the Polly CLI on the Windows box and it works, so the issue seems to be in aws-tts.

Ask user to upgrade Node if necessary

see https://github.com/typicode/please-upgrade-node
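The "SyntaxError: Unexpected token {" report above comes from object destructuring in tts.js, which older Node releases cannot parse. The linked please-upgrade-node package addresses this; the idea is sketched below. The check is written in ES5 (var, no arrow functions or template literals) so an old Node can still parse it, and it would have to run before any modern syntax is loaded. The minimum major version of 6 is an assumption for illustration:

```javascript
// ES5 on purpose: this must parse on old Node versions.
var MIN_NODE_MAJOR = 6; // assumed minimum, for illustration

function isNodeSupported(version, minMajor) {
  var major = parseInt(String(version).replace(/^v/, '').split('.')[0], 10);
  return major >= minMajor;
}

function checkNodeVersion() {
  if (!isNodeSupported(process.versions.node, MIN_NODE_MAJOR)) {
    console.error('tts requires Node.js ' + MIN_NODE_MAJOR +
      ' or later; you have ' + process.version + '.');
    process.exit(1);
  }
}
```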

Invalid SSML Request - Please Help!

Thank you so much for your help with aws-tts!
I keep getting the following ERROR - Response code 400 (Bad Request): Invalid SSML request

What is the exact command you are running? For your security, please "X" out any AWS access keys and secrets. aws-tts tlsaeaudio1.txt tlsaeaudioc.mp3 --voice Matthew --type ssml --access-key XXXXXXXXXX --secret-key XXXXXXXXXX
What result are you seeing in the console? Copy & paste the exact output you get, with debugging turned on (see the Troubleshooting section for how to enable debugging).
C:\Users\Matt>aws-tts tlsaeaudio1.txt tlsaeaudioc.mp3 --voice Matthew --type ssml --access-key XXXXXXXXXX --secret-key XXXXXXXXXX
aws-tts called with arguments {"_":["tlsaeaudio1.txt","tlsaeaudioc.mp3"],"voice":"Matthew","type":"ssml","access-key":"XXXXXXXX","secret-key":"XXXXXXXX"} +0ms
aws-tts input: tlsaeaudio1.txt +4ms
aws-tts output: tlsaeaudioc.mp3 +1ms
Reading text
readText Reading from tlsaeaudio1.txt +0ms
readText Finished reading (10222 bytes) +0ms
Splitting text
chunkXml Started SAX XML parser +0ms
chunkXml Found tag: {"name":"speak","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found tag: {"name":"p","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found text: Section 1... +0ms
chunkXml Adding chunk:
Section 1
... +0ms
chunkXml Found closing tag: "p" +0ms
chunkXml Adding "p" to extra tags and popping the stack +0ms
chunkXml Found tag: {"name":"p","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found text: Introduction... +0ms
chunkXml Adding chunk:
Introduction
... +0ms
chunkXml Found closing tag: "p" +0ms
chunkXml Adding "p" to extra tags and popping the stack +0ms
chunkXml Found tag: {"name":"p","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found text: Course Overview and Objectives... +0ms
chunkXml Adding chunk:
Course Overview and Objectives</p... +0ms
chunkXml Found closing tag: "p" +0ms
chunkXml Adding "p" to extra tags and popping the stack +0ms
chunkXml Found tag: {"name":"p","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found text: Welcome to... +0ms
chunkXml Adding chunk:
Welcome to
... +0ms
chunkXml Found tag: {"name":"emphasis","attributes":{"level":"moderate"},"isSelfClosing":false} +0ms
chunkXml Found text: Wise... +0ms
chunkXml Adding chunk:
Wise</emphasi... +0ms
chunkXml Found closing tag: "emphasis" +0ms
chunkXml Adding "emphasis" to extra tags and popping the stack +0ms
chunkXml Found text: Traffic... +0ms
chunkXml Adding chunk:
Tr... +0ms
chunkXml Found tag: {"name":"prosody","attributes":{"pitch":"-10%"},"isSelfClosing":false} +0ms
chunkXml Found text: School’s... +0ms
chunkXml Adding chunk:
School’s... +0ms
chunkXml Found closing tag: "prosody" +0ms
chunkXml Adding "prosody" to extra tags and popping the stack +0ms
chunkXml Found text: Traffic Law and Substance Abuse Education Course. ... +0ms
chunkXml Adding chunk:
Traffic ... +0ms
chunkXml Found closing tag: "p" +0ms
chunkXml Adding "p" to extra tags and popping the stack +0ms
chunkXml Found tag: {"name":"p","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found text: Why does the State of Florida require a person who... +0ms
chunkXml Adding chunk:
Why does the State of Florida req... +0ms
chunkXml Found tag: {"name":"phoneme","attributes":{"alphabet":"ipa","ph":"laɪvz"},"isSelfClosing":false} +0ms
chunkXml Found text: lives... +0ms
chunkXml Adding chunk:
lives... +0ms
chunkXml Found closing tag: "phoneme" +0ms
chunkXml Adding "phoneme" to extra tags and popping the stack +0ms
chunkXml Found text: on our highways, and our state is particularly con... +0ms
chunkXml Adding chunk:
</pho... +0ms
chunkXml Found closing tag: "p" +0ms
chunkXml Adding "p" to extra tags and popping the stack +0ms
chunkXml Found tag: {"name":"p","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found text: This course has the following objectives: First, t... +0ms
chunkXml Adding chunk:
This course has the following obj... +0ms
chunkXml Found tag: {"name":"emphasis","attributes":{"level":"moderate"},"isSelfClosing":false} +0ms
chunkXml Found text: our... +0ms
chunkXml Adding chunk:
our</emphasis... +0ms
chunkXml Found closing tag: "emphasis" +0ms
chunkXml Adding "emphasis" to extra tags and popping the stack +0ms
chunkXml Found text: country, impaired driving is one of the leading ca... +0ms
chunkXml Adding chunk:
co... +0ms
chunkXml Adding chunk:
Third, to encourage students to make a c... +0ms
chunkXml Found closing tag: "p" +0ms
chunkXml Adding "p" to extra tags and popping the stack +0ms
chunkXml Found tag: {"name":"p","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found text: When a person is killed in a traffic crash, the pe... +0ms
chunkXml Adding chunk:
When a person is killed in a traf... +0ms
chunkXml Found closing tag: "p" +0ms
chunkXml Adding "p" to extra tags and popping the stack +0ms
chunkXml Found tag: {"name":"p","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found text: In... +0ms
chunkXml Adding chunk:
In
... +0ms
chunkXml Found tag: {"name":"emphasis","attributes":{"level":"moderate"},"isSelfClosing":false} +0ms
chunkXml Found text: our... +0ms
chunkXml Adding chunk:
our</emphasis... +0ms
chunkXml Found closing tag: "emphasis" +0ms
chunkXml Adding "emphasis" to extra tags and popping the stack +0ms
chunkXml Found text: state, there have been many examples of young, und... +0ms
chunkXml Adding chunk:
st... +0ms
chunkXml Adding chunk:
In this tragic crash, it was reported th... +0ms
chunkXml Found closing tag: "p" +0ms
chunkXml Adding "p" to extra tags and popping the stack +0ms
chunkXml Found tag: {"name":"p","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found text: Numerous laws and safe driving practices were viol... +0ms
chunkXml Adding chunk:
Numerous laws and safe driving pr... +0ms
chunkXml Found tag: {"name":"emphasis","attributes":{"level":"moderate"},"isSelfClosing":false} +0ms
chunkXml Found text: our... +0ms
chunkXml Adding chunk:
our</emphasis... +0ms
chunkXml Found closing tag: "emphasis" +0ms
chunkXml Adding "emphasis" to extra tags and popping the stack +0ms
chunkXml Found text: state, a driver is presumed to be impaired with a ... +0ms
chunkXml Adding chunk:
st... +0ms
chunkXml Adding chunk:
The combination of alcohol and speed res... +0ms
chunkXml Found closing tag: "p" +0ms
chunkXml Adding "p" to extra tags and popping the stack +0ms
chunkXml Found tag: {"name":"p","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found text: Of the many causative factors in this crash, drink... +0ms
chunkXml Adding chunk:
Of the many causative factors in ... +0ms
chunkXml Adding chunk:
It is sobering to realize that poor judg... +0ms
chunkXml Found closing tag: "p" +0ms
chunkXml Adding "p" to extra tags and popping the stack +0ms
chunkXml Found tag: {"name":"p","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found text: This course will address the following subjects: P... +0ms
chunkXml Adding chunk:
This course will address the foll... +0ms
chunkXml Found closing tag: "p" +0ms
chunkXml Adding "p" to extra tags and popping the stack +0ms
chunkXml Found tag: {"name":"p","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found text: Before we begin our discussion of the physiologica... +0ms
chunkXml Adding chunk:
Before we begin our discussion of... +0ms
chunkXml Found closing tag: "p" +0ms
chunkXml Adding "p" to extra tags and popping the stack +0ms
chunkXml Found tag: {"name":"p","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found text: Alcohol... +0ms
chunkXml Adding chunk:
Alcohol
... +0ms
chunkXml Found closing tag: "p" +0ms
chunkXml Adding "p" to extra tags and popping the stack +0ms
chunkXml Found tag: {"name":"p","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found text: According to the 2011 National Survey on Drug Use ... +0ms
chunkXml Adding chunk:
According to the 2011 National Su... +0ms
chunkXml Found tag: {"name":"break","attributes":{"time":"5s"},"isSelfClosing":true} +0ms
chunkXml Found closing tag: "break" +0ms
chunkXml Adding "break" to extra tags and popping the stack +0ms
chunkXml Found text: Alcohol is used by about half of those in the 18 t... +0ms
chunkXml Adding chunk:
Alcohol is used... +0ms
chunkXml Adding chunk:
This is about 14 times the 5% of young p... +0ms
chunkXml Found closing tag: "p" +0ms
chunkXml Adding "p" to extra tags and popping the stack +0ms
chunkXml Found tag: {"name":"p","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found text: Tobacco... +0ms
chunkXml Adding chunk:
Tobacco
... +0ms
chunkXml Found closing tag: "p" +0ms
chunkXml Adding "p" to extra tags and popping the stack +0ms
chunkXml Found tag: {"name":"p","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found text: The National Survey on Drug Use and Health also ga... +0ms
chunkXml Adding chunk:
The National Survey on Drug Use a... +0ms
chunkXml Found tag: {"name":"break","attributes":{"time":"2s"},"isSelfClosing":true} +0ms
chunkXml Found closing tag: "break" +0ms
chunkXml Adding "break" to extra tags and popping the stack +0ms
chunkXml Found text: As with young alcohol users, use of tobacco produc... +0ms
chunkXml Adding chunk:
As with young a... +0ms
chunkXml Found closing tag: "p" +0ms
chunkXml Adding "p" to extra tags and popping the stack +0ms
chunkXml Found tag: {"name":"p","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found text: Illicit Drugs... +0ms
chunkXml Adding chunk:
Illicit Drugs
... +0ms
chunkXml Found closing tag: "p" +0ms
chunkXml Adding "p" to extra tags and popping the stack +0ms
chunkXml Found tag: {"name":"p","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found text: The National Survey on Drug Use and Health lists t... +0ms
chunkXml Adding chunk:
The National Survey on Drug Use a... +0ms
chunkXml Adding chunk:
According to this survey the following a... +0ms
chunkXml Found tag: {"name":"break","attributes":{"time":"5s"},"isSelfClosing":true} +0ms
chunkXml Found closing tag: "break" +0ms
chunkXml Adding "break" to extra tags and popping the stack +0ms
chunkXml Found text: Alcohol, tobacco and drug use among young people c... +0ms
chunkXml Adding chunk:
Alcohol, tobacc... +0ms
chunkXml Adding chunk:
Also, 95% or more are not using other il... +0ms
chunkXml Found tag: {"name":"p","attributes":{},"isSelfClosing":false} +0ms
chunkXml Found closing tag: "p" +0ms
chunkXml Adding "p" to extra tags and popping the stack +0ms
chunkXml Found closing tag: "p" +0ms
chunkXml Adding "p" to extra tags and popping the stack +0ms
chunkXml Found closing tag: "speak" +0ms
chunkXml Adding "speak" to extra tags and popping the stack +0ms
chunkXml Reached end of XML +0ms
splitText Stripping whitespace +0ms
generateSpeech Options: {"access-key":"XXXXXXXX","ffmpeg":"ffmpeg","format":"mp3","limit":5,"region":"us-east-1","secret-key":"XXXXXXXX","type":"ssml","voice":"Matthew","":["tlsaeaudio1.txt","tlsaeaudioc.mp3"]} +0ms
createPolly Creating Polly instance in us-east-1 +0ms
Convert to audio (0/40)
generateAll Requesting 40 audio segments, 5 at a time +0ms
callAws Opening output stream to C:\Users\Matt\AppData\Local\Temp\56ccf6e0-57b5-4661-b517-fd3d1b83828b.mp3 +0ms
callAws Making request to https://polly.us-east-1.amazonaws.com/v1/speech?OutputFormat=mp3&Text=%3Cspeak%3...&TextType=ssml&VoiceId=Matthew&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...&X-Amz-Date=20180412T021421Z&X-Amz-Expires=1800&X-Amz-Signature=8...&X-Amz-SignedHeaders=host +0ms
callAws Opening output stream to C:\Users\Matt\AppData\Local\Temp\a27b5ceb-f9b2-405a-a192-d77cf177dcee.mp3 +0ms
callAws Making request to https://polly.us-east-1.amazonaws.com/v1/speech?OutputFormat=mp3&Text=%3Cspeak%3...&TextType=ssml&VoiceId=Matthew&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...&X-Amz-Date=20180412T021421Z&X-Amz-Expires=1800&X-Amz-Signature=1...&X-Amz-SignedHeaders=host +0ms
callAws Opening output stream to C:\Users\Matt\AppData\Local\Temp\224a2fc9-aae3-4287-8044-ccde2e996e45.mp3 +0ms
callAws Making request to https://polly.us-east-1.amazonaws.com/v1/speech?OutputFormat=mp3&Text=%3Cspeak%3...&TextType=ssml&VoiceId=Matthew&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...&X-Amz-Date=20180412T021421Z&X-Amz-Expires=1800&X-Amz-Signature=4...&X-Amz-SignedHeaders=host +0ms
callAws Opening output stream to C:\Users\Matt\AppData\Local\Temp\55c713aa-5d24-4930-bf0d-6369c5cfc30b.mp3 +0ms
callAws Making request to https://polly.us-east-1.amazonaws.com/v1/speech?OutputFormat=mp3&Text=%3Cspeak%3...&TextType=ssml&VoiceId=Matthew&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...&X-Amz-Date=20180412T021421Z&X-Amz-Expires=1800&X-Amz-Signature=e...&X-Amz-SignedHeaders=host +0ms
callAws Opening output stream to C:\Users\Matt\AppData\Local\Temp\08cc789b-d7fa-4f66-a308-f543645faf1b.mp3 +0ms
callAws Making request to https://polly.us-east-1.amazonaws.com/v1/speech?OutputFormat=mp3&Text=%3Cspeak%3...&TextType=ssml&VoiceId=Matthew&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...&X-Amz-Date=20180412T021421Z&X-Amz-Expires=1800&X-Amz-Signature=b...&X-Amz-SignedHeaders=host +0ms
callAws Closing output stream +0ms
generateAll Requested all parts, with error HTTPError: Response code 400 (Bad Request) +0ms
callAws Error during request: Response code 400 (Bad Request) +0ms
callAws Amazon responded with {"message":"Invalid SSML request"} +0ms
Response code 400 (Bad Request): Invalid SSML request
callAws Closing output stream +0ms
callAws Error during request: Response code 400 (Bad Request) +0ms
callAws Amazon responded with {"message":"Invalid SSML request"} +0ms
callAws Closing output stream +0ms
callAws Error during request: Response code 400 (Bad Request) +0ms
callAws Amazon responded with {"message":"Invalid SSML request"} +0ms
callAws Closing output stream +0ms
callAws Closing output stream +0ms
If copyright allows, please upload your input file somewhere (e.g. pastebin) and put a link to it here.
What OS are you using (Windows, OSX, Linux) and what version?
Windows
What version of Node.js is being used? (Run node -v in the console to find out.)
v8.9.4
What version of ffmpeg is being used? (Run ffmpeg -version in the console to find out.)
ffmpeg version N-90094-ga877d22d9a

Thank you so much for your help!

Matt
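Polly answers `Invalid SSML request` when the SSML it receives is malformed, so it can help to check every chunk locally and see which of the 40 segments fail. Below is a minimal Python sketch. It assumes you can capture the chunk strings the tool sends (the debug output above truncates them), and `check_ssml_chunk` is a hypothetical helper, not part of aws-tts. It only checks XML well-formedness and the `<speak>` root, not Polly's list of supported tags. ElementTree also rejects undeclared prefixes such as `amazon:`, so it is stricter than Polly for those tags.

```python
import xml.etree.ElementTree as ET


def check_ssml_chunk(chunk: str) -> list:
    """Return a list of problems found in one SSML chunk (empty if none)."""
    try:
        root = ET.fromstring(chunk)
    except ET.ParseError as err:
        # Unbalanced tags, a bare "&" or "<", etc. end up here.
        return [f"not well-formed XML: {err}"]
    # Drop any "{namespace}" prefix before comparing the root tag name.
    tag = root.tag.rsplit("}", 1)[-1]
    if tag != "speak":
        return [f"root element is <{tag}>, expected <speak>"]
    return []


if __name__ == "__main__":
    chunks = [
        '<speak><p>Section 1</p></speak>',
        '<speak><emphasis level="moderate">Wise</speak>',  # missing </emphasis>
    ]
    for i, chunk in enumerate(chunks):
        for problem in check_ssml_chunk(chunk):
            print(f"chunk {i}: {problem}")
```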

Throws error when running under Windows (ffmpeg)

Running the CLI under Windows throws an error at the ffmpeg step:

C:\>aws-tts test.txt test.mp3
√ Reading text
√ Splitting text
√ Convert to audio (1/1)
× Combine audio
Error: ffmpeg returned an error (1)
    at ChildProcess.ffmpeg.on.code (C:\Users\Eric\AppData\Roaming\npm\node_modules\aws-tts\lib.js:160:25)
    at emitTwo (events.js:106:13)
    at ChildProcess.emit (events.js:191:7)
    at maybeClose (internal/child_process.js:877:16)
    at Socket.<anonymous> (internal/child_process.js:334:11)
    at emitOne (events.js:96:13)
    at Socket.emit (events.js:188:7)
    at Pipe._handle.close [as _onclose] (net.js:498:12)
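`ffmpeg returned an error (1)` most likely means ffmpeg started but exited with status 1. The actual reason is in ffmpeg's own error output, which this message does not show. The Python sketch below surfaces that output. It assumes you rerun the failing ffmpeg command by hand. `run_and_report` is a hypothetical helper, and this report does not show the exact arguments aws-tts passes to ffmpeg.

```python
import shutil
import subprocess


def run_and_report(cmd: list) -> str:
    """Run cmd and return its stdout; on failure, raise with the tail of stderr."""
    # Distinguish "binary not found" from "binary ran and failed".
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]!r} was not found on PATH")
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        # ffmpeg prints a long banner first; the real reason is usually at the end.
        tail = "\n".join(proc.stderr.strip().splitlines()[-5:])
        raise RuntimeError(f"{cmd[0]} returned an error ({proc.returncode}):\n{tail}")
    return proc.stdout
```

For example, `run_and_report(["ffmpeg", "-version"])` confirms which ffmpeg is found. The same call with the failing arguments shows ffmpeg's last error lines instead of only the exit code.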

Allow the sample rate to be specified

AWS has a SampleRate parameter: http://docs.aws.amazon.com/polly/latest/dg/API_SynthesizeSpeech.html#polly-SynthesizeSpeech-request-SampleRate
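The linked request parameter takes the rate as a string, and the allowed values depend on the output format. Below is a Python sketch for use with boto3. The values are as listed in the Polly API reference at the time of writing (mp3 and ogg_vorbis: 8000, 16000, 22050, 24000; pcm: 8000, 16000). `synthesis_params` is a hypothetical helper, not an aws-tts option.

```python
# Sample rates Polly documents per output format (the API expects strings).
VALID_SAMPLE_RATES = {
    "mp3": {"8000", "16000", "22050", "24000"},
    "ogg_vorbis": {"8000", "16000", "22050", "24000"},
    "pcm": {"8000", "16000"},
}


def synthesis_params(text, voice, fmt="mp3", sample_rate=None):
    """Build keyword arguments for boto3's Polly synthesize_speech call."""
    params = {"Text": text, "VoiceId": voice, "OutputFormat": fmt}
    if sample_rate is not None:
        rate = str(sample_rate)  # e.g. 16000 -> "16000"
        allowed = VALID_SAMPLE_RATES.get(fmt)
        if allowed is None or rate not in allowed:
            raise ValueError(f"sample rate {rate} is not valid for {fmt}")
        params["SampleRate"] = rate
    return params
```

Usage: `boto3.client("polly", region_name="us-east-1").synthesize_speech(**synthesis_params("Hello", "Matthew", "mp3", 16000))`. When `SampleRate` is omitted, Polly uses its default for the format and voice.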

Add mp3 info + cover image to audio file
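aws-tts already depends on ffmpeg, which can write ID3 tags and attach a cover picture to an MP3 without re-encoding. The Python sketch below builds such a command, following the mp3 muxer example in the ffmpeg documentation. `tag_mp3_command` is a hypothetical helper, not an existing aws-tts feature.

```python
def tag_mp3_command(src, dst, cover=None, ffmpeg="ffmpeg", **tags):
    """Build an ffmpeg argument list that copies src to dst with ID3 tags.

    Keyword arguments such as title="..." become -metadata key=value pairs.
    """
    cmd = [ffmpeg, "-y", "-i", src]
    if cover:
        # Second input is the picture; keep the audio from input 0, image from input 1.
        cmd += ["-i", cover, "-map", "0:a", "-map", "1:v"]
    cmd += ["-c", "copy", "-id3v2_version", "3"]  # no re-encoding; ID3v2.3 tags
    for key, value in tags.items():
        cmd += ["-metadata", f"{key}={value}"]
    if cover:
        cmd += ["-metadata:s:v", "title=Album cover",
                "-metadata:s:v", "comment=Cover (front)"]
    cmd.append(dst)
    return cmd
```

Run it with `subprocess.run(tag_mp3_command("speech.mp3", "tagged.mp3", cover="cover.jpg", title="Chapter 1"), check=True)`. ffmpeg cannot overwrite its own input, so the output path must differ from the source.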

Make this a program!

I'm pretty much code illiterate, and your walkthrough was the only thing that let me use Polly the way I want. You could really do some good by creating a self-contained program with a GUI: something where someone can just enter their own AWS credentials, pick options, and drop in a .txt file.

--engine tag for Amazon neural voices returns standard voice audio instead of neural voice audio

The new --neural tag to select AWS neural voices is returning standard voice audio.
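By default, Polly requests use the standard engine. A request only produces neural audio when it sets `Engine` to `neural` and the voice supports that engine. One way to check a voice is the `SupportedEngines` list that Polly's `DescribeVoices` returns for each voice. The Python sketch below assumes boto3. `voices_supporting` is a hypothetical helper, and the sample data is made up for illustration.

```python
def voices_supporting(voices, engine="neural"):
    """Return sorted voice IDs whose SupportedEngines include `engine`.

    `voices` is the "Voices" list from a Polly describe_voices response.
    """
    return sorted(v["Id"] for v in voices if engine in v.get("SupportedEngines", []))


if __name__ == "__main__":
    # Made-up sample shaped like a describe_voices response, for illustration only.
    sample = [
        {"Id": "Matthew", "SupportedEngines": ["neural", "standard"]},
        {"Id": "Example", "SupportedEngines": ["standard"]},
    ]
    print(voices_supporting(sample))
```

With real credentials, the input would come from `boto3.client("polly").describe_voices()["Voices"]`. Note that the response can be paginated through `NextToken`.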

Text String + Access Key Issue

Hi,

Instead of giving a text file, can we not pass the text within the command and get it converted to speech?

example: aws-tts "hello, this is a test text to speech file using amazon polly" hello.mp3

Also, I am trying to add my AWS key, but it keeps returning the help prompt.

I'm using this command: aws-tts --access-key --myaccesskeyhere
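Compare the debug log of the earlier SSML report (`aws-tts tlsaeaudio1.txt tlsaeaudioc.mp3 --voice Matthew --type ssml --access-key ... --secret-key ...`). There, the input and output files are positional arguments, and each key is the value right after its option, with no leading `--` on the key itself. `--access-key --myaccesskeyhere` likely leaves the option without a value and the files missing, which would explain the help prompt. If the CLI only reads text from a file, one workaround is to write the string to a temporary file first. In the Python sketch below, both helper names are hypothetical.

```python
import tempfile


def text_to_temp_file(text: str) -> str:
    """Write text to a UTF-8 .txt file and return its path (caller deletes it)."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", encoding="utf-8", delete=False
    ) as f:
        f.write(text)
        return f.name


def aws_tts_command(input_file, output_file, access_key, secret_key):
    """Build an aws-tts argument list in the shape used in the log above."""
    return [
        "aws-tts", input_file, output_file,
        "--access-key", access_key,  # the key itself, without a leading "--"
        "--secret-key", secret_key,
    ]
```

Pass the list to `subprocess.run(..., check=True)` and delete the temporary file afterwards. On Windows, npm installs CLIs as `.cmd` shims, so you may need `shell=True` or the name `aws-tts.cmd`.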

SSML intermittent error

The code is:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import apv
import audioread
import boto3
from boto3 import client
from boto3 import Session
from botocore.exceptions import BotoCoreError, ClientError
import csv
import mysql.connector
from mysql.connector import Error
import os
import sys
import wave

access_key = apv.aws["aws_key1"]
secret_key = apv.aws["aws_key2"]

polly_client = boto3.Session(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    region_name='us-west-2').client('polly')
   
event_ID = sys.argv[1]
event = str(event_ID)

pathout = "/var/www/html/data/event"

db_host = apv.sql["host"]
db_name = apv.sql["name"]
db_user = apv.sql["user"]
db_pass = apv.sql["pass"]
db_port = apv.sql["port"]
        
db = mysql.connector.connect(
    host = db_host,
    database=db_name,
    user=db_user,
    password=db_pass,
    port=db_port)
    
sql_select_Query = "select * from Voice where event_ID = %s"
cursor = db.cursor()
cursor.execute(sql_select_Query, (event_ID,))
record = cursor.fetchall()
for row in record:
    voice = row[1]
    names = row[2]
    text1a = row[3]
    text1b = row[4]
text1 = text1a + " " + names + " " + text1b
    
#if voice == "Matthew" or voice == "Joanna":
#    spkOpen = '<speak><amazon:domain name="news">'
#    spkClose = '</amazon:domain></speak>'
#else:
spkOpen = '<speak>'
spkClose = '</speak>'
    
fileOut1 = pathout + event + '/event' + event + '-1.wav'
response1 = polly_client.synthesize_speech(VoiceId=voice,
    Engine='neural',
    OutputFormat='pcm',
    TextType='ssml',
    Text=spkOpen + text1 + spkClose)

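# Polly's pcm output is raw 16-bit mono samples (16000 Hz by default);
# wrap it in a WAV header. The with-block closes the file when done.
with wave.open(fileOut1, 'wb') as wav_file:
    wav_file.setparams((1, 2, 16000, 0, 'NONE', 'NONE'))
    wav_file.writeframes(response1['AudioStream'].read())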

This is obviously being run against a database. Some events work. Some don't.

I have set the name, text1a, and text1b values the same in two different events for test purposes. If the script works for an event in the database, it doesn't matter what I have name, text1a, and text1b set to, it works. Likewise, if the script does not work for an event, it doesn't matter what I have name, text1a, and text1b set to, it won't work.

For the case where it doesn't work, I get the following results when I run the script:

Traceback (most recent call last):
  File "/var/www/html/py-scripts/wed-voice1.py", line 67, in <module>
    Text=spkOpen + text1 + spkClose)
  File "/usr/lib/python3/dist-packages/botocore/client.py", line 316, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python3/dist-packages/botocore/client.py", line 635, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidSsmlException: An error occurred (InvalidSsmlException) when calling the SynthesizeSpeech operation: Invalid SSML request

Any idea what the problem might be?
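The failure follows the event rather than the three text values, so it is worth checking the exact string passed as `Text` for a failing event. Any `&`, `<` or `>` that reaches it unescaped makes the SSML invalid. The Python sketch below assumes the parts are plain text rather than SSML markup. `build_ssml` is a hypothetical helper: it escapes the parts with the standard library and checks that the result parses before it is sent to Polly.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_ssml(*parts: str) -> str:
    """Join plain-text parts with spaces, escape &, < and >, and wrap in <speak>."""
    body = " ".join(escape(part) for part in parts)
    ssml = f"<speak>{body}</speak>"
    # Raises xml.etree.ElementTree.ParseError if the result is still malformed.
    ET.fromstring(ssml)
    return ssml
```

In the script, this would replace the concatenation: `Text=build_ssml(text1a, names, text1b)`. Printing `repr()` of the final string for a working event and a failing one also makes hidden differences visible.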

Split out lib routines

Split out the code in lib.js into individual modules, to make testing and maintenance easier. This should allow more code coverage too (fewer /* istanbul ignore */ directives).

Engine String? (Neural vs Standard)

I was looking at the new voices Polly offers and was wondering how I would add the 'engine' string to the aws-tts command.

from: polly-SynthesizeSpeech-request-VoiceId

Engine

   Specifies the engine (standard or neural) for Amazon Polly to use when processing input text for speech synthesis. Using a voice that is not supported for the engine selected will result in an error.

   Type: String

   Valid Values: standard | neural
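For reference, this is what the parameter looks like on a direct Polly call rather than through aws-tts; no aws-tts option is assumed here. The Python sketch below targets boto3's `synthesize_speech`, which accepts `Engine` alongside `VoiceId`. `engine_params` is a hypothetical helper. `VALID_ENGINES` holds only the two values quoted above, although Polly has since added other engines.

```python
VALID_ENGINES = ("standard", "neural")  # the values quoted in the issue above


def engine_params(text, voice, engine="standard", supported=None):
    """Build synthesize_speech keyword arguments with an explicit Engine.

    `supported` optionally maps voice IDs to their SupportedEngines lists,
    so an unsupported voice/engine pair fails here with a clear message
    instead of as an error from Polly.
    """
    if engine not in VALID_ENGINES:
        raise ValueError(f"engine must be one of {VALID_ENGINES}, got {engine!r}")
    if supported is not None and engine not in supported.get(voice, []):
        raise ValueError(f"voice {voice!r} does not support the {engine} engine")
    return {"Engine": engine, "Text": text, "VoiceId": voice, "OutputFormat": "mp3"}
```

Usage: `boto3.client("polly").synthesize_speech(**engine_params("Hello", "Matthew", "neural"))`.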

Fail gracefully if lexicon can't be found

I was running aws-tts using a lexicon. Unfortunately, I uploaded the lexicon into the wrong region, so Polly wasn't able to locate the lexicon. The error that came out was:

✔ Reading text
✔ Splitting text
✖ Convert to audio (0/1)
HTTPError
at EventEmitter.ee.on.res (/usr/local/lib/node_modules/aws-tts/node_modules/got/index.js:182:24)
at emitOne (events.js:115:13)
at EventEmitter.emit (events.js:210:7)
at Immediate.setImmediate (/usr/local/lib/node_modules/aws-tts/node_modules/got/index.js:61:8)
at runCallback (timers.js:800:20)
at tryOnImmediate (timers.js:762:5)
at processImmediate [as _immediateCallback] (timers.js:733:5)

Searching Google for this error led me to Eric's website at https://ericheikes.com/text-speech-tool-aws-polly/#comment-63512. After reading that, though, someone might think their credentials are wrong when they are actually correct (as mine were). It would be really helpful if aws-tts could give more useful output when there's an error, if that's possible; I haven't checked whether AWS sends anything more useful back in its errors.
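When boto3 (rather than aws-tts) calls Polly, a failed request raises `botocore.exceptions.ClientError`. Its `response["Error"]` dict carries the error `Code` and `Message`, and SynthesizeSpeech documents codes such as `LexiconNotFoundException` and `InvalidSsmlException`. The Python sketch below turns those codes into hints, in the spirit of this request. `explain_error` and the hint texts are hypothetical, and the sketch does not show how aws-tts itself reports errors.

```python
# Hypothetical hints for a few error codes documented for Polly's SynthesizeSpeech.
HINTS = {
    "LexiconNotFoundException": (
        "Polly could not find the lexicon. Lexicons are stored per AWS region, "
        "so check that it was uploaded to the region used for this request."
    ),
    "InvalidSsmlException": "The SSML is invalid; check tags, attributes and escaping.",
    "TextLengthExceededException": "The text is too long for a single request.",
}


def explain_error(err: Exception) -> str:
    """Format a botocore ClientError (or any exception) with a hint if one is known."""
    # ClientError exposes the parsed AWS error as err.response["Error"].
    response = getattr(err, "response", None)
    error = response.get("Error", {}) if isinstance(response, dict) else {}
    code = error.get("Code", "Unknown")
    message = error.get("Message", str(err))
    hint = HINTS.get(code)
    return f"{code}: {message}" + (f"\n  Hint: {hint}" if hint else "")
```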

What result are you seeing in the console?
snapdragon:compiler initializing C:\Users\ElJefe\AppData\Roaming\npm\node_modules\tts-cli\node_modules\snapdragon\lib\compiler.js +16ms
snapdragon:parser initializing C:\Users\ElJefe\AppData\Roaming\npm\node_modules\tts-cli\node_modules\snapdragon\lib\parser.js +0ms
snapdragon:compiler initializing C:\Users\ElJefe\AppData\Roaming\npm\node_modules\tts-cli\node_modules\snapdragon\lib\compiler.js +0ms
snapdragon:parser initializing C:\Users\ElJefe\AppData\Roaming\npm\node_modules\tts-cli\node_modules\snapdragon\lib\parser.js +0ms
create Creating Google Cloud TTS instance +0ms
generateAll Requesting 41 audio segments, 5 at a time +0ms
generate Making request to Google Cloud Platform +0ms
generate Making request to Google Cloud Platform +0ms
generate Making request to Google Cloud Platform +0ms
generate Making request to Google Cloud Platform +0ms
generate Making request to Google Cloud Platform +0ms
Auth error:TypeError: URL is not a constructor
Auth error:TypeError: URL is not a constructor
Auth error:TypeError: URL is not a constructor
Auth error:TypeError: URL is not a constructor
Auth error:TypeError: URL is not a constructor
generate Error during request: 14 UNAVAILABLE: Getting metadata from plugin failed with error: URL is not a constructor +0ms
generateAll Requested all parts, with error Error: 14 UNAVAILABLE: Getting metadata from plugin failed with error: URL is not a constructor +0ms
generate Error during request: 14 UNAVAILABLE: Getting metadata from plugin failed with error: URL is not a constructor +0ms
generate Error during request: 14 UNAVAILABLE: Getting metadata from plugin failed with error: URL is not a constructor +0ms
generate Error during request: 14 UNAVAILABLE: Getting metadata from plugin failed with error: URL is not a constructor +0ms
generate Error during request: 14 UNAVAILABLE: Getting metadata from plugin failed with error: URL is not a constructor +0ms
Error: 14 UNAVAILABLE: Getting metadata from plugin failed with error: URL is not a constructor
at Object.exports.createStatusError (C:\Users\ElJefe\AppData\Roaming\npm\node_modules\tts-cli\node_modules\grpc\src\common.js:91:15)
at Object.onReceiveStatus (C:\Users\ElJefe\AppData\Roaming\npm\node_modules\tts-cli\node_modules\grpc\src\client_interceptors.js:1204:28)
at InterceptingListener._callNext (C:\Users\ElJefe\AppData\Roaming\npm\node_modules\tts-cli\node_modules\grpc\src\client_interceptors.js:568:42)
at InterceptingListener.onReceiveStatus (C:\Users\ElJefe\AppData\Roaming\npm\node_modules\tts-cli\node_modules\grpc\src\client_interceptors.js:618:8)
at callback (C:\Users\ElJefe\AppData\Roaming\npm\node_modules\tts-cli\node_modules\grpc\src\client_interceptors.js:845:24)
If copyright allows, please upload your input file somewhere (e.g. pastebin) and put a link to it here.
What OS are you using (Windows, OSX, Linux) and what version?
What version of Node.js is being used? (Run node -v in the console to find out.)
What version of ffmpeg is being used? (Run ffmpeg -version in the console to find out.)
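`TypeError: URL is not a constructor` typically appears when code expects the WHATWG `URL` class as a global. Node.js only provides it globally from version 10; older versions expose it only as `require('url').URL`. This report does not give the Node.js version, so treat this as an assumption to check first. The Python sketch below reads `node -v` output. `node_major` and `has_global_url` are hypothetical helpers.

```python
import re


def node_major(version_output: str) -> int:
    """Parse the major version from `node -v` output such as 'v8.9.4'."""
    match = re.match(r"v(\d+)\.", version_output.strip())
    if match is None:
        raise ValueError(f"unexpected node -v output: {version_output!r}")
    return int(match.group(1))


def has_global_url(version_output: str) -> bool:
    """True if this Node.js version provides URL as a global (v10 and later)."""
    return node_major(version_output) >= 10
```

Feed it `subprocess.run(["node", "-v"], capture_output=True, text=True).stdout`. If it returns False, upgrading Node.js is the first thing to try.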