msspeech's Introduction

msspeech

An unofficial API for Microsoft speech synthesis, based on the Read Aloud feature of the Microsoft Edge web browser.

Installation

pip install --upgrade msspeech

or

poetry add msspeech

After upgrading an already installed copy of the library, update the list of voices by running this command in your terminal:

msspeech-update-voices

or

poetry run msspeech-update-voices

Notes

Bad news

Since July 1, 2022, the list of voices and the API as a whole have been severely limited.

But there is also good news

Some male voices have been restored, new languages have been added, and emotional styles have appeared. However, although the styles are now listed in the JSON, you still cannot use them, because the SSML accepted by this API does not recognize them. SSML is very limited here, so there is no point in supporting it.

The official documentation does not apply to this API, which appears to use undocumented SSML markup.

https://docs.microsoft.com/ru-ru/azure/cognitive-services/speech-service/language-support#text-to-speech

Usage

The pitch and rate values are percentages from -100 to +100: each can be a negative number, a positive number, or zero for the default value.

Examples: -30, 40, 0

The volume should be a fractional number from 0.1 to 1.0, although in practice this setting has no effect, for reasons that are unclear.
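The ranges above can be checked before any request is sent. The following is a minimal sketch, not part of msspeech: the helper names `check_percent` and `check_volume` are hypothetical, and treating pitch and rate as integers is an assumption based on the examples given.

```python
def check_percent(name: str, value: int) -> int:
    """Return value unchanged if it is an integer percentage in [-100, 100]."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{name} must be an int, got {type(value).__name__}")
    if not -100 <= value <= 100:
        raise ValueError(f"{name} must be between -100 and 100, got {value}")
    return value


def check_volume(value: float) -> float:
    """Return value as a float if it lies in [0.1, 1.0]."""
    value = float(value)
    if not 0.1 <= value <= 1.0:
        raise ValueError(f"volume must be between 0.1 and 1.0, got {value}")
    return value
```

A caller could run these checks on user input before passing the values to `set_rate`, `set_pitch`, and `set_volume`.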

The maximum length of text that can be synthesized is approximately 31000 characters per request.
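Longer texts therefore have to be split across several requests. The sketch below shows one possible way to do that, assuming that breaking at whitespace is acceptable; `split_text` and `MAX_CHARS` are hypothetical names, not part of msspeech, and the limit is the approximate figure quoted above.

```python
from typing import List

MAX_CHARS = 31000  # approximate per-request limit quoted above


def split_text(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters, preferring whitespace breaks."""
    if limit < 1:
        raise ValueError("limit must be positive")
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        # Find the last space or newline inside the window; fall back to a hard cut.
        cut = max(rest.rfind(" ", 0, limit + 1), rest.rfind("\n", 0, limit + 1))
        if cut <= 0:
            cut = limit
        chunks.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        chunks.append(rest)
    return chunks
```

Each chunk can then be synthesized in its own request and the resulting audio files joined afterwards.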

From the CLI

Synthesize text:

msspeech Guy hello --filename audio.mp3

Update the list of voices:

msspeech-update-voices

From Python

import asyncio
from msspeech import MSSpeech


async def main():
	mss = MSSpeech()
	print("Getting voices...")
	voices = await mss.get_voices_list()
	print("Searching for a Russian voice...")
	for voice in voices:
		# Select by locale; if several voices match, the last one stays selected
		if voice["Locale"] == "ru-RU":
			print("Russian voice found:", voice["FriendlyName"])
			await mss.set_voice(voice["Name"])


	print("*" * 10)
	filename = "audio.mp3"
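	# Read the text from a file:
	# with open("s.txt", encoding="UTF8") as f: text:str = f.read()
	# Or write the text here (the Russian string below means "Or write the text here"):
	text = "Или написать текст здесь"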
	print("waiting...")
	await mss.set_rate(10)
	await mss.set_pitch(0)
	await mss.set_volume(1.0)
	await mss.synthesize(text.strip(), filename)
	print("*"*10)
	print("SUCCESS! OK!")
	print("*"*10)

if __name__ == "__main__":
	asyncio.run(main())
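The voice search in the example can be factored into a small helper. This is a sketch under the assumption that each entry returned by `get_voices_list()` is a dict with the `Locale`, `FriendlyName`, and `Name` keys used above; `find_voices` is a hypothetical helper, and the sample entries are invented for illustration only.

```python
from typing import Dict, List


def find_voices(voices: List[Dict[str, str]], locale: str) -> List[Dict[str, str]]:
    """Return every voice entry whose "Locale" equals the requested locale."""
    return [voice for voice in voices if voice.get("Locale") == locale]


# Invented sample data, shaped like the entries used in the example above.
sample_voices = [
    {"Name": "voice-a", "FriendlyName": "Voice A", "Locale": "ru-RU"},
    {"Name": "voice-b", "FriendlyName": "Voice B", "Locale": "en-US"},
    {"Name": "voice-c", "FriendlyName": "Voice C", "Locale": "ru-RU"},
]

russian = find_voices(sample_voices, "ru-RU")
print([voice["FriendlyName"] for voice in russian])
```

Returning the full list of matches lets the caller choose a specific voice explicitly instead of relying on whichever match the loop visits last.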

msspeech's People

Contributors

alekssamos, dependabot[bot]

msspeech's Issues

Does the CLI support reading text from a file?

I have a text file containing several sentences, one per line. Can this file be passed to the CLI so that it outputs multiple audio files, one for each line?
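The README does not say whether the CLI can do this. From Python, the behavior described in the question can be sketched as follows. `synthesize_lines` is a hypothetical helper, not part of msspeech; it takes any async callable with the `(text, filename)` shape used by `synthesize` in the README example, and the `line_001.mp3` naming scheme is an arbitrary choice.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable, List


async def synthesize_lines(
    text_path: str,
    synthesize: Callable[[str, str], Awaitable[object]],
    out_dir: str = ".",
) -> List[str]:
    """Synthesize each non-empty line of a text file into its own numbered file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    lines = Path(text_path).read_text(encoding="utf-8").splitlines()
    non_empty = [line.strip() for line in lines if line.strip()]
    written = []
    for number, line in enumerate(non_empty, start=1):
        filename = str(out / f"line_{number:03d}.mp3")
        await synthesize(line, filename)
        written.append(filename)
    return written


# Possible usage, assuming `mss` is set up as in the README example:
# asyncio.run(synthesize_lines("s.txt", mss.synthesize, "out"))
```

Passing the synthesizer in as a parameter keeps the helper independent of the library and easy to try with a stand-in function.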

Option to use the Narrator voices?

In new Windows 11 builds, Narrator can use the exact same voices that are available in Edge, but offline, and they seem to be more customizable. These voices install as Store apps, so they do not need an internet connection, unlike the Edge voices. Is it possible to get these integrated into this project?

(Problem and solution) Saving the audio to a data buffer

@alekssamos Thank you very much for this tool. I have only just discovered it, and it is perfect for some projects I am working on.

Problem:
When trying to save the audio to a data buffer instead of a file, the following error is displayed:

msspeech/__init__.py, line 520, in _synthesize

     bc += await f.write(resp[1])

TypeError: object int can't be used in 'await' expression

My code:

from io import BytesIO

file = BytesIO()
await mss.synthesize(data.text.strip(), file)
file.seek(0)

Solution:
Removing the await from that line in msspeech/__init__.py fixed it for me.
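The error occurs because `BytesIO.write` is an ordinary method that returns an int, which cannot be awaited. Removing the `await` fixes the buffer case, but it would presumably break any file object whose `write` is a coroutine. One way to accept both kinds is sketched below; `write_chunk` is a hypothetical helper, not the library's own code.

```python
import asyncio
import inspect
from io import BytesIO


async def write_chunk(f, data: bytes) -> int:
    """Write data to a sync or async file object and return the byte count."""
    result = f.write(data)
    # Async file objects return an awaitable; plain ones return the count directly.
    if inspect.isawaitable(result):
        result = await result
    return result if isinstance(result, int) else len(data)


async def demo() -> bytes:
    buffer = BytesIO()
    await write_chunk(buffer, b"abc")
    return buffer.getvalue()


print(asyncio.run(demo()))
```

With a helper like this, the same call works for an in-memory buffer and for an asynchronous file handle.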

Doesn't work with SoundLoader (kivy)

I have tried saving the file as .mp3, .wav, and .ogg. It makes no difference: Kivy fails to load the file and reports:

b'Unrecognized audio format'

from kivy.app import App
from kivy.core.audio import SoundLoader


class MusicPlayerApp(App):
    def build(self):
        # Load the audio file using SoundLoader
        sound = SoundLoader.load(R"D:\Documents\aaaa.ogg")
        # Play the audio file
        sound.play()
        return


if __name__ == '__main__':
    MusicPlayerApp().run()
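The report does not say which format the saved file actually contains, and changing the extension does not necessarily change the audio data. One way to narrow the problem down is to inspect the first bytes of the file. The sketch below is a rough diagnostic using a few well-known file signatures; `sniff_audio_format` and `sniff_file` are hypothetical helpers, and the MPEG frame-sync test is only a heuristic.

```python
def sniff_audio_format(header: bytes) -> str:
    """Guess the container or codec from the first bytes of an audio file."""
    if header[:4] == b"OggS":
        return "ogg"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:3] == b"ID3":
        return "mp3"  # MP3 file starting with an ID3v2 tag
    # MPEG audio frames start with 11 set bits (heuristic; other formats may match).
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 12 bytes of a file and guess its audio format."""
    with open(path, "rb") as f:
        return sniff_audio_format(f.read(12))
```

If the detected format differs from the file extension, renaming the file to match its content may be worth trying before looking further into the loader.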
