festvox / flite
A small fast portable speech synthesis system
License: Other
Is there any particular reason that windowed join is not implemented in Flite?
Are there any plans to include it in the future?
Is it possible in any way to include .flitevox voice(s) in the flite build?
The compilation section seems to allow configuring built-in voices through ./configure --with-langvox=transtac,
but I would like to compile in some high-quality .flitevox voices from http://www.festvox.org/flite/packed/flite-2.1/voices/ .
The reason is that I am building flite for ffmpeg, which does not let you specify a .flitevox voice file at runtime.
Hello,
Is there a guide for adding new language support to flite?
Hi, I'm using flite in my Linux C++ project, and I'm trying to use the built-in voice loading function:
extern "C"
{
cst_voice *cmu_us_slt(); // built in function
}
But there's a link error. Should I add more link flags besides -lflite?
Also, is the function name I'm using correct?
While trying to package flite version 2.1 for Debian¹, I noticed that three symbols (cst_read_2d_array, cst_read_array and cst_rx_not_indic) were dropped with respect to version 2.0. I was wondering if bumping of the soname was just forgotten or if there is anything else at stake.
Can you either bump the soname, or let me know what you think I should do instead?
This is a feature request.
Currently, when using streaming, a buffer large enough to hold the complete wav is allocated, and each chunk is accessed through the start parameter of the callback function.
int my_stream_chunk(const cst_wave *w, int start, int size,
                    int last, cst_audio_streaming_info *asi)
{
    // each call of this function gets the same buffer with a different start value
}
....
cst_audio_streaming_info *asi = cst_alloc(struct cst_audio_streaming_info_struct, 1);
asi->min_buffsize = 256;
asi->asc = my_stream_chunk;
asi->userdata = NULL;
feat_set(v->features, "streaming_info", audio_streaming_info_val(asi));
cst_wave *wav = flite_text_to_wave(text_to_synth, v);
delete_wave(wav);
On embedded systems, memory is limited and sometimes the whole wav is not required, for example when streaming to external devices.
The fact that space for the whole file is allocated limits the length of the text that can be sent for synthesis.
A different approach could be to allocate enough space for the largest chunk and reuse it each time my_stream_chunk is called.
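The idea can be sketched in Python terms (a hypothetical illustration, not flite code): allocate one chunk-sized buffer up front and reuse it for every chunk, instead of holding the whole waveform in memory.

```python
def stream_chunks(samples, chunk_size):
    """Yield chunks of `samples` through one reused fixed-size buffer."""
    buf = [0] * chunk_size  # allocated once, sized for the largest chunk
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        buf[:len(chunk)] = chunk  # overwrite the reused buffer in place
        yield buf[:len(chunk)]

# peak memory held by the consumer is one chunk, not the whole waveform
consumed = sum(len(c) for c in stream_chunks(list(range(10)), 4))
print(consumed)  # → 10
```

With this scheme the callback would see a small buffer per call instead of offsets into one complete waveform, so synthesis length is no longer bounded by available RAM.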
I have a list of 44k words and I want to create a lexicon from it. How can I do that?
I have a question, can you help me?
Why was cmudict cut down, with only 36,964 English words in cmu_lex_data_raw.c?
I know cmudict contains about 130,000 English words. I tested cmu_lts_model on the 36,964 words in cmu_lex_data_raw.c and it performed poorly, with about a 90% word error rate. Why does this happen? (Was cmu_lts_model trained on cmudict with those 36,964 words removed?) Can you help me? Thanks.
Forgive my poor English.
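For what it's worth, a word error rate like the one reported here is often inflated by stress digits: an LTS prediction of 'ow' counts as wrong against a reference 'ow0' under exact string comparison. A small hypothetical sketch of that effect:

```python
def word_error_rate(pairs):
    """Fraction of (predicted, reference) pronunciation pairs that differ."""
    return sum(1 for pred, ref in pairs if pred != ref) / len(pairs)

def strip_stress(pron):
    """Remove ARPAbet stress digits (0/1/2) for a fairer comparison."""
    return " ".join(p.rstrip("012") for p in pron.split())

pairs = [("k ow v iy n ax", "k ow0 v iy1 n ax0"),  # differs only by stress digits
         ("hh ax l ow", "hh ax l ow")]              # identical
print(word_error_rate(pairs))  # → 0.5 under exact matching
print(word_error_rate([(strip_stress(a), strip_stress(b)) for a, b in pairs]))  # → 0.0
```

It may be worth normalizing stress before measuring; that alone could account for part of the 90% figure.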
I'm really sorry if I've missed anything, but the readme doesn't seem to include a link to the library documentation. Where can I find it?
P.S.: I'm also looking for a list of the source files that make up the library itself. Is it just everything under src/?
I'm trying to make a Mycroft/Picroft respond in a voice like the classic BBC Dr Who baddie, a Dalek.
I started with the standard British male Mimic diphone voice; it's already pretty robotic, so it's well suited. For those who may be interested, I've altered it so that it does a passable Dalek impression, which has involved two main steps.
The first is to break up the response into individually delivered words (as in 'you ... will ... be ... exterminated') rather than running words together as in human speech. To do this on Mycroft, I've interrupted the code at the point where the response has been translated into text (/mycroft-core/mycroft/audio/speech.py, at 'def handle_speak(event):') and changed the code at the 'else' branch. Before I show any code, I should say that, while I've been coding for many years, I'm a complete newbie to Python (and Mycroft/Picroft); if I'm treading on toes or infringing things please let me know or delete this, and if you copy any of this you do so at your own risk (always make copies of the original files so that you can get back to the original code). This is what I changed it to:
else:
    # insert pauses ('. ') between words for that Dalek sound
    utterance = utterance.replace(" ", ". ")
    utterance = utterance.replace(",", ". . ")
    utterance = utterance + ". "
    mute_and_speak(utterance, ident, listen)
The second step was to add the Dalek electronic twang to the voice. After extensive Googling I found that this was originally created by passing the actor's voice through a 'ring modulator'(?). On another site (which I can't find at the moment, but the author deserves much of the credit for this bit) I found that a 'software only' approximation of ring modulation was to merge a sine wave with the original voice. A sawtooth wave is a decent approximation of a sine wave and, I thought, might be faster, so I chose that instead. Mycroft was reluctant to let me add the code as a separate module so, again, I've had to butcher the original code, in this case '/mycroft-core/mycroft/tts/tts.py' at 'def _execute(self, sentence, ident, listen):'. The code was changed (at the point shown) to:
if os.path.exists(wav_file):
    LOG.debug("TTS cache hit")
    phonemes = self.load_phonemes(key)
else:
    wav_file, phonemes = self.get_tts(sentence, wav_file)
    if phonemes:
        self.save_phonemes(key, phonemes)

vis = self.viseme(phonemes) if phonemes else None
try:
    tooth_w = 0.01
    tooth_h = 0.0
    ifile = wave.open(wav_file, 'rb')
    channels = ifile.getnchannels()
    frames = ifile.getnframes()
    width = ifile.getsampwidth()
    rate = ifile.getframerate()
    audio = ifile.readframes(frames)
    # remove the original file
    ifile.close()
    os.remove(wav_file)
    # convert the buffer to int16 using NumPy
    audio16 = numpy.frombuffer(audio, dtype=numpy.int16)
    empty16 = []
    h = 1
    d = tooth_w
    for x in audio16:
        n = x * h
        empty16.append(n)
        h = h - d
        if h > 1 or h < tooth_h:
            d = d * -1
    outarray = numpy.array(empty16, dtype=numpy.int16)
    dalek_file = wave.open(wav_file, 'wb')
    dalek_file.setnchannels(channels)
    dalek_file.setframerate(rate)
    dalek_file.setnframes(frames)
    dalek_file.setsampwidth(width)
    dalek_file.writeframes(outarray)
    dalek_file.close()
except Exception as e:
    print(e)
    print("NOT dalekified")
finally:
    self.queue.put((self.audio_ext, wav_file, vis, ident, l))
I also had to import the needed modules.
The tooth_h and tooth_w variables are the height and width of the sawtooth. I normally set tooth_h to 0, which means the sawtooth bounces back and forth between 1 and 0; the value deducted or added at each step is given by tooth_w (this should be between 0 and 1, preferably low), and the change in effect can be dramatic. There are hours of fun to be had messing about with tooth_w; there is a balance to be found between making it more 'Dalek' and keeping it intelligible.
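As a standalone illustration, the envelope loop above boils down to something like this (hypothetical toy values, pure Python rather than NumPy):

```python
def dalekify(samples, tooth_w=0.25, tooth_h=0.0):
    """Multiply samples by an amplitude envelope bouncing between tooth_h and 1."""
    out, h, d = [], 1.0, tooth_w
    for x in samples:
        out.append(int(x * h))
        h -= d
        if h > 1 or h < tooth_h:
            d = -d  # reverse direction at the envelope's limits
    return out

print(dalekify([100, 100, 100, 100, 100]))  # → [100, 75, 50, 25, 0]
```

A large tooth_w like 0.25 here makes the envelope visible over just five samples; at the 0.01 used above the ramp spans hundreds of samples, which is what produces the audible modulation.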
My problem is that adding the code at this point involves reopening the .wav file, getting all the frames, processing each one, and then rebuilding the file. This adds a 'noticeable' (read: irritating) delay to the response, probably at least doubling the original response delay. My understanding of diphone voices is that they are created by concatenating tiny speech sounds held in some sort of database in the original flitevox voice file. What would make it much faster would be to sawtooth each of these tiny fragments and return them to the file so that the Dalek voice was built in. Since each sawtoothed fragment would be the same size as the original, this shouldn't be a problem, if I could get at them. So my question is: is there an easy way to do this, or a complete description of the structure of a diphone file somewhere, or some kindly genius out there who could help? Cheers
I'm trying to find some more info about the library, I hope this is the right place to ask. I'm still very much a beginner when it comes to flite, so if anyone happens to know about any of this it would be incredibly helpful.
I'm attempting to get this library running on a resource-constrained platform, more specifically a 32-bit microcontroller with ~500 kB available RAM, 512kB ROM reserved for TTS, and plenty of flash storage. The plan is to output the resulting speech audio over i2s in real time.
About the following statement in the readme: "For standard diphone voices, maximum run time memory requirements are approximately less than twice the memory requirement for the waveform generated."
About the other memory requirements: as I understand it, core (60k) + USEnglish (100k) + lexicon (600k) + diphone (1800k) can all potentially be stored in ROM instead of RAM.
Any pointers in the right direction are welcome! Including possible approaches as to how I might find some answers myself.
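Using the sizes quoted in this post (which come from flite's README and are only approximate), a quick budget check suggests the static data fits in flash, leaving RAM for the generated waveform:

```python
# Approximate component sizes in kB as quoted above (assumptions, not measured).
rom_components = {"core": 60, "usenglish": 100, "lexicon": 600, "diphone": 1800}
rom_kb = sum(rom_components.values())
print(rom_kb)  # → 2560 kB of read-only data, a candidate for flash/ROM

# README rule of thumb: run-time RAM is roughly 2x the generated waveform.
# One second of 16 kHz, 16-bit mono audio:
wave_kb = 16000 * 2 / 1024
print(round(2 * wave_kb, 1))  # → 62.5 kB of RAM per second of audio
```

Under those assumptions, ~500 kB of RAM would cap a single utterance at several seconds unless streaming releases the buffer incrementally.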
Hi. I would really appreciate any help. I am trying to convert an indic clustergen voice built using festival to flite, but while running make in the flite directory I get:
flite-2.0.0-release/lang/cmu_indic_lang/cmu_indic_lang.c:89: undefined reference to us_tokentowords
Are there any compiled binaries that can be used out of the box for Windows? If yes, where can I download them? If not, why aren't there any?
Hi guys, great, great project. These voices are really amazing.
I've built flite 2.1 from source (probably one of the smoothest builds I've had on Linux), but I noticed that the .flitevox voices need to be downloaded instead of being built from source?
I was wondering what the procedure for building these voices is and where I can find the source code?
Using latest Linux openSUSE, when using flite (compiled from source) with voice clb with a command line such as:
padsp flite -voice ~/gitprogs/flite/voices/cmu_us_clb.flitevox "one one one two five"
there is a nasty distortion of sound after each "one" enunciation.
Placing other words before the "one one one" helps eliminate this, until at
"two three four five one one one two five"
the distortion disappears. It does not seem to be related to output volume and only occurs with voice clb, but it may be related to PulseAudio or some other factor. I'm wondering if this is a known issue, and what simple tweaks might help track the issue down to its source?
From google/language-resources#31. I cannot convert FestVox voices to Flite.
gcc -g -O2 -Wall -o flite_goog_th_unison flite_main.o flite_voice_list.o flite_lang_list.o -L . -lgoog_th_unison -lflite_cmu_th_lang -lflite_cmu_th_lex -L/usr/local/src/tools/flite/build/x86_64-linux-gnu/lib -lflite -lm
/usr/bin/ld: cannot find -lflite_cmu_th_lang
/usr/bin/ld: cannot find -lflite_cmu_th_lex
collect2: error: ld returned 1 exit status
Makefile:108: recipe for target 'flite_goog_th_unison' failed
make: *** [flite_goog_th_unison] Error 1
Hello, is it possible to create a new language from audio and text files? If so, I would like to know the workflow. I am interested in producing German voices, i.e. flite voice files.
Best regards,
Paul
I would like to point out that identifiers like "_CST_CG_H__" and "_FLITE_H__" do not fit the naming conventions expected by the C++ language standard (identifiers beginning with an underscore followed by an uppercase letter are reserved for the implementation).
Would you like to adjust your selection of unique names?
Hello, I am using indic voice to generate the audio
./flite/bin/flite "-voice" flite/voices/cmu_indic_hin_ab.flitevox 'पुत्र मित्र आदि सगे संबंधियों' "-o" 'try.wav'
The output file try.wav is always 16 kHz. However, the README.md mentions that the output is deliberately kept at 8 kHz. Does that not apply to non-US voices?
I think it would be good to flush the currently committed changes and make a new release :)
./t2p covina
pau k ow v iy1 n ax pau
cmudict-0.4.out
covina nil k ow0 v iy1 n ax0
Hello, I have some questions about stress (accent).
In the training data there are ow0 and ax0; there is no plain ow.
But when I use ./t2p to predict words, I found that t2p prints ow (not ow0)!
Could you help me? I want to know the details of how flite deals with stress.
Thanks.
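One way to sanity-check the t2p output above against the cmudict entry, ignoring stress digits, is a small hypothetical helper like this:

```python
def strip_stress(phones):
    """Drop trailing ARPAbet stress digits (0/1/2) from each phone."""
    return [p.rstrip("012") for p in phones]

t2p_out = "pau k ow v iy1 n ax pau".split()
cmudict = "k ow0 v iy1 n ax0".split()

# drop the leading/trailing silence markers (pau) before comparing
print(strip_stress(t2p_out[1:-1]) == strip_stress(cmudict))  # → True
```

So the two pronunciations of "covina" agree once stress digits are removed; the open question is only why t2p emits unstressed phones without a digit.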
Dear,
Could you kindly explain the reason for the issue stated in the title?
Thanks!
Is this code compatible with a real-time operating system (Nucleus RTOS), and what are the RAM/ROM size requirements?
When running flite-2.2 test on a big-endian arch (s390x), I'm getting this error:
$ cd flite-2.2
$ LD_LIBRARY_PATH=/builddir/build/BUILDROOT/flite-2.2-1.fc36.s390x/usr/lib64
$ make -C testsuite do_thread_test
make: Entering directory '/builddir/build/BUILD/flite-2.2/testsuite'
gcc -fopenmp -o multi_thread multi_thread_main.c \
-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=zEC12 -mtune=z13 -fasynchronous-unwind-tables -fstack-clash-protection -Wall -DWORDS_BIGENDIAN=1 -I../include -L../build/s390x-linux-gnu/lib -lflite -Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -lm -lpulse-simple -lpulse \
-l flite_cmu_us_slt -lflite_cmulex -lflite_usenglish \
-lflite -lm -lasound -lgomp
export OMP_NUM_THREADS=100 && ./multi_thread
VAL: tried to access car in 1023 typed val
VAL: tried to access car in 1023 typed val
make: *** [Makefile:89: do_thread_test] Error 255
make: Leaving directory '/builddir/build/BUILD/flite-2.2/testsuite'
The current branch does not install on macOS. The reason is that the cp command hard-coded in the Makefile of the ./main directory uses flags not supported on Mac:
cp -pd ...
This can be solved by replacing the flags with -r in the "Darwin" case. Even though -r has totally different semantics, it can be used as a replacement in this particular case.
UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Darwin)
CP_FLAGS='-r'
else
CP_FLAGS='-pd'
endif
.....
cp $(CP_FLAGS) $(flite_LIBS_deps) $(DESTDIR)$(INSTALLLIBDIR)
hi,
Thanks for sharing this great project,
but as a newcomer I find it a little difficult to build for Android.
Any advice or suggestions would be appreciated.
Thx
Hello!
I loaded flite in VS2017.
VS2017 did some upgrades, and now I have
fliteDLL
cmu_us_rms
There is also the project "cmu_us_slt", but it's marked as "(not available)".
I went into the folder "flite-master\lang\cmu_us_slt", and I didn't see a vcprj or vcproj file there.
What is the "cmu_us_slt" about, and do I not need it?
Line 64 in 6c9f20d
make_cart.scm is not available.
It could be useful for starting to convert a voice to a new language.
Hi all,
Awesome small library. I do have a question: is it possible to add support for a new language? If yes, how?
Many thanks!
Hi
Is there any way to convert a voice built using Multisyn in Festival to flite? I can't seem to find a way to do it.
When compiling flite-2.2 on Fedora development branch (rawhide/f36), I'm getting the following warnings:
making ../build/x86_64-linux-gnu/lib/libflite_cmulex.so
../../lang/cmulex/cmu_lex.c:49:27: warning: type of 'cmu_lex_phone_table' does not match original declaration [-Wlto-type-mismatch]
49 | extern const char * const cmu_lex_phone_table[54];
| ^
../../lang/cmulex/cmu_lex_entries.c:14:20: note: array types have different bounds
14 | const char * const cmu_lex_phone_table[57] =
| ^
../../lang/cmulex/cmu_lex_entries.c:14:20: note: 'cmu_lex_phone_table' was previously declared here
...
making ../build/x86_64-linux-gnu/lib/libflite_cmu_grapheme_lex.so
../../lang/cmu_grapheme_lex/cmu_grapheme_lex.h:47:27: warning: type of 'unicode_sampa_mapping' does not match original declaration [-Wlto-type-mismatch]
47 | extern const char * const unicode_sampa_mapping[16663][5];
| ^
../../lang/cmu_grapheme_lex/grapheme_unitran_tables.c:9:20: note: array types have different bounds
9 | const char * const unicode_sampa_mapping[16798][5] =
| ^
../../lang/cmu_grapheme_lex/grapheme_unitran_tables.c:9:20: note: 'unicode_sampa_mapping' was previously declared here
Hey everyone, so I'm stuck on a problem: I need to send user-inputted text through Flite, and then display the original text on screen with synced up word highlighting. The problem is that when a token gets expanded into multiple words (1983 -> Nineteen Eighty Three) I can't find a way to keep these words "grouped" together so that I can then sync all three words up to the original highlighted token "1983". I've tried modifying the us_tokentowords function so that it returns all the words in a single string, but I can't quite get it to work. Has anyone here come up with a solution to any similar problems? Any help would be much appreciated, thanks!
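One possible approach (a hypothetical sketch; `expand` here merely stands in for flite's us_tokentowords, which actually lives in C): record, for each input token, where its expanded words start in the word stream and how many there are, so highlighting "1983" can cover all three synthesized words.

```python
def expand(token):
    # stand-in for flite's us_tokentowords expansion (hypothetical)
    return {"1983": ["nineteen", "eighty", "three"]}.get(token, [token])

def token_word_map(tokens):
    """Return (token, first_word_index, word_count) triples plus the flat word list."""
    mapping, words = [], []
    for tok in tokens:
        expanded = expand(tok)
        mapping.append((tok, len(words), len(expanded)))
        words.extend(expanded)
    return mapping, words

mapping, words = token_word_map(["in", "1983", "it", "began"])
print(mapping)  # → [('in', 0, 1), ('1983', 1, 3), ('it', 4, 1), ('began', 5, 1)]
print(words)    # → ['in', 'nineteen', 'eighty', 'three', 'it', 'began']
```

With such a map, the highlighter stays on "1983" while word indices 1 through 3 are being spoken, without changing the synthesized word stream itself.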
Is there some way like a configuration file to set global defaults for things like the voice?
I am trying to use flite as the TTS backend for Okular (document viewer), but I'm unable to use a voice other than the default kal16.
As the title says: if I want to add more words that are not included in the default dictionary, what should I do?
There's a memory leak in the function "ffeature_string"; can you fix it?
ccache clang -mtune=generic -O2 -pipe -Wall -DCST_NO_SOCKETS -DUNDER_WINDOWS -DWIN32 -D_FORTIFY_SOURCE=0 -D__USE_MINGW_ANSI_STDIO=1 -I../include -c -o find_sts_main.o find_sts_main.c
In file included from find_sts_main.c:47:
In file included from ../include\cst_args.h:43:
In file included from ../include/cst_features.h:44:
In file included from ../include/cst_val.h:43:
In file included from ../include/cst_file.h:63:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\x86_64-w64-mingw32\include\windows.h:69:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\x86_64-w64-mingw32\include\windef.h:8:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\x86_64-w64-mingw32\include\minwindef.h:163:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\x86_64-w64-mingw32\include\winnt.h:1554:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\x86intrin.h:15:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\immintrin.h:18:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\xmmintrin.h:3005:
D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\emmintrin.h:4224:6: error: conflicting types for '_mm_clflush'
void _mm_clflush(void const * __p);
^
D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\emmintrin.h:4224:6: note: '_mm_clflush' is a builtin with type 'void (const void *)'
1 error generated.
Because const is defined to empty before any system header is included, the declarations of builtin functions break:
https://github.com/festvox/flite/blob/master/tools/find_sts_main.c#L45
Maybe to Soundcloud.
This is a ToDo and I am hoping to get to this the last weekend in October. The idea is to build a tutorial describing the procedure to build a deployable voice in one Indian language that can expose the API capabilities of flite.
Some changes that were introduced to the voice templates (mostly for grapheme voices) now break builds of indic voices.
In particular, this does not get defined in indic voices, since they are part of indic lang.
configure --enable-shared doesn't seem to enable building of shared libraries on macOS, while it works fine on Linux.
Tested on Catalina (x86_64) and Monterey (arm64).
I know that WASI is currently experimental as a compilation target, but nonetheless I did manage to get flite compiled to it and running using the CraneStation/wasmtime runtime. If it's of any interest, I'd be more than happy to submit a preliminary PR and work on getting it merged into master. Also, I'd be more than happy to monitor future changes to WASI and submit any relevant updates.
I need to use flite to convert text to phonemes (hello → [HH EH L OW]). How can I build just the part of the project that does that? I need only that part of the code.
Really, I need a small, fast run-time toolkit to use as a front end converting text to .lab files, for testing new sentences in HTS.
Where can I download a demo with different voices before I try to compile flite?
Line 71 in 7c1994b
Hey there!
Festival has support for Mbrola voices, which is pretty cool. I'd like to know whether it's possible to use them here in Flite too? I know there's a way of converting festvox to flitevox files, but I'm not sure how Mbrola is handled.
Thanks!
Hello,
After putting .flitevox files in /usr/share/flite, I would have assumed that
flite -voicedir /usr/share/flite -lv
would list the voices stored in /usr/share/flite, but that is not actually working.
It would be really useful to have this, so that Linux distributions can just store voices there and make them available to users, without users having to understand the inner workings of voice paths etc.
Samuel
Although flite is very fast, it sounds like the words are run together while speaking; the whitespace that should separate the words is hard to hear in the generated audio.
For example this line: when spoken, the words are run together (I have found that for almost all words in a sentence). I can recognize the words when I'm looking at the text, but it's hard when the text is not there.
flite -ps -t "hello my name is John Doe!"
Outputs:
pau hh ax l ow m ay n ey m ih z jh aa n d ow pau
And when spoken (without the -ps flag), the sound is exactly like that. The pauses are only between sentences and not between words.
I tried to look through the documentation and found nothing; I tried to look through the code to see if I could increase the pause duration, but I couldn't find anything at all.
I find it hard to imagine I'm the only one who has noticed this, but I couldn't find anything on it, so I'm opening this issue.
flite-2.3-current Mar 2022 (http://cmuflite.org)
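As a possible workaround until pause duration is configurable (an untested sketch, in the same spirit as the pause-insertion trick in the Dalek post above): pre-process the input so each word boundary becomes a short sentence break before handing the text to flite.

```python
def add_word_pauses(text):
    """Turn each space into '. ' so flite inserts a short pause per word."""
    return text.replace(" ", ". ") + "."

print(add_word_pauses("hello my name is John Doe"))
# → hello. my. name. is. John. Doe.
```

This trades naturalness for separation, so it is only a stopgap; a real fix would adjust pause durations inside flite's prosody model.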
I have a Swedish Festival TTS and I need to include this language in Flite, but when I try, there are always some libraries that I don't have or that the program cannot find.
I downloaded the Swedish TTS from here --> http://person2.sol.lu.se/JohanFrid/festival/download.html
And I used these commands:
$FLITEDIR/tools/setup_flite
./bin/build_flite cg
cd flite
make
I can compile and run flite OK on macOS (10.14.6), but I can only generate .wav files - it won't speak text from the command line or a file, as in this example:
./bin/flite doc/alice
What do I need to set up to be able to do that? I don't see a way to select an audio device.
Using flite-2.0.7-current Jul 2017 on openSUSE Leap 15. Did git pull to ensure the latest code is installed.
PulseAudio controls where my audio is sent. Using padsp with flite works fine; audio goes to the right device. The flite docs indicate that pulse-simple can be compiled in. Experimentation shows I get the fewest errors on compile with ./configure --with-audio=pulseaudio, and the link message indicates that pulse and pulse-simple are linked in. But when I run the new flite without padsp, flite still complains that it cannot find /dev/dsp, so I guess I am missing a little detail, or my understanding of what is supposed to happen is incomplete.