
flite's Introduction

              "Building Voices in Festival"
 Alan W Black ([email protected]), Kevin Lenzo ([email protected])
                 and see ACKNOWLEDGEMENTS
                  http://www.festvox.org

For full details about voice building see the document itself

http://festvox.org/bsv/

The included documentation, scripts and examples should be sufficient for an interested person to build their own synthetic voices, in currently supported languages or in new ones, with the University of Edinburgh's Festival Speech Synthesis System. The quality of the result depends greatly on the time and skill of the builder. For English it may be possible to build a new voice in a couple of days' work; a new language may take months or years. It should be noted that even the best voices in Festival (or any other speech synthesis system, for that matter) are still nowhere near perfect quality.

This distribution includes

Support for designing, recording and autolabelling statistical parametric
    synthesis voices
Support for designing, recording and autolabelling diphone databases
Support for designing, recording and autolabelling unit selection databases
Building simple limited domain synthesis engines
Support for building rule driven and data driven prosody models
   (duration, intonation and phrasing)
Support for building rule driven and data driven text analysis
Lexicon building and Letter to Sound rule support
Predefined scripts for building new US (and UK) English voices
Predefined scripts for building grapheme(++) voices for any language
Scripts for designing and selecting prompts to record for
   arbitrary languages

New in 2.8

https://github.com/festvox/festival/
Grapheme-built voices can be converted to .flitevox files for Android
Database size reduction for random forest clustergen voices
Random Forests for F0 prediction too
18 English voices, and 13 Indic voices

New in 2.7

Random forest model building for spectrum and duration in clustergen
Grapheme-based synthesizers (with specific support for a large number
  of Unicode writing systems)
Clustergen state and stop value optimization
Wavesurfer label support
SPAM F0 support
Phrase break support
Support for SPTK's mgc parameterization

New in 2.3

Support for Cygwin tools under Windows
Substantially improved CLUSTERGEN support with mlpg and mlsa

WARNING

This is not a pointy-clicky, plug-and-play program for building new voices. It is a set of instructions, with discussion of the problems, and an attempt to document the expertise we have gained in building other voices. Although we have tried to automate the task as much as possible, this is no substitute for careful correction and an understanding of the processes involved. There are significant pointers into the literature throughout the document that allow for more detailed study and further reading.

REQUIREMENTS

A Unix Machine

Although there is nothing inherently Unix-specific about the scripts,
no attempt has yet been made to port them to other platforms.

Festival and Speech Tools

This uses Speech Tools programs and Festival itself at various
stages in building voices, as well as (of course) for the final
voices.  Festival and the Edinburgh Speech Tools are available from

   http://www.cstr.ed.ac.uk/projects/festival/
   
or

   http://www.festvox.org/festival

or

   https://github.com/festvox
   
It is recommended that you compile your own versions of these,
as you will need the libraries and include files to build some
of the programs in this festvox distribution.

Wavesurfer

To display waveforms, spectrograms and phoneme labels.

Patience and understanding

Building a new voice is a lot of work, and something will probably
go wrong, which may require repeating some long, boring and
tedious process.  Even with lots of care a new voice still might
just not work.  In distributing this document we hope to increase the
basic knowledge of synthesis out there, and hopefully to find people
who can improve on this, making the process easier and more reliable
in the future.

INSTALLATION

You must have the Edinburgh Speech Tools and Festival installed before you can build the tools in the festvox distribution.

Unpack festvox-2.8-release.tar.gz or clone it from GitHub:

git clone https://github.com/festvox/festvox
cd festvox
./configure
make

The configuration basically tries to find your version of the Edinburgh Speech Tools and uses its configuration to set the compiler type etc., so you must have that installed. If configure fails, try explicitly setting your ESTDIR environment variable to point to your compiled version of the Speech Tools.
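For example (using the same illustrative path as in the environment settings below):

export ESTDIR=/home/awb/projects/speech_tools
./configure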

Pre-generated versions of the document in HTML and PostScript are distributed in the html/ directory.

If you need to build the document itself, you will need a working version of the DocBook tools, which may (or may not) already be installed on your system.

To build the documentation:

cd docbook
make doc

Note that even if the documentation build fails you can still use all the scripts and programs.

To use the scripts and programs in the festvox distribution, each user is expected to have the environment variables ESTDIR and FESTVOXDIR set, for example as follows (if you use bash, zsh, ksh or sh):

export ESTDIR=/home/awb/projects/speech_tools
export FESTVOXDIR=/home/awb/projects/festvox
export FLITEDIR=/home/awb/projects/flite
export SPTKDIR=/home/awb/projects/SPTK

Or if you use csh or tcsh:

setenv ESTDIR /home/awb/projects/speech_tools
setenv FESTVOXDIR /home/awb/projects/festvox
setenv FLITEDIR /home/awb/projects/flite
setenv SPTKDIR /home/awb/projects/SPTK

Remember to set these to where your installations are, not ours.

flite's People

Contributors

asmaloney, awbcmu, doublebuffered, earboxer, elbeeo, festvox, ffontaine, happyalu, krishnshyam, kubkon, kubo, lenzo-ka, mindavi, paulgevers, pummelo65, pzrq, rathann, saikrishnarallabandi, sthibaul, ycdtosa, zeehio


flite's Issues

cmu_us_slt couldn't be loaded

Hello!

I loaded flite in VS2017.
VS2017 did some upgrades, and now I have

fliteDLL
cmu_us_rms

There is also the project "cmu_us_slt", but it's marked as "(not available)".

I went into the folder "flite-master\lang\cmu_us_slt", and I didn't see a vcprj or vcproj file there.

What is the "cmu_us_slt" about, and do I not need it?

about english cmu_lts_model.c and cmu_lex_data_raw.c

I have a question; can you help me?
Why did you cut down cmudict to only 36,964 English words in cmu_lex_data_raw.c?
I know cmudict contains about 130,000 English words. I tested cmu_lts_model on the 36,964 words in cmu_lex_data_raw.c, and it performed poorly, with about a 90% word error rate. Why does this happen? (Is cmu_lts_model trained on cmudict with those 36,964 words removed?) Can you help me? Thanks.
Forgive my poor English.

MBROLA voices?

Hey there!

Festival has support for MBROLA voices, which is pretty cool. I'd like to know whether it's possible to use them here in Flite too. I know there's a way of converting festvox to flitevox files, but I'm not sure how MBROLA is handled.

Thanks!

listing voices from a voicedir

Hello,

After putting .flitevox files in /usr/share/flite, I would have assumed that

flite -voicedir /usr/share/flite -lv

would have listed the voices stored in /usr/share/flite, but that is not actually working.

It would be really useful to have this so that Linux distributions can just store voices there and have them available to users, without users having to understand the internals of voice paths etc.

Samuel
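(A workaround rather than a fix: a .flitevox file can be loaded by passing its full path to -voice, as other reports on this page do, e.g.

flite -voice /usr/share/flite/cmu_us_slt.flitevox -t "hello"

where cmu_us_slt.flitevox is just an illustrative filename.)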

[2.1] symbols removed but no soname bump

While trying to package flite version 2.1 for Debian¹, I noticed that three symbols (cst_read_2d_array, cst_read_array and cst_rx_not_indic) were dropped with respect to version 2.0. I was wondering whether bumping the soname was simply forgotten, or whether there is anything else at stake.

Can you either bump the soname, or let me know what you think I should do instead?

¹ https://www.debian.org/

How to add a new language?

Hi all,

Awesome small library. I do have a question: is it possible to add support for a new language? If yes, how?

Many Thanks!

Windowed join functions

Is there any particular reason that windowed join is not implemented in Flite?

Are there any plans to include it in the future?

add new language from scratch

Hello, is it possible to create a new language from audio and text files? If so, I would like to know the workflow. I am interested in producing German voices, i.e. flite files.

best regards
Paul

Add support for wasm32-wasi target

I know that WASI is currently experimental as a compilation target, but nonetheless, I did manage to get flite compiled to it and run using CraneStation/wasmtime runtime. If it's of any interest, I'd be more than happy to submit a preliminary PR and work on it to get it merged into master. Also, I'd be more than happy to monitor any changes to WASI in the future and submit any relevant updates.

Library docs

I'm really sorry if I've missed something, but it seems that the readme doesn't include a link to the library documentation. Where can I find it?

P.S.: I'm also looking for a list of the source files that are part of the library itself. Is it just everything under src/?

Avoid allocation of a buffer for the whole WAV file when streaming

This is a feature request.

Currently, when using streaming, a buffer large enough to hold the complete wav is allocated, and each chunk is accessed through the start parameter of the callback function.

int my_stream_chunk(const cst_wave *w, int start, int size,
                    int last, cst_audio_streaming_info *asi)
{
    /* each call of this function sees the same buffer with a different start value */
    return CST_AUDIO_STREAM_CONT;  /* tell flite to keep streaming */
}

....
cst_audio_streaming_info *asi = cst_alloc(struct cst_audio_streaming_info_struct,1);
asi->min_buffsize = 256;
asi->asc = my_stream_chunk;
asi->userdata = NULL;

feat_set(v->features,"streaming_info",audio_streaming_info_val(asi));
cst_wave * wav = flite_text_to_wave(text_to_synth,v);
delete_wave(wav);

On embedded systems, memory is limited and sometimes the whole wav file is not required. For example when streaming to external devices.

The fact that space for the whole file is allocated limits the length of the text that can be sent for synthesis.

A different approach could be to allocate enough space for the largest chunk and reuse it each time my_stream_chunk is called.

Allow installation under MacOs

The current branch does not install on macOS. The reason is that the cp command hard-coded in the Makefile of the ./main directory uses flags not supported on Mac:
cp -pd ...
This can be solved by replacing the flags with -r in the "Darwin" case. Even though -r has totally different semantics, it can be used as a replacement in this particular case.

UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Darwin)
CP_FLAGS='-r'
else
CP_FLAGS='-pd'
endif
.....
	cp $(CP_FLAGS) $(flite_LIBS_deps) $(DESTDIR)$(INSTALLLIBDIR)


flite to convert Text to phonetic (hello --> [HH EH L OW])

I need to use flite to convert text to phones (hello --> [HH EH L OW]). How can I build just the part of the project that does this? I need only this part of the code.
Really, I need a small, fast run-time toolkit to use as a front end, converting text to a .lab file, to test new sentences in HTS.
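(For reference: the flite binary can already print the phone string it produces via the -ps flag, as the pauses issue further down this page shows; if I recall the option correctly, -o none suppresses the audio output:

flite -ps -o none -t "hello"

which should print something like "pau hh ax l ow pau". Producing .lab-style output with timings would still need post-processing, but this covers the text-to-phones step.)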

Token to Words - How to keep track of which words belong to a token?

Hey everyone, so I'm stuck on a problem: I need to send user-inputted text through Flite, and then display the original text on screen with synced up word highlighting. The problem is that when a token gets expanded into multiple words (1983 -> Nineteen Eighty Three) I can't find a way to keep these words "grouped" together so that I can then sync all three words up to the original highlighted token "1983". I've tried modifying the us_tokentowords function so that it returns all the words in a single string, but I can't quite get it to work. Has anyone here come up with a solution to any similar problems? Any help would be much appreciated, thanks!
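(Not a tested solution, but a sketch that may help: in the flite utterance structure the words produced by us_tokentowords are normally attached as daughters of the items in the Token relation, so the grouping should be recoverable without modifying us_tokentowords at all. Something along these lines, assuming a linked-in voice such as cmu_us_slt and the conventional register_<voice> entry point:

/* Sketch: recover which Word items came from which Token item.
 * Assumes the usual flite utterance structure, where tokentowords
 * attaches words as daughters of Token items; not an official recipe. */
#include <stdio.h>
#include "flite.h"

cst_voice *register_cmu_us_slt(const char *voxdir);

int main(void)
{
    flite_init();
    cst_voice *v = register_cmu_us_slt(NULL);
    cst_utterance *u = flite_synth_text("It happened in 1983.", v);
    const cst_item *t, *w;

    for (t = relation_head(utt_relation(u, "Token")); t; t = item_next(t))
    {
        printf("token \"%s\":", item_feat_string(t, "name"));
        for (w = item_daughter(t); w; w = item_next(w))  /* the expanded words */
            printf(" %s", item_feat_string(w, "name"));
        printf("\n");  /* e.g. token "1983": nineteen eighty three */
    }
    delete_utterance(u);
    return 0;
}

Each token item carries its original text as its name, so highlighting "1983" while its three daughter words are spoken becomes a lookup rather than a string-merging exercise.)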

compiled binaries?

Are there some compiled binaries that can be used out of the box on Windows? If yes, where can I download them? If not, why aren't there any?

Is there a way to increase/add the pause between words?

Although flite is very fast, it sounds like the words are run together while speaking; the whitespace that should separate the words is hard to make out in the generated audio.

For example, in this line the words are run together when spoken (I have found that for almost all words in a sentence); I can recognize the words when I'm looking at the text, but it's hard when the text is not there.

flite -ps -t "hello my name is John Doe!"

Outputs:

pau hh ax l ow m ay n ey m ih z jh aa n d ow pau

And when spoken, (without ps flag), the sound is exactly like that. The pauses are only between the sentences and not between the words.

I tried to look through the documentation without finding anything, and I tried to look through the code to see if I could increase the pause duration, but I couldn't find anything at all.

I found it hard to imagine I'm the only one who noticed this but I couldn't find anything on it so I'm making this issue.

  • flite version: flite-2.3-current Mar 2022 (http://cmuflite.org)
  • OS: Arch Linux x86_64
  • Kernel: 5.17.6-arch1-1
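(Two workarounds that might help, neither a real fix: punctuation inserts pauses (the same trick the Dalek issue further down this page relies on), and, if the build honours the duration_stretch feature, --setf can slow delivery overall:

flite -t "hello. my. name. is. John. Doe."
flite --setf duration_stretch=1.5 -t "hello my name is John Doe!"

Both are observations about behaviour rather than documented controls for inter-word pausing.)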

Indic voice builds broken as of commit e988047

Some changes that were introduced to the voice templates (mostly for grapheme voices) now break builds of Indic voices.

In particular, this does not get defined in Indic voices, since they are part of the Indic lang.

Using flite with pulseaudio

Using flite-2.0.7-current Jul 2017 on openSUSE Leap 15. I did a git pull to ensure that the latest code is installed.
Pulseaudio controls where my audio is sent. Using padsp with flite works fine; audio goes to the right device. The flite docs indicate that pulse-simple can be compiled in. Experimentation shows I get the fewest errors on compile with ./configure --with-audio=pulseaudio, and the link message indicates that pulse and pulse-simple are linked in. But when I run the new flite without padsp, flite still complains that it cannot find /dev/dsp, so I guess I am missing a little detail, or my understanding of what is supposed to happen is incomplete.

GCC 11.2.1 "does not match original declaration" warnings with LTO

When compiling flite-2.2 on Fedora development branch (rawhide/f36), I'm getting the following warnings:

making ../build/x86_64-linux-gnu/lib/libflite_cmulex.so
../../lang/cmulex/cmu_lex.c:49:27: warning: type of 'cmu_lex_phone_table' does not match original declaration [-Wlto-type-mismatch]
   49 | extern const char * const cmu_lex_phone_table[54];
      |                           ^
../../lang/cmulex/cmu_lex_entries.c:14:20: note: array types have different bounds
   14 | const char * const cmu_lex_phone_table[57] =
      |                    ^
../../lang/cmulex/cmu_lex_entries.c:14:20: note: 'cmu_lex_phone_table' was previously declared here
...
making ../build/x86_64-linux-gnu/lib/libflite_cmu_grapheme_lex.so
../../lang/cmu_grapheme_lex/cmu_grapheme_lex.h:47:27: warning: type of 'unicode_sampa_mapping' does not match original declaration [-Wlto-type-mismatch]
   47 | extern const char * const unicode_sampa_mapping[16663][5];
      |                           ^
../../lang/cmu_grapheme_lex/grapheme_unitran_tables.c:9:20: note: array types have different bounds
    9 | const char * const unicode_sampa_mapping[16798][5] =
      |                    ^
../../lang/cmu_grapheme_lex/grapheme_unitran_tables.c:9:20: note: 'unicode_sampa_mapping' was previously declared here
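(The warnings themselves point at the likely fix: the extern declarations carry explicit bounds (54 and 16663) that disagree with the definitions (57 and 16798). Declarations with an unspecified first bound cannot disagree; a sketch, not the project's actual patch:

extern const char * const cmu_lex_phone_table[];
extern const char * const unicode_sampa_mapping[][5];

The cleaner fix is of course to regenerate the headers so the bounds match the generated tables.)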

16khz output from indic voice

Hello, I am using an Indic voice to generate audio:

./flite/bin/flite "-voice" flite/voices/cmu_indic_hin_ab.flitevox 'पुत्र मित्र आदि सगे संबंधियों' "-o" 'try.wav'

The output file try.wav is always 16 kHz. However, the README.md mentions that the output is deliberately kept at 8 kHz. Does that not apply to non-US voices?

Multisyn voice integration

Hi

Is there any way to convert a voice built using Multisyn in Festival to flite? I can't seem to find one.

some questions about stress

./t2p covina
pau k ow v iy1 n ax pau
cmudict-0.4.out
covina nil k ow0 v iy1 n ax0

Hello, I have some questions about stress.
In the training data there are ow0 and ax0; there is no plain ow.
But when I use ./t2p to predict words, t2p prints ow (not ow0)!
Could you help me? I want to know the details of how flite deals with stress.
Thanks.

Built-in voice loading functions?

Hi, I'm using flite in my Linux C++ project, and I'm trying to use the built-in voice loading function

extern "C"
{
    cst_voice *cmu_us_slt(); // built in function
}

But there's a link error; should I add more link flags besides -lflite?
Also, is the function name I'm using right?
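(For what it's worth, a minimal sketch of the usual pattern, based on the voice/language/lexicon library split visible in the link lines elsewhere on this page; the registration function name follows the standard register_<voice> convention, so treat the exact name as an assumption for your build:

/* minimal_slt.c -- sketch of loading a linked-in flite voice.
 * Assumes the conventional register_cmu_us_slt() entry point. */
#include "flite.h"

cst_voice *register_cmu_us_slt(const char *voxdir);  /* from libflite_cmu_us_slt */

int main(void)
{
    flite_init();
    cst_voice *v = register_cmu_us_slt(NULL);
    flite_text_to_speech("Hello from slt.", v, "play");
    return 0;
}

The link line needs the voice, language and lexicon libraries in addition to -lflite, compare the testsuite link line in the big-endian issue below, plus whatever audio backend your build uses:

g++ main.cpp -lflite_cmu_us_slt -lflite_usenglish -lflite_cmulex -lflite -lm -lasound

Library order matters with static linking: voice before language/lexicon, -lflite last among the flite libraries.)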

Dalek TTS voice on Picroft - diphone file structure

I'm trying to make a Mycroft/Picroft respond in a voice like the classic BBC Dr Who baddie, a Dalek.

I started with the standard British male Mimic diphone voice; it's already pretty robotic, so it's well suited. For those who may be interested, I've altered it so that it does a passable Dalek impression, which has involved two main steps.

The first is to break up the response into individually delivered words (as in 'you ... will ... be ... exterminated') rather than running words together as in human speech. To do this on Mycroft I've intercepted the code at the point where the response has been translated into text (/mycroft-core/mycroft/audio/speech.py, at 'def handle_speak(event):') and changed the code at the 'else' point. Before I show any code, I should say that, while I've been coding for many years, I'm a complete newbie to Python (and Mycroft/Picroft); if I'm treading on toes or infringing on things, please let me know or delete this, and if you copy any of this you do so at your own risk (always make copies of the original files so that you can get back to the original code). This is what I changed it to:
else:
    # insert pauses ('. ') between words for that dalek sound
    utterance = utterance.replace(" ", ". ")
    utterance = utterance.replace(",", ". . ")
    utterance = utterance + ". "
    mute_and_speak(utterance, ident, listen)

The second step was to add the Dalek electronic twang to the voice. After extensive Googling I found that this was originally created by passing the actor's voice through a 'ring modulator'(?). On another site (which I can't find at the moment, but whose author deserves much of the credit for this bit) I found that a 'software only' approximation of ring modulation was to merge a sine wave with the original voice. A sawtooth wave is a decent approximation of a sine wave and, I thought, might be faster, so I chose that instead. Mycroft was reluctant to let me add the code as a separate module so, again, I've had to butcher the original code, in this case '/mycroft-core/mycroft/tts/tts.py' at 'def _execute(self, sentence, ident, listen):'. The code was changed (at the point shown) to:

# (needs: import os, wave, numpy -- the "needed modules" mentioned below)
if os.path.exists(wav_file):
    LOG.debug("TTS cache hit")
    phonemes = self.load_phonemes(key)
else:
    wav_file, phonemes = self.get_tts(sentence, wav_file)
    if phonemes:
        self.save_phonemes(key, phonemes)
vis = self.viseme(phonemes) if phonemes else None
try:
    tooth_w = 0.01
    tooth_h = 0.0
    ifile = wave.open(wav_file, 'rb')
    channels = ifile.getnchannels()
    frames = ifile.getnframes()
    width = ifile.getsampwidth()
    rate = ifile.getframerate()
    audio = ifile.readframes(frames)
    # remove the original file
    ifile.close()
    os.remove(wav_file)
    # Convert buffer to int16 using NumPy
    audio16 = numpy.frombuffer(audio, dtype=numpy.int16)
    empty16 = []
    h = 1
    d = tooth_w
    for x in audio16:
        n = x * h
        empty16.append(n)
        h = h - d
        if h > 1 or h < tooth_h:
            d = d * -1
    outarray = numpy.array(empty16, dtype=numpy.int16)
    dalek_file = wave.open(wav_file, 'wb')
    dalek_file.setnchannels(channels)
    dalek_file.setframerate(rate)
    dalek_file.setnframes(frames)
    dalek_file.setsampwidth(width)
    dalek_file.writeframes(outarray)
    dalek_file.close()
except Exception as e:
    print(e)
    print("NOT dalekified")
finally:
    self.queue.put((self.audio_ext, wav_file, vis, ident, l))

I also had to import the needed modules.

The tooth_h and tooth_w variables are the height and width of the sawtooth. I normally set tooth_h to 0; this means the sawtooth goes back and forth between 1 and 0, and the value deducted or added at each step is given by tooth_w (this should be between 0 and 1, preferably low), and the change in effect can be dramatic. There are hours of fun to be had messing about with tooth_w; there is a balance to be found between making it more 'Dalek' and keeping it intelligible.

My problem is that adding the code at this point involves reopening the .wav file, getting all the frames and processing each one, then rebuilding the file. This adds a 'noticeable' (read: irritating) delay to the response, probably at least doubling the original noticeable response delay. My understanding of diphone voices is that they are created by concatenating tiny speech sounds held in some sort of database inside the original flitevox voice file. What would make it much faster would be to sawtooth each of these tiny fragments and return them to the file, so that the Dalek voice was built in. Since each sawtoothed fragment would be the same size as the original, this shouldn't be a problem, if I could get at them. So my question is: is there an easy way to do this, or a complete description of the structure of a diphone file somewhere, or some kindly genius out there who could help? Cheers

Distortion with voice "clb"

Using the latest openSUSE Linux, when using flite (compiled from source) with voice clb with a command line such as:
padsp flite -voice ~/gitprogs/flite/voices/cmu_us_clb.flitevox "one one one two five"
there is a nasty distortion of sound after each "one" enunciation.
Placing other words before the "one one one" helps eliminate this until at
"two three four five one one one two five"
the distortion disappears. It does not seem to be related to output volume and only occurs with voice clb, but may be related to PulseAudio or some other factor. I'm wondering whether this is a known issue, and what simple tweaks might help track it down to its source?

making the .flitevox voices from source?

Hi guys, great great project. These voices are really amazing.

I've built flite 2.1 from source (probably one of the smoothest builds I've had in Linux) but I noticed that the .flitevox voices need to be downloaded instead of built from source?

I was wondering what the procedure for building these voices is and where I can find the source code?

find_sts_main.c fails to compile on mingw-w64 clang

ccache clang -mtune=generic -O2 -pipe -Wall -DCST_NO_SOCKETS -DUNDER_WINDOWS -DWIN32     -D_FORTIFY_SOURCE=0 -D__USE_MINGW_ANSI_STDIO=1  -I../include  -c -o find_sts_main.o find_sts_main.c
In file included from find_sts_main.c:47:
In file included from ../include\cst_args.h:43:
In file included from ../include/cst_features.h:44:
In file included from ../include/cst_val.h:43:
In file included from ../include/cst_file.h:63:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\x86_64-w64-mingw32\include\windows.h:69:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\x86_64-w64-mingw32\include\windef.h:8:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\x86_64-w64-mingw32\include\minwindef.h:163:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\x86_64-w64-mingw32\include\winnt.h:1554:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\x86intrin.h:15:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\immintrin.h:18:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\xmmintrin.h:3005:
D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\emmintrin.h:4224:6: error: conflicting types for '_mm_clflush'
void _mm_clflush(void const * __p);
     ^
D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\emmintrin.h:4224:6: note: '_mm_clflush' is a builtin with type 'void (const void *)'
1 error generated.

Because the file defines const to empty before including any system header, it breaks the function declarations for builtin functions.

https://github.com/festvox/flite/blob/master/tools/find_sts_main.c#L45

m-ab-s/media-autobuild_suite#1706
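(A minimal reduction of the failure mode, hypothetical but matching the include chain above:

/* find_sts_main.c defines const away before any system header is seen */
#define const
#include <windows.h>  /* eventually pulls in emmintrin.h, whose builtin
                         prototype _mm_clflush(void const *) now loses its
                         const qualifier and conflicts with the builtin */

Moving the system includes above the #define, or dropping the define entirely, avoids the conflict.)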

"VAL: tried to access car in 1023 typed val" error on big-endian (s390x)

When running flite-2.2 test on a big-endian arch (s390x), I'm getting this error:

$ cd flite-2.2
$ LD_LIBRARY_PATH=/builddir/build/BUILDROOT/flite-2.2-1.fc36.s390x/usr/lib64
$ make -C testsuite do_thread_test
make: Entering directory '/builddir/build/BUILD/flite-2.2/testsuite'
gcc -fopenmp -o multi_thread multi_thread_main.c \
	-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -march=zEC12 -mtune=z13 -fasynchronous-unwind-tables -fstack-clash-protection -Wall -DWORDS_BIGENDIAN=1    -I../include -L../build/s390x-linux-gnu/lib -lflite  -Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -lm -lpulse-simple -lpulse  \
	-l flite_cmu_us_slt -lflite_cmulex -lflite_usenglish \
	-lflite -lm -lasound -lgomp
export OMP_NUM_THREADS=100 && ./multi_thread
VAL: tried to access car in 1023 typed val
VAL: tried to access car in 1023 typed val
make: *** [Makefile:89: do_thread_test] Error 255
make: Leaving directory '/builddir/build/BUILD/flite-2.2/testsuite'

How to build shared libraries on macOS?

configure --enable-shared doesn't seem to enable building of shared libraries on macOS, while it works fine on Linux.

Tested on Catalina (x86_64) and Monterey (arm64).

{macOS} Speak from command line

I can compile and run flite OK on macOS (10.14.6), but I can only generate .wav files; it won't speak text from the command line or a file, as in this example:

./bin/flite doc/alice

What do I need to set up to be able to do that? I don't see a way to select an audio device.

How to convert FestVox voices to Flite?

From google/language-resources#31. I cannot convert FestVox voices to Flite.

gcc -g -O2 -Wall     -o flite_goog_th_unison flite_main.o flite_voice_list.o flite_lang_list.o -L . -lgoog_th_unison   -lflite_cmu_th_lang -lflite_cmu_th_lex -L/usr/local/src/tools/flite/build/x86_64-linux-gnu/lib -lflite   -lm  
/usr/bin/ld: cannot find -lflite_cmu_th_lang
/usr/bin/ld: cannot find -lflite_cmu_th_lex
collect2: error: ld returned 1 exit status
Makefile:108: recipe for target 'flite_goog_th_unison' failed
make: *** [flite_goog_th_unison] Error 1

Tutorial explaining flite build process for new languages

This is a ToDo and I am hoping to get to this the last weekend in October. The idea is to build a tutorial describing the procedure to build a deployable voice in one Indian language that can expose the API capabilities of flite.

Some questions about resource usage

I'm trying to find some more info about the library, I hope this is the right place to ask. I'm still very much a beginner when it comes to flite, so if anyone happens to know about any of this it would be incredibly helpful.

I'm attempting to get this library running on a resource-constrained platform, more specifically a 32-bit microcontroller with ~500 kB available RAM, 512kB ROM reserved for TTS, and plenty of flash storage. The plan is to output the resulting speech audio over i2s in real time.

Questions

About the following statement in the readme: "For standard diphone voices, maximum run time memory requirements are approximately less than twice the memory requirement for the waveform generated."

  • Does this mean splitting text into sentences, or even words, can reduce the RAM requirement because the "waveform" will be shorter? (See the rough arithmetic after this list.)
  • If so, would feeding individual words impact speech quality with the default US English lexicon?
  • Is this the same "runtime" spec listed at <1M in the readme's memory comparison table? (Or are there other metrics that heavily affect RAM usage?)
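(Rough arithmetic on that claim, mine rather than the readme's: a 10-second utterance at 8 kHz, 16-bit mono is 8000 × 2 × 10 = 160,000 bytes ≈ 156 kB of waveform, so about 312 kB at the stated 2× bound; at 16 kHz it doubles to roughly 625 kB, which would already exceed 500 kB of RAM. So synthesizing sentence-by-sentence, or even clause-by-clause, should directly bound the peak requirement.)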

About the other memory requirements; as I understand it: core (60k) + USEnglish (100k) + lexicon (600k) + diphone (1800k) can all potentially be stored in ROM instead of RAM

  • Is this correct?
  • Is there any hope of moving at least the diphone and lexicon to NAND flash instead of RAM/ROM?
  • If so, how do I approach this?

Any pointers in the right direction are welcome! Including possible approaches as to how I might find some answers myself.

How to configure default global settings such as voice?

Is there some way like a configuration file to set global defaults for things like the voice?
I am trying to use flite as the TTS backend for Okular (the KDE document viewer), but I'm unable to use a voice other than the default kal16.

memory leak problem

There's a memory leak in the function "ffeature_string"; can you fix it?
