festvox / flite
A small fast portable speech synthesis system
License: Other
Is there any particular reason that windowed join is not implemented in Flite?
Are there any plans to include it in the future?
Is it possible in any way to include .flitevox voice(s) in the flite build?
The compilation section seems to allow configuring built-in voices through ./configure --with-langvox=transtac,
but I would like to compile in some high-quality .flitevox voices from http://www.festvox.org/flite/packed/flite-2.1/voices/ .
The reason is that I am building flite for ffmpeg, which does not let you specify a .flitevox voice file at runtime.
Hello,
Is there a guide for adding new language support to flite?
Hi, I'm using flite in my Linux C++ project, and I'm trying to use the built-in voice loading function:
extern "C"
{
cst_voice *cmu_us_slt(); // built in function
}
But there's a link error. Should I add more link flags besides -lflite?
Also, is the function name I'm using correct?
While trying to package flite version 2.1 for Debian¹, I noticed that three symbols (cst_read_2d_array, cst_read_array and cst_rx_not_indic) were dropped with respect to version 2.0. I was wondering if bumping of the soname was just forgotten or if there is anything else at stake.
Can you either bump the soname, or let me know what you think I should do instead?
This is a feature request.
Currently, when using streaming, a buffer large enough to hold the complete wav is allocated, and each chunk is accessed through the start parameter of the callback function.
int my_stream_chunk(const cst_wave *w, int start, int size,
                    int last, cst_audio_streaming_info *asi)
{
    // each call of this function gets the same buffer with a different start value
}
....
cst_audio_streaming_info *asi = cst_alloc(struct cst_audio_streaming_info_struct, 1);
asi->min_buffsize = 256;
asi->asc = my_stream_chunk;
asi->userdata = NULL;
feat_set(v->features, "streaming_info", audio_streaming_info_val(asi));
cst_wave *wav = flite_text_to_wave(text_to_synth, v);
delete_wave(wav);
On embedded systems, memory is limited and sometimes the whole wav is not required, for example when streaming to external devices.
The fact that space for the whole file is allocated limits the length of the text that can be sent for synthesis.
A different approach could be to allocate enough space for the largest chunk and reuse it each time my_stream_chunk is called.
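The idea can be sketched in Python terms (a hypothetical illustration, not flite code): allocate one chunk-sized buffer up front and reuse it for every chunk, instead of holding the whole waveform in memory.

```python
def stream_chunks(samples, chunk_size):
    """Yield chunks of `samples` through one reused fixed-size buffer."""
    buf = [0] * chunk_size  # allocated once, sized for the largest chunk
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        buf[:len(chunk)] = chunk  # overwrite the reused buffer in place
        yield buf[:len(chunk)]

# peak memory held by the consumer is one chunk, not the whole waveform
consumed = sum(len(c) for c in stream_chunks(list(range(10)), 4))
print(consumed)  # → 10
```

With this scheme the callback would see a small buffer per call instead of offsets into one complete waveform, so synthesis length is no longer bounded by available RAM.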
I have a list of 44k words and I want to create a lexicon from it. How can I do that?
I have a question, can you help me?
Why was cmudict cut down, with only 36,964 English words in cmu_lex_data_raw.c?
I know cmudict contains about 130,000 English words. I tested cmu_lts_model on the 36,964 words in cmu_lex_data_raw.c and it performed poorly, with about a 90% word error rate. Why does this happen? (Was cmu_lts_model trained on cmudict with those 36,964 words removed?) Can you help me? Thanks.
Forgive my poor English.
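For what it's worth, a word error rate like the one reported here is often inflated by stress digits: an LTS prediction of 'ow' counts as wrong against a reference 'ow0' under exact string comparison. A small hypothetical sketch of that effect:

```python
def word_error_rate(pairs):
    """Fraction of (predicted, reference) pronunciation pairs that differ."""
    return sum(1 for pred, ref in pairs if pred != ref) / len(pairs)

def strip_stress(pron):
    """Remove ARPAbet stress digits (0/1/2) for a fairer comparison."""
    return " ".join(p.rstrip("012") for p in pron.split())

pairs = [("k ow v iy n ax", "k ow0 v iy1 n ax0"),  # differs only by stress digits
         ("hh ax l ow", "hh ax l ow")]              # identical
print(word_error_rate(pairs))  # → 0.5 under exact matching
print(word_error_rate([(strip_stress(a), strip_stress(b)) for a, b in pairs]))  # → 0.0
```

It may be worth normalizing stress before measuring; that alone could account for part of the 90% figure.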
I'm really sorry if I've missed anything, but the readme doesn't seem to include a link to the library documentation. Where can I find it?
P.S.: I'm also looking for a list of the source files that make up the library itself. Is it just everything under src/?
I'm trying to make a Mycroft/Picroft respond in a voice like the classic BBC Dr Who baddie, a Dalek.
I started with the standard British male Mimic diphone voice; it's already pretty robotic, so it's well suited. For those who may be interested, I've altered it so that it does a passable Dalek impression, which has involved two main steps.
The first is to break up the response into individually delivered words (as in 'you ... will ... be ... exterminated') rather than running words together as in human speech. To do this on Mycroft, I've interrupted the code at the point where the response has been translated into text (/mycroft-core/mycroft/audio/speech.py, at 'def handle_speak(event):') and changed the code at the 'else' branch. Before I show any code, I should say that, while I've been coding for many years, I'm a complete newbie to Python (and Mycroft/Picroft); if I'm treading on toes or infringing things please let me know or delete this, and if you copy any of this you do so at your own risk (always make copies of the original files so that you can get back to the original code). This is what I changed it to:
else:
    # insert pauses ('. ') between words for that Dalek sound
    utterance = utterance.replace(" ", ". ")
    utterance = utterance.replace(",", ". . ")
    utterance = utterance + ". "
    mute_and_speak(utterance, ident, listen)
The second step was to add the Dalek electronic twang to the voice. After extensive Googling I found that this was originally created by passing the actor's voice through a 'ring modulator'(?). On another site (which I can't find at the moment, but the author deserves much of the credit for this bit) I found that a 'software only' approximation of ring modulation was to merge a sine wave with the original voice. A sawtooth wave is a decent approximation of a sine wave and, I thought, might be faster, so I chose that instead. Mycroft was reluctant to let me add the code as a separate module so, again, I've had to butcher the original code, in this case '/mycroft-core/mycroft/tts/tts.py' at 'def _execute(self, sentence, ident, listen):'. The code was changed (at the point shown) to:
if os.path.exists(wav_file):
    LOG.debug("TTS cache hit")
    phonemes = self.load_phonemes(key)
else:
    wav_file, phonemes = self.get_tts(sentence, wav_file)
    if phonemes:
        self.save_phonemes(key, phonemes)

vis = self.viseme(phonemes) if phonemes else None
try:
    tooth_w = 0.01
    tooth_h = 0.0
    ifile = wave.open(wav_file, 'rb')
    channels = ifile.getnchannels()
    frames = ifile.getnframes()
    width = ifile.getsampwidth()
    rate = ifile.getframerate()
    audio = ifile.readframes(frames)
    # remove the original file
    ifile.close()
    os.remove(wav_file)
    # convert the buffer to int16 using NumPy
    audio16 = numpy.frombuffer(audio, dtype=numpy.int16)
    empty16 = []
    h = 1
    d = tooth_w
    for x in audio16:
        n = x * h
        empty16.append(n)
        h = h - d
        if h > 1 or h < tooth_h:
            d = d * -1
    outarray = numpy.array(empty16, dtype=numpy.int16)
    dalek_file = wave.open(wav_file, 'wb')
    dalek_file.setnchannels(channels)
    dalek_file.setframerate(rate)
    dalek_file.setnframes(frames)
    dalek_file.setsampwidth(width)
    dalek_file.writeframes(outarray)
    dalek_file.close()
except Exception as e:
    print(e)
    print("NOT dalekified")
finally:
    self.queue.put((self.audio_ext, wav_file, vis, ident, l))
I also had to import the needed modules.
The tooth_h and tooth_w variables are the height and width of the sawtooth. I normally set tooth_h to 0, which means the sawtooth bounces back and forth between 1 and 0; the value deducted or added at each step is given by tooth_w (this should be between 0 and 1, preferably low), and the change in effect can be dramatic. There are hours of fun to be had messing about with tooth_w; there is a balance to be found between making it more 'Dalek' and keeping it intelligible.
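As a standalone illustration, the envelope loop above boils down to something like this (hypothetical toy values, pure Python rather than NumPy):

```python
def dalekify(samples, tooth_w=0.25, tooth_h=0.0):
    """Multiply samples by an amplitude envelope bouncing between tooth_h and 1."""
    out, h, d = [], 1.0, tooth_w
    for x in samples:
        out.append(int(x * h))
        h -= d
        if h > 1 or h < tooth_h:
            d = -d  # reverse direction at the envelope's limits
    return out

print(dalekify([100, 100, 100, 100, 100]))  # → [100, 75, 50, 25, 0]
```

A large tooth_w like 0.25 here makes the envelope visible over just five samples; at the 0.01 used above the ramp spans hundreds of samples, which is what produces the audible modulation.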
My problem is that adding the code at this point involves reopening the .wav file, getting all the frames, processing each one, and then rebuilding the file. This adds a 'noticeable' (read: irritating) delay to the response, probably at least doubling the original response delay. My understanding of diphone voices is that they are created by concatenating tiny speech sounds held in some sort of database in the original flitevox voice file. What would make it much faster would be to sawtooth each of these tiny fragments and return them to the file so that the Dalek voice was built in. Since each sawtoothed fragment would be the same size as the original, this shouldn't be a problem, if I could get at them. So my question is: is there an easy way to do this, or a complete description of the structure of a diphone file somewhere, or some kindly genius out there who could help? Cheers
I'm trying to find some more info about the library, I hope this is the right place to ask. I'm still very much a beginner when it comes to flite, so if anyone happens to know about any of this it would be incredibly helpful.
I'm attempting to get this library running on a resource-constrained platform, more specifically a 32-bit microcontroller with ~500 kB available RAM, 512kB ROM reserved for TTS, and plenty of flash storage. The plan is to output the resulting speech audio over i2s in real time.
About the following statement in the readme: "For standard diphone voices, maximum run time memory requirements are approximately less than twice the memory requirement for the waveform generated."
About the other memory requirements: as I understand it, core (60k) + USEnglish (100k) + lexicon (600k) + diphone (1800k) can all potentially be stored in ROM instead of RAM.
Any pointers in the right direction are welcome! Including possible approaches as to how I might find some answers myself.
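Using the sizes quoted in this post (which come from flite's README and are only approximate), a quick budget check suggests the static data fits in flash, leaving RAM for the generated waveform:

```python
# Approximate component sizes in kB as quoted above (assumptions, not measured).
rom_components = {"core": 60, "usenglish": 100, "lexicon": 600, "diphone": 1800}
rom_kb = sum(rom_components.values())
print(rom_kb)  # → 2560 kB of read-only data, a candidate for flash/ROM

# README rule of thumb: run-time RAM is roughly 2x the generated waveform.
# One second of 16 kHz, 16-bit mono audio:
wave_kb = 16000 * 2 / 1024
print(round(2 * wave_kb, 1))  # → 62.5 kB of RAM per second of audio
```

Under those assumptions, ~500 kB of RAM would cap a single utterance at several seconds unless streaming releases the buffer incrementally.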
Hi. I would really appreciate any help. I am trying to convert an indic clustergen voice built using festival to flite, but while running make in the flite directory I get:
flite-2.0.0-release/lang/cmu_indic_lang/cmu_indic_lang.c:89: undefined reference to us_tokentowords
Are there any compiled binaries that can be used out of the box for Windows? If yes, where can I download them? If not, why aren't there any?
Hi guys, great, great project. These voices are really amazing.
I've built flite 2.1 from source (probably one of the smoothest builds I've had on Linux), but I noticed that the .flitevox voices need to be downloaded instead of being built from source?
I was wondering what the procedure for building these voices is and where I can find the source code?
Using latest Linux openSUSE, when using flite (compiled from source) with voice clb with a command line such as:
padsp flite -voice ~/gitprogs/flite/voices/cmu_us_clb.flitevox "one one one two five"
there is a nasty distortion of sound after each "one" enunciation.
Placing other words before the "one one one" helps eliminate this, until at
"two three four five one one one two five"
the distortion disappears. It does not seem to be related to output volume and only occurs with voice clb, but it may be related to PulseAudio or some other factor. I'm wondering if this is a known issue, and what simple tweaks might help track the issue down to its source?
From google/language-resources#31. I cannot convert FestVox voices to Flite.
gcc -g -O2 -Wall -o flite_goog_th_unison flite_main.o flite_voice_list.o flite_lang_list.o -L . -lgoog_th_unison -lflite_cmu_th_lang -lflite_cmu_th_lex -L/usr/local/src/tools/flite/build/x86_64-linux-gnu/lib -lflite -lm
/usr/bin/ld: cannot find -lflite_cmu_th_lang
/usr/bin/ld: cannot find -lflite_cmu_th_lex
collect2: error: ld returned 1 exit status
Makefile:108: recipe for target 'flite_goog_th_unison' failed
make: *** [flite_goog_th_unison] Error 1
Hello, is it possible to create a new language from audio and text files? If so, I would like to know the workflow. I am interested in producing German voices, i.e. flite voice files.
Best regards,
Paul
I would like to point out that identifiers like "_CST_CG_H__" and "_FLITE_H__" do not fit the naming conventions expected by the C++ language standard (identifiers beginning with an underscore followed by an uppercase letter are reserved for the implementation).
Would you like to adjust your selection of unique names?
Hello, I am using indic voice to generate the audio
./flite/bin/flite "-voice" flite/voices/cmu_indic_hin_ab.flitevox 'पुत्र मित्र आदि सगे संबंधियों' "-o" 'try.wav'
The output file try.wav is always 16 kHz. However, the README.md mentions that the output is deliberately kept at 8 kHz. Does that not apply to non-US voices?
I think it would be good to flush the currently committed changes and make a new release :)
./t2p covina
pau k ow v iy1 n ax pau
cmudict-0.4.out
covina nil k ow0 v iy1 n ax0
Hello, I have some questions about stress (accent).
In the training data there are ow0 and ax0; there is no plain ow.
But when I use ./t2p to predict words, I found that t2p prints ow (not ow0)!
Could you help me? I want to know the details of how flite deals with stress.
Thanks.
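One way to sanity-check the t2p output above against the cmudict entry, ignoring stress digits, is a small hypothetical helper like this:

```python
def strip_stress(phones):
    """Drop trailing ARPAbet stress digits (0/1/2) from each phone."""
    return [p.rstrip("012") for p in phones]

t2p_out = "pau k ow v iy1 n ax pau".split()
cmudict = "k ow0 v iy1 n ax0".split()

# drop the leading/trailing silence markers (pau) before comparing
print(strip_stress(t2p_out[1:-1]) == strip_stress(cmudict))  # → True
```

So the two pronunciations of "covina" agree once stress digits are removed; the open question is only why t2p emits unstressed phones without a digit.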
Dear,
Could you kindly explain the reason for the issue stated in the title?
Thanks!
Is this code compatible with a real-time operating system (Nucleus RTOS), and what are the RAM/ROM size requirements?
When running flite-2.2 test on a big-endian arch (s390x), I'm getting this error:
$ cd flite-2.2
$ LD_LIBRARY_PATH=/builddir/build/BUILDROOT/flite-2.2-1.fc36.s390x/usr/lib64
$ make -C testsuite do_thread_test
make: Entering directory '/builddir/build/BUILD/flite-2.2/testsuite'
gcc -fopenmp -o multi_thread multi_thread_main.c \
-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -march=zEC12 -mtune=z13 -fasynchronous-unwind-tables -fstack-clash-protection -Wall -DWORDS_BIGENDIAN=1 -I../include -L../build/s390x-linux-gnu/lib -lflite -Wl,-z,relro -Wl,--as-needed -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -lm -lpulse-simple -lpulse \
-l flite_cmu_us_slt -lflite_cmulex -lflite_usenglish \
-lflite -lm -lasound -lgomp
export OMP_NUM_THREADS=100 && ./multi_thread
VAL: tried to access car in 1023 typed val
VAL: tried to access car in 1023 typed val
make: *** [Makefile:89: do_thread_test] Error 255
make: Leaving directory '/builddir/build/BUILD/flite-2.2/testsuite'
The current branch does not install on macOS. The reason is that the cp command hard-coded in the Makefile of the ./main directory uses flags not supported on Mac:
cp -pd ...
This can be solved by replacing the flags with -r in the "Darwin" case. Even though -r has totally different semantics, it can be used as a replacement in this particular case.
UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S),Darwin)
CP_FLAGS='-r'
else
CP_FLAGS='-pd'
endif
.....
cp $(CP_FLAGS) $(flite_LIBS_deps) $(DESTDIR)$(INSTALLLIBDIR)
hi,
Thanks for sharing this great project,
but as a newcomer I find it a little difficult to build for Android.
Any advice or suggestions would be appreciated.
Thx
Hello!
I loaded flite in VS2017.
VS2017 did some upgrades, and now I have
fliteDLL
cmu_us_rms
There is also the project "cmu_us_slt", but it's marked as "(not available)".
I went into the folder "flite-master\lang\cmu_us_slt", and I didn't see a vcprj or vcproj file there.
What is the "cmu_us_slt" about, and do I not need it?
Line 64 in 6c9f20d
make_cart.scm is not available.
It could be useful for starting to convert a voice to a new language.
Hi all,
Awesome small library. I do have a question: is it possible to add support for a new language? If yes, how?
Many thanks!
Hi
Is there any way to convert a voice built using Multisyn in Festival to flite? I can't seem to find a way to do it.
When compiling flite-2.2 on Fedora development branch (rawhide/f36), I'm getting the following warnings:
making ../build/x86_64-linux-gnu/lib/libflite_cmulex.so
../../lang/cmulex/cmu_lex.c:49:27: warning: type of 'cmu_lex_phone_table' does not match original declaration [-Wlto-type-mismatch]
49 | extern const char * const cmu_lex_phone_table[54];
| ^
../../lang/cmulex/cmu_lex_entries.c:14:20: note: array types have different bounds
14 | const char * const cmu_lex_phone_table[57] =
| ^
../../lang/cmulex/cmu_lex_entries.c:14:20: note: 'cmu_lex_phone_table' was previously declared here
...
making ../build/x86_64-linux-gnu/lib/libflite_cmu_grapheme_lex.so
../../lang/cmu_grapheme_lex/cmu_grapheme_lex.h:47:27: warning: type of 'unicode_sampa_mapping' does not match original declaration [-Wlto-type-mismatch]
47 | extern const char * const unicode_sampa_mapping[16663][5];
| ^
../../lang/cmu_grapheme_lex/grapheme_unitran_tables.c:9:20: note: array types have different bounds
9 | const char * const unicode_sampa_mapping[16798][5] =
| ^
../../lang/cmu_grapheme_lex/grapheme_unitran_tables.c:9:20: note: 'unicode_sampa_mapping' was previously declared here
Hey everyone, so I'm stuck on a problem: I need to send user-inputted text through Flite, and then display the original text on screen with synced up word highlighting. The problem is that when a token gets expanded into multiple words (1983 -> Nineteen Eighty Three) I can't find a way to keep these words "grouped" together so that I can then sync all three words up to the original highlighted token "1983". I've tried modifying the us_tokentowords function so that it returns all the words in a single string, but I can't quite get it to work. Has anyone here come up with a solution to any similar problems? Any help would be much appreciated, thanks!
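One possible approach (a hypothetical sketch; `expand` here merely stands in for flite's us_tokentowords, which actually lives in C): record, for each input token, where its expanded words start in the word stream and how many there are, so highlighting "1983" can cover all three synthesized words.

```python
def expand(token):
    # stand-in for flite's us_tokentowords expansion (hypothetical)
    return {"1983": ["nineteen", "eighty", "three"]}.get(token, [token])

def token_word_map(tokens):
    """Return (token, first_word_index, word_count) triples plus the flat word list."""
    mapping, words = [], []
    for tok in tokens:
        expanded = expand(tok)
        mapping.append((tok, len(words), len(expanded)))
        words.extend(expanded)
    return mapping, words

mapping, words = token_word_map(["in", "1983", "it", "began"])
print(mapping)  # → [('in', 0, 1), ('1983', 1, 3), ('it', 4, 1), ('began', 5, 1)]
print(words)    # → ['in', 'nineteen', 'eighty', 'three', 'it', 'began']
```

With such a map, the highlighter stays on "1983" while word indices 1 through 3 are being spoken, without changing the synthesized word stream itself.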
Is there some way like a configuration file to set global defaults for things like the voice?
I am trying to use flite as the TTS backend for Okular (document viewer), but I'm unable to use a voice other than the default kal16.
As the title says: if I want to add more words that are not included in the default dictionary, what should I do?
There's a memory leak in the function "ffeature_string"; can you fix it?
ccache clang -mtune=generic -O2 -pipe -Wall -DCST_NO_SOCKETS -DUNDER_WINDOWS -DWIN32 -D_FORTIFY_SOURCE=0 -D__USE_MINGW_ANSI_STDIO=1 -I../include -c -o find_sts_main.o find_sts_main.c
In file included from find_sts_main.c:47:
In file included from ../include\cst_args.h:43:
In file included from ../include/cst_features.h:44:
In file included from ../include/cst_val.h:43:
In file included from ../include/cst_file.h:63:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\x86_64-w64-mingw32\include\windows.h:69:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\x86_64-w64-mingw32\include\windef.h:8:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\x86_64-w64-mingw32\include\minwindef.h:163:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\x86_64-w64-mingw32\include\winnt.h:1554:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\x86intrin.h:15:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\immintrin.h:18:
In file included from D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\xmmintrin.h:3005:
D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\emmintrin.h:4224:6: error: conflicting types for '_mm_clflush'
void _mm_clflush(void const * __p);
^
D:\media-autobuild_suite-master\msys64\mingw64\lib\clang\10.0.0\include\emmintrin.h:4224:6: note: '_mm_clflush' is a builtin with type 'void (const void *)'
1 error generated.
Because const is defined to empty before any system header is included, the declarations of builtin functions break:
https://github.com/festvox/flite/blob/master/tools/find_sts_main.c#L45
Maybe to Soundcloud.
This is a ToDo and I am hoping to get to this the last weekend in October. The idea is to build a tutorial describing the procedure to build a deployable voice in one Indian language that can expose the API capabilities of flite.
Some changes that were introduced to the voice templates (mostly for grapheme voices) now break builds of indic voices.
In particular, this does not get defined in indic voices, since they are part of indic lang.
configure --enable-shared doesn't seem to enable building of shared libraries on macOS, while it works fine on Linux.
Tested on Catalina (x86_64) and Monterey (arm64).
I know that WASI is currently experimental as a compilation target, but nonetheless I did manage to get flite compiled to it and running using the CraneStation/wasmtime runtime. If it's of any interest, I'd be more than happy to submit a preliminary PR and work on getting it merged into master. Also, I'd be more than happy to monitor future changes to WASI and submit any relevant updates.
I need to use flite to convert text to phonemes (hello → [HH EH L OW]). How can I build just the part of the project that does that? I need only that part of the code.
Really, I need a small, fast run-time toolkit to use as a front end converting text to .lab files, for testing new sentences in HTS.
Where can I download a demo with different voices before I try to compile flite?
Line 71 in 7c1994b
Hey there!
Festival has support for Mbrola voices, which is pretty cool. I'd like to know whether it's possible to use them here in Flite too? I know there's a way of converting festvox to flitevox files, but I'm not sure how Mbrola is handled.
Thanks!
Hello,
After putting .flitevox files in /usr/share/flite, I would have assumed that
flite -voicedir /usr/share/flite -lv
would list the voices stored in /usr/share/flite, but that is not actually working.
It would be really useful to have this, so that Linux distributions can just store voices there and make them available to users, without users having to understand the inner workings of voice paths etc.
Samuel
Although flite is very fast, it sounds like the words are run together while speaking; the whitespace that should separate the words is hard to hear in the generated audio.
For example this line: when spoken, the words are run together (I have found that for almost all words in a sentence). I can recognize the words when I'm looking at the text, but it's hard when the text is not there.
flite -ps -t "hello my name is John Doe!"
Outputs:
pau hh ax l ow m ay n ey m ih z jh aa n d ow pau
And when spoken (without the -ps flag), the sound is exactly like that. The pauses are only between sentences and not between words.
I tried to look through the documentation and found nothing; I tried to look through the code to see if I could increase the pause duration, but I couldn't find anything at all.
I find it hard to imagine I'm the only one who has noticed this, but I couldn't find anything on it, so I'm opening this issue.
flite-2.3-current Mar 2022 (http://cmuflite.org)
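As a possible workaround until pause duration is configurable (an untested sketch, in the same spirit as the pause-insertion trick in the Dalek post above): pre-process the input so each word boundary becomes a short sentence break before handing the text to flite.

```python
def add_word_pauses(text):
    """Turn each space into '. ' so flite inserts a short pause per word."""
    return text.replace(" ", ". ") + "."

print(add_word_pauses("hello my name is John Doe"))
# → hello. my. name. is. John. Doe.
```

This trades naturalness for separation, so it is only a stopgap; a real fix would adjust pause durations inside flite's prosody model.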
I have a Swedish Festival TTS and I need to include this language in Flite, but when I try, there are always some libraries that I don't have or that the program cannot find.
I downloaded the Swedish TTS from here --> http://person2.sol.lu.se/JohanFrid/festival/download.html
And I used these commands:
$FLITEDIR/tools/setup_flite
./bin/build_flite cg
cd flite
make
I can compile and run flite OK on macOS (10.14.6), but I can only generate .wav files - it won't speak text from the command line or a file, as in this example:
./bin/flite doc/alice
What do I need to set up to be able to do that? I don't see a way to select an audio device.
Using flite-2.0.7-current Jul 2017 on openSUSE Leap 15. Did git pull to ensure the latest code is installed.
PulseAudio controls where my audio is sent. Using padsp with flite works fine; audio goes to the right device. The flite docs indicate that pulse-simple can be compiled in. Experimentation shows I get the fewest errors on compile with ./configure --with-audio=pulseaudio, and the link message indicates that pulse and pulse-simple are linked in. But when I run the new flite without padsp, flite still complains that it cannot find /dev/dsp, so I guess I am missing a little detail, or my understanding of what is supposed to happen is incomplete.