
readbeyond / aeneas


aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)

Home Page: http://www.readbeyond.it/aeneas/

License: GNU Affero General Public License v3.0

Languages: Python 49.18%, HTML 3.95%, Shell 0.45%, C 9.73%, C++ 36.35%, Makefile 0.34%
Topics: speech, alignment, tts, python, linux, macos, windows, nlp, espeak, espeak-ng

aeneas's People

Contributors

cbeer, chrisvaughn, chrisvire, danielbair, eomerdws, readbeyond, stephenmcconnel


aeneas's Issues

Please update debian/changelog?

Hello Alberto,

Thank you for all the work you have been doing with Aeneas! It is great work.

We would like to update the package that we build of Aeneas, which is used by Scripture App Builder and Reading App Builder. Could you update debian/changelog, create an entry for 1.5.0.3, and include the changes that you think are relevant? You have done such a great job of documenting changes in previous changelog entries. I could try to come up with a list myself, but I am not sure it would be a good one.

Thanks,

Chris Hubbard

Global execution parameters

To be specified either on the command line, in a config file, or in ~/.config/aeneas.conf.

For stuff like setting the MFCC window size, disabling C extensions, etc.
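A minimal sketch of how such global parameters could be resolved, assuming an INI-style file with an `[aeneas]` section. The parameter names (`mfcc_window_length`, `c_extensions`), their default values, and the section name are hypothetical and not taken from aeneas. Command-line values override the config file, which overrides the built-in defaults.

```python
import configparser
import os

# Hypothetical defaults; names and values are illustrative only.
DEFAULTS = {"mfcc_window_length": "0.100", "c_extensions": "true"}


def load_global_parameters(cli_overrides=None, config_path="~/.config/aeneas.conf"):
    """Merge defaults, an optional INI-style config file, and CLI overrides."""
    params = dict(DEFAULTS)
    parser = configparser.ConfigParser()
    # read() silently skips files that do not exist
    parser.read(os.path.expanduser(config_path))
    if parser.has_section("aeneas"):
        params.update(parser["aeneas"])
    # Command-line values take precedence over everything else
    params.update(cli_overrides or {})
    return params
```

All values stay strings here; converting them to numbers or booleans would be the job of whatever code consumes them.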

Long term move from Python C extensions to CFFI

Today I tried running aeneas under PyPy (Python 2.7.10 branch). Everything seems to work, except cdtw and cmfcc: they compile, but they cannot be imported, producing the following error: AttributeError: _ARRAY_API not found ... ImportError: numpy.core.multiarray failed to import, both with NumPyPy and upstream NumPy.

When I asked on the PyPy IRC channel, they strongly suggested switching to CFFI, as the C API is not PyPy's preferred mechanism for calling C code.

So, in the long run, it might be worth considering switching to CFFI or supporting it alongside C extensions.

BeautifulSoup4 v4.5.0 breaks aeneas (API change?)

BeautifulSoup4 v4.5.0, released on PyPI on 2016-07-20, seems to include some API change that breaks aeneas when trying to parse XML files with lxml:

soup = BeautifulSoup("\n".join(lines), "lxml")

I am not sure whether this is a bug (there is nothing on the bs4 bug tracker yet), or an intentional API change in bs4.

For now (=> aeneas v1.5.1), with #92 I fixed this issue by setting exact version numbers for lxml and BeautifulSoup4 in requirements.txt and in setup.py, but the issue should be investigated further for the next releases.

For example, we might end up specifying exact versions for all pip-installable packages.

CC: @danielbair @chrisvire --- your installers should be fine, as they require BeautifulSoup4==4.4.1 and lxml==3.6.0. Same for the Vagrant procedure, which relies on pip install aeneas which should install the correct versions.

Add check on audio head/tail/process

Currently, if e.g. the user sets an audio tail beyond the actual length of the audio file, a cryptic error is returned: "Unexpected error while executing task : The given index is not valid".

Adding a check will help the user diagnose the issue.
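Such a check could look roughly like the sketch below. The function name, the argument names, and the assumption that `process` is an alternative to `tail` are all hypothetical; this is not the aeneas implementation.

```python
def check_head_tail(audio_length, head=0.0, tail=0.0, process=None):
    """Return a list of readable problems; an empty list means the values fit.

    All values are in seconds. `process` is the length of the portion to align
    after the head and is treated here as an alternative to `tail`.
    """
    problems = []
    if head < 0 or tail < 0:
        problems.append("head and tail must not be negative")
    if process is not None and process <= 0:
        problems.append("process must be positive")
    if head >= audio_length:
        problems.append("head (%.3f s) is not shorter than the audio (%.3f s)"
                        % (head, audio_length))
    if process is not None:
        if head + process > audio_length:
            problems.append("head + process (%.3f s) exceeds the audio length (%.3f s)"
                            % (head + process, audio_length))
    elif head + tail >= audio_length:
        problems.append("head + tail (%.3f s) leaves no audio to process (%.3f s)"
                        % (head + tail, audio_length))
    return problems
```

Returning messages instead of raising lets the caller report every problem at once, before the task is executed.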

The job cannot be loaded from the specified container

This is the result of my execute_job test. I couldn't find what is causing the problem.
It worked when tested on a Unix machine, but it does not work on Windows 7 64-bit.
Fresh installation of Python 2.7.10 (+BeautifulSoup and lxml), ffmpeg-20150916, espeak-1.48.04, numpy-1.9.2+mkl-cp27, scikits.audiolab-0.11.0-cp27, and VCForPython27.msi

c:\sync\aeneas-master>python -m aeneas.tools.execute_job test/01.zip output/ -v
[INFO] Loading job from container...
[DEBU] 2015-09-21 21:20:38.113000 ExecuteJob: Loading job from container...
[DEBU] 2015-09-21 21:20:38.113000 ExecuteJob: Validating container...
[DEBU] 2015-09-21 21:20:38.113000 Validator: Checking container file 'test/01.zip'
[DEBU] 2015-09-21 21:20:38.128000 Validator: Checking container file exists
[DEBU] 2015-09-21 21:20:38.128000 Validator: Checking container file has config file
[DEBU] 2015-09-21 21:20:38.128000 Validator: Container has TXT config file
[DEBU] 2015-09-21 21:20:38.128000 Validator: Checking container with TXT config file
[DEBU] 2015-09-21 21:20:38.128000 Validator: Trying to read config file from container
[DEBU] 2015-09-21 21:20:38.144000 Validator: Config file found in container
[DEBU] 2015-09-21 21:20:38.144000 Validator: Checking contents TXT config file
[DEBU] 2015-09-21 21:20:38.144000 Validator: Converting file contents to config string
[DEBU] 2015-09-21 21:20:38.144000 Validator: Checking that string is well encoded
[DEBU] 2015-09-21 21:20:38.144000 Validator: Checking that the given string is well encoded
[DEBU] 2015-09-21 21:20:38.144000 Validator: Checking encoding of string
[DEBU] 2015-09-21 21:20:38.144000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.144000 Validator: Checking for reserved characters
[DEBU] 2015-09-21 21:20:38.144000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.144000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.144000 Validator: Checking required parameters
[DEBU] 2015-09-21 21:20:38.160000 Validator: Checking required parameters '['is_hierarchy_type', 'is_hierarchy_prefix', 'is_text_file_relative_path', 'is_text_file_name_regex', 'is_text_type', 'is_audio_file_relative_path', 'is_audio_file_name_regex', 'os_job_file_name', 'os_job_file_container', 'os_job_file_hierarchy_type', 'os_job_file_hierarchy_prefix', 'os_task_file_name', 'os_task_file_format', 'job_language']'
[DEBU] 2015-09-21 21:20:38.285000 Validator: Checking required parameters
[DEBU] 2015-09-21 21:20:38.300000 Validator: Checking input parameters are not empty
[DEBU] 2015-09-21 21:20:38.332000 Validator: Checking no required parameter is missing
[DEBU] 2015-09-21 21:20:38.378000 Validator: Checking all parameter values are allowed
[DEBU] 2015-09-21 21:20:38.410000 Validator: Checking allowed values for parameter 'job_language'
[DEBU] 2015-09-21 21:20:38.457000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.472000 Validator: Checking allowed values for parameter 'task_language'
[DEBU] 2015-09-21 21:20:38.519000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.535000 Validator: Checking allowed values for parameter 'os_job_file_container'
[DEBU] 2015-09-21 21:20:38.582000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.597000 Validator: Checking allowed values for parameter 'is_hierarchy_type'
[DEBU] 2015-09-21 21:20:38.644000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.660000 Validator: Checking allowed values for parameter 'os_job_file_hierarchy_type'
[DEBU] 2015-09-21 21:20:38.707000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.722000 Validator: Checking allowed values for parameter 'is_text_type'
[DEBU] 2015-09-21 21:20:38.753000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.785000 Validator: Checking allowed values for parameter 'os_task_file_format'
[DEBU] 2015-09-21 21:20:38.816000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.847000 Validator: Checking allowed values for parameter 'task_adjust_boundary_algorithm'
[DEBU] 2015-09-21 21:20:38.878000 Validator: Passed
[DEBU] 2015-09-21 21:20:38.910000 Validator: Checking all implied parameters are present
[DEBU] 2015-09-21 21:20:38.941000 Validator: Checking implied parameters by 'is_hierarchy_type'='paged'
[DEBU] 2015-09-21 21:20:38.988000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.003000 Validator: Checking implied parameters by 'is_text_type'='unparsed'
[DEBU] 2015-09-21 21:20:39.050000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.066000 Validator: Checking implied parameters by 'is_text_type'='unparsed'
[DEBU] 2015-09-21 21:20:39.113000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.128000 Validator: Checking implied parameters by 'os_task_file_format'='smil'
[DEBU] 2015-09-21 21:20:39.160000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.191000 Validator: Checking implied parameters by 'os_task_file_format'='smil'
[DEBU] 2015-09-21 21:20:39.222000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.238000 Validator: Checking implied parameters by 'task_adjust_boundary_algorithm'='percent'
[DEBU] 2015-09-21 21:20:39.285000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.300000 Validator: Checking implied parameters by 'task_adjust_boundary_algorithm'='rate'
[DEBU] 2015-09-21 21:20:39.347000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.363000 Validator: Checking implied parameters by 'task_adjust_boundary_algorithm'='rateaggressive'
[DEBU] 2015-09-21 21:20:39.394000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.425000 Validator: Checking implied parameters by 'task_adjust_boundary_algorithm'='aftercurrent'
[DEBU] 2015-09-21 21:20:39.457000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.472000 Validator: Checking implied parameters by 'task_adjust_boundary_algorithm'='beforenext'
[DEBU] 2015-09-21 21:20:39.519000 Validator: Passed
[DEBU] 2015-09-21 21:20:39.550000 Validator: Checking required parameters: returning True
[DEBU] 2015-09-21 21:20:39.582000 Validator: Checking contents TXT config file: returning True
[DEBU] 2015-09-21 21:20:39.628000 Validator: Analyze the contents of the container
[DEBU] 2015-09-21 21:20:39.675000 Validator: Checking the Job object generated from container
[DEBU] 2015-09-21 21:20:39.722000 Validator: Checking the Job is not None
[DEBU] 2015-09-21 21:20:39.738000 Validator: Checking the Job has at least one Task
[DEBU] 2015-09-21 21:20:39.785000 Validator: Unable to create at least one Task from the container.
[DEBU] 2015-09-21 21:20:39.816000 Validator: Checking container with TXT config file: returning False
[DEBU] 2015-09-21 21:20:39.863000 Validator: Checking container: returning False
[DEBU] 2015-09-21 21:20:39.894000 ExecuteJob: Validating container: failed
[DEBU] 2015-09-21 21:20:39.925000 ExecuteJob: Loading job from container: failed
[INFO] Loading job from container... done
[ERRO] The job cannot be loaded from the specified container

Config:

is_hierarchy_type=flat
is_hierarchy_prefix=input/
is_text_file_relative_path=.
is_text_file_name_regex=.*\.txt
is_text_type=parsed
is_audio_file_relative_path=.
is_audio_file_name_regex=.*\.MP3

os_job_file_name=output_test-01
os_job_file_container=zip
os_job_file_hierarchy_type=flat
os_job_file_hierarchy_prefix=input/
os_task_file_name=$PREFIX.smil
os_task_file_format=smil
os_task_file_smil_page_ref=$PREFIX.xhtml
os_task_file_smil_audio_ref=$PREFIX.mp3

job_language=en
job_description=Test 01 (flat hierarchy, parsed text files)

debian/ubuntu package

We would like to include aeneas as a package dependency of the Linux version of Scripture App Builder (http://software.sil.org/scriptureappbuilder), which is free software. Is anyone working on a Debian/Ubuntu package? Would you accept a pull request if I did the work as a native package, or should I create a non-native package and keep it in a separate repo? Which would you prefer?

Cache synthesized WAV files

Currently, when using a TTS called via subprocess or remote API, each fragment is synthesized individually. Hence, in case of repeated fragments, they get synthesized more than once.

The problem especially affects those using a (paid, or free but limited) TTS API.

The solution would be adding a "cache" mechanism, so that a fragment is not synthesized again if a fragment with the same text and language has already been synthesized. This requires two things:

  1. keeping a dictionary, mapping fragment (language, text) => tmp WAV file
  2. removing all the WAV files at the end of the synthesis process

Perhaps this caching must be explicitly enabled by the user (since it requires more tmp disk space) and/or enabled by default only for TTS API wrappers, like the current Nuance one.
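The two requirements above can be sketched as follows. The class name and the `synthesize(language, text, path)` callable are hypothetical stand-ins for whatever TTS wrapper is in use; this is an illustration under those assumptions, not the aeneas code.

```python
import os
import tempfile


class SynthesisCache:
    """Map (language, text) to a temporary WAV file so that repeated
    fragments are synthesized only once."""

    def __init__(self, synthesize):
        # `synthesize(language, text, path)` is a hypothetical callable
        # that writes the synthesized audio to `path`
        self._synthesize = synthesize
        self._files = {}

    def get(self, language, text):
        """Return the path of the WAV file for this fragment."""
        key = (language, text)
        if key not in self._files:
            handle, path = tempfile.mkstemp(suffix=".wav")
            os.close(handle)
            try:
                self._synthesize(language, text, path)
            except Exception:
                # Do not leave an empty file behind if synthesis fails
                os.remove(path)
                raise
            self._files[key] = path
        return self._files[key]

    def clear(self):
        """Remove all cached WAV files; call at the end of the synthesis."""
        for path in self._files.values():
            if os.path.exists(path):
                os.remove(path)
        self._files.clear()
```

Because the cache owns its temporary files, a single `clear()` call at the end covers the cleanup step, and the extra disk usage is bounded by the number of distinct (language, text) pairs.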

cew on Windows

The Python C extension cew can be compiled on Windows, but it requires manually patching the espeak DLLs, etc.

See if espeak-ng makes this feasible.

Config files and parameter names

This is a long term goal.

Adopting a popular format (INI-like, e.g. TOML).

Replacing the current parameter names (too long and complex) with simpler ones.

Call festival via C++ extension

Festival has a C++ API, so we might consider creating a cfw Python C(++?) extension, similar to cew for eSpeak.

From my preliminary test (a simple C++ executable that synthesizes a given number of fragments and concatenates them, saving a single file to disk), it is 8-10x faster to generate 100-1000 fragments than the current subprocess-based Python wrapper. For 1k fragments (2k words, ~21min total audio), the C++ code takes about 2 min, instead of ~25 min of the Python code.

There might be issues with getting the Python C(++?) extension to compile, as the C++ part depends on several libraries, in particular festival and several sub-libraries of speech_tools.

cc @ozdefir

Compiling C extensions on Windows and Python 3.4/3.5

After a preliminary search, it looks like there is no equivalent of "Microsoft Visual C++ compiler for Python 2.7" for Python 3.

One must install the correct Microsoft Visual Studio or Visual C/C++ (free, but several GB of download...), as described here:

https://matthew-brett.github.io/pydagogue/python_msvc.html

or

http://stackoverflow.com/questions/29909330/microsoft-visual-c-compiler-for-python-3-4

before being able to compile Python C extensions.

Investigate this further.

Rewrite ``sd``

Too many magic numbers. Test other/better approaches.

Remove linux-only blocks for aeneas.cew

Please remove the linux-only blocks for aeneas.cew now.
I have merged the patches from https://github.com/pettarin/espeakosx into the homebrew espeak to compile and install libespeak.
I've submitted a pull request against the espeak.rb formula, but the Homebrew maintainers are now considering dropping espeak from their official formula list (see Homebrew/homebrew-core#2726), so it may be necessary to use my Homebrew tap from now on.

Add the tap:
brew tap danielbair/tap
Then install as any other formula:
brew install danielbair/tap/espeak

Mac and Windows installers are available for aeneas from https://github.com/sillsdev/aeneas-installer/releases with cew compiled and working!

Former TODO list (to be split out)

  • Improving robustness against music in background
  • Isolating non-speech intervals (music, prolonged silence)
  • Automated text fragmentation based on audio analysis
  • Auto-tuning DTW parameters
  • Reporting the alignment score
  • Multilevel sync map granularity (e.g., multilevel SMIL output)
  • Testing other approaches, like GMM/HMM/NN (e.g., using HTK or Kaldi)

Creating a Path class or some path sanitize functions

Right now paths are treated as (Unicode) strings, and this might cause problems, given all the nefarious Windows path issues we all know.

Perhaps it is worth considering creating a specialized class or some path sanitize functions in globalfunctions.py.

A specialized class has the advantage of making e.g. "slash conversion" (/ => \ on Windows) transparent to the rest of the code. But perhaps it is overkill and global functions will suffice.
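As a rough illustration of the "global function" option, the sketch below normalizes a path for an explicitly chosen target platform. The function name and the `windows` flag are hypothetical; in real code `os.path.normpath` already selects the right behavior for the platform it runs on.

```python
import ntpath
import posixpath


def sanitize_path(path, windows=False):
    """Normalize separators and redundant components for the target platform."""
    if windows:
        # ntpath.normpath converts "/" to "\" and collapses "." and ".."
        return ntpath.normpath(path)
    # On POSIX a backslash is a legal file name character,
    # so only redundant components are collapsed
    return posixpath.normpath(path)
```

A dedicated class would wrap the same calls but keep the conversion hidden from callers, which is the trade-off discussed above.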

Rewrite ``vad``

Use numpy more, e.g. boolean masks (numpy.ma) and rolling windows.
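A small sketch of the idea, assuming a per-frame energy array is the input. The function name, the threshold semantics, and the minimum-run rule are hypothetical and do not describe the existing vad module. It uses plain boolean arrays rather than numpy.ma, and a convolution as the rolling window.

```python
import numpy as np


def speech_mask(energy, threshold, min_nonspeech_frames):
    """Return a boolean mask (True = speech) from per-frame energy values.

    Frames below `threshold` count as nonspeech only when they belong to a
    run of at least `min_nonspeech_frames` (>= 1) quiet frames; shorter
    pauses are kept as speech.
    """
    quiet = np.asarray(energy) < threshold
    n = min_nonspeech_frames
    if n > quiet.size:
        # No run can be long enough: everything is speech
        return np.ones(quiet.size, dtype=bool)
    # Rolling window: number of quiet frames in each window of length n
    counts = np.convolve(quiet.astype(int), np.ones(n, dtype=int), mode="valid")
    starts = np.flatnonzero(counts == n)  # windows that are entirely quiet
    nonspeech = np.zeros(quiet.size, dtype=bool)
    for offset in range(n):
        # Mark every frame covered by an entirely quiet window
        nonspeech[starts + offset] = True
    return ~nonspeech
```

The only Python-level loop runs over the window length, not over the frames, so the cost per frame stays inside numpy.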

DTW anchor indexing problem due to non-integer TTS sample rate * shift (was: Systematic negative bias observable in longer audios)

With longer audios I observe a consistent negative bias which increases gradually towards the end. To make sure it's not a playback issue I tested with Audacity which confirmed the observation.
Examples:

https://readiance.org/finetuneas/librivox/the-brothers-karamazov-by-fyodor-dostoyevsky/40-book-6-chapter-2-the-duel-the
https://readiance.org/finetuneas/librivox/childrens-short-works-vol-011-by-various/the-little-mermaid-childrens-short-works?g=s

The alignments are almost perfect, so I thought it could be due to floating point math or rounding.
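To get a feeling for how large such an effect can be, the sketch below compares the nominal time of a frame (index times shift) with the time obtained when the hop is truncated to a whole number of samples. The sample rates and shifts in the usage note are hypothetical, and truncation is only an assumption about where the mismatch could come from; it is not a confirmed diagnosis.

```python
from fractions import Fraction


def frame_time_drift(sample_rate, shift, n_frames):
    """Return the gap in seconds between nominal and actual time after n_frames.

    `shift` is given as a decimal string (e.g. "0.025") so the arithmetic is
    exact. Nominal time assumes every frame advances by exactly `shift`
    seconds; actual time assumes the hop is truncated to whole samples.
    """
    shift = Fraction(shift)
    exact_hop = sample_rate * shift      # samples per frame, may be non-integer
    actual_hop = int(exact_hop)          # truncated to a whole number of samples
    nominal = n_frames * shift
    actual = Fraction(n_frames * actual_hop, sample_rate)
    return float(nominal - actual)
```

With 16000 Hz and a 0.040 s shift the hop is exactly 640 samples and the gap is zero. With 22050 Hz and a 0.025 s shift the hop would be 551.25 samples, so a quarter of a sample is lost per frame and the gap grows linearly with the frame index, reaching roughly 0.45 s after 40000 frames.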

Packaging for OSX

At SIL, we are working on releasing Scripture App Builder for Mac (will build Android and iOS apps). We would like to include Aeneas support on the Mac. I have been in discussion with @danielbair on creating a package for OSX. Would you accept a pull request for this (similar to the debian packaging) or should we keep it as a separate repo?

Thanks,

Chris

Creating executables of aeneas with pyinstaller

I am working on it in my personal repo, in the devel branch.

This needs:

  1. addressing sys.stdin.encoding being None
  2. creating a hydra tool, so that only one executable needs to be built for each (OS, 32/64-bit) pair
  3. including the correct res/ files in the .spec configuration
  4. providing the .spec configurations: one for "one directory" and one for "one file" mode
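For the first item (the standard input encoding being None), one possible workaround is a small helper that falls back to a default encoding. The function name and the choice of UTF-8 as the default are assumptions made for this sketch.

```python
import sys


def stream_encoding(stream=None, default="utf-8"):
    """Return the encoding of `stream`, or `default` if it is missing or None."""
    if stream is None:
        stream = sys.stdin
    # Covers sys.stdin being None, objects without the attribute,
    # and the attribute itself being None
    return getattr(stream, "encoding", None) or default
```

Callers would then decode input with `stream_encoding()` instead of reading the attribute directly.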

Aeneas and Python3

Hi there,

I have not yet dived into the actual aeneas code, but I'd like to get a few things clear before doing that.
For testing purposes, I wanted to include it in a Python 3 project, but that choked on the beautifulsoup version (3.2.1) that it required.

  • Am I correct that aeneas only runs in Python 2?
  • Could Aeneas work with a higher version of BS?
  • How much would it take to rework Aeneas into a Py 3 version?

Thanks a lot

cew on OS X

At the moment the Python C extension cew works on OS X (with a modified cew_setup.py), but it requires compiling espeak as a static library and copying it into the aeneas/ directory.

See if this can be automated, especially now that espeak-ng seems to be the active upstream.

Expose additional eSpeak voices

Currently the languages allowed by the validation process are a subset of the voices available to espeak. Could we add the rest, or at least the English variants such as en-gb and en-us?
