# aeneas

**aeneas** is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment).

* Version: 1.7.3
* Date: 2017-03-15
* Developed by: [ReadBeyond](http://www.readbeyond.it/)
* Lead Developer: [Alberto Pettarin](http://www.albertopettarin.it/)
* License: the GNU Affero General Public License Version 3 (AGPL v3)
* Contact: [aeneas@readbeyond.it](mailto:aeneas@readbeyond.it)
* Quick Links: [Home](http://www.readbeyond.it/aeneas/) - [GitHub](https://github.com/readbeyond/aeneas/) - [PyPI](https://pypi.python.org/pypi/aeneas/) - [Docs](http://www.readbeyond.it/aeneas/docs/) - [Tutorial](http://www.readbeyond.it/aeneas/docs/clitutorial.html) - [Benchmark](https://readbeyond.github.io/aeneas-benchmark/) - [Mailing List](https://groups.google.com/d/forum/aeneas-forced-alignment) - [Web App](http://aeneasweb.org)

## Goal

**aeneas** automatically generates a **synchronization map** between a list of text fragments and an audio file containing the narration of the text. In computer science this task is known as (automatically computing a) **forced alignment**.

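To make the idea of a synchronization map concrete, here is a minimal sketch of one way to model it in plain Python: a list of fragments, each carrying its text and a time interval in seconds. The `Fragment` type, the `format_time` and `is_contiguous` helpers, and the sample values are illustrative assumptions, not part of the **aeneas** API and not real **aeneas** output.

```python
from collections import namedtuple

# Hypothetical container for one entry of a synchronization map:
# the fragment text plus its begin/end times in seconds.
Fragment = namedtuple("Fragment", ["text", "begin", "end"])


def format_time(seconds):
    """Format a time value in seconds as HH:MM:SS.mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3600 * 1000)
    minutes, millis = divmod(millis, 60 * 1000)
    secs, millis = divmod(millis, 1000)
    return "%02d:%02d:%02d.%03d" % (hours, minutes, secs, millis)


def is_contiguous(sync_map):
    """Return True if every fragment begins where the previous one ends."""
    return all(a.end == b.begin for a, b in zip(sync_map, sync_map[1:]))


# Illustrative values only.
sync_map = [
    Fragment("first fragment", 0.0, 2.5),
    Fragment("second fragment", 2.5, 6.0),
    Fragment("third fragment", 6.0, 9.25),
]

for fragment in sync_map:
    print("%-16s => [%s, %s]" % (
        fragment.text,
        format_time(fragment.begin),
        format_time(fragment.end),
    ))
```

In this sketch the intervals tile the audio without gaps, which `is_contiguous(sync_map)` checks.
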

For example, given [this text file](https://raw.githubusercontent.com/readbeyond/aeneas/master/aeneas/tests/res/container/job/assets/p001.xhtml) and [this audio file](https://raw.githubusercontent.com/readbeyond/aeneas/master/aeneas/tests/res/container/job/assets/p001.mp3), **aeneas** determines, for each fragment, the corresponding time interval in the audio file:

```
1                                                     => [00:00:00.000, 00:00:02.640]
From fairest creatures we desire increase,            => [00:00:02.640, 00:00:05.880]
That thereby beauty's rose might never die,           => [00:00:05.880, 00:00:09.240]
But as the riper should by time decease,              => [00:00:09.240, 00:00:11.920]
His tender heir might bear his memory:                => [00:00:11.920, 00:00:15.280]
But thou contracted to thine own bright eyes,         => [00:00:15.280, 00:00:18.800]
Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760]
Making a famine where abundance lies,                 => [00:00:22.760, 00:00:25.680]
Thy self thy foe, to thy sweet self too cruel:        => [00:00:25.680, 00:00:31.240]
Thou that art now the world's fresh ornament,         => [00:00:31.240, 00:00:34.400]
And only herald to the gaudy spring,                  => [00:00:34.400, 00:00:36.920]
Within thine own bud buriest thy content,             => [00:00:36.920, 00:00:40.640]
And tender churl mak'st waste in niggarding:          => [00:00:40.640, 00:00:43.640]
Pity the world, or else this glutton be,              => [00:00:43.640, 00:00:48.080]
To eat the world's due, by the grave and thee.        => [00:00:48.080, 00:00:53.240]
```

![Waveform with aligned labels, detail](wiki/align.png)

This synchronization map can be output to file in several formats, depending on its application:

* research: Audacity (AUD), ELAN (EAF), TextGrid;
* digital publishing: SMIL for EPUB 3;
* closed captioning: SubRip (SRT), SubViewer (SBV/SUB), TTML, WebVTT (VTT);
* Web: JSON;
* further processing: CSV, SSV, TSV, TXT, XML.

## System Requirements, Supported Platforms and Installation

### System Requirements

1. A reasonably recent machine (recommended 4 GB RAM, 2 GHz 64bit CPU)
2. [Python](https://python.org/) 2.7 (Linux, OS X, Windows) or 3.5 or later (Linux, OS X)
3. [FFmpeg](https://www.ffmpeg.org/)
4. [eSpeak](http://espeak.sourceforge.net/)
5. Python packages `BeautifulSoup4`, `lxml`, and `numpy`
6. Python headers to compile the Python C/C++ extensions (optional but strongly recommended)
7. A shell supporting UTF-8 (optional but strongly recommended)

### Supported Platforms

**aeneas** has been developed and tested on **Debian 64bit**, with **Python 2.7** and **Python 3.5**, which are the **only supported platforms** at the moment. Nevertheless, **aeneas** has been confirmed to work on other Linux distributions, Mac OS X, and Windows. See the [PLATFORMS file](https://github.com/readbeyond/aeneas/blob/master/wiki/PLATFORMS.md) for details.

If installing **aeneas** natively on your OS proves difficult, you are strongly encouraged to use [aeneas-vagrant](https://github.com/readbeyond/aeneas-vagrant), which provides **aeneas** inside a virtualized Debian image running under [VirtualBox](https://www.virtualbox.org/) and [Vagrant](http://www.vagrantup.com/), both of which can be installed on any modern OS (Linux, Mac OS X, Windows).

### Installation

All-in-one installers are available for Mac OS X and Windows, and a Bash script for deb-based Linux distributions (Debian, Ubuntu) is provided in this repository. It is also possible to download a VirtualBox+Vagrant virtual machine. Please see the [INSTALL file](https://github.com/readbeyond/aeneas/blob/master/wiki/INSTALL.md) for detailed, step-by-step installation procedures for different operating systems.

The generic OS-independent procedure is simple:

1. **Install** [Python](https://python.org/) (2.7.x preferred), [FFmpeg](https://www.ffmpeg.org/), and [eSpeak](http://espeak.sourceforge.net/)

2. Make sure the following **executables** can be called from your **shell**: `espeak`, `ffmpeg`, `ffprobe`, `pip`, and `python`

3. First install `numpy` with `pip` and then `aeneas` (this order is important):

    ```bash
    pip install numpy
    pip install aeneas
    ```

4. To **check** whether you installed **aeneas** correctly, run:

    ```bash
    python -m aeneas.diagnostics
    ```

## Usage

1. Run without arguments to get the **usage message**:

    ```bash
    python -m aeneas.tools.execute_task
    python -m aeneas.tools.execute_job
    ```

    You can also get a list of **live examples** that you can immediately run on your machine thanks to the included files:

    ```bash
    python -m aeneas.tools.execute_task --examples
    python -m aeneas.tools.execute_task --examples-all
    ```

2. To **compute a synchronization map** `map.json` for a pair (`audio.mp3`, `text.txt` in [plain](http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.PLAIN) text format), you can run:

    ```bash
    python -m aeneas.tools.execute_task \
        audio.mp3 \
        text.txt \
        "task_language=eng|os_task_file_format=json|is_text_type=plain" \
        map.json
    ```

    (The command has been split into lines with `\` for visual clarity; in production you can have the entire command on a single line and/or you can use shell variables.)

    To **compute a synchronization map** `map.smil` for a pair (`audio.mp3`, [page.xhtml](http://www.readbeyond.it/aeneas/docs/textfile.html#aeneas.textfile.TextFileFormat.UNPARSED) containing fragments marked by `id` attributes like `f001`), you can run:

    ```bash
    python -m aeneas.tools.execute_task \
        audio.mp3 \
        page.xhtml \
        "task_language=eng|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \
        map.smil
    ```

    As you can see, the third argument (the _configuration string_) specifies the parameters controlling the I/O formats and the processing options for the task. Consult the [documentation](http://www.readbeyond.it/aeneas/docs/) for details.

3. If you have several tasks to process, you can create a **job container** to batch process them:

    ```bash
    python -m aeneas.tools.execute_job job.zip output_directory
    ```

    File `job.zip` should contain a `config.txt` or `config.xml` configuration file, providing **aeneas** with all the information needed to parse the input assets and format the output sync map files. Consult the [documentation](http://www.readbeyond.it/aeneas/docs/) for details.

The [documentation](http://www.readbeyond.it/aeneas/docs/) contains a highly suggested [tutorial](http://www.readbeyond.it/aeneas/docs/clitutorial.html) which explains how to use the built-in command line tools.

## Documentation and Support

* Documentation: [http://www.readbeyond.it/aeneas/docs/](http://www.readbeyond.it/aeneas/docs/)
* Command line tools tutorial: [http://www.readbeyond.it/aeneas/docs/clitutorial.html](http://www.readbeyond.it/aeneas/docs/clitutorial.html)
* Library tutorial: [http://www.readbeyond.it/aeneas/docs/libtutorial.html](http://www.readbeyond.it/aeneas/docs/libtutorial.html)
* Old, verbose tutorial: [A Practical Introduction To The aeneas Package](http://www.albertopettarin.it/blog/2015/05/21/a-practical-introduction-to-the-aeneas-package.html)
* Mailing list: [https://groups.google.com/d/forum/aeneas-forced-alignment](https://groups.google.com/d/forum/aeneas-forced-alignment)
* Changelog: [http://www.readbeyond.it/aeneas/docs/changelog.html](http://www.readbeyond.it/aeneas/docs/changelog.html)
* High level description of how aeneas works: [HOWITWORKS](https://github.com/readbeyond/aeneas/blob/master/wiki/HOWITWORKS.md)
* Development history: [HISTORY](https://github.com/readbeyond/aeneas/blob/master/wiki/HISTORY.md)
* Testing: [TESTING](https://github.com/readbeyond/aeneas/blob/master/wiki/TESTING.md)
* Benchmark suite: [https://readbeyond.github.io/aeneas-benchmark/](https://readbeyond.github.io/aeneas-benchmark/)

## Supported Features

* Input text files in `parsed`, `plain`, `subtitles`, or `unparsed` (XML) format
* Multilevel input text files in `mplain` and `munparsed` (XML) format
* Text extraction from XML (e.g., XHTML) files using `id` and `class` attributes
* Arbitrary text fragment granularity (single word, subphrase, phrase, paragraph, etc.)
* Input audio file formats: all those readable by `ffmpeg`
* Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB, TEXTGRID, TSV, TTML, TXT, VTT, XML
* Confirmed working on 38 languages: AFR, ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, JPN, LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR
* MFCC and DTW computed via Python C extensions to reduce the processing time
* Several built-in TTS engine wrappers: AWS Polly TTS API, eSpeak (default), eSpeak-ng, Festival, MacOS (via say), Nuance TTS API
* Default TTS (eSpeak) called via a Python C extension for fast audio synthesis
* Possibility of running a custom, user-provided TTS engine Python wrapper (e.g., included example for speect)
* Batch processing of multiple audio/text pairs
* Download audio from a YouTube video
* In multilevel mode, recursive alignment from paragraph to sentence to word level
* In multilevel mode, MFCC resolution, MFCC masking, DTW margin, and TTS engine can be specified for each level independently
* Robust against misspelled/mispronounced words, local rearrangements of words, background noise/sporadic spikes
* Adjustable splitting times, including a max character/second constraint for CC applications
* Automated detection of audio head/tail
* Output an HTML file for fine tuning the sync map manually (`finetuneas` project)
* Execution parameters tunable at runtime
* Code suitable for Web app deployment (e.g., on-demand cloud computing instances)
* Extensive test suite including 1,200+ unit/integration/performance tests, which are run and must pass before each release

## Limitations and Missing Features

* Audio should match the text: large portions of spurious text or audio might produce a wrong sync map
* Audio is assumed to be spoken: not suitable for song captioning, YMMV for CC applications
* No protection against memory swapping: be sure your amount of RAM is adequate for the maximum duration of a single audio file (e.g., 4 GB RAM => max 2h audio; 16 GB RAM => max 10h audio)
* [Open issues](https://github.com/readbeyond/aeneas/issues)

### A Note on Word-Level Alignment

A significant number of users run **aeneas** to align audio and text at word-level (i.e., each fragment is a word). Although **aeneas** was not designed with word-level alignment in mind and the results might be inferior to [ASR-based forced aligners](https://github.com/pettarin/forced-alignment-tools) for languages with good ASR models, **aeneas** offers some options to improve the quality of the alignment at word-level:

* multilevel text (since v1.5.1),
* MFCC nonspeech masking (since v1.7.0, disabled by default),
* use better TTS engines, like Festival or AWS/Nuance TTS API (since v1.5.0).

If you use the `aeneas.tools.execute_task` command line tool, you can add the `--presets-word` switch to enable MFCC nonspeech masking, for example:

```bash
python -m aeneas.tools.execute_task --example-words --presets-word
python -m aeneas.tools.execute_task --example-words-multilevel --presets-word
```

If you use **aeneas** as a library, just set the appropriate `RuntimeConfiguration` parameters. Please see the [command line tutorial](http://www.readbeyond.it/aeneas/docs/clitutorial.html) for details.

## License

**aeneas** is released under the terms of the GNU Affero General Public License Version 3. See the [LICENSE file](https://github.com/readbeyond/aeneas/blob/master/LICENSE) for details.

Licenses for third party code and files included in **aeneas** can be found in the [licenses](https://github.com/readbeyond/aeneas/blob/master/licenses/README.md) directory.

No copy rights were harmed in the making of this project.

## Supporting and Contributing

### Sponsors

* **July 2015**: [Michele Gianella](https://plus.google.com/+michelegianella/about) generously supported the development of the boundary adjustment code (v1.0.4)
* **August 2015**: [Michele Gianella](https://plus.google.com/+michelegianella/about) partially sponsored the port of the MFCC/DTW code to C (v1.1.0)
* **September 2015**: friends in West Africa partially sponsored the development of the head/tail detection code (v1.2.0)
* **October 2015**: an anonymous donation sponsored the development of the "YouTube downloader" option (v1.3.0)
* **April 2016**: the Fruch Foundation kindly sponsored the development and documentation of v1.5.0
* **December 2016**: the [Centro Internazionale Del Libro Parlato "Adriano Sernagiotto"](http://www.libroparlato.org/) (Feltre, Italy) partially sponsored the development of the v1.7 series

### Supporting

Would you like to support the development of **aeneas**?

I accept sponsorships to

* fix bugs,
* add new features,
* improve the quality and the performance of the code,
* port the code to other languages/platforms, and
* improve the documentation.

Feel free to [get in touch](mailto:aeneas@readbeyond.it).

### Contributing

If you think you found a bug or you have a feature request, please use the [GitHub issue tracker](https://github.com/readbeyond/aeneas/issues) to submit it.

If you want to ask a question about using **aeneas**, your best option is to send an email to the [mailing list](https://groups.google.com/d/forum/aeneas-forced-alignment).

Finally, code contributions are welcome! Please refer to the [Code Contribution Guide](https://github.com/readbeyond/aeneas/blob/master/wiki/CONTRIBUTING.md) for details about the branch policies and the code style to follow.

## Acknowledgments

Many thanks to **Nicola Montecchio**, who suggested using MFCCs and DTW, and co-developed the first experimental code for aligning audio and text.


**Paolo Bertasi**, who developed the APIs and Web application for ReadBeyond Sync, helped shape the structure of this package for its asynchronous usage.

**Chris Hubbard** prepared the files for packaging aeneas as a Debian/Ubuntu `.deb`.

**Daniel Bair** prepared the `brew` formula for installing **aeneas** and its dependencies on Mac OS X.

**Daniel Bair**, **Chris Hubbard**, and **Richard Margetts** packaged the installers for Mac OS X and Windows.

**Firat Ozdemir** contributed the `finetuneas` HTML/JS code for fine tuning sync maps in the browser.

**Willem van der Walt** contributed the code snippet to output a sync map in TextGrid format.

**Chris Vaughn** contributed the MacOS TTS wrapper.

All the mighty [GitHub contributors](https://github.com/readbeyond/aeneas/graphs/contributors), and the members of the [Google Group](https://groups.google.com/d/forum/aeneas-forced-alignment).