chrome.tts
Description: Use the chrome.tts API to play synthesized text-to-speech (TTS). See also the related ttsEngine API, which allows an extension to implement a speech engine.
Availability: Stable since Chrome 14.
Permissions: "tts"
Learn More: Chrome Office Hours: Text to Speech API
Overview
Chrome provides native support for speech on Windows (using SAPI 5), Mac OS X, and Chrome OS, using speech synthesis capabilities provided by the operating system. On all platforms, the user can install extensions that register themselves as alternative speech engines.
Generating speech
Call speak() from your extension or Chrome App to speak. For example:
chrome.tts.speak('Hello, world.');
To stop speaking immediately, call stop():
chrome.tts.stop();
You can provide options that control various properties of the speech, such as its rate, pitch, and more. For example:
chrome.tts.speak('Hello, world.', {'rate': 2.0});
It's also a good idea to specify the language so that a synthesizer supporting that language (and regional dialect, if applicable) is chosen.
chrome.tts.speak('Hello, world.', {'lang': 'en-US', 'rate': 2.0});
By default, each call to speak() interrupts any ongoing speech and speaks immediately. To determine if a call would be interrupting anything, you can call isSpeaking(). In addition, you can use the enqueue option to cause this utterance to be added to a queue of utterances that will be spoken when the current utterance has finished.
chrome.tts.speak('Speak this first.');
chrome.tts.speak('Speak this next, when the first sentence is done.', {'enqueue': true});
A complete description of all options can be found in the speak() method reference below. Not all speech engines support all options.
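The enqueue behaviour described above can be wrapped in a small helper. This is a minimal sketch, not part of the API: speakInOrder is a hypothetical name, and the tts argument is assumed to be an object shaped like chrome.tts, passed in so the function can also be exercised with a stand-in outside an extension.

```javascript
// Hypothetical helper: speak a list of sentences one after another.
// `tts` is assumed to expose speak(utterance, options) like chrome.tts.
function speakInOrder(tts, sentences, baseOptions) {
  sentences.forEach(function (sentence, index) {
    // Copy the shared options so every call gets its own object.
    var options = Object.assign({}, baseOptions, {
      // The first utterance interrupts ongoing speech (the default);
      // each later one is queued behind the previous one.
      enqueue: index > 0
    });
    tts.speak(sentence, options);
  });
}

// Inside an extension the real API object would be passed in.
if (typeof chrome !== 'undefined' && chrome.tts) {
  speakInOrder(chrome.tts, ['Speak this first.', 'Speak this next.'], {lang: 'en-US'});
}
```

Passing the API object as a parameter is a design choice for testability only; calling chrome.tts.speak() directly works the same way.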
To catch errors and make sure you're calling speak() correctly, pass a callback function that takes no arguments. Inside the callback, check runtime.lastError to see if there were any errors.
chrome.tts.speak(utterance, options, function() {
  if (chrome.runtime.lastError) {
    console.log('Error: ' + chrome.runtime.lastError.message);
  }
});
The callback is called right away, before the engine has started generating speech. Its purpose is to alert you to syntax errors in your use of the TTS API, not to catch every error that might occur while synthesizing and outputting speech. To catch those errors too, use an event listener, described below.
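One way to keep the lastError check in a single place is a thin wrapper that reports the outcome through an error-first callback. The sketch below is an illustration only: speakChecked is a hypothetical name, and api is assumed to be an object shaped like the global chrome object (with tts and runtime members).

```javascript
// Hypothetical wrapper: done(error) if the call was rejected,
// done(null) if it was accepted.
// `api` is assumed to look like the global `chrome` object.
function speakChecked(api, utterance, options, done) {
  api.tts.speak(utterance, options, function () {
    var lastError = api.runtime.lastError;
    if (lastError) {
      done(new Error(lastError.message));
    } else {
      // Accepted only means the call was well-formed;
      // synthesis itself may still fail later.
      done(null);
    }
  });
}
```

In an extension this would be called as speakChecked(chrome, 'Hello, world.', {rate: 2.0}, handler). A null result does not mean the utterance was spoken.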
Listening to events
To get more real-time information about the status of synthesized speech, pass an event listener in the options to speak(), like this:
chrome.tts.speak(utterance, {
  onEvent: function(event) {
    console.log('Event ' + event.type + ' at position ' + event.charIndex);
    if (event.type == 'error') {
      console.log('Error: ' + event.errorMessage);
    }
  }
}, callback);
Each event includes an event type, the character index of the current speech relative to the utterance, and for error events, an optional error message. The event types are:
'start': The engine has started speaking the utterance.
'word': A word boundary was reached. Use event.charIndex to determine the current speech position.
'sentence': A sentence boundary was reached. Use event.charIndex to determine the current speech position.
'marker': An SSML marker was reached. Use event.charIndex to determine the current speech position.
'end': The engine has finished speaking the utterance.
'interrupted': This utterance was interrupted by another call to speak() or stop() and did not finish.
'cancelled': This utterance was queued, but then cancelled by another call to speak() or stop() and never began to speak at all.
'error': An engine-specific error occurred and this utterance cannot be spoken. Check event.errorMessage for details.
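As an illustration of using event.charIndex, the sketch below maps a character index back to the word around it, for example to highlight the word being spoken. wordAt is a hypothetical helper, and it assumes the index counts characters of the utterance string as JavaScript sees it, which may not hold for every engine or for SSML input.

```javascript
// Hypothetical helper: return the whitespace-delimited word that
// contains the given character index, or '' if the index is unusable.
function wordAt(utterance, charIndex) {
  if (typeof charIndex !== 'number' ||
      charIndex < 0 || charIndex >= utterance.length) {
    return '';
  }
  // Walk left to the start of the word and right to its end.
  var start = charIndex;
  while (start > 0 && !/\s/.test(utterance[start - 1])) {
    start--;
  }
  var end = charIndex;
  while (end < utterance.length && !/\s/.test(utterance[end])) {
    end++;
  }
  return utterance.slice(start, end);
}
```

Inside an onEvent listener this could be called for events whose type is 'word', passing the original utterance and event.charIndex.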
Four of the event types ('end', 'interrupted', 'cancelled', and 'error') are final. After one of those events is received, this utterance will no longer speak and no new events from this utterance will be received.
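Because exactly those four types are final, a listener can treat the first one it sees as the end of the utterance. The following is a sketch under that assumption; speakUntilDone and FINAL_EVENT_TYPES are hypothetical names, and tts is assumed to be shaped like chrome.tts.

```javascript
// The four event types described above as final.
var FINAL_EVENT_TYPES = ['end', 'interrupted', 'cancelled', 'error'];

// Hypothetical helper: speak an utterance and call onDone(event)
// once, when the first final event arrives.
function speakUntilDone(tts, utterance, options, onDone) {
  var finished = false;
  var wrapped = Object.assign({}, options, {
    onEvent: function (event) {
      // Forward every event to the caller's own listener, if any.
      if (options && options.onEvent) {
        options.onEvent(event);
      }
      if (!finished && FINAL_EVENT_TYPES.indexOf(event.type) !== -1) {
        finished = true;
        onDone(event);
      }
    }
  });
  tts.speak(utterance, wrapped);
}
```

If the selected voice sends no events at all, onDone is never called, so callers that must not hang may want their own timeout.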
Some voices may not support all event types, and some voices may not send any events at all. If you do not want to use a voice unless it sends certain events, pass the events you require in the requiredEventTypes member of the options object, or use getVoices() to choose a voice that meets your requirements. Both are documented below.
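The getVoices() route can be sketched as a filter over the returned array. voicesWithEvents is a hypothetical helper; it relies only on the eventTypes member of TtsVoice and treats a voice that reports no eventTypes as supporting none, which is an assumption.

```javascript
// Hypothetical helper: keep only voices whose eventTypes list
// contains every required type.
function voicesWithEvents(voices, requiredTypes) {
  return voices.filter(function (voice) {
    // eventTypes is optional; treat a missing list as empty.
    var supported = voice.eventTypes || [];
    return requiredTypes.every(function (type) {
      return supported.indexOf(type) !== -1;
    });
  });
}
```

Inside a getVoices() callback, voicesWithEvents(voices, ['word', 'end']) would return the voices able to report both word boundaries and completion.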
SSML markup
Utterances used in this API may include markup using the Speech Synthesis Markup Language (SSML). If you use SSML, the first argument to speak() should be a complete SSML document with an XML header and a top-level <speak> tag, not a document fragment. For example:
chrome.tts.speak(
    '<?xml version="1.0"?>' +
    '<speak>' +
    ' The <emphasis>second</emphasis> ' +
    ' word of this sentence was emphasized.' +
    '</speak>');
Not all speech engines will support all SSML tags, and some may not support SSML at all, but all engines are required to ignore any SSML they don't support and to still speak the underlying text.
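When the spoken text comes from user data, it has to be escaped before it is placed inside an SSML document, or characters such as & and < would make the XML ill-formed. The sketch below shows one way to do that; escapeXml and toSsmlDocument are hypothetical helpers, not part of the API.

```javascript
// Hypothetical helper: escape the characters that are special in
// XML text content.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: wrap an SSML body in the XML header and the
// top-level <speak> tag that the text above asks for.
function toSsmlDocument(ssmlBody) {
  return '<?xml version="1.0"?>' + '<speak>' + ssmlBody + '</speak>';
}

// Escape only the plain-text parts, then add the markup around them.
var exampleDocument = toSsmlDocument(
    'The <emphasis>' + escapeXml('R&D') + '</emphasis> team');
```

The resulting string would be passed to speak() as the utterance.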
Choosing a voice
By default, Chrome chooses the most appropriate voice for each utterance you want to speak, based on the language and gender. On most Windows, Mac OS X, and Chrome OS systems, speech synthesis provided by the operating system should be able to speak any text in at least one language. Some users may have a variety of voices available, though, from their operating system and from speech engines implemented by other Chrome extensions. In those cases, you can implement custom code to choose the appropriate voice, or to present the user with a list of choices.
To get a list of all voices, call getVoices() and pass it a function that receives an array of TtsVoice objects as its argument:
chrome.tts.getVoices(function(voices) {
  for (var i = 0; i < voices.length; i++) {
    console.log('Voice ' + i + ':');
    console.log(' name: ' + voices[i].voiceName);
    console.log(' lang: ' + voices[i].lang);
    console.log(' gender: ' + voices[i].gender);
    console.log(' extension id: ' + voices[i].extensionId);
    console.log(' event types: ' + voices[i].eventTypes);
  }
});
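Custom voice selection could start from a language match over that array. The sketch below is one possible policy, not Chrome's own: pickVoice is a hypothetical helper that prefers an exact language-region match and falls back to the first voice sharing the base language.

```javascript
// Hypothetical helper: pick a voice for a language tag such as 'en-GB'.
// Returns the matching TtsVoice-like object, or null if none matches.
function pickVoice(voices, lang) {
  var wanted = lang.toLowerCase();
  var base = wanted.split('-')[0];
  var fallback = null;
  for (var i = 0; i < voices.length; i++) {
    // lang is optional on a voice; treat a missing value as no match.
    var voiceLang = (voices[i].lang || '').toLowerCase();
    if (voiceLang === wanted) {
      return voices[i]; // exact language-region match wins
    }
    if (!fallback && voiceLang.split('-')[0] === base) {
      fallback = voices[i]; // remember the first same-language voice
    }
  }
  return fallback;
}
```

The chosen voice's voiceName could then be passed to speak() through the voiceName option, assuming the engine honours it.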
Summary
Types | |
---|---|
TtsEvent | |
TtsVoice | |
Methods | |
speak | chrome.tts.speak(string utterance, object options, function callback) |
stop | chrome.tts.stop() |
pause | chrome.tts.pause() |
resume | chrome.tts.resume() |
isSpeaking | chrome.tts.isSpeaking(function callback) |
getVoices | chrome.tts.getVoices(function callback) |
Types
TtsEvent
Type | Property | Description |
---|---|---|
enum of "start", "end", "word", "sentence", "marker", "interrupted", "cancelled", "error", "pause", or "resume" | type | The type can be 'start' as soon as speech has started, 'word' when a word boundary is reached, 'sentence' when a sentence boundary is reached, 'marker' when an SSML mark element is reached, 'end' when the end of the utterance is reached, 'interrupted' when the utterance is stopped or interrupted before reaching the end, 'cancelled' when it's removed from the queue before ever being synthesized, or 'error' when any other error occurs. When pausing speech, a 'pause' event is fired if a particular utterance is paused in the middle, and 'resume' if an utterance resumes speech. Note that pause and resume events may not fire if speech is paused in-between utterances. |
double | (optional) charIndex | The index of the current character in the utterance. |
string | (optional) errorMessage | The error description, if the event type is 'error'. |
TtsVoice
Type | Property | Description |
---|---|---|
string | (optional) voiceName | The name of the voice. |
string | (optional) lang | The language that this voice supports, in the form language-region. Examples: 'en', 'en-US', 'en-GB', 'zh-CN'. |
enum of "male" or "female" | (optional) gender | This voice's gender. |
boolean | (optional) remote | If true, the synthesis engine is a remote network resource. It may be higher latency and may incur bandwidth costs. |
string | (optional) extensionId | The ID of the extension providing this voice. |
array of string | (optional) eventTypes | All of the callback event types that this voice is capable of sending. |
Methods
speak
chrome.tts.speak(string utterance, object options, function callback)
Speaks text using a text-to-speech engine.
Type | Parameter | Description |
---|---|---|
string | utterance | The text to speak, either plain text or a complete, well-formed SSML document. Speech engines that do not support SSML will strip away the tags and speak the text. The maximum length of the text is 32,768 characters. |
object | (optional) options | The speech options. |
function | (optional) callback | Called right away, before speech finishes. Check chrome.runtime.lastError to make sure there were no errors. Use options.onEvent to get more detailed feedback. If you specify the callback parameter, it should be a function that looks like this: function() {...}; |
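Text longer than the 32,768-character limit has to be split before it is passed to speak(). The sketch below shows one possible approach; splitUtterance and MAX_UTTERANCE_LENGTH are hypothetical names, and it assumes the limit is counted the same way as a JavaScript string's length, which the text above does not confirm.

```javascript
// Limit quoted in the parameter description above.
var MAX_UTTERANCE_LENGTH = 32768;

// Hypothetical helper: split text into pieces no longer than
// maxLength, breaking at the last space in each window if there is one.
function splitUtterance(text, maxLength) {
  var chunks = [];
  var rest = text;
  while (rest.length > maxLength) {
    var cut = rest.lastIndexOf(' ', maxLength);
    if (cut <= 0) {
      cut = maxLength; // no usable space: fall back to a hard split
    }
    chunks.push(rest.slice(0, cut));
    // Drop the spaces the split happened on.
    rest = rest.slice(cut).replace(/^ +/, '');
  }
  if (rest.length > 0) {
    chunks.push(rest);
  }
  return chunks;
}
```

Each chunk could then be spoken in turn, with the enqueue option set on every chunk after the first. This sketch is meant for plain text; splitting an SSML document this way would break its markup.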
stop
chrome.tts.stop()
Stops any current speech and flushes the queue of any pending utterances. In addition, if speech was paused, it will now be un-paused for the next call to speak.
pause
chrome.tts.pause()
Pauses speech synthesis, potentially in the middle of an utterance. A call to resume or stop will un-pause speech.
resume
chrome.tts.resume()
If speech was paused, resumes speaking where it left off.
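None of the methods listed here reports whether speech is currently paused, so a pause button has to remember that state itself. The following is a minimal sketch; createPauseToggle is a hypothetical name, and tts is assumed to be shaped like chrome.tts.

```javascript
// Hypothetical controller that tracks the paused state locally.
function createPauseToggle(tts) {
  var paused = false;
  return {
    // Pause if speaking, resume if paused; returns the new state.
    toggle: function () {
      if (paused) {
        tts.resume();
      } else {
        tts.pause();
      }
      paused = !paused;
      return paused;
    },
    // stop() also un-pauses speech, so reset the local flag.
    stop: function () {
      tts.stop();
      paused = false;
    }
  };
}
```

The local flag can drift if other code calls pause(), resume(), or stop() on the API directly, so all such calls would need to go through the same controller.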
isSpeaking
chrome.tts.isSpeaking(function callback)
Checks whether the engine is currently speaking. On Mac OS X, the result is true whenever the system speech engine is speaking, even if the speech wasn't initiated by Chrome.
Type | Parameter | Description |
---|---|---|
function | (optional) callback | If you specify the callback parameter, it should be a function that looks like this: function(boolean speaking) {...}; |
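A typical use of isSpeaking() is to avoid cutting off speech that is already in progress. The sketch below illustrates the idea; speakIfIdle is a hypothetical helper, and tts is assumed to be shaped like chrome.tts.

```javascript
// Hypothetical helper: speak only when nothing is being spoken.
// Calls done(true) if speak() was issued, done(false) otherwise.
function speakIfIdle(tts, utterance, options, done) {
  tts.isSpeaking(function (speaking) {
    if (speaking) {
      done(false);
      return;
    }
    tts.speak(utterance, options);
    done(true);
  });
}
```

Speech could start between the check and the call to speak(), so this is a best-effort guard rather than a guarantee; the enqueue option avoids the interruption without that gap.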
getVoices
chrome.tts.getVoices(function callback)
Gets an array of all available voices.
Type | Parameter | Description |
---|---|---|
function | (optional) callback | If you specify the callback parameter, it should be a function that looks like this: function(array of TtsVoice voices) {...}; |