{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import panel as pn\n", "\n", "from panel.widgets import SpeechToText, GrammarList\n", "\n", "pn.extension()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `SpeechToText` widget controls the speech recognition service of the browser by wrapping the [HTML5 `SpeechRecognition` API](https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition).\n", "\n", "The functionality is **experimental** and **only supported by Chrome and a few other browsers**. See https://caniuse.com/speech-recognition or https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition#Browser_compatibility for an up-to-date list of browsers supporting the SpeechRecognition API. Even in Chrome the `grammars`, `interim_results` and `max_alternatives` parameters are not yet supported.\n", "\n", "On some browsers, like Chrome, using Speech Recognition on a web page involves a server-based recognition engine. **Your audio is sent to a web service for recognition processing, so it won't work offline**. Whether this is secure and confidential enough for your use case is up to you to evaluate.\n", "\n", "#### Parameters:\n", "\n", "* **``results``** (List\\[Dict\\]): The results recognized. A list of Dictionaries.\n", "* **``value``** (str): The most recent SpeechRecognition result as a string.\n", "\n", "* **``lang``** (str): The language of the current SpeechRecognition service in BCP 47 format. For example 'en-US'.\n", "* **``continuous``** (bool): Controls whether continuous results are returned for each recognition, or only a single result. Defaults to False.\n", "* **``interim_results``** (boolean): Controls whether interim results should be returned (True) or not (False.) Interim results are results that are not yet final (e.g. the RecognitionResult.is_final property is False).\n", "* **``max_alternatives``** (int): Sets the maximum number of RecognitionAlternatives provided per result. A number between 1 and 5. The default value is 1.\n", "* **``service_uri``** (str): Specifies the location of the speech recognition service used by the current SpeechRecognition service to handle the actual recognition. The default is the user agent's default speech service.\n", "* **``grammars``** (GrammarList): A GrammarList object that represents the grammars that will be understood by the current SpeechRecognition service.\n", "\n", "* **``started``** (boolean): Returns True if the Speech Recognition Service is started and False otherwise.\n", "* **``audio_started``** (boolean): Returns True if the Audio is started and False otherwise\n", "* **``sound_started``** (boolean): Returns True if the Sound is started and False otherwise\n", "* **``speech_started``** (boolean): Returns True if the the User has started speaking and False otherwise\n", "\n", "* **``button_hide``** (bool): Whether to show (False) or hide (True) the toggle start/ stop button.\n", "* **``button_type``** (str): One of 'default', 'primary', 'success', 'warning', 'danger' and 'light'.\n", "* **``button_not_started``** (str): The text to show on the button when the SpeechRecognition service is NOT started. If '' a *muted microphone* icon is shown.\n", "* **``button_started``** (str): The text to show on the button when the SpeechRecognition service is started. If '' a *muted microphone* icon is shown.\n", "\n", "##### Events\n", "\n", "Events are transient boolean parameters which when set to true trigger an event and then immediately revert to False.\n", "\n", "* **``start``** (bool): Starts the speech recognition service listening to incoming audio with intent to recognize grammars associated with the current SpeechRecognition.\n", "* **``stop``** (bool): Stops the speech recognition service from listening to incoming audio, and attempts to return a list of RecognitionResult using the audio captured so far.\n", "* **``abort``** (bool): Stops the speech recognition service from listening to incoming audio, and doesn't attempt to return a list of RecognitionResult.\n", "\n", "#### Properties\n", "\n", "* **``results_deserialized``** (List\\[RecognitionResult\\]): The results recognized. A list of RecognitionResult objects.\n", "* **``results_as_html``** (str): Returns the `results` formatted as html. Convenience property for ease of use.\n", "\n", "### Grammar\n", "\n", "A set of words or patterns of words that we want the speech recognition service to recognize.\n", "\n", "For example\n", "\n", "```python\n", "grammar = Grammar(\n", " src='#JSGF V1.0; grammar colors; public = aqua | azure | beige;',\n", " weight=0.7\n", ")\n", "```\n", "\n", "Wraps the [HTML SpeechGrammar API](https://developer.mozilla.org/en-US/docs/Web/API/SpeechGrammar).\n", "\n", "#### Parameters:\n", "\n", "* **``src``** (str): A set of words or patterns of words that we want the recognition service to recognize. Defined using JSpeech Grammar Format. See https://www.w3.org/TR/jsgf/.\n", "* **``uri``** (str): An uri pointing to the definition. If src is available it will be used. Otherwise uri. The uri will be loaded on the client side only.\n", "* **``weight``** (float): The weight of the grammar. A number in the range 0–1. Default is 1.\n", "\n", "### GrammarList\n", "\n", "A list of Grammar objects containing words or patterns of words that we want the recognition service to recognize.\n", "\n", "For example\n", "\n", "```python\n", "grammar = '#JSGF V1.0; grammar colors; public = aqua | azure | beige | bisque ;'\n", "grammar_list = GrammarList()\n", "grammar_list.add_from_string(grammar, 1)\n", "```\n", "\n", "Wraps the [HTML 5 SpeechGrammarList API](https://developer.mozilla.org/en-US/docs/Web/API/SpeechGrammarList).\n", "\n", "#### Methods:\n", "\n", "* **``add_from_string``** (src: str, weight: float): Takes a grammar src and weight and adds it to the GrammarList as a new Grammar object. The new Grammar object is returned.\n", "* **``add_from_uri``** (uri: str, weight: float): Takes a grammar uri and weight and adds it to the GrammarList as a new Grammar object. The new Grammar object is returned.\n", "\n", "### RecognitionAlternative\n", "\n", "The RecognitionAlternative represents a word or sentence that has been recognised by the speech recognition service.\n", "\n", "Wraps the [HTML5 SpeechRecognitionAlternative API](https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognitionAlternative).\n", "\n", "#### Methods:\n", "\n", "* **``confidence``** (float): A numeric estimate between 0 and 1 of how confident the speech recognition system is that the recognition is correct.\n", "* **``transcript``** (str): The transcript of the recognised word or sentence.\n", "\n", "\n", "### RecognitionResult\n", "\n", "The Result represents a single recognition match, which may contain multiple RecognitionAlternative objects.\n", "\n", "Wraps the [HTML5 SpeechRecognitionResult API](https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognitionResult).\n", "\n", "\n", "#### Methods:\n", "\n", "* **``is_final``** (boolean): A Boolean that states whether this result is final (True) or not (False) — if so, then this is the final time this result will be returned; if not, then this result is an interim result, and may be updated later on.\n", "* **``alternatives``** (List\\[RecognitionResult\\]): The list of the n-best alternatives.\n", "\n", "___" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "speech_to_text_basic = SpeechToText(button_type=\"light\")\n", "\n", "pn.Row(speech_to_text_basic.controls(['value'], jslink=False), speech_to_text_basic)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get the most recent result we can simply access the `value` parameter:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "speech_to_text_basic.value" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For more detailed results including the confidence level access the `results` parameter:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "speech_to_text_basic.results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Advanced Example\n", "\n", "We start by instantiating a `SpeechToText` object with a `GrammarList`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "grammar_list = GrammarList()\n", "\n", "src = \"#JSGF V1.0; grammar colors; public = aqua | azure | beige | bisque | black | blue | brown | chocolate | coral | crimson | cyan | fuchsia | ghostwhite | gold | goldenrod | gray | green | indigo | ivory | khaki | lavender | lime | linen | magenta | maroon | moccasin | navy | olive | orange | orchid | peru | pink | plum | purple | red | salmon | sienna | silver | snow | tan | teal | thistle | tomato | turquoise | violet | white | yellow ;\"\n", "grammar_list.add_from_string(src, 1)\n", "\n", "speech_to_text = SpeechToText(button_type=\"light\", grammars=grammar_list, height=50)\n", "\n", "controls = speech_to_text.controls(jslink=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we create a callback which will render the `results_as_html`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@pn.depends(speech_to_text)\n", "def results(results):\n", " return pn.pane.HTML(speech_to_text.results_as_html, width=100, margin=(0, 15, 0, 15))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally we compose this into an `app`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "app = pn.Row(controls, speech_to_text, results)\n", "app.servable()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Try changing some of the parameters. For example changing the `continuous` parameter will keep the SpeechRecognition service open which lets you say multiple statements after each other." ] } ], "metadata": { "language_info": { "name": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 4 }