{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" }, "accelerator": "GPU" }, "cells": [ { "cell_type": "markdown", "source": [ "# Whisper in French" ], "metadata": { "id": "P9O5UDYiQz4C" } }, { "cell_type": "markdown", "source": [ "- Author: [Pierre GUILLOU](https://www.linkedin.com/in/pierreguillou)\n", "- Date: 09/12/2022\n", "- Blog post: [Speech-to-Text & IA | Transcreva qualquer áudio para o português com o Whisper (OpenAI)... sem nenhum custo!](https://medium.com/@pierre_guillou/speech-to-text-ia-transcreva-qualquer-%C3%A1udio-para-o-portugu%C3%AAs-com-o-whisper-openai-sem-ad0c17384681)" ], "metadata": { "id": "uZhdPaX_RLTk" } }, { "cell_type": "markdown", "source": [ "## Information" ], "metadata": { "id": "R2mqmp4fQ3k9" } }, { "cell_type": "markdown", "source": [ "Optional: Get GPU information (run this before starting the interface)" ], "metadata": { "id": "BNZoCQ7v-u2y" } }, { "cell_type": "code", "source": [ "!nvidia-smi -L" ], "metadata": { "id": "dRgRXEAN-yGH", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "5ab92bfb-32bd-4e37-9a83-2578f067af39" }, "execution_count": 1, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "GPU 0: Tesla T4 (UUID: GPU-ee6eaa22-abaf-d6d9-97aa-a143d567b3ad)\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Checking out Whisper in French from Git" ], "metadata": { "id": "aHplamANQ_IQ" } }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "h85P7mbOVfby", "outputId": "95529fa9-0a96-48a8-897d-dc23d3f4e95a" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Cloning into 'whisper-demo-french'...\n", "remote: Enumerating objects: 27, done.\u001b[K\n", "remote: Counting objects: 100% (20/20), done.\u001b[K\n", "remote: Compressing objects: 100% (20/20), done.\u001b[K\n", 
"remote: Total 27 (delta 9), reused 0 (delta 0), pack-reused 7\u001b[K\n", "Unpacking objects: 100% (27/27), done.\n" ] } ], "source": [ "!git clone https://huggingface.co/spaces/pierreguillou/whisper-demo-french" ] }, { "cell_type": "markdown", "source": [ "Optional: Update the Git repository" ], "metadata": { "id": "-mZsftNvqYze" } }, { "cell_type": "code", "source": [ "!cd whisper-demo-french/ && git pull origin" ], "metadata": { "id": "rJYJGFJPqbdL", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "4abc5e91-b4e3-49b3-a168-e0981f92931d" }, "execution_count": 3, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Already up to date.\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Authorize public link" ], "metadata": { "id": "4IvV8cc_VV-9" } }, { "cell_type": "code", "source": [ "# Read the app.py file\n", "with open(\"/content/whisper-demo-french/app.py\", 'r') as file:\n", "    filedata = file.read()\n", "\n", "# Authorize the public link by adding share=True to the launch call\n", "old_text = 'demo.launch(enable_queue=True)'\n", "new_text = 'demo.launch(enable_queue=True, share=True)'\n", "filedata = filedata.replace(old_text, new_text)\n", "\n", "# Write the updated app.py file\n", "with open(\"/content/whisper-demo-french/app.py\", 'w') as file:\n", "    file.write(filedata)" ], "metadata": { "id": "OTBFayF4UwbO" }, "execution_count": 4, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Installing dependencies" ], "metadata": { "id": "ohLec8LfWBeM" } }, { "cell_type": "code", "source": [ "!cd whisper-demo-french/ && pip install -r requirements.txt" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "-jCbFjWpV_ci", "outputId": "7432a634-9141-4382-ec10-f3ac30b0d867" }, "execution_count": 5, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", "Collecting git+https://github.com/huggingface/transformers (from -r requirements.txt 
(line 1))\n", " Cloning https://github.com/huggingface/transformers to /tmp/pip-req-build-5q9hsovg\n", " Running command git clone -q https://github.com/huggingface/transformers /tmp/pip-req-build-5q9hsovg\n", " Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n", " Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n", " Preparing wheel metadata ... \u001b[?25l\u001b[?25hdone\n", "Requirement already satisfied: torch in /usr/local/lib/python3.8/dist-packages (from -r requirements.txt (line 2)) (1.13.0+cu116)\n", "Collecting pytube\n", " Downloading pytube-12.1.0-py3-none-any.whl (56 kB)\n", "\u001b[K |████████████████████████████████| 56 kB 3.9 MB/s \n", "\u001b[?25hCollecting gradio\n", " Downloading gradio-3.12.0-py3-none-any.whl (11.6 MB)\n", "\u001b[K |████████████████████████████████| 11.6 MB 15.3 MB/s \n", "\u001b[?25hRequirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.8/dist-packages (from transformers==4.26.0.dev0->-r requirements.txt (line 1)) (21.3)\n", "Collecting tokenizers!=0.11.3,<0.14,>=0.11.1\n", " Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)\n", "\u001b[K |████████████████████████████████| 7.6 MB 52.4 MB/s \n", "\u001b[?25hCollecting huggingface-hub<1.0,>=0.10.0\n", " Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)\n", "\u001b[K |████████████████████████████████| 182 kB 68.7 MB/s \n", "\u001b[?25hRequirement already satisfied: filelock in /usr/local/lib/python3.8/dist-packages (from transformers==4.26.0.dev0->-r requirements.txt (line 1)) (3.8.0)\n", "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.8/dist-packages (from transformers==4.26.0.dev0->-r requirements.txt (line 1)) (6.0)\n", "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.8/dist-packages (from transformers==4.26.0.dev0->-r requirements.txt (line 1)) (1.21.6)\n", "Requirement already satisfied: regex!=2019.12.17 in 
/usr/local/lib/python3.8/dist-packages (from transformers==4.26.0.dev0->-r requirements.txt (line 1)) (2022.6.2)\n", "Requirement already satisfied: requests in /usr/local/lib/python3.8/dist-packages (from transformers==4.26.0.dev0->-r requirements.txt (line 1)) (2.23.0)\n", "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.8/dist-packages (from transformers==4.26.0.dev0->-r requirements.txt (line 1)) (4.64.1)\n", "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.8/dist-packages (from huggingface-hub<1.0,>=0.10.0->transformers==4.26.0.dev0->-r requirements.txt (line 1)) (4.4.0)\n", "Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging>=20.0->transformers==4.26.0.dev0->-r requirements.txt (line 1)) (3.0.9)\n", "Requirement already satisfied: pandas in /usr/local/lib/python3.8/dist-packages (from gradio->-r requirements.txt (line 4)) (1.3.5)\n", "Collecting uvicorn\n", " Downloading uvicorn-0.20.0-py3-none-any.whl (56 kB)\n", "\u001b[K |████████████████████████████████| 56 kB 4.1 MB/s \n", "\u001b[?25hCollecting websockets>=10.0\n", " Downloading websockets-10.4-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (106 kB)\n", "\u001b[K |████████████████████████████████| 106 kB 81.7 MB/s \n", "\u001b[?25hRequirement already satisfied: pillow in /usr/local/lib/python3.8/dist-packages (from gradio->-r requirements.txt (line 4)) (7.1.2)\n", "Requirement already satisfied: aiohttp in /usr/local/lib/python3.8/dist-packages (from gradio->-r requirements.txt (line 4)) (3.8.3)\n", "Collecting ffmpy\n", " Downloading ffmpy-0.3.0.tar.gz (4.8 kB)\n", "Requirement already satisfied: matplotlib in /usr/local/lib/python3.8/dist-packages (from gradio->-r requirements.txt (line 4)) (3.2.2)\n", "Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from gradio->-r requirements.txt (line 4)) 
(2.11.3)\n", "Collecting markdown-it-py[linkify,plugins]\n", " Downloading markdown_it_py-2.1.0-py3-none-any.whl (84 kB)\n", "\u001b[K |████████████████████████████████| 84 kB 4.4 MB/s \n", "\u001b[?25hCollecting orjson\n", " Downloading orjson-3.8.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (278 kB)\n", "\u001b[K |████████████████████████████████| 278 kB 81.7 MB/s \n", "\u001b[?25hCollecting python-multipart\n", " Downloading python-multipart-0.0.5.tar.gz (32 kB)\n", "Collecting h11<0.13,>=0.11\n", " Downloading h11-0.12.0-py3-none-any.whl (54 kB)\n", "\u001b[K |████████████████████████████████| 54 kB 3.9 MB/s \n", "\u001b[?25hCollecting pydub\n", " Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)\n", "Collecting paramiko\n", " Downloading paramiko-2.12.0-py2.py3-none-any.whl (213 kB)\n", "\u001b[K |████████████████████████████████| 213 kB 78.6 MB/s \n", "\u001b[?25hRequirement already satisfied: pydantic in /usr/local/lib/python3.8/dist-packages (from gradio->-r requirements.txt (line 4)) (1.10.2)\n", "Collecting httpx\n", " Downloading httpx-0.23.1-py3-none-any.whl (84 kB)\n", "\u001b[K |████████████████████████████████| 84 kB 5.1 MB/s \n", "\u001b[?25hCollecting fastapi\n", " Downloading fastapi-0.88.0-py3-none-any.whl (55 kB)\n", "\u001b[K |████████████████████████████████| 55 kB 4.8 MB/s \n", "\u001b[?25hRequirement already satisfied: fsspec in /usr/local/lib/python3.8/dist-packages (from gradio->-r requirements.txt (line 4)) (2022.11.0)\n", "Collecting pycryptodome\n", " Downloading pycryptodome-3.16.0-cp35-abi3-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (2.3 MB)\n", "\u001b[K |████████████████████████████████| 2.3 MB 60.7 MB/s \n", "\u001b[?25hRequirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from aiohttp->gradio->-r requirements.txt (line 4)) (1.3.3)\n", "Requirement already satisfied: charset-normalizer<3.0,>=2.0 in 
/usr/local/lib/python3.8/dist-packages (from aiohttp->gradio->-r requirements.txt (line 4)) (2.1.1)\n", "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.8/dist-packages (from aiohttp->gradio->-r requirements.txt (line 4)) (1.8.2)\n", "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.8/dist-packages (from aiohttp->gradio->-r requirements.txt (line 4)) (6.0.3)\n", "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.8/dist-packages (from aiohttp->gradio->-r requirements.txt (line 4)) (1.3.1)\n", "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.8/dist-packages (from aiohttp->gradio->-r requirements.txt (line 4)) (22.1.0)\n", "Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.8/dist-packages (from aiohttp->gradio->-r requirements.txt (line 4)) (4.0.2)\n", "Requirement already satisfied: idna>=2.0 in /usr/local/lib/python3.8/dist-packages (from yarl<2.0,>=1.0->aiohttp->gradio->-r requirements.txt (line 4)) (2.10)\n", "Collecting starlette==0.22.0\n", " Downloading starlette-0.22.0-py3-none-any.whl (64 kB)\n", "\u001b[K |████████████████████████████████| 64 kB 3.5 MB/s \n", "\u001b[?25hCollecting anyio<5,>=3.4.0\n", " Downloading anyio-3.6.2-py3-none-any.whl (80 kB)\n", "\u001b[K |████████████████████████████████| 80 kB 7.7 MB/s \n", "\u001b[?25hCollecting sniffio>=1.1\n", " Downloading sniffio-1.3.0-py3-none-any.whl (10 kB)\n", "Requirement already satisfied: certifi in /usr/local/lib/python3.8/dist-packages (from httpx->gradio->-r requirements.txt (line 4)) (2022.9.24)\n", "Collecting httpcore<0.17.0,>=0.15.0\n", " Downloading httpcore-0.16.2-py3-none-any.whl (68 kB)\n", "\u001b[K |████████████████████████████████| 68 kB 8.8 MB/s \n", "\u001b[?25hCollecting rfc3986[idna2008]<2,>=1.3\n", " Downloading rfc3986-1.5.0-py2.py3-none-any.whl (31 kB)\n", "Collecting httpcore<0.17.0,>=0.15.0\n", " Downloading httpcore-0.16.1-py3-none-any.whl (68 
kB)\n", "\u001b[K |████████████████████████████████| 68 kB 9.3 MB/s \n", "\u001b[?25h Downloading httpcore-0.16.0-py3-none-any.whl (68 kB)\n", "\u001b[K |████████████████████████████████| 68 kB 9.7 MB/s \n", "\u001b[?25h Downloading httpcore-0.15.0-py3-none-any.whl (68 kB)\n", "\u001b[K |████████████████████████████████| 68 kB 9.1 MB/s \n", "\u001b[?25hRequirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.8/dist-packages (from jinja2->gradio->-r requirements.txt (line 4)) (2.0.1)\n", "Collecting mdurl~=0.1\n", " Downloading mdurl-0.1.2-py3-none-any.whl (10.0 kB)\n", "Collecting linkify-it-py~=1.0\n", " Downloading linkify_it_py-1.0.3-py3-none-any.whl (19 kB)\n", "Collecting mdit-py-plugins\n", " Downloading mdit_py_plugins-0.3.3-py3-none-any.whl (50 kB)\n", "\u001b[K |████████████████████████████████| 50 kB 8.7 MB/s \n", "\u001b[?25hCollecting uc-micro-py\n", " Downloading uc_micro_py-1.0.1-py3-none-any.whl (6.2 kB)\n", "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.8/dist-packages (from matplotlib->gradio->-r requirements.txt (line 4)) (0.11.0)\n", "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.8/dist-packages (from matplotlib->gradio->-r requirements.txt (line 4)) (2.8.2)\n", "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.8/dist-packages (from matplotlib->gradio->-r requirements.txt (line 4)) (1.4.4)\n", "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil>=2.1->matplotlib->gradio->-r requirements.txt (line 4)) (1.15.0)\n", "Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas->gradio->-r requirements.txt (line 4)) (2022.6)\n", "Collecting pynacl>=1.0.1\n", " Downloading PyNaCl-1.5.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (856 kB)\n", "\u001b[K |████████████████████████████████| 856 kB 72.9 MB/s \n", "\u001b[?25hCollecting 
cryptography>=2.5\n", " Downloading cryptography-38.0.4-cp36-abi3-manylinux_2_24_x86_64.whl (4.0 MB)\n", "\u001b[K |████████████████████████████████| 4.0 MB 42.5 MB/s \n", "\u001b[?25hCollecting bcrypt>=3.1.3\n", " Downloading bcrypt-4.0.1-cp36-abi3-manylinux_2_24_x86_64.whl (593 kB)\n", "\u001b[K |████████████████████████████████| 593 kB 59.4 MB/s \n", "\u001b[?25hRequirement already satisfied: cffi>=1.12 in /usr/local/lib/python3.8/dist-packages (from cryptography>=2.5->paramiko->gradio->-r requirements.txt (line 4)) (1.15.1)\n", "Requirement already satisfied: pycparser in /usr/local/lib/python3.8/dist-packages (from cffi>=1.12->cryptography>=2.5->paramiko->gradio->-r requirements.txt (line 4)) (2.21)\n", "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.8/dist-packages (from requests->transformers==4.26.0.dev0->-r requirements.txt (line 1)) (1.24.3)\n", "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.8/dist-packages (from requests->transformers==4.26.0.dev0->-r requirements.txt (line 1)) (3.0.4)\n", "Requirement already satisfied: click>=7.0 in /usr/local/lib/python3.8/dist-packages (from uvicorn->gradio->-r requirements.txt (line 4)) (7.1.2)\n", "Building wheels for collected packages: transformers, ffmpy, python-multipart\n", " Building wheel for transformers (PEP 517) ... \u001b[?25l\u001b[?25hdone\n", " Created wheel for transformers: filename=transformers-4.26.0.dev0-py3-none-any.whl size=5949083 sha256=a53c9ea828f6fce1dac4a5ad5086de4777c7c9b755fe02903fa5ab420024225f\n", " Stored in directory: /tmp/pip-ephem-wheel-cache-sjohwyly/wheels/42/68/45/c63edff61c292f2dfd4df4ef6522dcbecc603e7af82813c1d7\n", " Building wheel for ffmpy (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n", " Created wheel for ffmpy: filename=ffmpy-0.3.0-py3-none-any.whl size=4711 sha256=0bcd606261da15668b4d2565353b62a32cc5f28473fd07cf1e7eeed30f465806\n", " Stored in directory: /root/.cache/pip/wheels/ff/5b/59/913b443e7369dc04b61f607a746b6f7d83fb65e2e19fcc958d\n", " Building wheel for python-multipart (setup.py) ... \u001b[?25l\u001b[?25hdone\n", " Created wheel for python-multipart: filename=python_multipart-0.0.5-py3-none-any.whl size=31678 sha256=524cc449a359f4117679cf963c065dfec9bb3ad7260d238103dd5c73e570eb2b\n", " Stored in directory: /root/.cache/pip/wheels/9e/fc/1c/cf980e6413d3ee8e70cd8f39e2366b0f487e3e221aeb452eb0\n", "Successfully built transformers ffmpy python-multipart\n", "Installing collected packages: sniffio, mdurl, uc-micro-py, rfc3986, markdown-it-py, h11, anyio, starlette, pynacl, mdit-py-plugins, linkify-it-py, httpcore, cryptography, bcrypt, websockets, uvicorn, tokenizers, python-multipart, pydub, pycryptodome, paramiko, orjson, huggingface-hub, httpx, ffmpy, fastapi, transformers, pytube, gradio\n", "Successfully installed anyio-3.6.2 bcrypt-4.0.1 cryptography-38.0.4 fastapi-0.88.0 ffmpy-0.3.0 gradio-3.12.0 h11-0.12.0 httpcore-0.15.0 httpx-0.23.1 huggingface-hub-0.11.1 linkify-it-py-1.0.3 markdown-it-py-2.1.0 mdit-py-plugins-0.3.3 mdurl-0.1.2 orjson-3.8.3 paramiko-2.12.0 pycryptodome-3.16.0 pydub-0.25.1 pynacl-1.5.0 python-multipart-0.0.5 pytube-12.1.0 rfc3986-1.5.0 sniffio-1.3.0 starlette-0.22.0 tokenizers-0.13.2 transformers-4.26.0.dev0 uc-micro-py-1.0.1 uvicorn-0.20.0 websockets-10.4\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Run the interface" ], "metadata": { "id": "x55yRVjOWW3c" } }, { "cell_type": "code", "source": [ "!cd whisper-demo-french/ && python app.py" ], "metadata": { "id": "c4zO-l6PYIDv" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "# END" ], "metadata": { "id": "81f5rCIkRihD" } } ] }