{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Fish Speech" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### For Windows Users" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "bat" } }, "outputs": [], "source": [ "!chcp 65001" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### For Linux Users" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import locale\n", "locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prepare Model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Users in mainland China may want to download through a mirror.\n", "# `!set` / `!export` run in a throwaway subshell and do not persist, so use %env instead:\n", "# %env HF_ENDPOINT=https://hf-mirror.com\n", "\n", "!huggingface-cli download fishaudio/fish-speech-1.2-sft --local-dir checkpoints/fish-speech-1.2-sft/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## WebUI Inference\n", "\n", "> You can use `--compile` to fuse CUDA kernels for faster inference (about 10x)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "!python tools/webui.py \\\n", " --llama-checkpoint-path checkpoints/fish-speech-1.2-sft \\\n", " --decoder-checkpoint-path checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth \\\n", " # --compile" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Step-by-Step CLI Inference" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Encode reference audio\n", "\n", "You should get a `fake.npy` file, which is passed as the prompt tokens in step 2, along with a `fake.wav` file that the cell plays back." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "# Enter the path to the reference audio file here\n", "src_audio = r\"D:\\PythonProject\\vo_hutao_draw_appear.wav\"\n", "\n", "# Quote the path so that spaces and backslashes survive the shell\n", "!python tools/vqgan/inference.py \\\n", " -i \"{src_audio}\" \\\n", " --checkpoint-path \"checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth\"\n", "\n", "from IPython.display import Audio, display\n", "audio = Audio(filename=\"fake.wav\")\n", "display(audio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Generate semantic tokens from text\n", "\n", "> This command creates a `codes_N.npy` file in the working directory, where N is an integer starting from 0.\n", "\n", "> You may want to use `--compile` to fuse CUDA kernels for faster inference (~30 tokens/second -> ~300 tokens/second)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "!python tools/llama/generate.py \\\n", " --text \"hello world\" \\\n", " --prompt-text \"The text corresponding to reference audio\" \\\n", " --prompt-tokens \"fake.npy\" \\\n", " --checkpoint-path \"checkpoints/fish-speech-1.2-sft\" \\\n", " --num-samples 2\n", " # --compile" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. 
Generate speech from semantic tokens" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" } }, "outputs": [], "source": [ "!python tools/vqgan/inference.py \\\n", " -i \"codes_0.npy\" \\\n", " --checkpoint-path \"checkpoints/fish-speech-1.2-sft/firefly-gan-vq-fsq-4x1024-42hz-generator.pth\"\n", "\n", "from IPython.display import Audio, display\n", "audio = Audio(filename=\"fake.wav\")\n", "display(audio)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 2 }