{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "view-in-github" }, "source": [ "Open In Colab" ] }, { "cell_type": "markdown", "metadata": { "id": "Cen6EE6H_dGo" }, "source": [ "
\n", "🗾 使い方\n", "\n", "1. ⚙️ **ランタイムのタイプを変更** \n", " 上部メニューの「ランタイム」→「ランタイムのタイプを変更」からハードウェアを選択します \n", " * **T4 GPU**: < おすすめ > 高速に処理できます \n", " * **CPU**: 時間がかかってもいい場合や、GPU枠を節約したい時 \n", "\n", "2. 🔌 **ランタイムに接続** \n", " 右上の「接続」をクリックして準備します \n", "\n", "3. 📑 **モードの選択** \n", " **mode** を設定します: パソコン内のファイルなら `Upload` 、GoogleDriveなら `GoogleDrive`を選択 \n", "\n", "4. 📂 **フォルダー名を入力 (GoogleDriveモード)** \n", " **drive_folder** にフォルダー名を入力します (初期値: `/Whisper/`) \n", "\n", "5. ▶️ **再生ボタンを押して実行** \n", " セルの左側にある再生ボタンをクリックして実行します \n", "\n", "6. 📤 **ファイルのアップロード (Uploadモードの場合)** \n", " 途中で「ファイル選択」ボタンが表示されるので、ファイルを選択してください \n", "\n", "7. 🔄 **続けて実行する場合** \n", " 手順3に戻り、設定を変更して再度再生ボタンを押します \n", "\n", "8. ⚠️ **終わったら接続解除 (ゼッタイ!)** \n", " **「ランタイムを接続解除して削除」** を必ず行ってください \n", "\n", "
\n", "\n", "
\n", "🔑 Gemini API キーの設定\n", "\n", "Gemini 3 Flash による「下読み」機能を有効にするために設定が必要です \n", "\n", "1. **APIキーを取得**: [Google AI Studio](https://aistudio.google.com/app/apikey) でキーを作成します \n", "2. **Colabに登録**: 画面左側の **鍵アイコン(シークレット)** をクリック \n", "3. **追加**: 名前を `GEMINI_API_KEY` とし、値を貼り付けます \n", "4. **許可**: 「ノートブックからのアクセス」のチェックを **ON** にしてください \n", "\n", "> [!TIP] APIキーがなくても動作します \n", "> キーが設定されていない場合、Geminiによる抽出プロセスのみがスキップされ、通常のWhisper文字起こしとして動作します \n", "\n", "### ⚠️ 無料枠での利用に関する注意 (Free Tier) \n", "\n", "* **データの取り扱い**: 無料枠(Free Tier)でファイルをアップロードして解析する場合、**入力データが Google のモデル改善(学習)に利用される可能性**があります \n", "* **機密情報の扱い**: 機密性の高い音声ファイルを扱う場合は、有料枠(Pay-as-you-go)への切り替え、またはAPIキーを設定せずに実行することを検討してください \n", "\n", "
\n", "\n", "
\n", "🛠️ 各モードの詳細\n", "\n", "### 📥 Upload Mode \n", "\n", "* **手軽な実行**: 実行中に表示されるボタンからファイルを選択するだけ \n", "* **再利用機能**: `execute_file_exists` にチェックを入れると、最後にアップロードしたファイルを再利用できます(パラメータを調整して試したい時に便利!) \n", "* **自動ダウンロード**: 完了後、結果ファイル(`.srt` / `.log`)がブラウザから自動でダウンロードされます \n", "\n", "### ☁️ GoogleDrive Mode \n", "\n", "* **事前準備**: 実行前に、処理したいファイルを Drive 内の指定フォルダ(初期値: `/Whisper/`)に入れておいてください \n", "* **自動保存**: 生成されたファイルは、音源と同じ Drive フォルダ内に直接保存されます \n", "* **一括処理**: フォルダ内の未実行ファイルのみを賢く選別して、まとめて文字起こしします \n", "\n", "### 📄 出力ファイル \n", "\n", "* **字幕ファイル (`.srt`)**: 動画編集や再生プレイヤーでそのまま使える標準形式(常に生成) \n", "* **議事録ログ (`.log`)**: タイムスタンプが記録された、内容確認に最適なテキスト(オプション) \n", "* **プレーンテキスト (`.txt`)**: タイムスタンプなしの純粋な本文テキスト(隠しオプション) \n", "\n", "
\n", "\n", "
\n", "⚙️ 設定の詳細\n", "\n", "### 💫 モデルとプロンプト \n", "\n", "* **`model_type`** \n", " * **`auto`**: ブラウザ言語を判定し、日本語なら `Kotoba-Whisper`、英語なら `turbo` を自動選択 \n", " * **`turbo`**: 早くしてほしい時に \n", " * **`large-v3`**: ガンバってほしい時に \n", " * **`Kotoba-Whisper`**: `turbo`をベースにした高速・軽量な日本語特化モデル \n", " * **`Distil-Whisper`**: `large-v3` をベースにした推論速度向上版 \n", "\n", "* **`initial_prompt`** \n", " 特定の固有名詞や専門用語の認識、句読点、漢字の変換ミスを防ぐために事前に伝えるヒント \n", " * **空欄の場合**: **Gemini 3 Flash** が音声を下読みし、最適なプロンプトを自動生成(APIキーが必要) \n", "\n", "### 🔄 動作モード \n", "\n", "* **`mode`**: \n", " * `Upload`: パソコン内のファイルを読み込む(手軽な単発処理) \n", " * `GoogleDrive`: 指定フォルダからファイルを読み込む(大量・一括処理) \n", " * `YouTube`: (棚上げ)\n", "* **`drive_folder`**: \n", " * Google Drive内の対象フォルダ名(初期値: `Whisper`) \n", "* **`execute_file_exists`** (Uploadモード専用) \n", " * **ON**: アップロード済みの最新ファイルを再利用します \n", " * **OFF**: 常に新しいファイルをアップロードします \n", "\n", "### 📄 出力オプション \n", "\n", "* **`records_text_download`**: タイムスタンプ付きの議事録(.log)を保存します \n", "* **`drive_batch_mode`** (GoogleDriveモード専用): \n", " * `未実行のみ一括処理`: まだ `.srt` が生成されていないファイルだけを探して実行します \n", " * `最新の1件のみ`: フォルダ内の最新ファイル1つだけを処理します \n", "* **`plain_text_download`** (隠しオプション): タイムスタンプなしの純粋なテキスト本文(`.txt`)を保存します \n", "\n", "### 🚀 効率化機能:既存ファイルの再利用 \n", "\n", "アップロード・ダウンロード済みの最新ファイルを再利用することで、パラメータ調整時の待ち時間を大幅に短縮できます \n", "\n", "#### `mode`を`Upload`にする \n", " * **通常**: $\\text{File Upload (60s)} + \\text{Whisper (120s)} = 180\\text{s}$ \n", " * **再利用モード**: $\\text{Whisper (120s)}$ only = **120s (33% OFF!)** \n", "\n", "
\n", "\n", "
\n", "\n", "
\n", "🌎 How to Use\n", "\n", "1. ⚙️ **Change Runtime Type** \n", " Go to \"Runtime\" -> \"Change runtime type\" in the top menu and select your hardware. \n", " * **T4 GPU**: < Recommended > For high-speed processing. \n", " * **CPU**: If you don't mind it taking longer or want to save your GPU quota. \n", "2. 🔌 **Connect to Runtime** \n", " Click \"Connect\" in the top right corner to prepare the environment. \n", "3. 📑 **Select Mode** \n", " Set the **mode**: Select `Upload` for local files or `GoogleDrive` for Google Drive. \n", "4. 📂 **Enter Folder Name (for GoogleDrive Mode)** \n", " Enter your target folder name in the **drive_folder** field. (default: `/Whisper/`) \n", "5. ▶️ **Click the Play Button** \n", " Click the play button on the left side of the cell to start the process. \n", "6. 📤 **Upload File (for Upload Mode)** \n", " When the \"Choose Files\" button appears during execution, select your audio file. \n", "7. 🔄 **To Continue** \n", " Go back to step 3, adjust settings, and click the play button again. \n", "8. ⚠️ **Disconnect (Crucial!)** \n", " Always select **\"Disconnect and delete runtime\"** from the menu when finished. \n", "\n", "
\n", "\n", "
\n", "🗝️ Gemini API Key Setup\n", "\n", "Setup is required to enable the \"Pre-reading\" feature using Gemini 3 Flash. \n", "\n", "1. **Get API Key**: Create your key at [Google AI Studio](https://aistudio.google.com/app/apikey). \n", "2. **Register in Colab**: Click the **Key icon (Secrets)** on the left sidebar. \n", "3. **Add Secret**: Set the Name to `GEMINI_API_KEY` and paste your key into the Value. \n", "4. **Grant Access**: Toggle the \"Notebook access\" switch to **ON**. \n", "\n", "> [!TIP] Works without an API Key \n", "> If no key is set, the Gemini extraction process is skipped, and the tool functions as a standard Whisper transcription. \n", "\n", "### ⚠️ Precautions (Free Tier) \n", "\n", "* **Data Privacy**: When using the Free Tier, **your input data may be used by Google to improve their models (training)**. \n", "* **Sensitive Information**: For highly confidential audio, consider switching to the Pay-as-you-go tier or running the tool without an API key. \n", "\n", "
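The key lookup described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the helper name `load_gemini_key` is hypothetical, the secret is assumed to be named `GEMINI_API_KEY` as in step 3, and outside Colab the sketch falls back to an environment variable of the same name.

```python
import os


def load_gemini_key(secret_name='GEMINI_API_KEY'):
    # Hypothetical helper: returns the key, or an empty string when none is available.
    try:
        # google.colab can only be imported inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        # Local run: read the key from an environment variable instead.
        return os.getenv(secret_name, '')
    try:
        return userdata.get(secret_name) or ''
    except Exception:
        # Secret missing or notebook access not granted: behave as if no key is set.
        return ''


key = load_gemini_key()
print('Gemini pre-reading enabled' if key else 'No key found: plain Whisper transcription')
```

Returning an empty string rather than raising keeps the no-key path identical to the behavior in the tip above: the Gemini step is skipped and transcription proceeds.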
\n", "\n", "
\n", "🔧 Mode Details\n", "\n", "### 📥 Upload Mode \n", "\n", "* **Easy Execution**: Simply select your file using the button that appears during execution. \n", "* **Reuse Feature**: Checking `execute_file_exists` allows you to reuse the last uploaded file (useful for fine-tuning parameters!). \n", "* **Auto-Download**: Result files (`.srt` / `.log`) are automatically downloaded to your browser upon completion. \n", "\n", "### ☁️ GoogleDrive Mode \n", "\n", "* **Preparation**: Before running, place your audio files in the designated Drive folder (default: `/Whisper/`). \n", "* **Auto-Save**: Generated files are saved directly in the same Drive folder as the source audio. \n", "* **Batch Processing**: Smartly identifies and processes only the files that haven't been transcribed yet. \n", "\n", "### 📄 Outputs \n", "\n", "* **Subtitle File (`.srt`)**: Standard format for video editing and players (always generated). \n", "* **Transcription Log (`.log`)**: Text with timestamps, ideal for reviewing content (optional). \n", "* **Plain Text (`.txt`)**: Pure transcript without timestamps (hidden option). \n", "\n", "
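To make the two timestamped outputs concrete, here is a small sketch of how one `.srt` cue and one `.log` line can be built from a segment's start time, end time, and text. The helper names `format_srt_timestamp`, `srt_block`, and `log_line` are hypothetical and the segment values are made up for illustration; the notebook's own formatting code may differ in detail.

```python
def format_srt_timestamp(seconds):
    # Convert seconds to the SRT form HH:MM:SS,mmm using integer milliseconds.
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f'{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}'


def srt_block(index, start, end, text):
    # One SRT cue as a list of lines: counter, time range, text, blank separator.
    time_range = f'{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}'
    return [str(index), time_range, text.strip(), '']


def log_line(start, text):
    # One .log entry: [HH:MM:SS] followed by the text (milliseconds dropped).
    return f'[{format_srt_timestamp(start)[:8]}] {text.strip()}'


for line in srt_block(1, 0.0, 2.5, ' Hello there. '):
    print(line)
print(log_line(3661.5, ' Second line. '))
```

Working in integer milliseconds avoids floating-point drift in the fractional part and keeps the hour field correct for recordings longer than 24 hours.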
\n", "\n", "
\n", "⚒️ Parameter Details\n", "\n", "### 💫 Model & Prompt \n", "\n", "* **`model_type`** \n", "  * **`auto`**: Detects the browser language (the OS locale when run locally) and selects `Kotoba-Whisper` for Japanese, otherwise `turbo`. \n", "  * **`turbo`**: Use when speed matters most. \n", "  * **`large-v3`**: Use when accuracy matters most. \n", "  * **`Kotoba-Whisper`**: High-speed, lightweight model optimized for Japanese. \n", "  * **`Distil-Whisper`**: A distilled version of `large-v3` with faster inference. \n", "* **`initial_prompt`** \n", "  A hint passed to Whisper in advance to improve recognition of proper nouns, technical terms, and punctuation. \n", "  * **If empty**: **Gemini 3 Flash** analyzes the audio and generates a prompt automatically (requires an API key). \n", "\n", "### 🔄 Mode \n", "\n", "* **`mode`**: \n", "  * `Upload`: Process a file from your computer (single task). \n", "  * `GoogleDrive`: Process files from a specific folder (bulk/batch task). \n", "  * `YouTube`: (shelved) \n", "* **`drive_folder`**: The target folder name in Google Drive (default: `Whisper`). \n", "* **`execute_file_exists`** (Upload mode only): \n", "  * **ON**: Reuses the most recently uploaded file. \n", "  * **OFF**: Always prompts for a new file upload. \n", "\n", "### 📄 Output Options \n", "\n", "* **`records_text_download`**: Saves a transcription log with timestamps (`.log`). \n", "* **`drive_batch_mode`** (GoogleDrive mode only): \n", "  * `Unprocessed Only`: Finds and processes only the files that do not yet have a matching `.srt`. \n", "  * `Latest Only`: Processes only the single newest file in the folder. \n", "* **`plain_text_download`** (hidden option): Saves the plain transcript text without timestamps (`.txt`). \n", "\n", "### 🚀 Optimization: Reusing Existing Files \n", "\n", "Reuse the most recently uploaded or downloaded file to significantly reduce wait times during parameter tuning. 
\n", "\n", "#### With `mode` set to `Upload` \n", "  * **Standard**: $\\text{File Upload (60s)} + \\text{Whisper (120s)} = 180\\text{s}$ \n", "  * **Reuse**: $\\text{Whisper (120s)} = 120\\text{s}$ (**about 33% shorter**) \n", "\n", "
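The arithmetic behind the 33% figure, as a short sketch. The 60 s upload and 120 s transcription times are the illustrative values used above, not measurements, and the function name `reuse_saving` is hypothetical.

```python
def reuse_saving(upload_s, whisper_s):
    # Returns (standard total, reuse total, fraction of time saved by skipping the upload).
    standard = upload_s + whisper_s
    reuse = whisper_s
    return standard, reuse, (standard - reuse) / standard


standard, reuse, saved = reuse_saving(60, 120)
print(f'standard={standard}s reuse={reuse}s saved={saved:.0%}')
```

The saving is the upload time divided by the standard total, so it grows with large files or slow connections and shrinks when transcription dominates.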
\n", "\n", "
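A rough sketch of how the `Unprocessed Only` choice of `drive_batch_mode` could pick files: a media file counts as processed when the beginning of its name appears in the name of an existing `.srt`. The function name `select_unprocessed`, the sample file names, and the prefix length are assumptions for illustration; the notebook's actual matching may differ.

```python
import os

AUDIO_EXT = ('.mp3', '.mp4', '.wav', '.m4a', '.webm', '.ogg', '.flac')


def select_unprocessed(filenames, prefix_len=20):
    # Keep media files whose name prefix does not occur in any existing .srt name.
    srts = [n for n in filenames if n.lower().endswith('.srt')]
    media = [n for n in filenames if n.lower().endswith(AUDIO_EXT)]
    pending = []
    for name in media:
        prefix = os.path.splitext(name)[0][:prefix_len]
        if not any(prefix in s for s in srts):
            pending.append(name)
    return pending


names = ['meeting.mp3', 'meeting_20260101_120000.srt', 'lecture.wav', 'notes.txt']
print(select_unprocessed(names))
```

Prefix matching is simple but coarse: two recordings that share the same leading characters are treated as one, so distinct file-name prefixes are safer for batch runs.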
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "n1gBBE_c691M" }, "outputs": [], "source": [ "# @title 🐦 Chirp Whisper Link v5.0\n", "APP_VERSION = '5.0'\n", "LICENSE = 'PolyForm Noncommercial 1.0.0'\n", "LINK = 'github.com/neon-aiart/chirp-whisper-link'\n", "\n", "# @markdown ---\n", "model_type = \"auto\" # @param [\"auto\", \"tiny\", \"base\", \"small\", \"medium\", \"large-v3\", \"turbo\", \"systran/faster-distil-whisper-large-v3\", \"kotoba-tech/kotoba-whisper-v2.0-faster\"]\n", "initial_prompt = \"\" # @param {type:\"string\"}\n", "mode = \"GoogleDrive\" # @param [\"Upload\", \"GoogleDrive\", \"YouTube(shelved)\"]\n", "drive_folder = \"Whisper\" # @param {type:\"string\"}\n", "youtube_url = \"Youtube URL\" # shelved @param {type:\"string\"}\n", "# @markdown ---\n", "# @markdown #### 🚀 GoogleDriveモード専用設定 (Advanced)\n", "drive_batch_mode = \"未実行のみ一括処理 (Unprocessed Only)\" # @param [\"未実行のみ一括処理 (Unprocessed Only)\", \"最新の1件のみ (Latest Only)\"]\n", "# @markdown ---\n", "# @markdown #### 🛠️ オプション (Options)\n", "language = \"ja\"\n", "condition_on_previous_text = True\n", "execute_file_exists = False # @param {type:\"boolean\"}\n", "records_text_download = False # @param {type:\"boolean\"}\n", "plain_text_download = False\n", "# @markdown ---\n", "\n", "import os, sys, time, torch, gc, subprocess, warnings, re, pathlib, logging\n", "from datetime import datetime\n", "from IPython.display import clear_output, Audio, display\n", "import numpy as np\n", "\n", "# 1. 
環境判定とライブラリのインポート\n", "try:\n", " from google.colab import drive, files, output, _shell, userdata\n", " IS_COLAB = True\n", "except ImportError:\n", " import locale\n", " IS_COLAB = False\n", "\n", "# パスの設定\n", "if IS_COLAB:\n", " ROOT_PATH = \"/content/\"\n", "else:\n", " ROOT_PATH = \"./\"\n", "\n", "print(f\"🐦 Chirp Whisper Link v{APP_VERSION}\")\n", "print(f\"ねおん (neon-aiart) © 2026 | {LICENSE}\")\n", "print(f\"Official: {LINK}\")\n", "print(\"━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\")\n", "\n", "# --- ffmpeg 存在チェック関数の定義 ---\n", "def check_ffmpeg_installed():\n", " \"\"\"ローカル環境においてffmpegがインストールされているか確認する\"\"\"\n", " if IS_COLAB:\n", " return True # Google Colabは標準搭載のためチェック不要\n", "\n", " try:\n", " # ffmpeg -version を実行して存在を確認\n", " subprocess.run([\"ffmpeg\", \"-version\"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)\n", " return True\n", " except FileNotFoundError:\n", " return False\n", "\n", "# --- 実行チェック ---\n", "HAS_FFMPEG = check_ffmpeg_installed()\n", "if not HAS_FFMPEG:\n", " print(\"━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\")\n", " print(\"❌ エラー: ffmpeg がシステムに見つかりません。\")\n", " print(\"ローカル環境で実行するには、FFmpeg のインストールとパスの設定(環境変数)が必要です。\")\n", " print(\"インストール後、ターミナル/コマンドプロンプトを再起動してから再度実行してください。\")\n", " print(\"━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\")\n", "\n", "# 2. 
Library setup\n", "packages = [\"faster-whisper\", \"transformers\", \"google-genai\"]\n", "\n", "def install_packages():\n", "    print(\"📦 ライブラリをインストール中...\")\n", "    # Choose how to invoke pip depending on IS_COLAB (the !pip magic only works inside a notebook)\n", "    if IS_COLAB:\n", "        _shell.Shell().run_line_magic('pip', f'install -U {\" \".join(packages)} -q --no-cache-dir')\n", "    else:\n", "        # In a local environment, run pip as a subprocess of the current interpreter\n", "        subprocess.check_call([sys.executable, \"-m\", \"pip\", \"install\", \"-U\"] + packages)\n", "\n", "try:\n", "    from faster_whisper import WhisperModel\n", "    from google import genai\n", "except ImportError:\n", "    if HAS_FFMPEG:\n", "        install_packages()\n", "        from faster_whisper import WhisperModel\n", "        from google import genai\n", "from huggingface_hub.utils import disable_progress_bars\n", "disable_progress_bars()\n", "\n", "# Hide Hugging Face token warnings\n", "warnings.filterwarnings(\"ignore\", category=UserWarning, module=\"huggingface_hub\")\n", "# Raise the Hugging Face log level to ERROR so that warnings are dropped\n", "logging.getLogger(\"huggingface_hub\").setLevel(logging.ERROR)\n", "# Suppress the cache symlink warning (this variable does not affect token handling)\n", "os.environ[\"HF_HUB_DISABLE_SYMLINKS_WARNING\"] = \"1\"\n", "\n", "# 3. 
Time zone and language settings\n", "if IS_COLAB:\n", "    try:\n", "        browser_languages = output.eval_js('navigator.languages')\n", "        # Take the browser's top-priority language\n", "        primary_lang = browser_languages[0].lower() if browser_languages else \"en\"\n", "    except Exception:\n", "        primary_lang = \"en\"\n", "else:\n", "    # Locally, read the OS locale instead\n", "    loc = locale.getdefaultlocale()[0] # returns e.g. 'ja_JP'\n", "    primary_lang = 'ja' if loc and loc.startswith('ja') else 'en'\n", "\n", "# Use Japanese settings only when the primary language starts with 'ja'\n", "if primary_lang.startswith('ja'):\n", "    language = \"ja\"\n", "    os.environ['TZ'] = 'Asia/Tokyo'\n", "else:\n", "    language = \"en\"\n", "    os.environ['TZ'] = 'UTC'\n", "\n", "# time.tzset() is not available on Windows\n", "if os.name != 'nt':\n", "    try:\n", "        time.tzset() # apply the TZ setting\n", "    except Exception:\n", "        pass\n", "\n", "# Guard against overly long file names\n", "def get_safe_filename(original_name, ext=\".srt\"):\n", "    # Base name without the extension\n", "    base = os.path.splitext(os.path.basename(original_name))[0]\n", "    # Remove characters that are invalid in file names and keep the first 50 characters (OS path-length limits)\n", "    safe_base = re.sub(r'[\\\\/:*?\"<>|]', '', base)[:50].strip()\n", "    timestamp = datetime.now().strftime(\"%Y%m%d_%H%M%S\")\n", "    return f\"{safe_base}_{timestamp}{ext}\"\n", "\n", "# 4. 
モデルのロード (Faster版)\n", "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n", "\n", "if model_type == \"auto\":\n", " selected_model = \"kotoba-tech/kotoba-whisper-v2.0-faster\" if language == \"ja\" else \"turbo\"\n", "else:\n", " selected_model = model_type\n", "\n", "if 'model' not in globals() or globals().get('current_model_type') != selected_model:\n", " if HAS_FFMPEG:\n", " print(f\"📦 Whisperモデル({selected_model})を読み込み中...\")\n", " # Faster-Whisper特有の呼び出し方\n", " model = WhisperModel(selected_model, device=device, compute_type=\"float16\" if device == \"cuda\" else \"int8\", download_root=\"./models\")\n", " current_model_type = selected_model\n", " clear_output()\n", " print(f\"✅ モデル({selected_model})のロードが完了しました\")\n", "else:\n", " print(f\"⚡ モデル({selected_model})はロード済みです\")\n", "\n", "# メモリ掃除\n", "gc.collect()\n", "if torch.cuda.is_available():\n", " torch.cuda.empty_cache()\n", "\n", "# 5. 入力ソースの選択\n", "target_files = [] # 処理対象フルパスのリスト\n", "valid_ext = ('.mp3', '.mp4', '.wav', '.m4a', '.webm', '.ogg', '.flac')\n", "\n", "if mode == \"GoogleDrive\":\n", " if IS_COLAB:\n", " # 1. ドライブをマウント\n", " if not os.path.exists('/content/drive'):\n", " print(\"Google Driveをマウントしています...\")\n", " drive.mount('/content/drive')\n", "\n", " # 2. 
検索パスの設定\n", " target_path = f\"/content/drive/MyDrive/{drive_folder}\"\n", " else:\n", " # ローカルなら、PC内の特定のパスをDrive代わりにする\n", " target_path = os.path.join(ROOT_PATH, drive_folder)\n", "\n", " if not os.path.exists(target_path): os.makedirs(target_path)\n", "\n", " all_drive_files = sorted([os.path.join(target_path, f) for f in os.listdir(target_path) if f.lower().endswith(valid_ext)], key=os.path.getmtime, reverse=True)\n", "\n", " if drive_batch_mode == \"未実行のみ一括処理 (Unprocessed Only)\":\n", " existing_srts = [f for f in os.listdir(target_path) if f.endswith('.srt')]\n", " for f_path in all_drive_files:\n", " f_base = os.path.splitext(os.path.basename(f_path))[0][:20] # 前方一致判定用\n", " if not any(f_base in s for s in existing_srts):\n", " target_files.append(f_path)\n", " print(f\"📂 未実行ファイル: {len(target_files)}件を検出しました\")\n", " else:\n", " if all_drive_files: target_files = [all_drive_files[0]]\n", "\n", "elif mode == \"YouTube\":\n", " try:\n", " import yt_dlp\n", " except ImportError:\n", " print(\"📦 YouTube処理用ライブラリをインストール中...\")\n", " if IS_COLAB:\n", " _shell.Shell().run_line_magic('pip', f'install -U yt-dlp -q --no-cache-dir')\n", " else:\n", " subprocess.check_call([sys.executable, \"-m\", \"pip\", \"install\", \"-U\", \"yt-dlp\"])\n", " import yt_dlp\n", "\n", " if youtube_url == \"Youtube URL\" or not youtube_url.startswith(\"http\"):\n", " print(\"⚠️ 有効なYouTube URLを入力してください\")\n", " else:\n", " ydl_opts = {\n", " 'format': 'bestaudio/best',\n", " 'outtmpl': 'youtube_audio.%(ext)s',\n", " 'noplaylist': True,\n", " 'quiet': True,\n", " 'no_warnings': True,\n", " }\n", "\n", " max_retries = 3 # 最大3回試行\n", " for attempt in range(max_retries):\n", " try:\n", " print(f\"📺 YouTubeから音声を抽出中... 
(試行 {attempt + 1}/{max_retries})\")\n", " with yt_dlp.YoutubeDL(ydl_opts) as ydl:\n", " info = ydl.extract_info(youtube_url, download=True)\n", " target_files = [ydl.prepare_filename(info)]\n", "\n", " # .webm などに変わっている可能性を考慮\n", " if not os.path.exists(target_files[0]):\n", " target_files = [info.get('requested_downloads', [{}])[0].get('filepath', target_files[0])]\n", " break # 成功したらループを抜ける\n", " except Exception as e:\n", " if attempt < max_retries - 1:\n", " print(f\"⚠️ 失敗しました。5秒後に再試行します... ({e})\")\n", " time.sleep(5) # 少し間を置くのがコツ\n", " else:\n", " print(f\"❌ {max_retries}回試行しましたが失敗しました\")\n", " target_files = [] # Ensure file_name is None if download fails\n", "\n", "else: # Upload / Local\n", " # フォルダ内の対象ファイルをリストアップ(更新日時が新しい順)\n", " local_files = sorted(\n", " [f for f in os.listdir('.') if f.endswith(valid_ext)],\n", " key=os.path.getmtime, reverse=True\n", " )\n", "\n", " if local_files and execute_file_exists:\n", " # ファイルが存在し、かつ execute_file_exists が True の場合のみ既存ファイルを使用\n", " target_files = [local_files[0]]\n", " else:\n", " if IS_COLAB:\n", " print(\"📂 文字起こしするファイルをアップロードしてください\")\n", " uploaded = files.upload()\n", " if uploaded:\n", " target_files = [list(uploaded.keys())[0]]\n", " else:\n", " if local_files: target_files = [local_files[0]]\n", "\n", "if not HAS_FFMPEG:\n", " target_files = []\n", "\n", "# --- Gemini 解析セクション ---\n", "def get_ai_initial_prompt(file_path):\n", " try:\n", " if IS_COLAB:\n", " api_key = userdata.get('GEMINI_API_KEY')\n", " else:\n", " api_key = os.getenv('GEMINI_API_KEY')\n", " if not api_key: return \"\"\n", "\n", " # 新SDKの初期化\n", " client = genai.Client(api_key=api_key)\n", " print(f\"💫 Gemini 3 Flash が音声を分析中...\")\n", "\n", " # 1. 
Upload\n", "        p = pathlib.Path(file_path)\n", "        mime_map = {\".mp3\": \"audio/mpeg\", \".wav\": \"audio/wav\", \".m4a\": \"audio/mp4\", \".ogg\": \"audio/ogg\", \".flac\": \"audio/flac\", \".mp4\": \"video/mp4\", \".webm\": \"video/webm\"}\n", "        # Look up the MIME type by extension (falls back to audio/mpeg for unknown extensions)\n", "        current_mime = mime_map.get(p.suffix.lower(), \"audio/mpeg\")\n", "\n", "        with p.open('rb') as f:\n", "            audio_file = client.files.upload(\n", "                file=f,\n", "                config={\n", "                    'display_name': 'temp_audio',\n", "                    'mime_type': current_mime\n", "                }\n", "            )\n", "\n", "        # 2. Wait until the uploaded file has been processed\n", "        while audio_file.state.name == \"PROCESSING\":\n", "            time.sleep(5)\n", "            audio_file = client.files.get(name=audio_file.name)\n", "\n", "        # 3. Analysis\n", "        instruction = \"\"\"\n", "        Analyze this audio and create an 'initial_prompt' to improve transcription accuracy.\n", "\n", "        【Instructions】\n", "        1. Extract proper nouns (names, companies, products), technical terms, and speaker-specific speech patterns.\n", "        2. Keep the original spelling and language as heard in the audio.\n", "        (e.g., Do not forcibly translate native proper nouns into English if they are in another language.)\n", "        3. Output the result as a comma-separated list within 224 tokens.\n", "\n", "        Return ONLY the comma-separated keywords for the prompt. 
No intro or outro.\n", " \"\"\"\n", "\n", " response = client.models.generate_content(\n", " model=\"gemini-3-flash-preview\",\n", " contents=[instruction, audio_file]\n", " )\n", "\n", " # Gemini側のファイルを削除して掃除\n", " client.files.delete(name=audio_file.name)\n", "\n", " return response.text.strip()\n", " except Exception as e:\n", " print(f\"⚠️ Gemini解析スキップ: {e}\")\n", " return \"\"\n", "\n", "# 文字起こしメインループ\n", "if target_files:\n", "\n", " for idx, file_path in enumerate(target_files):\n", " current_fname = os.path.basename(file_path)\n", " print(f\"\\n🚀 [{idx+1}/{len(target_files)}] 実行中 (Fasterモード): {current_fname}\")\n", "\n", " # ユーザーが指定していない場合のみ、Geminiに助けてもらう\n", " current_prompt = initial_prompt # グローバルの設定をコピー\n", " if not current_prompt:\n", " generated_prompt = get_ai_initial_prompt(file_path)\n", " if generated_prompt:\n", " current_prompt = generated_prompt\n", " # --- 表示用:50文字ごとに改行を入れてプリント ---\n", " display_text = \"\\n \".join([current_prompt[i:i+50] for i in range(0, len(current_prompt), 50)])\n", " print(f\"✨ 生成されたプロンプト:\\n {display_text}\")\n", "\n", " # 文字起こし実行\n", " segments, info = model.transcribe(\n", " file_path,\n", " initial_prompt=current_prompt,\n", " language=None,\n", " condition_on_previous_text=condition_on_previous_text,\n", " beam_size=5,\n", " chunk_length=30, # 30秒ずつ区切って処理\n", " # --- ここからが精度のための追加設定 ---\n", " vad_filter=True, # 余計なノイズや無音をカット\n", " vad_parameters=dict(min_silence_duration_ms=500),\n", " no_speech_threshold=0.6, # 喋っていない場所を無理に訳さない\n", " compression_ratio_threshold=2.4, # 変なループ(同じ言葉の繰り返し)を防ぐ\n", " log_prob_threshold=-1.0, # 自信がない時に適当なことを言わせない\n", " max_new_tokens=128, # 1つの字幕の最大文字数を制限\n", " repetition_penalty=1.2, # ループ(繰り返し)をより厳しく抑制\n", " )\n", "\n", " results = []\n", " full_text = \"\"\n", " last_t = 0\n", "\n", " # 進捗バーをこのファイル用に作成\n", " from tqdm.notebook import tqdm\n", " pbar = tqdm(total=info.duration, unit=\"sec\", desc=f\"Progress: {current_fname[:20]}\")\n", "\n", " for segment in segments:\n", " 
results.append(segment)\n", " full_text += segment.text\n", " # 進捗バーを更新\n", " pbar.update(segment.end - last_t)\n", " last_t = segment.end\n", "\n", " pbar.n = pbar.total\n", " pbar.refresh()\n", " pbar.close()\n", "\n", " # 出力ファイル名の生成 (元の名 + 日時 + 各拡張子)\n", " srt_name = get_safe_filename(file_path, \".srt\")\n", " log_name = get_safe_filename(file_path, \".log\") # 議事録用\n", " txt_name = get_safe_filename(file_path, \".txt\") # プレーンテキスト用\n", "\n", " # SRTとLOGの作成\n", " srt_content = \"\"\n", " log_content = \"\"\n", " for i, seg in enumerate(results):\n", " def f_ts(s):\n", " td = time.gmtime(s)\n", " ms = int((s - int(s)) * 1000)\n", " return f\"{time.strftime('%H:%M:%S', td)},{ms:03d}\"\n", " srt_content += f\"{i+1}\\n{f_ts(seg.start)} --> {f_ts(seg.end)}\\n{seg.text.strip()}\\n\\n\"\n", " log_content += f\"[{f_ts(seg.start)[:8]}] {seg.text.strip()}\\n\"\n", "\n", " # --- 保存とダウンロード ---\n", "\n", " # 1. 保存先ディレクトリの決定\n", " # GoogleDriveモードなら指定フォルダへ、それ以外ならルートパスへ\n", " save_dir = target_path if mode == \"GoogleDrive\" else ROOT_PATH\n", "\n", " # 2. ファイル書き出し(全環境共通)\n", " # 字幕(SRT)は常に保存\n", " with open(os.path.join(save_dir, srt_name), \"w\", encoding=\"utf-8\") as f:\n", " f.write(srt_content)\n", "\n", " # 議事録(LOG)\n", " if records_text_download:\n", " with open(os.path.join(save_dir, log_name), \"w\", encoding=\"utf-8\") as f:\n", " f.write(log_content)\n", "\n", " # プレーン(TXT)\n", " if plain_text_download:\n", " with open(os.path.join(save_dir, txt_name), \"w\", encoding=\"utf-8\") as f:\n", " f.write(full_text)\n", "\n", " # 3. 完了メッセージの表示\n", " if mode == \"GoogleDrive\":\n", " print(f\"✅ Driveに保存完了: {srt_name}\")\n", " else:\n", " if not IS_COLAB:\n", " # ローカル環境の場合\n", " print(f\"✅ カレントディレクトリに保存完了: {srt_name}\")\n", "\n", " # 4. 
ブラウザダウンロード処理(ColabかつGoogleDriveモード以外のみ実行)\n", " if IS_COLAB and mode != \"GoogleDrive\":\n", " # 常にSRTをダウンロード\n", " files.download(srt_name)\n", "\n", " # オプションに応じてLOGとTXTも確実にダウンロード\n", " if records_text_download:\n", " files.download(log_name)\n", " if plain_text_download:\n", " files.download(txt_name)\n", "\n", " print(f\"✅ ダウンロード完了: {srt_name}\")\n", "\n", " # メモリ掃除\n", " del results, srt_content, log_content, full_text\n", " gc.collect()\n", " if torch.cuda.is_available():\n", " torch.cuda.empty_cache()\n", "\n", " # 結果表示\n", " # clear_output()\n", "\n", " # --- 🐤 Chirp Sound (完了通知音) ---\n", " sample_rate = 44100 # Hz\n", "\n", " # 音を構成する要素の数\n", " num_sparkles = 3\n", " # 各きらめき音の長さ\n", " sparkle_duration = 0.08 # 秒\n", " # 周波数の全体的な開始範囲と終了範囲\n", " base_start_frequency = 2000 # Hz\n", " base_end_frequency = 4000 # Hz (全体的に上昇するような効果)\n", " # 急速な減衰率\n", " decay_rate = 20\n", "\n", " all_data = []\n", " for i in range(num_sparkles):\n", " # 各きらめき音のタイムベクトル\n", " t_sparkle = np.linspace(0, sparkle_duration, int(sparkle_duration * sample_rate), endpoint=False)\n", "\n", " # 各きらめき音の開始周波数と終了周波数を計算\n", " # これにより、全体のきらめき音が上昇するシーケンスになります\n", " f_start_current_sparkle = base_start_frequency + (base_end_frequency - base_start_frequency) * (i / num_sparkles)\n", " f_end_current_sparkle = base_start_frequency + (base_end_frequency - base_start_frequency) * ((i + 1) / num_sparkles)\n", "\n", " # 各きらめき音内でわずかな上昇スイープを導入し、きらめき効果を強調\n", " # 実際の周波数は f_start_current_sparkle から f_end_current_sparkle まで変化します\n", " instantaneous_frequency = np.linspace(f_start_current_sparkle, f_end_current_sparkle, len(t_sparkle))\n", "\n", " # 波形を生成\n", " sparkle_wave = np.sin(2 * np.pi * instantaneous_frequency * t_sparkle)\n", "\n", " # 指数関数的減衰を適用\n", " envelope_sparkle = np.exp(-decay_rate * t_sparkle)\n", " sparkle_data = sparkle_wave * envelope_sparkle\n", "\n", " all_data.append(sparkle_data)\n", "\n", " # 各きらめき音の間に非常に短い無音を挿入(最後の音以外)\n", " # これにより、それぞれの音がはっきりと聞こえ、混ざり合うのを防ぎます\n", " if i < 
num_sparkles - 1:\n", " silence_duration = 0.02 # 秒\n", " silence = np.zeros(int(silence_duration * sample_rate))\n", " all_data.append(silence)\n", "\n", " data = np.concatenate(all_data)\n", "\n", " # ボリュームを正規化(クリッピング防止と音量調整)\n", " # データが空の場合はエラーにならないように処理\n", " data = data / np.max(np.abs(data)) * 0.8 if len(data) > 0 else np.array([0.0])\n", "\n", " display(Audio(data, rate=sample_rate, autoplay=True))\n", "\n", " print(\"\\n🎉 すべての処理が完了しました!\")\n", "\n", " print(\"\\n\" + \"!\"*40)\n", " print(\"⚠️ ATTENTION: PLEASE DISCONNECT RUNTIME\")\n", " print(\"⚠️ 接続解除を忘れないでください!残り時間が削られます。\")\n", " print(\"!\"*40)\n", "\n", " # YouTube用の一時ファイル削除\n", " if mode == \"YouTube\" and target_files and os.path.exists(target_files[0]):\n", " os.remove(target_files[0])\n", "else:\n", " print(\"⚠️ 処理対象のファイルがありませんでした。\")" ] } ], "metadata": { "accelerator": "GPU", "colab": { "authorship_tag": "ABX9TyPqNtfKuat0scv8xPpZ4jd7", "gpuType": "T4", "include_colab_link": true, "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 0 }