{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github"
},
"source": [
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Cen6EE6H_dGo"
},
"source": [
"\n",
"## 🗾 使い方\n",
"\n",
"1. ⚙️ **ランタイムのタイプを変更** \n",
" 上部メニューの「ランタイム」→「ランタイムのタイプを変更」からハードウェアを選択します \n",
" * **T4 GPU**: < おすすめ > 高速に処理できます \n",
" * **CPU**: 時間がかかってもいい場合や、GPU枠を節約したい時 \n",
"\n",
"2. 🔌 **ランタイムに接続** \n",
" 右上の「接続」をクリックして準備します \n",
"\n",
"3. 📑 **モードの選択** \n",
" **mode** を設定します: パソコン内のファイルなら `Upload` 、GoogleDriveなら `GoogleDrive`を選択 \n",
"\n",
"4. 📂 **フォルダー名を入力 (GoogleDriveモード)** \n",
" **drive_folder** にフォルダー名を入力します (初期値: `/Whisper/`) \n",
"\n",
"5. ▶️ **再生ボタンを押して実行** \n",
" セルの左側にある再生ボタンをクリックして実行します \n",
"\n",
"6. 📤 **ファイルのアップロード (Uploadモードの場合)** \n",
" 途中で「ファイル選択」ボタンが表示されるので、ファイルを選択してください \n",
"\n",
"7. 🔄 **続けて実行する場合** \n",
" 手順3に戻り、設定を変更して再度再生ボタンを押します \n",
"\n",
"8. ⚠️ **終わったら接続解除 (ゼッタイ!)** \n",
" **「ランタイムを接続解除して削除」** を必ず行ってください \n",
"\n",
" \n",
"\n",
"\n",
"## 🔑 Gemini API キーの設定\n",
"\n",
"Gemini 3 Flash による「下読み」機能を有効にするために設定が必要です \n",
"\n",
"1. **APIキーを取得**: [Google AI Studio](https://aistudio.google.com/app/apikey) でキーを作成します \n",
"2. **Colabに登録**: 画面左側の **鍵アイコン(シークレット)** をクリック \n",
"3. **追加**: 名前を `GEMINI_API_KEY` とし、値を貼り付けます \n",
"4. **許可**: 「ノートブックからのアクセス」のチェックを **ON** にしてください \n",
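"\n",
"Colab 以外のローカル環境で実行する場合は、Colab のシークレットの代わりに環境変数 `GEMINI_API_KEY` からキーを読み込みます(設定例、キーの値はダミーです): \n",
"\n",
"```\n",
"export GEMINI_API_KEY=\"your-key-here\"\n",
"```\n",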
"\n",
"> [!TIP] APIキーがなくても動作します \n",
"> キーが設定されていない場合、Geminiによる抽出プロセスのみがスキップされ、通常のWhisper文字起こしとして動作します \n",
"\n",
"### ⚠️ 無料枠での利用に関する注意 (Free Tier) \n",
"\n",
"* **データの取り扱い**: 無料枠(Free Tier)でファイルをアップロードして解析する場合、**入力データが Google のモデル改善(学習)に利用される可能性**があります \n",
"* **機密情報の扱い**: 機密性の高い音声ファイルを扱う場合は、有料枠(Pay-as-you-go)への切り替え、またはAPIキーを設定せずに実行することを検討してください \n",
"\n",
" \n",
"\n",
"\n",
"## 🛠️ 各モードの詳細\n",
"\n",
"### 📥 Upload Mode \n",
"\n",
"* **手軽な実行**: 実行中に表示されるボタンからファイルを選択するだけ \n",
"* **再利用機能**: `execute_file_exists` にチェックを入れると、最後にアップロードしたファイルを再利用できます(パラメータを調整して試したい時に便利!) \n",
"* **自動ダウンロード**: 完了後、結果ファイル(`.srt` / `.log`)がブラウザから自動でダウンロードされます \n",
"\n",
"### ☁️ GoogleDrive Mode \n",
"\n",
"* **事前準備**: 実行前に、処理したいファイルを Drive 内の指定フォルダ(初期値: `/Whisper/`)に入れておいてください \n",
"* **自動保存**: 生成されたファイルは、音源と同じ Drive フォルダ内に直接保存されます \n",
"* **一括処理**: フォルダ内の未実行ファイルのみを賢く選別して、まとめて文字起こしします \n",
"\n",
"### 📄 出力ファイル \n",
"\n",
"* **字幕ファイル (`.srt`)**: 動画編集や再生プレイヤーでそのまま使える標準形式(常に生成) \n",
"* **議事録ログ (`.log`)**: タイムスタンプが記録された、内容確認に最適なテキスト(オプション) \n",
"* **プレーンテキスト (`.txt`)**: タイムスタンプなしの純粋な本文テキスト(隠しオプション) \n",
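"\n",
"生成される `.srt` は標準的な SubRip 形式です(以下は説明用のサンプルです): \n",
"\n",
"```\n",
"1\n",
"00:00:00,000 --> 00:00:04,500\n",
"こんにちは、今日の打ち合わせを始めます\n",
"\n",
"2\n",
"00:00:04,500 --> 00:00:09,000\n",
"まず進捗の共有からお願いします\n",
"```\n",
"\n",
"`.log` は `[00:00:00] こんにちは、…` のように `[HH:MM:SS]` タイムスタンプ付きの行が並ぶ形式です \n",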
"\n",
" \n",
"\n",
"\n",
"## ⚙️ 設定の詳細\n",
"\n",
"### 💫 モデルとプロンプト \n",
"\n",
"* **`model_type`** \n",
" * **`auto`**: ブラウザ言語を判定し、日本語なら `Kotoba-Whisper`、それ以外は `turbo` を自動選択 \n",
" * **`turbo`**: 早くしてほしい時に \n",
" * **`large-v3`**: ガンバってほしい時に \n",
" * **`Kotoba-Whisper`**: `large-v3` を蒸留した高速・軽量な日本語特化モデル \n",
" * **`Distil-Whisper`**: `large-v3` をベースにした推論速度向上版 \n",
"\n",
"* **`initial_prompt`** \n",
" 特定の固有名詞や専門用語の認識、句読点、漢字の変換ミスを防ぐために事前に伝えるヒント \n",
" * **空欄の場合**: **Gemini 3 Flash** が音声を下読みし、最適なプロンプトを自動生成(APIキーが必要) \n",
"\n",
"### 🔄 動作モード \n",
"\n",
"* **`mode`**: \n",
" * `Upload`: パソコン内のファイルを読み込む(手軽な単発処理) \n",
" * `GoogleDrive`: 指定フォルダからファイルを読み込む(大量・一括処理) \n",
" * `YouTube`: (棚上げ)\n",
"* **`drive_folder`**: \n",
" * Google Drive内の対象フォルダ名(初期値: `Whisper`) \n",
"* **`execute_file_exists`** (Uploadモード専用) \n",
" * **ON**: アップロード済みの最新ファイルを再利用します \n",
" * **OFF**: 常に新しいファイルをアップロードします \n",
"\n",
"### 📄 出力オプション \n",
"\n",
"* **`records_text_download`**: タイムスタンプ付きの議事録(.log)を保存します \n",
"* **`drive_batch_mode`** (GoogleDriveモード専用): \n",
" * `未実行のみ一括処理`: まだ `.srt` が生成されていないファイルだけを探して実行します \n",
" * `最新の1件のみ`: フォルダ内の最新ファイル1つだけを処理します \n",
"* **`plain_text_download`** (隠しオプション): タイムスタンプなしの純粋なテキスト本文(`.txt`)を保存します \n",
"\n",
"### 🚀 効率化機能:既存ファイルの再利用 \n",
"\n",
"アップロード・ダウンロード済みの最新ファイルを再利用することで、パラメータ調整時の待ち時間を大幅に短縮できます \n",
"\n",
"#### `mode`を`Upload`にする \n",
" * **通常**: $\\text{File Upload (60s)} + \\text{Whisper (120s)} = 180\\text{s}$ \n",
" * **再利用モード**: $\\text{Whisper (120s)}$ only = **120s (33% OFF!)** \n",
"\n",
" \n",
"\n",
"---\n",
"\n",
"\n",
"## 🌎 How to Use\n",
"\n",
"1. ⚙️ **Change Runtime Type** \n",
" Go to \"Runtime\" -> \"Change runtime type\" in the top menu and select your hardware. \n",
" * **T4 GPU**: < Recommended > For high-speed processing. \n",
" * **CPU**: If you don't mind it taking longer or want to save your GPU quota. \n",
"2. 🔌 **Connect to Runtime** \n",
" Click \"Connect\" in the top right corner to prepare the environment. \n",
"3. 📑 **Select Mode** \n",
" Set the **mode**: Select `Upload` for local files or `GoogleDrive` for Google Drive. \n",
"4. 📂 **Enter Folder Name (for GoogleDrive Mode)** \n",
" Enter your target folder name in the **drive_folder** field. (default: `/Whisper/`) \n",
"5. ▶️ **Click the Play Button** \n",
" Click the play button on the left side of the cell to start the process. \n",
"6. 📤 **Upload File (for Upload Mode)** \n",
" When the \"Choose Files\" button appears during execution, select your audio file. \n",
"7. 🔄 **To Continue** \n",
" Go back to step 3, adjust settings, and click the play button again. \n",
"8. ⚠️ **Disconnect (Crucial!)** \n",
" Always select **\"Disconnect and delete runtime\"** from the menu when finished. \n",
"\n",
" \n",
"\n",
"\n",
"## 🗝️ Gemini API Key Setup\n",
"\n",
"Setup is required to enable the \"Pre-reading\" feature using Gemini 3 Flash. \n",
"\n",
"1. **Get API Key**: Create your key at [Google AI Studio](https://aistudio.google.com/app/apikey). \n",
"2. **Register in Colab**: Click the **Key icon (Secrets)** on the left sidebar. \n",
"3. **Add Secret**: Set the Name to `GEMINI_API_KEY` and paste your key into the Value. \n",
"4. **Grant Access**: Toggle the \"Notebook access\" switch to **ON**. \n",
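"\n",
"When running outside Colab, the key is read from the `GEMINI_API_KEY` environment variable instead of Colab Secrets (example setup, dummy value): \n",
"\n",
"```\n",
"export GEMINI_API_KEY=\"your-key-here\"\n",
"```\n",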
"\n",
"> [!TIP] Works without an API Key \n",
"> If no key is set, the Gemini extraction process is skipped, and the tool functions as a standard Whisper transcription. \n",
"\n",
"### ⚠️ Precautions (Free Tier) \n",
"\n",
"* **Data Privacy**: When using the Free Tier, **your input data may be used by Google to improve their models (training)**. \n",
"* **Sensitive Information**: For highly confidential audio, consider switching to the Pay-as-you-go tier or running the tool without an API key. \n",
"\n",
" \n",
"\n",
"\n",
"## 🔧 Mode Details\n",
"\n",
"### 📥 Upload Mode \n",
"\n",
"* **Easy Execution**: Simply select your file using the button that appears during execution. \n",
"* **Reuse Feature**: Checking `execute_file_exists` allows you to reuse the last uploaded file (useful for fine-tuning parameters!). \n",
"* **Auto-Download**: Result files (`.srt` / `.log`) are automatically downloaded to your browser upon completion. \n",
"\n",
"### ☁️ GoogleDrive Mode \n",
"\n",
"* **Preparation**: Before running, place your audio files in the designated Drive folder (default: `/Whisper/`). \n",
"* **Auto-Save**: Generated files are saved directly in the same Drive folder as the source audio. \n",
"* **Batch Processing**: Smartly identifies and processes only the files that haven't been transcribed yet. \n",
"\n",
"### 📄 Outputs \n",
"\n",
"* **Subtitle File (`.srt`)**: Standard format for video editing and players (always generated). \n",
"* **Transcription Log (`.log`)**: Text with timestamps, ideal for reviewing content (optional). \n",
"* **Plain Text (`.txt`)**: Pure transcript without timestamps (hidden option). \n",
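"\n",
"For reference, the generated `.srt` follows the standard SubRip layout (the content below is an illustrative sample): \n",
"\n",
"```\n",
"1\n",
"00:00:00,000 --> 00:00:04,500\n",
"Hello, let's get started with today's meeting.\n",
"\n",
"2\n",
"00:00:04,500 --> 00:00:09,000\n",
"First, please share your progress updates.\n",
"```\n",
"\n",
"Each `.log` line pairs a `[HH:MM:SS]` timestamp with the transcribed text. \n",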
"\n",
" \n",
"\n",
"\n",
"## ⚒️ Parameter Details\n",
"\n",
"### 💫 Model & Prompt \n",
"\n",
"* **`model_type`** \n",
" * **`auto`**: Detects browser language. Selects `Kotoba-Whisper` for Japanese and `turbo` otherwise. \n",
" * **`turbo`**: Use when you want it fast. \n",
" * **`large-v3`**: Use when you want the best possible accuracy. \n",
" * **`Kotoba-Whisper`**: High-speed, lightweight model optimized for Japanese. \n",
" * **`Distil-Whisper`**: A distilled version of `large-v3` with faster inference speed. \n",
"* **`initial_prompt`** \n",
" A prompt provided in advance to improve recognition of proper nouns, technical terms, and punctuation. \n",
" * **If empty**: **Gemini 3 Flash** analyzes the audio and automatically generates the optimal prompt (Requires API key). \n",
"\n",
"### 🔄 Mode \n",
"\n",
"* **`mode`**: \n",
" * `Upload`: Process files from your computer (Single task). \n",
" * `GoogleDrive`: Process files from a specific folder (Bulk/Batch task). \n",
" * `YouTube`: (shelved) \n",
"* **`drive_folder`**: The target folder name in Google Drive (Default: `Whisper`). \n",
"* **`execute_file_exists`** (Upload mode only): \n",
" * **ON**: Reuses the most recently uploaded file. \n",
" * **OFF**: Always prompts for a new file upload. \n",
"\n",
"### 📄 Output Options \n",
"\n",
"* **`records_text_download`**: Saves a transcription log with timestamps (`.log`). \n",
"* **`drive_batch_mode`** (GoogleDrive mode only): \n",
" * `Unprocessed Only`: Searches for and processes only files without `.srt`. \n",
" * `Latest Only`: Processes only the single newest file in the folder. \n",
"* **`plain_text_download`** (Internal option): Saves the raw text body without timestamps (`.txt`). \n",
"\n",
"### 🚀 Optimization: Reusing Existing Files \n",
"\n",
"Reuse the most recently uploaded or downloaded file to significantly reduce wait times during parameter tuning. \n",
"\n",
"#### Switching `mode` to `Upload` \n",
" * **Standard**: $\\text{File Upload (60s)} + \\text{Whisper (120s)} = 180\\text{s}$ \n",
" * **Reuse**: $\\text{Whisper (120s)}$ only = **120s (33% OFF!)** \n",
"\n",
" \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "n1gBBE_c691M"
},
"outputs": [],
"source": [
"# @title 🐦 Chirp Whisper Link v5.0\n",
"APP_VERSION = '5.0'\n",
"LICENSE = 'PolyForm Noncommercial 1.0.0'\n",
"LINK = 'github.com/neon-aiart/chirp-whisper-link'\n",
"\n",
"# @markdown ---\n",
"model_type = \"auto\" # @param [\"auto\", \"tiny\", \"base\", \"small\", \"medium\", \"large-v3\", \"turbo\", \"systran/faster-distil-whisper-large-v3\", \"kotoba-tech/kotoba-whisper-v2.0-faster\"]\n",
"initial_prompt = \"\" # @param {type:\"string\"}\n",
"mode = \"GoogleDrive\" # @param [\"Upload\", \"GoogleDrive\", \"YouTube(shelved)\"]\n",
"drive_folder = \"Whisper\" # @param {type:\"string\"}\n",
"youtube_url = \"Youtube URL\" # shelved @param {type:\"string\"}\n",
"# @markdown ---\n",
"# @markdown #### 🚀 GoogleDriveモード専用設定 (Advanced)\n",
"drive_batch_mode = \"未実行のみ一括処理 (Unprocessed Only)\" # @param [\"未実行のみ一括処理 (Unprocessed Only)\", \"最新の1件のみ (Latest Only)\"]\n",
"# @markdown ---\n",
"# @markdown #### 🛠️ オプション (Options)\n",
"language = \"ja\"\n",
"condition_on_previous_text = True\n",
"execute_file_exists = False # @param {type:\"boolean\"}\n",
"records_text_download = False # @param {type:\"boolean\"}\n",
"plain_text_download = False\n",
"# @markdown ---\n",
"\n",
"import os, sys, time, torch, gc, subprocess, warnings, re, pathlib, logging\n",
"from datetime import datetime\n",
"from IPython.display import clear_output, Audio, display\n",
"import numpy as np\n",
"\n",
"# 1. 環境判定とライブラリのインポート\n",
"try:\n",
" from google.colab import drive, files, output, _shell, userdata\n",
" IS_COLAB = True\n",
"except ImportError:\n",
" import locale\n",
" IS_COLAB = False\n",
"\n",
"# パスの設定\n",
"if IS_COLAB:\n",
" ROOT_PATH = \"/content/\"\n",
"else:\n",
" ROOT_PATH = \"./\"\n",
"\n",
"print(f\"🐦 Chirp Whisper Link v{APP_VERSION}\")\n",
"print(f\"ねおん (neon-aiart) © 2026 | {LICENSE}\")\n",
"print(f\"Official: {LINK}\")\n",
"print(\"━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\")\n",
"\n",
"# --- ffmpeg 存在チェック関数の定義 ---\n",
"def check_ffmpeg_installed():\n",
" \"\"\"ローカル環境においてffmpegがインストールされているか確認する\"\"\"\n",
" if IS_COLAB:\n",
" return True # Google Colabは標準搭載のためチェック不要\n",
"\n",
" try:\n",
" # ffmpeg -version を実行して存在を確認\n",
" subprocess.run([\"ffmpeg\", \"-version\"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)\n",
" return True\n",
" except FileNotFoundError:\n",
" return False\n",
"\n",
"# --- 実行チェック ---\n",
"HAS_FFMPEG = check_ffmpeg_installed()\n",
"if not HAS_FFMPEG:\n",
" print(\"━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\")\n",
" print(\"❌ エラー: ffmpeg がシステムに見つかりません。\")\n",
" print(\"ローカル環境で実行するには、FFmpeg のインストールとパスの設定(環境変数)が必要です。\")\n",
" print(\"インストール後、ターミナル/コマンドプロンプトを再起動してから再度実行してください。\")\n",
" print(\"━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\")\n",
"\n",
"# 2. ライブラリ準備\n",
"packages = [\"faster-whisper\", \"transformers\", \"google-genai\"]\n",
"\n",
"def install_packages():\n",
" print(\"📦 ライブラリをインストール中...\")\n",
" # IS_COLAB かどうかで pip の叩き方を変える(!pip はノートブック専用のため)\n",
" if IS_COLAB:\n",
" _shell.Shell().run_line_magic('pip', f'install -U {\" \".join(packages)} -q --no-cache-dir')\n",
" else:\n",
" # ローカル環境では OS のコマンドとして実行\n",
" subprocess.check_call([sys.executable, \"-m\", \"pip\", \"install\", \"-U\"] + packages)\n",
"\n",
"try:\n",
" from faster_whisper import WhisperModel\n",
" from google import genai\n",
"except ImportError:\n",
" if HAS_FFMPEG:\n",
" install_packages()\n",
" from faster_whisper import WhisperModel\n",
" from google import genai\n",
"from huggingface_hub.utils import disable_progress_bars\n",
"disable_progress_bars()\n",
"\n",
"# Hugging Faceのトークン警告を非表示にする\n",
"warnings.filterwarnings(\"ignore\", category=UserWarning, module=\"huggingface_hub\")\n",
"# Hugging Faceのログレベルを「ERROR」以上に設定(Warningを無視する)\n",
"logging.getLogger(\"huggingface_hub\").setLevel(logging.ERROR)\n",
"# シンボリックリンク関連の警告を環境変数で抑制する\n",
"os.environ[\"HF_HUB_DISABLE_SYMLINKS_WARNING\"] = \"1\"\n",
"\n",
"# 3. タイムゾーン・言語設定\n",
"if IS_COLAB:\n",
" try:\n",
" browser_languages = output.eval_js('navigator.languages')\n",
" # 一番上の優先言語を取得\n",
" primary_lang = browser_languages[0].lower() if browser_languages else \"en\"\n",
" except:\n",
" primary_lang = \"en\"\n",
"else:\n",
" # ローカルならOSの言語設定を取得\n",
" loc = locale.getdefaultlocale()[0] # 'ja_JP' などが返る\n",
" primary_lang = 'ja' if loc and loc.startswith('ja') else 'en'\n",
"\n",
"# 第一言語が日本語(ja)で始まる場合のみ日本設定にする\n",
"if primary_lang.startswith('ja'):\n",
" language = \"ja\"\n",
" os.environ['TZ'] = 'Asia/Tokyo'\n",
"else:\n",
" language = \"en\"\n",
" os.environ['TZ'] = 'UTC'\n",
"\n",
"# Windows対策\n",
"if os.name != 'nt':\n",
" try:\n",
" time.tzset() # タイムゾーンの設定を反映\n",
" except:\n",
" pass\n",
"\n",
"# 長いファイル名エラー対策\n",
"def get_safe_filename(original_name, ext=\".srt\"):\n",
" # 拡張子を除いたベース名を取得\n",
" base = os.path.splitext(os.path.basename(original_name))[0]\n",
" # 特殊文字を置換 & 前方の50文字にカット(OSのパス制限対策)\n",
" safe_base = re.sub(r'[\\\\/:*?\"<>|]', '', base)[:50].strip()\n",
" timestamp = datetime.now().strftime(\"%Y%m%d_%H%M%S\")\n",
" return f\"{safe_base}_{timestamp}{ext}\"\n",
"\n",
"# 4. モデルのロード (Faster版)\n",
"device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
"\n",
"if model_type == \"auto\":\n",
" selected_model = \"kotoba-tech/kotoba-whisper-v2.0-faster\" if language == \"ja\" else \"turbo\"\n",
"else:\n",
" selected_model = model_type\n",
"\n",
"if 'model' not in globals() or globals().get('current_model_type') != selected_model:\n",
" if HAS_FFMPEG:\n",
" print(f\"📦 Whisperモデル({selected_model})を読み込み中...\")\n",
" # Faster-Whisper特有の呼び出し方\n",
" model = WhisperModel(selected_model, device=device, compute_type=\"float16\" if device == \"cuda\" else \"int8\", download_root=\"./models\")\n",
" current_model_type = selected_model\n",
" clear_output()\n",
" print(f\"✅ モデル({selected_model})のロードが完了しました\")\n",
"else:\n",
" print(f\"⚡ モデル({selected_model})はロード済みです\")\n",
"\n",
"# メモリ掃除\n",
"gc.collect()\n",
"if torch.cuda.is_available():\n",
" torch.cuda.empty_cache()\n",
"\n",
"# 5. 入力ソースの選択\n",
"target_files = [] # 処理対象フルパスのリスト\n",
"valid_ext = ('.mp3', '.mp4', '.wav', '.m4a', '.webm', '.ogg', '.flac')\n",
"\n",
"if mode == \"GoogleDrive\":\n",
" if IS_COLAB:\n",
" # 1. ドライブをマウント\n",
" if not os.path.exists('/content/drive'):\n",
" print(\"Google Driveをマウントしています...\")\n",
" drive.mount('/content/drive')\n",
"\n",
" # 2. 検索パスの設定\n",
" target_path = f\"/content/drive/MyDrive/{drive_folder}\"\n",
" else:\n",
" # ローカルなら、PC内の特定のパスをDrive代わりにする\n",
" target_path = os.path.join(ROOT_PATH, drive_folder)\n",
"\n",
" if not os.path.exists(target_path): os.makedirs(target_path)\n",
"\n",
" all_drive_files = sorted([os.path.join(target_path, f) for f in os.listdir(target_path) if f.lower().endswith(valid_ext)], key=os.path.getmtime, reverse=True)\n",
"\n",
" if drive_batch_mode == \"未実行のみ一括処理 (Unprocessed Only)\":\n",
" existing_srts = [f for f in os.listdir(target_path) if f.endswith('.srt')]\n",
" for f_path in all_drive_files:\n",
" f_base = os.path.splitext(os.path.basename(f_path))[0][:20] # 前方一致判定用\n",
" if not any(f_base in s for s in existing_srts):\n",
" target_files.append(f_path)\n",
" print(f\"📂 未実行ファイル: {len(target_files)}件を検出しました\")\n",
" else:\n",
" if all_drive_files: target_files = [all_drive_files[0]]\n",
"\n",
"elif mode == \"YouTube\":\n",
" try:\n",
" import yt_dlp\n",
" except ImportError:\n",
" print(\"📦 YouTube処理用ライブラリをインストール中...\")\n",
" if IS_COLAB:\n",
" _shell.Shell().run_line_magic('pip', f'install -U yt-dlp -q --no-cache-dir')\n",
" else:\n",
" subprocess.check_call([sys.executable, \"-m\", \"pip\", \"install\", \"-U\", \"yt-dlp\"])\n",
" import yt_dlp\n",
"\n",
" if youtube_url == \"Youtube URL\" or not youtube_url.startswith(\"http\"):\n",
" print(\"⚠️ 有効なYouTube URLを入力してください\")\n",
" else:\n",
" ydl_opts = {\n",
" 'format': 'bestaudio/best',\n",
" 'outtmpl': 'youtube_audio.%(ext)s',\n",
" 'noplaylist': True,\n",
" 'quiet': True,\n",
" 'no_warnings': True,\n",
" }\n",
"\n",
" max_retries = 3 # 最大3回試行\n",
" for attempt in range(max_retries):\n",
" try:\n",
" print(f\"📺 YouTubeから音声を抽出中... (試行 {attempt + 1}/{max_retries})\")\n",
" with yt_dlp.YoutubeDL(ydl_opts) as ydl:\n",
" info = ydl.extract_info(youtube_url, download=True)\n",
" target_files = [ydl.prepare_filename(info)]\n",
"\n",
" # .webm などに変わっている可能性を考慮\n",
" if not os.path.exists(target_files[0]):\n",
" target_files = [info.get('requested_downloads', [{}])[0].get('filepath', target_files[0])]\n",
" break # 成功したらループを抜ける\n",
" except Exception as e:\n",
" if attempt < max_retries - 1:\n",
" print(f\"⚠️ 失敗しました。5秒後に再試行します... ({e})\")\n",
" time.sleep(5) # 少し間を置くのがコツ\n",
" else:\n",
" print(f\"❌ {max_retries}回試行しましたが失敗しました\")\n",
" target_files = [] # Ensure file_name is None if download fails\n",
"\n",
"else: # Upload / Local\n",
" # フォルダ内の対象ファイルをリストアップ(更新日時が新しい順)\n",
" local_files = sorted(\n",
" [f for f in os.listdir('.') if f.endswith(valid_ext)],\n",
" key=os.path.getmtime, reverse=True\n",
" )\n",
"\n",
" if local_files and execute_file_exists:\n",
" # ファイルが存在し、かつ execute_file_exists が True の場合のみ既存ファイルを使用\n",
" target_files = [local_files[0]]\n",
" else:\n",
" if IS_COLAB:\n",
" print(\"📂 文字起こしするファイルをアップロードしてください\")\n",
" uploaded = files.upload()\n",
" if uploaded:\n",
" target_files = [list(uploaded.keys())[0]]\n",
" else:\n",
" if local_files: target_files = [local_files[0]]\n",
"\n",
"if not HAS_FFMPEG:\n",
" target_files = []\n",
"\n",
"# --- Gemini 解析セクション ---\n",
"def get_ai_initial_prompt(file_path):\n",
" try:\n",
" if IS_COLAB:\n",
" api_key = userdata.get('GEMINI_API_KEY')\n",
" else:\n",
" api_key = os.getenv('GEMINI_API_KEY')\n",
" if not api_key: return \"\"\n",
"\n",
" # 新SDKの初期化\n",
" client = genai.Client(api_key=api_key)\n",
" print(f\"💫 Gemini 3 Flash が音声を分析中...\")\n",
"\n",
" # 1. アップロード\n",
" p = pathlib.Path(file_path)\n",
" mime_map = {\".mp3\": \"audio/mpeg\", \".wav\": \"audio/wav\", \".m4a\": \"audio/mp4\", \".ogg\": \"audio/ogg\", \".flac\": \"audio/flac\"}\n",
" # 拡張子からMIME取得(なければデフォルトで mpeg)\n",
" current_mime = mime_map.get(p.suffix.lower(), \"audio/mpeg\")\n",
"\n",
" with p.open('rb') as f:\n",
" audio_file = client.files.upload(\n",
" file=f,\n",
" config={\n",
" 'display_name': 'temp_audio',\n",
" 'mime_type': current_mime\n",
" }\n",
" )\n",
"\n",
" # 2. 処理待ち\n",
" while audio_file.state.name == \"PROCESSING\":\n",
" time.sleep(5)\n",
" audio_file = client.files.get(name=audio_file.name)\n",
"\n",
" # 3. 解析\n",
" instruction = \"\"\"\n",
" Analyze this audio and create an 'initial_prompt' to improve transcription accuracy.\n",
"\n",
" 【Instructions】\n",
" 1. Extract proper nouns (names, companies, products), technical terms, and speaker-specific speech patterns.\n",
" 2. Keep the original spelling and language as heard in the audio.\n",
" (e.g., Do not forcibly translate native proper nouns into English if they are in another language.)\n",
" 3. Output the result as a comma-separated list within 244 tokens.\n",
"\n",
" Return ONLY the comma-separated keywords for the prompt. No intro or outro.\n",
" \"\"\"\n",
"\n",
" response = client.models.generate_content(\n",
" model=\"gemini-3-flash-preview\",\n",
" contents=[instruction, audio_file]\n",
" )\n",
"\n",
" # Gemini側のファイルを削除して掃除\n",
" client.files.delete(name=audio_file.name)\n",
"\n",
" return response.text.strip()\n",
" except Exception as e:\n",
" print(f\"⚠️ Gemini解析スキップ: {e}\")\n",
" return \"\"\n",
"\n",
"# 文字起こしメインループ\n",
"if target_files:\n",
"\n",
" for idx, file_path in enumerate(target_files):\n",
" current_fname = os.path.basename(file_path)\n",
" print(f\"\\n🚀 [{idx+1}/{len(target_files)}] 実行中 (Fasterモード): {current_fname}\")\n",
"\n",
" # ユーザーが指定していない場合のみ、Geminiに助けてもらう\n",
" current_prompt = initial_prompt # グローバルの設定をコピー\n",
" if not current_prompt:\n",
" generated_prompt = get_ai_initial_prompt(file_path)\n",
" if generated_prompt:\n",
" current_prompt = generated_prompt\n",
" # --- 表示用:50文字ごとに改行を入れてプリント ---\n",
" display_text = \"\\n \".join([current_prompt[i:i+50] for i in range(0, len(current_prompt), 50)])\n",
" print(f\"✨ 生成されたプロンプト:\\n {display_text}\")\n",
"\n",
" # 文字起こし実行\n",
" segments, info = model.transcribe(\n",
" file_path,\n",
" initial_prompt=current_prompt,\n",
" language=None,\n",
" condition_on_previous_text=condition_on_previous_text,\n",
" beam_size=5,\n",
" chunk_length=30, # 30秒ずつ区切って処理\n",
" # --- ここからが精度のための追加設定 ---\n",
" vad_filter=True, # 余計なノイズや無音をカット\n",
" vad_parameters=dict(min_silence_duration_ms=500),\n",
" no_speech_threshold=0.6, # 喋っていない場所を無理に訳さない\n",
" compression_ratio_threshold=2.4, # 変なループ(同じ言葉の繰り返し)を防ぐ\n",
" log_prob_threshold=-1.0, # 自信がない時に適当なことを言わせない\n",
" max_new_tokens=128, # 1つの字幕の最大文字数を制限\n",
" repetition_penalty=1.2, # ループ(繰り返し)をより厳しく抑制\n",
" )\n",
"\n",
" results = []\n",
" full_text = \"\"\n",
" last_t = 0\n",
"\n",
" # 進捗バーをこのファイル用に作成\n",
" from tqdm.notebook import tqdm\n",
" pbar = tqdm(total=info.duration, unit=\"sec\", desc=f\"Progress: {current_fname[:20]}\")\n",
"\n",
" for segment in segments:\n",
" results.append(segment)\n",
" full_text += segment.text\n",
" # 進捗バーを更新\n",
" pbar.update(segment.end - last_t)\n",
" last_t = segment.end\n",
"\n",
" pbar.n = pbar.total\n",
" pbar.refresh()\n",
" pbar.close()\n",
"\n",
" # 出力ファイル名の生成 (元の名 + 日時 + 各拡張子)\n",
" srt_name = get_safe_filename(file_path, \".srt\")\n",
" log_name = get_safe_filename(file_path, \".log\") # 議事録用\n",
" txt_name = get_safe_filename(file_path, \".txt\") # プレーンテキスト用\n",
"\n",
" # SRTとLOGの作成\n",
" srt_content = \"\"\n",
" log_content = \"\"\n",
" for i, seg in enumerate(results):\n",
" def f_ts(s):\n",
" td = time.gmtime(s)\n",
" ms = int((s - int(s)) * 1000)\n",
" return f\"{time.strftime('%H:%M:%S', td)},{ms:03d}\"\n",
" srt_content += f\"{i+1}\\n{f_ts(seg.start)} --> {f_ts(seg.end)}\\n{seg.text.strip()}\\n\\n\"\n",
" log_content += f\"[{f_ts(seg.start)[:8]}] {seg.text.strip()}\\n\"\n",
"\n",
" # --- 保存とダウンロード ---\n",
"\n",
" # 1. 保存先ディレクトリの決定\n",
" # GoogleDriveモードなら指定フォルダへ、それ以外ならルートパスへ\n",
" save_dir = target_path if mode == \"GoogleDrive\" else ROOT_PATH\n",
"\n",
" # 2. ファイル書き出し(全環境共通)\n",
" # 字幕(SRT)は常に保存\n",
" with open(os.path.join(save_dir, srt_name), \"w\", encoding=\"utf-8\") as f:\n",
" f.write(srt_content)\n",
"\n",
" # 議事録(LOG)\n",
" if records_text_download:\n",
" with open(os.path.join(save_dir, log_name), \"w\", encoding=\"utf-8\") as f:\n",
" f.write(log_content)\n",
"\n",
" # プレーン(TXT)\n",
" if plain_text_download:\n",
" with open(os.path.join(save_dir, txt_name), \"w\", encoding=\"utf-8\") as f:\n",
" f.write(full_text)\n",
"\n",
" # 3. 完了メッセージの表示\n",
" if mode == \"GoogleDrive\":\n",
" print(f\"✅ Driveに保存完了: {srt_name}\")\n",
" else:\n",
" if not IS_COLAB:\n",
" # ローカル環境の場合\n",
" print(f\"✅ カレントディレクトリに保存完了: {srt_name}\")\n",
"\n",
" # 4. ブラウザダウンロード処理(ColabかつGoogleDriveモード以外のみ実行)\n",
" if IS_COLAB and mode != \"GoogleDrive\":\n",
" # 常にSRTをダウンロード\n",
" files.download(srt_name)\n",
"\n",
" # オプションに応じてLOGとTXTも確実にダウンロード\n",
" if records_text_download:\n",
" files.download(log_name)\n",
" if plain_text_download:\n",
" files.download(txt_name)\n",
"\n",
" print(f\"✅ ダウンロード完了: {srt_name}\")\n",
"\n",
" # メモリ掃除\n",
" del results, srt_content, log_content, full_text\n",
" gc.collect()\n",
" if torch.cuda.is_available():\n",
" torch.cuda.empty_cache()\n",
"\n",
" # 結果表示\n",
" # clear_output()\n",
"\n",
" # --- 🐤 Chirp Sound (完了通知音) ---\n",
" sample_rate = 44100 # Hz\n",
"\n",
" # 音を構成する要素の数\n",
" num_sparkles = 3\n",
" # 各きらめき音の長さ\n",
" sparkle_duration = 0.08 # 秒\n",
" # 周波数の全体的な開始範囲と終了範囲\n",
" base_start_frequency = 2000 # Hz\n",
" base_end_frequency = 4000 # Hz (全体的に上昇するような効果)\n",
" # 急速な減衰率\n",
" decay_rate = 20\n",
"\n",
" all_data = []\n",
" for i in range(num_sparkles):\n",
" # 各きらめき音のタイムベクトル\n",
" t_sparkle = np.linspace(0, sparkle_duration, int(sparkle_duration * sample_rate), endpoint=False)\n",
"\n",
" # 各きらめき音の開始周波数と終了周波数を計算\n",
" # これにより、全体のきらめき音が上昇するシーケンスになります\n",
" f_start_current_sparkle = base_start_frequency + (base_end_frequency - base_start_frequency) * (i / num_sparkles)\n",
" f_end_current_sparkle = base_start_frequency + (base_end_frequency - base_start_frequency) * ((i + 1) / num_sparkles)\n",
"\n",
" # 各きらめき音内でわずかな上昇スイープを導入し、きらめき効果を強調\n",
" # 実際の周波数は f_start_current_sparkle から f_end_current_sparkle まで変化します\n",
" instantaneous_frequency = np.linspace(f_start_current_sparkle, f_end_current_sparkle, len(t_sparkle))\n",
"\n",
" # 波形を生成\n",
" sparkle_wave = np.sin(2 * np.pi * instantaneous_frequency * t_sparkle)\n",
"\n",
" # 指数関数的減衰を適用\n",
" envelope_sparkle = np.exp(-decay_rate * t_sparkle)\n",
" sparkle_data = sparkle_wave * envelope_sparkle\n",
"\n",
" all_data.append(sparkle_data)\n",
"\n",
" # 各きらめき音の間に非常に短い無音を挿入(最後の音以外)\n",
" # これにより、それぞれの音がはっきりと聞こえ、混ざり合うのを防ぎます\n",
" if i < num_sparkles - 1:\n",
" silence_duration = 0.02 # 秒\n",
" silence = np.zeros(int(silence_duration * sample_rate))\n",
" all_data.append(silence)\n",
"\n",
" data = np.concatenate(all_data)\n",
"\n",
" # ボリュームを正規化(クリッピング防止と音量調整)\n",
" # データが空の場合はエラーにならないように処理\n",
" data = data / np.max(np.abs(data)) * 0.8 if len(data) > 0 else np.array([0.0])\n",
"\n",
" display(Audio(data, rate=sample_rate, autoplay=True))\n",
"\n",
" print(\"\\n🎉 すべての処理が完了しました!\")\n",
"\n",
" print(\"\\n\" + \"!\"*40)\n",
" print(\"⚠️ ATTENTION: PLEASE DISCONNECT RUNTIME\")\n",
" print(\"⚠️ 接続解除を忘れないでください!残り時間が削られます。\")\n",
" print(\"!\"*40)\n",
"\n",
" # YouTube用の一時ファイル削除\n",
" if mode == \"YouTube\" and target_files and os.path.exists(target_files[0]):\n",
" os.remove(target_files[0])\n",
"else:\n",
" print(\"⚠️ 処理対象のファイルがありませんでした。\")"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"authorship_tag": "ABX9TyPqNtfKuat0scv8xPpZ4jd7",
"gpuType": "T4",
"include_colab_link": true,
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}