---
title: Clone any voice with a 15-second sample
date: "2024-10-24T01:36:21Z"
lastmod: "2024-10-24T01:36:58Z"
categories:
  - coding
  - llms
wp_id: 3667
description: "F5-TTS makes voice cloning practical from a 15-second sample, opening up lightweight workflows for audiobooks, IVR, and easily editable narrated presentations."
keywords: [voice cloning, F5-TTS, text to speech, audio generation, Colab, presentations]
---
It's surprisingly easy to clone a voice using F5-TTS: "A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching".
Here's a clip of me saying:
I think Taylor Swift is the best singer. I've attended every one of her concerts and in fact, I've even proposed to her once. Don't tell anyone.
(Which is ironic since I didn't know who she was until this year and I still haven't seen or heard her.)
You'll notice that my voice sounds a bit monotonous. That's because I cloned it from a segment of my talk where I was speaking in a monotone.
Here's the code. You can run this on Google Colab for free.
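This is not the author's notebook, just a minimal sketch of one way to drive F5-TTS from Python. It assumes the `f5-tts_infer-cli` command and its `--ref_audio`, `--ref_text` and `--gen_text` flags as described in the F5-TTS README around late 2024; flag names may differ in your version, so check `f5-tts_infer-cli --help`. The helper names `build_f5_command` and `clone_voice` are hypothetical.

```python
import shlex
import subprocess


def build_f5_command(ref_audio: str, ref_text: str, gen_text: str) -> list[str]:
    """Assemble an f5-tts_infer-cli call.

    The command and flag names are assumptions based on the F5-TTS README;
    verify them with `f5-tts_infer-cli --help` before relying on this.
    """
    return [
        "f5-tts_infer-cli",
        "--ref_audio", ref_audio,  # short (~15 s) clip of the voice to clone
        "--ref_text", ref_text,    # exact transcript of that clip
        "--gen_text", gen_text,    # new text for the cloned voice to speak
    ]


def clone_voice(ref_audio: str, ref_text: str, gen_text: str, dry_run: bool = True) -> str:
    """Return the command as a shell string; execute it only when dry_run is False."""
    cmd = build_f5_command(ref_audio, ref_text, gen_text)
    if not dry_run:
        # Requires F5-TTS to be installed; a GPU runtime (e.g. on Colab) helps a lot.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)
```

With `dry_run=True` you can inspect the exact command before paying for GPU time. At the time of writing, the README also noted that leaving `--ref_text` empty makes F5-TTS transcribe the reference clip itself with an ASR model, at the cost of extra GPU memory.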
A few things to keep in mind when preparing the audio.
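Since the clone only needs about 15 seconds of audio, one likely preparation step is cutting a longer recording down to a clip of that length. Here is a minimal sketch using only Python's standard `wave` module, assuming an uncompressed PCM WAV input; `trim_wav_bytes` and `trim_wav_file` are hypothetical helpers, not part of F5-TTS.

```python
import io
import wave


def trim_wav_bytes(data: bytes, seconds: float = 15.0) -> bytes:
    """Return a WAV containing at most the first `seconds` of `data`."""
    with wave.open(io.BytesIO(data), "rb") as src:
        params = src.getparams()
        # Never ask for more frames than the source actually has.
        keep = min(src.getnframes(), int(seconds * src.getframerate()))
        frames = src.readframes(keep)
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        # Copy the format; the frame count in the header is patched on write.
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(params.framerate)
        dst.writeframes(frames)
    return out.getvalue()


def trim_wav_file(src_path: str, dst_path: str, seconds: float = 15.0) -> None:
    """File-based wrapper, e.g. trim_wav_file("talk.wav", "ref.wav")."""
    with open(src_path, "rb") as f:
        trimmed = trim_wav_bytes(f.read(), seconds)
    with open(dst_path, "wb") as f:
        f.write(trimmed)
```

Picking *which* 15 seconds matters more than the trimming itself: the clone inherits the tone of the clip, so a lively segment gives a livelier voice.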
input.txt manually to get it right, though Whisper is fine if you're cloning in bulk. (But then, who are you and what are you doing?)

This has a number of uses I can think of (er... ChatGPT can think of), but the ones I find most interesting are: