SPEECH_TO_TEXT_TRANSLATOR

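The Python program featured in this tutorial attempts to transcribe the spoken words in an MP3 audio file into text, which it saves to a plain-text output file. The program uses Python modules (SpeechRecognition and pydub) which are downloaded and installed on the local machine that runs the program. (Note that the transcription itself is performed by Google's Web Speech API, which the SpeechRecognition module's recognize_google function contacts over the internet, so the local machine needs an internet connection.)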

The example MP3 file which was used to generate the output file featured on this web page was derived from the MP4 video file featured on this web page, in which karbytes speaks the words that the Python program transcribed verbatim into that output file. The MP4 file was converted into the MP3 file using the Python program featured on the tutorial web page at the following Uniform Resource Locator:

https://karbytesforlifeblog.wordpress.com/mp4_to_mp3_converter/
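The converter program on the linked page is not reproduced here. As an alternative illustration only, the following minimal sketch assumes that FFmpeg is installed (see STEP_1) and builds and runs an FFmpeg command that extracts the audio track of an MP4 file into an MP3 file. The function names build_ffmpeg_command and convert_mp4_to_mp3 are hypothetical and are not taken from the linked tutorial.

```python
import shutil
import subprocess


def build_ffmpeg_command(mp4_path: str, mp3_path: str) -> list[str]:
    """Return an FFmpeg argument list that extracts the audio of an MP4 file as MP3."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", mp4_path,            # input video file
        "-vn",                     # drop the video stream
        "-codec:a", "libmp3lame",  # encode the audio stream as MP3
        mp3_path,
    ]


def convert_mp4_to_mp3(mp4_path: str, mp3_path: str) -> None:
    """Run FFmpeg to convert mp4_path into mp3_path (raises if FFmpeg is missing or fails)."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg was not found on PATH")
    # check=True raises subprocess.CalledProcessError if FFmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(mp4_path, mp3_path), check=True)
```

For example, convert_mp4_to_mp3("input_audio_source_karbytes_01november2024.mp4", "input_audio.mp3") would produce an MP3 file with the name that speech_to_text.py expects. Calling the ffmpeg executable directly keeps the sketch free of third-party Python packages.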

To view hidden text inside each of the preformatted text boxes below, scroll horizontally.

SOFTWARE_APPLICATION_COMPONENTS

python_source_file: https://raw.githubusercontent.com/karlinarayberinger/KARLINA_OBJECT_extension_pack_24/main/speech_to_text.py

input_audio_source_video_file: https://raw.githubusercontent.com/karlinarayberinger/KARLINA_OBJECT_extension_pack_24/main/input_audio_source_karbytes_01november2024.mp4

output_plain_text_file: https://raw.githubusercontent.com/karlinarayberinger/KARLINA_OBJECT_extension_pack_24/main/output_text.txt

PROGRAM_INTERPRETATION_AND_EXECUTION

STEP_0: Copy and paste the Python source code into a new text editor document and save that document as the following file name:

speech_to_text.py

STEP_1: Install pip (which, for karbytes, required using a virtual environment in which to install and run pip-installed Python modules) using the following Unix commands.

(The purpose of the virtual environment is to isolate the Python packages installed for a particular project from the system-wide Python installation, which prevents version conflicts between packages required by different programs. In this case, the virtual environment enables the user to manage pip-installed packages independently of the system Python environment.)

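(Note also that creating the virtual environment automatically installs pip inside it. On Debian-based systems such as Ubuntu, the python3-venv package may need to be installed first using sudo apt install python3-venv.)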

1.0: Create the virtual environment:

python3 -m venv myenv

1.1: Activate the virtual environment:

source myenv/bin/activate

1.2: Install SpeechRecognition and pydub (both of which are imported by the program) within the virtual environment:

python3 -m pip install SpeechRecognition pydub

1.3: Install FFmpeg (which pydub uses to decode MP3 files) using the following commands (in top-down order):

sudo apt update

sudo apt install ffmpeg

1.4: Use the virtual environment:

While the virtual environment is active, any python3 commands will use the packages installed in myenv. To deactivate the virtual environment when finished, use the following command:

deactivate
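The setup in STEP_1 can be double-checked from Python itself before running the program. The following minimal sketch (its function names are hypothetical and not part of speech_to_text.py) reports whether a virtual environment is active, whether the modules imported by the program can be found, and whether an ffmpeg executable is on PATH.

```python
import importlib.util
import shutil
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a venv (such as myenv)."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ffmpeg_available(program: str = "ffmpeg") -> bool:
    """Return True if the given executable is found on PATH."""
    return shutil.which(program) is not None


def preflight_report() -> dict[str, object]:
    """Collect the checks relevant to speech_to_text.py into one dictionary."""
    return {
        "virtual_environment_active": in_virtual_environment(),
        # speech_to_text.py imports speech_recognition (installed by the
        # SpeechRecognition package) and pydub.
        "missing_modules": missing_modules(["speech_recognition", "pydub"]),
        "ffmpeg_on_path": ffmpeg_available(),
    }
```

Running print(preflight_report()) with the myenv interpreter after completing STEP_1 should show virtual_environment_active as True, an empty missing_modules list, and ffmpeg_on_path as True.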

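STEP_2: While the virtual environment is active, set the current directory to the directory where the Python program file is located on the local machine (e.g. Desktop). Also place the MP3 file to transcribe in that directory and name it input_audio.mp3, because the program reads that hardcoded file name relative to the current directory.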

cd Desktop
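Because speech_to_text.py uses relative file names, those names are resolved against the current working directory (the directory selected with cd), not against the location of the .py file. The following minimal sketch, in which describe_paths is a hypothetical helper, shows where the program would look for its input and write its output.

```python
from pathlib import Path

# File names hardcoded in speech_to_text.py. They are relative paths, so the
# operating system resolves them against the current working directory.
INPUT_NAME = "input_audio.mp3"
OUTPUT_NAME = "output_text.txt"


def describe_paths(directory: Path) -> dict[str, object]:
    """Report where the program would read its input and write its output."""
    input_path = directory / INPUT_NAME
    return {
        "input_path": input_path,
        "input_exists": input_path.is_file(),
        "output_path": directory / OUTPUT_NAME,
    }
```

For example, running print(describe_paths(Path.cwd())) from inside the Desktop directory would report whether input_audio.mp3 is present there before speech_to_text.py is started.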

STEP_3: Run the program by entering the following command:

python3 speech_to_text.py
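speech_to_text.py hardcodes its input and output file names. As a hypothetical variation that is not part of the published program, the file names could instead be accepted on the command line with the standard argparse module, with the hardcoded names kept as defaults:

```python
import argparse


def parse_arguments(argv=None) -> argparse.Namespace:
    """Parse optional input/output paths, defaulting to the program's hardcoded names."""
    parser = argparse.ArgumentParser(description="Transcribe an MP3 file to text.")
    # nargs="?" makes the positional argument optional, so the default applies
    # when no input path is given.
    parser.add_argument("input", nargs="?", default="input_audio.mp3",
                        help="path of the MP3 file to transcribe")
    parser.add_argument("-o", "--output", default="output_text.txt",
                        help="path of the plain-text file to write")
    return parser.parse_args(argv)
```

With such a change, mp3_to_text would need to take the two paths as parameters. The program could then be run as python3 speech_to_text.py talk.mp3 -o talk.txt, where talk.mp3 and talk.txt are example names, and running it with no arguments would behave as before.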

STEP_4: If the python3 command does not work (e.g. because the Python interpreter is not installed), use the following commands (in top-down order) to install the Python interpreter:

sudo apt update

sudo apt install python3

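STEP_5: Observe the program results in the current working directory (i.e. the directory set in STEP_2), which is where the program reads input_audio.mp3 from and writes its files to. If the audio file was not too long in duration and contained machine-comprehensible speech, the program should have created a plain-text file named output_text.txt containing an approximate transcription of the audio file. The program also creates an intermediary WAV file named temp_audio.wav, which can be kept or deleted manually.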
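One way to work around the length limitation is to transcribe a long recording in pieces. The following minimal sketch uses only the standard wave module to split a WAV file (such as the temp_audio.wav file the program creates) into consecutive chunks of fixed length. The function split_wav and the chunk file names are hypothetical, and fixed boundaries may cut a word in half at the edge of a chunk.

```python
import wave
from pathlib import Path


def split_wav(wav_path: str, chunk_seconds: float, out_dir: str) -> list[Path]:
    """Split a WAV file into consecutive chunks of at most chunk_seconds each.

    Returns the paths of the chunk files in playback order.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunk_paths: list[Path] = []
    with wave.open(wav_path, "rb") as source:
        params = source.getparams()
        frames_per_chunk = max(1, int(params.framerate * chunk_seconds))
        index = 0
        while True:
            frames = source.readframes(frames_per_chunk)
            if not frames:  # empty bytes means the end of the audio data
                break
            chunk_path = out / f"chunk_{index:03d}.wav"
            with wave.open(str(chunk_path), "wb") as target:
                # Same channel count, sample width, and sample rate as the source;
                # the frame count in the header is corrected when the file closes.
                target.setparams(params)
                target.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```

Each chunk path could then be opened with sr.AudioFile(str(path)), read with recognizer.record, and passed to recognizer.recognize_google, with the partial transcripts joined by spaces. (SpeechRecognition's recognizer.record also accepts offset and duration arguments, which read part of a single file without splitting it.) The chunk length that works reliably with the Google Web Speech API is not stated on this page, so any value chosen, such as 30 seconds, is an assumption to test.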

PROGRAM_SOURCE_CODE

Note that the text inside each of the preformatted text boxes below appears on this web page (when rendered correctly by the web browser) to be identical to the content of that text box's respective plain-text file or source code file. The Uniform Resource Locator of that file is displayed as the green hyperlink immediately above the text box if it points to a source code file, or as the orange hyperlink immediately above the text box if it points to a plain-text file.

(Note that angle brackets which resemble HTML tags (i.e. an “is less than” symbol (‘<’) followed by an “is greater than” symbol (‘>’)) displayed on this web page have been replaced, at the source code level of this web page, with the Unicode symbols U+003C (which the web browser renders as ‘<’) and U+003E (which the web browser renders as ‘>’). That is because the WordPress web page editor or web browser interprets a plain-text “is less than” symbol followed by an “is greater than” symbol as an opening HTML tag, which means that it deletes or fails to display those plain-text inequality symbols and the content between them.)
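For illustration, Python's standard html module can perform this kind of escaping. html.escape substitutes the named character references &lt; and &gt; (and &amp; for ampersands), which is one common way to keep an editor from treating angle brackets as tags. Whether this web page used exactly that substitution is not stated here, and escape_for_wordpress is a hypothetical helper name.

```python
import html


def escape_for_wordpress(source_code: str) -> str:
    """Escape &, <, and > (but not quotes) so a browser displays them literally."""
    # quote=False leaves single and double quotes untouched, which keeps
    # string literals in the displayed source code readable.
    return html.escape(source_code, quote=False)
```

A browser renders the escaped text as the original characters, and html.unescape reverses the substitution.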

python_source_file: https://raw.githubusercontent.com/karlinarayberinger/KARLINA_OBJECT_extension_pack_24/main/speech_to_text.py

#########################################################################################
# file: speech_to_text.py
# type: Python
# date: 01_NOVEMBER_2024
# author: karbytes
# license: PUBLIC_DOMAIN 
#########################################################################################
import speech_recognition as sr
from pydub import AudioSegment

def mp3_to_text():
    # Hardcoded file paths
    mp3_file_path = "input_audio.mp3"
    output_file_path = "output_text.txt"
    
    # Convert MP3 to WAV
    audio = AudioSegment.from_mp3(mp3_file_path)
    wav_file_path = "temp_audio.wav" # user can manually delete or keep the intermediary WAV file
    audio.export(wav_file_path, format="wav")

    # Initialize recognizer
    recognizer = sr.Recognizer()

    # Load the WAV file and transcribe it
    with sr.AudioFile(wav_file_path) as source:
        audio_data = recognizer.record(source)
        try:
            text = recognizer.recognize_google(audio_data)
            print("Transcription successful!")

            # Write transcription to a text file
            with open(output_file_path, "w") as text_file:
                text_file.write(text)

            print(f"Transcription saved to {output_file_path}")

        except sr.UnknownValueError:
            print("Could not understand audio")
        except sr.RequestError as e:
            print(f"Could not request results from Speech Recognition service; {e}")

if __name__ == "__main__":
    mp3_to_text()

This web page was last updated on 07_NOVEMBER_2024. The content displayed on this web page is licensed as PUBLIC_DOMAIN intellectual property.