{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# PraatIO - doing speech analysis with Python\n", "\n", "*An introduction and tutorial*" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "
\n", "TABLE OF CONTENTS\n", "\n", "**An introduction**\n", "- What is Praat?\n", "- TextGrids, IntervalTiers, and PointTiers\n", "- The physical textgrid\n", "- What is PraatIO\n", "- What are some uses of PraatIO?\n", "\n", "**A tutorial**\n", "- Installing PraatIO\n", "- Creating bare textgrid files\n", " - Example: Creating blank textgrids from audio recordings\n", "- Opening TextGrids\n", "- Getting information from TextGrids and Tiers\n", "- Modifying TextGrids and Tiers with new()\n", "- Working with Textgrid objects\n", " - Example: Extracting time series data from specific intervals\n", "- Cropping TextGrids\n", "- Working with Tier objects\n", "- Operations between tiers\n", "\n", "Summary\n", "\n", "Beyond PraatIO\n", "- ProMo\n", "- Pysle" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "## An introduction\n", "\n", "\n", "### What is Praat?\n", "\n", "**PraatIO** or *Praat Input and Output* was originally conceived as a way to work with **Praat** from within python.\n", "\n", "**Praat** ([http://www.fon.hum.uva.nl/praat/](http://www.fon.hum.uva.nl/praat/)) is a freely available tool for doing speech and phonetic analysis. It has a spectrogram visualization tool with overlays of information like pitch track and intensity. This visualization is paired with an editor for transcribing and analyzing speech. It also has tools for extracting acoustic parameters of speech, generating waveforms, and resynthesizing speech. It is a comprehensive and indispensable tool for speech scientists. \n", "\n", "Praat comes with its own scripting language for automating tasks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### TextGrids, IntervalTiers, and PointTiers\n", "\n", "The heart of any speech analysis is a transcript. Praat calls its transcript files TextGrids, and the same terminology is used here.\n", "\n", "More specifically, a TextGrid, used by Praat or PraatIO, is a collection of independent annotation analyses for a given audio recording. Each layer of analysis is known as a tier. There are two kinds of tiers: IntervalTiers and PointTiers. **IntervalTiers** are used to annotate events that have duration, like syllables, words, or utterances. **PointTiers** are used to annotate instantaneous events, like places where the audio clipped, the peak of a pitch contour, or the sound of a clap.\n", "\n", "*Below is a sample textgrid as seen in Praat, with the accompanying wav file.* In this example the textgrid contains two interval tiers and a point tier (named 'phone', 'word', and 'maxF0' respectively). 'phone' marks the phonemes--the consonants and vowels of the words. 'word' indicates the word boundaries. 
And 'maxF0' marks the highest peaks of the pitch contour (the blue curve superimposed over the spectrogram) for each word.\n", "\n", "![praat_tiers.png](./resources/praat_tiers.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### The physical textgrid\n", "\n", "Praat has its own plain text format for TextGrids. It's easy to read and easy to parse. Here is a small snippet. No magic or wizardry. The TextGrid has a few properties defined--min and max times (the start and end of the textgrid with respect to the audio file) and the number of tiers. Then the tiers are presented in order of appearance. They too have a few properties defined, followed by the intervals or points that they contain. And that's it!\n", "\n", "There is a more condensed version of the TextGrid which contains the same information but without the extraneous text that makes the example below so easy to read. I won't cover that here.\n", "\n", "```\n", "File type = \"ooTextFile\"\n", "Object class = \"TextGrid\"\n", "\n", "xmin = 0 \n", "xmax = 1.869687 \n", "tiers? <exists> \n", "size = 3 \n", "item []: \n", "    item [1]:\n", "        class = \"IntervalTier\" \n", "        name = \"phone\" \n", "        xmin = 0 \n", "        xmax = 1.869687 \n", "        intervals: size = 16 \n", "        intervals [1]:\n", "            xmin = 0 \n", "            xmax = 0.3154201182247563 \n", "            text = \"\" \n", "        intervals [2]:\n", "            xmin = 0.3154201182247563 \n", "            xmax = 0.38526757369599995 \n", "            text = \"m\" \n", "        intervals [3]:\n", "            xmin = 0.38526757369599995 \n", "            xmax = 0.4906833231456586 \n", "            text = \"ə\" \n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### What is PraatIO\n", "\n", "I think python should have its own library for doing robust speech analysis. That's where PraatIO comes in. \n", "\n", "PraatIO is *not* a python implementation of Praat or an interface to Praat. PraatIO *is* a pure python library that contains a robust toolset for creating, querying, and modifying textgrid annotations. 
It also comes with a diverse array of tools that use these annotations to modify speech or extract information from speech. It depends on Praat for some but not all of its functionality." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### What are some uses of PraatIO?\n", "\n", "* Creating textgrids for annotation in a consistent manner\n", "\n", "* Extracting pitch, intensity, and duration from labeled regions of interest\n", "\n", "* Extracting user-made annotations or verifying user-made annotations\n", "\n", "* Extracting subsegments of recordings, substituting segments with other segments, making supercuts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "## A tutorial\n", "\n", "\n", "### Installing PraatIO\n", "\n", "Before we can run PraatIO, we need to install it. It can be installed easily enough using pip like so. For other installation options, see the main github page for praatio." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: praatio in /Users/tmahrt/.pyenv/versions/3.10.0/lib/python3.10/site-packages (6.0.0)\n", "Requirement already satisfied: typing-extensions in /Users/tmahrt/.pyenv/versions/3.10.0/lib/python3.10/site-packages (from praatio) (4.0.1)\n" ] } ], "source": [ "%pip install praatio --upgrade" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "### Creating bare textgrid files\n", "\n", "The code for working with textgrids is contained in textgrid.py. This file does not store the classes we will be working with, but it does expose them for convenient access. There are three classes of particular interest and some functions. Let's start with the basics.\n", "\n", "`Textgrid`, `IntervalTier`, and `PointTier` are the three classes from textgrid.py that we'll be using a lot.\n", "\n", "For our first, most basic example, we create a TextGrid with a blank IntervalTier and a blank PointTier--ripe for annotating. A minimal Textgrid needs at least one tier but can have as many as you want." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from praatio import textgrid\n", "\n", "# A Textgrid needs no arguments--it gets all of its necessary attributes from the tiers that it contains.\n", "tg = textgrid.Textgrid()\n", "\n", "# IntervalTiers and PointTiers take four arguments: the tier name, a list of intervals or points,\n", "# a starting time, and an ending time.\n", "wordTier = textgrid.IntervalTier('words', [], 0, 1.0)\n", "maxF0Tier = textgrid.PointTier('maxF0', [], 0, 1.0)\n", "\n", "tg.addTier(wordTier)\n", "tg.addTier(maxF0Tier)\n", "\n", "tg.save(\"empty_textgrid.TextGrid\", format=\"short_textgrid\", includeBlankSpaces=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "#### Example: Creating blank textgrids from audio recordings\n", "\n", "\n", "The above example gets the job done, but it's frankly not very useful. What about a basic example that does something we would actually need?\n", "\n", "**One problem** with the above example is the ending time. It's hard-coded to 1.0. More often than not, we don't know the exact length of an audio file in advance. The start and end times are actually optional arguments to the constructor. If they are not supplied, praatio will take them from the min and max values in the list of intervals or points (not generally recommended). 
Alternatively, you can read the duration of the wav file and use it as the tier's ending time, as the example below does with audio.getDuration().\n", "\n", "**Scenario:** You have a large corpus of speech recordings--telephone conversations. You are coordinating a team of annotators who will transcribe the words in the corpus. Rather than have the annotators create textgrids from scratch, you use praatio to generate skeleton textgrids that they can fill in themselves." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "bobby.TextGrid\n", "mary_300hz_high_pass_filtered.TextGrid\n", "damon_set_test.TextGrid\n", "mary_segment.TextGrid\n", "mary.TextGrid\n" ] } ], "source": [ "import os\n", "from os.path import join\n", "\n", "from praatio import textgrid\n", "from praatio import audio\n", "\n", "inputPath = join('..', 'examples', 'files')\n", "outputPath = join(inputPath, \"generated_textgrids\")\n", "\n", "if not os.path.exists(outputPath):\n", "    os.mkdir(outputPath)\n", "\n", "for fn in os.listdir(inputPath):\n", "    name, ext = os.path.splitext(fn)\n", "    if ext != \".wav\":\n", "        continue\n", "\n", "    duration = audio.getDuration(join(inputPath, fn))\n", "    wordTier = textgrid.IntervalTier('words', [], 0, duration)\n", "\n", "    tg = textgrid.Textgrid()\n", "    tg.addTier(wordTier)\n", "    tg.save(join(outputPath, name + \".TextGrid\"), format=\"short_textgrid\", includeBlankSpaces=True)\n", "\n", "# Did it work?\n", "for fn in os.listdir(outputPath):\n", "    ext = os.path.splitext(fn)[1]\n", "    if ext != \".TextGrid\":\n", "        continue\n", "    print(fn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Bravo! You've saved your colleagues the tedium of creating empty textgrids for each wav file from scratch and removed one vector of human error from your workflow.\n", "\n", "There are more things we can do with bare textgrid files, but for now let's move on to how to work with existing textgrid files in praatio.\n", "\n", "
\n", "\n", "### Opening TextGrids\n", "\n", "We know how to save textgrids. Now let's learn how to open them." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "from os.path import join\n", "\n", "from praatio import textgrid\n", "\n", "inputFN = join('..', 'examples', 'files', 'mary.TextGrid')\n", "\n", "tg = textgrid.openTextgrid(inputFN, includeEmptyIntervals=False)  # Give it a file name, get back a Textgrid object" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "
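A note on `includeEmptyIntervals=False`: in the file snippet shown earlier, the first interval of the 'phone' tier (0 to 0.315 s) has an empty label, and with this setting intervals like that one are left out of the loaded tiers. The pure-Python sketch below is only a mental model of that filtering, not praatio's own code. The helper name `dropEmptyIntervals` is hypothetical, and treating whitespace-only labels as empty is an assumption.

```python
# Hypothetical helper: a sketch of what includeEmptyIntervals=False amounts to.
# Entries are (start, end, label) tuples, like praatio's Interval named tuples.
def dropEmptyIntervals(entries):
    # Assumption: a label counts as empty if it is blank after stripping whitespace
    return [entry for entry in entries if entry[2].strip() != '']


# The first three intervals of the 'phone' tier from the file snippet shown earlier
phoneEntries = [
    (0, 0.3154201182247563, ''),
    (0.3154201182247563, 0.38526757369599995, 'm'),
    (0.38526757369599995, 0.4906833231456586, 'ə'),
]

print(dropEmptyIntervals(phoneEntries))
```
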
\n", "\n", "### Getting information from TextGrids and Tiers\n", "\n", "So we've opened a TextGrid file. What happens next? A textgrid is just a container for tiers. So after opening a textgrid, probably the first thing you'll want to do is access the tiers.\n", "\n", "A TextGrid keeps its tiers in a dictionary keyed by tier name, and you retrieve a tier with **getTier()**. The names of the tiers, in their order in the TextGrid, are available in **tierNames**." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('phone', 'word', 'pitch')\n", "\n" ] } ], "source": [ "# What tiers are stored in this textgrid?\n", "print(tg.tierNames)\n", "\n", "# It's possible to access the tiers by their position in the TextGrid\n", "# (i.e. the order they were added in)\n", "firstTier = tg.getTier(tg.tierNames[0])\n", "\n", "# Or by their names\n", "wordTier = tg.getTier('word')\n", "\n", "print(firstTier)\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "OK, so from the TextGrid we got a Tier. What happens next? Most of the time, you'll be accessing the intervals or points stored in the tier. 
These are stored in the tier's **entries**.\n", "\n", "For a PointTier, `entries` looks like:\n", "\n", "    [(time, label), (time, label), ...]\n", "\n", "While for an IntervalTier, `entries` looks like:\n", "\n", "    [(start, end, label), (start, end, label), ...]" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('hello', 'hello')\n", "('55', '55')\n" ] } ], "source": [ "# A PointTier's entries are all Points and an IntervalTier's entries are all Intervals.\n", "# Both types of entries are named tuples, so they can be accessed like a tuple or like an object.\n", "from praatio.utilities.constants import Interval, Point\n", "\n", "interval = Interval(0, 1, \"hello\")\n", "print((interval.label, interval[2]))\n", "\n", "point = Point(0.5, \"55\")\n", "print((point.label, point[1]))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['mary', 'rolled', 'the', 'barrel']\n", "[0.36012987312514183, 0.3083570381281018, 0.07981859410500003, 0.4545282708797298]\n" ] } ], "source": [ "# I just want the labels from the entries\n", "labelList = [entry.label for entry in wordTier.entries]\n", "print(labelList)\n", "\n", "# Get the duration of each interval\n", "# (in this example, an interval is a word, so this outputs word duration)\n", "durationList = []\n", "for start, stop, _ in wordTier.entries:\n", "    durationList.append(stop - start)\n", "\n", "print(durationList)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "I use this idiom--open textgrid, get target tier, and for-loop through the entries--on a regular basis. 
For clarity, here is the whole idiom in one concise example:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "From:0.315420, To:0.675550, mary\n", "From:0.675550, To:0.983907, rolled\n", "From:0.983907, To:1.063726, the\n", "From:1.063726, To:1.518254, barrel\n" ] } ], "source": [ "# Print out each interval on a separate line\n", "from os.path import join\n", "from praatio import textgrid\n", "\n", "inputFN = join('..', 'examples', 'files', 'mary.TextGrid')\n", "tg = textgrid.openTextgrid(inputFN, includeEmptyIntervals=False)\n", "tier = tg.getTier('word')\n", "for start, stop, label in tier.entries:\n", "    print(\"From:%f, To:%f, %s\" % (start, stop, label))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "### Modifying TextGrids and Tiers with new()\n", "\n", "Textgrids and tiers come with a **new()** method. This method gives you a new copy of the current instance. new() takes the same arguments as the constructor for the object, *except that with new() the arguments are all optional*. Any arguments you don't provide will be copied over from the original instance." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Sometimes you just want to have two copies of something\n", "newTG = tg.new()\n", "newTier = tier.new()\n", "\n", "# emptiedTier and renamedTier are the same as tier, except for the parameter specified in .new()\n", "emptiedTier = tier.new(entries=[])  # Remove all entries in the entry list\n", "renamedTier = tier.new(name=\"lexical items\")  # Rename the tier to 'lexical items'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
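The copy-with-overrides behavior of new() is similar in spirit to `dataclasses.replace()` from the Python standard library. The sketch below is only an analogy built on a hypothetical `SimpleTier` class with rounded toy values, not praatio's implementation. It shows that fields you don't pass are copied over unchanged and that the original object is left as it was.

```python
from dataclasses import dataclass, replace
from typing import Tuple


# Hypothetical stand-in for a tier; this is NOT praatio's IntervalTier class
@dataclass(frozen=True)
class SimpleTier:
    name: str
    entries: Tuple[Tuple[float, float, str], ...]
    minT: float
    maxT: float

    def new(self, **changes):
        # Copy every field, overriding only the ones passed in
        return replace(self, **changes)


tier = SimpleTier('word', ((0.3154, 0.6755, 'mary'),), 0.0, 1.869687)
emptiedTier = tier.new(entries=())
renamedTier = tier.new(name='lexical items')

print(emptiedTier)
print(renamedTier)
print(tier)  # the original is unchanged
```
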
\n", "\n", "### Working with Textgrid objects\n", "\n", "TextGrids are containers that store tiers. They come with some methods to help manage the state of the Textgrid." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('phone', 'word', 'pitch')\n" ] } ], "source": [ "# Let's reload everything\n", "from os.path import join\n", "from praatio import textgrid\n", "\n", "inputFN = join('..', 'examples', 'files', 'mary.TextGrid')\n", "tg = textgrid.openTextgrid(inputFN, includeEmptyIntervals=False)\n", "\n", "# Ok, what were our tiers?\n", "print(tg.tierNames)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('phone', 'word', 'pitch', 'utterance')\n" ] } ], "source": [ "# We've already seen how to add a new tier to a TextGrid\n", "# Here we add a new tier, 'utterance', which has one entry that spans the length of the textgrid\n", "utteranceTier = textgrid.IntervalTier(name='utterance', entries=[(0, tg.maxTimestamp, 'mary rolled the barrel')],\n", "                                      minT=0, maxT=tg.maxTimestamp)\n", "tg.addTier(utteranceTier)\n", "print(tg.tierNames)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('phone', 'pitch', 'utterance')\n", "\n" ] } ], "source": [ "# Maybe we decided that we don't need the word tier. 
We can remove it using the tier's name.\n", "# removeTier() returns the removed tier, in case you want to do something with it later.\n", "wordTier = tg.removeTier('word')\n", "print(tg.tierNames)\n", "print(wordTier)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('word', 'pitch', 'utterance')\n" ] } ], "source": [ "# We can also replace one tier with another like so (preserving the order of the tiers)\n", "tg.replaceTier('phone', wordTier)\n", "print(tg.tierNames)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('lexical items', 'pitch', 'utterance')\n" ] } ], "source": [ "# Or rename a tier\n", "tg.renameTier('word', 'lexical items')\n", "print(tg.tierNames)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The functions featured above are perhaps the most useful ones in praatio. But there are some other functions which I'll mention briefly here.\n", "\n", "**tg.appendTextgrid(tg2)** will append tg2 to the end of tg--modifying all of the times in tg2 so that they appear chronologically after tg.\n", "\n", "**tg.eraseRegion(start, stop, doShrink)** will erase a segment of a textgrid. The erased segment can be left blank, or the textgrid can shrink to close the gap.\n", "\n", "**tg.insertSpace(start, duration, collisionCode)** will insert a blank segment into a textgrid. collisionCode determines what happens to segments that span the start location of the insertion.\n", "\n", "**tg.mergeTiers()** will merge all tiers into a single tier. Overlapping intervals on different tiers will have their labels combined by this process.\n", "\n", "
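To make the doShrink option of eraseRegion() more concrete, here is a simplified pure-Python sketch that works on a list of (start, end, label) tuples with toy values. `eraseRegionSketch` is hypothetical. It assumes that intervals overlapping the erased region are truncated at its edges (so in this sketch an interval spanning the whole region is split in two), and praatio's own handling of such cases may differ.

```python
# Simplified sketch of eraseRegion() on one list of (start, end, label) tuples.
# eraseRegionSketch is hypothetical; praatio's own edge-case handling may differ.
def eraseRegionSketch(entries, start, stop, doShrink):
    gap = stop - start
    result = []
    for entryStart, entryEnd, label in entries:
        # Keep the part of the interval that lies before the erased region
        if entryStart < start:
            result.append((entryStart, min(entryEnd, start), label))
        # Keep the part that lies after it, shifted left when shrinking
        if entryEnd > stop:
            newStart = max(entryStart, stop)
            if doShrink:
                result.append((newStart - gap, entryEnd - gap, label))
            else:
                result.append((newStart, entryEnd, label))
    return result


words = [(0.25, 0.75, 'mary'), (0.75, 1.0, 'rolled'), (1.0, 1.5, 'barrel')]
print(eraseRegionSketch(words, 0.5, 1.25, doShrink=False))  # the gap is left blank
print(eraseRegionSketch(words, 0.5, 1.25, doShrink=True))   # later intervals slide left
```
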
\n", "\n", "### Cropping TextGrids\n", "\n", "The last function to look at with TextGrids is also a very useful and powerful one. Using **crop()** you can get a...well...cropped TextGrid.\n", "\n", "crop() takes four arguments:\n", "`crop(startTime, endTime, mode, rebaseToZero)`\n", "\n", "**startTime**, **endTime** - these are the start and end times that define the crop region. Simple enough.\n", "\n", "**mode** - determines what happens to intervals that are only partially inside the crop region:\n", "\n", "- 'strict': only wholly contained intervals are included in the output; partially contained intervals are dropped.\n", "- 'lax': partially contained intervals are included whole, and the crop region is extended to fit them.\n", "- 'truncated': partially contained intervals are included, but cut off at the crop boundaries.\n", "\n", "**rebaseToZero** - if True, startTime is subtracted from all time values, so the cropped textgrid starts at 0.\n", "\n", "Let's see the effects of these different arguments:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(Interval(start=0.3154201182247563, end=0.6755499913498981, label='mary'), Interval(start=0.6755499913498981, end=0.9839070294779999, label='rolled'), Interval(start=0.9839070294779999, end=1.063725623583, label='the'), Interval(start=1.063725623583, end=1.5182538944627297, label='barrel'))\n", "(Interval(start=0.0, end=1.869687, label='mary rolled the barrel'),)\n", "Start time: 0.000000\n", "End time: 1.869687\n" ] } ], "source": [ "# Let's start by observing the pre-cropped entry lists\n", "wordTier = tg.getTier('lexical items')\n", "print(wordTier.entries)\n", "\n", "utteranceTier = tg.getTier('utterance')\n", "print(utteranceTier.entries)\n", "print(\"Start time: %f\" % tg.minTimestamp)\n", "print(\"End time: %f\" % tg.maxTimestamp)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { 
"name": "stdout", "output_type": "stream", "text": [ "(Interval(start=0.0, end=0.17554999134989813, label='mary'), Interval(start=0.17554999134989813, end=0.4839070294779999, label='rolled'), Interval(start=0.4839070294779999, end=0.5, label='the'))\n", "(Interval(start=0.0, end=0.5, label='mary rolled the barrel'),)\n", "Start time: 0.000000\n", "End time: 0.500000\n" ] } ], "source": [ "# Now let's crop and see what changes!\n", "# Crop takes four arguments\n", "# If mode is 'truncated', all intervals contained within the crop region will appear in the\n", "# returned TG--however, intervals that span the crop region will be truncated to fit within\n", "# the crop region\n", "# If rebaseToZero is True, the times in the textgrid are recalibrated with the start of\n", "# the crop region being 0.0s\n", "croppedTG = tg.crop(0.5, 1.0, mode='truncated', rebaseToZero=True)\n", "\n", "wordTier = croppedTG.getTier('lexical items')\n", "print(wordTier.entries)\n", "\n", "utteranceTier = croppedTG.getTier('utterance')\n", "print(utteranceTier.entries)\n", "print(\"Start time: %f\" % croppedTG.minTimestamp)\n", "print(\"End time: %f\" % croppedTG.maxTimestamp)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(Interval(start=0.5, end=0.6755499913498981, label='mary'), Interval(start=0.6755499913498981, end=0.9839070294779999, label='rolled'), Interval(start=0.9839070294779999, end=1.0, label='the'))\n", "(Interval(start=0.5, end=1.0, label='mary rolled the barrel'),)\n", "Start time: 0.500000\n", "End time: 1.000000\n" ] } ], "source": [ "# If rebaseToZero is False, the values in the cropped textgrid will be what they were in the\n", "# original textgrid (but without values outside the crop region)\n", "# Compare the output here with the output above\n", "croppedTG = tg.crop(0.5, 1.0, mode='truncated', rebaseToZero=False)\n", "\n", "wordTier = croppedTG.getTier('lexical items')\n", 
"print(wordTier.entries)\n", "\n", "utteranceTier = croppedTG.getTier('utterance')\n", "print(utteranceTier.entries)\n", "print(\"Start time: %f\" % croppedTG.minTimestamp)\n", "print(\"End time: %f\" % croppedTG.maxTimestamp)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(Interval(start=0.6755499913498981, end=0.9839070294779999, label='rolled'),)\n", "()\n", "Start time: 0.500000\n", "End time: 1.000000\n" ] } ], "source": [ "# If mode is 'strict', only wholly contained intervals will be included in the output.\n", "# Compare this with the previous result\n", "croppedTG = tg.crop(0.5, 1.0, mode='strict', rebaseToZero=False)\n", "\n", "# Observe the cropped entry lists\n", "wordTier = croppedTG.getTier('lexical items')\n", "print(wordTier.entries)\n", "\n", "utteranceTier = croppedTG.getTier('utterance')\n", "print(utteranceTier.entries)\n", "print(\"Start time: %f\" % croppedTG.minTimestamp)\n", "print(\"End time: %f\" % croppedTG.maxTimestamp)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(Interval(start=0.3154201182247563, end=0.6755499913498981, label='mary'), Interval(start=0.6755499913498981, end=0.9839070294779999, label='rolled'), Interval(start=0.9839070294779999, end=1.063725623583, label='the'))\n", "(Interval(start=0.0, end=1.869687, label='mary rolled the barrel'),)\n", "Start time: 0.000000\n", "End time: 1.869687\n" ] } ], "source": [ "# If mode is 'lax', partially contained intervals are included whole,\n", "# and the crop region expands to fit them. Compare this with the previous result\n", "croppedTG = tg.crop(0.5, 1.0, mode='lax', rebaseToZero=False)\n", "\n", "# Observe the cropped entry lists\n", "wordTier = croppedTG.getTier('lexical items')\n", "print(wordTier.entries)\n", "\n", "utteranceTier = croppedTG.getTier('utterance')\n", 
"print(utteranceTier.entries)\n", "print(\"Start time: %f\" % croppedTG.minTimestamp)\n", "print(\"End time: %f\" % croppedTG.maxTimestamp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
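The three crop modes can be summarized with a short pure-Python sketch that works on plain (start, end, label) tuples. `cropSketch` is a hypothetical helper, not praatio's implementation. It models a single tier only, whereas cropping a whole textgrid in 'lax' mode appears to expand the region to fit every tier (which would explain why the 'lax' example above reports an end time of 1.869687). Run on the word intervals printed before cropping, it groups the same words as the three crop calls above.

```python
# Pure-Python sketch of the three crop modes for one tier of (start, end, label) tuples.
# cropSketch is a hypothetical helper, not praatio's implementation.
def cropSketch(entries, cropStart, cropEnd, mode, rebaseToZero):
    # Only intervals that overlap the crop region at all are candidates
    overlapping = [e for e in entries if e[0] < cropEnd and e[1] > cropStart]

    if mode == 'strict':
        # Keep only intervals lying wholly inside the region
        kept = [e for e in overlapping if e[0] >= cropStart and e[1] <= cropEnd]
    elif mode == 'lax':
        # Keep partially contained intervals whole and grow the region to fit them
        kept = overlapping
        if kept:
            cropStart = min(cropStart, min(e[0] for e in kept))
            cropEnd = max(cropEnd, max(e[1] for e in kept))
    elif mode == 'truncated':
        # Clip partially contained intervals to the region's edges
        kept = [(max(s, cropStart), min(t, cropEnd), label) for s, t, label in overlapping]
    else:
        raise ValueError(f'unknown mode: {mode}')

    if rebaseToZero:
        # Shift everything so the (possibly grown) region starts at 0
        kept = [(s - cropStart, t - cropStart, label) for s, t, label in kept]
        cropEnd, cropStart = cropEnd - cropStart, 0.0
    return kept, cropStart, cropEnd


# The word intervals printed before cropping in the examples above
words = [
    (0.3154201182247563, 0.6755499913498981, 'mary'),
    (0.6755499913498981, 0.9839070294779999, 'rolled'),
    (0.9839070294779999, 1.063725623583, 'the'),
    (1.063725623583, 1.5182538944627297, 'barrel'),
]

for mode in ('strict', 'lax', 'truncated'):
    print(mode, cropSketch(words, 0.5, 1.0, mode, rebaseToZero=False))
```
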
\n", "\n", "### Working with Tiers\n", "\n", "Textgrids alone aren't very useful for working with data. The real utility is in the tiers contained in the textgrids. In this section we'll learn about functions that help us work with IntervalTiers and PointTiers.\n", "\n", "We'll start with the easy stuff. **eraseRegion()**, **insertSpace()**, and **crop()** are back, and together with **editTimestamps()** they work the same way on tiers as they do on textgrids. If you only need to modify one tier, it's better to call these methods on that tier rather than modifying a whole textgrid.\n", "\n", "Now let's move on to some of the functions that are unique to tiers. We'll start with **deleteEntry()** and **insertEntry()**.\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('phone', 'word', 'pitch')\n" ] } ], "source": [ "# Let's reload everything, just as before\n", "from os.path import join\n", "from praatio import textgrid\n", "\n", "inputFN = join('..', 'examples', 'files', 'mary.TextGrid')\n", "tg = textgrid.openTextgrid(inputFN, includeEmptyIntervals=False)\n", "\n", "# Ok, what are our tiers?\n", "print(tg.tierNames)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(Interval(start=0.3154201182247563, end=0.6755499913498981, label='mary'), Interval(start=0.6755499913498981, end=0.9839070294779999, label='rolled'), Interval(start=0.9839070294779999, end=1.063725623583, label='the'), Interval(start=1.063725623583, end=1.5182538944627297, label='barrel'))\n", "(Interval(start=0.3154201182247563, end=0.6755499913498981, label='bloop'), Interval(start=0.6755499913498981, end=0.9839070294779999, label='bloop'), Interval(start=0.9839070294779999, end=1.063725623583, label='bloop'), Interval(start=1.063725623583, end=1.5182538944627297, label='bloop'))\n" ] } ], "source": [ "# The `entries`, which holds the 
tier point or interval data, is the heart of the tier.\n", "# Recall the 'new()' function if you want to modify all of the entries in a tier at once\n", "wordTier = tg.getTier('word')\n", "newEntries = [(start, stop, 'bloop') for start, stop, label in wordTier.entries]\n", "newWordTier = wordTier.new(entries=newEntries)\n", "print(wordTier.entries)\n", "print(newWordTier.entries)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(Interval(start=0.6755499913498981, end=0.9839070294779999, label='rolled'), Interval(start=0.9839070294779999, end=1.063725623583, label='the'), Interval(start=1.063725623583, end=1.5182538944627297, label='barrel'))\n" ] } ], "source": [ "# If, however, we only want to modify a few entries, there are some functions for doing so\n", "\n", "# deleteEntry() takes an entry and deletes it\n", "maryEntry = wordTier.entries[0]\n", "wordTier.deleteEntry(maryEntry)\n", "print(wordTier.entries)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(Interval(start=0.3154201182247563, end=0.6755499913498981, label='mary'), Interval(start=0.6755499913498981, end=0.9839070294779999, label='rolled'), Interval(start=0.9839070294779999, end=1.063725623583, label='the'), Interval(start=1.063725623583, end=1.5182538944627297, label='barrel'))\n", "\n", "(Interval(start=0.3154201182247563, end=0.6755499913498981, label='bob'), Interval(start=0.6755499913498981, end=0.9839070294779999, label='rolled'), Interval(start=0.9839070294779999, end=1.063725623583, label='the'), Interval(start=1.063725623583, end=1.5182538944627297, label='barrel'))\n" ] } ], "source": [ "# insertEntry() does the opposite of deleteEntry().\n", "wordTier.insertEntry(maryEntry)\n", "print(wordTier.entries)\n", "print()\n", "\n", "# You can also set collisionMode to 'merge' or 'replace' to set the behavior in the event an entry already exists\n", "# collisionReportingMode controls whether you are warned when a collision occurs ('silence' turns the warning off)\n", "wordTier.insertEntry((maryEntry[0], maryEntry[1], 'bob'), collisionMode='replace', collisionReportingMode='silence')\n", "print(wordTier.entries)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "The next two functions are very useful when working in conjunction with other numerical data:\n", "\n", "**IntervalTier.getValuesInIntervals()** and **PointTier.getValuesAtPoints()**\n" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Interval(start=0.3154201182247563, end=0.6755499913498981, label='bob')\n", "[(0.4, 210), (0.5,), (0.6, 154)]\n", "\n", "Interval(start=0.6755499913498981, end=0.9839070294779999, label='rolled')\n", "[(0.7, 181), (0.8, 110), (0.9, 203)]\n", "\n", "Interval(start=0.9839070294779999, end=1.063725623583, label='the')\n", "[(1.0, 240)]\n", "\n", "Interval(start=1.063725623583, end=1.5182538944627297, label='barrel')\n", "[]\n", "\n" ] } ], "source": [ "# Let's say we have some time series data,\n", "# where the data is organized as [(timeV1, dataV1a, dataV1b, ...), (timeV2, dataV2a, dataV2b, ...), ...]\n", "dataValues = [(0.1, 15), (0.2, 98), (0.3, 105), (0.4, 210), (0.5, ),\n", "              (0.6, 154), (0.7, 181), (0.8, 110), (0.9, 203), (1.0, 240)]\n", "\n", "# Often when working with such data, we want to know which data\n", "# corresponds to certain speech events\n", "# e.g. what was the max pitch during the stressed vowel of a particular word, etc.\n", "intervalDataList = wordTier.getValuesInIntervals(dataValues)\n", "\n", "# The returned list is of the form [(interval1, subDataList1), (interval2, subDataList2), ...]\n", "for interval, subDataList in intervalDataList:\n", "    print(interval)\n", "    print(subDataList)\n", "    print()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
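A rough pure-Python picture of what getValuesInIntervals() does: every data row whose timestamp falls inside an interval is grouped with that interval. The helper `valuesInIntervalsSketch` is hypothetical, and whether interval boundaries count as inside (here, start <= time <= end) is an assumption. Run on the word intervals and data values from the cell above, it groups the same rows as the praatio output shown there.

```python
# Sketch of binning time-series rows into intervals, in the spirit of
# getValuesInIntervals(). valuesInIntervalsSketch is hypothetical, and the
# inclusive start <= time <= end test is an assumption.
def valuesInIntervalsSketch(entries, dataList):
    result = []
    for entry in entries:
        start, end = entry[0], entry[1]
        # The first field of every data row is its timestamp
        subDataList = [row for row in dataList if start <= row[0] <= end]
        result.append((entry, subDataList))
    return result


# The word intervals and data values from the cell above
words = [
    (0.3154201182247563, 0.6755499913498981, 'bob'),
    (0.6755499913498981, 0.9839070294779999, 'rolled'),
    (0.9839070294779999, 1.063725623583, 'the'),
    (1.063725623583, 1.5182538944627297, 'barrel'),
]
dataValues = [(0.1, 15), (0.2, 98), (0.3, 105), (0.4, 210), (0.5, ),
              (0.6, 154), (0.7, 181), (0.8, 110), (0.9, 203), (1.0, 240)]

for interval, subDataList in valuesInIntervalsSketch(words, dataValues):
    print(interval[2], subDataList)
```
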
\n", "Often we want to limit which entries we analyze.\n", "\n", "**crop()**, which we've already seen, lets us limit by time. **find()** lets us limit by label.\n", "\n", "find() returns the indices of all matching entries. It takes one required argument, the string to match, and two optional arguments: a flag that allows partial matches and a flag that treats the search string as a regular expression." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Interval(start=0.3154201182247563, end=0.6755499913498981, label='bob')\n" ] } ], "source": [ "# find() returns a list of indices into wordTier.entries\n", "bobWordIList = wordTier.find('bob')\n", "bobWord = wordTier.entries[bobWordIList[0]]\n", "print(bobWord)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
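The matching logic described for find() can be pictured with the pure-Python sketch below. `findSketch` and its flag names are hypothetical, chosen only to mirror the two optional arguments, and using `re.search` for the regular-expression case is an assumption; check praatio's documentation for the exact parameter names and matching rules. The toy entries use rounded times.

```python
import re


# Sketch of label matching in the spirit of find(). findSketch and its flag names
# are hypothetical; using re.search for the regular-expression case is an assumption.
def findSketch(entries, matchLabel, substrMatch=False, usingRE=False):
    indices = []
    for i, entry in enumerate(entries):
        label = entry[-1]  # the label is the last field of both Points and Intervals
        if usingRE:
            isMatch = re.search(matchLabel, label) is not None
        elif substrMatch:
            isMatch = matchLabel in label
        else:
            isMatch = label == matchLabel
        if isMatch:
            indices.append(i)
    return indices


# Toy entries with rounded times, labeled like the tier in the cell above
words = [(0.32, 0.68, 'bob'), (0.68, 0.98, 'rolled'), (0.98, 1.06, 'the'), (1.06, 1.52, 'barrel')]
print(findSketch(words, 'bob'))                    # exact match
print(findSketch(words, 'rol', substrMatch=True))  # partial match
print(findSketch(words, '^b', usingRE=True))       # regular expression
```
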
\n", "\n", "#### Example: Extracting time series data from specific intervals\n", "\n", "To end this subsection, let's try another real-life example using what we've learned. For this, we're going to use one praatio tool that falls outside the scope of this tutorial.\n", "\n", "The function praatio.pitch_and_intensity.extractPitch() can be used to extract time and pitch values from audio recordings. Don't worry too much about how the function works--that's for another tutorial.\n", "\n", "**Scenario:** You want to examine the maximum pitch that was produced whenever a speaker was saying someone's name. You're not interested in the pitch for other words.\n", "\n", "To do this, **1)** we're first going to have to extract the pitch from audio recordings, **2)** then we're going to need to find when the words were spoken, **3)** then we'll isolate the relevant pitch values for each word, and **4)** finally find the maximum value." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('mary.wav', Interval(start=0.3154201182247563, end=0.6755499913498981, label='mary'), 119.56262121912178)\n", "('bobby.wav', Interval(start=0.06469123242311078, end=0.41156462585, label='BOBBY'), 136.1707851512811)\n" ] } ], "source": [ "import os\n", "from os.path import join\n", "\n", "from praatio import textgrid\n", "from praatio import pitch_and_intensity\n", "\n", "# For pitch extraction, we need the location of Praat on your computer\n", "#praatEXE = r\"C:\\Praat.exe\"\n", "praatEXE = \"/Applications/Praat.app/Contents/MacOS/Praat\"\n", "\n", "# The 'os.getcwd()' is kind of a hack. 
With Jupyter, __file__ is undefined, and\n", "# os.getcwd() defaults to this notebook's folder within the praatio files.\n", "rootPath = join(os.getcwd(), '..', 'examples', 'files')\n", "pitchPath = join(rootPath, \"pitch_extraction\", \"pitch\")\n", "\n", "fnList = [('mary.wav', 'mary.TextGrid'),\n", "          ('bobby.wav', 'bobby_words.TextGrid')]\n", "\n", "# The names of interest -- in an example working with more data, this would be more comprehensive\n", "nameList = ['mary', 'BOBBY', 'lisa', 'john', 'sarah', 'tim']\n", "\n", "outputList = []\n", "for wavName, tgName in fnList:\n", "\n", "    pitchName = os.path.splitext(wavName)[0] + '.txt'\n", "\n", "    tg = textgrid.openTextgrid(join(rootPath, tgName), includeEmptyIntervals=False)\n", "\n", "    # 1 - get pitch values\n", "    pitchList = pitch_and_intensity.extractPitch(\n", "        join(rootPath, wavName),\n", "        join(pitchPath, pitchName),\n", "        praatEXE, 50, 350,\n", "        forceRegenerate=True,\n", "    )\n", "\n", "    # 2 - find the intervals where a name was spoken\n", "    nameIntervals = []\n", "    targetTier = tg.getTier('word')\n", "    for name in nameList:\n", "        findMatches = targetTier.find(name)\n", "        for i in findMatches:\n", "            nameIntervals.append(targetTier.entries[i])\n", "\n", "    # 3 - isolate the relevant pitch values\n", "    matchedIntervals = []\n", "    for entry in nameIntervals:\n", "        start, stop, label = entry\n", "\n", "        # Crop the tier to this interval, then collect the pitch samples inside it\n", "        croppedTier = targetTier.crop(start, stop, \"truncated\", False)\n", "        intervalDataList = croppedTier.getValuesInIntervals(pitchList)\n", "        matchedIntervals.extend(intervalDataList)\n", "\n", "    # 4 - find the maximum value in each matched interval\n", "    for interval, subDataList in matchedIntervals:\n", "        pitchValueList = [pitchV for timeV, pitchV in subDataList]\n", "        maxPitch = max(pitchValueList)\n", "\n", "        outputList.append((wavName, interval, maxPitch))\n", "\n", "# Output results\n", "for name, interval, value in outputList:\n", "    print((name, interval, value))" ] }, { "cell_type": "markdown", "metadata": {}, "source": 
[ "
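Steps 3 and 4 amount to filtering a time series by interval boundaries and taking the maximum. The sketch below shows that logic on its own, without Praat or praatio. `max_value_in_intervals` is a hypothetical helper. The `Interval` class only imitates the shape of praatio's entries. The pitch values are toy numbers, not data from the recordings above. The sketch assumes a sample belongs to an interval when start <= time <= end. praatio's getValuesInIntervals() may handle the boundaries differently.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Interval(NamedTuple):
    # Imitates the (start, end, label) shape of praatio's Interval entries
    start: float
    end: float
    label: str


def max_value_in_intervals(
    intervals: Sequence[Interval],
    series: Sequence[Tuple[float, float]],
) -> List[Tuple[Interval, Optional[float]]]:
    # For each interval, keep the (time, value) samples that fall inside it
    # (boundaries included) and report the largest value, or None if no
    # samples fall inside the interval.
    results = []
    for interval in intervals:
        values = [v for t, v in series if interval.start <= t <= interval.end]
        results.append((interval, max(values) if values else None))
    return results


# Toy (time, pitch) pairs, one sample every 0.1 seconds
pitchList = [(0.0, 100.0), (0.1, 120.0), (0.2, 135.0), (0.3, 110.0), (0.4, 140.0)]
names = [Interval(0.05, 0.25, 'mary'), Interval(0.5, 0.6, 'tim')]

for interval, maxPitch in max_value_in_intervals(names, pitchList):
    print(interval.label, maxPitch)
# mary 135.0
# tim None
```

Returning None for an interval with no samples avoids the ValueError that `max()` raises on an empty list. The example above would hit that error if a name's interval contained no pitch samples.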
\n", "\n", "### Operations between Tiers\n", "\n", "Several tier functions take a second tier as an argument. Two of them are only useful in specific situations, so I'll just introduce them here:\n", "\n", "**append()** appends one tier to the end of another. This could be useful if you are combining two audio files that have each been transcribed.\n", "\n", "**morph()** changes the durations of the labeled segments in one textgrid to match those of another, leaving silences and labels unchanged. It's used by my ProMo library. Maybe you'll find some other use for it.\n", "\n", "Of more general use are the functions that perform **set operations**: difference(), intersection(), and union()." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('phons', 'syllable', 'tonicVowel', 'tonicSyllable', 'words', 'manually_labeled_pitch_errors')\n" ] } ], "source": [ "# Let's reload everything\n", "from os.path import join\n", "from praatio import textgrid\n", "\n", "# We'll use a special textgrid for this purpose\n", "inputFN = join('..', 'examples', 'files', 'damon_set_test.TextGrid')\n", "tg = textgrid.openTextgrid(inputFN, includeEmptyIntervals=False)\n", "\n", "# Ok, what are our tiers?\n", "print(tg.tierNames)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(Interval(start=0.05127748605468781, end=0.16128645133720465, label='T'), Interval(start=0.3020979268988262, end=0.505, label='T'), Interval(start=0.615, end=0.755, label='T'))\n", "(Interval(start=0.06278710646000359, end=0.17536306002462773, label='x'), Interval(start=0.5350436443504402, end=0.649222328635095, label='x'))\n" ] } ], "source": [ "# Let's perform set operations between these two tiers\n", "syllableTier = tg.getTier('tonicSyllable')\n", "errorTier = tg.getTier('manually_labeled_pitch_errors')\n", "\n", "print(syllableTier.entries)\n", 
"print(errorTier.entries)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(Interval(start=0.05127748605468781, end=0.06278710646000359, label='T'), Interval(start=0.3020979268988262, end=0.505, label='T'), Interval(start=0.649222328635095, end=0.755, label='T'))\n", "(Interval(start=0.06278710646000359, end=0.16128645133720465, label='T-x'), Interval(start=0.615, end=0.649222328635095, label='T-x'))\n", "(Interval(start=0.05127748605468781, end=0.17536306002462773, label='T-x'), Interval(start=0.3020979268988262, end=0.505, label='T'), Interval(start=0.5350436443504402, end=0.755, label='x-T'))\n" ] } ], "source": [ "# Set difference -- the entries that are not in errorTier are kept\n", "diffTier = syllableTier.difference(errorTier)\n", "diffTier = diffTier.new(name=\"different\")\n", "print(diffTier.entries)\n", "\n", "# Set intersection -- the overlapping regions between the two tiers are kept\n", "interTier = syllableTier.intersection(errorTier)\n", "interTier = interTier.new(name=\"intersection\")\n", "print(interTier.entries)\n", "\n", "# Set union -- the two tiers are merged\n", "unionTier = syllableTier.union(errorTier)\n", "unionTier = unionTier.new(name=\"union\")\n", "print(unionTier.entries)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That output might be a little hard to visualize. 
Here is what the output looks like in a textgrid:\n", "\n", "![set_operations_tiers.png](./resources/set_operations_tiers.png)\n", "\n", "For more practice, here is code that generates this textgrid:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "outputFN = join('..', 'examples', 'files', 'damon_set_test_output.TextGrid')\n", "\n", "# Collect the two input tiers and the three result tiers in a new textgrid\n", "setTG = textgrid.Textgrid()\n", "for tier in [syllableTier, errorTier, diffTier, interTier, unionTier]:\n", "    setTG.addTier(tier)\n", "setTG.save(outputFN, format=\"short_textgrid\", includeBlankSpaces=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
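The following pure-Python sketch shows what difference() and intersection() compute, using plain (start, end) pairs. It is not praatio's implementation. The function names are hypothetical and labels are ignored. Each input is assumed to be sorted by start time, with no overlapping entries inside a single tier. The inputs are the tonicSyllable and manually_labeled_pitch_errors boundaries printed earlier, rounded to three decimals.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def intersection(a: List[Span], b: List[Span]) -> List[Span]:
    # Keep only the stretches of time covered by both lists
    result = []
    for a_start, a_end in a:
        for b_start, b_end in b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # the two spans actually overlap
                result.append((start, end))
    return result


def difference(a: List[Span], b: List[Span]) -> List[Span]:
    # Keep the parts of each span in a that no span in b covers
    result = []
    for start, end in a:
        cursor = start  # beginning of the not-yet-processed part of this span
        for b_start, b_end in b:
            if b_end <= cursor or b_start >= end:
                continue  # this span in b does not overlap the remaining part
            if b_start > cursor:
                result.append((cursor, b_start))  # uncovered piece before it
            cursor = max(cursor, b_end)
            if cursor >= end:
                break  # the rest of the span is covered
        if cursor < end:
            result.append((cursor, end))
    return result


# tonicSyllable and manually_labeled_pitch_errors boundaries, rounded
syllables = [(0.051, 0.161), (0.302, 0.505), (0.615, 0.755)]
errors = [(0.063, 0.175), (0.535, 0.649)]

print(intersection(syllables, errors))  # [(0.063, 0.161), (0.615, 0.649)]
print(difference(syllables, errors))    # [(0.051, 0.063), (0.302, 0.505), (0.649, 0.755)]
```

After rounding, these results line up with the intersection and difference tiers printed above. union() instead merges overlapping spans into one. That is why the union tier starts with a single interval covering both the T and x entries.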
\n", "\n", "## Summary\n", "\n", "In this tutorial, we covered the basics of working with TextGrids programmatically, which may have given you some ideas for your own projects. In future tutorials (which don't exist yet), we'll cover more of praatio's functionality, which makes it easier to work with real data and may make it clearer how to use textgrid.py.\n", "\n", "Stay tuned!" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "
\n", "\n", "## Beyond PraatIO\n", "\n", "I've developed some 'advanced' libraries that extend the functionality of PraatIO. They have larger dependencies and might not be of interest to everyone, so they were spun out as separate libraries.\n", "\n", "If you've got a cool project that uses PraatIO, let me know and I can add it to this list!\n", "\n", "### ProMo\n", "\n", "[ProMo](https://github.com/timmahrt/promo), or Prosody Morph, is a library for resynthesizing the prosodic qualities of speech--in particular, pitch and duration. Its key feature is 'morphing' a source file toward a target: the prosodic qualities of one utterance can be superimposed onto another.\n", "\n", "### Pysle\n", "\n", "ISLE is an English plain-text pronunciation dictionary. It lists pronunciations using the International Phonetic Alphabet and indicates syllables, stress information, and part of speech.\n", "\n", "[Pysle](https://github.com/timmahrt/pysle) is a Python interface to ISLE. Using it, one can search for words based on how they are pronounced or look up the canonical pronunciation of a word.\n", "\n", "Pysle has some functionality that depends on PraatIO. For example, one can mark the stressed syllable in a textgrid that has been labeled with words and syllables or phones." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.0" } }, "nbformat": 4, "nbformat_minor": 2 }