{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "99af7aa1", "metadata": {}, "source": [ "# Getting started with `musif`\n", "\n", "[Download the Getting started tutorial Jupyter notebook here](https://raw.githubusercontent.com/DIDONEproject/musif/main/docs/source/Tutorial.ipynb)\n", "\n", "\n", "`musif` is a Python library for analyzing music scores. It extracts large numbers of features from MusicXML and MuseScore files.\n", "\n", "`musif` was born in the context of the [ERC Project \"DIDONE\"](https://didone.eu/) and, consequently,\n", "it is specialized in 18th-century Italian opera arias. However, it is also prepared to work with other repertoires.\n", "\n", "This tutorial is an introduction for people who are not experts in programming. If you are already an expert, just skip to the [Data Section](#data) and then go to the [Advanced Tutorial](https://musif.didone.eu/Tutorial_20poprock.html).\n", "\n", "\n", "## Installation\n", "\n", "First, you should install [`Python`](https://www.python.org/downloads/) >= 3.10. An easy way to do this is by using [`Anaconda`](https://www.anaconda.com/products/distribution), especially if you are not used to the command-line interface.\n", "\n", "Once you have installed `anaconda`:\n", "1. Launch the `anaconda-navigator`\n", "2. [Create an environment](https://docs.anaconda.com/navigator/getting-started/#managing-environments) selecting python version >= 3.10\n", "3. Switch to the newly created environment by clicking on its name\n", "\n", "\n", "To install `musif`:\n", "1. [Download this notebook](https://raw.githubusercontent.com/DIDONEproject/musif/main/docs/source/Tutorial.ipynb).\n", "2. Start `jupyter` in your Anaconda environment.\n", "3. Open this tutorial.\n", "4. Run the following cell by clicking on it and pressing Ctrl+Enter." ] }, { "attachments": {}, "cell_type": "markdown", "id": "91eac574", "metadata": {}, "source": [ "In the `pip install` cell below, the leading `!` tells Jupyter to run the rest of the line as a command in the terminal. 
After running it, you may need to restart the notebook (click the circular arrow ↻ in the top bar, near the icons ▶ and ⏹).\n", "\n", "To run this tutorial:\n", "1. In the `Home` tab of the `anaconda-navigator`, select \"All applications\" and the newly created environment in the options at the top.\n", "2. Click on `Install`, near the `Jupyter` icon\n", "3. Once installed, click on `launch` near the `notebook` icon; a web interface will open in the browser\n", "4. [Download this notebook](https://raw.githubusercontent.com/DIDONEproject/musif/main/docs/source/Tutorial.ipynb) by clicking the link with the right mouse button and selecting \"save as...\"\n", "5. Navigate to the downloaded file from the web interface and open it\n", "6. Run the following cell by clicking on it and pressing Ctrl+Enter" ] }, { "cell_type": "code", "execution_count": null, "id": "a7710973", "metadata": {}, "outputs": [], "source": [ "! pip install musif" ] }, { "cell_type": "code", "execution_count": 2, "id": "9a1257d3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Version: 1.2.3\n" ] } ], "source": [ "import musif\n", "print('Version: ', musif.__version__)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "01cd214c", "metadata": {}, "source": [ "## Introduction\n", "\n", "If you are new to Python, we suggest reading an introductory tutorial first, for instance, [this one](https://www.w3schools.com/python/default.asp).\n", "\n", "In the following, we introduce some technical terminology that may help you understand technical documentation while working with `musif`:\n", "\n", "* A _function_ is a way to represent code that is convenient for humans. You can think of functions as mathematical functions, with some input and some output. 
However, some programming languages call them _procedures_; Python does not, but the name conveys what functions ultimately are: sequences of commands that the computer has to execute.\n", "\n", "* An _object_ is a computational way to represent information _and_ code in the memory of computers; you can think of objects as concepts of the real world: objects have properties (in Python usually called _attributes_ or _fields_) and functionalities (named _methods_). For instance, an object could be a vehicle, which has some properties (length, maximum speed, number of wheels) and some functionalities (accelerate, decelerate, stop). Objects can also have specializations (named _children_): in our example, one _child_ of vehicle could be the car and another _child_ could be the bike; they have different properties and apply the functionalities in a different way. The vehicle, the car, and the bike may all have instances: the car that you use every day to go to work is different from your friend's even if they have exactly the same properties, because they are two different concrete objects. Technically, those two cars are two _instances_ of the same _class_. To create an instance, you call a function, generically named the _constructor_, passing the desired properties as arguments; in Python, the constructor has the same name as the class. This function returns the instance. To use `musif`, you don't need to know a lot about objects, but a little knowledge helps when you search the web.\n", "\n", "* A _DataFrame_ is another way to represent information for computers. DataFrames are designed to be extremely efficient, even if some aspects of the information can sometimes get lost. They are mainly used for data science problems. You can think of a _DataFrame_ as a table, with rows and columns. Usually, rows are _instances_ while columns are _properties_. In data science, these words often become _samples_ and _features_/_variables_. 
A typical operation is to select only certain columns (properties) or only certain rows (instances), either to work on a subset of the data or to modify the data itself.\n", "\n", "* Don't be scared to use web search engines such as Google: searching the web effectively is one of the most important skills a programmer has!\n", "\n", "### Main objects\n", "\n", "When using `musif`, you will usually interface with two objects:\n", "1. [`FeaturesExtractor()`](API/musif.extract.html#musif.extract.extract.FeaturesExtractor), which reads music scores and computes a DataFrame containing all the extracted features. In the simplest case, each row represents a music score, while each column represents a feature.\n", "2. [`DataProcessor()`](API/musif.process.html#musif.process.processor.DataProcessor), which takes the DataFrame with all the features in it and post-processes it to clean, improve, and possibly modify some of the features.\n", "\n", "These two objects take as input two different configurations that modify their behavior. In other words, the constructors of `FeaturesExtractor` and `DataProcessor` can accept a wide range of arguments.\n", "\n", "But let's proceed step by step!" ] }, { "cell_type": "code", "execution_count": 3, "id": "b8119d50", "metadata": {}, "outputs": [], "source": [ "import urllib.request\n", "import zipfile\n", "from pathlib import Path\n", "\n", "data_dir = Path(\"data\")\n", "dataset_path = \"dataset.zip\"\n", "# Download the example dataset (a zip archive of MIDI files) from Zenodo\n", "urllib.request.urlretrieve(\"https://zenodo.org/record/4027957/files/AnatomyComposerAttributionMIDIFilesAndFeatureData_1_0.zip?download=1\", dataset_path)\n", "# Unpack the archive into the data directory\n", "with zipfile.ZipFile(dataset_path, 'r') as zip_ref:\n", "    zip_ref.extractall(data_dir)\n", "# Point data_dir at the folder that contains the MIDI files\n", "data_dir = data_dir / Path('AnatomyComposerAttributionMIDIFilesAndFeatureData_1_0') / Path('MIDI/')\n" ] }, { "attachments": {}, "cell_type": "markdown", "id": "cab6f329", "metadata": {}, "source": [ "## Configuration\n", "\n", "Let's create a configuration for our experiment. 
Configurations can be expressed using a `yaml` file or with key-value arguments. `yaml` files are designed for complex projects, while key-value arguments are perfect for simple situations like this.\n", "\n", "Key-value arguments are similar to a dictionary: there is a _key_, which must be unique in the dictionary, and each _key_ is associated with a _value_, which can be repeated. Python can retrieve a value using its key in a very efficient way!\n", "\n", "First, we'll need to import the class that describes what a configuration looks like:" ] }, { "cell_type": "code", "execution_count": 4, "id": "7fe4511f", "metadata": {}, "outputs": [], "source": [ "from musif.config import ExtractConfiguration\n", "\n", "config = ExtractConfiguration(\n", "    None,\n", "    data_dir = data_dir,\n", "    basic_modules=[\"scoring\"],\n", "    features = [\"core\", \"ambitus\", \"melody\", \"tempo\", \n", "                \"density\", \"texture\", \"lyrics\", \"scale\", \n", "                \"key\", \"dynamics\", \"rhythm\"],\n", "    parallel = -1 # use > 1 if you wish to use parallelization (runs faster, uses more memory)\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b7c511bb", "metadata": {}, "source": [ "In the cell above, we also called its constructor to obtain a configuration object, stored in `config`." ] }, { "attachments": {}, "cell_type": "markdown", "id": "64a59048", "metadata": {}, "source": [ "## Feature extraction\n", "\n", "Now that we have our configuration, we pass it to the function that creates `FeaturesExtractor` objects. This function has the same name as the class, `FeaturesExtractor`:" ] }, { "cell_type": "code", "execution_count": 5, "id": "b900810d", "metadata": {}, "outputs": [], "source": [ "from musif.extract.extract import FeaturesExtractor\n", "extractor = FeaturesExtractor(config)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "343373ac", "metadata": {}, "source": [ "Before starting the extraction, we also need to tell MuseScore the type of files it should look for. 
In this case, we want it to look for files with extension `'.mid'`. By default, it would look for `.mscx` files, so we need to change it:" ] }, { "attachments": {}, "cell_type": "markdown", "id": "756f12ba", "metadata": {}, "source": [ "Now, we can start the extraction using the method `extract`. It will return a `DataFrame`:" ] }, { "cell_type": "code", "execution_count": null, "id": "e286b65b", "metadata": {}, "outputs": [], "source": [ "df = extractor.extract()" ] }, { "cell_type": "code", "execution_count": 7, "id": "da54809a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Shape df: (175, 927)\n" ] }, { "data": { "text/html": [ "
|  | FamilyGen_Density | FamilyGen_Notes | FamilyGen_NotesMean | FamilyGen_NumberOfFilteredParts | FamilyGen_NumberOfParts | FamilyGen_SoundingDensity | FamilyGen_SoundingMeasures | FamilyGen_SoundingMeasuresMean | FamilyInstrumentation | FamilyScoring | ... | SoundFl_TrimmedIntervallicMean | SoundFl_TrimmedIntervallicStd | SoundScoring | Tempo | TempoGrouped1 | TempoGrouped2 | TimeSignature | TimeSignatureGrouped | Voices | WindowId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | ww | ww | ... | -0.113402 | 1.468171 | fl | <NA> | <NA> | None | 2/1 | other |  | 0 |
| 1 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | ww | ww | ... | -0.165948 | 1.620333 | fl | <NA> | <NA> | None | 2/1 | other |  | 0 |
| 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | ww | ww | ... | -0.106242 | 1.634416 | fl | <NA> | <NA> | None | 2/1 | other |  | 0 |
| 3 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | ww | ww | ... | -0.095768 | 1.578589 | fl | <NA> | <NA> | None | 2/1 | other |  | 0 |
| 4 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | ww | ww | ... | -0.073604 | 1.623796 | fl | <NA> | <NA> | None | 2/1 | other |  | 0 |
5 rows × 927 columns
| Id | FamilyGen_Density | FamilyGen_Notes | FamilyGen_NotesMean | FamilyGen_NumberOfFilteredParts | FamilyGen_NumberOfParts | FamilyGen_SoundingDensity | FamilyGen_SoundingMeasures | FamilyGen_SoundingMeasuresMean | FamilyInstrumentation | FamilyScoring | ... | SoundFl_TrimmedIntervallicMean | SoundFl_TrimmedIntervallicStd | SoundScoring | Tempo | TempoGrouped1 | TempoGrouped2 | TimeSignature | TimeSignatureGrouped | Voices | WindowId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | ww | ww | ... | -0.113402 | 1.468171 | fl | <NA> | <NA> | None | 2/1 | other |  | 0 |
| 1 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | ww | ww | ... | -0.165948 | 1.620333 | fl | <NA> | <NA> | None | 2/1 | other |  | 0 |
| 2 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | ww | ww | ... | -0.106242 | 1.634416 | fl | <NA> | <NA> | None | 2/1 | other |  | 0 |
| 3 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | ww | ww | ... | -0.095768 | 1.578589 | fl | <NA> | <NA> | None | 2/1 | other |  | 0 |
| 4 | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | <NA> | ww | ww | ... | -0.073604 | 1.623796 | fl | <NA> | <NA> | None | 2/1 | other |  | 0 |
5 rows × 926 columns
| Id | PartFlI_IntervalA-2_Per | PartFlI_IntervalA1_Per | PartFlI_IntervalA2_Per | PartFlI_IntervalA3_Per | PartFlI_IntervalA4_Per | PartFlI_IntervalM-2_Per | PartFlI_IntervalM-3_Per | PartFlI_IntervalM-6_Per | PartFlI_IntervalM-7_Per | PartFlI_IntervalM-9_Per | ... | PartFlI_IntervalsMajorDesc_Per | PartFlI_IntervalsMinorAll_Per | PartFlI_IntervalsMinorAsc_Per | PartFlI_IntervalsMinorDesc_Per | PartFlI_IntervalsPerfectAll_Per | PartFlI_IntervalsPerfectAsc_Per | PartFlI_IntervalsPerfectDesc_Per | PartFlI_IntervalsWithinOctaveAll_Per | PartFlI_IntervalsWithinOctaveAsc_Per | PartFlI_IntervalsWithinOctaveDesc_Per |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | <NA> | <NA> | <NA> | <NA> | <NA> | 0.114943 | 0.034483 | <NA> | <NA> | <NA> | ... | 0.172837 | 0.068966 | 0.034483 | 0.034483 | 0.512123 | 0.106447 | 0.101868 | 1.0 | 0.324497 | 0.371695 |
| 1 | <NA> | <NA> | <NA> | <NA> | <NA> | 0.188312 | 0.045455 | <NA> | <NA> | <NA> | ... | 0.218958 | 0.162338 | 0.045455 | 0.116883 | 0.372618 | 0.105962 | 0.062201 | 1.0 | 0.360532 | 0.435014 |
| 2 | <NA> | <NA> | <NA> | <NA> | <NA> | 0.245713 | 0.044699 | <NA> | <NA> | <NA> | ... | 0.282851 | 0.285156 | 0.109375 | 0.15 | 0.189659 | 0.045257 | 0.024475 | 1.0 | 0.404641 | 0.475432 |
| 3 | <NA> | <NA> | <NA> | <NA> | <NA> | 0.137931 | 0.062069 | <NA> | <NA> | <NA> | ... | 0.191059 | 0.236842 | 0.075862 | 0.122807 | 0.375296 | 0.107478 | 0.068547 | 1.0 | 0.373753 | 0.426976 |
| 4 | <NA> | <NA> | <NA> | <NA> | <NA> | 0.190211 | 0.055034 | <NA> | <NA> | <NA> | ... | 0.249377 | 0.190083 | 0.057851 | 0.132231 | 0.273048 | 0.084658 | 0.039822 | 1.0 | 0.415049 | 0.436383 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 170 | <NA> | <NA> | <NA> | <NA> | <NA> | 0.214841 | 0.02509 | <NA> | <NA> | <NA> | ... | 0.243784 | 0.254902 | 0.108696 | 0.137255 | 0.282986 | 0.070828 | 0.051396 | 1.0 | 0.39817 | 0.441068 |
| 171 | <NA> | <NA> | <NA> | <NA> | <NA> | 0.192255 | 0.048058 | <NA> | <NA> | <NA> | ... | 0.243046 | 0.275132 | 0.126984 | 0.142857 | 0.189175 | 0.064503 | 0.047243 | 1.0 | 0.479253 | 0.443318 |
| 172 | <NA> | <NA> | <NA> | <NA> | <NA> | 0.196581 | 0.025641 | <NA> | <NA> | <NA> | ... | 0.24005 | 0.220588 | 0.101695 | 0.117647 | 0.244157 | 0.031732 | 0.034152 | 1.0 | 0.417226 | 0.404501 |
| 173 | <NA> | <NA> | <NA> | <NA> | <NA> | 0.254902 | 0.039216 | <NA> | <NA> | <NA> | ... | 0.281809 | 0.19697 | 0.090909 | 0.106061 | 0.226009 | 0.051943 | 0.031067 | 1.0 | 0.408321 | 0.44868 |
| 174 | <NA> | <NA> | <NA> | <NA> | <NA> | 0.214286 | 0.038961 | <NA> | <NA> | <NA> | ... | 0.243536 | 0.220779 | 0.077922 | 0.133333 | 0.313252 | 0.084427 | 0.049106 | 1.0 | 0.374877 | 0.445405 |
175 rows × 66 columns