{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Improvise a Jazz Solo with an LSTM Network\n", "\n", "Welcome to your final programming assignment of this week! In this notebook, you will implement a model that uses an LSTM to generate music. At the end, you'll even be able to listen to your own music! \n", "\n", "\n", "\n", "**By the end of this assignment, you'll be able to:**\n", "\n", "- Apply an LSTM to a music generation task\n", "- Generate your own jazz music with deep learning\n", "- Use the flexible Functional API to create complex models\n", "\n", "This is going to be a fun one. Let's get started! " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table of Contents\n", "\n", "- [Packages](#0)\n", "- [1 - Problem Statement](#1)\n", " - [1.1 - Dataset](#1-1)\n", " - [1.2 - Model Overview](#1-2)\n", "- [2 - Building the Model](#2)\n", " - [Exercise 1 - djmodel](#ex-1)\n", "- [3 - Generating Music](#3)\n", " - [3.1 - Predicting & Sampling](#3-1)\n", " - [Exercise 2 - music_inference_model](#ex-2)\n", " - [Exercise 3 - predict_and_sample](#ex-3)\n", " - [3.2 - Generate Music](#3-2)\n", "- [4 - References](#4) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Packages\n", "\n", "Run the following cell to load all the packages you'll need. This may take a few minutes!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import IPython\n", "import sys\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import tensorflow as tf\n", "\n", "from music21 import *\n", "from grammar import *\n", "from qa import *\n", "from preprocess import * \n", "from music_utils import *\n", "from data_utils import *\n", "from outputs import *\n", "from test_utils import *\n", "\n", "from tensorflow.keras.layers import Dense, Activation, Dropout, Input, LSTM, Reshape, Lambda, RepeatVector\n", "from tensorflow.keras.models import Model\n", "from tensorflow.keras.optimizers import Adam\n", "from tensorflow.keras.utils import to_categorical" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 1 - Problem Statement\n", "\n", "You would like to create a jazz music piece specially for a friend's birthday. However, you don't know how to play any instruments, or how to compose music. Fortunately, you know deep learning and will solve this problem using an LSTM network! \n", "\n", "You will train a network to generate novel jazz solos in a style representative of a body of performed work. 😎🎷\n", "\n", "\n", "### 1.1 - Dataset\n", "\n", "To get started, you'll train your algorithm on a corpus of Jazz music. Run the cell below to listen to a snippet of the audio from the training set:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "IPython.display.Audio('./data/30s_seq.wav')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The preprocessing of the musical data has been taken care of already, which for this notebook means it's been rendered in terms of musical \"values.\" \n", "\n", "#### What are musical \"values\"? (optional)\n", "You can informally think of each \"value\" as a note, which comprises a pitch and duration. 
For example, if you press down a specific piano key for 0.5 seconds, then you have just played a note. In music theory, a \"value\" is actually more complicated than this: it also captures the information needed to play multiple notes at the same time. For example, when playing a piece of music, you might press down two piano keys at the same time (playing multiple notes at the same time generates what's called a \"chord\"). But you don't need to worry about the details of music theory for this assignment. \n", "\n", "#### Music as a sequence of values\n", "\n", "* For the purposes of this assignment, all you need to know is that you'll obtain a dataset of values, and will use an RNN model to generate sequences of values. \n", "* Your music generation system will use 90 unique values. \n", "\n", "Run the following code to load the raw music data and preprocess it into values. This might take a few minutes!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "X, Y, n_values, indices_values, chords = load_music_utils('data/original_metheny.mid')\n", "print('Number of training examples:', X.shape[0])\n", "print('Tx (length of sequence):', X.shape[1])\n", "print('Total # of unique values:', n_values)\n", "print('Shape of X:', X.shape)\n", "print('Shape of Y:', Y.shape)\n", "print('Number of chords:', len(chords))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You have just loaded the following:\n", "\n", "- `X`: This is an $(m, T_x, 90)$ dimensional array. \n", "  - You have $m$ training examples, each of which is a snippet of $T_x = 30$ musical values. \n", "  - At each time step, the input is one of 90 different possible values, represented as a one-hot vector. \n", "  - For example, `X[i, t, :]` is a one-hot vector representing the value of the $i$-th example at time $t$. 
\n", "\n", "- `Y`: This is a $(T_y, m, 90)$ dimensional array.\n", "  - This is essentially the same as `X`, but shifted one step to the left, so that $y^{\\langle t \\rangle} = x^{\\langle t+1 \\rangle}$: each target is the value that follows the current input. \n", "  - Notice that the data in `Y` is **reordered** to be dimension $(T_y, m, 90)$, where $T_y = T_x$. This format makes it more convenient to feed into the LSTM later.\n", "  - Similar to the dinosaur assignment, you're using the previous values to predict the next value.\n", "  - So your sequence model will try to predict $y^{\\langle t \\rangle}$ given $x^{\\langle 1\\rangle}, \\ldots, x^{\\langle t \\rangle}$. \n", "\n", "- `n_values`: The number of unique values in this dataset. This should be 90. \n", "\n", "- `indices_values`: A Python dictionary mapping the integers 0 through 89 to musical values.\n", "\n", "- `chords`: The chords used in the input MIDI file." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 1.2 - Model Overview\n", "\n", "Here is the architecture of the model you'll use. It's similar to the Dinosaurus model, except that you'll implement it in Keras.\n", "\n", "\n", "
\n", "| Expression | Value |\n", "| --- | --- |\n", "| **np.argmax(results[12])** | 26 |\n", "| **np.argmax(results[17])** | 7 |\n", "| **list(indices[12:18])** | [array([26]), array([18]), array([53]), array([27]), array([40]), array([7])] |\n", "