{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Identifying Birds with Python\n", "## Angie K. Reyes\n", "### May 2018" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# AGENDA\n", "### Introduction\n", "#### - Who am I?\n", "#### - Topics of interest\n", "\n", "### Background\n", "#### - LifeClef Challenge\n", "#### - Motivation\n", "\n", "### Content\n", "#### - How to identify bird species using Python?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# About me ....\n", "\n", "\n", "## https://github.com/angiereyesbet/" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Systems and computing engineer\n", "### 25 years old\n", "### Chica TIC 2016\n", "\n", "### PhD student in Applied Science - UAN\n", "### Back-End developer" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Topics\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Why Python?\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# \"Develop a passion for learning.\n", "# If you do, you will never cease to grow.\"\n", "### Anthony J. D'Angelo."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# BACKGROUND" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Motivation\n", "### - Expert ornithologists are needed\n", "### - Birds have distinct accents depending on the region\n", "### - Identifying migration from audio recordings\n", "### - Unidentified or endangered birds\n", "### - Colombia is the second most biodiverse country\n", "### - In 2013, the number of bird species recorded in Colombia rose to 1,903" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# LifeClef Challenge\n", "### The goal of the task is to identify all the bird species present in a set of audio recordings.\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# About machine learning\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Support Vector Machine\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Clustering\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# How to identify bird species using Python?"
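] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### A quick sketch of both techniques\n", "A minimal, illustrative example (an added sketch on hypothetical toy data, not part of the original talk) of the two tools used below: an RBF-kernel SVM classifier and MiniBatchKMeans clustering from scikit-learn." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "import numpy as np\n", "from sklearn.svm import SVC\n", "from sklearn.cluster import MiniBatchKMeans\n", "\n", "# two well-separated 2-D blobs (hypothetical toy data)\n", "rng = np.random.RandomState(0)\n", "X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(4, 1, (20, 2))])\n", "y = np.array([0] * 20 + [1] * 20)\n", "\n", "# SVM: learn a decision boundary from labelled examples\n", "clf = SVC(kernel='rbf', gamma='auto')\n", "clf.fit(X, y)\n", "print(\"SVM train accuracy:\", clf.score(X, y))\n", "\n", "# clustering: group unlabelled points around k centroids\n", "km = MiniBatchKMeans(n_clusters=2, n_init=10, random_state=0).fit(X)\n", "print(\"cluster centers shape:\", km.cluster_centers_.shape)"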
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Dataset\n", "### Xeno Canto (https://www.xeno-canto.org)\n", "### 34,496 audio recordings\n", "### 1,500 species\n", "#### ---------------------------------------------------------------------\n", "### 7,860 Colombian audio recordings\n", "### 789 species\n", "#### ---------------------------------------------------------------------\n", "### 3,440 audio recordings\n", "### 100 species" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Step 1: Data processing\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Step 2: Signal processing\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Step 3: Classification\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Now for the fun part!!!\n", "## (The Code)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "import os\n", "import json\n", "import speechpy\n", "import numpy as np\n", "import IPython.display as ipd\n", "import scipy.io.wavfile as wav\n", "import xml.etree.ElementTree as ET\n", "from urllib.request import urlopen" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Example XML file" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "xml_path = \"LIFECLEF2015_BIRDAMAZON_XC_WAV_RN15568.xml\"\n", "xml_dict = {}\n", "tree = ET.parse(xml_path)\n", "root = tree.getroot()\n", "\n", "# store every child tag and its text in a dictionary\n", "for child in root:\n", "    xml_dict[child.tag] = child.text" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MediaId : 15568\n", "FileName : LIFECLEF2015_BIRDAMAZON_XC_WAV_RN15568.wav\n", "ClassId : ssmptq\n", "Date : 2000-06-01\n", "Time : ?\n", "Locality : Humedal de Tibanica, Bosa, Bogotá D.C.\n", "Latitude : 4.6030444444\n", "Longitude : -74.2044555556\n", "Elevation : 2546\n", "Author : Paula Caycedo Rosales (Colección de Sonidos Ambientales - Instituto Humboldt)\n", "AuthorID : XMFDPACYJN\n", "Content : song\n", "Comments : BSA 7557To obtain a wav file of the original recording, please contact csa@humboldt.org.coplayback-used:no\n", "Quality : 1\n", "Year : BirdCLEF2015\n", "BackgroundSpecies : None\n", "Order : Passeriformes\n", "Family : Troglodytidae\n", "Genus : Cistothorus\n", "Species : apolinari\n", "Sub-species : apolinari\n", "VernacularNames : Apolinar's Wren\n" ] } ], "source": [ "for key in xml_dict:\n", "    print(key, \":\", xml_dict[key])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Example audio file" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "audio_path = \"LIFECLEF2015_BIRDAMAZON_XC_WAV_RN15568.wav\"\n", "\n", "ipd.Audio(audio_path)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Example: getting the country from the Google Geocoding API" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Colombia\n" ] } ], "source": [ "# Google API key read from the environment\n", "api_key = os.environ[\"PASS_GOOGLE\"]\n", "\n", "latitude = xml_dict['Latitude'].lower().strip()\n", "longitude = xml_dict['Longitude'].lower().strip()\n", "\n", "url = \"https://maps.googleapis.com/maps/api/geocode/json?latlng=\" + latitude + \",\" + longitude + \"&key=\" + api_key\n", "\n", "jsonResponse = json.load(urlopen(url))\n", "jsonRes = jsonResponse['results']\n", "\n", "# keep the address components of the last (coarsest) result\n", "for x in jsonRes:\n", "    res = x['address_components']\n", "\n", "for x in res:\n", "    country = x['long_name']\n", "    print(country)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Remove noise from the audio files" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_audio_noise = str(audio_path).replace('.wav', '_noise.wav')\n", "\n", "# remove noise with sox, using a precomputed noise profile\n", "resp = os.system(\"sox \" + audio_path + \" \" + new_audio_noise + \" noisered speech.noise-profile .5\")\n", "\n", "ipd.Audio(new_audio_noise)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Remove silences from the audio" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_audio_silence = str(audio_path).replace('.wav', '_silence.wav')\n", "\n", "# create a new file without silence\n", "resp = os.system(\"sox \" + new_audio_noise + \" \" + new_audio_silence + \" silence 1 0.1 1% -1 0.1 1%\")\n", "\n", "ipd.Audio(new_audio_silence)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Feature extraction\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mfcc features: (961, 13)\n", "mfcc(mean + variance normalized) features: (961, 13)\n", "mfcc feature cube: (961, 13, 3)\n" ] } ], "source": [ "fs, signal = wav.read(audio_path)\n", "\n", "# mfcc features\n", "mfcc = speechpy.feature.mfcc(signal, sampling_frequency=fs,\n", "                             frame_length=0.020, frame_stride=0.01,\n", "                             num_filters=40, fft_length=512,\n", "                             low_frequency=0, high_frequency=None)\n", "\n", "print(\"mfcc features:\", np.shape(mfcc))\n", "\n", "# mfcc (mean + variance normalized) features\n", "mfcc_cmvn = speechpy.processing.cmvnw(mfcc, win_size=301,\n", "                                      variance_normalization=True)\n", "\n", "print(\"mfcc(mean + variance normalized) features:\", np.shape(mfcc_cmvn))\n", "\n", "# mfcc feature cube\n", "mfcc_feature_cube = speechpy.feature.extract_derivative_feature(mfcc)\n", "\n", "print(\"mfcc feature cube:\", np.shape(mfcc_feature_cube))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Classifier with global features\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import collections\n", "from sklearn.svm import SVC\n", "from\u0020
sklearn.externals import joblib\n", "from sklearn.model_selection import train_test_split as database_split" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Support Vector Machine" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "def classifier(feat_train, feat_test, lab_train, lab_test):\n", "\n", "    clf = SVC(C=2**0, cache_size=300, class_weight=None, coef0=0.0,\n", "              decision_function_shape=None, degree=3,\n", "              gamma='auto', kernel='rbf', probability=False,\n", "              random_state=None, shrinking=True,\n", "              tol=0.001, verbose=False)\n", "\n", "    clf.fit(feat_train, lab_train)\n", "\n", "    score_test = clf.score(feat_test, lab_test)\n", "\n", "    return score_test" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The global features are the mean of the frame-level features" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Extract global features\n", "def globalFeatures(features):\n", "    return features.mean(0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Create the train/test sets and evaluate" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# Classifier and model evaluation\n", "def main(dataset, labels):\n", "\n", "    # Split the data into two sets (training set, test set)\n", "    feat_train, feat_test, lab_train, lab_test = database_split(dataset,\n", "                                                                labels, test_size=0.3)\n", "\n", "    print(\"Train set shape:\", np.shape(feat_train))\n", "    print(\"Test set shape:\", np.shape(feat_test))\n", "\n", "    counter = collections.Counter(lab_train)\n", "    counter = dict(counter)\n", "\n", "    print(\"Distribution labels (train set):\", counter)\n", "\n", "    counter = collections.Counter(lab_test)\n", "    counter = dict(counter)\n", "\n", "    print(\"Distribution labels (test set):\", counter)\n", "\n", "    score = classifier(feat_train, feat_test,\n", "                       lab_train, lab_test)\n", "\n", "    print(\"Done!\", \"Score:\", score)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Import the data" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "data = joblib.load(\"mfcc_features.pkl.compressed\")\n", "\n", "mfcc_data = list()\n", "mfcc_cmvn_data = list()\n", "mfcc_feature_cube_data = list()\n", "labels_data = list()\n", "\n", "for key in data:\n", "\n", "    mfcc = data[key][\"mfcc\"]\n", "    mfcc_cmvn = data[key][\"mfcc_cmvn\"]\n", "    mfcc_feature_cube = data[key][\"mfcc_feature_cube\"]\n", "    mfcc_feature_cube = mfcc_feature_cube.reshape((len(mfcc_feature_cube), 39))\n", "\n", "    label = data[key][\"label\"]\n", "\n", "    mfcc_global = globalFeatures(mfcc)\n", "    mfcc_cmvn_global = globalFeatures(mfcc_cmvn)\n", "    mfcc_feature_global = globalFeatures(mfcc_feature_cube)\n", "\n", "    mfcc_data.append(mfcc_global)\n", "    mfcc_cmvn_data.append(mfcc_cmvn_global)\n", "    mfcc_feature_cube_data.append(mfcc_feature_global)\n", "\n", "    labels_data.append(label)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Classifier for the 3 feature sets" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train set shape: (245, 13)\n", "Test set shape: (105, 13)\n", "Distribution labels (train set): {5: 44, 4: 53, 2: 65, 3: 83}\n", "Distribution labels (test set): {5: 21, 2: 24, 3: 39, 4: 21}\n", "Done! Score: 0.819047619047619\n", "\n", "\n", "Train set shape: (245, 13)\n", "Test set shape: (105, 13)\n", "Distribution labels (train set): {3: 84, 2: 55, 4: 57, 5: 49}\n", "Distribution labels (test set): {3: 38, 2: 34, 4: 17, 5: 16}\n", "Done!\u0020
Score: 0.3619047619047619\n", "\n", "\n", "Train set shape: (245, 39)\n", "Test set shape: (105, 39)\n", "Distribution labels (train set): {2: 60, 4: 52, 3: 86, 5: 47}\n", "Distribution labels (test set): {3: 36, 2: 29, 5: 18, 4: 22}\n", "Done! Score: 0.8095238095238095\n" ] } ], "source": [ "# Process for mfcc\n", "main(mfcc_data, labels_data)\n", "print(\"\\n\")\n", "\n", "# Process for mfcc (mean + variance normalized)\n", "main(mfcc_cmvn_data, labels_data)\n", "print(\"\\n\")\n", "\n", "# Process for mfcc (cube)\n", "main(mfcc_feature_cube_data, labels_data)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Classifier with \"bag of features\"\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import collections\n", "from sklearn.svm import SVC\n", "from sklearn.externals import joblib\n", "from scipy.spatial.distance import cdist\n", "from sklearn.cluster import MiniBatchKMeans\n", "from sklearn.model_selection import train_test_split" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Function to normalize the data" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "def normalize(data):\n", "\n", "    std_dev = np.std(data, axis=0)\n", "    zero_std_mask = std_dev == 0\n", "\n", "    # avoid division by zero for constant dimensions\n", "    if zero_std_mask.any():\n", "        std_dev[zero_std_mask] = 1.0\n", "\n", "    result = data / std_dev\n", "\n", "    return result, std_dev" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Function to create the feature clusters\n" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "def k_means(data, k_guess):\n", "\n", "    batch_size = 100\n", "\n", "    mbk = MiniBatchKMeans(init='k-means++', n_clusters=k_guess,\n", "                          batch_size=batch_size, n_init=10,\n", "                          max_no_improvement=10, verbose=0)\n", "\n", "    codebook = mbk.fit(data)\n", "\n", "    return codebook.cluster_centers_" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Function to build the histogram" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "def histogram(features, codebook, index):\n", "\n", "    histogram_ = [0] * index\n", "\n", "    # distance from every feature vector to every codebook centre\n", "    distance = cdist(features, codebook, 'euclidean')\n", "\n", "    # index of the nearest centre for each feature vector\n", "    nearest = ((np.argsort(distance)).transpose()[0]).tolist()\n", "\n", "    counter = dict(collections.Counter(nearest))\n", "\n", "    for key in counter:\n", "        histogram_[key] = int(counter[key])\n", "\n", "    return histogram_" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Bag of features (visual words)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "def bagOfWords(features, n_clusters):\n", "\n", "    # normalize the input features before clustering\n", "    features, _ = normalize(features)\n", "\n", "    codebook = k_means(features, n_clusters)\n", "\n", "    histogram_ = histogram(features, codebook, n_clusters)\n", "\n", "    return histogram_" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Import and process the data\n" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "mfcc_data = list()\n", "mfcc_cmvn_data = list()\n", "mfcc_feature_cube_data = list()\n", "labels_data = list()\n", "\n", "n_clusters = 100" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "for key in data:\n", "\n", "    mfcc = data[key][\"mfcc\"]\n", "    mfcc_cmvn = data[key][\"mfcc_cmvn\"]\n", "    mfcc_feature_cube = data[key][\"mfcc_feature_cube\"]\n", "    mfcc_feature_cube = mfcc_feature_cube.reshape((len(mfcc_feature_cube), 39))\n", "\n", "    label = data[key][\"label\"]\n", "\n", "    mfcc_histogram = bagOfWords(mfcc, n_clusters)\n", "    mfcc_cmvn_histogram = bagOfWords(mfcc_cmvn, n_clusters)\n", "    mfcc_feature_cube_histogram = bagOfWords(mfcc_feature_cube, n_clusters)\n", "\n", "    mfcc_data.append(mfcc_histogram)\n", "    mfcc_cmvn_data.append(mfcc_cmvn_histogram)\n", "    mfcc_feature_cube_data.append(mfcc_feature_cube_histogram)\n", "\n", "    labels_data.append(label)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Classifier for the 3 feature sets\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train set shape: (245, 100)\n", "Test set shape: (105, 100)\n", "Distribution labels (train set): {3: 88, 4: 55, 2: 60, 5: 42}\n", "Distribution labels (test set): {3: 34, 2: 29, 5: 23, 4: 19}\n", "Done! Score: 0.3238095238095238\n", "\n", "\n", "Train set shape: (245, 100)\n", "Test set shape: (105, 100)\n", "Distribution labels (train set): {3: 85, 2: 64, 5: 46, 4: 50}\n", "Distribution labels (test set): {3: 37, 5: 19, 2: 25, 4: 24}\n", "Done! Score: 0.3523809523809524\n", "\n", "\n", "Train set shape: (245, 100)\n", "Test set shape: (105, 100)\n", "Distribution labels (train set): {4: 56, 2: 61, 3: 83, 5: 45}\n", "Distribution labels (test set): {3: 39, 2: 28, 5: 20, 4: 18}\n", "Done!\u0020
Score: 0.37142857142857144\n" ] } ], "source": [ "# Process for mfcc\n", "main(mfcc_data, labels_data)\n", "print(\"\\n\")\n", "\n", "# Process for mfcc (mean + variance normalized)\n", "main(mfcc_cmvn_data, labels_data)\n", "print(\"\\n\")\n", "\n", "# Process for mfcc (cube)\n", "main(mfcc_feature_cube_data, labels_data)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Open doors\n", "\n", "### - Algorithm modification and experimentation\n", "\n", "### - Taking part in challenges\n", "\n", "### - Classification with images\n", "\n", "### - Including more data in the classification" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# THANK YOU!!\n", "\n", "### angreyes@outlook.com\n", "### angiereyes.bet@gmail.com" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }