{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Introduction" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "This notebook presents the architecture of DeepConvLSTM: a deep framework for wearable activity recognition based on convolutional and LSTM recurrent units. To obtain a detailed description of the architecture consult the paper \"Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition\"." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "The data" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "One of the benchmarks dataset employed to evaluate DeepConvLSTM is the 'OPPORTUNITY Activity Recognition Data Set'. OPPORTUNITY is a dataset devised to benchmark human activity recognition algorithms. It comprises the readings of motion sensors recorded while users executed typical daily activities and includes several annotations of gestures and modes of locomotion (visit https://archive.ics.uci.edu/ml/datasets/OPPORTUNITY+Activity+Recognition for further info). In this example DeepConvLSTM will perform recognition of sporadic gestures. This task concerns recognition of the different right-arm gestures. This is a 18 class segmentation and classification problem.\n", "\n", "The dataset must be be preprocessed prior to be feed to the neural network, in order to fill in missing values using linear interpolation and to do a per channel normalization to interval [0,1]. A Python script is provided to automatically preprocess the data, download the original OPPORTUNITY dataset if required and segment sensor data into train and test.\n" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "We would recommend to download the OPPORTUNITY zip file from the UCI repository and then use the script to generate the data file." ] }, { "cell_type": "code", "collapsed": false, "input": [ "!wget https://archive.ics.uci.edu/ml/machine-learning-databases/00226/OpportunityUCIDataset.zip" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!python preprocess_data.py -h" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "usage: preprocess_data.py [-h] -i INPUT -o OUTPUT [-t {gestures,locomotion}]\r\n", "\r\n", "Preprocess OPPORTUNITY dataset\r\n", "\r\n", "optional arguments:\r\n", " -h, --help show this help message and exit\r\n", " -i INPUT, --input INPUT\r\n", " OPPORTUNITY zip file\r\n", " -o OUTPUT, --output OUTPUT\r\n", " Processed data file\r\n", " -t {gestures,locomotion}, --task {gestures,locomotion}\r\n", " Type of activities to be recognized\r\n" ] } ], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "!python preprocess_data.py -i data/OpportunityUCIDataset.zip -o oppChallenge_gestures.data" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Checking dataset data/OpportunityUCIDataset.zip\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "Processing dataset files ...\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "... file OpportunityUCIDataset/dataset/S1-Drill.dat\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "... file OpportunityUCIDataset/dataset/S1-ADL1.dat\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "... file OpportunityUCIDataset/dataset/S1-ADL2.dat\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "... file OpportunityUCIDataset/dataset/S1-ADL3.dat\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "... file OpportunityUCIDataset/dataset/S1-ADL4.dat\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "... file OpportunityUCIDataset/dataset/S1-ADL5.dat\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "... file OpportunityUCIDataset/dataset/S2-Drill.dat\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "... file OpportunityUCIDataset/dataset/S2-ADL1.dat\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "... file OpportunityUCIDataset/dataset/S2-ADL2.dat\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "... file OpportunityUCIDataset/dataset/S2-ADL3.dat\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "... file OpportunityUCIDataset/dataset/S3-Drill.dat\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "... file OpportunityUCIDataset/dataset/S3-ADL1.dat\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "... file OpportunityUCIDataset/dataset/S3-ADL2.dat\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "... file OpportunityUCIDataset/dataset/S3-ADL3.dat\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "... file OpportunityUCIDataset/dataset/S2-ADL4.dat\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "... file OpportunityUCIDataset/dataset/S2-ADL5.dat\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "... file OpportunityUCIDataset/dataset/S3-ADL4.dat\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "... file OpportunityUCIDataset/dataset/S3-ADL5.dat\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "Final datasets with size: | train (557963, 113) | test (118750, 113) | \r\n" ] } ], "prompt_number": 6 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Running DeepConvLSTM" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "DeepConvLSTM is defined as a neural netowrk which combines convolutional and recurrent layers. The convolutional\n", "layers act as feature extractors and provide abstract representations of the input sensor data in feature\n", "maps. The recurrent layers model the temporal dynamics of the activation of the feature maps." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Setup" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import lasagne\n", "import theano\n", "import time\n", "\n", "import numpy as np\n", "import cPickle as cp\n", "import theano.tensor as T\n", "from sliding_window import sliding_window\n", "\n", "# Hardcoded number of sensor channels employed in the OPPORTUNITY challenge\n", "NB_SENSOR_CHANNELS = 113\n", "\n", "# Hardcoded number of classes in the gesture recognition problem\n", "NUM_CLASSES = 18\n", "\n", "# Hardcoded length of the sliding window mechanism employed to segment the data\n", "SLIDING_WINDOW_LENGTH = 24\n", "\n", "# Length of the input sequence after convolutional operations\n", "FINAL_SEQUENCE_LENGTH = 8\n", "\n", "# Hardcoded step of the sliding window mechanism employed to segment the data\n", "SLIDING_WINDOW_STEP = 12\n", "\n", "# Batch Size\n", "BATCH_SIZE = 100\n", "\n", "# Number filters convolutional layers\n", "NUM_FILTERS = 64\n", "\n", "# Size filters convolutional layers\n", "FILTER_SIZE = 5\n", "\n", "# Number of unit in the long short-term recurrent layers\n", "NUM_UNITS_LSTM = 128" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 13 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Load the sensor data" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Load the OPPORTUNITY processed dataset. Sensor data is segmented using a sliding window of fixed length. The class associated with each segment corresponds to the gesture which has been observed during that interval. Given a sliding window of length T, we choose the class of the sequence as the label at t=T, or in other words, the label of last sample in the window." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def load_dataset(filename):\n", "\n", " f = file(filename, 'rb')\n", " data = cp.load(f)\n", " f.close()\n", "\n", " X_train, y_train = data[0]\n", " X_test, y_test = data[1]\n", "\n", " print(\" ..from file {}\".format(filename))\n", " print(\" ..reading instances: train {0}, test {1}\".format(X_train.shape, X_test.shape))\n", "\n", " X_train = X_train.astype(np.float32)\n", " X_test = X_test.astype(np.float32)\n", "\n", " # The targets are casted to int8 for GPU compatibility.\n", " y_train = y_train.astype(np.uint8)\n", " y_test = y_test.astype(np.uint8)\n", "\n", " return X_train, y_train, X_test, y_test\n", "\n", "print(\"Loading data...\")\n", "X_train, y_train, X_test, y_test = load_dataset('data/oppChallenge_gestures.data')\n", "\n", "assert NB_SENSOR_CHANNELS == X_train.shape[1]\n", "def opp_sliding_window(data_x, data_y, ws, ss):\n", " data_x = sliding_window(data_x,(ws,data_x.shape[1]),(ss,1))\n", " data_y = np.asarray([[i[-1]] for i in sliding_window(data_y,ws,ss)])\n", " return data_x.astype(np.float32), data_y.reshape(len(data_y)).astype(np.uint8)\n", "\n", "# Sensor data is segmented using a sliding window mechanism\n", "X_test, y_test = opp_sliding_window(X_test, y_test, SLIDING_WINDOW_LENGTH, SLIDING_WINDOW_STEP)\n", "print(\" ..after sliding window (testing): inputs {0}, targets {1}\".format(X_test.shape, y_test.shape))\n", "\n", "# Data is reshaped since the input of the network is a 4 dimension tensor\n", "X_test = X_test.reshape((-1, 1, SLIDING_WINDOW_LENGTH, NB_SENSOR_CHANNELS))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Loading data...\n", " ..from file data/oppChallenge_gestures.data" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", " ..reading instances: train (557963, 113), test (118750, 113)\n", " ..after sliding window (testing): inputs (9894, 24, 113), targets (9894,)" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 14 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Define the Lasagne network" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Sensor data are processed by four convolutional layer which allow to learn features from the data. Two dense layers then perform a non-lineartransformation which yields the classification outcome with a softmax logistic regresion output layer" ] }, { "cell_type": "code", "collapsed": false, "input": [ "net = {}\n", "net['input'] = lasagne.layers.InputLayer((BATCH_SIZE, 1, SLIDING_WINDOW_LENGTH, NB_SENSOR_CHANNELS))\n", "net['conv1/5x1'] = lasagne.layers.Conv2DLayer(net['input'], NUM_FILTERS, (FILTER_SIZE, 1))\n", "net['conv2/5x1'] = lasagne.layers.Conv2DLayer(net['conv1/5x1'], NUM_FILTERS, (FILTER_SIZE, 1))\n", "net['conv3/5x1'] = lasagne.layers.Conv2DLayer(net['conv2/5x1'], NUM_FILTERS, (FILTER_SIZE, 1))\n", "net['conv4/5x1'] = lasagne.layers.Conv2DLayer(net['conv3/5x1'], NUM_FILTERS, (FILTER_SIZE, 1))\n", "net['shuff'] = lasagne.layers.DimshuffleLayer(net['conv4/5x1'], (0, 2, 1, 3))\n", "net['lstm1'] = lasagne.layers.LSTMLayer(net['shuff'], NUM_UNITS_LSTM)\n", "net['lstm2'] = lasagne.layers.LSTMLayer(net['lstm1'], NUM_UNITS_LSTM)\n", "# In order to connect a recurrent layer to a dense layer, it is necessary to flatten the first two dimensions\n", "# to cause each time step of each sequence to be processed independently (see Lasagne docs for further information)\n", "net['shp1'] = lasagne.layers.ReshapeLayer(net['lstm2'], (-1, NUM_UNITS_LSTM))\n", "net['prob'] = lasagne.layers.DenseLayer(net['shp1'],NUM_CLASSES, nonlinearity=lasagne.nonlinearities.softmax)\n", "# Tensors reshaped back to the original shape\n", "net['shp2'] = lasagne.layers.ReshapeLayer(net['prob'], (BATCH_SIZE, FINAL_SEQUENCE_LENGTH, NUM_CLASSES))\n", "# Last sample in the sequence is considered\n", "net['output'] = lasagne.layers.SliceLayer(net['shp2'], -1, 1)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 18 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Load the model parameters" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# The model is populated with the weights of the pretrained network\n", "all_params_values = cp.load(open('weights/DeepConvLSTM_oppChallenge_gestures.pkl'))\n", "lasagne.layers.set_all_param_values(net['output'], all_params_values)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 19 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Run the model" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "Compile the Theano function required to classify the data" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Compilation of theano functions\n", "# Obtaining the probability distribution over classes\n", "test_prediction = lasagne.layers.get_output(net['output'], deterministic=True)\n", "# Returning the predicted output for the given minibatch\n", "test_fn = theano.function([ net['input'].input_var], [T.argmax(test_prediction, axis=1)])" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stderr", "text": [ "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan.py:1019: Warning: In the strict mode, all neccessary shared variables must be passed as a part of non_sequences\n", " 'must be passed as a part of non_sequences', Warning)\n", "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_perform_ext.py:135: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility\n", " from scan_perform.scan_perform import *\n" ] } ], "prompt_number": 20 }, { "cell_type": "raw", "metadata": {}, "source": [ "Testing data are segmented in minibatches and classified." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def iterate_minibatches(inputs, targets, batchsize, shuffle=False):\n", " assert len(inputs) == len(targets)\n", " if shuffle:\n", " indices = np.arange(len(inputs))\n", " np.random.shuffle(indices)\n", " for start_idx in range(0, len(inputs) - batchsize + 1, batchsize):\n", " if shuffle:\n", " excerpt = indices[start_idx:start_idx + batchsize]\n", " else:\n", " excerpt = slice(start_idx, start_idx + batchsize)\n", " yield inputs[excerpt], targets[excerpt]\n", " \n", "# Classification of the testing data\n", "print(\"Processing {0} instances in mini-batches of {1}\".format(X_test.shape[0],BATCH_SIZE))\n", "test_pred = np.empty((0))\n", "test_true = np.empty((0))\n", "start_time = time.time()\n", "for batch in iterate_minibatches(X_test, y_test, BATCH_SIZE):\n", " inputs, targets = batch\n", " y_pred, = test_fn(inputs)\n", " test_pred = np.append(test_pred, y_pred, axis=0)\n", " test_true = np.append(test_true, targets, axis=0)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Processing 9894 instances in mini-batches of 100\n" ] } ], "prompt_number": 21 }, { "cell_type": "raw", "metadata": {}, "source": [ "Models is evaluated using the F-Measure, a measure that considers the correct classification of each class equally important. Class imbalance is countered by weighting classes according to their sample proportion." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Results presentation\n", "print(\"||Results||\")\n", "print(\"\\tTook {:.3f}s.\".format( time.time() - start_time))\n", "import sklearn.metrics as metrics\n", "print(\"\\tTest fscore:\\t{:.4f} \".format(metrics.f1_score(test_true, test_pred, average='weighted')))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "||Results||\n", "\tTook 35.687s.\n", "\tTest fscore:\t0.9157 " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 22 } ], "metadata": {} } ] }