{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Description of data sources\n", "\n", "Here we give a brief description of *where* to find the main data sets used in this tutorial. Detailed descriptions of how to work with each data set once it has been downloaded are given within the main tutorial content.\n", "\n", "___" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### MNIST handwritten digits\n", "\n", "This is arguably the best-known benchmark data set for pattern recognition. The data is available at\n", "\n", " http://yann.lecun.com/exdb/mnist/\n", " \n", "for anyone with an internet connection. No registration is required.\n", "\n", "We assume that the raw data is stored in the `data/mnist` directory, downloaded and decompressed as follows.\n", "\n", "```\n", "$ mkdir -p data/mnist\n", "$ cd data/mnist\n", "$ wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz\n", "$ wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz\n", "$ wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz\n", "$ wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz\n", "$ gunzip *\n", "```\n", "\n", "From here, we can get to work." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### CIFAR-10 tiny images\n", "\n", "Let's prepare a sub-directory `data/cifar10` to store the data. Three versions of the data are available: a Python version, a MATLAB version, and a binary version. 
While the Python version is perfectly acceptable, here we use the binary version.\n", "\n", "```\n", "$ mkdir -p data/cifar10\n", "$ cd data/cifar10\n", "$ wget http://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz\n", "$ tar -xzf cifar-10-binary.tar.gz\n", "```\n", "\n", "This creates a directory `cifar-10-batches-bin` with the following contents:\n", "\n", "```\n", "$ ls cifar-10-batches-bin\n", "batches.meta.txt data_batch_2.bin data_batch_4.bin readme.html\n", "data_batch_1.bin data_batch_3.bin data_batch_5.bin test_batch.bin\n", "```\n", "\n", "From here, we can get to work." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Fisher's Iris data set\n", "\n", "Fisher's Iris data set is available from the UCI Machine Learning Repository. We assume that it is stored in the `data/iris` directory, prepared as follows.\n", "\n", "```\n", "$ mkdir -p data/iris\n", "$ cd data/iris\n", "$ wget [ URL ]/bezdekIris.data\n", "$ wget [ URL ]/iris.data\n", "$ wget [ URL ]/iris.names\n", "```\n", "\n", "Here `[ URL ]` stands for:\n", "\n", "```\n", "https://archive.ics.uci.edu/ml/machine-learning-databases/iris\n", "```\n", "\n", "From here, we can get to work." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Visual image data\n", "\n", "The *vim-2* data set, also known as the \"Gallant Lab Natural Movie 4T fMRI Data set\", is available from the website of Collaborative Research in Computational Neuroscience (CRCNS), at the following URL:\n", "\n", " https://crcns.org/data-sets/vc/vim-2\n", "\n", "Downloading it requires a free account on CRCNS.org, which can be requested quickly via their \"Request Account\" page:\n", "\n", " https://crcns.org/request-account\n", " \n", "Applications are screened, so it may take a day or two before yours is (hopefully) accepted.\n", "\n", "If you are downloading to your local machine, logging in and downloading via your browser is perfectly acceptable. If you are computing on a remote server, be it your own or a cloud-based one, it is best to use the download scripts that CRCNS provides:\n", "\n", " https://crcns.org/download\n", "\n", "Under \"Batch download method\", there is 
a link (https://portal.nersc.gov/project/crcns/download/tools) to a page that asks for your username and password. After logging in, you can browse the contents of `tools`. Inside `tools/download` there are a handful of files, including `crcns-download-tools-instuctions`, which explains how to set up the configuration file and how to use the download and verification scripts. Setup takes only a few minutes; just follow the lucid instructions and take a break while the files download.\n", "\n", "Once the raw data has been acquired, we assume that it is stored in the `data/vim-2` directory within your working directory.\n", "\n", "From here, we can get to work." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "___" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }