{ "metadata": { "name": "", "signature": "sha256:87c8d49710ff70019a7cc8e1008537d3731d7e25569e6a8a7af59aa96f08e78c" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Hands-on: Numerical Python -- Core tools for loading and storing data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Objectives**\n", "\n", "To get grasp of common utilities for loading data from/to common data formats. Upon completion of this class you should be able\n", "\n", "- to load data from CSV files as Python disctionary or NumPy array\n", "- pickle and unpickle data\n", "- load and save neuroimaging data" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "CSV (Comma separated values)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tabular data is often stored in CSV files, of which we have a little fake example:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!head files/exp_res.csv" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "builtin **csv** module can be a big help to load them up:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import csv" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "entries = list(csv.DictReader(open('files/exp_res.csv','r')))" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "which later could also be saved" ] }, { "cell_type": "code", "collapsed": false, "input": [ "with open('files/exp_res_saved.csv', 'w') as f:\n", " writer = csv.DictWriter(f, entries[0].keys())\n", " writer.writeheader()\n", " writer.writerows(entries)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!head files/exp_res_saved.csv" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- - -\n", "**Excercise**\n", "\n", "In our HW2/HW3 we stored response and response time in our custom format. Revisit your HW3 (you must have submitted by now, if not -- HW2), add cmdline option to store in CSV, upon which the same information should be stored as CSV format. Hint: You don't have to use `DictWriter` ;-)" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Pickling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And if dealing with any Python data type, regardless how nested they are, we can pickle those data structures (well, fancy work would be to \"serialize\") and store to a file:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pickle\n", "with open(\"files/exp_res.pickle\", \"w\") as f:\n", " pickle.dump(entries, f)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "so they could be stored in file and reloaded later on:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "with open(\"files/exp_res.pickle\") as f:\n", " entries_in_winter = pickle.load(f)\n", "\n", "print entries_in_winter == entries" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pickling is convenient but\n", "\n", "- Python-specific: not portable across languages/environments\n", "- Often not efficient for storing vast (numpy) data arrays\n", "- csv/txt files are more portable happen you have simple tabular data\n", "- you might like to look into [json](https://docs.python.org/2/library/json.html) if you want to go beyond tabular data while staying language agnostic, lightweight and standardized" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "NumPy I/O functionality" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Numpy's `loadtxt` provides a helper to load any tabular data from a text file" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a = np.loadtxt('files/exp_res.csv', delimiter=',', skiprows=1)\n", "print a[:2]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And gurus could use structured data types (which I successfully avoided to describe in the previous lecture)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a = np.loadtxt('files/exp_res.csv', delimiter=',', skiprows=1,\n", " dtype=[('Subject', int), ('Question', int), ('RT', float)])" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "print a[:2]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "print a[\"Question\"]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "print a[[\"Question\", \"RT\"]]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- - -\n", "\n", "**Excercise**\n", "\n", "Choose any method from above (csv, numpy) to load this sample data and compute per question mean/std. Do not assume that entries anyhow ordered. You might like to use \"np.unique\" function" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Neuroimaging data I/O: nibabel" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The holy grail library for anyone doing neuroimaging in Python. Provides I/O to a variety (but not all, yet) of neuroimaging formats making access to data and meta-information very easy." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import nibabel as nib" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Commands below make use of [git-annex]() to download tutorial data for PyMVPA. If you don't have annex installed -- you would benefit from installing it. Otherwise, the same data is available as a tarball from http://data.pymvpa.org/datasets/tutorial_data -- download and extract it." ] }, { "cell_type": "code", "collapsed": false, "input": [ "!git clone http://data.pymvpa.org/datasets/tutorial_data/.git" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!cd tutorial_data/data; git annex get anat.nii.gz mask_* bold.nii.gz" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To open any neuroimaging data file, use `.load`" ] }, { "cell_type": "code", "collapsed": false, "input": [ "anat = nib.load('tutorial_data/data/anat.nii.gz')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "which will open and load meta-information but not real data, which you can load if desired as a numpy array:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "anat_data = anat.get_data()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can now analyze that data as any other numpy array and whenever you are done -- save it as another Nifti image. E.g. let's perform `poor man skull stripping` with which we remove all voxels which carry values less than 50:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "anat_data[anat_data < 50] = 0" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "nib.Nifti1Image(anat_data, anat.get_affine(), anat.get_header()).to_filename('/tmp/anat_betted.nii.gz')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- - -\n", "\n", "**Excercise**\n", "\n", "Load functional volume (`bold.nii.gz`) and one of the masks. Apply the mask (zero out non-mask voxels), save resultant \"masked\" volume as a new `bold_masked.nii.gz`." ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }