{ "metadata": { "name": "intro_to_pycap" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# General REDCap API Usage\n", "\n", "[Scott Burns](http://sburns.org), last updated 2013-07-12\n", "\n", "As far as I can tell, there isn't a tutorial on the internet about how to use the REDCap API. So here goes...\n", "\n", "[REDCap](http://project-redcap.org) is an advanced web-based application for securely storing and retrieving tabular data. In simple terms, it can be thought of as a web-based spreadsheet, though it is much more than that. It provides an **Application Programming Interface**, which means external software can programmatically download data from and upload data into REDCap Projects. This tutorial assumes working knowledge of REDCap. When all else fails, please consult your site's API help page, which is at https://redcap.vanderbilt.edu/api/help for Vanderbilt.\n", "\n", "Because the API is based on simple HTTP requests, any programming language with an HTTP library can use the REDCap API. I'm going to demonstrate simple API usage in python using the wonderful [requests](http://python-requests.org) library.\n", "\n", "To use the REDCap API, you must know the following:\n", "\n", "* The API URL for your site's REDCap installation. For Vanderbilt, this URL is `https://redcap.vanderbilt.edu/api/`.\n", "* The API token for your Project. A token is generated by the REDCap administrators and connects your user account to a particular REDCap Project. Therefore, if you have API access to many Projects, you will have many tokens to manage.\n", "\n", "## Basic Usage\n", "\n", "Every call to the REDCap API is an HTTP POST request with specific parameters in the payload. The `token` parameter is always required as this tells the API from which Project you're requesting a response. Next, the `content` parameter is used to declare the type of request you're making. 
Finally, you may want to include the `format` parameter as well, as this tells the API which format you want for the response. It defaults to returning a CSV string, but I generally prefer getting [json-formatted](http://www.json.org) responses as that format can be easily converted to actual in-memory objects like lists, dictionaries, strings, etc.\n", "\n", "So let's begin by making the simplest request, exporting the Project's Metadata (AKA Data Dictionary)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from requests import post\n", "# Two constants we'll use throughout\n", "TOKEN = '8E66DB6844D58E990075AFB51658A002'\n", "URL = 'https://redcap.vanderbilt.edu/api/'\n", "\n", "payload = {'token': TOKEN, 'format': 'json', 'content': 'metadata'}\n", "\n", "response = post(URL, data=payload)\n", "print response.status_code" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "200\n" ] } ], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "A few things to talk about here:\n", "\n", "- At least at Vanderbilt, don't forget the trailing slash at the end of the API URL string. Your site may differ but if you mess up the URL, nothing will work and you'll probably get 501 \"Method Not Implemented\" responses.\n", "- Under no circumstance should you ever publicize your Project token. This is like publishing the password you use to log in to REDCap, which you would never do. In this instance, however, this token is from a dummy project I use to test things with. There's no real data and definitely not any PHI in it, so I'm not super worried.\n", "\n", "But just to be clear:\n", "\n", "## Under no circumstances should you publicize your project token(s)!\n", "\n", "You've been warned. (If you do publicize them for whatever reason, don't fret. 
Just delete those tokens through the web app ASAP and request new tokens).\n", "\n", "With that out of the way, the API accepted our request and returned data with a '200' status, which means \"everything is peachy\" in HTTP.\n", "\n", "Now let's examine our metadata a bit. The `.json()` method I'm going to use just decodes the response (every language's JSON library will work a bit differently, though)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "metadata = response.json()\n", "print \"This project has %d fields\" % len(metadata)\n", "print\n", "print \"field_name (type) ---> field_label\"\n", "print \"---------------------------\"\n", "for field in metadata:\n", " print \"%s (%s) ---> %s\" % (field['field_name'], field['field_type'], field['field_label'])\n", "print \n", "print 'Every field has these keys: %s' % ', '.join(sorted(metadata[0].keys()))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "This project has 11 fields\n", "\n", "field_name (type) ---> field_label\n", "---------------------------\n", "study_id (text) ---> Study ID\n", "first_name (text) ---> First Name\n", "last_name (text) ---> Last Name\n", "dob (text) ---> Date of Birth\n", "sex (dropdown) ---> Gender\n", "address (notes) ---> Street, City, State, ZIP\n", "phone_number (text) ---> Phone number\n", "file (file) ---> File\n", "foo_score (text) ---> Test score for Foo test\n", "bar_score (text) ---> Test score for Bar test\n", "image_path (text) ---> image_path\n", "\n", "Every field has these keys: branching_logic, custom_alignment, field_label, field_name, field_note, field_type, form_name, identifier, matrix_group_name, question_number, required_field, section_header, select_choices_or_calculations, text_validation_max, text_validation_min, text_validation_type_or_show_slider_number\n" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The returned json decodes to a list of 
`dict` objects (python's name for hash tables). We see that there are 11 fields in this project. For each one, we print the `field_name` (the \"machine\" name for a field) along with its type and the `field_label` (the human-readable description). Finally, I just print out all of the keys from the first field so we can look at all of the data that comes with each field.\n", "\n", "For all intents and purposes, this data structure is what we get when we manually download the Data Dictionary from our project, just in a slightly more machine-readable format.\n", "\n", "## Data Export\n", "\n", "Here's the fun part. Just tweak the request payload a little and we'll download all of the data from our project:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "payload['content'] = 'record'\n", "payload['type'] = 'flat' # we want each row to contain the entire record\n", "response = post(URL, data=payload)\n", "data = response.json()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Voilà, we've just downloaded all of the data from our project. Let's examine it." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "print \"This project has %d records\" % len(data)\n", "\n", "print \"Each record has the following keys: %s.\" % ', '.join(data[0].keys())\n", "print \n", "print \"But our metadata structure has the following fields: %s!\" % ', '.join(f['field_name'] for f in metadata)\n", "print" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "This project has 3 records\n", "Each record has the following keys: phone_number, first_name, last_name, image_path, dob, demographics_complete, foo_score, sex, study_id, file, address, imaging_complete, testing_complete, bar_score.\n", "\n", "But our metadata structure has the following fields: study_id, first_name, last_name, dob, sex, address, phone_number, file, foo_score, bar_score, image_path!\n", "\n" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "You'd be wrong to assume the fields we get from exporting the data match the `field_name`s from the metadata structure. This is because the REDCap API also returns the status of all of the forms for a particular record. These fields are always called `[form name]_complete` where `[form name]` is the lowercased & underscore-replaced version of the form names you see in the web application. 
(You would be correct to assume the fields from an export are a superset of the fields from the metadata structure.)\n", "\n", "We can examine a particular record like so:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "record = data[0]\n", "for field_name, value in record.items():\n", " print \"%s: %s\" % (field_name, value)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "phone_number: (615) 555-1234\n", "first_name: Billy Bob\n", "last_name: blah blah\n", "image_path: /path/to/image\n", "dob: 2000-01-01\n", "demographics_complete: 2\n", "foo_score: 100\n", "sex: 1\n", "study_id: 1\n", "file: [document]\n", "address: 123 Main Street, Anytown USA 23456\n", "imaging_complete: 2\n", "testing_complete: 2\n", "bar_score: 2\n" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pretty neat. Within the `payload` that you send to the API, you can specify parameters that limit the response to specific records, fields, forms or events (if your Project is longitudinal), and that choose between raw values and human-readable labels for multiple-choice fields. Experimenting with these calls is left to the reader.\n", "\n", "## Importing new data\n", "\n", "Even fancier than exporting current data from the Project is updating records through the API. This payload looks a little different, though. We've got to encode the data that we want to import and attach it to the payload." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "from json import dumps # the function we'll need to make a json-string of our new data\n", "\n", "updated_record = data[0]\n", "# Update a particular field\n", "updated_record['foo_score'] = '100'\n", "\n", "# We have to pass a list of records to the REDCap API, so we dump our new record inside a list\n", "# and we need to specify how to format the json string\n", "to_import_json = dumps([updated_record], separators=(',', ':'))\n", "payload['data'] = to_import_json\n", "\n", "response = post(URL, data=payload)\n", "print response.json()['count']" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1\n" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Real quickly:\n", "\n", "- We updated a field from the first record.\n", "- We made a json-formatted string of this data structure (after packing it into a list because that's what the API wants).\n", "- We attached this data to the `data` field of the payload and made the request to the API.\n", "- By default when importing data, the API will respond with a dict with the key `count`. This number is how many records you imported. You can see here that we imported one record.\n", "\n", "You might be wondering to yourself, how did the API know which record to update? That information is specified in the `study_id` field because `study_id` is the primary key of the Project, which is by definition the first field in the metadata (take this opportunity to look back and see that `study_id` was in fact the first field).\n", "\n", "Note, we formatted the incoming data as json because that was the format we specified in the `format` parameter of the payload. You could just as easily import data formatted as CSV or XML if you change that parameter.\n", "\n", "Exporting and importing data are the two most important methods of the API. 
You can also download, upload and delete files stored in `file` fields per record but doing this is different for every HTTP library so I'll let you figure it out for your programming language :)\n", "\n", "That brings us to the end of how to use the REDCap API generally. I've implemented everything above in python, but you're free to use whatever language you like as long as it has an HTTP library.\n", "\n", "That being said, python is a fantastic language with great libraries for high-level data manipulation like [pandas](http://pandas.pydata.org), low-level data structures like [NumPy](http://www.numpy.org), and scientific libraries like [SciPy](http://www.scipy.org). Python is also very popular in web development communities so there are web frameworks like [Django](https://www.djangoproject.com) and [Flask](http://flask.pocoo.org) in case you want to build websites or applications. If you need to do some advanced task, there probably exists a python package to help you on your way. It's a great platform to build all sorts of tools.\n", "\n", "# Using the REDCap API in Python Applications\n", "\n", "To make it easier to use the REDCap API from within python scripts and applications, I wrote [`PyCap`](http://sburns.github.io/PyCap). I'll assume a Mac OS X or Linux environment, though all of this should work on Windows. I'll also assume working knowledge of the shell and the python language.\n", "\n", "First, we must install the package. In a shell:\n", "\n", "`$ pip install PyCap`\n", "\n", "If you don't have `pip` installed, this will work (you really should, though; `easy_install` is considered deprecated by much of the python community):\n", "\n", "`$ easy_install PyCap`\n", "\n", "You may notice another package, `requests`, is installed as well.\n", "\n", "With installation out of the way, let's start writing python. We'll begin with importing the package. 
The two main classes your scripts and applications should use are the `Project` class and the `RedcapError` exception." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from redcap import Project, RedcapError" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "(As long as this import doesn't fail, you installed `PyCap` correctly).\n", "\n", "## Connecting to REDCap Projects\n", "\n", "Just like above, you'll need to know your API token and URL for your site." ] }, { "cell_type": "code", "collapsed": false, "input": [ "project = Project(URL, TOKEN)\n", "\n", "for field in project.metadata:\n", " print \"%s (%s) ---> %s\" % (field['field_name'], field['field_type'], field['field_label'])\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "study_id (text) ---> Study ID\n", "first_name (text) ---> First Name\n", "last_name (text) ---> Last Name\n", "dob (text) ---> Date of Birth\n", "sex (dropdown) ---> Gender\n", "address (notes) ---> Street, City, State, ZIP\n", "phone_number (text) ---> Phone number\n", "file (file) ---> File\n", "foo_score (text) ---> Test score for Foo test\n", "bar_score (text) ---> Test score for Bar test\n", "image_path (text) ---> image_path\n" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you create a `Project`, PyCap automatically exports the metadata from your project. First, it does so to set up a few nice attributes on the object but more importantly, if the metadata request works correctly, the URL and token are correct and can be trusted to work later on.\n", "\n", "All of the methods the API provides are available. 
To demonstrate what we did above, consider the following:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "metadata = project.export_metadata()\n", "data = project.export_records()\n", "data[0]['first_name'] = 'Billy Bob'\n", "response = project.import_records(data)\n", "print response['count']" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "3\n" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "In these 5 lines, we:\n", "\n", "- Made an export metadata request (by default in JSON format), then automatically decoded it.\n", "- Made a data export request (again, by default in JSON format) and returned the decoded data.\n", "- Tweaked a single field of the first record.\n", "- Imported the new data.\n", "- Printed how many records were imported.\n", "\n", "All of the HTTP request machinery, making sure the payloads are correct, and encoding and decoding the JSON responses is handled for you. I wrote PyCap because I think most people just want their data and shouldn't have to know HTTP to make it happen. Trust me, I made a lot of mistakes in building this library. You should use it so you don't have to waste your time.\n", "\n", "## File downloads/uploads/deletions\n", "\n", "I didn't really go through file actions above because every HTTP library is going to deal with files differently. If you use PyCap, file operations are super simple:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "record = '1'\n", "field = 'file'\n", "contents, headers = project.export_file(record, field)\n", "print contents\n", "print headers['name']" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Just some data, you know.\n", "data.txt\n" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Obviously, the most important returned data is the file contents. 
In the web-application, the filename you see for this particular record/field is what comes through in `headers['name']`. So if you want to save it to your local hard drive, it's easy to keep the same name." ] }, { "cell_type": "code", "collapsed": false, "input": [ "with open(headers['name'], 'w') as f:\n", " f.write(contents)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just FYI, if you download a stored PDF, `contents` will be the binary data string and you'll want to open the file in `wb` mode.\n", "\n", "Let's say we want to upload a new file to that record. A little more complicated, but still pretty easy." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# First write a new file\n", "with open(headers['name'], 'w') as f:\n", " f.write('Yeah, I decided to change the contents of the file')\n", "\n", "new_fname = 'new_data.txt'\n", "with open(headers['name'], 'r') as f:\n", " response = project.import_file(record, field, new_fname, f)\n", " \n", "# just to check...\n", "contents, headers = project.export_file(record, field)\n", "print contents\n" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Yeah, I decided to change the contents of the file\n" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And if you really want to delete a file from REDCap, that too is possible. **Warning:** there is no undo button for this :)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "response = project.delete_file(record, field)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is more documentation for PyCap [here](http://sburns.github.io/PyCap)." 
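, "\n", "\n", "Coming back to the file downloads for a moment: here's a minimal sketch of a helper that saves an exported file under the name REDCap reports, always in binary mode so PDFs and images come through untouched. `save_exported_file` is a made-up name of mine, not part of PyCap, and the `contents` and `headers` values below are stand-ins shaped like what `project.export_file` returned above, so nothing here touches the network.\n", "\n", "```python\n", "import os\n", "import tempfile\n", "\n", "def save_exported_file(contents, headers, directory='.'):\n", "    # Hypothetical helper (not part of PyCap): write exported contents\n", "    # into `directory` under the filename REDCap reported.\n", "    # basename() drops any directory parts hiding in the stored name\n", "    filename = os.path.basename(headers['name'])\n", "    path = os.path.join(directory, filename)\n", "    # Text that arrives as unicode is encoded so it can be written as bytes\n", "    if not isinstance(contents, bytes):\n", "        contents = contents.encode('utf-8')\n", "    # 'wb' writes the bytes exactly as given\n", "    with open(path, 'wb') as f:\n", "        f.write(contents)\n", "    return path\n", "\n", "# Stand-ins shaped like the (contents, headers) pair from export_file\n", "contents = b'Just some data, you know.'\n", "headers = {'name': 'data.txt'}\n", "saved_path = save_exported_file(contents, headers, tempfile.mkdtemp())\n", "print(saved_path)\n", "```\n", "\n", "Writing in `wb` mode every time means you don't have to guess whether a given field holds text or binary data."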
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Feedback/Questions/Comments\n", "\n", "Any feedback about this tutorial is greatly appreciated. There isn't much on the internet about this so I hope you find it helpful in your work with REDCap. Feel free to email me with questions & comments at `scott.s.burns@vanderbilt.edu`" ] } ], "metadata": {} } ] }