{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# A4.1. Getting MODIS URLs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Access to MODIS data is now through `http`, which means that previous methods using `ftp` no longer operate.\n", "\n", "In some ways, this complicates automatic download (also, download seems now to be throttled, which means it takes longer to access the data).\n", "\n", "That said, you can of course still easily order data through NASA tools such as [reverb](http://reverb.echo.nasa.gov).\n", "\n", "Some tools have been developed to allow automated access to MODIS products from Python, such as [get_modis](https://github.com/jgomezdans/get_modis), but here, we will demonstrate how you can do it yourself, semi-automatically.\n", "\n", "We will see that a large part of the overhead and complexity is negotiating the directory structure.\n", "\n", "We have provided a shell programme [`zat`](files/python/zat) that will produce a list of urls of the MODIS products on the USGS server (use `zat > ` [`urls.txt`](files/data/robot.txt)), which you would probably find more convenient than this section. \n", "\n", "So, only go through section A1 if you are particularly intererested in trawling directories with http ...\n", "\n", "Once we have a full list of the urls of the hdf files that we want, life is much simpler. Such a list of urls is *exactly* what [reverb](http://reverb.echo.nasa.gov) supplies you with." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A4.1.1 Identify the server and directory structure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, you need to identify which datasets you want. You should explore the data products e.g. through [reverb](http://reverb.echo.nasa.gov) to do this.\n", "\n", "If you go through the ordering system for one tile of these products, you can get the information you need for further data download. When you come to order the data, it will give you a download file.\n", "\n", "As an example:\n", "\n", "- MODIS LAI/fAPAR for Trerra and Aqua 8 day composite for 17 Jan 2013 for tile h18v03" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2013.01.17/MCD15A2.A2013017.h18v03.005.2013026065052.hdf\n", "http://e4ftl01.cr.usgs.gov/WORKING/BRWS/Browse.001/2013.01.26/BROWSE.MCD15A2.A2013017.h18v03.005.2013026065052.1.jpg\n", "http://e4ftl01.cr.usgs.gov/WORKING/BRWS/Browse.001/2013.01.26/BROWSE.MCD15A2.A2013017.h18v03.005.2013026065052.2.jpg\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2013.01.17/MCD15A2.A2013017.h18v03.005.2013026065052.hdf.xml" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A4.1.2 Identify the available dates" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From this, we see that the server is `e4ftl01.cr.usgs.gov`, that the `hdf` data (the spatial dataset we want) for the product `MCD15A2` version `005` is in the directory `MODIS_Composites/MOTA/MCD15A2.005`.\n", "\n", "Below that, we have the date and then the filename.\n", "\n", "Let's use `urllib2` to explore this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import urllib2\n", "url_base = 'http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005'\n", "response = urllib2.urlopen(url_base)\n", "html = response.read()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# print the first 30 lines\n", "html.split('\\n')[:30]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is an html directory listing. We can identify the directories as lines that contain `[DIR]`.\n", "\n", "We can use `find` to identify lines that have this field:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "dirs = []\n", "for line in html.split('\\n'):\n", " if line.find('[DIRS]'):\n", " dirs.append(line)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# or more succinctly\n", "dirs = [line for line in html.split('\\n') if line.find('[DIR]') != -1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "dirs[:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We notice that the first such line is the directory listing information, so, what we really want is:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "dirs = [line for line in html.split('\\n') if line.find('[DIR]') != -1][1:]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "dirs[:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The subdirectory name is jusr after the field `href=\"`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print dirs[1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print dirs[1].split('href=\"')[1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print dirs[1].split('href=\"')[1].split('/\">')[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, in this case, we can get the subdirectory names with:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "dirs = [line.split('href=\"')[1].split('/\">')[0] for line in html.split('\\n') if line.find('[DIR]') != -1][1:]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# print the first 10\n", "dirs[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The pattern is `YYYY.MM.DD`. So we could split these as we go along. It would be convenient to have this as a numpy array:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "dirs = np.array([line.split('href=\"')[1].split('/\">')[0].split('.') \\\n", " for line in html.split('\\n') if line.find('[DIR]') != -1][1:])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "dirs[:10]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "all_years = np.sort(np.unique(dirs[:,0]))\n", "all_months = np.sort(np.unique(dirs[:,1]))\n", "all_doys = np.sort(np.unique(dirs[:,2]))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "years,months,doys" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A4.1.3 Identify the datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We know the full url is of the form:\n", "\n", "`http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2013.01.17/MCD15A2.A2013017.h18v03.005.2013026065052.hdf\n", "`\n", "\n", "Simplifying what we did above:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import urllib2\n", "url_base = 'http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005'\n", "response = urllib2.urlopen(url_base)\n", "dirs = np.array([line.split('href=\"')[1].split('/\">')[0] for line in html.split('\\n') if line.find('[DIR]') != -1][1:])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "years = np.array([i.split('.')[0] for i in dirs])\n", "# year mask\n", "year = '2012'\n", "mask = (year == years)\n", "sub_dirs = dirs[mask]\n", "print sub_dirs" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# test with first one\n", "this_date = sub_dirs[0]\n", "\n", "url_date = url_base + '/' + this_date\n", "print url_date\n", "response1 = urllib2.urlopen(url_date)\n", "html1 = response1.read()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# print the first 21 lines\n", "html1.split('\\n')[:21]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We note that the directory contains data for all tiles.\n", "\n", "Lets filter only lines that have the tile we want in:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "tile = 'h18v03'\n", "lines = [line for line in html1.split('\\n') if line.find(tile) != -1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "lines" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want the `.hdf` file, so refine the filter:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "tile = 'h18v03'\n", "hdf_lines = [i for i in [line for line in html1.split('\\n') \\\n", " if line.find(tile) != -1] if i.find('.hdf\"') != -1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "hdf_lines" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now split this to get the filename we want:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "hdf_lines[0].split('')[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, putting all of that together:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "tile = 'h18v03'\n", "hdf_lines = [i for i in [line for line in html1.split('\\n') \\\n", " if line.find(tile) != -1] if i.find('.hdf\"') != -1]\n", "hdf_file = hdf_lines[0].split('')[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A4.1.4 Some code for MODIS LAI filenames for a year" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The http access is quite slow, so this may take some minutes to run." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.01.01\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.01.09\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.01.17\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.01.25\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.02.02\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.02.10\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.02.18\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.02.26\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.03.05\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.03.13\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.03.21\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.03.29\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.04.06\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.04.14\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.04.22\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.04.30\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.05.08\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.05.16\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.05.24\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.06.01\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.06.09\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.06.17\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.06.25\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.07.03\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.07.11\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.07.19\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.07.27\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.08.04\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.08.12\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.08.20\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.08.28\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.09.05\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.09.13\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.09.21\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.09.29\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.10.07\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.10.15\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.10.23\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.10.31\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.11.08\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.11.16\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.11.24\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.12.02\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.12.10\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.12.18\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.12.26\n" ] } ], "source": [ "year = '2012'\n", "tile = 'h17v03'\n", "\n", "\n", "hdf_files = []\n", "\n", "import urllib2\n", "\n", "# base URL for the product\n", "url_base = 'http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005'\n", "\n", "response = urllib2.urlopen(url_base)\n", "html = response.read()\n", "\n", "dirs = np.array([line.split('href=\"')[1].split('/\">')[0] for line in html.split('\\n') if line.find('[DIR]') != -1][1:])\n", "\n", "# identify years\n", "years = np.array([i.split('.')[0] for i in dirs])\n", "# year mask\n", "mask = (year == years)\n", "sub_dirs = dirs[mask]\n", "\n", "for this_date in sub_dirs:\n", " url_date = url_base + '/' + this_date\n", " print url_date\n", " response1 = urllib2.urlopen(url_date)\n", " html1 = response1.read()\n", " hdf_lines = [i for i in [line for line in html1.split('\\n') \\\n", " if line.find(tile) != -1] if i.find('.hdf\"') != -1]\n", " hdf_file = url_date + '/' + hdf_lines[0].split('')[0]\n", " hdf_files.append(hdf_file+'\\n')\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In case download fails later, lets save this list `hdf_files`. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "f = open('files/data/lai_list.txt','w')\n", "f.writelines(hdf_files)\n", "f.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A4.1.5 Pull Data from url" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This part is actually faster than doing all of that messing around with directories.\n", "\n", "You don't really want to have to do too much of the directory exploration, so it is *probably* a good idea to just periodically scan the whole structure and store that in a local file. You can then parse the local file much more easily (that is what we do in the main part of the class).\n", "\n", "This is achieved for instance with the shell [`zat`](files/python/zat) (use `zat > ` [`urls.txt`](files/data/robot.txt)) if you want to do an update, or just use the existing [url file](files/data/robot.txt)." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.01.01/MCD15A2.A2012001.h17v03.005.2012017211237.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.01.09/MCD15A2.A2012009.h17v03.005.2012019044037.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.01.17/MCD15A2.A2012017.h17v03.005.2012026072526.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.01.25/MCD15A2.A2012025.h17v03.005.2012052124839.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.02.02/MCD15A2.A2012033.h17v03.005.2012042060649.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.02.10/MCD15A2.A2012041.h17v03.005.2012050092057.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.02.18/MCD15A2.A2012049.h17v03.005.2012068144447.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.02.26/MCD15A2.A2012057.h17v03.005.2012068140544.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.03.05/MCD15A2.A2012065.h17v03.005.2012075021749.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.03.13/MCD15A2.A2012073.h17v03.005.2012083010304.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.03.21/MCD15A2.A2012081.h17v03.005.2012090131602.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.03.29/MCD15A2.A2012089.h17v03.005.2012107201245.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.04.06/MCD15A2.A2012097.h17v03.005.2012108125047.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.04.14/MCD15A2.A2012105.h17v03.005.2012116125519.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.04.22/MCD15A2.A2012113.h17v03.005.2012122072153.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.04.30/MCD15A2.A2012121.h17v03.005.2012137221611.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.05.08/MCD15A2.A2012129.h17v03.005.2012142001241.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.05.16/MCD15A2.A2012137.h17v03.005.2012153021910.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.05.24/MCD15A2.A2012145.h17v03.005.2012160130927.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.06.01/MCD15A2.A2012153.h17v03.005.2012166161748.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.06.09/MCD15A2.A2012161.h17v03.005.2012170080216.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.06.17/MCD15A2.A2012169.h17v03.005.2012181134242.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.06.25/MCD15A2.A2012177.h17v03.005.2012188150145.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.07.03/MCD15A2.A2012185.h17v03.005.2012208181105.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.07.11/MCD15A2.A2012193.h17v03.005.2012202144013.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.07.19/MCD15A2.A2012201.h17v03.005.2012215131931.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.07.27/MCD15A2.A2012209.h17v03.005.2012219144450.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.08.04/MCD15A2.A2012217.h17v03.005.2012228215213.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.08.12/MCD15A2.A2012225.h17v03.005.2012234105932.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.08.20/MCD15A2.A2012233.h17v03.005.2012242093511.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.08.28/MCD15A2.A2012241.h17v03.005.2012250182515.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.09.05/MCD15A2.A2012249.h17v03.005.2012261231425.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.09.13/MCD15A2.A2012257.h17v03.005.2012270114223.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.09.21/MCD15A2.A2012265.h17v03.005.2012276134731.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.09.29/MCD15A2.A2012273.h17v03.005.2012297134400.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.10.07/MCD15A2.A2012281.h17v03.005.2012297135831.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.10.15/MCD15A2.A2012289.h17v03.005.2012299194634.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.10.23/MCD15A2.A2012297.h17v03.005.2012306163257.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.10.31/MCD15A2.A2012305.h17v03.005.2012314140451.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.11.08/MCD15A2.A2012313.h17v03.005.2012322095802.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.11.16/MCD15A2.A2012321.h17v03.005.2012335133638.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.11.24/MCD15A2.A2012329.h17v03.005.2012340181739.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.12.02/MCD15A2.A2012337.h17v03.005.2012346165133.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.12.10/MCD15A2.A2012345.h17v03.005.2012356133200.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.12.18/MCD15A2.A2012353.h17v03.005.2012363125132.hdf\n", "http://e4ftl01.cr.usgs.gov/MODIS_Composites/MOTA/MCD15A2.005/2012.12.26/MCD15A2.A2012361.h17v03.005.2013007202756.hdf\n" ] } ], "source": [ "import urllib2\n", "\n", "f = open('files/data/lai_list.txt','r')\n", "hdf_files = f.readlines()\n", "f.close()\n", "\n", "for url in hdf_files:\n", " url = url.strip()\n", " print url\n", " response = urllib2.urlopen(url.strip())\n", " ofile = 'files/data/' + url.split('/')[-1]\n", " f = open(ofile,'w')\n", " f.write(response.read())\n", " f.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# A4.2 GDAL tools and HDF format" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[HDF](http://www.hdfgroup.org/HDF-FAQ.html)(Hierarchical Data Format) and [HDF-EOS](http://hdfeos.org/) are common formats for EO data so you need to have some idea how to use and manipulate them.\n", "\n", "A hierarchical data format is essentially a format that ‘packs’ together various aspects of a dataset (metadata, raster data etc.) into a binary file. There are many tools for manipulating and reading HDF in python, but we will use one of the more generic tools, [gdal](http://gdal.org) here.\n", "\n", "When using HDF files, we need to have some idea of the stucture of the contents, although you can clearly explore that yourself in an interactive session. MODIS products have extensive information available to help you interpret the datasets, for example the MODIS LAI/fAPAR product [MOD15A2](https://lpdaac.usgs.gov/products/modis_products_table/leaf_area_index_fraction_of_photosynthetically_active_radiation/8_day_l4_global_1km/mod15a2). We will use this as an example to explore a dataset.\n", "\n", "You will need access to the file [`files/data/MCD15A2.A2011185.h09v05.005.2011213154534.hdf`](files/data/MCD15A2.A2011185.h09v05.005.2011213154534.hdf), which you might access from the [MODIS Land Products site](https://lpdaac.usgs.gov/)\n", "\n", "Before going into the Python coding for GDAL, it is worthwhile looking over some of the tools that are provided with GDAL and that can be run from the shell. In particular, we can use the `gdalinfo` program, that takes a filename and will output a copious description of the data, including metadata, but also geogrpahic projection, size, number of bands, etc.\n", "\n", "Here, we will look at the first 20 lines that come out of `gdalinfo`:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Driver: HDF4/Hierarchical Data Format Release 4\r\n", "Files: files/data/MCD15A2.A2011185.h09v05.005.2011213154534.hdf\r\n", "Size is 512, 512\r\n", "Coordinate System is `'\r\n", "Metadata:\r\n", " ALGORITHMPACKAGEACCEPTANCEDATE=10-01-2004\r\n", " ALGORITHMPACKAGEMATURITYCODE=Normal\r\n", " ALGORITHMPACKAGENAME=MCDPR_15A2\r\n", " ALGORITHMPACKAGEVERSION=5\r\n", " ASSOCIATEDINSTRUMENTSHORTNAME=MODIS\r\n", " ASSOCIATEDINSTRUMENTSHORTNAME=MODIS\r\n", " ASSOCIATEDPLATFORMSHORTNAME=Aqua\r\n", " ASSOCIATEDPLATFORMSHORTNAME=Terra\r\n", " ASSOCIATEDSENSORSHORTNAME=MODIS\r\n", " ASSOCIATEDSENSORSHORTNAME=MODIS\r\n", " AUTOMATICQUALITYFLAG=Passed\r\n", " AUTOMATICQUALITYFLAGEXPLANATION=No automatic quality assessment is performed in the PGE\r\n", " CHARACTERISTICBINANGULARSIZE=30.0\r\n", " CHARACTERISTICBINSIZE=926.625433055556\r\n", " DATACOLUMNS=1200\r\n" ] } ], "source": [ "!gdalinfo files/data/MCD15A2.A2011185.h09v05.005.2011213154534.hdf | head -20" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use standard unix filters (e.g. `grep`) to look at particular fields:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " EASTBOUNDINGCOORDINATE=-92.3664205550513\n", " NORTHBOUNDINGCOORDINATE=39.9999999964079\n", " SOUTHBOUNDINGCOORDINATE=29.9999999973059\n", " WESTBOUNDINGCOORDINATE=-117.486656023174\n" ] } ], "source": [ "%%bash\n", "# Filter lines that do not have BOUNDINGCOORDINATE in them\n", "file=files/data/MCD15A2.A2011185.h09v05.005.2011213154534.hdf\n", "gdalinfo $file | grep BOUNDINGCOORDINATE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can check this against e.g. the [UNH MODIS tile calculator](http://remotesensing.unh.edu/modis/modis.shtml), just to confirm that we have interpreted the coordinates correctly.\n", "\n", "We can apply other shell GDAL tools, e.g. to perform a reprojection from the native [MODIS sinusoidal](http://modis-land.gsfc.nasa.gov/MODLAND_grid.html) projection, to the [Contiguous United States NAD27 Albers Equal Area](http://spatialreference.org/ref/sr-org/7271/):" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Creating output file that is 2152P x 1323L.\n", "Processing input file HDF4_EOS:EOS_GRID:files/data/MCD15A2.A2011185.h09v05.005.2011213154534.hdf:MOD_Grid_MOD15A2:Lai_1km.\n", "Using internal nodata values (eg. 255) for image HDF4_EOS:EOS_GRID:files/data/MCD15A2.A2011185.h09v05.005.2011213154534.hdf:MOD_Grid_MOD15A2:Lai_1km.\n", "0...10...20...30...40...50...60...70...80...90...100 - done.\n", "Input file size is 2152, 1323\n", "0...10...20...30...40...50...60...70...80...90...100 - done.\n" ] } ], "source": [ "%%bash\n", "# a bash script\n", "\n", "# set the variables file to be the filename for convenience\n", "file=files/data/MCD15A2.A2011185.h09v05.005.2011213154534.hdf\n", "\n", "# dselete the output file if it exists\n", "rm -f files/data/output_file.tif \n", "\n", "# reproject the data\n", "gdalwarp -of GTiff \\\n", " -t_srs '+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 \\\n", " +y_0=0 +ellps=clrk66 +units=m +no_defs' -tr 1000 1000 \\\n", " 'HDF4_EOS:EOS_GRID:'${file}':MOD_Grid_MOD15A2:Lai_1km' files/data/output_file.tif\n", "\n", "# convert to gif for viewing\n", "gdal_translate -outsize 30% 30% -of gif \\\n", " files/data/output_file.tif files/data/output_file.gif" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](files/data/output_file.gif)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "where `MCD15A2.A2011185.h09v05.005.2011213154534.hdf` is the name of the input HDF file, `MOD_Grid_MOD15A2:Lai_1km` is the data product we want, and the rather menacing string `+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23 +lon_0=-96 +x_0=0 +y_0=0 +ellps=clrk66 +units=m +no_defs` specifies the projection in Proj4 format. You can typically find the projection you want on [spatialreference.org](http://spatialreference.org), and just copy and paste the contents of [Proj4 definition](http://spatialreference.org/ref/sr-org/7271/proj4/) (remember to surround it by quotes). The option `-tr xres yres` specifies the desired resolution of the output dataset (1000 by 1000 m in the case above). `-of GTiff` specifies the GeoTiff format to be used as as output." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# A4.3 Further Geospatial Notes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are additonal notes that go into the details of `gdal` and vectors processing tools.\n", "\n", "Follow these on:\n", "\n", "- [gdal](GDAL_Python_bindings.ipynb)\n", "- [ogr](OGR_Python)\n", "\n", "There are no explicit advanced exercises this week. Instead, you should eplore these notes and see if you can apply the concepts to your own datasets." ] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 0 }