{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## 02. Fetching binary data with `urllib` and unzipping files with `zipfile`\n", "\n", "If you can get the web address, or URL, of a specific binary file that you found on some website, you can usually download it fairly easily using Python's native `urllib` package, which is a simple interface for interacting with network resources. \n", "\n", "Here, we demonstrate how to use the `request` sub-module of `urllib` package to send a request to a server and handle the repsonse. While `request` can do much more than just download files, we'll keep ourselves to its `request.urlretrieve` function, which is designed to fetch remote files, saving them as local files.. \n", "\n", "We'll use the urllib.urlretrive to download a Census tract shapefile located on the US Census's web server: https://www2.census.gov/geo/tiger/TIGER2017/TRACT. The one file we'll get is tracts for North Dakota (because it's a fairly small file): `tl_2017_38_tract.zip`. \n", "\n", "We'll also take this opportunity to examine how Python can unzip files with the `zipfile` library." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Import the two modules\n", "from urllib import request\n", "import zipfile" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Specify the URL of the resource\n", "theURL = 'https://www2.census.gov/geo/tiger/TIGER2017/TRACT/tl_2017_38_tract.zip'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Set a local filename to save the file as\n", "localFile = 'tl_2017_38_tract.zip'" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#The urlretrieve function downloads a file, saving it as the file name we specify\n", "request.urlretrieve(url=theURL,filename=localFile)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now take a look at the contents of your folder. You'll see a local copy of the zip file!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!dir *.zip" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This works for images too..." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "imgURL = 'https://imgs.xkcd.com/comics/state_borders.png'\n", "request.urlretrieve(url=imgURL,filename=\"map.jpg\",);" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "#Display the file in our notebook\n", "from IPython.display import Image\n", "Image(\"map.jpg\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So now, let's look at how the zipfile module can unzip it" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#First open the local zip file as a zipFile object\n", "zipObject = zipfile.ZipFile(localFile)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Create a folder to hold the file\n", "\n", "#Name the folder we'll create the same as the file, without the extension\n", "outFolderName = localFile[:-4]\n", "\n", "#Well us the os module to do check if the folder exists, and create if not\n", "import os\n", "if not os.path.exists(outFolderName): \n", " outFolder = os.mkdir(localFile[:-4])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "zipObject.extractall(path=outFolderName)\n", "zipObject.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And viola! If we use some looping, we can tell Python to download a lot of files at our command!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 2 }