{ "cells": [ { "cell_type": "markdown", "metadata": { "geopyter": { "Contributors": [ "Michele Ferretti (https://github.com/miccferr)", "Jon Reades (https://github.com/jreades)", "James Millington (https://github.com/jamesdamillington)", "Jon Reades (https://github.com/jreades)" ], "git": { "active_branch": "master", "author.name": "Jon Reades", "authored_date": "2017-06-05 16:12:59", "committed_date": "2017-06-05 16:12:59", "committer.name": "Jon Reades", "sha": "2fce1fab4c3c28acabee493a24302e5ee1e8a1eb" } } }, "source": [ "# Notebook-2: Thinking Like a Computer\n", "\n", "In order to understand how to _program_ a computer, it helps to learn how to _think_ like a computer. At least a little bit. A lot of what we do in programming strips back the veneer of point-and-click friendliness of OSX or Windows so that we can interact with the computer in ways that require much less human input. The world will seem a bit less friendly in the short term, but you will gain the ability to work much more quickly and powerfully with the computer through code." ] }, { "cell_type": "markdown", "metadata": { "geopyter": { "Contributors": [ "Michele Ferretti (https://github.com/miccferr)", "Jon Reades (https://github.com/jreades)", "James Millington (https://github.com/jamesdamillington)", "Jon Reades (https://github.com/jreades)" ], "git": { "active_branch": "master", "author.name": "Jon Reades", "authored_date": "2017-06-05 16:12:59", "committed_date": "2017-06-05 16:12:59", "committer.name": "Jon Reades", "sha": "2fce1fab4c3c28acabee493a24302e5ee1e8a1eb" } } }, "source": [ "## What _is_ a computer?\n", "\n", "At it's most basic, a computer is a programmable device for performing calculations.\n", "\n", "_This_ is a kind of computer.\n", "\n", "\n", "\n", "\n", "As is _this_.\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are a huge number of videos on YouTube and resources accessible from Google that delve into more detail than we possibly can here about what is happening inside your computer." ] }, { "cell_type": "markdown", "metadata": { "geopyter": { "Contributors": [ "Michele Ferretti (https://github.com/miccferr)", "Jon Reades (https://github.com/jreades)", "James Millington (https://github.com/jamesdamillington)", "Jon Reades (https://github.com/jreades)" ], "git": { "active_branch": "master", "author.name": "Jon Reades", "authored_date": "2017-06-05 16:12:59", "committed_date": "2017-06-05 16:12:59", "committer.name": "Jon Reades", "sha": "2fce1fab4c3c28acabee493a24302e5ee1e8a1eb" } } }, "source": [ "### What's Going on Inside Your Computer?\n", "\n", "If you've never really got to grips with what is happening inside a computer, then [this TedED video](http://www.youtube.com/watch?v=AkFi90lZmXA) would be a good way to get started because it helps to explain the basics of things like I/O and what actually happens when you click with the mouse on a button. In fact, you will see that we've used code to import the YouTube video in a way that requires me to do very little work and this is one of the strengths of programming: that someone else created the code to embed a YouTube video into a notebook (which is what this web page is) and all I need to do is know how to ask that code to find the video on the YouTube web site. Everything else happens automatically." ] }, { "cell_type": "markdown", "metadata": { "geopyter": { "Contributors": [ "Michele Ferretti (https://github.com/miccferr)", "Jon Reades (https://github.com/jreades)", "James Millington (https://github.com/jamesdamillington)", "Jon Reades (https://github.com/jreades)" ], "git": { "active_branch": "master", "author.name": "Jon Reades", "authored_date": "2017-06-05 16:12:59", "committed_date": "2017-06-05 16:12:59", "committer.name": "Jon Reades", "sha": "2fce1fab4c3c28acabee493a24302e5ee1e8a1eb" } } }, "source": [ "[![What's Going On in There?](http://img.youtube.com/vi/AkFi90lZmXA/0.jpg)](http://www.youtube.com/watch?v=AkFi90lZmXA)" ] }, { "cell_type": "markdown", "metadata": { "geopyter": { "Contributors": [ "Michele Ferretti (https://github.com/miccferr)", "Jon Reades (https://github.com/jreades)", "James Millington (https://github.com/jamesdamillington)", "Jon Reades (https://github.com/jreades)" ], "git": { "active_branch": "master", "author.name": "Jon Reades", "authored_date": "2017-06-05 16:12:59", "committed_date": "2017-06-05 16:12:59", "committer.name": "Jon Reades", "sha": "2fce1fab4c3c28acabee493a24302e5ee1e8a1eb" } } }, "source": [ "### How a Computer Adds Numbers\n", "\n", "[This next video](http://www.youtube.com/watch?v=VBDoT8o4q00) is a little more technical and we don't really expect you to remember it, but it touches on a lot of really important concepts: binary numbers, Boolean logic, and how these basic building blocks are assembled into much more complex processes like adding numbers or, ultimately, manipulating data.\n", "\n", "[![IMAGE ALT TEXT HERE](http://img.youtube.com/vi/VBDoT8o4q00/0.jpg)](http://www.youtube.com/watch?v=VBDoT8o4q00)" ] }, { "cell_type": "markdown", "metadata": { "geopyter": { "Contributors": [ "Michele Ferretti (https://github.com/miccferr)", "Jon Reades (https://github.com/jreades)", "James Millington (https://github.com/jamesdamillington)", "Jon Reades (https://github.com/jreades)" ], "git": { "active_branch": "master", "author.name": "Jon Reades", "authored_date": "2017-06-05 16:12:59", "committed_date": "2017-06-05 16:12:59", "committer.name": "Jon Reades", "sha": "2fce1fab4c3c28acabee493a24302e5ee1e8a1eb" } } }, "source": [ "The really important thing to get from this last video is that computers are chaining together long sets of simple operations which always basically work out to 1 or 0, which is the same as True or False. This is [Boolean logic](http://computer.howstuffworks.com/boolean.htm) and it is integral to computation _and_ to data processing, but you should always keep in mind that a huge set of calculations are going on in your computer in an order specified by a set of *rules*: do 'A', then do 'B', then... When these rules become sufficiently complex (they become like long recipes!) they are called algorithms. And when they get so complicated that they are not easy to write down as a set of logical outputs, it's often easier to express in a more human-readable form... which is why we have [programming languages](http://www.computerhope.com/jargon/p/proglang.htm).\n", "\n", "But remember: finding the average of a set of numbers involves an algorithm (which, in a digital comupter, is based on lots of logical operations involving 1s and 0s). And calculating the probability that the lecturer won't show up to the first lecture also involves an algorithm, it's just that it's a much more complicated one unless you take matters into your own hands and arrange for an accident..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Working Without Buttons\n", "\n", "At _some_ point in your exploration of programming you will need to learn how to read/write data from elsewhere on the computer or the Internet. At that point, for anyone who learned how to use a computer after about 1990, computers can quickly come to seem very unfriendly indeed! That's because when you are programming you will need to give the computer a lot more information about how to find, read, and write data. \n", "\n", "When you use an app on your mobile phone or on your computer it will often save your files somewhere convenient so that you can find them again, but this isn't magic: a programmer made a choice about how to do this for you, and now that you're learning to program _you_ have to make that choice." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Paths\n", "\n", "Have you ever thought about how and where files are stored on your computer? Probably not, unless you were very, very bored. Unfortunately, when you start programming you _do_ need to learn a bit more about this -- enough, at least, to tell Python where to find the file that you want it to read... though you can do much more than that!\n", "\n", "A few starting principles:\n", "\n", "1. Directories (a.k.a. folders) and files all have a unique location _somewhere_ on your hard drive.\n", "\n", "2. A directory (or folder) can contain directories and files. A _simple_ file cannot contain a directory. Only special _types_ of file such as a Zip archive can contain a folder.\n", "\n", "3. The directory that sits at the bottom of the hierarchy (_i.e._ the _one_ directory that is not _in_ another directory) is known as the **root directory**.\n", "\n", "4. The directory in which _your_ settings and documents are saved (_i.e._ the stuff associated with your *username*) is known as the **home directory**.\n", "\n", "5. A file must be stored in a directory (there are no root files)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Paths in the Finder\n", "\n", "We'll get to a video in a second, but on a Mac you can 'view' the path in the Finder simply by clicking on any finder window and typing `Alt + Command + P` (or selecting `View > Path` from the menu bar). This will show the current path in a strip along the bottom of the finder window. Pay attention to the Path view in the movie below.\n", "\n", "[![The Path (Graphical Version)](http://img.youtube.com/vi/U0hrKz5Ajrg/0.jpg)](http://www.youtube.com/watch?v=U0hrKz5Ajrg \"The Path (Graphical Version)\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Paths in a Terminal\n", "\n", "Now, sooner or later you are going to have to learn how to use the Terminal (a.k.a. Shell) because it will make certain tedious tasks go much, much faster. It is also by far the best way to install Python libraries or other 'libraries' that you need to develop your code. Learning to the use the Terminal is also going to help a _lot_ in learning to fix subtle problems with your program (e.g. checking that you are reading the _right_ data). \n", "\n", "So here's the _same_ process of navigating from a directory called `KCL Modules` down to the 2017-18 Geocomputation Teaching directory as you just saw above, but using the Terminal:\n", "\n", "[![The Path (Terminal Version)](http://img.youtube.com/vi/qxvhsuOKGaY/0.jpg)](http://www.youtube.com/watch?v=qxvhsuOKGaY \"The Path (Terminal Version)\")\n", "\n", "You'll notice that there were several seemingly cryptic commands -- we'll examine them in more detail below, but the important ones to note in this video are:\n", "\n", "- `ls` *lists* the contents of the current directory\n", "- `cd` *c*hanges *d*irectory\n", "\n", "We always say that programmers are lazy and this is a good example of that: why write `list` when we can write `ls` or `change directory` when we can write `cd`? You obviously need to learn what those bits of laziness mean, but they can help a lot in speeding up your code!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Navigating Using the Terminal\n", "\n", "You need to think of the terminal 'prompt' as having a 'location in' your computer: this works the same as clicking around through the Finder (or Windows Explorer) in that you have to move 'down' into sub-folders or 'up' into parent directories to see what's available at that 'level' of the drive. \n", "\n", "So in the movie you've just watched:\n", "1. I started the movie while in a directory called `KCL Teaching` that (you see at the end) sits under `/Users/my.blurred.username/Documents/`.\n", "2. I listed the files and directories in `KCL Teaching` using `ls`.\n", "3. I then changed directory (using `cd`) into the directory called `KCL Modules` (and note that, because there's a space in the directory name I had to write this as `KCL\\ Modules`... that's because the Terminal couldn't otherwise tell if we meant to move to a directory called `KCL` and then run a command called `Modules`; it's a long story).\n", "4. I then changed directory again into a directory called `Undergraduate` -- if you're keeping track this now means that we're 'in' `/Users/my.blurred.username/Documents/KCL\\ Teaching/KCL\\ Modules/Undergraduate/`\n", "5. We carry on navigating down the hierarchy until we reach the 2017-18 teaching folder, at which point I print the working directory to show you where we've ended up and that it's the same location as we reached using the Finder in the other video. \n", "\n", "For people who are used to just saving a document pretty much anywhere on their computer (or iPad or iPhone) and having it accessible via Spotlight/Search functions this can take a _lot_ of getting used to, but remember that in learning to program we're gradually stripping away the bells and whistles that sit between you and what the computer's actually doing. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Chaining\n", "\n", "In and of itself the advantages of the Terminal might not be obvious, but we can also _chain_ commands together to do several things in one go. **You do not _need_ to run the example below** because it's for illustrative purposes:\n", "\n", "```\n", "curl https://raw.githubusercontent.com/kingsgeocomp/geocomputation/master/CitiesWithWikipediaData.csv | head > Header.csv\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**_Note_**: If you _do_ want to run the code above then you will probably need to do some prep-work:\n", "* **On a Mac** you will need to run (from the Terminal!) the following command: `xcode-select --install`. This installs the `curl` utility along with a bunch of other useful applications.\n", "* **On Windows** you will probably need to run something like `conda install posix` but unless you want to donate a Windows machine I can't test this.\n", "* Once that's done you should type all of the above (or copy+paste) on to _one new line_ and you should see the first line of this file printed to the Terminal." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This command might seem _really_ cryptic but we can break this problem down into steps (as I did when trying to remember how to do it!):\n", "\n", "1. `curl` -- this is a tool (hopefully installed on your computer!) that allows you to download a file from the internet using only the Terminal.\n", "2. `head` -- this utility works with the _top_ part of a file (`tail` starts working with the _end_ of a file)\n", "3. We 'glue' the output of `curl` together with `head` using `|` (known as a 'pipe') -- this tells the computer to pass the output of `curl` to the input of `head` (i.e. we are _chaining_ the commands together).\n", "4. `head` alone returns the first _ten_ lines of the file (but if we used `head -5`, for example, we would get the first _five_ lines of the file - you can read more about the `head` command here).\n", "5. We then direct the output of `head` to a file called `Header.csv` using the redirection command `>`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So this one line of _code_ allows you to download a file, extract the first line of the file (so that we can see what the variables are!), and then write it to a file on the local computer. Assuming that you didn't see any errors, you should now have a file called `Header.csv` in your home directory containing the first 10 lines of the file! Go check this in your home directory. \n", "\n", "The point of this is that we are gaining in power at the cost of 'ease of use'. Doing this same task _without_ using code would require: navigating to the web page, clicking the link to download the file, opening the file in a text editor or Excel, selecting and copying the first line, and then opening a new file, pasting in the copied data, and saving the file with a name and location! Easier the first time round, but much more work in the long run!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Delayed Gratification\n", "\n", "As you will experience, learning to program involves a _lot_ of delayed gratification: you will need to invest quite a bit of time in learning to crawl before you can walk, and even more time in learning to walk before you can run. A well-designed programming language like Python can make it a little bit _easier_ to learn each step, but it still _**won't make it easy**_. \n", "\n", "We've said this before and we'll say it again (and again, and again) that you _must_ think of this as learning a language: at first very little will make sense and you won't be able to say much more than 'hello, my name is X', but if you _practice_ and really _think_ about what you're doing it will get easier and easier, and faster and faster, to program.\n", "\n", "Learning to navigate around your hard drive using the Terminal is a good case in point: why learn to do this when you could just dump all of your data into one directory and then never need to think about it again? Or navigate around by clicking on folders and files using the mouse?\n", "\n", "Two reasons:\n", "\n", "1. Because you will need to think _logically_ and in an _organised_ way about how you manage data when you start doing data analysis: you will almost never receive raw data that doesn't require some work; so if you save your raw data and your processed data in the same directory how can you be sure you've loaded the right file? It's much easier to have two folders -- `raw` or `source`, and `clean` or `output` -- and use Python to read in raw data from one directory and then write the processed data out to a different directory than it would be using Excel's `File > Save As`. That's because the path is just _text_ and you can easily work with it that way!\n", "\n", "2. Because once you've learned how to use the path on your _own_ computer, you can use the _same_ techniques to navigate a web server or computer on the other side of the planet using URLs (which are just paths with some special 'sauce' that tell Python that the file isn't on _your_ computer).\n", "\n", "Here's an example that does the _same_ thing (and then some) as the Terminal code above, but using Python code instead:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import csv\n", "import requests as r\n", "\n", "url = 'http://bit.ly/2iIK9bA'\n", "data = r.get(url)\n", "content = data.content.decode('utf-8').splitlines()\n", "cr = csv.reader(content)\n", "\n", "for row in cr:\n", " if row[3] != 'Population':\n", " print(row[1] + \" has a population of \" + \"{:,}\".format(int(row[3])))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just to highlight what we've just done:\n", "\n", "1. We retrieved a CSV file from somewhere on this planet using `requests`,\n", "2. We 'parsed' it so that each `row` became something from which we could extract variables thanks to the magic of the `csv` library that we `import`,\n", "3. We skip the first row because it's a 'header' row and doesn't contain data,\n", "4. We print out the 2nd and 4th columns of the CSV file.\n", "\n", "You can see what the source looked like [here](http://bit.ly/2iIK9bA). And you'll note that the URL looks _exactly_ like the path that we saw when we used the Terminal to navigate between directories. \n", "\n", "So if you grasp the concept in one context, you can apply it in others. _That_ is scalability!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Special Paths\n", "\n", "Finally, there are a few 'special' paths that you need to know about when using the Terminal or Python:\n", "\n", "- `/` is the **root directory** (so `cd /` would take you to the 'root' of the hard drive)\n", "- `~` is the **home directory** (this is a shortcut: it's the same as typing `cd /Users/myusername/`)\n", "- `.` means the **current directory** (this is avoid ambiguity: `cd ./myusername/Documents` would assume that in the current directory there is a sub-directory called `Documents` under `myusername`)\n", "- `..` means the **next directory up** (this is so that you don't have to type out `cd /Users/myusername/Settings` just to go from `Documents` 'over' to `Settings`; instead, you can use `cd ../Settings/`.\n", "\n", "In terms of terminology: any path that starts with a `/` is an _**absolute path**_ because we are starting from the _**root**_ directory; any path that starts with either `..` or `.` is a _**relative path**_ because it is starting from the _**current**_ directory and is relative to where the Terminal or program 'is' now.\n", "\n", "### Windows vs Everyone Else\n", "\n", "For historical reasons, Windows has long been just a bit different from Unix/Linux/Mac in how paths are specified: on Windows it has long been the case that the path separator was `\\`, not `/` (`:` also isn't allowed in file name on a Mac for historical reasons)!\n", "\n", "So the following Unix/Mac path: `/Users/myusername/Documents/`\n", "\n", "On a Windows machine would usually be: `\\Users\\myusername\\Documents\\`\n", "\n", "If you remember that on a Unix/Mac we use `\\` to manage things like spaces in a directory name then you can see how it gets pretty confusing and complicated pretty quickly when you start to program... But in Python we can cope with _both_ of these situations and others using the `os` library (short for 'operating system'); see the examples below!\n", "\n", "**Remember**: to turn code you need click in the code block below and then either type `Ctrl+Enter` or hit the 'Run' button on the menu bar above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "# Create a path by joining together the input arguments\n", "print(os.path.join('Users','myusername','Documents'))\n", "print(\" \")\n", "# Create an absolute path (starts with '/') and file names that contain spaces (we don't need to worry about spaces)\n", "print(os.path.join('/','Users','myusername','Documents','My Work','Code Camp'))\n", "print(\" \")\n", "# Create a relative path (starts with '..') and file names that contain spaces (we don't need to worry about spaces)\n", "print(os.path.join('..','Documents','My Work','Code Camp'))\n", "print(\" \")\n", "# 'Expand' the current user's home directory (~) into an absolute path\n", "print(os.path.expanduser(\"~\"))\n", "print(\" \")\n", "# 'Expand' the current directory path (.) into an absolute path\n", "print(os.path.abspath('.'))\n", "print(\" \")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's quite a lot of new content to get your head around, so you might want to spend some time exploring your own computer using the Terminal (Shell/Power Shell on Windows) to get the hang of how paths work. All you'll need are: `cd`, `ls`, and `pwd` (or [their Windows equivalents](https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Step_by_Step_Guide/ap-doslinux.html) and [here for more](https://courses.cs.washington.edu/courses/cse140/13wi/shell-usage.html#windows)) to get started!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Power Terminal Features\n", "\n", "You might recall that we previously said that [all programmers are lazy](notebook-1.ipynb), and understanding this laziness is the key to thinking that most Terminal (and Python) commands look like the result of monkeys hitting keys at random to being able to write code without constantly having to check Google. Let's revisit some of the commeands we've just seen:\n", "\n", "| Command | Short For... | What It Does |\n", "|:------- |:------------ |:------------ |\n", "| ls | list | List the contents of the directory I'm about to give you ('`ls ~`' lists the contents of your home directory; '`ls ..`' lists the contents of the parent directory; etc.) |\n", "| cd | change directory | Move to the path I'm about to give you ('`cd ~`' moves you to your home directory; '`cd ..`' moves you to the parent directory; etc.) | \n", "| pwd | print working directory | Show the path of the current current directory (_i.e._ where am I now?) |\n", "| head | Get the head of a file | Print the first few lines of the specified file (_e.g._ `head somefile.csv`) |\n", "| tail | Get the tail of a file | Print the last few lines of the specified file (_e.g._ `tail -n 2 somefile.csv`) |\n", "| clear | Clear the Terminal | When the Terminal looks really busy with junk you don't need just type `clear` |\n", "| exit | Exit the Terminal | To close an open Terminal window |\n", "| mv | Move a directory or file | Like dragging things in the Finder window (_e.g._ `mv somefile.csv ../another_dir/` to move the file `somefile.csv` _over_ to a directory named `another_dir` that is accessible from the parent directory of this one) |\n", "| cp | Copy a directory or file | Like dragging things in the Finder window (_e.g._ `cp somefile.csv ../another_dir/` to make a copy of the file `somefile.csv` in a directory named `another_dir` that is accessible from the parent directory of this one) |\n", "\n", "You can see how all of these commands are basically abbreviations/mnemonics that make it easy for programmers to be lazy, but productive: less time typing == more time thinking/doing." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Credits!\n", "\n", "#### Contributors:\n", "The following individuals have contributed to these teaching materials: \n", "- [James Millington](https://github.com/jamesdamillington)\n", "- [Jon Reades](https://github.com/jreades)\n", "- [Michele Ferretti](https://github.com/miccferr)\n", "- [Zahratu Shabrina](https://github.com/zarashabrina)\n", "\n", "#### License\n", "The content and structure of this teaching project itself is licensed under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license](https://creativecommons.org/licenses/by-nc-sa/4.0/), and the contributing source code is licensed under [The MIT License](https://opensource.org/licenses/mit-license.php).\n", "\n", "#### Acknowledgements:\n", "Supported by the [Royal Geographical Society](https://www.rgs.org/HomePage.htm) (with the Institute of British Geographers) with a Ray Y Gildea Jr Award.\n", "\n", "#### Potential Dependencies:\n", "This notebook may depend on the following libraries: None" ] } ], "metadata": { "anaconda-cloud": {}, "geopyter": { "Contributors": [ "Michele Ferretti (https://github.com/miccferr)", "Jon Reades (https://github.com/jreades)", "James Millington (https://github.com/jamesdamillington)" ], "git": { "active_branch": "master", "author.name": "Jon Reades", "authored_date": "2017-06-05 16:12:59", "committed_date": "2017-06-05 16:12:59", "committer.name": "Jon Reades", "sha": "2fce1fab4c3c28acabee493a24302e5ee1e8a1eb" }, "libs": {} }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 4 }