---
layout: chapter
title: 'HTTP'
category: part2
---

In the previous chapter, I mentioned the large number of application protocols that exist. You probably make use of many such protocols when you use the Internet, but in this chapter we're going to focus on the king of application protocols: the HyperText Transfer Protocol (HTTP).

Remember that thanks to the transport and Internet layers, we don't have to worry at all about the technical details we discussed in the previous chapters. When discussing the application layer, we can pretend that computers magically send formatted data to each other just as easily as you might talk to another person in the same room as you.
{: .note}

You might know good ol' HTTP from your browser's address bar, where it is often seen just ahead of the domain name. That's because HTTP is the backbone of the _World Wide Web_: the interlinked multimedia web pages you view in your web browser.

## Text ##

Before we dig into HTTP, we need a quick aside to discuss [data formats][fm]. Recall that one thing that a protocol requires agreement on is a data format to use for communication. HTTP is a text-based protocol, meaning that much of its communication is in the form of human-readable text. But of course we need a binary format for storing that text in a computer. One such format is called ASCII.

[fm]: {{ site.baseurl }}/part1/formatting/

**ASCII** is a simple text format where each byte represents a single character.
{: .definition}

What does "character" mean here?

A **character** is a single textual symbol. For example, upper and lower case letters and punctuation symbols are all characters.
{: .definition}

Here is a table translating between hexadecimal byte values and ASCII characters:

{::options parse_block_html="false" /}
20 30040@50P60`70p
21!31141A51Q61a71q
22"32242B52R62b72r
23#33343C53S63c73s
24$34444D54T64d74t
25%35545E55U65e75u
26&36646F56V66f76v
27'37747G57W67g77w
28(38848H58X68h78x
29)39949I59Y69i79y
2A*3A:4AJ5AZ6Aj7Az
2B+3B;4BK5B[6Bk7B{
2C,3C<4CL5C\6Cl7C|
2D-3D=4DM5D]6Dm7D}
2E.3E>4EN5E^6En7E~
2F/3F?4FO5F_6Fo
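
If you'd like to check a translation without squinting at the table, here is a minimal sketch in Python. The language is my choice for illustration only, and the function names `hex_to_text` and `text_to_hex` are made up for this example. Rather than copying the table by hand, the sketch leans on Python's built-in ASCII support, which should agree with the table for every printable character.

```python
def hex_to_text(hex_bytes: str) -> str:
    """Translate space-separated hex byte values into ASCII text."""
    data = bytes.fromhex(hex_bytes)  # "48 54" -> b"\x48\x54"
    # Decoding as ASCII raises UnicodeDecodeError for any byte >= 0x80.
    return data.decode("ascii")


def text_to_hex(text: str) -> str:
    """Translate ASCII text into space-separated hex byte values."""
    # Encoding as ASCII raises UnicodeEncodeError for non-ASCII characters.
    return " ".join(f"{byte:02X}" for byte in text.encode("ascii"))


if __name__ == "__main__":
    print(hex_to_text("48 54 54 50"))  # look up 48, 54, 54, 50 in the table
    print(text_to_hex("Hi!"))
```

Running this should print `HTTP` and then `48 69 21`, which you can confirm against the table one byte at a time.
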
{::options parse_block_html="true" /}

A few observations about this table:

* 0x20 translates to a space
* Digits are really easy to translate since 0–9 correspond to 0x30–0x39
* You can convert letters from upper case to lower case and vice versa by adding or subtracting 0x20
* There are a bunch of missing byte values: 0x00–0x1F and 0x7F–0xFF. Some of these (0x80–0xFF) are missing because ASCII only uses the first 7 bits of each byte, so only the first 2<sup>7</sup> = 128 byte values can be used. The other missing characters are "unprintable ASCII". They include characters representing line breaks and indentation or even "control characters" that can have special meaning to the program using the ASCII text

You might recall from the chapter on data formats that one goal of a format is to identify the type of data to the computer. ASCII is such a simple format and it is understood so widely that it doesn't bother with such things. Instead it is common for a computer to simply scan the bytes of data and, if they all fall within the ASCII range (less than 0x80), assume that the data are ASCII.

Those are the basics of ASCII. Again, you don't need to worry about the details of ASCII as we move on. I just wanted to give you an idea of how computers handle all of the text we'll be seeing later on.

## HTML ##

Now that we know how computers read text, this opens up a world of text formats. Just like how a data format agrees on the meaning of binary data, a text format agrees on the meaning of text (which itself might be stored in a binary data format like ASCII). HyperText Markup Language (HTML) is one such text format.

The purpose of HTML is to enrich plain text with additional meaning. For example, consider this text:

    The rare original heartsbleed goes,
    Spends in the earthen hide, in the folds and wizenings, flows
    In the gutters of the banked and staring eyes. He lies
    As still as if he would return to stone,

    Richard Wilbur, The Death of a Toad

From the context you can probably tell that this is a quotation, but computers aren't so good at guessing such things. They like to have things all spelled out. Let's _mark up_ this text with HTML to make the meaning explicit.

{% highlight html %}
<!DOCTYPE html>
<html>
  <body>
    The rare original heartsbleed goes,
    Spends in the earthen hide, in the folds and wizenings, flows
    In the gutters of the banked and staring eyes. He lies
    As still as if he would return to stone,

    Richard Wilbur, The Death of a Toad
  </body>
</html>
{% endhighlight %}

I have added special text colors and styles to these sections to make the HTML easier to read.
{: .note}

It's easy to spot the HTML parts because they are all wrapped in angle brackets `<>`. These bracketed bits are called "tags". The tags we've added are the bare minimum to identify this as an HTML document. Let's examine each tag's meaning.

The `!DOCTYPE` tag at the top lets the computer know that this is an HTML document.

Next is an `<html>` tag. You will notice another similar tag at the bottom: `</html>`. The `/` at the beginning of the second tag tells us that these two tags are a pair. This means that everything between `<html>` and `</html>` is HTML. These two tags always wrap the contents of an HTML document.

Next we see another pair of tags: `<body>` and `</body>`. These tags enclose the body of our text.

As I said, this is just the bare minimum. Let's add some more interesting stuff.

{% highlight html %}

<!DOCTYPE html>
<html>
  <body>
    <p>
      The rare original heartsbleed goes,<br/>
      Spends in the earthen hide, in the folds and wizenings, flows<br/>
      In the gutters of the banked and staring eyes. He lies<br/>
      As still as if he would return to stone,<br/>
    </p>
    Richard Wilbur, The Death of a Toad
  </body>
</html>
{% endhighlight %}

Here we've identified the stanza of the poem as a **p**aragraph using `<p>` tags and we've added `<br/>` tags at the end of each line to indicate line **br**eaks. The `/` at the _end_ of the `br` tag indicates that each tag is on its own and doesn't have a matching `</br>` later in the document.

This teaches us an important lesson about HTML.

**HyperText Markup Language** (HTML) is a language for describing the _structure_ and _meaning_ of text, with no regard to its appearance or presentation.
{: .definition}

We humans understand the difference in meaning between line breaks in a paragraph and line breaks in a poem. We understand from context that the name following a quoted paragraph is not part of the quotation itself but a citation. HTML needs all of these implicit meanings to be made clear: line breaks are assumed to be meaningless unless specified with tags; text is assumed to be grouped together unless separated by tags.

A side effect of HTML being very explicit and ignoring line breaks and indentation is that we can use these tools to try to make HTML a little more readable. Notice how I use indentation to make it clearer where tags start and end.
{: .note}

HTML also provides tags for marking up quotations:

{% highlight html %}

<!DOCTYPE html>
<html>
  <body>
    <blockquote>
      <p>
        The rare original heartsbleed goes,<br/>
        Spends in the earthen hide, in the folds and wizenings, flows<br/>
        In the gutters of the banked and staring eyes. He lies<br/>
        As still as if he would return to stone,<br/>
      </p>
      <cite>Richard Wilbur, The Death of a Toad</cite>
    </blockquote>
  </body>
</html>
{% endhighlight %}

Now the association between the quotation and citation is clear.

You might have wondered earlier at the point of the `<body>` tag. What isn't part of the body of text? Well, HTML provides another tag, `<head>`, in which you can place information _about_ the document that isn't part of the document itself.

{% highlight html %}
<!DOCTYPE html>
<html>
  <head>
    <title>My Favorite Poem</title>
  </head>
  <body>
    <blockquote>
      <p>
        The rare original heartsbleed goes,<br/>
        Spends in the earthen hide, in the folds and wizenings, flows<br/>
        In the gutters of the banked and staring eyes. He lies<br/>
        As still as if he would return to stone,<br/>
      </p>
      <cite>Richard Wilbur, The Death of a Toad</cite>
    </blockquote>
  </body>
</html>
{% endhighlight %}

Now this is looking like a proper HTML document. But there's one notable HTML tag that is still missing:

{% highlight html %}
<!DOCTYPE html>
<html>
  <head>
    <title>My Favorite Poem</title>
  </head>
  <body>
    <blockquote>
      <p>
        The rare original heartsbleed goes,<br/>
        Spends in the earthen hide, in the folds and wizenings, flows<br/>
        In the gutters of the banked and staring eyes. He lies<br/>
        As still as if he would return to stone,<br/>
      </p>
      <cite><a href="https://en.wikipedia.org/wiki/Richard_Wilbur">Richard Wilbur</a>, The Death of a Toad</cite>
    </blockquote>
  </body>
</html>
{% endhighlight %}

We have added the mighty **a**nchor tag or, as you probably know it, a hyperlink. This tag looks a little different because it includes an _attribute_. An attribute goes inside a start tag after the tag name and usually looks something like `key="value"`. Attributes let us describe additional information about a particular instance of a tag. In our case, the anchor "Richard Wilbur" has a "**h**ypertext **ref**erence" (`href`) to a Wikipedia article.

## Behind the Scenes ##

We've created a wonderful HTML document, but now what can we do with it? Well, the real magic of HTML occurs when you give an HTML document to a web browser (like the one you're using right now). The browser reads the document, interprets the various tags, and turns it into an interactive web page for you to browse. Check out the document we just made by clicking [here][ex].

[ex]: {{ site.baseurl }}/extras/example

Pretty cool, huh? To prove that there's no trickery going on here, try right-clicking on that page. The menu that pops up should have an option like "View page source" (this option may be difficult to find on a mobile device). This shows you exactly what HTML your browser is interpreting to create the page.

"Interpreting" is definitely the correct word to use here. In most web browsers, you will probably see "Richard Wilbur" underlined and colored in blue and the whole citation typeset in italics. But nowhere in our HTML does it say "make this blue and underlined"! Your browser has styled the HTML according to its interpretation in order to pass along the meaning of the tags to _you_.

One advantage of HTML is that it allows for alternative interpretations. For example, a blind person might use a web browser that interprets HTML into a medium of touch and sound so that they can still interact with it.

## Exercises ##

1. Translate the following bytes into text using the ASCII format:

       55 73 69 6E 67 20 41 53 43 49 49 20 69 73 20 65 61 73 79 21

2. View the page source of _this_ page; it's written in HTML. Try to match up the HTML tags with what you actually see in your browser.

   Well... I don't actually write this book in HTML. I write it in a different textual language called Markdown. A program then turns my Markdown text into HTML. [This][gh]{: .alert-link} is what that looks like.
   {: .deeper}

   [gh]: https://raw.githubusercontent.com/silverhammermba/superuser/gh-pages/_posts/2012-03-01-http.markdown

3. Here is one of my favorite recipes written in plain text. Give it an HTML treatment like we did to the quotation above.

       Pesto
       =====

       This family recipe is simple, yet I have rarely tasted a restaurant's
       pesto that bested it.

       Ingredients
       -----------

       * 2/3 cup basil leaves (approx.)
       * 1/3 cup olive oil
       * 1/3 cup parmesan cheese
       * 2 tbsp pine nuts or walnuts
       * 1/8 tsp (white) pepper
       * 2-3 cloves garlic

       Directions
       ----------

       1. Put all ingredients in blender
       2. Blend

       Serves 4-6

   There are a lot more tags to choose from than the few I showed you. Check out the full list of HTML tags [here][moz]{: .alert-link}. Note that you don't have to keep all of the text from the original if it doesn't seem important to the meaning of the document.

[moz]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element