---
layout: chapter
title: 'HTTP'
category: part2
---
In the previous chapter, I mentioned the large number of application protocols
that exist. You probably make use of many such protocols when you use the
Internet, but in this chapter we're going to focus on the king of application
protocols: the HyperText Transfer Protocol (HTTP).
Remember that thanks to the transport and Internet layers, we don't have to
worry at all about the technical details we discussed in the previous chapters.
When discussing the application layer, we can pretend that computers magically
send formatted data to each other just as easily as you might talk to another
person in the same room as you.
{: .note}
You might know good ol' HTTP from your browser's address bar, where it is often
seen just ahead of the domain name. That's because HTTP is the backbone of the
_World Wide Web_: the interlinked multimedia web pages you view in your web
browser.
## Text ##
Before we dig into HTTP, we need a quick aside to discuss [data formats][fm].
Recall that one thing that a protocol requires agreement on is a data format to
use for communication. HTTP is a text-based protocol, meaning that much of its
communication is in the form of human-readable text. But of course we need a
binary format for storing that text in a computer. One such format is called
ASCII.
[fm]: {{ site.baseurl }}/part1/formatting/
**ASCII** is a simple text format where each byte represents a single character.
{: .definition}
What does "character" mean here?
A **character** is a single textual symbol. For example, upper and lower case
letters and punctuation symbols are all characters.
{: .definition}
Here is a table translating between hexadecimal byte values and ASCII
characters:
{::options parse_block_html="false" /}
| Hex | Char | Hex | Char | Hex | Char | Hex | Char | Hex | Char | Hex | Char |
|-----|------|-----|------|-----|------|-----|------|-----|------|-----|------|
| 20 | (space) | 30 | 0 | 40 | @ | 50 | P | 60 | \` | 70 | p |
| 21 | ! | 31 | 1 | 41 | A | 51 | Q | 61 | a | 71 | q |
| 22 | " | 32 | 2 | 42 | B | 52 | R | 62 | b | 72 | r |
| 23 | # | 33 | 3 | 43 | C | 53 | S | 63 | c | 73 | s |
| 24 | $ | 34 | 4 | 44 | D | 54 | T | 64 | d | 74 | t |
| 25 | % | 35 | 5 | 45 | E | 55 | U | 65 | e | 75 | u |
| 26 | & | 36 | 6 | 46 | F | 56 | V | 66 | f | 76 | v |
| 27 | ' | 37 | 7 | 47 | G | 57 | W | 67 | g | 77 | w |
| 28 | ( | 38 | 8 | 48 | H | 58 | X | 68 | h | 78 | x |
| 29 | ) | 39 | 9 | 49 | I | 59 | Y | 69 | i | 79 | y |
| 2A | * | 3A | : | 4A | J | 5A | Z | 6A | j | 7A | z |
| 2B | + | 3B | ; | 4B | K | 5B | [ | 6B | k | 7B | { |
| 2C | , | 3C | < | 4C | L | 5C | \\ | 6C | l | 7C | \| |
| 2D | - | 3D | = | 4D | M | 5D | ] | 6D | m | 7D | } |
| 2E | . | 3E | > | 4E | N | 5E | ^ | 6E | n | 7E | ~ |
| 2F | / | 3F | ? | 4F | O | 5F | _ | 6F | o |    |   |
{::options parse_block_html="true" /}
A few observations about this table:
* 0x20 translates to a space
* Digits are really easy to translate since 0–9 correspond to
0x30–0x39
* You can convert letters from upper case to lower case and vice versa by adding
or subtracting 0x20
* There are a bunch of missing byte values: 0x00–0x1F and 0x7F–0xFF.
Some of these (0x80–0xFF) are missing because ASCII only uses 7 bits of each
byte, so only the first 2<sup>7</sup> = 128 byte values can be used. The
others (0x00–0x1F and 0x7F) are "unprintable ASCII". They include characters
representing line breaks and indentation, as well as "control characters"
that can have special meaning to the program using the ASCII text.
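
If you are curious what these observations look like to a program, here is a minimal sketch in Python (a programming language this book does not otherwise use). The function names are made up for illustration, and the case helpers only make sense for the letters A–Z and a–z.

```python
def hex_to_text(hex_bytes: str) -> str:
    """Decode space-separated hexadecimal byte values as ASCII text."""
    return bytes.fromhex(hex_bytes).decode("ascii")


def to_lower(letter: str) -> str:
    """Turn an upper case ASCII letter into lower case by adding 0x20."""
    return chr(ord(letter) + 0x20)


def to_upper(letter: str) -> str:
    """Turn a lower case ASCII letter into upper case by subtracting 0x20."""
    return chr(ord(letter) - 0x20)


def digit_value(digit: str) -> int:
    """Turn an ASCII digit character into the number it represents."""
    return ord(digit) - 0x30


print(hex_to_text("48 54 54 50"))  # the bytes 48 54 54 50 spell "HTTP"
print(to_lower("A"), to_upper("q"))
print(digit_value("7"))
```

You can check each result by hand against the table above.
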
You might recall from the chapter on data formats that one goal of a format is
to identify the type of data to the computer. ASCII is such a simple format and
it is understood so widely that it doesn't bother with such things. Instead it
is common for a computer to simply scan the bytes of data and, if they all fall
within the ASCII range (less than 0x80), assume that the data are ASCII.
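
That scan can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular program does it, and `looks_like_ascii` is a name I made up.

```python
def looks_like_ascii(data: bytes) -> bool:
    """Guess whether data is ASCII: every byte must be less than 0x80."""
    return all(byte < 0x80 for byte in data)


print(looks_like_ascii(b"Hello, world!"))             # every byte is below 0x80
print(looks_like_ascii(bytes([0x48, 0x69, 0xE9])))    # 0xE9 is outside the range
```

Note that this is a guess: data in some other format could happen to use only small byte values too.
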
Those are the basics of ASCII. Again, you don't need to worry about the details
of ASCII as we move on. I just wanted to give you an idea of how computers
handle all of the text we'll be seeing later on.
## HTML ##
Knowing how computers read text opens up a whole world of text formats.
Just like how a data format agrees on the meaning of binary data, a text format
agrees on the meaning of text (which itself might be stored in a binary data
format like ASCII).
HyperText Markup Language (HTML) is one such text format. The purpose of HTML is
to enrich plain text with additional meaning. For example, consider this text:
The rare original heartsbleed goes,
Spends in the earthen hide, in the folds and wizenings, flows
In the gutters of the banked and staring eyes. He lies
As still as if he would return to stone,
Richard Wilbur, The Death of a Toad
From the context you can probably tell that this is a quotation, but computers
aren't so good at guessing such things. They like to have things all spelled
out. Let's _mark up_ this text with HTML to make the meaning explicit.
{% highlight html %}
<!DOCTYPE html>
<html>
  <body>
    The rare original heartsbleed goes,
    Spends in the earthen hide, in the folds and wizenings, flows
    In the gutters of the banked and staring eyes. He lies
    As still as if he would return to stone,
    Richard Wilbur, The Death of a Toad
  </body>
</html>
{% endhighlight %}
I have added special text colors and styles to these sections to make the HTML
easier to read.
{: .note}
It's easy to spot the HTML parts because they are all wrapped in angle brackets
`<>`. These bracketed bits are called "tags". The tags we've added are
the bare minimum to identify this as an HTML document. Let's examine each tag's
meaning.
The `!DOCTYPE` tag at the top lets the computer know that this is an HTML
document. Next is an `<html>` tag. You will notice another similar tag at the
bottom: `</html>`. The `/` at the beginning of the tag tells us that these two
tags are a pair. This means that everything between `<html>` and `</html>` is
HTML. These two tags always wrap the contents of an HTML document. Next we see
another pair of tags: `<body>` and `</body>`. These tags enclose the body of our
text.
As I said, this is just the bare minimum. Let's add interesting stuff.
{% highlight html %}
<!DOCTYPE html>
<html>
  <body>
    <p>
      The rare original heartsbleed goes,<br />
      Spends in the earthen hide, in the folds and wizenings, flows<br />
      In the gutters of the banked and staring eyes. He lies<br />
      As still as if he would return to stone,<br />
    </p>
    Richard Wilbur, The Death of a Toad
  </body>
</html>
{% endhighlight %}
Here we've identified the stanza of the poem as a **p**aragraph using `<p>`
tags and we've added `<br />` tags at the end of each line to indicate line
**br**eaks. The `/` at the _end_ of the `br` tag indicates that each tag is on
its own and doesn't have a matching `</br>` later in the document. This teaches
us an important lesson about HTML.
**HyperText Markup Language** (HTML) is a language for describing the
_structure_ and _meaning_ of text, with no regard to its appearance or
presentation.
{: .definition}
We humans understand the difference in meaning between line breaks in a
paragraph and line breaks in a poem. We understand from context how the name
following a quoted paragraph is not part of the quotation itself but a citation.
HTML needs all of these implicit meanings to be made clear: line breaks are
assumed to be meaningless unless specified with tags; text is assumed to be
grouped together unless separated by tags.
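
To get a feel for what "meaningless" line breaks look like, here is a rough Python sketch that squashes every run of spaces, tabs, and line breaks into a single space. This is roughly how ordinary text ends up looking in a browser when no tags say otherwise. Real browsers follow more detailed rules, and `collapse_whitespace` is a name I made up for this example.

```python
def collapse_whitespace(text: str) -> str:
    """Replace every run of spaces, tabs, and line breaks with one space."""
    # split() with no arguments cuts the text at any run of whitespace
    return " ".join(text.split())


poem = """The rare original heartsbleed goes,
        Spends in the earthen hide, in the folds and wizenings, flows"""

print(collapse_whitespace(poem))
```

The two lines of the poem come out as one, which is exactly why the `br` tags are needed.
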
A side effect of HTML being very explicit and ignoring line breaks and
indentation is that we can use these tools to try to make HTML a little more
readable. Notice how I use indentation to make it clearer where tags start and
end.
{: .note}
HTML also provides tags for marking up quotations:
{% highlight html %}
The rare original heartsbleed goes,
Spends in the earthen hide, in the folds and wizenings, flows
In the gutters of the banked and staring eyes. He lies
As still as if he would return to stone,
Richard Wilbur, The Death of a Toad
{% endhighlight %}
Now the association between the quotation and citation is clear.
You might have wondered earlier at the point of the `<body>` tag. What isn't
part of the body of text? Well, HTML provides another tag, `<head>`, in which
you can place information _about_ the document that isn't part of the document
itself.
{% highlight html %}
My Favorite Poem
The rare original heartsbleed goes,
Spends in the earthen hide, in the folds and wizenings, flows
In the gutters of the banked and staring eyes. He lies
As still as if he would return to stone,
Richard Wilbur, The Death of a Toad
{% endhighlight %}
Now this is looking like a proper HTML document. But there's one notable HTML
tag which is missing:
{% highlight html %}
My Favorite Poem
The rare original heartsbleed goes,
Spends in the earthen hide, in the folds and wizenings, flows
In the gutters of the banked and staring eyes. He lies
As still as if he would return to stone,
Richard Wilbur,
The Death of a Toad
{% endhighlight %}
We have added the mighty **a**nchor tag or, as you probably know it, a
hyperlink. This tag looks a little different because it includes an _attribute_.
An attribute goes inside a start tag after the tag name and usually looks
something like `key="value"`. Attributes let us describe additional information
about a particular instance of a tag.
In our case, the anchor "Richard Wilbur" has a "**h**ypertext **ref**erence"
(`href`) to a Wikipedia article.
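
To see how a program might pick an attribute out of a tag, here is a small sketch using the HTML parser that ships with Python. The `LinkCollector` name and the snippet of HTML are made up for illustration, and the example.com address is a placeholder, not the link from our document.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every anchor tag it is fed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (key, value) pairs, one per attribute
        if tag == "a":
            for key, value in attrs:
                if key == "href":
                    self.links.append(value)


# Placeholder markup: this address is invented for the example.
snippet = '<p>Read about <a href="https://example.com/poet">the poet</a>.</p>'

collector = LinkCollector()
collector.feed(snippet)
print(collector.links)
```

The parser hands over each start tag along with its `key="value"` pairs, so finding the hypertext reference is just a matter of looking for the `href` key.
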
## Behind the Scenes ##
We've created a wonderful HTML document, but now what can we do with it? Well,
the real magic of HTML occurs when you give an HTML document to a web browser
(like the one you're using right now). The browser reads the document,
interprets the various tags, and turns it into an interactive web page for you
to browse. Check out the document we just made by clicking [here][ex].
[ex]: {{ site.baseurl }}/extras/example
Pretty cool, huh? To prove that there's no trickery going on here, try right-clicking
on that page. The menu that pops up should have an option like "View page
source" (this option may be difficult to find on a mobile
device). This shows you exactly what HTML your browser is interpreting to
create the page.
"Interpreting" is definitely the correct word to use here. In most web browsers,
you will probably see "Richard Wilbur" underlined and colored in blue and the
whole citation typeset in italics. But nowhere in our HTML does it say "make
this blue and underlined"! Your browser has styled the HTML according to its
interpretation in order to pass along the meaning of the tags to _you_.
One advantage of HTML is that it allows for alternative interpretations. For
example, a blind person might use a web browser that interprets HTML into a
medium of touch and sound so that they can still interact with it.
## Exercises ##
1. Translate the following bytes into text using the ASCII format:
55 73 69 6E 67 20 41 53 43 49 49 20 69 73 20 65 61 73 79 21
2. View the page source of _this_ page; it's written in HTML. Try to match up
the HTML tags with what you actually see in your browser.
Well... I don't actually write this book in HTML. I write it in a different
textual language called Markdown. A program then turns my Markdown text into
HTML. [This][gh]{: .alert-link} is what that looks like.
{: .deeper}
[gh]: https://raw.githubusercontent.com/silverhammermba/superuser/gh-pages/_posts/2012-03-01-http.markdown
3. Here is one of my favorite recipes written in plain text. Give it an HTML
treatment like we did to the quotation above.
Pesto
=====
This family recipe is simple, yet I have rarely tasted a restaurant's
pesto that bested it.
Ingredients
-----------
* 2/3 cup basil leaves (approx.)
* 1/3 cup olive oil
* 1/3 cup parmesan cheese
* 2 tbsp pine nuts or walnuts
* 1/8 tsp (white) pepper
* 2-3 cloves garlic
Directions
----------
1. Put all ingredients in blender
2. Blend
Serves 4-6
There are a lot more tags to choose from than the few I showed you. Check out
the full list of HTML tags [here][moz]{: .alert-link}. Note that you don't
have to keep all of the text from the original if it doesn't seem important
to the meaning of the document.
[moz]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element