What is the web?

The web is... well, let's back up a bit.

The Internet is a bunch of computers that are connected together across the world. "Computer" here is used loosely---I'm not just talking about your desktop Dell, I'm talking about your laptops and your phones and maybe even the dimmer switch that controls the lights in your living room.

Different computers connect to the Internet in different ways, using diverse media: for example, your laptop might use wifi to communicate wirelessly with a router, which itself is connected via a cable in the cafe where you're working to another router, which is in turn connected to an underground fiber optic cable that leads to still another router, etc.

The key interesting part of the Internet is that every computer on it has a unique address, and any computer on the Internet can talk to any other computer, as long as it knows that address. Computers on the Internet talk to each other using a "protocol" called TCP/IP, which allows them to reliably exchange "packets" of information. ("Information" here meaning "a bunch of bytes.")
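
To make the idea of two computers exchanging bytes a little more concrete, here's a minimal sketch in Python. It's an illustration under some assumptions, not how a real application is structured: both ends of the conversation live in the same program and talk over the "loopback" address 127.0.0.1 (an address that always means "this computer"), and the function name loopback_exchange is made up for this example.

```python
import socket


def loopback_exchange(message: bytes) -> bytes:
    """Send bytes over a TCP connection to ourselves and return the reply.

    Hypothetical helper for illustration: the "server" side simply
    upper-cases whatever it receives and sends it back.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        # Port 0 asks the operating system to pick any free port.
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        address = listener.getsockname()  # (ip address, port number)

        # The client only needs to know the address to start talking.
        with socket.create_connection(address, timeout=5) as client:
            server_side, _ = listener.accept()
            with server_side:
                client.sendall(message)
                # A short message on loopback arrives in one piece; real
                # code would loop until it has everything it expects.
                received = server_side.recv(1024)
                server_side.sendall(received.upper())
                return client.recv(1024)


if __name__ == "__main__":
    print(loopback_exchange(b"hello, internet"))
```

Notice that TCP/IP only moves the bytes around. Nothing here says what the bytes mean; that's the job of the protocols built on top of it.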

HTTP

TCP/IP itself isn't all that useful for actually building applications on the Internet. It lets two computers communicate with one another, but it doesn't specify what that communication means. In order to have two computers more usefully communicate about things, our venerable Internet forebears invented other protocols that work on top of TCP/IP. Many of these protocols (like Gopher) have fallen by the wayside; others, like IMAP and FTP, are still commonly in use. But by far the most prevalent protocol on the Internet today is HTTP: HyperText Transfer Protocol.

HTTP is a protocol designed to allow one computer to request a resource from another computer. (It can also create a new resource on the other computer, or modify an existing one, but for now let's focus on just requesting resources.) The first computer sends the second computer a message that looks like this:

GET /cheese.html HTTP/1.1

In the line above, GET is the HTTP "method" (what the first computer wants the second computer to do on its behalf), and /cheese.html is the "path," or a description of where on the second computer the first computer believes the resource to be. (Often, this corresponds to a filename in a directory on the second computer.) The resource named in the path can be an HTML document, or it could be an image, or a video file, or whatever really.

The second computer reads the message above, and because of the rules it knows from the HTTP protocol, understands it to mean "I want you to fetch a resource called /cheese.html and return it to me." The second computer then sends a message back to the first that looks something like this:

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 51

<html><body><h1>I enjoy cheese.</h1></body></html>

This response says to the first computer, "I found your document just fine. I think this is an HTML document (the Content-Type line) and it's 51 bytes long (the Content-Length line). Here's the document!"
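
Because HTTP messages are just text with a few formatting rules, you can pick them apart with ordinary string operations. Here's a rough sketch in Python that parses the example response above. The function name parse_response is invented for this example, and the sketch assumes two details that the example glosses over: on the wire, each line ends with a carriage return plus a newline ("\r\n"), and the document itself ends with a single newline character, which is what brings its length to 51 bytes. A real program would use a library rather than parsing by hand.

```python
# The example response, written out byte for byte (assuming "\r\n" line
# endings and a trailing newline at the end of the document).
RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 51\r\n"
    b"\r\n"
    b"<html><body><h1>I enjoy cheese.</h1></body></html>\n"
)


def parse_response(raw: bytes):
    """Split a simple HTTP response into status, reason, headers and body."""
    # A blank line separates the status line and headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")

    # The first line looks like: HTTP/1.1 200 OK
    _version, status, reason = lines[0].split(" ", 2)

    # Every following line is a "Name: value" header.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return int(status), reason, headers, body


if __name__ == "__main__":
    status, reason, headers, body = parse_response(RESPONSE)
    print(status, reason)
    print(headers)
    print(len(body), "bytes in the body")
```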

Web servers and web browsers

The second computer in the example above is a web server---a computer somewhere on the Internet that is capable of responding to HTTP requests. Every time you type a URL (see below!) into your browser, somewhere behind the scenes a web server is being contacted, and a message much like the "GET" request above is being made to the web server on your behalf, and the server is responding with a message much like the "200 OK" message above.

A web browser is a specialized program that does two things: first, it knows how to make HTTP requests, i.e., it knows how to put together a message like GET /cheese.html HTTP/1.1 and send it to the right web server on the Internet. Second, it knows how to interpret the response from the web server. If the web server returns an image, it displays the image; if the web server returns a video, it plays the video; and importantly, if the web server returns an HTML document, it displays the HTML document.
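
Here's a small sketch of both roles in Python, using the standard library's http.server and http.client modules. It's meant as an illustration under stated assumptions rather than a realistic setup: the handler class CheeseHandler and the function fetch are names invented for this example, the "web server" runs on this computer's loopback address (127.0.0.1) instead of somewhere out on the Internet, and the "browser" only fetches the response without displaying anything.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

# The document our toy server knows about (trailing newline assumed).
PAGE = b"<html><body><h1>I enjoy cheese.</h1></body></html>\n"


class CheeseHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers GET /cheese.html, 404 for anything else."""

    def do_GET(self):
        if self.path == "/cheese.html":
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(PAGE)))
            self.end_headers()
            self.wfile.write(PAGE)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(path):
    """Start the toy server, request `path` from it, and return the result."""
    server = HTTPServer(("127.0.0.1", 0), CheeseHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # This is the "browser" half: build the request and read the response.
        conn = HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        result = (response.status, response.getheader("Content-Type"), response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch("/cheese.html"))
```

A real browser would go one step further and look at the Content-Type of the response to decide what to do with the bytes it got back.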

URLs and hyperlinks

We'll get back to HTML in a second, but first let's talk about what a "URL" is. The abbreviation "URL" stands for "Uniform Resource Locator," and that's just what a URL is---a string of characters that describes, in a uniform manner, where to find a particular resource (document, image, video, text file...) on the Internet. The URL is what you type into your browser's location bar when you know what page you want to get to on the Internet; when you see something like http://www.drzizmormd.net/ in an advertisement on the subway, you're looking at a URL. When you cut-and-paste the "link" to an amusing cat video from the location bar of your browser to send it to a friend, you're cut-and-pasting a URL.

URLs have internal structure, and it's worth talking about parts of that structure. Here's an example URL:

http://aparrish.neocities.org/across-media.html#Schedule

This URL has the following parts, each of which is used by your web browser to make a successful request to the web server that has the document you want:

http is the "scheme"; it determines which protocol should be used to fetch the document. (You also might see https here; HTTPS is a variant of HTTP that is "secure," i.e. encrypted.)
aparrish.neocities.org is the "hostname"; this is the name of the computer on the Internet that should be contacted with the web request.
/across-media.html is the "path," or the location of the resource you want to fetch.
Finally, #Schedule is the "fragment"; it's used by the web browser to select a particular section of the web page after it's been retrieved.
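
If you want to take a URL apart yourself, Python's standard library can do the splitting for you. This is just a quick sketch; the function name url_parts is made up for the example, while urlsplit is a real function in the urllib.parse module.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Return the four parts of a URL discussed above as a dictionary."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "hostname": parts.hostname,
        "path": parts.path,
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    example = "http://aparrish.neocities.org/across-media.html#Schedule"
    for name, value in url_parts(example).items():
        print(name, "=", value)
```

Note that the separators (the "://" after the scheme and the "#" before the fragment) aren't included in the parts themselves.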

Every document on the web is identified by a URL. If you have the URL for a document, your browser knows which server to ask for it and what to ask for. URLs are just strings of characters, but they're very powerful things.

HTML: an overview

At this point we have a basic understanding of a number of fundamental concepts---the Internet, TCP/IP, HTTP, URLs. Now it's time to talk about the real workhorse of the web, a formatting language called HTML.

HTML stands for "HyperText Markup Language," and that's a pretty good description of what it is. HTML allows you to take plain text documents and "mark them up" with a language that gives extra meaning to the text, above and beyond the meaning of the letters and words themselves.

Nearly every document that you look at on the Internet is written in HTML. Most browsers allow you to examine the HTML source code of any web page you visit. In Chrome, you can do so by right-clicking (or ctrl+clicking) on the web browser window and selecting "View Source."

[Screenshot: viewing source in Chrome]

The source code will look something like this:

[Screenshot: NYTimes source code]

Your web browser knows how to interpret this jumble of weird-looking characters and render the beautiful New York Times home page layout that you know and love.

The task before us: learn how to write HTML so that web browsers can interpret our hopes, dreams and desires for what a web page should look like.

What HTML looks like

HTML consists of a series of tags. Tags have a name, a series of key/value pairs called attributes, and some textual content. Attributes are optional. Here's a simple example, using the HTML <p> tag (p means "paragraph"):

<p>Mother said there'd be days like these.</p>

This example has just one tag in it: a <p> tag. The source code for a tag has two parts, its opening tag (<p>) and its closing tag (</p>). In between the opening and closing tag, you see the tag's contents (in this case, the text Mother said there'd be days like these.).

Here's another example, using the HTML <div> tag:

<div class="header" style="background: blue;">Mammoth Falls</div>

In this example, the tag's name is div. The tag has two attributes: class, with value header, and style, with value background: blue;. The content of this tag is Mammoth Falls.
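
You can check this breakdown with a program, too. Here's a minimal sketch using Python's built-in html.parser module. The class name TagInspector and the function inspect_tag are invented for this example, and the sketch assumes the input contains exactly one tag, like the one above.

```python
from html.parser import HTMLParser


class TagInspector(HTMLParser):
    """Hypothetical helper: remembers the name, attributes and text of one tag."""

    def __init__(self):
        super().__init__()
        self.name = None
        self.attributes = {}
        self.content = ""

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (key, value) pairs.
        self.name = tag
        self.attributes = dict(attrs)

    def handle_data(self, data):
        # Text found between the opening and closing tag.
        self.content += data


def inspect_tag(source):
    inspector = TagInspector()
    inspector.feed(source)
    inspector.close()
    return inspector.name, inspector.attributes, inspector.content


if __name__ == "__main__":
    html = '<div class="header" style="background: blue;">Mammoth Falls</div>'
    print(inspect_tag(html))
```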

Tags can contain other tags, in a hierarchical relationship. For example, here's some HTML to make a bulleted list:

<ul>
  <li>Item one</li>
  <li>Item two</li>
  <li>Item three</li>
</ul>

The <ul> tag (ul stands for "unordered list") in this example has three other <li> tags inside of it (li stands for "list item"). The <ul> tag is said to be the "parent" of the <li> tags, and the <li> tags are the "children" of the <ul> tag. All tags grouped under a particular parent tag are called "siblings."
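
The parent/child relationship is easy to see if you turn the HTML into a tree. The sketch below does that with Python's html.parser module. The Node and TreeBuilder classes and the build_tree function are names invented for this example, and the sketch makes a simplifying assumption: every opening tag has a matching closing tag, which is true of the list above but not of all HTML.

```python
from html.parser import HTMLParser

LIST_HTML = """
<ul>
  <li>Item one</li>
  <li>Item two</li>
  <li>Item three</li>
</ul>
"""


class Node:
    """One tag in the tree, with a link to its parent and its children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("(document)")  # placeholder parent for top-level tags
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        # A new tag becomes a child of whichever tag is currently open.
        node = Node(tag, parent=self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Closing a tag moves us back up to its parent.
        if self.current.parent is not None:
            self.current = self.current.parent


def build_tree(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


if __name__ == "__main__":
    ul = build_tree(LIST_HTML).children[0]
    print("parent:", ul.name)
    print("children:", [child.name for child in ul.children])
```

The three li nodes all have the same parent, which is exactly what makes them siblings.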

There are dozens of HTML tags. One of the biggest parts of reading and writing HTML is learning all of the various "tags" and what they mean.

HTML: An example

Let's look at an example HTML page. I designed this page to demonstrate how HTML works. It's not a very sophisticated page, but it's a good start! It's called Kittens and the TV Shows They Love. Click on the page and have a look.

Now let's go over the source code, reproduced below:

<!doctype html>
<html>
  <head>
    <title>Kittens!</title>
    <style type="text/css">
      span.lastcheckup { font-family: "Courier", monospace; font-size: 11px; }
    </style>
  </head>
  <body>
    <h1>Kittens and the TV Shows They Love</h1>
    <div class="kitten">
      <h2>Fluffy</h2>
      <div><img src="http://placekitten.com/120/120"></div>
      <ul class="tvshows">
        <li>
          <a href="http://www.imdb.com/title/tt0106145/">Deep Space Nine</a>
        </li>
        <li>
          <a href="http://www.imdb.com/title/tt0088576/">Mr. Belvedere</a>
        </li>
      </ul>
      Last check-up: <span class="lastcheckup">2014-01-17</span>
    </div>
    <div class="kitten">
      <h2>Monsieur Whiskeurs</h2>
      <div><img src="http://placekitten.com/110/110"></div>
      <ul class="tvshows">
        <li>
          <a href="http://www.imdb.com/title/tt0106179/">The X-Files</a>
        </li>
        <li>
          <a href="http://www.imdb.com/title/tt0098800/">Fresh Prince</a>
        </li>
      </ul>
      Last check-up: <span class="lastcheckup">2013-11-02</span>
    </div>
  </body>
</html>

This is pretty well-organized HTML, but if you don't know how to read HTML, it will still look like a big jumble. Here's how I would characterize the structure of this HTML, reading in my own idea of what the elements mean.

The <!doctype html> at the top of the file is a special line that tells the browser what kind of document this is (its "doctype").
The <html> tag is the "root" element of the document. HTML documents almost always have an <html> tag that contains everything else.
The <head> and <body> tags also have a special meaning: the <head> tag contains "header" information about the document---things that are important for the browser to understand the document, but that don't get displayed on the page, like the document's title (enclosed in a <title> tag), which shows up in the title bar of the browser window. The <body> tag is the parent tag of all the elements that are to be displayed on the page.
The <h1> tag means "Heading, Level 1." We'll talk more about the meaning of this tag later, but the main effect it has is to make the text inside the tag appear very large and in bold on the page.
We have two "kittens," both of which are contained in <div> tags with class kitten. (The <div> tag means "division"---it's a neutral way of saying "this is a bunch of related stuff on the page.")
Each "kitten" <div> has an <h2> tag ("Heading, Level 2") with that kitten's name.
There's an image for each kitten, specified with an <img> tag. The src attribute of the <img> tag specifies where the browser should look for the image of the kitten.
Each kitten has a list (a <ul> with class tvshows) of television shows, contained within <li> tags.
Those list items themselves have links (<a> tags) with an href attribute that contains the URL of the IMDb entry for that show.
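
Once you can read this structure, you can also get a program to read it. As a rough sketch, here's some Python that pulls the links out of one kitten's list of TV shows. The LinkCollector class and the find_links function are invented for this example, and the excerpt is a shortened copy of the page above rather than the whole thing.

```python
from html.parser import HTMLParser

# A shortened excerpt of the page: Fluffy's list of TV shows.
EXCERPT = """
<ul class="tvshows">
  <li>
    <a href="http://www.imdb.com/title/tt0106145/">Deep Space Nine</a>
  </li>
  <li>
    <a href="http://www.imdb.com/title/tt0088576/">Mr. Belvedere</a>
  </li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Hypothetical helper: gathers (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag we're currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_links(source):
    collector = LinkCollector()
    collector.feed(source)
    collector.close()
    return collector.links


if __name__ == "__main__":
    for href, text in find_links(EXCERPT):
        print(text, "->", href)
```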

BONUS QUIZ: What's the parent tag of <a href="http://www.imdb.com/title/tt0088576/">Mr. Belvedere</a>? Both <div class="kitten"> tags share a parent tag---what is it? What attributes are present on both <img> tags?

Style

Every HTML element has a "style" associated with it---that is, rules for what that element should look like when it's rendered on the screen. Browsers have built-in style rules: by default, for example, an <h1> tag is displayed in a large, bold font, and an <h2> tag is displayed in a large font that is nonetheless slightly smaller than <h1>; an <li> tag is displayed with a little dot off to the side; an <a> tag has its text colored blue and underlined.

But we can also change the way that HTML elements look, either on a tag-by-tag basis, or by making rules that apply to whole categories of tags. The language that we do this with is called CSS ("Cascading Style Sheets").