HTML

Goals

Provide a history of HTML.
Learn the basic structure of HTML documents.
Become familiar with the most common HTML elements and their correct usage.
Be able to create HTML documents using XML.
Gain experience deploying a web site by serving HTML pages from a server.

Concepts

browser wars
browsing context
content model
divitis
doctype
doctype switching
document type declaration
empty attribute
flow content
global attribute
heading content
HyperText Markup Language (HTML)
HTML5
Hypertext Transfer Protocol (HTTP)
hyperlinks
living standard
markup language
metadata
named character reference
paragraph
phrasing content
polyglot
preformatted
presentation-oriented
quirks mode
Scalable Vector Graphics (SVG)
sectioning content
semantics
Standard Generalized Markup Language (SGML)
vocabulary
void element
World Wide Web Consortium (W3C)
Web Hypertext Application Technology Working Group (WHATWG)
World Wide Web (WWW)
XHTML
Extensible Markup Language (XML)

Lesson

One of the primary drivers of the popularity of the Internet, along with the Hypertext Transfer Protocol (HTTP), is the HyperText Markup Language (HTML), the most popular content format sent over HTTP. HTML is a markup language because it allows you to add meaning to a document by inserting text delimiters to mark certain locations. You already saw a snippet of HTML when you studied the Extensible Markup Language (XML).

Excerpt from an HTML document discussing XML.

…
<p>To process <abbr title="Extensible Markup Language">XML</abbr> a computer must <dfn>parse</dfn>
the XML document, but still the computer may not know what the resulting tags <em>mean</em>
unless it is familiar with the XML vocabulary being used.</p>
…

History

Tim Berners-Lee, generally considered to be the inventor of the World Wide Web (WWW), created HTML in the early 1990s as a specialized format for linking documents via hyperlinks. HTML sent over HTTP is still the basis of the web today.

HTML evolved from the Standard Generalized Markup Language (SGML) from the 1980s. Originally HTML was an application of SGML—a restricted syntax with a set vocabulary. The most recent versions of HTML have abandoned full SGML compliance, however.

HTML 3

In this period the browser wars began, in which companies such as Microsoft and Netscape tried to gain control of the browser market. They introduced extension tags that were understood only by their own web browsers, Microsoft Internet Explorer and Netscape Navigator. This caused many inconsistencies and incompatibilities in the way web pages appeared across browsers.

HTML 4

By the end of the 1990s HTML had advanced to version 4. By then the HTML specification was being maintained and improved by the World Wide Web Consortium (W3C), headed by Tim Berners-Lee. At the end of 1999 the W3C released HTML 4.01, which was to remain unchanged for many years, becoming the most common form of HTML. In the meantime XML had become very popular, so the W3C decided to stop work on HTML as a format independent of XML, and to produce later versions of HTML based on XML.

HTML5

Many thought that the W3C was moving too slowly in creating new markup languages, and that the languages they were creating were too large and complicated. For this reason Apple, Mozilla Foundation, and Opera Software formed the Web Hypertext Application Technology Working Group (WHATWG) to more quickly develop an updated simple version of HTML. This effort proved popular, and eventually the W3C reconsidered its abandonment of HTML as a specification independent of XML.

In 2014 the W3C released the official specification of what is now known as HTML5, based on the WHATWG's work. The WHATWG continues to improve HTML5 as a living standard, meaning that it is constantly being updated and improved. The W3C intends to periodically release new updates of HTML5 as snapshots, such as HTML 5.1.

Content

The media type of HTML is text/html.

The general syntax of HTML is very similar to that of XML, reflecting HTML's shared evolution from SGML.

Comments

A comment in HTML, as in XML, takes the form <!-- … --> and can span multiple lines. Comment text must not contain a sequence of two hyphen characters (--).
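As a minimal sketch of comment syntax, the following fragment shows a single-line comment and a comment spanning several lines. The paragraph text is made up for illustration.

```html
<!-- A single-line comment. -->
<!--
  A comment spanning several lines.
  Browsers do not display any of this text.
-->
<p>This paragraph is displayed normally.</p>
```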

Elements

Just like in XML, the primary structure for delimiting information is an element, which can consist of a start-tag and an end-tag. While XML simply provides a general syntax for tags, HTML actually provides an element vocabulary, a set of tag names that mean certain things. For example the <p> element indicates that the marked up character data between the tags is a normal paragraph of text, as opposed to a figure or a caption.

HTML elements with no end-tags.

…
<p>Here is a picture of a car.
<p><img src="car.jpg" alt="A car.">
<p>Another type of vehicle is a truck.
…

The biggest difference from XML is that in HTML the end-tag of an element is sometimes optional! For example, the preceding figure is a valid HTML fragment with no end-tags.

In some ways HTML is not as flexible as XML. Some elements, called void elements, are prohibited from having ending tags! They are: <area>, <base>, <br>, <col>, <embed>, <hr>, <img>, <input>, <keygen>, <link>, <menuitem>, <meta>, <param>, <source>, <track>, and <wbr>. Although you can't add an ending tag, you can (and should) use an empty-element tag, such as <img … />, to make your markup compatible with XML. See HTML 5.1 § 8.1.2. Elements: Void elements and Are (non-void) self-closing tags valid in HTML5?
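The following fragment is a small sketch of void elements written with the XML-compatible empty-element tag syntax. The file name car.jpg and the text are placeholders for illustration.

```html
<p>First line of an address,<br />
second line of the address.</p>
<hr />
<p><img src="car.jpg" alt="A car." /></p>
```

None of these void elements has an end-tag. The trailing slash is optional in HTML and is included only for XML compatibility.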

HTML5 does not provide a DTD. It does however provide a sort of schema by placing elements in different categories. Elements of certain categories should only contain certain other categories, as you will see throughout this lesson. This is called a content model. Some elements may fall into several categories. See HTML 5.1 § 3.2.4.2. Kinds of content.

Attributes

Unlike XML, HTML does not require attribute values to be quoted, and an attribute can even be present without a value in certain contexts. See HTML 5.1 § 8.1.2.3. Attributes. Here are the ways you might see an attribute in HTML:

<html lang="en-US">: The attribute is quoted, just like in XML. Single quotes are also allowed, as in XML.
<html lang=en-US>: If an attribute value consists of a single word composed of a limited set of characters, it does not need to be quoted. It is still a good idea to quote the value for XML compatibility, as discussed in XHTML below.
<button disabled>: Some Boolean attributes allow the empty attribute form; their mere presence acts as a flag. An empty attribute is equivalent to setting the attribute to the empty string, as in <button disabled="">. If you want compatibility with XHTML, you can set the value to the same string as the attribute name, as in <button disabled="disabled">; see XHTML below.
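The attribute forms described above can be sketched side by side using the <input> element. The field names city and subscribe are hypothetical; the first two lines should behave the same in a browser, as should the last three.

```html
<!-- Quoted and unquoted attribute values. -->
<input type="text" name="city" value="Paris">
<input type=text name=city value=Paris>

<!-- Three ways of writing a Boolean attribute. -->
<input type="checkbox" name="subscribe" checked>
<input type="checkbox" name="subscribe" checked="">
<input type="checkbox" name="subscribe" checked="checked">
```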

Global Attributes

HTML provides several global attributes, so called because they can be placed on any element in the document. Here are some of the ones you will use frequently. See HTML 5.1 § 3.2.5. Global attributes.

id: Creates an identifier for the element, unique within the document. You can use the ID to create internal hyperlinks; see Links below.
title: Contains general advisory information about the element. Browsers often show the title in a tooltip when the mouse is over an element. Do not depend on browser behavior; don't make the title attribute the only source of essential information, because it may not be shown.
lang: Indicates the language of a section of the document, using a language tag as described in BCP 47. This is most commonly placed on the root <html> element to indicate the language of the entire document. See an example in Structure.
class: "Classifies" the element into one or more categories for applying style information. You will learn more about styles and stylesheets in an upcoming lesson.
style: Includes style information directly in the element. As you will learn in the upcoming lesson on styles, placing literal style definitions in an element's style attribute is usually not a good idea.
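The following fragment is a sketch of several global attributes in use. The identifier summary and the class names are hypothetical, and the classes have no visible effect unless matching styles are defined.

```html
<h2 id="summary" class="section-title">Summary</h2>
<p lang="fr" title="A greeting in French.">Bonjour, tout le monde.</p>
<p class="note important">A paragraph placed in two style categories.</p>
<p>Jump to the <a href="#summary">summary</a>.</p>
```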

Formatting

TODO

TODO nbsp

Document Type Declaration

HTML, like its cousin format XML, can provide a document type declaration (or doctype) at the top of the document. Originally the doctype was intended to indicate to a web browser which version of HTML the document used, just as XML uses the doctype to indicate which XML DTD is being used as the document schema. Eventually the doctype provided several functions:

version: The version of HTML.
vocabulary: Whether the document included only HTML or included additional vocabularies such as Scalable Vector Graphics (SVG).
transitional: Whether the document adhered to the latest HTML recommendations, or whether it included older, deprecated elements that were being phased out.
quirks: Whether the browser should display the content as the standards prescribe, or whether it should exhibit old, erroneous behavior for backwards compatibility.

Going forward the W3C and WHATWG have indicated that all new HTML versions will continue to use the <!DOCTYPE html> doctype. See Recommended list of Doctype declarations for a list of doctypes historically used in various situations.

Although the HTML5 doctype is not case sensitive, it is best to put DOCTYPE in uppercase and html in lowercase for consistency across documents.

Recent popular HTML doctypes.

HTML 4.01 Strict: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
HTML 4.01 Transitional: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
HTML 5: <!DOCTYPE html>

Structure

HTML skeleton document.

<!DOCTYPE html>
<html lang="en-US">
<head>
  <meta charset="UTF-8"/>
  <title>HTML Skeleton</title>
  <meta name="…" content="…"/>
</head>
<body>
  …
</body>
</html>

HTML documents require an <html> root element, containing a <head> and a <body>. The W3C recommends adding a lang attribute to the root element, using a language tag to indicate the language of the document, as shown in the figure.

Metadata

Inside the <head> element are various elements that provide metadata, data that is not document content but instead information about the content data. The most important of these is the <title> element, which provides a title for the document. Other metadata can be provided in general <meta> elements using the name and content attributes to provide name/value pairs, such as <meta name="author" content="Jane Doe"/>. Document metadata is important because it helps categorize and search for content.

The <meta> element with a charset attribute is special; if an HTML file is encoded in any character set compatible with ASCII, the HTML parser can determine the actual charset when it reaches this <meta> element. For this reason HTML documents encoded in UTF-8 should always include a charset <meta> element as early as possible in the document (before the <title> element, for example). HTML files using XML should declare the charset in the XML declaration, and web servers should indicate the charset in the response Content-Type header as well. See HTML 5.1 § 4.2.5.5. Specifying the document’s character encoding.

<meta> elements are optional, and there are no requirements for their content. Nevertheless there are several standard metadata names you should consider adding to your document.

author: The name of the person who wrote the document.
description: A human-readable description of the document's contents.
keywords: A comma-separated list of search keyword tokens, such as "programming,course,GlobalMentor,Java".

See HTML 5.1 § 4.2.5.1 Standard metadata names and WHATWG MetaExtensions.
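As an illustration, a <head> using the standard metadata names listed above might look like the following sketch. The title, author, description, and keywords are invented values.

```html
<head>
  <meta charset="UTF-8"/>
  <title>Introduction to Vehicles</title>
  <meta name="author" content="Jane Doe"/>
  <meta name="description" content="An overview of vehicle types, classified by number of wheels."/>
  <meta name="keywords" content="vehicles,unicycle,bicycle,truck"/>
</head>
```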

Sectioning Content

The main content of the document goes inside the <body> element, and in the early days of the web, the <body> element was all that was needed. HTML content is not as simple as it used to be, and some documents now contain entire books or more. HTML5 introduced a set of sectioning content elements that allow documents to organize complex material within the <body> element. These are optional but highly useful in dividing the document into sections. See HTML 5.1 § 3.2.4.2.3. Sectioning content.

Examples of sections.

…
<body>
  <header><h1>Newspaper</h1></header>
  <section>
    <h1>Sports</h1>
    <h2>Regional Games</h2>
    <article>
      <h1>Last Night's Win</h1>
      …
      <aside>…</aside>
      …

<article>: A self-contained work such as a blog post or one of many stories appearing on a news site.
<aside>: Information that is related to the main content but tangential, such as the tips and warning boxes appearing in this lesson.
<nav>: A section set apart just for providing navigation links within the document or to other documents.
<section>: A general section of content, such as inside an <article>.

Sections are optional; for simple documents you can simply place your content directly inside <body> as traditionally done.

Headings

Within a section or directly inside the <body> you may use one of the heading content elements (<h1>, <h2>, <h3>, <h4>, <h5>, and <h6>) to provide a sort of title that appears above a portion of content. The headings are hierarchical: <h1> represents a top-level heading, while <h2> represents a subordinate heading, and so on. See HTML 5.1 § 3.2.4.2.4. Heading content.

TODO discuss sections, outlines, and headings in the HTML5 era; mention multiple <h1>s

Headers and Footers

If you want to group information at the top or bottom of a section, use the <header> or <footer> element, respectively.

A <header> can contain various content, even headings such as <h1>.
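A minimal sketch of a header and footer inside an article follows. The byline and footer text are hypothetical, and the heading reuses the one from the earlier sections example.

```html
<article>
  <header>
    <h1>Last Night's Win</h1>
    <p>By a hypothetical staff reporter</p>
  </header>
  <p>The home team won in the final minute.</p>
  <footer>
    <p><small>Copyright notice or other closing information goes here.</small></p>
  </footer>
</article>
```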

Flow Content

HTML5 content Venn diagram (HTML 5.1).

Most of the elements in HTML, the ones traditionally used to mark up text, are those in the flow content category. They include the heading and sectioning elements you've already seen, along with some metadata content. See HTML 5.1 § 3.2.4.2.2. Flow content.

Paragraphs

Content within the <body> or within one of the section elements is grouped into paragraphs. The obvious element for defining a paragraph is the <p> element, but HTML's definition of paragraph is broader than this. Runs of text inside a section that are not enclosed in any element implicitly form paragraphs as well.

In the following example, the group of sentences starting with Vehicles can be classified … and ending with … wheels they have is considered to be a type of paragraph (though not a <p> paragraph) even though it has no surrounding tags. HTML's idea of a paragraph then is closer to what you might think of as a block of text. To avoid confusion, this lesson will use the term block to refer to HTML's general idea of paragraph, while reserving the term paragraph to refer to blocks marked by the <p> element.

Implicit paragraph or block.

…
<body>
  <p>The term “vehicle” is a broad concept. In a program,
  classes can be used to represent vehicles.</p>
  Vehicles can be classified in several ways. One is by the
  number of wheels they have.
  <p>A vehicle that has a single wheel is called a <dfn>unicycle</dfn>.</p>
</body>
…

Groups

HTML technically has no “grouping content” category in its content model, but there are several elements that are made for grouping content. Most of these elements may include other elements. Some of them, such as the <p> element which you've already seen, are primarily made to create text blocks. Others may even contain other grouping elements. See HTML 5.1 § 4.4. Grouping content.

Be careful not to overuse the <div> element (and its corresponding phrase element <span>), or your document's markup will become devoid of meaning, a condition humorously referred to as divitis. Try your best to use a more appropriate grouping element if possible; see Semantics below for details.

<blockquote>: A longer quotation from some source. You can use the cite attribute to indicate the URL of the source, such as a blog article.
<div>: A purely grouping element that has no meaning in itself. You can use a <div> as an element on which to add attributes, such as grouping paragraphs written in Hindi using <div lang="hi-IN">…</div>.
<figure>: A way to group self-contained content, such as text, images, and source code. Unlike <aside>, containing content that is tangential, content inside <figure> is still an essential part of the main flow. You can specify a caption for a <figure> by placing a <figcaption>, another grouping element, inside it. See the section on Images for a full example.
<hr>: A thematic break in the content at the same level as a paragraph. This is often used in novels when there is a break in thought or in time in the narrative. An <hr> element is not meant to contain other elements. The name hr originally stood for horizontal rule, and many browsers still render this element as a horizontal line by default.
<main>: The main content of the document, if there is a need to distinguish it from other parts such as navigation, logos, and copyright information.
<p>: A paragraph of text and other content.
<pre>: Contains text that is preformatted; that is, the text spacing and arrangement has already been determined, such as computer output or ASCII art. A <pre> element is often used to surround a <code> element for computer source code. Be careful with line breaks; because whitespace is significant inside <pre>, you must indent subsequent lines to match the source, not the surrounding HTML content.

Grouping content.

<p>The lesson about indirection mentioned a poem by Lydia Maria Child:</p>
<!-- A <blockquote> is used because only part of the entire poem is being quoted. -->
<blockquote cite="https://www.poetryfoundation.org/poems/43942/the-new-england-boys-song-about-thanksgiving-day">
  <pre>Over the river, and through the wood,
    To grandfather's house we go;
      The horse knows the way,
      To carry the sleigh,
    Through the white and drifted snow. …</pre>
</blockquote>

Lists

There are several types of HTML lists, which are grouping elements that indicate a sequence of items. Most often used are the unordered list <ul> and the ordered list <ol>. Inside either of these elements, each item in the list must be placed inside a list item <li>. Make sure to choose the correct type of list, based upon whether your content is naturally in some order, such as steps to be performed, or in no required order, such as a list of the primary colors. Normally ordered lists are shown with numbers, while unordered lists are shown with bullet points.

An unordered list <ul> element is also a good choice for representing a menu of links, with each <li> containing a single menu item. This is semantically appropriate because a menu represents a list of choices; applying the correct styles can improve how the menu appears.

A list item <li> or a description <dd> may even include other lists!

Ordered list

<ol>
  <li>Cross the river.</li>
  <li>Go through the woods.</li>
  <li>Arrive at grandfather's house!</li>
</ol>

Description list with hostname to IP address mappings.

<dl>
  <dt>localhost</dt>
  <dd>127.0.0.1</dd>
  <dt>www.myserver.com</dt>
  <dd>1.2.3.4</dd>
  <dt>www.example.com</dt>
  <dd>198.51.100.27</dd>
</dl>

HTML also offers a description list <dl> element for marking up a list of items and their associated descriptions. The items need not be actual definitions as in a dictionary, but may contain any content that is associated with other content, such as teams and their rankings. The term or the thing being described is placed in a <dt> element, followed by its description in a <dd> element. If you indeed intend to use a description list to hold dictionary-like definitions, you may additionally indicate that each term is being defined by using <dt><dfn>foobar</dfn></dt>, as explained below.
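Because a list item may itself contain other lists, lists can be nested. The following is a small sketch with invented content: an ordered list whose first item contains an unordered list.

```html
<ol>
  <li>Pack for the trip.
    <ul>
      <li>Warm coat</li>
      <li>Pie for dessert</li>
    </ul>
  </li>
  <li>Cross the river.</li>
  <li>Go through the woods.</li>
</ol>
```

The nested <ul> is placed inside the <li> it belongs to, not directly inside the <ol>.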

Phrasing Content

The elements in the phrasing content category help mark up the content inside a block of text. They appear inline and do not create new blocks of text, although the <br> element will create a line break within the current block. The following are some of the most useful phrasing content elements.

Element	Description	Example	Example Rendering
`<abbr>`	Indicates that something is an abbreviation or an acronym. You may use the optional `title` attribute to indicate the non-abbreviated form.	`<abbr title="Java API for RESTful Web Services">JAX-RS</abbr>`	JAX-RS
`<cite>`	Represents the reference to some creative work such as a book or magazine, indicating the title and/or author. This element is commonly rendered in italics.	`<cite>A Tale of Two Cities</cite>`	A Tale of Two Cities
`<del>`	Indicates text that has been removed from the document, such as during editing. This element is commonly rendered in strikeout. See `<ins>`.	`The book is on the <del>the</del> table.`	The book is on the ~~the~~ table.
`<dfn>`	Indicates a reference to a new word where it is being defined. This element is commonly rendered in italics.	`A magazine published four times a year is sometimes called a <dfn>quarterly</dfn>.`	A magazine published four times a year is sometimes called a quarterly.
`<em>`	Places emphasis on the contents. This element is commonly rendered in italics. See also `<strong>`.	`The <dfn>penultimate</dfn> is the <em>second</em> to last.`	The penultimate is the second to last.
`<ins>`	Indicates text that has been added to the document. This element is commonly rendered in underline. See `<del>`.	`The book is on <ins>the</ins> table.`	The book is on the table.
`<q>`	Indicates text quoted from some other source. You may use the optional `cite` attribute to indicate the URL of the source of the quote. This element usually causes quotation marks to be shown around the content. The `<q>` element is useful, because a browser will usually show quotation marks appropriate for the content language. Don't use `<q>` if not referring to an actual quotation, such as in a sarcastic reference.	`Donald Knuth said that so-called "premature optimization" is <q>the root of all evil</q>.`	Donald Knuth said that so-called "premature optimization" is the root of all evil.
`<s>`	Indicates text that is no longer accurate or relevant. This element is commonly rendered in strikeout. The `<s>` element originally referred to general strikeout rendering, but should now only be used if semantically appropriate. If the content has actually been removed from the document as an edit, use `<del>` instead.	`Final closeout sale: <s>Two</s> Three for the price of one!`	Final closeout sale: ~~Two~~ Three for the price of one!
`<small>`	Contains text sometimes referred to as "small print" such as disclaimers and legal restrictions. The `<small>` element originally was a way to show any text in a smaller font, but now should only be used if the meaning is appropriate.	`<small>Offer not valid in all areas.</small>`	Offer not valid in all areas.
`<span>`	A general element that has no meaning in itself for grouping phrases. A `<span>` is useful for adding a `class` attribute to a section of text, if appropriate styles have been defined. Similar to the `<div>` element, the `<span>` element should not be overused; try to find another element that is more semantically appropriate, as explained under Semantics.
`<strong>`	Places strong emphasis on the contents to indicate importance or urgency. This element is commonly rendered in bold. See also `<em>`.	`If an electrical outlet is placed near a sink, <strong>it must be protected by GFCI</strong>.`	If an electrical outlet is placed near a sink, it must be protected by GFCI.
`<sub>`	Represents a subscript. See also `<sup>`.	`The binary logarithm is log<sub>2</sub>.`	The binary logarithm is log₂.
`<sup>`	Represents a superscript. See also `<sub>`.	`Go to the 4<sup>th</sup> floor.`	Go to the 4ᵗʰ floor.

Links

Originally the <a> element stood for an “anchor”, and could serve as a link destination using the name attribute. HTML5 no longer allows the <a> element to have a name. There is no need to use <a> as an anchor any more; just link to the id of the target element.

Hyperlinks in HTML appear when the <a> element is used. The “hyperlink reference” attribute href indicates the location to load when the link is activated or clicked. Most commonly the href value is a URL to another HTML page. If it is a relative reference, it will be resolved in the context of the URL of the currently loaded document.

The href URL may contain a fragment identifier, which is marked by the number sign # character. This works for relative references, and may even be used alone to reference a location within the same document. When a fragment identifier is given, it indicates the id attribute of the destination element; if there is no matching id attribute value, the first matching name attribute will be used. The following example shows both an external and an internal link. The Images section contains more examples of links.

Hyperlink.

<p>You can find <a href="#more">more information</a> below.</p>
…
<h3 id="more">More Information</h3>
<p>More information can be found
  on the <a href="https://www.example.com/">example page</a>.</p>

Here are a few useful attributes for <a>. See HTML 5.1 § 4.5.1. The a element for more details.

href: Identifies the destination of the link. Can be a URL or a relative reference, which will be resolved against the URL of the HTML document. If possible you should use a relative reference in href to provide flexibility deploying your web site or document.
rel: The "relationship" of the referenced document to the current one: one or more space-separated tokens such as help or license. See HTML 5.1 § 4.8.6. Link types for the allowed values HTML defines, and microformats: existing rel values for more registered extension link types.
target: Indicates the browsing context (usually a browser window) to use when navigating the link. The most commonly used target value is _blank, which typically causes the browser to open the link in a new window or tab. Use this capability sparingly; most of the time you do not want to force a new window to open when the user navigates a link.
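The rel and target attributes described above can be sketched as follows. The relative reference license.html is a hypothetical file assumed to sit next to the current document.

```html
<p>This lesson is available under an
  <a href="license.html" rel="license">open license</a>.</p>
<p>The <a href="https://www.example.com/" target="_blank">example page</a>
  opens in a new window or tab.</p>
```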

TODO: Mention different types of links such as mailto:

Media

HTML has included support for images since early on, but only in HTML5 has support for audio and video become available without the need for browser plugins.

Images

Images are embedded in an HTML document using the <img> element. Similar to a link's href attribute, <img> specifies the image to embed using the src attribute. The alt attribute allows you to provide a short text description as “alternate text” to display in case the image cannot be loaded, or for users with visual disabilities. If possible you should use a relative reference in src to provide flexibility deploying your web site or document. To provide accessibility to the largest number of users, providing an alt attribute is required in almost all circumstances.

<figure>
  <a href="https://upload.wikimedia.org/wikipedia/commons/e/eb/Lafd_ladder_truck.jpg">
    <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/eb/Lafd_ladder_truck.jpg/800px-Lafd_ladder_truck.jpg"
        alt="Tiller truck fire engine." />
  </a>
  <figcaption>
    Tiller or “hook-and-ladder” truck fire engine.
    (<a href="https://commons.wikimedia.org/wiki/File:Lafd_ladder_truck.jpg">Wikimedia Commons</a>)
  </figcaption>
</figure>

Example of a small image, linked to a larger version of the same image, embedded in a figure with a caption.

Audio

TODO

Video

TODO

TODO audio, video

Code

A common need especially for developers is to represent information that is input to or output from a computer. The most commonly used element for representing such information is <code>, usually rendered by browsers in some sort of monospace type. But there are other related elements that represent shades of semantics for describing computer-related information.

<code>: Represents a portion of computer code. This includes source code, a file name, a section of JSON, or even a keyword. You may indicate the computer language of the code by adding its name, with a language- prefix, to the class attribute, e.g. class="language-java"; this provides additional information and may allow a syntax highlighter to better format the code.
<kbd>: Represents user input to a computer, such as keyboard commands to enter. This includes not only text but also voice commands, menu items, or keystrokes.
<samp>: Represents the output of a computer program.

When showing blocks of code, you should wrap the entire block in a <pre> element to indicate that the line breaks are predetermined.

Example code block, followed by its sample rendering.

<pre><code class="language-java">package com.example;

public class HelloWorld {

  public static void main(String[] args) {
    System.out.println("Hello, World!");
  }

}</code></pre>

package com.example;

public class HelloWorld {

  public static void main(String[] args) {
    System.out.println("Hello, World!");
  }

}
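The <kbd> and <samp> elements described above are typically used within running text. The following is a minimal sketch; the key combination is only an illustration.

```html
<p>To stop a running program, press <kbd>Ctrl</kbd>+<kbd>C</kbd>.</p>
<p>When run, the program above prints <samp>Hello, World!</samp> to the console.</p>
```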

Data

TODO <data>, <time>, etc.

TODO mention data-* and how history repeats itself regarding XML namespace prefixes

Tables

HTML has long had the ability to present tabular data, that is, information arranged in cells within rows and columns. The <table> element is one of the most useful but also most abused parts of HTML.

An HTML table can be divided into an optional header, an optional footer, and the main table body. Each section comprises a series of rows, and each row is composed of several cells. There is no element that contains the cells of a “column” as such; a column simply comprises the cells in each row that are in the same position.

The following are the fundamental table elements. See HTML 5.1 § 4.9. Tabular data for more details.

<table>
  <caption>Command-Line Options</caption>
  <thead>
    <tr>
      <th>Option</th>
      <th>Alias</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>list</code></td>
      <td></td>
      <td>Lists all available widgets.</td>
    </tr>
    <tr>
      <td><code>--help</code></td>
      <td><code>-h</code></td>
      <td>Prints out a help summary.</td>
    </tr>
  </tbody>
</table>

Command-Line Options
Option	Alias	Description
`list`		Lists all available widgets.
`--help`	`-h`	Prints out a help summary.

Example HTML table with sample rendering.

<table>: The outermost element of an HTML table.
<caption>: (optional) Contains a title for the table. If the <table> is the only content in a <figure>, the W3C recommends you use a <figcaption> for the entire figure instead.
<thead>: (optional) Groups a set of rows that represent the headers of the table. These rows may be repeated after a page break in the middle of the table when printing, for example. A user agent may allow the table body to scroll separately from the header.
<tbody>: Contains the rows that make up the main part of the table. The <tbody> element is technically optional (but still a good idea); the rows may be placed directly within the <table> element. Multiple <tbody> elements are allowed if you want to divide your table into sections.
<tfoot>: (optional) Groups a set of rows that represent the footers of the table.
<tr>: Represents a row of information, and contains the cells to appear in each column.
<td>: Contains a single cell of information within a row.
<th>: Used in place of <td> to represent a header cell, such as at the beginning of a row. Normally header cells that represent column headers are placed in a table header <thead> element.

Normally in each row there appears a single <td> or <th> in the position of each column, containing that column's cell contents (even if empty). HTML allows you to “merge” a cell with several others after and/or below it by specifying column and/or row span attributes on the cell element. If a span greater than one is given, then no additional <td> or <th> elements are provided for the merged cells.

colspan: (optional; defaults to 1) Indicates the number of columns the cell should take up.
rowspan: (optional; defaults to 1) Indicates the number of rows the cell should take up.
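As a minimal sketch of how spans work, the following variation of the command-line options table uses both attributes in its header. The second header row and its “Name” label are hypothetical, added only for illustration.

```html
<table>
  <caption>Command-Line Options</caption>
  <thead>
    <tr>
      <!-- Spans the "Name" and "Alias" columns of the second header row. -->
      <th colspan="2">Option</th>
      <!-- Spans both header rows, so the second header row omits this cell. -->
      <th rowspan="2">Description</th>
    </tr>
    <tr>
      <th>Name</th>
      <th>Alias</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>list</code></td>
      <td></td>
      <td>Lists all available widgets.</td>
    </tr>
    <tr>
      <td><code>--help</code></td>
      <td><code>-h</code></td>
      <td>Prints out a help summary.</td>
    </tr>
  </tbody>
</table>
```

Every row still accounts for three columns: the first header row covers them with two cells (2 + 1), and the second header row supplies only two cells because the third column is already occupied by the cell spanning down from the row above.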

Semantics

When HTML was created, some of its elements indicated the purpose of the content (such as <p> to indicate a paragraph), but other elements specified how the content should appear to the user. One of the most notorious examples of such presentation-oriented elements was the <font> element, which specified a particular typeface, size, and color of text. This caused numerous problems. Besides the fact that the requested font was often unavailable on the user's system, the element did nothing to indicate why the text should be presented differently. A screen reader used by someone with visual limitations (see Accessibility below) would be at a loss to convey the significance of the indicated style.

Modern web design stresses using elements that indicate the semantics, or meaning, of the content. Rather than indicating that text should be in italics, for example, the <em> element should be used to indicate that the text is emphasized. The browser may indeed show the text in italics, depending on the styles in effect. But the <em> element indicates why the text is italicized (as emphasis rather than a definition, for example). This allows accessibility technology to more appropriately derive the meaning of the document. It allows authors to tweak styles more easily and consistently. And it allows computers to better search, process, and transform documents if they are semantically rich.
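The contrast can be sketched with two versions of the same sentence. The sentence itself is made up for illustration, and the first version uses the obsolete <font> element only to show what to avoid.

```html
<!-- Presentation-oriented: says how the text should look, but not why. -->
<p>Do <font color="red"><i>not</i></font> unplug the device during the update.</p>

<!-- Semantic: says the word is emphasized; a stylesheet decides how it looks. -->
<p>Do <em>not</em> unplug the device during the update.</p>
```

A browser may well render both versions similarly, but only the second tells a screen reader or search tool that the word carries emphasis.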

Initially HTML deprecated the <b>, <i>, and <u> elements because they seemed purely presentation-oriented, indicating “bold”, “italics”, and “underline”, respectively. Later the WHATWG realized that there were some instances in which bold and italics were customarily used in text without indicating emphasis. The WHATWG therefore rehabilitated these elements, now defined in more semantic terms.

Element	Description	Example	Example Rendering
`<b>`	Draws attention to text, without an alternate voice or mood, and without placing special semantic emphasis on the words. This element is commonly rendered in bold. Useful for representing keywords or product names. See also `<strong>`.	`Step 5: Place the <b>broom</b> in the <b>closet</b>.`	Step 5: Place the broom in the closet.
`<i>`	An alternate voice or mood. This element is commonly rendered in italics. Useful for representing a taxonomic designation or a term from another language. The `<i>` element originally stood for "italics", but for general emphasis the more semantic `<em>` should usually be used.	`The poetry of Vinicius de Moraes frequently refers to <i lang="pt">saudades</i>.`	The poetry of Vinicius de Moraes frequently refers to saudades.
`<u>`	“Unarticulated” textual annotation, such as indicating a misspelled word or a Chinese proper name. This element is commonly rendered with an underline. The `<u>` element originally stood for "underline", but for general emphasis the more semantic `<em>` should usually be used. *Only use `<u>` if you are absolutely sure it is appropriate, and it usually isn't.*

As you can see, the subtle semantic distinctions between these elements are less than clear. It is probably best not to use these elements unless there is a clear need. See The i, b, em, & strong elements.

Accessibility

TODO

XHTML

During the years of HTML stagnation before the introduction of HTML5, the W3C concentrated on reformulating HTML in terms of XML, which was wildly popular at the time. The W3C created several specifications it referred to as XHTML, using the media type application/xhtml+xml. The first version, XHTML 1.0, included three DTDs (Strict, Transitional, and Frameset), some including elements for backwards compatibility. The second version, XHTML 1.1, attempted to modularize the DTDs and added XML Schema definitions. The third effort was to produce XHTML 2.0, a reformulated XHTML abandoning backwards compatibility and integrating several new XML vocabularies; this effort was eventually abandoned.

In theory XML brings several benefits, including simpler parsing via a more predictable tree structure, along with the ability to intersperse elements from other namespaces. Unfortunately the W3C's formulation of XHTML encountered several problems:

The XHTML DTDs and XML Schemas were large, rigid and complicated.
Browsers never handled XML parsing, including namespaces, completely correctly.
Web designers found the XML syntax too complicated and therefore restrictive.
Web application developers found the intricacies of namespace access confusing and unnecessary.

HTML5 XHTML

Because of the benefits of a predictable syntax, XHTML has been revived in HTML5, but without a complicated set of DTDs or XML schemas. Rather, the W3C simply allows HTML5 to be stored in either of two syntaxes: the “HTML syntax”, which has been discussed throughout this lesson; and the “XHTML syntax”, which in large part simply means that the document follows the well-formedness rules of XML. See HTML 5.1 § 1.6. HTML vs XHTML.

XML Declaration

An HTML5 document using the XML syntax may include an XML declaration, but this is incompatible with the HTML syntax.
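As a minimal sketch, an HTML5 document in the XML syntax that includes an XML declaration might look like the following. The title and paragraph text are placeholders, and the example assumes the file is saved as UTF-8 and processed by an XML parser.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html lang="en-US" xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>XML Syntax Example</title>
</head>
<body>
  <!-- Every element is explicitly closed, as XML well-formedness requires. -->
  <p>This document is intended only for the XML syntax.</p>
</body>
</html>
```

An HTML parser does not treat the first line as an XML declaration, which is why a document meant to work in both syntaxes leaves it out.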

Media Type

Documents using the XML syntax must be transmitted using the application/xhtml+xml media type rather than text/xml. The media type is the primary indicator of which syntax an HTML5 document uses. When loading an HTML5 document from a file system, a browser may infer the XHTML syntax by the use of an xhtml filename extension and/or the presence of an XML declaration.

Don't serve your HTML documents on a public web server using the application/xhtml+xml media type. Many web browsers cannot handle XML parsing correctly. For those web browsers that do support it, the resulting document model tree functions differently in XML mode, and will likely cause problems with JavaScript web frameworks.

Namespaces

(X)HTML polyglot skeleton document.

<!DOCTYPE html>
<html lang="en-US" xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta charset="UTF-8"/>
  <title>(X)HTML Polyglot Skeleton</title>
</head>
<body>
  …
</body>
</html>

In the XML syntax the HTML namespace http://www.w3.org/1999/xhtml must be declared. Because HTML does not recognize the colon : character in names as indicating a namespace prefix, for compatibility with the HTML syntax documents should declare the HTML namespace as the default namespace, as shown in the figure.

Character References

XML Predefined Entities

Entity	Value
`&amp;`	`&`
`&lt;`	`<`
`&gt;`	`>`
`&apos;`	`'`
`&quot;`	`"`

Both the HTML and XML syntaxes support the predefined entity references of XML, shown in the figure on the side. In addition HTML supports over 2,000 named character references, including letters such as &Aacute; for Á (U+00C1), symbols such as &copy; for © (U+00A9), and even icon characters such as &phone; for ☎ (U+260E). See HTML 5.1 § 8.5. Named character references.

The rules of XML, however, indicate that all entities other than the predefined entities must be defined in an internal or external XML DTD. If you try to parse an HTML5 document using the XML syntax, or you try to open an HTML5 document as XML in a browser, the XML parser will refuse to load the document if it encounters one of the HTML named character references.

Don't use named character references in your HTML document other than the predefined entities &amp;, &lt;, &gt;, &apos;, and &quot;. With support for Unicode in modern editors, there is no reason not to simply use the desired character itself. Rather than using &mdash;, for instance, just use an em dash character — (U+2014) directly in your document.
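The following sketch shows forms that an XML parser will accept, assuming the document is saved and served in a Unicode encoding such as UTF-8. The sentences are made up for illustration. The last paragraph uses numeric character references, which are not discussed above but which both HTML and XML parsers accept without any DTD.

```html
<!-- Predefined entities: recognized in both the HTML and XML syntaxes. -->
<p>Use &lt;em&gt; for emphasis &amp; &lt;strong&gt; for importance.</p>

<!-- Literal characters typed directly into the document. -->
<p>Copyright © 2016 Ángel Example</p>

<!-- Numeric character references for the same characters. -->
<p>Copyright &#xA9; 2016 &#xC1;ngel Example</p>

<!-- Avoid: &copy; and &Aacute; are HTML named character references,
     so an XML parser would reject the document if they were used. -->
```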

Servers

TODO review information about how to deploy to Tomcat; mention index.html; discuss content type mapping

Review

Summary

TODO

Gotchas

Don't use elements that try to dictate the font and style of a document.
Although technically HTML5 has given <b>, <i>, and <u> new semantic definitions, you should avoid them unless there is a clear need. Otherwise you risk unintentionally encoding style information in your document. Usually there is a more appropriate semantic element for the job.
Many browsers cannot handle the application/xhtml+xml media type. Serve your documents as text/html, even if they follow the XML syntax for HTML5. Just be sure that they comply with the HTML syntax as well.
Named character references other than &amp;, &lt;, &gt;, &apos;, and &quot; will not be recognized if parsed as XML.
Don't use the <table> and related elements to lay out information on a page.

In the Real World

Try to indicate the semantics of content by using appropriate elements.
Menus are often represented by lists such as <ol>, because a menu is a list of links.
Serve your HTML documents using the text/html media type.
Don't use character references other than &amp;, &lt;, &gt;, &apos;, and &quot;.

Think About It

Is the XML syntax of HTML5 beneficial?

Self Evaluation

Who invented the World Wide Web?
What was the most common version of HTML before the arrival of HTML5?
What is the biggest difference between the rules for HTML tags and for XML tags?
What is quirks mode?
What doctype should be used for all new HTML documents?
What is the difference between a header and a heading?
How would you decide whether to use an ordered or unordered list?
How do you indicate that an HTML5 document is using the XML syntax?

Task

Create a simple web site to serve as the web user interface for Booker. The site will not yet actually list any books.

Create at least four pages:

Home

General welcome information.

When mentioning publications, provide a link to the section on the help page describing the types of publications (see below).

Search

Reserved for future book searching.

Help

Provides instructions for the program.

Include a section with a table of the different publication types, their definitions, and what sort of identifiers they use.
Provide a link to the online Booker RESTful API documentation.

About

Information about the program, the author(s), and copyright if any.

Include some sort of picture.
Use a figure to provide a caption for the picture.

Include a header for all the pages.
Include a navigation menu inside the header.
Provide author and/or copyright information in a footer for each page.
Include in the footer a link to send an email to request more information.
Serve this Booker site at the /booker/ path in your web application.

References

Resources

Acknowledgments

HTML content Venn diagram is Copyright © 2016 W3C® (MIT, ERCIM, Keio, Beihang) and used from the W3C recommendation HTML 5.1 § 3.2.4.2. Kinds of content under the W3C Software and Document Notice and License of 2015-05-13. W3C liability, trademark and permissive document license rules apply.
Over the river, and through the wood … from The New-England Boy's Song about Thanksgiving Day by Lydia Maria Child, in the public domain.
Some symbols are from Font Awesome by Dave Gandy.
