Introduction to HTML
Hypertext Markup Language is the language used to… well, mark up hypertext. It serves to structure the content of our application.
What we really mean when we talk about HTML depends on context. In a strict sense, HTML as a language refers only to the syntax and grammar used to make up static web documents.
However, when speaking about dynamic web applications, the term HTML is often used to refer to overlapping realms of expected visual appearance, behavior, and other characteristics of document content in the context of the browser, in addition to just the syntax of the markup language itself.
As the browser reads the document from the HTTP response, it parses this HTML and builds an in-memory representation of the document called the Document Object Model, or DOM. The DOM is a W3C standard that allows “programs and scripts to dynamically access and update the content, structure and style of documents.”1
The DOM API is a critical part of web applications: it is the mechanism that ties together our spheres of content, presentation, and behavior. We will discuss the interactive properties of the DOM in more detail in a later chapter. For now, just keep in mind that the browser builds the DOM from your markup as a tree-like structure.
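To make the idea of "building a tree from markup" concrete, here is a rough sketch, not how a real browser works. It uses Python's standard html.parser module as a stand-in tokenizer and a hypothetical Node class, and it keeps only element nodes (a real DOM also holds text, comment, and other node types). Each opening tag descends one level into the tree and each closing tag climbs back out.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so the builder never descends into them.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """A hypothetical, minimal stand-in for a DOM element node."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Turns a stream of start/end tags into a tree of Node objects."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root  # the element that new children attach to

    def handle_starttag(self, tag, attrs):
        node = Node(tag, parent=self.current)
        self.current.children.append(node)
        if tag not in VOID_ELEMENTS:
            self.current = node  # descend: following tags nest inside it

    def handle_endtag(self, tag):
        if self.current.tag == tag and self.current.parent is not None:
            self.current = self.current.parent  # climb back up one level


def outline(node, depth=0):
    """Return the tree as indented lines, one element per line."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


MARKUP = ("<html><head><title>Hi</title></head>"
          "<body><p>Hello <em>world</em></p></body></html>")

builder = TreeBuilder()
builder.feed(MARKUP)
print("\n".join(outline(builder.root)))
# #document
#   html
#     head
#       title
#     body
#       p
#         em
```

The nesting of the printed outline mirrors the nesting of the tags in the markup, which is exactly the parent/child structure the DOM exposes to scripts.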
Structure & Syntax
To explore how HTML is structured, consider the following small sample document:
<!DOCTYPE html>
<html>
  <head>
    <title>A tiny HTML example</title>
    <link rel="stylesheet" type="text/css" href="main.css" />
  </head>
  <body>
    <header>Header content</header>
    <main>
      <p>Paragraph content</p>
    </main>
    <footer>Footer content</footer>
  </body>
</html>
The doctype tag seen at the beginning (<!DOCTYPE html>) is the document type declaration, and it must always appear at the top of the document, before the <html> element. In HTML5, this terse two-word declaration is all that is needed.
We see right away that the document structure is hierarchical, with parts that are clearly nested inside of others. These parts are called elements, each of which is defined by tags. An element’s opening tag starts with an open angle bracket (<) immediately followed by the tag name, or element type. The opening tag may include more information about the element, and ends with a closing angle bracket (>).
Most elements have both an opening and a closing tag surrounding their content, which may be text, other elements, a combination thereof, or no content at all. A closing tag looks like the opening tag without any attributes, except that the first angle bracket and the tag name are separated by a forward slash (/), as in </p>.
A few types of elements are defined by only a single tag, as they either contain no content, such as a line break (<br>), or their content comes from elsewhere and is not enumerated in the document itself, like an image (<img src="path/to/image.jpg">). These may optionally have a trailing slash (e.g. <br/>), and so are sometimes also referred to as “self-closing tags”; the HTML specification calls them void elements.
The important part to remember is that, in order to create a well-formed document structure, we want our elements to be cleanly nested. That is, for any given element that contains other elements, all of its contained elements should be closed before the element itself is closed (apply this rule recursively). Our well-formed markup ensures that the browser builds the DOM as a tree that is structured exactly as we expect. For example, from the HTML markup shown above, the browser would produce a DOM tree similar to that shown below.
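Checking for clean nesting is essentially a stack problem: each opening tag pushes an element, and each closing tag must match the most recently opened one. The sketch below applies that rule using Python's html.parser; the NestingChecker class and nesting_problems helper are hypothetical, and single-tag (void) elements are skipped. It is deliberately stricter than real HTML: browsers do not reject badly nested markup but repair it with error-recovery rules, and HTML also allows some end tags, such as </p> and </li>, to be omitted, which this checker would report as problems.

```python
from html.parser import HTMLParser

# Single-tag (void) elements have no closing tag, so they are never
# pushed onto the stack of open elements.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Reports closing tags that do not match the innermost open element."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of open tag names, innermost last
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return  # e.g. the implied end of "<br/>"; nothing to match
        if self.open_elements and self.open_elements[-1] == tag:
            self.open_elements.pop()  # closes the innermost element: OK
        elif self.open_elements:
            self.problems.append(
                f"</{tag}> found, expected </{self.open_elements[-1]}>")
        else:
            self.problems.append(f"</{tag}> found, but nothing is open")


def nesting_problems(markup):
    """Hypothetical helper: list nesting problems in a markup string."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    unclosed = [f"<{tag}> never closed" for tag in checker.open_elements]
    return checker.problems + unclosed


print(nesting_problems('<p>one<br>two <img src="a.jpg"></p>'))  # []
print(nesting_problems("<b><i>text</b></i>"))
# ['</b> found, expected </i>', '<b> never closed']
```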
The <html> element is referred to as the root element, as it is the common ancestor of all other elements in the document. The <head> element contains metadata about the document (discussed below); as such, the head is not a visually rendered part of the document. The <body> element contains all of the document content that should be rendered to the screen.
Elements can also have any number of attributes, which are defined inside the opening tag. For example, in the document above, the link element inside the head has three attributes: rel, type, and href. Attributes can optionally have associated values, which are assigned with an equals sign and enclosed in quotation marks. So, altogether, the parts of an element’s anatomy appear as shown in the example below.
The attributes class and id are common for elements in general and are useful in styling and scripting, as we will later see. The id attribute is an element identifier that is unique within the document; no two elements should have the same id. On the other hand, the same class can be applied to any number of elements of any type, and a single element can have multiple classes. These attributes have separate namespaces; for example, id="pretty" and class="pretty" are completely unrelated.
Identifier and class names can contain dashes and underscores. To assign multiple classes to one element, the class names are separated by spaces.
<button id="my-button">press me!</button>
<div class="my-class-1 my-class-2">...</div>
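As a rough illustration of how a parser sees these attributes, the sketch below uses Python's html.parser with a hypothetical AttributeCollector class. The parser hands each opening tag's attributes over as (name, value) pairs, with a value of None for an attribute written without one; splitting the class value on whitespace recovers the individual class names.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collects (tag, attributes) for every opening tag it sees."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for an
        # attribute written without a value, e.g. <input disabled>
        self.elements.append((tag, dict(attrs)))


collector = AttributeCollector()
collector.feed('<button id="my-button">press me!</button>'
               '<div class="my-class-1 my-class-2">...</div>')

for tag, attrs in collector.elements:
    classes = attrs.get("class", "").split()  # space-separated class names
    print(tag, attrs.get("id"), classes)
# button my-button []
# div None ['my-class-1', 'my-class-2']
```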
The Head and Document Metadata
The head element of an HTML document contains metadata about that document or page. Though it is not directly rendered as content on the screen, some metadata do affect how the page is displayed, by informing the browser how it should interpret document content in the body or by specifying additional related content to load. Many types of metadata are provided in meta tags.
Charset
One of the first elements that should appear in a document’s head is the charset declaration, which specifies the document’s character encoding and affects how the browser parses the rest of the document. This should be utf-8 unless a different encoding is specifically needed for existing compatibility reasons,2 as Unicode supports the widest range of characters from all languages and the best interoperability between them.
<meta charset="utf-8" />
If pieces of text on your page appear garbled or nonsensical, especially where you expect non-ASCII characters (such as emoji or Latin letters with accents), check the encoding.
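The typical symptom can be reproduced in a few lines of Python. This sketch assumes a page saved as UTF-8 but decoded as Latin-1 (ISO-8859-1); the word "café" is just an example. A single non-ASCII character becomes two UTF-8 bytes, and the wrong decoder turns each byte into its own character.

```python
# "é" (U+00E9) is encoded in UTF-8 as the two bytes 0xC3 0xA9.
text = "café"
raw = text.encode("utf-8")
print(raw)                    # b'caf\xc3\xa9'

# Decoding those bytes with the wrong charset yields garbled text:
print(raw.decode("latin-1"))  # cafÃ©

# Decoding with the charset that was actually used restores the text.
print(raw.decode("utf-8"))    # café
```

Seeing pairs like "Ã©" where one accented letter should be is a strong hint that UTF-8 bytes are being read with a single-byte encoding.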
Viewport
Another meta tag that affects document rendering is the one specifying the viewport.
For new projects, it is recommended to use the following:
<meta name="viewport" content="width=device-width, initial-scale=1" />
We will go over the specifics of viewport mechanics in a later discussion of CSS and layout, but for now, we can roughly summarize this tag as instructing the browser to use the current device’s native screen size in its layout and sizing calculations for rendering the page.
In this example we also saw how meta tags allow for arbitrary name-value data pairs using the name and content attributes. Some named metadata, like viewport, are well known and have specific meaning for major browsers. Other often-used metadata are specific to popular social media or other web services and help automated crawlers programmatically gain information about the document.
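As a rough illustration of how an automated crawler might read these pairs, the sketch below collects every meta tag that has both a name and a content attribute into a dictionary. It uses Python's html.parser; the MetaCollector class is hypothetical, and the description tag (a standard metadata name) and its content are made up for the example.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Gathers name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Only name/content pairs; <meta charset> has no name and is skipped.
        if "name" in attrs and "content" in attrs:
            self.metadata[attrs["name"]] = attrs["content"]


HEAD = """
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="description" content="A tiny HTML example" />
"""

collector = MetaCollector()
collector.feed(HEAD)
print(collector.metadata["viewport"])  # width=device-width, initial-scale=1
print(sorted(collector.metadata))      # ['description', 'viewport']
```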
Relative URL Base
By default, relative URLs used in a document are resolved relative to the current page’s URL. For example, on the page http://example.com/foo/bar, the relative URL ./baz would resolve to http://example.com/foo/baz.
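Python's standard urllib.parse.urljoin follows the RFC 3986 resolution rules, so for simple cases like these it is a convenient way to check how a relative URL resolves against a page URL. Browsers implement the closely related WHATWG URL Standard, and unusual edge cases may differ. The ../baz and /baz inputs are extra examples, not taken from the text above.

```python
from urllib.parse import urljoin

page = "http://example.com/foo/bar"

# The last path segment ("bar") is dropped, then the reference is applied.
print(urljoin(page, "./baz"))   # http://example.com/foo/baz
print(urljoin(page, "../baz"))  # http://example.com/baz  (one directory up)
print(urljoin(page, "/baz"))    # http://example.com/baz  (from the site root)
```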
We can specify a different base URL using the base element with an href attribute.
<base href="http://example.com/foo" />
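A page with the above base element in its head would instead resolve the relative URL ./baz to http://example.com/baz, regardless of the page’s own URL. Because the base URL has no trailing slash, its last segment (foo) is treated like a file name and replaced, just as bar is replaced in the default case.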
The base tag must appear before any relative URLs are used in the document, so it is best placed close to the top of the head. The base’s href attribute can itself be a relative URL, which is resolved to an absolute URL against the document’s own URL before being used as the base for other relative URLs.
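The effect of a base element can be sketched as a two-step resolution: first resolve the base’s href against the document’s own URL (which matters when the href is itself relative), then resolve every other relative URL against that result. The resolve helper below is hypothetical, it uses urllib.parse.urljoin as a stand-in for the browser’s URL parser, and the /assets/ base is an invented example.

```python
from urllib.parse import urljoin


def resolve(document_url, relative_url, base_href=None):
    """Hypothetical helper mimicking how a <base href> changes resolution.

    If base_href is given, it is first made absolute against the document's
    own URL (it may itself be relative); otherwise the document URL is used.
    """
    base = document_url if base_href is None else urljoin(document_url, base_href)
    return urljoin(base, relative_url)


page = "http://example.com/foo/bar"

# <base href="http://example.com/foo" />: "foo" has no trailing slash,
# so it is replaced, just like "bar" is in the page URL.
print(resolve(page, "./baz", "http://example.com/foo"))  # http://example.com/baz

# A relative base href is resolved against the page URL first.
print(resolve(page, "./baz", "/assets/"))  # http://example.com/assets/baz
```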
Links and External Assets
The document head can also contain references to assets at other URLs.
The link tag can be used to create these references. The link tag