Client-Server Communication
Understanding how browsers get content in the first place is an essential part of understanding the web ecosystem. Concepts like IP, TCP, DNS, and other topics in networking are of great relevance, but can also be of great complexity. Here, we will limit our focus mainly to the application layer and what communication looks like from the browser’s perspective.
If you are unfamiliar with the TCP/IP model and related networking concepts, I highly recommend skimming over Wikipedia’s article on the subject.1
Uniform Resource Locators (URLs)
When a user wants to visit a website, or the browser needs to fetch certain assets to load in a page, we must first know exactly what resource to request from the server. In a vast internet composed of different machines all over the world, with each machine containing many resources that we may want to retrieve, we need an unambiguous way to identify (or “address”) individual resources. Uniform Resource Locators, or URLs, provide a standardized address format for this purpose.
The following example URL is annotated with the names of its component parts:2
                    hierarchical part
        ┌───────────────────┴─────────────────────┐
                    authority               path
        ┌───────────────┴───────────────┐┌───┴────┐
  abc://username:password@example.com:123/path/data?key=value&key2=value2#fragid1
  └┬┘   └───────┬───────┘ └────┬────┘ └┬┘           └─────────┬─────────┘ └──┬──┘
scheme  user information      host    port                  query        fragment
Each of these parts has a specific meaning and use. When working with URLs in typical web applications, we deal mostly with the scheme, host, path, and query parts, as in this simplified example:
            hierarchical part
          ┌─────────┴─────────┐
           authority    path
          ┌────┴────┐┌───┴────┐
  https://example.com/path/data?key=value&key2=value2#fragid1
  └─┬─┘   └────┬────┘           └─────────┬─────────┘ └──┬──┘
 scheme       host                      query         fragment
The scheme determines the mechanism or protocol used to access the resource over the network. Many standard schemes also imply a default port if no port is specified. Common schemes in the browser include http and https.
The host part, also referred to as the domain, identifies the machine on which the resource is located. The host can be given as an IP address, but is most often a human-readable domain name, which is mapped to an IP address using the Domain Name System (DNS).
The path component resembles the format of file system paths, containing hierarchical segments separated by slashes (/). The path allows us to specify what resource we want from a particular machine.
The query part is a string that can be used to send additional information in requests to a given URL. It is separated from the path by a question mark (?) and most commonly takes the form of key-value pairs separated by a delimiter such as an ampersand (&) or semicolon (;).
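To make the component names concrete, here is a minimal sketch that takes the annotated example URL apart using Python's standard urllib.parse module. The choice of Python and the variable names are assumptions made purely for illustration; browsers implement their own URL parsing.

```python
from urllib.parse import urlsplit, parse_qs

# The annotated example URL from earlier in this section.
url = "abc://username:password@example.com:123/path/data?key=value&key2=value2#fragid1"

parts = urlsplit(url)

print(parts.scheme)    # abc
print(parts.username)  # username
print(parts.password)  # password
print(parts.hostname)  # example.com
print(parts.port)      # 123 (an int)
print(parts.path)      # /path/data
print(parts.query)     # key=value&key2=value2
print(parts.fragment)  # fragid1

# The query string is commonly a list of key-value pairs joined by "&".
# parse_qs returns a list of values per key, since a key may repeat.
params = parse_qs(parts.query)
print(params)          # {'key': ['value'], 'key2': ['value2']}
```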
For example, the Wikipedia article on URLs is located at
  https://en.wikipedia.org/wiki/URL
  └─┬─┘   └───────┬──────┘└───┬───┘
 scheme          host        path
The scheme here is https, the secure, encrypted variant of HTTP. HTTPS connections are served over port 443 by default, so the above URL is effectively equivalent to https://en.wikipedia.org:443/wiki/URL.
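The default-port rule can be sketched in a few lines. The helper name effective_port and the DEFAULT_PORTS table are hypothetical, introduced only for this illustration; the table covers just the two schemes discussed here.

```python
from urllib.parse import urlsplit

# Well-known default ports for the two schemes discussed in this section.
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url):
    """Return the explicit port if the URL has one, else the scheme's default.

    Returns None for schemes that are not in DEFAULT_PORTS.
    """
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)

# Both spellings of the Wikipedia URL lead to the same port.
print(effective_port("https://en.wikipedia.org/wiki/URL"))      # 443
print(effective_port("https://en.wikipedia.org:443/wiki/URL"))  # 443
```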
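The domain part of this URL is en.wikipedia.org. At the time of writing, this domain maps to the IPv4 address 208.80.154.224, so as far as locating the server is concerned, the URL is roughly equivalent to https://208.80.154.224:443/wiki/URL. (In practice the server still needs to be told which domain name was requested, for reasons covered in the discussion of the Host header below.)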
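The name-to-address lookup can be tried out with Python's standard socket module, which asks the operating system's resolver. This is a rough sketch: the helper name resolve is hypothetical, and the addresses returned for a real domain depend on when and where the lookup is made, so they will likely differ from the one quoted above.

```python
import socket

def resolve(host, port=443):
    """Return the distinct IP addresses that `host` resolves to, sorted."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})

# A host that is already an IP address needs no DNS lookup at all.
print(resolve("127.0.0.1"))  # ['127.0.0.1']

# Example that requires network access (output varies):
#   resolve("en.wikipedia.org")
```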
Hypertext Transfer Protocol
Browsers communicate primarily using the Hypertext Transfer Protocol (HTTP). HTTP traditionally operates on top of a Transmission Control Protocol (TCP) connection, which provides reliable, ordered delivery of the data being communicated.
You may be surprised to learn that HTTP requests and responses are largely human-readable plaintext. Let’s look at an example: say you want to go to a page on example.com. Into the address bar we type example.com/index.html, hit enter, and zoom – off goes the browser to do some work. After resolving the domain example.com into an IP address, the browser opens a new TCP connection to that IP address and sends the following HTTP request message:
GET /index.html HTTP/1.1
Host: example.com
The first line of the message contains the HTTP method (or verb) used in the request. In this case, where the browser is starting to load a page, it sends a GET request. GET requests are used to fetch a particular resource, and so the total size of a GET request is quite small as it has no payload of its own. After the method is the resource path. Notice that the path is relative to the domain, as this particular request is already being routed to a specific receiving host (server IP address and port). The end of the first line indicates the version of the HTTP protocol being used.
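Because the request is plain text, it is easy to assemble by hand. The sketch below builds the bytes of the two-line request shown above; the function name build_get_request is hypothetical. One detail that is invisible in the printed example: HTTP/1.1 ends each line with a carriage return plus line feed (CRLF), and an empty line marks the end of the headers.

```python
def build_get_request(host, path="/"):
    """Build the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # method, resource path, protocol version
        f"Host: {host}",         # the domain name the request is meant for
        "",                      # empty line: end of the header section
        "",
    ]
    # HTTP/1.1 separates lines with CRLF ("\r\n"), not a bare "\n".
    return "\r\n".join(lines).encode("ascii")

request = build_get_request("example.com", "/index.html")
print(request)
# b'GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n'
```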
It is not uncommon for a single machine to serve multiple, potentially unrelated websites, or for multiple machines to exist behind a firewall or other network configuration that has one public-facing IP address. In such cases, DNS records for multiple domain names will point to the same IP address, and connecting clients for any of their hosted websites will communicate over ports 80 or 443.
The second line of our GET request is the Host header, which specifies the host (domain name) the request is intended for. The receiving server can use this to resolve the request path internally when it is serving sites for multiple domains.
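As a rough sketch of how a server might use the Host header to tell its sites apart, consider the lookup below. The SITE_ROOTS table, its directory names, and the function resolve_request are all hypothetical; real web servers have their own configuration formats and also guard against malformed or malicious paths, which this sketch does not.

```python
# Hypothetical mapping from Host header value to a directory of site files.
SITE_ROOTS = {
    "example.com": "/srv/example",
    "blog.example.org": "/srv/blog",
}

def resolve_request(host_header, path):
    """Map (Host header, request path) to a file location, or None if unknown."""
    # The Host header may carry an explicit port ("example.com:8080"); drop it.
    # Simplification: IPv6 literals such as "[::1]:80" are not handled.
    hostname = host_header.split(":", 1)[0].strip().lower()
    root = SITE_ROOTS.get(hostname)
    if root is None:
        return None
    return root + path

# The same path leads to different files depending on the Host header.
print(resolve_request("example.com", "/index.html"))       # /srv/example/index.html
print(resolve_request("blog.example.org", "/index.html"))  # /srv/blog/index.html
```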
After the server receives and processes the request, it forms an HTTP response message and sends it back over the TCP connection to the client (browser):
HTTP/1.1 200 OK
Cache-Control: max-age=604800
Content-Type: text/html
Date: Sun, 02 Jul 2017 22:21:10 GMT
Etag: "359670651+ident"
Expires: Sun, 09 Jul 2017 22:21:10 GMT
Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT
Server: ECS (fty/2FA4)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1270

<!doctype html>
<html>
<head>
<title>Example Domain</title>
These are the bytes as they arrive over the TCP connection. The first line, the status line, states the HTTP protocol version, the response status code (200), and a short reason phrase (OK). Underneath this, one per line, are the HTTP headers: fields containing information about the response and its content. After the last header comes an empty line (that is, two consecutive line breaks), and the line after that starts the message body. This response body, as we can see, happens to be an HTML document. Notice that it’s plain UTF-8-encoded text, which the browser will parse and eventually turn into pixels on the screen in a process we will soon investigate further.
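The structure just described (status line, headers, empty line, body) is simple enough to pull apart by hand. The following is a simplified sketch, not a complete HTTP parser: the function name parse_response is hypothetical, it assumes the whole response is already in memory, that the status line includes a reason phrase, and that no header appears more than once. The sample response is shortened from the one above.

```python
def parse_response(raw):
    """Split raw HTTP/1.1 response bytes into status, headers, and body."""
    # An empty line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: protocol version, numeric status code, reason phrase.
    version, code, reason = status_line.split(" ", 2)

    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lowercase.
        headers[name.strip().lower()] = value.strip()

    return version, int(code), reason, headers, body

raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 15\r\n"
    b"\r\n"
    b"<!doctype html>"
)
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)    # HTTP/1.1 200 OK
print(headers["content-type"])  # text/html
print(body)                     # b'<!doctype html>'
```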
Client-Server Separation
It is important to realize that, from the perspective of the web browser, we know very little about the server other than the fact that it apparently “speaks” HTTP, allowing us to communicate with it. We sent a request for a page and got that page back as HTML in a response, but we know nothing about how the HTML page itself was created.
This fact – this paradigm of communication – has some very interesting, and indeed fundamental, implications for how we architect our application as a whole and how we reason about the various system components and their role in the overall application lifecycle. The fact is this: the browser does not care how any content is generated by the server. It doesn’t matter that a server is running Python with Django, or Java with JSP, or Node.js, or PHP, or Ruby on Rails, or that it’s using some cool, fancy backend templating language, or that maybe the entire page itself is just a static file sitting in a directory on the disk somewhere…
The browser doesn’t care. It is completely oblivious, in fact; the translation from database objects to models, through templates and then to standard HTML (or whatever it is that a given server does) – all of that happens before the bytes ever leave the server. From the perspective of simply rendering a page, the browser deals with HTML, CSS, and JavaScript.
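A toy illustration of this point: the two functions below stand in for two very different servers, one returning a ready-made page and one rendering a template from application data. Both names and the page content are made up for this sketch. Since the resulting markup is identical, nothing in the response body would let a browser tell the two apart.

```python
from string import Template

def page_from_static_file():
    # Stands in for reading a ready-made .html file from disk.
    return "<!doctype html><html><body><h1>Hello, Ada</h1></body></html>"

def page_from_template(name):
    # Stands in for a server-side templating step fed by application data.
    template = Template(
        "<!doctype html><html><body><h1>Hello, $name</h1></body></html>"
    )
    return template.substitute(name=name)

# Both approaches produce exactly the same HTML for the browser.
assert page_from_static_file() == page_from_template("Ada")
```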
This also has some interesting implications for those who work primarily as front-end developers: if your focus is mostly in the browser, you can have great flexibility in working on various projects and teams that use very different server-side technologies. Maybe you have to learn some new syntax for their specific templating language or framework, but the end goal is the same: produce correct and performant HTML, CSS, and JavaScript to send to the browser.