HTML5, 20 Lessons to Successful Web Development (2015)
PART I Basic HTML
LESSON 1 An Introduction to HTML
To view the accompanying video for this lesson, please visit mhprofessional.com/nixonhtml5/.
In his famous play for radio, Under Milk Wood, the poet Dylan Thomas chose to start with the words “To begin at the beginning,” and that seems also the appropriate place to start this book on HTML5, because many of you will be new to HTML, while others will be seasoned professionals who wish to add the new skills of HTML5 to your toolkit.
If you are new to web development, simply work your way through the entire book. If you already use HTML, I still recommend that you browse through these early lessons as a refresher before moving on to the HTML5 elements (often called tags). So let’s start at the beginning and look at what HTML is all about.
Each lesson includes examples and screen grabs to illustrate the techniques being explained, and you can download the example files from the companion website, at 20lessons.com. There is a .zip archive file downloadable from the front page in which each lesson has its own folder, within which you will find the example files and associated content. For example, the examples from this lesson are all in the lesson01 folder.
What Is HTML?
HTML stands for HyperText Markup Language, and it was invented by Sir Timothy Berners-Lee in the early 1990s to solve the problem of quickly and efficiently distributing documents between scientists around the world who were working with experimenters at CERN (the European Laboratory for Particle Physics, where the Large Hadron Collider is now also situated).
The Internet was already in place and there were tens of thousands of computers connected to each other using it, but there was no easy means of publishing content for all to see, and in which references to other documents could be easily followed. So Berners-Lee created a hyperlinking framework he called the HyperText Transfer Protocol, or HTTP (the same set of letters at the front of a web address). He also created a language to use this protocol, which he called HTML (for HyperText Markup Language). To utilize both these new inventions, he also wrote the world’s first web browser, of which Figure 1-1 is a screenshot.
FIGURE 1-1 Berners-Lee’s original NextEditor browser
This was a remarkable invention and was widely hailed in the computer press of the time as heralding a new age of communication. Until then the best connectivity computer users had experienced was dialing in to a local bulletin board, usually with only one, or at the most just a few, phone lines attached. You could then upload or download files and read and leave messages, but then you had to log off again to allow other people to take your place. Occasionally these bulletin boards would swap messages every few days with other boards, so users could interact with people further away, but only with a huge delay.
But right away HTML changed everything, because now there was a way for all these bulletin boards and, in fact, any computers to stay in touch with each other, and documents could be stored in a multitude of places that were only ever a click away. People all over the world could connect to a local Internet host and immediately be in touch with any other person logged in to any other web-connected computer. It’s hard to appreciate now that we’ve had the Internet for so long, but at the time it was revolutionary. Within a few years there were three major graphical browsers and more than five million Internet users, and today that has mushroomed into over two billion people who regularly use the Web!
HTTP and HTML Basics
Let’s look more closely at these two acronyms, starting with HTTP. It stands for HyperText Transfer Protocol, and it is the communication standard that controls the requests and responses passing between a web browser running on your computer and a web server.
The job of the web server is to accept a request from a client such as a web browser and then to reply to it in the most meaningful way it can, generally (as far as you are concerned) by simply returning the contents of a requested document, but in the process many other requests and responses also take place. This returning of a web page is called serving, which is why the web server is so named.
In between a client and server there can be a multitude of other computers and devices such as routers, gateways, and proxies. A router chooses the best route to use in order to transfer data as fast as possible between the client and server. Gateways are nodes on the edge of one network that act as a connection from it to another, and proxies support indirect connections by acting as if they are the destination (or server), and then fetching the data you request and returning it to you, often employing a cache in which commonly requested documents are stored to save fetching them repeatedly.
These devices generally use an Internet protocol suite called TCP/IP for sending all this information flying across the Web, although there are other protocols that could be used to send HTML data (but which generally aren’t, and are therefore beyond the scope of this book).
Unlike the bulletin boards mentioned earlier, which supported only one user for each connected telephone line, web servers can use a single Internet connection to allow dozens, hundreds, or even thousands of simultaneous users at a time (depending on the power of the server).
Each web server spends much of its time simply listening for incoming requests. When one arrives, the server begins its response with a status line that reports the outcome of the request. For a successful request, it sends a status message such as the following back to the client:
HTTP/1.1 200 OK
After this the server then sends its own message, which generally will be the document that was requested by the client, or it could be an error message if the document was not found.
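To make the shape of that status message concrete, the following minimal Python sketch splits a status line into its three parts. The function name parse_status_line and the StatusLine tuple are illustrative assumptions, not part of any library.

```python
from typing import NamedTuple


class StatusLine(NamedTuple):
    version: str   # protocol version, e.g. "HTTP/1.1"
    code: int      # numeric status code, e.g. 200
    reason: str    # human-readable phrase, e.g. "OK"


def parse_status_line(line: str) -> StatusLine:
    """Split a status line such as 'HTTP/1.1 200 OK' into its parts."""
    # The reason phrase may contain spaces ("Not Found"), so split at most twice.
    version, code, reason = line.strip().split(" ", 2)
    return StatusLine(version, int(code), reason)


status = parse_status_line("HTTP/1.1 200 OK")
print(status.version, status.code, status.reason)
```

A client typically looks at the numeric code first, because it is easier for a program to act on than the accompanying phrase.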
If a document is returned, it can be in any format such as audio, video, images, or, most commonly, HTML. An HTML document is a simple text file in which the text is separated into different sections using a special set of markup tags, and it commonly has the extension .htm or .html (although any extension is acceptable, as long as the server knows about it). To indicate that this type of file is being sent to the client, a web server will precede the document with a header telling the client about it, which will look like this:
Content-Type: text/html; charset=utf-8
Here the type of document is clearly specified to be HTML, and the character encoding used by the file is set to utf-8. But other header types could also be sent. For example, if the requested document has been moved to a new location, the web server might, instead, return a pair of headers describing the move.
The first header tells the client that the document has moved and, instead of sending the document, the second states where the document can now be found. Then it’s the client’s job to go off and request the document from the new location, which could be on the same or a different web server.
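A redirect response of the kind just described can be sketched as follows. The 301 status code and the http://example.com/new-page.html address are assumptions chosen for illustration, as is the helper name redirect_target.

```python
from typing import Optional

# A hypothetical redirect response head; the status code and URL are
# illustrative assumptions, not values taken from a real server.
RESPONSE_HEAD = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: http://example.com/new-page.html\r\n"
    "\r\n"
)


def redirect_target(head: str) -> Optional[str]:
    """Return the Location value if the response is a redirect, else None."""
    lines = head.split("\r\n")
    status_code = int(lines[0].split(" ", 2)[1])
    # Status codes in the 300 range signal that the client must look elsewhere.
    if not 300 <= status_code < 400:
        return None
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        # Header names are case-insensitive.
        if sep and name.strip().lower() == "location":
            return value.strip()
    return None


print(redirect_target(RESPONSE_HEAD))
```

A browser does this work for you automatically, which is why a moved page usually just appears at its new address.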
As you might imagine, there are many more different types of headers and information that can be sent back and forward between web servers and clients, of which the most common one you may encounter is the following:
HTTP/1.0 404 Not Found
After sending this header, the web server will then serve up a page explaining why the document could not be found. Because of the header response code of 404, these pages are often referred to as “404” pages.
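The decision a server makes between serving a document and serving a “404” page can be sketched in a few lines of Python. This is a simplified illustration rather than how any particular web server is written; the DOCUMENTS table and the respond function are hypothetical stand-ins for a real file system and server.

```python
from html import escape
from typing import Dict

# A stand-in for the files on the server's disk (hypothetical contents).
DOCUMENTS: Dict[str, str] = {
    "/index.html": "<html><body>Welcome</body></html>",
}


def respond(path: str, documents: Dict[str, str] = DOCUMENTS) -> str:
    """Build a complete response: status line, header, blank line, body."""
    if path in documents:
        status, body = "200 OK", documents[path]
    else:
        # The body of a 404 response is the page explaining the problem.
        status = "404 Not Found"
        body = (
            "<html><body>Sorry, "
            + escape(path)
            + " could not be found.</body></html>"
        )
    return (
        f"HTTP/1.1 {status}\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        "\r\n"
        f"{body}"
    )


print(respond("/missing.html").splitlines()[0])
```

Note that the error page is itself an HTML document, so it is sent with the same Content-Type header as any other page.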
The Request/Response Sequence
Following is an example of a web client talking to a web server from which it is requesting a file:
1. You enter a URL such as http://myserver.com into your browser.
2. Your browser looks up the IP address for myserver.com.
3. Your browser issues a request for the home page from myserver.com.
4. The request crosses the Internet and arrives at the myserver.com web server.
5. The web server looks for the web page on its hard disk.
6. The web page is retrieved by the server and returned to the browser.
7. Your browser displays the web page.
In Step 1, the user enters a URL (Uniform Resource Locator), also known as a web address, into the browser’s input field. In this instance the root document (or home page) is being requested. Once the browser receives the request, then in Step 2, it makes a request to a set of servers on the Internet known as domain name servers. These translate sequences of letters such as myserver.com into an IP address, which consists of four groups of numbers separated by periods, like this: 184.108.40.206. In fact, all websites reside at IP addresses and you can demonstrate this by entering http://220.127.116.11 into a web browser, which should take you to Google’s website.
However, it’s difficult to remember such groups of numbers (and is even more so since IPv6 was introduced!). Therefore a system called DNS (Domain Name System) was invented, which simply stores domain names alongside their IP addresses, so that all you need to do is enter http://google.com, rather than an obscure set of numbers. Your browser then performs a DNS lookup, discovers that the IP this domain refers to is 18.104.22.168 and then initiates discussions directly with the web server at that address, as shown in Step 3.
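As a rough illustration of the two address formats, Python’s standard ipaddress module can check an address and report which version it is. The describe_address helper is a hypothetical name, and 2001:db8::1 is an address reserved for documentation rather than a real host.

```python
import ipaddress


def describe_address(text: str) -> str:
    """Report whether text is an IPv4 or IPv6 address and how many bits it has."""
    address = ipaddress.ip_address(text)  # raises ValueError if malformed
    return f"IPv{address.version}, {address.max_prefixlen} bits"


# The dotted form is four numbers, each in the range 0 to 255.
octets = [int(part) for part in "184.108.40.206".split(".")]
print(octets)
print(describe_address("184.108.40.206"))
print(describe_address("2001:db8::1"))
```

The jump from 32 to 128 bits is what makes IPv6 addresses so much harder to remember, and domain names correspondingly more useful.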
In Step 4, the request your browser makes to the web server traverses the Internet and arrives at the destination server where, in Step 5, the page requested (in this instance the home page), is fetched from the server’s file system. In Step 6, the web server then transmits that page (preceded by a header) back to your web browser, which then displays the page in Step 7.
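The first three steps can be sketched in Python as follows. This is only an outline under simplifying assumptions: build_request and lookup_ip are hypothetical helper names, the request contains just the bare minimum of headers, and lookup_ip is defined but deliberately not called here because it needs a live network connection.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def build_request(url: str) -> Tuple[str, str]:
    """Steps 1 and 3: turn a URL into (host, raw request text)."""
    parts = urlsplit(url)
    host = parts.hostname
    # An empty path means the root document (home page) is wanted.
    path = parts.path or "/"
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return host, request


def lookup_ip(host: str) -> str:
    """Step 2: ask the system's DNS resolver for the host's IPv4 address."""
    return socket.gethostbyname(host)  # needs network access when called


host, request = build_request("http://myserver.com")
print(host)
print(request)
```

The text produced by build_request is what actually crosses the Internet in Step 4, addressed to whatever IP the lookup in Step 2 returned.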
If the page was not found, then in Step 6 an appropriate error header will be returned to the web browser instead. Also, web server scripting languages such as Perl and PHP may first manipulate the document and its contents by adding, removing, or changing contents according to any embedded scripting commands. Such documents are generally recognizable by their commonly used file extensions of .pl and .php.
The Difference Between Get and Post Requests
When requesting a document, it is possible for the web client (or browser) to request additional information or send information to the web server using either Get or Post requests. In a Get request, data is appended to the tail of a URL in the form of a query string, which follows a ? character.
A Google search URL of this kind directly sends the search lookup string of html5 to the Google web servers by passing it as a string value in the argument q (written as q=html5). When Google sees this request, it knows to return to you all the pages it thinks are relevant to the request. A longer such request might end with q=html5+course, in which the + symbol is used in place of spaces.
In that case the search string html5 course is passed to Google.
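The encoding of a query string can be tried out with Python’s urllib.parse module. In this sketch the base address http://google.com/search is an assumption made for illustration, and build_search_url is a hypothetical helper; only the argument name q comes from the discussion above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed base address for illustration; the argument name q is from the text.
SEARCH_URL = "http://google.com/search"


def build_search_url(terms: str) -> str:
    """Append the search terms to the URL as a query string."""
    # urlencode replaces each space with + and escapes unsafe characters.
    return SEARCH_URL + "?" + urlencode({"q": terms})


url = build_search_url("html5 course")
print(url)
# The server reverses the process to recover the original string.
print(parse_qs(urlsplit(url).query))
```

Because the data is part of the URL itself, a Get request like this can be bookmarked or shared as an ordinary link.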
In a Post request, however, the additional information is passed from the client to the server in the body of the request rather than in the URL, which is neater as far as the user goes, because it does not appear in the address bar. Both Get and Post requests are discussed in detail later in this book.
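For comparison, here is a minimal Python sketch of preparing a Post request with the standard urllib.request module. The address http://myserver.com/search is a hypothetical form handler, and nothing is actually sent over the network; the point is only to show that the data travels separately from the URL.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form handler address, used only for illustration.
FORM_URL = "http://myserver.com/search"

# The same name=value encoding as a query string, but converted to bytes so
# that it can travel with the request instead of inside the URL.
payload = urlencode({"q": "html5 course"}).encode("ascii")

# Supplying data makes urllib treat this as a Post request. Nothing is sent
# until the request is passed to urllib.request.urlopen.
request = Request(FORM_URL, data=payload)

print(request.get_method())
print(request.full_url)
print(request.data)
```

The URL stays clean while the payload carries exactly the same encoded string that a Get request would have placed after the ? character.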
HTML documents are simply text files in which extra tags have been added within angle brackets, like this: <head>. So, for example, the tag <i> tells the web browser that all following text should be displayed using an italic font. And when a </i> is encountered, the preceding slash (/) character tells the browser to disable the italics. Therefore you frequently find HTML tags in pairs. For example, in the following line of HTML the word fox will appear in bold face, and dog in italics:
The <b>fox</b> jumps over the <i>dog</i>.
The result looks like this:
The fox jumps over the dog.
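To see how a program might interpret paired tags, the following Python sketch uses the standard html.parser module to record which tags are open around each piece of text. The TagTracker class is a hypothetical helper written for this illustration; real browsers are far more elaborate.

```python
from html.parser import HTMLParser


class TagTracker(HTMLParser):
    """Record each piece of text together with the tags open around it."""

    def __init__(self) -> None:
        super().__init__()
        self.open_tags = []   # tags opened but not yet closed
        self.pieces = []      # (text, tuple of enclosing tags)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # A closing tag such as </i> switches the formatting off again.
        if tag in self.open_tags:
            self.open_tags.remove(tag)

    def handle_data(self, data):
        self.pieces.append((data, tuple(self.open_tags)))


tracker = TagTracker()
tracker.feed("The <b>fox</b> jumps over the <i>dog</i>.")
tracker.close()
for text, tags in tracker.pieces:
    print(repr(text), tags)
```

Only the words fox and dog are reported with an enclosing tag, which matches the bold and italic words in the rendered sentence.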
There is a whole lot more to HTML, though, than simply markup tags, because many of the tags either support or require the use of attributes. These are arguments that you pass alongside the tag to provide further information to the web browser. Generally an attribute consists of an attribute name followed by the = sign and then either single or double quotation marks enclosing a value.
For example, to create a hyperlink that the user can click to navigate to another document, you use the <a> tag (which stands for anchor), like this:
<a href='http://google.com'>Visit Google</a>
In a web browser this displays simply as:
Visit Google
In HTML tags you can generally use the single or double quotation marks interchangeably. Therefore <a href="http://google.com"> is equivalent to <a href='http://google.com'>. Wherever possible, though, I tend to use single quotes because they don’t require pressing the Shift key to type them in. Also there are sometimes occasions when you need two levels of nested quotes, where I would then choose double quotation marks for the outer string, and then apply single quotes within it, like this: <p style="font-family:'Times New Roman';">.
In this element the href part (which stands for hypertext reference) is the attribute name, and the string http://google.com is the attribute value. The content between the opening and closing parts of this tag is the text Visit Google, which is simply displayed, and if default styling is applied, it will be shown in underlined blue (although this is easy to change with HTML or CSS—there’s more on this later in the book). The final </a> closes the tag, ready for displaying in the browser.
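The split between attribute name, attribute value, and content can be demonstrated with Python’s standard html.parser module. In this sketch the AnchorReader class and read_links function are hypothetical helpers; the example also checks that the single-quoted and double-quoted forms of the tag are read identically.

```python
from html.parser import HTMLParser


class AnchorReader(HTMLParser):
    """Collect the attributes and link text of each <a> element."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []        # one dict per anchor found
        self._inside = False   # True while between <a ...> and </a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs arrives as (name, value) pairs with the quotes removed.
            self.links.append({"attrs": dict(attrs), "text": ""})
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.links[-1]["text"] += data


def read_links(markup: str) -> list:
    """Parse the markup and return the anchors found in it."""
    reader = AnchorReader()
    reader.feed(markup)
    reader.close()
    return reader.links


single = read_links("<a href='http://google.com'>Visit Google</a>")
double = read_links('<a href="http://google.com">Visit Google</a>')
print(single)
print(single == double)
```

The quotation marks are only delimiters: once the tag has been read, just the name href and the value http://google.com remain.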
There are several different types of attributes available, with different tags supporting different attributes, but to give you an overview, here are some of the more common ones you will encounter and use:
• class This attribute lets you supply a group name that may apply to this and other objects. For example <p class='indent'> applies the class name indent to the <p> tag, which might be used by a style sheet (with a suitable rule) to indent the first line of all objects using it.
• style This attribute lets you apply a CSS style to an object by putting it within the quotation marks. For example, to apply the Arial font to a paragraph object, you could use the style attribute like this: <p style='font-family:Arial'>.
• title Any HTML element may be given a title, which most browsers will use to display as a tooltip when the mouse passes over it. For example, the following anchor displays a tooltip when the mouse passes over it: <a href='/' title='Go to the Home page'>.
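Because every attribute follows the same name='value' pattern, tags like the ones in this list are easy to generate from a program. The following Python sketch shows one possible way; make_tag is a hypothetical helper, and the content strings passed to it are made up for the example.

```python
from html import escape
from typing import Dict


def make_tag(name: str, content: str, attributes: Dict[str, str]) -> str:
    """Build an element, writing each attribute as name='value'."""
    parts = [name]
    for attr_name, value in attributes.items():
        # escape() protects quotes and angle brackets inside the value.
        parts.append(f"{attr_name}='{escape(value, quote=True)}'")
    return f"<{' '.join(parts)}>{escape(content)}</{name}>"


print(make_tag("p", "Indented text", {"class": "indent"}))
print(make_tag("p", "Arial text", {"style": "font-family:Arial"}))
print(make_tag("a", "Home", {"href": "/", "title": "Go to the Home page"}))
```

Escaping the values matters because a stray quotation mark inside an attribute value would otherwise end the attribute early.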
Now that you understand the basics of what HTML is about, in the next lesson I’ll introduce the different parts of an HTML document and their associated tags, such as the <html>, <head>, and <body> sections.
Test how much you have learned in this lesson with these questions. If you don’t know an answer, go back and reread the relevant section until your knowledge is complete. You can find the answers in the appendix.
1. What does the acronym HTML stand for?
2. What is the difference between a web browser and a web server?
3. What does the acronym HTTP stand for?
4. What does a web proxy do?
5. What file extension is often used by HTML documents?
6. What is a 404 page more commonly known as?
7. What is the difference between an IP address and a domain name?
8. What is a query string?
9. What is an HTML tag?
10. What is a tag attribute?