Cowford Ed, © May 2003
In January 2000, the World-Wide Web Consortium (W3C) declared XHTML the standard for web pages. With XHTML, HTML has been reformulated as an application of XML (eXtensible Markup Language), which is itself a restricted form of SGML (Standard Generalized Markup Language). The point (beyond creating excellent acronyms) is to make the data that's contained in web pages accessible for many years to come, and by a wide variety of applications.
XHTML is a standardized, non-proprietary encoding language that provides a means for sharing data online. In other words, it's an "adaptable internet tagging system" that anyone can use to make web pages.
XHTML is "purer" than old-fashioned HTML. It makes it easier for non-PC platforms (like Web-TV, palm computers, cell phones, etc.) and non-visual devices (like voice or Braille readers) to process web page data. And, it makes for more efficient indexing of web pages by search engine robots.
But, files created with XHTML are still called "HTML" documents.
To open Notepad, click "Start" on your desktop taskbar, then "Programs," "Accessories," and "Notepad."
All you need to create an HTML document is Microsoft® Notepad, or some other plain text editor. Simply save the document with the filename extension .html or .htm.
If your default web browser is Microsoft® Internet Explorer, double-click your original HTML file to open it in IE. Next, click "View," then "Source" on the menu bar. This opens the file in Notepad (while it stays open in IE), ready for editing. After you save any changes to your file, you can "refresh" the web page in IE to see how the revisions look.
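There are two main parts to the web page: the head, which holds information about the page (such as its title), and the body, which holds the content shown in the browser window. Both sit inside the html root element, which is preceded by the DOCTYPE.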
Thus, the five required elements are: DOCTYPE, html, head, title, and body.
It is not required that you type your tags and data on separate lines, or use double-spacing or indentations to separate different elements. These are used at the webmaster's discretion just to make the code easier for people to read.
<!DOCTYPE ... >
<html ... >
<head>
<title>[Web page title goes here]</title>
</head>
<body>
[Web page content goes here]
</body>
</html>
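Filled in, the template might look like the sketch below. It assumes the Strict DOCTYPE; the xmlns value is the XHTML namespace that XHTML 1.0 expects on the html root element, while the language attributes (xml:lang and lang set to English), the title, and the paragraph text are illustrative choices, not requirements.

```html
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<!-- Hypothetical page title -->
<title>My First XHTML Page</title>
</head>
<body>
<!-- In Strict mode, body content goes inside block elements such as p -->
<p>Hello, world! This paragraph is the page content.</p>
</body>
</html>
```

Saved as a .html file and opened in a browser, this shows the title in the window's title bar and the paragraph in the window itself.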
See also: Web Page Template
XHTML elements are defined by prescribed tags between less-than (<) and greater-than (>) angle brackets. Tags identify and delimit the various parts of the web page, and they help control the display of text and graphics.
The DOCTYPE precedes the html root element, and consists of just one (XML) tag enclosed in angle brackets. It starts off with an exclamation point, uses upper- and lowercase letters, and it isn't "closed" (i.e., there's no closing tag containing a slash mark).
The DOCTYPE enables different types of applications to display the web page consistently; without it the browser (or other device) must guess at what you intend, sometimes with unfortunate results. It also enables you to use W3C's HTML Validator, a very useful online tool that identifies any problems with your code.
There are three choices for an XHTML DOCTYPE: Strict, Transitional, and Frameset.
"99.9% of Websites are Obsolete." PC-based browsers (like Internet Explorer, Netscape, etc.) will likely always be able to read outdated HTML encoding because they have the memory capacity to cope with it. And you will still find tons of the old stuff as you look at examples on the internet. But that doesn't mean that it's good to perpetuate it ...
The specifics of the DOCTYPE are prescribed, so just copy and paste whichever one is appropriate:
Strict:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Transitional:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Frameset:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
According to 2002 statistics, about 88% of web users browse with either IE5 or IE6. But some people cling to outdated browsers (like the bug-infested Netscape 4, even though Netscape is now up to version 7.02), so many (probably most) web page designers use the W3C-authorized "transitional" mode, which mixes deprecated HTML tags with the standard XHTML ones. (But in this guide we will avoid using outdated coding--no use confusing the issue by learning stuff that you will have to "unlearn" later.)
Recognizing some particular problems with "legacy" browsers, the W3C has put together some HTML Compatibility Guidelines. (NOTE: Their compatibility guidelines have been built into our practical discussion. For instance, the "XML Declaration" that might precede the DOCTYPE has been omitted.)
Thus, there's a great deal of effort made trying to get web pages to "look" the same, regardless of the age of the browser. Chances are good, however, that those who use outdated browsers aren't that interested in "style" anyway--if they were they would upgrade since current versions of most browsers can be downloaded for free. So, you may not want to clutter up your code with unnecessary "work-arounds" trying to be all things to all browsers (which is impossible, anyway).
Just make sure that your content flows logically even if the style elements are removed.
In the HTML document proper, besides being enclosed in angle brackets (< >), all XHTML tags must be written in lowercase letters. They must also be "closed," meaning they are either paired with a separate closing tag or are self-closing "empty" tags.
Paired opening and closing tags are used to enclose all web page content. For example, <body> </body> respectively show the beginning and ending of all the web page data that will appear in the browser window. Note that in the closing tag a slash mark always precedes the tag name.
Opening and closing tag pairs must be properly "nested." This means you may not close one element until all the elements it contains have been closed first.
GOOD: <tag1> <tag2> data </tag2> </tag1>
BAD: <tag1> <tag2> data </tag1> </tag2>
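With real tags in place of tag1 and tag2, the rule looks like this. The paragraph (p) and emphasis (em) tags are used here only as familiar examples; any paired tags follow the same rule.

```html
<!-- GOOD: <em> is closed before the <p> that contains it -->
<p>This point is <em>very</em> important.</p>

<!-- BAD: </p> arrives while <em> is still open -->
<p>This point is <em>very important.</p></em>
```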
The self-closing slash in "empty" tags is new with XHTML; it does not appear in the earlier HTML encoding that is still seen in abundance on the internet.
Self-closing tags are complete in themselves and include everything they need inside a single pair of angle brackets. (In other words, they do not mark the beginning or end of a data string that they "contain.") These "empty" tags have a blank space and a slash mark before the closing angle bracket (e.g., <hr />). (The blank space before the slash isn't actually required by the XHTML syntax; it's included so that outdated browsers can process these tags properly, though the practice might be discouraged later on.)
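Two common empty tags, written with the space and slash described above: br forces a line break inside a paragraph, and hr draws a horizontal rule across the page. The address text is made up for illustration.

```html
<!-- br breaks the line without starting a new paragraph -->
<p>123 Main Street<br />
Anytown</p>

<!-- hr stands alone between blocks of content -->
<hr />
```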
Most tags can have identifying or modifying attributes. Attributes must have the following structure, in which the attribute's name and value are variables, and the equals sign and double quotation marks are the prescribed punctuation:
attribute_name="Value"
Note that the attribute_name may use only lowercase letters, but, depending on the tag it applies to, the value might use upper- and lowercases, numbers, spaces, punctuation marks, and even full sentences.
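For example, the image tag below carries two attributes. Both names, src and alt, are lowercase, while the alt value is a full sentence with a capital letter, spaces, and punctuation. The file path images/Sunset.jpg is hypothetical; XHTML 1.0 requires an alt attribute on every img, and in Strict mode the image sits inside a block element such as p.

```html
<p>
<!-- Hypothetical image file; alt describes it for non-visual devices -->
<img src="images/Sunset.jpg"
alt="The sun setting over the river, as seen from the back porch." />
</p>
```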
If you want white text on a black background, you can modify the <body> tag using the attribute named "style," with a detailed "value" that specifies these display features:
<body style="color: white; background-color: black">
XHTML more thoroughly separates the data content (i.e., what the document says) from its stylistic presentation (i.e., how the document looks, including fonts, background colors, margin widths, text-alignment, and the like).
Style features are assigned through the use of "cascading style sheets" (CSS), which became an official part of HTML with HTML 4.0. They are called "cascading" because they may be assigned at different levels that flow from one level into the next to create the desired stylistic effects.
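Cascading style instructions can be encoded using an external style sheet (a separate file that the page links to from its head), an internal style sheet (a list of style rules placed inside the page's head), or inline style attributes added to individual tags, as in the <body> example above.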
Successive levels inherit several style properties from the preceding levels, unless they redefine the property.
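The three levels might look like this in a single page fragment (the part between the html tags). This is a sketch: styles.css is a hypothetical file name and the colors are arbitrary. Because the inline rule is attached directly to its paragraph, it takes precedence over the internal rule for that paragraph.

```html
<head>
<title>Style Levels Example</title>

<!-- 1. External: rules live in a separate file (hypothetical name) -->
<link rel="stylesheet" type="text/css" href="styles.css" />

<!-- 2. Internal: rules sit in a style element inside the head -->
<style type="text/css">
p { color: navy; }
</style>
</head>
<body>
<!-- 3. Inline: this rule applies to this one element only -->
<p style="color: maroon">This paragraph is maroon, because the inline rule wins here.</p>
</body>
```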
For each style definition, there is a "selector" (that specifies what you want to change) and a "declaration" (that tells how you want it to look).
In the style sheets, the declaration is surrounded by curly braces and follows the selector:
selector { property: value }
With inline style definitions, the declaration becomes the value of the style attribute:
style="property: value"
If one selector has multiple declarations, the declarations are always separated by semicolons.
Say you want your section headings (for which you've used <h2> </h2> as the markup) to have white text on a black background ...
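In an internal style sheet, that instruction might be written as the rule below, with the two declarations separated by a semicolon. The inline alternative works too, but it has to be repeated on every h2. The heading text is a placeholder.

```html
<!-- In the head: one rule styles every h2 on the page -->
<style type="text/css">
h2 { color: white; background-color: black; }
</style>

<!-- In the body: the inline alternative, repeated on each h2 -->
<h2 style="color: white; background-color: black">Section Heading</h2>
```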
Using either external or internal style sheets, it is possible to create one "class" selector that can be applied to any number of elements that will share the same style properties. Choose a name for the selector, such as "red" for a class that will change the font color to red. In the style sheet you list the selector's name beginning with the prescribed punctuation mark for classes, the period. Then set up your declaration:
.red { color: red }
Every time you want red letters, just add the class attribute to whatever tag you're dealing with, and it will refer back to the specified declaration in the style sheet.
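A short sketch of the red class at work on several different elements. The heading and paragraph text are placeholders, and span (a generic inline container) is used to color just a few words inside an otherwise ordinary paragraph.

```html
<!-- In the head: the period marks "red" as a class selector -->
<style type="text/css">
.red { color: red; }
</style>

<!-- In the body: any number of elements can share the class -->
<h2 class="red">A Red Heading</h2>
<p class="red">This whole paragraph is red.</p>
<p>Only <span class="red">these few words</span> are red.</p>
```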
Also used in style sheets, the id attribute works like the class attribute, but in reverse. Instead of referring elements back to the style sheet, the id refers the style sheet definition(s) to just one specific element, and in the style list you begin the id's name with # instead of a period.
But why?
Say you have a group of paragraphs that you want to show up as white text in a navy box in the middle of the page. So, you group the paragraphs within a division that you id (and also "name" for "backward compatibility") as "navybox." Only you then realize that you can't see the blue-colored links -- you need the links in "navybox" to be in a contrasting color from the background, but you want the links outside "navybox" to stay as they are. You could try inline style attributes on all the "navybox" links, but the style sheet is easier, plus you can add some other effects not possible with the inline definitions.
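One way the "navybox" example might be written, as a sketch: the width, padding, link color, and page2.html are hypothetical choices. The key line is the descendant selector #navybox a, which lets one rule recolor every link inside the division instead of adding a style attribute to each link. The name attribute is left out here because XHTML 1.0 does not define a name attribute for div.

```html
<!-- In the head: # marks "navybox" as an id selector -->
<style type="text/css">
#navybox {
  color: white;
  background-color: navy;
  width: 60%;
  margin-left: auto;
  margin-right: auto;   /* auto side margins center the box */
  padding: 1em;
}
/* Only links inside #navybox turn yellow; other links are unchanged */
#navybox a { color: yellow; }
</style>

<!-- In the body: an id value may be used only once per page -->
<div id="navybox">
<p>White text in a navy box, with a <a href="page2.html">readable link</a>.</p>
<p>A second paragraph in the same box.</p>
</div>
```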
As you can see, CSS is a complex topic all by itself, and many more effects are possible. The W3C offers a brief tutorial called Adding a Touch of Style, by Dave Raggett. Another free online CSS Tutorial is available from www.w3schools.com.
In truth, XHTML isn't quite as easy as old-fashioned HTML (and certainly not as forgiving). But now you know the basics of creating web pages. And even though there's still plenty more to learn, you can do it!