XHTML (Extensible Hypertext Markup Language) is a family of XML markup languages that mirror or extend versions of the widely used Hypertext Markup Language (HTML), the language in which web pages are written.
While HTML (prior to HTML5) was defined as an application of Standard Generalized Markup Language (SGML), a very flexible markup language framework, XHTML is an application of XML, a more restrictive subset of SGML. Because XHTML documents need to be well-formed, they can be parsed using standard XML parsers—unlike HTML, which requires a lenient HTML-specific parser.
XHTML 1.0 became a World Wide Web Consortium (W3C) Recommendation on January 26, 2000. XHTML 1.1 became a W3C Recommendation on May 31, 2001. XHTML5 is undergoing development as of September 2009, as part of the HTML5 specification.
XHTML 1.0 is “a reformulation of the three HTML 4 document types as applications of XML 1.0”. The World Wide Web Consortium (W3C) also continues to maintain the HTML 4.01 Recommendation, and the specifications for HTML5 and XHTML5 are being actively developed. In the current XHTML 1.0 Recommendation document, as published and revised to August 2002, the W3C commented that, “The XHTML family is the next step in the evolution of the Internet. By migrating to XHTML today, content developers can enter the XML world with all of its attendant benefits, while still remaining confident in their content’s backward and future compatibility.”
However, in 2004, the Web Hypertext Application Technology Working Group (WHATWG) formed, independently of the W3C, to work on advancing ordinary HTML not based on XHTML. Most major browser vendors were unwilling to implement the features in new W3C XHTML drafts, and felt that they didn’t serve the needs of modern web development. The WHATWG eventually began working on a standard that supported both XML and non-XML serializations, HTML 5, in parallel to W3C standards such as XHTML 2. In 2007, the W3C’s HTML working group voted to officially recognize HTML 5 and work on it as the next-generation HTML standard. In 2009, the W3C allowed the XHTML 2 Working Group’s charter to expire, acknowledging that HTML 5 would be the sole next-generation HTML standard, including both XML and non-XML serializations.
XHTML was developed to make HTML more extensible and increase interoperability with other data formats. HTML 4 was ostensibly an application of Standard Generalized Markup Language (SGML); however the specification for SGML was complex, and neither web browsers nor the HTML 4 Recommendation were fully conformant with it. The XML standard, approved in 1998, provided a simpler data format closer in spirit to HTML 4. By shifting to an XML format, it was hoped HTML would become compatible with common XML tools; servers and proxies would be able to transform content, as necessary, for constrained devices such as mobile phones. By utilizing namespaces, XHTML documents could provide extensibility by including fragments from other XML-based languages such as Scalable Vector Graphics and MathML. Finally, the renewed work would provide an opportunity to divide HTML into reusable components (XHTML Modularization) and clean up untidy parts of the language.
Relationship to HTML
The only essential difference between XHTML and HTML is that XHTML must be well-formed XML, while HTML need not be. (HTML 4 and earlier were nominally SGML, while HTML 5 defines its own parsing model in great detail.) Some examples of differences this imposes in practice are:
- In HTML, some tags (e.g., <br>) are always empty and may not have closing tags, whereas all elements must be explicitly closed in XHTML. XML permits two ways of writing empty elements: <br/> and <br></br>. In XML these are interchangeable, and either can be used freely for any tag. However, if XHTML content is to be served with the text/html Internet media type to legacy browsers, only the self-closing form should be used for always-empty elements. They should be written as <br />, with an extra space before the slash that helps some outdated browsers parse the tag. An explicit closing tag should be used for elements that happen to be empty but are not always empty (like <script></script>). If these rules and recommendations are not followed, some browsers will parse empty elements incorrectly.
- Similarly, HTML permits omitting end tags for some elements, such as <p>. XHTML forbids this.
- In HTML, almost everything is case-insensitive, while in XML, all element and attribute names are case-sensitive. XHTML requires all element and attribute names to be lowercase, while in HTML documents it’s common to find uppercase or even mixed-case names.
- Various versions of HTML often permit quotes to be omitted from attribute values, e.g., <body lang=en>. In XHTML, all attribute values must be enclosed by quotes, either single or double: <body lang='en'> or <body lang="en">.
- HTML permits “attribute minimization”, where boolean attributes can have their value omitted entirely, e.g., <option selected>. All XML attributes must have explicit quoted values, so in XML this would be written as <option selected="selected">.
- Some required elements may be omitted in HTML, in which case they are implicitly added by the parser. For instance, various versions of HTML don’t require <body> tags to be present unless they’re intended to have attributes. On the other hand, in XML the DOM must be determined without having to know which elements are required, so these tags must be specified explicitly.
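The differences listed above can be observed with any standard XML parser. The following is a minimal sketch using Python's built-in xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are illustrative assumptions, and the check covers well-formedness only, not validity against an XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if `markup` parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# HTML-style markup: unquoted attribute value, unclosed <br> and <p>,
# and a minimized boolean attribute.
html_style = "<body lang=en><p>One<br><p>Two<input checked></body>"

# The same content written to the XHTML rules described above.
xhtml_style = (
    '<body lang="en"><p>One<br /></p>'
    '<p>Two<input checked="checked" /></p></body>'
)

print(is_well_formed(html_style))     # False
print(is_well_formed(xhtml_style))    # True
# Element names are case-sensitive in XML, so mismatched case fails too.
print(is_well_formed("<P>text</p>"))  # False
```

A lenient HTML parser would accept all three inputs, which is why HTML requires its own parsing rules.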
In addition to these differences, some specifications define only an HTML serialization or only an XHTML serialization. XHTML 1.0 is roughly just an XML serialization of HTML 4.0, but XHTML 1.1, 1.2, and 2.0 have no HTML serialization, while HTML versions less than 4 have no XML serialization. HTML 5 is the first (X)HTML standard designed to support both HTML and XHTML serializations equally.
The similarities between HTML 4.01 and XHTML 1.0 led many web sites and content management systems to adopt the initial W3C XHTML 1.0 Recommendation. To aid authors in the transition, the W3C provided guidance on how to publish XHTML 1.0 documents in an HTML-compatible manner, and serve them to browsers that were not designed for XHTML.
Such “HTML-compatible” content is sent using the HTML media type (text/html) rather than the official Internet media type for XHTML (application/xhtml+xml). When comparing the adoption of XHTML with that of regular HTML, therefore, it is important to distinguish whether it is media type usage or actual document contents that is being compared.
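One way a server might decide between the two media types is content negotiation on the HTTP Accept header. The sketch below is an assumption for illustration, not a method the source prescribes: the function name choose_media_type is hypothetical, and it returns application/xhtml+xml only when the client lists that type explicitly with a non-zero quality value.

```python
XHTML_TYPE = "application/xhtml+xml"
HTML_TYPE = "text/html"


def choose_media_type(accept_header: str) -> str:
    """Pick a media type for an HTML-compatible XHTML document.

    Hypothetical helper: falls back to text/html unless the client
    explicitly accepts application/xhtml+xml with q > 0.
    """
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0  # the default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not accepted"
        if media_type == XHTML_TYPE and quality > 0:
            return XHTML_TYPE
    return HTML_TYPE


print(choose_media_type("text/html,application/xhtml+xml,*/*;q=0.8"))
print(choose_media_type("text/html, */*"))
```

Ignoring the */* wildcard is a deliberate choice here: a client that sends only a wildcard has not indicated that it can render XHTML served as XML.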
Most web browsers have mature support for all of the possible XHTML media types. The notable exception is Internet Explorer by Microsoft; rather than rendering application/xhtml+xml content, a dialog box invites the user to save the content to disk instead. Both Internet Explorer 7 (released in 2006) and Internet Explorer 8 (released in March 2009) exhibit this behavior, and it is unclear whether this will be resolved in a future release. As long as this remains the case, most web developers avoid using XHTML that isn’t HTML-compatible, so advantages of XML such as namespaces, faster parsing and smaller-footprint browsers do not benefit the user. Microsoft developer Chris Wilson explained in 2005 that IE7’s priorities were improved security and CSS support, and that proper XHTML support would be difficult to graft onto IE’s compatibility-oriented HTML parser.
In the early 2000s, some web developers began to question why Web authors ever made the leap into authoring in XHTML. Others countered that the problems ascribed to the use of XHTML could mostly be attributed to two main sources: the production of invalid XHTML documents by some Web authors and the lack of support for XHTML built into Internet Explorer 6. They went on to describe the benefits of XML-based Web documents (i.e. XHTML) regarding searching, indexing and parsing as well as future-proofing the Web itself.
In October 2006, HTML inventor and W3C chair Tim Berners-Lee, introducing a major W3C effort to develop new HTML 5 and XHTML 5 specifications, posted in his blog that, “The attempt to get the world to switch to XML … all at once didn’t work. The large HTML-generating public did not move … Some large communities did shift and are enjoying the fruits of well-formed systems … The plan is to charter a completely new HTML group.” In the current HTML and XHTML 5 working draft, its authors say that, “special attention has been given to defining clear conformance criteria for user agents in an effort to improve interoperability … while at the same time updating the HTML specifications to address issues raised in the past few years.” Ian Hickson, author of a paper criticising the improper use of XHTML in 2002, is a member of the group developing this specification and is listed as one of the co-authors of the current working draft.
Simon Pieters researched the XML-compliance of mobile browsers and concluded “the claim that XHTML would be needed for mobile devices is simply a myth”.
Versions of XHTML
December 1998 saw the publication of a W3C Working Draft entitled Reformulating HTML in XML. This introduced Voyager, the codename for a new markup language based on HTML 4 but adhering to the stricter syntax rules of XML. By February 1999 the specification had changed name to XHTML 1.0: The Extensible HyperText Markup Language, and in January 2000 it was officially adopted as a W3C Recommendation. There are three formal DTDs for XHTML 1.0, corresponding to the three different versions of HTML 4.01:
- XHTML 1.0 Strict is the XML equivalent to strict HTML 4.01, and includes elements and attributes that have not been marked deprecated in the HTML 4.01 specification.
- XHTML 1.0 Transitional is the XML equivalent of HTML 4.01 Transitional, and includes the presentational elements (such as strike) excluded from the strict version.
- XHTML 1.0 Frameset is the XML equivalent of HTML 4.01 Frameset, and allows for the definition of frameset documents—a common Web feature in the late 1990s.
The second edition of XHTML 1.0 became a W3C Recommendation in August 2002.
Modularization of XHTML
Modularization provides an abstract collection of components through which XHTML can be subsetted and extended. The feature is intended to help XHTML extend its reach onto emerging platforms, such as mobile devices and Web-enabled televisions. The initial draft of Modularization of XHTML became available in April 1999, and reached Recommendation status in April 2001.
The first XHTML Family Markup Languages to be developed with this technique were XHTML 1.1 and XHTML Basic 1.0. Another example is XHTML-Print (W3C Recommendation, September 2006), a language designed for printing from mobile devices to low-cost printers.
XHTML 1.1—Module-based XHTML
XHTML 1.1 evolved out of the work surrounding the initial Modularization of XHTML specification. The W3C released a first draft in September 1999; Recommendation status was reached in May 2001. The modules combined within XHTML 1.1 effectively recreate XHTML 1.0 Strict, with the addition of ruby annotation elements (ruby, rbc, rtc, rb, rt and rp) to better support East-Asian languages. Other changes include removal of the lang attribute (in favour of xml:lang), and removal of the name attribute from the a and map elements (in favour of id).
Although XHTML 1.1 is largely compatible with XHTML 1.0 and HTML 4, in August 2002 the HTML WG (since renamed the XHTML2 WG) issued a Working Group Note advising that it should not be transmitted with the HTML media type. With limited browser support for the alternate application/xhtml+xml media type, XHTML 1.1 proved unable to gain widespread use. In January 2009 a second edition of the document (XHTML Media Types – Second Edition, not to be confused with XHTML 1.1 – Second Edition) was issued, relaxing this restriction and allowing XHTML 1.1 to be served as text/html.
XHTML 1.1 Second Edition (W3C Proposed Edited Recommendation) was issued on 7 May 2009 and rescinded on 19 May 2009. (This does not affect the text/html media type usage for XHTML 1.1 as specified in XHTML Media Types – Second Edition.)
XHTML Basic and XHTML-MP
To support constrained devices, XHTML Basic was created by the W3C; it reached Recommendation status in December 2000. XHTML Basic 1.0 is the most restrictive version of XHTML, providing a minimal set of features that even the most limited devices can be expected to support.
The Open Mobile Alliance and its predecessor the WAP Forum released three specifications between 2001 and 2006 that extended XHTML Basic 1.0. Known as XHTML Mobile Profile or XHTML-MP, they were strongly focused on uniting the differing markup languages used on mobile handsets at the time. All provide richer form controls than XHTML Basic 1.0, along with varying levels of scripting support.
XHTML Basic 1.1 became a W3C Recommendation in July 2008, superseding XHTML-MP 1.2. XHTML Basic 1.1 is almost but not quite a subset of regular XHTML 1.1. The most notable addition over XHTML 1.1 is the inputmode attribute (also found in XHTML-MP 1.2), which provides hints to help browsers improve form entry.
The XHTML 2 Working Group is considering the creation of a new language based on XHTML 1.1. If XHTML 1.2 is created, it will include WAI-ARIA and role attributes to better support accessible web applications, and improved Semantic Web support through RDFa. The inputmode attribute from XHTML Basic 1.1, along with the target attribute (for specifying frame targets), may also be present. However, the XHTML2 WG has not yet been chartered to carry out the development of XHTML 1.2, and the W3C has announced that it does not intend to recharter the XHTML2 WG, so the XHTML 1.2 proposal may not eventuate.
Between August 2002 and July 2006 the W3C released the first eight Working Drafts of XHTML 2.0, a new version of XHTML able to make a clean break from the past by discarding the requirement of backward compatibility. This lack of compatibility with XHTML 1.x and HTML 4 caused some early controversy in the web developer community. Some parts of the language (such as the role and RDFa attributes) were subsequently split out of the specification and worked on as separate modules, partially to help make the transition from XHTML 1.x to XHTML 2.0 smoother. A ninth draft of XHTML 2.0 was expected to appear in 2009. However, on July 2, 2009, the W3C decided to let the XHTML2 Working Group charter expire by that year’s end, effectively halting any further development of the draft into a standard.
New features introduced by XHTML 2.0 include:
- HTML forms will be replaced by XForms, an XML-based user input specification allowing forms to be displayed appropriately for different rendering devices.
- HTML frames will be replaced by XFrames.
- The DOM Events will be replaced by XML Events, which uses the XML Document Object Model.
- A new list element type, the nl element type, will be included to specifically designate a list as a navigation list. This will be useful in creating nested menus, which are currently created by a wide variety of means like nested unordered lists or nested definition lists.
- Any element will be able to act as a hyperlink, e.g., <li href="articles.html">Articles</li>, similar to XLink. However, XLink itself is not compatible with XHTML due to design differences.
- Any element will be able to reference alternative media with the src attribute, e.g., <p src="lbridge.jpg" type="image/jpeg">London Bridge</p> is the same as <object src="lbridge.jpg" type="image/jpeg"><p>London Bridge</p></object>.
- The alt attribute of the img element has been removed: alternative text will be given in the content of the img element, much like the object element, e.g., <img src="hms_audacious.jpg">HMS <span>Audacious</span></img>.
- A single heading element (h) will be added. The level of each heading is determined by the depth of the nesting. This allows an unlimited number of heading levels, rather than limiting use to six levels deep.
- The remaining presentational elements (such as tt), still allowed in XHTML 1.x (even Strict), will be absent from XHTML 2.0. The only somewhat presentational elements remaining will be sup and sub for superscript and subscript respectively, because they have significant non-presentational uses and are required by certain languages. All other tags are meant to be semantic instead (e.g., <strong> for strong or bolded text) while allowing the user agent to control the presentation of elements via CSS.
- The addition of RDF triples with attributes such as about, to facilitate the conversion from XHTML to RDF/XML.
HTML5—Vocabulary and APIs for HTML5 and XHTML5
HTML5 initially grew independently of the W3C, through a loose group of browser manufacturers and other interested parties calling themselves the WHATWG, or Web Hypertext Application Technology Working Group. The WHATWG announced the existence of an open mailing list in June 2004, along with a website bearing the strapline “Maintaining and evolving HTML since 2004.” The key motive of the group was to create a platform for dynamic web applications; they considered XHTML 2.0 to be too document-centric, and not suitable for the creation of internet forum sites or online shops.
In April 2007, the Mozilla Foundation and Opera Software joined Apple in requesting that the newly rechartered HTML Working Group of the W3C adopt the work, under the name of HTML 5. The group resolved to do this the following month, and the First Public Working Draft of HTML 5 was issued by the W3C in January 2008. The most recent W3C Working Draft was published in June 2008.
HTML5 has both a regular text/html serialization and an XML serialization, which is known as XHTML5. In addition to the markup language, the specification includes a number of application programming interfaces. The Document Object Model is extended with APIs for editing, drag-and-drop, data storage and network communication.
The language is more compatible with HTML 4 and XHTML 1.x than XHTML 2.0, due to the decision to keep the existing HTML form elements and events model. It adds many new elements not found in XHTML 1.x, however, such as aside. (The XHTML 1.2 equivalent of such structural elements, which (X)HTML5 replaces, would be a div element with a role attribute, such as <div role="region">.)
As of 2009-09-03, the latest editor’s draft includes WAI-ARIA support.
Valid XHTML documents
An XHTML document that conforms to an XHTML specification is said to be valid. Validity assures consistency in document code, which in turn eases processing, but does not necessarily ensure consistent rendering by browsers. A document can be checked for validity with the W3C Markup Validation Service. In practice, many web development programs provide code validation based on the W3C standards.
The root element of an XHTML document must be html, and must contain an xmlns attribute to associate it with the XHTML namespace. The namespace URI for XHTML is http://www.w3.org/1999/xhtml. The example tag below additionally features an xml:lang attribute to identify the document with a natural language:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
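As a rough illustration of how an XML parser sees this root element, the sketch below uses Python's built-in xml.etree.ElementTree module to check the namespace and read the xml:lang attribute. The sample document and the helper name describe_root are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# The xml: prefix is always bound to this namespace by the XML specifications.
XML_NS = "http://www.w3.org/XML/1998/namespace"

document = (
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">'
    "<head><title>Example</title></head><body><p>Hello</p></body>"
    "</html>"
)


def describe_root(markup: str):
    """Return (is_xhtml_root, language) for a document (hypothetical helper)."""
    root = ET.fromstring(markup)
    # ElementTree spells qualified names as "{namespace-uri}local-name".
    is_xhtml_root = root.tag == f"{{{XHTML_NS}}}html"
    language = root.get(f"{{{XML_NS}}}lang")
    return is_xhtml_root, language


print(describe_root(document))  # (True, 'en')
```

Without the xmlns attribute the root would simply be an element named html in no namespace, and the first value would be False.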
In order to validate an XHTML document, a Document Type Declaration, or DOCTYPE, may be used. A DOCTYPE declares to the browser the Document Type Definition (DTD) to which the document conforms. A Document Type Declaration should be placed before the root element.
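For reference, the Document Type Declarations for the XHTML 1.0 and XHTML 1.1 DTDs can be assembled from their public and system identifiers. The sketch below lists the identifiers as published by the W3C; the dictionary DOCTYPES and the function doctype_declaration are hypothetical names used only for this illustration.

```python
# Maps a document type name to its (public identifier, system identifier).
DOCTYPES = {
    "XHTML 1.0 Strict": (
        "-//W3C//DTD XHTML 1.0 Strict//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
    ),
    "XHTML 1.0 Transitional": (
        "-//W3C//DTD XHTML 1.0 Transitional//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
    ),
    "XHTML 1.0 Frameset": (
        "-//W3C//DTD XHTML 1.0 Frameset//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd",
    ),
    "XHTML 1.1": (
        "-//W3C//DTD XHTML 1.1//EN",
        "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd",
    ),
}


def doctype_declaration(name: str) -> str:
    """Build the DOCTYPE line for one of the document types above."""
    public_id, system_id = DOCTYPES[name]
    return f'<!DOCTYPE html PUBLIC "{public_id}" "{system_id}">'


for name in DOCTYPES:
    print(doctype_declaration(name))
```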
The system identifier part of the DOCTYPE, which in these examples is the URL that begins with http://, need only point to a copy of the DTD to use, if the validator cannot locate one based on the public identifier (the other quoted string). It does not need to be the specific URL that is in these examples; in fact, authors are encouraged to use local copies of the DTD files when possible. The public identifier, however, must be character-for-character the same as in the examples.
A character encoding may be specified at the beginning of an XHTML document in the XML declaration when the document is served using the application/xhtml+xml MIME type. (If an XML document lacks encoding specification, an XML parser assumes that the encoding is UTF-8 or UTF-16, unless the encoding has already been determined by a higher protocol.)
<?xml version="1.0" encoding="UTF-8"?>
This declaration may be omitted because the encoding it declares is the default. However, if the document instead makes use of XML 1.1 or another character encoding, a declaration is necessary. Internet Explorer prior to version 7 enters quirks mode if it encounters an XML declaration in a document served as text/html.
Some of the most common errors in the usage of XHTML are:
- Not closing empty elements (elements without closing tags in HTML4)
Note that any of these is acceptable in XHTML: <br></br>, <br/> and <br />. Older HTML-only browsers interpreting it as HTML will generally accept <br> and <br />.
- Not closing non-empty elements
Incorrect: <p>This is a paragraph.<p>This is another paragraph.
Correct: <p>This is a paragraph.</p><p>This is another paragraph.</p>
- Improperly nesting elements (Note that this would also be invalid in HTML)
Incorrect: <em><strong>This is some text.</em></strong>
Correct: <em><strong>This is some text.</strong></em>
- Not putting quotation marks around attribute values
- Using the ampersand character outside of entities (Note that this would also be invalid in HTML)
Incorrect: <title>Cars & Trucks</title>
Correct: <title>Cars &amp; Trucks</title>
- Failing to recognize that XHTML elements and attributes are case sensitive
Incorrect: <BODY><P ID="ONE">The Best Page Ever</P></BODY>
Correct: <body><p id="ONE">The Best Page Ever</p></body>
- Using attribute minimization
- Misusing CDATA, script-comments and xml-comments when embedding scripts and stylesheets.
- This problem can be avoided altogether by putting all script and stylesheet information into separate files and referring to them from the XHTML document.
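Two of the errors listed above, unescaped ampersands and unquoted attribute values, can be avoided when markup is generated programmatically by escaping text before it is inserted. The sketch below uses the escape and quoteattr functions from Python's standard xml.sax.saxutils module; the wrapper functions title_element and body_start_tag are hypothetical names chosen for this example.

```python
from xml.sax.saxutils import escape, quoteattr


def title_element(text: str) -> str:
    """Wrap text in a title element, escaping &, < and > in the content."""
    return f"<title>{escape(text)}</title>"


def body_start_tag(lang: str) -> str:
    """Build a body start tag; quoteattr adds the quotes and escapes the value."""
    return f"<body lang={quoteattr(lang)}>"


print(title_element("Cars & Trucks"))  # <title>Cars &amp; Trucks</title>
print(body_start_tag("en"))            # <body lang="en">
```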