5.1. Markup: A General Overview
A markup language is a system for marking or tagging a document to
define the structure of the document. You may add tags to your document
to define which parts of your document are paragraphs, titles,
sections, glossary items (the list goes on!).
There are many markup languages in use today. XHTML and HTML will be
familiar to those who author web documents. The LDP uses a markup
language known as DocBook. Each of these markup languages uses its own
"controlled vocabulary" to describe documents. For
example, in XHTML a paragraph is marked up with the tagset
<p></p>, while in DocBook a paragraph is marked up
with <para></para>. The tagsets are defined in a kind of
dictionary known as a Document Type Definition (DTD).
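The difference in vocabulary can be seen in a minimal sketch. The Python snippet below uses the standard library's xml.etree.ElementTree module to parse the same sentence marked up once with the XHTML tagset and once with the DocBook tagset. The sample sentence, the variable names, and the root_tag helper are illustrative assumptions and are not part of any LDP tooling.

```python
import xml.etree.ElementTree as ET

# The same content, tagged with two different vocabularies.
# Both fragments are hypothetical examples written for this sketch.
xhtml_fragment = "<p>Linux is a kernel.</p>"
docbook_fragment = "<para>Linux is a kernel.</para>"


def root_tag(fragment: str) -> str:
    """Return the name of the outermost element in a markup fragment."""
    return ET.fromstring(fragment).tag


for label, fragment in [("XHTML", xhtml_fragment), ("DocBook", docbook_fragment)]:
    element = ET.fromstring(fragment)
    # The text is identical; only the tag name differs.
    print(f"{label}: tag={element.tag!r}, text={element.text!r}")
```

The parser only reports which tags are present. Whether a tag such as para is allowed at a given place in a document is decided by the DTD, which this sketch does not consult.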
Markup languages also follow a set of rules that govern how a document
can be assembled. These rules come from either SGML (Standard Generalized
Markup Language) or XML (eXtensible Markup Language), and they are
essentially the "grammar" of a document's markup. SGML and
XML are very similar. XML is a subset of SGML, but XML requires more
precise use of the tags when marking up a document.
The LDP accepts both SGML and XML documents, but prefers XML.
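As a rough illustration of what "more precise use of the tags" means, the sketch below checks two fragments for XML well-formedness with Python's standard xml.etree.ElementTree module. Some SGML applications, such as HTML 4, allow the closing </p> tag to be omitted, whereas XML requires every element to be closed explicitly. The is_well_formed helper and both sample fragments are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Every element is explicitly closed, as XML requires.
closed = "<body><p>First.</p><p>Second.</p></body>"

# The closing </p> tags are omitted, which HTML 4 (an SGML
# application) tolerates but an XML parser rejects.
unclosed = "<body><p>First.<p>Second.</body>"

print(is_well_formed(closed))    # True
print(is_well_formed(unclosed))  # False
```

This checks well-formedness only. Validating a document against the DocBook DTD is a separate step that needs a validating parser.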
An XML/SGML document that is meant to be read by a person has three
components: content, markup, and transformations.
| Content, markup and transformations |
|---|
| Steve Champeon does a great job of explaining how content, markup languages, and transformations all fit together in his article The Secret Life of Markup. Although he is writing from an HTML perspective, the ideas are relevant and there is an example of DocBook markup. |