A markup language combines text with extra information about that text. This extra information, called "markup", describes the structure and presentation of the text. The most widely known and used markup language is HTML (Hypertext Markup Language), the base language in which most web pages are written. HTML is the foundation of the World Wide Web and has been used to create web pages since the Web began.
HTML is closely related to XML (Extensible Markup Language), a W3C-recommended general-purpose markup language that supports a wide variety of applications. HTML was originally defined as an application of SGML, the older standard from which XML was also derived, and XHTML is the reformulation of HTML as an XML language. HTML has been standardized by the W3C (World Wide Web Consortium) so that browsers can present web pages consistently. XML languages, or "dialects", may be designed by anyone and may be processed by any conforming software. XML is also designed to be reasonably human-legible, and to this end terseness was not considered essential in its structure.
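To illustrate the point that anyone may design an XML dialect and have it processed by conforming software, here is a minimal sketch in Python. The recipe dialect, including its element and attribute names, is invented for this example; only the standard-library `xml.etree.ElementTree` parser is real.

```python
import xml.etree.ElementTree as ET

# A made-up XML dialect for recipes. The element and attribute names
# are hypothetical and exist only for this illustration.
RECIPE_XML = """\
<recipe name="Flatbread">
  <ingredient quantity="250" unit="g">flour</ingredient>
  <ingredient quantity="150" unit="ml">water</ingredient>
  <step>Mix the ingredients.</step>
  <step>Bake for ten minutes.</step>
</recipe>
"""

# A generic XML parser can read the dialect without knowing it in advance.
root = ET.fromstring(RECIPE_XML)

# Collect (name, quantity, unit) for every direct <ingredient> child.
ingredients = [
    (item.text, item.get("quantity"), item.get("unit"))
    for item in root.findall("ingredient")
]
steps = [step.text for step in root.findall("step")]

print(root.get("name"))
for name, quantity, unit in ingredients:
    print(f"{quantity} {unit} {name}")
```

The verbose, named tags also show the human-legibility trade-off: the document is longer than a terse format would be, but it can be read without a separate key.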
Classes of markup languages
Presentational markup expresses document structure through the visual appearance of the text, so that the structure has to be inferred from cues in the encoding. For example, in a text file, the title of a document might be preceded by several newlines and/or spaces, thus suggesting leading spacing and centering.
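The title example above can be turned into a small heuristic. The following Python sketch is only one possible guess at such an inference; the function name `guess_title` and the thresholds (at least two blank lines, at least one leading space) are assumptions made for illustration.

```python
from typing import Optional


def guess_title(text: str) -> Optional[str]:
    """Guess a title from layout cues alone.

    The first non-blank line counts as a title only if it follows at
    least two blank lines and is indented (a rough sign of centering).
    """
    blank_lines = 0
    for line in text.splitlines():
        if not line.strip():
            blank_lines += 1
            continue
        # This is the first non-blank line; decide and stop.
        looks_centered = line.startswith(" ")
        if blank_lines >= 2 and looks_centered:
            return line.strip()
        return None
    return None


sample = "\n\n\n        My Report\n\nBody text starts here."
print(guess_title(sample))
```

Because the structure is never stated outright, a heuristic like this can be wrong in both directions, which is the main weakness of relying on presentational cues.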
Procedural markup is typically also focused on the presentation of text, but it is usually visible to the user editing the text file and is expected to be interpreted by software in the order in which it appears.
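The in-order interpretation of procedural markup can be sketched with a toy interpreter. The two commands used here, `.center` and `.indent N`, are a hypothetical mini-language made up for this example and are not taken from any real formatter.

```python
def render(source: str, width: int = 40) -> list[str]:
    """Interpret a toy procedural markup line by line, in order."""
    indent = 0            # current left indent, changed by ".indent N"
    center_next = False   # set by ".center", applies to the next text line
    output = []
    for line in source.splitlines():
        if line.startswith(".indent "):
            indent = int(line.split()[1])
        elif line == ".center":
            center_next = True
        elif center_next:
            output.append(line.center(width).rstrip())
            center_next = False
        else:
            output.append(" " * indent + line)
    return output


document = ".center\nTitle\n.indent 4\nBody text"
for rendered_line in render(document, width=11):
    print(rendered_line)
```

Each command changes the interpreter's state for the text that follows it, so moving a command to a different line changes the result. That dependence on order is what distinguishes this style from descriptive markup.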
Descriptive markup, or semantic markup, applies labels to fragments of text without necessarily mandating any particular display or other processing semantics. For example, the Atom syndication language provides markup to label the "updated" timestamp, which is an assertion from the publisher as to when some item of information was last changed. While the Atom specification discusses the meaning of the "updated" timestamp and specifies the markup used to identify it, it makes no assertions about whether or how it might be presented to a user. Software might put this markup to a variety of uses, including many not foreseen by the designers of the Atom language. SGML and XML are systems explicitly designed to support the design of descriptive markup languages.
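As a rough illustration of the Atom example, the Python sketch below reads the "updated" element from a single entry and then uses it in two unrelated ways. The entry content, the identifier, and the cutoff date are invented for the example; the namespace URI is the one defined for Atom.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# A hand-written entry; the title, id and timestamp are placeholders.
ENTRY_XML = """\
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example entry</title>
  <id>urn:uuid:00000000-0000-0000-0000-000000000001</id>
  <updated>2005-07-31T12:29:29+00:00</updated>
</entry>
"""

entry = ET.fromstring(ENTRY_XML)
updated_text = entry.findtext("atom:updated", namespaces=ATOM_NS)
updated = datetime.fromisoformat(updated_text)

# Use 1: show the date to a reader.
display_date = updated.date().isoformat()

# Use 2: decide whether the entry changed after some cutoff, with no display at all.
cutoff = datetime(2005, 1, 1, tzinfo=timezone.utc)
changed_since_cutoff = updated > cutoff

print(display_date, changed_since_cutoff)
```

The markup only says what the value is. Whether it is displayed, compared, sorted, or ignored is left entirely to the consuming software.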
Generic markup is another term for descriptive markup. Most modern descriptive markup systems structure documents into trees, while also providing some means for embedding cross-references. Because of this, documents can be readily treated as databases, in which the database system is aware of the structure. Because they do not have such strict schemas as relational databases, however, they are commonly called "semi-structured databases".
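The idea of treating a tree-structured document with cross-references as a small database can be sketched as follows. The `book`, `chapter`, and `ref` elements and the `id` and `target` attributes are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

# A tiny document: a tree of chapters, plus <ref> elements that point
# across the tree by id. All names here are made up for illustration.
DOC_XML = """\
<book>
  <chapter id="ch1"><title>Trees</title>
    <para>See <ref target="ch2"/> for links.</para>
  </chapter>
  <chapter id="ch2"><title>Cross-references</title>
    <para>Refers back to <ref target="ch1"/>.</para>
  </chapter>
</book>
"""

book = ET.fromstring(DOC_XML)

# Build an index from id to chapter element, much like a primary key.
chapters = {chapter.get("id"): chapter for chapter in book.findall("chapter")}

# Query: for every cross-reference, pair the source title with the target title.
links = [
    (chapter.findtext("title"), chapters[ref.get("target")].findtext("title"))
    for chapter in book.findall("chapter")
    for ref in chapter.iter("ref")
]

# A path expression that selects by attribute value.
second_title = book.find(".//chapter[@id='ch2']/title").text

print(links)
print(second_title)
```

No schema was declared before querying, which is the sense in which such documents are called semi-structured: the structure is carried by the document itself rather than by a fixed table definition.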
In the third millennium, great interest has arisen in document structures that are not trees. For example, ancient and sacred literature commonly has a rhetorical or prose structure (stories, pericopes, paragraphs, and so on), as well as a reference structure (books, chapters, verses, lines). Since the boundaries of these units often cross, they cannot readily be encoded using tree-structured markup systems.
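A short Python sketch can make the crossing-boundary problem concrete. It models each unit as a range of character offsets and reports pairs that overlap without one containing the other. The offsets and unit names are invented for illustration.

```python
def crosses(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if half-open spans a and b overlap but neither contains the other."""
    (a_start, a_end), (b_start, b_end) = a, b
    overlap = a_start < b_end and b_start < a_end
    a_in_b = b_start <= a_start and a_end <= b_end
    b_in_a = a_start <= b_start and b_end <= a_end
    return overlap and not (a_in_b or b_in_a)


# Hypothetical offsets: paragraph B starts in the middle of verse 1
# and runs to the end of verse 2.
verses = {"verse 1": (0, 50), "verse 2": (50, 120)}
paragraphs = {"paragraph A": (0, 30), "paragraph B": (30, 120)}

conflicts = [
    (verse, paragraph)
    for verse, verse_span in verses.items()
    for paragraph, paragraph_span in paragraphs.items()
    if crosses(verse_span, paragraph_span)
]

print(conflicts)
```

Any pair reported here cannot be written as properly nested elements in a single tree, because one element would have to close inside the other.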