Standard Generalized Markup Language

[ temporary import ]
please note:
- the content below is remote from Wikipedia
- it has been imported raw for GetWiki

The Standard Generalized Markup Language (SGML; ISO 8879:1986) is a standard for defining generalized markup languages for documents. ISO 8879 Annex A.1 defines generalized markup:

Generalized markup is based on two postulates:
  • Markup should be declarative: it should describe a document's structure and other attributes, rather than specify the processing to be performed on it. Declarative markup is less likely to conflict with unforeseen future processing needs and techniques.
  • Markup should be rigorous so that the techniques available for processing rigorously-defined objects like programs and databases can be used for processing documents as well.
HTML was theoretically an example of an SGML-based language until HTML 5, which browsers cannot parse as SGML for compatibility reasons. DocBook SGML and LinuxDoc are examples that were used almost exclusively with actual SGML tools.

Standard versions

SGML is an ISO standard: "ISO 8879:1986 Information processing – Text and office systems – Standard Generalized Markup Language (SGML)", of which there are three versions:
  • Original SGML, which was accepted in October 1986, followed by a minor Technical Corrigendum.
  • SGML (ENR), in 1996, resulted from a Technical Corrigendum to add extended naming rules allowing arbitrary-language and -script markup.
  • SGML (ENR+WWW or WebSGML), in 1998, resulted from a Technical Corrigendum to better support XML and WWW requirements.
SGML is part of a trio of enabling ISO standards for electronic documents developed by ISO/IEC JTC 1/SC 34 (ISO/IEC Joint Technical Committee 1, Subcommittee 34 – Document description and processing languages):
  • SGML (ISO 8879): Generalized markup language
    • SGML was reworked in 1998 into XML, a successful profile of SGML. Full SGML is rarely found or used in new projects.
  • DSSSL (ISO/IEC 10179): Document processing and styling language based on Scheme.
    • DSSSL was reworked into W3C XSLT and XSL-FO, which use an XML syntax. Nowadays, DSSSL is rarely used in new projects apart from Linux documentation.
  • HyTime (ISO/IEC 10744): Generalized hypertext and scheduling.
    • HyTime was partially reworked into W3C XLink. HyTime is rarely used in new projects.
SGML is supported by various technical reports, in particular
  • ISO/IEC TR 9573 – Information processing – SGML support facilities – Techniques for using SGML (International Organization for Standardization, 1991)
    • Part 13: Public entity sets for mathematics and science
      • In 2007, the W3C MathML working group agreed to assume the maintenance of these entity sets.


History

SGML descended from IBM's Generalized Markup Language (GML), which Charles Goldfarb, Edward Mosher, and Raymond Lorie developed in the 1960s. Goldfarb, editor of the international standard, coined the “GML” term using their surname initials (Charles F. Goldfarb, "The Roots of SGML – A Personal Recollection", 1996). Goldfarb also wrote the definitive work on SGML syntax, "The SGML Handbook" (Charles F. Goldfarb, 1990). The syntax of SGML is closer to the COCOA format.

As a document markup language, SGML was originally designed to enable the sharing of machine-readable large-project documents in government, law, and industry. Many such documents must remain readable for several decades, a long time in the information technology field. SGML also was extensively applied by the military, and the aerospace, technical reference, and industrial publishing industries. The advent of the XML profile has made SGML suitable for widespread application for small-scale, general-purpose use.

[ Figure: A fragment of the Oxford English Dictionary ]

Document validity

SGML (ENR+WWW) defines two kinds of validity. According to the revised Terms and Definitions of ISO 8879 (from the public draft):

"A conforming SGML document must be either a type-valid SGML document, a tag-valid SGML document, or both. Note: A user may wish to enforce additional constraints on a document, such as whether a document instance is integrally-stored or free of entity references."

A type-valid SGML document is defined by the standard as:

"An SGML document in which, for each document instance, there is an associated document type declaration (DTD) to whose DTD that instance conforms."

A tag-valid SGML document is defined by the standard as:

"An SGML document, all of whose document instances are fully tagged. There need not be a document type declaration associated with any of the instances. Note: If there is a document type declaration, the instance can be parsed with or without reference to it."


Tag-validity was introduced in SGML (ENR+WWW) to support XML, which allows documents that have no DOCTYPE declaration but can be parsed without a grammar, as well as documents whose DOCTYPE declaration makes no XML Infoset contributions to the document. The standard calls this "fully tagged". "Integrally stored" reflects the XML requirement that elements end in the same entity in which they started. "Reference-free" reflects the HTML requirement that entity references are for special characters and do not contain markup. SGML validity commentary, especially commentary that was made before 1997 or that is unaware of SGML (ENR+WWW), covers type-validity only.

The SGML emphasis on validity supports the requirement for generalized markup that markup should be rigorous (ISO 8879 A.1).


Syntax

An SGML document may have three parts:
  1. the SGML Declaration,
  2. the Prologue, containing a DOCTYPE declaration with the various markup declarations that together make a Document Type Definition (DTD), and
  3. the instance itself, containing one top-most element and its contents.
An SGML document may be composed from many entities (discrete pieces of text). In SGML, the entities and element types used in the document may be specified with a DTD; the different character sets, features, delimiter sets, and keywords are specified in the SGML Declaration to create the concrete syntax of the document.

Although full SGML allows implicit markup and some other kinds of tags, the XML specification (s4.3.1) states:

[ quotation not preserved in this import ]

For introductory information on a basic, modern SGML syntax, see XML. The following material concentrates on features not in XML and is not a comprehensive summary of SGML syntax.

Optional features

SGML generalizes and supports a wide range of markup languages as found in the mid-1980s. These ranged from terse Wiki-like syntaxes to RTF-like bracketed languages to HTML-like matching-tag languages. SGML did this by a relatively simple default reference concrete syntax augmented with a large number of optional features that could be enabled in the SGML Declaration. Not every SGML parser can necessarily process every SGML document. Because each processor's System Declaration can be compared to the document's SGML Declaration, it is always possible to know whether a document is supported by a particular processor.

Many SGML features relate to markup minimization. Other features relate to concurrent (parallel) markup (CONCUR), to linking processing attributes (LINK), and to embedding SGML documents within SGML documents (SUBDOC).

The notion of customizable features was not appropriate for Web use, so one goal of XML was to minimize optional features. However, XML's well-formedness rules cannot support Wiki-like languages, leaving them unstandardized and difficult to integrate with non-text information systems.

Concrete and abstract syntaxes

The usual (default) SGML concrete syntax resembles this example, which is the default HTML concrete syntax:
<QUOTE TYPE="example">typically something like <ITALICS>this</ITALICS></QUOTE>
SGML provides an abstract syntax that can be implemented in many different types of concrete syntax. Although the markup norm is using angle brackets as start- and end-tag delimiters in an SGML document (per the standard-defined reference concrete syntax), it is possible to use other characters, provided a suitable concrete syntax is defined in the document's SGML Declaration (Wayne Wohler, "SGML Declarations", July 21, 1998). For example, an SGML interpreter might be programmed to parse GML, wherein the tags are delimited with a left colon and a right full stop, and an :e prefix denotes an end tag, as in :xmp.Hello, world:exmp. (where the final full stop is the tag-close delimiter).

According to the reference syntax, letter-case (upper- or lower-) is not distinguished in tag names; thus the three tags (i) <quote>, (ii) <QUOTE>, and (iii) <quOtE> are equivalent. (NOTE: A concrete syntax might change this rule via the NAMECASE NAMING declarations.)
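
The separation between abstract and concrete syntax can be illustrated with a minimal Python sketch. It is not taken from the standard or from any SGML toolkit: the ConcreteSyntax record and the tokenize function are hypothetical names invented here, the three delimiter fields are merely named after the SGML delimiter roles STAGO, ETAGO, and TAGC, and attributes, entities, and all other parser features are ignored.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ConcreteSyntax:
    """Hypothetical record of the delimiters a toy tokenizer should use."""
    stago: str              # start-tag open delimiter
    etago: str              # end-tag open delimiter
    tagc: str               # tag close delimiter
    fold_case: bool = True  # treat tag names case-insensitively (fold to upper case)


# Angle-bracket delimiters, as in the reference concrete syntax.
REFERENCE = ConcreteSyntax(stago="<", etago="</", tagc=">")
# GML-like delimiters: left colon, ":e" prefix for end tags, right full stop.
GML_LIKE = ConcreteSyntax(stago=":", etago=":e", tagc=".")


def tokenize(text, syntax):
    """Split text into ("start" | "end" | "text", value) tuples.

    Toy limitation: with GML-like delimiters, a start tag whose name begins
    with "e" would be misread as an end tag; a real parser knows its tag names.
    """
    name = r"[A-Za-z][A-Za-z0-9]*"
    tag = re.compile(
        f"{re.escape(syntax.etago)}(?P<end>{name}){re.escape(syntax.tagc)}"
        f"|{re.escape(syntax.stago)}(?P<start>{name}){re.escape(syntax.tagc)}"
    )
    tokens, pos = [], 0
    for m in tag.finditer(text):
        if m.start() > pos:
            tokens.append(("text", text[pos:m.start()]))
        kind = "end" if m.group("end") else "start"
        tag_name = m.group(kind)
        tokens.append((kind, tag_name.upper() if syntax.fold_case else tag_name))
        pos = m.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens


gml_tokens = tokenize(":xmp.Hello, world:exmp.", GML_LIKE)
ref_tokens = tokenize("<quOtE>Hello, world</QUOTE>", REFERENCE)
```

Under these assumptions both inputs yield the same shape of token stream (a start tag, the text, and a matching end tag), which is the sense in which one abstract structure can be written in different concrete syntaxes.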

Markup minimization

SGML has features for reducing the number of characters required to mark up a document, which must be enabled in the SGML Declaration. SGML processors need not support every available feature, thus allowing applications to tolerate many types of inadvertent markup omissions; however, SGML systems usually are intolerant of invalid structures. XML is intolerant of syntax omissions, and does not require a DTD for validation.


Both start tags and end tags may be omitted from a document instance, provided:
  1. the OMITTAG feature is enabled in the SGML Declaration,
  2. the DTD indicates that the tags are permitted to be omitted,
  3. (for start tags) the element has no associated required (REQUIRED) attributes, and
  4. the tag can be unambiguously inferred by context.
For example, if OMITTAG YES is specified in the SGML Declaration (enabling the OMITTAG feature), and the DTD includes the following declarations:
<!ELEMENT chapter - - (title, section+)>
<!ELEMENT title o o (#PCDATA)>
<!ELEMENT section - - (title, subsection+)>
then this excerpt:
<chapter>Introduction to SGML
<section>The SGML Declaration
<subsection>
...
which omits two <title> tags and two </title> tags, would represent valid markup.

Note also that omitting tags is optional; the same excerpt could be tagged like this:
<chapter><title>Introduction to SGML</title>
<section><title>The SGML Declaration</title>
<subsection>
...
and would still represent valid markup.

Note: The OMITTAG feature is unrelated to the tagging of elements whose declared content is EMPTY as defined in the DTD:
<!ELEMENT image - o EMPTY>
Elements defined like this have no end tag, and specifying one in the document instance would result in invalid markup. This is syntactically different from XML empty elements.
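
The inference that OMITTAG relies on can be sketched in a few lines of Python. This is only a toy under stated assumptions, not how any real SGML parser is implemented: the element names chapter, section, and subsection, the FIRST_CHILD table (a stand-in for real content models), and the function infer_omitted_tags are all hypothetical, and only the single case of an omissible title element is handled.

```python
# Assumption: the first child each element's content model requires.
FIRST_CHILD = {"chapter": "title", "section": "title"}
# Assumption: elements whose start and end tags are both declared omissible.
OMISSIBLE = {"title"}


def infer_omitted_tags(tokens):
    """Make omitted tags explicit.

    tokens is a list of ("start", name) or ("text", data) tuples;
    the result is a markup string with the inferred tags written out.
    """
    out, stack = [], []
    for kind, value in tokens:
        if kind == "start":
            # A text-only omissible element cannot contain another element,
            # so a new start tag implies its end tag.
            if stack and stack[-1] in OMISSIBLE:
                out.append(f"</{stack.pop()}>")
            out.append(f"<{value}>")
            stack.append(value)
        else:
            parent = stack[-1] if stack else None
            needed = FIRST_CHILD.get(parent)
            # Text where an omissible first child is required implies its start tag.
            if needed in OMISSIBLE:
                out.append(f"<{needed}>")
                stack.append(needed)
            out.append(value)
    return "".join(out)


minimized = [
    ("start", "chapter"), ("text", "Introduction to SGML"),
    ("start", "section"), ("text", "The SGML Declaration"),
    ("start", "subsection"),
]
explicit = infer_omitted_tags(minimized)
```

The sketch recovers two title start tags and two title end tags from context alone, which mirrors the condition that an omitted tag must be unambiguously inferable.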


Tags can be replaced with delimiter strings, for a terser markup, via the SHORTREF feature. This markup style is now associated with wiki markup, e.g. wherein two equals-signs (==), at the start of a line, are the “heading start-tag”, and two equals signs (==) after that are the “heading end-tag”.
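
The effect of such a short reference can be imitated with a small Python sketch. Real SGML declares short-reference mappings in the DTD; the code below only mimics the outcome for the wiki-style heading case described above, and both the function name expand_heading_shortrefs and the element name heading are assumptions made for illustration.

```python
import re

# Two equals signs at the start of a line open a heading; two more close it.
HEADING = re.compile(r"==\s*(.*?)\s*==")


def expand_heading_shortrefs(text):
    """Rewrite lines of the form '== Title ==' as explicit heading elements."""
    out = []
    for line in text.splitlines():
        m = HEADING.fullmatch(line)
        out.append(f"<heading>{m.group(1)}</heading>" if m else line)
    return "\n".join(out)


expanded = expand_heading_shortrefs("== History ==\nSGML descended from GML.")
```

Lines that do not match the delimiter pattern pass through unchanged, so ordinary text is unaffected.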


SGML markup languages whose concrete syntax enables the SHORTTAG VALUE feature do not require attribute values containing only alphanumeric characters to be enclosed within quotation marks, either double " " (LIT) or single ' ' (LITA), so that the previous markup example could be written:
<QUOTE TYPE=example>typically something like <ITALICS>this</ITALICS></QUOTE>
One feature of SGML markup languages is the "presumptuous empty tagging", such that the empty end tag </> in <ITALICS>this</> "inherits" its value from the nearest previous full start tag, which, in this example, is <ITALICS> (in other words, it closes the most recently opened item). The expression is thus equivalent to <ITALICS>this</ITALICS>.
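
The "closes the most recently opened item" rule amounts to a stack lookup, which the following Python sketch illustrates. It is a toy, not a parser: the function name expand_empty_end_tags is hypothetical, the element names are only illustrative, and the regular expression assumes attribute values never contain angle brackets.

```python
import re

# Matches a start tag, a named end tag, or the empty end tag "</>".
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)?[^<>]*>")


def expand_empty_end_tags(markup):
    """Replace each empty end tag with the end tag of the innermost open element."""
    stack = []

    def repl(m):
        closing, name = m.group(1), m.group(2)
        if closing:
            if not stack:
                raise ValueError("end tag with no open element")
            open_name = stack.pop()
            # An empty end tag takes the name of the element just popped.
            return f"</{open_name}>" if name is None else m.group(0)
        if name is not None:
            stack.append(name)
        return m.group(0)

    return TAG.sub(repl, markup)


expanded = expand_empty_end_tags("<ITALICS>this</>")
```

Nested uses resolve from the inside out, because each empty end tag pops one element from the stack.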


Another feature is the NET (Null End Tag) construction: <ITALICS/this/, which is structurally equivalent to <ITALICS>this</ITALICS>.
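
As a rough illustration, the null end tag form can be rewritten into full tags with a single regular expression. This Python sketch assumes the slash is used both to close the start tag and as the null end tag, that the element content contains neither a slash nor a nested tag, and that the function name expand_net and the element name ITALICS are illustrative only.

```python
import re

# <NAME/content/ : the first slash ends the start tag, the second ends the element.
NET = re.compile(r"<([A-Za-z][A-Za-z0-9.-]*)/([^/<]*)/")


def expand_net(markup):
    """Rewrite <NAME/content/ as <NAME>content</NAME>."""
    return NET.sub(r"<\1>\2</\1>", markup)


expanded = expand_net("<ITALICS/this/")
```

Ordinary start and end tags are left alone, since neither has a slash directly after the element name.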

Other features

Additionally, the SHORTTAG NETENABL IMMEDNET feature allows shortening tags surrounding an empty text value, but forbids shortening full tags:
<QUOTE></QUOTE>
can be written as
<QUOTE//

Open-source implementations

Significant open-source implementations of SGML have included:
  • ARC-SGML, by Standard Generalized Markup Language Users' Group, 1991, C language
  • SGMLS, by James Clark, 1993, C language
  • Project YAO, by Yuan-ze Institute of Technology, Taiwan, with Charles Goldfarb, 1994, object
  • SP by James Clark, C++ language
SP and Jade (the associated DSSSL processor) are maintained by the OpenJade project, and are common parts of Linux distributions. A general archive of SGML software and materials resides at SUNET. The original HTML parser class in Sun Microsystems' implementation of Java is a limited-features SGML parser, using SGML terminology and concepts.

- time: 8:05pm EDT - Sun, Jul 22 2018