Goto Chapter: Top 1 2 3 4 5 6 7 A B C Bib Ind
 [Top of Book]  [Contents]   [Previous Chapter]   [Next Chapter] 

1 Introduction and Example
 1.1 XML
 1.2 A complete example
 1.3 Some questions

1 Introduction and Example

The main purpose of the GAPDoc package is to define a file format for documentation of GAP-programs and -packages (see [GAP06]). The problem is that such documentation should be readable in several output formats. For example it should be possible to read the documentation inside the terminal in which GAP is running (a text mode) and there should be a printable version in high typesetting quality (produced by some version of TeX). It is also popular to view GAP's online help with a Web-browser via an HTML-version of the documentation. Nowadays one can use LaTeX and standard viewer programs to produce and view on the screen dvi- or pdf-files with full support of internal and external hyperlinks. Certainly there will be other interesting document formats and tools in this direction in the future.

Our aim is to find a format for writing the documentation which allows a relatively easy translation into the output formats just mentioned and which hopefully makes it easy to translate to future output formats as well.

To make documentation written in the GAPDoc format directly usable, we also provide a set of programs, called converters, which produce text-, hyperlinked LaTeX- and HTML-output versions of a GAPDoc document. These programs are developed by the first named author. They run completely inside GAP, i.e., no external programs are needed. You only need latex and pdflatex to process the LaTeX output. These programs are described in Chapter 5.

1.1 XML

The definition of the GAPDoc format uses XML, the "eXtendible Markup Language". This is a standard (defined by the W3C consortium, see which lays down a syntax for adding markup to a document or to some data. It allows to define document structures via introducing markup elements and certain relations between them. This is done in a document type definition. The file gapdoc.dtd contains such a document type definition and is the central part of the GAPDoc package.

The easiest way for getting a good idea about this is probably to look at an example. The Appendix A contains a short but complete GAPDoc document for a fictitious share package. In the next section we will go through this document, explain basic facts about XML and the GAPDoc document type, and give pointers to more details in later parts of this documentation.

In the last Section 1.3 of this introductory chapter we try to answer some general questions about the decisions which lead to the GAPDoc package.

1.2 A complete example

In this section we recall the lines from the example document in Appendix A and give some explanations.

<?xml version="1.0" encoding="UTF-8"?> 

This line just tells a human reader and computer programs that the file is a document with XML markup and that the text is encoded in the UTF-8 character set (other common encodings are ASCII or ISO-8895-X encodings).

<!--   A complete "fake package" documentation   

Everything in a XML file between "<!--" and "-->" is a comment and not part of the document content.

<!DOCTYPE Book SYSTEM "gapdoc.dtd">

This line says that the document contains markup which is defined in the system file gapdoc.dtd and that the markup obeys certain rules defined in that file (the ending dtd means "document type definition"). It further says that the actual content of the document consists of an element with name "Book". And we can really see that the remaining part of the file is enclosed as follows:

<Book Name="3k+1">
  [...] (content omitted)

This demonstrates the basics of the markup in XML. This part of the document is an "element". It consists of the "start tag" <Book Name="3k+1">, the "element content" and the "end tag" </Book> (end tags always start with </). This element also has an "attribute" Name whose "value" is 3k+1.

If you know HTML, this will look familiar to you. But there are some important differences: The element name Book and attribute name Name are case sensitive. The value of an attribute must always be enclosed in quotes. In XML every element has a start and end tag (which can be combined for elements defined as "empty", see for example <TableOfContents/> below).

If you know LaTeX, you are familiar with quite different types of markup, for example: The equivalent of the Book element in LaTeX is \begin{document} ... \end{document}. The sectioning in LaTeX is not done by explicit start and end markup, but implicitly via heading commands like \section. Other markup is done by using braces {} and putting some commands inside. And for mathematical formulae one can use the $ for the start and the end of the markup. In XML all markup looks similar to that of the Book element.

The content of the book starts with a title page.

  <Title>The <Package>ThreeKPlusOne</Package> Package</Title>
  <Version>Version 42</Version>
  <Author>Dummy Authör

  <Copyright>&copyright; 2000 The Author. <P/>
    You can do with this package what you want.<P/> Really.

The content of the TitlePage element consists again of elements. In Chapter 3 we describe which elements are allowed within a TitlePage and that their ordering is prescribed in this case. In the (stupid) name of the author you see that a German umlaut is used directly (in ISO-latin1 encoding).

Contrary to LaTeX- or HTML-files this markup does not say anything about the actual layout of the title page in any output version of the document. It just adds information about the meaning of pieces of text.

Within the Copyright element there are two more things to learn about XML markup. The <P/> is a complete element. It is a combined start and end tag. This shortcut is allowed for elements which are defined to be always "empty", i.e., to have no content. You may have already guessed that <P/> is used as a paragraph separator. Note that empty lines do not separate paragraphs (contrary to LaTeX).

The other construct we see here is &copyright;. This is an example of an "entity" in XML and is a macro for some substitution text. Here we use an entity as a shortcut for a complicated expression which makes it possible that the term copyright is printed as some text like (C) in text terminal output and as a copyright character in other output formats. In GAPDoc we predefine some entities. Certain "special characters" must be typed via entities, for example "<", ">" and "&" to avoid a misinterpretation as XML markup. It is possible to define additional entities for your document inside the <!DOCTYPE ...> declaration, see 2.2-3.

Note that elements in XML must always be properly nested, as in this example. A construct like <a><b>...</a></b> is not allowed.


This is another example of an "empty element". It just means that a table of contents for the whole document should be included into any output version of the document.

After this the main text of the document follows inside certain sectioning elements:

  <Chapter> <Heading>The <M>3k+1</M> Problem</Heading>
    <Section Label="sec:theory"> <Heading>Theory</Heading>
      [...] (content omitted)
    <Section> <Heading>Program</Heading>
      [...] (content omitted) 

These elements are used similarly to "\chapter" and "\section" in LaTeX. But note that the explicit end tags are necessary here.

The sectioning commands allow to assign an optional attribute "Label". This can be used for referring to a section inside the document.

The text of the first section starts as follows. The whitespace in the text is unimportant and the indenting is not necessary.

      Let  <M>k \in  &NN;</M> be  a  natural number.  We consider  the
      sequence <M>n(i, k), i \in &NN;,</M> with <M>n(1, k) = k</M> and

Here we come to the interesting question how to type mathematical formulae in a GAPDoc document. We did not find any alternative for writing formulae in TeX syntax. (There is MATHML, but even simple formulae contain a lot of markup, become quite unreadable and they are cumbersome to type. Furthermore there seem to be no tools available which translate such formulae in a nice way into TeX and text.) So, formulae are essentially typed as in LaTeX. (Actually, it is also possible to type unicode characters of some mathematical symbols directly, or via an entity like the &NN; above.) There are three types of elements containing formulae: "M", "Math" and "Display". The first two are for in-text formulae and the third is for displayed formulae. Here "M" and "Math" are equivalent, when translating a GAPDoc document into LaTeX. But they are handled differently for terminal text (and HTML) output. For the content of an "M"-element there are defined rules for a translation into well readable terminal text. More complicated formulae are in "Math" or "Display" elements and they are just printed as they are typed in text output. So, to make a section well readable inside a terminal window you should try to put as many formulae as possible into "M"-elements. In our example text we used the notation n(i, k) instead of n_i(k) because it is easier to read in text mode. See Sections 2.2-2 and 3.9 for more details.

A few lines further on we find two non-internal references.

      problem, see <Cite Key="Wi98"/> or

The first within the "Cite"-element is the citation of a book. In GAPDoc we use the widely used BibTeX database format for reference lists. This does not use XML but has a well documented structure which is easy to parse. And many people have collections of references readily available in this format. The reference list in an output version of the document is produced with the empty element

<Bibliography Databases="3k+1" />

close to the end of our example file. The attribute "Databases" give the name(s) of the database (.bib) files which contain the references.

Putting a Web-address into an "URL"-element allows one to create a hyperlink in output formats which allow this.

The second section of our example contains a special kind of subsection defined in GAPDoc.

        <Func Name="ThreeKPlusOneSequence" Arg="k[, max]"/>
          This  function computes  for a  natural number  <A>k</A> the
          beginning of the sequence  <M>n(i, k)</M> defined in section
          <Ref Sect="sec:theory"/>.  The sequence  stops at  the first
          <M>1</M>  or at  <M>n(<A>max</A>, k)</M>,  if <A>max</A>  is
gap> ThreeKPlusOneSequence(101);
"Sorry, not yet implemented. Wait for Version 84 of the package"

A "ManSection" contains the description of some function, operation, method, filter and so on. The "Func"-element describes the name of a function (there are also similar elements "Oper", "Meth", "Filt" and so on) and names for its arguments, optional arguments enclosed in square brackets. See Section 3.4 for more details.

In the "Description" we write the argument names as "A"-elements. A good description of a function should usually contain an example of its use. For this there are some verbatim-like elements in GAPDoc, like "Example" above (here, clearly, whitespace matters which causes a slightly strange indenting).

The text contains an internal reference to the first section via the explicitly defined label sec:theory.

The first section also contains a "Ref"-element which refers to the function described here. Note that there is no explicit label for such a reference. The pair <Func Name="ThreeKPlusOneSequence" Arg="k[, max]"/> and <Ref Func="ThreeKPlusOneSequence"/> does the cross referencing (and hyperlinking if possible) implicitly via the name of the function.

Here is one further element from our example document which we want to explain.


This is again an empty element which just says that an output version of the document should contain an index. Many entries for the index are generated automatically because the "Func" and similar elements implicitly produce such entries. It is also possible to include explicit additional entries in the index.

1.3 Some questions

Are those XML files too ugly to read and edit?

Just have a look and decide yourself. The markup needs more characters than most TeX or LaTeX markup. But the structure of the document is easier to see. If you configure your favorite editor well, you do not need more key strokes for typing the markup than in LaTeX.

Why do we not use LaTeX alone?

LaTeX is good for writing books. But LaTeX files are generally difficult to parse and to process to other output formats like text for browsing in a terminal window or HTML (or new formats which may become popular in the future). GAPDoc markup is one step more abstract than LaTeX insofar as it describes meaning instead of appearance of text. The inner workings of LaTeX are too complicated to learn without pain, which makes it difficult to overcome problems that occur occasionally.

Why XML and not a newly defined markup language?

XML is a well defined standard that is more and more widely used. Lots of people have thought about it. Years of experience with SGML went into the design. It is easy to explain, easy to parse and lots of tools are available, there will be more in the future.

 [Top of Book]  [Contents]   [Previous Chapter]   [Next Chapter] 
Goto Chapter: Top 1 2 3 4 5 6 7 A B C Bib Ind

generated by GAPDoc2HTML