Mnemonic Library for storage of TeX documents

TeXdoc library overview

Class structure

The texdoc library contains objects for storing parsed TeX documents. As in lib-dom, a document is stored in nested form: TeX groups (usually opened with `{' and closed with `}') are kept in a tree-like structure.

In contrast to TeX itself, lib-texdoc stores an entire document in memory. Mode-changing commands, such as `$', result in the construction of nodes of one of the following types:

  1. hlist_node
  2. vlist_node
  3. rule_node
  4. ins_node
  5. mark_node
  6. adjust_node
  7. ligature_node
  8. disc_node
  9. whatsit_node
  10. math_node
  11. latex_node
The last node type, latex_node, is specific to lib-texdoc and was added for greater flexibility in storing LaTeX documents. These nodes correspond to blocks of data enclosed in \begin{...} and \end{...} commands. In standard TeX such blocks are handled by the LaTeX format file; in lib-texdoc they can instead be scanned by a C++ parser and stored in latex_nodes.
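
A hypothetical C++ rendering of the node list above, in which a latex node records its environment name and holds the contents of the \begin{...}/\end{...} block as children. The names NodeType, Node, and count_type are assumptions made for illustration, not the actual lib-texdoc declarations.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical rendering of the node kinds listed above; the names used by
// lib-texdoc itself may differ.
enum class NodeType {
    hlist, vlist, rule, ins, mark, adjust,
    ligature, disc, whatsit, math,
    latex  // lib-texdoc extension: contents of \begin{...} ... \end{...}
};

struct Node {
    NodeType type;
    std::string env;  // environment name, only meaningful for latex nodes
    std::vector<std::unique_ptr<Node>> children;

    explicit Node(NodeType t, std::string e = "") : type(t), env(std::move(e)) {}

    // Append a child and return a pointer to it so callers can keep nesting.
    Node* add(std::unique_ptr<Node> child) {
        assert(child != nullptr);
        children.push_back(std::move(child));
        return children.back().get();
    }
};

// Count the nodes of a given type in the subtree rooted at n.
std::size_t count_type(const Node& n, NodeType t) {
    std::size_t count = (n.type == t) ? 1 : 0;
    for (const auto& child : n.children) count += count_type(*child, t);
    return count;
}
```

For example, \begin{itemize} ... \end{itemize} inside running text would become a latex child of the surrounding hlist, with the environment body nested below it.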

Some nodes that appear in TeX lists are not represented in lib-texdoc, because they concern typeset output rather than input and therefore do not belong in a library that encodes input. (NOTE: not all of the nodes listed above should remain in the library.)

TeX has many parameters that determine the structure of a document. Their values can be changed at any point in the input, and each change influences the scanning and parsing of the remainder of the input. The texdoc library keeps track of these parameters by storing their modifications as the boundaries of nested blocks are crossed. The objects responsible for this are defined in eqtb.hh.
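
A minimal sketch of how group-local parameter changes can be undone when a block ends, in the spirit of TeX's save stack. The class ParameterTable and its methods are hypothetical and are not the interface of eqtb.hh; parameter values are simplified to plain integers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: the real objects in eqtb.hh may be organised differently.
// Parameter values are simplified to integers (e.g. dimensions in scaled points).
class ParameterTable {
public:
    // Entering a group opens a fresh frame of saved values.
    void begin_group() { saved_.emplace_back(); }

    // Assign a parameter; the first change inside a group remembers the old value.
    void set(const std::string& name, int value) {
        if (!saved_.empty()) {
            Frame& frame = saved_.back();
            if (frame.find(name) == frame.end()) {
                auto it = values_.find(name);
                frame[name] = (it == values_.end()) ? Old{false, 0} : Old{true, it->second};
            }
        }
        values_[name] = value;
    }

    // Leaving a group restores every parameter changed inside it.
    void end_group() {
        assert(!saved_.empty() && "end_group() without matching begin_group()");
        for (const auto& entry : saved_.back()) {
            if (entry.second.existed) values_[entry.first] = entry.second.value;
            else values_.erase(entry.first);
        }
        saved_.pop_back();
    }

    int get(const std::string& name, int fallback = 0) const {
        auto it = values_.find(name);
        return it == values_.end() ? fallback : it->second;
    }

private:
    struct Old { bool existed; int value; };
    using Frame = std::map<std::string, Old>;
    std::map<std::string, int> values_;  // current values
    std::vector<Frame> saved_;           // one frame per open group
};
```

Saving only the first change per group keeps each frame small; TeX follows a similar rule when deciding whether an assignment needs to be saved.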

Does lib-texdoc need native scanning of macros with optional parameters? Probably it does. The easiest way to support this is to define two additional catcodes, `open_optional' and `close_optional'. See `tests/latex_test.tex' for an example.
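
If such catcodes existed, an optional argument could be recognised purely from the category of the next character. In this hypothetical sketch, open_optional and close_optional are given the otherwise unused values 16 and 17 and are assigned to `[' and `]'; the table setup and the function scan_optional are illustrative assumptions, not part of lib-texdoc.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// TeX's 16 category codes plus the two proposed extra codes.
// The values 16 and 17 are an assumption for this sketch.
enum Catcode : unsigned char {
    escape = 0, begin_group = 1, end_group = 2, math_shift = 3,
    alignment = 4, end_of_line = 5, parameter = 6, superscript = 7,
    subscript = 8, ignored = 9, space = 10, letter = 11, other = 12,
    active = 13, comment = 14, invalid = 15,
    open_optional = 16, close_optional = 17
};

using CatcodeTable = std::array<Catcode, 256>;

// A reduced LaTeX-like table in which `[' and `]' delimit optional arguments.
CatcodeTable latex_like_table() {
    CatcodeTable t;
    t.fill(other);
    for (int c = 'a'; c <= 'z'; ++c) t[c] = letter;
    for (int c = 'A'; c <= 'Z'; ++c) t[c] = letter;
    t['\\'] = escape;
    t['{'] = begin_group;
    t['}'] = end_group;
    t[' '] = space;
    t['['] = open_optional;
    t[']'] = close_optional;
    return t;
}

// If the character at pos opens an optional argument, return its contents
// (nested optional delimiters allowed) and advance pos past the closing one.
// Otherwise return nothing and leave pos unchanged.
std::optional<std::string> scan_optional(const std::string& in, std::size_t& pos,
                                         const CatcodeTable& cat) {
    auto code = [&](std::size_t k) { return cat[static_cast<unsigned char>(in[k])]; };
    if (pos >= in.size() || code(pos) != open_optional) return std::nullopt;
    int depth = 0;
    const std::size_t start = pos + 1;
    for (std::size_t i = pos; i < in.size(); ++i) {
        if (code(i) == open_optional) {
            ++depth;
        } else if (code(i) == close_optional && --depth == 0) {
            pos = i + 1;
            return in.substr(start, i - start);
        }
    }
    return std::nullopt;  // unterminated optional argument
}
```

Because the decision depends on catcodes rather than on the literal characters, a document that reassigns these codes would still be scanned correctly.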

Macro storage

Macros are stored in the form of token lists, defined in token_list.hh.
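
As an illustration of token-list storage, the sketch below stores a macro as a parameter count plus a replacement token list and expands a call by substituting the argument lists for #1, #2, and so on. Token, Macro, expand, and the helper functions are hypothetical names; the actual layout in token_list.hh may differ, and delimited parameters are not handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token representation; token_list.hh may differ.
struct Token {
    enum Kind { character, control_sequence, param } kind;
    std::string text;  // the character, or a control-sequence name without backslash
    int arg = 0;       // for param tokens: the argument number 1..9
    bool operator==(const Token& o) const {
        return kind == o.kind && text == o.text && arg == o.arg;
    }
};
using TokenList = std::vector<Token>;

struct Macro {
    int num_params = 0;
    TokenList replacement;  // may contain param tokens for #1..#9
};

// Expand a call: copy the replacement text, substituting each #n by the
// token list of the n-th argument.
TokenList expand(const Macro& m, const std::vector<TokenList>& args) {
    assert(static_cast<int>(args.size()) == m.num_params);
    TokenList out;
    for (const Token& t : m.replacement) {
        if (t.kind == Token::param) {
            assert(t.arg >= 1 && t.arg <= m.num_params);
            const TokenList& a = args[t.arg - 1];
            out.insert(out.end(), a.begin(), a.end());
        } else {
            out.push_back(t);
        }
    }
    return out;
}

// Helpers for building token lists from plain characters.
Token ch(char c) { return Token{Token::character, std::string(1, c)}; }
Token par(int n) { return Token{Token::param, "", n}; }
TokenList chars(const std::string& s) {
    TokenList out;
    for (char c : s) out.push_back(ch(c));
    return out;
}
```

A definition with replacement text `foo #1 bar #2', called with the arguments A and B, would yield the character tokens of `foo A bar B'.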

Typical document structure

Most documents start with some symbols in an implicit hbox, so these are stored as children of an hlist_node. Using a TeX primitive such as \vbox opens a new node of the corresponding primitive type. The primitive TeX commands can be found by grepping for `primitive' in the TeX source.

It is up to the parser to scan for special non-primitive grouping commands, for instance \section. Data following such a command is stored as children of a latex_node.
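
The nesting described above can be sketched with a stack of currently open nodes: text goes into the innermost open node, a primitive such as \vbox pushes a node of its own type, and \section opens a latex node that collects the data following it until the next \section. TreeBuilder and the reduced NodeType set (including a hypothetical `chars' leaf for character data) are illustrative assumptions, not lib-texdoc classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Reduced, hypothetical node set; `chars' is an illustrative leaf for text.
enum class NodeType { hlist, vlist, latex, chars };

struct Node {
    NodeType type;
    std::string data;  // text for chars nodes, command name for latex nodes
    std::vector<std::unique_ptr<Node>> children;
};

class TreeBuilder {
public:
    // The document starts in an implicit hbox, represented by an hlist root.
    TreeBuilder() : root_(new Node{NodeType::hlist, "", {}}) { open_.push_back(root_.get()); }

    void text(const std::string& s) { add(NodeType::chars, s); }

    // A primitive such as \vbox opens a child node of its own type.
    void open_primitive(NodeType t) { open_.push_back(add(t, "")); }

    // The closing brace of a primitive also ends any \section opened inside it.
    void close_primitive() {
        while (open_.back()->type == NodeType::latex) open_.pop_back();
        assert(open_.size() > 1 && "closing brace without an open primitive");
        open_.pop_back();
    }

    // \section{title}: the title becomes the first child, and the data that
    // follows is collected until the next \section at the same level.
    void section(const std::string& title) {
        if (open_.back()->type == NodeType::latex) open_.pop_back();
        Node* s = add(NodeType::latex, "section");
        open_.push_back(s);
        text(title);
    }

    const Node& root() const { return *root_; }

private:
    Node* add(NodeType t, const std::string& d) {
        Node* parent = open_.back();
        parent->children.push_back(std::unique_ptr<Node>(new Node{t, d, {}}));
        return parent->children.back().get();
    }

    std::unique_ptr<Node> root_;
    std::vector<Node*> open_;  // currently open nodes, innermost last
};
```

Treating \section as an implicitly closed node keeps the parser free of any knowledge of page layout: it only has to decide which node is currently open.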

some text
\catcode`\@11
\def@#1#2{foo #1 bar #2}
\catcode`\@12
other
\catcode

Note: essentially, we want a catcode-independent representation of a document that includes TeX primitives such as \hbox and \vbox, LaTeX `primitives' such as `\section', and `\begin{...}/\end{...}' constructs. In addition, we want to keep track of macro definitions, whether their names consist of a single character or of several. We must not only be able to store macros, but also to evaluate them.

During scanning, unusual catcode settings simply cause the parser to map different input characters to TeX primitives; the parser has access to the catcode information to do this. The resulting document does not have to be rescanned from the beginning, because all catcode changes have already been applied.
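
A minimal tokenizer sketch illustrating why the scanned result is catcode-independent: characters are turned into tokens according to the catcode table in force at scan time, so the tokens already produced never need to be rescanned. Tok, tokenize, and plain_table are hypothetical names, only a handful of categories are modelled, and details such as skipping spaces after control words are omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A small subset of TeX's category codes.
enum Cat { escape = 0, begin_group = 1, end_group = 2, space = 10, letter = 11, other = 12 };

struct Tok {
    bool is_cs;        // true for a control sequence
    std::string text;  // control-sequence name (without backslash) or the character
    int cat;           // category code at scan time (escape for control sequences)
};

using Table = std::array<int, 256>;

Table plain_table() {
    Table t;
    t.fill(other);
    for (int c = 'a'; c <= 'z'; ++c) t[c] = letter;
    for (int c = 'A'; c <= 'Z'; ++c) t[c] = letter;
    t['\\'] = escape;
    t['{'] = begin_group;
    t['}'] = end_group;
    t[' '] = space;
    return t;
}

std::vector<Tok> tokenize(const std::string& in, const Table& cat) {
    std::vector<Tok> out;
    auto code = [&](std::size_t k) { return cat[static_cast<unsigned char>(in[k])]; };
    std::size_t i = 0;
    while (i < in.size()) {
        if (code(i) == escape) {
            std::size_t j = i + 1;
            if (j < in.size() && code(j) == letter) {
                while (j < in.size() && code(j) == letter) ++j;  // control word
            } else if (j < in.size()) {
                ++j;  // control symbol: a single non-letter character
            }
            out.push_back(Tok{true, in.substr(i + 1, j - i - 1), escape});
            i = j;
        } else {
            out.push_back(Tok{false, std::string(1, in[i]), code(i)});
            ++i;
        }
    }
    return out;
}
```

With `@' as an `other' character, `\make@title' yields the control word \make followed by character tokens; once `@' is made a letter (as \catcode`\@11 does), the same input yields the single control sequence make@title.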

How should we handle macro expansion? If we store the macro call without expanding it, we may get the catcodes wrong, since the macro body can itself change catcodes and thereby change how the following input must be scanned. It therefore seems necessary to always expand macros while scanning, not just on request. We could still keep a macro_activated node to keep track of the input file. Alternatively, we could scan the macro definition for changes to the table of equivalents that affect the syntactic scanner, and activate only those.

Something similar is required to store the text that describes the definition of a macro.
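
The second option above, scanning a macro definition for changes to the table of equivalents and activating only those, might look roughly like this for the common case of \catcode assignments. The token layout and the function name apply_catcode_changes are hypothetical; the sketch recognises only the literal form \catcode`\X=N (the `=' being optional) and leaves every other token unexpanded.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token: a control sequence (name without backslash) or a character.
struct Tok {
    bool is_cs;
    std::string text;
};

using Table = std::array<int, 256>;

// Apply only the \catcode`\X=N assignments found in a macro body and return
// how many were applied; all other tokens are ignored rather than expanded.
int apply_catcode_changes(const std::vector<Tok>& body, Table& table) {
    int applied = 0;
    for (std::size_t i = 0; i < body.size(); ++i) {
        if (!(body[i].is_cs && body[i].text == "catcode")) continue;
        std::size_t j = i + 1;
        // Expect a backquote, then a single-character target such as \@ or @.
        if (j >= body.size() || body[j].is_cs || body[j].text != "`") continue;
        ++j;
        if (j >= body.size() || body[j].text.size() != 1) continue;
        const unsigned char target = static_cast<unsigned char>(body[j].text[0]);
        ++j;
        if (j < body.size() && !body[j].is_cs && body[j].text == "=") ++j;  // optional `='
        int value = 0;
        bool any_digit = false;
        while (j < body.size() && !body[j].is_cs && body[j].text.size() == 1 &&
               std::isdigit(static_cast<unsigned char>(body[j].text[0]))) {
            value = value * 10 + (body[j].text[0] - '0');
            any_digit = true;
            ++j;
        }
        if (any_digit && value <= 15) {  // TeX's valid catcode range is 0..15
            table[target] = value;
            ++applied;
            i = j - 1;  // resume scanning after the assignment
        }
    }
    return applied;
}
```

Whether this is sufficient depends on how often catcode changes are hidden behind further macro calls, which this sketch does not follow.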