Mnemonic Conversion from DOM to layout tree

Conversion from DOM tree to layout tree

The w3render library is a bridge between DOM trees and layout trees (as implemented in lib-dom and lib-layout respectively). In order to make this conversion, it uses the information in a CSS stylesheet.

This library is completely independent of the GUI toolkit. It requires only the box concepts defined in lib-layout (hvboxes, parboxes, glue and so on), and it creates boxes through the factory class of lib-layout.

Internals

The mapping from the XML/CSS layout system to the one used in lib-layout is fairly straightforward. In XML, elements can be associated with geometrical information or with property information (or both). The former happens, for instance, when an element is defined to have a display property of block. The latter is often the case for non-empty inline elements. The layout library has a similar distinction between geometrical information and property information.

The main entry point to the w3render library is dom_to_layout, which takes a vector of DOM nodes. These nodes are the top nodes of newly created subtrees of the DOM tree. All of the subtrees are traversed in turn and a corresponding subtree in the layout tree is constructed. While doing this, the library constructs a map which relates DOM elements to layout boxes, as follows:

  • For inline empty elements, a map from the DOM node to the layout box (these elements provide a layout box to which no further boxes should be added, i.e. the created box should not be treated as a container box).
  • For inline non-empty elements, a map from the DOM node to the property box (these elements do not provide a geometrical container themselves, but can provide property information).
  • For all other elements, among them block, table, list_item and so on, a map from the DOM node to the outermost container_box and a map from the DOM node to the innermost container_box (only the innermost container_box map is used when there is only one container).

(`empty' and `non-empty' are used in the XML sense, i.e. a non-empty element does not necessarily have children, only the ability to have them.) This is enough information to find, afterwards, which layout boxes have been generated by a given DOM element. This is necessary for future extensions which allow for removal of DOM elements as well (e.g. when using scripting).
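The bookkeeping described above can be pictured as one record per DOM node. The following is a minimal sketch under stated assumptions, not the actual w3render data structure: the names dom_node, box, map_entry and box_map are hypothetical stand-ins, and the sketch stores the single container in both slots when an element creates only one.

```cpp
#include <map>

// Hypothetical stand-ins for the lib-dom node and lib-layout box types.
struct dom_node {};
struct box {};

// Which of the three cases from the list applies to a DOM node.
enum class entry_kind { inline_empty, inline_nonempty, container };

struct map_entry {
    entry_kind kind;
    box* outer;  // outermost container (or the single box for inline elements)
    box* inner;  // innermost container (or the single box for inline elements)
};

// Hypothetical map relating DOM nodes to the boxes they generated.
class box_map {
public:
    // Inline empty element: one layout box that must not receive children.
    void add_inline_empty(const dom_node* n, box* layout_box) {
        entries_[n] = map_entry{entry_kind::inline_empty, layout_box, layout_box};
    }
    // Inline non-empty element: one property box, no geometry of its own.
    void add_inline_nonempty(const dom_node* n, box* property_box) {
        entries_[n] = map_entry{entry_kind::inline_nonempty, property_box, property_box};
    }
    // Block, table, list_item and so on: outermost and innermost containers.
    void add_container(const dom_node* n, box* outermost, box* innermost) {
        entries_[n] = map_entry{entry_kind::container, outermost, innermost};
    }
    // Returns the entry for a node, or nullptr if the node generated no boxes.
    const map_entry* find(const dom_node* n) const {
        auto it = entries_.find(n);
        return it == entries_.end() ? nullptr : &it->second;
    }

private:
    std::map<const dom_node*, map_entry> entries_;
};
```

With such a map, removing a DOM element later reduces to looking up its entry and detaching the recorded boxes from the layout tree.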

The logic to create new layout boxes is as follows. First, the geometrical structure of an element is considered. This leads to the creation of zero or more nested layout::container_box objects. A container box (the target) is picked as the container for any subsequent empty elements. Next, if the element is empty, an appropriate layout::box is created and stored in the target. Subsequently, a single layout::property_box is constructed if there is property information associated with the element. The last step consists of putting these things together and storing the relevant information for later use.
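The first three steps can be sketched as follows. This is an illustration only: box, element_info, created and create_boxes are hypothetical names, boxes are distinguished by a plain string rather than by the real lib-layout classes, and the final assembly step is left out because the text does not spell it out (the property box is returned unattached).

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified box; lib-layout uses distinct classes instead.
struct box {
    std::string kind;  // e.g. "vbox", "parbox", "leaf", "property_box"
    std::vector<std::unique_ptr<box>> children;

    // Appends a new child of the given kind and returns it.
    box* add(const std::string& k) {
        children.push_back(std::make_unique<box>());
        children.back()->kind = k;
        return children.back().get();
    }
};

// Hypothetical summary of what the stylesheet says about one element.
struct element_info {
    std::vector<std::string> containers;  // nested container kinds, outermost first
    bool empty;                           // empty in the XML sense
    bool has_properties;                  // carries property information
};

// What was created for one element.
struct created {
    box* outermost = nullptr;
    box* innermost = nullptr;
    box* leaf = nullptr;
    std::unique_ptr<box> property;  // not yet attached anywhere
};

created create_boxes(box& parent, const element_info& e) {
    created out;
    box* target = &parent;
    // Step 1: geometrical structure gives zero or more nested containers.
    for (const std::string& kind : e.containers) {
        target = target->add(kind);
        if (out.outermost == nullptr) out.outermost = target;
        out.innermost = target;
    }
    // Step 2: an empty element gets a box stored in the target.
    if (e.empty) out.leaf = target->add("leaf");
    // Step 3: a single property box if there is property information.
    if (e.has_properties) {
        out.property = std::make_unique<box>();
        out.property->kind = "property_box";
    }
    return out;
}
```

The returned pointers are the pieces that would be recorded in the DOM-to-layout map for later use.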

In detail, the algorithm that determines which layout::box objects get created is as follows (target is the layout box which acts as the container for any boxes created by the element under consideration):

  • If a block element is encountered, the DOM tree is scanned upwards for the first element with a block display property; the inner box associated with that element is the target. A layout::vbox is added to the target.

  • If an inline element is encountered, this element could have been broken by block boxes. In order to find the correct container, the tree is scanned to the left for previous siblings. If the previous sibling is inline, it can be either empty or non-empty. In the former case the parbox associated with this empty element is the target. In the latter case, the parent of the box associated with the non-empty element is the target.

    If the previous sibling is a block element, the DOM tree is scanned upwards for the first block parent element with an associated box, and a parbox is added to this box. Then the intermediate part is examined upwards for an inline element with an associated box, whose properties are inherited by adding a property_box to the parbox just added. This property box becomes the target (if the first box found upwards is the block box itself, nothing more is done).

    In the remaining case there is no preceding sibling. The DOM tree is then scanned upwards for the first element with an associated box. If that element is inline, its box becomes the target. If it is block, a parbox is created and associated with the present element; this parbox is the target.
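The case distinction above can be sketched as a small classifier. This is a simplified illustration, not the w3render code: element, target_rule, classify and first_block_ancestor are hypothetical names, text nodes are ignored when looking at siblings, and the sketch only decides which rule applies without creating any boxes.

```cpp
enum class display { block, inline_ };

// Hypothetical, stripped-down DOM element; the real lib-dom interface differs.
struct element {
    display disp;
    bool empty;                   // empty in the XML sense
    const element* parent;        // nullptr for the root
    const element* prev_sibling;  // nullptr if there is no preceding sibling
};

// Which rule determines the target container for an element.
enum class target_rule {
    inner_box_of_block_ancestor,  // block element
    parbox_of_empty_sibling,      // inline, previous sibling inline and empty
    parent_of_sibling_box,        // inline, previous sibling inline and non-empty
    new_parbox_after_block,       // inline, previous sibling is block
    first_boxed_ancestor          // inline, no previous sibling
};

target_rule classify(const element& e) {
    if (e.disp == display::block)
        return target_rule::inner_box_of_block_ancestor;
    const element* prev = e.prev_sibling;
    if (prev == nullptr)
        return target_rule::first_boxed_ancestor;
    if (prev->disp == display::block)
        return target_rule::new_parbox_after_block;
    return prev->empty ? target_rule::parbox_of_empty_sibling
                       : target_rule::parent_of_sibling_box;
}

// Upward scan used by the block rule: the nearest ancestor with block display.
const element* first_block_ancestor(const element& e) {
    for (const element* p = e.parent; p != nullptr; p = p->parent)
        if (p->disp == display::block) return p;
    return nullptr;
}
```

Keeping the decision separate from box creation makes each rule easy to check against the prose, at the cost of a second step that acts on the chosen rule.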

As an example, consider
<BLOCKQUOTE>
some text <EM>emphasised <RED>and
<H1>header element</H1>
more</RED> emphasised text</EM> and normal again.
</BLOCKQUOTE>
This translates to (in symbolic language for lib-layout)
\vbox{
  \parbox{
     some text
     \property_box[em]{
        emphasised 
        \property_box[red]{and}}}
  \vbox{
     \property_box[em,red,big]{
        \parbox{header element}}}
  \parbox{
     \property_box[em]{
        \property_box[red]{more} 
        emphasised text} 
     and normal again.}
}
The presence of the block-level element <H1> breaks the geometrical nesting.
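To make the symbolic notation concrete, the sketch below models a layout tree as plain data and prints it in the same style. The box struct and the to_symbolic function are hypothetical and exist only for this illustration; lib-layout's real classes are richer.

```cpp
#include <string>
#include <vector>

// Hypothetical tree node: a named box, or plain text when kind is empty.
struct box {
    std::string kind;  // "vbox", "parbox", "property_box", or "" for text
    std::string arg;   // property list for a property_box, or the text itself
    std::vector<box> children;
};

// Renders a tree in the symbolic notation used above, on a single line.
std::string to_symbolic(const box& b) {
    if (b.kind.empty()) return b.arg;  // plain text
    std::string out = "\\" + b.kind;
    if (!b.arg.empty()) out += "[" + b.arg + "]";
    out += "{";
    bool first = true;
    for (const box& child : b.children) {
        if (!first) out += " ";
        out += to_symbolic(child);
        first = false;
    }
    return out + "}";
}
```

For instance, a vbox holding a property_box with argument em,red,big that wraps a parbox with the text "header element" is rendered as \vbox{\property_box[em,red,big]{\parbox{header element}}}, which matches the middle part of the example.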

Properties that should be inherited include:

  • table properties: border-collapse, border-spacing, caption-side, empty-cells
  • colors: color, cursor
  • font properties: font, font-family, font-size, font-size-adjust, font-stretch, font-style, font-variant, font-weight
  • box spacing: letter-spacing, line-height, orphans, widows, text-align, text-indent, text-transform, white-space, word-spacing, page, page-break-inside
  • lists: list-style, list-style-image, list-style-position, list-style-type
  • miscellaneous: quotes
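Inheritance of this kind can be sketched as a lookup that climbs to the parent only for inherited properties. The code below is a minimal illustration, not how w3render or lib-layout resolves styles: styled_node, inherited_properties and resolve are hypothetical names, and only a subset of the property names above is listed for brevity.

```cpp
#include <map>
#include <set>
#include <string>

// A subset of the inherited property names listed above.
const std::set<std::string>& inherited_properties() {
    static const std::set<std::string> names = {
        "border-collapse", "color", "font-family", "font-size",
        "line-height", "text-align", "white-space", "list-style-type",
        "quotes"};
    return names;
}

// Hypothetical node carrying the declarations that apply directly to it.
struct styled_node {
    const styled_node* parent;  // nullptr for the root
    std::map<std::string, std::string> declared;
};

// Returns the value declared on the node itself or, for inherited
// properties only, on the nearest ancestor that declares it.
// Returns an empty string when no value is found.
std::string resolve(const styled_node& n, const std::string& prop) {
    const bool inherits = inherited_properties().count(prop) > 0;
    for (const styled_node* p = &n; p != nullptr; p = p->parent) {
        auto it = p->declared.find(prop);
        if (it != p->declared.end()) return it->second;
        if (!inherits) break;  // non-inherited: do not look at ancestors
    }
    return "";
}
```

In the layout tree this walk is implicit: a property_box applies its properties to everything nested inside it, so only inherited properties need to be carried across a break in the geometrical nesting.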