tango.text.xml.Document

License:

Version:

Initial release: February 2008

Authors:

Aaron, Kris

class Document(T) : PullParser!(T)

Implements a DOM atop the XML parser, supporting document parsing, tree traversal and ad-hoc tree manipulation.

The DOM API is non-conformant, yet simple and functional in style - locate a tree node of interest and operate upon or around it. In all cases you will need a document instance to begin, whereupon it may be populated either by parsing an existing document or via API manipulation.

This particular DOM employs a simple free-list to allocate each of the tree nodes, making it quite efficient at parsing XML documents. The tradeoff with such a scheme is that copying nodes from one document to another requires a little more care than otherwise. We felt this was a reasonable tradeoff, given the throughput gains vs the relative infrequency of grafting operations. For grafting within or across documents, please use the move() and copy() methods.
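
As a minimal sketch of grafting across documents, the following assumes that copy() is called on the receiving node, takes the subtree to duplicate, and attaches the duplicate as a child. The element names and the graft() helper are made up for illustration.

```d
import tango.io.Stdout;
import tango.text.xml.Document;
import tango.text.xml.DocPrinter;

// Build a source document, then graft its element tree into a second
// document. graft() and all element names are illustrative only.
Document!(char) graft ()
{
        auto src = new Document!(char);
        src.tree.element (null, "library")
                .element (null, "book", "Dune");

        auto dst = new Document!(char);
        dst.header;
        dst.tree.element (null, "archive");

        // assumption: copy() clones the subtree into dst's own node pool
        // and leaves the source document untouched
        dst.elements.copy (src.elements);
        return dst;
}

void main ()
{
        auto print = new DocPrinter!(char);
        Stdout(print(graft())).newline;
}
```

Under the same assumption, move() would be used in place of copy() when the subtree should be relocated rather than duplicated.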

Another simplification is related to entity transcoding. This is not performed internally, and becomes the responsibility of the client. That is, the client should perform appropriate entity transcoding as necessary. Paying the (high) transcoding cost for all documents doesn't seem appropriate.
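
A minimal sketch of client-side decoding follows. It assumes that fromEntity() is provided by tango.text.xml.DocEntity, and that value() on an element node yields the raw text of its data child. The decodedText() helper and the sample input are hypothetical.

```d
import tango.io.Stdout;
import tango.text.xml.Document;
import tango.text.xml.DocEntity;

// Parse a fragment and return the decoded text of its root element.
// decodedText() is a hypothetical helper, not part of the library.
const(char)[] decodedText (const(char)[] xml)
{
        auto doc = new Document!(char);
        doc.parse (xml);

        // the DOM hands back the text exactly as written in the source,
        // entities included; decoding is an explicit client-side step
        // (assumption: fromEntity translates the predefined XML entities)
        return fromEntity (doc.elements.value);
}

void main ()
{
        Stdout (decodedText ("<title>Tom &amp; Jerry</title>")).newline;
}
```

Decoding only the values a client actually reads keeps the cost off documents that never need it. The same module is assumed to offer toEntity() for the reverse direction, when text is written into a document.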

Parse example

auto doc = new Document!(char);
doc.parse (content);

auto print = new DocPrinter!(char);
Stdout(print(doc)).newline;

API example

auto doc = new Document!(char);

// attach an xml header
doc.header;

// attach an element with some attributes, plus 
// a child element with an attached data value
doc.tree.element   (null, "element")
        .attribute (null, "attrib1", "value")
        .attribute (null, "attrib2")
        .element   (null, "child", "value");

auto print = new DocPrinter!(char);
Stdout(print(doc)).newline;

Note that the document tree() includes all nodes in the tree, and not just elements. Use doc.elements to address the topmost element instead. For example, adding a sibling of the child element to the prior illustration:

doc.elements.element (null, "sibling");

Printing the name of the topmost (root) element:

Stdout.formatln ("first element is '{}'", doc.elements.name);

XPath examples:

auto doc = new Document!(char);

// attach an element with some attributes, plus 
// a child element with an attached data value
doc.tree.element   (null, "element")
        .attribute (null, "attrib1", "value")
        .attribute (null, "attrib2")
        .element   (null, "child", "value");

// select named-elements
auto set = doc.query["element"]["child"];

// select all attributes named "attrib1"
set = doc.query.descendant.attribute("attrib1");

// select elements with one parent and a matching text value
set = doc.query[].filter((doc.Node n) {return n.children.hasData("value");});

Note that path queries are transient: they do not retain content across multiple queries. That is, the lifetime of a query result is limited unless you explicitly copy it. For example, this will fail:

auto elements = doc.query["element"];
auto children = elements["child"];

The above will lose elements because the associated document reuses node space for subsequent queries. To retain results, duplicate the set first:

auto elements = doc.query["element"].dup;
auto children = elements["child"];

The above .dup is generally very small (a set of pointers only). On the other hand, recursive queries are fully supported:

set = doc.query[].filter((doc.Node n) {return n.query[].count > 1;});

Typical usage follows this pattern, where each query result is processed before another is initiated:

foreach (node; doc.query.child("element"))
        {
        // do something with each node
        }

Note that the parser is templated for char, wchar or dchar.
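
To illustrate the templating, here is a small sketch that instantiates the document for each character width. The rootName() helper is hypothetical, and the input must use the same character type as the document.

```d
import tango.text.xml.Document;

// Return the name of the topmost element for any supported character
// type. rootName() is an illustrative helper, not part of the library.
const(T)[] rootName (T) (const(T)[] xml)
{
        auto doc = new Document!(T);
        doc.parse (xml);
        return doc.elements.name;
}

void main ()
{
        // the literal suffixes w and d select wchar and dchar content
        assert (rootName!(char)  ("<config/>")  == "config");
        assert (rootName!(wchar) ("<config/>"w) == "config"w);
        assert (rootName!(dchar) ("<config/>"d) == "config"d);
}
```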

this(size_t nodes = 1000): Construct a DOM instance. The optional parameter indicates the initial number of nodes assigned to the freelist.
XmlPath!(T).NodeSet query() [final]: Return an xpath handle to query this document. This starts at the document root.

See also Node.query
Node tree() [@property, final]: Return the root document node, from which all other nodes are descended.

Returns null where there are no nodes in the document
Node elements() [@property, final]: Return the topmost element node, which is generally the root of the element tree.

Returns null where there are no top-level element nodes
Document reset() [final]: Reset the freelist. Subsequent allocation of document nodes will overwrite prior instances.
Document header(const(T)[] encoding = null) [final]: Prepend an XML header to the document tree.
void parse(const(T[]) xml) [final]: Parse the given xml content, which will reuse any existing node within this document. The resultant tree is retrieved via the document 'tree' attribute.
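
Because parse() reuses existing nodes, a single instance can serve many inputs. The sketch below shows that pattern under stated assumptions: the rootNames() helper and the initial node count of 64 are arbitrary choices for illustration.

```d
import tango.io.Stdout;
import tango.text.xml.Document;

// Parse several inputs with one Document instance and collect the name
// of each topmost element. rootNames() is an illustrative helper.
const(char)[][] rootNames (const(char[])[] inputs)
{
        // size the freelist up front; 64 is an arbitrary figure
        auto doc = new Document!(char)(64);
        const(char)[][] names;

        foreach (xml; inputs)
                {
                // each parse reuses the node pool, so node references
                // from the previous pass must not be held across it
                doc.parse (xml);
                names ~= doc.elements.name;
                }
        return names;
}

void main ()
{
        foreach (name; rootNames (["<a/>", "<b><c/></b>"]))
                 Stdout (name).newline;
}
```

Only the names are kept between passes, never the nodes themselves, which is what makes the reuse safe here.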