Three Ways To Parse XML in Qt

Update 1/12/2015: I’ve written a follow-up to this post comparing the performance of the different parsers. I’ve also fixed a few mistakes in the code and text below. I’ve also changed my mind about QXmlSimpleReader now that I’ve found a simple way to use it.

For this past February’s CoderNight meetup, I thought I would write the solution using Qt and take the time to explore qdoc, Qt’s excellent documentation tool. So, of course, I spent all my time figuring out the three native ways to parse XML using Qt and completely ran out of time for qdoc. While researching the XML parsing, I couldn’t find any webpages addressing and comparing the methods all at once. Since then, I’ve discovered that the documentation for Qt 5 has an XBEL bookmarks example for each of the methods and you can compare those examples to get a feel for the differences, but there’s still no Qt XML parsing Rosetta Stone page. Here’s a shorter, incomplete comparison of the three methods.

The Problem

We’ve got to read in a table of currency conversions stored in an xml file that looks like this:

<rates>
  <rate>
    <from>AUD</from>
    <to>CAD</to>
    <conversion>1.0079</conversion>
  </rate>
  <rate>...</rate>
  ...
</rates>

There are three Qt ways to parse this:

Use QXmlStreamReader from QtCore to parse the xml linearly
Use QDomDocument from the QtXml module to model the entire xml as a tree data structure of XML objects
Use QXmlQuery from the QtXmlPatterns module to create an XQuery to search the xml and return a formatted result set

Note: All of the code in this post and the problem description can be found in this github repository. Later, there’s some code from the Qt git repository for 5.3.

1. Stream Reader

Qt describes a stream reader as [building] recursive descent parsers, allowing XML parsing code to be split into different methods or classes. This means that there’s usually a separate function to process each level of elements with functions getting more detailed the deeper you get. Qt has a good example for parsing XBEL (web browser bookmark) files, but we’ll use our own problem to illustrate.

Our stream reader class is XmlRateReader which parses the rates from a given xml file.

#ifndef XMLRATEREADER_H
#define XMLRATEREADER_H

#include <QXmlStreamReader>
#include <QString>

class XmlRateReader
{
public:
    XmlRateReader(const QString filename);

    void read();

private:
    void processRates();
    void processRate();
    QString readNextText();
    QString errorString();

    QString _filename;
    QXmlStreamReader xml;
};

#endif // XMLRATEREADER_H

The mapping from the xml works like this:

rates.xml                           => XmlRateReader("rates.xml"), read()
---------
<rates>                               => processRates()
  <rate>                                => processRate()
    <from>AUD</from>                      => QXmlStreamReader::text()
    <to>CAD</to>                          => QXmlStreamReader::text()
    <conversion>1.0079</conversion>       => QXmlStreamReader::text()
  </rate>
</rates>

We start by looking at the top level start tag in the xml stream, calling processRates() when we find rates.

#include "xmlratereader.h"

#include <QFile>
#include <QDebug>

#include "currency.h"

XmlRateReader::XmlRateReader(const QString filename) :
    _filename(filename)
{}

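void XmlRateReader::read() {
    QFile xmlFile(_filename);
    // Bail out early if the file can't be opened instead of
    // handing a closed device to the stream reader.
    if (!xmlFile.open(QIODevice::ReadOnly)) {
        qDebug() << "Could not open" << _filename << ":" << xmlFile.errorString();
        return;
    }
    xml.setDevice(&xmlFile);

    if (xml.readNextStartElement() && xml.name() == "rates")
       processRates();

    // readNextStartElement() leaves the stream in
    // an invalid state at the end. A single readNext()
    // will advance us to EndDocument.
    if (xml.tokenType() == QXmlStreamReader::Invalid)
        xml.readNext();

    // Report the parser's own error. Calling raiseError() with no
    // message here would replace it with a generic custom error.
    if (xml.hasError())
        qDebug() << errorString();
}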

void XmlRateReader::processRates() {
    if (!xml.isStartElement() || xml.name() != "rates")
        return;
    while (xml.readNextStartElement()) {
        if (xml.name() == "rate")
            processRate();
        else
            xml.skipCurrentElement();
    }
}

// Uncomment this to see another way to read element
// text. It returns the concatenation of the text
// from all child elements.
//#define USE_READ_ELEMENT_TEXT 1

void XmlRateReader::processRate() {
    if (!xml.isStartElement() || xml.name() != "rate")
        return;

    QString from;
    QString to;
    QString conversion;
    while (xml.readNextStartElement()) {
        if (xml.name() == "from")
            from = readNextText();
        else if (xml.name() == "to")
            to = readNextText();
        else if (xml.name() == "conversion")
            conversion = readNextText();
#ifndef USE_READ_ELEMENT_TEXT
        xml.skipCurrentElement();
#endif
    }

    if (!(from.isEmpty() || to.isEmpty() || conversion.isEmpty()))
        Currency::addRate(from, to, conversion);
}

QString XmlRateReader::readNextText() {
#ifndef USE_READ_ELEMENT_TEXT
    xml.readNext();
    return xml.text().toString();
#else
    return xml.readElementText();
#endif
}

QString XmlRateReader::errorString() {
    return QObject::tr("%1\nLine %2, column %3")
            .arg(xml.errorString())
            .arg(xml.lineNumber())
            .arg(xml.columnNumber());
}

From within processRates(), we use readNextStartElement() to loop over tags calling processRate() when we find a rate start tag and skipping all other elements. We call skipCurrentElement() for start elements that do not match because readNextStartElement() always descends the tree. Calling skipCurrentElement() jumps to the corresponding end element and keeps us moving on the same level.

void XmlRateReader::processRates() {
    if (!xml.isStartElement() || xml.name() != "rates")
        return;
    while (xml.readNextStartElement()) {
        if (xml.name() == "rate")
            processRate();
        else
            xml.skipCurrentElement();
    }
}

Lastly, processRate() iterates over the start elements and grabs the values we want as they’re seen. It also shows an alternative implementation using readElementText().

// Uncomment this to see another way to read element
// text. It returns the concatenation of the text
// from all child elements.
//#define USE_READ_ELEMENT_TEXT 1

void XmlRateReader::processRate() {
    if (!xml.isStartElement() || xml.name() != "rate")
        return;

    QString from;
    QString to;
    QString conversion;
    while (xml.readNextStartElement()) {
        if (xml.name() == "from")
            from = readNextText();
        else if (xml.name() == "to")
            to = readNextText();
        else if (xml.name() == "conversion")
            conversion = readNextText();
#ifndef USE_READ_ELEMENT_TEXT
        xml.skipCurrentElement();
#endif
    }

    if (!(from.isEmpty() || to.isEmpty() || conversion.isEmpty()))
        Currency::addRate(from, to, conversion);
}

QString XmlRateReader::readNextText() {
#ifndef USE_READ_ELEMENT_TEXT
    xml.readNext();
    return xml.text().toString();
#else
    return xml.readElementText();
#endif
}

QXmlStreamReader can be made to function more simply. I’ll cover that in a follow-up. The biggest issue I’ve found with this approach is that the main two methods for moving around, readNextStartElement() and skipCurrentElement(), are confusing to use in practice. I’ll let this code explain it for me:

#include "xmlratereader.h"

#include <QFile>
#include <QHash>
#include <QDebug>

// Pick one of these:
//
//#define PRINT_START_ELEMENT_TREE 1
#define USE_RECURSIVE_APPROACH 1       // uses a hierarchy
//#define USE_FIND_ALL_NAMED_ELEMENTS 1  // like QDomDocument::elementsByTagName()

XmlRateReader::XmlRateReader(const QString filename) :
    _filename(filename),
    _total(0)
{}

void XmlRateReader::read() {
    QFile xmlFile(_filename);
    xmlFile.open(QIODevice::ReadOnly);
    _xml.setDevice(&xmlFile);

#if PRINT_START_ELEMENT_TREE
    printStartElementTree();
#elif USE_RECURSIVE_APPROACH
    processAllWithMethod(QStringList() << "update" << "install" << "file",
                          &XmlRateReader::processFile);
#elif USE_FIND_ALL_NAMED_ELEMENTS
    processAllNamedElementsWithMethod("file", &XmlRateReader::processFile);
#endif

    // A single readNext() is required here or else we'll get an error.
    _xml.readNext();
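    // Report the parser's own error. Calling raiseError() with no
    // message here would replace it with a generic custom error.
    if (_xml.hasError())
        qDebug() << errorString();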
}

// This method serves to illustrate how readNextStartElement()
// works. The key things to remember are that the current stream
// location may not match the current element, and it always
// descends the tree. Did you get that? Every time it's called it
// descends one level. It returns false and points to an End element
// when it can no longer descend. At that point the stream will point
// to the end element, but the current logical element is the parent
// containing that end element. This is important because if you call
// skipCurrentElement() while sitting on an end element, you're
// actually skipping to the end of the parent because it is the
// current element. In the example below, calling skipCurrentElement()
// at the end of line 4 would move the stream to the end of line 7.
//
// Example: pretend we call readNextStartElement() at each line below.
//                      stream      current     returns     skip moves to
// 0. --- nothing ---
// 1. <zero>               zero        zero        true     8
// 2.    <one>             one         one         true     7
// 3.       <two>          two         two         true     4
// 4.       </two>         /two        one         false    7
// 5.       <three>        three       three       true     6
// 6.       </three>       /three      one         false    7
// 7.    </one>            /one        zero        false    8
// 8. </zero>              /zero       ??          false    end of doc
void XmlRateReader::printStartElementTree() {
    int indent = 4;
    int level = 0;
    while (!_xml.atEnd()) {
        while (_xml.readNextStartElement())
            printCurrent(QString(level++ * indent, ' '));
        level--;
    }
    printCurrent(QString(level * indent, ' '));
}

// Helper method to print out where we are in the stream
void XmlRateReader::printCurrent(QString extra) {
    qDebug("%s %s: %s",
           qPrintable(extra),
           qPrintable(_xml.tokenString().remove("Element")),
           qPrintable(_xml.name().toString()));
}

// Acts like QDomDocument::elementsByTagName(name), applying method to each one
void XmlRateReader::processAllNamedElementsWithMethod(QString name, member_fn_type method) {
    if (name.isEmpty() || method == 0)
        return;

    while (!_xml.atEnd()) {
        while (_xml.readNextStartElement()) {
            if (_xml.name() == name) {
                (this->*method)();
                // Ensure that we are at the end of the element
                // matching name or else we might match a nested
                // element with the same name:
                // <name> ... <name>...</name> ... </name>
                //
                // skipCurrentElement() moves forward to the nearest
                // end element which moves the current element up a
                // level. Using <0><1></1><2></2></0>: if we start at
                // <1>, we'd skip to </1>, then </0>.
                //
                // Edge case that we can't do anything about:
                // <0>  <1>  <0></0> <0></0> </1> </0>
                //           ^- start here
                // If we start from the second <0>, skip moves to the
                // nearby </0>. This will drop us out of the skip loop.
                // Then readNextStartElement() will put us at the next
                // <0> which will trigger a call to method(). The only
                // way to avoid this is to build our own local readNext
                // skip methods which keep track of what level we're on.
                // Then, you'd need to ensure that method() only used
                // our methods to move around. QXmlStreamReader doesn't
                // expose any way of knowing where we are in the
                // tree.
                while(!(_xml.isEndElement() && _xml.name() == name))
                    _xml.skipCurrentElement();
            }
        }
    }
}

// Like processAllNamedElementsWithMethod() above but with a defined hierarchy.
// method will only be applied to elements with a hierarchy that matches names.
// Example: names = QStringList() << "root element" << "next level" << ... << "name";
void XmlRateReader::processAllWithMethod(QStringList names, member_fn_type method)
{
    if (names.isEmpty() || method == 0)
        return;

    QString currentElementName = _xml.name().toString();
    QString name = names.first();
    names.removeFirst();
    while (!_xml.atEnd()) {
        while (_xml.readNextStartElement()) {
            if (_xml.name() == name) {
                if (names.isEmpty())
                    (this->*method)();
                else
                    processAllWithMethod(names, method);
                // The just called level may not have left us
                // at the end of the element named name. This
                // explicitly moves us there. Otherwise we might
                // recurse unexpectedly. Example: if "name" has
                // a child element named "name" and we've returned
                // here before seeing the child, we'll end up
                // processing the child as if it were the parent.
                while(!(_xml.isEndElement() && _xml.name() == name))
                    _xml.skipCurrentElement();
            } else {
                if (!_xml.isEndElement())
                    _xml.skipCurrentElement();
            }
        }
        if (_xml.isEndElement() && _xml.name() == currentElementName)
            break;
    }
}

void XmlRateReader::processFile() {
    QHash<QString, QString> results = getTextElements(QStringList() << "size" << "hash");
    QString size = results.value("size", "");
    QString hash = results.value("hash", "");
    if (hash.isEmpty() || size.isEmpty())
        return;

    if (hash[0].isDigit())
        _total += size.toULongLong();
}

// Find each child element of the current element with a name found in names.
// Return a hash table of string results.
// The code can be made to return early once all names have been found.
// One method is to stop when the number of results equals the number of names.
// The other is to remove keys from names as they're found and return when
// names is empty. Performance may differ based on the number of child elements
// and the number of keys in names. Regardless of which approach is used, sorting
// names by the order that the keys can expect to be seen will yield the best results.
QHash<QString, QString> XmlRateReader::getTextElements(QStringList names) {
    QHash<QString, QString> results;
    // readNextStartElement() descends the XML tree which
    // is undesirable, but we skip over descendants below
    // which keeps us on the same level.
    while (!names.isEmpty() && _xml.readNextStartElement()) {
        if (!names.contains(_xml.name().toString())) {
            _xml.skipCurrentElement();
            continue;
        }
        // readElementText() internally skips the current element
        // TODO: I think readElementText() can throw
        results.insert(_xml.name().toString(),
                       _xml.readElementText(_xml.SkipChildElements));
        names.removeOne(_xml.name().toString());
    }
    return results;
}


QString XmlRateReader::errorString() {
    return QObject::tr("%1\nLine %2, column %3")
            .arg(_xml.errorString())
            .arg(_xml.lineNumber())
            .arg(_xml.columnNumber());
}

Pros & Cons

++ fastest parser
++ for simple parsing, can be made as simple as QDomDocument
+ easy to understand and follow
+ parsing is linear and follows the code
+ low memory
+ can be parsed incrementally (chunks of xml at a time)
- extremely verbose for simple parsing
- very easy to mess up traversal using readNextStartElement() and skipCurrentElement()
- lots of extra boilerplate that junks up your parsing

2. QDomDocument

QDomDocument processes an xml document into a fully processed internal tree of xml objects that can be read and manipulated through a single interface. Here we read the entire xml file into our QDomDocument, doc, and ask for a list of all of the rate elements. Then, it’s just a couple more steps to parse out the child elements.

#include "xmldomratereader.h"

#include <QDomDocument>
#include <QFile>

#include "currency.h"

void readRatesFromXml(const QString &filename) {
    QDomDocument doc;
    QFile file(filename);
    if (!file.open(QIODevice::ReadOnly) || !doc.setContent(&file))
        return;

    QDomNodeList rates = doc.elementsByTagName("rate");
    for (int i = 0; i < rates.size(); i++) {
        QDomNode n = rates.item(i);
        QDomElement from = n.firstChildElement("from");
        QDomElement to = n.firstChildElement("to");
        QDomElement conversion = n.firstChildElement("conversion");
        if (from.isNull() || to.isNull() || conversion.isNull())
            continue;
        Currency::addRate(from.text(), to.text(), conversion.text());
    }
}

Note: If there are other types of rate elements in the xml file, we might get the wrong types as well. There might be a way to be more specific using namespaces or to just first find the rates element(s) and then find the rate elements within those.

Â Pros & Cons

++ very easy to use and understand
+ less code
? simplicity might break for complex files
- potential for high memory usage (entire xml is parsed and stored in memory)
- not actively maintained anymore¹

3. XQuery/XPath

From the QtXmlPatterns module, XQuery/XPath is a language for parsing XML and formatting the parsed results. To be more specific, XPath is the syntax specification that XQuery uses and greatly extends to indicate what and how to parse. Both of them are standardized and are therefore not unique to Qt. Qt has two good introductions to XQuery: there is a detailed introduction to the XQuery language itself, but their [XQuery documentation] does a better job of introducing XQuery by showing it used in practice. I found it rather difficult to figure out how to pull out my rate information. As you’ll see below, I found two methods of doing it, but there’s still room for improvement.

Before moving on to the solutions, you should know that Qt comes with a command line utility called xmlpatterns. If you put your XQuery into a text file, you can test it from the command line saving you lots of time re-compiling. The tool was my primary method of testing and learning XQuery. You can find my tests in this folder on github.

bash$ cat xquery_tests/file7.xq
doc('../doc/RATES.xml')//rate/string-join(data((from, to, conversion)), ',')

bash$ ~/Qt/5.3/clang_64/bin/xmlpatterns xquery_tests/file7.xq
AUD,CAD,1.0079 AUD,EUR,0.7439 CAD,AUD,0.9921 CAD,USD,1.0090 EUR,AUD,1.3442 USD,CAD,0.9911
bash$

I started by pulling out all of the from, to, and conversion elements as separate lists and then assuming the three lists would be in the same order. I ran the same query for each element name. Note: I should have used variable binding instead of string substitution here. It doesn’t matter here, but it’s a good habit to form because XQuery is subject to injection attacks.

#include "xqueryratereader.h"

#include <QtXmlPatterns/QXmlQuery>
#include <QStringList>
#include <QFileInfo>
#include <QDebug>

#include "currency.h"

// Notes: my biggest issue with this design is that the different pieces of
//        each rate are pulled out separately. I would prefer one query that
//        pulled them out in sets.
void readRatesUsingXQuery(const QFileInfo file) {
    const QString queryUrl = QString("doc('%1')//rate/%2/string()").arg(file.absoluteFilePath());

    typedef QPair<QStringList &, QString> QueryPair;
    QList<QueryPair> queries;
    QStringList from, to, conversion;
    queries << QueryPair(from, "from") << QueryPair(to, "to") << QueryPair(conversion, "conversion");
    QXmlQuery query;
    foreach (QueryPair pair, queries) {
        query.setQuery(queryUrl.arg(pair.second));
        query.evaluateTo(&pair.first);
    }
    if (to.size() != from.size() || to.size() != conversion.size())
        return;
    for (int i = 0; i < to.size(); ++i)
        Currency::addRate(from.at(i), to.at(i), conversion.at(i));
}

// Same as method above but without any pizzazz. Note that it's only one line shorter.
void readRatesUsingXQuery_expanded(const QFileInfo file) {
    const QString queryUrl = QString("doc('%1')//rate/%2/string()").arg(file.absoluteFilePath());

    QStringList from, to, conversion;
    QXmlQuery query;
    query.setQuery(queryUrl.arg("from"));
    query.evaluateTo(&from);
    query.setQuery(queryUrl.arg("to"));
    query.evaluateTo(&to);
    query.setQuery(queryUrl.arg("conversion"));
    query.evaluateTo(&conversion);
    if (to.size() != from.size() || to.size() != conversion.size())
        return;
    for (int i = 0; i < to.size(); ++i)
        Currency::addRate(from.at(i), to.at(i), conversion.at(i));
}

void readRatesUsingXQuery2(const QFileInfo file) {
    const QString queryUrl = QString("doc('%1')//rate/string-join((from, to, conversion)/string(), ',')")
                             .arg(file.absoluteFilePath());

    QStringList rates;
    QXmlQuery query;
    query.setQuery(queryUrl);
    query.evaluateTo(&rates);
    foreach (const QString &rate, rates) {
        QStringList values = rate.split(',');
        if (values.size() != 3)
            continue;
        Currency::addRate(values[0], values[1], values[2]);
    }
}

Next, I tried to jazz up the code a bit by storing the lists and element names in a list themselves and writing generic code to do the rest. For my three elements, the code ended up being one line longer and more complex than the simple approach above. Given lots more terms, this would’ve paid for itself. But there’s something to be said for simplicity.


Eventually, I returned to the problem and found a single query that pulls all three elements out at once. The result is a list of rate elements separated by commas with each rate grouping separated by a space. XQuery has support for types, but it’s pointless to use them here because Qt’s XQuery code will just convert them back to strings before you get them, which wastes the effort.²


void readRatesUsingXQuery2(const QFileInfo file) {
    const QString queryUrl = QString("doc('%1')//rate/string-join((from, to, conversion)/string(), ',')")
                             .arg(file.absoluteFilePath());

    QStringList rates;
    QXmlQuery query;
    query.setQuery(queryUrl);
    query.evaluateTo(&rates);
    foreach (const QString &rate, rates) {
        QStringList values = rate.split(',');
        if (values.size() != 3)
            continue;
        Currency::addRate(values[0], values[1], values[2]);
    }
}
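
Since the query hands everything back as strings, any typing has to happen on the C++ side. Here is a minimal sketch of that step, written against the standard library only so it compiles without Qt. The Rate struct and parseRate function are hypothetical names for illustration and are not part of the repository code; in the Qt version, QString::split and QString::toDouble would play the same roles.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record type; the post's Currency::addRate takes three strings instead.
struct Rate {
    std::string from;
    std::string to;
    double conversion;
};

// Split a "from,to,conversion" string and convert the last field to a number.
// Returns false (leaving 'out' untouched) when the field count or number is malformed.
bool parseRate(const std::string &joined, Rate &out) {
    std::vector<std::string> fields;
    std::stringstream stream(joined);
    std::string field;
    while (std::getline(stream, field, ','))
        fields.push_back(field);
    if (fields.size() != 3)
        return false;

    try {
        std::size_t used = 0;
        const double value = std::stod(fields[2], &used);
        if (used != fields[2].size())
            return false;  // trailing junk such as "1.0x"
        out.from = fields[0];
        out.to = fields[1];
        out.conversion = value;
        return true;
    } catch (const std::exception &) {
        return false;  // not a number at all, or out of range
    }
}
```

Rejecting a malformed entry here mirrors the `continue` in the loop above: a bad rate is skipped rather than aborting the whole read.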

I think for applications like this, where you want to pull some information out of XML, XQuery is the way to go. It’s significantly harder to figure out (at first), but the potential is great, and the fact that it’s a standard means you can use the same technique elsewhere. Also, as the queries get more complex, your XQuery string may get more complex, but your code remains relatively short.

Pros & Cons

++ less code with potential for greater savings as the parsing problem gets more complex
++ very powerful
+ standardized, so you can use it in other places where XQuery is supported
+ very easy to test using xmlpatterns from the command line
+ can parse data that “looks like XML”
+ can convert all or part of one XML format to a different XML format in only a few lines of code
- extremely hard to understand for all but very simple queries
- makes regular expressions look simple
- most of the online tutorials and documentation Google returns only show extremely simple examples
- potential for bugs due to complexity
-- slowest parsing method (4 times slower in one test)

4. SAX2

Wait, what? I thought you said THREE methods.

Well, I did… but that’s because I intentionally skipped SAX.

Why?

Honestly… I mistakenly thought it was deprecated.¹ ³

SAX is an event-based XML parser. You create a class with callback methods and give your XML and your event-handling class to the SAX parser. The parser then calls methods in your class for each event, such as start tag, character data, end tag, and error. XML data is parsed serially and not kept in memory, so it’s very similar to the stream reader in that respect. The difference is that stream reader implementations typically mirror the XML structure in code, with a specially coded and named method for each level of the XML. With SAX, a single handler interface is called for everything. I think you could make SAX almost identical to the stream reader, though, by having your event methods use the strategy pattern. You could then have a different strategy class for each XML element, similar to the stream reader’s methods. To give you an idea of what SAX looks like, consider Qt’s bookmarks SAX example.

In the example, the XML is parsed into a tree widget, and therefore the XML handler has intimate knowledge of the tree. The example’s open() method shows how the parsing is set up.

The class declaration for the handler shows that it has to maintain state, whereas with the stream reader the state was implicitly defined by the code.

XbelHandler::startElement() shows how the handler must use a combination of element name matching and saved state in order to take action.
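
To make the "name matching plus saved state" idea concrete for the currency problem, here is a rough sketch in standard C++. It is not Qt’s SAX API: the Handler interface, the parse function, and RateHandler are all hypothetical stand-ins, and the toy parser ignores attributes, comments, entities, and errors. It only shows the shape of an event handler that must remember where it is.

```cpp
#include <string>
#include <vector>

// Minimal callback interface in the spirit of a SAX content handler.
// Hypothetical: a real SAX interface has more methods and arguments.
struct Handler {
    virtual ~Handler() {}
    virtual void startElement(const std::string &name) = 0;
    virtual void characters(const std::string &text) = 0;
    virtual void endElement(const std::string &name) = 0;
};

// Toy push parser: walks the input once and fires one event per tag or text run.
void parse(const std::string &xml, Handler &handler) {
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '<') {
            const std::size_t close = xml.find('>', pos);
            if (close == std::string::npos)
                return;  // unterminated tag: give up quietly in this sketch
            const std::string tag = xml.substr(pos + 1, close - pos - 1);
            if (!tag.empty() && tag[0] == '/')
                handler.endElement(tag.substr(1));
            else
                handler.startElement(tag);
            pos = close + 1;
        } else {
            std::size_t open = xml.find('<', pos);
            if (open == std::string::npos)
                open = xml.size();
            handler.characters(xml.substr(pos, open - pos));
            pos = open;
        }
    }
}

struct Rate {
    std::string from, to, conversion;
};

// The handler keeps explicit state: which element is open and which
// fields of the current rate have been collected so far.
class RateHandler : public Handler {
public:
    std::vector<Rate> rates;

    void startElement(const std::string &name) override {
        current_ = name;
        if (name == "rate")
            pending_ = Rate();  // start assembling a fresh rate
    }
    void characters(const std::string &text) override {
        if (current_ == "from") pending_.from += text;
        else if (current_ == "to") pending_.to += text;
        else if (current_ == "conversion") pending_.conversion += text;
        // text anywhere else (such as indentation) is ignored
    }
    void endElement(const std::string &name) override {
        if (name == "rate")
            rates.push_back(pending_);
        current_.clear();
    }

private:
    std::string current_;  // name of the element whose text we are inside
    Rate pending_;         // rate being assembled
};
```

Calling parse(xml, handler) on the rates document fills handler.rates. Note how current_ and pending_ carry the context that the stream reader version gets for free from its call stack.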

Pros & Cons

+ can be parsed incrementally (chunks of xml at a time)
+ based on a well-established Java pattern of parsing XML so perhaps porting code is easier
- a little more difficult to understand and code
- probably more prone to encapsulation problems and keeping tons of state in the handler as the parsing gets more complex
- not actively maintained anymore¹

Conclusions & Caveats

Searching XML, Minimal Parsing

If I wanted to quickly pull a selection of data out of XML, I would use XQuery. If I knew the source XML was guaranteed to be small, I might choose QDomDocument for its ease of use.

Update: I’d actually use QXmlStreamReader along with a helper method that makes it about as simple as QDomDocument to use. This can be seen in the follow-up post.

Extensive Parsing

If I were processing an entire XML file (like a configuration file where every line is important), I’d use QXmlStreamReader or maybe SAX. I think by now you can see why we might use a stream reader, but why SAX?

Well, both the stream reader and SAX support partial input, but with the stream reader you have to detect PrematureEndOfDocumentError. Here’s the kicker, though. According to the docs, once you’ve determined that it’s safe to resume, you have to resume from the code position where you left off, since the stream reader’s state is stored in the code position. SAX, in contrast, stores its state in variables, so it knows exactly where it left off. Furthermore, since SAX is a push-based parser rather than pull-based like the stream reader (meaning your SAX handler has events handed to it, while stream reader code requests data as needed), there’s no need to catch the end-of-document error. It’s all event driven, so you let the parser run until it’s done, errors out (for real), or times out. Then, you signal that parsing is finished.

Also, I believe that if you used the strategy pattern for the different element types of your XML, you could get the benefits of the stream reader’s recursive descent parser design with SAX. In fact, if you make it such that the strategy classes keep their own state, you could have a generic, reusable SAX handler that works with any set of strategy classes.

The Qt documentation says “QXmlStreamReader is a faster and more convenient replacement for Qt’s own SAX parser”, but it doesn’t indicate how much faster, so your mileage may vary. Additionally, since the stream reader was written after SAX, the developers may have more interest in it and it may be better maintained, but that’s just speculation.
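
The point about state living in variables can be illustrated with a small sketch. This is a hypothetical toy tokenizer in standard C++, not Qt’s SAX reader or its incremental parsing API; PushTokenizer and collectEvents are names made up for this example. Because everything needed to resume is kept in member variables, feed() can return in the middle of a tag and continue cleanly when the next chunk arrives.

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy push tokenizer that accepts input in arbitrary chunks.
// It ignores attributes, comments, entities, and errors.
class PushTokenizer {
public:
    // The callback receives "start:name", "end:name", or "text:..." strings.
    explicit PushTokenizer(std::function<void(const std::string &)> onEvent)
        : onEvent_(onEvent), inTag_(false) {}

    // May be called any number of times; partial tags and text are buffered.
    void feed(const std::string &chunk) {
        for (char c : chunk) {
            if (inTag_) {
                if (c == '>') {
                    emitTag();
                    inTag_ = false;
                } else {
                    buffer_ += c;
                }
            } else if (c == '<') {
                flushText();
                inTag_ = true;
            } else {
                buffer_ += c;
            }
        }
    }

    // Call once when the input source is exhausted.
    void finish() {
        if (!inTag_)
            flushText();
    }

private:
    void emitTag() {
        if (!buffer_.empty() && buffer_[0] == '/')
            onEvent_("end:" + buffer_.substr(1));
        else
            onEvent_("start:" + buffer_);
        buffer_.clear();
    }
    void flushText() {
        if (!buffer_.empty())
            onEvent_("text:" + buffer_);
        buffer_.clear();
    }

    std::function<void(const std::string &)> onEvent_;
    bool inTag_;          // true while between '<' and '>'
    std::string buffer_;  // partial tag name or text carried across chunks
};

// Convenience wrapper: feed each chunk in turn and collect the events.
std::vector<std::string> collectEvents(const std::vector<std::string> &chunks) {
    std::vector<std::string> events;
    PushTokenizer tokenizer([&events](const std::string &e) { events.push_back(e); });
    for (const std::string &chunk : chunks)
        tokenizer.feed(chunk);
    tokenizer.finish();
    return events;
}
```

Feeding the chunks "<fr", "om>AU", and "D</from>" yields the same three events as feeding "<from>AUD</from>" in one piece. The caller never has to jump back to a particular code position, which is the advantage claimed for the push model above.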

Caveat Emptor

Finally, I don’t use XML much in my work, and my knowledge is limited. I assume that I’ve made mistakes in this document. Please leave me any comments, corrections, or suggestions below. I’ll do my best then to fix problems here.

Wrap Up

Hopefully, this document will help you quickly evaluate all of the different Qt native XML methods in one place and save you some time. Use the excellent Qt documentation to refine your choice.

¹ Qt documentation says that the QtXml module is not actively maintained anymore here and here, but maybe that’s because it worked well enough. Here’s a mailing list e-mail stating that it’s really slow and proposing an alternative library with a similar interface, pugixml. It’s only 2 headers and 1 source file. That said, many people anecdotally claim to use the DOM interface with small XML documents and have never noticed performance problems warranting attention. According to QTBUG-32926, QtXml is probably not going to be removed; it just won’t be improved. Critical bug fixes, if any, may still occur, but don’t expect it to ever change.
² Although, maybe I’m wrong. There’s a lot about types just underneath the section on variable binding.
³ The stream reader is apparently faster, easier to code, and more Qt-like. Additionally, SAX is modeled after the Java SAX2 interface for parsing XML, so the main reason you’d use it is to port a Java parser to Qt. So, unless you’re porting, it’s effectively deprecated.

3gfp
