I've settled on a simple C++ class for interpreting configuration files quickly. This is how I use it:
#include "XmlParser.h"
CXMLParser parser;
parser.Create();
CXmlConfig config;
parser.SetHandler(&config);
parser.Parse(pstrXml);
This is how the parser is initialized and begins processing a file. The idea is that the CXMLParser class is a wrapper around the real XML library. In addition, you provide a C++ handler class (here named CXmlConfig) which does the dissection of the XML structures.
The XML library used is abstracted away in the CXMLParser class.
The first version I wrote used the open-source Expat parser.
It is a SAX parser, which makes things a little complicated since you have to keep a lot of state during the parse steps.
Unfortunately I needed the C++ class to work on Linux systems too, so I chose a simple, lightweight, easy-to-install library.
Eventually, I rewrote the class to work with the equally lightweight Microsoft Xmllite parser. The only difference between my two classes (both included in the source download) is that the Xmllite version uses Unicode strings, while the Expat version is strictly limited to the Latin-1 code page.
To iterate over the XML content you must supply a C++ class (handler) that derives from CXMLElementHandler.
In the sample above, this is what is done with the CXmlConfig class.
Given the following XML:
<?xml version="1.0"?>
<Configuration>
<Data strValue="ABC" intValue="222" boolValue="true" />
</Configuration>
...you can write the CXmlConfig class from the sample code above like this:
class CXmlConfig : public CXMLElementHandler
{
public:
std::string sValue;
int iValue;
bool bValue;
BEGIN_XML_PARSE_MAP()
BEGIN_XML_ELEMENT("Data")
XML_ATTRIB_STR("strValue", sValue)
XML_ATTRIB_INT("intValue", iValue)
XML_ATTRIB_BOOL("boolValue", bValue)
END_XML_ELEMENT()
END_XML_PARSE_MAP()
};
One of the features of the handler class is that it defines a macro-map for simple iteration over the XML elements and attributes. Yes, macro-maps are not a favourite of the C++ purists, but since I do a lot of work with the ATL and WTL libraries, I find them quite useful.
The macro-map allows you to test for tags with a particular name:
BEGIN_XML_ELEMENT("Data")
XML_ATTRIB_STR("strValue", sValue)
END_XML_ELEMENT()
This snippet makes sure that whenever a tag named "Data" comes along, it looks for an XML attribute named "strValue" and assigns its text value to the sValue string member variable.
You might have noticed that the macro-map is just a simple wrapper around the SAX callback functionality.
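To make that concrete, here is a rough sketch of the kind of code such a macro-map could stand for. It is not the actual expansion from XmlParser.h, which may well differ. The struct name ConfigHandlerSketch is made up for illustration, and the sketch assumes Expat-style attributes: a NULL-terminated array of alternating name and value strings.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for the CXmlConfig handler; the real macros may
// expand to something quite different.
struct ConfigHandlerSketch
{
    std::string sValue;
    int iValue = 0;
    bool bValue = false;

    // Roughly what BEGIN_XML_PARSE_MAP() ... END_XML_PARSE_MAP() could stand for.
    void OnStartElement(const char* pszName, const char** ppszAttribs)
    {
        assert(pszName != nullptr && ppszAttribs != nullptr);
        // BEGIN_XML_ELEMENT("Data")
        if (std::strcmp(pszName, "Data") == 0) {
            // Walk the name/value pairs until the terminating NULL.
            for (const char** p = ppszAttribs; p[0] != nullptr && p[1] != nullptr; p += 2) {
                const char* pszAttr = p[0];
                const char* pszVal = p[1];
                // XML_ATTRIB_STR("strValue", sValue)
                if (std::strcmp(pszAttr, "strValue") == 0) sValue = pszVal;
                // XML_ATTRIB_INT("intValue", iValue)
                else if (std::strcmp(pszAttr, "intValue") == 0) iValue = std::atoi(pszVal);
                // XML_ATTRIB_BOOL("boolValue", bValue)
                else if (std::strcmp(pszAttr, "boolValue") == 0) bValue = (std::strcmp(pszVal, "true") == 0);
            }
        }
        // END_XML_ELEMENT()
    }
};
```

Feeding it the "Data" tag from the sample XML fills the three members, which is all the macro-map is meant to hide.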
The macro-map shown so far only dissects attributes. If your XML format uses text nodes such as...
<Description>Text message goes here...</Description>
then the macro-map allows you to catch this with...
XML_CHARDATA_STR("Description", sDescription)
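Catching a text node by hand takes a little more state than reading attributes, because a SAX parser such as Expat may deliver one run of text in several chunks, and the chunks are not null-terminated. The following is only a sketch of that bookkeeping, not the expansion of XML_CHARDATA_STR. The names TextHandlerSketch and OnCharData are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical handler that collects the text inside <Description>.
struct TextHandlerSketch
{
    std::string sDescription;
    bool bInDescription = false;   // true between the start and end tag

    void OnStartElement(const char* pszName)
    {
        assert(pszName != nullptr);
        if (std::strcmp(pszName, "Description") == 0) {
            bInDescription = true;
            sDescription.clear();
        }
    }

    // Text may arrive in pieces, so append rather than assign.
    void OnCharData(const char* pData, int nLen)
    {
        if (bInDescription) sDescription.append(pData, static_cast<std::size_t>(nLen));
    }

    void OnEndElement(const char* pszName)
    {
        assert(pszName != nullptr);
        if (std::strcmp(pszName, "Description") == 0) bInDescription = false;
    }
};
```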
If you have repeating nodes (lists) in your XML, things may get a little more complicated. Since the whole design is built around a SAX parser, you'll have to maintain a little state while parsing the lists.
In the macro-map you could do this:
BEGIN_XML_ELEMENT("Item")
XML_ATTRIB_INIT( ITEM t; aList.push_back(t); pTempItem = &aList.back(); )
XML_ATTRIB_INT("a", pTempItem->a)
END_XML_ELEMENT()
Here, the XML_ATTRIB_INIT macro allows you to inject C++ code that runs immediately when the tag element is first found.
In this case, it makes sure to allocate room for another list entry before parsing the attributes of the tag element.
To ensure this works, you'll also have to define two variables in the handler class:
std::vector<ITEM> aList;
ITEM* pTempItem;
The code above is inherently unsafe when used for anything other than the simple single-element construct shown. The pointer is invalidated as soon as a later push_back makes the vector reallocate, so pay close attention if you need to reference the temporary pointer outside the element where it was originally initialized. This can happen when embedded tags must be parsed.
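One way to sidestep the dangling pointer is to remember the index of the current list entry instead of its address, since an index stays valid when the vector reallocates. The sketch below shows the idea with plain callbacks rather than the macro-map. ListHandlerSketch, the nested "Extra" tag and its "b" attribute are all hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

struct ITEM { int a = 0; int b = 0; };

// Hypothetical handler; attributes are assumed to arrive Expat-style as a
// NULL-terminated array of alternating name/value strings.
struct ListHandlerSketch
{
    std::vector<ITEM> aList;
    std::size_t iCurrent = 0;   // index of the entry being filled
    bool bInItem = false;       // true while inside an <Item> element

    void OnStartElement(const char* pszName, const char** ppszAttribs)
    {
        assert(pszName != nullptr && ppszAttribs != nullptr);
        if (std::strcmp(pszName, "Item") == 0) {
            aList.push_back(ITEM());
            iCurrent = aList.size() - 1;
            bInItem = true;
            ReadInt(ppszAttribs, "a", aList[iCurrent].a);
        }
        else if (std::strcmp(pszName, "Extra") == 0 && bInItem) {
            // Embedded tag: look the entry up by index, not by stored pointer.
            ReadInt(ppszAttribs, "b", aList[iCurrent].b);
        }
    }

    void OnEndElement(const char* pszName)
    {
        assert(pszName != nullptr);
        if (std::strcmp(pszName, "Item") == 0) bInItem = false;
    }

    // Copies the integer value of the named attribute, if present.
    static void ReadInt(const char** ppszAttribs, const char* pszWanted, int& iOut)
    {
        for (const char** p = ppszAttribs; p[0] != nullptr && p[1] != nullptr; p += 2)
            if (std::strcmp(p[0], pszWanted) == 0) iOut = std::atoi(p[1]);
    }
};
```

The bInItem flag is the extra piece of state: it keeps a stray "Extra" tag outside any "Item" from writing into the wrong entry.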
Limitations
Anytime you decide to base your XML parsing on a SAX parser, you really need to consider whether it's worth the trouble. If your XML schema is anything but simple, the time needed to verify the code and the complexity of the state maintenance may outweigh the time it takes to do traditional DOM parsing. At least these points should be met:
- The XML comes from a secure source.
- You control the XML schema,
- or at least the schema is sufficiently simple that tag names aren't reused too often.
Linear mapping
As an alternative to iterating over each tag element, you may consider a dictionary approach. This is one of the things a SAX parser does really well, but at the expense of validation and data-type mapping. I've used this technique especially for internal configuration files.
The idea is that you generate a dictionary of all the attributes in the XML file. Each attribute is stored in a dictionary structure (such as an STL map), keyed by its name expanded to full depth, including the names of its parent tags.
Like this:
Key | Value |
---|---|
Configuration.Data.strValue | ABC |
Configuration.Data.intValue | 222 |
Configuration.Data.boolValue | true |
You can then reference each attribute in your code through the STL map:
std::map<std::wstring, std::wstring> cfgmap;
...
std::wstring s = cfgmap[L"Configuration.Data.strValue"];
The CXmlConfig class that can do this magic looks like this:
class CXmlConfig : public CXMLElementHandler
{
public:
std::map<std::wstring, std::wstring> cfgmap;
std::wstring sTempName;
void OnStartElement(LPCWSTR pszName, LPCWSTR* ppszAttribs)
{
if( !sTempName.empty() ) sTempName += L".";
sTempName += pszName;
while( *ppszAttribs != NULL ) {
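// Attributes come as name/value pairs; use a distinct name so the
// element name parameter (pszName) is not shadowed.
LPCWSTR pszAttrName = *ppszAttribs++;
std::wstring sValue = *ppszAttribs++;
std::wstring sKey = sTempName;
sKey += L"."; sKey += pszAttrName;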
cfgmap[sKey] = sValue;
}
}
void OnEndElement(LPCWSTR pszName)
{
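// Strip the last path component; if there is no dot left, the root
// element has closed and the path is reset.
size_t iPos = sTempName.find_last_of(L'.');
if( iPos != std::wstring::npos ) sTempName.erase(iPos);
else sTempName.clear();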
}
};
However, this class doesn't support lists.
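Because the dictionary stores every value as a string, the data-type mapping mentioned earlier is left to the caller. A minimal sketch of typed lookups with defaults might look like the following. The helper names (FindValue, GetString, GetInt, GetBool) and the ConfigMap typedef are not part of the download; they are assumptions made for this example. The helpers use find() rather than operator[], since operator[] silently inserts an empty entry for a missing key.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

typedef std::map<std::wstring, std::wstring> ConfigMap;

// Returns a pointer to the stored value, or nullptr when the key is absent.
inline const std::wstring* FindValue(const ConfigMap& m, const std::wstring& key)
{
    assert(!key.empty());
    ConfigMap::const_iterator it = m.find(key);
    return it != m.end() ? &it->second : nullptr;
}

inline std::wstring GetString(const ConfigMap& m, const std::wstring& key,
                              const std::wstring& def = L"")
{
    const std::wstring* p = FindValue(m, key);
    return p != nullptr ? *p : def;
}

// Falls back to the default when the key is missing or not a number.
inline int GetInt(const ConfigMap& m, const std::wstring& key, int def = 0)
{
    const std::wstring* p = FindValue(m, key);
    if (p == nullptr) return def;
    try { return std::stoi(*p); }
    catch (const std::exception&) { return def; }
}

// Only the literal text "true" counts as true in this sketch.
inline bool GetBool(const ConfigMap& m, const std::wstring& key, bool def = false)
{
    const std::wstring* p = FindValue(m, key);
    return p != nullptr ? (*p == L"true") : def;
}
```

With the cfgmap from the handler above, a call such as GetInt(cfgmap, L"Configuration.Data.intValue") would then replace the raw string lookup.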
In the download below you can find samples for both the Expat and Xmllite based classes.
Source Code Dependencies
Microsoft Visual Studio.NET 2008
Download Files
Source Code (98 Kb)