viksoe.dk

XML Config Reader


This article was submitted .


For a few projects I have had the need to load application settings from XML files or even receive commands through XML messages from a remote server. And while reading data from an XML file does mandate using a proper XML Parser library, the act of parsing the XML structure can be tedious and labour intensive.

I've settled on a simple C++ class for interpreting configuration files quickly. This is how I use it:
#include "XmlParser.h"

CXMLParser parser;
parser.Create();

CXmlConfig config;
parser.SetHandler(&config);

parser.Parse(pstrXml);
This is how the parser is initialized and begins processing a file. The idea is that the CXMLParser class is a wrapper around the real XML library. In addition, you provide a handler C++ class (here named CXmlConfig) which does the dissection of the XML structures.

The XML library used is abstracted away in the CXMLParser class. The first version I wrote used the open-source Expat parser. It is a SAX parser, which makes things a little complicated since you have to keep a lot of state during the parse steps. Unfortunately I needed the C++ class to work on Linux systems too, so I chose a simple, lightweight, easy-to-install library. Eventually, I rewrote the class to work with the equally lightweight Microsoft Xmllite parser. The only difference between my two classes (both included in the source download) is that the Xmllite version uses Unicode strings, while the Expat version handles only the Latin-1 codepage.

To iterate over the XML content you must supply a C++ class (handler) that derives from CXMLElementHandler. In the sample above, this is what is done with the CXmlConfig class. Given the following XML:
<?xml version="1.0"?>
<Configuration>
	<Data strValue="ABC" intValue="222" boolValue="true" />
</Configuration>
...you can write a CXmlConfig class from the sample code above like this:
class CXmlConfig : public CXMLElementHandler
{
public:
  std::string sValue;
  int iValue;
  bool bValue;

  BEGIN_XML_PARSE_MAP()
    BEGIN_XML_ELEMENT("Data")
      XML_ATTRIB_STR("strValue", sValue)
      XML_ATTRIB_INT("intValue", iValue)
      XML_ATTRIB_BOOL("boolValue", bValue)
    END_XML_ELEMENT()
  END_XML_PARSE_MAP()
};
One of the features of the handler class is that it defines a macro-map for simple iteration over the XML elements and attributes. Yes, macro-maps are not a favourite of the C++ purists, but since I do a lot of work with the ATL and WTL libraries, I find them quite useful.

The macro-map allows you to test for tags with a particular name:
    BEGIN_XML_ELEMENT("Data")
      XML_ATTRIB_STR("strValue", sValue)
    END_XML_ELEMENT()
This snippet makes sure that whenever a tag named "Data" comes along, it looks for an XML attribute named "strValue" and assigns its text value to the sValue string member variable.
You might have noticed that the macro-map is just a simple wrapper around the SAX callback functionality.
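
To make the "wrapper around SAX callbacks" idea concrete, here is a minimal sketch of how such a macro-map could expand into an ordinary start-element callback. The macro bodies, the CSketchConfig class and the OnStartElement signature are my assumptions for illustration; the macros in the download may well be defined differently. The attribute layout (name/value pairs ending in a null pointer) follows what Expat passes to its start-element handler.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical macro definitions, not the ones shipped in the download.
// The map opens a callback, each element block becomes an "if" on the tag
// name, and each attribute line becomes an "if" on the attribute name.
#define BEGIN_XML_PARSE_MAP() \
   void OnStartElement(const char* pszTag, const char** ppszAttribs) {
#define BEGIN_XML_ELEMENT(tag) \
      if( std::strcmp(pszTag, tag) == 0 ) { \
         for( const char** pp = ppszAttribs; pp[0] != nullptr && pp[1] != nullptr; pp += 2 ) {
#define XML_ATTRIB_STR(name, var) \
            if( std::strcmp(pp[0], name) == 0 ) var = pp[1];
#define XML_ATTRIB_INT(name, var) \
            if( std::strcmp(pp[0], name) == 0 ) var = std::atoi(pp[1]);
#define XML_ATTRIB_BOOL(name, var) \
            if( std::strcmp(pp[0], name) == 0 ) var = (std::strcmp(pp[1], "true") == 0);
#define END_XML_ELEMENT() \
         } \
      }
#define END_XML_PARSE_MAP() \
   }

// Stand-alone handler (no base class) so the sketch compiles on its own.
class CSketchConfig
{
public:
   std::string sValue;
   int iValue = 0;
   bool bValue = false;

   BEGIN_XML_PARSE_MAP()
      BEGIN_XML_ELEMENT("Data")
         XML_ATTRIB_STR("strValue", sValue)
         XML_ATTRIB_INT("intValue", iValue)
         XML_ATTRIB_BOOL("boolValue", bValue)
      END_XML_ELEMENT()
   END_XML_PARSE_MAP()
};
```

In this sketch the parser wrapper would simply forward every start-tag event to OnStartElement, and tags that are not listed in the map fall through without any effect.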

The block above only dissects attributes. If your XML format uses text-nodes such as...
<Description>Text message goes here...</Description>
then the macro-map allows you to catch this with...
    XML_CHARDATA_STR("Description", sDescription)
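
For text nodes it helps to remember that a SAX parser such as Expat may deliver one run of text in several character-data callbacks, so the handler has to append rather than assign. The following is a rough sketch of that accumulation, not the implementation behind XML_CHARDATA_STR; the class name and the three callback names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-alone handler that collects the text of <Description>.
class CTextCollector
{
public:
   std::string sDescription;

   void OnStartElement(const std::string& sTag)
   {
      // Start collecting only while inside the tag we care about
      bInDescription = (sTag == "Description");
      if( bInDescription ) sDescription.clear();
   }

   void OnCharacterData(const char* pData, int nLen)
   {
      // Text may arrive in several chunks, so append each one
      if( bInDescription ) sDescription.append(pData, static_cast<std::size_t>(nLen));
   }

   void OnEndElement(const std::string& sTag)
   {
      if( sTag == "Description" ) bInDescription = false;
   }

private:
   bool bInDescription = false;
};
```

Note that this sketch stops collecting as soon as a child tag opens inside Description, which is fine for plain text nodes but not for mixed content.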

If you have repeating nodes (lists) in your XML, things may get a little more complicated. Since all of this is built around a SAX parser layout, you'll have to maintain a little state while parsing the lists.
In the macro-map you could do this:
    BEGIN_XML_ELEMENT("Item")
      XML_ATTRIB_INIT( ITEM t; aList.push_back(t); pTempItem = &aList.back(); )
      XML_ATTRIB_INT("a", pTempItem->a)
    END_XML_ELEMENT()
Here, the XML_ATTRIB_INIT macro allows you to inject C++ code that runs immediately when the tag element is first found. In this case, it makes sure to allocate room for another list entry before parsing the attributes of the tag element. To make this work, you'll also have to define two variables in the handler class:
  std::vector<ITEM> aList;
  ITEM* pTempItem;
The code above is inherently unsafe when used for anything other than the simple one-element construct shown. The temporary pointer becomes invalid as soon as the vector reallocates its storage, which any later push_back may trigger, so pay close attention if you need to reference the pointer outside the element where it was originally initialized. This can happen when embedded tags must be parsed.
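
One way to sidestep the stale-pointer risk is to remember the position of the current item instead of its address, since an index stays valid when the vector grows. The sketch below is only an illustration: the ITEM layout, the aNotes member and the CItemList class with its two methods are hypothetical and stand in for whatever a nested tag would need to fill in.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical item type; "a" mirrors the attribute used in the macro-map.
struct ITEM
{
   int a = 0;
   std::vector<std::string> aNotes;   // filled by a hypothetical nested tag
};

class CItemList
{
public:
   std::vector<ITEM> aList;

   // Call when an <Item> tag starts: append an entry and remember its index
   void BeginItem(int a)
   {
      aList.push_back(ITEM());
      aList.back().a = a;
      iCurrent = aList.size() - 1;
   }

   // Call from a nested tag's handler; looks the item up by index each time,
   // so reallocation of aList cannot leave a dangling reference behind
   bool AddNote(const std::string& sNote)
   {
      if( iCurrent >= aList.size() ) return false;   // no item has been started
      aList[iCurrent].aNotes.push_back(sNote);
      return true;
   }

private:
   std::size_t iCurrent = static_cast<std::size_t>(-1);   // "no current item"
};
```

The cost is one extra lookup per access, which is negligible for configuration-sized lists.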

Limitations

Any time you decide to base your XML parsing on a SAX parser, you really need to consider whether it's worth the trouble. If your XML schema is anything but simple, the time needed to verify the code and the complexity of the state maintenance may outweigh the time it takes to do traditional DOM parsing. At the very least, these points should be met:
  • The XML comes from a secure source.
  • You control the XML schema,
  • or at least the schema is sufficiently simple that tag names aren't reused too often.
The system I describe here really only works well for very basic XML schemas. As mentioned, I use it primarily for reading XML configuration files that are private to my applications.

Linear mapping

As an alternative to iterating over each tag element, you may consider a dictionary approach. This is one of the things a SAX parser does really well, but at the expense of validation and data-type mapping. I've used this technique especially for internal configuration files.

The idea is that you generate a dictionary of all the attributes in the XML file. Each attribute is stored in a dictionary structure (i.e. an STL map), keyed by its name expanded to full depth, including its parent tag names.
Like this:
Key                              Value
Configuration.Data.strValue      ABC
Configuration.Data.intValue      222
Configuration.Data.boolValue     true

And so you can reference each attribute in your code through the STL map.
std::map<std::wstring, std::wstring> cfgmap;
...
std::wstring s = cfgmap[L"Configuration.Data.strValue"];
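
Since the dictionary stores everything as text, the data-type mapping has to happen at lookup time. Below is a small sketch of lookup helpers that do this; the CfgMap typedef and the three GetCfg functions are hypothetical names of mine, not part of the download. They use find() rather than operator[], because operator[] silently inserts an empty entry when the key is missing.

```cpp
#include <cwchar>
#include <map>
#include <string>

typedef std::map<std::wstring, std::wstring> CfgMap;

// Returns the stored text, or sDefault when the key is absent
inline std::wstring GetCfgString(const CfgMap& cfg, const std::wstring& sKey,
                                 const std::wstring& sDefault = L"")
{
   CfgMap::const_iterator it = cfg.find(sKey);
   return it != cfg.end() ? it->second : sDefault;
}

// Returns the value as an integer, or lDefault when absent or malformed
inline long GetCfgInt(const CfgMap& cfg, const std::wstring& sKey, long lDefault = 0)
{
   CfgMap::const_iterator it = cfg.find(sKey);
   if( it == cfg.end() ) return lDefault;
   wchar_t* pEnd = nullptr;
   long lValue = std::wcstol(it->second.c_str(), &pEnd, 10);
   // Reject empty values and values with trailing non-digit characters
   if( pEnd == it->second.c_str() || *pEnd != L'\0' ) return lDefault;
   return lValue;
}

// Accepts exactly "true" or "false"; anything else yields bDefault
inline bool GetCfgBool(const CfgMap& cfg, const std::wstring& sKey, bool bDefault = false)
{
   CfgMap::const_iterator it = cfg.find(sKey);
   if( it == cfg.end() ) return bDefault;
   if( it->second == L"true" ) return true;
   if( it->second == L"false" ) return false;
   return bDefault;
}
```

With these, reading the integer attribute from the sample file would be GetCfgInt(cfgmap, L"Configuration.Data.intValue"), and a missing or malformed entry falls back to the default instead of growing the map.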

The CXmlConfig class that can do this magic looks like this:
class CXmlConfig : public CXMLElementHandler
{
public:
   std::map<std::wstring, std::wstring> cfgmap;
   std::wstring sTempName;   // dotted path of the element currently being parsed

   void OnStartElement(LPCWSTR pszName, LPCWSTR* ppszAttribs)
   {
      // Descend: append this tag's name to the current path
      if( !sTempName.empty() ) sTempName += L".";
      sTempName += pszName;
      // Attributes arrive as name/value pairs, terminated by NULL
      while( *ppszAttribs != NULL ) {
         LPCWSTR pszAttrName = *ppszAttribs++;
         std::wstring sValue = *ppszAttribs++;
         std::wstring sKey = sTempName;
         sKey += L"."; sKey += pszAttrName;
         cfgmap[sKey] = sValue;
      }
   }

   void OnEndElement(LPCWSTR pszName)
   {
      // Ascend: strip the last path component; the root tag has no dot, so clear it
      size_t iPos = sTempName.find_last_of(L'.');
      if( iPos != std::wstring::npos ) sTempName.erase(iPos);
      else sTempName.clear();
   }
};
However, this class doesn't support lists.
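
If you do need lists with this approach, one possible extension is to number repeated tags so that each occurrence gets its own key. The sketch below is an untested idea rather than part of the download: the CIndexedPathBuilder class, the bracketed index syntax and the use of plain wchar_t pointers (instead of LPCWSTR) are all my assumptions, and note that every path component gets an index, so the keys differ from the ones shown in the table.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-alone handler producing keys like
// "Configuration[0].Item[1].a" for the second <Item> under the root.
class CIndexedPathBuilder
{
public:
   std::map<std::wstring, std::wstring> cfgmap;

   // ppszAttribs: name/value pairs terminated by a null pointer (assumed layout)
   void OnStartElement(const wchar_t* pszName, const wchar_t** ppszAttribs)
   {
      std::wstring sBase = aPath.empty() ? std::wstring() : aPath.back() + L".";
      sBase += pszName;
      // Number each occurrence of the same tag under the same parent
      int iIndex = aSeen[sBase]++;
      std::wstring sPath = sBase + L"[" + std::to_wstring(iIndex) + L"]";
      for( ; ppszAttribs[0] != nullptr && ppszAttribs[1] != nullptr; ppszAttribs += 2 ) {
         cfgmap[sPath + L"." + ppszAttribs[0]] = ppszAttribs[1];
      }
      aPath.push_back(sPath);
   }

   void OnEndElement(const wchar_t* /*pszName*/)
   {
      if( !aPath.empty() ) aPath.pop_back();
   }

private:
   std::vector<std::wstring> aPath;     // full indexed path of each open element
   std::map<std::wstring, int> aSeen;   // occurrences seen per un-indexed path
};
```

Keeping the open elements on a stack also avoids having to trim the path string when a tag closes.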

In the download below you can find samples for both the Expat and Xmllite based classes.

Source Code Dependencies

Microsoft Visual Studio.NET 2008

Download Files

Download Source Code (98 Kb)
