XML: About

XCG is a command-line utility for transforming the structure of an XML instance into some other kind of structure. I created it to help generate wrappers for the several dozen different XML formats I was working with. Because of that need, the first version generated C++ wrapper code around libxml2, which after a lot of research appeared to be the best solution for my XML reading and writing needs.

Now it is much more powerful than what I just described. Instead of only generating C++ wrapper code, it generates output based on the contents of its configuration, which is itself contained in XML. In fact, it uses itself to generate the code that reads the XML configuration file (well, not directly, but it's the same end result). It could be made self-hosting simply by embedding a copy of the default XML configuration in the executable; I chose to keep it external to make it easier for others to modify the default code generation.

I suppose I should explain why I didn't use a DTD or schema as the basis of the generation. The problem is, I don't really have what I consider solid reasons. For most of the XML data I was involved with, there either wasn't a DTD or schema, or if there was, it was out of date. I suppose I could have made creating and/or updating the DTDs or schemas a requirement, but in the end I found myself creating a lot of last-minute simple XML instances with all kinds of short little formats for things that would never be used again (look at example3 for a real-world example). Yes, I could have used XSL to do some or most of these translations. However, I think what made me choose this method was a library named TinyXML, which has a pretty faithful following because it is so easy to use (alas, it has a few issues that make it unusable for me). I wanted something that was just as easy to use, but much more flexible. This is where I ended up, and I am actually very happy with the result! If one is familiar with the STL, then the default code generation is pretty straightforward to use; because all elements are wrapped in objects, and those objects don't allow direct access to any data, it is actually highly maintainable as well. As always, there are a few things I still want from it, but it's a great start.
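To give a rough idea of what "elements wrapped in objects with no direct access to data" can mean in practice, here is a minimal sketch. It is not XCG's actual generated output: the `Book` and `Library` classes, the `title` attribute, and every member name are hypothetical, and the libxml2 back end is left out entirely so the example stays self-contained.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper for a <book title="..."/> element.
// Illustrative only; not the code XCG actually generates.
class Book {
public:
    explicit Book(std::string title) : title_(std::move(title)) {}

    // The attribute is reachable only through member functions.
    const std::string& title() const { return title_; }
    void setTitle(const std::string& value) { title_ = value; }

private:
    std::string title_;  // never exposed directly
};

// Hypothetical wrapper for a <library> element holding <book> children.
class Library {
public:
    typedef std::vector<Book>::const_iterator const_iterator;

    // STL-style read access to the child elements.
    const_iterator begin() const { return books_.begin(); }
    const_iterator end() const { return books_.end(); }
    std::size_t size() const { return books_.size(); }

    void addBook(const Book& book) { books_.push_back(book); }

private:
    std::vector<Book> books_;  // storage stays hidden behind the interface
};
```

Because callers only ever see `begin()`, `end()`, and the accessors, the storage behind such a class could change without touching the calling code.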

Because it is driven by a configuration file (called a "template"), there are very few limitations on what XCG can generate. So far, the limitations I have encountered are more annoyances than limitations: issues I got around by making significant modifications to my template, and which a few extra features will eventually make much easier to deal with.

The following is a list of the major features (or functionality) provided by XCG in its current form:

Output is completely configuration-defined: there is no requirement to output source code, so it can literally be anything desired!
Any number of files can be generated, as all generated files are defined by the template. For example, if one class per element is created, each class can go in a separate file, or all of them can go in a single file (like the default generation).
Unlimited iterations through elements, both at file and sub-file scope (i.e. one could iterate over elements, and within the generated text of that iteration, iterate over them again, recursively). Depth is limited by machine hardware, not XCG itself (just don't forget to make it end somewhere!)
Unlimited iterations of children of an element within an element iteration (without an element, how can there be child elements?)
Unlimited iteration of attributes of an element within an element iteration (without an element, what are attributes?)
One can reference the name of the current file, element, attribute, child element, or the root element within the generated text.
In addition to the previous item, there are some built-in transformations that can be applied to the file, element, attribute, child element, and root element names:
- Capitalize first letter
- Capitalize entire name
- Perform name replacements (for example, replacing words reserved within the target language)
- Pluralize
- Replace invalid characters with valid characters (or strings)
- Remove invalid characters and capitalize next char (for example, make "foo-bar" into "fooBar")
For pluralization, the template can provide verbatim pluralized forms of words. For example, pluralizing "child" isn't just a matter of adding an "s", so one could provide "children" as the pluralized form of "child".
A list of invalid characters and their replacements can be specified in the template.
A list of invalid words and their replacements can be specified in the template.
Eventually, all transformation lists will be able to specify their context. For example, maybe a word replacement is only valid while iterating through elements.
Isolates your XML file I/O from the underlying technology. If you later decide to change the technology behind your XML support, it's simply a matter of updating the template to generate the exact same external interface while using the new technology internally. For example, the default generation currently targets libxml2. However, if I decided to switch to the Apache XML OpenSource library, I could modify the template to use a different back end (the Apache XML library) but keep the external interface the same. It's then simply a matter of a recompile to change the back-end technology from libxml2 to Apache XML!
Abilities available in 1.1.0 or later:
- <element> allows user defined meta data.
- Template and element specific information are now contained in separate files. This makes it easy to use the same template code for multiple projects, as the data that is specific to a project is in a separate file. For example, all <template> elements are contained within a template file, but the new <element> elements are only in the configuration file. Additionally, the <replacement>, <invalid>, and <plural> elements can exist in either or both files.
- <condition> elements allow custom conditionals. All standard conditions can be used, and the values of the new meta data can be queried as well.
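
The built-in name transformations listed above can be sketched roughly as follows. This is only an illustration of the described behavior, not XCG's implementation: the function names, the use of `std::map` for the replacement and plural lists, and the "append an s" fallback are all assumptions made for the example.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: upper-case one character safely.
static char upperChar(char c) {
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Capitalize first letter: "book" becomes "Book".
std::string capitalizeFirst(std::string name) {
    if (!name.empty()) name[0] = upperChar(name[0]);
    return name;
}

// Capitalize entire name: "book" becomes "BOOK".
std::string capitalizeAll(std::string name) {
    for (char& c : name) c = upperChar(c);
    return name;
}

// Name replacement, e.g. for words reserved in the target language.
std::string replaceWord(const std::string& name,
                        const std::map<std::string, std::string>& words) {
    auto it = words.find(name);
    return it == words.end() ? name : it->second;
}

// Pluralize: prefer a verbatim form from the list; otherwise (assumed
// fallback) append "s".
std::string pluralize(const std::string& name,
                      const std::map<std::string, std::string>& plurals) {
    auto it = plurals.find(name);
    return it == plurals.end() ? name + "s" : it->second;
}

// Replace each invalid character with its configured replacement string.
std::string replaceInvalid(const std::string& name,
                           const std::map<char, std::string>& invalid) {
    std::string out;
    for (char c : name) {
        auto it = invalid.find(c);
        if (it == invalid.end()) out += c;
        else out += it->second;
    }
    return out;
}

// Remove invalid characters and capitalize the character that follows.
std::string removeInvalidAndCapitalize(const std::string& name,
                                       const std::string& invalidChars) {
    std::string out;
    bool capitalizeNext = false;
    for (char c : name) {
        if (invalidChars.find(c) != std::string::npos) {
            capitalizeNext = true;  // drop this character
            continue;
        }
        out += capitalizeNext ? upperChar(c) : c;
        capitalizeNext = false;
    }
    return out;
}
```

With "-" given as the invalid character, `removeInvalidAndCapitalize("foo-bar", "-")` yields "fooBar", matching the example in the list, and a plural list mapping "child" to "children" makes `pluralize` return the verbatim form instead of "childs".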