Introduction to XML


The Extensible Markup Language


Overview of Files

Consider the following list:


When you create this type of list in a programming environment of your choice and save it to a file, that file cannot be opened in just any application. You must create and distribute your own application to anyone who wants to open a file it produced. It would be more useful to create a file with any application of your choice and be able to open it with another application, even one you know nothing about.


Introduction to XML

The Extensible Markup Language, or XML, is a technique of using a document, such as a text file, to describe information and make that information available to whatever and whoever can take advantage of it. The description is done so the document can be created by one person or company and used by another person or another company without having to know who first created the document. This is because the document thus created is not a program, it is not an application: it is just a text-based document.

Because XML is very flexible, it can be used in regular Windows applications, in databases, in web-based systems, in communication applications, in computer networks, in scientific applications, etc. To make sure that XML can be universally used without one person or group owning it, it is standardized by the W3C (http://www.w3c.org). Each version of XML is published as a W3C Recommendation document.

To create an XML file, in the document, you type units of code using normal characters of the English language. The XML document is made of units called entities. These entities are spread on various lines of the document as you judge them necessary and as we will learn. XML has strict rules as to how the contents of the document should or must be structured.

After an XML document has been created and is available, in order to use it, you need a program that can read, analyze, and interpret it. This program is called a parser. The most popular parser used in Microsoft Windows applications is MSXML, published by Microsoft. Before using it, you must install it on your computer. This is taken care of when you install CodeGear C++Builder.

Practical Learning: Introducing XML

  1. Start C++Builder and create a VCL Forms Application
  2. To save it, click the Save All button
  3. Click the Create New Folder button
  4. Type Exercise02 and press Enter twice to display it in the Save In combo box
  5. Change the name of the unit to Exercise and click Save
  6. Set the project name to Exercise2 and click Save


A markup is an instruction that defines XML. The fundamental formula of a markup is:

<tag>

The left angle bracket "<" and the right angle bracket ">" are required. Inside of these symbols, you type a word or a group of words of your choice, using regular characters of the English alphabet and sometimes non-alphabetic characters such as ?, !, or [. The combination of the left angle bracket "<", the right angle bracket ">", and what is inside of these symbols is called a markup. There are various types of markups we will learn.

The XML Declaration

XML is released in versions. Because various versions exist, the first line of an XML file should specify the version of XML you are using. At the time of this writing, the widely supported version is 1.0. When creating an XML file, you should (should in 1.0 but must in 1.1) specify which version your file conforms to, especially if you are using a version higher than 1.0. For this reason, an XML file should start (again, must, in 1.1) with a line known as an XML declaration. It starts with <?xml version=, followed by the version you are using, assigned as a string, and followed by ?>. An example of such a line is:

<?xml version="1.0"?>

By default, an XML file created using C++Builder specifies the version as 1.0. Under the XML declaration line, you can then create the necessary tags of the XML file.

Encoding Declaration

As mentioned already, the tags are created using characters of the alphabet. How those characters are stored as bytes is called the encoding, and the part of the XML declaration that names it is known as the encoding declaration. For example, most of the characters used in US English belong to the ASCII set, which uses 7 bits per character. UTF-8 is an encoding of Unicode that stores each ASCII character in a single byte and uses two to four bytes for the other characters. There are other encodings such as UTF-16, which uses 2 bytes for most characters and 4 bytes for the rest.
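To make the difference between ASCII and UTF-8 more concrete, here is a minimal sketch in standard C++ (not part of the VCL) that reports how many bytes UTF-8 uses for a given Unicode code point. The function name utf8ByteCount is hypothetical and chosen only for this illustration:

```cpp
// Returns how many bytes UTF-8 needs to store the given Unicode code point,
// or 0 when the value lies outside the Unicode range.
// Surrogate values are not treated specially in this sketch.
int utf8ByteCount(unsigned long codePoint)
{
    if (codePoint < 0x80)      return 1;   // ASCII range: 7 bits, one byte
    if (codePoint < 0x800)     return 2;
    if (codePoint < 0x10000)   return 3;
    if (codePoint <= 0x10FFFF) return 4;
    return 0;
}
```

For example, the letter A (code point 0x41) takes one byte, while the euro sign (code point 0x20AC) takes three.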

To specify the encoding you are using, type encoding followed by the encoding scheme you are using, which must be assigned as a string. The encoding is specified in the first line. Here is an example:

<?xml version="1.0" encoding="utf-8"?>
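If you build the XML code as a string in your program, the declaration line can be assembled from its two parts. The following is a small sketch in standard C++; the function name makeXmlDeclaration is an assumption made for this example and is not part of the VCL:

```cpp
#include <string>

// Builds an XML declaration line such as <?xml version="1.0" encoding="utf-8"?>.
// The encoding attribute is left out when the encoding argument is empty.
std::string makeXmlDeclaration(const std::string &version,
                               const std::string &encoding = "")
{
    std::string decl = "<?xml version=\"" + version + "\"";
    if (!encoding.empty())
        decl += " encoding=\"" + encoding + "\"";
    decl += "?>";
    return decl;
}
```

Calling makeXmlDeclaration("1.0", "utf-8") produces the same line as the example above.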

Creating an XML File

Due to the high level of support of XML, there are various ways you can create an XML file. For example, you can use a simple text editor such as Notepad. An XML file is first of all a normal text-based document that has a .xml extension. Therefore, however you create it, it must specify that extension.

Many other applications allow creating an XML file or generating one from an existing file. There are also commercial editors you can get or purchase to create an XML file.

Practical Learning: Creating an XML File

  1. On the main menu, click File -> New -> Other...
  2. In the Items Category list, click Web Documents
  3. Click XML File and click OK.
    Notice that the file already contains the encoding

Introduction to the Document Object Model

When you create an XML file, there are standard rules you should (must) follow in order to have a valid document. The rules for the file itself are defined by the W3C XML Recommendation, and the W3C Document Object Model (DOM) defines a standard way for a program to access the content of the document. To support these standards, the VCL provides the TXMLDocument class. This class allows you to create an XML document, to populate it with the desired content, and to perform many other related operations on the contents of the file.

In the Tool Palette, the Document Object Model is represented by the TXMLDocument control. Therefore, to initiate support for XML in your application, from the Internet section of the Tool Palette, you can click TXMLDocument and click the form. This would add a TXMLDocument variable to the header file of the form:


#ifndef Unit1H
#define Unit1H
#include <Classes.hpp>
#include <Controls.hpp>
#include <StdCtrls.hpp>
#include <Forms.hpp>
#include <XMLDoc.hpp>	// declares TXMLDocument
class TForm1 : public TForm
{
__published:	// IDE-managed Components
	TXMLDocument *XMLDocument1;
private:	// User declarations
public:		// User declarations
	__fastcall TForm1(TComponent* Owner);
};
extern PACKAGE TForm1 *Form1;
#endif

Of course, you can (should) change the name of the variable in the Object Inspector.

To programmatically initiate XML, declare a variable of type TXMLDocument. The TXMLDocument class is created as follows:

class TXMLDocument : public TComponent, 
		     public IInterface, 
		     public IXMLDocument, 
		     public IXMLDocumentAccess;

As you can see, the TXMLDocument class inherits the characteristics of VCL objects from TComponent and implements an interface named IXMLDocument. The TXMLDocument class is defined in the XMLDoc.hpp header file, which means you should include it in your file. Here is an example:


#include <vcl.h>
#include <XMLDoc.hpp>
#pragma hdrstop

#include "Exercise.h"
#pragma package(smart_init)
#pragma resource "*.dfm"
TForm1 *Form1;
__fastcall TForm1::TForm1(TComponent* Owner)
	: TForm(Owner)
{
}
void __fastcall TForm1::btnDocumentClick(TObject *Sender)
{
    TXMLDocument *docMusic;
}

After using a TXMLDocument variable, you must reclaim the memory it was using. To take care of this, when declaring the variable, you must indicate what container owns it, so that that container would get rid of the XML document when it is deleted. To support this, the TXMLDocument is equipped with two constructors and one of them uses the following syntax:

TXMLDocument(TComponent* AOwner);

This constructor allows you to specify what control owns your TXMLDocument variable. In a typical application, this container would be a frame or a form. Here is an example:

void __fastcall TfrmMain::btnDocumentClick(TObject *Sender)
{
    TXMLDocument *docMusic = new TXMLDocument(frmMain);
}

Of course, you can specify the this pointer as the owner.

As mentioned already, in order to use XML, you must specify the parser you will use. When you install C++Builder, it sets up a few parsers. If you add a TXMLDocument object to your form, it selects MSXML as the default. If you want to use another, in the Object Inspector, access the DOMVendor property and select the desired one:

DOM Vendor

Writing XML Code


As mentioned previously, you can use XML created by someone else or you can create your own. XML is created as a normal text document. Therefore, to create code for XML, you can use any regular text editor such as Notepad. You can start an XML file in C++Builder (File -> New -> Other..., Web Documents -> XML File) and add the necessary code to it.

Writing XML Code Using TXMLDocument

The content of an XML document is considered a (long) string with different sections. To support this concept of a (single) string, the TXMLDocument class is equipped with a property named XML that is a collection of strings:

__property TStrings* XML;

To use this property, you can create a TStrings list, add the necessary strings to it, and then assign that collection to the TXMLDocument::XML property. If you add a TXMLDocument object to your application, to create its code, you can access its Object Inspector, and click the ellipsis button of the XML field. This would open the String List Editor dialog box with an empty section where you can write your XML code:

String List Editor

Of course, if you think the area in the String List Editor is not big enough, you can click the Code Editor button to open a wider window.

Besides the XML property, the TXMLDocument class is equipped with a method named LoadFromXML() that is overloaded with two versions. Their syntaxes are:

LoadFromXML(const string XML);
LoadFromXML(const DOMString XML);

The first version takes a string as argument. The second version takes a DOMString value as argument. The TXMLDocument::LoadFromXML() method allows you to specify the code of an XML document. The code can be written directly as the argument. You can also first declare and initialize an AnsiString variable with the XML code, then pass it as argument to the method.

Opening an XML File



You will create most (or all) of the files we will use in our lessons. Still, sometimes you will encounter an XML created by someone else. Regardless of how you get an XML file, you have various options to open it to see its content. The easiest way to open an XML file is to use a text editor, such as Notepad. The Code Editor is another candidate. To open an XML file, on the main menu, you can click File -> Open ..., locate the file, and click Open.

Unless you programmatically and temporarily create XML code that is not saved, XML is usually used as a regular computer file, with a name. To support this concept, the TXMLDocument class is equipped with a string property named FileName. If you know the file you want to use, after adding a TXMLDocument object to a form or container, to specify the name of the file or its path, in the Object Inspector, access the FileName field and enter the file name or the complete path. Here is an example:

File Name

You can also click the ellipsis button. This would display the Open XML Document dialog box that allows you to select the desired file. Here is an example:

Open XML Document

To programmatically specify the file, after declaring your TXMLDocument variable, assign its name or its (complete) path to the TXMLDocument::FileName property. Here is an example:

void __fastcall TfrmMain::btnDocumentClick(TObject *Sender)
{
    TXMLDocument *docMusic = new TXMLDocument(this);

    docMusic->FileName = L"E:\\Programs\\music.xml";
}

Instead of using the FileName property, you can specify the name or path of the file you want to use when creating the XML document. To support this, the TXMLDocument class is equipped with another constructor whose syntax is:

TXMLDocument(const DOMString AFileName);

Here is an example of creating an XML document object using this constructor:

void __fastcall TfrmMain::btnDocumentClick(TObject *Sender)
{
    TXMLDocument *docMusic = new TXMLDocument(L"E:\\Programs\\music.xml");
}

After doing any of these, you can use the XML document as you see fit.

An XML File in a Browser

Another way you can open an XML file is to display it in a browser. To do this, if you see the file in Windows Explorer or in My Documents, you can double-click it. Here is an example:

Programmatically Opening an XML File Using the DOM

At times, you will need to programmatically access an XML file. To support this operation, the TXMLDocument class provides the LoadFromFile() method. Its syntax is:

LoadFromFile(const DOMString AFileName = '');

This method takes as argument the name or path of the file. Here is an example of calling it:

void __fastcall TfrmMain::btnDocumentClick(TObject *Sender)
{
    TXMLDocument *docVideos = new TXMLDocument(frmMain);

    // The file name is an assumption; no folder is given
    docVideos->LoadFromFile(L"Videos.xml");
}

In this case, the application would look for the file in the folder of the current application. You can also provide a complete path to the file. Either way, if the file is not found, the method would throw an EDOMParseError exception. For this reason, it is prudent to first check that the file exists before opening it.
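The check mentioned above can be sketched in standard C++ as follows. The function name fileCanBeOpened is hypothetical; in a VCL application, the FileExists() function of the SysUtils unit is another way to perform a similar test:

```cpp
#include <fstream>
#include <string>

// Returns true when the file at the given path can be opened for reading.
bool fileCanBeOpened(const std::string &path)
{
    std::ifstream file(path.c_str());
    return file.good();
}
```

You would call such a function before LoadFromFile() and load the document only when it returns true.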

You can also use a TStream-based object to identify the file. Once the object is ready, you can use the following method to open it:

LoadFromStream(TStream* const Stream,
	       TXMLEncodingType EncodingType = xetUnknown);

This method expects a TStream type of object.

Programmatically Reading an XML File

Many of the XML files you encounter will have been created by someone else. Still, because it is primarily a text document, you are expected to be able to read any XML file and figure out its content. As mentioned already, you can open an XML file using a text editor such as Notepad. After opening the file, you can check the document declaration, then move to other items.

Another way you can explore an XML file consists of programmatically reading it. This is also referred to as parsing (the parser parses the document). 

Saving an XML File



As mentioned already, to create XML code, you can use any text editor such as Notepad. After writing the code, you can save it. When saving it, you can include the name of the file in double-quotes:

Save As

You can also first set the Save As Type combo box to All Files and then enter the name of the file with the .xml extension.

Practical Learning: Saving an XML File

  1. To save the file, on the Standard toolbar, click the Save button
  2. Set the name of the file to students.xml
  3. Click Save

Saving a DOM Object

If you call the TXMLDocument::LoadFromXML() method, only the XML code is created, not the file. To actually create the Windows file, you can call the SaveToFile() method. Its syntax is:

SaveToFile(const DOMString AFileName = '');

The argument can be a valid filename and, if so, should include the .xml extension. You can create the file in the folder of the current project or in another folder by passing the complete path. The argument is optional. If you do not pass it, the method uses the name specified in the FileName property.

XML Well-Formed


Tag Creation

Earlier, we mentioned that XML worked through markups. A simple markup is made of a tag created between the left angle bracket "<" and the right angle bracket ">". Just creating a markup is not particularly significant. You must give it meaning. To do this, you can type a number, a date, or a string on the right side of the right angle bracket ">" symbol. The text on the right side of ">" is referred to as the item's text. It is also called a value.

After specifying the value of the markup, you must close it: this is a rule not enforced in HTML but must be respected in XML to make it "well-formed". To close a tag, use the same formula of creating a tag with the left angle bracket "<", the tag, and the right angle bracket ">" except that, between < and the tag, you must type a forward slash. The formula to use is:

<tag>some value</tag>

The item on the left side of the "some value" string, in this case <tag>, is called the opening or start-tag. The item on the right side of the "some value" string, in this case </tag>, is called the closing or end-tag. Like <tag>, </tag> is also called a markup.

With XML, you create your own tags with custom names. This means that a typical XML file is made of various items. Here is an example:

<title>The Distinguished Gentleman</title>
<director>Jonathan Lynn</director>
<length>112 Minutes</length>

Tag Names

When creating your tags, there are various rules you must observe with regard to their names. Unlike HTML, XML is very restrictive with its rules. For example, unlike HTML but like C/C++/C#, XML is case-sensitive. This means that CASE, Case, and case are three different words. Therefore, from now on, you must pay close attention to what you write inside of the < and the > delimiters.

Besides case sensitivity, there are some rules you must observe when naming the tags of your markups:

  • The name of a tag must be in one word, no space in the name
  • The name must start with an alphabetic letter or an underscore - Examples are <Country> or <_salary>
  • The first letter or underscore that starts a name can be followed by:
    • Letters - Example: <OperatingSystem>
    • Digits - Example: <L153>
    • Hyphens - Example: <TV-Rating>
    • Underscores - Example: <Chief_Accountant>
  • The name of a tag cannot start with xml, XML or any combination of X (uppercase or lowercase), followed by M (uppercase or lowercase), and followed by L (uppercase or lowercase)
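The naming rules listed above can be checked with a few lines of standard C++. This is only a sketch that covers the rules of this list for ASCII characters (the XML Recommendation accepts more characters than these); the function name isValidTagName is an assumption made for this example:

```cpp
#include <cctype>
#include <string>

// Returns true when name follows the tag-naming rules listed above.
bool isValidTagName(const std::string &name)
{
    if (name.empty())
        return false;

    // Rule: the first character is a letter or an underscore.
    unsigned char first = static_cast<unsigned char>(name[0]);
    if (!std::isalpha(first) && first != '_')
        return false;

    // Rule: the other characters are letters, digits, hyphens, or underscores
    // (which also rules out spaces).
    for (std::string::size_type i = 1; i < name.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(name[i]);
        if (!std::isalnum(c) && c != '-' && c != '_')
            return false;
    }

    // Rule: the name does not start with "xml" in any letter case.
    if (name.size() >= 3 &&
        std::tolower(static_cast<unsigned char>(name[0])) == 'x' &&
        std::tolower(static_cast<unsigned char>(name[1])) == 'm' &&
        std::tolower(static_cast<unsigned char>(name[2])) == 'l')
        return false;

    return true;
}
```

With this function, names such as Country, _salary, or TV-Rating are accepted, while hourly salary (it contains a space) and XmlData (it starts with xml) are rejected.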

In our lessons, here are the rules we will apply:

  • Sometimes a name will be made of lowercase only
  • Sometimes a name will start in uppercase (most of the time) or lowercase
  • When a name is a combination of words, such as [hourly salary], we will start each part in uppercase. Examples will be HourlySalary or DateOfBirth

In future sections, we will learn that, with some markups, you can include non-alphanumeric characters between the angle brackets. In fact, you will need to pay close attention to the symbols you type in a markup. We will also see how some characters have special meaning.

The Root

Every XML document must have one particular tag that either is the only tag in the file or acts as the parent of all the other tags of the same document. This tag is called the root. Here is an example of a file that has only one tag:

<rectangle>A rectangle is a shape with 4 sides and 4 straight angles</rectangle>

This would produce:


If there is more than one tag in the XML file, one of them must serve as the parent or root. Otherwise, you would receive an error. Based on this rule, the following XML code is not valid:

<rectangle>A rectangle is a shape with 4 sides and 4 straight angles</rectangle>
<square>A square is a rectangle whose 4 sides are equal</square>

This would produce:

An ill-formed XML file in a Browser

To correct this type of error, you can change one of the existing tags to act as the root (or as the parent). In the following example, the <rectangle> tag acts as the parent of <square>:

<rectangle>A rectangle is a shape with 4 sides and 4 straight angles
<square>A square is a rectangle whose 4 sides are equal</square></rectangle>

This would produce:

Good Nested Tags

Alternatively, you can create a tag that acts as the parent for the other tags. In the following example, the <geometry> tag acts as the parent of the <rectangle> and of the <square> tags:

<geometry><rectangle>A rectangle is a shape with 4 sides and 4 straight angles
</rectangle><square>A square is a rectangle whose 4 sides are equal</square></geometry>

This would produce:


As mentioned already, a good XML file should have an XML declaration:

<?xml version="1.0" encoding="utf-8"?><geometry><rectangle>A rectangle 
is a shape with 4 sides and 4 straight angles</rectangle><square>A 
square is a rectangle whose 4 sides are equal</square></geometry>

To give you access to the root of an XML file, the TXMLDocument class is equipped with the DocumentElement property.

Practical Learning: Creating the Root Tag

  1. In the students.xml file, click under the top line and type <students></students>
  2. Save the file

The Structure of an XML Tag


Empty Tags

We mentioned that, unlike HTML, every XML tag must be closed. We also saw that the value of a tag was specified on the right side of the right angle bracket of the start tag. In some cases, you will create a tag that doesn't have a value or, perhaps for some reason, you don't provide a value to it. Here is an example:

<dinner></dinner>

This type of tag is called an empty tag. Since there is no value in it, you may not need to provide an end tag but it still must be closed. Although this writing is allowed, an alternative is to close the start tag itself. To do this, between the tag name and the right angle bracket, type an empty space followed by a forward slash. Based on this, the above line can be written as follows:

<dinner />

Both produce the same result or accomplish the same role.
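If you generate tags from a program, the choice between the two forms can be made automatically. Here is a minimal sketch in standard C++; the function name makeElement is hypothetical, and the sketch does not escape characters that have a special meaning in XML:

```cpp
#include <string>

// Produces <name>value</name>, or the self-closed form <name /> when
// there is no value.
std::string makeElement(const std::string &name, const std::string &value = "")
{
    if (value.empty())
        return "<" + name + " />";
    return "<" + name + ">" + value + "</" + name + ">";
}
```

Calling makeElement("dinner") produces the empty tag shown above, while makeElement("director", "Jonathan Lynn") produces a complete tag with its start-tag, value, and end-tag.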

Practical Learning: Creating Empty Tags

  1. To create a tag, change the students.xml file as follows:
    <?xml version="1.0" encoding="utf-8"?>
  2. Save the file

White Spaces

In the above example, we typed various items on the same line. If you are creating a long XML document, although creating various items on the same line is acceptable, this technique can make it (very) difficult to read. One way you can solve this problem is to separate tags with empty spaces. Here is an example:

<title>The Distinguished Gentleman</title> 
	<director>Jonathan Lynn</director>
		<length>112 Minutes</length>

Yet a better solution consists of typing each item on its own line. This would make the document easier to read. Here is an example:

<title>The Distinguished Gentleman</title>
<director>Jonathan Lynn</director>
<length>112 Minutes</length>

All these are possible and acceptable because the XML parser does not treat the empty spaces or line breaks between tags as part of the structure of the document. Therefore, to make your code easier to read, you can use empty spaces, carriage-return-line-feed combinations, or tabs inserted in various sections. All these are referred to as white spaces.

Nesting Tags

Most XML files contain more than one tag. We saw that a tag must have a starting point and a tag must be closed. One tag can be included in another tag: this is referred to as nesting. A tag that is created inside of another tag is said to be nested. A tag that contains another tag is said to be nesting. Consider the following example:

<Smile>Please smile to the camera</Smile>
<English>Welcome to our XML Class</English>
<French>Bienvenue à notre Classe XML</French>

In this example, you may want the English tag to be nested in the Smile tag. To nest one tag inside of another, you must type the nested tag before the end-tag of the nesting tag. For example, if you want to nest the English tag in the Smile tag, you must type the whole English tag before the </Smile> end tag. Here is an example:

<Smile>Please smile to the camera<English>Welcome to our XML Class</English></Smile>

To make this code easier to read, you can use white spaces as follows:

<Smile>Please smile to the camera
    <English>Welcome to our XML Class</English>
</Smile>

When a tag is nested, it must also be closed before its nesting tag is closed. Based on this rule, the following code is not valid:

<Smile>Please smile to the camera
<English>Welcome to our XML Class
</Smile>
</English>

The rule broken here is that the English tag that is nested in the Smile tag is not closed inside the Smile tag but outside.
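The nesting rule can be verified by remembering which tags are still open and making sure each end-tag closes the most recently opened one. The following is a simplified sketch in standard C++, not what a real parser such as MSXML does: it ignores comments, CDATA sections, and any > character inside an attribute value. The function name isProperlyNested is an assumption made for this example:

```cpp
#include <string>
#include <vector>

// Returns true when every start-tag in xml is closed by a matching end-tag,
// in the reverse order of opening.
bool isProperlyNested(const std::string &xml)
{
    std::vector<std::string> open;   // names of the tags still waiting for an end-tag
    std::string::size_type pos = 0;

    while ((pos = xml.find('<', pos)) != std::string::npos) {
        std::string::size_type end = xml.find('>', pos);
        if (end == std::string::npos)
            return false;                       // a '<' without its '>'

        std::string tag = xml.substr(pos + 1, end - pos - 1);
        pos = end + 1;

        if (tag.empty())
            return false;                       // "<>" is not a tag
        if (tag[0] == '?' || tag[0] == '!')
            continue;                           // declaration or similar markup
        if (tag[tag.size() - 1] == '/')
            continue;                           // empty tag such as <dinner />

        if (tag[0] == '/') {                    // end-tag
            std::string name = tag.substr(1);
            if (open.empty() || open.back() != name)
                return false;                   // closes the wrong tag
            open.pop_back();
        } else {                                // start-tag
            // keep only the name, dropping anything after the first space
            open.push_back(tag.substr(0, tag.find(' ')));
        }
    }
    return open.empty();                        // nothing may remain open
}
```

Because the names are compared exactly, a start-tag written <smile> is not closed by </Smile>, which matches the case sensitivity of XML.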

Once you have decided on the structure of your XML file, we saw that you can create it in memory using the LoadFromXML() method. For example, the following XML code:

<?xml version="1.0" encoding="utf-8"?>
<music>
    <album>
	<artist>Vincent Nguini</artist>
	<publisher>Mesa Records</publisher>
    </album>
    <album>
	<title>None</title>
	<artist>Debbie Gibson</artist>
    </album>
</music>

can be created in memory as follows:

void __fastcall TForm1::btnDocumentClick(TObject *Sender)
{
    TXMLDocument *docMusic = new TXMLDocument(this);
    // The root name <music> is an assumption
    const AnsiString strXML = L"<?xml version=\"1.0\" encoding=\"utf-8\"?>"
			      L"<music><album><artist>Vincent Nguini</artist>"
			      L"<publisher>Mesa Records</publisher></album>"
			      L"<album><title>None</title><artist>Debbie Gibson</artist></album></music>";
    docMusic->LoadFromXML(strXML);
}

Notice that the whole XML code can be created as one line of text and the code would be valid.

Practical Learning: Creating XML

  1. To apply the concept of nesting XML tags, change the students.xml file as follows:
    <?xml version="1.0" encoding="utf-8"?>
  2. Save the file

An XML Node



Consider the following example of an XML file named Videos.xml:

<?xml version="1.0" encoding="utf-8" ?>
<videos>
    <video>
        <title>The Distinguished Gentleman</title>
        <director>Jonathan Lynn</director>
        <length>112 Minutes</length>
    </video>
    <video>
        <title>Her Alibi</title>
        <director>Bruce Beresford</director>
        <length>94 Mins</length>
    </video>
    <video>
        <title>Chalte Chalte</title>
        <director>Aziz Mirza</director>
        <length>145 Mins</length>
    </video>
</videos>

An XML file appears as an upside-down tree: it has a root (in this case <videos>), it can have branches (an example in this case is <video>), and it can have leaves (an example in this case is <title>). As we have seen so far, all of these objects are created using the same technique: a tag with a name (such as <title>) and an optional value. Based on their similarities, each of these objects is called a node.
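The tree described above can be modeled with a very small structure in standard C++. This is only an illustration of the idea of a node; it is not how the VCL represents nodes, and the names Node and countLeaves are assumptions made for this example:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A minimal stand-in for an XML node: a name, an optional value, and child nodes.
struct Node {
    std::string name;
    std::string value;
    std::vector<Node> children;
};

// Counts the leaves (nodes without children) found in node and below it.
std::size_t countLeaves(const Node &node)
{
    if (node.children.empty())
        return 1;                       // a node without children is a leaf

    std::size_t total = 0;
    for (std::size_t i = 0; i < node.children.size(); ++i)
        total += countLeaves(node.children[i]);
    return total;
}
```

In this model, the root, the branches, and the leaves are all objects of the same type, which is the reason they can share the single name of node.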

A Class for a Node

To support nodes of an XML file, the VCL provides the TXMLNode class. To get a node, you must have an object that would produce one and you can only retrieve a node from an (existing) object.

The TXMLNode class starts as follows:

class TXMLNode : public TInterfacedObject, 
		 public IXMLNode,
		 public IXMLNodeAccess;

Notice that this class implements an interface named IXMLNode. That is where it gets all of its functionality.

The Document Element of a Node

In previous sections, we saw that every XML file must have a root. To support the root node, the TXMLDocument is equipped with a property named DocumentElement. This property is declared as follows:

__property IXMLNode DocumentElement;

This property is of type IXMLNode. Here is an example of getting it:

void __fastcall TfrmMain::btnDocumentClick(TObject *Sender)
{
    TXMLDocument *docVideos = new TXMLDocument(frmMain);

    docVideos->LoadFromFile(L"Videos.xml");	// assumed: the Videos.xml file shown earlier
    IXMLNode *nodVideo = docVideos->DocumentElement;
}



Copyright © 2008-2009 FunctionX, Inc.