
The Elements of an XML Document

 

Elements Fundamentals

 

Introduction

An element in an XML document is an object that begins with a start-tag, may contain a value, and ends with a matching end-tag. The combination of the start-tag, the value, and the end-tag is called an element. (An element with no content can also be written as a single self-closing tag.) An element can be more than that but, for now, we will consider that an element is primarily characterized by a name and possibly a value.

 

The Name of an Element

The name of an element is the string that represents the tag. For example, in <Director>, the word Director is the name of the element. An element must have at least a start-tag. All of the tags we have seen so far were created as elements. When creating your elements, remember to follow the rules we defined for names.

The Text or Value of an Element

The value of an element is the text located between its start-tag and its end-tag. It is also called the text of the element. In the case of <director>Jonathan Lynn</director>, the "Jonathan Lynn" string is the value of the director element.

While the value of one element can be a number, the value of another element can be a date. Yet another element can use a regular string as its value. Consider the following example:

<?xml version="1.0" encoding="utf-8"?>
<videos>
    <video>
	<title>The Distinguished Gentleman</title>
	<director>Jonathan Lynn</director>
	<LengthInMinutes>112</LengthInMinutes>
	<format>DVD</format>
	<rating>R</rating>
	<price>14.95</price>
    </video>
    <video>
	<title>Her Alibi</title>
	<director>Bruce Beresford</director>
	<LengthInMinutes>94</LengthInMinutes>
	<format>VHS</format>
	<rating>PG-13</rating>
	<price>9.95</price>
    </video>
</videos>

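Notice that the price elements contain numbers that look like currency values and the LengthInMinutes elements use integers as their values. To XML itself, all of these values are plain text; it is up to the application that reads the document to interpret them as numbers, dates, or strings.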
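
As a minimal sketch of how a program might read these names and values, the following uses Python's standard xml.etree.ElementTree module. The choice of Python and of this parser is an assumption made for illustration; the lesson itself does not prescribe one. The sample is shortened to one video.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the videos example above (one video only).
VIDEOS_XML = """<videos>
    <video>
        <title>The Distinguished Gentleman</title>
        <director>Jonathan Lynn</director>
        <LengthInMinutes>112</LengthInMinutes>
        <price>14.95</price>
    </video>
</videos>"""

root = ET.fromstring(VIDEOS_XML)
video = root.find("video")

# The name of an element is its tag; the value is its text.
title = video.find("title")
print(title.tag, "=", title.text)

# Values always arrive as text; the application converts them as needed.
length = int(video.findtext("LengthInMinutes"))
price = float(video.findtext("price"))
print(length, price)
```

The conversions with int() and float() are the application's decision; the parser only hands back strings.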

An element may not have a value but only a name. Consider the following example:

<?xml version="1.0" encoding="utf-8"?>
<videos>
  <video>
    <title>The Distinguished Gentleman</title>
    <director>Jonathan Lynn</director>
  </video>
</videos>

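In this case, the video element doesn't have a text value of its own; it only contains other elements. An element that has no content at all, written as <video></video> or <video />, is called an empty element.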
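
To make the idea of an element without a value concrete, here is a small sketch, again assuming Python's xml.etree.ElementTree. The helper name has_no_content is hypothetical; it reports whether an element has neither text nor child elements.

```python
import xml.etree.ElementTree as ET

def has_no_content(element):
    """Return True when the element has no child elements and no text."""
    return len(element) == 0 and not element.text

# Three video elements: two written as empty, one with a child element.
root = ET.fromstring(
    "<videos>"
    "<video></video>"
    "<video />"
    "<video><title>Her Alibi</title></video>"
    "</videos>"
)

flags = [has_no_content(video) for video in root]
print(flags)
```

Both <video></video> and <video /> are reported the same way, because the parser treats the two spellings as equivalent.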

Character Entities in an Element Value

Besides the obvious types of values, you may want to display special characters as values of elements. Consider the following example:

<?xml version="1.0" encoding="utf-8" ?>
<Employees>
    <Employee>
	<FullName>Sylvie <Bellie> Aronson</FullName>
	<Salary>25.64</Salary>
	<DepartmentID>1</DepartmentID>
    </Employee>
    <Employee>
	<FullName>Bertrand Yamaguchi</FullName>
	<Salary>16.38</Salary>
	<DepartmentID>4</DepartmentID>
    </Employee>
</Employees>

If you try using this XML document, for example by displaying it in a browser, you will receive an error.

The reason is that when the parser reaches the <FullName>Sylvie <Bellie> Aronson</FullName> line, it treats <Bellie> as a start-tag, but no matching </Bellie> end-tag follows. The parser concludes that the document is not well-formed and reports an error. For this reason, to display a special symbol as part of a value, you can use its character code. For example, the < (less than) character is represented with &lt; and the > (greater than) symbol is represented with &gt;. Therefore, the above code can be corrected as follows:

<?xml version="1.0" encoding="utf-8" ?>
<Employees>
    <Employee>
	<FullName>Sylvie &lt;Bellie&gt; Aronson</FullName>
	<Salary>25.64</Salary>
	<DepartmentID>1</DepartmentID>
    </Employee>
    <Employee>
	<FullName>Bertrand Yamaguchi</FullName>
	<Salary>16.38</Salary>
	<DepartmentID>4</DepartmentID>
    </Employee>
</Employees>

This document is well-formed, and the first FullName value is displayed as Sylvie <Bellie> Aronson.
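
The same correction can be made by a program instead of by hand. The sketch below assumes Python's standard library: xml.sax.saxutils.escape replaces &, <, and > with their codes, and xml.etree.ElementTree is used only to check the result.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_name = "Sylvie <Bellie> Aronson"

# escape() turns & into &amp;, < into &lt;, and > into &gt;.
escaped_name = escape(raw_name)
document = "<FullName>" + escaped_name + "</FullName>"

# Parsing the escaped document gives back the original characters.
parsed_name = ET.fromstring(document).text

# Without escaping, the parser rejects the document as not well-formed.
try:
    ET.fromstring("<FullName>" + raw_name + "</FullName>")
    unescaped_is_well_formed = True
except ET.ParseError:
    unescaped_is_well_formed = False

print(escaped_name)
print(parsed_name)
print(unescaped_is_well_formed)
```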

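Here is a list of other codes you can use for special characters. The five named codes (&apos;, &lt;, &gt;, &amp;, and &quot;) are predefined in XML; the others are numeric character references of the form &#N;, where N is the decimal code of the character: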

Code Symbol Code Symbol Code Symbol Code Symbol Code Symbol
&apos; ' &#067; C &#106; j &#179; ³ &#218; Ú
&lt; < &#068; D &#107; k &#180; ´ &#219; Û
&gt; > &#069; E &#108; l &#181; µ &#220; Ü
&amp; & &#070; F &#109; m &#182; ¶ &#221; Ý
&quot; " &#071; G &#110; n &#183; · &#222; Þ
&#033; ! &#072; H &#111; o &#184; ¸ &#223; ß
&#034; " &#073; I &#112; p &#185; ¹ &#224; à
&#035; # &#074; J &#113; q &#186; º &#225; á
&#036; $ &#075; K &#114; r &#187; » &#226; â
&#037; % &#076; L &#115; s &#188; ¼ &#227; ã
&#038; & &#077; M &#116; t &#189; ½ &#228; ä
&#039; ' &#078; N &#117; u &#190; ¾ &#229; å
&#040; ( &#079; O &#118; v &#191; ¿ &#230; æ
&#041; ) &#080; P &#119; w &#192; À &#231; ç
&#042; * &#081; Q &#120; x &#193; Á &#232; è
&#043; + &#082; R &#121; y &#194; Â &#233; é
&#044; , &#083; S &#122; z &#195; Ã &#234; ê
&#045; - &#084; T &#123; { &#196; Ä &#235; ë
&#046; . &#085; U &#125; } &#197; Å &#236; ì
&#047; / &#086; V &#126; ~ &#198; Æ &#237; í
&#048; 0 &#087; W &#160; (non-breaking space) &#199; Ç &#238; î
&#049; 1 &#088; X &#161; ¡ &#200; È &#239; ï
&#050; 2 &#089; Y &#162; ¢ &#201; É &#240; ð
&#051; 3 &#090; Z &#163; £ &#202; Ê &#241; ñ
&#052; 4 &#091; [ &#164; ¤ &#203; Ë &#242; ò
&#053; 5 &#092; \ &#165; ¥ &#204; Ì &#243; ó
&#054; 6 &#093; ] &#166; ¦ &#205; Í &#244; ô
&#055; 7 &#094; ^ &#167; § &#206; Î &#245; õ
&#056; 8 &#095; _ &#168; ¨ &#207; Ï &#246; ö
&#057; 9 &#096; ` &#169; © &#208; Ð &#247; ÷
&#058; : &#097; a &#170; ª &#209; Ñ &#248; ø
&#059; ; &#098; b &#171; « &#210; Ò &#249; ù
&#060; < &#099; c &#172; ¬ &#211; Ó &#250; ú
&#061; = &#100; d &#173; (soft hyphen) &#212; Ô &#251; û
&#062; > &#101; e &#174; ® &#213; Õ &#252; ü
&#063; ? &#102; f &#175; ¯ &#214; Ö &#253; ý
&#064; @ &#103; g &#176; ° &#215; × &#254; þ
&#065; A &#104; h &#177; ± &#216; Ø &#255; ÿ
&#066; B &#105; i &#178; ² &#217; Ù &#256; Ā

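There are still other codes to include special characters in an XML file: any character can be written as a numeric reference, using either its decimal code (&#N;) or its hexadecimal code (&#xN;).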
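
As an illustration of how these codes behave, the following sketch (assuming Python's xml.etree.ElementTree) parses a value that mixes decimal references, a hexadecimal reference, and a named code. The sample element name and text are made up for the example.

```python
import xml.etree.ElementTree as ET

# &#233; is é, &#169; is ©, &#x20AC; is the euro sign in hexadecimal form.
element = ET.fromstring("<Name>Ren&#233;e &#169; &#x20AC;5 &amp; up</Name>")
value = element.text
print(value)

# Going the other way: build the decimal reference for a character.
reference = "&#{};".format(ord("\u00e9"))
print(reference)
```

The parser resolves every reference while reading, so the program only ever sees the actual characters.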

The Child Nodes of a Node

 

Introduction

As mentioned already, one node can be nested inside of another. A nested node is called a child of the nesting node. This also implies that a node can have as many children as necessary, making them child nodes of the parent node. Once again, consider our videos.xml example:

<?xml version="1.0" encoding="utf-8"?>
<videos>
    <video>
	<title>The Distinguished Gentleman</title>
	<director>Jonathan Lynn</director>
	<length>112 Minutes</length>
	<format>DVD</format>
	<rating>R</rating>
    </video>
    <video>
	<title>Her Alibi</title>
	<director>Bruce Beresford</director>
	<length>94 Mins</length>
	<format>DVD</format>
	<rating>PG-13</rating>
    </video>
    <video>
	<title>Chalte Chalte</title>
	<director>Aziz Mirza</director>
	<length>145 Mins</length>
	<format>DVD</format>
	<rating>N/R</rating>
    </video>
</videos>

The title and the director nodes are children of the video node. The video node is the parent of both the title and the director nodes.

Not all nodes have children, obviously. For example, the title node of our videos.xml file does not have children.
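
The parent and child relationship can be explored with a few lines of code. This is a sketch that assumes Python's xml.etree.ElementTree, where iterating over an element yields its child elements; the sample is a shortened copy of videos.xml.

```python
import xml.etree.ElementTree as ET

# Shortened copy of videos.xml (two videos).
VIDEOS_XML = """<videos>
    <video>
        <title>The Distinguished Gentleman</title>
        <director>Jonathan Lynn</director>
        <length>112 Minutes</length>
        <format>DVD</format>
        <rating>R</rating>
    </video>
    <video>
        <title>Her Alibi</title>
        <director>Bruce Beresford</director>
        <length>94 Mins</length>
        <format>DVD</format>
        <rating>PG-13</rating>
    </video>
</videos>"""

root = ET.fromstring(VIDEOS_XML)
video = root[0]

# Names of the children of the first video node.
child_names = [child.tag for child in video]
print(child_names)

# The title node has a value but no child elements.
title_children = list(video.find("title"))
print(len(root), len(title_children))
```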

 

 

 

The First Child Node

The children of a nesting node are also recognized by their sequence. For our videos.xml file, the first line, the XML declaration, is considered the first child of the document by DOM implementations that expose the declaration as a node. This would be:

<?xml version="1.0" encoding="utf-8"?>

After identifying or locating a node, the first node nested inside it is referred to as its first child. In our videos.xml file, the first child of the first video node is the <title>The Distinguished Gentleman</title> element. The first child of the second <video> node is <title>Her Alibi</title>.

In this example, we started our parsing on the root node of the document. At times, you will need to consider only a particular node, such as the first child of a node. For example, you may want to use only the first child of the root.

Consider the following modification of the Videos.xml file:

<?xml version="1.0" encoding="utf-8" ?>
<Videos>
    <Video>
	<Title>The Distinguished Gentleman</Title>
	<Director>Jonathan Lynn</Director>
	<CastMembers>
	    <Actor>Eddie Murphy</Actor>
	    <Actor>Lane Smith</Actor>
	    <Actor>Sheryl Lee Ralph</Actor>
	    <Actor>Joe Don Baker</Actor>
	    <Actor>Victoria Rowell</Actor>
	</CastMembers>
	<Length>112 Minutes</Length>
	<Format>DVD</Format>
	<Rating>R</Rating>
    </Video>
    <Video>
	<Title>Her Alibi</Title>
	<Director>Bruce Beresford</Director>
	<Length>94 Mins</Length>
	<Format>DVD</Format>
	<Rating>PG-13</Rating>
    </Video>
    <Video>
	<Title>Chalte Chalte</Title>
	<Director>Aziz Mirza</Director>
	<Length>145 Mins</Length>
	<Format>DVD</Format>
	<Rating>N/R</Rating>
    </Video>
</Videos>

As we have learned, a node or a group of nodes can be nested inside another node. When you get to a node, you may know or find out that it has children. You may then want to consider only the first child.
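
One way to retrieve a first child is sketched below, assuming Python's xml.etree.ElementTree, which only counts elements as children (a full DOM API may also count the whitespace between tags as text nodes). The helper name first_child is hypothetical, and the sample is a shortened copy of Videos.xml.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the modified Videos.xml file.
VIDEOS_XML = """<Videos>
    <Video>
        <Title>The Distinguished Gentleman</Title>
        <Director>Jonathan Lynn</Director>
        <CastMembers>
            <Actor>Eddie Murphy</Actor>
            <Actor>Lane Smith</Actor>
        </CastMembers>
        <Length>112 Minutes</Length>
    </Video>
    <Video>
        <Title>Her Alibi</Title>
        <Director>Bruce Beresford</Director>
    </Video>
</Videos>"""

def first_child(element):
    """Return the first child element, or None if there are no children."""
    return element[0] if len(element) else None

root = ET.fromstring(VIDEOS_XML)
first_video = first_child(root)               # first child of the root
first_title = first_child(first_video)        # first child of that video
first_actor = first_child(first_video.find("CastMembers"))

print(first_video.tag, first_title.text, first_actor.text)
```

Returning None for a node without children is a design choice of this sketch; it lets the caller test for children before going deeper.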

The Last Child Node

As opposed to the first child, the child node that immediately precedes the end-tag of the parent node is called the last child.
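
The last child can be sketched the same way, still assuming Python's xml.etree.ElementTree and a hypothetical helper named last_child. The sample is a shortened copy of Videos.xml.

```python
import xml.etree.ElementTree as ET

# Shortened copy of Videos.xml.
VIDEOS_XML = """<Videos>
    <Video>
        <Title>The Distinguished Gentleman</Title>
        <Director>Jonathan Lynn</Director>
        <Rating>R</Rating>
    </Video>
    <Video>
        <Title>Chalte Chalte</Title>
        <Director>Aziz Mirza</Director>
        <Rating>N/R</Rating>
    </Video>
</Videos>"""

def last_child(element):
    """Return the last child element, or None if there are no children."""
    return element[-1] if len(element) else None

root = ET.fromstring(VIDEOS_XML)
last_video = last_child(root)            # the node just before </Videos>
last_of_video = last_child(last_video)   # the node just before </Video>

print(last_video.find("Title").text, last_of_video.tag, last_of_video.text)
```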

The Siblings of a Node

The child nodes that are nested in a parent node and share the same level are referred to as siblings. Consider the above file: Director, CastMembers, and Length are child nodes of the Video node but the Actor node is not a child of the Video node. Consequently, Director, CastMembers, and Length are siblings. Obviously, to get a sibling, you must first have a node.
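
The sibling relationship can be sketched as follows, assuming Python's xml.etree.ElementTree. Elements in that module do not keep a reference to their parent, so the hypothetical helper siblings takes the parent explicitly, which mirrors the remark that you must first have a node.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the first Video node of Videos.xml.
VIDEO_XML = """<Video>
    <Title>The Distinguished Gentleman</Title>
    <Director>Jonathan Lynn</Director>
    <CastMembers>
        <Actor>Eddie Murphy</Actor>
        <Actor>Lane Smith</Actor>
    </CastMembers>
    <Length>112 Minutes</Length>
</Video>"""

def siblings(parent, node):
    """Return the other children of parent that share node's level."""
    return [child for child in parent if child is not node]

video = ET.fromstring(VIDEO_XML)
director = video.find("Director")

# Siblings of Director: the other direct children of Video.
director_siblings = [s.tag for s in siblings(video, director)]
print(director_siblings)

# Actor is nested one level deeper, so it is not a child of Video.
actor_is_child_of_video = any(child.tag == "Actor" for child in video)
print(actor_is_child_of_video)
```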

 
 
   
 

Copyright © 2009-2016, FunctionX, Inc.