Introduction to the Extensible Markup Language
The Extensible Markup Language
Introduction to XML
The Extensible Markup Language, or XML, is a technique of using a document, such as a text file, to describe information and make that information available to whatever and whoever can take advantage of it. The description is purely text-based.
XML is very flexible and can be used in various types of applications. It is standardized by the W3C (http://www.w3c.org) organization. Each version of XML is released through an XML Recommendation document.
We will use XML through the .NET Framework classes. The particularity is that these classes are highly structured to take care of all facets of XML without compromising the standards.
To create an XML file, start a text editor and type the necessary code. The XML document is made of units called entities. These entities are spread on various lines of the document as you judge them necessary. XML has strict rules as to how the contents of the document should or must be structured.
After an XML document has been created and is available, in order to use it, you need a program that can read, analyze, and interpret it. This program is called a parser. The most popular parser used in Microsoft Windows applications is MSXML, published by Microsoft.
A markup is an instruction that defines XML. The fundamental formula of a markup is:
<tag>
The left angle bracket "<" and the right angle bracket ">" are required. Inside of these symbols, you type a word or a group of words of your choice, using regular characters of the English alphabet and sometimes non-alphabetic characters such as ?, !, or [. The combination of a left angle bracket "<", the right angle bracket ">", and what is inside of these symbols is called a markup. There are various types of markups we will learn.
The XML Declaration
As mentioned above, XML is released as a version. Because there can be various versions, the first line that can be processed in an XML file must specify the version of XML you are using. At the time of this writing, the version of XML widely supported by the .NET Framework is 1.0. When creating an XML file, you should (should in 1.0 but must in 1.1) specify what version your file conforms to, especially if you are using a version higher than 1.0. For this reason, an XML file should start (again, must, in 1.1), in the top section, with a line known as an XML declaration. It starts with <?xml version=, followed by the version you are using, assigned as a string, and followed by ?>. An example of such a line is:
<?xml version="1.0"?>
As mentioned already, the tags are created using characters of the alphabet and conform to the ISO/IEC 10646 (Unicode) character standard. How those characters are stored in the file is specified by the encoding declaration. For example, most of the characters used in the US English language belong to the ASCII set. Each of these characters is identified by a 7-bit code and is normally stored in one 8-bit byte. The UTF-8 encoding stores ASCII characters as single bytes and uses two to four bytes for the other characters. There are other encodings such as UTF-16, which is based on wide, 2-byte, units.
To specify the encoding you are using, type encoding followed by the encoding scheme you are using, which must be assigned as a string. The encoding is specified in the first line. Here is an example:
<?xml version="1.0" encoding="utf-8"?>
Creating an XML File
You can create and use an XML file using a simple text editor. In Microsoft Windows, this would be Notepad. There are various text editors in Linux. An XML file is first of all a normal text-based document that has a .xml extension.
Writing XML Code
Introduction to the Document Object Model
To implement XML, the .NET Framework provides the System.Xml namespace. Therefore, you should include that namespace in the code of your webpage. Here is an example:
<%@ Page Language="VB" %>
<%@ Import Namespace="System.Xml" %>

<!DOCTYPE html>

<html>
<head runat="server">
<title>Exercise</title>
</head>
<body>
</body>
</html>
The System.Xml namespace is defined in the System.Xml.dll library.
When you create an XML file, there are standard rules you should (must) follow in order to have a well-formed document. To let a program work on the contents of such a document, the W3C defines the Document Object Model (DOM), a standard way of representing and manipulating a document in memory. To support the DOM, the System.Xml namespace provides a class named XmlDocument. This class allows you to create an XML document, to populate it with the desired contents, and to perform many other related operations on the contents of the file. Here is an example of declaring a variable of type XmlDocument:
<%@ Page Language="VB" %>
<%@ Import Namespace="System.Xml" %>

<!DOCTYPE html>

<html>
<head runat="server">
<title>Video Collection</title>
</head>
<body>
<h3>Video Collection</h3>

<%
    Dim chemistry As New XmlDocument()
%>
</body>
</html>
Creating XML Code Using XmlDocument
To create XML code using XmlDocument, the class has a method called LoadXml(). Its syntax is:
Public Overridable Sub LoadXml(ByVal xml As String)
This method takes a String as argument. The XmlDocument.LoadXml() method doesn't create an XML file; it only allows you to provide or create XML code. The code can be passed directly as the argument. You can also first declare and initialize a String variable with the XML code, then pass it as argument to the XmlDocument.LoadXml() method.
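As a minimal sketch of the second approach, the following code stores the XML code in a String variable and passes it to LoadXml(). It is written as a small console module rather than a webpage so it can run on its own. The module name, the function name, and the sample content are assumptions made for illustration only.

```vb
Imports System
Imports System.Xml

Module LoadXmlSketch
    ' Creates a document in memory from a String variable and
    ' returns the XML code held by the document.
    Function LoadFromString() As String
        ' Hypothetical sample content
        Dim strVideos As String = "<videos><video><title>The Distinguished Gentleman</title></video></videos>"

        Dim docVideos As New XmlDocument()
        docVideos.LoadXml(strVideos) ' parses the text; no file is created

        ' OuterXml gives back the markup of the whole document
        Return docVideos.OuterXml
    End Function

    Sub Main()
        Console.WriteLine(LoadFromString())
    End Sub
End Module
```

If the string does not hold well-formed XML, LoadXml() throws an XmlException, so in real code you may want to call it inside a Try block.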
After launching a text editor, you can start typing XML code.
Saving an XML File
After writing the code, you must save the file and give it the .xml extension.
Saving a DOM Object
If you call the XmlDocument.LoadXml() method, only the XML code is created, not the file. To let you create the file, the XmlDocument class is equipped with a method named Save. This method is provided in four versions. One of the versions takes as argument a string value. The syntax of this method is:
Public Overridable Sub Save(ByVal filename As String)
The argument must be a valid filename and should include the .xml extension.
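The following sketch shows one way the two steps could be combined: the XML code is created in memory with LoadXml() and then written to a file with Save(). The module name, the function name, the sample content, and the file name are all assumptions; the file is placed in the temporary folder only so the example can run anywhere.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module SaveSketch
    ' Creates a small document in memory, saves it to the given path,
    ' and reports whether the file now exists.
    Function SaveSample(ByVal strFileName As String) As Boolean
        Dim docGeometry As New XmlDocument()
        docGeometry.LoadXml("<geometry><rectangle>A rectangle is a shape with 4 sides</rectangle></geometry>")

        ' This is the step that actually creates (or overwrites) the file
        docGeometry.Save(strFileName)

        Return File.Exists(strFileName)
    End Function

    Sub Main()
        ' Hypothetical file name in the temporary folder
        Dim strFileName As String = Path.Combine(Path.GetTempPath(), "geometry.xml")
        Console.WriteLine(SaveSample(strFileName))
    End Sub
End Module
```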
Opening an XML File
The easiest way to open an XML file is to use a text editor, such as Notepad.
An XML File in a Browser
Another way you can display an XML file is in a browser. You can either double-click the file in a file utility or right-click it and choose to open it with a browser from the menu.
Programmatically Opening an XML File Using the DOM
At times, you will need to programmatically access an XML file. To support this operation, the XmlDocument class is equipped with a method named Load which is available in various versions. One of the syntaxes used by this method is:
Public Overridable Sub Load(filename As String)
This version takes as argument the name or path of the file. Here is an example of calling this method:
<%@ Page Language="VB" %>
<%@ Import Namespace="System.Xml" %>

<!DOCTYPE html>

<html>
<head runat="server">
<title>Lambda Square Apartments - Employees</title>
</head>
<body>
<h3>Lambda Square Apartments - Employees</h3>

<%
    Dim xdEmployees As New XmlDocument()
    Dim strEmployeesFile = Server.MapPath("employees.xml")

    xdEmployees.Load(strEmployeesFile)
%>
</body>
</html>
You can also use a Stream-based object to identify the file. Once the object is ready, you can use the following version of the Load() method to open it:
Public Overridable Sub Load(inStream As Stream)
This method expects a Stream type of object, such as a FileStream variable. This can be done as follows:
<%@ Page Language="VB" %>
<%@ Import Namespace="System.IO" %>
<%@ Import Namespace="System.Xml" %>

<!DOCTYPE html>

<html>
<head runat="server">
<title>Watts A Loan - Employees</title>
</head>
<body>
<h3>Watts A Loan - Employees</h3>

<%
    Dim xdEmployees As New XmlDocument()
    Dim strEmployeesFile = Server.MapPath("employees.xml")

    Using fsEmployees As New FileStream(strEmployeesFile, FileMode.Open, FileAccess.Read)
        xdEmployees.Load(fsEmployees)
    End Using
%>
</body>
</html>
In both cases, if the file is not in the specified location, the server would throw a FileNotFoundException. For this reason, it is a good idea to first check that the file exists before opening it. This can be done by calling the File.Exists() method. An alternative is to handle the exception yourself.
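The two precautions can be combined, as in the following sketch. It is a console module rather than a webpage so it can run on its own; the module name, the function name, and the file name are assumptions. The XmlException handler is an addition for files that exist but do not contain well-formed XML.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module OpenSketch
    ' Returns True if the file exists and could be loaded as XML.
    Function TryLoad(ByVal strFileName As String) As Boolean
        ' First precaution: check that the file exists
        If Not File.Exists(strFileName) Then
            Return False
        End If

        ' Second precaution: handle the exceptions yourself
        Try
            Dim xdEmployees As New XmlDocument()
            xdEmployees.Load(strFileName)
            Return True
        Catch ex As FileNotFoundException
            ' The file was removed between the check and the call to Load()
            Return False
        Catch ex As XmlException
            ' The file exists but its content is not well-formed XML
            Return False
        End Try
    End Function

    Sub Main()
        ' Hypothetical file name
        Console.WriteLine(TryLoad("employees.xml"))
    End Sub
End Module
```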
Programmatically Reading an XML File
Many of the XML files you encounter will have been created by someone else. Still, because it is primarily a text document, you are expected to be able to read any XML file and figure out its content.
Another way you can explore an XML file consists of programmatically reading it. This is also referred to as parsing (the parser parses the document). To support reading an XML file, the .NET Framework provides an abstract class named XmlReader as the ancestor of classes that can read an XML file:
Public MustInherit Class XmlReader
    Implements IDisposable
One of the classes derived from XmlReader is called XmlTextReader:
Public Class XmlTextReader
    Inherits XmlReader
    Implements IXmlLineInfo, IXmlNamespaceResolver
The XmlTextReader class provides the ability to read the file from the left to the right sides and from the top to the bottom sections. This class has very important characteristics you must remember:
To programmatically read an XML file, you can start by declaring a variable of type XmlTextReader using one of its public constructors. To specify the file you want to open and read, you can use the constructor whose syntax is the following:
Public Sub New(ByVal url As String)
When using this constructor, pass the name of the file or its path as argument. You can also identify a file using a Stream-based object. Once the object is ready, you can pass it to the following constructor of the class:
Public Sub New(ByVal input As Stream)
To actually read the file, the XmlTextReader is equipped with the Read() method whose syntax is:
Public Overrides Function Read As Boolean
As you may suspect, this method only tells you that it successfully read an item. It doesn't tell you what it read. As stated already, the XmlTextReader scans a file in a left-right-top-down approach. When it has read something, it returns true. If it didn't or couldn't read something, it returns false. Therefore, you can call it to read an item. If it succeeds, it returns true. After reading that item, you can call it again to move to the next item. If there is a next item, it reads it and returns true. But, if there is no next item, the Read() method would not be able to read it and it would return false. In other words, you can ask the Read() method to continuously read the items as long as it returns true. Once it cannot read an item, you can ask it to stop. To perform this exercise, you can use a loop.
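The loop described above could be sketched as follows. To keep the example self-contained, the reader scans a string through a StringReader instead of a file; this uses the XmlTextReader constructor that takes a TextReader, which is not covered in this lesson. The module name, the function name, and the sample content are assumptions.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module ReadLoopSketch
    ' Counts how many items Read() reports before it returns False.
    Function CountItems(ByVal strXml As String) As Integer
        Dim nItems As Integer = 0

        Using rdr As New XmlTextReader(New StringReader(strXml))
            ' Keep reading as long as Read() returns True
            Do While rdr.Read()
                nItems += 1
            Loop
        End Using

        Return nItems
    End Function

    Sub Main()
        ' Hypothetical sample content
        Console.WriteLine(CountItems("<video><title>The Distinguished Gentleman</title></video>"))
    End Sub
End Module
```

Each start-tag, each value, and each end-tag counts as one item, so the sample above is read in five steps.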
To identify what was read, the XmlTextReader provides methods appropriate for the different types of items that an XML file can contain. We will review the types of items of a file.
In XML, a simple markup is made of a tag created between the left angle bracket "<" and the right angle bracket ">". Just creating a markup is not particularly significant. You must give it meaning. To do this, you can type a number, a date, or a string on the right side of the right angle bracket ">" symbol. The text on the right side of ">" is referred to as the item's text. It is also called a value.
After specifying the value of the markup, you must close it: this is a rule not enforced in HTML but must be respected in XML to make it "well-formed". To close a tag, use the same formula of creating a tag with the left angle bracket "<", the tag, and the right angle bracket ">" except that, between < and the tag, you must type a forward slash. The formula to use is:
<tag>some value</tag>
The item on the left side of the "some value" string, in this case <tag>, is called the opening or start-tag. The item on the right side of the "some value" string, in this case </tag>, is called the closing or end-tag. Just as <tag> is a markup, </tag> is also called a markup.
With XML, you create your own tags with custom names. This means that a typical XML file is made of various items. Here is an example:
<title>The Distinguished Gentleman</title> <director>Jonathan Lynn</director><length>112 Minutes</length>
When creating your tags, there are various rules you must observe with regards to their names. Unlike HTML, XML is very restrictive with its rules. For example, unlike HTML, XML is case-sensitive.
Besides case sensitivity, there are some rules you must observe when naming the tags of your markups:
The Root of an XML document
Every XML document must have one particular tag that either is the only tag in the file or acts as the parent of all the other tags of the same document. This tag is called the root. Here is an example of a file that has only one tag:
<rectangle>A rectangle is a shape with 4 sides and 4 straight angles</rectangle>
If there is more than one tag in the XML file, one of them must serve as the parent or root. Otherwise, you would receive an error.
As mentioned already, a good XML file should have an XML declaration:
<?xml version="1.0" encoding="utf-8"?>
<geometry>
    <rectangle>A rectangle is a shape with 4 sides and 4 straight angles</rectangle>
    <square>A square is a rectangle whose 4 sides are equal</square>
</geometry>
To give you access to the root of an XML file, the XmlDocument class is equipped with the DocumentElement property:
Public ReadOnly Property DocumentElement As XmlElement
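As a short sketch of how this property might be used, the following code loads the geometry document from above and retrieves its root. The module and function names are assumptions, and the Name and ChildNodes members used to describe the root are not covered in this lesson.

```vb
Imports System
Imports System.Xml

Module RootSketch
    ' Returns the name of the root of the XML code passed as argument.
    Function GetRootName(ByVal strXml As String) As String
        Dim docShapes As New XmlDocument()
        docShapes.LoadXml(strXml)

        ' The root is returned as an XmlElement object
        Dim elmRoot As XmlElement = docShapes.DocumentElement
        Return elmRoot.Name
    End Function

    Sub Main()
        Dim strGeometry As String = "<?xml version=""1.0"" encoding=""utf-8""?>" &
                                    "<geometry>" &
                                    "<rectangle>A rectangle is a shape with 4 sides and 4 straight angles</rectangle>" &
                                    "<square>A square is a rectangle whose 4 sides are equal</square>" &
                                    "</geometry>"

        Console.WriteLine(GetRootName(strGeometry))
    End Sub
End Module
```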
The Structure of an XML Tag
Remember that every XML tag must be closed. We also saw that the value of a tag was specified on the right side of the right angle bracket of the start tag. In some cases, you will create a tag that doesn't have a value or, maybe for some reason, you don't provide a value to it. Here is an example:
This type of tag is called an empty tag. Since there is no value in it, you may not need to provide an end tag but it still must be closed. Although this writing is allowed, an alternative is to close the start tag itself. To do this, between the tag name and the right angle bracket, type an empty space followed by a forward slash. Based on this, the above line can be written as follows:
Both produce the same result or accomplish the same role.
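To illustrate the two ways of writing an empty tag, the following sketch loads each form and retrieves the value of the tag. The tag name note is hypothetical, as are the module and function names; the InnerText property used to get the value is not covered in this lesson.

```vb
Imports System
Imports System.Xml

Module EmptyTagSketch
    ' Returns the value (text) of the root tag of the XML code passed as argument.
    Function GetValue(ByVal strXml As String) As String
        Dim docNote As New XmlDocument()
        docNote.LoadXml(strXml)
        Return docNote.DocumentElement.InnerText
    End Function

    Sub Main()
        ' Hypothetical tag, written with a start-tag and an end-tag
        Dim strLong As String = "<note></note>"
        ' The same tag, closed inside the start-tag itself
        Dim strShort As String = "<note />"

        ' Both forms hold an empty value
        Console.WriteLine(GetValue(strLong).Length)
        Console.WriteLine(GetValue(strShort).Length)
    End Sub
End Module
```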
If you are creating a long XML document, although creating various items on the same line is acceptable, this can make it (very) difficult to read. One way you can solve this problem is to separate tags with empty spaces. Here is an example:
<state>New South Wales</state> <capital>Sydney</capital> <area>312,528 sq mi</area>
Yet a better solution consists of typing each item on its own line. This would make the document easier to read. Here is an example:
<state>New South Wales</state>
<capital>Sydney</capital>
<area>312,528 sq mi</area>
All these are possible and acceptable because the XML parser doesn't treat the empty spaces or ends of lines between tags as content. Therefore, to make your code easier to read, you can use empty spaces, carriage-return-line-feed combinations, or tabs inserted in various sections. All these are referred to as white spaces.
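One way to observe this with the XmlDocument class is sketched below: the same three tags are loaded once on a single line and once with line breaks and indentation, and the number of items under the root is the same in both cases. The region root tag, the module name, and the function name are assumptions. The sketch relies on the default behavior of XmlDocument, whose PreserveWhitespace property is False unless you change it.

```vb
Imports System
Imports System.Xml

Module WhiteSpaceSketch
    ' Returns the number of items directly under the root.
    Function CountChildren(ByVal strXml As String) As Integer
        Dim docStates As New XmlDocument() ' PreserveWhitespace is False by default
        docStates.LoadXml(strXml)
        Return docStates.DocumentElement.ChildNodes.Count
    End Function

    Sub Main()
        ' Everything on one line (the region root is hypothetical)
        Dim strCompact As String = "<region><state>New South Wales</state>" &
                                   "<capital>Sydney</capital>" &
                                   "<area>312,528 sq mi</area></region>"

        ' The same code with line breaks and indentation
        Dim strSpaced As String = "<region>" & Environment.NewLine &
                                  "    <state>New South Wales</state>" & Environment.NewLine &
                                  "    <capital>Sydney</capital>" & Environment.NewLine &
                                  "    <area>312,528 sq mi</area>" & Environment.NewLine &
                                  "</region>"

        Console.WriteLine(CountChildren(strCompact))
        Console.WriteLine(CountChildren(strSpaced))
    End Sub
End Module
```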
Most XML files contain more than one tag. A tag can be included in another tag: this is referred to as nesting. A tag that is created inside of another tag is said to be nested. A tag that contains another tag is said to be nesting. Consider the following example:
<Smile>Please smile to the camera</Smile> <English>Welcome to our XML Class</English> <French>Bienvenue à notre cours XML</French>
To nest one tag inside of another, you must type the nested tag before the end-tag of the nesting tag. For example, if you want to nest the English tag in the Smile tag, you must type the whole English tag before the </Smile> end tag. Here is an example:
<Smile>Please smile to the camera<English>Welcome to our XML Class</English></Smile>
To make this code easier to read, you can use white spaces as follows:
<Smile>Please smile to the camera
    <English>Welcome to our XML Class</English>
</Smile>
When a tag is nested, it must also be closed before its nesting tag is closed. Based on this rule, the following code is not well-formed:
<Smile>Please smile to the camera
    <English>Welcome to our XML Class
</Smile>
</English>
The rule broken here is that the English tag that is nested in the Smile tag is not closed inside the Smile tag but outside.
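The nesting rule can be checked programmatically, as in the following sketch: when the XML code breaks the rule, the XmlDocument.LoadXml() method throws an XmlException. The module name and the IsWellFormed() function are assumptions made for illustration; they are not part of the .NET Framework.

```vb
Imports System
Imports System.Xml

Module NestingSketch
    ' Returns True if the XML code can be loaded, False if the parser rejects it.
    Function IsWellFormed(ByVal strXml As String) As Boolean
        Try
            Dim docTest As New XmlDocument()
            docTest.LoadXml(strXml)
            Return True
        Catch ex As XmlException
            ' The parser found a broken rule, such as a tag closed in the wrong place
            Return False
        End Try
    End Function

    Sub Main()
        ' The English tag is closed inside the Smile tag
        Dim strGood As String = "<Smile>Please smile to the camera" &
                                "<English>Welcome to our XML Class</English></Smile>"
        ' The English tag is closed outside the Smile tag
        Dim strBad As String = "<Smile>Please smile to the camera" &
                               "<English>Welcome to our XML Class</Smile></English>"

        Console.WriteLine(IsWellFormed(strGood))
        Console.WriteLine(IsWellFormed(strBad))
    End Sub
End Module
```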
Once you have decided on the structure of your XML file, we saw that you can create it in memory using the XmlDocument.LoadXml() method. For example, the following XML code:
<?xml version="1.0" encoding="utf-8"?>
<musiccollection>
    <album>
        <shelfnumber>FJ-7264</shelfnumber>
        <title>Symphony-Bantu</title>
        <artist>Vincent Nguini</artist>
        <copyrightyear>1994</copyrightyear>
        <publisher>Mesa Records</publisher>
    </album>
    <album>
        <shelfnumber>MR-2947</shelfnumber>
        <title>None</title>
        <artist>Debbie Gibson</artist>
        <copyrightyear>1990</copyrightyear>
        <publisher>Atlantic</publisher>
    </album>
</musiccollection>
can be created in memory as follows:
<%@ Page Language="VB" %>
<%@ Import Namespace="System.Xml" %>

<!DOCTYPE html>

<html>
<head runat="server">
<title>Video Collection</title>
</head>
<body>
<h3>Video Collection</h3>

<%
    Dim docMusicCollection As New XmlDocument()

    docMusicCollection.LoadXml("<?xml version=""1.0"" encoding=""utf-8""?>" &
                               "<musiccollection><album>" &
                               "<shelfnumber>FJ-7264</shelfnumber>" &
                               "<title>Symphony-Bantu</title>" &
                               "<artist>Vincent Nguini</artist>" &
                               "<copyrightyear>1994</copyrightyear>" &
                               "<publisher>Mesa Records</publisher></album>" &
                               "<album><shelfnumber>MR-2947</shelfnumber>" &
                               "<title>None</title><artist>Debbie Gibson</artist>" &
                               "<copyrightyear>1990</copyrightyear>" &
                               "<publisher>Atlantic</publisher>" &
                               "</album></musiccollection>")
%>
</body>
</html>
The whole XML code can be created as one line of text and the code would be valid. Just like one tag can be nested in another tag, a nested tag can also have one or more tags that are nested in it, and so on.
An XML Node
Introduction to XML Nodes
Consider the following example of an XML file:
<?xml version="1.0" encoding="utf-8" ?>
<periodic-table>
    <element>
        <atomic-number>1</atomic-number>
        <element-name>Hydrogen</element-name>
        <symbol>H</symbol>
        <atomic-mass>1.0079</atomic-mass>
    </element>
    <element>
        <atomic-number>2</atomic-number>
        <element-name>Helium</element-name>
        <symbol>He</symbol>
        <atomic-mass>4.002682</atomic-mass>
    </element>
    <element>
        <atomic-number>3</atomic-number>
        <element-name>Lithium</element-name>
        <symbol>Li</symbol>
        <atomic-mass>6.941</atomic-mass>
    </element>
</periodic-table>
An XML file appears as an upside-down tree: it has a root (in this case <periodic-table>), it can have branches (in this case <element>), and it can have leaves (an example in this case is <symbol>). As we have seen so far, all of these objects are created using the same technique: a tag with a name (such as <symbol>) and an optional value. Based on their similarities, each of these objects is called a node.
To support nodes of an XML file, the .NET Framework provides the XmlNode class. The XmlNode class is the ancestor to all types of nodes, including the XmlDocument class:
Public MustInherit Class XmlNode
    Implements ICloneable, IEnumerable, IXPathNavigable
XmlNode is an abstract class without a public constructor. To get a node, you must retrieve it from an existing object that can produce one.
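As a sketch of retrieving nodes from an existing object, the following code gets the root from an XmlDocument object and then asks the root for the first node it contains. The module and function names are assumptions, and the FirstChild and Name members of XmlNode are not covered in this lesson.

```vb
Imports System
Imports System.Xml

Module NodeSketch
    ' Returns the name of the first node found under the root.
    Function GetFirstChildName(ByVal strXml As String) As String
        Dim docElements As New XmlDocument()
        docElements.LoadXml(strXml)

        ' The root is an XmlElement, which is a type of XmlNode
        Dim nodRoot As XmlNode = docElements.DocumentElement
        ' The root, in turn, can produce the nodes it contains
        Dim nodFirst As XmlNode = nodRoot.FirstChild

        Return nodFirst.Name
    End Function

    Sub Main()
        Dim strTable As String = "<periodic-table><element>" &
                                 "<atomic-number>1</atomic-number>" &
                                 "<element-name>Hydrogen</element-name>" &
                                 "<symbol>H</symbol>" &
                                 "<atomic-mass>1.0079</atomic-mass>" &
                                 "</element></periodic-table>"

        Console.WriteLine(GetFirstChildName(strTable))
    End Sub
End Module
```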
Introduction to Node Types
An XML node can contain various types of nodes. The categories or possible types of nodes are identified by an enumeration named XmlNodeType. If you use an XmlTextReader object to scan a file, after each call to Read(), you can use a property of the class named NodeType to identify the node that was read. NodeType is a read-only property of type XmlNodeType and, in the XmlTextReader class, it is declared as follows:
Public Overrides ReadOnly Property NodeType As XmlNodeType
Therefore, when calling the XmlTextReader.Read() method, you can continuously check the value of the XmlTextReader.NodeType property to find out what type of node was just read, and then you can take an appropriate action.
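The following sketch shows one possible action: it checks the NodeType property after each call to Read() and counts the nodes of a given type. To stay self-contained, it scans a string through a StringReader instead of a file. The module name, the function name, and the shortened sample content are assumptions.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module NodeTypeSketch
    ' Counts the nodes of the requested type found while scanning the XML code.
    Function CountNodes(ByVal strXml As String, ByVal type As XmlNodeType) As Integer
        Dim nFound As Integer = 0

        Using rdr As New XmlTextReader(New StringReader(strXml))
            Do While rdr.Read()
                ' Find out what type of node was just read
                If rdr.NodeType = type Then
                    nFound += 1
                End If
            Loop
        End Using

        Return nFound
    End Function

    Sub Main()
        Dim strTable As String = "<periodic-table><element>" &
                                 "<symbol>H</symbol>" &
                                 "<atomic-mass>1.0079</atomic-mass>" &
                                 "</element></periodic-table>"

        ' Start-tags are reported as Element nodes, values as Text nodes
        Console.WriteLine(CountNodes(strTable, XmlNodeType.Element))
        Console.WriteLine(CountNodes(strTable, XmlNodeType.Text))
    End Sub
End Module
```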
Copyright © 2005-2016 FunctionX