Introduction to XML

 

The Extensible Markup Language

 

Introduction

The eXtensible Markup Language, or XML, is a technique of using a document, such as a text file, to describe information and make that information available to whatever and whoever can take advantage of it. The description is written so that the document can be created by one person or company and used by another without having to know who created the document or how it was produced. This is possible because the document is not a program or an application: it is just a text-based document.

Because XML is very flexible, it can be used in regular Windows applications, in databases, in web-based systems (Internet), in communication applications, in computer networks, in scientific applications, and so on. To make sure that XML can be used universally without one person or group owning it, it is standardized by the World Wide Web Consortium (W3C). Each version of XML is published as a W3C Recommendation document.

In our lessons, we will learn and use XML through the .NET Framework classes. These classes are highly structured to take care of all facets of XML without compromising the standards. In fact, the .NET Framework classes conform closely to the W3C standards.

To create an XML file, you type units of code in the document using regular text characters. The XML document is made of units called entities. These entities are spread over various lines of the document as you judge necessary and as we will learn. XML has strict rules as to how the content of the document must be structured.

After an XML document has been created, you need a program that can read, analyze, and interpret it in order to use it. This program is called a parser. A popular parser used in Microsoft Windows applications is MSXML, published by Microsoft.
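As a rough illustration of what a parser does, here is a minimal sketch that uses Python's standard xml.etree.ElementTree module as a stand-in for MSXML or the .NET Framework classes. The document content and the element names (videos, video, title) are invented for this example and are not part of the lessons.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are invented for illustration.
document = """<?xml version="1.0"?>
<videos>
  <video><title>First Title</title></video>
  <video><title>Second Title</title></video>
</videos>"""

# The parser reads the text and builds a tree of elements.
root = ET.fromstring(document)
titles = [video.findtext("title") for video in root.findall("video")]
print(root.tag, titles)

# A document that breaks the structural rules is rejected by the parser:
# here the <video> tag is never closed.
try:
    ET.fromstring("<videos><video></videos>")
    well_formed = True
except ET.ParseError:
    well_formed = False
print(well_formed)
```

The second part shows why the strict rules matter: the parser refuses to interpret a document whose structure is not valid instead of guessing what was meant.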

Markup

A markup is an instruction that defines a part of an XML document. The fundamental formula of a markup is:

<tag>

The left angle bracket "<" and the right angle bracket ">" are required. Inside of these symbols, you type a word or a group of words of your choice, using regular characters of the alphabet and sometimes non-alphabetic characters such as ?, !, or [. The combination of the left angle bracket "<", the right angle bracket ">", and what is inside of these symbols is called a markup. There are various types of markups, as we will learn.

The XML Declaration

As mentioned above, XML is released in versions. Because there can be various versions, an XML file should state, before any other content, which version of XML it uses. At the time of this writing, the version of XML widely supported by the .NET Framework is 1.0. When creating an XML file, you should (should in 1.0 but must in 1.1) specify the version your file conforms to, especially if you are using a version higher than 1.0. For this reason, an XML file should start (again, must, in 1.1) with a line known as the XML declaration. It starts with <?xml version=, followed by the version you are using, written as a quoted string, and ends with ?>. An example of such a line is:

<?xml version="1.0"?>

By default, an XML file created using Visual Studio .NET 2003 specifies the version as 1.0. Under the XML declaration line, you can then create the necessary tags of the XML file.
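To see how a parser picks up the declared version, here is a minimal sketch using Python's standard xml.dom.minidom module rather than the .NET Framework classes. The root element name is a placeholder chosen for this example.

```python
from xml.dom import minidom

# A tiny document that starts with an XML declaration.
# The element name "root" is a placeholder for illustration.
document = b'<?xml version="1.0"?><root/>'

doc = minidom.parseString(document)

# minidom records the version found in the XML declaration.
declared_version = doc.version
print(declared_version)

# Without a declaration, no version is recorded.
undeclared = minidom.parseString(b"<root/>")
print(undeclared.version)
```

A program that needs to handle several versions could check this value before deciding how to process the rest of the file.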

Encoding Declaration

As mentioned already, an XML document is created using text characters. How those characters are stored as bytes in the file is called the encoding, and the part of the XML declaration that names it is known as the encoding declaration. For example, most of the characters used in the English language belong to ASCII. These characters use a combination of 7 bits to create a symbol (stored in an 8-bit byte, the remaining bit is left unused). UTF-8, an encoding of the Unicode (ISO/IEC 10646) character set, stores each ASCII character in a single byte and uses two to four bytes for other characters. There are other encodings such as UTF-16, which uses 2-byte units.
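The difference between these encodings is easiest to see by counting bytes. The following sketch, written in Python purely for illustration, encodes one ASCII letter and one accented letter; the sample characters are arbitrary choices.

```python
letter = "A"     # an ASCII character
accented = "é"   # a character outside ASCII

# Number of bytes each character occupies in each encoding.
sizes = {
    "A in ASCII": len(letter.encode("ascii")),
    "A in UTF-8": len(letter.encode("utf-8")),
    "é in UTF-8": len(accented.encode("utf-8")),
    "A in UTF-16": len(letter.encode("utf-16-le")),
}

for label, size in sizes.items():
    print(label, size)

# ASCII code points fit in 7 bits, so they are always below 128.
print(ord(letter) < 128)
```

An ASCII character takes one byte in both ASCII and UTF-8, the accented letter takes two bytes in UTF-8, and UTF-16 uses two bytes even for a plain ASCII letter.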

To specify the encoding you are using, type encoding= followed by the name of the encoding scheme, written as a quoted string. The encoding is specified in the XML declaration on the first line, after the version. Here is an example:

<?xml version="1.0" encoding="utf-8"?>
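The encoding declaration tells the parser how to turn the bytes of the file back into characters. The sketch below, using Python's standard xml.etree.ElementTree module as a stand-in for the .NET Framework classes, stores the same hypothetical document in two encodings and checks that the parser recovers the same text from both. The helper name read_name and the name element are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def read_name(encoding):
    """Build a small document in the given encoding and parse it back.

    Hypothetical helper: the <name> element is invented for illustration.
    """
    text = f'<?xml version="1.0" encoding="{encoding}"?><name>José</name>'
    data = text.encode(encoding)      # the bytes as they would sit in a file
    return ET.fromstring(data).text   # the parser decodes using the declaration

from_utf8 = read_name("utf-8")
from_latin1 = read_name("iso-8859-1")
print(from_utf8, from_latin1)
```

The bytes differ between the two files, but because each declaration matches the way its file was saved, the parser returns the same characters in both cases.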
 

Copyright © 2005-2012 FunctionX