Introduction to XML

As I blogged about previously, I am currently taking an XML training class, so I thought I would share some of what I learned today. In fact, this post covers only what I learned this morning. The class costs over $2600 for 4 days, which averages out to over $650 a day. This morning's information cost my company $325. So listen up and learn a thing or two free of charge.

XML is a text-based format for data transfer that aims to be simple and cost effective. It is an open standard and a subset of SGML. Although SGML is very powerful, it is complex. Only one person in my class of 24 knew SGML well enough to confess it. A related markup language is HTML, which is an application of SGML.

A newer version of HTML called XHTML has a stricter syntax so that it conforms to XML. HTML is for presentation in a web browser. XML, on the other hand, is for expressing data. Another data transfer method is Electronic Data Interchange (EDI), which is more expensive and more complex. Let's now focus on some XML details.

The basic unit of XML is an element. An element is demarcated by a start tag and an end tag. An element can optionally carry attributes, which are name and value pairs written inside the start tag. The outermost element in an XML file is called the root or document element. XML is case sensitive, so the start and end tags of an element must match exactly. One way to process XML files is with a parser. Microsoft provides the MSXML parser, which ships with Internet Explorer. And Apache has the free Xerces parser.
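To make the vocabulary above concrete, here is a minimal sketch in Python. It uses the standard library's xml.etree.ElementTree parser rather than MSXML or Xerces, simply because it needs no installation. The sample document and its element and attribute names (library, book, id, lang, title) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample document: <library> is the root (document) element.
SAMPLE = """<library>
  <book id="b1" lang="en">
    <title>Learning XML</title>
  </book>
  <book id="b2">
    <title>XML in Practice</title>
  </book>
</library>"""

root = ET.fromstring(SAMPLE)

# Collect each book's attributes (a dict) and the text of its <title> child.
books = [(book.attrib, book.find("title").text) for book in root.findall("book")]

# XML is case sensitive, so "Book" does not match the <book> elements.
wrong_case = root.findall("Book")

if __name__ == "__main__":
    print("root element:", root.tag)
    for attrs, title in books:
        print(attrs, title)
    print("matches for 'Book':", len(wrong_case))
```

Note that the second book has no lang attribute and the document still parses, since attributes are optional.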

You can put text in an element. XML calls this parsed character data (PCDATA). Some special codes within PCDATA expand to reserved characters. These are called entities. An example of an entity is &lt; which expands to the less than sign (<). You can also put special characters in data literally. But you then need to surround them with a CDATA section, which marks unparsed character data. A CDATA section starts with <![CDATA[ and ends with ]]>.
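Here is a small sketch of the two ways to get reserved characters into element text, again using Python's standard xml.etree.ElementTree parser as a stand-in for whatever parser you use. The expr element name and its content are hypothetical.

```python
import xml.etree.ElementTree as ET

# Same text written two ways: with entities, and inside a CDATA section.
ESCAPED = "<expr>a &lt; b &amp;&amp; b &gt; c</expr>"
CDATA = "<expr><![CDATA[a < b && b > c]]></expr>"

# The parser expands the entities and unwraps the CDATA section.
escaped_text = ET.fromstring(ESCAPED).text
cdata_text = ET.fromstring(CDATA).text

# Writing the element back out escapes the reserved characters again.
serialized = ET.tostring(ET.fromstring(CDATA), encoding="unicode")

if __name__ == "__main__":
    print(escaped_text)
    print(cdata_text)
    print(serialized)
```

Both forms hand the application the same text. CDATA is mostly a convenience when the data contains many reserved characters.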

XML documents use Unicode characters. You can choose an encoding such as UTF-8 or UTF-16. An XML document is called well formed if it follows the XML syntax rules. I will not go into the detail of all the syntax rules. However, they are few, and we learned them all in the morning session. I will post more later on further XML topics such as XPath and XQuery. Until then, cheers.
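P.S. If you want to check well-formedness yourself, one rough approach is to hand the text to a parser and see whether it complains. The sketch below assumes Python's standard xml.etree.ElementTree module. The helper name is_well_formed and the sample strings are my own inventions.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document):
    """Return True if the parser accepts the document (str or bytes)."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True

# A document stored as UTF-8 bytes, with the encoding named in the XML declaration.
UTF8_DOC = '<?xml version="1.0" encoding="UTF-8"?><name>José</name>'.encode("utf-8")

if __name__ == "__main__":
    samples = [
        "<a><b></b></a>",   # properly nested
        "<a><b></a></b>",   # tags overlap instead of nesting
        "<a></A>",          # end tag differs in case
        "<a/><b/>",         # more than one root element
        "<a x=1/>",         # attribute value not quoted
    ]
    for text in samples:
        print(text, is_well_formed(text))
    print(ET.fromstring(UTF8_DOC).text)
```

This only tests syntax. It says nothing about whether the document makes sense for a particular application.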