Parsing

I am on my third class of an online XML tutorial. This week’s topic was XML parsing, which is how you actually use XML: a parser reads the document and hands its contents to a program. Some web browsers can parse XML. According to the class, Internet Explorer version 4 was the first browser to include a parser. Expat is a free XML parser. Lark is a non-validating XML parser written in Java. That’s all the class had to say about parsers. It was pretty light.

So I also went to W3Schools for some XML parser information. All modern browsers have a built-in XML parser. The parser can convert XML into a DOM object, a tree that a program can then traverse. I also looked up the Expat parser, since my online course mentioned it. Expat is written in the C programming language and is stream-oriented: applications register handlers, and the parser calls back into the application when events of interest occur.
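
To see the difference between the tree approach and the stream approach for myself, I put together a small sketch in Python, whose standard library happens to ship both a DOM implementation (`xml.dom.minidom`) and a binding to Expat (`xml.parsers.expat`). The sample document and the function names `titles_from_dom` and `events_from_expat` are my own inventions for illustration. They do not come from W3Schools or the Expat documentation.

```python
import xml.dom.minidom
import xml.parsers.expat

# Hypothetical sample document, made up for this post.
SAMPLE = "<library><book>Dune</book><book>Emma</book></library>"


def titles_from_dom(text):
    """Tree style: build the whole document in memory, then walk it."""
    doc = xml.dom.minidom.parseString(text)
    return [node.firstChild.data for node in doc.getElementsByTagName("book")]


def events_from_expat(text):
    """Stream style: record each callback as Expat fires it."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver each run of text in one callback
    # Registering handlers: Expat calls these back as it reads the input.
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda data: events.append(("text", data))
    parser.Parse(text, True)  # True marks this as the final chunk of input
    return events


if __name__ == "__main__":
    print(titles_from_dom(SAMPLE))
    for event in events_from_expat(SAMPLE):
        print(event)
```

The DOM version hands back a finished tree that I can query afterwards, while the Expat version never builds a tree at all and only tells me what it saw, in order.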

Finally I went to Sun Microsystems to brush up on XML parsers. Their material is heavily oriented towards parsing with the Java programming language, using SAX. With SAX, data becomes available to the application while the XML is still being parsed, and it is delivered through callbacks. Your first step is to actually obtain a parser. It should comply with the XML specification. Sun recommends the Apache Xerces parser. Xerces is free, and it comes in both C++ and Java versions.
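
The point about data becoming available during parsing is easier to believe when you watch it happen. Here is a rough sketch, again in Python rather than the Java that Sun uses, relying on the incremental `feed()` and `close()` methods of the standard `xml.sax` parser. The `EventLog` class, the `feed_in_chunks` function, and the order document are hypothetical names and data I made up for the experiment.

```python
import xml.sax


class EventLog(xml.sax.ContentHandler):
    """Hypothetical handler that remembers every element name it is told about."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)


def feed_in_chunks(chunks):
    """Feed the parser piece by piece and snapshot what it has reported so far."""
    parser = xml.sax.make_parser()
    log = EventLog()
    parser.setContentHandler(log)
    snapshots = []
    for chunk in chunks:
        parser.feed(chunk)                # hand over only part of the document
        snapshots.append(list(log.seen))  # callbacks received up to this point
    parser.close()                        # signal the end of the input
    return snapshots


if __name__ == "__main__":
    # Made-up document, split in two as if it arrived over a slow connection.
    parts = [b"<order><item>", b"pen</item><item>ink</item></order>"]
    for snapshot in feed_in_chunks(parts):
        print(snapshot)
```

The first snapshot is taken before the parser has seen the end of the document, so anything in it was reported while parsing was still in progress.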

Sun says that you should receive the SAX classes along with your parser, as they are parser dependent. The first thing the application needs to do is instantiate the parser. Then you set up callbacks so that the application can take action on interesting events. This is called registering handlers with SAX. I get the feeling that I need to play around with this some more to really understand how it works.
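
As a first bit of playing around, here is a minimal sketch of those steps: instantiate the parser, register handlers, then parse. I wrote it in Python's `xml.sax` module instead of the Java API that Sun describes, so treat it as an illustration of the pattern and not as Sun's code. The `TitleCollector` class, the `collect_titles` function, and the catalog document are all hypothetical.

```python
import io
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means we are not inside a <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect and join later.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


def collect_titles(xml_bytes):
    parser = xml.sax.make_parser()                  # step 1: instantiate the parser
    handler = TitleCollector()
    parser.setContentHandler(handler)               # step 2: register handlers
    parser.setErrorHandler(xml.sax.ErrorHandler())  # default handler raises on errors
    parser.parse(io.BytesIO(xml_bytes))             # step 3: parse, callbacks fire
    return handler.titles


if __name__ == "__main__":
    sample = b"<catalog><title>Dune</title><title>Emma</title></catalog>"
    print(collect_titles(sample))
```

The handler is where the application logic lives. The parser itself knows nothing about titles and only reports what it reads.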