XML Home: 2008

XSL and XSLT

Once again I got an email from my online XML course. This week there were two topics covered – XSL and XSLT. Unfortunately the coverage of XSLT was very light. I think I have written about XSLT with a little more depth before. If not, it might be time for a little more study and a future post. This week starts with the understanding that XML itself is not concerned with display. That’s where XSL comes in.

XSL stands for eXtensible Stylesheet Language. This language contains formatting information for XML. The really interesting part is that XSL stylesheets are XML documents themselves. XSL was one of the first applications written in XML.

Cascading Style Sheets (CSS) define how XML looks in a web browser. They can also be applied to HTML source. Finally the eXtensible Stylesheet Language for Transformations (XSLT) is a language which transforms XML into other formats. An example would be to transform XML to HTML. However you can choose other output formats, including XML itself.

The heart of XSLT is the use of templates. Now that I look back on my blog posts, I find that I did not cover XSLT in depth anywhere yet. I did see that I wrote myself a note about there being a follow on class to the hands on training I took previously. Maybe that will be a good candidate for me to take next year. Then I will be able to write at length about this complex topic. Until then be well.

Cloud Computing

I thought in this post I would talk about something not directly related to XML. At my local bank, I picked up a free copy of the Business Monthly newspaper. There were many stories of interest. However I really enjoyed reading the Pounding the Keyboard column by Cliff Feldwick. This particular column was on stupid computer ideas. I have not read this paper before, and I never heard of Cliff. However I was impressed that Cliff not only gave out his email address, he provided his telephone number.

Cliff went through a couple of technologies he thought were pointless. One of them was Cloud Computing. He said this is really nothing more than last year’s Software as a Service. In fact, this concept has already been known as Application Service Providers, Business Process Outsourcing, and Managed Service Providers. In other words, this is just another fancy buzzword.

Cloud Computing is a bad idea according to Cliff. I think others share his opinion as well. Essentially cloud computing is where the application you run is hosted in the cloud. It does not physically run on your computer. The risk this poses is what will happen to you if your cloud provider goes bankrupt. You will most likely be out of the service. You may also have lost all your data as well. Running applications directly on your computer is the most secure, as well as the fastest option.

To be fair, cloud computing does have some ideas of interest. This may apply more to the enterprise level. It is certainly easier to manage one cloud server than the many desktops spread across your organization. You can handle backups quicker. It is easier to ensure the one server is always up regardless of power conditions. In general, it should be easier to manage. To tell the truth, I have not been paying much attention myself to the cloud computing hype train. Perhaps this is for the better.

XML Schema Class

I received the latest installment of my online XML course. This class was on XML Schemas. It mentioned that XML started with just Document Type Definitions (DTDs). However the DTD is specified in a format different than XML. The newer XML Schema, by contrast, is actually written in XML format.

The XML Schema is used to define text data within an XML document. You can define elements of simple type. You can also define elements which have subelements. These are called complex types. There are also built in types which come with XML Schema.
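To make the simple versus complex distinction concrete, here is a minimal sketch in Python. The schema and its element names (note, to, body) are made up for illustration. The only point it makes is that an ordinary XML parser can read the schema, because the schema is XML itself. It does not validate any document against the schema.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"  # the XML Schema namespace

# Hypothetical schema: "note" is a complex type holding two simple-type children.
SCHEMA = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def declared_elements(schema_text):
    """Return (name, type) for every xs:element declared in the schema."""
    root = ET.fromstring(schema_text)
    return [(el.get("name"), el.get("type"))
            for el in root.iter("{%s}element" % XS)]
```

Here "to" and "body" use the built in type xs:string, while "note" has no type attribute because its complex type is declared inline.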

Overall I found this class to be a light treatment of a heavy topic. Perhaps there is more information coming in the next class. Let’s get our bearings here for a moment. XML Schema is used for XML document validation. This validation is performed by the parser. It checks the structure and content of an XML document. In other words it looks at the grammar.

XML Schema itself was approved by the W3C back in 2001. To add to what the class taught, simple types can only have text. They cannot contain attributes. Furthermore, elements from schema must have a namespace qualification. This class glossed over or did not even cover some details such as compositors and facets. I think I mentioned these in an earlier post on my blog. Maybe I will write a separate post to cover them in more detail.

DTDs

This week’s lesson in my online XML course was on Document Type Definitions (DTDs). They are required for valid XML documents, though a document can be well formed without one. The DTD is not an XML file itself. It uses its own declaration syntax. It is the grammar of your XML document. The DTD provides structure and meaning to the XML document.

The DTD describes your language. Validation is the process of comparing your XML document to the DTD. If you follow the rules in the DTD, your document is known as being valid.

A good way for developing a DTD is to first write your XML document. Then you can spot the patterns and structure required. You can then go back and write a DTD for future XML documents of the same kind.

The DTD is made up of elements, entities, attributes, and notations. This week we went a little further into what constitutes an element. There are actually three types of elements: simple, compound, and stand alone. Perhaps I shall go into more detail about these types in a future post.

XHTML

I caught an article on Dev Archive by Tina Holmboe entitled “XHTML – Myths and Reality”. I found this to be of interest given my XML activities recently. So I thought I would review some of the things she said, and figure out whether XHTML is in my future.

The article was geared toward newbies in the field of XHTML. That means it is for me. It started by reviewing that SGML is too complex. HTML was made easy to use. XML was also made for easy implementation. It also clarified that XML is not an SGML application. XML is a set of conventions for SGML.

The web is moving from an SGML to XML format. One of the reasons for the move is to harness the power of XML namespaces. Shifting gears a bit, HTTP allows data type identification. This tells the client how to deal with data. XHTML has its own new content type. This tells the client which parser to use on the data.

XHTML is stricter than HTML. There is a lack of XHTML support out there. In fact, no versions of Microsoft’s Internet Explorer support XHTML. That was all I needed to hear to determine that XHTML is not for me. Maybe a better way to put it is that XHTML is not for me just yet. I exclusively use Internet Explorer for everything I do on the web. I wonder why Microsoft is against supporting XHTML.

Attributes

I got another installment from my online XML course. This week we are learning about attributes. They describe the elements. You could say that they provide more information on elements. Attributes are as important as tags. HTML also has attributes. Many HTML attributes deal with margins.

The XML attribute is like an adjective that modifies the element. The syntax is of the format attribute=value. There are several attribute types as follows.

CDATA attributes are character data.

Entity attributes refer to an external item in the XML document.

Enumeration attributes are, as the name implies, a list of values.

Nmtoken attributes are like CDATA, but with more restrictions.

There are also notation attributes. However I have not used them before. Attributes in general are a building block of the XML language. I would consider them second only to elements. You have some choice as to whether to make items elements or attributes. I recall in my formal XML course the instructor went over some guidance on this. Too bad I cannot ask questions in my latest online XML course. This would make for a good discussion.
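The element versus attribute choice can be sketched in a few lines of Python. The book data and the isbn_of helper are hypothetical. The sketch only shows that the same fact can live in either place and that the reading code differs slightly.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the same fact modeled two ways.
AS_ATTRIBUTE = '<book isbn="12345"><title>XML 101</title></book>'
AS_ELEMENT = '<book><isbn>12345</isbn><title>XML 101</title></book>'

def isbn_of(xml_text):
    """Read the ISBN whether it was stored as an attribute or as a child element."""
    book = ET.fromstring(xml_text)
    value = book.get("isbn")            # attribute form: attribute="value"
    if value is None:
        value = book.findtext("isbn")   # element form: <isbn>value</isbn>
    return value
```

Both documents yield the same value. Note that the attribute value has to be quoted in the XML source.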

Elements

I received the latest installment of my online XML course. This week we are covering XML elements. As an aside, I am joining a professional computer association. They state that they have training available for all paid members. One of the courses they teach is XML. So I may have another opportunity to brush up on my XML skills in the future.

The lesson first reviewed the composition of an XML document. It starts with an XML declaration. The rest of the file consists of XML elements. The outermost element is the container element. In a tree notation, each branch of the tree is an element.

The prologue identifies that the file is in XML format. The prologue can include validation such as a DTD or XML Schema. It also contains processing instructions.

It is interesting to note that an XML document can contain only one element. In this case, that element would be the containing element. However it would contain no other nested XML elements. This brings up the fact that elements can contain other elements. All elements except the outermost element are contained within the outermost element. However these elements themselves can contain other XML elements.

The concepts of an element and its tags are distinct. Elements can be empty. For such an element, there is short hand syntax to eliminate the full trailing tag. I think it is safe to say that elements are one of the key parts of an XML document.
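As a small sketch of the empty element shorthand, the Python below parses the long and short forms and checks that they describe the same document. The order and shipped names are invented for the example.

```python
import xml.etree.ElementTree as ET

LONG_FORM = "<order><shipped></shipped></order>"   # full start and end tags
SHORT_FORM = "<order><shipped/></order>"           # empty element shorthand

def child_tags(xml_text):
    """List the tags nested directly inside the outermost element."""
    return [child.tag for child in ET.fromstring(xml_text)]

def same_document(a, b):
    """Compare the parsed trees, not the raw text, by serializing both."""
    return ET.tostring(ET.fromstring(a)) == ET.tostring(ET.fromstring(b))
```

A document such as `<only/>` is also legal. It is a single element that happens to be both the outermost element and empty.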

Parsing

I am on my third class of an online XML tutorial. This week’s topic was XML parsing. This covers how to actually use XML. The parser reads the XML. Some web browsers can parse XML. Internet Explorer version 4 was the first browser to include a parser. Expat is a free XML parser. Lark is a non-validating XML parser written in Java. That’s all the class had to say about parsers. It was pretty light.

So I also went to W3Schools for some XML parser information. All modern browsers have a built in XML parser. The parser can convert XML into a DOM object. The parser traverses XML trees. I also looked up information about the Expat parser since my online course mentioned it. This XML parser is written in the C programming language. It is stream oriented. Applications register handlers, causing the parser to call back to the application when events of interest occur.
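The handler idea is easy to try from Python, whose standard library wraps Expat in the xml.parsers.expat module. This is a minimal sketch. The collect_events function and the event tuples are my own invention for illustration, not part of Expat.

```python
import xml.parsers.expat

def collect_events(xml_text):
    """Parse xml_text with Expat and record each callback as a tuple."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver adjacent character data in one call

    def on_text(data):
        if data.strip():       # ignore whitespace-only text between tags
            events.append(("text", data))

    # Register handlers; Expat calls these back as it streams through the input.
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name, attrs))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = on_text

    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return events
```

Calling collect_events('<a x="1"><b>hi</b></a>') returns the start, text, and end events in document order. No tree is ever built, which is what stream oriented means here.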

Finally I went to Sun Microsystems to brush up on XML parsers. They were heavily oriented towards parsing with the Java programming language. Data becomes available to the application while the XML is being parsed. SAX makes parsing callbacks available. Your first step is to actually obtain a parser. It should comply with the XML specifications for a parser. Sun recommends the Apache Xerces parser. Xerces is free. It works with both the C++ and Java programming languages.

Sun says that you should receive the SAX classes along with your parser, as they are parser dependent. The first thing the application needs to do is instantiate the parser. Then you set up callbacks so that SAX can take action on interesting events. This is called registering handlers with SAX. I get the feeling that I actually need to play around with this some more to get a better feeling of how it works.
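The same three steps (obtain a parser, register a handler, parse) can be sketched with the SAX module that ships with Python. Sun's material is about Java, so treat this as an analogy rather than their code. The TagCounter class and count_tags function are hypothetical names.

```python
import io
import xml.sax

class TagCounter(xml.sax.ContentHandler):
    """Counts how many times each element name starts."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # SAX calls this back for every opening tag it reads.
        self.counts[name] = self.counts.get(name, 0) + 1

def count_tags(xml_text):
    parser = xml.sax.make_parser()       # step 1: instantiate a parser
    handler = TagCounter()
    parser.setContentHandler(handler)    # step 2: register the handler
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))  # step 3: parse
    return handler.counts
```

For example, count_tags("<a><b/><b/></a>") reports one "a" and two "b" elements.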

Channels

Kris Zyp has written some interesting entries on the Comet Daily blog that have to do with channels. The first I saw had to do with HTTP Channels. It is a “publish and subscribe” model to communicate resource changes using HTTP.

Kris has also written about JSON Channels. They are an extension to JSON-RPC. They provide a subscription capability as well. However they are easier to implement than HTTP Channels, as you do not have to parse the HTTP protocol. It can be thought of as an alternative format to HTTP Channels.

In another post, Kris introduces REST Channels. The plan here is to use the web sockets protocol from HTML 5. However the data is in JSON format. You can mix HTTP and JSON in a browser session. So an efficient technique could use HTTP requests, with updates coming back in JSON format.

I do not want to get too deep into the details. The Comet Daily blog is the source for that. It is interesting to see that HTTP is still a continual force in the web. I was also surprised to see that people consider its format a chore for parsing. I have heard colleagues mention how JSON is a good and light format for the web.

I myself am trying to get into web development to prepare myself for where my current project is going. So I am going to need a good handle on all the technology involved. It seems this might be a bit more than just learning XML and its supporting technologies. At least I will have to do a lot of homework to be able to speak intelligently at the architecture level.

Valid XML

Last week I subscribed to an online class for XML. This week I received my second class installment. The topic was valid XML documents. This lesson taught that a valid XML document is one that meets the DTD or XML Schema. It is different from an XML document being well formed, which means an XML processor can read the document. To me well formed means it meets the syntax rules of XML.
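The well formed half of that distinction can be checked with a few lines of Python. This is a minimal sketch using the standard library, which does not validate against a DTD or XML Schema, so it only answers the question of whether a parser can read the document at all. The is_well_formed name is made up.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text obeys the XML syntax rules, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True
```

A document like `<a><b>ok</b></a>` passes. One with crossed tags such as `<a><b>oops</a></b>` fails, and so does one with an unclosed element. Whether a well formed document is also valid is a separate question that needs the DTD or schema and a validating parser.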

The course notes this week pointed out that XML is a self describing language. So you might not always need a DTD or XML Schema. However when you do use validation, the DTD can be local or public. A validation file such as XML Schema describes the XML document. Then a parser validates the XML document against the schema.

One of the most interesting experiences with this week’s class was an ad on the lesson page. The ad offered a free XML viewer. So I clicked through, then downloaded the Firstobject XML editor. It was a free download. The editor divides the XML document into elements and their sub elements. The tool uses a tree like structure in the user interface. I found the tool to be very basic compared to XMLSpy.

The Firstobject XML editor displays XML as text. There does not seem to be much graphic viewing capabilities other than the tree structure. One nice thing about this product is that the company will give you the source code to the editor if you buy their product. They are trying to sell a single MFC class that does all things XML for $249. That’s a lot cheaper than my XMLSpy tool would cost.

Perhaps the best course of action would be for me to write my own XML viewer and/or editor. Then I would really learn the ins and outs of parsing XML documents. I would not have to start from scratch. I could use an XML parser library to actually read the XML document. This would not be cheating. I would be learning the parser’s API.
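A first cut at such a viewer could be as small as the sketch below. It leans on Python's built in ElementTree parser and prints an indented outline, roughly the tree view idea without any user interface. The outline function is a hypothetical starting point, not a description of how any existing editor works.

```python
import xml.etree.ElementTree as ET

def outline(element, depth=0):
    """Return the element tree as a list of indented lines, one per element."""
    label = element.tag
    if element.attrib:
        # Show attributes next to the tag, sorted for a stable display.
        label += " " + " ".join('%s="%s"' % kv for kv in sorted(element.attrib.items()))
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines
```

Printing "\n".join(outline(ET.fromstring(text))) shows each element on its own line, indented two spaces per nesting level.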

Going to the Web

The Software Maintenance blog gives an example of one project that is going to the Web. Moving from a client server environment to a web one involves a number of technologies. Of course there are some bare bones basics like web servers and HTML. But it is so much more than that. Frequently developers are classified as either desktop or web developers. Making the transition can be difficult.

I have sensed a different outlook amongst web developers. That might be a hard trait to emulate. However it should not be difficult to identify some common technologies used by web developers. For example, they may tend to use a combination of Java and JavaScript on the front and back ends. And you can pretty well imagine that they might be using XML format for sending data between systems.

Personally I am planning to enroll in some classes to beef up my web programming skills. The first one is going to be Java 101. I guess that would actually be my second class. I have already taken an introductory XML class.

XML 101

I took an XML training class a couple weeks ago. However I have not been able to do any real XML work due to other constraints at work. Therefore I decided to do an XML refresher to make sure I did not forget the things I learned in class. I signed up for an online course from About dot com. They provided some courses that are controlled through e-mail. I chose the course “XML 101” by Jennifer Kyrnin.

Perhaps this online course is not as current as the recent one my company paid for. This course stated that XML is not a programming language. It is also not a language of tags. Instead, XML is instructions on how to create the tags. It is a markup language to define information. XML is a meta language.

The online course covered Document Type Definitions (DTDs), which are the grammar of the XML document. The second line of an XML file should be the DTD. However you can omit the DTD line.

The latest specifications for XML are at the W3C web site. There is a lot of jargon associated with XML. However XML is not hard to learn. An entity is a storage unit for XML. Processing instructions were covered. Elements in XML are case sensitive.

Kyrnin recommended that you learn HTML before trying to learn XML. She says the benefit of XML is that you can move processing from the server to the client. So far I have only read through the first lesson. There are a lot of ads in the course material I receive. However the course is free so it is a reasonable trade off. Given the coverage of DTDs already, I wonder if this course is going to cover XML Schema as well.

XMLSpy Ad

Last weekend I was browsing the latest copy of Oracle magazine. I saw a full page ad in it for XMLSpy 2008. This product has advanced tools for XML Schema development. It is the self proclaimed world’s best selling XML editor. I recall my XML instructor telling me the same thing.

The ad declared that XMLSpy had many useful features such as support for very large XML files, DTD conversion, UML generation, advanced validation, and code generation for Java/C#/C++. It encourages you to leave the XML Schema details to the XMLSpy program while you concentrate on the business at hand. Altova (the maker of XMLSpy) offers a free 30 day trial download of the program.

I went to the Altova web site to find out more about the program. It has Visual Studio and Eclipse plugins. Of course it offers a visual XML schema editor. It also has a DTD editor. It supports Open XML (OOXML). It can debug SOAP. XMLSpy has a CSS editor.

XMLSpy has a number of tools for XQuery like an editor, debugger, and profiler. It can analyze XPath. It integrates with your database. XMLSpy supports XInclude and XPointer. Hell. I don’t even know what XInclude is. So you know this thing must be good. LOL.

The only downside to XMLSpy is the price. I had been warned about this before. The standard edition goes for $149. The professional edition costs $599. And the enterprise edition is $1190. Now that is not a lot for a development tool. We have third party tools that cost a whole lot more. The problem is that I need to justify the cost to my company and/or client. And that is some non-trivial effort.

Luckily my boss already knows the power of XMLSpy. So now we can work together to convince the powers that be that I need this tool. Altova has done a good job with its marketing. From the grapevine I also hear the product itself is excellent.

Benchmark Tool by Intel

I found a product on the web called the XML Benchmark Tool (XBT) by Intel. It is an XML performance and measurement tool. It analyzes the performance of XML processing engines. It tests a number of things such as XML parsing, XSLT, XML Schema validation, and XPath operations to name a few. You can write your own driver since the tool is provided as a framework. There are versions for Windows and Linux. You can test C++ and Java code.

The XBT is provided for free. I suspect Intel is trying to gather good will to sell its XML Software Suite. This product costs $199 for the developer edition. The run time license goes for $1999. It states that it can handle large XML file processing. It claims to be standards compliant. The product also states that it is thread safe.

I downloaded the free XBT. It was almost 6 Megabytes. The download unzips into a directory structure. There is no install program. I found that a bit strange. You have to manually configure the product yourself. It requires you to obtain and install third party products such as Cygwin, the JDK, and Visual Studio. The free download comes with a PDF user guide. There were too many dependencies and configurations required to get the application up and running quickly.

XBT comes with many familiar drivers. For example it comes with Xerces, JAXP, libxml, MSXML, JDOM, Saxon, and Xalan. This seemed promising. When I signed up to download the free product, I had to provide my e-mail address. It was embarrassing on September 4th when I received the following e-mail from a guy at Intel:

From: censored@intel.com
Date: Sep 4, 2008 9:39 PM
Subject: Out of Office: XML Benchmark Tool
To: me

I will be on vacation with no email access, returning Sept 2nd. Please contact for any items needing my immediate attention.

Ouch. Somebody has fallen asleep at Intel product support. I was not even given the e-mail address of the other guy at Intel. I suspect this out of office message was intended for other Intel Corporation employees. The dude forgot that he was the one who received automatic e-mails every time somebody downloaded the XML Benchmark Tool. LOL.

Schema From XML

A long time ago I used to read the CodeGuru web site for information about software development. I have since graduated to other more interesting web sites. However I still go back to CodeGuru every now and then. Today I saw an article posted there entitled “Inferring an XML Schema from an XML Document” by Paul Kimmel. I read it with interest as I am becoming more aware of XML Schema.

Paul said that writing plain XML documents is easy. However he does not have enough practice to write XML Schema documents from scratch. The attribute and namespace syntax gets him mixed up sometimes. So he decided to write a program to generate an XML Schema from any given XML document. That sounds like a good idea to me.

The trick is to employ the XmlSchemaInference class from the .NET framework. Specifically you can use the InferSchema method of this class to produce an XML Schema Definition Language (XSD) schema if you pass it an XML file. That way you can let the class do the formatting and syntax for you. Sure you could do this yourself by hand if you know the ins and outs of XML Schema. But why take the hard route?

I suspect Paul wrote this article to help publicize his new book “LINQ Unleashed: for C#”. The list price for this book is $49.99. However you can get it for $31.49 from Amazon with free shipping. Good luck Paul with your new title. I have some interest in LINQ myself. If I can get my company to pay for it, I will pick up a copy.

Protocol Buffers

Matt Cutts of Google declared on his blog that Google had open sourced protocol buffers. They encode data in binary format for transmission. You write a description of the protocol you desire. The Google code then generates classes to work with the protocol. It supports the C++, Java, and Python programming languages. There are over 10k protocols used by Google itself.

People have said Protocol Buffers look similar to Facebook Thrift. And Thrift supports even more languages like Perl, PHP, XSD, C#, Ruby, Objective C, Smalltalk, Erlang, OCaml, and Haskell. Matt Cutts has gone on the record as stating that Protocol Buffers predate Thrift. Both Thrift and Protocol Buffers are based on old ideas such as CORBA and IDL. Some have commented that it would be nice if you could map Protocol Buffers automatically to XML.

Protocol buffers create stubs for your RPCs. People have commented that Protocol Buffers look a lot like JSON. Sometimes Google refers to Protocol Buffers as pbuffers. Google uses it exclusively for talk between servers. Google itself uses C++ for programs that run on production machines. This is in order to get the best performance.

I consulted the Google developer’s guide on Protocol Buffers. It declares that they are language neutral, platform neutral, and extensible. Their intended purpose is to serialize structured data. It claims that Protocol Buffers are smaller, faster, and simpler than XML. You define message types in “.proto” files. They contain name value pairs. Fields are numbered in each message type. When you add fields, the result is still backward compatible.
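To get a feel for why the binary form is small and why fields are numbered, here is a hand rolled sketch of how a single integer field is laid out on the wire. It follows the varint encoding described in Google's encoding documentation, but it is not the Google library. In real use the generated classes do this for you. The function names are hypothetical.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint, low bits first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)

def encode_int_field(field_number, value):
    """Encode one integer field as a key (field number plus wire type) and a value."""
    key = (field_number << 3) | 0    # wire type 0 means varint
    return encode_varint(key) + encode_varint(value)
```

Field number 1 holding the value 150 comes out as the three bytes 08 96 01. Only the field number travels on the wire, not the field name. That is why a reader can skip a number it does not recognize, and why adding new fields stays backward compatible.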

Mark Pilgrim, another Google employee, likens a proto file to a schema. It does not contain data. He says that Protocol Buffers are designed to minimize network traffic and maximize performance. They can be nested. And they are both backward and forward compatible. He stated they will not replace JSON.

I have not used Protocol Buffers myself. However if Google uses them that much, there must be some really good benefits to them. Unfortunately my own project at work seems to be going in the direction of XML. I think we are officially prototyping it next year in production.

What is XMPP?

Lately I have been reading up on the whole SOAP versus REST discussion. I saw a comment on one blog that “XMPP was not going to replace REST”. At that point I figured this was a true statement. That is because I had never heard about XMPP before. So I decided to do a little research on the web to find out more about it. This is what I found.

XMPP stands for Extensible Messaging and Presence Protocol. It is a streaming XML technology for instant messaging and buddy lists. It is the core protocol of Jabber, which itself is an open source technology for instant messaging. Jabber’s advantages are that it is open, decentralized, secure, and free. XMPP is an open standard. It is implemented in Google Talk. I hear that it has a large overhead.

XMPP was developed by the Jabber open source community in 1999. There is actually an XMPP Standards Foundation. The specifications for XMPP were produced by the IETF XMPP Working Group. The standards were written in a set of RFCs. For example RFC 3920 is XMPP Core, produced in 2004.

I read a funny blog post by Matt Tucker that said developers should “come to Jesus about the [XMPP] protocol”. I guess to some this is sort of like a religion. Matt goes on to say that XMPP is good for cloud computing. He believes that SOAP does not scale.

Getting a little more into the details, XMPP defines extensible elements called XML stanzas. They are exchanged in real time. The old way of doing things was for clients to poll servers. Examples of this are Gmail and RSS readers. However this method would not work for instant messaging due to the sheer volume of clients. The solution is XMPP. There is obviously a lot more to learn about this technology. There are a lot of extensions to the core standard. I will keep you posted on anything that I learn.
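A stanza is just a small piece of XML, so one can be sketched with nothing more than Python's ElementTree. The addresses below are placeholders and the message_stanza helper is hypothetical. A real client would send this inside an open XML stream through an XMPP library rather than build it by hand.

```python
import xml.etree.ElementTree as ET

def message_stanza(sender, recipient, text):
    """Build a chat message stanza and return it as a string."""
    msg = ET.Element("message", {"from": sender, "to": recipient, "type": "chat"})
    ET.SubElement(msg, "body").text = text
    return ET.tostring(msg, encoding="unicode")

# Placeholder addresses for illustration only.
EXAMPLE = message_stanza("alice@example.com", "bob@example.com", "Hello")
```

The result is a single message element with from, to, and type attributes and a body child holding the text.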

WS Star

The more I look at SOAP and REST, the more I start seeing references to WS-*. I wondered what the heck that meant. Nobody I know seemed to talk about that. And I spend a good deal of time with developers. The problem is that most of us are legacy developers who do not deal a lot with newer technologies on our jobs. I found that WS-* refers to a set of specifications for web services. It is not a single set of specs. And no one body owns all of them. Many of the actual specs begin with WS. So that is why they are collectively known as WS-*. And it is pronounced WS-Star.

Here are some sample WS-* specifications that I have found:

WS-Notification
WS-Addressing
WS-Transfer
WS-Eventing
WS-Enumeration
WS-Policy
WS-Discovery
WS-Metadata Exchange
WS-Resource Framework
WS-Security
WS-Trust
WS-Federation
WS-Reliability
WS-AtomicTransaction
WS-Coordination
WS-CAF
WS-Transaction
WS-Context
WS-CF
WS-Management

I decided to look up one example in more detail. So I chose WS-Policy at random. The full name of the specification is Web Services Policy Framework. There are 20 authors of the spec. Many of them seemed to be affiliated with Microsoft Corporation. For example Don Box was one of the authors. The document itself is 25 pages long.

To tell the truth, I really did not understand the introduction section to the WS-Policy spec. It was filled with a bunch of buzz words. At least I got that the name space for the spec is http://schemas.xmlsoap.org/ws/2004/09/policy. And it helped that they gave an example of the spec in XML code.

There is a reason why the author of Ruby on Rails calls WS-Star the “WS Death Star”.

SOAP and REST

I have read a couple blog posts regarding the Simple Object Access Protocol (SOAP) and Representational State Transfer (REST). Let me start by talking about an InfoWorld article. It discusses a statement by Tim Bray, the director of technologies at Sun Microsystems. He says the “SOAP stack is a failure”. Tim continues that REST is more viable, elegant, and affordable. He says there will be more and more tools for REST from big companies such as Sun, Microsoft, and Oracle. However Tim concedes that there is a lack of current tools for REST.

Another good blog post was “REST as an engineering discipline” by Bill de hOra. Bill comes out and says that SOAP is simple. After all the S in SOAP stands for simple. He contrasts this with REST which is not necessarily simple. Bill qualifies this by saying that REST is neither better nor worse than SOAP. He does state that REST works. But he advises that REST is not for hackers. And there is a lot of hype around REST right now. Things may be a lot different once the hype dies down.

Having done a little SOAP with a lot of help from instructional materials, I can say that it is indeed not all that simple. Yes the outer wrapper of the SOAP envelope may not be rocket science. But from a beginner’s standpoint, you have enough to worry about with XML. Adding the SOAP layer on top for messaging does introduce a little more complexity. Perhaps this will get easier if I work with SOAP more often. For now my project only intends to receive input files in XML format. I am not sure if they shall be validated with an XML schema, or be packed in a SOAP envelope. It is just exciting that the topics I have learned about and will be using are current ones in the technology sector.
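The outer wrapper really is small. The sketch below wraps an arbitrary XML payload in a SOAP 1.1 Envelope and Body using Python's ElementTree. The order payload and the wrap_in_envelope name are made up, and nothing here deals with headers, faults, or transport, which is where the extra complexity tends to live.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

def wrap_in_envelope(payload):
    """Return a SOAP Envelope element whose Body contains the given payload element."""
    ET.register_namespace("soap", SOAP_NS)  # serialize with the soap: prefix
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    body.append(payload)
    return envelope

# Hypothetical payload for illustration.
ENVELOPE = wrap_in_envelope(ET.fromstring('<order id="7"/>'))
```

Serializing ENVELOPE with ET.tostring gives a soap:Envelope element containing a soap:Body, with the order element inside it.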

REST Discussion

It seems there is a lot of buzz about Representational State Transfer (REST) in blogs recently. First there was a post by Damien Katz, which got a response by Dare Obasanjo that I would like to mention first. He pointed out that REST was coined by Roy Fielding who was one of the HTTP 1.1 specification authors. SOAP on the other hand came from the Microsoft camp with people like Don Box. SOAP was then pushed by the W3C. It was of interest so that people could package old enterprise techniques such as CORBA into new buzzwords.

Obasanjo continued that REST is specifically for the client server architecture. The statelessness feature of REST contributes to increased scalability and reliability. REST is a characteristic of the web. A resource is anything that can be named on the web. Regarding PUT and POST, PUT is idempotent. POST is not idempotent. In other words, you can PUT many times, and the outcome is the same. This is not true with POST. I liked the phrase that “session state is evil”. That does not mean that this architectural outlook is correct. In addition, Obasanjo lobbied for developers to not fight against the web.
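The PUT versus POST point can be shown without any web server at all. This is a toy in-memory sketch in Python. The ResourceStore class and its key naming scheme are hypothetical. It only models the one property being discussed, which is what happens when the same request is repeated.

```python
class ResourceStore:
    """Toy stand-in for a server that holds named resources."""

    def __init__(self):
        self.items = {}
        self._next_id = 1

    def put(self, key, value):
        # PUT names the resource, so repeating it leaves the store unchanged.
        self.items[key] = value
        return key

    def post(self, value):
        # POST lets the server pick a new name, so each repeat adds a resource.
        key = "item-%d" % self._next_id
        self._next_id += 1
        self.items[key] = value
        return key
```

Calling put("a", "x") twice leaves one resource in the store. Calling post("x") twice leaves two more.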

In the same regard, I read a REST Questions blog post. The most interesting part of it was the comments from readers. REST says that application state is not supposed to be kept by the server. It is acceptable and desired for the client to keep application state. REST is not necessarily good at communicating errors back to the client. There is definitely a lack of good tools out there for REST. PUT and DELETE are not supported in browsers, causing a limitation to REST.

More REST Questions comments included one that the envelope/wrapper idea from SOAP is not a bad one. However making the envelope be in XML format is the problem. REST may indeed be simple to understand at a high level. However that does not mean it is easy to implement. There is still some confusion out there as to what REST is exactly. For example there is no REST RFC.

I expect the discussion to continue since REST is still one of those hot buzzwords. We shall see what technology wins when all the dust settles. Last year I recall SOA being the buzzword of choice. And before that it was Ajax. Time will tell.

Anti XML

I read a blog post by Brennan Spies entitled “XML Backlash”. This post summarized some of the anti-XML sentiment I have been hearing for some time in the industry. I thought this would be a relevant topic since I just completed a training course in XML myself.

Brennan offered that XML was introduced in 1998. XML has a complementing set of other technologies such as XPath, XSLT, XQuery, and XML Schema. XML is used for configuration and enterprise applications. It is also at the heart of AJAX.

There are other technologies competing with XML. These include Protocol Buffers, JSON, and YAML. Protocol Buffers is a format designed and heavily used by Google. JSON (JavaScript Object Notation) is a simple text based format that is actually language independent. YAML strangely stands for YAML Ain’t Markup Language.

XML does have its benefits. It is both platform and language independent. There are many tools available to work with XML. I personally plan on getting my company to buy me a copy of XMLSpy from Altova, although some developers on my team say that Visual Studio is enough. XML is surprisingly readable. Another person taking training when I did told me she thought XML was pretty easy to read. I guess she had not done XSLT yet. Finally XML does have the whole XML Schema language for validation.

One of the main complaints about XML is that it is overly verbose. That translates into big files. What I found interesting about the blog post that I read were the comments from other people. Some said that XML is good, but not good for everything. Another pointed out that XML is good in that it is web friendly. I found it interesting that many people agreed that files in CSV format were the most common.
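To get a feel for the verbosity complaint, here is a small Python sketch that writes the same record as XML and as JSON and compares the sizes. The customer record and its field names are invented for illustration. This is one tiny sample, not a benchmark.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical customer record, invented for illustration.
record = {"id": "1001", "name": "Ada Lovelace", "city": "London"}

# XML: every field name appears twice, in a start tag and an end tag.
root = ET.Element("customer")
for field, value in record.items():
    ET.SubElement(root, field).text = value
xml_text = ET.tostring(root, encoding="unicode")

# JSON: every field name appears once.
json_text = json.dumps(record, separators=(",", ":"))

print(len(xml_text), xml_text)
print(len(json_text), json_text)
```

Most of the extra size comes from the repeated end tags, so the gap grows with the number of fields rather than with the size of the values.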

The thing I do know is that XML is in my future. We are going to be receiving some input files in XML format next year. This is a requirement dictated to us by our customer. Apparently this change is part of a modernization effort. You have to understand that we deal with big legacy systems at my work. They are mainly COBOL programs that run on the mainframe. So XML might be new in that light. I will keep you posted on how our project is doing with introducing XML. There is one piece of good news. One of my buddies on the project was planning to leave for another project. However he is thinking about sticking around and handling the XML upgrade.

REST

I am starting to become more aware of the term Representational State Transfer (REST) having learned about XML. This term was coined by Roy Fielding in his Ph.D. thesis "Architectural Styles and the Design of Network-based Software Architectures". REST is truly an architectural style. It is not a standard. It also does not deal with implementation. The web (with HTTP) is an example of REST.

REST deals with resources on the web. These resources are all named with URLs. They are accessible via the HTTP GET command. Thus it is a pull type interaction. Furthermore REST is a stateless transaction. Operations cannot expect to retain state from previous transactions to be able to work. However REST does accommodate the need for a cache to speed up network performance.

For now I plan to keep my eyes open and perhaps try to learn more about REST details. My chief goal right now is to put my new XML knowledge into practice. So I will be gearing up for more complex methods for web services like Service Oriented Architecture (SOA). Look for my previous blog post about SOA.

WOA

I just got done reading my latest copy of Information Week magazine. There was an article in there by Roger Smith entitled "Smart Web App Development". It explained the rising popularity of a Web Oriented Architecture (WOA). This was a new term for me. So I read the article carefully. I am glad I recently trained in XML so I could follow some of the topics involved.

WOA is an approach to system design. From what I gather it is based upon Representational State Transfer (REST). REST is a simple data transfer technique over HTTP without using complex wrappers like SOAP. Resources are accessed through URIs. It purports to have better response times, lower server load, and less client code.

WOA is a lighter form of Service Oriented Architecture (SOA). SOA is more complex and expensive to develop. It requires you to send XML messages wrapped in SOAP envelopes. I have blogged about SOA in the past so I will not repeat all the details.

The Information Week article stated that many companies such as Google, Yahoo, Amazon, and Mozilla are going to the simpler WOA approach. The article also admitted that a WOA might not be best for enterprise level computing. In fact it portrays WOA as a complementary technology to SOA. For now I think I will be more into Service Oriented Architecture since I just learned a lot about XML.

AJAX

Today was the last day of my XML Training class. We had a shortened session today. And we covered many subjects. However we did not delve too deeply into any of these subjects. Essentially we reviewed different applications that use XML. One of these uses is Asynchronous JavaScript and XML (AJAX).

Early web pages had to reload the entire web page every time there was a change needed. However this was slow and was overkill when you needed to modify a small portion of the web page. Thus Asynchronous JavaScript and XML (AJAX) was introduced. It allows JavaScript running on the client to make a request to the server. The JavaScript then takes the server response and updates a small portion of the web page. This is done in the background. Therefore this is an asynchronous operation. The AJAX communication between client and server is in XML. The JavaScript sends the request to the server via XML over HTTP. The server also responds in XML format.

The final topic of the class was securing XML. You can utilize HTTPS. This is HTTP sent over an encrypted connection that uses Secure Sockets Layer (SSL) or its successor, Transport Layer Security (TLS). However this is a slow option. An alternative to speed this up is to encrypt select elements that contain sensitive information. And that's it for my class. However there is a follow on class covering advanced XSLT that I may take. If so, I will share all that I learn. I hope this series in my blog has been instructional.

SOA

In this post I continue a review of my last day in XML Training class. We briefly mentioned web services today. Web services run on a server. Clients communicate with the server via XML messages over the HTTP protocol. It is platform independent. The umbrella technology is referred to as Service Oriented Architecture (SOA).

There are three main components to SOA. There is a registry that lists all the services. Providers are the servers that implement the services. Finally requestors are the clients that look up what they need in the registry, and communicate directly with the providers.

The registry information is provided in the Web Services Description Language (WSDL). This acronym is pronounced wiz-dull. WSDL gives information on where the service is located, the format of the input messages, and the format of the output messages. The formatting information is given in XML Schema notation.

Web Service communication is done via messages in XML format. Furthermore these messages adhere to the Simple Object Access Protocol (SOAP). SOAP defines a set of XML tags which stipulate that each XML message is enclosed in a SOAP envelope. The SOAP message is sent over the HTTP protocol.
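Here is a rough Python sketch of what wrapping a message in a SOAP envelope looks like. It uses the SOAP 1.1 envelope namespace. The build_envelope helper, the GetQuote request, and the symbol element are hypothetical names made up for this example. A real service would define its own message format in its WSDL.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soap", SOAP_NS)


def build_envelope(payload):
    """Wrap an application XML element in a SOAP Envelope and Body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)
    return envelope


# Hypothetical request payload; GetQuote and symbol are made-up names.
request = ET.Element("GetQuote")
ET.SubElement(request, "symbol").text = "ACME"

envelope = build_envelope(request)
soap_text = ET.tostring(envelope, encoding="unicode")
print(soap_text)
```

The resulting text is what would travel in the body of an HTTP request to the provider.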

There were a couple other miscellaneous topics covered in class such as Ajax and XML Security. I will defer these to a future post.

XSL-FO

Today was the last day of my XML Training class. We covered a number of topics today. Most of them only provided an introduction to the topic. One such topic was XSL Formatting Objects (XSL-FO). This technology is used for presentation of XML data similar to what HTML does.

The most popular use of XSL-FO is to generate PDF output from XML documents. This is actually a multi step process. First the XML and XSL run through a processor generating the XSL-FO code. Then a formatter takes the XSL-FO and produces the PDF output document.

One implementation of XSL-FO is Apache FOP. Its best feature is that it is free. However it does not implement all of the XSL-FO standard. At the other end of the spectrum is RenderX XEP. It is the most advanced XSL-FO implementation. However it is also the most expensive, costing thousands of dollars.

My instructor did point out that XSL Formatting Objects is not very popular. It is used by government organizations to produce things such as tax forms. It is also used in the financial community. My instructor said that at one class, New York banks were recruiting class members to work for them doing XSL-FO work. They had high paying clients that needed financial documents generated by XSL-FO.

There is a whole follow-up class that teaches more of XSLT and XSL-FO. So we did not go deep into the subject. I will post some more on the other topics we covered today such as web services, SOA, SOAP, WSDL, and XML Security. Be sure to tune in.

XSL

The majority of XML is structured around how data is stored. It does not concern itself with presentation. This is where the Extensible Stylesheet Language (XSL) comes in. XSL has a similar functionality to XQuery. However XSL is much older than XQuery. It allows you to transform XML to a presentation format such as HTML.

There are two places you can perform an XSL transformation. It can be done on the server with the resulting output such as HTML being piped to the client. Or you can send both the XML and XSL data for the client to do the transformation. It is recommended that the server do this. You cannot always guarantee that a client Web browser has the capability to do the transformation.

The transformation makes use of an XSL stylesheet. The default behavior of an empty stylesheet is to traverse the children of an XML data tree and print the PCDATA (text) content. However you normally add template matches to the stylesheet in XPath format. When such a match is made, the code in that template block is executed and the traversal normally stops. You can issue an apply-templates command in this block. That causes XSL to continue processing the nodes you instruct it to (with the default being all child nodes in the tree).

Here are a couple closing ideas on this brief intro to XSL. The XSL templates are never nested. And their order in the XSL file is not important. Execution is determined by the order of nodes in the XML tree, and the XPath information in the templates. Tomorrow we are going to learn how to use XSL formatting objects to turn XML into a PDF file. There is also another 3 day long class on the topic of XSL transformations by itself. So you know it is a huge topic that I cannot even start to cover here. I just wanted to share what little bit I have learned. I can't wait to get back to work and put some of this XML knowledge into action.

More XML Schema

This afternoon I had a rapid introduction to the XML Schema language. If you have not done so already, see my first post on XML Schema. In addition to being able to specify an element format (or type), you can specify its cardinality. This is done with the element attributes minOccurs and maxOccurs. The minOccurs attribute defaults to 1, can be set to whatever you desire, and can represent an optional element if set to 0. The maxOccurs attribute also defaults to 1, and it cannot be less than minOccurs. If you want to specify that there is no maximum, the maxOccurs attribute can be specified as "unbounded".
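As a small illustration of cardinality, the Python sketch below reads minOccurs and maxOccurs from a schema fragment and fills in a default of 1 for each when the attribute is missing, which is the default the XML Schema specification defines. The order schema and the cardinalities function are invented for this example. The code only inspects the schema text and does not validate anything.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment; the element names are invented.
SCHEMA = f"""
<xsd:schema xmlns:xsd="{XSD_NS}">
  <xsd:element name="order">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="customer" type="xsd:string"/>
        <xsd:element name="note" type="xsd:string" minOccurs="0"/>
        <xsd:element name="item" type="xsd:string" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""


def cardinalities(schema_text):
    """Map each element inside a sequence to its (minOccurs, maxOccurs)."""
    root = ET.fromstring(schema_text)
    path = f".//{{{XSD_NS}}}sequence/{{{XSD_NS}}}element"
    return {
        decl.get("name"): (decl.get("minOccurs", "1"), decl.get("maxOccurs", "1"))
        for decl in root.findall(path)
    }


for name, (low, high) in cardinalities(SCHEMA).items():
    print(name, low, high)
```

Here customer must appear exactly once, note is optional, and item must appear at least once with no upper limit.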

Previously I had introduced the concept of complex types. They represent elements that carry attributes, or elements made up of other sub elements. XML Schema allows you to specify fine control over what a particular complex type element requires. This is done in part by what's known as a compositor property. The valid compositors are sequence, choice, and all. A sequence compositor means that the pieces of a complex type are required by default and must be in a specific order. A choice compositor allows you to list alternative components, of which the element contains one. And an all compositor allows specification of a list of complex type parts, each of which can have 0 or 1 occurrences in the whole element, in any order.

XML Schema allows you to define a global element. Such an element must be declared as a direct child of the schema's root element. The opposite of this is a local element which is defined within the compositor. You can also define new data types. A simple new data type is one which derives from a built in type, but restricts the range of possible values through a mechanism called a facet. I will not enumerate all the possible facet varieties here. You can also create a new data type which is of the complex variety.

XML by itself has a way to add a comment. That should be used if you have an internal comment. XML Schema has another form of adding comments to an XML file. However these comments get pulled into the official documentation. So make sure they are appropriate for public consumption. You can use XML Schemas with XML namespaces. There is a slightly tricky syntax change that must be applied to both the XML file and the XML Schema to make namespaces work. XML Schemas can include other XML Schemas (.xsd files). Finally you can use a tool to convert an old style Document Type Definition (DTD) into the newer XML Schema language.

XML Schema

I continue to pass on information learned in my XML Training class. Previously I had learned that a well formed XML document is one that has correct syntax. That just means that the formatting meets the XML rules. But this has little to do with whether the required type of data, or the correct data format, is contained in the file. For this we need XML validation.

XML validation used to be performed by a Document Type Definition (DTD). This is another file which describes the required rules to determine whether the XML data is valid. DTDs did not work well with XML. They allowed specification of structure, but they were weak on data types. This led to the creation of a new validation language called XML Schema.

Initially there were many flavors of XML Schema. However on May 2, 2001, a single version of XML Schema became a standard. Although you can code XML Schema documents using a text editor, it is much easier to use a tool to design the schema graphically. XML Schema has an XML namespace that normally is denoted with the alias xsd. XML Schema comes with 40+ built in data types. These are the simple types. You can also use a complex type which is either a set of sub elements, attributes, or a combination of both.

There is a lot more to what you can specify in the XML Schema language. I think I will save that for a future post. Mind you that we learned all about XML Schema in one afternoon. I am in a demanding training class.

DOM

This afternoon in my XML training class we learned about the Document Object Model (DOM). It is an API used to programmatically work with XML data parsed into a tree structure. You should use it if you need to do more than just read and write XML. Otherwise you should use the simpler XQuery language. DOM is object oriented. It is also an open standard.

Here is a list of crucial DOM API functions: createElement, appendChild, createTextNode, and setAttribute. Some other lesser used DOM functions are createComment, createElementNS, createCDATASection, and createProcessingInstruction. You will also need to make a call to a loadXMLDocument function prior to using any of the DOM functions. The loadXMLDocument function is not a part of the DOM API. Instead it is provided by the implementation of the parser you choose.
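Here is a minimal sketch of those four crucial functions using Python's built in xml.dom.minidom module, which is a different DOM implementation from the one used in class. The library and book data are made up. The getDOMImplementation call stands in for the parser specific loading step, since that part is not in the DOM API.

```python
from xml.dom.minidom import getDOMImplementation

# Implementation specific step: minidom creates the empty document here.
document = getDOMImplementation().createDocument(None, "library", None)
root = document.documentElement

# Hypothetical data: a single book entry built with the core DOM calls.
book = document.createElement("book")
book.setAttribute("isbn", "0000000000")
title = document.createElement("title")
title.appendChild(document.createTextNode("Learning XML"))
book.appendChild(title)
root.appendChild(book)

dom_xml = root.toxml()
print(dom_xml)
```

Note that the text "Learning XML" is its own node in the tree, hanging under the title element.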

We used Firebug to debug our DOM code. Firebug is a Firefox extension written by one of the original authors of Firefox. Note that it is a little ambiguous seeing a text node and the text data itself in Firebug. However if you mouse over a text node, it will show an underline signifying that it is a node.

My XML training class is skipping over the Simple API for XML (SAX). It is an API used to deal with parsers that are event driven as opposed to tree driven. I am not sure if we skipped this topic because it is not used as much, or if it is more complicated, or that there was just not enough time to cover it. Any of you readers use SAX? Let me know.
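Since class skipped SAX, here is a rough sketch of what event driven parsing looks like, using Python's built in xml.sax module. The TitleCollector class and the sample document are invented for illustration. The parser never builds a tree. It calls the handler methods as it meets each start tag, run of text, and end tag.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element as events arrive."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None   # None means we are not inside a <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


# Hypothetical document, invented for illustration.
DOC = (
    b"<library>"
    b"<book><title>Learning XML</title></book>"
    b"<book><title>XSLT Basics</title></book>"
    b"</library>"
)

handler = TitleCollector()
xml.sax.parseString(DOC, handler)
print(handler.titles)
```

The handler has to track its own state, such as whether it is inside a title. That bookkeeping is the price for not holding the whole document in memory.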

XML and Databases

This morning I attended my second day of XML training. We spent a lot of the morning doing exercises to reinforce the stuff we had learned. However I thought I would pass on some of the new information I have learned. This was paid training that my company paid big bucks for. You get the benefit of this training here for free.

The Microsoft Excel application stores data in its native Excel format by default. However it has the ability to export to XML format. You need to map spreadsheet cells to XML elements. This is done by first choosing XML Source from the XML submenu of the Data menu. Once you have done this, you can choose to Save As file format "XML Data".

There are a number of ways to store XML data in a database. One technique is to store the whole XML file as a record in a column of a native XML type. Oracle databases that are version 9i and above support this. SQL Server databases that are version 2005 and above also support this. If you have an older version of Oracle or SQL Server, or if you have a different database that does not have an XML native data type, you can still store the whole XML file as a CLOB data type.

Another way to store XML data to a database is called Shredding. This is where you just extract the character and attribute data. This data is then inserted into columns in the database. The XML formatting (e.g. tags) is lost in the process. This has the benefit of using less space in the database. It also makes for faster queries. However this method is not recommended if your XML data is unstructured, or if you plan to transfer data in XML format frequently.
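Here is a minimal sketch of shredding in Python. It uses the built in sqlite3 module as a stand in database, not Oracle or SQL Server. The orders document, the table, and the column names are all made up for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input file content.
XML_DATA = """
<orders>
  <order id="1"><customer>Ada</customer><total>19.99</total></order>
  <order id="2"><customer>Grace</customer><total>5.00</total></order>
</orders>
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)

# Shred: keep only the attribute and character data, and drop the tags.
for order in ET.fromstring(XML_DATA).findall("order"):
    conn.execute(
        "INSERT INTO orders (id, customer, total) VALUES (?, ?, ?)",
        (
            int(order.get("id")),
            order.findtext("customer"),
            float(order.findtext("total")),
        ),
    )
conn.commit()

rows = conn.execute(
    "SELECT id, customer, total FROM orders ORDER BY id"
).fetchall()
print(rows)
```

Once the data is in ordinary columns, plain SQL can filter and sort it. Getting XML back out means rebuilding the tags from the rows.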

Once you have the data stored in an XML database, some databases have built in support to query and retrieve the data. In Oracle you use the built in package DBMS_XMLGEN. This allows you to retrieve XML from the database and generate an XML output file. If the resulting format by default is not what you want, you can furthermore run the database output through XQuery to reformat it.

Unfortunately our class had computers which only had Oracle XE (Database Express Edition) installed. This version of Oracle had a limitation that did not allow you to use advanced features to format results using XQuery directly. We had to pipe the output to an XML file first. Then we used another tool to format the results. I wish my training company had spent a little extra and got a license for a real Oracle database version. However I can go back to work, try it out on Oracle, and let you know the results.

It is time to get back to class. Lunch break is over. I shall post again to let you know what I learn this afternoon. I hope the information I am sharing is of some use to you. If your company has the budget, I would recommend you also attend formal XML training. I am learning much here.

XPath and XQuery

Ok so I am finally getting around to talking a little about XPath and XQuery. Sorry for the delay. Note that this was information I got on my first day in XML training. I am sure there will be a lot more information that I learn and pass on to you.

XPath is actually a part of the Extensible Stylesheet Language (XSL). It became so popular that its use has been expanded for XML purposes. XPath is a language that allows you to specify a path that will return certain parts of an XML document. It is very popular. And it is 10 years old. Here is the important symbology in the XPath language:

/ = document
* = element
text() = PCDATA
@* = attribute
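To try a few path expressions, here is a small sketch using Python's built in ElementTree module. Be aware that ElementTree only supports a limited subset of XPath. It handles element paths, the * wildcard, and attribute predicates, but not text() or @* as standalone steps. The library document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for illustration.
DOC = """
<library>
  <book id="1"><title>Learning XML</title><price>30</price></book>
  <book id="2"><title>XSLT Basics</title><price>25</price></book>
</library>
"""
root = ET.fromstring(DOC)

# Path steps separated by "/" select elements by name.
titles = [t.text for t in root.findall("./book/title")]

# "*" matches any child element, whatever its name.
child_names = [child.tag for child in root.findall("./book[@id='1']/*")]

# A predicate on an attribute narrows the match.
second_title = root.findtext("./book[@id='2']/title")

print(titles, child_names, second_title)
```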

Now let's move on to XQuery. About 80% of XQuery is just XPath syntax. It is very similar to SQL. It is used to interrogate an XML file and produce a subset in XML format. The most common pattern in XQuery is as follows:

for ... in ...
where ...
order by ...
return ...
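I do not have an XQuery engine handy, so here is the same for, where, order by, return shape mimicked in plain Python over an ElementTree document. This is only an analogy to show what each clause does. It is not XQuery, and the cheap_titles function and library data are made up. Unlike XQuery it returns a Python list rather than XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for illustration.
DOC = """
<library>
  <book><title>XSLT Basics</title><price>25</price></book>
  <book><title>DOM in Depth</title><price>45</price></book>
  <book><title>Learning XML</title><price>30</price></book>
</library>
"""


def cheap_titles(library, limit):
    """Mimic a for / where / order by / return query in plain Python."""
    books = library.findall("book")                        # for ... in ...
    selected = [b for b in books
                if float(b.findtext("price")) < limit]     # where ...
    ordered = sorted(selected,
                     key=lambda b: b.findtext("title"))    # order by ...
    return [b.findtext("title") for b in ordered]          # return ...


result = cheap_titles(ET.fromstring(DOC), 40)
print(result)
```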

Introduction to XML

As I previously blogged about, I am currently in an XML training class. So I thought I would share some of the things I learned today. In fact, this post will cover what I learned this morning. The class costs over $2600 over 4 days. That averages out to over $650 a day. This morning's information cost my company $325. So listen up and learn a thing or two free of charge.

XML is a method of data transfer that intends to be simple and cost effective. It is text based. XML is an open standard. It is a subset of SGML. Although it is very powerful, SGML is complex. Only one person in my class of 24 knew SGML well enough to confess it. A related markup language is HTML, which is an application of SGML.

A newer version of HTML called XHTML has a stricter syntax to conform to XML. HTML is for presentation in a web browser. XML on the other hand is for expressing data. Another data transfer method is Electronic Data Interchange (EDI). EDI is expensive and more complex. Let's now focus in on some XML details.

The basic unit of XML is an element. An element is demarcated by a start and end tag. Elements can have optional attributes. The outermost element in an XML file is called the root or document element. XML is case sensitive. One way to process XML files is by use of a parser. Microsoft provides the MSXML parser with Internet Explorer. And Apache has the free Xerces parser.
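Here is a short sketch of these basics using the parser that ships with Python, which is neither of the two parsers named above. The order document is made up. The second half shows case sensitivity in action, since an end tag spelled with a different case does not close the element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one root element with two attributes and a child.
DOC = '<order id="42" status="open"><item>widget</item></order>'
root = ET.fromstring(DOC)

root_tag = root.tag               # the root (document) element
attributes = dict(root.attrib)    # optional attributes from the start tag
item_text = root.find("item").text

# XML is case sensitive: </Item> does not close <item>.
try:
    ET.fromstring("<order><item>widget</Item></order>")
    mismatch_rejected = False
except ET.ParseError:
    mismatch_rejected = True

print(root_tag, attributes, item_text, mismatch_rejected)
```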

You can put text in an element. XML calls this parsed character data (PCDATA). Some special codes within PCDATA expand to reserved characters. These are called entities. An example of an entity is &lt; which is the less than sign (<). You can put special characters in data directly. But you then need to surround them with an unparsed character data (CDATA) section.
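The sketch below shows both ways of getting reserved characters into an element, using Python's standard library. The sample text and the code element name are invented. The escape function converts the reserved characters into entities, while the CDATA section leaves them as they are.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Text containing characters that are reserved in XML.
raw = "if a < b & b > c"

# Option 1: replace the reserved characters with entities.
escaped = escape(raw)
from_entities = ET.fromstring(f"<code>{escaped}</code>").text

# Option 2: wrap the text, unchanged, in a CDATA section.
from_cdata = ET.fromstring(f"<code><![CDATA[{raw}]]></code>").text

print(escaped)
print(from_entities == raw, from_cdata == raw)
```

Either way the parser hands back the original text, so the choice mostly comes down to which form is easier to read in the file.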

XML documents are encoded in Unicode. You can choose a format such as UTF-8 or UTF-16. An XML document is called well formed if it has a correct syntax. I will not go into the detail of all the XML syntax rules. However they are few and we learned them all in the morning session. I will post some more later on further XML topics such as XPath and XQuery. Until then cheers.

XML Course

Three months ago I started working for a new company. At first I was assigned to a manager that was very busy. He told me this was temporary. Later I got reassigned to another manager. My new manager had the time to explain some of the benefits available for working at our company. Once a year we are allowed to attend 4 days' worth of training, with a $2500 budget to attend a class. That sounded good to me.

Our project is about to receive input files in XML. So I thought it would be good to attend a class on this topic. I looked at a catalog for a training company. They had a 4 day class. That matched the time my company gave me off. However the cost was $2650. I asked my manager what we could do. She said I needed to get her boss to sign off on the extra $150. Her boss told me that his boss had to sign off.

It took a couple calls. However I finally found out that the full price for the course had been approved. Today I started the first day of training. The instructor had a strong Polish accent. At the beginning of class she told everyone to pay attention at all times and not do anything else. She said we needed to learn while we were in her class. If we did not like this, she recommended we leave and sign up for another instructor. Harsh!

So far I have received a morning's worth of instruction. It has been a pretty good introduction. Since XML itself is not complicated, we have learned the entirety of the syntax of the language. The remaining days will be spent learning different XML applications and languages such as XML Schema. I think I am going to stick it through with this dictator instructor. It would be too much trouble to reschedule the class with another instructor.

I got a good feeling at lunch today. I ordered some Chinese food, and got the following fortune in my cookie:

You're transforming yourself into someone who is certain to succeed.
