What is XMPP?

Lately I have been reading up on the whole SOAP versus REST discussion. I saw a comment on one blog that “XMPP was not going to replace REST”. At that point I figured this was a true statement. That is because I had never heard about XMPP before. So I decided to do a little research on the web to find out more about it. This is what I found.

XMPP stands for Extensible Messaging and Presence Protocol. It is a streaming XML technology for instant messaging and buddy lists. It is the core protocol of Jabber, which itself is an open source technology for instant messaging. Jabber’s advantages are that it is open, decentralized, secure, and free. XMPP is an open standard. It is implemented in Google Talk. I hear that it has a large overhead.

XMPP was developed by the Jabber open source community in 1999. There is actually an XMPP Standards Foundation. The specifications for XMPP were produced by the IETF XMPP Working Group. The standards were written as a set of RFCs. For example RFC 3920 is XMPP Core, produced in 2004.

I read a funny blog post by Matt Tucker that said developers should “come to Jesus about the [XMPP] protocol”. I guess to some this is sort of like a religion. Matt goes on to say that XMPP is good for cloud computing. He believes that SOAP does not scale.

Getting a little more into the details, XMPP defines extensible elements called XML stanzas. They are exchanged in real time. The old way of doing things was for clients to poll servers. Examples of this are Gmail and RSS readers. However this method would not work for instant messaging due to the sheer volume of clients. The solution is XMPP. There is obviously a lot more to learn about this technology. There are a lot of extensions to the core standard. I will keep you posted on anything that I learn.
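
To make the idea of a stanza a bit more concrete, here is a minimal sketch in Python that builds one. It assumes the common message stanza layout (a message element with from, to, and type attributes and a body child). The addresses and the function name are made up for illustration, and nothing is actually sent over a network.

```python
import xml.etree.ElementTree as ET

def build_message_stanza(sender, recipient, body_text):
    """Build a <message/> stanza as an in-memory XML element (hypothetical helper)."""
    message = ET.Element("message", {"from": sender, "to": recipient, "type": "chat"})
    body = ET.SubElement(message, "body")
    body.text = body_text
    return message

# Example addresses are placeholders, not real accounts.
stanza = build_message_stanza("alice@example.com", "bob@example.com", "Hello, Bob")
stanza_xml = ET.tostring(stanza, encoding="unicode")
print(stanza_xml)
```

In a real XMPP session a stanza like this would travel inside a long-lived XML stream between client and server, which is what lets the server push it to the recipient instead of waiting to be polled.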

WS Star

The more I looked at SOAP and REST, the more I started seeing references to WS-*. I wondered what the heck that meant. Nobody I know seemed to talk about that. And I spend a good deal of time with developers. The problem is that most of us are legacy developers who do not deal a lot with newer technologies on our jobs. I found that WS-* refers to a set of specifications for web services. It is not a single set of specs. And no one body owns all of them. Many of the actual specs begin with WS. So that is why they are collectively known as WS-*. And it is pronounced WS-Star.

Here are some sample WS-* specifications that I have found:

WS-Notification
WS-Addressing
WS-Transfer
WS-Eventing
WS-Enumeration
WS-Policy
WS-Discovery
WS-Metadata Exchange
WS-Resource Framework
WS-Security
WS-Trust
WS-Federation
WS-Reliability
WS-AtomicTransaction
WS-Coordination
WS-CAF
WS-Transaction
WS-Context
WS-CF
WS-Management

I decided to look up one example in more detail. So I chose WS-Policy at random. The full name of the specification is Web Services Policy Framework. There are 20 authors of the spec. Many of them seemed to be affiliated with Microsoft Corporation. For example Don Box was one of the authors. The document itself is 25 pages long.

To tell the truth, I really did not understand the introduction section to the WS-Policy spec. It was filled with a bunch of buzz words. At least I got that the name space for the spec is http://schemas.xmlsoap.org/ws/2004/09/policy. And it helped that they gave an example of the spec in XML code.

There is a reason why the author of Ruby on Rails calls WS-Star the “WS Death Star”.

SOAP and REST

I have read a couple blog posts regarding the Simple Object Access Protocol (SOAP) and Representational State Transfer (REST). Let me start by talking about an InfoWorld article. It discusses a statement by Tim Bray, the director of technologies at Sun Microsystems. He says the “SOAP stack is a failure”. Tim continues that REST is more viable, elegant, and affordable. He says there will be more and more tools for REST from big companies such as Sun, Microsoft, and Oracle. However Tim concedes that there is a lack of current tools for REST.

Another good blog post was “REST as an engineering discipline” by Bill de hOra. Bill comes out and says that SOAP is simple. After all the S in SOAP stands for simple. He contrasts this with REST which is not necessarily simple. Bill qualifies this by saying that REST is neither better nor worse than SOAP. He does state that REST works. But he advises that REST is not for hackers. And there is a lot of hype around REST right now. Things may be a lot different once the hype dies down.

Having done a little SOAP with a lot of help from instructional materials, I can say that it is indeed not all that simple. Yes the outer wrapper of the SOAP envelope may not be rocket science. But from a beginner’s standpoint, you have enough to worry about with XML. Adding the SOAP layer on top for messaging does introduce a little more complexity. Perhaps this will get easier if I work with SOAP more often. For now my project only intends to receive input files in XML format. I am not sure if they shall be validated with an XML schema, or be packed in a SOAP envelope. It is just exciting that the topics I have learned about and will be using are current ones in the technology sector.

REST Discussion

It seems there is a lot of buzz about Representational State Transfer (REST) in blogs recently. First there was a post by Damien Katz, which got a response by Dare Obasanjo that I would like to mention first. He pointed out that REST was coined by Roy Fielding who was one of the HTTP 1.1 specification authors. SOAP on the other hand came from the Microsoft camp with people like Don Box. SOAP was then pushed by the W3C. It was of interest so that people could package old enterprise techniques such as CORBA into new buzzwords.

Obasanjo continued that REST is specifically for the client server architecture. The statelessness feature of REST contributes to increased scalability and reliability. REST is a characteristic of the web. A resource is anything that can be named on the web. Regarding PUT and POST, PUT is idempotent. POST is not idempotent. In other words, you can PUT many times, and the outcome is the same. This is not true with POST. I liked the phrase that “session state is evil”. That does not mean that this architectural outlook is correct. In addition, Obasanjo lobbied for developers to not fight against the web.
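
The difference between PUT and POST can be sketched with a toy in-memory store. This is only an illustration in Python of the idempotency idea. The class, its method names, and the paths are all invented for the example, and no real HTTP is involved.

```python
class ResourceStore:
    """A toy stand-in for a web server's resources (hypothetical, no real HTTP)."""

    def __init__(self):
        self.resources = {}
        self.next_id = 1

    def put(self, path, representation):
        # PUT names the resource, so repeating it overwrites the same entry.
        self.resources[path] = representation
        return path

    def post(self, collection, representation):
        # POST lets the server pick a new name, so repeating it adds entries.
        path = f"{collection}/{self.next_id}"
        self.next_id += 1
        self.resources[path] = representation
        return path

store = ResourceStore()
store.put("/notes/today", "buy milk")
store.put("/notes/today", "buy milk")
count_after_puts = len(store.resources)

store.post("/notes", "buy milk")
store.post("/notes", "buy milk")
count_after_posts = len(store.resources)

print(count_after_puts, count_after_posts)
```

Two identical PUT calls leave one resource behind, while two identical POST calls each create a new one.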

In the same regard, I read a REST Questions blog post. The most interesting part of it was the comments from readers. REST says that application state is not supposed to be kept by the server. It is acceptable and desired for the client to keep application state. REST is not necessarily good at communicating errors back to the client. There is definitely a lack of good tools out there for REST. PUT and DELETE are not supported in browsers, causing a limitation to REST.

More REST Questions comments included one that the envelope/wrapper idea from SOAP is not a bad one. However making the envelope be in XML format is the problem. REST may indeed be simple to understand at a high level. However that does not mean it is easy to implement. There is still some confusion out there as to what REST is exactly. For example there is no REST RFC.

I expect the discussion to continue since REST is still one of those hot buzz words. We shall see what technology wins when all the dust settles. Last year I recall SOA being the buzzword of choice. And before that it was Ajax. Time will tell.

Anti XML

I read a blog post by Brennan Spies entitled “XML Backlash”. This post summarized some of the anti-XML sentiment I have been hearing for some time in the industry. I thought this would be a relevant topic since I just completed a training course in XML myself.

Brennan offered that XML was introduced in 1998. XML has a complementing set of other technologies such as XPath, XSLT, XQuery, and XML Schema. XML is used for configuration and enterprise applications. It is also at the heart of AJAX.

There are other technologies competing with XML. These include Protocol Buffers, JSON, and YAML. Protocol Buffers is a format designed and heavily used by Google. JSON (JavaScript Object Notation) is a simple text based format that is actually language independent. YAML strangely stands for YAML Ain’t Markup Language.

XML does have its benefits. It is both platform and language independent. There are many tools available to work with XML. I personally plan on getting my company to buy me a copy of XMLSpy from Altova. Although some developers on my team say that Visual Studio is enough. XML is surprisingly readable. Another person taking training when I did told me she thought XML was pretty easy to read. I guess she had not done XSLT yet. Finally XML does have the whole XML Schema language for validation.

One of the main complaints about XML is that it is overly verbose. That translates into big files. What I found interesting about the blog post that I read were the comments from other people. Some said that XML is good, but not good for everything. Another pointed out that XML is good in that it is web friendly. I found it interesting that many people agreed that files in CSV format were most common.
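
The verbosity complaint is easy to see with a small experiment. The sketch below, in Python, writes the same made-up record once as XML and once as JSON and compares the lengths. The record and element names are my own example data, and one tiny record does not prove anything about real files.

```python
import json
import xml.etree.ElementTree as ET

# A made-up record used only for the comparison.
record = {"name": "Ada", "city": "London", "age": "36"}

# Write the record as XML: one child element per field.
root = ET.Element("person")
for key, value in record.items():
    child = ET.SubElement(root, key)
    child.text = value
xml_text = ET.tostring(root, encoding="unicode")

# Write the same record as JSON.
json_text = json.dumps(record)

print(len(xml_text), xml_text)
print(len(json_text), json_text)
```

The XML version comes out longer mostly because every field name appears twice, once in the start tag and once in the end tag.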

The thing I do know is that XML is in my future. We are going to be receiving some input files in XML format next year. This is a requirement dictated to us by our customer. Apparently this change is part of a modernization effort. You have to understand that we deal with big legacy systems at my work. They are mainly COBOL programs that run on the mainframe. So XML might be new in that light. I will keep you posted on how our project is doing with introducing XML to it. There is one piece of good news. One of my buddies on the project was planning to leave for another project. However he is thinking about sticking around and handling the XML upgrade.

REST

I am starting to become more aware of the term Representational State Transfer (REST) having learned about XML. This term was coined by Roy Fielding in his Ph.D. thesis "Architectural Styles and the Design of Network-based Software Architectures". REST is truly an architectural style. It is not a standard. It also does not deal with implementation. The web (with HTTP) is an example of REST.

REST deals with resources on the web. These resources are all named with URLs. They are accessible via the HTTP GET command. Thus it is a pull type interaction. Furthermore REST is a stateless transaction. Operations cannot expect to retain state from previous transactions to be able to work. However REST does accommodate the need for a cache to speed up network performance.

For now I plan to keep my eyes open and perhaps try to learn more about REST details. My chief goal right now is to put my new XML knowledge into practice. So I will be gearing to more complex methods for web services like Service Oriented Architecture (SOA). Look for my previous blog post about SOA.

WOA

I just got done reading my latest copy of Information Week magazine. There was an article in there by Roger Smith entitled "Smart Web App Development". It explained the rising popularity of a Web Oriented Architecture (WOA). This was a new term for me. So I read the article carefully. I am glad I recently trained in XML so I could follow some of the topics involved.

WOA is an approach to system design. From what I gather it is based upon Representational State Transfer (REST). REST is a simple data transfer technique over HTTP without using complex wrappers like SOAP. Resources are accessed through URIs. It purports to have better response times, lower server load, and less client code.

WOA is a lighter form of Service Oriented Architecture (SOA). SOA is more complex and expensive to develop. It requires you to send XML messages wrapped in SOAP envelopes. I have blogged about SOA in the past so I will not repeat all the details.

The Information Week article stated that many companies such as Google, Yahoo, Amazon, and Mozilla are going to the simpler WOA approach. The article also admitted that a WOA might not be best for enterprise level computing. In fact it portrays WOA as a complementary technology to SOA. For now I think I will be more into Service Oriented Architecture since I just learned a lot about XML.

AJAX

Today was the last day of my XML Training class. We had a shortened session today. And we covered many subjects. However we did not delve too deeply into any of these subjects. Essentially we reviewed different applications that use XML. One of these uses is Asynchronous JavaScript and XML (AJAX).

Early web pages had to reload the entire web page every time there was a change needed. However this was slow and was overkill when you needed to modify a small portion of the web page. Thus Asynchronous JavaScript and XML (AJAX) was introduced. It allows JavaScript running on the client to make a request to the server. The JavaScript then takes the server response and updates a small portion of the web page. This is done in the background. Therefore this is an asynchronous operation. The AJAX communication between client and server is in XML. The JavaScript sends the request to the server via XML over HTTP. The server also responds in XML format.

The final topic of the class was securing XML. You can utilize HTTPS. This is HTTP sent over an encrypted connection using what was called Secure Sockets Layer (SSL). However this is a slow option. An alternative to speed this up is to encrypt select elements that contain sensitive information. And that's it for my class. However there is a follow on class covering advanced XSLT that I may take. If so, I will share all that I learn. I hope this series in my blog has been instructional.

SOA

In this post I continue a review of my last day in XML Training class. We briefly mentioned web services today. Web services run on a server. Clients communicate with the server via XML messages over the HTTP protocol. It is platform independent. The umbrella technology is referred to as Service Oriented Architecture (SOA).

There are three main components to SOA. There is a registry that lists all the services. Providers are the servers that implement the services. Finally requestors are the clients that look up needs in the registry, and communicate directly with the providers.

The registry information is provided in the Web Services Description Language (WSDL). This acronym is pronounced wiz-dull. WSDL gives information on where the service is located, the format of the input messages, and the format of the output messages. The formatting information is given in XML Schema notation.

Web Service communication is done via messages in XML format. Furthermore these messages adhere to the Simple Object Access Protocol (SOAP). SOAP defines a few XML tags that enclose each XML message in a SOAP envelope. The SOAP message is sent over the HTTP protocol.
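
Here is a rough sketch in Python of what wrapping a message in a SOAP envelope looks like. It assumes the SOAP 1.1 envelope namespace. The payload (a GetQuote request with a symbol) and the helper name are hypothetical and not taken from my class or from any real service.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (assumed version for this sketch).
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soap", SOAP_NS)

def wrap_in_envelope(payload):
    """Place an XML payload inside a SOAP Envelope/Body pair (hypothetical helper)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)
    return envelope

# A made-up request message.
payload = ET.Element("GetQuote")
ET.SubElement(payload, "symbol").text = "XYZ"

envelope = wrap_in_envelope(payload)
envelope_xml = ET.tostring(envelope, encoding="unicode")
print(envelope_xml)
```

The resulting text is what a client would place in the body of an HTTP request to the provider.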

There were a couple other miscellaneous topics covered in class such as Ajax and XML Security. I will defer these to a future post.

XSL-FO

Today was the last day of my XML Training class. We covered a number of topics today. Most of them only provided an introduction to the topic. One such topic was XSL Formatting Objects (XSL-FO). This technology is used for presentation of XML data similar to what HTML does.

The most popular use of XSL-FO is to generate PDF output from XML documents. This is actually a multi step process. First the XML and XSL run through a processor generating the XSL-FO code. Then a formatter takes the XSL-FO and produces the PDF output document.

One implementation of XSL-FO is the Apache FOP. Its best feature is that it is free. However it does not implement all of the XSL-FO standard. At the other end of the spectrum is RenderX XEP. It is the most advanced XSL-FO implementation. However it is also the most expensive, costing thousands of dollars.

My instructor did point out that XSL Formatting Objects is not very popular. It is used by government organizations to produce things such as tax forms. It is also used in the financial community. My instructor said that at one class, New York banks were recruiting class members to work for them doing XSL-FO work. They had high paying clients that needed financial documents generated by XSL-FO.

There is a whole follow-up class that teaches more of XSLT and XSL-FO. So we did not go deep into the subject. I will post some more on the other topics we covered today such as web services, SOA, SOAP, WSDL, and XML Security. Be sure to tune in.

XSL

The majority of XML is structured around how data is stored. It does not concern itself with presentation. This is where the Extensible Stylesheet Language (XSL) comes in. XSL has a similar functionality to XQuery. However XSL is much older than XQuery. It allows you to transform XML to a presentation format such as HTML.

There are two places you can perform an XSL transformation. It can be done on the server with the resulting output such as HTML being piped to the client. Or you can send both the XML and XSL data for the client to do the transformation. It is recommended that the server do this. You cannot always guarantee that a client Web browser has the capability to do the transformation.

The transformation makes use of an XSL stylesheet. The default behavior of an empty stylesheet is to traverse the children of an XML data tree and print the PCDATA elements. However you normally add template matches to the stylesheet in XPath format. When such a match is made, the code in that template block is executed and the traversal normally stops. You can issue an apply-templates command in this block. That causes XSL to continue processing the templates you instruct it to (with the default being all children elements in the tree).
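
The traversal described above can be imitated in a few lines of Python. To be clear, this is not XSLT and does not use an XSLT processor. It is a toy model I wrote where templates are plain functions looked up by element name (real stylesheets match on XPath patterns), and the sample document is made up.

```python
import xml.etree.ElementTree as ET

def apply_templates(node, templates):
    """Process a node with its matching template, or with the default rule."""
    template = templates.get(node.tag)
    if template is not None:
        # A match was made: the template decides whether traversal continues.
        return template(node, templates)
    # Default rule: print the text and keep walking the children.
    parts = [node.text or ""]
    for child in node:
        parts.append(apply_templates(child, templates))
        parts.append(child.tail or "")
    return "".join(parts)

def title_template(node, templates):
    # Does not call apply_templates, so traversal stops here.
    return f"<h1>{node.text}</h1>"

def book_template(node, templates):
    # Calls apply_templates on the children, so traversal continues.
    inner = "".join(apply_templates(child, templates) for child in node)
    return f"<div>{inner}</div>"

doc = ET.fromstring(
    "<library><book><title>XML Basics</title><price>20</price></book></library>"
)

empty_result = apply_templates(doc, {})
styled_result = apply_templates(doc, {"book": book_template, "title": title_template})
print(empty_result)
print(styled_result)
```

With no templates only the text comes out. With the two templates, the book and title elements are turned into HTML while the unmatched price element still falls through to the default rule.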

Here are a couple closing ideas on this brief intro to XSL. The XSL templates are never nested. And their order in the XSL file is not important. Execution is determined by the order of nodes in the XML tree, and the XPath information in the templates. Tomorrow we are going to learn how to use XSL formatting objects to turn XML into a PDF file. There is also another 3 day long class on the topic of XSL transformations by itself. So you know it is a huge topic that I cannot even start to cover here. I just wanted to share what little bit I have learned. I can't wait to get back to work and put some of this XML knowledge into action.

More XML Schema

This afternoon I had a rapid introduction to the XML Schema language. If you have not done so already, see my first post on XML Schema. In addition to being able to specify an element format (or type), you can specify its cardinality. This is done with the element attributes minOccurs and maxOccurs. The minOccurs attribute defaults to 1, can be set to whatever you desire, and can represent an optional element if set to 0. The maxOccurs attribute also defaults to 1. If you want to specify that there is no maximum, the maxOccurs attribute can be specified as "unbounded".
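
To show what the cardinality rule means in practice, here is a small hand-written check in Python. It is not a schema validator. The function name and the sample order document are invented, and the defaults of 1 for both bounds are my reading of the XML Schema rules.

```python
import xml.etree.ElementTree as ET

def occurs_ok(parent, child_tag, min_occurs=1, max_occurs=1):
    """Check how often child_tag appears directly under parent (hypothetical helper)."""
    count = len(parent.findall(child_tag))
    if count < min_occurs:
        return False
    if max_occurs != "unbounded" and count > max_occurs:
        return False
    return True

# A made-up document with two item elements and one note element.
order = ET.fromstring("<order><item/><item/><note/></order>")

items_unbounded = occurs_ok(order, "item", 1, "unbounded")  # at least one item
items_default = occurs_ok(order, "item")                    # exactly one item
coupon_optional = occurs_ok(order, "coupon", 0, 1)          # optional element
coupon_required = occurs_ok(order, "coupon")                # required element

print(items_unbounded, items_default, coupon_optional, coupon_required)
```

Setting the minimum to 0 is what makes the missing coupon element acceptable, while leaving both bounds at their defaults rejects the second item.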

Previously I had introduced the concept of complex types. They represent elements that carry attributes, or elements made up of other sub elements. XML Schema allows you to specify fine control over what a particular complex type element requires. This is done in part by what's known as a compositor. The valid compositors are sequence, choice, and all. A sequence compositor means that the pieces of a complex type are required by default and must be in a specific order. A choice compositor lists the alternative components of a complex type, of which only one may appear. And an all compositor allows specification of a list of complex type parts, each of which can have 0 or 1 occurrences in the whole element, in any order.

XML Schema allows you to define a global element. Such an element must be a direct child of the root element of the schema file. The opposite of this is a local element which is defined within the compositor. You can also define new data types. A simple new data type is one which derives from a built in type, but restricts the range of possible values through a mechanism called a facet. I will not enumerate all the possible facet varieties here. You can also create a new data type which is of the complex variety.

XML by itself has a way to add a comment. That should be used if you have an internal comment. XML Schema has another form of adding comments to an XML file. However these comments get pulled into the official documentation. So make sure they are appropriate for public consumption. You can use XML Schemas with XML namespaces. There is a little tricky syntax change that must be applied to both the XML file and the XML Schema to make namespaces work. XML Schemas can include other XML Schemas (.xsd files). Finally you can use a tool to convert an old style Document Type Definition (DTD) into the newer XML Schema language.

XML Schema

I continue to pass on information learned in my XML Training class. Previously I had learned that a well formed XML document is one that has correct syntax. That just means that the formatting meets the XML rules. But this has little to do whether the required type of data, or the correct data format is contained in the file. For this we need XML validation.

XML validation used to be performed by a Document Type Definition (DTD). This is another file which describes the required rules to determine whether the XML data is valid. DTDs did not work well with XML. They allow specification of structure definition. They were weak on data types. This led to creation of a new validation language called XML Schema.

Initially there were many flavors of XML Schema. However on 05/02/2001, a single version of XML Schema became a standard. Although you can code XML Schema documents using a text editor, it is much easier to use a tool to design the schema graphically. XML Schema has an XML namespace that normally is denoted with the alias xsd. XML Schema comes with 40+ built in data types. These are the simple types. You can also use a complex type which is either a set of sub elements, attributes, or a combination of both.

There is a lot more to what you can specify in the XML Schema language. I think I will save that for a future post. Mind you that we learned all about XML Schema in one afternoon. I am in a demanding training class.

DOM

This afternoon in my XML training class we learned about the Document Object Model (DOM). It is an API used to programmatically work with XML data parsed into a tree structure. You should use it if you need to do more than just read and write XML. Otherwise you should use the simpler XQuery language. DOM is object oriented. It is also an open standard.

Here is a list of crucial DOM API functions: createElement, appendChild, createTextNode, and setAttribute. Some other lesser used DOM functions are createComment, createElementNS, createCDATASection, createProcessingInstruction. You will need to also make calls to a loadXMLDocument function prior to using any of the DOM functions. The loadXMLDocument function is not a part of the DOM API. Instead it is provided by the implementation of the parser you choose.
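
The four crucial functions can be tried out with the DOM implementation that ships with Python (xml.dom.minidom). This is only a sketch. We used JavaScript in a browser in class, and the element names and the isbn value below are made up. Instead of loading a document, the sketch creates an empty one, which is an assumption on my part to keep the example self-contained.

```python
from xml.dom.minidom import getDOMImplementation

# Create an empty document with a <library> root element.
implementation = getDOMImplementation()
document = implementation.createDocument(None, "library", None)
root = document.documentElement

# Build <book isbn="000-0"><title>XML Basics</title></book> node by node.
book = document.createElement("book")
book.setAttribute("isbn", "000-0")
title = document.createElement("title")
title.appendChild(document.createTextNode("XML Basics"))
book.appendChild(title)
root.appendChild(book)

document_xml = root.toxml()
print(document_xml)
```

Note that the text inside the title is its own node, created with createTextNode and attached with appendChild like any other child.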

We used Firebug to debug our DOM code. Firebug is a Firefox extension written by one of the original authors of Firefox. Note that it is a little ambiguous seeing a text node and the text data itself in Firebug. However if you mouse over a text node, it will show an underline signifying that it is a node.

My XML training class is skipping over the Simple API for XML (SAX). It is an API used to deal with parsers that are event driven as opposed to tree driven. I am not sure if we skipped this topic because it is not used as much, or if it is more complicated, or that there was not just enough time to cover it. Any of you readers use SAX? Let me know.
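
Since class skipped SAX, here is a small sketch of the event driven style using the xml.sax module from the Python standard library. The handler class name and the sample document are my own invention. The parser calls the handler methods as it meets each start tag, run of text, and end tag, and no tree is ever built.

```python
import xml.sax

class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every <title> element as parser events arrive."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so collect them.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

sample = (
    b"<library>"
    b"<book><title>XML Basics</title></book>"
    b"<book><title>More XML</title></book>"
    b"</library>"
)

handler = TitleCollector()
xml.sax.parseString(sample, handler)
print(handler.titles)
```

The trade-off compared to DOM is that the code has to remember where it is in the document, but in exchange the whole file never has to sit in memory at once.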

XML and Databases

This morning I attended my second day of XML training. We spent a lot of the morning doing exercises to reinforce the stuff we had learned. However I thought I would pass on some of the new information I have learned. This was paid training that my company paid big bucks for. You get the benefit of this training here for free.

The Microsoft Excel application stores data in its native Excel format by default. However it has the ability to export to XML format. You need to map spreadsheet cells to XML elements. This is done by first choosing XML Source from the XML submenu of the Data menu. Once you have done this, you can choose to Save As file format "XML Data".

There are a number of ways to store XML data in a database. One technique is to choose to store the whole XML file as a record in a column of type native XML. Oracle databases that are version 9i and above support this. SQL Server databases that are version 2005 and above also support this. If you have an older version of Oracle or SQL Server, or if you have a different database that does not have an XML native data type, you can still store the whole XML file as a CLOB data type.

Another way to store XML data to a database is called Shredding. This is where you just extract the character and attribute data. This data is then inserted into columns in the database. The XML formatting (e.g. tags) is lost in the process. This has the benefit of using less space in the database. It also makes for faster queries. However this method is not recommended if your XML data is unstructured, or if you plan to transfer data in XML format frequently.
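
Shredding can be sketched with Python and an in-memory SQLite database. We used Oracle in class, so the database choice, the table layout, and the sample data here are all assumptions for illustration only.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up input document.
xml_text = """<library>
  <book isbn="111"><title>XML Basics</title></book>
  <book isbn="222"><title>More XML</title></book>
</library>"""

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT)")

# Shred: keep only the attribute and character data, one row per <book>.
for book in ET.fromstring(xml_text).findall("book"):
    connection.execute(
        "INSERT INTO book (isbn, title) VALUES (?, ?)",
        (book.get("isbn"), book.findtext("title")),
    )

rows = connection.execute("SELECT isbn, title FROM book ORDER BY isbn").fetchall()
print(rows)
connection.close()
```

Once the data is in columns, ordinary SQL queries work on it, but the original tags are gone and would have to be regenerated to get XML back out.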

Once you have the data stored in an XML database, some databases have built in support to query and retrieve the data. In Oracle you use the built in package DBMS_XMLGEN. This allows you to retrieve XML from the database and generate an XML output file. If the resulting format by default is not what you want, you can furthermore run the database output through XQuery to reformat it.

Unfortunately our class had computers which only had Oracle XE (Database Express Edition) installed. This version of Oracle had a limitation that did not allow you to use advanced features to format results using XQuery directly. We had to pipe the output to an XML file first. Then we used another tool to format the results. I wish my training company had spent a little extra and got a license for a real Oracle database version. However I can go back to work, try it out on Oracle, and let you know the results.

It is time to get back to class. Lunch break is over. I shall post again to let you know what I learn this afternoon. I hope the information I am sharing is of some use to you. If your company has the budget, I would recommend you also attend formal XML training. I am learning much here.

XPath and XQuery

Ok so I am finally getting around to talking a little about XPath and XQuery. Sorry for the delay. Note that this was information I got on my first day in XML training. I am sure there will be a lot more information that I learn and pass on to you.

XPath is actually a part of the Extensible Stylesheet Language (XSL). It became so popular that its use has been expanded for XML purposes. XPath is a language that allows you to specify a path that will return certain parts of an XML document. It is very popular. And it is 10 years old. Here is the important symbology in the XPath language:


  • / = document
  • * = element
  • text() = PCDATA
  • @* = attribute
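
A few of these symbols can be tried with the xml.etree.ElementTree module in Python. Be aware that this module only supports a limited subset of XPath. It has no text() or @* selectors, so the sketch reads text and attributes from the matched elements instead. The sample document is made up.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<library>'
    '<book isbn="111"><title>XML Basics</title></book>'
    '<book isbn="222"><title>More XML</title></book>'
    '</library>'
)

# "*" selects any child element of the current node.
child_tags = [element.tag for element in root.findall("./*")]

# A path of element names walks down the tree, and .text stands in for text().
titles = [title.text for title in root.findall("./book/title")]

# .attrib stands in for @*, giving all attributes of each matched element.
attributes = [book.attrib for book in root.findall("./book")]

# A predicate in square brackets filters on an attribute value.
second_title = root.find("./book[@isbn='222']/title").text

print(child_tags, titles, attributes, second_title)
```
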
Now let's move on to XQuery. About 80% of XQuery is just XPath syntax. It is very similar to SQL. It is used to interrogate an XML file and produce a subset in XML format. The most common pattern in XQuery is as follows:

for ... in ...
where ...
order by ...
return ...
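
For readers who know Python better than XQuery, the same four clauses can be lined up against plain Python. This is only an analogy, not XQuery, and the sample document and the price threshold are made up.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<library>"
    "<book><title>More XML</title><price>35</price></book>"
    "<book><title>XML Basics</title><price>20</price></book>"
    "<book><title>Cheap Tricks</title><price>5</price></book>"
    "</library>"
)

# for ... in ...   : pick the nodes to loop over
books = root.findall("book")

# where ...        : keep only the nodes that pass a condition
matching = [book for book in books if float(book.findtext("price")) > 10]

# order by ...     : sort what is left
ordered = sorted(matching, key=lambda book: book.findtext("title"))

# return ...       : build the output for each remaining node
result = [book.findtext("title") for book in ordered]

print(result)
```

One difference worth keeping in mind is that XQuery returns XML, while this sketch returns a Python list of strings.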

Introduction to XML

As I previously blogged about, I am currently in an XML training class. So I thought I would share some of the things I learned today. In fact, this post will cover what I learned this morning. The class costs over $2600 over 4 days. That averages out to over $650 a day. This morning's information cost my company $325. So listen up and learn a thing or two free of charge.

XML is a method of data transfer that intends to be simple and cost effective. It is text based. XML is an open standard. It is a subset of SGML. Although it is very powerful, SGML is complex. Only one person in my class of 24 knew SGML well enough to confess it. A related markup language is HTML, which is an implementation of SGML.

A newer version of HTML called XHTML has a stricter syntax to conform to XML. HTML is for presentation in a web browser. XML on the other hand is for expressing data. Another data transfer method is Electronic Data Interchange (EDI). EDI is expensive and more complex. Let's now focus in on some XML Details.

The basic unit of XML is an element. An element is demarcated by a start and end tag. Elements can have optional attributes. The outermost element in an XML file is called the root or document element. XML is case sensitive. One way to process XML files is by use of a parser. Microsoft provides the MSXML parser with Internet Explorer. And Apache has the free Xerces parser.

You can put text in an element. XML calls this parsed character data (PCDATA). Some special codes within PCDATA expand to reserved characters. These are called entities. An example of an entity is &lt; which is the less than sign (<). You can put special characters in data. But you then need to surround them with an unparsed character data (CDATA) section.
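
Both ways of getting a less than sign into an element can be checked with a parser. The sketch below uses Python's xml.etree.ElementTree. The rule element is just an example name I made up.

```python
import xml.etree.ElementTree as ET

# The entity &lt; expands to "<" when the PCDATA is parsed.
with_entity = ET.fromstring("<rule>a &lt; b</rule>")

# Inside a CDATA section the "<" can be written as-is.
with_cdata = ET.fromstring("<rule><![CDATA[a < b]]></rule>")

# When written back out, the special character is escaped again.
escaped = ET.tostring(with_cdata, encoding="unicode")

print(with_entity.text)
print(with_cdata.text)
print(escaped)
```

After parsing, the two elements hold the same text, so a program reading the data cannot tell which form the author typed.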

XML documents are encoded in Unicode. You can choose a format such as UTF-8 or UTF-16. An XML document is called well formed if it has a correct syntax. I will not go into the detail of all the XML syntax rules. However they are few and we learned them all in the morning session. I will post some more later on further XML topics such as XPath and XQuery. Until then cheers.
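
A quick way to test whether a document is well formed is to hand it to a parser and see if it complains. Here is a minimal sketch in Python. The function name is my own, and the check says nothing about whether the data is valid against a DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

good = is_well_formed("<a><b/></a>")
unclosed = is_well_formed("<a><b></a>")      # <b> is never closed
wrong_case = is_well_formed("<a></A>")       # tags are case sensitive

print(good, unclosed, wrong_case)
```
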

XML Course

Three months ago I started working for a new company. At first I was assigned to a manager that was very busy. He told me this was temporary. Later I got reassigned to another manager. My new manager had the time to explain some of the benefits available for working at our company. Once a year we are allowed to attend 4 days worth of training, with a $2500 budget to attend a class. That sounded good to me.

Our project is about to receive input files in XML. So I thought it would be good to attend a class on this topic. I looked at a catalog for a training company. They had a 4 day class. That matched the time my company gave me off. However the cost was $2650. I asked my manager what we could do. She said I needed to get her boss to sign off on the extra $150. Her boss told me that his boss had to sign off.

It took a couple calls. However I finally found out that the full price for the course had been approved. Today I started the first day of training. The instructor had a strong Polish accent. At the beginning of class she told everyone to pay attention at all times and not do anything else. She said we needed to learn while we were in her class. If we did not like this, she recommended we leave and sign up for another instructor. Harsh!

So far I have received a morning worth of instruction. It has been a pretty good introduction. Since XML itself is not complicated, we have learned the entirety of the syntax of the language. The remaining days will be spent learning different XML applications and languages such as XML Schema. I think I am going to stick it through with this dictator instructor. It would be too much trouble to reschedule the class with another instructor.

I got a good feeling at lunch today. I ordered some Chinese food, and got the following fortune in my cookie:

You're transforming yourself into someone who is certain to succeed.