XML and Databases

This morning I attended my second day of XML training. We spent a lot of the morning doing exercises to reinforce the stuff we had learned. However I thought I would pass on some of the new information I have learned. This was paid training that my company paid big bucks for. You get the benefit of this training here for free.

The Microsoft Excel application stores data in its native Excel format by default. However it has the ability to export to XML format. You need to map spreadsheet cells to XML elements. This is done by first choosing XML Source from the XML submenu of the Data menu. Once you have done this, you can choose to Save As file format "XML Data".

There are a number of ways to store XML data in a database. One technique is to choose to store the whole XML file as a record in a column of type native XML. Oracle databases that are version 9i and above support this. SQL Sever databases that are version 2005 and above also support this. If you have an older version of Oracle or SQL Server, or if you have a different database that does not have an XML native data type, you can still store the whole XML file as a CLOB data type.

Another way to store XML data to a database is called Shredding. This is where you just extract the character and attribute data. This data is then inserted into columns in the database. The XML formatting (e.g. tags) is lost in the process. This has the benefit of using less space in the database. It also makes for faster queries. However this method is not recommended if your XML data is unstructured, or if you plan to transfer data in XML format frequently.

Once you have the data stored in an XML database, some databases have built in support to query and retrieve the data. In Oracle you use the built in package DBMS_XMLGEN. This allows you to retrieve XML from the database and generate an XML output file. If the resulting format by default is not what you want, you can furthermore run the database output through XQuery to reformat it.

Unfortunately our class had computers which only had Oracle XE (Database Express Edition) installed. This version of Oracle had a limitation that did not allow you to use advanced features to format results using XQuery directly. We had to pipe the output to an XML file first. Then we used another tool to format the results. I wish my training company had spent a little extra and got a license for a real Oracle database version. However I can go back to work, try it out on Oracle, and let you know the results.

It is time to get back to class. Lunch break is over. I shall post again to let you know what I learn this afternoon. I hope the information I am sharing is of some use to you. If your company has the budget, I would recommend you also attend formal XML training. I am learning much here.