XML… in the Cloud !

April 16, 2017

Over the past couple of years I have focused a lot on Cloud Data Services and less on XML as a specialty topic.  But sure enough, all of the great XML processing capabilities that you know from DB2 pureXML are also available in cloud database services such as dashDB and DB2 on Cloud.

DB2 on Cloud is a so-called hosted service, which means that clients can rent a DB2 server in an IBM cloud data center where IBM manages the hardware, storage, and networking while the customer has full control (and full responsibility) over the administration of the software stack, including the OS and DB2. For example, applying fix packs as well as overall database administration, including backup/restore and database configuration, is all in the hands of the customer. This gives you a lot of control and flexibility, but also requires a certain degree of effort and skill.

In contrast, dashDB is a fully managed service in the cloud. dashDB is also based on DB2 software but the entire software and database administration is done by an IBM operations team. This service includes software patches and upgrades, database backup/restore, setting database configuration parameters, defining table spaces and bufferpools, and so on. This takes a lot of the everyday database admin efforts off your shoulders so you can focus directly on your application use case: creating tables, loading data, running queries, setting permissions for your users, connecting applications, and so on.

Both dashDB and DB2 on Cloud provide key XML capabilities, such as:

  • The XML data type and native XML columns
  • SQL/XML functions such as XMLQUERY, XMLTABLE, XMLEXISTS, and XMLVALIDATE
  • XQuery
  • XML indexes
  • XML Schema support through the XML Schema Repository (XSR)

Note that there are two flavors of dashDB, known as dashDB for Analytics and dashDB for Transactions. The latter is the more suitable choice for most XML processing applications.

The overall benefit of such cloud services is faster time to value. You don’t need to spend time on setting up a server and installing software, and you have no up-front investment. You pay for the service only for as long as you use it, like a subscription.

The next time you need relational or XML database capabilities, consider dashDB and DB2 on Cloud as simple and convenient options! Both are available in an IBM data center near you!

Encapsulating database operations in stored procedures has a variety of potential benefits. For example, stored procedures allow you to code more complex data manipulation logic than what can be done with an individual SQL statement.  Also, combining multiple operations in a single stored procedure can improve performance because the operations are executed close to the data (in the database engine) with a single invocation from the application, reducing the number of API calls to the database.

For these reasons it is quite common practice to implement pieces of XML processing logic in database stored procedures. Some applications want the database to validate XML documents against an XML Schema, which is easy to do as part of insert, update, or load operations, or even in database queries. But how do you validate XML in a stored procedure?

It’s easy. Let’s look at the following procedure as a simple example. It takes an XML document “doc” as an input parameter, and returns a flag “isvalid” which is 1 if the document is valid and 0 if the document is not valid. Of course, you could return more detailed information if needed, but 0 and 1 should be sufficient as an example.

The procedure declares a condition for SQLSTATE 2200M, which is raised when an XML document is not well-formed or fails an attempted schema validation. If that state occurs during the execution of the stored procedure, then the corresponding exit handler gets executed. In this example, the exit handler simply sets “isvalid” to zero.

CREATE PROCEDURE processXML(IN doc XML, OUT isvalid INTEGER)
 BEGIN
  DECLARE invalid_document CONDITION FOR '2200M';
  DECLARE EXIT HANDLER FOR invalid_document 
       SET isvalid = 0; -- indicate that doc failed to validate
  IF (XMLVALIDATE(doc ACCORDING TO XMLSCHEMA ID db2admin.myschema) IS VALIDATED)
   THEN
      -- indicate that the document is valid:
      SET isvalid = 1;
      -- then process the XML as intended:
             ....
             ....
             ....
    END IF;
END # -- end of procedure

 

After these declarations, the first operation in the stored procedure is the document validation with the XMLVALIDATE function. The XMLVALIDATE function specifies the name (ID) of the XML Schema that is being used for validation. In this example, the schema name is hardcoded as “db2admin.myschema”, but it could also be passed in as another parameter to the stored procedure.
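
Note that XMLVALIDATE can only reference an XML Schema that has already been registered in DB2's XML Schema Repository (XSR). The following is only a rough sketch of such a registration from the command line; the schema URI and file path are placeholders:

-- Register the schema under the SQL name db2admin.myschema, then complete it
-- so that it can be used for validation:
REGISTER XMLSCHEMA http://example.com/order.xsd FROM file:///home/db2admin/order.xsd AS db2admin.myschema
COMPLETE XMLSCHEMA db2admin.myschema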

If the validation is successful (i.e. IS VALIDATED is true) then we set the “isvalid” flag to 1 and proceed with whatever processing we want to perform on the XML document. If the validation is not successful, then the procedure executes the exit handler and terminates.

This example provides a simple skeleton for XML stored procedures, to ensure that the XML processing in the body of the procedure is performed only if validation is successful.

I spent this week in Philadelphia at the annual DB2 user group conference. The week was packed with excellent technical sessions, panel discussions, customer presentations, hands-on labs, seminars, and lots of networking opportunities.

As you would expect from such a big event, a very large variety of DB2 and application development topics were on the agenda, including DB2 BLU, high availability, DB2 11 for z/OS, query optimization, DB2 in the Cloud, and many others.

And, almost 9 years since the introduction of pureXML in DB2 9.1, XML was still a popular topic at this year’s conference, with some excellent sessions from DB2 users and from IBMers:

E03 – XML – Essentials for the DB2 z/OS DBA
Thanikachalam “Billy” Sundarrajan, Fifth Third Bank

E05 – XML and JSON in Examples
Jan Marek, CA Technologies

E17 – XML Data Processing Performance
Aleskey Slutch and Alexander Veremyev, Raiffeisenbank Russia

Hands-on Lab: Learning Exciting JSON and XML in IBM DB2 for z/OS
Jane Man, IBM Corporation

XML and BLU

December 24, 2013

As you probably noticed, DB2 10.5 features a new technology called BLU Acceleration which provides significant gains in performance, reduced storage consumption, and simplicity for analytical database workloads. To this end, BLU combines column-organized storage with novel compression techniques, hardware-specific optimizations, data skipping, new techniques for join processing and aggregations, and other innovations.

Since the release of BLU in June 2013, I have been asked several times whether BLU can be used with XML or how BLU relates to XML in general.

The short answer is that a BLU table, i.e. a column-organized table, cannot contain columns of type BLOB, CLOB, or XML at this point in time. If you use BLU in a DB2 10.5 database, your XML documents can be stored in the same database but need to reside in traditional tables which are not column-organized.

Fundamentally, XML and BLU follow two very different approaches to managing data.

The concept of XML documents is to keep all attributes for a given business object (e.g. an order, a product, a contract) physically together as one entity so that you can easily retrieve or insert all the relevant information for an object in one shot. This is particularly useful for complex objects whose attributes would otherwise be spread over many relational tables. XML is also useful for objects whose set of attributes can evolve over time, since XML also offers schema flexibility. The main focus of using XML is on the business object as the unit of interest.

In contrast, BLU is designed to store all attributes separately from each other, purposefully breaking up the objects, even more so than traditional storage formats for relational tables. The focus of BLU is on accessing individual attributes across all or many objects as efficiently as possible. The column-organized storage purposefully exploits the fact that many analytical questions never need to see whole objects but only a (small) subset of the attributes. The unit of interest is the individual attribute, not the entire objects.

As such, XML and BLU serve different purposes and access patterns, both valid in their own right.

Analytical queries that need to scan, join, filter, aggregate, etc. on individual attributes across a large number of objects will perform much better with BLU than with XML.

Applications that often need to read or write entire objects, esp. complex objects, perform much better with XML than with BLU. Similarly, queries that typically access most or all columns of a relational table perform better on traditional row-organized tables than on column-organized tables.

Now what if you have a subset of attributes in your XML documents that you want to run analytical queries on, ideally across all or many of the documents? In this case you can consider extracting some of these attributes for redundant storage in a BLU table. This can be done with triggers, stored procedures, or application logic, as sketched below.
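
To make that idea concrete, here is a rough sketch of the trigger-based approach; all table, column, and element names are made up for illustration. An AFTER INSERT trigger extracts two values from each new XML document and copies them into a column-organized table:

-- Hypothetical row-organized table that holds the XML documents:
CREATE TABLE orders_xml (id INTEGER, doc XML) ORGANIZE BY ROW;

-- Hypothetical column-organized (BLU) table for analytics on extracted values:
CREATE TABLE orders_blu (custid INTEGER, amount DECIMAL(10,2)) ORGANIZE BY COLUMN;

-- After each insert into orders_xml, copy selected values into orders_blu:
CREATE TRIGGER copy_to_blu
  AFTER INSERT ON orders_xml
  REFERENCING NEW AS n
  FOR EACH ROW MODE DB2SQL
  INSERT INTO orders_blu (custid, amount)
    SELECT T.custid, T.amount
    FROM XMLTABLE('$d/order' PASSING n.doc AS "d"
           COLUMNS
             custid INTEGER       PATH 'custid',
             amount DECIMAL(10,2) PATH 'amount') AS T;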

In a previous post I described how to write a query that produces a list of all elements and attributes that occur in a document or a set of documents. 

The core idea of that technique was to use the XPath expression $doc//(*, @*) in the XMLTABLE function. In this expression, $doc references the XML document or XML column that is used as input. This expression also uses the so-called comma operator that combines two sequences into one. These two sequences are all elements (*) and all attributes (@*).

For this technique to work in DB2 for z/OS, you need to code the XPath expression in a slightly different way:  ($doc//*,  $doc//@*)  , which lists all elements followed by all attributes.

It is useful to sort the result by the parent of each node in the document so that the elements and attributes that belong to the same parent appear in consecutive order in the result set.

The following listing shows two queries that produce the same result set, and the second of the two queries is the recommended notation for DB2 for z/OS:


SELECT T.*
FROM purchaseorder p,
     XMLTABLE('$doc//(*, @*)' passing p.porder as "doc"
      COLUMNS
       node   VARCHAR(20) PATH 'name(.)',
       parent VARCHAR(20) PATH '../name(.)') AS T
WHERE poid = 5000
ORDER BY parent;

SELECT T.*
FROM purchaseorder p,
     XMLTABLE('($doc//*, $doc//@*)' passing p.porder as "doc"
      COLUMNS
       node   VARCHAR(20) PATH 'name(.)',
       parent VARCHAR(20) PATH '../name(.)') AS T
WHERE poid = 5000
ORDER BY parent;

And then you can certainly extend such queries to also list the node values or an indication whether the node is an element or an attribute, as previously described here.

Similar considerations apply to queries that list all paths in a document.
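
One possible way to do that, sketched here against the same purchaseorder table as above, is to let the column expression join the names of all ancestors of each element:

SELECT T.*
FROM purchaseorder p,
     XMLTABLE('$doc//*' passing p.porder as "doc"
      COLUMNS
       path VARCHAR(200)
            PATH 'string-join(for $a in ancestor-or-self::* return local-name($a), "/")') AS T
WHERE poid = 5000;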

 

 

 

This blog on XML database technology and DB2 pureXML has been active for almost 5 years now. My former colleague Conor O’Mahony ran this blog from early 2008 to December 2009, and I took over on Dec 15, 2009. We have received a lot of feedback and many of the blog posts were inspired by questions from XML database users.

I’m not sure yet how much time I will have for blogging in 2013 and beyond. In any case, I liked the idea of producing a summary page that provides a quick overview of all the topics and posts in this blog. But, how to produce such a summary most efficiently?

Well, true to the topic of this blog I decided to load this blog as a piece of XML into DB2 pureXML and run queries on it!

First I downloaded the complete blog as an HTML document. But, HTML is not necessarily well-formed XML because HTML allows elements with missing end tags and other goofy things. Luckily there are free tools available that convert HTML into XHTML. XHTML is a form of HTML that complies with the rules for well-formed XML documents.

After converting the entire blog into an XML (XHTML) document, it took only three steps to create a list of blog post titles, dates, and URLs:

CREATE TABLE nativexmldatabase(doc XML);


LOAD FROM blog.del OF DEL REPLACE INTO nativexmldatabase(doc);
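
As an aside, the delimited input file does not contain the XHTML itself. For an XML column, blog.del would typically hold a single record with an XML Data Specifier (XDS) that points to the document file, roughly like this (the file name is just an example):

<XDS FIL='blog.xml' />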


SELECT date, title, url
FROM nativexmldatabase,
   XMLTABLE('$DOC//div[@class="entrytitle"]'
   COLUMNS
     seq     FOR ORDINALITY,
     date    VARCHAR(20)   PATH 'h3',
     title   VARCHAR(105)  PATH 'h2/a',
     url     VARCHAR(145)  PATH 'h2/a/@href') AS T;

Of course, a quick look into the XML document is necessary to discover the required tag names and XPath expressions, such as //div[@class="entrytitle"] to iterate over all the individual posts, or h2/a and h2/a/@href to retrieve the title and URL of each post.

This query produces a nice list of all blog post titles, dates, and URLs. Instead of posting this listing as-is, I decided to roughly group the blog posts along the following topic areas:

* Design
* Performance
* Migration
* Relational to XML or XML to relational
* XML Queries
* XML Storage and Indexing
* XML Schemas
* Application Development
* Use Cases and Case Studies
* ETL and Warehousing
* DB2 pureXML Product News
* Other Products and Tools
* Publications and Resources
* XML Coverage at IDUG and IOD Conferences
* News, Announcements, Miscellaneous

And here is the result, a table of contents for nativexmldatabase.com from 2008 through 2012. I hope this can serve as a quick reference to topics of interest.

Design
Dec 4, 2012 Good and bad XML design
Jan 21, 2012 Business Records in the 21st Century
Jan 8, 2012 Data Normalization Reconsidered
Jan 21, 2011 Name/Value Pairs – A pretty bad idea in XML as in Relational !
Jan 8, 2011 XML Manipulation in Application code vs. a Native XML Database
Sep 28, 2010 5 Reasons for Storing XML in a Database
Sep 26, 2008 Flexible Schemas: When to Persist Data in XML Instead of Relational
Aug 4, 2008 When to Store in XML Format
Mar 21, 2008 When to use a Native XML Database
Performance
Oct 23, 2012 Overcoming Performance Obstacles in XML Encryption
Nov 15, 2011 TPoX 2.1 has been released!
May 9, 2011 News Flash: Intel publishes TPoX Benchmark results on new 10-core Westmere-EX CPUs
Mar 4, 2011 New TPoX Benchmark Results Available
Jan 8, 2011 XML Manipulation in Application code vs. a Native XML Database
Aug 22, 2010 XML versus Relational Database Performance
July 14, 2010 A 10TB XML Data Warehouse Benchmark
Nov 3, 2008 Benchmark for 1TB Transactional XML System
Mar 20, 2008 Native XML versus CLOB and Shredding
Migration
June 29, 2012 How to migrate from the XML Extender to DB2 pureXML
Oct 3, 2010 How to migrate XML from LOB to XML columns
Relational to XML or XML to relational
Aug 30, 2010 XML Construction in Views and User-Defined Functions
Aug 24, 2011 How to generate XML from relational data – with line breaks!
June 8, 2011 How to quickly produce XML from relational tables
June 5, 2011 How to insert relational data into existing XML documents (Part 2)
May 21, 2011 How to add relational data into existing XML documents (Part 1)
XML Queries
Sep 13, 2012 How to Validate XML Documents in Database Queries
Aug 14, 2012 Result set cardinalities when querying repeating XML elements
Aug 10, 2012 Using the XMLTABLE function with MERGE statements in DB2 for z/OS
July 30, 2012 Using the XMLTABLE function in UPDATE and MERGE statements
May 24, 2012 How to query CDATA sections in XML
Feb 4, 2012 How to list the paths of all elements in an XML document?
Nov 30, 2011 XQuery support in DB2 10 for z/OS
Oct 9, 2011 Advanced SQL/XML: Joins and FLWOR Expressions in the XMLTABLE Function
Aug 29, 2011 Quantified expressions in XQuery: When ‘some’ and ‘every’ satisfy!
Mar 18, 2011 XQuery and SQL/XML: How to convert a date that is not a date into a date?
Aug 11, 2010 How to order query results based on XML values
May 26, 2010 XML Profiling – How to get a list of all elements and attributes
May 9, 2010 How to deal with dirty data
July 24, 2011 XQuery and Other XML Manipulation in SQL Stored Procedures
April 6, 2011 Returning data from one path based on predicates in another (part 2/2)
April 3, 2011 Intra-document references: Returning data from one path depending on predicates in another path
June 17, 2010 An XQuery Cheat Sheet
April 12, 2010 The XMLTABLE function and a case where a left outer join can help
Feb 21, 2010 Get Started with XML Full-Text Search in DB2
Jan 25, 2010 tokenize it !
Jan 11, 2010 Watch your step! Positional predicates in XPath.
Oct 14, 2008 XQuery: Introduction, Tips, and Future Directions
July 25, 2008 XQuery versus SQL/XML – Part Deux
June 26, 2008 XQuery versus SQL/XML
XML Storage and Indexing
July 11, 2012 XML Storage in DB2: To inline or not to inline?
June 8, 2012 XML Indexing in DB2 9.x and DB2 10 for Linux, UNIX, and Windows
July 11, 2011 How to index XML documents in the presence of schema diversity?
July 19, 2010 Structure-agnostic XML indexes in DB2
Dec 26, 2010 How to insert XML documents from the file system?
Feb 21, 2010 Get Started with XML Full-Text Search in DB2
May 6, 2009 DB2 Compresses XML Data by 60% to 80%
May 3, 2009 Multi-Dimensional Clustering for Tables with Native XML Data
May 1, 2009 Use Hash Partitioning for Fast Analysis of XML Data
Oct 16, 2008 DBA Concerns about Native XML Storage
April 8, 2008 Native XML Reduces Storage Costs
XML Schemas
Sep 13, 2012 How to Validate XML Documents in Database Queries
Feb 29, 2012 What is an XML Schema and why should I care?
Aug 22, 2008 Schema Evolution
Aug 21, 2008 Schema Flexibility
Application Development
Sep 29, 2011 At your service: REST with DB2 pureXML!
Aug 8, 2011 Using XQuery across Application Server and Database Server
July 30, 2011 Beyond the Database: Native XML and XQuery in the Application Server
Feb 25, 2011 CICS meets XML: Using CICS Web Services with DB2 pureXML
Jan 8, 2011 XML Manipulation in Application code vs. a Native XML Database
Oct 8, 2010 Building XML Solutions with Rational Software Architect (RSA) and Rational Application Dev
Mar 1, 2010 XML Support in JDBC 4.0: The SQLXML Interface
Nov 9, 2009 Building a JSON and DB2 pureXML Application
Aug 5, 2008 Native XML Storage Reduces Development Costs
Nov 18, 2008 Referential Integrity and XML Data
XML Use Cases and Case Studies
Sep 12, 2011 DB2 pureXML – Rich in Proteins!
June 24, 2011 A true story about using XML forms and DB2 to process millions of tax returns
May 12, 2010 Webcast on XML Use Cases (May 18)
Feb 7, 2010 BJC HealthCare Improves Clinical Research with DB2 pureXML
Nov 2, 2009 Electronic Forms Using Adobe PDF and IBM DB2 pureXML
Sep 8, 2009 The Patient-Centered Medical Home
Aug 24, 2009 Electronic Health Records for Smarter Healthcare
July 17, 2009 Article about XML for Healthcare
June 16, 2009 Building an XML-Based Electronic Forms Solution
June 5, 2009 Short Video Highlighting Early Experiences with DB2 9.7
Mar 30, 2009 GUS Group uses DB2 pureXML for ERP System
Mar 13, 2009 UCLA Health System uses XML and SOA to Optimize Data Integration and Migration
Mar 2, 2009 Storing and Retrieving ACORD Data for Insurance
Feb 26, 2009 Speedy Communications and Collaboration in XML-based Environments
Dec 30, 2008 XForms and Native XML Storage
Mar 27, 2008 How to Choose Solution Areas
Mar 25, 2008 More information about Storebrand
Mar 23, 2008 Productivity Improvements at Storebrand
ETL and Warehousing
April 13, 2011 XML ETL with DataStage 8.5
July 14, 2010 A 10TB XML Data Warehouse Benchmark
June 23, 2010 More on Data Warehousing and XML
April 5, 2010 More on XML ETL
Mar 26, 2010 XML ETL
Jan 31, 2010 New Article on XML in the Data Warehouse
May 28, 2009 XML in the Data Warehouse
April 22, 2009 Fast Performance for Native XML Data in the Warehouse
Aug 24, 2009 Article on Integrating XML Operational Data into a Data Warehouse
DB2 pureXML product news
April 19, 2012 XML Enhancements in DB2 10 for Linux, UNIX, and Windows
Nov 30, 2010 Node-level XML Updates in DB2 10 for z/OS
Nov 18, 2010 New XML Features in DB2 10 for z/OS !
June 8, 2010 Updated version of a free XML database: DB2 Express-C 9.7.2
Aug 7, 2009 SOA Projects: IBM DB2 versus Oracle Database
Feb 11, 2009 XML Database in the Cloud
Feb 10, 2009 pureXML Moves into Core DB2 Product
Other Products and Tools
Dec 5, 2010 XML Tools for DB2 pureXML
Mar 8, 2010 Integration of Altova XMLSpy with DB2 pureXML
Jan 18, 2010 XPath and XQuery Functions in SQL Server vs. DB2
Jan 4, 2010 Shredding and constructing XML in mySQL
Dec 23, 2009 XML Support in mySQL
Dec 17, 2009 Oracle vs. mySQL: The Full Monty
Aug 7, 2009 MyCareTeam and IBM Collaborate to Improve Continuing Care for Diabetes
July 8, 2009 Why Won’t Oracle Publish XML Benchmark Results for TPoX?
Sep 30, 2008 XML in Oracle 11g
Sep 24, 2008 XML in SQL Server 2008
Sep 19, 2008 XML in SQL Server 2005
July 11, 2008 Questions for XML Database Vendors
May 13, 2008 XML-only Databases
May 12, 2008 All Native XML Databases are not Equal
Publications and Resources
Feb 13, 2011 Extremely pureXML in DB2 10 for z/OS
Sep 14, 2010 DB2 pureXML Cookbook – Errata
Aug 13, 2009 DB2 pureXML Cookbook – 45% Discount
June 3, 2009 DB2 pureXML for Dummies. Get Your Copy!
April 29, 2010 Get off to a fast start with DB2 pureXML
April 10, 2009 DB2 pureXML Cookbook
Feb 25, 2009 Intel Insights Magazine covers Native XML Storage
Feb 5, 2009 Henrik’s Blog
Feb 5, 2009 DB2 pureXML leads SOA World Magazine Reader’s Choice Award
Jan 21, 2009 Podcast Series about DB2 pureXML on the Mainframe
Jan 20, 2009 Reports for Information in Native XML Format
Dec 30, 2008 Anant Jhingran on Improving XML Applications
Sep 1, 2008 DB2 pureXML Online Communities
July 27, 2008 Getting Started with Native XML Databases
XML Coverage at IDUG and IOD Conferences
Oct 21, 2012 XML from Vegas to Berlin!
Oct 18, 2011 XML in Las Vegas !
April 25, 2011 XML sessions at IDUG North America 2011
Oct 21, 2010 Next week in Vegas
April 18, 2010 DB2 for z/OS pureXML Hands-on Lab @ IDUG North America
Oct 23, 2009 DB2 pureXML at the IOD Conference
May 28, 2009 IBM IOD Conference Attendees get a Poken for Networking
May 13, 2009 Short Video from the IDUG North America Conference
Nov 11, 2008 XML at IBM Information on Demand Conference
Sep 11, 2008 DB2 pureXML at the IOD Conference
Sep 4, 2008 Meet Native XML Databases Users
News, Announcements, Miscellaneous
July 30, 2010 DB2 Express-C Light – A reduced footprint XML database
Mar 15, 2010 More DB2 pureXML Bootcamps
Feb 14, 2010 DB2 pureXML Webinars and Bootcamps
Dec 15, 2009 Hi there here is the new guy!
Dec 13, 2009 Welcome Matthias Nicola!
June 9, 2009 Flirting with Poken
April 22, 2009 The XML Rap Superstar
April 22, 2009 IDUG announces winners of XML Challenge
April 20, 2009 Teaser for Upcoming DB2 pureXML Features
Mar 12, 2009 DB2 and Cloud Computing – Chat with the Labs
Feb 5, 2009 Free Online Conference – Data in Action
Jan 20, 2009 DB2 Technology Sandbox
Dec 26, 2008 Native XML Storage for the Mac
Dec 3, 2008 Update on the XML Challenge
Nov 5, 2008 Why IBM DB2 is ideal for Transactional XML
Oct 9, 2008 10 Reasons why DBAs Should Understand Native XML
Oct 2, 2008 XML Challenge Web Site is Live!
Sep 28, 2008 1st Oct is Online Community Action Day
Aug 7, 2008 Webcast: A primer for storing and retrieving XML data
July 1, 2008 Viral Video – Wednesday
June 20, 2008 Viral Video – Tuesday
June 16, 2008 Viral Video – Monday
June 13, 2008 Learn to Use XML with Databases and win Prizes!
Mar 22, 2008 How to Market New Technology
Mar 19, 2008 Welcome

Happy browsing!

Good and bad XML design

December 4, 2012

Whenever people design things – such as houses, software systems, database schemas, or even XML structures – there are good and not-so-good design options. Roughly speaking, a design is usually considered “good” if it allows for easy and (cost-)efficient use and implementation of whatever is being built.

For XML documents we have seen various cases of “good” and “bad” designs over the years. Previously I wrote about name/value pair design for XML documents, which often is not a good choice.

Recently I received a question regarding another design choice, which I would like to share here in a simplified form.

The question was about an XML message format to transmit a list of order numbers and the corresponding order system identifiers. Two alternative XML structures were proposed:

Option A:

<OrderList>
  <Order>
    <OrderNumber>A0000001</OrderNumber>
    <OrderSystem>ABC</OrderSystem>
  </Order>
  <Order>
    <OrderNumber>B0000001</OrderNumber>
    <OrderSystem>XP1</OrderSystem>
  </Order>
  <Order>
    <OrderNumber>C0000001</OrderNumber>
    <OrderSystem>Q-9</OrderSystem>
  </Order>
</OrderList>

Option B:

<OrderList>
    <OrderNumber>A0000001</OrderNumber>
    <OrderSystem>ABC</OrderSystem>
    <OrderNumber>B0000001</OrderNumber>
    <OrderSystem>XP1</OrderSystem>
    <OrderNumber>C0000001</OrderNumber>
    <OrderSystem>Q-9</OrderSystem>
</OrderList>

The motivation for option B might be a smaller message size. One might argue that XML elements are by definition ordered, and if we know that the message consists of pairs of order number and order system, then XML format B is as easy to interpret as format A. Or, is it not?

The problem that I see with option B is that the absence of the <Order> elements gives up the explicit structure of the data and instead requires implicit knowledge, i.e. one must know that the OrderNumber at position i belongs to the OrderSystem at position i+1.

In my view, this design is against the principles of XML. In XML, the tags and their nesting should describe the data explicitly. XML is meant to be a hierarchical data format so that data items that belong together are grouped under the same parent (as in option A). After all, that is the benefit of XML over a flat file format.

I also think that using the explicit structure of option A is less error-prone than using option B, which requires implicit knowledge based on element positions.

How about queryability?

It turns out that XML format A is considerably easier to query than option B. Let’s assume you need to code a SQL/XML query to list the order numbers and systems. For that exercise we assume that the messages are stored in a table “mytable” with XML column “doc”.

The query for XML option A is very simple, using the XMLTABLE function in its most basic form. The row-generating XPath expression $DOC/OrderList/Order iterates over the orders, and the column expressions extract the order number and system. (A related developerWorks article: http://www.ibm.com/developerworks/data/library/techarticle/dm-0708nicola/)

-- Query for XML option A:

SELECT OrderNumber, OrderSystem
FROM mytable,
     XMLTABLE ('$DOC/OrderList/Order'
       COLUMNS
         OrderSystem VARCHAR(20) PATH 'OrderSystem',
         OrderNumber VARCHAR(20) PATH 'OrderNumber' );

ORDERNUMBER          ORDERSYSTEM
-------------------- --------------------
A0000001             ABC
B0000001             XP1
C0000001             Q-9

3 record(s) selected.

How can you produce the same result set from XML format B? This is remarkably tricky! You need to iterate over the child elements and keep track of which element is at position i versus i+1 to produce the proper pairs in the output.

There are likely several ways to code the query for XML option B, and you see one solution below. It seems hard to query the elements by name alone because the important structural information is by position. Hence, this query iterates over the positions 1 to N, where N is the number of elements. For each even position 2, 4, 6, etc. the query produces the pair of the current ($pos) and the previous ($pos - 1) element to construct the proper output.

-- Query for XML option B:

SELECT OrderNumber, OrderSystem
FROM mytable,
     XMLTABLE('for $pos in (1 to count($DOC/OrderList/*))
               where $pos mod 2 = 0
               return  <pair>
                         {$DOC/OrderList/*[$pos - 1]}
                         {$DOC/OrderList/*[$pos]}
                       </pair>'
       COLUMNS
         OrderNumber VARCHAR(20) PATH 'OrderNumber',
         OrderSystem VARCHAR(20) PATH 'OrderSystem' );

ORDERNUMBER          ORDERSYSTEM
-------------------- --------------------
A0000001             ABC
B0000001             XP1
C0000001             Q-9

3 record(s) selected.

This query is not only more complex but also potentially less efficient. Note that the return clause of the FLWOR expression constructs the explicit pairing that is missing in XML format B.

Let me know if you can think of a simpler query to produce the same result from XML format B.

Next, imagine a message that doesn’t have just 2 fields per order but maybe 10 or 100 fields per order. Then option B becomes increasingly unwieldy.

Or worse, what if there is a schema change that allows one order to have a different number of fields than the next? Then the query that we coded for XML format B will no longer work because it was based on a specific structural assumption that is no longer true. In contrast, our query for XML format A will work even if you add additional fields to some of the orders in the message.

In summary, the benefits for XML option A include the following:

  • Explicit structure rather than implicit positional format
  • Easier and more efficient to query (query by name rather than by position)
  • More resilient to schema changes

I think that these benefits far outweigh the concern that XML format A is larger. Also, you could choose to send or store the XML in a compressed format to greatly alleviate space concerns.

In times of data leakage, hacker attacks, trojan horses, and various other data security threats, efficient and effective data encryption has become a critical requirement for many enterprises. This need applies to XML as much as to most other forms of data.

Intel and IBM have collaborated to demonstrate that a combination of modern hardware and software can perform effective encryption with very high efficiency.

In particular, the latest Intel Xeon E5 and E7 processor families provide AES-NI, which is a set of new instructions in the Intel Xeon processor that help accelerate encryption, decryption, key generation, matrix manipulation, and carry-less multiplication.

A joint benchmark has measured the performance impact of encrypting and decrypting XML data in DB2 9.7 using the IBM InfoSphere Guardium Data Encryption capabilities on the Intel Xeon E5 platform.

The results of the TPoX benchmark show that full encryption can be performed for a read/write XML transaction processing application with less than 4% overhead. This is a fantastic result.

More information on these tests and on the technologies and products used, is available here:

http://software.intel.com/en-us/articles/overcoming-performance-obstacles-in-data-encryption

http://software.intel.com/sites/default/files/m/d/4/1/d/8/IBM_DMM_Intel_AES_Vormetric_FINAL.pdf

http://ibmdatamag.com/2012/05/overcoming-performance-obstacles-in-data-encryption/

XML from Vegas to Berlin!

October 21, 2012

I’m in Las Vegas right now, this glittering and crazy city in Nevada! Las Vegas is once again the location for IBM’s annual Information On Demand (IOD) conference, which is starting today!

IOD is the premier IBM event for Information Management software, including DB2, Informix, Netezza, Cognos, Big Data, SPSS, Content Management, and other product areas. The conference program is very diverse and has something for everyone. At IOD you can choose from more than 1500 sessions and 100+ hands-on labs.

Not surprisingly, XML continues to be an important topic for information management, and there are various XML-related sessions at IOD. In particular, I’m looking forward to the following sessions where some of IBM’s customers are sharing their success stories with DB2 pureXML:

Session 3674A:
IBM pureXML in Financial Applications: Experiences From Vanguard
Mon, Oct 22, 2012, 11:30 AM – 12:30 PM
Speakers: George White and Milton Beaver, The Vanguard Group

Session 1284B:
IBM DB2 and XML: Excellent Opportunity or Extra Problems? Are You Ready?
Mon, Oct 22, 2012, 2:15 PM – 3:15 PM
Speaker: Kurt Struyf, suadasoft

Session 1613A:
How to Get Warehouse-Type Performance With XML Tables
Wed, Oct 24, 2012, 2:30 PM – 3:30 PM
Speaker: Ray Sippel, BJC HealthCare

Additionally, you can get hands-on experience with XML on DB2 for Linux, UNIX, and Windows and DB2 for z/OS in the following hands-on labs:

Lab 1255A
Beyond SQL With IBM DB2 10: Maintaining and Querying XML Data and RDF Stores
Thu, Oct 25, 2012, 1:30 PM – 4:30 PM

Lab 1314A
New Features in IBM DB2 10 for z/OS Improve Development Productivity (including XML features)
Mon, Oct 22, 2012, 10:00 AM – 1:00 PM

If you can’t be in Las Vegas this week (maybe because you’re on the other side of the globe?), how would Berlin work for you?!

The annual European conference of the International DB2 User Group (IDUG) is coming up in Berlin, Germany, on November 4 to 9, 2012. For a number of years there hasn’t been an IDUG conference without any XML sessions, and the same is true this year. Here are some presentations that you can see in Berlin:

An XML Document’s Life – Dr. Node!
Speaker: Terri Grissom, BMC Software
Wed, Nov 07, 2012, 11:00 AM – 12:00 PM

Breaking the Relational Limit with pureXML in DB2 for z/OS
Speaker: Mengchu Cai, IBM Silicon Valley Lab
Tue, Nov 06, 2012, 01:00 PM – 02:00 PM

How to Design a Hybrid XML/Relational Database Schema
Speaker: Matthias Nicola, IBM Silicon Valley Lab
Tue, Nov 06, 2012, 02:15 PM – 03:15 PM

I hope to see you either in Vegas or in Berlin! Enjoy your conference!

In DB2, validation of XML documents against XML Schemas is optional. If you choose to validate XML documents in DB2, the most typical scenario is to validate XML documents when they are inserted or loaded into the database. This makes sense: if you ensure that the XML that enters the database is valid, then subsequent queries can assume that the data is valid and complies with a particular XML Schema.

Likewise, validation in XML update statements ensures that document replacement or document modifications do not violate your XML Schema.

Here is a simple example for document validation in INSERT and UPDATE statements, based on an XML Schema that was registered in the DB2 XML Schema Repository (XSR) under the SQL name db2admin.myschema:

CREATE TABLE mytable (id INTEGER NOT NULL PRIMARY KEY, doc XML);

INSERT INTO mytable
VALUES(?, XMLVALIDATE(? ACCORDING TO XMLSCHEMA ID db2admin.myschema));

UPDATE mytable
SET doc = XMLVALIDATE(? ACCORDING TO XMLSCHEMA ID db2admin.myschema)
WHERE id = ?;

UPDATE mytable
SET doc = XMLVALIDATE( XMLQUERY('copy $new := $DOC
                                 modify do insert <status>delivered</status>
                                           into $new/message/header
                                 return $new')
             ACCORDING TO XMLSCHEMA ID db2admin.myschema)
WHERE id = ?;

There are also cases when you might want to validate XML as part of a query. There can be several reasons for that:

  •  Documents were inserted or updated without validation and you need to validate them before consumption.
  •  You wish to validate XML documents against a different schema than the one that was used for validation upon insert or update.
  •  You are extracting fragments of stored XML documents and wish to validate them against a specific schema.
  •  Your queries are constructing entirely new XML documents and you wish to validate that the constructed XML complies with a given schema.

Regardless of the motivation, XML validation in a query is simple.

You can simply use the XMLVALIDATE function in a SELECT statement. All the same options for XMLVALIDATE are allowed as if you would use it in an INSERT or UPDATE statement. Let’s look at several examples:

SELECT XMLVALIDATE(doc ACCORDING TO XMLSCHEMA ID db2admin.myschema)
FROM mytable
WHERE id = 5;

This query above reads a specific document and performs schema validation against the XML Schema that was registered as db2admin.myschema.
If the selected document is valid for the specified schema, the document is returned.
If the selected document is not valid for the specified schema, the query fails and produces an error code that points to why the schema is violated.

Instead of the XML column name doc, the XMLVALIDATE function can take any argument of type XML, such as the result of an XMLQUERY function. The following query uses the XMLQUERY function to extract just the message body from an XML document and validates it against the XML Schema db2admin.msgbodyXSD:

SELECT XMLVALIDATE( XMLQUERY('$DOC/message/body')
            ACCORDING TO XMLSCHEMA ID db2admin.msgbodyXSD )
FROM mytable
WHERE id = 5;

The next query constructs a new XML document and validates it as part of the query:

SELECT XMLVALIDATE(        
         XMLQUERY('document{
                    <newdocument>
                      <header>{$DOC/party/identity}</header>
                      <body>
                          {$DOC/party/name}
                          {$DOC/party/details/address}
                      </body>
                   </newdocument>}')
       ACCORDING TO XMLSCHEMA ID db2admin.newdocXSD)
FROM mytable
WHERE id = 5;

These examples give you an idea of the capabilities for validating XML query results against an XML Schema.

Recently I received some questions about the result sets when querying XML, and especially when querying repeating elements that occur more than once per document.

As it turns out, the same logical result can be returned in different ways, depending on how you write your XQuery or SQL/XML query.

Let’s look at a simple table with two XML documents, and then at several different queries against that data. Here is the sample data:


create table testtable(doc XML);

insert into testtable(doc)
 values ('<a id="1">
             <b>1</b>
             <b>2</b>
          </a>');

insert into testtable(doc)
 values ('<a id="2">
             <b>3</b>
             <b>4</b>
             <b>5</b>
         </a>');

Now assume we want to return all the <b> elements from these two documents. You can write such a query in several different ways, each returning the same <b> elements in a slightly different way:

  1. XQuery FLWOR expression
  2. XQuery FLWOR expression within an SQL VALUES clause
  3. SQL/XML query with the XMLQUERY function
  4. SQL/XML query with the XMLTABLE function

Let’s look at each of these options in turn.

1. XQuery FLWOR expression

The first example is a simple XQuery FLWOR expression. It iterates over the path /a/b in all documents and returns the <b> elements one by one. The result is a sequence of 5 elements, and each is returned as a single item in the result set:

xquery
 for $b in db2-fn:xmlcolumn("TESTTABLE.DOC")/a/b
 return $b;

<b>1</b>
<b>2</b>
<b>3</b>
<b>4</b>
<b>5</b>

5 record(s) selected.

2. XQuery FLWOR expression in an SQL VALUES clause

If you enclose the same FLWOR expression in an SQL VALUES clause then the same XML elements are returned in a different format.

In this example, the VALUES clause produces a single value. The SQL type of that value is the XML data type and the value itself is a sequence of 5 elements. The entire sequence is returned as a single value of type XML:

values(xmlquery('
 for $b in db2-fn:xmlcolumn("TESTTABLE.DOC")/a/b
 return $b'));

<b>1</b><b>2</b><b>3</b><b>4</b><b>5</b>

1 record(s) selected.

3. SQL/XML query with the XMLQUERY function

You could also write an SQL SELECT statement and include your XQuery or XPath expression in an XMLQUERY function.

Note that the XMLQUERY function is a scalar function, i.e. it returns one result value of type XML for each row that it is applied to. Since our sample table contains two rows, the following query returns two result values of type XML. The first value is a sequence with all the <b> elements from the first document, and the second value is the sequence of all <b> elements from the second document:

SELECT xmlquery('for $b in $DOC/a/b return $b') as col1
FROM testtable;

COL1
----------------------------
<b>1</b><b>2</b>
<b>3</b><b>4</b><b>5</b>

2 record(s) selected.


-- same result with a simple XPath:
SELECT xmlquery('$DOC/a/b') as col1
FROM testtable;

COL1
----------------------------
<b>1</b><b>2</b>
<b>3</b><b>4</b><b>5</b>

2 record(s) selected.

The potential benefit of this result format is that you now know exactly which <b> elements came from the same input document. If you prefer to return each <b> element as a separate item, use the XMLTABLE function.

4. SQL/XML query with the XMLTABLE function

The XMLTABLE function is not a scalar function, it’s a table function. This means that it returns a set of result rows for each input document. More precisely, it returns one result row for each item that is produced by the row-generating expression /a/b:

-- return a column of type XML:
SELECT X.*
FROM testtable,
 XMLTABLE('$DOC/a/b'
 COLUMNS
 col1 XML PATH '.') as X;

COL1
----------------------------
<b>1</b>
<b>2</b>
<b>3</b>
<b>4</b>
<b>5</b>

5 record(s) selected.

-- return a column of type integer:
SELECT X.*
FROM testtable,
 XMLTABLE('$DOC/a/b'
 COLUMNS
 col1 INTEGER PATH '.') as X;

COL1
--------
 1
 2
 3
 4
 5

5 record(s) selected.

The result sets in all of these examples make sense and are consistent with SQL semantics. You can choose the shape of your query results and write your queries accordingly.

This is an add-on to my previous post on Using the XMLTABLE function in UPDATE and MERGE statements.

I had tested all the examples in that previous post in DB2 for Linux, UNIX, and Windows but overlooked that DB2 for z/OS currently has a restriction for the MERGE statement.

The MERGE statement in DB2 for z/OS expects a VALUES clause to provide the data that is to be merged, not an arbitrary sub-select. For details, see:
http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/topic/com.ibm.db2z9.doc.sqlref/src/tpc/db2z_sql_merge.htm

As a result, the MERGE example at the end of my previous post needs to be adjusted for DB2 z/OS. For example, you can use an SQL procedure to loop over the rows produced by the XMLTABLE query and feed these rows into the MERGE statement:


CREATE PROCEDURE XMLMERGE(IN P_CHANGE XML)
LANGUAGE SQL
BEGIN
  DECLARE ID INT;
  DECLARE X INTEGER;
  DECLARE Y VARCHAR(20);
  DECLARE SQLCODE INT;

  DECLARE C1 CURSOR FOR
    SELECT ID, X, Y
    FROM XMLTABLE('$DOC/root/mydata' PASSING P_CHANGE AS "DOC"
         COLUMNS
           id INTEGER     PATH '@id',
           x  INTEGER     PATH 'elem1',
           y  VARCHAR(20) PATH 'elem2') T;

  OPEN C1;
    LOOP1: LOOP
      FETCH C1 INTO ID, X, Y;
      IF SQLCODE <> 0 THEN LEAVE LOOP1; END IF;

      MERGE INTO RELTABLE   R
      USING (VALUES(ID, X, Y)) AS N(ID, X, Y)
      ON (R.ID = N.ID)
      WHEN MATCHED THEN UPDATE SET R.X = X, R.Y = Y
      WHEN NOT MATCHED THEN INSERT VALUES(ID, X, Y);
    END LOOP LOOP1;
  CLOSE C1;
END

Thanks to my colleague Guogen Zhang for providing this sample solution.

The XMLTABLE function is part of the SQL standard. It can take XML as input, use XPath or XQuery expressions to extract values or pieces from the XML, and return the result in a relational row and column format.

As discussed in a previous blog post, the XMLTABLE function is often used in queries to read XML documents from an XML column in the database and return the extracted values as a relational result set.

But, the XMLTABLE function can also be used in INSERT and UPDATE statements. This allows an application to pass an XML document as a parameter to an INSERT or UPDATE statement that extracts selected values and uses them to insert or update relational rows in a table.

Using the XMLTABLE function in INSERT statements is quite common if requirements dictate that XML needs to be shredded to relational tables rather than stored natively in an XML column.

Here is a simple example that extracts values from a small XML document and inserts these values into the table “reltable”:


CREATE TABLE reltable(id int, col2 int, col3 varchar(20));

INSERT INTO reltable
SELECT id, x, y
FROM XMLTABLE ('$doc/mydata'
       PASSING xmlparse(document '<mydata id="1">
                                     <elem1>555</elem1>
                                     <elem2>test</elem2>
                                  </mydata>') as "doc"
       COLUMNS
          id INTEGER     PATH '@id',
          x  INTEGER     PATH 'elem1',
          y  VARCHAR(20) PATH 'elem2'   );

SELECT * FROM reltable;

ID          COL2        COL3
----------- ----------- --------------------
          1         555 test

1 record(s) selected.

The PASSING clause assigns the input document to the variable “$doc” which is then the starting point for the extraction.

You don’t necessarily need to hardcode the XML input document in the INSERT statement. It can also be supplied via a parameter marker or host variable, as in this example:

INSERT INTO reltable
SELECT id, x, y
FROM XMLTABLE ('$doc/mydata'  PASSING  cast(? as XML) as "doc"
       COLUMNS
          id INTEGER     PATH '@id',
          x  INTEGER     PATH 'elem1',
          y  VARCHAR(20) PATH 'elem2'   );

Ok, this was the easy part.

Now what if we want to use the XMLTABLE function in an UPDATE statement to replace the values in an existing row in “reltable” with new values from an incoming XML document? It might not be quite as obvious how to write such an UPDATE statement, but as it turns out it’s not very hard either!

Here is an UPDATE statement that extracts the values of the XML elements “elem1” and “elem2” to update the columns “col2” and “col3” in “reltable” for the row that has id = 1.

UPDATE reltable
SET (col2, col3) = (SELECT x, y
                    FROM XMLTABLE ('$doc/mydata' PASSING
                           xmlparse(document '<mydata id="1">
                                                 <elem1>777</elem1>
                                                 <elem2>abcd</elem2>
                                              </mydata>') as "doc"
                         COLUMNS
                           x  INTEGER     PATH 'elem1',
                           y  VARCHAR(20) PATH 'elem2'   )
                    )
WHERE id = 1;

SELECT * FROM reltable;

ID          COL2        COL3
----------- ----------- --------------------
          1         777 abcd

1 record(s) selected.

The UPDATE statement above uses a simple WHERE clause to select a specific row to be updated. But, what if we don’t know which target row the incoming XML document needs to be applied to? Ideally, we want to extract the @id attribute from the input document and update whichever row matches its value.

We might be tempted to simply extract the @id attribute in the same XMLTABLE function and add a join predicate to the subselect, like this:

-- this statement is not a good idea!
UPDATE reltable r
SET (r.col2, r.col3) = (SELECT T.x, T.y
                        FROM XMLTABLE ('$doc/mydata'
                              PASSING  cast(? as XML) as "doc"
                              COLUMNS
                                id INTEGER     PATH '@id',
                                x  INTEGER     PATH 'elem1',
                                y  VARCHAR(20) PATH 'elem2' ) T
                        WHERE r.id = T.id
                       );

But, this statement would update all rows in the table (many of them with NULL values) because the WHERE clause only applies to the rows produced by the subselect, not to the rows in “reltable”.

One possible solution is to add a WHERE clause that extracts the @id attribute as needed to filter the rows in “reltable”:

-- this statement is better, but not optimal
UPDATE reltable
SET (col2, col3) = (SELECT x, y
                    FROM XMLTABLE ('$doc/mydata'
                          PASSING  cast(? as XML) as "doc"
                          COLUMNS
                            x  INTEGER     PATH 'elem1',
                            y  VARCHAR(20) PATH 'elem2' )
                    )
WHERE id = XMLCAST( XMLQUERY('$doc/mydata/@id'
                    PASSING cast(? as XML) as "doc") AS INTEGER);

Yes, this works, but it requires us to pass the XML document into the statement twice: once into the subselect in the SET clause, and once into the XMLQUERY function in the WHERE clause. That’s not very elegant and probably  not ideal for performance either.

There might be multiple ways to improve the UPDATE statement above. One nice solution is to use a MERGE statement:


MERGE INTO reltable r
USING (SELECT id, x, y
       FROM XMLTABLE ('$doc/mydata' PASSING  cast(? as XML) as "doc"
             COLUMNS
               id INTEGER     PATH '@id',
               x  INTEGER     PATH 'elem1',
               y  VARCHAR(20) PATH 'elem2'   )) t
ON r.id = t.id
WHEN MATCHED
THEN UPDATE SET r.col2 = t.x, r.col3 = t.y;

The nice thing about the MERGE statement is that it can also handle multiple cases where the XMLTABLE function produces multiple rows whose keys may or may not already exist in the target table, so that UPDATE and/or INSERT operations need to be performed. Here is an example:


MERGE INTO reltable r
USING (SELECT id, x, y
       FROM XMLTABLE ('$doc/root/mydata'
             PASSING xmlparse(document '<root>
                                         <mydata id="1">
                                          <elem1>999</elem1>
                                          <elem2>xyz</elem2>
                                         </mydata>
                                         <mydata id="3">
                                          <elem1>333</elem1>
                                          <elem2>test33</elem2>
                                         </mydata>
                                        </root>') as "doc"
             COLUMNS
               id INTEGER     PATH '@id',
               x  INTEGER     PATH 'elem1',
               y  VARCHAR(20) PATH 'elem2'   )) t
ON r.id = t.id
WHEN MATCHED     THEN UPDATE SET r.col2 = t.x, r.col3 = t.y
WHEN NOT MATCHED THEN INSERT VALUES(t.id, t.x, t.y);

SELECT * FROM reltable;

ID          COL2        COL3
----------- ----------- --------------------
          1         999 xyz
          3         333 test33

2 record(s) selected.

This blog post has provided some basic examples that might serve as a starting point to develop your own INSERT, UPDATE, and MERGE statements with the XMLTABLE function.

First, let’s revisit the concept of inlined XML storage and then discuss the pros and cons of inlining.

What is XML Inlining?

In short, inlining is an optional storage optimization in DB2 for “small” XML documents.

When you define a table with an XML column in DB2, such as CREATE TABLE mytable(id INTEGER, ….. , doc XML), the DB2 server creates three storage objects in the table space:

  • A data object (DAT), which holds the relational rows of the table
  • An index object (INX), which stores any indexes for the table
  • An XML storage object (XDA), which is the XML Storage Area and holds any XML documents

Optionally, you can assign these three objects to different table spaces but by default they all go into the same table space.

As illustrated in the following picture, the XML column in the data object doesn’t contain the actual XML document, but only references (descriptors) of where the documents can be found. The XML document trees are stored in the XDA object, and if a tree is larger than a single page then it is automatically cut into multiple regions. This way, large documents can span many pages, and that’s completely transparent to your application.

The regions index, which is automatically defined and maintained by DB2, essentially remembers which regions belong to the same XML document for any given row in the data object. The regions index also enables very efficient access to any portion of an XML document. If only some part of a large document is required, e.g. to answer a query, DB2 does not necessarily need to bring all pages of the document into the buffer pool.

The XML documents that are small enough to fit onto a single page just have a single region, and multiple regions can be stored on the same page, if space is available.

As it turns out, there are very many applications that deal with small XML documents, often just 1KB to 20KB for most documents. The access to such small documents can be optimized by storing their document trees right in the DAT object, together with the rows that they belong to. This is called inlined XML storage and is illustrated in the following picture.

To enable inlining, define the XML column with an INLINE LENGTH that indicates the maximum size up to which you want documents to be inlined. For example:

CREATE TABLE mytable(id INTEGER, ….. , doc XML INLINE LENGTH 30000)

In this example, any XML documents whose parsed hierarchical representation is less than 30000 bytes will be inlined. Any documents that are larger than this are automatically stored in the XDA object as usual. The application sees no difference.

The specified INLINE LENGTH must be smaller than the page size of the DB2 table space minus the length of any other columns in the table.
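
If the table already exists, the inline length can also be added or increased later. A rough sketch (the REORG step is only needed if you want existing documents, stored before the change, to become inlined as well):

ALTER TABLE mytable ALTER COLUMN doc SET INLINE LENGTH 30000;

-- Rewrite existing documents so that qualifying ones are moved into the rows:
REORG TABLE mytable LONGLOBDATA;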

Inlined documents are stored in the same tree format as documents that are not inlined, just in a different location.

The Pros and Cons of Inlining

Performance measurements and plenty of experience with real XML applications have shown that inlining is almost always recommended if a large percentage of your XML documents (for example, more than 70%) can be inlined. The benefits of inlining include the following:

  • Faster access to inlined documents – no redirection via the regions index
  • The regions index has no entries for inlined documents. If a large percentage of your documents are inlined then this reduces the space and maintenance cost associated with the regions index.
  • Better prefetching since inlined XML documents are prefetched as part of the row they belong to.

A key characteristic of inlining is that it drastically increases the row size on the data pages, and hence reduces the number of rows per data page. This can negatively impact the performance of queries that read only relational columns and do not access the XML column. For example, consider the following table and query:

CREATE TABLE myxml(c1 INT, c2 INT, c3 INT, doc XML INLINE LENGTH 30000);

SELECT SUM(c1 + c2 + c3)
FROM myxml;

This query reads the 3 integer columns from *all* rows in the table. Due to the inlined XML column, these rows are spread over a much larger number of pages than without inlining, so this query needs to fetch a lot more pages than without inlining. If you have many such queries, then inlining might not be the best choice for you.

How do I know whether a given document is inlined?

The DB2 function ADMIN_IS_INLINED can be applied to an XML column and it returns 1 if a document is inlined, and zero otherwise. This enables you to determine which and how many documents are inlined.

The DB2 function ADMIN_EST_INLINE_LENGTH can also be applied to an XML document in an XML column and returns the smallest required inline length for this document to be inlined, or -1 if the document is too large to be inlined.
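
For example, a quick check along these lines (reusing the myxml table from above) shows for each row whether its document is currently inlined and which inline length it would need:

SELECT c1,
       ADMIN_IS_INLINED(doc)        AS is_inlined,
       ADMIN_EST_INLINE_LENGTH(doc) AS est_inline_length
FROM myxml;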

The XML Extender was introduced in DB2 7.1 (yes, that long ago!) to enable applications to store, query, update, construct, and validate XML documents in DB2 for Linux, UNIX, Windows and DB2 for z/OS.

In 2006, DB2 9.1 introduced pureXML – a set of native XML features that are deeply integrated into the DB2 engine. DB2 pureXML is vastly superior over the XML Extender, in both functionality and performance.

The XML Extender is still available in DB2 9.1 and 9.5 but has been discontinued in DB2 9.7. Hence, if you are using the XML Extender and want to move to DB2 9.7 or DB2 10, you need to migrate your XML application to DB2 pureXML.

Although the migration from the XML Extender to pureXML requires some considerations, it is not very hard. The migration to pureXML yields dramatic performance benefits and often simplifies your solution. You will need to migrate eventually, and the sooner you do it the earlier you will reap the benefits of significantly better performance and greater flexibility.

The details of the migration depend very much on which parts of the XML Extender functionality you are using, such as:

  1. Shredding XML to relational tables (called “XML collection” in XML Extender lingo)
  2. Construction of XML documents from relational tables
  3. Storage of XML documents in db2xml.xmlclob columns
    • with or without side tables?
    • extraction of values with extract UDFs?
    • XML updates with the update UDF?
  4. XML validations against DTDs or XML Schemas
  5. Other XML Extender features such as XSLT transformations or interaction with Websphere MQ message queues.

Let’s briefly discuss these one by one:

1. Shredding XML to relational tables

If you are using the XML Extender to shred XML into a set of relational tables, e.g. with the db2xml.dxxInsertXML stored procedure, you have two options:

(a) Consider whether you can avoid the shredding and instead store and query your XML documents in a column of type XML. Inserting documents into an XML column is significantly faster than shredding, and you can build indexes and express XML queries on XML columns.

(b) DB2 pureXML still offers shredding capabilities: you can use the XMLTABLE function or annotated XML Schemas to convert XML to relational format. These options are typically 5x to 8x faster than the old XML Extender shredding and still make the data available in the same relational tables as before, so that consuming applications don’t need any changes.

For more details on shredding in DB2 9.1 or higher, see the free sample Chapter of the DB2 pureXML Cookbook. That chapter also discusses the pros and cons of XML columns vs shredding, which is also examined in a recent developerWorks article.

If you choose to continue to shred XML into relational tables, note that shredding with the XMLTABLE function can sometimes be simpler and more flexible than using annotated XML Schemas. However, if you have a large number of XML Extender DADs, you might benefit from a free tool that converts DADs to annotated XML Schemas. This tool is available for download at the bottom of the article “From DAD to annotated XML schema decomposition“.
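As a sketch of option (b), the following statement shreds documents of the form <customer id="…"><name>…</name></customer> from a staging table xmlstage(doc XML) into a relational target table cust_rel(id INT, name VARCHAR(40)); the table names and the document structure are made up for this illustration:

INSERT INTO cust_rel (id, name)
SELECT t.id, t.name
FROM xmlstage s,
     XMLTABLE('$D/customer' PASSING s.doc AS "D"
       COLUMNS
         id   INTEGER     PATH '@id',
         name VARCHAR(40) PATH 'name') AS t;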

2. Constructing XML from relational tables

In DB2 9.1 and higher, construction of XML documents from relational tables is performed with SQL/XML construction functions that are part of the SQL standard. These functions include XMLELEMENT, XMLATTRIBUTES, XMLFOREST, XMLAGG, XMLCONCAT, and XMLNAMESPACES.

As part of the migration you would write SQL statements (or stored procedures, if you prefer) that read relational data and produce XML with these functions.
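For illustration, here is a sketch of such a statement; it assumes hypothetical relational tables dept_rel(deptno, deptname) and emp_rel(lastname, workdept) and publishes one XML document per department:

SELECT XMLELEMENT(NAME "dept",
         XMLATTRIBUTES(d.deptno AS "id"),
         XMLELEMENT(NAME "name", d.deptname),
         XMLAGG(XMLELEMENT(NAME "employee", e.lastname)))
FROM dept_rel d, emp_rel e
WHERE e.workdept = d.deptno
GROUP BY d.deptno, d.deptname;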

3. Storage of XML documents in db2xml.xmlclob columns

If you are using the XML Extender to store XML in db2xml.xmlclob or db2xml.xmlvarchar columns, you will migrate these columns to native XML columns. A previous blog post describes how to move data from a CLOB column to an XML column.

If you have used so-called side tables with your db2xml.xmlclob columns, you should replace those with native XML indexes. Note that DB2 10 contains some great XML index enhancements. Using XML indexes instead of side tables is simpler, more flexible, and provides better performance.

The XML Extender also offered a set of EXTRACT functions that enable applications to use XPath to extract individual values from XML documents in a db2xml.xmlclob column. In the migration you will replace these EXTRACT functions in your queries with the SQL/XML functions XMLQUERY, XMLEXISTS, or XMLTABLE, which are faster and more powerful. For details, see the articles “Which way to query your XML data?” and “XMLTABLE By Example“.

If you are using the XML Extender UPDATE function to modify XML in db2xml.xmlclob columns, you will migrate them to native XML updates.
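Native XML updates are expressed with the XQuery transform expression (copy … modify … return) inside XMLQUERY, available in DB2 9.5 and higher. Here is a minimal sketch, assuming a customer table with an XML column doc and a /customer/status element (both invented for this example):

UPDATE customer
SET doc = XMLQUERY('copy $new := $DOC
                    modify do replace value of $new/customer/status with "gold"
                    return $new')
WHERE id = 42;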

4. XML validations against DTDs or XML Schemas

If you are using DTDs to validate your XML documents, you can convert the DTDs to an equivalent XML Schema and use the XML Schema capabilities that are part of DB2 pureXML. That is, you can register XML Schemas in DB2’s schema repository and validate XML documents during LOAD, insert, update, select, or other operations.

5. Other XML Extender features such as XSLT transformations or interaction with MQ

If you are using XSLT style sheets with the XML Extender, you can continue using the same style sheets with the XSLTRANSFORM function. This function also allows you to pass parameters to a style sheet at runtime. Also, the DB2 functions to read and write XML documents from/to WebSphere MQ message queues are still available as part of the DB2 federation capabilities that provide connectivity between DB2 and MQ.
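As a sketch, assuming a hypothetical stylesheets(name, sheet CLOB) table for the XSL documents and a customer table with an XML column doc, a transformation query could look like this; the XML column is serialized to a CLOB before being passed to XSLTRANSFORM:

SELECT XSLTRANSFORM(XMLSERIALIZE(c.doc AS CLOB(1M))
                    USING s.sheet AS CLOB(1M)) AS html
FROM customer c, stylesheets s
WHERE s.name = 'cust2html';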

These are the main topics that need consideration for moving from the XML Extender to DB2 pureXML. Additional information is available in the DB2 Information Center here and here.

If you want to learn more about XML indexes in DB2, including the XML index enhancements in DB2 10, join the DB2 pureXML devotee conference call on Wed, June 13, at 10am US Eastern Time. For further details and dial-in numbers, see:

https://www.ibm.com/developerworks/wikis/display/db2xml/pureXMLDevotee2012#pureXMLDevotee2012-jun

 

First, what is a CDATA section in XML? CDATA stands for character data, and a CDATA section in an XML document defines a piece of text that should not be parsed by an XML parser.

Normally, an XML parser examines every character in an XML document, even the characters that make up the text value of an XML element. The reason for this behavior is that one element could contain other elements, so the parser must look for the “<” character that would denote the beginning of another element.

However, if you want the parser to ignore certain portions of text, you can enclose this text in a CDATA section. A CDATA section starts with <![CDATA[ and ends with ]]>.

A CDATA section can be useful if the text value of an XML element contains any reserved characters. For example, the characters “<” and “&” have a special meaning in XML and must not appear unescaped in the text value of an element. (The “&” marks the beginning of an entity reference in XML.) But if the text is enclosed in a CDATA section, the parser does not interpret its characters, so such otherwise illegal characters can appear literally.

Let’s look at an example where I’m inserting the string “Food & Wine” with the reserved character “&” in it.

The first insert fails because the “&” is not properly escaped and not included in a CDATA section. The other 4 inserts are successful. In these 4 cases the CDATA section spans different portions of the string but always includes the “&”:

create table cdata(id int, doc XML);

insert into cdata
values(0, '<doc><name>Food & Wine</name></doc>');

insert into cdata
values(1, '<doc><name><![CDATA[Food & Wine]]></name></doc>');

insert into cdata
values(2, '<doc><name><![CDATA[Food &]]> Wine</name></doc>');

insert into cdata
values(3, '<doc><name>Food <![CDATA[& Wine]]></name></doc>');

insert into cdata
values(4, '<doc><name>Fo<![CDATA[od & Wi]]>ne</name></doc>');

When you insert and later retrieve an XML document in DB2, the CDATA sections are not preserved. Instead, any offending characters in the CDATA section are escaped according to the XML standard. Upon insert the CDATA sections are removed, and upon retrieval the “&” is escaped as &amp; as it should be. Similarly, a “<” sign in a CDATA section would be escaped to &lt;.

select * from cdata;

1      <doc><name>Food &amp; Wine</name></doc>
2      <doc><name>Food &amp; Wine</name></doc>
3      <doc><name>Food &amp; Wine</name></doc>
4      <doc><name>Food &amp; Wine</name></doc>

4 record(s) selected.

To be precise, the escaping of reserved characters happens upon XML serialization to text.

Now what happens if we want to index and query such XML that has (had) CDATA sections in it? Well, let’s give it a try. First, let’s create an index and then query for “Food & Wine” with the “&” in it:

create index idx1 on cdata(doc) generate keys using xmlpattern '/doc/name' as sql varchar(35);

select id
 from cdata
 where xmlexists('$DOC/doc/name[ . = "Food & Wine"]');

SQL16002N  An XQuery expression has an unexpected token "&" following "Food ".

The query fails, as expected, because the “&” is a reserved character. We need to follow XQuery rules and escape the “&”:

select id
 from cdata
 where xmlexists('$DOC/doc/name[ . = "Food &amp; Wine"]');

ID
 -----------
 1
 2
 3
 4

4 record(s) selected.

As expected, all 4 rows are returned. If you check the execution plan, you will also find that the index can be used.

Conclusions: The use of CDATA sections does not impair the queryability and indexability of XML elements in DB2. Also, it doesn’t matter whether you wrap the CDATA section just around the offending characters or around a larger piece of text in the element value.

 

Of course, in this simple example we wouldn’t necessarily have to use a CDATA section to escape the “&” in the input documents. We could have escaped the “&” as “&amp;” before inserting the documents. But if an element contains a longer piece of text that comes from an XML-agnostic source (i.e. various illegal characters may be present), then it’s much easier to wrap it in a CDATA section than to find and escape the offending characters one by one.

 

 

You might have noticed that DB2 10 for Linux, UNIX, and Windows was announced earlier this month, and it contains a bunch of great new features. One of the new features that I am particularly excited about is Time Travel, which enables bitemporal data management in DB2.

DB2 10 also includes a variety of useful XML enhancements. Let’s take a quick look at the most important ones here:

New XML Index Data Types: Integer, Decimal

As you know, when you create an index on an XML element or attribute you must specify the data type that DB2 should use to interpret and index the specified nodes. The reason is that the use of an XML Schema, which might define data types, is optional. And even if an XML Schema was used, it might not define a data type for each and every element and attribute.

Before DB2 10, an XML index for numeric values had to be specified with the data type DOUBLE, so that any type of numeric value could be indexed. In DB2 10 you can also specify the data types INTEGER and DECIMAL to define a numeric XML index.

If you know that a certain element or attribute can only contain INTEGER (or DECIMAL) values, then you can improve query performance by creating the XML index with the most appropriate type. For example, assume that dept is a table that has an XML column deptdoc. Then you can define the following indexes in DB2 10:

create unique index idx1 on dept(deptdoc) generate key using
xmlpattern '/dept/employee/@id' as sql integer;

create index idx2 on dept(deptdoc) generate key using
xmlpattern '/dept/employee/salary' as sql decimal(6,2);

You don’t need to change existing queries to use these new indexes!

XML Indexes for Case-Insensitive Search

DB2 10 allows you to define case-insensitive indexes and search very efficiently for string values regardless of their upper case or lower case spelling. For example, to enable case-insensitive search of employee names, you could create the following index:

create unique index idx1 on dept(deptdoc) generate key using
xmlpattern '/dept/employee/name/fn:upper-case(.)' as sql varchar(40);

This index stores all key values in upper-case. To find employees with names such as Peter Pan, PETER PAN,  Peter PAN, pEtEr pAn, etc. you could write the following query that can use the index above:

SELECT *
FROM dept
WHERE XMLEXISTS('$DEPTDOC/dept/employee[name/fn:upper-case(.) = "PETER PAN"]');

Index support for Prefix Search

Previously, the following query with the fn:starts-with() function could not use an XML index to find all employees whose name starts with “Jo”:

SELECT *
FROM dept
WHERE XMLEXISTS('$DEPTDOC/dept/employee[fn:starts-with(name, "Jo")]');

In DB2 10, this query can benefit from existing XML indexes of type VARCHAR(n) that may exist on the <name> element.
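For example, an ordinary VARCHAR index on the <name> element, like the one below, is all that is needed; the index name is arbitrary:

create index idx_name on dept(deptdoc) generate key using
xmlpattern '/dept/employee/name' as sql varchar(40);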

XML Indexes for Existential (Structural) Predicates

An existential predicate, sometimes called structural predicate, is a predicate that checks the existence of an XML element or attribute in an XML document, regardless of its value.

For example, imagine you want to find all documents that contain a <comment> element for an employee. One of the following queries comes to mind:

SELECT *
FROM dept
WHERE XMLEXISTS('$DEPTDOC/dept/employee[comment]');

SELECT *
FROM dept
WHERE XMLEXISTS('$DEPTDOC/dept/employee[fn:exists(comment)]');

To support such queries, DB2 10 allows you to create an XML index with the fn:exists function, like this:

create index idx1 on dept(deptdoc) generate key using
xmlpattern '/dept/employee/fn:exists(comment)' as sql varchar(1);

This index must be defined as VARCHAR(1) and each index key represents true or false to indicate the existence of the <comment> element in a given document.

Friendly XML query behavior

DB2 10 also includes several enhancements that make XML queries behave in a more friendly manner. Let’s look at just one of several cases. For example, consider the following query that has a numeric predicate on the @id attribute:

select *
from dept
where xmlexists('$DEPTDOC/dept[@id = 15]');

What happens if this query encounters an XML document where the @id attribute has a non-numeric string value such as “WW Sales”?  In DB2 9.x this situation would cause a type error at run-time and the query would fail and abort with an error message. Although this behavior was compliant with the language standard, it is not very useful. If you look for the department with id = 15, any department with non-numeric id can never be a match. Therefore, DB2 10 simply evaluates the predicate to FALSE in this case, i.e. ignores the document with the non-numeric id, and continues processing the query.

Binary XML client/server communication

With DB2 10, Java applications can choose the format in which XML is transmitted between the application and the DB2 server. You can choose between traditional textual XML and a new binary XML format that is a compressed on-the-wire format for XML.

On the application side, the encoding and decoding of XML to and from the binary format is performed by the JDBC driver (JCC 4.9 or higher). The JDBC driver can convert the binary XML format to and from a text, DOM, SAX, or StAX representation. The use of binary XML provides the most notable performance gain for Java applications that already consume or produce XML in SAX format.

Applications can exchange XML with the DB2 server (through inserts, queries, etc.) and completely avoid the serialization of XML to text, i.e. neither the client nor the server needs to deal with textual XML for the purpose of inserting or retrieving XML. Eliminating the parsing and serialization of textual XML can enable significantly better end-to-end performance.

Other IBM products that support this binary XML communication format include DB2 10 for z/OS, the WebSphere Application Server XML Feature Pack, and the WebSphere DataPower appliances.

Query Performance Enhancements

DB2 10 also includes a number of performance improvements for common types of XML queries, including many types of queries that use the XMLTABLE function.

What is an XML Schema? Some of you may already know this, others don’t. So before I share more technical information about XML Schemas in subsequent blog posts, I’d better get some of the basics out of the way first.

When you process and manage information in XML format, you can choose to use an XML Schema with your XML documents. Roughly speaking, an XML Schema can be used to define what you want your XML documents to look like. For example, in an XML Schema you can define:

  • Which elements and attributes are allowed to occur in your XML documents
  • How the elements can be or must be nested, or the order in which the elements must appear
  • Which elements or attributes are mandatory vs. optional
  • The number of times a given element can be repeated within a document (e.g. to allow for multiple phone numbers per customer, multiple items per order, etc.)
  • The data types of the element and attribute values, such as xs:integer, xs:decimal, xs:string, etc.
  • The namespaces that the elements belong to
  • …and so on.

If you choose to create an XML Schema, it may define just some or all of the aspects listed above. The designer of the XML Schema can choose the degree to which the schema constrains the characteristics of the XML documents. For example, an XML Schema can be very loose and define only a few key features for your XML documents and allow for a lot of flexibility. Or it can be very strict to tightly control the XML data in every aspect. Or anything in between.

The use of an XML Schema is optional, i.e. an XML Schema is not required to store, index, query, or update XML documents. However, an XML Schema can be very useful to ensure that the XML documents that you receive or produce comply with certain structural rules that allow applications to process the XML. In other words, XML Schemas help you enforce data quality.

Validation

If a document complies with a given XML Schema, then the document is said to be valid for this schema. A document might be valid for one schema but invalid for another schema. The process of testing an XML document for compliance with an XML Schema is called validation.

When an XML document is parsed by an XML parser, validation can be enabled as an optional part of the parsing process. Full validation of an XML document always requires XML parsing. For many documents and schemas, validation typically incurs only a small delta cost (in terms of CPU usage) on top of the cost of XML parsing.

What does an XML Schema look like?

An XML Schema itself is an XML document! But, a very special document that needs to comply with very specific rules that are defined by -you guessed it!- another XML Schema, i.e. the schema for schemas.
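For instance, a minimal schema that allows a <doc> root element with a single <name> child of type xs:string could look like this (just a sketch, not a complete real-world schema):

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="doc">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>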

Large XML schemas can consist of multiple schema documents that reference each other through import and include relationships. This allows you to compose an XML Schema out of smaller building blocks in a modular fashion.

I don’t want to go into the syntax details of XML Schemas here; plenty of tutorials and reference material on XML Schema syntax are available online.

When and why should I use an XML Schema?

Simply put, if you want to ensure data quality and detect XML documents that do not comply with an expected format, use an XML Schema and validate each document!

However, what if XML documents pass through multiple components of your IT infrastructure, such as a message queue, an application server, an enterprise service bus, and the database system? If these components do not modify the XML but merely read & route it, examine whether all of these components need to validate each document. For example, if the application server has already validated a document before insertion into a DB2 database, does the document need to be validated again in DB2? Maybe not, if you trust the application layer. Maybe yes, if you don’t.

An XML Schema is also often used as a “contract” between two or more parties that exchange XML documents. With this contract the parties agree on a specific format and structure of the XML messages that they send and receive, to ensure seamless operation.

Practically every vertical industry has defined XML Schemas to standardize XML message formats for the data processing in their industry. A good overview is given by the introduction of this article:

“Getting started with Industry Formats and Services with pureXML”: http://www.ibm.com/developerworks/data/library/techarticle/dm-0705malaika/

How can I validate XML documents in DB2?

Simple. First, you register one or more XML Schemas in the DB2 database. This can be done with CLP commands, stored procedures, or through API calls in the JDBC or .NET interface to DB2. After a schema is registered in DB2, you can use it to validate XML documents, typically when you insert, load, or update them. You can enforce a single XML Schema for all XML documents in an XML column, or you can allow multiple XML Schemas per column. A database administrator can force automatic validation upon document insert, or allow applications to choose one of the previously registered schemas for validation whenever a document is inserted.
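For example, a registration from the CLP could look like this; the two URIs are placeholders, and the relational identifier db2admin.custxsd is reused in the INSERT example below:

REGISTER XMLSCHEMA 'http://example.com/customer.xsd'
  FROM 'file:///home/db2admin/customer.xsd'
  AS db2admin.custxsd COMPLETE;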

And… validation can also be done in SQL statements?

Yup. The SQL standard defines a function called XMLVALIDATE, which can be used for document validation in INSERT statements, UPDATE statements, triggers, stored procedures, and even in queries.

Here is a simple example of an INSERT statement that adds a row to a customer table, which consists of an integer ID column and an XML column called “doc”:

INSERT INTO customer(id, doc)
VALUES (?, XMLVALIDATE( ? ACCORDING TO XMLSCHEMA ID db2admin.custxsd) );

The id and the document are provided by parameter markers “?”, and the XMLVALIDATE function that is wrapped around the second parameter ensures validation against the XML Schema that has been registered under the identifier db2admin.custxsd.

If the inserted document is not compliant with the XML Schema, the INSERT statement fails with an appropriate error message. Similarly, the XMLVALIDATE function can also be used in the right-hand side of the SET clause of an UPDATE statement that modifies or replaces an XML document.
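For illustration, such an UPDATE could look like this, again with the placeholder schema identifier db2admin.custxsd:

UPDATE customer
SET doc = XMLVALIDATE(? ACCORDING TO XMLSCHEMA ID db2admin.custxsd)
WHERE id = ?;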

Ok, so much for now. In my next blog post we’ll go into more detail.

A common question is how to obtain a list of all the elements and attributes that occur in an XML document. Producing such a list is what I call “XML profiling” and in a previous blog post I have discussed several SQL/XML queries that can do this.

An extension of this question is how to get the paths of all the elements and attributes in a document. This seemingly simple task is -unfortunately- not nearly as simple as one would think! XPath and XQuery do not have a function that takes a given element or attribute as input and returns the full path to that node.

The solution is to write a query that traverses the XML document level by level, collects the element names at every level, and concatenates them appropriately to construct the path for every element and attribute at every level.

There are many ways in which this can be done. You can use XQuery or SQL/XML and you can choose whether to use recursion or not. Let’s look at a few examples.

First, let’s create a simple table with a small document that we can use in the examples:

create table mytable(xmldoc XML);

insert into mytable values(
'<Message>
   <Type>Urgent</Type>
   <Person id ="123">
     <FirstName>Robert</FirstName>
     <LastName>Tester</LastName>
   </Person>
 </Message>');

A first and straightforward solution is to start at the root of the document, then at the first level of child nodes, then at the children of each of these child nodes, and so on. For each element or attribute we construct the path by concatenating the path of the parent with the name of the element or attribute. We do this for all nodes at a given level in the tree and then move to the next level of the document:

xquery
for $L1 in db2-fn:xmlcolumn("MYTABLE.XMLDOC")/*
let $L1path := fn:string-join( ($L1/local-name() ),"/" )
return (
  $L1path,
  for $L2 in $L1/(*,@*)
  let $L2path := fn:string-join( ($L1path, $L2/local-name() ),"/" )
  return (
    $L2path,
    for $L3 in $L2/(*,@*)
    let $L3path := fn:string-join( ($L2path, $L3/local-name() ),"/" )
    return (
      $L3path,
      for $L4 in $L3/(*,@*)
      let $L4path := fn:string-join( ($L3path, $L4/local-name() ),"/" )
      return (
        $L4path,
        for $L5 in $L4/(*,@*)
        let $L5path := fn:string-join( ($L4path, $L5/local-name() ),"/" )
        return ($L5path)))));

Message
Message/Type
Message/Person
Message/Person/id
Message/Person/FirstName
Message/Person/LastName

6 record(s) selected.

The obvious shortcoming of this query is that it assumes a maximum of 5 levels in the document. If your documents are deeper than that, you can easily extend the query so that it goes down to 10 or 20 levels, whatever you need. That’s maybe not very elegant, but it works if you can define an upper bound on the depth of your XML documents, which is usually possible.

You probably noticed that the path Message/Person/id should actually be Message/Person/@id because “id” is an XML attribute. The query can be enhanced to take care of such details. In the last two sample queries of my XML profiling post you have seen how to use the self::attribute() test for this purpose.

If you prefer a more elegant solution that does not require any assumption about the maximum depth of the XML documents, then you need to code a recursive query, either in XQuery or in SQL/XML. Let’s try SQL/XML for a change.

You may already be familiar with how recursive SQL works. If not, you can look at several existing examples. The basic idea is to use a WITH clause, also called a “common table expression”, that contains a UNION ALL between the start of the processing and a recursive reference back to the common table expression itself. The following query augments this approach with the XMLTABLE function, which extracts nodes and node names from the XML:

WITH pathstable (name, node, xpath) AS (
  SELECT x.name AS name, x.node AS xmlnode,'/' || x.name AS xpath
  FROM mytable,
       XMLTABLE('$XMLDOC/*'
        COLUMNS
          name    varchar(30) PATH './local-name()',
          node    XML         PATH '.') AS x
  UNION ALL
  SELECT y.name AS name, y.node AS xmlnode, xpath|| '/' || y.name AS xpath
  FROM pathstable,
       XMLTABLE('$XMLNODE/(*,@*)' PASSING pathstable.node AS "XMLNODE"
        COLUMNS
         name    varchar(30) PATH 'local-name()',
         node    XML         PATH '.') AS y
) SELECT name, xpath
  FROM pathstable;

NAME                           XPATH
------------------------------ -------------------------------
Message                        /Message
Type                           /Message/Type
Person                         /Message/Person
id                             /Message/Person/id
FirstName                      /Message/Person/FirstName
LastName                       /Message/Person/LastName

6 record(s) selected

If you want to list the element and attribute values for each path, then you can easily modify this query as follows:

WITH pathstable (name, node, xpath, value) AS (
  SELECT x.name AS name, x.node AS xmlnode,
         '/' || x.name AS xpath, x.value as value
  FROM mytable,
       XMLTABLE('$XMLDOC/*'
        COLUMNS
          name    varchar(30) PATH './local-name()',
          value   varchar(20) PATH 'xs:string(.)',
          node    XML         PATH '.') AS x
  UNION ALL
  SELECT y.name AS name, y.node AS xmlnode,
         xpath|| '/' || y.name AS xpath, y.value as value
  FROM pathstable,
       XMLTABLE('$XMLNODE/(*,@*)' PASSING pathstable.node AS "XMLNODE"
        COLUMNS
          name    varchar(30) PATH 'local-name()',
          value   varchar(20) PATH 'xs:string(.)',
          node    XML         PATH '.') AS y
) SELECT xpath, value
  FROM pathstable;

XPATH                           VALUE
------------------------------- --------------------
/Message                        UrgentRobertTester
/Message/Type                   Urgent
/Message/Person                 RobertTester
/Message/Person/id              123
/Message/Person/FirstName       Robert
/Message/Person/LastName        Tester

6 record(s) selected

A few things to note:

  • The value of an XML element is defined as the concatenation of all text nodes in the subtree under that element. This explains the values that you see for /Message and /Message/Person in the example above.
  • For longer paths you may need to increase the length of the VARCHAR(n) in the XMLTABLE function.
  • In DB2 you may receive warning SQL0347W, which says that this query might recursively run into an infinite loop. But, this would only happen if your XML document was infinitely deep, which isn’t possible. So, you can safely ignore that warning.