Thursday, September 3, 2009

Removing Redundant namespaces from FOR XML output

I've had trouble in the past with extraneous namespaces in the output of my FOR XML queries. I found no intrinsic functionality to remove these extra namespaces.
What happens is when you use the "With XMLNAMESPACES" clause, any namespaces you declare will be added to every top level node you generate as a sub select FOR XML query, even if the namespace is not referenced. For some this might be no big deal, but this is unecessary overhead in an already bulky medium. It also clutters up your output making it harder to read. Depending on the size of  your xml output, this can mean extra 10s or 100s of megabytes or more, of unecessary, redundant namespace attributes.

My first approach was to cast the resulting xml to varchar(max) and do a global replace/removal of any extra tags. This approach doesn't save the overhead of actually having to output the xml as it's already been generated in memory and to disk. You're just removing data, post render, for cleaner output to the calling proc or application.

I found a cleaner approach is to save your output xml to a variable without the namespace declarations first, then select the variable with the namespace declarations using the query method of the xml data type and the FOR XML clause with ROOT directive. The Root directive adds the namespace back to your top level root element.

Here are the two examples using the adventureworks database. The first shows a basic FOR XML query with a correlated subquery. The output of this sub query node will generate a namespace declaration for every node in the list, because of the WITH XMLNAMESPACES declaration. The file produced is 404kb in an average of 40 milliseconds to run.

The second example saves the xml output generate from the same query, without the XMLNAMESPACES clause, and stores it in a local XML variable. The variable is then selected with the XMLNAMESPACES clause with the ROOT directive. This method produces a 248k file in an average 50 milliseconds to run. There is the minor hit in execution time as there's an extra step in selecting the variable, but depending on what your needs are, it may be acceptable.

The IO statistics are identical, so there is no gain or loss in either method, in terms of disc access during query execution. Again, the goal for me was two fold;
  1. Adhere to an expected output format.
  2. Remove redundant namespace declarations from output xml to reduce data being saved to disk or transmitting over the pipe to the client.



-- Example 1:


;WITH XMLNAMESPACES( 'http://www.ns.org/xsd/msg/1.1/'as ns1,


'http://www.ns.org/xsd/1.1/' as ns2 )


SELECT customerid '@id',


customertype '@type',


accountnumber ,


(


SELECT firstname,


lastname


FROM person.contact


WHERE contactid=customerid


FOR XML PATH(''),TYPE


)


FROM sales.customer


WHERE territoryid=10


FOR XML PATH('customer'),TYPE, ROOT('customers')






-- Example 2:


DECLARE @output XML


SELECT @output=


(


SELECT customerid '@id',


customertype '@type',


accountnumber ,


(


SELECT firstname,


lastname


FROM person.contact


WHERE contactid=customerid


FOR XML PATH(''),TYPE


)


FROM sales.customer


WHERE territoryid=10


FOR XML PATH('customer'),TYPE


)


;WITH XMLNAMESPACES( 'http://www.ns.org/xsd/msg/1.1/'as ns1,


'http://www.ns.org/xsd/1.1/' as ns2 )


SELECT @output


FOR XML PATH(''),TYPE , ROOT('customers')

0 Comments:

Post a Comment

<< Home