пятница, 24 сентября 2010 г.

SQL Server: iterations vs. set operations with XML

I previous post I published some common information about AnjLab.FX.SQLServer, set of useful tools for SQL development. Now I would highlight one small research I did during implementation of some functions.

I always consider SQL programming language as set operations as opposed to procedural and object oriented languages. Yes, it includes procedural structures, which allows you to create complex algorithms, but core, nature of SQL approach are set operations – union, join, except, intersect, etc. If something can be done with set operation I prefer this way. In the most cases handling all scope at once is more effective than processing row by row, sometimes, much more effective.

SQL Server 2005/2008 provides ability to use XML data type, store, read and write XML data, transform it to / from relational structures. However, are set operations for XML data as effective as for relational structures? For comparing I do not want to use simple trite abstract examples. Let’s look at inverse tasks I met in practice in several projects – converting rowset column values to string with delimiters and convert string with delimiters to one-column rowset.

Let’s consider first task and use as input rowset of 10K records. For creating it I use AnjLab.FX.SQLServer function getEmptyRowset which returns one-column rowset of given number of records.

Iterations:

declare @String nvarchar(max), @Delimiter nchar(1), @i int

declare @Table table ([Value] nvarchar(10))


insert into @Table select ltrim(str(RecordID))

from fx.GetEmptyRowSet(10000)


set @String = space(0)

set @Delimiter = N','


declare A cursor for select [Value] from @Table

declare @Value nvarchar(10)


open A

fetch next from A into @Value


while @@fetch_status = 0

begin

set @String = @String + @Delimiter + @Value

fetch next from A into @Value

end


close A

deallocate A

select @String

Set operation:

declare @String nvarchar(max), @Delimiter nchar(1), @i int

declare @Table table ([Value] nvarchar(10))


insert into @Table select ltrim(str(RecordID))

from fx.GetEmptyRowSet(10000)


set @String = space(0)

set @Delimiter = N','

set @String = (

select [Value] + @Delimiter as 'data()'

from (select [Value] from @Table) as A

for xml path(''))


select @String

Here is funny trick – we convert relational data to XML with empty structure (xml path(‘’)), which is actually just a string. Profiling (I recommend to use open source AnjLab SQL Profiler) shows duration 1400 ms in first case vs. 350 ms for second. Set operation with XML is 4 times faster than iterations.

Next, focus on inverse task, and use string of the same 10K values. I omitted code for creating such long string in examples below and use just set @String = N'1,2,3,4' to simplify them.

Variant one - Iterations:

declare @String nvarchar(max), @Delimiter nchar(1)

declare @Table table ([Value] nvarchar(max))


set @String = N'1,2,3,4'

set @Delimiter = N','


while charindex(@Delimiter,@String) > 0

begin

insert into @Table values(substring(@String,1,charindex(@Delimiter,@String)-1))


set @String = substring(

@String,

charindex(@Delimiter,@String) + 1,

len(@String) - charindex(@Delimiter,@String))

end

insert into @Table values(@String)


select [Value] from @Table

Variant two – set operation with direct reading from XML:

declare @String nvarchar(max), @Delimiter nchar(1)


set @String = N'1,2,3,4'

set @Delimeter = N','


;with Data(Test) as (select Test = cast(

N'<1>' +

replace(@String, @Delimiter, N'<1>') +

'' as xml))

select Nodes.Node.value(N'.', N'nvarchar(255)') as [Value]

from Data

cross apply Test.nodes (N'//a') Nodes(Node)

What we have done here:

  • Convert string value to XML by replacing delimiters with XML tag <1>
  • Convert scalar string variable to rowset using select. Note, instead of scalar variable we can use table and handle multiply strings.
  • Define rowset as CTE (common table expression) Data. This step is actually, redundant, it just simplifies the code.
  • Using cross apply select node <1> values from XML

Variant three – set operation with openxml:

declare @String nvarchar(max), @Delimiter nchar(1), @docHandle int,

@XML xml


set @Delimeter = N','

set @String = N'1,2,3,4'

set @XML =

cast(N'<1>' +

replace(@String, @Delimiter, N'<1>') +

N'' as xml)


exec sp_xml_preparedocument @docHandle output, @XML


select [Value] = [text] from openxml(@docHandle, N'/root/1')

where [text] is not null


exec sp_xml_removedocument @docHandle

Here we use system procedure call to prepare XML for openxml function. Unfortunately, this makes this variant not applicable to use inside functions.

Let’s compare results: first variant 1600 ms, third – 50 seconds (!), second is endless, I have stopped waiting after 30 minutes (!!). Reading XML with set operation is incredible slower than iterations.

Conclusion is banal: XML is not native structure for SQL Server, reading data from XML is slow, sometimes, very slow operation. XML cannot be considered as replacement for relational structures, it is good only for storing small portions of loosely structured data, like settings. However, writing to XML is fast operation and can be used in such tricks like in first task.

PS. The best approaches were used for implementation FX functions getRowsetAsString and getStringAsRowset.

PPS. For sure, second task can be done with regular expressions, but there is no native support for regexp in SQL Server.

Комментариев нет: