пятница, 24 сентября 2010 г.

SQL Server: iterations vs. set operations with XML

I previous post I published some common information about AnjLab.FX.SQLServer, set of useful tools for SQL development. Now I would highlight one small research I did during implementation of some functions.

I always consider SQL programming language as set operations as opposed to procedural and object oriented languages. Yes, it includes procedural structures, which allows you to create complex algorithms, but core, nature of SQL approach are set operations – union, join, except, intersect, etc. If something can be done with set operation I prefer this way. In the most cases handling all scope at once is more effective than processing row by row, sometimes, much more effective.

SQL Server 2005/2008 provides ability to use XML data type, store, read and write XML data, transform it to / from relational structures. However, are set operations for XML data as effective as for relational structures? For comparing I do not want to use simple trite abstract examples. Let’s look at inverse tasks I met in practice in several projects – converting rowset column values to string with delimiters and convert string with delimiters to one-column rowset.

Let’s consider first task and use as input rowset of 10K records. For creating it I use AnjLab.FX.SQLServer function getEmptyRowset which returns one-column rowset of given number of records.

Iterations:

declare @String nvarchar(max), @Delimiter nchar(1), @i int

declare @Table table ([Value] nvarchar(10))


insert into @Table select ltrim(str(RecordID))

from fx.GetEmptyRowSet(10000)


set @String = space(0)

set @Delimiter = N','


declare A cursor for select [Value] from @Table

declare @Value nvarchar(10)


open A

fetch next from A into @Value


while @@fetch_status = 0

begin

set @String = @String + @Delimiter + @Value

fetch next from A into @Value

end


close A

deallocate A

select @String

Set operation:

declare @String nvarchar(max), @Delimiter nchar(1), @i int

declare @Table table ([Value] nvarchar(10))


insert into @Table select ltrim(str(RecordID))

from fx.GetEmptyRowSet(10000)


set @String = space(0)

set @Delimiter = N','

set @String = (

select [Value] + @Delimiter as 'data()'

from (select [Value] from @Table) as A

for xml path(''))


select @String

Here is funny trick – we convert relational data to XML with empty structure (xml path(‘’)), which is actually just a string. Profiling (I recommend to use open source AnjLab SQL Profiler) shows duration 1400 ms in first case vs. 350 ms for second. Set operation with XML is 4 times faster than iterations.

Next, focus on inverse task, and use string of the same 10K values. I omitted code for creating such long string in examples below and use just set @String = N'1,2,3,4' to simplify them.

Variant one - Iterations:

declare @String nvarchar(max), @Delimiter nchar(1)

declare @Table table ([Value] nvarchar(max))


set @String = N'1,2,3,4'

set @Delimiter = N','


while charindex(@Delimiter,@String) > 0

begin

insert into @Table values(substring(@String,1,charindex(@Delimiter,@String)-1))


set @String = substring(

@String,

charindex(@Delimiter,@String) + 1,

len(@String) - charindex(@Delimiter,@String))

end

insert into @Table values(@String)


select [Value] from @Table

Variant two – set operation with direct reading from XML:

declare @String nvarchar(max), @Delimiter nchar(1)


set @String = N'1,2,3,4'

set @Delimeter = N','


;with Data(Test) as (select Test = cast(

N'<1>' +

replace(@String, @Delimiter, N'<1>') +

'' as xml))

select Nodes.Node.value(N'.', N'nvarchar(255)') as [Value]

from Data

cross apply Test.nodes (N'//a') Nodes(Node)

What we have done here:

  • Convert string value to XML by replacing delimiters with XML tag <1>
  • Convert scalar string variable to rowset using select. Note, instead of scalar variable we can use table and handle multiply strings.
  • Define rowset as CTE (common table expression) Data. This step is actually, redundant, it just simplifies the code.
  • Using cross apply select node <1> values from XML

Variant three – set operation with openxml:

declare @String nvarchar(max), @Delimiter nchar(1), @docHandle int,

@XML xml


set @Delimeter = N','

set @String = N'1,2,3,4'

set @XML =

cast(N'<1>' +

replace(@String, @Delimiter, N'<1>') +

N'' as xml)


exec sp_xml_preparedocument @docHandle output, @XML


select [Value] = [text] from openxml(@docHandle, N'/root/1')

where [text] is not null


exec sp_xml_removedocument @docHandle

Here we use system procedure call to prepare XML for openxml function. Unfortunately, this makes this variant not applicable to use inside functions.

Let’s compare results: first variant 1600 ms, third – 50 seconds (!), second is endless, I have stopped waiting after 30 minutes (!!). Reading XML with set operation is incredible slower than iterations.

Conclusion is banal: XML is not native structure for SQL Server, reading data from XML is slow, sometimes, very slow operation. XML cannot be considered as replacement for relational structures, it is good only for storing small portions of loosely structured data, like settings. However, writing to XML is fast operation and can be used in such tricks like in first task.

PS. The best approaches were used for implementation FX functions getRowsetAsString and getStringAsRowset.

PPS. For sure, second task can be done with regular expressions, but there is no native support for regexp in SQL Server.

вторник, 21 сентября 2010 г.

AnjLab.FX.SQLServer

AnjLab.FX is .NET framework which includes many various useful tools for NET developers. This is open source project, hosted on GitHub. We, AnjLab team, use FX library in the most of our NET based projects and each new project supplements the library with new functions. You can use it without any restrictions.

AnjLab.FX.SQLServer is an independent part of the project, it can be used separately. It is set of tables, T-SQL based procedures, functions and scripts for Microsoft SQL Server 2005/2008. FX objects can be installed on any existing database on SQL Server, the code includes command line setup routine. In result, in given database will be created separate scheme FX where all objects will be located. No other objects or data will be affected.

Below is a list and short descriptions of the most interesting tools from my point of view:

DDL tools. Note, some FX functions duplicates standard tools, but do the same things in different style and with many options.

  • Stored procedure ScriptObject – returns DDL script for creating given programmability object (stored procedure, view, function, trigger)
  • Stored procedure ScriptTable – returns DDL script for creating given table, including keys, constraints and indexes
  • Stored procedure ScriptEncryptedObject – returns DDL script for creating given encrypted programmability object (stored procedure, view, function, trigger). The procedure works in DAC (dedicated admin access) mode.
  • Stored procedure ScriptDiagram – returns script for creating given diagram
  • Stored procedure ScriptData – scripts all data of given table as set of INSERTS. The procedure has many options.
  • Stored procedure SearchObjectByText - searches SQL Server programmability objects (stored procedures, functions, triggers,     views) by substring in object's body.
  • Script Indexing foreign key fields - creates non-clustered indexes for all fields which are used in foreign keys, if they are not indexed.

Date and time

  • Function CheckLeapYear - returns true if year of given date is leap and false in other case.
  • Function ReduceDate - for given datetime returns start or end of current hour, day, month, etc, in other words, truncates datetime.
  • Function GetPeriods - returns set of date and time periods (hours, days, months, etc) between given start and end dates.

Table valued tools

  • Function GetEmptyRowSet - returns rowset with defined number of record.
  • Function GetStringAsRowSet - returns rowset from string with delimiters

Other

  • Function CalculateYield - returns yield (in %%), based on time period (taking into account leap years) and revenue.
  • Function ConvertVarbinaryToVarcharHex - returns hexadecimal representation of binary data, using chars [0-0a-f].
  • Function Div – just divides one number by another, but handles dividing by zero and converts numbers to float

Tables

  • Countries - world countries dictionary (in Russian and English) according to ISO 3166 and Russian ОКСМ standards
  • Currencies - world currencies dictionary according to Russian ОКВ standard (Russian language only)
  • MeasurementUnits - Russian classification of units of measurement according to Russian ОКЕИ standard (Russian language only)
  • BusinessLegalForms - Russian Classification of Organizational and Legal Forms according to Russian ОКПФ standard (Russian language only)
  • IdentityCardTypes - Russian personal ID types dictionary (Russian language only)

Note, some tools are based on code created by other authors, but all copyrights and information are in place. All scripts have XML style headers with common information.