Sometimes you need to write a SQL query that you know is going to be really ugly, or you may need to do something that SQL doesn’t support natively. Common Language Runtime (CLR) functions can help in these types of situations.

This is a bit of a contrived example, but I have a real-world need for this, so let’s get to it. A client of mine has an e-commerce site that I support, and their products & descriptions are imported from the supplier. Their request went something like this:

Every part has a SKU and a description. Sometimes a new part will replace an old part and the new part’s description will say, “…This part replaces 12345…” where 12345 is the SKU for the old part.

We want to make sure that a customer lands on the new part’s page no matter what.

Okay, this doable. I need to search through the product descriptions and pluck out the affected records, and ultimately grab the old/new SKU numbers. A quick & dirty SQL query could find these:

select SKU as NewSKU, 
substring(LongDescription, patindex('%[0-9][0-9][0-9][0-9][0-9]%', LongDescription), 5)
as OldSSKU
from Product
where LongDescription like '%replaces%' 
and patindex('%[0-9][0-9][0-9][0-9][0-9]%', LongDescription) > 0

This query works, but I quickly realized that the SKUs are not *always* 5 digits. Sometimes they are 6 digits, or sometimes like ‘12345-1’. This could get ugly quick. What if we could pass in a Regular Expression? SQL Server doesn’t support that. This is where CLR can come in handy.

First, I’ll create a new C# Class Library project and pop in the RegEx code:

using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
using System.Collections;
using System.Collections.Generic;

namespace CLRRegEx
{
    public class Class1
    {
        [SqlFunction(DataAccess = DataAccessKind.Read, 
            FillRowMethodName = "FillMatches", 
            TableDefinition = "GroupNumber int, MatchText nvarchar(4000)")]
        public static IEnumerable RegexGroupValues(string Input, string Pattern)
        {
            List<RegexMatch> GroupCollection = new List<RegexMatch>();

            Match m = System.Text.RegularExpressions.Regex.Match(Input, Pattern);
            if (m.Success)
            {
                for (int i = 0; i < m.Groups.Count; i++)
                {
                    GroupCollection.Add(new RegexMatch(i, m.Groups[i].Value));
                }
            }

            return GroupCollection;
        }

        public static void FillMatches(object Group, out SqlInt32 GroupNumber, out SqlString MatchText)
        {
            RegexMatch rm = (RegexMatch)Group;
            GroupNumber = rm.GroupNumber;
            MatchText = rm.MatchText;
        }

        private class RegexMatch
        {
            public SqlInt32 GroupNumber { get; set; }
            public SqlString MatchText { get; set; }

            public RegexMatch(SqlInt32 group, SqlString match)
            {
                this.GroupNumber = group;
                this.MatchText = match;
            }
        }
    }
}

Now that it’s compiled into a .dll, we need to add it into SQL Server.

In SSMS, ensure that CLR is enabled with this:
sp_configure 'clr enabled', 1 GO RECONFIGURE GO

Security settings prevent the registration of our dll; we need to temporarily relax the CLR security:
sp_configure 'clr strict security', 0; RECONFIGURE; GO

In this post, I’ll simply copy the .dll directly onto the SQL Server’s file system. If you don’t have access to the file system, you can still use CLR but you’ll have to jump through some extra hoops. I won’t get into that in this post, but I will in my next one. Copy the .dll to the SQL Server machine (C:\Temp is what I used).

Add the assembly:
CREATE ASSEMBLY SqlRegularExpressions FROM 'C:\Temp\CLRRegEx.dll' WITH PERMISSION_SET = SAFE GO

Bind the assembly method with a new SQL function:
EXECUTE dbo.sp_executesql @statement = N'CREATE FUNCTION RegExMatcher(@text nvarchar(max), @pattern nvarchar(255)) RETURNS TABLE ([Index] int, [Value] nvarchar(4000)) AS EXTERNAL NAME CLRRegEx.[CLRRegEx.Class1].RegEx'

Reset the CLR security:
sp_configure 'clr strict security', 1; RECONFIGURE; GO

Now that everything is configured/installed, we can call it like this:

select SKU as NewSKU, Value as OldSKU 
from Product
cross apply dbo.RegExMatcher(LongDescription,'[0-9]{5}.')
where LongDescription like '%replaces%'

Now we can get fancy and use whatever regular expressions we want without being limited by SQL Server’s built-in functions.

You’ll notice that I left the “where” clause in there. That ensures that I never pass a NULL LongDescription to the RegExMatcher function. I didn’t code for it and it will definitely blow up as shown below:

I could have removed the where clause and added “replaces” into the RegEx, but why? I’m not a glutton for punishment. The where clause is WAY more simple and works just fine. Additionally, and this is just my experience, it’s faster. I suspect SQL is only sending the “where-clause-limited” rows to CLR instead of the whole dataset.

I’m not one to reinvent the wheel so 99.99% of the time I’ll use built-in SQL functions, but for that last .01% CLR can come in handy. My next post will show you how a hacker might use CLR against you 😈