On Wed, 2 Jul 2008 03:06:22 -0700 (PDT), lyle fairfield
<lyle.fairfield@[EMAIL PROTECTED]
> wrote:
Comments in-line.
<clip>
>
>I'm not so familiar with running .Net in SQL Server. Does ".Net
>assembly in SQL Server 2005" imply the function is converted to
>assembly language? Or is it compiled to machine code? P-Code? Or just
>interpreted script? Will any of these impact much on string functions
>anyway?
TvS: It runs like any other .Net assembly: Visual Studio compiles it
to IL, then the .Net runtime converts it to machine code the CPU
understands. .Net is smart enough to cache the compilation. I was
surprised how fast it was compared to my C DLL.
>
>Will the Ratcliff/Obershelp algorithm deal with such directions as:
>If you find "Yummy" and "Tasty" in the string return "YummyTasty"?
No. The algorithm dus not return a string, but a number between 0 and
1 indicating the relative similarity of two given strings. Using the C
DLL I would call it like this:
Declare Function simil Lib "simildll" Alias "_simil@[EMAIL PROTECTED]
" (ByVal strOne
As String, ByVal strTwo As String) As Double
Debug.Print simil("The Yummy and Tasty Waffle Cor****ation", "Yummy
& Tasty Waffle, Incor****ated")
=> 0.75
The .Net assembly is a bit more complicated to call: you have to first
register the assembly so it shows up under database > programmability
> assemblies,
Then you create a scalar function like this:
ALTER FUNCTION [dbo].[clrSimil](@[EMAIL PROTECTED]
[nvarchar](80), @[EMAIL PROTECTED]
(80))
RETURNS [float] WITH EXECUTE AS CALLER
AS
EXTERNAL NAME [Simil].[Simil.clsSimil].[similDotNet]
Then you call that function in your sproc, for example:
ALTER PROCEDURE [dbo].[cboCompanies]
@[EMAIL PROTECTED]
varchar(255),
@[EMAIL PROTECTED]
float
AS
BEGIN
IF @[EMAIL PROTECTED]
= 1.0 --exact match
BEGIN
SELECT CompanyID, CompanyName
FROM tblCompanies
WHERE CompanyName=@[EMAIL PROTECTED]
CompanyID, CompanyName
FROM tblCompanies
WHERE dbo.clrSimil(@[EMAIL PROTECTED]
CompanyName) > @[EMAIL PROTECTED]
BY CompanyName
END
END
>
>An air code RegExp that does that with both strings provided by the
>OP, runs a million iterations in 7 seconds on my computer. Of course,
>that's just P-Code.
>
>If I were doing this I'd probably explore getting big strings with
>ADO's GetString, or maybe XML record set saves, and applying the rules
>to those big strings that included many records.
>
>Sample Code (done over coffee and with the connivance of allergy
>induced sleeplessness - we have met the pollen and it has won!)
>
>Private Declare Function GetTickCount& Lib "kernel32" ()
>
>Private Sub Whatever()
> Dim Iterator&
> Dim RegExp As Object
> Dim ThousandthsofaSecond&
> Dim WhatComesOut$
> Dim WhatGoesIn$(0 To 1)
> WhatGoesIn(0) = "The Yummy and Tasty Waffle Cor****ation"
> WhatGoesIn(1) = "Yummy & Tasty Waffle, Incor****ated"
> ThousandthsofaSecond = GetTickCount()
> Set RegExp = CreateObject("VBScript.RegExp")
> With RegExp
> .Global = True
> .IgnoreCase = True
> .pattern = ".*yummy.*tasty.*"
> For Iterator = 0 To 999999
> WhatComesOut = .Replace(WhatGoesIn(Iterator Mod 2),
>"YUMMYTASTY")
> Next Iterator
> End With
> Debug.Print _
> "(" & WhatGoesIn(0) _
> & "||" _
> & WhatGoesIn(1) & ")" _
> & "->" _
> & WhatComesOut & vbNewLine _
> & Iterator _
> & " times in " _
> & (GetTickCount() - ThousandthsofaSecond) / 1000 _
> & " seconds"
>End Sub
>
>(The Yummy and Tasty Waffle Cor****ation||Yummy & Tasty Waffle,
>Incor****ated)->YUMMYTASTY
>1000000 times in 7.176 seconds
>
>
>
Impressive, but this strategy requires custom programming for each
company, whereas I'm suggesting a different more generic approach that
can group like strings together.
Now I'm not saying Ratcliff is a great algorithm for all cases. It can
fail spectacularly because it works based on largest common substring.
OP should read the header of the original source code.
-Tom.


|