Carlos Nunes-Ueno wrote:
> I have a fairly large table (700,000 rows or so) that I'd like to run a
> process. However, the procedure we have right now was designed with
> tables of more like 20,000 rows and isn't able to handle it. Access
will
> always crash before it can complete.
>
> Some background: the procedure is used in the process of data cleanup
and
> is designed to process company names into a standardized form, so that
we
> can use it to confirm data across various datasets. The procedure takes
> input like, for example, "The Yummy and Tasty Waffle Cor****ation" or
> "Yummy & Tasty Waffle, Incor****ated" and turns both into "TASTYWAFFLE".
> We can then sort, link, and filter, etc. on this field along with others
> to see if there are duplicates or check if companies that have different
> IDs are in fact the same company.
>
> Specifically, another procedure takes a specified table and field and
> creates a new field, filling it with the contents of the original field.
> That procedure then p***** a DAO recordset and the name of the new field
> to the main procedure which then performs 11 operations to arrive at the
> "TASTYWAFFLE" stage. Filling the new field is clearly double work and
> I'll be triming that part out.
>
> Now for my specific questions: Currently, the procedure takes the whole
> contents of the field at once, and uses a lot of InStr, Mid, Left, and
> Right functions to perform all of the operations, then moves on to the
> next row. It seems more direct to just read character by charater until
> I have a complete word (i.e. I hit a space or other delimiter), process
> that bit, then move on to the next part of the field. Which of these
> approaches is more efficient? Also, where could I go to find some
> guidelines on writing the most scalable VBA code? I know Access has
> limitations, but I'd like to be limited by those and not by our own
> inefficiencies.
>
> Thanks in advance,
>
> Carlos
You could create a column in a query like the following. It assumes
your field name is called CompanyName and the new calculated column will
be called is ShortCompanyName.
ShortCompanyName : MakeShortName([CompanyName])
I'd suggest, when testing, to create a query that selects 3 or 4 names
from the table so it doesn't calc the entire lot in case there's a code
error. I used a return value type variant in case the field is null.
Drop the following code into a module. Then run the test query. This
assumes you are using A2000+ that has the Split() function.
Public Function MakeShortName(strVar As Variant) As Variant
Dim strHold As String
Dim strAR() As String
Dim strChar As String
Dim i As Integer
If Not IsNull(strVar) Then
'strip out non-ascii chars
For i = 1 To Len(strVar)
strChar = UCase(Mid(strVar, i, 1))
'convert the string to only include alphanumerics and spaces
If IsNumeric(strChar) Or (Asc(strChar) >= 65 And _
Asc(strChar) <= 90) Or Asc(strChar) = 32 Then
strHold = strHold & Mid(strVar, i, 1)
End If
Next
'creates an array of all words with a space as the delimiter.
strAR = Split(strHold, " ")
strHold = ""
For i = LBound(strAR) To UBound(strAR)
Select Case strAR(i)
Case "The", "And", "Or", "Corp", "Cor****ation", "Inc", "St"
'excludes words that match
Case Else
strHold = strHold & strAR(i)
End Select
Next
End If
MakeShortName = strHold
End Function
I probably would not want to sort the query with 700K names on this
calculated column. However, you could use the function in an update
query to update a field to hold the shortened company name that's
indexed in order to make it fast.
Magical Freefall Tripbox
http://www.youtube.com/watch?v=5o7NLLT4nDg


|