I have pasted in below a working version of a Boyer Moore string
matching algorithm that is testing OK here. A Boyer Moore string
matching algorithm differs from most search functions in that it
requires a table to be loaded dynamically before the algorithm is
run which reduces its suitability for short recursive searches.
Its advantage is that once the table is loaded, it has a substantial
speed advantage over more conventional search function if the
data being searched for is over about 6 characters in length. This
makes it suitable for searching very large bodies of information
such as complete disk partitions, very large files etc ...
What I need if anyone has the time is some independent testing as
it appears to be working OK here on the range of test material I
have run it on.
The technique requires 2 functions to be run, pass the string being searched
for to the "create_shift_table" function which creates the table
in OLE string memory as its return value.
The second function is the Boyer Moore search algorithm which takes the
following parameters,
Apart from the address of the shift table which must be a string
address, the function will search any address range, not just
string data. The design of two seperate function is so that different
types of searches can be performed, this table is set up for binary
search so it will only do case sensitive search in string data.
I am led to believe that J Moore coded Bob Boyer's original idea
in PDP-10 assembler back in 1977 so coding this algo in a high
performance language in x86 assembler is probably a reasonable
way of doing justice to its design.
Any feedback would be appreciated.
Regards,
[email protected]
------------------
[This message has been edited by Steve Hutchesson (edited April 02, 2001).]
matching algorithm that is testing OK here. A Boyer Moore string
matching algorithm differs from most search functions in that it
requires a table to be loaded dynamically before the algorithm is
run which reduces its suitability for short recursive searches.
Its advantage is that once the table is loaded, it has a substantial
speed advantage over more conventional search function if the
data being searched for is over about 6 characters in length. This
makes it suitable for searching very large bodies of information
such as complete disk partitions, very large files etc ...
What I need if anyone has the time is some independent testing as
it appears to be working OK here on the range of test material I
have run it on.
The technique requires 2 functions to be run, pass the string being searched
for to the "create_shift_table" function which creates the table
in OLE string memory as its return value.
The second function is the Boyer Moore search algorithm which takes the
following parameters,
Code:
ByVal lp_shift as LONG, = StrPtr(shift_table$) ByVal startpos as LONG, = 0 based offset to start search from ByVal lpsource as LONG, = StrPtr(source$) ByVal lnsource as LONG, = len(source$) ByVal lptarget as LONG, = StrPtr(search$) ByVal lntarget as LONG = len(search$)
address, the function will search any address range, not just
string data. The design of two seperate function is so that different
types of searches can be performed, this table is set up for binary
search so it will only do case sensitive search in string data.
I am led to believe that J Moore coded Bob Boyer's original idea
in PDP-10 assembler back in 1977 so coding this algo in a high
performance language in x86 assembler is probably a reasonable
way of doing justice to its design.
Any feedback would be appreciated.
Regards,
[email protected]
Code:
' ######################################################################### FUNCTION create_shift_table(srch$) as STRING #REGISTER NONE LOCAL chTable as STRING ' string to build table in LOCAL lpTable as LONG ' address of table LOCAL lpSrch as LONG ' address of search string LOCAL ssl as LONG ' search string length chTable = space$(1024) ' allocate 256 * 4 bytes ssl = len(srch$) ' get search string length lpSrch = StrPtr(srch$) ' get search string address lptable = StrPtr(chTable) ' get table address ' ------------------------------------------------------ ' fill DWORD table with the length of the search string ' ------------------------------------------------------ ! cld ; clear direction flag to read forward ! mov ecx, 256 ! mov edi, lpTable ! mov eax, ssl ; write search string length to table ! rep stosd ! mov ecx, ssl ; put search string length in ECX ! dec ecx ! mov esi, lpSrch ; address search string ! mov edi, lpTable ; table address ! xor eax, eax ; zero EAX ' ----------------------------------------------- ' write the shift vaue for each search character ' into the correct position in the shift table ' ----------------------------------------------- Write_Shift_Characters: ! mov al, [esi] ; get the character ! inc esi ! mov [edi+eax*4], ecx ; write shift for each character ! dec ecx ; to ascii location in table ! jnz Write_Shift_Characters FUNCTION = chTable END FUNCTION ' ######################################################################### FUNCTION BMsearch(ByVal lp_shift as LONG, _ ByVal startpos as LONG, _ ByVal lpsource as LONG, _ ByVal lnsource as LONG, _ ByVal lptarget as LONG, _ ByVal lntarget as LONG) as LONG #REGISTER NONE LOCAL resetpos as LONG LOCAL buffrend as LONG ! mov esi, lpsource ; source address ! mov edi, lpTarget ; target address ! mov ecx, lntarget ! dec ecx ; correct back one to read last character pair ! add esi, ecx ; set esi to first end compare ! add edi, ecx ; set edi to first end compare ! mov resetpos, edi ; set reset position for shift loop ! mov ebx, esi ; use EBX as shift location counter ' ------------------------------ ' set end of source in buffrend ' ------------------------------ ! mov edx, lpsource ; source address ! add edx, lnsource ; add source length ! inc edx ; end of buffer + 1 ! mov buffrend, edx ! mov edx, lp_shift ; use EDX as address of shift table ! dec lntarget ; dec target length for shift loop ! xor eax, eax ; clear EAX ! add esi, startpos ; move ESI forward by starting position bmMainLoop: ! mov al, [esi] ! dec esi ! dec edi ! cmp al, [edi+1] ! jne bmSetShift ! dec ecx ! jnz bmMainLoop ; if ECX = 0, match has occurred ! jmp bmCalcLength ; exit on match bmSetShift: ! add ebx, [edx+(eax*4)]; add shift to location counter ! mov esi, ebx ; copy location counter + shift to ESI ! mov ecx, lntarget ; reset ECX to target length ! mov edi, resetpos ; reset EDI to end char of target string ! cmp esi, buffrend ; test end of buffer condition ! jbe bmMainLoop ; if not end of buffer, do next string compare ! jmp bmNoMatch ; else set no match and exit bmCalcLength: ! sub esi, lpsource ! mov eax, esi ! jmp bmexitpos bmNoMatch: ! mov eax, -1 bmexitpos: ! mov FUNCTION, eax END FUNCTION ' #########################################################################
[This message has been edited by Steve Hutchesson (edited April 02, 2001).]
Comment