You are not logged in. You can browse in the PowerBASIC Community, but you must click Login (top right) before you can post. If this is your first visit, check out the FAQ or Sign Up.
Does my (abeit untidy) test code above run for you? (Obviously it will need the paths edited accordingly), but if it runs then you are half-way to getting it working in your real program.
Yes, the "&" type-specifier suffix means a LONG integer.
This thread is now getting to long and is closed... if you require more help with this item, please start a new thread and include a link to this thread.
I played with this a little further and the following code is the fruits of my efforts.
When run on the WIN32API.INC file, INSTR() beats the BM algorithm code every time on my Win2K PC, at a fairly consistent 1-30% (worst:best).
Get rid of the B$ allocation within the loops to greatly speed this up too... on my PC, the MB loop runs in 300-400ms, and the INSTR loop around 30% faster.
Here is the code I used:
Code:
#COMPILE EXE
#INCLUDE "WIN32API.INC"
'##########################################################################
FUNCTION ScanString(BYVAL startpos AS LONG, _
BYVAL source AS LONG, BYVAL lnsrc AS LONG, _
BYVAL pattern AS LONG, BYVAL lnpatn AS LONG) AS LONG
#REGISTER NONE
! mov edi, pattern
! mov al, [edi] ; get 1st byte
! mov esi, source
! mov ecx, lnsrc
! add esi, ecx
! neg ecx
! add ecx, startpos
! dec lnpatn
! jmp main_loop
' %%%%%%%%%%%%%%%%%%%%%%%%%%%%
pre_loop:
! pop ecx
! inc ecx
main_loop:
' ------------
' unroll by 2
' ------------
! cmp al, [esi+ecx]
! je pre_sub
! inc ecx
! jz nMatch
! cmp al, [esi+ecx]
! je pre_sub
! inc ecx
! jnz main_loop
' ------------
! jmp nMatch
pre_sub:
! push ecx
! mov edi, pattern
! mov ebx, lnpatn
sub_loop:
! mov ah, [esi+ecx+1]
! cmp ah, [edi+1]
! jne pre_loop
! inc ecx
! inc edi
! dec ebx
! jnz sub_loop
' %%%%%%%%%%%%%%%%%%%%%%%%%%%%
! pop eax
! add eax, lnsrc
! jmp Outa_Here
nMatch:
! mov eax, -1
Outa_Here:
! mov FUNCTION, eax
END FUNCTION
' ########################################################################
FUNCTION PBMAIN
LOCAL fl AS LONG
REDIM patterns$(25)
OPEN "F:\PBDLL60\WINAPI\win32api.inc" FOR BINARY AS #1
' OPEN "C:\WINNT5\SYSTEM32\SHELL32.DLL" FOR BINARY AS #1
fl = LOF(1)
GET$ #1,fl,a$
CLOSE #1
patterns$(0) = "VerQueryValue"
patterns$(1) = "GetFileVersionInfo"
patterns$(2) = "GetFileVersionInfoSize"
patterns$(3) = "VerInstallFile"
patterns$(4) = "VerFindFile"
patterns$(5) = "SystemTimeToVariantTime"
patterns$(6) = "VariantTimeToSystemTime"
patterns$(7) = "VarDateFromStr"
patterns$(8) = "WinExecError"
patterns$(9) = "SHFreeNameMappings"
patterns$(10) = "DragFinish"
patterns$(11) = "DragAcceptFiles"
patterns$(12) = "Shell_NotifyIcon"
patterns$(13) = "ShellExecute"
patterns$(14) = "ShellAbout"
patterns$(15) = "SHFormatDrive"
patterns$(16) = "SHGetNewLinkInfo"
patterns$(17) = "SHGetFileInfo"
patterns$(18) = "SHFileOperation"
patterns$(19) = "SHAppBarMessage"
patterns$(20) = "FindExecutable"
patterns$(21) = "FindEnvironmentString"
patterns$(22) = "ExtractIconEx"
patterns$(23) = "ExtractIcon"
patterns$(24) = "ExtractAssociatedIcon"
aln& = LEN(a$)
LOCAL cnt AS LONG
cnt = 0
b$ = ""
tc2& = GetTickCount
DO
rv& = ScanString(0,STRPTR(a$),aln&, _
STRPTR(patterns$(cnt)),LEN(patterns$(cnt)))
b$ = b$ + STR$(rv&)
! inc cnt
LOOP WHILE cnt < 25
tc2& = GettickCount - tc2&
cnt = 0
b$ = b$ + $CRLF
tc& = GetTickCount
DO
rv& = INSTR(a$, patterns$(cnt))
b$ = b$ + STR$(rv&)
! inc cnt
LOOP WHILE cnt < 25
tc& = GetTickCount - tc&
b$ = B$ + $CRLF + "INSTR/MB = " + STR$(tc&/tc2&)
MessageBox hWin&,BYVAL STRPTR(B$),"MB=" + STR$(tc2&) + " INSTR=" + STR$(tc&), _
%MB_OK OR %MB_ICONINFORMATION
END FUNCTION
I changed the code to load SHELL32.DLL (OPEN "C:\WINNT5\SYSTEM32\SHELL32.DLL" FOR BINARY AS #1) and ran Steves code above... it works fine for me...
My K6/266 (with an MP3 player running and a bunch of other apps going too), it did all 25 searches in SHELL32.DLL in 801mS-841mS, and found all of the API names that are actually contained in SHELL32.DLL.
I can only suggest that you relook at your code or post it here...
Steve, that code seems to be working. It returns the correct location
for each of them strings.
Lance, i started an entirely new project, pasted in the ScanString function,
created a PBMAIN with the previous example Steve posted - but that would always return
622 no matter what i searched for.
Steve, have you tested this code with Binary files? ie: EXEs.
No matter what i search for in a binary file, it will return -1.
You are right though, these Boyer Moore algorithms are damn fast!!
Here is a quick test piece that does the type of searching you
require. This runs 25 iterations of win32api.inc in about 150ms
on my PIII with 25 different patterns searched for in the same
source file. About 125 meg/sec which is probably in the ball park
in terms of the speed you require.
This test code below is running 100 iterations through win32api.inc in about
700 ms on my PIII so it is searching about 75 meg in 700 ms.
Code:
Open "win32api.inc" for Binary as #1
Get$ #1,lof(1),a$
Close #1
b$ = "' ** VERSION.DLL Declares" ' << almost at the end
LOCAL cnt as LONG
cnt = 0
tc& = GetTickCount
ap& = StrPtr(a$)
aln& = len(a$)
bpt& = StrPtr(b$)
bln& = len(b$)
Do
rv& = ScanString(0,ap&,aln&,bpt&,bln&)
! inc cnt
loop while cnt < 100
tc2& = GetTickCount - tc&
MessageBox hWin,ByCopy str$(rv&),ByCopy str$(tc2&), _
%MB_OK or %MB_ICONINFORMATION
I would make sure that what you are passing to the "ScanString" function
is correct. Perhaps you need a 3 dimensional array with STRING, ADDRESS and
LENGTH so that you can load the source file and just loop through it.
Converting each time you run a string is a lot slower, far better to have them
all preloaded so that when you do the scan, they are ready.
Steve,
I have a LoadDB sub which is called in PBMAIN, this loads a currently
small definitions file of 30 into 2 dimensions string array. One of the
subscripts in this contains a hexidecimal search string. Still in the
LoadDB sub, i convert these Hexidecimal strings using Chr and Val into
normal ASCII. The search array of strings is then loaded into memory.
Next i have a 'AnalyseFile' function. This opens the file specified in
the 'szFile as String' parameter of the function. Then starts a For Next loop
to analyse that file for each search string in the 2D search string array
that was created from LoadDB.
If i use the normal InStr function that comes with PB, this will
analyse a 3.2mb BMP file for all 30 search strings in 651 ms.
I used GetTickCount and a bit a math to work out a Scan Speed in Mb/sec.
And Calculated 6Mb/sec - which is awfully slow
Right now, i'm working with your ScanString function. But it doesn't find anything,
even if they are valid strings as found in the file..
Code:
LOCAL tC AS LONG, enumI AS LONG
LOCAL retVal AS LONG, MainStr AS STRING, SearchStr AS STRING
SearchStr = ayFileDef(enumI, 2)
retVal = ScanString( 0, STRPTR(MainStr), LEN(MainStr), STRPTR(SearchStr), LEN(SearchStr) )
'retVal = INSTR( 1, MainStr, SearchStr )
I'm using the exact same ScanString as you posted.
It will always return -1, what am i doing wrong?
Thanks very much and bare with me
Best,
-Nathan.
[This message has been edited by Nathan Evans (edited July 01, 2001).]
Thanks for testing the algo, this PIII I use has so many "unusual"
effects when timing algorithms that it can nearly drive you nuts.
It looks like the ScanString algo is still a bit too slow in its
branching code so i will see if I can find a way to get its branch
speed up a bit.
Nathan,
You should not have to do a conversion to a basic dynamic string
to use the two algorithms that I have pasted in, they work directly
in addresses and the way you load the file using the API function
should give you the start address and the file length.
To get your speed up, I would suggest that you make an array of the
search patterns you wish to use and store their lengths as well so
that this information is available when you start the scanning for
each file. The idea is to load the search patterns at startup so that
you have both their addresses and their lengths.
When you load the file to scan from disk, you will have the start address
and length and you can loop through the array of patterns with very little
overhead once you set it up.
The length of the pattern being searched is not that important in most
instances, the scanners will stop testing after the first mismatch and
continue with the main scan. If the Boyer Moore algo runs OK, it
will actually get faster as the pattern gets longer.
Borje/Steve, i had a go with the code you kindly posted, and have had some success.
I am inputting a 500kb file.. and searched it for 15 strings.
The total process, per GetTickCount, took 40ms. This is MUCH faster than
a previous implementation i did in VisualBasic using it's InStr, if i recall
it was around 300ms..
However much i would like to use this code. I do not believe 40ms is fast enough,
when searching for 15 strings.
May i ask a question?..
Is the speed of the ScanString function affected by the length of
the SearchStr ? ie: will using longer search strings increase or decrease
the speed of the search ?
I will keep fiddling with this code, and see if i can make it any more efficient.
Okay, tested on relatively slow machine. ScanString is a bit faster
and seems to work fine. Must observe results though, since they are
zero based and returns -1 on failure, unlike INSTR's 1-based and zero
on failure. Very interesting code, Hutch - I like it a lot.
Code:
' Processor = 266 MHz
' MainStr = 80KB text file.
' wrd() = array of 3000 different phrases, ranging from 2 - 37 bytes long
' NumRecs = UBOUND(wrd)
'
' Average results from multiple scans on file
' result INSTR, 603 hits/scan in 6.39 sec.
' result myINSTR, 603 hits/scan in 6.25 sec.
' result ScanString, 603 hits/scan in 5.58 sec.
' test code as follows: (here for ScanString, but same kind was used for all)
' ------------------------------------------------------------------------------------
LOCAL c AS LONG, mPtr AS LONG, sPtr AS LONG, mLen AS LONG, sLen AS LONG, _
hit AS LONG, lRes AS LONG, tc1 AS LONG, tc2 AS LONG
tc1 = GetTickCount()
mPtr = STRPTR(MainStr)
mLen = LEN(MainStr)
FOR c = 0 TO Numrecs
SearchStr = wrd(c)
sPtr = STRPTR(SearchStr)
sLen = LEN(SearchStr)
lRes = ScanString(0, mPtr, mLen, sPtr, sLen)
IF lRes > -1 THEN
INCR hit
DO
lRes = ScanString(lRes + sLen, mPtr, mLen, sPtr, sLen)
IF lRes > -1 THEN INCR hit
LOOP WHILE lRes > -1
END IF
NEXT
tc2 = GetTickCount() - tc1
MSGBOX STR$(hit) + " " + FORMAT$(tc2/1000, "0.0000") + " Pos: " + FORMAT$(lRes), %MB_OK OR %MB_ICONINFORMATION, "ScanString"
' ------------------------------------------------------------------------------------
I think before i can do anything, i need to convert this Byte Array
into a String. As it seems none of these search algorithms will process
a Byte Array
I wrote a simple For Next loop using Chr$ to convert the Byte Array into a string
but it's far too slow..
I scoured the helpfiles for a PB equivilent of VB's StrConv, but couldn't
find anything. You guys have any fast subs to do it?
If you have the time and still have the test handy, would you run this
byte scanner through it to see how it shapes up, I only have it
benchmarked in MASM when I was using this type of algo to test
against the Boyer Moore ones I was working on.
What I was trying to do with this one was bridge the gap between
scanners and the BM type algos. It has a short main loop instruction
path where my testing showed that the main time was taken with byte
scanners, its branch path is reasonable but I could not se an easy way to
shorten it. I unrolled the main loop by 2 so that there were no
unusual effects from it being so short.
Nathan, According to your snippet "szdata" is a BYTE array, so you cannot use STRPTR() on it!
Maybe you need to post more of the code so we can see exactly what you are doing here, but you probably just need to pass it as BYVAL VARPTR(szData(0)) which will pass the address of the 1st subscript (0) of the szData() array (assuming szData's LBOUND is 0).
Use the version I pasted in, it does not use the external shift table that the
first one did.
Code:
FUNCTION BMBinSearch(ByVal startpos as LONG, _
ByVal lpSource as LONG,ByVal srcLngth as LONG, _
ByVal lpSubStr as LONG,ByVal subLngth as LONG) as LONG
use either VarPtr() or StrPtr() to get the address of the source, depending on
how you load the source, the source length, the address of the pattern to search
for and the pattern length.
We process personal data about users of our site, through the use of cookies and other technologies, to deliver our services, and to analyze site activity. For additional details, refer to our Privacy Policy.
By clicking "I AGREE" below, you agree to our Privacy Policy and our personal data processing and cookie practices as described therein. You also acknowledge that this forum may be hosted outside your country and you consent to the collection, storage, and processing of your data in the country where this forum is hosted.
Leave a comment: