Searching in strings > 32kb

  • Lance Edmonds
    replied
    JFYI, Ethan Winer's book (circa 1995) can be downloaded from the PowerBASIC web site at http://www.powerbasic.com/files/pub/docs/WINER.ZIP

    ------------------
    Lance
    PowerBASIC Support

  • Emil Menzel
    replied
    Wayne:
    If your problem isn't already solved, see Chapter 8 in Ethan
    Winer's book, "BASIC Techniques and Utilities", Ziff-Davis Press,
    1991. You can download a free copy of the book from a number
    of web sites; try Ethan's site to begin with (www.ethanwiner.com).

    This book deals with QuickBASIC, but the search functions given
    will need little if any revision to run in PB.


    ------------------


  • Wayne Diamond
    replied
    FUNCTION = lFilePtr + lRes - 1: yes, that was what I was after.
    I just used it to look for a string located at the very end of a 13 MB text file, and it found it virtually instantly! *stoked*
    Thanks again very much for your time and work, Lance. I should be able to take the training wheels off my PBDOS now.
    Havvagoodweekend!


    ------------------


  • Lance Edmonds
    replied
    Well, if you want to return the actual byte position (in accordance with OPTION BINARY BASE) of the match, it should really be FUNCTION = lFilePtr + lRes - 1. My example above simply returned TRUE (non-zero) to indicate that the match was made.

    You could probably optimize the code a little more by moving LOF() outside of the loop, and changing LOCK mode to LOCK READ WRITE (to allow PowerBASIC to use its internal buffering facilities), etc.
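
    To make the offset arithmetic concrete, here is a small sketch, written in Python purely for illustration rather than as PB/DOS code. The helper name absolute_position is hypothetical. It assumes, as in the posted function, that the block's starting file position follows the OPTION BINARY BASE setting and that the INSTR-style match index is 1-based.

```python
def absolute_position(block_start, match_index):
    """Combine a block's file position with a 1-based match index inside it.

    block_start uses whatever base the file positions use (0 or 1);
    match_index is 1-based, as returned by an INSTR-style search.
    The result is in the same base as block_start.
    """
    return block_start + match_index - 1


data = b"0123456789ABCDEF"
block = data[8:]                       # block beginning at the 9th byte of the data
match_index = block.find(b"AB") + 1    # convert Python's 0-based find to a 1-based index

print(absolute_position(9, match_index))   # file positions counted from 1: prints 11
print(absolute_position(8, match_index))   # file positions counted from 0: prints 10
```

    Adding the block size instead (lRes + Chunk) would only be right by accident, because the pointer advances by less than a full block on each pass.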


    ------------------
    Lance
    PowerBASIC Support

  • Wayne Diamond
    replied
    Thanks for your time, Lance. This is really stumping me.
    Shouldn't this: FUNCTION = lRes
    be this: FUNCTION = lRes + Chunk
    ?
    Otherwise you'll always return a value in the range of 0-32k.
    Apart from that, I think this is exactly what I need to get up and running, although I haven't done any speed tests yet.


    ------------------


  • Lance Edmonds
    replied
    Well, ok, I made you wait long enough...

    I note that you used ON ERROR RESUME NEXT, but do no error testing. It is always a good idea to use $ERROR ALL ON to help the debugging effort.

    Anyway, I revised your code slightly... may not be 100% bulletproof, but it seems to work fine for me:
    Code:
    $CPU 8086                 ' program works on any CPU
    $COMPILE EXE              ' compile to an EXE
     
    CLS
    PRINT "Starting..."
    PRINT "fInstr = " & STR$(fInstr("F:\PBDLL60\WINAPI\WIN32API.INC", "DeleteObject"))
     
    ' ==================================================================
    FUNCTION fInstr(sFile AS STRING, sSearchTxt AS STRING) LOCAL AS LONG
    '   ON ERROR RESUME NEXT
    
        IF ISFALSE LEN(DIR$(sFile, 7)) THEN EXIT FUNCTION
    
        DIM sChunk AS STRING
        DIM Chunk AS INTEGER
        DIM hFile AS INTEGER
        DIM lFilePtr AS LONG
        DIM lRes AS INTEGER
    
        hFile = FREEFILE
        lFilePtr = PBVBINBASE
        OPEN sFile FOR BINARY ACCESS READ AS #hFile
    
        Chunk = FRE(-4)
        DO
            Chunk = MIN(Chunk, LOF(hFile) - lFilePtr)
            GET$ #hFile, Chunk, sChunk    ' read the next block via the handle we opened
            lRes = INSTR(sChunk, sSearchTxt)
            IF lRes > 0 THEN
                CLOSE hFile
                FUNCTION = lRes
                EXIT FUNCTION
            END IF
            INCR lFilePtr, Chunk - LEN(sSearchTxt) + 1    ' advance, overlapping by LEN - 1 bytes
            SEEK #hFile, lFilePtr
        LOOP UNTIL Chunk < FRE(-4)
        CLOSE hFile
        FUNCTION = 0
    END FUNCTION

    ------------------
    Lance
    PowerBASIC Support

  • Lance Edmonds
    replied
    Sorry, I cannot tell you... you said you would not ask any more questions!




    ------------------
    Lance
    PowerBASIC Support

  • Wayne Diamond
    replied
    Lance, that sounds beautiful! And here's the code I've come up with. It looks sensational to me, but it refuses to work! It compiles fine, though.
    Code:
    $CPU 8086                 ' program works on any CPU
    $COMPILE EXE              ' compile to an EXE
    $INCLUDE "PB35.INC"       ' link library
      
    ' Function fInstr - search a file of any size for a string
    FUNCTION fInstr(sFile AS STRING, sSearchTxt AS STRING) AS LONG
    ON ERROR RESUME NEXT
    IF DIR$(sFile, 39) = "" THEN
       FUNCTION = 0
       EXIT FUNCTION
    END IF
    DIM sChunk AS STRING * 32750
    DIM hFile AS LONG, lFilePtr AS LONG, lRes AS LONG, ByteHop AS LONG
    hFile = Freefile
    ByteHop = 32750 - LEN(sSearchTxt)
    OPEN sFile FOR BINARY ACCESS READ AS hFile
     lFilePtr = 1
     DO
       GET #1, lFilePtr, sChunk
       lRes = INSTR(1, sChunk, sSearchTxt)
       IF lRes > 0 THEN
          CLOSE hFile
          FUNCTION = lRes
          EXIT FUNCTION
       END IF
       lFilePtr = lFilePtr + ByteHop
     LOOP
    CLOSE hFile
    FUNCTION = 0
    END FUNCTION
     
    ON ERROR RESUME NEXT
    PRINT "Starting..."
    'Search for the word "BIG" in c:\temp\bigfile.txt
    PRINT "fInstr = " & STR$(fInstr("c:\temp\bigfile.txt", "BIG"))

    Any idea why that is failing? It doesn't even print "Starting...", which is weird.


    ------------------


  • Lance Edmonds
    replied
    The basic technique for searching large files is quite straightforward. Let's say you want to find a "search string" that is 10 bytes long.
    • Open the file to search in binary mode, and read the first block, say 32750 bytes (32750 bytes is the maximum string length in PB/DOS).
    • Use INSTR() on that string; if found, process it accordingly.
    • If not found, move the file pointer back 9 bytes (the search string length less one), and load another 32750 bytes.
    • Repeat from step 2 above.
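
    The steps above can be sketched as follows. This is only an illustration in Python rather than PB/DOS code, chosen so the logic can be run anywhere; the function name find_in_stream and the block_size parameter are hypothetical and do not come from the thread. The block size defaults to the 32750 bytes mentioned above, and each pass re-reads the last (search string length less one) bytes so that a match straddling two blocks is not missed.

```python
import io


def find_in_stream(stream, needle, block_size=32750):
    """Return True if `needle` occurs anywhere in a seekable binary stream.

    Only one block of at most `block_size` bytes is held in memory at a time.
    """
    if not needle:
        raise ValueError("needle must not be empty")
    if block_size < len(needle):
        raise ValueError("block_size must be at least len(needle)")

    overlap = len(needle) - 1          # bytes to re-read on the next pass
    while True:
        block = stream.read(block_size)
        if needle in block:            # the INSTR() step
            return True
        if len(block) < block_size:    # short read: end of data reached
            return False
        # Move the pointer back so a match split across blocks is still seen.
        stream.seek(-overlap, io.SEEK_CUR)


if __name__ == "__main__":
    # The needle starts at offset 10, so a 13-byte block cuts it in half.
    sample = b"a" * 10 + b"NEEDLE" + b"b" * 10
    print(find_in_stream(io.BytesIO(sample), b"NEEDLE", block_size=13))
```

    With a 13-byte block the sample needle straddles the first block boundary, which is exactly the case the overlap is there to handle. Because the block size is at least the needle length, every pass moves forward by at least one byte, so the loop always reaches the end of the data.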

    How does that sound?

    ------------------
    Lance
    PowerBASIC Support

  • Wayne Diamond
    started a topic Searching in strings > 32kb

    (I promise this is my last question)
    I need to read a file that is normally between 10 and 12 MB, and find a particular chunk in it. I can easily find it by looking for a string that is approx 30 bytes long. The problem is, the file is > 32 KB and I'm using PBDOS, so I can't read the whole file into a string like I would in PB/CC.
    I know I'm not the first person who's read a file > 32 KB, so I was hoping people might give me pointers on how to go about finding a needle in a large haystack when I can only search a tiny bit of the haystack at a time?
    Thanks,
    Wayne


    ------------------