Announcement

Collapse
No announcement yet.

File Processing Speed: Read Line by Line Vs File in a string in memory

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • File Processing Speed: Read Line by Line Vs File in a string in memory

    I thought I should start a new thread cos I suspect Lance is getting ready
    to close the other one and I have begun to see the problem in a new light
    thanks to all your clever input

    John re: using a UDT to input to:
    > This would also have the speed advantage that you can define sections of the
    > line ready for processing rather than using Mid$
    John thats Darn slick!! You mean you can input to a type! no one told me that - WOW!
    I love that concept. As you say, that would speed up all that MID$ stuff.

    > Why would you bother converting? The file is a classic fixed length fixed BYTE position record
    why indeed.

    So my question has become about the speed of reading a line at a time with: GET #1, LineLength, NewLineStr
    Vs. Reading the whole file into memory and then working on it.

    It seems intuitvly that working with it in memory should be faster
    but this does not seem to be the case in practice. So I assume
    the way I am handling the file in memory is not optimal.

    So far, to work on the whole file in memory, I have only tried:

    Code:
    OPEN "file" FOR BINARY AS #1
    GET$ #1, LOF(1), a$
    CLOSE #1
    
    FOR i = 1 TO PARSECOUNT(a$, CHR$(10))
        NewLineStr = PARSE$(a$, i)
        CALL process LINE
    NEXT
    This was slow because i think Parse$ has to start at the beginning of the string a$ each time.

    I think reading a line at a time was faster because the file pointer is sitting at the start of the next record
    and windows reads large chunks at a time into memory of a file that is open (???)

    Michael,
    > How about plan C?
    > x$ = EXTRACT$([start, ]MainString, [ANY] MatchString
    This looks perfect. I assume after the extraction, i just lop off that many chars
    from the front of the string and then do it again with:
    Code:
    DO
        NewLineStr = EXTRACT$(a$, CHR$(10))
        a$ = RIGHT$(a$, LEN(a$) - LEN(NewLineStr))
    LOOP
    > Or plan D?
    > REGEXPR mask$ IN main$ [AT start&] TO posvar&, lenvar&
    Code:
    b$ = "\n" 'interpreted as a line-feed or new-line character: CHR$(10)
    start& = INSTR(a$, CHR$(10))
    DO
        REGEXPR b$ IN a$ AT start& TO position&, length&
        NewLineStr = mid$(a$, start&, start&-position&)
        start& = position&
    LOOP
    > (BTW, there's a REGEXPR replacement for LINE INPUT in the Source Code forum)
    Im not using LINE INPUT tho ???


    Or even plan E:
    position& = INSTR(a$,CHR$(10))
    NewLineStr = LEFT$(a$, position&-1)
    a$ = RIGHT$(a$, LEN(a$) - position&)

    So which is faster guys?
    GET #1, ' read a line at a time
    EXTRACT$ ' process a string in memory
    REGEXPR ' process a string in memory
    INSTR( ' process a string in memory


    Scott, Fred,
    Thank you for the code

    (PS. The Lines I posted originally were from another file that is 70 Chars long yes.
    There are two files. It turns out I need to work with the second which is 80Chars.
    Sorry for the confusion)

    ------------------
    Kind Regards
    Mike

    [This message has been edited by Mike Trader (edited July 15, 2001).]

  • #2
    I have found regexpr to be very slow for simple tasks like searching for a cr/lf.
    I think for maximum speed in parsing lines a cr/lf delimited file you would read a
    chunk(or all) of a file into memory then either use Instr or a Byte Ptr and search
    for each LF, advancing the buffer position as necessary.

    Ron

    PS: Go to Ethan Winer's web site and download his book (free) "PC Magazine Basic Techniques and Utilities".
    It has a snippet called bufin that does a great job of speeding up text file reading using BASIC.
    http://www.ethanwiner.com


    [This message has been edited by Ron Pierce (edited July 15, 2001).]

    Comment


    • #3
      A few things to speed it up: add LEN = 8192 to OPEN statement and
      parse the string, using pointers. I believe I read your text used
      UNIX line feed (CHR$(10), so something like the following maybe can
      serve as example of a pretty fast way of doing it. Here I read file
      into a global array, but it can be adjusted and used other ways.
      Code:
      'in initial declares:
      GLOBAL TheArray() AS STRING
       
      '¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
      ' Read a file, line by line into a global array
      '¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
      FUNCTION FileToArray(BYVAL TheFile AS STRING) AS LONG
        LOCAL I AS LONG, Letter AS BYTE PTR, p1 AS LONG, _
              NumRecs AS LONG, Lof1 AS LONG, Buf AS STRING
       
        OPEN TheFile FOR BINARY AS #1 LEN = 8192
           Lof1 = LOF(1)
           GET$ #1, Lof1, Buf
        CLOSE #1
       
        Numrecs = TALLY(Buf, CHR$(10)) ' line feed count, here counting LF's
        IF Numrecs = 0 THEN EXIT FUNCTION
       
        REDIM TheArray(Numrecs)
       
        p1 = 1: I = 0 : Numrecs = 0
        Letter = STRPTR(Buf)                            'point Letter to beginning of string
        Lof1 = LEN(Buf)
       
        FOR I = 1 TO Lof1
           IF @Letter = 10 THEN                         'Letter's value = 10, $LF
              TheArray(Numrecs) = MID$(Buf, p1, I - p1) 'pick out line to array element
              INCR Numrecs                              'incement counter
              p1 = I + 1                                'store position + 1 for the LF we skip
           END IF
           INCR Letter                                  'set byte pointer to next char
        NEXT I
       
        FUNCTION = Numrecs - 1                          'return element count
      END FUNCTION

      ------------------

      Comment


      • #4
        Ron,
        I read the section on files and buffers thx.
        I think the book is a little old tho, 45sec for disk reads ...

        Borje,
        Is that fast?? You are reading the whole file character by character!

        Wouldnt it be faster to Parse$ out a line and then assign parts of
        the line to strings

        For example:
        Code:
        TYPE CMETimeAndSales
            SymbStr     AS STRING*3  ' 1  - 3 
            OptionStr   AS STRING*1  ' 4 
            DeliveryStr AS STRING*4  ' 5  - 8 
            f1          AS STRING*8  ' 9  - 16
            TimeStr     AS STRING*4  ' 17 - 20 
            f2          AS STRING*2  ' 21 - 22
            PriceStr    AS STRING*7  ' 23 - 29  
            f3          AS STRING*12 ' 30 - 41
            CmeDateStr  AS STRING*6  ' 42 - 47 
            f4          AS STRING*2  ' 48 - 49
            DecLocStr   AS STRING*1  ' 50 
            f5          AS STRING*31 ' 51 - 81
        END TYPE
        
        
        GLOBAL Aline AS CMETimeAndSales
        
        
            OPEN FilePathStr+FileNameStr FOR BINARY ACCESS READ LOCK WRITE AS #100 LEN = 32768 
            GET$ 100, LOF(100), a$ ' read the whole file into memory
            CLOSE #100
        
        
            FOR Pos = 1 TO PARSECOUNT(a$, CHR$(10)) step LEN( PARSE$(a$, CHR$(10)) )
                Aline = INSTR(Pos, a$, CHR$(10)) ' is this possible ?????
                CALL ProcessData
            NEXT
        ------------------
        Kind Regards
        Mike

        [This message has been edited by Mike Trader (edited July 15, 2001).]

        Comment


        • #5
          Mike, if you are reading the entire file into a buffer, buffering the reads will do
          nothing for you and may slow things down. When you say the book is old and reads are 45 seconds, what
          are you referring to? Did you try the BufIn$ function after modifying it to use a larger
          buffer and changing its use of Integers to Longs?

          Comment


          • #6
            Whole file is read in one take. Parsing the result is done via
            Byte pointer. Similar routine is used to read large databases
            here and I read a 1.5 MB, 130,000 items into array in less than
            a second. Without array, a fraction of that time.

            Only way to see is by "live" test. I have a feeling it's what
            happens after read that is slow, when you CALL process.., and
            not the actual reading/parsing.


            ------------------

            Comment


            • #7
              Mike
              I agree with Borje your main speed loss is in the "processing". You are also losing much of the speed advantage of the quick disk read by having to move all the data fron one place to another. Without resorting to assembler there is a reasonable compromise. Read the bulk of the file in large chunks and the remainder in small chunks, so in the following example you can read the bulk 200 records at a time and then the last 1 to 199 one at a time (whenever I dash off code like this I make silly errors so check the math)
              Code:
              TYPE CMETimeAndSales
                  SymbStr     AS STRING*3  ' 1  - 3 
                  OptionStr   AS STRING*1  ' 4 
                  DeliveryStr AS STRING*4  ' 5  - 8 
                  f1          AS STRING*8  ' 9  - 16
                  TimeStr     AS STRING*4  ' 17 - 20 
                  f2          AS STRING*2  ' 21 - 22
                  PriceStr    AS STRING*7  ' 23 - 29  
                  f3          AS STRING*12 ' 30 - 41
                  CmeDateStr  AS STRING*6  ' 42 - 47 
                  f4          AS STRING*2  ' 48 - 49
                  DecLocStr   AS STRING*1  ' 50 
                  f5          AS STRING*31 ' 51 - 81
              END TYPE
              TYPE TandSArray
                  TS(199) AS CMETimeAndSales
              END TYPE
              
              
              GLOBAL Aline AS TandSArray
              
              
                  OPEN FilePathStr+FileNameStr FOR BINARY ACCESS READ LOCK WRITE AS #100 LEN = 32768 
                  u& = LEN(Aline)
                  w& = LOF(100) - u&
                  FOR x& = 1 TO w& step u&    'get 200 records at a time
                      GET #100,x&,Aline
                      FOR z& = 0 TO 199
                          CALL process z& 'process fields in Aline.TS(z&)
                      NEXT
                 NEXT
                 y& = x& - u&
                 u&= LEN(aline.ts(0))
                 w& = LOF(100)
                 FOR x& = y& TO w& step u&    'get remainder 1 record at a time
                     GET #100,x&,Aline.TS(0)
                     CALL process 0
                 NEXT

              ------------------

              Comment


              • #8
                Code:
                TYPE CMETimeAndSales
                    SymbStr     AS STRING*3  ' 1  - 3 
                    OptionStr   AS STRING*1  ' 4 
                    DeliveryStr AS STRING*4  ' 5  - 8 
                    f1          AS STRING*8  ' 9  - 16
                    TimeStr     AS STRING*4  ' 17 - 20 
                    f2          AS STRING*2  ' 21 - 22
                    PriceStr    AS STRING*7  ' 23 - 29  
                    f3          AS STRING*12 ' 30 - 41
                    CmeDateStr  AS STRING*6  ' 42 - 47 
                    f4          AS STRING*2  ' 48 - 49
                    DecLocStr   AS STRING*1  ' 50 
                    f5          AS STRING*31 ' 51 - 81
                END TYPE
                
                
                GLOBAL Aline() AS CMETimeAndSales
                GLOBAL LAline AS CMETimeAndSales
                GLOBAL ArrayStart AS DWORD
                
                
                    OPEN FilePathStr+FileNameStr FOR BINARY ACCESS READ LOCK WRITE AS #100 LEN = 32768 
                    w& = LOF(100)
                    u& = w&/LEN(LAline)
                    GET #100,w$,a$
                    ArrayStart = STRPTR(a$)
                    DIM Aline(u& - 1) at ArrayStart
                    FOR x& = 0 TO u& - 1
                        CALL process x&
                    NEXT
                PS remember to leave room for the LF in the UDT

                ------------------


                [This message has been edited by John Petty (edited July 16, 2001).]

                Comment


                • #9
                  Mike,

                  I would be very surprised if you can do it much faster than the
                  code that Borje has posted, I found an assembler version that I
                  had posted a long time ago that loads the file in one read from
                  disk and chops it up into an array but its truly overkill for
                  what you are doing.

                  The suggestion I made earlier about loading it into memory and
                  using REPLACE globally on it, writing it back to disk and
                  re-reading it with LINE INPUT is by no means a slow process, the
                  first disk write gets it into cache and the LINE INPUT will be
                  a lot faster than if it had not been read first.

                  Regards,

                  [email protected]

                  ------------------
                  hutch at movsd dot com
                  The MASM Forum

                  www.masm32.com

                  Comment


                  • #10
                    In the custom control editor I recently posted info about to 3-party
                    forum, I do several REPLACE$ actions on loaded file (string) before
                    loading it into the controls memory, in order to make sure line feeds
                    are in correct format. No problems with speed even on a MB file.

                    In my sample, I load the file into an array. Skip the allocation of
                    the array and simply grab the lines to a string, to be sent for further
                    processing and speed will increase a lot.

                    Like always, there are so many ways to do things. Using PB, most things
                    are so fast it's sometimes hard to decide what to use. That, IMHO, is
                    the real power in PowerBasic - the amazing flexibility.

                    However, if best speed possible is preferred, I suggest you search this
                    forum for Steve's asm sample. Thread was called "File Reading - VB Smokes
                    PBDLL", which of course turned out the other way - PB smokes VB...

                    Not many (any?) people in the world beats Steve when it comes to writing
                    fast asm routines, so it definitely is worth looking at..


                    ------------------

                    Comment


                    • #11
                      Hi fellows,
                      Here is my approach using INSTR which is very fast.
                      Your comments - in particular about the speed -
                      are most welcome.
                      Regards,
                      Erik Christensen, Copenhagen, Denmark.
                      [email protected]
                      Code:
                      #COMPILE EXE
                      #REGISTER NONE
                      #DIM ALL
                      #INCLUDE "win32api.inc"
                      FUNCTION PBMAIN
                          LOCAL K AS LONG,L AS LONG, M AS LONG,N AS LONG
                          LOCAL A$,LengthOfFile&
                          LOCAL t1 AS DOUBLE
                          LOCAL t2 AS DOUBLE
                      
                          t1 = TIMER ' START TIME
                      
                          OPEN "your_text_file" FOR BINARY AS #1
                          LengthOfFile&=LOF(1)
                      
                          GET$ #1, LOF(1), A$
                          CLOSE #1
                      
                          N = TALLY(A$, CHR$(10))   ' Count line feeds.
                      
                          DIM arr(1 TO N) AS STRING ' Use REDIM for global array.
                      
                          A$=A$+CHR$(10)            ' Safeguard against searching past the end of the string.
                      
                          K = 0
                      
                          FOR L = 1 TO N            ' Get N lines in total
                              M = K + 1             ' Starting point of search for each successive line
                              K = INSTR(M, A$, CHR$(10))  ' Find next line feed
                              arr(L) = MID$(A$, M, K - M) ' Extract line from M to K-1. Length = K - M.
                          NEXT
                      
                          t2 = TIMER ' END TIME
                      
                          MSGBOX "I read "+STR$(LengthOfFile&)+" bytes containing"+STR$(N)+" lines in "+FORMAT$(1000 * (t2 - t1), "# ms.")+"   Not bad!",%MB_ICONINFORMATION,"Reading and parsing time:"
                      
                      END FUNCTION
                      P.S. You can use the same approach with any delimiter (e.g. $TAB)
                      to obtain from each line the individual fields in a data base.
                      Normally each line of a data base will represent the data of one
                      individual. Of course the file should be saved in text format.
                      This approach could replace the PARSE-function, which involves searching
                      the string many times (=PARSECOUNT). The above function searches the
                      string only once.

                      Erik
                      ------------------




                      [This message has been edited by Erik Christensen (edited July 16, 2001).]

                      Comment


                      • #12
                        I opted for this method, didn't really go over 3k in filesize but appears to be fastest way to do it, I tried line by line..


                        Code:
                        '------------------------------------------------------------------------------------------------------------------------------
                        Function ReadLogFile() As Long
                        Local hFile            As Long
                        On Error GoTo READLOGERROR
                            If Exist(g_sFileSpec) = 0 Then
                                g_lTotalReboot = 0
                                Exit Function
                            End If
                            hFile = FreeFile
                            Open g_sFileSpec For Binary As #hFile
                            Get$ #hFile,Lof(hFile),g_sClipText
                            Close hFile
                        
                            If IsTrue g_UseEncryption Then g_sClipText = RC4NcryptString(ByVal g_sClipText,g_wnKey,0)
                            Replace $CRLF With "|" In g_sClipText
                            'Tally how many lines there are for the array etc
                            g_lTotalReboot = Tally(g_sClipText,"|")
                        Exit Function
                        READLOGERROR:
                        Local ErrType As Long
                        ErrType = Err
                        MsgBox "There was an error reading the log file: " & GetLastErrorDescription( ByVal ErrType,0),%MB_ICONSTOP,g_szMine
                        End Function
                        '------------------------------------------------------------------------------------------------------------------------------
                        Function ParseLogFile(CalculateCRC As Long) As Long
                            Local lLoop                As Long
                            Local lCount               As Long
                            Local lBufSize             As Long
                            Local lx                   As Long
                            Local sString              As String
                            Local LVArray()            As String
                        
                        Retry:
                            Dim LVArray(1 To 8193) As String
                        
                            'Perform all crc check/formatting here before deciding which
                            'direction To display the data in.
                            'All reading of log file should be done now, just parse cliptext
                            For lLoop = 1 To g_lTotalReboot
                                LVArray(lLoop) = Parse$(g_sClipText,"|",lLoop)
                                If IsTrue CalculateCRC Then g_CrcCount =  g_CrcCount + CRC32&(LVArray(lLoop))
                            Next
                        
                            'Currently the LVArray holds all data, nicely formatted,
                            'g_sClipText needs this formatting to match the order (reverse or forward)
                        '    g_sClipText = "" 'Update in loops
                        
                            If IsTrue g_lRevOrderFlag Then GoTo ReverseOrder
                            For lLoop = 1 To g_lTotalReboot
                                'Sample
                                '1øFridayø July 06 2001ø03:08 PMøtngbbsøTNG2K|
                                g_sRec(0) = Parse$(LVArray(lLoop),"ø", 1) 'Event#
                                g_sRec(1) = Parse$(LVArray(lLoop),"ø", 2) 'Day of week
                                g_sRec(2) = Parse$(LVArray(lLoop),"ø", 3) '
                                g_sRec(3) = Parse$(LVArray(lLoop),"ø",4)   'July 6, 2001
                                g_sRec(4) = Parse$(LVArray(lLoop),"ø",5)
                                g_sRec(5) = Parse$(LVArray(lLoop),"ø",6)
                                g_sRec(6) = Parse$(LVArray(lLoop),"ø",7)
                                AppendListView g_hListView 'Append to the ListView
                            Next
                            GoTo FinalizeListView
                        
                        ReverseOrder:
                        
                            For lLoop = g_lTotalReboot To 1 Step -1
                                'Sample
                                '1 Friday July 6, 2001 8:51 AM,tngbbsTNG2K|
                                g_sRec(0) = Parse$(LVArray(lLoop),"ø", 1)
                                g_sRec(1) = Parse$(LVArray(lLoop),"ø", 2)
                                g_sRec(2) = Parse$(LVArray(lLoop),"ø", 3)
                                g_sRec(3) = Parse$(LVArray(lLoop),"ø",4)   'July 6, 2001
                                g_sRec(4) = Parse$(LVArray(lLoop),"ø",5)
                                g_sRec(5) = Parse$(LVArray(lLoop),"ø",6)
                                g_sRec(6) = Parse$(LVArray(lLoop),"ø",7)
                                AppendListView g_hListView 'Append to the ListView
                            Next
                        
                        FinalizeListView:
                            Erase LVArray
                            g_sClipText = Trim$(g_sClipText)
                            ShowReboots
                            Function = g_CrcCount
                        End Function
                        '------------------------------------------------------------------------------------------------------------------------------
                        ------------------
                        Scott
                        Scott Turchin
                        MCSE, MCP+I
                        http://www.tngbbs.com
                        ----------------------
                        True Karate-do is this: that in daily life, one's mind and body be trained and developed in a spirit of humility; and that in critical times, one be devoted utterly to the cause of justice. -Gichin Funakoshi

                        Comment


                        • #13
                          This is the code I use to parse a crlf text file. You should be able to
                          modify for just $CR without too much trouble.

                          James

                          Code:
                          '~ParseCrLf
                          '---------------------------------------------------------------------------
                          FUNCTION _
                            ParseCrLf ( _
                              S1			AS STRING, _
                              S2()		AS STRING _
                            )AS LONG
                            
                            DIM bPtr      AS BYTE PTR,_
                                Indexes   AS LONG,_
                                Index     AS LONG,_
                                Start		AS LONG, _
                                Length	AS LONG
                          
                            Indexes = TALLY(S1,$CRLF)+1
                            IF Indexes = 1 THEN
                              S2(1) = S1
                              S2(0) = "1"
                              FUNCTION = 1
                              EXIT FUNCTION
                            ELSEIF Indexes > 1    THEN
                              REDIM S2(Indexes)
                              FUNCTION = Indexes
                            ELSE
                              EXIT FUNCTION
                            END IF
                          
                            bPtr = STRPTR(S1)
                            S2(Index) = FORMAT$(Indexes)
                            INCR Index
                            INCR Start
                            DO
                              IF @bPtr = 0 THEN
                                IF Length THEN
                                  S2(Index) = MID$(S1,Start,Length)
                                END IF
                                EXIT LOOP
                              END IF
                              IF @bPtr = 13 THEN
                                S2(Index) = MID$(S1,Start,Length)
                                INCR Index
                                Start = Start + Length + 2
                                INCR bPtr
                                Length = 0
                              ELSE
                                INCR Length  
                              END IF  
                              INCR bPtr
                            LOOP
                            
                          END FUNCTION
                          '===========================================================================     
                          '~FileParseCrLf
                          '---------------------------------------------------------------------------
                          FUNCTION _
                            FileParseCrLf ( _
                              sFileName		AS STRING, _
                              sLines()		AS STRING _
                            ) AS LONG
                            
                            LOCAL ff			AS LONG, _
                                  sBuffer		AS STRING
                          
                            IF LEN(DIR$(sFileName)) = 0 THEN
                              FUNCTION = -1
                              sLines(0) = "Could Not Locate -> " + sFileName
                              EXIT FUNCTION
                            END IF
                            ff = FREEFILE
                            OPEN sFileName FOR BINARY AS ff
                            GET$ #ff,LOF(ff),sBuffer
                            CLOSE #ff  
                            FUNCTION = ParseCrLf(sBuffer,sLines())
                          END FUNCTION     
                          '===========================================================================
                          ------------------

                          Comment


                          • #14
                            I hate dynamic strings. So I always do something like this (Asciiz Ptr)
                            Code:
                               #Compile Exe
                               #Register None
                               #Dim All
                               #Include "win32api.inc"
                               
                               %kt = 100 ' no. of cycles (too fast on my PC)
                               $FileName = "1.imm"
                               
                               Function PbMain
                                  Dim p As Asciiz Ptr, b As Byte Ptr, pp As Byte Ptr, Buf As String, f As Long
                                  Dim t1 As Double, t2 As Double, k As Long, n As Long
                                  
                                  t1 = Timer
                                  For k = 1 To %kt
                                     f = FreeFile: Open $FileName For Binary Shared As #f
                                     Get$ #f, Lof(f), Buf: Close #f
                                     p = StrPtr(Buf): pp = p + Len(Buf): @pp = 10
                                     n = 0 ' nomber of lines
                                     While p < pp
                                        b = p: While @b <> 10: Incr b: Wend: @b = 0
                                        ' <--------- Process a line (@p)
                                        p = b + 1
                                        Incr n
                                     Wend
                                  Next
                                  
                                  t2 = Timer
                                  MsgBox Format$((t2 - t1) / %kt, "0.0000 sec"),, "Lines =" + Str$(n)
                            
                               End Function
                            ------------------
                            E-MAIL: [email protected]

                            Comment


                            • #15
                              Agreed, much faster to use pointers, my RC4 code uses pointers, lightning quick..
                              Took 45 seconds to do 10 megs...



                              ------------------
                              Scott
                              Scott Turchin
                              MCSE, MCP+I
                              http://www.tngbbs.com
                              ----------------------
                              True Karate-do is this: that in daily life, one's mind and body be trained and developed in a spirit of humility; and that in critical times, one be devoted utterly to the cause of justice. -Gichin Funakoshi

                              Comment


                              • #16
                                Mike
                                Somehow I left the text off my last reply.
                                As Lance pointed out to me in another forum DIM AT is the fastest, no data movement or searces or replacements.
                                The prior post gives an example of reading the file in one go and then dimensioning an array of UDT's over it.


                                ------------------

                                Comment


                                • #17
                                  You guys are all so sharp! lots of really good ideas and code - thx

                                  Yes the slowest part of my code is undoubtbly the processing, but this
                                  is like comparing 0-60mph of cars, Its great fun but rarely do you
                                  get to use that fraction of a second faster!

                                  Semen,
                                  I love the use of pointers. Clever. After I have a line, I still have to
                                  use about 7 MID$() statements. So Im going to test your code against
                                  the TYPE ARRAY method to be fair.

                                  What is:
                                  %kt = 100 ' no. of cycles (too fast on my PC)
                                  How do I figure this number out?
                                  Whats its purpose?


                                  John,
                                  I did not know you could REDIM an array on top of a Dynamic string! This is
                                  super tricky. I am not able to get the code to work tho

                                  I have simplified my TYPE to just a fixed lenght string to get it to work
                                  (i will substitute the TYPE after)

                                  Code:
                                  GLOBAL TestStr() AS STRING*80
                                  GLOBAL TestLine  AS STRING*80
                                          OPEN FilePathStr+FileNameStr FOR BINARY ACCESS READ LOCK WRITE AS #100 LEN = 32768
                                          FileLength  = LOF(100)
                                          LineLength  = LEN(TestLine) ' Length of a line 80 chars
                                          LinesInFile = LOF(100) / LineLength 
                                          GET #100, FileLength, a$ ' Read whole file into a string
                                          CLOSE #100  
                                  
                                          DIM TestStr(LinesInFile-1) at STRPTR(a$) ' dont work for me    [img]http://www.powerbasic.com/support/forums/frown.gif[/img]
                                          FOR n = 0 TO LinesInFile-1
                                                  MSGBOX TestStr(n) ' View a line
                                          NEXT
                                  With the TYPE it would be:
                                  GLOBAL TnSFile() AS CMETimeAndSales
                                  MSGBOX TnSFile(n).Date
                                  MSGBOX TnSFile(n).Time
                                  MSGBOX TnSFile(n).Price
                                  etc
                                  ------------------
                                  Kind Regards
                                  Mike



                                  [This message has been edited by Mike Trader (edited July 16, 2001).]

                                  Comment


                                  • #18
                                    A technique I use for files with fixed length fields is this:

                                    Type Footype
                                    name as string * 20
                                    comma1 as string * 1
                                    addr1 as string * 20
                                    comma2 as string * 1
                                    .
                                    .
                                    .
                                    crlf as string * 2
                                    End Type

                                    This gives the speed & convenience of a random access file, and
                                    can be read into spreadsheets as a CSV. Very handy to interface
                                    DOS databases with Access & Excel!
                                    Incidentally, with Qbasic & Basic7 there were big advantages both
                                    in speed & memory use if UDTs were a power of two long. Is this
                                    also the case in PB?
                                    DIM AT is a gift! Ten years ago, in QB4, I wrote an app that
                                    read in a lot of numeric data into arrays. First DIM the array,
                                    then created a string the length of the array. Then OPEN the
                                    file for INPUT, read in the array as a string, then Memcopy the
                                    string to the array.
                                    This dropped 40 odd thousand disk reads to six. A considerable speed
                                    improvement on a 8088!


                                    ------------------

                                    Comment


                                    • #19
                                      Mike
                                      You must put the STRPTR in a dword first
                                      Code:
                                      #COMPILE EXE
                                      TYPE test
                                          t AS STRING * 2
                                      END TYPE
                                      GLOBAL ta() AS test
                                      FUNCTION PBMAIN
                                          LOCAL p AS DWORD
                                          a$ = SPACE$(10)
                                          p = STRPTR(a$)
                                          DIM ta(4) AT p
                                          ta(0).t="ab"
                                          ta(1).t="cd"
                                          ta(2).t="ef"
                                          ta(3).t="gh"
                                          ta(4).t="ij"
                                          MSGBOX a$
                                      END FUNCTION
                                      ------------------

                                      Comment


                                      • #20
                                        Originally posted by David J Walker:
                                        Incidentally, with Qbasic & Basic7 there were big advantages both
                                        in speed & memory use if UDTs were a power of two long. Is this
                                        also the case in PB?
                                        The best performance is when the size of the UDT is a multiple of four bytes, and the individual members are DWORD aligned.

                                        By using the DWORD alignment specifier in the TYPE statement, you take care of both aspects.

                                        However, if a UDT is used to read/write data with (random access) disk files, and the UDT consists of a lot of members that are not a multiple of fours bytes (ie, BYTE, INTEGER, STRING * 3, STRING * 5, etc) then the additional padding will contribute to increased disk storage due to the additional padding bytes.

                                        See TYPE/END TYPE in the doc's for more information.


                                        ------------------
                                        Lance
                                        PowerBASIC Support
                                        mailto:[email protected][email protected]</A>
                                        Lance
                                        mailto:[email protected]

                                        Comment

                                        Working...
                                        X