Get Rest of File Opened For Input

  • Get Rest of File Opened For Input

    I have a file whose first several thousand lines of data I want to read into various string variables or arrays.

    Then, I want to put the rest of the file into a single string variable.

    I don't know that I've ever tried to do that, but something like this ought to work ...

    Code:
    Open "file.txt" For Input as #1
    Line Input #1, a$
    Line Input #1, b$
    Line Input #1, c$
    Line Input #1, d$
    Line Input #1, e$
    iPos = Seek(#1)   ' position of the next byte to be read
    Close #1
    
    Open "file.txt" for Binary as #1
    Seek #1, iPos
    Get$ #1, Lof(#1) - iPos + 1, f$   '< - - - rest of file goes here (+1 so the last byte is included).
    Close #1
    When Opening a file For Input, I don't know of a way to get all of the bytes past a specific location. So in this example, I get the position from the Open For Input, then re-open the file For Binary to get easy access to the rest of the file.

    I could separate the two "halves" of data into two files, but I want to keep the data together in a single file.

    Have I missed an easier approach?

  • #2
    I don't really like doing all of the Line Inputs.

    So perhaps I could create a delimiter between each data group (the sections of data that would go into a variable or array). Then, I could binary read in the whole file, use Parse$ to get each section of data and then use Parse to split the sections into the array elements.

    If I'm talking only a few 100K lines of data, I don't know that the loading speed of one would be better than the other. I'll have to try.

    • #3
      Determining the best method requires more info.

      How do you know when you have got all the lines you want to have separated?
      How do you differentiate between variable and array data?
      Is there a unique marker, or do you know beforehand how many lines you want to read?

      How is the file generated?
      Can you embed a unique marker? If so, consider the good old ASCII control codes, read the whole file as binary and deal with blocks based on those codes.
      Code  Dec  Hex  Meaning
      FS    028  1C   File separator
      GS    029  1D   Group separator
      RS    030  1E   Record separator
      US    031  1F   Unit separator
      etc.
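For illustration only, here is a minimal sketch of the control-code idea, assuming the writer puts GS (29) between groups and RS (30) between the items of a group. The file name "demo.dat" and the sample data are made up for the example.

```powerbasic
#COMPILE EXE
#DIM ALL

FUNCTION PBMAIN () AS LONG
 LOCAL hFile  AS LONG
 LOCAL sAll   AS STRING
 LOCAL sGroup AS STRING

 'Write a small test file: two groups separated by GS, items separated by RS
 hFile = FREEFILE
 OPEN "demo.dat" FOR OUTPUT AS #hFile
 PRINT #hFile, "A1" + CHR$(30) + "A2" + CHR$(29) + "B1" + CHR$(30) + "B2" + CHR$(30) + "B3";
 CLOSE #hFile

 'Read the whole file in one binary GET$
 hFile = FREEFILE
 OPEN "demo.dat" FOR BINARY AS #hFile
 GET$ #hFile, LOF(#hFile), sAll
 CLOSE #hFile

 'Pull out the second group, then split it into its items
 sGroup = PARSE$(sAll, CHR$(29), 2)
 DIM sItems(1 TO PARSECOUNT(sGroup, CHR$(30))) AS STRING
 PARSE sGroup, sItems(), CHR$(30)

 ? JOIN$(sItems(), $CRLF)
END FUNCTION
```

This only works if the data itself never contains the chosen separator bytes.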



      • #4
        Originally posted by Gary Beene
        If I'm talking only a few 100K lines of data, I don't know that the loading speed of one would be better than the other. I'll have to try.
        I'd guess that a binary read and Parse will be considerably quicker than a few 100K of separate Line Inputs (as long as you don't run out of memory with long lines).

        • #5
          Hi Gary,
          try the following:

          1) OPEN the file FOR BINARY
          2) SEEK to byte 1
          3) GET the first 1000 bytes (from the seek position) into one string variable
          4) SEEK to byte 1001
          5) GET from the new seek position to the end of the file into another variable
          6) CLOSE

          Now you should be able to handle the two variables as you like.
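The steps above could be sketched roughly as follows. The split point nSplit and the variable names are assumptions for the example; in a real file the split position would have to come from somewhere, such as a count stored by the writer.

```powerbasic
#COMPILE EXE
#DIM ALL

FUNCTION PBMAIN () AS LONG
 LOCAL hFile  AS LONG
 LOCAL nSplit AS LONG      'assumed: number of bytes in the first part
 LOCAL sFirst AS STRING
 LOCAL sRest  AS STRING

 nSplit = 1000

 hFile = FREEFILE
 OPEN "file.txt" FOR BINARY AS #hFile
 SEEK #hFile, 1                        'start of file (positions are 1-based by default)
 GET$ #hFile, nSplit, sFirst           'first nSplit bytes
 IF LOF(#hFile) > nSplit THEN
  SEEK #hFile, nSplit + 1              'optional: GET$ already left the pointer here
  GET$ #hFile, LOF(#hFile) - nSplit, sRest
 END IF
 CLOSE #hFile

 ? "First part:" + STR$(LEN(sFirst)) + " bytes" + $CRLF + "Rest:" + STR$(LEN(sRest)) + " bytes"
END FUNCTION
```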
          Ciao, Dario

          • #6
            Just to clarify my suggestion:
            my "1000" and "1001" were just examples.
            You should use a variable holding the length in bytes at which you want to split.
            You can use any value that fits within virtual memory (up to a few gigabytes, depending on
            your operating system).
            Ciao, Dario

            • #7
              Howdy, Stuart and Dario!
              In the file itself, I place line counts, telling the reading app how many Line Inputs are necessary. I could, as I think y'all are suggesting, put byte counts in the file to indicate the length of each data section, use Binary for reading everything, and then parse each binary extraction data group. Line Input vs Parse$ would favor Parse$ for speed, I'd think. Taking the time to create the right markers when the file is created should pay off.
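A rough sketch of the byte-count variant, assuming each data section is stored as a 4-byte length (MKL$) followed by the section's bytes. PutSection and GetSection are hypothetical helper names, and "demo.dat" is a made-up file name.

```powerbasic
#COMPILE EXE
#DIM ALL

'Hypothetical helper: write one section as a 4-byte length plus the data
SUB PutSection(BYVAL hFile AS LONG, BYVAL sSection AS STRING)
 PUT$ #hFile, MKL$(LEN(sSection)) + sSection
END SUB

'Hypothetical helper: read the next length-prefixed section
FUNCTION GetSection(BYVAL hFile AS LONG) AS STRING
 LOCAL sLen     AS STRING
 LOCAL sSection AS STRING
 GET$ #hFile, 4, sLen
 IF LEN(sLen) < 4 THEN EXIT FUNCTION     'no more sections
 GET$ #hFile, CVL(sLen), sSection
 FUNCTION = sSection
END FUNCTION

FUNCTION PBMAIN () AS LONG
 LOCAL hFile  AS LONG
 LOCAL sLines AS STRING
 LOCAL sRest  AS STRING

 'Write two sections
 hFile = FREEFILE
 OPEN "demo.dat" FOR BINARY AS #hFile
 PutSection hFile, "line 1" + $CRLF + "line 2" + $CRLF + "line 3"
 PutSection hFile, "the rest of the file, any bytes allowed"
 SETEOF #hFile                           'drop leftovers from an older, longer file
 CLOSE #hFile

 'Read them back
 hFile = FREEFILE
 OPEN "demo.dat" FOR BINARY AS #hFile
 sLines = GetSection(hFile)
 sRest  = GetSection(hFile)
 CLOSE #hFile

 'Split the first section into lines
 DIM sLine(1 TO PARSECOUNT(sLines, $CRLF)) AS STRING
 PARSE sLines, sLine(), $CRLF
 ? sLine(1) + $CRLF + sLine(UBOUND(sLine)) + $CRLF + sRest
END FUNCTION
```

Because the lengths are stored explicitly, the sections may contain any byte values, including $CRLF or $EOF.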


              • #8
                Line Input vs Parse$ would favor Parse$ for speed, I'd think.
                Maybe. Did you count time to load from disk to string so you can then PARSE$ the string? Looks like something that needs a test.

                Cheers,
                Dale

                • #9
                  When Opening a file For Input, I don't know of a way to get all of the bytes past a specific location.
                  So in this example, I get the position from the Open For Input, then re-open the file For Binary to get easy access to the rest of the file.
                  I could separate the two "halves" of data into two files, but I want to keep the data together in a single file.
                  Have I missed an easier approach?

                  This is rough; it loops until EOF or a specified byte.
                  Writer stores the length of the second part in the last 8 bytes of the file.
                  Sounds like you don't need that.

                  Code:
                  FUNCTION PBMAIN AS LONG
                   Writer "file.txt"
                   ? ReadFirstPart("file.txt") +_
                     ReadLastPart ("file.txt")
                  END FUNCTION
                  
                  FUNCTION ReadFirstPart(sFileName AS STRING) AS STRING
                   LOCAL VariableLen AS LONG
                   LOCAL hFile AS LONG
                   LOCAL s      AS STRING
                   LOCAL sBuffer AS STRING
                   LOCAL FirstByte AS LONG
                   LOCAL sBinaryLength AS STRING
                  
                   hFile = FREEFILE
                   OPEN sFileName FOR INPUT AS #hFile
                   SEEK #hFile,LOF(hFile)-7
                   LINE INPUT #hFile,sBuffer
                   VariableLen = VAL(sBuffer)
                   FirstByte = LOF(hFile)-VariableLen-7  'first byte of the second part (positions are 1-based)
                   SEEK #hFile,1
                   DO UNTIL SEEK(#hFile) => FirstByte
                    LINE INPUT #hFile,sBuffer
                    s = s + sBuffer
                   LOOP
                   CLOSE #hFile
                   FUNCTION = s
                  END FUNCTION
                  
                  FUNCTION ReadLastPart(sFileName AS STRING) AS STRING
                   LOCAL VariableLen AS LONG
                   LOCAL hFile AS LONG
                   LOCAL s      AS STRING
                   LOCAL sBuffer AS STRING
                   LOCAL FirstByte AS LONG
                   LOCAL sBinaryLength AS STRING
                  
                   hFile = FREEFILE
                   OPEN sFileName FOR INPUT AS #hFile
                   SEEK #hFile,LOF(hFile)-7
                   LINE INPUT #hFile,sBuffer
                   VariableLen = VAL(sBuffer)
                   FirstByte = LOF(hFile)-VariableLen-7  'first byte of the second part (positions are 1-based)
                   SEEK #hFile,FirstByte
                   RESET s
                   DO UNTIL EOF(hFile)
                    LINE INPUT #hFile, sBuffer
                    s += sBuffer
                   LOOP
                   CLOSE #hFile
                   FUNCTION = LEFT$(s,-8)
                  END FUNCTION
                  
                  SUB Writer(sFile AS STRING)
                   LOCAL hFile AS LONG
                   LOCAL sRestLen  AS STRING * 8
                   LOCAL sRest     AS STRING
                   hFile = FREEFILE                 'get a valid file number before OPEN
                   OPEN sFile FOR OUTPUT AS #hFile
                   PRINT #hFile, "Gary"
                   PRINT #hFile, "Beene"
                   sRest = "Now is the time for all good men to do something"
                   sRestLen = FORMAT$(LEN(sRest))   'length of the second part, padded to 8 bytes
                   PRINT #hFile,sRest + sRestLen;
                   CLOSE #hFile
                  END SUB
                  How long is an idea? Write it down.

                  • #10
                    Howdy, Dale!

                    I haven't done any speed testing yet. So far, this is just an Einstein thought experiment!



                    And, Howdy Mike!

                    Your suggestion of jumping to a point, then using Line Input isn't something I ever considered. I guess Line Input starts at the current location in the file and reads until the next $crlf. And, the current location doesn't have to be immediately after the last $crlf. I just hadn't thought about that before.

                    I guess I'll have to test the speed.
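A rough timing harness for that comparison might look like the sketch below. The file name, line count and line contents are arbitrary test values, and the numbers will depend heavily on disk caching, so treat the result as an indication only.

```powerbasic
#COMPILE EXE
#DIM ALL

FUNCTION PBMAIN () AS LONG
 LOCAL hFile  AS LONG
 LOCAL i      AS LONG
 LOCAL nLines AS LONG
 LOCAL sAll   AS STRING
 LOCAL t1     AS DOUBLE
 LOCAL t2     AS DOUBLE

 'Build a test file (300,000 short lines; the size is an arbitrary choice)
 hFile = FREEFILE
 OPEN "speedtest.txt" FOR OUTPUT AS #hFile
 FOR i = 1 TO 300000
  PRINT #hFile, "Line" + STR$(i)
 NEXT
 CLOSE #hFile

 'Method 1: one LINE INPUT per line
 t1 = TIMER
 hFile = FREEFILE
 OPEN "speedtest.txt" FOR INPUT AS #hFile
 FILESCAN #hFile, RECORDS TO nLines
 DIM sA(1 TO nLines) AS STRING
 FOR i = 1 TO nLines
  LINE INPUT #hFile, sA(i)
 NEXT
 CLOSE #hFile
 t1 = TIMER - t1

 'Method 2: one binary GET$, then PARSE into an array
 t2 = TIMER
 hFile = FREEFILE
 OPEN "speedtest.txt" FOR BINARY AS #hFile
 GET$ #hFile, LOF(#hFile), sAll
 CLOSE #hFile
 sAll = RTRIM$(sAll, $CRLF)              'avoid an empty last element
 DIM sB(1 TO PARSECOUNT(sAll, $CRLF)) AS STRING
 PARSE sAll, sB(), $CRLF
 t2 = TIMER - t2

 ? "LINE INPUT:  " + FORMAT$(t1, "0.000") + " s" + $CRLF + _
   "GET$ + PARSE: " + FORMAT$(t2, "0.000") + " s"
END FUNCTION
```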

                    • #11
                      Did you count time to load from disk to string so you can then PARSE$ the string? Looks like something that needs a test.
                      Is it worth a test? How many times per program do you do it? How many lines? All optimization is application-specific.

                      FWIW, LINE INPUT screams if you specify a big enough buffer (LEN=x) in the OPEN statement. (It actually reads 'buffersize' bytes when executed, and if the line is already in the buffer there is no physical I/O to do.)
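As a minimal sketch of that tip: the 32768 below is an arbitrary buffer size chosen for the example, not a recommended value, so check the OPEN documentation for the allowed range.

```powerbasic
#COMPILE EXE
#DIM ALL

FUNCTION PBMAIN () AS LONG
 LOCAL hFile  AS LONG
 LOCAL sLine  AS STRING
 LOCAL nLines AS LONG

 hFile = FREEFILE
 'LEN = sets the I/O buffer size for a sequential file (value here is a guess)
 OPEN "file.txt" FOR INPUT AS #hFile LEN = 32768
 IF ERR THEN ? ERROR$(ERR) : EXIT FUNCTION

 DO UNTIL EOF(hFile)
  LINE INPUT #hFile, sLine      'served from the buffer when the line is already in memory
  INCR nLines
 LOOP
 CLOSE #hFile

 ? FORMAT$(nLines) + " lines read"
END FUNCTION
```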

                      "It's in the manual" (the new and politically correct way to say )
                      Michael Mattias
                      Tal Systems Inc. (retired)
                      Racine WI USA
                      http://www.talsystems.com

                      • #12
                        Code:
                        'Separate each group with end of file $EOF so DO UNTIL EOF(hFile) works with each group.
                        'Each group is read back into an array element and fields of each group are delimited with $TAB.
                        '
                        'I used 1 array for all 3 groups, but it would be easier to process each group with its own array.
                        
                        %groups = 3
                        $fileName = "file.txt"
                        
                        FUNCTION PBMAIN () AS LONG
                         Writer
                         Reader
                        END FUNCTION
                        
                        FUNCTION Writer AS LONG
                         LOCAL hFile AS LONG
                         hFile = FREEFILE
                         OPEN $FileName FOR OUTPUT AS #hFile
                         PRINT #hFile, CHR$("A1",$EOF);                'group 1
                         PRINT #hFile, CHR$("B1",$TAB,"B2",$EOF);      'group 2
                         PRINT #hFile, CHR$("C1",$TAB,"C2",$TAB,"C3"); 'group 3
                         CLOSE #hFile
                        END FUNCTION
                        
                        FUNCTION Reader AS LONG
                         LOCAL x AS LONG
                         LOCAL hFile AS LONG
                         REDIM s(1 TO 99999) AS STRING
                         LOCAL sLine AS STRING
                         LOCAL counter AS LONG
                        
                         hFile = FREEFILE
                         OPEN $FileName FOR INPUT AS #hFile
                        
                         FOR x = 1 TO %groups
                          DO UNTIL EOF(hFile)
                           INCR counter
                           LINE INPUT #hFile, sLine
                           s(counter) = sLine
                          LOOP
                          SEEK #hFile,SEEK(#hFile)+1 'skip $EOF for next group
                         NEXT
                        
                         CLOSE #hFile
                         REDIM PRESERVE s(1 TO counter)
                         ? JOIN$(s(),$CR),,USING$("# groups",UBOUND(s))
                        END FUNCTION
                        [Image: gary.png]

                        • #13
                          You know what else is fast? And easy? And well-tested? And documented?

                          Memory-mapped version of LINE INPUT 5-31-04

                          You may use the provided LINE INPUT callback procedure to get the lines, then get the rest of the file using a single "ReadFile" call.

                          Michael Mattias

                          • #14
                            You know what else you could do? Instead of creating a file (you said you write the number of lines as data) that has no single consistent PB access method, why don't you just create TWO files, one of which can be read sequentially (OPEN FOR INPUT) and one you access with one of the random access modes (FOR RANDOM, FOR BINARY)?

                            Why you would create a file you are not sure how to access is a discussion for another day.




                            Michael Mattias

                            • #15
                              Gary,
                              Mike's idea is not bad, I mean Mike Doty of course.
                              Using the old EOF as a way to discriminate between lines and data that may contain any characters, aka zero to 255 including $CRLF, $EOF, $NUL, etc...
                              FILESCAN will stop on the first EOF, so we are in business...

                              Code:
                              #COMPILE EXE '#Win#
                              #DIM ALL
                              #REGISTER NONE
                              #INCLUDE "Win32Api.inc"
                              '______________________________________________________________________________
                              
                              FUNCTION PBMAIN AS LONG
                               LOCAL sFileName AS STRING
                               LOCAL sData     AS STRING
                               LOCAL ChrPos    AS LONG
                               LOCAL index     AS LONG
                               LOCAL LineCount AS LONG
                              
                               'Set parameters and data
                               sFileName = "zzzDelMe.txt"
                               LineCount = 20
                               DIM sOut(1 TO LineCount) AS STRING
                               FOR index = 1 TO LineCount
                                 sOut(index) = "String" & STR$(index)
                               NEXT
                               sData = "sData " & CHR$(1 TO 255) & " sDataEnd" 'May contain $NULL
                               '-------------------------------------------
                               'Save file line
                               OPEN sFileName FOR OUTPUT AS #1
                               PRINT #1, sOut()
                               PRINT #1, CHR$(26) 'Ctrl-Z aka SETEOF #1
                               ChrPos = SEEK(#1)
                               CLOSE #1
                               'Add file data
                               OPEN sFileName FOR BINARY AS #1
                               SEEK #1, ChrPos
                               PUT$ #1, sData
                               CLOSE #1
                               '-------------------------------------------
                               'Load file lines
                               OPEN sFileName FOR INPUT AS #1
                               FILESCAN #1, RECORDS TO LineCount
                               DIM sLineIn(1 TO LineCount) AS STRING
                               LINE INPUT #1, sLineIn()
                               ChrPos = SEEK(#1) + 1 + 2
                               CLOSE #1
                               'Load ending data
                               OPEN sFileName FOR BINARY AS #1
                               SEEK #1, ChrPos
                               GET$ #1, LOF(#1) - ChrPos + 1, sData
                               CLOSE #1
                               '-------------------------------------------
                               MSGBOX STR$(LineCount) & " lines" & $CRLF & sLineIn(1) & $CRLF & sLineIn(LineCount) & $CRLF & _
                                      "Len sData is" & STR$(LEN(sData)) & $CRLF & "[" & sData & "]"
                              
                              END FUNCTION
                              '______________________________________________________________________________
                              '

                              • #16
                                Gary, are you making the file or being given the file? In other words, can you control what is in the file?

                                • #17
                                  Originally posted by Pierre Bellisle
                                  Gary,
                                  Mike's idea is not bad, I mean Mike Doty of course.
                                  Using the old EOF as a way to discriminate between lines and data that may contain any characters, aka zero to 255 including $CRLF, $EOF, $NUL, etc...
                                  FILESCAN will stop on the first EOF, so we are in business...

                                  Code:
                                  OPEN sFileName FOR INPUT AS #1
                                  FILESCAN #1, RECORDS TO LineCount
                                  DIM sLineIn(1 TO LineCount) AS STRING
                                  LINE INPUT #1, sLineIn()
                                  ChrPos = SEEK(#1) + 1 + 2
                                  CLOSE #1
                                  'Load ending data
                                  OPEN sFileName FOR BINARY AS #1
                                  SEEK #1, ChrPos
                                  GET$ #1, LOF(#1) - ChrPos + 1, sData
                                  CLOSE #1

                                  That's a great example of exploiting PB's data processing statements/functions.
                                  You'd be hard pressed to do it so cleanly in most programming languages.

                                  • #18
                                    Code:
                                    'Be sure to check for 0 records after FILESCAN, or the REDIM crashes.
                                    'PB's idea is great (I mean Pierre Bellisle), allowing binary data.
                                    '
                                    'GB,
                                    'If there is no binary data, it might be easier to just add a few elements for extra data.
                                    
                                    #DIM ALL
                                    FUNCTION PBMAIN () AS LONG
                                     LOCAL x AS LONG
                                     REDIM s(1 TO 9999) AS STRING      'create array
                                     FOR x = LBOUND(s) TO UBOUND(s)
                                      s(x) = FORMAT$(x)
                                     NEXT
                                     SaveArray "zzzDelMe.txt",s()      'save array
                                     ReadArray "zzzDelMe.txt",s()      'read array
                                    END FUNCTION
                                    
                                    FUNCTION SaveArray(sFileName AS STRING,s() AS STRING) AS LONG
                                     LOCAL hFile AS LONG
                                     hFile=FREEFILE
                                     OPEN sFileName FOR OUTPUT AS #hFile
                                     IF ERR THEN ? ERROR$(ERR),,FUNCNAME$:FUNCTION = -ERR:EXIT FUNCTION
                                     PRINT #hFile, s()
                                     IF ERR THEN ? ERROR$(ERR),,FUNCNAME$
                                     CLOSE #hFile
                                     ? USING$("& to &",s(LBOUND(s)),s(UBOUND(s))),,FUNCNAME$
                                    END FUNCTION
                                    
                                    FUNCTION ReadArray(sFileName AS STRING,s() AS STRING) AS LONG
                                     LOCAL hFile AS LONG
                                     LOCAL counter AS LONG
                                     hFile = FREEFILE
                                     OPEN sFileName FOR INPUT AS #hFile  'read array
                                     IF ERR THEN ? ERROR$(ERR),,FUNCNAME$:FUNCTION = -ERR:EXIT FUNCTION
                                    
                                     FILESCAN #hFile, RECORDS TO counter
                                     IF counter < 1 THEN CLOSE #hFile : EXIT FUNCTION   'required; close the file before leaving
                                     REDIM s(1 TO counter) AS STRING
                                     LINE INPUT #hFile, s()
                                     IF ERR THEN ? ERROR$(ERR),,FUNCNAME$
                                     CLOSE #hFile
                                     ? USING$("& to &",s(LBOUND(s)),s(UBOUND(s))),,FUNCNAME$
                                    END FUNCTION'

                                    • #19
                                      This one is more elegant than my previous one, to my eye...
                                      Code:
                                      #COMPILE EXE '#Win#
                                      #DIM ALL
                                      #REGISTER NONE
                                      #INCLUDE "Win32Api.inc"
                                      '_____________________________________________________________________________
                                      
                                      FUNCTION PBMAIN AS LONG
                                       LOCAL sFileName AS STRING
                                       LOCAL sData     AS STRING
                                       LOCAL sBuffer   AS STRING
                                       LOCAL CharPos    AS LONG
                                       LOCAL index     AS LONG
                                       LOCAL LineCount AS LONG
                                      
                                       'Set parameters and data
                                       sFileName = "zzzDelMe.txt"
                                       LineCount = 20
                                       DIM sLineOut(1 TO LineCount) AS STRING
                                       FOR index = 1 TO LineCount
                                         sLineOut(index) = "String" & STR$(index)
                                       NEXT
                                       sData = "sDataStart " & CHR$(1 TO 255) & " sDataEnd" 'May contain $NULL
                                       '-------------------------------------------
                                       'Save lines and data
                                       OPEN sFileName FOR BINARY AS #1
                                       PUT$ #1, JOIN$(sLineOut(), $CRLF)
                                       PUT$ #1, $EOF
                                       PUT$ #1, sData
                                       CLOSE #1
                                       '-------------------------------------------
                                       'Load all and parse
                                       OPEN sFileName FOR BINARY AS #1
                                       GET$ #1, LOF(#1), sBuffer
                                       CLOSE #1
                                       CharPos = INSTR(sBuffer, $EOF)
                                       sData   = MID$(sBuffer, CharPos + 1)
                                       sBuffer = LEFT$(sBuffer, CharPos - 1)
                                       DIM sLine(1 TO PARSECOUNT(sBuffer, $CRLF)) AS STRING
                                       PARSE sBuffer, sLine(), $CRLF
                                       RESET sBuffer
                                      '-------------------------------------------
                                       MessageBox(%HWND_DESKTOP, sLine(1) & $CRLF & _
                                                  sLine(UBOUND(sLine())) & $CRLF & _
                                                  sData, "Binary file", %MB_OK OR %MB_TOPMOST)
                                      
                                      END FUNCTION
                                      '_____________________________________________________________________________
                                      '
