Read past a $EOF char

  • Read past a $EOF char

    I'm using LINE INPUT# to read a text file. How can I alter it so that I can read past an embedded Ctl-Z ($EOF) character? Do I have to go back to a byte-level read? (Which I'm sure would be much slower on large files.)

    It's currently doing a standard DO WHILE NOT EOF(x) type loop to read the lines.

  • #2
    I would read the whole file into a string,
    replace the EOF character with a space,
    write it out to a new file,
    and read the new file.

    FUNCTION PBMAIN
    fname$ = "???"

    OPEN fname$ FOR BINARY ACCESS READ LOCK SHARED AS #1
    GET$ #1, LOF(1), WholeFile$ 'read entire string
    CLOSE #1
    x$ = REMOVE$(wholefile$, CHR$(00)) ' change CHR$(00) to the EOF char, CHR$(26)
    OPEN "New"+fname$ FOR OUTPUT AS #33
    PRINT #33,x$
    CLOSE #33

    END FUNCTION



    • #3
      I agree with doing it the way Ralph says, but instead of:
      Code:
      x$ = REMOVE$(wholefile$, CHR$(00)) ' change to eof char
      use:
      Code:
      REPLACE ANY $EOF WITH $SPC IN wholefile$
      then print wholefile$ as your new file.



      • #4
        If the file is too large to read into memory, you can get chunks of data in binary mode.

        I find it easier to prepare the file in a separate step. Below is a pretty fast routine to replace any embedded EOF characters.
        Code:
        #COMPILE EXE
        #DIM ALL
        
        FUNCTION PBMAIN () AS LONG
            LOCAL iChunk AS LONG
            LOCAL qByte, qBytesLeft AS QUAD
            LOCAL sRecord AS STRING
            OPEN "INPUT.CSV" FOR BINARY AS #1
            OPEN "OUTPUT.CSV" FOR BINARY AS #2
            qBytesLeft = LOF(1)                                 ' total bytes still to copy
            iChunk = MIN(50000,LOF(1))                          ' read up to 50,000 bytes at a time
            SETEOF(2)                                           ' ensure starting with zero bytes output
            qByte = 1
            DO
                GET$ #1,iChunk,sRecord                          ' Get a chunk of bytes

                ' use only ONE of the following statements:
                REPLACE CHR$(26) WITH " " IN sRecord            ' replace EOF with space
        '        sRecord = REMOVE$(sRecord, CHR$(26))            ' remove EOF

                PUT$ #2, sRecord
                qBytesLeft = qBytesLeft - iChunk                ' determine number of bytes left
                IF iChunk > qBytesLeft THEN iChunk = qBytesLeft ' Last read to be remaining number of bytes in file
            LOOP UNTIL qBytesLeft = 0
            CLOSE
        
        END FUNCTION
        regards
        jim
        ... .... . ... . . ... ... .... . .. ... .... .. ... .... .. .... ..

        n6jah @ yahoo.com



        • #5
          ???
          Code:
          #CTRL_Z_IS_EOF  OFF |ON
          ???
          Michael Mattias
          Tal Systems Inc. (retired)
          Racine WI USA
          [email protected]
          http://www.talsystems.com



          • #6
            >> #CTRL_Z_IS_EOF OFF |ON

            Michael, I'm not familiar with the above and can't seem to find any documentation on it. Can you elaborate?



            • #7
              Hi Again,
              The suggestions are fine, but I don't want to replace the Ctl-Z, I want to get past it while reading the file. It can't be replaced since I'm updating the file and it must be carried forward to the output file.

              I'm also confused by #CTRL_Z_IS_EOF OFF |ON

              or are we all just missing the joke?

              George



              • #8
                #CTRL_Z_IS_EOF is a fantasy; there is no such thing....

                ...yet. It was an idea for a new feature suggestion. Of course, had I really had the urge for this feature I would have sent the idea directly to the designated address for all New Feature Suggestions, [email protected].

                However, anyone who does think it is a good idea for an NFS should send it in.

                I would imagine if #CTRL_Z_IS_EOF were ON (default), the compiler would generate the same code it does today; but were it OFF, CTRL-Z (0x1A, CHR$(26)) would be treated as 'just another character' and you would not have EOF()=true just because the next byte is CTRL-Z.

                MCM
                Last edited by Michael Mattias; 7 Oct 2008, 09:54 AM.
                Michael Mattias
                Tal Systems Inc. (retired)
                Racine WI USA
                [email protected]
                http://www.talsystems.com



                • #9
                  Not byte-by-byte. Just do what Ralph Bing suggested, either without the REMOVE$ line or after it.

                  To find lines in WholeFile$ you can use PARSE$(WholeFile$, $CRLF, x) in a FOR/NEXT loop where the TO part comes from PARSECOUNT(WholeFile$, $CRLF)
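
                  A minimal sketch of that approach, assuming the file fits in memory and uses $CRLF line endings. The file name is a placeholder. Because the file is opened FOR BINARY, an embedded CHR$(26) is just another byte and stays inside the line it belongs to:

                  ```powerbasic
                  #COMPILE EXE
                  #DIM ALL

                  FUNCTION PBMAIN () AS LONG
                      LOCAL WholeFile, TxtLine AS STRING
                      LOCAL x, nLines AS LONG

                      ' "myDatafile.txt" is a placeholder name
                      OPEN "myDatafile.txt" FOR BINARY ACCESS READ LOCK SHARED AS #1
                      GET$ #1, LOF(1), WholeFile            ' binary read: Ctrl-Z is ordinary data
                      CLOSE #1

                      nLines = PARSECOUNT(WholeFile, $CRLF) ' number of $CRLF-delimited fields
                      FOR x = 1 TO nLines
                          TxtLine = PARSE$(WholeFile, $CRLF, x)
                          #DEBUG PRINT TxtLine              ' any CHR$(26) is still in TxtLine
                      NEXT x
                  END FUNCTION
                  ```

                  A file that ends with $CRLF will likely yield one empty trailing field, so the last pass may need to be skipped. On very large files the repeated PARSE$ calls may be slow, since each one has to locate its field from the start of the string.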

                  May I ask what a control-Z is doing in a text file?
                  Dale



                  • #10
                    >May I ask what a control-Z is doing in a text file?

                    Why would ANY unprintable or control character be found in a "text" file?

                    Because <stuff> happens, that's why.
                    Michael Mattias
                    Tal Systems Inc. (retired)
                    Racine WI USA
                    [email protected]
                    http://www.talsystems.com



                    • #11
                      I think this might have a chance:
                      Code:
                      #COMPILE EXE
                      #DIM ALL
                      
                      FUNCTION PBMAIN () AS LONG
                          LOCAL s AS STRING
                          OPEN "D:\myDatafile.txt" FOR INPUT AS #1
                          LINE INPUT #1, s
                          IF EOF(1) THEN
                             IF SEEK(#1) <> LOF(#1) THEN  '<<may need to be adjusted +/- 1
                                SEEK #1, SEEK(#1) + 1     '<<may need to be adjusted +/- 1
                             END IF
                          END IF
                          ? s
                      END FUNCTION



                      • #12
                        Here's a more functional version:
                        Code:
                        #COMPILE EXE
                        #DIM ALL
                        FUNCTION PBMAIN () AS LONG
                            LOCAL s AS STRING
                            OPEN "C:\myDataFile.txt" FOR INPUT AS #1
                            DO
                                LINE INPUT #1, s
                                IF EOF(1) THEN
                                    IF SEEK(#1) <> LOF(#1) THEN  '<<may need to be adjusted +/- 1
                                        SEEK #1, SEEK(#1) + 1     '<<may need to be adjusted +/- 1
                                        IF SEEK(#1) - LOF(#1) >= 1 THEN EXIT DO
                                        ? s
                                    END IF
                                END IF
                            LOOP
                            ? "Real End of File"
                        END FUNCTION



                        • #13
                          Hence my age-old ticked-off point of "WHY IN THE WORLD??? would anyone use non-printable characters?"
                          (Personally, they drive me NUTS when debugging a problem, because what you can't see, you can't debug.)
                          >May I ask what a control-Z is doing in a text file?

                          Why would ANY unprintable or control character be found in a "text" file?

                          Because <stuff> happens, that's why.
                          control-Z is normally an "Undo" command...but that is "Application Specific"...it could mean "Turn the Car on" depending on the application, and the programmer that wrote it

                          When in doubt, I would advise a routine to transform any non-printable characters into printable ones.
                          From there, depending on what you are doing, you can at least see characters that you did not know were there, and act accordingly.
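
                          A minimal sketch of such a routine, assuming a <hex> notation for every byte below CHR$(32) plus CHR$(127). ShowControls is a made-up name, not a built-in:

                          ```powerbasic
                          #COMPILE EXE
                          #DIM ALL

                          ' Hypothetical helper: returns s with control characters made visible
                          FUNCTION ShowControls (BYVAL s AS STRING) AS STRING
                              LOCAL i, b AS LONG
                              LOCAL sOut AS STRING
                              FOR i = 1 TO LEN(s)
                                  b = ASC(s, i)                             ' byte value at position i
                                  IF b < 32 OR b = 127 THEN
                                      sOut = sOut + "<" + HEX$(b, 2) + ">"  ' Ctrl-Z becomes <1A>
                                  ELSE
                                      sOut = sOut + CHR$(b)                 ' printable: copy unchanged
                                  END IF
                              NEXT i
                              FUNCTION = sOut
                          END FUNCTION

                          FUNCTION PBMAIN () AS LONG
                              #DEBUG PRINT ShowControls("AB" + CHR$(26) + "CD")
                          END FUNCTION
                          ```

                          With this, "AB" + CHR$(26) + "CD" comes out as AB<1A>CD. Tabs, CR and LF are rewritten too, so it is best applied to a line after splitting, and only for display, not to data that has to be written back unchanged.
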
                          Engineer's Motto: If it aint broke take it apart and fix it

                          "If at 1st you don't succeed... call it version 1.0"

                          "Half of Programming is coding"....."The other 90% is DEBUGGING"

                          "Document my code????" .... "WHYYY??? do you think they call it CODE? "



                          • #14
                            control-Z is normally an "Undo" command
                            It is now

                            If you use 'COPY CON' as a text editor in DOS (and now from a command prompt), Ctrl+Z was what ended the editing session. Reference: PC Magazine - DOS COPY CON Definition
                            Adam Drake
                            Drake Software



                            • #15
                              > If you use 'COPY CON' as a text editor in DOS (and now from a command prompt....

                              And they call ME a dinosaur?
                              Michael Mattias
                              Tal Systems Inc. (retired)
                              Racine WI USA
                              [email protected]
                              http://www.talsystems.com



                              • #16
                                Well fellas,

                                I played with the code fragments from John, but couldn't seem to get a flow that worked through the embedded CtlZ without breaking it into two lines or repeating a line fragment.

                                So I went with the block read and extract route. Although I didn't mention it in my problem description (not relevant), the routine was also required to handle all three types of line endings ($CRLF, $CR and $LF) so that Mac and Unix files can also be read properly. So it was already wrinkled up somewhat.

                                Just FYI, here's the essence of what's left after the app-specific stuff is stripped out.
                                Code:
                                #COMPILE EXE
                                #DIM ALL
                                
                                FUNCTION PBMAIN () AS LONG
                                LOCAL bytesleft, Chunklen AS QUAD, TxtLine, Chunk, Dlm AS STRING, Once AS INTEGER
                                   OPEN "Embedded CtlZ.txt" FOR BINARY ACCESS READ AS #1
                                   bytesleft = LOF(1)
                                   
                                   DO WHILE bytesleft > 0
                                      chunklen = MIN(65536, LOF(1))
                                      GET$ #1, chunklen, Chunk
                                      bytesleft = bytesleft - LEN(Chunk)
                                      IF Once = 0 THEN
                                         Dlm = $CRLF: once = 1
                                         IF INSTR(Chunk, $CRLF) THEN
                                            '
                                         ELSEIF INSTR(Chunk, $CR) THEN
                                            Dlm = $CR
                                         ELSEIF INSTR(Chunk, $LF) THEN
                                            Dlm = $LF
                                         END IF
                                      END IF
                                
                                      DO WHILE LEN(Chunk) > 0
                                         TxtLine = EXTRACT$(Chunk, Dlm)
                                         Chunk = REMAIN$(Chunk, Dlm)
                                         #DEBUG PRINT TxtLine
                                      LOOP
                                      
                                   LOOP
                                                                            
                                   CLOSE 1
                                END FUNCTION
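
                                The stripped-down listing above treats each chunk on its own, so a line that straddles a chunk boundary would come out in two pieces once the file is bigger than one chunk. A sketch of one way to carry the unfinished tail into the next read, assuming the delimiter has already been detected (fixed to $CRLF here). Carry and iStart are made-up names:

                                ```powerbasic
                                #COMPILE EXE
                                #DIM ALL

                                FUNCTION PBMAIN () AS LONG
                                    LOCAL bytesleft AS QUAD
                                    LOCAL chunklen, iStart, p AS LONG
                                    LOCAL Chunk, Carry, TxtLine, Dlm AS STRING

                                    Dlm = $CRLF                               ' assumption: detected earlier
                                    OPEN "Embedded CtlZ.txt" FOR BINARY ACCESS READ AS #1
                                    bytesleft = LOF(1)

                                    DO WHILE bytesleft > 0
                                        chunklen = MIN(65536, bytesleft)
                                        GET$ #1, chunklen, Chunk
                                        IF LEN(Chunk) = 0 THEN EXIT DO        ' nothing read: avoid looping forever
                                        bytesleft = bytesleft - LEN(Chunk)
                                        Chunk = Carry + Chunk                 ' prepend the unfinished line

                                        iStart = 1
                                        DO
                                            p = INSTR(iStart, Chunk, Dlm)     ' next delimiter at or after iStart
                                            IF p = 0 THEN EXIT DO
                                            TxtLine = MID$(Chunk, iStart, p - iStart)
                                            #DEBUG PRINT TxtLine              ' an embedded CHR$(26) stays in the line
                                            iStart = p + LEN(Dlm)
                                        LOOP
                                        Carry = MID$(Chunk, iStart)           ' tail without a delimiter yet
                                    LOOP

                                    IF LEN(Carry) > 0 THEN
                                        #DEBUG PRINT Carry                    ' last line had no terminator
                                    END IF
                                    CLOSE #1
                                END FUNCTION
                                ```

                                Because the tail is glued to the front of the next chunk, a $CRLF pair that is itself split across two reads is also found in one piece.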



                                • #17
                                  Typo?
                                  Code:
                                  DO WHILE bytesleft > 0
                                        chunklen = MIN(65536, LOF(1))
                                        GET$ #1, chunklen, Chunk
                                  Should be....
                                  Code:
                                  DO WHILE bytesleft > 0
                                        chunklen = MIN(65536, BytesLeft)
                                        GET$ #1, chunklen, Chunk
                                  MCM
                                  Michael Mattias
                                  Tal Systems Inc. (retired)
                                  Racine WI USA
                                  [email protected]
                                  http://www.talsystems.com



                                  • #18
                                    Right on, Michael. Thinking about it, though, the logic would still work: all the typo does is ask for MORE to be read than is actually there, and that doesn't happen. And the chunk parsing uses LEN(Chunk).

                                    Regardless, thanks for spotting it, I'll correct it so it logically makes sense.

                                    George

