
Read past a $EOF char


  • George Deluca
    replied
    Right on, Michael. Thinking about it, though, the logic would still work: all the typo does is allow MORE to be requested than is actually there, and GET$ doesn't read more than is there. And the chunk parsing uses LEN(chunk).

    Regardless, thanks for spotting it, I'll correct it so it logically makes sense.

    George


  • Michael Mattias
    replied
    Typo?
    Code:
    DO WHILE bytesleft > 0
          chunklen = MIN(65536, LOF(1))        ' <-- LOF(1) is the typo
          GET$ #1, chunklen, Chunk
    Should be....
    Code:
    DO WHILE bytesleft > 0
          chunklen = MIN(65536, BytesLeft)     ' <-- BytesLeft instead of LOF(1)
          GET$ #1, chunklen, Chunk
    MCM


  • George Deluca
    replied
    Well fellas,

    I played with the code fragments from John, but couldn't seem to get a flow that worked through the embedded CtlZ without breaking it into two lines or repeating a line fragment.

    So I went with the block-read-and-extract route. Although I didn't mention it in my problem description (it wasn't relevant), the routine also has to handle all three types of line endings, $CRLF, $CR and $LF, so that Mac and Unix files can be read properly. So it was already wrinkled up somewhat.

    Just FYI here's the essence of what's left after app specific stuff is stripped out.
    Code:
    #COMPILE EXE
    #DIM ALL
    
    FUNCTION PBMAIN () AS LONG
    LOCAL bytesleft, Chunklen AS QUAD, TxtLine, Chunk, Dlm AS STRING, Once AS INTEGER
       OPEN "Embedded CtlZ.txt" FOR BINARY ACCESS READ AS #1
       bytesleft = LOF(1)
       
       DO WHILE bytesleft > 0
          chunklen = MIN(65536, LOF(1))
          GET$ #1, chunklen, Chunk
          bytesleft = bytesleft - LEN(Chunk)
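          IF Once = 0 THEN                     ' detect the line ending once, from the first chunk
             Dlm = $CRLF: Once = 1             ' default to CRLF
             IF INSTR(Chunk, $CRLF) THEN
                ' CRLF found: keep the default
             ELSEIF INSTR(Chunk, $CR) THEN
                Dlm = $CR                      ' CR-only line endings (old Mac files)
             ELSEIF INSTR(Chunk, $LF) THEN
                Dlm = $LF                      ' LF-only line endings (Unix files)
             END IF
          END IF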
    
          DO WHILE LEN(Chunk) > 0
             TxtLine = EXTRACT$(Chunk, Dlm)
             Chunk = REMAIN$(Chunk, Dlm)
             #DEBUG PRINT TxtLine
          LOOP
          
       LOOP
                                                
       CLOSE 1
    END FUNCTION
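
    One caveat with the fragment above: each 64 KB chunk is split on its own, so a line that straddles a chunk boundary would come out as two pieces. Below is a minimal sketch of one way around that. It assumes the delimiter has already been detected (it is hard-coded to $CRLF here), and the names Carry, StartPos and p are illustrative additions, not part of the original routine.
    Code:
    #COMPILE EXE
    #DIM ALL

    FUNCTION PBMAIN () AS LONG
       LOCAL bytesleft, chunklen AS QUAD
       LOCAL Chunk, Carry, TxtLine, Dlm AS STRING
       LOCAL StartPos, p AS LONG

       Dlm = $CRLF                              ' assumption: delimiter already known
       OPEN "Embedded CtlZ.txt" FOR BINARY ACCESS READ AS #1
       bytesleft = LOF(1)

       DO WHILE bytesleft > 0
          chunklen = MIN(65536, bytesleft)
          GET$ #1, chunklen, Chunk
          IF LEN(Chunk) = 0 THEN EXIT DO        ' nothing read: stop rather than loop forever
          bytesleft = bytesleft - LEN(Chunk)
          Carry = Carry + Chunk                 ' unfinished tail of the previous chunk comes first

          StartPos = 1
          DO
             p = INSTR(StartPos, Carry, Dlm)
             IF p = 0 THEN EXIT DO              ' no complete line left in the buffer
             TxtLine = MID$(Carry, StartPos, p - StartPos)
             #DEBUG PRINT TxtLine
             StartPos = p + LEN(Dlm)
          LOOP
          Carry = MID$(Carry, StartPos)         ' keep the partial line for the next pass
       LOOP

       ' whatever is left is a final line with no trailing delimiter
       IF LEN(Carry) > 0 THEN
          #DEBUG PRINT Carry
       END IF
       CLOSE #1
    END FUNCTION

    Because the search runs on Carry rather than on the raw chunk, a $CRLF pair that is split across two reads is matched once its second half arrives. In BINARY mode an embedded CHR$(26) is just another byte, so it stays inside TxtLine.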


  • Michael Mattias
    replied
    > If you use 'COPY CON' as a text editor in DOS (and now from a command prompt....

    And they call ME a dinosaur?


  • Adam J. Drake
    replied
    > control-Z is normally an "Undo" command

    It is now.

    If you use 'COPY CON' as a text editor in DOS (and now from a command prompt), Ctrl+Z was what ended the editing session. Reference: PC Magazine - DOS COPY CON Definition


  • Cliff Nichols
    replied
    Hence my age-old ticked-off point: "WHY IN THE WORLD would anyone use a non-printable character?"
    (Personally, they drive me NUTS when debugging a problem, because what you can't see, you can't debug.)
    >May I ask what a control-Z is doing in a text file?

    Why would ANY unprintable or control character be found in a "text" file?

    Because <stuff> happens, that's why.
    control-Z is normally an "Undo" command...but that is "Application Specific"...it could mean "Turn the Car on" depending on the application, and the programmer that wrote it

    When in doubt, I would advise a routine that transforms any non-printable characters into printable ones.
    From there, depending on what you are doing, you can at least see characters that you did not know were there, and act accordingly.
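
    A minimal sketch of such a routine, assuming control characters should be shown as hex codes in angle brackets. The name ShowControlChars and the <1A> notation are illustrative choices, not something from this thread.
    Code:
    #COMPILE EXE
    #DIM ALL

    ' Return a copy of s with every control character (0-31 and 127)
    ' replaced by a visible <hh> hex tag. Note that CR and LF are tagged too.
    FUNCTION ShowControlChars (BYVAL s AS STRING) AS STRING
       LOCAL i, c AS LONG
       LOCAL sOut AS STRING
       FOR i = 1 TO LEN(s)
          c = ASC(s, i)
          IF c < 32 OR c = 127 THEN
             sOut = sOut + "<" + HEX$(c, 2) + ">"
          ELSE
             sOut = sOut + CHR$(c)
          END IF
       NEXT i
       FUNCTION = sOut
    END FUNCTION

    FUNCTION PBMAIN () AS LONG
       ? ShowControlChars("abc" + CHR$(26) + "def")   ' shows abc<1A>def
    END FUNCTION

    Building the result one character at a time is slow on large strings, so this is better suited to inspecting a suspect line than to converting a whole file.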


  • John Gleason
    replied
    Here's a more functional version:
    Code:
    #COMPILE EXE
    #DIM ALL
    FUNCTION PBMAIN () AS LONG
        LOCAL s AS STRING
        OPEN "C:\myDataFile.txt" FOR INPUT AS #1
        DO
        LINE INPUT #1, s
        IF EOF(1) THEN
           IF SEEK(#1) <> LOF(#1) THEN  '<<may need to be adjusted +/- 1
              SEEK #1, SEEK(#1) + 1     '<<may need to be adjusted +/- 1
              IF SEEK(#1) - LOF(#1) >= 1 THEN EXIT DO
              ? s
           END IF
        END IF
        LOOP
        ? "Real End of File"
    END FUNCTION


  • John Gleason
    replied
    I think this might have a chance:
    Code:
    #COMPILE EXE
    #DIM ALL
    
    FUNCTION PBMAIN () AS LONG
        LOCAL s AS STRING
        OPEN "D:\myDatafile.txt" FOR INPUT AS #1
        LINE INPUT #1, s
        IF EOF(1) THEN
           IF SEEK(#1) <> LOF(#1) THEN  '<<may need to be adjusted +/- 1
              SEEK #1, SEEK(#1) + 1     '<<may need to be adjusted +/- 1
           END IF
        END IF
        ? s
    END FUNCTION


  • Michael Mattias
    replied
    >May I ask what a control-Z is doing in a text file?

    Why would ANY unprintable or control character be found in a "text" file?

    Because <stuff> happens, that's why.


  • Dale Yarker
    replied
    Not byte-by-byte; just do what Ralph Bing suggested, without the REMOVE$ line or the lines after it.

    To find lines in WholeFile$ you can use PARSE$(WholeFile$, $CRLF, x) in a FOR/NEXT loop where the TO part comes from PARSECOUNT(WholeFile$, $CRLF)
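
    Spelled out, that could look like the sketch below. It assumes the file fits in memory and uses $CRLF line endings; the file name and the variables x and nLines are placeholders.
    Code:
    #COMPILE EXE
    #DIM ALL

    FUNCTION PBMAIN () AS LONG
       LOCAL WholeFile, TxtLine AS STRING
       LOCAL x, nLines AS LONG

       ' BINARY mode reads CHR$(26) like any other byte
       OPEN "myDataFile.txt" FOR BINARY ACCESS READ AS #1
       GET$ #1, LOF(1), WholeFile
       CLOSE #1

       nLines = PARSECOUNT(WholeFile, $CRLF)
       FOR x = 1 TO nLines
          TxtLine = PARSE$(WholeFile, $CRLF, x)
          #DEBUG PRINT TxtLine
       NEXT x
    END FUNCTION

    If the file ends with a $CRLF, the last field returned may be an empty string, so a real routine would probably want to skip it.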

    May I ask what a control-Z is doing in a text file?


  • Michael Mattias
    replied
    #CTRL_Z_IS_EOF is a fantasy; there is no such thing....

    ...yet. It was an idea for a new feature suggestion. Of course, had I really had the urge for this feature I would have sent the idea directly to the designated address for all New Feature Suggestions, [email protected].

    However, anyone who does think it is a good idea for an NFS should send it in.

    I would imagine if #CTRL_Z_IS_EOF were ON (default), the compiler would generate the same code it does today; but were it OFF, CTRL-Z (0x1A, CHR$(26)) would be treated as 'just another character' and you would not have EOF()=true just because the next byte is CTRL-Z.

    MCM
    Last edited by Michael Mattias; 7 Oct 2008, 09:54 AM.


  • George Deluca
    replied
    Hi Again,
    The suggestions are fine, but I don't want to replace the Ctl-Z, I want to get past it while reading the file. It can't be replaced since I'm updating the file and it must be carried forward to the output file.

    I'm also confused by #CTRL_Z_IS_EOF OFF |ON

    or are we all just missing the joke?

    George


  • John Gleason
    replied
    >> #CTRL_Z_IS_EOF OFF |ON

    Michael, I'm not familiar with the above and can't seem to find any documentation on it. Can you elaborate?


  • Michael Mattias
    replied
    ???
    Code:
    #CTRL_Z_IS_EOF  OFF |ON
    ???


  • Jim Robinson
    replied
    If the file is too large to read into memory, you can get chunks of data in binary mode.

    I find it easier to prepare the file in a separate step. Below is a pretty fast routine to replace any embedded EOF characters.
    Code:
    #COMPILE EXE
    #DIM ALL
    
    FUNCTION PBMAIN () AS LONG
    LOCAL iChunk AS LONG
    LOCAL qByte, qBytesLeft AS QUAD
    LOCAL sRecord AS STRING
        OPEN "INPUT.CSV" FOR BINARY AS #1
        OPEN "OUTPUT.CSV" FOR BINARY AS #2
        qBytesLeft = LOF(1)                               '
        iChunk = MIN(50000,LOF(1))                        '
        SETEOF(2)                                         ' ensure starting with zero bytes output
        qByte = 1
        DO
            GET$ #1, iChunk, sRecord                        ' get a chunk of bytes

            ' use only ONE of the following statements:
            REPLACE CHR$(26) WITH " " IN sRecord            ' replace EOF with space
    '       sRecord = REMOVE$(sRecord, CHR$(26))            ' remove EOF

            PUT$ #2, sRecord
            qBytesLeft = qBytesLeft - iChunk                ' determine number of bytes left
            IF iChunk > qBytesLeft THEN iChunk = qBytesLeft ' last read takes the remaining bytes
        LOOP UNTIL qBytesLeft = 0
        CLOSE
    
    END FUNCTION
    regards
    jim


  • John Gleason
    replied
    I agree with doing what Ralph says, but instead of:
    Code:
    x$ = REMOVE$(wholefile$, CHR$(00)) ' change to eof char
    use:
    Code:
    REPLACE ANY $EOF WITH $SPC IN wholefile$
    then print wholefile$ as your new file.


  • Ralph Bing
    replied
    I would read the whole file in to a string
    replace the EOF character with a space
    write out to a new file
    read the new file

    FUNCTION PBMAIN
    fname$ = "???"

    OPEN fname$ FOR BINARY ACCESS READ LOCK SHARED AS #1
    GET$ #1, LOF(1), WholeFile$ 'read entire string
    CLOSE #1
    x$ = REMOVE$(wholefile$, CHR$(00)) ' change to eof char
    OPEN "New"+fname$ FOR OUTPUT AS #33
    PRINT #33,x$
    CLOSE #33

    END FUNCTION


  • George Deluca
    started a topic Read past a $EOF char


    I'm using LINE INPUT# to read a text file. How can I alter it so that I can read past an embedded Ctl-Z ($EOF) character? Do I have to go back to a byte-level read? (Which I'm sure would be much slower on large files.)

    It's currently doing a standard DO WHILE NOT EOF(x) type loop to read the lines.