Ascii Windows ANSI vs DOS File type question


  • Scott Turchin
    replied
    I have some proprietary Unix->PC code; I forgot who sent it to me (KF Hon?).

    I don't want to give away his work, but apparently it worked well. Your results may be faster, however.

    I didn't play with this at all, I just remembered having it.

    Code:
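    ' IN1 and IN2 hold the input (Unix) and output (DOS) file names
    Open IN1 For Binary As #1
    Open IN2 For Output As #2
    TempStr$ = ""
    While Not Eof(1)
        Get$ #1, 1, Char$            ' read one byte at a time
        If Char$ <> Chr$(10) Then
            TempStr$ = TempStr$ + Char$
        Else
            Print #2, TempStr$       ' Print # appends CR+LF
            TempStr$ = ""
        End If
    Wend
    If Len(TempStr$) Then Print #2, TempStr$   ' last line had no LF
    Close #1
    Close #2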
    ------------------
    Scott



  • John Petty
    replied
    Mike
    1
    A UDT is a user-defined type.
    This also has the speed advantage that you can define the sections of the line ready for processing rather than using Mid$, i.e.
    Code:
    Type Data80Col
      FirstPart as string * 41
      RawDate as string * 6
      RawTime as string * 4
      'etc
      LF as string * 1
    End Type
    Dim Inrec as Data80Col
    Open "File" For Random As #1 Len = Len(Inrec)
    ' In a random file, GET takes a record number, not a byte offset
    For x& = 1 To Lof(1) \ Len(Inrec)
      Get #1, x&, Inrec
     'etc
    Next
    Close #1
    Remember to leave room for the LF in the type.
    By the way, the example you gave is only 70 characters.

    2
    Unless you want to write the file to an editor rather than process it, the LF or CRLF is only a means of breaking up the records, so you don't care which it is, so long as you can identify it. This means that reading the whole file in one go (as I showed in my second example) and then breaking it into separate records is the simplest approach, is probably faster than attempting any conversion, and doesn't care about line length. I regularly use this method to process UNIX files.
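A minimal sketch of that read-everything-then-split idea, written in Python rather than PowerBASIC purely so it can be run as-is. The name `split_records` is made up for illustration. It treats LF as the record separator and drops a CR in front of it, so Unix and DOS files give the same records.

```python
def split_records(data: bytes) -> list[bytes]:
    """Split a whole file image into records on LF, tolerating CRLF."""
    records = data.split(b"\n")
    # A file that ends with a terminator leaves one empty piece at the end.
    if records and records[-1] == b"":
        records.pop()
    # Drop the CR of a CRLF pair so both conventions give the same records.
    return [rec[:-1] if rec.endswith(b"\r") else rec for rec in records]
```

The caller never has to know which convention the file used, which is the point made above: the terminator only marks where records end.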


    ------------------



  • Michael Mattias
    replied
    > I think it is slow because each time I ask for the next occurrence
    > of the CHR$(10) using PARSE$, it has to go thru the entire string "BinaryFile"
    > counting up occurrences of CHR$(10) until it gets to BinLineNum.
    > ...
    > So I think I need to revert to plan B: read a line at a time

    How about plan C?
    Code:
    x$ = EXTRACT$([start, ]MainString, [ANY] MatchString)
    Or plan D?
    Code:
    REGEXPR mask$ IN main$ [AT start&] TO posvar&, lenvar&
    (BTW, there's a REGEXPR replacement for LINE INPUT in the Source Code forum)
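What plans C and D have in common, as I read them, is an explicit start position, so each search resumes where the previous one ended instead of recounting from the top of the string. A rough Python illustration of that idea (not PowerBASIC; `scan_lines` is a hypothetical name):

```python
def scan_lines(data: bytes, sep: bytes = b"\n"):
    """Yield each sep-terminated piece of data, scanning the buffer only once."""
    pos = 0
    while pos < len(data):
        end = data.find(sep, pos)   # search resumes at pos, not at the start
        if end == -1:               # last piece has no terminator
            end = len(data)
        yield data[pos:end]
        pos = end + len(sep)
```

Because `pos` only moves forward, the total work is proportional to the size of the buffer, not to the buffer size times the number of lines.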

    MCM



  • Mike Trader
    replied
    Eric,
    Yes, I'm sure that each file will have EITHER lines of 80 chars
    OR lines of 81 chars. No file will ever have both.


    Fred,
    Thanks for that, I will examine it closely.

    ------------------
    Kind Regards
    Mike



  • Fred Oxenby
    replied
    I think it is proper to say that the (Power)BASIC Line Input # function
    reads the line up to, but not including, the first CR (Chr$(13)).
    That is, it detects CR or CRLF, not LF alone.
    CR+CR+LF would be treated as TWO lines.


    ------------------
    Fred
    mailto:[email protected]
    http://www.oxenby.se



  • Fred Oxenby
    replied
    This is an example of reading a file in chunks to memory
    and parsing the line data, including terminators.
    You can be sure that "PC files" from different sources come in many shapes.
    This one takes care of 'most' of these kinds of problems.
    Comment added later for clarity:
    This code was developed to handle files of ANY size, where it is important
    to know what kind of terminator is used for every line.
    CR = overtype, (CR)LF = new line, and FF = new page.

    Code:
    Function ReadPCFile(ByVal FilName$)As Long
    Local FilNr&
    Local InBuffer$,TmpBuffer$,LineData$,LineEnd&
    Local FilLen&&,FilPos&&,BytesToRead&&
    Local MaxBuff&,BuffSize&,BuffPos&
    Local Refill&
    Local Terminator$  
    '--Open file-------------------------------------
        On Error Resume Next
        FilNr& = FreeFile                             
        Open FilName$ For Binary Access Read Lock Write As FilNr&
        If ErrClear <> 0 Then Function = 100:Exit Function
        FilLen&& = Lof(FilNr&)
        BytesToRead&& = FilLen&&
        MaxBuff& = 100000   '100 kB local, network 8192
        FilPos&& = 1
    '--loop thru file--------------------------------
        TmpBuffer$ =""
        Refill& = 1
        Do
         Sleep 0 'if in workerThread   
         If BytesToRead&& > 0 And Refill& =1 Then
          ReFill& = 0
          BuffSize& = Min(BytesToRead&&,MaxBuff&)
          ErrClear:Get$ FilNr&,BuffSize&,InBuffer$
          If ErrClear <> 0 Then Function = 101:Close FilNr&:Exit Function
          FilPos&& = Seek(FilNr&)
          If ErrClear <> 0 Then Function = 102:Close FilNr&:Exit Function
          BytesToRead&& = FilLen&& - (FilPos&& - 1)
    '--Get rid of trailing "1A" and NULL---------------
          If BytesToRead&& < 1  Then
           InBuffer$ = Rtrim$(InBuffer$,Any Chr$(0,&H1A))
           If Right$(InBuffer$,1)<>Chr$(10) Then InBuffer$ = InBuffer$ + Chr$(10)
          End If
    '--Merge buffers---------------------------------
          InBuffer$ = TmpBuffer$ + InBuffer$
    '--Remove CRCRLF/CRLF----------------------------
          Replace Chr$(13,13,10) With Chr$(10) In InBuffer$
          Replace Chr$(13,10)    With Chr$(10) In InBuffer$
          Replace Chr$(13,13,12) With Chr$(12) In InBuffer$
          Replace Chr$(13,12)    With Chr$(12) In InBuffer$
    '--Correct buffersize after replace--------------
          BuffSize& = Len(InBuffer$)
          BuffPos&   = 1
    '--Here You can report progress------------------
          If UpdateProgBar(FilPos&&,FilLen&&)<> 0 Then 
           Close FilNr&: Function = 200: Exit Function
          End If
         End If    'end of buffer-fill
    '--This is extracting LineData-------------------
         LineEnd&  = Instr(BuffPos&,InBuffer$,Any Chr$(10,12,13))
         If LineEnd& = 0 Then Close FilNr&: Function = 103: Exit Function
         LineData$   = Mid$(InBuffer$,BuffPos&,LineEnd& - BuffPos&)
         Terminator$ = Mid$(InBuffer$,LineEnd&,1) ' If you want to know 
                                                  ' lineterminator FF/CR/LF
         BuffPos& = LineEnd& + 1
    '--Check if time to refill buffer----------------
         If (BytesToRead&& > 0) And _
            (BuffSize& - BuffPos& < 1024) Then    'Max expected line-length
          TmpBuffer$ = Mid$(InBuffer$,BuffPos&)
          Refill& = 1
         End If
    '--Here You process your line--------------------
         ProcessData LineData$,ErrCode&
    '--Have whole file been processed----------------
         If BytesToRead&& < 1 And BuffPos& >= BuffSize& Then Exit Do
        Loop
        Close FilNr&
        Function = 0
    End Function
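For readers who want to experiment with the same buffering idea outside PowerBASIC, here is a rough Python sketch. It is not a translation of the function above: `iter_chunked_lines` is a made-up name, the progress callback and ProcessData are left out, and trailing CRs at the end of a chunk are simply held back until the next read so that a CR LF pair split across two chunks is still seen as one terminator.

```python
import io
import re

# CR CR LF and CR LF count as LF, CR CR FF and CR FF count as FF,
# and a CR that is not part of such a group is a terminator by itself.
_TERMINATOR = re.compile(rb"\r{0,2}[\n\x0c]|\r")


def iter_chunked_lines(stream, chunk_size=100_000):
    """Yield (line, terminator) pairs from a binary stream, chunk by chunk."""
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        at_eof = not chunk
        buffer = pending + chunk
        if at_eof:
            # Drop trailing NUL / Ctrl-Z and make sure the last line is terminated.
            buffer = buffer.rstrip(b"\x00\x1a")
            if buffer and buffer[-1:] not in (b"\n", b"\x0c", b"\r"):
                buffer += b"\n"
            limit = len(buffer)
        else:
            # Trailing CRs may belong to a terminator that continues in the next chunk.
            limit = len(buffer.rstrip(b"\r"))
        pos = 0
        for match in _TERMINATOR.finditer(buffer, 0, limit):
            yield buffer[pos:match.start()], match.group()[-1:]
            pos = match.end()
        pending = buffer[pos:]      # unfinished tail, merged with the next chunk
        if at_eof:
            return


if __name__ == "__main__":
    sample = io.BytesIO(b"one\r\ntwo\nthree")
    for line, terminator in iter_chunked_lines(sample, chunk_size=4):
        print(line, terminator)
```

The deliberately tiny `chunk_size` in the demo forces terminators to straddle chunk boundaries, which is the case that is easiest to get wrong.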

    ------------------
    Fred
    mailto:[email protected]
    http://www.oxenby.se



    [This message has been edited by Fred Oxenby (edited July 14, 2001).]



  • Eric Pearson
    replied
    Mike --

    Are you certain that you will never see a file that contains some 80-character records and some 81-character records?

    -- Eric

    ------------------
    Perfect Sync Development Tools
    Perfect Sync Web Site
    Contact Us: mailto:[email protected]



  • Steve Hutchesson
    replied
    Mike,

    If all you want to do is make the file so it can be easily
    manipulated by LINE INPUT, I would be inclined to use REPLACE
    to get the CRLF pair you need as it is a genuinely fast function.

    This is not as slow as it sounds, once you have done the read
    into memory, replaced it and written it back to disk, the
    LINE INPUT command will be a lot faster as the file is still in
    cache.

    It's a simple, easy and fast solution that is characteristic of
    PowerBASIC code. You could do it a bit faster as an assembler
    algorithm, but it's not a big file and you only have to process
    it once, so I am sure your client base would not be bothered by
    the speed.

    Regards,

    [email protected]

    ------------------



  • Stuart McLachlan
    replied
    Originally posted by Mike Trader:
    Stuart:
    I asked that, but Eric thinks that would be much slower ... see above

    It depends on how many times you plan to do it and how critical the time is.

    I have a file manipulation utility which uses this sub:

    SUB ReplChar(sInfile AS STRING,sOutfile AS STRING,sStrip AS STRING,sRepl AS STRING,iAnychar AS LONG)
    LOCAL sTemp AS STRING
    LOCAL iLoop AS LONG
    OPEN sInfile FOR BINARY AS #1
    GET$ #1,LOF(1), sTemp
    CLOSE #1
    ' iAnychar comes from the caller (in the utility it mirrors a dialog check box)
    IF iAnychar THEN
    ' replace each character of sStrip individually with sRepl
    FOR iLoop = 1 TO LEN(sStrip)
    REPLACE MID$(sStrip,iLoop,1) WITH sRepl IN sTemp
    NEXT
    ELSE
    ' strip sStrip as a whole string (sRepl is not used in this branch)
    REPLACE sStrip WITH "" IN sTemp
    END IF
    OPEN sOutfile FOR OUTPUT AS #1
    PRINT #1, sTemp;   ' trailing semicolon: do not append an extra CR+LF
    CLOSE #1
    END SUB


    I just ran it to replace every "a" with "ab" in a 2MB file (an Access database).

    I put a wrapper outside the function with two timer checks.

    On the test file it did 66278 replacements. Total time including for the file read, replace and write : 0.5 sec
    on a PIII 450.

    It's not as though you want to loop through this procedure lots of times.



    ------------------
    Check out my free software at http://www.lexacorp.com.pg (all written in PB/DLL)



  • Mike Trader
    replied
    Semen,
    Thank you

    John, Eric
    I wrote the whole thing based on reading the entire file into a string
    and then parsing out the lines using Parse.
    Code:
    OPEN FilePathStr+FileNameStr FOR BINARY ACCESS READ LOCK WRITE AS #100    
        GET$ 100, LOF(100), BinaryFile    
    CLOSE #100
    
    For BinLineNum = 1 to PARSECOUNT(BinaryFile, CHR$(10))
       InputStr = PARSE$(BinaryFile, CHR$(10), BinLineNum)
    Next
    This is not fast;
    it is much slower than when I was using LINE INPUT on a regular ASCII file.

    I think it is slow because each time I ask for the next occurrence
    of the CHR$(10) using PARSE$, it has to go thru the entire string "BinaryFile"
    counting up occurrences of CHR$(10) until it gets to BinLineNum.

    Does that sound right?

    So I think I need to revert to plan B:
    read a line at a time.

    John,
    > The file is a classic fixed length fixed byte position record, just create a UDT and read it
    Now it is 80 chars. Apparently before March this year it was 81 chars.
    What's a UDT?

    I think what I need to do is open the file, read the first, say, 100 chars,
    look for a CHR$(10), figure out what the line length is (80 or 81),
    then simply read a line at a time with:

    Code:
    OPEN FilePathStr+FileNameStr FOR BINARY ACCESS READ LOCK WRITE AS #100    
    GET$ 100, 99, BinaryFile ' read the first 99 chars to find the first CHR$(10) LF 
    LineLength  = LEN(PARSE$(BinaryFile, CHR$(10), 1)) + 1 ' chars in a line, plus 1 for the LF
    SEEK 100, 1
    While NOT EOF(100)
        GET$ 100, LineLength, InputStr ' get a line of data (the LF is its last char)
        Call ProcessaLine
    Wend
    CLOSE #100
    I'll let you know if this is faster...

    UPDATE - waaaaay faster! lightning fast
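The same probe-then-read idea, sketched in Python so it can be tried without a compiler. `read_fixed_records` and `probe_size` are illustrative names only. The record length is taken as the position of the first LF plus one, so the LF itself is counted as part of each record.

```python
import io


def read_fixed_records(stream, probe_size=128):
    """Yield fixed-length records (without their LF) from a binary stream."""
    head = stream.read(probe_size)
    first_lf = head.find(b"\n")
    if first_lf < 0:
        raise ValueError("no LF found in the first %d bytes" % probe_size)
    record_len = first_lf + 1           # payload plus the LF terminator
    stream.seek(0)                      # start again from the top of the file
    while True:
        record = stream.read(record_len)
        if len(record) < record_len:    # end of file (a short tail is ignored)
            break
        yield record[:-1]               # strip the trailing LF


if __name__ == "__main__":
    demo = io.BytesIO(b"A" * 79 + b"\n" + b"B" * 79 + b"\n")
    for rec in read_fixed_records(demo):
        print(len(rec))
```

This only works under the assumption confirmed earlier in the thread: every line in a given file has the same length.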
    ------------------
    Kind Regards
    Mike



    [This message has been edited by Mike Trader (edited July 14, 2001).]



  • Mike Trader
    replied
    Eric,
    You are exactly right. My function is redundant.
    I think I get it now.
    thx so much for your help

    ------------------
    Kind Regards
    Mike



  • Eric Pearson
    replied
    > Eric thinks that would be much slower

    Yes I do. Think about it... What's likely to be faster?

    Read 500k
    Modify 500k
    Write 500k
    Read 500k
    Parse 500k

    ...or...

    Read 500k
    Parse 500k

    Code:
    > FOR i = 1 TO LEN(BinaryLine) 
    >     InputStr = InputStr + CHR$(ASC(MID$(BinaryLine, i, 1)))
    > NEXT
    If you do that with a 500k string it is going to be slooooow. Every time one character is added to the string, PB has to create a new, longer string and copy the old string into it, then add the new character. That means that by the time the operation is complete, the first character in the string will have been copied 500,000 times, the second character 499,999 times, and so on. That's an extremely inefficient way to build a long string.

    You would be much better off creating a "buffer" string and using MID$ to insert the characters.

    Something like...

    Code:
    'create a string of the necessary length
    sOutputString = SPACE$(LEN(sInputString))
    
    FOR lCharacter = 1 TO LEN(sInputString)
        'insert character into pre-sized string
        MID$(sOutputString,lCharacter) = whatever
    NEXT
    Even if you are using strings that are much shorter than 500k, the same principle applies and the "buffer" function will still be faster.
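To put a number on the copying argument, here is a small Python sketch (not PowerBASIC; both function names are invented for the example). The first function just adds up 1 + 2 + ... + n, which is the cost model described above. The second shows the pre-sized buffer technique, where each character is written exactly once.

```python
def bytes_copied_by_appending(n: int) -> int:
    """Total characters copied when a string grows one character at a time.

    Step k rebuilds a string of length k, so the total is 1 + 2 + ... + n.
    """
    return n * (n + 1) // 2


def copy_into_buffer(source: bytes) -> bytes:
    """Fill a pre-sized buffer in place; each byte is written exactly once."""
    out = bytearray(len(source))        # allocate the full length up front
    for i, byte in enumerate(source):
        out[i] = byte                   # overwrite in place, no reallocation
    return bytes(out)
```

For a 500,000-character string the first function gives 125,000,250,000 character copies, against 500,000 writes for the buffer approach.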

    By the way, I don't get this part of your code...

    Code:
    CHR$(ASC(MID$(BinStr, Pos, 1)))
    It looks like you are taking the ASCII value of a character, and then creating a character with that ASCII code. Either it's late and I'm tired, or that is a complete waste of time. If that's the case, you can get rid of the whole "conversion" function because nothing needs to be converted.

    -- Eric


    ------------------
    Perfect Sync Development Tools
    Perfect Sync Web Site
    Contact Us: mailto:[email protected]



    [This message has been edited by Eric Pearson (edited July 13, 2001).]



  • John Petty
    replied
    Mike
    Why would you bother converting? The file is a classic fixed-length, fixed-byte-position record; just create a UDT and read it as a random file.
    If you want to read it as a string, then replace the Line Input function with your own, i.e.
    Open "file" for binary as #1
    Get$ #1, lof(1), a$
    while instr(a$,chr$(10))
    x& = instr(a$,chr$(10))
    NewLineInput = left$(a$,x&-1)
    a$ = right$(a$, len(a$) - x&)
    'work on the new line
    wend

    ------------------



  • Mike Trader
    replied
    Stuart:
    I asked that, but Eric thinks that would be much slower ... see above

    Eric:
    yes the records are fixed length (80 Chars including the LF), BUT
    I just found out that the lines used to be 81 chars long before March!
    So I guess I have to read the whole file into memory and parse it out
    from there as my prog needs to be able to handle both file lengths

    So riddle me this ...

    I have to pick out short strings within the Line some 4 chars long (Time)
    some 3chars long (symbol) some one char (Price Divider)

    Is it faster to convert the whole line from binary to Ascii
    Code:
        InputStr = ""
        FOR i = 1 TO LEN(BinaryLine) 
            InputStr = InputStr + CHR$(ASC(MID$(BinaryLine, i, 1)))
        NEXT
    and then use this:
    ResultStr = Mid$(AsciiStr, startPos, Len)
    7 times
    OR

    use a function to convert just the chars needed within the binary string:

    Code:
    '¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤'
    FUNCTION ConvertToAscii( BinStr AS STRING, First AS LONG, Last AS LONG) AS STRING
    LOCAL Pos AS LONG, AsciiStr AS STRING
        FOR Pos = First TO Last
            AsciiStr = AsciiStr + CHR$(ASC(MID$(BinStr, Pos, 1)))
        NEXT
    FUNCTION = AsciiStr
    END FUNCTION
    to put it another way is it faster to do this:

    InputStr = InputStr + CHR$(ASC(MID$(BinaryLine, i, 1)))

    80 times

    or call the function 7 times executing this:

    AsciiStr = AsciiStr + CHR$(ASC(MID$(BinStr, Pos, 1)))

    a total of 6+4+4+1+2+2+3=22 times
    (that's about 1/4 of the 80 times above, but with the added overhead of 7 function calls)

    ------------------
    Kind Regards
    Mike

    [This message has been edited by Mike Trader (edited July 13, 2001).]



  • Stuart McLachlan
    replied
    The way I would handle that for a 500K file is create a replacement file like this:

    Open MyUnixFile for binary as #1
    get$ #1,lof(1), sTemp
    replace chr$(10) with chr$(13,10) in sTemp   ' Unix lines end in LF = chr$(10)
    close #1
    open MyDosFile for output as #1
    print #1, sTemp;   ' semicolon: no extra CR+LF at the end
    close #1

    (or just overwrite the original -if you feel lucky)
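One thing to watch with a blanket replace is a file that already contains some CR+LF pairs: those would end up as CR+CR+LF. A small Python sketch of a conversion that is safe to run on either kind of file (illustrative only; `to_crlf` is a made-up name):

```python
def to_crlf(data: bytes) -> bytes:
    """Convert LF line ends to CR+LF without doubling existing CR+LF pairs."""
    # Collapse existing CR+LF pairs first so they are not turned into CR+CR+LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
```

Running it twice gives the same result as running it once, so it is safe if you are not sure whether a file has been converted already.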



    ------------------
    Check out my free software at http://www.lexacorp.com.pg (all written in PB/DLL)



  • Eric Pearson
    replied
    Mike --

    > would it be faster to convert the whole
    > file (500k) and then use LINE INPUT

    That would almost certainly be slower. Much slower.

    > As I understand it from reading the help file,
    > [when using BINARY] you HAVE TO specify
    > the record length of each record.

    No, that's FOR RANDOM. FOR BINARY allows you to use the file "free form", reading as many or as few bytes as you want with each GET$.

    > Are you suggesting:

    Close. You would want to use PARSECOUNT instead of EOF(1). And if the files are large you might not want to handle it as one big string. But yes, that's the basic idea.

    BUT... If the records are in fact fixed-length, it would be much easier to simply GET$ the appropriate number of bytes to get one record at a time.

    -- Eric

    ------------------
    Perfect Sync Development Tools
    Perfect Sync Web Site
    Contact Us: mailto:[email protected]

    [This message has been edited by Eric Pearson (edited July 13, 2001).]



  • Mike Trader
    replied
    Thx Scott and Tom,

    Would it be faster to convert the whole file (500k) and then use
    LINE INPUT as normal, rather than reading a line at a time in binary
    and converting each character to ASCII before processing?

    Also, how does this binary mode work?
    As I understand it from reading the help file, you HAVE TO specify
    the record length of each record.

    In this case each line is 76 chars long. So at 4 bytes a pop that's
    a record length of 304 units, not including the 0A or LF item (is that
    another 4 bytes?)


    Eric:
    Are you suggesting:
    Code:
        OPEN FileName FOR BINARY ACCESS READ LOCK WRITE AS #1
        GET$ 1, LOF(1), BinaryStr
        CLOSE #1  
        For i = 1 to EOF(1)
            BinaryLine = Parse$(BinaryStr, CHR$(10), i)
            Call ConvertBinaryLineToAsciiLine
            Call ProcessAsciiLine
        Next
    ------------------
    Kind Regards
    Mike



    [This message has been edited by Mike Trader (edited July 13, 2001).]



  • Tom Hanlin
    replied
    See http://www.powerbasic.com/files/pub/...ls/lf2crlf.zip


    ------------------
    Tom Hanlin
    PowerBASIC Staff



  • Semen Matusovski
    replied
    I downloaded and looked. Unix classic CHR$(10)

    [This message has been edited by Semen Matusovski (edited July 13, 2001).]



  • Scott Turchin
    replied
    Ah, how to convert, well I had that issue too, and I solved that issue:
    This is a key string, but I encrypt with the binary equiv of it or another key:

    Use converttobinarystring to convert this, or back...

    4E36503A3859533D423936352F2E5F29

    Code:
    Declare Function ConvertToHexString(charSt As String) As String
    Declare Function ConvertToBinaryString(charSt As String) As String
    Function ConvertToHexString(charSt As String)Export As String
    Local x As Long
    Local y As Long
    Local St As String
    y = Len(charSt)
    For x = 1 To y
        St = St + Hex$(Asc(Mid$(charSt,x,1)), 2)   ' always 2 digits, so bytes below &H10 round-trip
    Next
    Function = St
    End Function
    
    '----------------------------------------------------------------------------------------------------------------
    Function ConvertToBinaryString(charSt As String)Export As String
    Local y As Long
    Local x As Long
    Local PosPtr As Long
    Local St As String
    Local NewKeySt As String
    
    y = Len(charSt)
    For x = 1 To y Step 2
        St = Mid$(charSt,x,2)
        NewKeySt = NewKeySt + Chr$(Val("&h" + St))
    Next
    Function = NewKeySt
    End Function
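For comparison, the same round trip in Python (a sketch only; the two function names are invented here). The detail that matters is the fixed two-digit width per byte: without it, a byte such as 10 would come out as "A" and the decoder, which reads two characters at a time, would go out of step.

```python
def to_hex_string(data: bytes) -> str:
    """Encode bytes as upper-case hex, always two digits per byte."""
    return "".join(f"{byte:02X}" for byte in data)


def from_hex_string(text: str) -> bytes:
    """Decode a hex string produced by to_hex_string back into bytes."""
    return bytes(int(text[i:i + 2], 16) for i in range(0, len(text), 2))
```

Decoding the key string quoted above and encoding the result again should give back the same 32 hex digits.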
    ------------------
    Scott

    [This message has been edited by Scott Turchin (edited July 13, 2001).]

