Ascii Windows ANSI vs DOS File type question

  • Ascii Windows ANSI vs DOS File type question

    I have a file generated by the Chicago Mercantile Exchange:
    ftp.cme.com/pub/time
    Each ZIP file contains 4 ASCII data files.
    When I unzip one of these and open it with Notepad, there is no problem.
    When I open it with UltraEdit I get a message:
    "Do you want to convert vt010705.iom to DOS format?"
    Whether I answer yes or no, the ASCII file is still displayed.

    The problem is that when I try to read a line with my program
    using LINE INPUT, the program gets stuck for a few seconds
    and then locks up with the error "... illegal operation ... contact vendor ..."

    I called CME and no one really knows. They say the files are produced
    by a mainframe but don't know more than that. Most people open them
    in Excel, which thinks the file is Windows ANSI and not DOS.

    What is the difference?
    How do I input a line from PB?

    ------------------
    Kind Regards
    Mike



    [This message has been edited by Mike Trader (edited July 13, 2001).]

  • #2
    Perhaps it's Unix Format? ($CR instead of $CRLF)

    ------------------
    E-Mail (home): [email protected]
    E-Mail (work): [email protected]



    • #3
      > Perhaps it's Unix Format? ($CR instead of $CRLF)

      That's my guess too, but Unix and Linux use CHR$(10) ($LF) not CHR$(13) ($CR).

      Tell UltraEdit "no", then if it doesn't switch to the hex-edit mode automatically, switch it manually using Edit > HexEdit. I'll bet you see "0A" characters between the lines of text. &h0A = 10 = $LF.

      If that's the case, you will need to OPEN the file FOR BINARY and then use PARSE$ (or a similar mechanism) to parse out the individual lines of text.
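
      A minimal sketch of that idea, assuming the file exists and is small enough to hold in one string. The file name is the one from the first post; the variable names are made up for illustration. TALLY counts the $CR and $LF bytes to confirm which line ending the file uses, then PARSE$ splits on $LF:

      ```powerbasic
      #COMPILE EXE
      #DIM ALL

      FUNCTION PBMAIN () AS LONG
          LOCAL hFile AS LONG, i AS LONG
          LOCAL sData AS STRING, sLine AS STRING

          ' Read the whole file into one string, byte for byte
          hFile = FREEFILE
          OPEN "vt010705.iom" FOR BINARY ACCESS READ AS #hFile
          GET$ #hFile, LOF(hFile), sData
          CLOSE #hFile

          ' LF bytes with no CR bytes suggests Unix-style line endings
          MSGBOX "CR bytes:" + STR$(TALLY(sData, $CR)) + $CRLF + _
                 "LF bytes:" + STR$(TALLY(sData, $LF))

          ' Split on LF, one field per line of text
          FOR i = 1 TO PARSECOUNT(sData, $LF)
              sLine = PARSE$(sData, $LF, i)
              IF LEN(sLine) THEN
                  ' ... process sLine here ...
                  ' (empty fields, e.g. after a trailing LF, are skipped)
              END IF
          NEXT
      END FUNCTION
      ```

      Each PARSE$ call has to find field number i again, so this is probably best kept for small files.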

      -- Eric


      ------------------
      Perfect Sync Development Tools
      Perfect Sync Web Site
      Contact Us: [email protected]



      [This message has been edited by Eric Pearson (edited July 13, 2001).]
      "Not my circus, not my monkeys."



      • #4
        Eric:
        Here is what I see in Ultra Edit Normal Mode:
        E0 E0 20010700 C 0004475 010705 105100 6 0001300 0000000250 00000000 9
        E0 E0 20010700 C 0004487 010705 073012 1 0000650 0000000100 00000000 9
        E0 E0 20010700 C 0004487 010705 085600 6 0000700 0000000500 00000000 9

        here it is in HexEdit Mode: (just the right column)
        ; .E0 E0 20010700
        ; C 0004475 010705
        ; 105100 6 000130
        ; 0 0000000250
        ; 00000000 9.E0
        ; E0 20010700 C 0
        ; 004487 010705 07

        Notice there is a "." at the beginning of each line.

        In the Hex area this shows up as 0A, yes. Good guess!

        So here is my code:

        Code:
            OPEN FilePathStr+FileNameStr FOR INPUT AS 100 LEN = 32768  ' Open data file for reading
            LINE INPUT #100, InputStr ' Read a line of Data
            IF InputStr = "" THEN MSGBOX "Line Missing",,"Error" : EXIT  ' end of data or a hole in data
            Date  = VAL(MID$(InputStr, 42, 6)) + 1000000
            Time  = VAL(MID$(InputStr, 17, 4))
            etc ...
        How do I do this in binary?
        How do you read a line in binary and convert that line to ASCII chars
        for processing?



        ------------------
        Kind Regards
        Mike



        • #5
          When you say mainframe, do you mean an "IBM ES9000", or a Unix box that "acts" like a mainframe?

          If it is a mainframe, then my next question is: what software are you using to download the file from the mainframe?

          I can guarantee you that almost every vendor on the planet has an issue with file transfers in ONE way or another.

          If it's E!PC then hit ATM's website and get the latest patches and re-download it, or grab an eval copy of the latest software and test again.

          There are two formats when uploading/downloading from the mainframe. In one format nothing happens: it just PUTS the file up on the mainframe.
          In the other format (and I always confuse the two) the mainframe converts the file to EBCDIC, so that IN CASE you wish to view/use the file on the mainframe it's in the native encoding.

          So a mainframe-generated file that is transferred back as ASCII may pick up a problem during that conversion: there may be a premature EOF or some other extraneous data in there.

          Also note, we used Notepad to open/copy/paste mainframe traces saved as binary, because Notepad does not change the binary structure like other editors do. Notepad rocks for that.

          So it definitely sounds like a mainframe conversion issue, *or* an issue with the way the file was generated ON the mainframe.

          And I have sufficient tools to do some playing with the file if you like. I just don't have a mainframe to test with anymore (I used to own a System/36, hehe).


          Let me know if ya need a hand,


          Scott


          ------------------
          Scott
          Scott Turchin
          MCSE, MCP+I
          http://www.tngbbs.com
          ----------------------
          True Karate-do is this: that in daily life, one's mind and body be trained and developed in a spirit of humility; and that in critical times, one be devoted utterly to the cause of justice. -Gichin Funakoshi



          • #6
            Ah, how to convert. Well, I had that issue too, and I solved it.
            This is a key string, but I encrypt with the binary equivalent of it or another key:

            Use ConvertToBinaryString to convert this, or ConvertToHexString to go back...

            4E36503A3859533D423936352F2E5F29

            Code:
            Declare Function ConvertToHexString(charSt As String) As String
            Declare Function ConvertToBinaryString(charSt As String) As String

            ' Convert each byte of charSt to two hex digits
            Function ConvertToHexString(charSt As String) Export As String
                Local x As Long
                Local St As String
                For x = 1 To Len(charSt)
                    ' the ", 2" pads values below &h10 with a leading zero
                    St = St + Hex$(Asc(Mid$(charSt, x, 1)), 2)
                Next
                Function = St
            End Function

            '----------------------------------------------------------------------------------------------------------------
            ' Convert a string of hex digit pairs back to the bytes they represent
            Function ConvertToBinaryString(charSt As String) Export As String
                Local x As Long
                Local NewKeySt As String
                For x = 1 To Len(charSt) Step 2
                    NewKeySt = NewKeySt + Chr$(Val("&h" + Mid$(charSt, x, 2)))
                Next
                Function = NewKeySt
            End Function
            ------------------
            Scott

            [This message has been edited by Scott Turchin (edited July 13, 2001).]


            • #7
              I downloaded and looked. Unix classic CHR$(10)

              [This message has been edited by Semen Matusovski (edited July 13, 2001).]



              • #8
                See http://www.powerbasic.com/files/pub/...ls/lf2crlf.zip


                ------------------
                Tom Hanlin
                PowerBASIC Staff



                • #9
                  Thx Scott and Tom,

                  Would it be faster to convert the whole file (500k) and then use
                  LINE INPUT as normal, rather than reading a line at a time in binary
                  and converting each character to ASCII before processing?

                  Also, how does this binary mode work?
                  As I understand it from reading the help file, you HAVE TO specify
                  the record length of each record.

                  In this case each line is 76 chars long. So at 4 bytes a pop that's
                  a record length of 304 units, not including the 0A or LF item (is that
                  another 4 bytes?)


                  Eric:
                  Are you suggesting:
                  Code:
                      OPEN FileName FOR BINARY ACCESS READ LOCK WRITE AS #1
                      GET$ 1, LOF(1), BinaryStr
                      CLOSE #1  
                      For i = 1 to EOF(1)
                          BinaryLine = Parse$(BinaryStr, CHR$(10), i)
                          Call ConvertBinaryLineToAsciiLine
                          Call ProcessAsciiLine
                      Next
                  ------------------
                  Kind Regards
                  Mike



                  [This message has been edited by Mike Trader (edited July 13, 2001).]



                  • #10
                    Mike --

                    > would it be faster to convert the whole
                    > file (500k) and then use LINE INPUT

                    That would almost certainly be slower. Much slower.

                    > As I understand it from reading the help file,
                    > [when using BINARY] you HAVE TO specify
                    > the record length of each record.

                    No, that's FOR RANDOM. FOR BINARY allows you to use the file "free form", reading as many or as few bytes as you want with each GET$.

                    > Are you suggesting:

                    Close. You would want to use PARSECOUNT instead of EOF(1). And if the files are large you might not want to handle it as one big string. But yes, that's the basic idea.

                    BUT... If the records are in fact fixed-length, it would be much easier to simply GET$ the appropriate number of bytes to get one record at a time.

                    -- Eric


                    [This message has been edited by Eric Pearson (edited July 13, 2001).]


                    • #11
                      The way I would handle that for a 500K file is to create a replacement file like this:

                      Open MyUnixFile For Binary As #1
                      Get$ #1, Lof(1), sTemp
                      Close #1
                      Replace Chr$(10) With Chr$(13,10) In sTemp
                      Open MyDosFile For Output As #1
                      Print #1, sTemp;    ' trailing semicolon: don't append an extra CRLF
                      Close #1

                      (or just overwrite the original, if you feel lucky)



                      ------------------
                      Check out my free software at http://www.lexacorp.com.pg (all written in PB/DLL)



                      • #12
                        Stuart:
                        I asked that, but Eric thinks that would be much slower ... see above

                        Eric:
                        Yes, the records are fixed length (80 chars including the LF), BUT
                        I just found out that the lines used to be 81 chars long before March!
                        So I guess I have to read the whole file into memory and parse it out
                        from there, as my prog needs to be able to handle both line lengths.

                        So riddle me this ...

                        I have to pick out short strings within the line: some 4 chars long (Time),
                        some 3 chars long (symbol), some one char (Price Divider).

                        Is it faster to convert the whole line from binary to ASCII
                        Code:
                            InputStr = ""
                            FOR i = 1 TO LEN(BinaryLine) 
                                InputStr = InputStr + CHR$(ASC(MID$(BinaryLine, i, 1)))
                            NEXT
                        and then use this:
                        ResultStr = Mid$(AsciiStr, startPos, Len)
                        7 times
                        OR

                        use a function to convert just the chars needed within the binary string:

                        Code:
                        '¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤'
                        FUNCTION ConvertToAscii( BinStr AS STRING, First AS LONG, Last AS LONG) AS STRING
                        LOCAL Pos AS LONG, AsciiStr AS STRING
                            FOR Pos = First TO Last
                                AsciiStr = AsciiStr + CHR$(ASC(MID$(BinStr, Pos, 1)))
                            NEXT
                        FUNCTION = AsciiStr
                        END FUNCTION
                        To put it another way, is it faster to do this:

                        InputStr = InputStr + CHR$(ASC(MID$(BinaryLine, i, 1)))

                        80 times

                        or call the function 7 times, executing this:

                        AsciiStr = AsciiStr + CHR$(ASC(MID$(BinStr, Pos, 1)))

                        a total of 6+4+4+1+2+2+3 = 22 times
                        (that's about 1/4 of the 80 above, but with the added overhead of 7 function calls)

                        ------------------
                        Kind Regards
                        Mike

                        [This message has been edited by Mike Trader (edited July 13, 2001).]



                        • #13
                          Mike
                          Why would you bother converting? The file is a classic fixed-length, fixed-byte-position record format; just create a UDT and read it as a random file.
                          If you want to read it as a string then replace the LINE INPUT function with your own, e.g.
                          Open "file" For Binary As #1
                          Get$ #1, Lof(1), a$
                          Close #1
                          While Instr(a$, Chr$(10))
                              x& = Instr(a$, Chr$(10))
                              NewLineInput$ = Left$(a$, x& - 1)
                              a$ = Right$(a$, Len(a$) - x&)
                              'work on the new line
                          Wend
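
                          For reference, a UDT is a user-defined type (TYPE ... END TYPE). Below is a minimal sketch of the random-file approach, assuming every record is exactly 80 bytes (79 data characters plus the LF). The type name and field names are placeholders, and the field positions are borrowed from Mike's earlier MID$ calls rather than from the real CME layout:

                          ```powerbasic
                          #COMPILE EXE
                          #DIM ALL

                          TYPE CmeRecord              ' hypothetical layout
                              Body AS STRING * 79     ' the fixed-position text of one line
                              Eol  AS STRING * 1      ' the trailing LF
                          END TYPE

                          FUNCTION PBMAIN () AS LONG
                              LOCAL rec AS CmeRecord
                              LOCAL hFile AS LONG, nRecs AS LONG, i AS LONG
                              LOCAL sDate AS STRING, sTime AS STRING

                              hFile = FREEFILE
                              OPEN "vt010705.iom" FOR RANDOM ACCESS READ AS #hFile LEN = SIZEOF(rec)
                              nRecs = LOF(hFile) \ SIZEOF(rec)    ' whole records in the file

                              FOR i = 1 TO nRecs
                                  GET #hFile, i, rec              ' read record number i
                                  sDate = MID$(rec.Body, 42, 6)
                                  sTime = MID$(rec.Body, 17, 4)
                                  ' ... process the fields here ...
                              NEXT
                              CLOSE #hFile
                          END FUNCTION
                          ```

                          The record type has to match the file: for the older 81-byte lines, Body would need to be 80 bytes instead of 79.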



                          • #14
                            > Eric thinks that would be much slower

                            Yes I do. Think about it... What's likely to be faster?

                            Read 500k
                            Modify 500k
                            Write 500k
                            Read 500k
                            Parse 500k

                            ...or...

                            Read 500k
                            Parse 500k

                            Code:
                            > FOR i = 1 TO LEN(BinaryLine) 
                            >     InputStr = InputStr + CHR$(ASC(MID$(BinaryLine, i, 1)))
                            > NEXT
                            If you do that with a 500k string it is going to be slooooow. Every time one character is added to the string, PB has to create a new, longer string and copy the old string into it, then add the new character. That means that by the time the operation is complete, the first character in the string will have been copied 500,000 times, the second character 499,999 times, and so on. That's an extremely inefficient way to build a long string.

                            You would be much better off creating a "buffer" string and using MID$ to insert the characters.

                            Something like...

                            Code:
                            'create a string of the necessary length
                            sOutputString = SPACE$(LEN(sInputString))
                            
                            FOR lCharacter = 1 TO LEN(sInputString)
                                'insert character into pre-sized string
                                MID$(sOutputString,lCharacter) = whatever
                            NEXT
                            Even if you are using strings that are much shorter than 500k, the same principle applies and the "buffer" function will still be faster.

                            By the way, I don't get this part of your code...

                            Code:
                            CHR$(ASC(MID$(BinStr, Pos, 1)))
                            It looks like you are taking the ASCII value of a character, and then creating a character with that ASCII code. Either it's late and I'm tired, or that is a complete waste of time. If that's the case, you can get rid of the whole "conversion" function because nothing needs to be converted.

                            -- Eric





                            [This message has been edited by Eric Pearson (edited July 13, 2001).]


                            • #15
                              Eric,
                              You are exactly right. My function is redundant.
                              I think I get it now.
                              thx so much for your help

                              ------------------
                              Kind Regards
                              Mike



                              • #16
                                Semen,
                                Thank you

                                John, Eric
                                I wrote the whole thing based on reading the entire file into a string
                                and then parsing out the lines using Parse.
                                Code:
                                OPEN FilePathStr+FileNameStr FOR BINARY ACCESS READ LOCK WRITE AS #100    
                                    GET$ 100, LOF(100), BinaryFile    
                                CLOSE #100
                                
                                For BinLineNum = 1 to PARSECOUNT(BinaryFile, CHR$(10))
                                   InputStr = PARSE$(BinaryFile, CHR$(10), BinLineNum)
                                Next
                                This is not fast:
                                much slower than when I was using LINE INPUT on a regular ASCII file.

                                I think it is slow because each time I ask for the next occurrence
                                of CHR$(10) using PARSE$, it has to go through the entire string "BinaryFile"
                                counting up occurrences of CHR$(10) until it gets to BinLineNum.

                                Does that sound right?
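
                                If PARSE$ does rescan from the start on every call (an assumption; nothing in this thread confirms it), one way around it while keeping the file in memory is to remember where the last line ended and search forward from there with INSTR. A minimal sketch with made-up variable names, assuming the file exists and fits in one string:

                                ```powerbasic
                                #COMPILE EXE
                                #DIM ALL

                                FUNCTION PBMAIN () AS LONG
                                    LOCAL hFile AS LONG, lStart AS LONG, lEnd AS LONG
                                    LOCAL BinaryFile AS STRING, InputStr AS STRING

                                    hFile = FREEFILE
                                    OPEN "vt010705.iom" FOR BINARY ACCESS READ AS #hFile
                                    GET$ #hFile, LOF(hFile), BinaryFile
                                    CLOSE #hFile

                                    lStart = 1
                                    DO
                                        ' Look for the next LF at or after the current position
                                        lEnd = INSTR(lStart, BinaryFile, $LF)
                                        IF lEnd = 0 THEN EXIT DO        ' no more complete lines
                                        InputStr = MID$(BinaryFile, lStart, lEnd - lStart)
                                        ' ... process InputStr here ...
                                        lStart = lEnd + 1               ' continue just after that LF
                                    LOOP
                                END FUNCTION
                                ```

                                Each byte of the string is scanned once, and the line length does not need to be known in advance. Any text after the last LF is ignored by this sketch.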

                                So I think I need to revert to plan B:
                                read a line at a time.

                                John,
                                > The file is a classic fixed length fixed byte position record, just create a UDT and read it
                                Now it is 80 chars. Apparently before March this year it was 81 chars.
                                What's a UDT?

                                I think what I need to do is open the file, read the first, say, 100 chars,
                                look for a CHR$(10), figure out what the line length is (80 or 81),
                                then simply read a line at a time with:

                                Code:
                                OPEN FilePathStr+FileNameStr FOR BINARY ACCESS READ LOCK WRITE AS #100
                                GET$ 100, 99, BinaryFile ' read the first 99 chars to find the first CHR$(10) LF
                                LineLength = LEN(PARSE$(BinaryFile, CHR$(10), 1)) + 1 ' chars in a line, plus 1 for the LF itself
                                SEEK 100, 1 ' back to the start of the file
                                WHILE NOT EOF(100)
                                    GET$ 100, LineLength, InputStr ' get a line of data (including its LF)
                                    CALL ProcessaLine
                                WEND
                                CLOSE #100
                                I'll let you know if this is faster...

                                UPDATE - waaaaay faster! lightning fast
                                ------------------
                                Kind Regards
                                Mike



                                [This message has been edited by Mike Trader (edited July 14, 2001).]



                                • #17
                                  Originally posted by Mike Trader:
                                  Stuart:
                                  I asked that, but Eric thinks that would be much slower ... see above

                                  It depends on how many times you plan to do it and how critical the time is.

                                  I have a file manipulation utility which uses this sub:

                                  SUB ReplChar(sInfile AS STRING, sOutfile AS STRING, sStrip AS STRING, sRepl AS STRING, iAnychar AS LONG)
                                      LOCAL sTemp AS STRING
                                      LOCAL iLoop AS LONG
                                      ' read the whole input file into memory
                                      OPEN sInfile FOR BINARY AS #1
                                      GET$ #1, LOF(1), sTemp
                                      CLOSE #1
                                      IF iAnychar THEN
                                          ' treat sStrip as a set of single characters
                                          FOR iLoop = 1 TO LEN(sStrip)
                                              REPLACE MID$(sStrip, iLoop, 1) WITH sRepl IN sTemp
                                          NEXT
                                      ELSE
                                          ' treat sStrip as one string
                                          REPLACE sStrip WITH sRepl IN sTemp
                                      END IF
                                      OPEN sOutfile FOR OUTPUT AS #1
                                      PRINT #1, sTemp;    ' semicolon: no extra CRLF appended
                                      CLOSE #1
                                  END SUB


                                  I just ran it to replace every "a" with "ab" in a 2MB file (an Access database).

                                  I put a wrapper around the sub with two timer checks.

                                  On the test file it did 66278 replacements. Total time, including the file read, replace and write: 0.5 sec
                                  on a PIII 450.

                                  It's not as though you want to loop through this procedure lots of times.



                                  ------------------
                                  Check out my free software at http://www.lexacorp.com.pg (all written in PB/DLL)



                                  • #18
                                    Mike,

                                    If all you want to do is make the file so it can be easily
                                    manipulated by LINE INPUT, I would be inclined to use REPLACE
                                    to get the CRLF pair you need as it is a genuinely fast function.

                                    This is not as slow as it sounds: once you have done the read
                                    into memory, replaced it and written it back to disk, the
                                    LINE INPUT command will be a lot faster as the file is still in
                                    cache.

                                    It's a simple, easy and fast solution that is characteristic of
                                    PowerBASIC code. You could do it a bit faster as an assembler
                                    algorithm, but it's not a big file and you only have to process
                                    it once, so I am sure your client base would not be bothered by
                                    the speed.

                                    Regards,

                                    [email protected]

                                    ------------------
                                    hutch at movsd dot com
                                    The MASM Forum

                                    www.masm32.com



                                    • #19
                                      Mike --

                                      Are you certain that you will never see a file that contains some 80-character records and some 81-character records?
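
                                      One way to guard against that, sketched under the assumption that a record is always either 80 or 81 bytes including the LF: read one record at a time with GET$, and confirm that each record really ends in an LF before trusting it. The function name and variable names are hypothetical:

                                      ```powerbasic
                                      #COMPILE EXE
                                      #DIM ALL

                                      ' Returns 0 if every record is LineLength bytes long and ends in LF,
                                      ' otherwise the 1-based number of the first record that does not.
                                      FUNCTION CheckRecords (BYVAL FileName AS STRING, BYVAL LineLength AS LONG) AS LONG
                                          LOCAL hFile AS LONG, RecNum AS LONG
                                          LOCAL InputStr AS STRING

                                          hFile = FREEFILE
                                          OPEN FileName FOR BINARY ACCESS READ AS #hFile
                                          DO WHILE SEEK(hFile) <= LOF(hFile)      ' bytes still left to read
                                              INCR RecNum
                                              GET$ #hFile, LineLength, InputStr
                                              IF LEN(InputStr) <> LineLength OR RIGHT$(InputStr, 1) <> $LF THEN
                                                  FUNCTION = RecNum
                                                  CLOSE #hFile
                                                  EXIT FUNCTION
                                              END IF
                                          LOOP
                                          CLOSE #hFile
                                          FUNCTION = 0
                                      END FUNCTION

                                      FUNCTION PBMAIN () AS LONG
                                          MSGBOX "First bad record:" + STR$(CheckRecords("vt010705.iom", 80))
                                      END FUNCTION
                                      ```

                                      A non-zero result means the file mixes record lengths or is truncated, in which case splitting on CHR$(10) is the safer route.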

                                      -- Eric



                                      • #20
                                        This is an example of reading a file in chunks into memory
                                        and parsing out the line data, including terminators.
                                        You can be sure "PC files" from different sources can have many shapes.
                                        This one takes care of 'most' of this kind of problem.
                                        Comment added later for clarity:
                                        This code was developed to handle files of ANY size, where it is important
                                        to know what kind of terminator is used for every line.
                                        CR = overtype, (CR)LF is new line, and FF is new page.

                                        Code:
                                        Function ReadPCFile(ByVal FilName$)As Long
                                        Local FilNr&
                                        Local InBuffer$,TmpBuffer$,LineData$,LineEnd&
                                        Local FilLen&&,FilPos&&,BytesToRead&&
                                        Local MaxBuff&,BuffSize&,BuffPos&
                                        Local Refill&
                                        Local Terminator$  
                                        '--Open file-------------------------------------
                                            On Error Resume Next
                                            FilNr& = FreeFile                             
                                            Open FilName$ For Binary Access Read Lock Write As FilNr&
                                            If ErrClear <> 0 Then Function = 100:Exit Function
                                            FilLen&& = Lof(FilNr&)
                                            BytesToRead&& = FilLen&&
                                            MaxBuff& = 100000   '100 kB local, network 8192
                                            FilPos&& = 1
                                        '--loop thru file--------------------------------
                                            TmpBuffer$ =""
                                            Refill& = 1
                                            Do
                                             Sleep 0 'if in workerThread   
                                             If BytesToRead&& > 0 And Refill& =1 Then
                                              ReFill& = 0
                                              BuffSize& = Min(BytesToRead&&,MaxBuff&)
                                              ErrClear:Get$ FilNr&,BuffSize&,InBuffer$
                                              If ErrClear <> 0 Then Function = 101:Close FilNr&:Exit Function
                                              FilPos&& = Seek(FilNr&)
                                              If ErrClear <> 0 Then Function = 102:Close FilNr&:Exit Function
                                              BytesToRead&& = FilLen&& - (FilPos&& - 1)
                                        '--Get rid of trailing "1A" and NULL---------------
                                              If BytesToRead&& < 1  Then
                                               InBuffer$ = Rtrim$(InBuffer$,Any Chr$(0,&H1A))
                                               If Right$(InBuffer$,1)<>Chr$(10) Then InBuffer$ = InBuffer$ + Chr$(10)
                                              End If
                                        '--Merge buffers---------------------------------
                                              InBuffer$ = TmpBuffer$ + InBuffer$
                                        '--Remove CRCRLF/CRLF----------------------------
                                              Replace Chr$(13,13,10) With Chr$(10) In InBuffer$
                                              Replace Chr$(13,10)    With Chr$(10) In InBuffer$
                                              Replace Chr$(13,13,12) With Chr$(12) In InBuffer$
                                              Replace Chr$(13,12)    With Chr$(12) In InBuffer$
                                        '--Correct buffersize after replace--------------
                                              BuffSize&  = Len(InBuffer$)
                                              BuffPos&   = 1
                                        '--Here You can report progress------------------
                                              If UpdateProgBar(FilPos&&,FilLen&&)<> 0 Then 
                                               Close FilNr&: Function = 200: Exit Function
                                              End If
                                             End If    'end of buffer-fill
                                        '--This is extracting LineData-------------------
                                             LineEnd&  = Instr(BuffPos&,InBuffer$,Any Chr$(10,12,13))
                                             If LineEnd& = 0 Then Close FilNr&: Function = 103: Exit Function
                                             LineData$   = Mid$(InBuffer$,BuffPos&,LineEnd& - BuffPos&)
                                             Terminator$ = Mid$(InBuffer$,LineEnd&,1) ' If you want to know 
                                                                                      ' lineterminator FF/CR/LF
                                             BuffPos& = LineEnd& + 1
                                        '--Check if time to refill buffer----------------
                                             If (BytesToRead&& > 0) And _
                                                (BuffSize& - BuffPos& < 1024) Then    'Max expected line-length
                                              TmpBuffer$ = Mid$(InBuffer$,BuffPos&)
                                              Refill& = 1
                                             End If
                                        '--Here You process your line--------------------
                                             ProcessData LineData$,ErrCode&
                                        '--Have whole file been processed----------------
                                             If BytesToRead&& < 1 And BuffPos& >= BuffSize& Then Exit Do
                                            Loop
                                            Close FilNr&
                                            Function = 0
                                        End Function

                                        ------------------
                                        Fred
                                        [email protected]
                                        http://www.oxenby.se



                                        [This message has been edited by Fred Oxenby (edited July 14, 2001).]