  • Reading plain text IN dos ? Funny input!

    I am attempting to read a text file using PowerBasic DOS. I have checked the PowerBasic manuals and searched online, and I am positive the answer is staring me in the face, but I am completely missing it!! :SHOCKED: Here are three lines from the data file, for example:
    "358293","Pearl and Hermes Atoll","Island","HI","15","Honolulu","003","275000N","1755000W","27.8333333","-175.8333333","","","","","0","Unknown","09/30/2003",
    "358294","Laysan Island","Island","HI","15","Honolulu","003","254615N","1714415W","25.7708333","-171.7375","","","","","3","Unknown","09/30/2003",""
    "358295","Barking Sands","Beach","HI","15","Kauai","007","220418N","1594652W","22.0716667","-159.7811111","","","","","0","Kekaha","02/06/1981",""

    (Yes, each record is on a single line, but they are three separate lines. I separated the lines using colors so you can tell them apart.)
    OK, every field reads with spaces interspersed between the characters. When my program reads this file, for example, "Pearl and Hermes Atoll" becomes " P e a r l a n d H e r m e s A t o l l ", including the double quotes! Would this be an example of a binary file or something? This data file contains some characters that have diacritical marks on them. That's OK, right? (I hope so!)

    Here is my line that I am reading the above with:
    Input #1,FeatId$,FeatNm$,FeatCl$,StatAl$,StatNu$,CntyNa$,CntyNu$,PLaDMS$,PLoDMS$,PLaDEC$,PLoDEC$,SLaDMS$,SLoDMS$,SLaDEC$,SLoDEC$,Elevat$,MapNam$,DateCr$,DateEd$

    The only thing I know to do at this point is to remove the spaces from each and every variable, one at a time, and rerun my program. That would take a couple of days; the data file is that large. Does anyone see something stupid I may have done?

    Please excuse my use of colors in this post. I was trying to be clear and make sure it was understandable.

    Thank you for your understanding, everyone.

    Robert
    Last edited by Robert E. Carneal; 7 Jul 2009, 11:31 AM.

  • #2
    Looks to me like you are reading a Unicode file.

    A possible fix:

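    temp$ = ""                              ' start with an empty result string
    for x = 1 to len(rString$) step 2       ' keep every other byte, skipping the filler byte after each character
    temp$ = temp$ + mid$(rString$,x,1)
    next x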
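To see why taking every other byte works, here is a minimal sketch in Python rather than PowerBASIC. It assumes the file is UTF-16 little-endian, which the thread never confirms. It encodes one of the sample names, shows the zero byte that follows each ASCII character, and then applies the same step-2 idea.

```python
# Assumption: the data file is UTF-16 little-endian (two bytes per character).
name = '"Pearl and Hermes Atoll"'
raw = name.encode("utf-16-le")

# Every ASCII character is followed by a zero byte. A program that reads
# the file byte by byte shows that zero as a gap between the letters.
print(raw[:8])

# Same idea as the step-2 loop: keep bytes 0, 2, 4, ... and drop the rest.
low_bytes = raw[0::2]
recovered = low_bytes.decode("ascii")
print(recovered)
```

This only recovers the text cleanly when every character fits in one byte; anything outside that range needs a real decoder.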
    There are no atheists in a fox hole or the morning of a math test.
    If my flag offends you, I'll help you pack.


    • #3
      Unicode. <cuss> (If I say what I am thinking, someone would probably kick me off the site.) First time I ever ran across one. Now that I know, the solution is deceptively simple. I was sure this was laughing in my face. (But, hey, I never saw a Unicode file before, so can I be forgiven?) Thanks! :coffee3:

      Robert


      • #4
        Originally posted by Robert E. Carneal
        Unicode...can I be forgiven?)
        Sure. Why not? Besides, do you think you are the only one?

        • #5
          temp$ = ""
          for x = 1 to len(rString$) step 2
          temp$ = temp$ + mid$(rString$,x,1)
          next x
          It works. Since then I have found a faster way, though it is not PowerBASIC. All I had to do was this:

          C:\> type Unicode.txt > file.txt

          This strips the Unicode encoding from the file. Decide whether that is what you really want before doing it. I don't know how to reverse this if you do it by accident.

          Maybe that will help someone else? I did not know you could do that. I was impressed with the speed of that- it was really quick.
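For anyone wondering what such a conversion amounts to and whether it can be undone, here is a rough Python sketch. It is not what the `type` command does internally; the `cp437` target code page and the helper name `utf16_to_codepage` are assumptions made for illustration.

```python
def utf16_to_codepage(data: bytes, codepage: str = "cp437") -> bytes:
    """Decode UTF-16 (the BOM selects the byte order) and re-encode as single-byte text."""
    text = data.decode("utf-16")
    # Characters that do not exist in the target code page become "?".
    return text.encode(codepage, errors="replace")

original = '"358294","Laysan Island"\r\n'.encode("utf-16")  # includes a BOM
plain = utf16_to_codepage(original)
print(plain)

# Going back is only lossless when nothing was replaced on the way down.
restored = plain.decode("cp437").encode("utf-16")
print(restored == original)

# An o with macron (U+014D) has no cp437 equivalent, so it is lost for good.
print(utf16_to_codepage("\u014d".encode("utf-16")))
```

For plain ASCII data the round trip is exact. Once a character has been replaced, the original file is the only way back.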

          Thanks,

          Robert


          • #6
            You could do it like this

            Code:
            Function FixTextLine(TextLine As String) As String
             
               Dim OutString   As String
               Dim i           As Word
               Dim o           As Word
               Dim pIn         As Byte Ptr
               Dim pOut        As Byte Ptr
             
               pIn = StrPtr(TextLine)
               OutString = Space$(Len(TextLine))
               pOut = StrPtr(OutString)
               o = 0
             
               For i = 0 To Len(TextLine)-1
                  If @pIn[i] < 127 And @pIn[i] > 0 Then  ' Modify this range as needed or add ELSEIF sections to expand
                     @pOut[o] = @pIn[i]
                     Incr o
                  End If
               Next i
             
               FixTextLine = Left$(OutString,o)
             
            End Function
             
            Dim LineData As String
             
            Open "MYFILE.TXT" For Input As 1
            Line Input #1, LineData
            Close 1
             
            Print "Data as read from the file:"
            Print LineData
            Print
            Print "Fixed Data:"
            Print FixTextLine(LineData)
            End
            Not tested, but it should work. :coffee4:
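The same filtering idea can be tried quickly outside PB-DOS. The Python sketch below mirrors the range test in the loop (keep bytes 1 through 126) and assumes, for illustration only, that the line starts with a UTF-16 little-endian byte order mark.

```python
def fix_text_line(raw: bytes) -> bytes:
    """Keep only bytes in the range 1..126, like the range test in FixTextLine."""
    return bytes(b for b in raw if 0 < b < 127)

# Assumed input: a BOM (FF FE) followed by UTF-16 little-endian text.
line = b"\xff\xfe" + '"358294","Laysan Island"'.encode("utf-16-le")

print(fix_text_line(line))

# For comparison, taking every other byte keeps the first BOM byte (0xFF).
print(line[0::2][:4])
```

A side effect of the range test is that the two BOM bytes are dropped along with the zero bytes, whereas a plain step-2 loop carries one of them into the output.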
            Last edited by Scott Slater; 7 Jul 2009, 07:43 PM.
            Scott Slater
            Summit Computer Networks, Inc.
            www.summitcn.com


            • #7
              If the extra characters are real spaces, Chr$(32), rather than nulls, then use the following instead.

              Code:
              Function FixUnicodeTextLine(TextLine As String) As String
               
                 Dim OutString   As String
                 Dim i           As Word
                 Dim o           As Word
                 Dim pIn         As Byte Ptr
                 Dim pOut        As Byte Ptr
               
                 pIn = StrPtr(TextLine)
                 OutString = Space$(Len(TextLine))
                 pOut = StrPtr(OutString)
                 o = 0
               
                 For i = 0 To Len(TextLine)-1 Step 2
                    @pOut[o] = @pIn[i]
                    Incr o
                 Next i
               
                 FixUnicodeTextLine = Left$(OutString,o)
               
              End Function
              Pointers should go lots faster than string concatenation and MID$.

              • #8
                C:\> type Unicode.txt > file.txt

                Isn't the above the hard/long way? Or at least "out of your program's control"?

                Code:
                OPEN MyFile FOR BINARY AS #6529
                GET$  #6529, LOF(6529), UnicodeText$ 
                RegularText$ = ACODE$(UnicodeText$) 
                SEEK 6529, FILEATTR(6529,-2&)
                PUT$ 6529, RegularText$
                SETEOF #6529
                CLOSE  #6529
                Not sure it's UNICODE?

                Let's add one more step
                Code:
                OPEN MyFile FOR BINARY AS #6529
                GET$  #6529, LOF(6529), UnicodeText$ 
                IF ISTRUE IsTextUnicode(BYVAL STRPTR(UnicodeText$), LEN(UnicodeText$), BYVAL %NULL) THEN
                     RegularText$ = ACODE$(UnicodeText$)
                     SEEK 6529, FILEATTR(6529,-2&)
                     PUT$ 6529, RegularText$
                     SETEOF #6529
                END IF 
                CLOSE  #6529
                Leaves non-Unicode text alone, converts Unicode to ANSI
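Where the Windows API is not available, a cruder check can stand in for the detection step. The Python sketch below is not `IsTextUnicode` and does not reproduce its tests; the function name `looks_like_utf16_le` and the 0.9 threshold are assumptions chosen for illustration.

```python
import codecs

def looks_like_utf16_le(data: bytes) -> bool:
    """Rough guess: a BOM, or nearly all odd-offset bytes being zero."""
    if data.startswith(codecs.BOM_UTF16_LE):
        return True
    high_bytes = data[1::2]
    # Arbitrary threshold: mostly-zero high bytes suggest ASCII stored as UTF-16.
    return len(high_bytes) > 0 and high_bytes.count(0) / len(high_bytes) > 0.9

def to_plain(data: bytes) -> bytes:
    """Convert only when the data looks like UTF-16; otherwise leave it alone."""
    if not looks_like_utf16_le(data):
        return data
    text = data.decode("utf-16-le")
    return text.lstrip("\ufeff").encode("ascii", errors="replace")

print(to_plain("Barking Sands".encode("utf-16-le")))
print(to_plain(b"Barking Sands"))
```

Like any heuristic it can guess wrong on short or unusual input, so it is worth checking a few real lines by hand first.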

                MCM
                Last edited by Michael Mattias; 7 Jul 2009, 08:11 PM.
                Michael Mattias
                Tal Systems Inc. (retired)
                Racine WI USA
                http://www.talsystems.com


                • #9
                  :doh:

                  This is PowerBASIC for MS-DOS isn't it?

                  Never mind.

                  MCM

                  • #10
                    Originally posted by Michael Mattias
                    :doh:

                    This is PowerBASIC for MS-DOS isn't it?

                    Never mind.

                    MCM
                    Hehe, it's hard to remember what is supported in PB-DOS sometimes after using all of the new stuff in the Windows compilers for so long.


                    • #11
                      Part of me is saying "take a few days and work with your MS-DOS compiler. You haven't touched it for five years and you are getting really rusty."

                      The other part is saying "Don't do that, it would be a waste of time because MS-DOS is dead."

                      • #12
                        I think it's not necessary to reverse it, since you now have two files: the original one and the modified one. Anyway, if you need to reverse the modified one, maybe you can open it with Notepad and use the 'Save As' option, selecting the Unicode format.


                        • #13
                          When dealing with TEXT and not knowing if it is Unicode or not....I TOTALLY agree with you
                          Unicode. <cuss> (IF I say what I am thinking, someone would probably kick me off the site.)
                          I have wasted DAYYYYYS before (less often lately) until all of a sudden it hits me....."WTF??? could there be unprintable characters in this text???"

                          More often than not, that's what it turns out to be, and it just never occurred to me to even check (especially when none of the printable text has any extended characters).

                          A quick way to track it down would be to check each character, verify that it is at least between 0 and 127, and then go from there for characters you do not expect.
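A scan like that is only a few lines. In the Python sketch below, the helper name `unexpected_bytes` and the choice of "expected" bytes (printable ASCII plus tab, carriage return and line feed) are assumptions; widen the range to suit your data.

```python
def unexpected_bytes(raw: bytes, low: int = 32, high: int = 126):
    """Return (offset, value) pairs for bytes outside the expected range."""
    allowed_controls = {9, 10, 13}  # tab, line feed, carriage return
    return [
        (offset, value)
        for offset, value in enumerate(raw)
        if not (low <= value <= high or value in allowed_controls)
    ]

# Plain text produces no hits; UTF-16 text reports a zero at every odd offset.
print(unexpected_bytes(b"AB\r\n"))
print(unexpected_bytes("AB".encode("utf-16-le")))
```

A regular pattern of hits at every second offset is a strong hint that the file is two bytes per character.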
                          Engineer's Motto: If it aint broke take it apart and fix it

                          "If at 1st you don't succeed... call it version 1.0"

                          "Half of Programming is coding"....."The other 90% is DEBUGGING"

                          "Document my code????" .... "WHYYY??? do you think they call it CODE? "


                          • #14
                            Originally posted by Cliff Nichols
                            a quick idea to track down would be to search each character and verify it at least is between 0 and 127 and then go from there for characters you do not expect.
                            Be careful with zero. If it is the default ANSI to Unicode conversion, then the high byte of each character will be zero. Think of it as 8-bit chars converted to 16-bit with leading zero fill.

                            Also.... From the original post, it looks like some of the island names use the higher Unicode characters. Once stripped of the high byte, they may not look at all correct. How to fix that I can't say, just mentioning it as a possible problem.
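A small Python sketch shows the effect. The name used here is a made-up example containing an o with macron (U+014D), and UTF-16 little-endian is assumed. The second half shows one possible workaround, decoding properly and then dropping the accent marks, which is an assumption about what an acceptable result looks like.

```python
import unicodedata

name = "M\u014dkapu"              # hypothetical name with o-macron (U+014D)
raw = name.encode("utf-16-le")

# Dropping every second byte keeps only the low byte of each character,
# so U+014D (bytes 4D 01) collapses to 0x4D, which is the letter "M".
stripped = raw[0::2].decode("latin-1")

# Decoding first, then removing combining marks, keeps the base letter.
decomposed = unicodedata.normalize("NFD", raw.decode("utf-16-le"))
folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(stripped)
print(folded)
```

The stripped version silently turns the accented letter into a different letter, which is harder to spot than an obviously garbled character.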
                            Last edited by Joseph Cote; 17 Jul 2009, 12:46 PM. Reason: add'l thought
                            The boy just ain't right.
