Converting Ascii to Unicode and back

  • Converting Ascii to Unicode and back

    I use PBDos 3.5. Can anyone suggest/offer a way in PBDos
    language to prevent the function of convert ascii to unicode and back.

    thanks.

    ------------------

  • #2
It is not very clear what exactly you are looking for... do you want to convert, or "prevent" conversion (in which case I have no idea what you mean!)?

    In general, there are localization issues to deal with when converting from ANSI to Unicode.

One possible approach would be to write a small PB/CC app that uses the Windows MultiByteToWideChar() and WideCharToMultiByte() API functions. You can launch a PB/CC app synchronously from a PB/DOS app (under Windows 95 or better) using the SHELL statement.

    While this approach will work, it would probably need to use disk files to pass the strings back and forth, and the speed of the SHELL operation _may_ be an issue - without knowing what you want to achieve it is not possible to guess!
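Outside of PB/DOS, the round trip itself is simple. Here is a sketch of what MultiByteToWideChar() and WideCharToMultiByte() do, shown in Python rather than PowerBASIC (an illustrative stand-in; "cp1252" is assumed as the Windows ANSI code page, but the real one depends on the system locale):

```python
# Sketch: the ANSI <-> Unicode round trip performed by the Win32
# MultiByteToWideChar() / WideCharToMultiByte() functions, shown with
# Python codecs. "cp1252" stands in for the system ANSI code page.

ansi_bytes = "déjà vu".encode("cp1252")   # 8-bit ANSI string, one byte per char
wide = ansi_bytes.decode("cp1252")        # ANSI -> Unicode (MultiByteToWideChar)
utf16 = wide.encode("utf-16-le")          # the 16-bit wide-char form Windows uses
back = utf16.decode("utf-16-le").encode("cp1252")  # Unicode -> ANSI (WideCharToMultiByte)

assert back == ansi_bytes                 # lossless round trip for cp1252 text
```

The same byte-level transformation is what a DOS-side conversion-table approach would have to reproduce by hand.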

    The only thing I can suggest would be to search the Internet for some conversion tables and write a DOS app around those. Ugh!

    Can you please be more descriptive about what you wish to achieve? Thanks!

    ------------------
    Lance
    PowerBASIC Support
    mailto:[email protected]



    • #3
      Han,
      Sorry to have to ask, but could you tell me, what is unicode?
      I am assuming that ASCII is ANSI, so unicode would be ISO?
      - Rick

      ------------------



      • #4
        Technically, ASCII started out as 7-bit (due to the high data costs involved!) and covers just CHR$(0) to CHR$(127). IBM 'introduced' the first 8-bit ASCII table; it became known as the "IBM extended character set" and was widely adopted in the PC world.

        When Windows 1.0 was introduced, international characters became a more significant problem, and the solution was a new character set developed by ANSI/ISO, termed "ANSI". It provided a way to specify alternative character sets to cater for internationalization. DOS 3.0 (or possibly 3.3 - I forget) added "code page" support to provide a similar arrangement for DOS users.

        Essentially, ANSI characters below 128 are almost identical to the original 7-bit ASCII definition.

        While this all seemed cool and catered to most of the world, some languages like Japanese and Hebrew were still out of reach, since more than 256 possible characters were required (according to my old notes, over 20000 chars for Japanese + Korean, etc).

        The solution was to create "double-byte" character sets (DBCS). These were a combination of 8-bit ANSI and new 16-bit character codes, but DBCS were found to be inherently difficult to deal with, since you had to work out whether the next character in a string was 1 or 2 bytes in length, and repeat for each character!
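The mixed-width problem can be seen with a real double-byte set. A sketch in Python (illustrative only, not PB/DOS), using Shift-JIS, where ASCII-range characters are 1 byte and Japanese characters are 2:

```python
# The DBCS headache: character count and byte count diverge, so you
# cannot index into a DBCS string without walking it from the start.
text = "abcあい"                  # 3 ASCII characters + 2 Japanese kana
sjis = text.encode("shift_jis")   # Shift-JIS is a real DBCS encoding
print(len(text), len(sjis))       # prints: 5 7  (5 characters, 7 bytes)
```

Every string operation (MID$-style slicing, LEN, searching) has to repeat that walk, which is exactly why DBCS was considered so awkward.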

        So, along came Unicode - it uses ONLY 16-bit (2 byte) character codes. And yes, Unicode is an ANSI/ISO standard too.
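The fixed width is the whole appeal: every character takes the same space, whatever the language. A small sketch in Python (again as a stand-in for PB/DOS), using the little-endian 16-bit encoding Windows uses for wide characters:

```python
# In the original 16-bit Unicode design, every character - plain ASCII
# or Japanese - occupies exactly 2 bytes, so indexing is trivial again.
for ch in "Aあ":
    print(ch, len(ch.encode("utf-16-le")))   # both print a length of 2
```

Compare that with the Shift-JIS case above, where "A" and "あ" would be 1 and 2 bytes respectively.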

        ------------------
        Lance
        PowerBASIC Support
        mailto:[email protected]



        • #5
          Actually, there were many 8-bit character sets before IBM. The point is, here, that
          ASCII only defines a 7-bit character set, and anything over that is nonstandard.

          Unfortunately, even a bulky 16-bit character set proved to be insufficiently comprehensive,
          so Unicode now comes in a "double word character set" version: just like DBCS, but twice as
          big. So much for simplicity...


          ------------------
          Tom Hanlin
          PowerBASIC Staff
