PB9 and UTF-8

  • PB9 and UTF-8

    I know how to convert a UTF-8 string into an ANSI string using the Windows API or the functions José Roca developed, and both approaches work fine.

    But I would like to know whether it is possible to convert a UTF-8 string into an ANSI string using only the native PB9 ACODE$ / UCODE$ functions. So far I have not been able to get a correct result.

    I thought something like the following would do the job, but it does not:
    strUnicodeString = UCODE$(strUTF8String, %CP_UTF8)
    strAnsiString = ACODE$(strUnicodeString)
    Am I missing something? (I am quite sure I am.)

    Thanks in advance.

  • #2
    Interesting question!

    UTF-8 is not a "code page" but a "character encoding". As usual, Windows adds to the confusion by prefixing what is essentially a series of character encoding constants with %CP.

    The PB9 documentation makes it clear that it is expecting an actual ANSI code page number, if used (such as 1252 = Western, 1251 = Cyrillic, etc.) and not one of these %CP values.

    A UTF-8 encoded string cannot normally be converted to a single ANSI code page anyway. If the text could have been encoded in an 8-bit character set, it probably would have been, unless UTF-8 was chosen for system portability. In that case, I guess some analysis of the characters it contains would be necessary to determine which ANSI code page, if any, it can be converted to.
    - LJ


    • #3
      Use MultiByteToWideChar to convert from your code page to UTF-16, then use WideCharToMultiByte to convert from UTF-16 to UTF-8. For the reverse direction (UTF-8 to ANSI), swap the two code pages.

      The constant for UTF-8 is:
      %CP_UNICODE_UTF8 = 65001
      Note: In PB you must use a STRING or block pointer to contain the UTF encoded character array as it may contain NULLs.
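
The two-step route described above (source code page to UTF-16, then UTF-16 to the target code page) could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from any poster: the names ApiMbToWc, ApiWcToMb, ConvertCodePage, %MY_CP_ACP and %MY_CP_UTF8 are made up for this example, and the two API functions are declared by hand with pointers passed as DWORDs, so the sketch does not depend on how a given WIN32API.INC declares them.

```powerbasic
#COMPILE EXE
#DIM ALL

' Hand-written declarations (hypothetical aliases); all pointers are
' passed BYVAL as DWORD addresses.
DECLARE FUNCTION ApiMbToWc LIB "KERNEL32.DLL" ALIAS "MultiByteToWideChar" ( _
    BYVAL CodePage AS DWORD, BYVAL dwFlags AS DWORD, _
    BYVAL pMulti AS DWORD, BYVAL cbMulti AS LONG, _
    BYVAL pWide AS DWORD, BYVAL cchWide AS LONG) AS LONG

DECLARE FUNCTION ApiWcToMb LIB "KERNEL32.DLL" ALIAS "WideCharToMultiByte" ( _
    BYVAL CodePage AS DWORD, BYVAL dwFlags AS DWORD, _
    BYVAL pWide AS DWORD, BYVAL cchWide AS LONG, _
    BYVAL pMulti AS DWORD, BYVAL cbMulti AS LONG, _
    BYVAL pDefaultChar AS DWORD, BYVAL pUsedDefault AS DWORD) AS LONG

%MY_CP_ACP  = 0        ' system default ANSI code page
%MY_CP_UTF8 = 65001    ' UTF-8, as given in the post above

' Converts the bytes in sIn from code page cpFrom to code page cpTo,
' going through UTF-16. Returns an empty string on failure.
FUNCTION ConvertCodePage(BYVAL sIn AS STRING, BYVAL cpFrom AS DWORD, _
                         BYVAL cpTo AS DWORD) AS STRING
    LOCAL nWide, nOut AS LONG
    LOCAL sWide, sOut AS STRING

    IF LEN(sIn) = 0 THEN EXIT FUNCTION

    ' Step 1: source code page -> UTF-16.
    ' The first call only asks how many UTF-16 code units are needed.
    nWide = ApiMbToWc(cpFrom, 0, STRPTR(sIn), LEN(sIn), 0, 0)
    IF nWide = 0 THEN EXIT FUNCTION
    sWide = SPACE$(nWide * 2)            ' two bytes per UTF-16 code unit
    nWide = ApiMbToWc(cpFrom, 0, STRPTR(sIn), LEN(sIn), STRPTR(sWide), nWide)
    IF nWide = 0 THEN EXIT FUNCTION

    ' Step 2: UTF-16 -> target code page.
    ' The first call only asks how many bytes are needed.
    nOut = ApiWcToMb(cpTo, 0, STRPTR(sWide), nWide, 0, 0, 0, 0)
    IF nOut = 0 THEN EXIT FUNCTION
    sOut = SPACE$(nOut)
    nOut = ApiWcToMb(cpTo, 0, STRPTR(sWide), nWide, STRPTR(sOut), nOut, 0, 0)
    IF nOut = 0 THEN EXIT FUNCTION

    FUNCTION = sOut
END FUNCTION

FUNCTION PBMAIN () AS LONG
    LOCAL sUtf8, sAnsi AS STRING

    sUtf8 = "caf" & CHR$(&HC3, &HA9)     ' "café" encoded as UTF-8
    sAnsi = ConvertCodePage(sUtf8, %MY_CP_UTF8, %MY_CP_ACP)
    MSGBOX sAnsi
END FUNCTION
```

Because the intermediate UTF-16 data is held in an ordinary dynamic STRING sized in bytes, embedded NUL bytes are not a problem. Characters that do not exist in the target ANSI code page are not preserved: with the flags set to 0, Windows substitutes a default or best-fit character, so the conversion is only lossless when the text fits the chosen code page.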