Unicode

  • Unicode

    Why does this display "A" instead of "ABC", so that ACODE$ must be used?
    Also curious if using another code page would make a difference? Never used this before.
    Code:
    %unicode=1
    REM #OPTION ANSIAPI
    'https://learn.microsoft.com/en-us/windows/win32/intl/code-page-identifiers
    TYPE MyType
     ws AS WSTRING * 2
     b(1 TO 2) AS BYTE
    END TYPE
    
    FUNCTION PBMAIN AS LONG
     'UCODEPAGE 20297 'France?
     'UCODEPage 20273 'Germany?
     'UCODEPAGE  20285 'United Kingdom
     LOCAL typ AS MyType
     typ.ws = "AB"
     typ.b(1)= 67    'tried both ways 67,0  0,67
     typ.b(2) = 0
     ? typ           'A not ABC
     ? ACODE$(typ)   'ABC
    END FUNCTION


  • #2
    Type seems to have a problem with WString.
    Try:

    Code:
    %unicode=1
    
    TYPE MyType
     ws AS WSTRING * 2
     b(1 TO 2) AS BYTE
    END TYPE
    
    FUNCTION PBMAIN AS LONG
    
     LOCAL typ AS MyType
    
     typ.ws ="A" & CHR$$(911)                           ' 911 = Ώ
     typ.b(1)= 67    'tried both ways 67,0  0,67
     typ.b(2) = 0
    
     ? typ                    'A ??C
     ? ACODE$(typ)   'A?C  only Ansi Chars
    
     ? BITS$(WSTRING, typ)   'AΏC  all Unicode Chars
    
    
    #IF %DEF(%PB_CC32)
        WAITKEY$
    #ENDIF
    END FUNCTION


    • #3
      Mike,

      From a purely binary point of view, WSTRINGs contain $NUL bytes: "ABC" is stored as A<NUL>B<NUL>C<NUL>, because every character takes two bytes (and non-ASCII characters can produce other control-code bytes as well). That's why they take up twice as much memory per character as STRINGs.

      ANSI Windows APIs (like the one called by MSGBOX) stop reading a string when they hit a $NUL, so you see only the first letter. So to view the wide string "directly", you would need to either 1) pass the wide string to a Unicode version of an API or 2) use ACODE$ and the ANSI API.
      "Not my circus, not my monkeys."
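To make the byte layout concrete, here is a minimal Python sketch (not PowerBASIC) that models the 6 bytes of MyType and an ANSI-style reader that stops at the first $NUL. It assumes WSTRING data is UTF-16 little-endian, as on Windows/x86, and uses latin-1 as a stand-in for the ANSI code page; the helper names my_type_bytes and ansi_view are hypothetical.

```python
# Hypothetical Python model of the PowerBASIC UDT from the question:
#   TYPE MyType: ws AS WSTRING * 2, b(1 TO 2) AS BYTE
# Assumption: WSTRING data is UTF-16 little-endian (Windows on x86), BMP characters only.

def my_type_bytes(ws: str, b1: int, b2: int) -> bytes:
    """Return the 6 raw bytes of the UDT, member by member."""
    if len(ws) != 2:
        raise ValueError("this sketch models exactly 2 characters")
    ws_bytes = ws.encode("utf-16-le")   # WSTRING * 2: 2 chars x 2 bytes = 4 bytes
    return ws_bytes + bytes([b1, b2])   # b(1 TO 2) AS BYTE: 2 more bytes

def ansi_view(raw: bytes) -> str:
    """Mimic an ANSI (byte) API that stops reading at the first $NUL.
    latin-1 stands in for the ANSI code page here."""
    return raw.split(b"\x00", 1)[0].decode("latin-1")

raw = my_type_bytes("AB", 67, 0)
print(raw.hex(" "))             # 41 00 42 00 43 00
print(ansi_view(raw))           # A
print(raw.decode("utf-16-le"))  # ABC
```

Read as bytes, the string ends after "A"; read as UTF-16LE, the same 6 bytes spell "ABC".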



      • #4
        Thanks, makes sense.



        • #5
          Horst,
          Thanks for the BITS$ solution
          Code:
          TYPE MyType
           ws AS WSTRING * 2
           b(1 TO 2) AS BYTE
          END TYPE
          
          FUNCTION PBMAIN AS LONG
           LOCAL typ AS MyType
           LOCAL wtemp AS WSTRING
           typ.ws = "AB"
           typ.b(1)= 67
           typ.b(2) = 0
           ? BITS$(WSTRING, typ) 'ABC
          END FUNCTION
          This function copies the exact contents of a string expression to a string variable without making any ANSI/UNICODE conversions. It assumes that the data already matches the format specified by the director word STRING or WSTRING. This functionality will not often be needed, so a certain amount of caution should be used.
          For example, in older versions of PowerBASIC, there were no WIDE string variables available. It was therefore necessary to store Unicode data in an ANSI byte string. In updating these programs, you may find you need to transfer this WIDE data to a WIDE variable, but without the automatic internal conversion normally provided by the compiler. BITS$ provides just that functionality. Of course, it can copy bytes from WIDE to ANSI as well.
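The difference between relabeling bytes (BITS$) and converting them (ACODE$) can be sketched in Python with Horst's data from post #2. This is an illustration under stated assumptions, not PowerBASIC's implementation: WSTRING is taken to be UTF-16LE, the ANSI code page is assumed to be Windows-1252, and bits_wstring and acode are hypothetical stand-ins.

```python
# Hypothetical model of BITS$(WSTRING, typ) versus ACODE$(typ), using the data from
# post #2: typ.ws = "A" & CHR$$(911), typ.b(1) = 67, typ.b(2) = 0.
# Assumptions: WSTRING is UTF-16LE; the ANSI code page is Windows-1252 ("cp1252").

raw = "A\u038f".encode("utf-16-le") + bytes([67, 0])  # the UDT's 6 bytes

def bits_wstring(raw: bytes) -> str:
    """Like BITS$(WSTRING, ...): relabel the bytes as wide text, no conversion."""
    return raw.decode("utf-16-le")

def acode(wide: str, codepage: str = "cp1252") -> bytes:
    """Like ACODE$: convert each wide character to ANSI, '?' when it has no mapping."""
    return wide.encode(codepage, errors="replace")

wide = bits_wstring(raw)
print([hex(ord(c)) for c in wide])  # ['0x41', '0x38f', '0x43'], i.e. A, U+038F, C
print(acode(wide))                  # b'A?C'
```

CHR$$(911) is U+038F, which has no Windows-1252 equivalent, so the converting path loses it while the relabeling path keeps it.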

          Last edited by Mike Doty; 20 Sep 2022, 02:10 PM.



          • #6
            'MSGBOX itself can handle wide strings; but if somewhere along the way 8 bit
            'string is assumed, a $NUL will stop it.
            '
            Code:
            #compile exe
            #dim all
            %unicode=1
            type MyType
             ws as wstring * 2
             b(1 to 2) as byte
            end type
            union MyUnion
              AType as MyType
              wStr as wstring * 3
            end union
            function pbmain as long
            
             local typ as MyUnion
             typ.atype.ws = "AB"
             typ.atype.b(1)= 94
             typ.atype.b(2) = 1
             ? typ.wstr           'ABŞ (S-cedilla, U+015E). {"C" didn't demo WSTRING in MSGBOX}
            
            end function '
            Cheers,
            Dale
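Dale's union can be checked with a little arithmetic: with the low byte 94 first and the high byte 1 second, the third wide character of wStr is 94 + 1 * 256 = 350 = &H15E. A Python sketch of the same overlay, assuming UTF-16LE as on x86:

```python
import unicodedata

# Minimal sketch of Dale's UNION: the two BYTE members are read back as the
# third UTF-16LE code unit of wStr AS WSTRING * 3 (little-endian x86 assumed).
raw = "AB".encode("utf-16-le") + bytes([94, 1])  # ws = "AB", b(1) = 94, b(2) = 1
w_str = raw.decode("utf-16-le")                   # the same 6 bytes viewed through wStr

third = w_str[2]
print(hex(ord(third)))          # 0x15e  (94 + 1 * 256 = 350)
print(unicodedata.name(third))  # LATIN CAPITAL LETTER S WITH CEDILLA
```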



            • #7
              Originally posted by Mike Doty
              Why does this display A instead of ABC so ACODE$ must be used?
              Also curious if using another code page would make a difference? Never used this before.
              Code:
              TYPE MyType
              ws AS WSTRING * 2
              b(1 TO 2) AS BYTE
              END TYPE
              ...
              LOCAL typ AS MyType
              typ.ws = "AB"
              typ.b(1)= 67 'tried both ways 67,0 0,67
              typ.b(2) = 0
              ? typ 'A not ABC
              ? ACODE$(typ) 'ABC
              Because the UDT "typ" is not a WSTRING. Unless you specify a member (in which case its type is known), the entire UDT is just a byte string, which PB displays as ANSI characters in the MSGBOX, and it stops displaying at the first null.

              ACODE$ converts each pair of bytes into a "best effort" ANSI character.

              A different code page will have no effect in this case, since you don't have any pair of bytes representing anything above CHR$(127), so they are the same ASCII/ANSI characters in all cases.

              BOTTOM LINE. A UDT is just a string of bytes. What any particular bytes represent is a function of the definition of each member.

              It's hinted at in Help, where it says: "To allow easy conversion, PowerBASIC allows a User-Defined Type in a string expression. The User-Defined Type is simply copied, byte for byte, into the expression."
              Help also cautions against mixing UDTs treated as byte strings with WSTRINGs in a couple of places.

              Note also that when you "tried both ways", 0,67 gave you AB? (I'll leave it up to you to determine what two-byte Unicode character is represented by 0,67.)
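For readers who want the answer to that exercise, the byte-order question can be sketched in Python, assuming (as on x86) that b(1) is the low byte and b(2) the high byte of a UTF-16LE code unit; wide_char is a hypothetical helper.

```python
import unicodedata

# Sketch of the "tried both ways" question: b(1), b(2) read as one UTF-16LE
# code unit (low byte first, as on x86). The order of the bytes matters.
def wide_char(b1: int, b2: int) -> str:
    """Interpret the byte pair b(1), b(2) as one little-endian UTF-16 character."""
    return bytes([b1, b2]).decode("utf-16-le")

for b1, b2 in [(67, 0), (0, 67)]:
    ch = wide_char(b1, b2)
    print(b1, b2, f"U+{ord(ch):04X}", unicodedata.name(ch))
# 67 0 U+0043 LATIN CAPITAL LETTER C
# 0 67 U+4300 CJK UNIFIED IDEOGRAPH-4300
```

An ANSI code page has no mapping for U+4300, which is presumably why ACODE$ produced "?" for that byte order.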



              • #8
                Originally posted by Horst Donath
                Type seems to have a problem with WString.
                Try:
                Type does NOT have a problem with WSTRING.
                Programmers have a problem when they think of an entire UDT as anything other than a byte string.



                • #9
                  Originally posted by Dale Yarker
                  'MSGBOX itself can handle wide strings; but if somewhere along the way 8 bit
                  'string is assumed, a $NUL will stop it.
                  Not assumed. A WSTRING is not being passed to the message box. It is receiving a bare UDT which IS a byte string.



                  • #10
                    Programmers have a problem when they think of an entire UDT as anything other than a byte string.
                    I resemble that remark



                    • #11
                      Anyway, MSGBOX does WSTRINGs just fine, if it gets a WSTRING.
                      Dale



                      • #12
                        Code:
                        type MyType
                           ws as wstring * 2
                           b(1 to 2) as byte
                        end type
                        Was this supposed to be UNION rather than a TYPE so you could look at that WSTRING as individual characters?
                        Michael Mattias
                        Tal Systems (retired)
                        Port Washington WI USA
                        http://www.talsystems.com



                        • #13
                          Looking back at it, yes.



                          • #14
                            > > Was this supposed to be UNION rather than a TYPE so you could look at that WSTRING as individual characters?

                            > Looking back at it, yes.


                            But b(1 TO 2) AS BYTE is not the same size as WSTRING * 2.
                            b(1) and b(2) would not be the two characters; they would be the two bytes comprising the first "WIDE" (Unicode) character.
                            To look at the ASC values of the individual characters, you would need:
                            UNION typ
                              ws AS WSTRING * 2
                              w(1 TO 2) AS WORD 'or INTEGER
                            END UNION
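The size mismatch can be sketched in Python, assuming UTF-16LE on little-endian x86: overlaying the 4 bytes of "AB" with a 2-element BYTE array reaches only the first character, while a 2-element WORD array gets one value per character. The struct format "<2H" (two little-endian unsigned 16-bit values) stands in for w(1 TO 2) AS WORD.

```python
import struct

# Minimal sketch of the BYTE vs WORD overlay (assumes UTF-16LE on little-endian x86):
# the 4 bytes of ws AS WSTRING * 2 holding "AB".
ws_bytes = "AB".encode("utf-16-le")       # b'A\x00B\x00'

b = list(ws_bytes[:2])                    # b(1 TO 2) AS BYTE: both halves of "A"
w = list(struct.unpack("<2H", ws_bytes))  # w(1 TO 2) AS WORD: one value per character

print(b)  # [65, 0]
print(w)  # [65, 66]
```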



                            • #15
                              w(1 TO 2) AS WORD 'or INTEGER
                              Or ....

                              Code:
                              w(1 TO 2) AS WSTRING * 1
                              .. which seems more "context-friendly" to me.
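The WSTRING * 1 alternative amounts to slicing the same 4 bytes into 2-byte pieces and treating each piece as a one-character wide string. A Python sketch under the same UTF-16LE assumption:

```python
# Sketch of the w(1 TO 2) AS WSTRING * 1 overlay (UTF-16LE assumed):
# split the 4 bytes of "AB" into two 2-byte, one-character wide strings.
ws_bytes = "AB".encode("utf-16-le")
w = [ws_bytes[i:i + 2].decode("utf-16-le") for i in range(0, len(ws_bytes), 2)]

print(w)                    # ['A', 'B']
print([ord(c) for c in w])  # [65, 66], the same values a WORD array would hold
```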
