
VARIANT Assignments don't honour variable or constant types.


  • VARIANT Assignments don't honour variable or constant types.

    When a numeric variable or constant is assigned to a variant, it appears that PB only creates variants of type 5 (Double) or 3 (Long) unless you specify AS Type in the assignment.

    Based on some testing, unless you specifically use AS Type in your assignment:
    • All Floats (variable or constant) are converted to Double (VT = 5).
    • Integral constants (including Quads) less than or equal to the maximum value of a Long are converted to Long (VT = 3).
    • Integral variables other than Quads are converted to Long (VT = 3).
    • All Quad variables and Quad constants larger than the maximum value of a Long are converted to Double (VT = 5).
    If you want to store a QUAD as such in a variant, you need to do something like this:
    '
    Code:
    vq1 = 77777777777777777&&
    ? STR$(VARIANTVT(vq1))  ' returns 5 (Double)
    vq2 = 77777777777777777&& AS QUAD
    ? STR$(VARIANTVT(vq2))  ' returns 20 (Quad)
    '
    Note that using "AS EXT" in the assignment causes a compile error. It is not supported as a variant type.
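The observed rules above can be modelled as a small lookup. This is a Python sketch of the behaviour reported in this post, not PowerBASIC code and not a description of the compiler's internals. `default_vt` and the `kind` labels are hypothetical names, and the handling of non-Quad constants above the Long maximum is an assumption, since the tests above only cover Quads in that range.

```python
LONG_MAX = 2**31 - 1   # largest value a 32-bit signed Long can hold

VT_LONG = 3            # variant type reported for Long
VT_DOUBLE = 5          # variant type reported for Double


def default_vt(kind: str, is_constant: bool, value: int = 0) -> int:
    """Variant type reported in this thread when no AS Type is given.

    kind is a hypothetical label: "float", "integral" (non-Quad) or "quad".
    """
    if kind == "float":
        # every float, variable or constant, ends up as Double
        return VT_DOUBLE
    if is_constant:
        # integral constants (Quads included) that fit in a Long become Long;
        # larger ones are only documented for Quads (assumed here for all)
        return VT_LONG if value <= LONG_MAX else VT_DOUBLE
    # variables: Quads become Double, every other integral type becomes Long
    return VT_DOUBLE if kind == "quad" else VT_LONG


print(default_vt("quad", True, 77777777777777777))  # 5, as in the first test
print(default_vt("quad", True, 100))                # 3
```

Negative values are not covered by the observations above, so the sketch makes no claim about them.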
    Last edited by Stuart McLachlan; 22 Jun 2022, 07:14 PM.

  • #2
    Hello Stuart, did you find a workaround for this?

    I'm trying to store a Unicode string to a variant, but VARIANTVT returns 8 instead of 31.

    Code:
    LOCAL V AS VARIANT
    
    V = "WIDE STRING" AS WSTRING
    
    MSGBOX FORMAT$(VARIANTVT(V))
    The table from the help file states it should be stored as %VT_LPWSTR (Unicode String), but it returns %VT_BSTR (dynamic string).
    www.patreon.com/pluribasic



    • #3
      > Im trying to store an unicode string to a variant,

      A pet peeve of mine. "Unicode string" tells us very little. A string can store Unicode code points in several different ways, commonly as UTF-8 or UTF-16, which may or may not be null-terminated.

      Help is incomplete/misleading. It should say "null-terminated wide string". (Bob never did get the difference between "unicode" and specific Unicode encodings and regularly misused the word "unicode".)
      Similarly, %VT_LPSTR (30) is actually a STRINGZ, not just an "ANSI string".
      If you want a VT_LPWSTR, you are referring to a null-terminated UTF-16 string / WSTRINGZ.

      https://learn.microsoft.com/en-us/wi...wtypes-varenum
      VT_LPWSTR
      Value: 31
      A wide null-terminated string.


      Unfortunately, this appears to be a missing feature in PB, possibly because of the incomplete understanding mentioned above.
      '
      Code:
      FUNCTION PBMAIN() AS LONG
          LOCAL v AS VARIANT
          LOCAL ws AS WSTRING
          LOCAL wsz AS WSTRINGZ * 64
          wsz = "This is a null terminated wide string"
          'v = ws AS WSTRING ' this compiles and returns 8
          v = wsz AS WSTRINGZ ' but this doesn't compile - gives Data type mismatch (with or without * 64)
          MSGBOX STR$(VARIANTVT(v))
      END FUNCTION
      '


      If indeed you are trying to store a WSTRING, not a WSTRINGZ, then VARIANTVT 8 (%VT_BSTR) is correct:
      https://learn.microsoft.com/en-us/pr...ectedfrom=MSDN

      A BSTR is a composite data type that consists of a length prefix, a data string, and a terminator. The following table describes these components.
      • Length prefix: A four-byte integer that contains the number of bytes in the following data string. It appears immediately before the first character of the data string. This value does not include the terminator.
      • Data string: A string of Unicode characters. May contain multiple embedded null characters.
      • Terminator: A NULL (0x0000) WCHAR.
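The three components quoted above can be laid out by hand to see what the bytes look like. This is a minimal Python sketch of the memory layout only; `bstr_layout` is a hypothetical helper, and it does not allocate a real BSTR (that is done by the Windows API).

```python
import struct


def bstr_layout(text: str) -> bytes:
    """Build the byte layout of a BSTR: length prefix + UTF-16 data + terminator."""
    data = text.encode("utf-16-le")        # data string, 2 bytes per UTF-16 code unit
    prefix = struct.pack("<I", len(data))  # four-byte length in BYTES, terminator excluded
    terminator = b"\x00\x00"               # one NULL WCHAR
    return prefix + data + terminator


raw = bstr_layout("Hi")
print(raw.hex(" "))   # 04 00 00 00 48 00 69 00 00 00

# An embedded null is ordinary data: the length prefix, not the terminator,
# decides where the string ends.
print(bstr_layout("a\x00b").hex(" "))
```

Because the length is stored explicitly, a BSTR can carry embedded nulls, which a null-terminated string such as VT_LPWSTR cannot.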



      • #4
        Just to clarify. It helps to read Help on VARIANT$ / VARIANT$$ to get a grasp on this.



        • #5
          Ahh, I see. I think it's very clear what the compiler is doing. It would also help if VARIANTVT returned a more descriptive content identifier, though. I think I got it, thanks!
          www.patreon.com/pluribasic



          • #6
            Just to show that VT_BSTR is always a WSTRING and that PB converts between STRING and WSTRING automagically when going to and from variants, check out what you get with VARIANT$(BYTE,....)

            Note that TXT.PRINT does not truncate at the first embedded CHR$(0) in a string, it displays CHR$(0) as a narrow space - (Unicode &H2009?)
            '
            Code:
            #COMPILE EXE
            #DIM ALL
            %UNICODE = 1
            #INCLUDE ONCE "WIN32API.INC"
            FUNCTION PBMAIN() AS LONG
                LOCAL lDebug AS LONG: TXT.WINDOW EXE.FULL$, 200,50,40,85 TO lDebug
            
                LOCAL vs,vws AS VARIANT
                LOCAL s AS STRING
                LOCAL ws AS WSTRING
                s = "String"
                ws = "Wide string"
                vs= s
                vws = ws
                TXT.PRINT VARIANT$(vs)
                TXT.PRINT VARIANT$(vws)
                TXT.PRINT VARIANT$$(vs)
                TXT.PRINT VARIANT$$(vws)
                TXT.PRINT VARIANT$(BYTE,vs)
                TXT.PRINT VARIANT$(BYTE,vws)
                'Finalise
                TXT.COLOR = %RGB_BLUE
                TXT.PRINT
                TXT.PRINT "  ....Press any key to exit": TXT.WAITKEY$: TXT.END
            END FUNCTION
            '


            [Attached image: variant.jpg]



            • #7
              . . . that PB converts between STRING and WSTRING automagically . . .
              From WSTRING to STRING characters 0 to 255, yes. Above 255, many get converted to similar characters 255 and below; some become "?".

              From STRING to WSTRING it stays an ANSI character with a zero byte tacked on, because there is no way of knowing what it was before.

              The quoted statement is correct only if you stick to English and many Latin-based languages.
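Both directions can be tried out in Python. This is only a sketch: it assumes the ANSI code page is Windows-1252, and it uses Python's `errors="replace"` as a stand-in for the Windows conversion, so the "similar character" substitutions mentioned above are not reproduced, only the "?" fallback.

```python
# Wide to narrow: characters without a cp1252 equivalent are lost.
wide = "Zoë Ω"                                   # Ω (U+03A9) is not in cp1252
narrow = wide.encode("cp1252", errors="replace")
print(narrow)                                    # b'Zo\xeb ?'

# Narrow to wide: each of these bytes gains a zero high byte in UTF-16.
ansi = b"Zo\xeb"
utf16 = ansi.decode("cp1252").encode("utf-16-le")
print(utf16.hex(" "))                            # 5a 00 6f 00 eb 00

# Under cp1252, bytes 0x80-0x9F map to code points above 255, so for those
# the high byte is not zero (0x80 is the euro sign, U+20AC).
euro = b"\x80".decode("cp1252").encode("utf-16-le")
print(euro.hex(" "))                             # ac 20
```

Whether PowerBASIC behaves the same way for the 0x80-0x9F range depends on how it performs the conversion, which is not established in this thread.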
              Dale



              • #8
                When converting some of the functionality of OPEN to Python, I found that using UTF-8 causes trouble, while using Latin fixes all issues. I suspect Bob used Latin (instead of UTF-8) internally for some of the automatic conversions... have you experienced this?
                www.patreon.com/pluribasic



                • #9
                  Conversion from UTF-8 to UTF-16 (wide), or to ANSI for that matter, is not automatic. And ANSI characters 128 to 255 take 2 or more bytes in UTF-8. Use UTF8TOCHR$() to convert to WSTRING or to STRING.

                  OPEN, whether expecting WSTRING or STRING, is confused by UTF-8. Multi-byte UTF-8 characters use the upper bits of the lead byte to identify the number of bytes. Only if bit 7 is 0 is it a one-byte character (aka ASCII).
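The role of the upper bits can be seen by printing the encoded bytes in binary. The following Python sketch is illustrative only; `utf8_sequence_length` is a hypothetical helper name.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes a UTF-8 character occupies, judged from its lead byte."""
    if lead & 0b1000_0000 == 0b0000_0000:   # 0xxxxxxx: one byte (ASCII)
        return 1
    if lead & 0b1110_0000 == 0b1100_0000:   # 110xxxxx: two bytes
        return 2
    if lead & 0b1111_0000 == 0b1110_0000:   # 1110xxxx: three bytes
        return 3
    if lead & 0b1111_1000 == 0b1111_0000:   # 11110xxx: four bytes
        return 4
    raise ValueError("continuation byte (10xxxxxx) or invalid lead byte")


for ch in "Aé€":
    encoded = ch.encode("utf-8")
    bits = " ".join(f"{b:08b}" for b in encoded)
    print(ch, bits, utf8_sequence_length(encoded[0]))
```

"A" is one byte, "é" (ANSI 233 in code page 1252) is two, and "€" (ANSI 128 in code page 1252) is three.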

                  See -
                  UTF-8 - Wikipedia

                  Cheers,
                  Last edited by Dale Yarker; 22 Dec 2022, 03:49 AM.
                  Dale



                  • #10
                    While out on my doctor ordered walk thought of . . .

                    Conversely, if you feed something expecting UTF-8 a byte with bit 7 set, it is expecting at least bit 6 also set and at least 1 more byte to finish the current character, not a byte of the next character. In short, it gets confused this way too. Use ChrToUtf8$.
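The same failure is easy to provoke in Python. This sketch assumes code page 1252 for the ANSI side; `is_valid_utf8` is a hypothetical helper.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Report whether data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


ansi_bytes = "café".encode("cp1252")   # é becomes the single byte 0xE9
utf8_bytes = "café".encode("utf-8")    # é becomes the two bytes 0xC3 0xA9

print(ansi_bytes, is_valid_utf8(ansi_bytes))   # b'caf\xe9' False
print(utf8_bytes, is_valid_utf8(utf8_bytes))   # b'caf\xc3\xa9' True
```

The lone 0xE9 announces a multi-byte character that never arrives, so a strict UTF-8 decoder rejects the ANSI bytes.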

                    I've no evidence Bob did anything with what you call "Latin".

                    (called Latin 1 Supplement at Wikipedia, also has many symbols.)

                    Cheers,
                    Last edited by Dale Yarker; 22 Dec 2022, 05:32 AM.
                    Dale



                    • #11
                      Originally posted by Brian Alvarez
                      When converting some of the functionality of OPEN to Python, I found that using UTF-8 causes trouble, while using Latin fixes all issues. I suspect Bob used Latin (instead of UTF-8) internally for some of the automatic conversions... have you experienced this?
                      I doubt that it fixes "all issues". Using Latin (presumably ISO-8859-1 (Latin1), Windows Code Page 1252), you are likely to run into problems if you use characters in the 128-255 range if what you are working with expects UTF-8.

                      PB doesn't work with UTF-8 at all internally.

                      Bob used ANSI code pages for STRINGs and UTF-16 for WSTRINGs, and I'm fairly certain that he used the standard Win32 MultiByteToWideChar function for STRING to WSTRING and WideCharToMultiByte for WSTRING to STRING, both with the first parameter set to CP_ACP (i.e. use the default Windows code page for the system that the application is running on).

                      You should not try to work with UTF-8 strings in your application. If you want to use UTF-8 between PB and other systems, you need to use ChrToUtf8$ as the last step in the sending process and Utf8ToChr$ as the first step in the receiving process.

                      As for OPEN - if you are talking about filenames, Windows doesn't use UTF-8 for them, it uses UTF-16.
                      And read/write expects either an ANSI string or a WSTRING unless you are OPENing for BINARY.
                      If you OPEN a UTF-8 text file and try to treat the contents as either ANSI (STRING) or UTF-16 (WSTRING) unconverted, you are likely to run into all sorts of issues.
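What "all sorts of issues" looks like in practice can be shown with a few bytes. In this Python sketch, decoding as cp1252 stands in for treating UTF-8 file contents as an ANSI STRING; the code page is an assumption.

```python
text = "naïve café"
wire = text.encode("utf-8")      # what a UTF-8 text file would contain

wrong = wire.decode("cp1252")    # UTF-8 bytes misread as ANSI: every accented letter doubles up
right = wire.decode("utf-8")     # converted at the boundary, as recommended above

print(wrong)   # naÃ¯ve cafÃ©
print(right)   # naïve café
```

Nothing fails outright in the wrong case, which is what makes it easy to miss: the text is simply corrupted.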



                      • #12
                        Originally posted by Stuart McLachlan
                        I doubt that it fixes "all issues"
                        So far Python hasn't complained, but I will keep an eye open. Before writing the string to a binary file, I have a routine to convert it to all "bytes"; I assume that's why it has worked so far. Perhaps in some test I haven't done yet, I may run into an issue.

                        Interestingly, I was having a lot of conversion issues if I used ISO-8859-1, which AFAIK should have worked. I had to explicitly state 'latin-1' format... I am not sure why.
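For what it is worth, in Python itself 'latin-1' and 'ISO-8859-1' are aliases for the same codec, so the difference seen here probably came from somewhere else in the code. The sketch below checks that, and also shows why Latin-1 never complains: it maps every byte value to a character, which is not the same as the data being interpreted correctly. What actually went wrong in the program described above cannot be determined from this thread.

```python
import codecs

# Both spellings resolve to the same codec.
latin = codecs.lookup("latin-1")
iso = codecs.lookup("ISO-8859-1")
print(latin.name == iso.name)    # True

# Latin-1 maps all 256 byte values one-to-one, so decoding never raises
# and arbitrary binary data round-trips unchanged.
all_bytes = bytes(range(256))
round_trip = all_bytes.decode("latin-1").encode("latin-1")
print(round_trip == all_bytes)   # True

# Python's cp1252 codec, by contrast, leaves a few byte values undefined.
try:
    b"\x81".decode("cp1252")
    cp1252_accepts_0x81 = True
except UnicodeDecodeError:
    cp1252_accepts_0x81 = False
print(cp1252_accepts_0x81)       # False
```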
                        www.patreon.com/pluribasic
