Behaviour of Asciiz Parameters

  • Behaviour of Asciiz Parameters

    This is a little trick that I often use with Subs and Functions that must work
    in different modes, to make the code more readable.

    I pass a parameter ByVal as Asciiz*2 and call the function with a string literal.
    The Asciiz parameter keeps only the first character, which is then processed.

    I now find that PBCC4 and PBCC5 handle this differently.
    Although the workaround is simple, the issue interests me,
    and I would like to understand it. Therefore two questions:

    1. Is this technique legal ?

    2. Comparing PBCC4 and PBCC5 with the code below, I get the following results:


    Parameter value : "Load"

    Result PBCC4 : ByRef "Load", ByVal "L"

    Result PBCC5 : ByRef "L", ByVal "Lo"

    Does anyone understand the logic of this?
    Especially the last case where two characters are passed puzzles me a bit.

    Arie Verheul

    Code:
    #DIM ALL

    '----------------------------------
    SUB Test1 (S AS ASCIIZ*2)

        PRINT "ByRef",S

    END SUB
    '----------------------------------
    SUB Test2 (BYVAL S AS ASCIIZ*2)

        PRINT "ByVal",S

    END SUB
    '----------------------------------
    FUNCTION PBMAIN () AS LONG

        Test1 "Load"
        Test2 "Load"

        WAITKEY$

    END FUNCTION
    '----------------------------------

  • #2
    Regardless of compiler, the correct answer is always "L", as ASCIIZ variables require one of the length characters for the terminating null.

    Do with that info what you will.
    Michael Mattias
    Tal Systems (retired)
    Port Washington WI USA
    [email protected]
    http://www.talsystems.com


    • #3
      > Is this technique legal ?

      Everything you have done is "legal" in that all statements coded are valid.

      "Good Programming Practice" is another issue, albeit highly subjective.

      >Subs and Functions that must work in different modes,

      Different "modes?" What does that mean?

      The functions shown do what they do; what are you trying to do?

      MCM
      Michael Mattias


      • #4
        Who is to blame, me or the compiler ?

        Well, there is no difference of opinion that the correct result should be "L". I can even
        understand that when a parameter is passed by reference, all the supplied data may end up
        being passed, which with PBCC4 results in "Load", even if this is not what was specified.
        But it really puzzles me how PBCC5 arrives at "Lo" when the data is passed by value. I tend
        to say that this is simply wrong, and it may cause nasty problems if one is not aware of it.

        The subs in the example just demonstrate the issue. In reality I use this when combining very
        similar, but slightly different, tasks into one sub to save on code size and overhead.
        For my own convenience, I like to call them with some meaningful parameter, rather than 0 or 1.
        I am aware that there are other ways to do this, but as this method always worked well
        there was not much reason to change it. However, when I recently changed from PBCC 4 to 5,
        the issue came to light because it caused problems with existing code.

        What interests me most is whether I am forcing the compiler into unpredictable behaviour with
        my possibly unusual programming practice, or conversely whether my method should be considered
        acceptable and the observed behaviour of the compiler should be considered a flaw
        in the compiler.

        Arie Verheul


        • #5
          >But it really puzzles me how PBCC5 comes to "Lo"....

          A long time ago we had a word for this: "bug"

          Maybe it's just a poor demo on your part, but writing a separate procedure just to get the first character of a string literal sure seems like the long way.....I would think the first character of a string literal is itself... a string literal.
          Michael Mattias


          • #6
            Because the code below produces "L" in both cases with both compilers, and because passing a string literal to a SUB parameter defined as ASCIIZ*2 gives unpredictable results in both compilers, I think it may not be "legal" even though the compilers don't flag it. It may be like trying to send the sub a dynamic string, which the compiler flags as a parameter mismatch. PB support could likely answer that definitively if you sent them the question.
            Code:
            #DIM ALL
            '----------------------------------
            SUB Test1 (S AS ASCIIZ * 2)
            
                PRINT "ByRef",S
            
            END SUB
            '----------------------------------
            SUB Test2 (BYVAL S AS ASCIIZ * 2)
            
                PRINT "ByVal",S
            
            END SUB
            '----------------------------------
            FUNCTION PBMAIN () AS LONG
                LOCAL str AS ASCIIZ * 2
                str = "Load"
                Test1 str
                Test2 str
            
                WAITKEY$
            
            END FUNCTION
            '---------------------


            • #7
              > I think it may not be "legal" even though the compilers don't flag it

              If the compiler does not flag it at compile time, by definition it's legal - or a bug.

              Since you got "L" in all cases using a variable, it's obvious that assignment to an ASCIIZ variable is working correctly.

              In this case there's another "question" involved: how can the assignment of a four-character literal to a two-character ASCIIZ (one usable character, since one is reserved for the null) be "legal?"

              Unless.....

              Perhaps string overflow is supposed to work just like numeric overflow, where no ERR is raised but 'results are unpredictable?'

              ???

              --------------------------------
              Befuddled in America's Dairyland
              Michael Mattias


              • #8
                According to CC5's help file (and PBWin9 as well), your trick is not just "legal", Arie, it has been predicted and could be expected to behave differently:
                Code:
                ASCIIZ strings
                You can think of ASCIIZ strings (or its synonym ASCIZ) as fixed-length strings
                where the last character is always a nul (CHR$(0) or $NUL) terminator.  Like
                fixed-length strings, ASCIIZ strings contain character data of fixed length
                and any attempt to assign a string longer than the defined length will result
                in truncation.
                
                If you assign a string (a string literal, or a dynamic or fixed-length string
                variable) to an ASCIIZ string that is shorter than the defined length, the
                string will not be padded on the right.  Instead, the nul-terminator goes right
                after the last string character.  The contents of the remainder of the string
                buffer are undetermined.  Because an ASCIIZ string requires the nul-
                terminator byte, ASCIIZ strings are usually defined with a length of at least
                two bytes.
                Out of curiosity, I ran your code snippet in PBWin9 (had to remove WAITKEY$ and switch PRINT with MSGBOX of course), and PBWin9 behaves as CC5 does. The Byref-variant returns only the first character. (Which, to me when reading the doc, is correct, and even CC4 was in error). However, I am puzzled by the two-letter response when using BYVAL. It doesn't seem to fit any equation ....

                I'd say that even CC4 is in error when displaying the full 4-letters.

                The effect has to do with the way you define your variables: in fact, you never define any. Your ASCIIZ variables have no storage. True, they are declared in your SUB definitions, but that only tells the SUBs to expect an ASCIIZ variable when called; it doesn't reserve storage for the string itself. (Well, actually, something is reserved on the stack during the call. BYREF reserves a DWORD to be used as a pointer to the ASCIIZ string. BYVAL will in this case, if I read the docs correctly, also reserve space for a DWORD, but this would point to an expected COPY of the original ASCIIZ string, which may explain why CC4 behaves as it does.)
                You supply the SUBs with a 4-letter string constant. For PB to handle that, it has to create a temporary two-character ASCIIZ buffer at run-time, truncate the string constant correctly into it, and THEN pass the pointer to that temporary buffer on to the SUBs.
                It could (should?) be argued that PB should be able to handle these cases, but it doesn't always do that.
                Like me, you'll probably find that even if it compiles just fine, there are a number of situations where constants cannot be used in place of a variable holding the same constant's value. And you, Arie, have shown why. Types must match.

                If you change your PBMAIN() as shown below, everything works fine in CC5, CC4 (and PBWin9 for that matter) - all return the expected one letter "L":
                Code:
                FUNCTION PBMAIN () AS LONG

                    LOCAL z AS ASCIIZ * 2
                    z = "Load"
                    Test1 z
                    Test2 z

                    WAITKEY$

                END FUNCTION
                Certainly, this is a bug. PB docs do mention things like "...PB has to create a temporary copy during call..." often enough to make one expect this to always work. But it doesn't. However, with the newer compiler versions, this mechanism has apparently improved. I just wish that PB would publish a "known issues" or a bug-list, but they don't. That could make situations like this much easier to handle. I've asked for such a list, and they refused.

                However, I would recommend that you at least implement the method suggested in the PBMAIN() above. Relying on PB to correctly create a two-character ASCIIZ storage for you when you give it a 4-character constant is what I would call relying on "behind the curtains behaviour": functionality that is not immediately clear. Your expectation will probably be clear, as would mine, but the two of us could still disagree.
                If such "curtain" functionality is overused, things will change over time (as in your case here), code will more often fall over unexpectedly, and the cause will probably be hard to detect too. Everything contains bugs: your code, my code, PB, Windows, the BIOS, other applications that your code cooperates with or merely coexists with, etc. If you rely on clear, easy to spot functionality as much as you can, you strongly improve your ability to track down your own bugs, and maybe to isolate others' bugs as well.

                ---

                To make this comment even longer - I would suggest an entirely different SUB/function mode-selection mechanism that is still easy to read. Use numeric equates. If you rely on LONGs for this, your code will gain some speed too. E.g.
                Code:
                ' Code for CC5 but should work for any CC version
                #COMPILE EXE
                #DIM ALL
                
                %ThisBehaviour = 1
                %ThatBehaviour = 2
                %NewBehaviour  = 3
                '----------------------------------
                SUB Test (lMode AS LONG)

                    IF lMode = %ThisBehaviour THEN
                        PRINT lMode
                    ELSEIF lMode = %ThatBehaviour THEN
                        PRINT lMode
                    ELSE
                        ' Behaviour is still to be determined
                        PRINT lMode
                    END IF

                END SUB
                '----------------------------------
                FUNCTION PBMAIN () AS LONG

                    LOCAL lMode AS LONG

                    lMode = %ThisBehaviour
                    Test lMode
                    lMode = %ThatBehaviour
                    Test lMode
                    lMode = %NewBehaviour
                    Test lMode

                    WAITKEY$

                END FUNCTION
                '----------------------------------
                BTW: In this case, calling SUB Test using " Test %ThisBehaviour" does work. This is also a simpler situation, as there are 32-bit integers all the way. A pointer is still reserved on the stack, but it points to a temporary integer and no truncation is necessary. It is handled nicely.

                This is as readable as the names you give the equates. Best of all, as long as the equates aren't explicitly used in a variable or as a parameter, they don't grow your compiled code by a single byte.


                ViH

                ----------------
                "Old" sayings:
                - "If debugging is the process of removing bugs, then programming must be the process of putting them in."
                - "Every program has at least one bug and can be shortened by at least one instruction -- from which, by induction, one can deduce that every program can be reduced to one instruction which doesn't work."


                • #9
                  Thanks all

                  Thanks all for your comments, as this has cleared up a lot.
                  Once the issue is known it can easily be avoided, and I will certainly change this.
                  But as the previous method always worked fine until recently, there was not much reason to even think about it.
                  The issue nicely demonstrates what may happen at the edges of the specified behaviour of the compiler.

                  Arie Verheul


                  • #10
                    Buffer OverFlow Error?

                    I've been thinking about this since I originally read your message and wrote a reply several times but didn't send it. I have finally given up on my thinking and pose my thoughts as a question.

                    I am wondering if setting parameter lengths shorter than what is passed could lead to a buffer overflow security bug? We all read about them, where some hacker causes a buffer overrun in some product and is able to execute some inserted code and take over a machine. http://en.wikipedia.org/wiki/Buffer_overflow

                    Is the better approach ASCIIZ with no length specified? In the function make sure the length is what is expected or truncate to expected length.

                    Thoughts anyone?
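
                    The idea in the question above can be sketched roughly as follows. This is only an illustration, not code from the thread: the names DoTask, zMode and sKey are hypothetical, and it assumes the compiler accepts a string literal for an ASCIIZ parameter declared without a length, which is how Windows API declarations are commonly written. The procedure takes the first character explicitly with LEFT$ instead of relying on the compiler to truncate.

                    ```powerbasic
                    #COMPILE EXE
                    #DIM ALL

                    '----------------------------------
                    ' Hypothetical example: the mode is passed as a readable word and
                    ' only its first character is used. No fixed length is declared,
                    ' so the truncation is done explicitly inside the procedure.
                    SUB DoTask (zMode AS ASCIIZ)

                        LOCAL sKey AS STRING

                        ' Take exactly one character, whatever length was passed
                        sKey = UCASE$(LEFT$(zMode, 1))

                        SELECT CASE sKey
                            CASE "L"
                                PRINT "Load requested"
                            CASE "S"
                                PRINT "Save requested"
                            CASE ELSE
                                PRINT "Unknown mode: "; zMode
                        END SELECT

                    END SUB
                    '----------------------------------
                    FUNCTION PBMAIN () AS LONG

                        DoTask "Load"
                        DoTask "Save"

                        WAITKEY$

                    END FUNCTION
                    '----------------------------------
                    ```

                    In this form the procedure only reads the caller's string and never writes into it, so the mismatch between declared length and supplied length discussed earlier in the thread should not come into play.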



                    • #11
                      FWIW, this clip from the cite above:
                      "The contents of the remainder of the string buffer are undetermined."
                      ...is VERY important.

                      E.g.
                      Code:
                      FUNCTION Foo () AS LONG
                        LOCAL  sz  AS ASCIIZ * 8

                        ' at entry, buffer contents are initialized to "<nul><nul><nul><nul><nul><nul><nul><nul>"
                        sz = "12345"
                        ' buffer contents are now   "12345<nul><nul><nul>"
                        sz = "A"
                        ' buffer contents are now   "A<nul>345<nul><nul><nul>"
                      END FUNCTION
                      If you need to be sure your ASCIIZ variable contains your string followed by nul for the balance of the buffer...
                      Code:
                         RESET  sz
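
                      To check this on a particular compiler version, a sketch along the following lines could dump the raw bytes of the buffer before and after RESET. It is an illustration only: the helper name DumpBytes is hypothetical, and because the documentation calls the bytes after the terminator "undetermined", no particular output is promised.

                      ```powerbasic
                      #COMPILE EXE
                      #DIM ALL

                      '----------------------------------
                      ' Hypothetical helper: print the byte values of a raw buffer copy
                      SUB DumpBytes (BYVAL sRaw AS STRING)

                          LOCAL i AS LONG

                          FOR i = 1 TO LEN(sRaw)
                              PRINT ASC(sRaw, i);
                          NEXT i
                          PRINT

                      END SUB
                      '----------------------------------
                      FUNCTION PBMAIN () AS LONG

                          LOCAL sz AS ASCIIZ * 8

                          sz = "12345"
                          sz = "A"

                          ' Copy all 8 bytes, including whatever follows the first nul
                          DumpBytes PEEK$(VARPTR(sz), SIZEOF(sz))

                          ' Clear the whole buffer, then look again
                          RESET sz
                          DumpBytes PEEK$(VARPTR(sz), SIZEOF(sz))

                          WAITKEY$

                      END FUNCTION
                      '----------------------------------
                      ```

                      Comparing the two printed lines shows whether stale characters from the earlier assignment are still sitting behind the terminator.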
                      MCM
                      Michael Mattias