REDIM PRESERVE : Stack corruption possible?

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • REDIM PRESERVE : Stack corruption possible?

    I've got a general, hand-waving sort of question here. In the video game we're working on we have a bunch of AI cars running around. These are all stored in a 1D array AICars() of a TYPE that is quite large. In a previous version of the game, any time the number of AICars() changed, I simply REDIM'd the array, zeroing it out, then reset the cars' positions and so forth one at a time. Recently I wanted to REDIM PRESERVE this array instead, so we could add or remove a car now and then without changing the states of the previously created cars.

    For some mysterious reason, though, I've been having serious problems doing this. At the moment REDIM PRESERVE is called, everything is fine and works as advertised: the previous elements in the array are preserved if the array is increased in size, and if it's decreased in size the upper elements are eliminated.
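For readers more at home in C, here is a rough analogy of the resize described above. This is an assumption: PowerBASIC's runtime is proprietary and nothing in this thread says REDIM PRESERVE is implemented this way. Resizing a heap array with realloc keeps the lower elements, drops the upper ones when shrinking, and may move the whole block, so any element address cached before the resize is no longer valid afterwards. The Car type and resize_cars helper are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one element of AICars(). */
typedef struct {
    double position[3];
    double velocity[3];
} Car;

/* Resize the array to new_count elements. The first min(old_count, new_count)
   elements are preserved and any new elements are zero-filled. Returns the
   possibly moved array, or NULL on failure, in which case the old block is
   still valid and still owned by the caller. */
static Car *resize_cars(Car *cars, size_t old_count, size_t new_count)
{
    assert(new_count > 0);
    Car *resized = realloc(cars, new_count * sizeof *resized);
    if (resized == NULL)
        return NULL;
    if (new_count > old_count)
        memset(resized + old_count, 0, (new_count - old_count) * sizeof *resized);
    /* If realloc moved the block, every pointer into the old block now dangles. */
    return resized;
}
```

If anything in the program saved an element's address (for example with VARPTR) before the REDIM PRESERVE, it may be worth checking that the address is re-fetched afterwards; this is a suggestion, not something established in the thread.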

    When I do this, however, later on in the main game loop I start getting the typical nonsense indicative of bad pointers (there are no pointers in the AICar structure) and out-of-bounds array accesses. One concrete example is the position (AICar(i).Position, basically). Immediately after REDIM PRESERVE is called, everything is fine. The first time through the game loop the position gets updated correctly. The second time through the game loop, when it gets to the position update:

    Code:
    AICar(CurrentAICar).WorldPositionVector(%X) = _
    AICar(CurrentAICar).WorldPositionVector(%X) _
    + AICar(CurrentAICar).WorldVelocityVector(%X) * TStep
    AICar(CurrentAICar).WorldPositionVector(%X) suddenly becomes 0, as does AICar(CurrentAICar).WorldVelocityVector(%X). Immediately before this line I write the position to a debug file and it is not 0. So I am seeing 10 + 0 = 0. I've tried getting around this by storing the position in one variable and the additive term in another, adding them together into a third, and finally setting the position equal to that, as well as several other workarounds, all to no avail.

    My question, besides the obvious "what the heck could be happening here to make 10+0=0" is this:

    If I REDIM PRESERVE a big enough array (maybe 1 MB or so) and then enter a very large function, is it possible there could be some stack corruption going on? This crazy mathematical glitch (and others later on in the code; this is just the start of the ensuing avalanche) does not occur unless I PRESERVE the structure as described. It also does NOT happen the first time through the game loop after the REDIM PRESERVE is executed, only the second, third, etc. This happens identically with both PB8 and PB9, so I doubt it's a compiler issue. As my associates say: "the problem is between the chair and the keyboard."

    If it's possible there's a stack corruption issue here at play, what sort of things should I be looking at to figure out what I'm doing wrong?

    My newly beloved #DEBUG DISPLAY ON reports no out-of-bounds errors, bad pointers, etc. No errors at all. This one has me really scratching my head, and we're going to have to make a (very minor) game design change as a result if I can't solve it. Either way, it's a frightening thing to me that this sort of thing can happen even when there's no out-of-bounds access going on. Has anyone else had similar experiences, or have tips on what to do to minimize the risk of this sort of thing?

    Thanks
    Last edited by Todd Wasson; 2 Apr 2009, 08:25 AM.
    Todd Wasson
    http://PerformanceSimulations.Com
    PowerBasic Racing Simulator (October 2007 clip - 15.1MB wmv file) http://www.performancesimulations.co...m-GenIV-12.wmv

  • #2
    For as long as I have used PB products (1991), I have never seen stack corruption caused by anything other than an unhandled ERROR 1 (programmer error).

    Bear in mind, however, that you can corrupt the stack - or any other program data - from just about anywhere in your program. That is, your "real" error might have been made hundreds or thousands of instructions previous to the problem actually exhibiting itself as above.

    If your arrays are LOCAL, the descriptor is stored on the stack... and if you corrupt that descriptor, the error will not show up UNTIL you execute some statement REQUIRING that descriptor.

    For that matter, if the arrays are STATIC or GLOBAL, those descriptors are stored in a common data segment, and since your program owns that block of memory, you can corrupt those descriptors, too.

    So to address your question... yes, you can have stack corruption, but I would bet a lot of money you corrupted it yourself elsewhere and the compiler is sans any responsibility for same.

    MCM
    Michael Mattias
    Tal Systems Inc. (retired)
    Racine WI USA
    http://www.talsystems.com



    • #3
      Originally posted by Todd Wasson:
      Code:
      AICar(CurrentAICar).WorldPositionVector(%X) = _
      AICar(CurrentAICar).WorldPositionVector(%X) _
      + AICar(CurrentAICar).WorldVelocityVector(%X) * TStep
      AICar(CurrentAICar).WorldPositionVector(%X) suddenly becomes 0, as does (CurrentAICar).WorldVelocityVector(%X). Immediately before this line I write the position to a debug file and it is not 0. So I am seeing 10 + 0 = 0.
      This is probably not helpful at all but maybe will cause an alteration in thinking. You are not "seeing 10 + 0 = 0" but, according to the code shown, are seeing "10 + 0 * Tstep = 0".

      Also, have you printed the values of .Position(x), .Velocity and Tstep immediately after the line?
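Gösta's suggestion can be sketched in C as a hypothetical step_axis_logged helper that performs one explicit Euler step on a single axis and records every operand together with the result, so the log shows which term actually changed. Because multiplication binds before addition, a velocity of 0 must leave the position unchanged whatever TStep is.

```c
#include <assert.h>
#include <stdio.h>

/* One explicit Euler step on one axis: position += velocity * dt.
   The old position, both operands and the new position are written to out,
   so an impossible-looking result can be traced to the operand that changed.
   Returns snprintf's result (characters that would have been written). */
static int step_axis_logged(double *position, double velocity, double dt,
                            char *out, size_t out_size)
{
    assert(position != NULL && out != NULL && out_size > 0);
    double before = *position;
    *position = before + velocity * dt;   /* multiplication binds first */
    return snprintf(out, out_size, "before=%g velocity=%g dt=%g after=%g",
                    before, velocity, dt, *position);
}
```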

      Just thoughts from simpler, (far) less experienced eyes.

      ==========================================
      An uneducated man has one great advantage
      over an educated man.
      He does not know
      what is not possible.
      Swede
      ==========================================
      It's a pretty day. I hope you enjoy it.

      Gösta

      JWAM: (Quit Smoking): http://www.SwedesDock.com/smoking
      LDN - A Miracle Drug: http://www.SwedesDock.com/LDN/



      • #4
        Is the array LOCAL? LOCAL arrays of UDTs are created on the stack frame. I believe that REDIM PRESERVE attempts to create a copy of the array before deleting the original, so if the arrays are large, you might be running out of stack space.

        If there is a max # of possible cars, why not just dimension the array one time and not mess with adjusting it every time cars are added or removed? You can just keep a simple counter to know how many are "live".

        Otherwise, you might consider using a GLOBAL array.
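Bern's fixed-size alternative might look like this in C, as a sketch under stated assumptions: MAX_CARS is a hypothetical upper bound, and a removed car's slot is filled by copying the last live car into it. The array is dimensioned once and never reallocated.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CARS 64   /* hypothetical upper bound on simultaneous AI cars */

typedef struct {
    double position[3];
    double velocity[3];
} Car;

typedef struct {
    Car    cars[MAX_CARS];   /* dimensioned once, never resized */
    size_t live;             /* cars[0] .. cars[live - 1] are active */
} CarPool;

/* Put a new car in the next free slot. Returns its index, or -1 if full. */
static int pool_add(CarPool *pool, const Car *initial)
{
    if (pool->live == MAX_CARS)
        return -1;
    pool->cars[pool->live] = *initial;
    return (int)pool->live++;
}

/* Remove car `index`: the last live car is copied into its slot, so every
   other car keeps both its state and its index. */
static void pool_remove(CarPool *pool, size_t index)
{
    assert(index < pool->live);
    pool->live--;
    if (index != pool->live)
        pool->cars[index] = pool->cars[pool->live];
}
```

Design note: the copy changes the last car's index. If other code stores car indices, an alternative is a per-slot active flag, at the cost of skipping inactive slots in the game loop.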
        Bernard Ertl
        InterPlan Systems



        • #5
          >LOCAL arrays of UDTs are created on the stack frame.

          You can (should?) confirm with support, but only the descriptors are created on the stack. Memory for the array data comes from the heap.
          Michael Mattias



          • #6
            Check the Restrictions comment on the DIM statement in the Help/Docs.
            Bernard Ertl



            • #7
              I've double checked and it is indeed a touch over 1.1MB. This is the TYPE size itself.

              Anyway, the array is global, so it seems this is not the problem. I am, however, tripping the FPU invalid operation and precision flags at a time that corresponds to this event, with or without PRESERVE.

              I can live with the precision flag, but an invalid operation is no good. From what I've just read, the invalid operation flag can be set when a stack problem exists. Surprise surprise. If this is the case then it appears quite likely that just as Michael said, I am screwing up the stack in some other place and it's just manifesting itself coincidentally in this situation.

              At least there's something of a trail to follow now. I need to spot the exact location where this flag is getting tripped now. I'll keep at it.

              This is probably not helpful at all but maybe will cause an alteration in thinking. You are not "seeing 10 + 0 = 0" but, according to the code shown, are seeing "10 + 0 * Tstep = 0".
              0 * Tstep is still 0. I know Tstep is not changing (it's the time step, and if it were 0 no cars would move anywhere, yet they do), but I did not print it out.

              Michael, when you talk about the descriptors, are you referring to the names of the elements in the structure? For instance:

              REDIM AICar(x) AS AICarTYPE

              Where:

              TYPE AICarTYPE
              Position(2) AS SINGLE
              Velocity(2) AS SINGLE
              END TYPE

              The data in this TYPE would be (3 + 3) * 4 = 24 bytes, since a SINGLE is 4 bytes. Are the descriptors that go on the stack here effectively handles to the names Position and Velocity, in a sense? Maybe 8 bytes for each or something like that?
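The data size of that example TYPE can be checked with a C struct that mirrors it, assuming C's 4-byte float corresponds to PowerBASIC's SINGLE and that Position(2) declares three elements (indices 0 to 2). A DOUBLE version is included for comparison, since three 8-byte values give the 24 bytes per vector mentioned later in the thread. The struct names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* C mirror of the example AICarTYPE: Position(2) and Velocity(2) each
   hold indices 0..2, i.e. three SINGLEs (4-byte floats). */
typedef struct {
    float position[3];
    float velocity[3];
} AICarSingle;

/* The same layout with DOUBLEs: 3 * 8 = 24 bytes per vector. */
typedef struct {
    double position[3];
    double velocity[3];
} AICarDouble;

static void print_car_sizes(void)
{
    assert(sizeof(float) == 4 && sizeof(double) == 8);
    printf("SINGLE version: %zu bytes\n", sizeof(AICarSingle)); /* (3 + 3) * 4 */
    printf("DOUBLE version: %zu bytes\n", sizeof(AICarDouble)); /* (3 + 3) * 8 */
}
```

Neither figure includes any array descriptor; that bookkeeping is kept separately from the element data.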
              Last edited by Todd Wasson; 2 Apr 2009, 12:01 PM.
              Todd Wasson



              • #8
                > Michael, when you talk about the descriptors.....

                An array descriptor is a PB-proprietary structure maintained by the compiler's runtime library. Basically it's where it maintains all that info you get from LBOUND, UBOUND and ARRAYATTR.
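PowerBASIC's real descriptor layout is proprietary and is not shown in this thread. Purely as an illustration, here is a hypothetical C analogue: a small struct holding the bounds and a data pointer (the part that would sit on the stack for a LOCAL array), with the element storage on the heap, as Michael describes in #5. Every element address is computed from the descriptor, which is why a corrupted descriptor only shows up when some later statement uses it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical array descriptor: NOT PowerBASIC's real (proprietary) layout. */
typedef struct {
    void  *data;       /* element storage, allocated on the heap */
    long   lbound;     /* what LBOUND() would report */
    long   ubound;     /* what UBOUND() would report */
    size_t elem_size;  /* bytes per element */
} ArrayDescriptor;

/* Allocate zeroed elements lbound..ubound and fill in the descriptor.
   Returns 0 on success, -1 if the allocation fails. */
static int dim_array(ArrayDescriptor *d, long lbound, long ubound, size_t elem_size)
{
    assert(ubound >= lbound && elem_size > 0);
    d->data = calloc((size_t)(ubound - lbound + 1), elem_size);
    if (d->data == NULL)
        return -1;
    d->lbound = lbound;
    d->ubound = ubound;
    d->elem_size = elem_size;
    return 0;
}

/* Every element address is derived from the descriptor, so if the
   descriptor is overwritten, later accesses can go to the wrong place. */
static void *element_at(const ArrayDescriptor *d, long i)
{
    assert(i >= d->lbound && i <= d->ubound);
    return (char *)d->data + (size_t)(i - d->lbound) * d->elem_size;
}
```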

                That said, if you have a UDT whose SIZEOF() is 1.1 MB, it will overflow the default stack (1.0 MB) if you try to make it a SCALAR (non-array) variable, except that what usually happens when you try that is the program GPFs on a stack fault as soon as the procedure in which that variable is defined is called.
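The arithmetic behind that warning, as a small C sketch: a scalar local needs its full size from whatever stack reserve is still free when the procedure is entered. The hypothetical frame_fits helper is exercised below with a 1,048,576-byte (1 MB) reserve, which is assumed here as the usual Windows linker default, and with the 2,000,000-byte #STACK figure suggested elsewhere in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Would a procedure whose locals need frame_bytes fit, given the total
   stack reserve and the bytes already used by its callers? */
static bool frame_fits(size_t reserve_bytes, size_t used_bytes, size_t frame_bytes)
{
    assert(used_bytes <= reserve_bytes);
    return frame_bytes <= reserve_bytes - used_bytes;
}
```

For example, frame_fits(1048576, 0, 1100000) is false, while frame_fits(2000000, 0, 1100000) is true. A 1,000,000-byte frame fits an empty 1 MB stack, yet frame_fits(1048576, 100000, 1000000) is false, because callers have already used 100,000 bytes.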

                MCM
                Michael Mattias



                • #9
                  Originally posted by Todd Wasson:
                  I've double checked and it is indeed a touch over 1.1MB. This is the TYPE size itself.

                  Anyway, the array is global, so it seems this is not the problem.
                  When you REDIM PRESERVE inside of a function/sub, are you explicitly declaring the array as GLOBAL every time? If not, PB might be attempting to dimension a local array which would cause the stack problem.
                  Bernard Ertl



                  • #10
                    I've double checked and it is indeed a touch over 1.1MB. This is the TYPE size itself.
                    A single TYPE element is over 1MB? That's big!
                    If you pass that as a parameter to a SUB/FUNCTION then you might overflow the stack as parameters are passed on the stack.
                    The default CPU stack is 1MB so if you suspect that is the problem then try adding #STACK 2000000 at the start of your code to change the default to 2000000 from 1000000 bytes.
                    I am, however, tripping the FPU invalid operation and precision flags at a time that corresponds to this event
                    Do you use inline ASM on the FPU? The FPU stack is only 8 entries deep, and it's easy to lose count and forget to clean it up, so it can overflow, causing unpredictable errors.
                    Also, the compiler itself expects 4 entries of the stack to be unused, which it then uses for REGISTER variables. That leaves you with only 4 entries to use.
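A toy C model shows why a lost count on the FPU stack surfaces as an invalid operation. It is a simplification that tracks only how many of the eight registers are occupied, not their contents or the tag word. On the x87, pushing onto a full register stack or popping an empty one sets both the Stack Fault flag (bit 6) and the Invalid Operation flag (bit 0) of the status word, with condition code C1 (bit 9) set for overflow and cleared for underflow. The FpuModel type and helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FSW_IE 0x0001u  /* bit 0: Invalid Operation */
#define FSW_SF 0x0040u  /* bit 6: Stack Fault */
#define FSW_C1 0x0200u  /* bit 9: condition code C1 (1 = overflow, 0 = underflow) */

/* Toy model of the x87 register stack: tracks only how many of the eight
   registers are occupied, plus a simulated status word. */
typedef struct {
    int      depth;   /* 0..8 registers in use */
    uint16_t status;  /* simulated status word */
} FpuModel;

static void fpu_push(FpuModel *f)
{
    assert(f->depth >= 0 && f->depth <= 8);
    if (f->depth == 8)
        f->status |= FSW_IE | FSW_SF | FSW_C1;  /* stack overflow */
    else
        f->depth++;
}

static void fpu_pop(FpuModel *f)
{
    assert(f->depth >= 0 && f->depth <= 8);
    if (f->depth == 0) {
        f->status |= FSW_IE | FSW_SF;           /* stack underflow */
        f->status &= (uint16_t)~FSW_C1;
    } else {
        f->depth--;
    }
}
```

For example, starting at depth 4 (the four registers Paul says the compiler keeps for REGISTER variables), four pushes fill the stack cleanly, and a fifth push with no matching pop sets Invalid Operation, Stack Fault and C1.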


                    Paul.



                    • #11
                      Originally posted by Bern Ertl:
                      When you REDIM PRESERVE inside of a function/sub, are you explicitly declaring the array as GLOBAL every time? If not, PB might be attempting to dimension a local array which would cause the stack problem.
                      Good idea, thanks. No, I'm not doing that. I define Global AICars() AS AICarsTYPE outside the function, but do not use GLOBAL in the REDIM statement itself. If I still have problems after tracking down whatever it is in my collision detection code that is causing the FPU invalid-operation flag to trip, I'll give that a try.

                      I do want to point out for the record, so to speak, that every element that I've checked in that array immediately after the REDIM PRESERVE statement is indeed correct. So hopefully I can find it with the FPU checking...
                      Todd Wasson



                      • #12
                        Originally posted by Paul Dixon:
                        A single TYPE element is over 1MB? That's big!
                        If you pass that as a parameter to a SUB/FUNCTION then you might overflow the stack as parameters are passed on the stack.
                        The default CPU stack is 1MB so if you suspect that is the problem then try adding #STACK 2000000 at the start of your code to change the default to 2000000 from 1000000 bytes.

                        Do you use inline ASM on the FPU? The FPU stack is only 8 entries deep, and it's easy to lose count and forget to clean it up, so it can overflow, causing unpredictable errors.
                        Also, the compiler itself expects 4 entries of the stack to be unused, which it then uses for REGISTER variables. That leaves you with only 4 entries to use.


                        Paul.
                        The TYPE is indeed huge. The vehicle simulation model has quite a lot of variables for each car, even the computer-controlled ones. However, a fair amount of it is storage space for arrays such as each car's 5000-element engine torque curve. That sort of thing. There are lots and lots of vectors (various velocities, forces and torques of different components, and so on), each of which takes up 24 bytes. They add up pretty quickly.

                        I indeed wanted to try #STACK after reading some of the helpful responses in this thread, but my part of the game is a DLL so I can't use it or the PB debugger. I have no idea what size stack the C++ guys are using for the game executable. If my FPU hunting doesn't fix it, I'll ask them.

                        The FPU code I'm using is ASM taken from a wonderful post here at the PB forums several years ago. It's been an invaluable tool for me:

                        Code:
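                        MACRO FpuCheck
                            ' Assumes the calling procedure declares FpuErr as a LONG (it receives EAX),
                            ' Errors and LineNumber as numeric variables, and TEXT$ as a string.
                            ! mov   eax, 0
                            ! fstsw ax                    ' store the FPU status word (exception flags) in AX
                            ! mov   FpuErr, eax
                        
                        '.......1 - Invalid operation
                        '......1. - Denormalized operand
                        '.....1.. - Zero divide
                        '....1... - Overflow
                        '...1.... - Underflow
                        '..1..... - Precision
                        '.1...... - Stack fault
                        '1....... - Error summary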
                        
                        
                            TEXT$ = "FpuERR bit 0 Invalid operation " + FORMAT$(BIT(FpuErr,0)) + $CRLF
                            TEXT$ = TEXT$ +  "FpuERR bit 1 Denormalized operand  " + FORMAT$(BIT(FpuErr,1)) + $CRLF
                            TEXT$ = TEXT$ +  "FpuERR bit 2 Zero divide  " + FORMAT$(BIT(FpuErr,2)) + $CRLF
                            TEXT$ = TEXT$ +  "FpuERR bit 3 Overflow  " + FORMAT$(BIT(FpuErr,3)) + $CRLF
                            TEXT$ = TEXT$ +  "FpuERR bit 4 Underflow  " + FORMAT$(BIT(FpuErr,4)) + $CRLF
                            TEXT$ = TEXT$ +  "FpuERR bit 5 Precision  " + FORMAT$(BIT(FpuErr,5)) + $CRLF
                            TEXT$ = TEXT$ +  "FpuERR bit 6 Stack fault  "  + FORMAT$(BIT(FpuErr,6)) + $CRLF
                            TEXT$ = TEXT$ +  "FpuERR bit 7 Error summary  " + FORMAT$(BIT(FpuErr,7)) + $CRLF
                        
                            Errors = 0
                            IF BIT(FpuErr,0) = 1 THEN Errors = 1
                            IF BIT(FpuErr,1) = 1 THEN Errors = 1
                            IF BIT(FpuErr,2) = 1 THEN Errors = 1
                            IF BIT(FpuErr,3) = 1 THEN Errors = 1
                            IF BIT(FpuErr,4) = 1 THEN Errors = 1
                            'IF BIT(FpuErr,5) = 1 THEN Errors = 1
                            IF BIT(FpuErr,6) = 1 THEN Errors = 1
                            IF BIT(FpuErr,7) = 1 THEN Errors = 1
                        
                            IF Errors THEN
                              MSGBOX TEXT$ + FUNCNAME$,,"LineNumber = " + FORMAT$(LineNumber,"#")
                            END IF
                            ! fclex
                        END MACRO
                        I just sprinkle it around in a few places and eventually zero in on the exact line that causes the problem. This is what I'm trying now.
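The macro's bit tests can be written compactly in C. Bits 0 to 6 follow the comment table in the macro; bit 7 is the x87 Error Summary flag, which is set whenever any unmasked exception flag is set. Like the macro, this hypothetical fsw_has_errors helper ignores the Precision flag (bit 5), which most inexact results set. It only decodes a status-word value; reading that value from the FPU still needs the FSTSW instruction, as in the macro.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Names of x87 status-word bits 0..7. Bits 8..15 (condition codes C0..C3,
   the TOP field and Busy) are not exception flags and are ignored here. */
static const char *const fsw_names[8] = {
    "Invalid operation", "Denormalized operand", "Zero divide", "Overflow",
    "Underflow", "Precision", "Stack fault", "Error summary"
};

#define FSW_PRECISION 0x20u  /* bit 5: set by most inexact results */

/* Mirror the macro's Errors test: nonzero if any of bits 0..7 other than
   Precision is set. With verbose != 0, print every set flag in bits 0..7. */
static int fsw_has_errors(uint16_t fsw, int verbose)
{
    assert(sizeof fsw_names / sizeof fsw_names[0] == 8);
    if (verbose)
        for (int bit = 0; bit < 8; bit++)
            if (fsw & (1u << bit))
                printf("bit %d %s\n", bit, fsw_names[bit]);
    return (fsw & 0xFFu & ~FSW_PRECISION) != 0;
}
```

For example, fsw_has_errors(0x0041, 0) reports an error (Invalid Operation plus Stack Fault), while fsw_has_errors(0x0020, 0) does not, since only Precision is set.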
                        Last edited by Todd Wasson; 2 Apr 2009, 04:22 PM.
                        Todd Wasson



                        • #13
                          Todd,
                          I think Windows uses 1MB as the default stack size so if the C++ programmers haven't increased this then I'm sure that trying to pass your 1MB+ TYPE will cause stack overflows.

                          The code you're using to check for FPU errors is useful but if you're jumping between different routines programmed by different people using different compilers then there's no guarantee that the FPU flags are being preserved by the other code.
                          This code may help to isolate the problem: it shows how to trap FPU exceptions at the moment they occur, rather than testing afterwards to see whether one occurred.
                          http://www.powerbasic.com/support/pb...ad.php?t=37821

                          Paul.



                          • #14
                            A single TYPE element is over 1MB? That's big!
                            If you pass that as a parameter to a SUB/FUNCTION then you might overflow the stack as parameters are passed on the stack
                            !!!

                            Only if you (try to) pass that UDT parameter BYVAL could that happen. When it is passed BYREF, all that gets PUSHed on the stack is 32 bits (an address).
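As a C analogy (assuming BYREF corresponds roughly to a pointer parameter and BYVAL to passing a struct by value; the names below are hypothetical), passing by pointer costs one address however large the struct is, 4 bytes in 32-bit code like PowerBASIC's, while a by-value parameter copies every byte of the struct, typically onto the stack.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a UDT of about 1.1 MB. */
typedef struct {
    int  test;
    char payload[1100000];
} BigType;

/* BYREF-style parameter: only the struct's address is passed. */
static int first_byte_byref(const BigType *x)
{
    return x->payload[0];
}

/* A BYVAL-style signature such as `int f(BigType x)` would copy all
   sizeof(BigType) bytes on every call, enough to exhaust a 1 MB stack,
   so it is deliberately not defined here. Allocating the struct on the
   heap keeps it off the stack as well. */
static BigType *new_big(void)
{
    return calloc(1, sizeof(BigType));
}
```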

                            Then again, "code not shown"

                            My money is on corruption elsewhere in program.


                            MCM
                            Michael Mattias



                            • #15
                              MCM,
                              it's passed in its entirety on the stack. Try the following program: run it as posted, then edit the TYPE to 1000000 and run it again.

                              Paul.
                              Code:
                              'PBCC5.01 program
                              #COMPILE EXE
                              #BREAK ON
                              #DIM ALL
                              
                              
                              TYPE MyType
                                  test AS LONG
                                  test2 AS STRING*1500000    '<------ change this to 1000000 so it fits on the stack, then try again
                                  test3 AS BYTE
                              END TYPE
                              
                              GLOBAL InitialStackPointer AS LONG
                               
                              FUNCTION PBMAIN() AS LONG
                              
                              LOCAL TestVar AS MyType
                                     
                              MyFunction(TestVar)
                              
                              PRINT "Made it!"
                              WAITKEY$
                              END FUNCTION
                              
                              
                              FUNCTION MyFunction(x AS MyType) AS LONG
                              'do nothing 'cos we're just testing the passing of variables
                                             
                              END FUNCTION



                              • #16
                                PS: the 1,500,000 version ends with no error reported, while the 1,000,000 version says "Made it!" and waits for a key press.



                                • #17
                                  >FUNCTION MyFunction(x AS MyType) AS LONG

                                  This function should NOT be getting the entire type on the stack. It should be getting exactly 32 bits = VARPTR(TestVAR)

                                  However this line...
                                  >LOCAL TestVar AS MyType

                                  Is treating TestVar as a SCALAR, so it is utilizing SIZEOF(TestVar) bytes on the stack.

                                  Change your program to
                                  Code:
                                  LOCAL TestVar() AS MyType 
                                  
                                   REDIM TestVar(0)   ' create one-element array 
                                   CALL  MyFunction (Testvar(0))
                                  .. and you'll see the difference.

                                  If the compiler is passing that UDT by value in the absence of an override, that's a bug in the compiler, which I refuse to believe.

                                  I think your 'one size works, one size doesn't' result is because you enter the called procedure with SP already close to the limit of the stack (bp+stacksize), and you are blowing it out in that called procedure.
                                  Michael Mattias



                                  • #18
                                    MCM,
                                    well, whadya know.
                                    The value is not passed on the stack, but the program still crashes as though there were a stack overflow, just as if it had been passed on the stack.
                                    Code:
                                    'PBCC5.01 program
                                    #COMPILE EXE
                                    #BREAK ON
                                    #DIM ALL
                                    
                                    
                                    TYPE MyType
                                        test AS LONG
                                        test2 AS STRING*1500000    
                                        test3 AS BYTE
                                    END TYPE
                                    
                                    GLOBAL InitialStackPointer AS LONG
                                    
                                    FUNCTION PBMAIN() AS LONG
                                    
                                    LOCAL TestVar AS MyType
                                    
                                    !mov InitialStackPointer,esp    'get the stack pointer
                                    PRINT "Initial Stack Pointer = ";InitialStackPointer
                                    
                                    MyFunction(TestVar)
                                    
                                    PRINT "Made it!"
                                    WAITKEY$
                                    END FUNCTION
                                    
                                    
                                    FUNCTION MyFunction(x AS MyType) AS LONG
                                        LOCAL LocalStackPointer AS LONG
                                        WAITKEY$
                                        'just print the value of the stack pointer as we don't need to do anything else
                                        !mov LocalStackPointer,esp    'get the stack pointer
                                        PRINT "Local Stack Pointer = ";LocalStackPointer
                                        PRINT "Difference = "; InitialStackPointer - LocalStackPointer
                                    END FUNCTION
                                    Note that the stack pointer doesn't change by a million bytes, only by one or two hundred, yet when the passed parameter is too big the code crashes.
                                    Looks like a bug!



                                    • #19
                                      MCM,
                                      I should have read your reply more carefully before I posted.
                                      It is the local variable that I forgot to remove that caused the problem. It doesn't look like a bug any more.

                                      Paul.



                                      • #20
                                        I fixed a couple of FPU problems: there were some divide-by-zero errors occurring at times. Eliminating them didn't fix the problem, but moved it to another piece of code in the same function. Now a vector length calculation involving a square root triggers an invalid operation after the REDIM PRESERVE is done. There is no negative number there to trigger it, and the calculation works fine if I don't use PRESERVE. All I can do now is ask the guys to compile the exe with a bigger stack and try again. That shouldn't do anything, but it's the last resort. Also, explicitly using GLOBAL in the REDIM PRESERVE line had no effect.

                                        Thanks for the ideas on this, everyone. I've learned a couple things anyway.
                                        Todd Wasson

