RunTime Debug and Error Handling Part II - Find that elusive bug (Discussion)

  • RunTime Debug and Error Handling Part II - Find that elusive bug (Discussion)

    This thread is for discussion of tracking down bugs, errors, or just plain ole "Why did I NOT think of that???" moments (and hopefully someone can point out some of my more 'SORE' points).

    It all comes from the "What did I not think of?" point of attack.

    In larger projects it is hard to find where mistakes may have happened (either from lack of knowledge or misinterpretation of documentation).

    Anyways, I hope the source code and my comments help someone else make sense of the documentation and track down a bug they think is not in their code.

    RunTime Debug and Error Handling Part II - Find that elusive bug
    Engineer's Motto: If it aint broke take it apart and fix it

    "If at 1st you don't succeed... call it version 1.0"

    "Half of Programming is coding"....."The other 90% is DEBUGGING"

    "Document my code????" .... "WHYYY??? do you think they call it CODE? "

  • #2
    From my original post, I finally got "Array Out Of Bounds" to behave correctly, but Division by zero only half works
    (probably why I commented them out).

    Does anyone have any commented details on what the Assembly variables in CONTEXT mean? (The closest I have found is that they are the state of the registers at the point of the crash, and if it were not for Paul Dixon, I would never have gotten this far.)

    So far the only docs I can find for CONTEXT show the declares (and it appears there are 2 completely different ones, so I have no clue which OS each applies to).

    The search goes on... and at least I am a heck of a lot farther along than I was when I started.



    • #3
      >From my original post, I finally got "Array Out Of Bounds" to behave correctly

      ???

      #DEBUG ERROR ON, test ERR for value 9.

      Remember, an array bounds violation may not cause a Windows exception... you may simply be reading or writing memory which you have 'physical' (actually 'virtual') permission to access, but the compiler's runtime library detects that you have exceeded the 'application-logical' address range.

      PB arrays are logical, and the system knows nothing about them by the time it gets a chance to do anything.
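
      The #DEBUG ERROR ON / ERR = 9 check suggested above can be sketched as follows. This is a minimal example, assuming PB/CC (for STDOUT and WAITKEY$) and a hypothetical 10-element test array; the subscript is held in a variable so that the violation happens at run time rather than being spotted at compile time.
      Code:
      #COMPILE EXE
      #DIM ALL
      #DEBUG ERROR ON

      FUNCTION PBMAIN () AS LONG
          DIM a(1 TO 10) AS LONG         ' hypothetical test array
          LOCAL i AS LONG
          LOCAL e AS LONG

          i = 11                         ' one past the upper bound
          a(i) = 123                     ' out of bounds: with #DEBUG ERROR ON the statement is not executed
          e = ERR                        ' take a copy of the run-time error code
          IF e = 9 THEN                  ' 9 = "Subscript/Pointer out of range"
              STDOUT "Array bounds violation caught, ERR =" + STR$(e)
          END IF
          WAITKEY$
      END FUNCTION
      No Windows exception filter is involved in this sketch; the check is made entirely through ERR.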

      MCM
      Michael Mattias
      Tal Systems (retired)
      Port Washington WI USA
      [email protected]
      http://www.talsystems.com



      • #4
        MCM, you are only PARTLY right, depending on how you meant your post.

        According to the docs:

        > #DEBUG ERROR option specifies whether the compiler should generate code that checks for array boundary and null-pointer errors wherever they may occur. The default setting is OFF.
        >
        > When #DEBUG ERROR mode is ON, any attempt to access an array outside of its boundaries, or attempting to use a null-pointer will generate a run-time Error 9 ("Subscript/Pointer out of range"), and the statement itself is not executed.

        My testing thus far proves this: if #DEBUG ERROR is ON, then you will indeed get a %EXCEPTION_ARRAY_BOUNDS_EXCEEDED value for a GPF.

        So far, I hope to improve what I have posted, including a few cases that elude me to date, so that I can at least log errors that I did not cause (well, they were errors that I did cause, but I did not know about them, or I would have built in protection to NOT cause them).


        • #5
          > My Testing thus far proves this as, if #DEBUG ERROR ON, then you will indeed get a %EXCEPTION_ARRAY_BOUNDS_EXCEEDED value for a GPF

          Hmm... if Windows is getting an exception, then it sounds like that must be how PB currently implements #DEBUG ERROR ON... by raising that exception and then catching it.

          But since that is proprietary, I'm not so sure it would be a good idea to rely on it... or even to assume that's how it's really done.

          The SDK doc doesn't help me a whole lot here either...

          > EXCEPTION_ARRAY_BOUNDS_EXCEEDED: The thread tried to access an array element that is out of bounds and the underlying hardware supports bounds checking.

          ... because I don't understand how the operating system and/or hardware can know about an "array element".



          MCM


          • #6
            Michael,
            I don't know how the compiler does it but there is a CPU opcode for that purpose:
            BOUND—Check Array Index Against Bounds
            If the index is not within bounds, a BOUND range exceeded exception (#BR) is signaled.
            The OS doesn't know about the array element; the BOUND instruction is given the upper and lower bounds of the array and the index to check. If the index is outside those bounds then a Bound Range exception occurs.

            Paul.



            • #7
              Cliff,
              > but Division by zero only half works

              The compiler doesn't always do things the way you might expect. Are you sure that, when you "divide by zero", the division is really taking place as you expect?
              Code:
              'PBCC5.01 program
              #DEBUG ERROR ON
              FUNCTION PBMAIN
              
              b##=2
              c##=0
              
              a##=b##/c##       'no error because, by default, the FPU exception flags are disabled
              PRINT a##
              PRINT ERR,ERROR$
              
              y&=2
              z&=0
              
              x&=y&/z&          'no error because PB uses the FPU to do this division so the above comment applies again!
              PRINT x&
              
              PRINT ERR,ERROR$
                  
              WAITKEY$
              
              END FUNCTION

              > Does anyone have any commented details to what the Assembly variables in CONTEXT mean?

              You'll have to be more specific; I don't know what you're asking for.

              Paul.



              • #8
                Paul,
                For example from the Win32Api.inc
                Code:
                TYPE CONTEXT
                  '
                  ' The flags values within this flag control the contents of
                  ' a CONTEXT record.
                  '
                  ' If the context record is used as an input parameter, then
                  ' for each portion of the context record controlled by a flag
                  ' whose value is set, it is assumed that that portion of the
                  ' context record contains valid context. If the context record
                  ' is being used to modify a threads context, then only that
                  ' portion of the threads context will be modified.
                  '
                  ' If the context record is used as an IN OUT parameter to capture
                  ' the context of a thread, then only those portions of the thread's
                  ' context corresponding to set flags will be returned.
                  '
                  ' The context record is never used as an OUT only parameter.
                  '
                  ContextFlags AS DWORD
                
                  ' This section is specified/returned if CONTEXT_DEBUG_REGISTERS is
                  ' set in ContextFlags.  Note that CONTEXT_DEBUG_REGISTERS is NOT
                  ' included in CONTEXT_FULL.
                  Dr0 AS DWORD
                  Dr1 AS DWORD
                  Dr2 AS DWORD
                  Dr3 AS DWORD
                  Dr6 AS DWORD
                  Dr7 AS DWORD
                
                  ' This section is specified/returned if the
                  ' ContextFlags word contians the flag CONTEXT_FLOATING_POINT.
                  FloatSave AS FLOATING_SAVE_AREA
                
                  ' This section is specified/returned if the
                  ' ContextFlags word contians the flag CONTEXT_SEGMENTS.
                  regGs AS DWORD
                  regFs AS DWORD
                  regEs AS DWORD
                  regDs AS DWORD
                
                  ' This section is specified/returned if the
                  ' ContextFlags word contians the flag CONTEXT_INTEGER.
                  regEdi AS DWORD
                  regEsi AS DWORD
                  regEbx AS DWORD
                  regEdx AS DWORD
                  regEcx AS DWORD
                  regEax AS DWORD
                
                  ' This section is specified/returned if the
                  ' ContextFlags word contians the flag CONTEXT_CONTROL.
                  regEbp AS DWORD
                  regEip AS DWORD
                  regCs AS DWORD      ' MUST BE SANITIZED
                  regFlag AS DWORD    ' MUST BE SANITIZED
                  regEsp AS DWORD
                  regSs AS DWORD
                END TYPE
                I have no clue as to what each flag is to mean in plain english.


                • #9
                  Cliff,
                  A CONTEXT record can be thought of as a register dump of a selection of registers. Sometimes it just records what the registers are at the time and sometimes it can be used to set registers to a particular state. The OS would do both when switching threads, recording the current thread details before switching to the new thread.

                  The processor contains lots of registers, not only the general purpose registers you'd normally use such as EAX and ECX and the FPU registers but segment registers, SSE registers, debug registers and more.
                  Most of these are of little use to you unless you program operating systems.

                  The CONTEXT record is a general purpose record which can contain all of these types and more and is not just limited to 32bit x86 processors but can have information on registers of other CPUs as well.

                  The ContextFlags are included to tell you which groups of registers are included in any particular CONTEXT record. There is 1 bit to flag each possible set of registers. If the bit is set then that group of registers is present in the CONTEXT record.
                  If a flag is not set then those registers are not present in the CONTEXT record.

                  > I have no clue as to what each flag is to mean in plain english.

                  It's really simple; you're probably thinking there's more to it than there is.

                  CONTEXT_FLOATING_POINT means the FPU registers and state are present in the CONTEXT record.

                  CONTEXT_SEGMENTS means the segment registers are present. You wouldn't normally use these as they're fixed by the OS.

                  CONTEXT_INTEGER means the 6 general purpose registers are present, EAX, EBX, ECX, EDX, ESI, EDI.

                  CONTEXT_CONTROL means the registers EBP, EIP, ESP, Flags, Code Segment and stack segment are included in the CONTEXT record.

                  CONTEXT_DEBUG_REGISTERS means the debug registers are included in the CONTEXT record. You wouldn't normally use these, as they're privileged registers used by the OS, but you can see what they contain (if they're in the CONTEXT record) by looking here:
                  http://www.intel.com/design/PentiumI...als/243192.htm
                  ..in section 15.2

                  If the flag says the registers are present then you can look at them and get useful information. If the flag says a group of registers is not present then you'll get garbage back if you try to read them.
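
                  As a minimal sketch of that check in PowerBASIC, assuming the CONTEXT type and the %CONTEXT_xxx equates (such as %CONTEXT_INTEGER) are supplied by Win32API.inc. HasContextGroup is a hypothetical helper name, not part of any include file.
                  Code:
                  #INCLUDE "Win32API.inc"

                  ' Hypothetical helper: returns true (-1) when every bit of groupFlag,
                  ' e.g. %CONTEXT_INTEGER, is set in the record's ContextFlags.
                  FUNCTION HasContextGroup (ctx AS CONTEXT, BYVAL groupFlag AS DWORD) AS LONG
                      FUNCTION = ((ctx.ContextFlags AND groupFlag) = groupFlag)
                  END FUNCTION

                  ' Usage: only read regEax when the integer group is flagged as present
                  ' IF HasContextGroup(ctx, %CONTEXT_INTEGER) THEN STDOUT HEX$(ctx.regEax, 8)
                  The comparison is made against the whole flag value rather than a single bit because each flag combines a bit identifying the CPU family with a bit identifying the register group.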

                  Does that help at all?

                  Paul.



                  • #10
                    It helps a little bit, but after reading and re-reading what bits and pieces I can find, maybe my questions can be answered below.
                    Code:
                    TYPE CONTEXT
                    ' The flags values within this flag control the contents of a CONTEXT record.
                    '
                    ' If the context record is used as an input parameter, then for each portion of the context record controlled by a flag whose value is set, it is assumed that that portion of the
                    ' context record contains valid context. If the context record' is being used to modify a threads context, then only that' portion of the threads context will be modified.
                    '
                    ' If the context record is used as an IN OUT parameter to capture the context of a thread, then only those portions of the thread's
                    ' context corresponding to set flags will be returned.
                    '
                    ' The context record is never used as an OUT only parameter.
                    '*** <--- To All of the Above (Except "Flags" as values) all I can say is HUHHHHHHH>???????????
                    '
                    ContextFlags AS DWORD
                    ' This section is specified/returned if CONTEXT_DEBUG_REGISTERS is set in ContextFlags. Note that CONTEXT_DEBUG_REGISTERS is NOT included in CONTEXT_FULL.
                    Dr0 AS DWORD '<--- Debug Register but to WHAT???? (What does Dr0 mean?????)
                    Dr1 AS DWORD '<--- Debug Register but to WHAT???? (What does Dr1 mean?????)
                    Dr2 AS DWORD '<--- Debug Register but to WHAT???? (What does Dr2 mean?????)
                    Dr3 AS DWORD '<--- Debug Register but to WHAT???? (What does Dr3 mean?????)
                    Dr6 AS DWORD '<--- Debug Register but to WHAT???? (What does Dr6 mean?????)
                    Dr7 AS DWORD '<--- Debug Register but to WHAT???? (What does Dr7 mean?????)

                    ' This section is specified/returned if the ContextFlags word contians the flag CONTEXT_FLOATING_POINT.
                    FloatSave AS FLOATING_SAVE_AREA '<--- I have to research this once I get the other values figured out

                    ' This section is specified/returned if the ContextFlags word contians the flag CONTEXT_SEGMENTS.
                    regGs AS DWORD '<--- RegGs means????
                    regFs AS DWORD '<--- RegFs means????
                    regEs AS DWORD '<--- RegEs means????
                    regDs AS DWORD '<--- RegDs means????

                    ' This section is specified/returned if the ContextFlags word contians the flag CONTEXT_INTEGER.
                    regEdi AS DWORD '<--- RegEdi means????
                    regEsi AS DWORD '<--- RegEsi means????
                    regEbx AS DWORD '<--- RegEbx means????
                    regEdx AS DWORD '<--- RegEdx means????
                    regEcx AS DWORD '<--- RegEcx means????
                    regEax AS DWORD '<--- RegEax means????

                    ' This section is specified/returned if the ContextFlags word contians the flag CONTEXT_CONTROL.
                    regEbp AS DWORD '<--- I think RegErrorBasePointer (Pointer to beginning of function with Error)
                    regEip AS DWORD '<--- I think RegErrorInterruptPointer (Pointer to where Error Occurred)
                    regCs AS DWORD ' MUST BE SANITIZED '<--- I think RegContextSwitch (CONTEXT_SEGMENTS/CONTEXT_INTEGER/CONTEXT_CONTROL ) and how is it sanitized? (Clorox, bleach, Thermite device?????)
                    regFlag AS DWORD ' MUST BE SANITIZED '<--- I think RegFlag (No CLUE what that is) and how is it sanitized? (Clorox, bleach, Thermite device?????)
                    regEsp AS DWORD '<--- I think RegSafePointer (Next Valid Pointer to jump to like Resume Next does??)
                    regSs AS DWORD '<--- I think RegSS (Did HITLER document this code or something????)
                    END TYPE
                    Almost every example I can find is in Assembly language, and my minor experience with Assembly was over 15 years ago, so you can see why I have no clue as to what each of the variables means.


                    • #11
                      Sad part is, my stuck point is at DivisionByZero

                      Code:
                           x=0.0
                           y=1.0
                           z=y/x                              'Floating Point Division (/) by zero, I am protected by PB
                           PRINT# ErrorCode.ErrorLogNumber, STRING$(20, "*") + SPACE$(5) + "Performing Floating Point DivByZero: z = y/x Results in PB protecting me" + SPACE$(5) + STRING$(40, "-") + $CRLF + $CRLF
                      Code:
                           c=b\a                             'Integer Division (\) by zero, still has problems with my code          '<--- To be investigated
                      MSGBOX "I Got Here"
                           PRINT# ErrorCode.ErrorLogNumber, STRING$(20, "*") + SPACE$(5) + "Performing Integer DivByZero: c = b\a Results in allowing an error" + SPACE$(5) + STRING$(40, "-") + $CRLF + $CRLF
                      I know, not complete, but I am hoping to solve it myself if I can. Suffice it to say a and x are zero and I am just dividing some value by zero: both floating point divide (which PB seems to protect me from) and integer divide (where I can NOT for the life of me figure out where to point to so I can resume and skip the error???).


                      • #12
                        Cliff,
                        you'll need to know a bit more about the CPU and its internal workings to understand the CONTEXT RECORD's details as the context record deals with the very low level stuff in the CPU.

                        I suggest you download the Intel manuals.
                        For a start, get this one:
                        http://download.intel.com/design/Pen...s/24319002.PDF
                        and read section 3.6. It'll explain the basic registers. Without that information you'll be lost.



                        A CONTEXT record is a copy of the CPU's registers at a time of interest. That time of interest may be when an interrupt occurs, when an exception occurs, or when a task switch occurs. The CONTEXT of the CPU is just the values of the CPU registers at that time.
                        You can read a context, i.e. take a copy of the registers and store it in a context record.
                        You can also write a context, i.e. write a copy of the registers back from a context record to the registers themselves.

                        For example, if your program is interrupted then it needs to be able to return to exactly where it was so the registers could be copied into a CONTEXT record, the interrupt routine then does its job, then the registers can be restored from that context record and your program can continue executing from where it left off.

                        > <--- RegEdi means????

                        The CPU has 8 general purpose 32-bit registers called EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP.

                        RegEdi means that this item in the context record is a copy of the EDI register.

                        > regEax AS DWORD '<--- RegEax means????

                        RegEax means that this item in the context record is a copy of the EAX register.


                        I think you'll get the others now!


                        In addition to the general purpose registers there are a number of segment registers. Just ignore them. They are set by the operating system and user programs have no control over them.
                        They are regGs, regFs, regEs, regDs.


                        The DEBUG registers are again of no interest unless you write operating systems. They are privileged registers. If you really want to know what they contain then follow the link I gave previously:
                        http://www.intel.com/design/PentiumI...als/243192.htm
                        ..in section 15.2


                        The FPU registers are of interest but you say you'll look at them later.


                        regCS is the Code Segment.
                        regSS is the Stack Segment. Just ignore the segment registers; they're of little use unless you write operating systems.

                        regEip is the instruction pointer (it tells you the address of the instruction being executed when the context record was taken).
                        regFlag is the flags register of the CPU.
                        regEsp is the stack pointer. It points to the current top of the CPU's stack.


                        The reference to "sanitized" will be because the flags and the instruction pointer can't just be copied and restored. If an interrupt occurs, the flags and instruction pointer are altered by that interrupt: the instruction pointer will be pointing at the interrupt routine doing the context saving rather than at the location in the code where the interrupt occurred. So the values of EIP and the flags will have been recovered from the stack rather than dumped directly from the registers.



                        Other bits:
                        EBP = Extended Base Pointer. It's actually a standard register but is commonly used as a base pointer (i.e. a pointer to the start of something, usually on the stack).
                        EXTENDED in all the register names just means it's a 32-bit register, as they evolved from smaller, non-extended 16-bit registers in earlier CPUs.


                        ESP = Extended Stack Pointer, so it's a 32-bit register (Extended) and it points to the last used position on the stack.

                        EIP is the extended Instruction Pointer (the program counter)

                        I hope that helps.

                        Paul.



                        • #13
                          Cliff,
                          > I can NOT for the life of me figure out where to point to so I can resume and skip the error???)

                          See the example posted here:
                          http://www.powerbasic.com/support/pb...ad.php?t=37821

                          It shows how to trap an integer divide by zero (which resumes at an error trap) or, in the case of a privileged instruction exception, how to resume at the next instruction.

                          If you want to resume at the next instruction after a divide by zero then you need to work out how many bytes the divide instruction takes and adjust regEip accordingly before resuming. The following line does this in my example:
                          Code:
                          INCR @[email protected]    'return execution to the byte after the one that caused the exception
                          but I KNOW in that case the CLI or STI instruction was only 1 byte. For a divide by zero you don't know the instruction length, so you need to work it out and adjust regEip to match before you RESUME_EXECUTION.

                          It's not really useful to continue execution at the next instruction after a divide by zero error.

                          Paul.



                          • #14
                            Thanx Paul, I am getting closer in my understanding. (At least now I know what EBP and ESP and others stand for.)

                            > For a divide by zero you don't know the instruction length so you need to work it out and adjust regEip to match before you RESUME_EXECUTION.

                            That's exactly what I am trying to achieve: knowing where to continue regardless of where the error was. I just log it and continue.

                            > It's not really useful to continue execution at the next instruction after a divide by zero error.

                            Normally I agree, but in the case of my demo and future logging use, it will come in handy to see the log and see where I made a mistake and just never saw it before.


                            • #15
                              Cliff,
                              if you carry on executing after a divide by zero error then the result is going to be wrong and is likely to cause errors in future calculations. It's much better to stop on the first error and fix it before carrying on.

                              If you really want to continue then you'll have to disassemble the instruction pointed to by regEip to see how long it is and increment regEip by that amount before continuing execution.
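
                              A rough sketch of that length calculation, limited to the plain 32-bit DIV/IDIV encodings (opcode F6 or F7 followed by a ModR/M byte selecting /6 or /7). That the compiler emits one of these forms for integer division is an assumption, not something confirmed in this thread. DivInstrLen is a hypothetical helper; it returns 0 for anything it does not recognise (segment or address-size prefixes, for example), in which case regEip should be left alone.
                              Code:
                              ' Hypothetical helper: length in bytes of the DIV/IDIV instruction at pCode,
                              ' or 0 if the bytes are not one of the forms handled here.
                              FUNCTION DivInstrLen (BYVAL pCode AS BYTE PTR) AS LONG
                                  LOCAL n, modrm, md, rm, sib AS LONG              ' locals start at 0

                                  IF @pCode[n] = &H66 THEN INCR n                  ' optional operand-size prefix
                                  IF @pCode[n] <> &HF6 AND @pCode[n] <> &HF7 THEN EXIT FUNCTION
                                  INCR n                                           ' opcode byte

                                  modrm = @pCode[n] : INCR n                       ' ModR/M byte
                                  IF ((modrm \ 8) AND 7) < 6 THEN EXIT FUNCTION    ' only /6 (DIV) and /7 (IDIV)
                                  md = modrm \ 64                                  ' addressing mode, 0 to 3
                                  rm = modrm AND 7

                                  IF md <> 3 AND rm = 4 THEN                       ' a SIB byte follows
                                      sib = @pCode[n] : INCR n
                                      IF md = 0 AND (sib AND 7) = 5 THEN n = n + 4 ' disp32 with no base register
                                  ELSEIF md = 0 AND rm = 5 THEN
                                      n = n + 4                                    ' disp32 only
                                  END IF
                                  IF md = 1 THEN n = n + 1                         ' disp8
                                  IF md = 2 THEN n = n + 4                         ' disp32

                                  FUNCTION = n
                              END FUNCTION
                              For example, the two bytes F7 F9 (idiv ecx) give a length of 2, and F7 75 F8 (div dword ptr [ebp-8]) gives 3. Adding a non-zero result to regEip skips the divide, but the quotient the program goes on to use is whatever was already in the register.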

                              Paul.



                              • #16
                                To ask the question.... of one who might actually know something about the CPU instructions...

                                Might there be an instruction to "test if this will result in a divide by zero error?" without actually generating the Windows' exception?

                                eg
                                Code:
                                ASM push  somereg    dividend
                                ASM push somereg     divisor
                                ASM testforDivideByZero    ; instruction needed, maybe sets some flag bit if true?
                                ???

                                MCM


                                • #17
                                  MCM,
                                  that would be checking if the divisor was 0.
                                  Code:
                                  !mov SomeRegister, divisor
                                  !cmp SomeRegister, 0
                                  !je  DivideByZeroIminent
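
                                  The same test in plain PowerBASIC, as a minimal sketch (SafeIntDiv is a hypothetical helper name). It also rejects the one other case in which the x86 IDIV instruction raises the divide exception, the most negative LONG divided by -1, on the assumption that integer division (\) is compiled to IDIV.
                                  Code:
                                  ' Hypothetical helper: stores dividend \ divisor in quotient and returns
                                  ' true (-1), or returns false (0) and leaves quotient alone if the
                                  ' division would raise a CPU divide exception.
                                  FUNCTION SafeIntDiv (BYVAL dividend AS LONG, BYVAL divisor AS LONG, quotient AS LONG) AS LONG
                                      IF divisor = 0 THEN EXIT FUNCTION                               ' divide by zero
                                      IF dividend = -2147483648 AND divisor = -1 THEN EXIT FUNCTION   ' quotient does not fit in a LONG
                                      quotient = dividend \ divisor
                                      FUNCTION = -1
                                  END FUNCTION

                                  ' Usage:
                                  ' IF SafeIntDiv(b, a, c) = 0 THEN STDOUT "Skipped c = b\a"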
                                  Paul.



                                  • #18
                                    Well, I guess I will now have to research how to disassemble an instruction and find its size before I can solve this.

                                    Paul, I have been reading the docs you pointed me to, but I am getting more confused than ever.

                                    I do find it odd that every example I have found involves assembly to do the job. (Maybe that's my hint??)


                                    • #19
                                      Cliff,
                                      the example program mentioned earlier http://www.powerbasic.com/support/pb...ad.php?t=37821
                                      only uses ASM because I needed to control the FPU registers. You can disassemble the instruction without ASM but an understanding of ASM certainly helps when you're working at such a low level.

                                      Paul.



                                      • #20
                                        >that would be checking if the divisor was 0.

                                        Is it that simple?

                                        I was thinking 'bigtime' overflow would cause that error ... so I ran this..

                                        Code:
                                        #COMPILE EXE
                                        #DIM ALL
                                        
                                        FUNCTION PBMAIN () AS LONG
                                             
                                             LOCAL X, Y, Z AS SINGLE
                                             
                                             LET X =  1E37
                                             LET Y =  1E-37
                                             LET Z =  X/Y
                                             STDOUT FORMAT$(ERR)
                                             STDOUT FORMAT$(Z)
                                             WAITKEY$
                                        
                                        END FUNCTION
                                        ..and all I got was a bogus result: 9.999999E+998

                                        So I changed to:
                                        Code:
                                          LOCAL X, Y, Z AS EXT
                                             LET X =  1E4000
                                             LET Y =  1E-4000
                                             LET Z =  X/Y
                                             STDOUT FORMAT$(ERR) 
                                             STDOUT FORMAT$(Z)
                                        .. and all I got was a bogus result with more digits: 9.99999999999999E+998

                                        Hmm, I have to rethink my concerns about 'divide by zero' since testing for zero is not really all that difficult to code. I probably even have some code here I could modify to handle that.

                                        Does not deal with the bogus data with no ERR, but it's a start.

                                        What you are saying, supported by above test, is "divide by zero exception occurs when you divide by zero". So let's test that
                                        Code:
                                             LET X =  1E4000
                                             LET Y =  0## ' 1E-4000
                                             LET Z =  X/Y
                                             STDOUT   FORMAT$(ERR)
                                             STDOUT FORMAT$(Z)
                                             WAITKEY$
                                        Huh? No exception! I just get the same bogus result. Now I am confused.

                                        MCM
                                        (using PB/CC 5.01)