RunTime Debug and Error Handling Part II - Find that elusive bug (Discussion)


  • Cliff Nichols
    replied
    I am up to the point of deciphering the Mod/RegOpCode/RM fields that are in ModRM, but I really do not get what the doc is telling me from the charts it's showing. (I vaguely get it, but then I lose it)

    If I can get just 1 instruction format worked out, then I can work out the other instructions based on the sequence of steps to be performed to get the next memory location.


  • Cliff Nichols
    replied
    Paul, I kinda have it, but unsure if a byte (or bytes) exist.
    What I have so far is
    Code:
        CASE %STATUS_INTEGER_DIVIDE_BY_ZERO
    LOCAL bp AS BYTE PTR
    LOCAL length, md, rm, opcode AS LONG
    FOR i = 1 TO 3
            bp = [email protected]                          'Set byte pointer to Extended Interrupt Pointer
    length = 0
    MSGBOX STR$(@bp) + $CR + STR$(@bp[1]) + $CR + STR$(SIZEOF(@bp))
    'IF @bp = &hD4 AND @bp[1] = 0 THEN                                'Pointer = AAM—ASCII Adjust AX After Multiply (AAM)
    '    'its an AAM 0 instruction
    '    length = 2
    'END IF
    '*** If I have this correct then
    SELECT CASE @bp
    '*** Get any INSTRUCTION PREFIXES   'Up to four prefixes of 1-byte each (optional)        '<--- If I have this correct then Array element 0 = Instruction PreFixes
         CASE &h66 _                              'Operand-size override
         ,&h67 _                               'Address-size override
         ,&hf0 _                               'LOCK prefix
         ,&hf2 _                               'REPNE/REPNZ prefix (used only with string instructions).
         ,&hf3 _                               'REP prefix (used only with string instructions) OR REPE/REPZ prefix (used only with string instructions) OR Streaming SIMD Extensions prefix
         ,&h2e _                               'CS segment override prefix
         ,&h36 _                               'SS segment override prefix
         ,&h3e _                               'DS segment override prefix.
         ,&h26 _                               'ES segment override prefix
         ,&h64 _                               'FS segment override prefix
         ,&h65 _                               'GS segment override prefix
         ,&h0F                                 'Streaming SIMD Extensions prefix (see Section B.4.1., “Instruction Prefixes” in Appendix B, Instruction Formats and Encodings)
    '*** There is a prefix (or more) so scan prefixes as they may affect the operand sizes
              WHILE @bp=&h66 OR @bp=&h67 OR @bp=&hf0 OR @bp=&hf2 OR @bp=&hf3 OR @bp=&h2e OR @bp=&h36 OR @bp=&h3e OR @bp=&h26 OR @bp=&h64 OR @bp=&h65 OR @bp=&h0F
                  bp=bp+1
                  Length = Length +1
              WEND
    '*** No Prefix, so start sorting OpCodes          '<--- If I have this correct then Array element 1 = OpCodes
         CASE &hD4                                         'ASCII adjust AX after multiply (2 bytes regardless if 2nd byte has a value)
              SELECT CASE @bp[1]
                   CASE 10, &H0A                              'Pointer ---> AAM—ASCII Adjust AX After Multiply (AAM)
                        length = 2
                   CASE 0, &H0                                'Pointer ---> Adjust AX after multiply to number base imm8
                        length = 2
              END SELECT
    END SELECT
    
    
    ''need to scan prefixes here first as they may affect the operand sizes
    'WHILE @bp=&h66 OR @bp=&h67 OR @bp=&hf0 OR @bp=&hf2 OR @bp=&hf3 OR @bp=&h2e OR @bp=&h36 OR @bp=&h3e OR @bp=&h26 OR @bp=&h64 OR @bp=&h65
    '    'there's a prefix
    '    bp=bp+1
    '    Length = Length +1
    'WEND
    Where I think
    @bp[0] or @bp = Pointer to Instruction
    @bp[0] (if it exists, points to INSTRUCTION PREFIXES)
    @bp[0] (if it does not exist, not sure if [0] exists and just points to zero??? or if it exists and just has a zero value?
    @bp[1] (would = Opcode, but unsure if other handling makes it appear as [0] ????)

    the same when I get to ModR/M and SIB when I get there, but for the time being, I have only docs to relearn from and my mild intro to assembly (15-20 years ago) to fall back on.

    (Amazing what one can dig out of cobwebs when needed )

    As I am one that has always learned from not just "Here's how you do it", but also "Here's Why") I really appreciate you helping me figure this out.

    Much like math (my weak point) I could never see the How and the why until I was shown both, and when the lightbulb hit (to me I wanted to scream...."WHYYYYY did they just NOT SAY SOOOOOooooo" !!!!)

    (I guess it's a brain wiring thing??? I can always see the "HOW" if I see a useful example alongside, and a short explanation as to "WHY" alongside)

    Thank you sooooo much for pointing out each point and showing me an example to work from, I would NEVER have figured out as much as I have without your assistance.


  • Paul Dixon
    replied
    Cliff,
    the AAM instruction always has 1 opcode and 1 operand.
    The opcode is AAM &hD4
    The operand is the next byte, whatever that is.

    I only chose &hD4 followed by &h00 because that is the only time this opcode can cause a divide by zero error and you were talking at the time of trapping divide by zero errors.

    The default for the instruction, as it was its intended usage, is to use &hD4 followed by &h0A as that will correct for a base 10 multiply which is appropriate to BCD arithmetic.


    If you are checking for division by zero then you need to be looking for &hD4 followed by &h00
    If you are now trying to determine the instruction length then the AAM is always 2 bytes .. plus prefixes.
    Even though prefixes aren't appropriate, if they are present they still need to be taken into account as they make the instruction longer.
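
    As a rough illustration of the points above, here is a minimal sketch in Python (not PowerBASIC, and not code from this thread). The function name `aam_length` and the `PREFIXES` set are made up for the example; the prefix values are the ones scanned for elsewhere in this discussion. It skips any prefixes, checks for the &hD4 opcode, and reports both the total length and whether the operand byte is zero.

```python
# Hypothetical helper for illustration only; prefix bytes as listed in this thread.
PREFIXES = {0x66, 0x67, 0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65}

def aam_length(code: bytes):
    """Return (length, divides_by_zero) if code starts with AAM, else None."""
    i = 0
    # Prefixes are not meaningful for AAM, but they still lengthen the instruction.
    while i < len(code) and code[i] in PREFIXES:
        i += 1
    # AAM needs the opcode byte (D4) plus one operand byte.
    if i + 1 >= len(code) or code[i] != 0xD4:
        return None
    return i + 2, code[i + 1] == 0

print(aam_length(bytes([0xD4, 0x0A])))        # ordinary base-10 AAM
print(aam_length(bytes([0x66, 0xD4, 0x00])))  # prefixed AAM 0
```

    Only the second call describes an instruction that can raise a divide by zero; both are 2 bytes long before any prefixes are counted.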

    Paul.


  • Cliff Nichols
    replied
    I did not realize what sort of digging I will have to do, but if I understand the doc correctly, is my translation of your code correct? (1st try so forgive me if I am incorrect since I am just starting to figure this out)

    Your Snippet
    Code:
    'IF @bp = &hD4 AND @bp[1] = 0 THEN                                'Pointer = AAM—ASCII Adjust AX After Multiply (AAM)
    '    'its an AAM 0 instruction
    '    length = 2
    'END IF
    My Translation
    Code:
    SELECT CASE @bp
         CASE &hD4                                         'ASCII adjust AX after multiply (2 bytes regardless if 2nd byte has a value)
              SELECT CASE @bp[1]
                   CASE 10, &H0A                              'Pointer ---> AAM—ASCII Adjust AX After Multiply (AAM)
                        length = 2
                   CASE 0, &H0                                'Pointer ---> Adjust AX after multiply to number base imm8
                        length = 2
              END SELECT
    END SELECT
    Now if I more or less got that translation correct I think I have something to build from (until I get lost again that is )

    Thanx again for alllllll the assistance you have been giving


  • Paul Dixon
    replied
    Cliff,
    @bp does mean @BytePointer but @bp=&hF7 could be one of many instructions. The 3 opcode bits in the following ModR/M byte determine which instruction it is. If the 3 bits = 110 then it's a DIV and if the 3 bits are 111 then it's IDIV. Other bit values mean other opcodes.
    Note that &HF6 is likewise DIV and IDIV, but for BYTE operands instead of DWORD or WORD operands.
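
    To make the bit-picking concrete, here is a small sketch in Python rather than PowerBASIC. The names `split_modrm` and `classify` are invented for this example; the only facts it relies on are the ones stated above (opcode bytes &hF6/&hF7, and the 3 middle bits of the ModR/M byte being 110 for DIV or 111 for IDIV).

```python
def split_modrm(b: int):
    """Split a ModR/M byte into its (mod, reg/opcode, rm) fields."""
    mod = (b >> 6) & 0b11    # top 2 bits
    reg = (b >> 3) & 0b111   # middle 3 bits: the opcode extension
    rm = b & 0b111           # bottom 3 bits
    return mod, reg, rm

def classify(opcode: int, modrm: int):
    """Return (mnemonic, operand size) for DIV/IDIV, or None for anything else."""
    if opcode not in (0xF6, 0xF7):
        return None
    size = "BYTE" if opcode == 0xF6 else "WORD/DWORD"
    _, reg, _ = split_modrm(modrm)
    names = {0b110: "DIV", 0b111: "IDIV"}
    if reg not in names:
        return None  # some other instruction sharing the F6/F7 opcode byte
    return names[reg], size

print(classify(0xF7, 0xF9))  # the bytes F7 F9 shown in this thread as IDIV ECX
```

    A table of equates is not strictly needed for this; the opcode byte plus the 3-bit field is enough to tell DIV and IDIV apart.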

    There may be a disassembler out there that could disassemble a few bytes for you. It might be a lot easier than doing it all yourself.
    You could try this as a start. I haven't used it but it looks like what you need:
    http://www.ollydbg.de/disasm.zip

    Paul.


  • Cliff Nichols
    replied
    Thanx Paul,
    If this leads me where I think it will lead me (still downloading the doc)

    Will I need to set up a bunch of equates that make sense to me? Or does PB have a doc that tells me a set of equates?

    Aka:
    IF @bp = &hF7
    from your example (and just my initial guess until I can read the doc) would mean "If bytePointer = DIV". Do I need to create an equate that reads "%DIV = &hF7", or is there an inc file or other doc that already defines an equate for &hF7 ????

    Just curious since it could save a ton of time to not reinvent the wheel


  • Paul Dixon
    replied
    Cliff,
    the code I posted is not complete, I'm sure there'll be bugs in it but it was intended to show you how you might do it.
    If you want to understand it then read the Intel manual:
    http://download.intel.com/design/Pen...s/24319102.PDF

    Chapter 2, Instruction Format.


    Once you understand that fully, and have corrected and expanded my example code, you'll be able to skip any number of instructions before resuming execution, but I really can't think why you'd want to!

    Paul.


  • Cliff Nichols
    replied
    Paul,
    I see where you are going with this. I accidentally discovered in one of my tests that if I added 6 to the RegEip then I could continue after a Division by zero.

    At first I started thinking....6??? 6 means WHAT???? in equates terms.....

    Then after testing some values I saw I was off in my testing, and I re-read your post about how it could be any number of bytes. (That corrected my "does 6 mean 48 bits, koinky dink = Far Pointer?" sort of thinking.)

    Your code replacement for DivisionByZero works. (Now I have to learn what all the hardcoded values mean. And see if I can make it not only skip 1 instruction, but maybe 2 instructions (Like skipping 2 lines in my code))

    Thanks for the help, I am starting to see why all the examples I find online all show assembly code.


  • Paul Dixon
    replied
    Cliff,
    no guarantees that this works in all cases especially if you use address override prefixes, but this code snippet might show you how to go about doing what you want.
    Insert this in the exception handler where it checks for CASE %STATUS_INTEGER_DIVIDE_BY_ZERO
    Code:
        CASE %STATUS_INTEGER_DIVIDE_BY_ZERO
            PRINT "Integer divide by zero at address &h";HEX$(ThisExceptionRecord.ExceptionAddress)
            PRINT
    LOCAL bp AS BYTE PTR
    LOCAL length, md, rm, opcode AS LONG
    
            bp=ThisExceptionRecord.ExceptionAddress 'some address to start looking at
    
    length = 0
    
    IF @bp = &hD4 AND @bp[1] = 0 THEN
        'its an AAM 0 instruction
        length = 2
    END IF
    
    
    'need to scan prefixes here first as they may affect the operand sizes
    WHILE @bp=&h66 OR @bp=&h67 OR @bp=&hf0 OR @bp=&hf2 OR @bp=&hf3 OR @bp=&h2e OR @bp=&h36 OR @bp=&h3e OR @bp=&h26 OR @bp=&h64 OR @bp=&h65
        'there's a prefix
        bp=bp+1
        Length = Length +1
    WEND
    
    
    'get the MODr/m byte parts assuming it's present.
    md     = (@bp[1] AND &b11000000) \&b1000000
    opcode = (@bp[1] AND &b00111000) \&b1000
    rm     = (@bp[1] AND &b00000111)
    
    
    IF @bp = &hF7 OR @bp = &hf6 THEN    'check the 1 byte opcode
        'it's possibly a DIV or IDIV (&hF6 = BYTE operand, &hF7 = WORD/DWORD operand)
        IF (opcode = 6) OR (opcode = 7) THEN     'check the MOD/RM opcode bits in the next byte
            'it is a DIV (6) or IDIV (7)
            length = length + 2
            SELECT CASE md
                CASE 0
                    IF rm = 4 THEN
                        'SIB byte is present
                        length = length +1
                    END IF
    
                    IF rm = 5 THEN
                        'special case for 32 bit displacement
                        length = length +4
                    END IF
    
    
                CASE 1
                    '8bit displacement
                    length = length + 1
    
                    IF rm = 4 THEN
                        'SIB byte is present
                        length = length +1
                    END IF
    
                CASE 2
                    '32bit displacement
                    length = length + 4
    
                    IF rm = 4 THEN
                        'SIB byte is present
                        length = length +1
                    END IF
    
                CASE 3
                    'no extra bytes needed
    
            END SELECT
    
    
    
         END IF
    END IF
    
     PRINT "Instruction length = ";length
     @[email protected] = @[email protected]  + length
                    
     WhatToDoNext=%EXCEPTION_CONTINUE_EXECUTION  '%EXCEPTION_CONTINUE_SEARCH  '%EXCEPTION_CONTINUE_EXECUTION


  • Paul Dixon
    replied
    Cliff,
    >Is RegEip (Eip) the point where my exception happens?
    Yes, but a FP exception happens on the NEXT FP instruction (not the next CPU instruction) after the one which caused it. It's just a quirk of the FPU.

    >is docs conflict as to which is which, and I am finding that depending on the error, I either increment this pointer or I don't
    Which documents conflict?

    >I also still dont get if I increment this pointer by 1? or 4?
    You increment it by the size of the instruction that caused the exception.
    In the case of a divide by zero it could be 2 bytes, e.g.:
    Code:
    F7F9             IDIV ECX
    It could be 3 bytes:
    Code:
    0040117B   F778 0C          IDIV DWORD PTR DS:[EAX+C]
    It could be 4 bytes:
    Code:
    0040117B   F77C88 0C        IDIV DWORD PTR DS:[EAX+ECX*4+C]
    It could be 7 bytes, e.g.
    Code:
    0040117B   F7BC88 D2040000           IDIV DWORD PTR DS:[EAX+ECX*4+4D2]
    It could be 8 bytes
    Code:
    0040117B   66:F7BC88 D2040000        IDIV WORD PTR DS:[EAX+ECX*4+4D2]
    That's why you need to disassemble the instruction pointed to by regEip and find how long it is.
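
    The five encodings above can be checked mechanically. The following is only a sketch, written in Python for brevity rather than PowerBASIC, and `div_length` is a name invented here. It assumes 32-bit addressing (it gives up if an &h67 address-size prefix is present) and only recognises the F6/F7 DIV and IDIV forms. Unlike the PowerBASIC snippet in this thread, it also counts the 4-byte displacement that follows a SIB byte whose base field is 101 when mod is 00.

```python
# Prefix bytes as listed in this thread; everything else here is illustrative.
PREFIXES = {0x66, 0x67, 0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65}

def div_length(code: bytes):
    """Length in bytes of a DIV/IDIV at the start of code, or None if unsupported."""
    i = 0
    while code[i] in PREFIXES:
        if code[i] == 0x67:
            return None          # 16-bit addressing is not handled in this sketch
        i += 1
    if code[i] not in (0xF6, 0xF7):
        return None
    modrm = code[i + 1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if reg not in (6, 7):        # 6 = DIV, 7 = IDIV
        return None
    length = i + 2               # prefixes + opcode + ModR/M
    if mod != 3 and rm == 4:
        length += 1              # SIB byte present
        if mod == 0 and (code[i + 2] & 7) == 5:
            length += 4          # SIB base = 101 with mod = 00: 32-bit displacement
    if mod == 0 and rm == 5:
        length += 4              # 32-bit displacement only, no base register
    elif mod == 1:
        length += 1              # 8-bit displacement
    elif mod == 2:
        length += 4              # 32-bit displacement
    return length

for text in ("F7F9", "F7780C", "F77C880C", "F7BC88D2040000", "66F7BC88D2040000"):
    print(text, div_length(bytes.fromhex(text)))
```

    Run against the encodings listed above, this reports 2, 3, 4, 7 and 8 bytes respectively.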

    Paul.


  • Cliff Nichols
    replied
    Paul,
    You were partially right
    >I don't think you need to read process memory. Can't you just look at the few bytes pointed to by regEip?
    ReadProcessMemory was a bad idea (unless I just got it wrong)

    I am getting ready to post my best attempt yet, but I'm still confused by the docs. Is RegEip (Eip) the point where my exception happens? or the point that is next if there were no error?

    Reason I ask, is docs conflict as to which is which, and I am finding that depending on the error, I either increment this pointer or I don't

    Hoping to have a posting soon, but thought I would ask while I am cleaning things up to post with to not confuse everyone as much as I got confused.

    (I also still dont get if I increment this pointer by 1? or 4? (SizeOf the pointer) just yet but I think it depends on the error as well????)


  • Cliff Nichols
    replied
    I think my catch is I cannot use a debugger to walk through and watch what is going on under the hood. And in the case of Divide by zero, I can't seem to find a way to jump to the next line after my DivideByZero Exception.

    Either that or I still do not understand the registers correctly and need to reset some flag to allow me to continue as if there were no exception.
    ???


  • Paul Dixon
    replied
    Cliff,
    I don't think you need to read process memory. Can't you just look at the few bytes pointed to by regEip?

    Paul.


  • Cliff Nichols
    replied
    I am looking at using ReadProcessMemory as a possible way to get me the size of where the Exception happens and then move the RegEip to the end of that location.

    Or am I way off base with that thought?????


  • Paul Dixon
    replied
    Michael,
    if all you want to do is stop if there's a FP overflow then you can do this:
    Code:
    'PBCC5.01 program
    #COMPILE EXE
    #DIM ALL
            
    MACRO FPUOverflowTrapEnable
        MACROTEMP NewControlWord,OriginalControlWord
        #REGISTER NONE              'the following 2 cannot be allowed as register variables
        DIM OriginalControlWord AS INTEGER
        DIM NewControlWord AS INTEGER    'control word is a 16 bit register
    
        !fstcw OriginalControlWord  'save the original control word to memory
        !mov ax,OriginalControlWord 'get the control word
        !and ax,&hfff3              'mask off the divide by zero and overflow bits
        !mov NewControlWord,ax      'save the new control word to memory
        !fldcw NewControlWord       'set the FPU to use the new control word
    
    END MACRO
    
    
    FUNCTION PBMAIN () AS LONG
    LOCAL a,b,c AS DOUBLE
    
    FPUOverflowTrapEnable
    
    a#=1
    lp:
    PRINT a#
    a#=a#+a#       'It'll eventually overflow
    GOTO lp
    
    END FUNCTION
    My concern would be that I have no control over whether calls to other PB functions or the Windows API will preserve the FPU control word to the value I set it at so the call to FPUOverflowTrapEnable may need to be done before every block of FPU maths and even then with caution, rather than just once at the start of the code.
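
    For anyone puzzling over the &hfff3 mask in the macro, here is the same bit arithmetic as a sketch in Python. It does not touch the real FPU; it only shows which control word bits the mask clears. The names `ZM`, `OM` and `unmask` are chosen for this example, and &h037F is used as the starting value because that is what the x87 FINIT instruction loads; whether PowerBASIC leaves the control word at that value is an assumption.

```python
# x87 control word exception mask bits (a set bit means "masked", i.e. no trap).
ZM = 1 << 2   # zero-divide mask
OM = 1 << 3   # overflow mask

def unmask(control_word: int, bits: int) -> int:
    """Clear the given mask bits so those exceptions trap; keep the word 16 bits wide."""
    return control_word & ~bits & 0xFFFF

finit_default = 0x037F                     # value loaded by FINIT (assumed starting point)
new_word = unmask(finit_default, ZM | OM)  # same effect as AND with &hfff3
print(hex(new_word))
```

    Clearing only `OM` instead would trap overflow while leaving FP divide by zero masked.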

    Edit: and, of course, the compiler sometimes checks operands and doesn't execute the code as you might expect as with 0^-1 which you might expect to overflow or SQR(-2) which returns -2 instead of an "invalid operation".
    Last edited by Paul Dixon; 31 May 2009, 10:46 AM.


  • Michael Mattias
    replied
    Hmm, interesting.

    Since my concern is with floating point Overflow rather than divide by zero (and here I mean "ZERO" .. a value for which I can test) ... I could maybe check if the result = 9.99999E+998 or something like that. I'll have to write some test code to see if that will handle a large range of numbers.

    I don't do whole lot of floating point division, but when I do I really do not like bogus data coming back without an error for which I can test. Usually I solve these (user-supplied data) problems by editing the dividend and divisor for 'reasonableness' but something a bit less arbitrary would be nice.
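
    One less arbitrary option along those lines is to test the result itself rather than guessing at "reasonable" inputs. The sketch below is in Python, not PowerBASIC, and `checked_divide` is a hypothetical name; it assumes IEEE 754 doubles, where an overflowing division quietly yields infinity, and turns both the zero divisor and the overflow into errors that can be tested for.

```python
import math

def checked_divide(dividend: float, divisor: float) -> float:
    """Divide, but raise instead of returning a bogus value."""
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")
    result = dividend / divisor
    # An overflowing float division yields +/-infinity rather than an error.
    if not math.isfinite(result):
        raise OverflowError("result does not fit in a double")
    return result

print(checked_divide(1.0, 4.0))
try:
    checked_divide(1e308, 1e-308)
except OverflowError as exc:
    print("trapped:", exc)
```

    In PowerBASIC the equivalent test would have to compare against whatever value the compiler stores for an overflowed result, which this sketch does not attempt to reproduce.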


  • Cliff Nichols
    replied
    Seems to fall right in line with the testing I have been doing and trying to purposely over-ride a division by zero

    Code:
    FUNCTION DemoOverflowDivisionByZero()AS LONG
         OnError
    '     ErrorCode.LogError = %TRUE                   'Log Errors?   %TRUE/%FALSE
         SetGetLogErrors %UNKNOWN_VALUE, ErrorCode.LogError, %UNKNOWN_VALUE         'Get flag for logging errors
    '     ErrorCode.OverRideError = %RESUMENEXT     'Continue after an error?        '<--- Do NOTTTTT change this to %RESUME or you will be stuck in a logging loop
         ErrorCode.OverRideError = %UNKNOWN_VALUE     'Continue after an error?        '<--- Do NOTTTTT change this to %RESUME or you will be stuck in a logging loop
         LOCAL MyOverFlow AS LONG
    '     ON %ERR_DIVISIONBYZERO GOTO ErrHandler
         PRINT# ErrorCode.ErrorLogNumber, STRING$(40, "-") + SPACE$(5) + FUNCNAME$ + SPACE$(5) + STRING$(40, "-")
         MyOverFlow = 4 \ 0
         PRINT# ErrorCode.ErrorLogNumber, STRING$(20, "*") + SPACE$(5) + "Performing: MyOverFlow = 4 \ 0 Results in Pb protecting me so no error" + SPACE$(5) + STRING$(40, "-") + $CRLF + $CRLF
         LOCAL a,b,c AS LONG
         LOCAL x,y,z AS CURRENCY
         x=0.0
         y=1.0
         z=y/x                              'Floating Point Division (/) by zero, I am protected by PB
         PRINT# ErrorCode.ErrorLogNumber, STRING$(20, "*") + SPACE$(5) + "Performing Floating Point DivByZero: z = y/x Results in PB protecting me" + SPACE$(5) + STRING$(40, "-") + $CRLF + $CRLF
    '     c=b\a                             'Integer Division (\) by zero, still has problems with my code          '<--- To be investigated
    'MSGBOX "I Got Here"
         PRINT# ErrorCode.ErrorLogNumber, STRING$(20, "*") + SPACE$(5) + "Performing Integer DivByZero: c = b\a Results in allowing an error" + SPACE$(5) + STRING$(40, "-") + $CRLF + $CRLF
         PRINT# ErrorCode.ErrorLogNumber, STRING$(40, "-") + SPACE$(5) + "End " + FUNCNAME$ + SPACE$(5) + STRING$(40, "-") + $CRLF + $CRLF
    '*** Macro's for error handling
         HandleErrors
    END FUNCTION


  • Paul Dixon
    replied
    Michael,
    Floating Point and integer calculations are treated differently.
    When a FP divide by zero occurs, a special FP value is used to represent infinity. The FPU has a few such special values: the infinities, and the NaNs (NaN=Not a Number) which represent invalid results.
    The FPU will correctly calculate with that number following normal mathematical rules for calculations with infinity such as:
    ANYTHING * Infinity = infinity.
    ANYTHING + Infinity = infinity.
    ANYTHING - Infinity = -infinity.
    ANYTHING / Infinity = 0
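
    These rules can be tried out without touching the FPU directly. The sketch below uses Python floats, which are IEEE 754 doubles on common platforms and follow the same infinity arithmetic; `infinity_rules` is a name made up for the example, and it assumes a finite, positive value for ANYTHING. It also lists a few cases where the rules do not hold and the result is a NaN instead.

```python
import math

def infinity_rules(x: float) -> dict:
    """Evaluate the four rules for a finite, positive x."""
    inf = math.inf
    return {
        "x * inf": x * inf,
        "x + inf": x + inf,
        "x - inf": x - inf,
        "x / inf": x / inf,
    }

results = infinity_rules(5.0)
for rule, value in results.items():
    print(rule, "=", value)

# Cases where "ANYTHING" does not apply: each of these produces NaN.
exceptions = [0.0 * math.inf, math.inf - math.inf, math.inf / math.inf]
print(exceptions)
```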

    However, when that "infinity" is stored as a normal number it is treated as the largest number that can be represented in the corresponding FP format, that's why you see it as 9.999999E+998.

    Additionally, the FPU's flag for divide by zero is set and this can be trapped as an exception, although PowerBasic doesn't do that. You could set the relevant FPU control bit yourself to cause an exception on overflow or divide by zero.


    When an integer divide by zero occurs then the calculation ends immediately with a divide by zero exception.


    Your confusion is partly due to the compiler choosing to do calculations with the FPU even when there are only integers involved. This is why you get the FP response to the divide by zero and not the integer response to it.



    To get an integer divide by zero do something like this:
    Code:
    a&=1
    b&=0
    c&=a&\b&
    The compiler treats that as an integer only calculation and will crash with an Integer Division by Zero exception.

    Paul.


  • Michael Mattias
    replied
    >that would be checking if the divisor was 0.

    Is it that simple?

    I was thinking 'bigtime' overflow would cause that error ... so I ran this..

    Code:
    #COMPILE EXE
    #DIM ALL
    
    FUNCTION PBMAIN () AS LONG
         
         LOCAL X, Y, Z AS SINGLE
         
         LET X =  1E37
         LET Y =  1E-37
         LET Z =  X/Y
         STDOUT FORMAT$(ERR)
         STDOUT FORMAT$(Z)
         WAITKEY$
    
    END FUNCTION
    ..and all I got was a bogus result: 9.999999E+998

    So I changed to:
    Code:
      LOCAL X, Y, Z AS EXT
         LET X =  1E4000
         LET Y =  1E-4000
         LET Z =  X/Y
         STDOUT FORMAT$(ERR) 
         STDOUT FORMAT$(Z)
    .. and all I got was a bogus result with more digits: 9.99999999999999E+998

    Hmm, I have to rethink my concerns about 'divide by zero' since testing for zero is not really all that difficult to code. I probably even have some code here I could modify to handle that.

    Does not deal with the bogus data with no ERR, but it's a start.

    What you are saying, supported by above test, is "divide by zero exception occurs when you divide by zero". So let's test that
    Code:
         LET X =  1E4000
         LET Y =  0## ' 1E-4000
         LET Z =  X/Y
         STDOUT   FORMAT$(ERR)
         STDOUT FORMAT$(Z)
         WAITKEY$
    Huh? No exception! I just get the same bogus result. Now I am confused.

    MCM
    (using PB/CC 5.01)


  • Paul Dixon
    replied
    Cliff,
    the example program mentioned earlier http://www.powerbasic.com/support/pb...ad.php?t=37821
    only uses ASM because I needed to control the FPU registers. You can disassemble the instruction without ASM but an understanding of ASM certainly helps when you're working at such a low level.

    Paul.
