Announcement

Collapse
No announcement yet.

Add/subtract 2 quad integers as if they were unsigned?

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Michael Mattias
    replied
    The goal of Rosetta Code is to compare the same program in different languages, so I used the same function and variables names of the original one.
    Of course it's possible to write a completely different algorithm using meaningful names and comments, perhaps better, but this is not the purpose in this case.
    I hope you are not trying to compare the runtime performances of C++ code written for a 64-bit CPU to PowerBASIC code written for a 32-bit CPU. That's not even comparing apples with oranges, it's comparing apples with elephants.

    Leave a comment:


  • Claudio DalZovo
    replied
    By the way, using macros (with inline assembler) for lshift/rshift (instead of function calls) reduces the run time for the test setup by another generous second, to about 3.5 seconds.
    Curious that the other configuration proposed in Rosetta Code requires a very long time; I stopped the program after 4 hours ... (for C ++ the declared time is more or less that).
    "By hand" I can solve that configuration in just over a minute; the algorithm has some limitations, obviously ...

    I'll try to find another algorithm, just for fun (if and when I'll have time to spare).

    Leave a comment:


  • Claudio DalZovo
    replied
    Putting all together, considering that for the specific program the best form is to receive the result on a, I wrote this:
    Code:
    #compiler pbcc
    #compile exe
    #dim all
    
    macro AddQWord(a, b)
       !mov edx,b[4]
       !mov ecx,b
       !add a,ecx
       !adc a[4],edx
    end macro
    
    function pbmain() as long
       local a as quad
       local b as quad
       local c as quad
       local d as quad
       local j as long
       local t as single
       a = &hfe169b4c0a73d852
       b = 1
       d = a
       print bin$(a + b, 64)
       AddQWord(a, b)
       print bin$(a, 64) ' it's ok!
       a = d
    
       t = timer
       for j = 1 to 100000000
          c = a + b
       next
       print "T1:";format$(timer - t,"#####.########")
    
       t = timer
       for j = 1 to 100000000
          AddQword(a, b)
          a = d
       next
       print "T2:";format$(timer - t,"#####.########")
    end function
    It works, but still slightly slower than PB addition (0.17 instead of 0.1 on my system).
    However, the second step require an extra assignment to have always the same value for a, so probably the two methods are almost equivalent.
    Indeed, I tried to apply inline assembler math to the complete program and the execution time is almost the same.

    As Paul said, it's not worth using inline assembler in this case.
    It's unbelievable that all this efforts in the math give no result, while modifying the shift left/right brings a great enhancement.

    I learned a lot of things in this topic, thank you all!

    Leave a comment:


  • Dale Yarker
    replied
    Oh, yeah. I never use that way here. While working on it, I use "Compile And Run" in IDE; later either desktop icon or double click in File Explorer.

    Leave a comment:


  • Claudio DalZovo
    replied
    Originally posted by Dale Yarker View Post
    Didn't take as long as I thought it might: '[code]
    waitkey$ 'how did you see result without this?
    Launching the program from command prompt; the result remains on the screen

    Leave a comment:


  • Claudio DalZovo
    replied
    Thank you Paul and Dale!
    Very instructive, especially the concept that "as soon as you execute any non-ASM code, those registers are lost as the compiler will use them" and the idea to use macro instead of function calls.

    Leave a comment:


  • Dale Yarker
    replied
    Didn't take as long as I thought it might: '
    Code:
    #compiler pbcc
    #compile exe
    #dim all
    'not named AddQuad because QUAD is signed
    '32 bit unsigned is DWORD, QWORD should work for 64 bit unsigned
    macro function AddQWord(a, b) '
      macrotemp dHighA, dLowA, dHighB, dLowB
      local dHighA, dLowA, dHighB, dLowB as dword
      dHigha = hi(dword, a)
      dLowa = lo(dword, a)
      dHighb = hi(dword, b)
      dLowb = lo(dword, b)
      ! push ecx
      ! push edx
      ! mov edx, dHighb
      ! mov ecx, dLowb
      ! add dLowa, ecx
      ! adc dHigha, edx
      ! pop edx
      ! pop ecx
    end macro = mak(quad, dLowa, dHigha)
    
    function pbmain() as long
       local a as quad
       local b as quad
       a = &hfe169b4c0a73d852
       b = 1
       print bin$(a, 64)
       print bin$(b, 64)
       print bin$(a + b, 64)
       print
       print bin$(a, 64)
       print bin$(b, 64)
       print bin$(AddQWord(a, b), 64)
       waitkey$ 'how did you see result without this?
    end function '
    You should be able to do subtract.

    Cheers,

    Leave a comment:


  • Paul Dixon
    replied

    In ASM you can split and combine parts of numbers without needing the MAKxxx or HI/LO functions.

    Code:
    !mov eax,MyVariable 'load the low 4 bytes of MyVariable into EAX
    !mov eax,MyVariable[4] 'load the next 4 higher bytes of MyVariable into EAX
    
    !mov MyVariable,eax 'store EAX in the low 4 bytes of MyVariable
    !mov MyVariable[4],eax 'store EAX in the next 4 higher bytes of MyVariable
    This works as long as MyVariable is not a parameter passed ByRef.

    Leave a comment:


  • Paul Dixon
    replied
    Claudio,
    the problem with your code is that you assume all the CPU registers are yours to use as you wish.
    But the compiler uses some of them too.

    EAX, EDX, and ECX are scratch registers. They can be used as you wish within your own ASM code, but as soon as you execute any non-ASM code, those registers are lost as the compiler will use them.

    Code:
    function AddQuad(byval a as quad, byval b as quad) as quad
    local dHigh, dLow as dword
    
    dHigh = HI(DWORD, a)
    dLow = LO(DWORD, a)
    !mov ebx,dHigh
    !mov eax,dLow '##### you use EAX here
    
    dHigh = HI(DWORD, b) '##### but the compiler uses EAX here and overwrites your EAX value
    dLow = LO(DWORD, b)
    !mov edx,dHigh
    !mov ecx,dLow
    
    !add eax,ecx
    !adc ebx,edx
    
    !mov dHigh,ebx
    !mov dLow,eax
    function = mak(quad, dLow, dHigh)
    end function

    Leave a comment:


  • Dale Yarker
    replied
    Fixed the Add function.
    '
    Code:
    function AddQuad(byval a as quad, byval b as quad) as quad
       local dHigha, dLowa, dHighb, dLowb as dword
    
       dHigha = hi(dword, a)
       dLowa = lo(dword, a)
       dHighb = hi(dword, b)
       dLowb = lo(dword, b)
    
       ! mov edx, dHighb
       ! mov ecx, dLowb
    
       ! add dLowa, ecx
       ! adc dHigha, edx
    
       function = mak(quad, dLowa, dHigha)
    end function '
    Problem might have been BASIC, then assembler, then BASIC, then assembler, then BASIC with no PUSHs or POPs to save registers. Also, since a memory location (a variable) can be destination of ADD and ADC, we can save 4 MOVs.

    If the functions can be made into MACROs, the code goes in-line instead of a call.

    Cheers,

    Leave a comment:


  • Paul Dixon
    replied
    Claudio,
    don't credit me for the shifting, it's the standard way to do a 64 bit shift.
    It's published in many places on the internet. I didn't devise it.

    Leave a comment:


  • Claudio DalZovo
    replied
    Thank you Paul!
    If I'll post the complete program to Rosetta Code website, I'll add a comment to the shift functions in assembler giving you credits.
    I added creation of new configurations (by shuffling initial correct configuration) and a simple (text only) visualization of the solution.

    In any case, I would like to know what's wrong in my AddSub; it's always useful to learn something new (I'm not skilled with assembler, I used it sometimes, but it was about 30 year ago ...)

    Leave a comment:


  • Paul Dixon
    replied
    Claudio,
    I don't think it's worth trying to improve the speed of QUAD addition and subtraction.

    Although the compiler doesn't do it the most efficient way (it does QUAD arithmetic using the FPU instead of the CPU) it's still way faster than calling a FUNCTION would be.
    The compiler does something like this:
    !FILD FirstQuad
    !FILD SecondQuad
    !FADDP
    !FISTP ResultQuad

    So it's only 4 instructions to add 2 QUADs.
    Adding with the CPU would be slightly faster but if you put that in a FUNCTION it'll be a lot slower because of the overhead of calling the FUNCTION.

    Leave a comment:


  • Claudio DalZovo
    replied
    I'm trying to write 2 functions to do addition and subtraction of quads with inline assembler.
    I'm not able to modify Frank's code on post #9 to adapt it when the second number is not simply 1, but has an high part too.
    So I tried to apply instructions found here:
    http://www.c-jump.com/CIS77/MLabs/M1...c_examples.htm

    The following instructions add two 64-bit numbers received in EBX:EAX and EDX:ECX:

    The result is returned in EBX:EAX.

    Overflow/underflow conditions are indicated by the Carry flag.

    add eax, ecx ; add low parts EAX += ECX, set CF
    adc ebx, edx ; add high parts EBX += EDX, EBX += CF
    ; The result is in EBX:EAX
    ; NOTE: check CF or OF for overflow (*)


    The 64-bit subtraction is also simple and similar to the 64-bit addition:

    sub eax, ecx ; subtract low parts EAX -= ECX, set CF (borrow)
    sbb ebx, edx ; subtract high parts EBX -= EDX, EBX -= CF
    ; The result is in EBX:EAX
    ; NOTE: check CF or OF for overflow (*)
    I cannot get the correct results.

    Code:
    #compiler pbcc
    #compile exe
    #dim all
    
    function pbmain() as long
       local a as quad
       local b as quad
       a = &hfe169b4c0a73d852
       b = 1
       print bin$(a, 64)
       print bin$(b, 64)
       print bin$(a + b, 64)
       print
       print bin$(a, 64)
       print bin$(b, 64)
       print bin$(AddQuad(a, b), 64)
    end function
    
    function AddQuad(byval a as quad, byval b as quad) as quad
       local dHigh, dLow as dword
    
       dHigh = HI(DWORD, a)
       dLow = LO(DWORD, a)
       !mov ebx,dHigh
       !mov eax,dLow
    
       dHigh = HI(DWORD, b)
       dLow = LO(DWORD, b)
       !mov edx,dHigh
       !mov ecx,dLow
    
       !add eax,ecx
       !adc ebx,edx
    
       !mov dHigh,ebx
       !mov dLow,eax
       function = mak(quad, dLow, dHigh)
    end function
    
    function SubQuad(byval a as quad, byval b as quad) as quad ' not used in main
       local dHigh, dLow as dword
    
       dHigh = HI(DWORD, a)
       dLow = LO(DWORD, a)
       !mov ebx,dHigh
       !mov eax,dLow
    
       dHigh = HI(DWORD, b)
       dLow = LO(DWORD, b)
       !mov edx,dHigh
       !mov ecx,dLow
    
       !sub eax,ecx
       !sbb ebx,edx
    
       !mov dHigh,ebx
       !mov dLow,eax
       function = mak(quad, dLow, dHigh)
    end function
    The low part of the result is not correct, almost all zeros.
    What's wrong?
    Thank you.

    PS. However, even if the result is wrong, I did some speed test repating 100,000,000 times both a+b and AddQuad(a,b).
    Both are very fast, but the simple a+b is faster, so no real need to replacing it ...

    Leave a comment:


  • Claudio DalZovo
    replied
    Wow, that' true!
    I did not focus on shift because I thought that addition/subtraction are slower, but actually using assembler for shift gain a lot of time!
    Thank you!

    I'll try to replace with assembler addition and subtraction too, so program should became even more fast.

    Leave a comment:


  • Paul Dixon
    replied
    Claudio,
    the PB shifts for QUAD variables are not very efficient.
    You can try replacing them with thie following code.
    On my computer this doubles the speed of your program.

    Code:
    FUNCTION lshift(BYVAL v AS QUAD, BYVAL s AS DWORD) AS QUAD
    ' shift left v, s
    
    !mov edx,v[4]
    !mov eax,v
    !mov ecx,s
    
    !shld edx,eax,cl
    !shl eax,cl
    !test ecx,32
    !jz skip
    !mov edx,eax
    !xor eax,eax
    skip:
    
    !mov v[4],edx
    !mov v,eax
    
    FUNCTION = v
    
    
    END FUNCTION
    
    
    
    FUNCTION rshift(BYVAL v AS QUAD, BYVAL s AS DWORD) AS QUAD
    ' SHIFT RIGHT v, s
    !mov edx,v[4]
    !mov eax,v
    !mov ecx,s
    
    !shrd eax,edx,cl
    !shr edx,cl
    !test ecx,32
    !jz skip
    !mov eax,edx
    !xor edx,edx
    skip:
    !mov v[4],edx
    !mov v,eax
    
    FUNCTION = v
    
    
    END FUNCTION

    Leave a comment:


  • Claudio DalZovo
    replied
    Originally posted by Claudio DalZovo View Post
    there is certainly some logical problem because neither of them solves a simple configuration (the one commented in my source) that I can solve by hand in few moves (1-12 in their correct place, 15-14-13 on the last line).
    By the way, this is not true, my mistake.
    The case with 1-12 in the correct place and 15-14-13 on the last line has no solution.
    I knew that 13-15-14 has no solution (there is a complete story behind this: https://en.wikipedia.org/wiki/15_puzzle, see Sam Loyd), but this is true also for 15-14-13; some configuration cannot be reached (without removing and repositioning the tiles) and have no solution.

    With 15-14-13 (and others) on the last line, the solution is trivial.

    Leave a comment:


  • Claudio DalZovo
    replied
    Originally posted by Stuart McLachlan View Post
    If it used meaningful function and variable names and code comments, I'd be more inclined to take a look, but I really can't be bothered with trying to decypher the logic of that cryptic code
    The goal of Rosetta Code is to compare the same program in different languages, so I used the same function and variables names of the original one.
    Of course it's possible to write a completely different algorithm using meaningful names and comments, perhaps better, but this is not the purpose in this case.

    The optimization I mean is only regarding the coding, not about "decypher the logic of that cryptic code" and rewrite it in a better way.
    For example, in previous versions I used "select chr$(N3(n))" and chars in the various case ("l", "r" etc) to maintain the source more similar to the original; the simple variation to use long and constants instead of string halfed the execution time.
    So I wonder if is there some other "trick" to speed up execution.
    However, as highlighted by Steve in post #23, probably the difference is in the fact that go and C ++ programs natively use 64-bit math; so there is nothing to do to have the same speed in PB (in this particular case).



    Leave a comment:


  • Steve Hutchesson
    replied
    Hi Claudio,

    As I suspected, the C++ you were trying to port was 64 bit where integer maths in simple things like adds, subs, compares and so on are trivial to do as long as you load the values into registers first, but trying to do the same in a 32 bit compiler is always going to be a pain as the native data size of 32 bit is not big enough.

    You could try the scalar double instructions but you would have to convert it both ways which is probably too slow for what you are doing.

    Code:
        comisd  ; compare two dbl values and set eflags (jz jnz je jne etc...)
        cvtsd2si ; Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer
        cvtsi2sd ; Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value

    Leave a comment:


  • Stuart McLachlan
    replied
    > Is there any way to optimize the PB program?
    If it used meaningful function and variable names and code comments, I'd be more inclined to take a look, but I really can't be bothered with trying to decypher the logic of that cryptic code

    Leave a comment:

Working...
X