Announcement

Collapse
No announcement yet.

Weird performance boost

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Weird performance boost

    I've pasted below a function from a program I'm working on. I can post full code if it helps. I've been trying to speed it up and found an odd thing.

    Why is it quicker for this to step the innermost loop by two and do two comparisons than the the more obvious loop through one by one and do a single comparison? (it's also quicker than doing 3 at a time).

    And if anyone can suggest a way to speed this up at all, please do.

    Code:
    THREAD FUNCTION MyThread (BYVAL dummy AS LONG) AS LONG
        REGISTER g&
        REGISTER h&
        DIM nptr AS LONG POINTER
        nptr=VARPTR(n(0,0))
        tst&=0
        dlt&=0
        FOR z&=161 TO 1 STEP -2
            '
            sc&=0
            FOR a&=44 TO 1 STEP -1
                FOR b&=45 TO a&+1 STEP -1
                    FOR c&=46 TO b&+1 STEP -1
                        FOR d&=47 TO c&+1 STEP -1
                            FOR e&=48 TO d&+1 STEP -1
                                FOR f&=49 TO e&+1 STEP -1
                                    FOR g&=162 TO 2 STEP -2
                                        IF g&<>z& AND @nptr[g& OF 162,a& OF 49][email protected][g& OF 162,b& OF 49][email protected][g& OF 162,c& OF 49][email protected][g& OF 162,d& OF 49][email protected][g& OF 162,e& OF 49][email protected][g& OF 162,f& OF 49]>2 THEN INCR sc&:EXIT
                                        h&=g&-1
                                        IF h&<>z& AND @nptr[h& OF 162,a& OF 49][email protected][h& OF 162,b& OF 49][email protected][h& OF 162,c& OF 49][email protected][h& OF 162,d& OF 49][email protected][h& OF 162,e& OF 49][email protected][h& OF 162,f& OF 49]>2 THEN INCR sc&:EXIT
                                    NEXT
                                NEXT
                            NEXT
                        NEXT
                    NEXT
                NEXT
            NEXT
            '
            IF sc&>tst& THEN tst&=sc&:dlt&=z&
            '
        NEXT
        '
        FUNCTION=-1
    END FUNCTION
    Neil Croft (cissp)

  • #2
    I get no significant difference in speed.

    Possible improvements would include:

    The g& FOR/NEXT loop doesn't depend on other variables so move the g& loop to outside the a& loop.
    This will allow the comparison to z& to take place and exit without having to initiate all the other loops.

    Also, you're executing the IF h&<>z& AND @nptr[h& ...etc. line over 1,000,000,000 times on the first iteration of z& which is 6,000,000,000 index calculations because your inner loop calculates every pointer every time through the inner loop.
    If you move the g& loop as suggested then you'll be able to calculate @nptr[g& OF 162,a& OF 49] once at the start of the a& loop and @nptr[g& OF 162,b& OF 49] once at the start of the b& loop and so on. That ought to cut the number of index calculation enormously as the inner loop would end up looking up a single element 1,000,000,000 times and adding it to the sum so far.

    Don't forget to check which variables of the rearranged code would now best suit being register variables.


    Paul.

    Comment


    • #3
      Another obvious saving would come from treating your 2 dimensional array as a single dimension and calculating the index into the data yourself.
      With the suggestions I made above, the inner loop would just evaluate the following line:
      Code:
      IF SumSoFar + @nptr[g& OF 162,f& OF 49]>2 THEN INCR sc&:EXIT
      The index calculation which will be done 1,000,000,000 times will be something like BaseOfArray + g&*163 + f&.
      But g& doesn't change within the inner loops so if you calculate the index into the array yourself then you can do the multiply one time at the start of the g& loop and reuse the result each time around the loop saving lots of multiplies.
      Code:
      gScaled&=g&*163
      ..
      ..
      IF SumSoFar + @nptr[gScaled&+f&]>2 THEN INCR sc&:EXIT
      Paul.

      Comment


      • #4
        Originally posted by Paul Dixon View Post
        I get no significant difference in speed.
        6.5 seconds per loop of z& vs 7 seconds here on a dual 3.6GHz Xeon machine. Note this thread is replicated within the main thread. This one does the odd z&s. The main thread does the evens.

        Originally posted by Paul Dixon View Post
        Possible improvements would include:

        The g& FOR/NEXT loop doesn't depend on other variables so move the g& loop to outside the a& loop.
        This will allow the comparison to z& to take place and exit without having to initiate all the other loops.
        Not so. The g& loop is cut short once a match is made to the a/b/c/d/e/f loops. If it's moved outside of the a& loop then it counts all matches for all combinations, not just the first match for all combinations.

        Originally posted by Paul Dixon View Post
        Also, you're executing the IF h&<>z& AND @nptr[h& ...etc. line over 1,000,000,000 times on the first iteration of z& which is 6,000,000,000 index calculations because your inner loop calculates every pointer every time through the inner loop.
        If you move the g& loop as suggested then you'll be able to calculate @nptr[g& OF 162,a& OF 49] once at the start of the a& loop and @nptr[g& OF 162,b& OF 49] once at the start of the b& loop and so on. That ought to cut the number of index calculation enormously as the inner loop would end up looking up a single element 1,000,000,000 times and adding it to the sum so far.
        Now, put like that, it may be that putting G& outside A& and flagging once a match is made might work quicker. Hmmmm, I can feel a bit of code fiddling coming on there. I did try bit arrays to reduce the problem to single indexes but it was much slower. Could have been me though.

        Originally posted by Paul Dixon View Post
        Don't forget to check which variables of the rearranged code would now best suit being register variables.
        I won't forget. I tried a variety of options to get to g& and h&. h& surprised me a bit.

        At the end of the day, it may just be that the time it's taking is the time it takes. I shall print out the source and go for a bath. Guaranteed to see a shortcut as soon as I get in the water.:laugh:
        Neil Croft (cissp)

        Comment


        • #5
          All clean and refreshed. Unfortunately the only thing I realised in the bath was that you don't really have enough to work out what the program's doing. Easily solved.

          Code:
          '
          ' 162-HCt.exe
          '
          ' Calculating complete covers using hill climbing
          ' reference http://www.ccrwest.org/cover/cover.pdf
          '
          ' Attempts to derive a solution for c(49,6,3,6)=162. Current best recorded is 163.
          '
          ' Program is threaded to fully utilise dual CPU machine.
          '
          '
          #OPTIMIZE SPEED
          #OPTION VERSION5
          #BREAK ON
          #INCLUDE "win32api.inc"
          FUNCTION PBMAIN () AS LONG
              ' make nice
              ff&=SetPriorityClass(GetCurrentProcess, %IDLE_PRIORITY_CLASS)
              RANDOMIZE TIMER
              CONSOLE NAME "162 Hill Climber"
              '
              ' speed things up a bit
              '
              REGISTER f&
              REGISTER g&
              '
              bps&=13983816
              '
              ' n() keeps record of the sets we're checking. Global 'cos the thread reads it. Could be bit array kept in 162 quads but was slower when tried.
              '
              DIM n(162,49) AS GLOBAL LONG
              '
              ' and it's pointers
              '
              DIM nptr AS LONG POINTER
              nptr=VARPTR(n(0,0))
              '
              ' Nasty but effective. Couple of globals for the thread's returns.
              '
              GLOBAL tst&,dlt&
              '
              ' Initial set if we're not restarting.
              '
              n(1,1)=1
              n(1,2)=1
              n(1,3)=1
              n(1,4)=1
              n(1,5)=1
              n(1,6)=1
              '
              FOR a&=2 TO 162
                  FOR b&=1 TO 6
                      x&=RND(1,49)
                      WHILE n(a&,x&)
                          x&=RND(1,49)
                      WEND
                      n(a&,x&)=1
                  NEXT
              NEXT
              '
              ' load a set if we've run before
              '
              IF DIR$("progress.bin")="progress.bin" THEN
                  ff&=FREEFILE
                  OPEN "progress.bin" FOR BINARY AS ff&
                  GET# ff&,,n()
                  CLOSE
              END IF
              '
              ' Start time for reference.
              '
              PRINT TIME$
              '
              ' and off we go.
              '
              WHILE sc&<bps&
                  '
                  ' throw a thread here to do odd loops. Do evens inline. Comments here apply generally to the thread too.
                  '
                  THREAD CREATE MyThread(BYVAL dummy&) TO idThread&
                  THREAD SET PRIORITY idthread&,%THREAD_PRIORITY_NORMAL
                  '
                  ' initialise vars to track progress.
                  '
                  ts&=0
                  dl&=0
                  '
                  ' loop backwards through the evens
                  '
                  FOR z&=162 TO 2 STEP -2
                      ' show progress.
                      CONSOLE NAME "162 Hill Climber"+STR$(z&)
                      '
                      ' Each "row" of n() contains exactly 6 1's between 1 and 49 [The first 6 in C(49,6,3,6)]
                      '
                      sc&=0
                      ' pick six [The second 6 in C(49,6,3,6)] Test against all possible perms. loop backwards for speed
                      FOR a&=44 TO 1 STEP -1
                          FOR b&=45 TO a&+1 STEP -1
                              FOR c&=46 TO b&+1 STEP -1
                                  FOR d&=47 TO c&+1 STEP -1
                                      FOR e&=48 TO d&+1 STEP -1
                                          FOR f&=49 TO e&+1 STEP -1
                                              ' check current 162 lines against the loops. Exit if a line matches more than 2 [the 3 in C(49,6,3,6)] then move on.
                                              ' Note we do this 162 times, missing out one set each time to see which set has the least impact if omitted/changed.
                                              FOR g&=162 TO 2 STEP -1
                                                  ' pointers used for speed.
                                                  IF g&<>z& AND @nptr[g& OF 162,a& OF 49][email protected][g& OF 162,b& OF 49][email protected][g& OF 162,c& OF 49][email protected][g& OF 162,d& OF 49][email protected][g& OF 162,e& OF 49][email protected][g& OF 162,f& OF 49]>2 THEN INCR sc&:EXIT
                                              NEXT
                                          NEXT
                                      NEXT
                                  NEXT
                              NEXT
                          NEXT
                      NEXT
                      '
                      ' If that set scored better then make a note.
                      '
                      IF sc&>ts& THEN ts&=sc&:dl&=z&
                  NEXT
                  '
                  ' Route out for immediate finish, finish on completion of current test or just to pause for a bit.
                  '
                  i$=INKEY$
                  IF i$="q" THEN EXIT FUNCTION
                  IF i$="p" THEN CONSOLE NAME "Paused":WAITKEY$
                  IF i$="e" THEN drop&=1
                  '
                  ' wait for thread here.
                  '
                  ' wait for the thread function to end (which it may have even before we got here): (recognise this comment?) Thanks to MCM for example of WaitForSingleObject
                  CONSOLE NAME "162 Hill Climber - Waiting for thread"
                  dummy&=WaitForSingleObject (idThread&, %INFINITE)
                  THREAD CLOSE idThread& TO dummy&
                  CONSOLE NAME "162 Hill Climber"
                  '
                  ' If the thread's returns are better then use them
                  '
                  IF tst&>ts& THEN ts&=tst&:dl&=dlt&
                  '
                  ' Score the complete 162 picks for display and recording purposes
                  '
                  sc&=0
                  FOR a&=44 TO 1 STEP -1
                      FOR b&=45 TO a&+1 STEP -1
                          FOR c&=46 TO b&+1 STEP -1
                              FOR d&=47 TO c&+1 STEP -1
                                  FOR e&=48 TO d&+1 STEP -1
                                      FOR f&=49 TO e&+1 STEP -1
                                          FOR g&=162 TO 1 STEP -1
                                              IF @nptr[g& OF 162,a& OF 49][email protected][g& OF 162,b& OF 49][email protected][g& OF 162,c& OF 49][email protected][g& OF 162,d& OF 49][email protected][g& OF 162,e& OF 49][email protected][g& OF 162,f& OF 49]>2 THEN INCR sc&:EXIT
                                          NEXT
                                      NEXT
                                  NEXT
                              NEXT
                          NEXT
                      NEXT
                  NEXT
                  '
                  INCR sad&   ' Just a little counter. Since A Display
                  IF sc&>bsc& THEN
                      ' output best set to date
                      bsc&=sc&
                      PRINT
                      PRINT DATE$;" ";TIME$;bsc&;bps&-bsc&;FORMAT$(bsc&/bps&,"0.0000%");sad&
                      FOR a&=1 TO 162
                          PRINT a&,
                          FOR b&=1 TO 49
                              IF n(a&,b&) THEN PRINT b&;
                          NEXT
                          PRINT
                      NEXT
                      PRINT DATE$;" ";TIME$;bsc&;bps&-bsc&;FORMAT$(bsc&/bps&,"0.0000%");sad&
                      '
                      ' Save it for re-starts
                      '
                      ff&=FREEFILE
                      OPEN "progress.bin" FOR BINARY AS ff&
                      PUT# ff&,,n()
                      CLOSE
                      '
                      ' keep a log
                      '
                      ff&=FREEFILE
                      OPEN "log.txt" FOR APPEND AS ff&
                      WRITE# ff&,DATE$;TIME$;bsc&;sad&
                      CLOSE
                      sad&=0
                  END IF
                  '
                  ' clear set with least impact as calculated above
                  '
                  FOR a&=1 TO 49
                      n(dl&,a&)=0
                  NEXT
                  '
                  ' pick a new, random set
                  '
                  FOR a&=1 TO 6
                      x&=RND(1,49)
                      WHILE n(dl&,x&)
                          x&=RND(1,49)
                      WEND
                      n(dl&,x&)=1
                  NEXT
                  '
                  ' Check again if we want to quit.
                  '
                  i$=INKEY$
                  IF i$="q" OR i$="e" OR drop&=1 THEN EXIT FUNCTION
                  IF i$="p" THEN CONSOLE NAME "Paused":WAITKEY$:CONSOLE NAME "162 Hill Climber"
                  '
                  ' do it all again
                  '
              WEND
              ' let's us know that the program completed as opposed to being aborted.
              PRINT "Done"
              WAITKEY$
          END FUNCTION
          '
          ' See comments above.
          '
          THREAD FUNCTION MyThread (BYVAL dummy AS LONG) AS LONG
              REGISTER f&
              REGISTER g&
              DIM nptr AS LONG POINTER
              nptr=VARPTR(n(0,0))
              tst&=0
              dlt&=0
              FOR z&=161 TO 1 STEP -2
                  '
                  sc&=0
                  FOR a&=44 TO 1 STEP -1
                      FOR b&=45 TO a&+1 STEP -1
                          FOR c&=46 TO b&+1 STEP -1
                              FOR d&=47 TO c&+1 STEP -1
                                  FOR e&=48 TO d&+1 STEP -1
                                      FOR f&=49 TO e&+1 STEP -1
                                          FOR g&=162 TO 2 STEP -1
                                              IF g&<>z& AND @nptr[g& OF 162,a& OF 49][email protected][g& OF 162,b& OF 49][email protected][g& OF 162,c& OF 49][email protected][g& OF 162,d& OF 49][email protected][g& OF 162,e& OF 49][email protected][g& OF 162,f& OF 49]>2 THEN INCR sc&:EXIT
                                          NEXT
                                      NEXT
                                  NEXT
                              NEXT
                          NEXT
                      NEXT
                  NEXT
                  '
                  IF sc&>tst& THEN tst&=sc&:dlt&=z&
                  '
              NEXT
              '
              FUNCTION=-1
          END FUNCTION
          Neil Croft (cissp)

          Comment


          • #6
            Neil,
            Code:
            6.5 seconds per loop of z& vs 7 seconds here
            That's only 7% difference so it's well within the variation you might get just because of code alignment.


            Paul.

            Comment


            • #7
              I did wonder about that. I recall a thread recently about aligning tight loops.
              Neil Croft (cissp)

              Comment


              • #8
                Neil,
                The compiler will align most loops for you anyway (that's what #option faster does) but that doesn't remove all code alignment problems. With some processors, if an instruction stradles a cache line boundary then it runs slower that if it's entirely within a single cache line.

                Paul.

                Comment


                • #9
                  Remove the repeated multiplies from the innermost loop: 25% faster
                  Code:
                              ' pick six [The second 6 in C(49,6,3,6)] Test against all possible perms. loop backwards for speed
                              FOR a&=44 TO 1 STEP -1
                                  a2&=a&*163
                                  FOR b&=45 TO a&+1 STEP -1
                                      b2&=b&*163
                                      FOR c&=46 TO b&+1 STEP -1
                                          c2&=c&*163
                                          FOR d&=47 TO c&+1 STEP -1
                                              d2&=d&*163
                                              FOR e&=48 TO d&+1 STEP -1
                                                  e2&=e&*163
                                                  FOR f&=49 TO e&+1 STEP -1
                                                      f2&=f&*163
                                                
                                                      ' check current 162 lines against the loops. Exit if a line matches more than 2 [the 3 in C(49,6,3,6)] then move on.
                                                      ' Note we do this 162 times, missing out one set each time to see which set has the least impact if omitted/changed.
                                                      FOR g&=162 TO 2 STEP -1
                                                          ' pointers used for speed.
                                                          
                                                        '   IF g&<>z& AND @nptr[g& OF 162,a& OF 49][email protected][g& OF 162,b& OF 49][email protected][g& OF 162,c& OF 49][email protected][g& OF 162,d& OF 49][email protected][g& OF 162,e& OF 49][email protected][g& OF 162,f& OF 49]>2 THEN INCR sc&:EXIT
                                                          IF g&<>z& AND @nptr[g&+a2&][email protected][g&+b2& ][email protected][g&+c2&][email protected][g&+d2&][email protected][g&+e2&][email protected][g&+f2&]>2 THEN INCR sc&:EXIT
                                                          
                                                      NEXT
                                                  NEXT
                                              NEXT
                                          NEXT
                                      NEXT
                                  NEXT
                              NEXT

                  Comment


                  • #10
                    Replaced the critical calculation in the inner loop with a bit of ASM. A further 50% reduction in time.
                    You might need to test this to make sure it's right as I don't know what answer I'm waiting for so I can't confirm it is working.
                    Code:
                    FOR g&=162 TO 2 STEP -1
                        ' pointers used for speed.
                    
                      '   IF g&<>z& AND @nptr[g& OF 162,a& OF 49][email protected][g& OF 162,b& OF 49][email protected][g& OF 162,c& OF 49][email protected][g& OF 162,d& OF 49][email protected][g& OF 162,e& OF 49][email protected][g& OF 162,f& OF 49]>2 THEN INCR sc&:EXIT
                        ' IF g&<>z& AND @nptr[g&+a2&][email protected][g&+b2& ][email protected][g&+c2&][email protected][g&+d2&][email protected][g&+e2&][email protected][g&+f2&]>2 THEN INCR sc&:EXIT
                    
                    
                            !mov eax,g&             'get g&
                            !cmp eax,z&             'compare with z&
                            !je skip1               'if = then skip the calculation
                    
                            !mov ecx,g&             'offset to current row
                            !sal ecx,2              '4 bytes per entry in array
                            !add ecx,nptr           'add address of array
                            !mov edx,a2&
                            !mov eax,[ecx+edx*4]    'get a
                            !mov edx,b2&
                            !add eax,[ecx+edx*4]    'add b
                            !mov edx,c2&
                            !add eax,[ecx+edx*4]    'add c
                            !mov edx,d2&
                            !add eax,[ecx+edx*4]    'add d
                            !mov edx,e2&
                            !add eax,[ecx+edx*4]    'add e
                            !mov edx,f2&
                            !add eax,[ecx+edx*4]    'add f
                    
                            !cmp eax,2              '>2?
                            !jle skip1              'no.
                            !inc sc&                'yes, >2 so inc sc& and then ...
                            EXIT                    '... exit the loop
                    
                            skip1:
                    
                    NEXT
                    My PC now does a z& loop in 10 seconds instead of the original 26.

                    Paul.

                    Comment


                    • #11
                      Code:
                      IF g&<>z& AND @nptr[g&+a2&][email protected][g&+b2& ][email protected][g&+c2&]  _
                         [email protected][g&+d2&][email protected][g&+e2&][email protected][g&+f2&]>2 THEN INCR sc&:EXIT
                      A. This line is always doing ALL the additions. If these values are not signed you could do 'em one at a time and exit as soon as any addition creates Sum> 2

                      B. Maybe nested FOR..NEXT is not the best algorithm. With all those numbers being sequential there may be a some kind of polynomial expansion.

                      C. maybe it's no big deal this takes seven seconds... as long as your program does not have wait for the answer.. or if you are not doing all that many of them. Application not described.

                      MCM
                      Michael Mattias
                      Tal Systems Inc. (retired)
                      Racine WI USA
                      [email protected]
                      http://www.talsystems.com

                      Comment


                      • #12
                        Loads to go on there. Thanks Paul. For reference, if you set the "randomize timer" line to "randomize 1" so the initialisation is always the same, the results display for the first set displayed is

                        date time 13283911 699905 94.9949% 1

                        and the second set is

                        date time 13290206 693610 95.0399% 1

                        I shall play around and get back tomorrow with progress.

                        Michael, there's a bit at the start of the second (ie full) code describing what it's trying to do and a link to a paper that describes the method.

                        Still,

                        A) Yes, that'll work. they're all ones and zeroes.

                        B) The dearth of good algorithms to calculate complete covers is, unfortunately, well documented. (the dearth that is, not the algorithms). I have 5 computers/9 CPU running a complete brute force check but that's never likely to finish in my lifetime unless Moore's law picks up. Monte Carlo and simulated annealing are other methods I am or have investigated/investigating.

                        C) Hill climbing, by it's very nature, is very repetitive. From the referenced .pdf, "Repeat until all t-sets are covered or until time runs out.". The method works well for smaller sets. I've been running C(27,6,3,4) for a while and that's come close (but, as the saying goes, no cigar). It just takes a long time when there are 13,983,816 t-sets.

                        Don't get me wrong, I appreciate there's a lot of maths and computing going on there and don't expect a result in under 15 minutes. Due to the complete lack of generic formulae to calculate the minimum bound for complete covers, there's not even a guarantee an answer exists for <163 however the best 163 line cover has a lot of redundancy in it.
                        Neil Croft (cissp)

                        Comment


                        • #13
                          The answers I get are not identical to yours, it maybe needs investigating. The following changes the whole g& FOR/NEXT loop to ASM and now runs in about 1.2 seconds per z& iteration. Down from 26 seconds.

                          This uses EDI to hold g& but requires that register variables are not used (edi and esi are where the compiler puts register variables). so use #register none in your code and don't declare F& and G& as register variables.
                          Code:
                          FOR f&=49 TO e&+1 STEP -1
                              f2&=f&*163
                          
                              ' check current 162 lines against the loops. Exit if a line matches more than 2 [the 3 in C(49,6,3,6)] then move on.
                              ' Note we do this 162 times, missing out one set each time to see which set has the least impact if omitted/changed.
                            '  FOR g&=162 TO 2 STEP -1
                               !mov edi,162   'use register edi to hold g&
                                  ' pointers used for speed.
                            #ALIGN 32
                             lp1:
                                '   IF g&<>z& AND @nptr[g& OF 162,a& OF 49][email protected][g& OF 162,b& OF 49][email protected][g& OF 162,c& OF 49][email protected][g& OF 162,d& OF 49][email protected][g& OF 162,e& OF 49][email protected][g& OF 162,f& OF 49]>2 THEN INCR sc&:EXIT
                                '   IF g&<>z& AND @nptr[g&+a2&][email protected][g&+b2& ][email protected][g&+c2&][email protected][g&+d2&][email protected][g&+e2&][email protected][g&+f2&]>2 THEN INCR sc&:EXIT
                          
                          
                                    '  !mov eax,g&             'get g&
                                    '  !cmp eax,z&             'compare with z&
                                     !cmp edi,z&
                                      !je skip1               'if = then skip the calculation
                          
                                      !mov ecx,edi 'g&             'offset to current row
                                      !sal ecx,2              '4 bytes per entry in array
                                      !add ecx,nptr           'add address of array
                                      !mov edx,a2&
                                      !mov eax,[ecx+edx*4]    'get a
                                      !mov edx,b2&
                                      !add eax,[ecx+edx*4]    'add b
                                      !mov edx,c2&
                                      !add eax,[ecx+edx*4]    'add c
                                      !mov edx,d2&
                                      !add eax,[ecx+edx*4]    'add d
                                      !mov edx,e2&
                                      !add eax,[ecx+edx*4]    'add e
                                      !mov edx,f2&
                                      !add eax,[ecx+edx*4]    'add f
                          
                                      !cmp eax,2              '>2?
                                      !jle skip1              'no.
                                      !inc sc&                'yes, >2 so inc sc& and then ...
                                      EXIT                    '... exit the loop
                          
                                      skip1:
                          
                             ' NEXT
                             !dec edi
                             !cmp edi,1
                             !jne lp1
                          
                          NEXT

                          Comment


                          • #14
                            >they're all ones and zeroes

                            Really? Why not store your ones and zeros as bits and read/add 'em 16, 32 or 64 bits at a time as INTEGER, LONG or QUAD? You may have to twink around with the bit orders, but there has got to be some kind of efficiency available here with one and zero being the only values possible.....

                            MCM
                            Michael Mattias
                            Tal Systems Inc. (retired)
                            Racine WI USA
                            [email protected]
                            http://www.talsystems.com

                            Comment


                            • #15
                              Neil,
                              now I get the correct first set matching yours. I assume the problem was the progress.bin file left from a previous run which needed to be deleted. My second set of results doesn't match yours. It's:

                              date time 13289093 694723 95.0319% 4


                              Your original code took 55m49s to give the first set of results, the last code I posted ran in 3m19s on the same computer, only 17 times faster but still not bad.

                              I'm sure there are still some slight improvements to be made but further big gains will probably have to alter the algorithm you use.

                              Maybe 17 times faster will get it finished in your lifetime now?

                              Paul.

                              Comment


                              • #16
                                Paul,

                                After a poor night's sleep (I'm home, sick with the flu.) I made a strong coffee :coffee4: and sat down to actually understand what your code was doing.

                                The reason your numbers didn't match are
                                1) You drop out of the "g" loop one too soon and
                                2) the EXIT drops out of the f loop, not the g loop.

                                Here's my revised code and it works. Yay! Woo! The best PB only code I've had, with your revised pointer calcs and Michael's shortcuts on the addition line, does just over 5 seconds per z. The asm does generally around 3.5.

                                Code:
                                FOR f&=49 TO e&+1 STEP -1
                                    f2&=f&*163
                                
                                     !mov edi,162   'use register edi to hold g&
                                    #ALIGN 32
                                    lp1:
                                       !cmp edi,z&
                                        !je skip1               'if = then skip the calculation
                                        !mov ecx,edi 'g&             'offset to current row
                                        !sal ecx,2              '4 bytes per entry in array
                                        !add ecx,nptr           'add address of array
                                        !mov edx,a2&
                                        !mov eax,[ecx+edx*4]    'get a
                                        !mov edx,b2&
                                        !add eax,[ecx+edx*4]    'add b
                                        !mov edx,c2&
                                        !add eax,[ecx+edx*4]    'add c
                                        !mov edx,d2&
                                        !add eax,[ecx+edx*4]    'add d
                                        !mov edx,e2&
                                        !add eax,[ecx+edx*4]    'add e
                                        !mov edx,f2&
                                        !add eax,[ecx+edx*4]    'add f
                                        !cmp eax,2              '>2?
                                        !jle skip1              'no.
                                        !inc sc&                'yes, >2 so inc sc& and then ...
                                        !jmp ngloop                    '... exit the loop
                                        skip1:
                                       !dec edi
                                       !cmp edi,0
                                       !jne lp1
                                       ngloop:
                                NEXT
                                Neil Croft (cissp)

                                Comment


                                • #17
                                  Originally posted by Michael Mattias View Post
                                  >they're all ones and zeroes

                                  Really? Why not store your ones and zeros as bits and read/add 'em 16, 32 or 64 bits at a time as INTEGER, LONG or QUAD? You may have to twink around with the bit orders, but there has got to be some kind of efficiency available here with one and zero being the only values possible.....

                                  MCM
                                  My heart says you're right and an array 162 quads should be a better way of storing it all but my experiences to date say otherwise. My level of skill and knowledge of pointers and arrays seems to be letting me down though. Just using BIT commands hasn't delivered the performance boost I had hoped for. That said, the last 24 hours and yours and Paul's input has given me some new perspectives and ideas.

                                  Thanks to you both. Again.
                                  Neil Croft (cissp)

                                  Comment


                                  • #18
                                    Neil,

                                    1) You drop out of the "g" loop one too soon

                                    Are you sure? The g loop is 162 to 2 and I exit when it gets decremented to 1. You appear to exit when it reaches 0 so you include an iteration at g=1 which was not there in the original BASIC loop.

                                    2) the EXIT drops out of the f loop, not the g loop.
                                    Guilty as charged. I thought the speed improved a lot more than expected with that change but since it gave the right answer I didn't investigate.

                                    Paul.

                                    Comment


                                    • #19
                                      Neil,
                                      Slight additional speed up.
                                      Code:
                                      FOR e&=48 TO d&+1 STEP -1
                                      	e2&=e&*163
                                      	!mov esi,nptr '<---add this line to make use of the now unused esi register
                                      
                                      	FOR f&=49 TO e&+1 STEP -1
                                      replace these 3 lines:
                                      Code:
                                      !mov ecx,edi 'g&             'offset to current row
                                      !sal ecx,2              '4 bytes per entry in array
                                      !add ecx,nptr           'add address of array
                                      with this one:
                                      Code:
                                      !lea ecx,[edi*4+esi]        'faster address calculation
                                      Paul.

                                      Comment


                                      • #20
                                        > Here's my revised code and it works........shortcuts on the addition line

                                        Aren't you STILL doing all the additions?

                                        Can't you ...
                                        ! cmp eax, 2
                                        after each
                                        !add eax,[ecx+edx*4]

                                        ..and jump somewhere if > 2?

                                        MCM
                                        Michael Mattias
                                        Tal Systems Inc. (retired)
                                        Racine WI USA
                                        [email protected]
                                        http://www.talsystems.com

                                        Comment

                                        Working...
                                        X