Announcement

Collapse
No announcement yet.

Weird performance boost

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Neil Croft
    replied
    Originally posted by Michael Mattias View Post
    An MS-VB application ... done on time and it looks good
    I've worked for a couple of large (global) IT companies since 1996 and I can vouch that's not actually always true. A case in point was a George III mainframe replacement program for Y2K which someone suddenly realised in October 1999 wasn't going to be done on time. It is fair to say in that case however that it was the project management and client's fault more than the programmers and language choice.

    I would agree it's a dog to deploy though.

    And don't get me started on MS load balancing and what "damage" that can do to an RFC compliant network.

    Leave a comment:


  • Neil Croft
    replied
    Originally posted by Paul Dixon View Post
    Michael,

    Important in this application? That'd be speed.

    Paul.
    That would be correct.

    This isn't something I'll look at in 2 years and wonder what the heck G& is supposed to represent so readability isn't important.

    There isn't a machine here with less than 2gig (and only the one with less than 4 gig) of RAM so memory constraints aren't relevant.

    It's my hobby (like Sudoku or Crosswords) and I program in my free time so development costs are zero.

    Michael is quite correct that there are many reasons to develop code in many ways. I was once a climbing instructor on a "management training" course for part of the software team of a large aircraft manufacturer. I had never seen any team before or since who could write so small and neatly on a whiteboard. Over some beers we discussed software development. They told a few horror stories (coding metres instead of feet or vice versa and how high inside an aircraft the nose wheel comes when you land with that bug.) Their requirements were absolute accuracy, complete readability and fairly quick development cycles. Performance really wasn't that high on their list of needs. Obscure, CPU specific opcodes were a big no-no. Easily maintained generic libraries was their game.

    It's also true that a notepad and pen should always be the first tools of programming anything other than a trivial home-use utility. ASM is just another one of the tools in the box, like a different spanner is to a mechanic. It should neither be dismissed as irrelevant or held to be the only way to do the job.

    Someone once said, if the only tool in your toolkit is a hammer, every problem looks increasingly like a nail.

    Leave a comment:


  • Paul Dixon
    replied
    Michael,
    It all depends on what is important in this application. Size? Resource usage? Maintainability? Features? Development expense? Deadline?
    Important in this application? That'd be speed.

    Paul.

    Leave a comment:


  • Michael Mattias
    replied
    It is a fallacy that faster is ALWAY better.

    It all depends on what is important in this application. Size? Resource usage? Maintainability? Features? Development expense? Deadline?

    Developers must always balance competing needs.

    Look at, let's say, the much-maligned Microsoft Visual BASIC. Is there anyone who can't give VB a good score for "ease of GUI development" and "how quickly I can have my screen working?"

    Of course, no one will say that development speed and ease of development comes for free: An MS-VB application is a dog to deploy and often runs like a pig. But by golly, it's done on time and it looks good - which may be plenty good enough!

    MCM

    Leave a comment:


  • Paul Dixon
    replied
    Neil,
    I don't think bit arrays are going to help at all. Your array is small enough that it easily fits in the CPU cache so there's going to be no access speed advantage with bit array but you will have the disadvantage of having to work with bits which is inherently slower.

    I think you'd be better off looking at my original suggestion of moving the g& loop to outside the a& loop. It will allow you to calculate sub totals once only instead of every time and there are other conditions you can test to exit early which will save you as much time as save now by ending the g& loop early.

    The following code works for the first set (but gives a different answer fore the second, I haven't worked out why yet) but it's entirely in BASIC (MCM will be happy!) and runs in a similar time on my machine to the last version you posted.

    It could be improved further by checking if the sum has already reached 3 before some of the loops or by converting parts to ASM and losing style points from Michael.

    Paul.
    Code:
        FOR z&=161 TO 1 STEP -2
            '
            sc&=0
                FOR g&=162 TO 1 STEP -1
                    IF g&<>z& THEN
    
                    FOR a&=44 TO 1 STEP -1
                        aSum&[email protected][g& OF 162,a& OF 49]
                        FOR b&=45 TO a&+1 STEP -1
                            bSum& = aSum& + @nptr[g& OF 162,b& OF 49]
                            FOR c&=46 TO b&+1 STEP -1
                                cSum& = bSum& + @nptr[g& OF 162,c& OF 49]
                                FOR d&=47 TO c&+1 STEP -1
                                    dSum& = cSum& + @nptr[g& OF 162,d& OF 49]
                                    IF dSum&>0 THEN   'if it's not > 0 by now then it will never exceed 2 in the last loop so skip the next 2 loops
                                    FOR e&=48 TO d&+1 STEP -1
                                        eSum& = dSum& + @nptr[g& OF 162,e& OF 49]
                                        IF eSum&>1 THEN  'if it's not > 1 by now then it will never exceed 2 in the next loop so skip the next loop
    
                                            FOR f&=49 TO e&+1  STEP -1
                                            fSum& = eSum& + @nptr[g& OF 162,f& OF 49]
    
                                            IF fSum& >2 THEN INCR sc&
    
                                            NEXT
                                        END IF
    
                                    NEXT
                                    END IF
                                NEXT
                            NEXT
                        NEXT
                    NEXT
    
                   END IF
    
                NEXT
            '
            IF sc&>tst& THEN tst&=sc&:dlt&=z&
            '
        NEXT

    Leave a comment:


  • Paul Dixon
    replied
    Michael,
    in a case such as this, how good something is is determined by how fast it is so 50% faster is 50% better. That's not to say you can't make it better still with a different algorithm, but as things stand at this point of this thread, the current code is 50% better than it was at the start of the thread.

    Paul.

    Leave a comment:


  • Paul Dixon
    replied
    Michael,
    Fifty percent compared to WHAT?
    Compared to what it was before it was augmented with ASM.
    Good BASIC does the job in X
    Good BASIC with a bit of ASM does the same job in X/2

    Bad BASIC does the same job in Y where Y>>X
    Bad BASIC with a bit of ASM does the same job in Y/2

    When you need something to run fast then a good algorithm helps and a bit of ASM helps more. You can't just ignore the help from one because the other is yet perfect.

    Paul.
    Last edited by Paul Dixon; 31 Jul 2009, 06:21 PM.

    Leave a comment:


  • Michael Mattias
    replied
    > Either way, it's 50% better.

    No, it's not.. it's 50% faster.

    Leave a comment:


  • Michael Mattias
    replied
    >if ASM improves the speed by 50%

    Fifty percent compared to WHAT?

    Hack ASM is not going to be 50% faster than quality BASIC.

    And no language is going to be fast with bad algorithms.

    I think it's safe to say this thread is NOT a demonstration of "when you can write better ASM than BASIC".... oh, gee whiz, I hope it isn't anyway. Bad BASIC makes bad COBOL look good.

    Leave a comment:


  • Paul Dixon
    replied
    Michael,
    if ASM improves the speed by 50% then it'll likely improve the speed of bad algorithms by 50% and good algorithms by 50%. Either way, it's 50% better.

    Paul.

    Leave a comment:


  • Neil Croft
    replied
    Originally posted by Michael Mattias View Post
    I think it's marketed under the brand name " THINKING BEFORE YOU WRITE THAT FIRST LINE OF CODE."
    I looked on Amazon but I couldn't find that.

    At the end of the day, ASM is just another language. Bad code is bad code whatever language you use. Ditto, good code is good code. I just hope to glean enough from these forums to move further away from the former and closer to the latter.

    Leave a comment:


  • Michael Mattias
    replied
    > but I'm not aware of any compiler or software that'll write a better algorithm.

    Compilers don't write application algorithms; programmers write application algorithms.

    Now, if someone could come up with a pre-compiler that went through code that said "You don't want to do it that way, you want to do it this way", that would be one rich developer.
    The precompiler you want is available now..right in your home or office, at zero cost.

    I think it's marketed under the brand name " THINKING BEFORE YOU WRITE THAT FIRST LINE OF CODE."


    MCM

    Leave a comment:


  • Neil Croft
    replied
    Calculating the pointer indexes had about as much effect as splitting the adds but that in turn made the ASM simple (well, simple enough for me to understand).

    The algorithm v implementation argument is as old as the hills and one that will rage forever I suspect. I do believe, based upon my own, limited, experience that a better algorithm wins every time but I'm not aware of any compiler or software that'll write a better algorithm. Now, if someone could come up with a pre-compiler that went through code that said "You don't want to do it that way, you want to do it this way", that would be one rich developer.

    On this problem, I recently stumbled upon a series of blog posts where a C++ chap had approached it differently. His code was fast but used vast amounts of memory and didn't even manage to achieve the currently known best value either.

    Regardless of whether this is the "right" way to solve the problem or not, I've had a simple improvement (splitting the adds) pointed out to me that will hopefully make me look at my code slightly differently in the future. I have also had my very limited knowledge of pointers and ASM greatly improved and therefore, as I've said before on here, learning anything is a "good thing"(tm). I just wish I could justify the money to get Knuth's books on programming and algorithms

    Current best progress on this particular problem is
    Code:
    07-31-2009 16:03:22 13721577  262239 98.1247% 1
    which is better than my Monte Carlo runs given I restarted the runs yesterday. I shall now go and look once more at my bit array attempts to see why they can't be as quick.

    Leave a comment:


  • Michael Mattias
    replied
    I'd love to load up this program into some kind of magic meter which would show me a little screen like

    Code:
      Original Execution time     nnnnn
      New Execution time         nnnnnn
      %improvement due to use of assembly language per se        XX
      %improvement due to use of better alogorithm and programming techinque YY
    Any bets against "YY" being greater than "XX?"

    I thought not.

    Leave a comment:


  • Neil Croft
    replied
    Latest function with Paul's ASM and Michael's drop out shortcut.

    Code:
    THREAD FUNCTION MyThread (BYVAL dummy AS LONG) AS LONG
        DIM nptr AS LONG POINTER
        LOCAL a&,b&,c&,d&,e&,f&,g&,z&
        LOCAL a2&,b2&,c2&,d2&,e2&,f2&,sc&
        nptr=VARPTR(n(0,0))
        tst&=0
        dlt&=0
        FOR z&=161 TO 1 STEP -2
            sc&=0
            FOR a&=44 TO 1 STEP -1
                a2&=a&*163
                FOR b&=45 TO a&+1 STEP -1
                    b2&=b&*163
                    FOR c&=46 TO b&+1 STEP -1
                        c2&=c&*163
                        FOR d&=47 TO c&+1 STEP -1
                            d2&=d&*163
                            FOR e&=48 TO d&+1 STEP -1
                                e2&=e&*163
                                !mov esi,nptr '<---add this line to make use of the now unused esi register
                                FOR f&=49 TO e&+1 STEP -1
                                    f2&=f&*163
                                     !mov edi,162   'use register edi to hold g&
                                    #ALIGN 32
                                    lp1:
                                        !cmp edi,z&
                                        !je skip1               'if = then skip the calculation
                                        !lea ecx,[edi*4+esi]        'faster address calculation
                                        !mov edx,a2&
                                        !mov eax,[ecx+edx*4]    'get a
                                        !mov edx,b2&
                                        !add eax,[ecx+edx*4]    'add b
                                        !mov edx,c2&
                                        !add eax,[ecx+edx*4]    'add c
                                        !cmp eax,3
                                        !je isc
                                        !mov edx,d2&
                                        !add eax,[ecx+edx*4]    'add d
                                        !cmp eax,3
                                        !je isc
                                        !mov edx,e2&
                                        !add eax,[ecx+edx*4]    'add e
                                        !cmp eax,3
                                        !je isc
                                        !mov edx,f2&
                                        !add eax,[ecx+edx*4]    'add f
                                        !cmp eax,3              '=3
                                        !jne skip1              'no.
                                        isc:
                                        !inc sc&                'yes, >2 so inc sc& and then ...
                                        !jmp ngloop                    '... exit the loop
                                        skip1:
                                        !dec edi
                                        !cmp edi,0
                                        !jne lp1
                                   ngloop:
                                NEXT
                            NEXT
                        NEXT
                    NEXT
                NEXT
            NEXT
            IF sc&>tst& THEN tst&=sc&:dlt&=z&
        NEXT
        FUNCTION=-1
    END FUNCTION

    Leave a comment:


  • Neil Croft
    replied
    Originally posted by Michael Mattias View Post
    > Here's my revised code and it works........shortcuts on the addition line

    Aren't you STILL doing all the additions?

    Can't you ...
    ! cmp eax, 2
    after each
    !add eax,[ecx+edx*4]

    ..and jump somewhere if > 2?

    MCM
    Yes, you're quite correct and the currently running code (without Paul's additional improvements actually does this. I'll post revised shortly when I've added Paul's other bits.

    Leave a comment:


  • Neil Croft
    replied
    Originally posted by Paul Dixon View Post
    Neil,
    Slight additional speed up.
    Code:
    FOR e&=48 TO d&+1 STEP -1
    	e2&=e&*163
    	!mov esi,nptr '<---add this line to make use of the now unused esi register
    
    	FOR f&=49 TO e&+1 STEP -1
    replace these 3 lines:
    Code:
    !mov ecx,edi 'g&             'offset to current row
    !sal ecx,2              '4 bytes per entry in array
    !add ecx,nptr           'add address of array
    with this one:
    Code:
    !lea ecx,[edi*4+esi]        'faster address calculation
    Paul.
    I shall amend and test.

    Leave a comment:


  • Neil Croft
    replied
    Originally posted by Paul Dixon View Post
    Neil,

    1) You drop out of the "g" loop one too soon

    Are you sure? The g loop is 162 to 2 and I exit when it gets decremented to 1. You appear to exit when it reaches 0 so you include an iteration at g=1 which was not there in the original BASIC loop.
    In the original, it was doing two tests inside the loop.
    Originally posted by Paul Dixon View Post
    2) the EXIT drops out of the f loop, not the g loop.
    Guilty as charged. I thought the speed improved a lot more than expected with that change but since it gave the right answer I didn't investigate.

    Paul.
    It gave the right answer on the first line because that hadn't really changed anything but was the complete "score" for the set unchanged. Hence the inclusion of the second line. It's no bad thing. If it had worked right first time I wouldn't have made the effort to understand exactly what it was doing.

    Leave a comment:


  • Michael Mattias
    replied
    > Here's my revised code and it works........shortcuts on the addition line

    Aren't you STILL doing all the additions?

    Can't you ...
    ! cmp eax, 2
    after each
    !add eax,[ecx+edx*4]

    ..and jump somewhere if > 2?

    MCM

    Leave a comment:


  • Paul Dixon
    replied
    Neil,
    Slight additional speed up.
    Code:
    FOR e&=48 TO d&+1 STEP -1
    	e2&=e&*163
    	!mov esi,nptr '<---add this line to make use of the now unused esi register
    
    	FOR f&=49 TO e&+1 STEP -1
    replace these 3 lines:
    Code:
    !mov ecx,edi 'g&             'offset to current row
    !sal ecx,2              '4 bytes per entry in array
    !add ecx,nptr           'add address of array
    with this one:
    Code:
    !lea ecx,[edi*4+esi]        'faster address calculation
    Paul.

    Leave a comment:

Working...
X