Announcement

Collapse
No announcement yet.

PB Speed Challenge

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • PB Speed Challenge

    hi all!

    below is the description and source of a proggy that does permutation of a
    dataset, i found one routine doing this in the forum here, but is so slow that
    good old c had to come in to the rescue...

    compare the speed of this proggy to that in the forums here and you'll be blown
    away! the pb routine is so lame in comparison, well, uhm, that there is no
    comparison! the one from the forum runs way slower, even on 200mhz pentium
    system under nt2000.
    here's the code from the
    forum: http://www.powerbasic.com/support/pb...ad.php?t=22353

    anyways, now that the mood is set, check out this proggy by paul and think it
    over, can you write something better, faster?

    the algorithm used is fairly simple. for each item i in a set of n items:

    remove item i from the data set;
    find all permutations of the remaining n - 1 items;
    insert item i back into the set in its original position
    the recursive function perm() shows this algorithm. each permutation is printed
    as it is found.

    the rest of the code is relatively lengthy, but merely supports the above
    algorithm by providing it with appropriate data structures. two data structures
    are provided:

    a data set, an ordered list allowing two operations:
    removing an item from a given position, and
    inserting an item into a given position
    a stack to hold the current permutation, which like all stacks allows two
    operations:
    pushing an item onto the top of the stack, and
    popping an item off the top of the stack
    in order to understand the algorithm used to find the permutations, it is not
    necessary to delve into the details of these functions or data structures any
    more than it is necessary to understand exactly how a printf() call produces
    output on your screen. the main() and perm() functions describe the algorithm
    completely.

    functions to:

    initialize and free memory used by the data set,
    print out the contents of the stack, and
    retrieve the number of data items from the command line
    are also provided.


    ----------------------------------------------
    usage:
    pass the number of data items to the command line as the single argument. an
    example session is shown below:

    perm 3
    permutations of 3 items:

    abc
    acb
    bac
    bca
    cab
    cba

    6 permutations in all.
    ---------------------------------------------


    executable size (in kb)14.256!
    compiled and executed on an intel 166mhz pentium processor
    with 32mb ram, running linux kernel 2.2.5-15

    here you see the performance of the algorithm, where it becomes apparent
    that the output (to screen etc) consumes more time than algorithm itself.
    so instead of printing it, keeping it in an array eliminates the output
    delays. one can then print the array with all permutations once the
    program has done its job.


    performance (with output):
    =========================
    invocation permutations (secs)
    ./perm 1 1 -
    ./perm 2 2 -
    ./perm 3 6 -
    ./perm 4 24 -
    ./perm 5 120 -
    ./perm 6 720 0.05
    ./perm 7 5,040 0.36
    ./perm 8 40,320 2.97
    ./perm 9 362,880 27.64


    performance (no output):
    =======================
    invocation permutations (secs)
    ./perm 1 1 -
    ./perm 2 2 -
    ./perm 3 6 -
    ./perm 4 24 -
    ./perm 5 120 -
    ./perm 6 720 -
    ./perm 7 5,040 0.015
    ./perm 8 40,320 0.12
    ./perm 9 362,880 1.06
    ./perm 10 3,628,800 10.61
    ./perm 11 39,916,800 116.61


    so here's the challenge:
    show off pb and write it in pb/dll so it is at least as fast as the
    c program, preferably even faster (if that can be done in pb)
    the code should be pb only! however it would be cool to see what a
    real assembler guru can do to speed it up even more once it's done...


    jax

    ps: i just aquired pb/dll a little while ago and am still trying to
    figure it all out, however i know c since decades, so i figure it would
    be easier to start with it. now i am exploring if bying pb was worth it
    or not, seems to me that there are some serious speed penalties in
    using pb. however, mayhaps there is a better solution ala pb style and i
    am just not into it enough yet?

    next post is c-source


    ------------------

  • #2
    Code:
    ===============================================================================
    Source:
    ===============================================================================
    
    perm.c
    /*
      PERM.C
      ======
      Prints all the possible permutations of 'n' items.
      'n' must be supplied as the sole command line argument.
    */
    
    
    #include <stdio.h>
    #include <stdlib.h>
    
    #include "data.h"
    #include "stack.h"
    #include "cmdline.h"
    
    
    static void Perm(unsigned n);
    
    static int count = 0;
    
    
    int main(int argc, char *argv[]) {
        unsigned n = ParseCmdLine(argc, argv);
    
        InitializeData(n);
    
        printf("Permutations of %u items:\n\n", n);
        Perm(n);
        printf("\n%d permutations in all.\n", count);
    
        FreeData();
    
        return EXIT_SUCCESS;
    }
    
    void Perm(unsigned n) {
        unsigned item;
    
        if ( n == 0 ) {
            PrintStack();
            ++count;
            return;
        }
    
        for ( item = 0; item < n; ++item ) {
            Push(RemoveItem(item));
            Perm(n-1);
            InsertItem(Pop(), item);
        }
    }
    --------------------------------------------------------------------------------
    
    data.h
    /*
      DATA.H
      ======
      Interface to functions for dealing with data
    */
    
    
    #ifndef PG_DATA_H
    #define PG_DATA_H
    
    void InitializeData(unsigned n);
    void FreeData(void);
    char RemoveItem(unsigned pos);
    void InsertItem(char n, unsigned pos);
    
    #endif  /* PG_DATA_H */
    --------------------------------------------------------------------------------
    
    data_array.c
    /*
      DATA_ARRAY.C
      ============
    
      Functions for manipulating the data using arrays
    */
    
    
    #include <stdio.h>
    #include <stdlib.h>
    
    #include "data.h"
    
    
    static char data[27] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static unsigned size;
    
    
    /*  Initialize memory for data  */
    
    void InitializeData(unsigned n) {
        size = n;
    }
    
    
    /*  Free memory used for data  */
    
    void FreeData(void) {
        return;
    }
    
    /*  Remove data item at position 'pos'  */
    
    char RemoveItem(unsigned pos) {
        char d = data[pos];
    
        if ( pos >= size-- ) {
            fprintf(stderr, "Data position %u is out of bounds.\n", pos);
            exit(EXIT_FAILURE);
        }
    
        while ( pos < size ) {
            data[pos] = data[pos+1];
            ++pos;
        }
    
        return d;
    }
    
    
    /*  Insert data item 'n' at position 'pos'  */
    
    void InsertItem(char n, unsigned pos) {
        int i = (signed) size;
    
        if ( pos > size++ ) {
            fprintf(stderr, "Data position %u is out of bounds.\n", pos);
            exit(EXIT_FAILURE);
        }
    
        while ( --i >= (signed) pos )
            data[i+1] = data[i];
    
        data[pos] = n;
    }
    
    
    --------------------------------------------------------------------------------
    
    stack.h
    /*
      STACK.H
      =======
      Interface to stack operations
    */
    
    
    #ifndef PG_STACK_H
    #define PG_STACK_H
    
    void InitializeStack(void);
    void Push(char n);
    char Pop(void);
    void PrintStack(void);
    
    #endif  /* PG_STACK_H */
    
    
    --------------------------------------------------------------------------------
    
    stack_array.c
    /*
      STACK_ARRAY.C
      =============
      Stack operations using arrays
    */
    
    
    #include <stdio.h>
    #include <stdlib.h>
    
    #include "stack.h"
    
    
    #define MAX_STACK_SIZE 26
    
    static char stack[MAX_STACK_SIZE];
    static int top = -1;
    
    
    /*  Push item 'n' onto the stack  */
    
    void Push(char n) {
        if ( ++top == MAX_STACK_SIZE ) {
            fprintf(stderr, "Stack full!\n");
            exit(EXIT_FAILURE);
        }
    
        stack[top] = n;
    }
    
    
    /*  Pop top item from the stack  */
    
    char Pop(void) {
        if ( top < 0 ) {
            fprintf(stderr, "Stack empty!\n");
            exit(EXIT_FAILURE);
        }
    
        return stack[top--];
    }
    
    
    /*  Output contents of stack  */
    
    void PrintStack(void) {
        int i = 0;
    
        while ( i <= top )
            putchar(stack[i++]);
    
        putchar('\n');
    }
    
    --------------------------------------------------------------------------------
    
    cmdline.h
    /*
      CMDLINE.H
      =========
      Interface to command line parsing function
    */
    
    
    #ifndef PG_CMDLINE_H
    #define PG_CMDLINE_H
    
    unsigned ParseCmdLine(int argc, char *argv[]);
    
    #endif /* PG_CMDLINE_H */
    
    
    --------------------------------------------------------------------------------
    
    cmdline.c
    /*
      CMDLINE.C
      =========
      Function to parse command line and return integral argument
    */
    
    
    #include <stdio.h>
    #include <stdlib.h>
    
    #include "cmdline.h"
    
    
    unsigned ParseCmdLine(int argc, char *argv[]) {
        unsigned n;
        char * endptr;
    
        if ( argc < 2 ) {
            fprintf(stderr, "You must supply an argument\n");
            exit(EXIT_FAILURE);
        }
        else if ( argc > 2 ) {
            fprintf(stderr, "You must only supply one argument\n");
            exit(EXIT_FAILURE);
        }
    
        n = (unsigned) strtoul(argv[1], &endptr, 0);
    
        if ( *endptr ) {
            fprintf(stderr, "You must supply a whole number as an argument\n");
            exit(EXIT_FAILURE);
        }
        else if ( n > 26 ) {
            fprintf(stderr, "You must specify a number less than 27\n");
            exit(EXIT_FAILURE);
        }
        else if ( n < 1 ) {
            fprintf(stderr, "You must specify a number greater than 0\n");
            exit(EXIT_FAILURE);
        }
    
        return n;
    }
    ------------------

    Comment


    • #3
      Um, Mr. Death, the code you pointed us to is hardly "optimized" code.
      It was written for the DOS compiler, it does a load of screen I-O, it uses the string engine extensively, etc, etc.

      But...

      You may not have noticed that the original requester said, "..works great...exactly what I need."

      So I am wondering: are you just trying to get someone to do your job/homework for you?

      Besides, it's not the compiler, it's the programmer. (You're new here, so you don't get the inside joke).

      MCM




      [This message has been edited by Michael Mattias (edited July 28, 2001).]
      Michael Mattias
      Tal Systems Inc. (retired)
      Racine WI USA
      [email protected]
      http://www.talsystems.com

      Comment


      • #4
        Neither is the one I submitted, that should have been very
        obvious just by looking at it. I intentionally used code that isn't optimized,
        because as it stands, I suspect that it would be impossible for PB to beat.

        You suspect wrong, I don't need my homework done... I could care less, does it
        occur to you that I don't need to write this in PB at all? And since PB has
        pointers, I could also write it in PB as well, the question is: Can you?

        Here's one like the one in the forum (quickly whipped up, just for you...)
        ============================================================================
        Code:
        not tested, just of the top of my head...
        
        void perm (int);
        
        int l = -1;
        int num;
        int *ar;
        
        void main(void)
        {
           int i;
        
           printf ("Enter n: ");  scanf ("%d", &num);
           printf ("\n");
        
           ar = malloc (num * sizeof (int));
        
           for (i=0; i < num; i++)
               ar[i] = 0;
           
           perm(0);
        }
        
        
        void perm (int k)
        {
           int i;
        
           ar[k] = ++l;
           if (l == num)
              {
              for (i=0; i < num; i++) printf ("%2d", ar[i]);
              printf ("\n");
              }
           else
              for (i = 0; i < num; i++)
                 if (ar[i] == 0) perm(i);
        
           l--;
           ar[k] = 0;
        }
        =========================================================

        This would be EXTREMELY easy to translate into PB, just make
        the malloc an array and use a varptr and you're off and
        rolling in PB...

        But now I am starting to solve my own challenge, forget it..

        If you can't/won't take the challenge, that's okay, just please respond
        with something constructive, or not at all...

        FYI, I also get the "inside" joke... ;-)

        I didn't post this to get useless responses, I posted this to see how
        year long PB programmers would squeeze performance out of their favorite
        toy, which I want to make mine also. Furthermore the best way to get into
        the language is by having the best "teach" you. By examining what the
        "Masters" here do, all of us can learn something about the idiosyncrasies
        of the language and mayhaps even a trick ot two...


        [This message has been edited by Jack Death (edited July 28, 2001).]

        Comment


        • #5
          Jack;

          I don't program in C, but at first glance there is a significant
          difference between the C code you posted and the PB DLL (was intended for PB 3.5)
          code in the URL you listed.

          If you attempt to use string variables in PB DLL, your program
          will be using the Windows OLE string engine. An awful lot of
          processing goes on in the background, when you use string variables.

          The C code is using the CHAR type, which if my memory serves me,
          would be comparable to a Byte array in PB and not a string variable.

          Because the OLE string engine (which even VB uses) is significantly
          slower than working with a byte array, the PB DLL code should be
          significantly slower than the C code.

          Now if the PB code were to use a byte array and pointers, which would
          put it on par with the C code, I believe the PB code should be
          able to hold its own compared to the C code in execution speed.

          One of Basics main advantages is the variable length string variable
          type. It is extremely useful, but there is a price to pay in slower
          execution. In most instances it doesn't make a difference, but when
          strings are used in a loop hundreds of times (or more), then the
          speed lose becomes significant.

          The comparison you are making is comparing Apples to Oranges
          (which isn't a fair comparison). You need to compare some PB
          code that is really doing the same thing (not just the algorythm,
          but the actual same coding methods).

          I can't guarantee that PB will always be as fast or faster than C,
          but when the code is truly similiar, PB holds its own quite well.




          ------------------
          Chris Boss
          Computer Workshop
          Developer of "EZGUI"
          http://cwsof.com
          http://twitter.com/EZGUIProGuy

          Comment


          • #6
            Personally the "Mood" would sway me from even looking twice at the code.
            #1, Linux does not have the overhead that windows does, so that in itself is not even a fair comparison.
            #2, 14k compiled is what, the PB or Linux? PB Should compile down to 9k or so on something like that...I could be wrong.

            If you do as these guys say and put it into a byte array and use pointers, it should be COMPARABLE to C code, given equal OS's and whatnot....

            Or perhaps one of our ASM guru's could do it in ASM, which C also supports, THEN go back and run the test and ask if it's an even platform..

            I have apps on my Linux box that Windows could not touch, Linux is by far faster, despite a 166....
            My router for example is a 133 w/64 megs, and I could not do that in Win2k.


            Apples and Oranges...


            Scott

            ------------------
            Scott
            Scott Turchin
            MCSE, MCP+I
            http://www.tngbbs.com
            ----------------------
            True Karate-do is this: that in daily life, one's mind and body be trained and developed in a spirit of humility; and that in critical times, one be devoted utterly to the cause of justice. -Gichin Funakoshi

            Comment


            • #7
              Okay, okay... This seems to go nowhere fast..

              Forget the Linux, it was just a reference to show that this stuff can run very
              fast, even on a "lame" machine. Think Windows, I mentioned that the PB version
              (from the forum) runs very slow in Windows 2000 (on 200 MHz machine).
              The code runs just as fast in Windows as it does in Linux, the OS doesn't seem
              to make much, if any difference to the C implementation.

              However, to get something like a a permutation of only 7 digits the original PB
              code took around:

              12 seconds for 5040 items, versus

              10.61 secs for 3,628,800 items in the "lamer" C code vers. (without output), or

              0.36 secs for 5040 items in the same with output.

              Scott, please read my previous post about that, it kind of reaffirms what you are
              saying. BTW, I run mine on PII 100 MHz, on Win98 and it has no problem dishing it's
              INET traffic to 5 station connected simultaneously. There might be something wrong
              with your setup if you can't do that in Win2k.

              Chris, you figured out what this is about, you actually give your 5 Cents
              worth on improving the PB code. By the way I added some animation functionality to
              EZGUI, gives it some real cool effects. When I have more time I'll make a real cool
              interface incorporating it and e-mail it.

              To make it clear:
              -------------------
              Please be free to use any PB code you want to, you don't have to duplicate
              any of the 2 above C versions, what I would like to see is a PB proggy
              performing at these speeds, no matter how it's done (except for ASM,
              'cause that would be the same for either PB or C)

              So this is not about the same code performing on the same OS or platform,
              it's about programming. We're not trying to compare Apples and Oranges here,
              we're trying to show off programming know-how. The tricky, speedy stuff,
              size isn't an issue since it's going to be small one way or the other....
              Make it Apples, Oranges or Eggs, who cares, just show the speed!

              That pointers can be used should be obvious (what else would you do in C,
              for ex.?)

              Hint: Anybody thought about a non-recursive version? linked list? Bitmaps?

              Guys, don't be so touchy here, it's not an attack on PB/DLL, I know it's great
              product (finally finished digesting the manual) and I am pretty
              convinced that I can do most anything I want with it, at great
              execution speeds, with the comfort of great functionality
              built right into the language itself.
              So roll in the feelers, don't be so sensitive, nobody wants
              to step on them...

              A routine like the one we're discussing (permutation) is a great discussion
              item, after all, any speed improved in the code counts! Aren't we doing
              hundreds of millions items with just 12 digits? And it gets really
              interesting with more. This is one killer subject. Just think about
              permutating "Abraham Lincoln"

              Well, 'nuff said...


              ------------------




              [This message has been edited by Jack Death (edited July 29, 2001).]

              Comment


              • #8
                Jack, rewrote your smaller app in PB. With no output the results are very similar.
                >>> num=11 <<<
                PB Elapsed: 11256ms
                VC Elapsed: 12007ms
                Code:
                #Dim All
                #Option Version4
                 
                Declare Function GetTickCount Lib "KERNEL32.DLL" Alias "GetTickCount" () As Dword
                Declare Sub perm (k As Long)
                Global l      As Long
                Global num    As Long
                Global arl()  As Long
                Global ar     As Long Ptr
                 
                Function PbMain
                  Local i  As Long
                  Local s  As Long
                  Local f  As Long
                  num = Val(Command$)
                  l=-1
                  s = GetTickCount()
                  ReDim arl(0 To num) As Global Long
                  ar = VarPtr(arl(0))
                  For i = 0 To num -1
                    @ar[i] = 0
                  Next
                  perm 0
                  f = GetTickCount()
                  StdOut "PB Elapsed:" & Str$(f-s) & "ms"
                End Function
                 
                Sub perm (k As Long)
                   Local i As Long
                   Incr l
                   @ar[k] = l
                   If (l = num) Then
                      For i = 0 To num -1
                    '    StdOut Str$(@ar[i]);
                      Next
                    '  StdOut ""
                   Else
                      For i = 0 To num -1
                        If (@ar[i] = 0) Then
                          perm i
                        End If
                      Next
                   End If
                   Decr l
                   @ar[k] = 0
                End Sub
                [This message has been edited by Ron Pierce (edited July 28, 2001).]

                Comment


                • #9
                  Hi Ron,

                  that's the ticket! A man short of words but large on action ;-)

                  Yup, that's one way of doing it! The short version I presented sure beats
                  the heck out of the original version shown and turns out to be even faster
                  in PB than in C! For:

                  10 digits 8.330 secs on my PII 200Mhz WIN98/ME versus

                  10 digits 10.92 secs for the C version on same machine)

                  That's 3,628,800 different combinations created in in about 8 secs!

                  PB is the clear winner here, as Ron so surely proved!
                  -----------------------------------------------------
                  So there is a way to compare these Apples with Oranges after all...
                  This may just turn out to be an interesting thread...

                  Pointers have something going for them, don't they?

                  Great job Ron,
                  ==============

                  Thanx for the time taken, but I suspect you knew this
                  was easy, didn't ya? ;-)

                  Anybody have an idea of about how to get these results ordered lexically?
                  I'll do a C quicky tomorrow afternoon (just came back from a party)


                  ------------------




                  [This message has been edited by Jack Death (edited July 29, 2001).]

                  Comment


                  • #10
                    Ron --
                    1) You lose at least 10-15% because of @ar[i]
                    @ar[i] works slowly than ar(i).

                    2) ReDim arl(0 To num) As Global Long
                    PB is a little strange compiler.
                    It keeps an address of LBound element instead of zero element.
                    To optimize, it decides in compilation time to substract lbound or not.
                    When Pb sees ReDim ar(a To b) it will substract in execution time, even if a = 0 and declared ar(0 To num).



                    ------------------
                    E-MAIL: [email protected]

                    Comment


                    • #11
                      Thanks, Semen. It was late at night here in east coast, usa.
                      I havent had my coffee yet, so I am having to absorb what you said about the REDIM of the array.

                      PB is a little strange compiler.
                      It keeps an address of LBound element instead of zero element.
                      To optimize, it decides in compilation time to substract lbound or not.
                      When Pb sees ReDim ar(a To b) it will substract in execution time, even if a = 0 and declared ar(0 To num).
                      I did notice i had created an extra array element. I don't understand exactly what you mean about the lbound element. I thought my lbound would be 0.

                      Are you sure about the pointers? If I am not mistaken, ar[i] is the address of a pointer whereas @ar[i] is touching the data.

                      Thanks,
                      Ron

                      Comment


                      • #12
                        Ron --

                        > I don't understand exactly what you
                        > mean about the lbound element.

                        I think he is referring to the fact that using DIM MyArray(100) will produce more efficient PowerBASIC code than DIM MyArray(0 TO 100).

                        -- Eric

                        ------------------
                        Perfect Sync Development Tools
                        Perfect Sync Web Site
                        Contact Us: mailto:[email protected][email protected]</A>

                        [This message has been edited by Eric Pearson (edited July 29, 2001).]
                        "Not my circus, not my monkeys."

                        Comment


                        • #13
                          Thanks, Eric. Is that documented anywhere? I thought the "0" would cause the same code generation.

                          Update (my last post in this box canyon)
                          The code did become more efficient with dim(xx) as global long. Thanks Semen and Eric.
                          Somewhat unfortunantly, I was using debug code generation
                          settings for VC. I tried VC with Minimum Size and Maximum Speed.
                          The times were 580ms and 551ms respectively.
                          The fastest PB time was 991ms.

                          In this test it appears C will win and I am pretty sure it's because of the recursive function calls. Function overhead has been discussed before. Based upon the dialog of those discussions, I doubt this situation will be addressed as it's a BASIC/design decision kind of thing. I think the compiler might need another pass to optimize the calls and thus generate only the actual minimum overhead needed by each function. I for one could care less if my compile times increased 2x.

                          Just keep in mind this is a specific test with a tiny, unsophisticated app. I strongly believe what Bob Zale says about the PowerBASIC generated code being more efficient than C's. There is some "overhead" penalty when calling functions/subs. The optimized code in a function/sub will neutralize the initial overhead provided there is enough code executed inside the function. My dream is for this overhead to go away then there is simply no contest anymore.

                          Some of you guys have played with Call Dword to skip over the overhead of calling a function. I don't think this can be done recursively - or can it?

                          Ron

                          Comment


                          • #14
                            Ron --

                            > Is that documented anywhere?

                            Well, now that I have researched it, this may be one of those pieces of "knowledge" that I incorrectly carried over from my PB/DOS days. This is from the PB/DOS Help File:

                            You should never specify an explicit lower limit of 0 (unless a non-zero $OPTION BASE is active) as this imposes an efficiency penalty with no meaningful benefits.

                            But I don't see anything about it in the PB/DLL or PB/CC docs, so I may have incorrectly assumed that the rule applies to the Windows compilers.

                            -- Eric


                            ------------------
                            Perfect Sync Development Tools
                            Perfect Sync Web Site
                            Contact Us: mailto:[email protected][email protected]</A>

                            [This message has been edited by Eric Pearson (edited July 29, 2001).]
                            "Not my circus, not my monkeys."

                            Comment


                            • #15
                              It's not documented, but easy to test.
                              In my impression PB does following in compilation time.
                              It looks all Dim/Redim for an array.
                              If somewhere there is To, PB selects a slow variant (with substraction lbound in execution time).
                              Of course, PB could be a little clever and to ignore 0 To, but actually - I think - better to use standart approach : to keep address of zero element.
                              This allows not to substract in any case (lbound = 0 or not).

                              About C vs PB. Frankly, guess that in equal conditions C will win 10-20%, because it's really good optimized.

                              About recoursive ... Depends of situations, but typically it's possible to convert to following
                              Code:
                              PUSH "end of query"
                              PUSH "first k" 
                              Do
                                 POP k
                                 If k = "end" Then Exit Do
                                 .... PUSH "next k" .... (for recoursive call)
                              Loop
                              ------------------
                              E-MAIL: [email protected]

                              Comment


                              • #16
                                Ron, Semen,

                                Hmmm, this was pretty interesting! However, I can't confirm the pointer nor
                                the REDIM problem Semen talks about. DIM it and it actually get slower!
                                Change the pointer to array syntax and nothing significant changes. It seems
                                that the assumptions are incorrect.

                                Recursion might be the problem (pushing/popping the stack), I agree with Ron
                                that the overhead must be eliminated, as it leaves the C version almost twice
                                as fast as the PB version!

                                Also, just in case somebody here wonders why this might be of interest:

                                It isn't! (well at least not for "regular" programs doing trivial IO/computations)

                                However, if you ever need speed in programs that handle:

                                1. arrays,
                                2. databases
                                3. computational intensive apps (DES ring a bell?)
                                4. data analysis
                                5. large amounts of data to be modified

                                and so on, it becomes painfully clear that C needs to be used if the program
                                calls FUNCTIONS repetitively (like in a loop, or straight recursion) This is
                                what the discussion has gained so far, mayhaps that will change...

                                It also shows that the use of pointer versus traditional BASIC array
                                manipulation can gain dramatic speed increases, so, you should start thinking
                                in those terms, if you don't already.
                                This is typical C stuff, but atypical for BASIC programming. The proof is in the
                                pudding, just examine the solution shown in the forum link at the very top of thread,
                                it shows "BASIC thinking", not "speed/optimize thinking".
                                It's always faster to juggle a few address bytes than it is to juggle the whole or partial array.
                                So this thread may just convince some of you lurkers to change your programming habits ;-)

                                (To Michael: YES, it is the programmer, but it is also in depth
                                knowledge of the tool used that counts and that's what we are trying to do here;
                                figuring out the PB compiler and what strategies are best to use for it)
                                There's always more than way to Rome, finding the best one is the problem...

                                That said, I whipped up a little proggy, doing the permutations WITHOUT
                                recursion AND have it sort the permutations while it's at it.

                                Ron, you might want to take this one out for a test drive...
                                (It's just as easy to convert as the previous one)

                                Mayhaps this will show off PB...

                                New code follows:

                                Code:
                                /*No recursion and sorted output */
                                
                                #include <stdio.h>
                                #include <stdlib.h>
                                void main(void)
                                {
                                    int i, j, x, y, tmp, n;
                                    int *ar;
                                    printf( "Enter n: ");
                                    scanf( "%d", &n);
                                    ar = malloc( (n+1) * sizeof(int));
                                    for (i=0; i <= n; i++)
                                        ar[i] = i;
                                    i = 1;
                                    while (i)
                                    {
                                        for (i=1; i <= n; i++)
                                            printf( " %2d", ar[i]);
                                        printf("\n");
                                        i = n-1;
                                        while (ar[i] > ar[i+1])
                                            i--;
                                        j = n;
                                        while (ar[i] > ar[j])
                                            j--;
                                        tmp = ar[i];
                                        ar[i] = ar[j];
                                        ar[j] = tmp;
                                        x = n;
                                        y = i+1;
                                        while (x > y)
                                        {
                                            tmp = ar[x];
                                            ar[x] = ar[y];
                                            ar[y] = tmp;
                                            x--;
                                            y++;
                                        }
                                    }
                                }
                                ------------------

                                Comment


                                • #17
                                  > I can't confirm the pointer nor the REDIM problem Semen talks about.

                                  Code:
                                    #Dim All
                                    #Option Version4
                                    Declare Function GetTickCount Lib "KERNEL32.DLL" Alias "GetTickCount" () As Dword
                                    Declare Sub perm (k As Long)
                                    Global l      As Long
                                    Global num    As Long
                                    Global arl()  As Long
                                    Global ar     As Long Ptr
                                  
                                    Function PbMain
                                       Local i  As Long
                                       Local s  As Long
                                       Local f  As Long
                                       num = 11 ' Val(Command$)
                                       l=-1
                                       s = GetTickCount()
                                       ReDim arl(0 To num) As Global Long
                                       ar = VarPtr(arl(0))
                                       For i = 0 To num -1
                                          @ar[i] = 0
                                       Next
                                       perm 0
                                       f = GetTickCount()
                                       StdOut "PB Elapsed:" & Str$(f-s) & "ms"
                                       WaitKey$
                                    End Function
                                    
                                    Sub perm (k As Long)
                                       Local i As Long
                                       Incr l
                                       @ar[k] = l
                                       If (l = num) Then
                                          For i = 0 To num -1
                                             '    StdOut Str$(@ar[i]);
                                          Next
                                          '  StdOut ""
                                       Else
                                          For i = 0 To num -1
                                             If (@ar[i] = 0) Then
                                                 perm i
                                             End If
                                          Next
                                       End If
                                       Decr l
                                       @ar[k] = 0
                                    End Sub
                                  This is an original variant (I explicity set num = 11 + added Waitkey$)

                                  On my PC (PIII-800, Win2000 SP2) takes 19,2 sec.

                                  After modification
                                  Code:
                                    #Dim All
                                    #Option Version4
                                    Declare Function GetTickCount Lib "KERNEL32.DLL" Alias "GetTickCount" () As Dword
                                    Declare Sub perm (k As Long)
                                    Global l      As Long
                                    Global num    As Long
                                    Global ar()  As Long
                                  
                                    Function PbMain
                                       Local i  As Long
                                       Local s  As Long
                                       Local f  As Long
                                       num = 11 ' Val(Command$)
                                       l=-1
                                       s = GetTickCount()
                                       ReDim ar(num) As Global Long
                                       For i = 0 To num -1
                                          ar(i) = 0 ' <-- actually not necessary
                                       Next
                                       perm 0
                                       f = GetTickCount()
                                       StdOut "PB Elapsed:" & Str$(f-s) & "ms"
                                       WaitKey$
                                    End Function
                                    
                                    Sub perm (k As Long)
                                       Local i As Long
                                       Incr l
                                       ar(k) = l
                                       If (l = num) Then
                                          For i = 0 To num -1
                                             '    StdOut Str$(@ar[i]);
                                          Next
                                          '  StdOut ""
                                       Else
                                          For i = 0 To num -1
                                             If (ar(i) = 0) Then
                                                 perm i
                                             End If
                                          Next
                                       End If
                                       Decr l
                                       ar(k) = 0
                                    End Sub
                                  Takes 16.9 sec. Which results do you have ?

                                  ------------------
                                  E-MAIL: [email protected]

                                  Comment


                                  • #18
                                    I decided to look a code in debugger.
                                    So, I added
                                    Code:
                                         k = -10000000
                                    at the bottom before
                                    Code:
                                         ar(k) = 0
                                      End Sub
                                    to initiate Access Violation.

                                    ar(k) = 0 looks so

                                    a) with ReDim ar(num) As Global Long
                                    Code:
                                    00401239   mov         ebx,dword ptr [ebp+8]
                                    0040123C   mov         dword ptr [ebx],0FF676980h
                                    00401242   mov         ebx,dword ptr [ebp+8]
                                    00401245   mov         eax,dword ptr [ebx]
                                    00401247   mov         ebx,dword ptr ds:[4043B8h]
                                    0040124D   mov         dword ptr [ebx+eax*4],0
                                    b) with ReDim ar(0 To num) As Global Long
                                    Code:
                                    0040124B   mov         ebx,dword ptr [ebp+8]
                                    0040124E   mov         dword ptr [ebx],0FF676980h
                                    00401254   mov         ebx,dword ptr [ebp+8]
                                    00401257   mov         eax,dword ptr [ebx]
                                    00401259   sub         eax,dword ptr ds:[4043D0h]
                                    0040125F   mov         ebx,dword ptr ds:[4043B8h]
                                    00401265   mov         dword ptr [ebx+eax*4],0
                                    Do you see additional 00401259 sub eax,dword ptr ds:[4043D0h] ?

                                    Now how Ron's code looks
                                    Code:
                                     k = -10000000
                                     @ar[k] = 0
                                    Redim is not important here, because ar is a single Long Ptr, not an array
                                    Code:
                                    00401266   mov         ebx,dword ptr [ebp+8]          ' address of k
                                    00401269   mov         dword ptr [ebx],0FF676980h     ' <---------- k = -10000000
                                    0040126F   mov         ebx,dword ptr ds:[4043B8h]     ' address of ar 
                                    00401275   mov         dword ptr [ebp-2Ch],ebx        ' @ar
                                    00401278   mov         ebx,dword ptr [ebp+8]          ' address of k (it's input parameter; PB does mov ebp. esp during enterance)
                                    0040127B   mov         eax,dword ptr [ebx]            ' value of k 
                                    0040127D   shl         eax,2                          ' multiply 4
                                    00401280   mov         ebx,dword ptr [ebp-2Ch]        ' value of ponter ar
                                    00401283   add         ebx,eax                        ' address for @ar[k] 
                                    00401285   mov         dword ptr [ebx],0              ' @ar[k] = 0
                                    Let's compare with my.
                                    Code:
                                    00401239   mov         ebx,dword ptr [ebp+8]
                                    0040123C   mov         dword ptr [ebx],0FF676980h
                                    00401242   mov         ebx,dword ptr [ebp+8]
                                    00401245   mov         eax,dword ptr [ebx]
                                    00401247   mov         ebx,dword ptr ds:[4043B8h]
                                    0040124D   mov         dword ptr [ebx+eax*4],0
                                    If to forget about k = -10000000 (first two MOV in both variants)
                                    4 mov vs 6 ("equal") mov + shl + add.

                                    [This message has been edited by Semen Matusovski (edited July 29, 2001).]

                                    Comment


                                    • #19
                                      I said I was done with this, but I have changed my mind.

                                      Semen, I do see where the redim without specifying "To" does increase the speed a little. I am very curious (concerned???) that referencing a long array without a pointer (albeit indexed) is faster than with an indexed pointer - significantly faster.
                                      My speeds were:
                                      Code:
                                      My Prog:      PB Elapsed: 46747ms
                                      Semen's Prog: PB Elapsed: 33438ms
                                      Any ideas about the pointer being slower anyone? Is this a fluke or should indexed pointers be discouraged?

                                      Semen, what debugger do you use that gives you the compiled code?

                                      Thanks,
                                      Ron

                                      [This message has been edited by Ron Pierce (edited July 29, 2001).]

                                      Comment


                                      • #20
                                        Semen,

                                        this gets wierder as the day gets older!
                                        I did the "RON" version for 10 digits (my machine is 4 time slower than yours
                                        and my hangover makes me impatient today that's why I didn't go for 11 digits)
                                        8180 ms
                                        8090 ms
                                        8155 ms

                                        Your version for 10 digits:
                                        7590 ms
                                        7470 ms
                                        7355 ms

                                        quite difference! Looking at your follow up, I can only suspect that there are
                                        some intrinsic PB compiler optimizations, that we don't know about, maybe that's
                                        something for next Manual/Help file wish list. Some of the optimizations should
                                        definitely be explained to those who want to squeeze speed out of their code.

                                        Nice work, Semen! This left us with no answers, just more questions!!! ARRGHH..
                                        Wonder how the Non-Recursive version will work out...
                                        PS: are you using the console compiler? (I have PB/DLL 6.0 and it doesn't like the stdout)

                                        Thanx!

                                        ------------------


                                        [This message has been edited by Jack Death (edited July 29, 2001).]

                                        Comment

                                        Working...
                                        X