High Speed Graphics DLL


  • #21
    Tom,

    Well said! (I know I couldn't have said it better)...

    The compiler (PB) has never been my problem. The hardship is in
    producing the desired results with the knowledge and level that
    I'm currently at with PB, and in the amount of time that I'm
    willing to spend to overcome my battle with PB!

    Thanks

    MWM




    • #22
      What they all said!

      A lot of VB programmers think that just porting their code to C will instantly make it fly. What they don't understand is that if you code it the same way, it runs at about the same speed. C, however, can utilise pointers etc., so a programmer suddenly has other ways of implementing a program. For example, there is a speed difference with a multi-dimensional array if you use proper pointer arithmetic to loop through the data. These loops will be faster, not because it's a better compiler but because the language allows these optimisations. PB can do pretty much whatever C can do, so if you're not getting the power, the first place you should look is your code. The "tricks of the trade" the pros use are not a secret compiler but a better understanding of what runs fast and why. Just like programming the same thing in ASM doesn't necessarily give you a huge performance boost, but it can.

      I've seen lots of tricks used for high speed graphics, and just "Avoiding the API" doesn't say much. I've seen some very cunning uses of unions to alter RGB values without copying extra data all over the place, and people finding patterns in unexpected places that allow them to add more pre-calculated constants rather than do the math at the last minute. I'm doubtful that if you just ported your high speed DLL to C++ it'd be any quicker. If there was a compiler bottleneck, you should be able to profile your code, tell where it is and understand why it's not your code. I'm just seeing finger pointing.
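
      As a rough sketch of the union trick, assuming a 32-bit pixel stored as blue, green, red and a spare byte on a little-endian x86 machine (PixelBytes, PixelUnion and the member names are made up for illustration, not taken from any posted DLL):

      ```powerbasic
      #COMPILE EXE

      ' Hypothetical layout: four bytes that overlay one DWORD.
      TYPE PixelBytes
         bBlue  AS BYTE
         bGreen AS BYTE
         bRed   AS BYTE
         bSpare AS BYTE
      END TYPE

      UNION PixelUnion
         Pixel32 AS DWORD        ' the whole pixel as one 32-bit value
         Chan    AS PixelBytes   ' the same four bytes, addressed one at a time
      END UNION

      FUNCTION PBMAIN () AS LONG
        LOCAL px AS PixelUnion

        px.Pixel32 = &H00336699      ' spare=00, red=33, green=66, blue=99
        px.Chan.bGreen = 255         ' change one channel in place, no shifting or masking

        MSGBOX HEX$(px.Pixel32)      ' shows 33FF99
      END FUNCTION
      ```

      The point is only that one channel is rewritten where it sits; whether that beats shift-and-mask code in a real inner loop still has to be timed.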


      ------------------
      Paul Dwyer
      Network Engineer
      Aussie in Tokyo



      • #23
        Eric,

        LONGs are SIGNED. DWORDS are UNsigned.
        INTEGERS are SIGNED. WORDS are UNsigned.

        I'm sure you just made some typos!

        That's why (as Lance pointed out to me in another thread) the MATH
        with LONGS/INTEGERS are faster than MATH with DWORDS/WORDS.

        My testing shows that pure access/assignment is slightly faster
        with DWORDS/WORDS than LONGS/INTEGERS (which seems strange to me -
        I would think that they should be equivalent).

        Tyrone,

        The best way to optimize PB code is to understand how computers
        process functions/data at the assembly or machine level. You can
        then make your own determinations as to what should have the least
        overhead in processing.

        The reason 32 bit variables (LONGs/DWORDs) are faster than other sized
        variables (BYTES/WORDS/INTEGERS) has to do with the way 32-bit CPUs
        address memory. Memory access is on 32-bit (4 byte) boundaries. Extra
        processing is required to extract the data from non-32-bit variables.

        Single & Double precision (real) variables require so much extra
        processing, they have a dedicated processor! Avoid them if at all
        possible if speed is essential.
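
        One common way to avoid real variables is fixed-point arithmetic in LONGs. A minimal sketch, assuming a 0-255 colour channel and a scale factor expressed in 256ths (ScaleChannel is a made-up helper, and whether it actually beats floating point on a given CPU needs measuring):

        ```powerbasic
        #COMPILE EXE

        ' Hypothetical helper: scale a colour channel without floating point.
        ' factor256 is the scale factor multiplied by 256 (192 stands for 0.75).
        FUNCTION ScaleChannel (BYVAL channel AS LONG, BYVAL factor256 AS LONG) AS LONG
          FUNCTION = (channel * factor256) \ 256    ' "\" is integer division
        END FUNCTION

        FUNCTION PBMAIN () AS LONG
          MSGBOX STR$(ScaleChannel(200, 192))       ' 200 * 0.75, shows 150
        END FUNCTION
        ```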

        Users have mentioned profiling your code. What I do is wrap sections
        of my code with :

        Code:
        Start! = TIMER

        ...  my code ...

        Finish! = TIMER

        MSGBOX "The code took"+STR$(Finish! - Start!)+" seconds to execute."

        and then I start tweaking the code and testing the performance
        difference.




        ------------------
        Bernard Ertl
        InterPlan Systems


        • #24
          > I'm sure you just made some typos!

          Thanks, you're exactly right Bern. I have re-edited the original message to avoid confusion...

          -- Eric


          ------------------
          Perfect Sync Development Tools
          Perfect Sync Web Site
          Contact Us: [email protected]
          "Not my circus, not my monkeys."


          • #25
            Tyrone,

            I forgot to mention that if speed is critical, you might consider
            storing the data from your type declaration:

            Code:
            Type Bitmap
            picWidth as Integer
            picHeight as Integer
            BackGround as Integer
            bmpData (128) as DWORD
            End Type
            in straight variables/arrays/memory block. Using a UDT (User
            Defined Type) requires a little extra processing to reference
            the data elements.

            In other words, using:

            Code:
            DIM Data( 1: 10) AS Bitmap
            
            
            FOR I = 1 TO 10
               X = Data(I).picWidth
            NEXT
            will be slower than:

            Code:
            DIM picWidth( 1: 10) as Integer
            DIM picHeight( 1: 10) as Integer
            DIM BackGround( 1: 10) as Integer
            DIM bmpData ( 1:10, 128) as DWORD
            
            FOR I = 1 TO 10
               X = picWidth(I)
            NEXT
            It requires more work on your part to manage all those arrays,
            but it is less work on the processor and thus faster.




            ------------------
            Bernard Ertl
            InterPlan Systems


            • #26
              I'm afraid some of the advice here has not been entirely accurate. If absolute speed is an issue, never, never, never, ever DIM an array from 1 TO N! Always use a lower bound of zero.

              Also, accessing a member of a user-defined type is almost never an issue, and when it is, it's so minor that it's almost non-measurable. Don't worry about this one! {smile}
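
              A minimal sketch of the zero lower bound, assuming a small LONG array (%NumItems and the variable names are made up for illustration). Presumably the gain comes from not having to adjust the index by the lower bound on each access, but that is an assumption worth timing on your own code:

              ```powerbasic
              #COMPILE EXE

              %NumItems = 10

              FUNCTION PBMAIN () AS LONG
                LOCAL I&, total&

                DIM zeroBased(0 TO %NumItems - 1) AS LONG   ' lower bound of zero
                ' DIM oneBased(1 TO %NumItems) AS LONG      ' same element count, lower bound of one

                FOR I& = 0 TO %NumItems - 1
                   zeroBased(I&) = I&
                   total& = total& + zeroBased(I&)
                NEXT

                MSGBOX STR$(total&)                         ' 0 + 1 + ... + 9, shows 45
              END FUNCTION
              ```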

              Regards,

              Bob Zale
              PowerBASIC Inc.



              ------------------



              • #27
                Bob,

                The following code executes in just under two minutes on
                my computer. I think you might find the results surprising.

                The UDT access is roughly 4X slower than a straight array access.

                The results will be similar if you change the access variables
                from INTEGER to LONG or DWORD.


                Code:
                #COMPILE EXE
                #REGISTER NONE
                
                
                #INCLUDE "DDT.INC"
                
                
                TYPE My_UDT
                   picWidth AS INTEGER
                   picHeight AS INTEGER
                   BackGround AS INTEGER
                   bmpData (128) AS DWORD
                END TYPE
                
                
                %MaxOffset = 99999
                %MaxIterations = 4000
                
                
                FUNCTION PBMAIN () AS LONG
                
                
                  REGISTER I&, J&
                
                
                  LOCAL Start!, Fin!, temp%
                  LOCAL Start2!, Fin2!, udtsize&
                  LOCAL Start3!, Fin3!
                  LOCAL Start4!, Fin4!
                  LOCAL Start5!, Fin5!
                
                
                  DIM buff2 ( %MaxOffset) AS INTEGER
                
                
                  Start! = TIMER
                  FOR J& = 1 TO %MaxIterations
                     FOR I& = 0 TO %MaxOffset
                        temp% = buff2(I&)
                     NEXT
                  NEXT
                  Fin! = TIMER
                
                
                  DIM irptr AS INTEGER PTR
                  Start2! = TIMER
                  irptr = VARPTR(buff2(0))
                  FOR J& = 1 TO %MaxIterations
                     FOR I& = 0 TO %MaxOffset
                        temp% = @irptr[I&]
                     NEXT
                  NEXT
                  Fin2! = TIMER
                  ERASE buff2
                
                
                  DIM buff( %MaxOffset) AS My_UDT
                
                
                  Start3! = TIMER
                  FOR J& = 1 TO %MaxIterations
                     FOR I& = 0 TO %MaxOffset
                        temp% = buff(I&).BackGround
                     NEXT
                  NEXT
                  Fin3! = TIMER
                
                
                  udtsize& = SIZEOF(buff(0))
                  Start4! = TIMER
                  FOR J& = 1 TO %MaxIterations
                     irptr = VARPTR(buff(0).BackGround)
                     FOR I& = 0 TO %MaxOffset
                        temp% = @irptr
                        irptr = irptr + udtsize&
                     NEXT
                  NEXT
                  Fin4! = TIMER
                
                
                  DIM udtptr AS My_UDT PTR
                
                
                  Start5! = TIMER
                  udtptr = VARPTR(buff(0))   ' start at element 0 so index %MaxOffset stays inside the array
                  FOR J& = 1 TO %MaxIterations
                     FOR I& = 0 TO %MaxOffset
                        temp% = @udtptr[I&].BackGround
                     NEXT
                  NEXT
                  Fin5! = TIMER
                
                
                  MSGBOX "Accessing INTEGER array took"+STR$(Fin!-Start!)+" seconds." + $CRLF + $CRLF + _
                         "Accessing INTEGER array with pointer took"+STR$(Fin2!-Start2!)+" seconds." + $CRLF + $CRLF + _
                         "Accessing UDT array took "+STR$(Fin3!-Start3!)+" seconds." + $CRLF + $CRLF + _
                         "Accessing UDT array with element pointer took "+STR$(Fin4!-Start4!)+" seconds." + $CRLF + $CRLF + _
                         "Accessing UDT array with UDT pointer took"+STR$(Fin5!-Start5!)+" seconds." + $CRLF + $CRLF _
                         ,,"UDT Access Test"
                
                
                END FUNCTION
                Thanks for the correction about the array declaration. I didn't
                intend to mean that assigning (1: 10) was the optimum way.




                ------------------
                Bernard Ertl
                InterPlan Systems


                • #28
                  One can find a way to prove most any point of view in programming. Your sample code was a bit different than the description.

                  However, I'm not here to debate the issue, only to offer my opinions, if it will help someone. While I'm quite confident of my accuracy, you're free to discard my comments if you find a better way.

                  Regards,

                  Bob Zale
                  PowerBASIC Inc.


                  ------------------



                  • #29
                    Bob,

                    Now I'm confused. The point I was trying to make from the beginning
                    is that referencing data as an element of a UDT takes longer than
                    referencing a "flat" variable.

                    My sample code was written several months ago and contains extra code
                    which compares different things, but it still illustrates the access
                    of a UDT element vs. having that data outside a UDT.



                    ------------------
                    Bernard Ertl
                    InterPlan Systems


                    • #30
                      For those who are interested, I reworked my sample to exclude arrays
                      and the results are much better.

                      On my machine, using INTEGERS, the flat var access took 5 seconds
                      and the UDT took 12. With LONGS, the flat var access took 5 seconds
                      and the UDT took 6 seconds. Not too bad for 99999 X 4000 accesses.

                      But if you need to use arrays, the differences are more dramatic.

                      Code:
                      #COMPILE EXE
                      #REGISTER NONE
                      
                      
                      TYPE BitmapType
                         picWidth AS INTEGER
                         picHeight AS INTEGER
                         BackGround AS INTEGER
                         bmpData (128) AS DWORD
                      END TYPE
                      'TYPE BitmapType            'Try testing with UDT aligned on DWORD
                      '   picWidth AS LONG
                      '   picHeight AS LONG
                      '   BackGround AS LONG
                      '   bmpData (128) AS DWORD
                      'END TYPE
                      
                      
                      %MaxOuterLoopIterations = 99999
                      %MaxInnerLoopIterations = 4000
                      
                      
                      FUNCTION PBMAIN () AS LONG
                      
                      
                        REGISTER I&, J&
                      
                      
                        LOCAL Start!, Fin!
                        LOCAL Start2!, Fin2!
                        LOCAL Temp AS INTEGER
                      '  LOCAL Temp AS LONG
                      
                      
                        LOCAL vBackGround AS INTEGER
                      '  LOCAL vBackGround AS LONG
                        DIM vBitMap AS BitmapType
                      
                      
                        Start! = TIMER
                        FOR J& = 1 TO %MaxOuterLoopIterations
                           FOR I& = 0 TO %MaxInnerLoopIterations
                              Temp = vBackGround
                           NEXT
                        NEXT
                        Fin! = TIMER
                      
                      
                        Start2! = TIMER
                        FOR J& = 1 TO %MaxOuterLoopIterations
                           FOR I& = 0 TO %MaxInnerLoopIterations
                              Temp = vBitMap.BackGround
                           NEXT
                        NEXT
                        Fin2! = TIMER
                      
                      
                        MSGBOX "Accessing 'flat' variable took "+STR$(Fin!-Start!)+" seconds." + $CRLF + $CRLF + _
                               "Accessing UDT element took "+STR$(Fin2!-Start2!)+" seconds." + $CRLF + $CRLF _
                               ,,"UDT Access Test"
                      
                      
                      END FUNCTION



                      ------------------
                      Bernard Ertl
                      InterPlan Systems


                      • #31
                        Tyrone
                        I suspect that the main reason for your lack of speed is that you are compiling a 16 bit app, so you will never be able to compete with the speed of the 32 bit Windows API.
                        Also, the comments about longs versus integers don't necessarily apply in 16 bit. Compile the code in PBDLL 6 and it will be much faster.


                        ------------------



                        • #32
                          I think the application will be much faster in 32bit as well..
                          I will port to 32bit and use Longs instead of Integers..

                          The iteration example posted was MOST impressive. 99999x4000 is a lot
                          for 5/6 seconds.. That's approx 66 million a second! Damn, is that right?
                          But I'll have to agree the time difference between UDT and individual variables is not important.

                          If I can get this working at 30 million a second I'm happy. *And that
                          was copying the entire datatype. I just want to move pixel and
                          color data..

                          Was that test done using PB6.0 32bit? And please tell me how fast was the
                          machine you tested this code on?

                          ------------------
                          Explorations v3.0 RPG Development System http://www.explore-rpg.com

                          [This message has been edited by Tyrone W. Lee (edited March 12, 2001).]
                          Explorations v9.10 RPG Development System
                          http://www.explore-rpg.com



                          • #33
                            Compiled with PB/DLL 6.0. (32 bit)

                            Running Win98 on a 400 Mhz PII.

                            ------------------
                            Bernard Ertl

                            [This message has been edited by Bern Ertl (edited March 12, 2001).]
                            InterPlan Systems


                            • #34
                              Tyrone,

                              don't forget to post a bit of that time critical code once you're happy with it.

                              I'm sure some readers here would like to see if they can squeeze more performance out of it. I think it would make an interesting discussion.

                              A lot of good tips might come out of it for all of us.

                              Peter.


                              ------------------
                              [email protected]


                              • #35
                                I have always been a big fan of AMD processors.. and this little code
                                benchmark test will show why...

                                I took the sample code and compiled it in PB6.0 On my AMDK2 380Mhz machine
                                and here are the results I got.

                                Integers:
                                Flat Variables - 8.78 seconds
                                UDT Variables - 7.58 seconds

                                Long Variables - 6.527 seconds
                                UDT Variables - 6.703 seconds..

                                The difference between flat and UDT variables seems unimportant, but there
                                was a whole second's difference between the Integer and Long datatypes.

                                But even this time difference isn't as great as the 12 second vs 6 second
                                example shown on a Pentium of comparable speed. Wonder why?!?


                                Either way.. 400 million iterations in 12 seconds is still 33 million
                                assigned pixels per second.. (I only need 25 million) Any code designed
                                will soar on an AMD with Integer UDTs, and lag a bit on a comparable
                                Pentium....



                                ------------------
                                Explorations v3.0 RPG Development System
                                http://www.explore-rpg.com
                                Explorations v9.10 RPG Development System
                                http://www.explore-rpg.com



                                • #36
                                  AMDs are not guaranteed to be faster than Pentiums. It varies "depending". My tests
                                  suggest that AMDs provide more uniform performance, though: they seem less likely
                                  to be dramatically affected by tiny changes in code. YMMV.

                                  ------------------
                                  Tom Hanlin
                                  PowerBASIC Staff



                                  • #37
                                    Well, AMDs have the math co-processor built in, which would explain
                                    why the performance is uniform..

                                    Regarding speed.. If you haven't read the thread "High Speed Graphics Pt. 2" -
                                    I have made some MAJOR advancements on the original code..

                                    I was able to produce 1024x768x1000 using a pointer variable
                                    reference to the Screen() array. Yeah, that's 1000 frames per second on an
                                    AMD 380 MHz!! This is the kinda speed I'm looking for..

                                    I posted the code on that thread if anyone is interested..

                                    Thanks Again Guys! This is really going to help my game project!

                                    ------------------
                                    Explorations v3.0 RPG Development System
                                    http://www.explore-rpg.com
                                    Explorations v9.10 RPG Development System
                                    http://www.explore-rpg.com



                                    • #38
                                      > have the math co-processor build it which would explain...

                                      I'm not a chip expert, but I was under the impression that the last major chip that did not have a math coprocessor was the 386. The 486 had a "stripped down" version -- 486-SX? -- that didn't have coprocessors, but 100% of the Pentiums and above do. Or so I thought. Am I wrong?

                                      -- Eric


                                      ------------------
                                      Perfect Sync Development Tools
                                      Perfect Sync Web Site
                                      Contact Us: [email protected]



                                      [This message has been edited by Eric Pearson (edited March 14, 2001).]
                                      "Not my circus, not my monkeys."



                                      • #39
                                        Apparently not...

                                        I ran this high speed graphics code on my AMD 380 MHz and was
                                        experiencing 80 frames in about 8 seconds.. I commented out the
                                        line that referenced the Screen() array and the system
                                        blazed.. I changed this reference to a pointer, ptrScreen[px + py * Width],
                                        and was able to generate 1024x768x1000 (1000 frames per second!)
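
                                        For anyone following along, indexing a frame buffer through a pointer looks roughly like this. It is a sketch only: FrameBuf, %ScrWidth and %ScrHeight are made-up stand-ins for the Screen() array and its dimensions, and the row offset is hoisted out of the inner loop as an extra assumption, not something taken from the posted code:

                                        ```powerbasic
                                        #COMPILE EXE

                                        %ScrWidth  = 1024
                                        %ScrHeight = 768

                                        FUNCTION PBMAIN () AS LONG
                                          LOCAL px&, py&, rowStart&
                                          LOCAL ptrScreen AS DWORD PTR

                                          ' Hypothetical frame buffer, one DWORD per pixel, zero lower bound.
                                          DIM FrameBuf(0 TO %ScrWidth * %ScrHeight - 1) AS DWORD

                                          ptrScreen = VARPTR(FrameBuf(0))
                                          FOR py& = 0 TO %ScrHeight - 1
                                             rowStart& = py& * %ScrWidth       ' computed once per row, not once per pixel
                                             FOR px& = 0 TO %ScrWidth - 1
                                                @ptrScreen[rowStart& + px&] = &H00FF0000
                                             NEXT
                                          NEXT

                                          ' Read the last pixel back through the array to confirm the writes landed.
                                          MSGBOX HEX$(FrameBuf(%ScrWidth * %ScrHeight - 1))   ' shows FF0000
                                        END FUNCTION
                                        ```

                                        Timing this against the plain FrameBuf(index) form on the same machine is the only way to know how much the pointer itself contributes.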

                                        I then came to work, where I have a Pentium III 1 GHz machine, and
                                        attempted to run the same code... I can barely get 80 frames per second!

                                        I commented out the reference to any Screen variables and allowed
                                        the computer to just add numbers within the inner loops. The
                                        machine shows NO increase in speed. Apparently there is NO math
                                        coprocessor on Pentium chips.

                                        I will run this test again, because it's very hard to believe
                                        my 380 out-performed a 1 GHz, but regardless of my 380 MHz performance,
                                        I would expect A LOT more from a 1 GHz machine.. I think Pentiums really
                                        ****!

                                        If anyone would like to see this code and explain to me WHY there
                                        is such a big difference please respond..


                                        ------------------
                                        Explorations v3.0 RPG Development System
                                        http://www.explore-rpg.com
                                        Explorations v9.10 RPG Development System
                                        http://www.explore-rpg.com



                                        • #40
                                          The mere change to a pointer here should have negligible effect... and yes,
                                          certainly all Pentiums have math processors built in. It seems likely that
                                          there's a misunderstanding involved in the timing calculations somewhere.


                                          ------------------
                                          Tom Hanlin
                                          PowerBASIC Staff
