High Speed Graphics DLL


  • High Speed Graphics DLL

    I am currently working on a high speed graphics .DLL routine
    and was disappointed in the results I got with PowerBasic. I
    will explain my design and maybe someone can tell me if it's a
    design flaw or the PB compiler. The routine works, but runs slower
    than the Windows API.

    The first step in the design is to load bitmaps into memory.
    The system has a linked list of bitmaps that you pre-load and that
    can be used as textures. The current system can have a total of 40
    different bitmap arrays running at the same time, each holding a
    variable number of bitmaps to be rendered.

    The bitmaps are stored in a Type definition shown below.

    Type Bitmap
        picWidth as Integer
        picHeight as Integer
        BackGround as Integer
        bmpData (128) as DWORD
    End Type

    The array bmpData(128) holds pointers to pages of up to 32K bytes of
    data each. I use the GetStrAlloc command to allocate one 32K-byte page
    at a time until the entire bitmap is stored.

    Once the bitmap data is stored, you can call the Set_Screen command,
    which declares a two-dimensional array, Screen(width, height).
    This represents the Windows picture box where the render will
    finally be displayed.

    *Now, all copies are done on this Screen Array.

    Function GetBMPPixel (ByVal GType%, ByVal PixX%, ByVal PixY%) EXPORT As WORD
        Dim page As WORD, bytes As DWORD
        Dim bmpColor as WORD PTR

        page = 0
        Select Case GType%
            Case 0, 2
                bytes = PixX% + (PixY% * @GraphicsData.Icon2D.Icon.pWidth)
            Case Else
                bytes = PixX% + (PixY% * @GraphicsData.Icon2D.notIcon.pWidth)
        End Select

        page = FIX(bytes/16000)
        bytes = bytes - (page * 16000)

        Select Case GType%
            Case 0, 2
                bmpColor = @GraphicsData.Icon2D.Icon.bmpPages(page)
            Case Else
                bmpColor = @GraphicsData.Icon2D.notIcon.bmpPages(page)
        End Select

        GetBMPPixel = @bmpColor[bytes]
    End Function

    This routine retrieves a pixel color from the bitmap type def.
    The structure I am using holds 2 bitmaps for each structure, so maybe
    it's the Select Case branch that is slowing the routine, but I don't
    know.

    To find a bitmap pixel, the system has to find the bitmap type
    structure, calculate the page and locate the byte. I wrote a
    simple function to retrieve a bitmap pixel from the type
    definition and set the appropriate Screen(x,y) location to the
    new value.


    Endless copies can be completed and then an UpdateScreen command
    can be executed to send the final render to the PictureBox.

    I have an AMDK380Mhz machine and the copy routine is much slower
    than I would expect for a machine of this speed. I am using the
    PowerBasic 2.0 16-bit compiler, but was wondering if my approach
    was wrong.

    ------------------
    Explorations v3.0 RPG Development System
    http://www.explore-rpg.com

  • #2
    SELECT CASE could be a bottleneck in this type of repetitive code. Internally, SELECT CASE uses Extended Floating point since it can also handle ranges, etc.

    Try changing this section of code to use an IF-THEN block instead and see how much of a boost that gives you.

    Also, regarding the line:
    Code:
    page = FIX(bytes/16000)
    I've not studied your arithmetic to see if this will work, but if you can convert that to an integer division it will also improve performance.
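    A minimal sketch of both suggestions, assuming the page size of 16000 entries used in the posted routine. SplitOffset, RowWidth and their parameter names are hypothetical helpers for illustration and are not part of Tyrone's DLL. The backslash operator is PowerBASIC's integer division.

    ```powerbasic
    %PAGE_SIZE = 16000   ' entries per page, taken from the posted routine

    ' Hypothetical helper: split a linear pixel offset into a page number
    ' and an index within that page, using integer arithmetic only.
    SUB SplitOffset (BYVAL offset AS LONG, page AS LONG, index AS LONG)
        page  = offset \ %PAGE_SIZE            ' replaces FIX(bytes/16000)
        index = offset - (page * %PAGE_SIZE)   ' position within the page
    END SUB

    ' Hypothetical helper: pick the row width with IF/THEN instead of SELECT CASE.
    FUNCTION RowWidth (BYVAL GType AS LONG, BYVAL iconWidth AS LONG, BYVAL otherWidth AS LONG) AS LONG
        IF GType = 0 OR GType = 2 THEN
            FUNCTION = iconWidth
        ELSE
            FUNCTION = otherWidth
        END IF
    END FUNCTION
    ```

    For non-negative offsets the integer division gives the same page number as FIX(bytes/16000), so the rest of the routine would not need to change.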




    ------------------
    Lance
    PowerBASIC Support
    mailto:[email protected]

    • #3
      Tyrone,

      Could you not resolve the nested structure members up front (GraphicsData.Icon2D.Icon.pWidth, GraphicsData.Icon2D.notIcon.pWidth, etc) before the function call, or does this value really change with every call to the function ?

      Also, I'm only guessing, but if the .pWidth member only ever evaluates to values like 1 or 2, you could hard code each such possibility as, say, PixX% + PixY%, PixX% + PixY% + PixY%, etc, thereby avoiding having to perform a multiplication on PixY%.

      ~~~

      Lance,

      The book of rules says that using SELECT (CASE) rather than multiple IFs may produce a more efficient & concise program. I'm also looking to optimise a core routine, in this case examining sequential byte values - would your advice to use multiple IFs hold good in such a case too ?

      Thanks -

      Paul



      [This message has been edited by Paul Noble (edited March 05, 2001).]
      Zippety Software, Home of the Lynx Project Explorer
      http://www.zippety.net

      • #4
        Paul --

        > The book of rules says that using
        > SELECT (CASE) rather than multiple IFs
        > may produce a more efficient & concise
        > program.

        Concise, Yes. Efficient, No. As Lance said, SELECT uses floating point operations for all comparisons, so IF/THEN will usually be faster if you are testing integer values.

        > I'm also looking to optimise a core routine, in
        > this case examining sequential byte values

        Byte values are limited to 0-255, so this sounds like an ideal place for ON GOTO or ON GOSUB. I know, I know... People say you should never use GOTO because it leads to hard-to-read code, but look at this "structure"...

        Code:
        '-------------------- start of block
         
        ON bValue GOTO One, Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten
         
            One:
                'Process 1 here
                GOTO AllDone
         
            Two:
                'Process 2 here
                GOTO AllDone
         
            Three:
                'Process 3 here
                GOTO AllDone
         
            'and so on...
         
        AllDone:
         
        '-------------------- end of block
        You won't believe how fast it is, compared to an IF/THEN or SELECT!

        And if you confine all of the GOTOs to the "block" it would be hard to criticize that code on readability. It's as easy to read and understand as the corresponding IF/THEN or SELECT structure.

        The reason it is so fast is that it jumps directly to the appropriate code. In other words, if you have a long SELECT or IF/ELSEIF/ELSEIF... block, the compiler has to perform comparison after comparison until it finds a match. With ON GOTO, the compiler jumps directly to the correct choice, without making a lot of preliminary tests. Zooooom!

        -- Eric


        ------------------
        Perfect Sync Development Tools
        Perfect Sync Web Site
        Contact Us: mailto:[email protected]
        "Not my circus, not my monkeys."

        • #5
          Eric,

          Thanks so much for the advice - that's a great help. And I absolutely agree with your comments about the use of GOTO here.

          Cheers -

          Paul
          Zippety Software, Home of the Lynx Project Explorer
          http://www.zippety.net

          • #6
            Tyrone;

            I don't see the actual Display code here !

            All you show is reading the pixel values from memory but you don't
            show how it is actually drawn or rendered.

            Are you generating a Bitmap in RAM and then using BitBlt ?

            Or are you using SetPixelV to draw each pixel ?

            The proper way to build a Bitmap in memory and draw it fast is to
            use an actual Bitmap which is selected into a memory DC. Then move
            the Bitmap data to a DIB section, change the pixels in the DIB, then
            move the DIB data back to the Bitmap and then BitBlt the Bitmap
            from the memory DC to the Window DC.

            You cannot directly access a Bitmap's pixels in memory !
            You can use GDI functions like GetPixel/SetPixel on a Bitmap
            selected into a Memory DC, but these functions are extremely slow.
            Using a DIB section is the way to modify a Bitmap at high speed,
            since once the image is a DIB, you can modify pixels at lightning speed.

            Without seeing your rendering code, it is impossible to tell how
            much the code you posted above is really affecting your speed. While
            your code above can be improved in speed (I would use Eric's suggestion
            of ON GOTO or ON GOSUB), I would guess it is the rendering
            code that is the real bottleneck.



            ------------------
            Chris Boss
            Computer Workshop
            Developer of "EZGUI"
            http://cwsof.com
            http://twitter.com/EZGUIProGuy

            • #7

              A little clarification...

              Prior to any of these routines I use the GetBitmapBits to get the bitmap
              colors in an array. I store the bitmap color data in the structure
              I provided above.

              Whenever I want to use these tiles/textures, I create a screen surface.
              Screen (Width,Height) as WORD (I'm in 16-bit color)

              And all my copies/work take place in this array. I am NOT referencing the
              Windows GDI AT ALL, during this process. I am just moving values from
              the bitmap structure I explained to the appropriate screen location.
              When I am done all the work, I call UPDATE Screen (hbitmap)

              hbitmap is a Picture box.. The code executes a SetBitmapBits API call 1
              time to refresh the bitmap. This call is VERY fast, but the PowerBasic
              array / calculation routines are LAGGING big time..

              I will try the On Goto approach and maybe even isolate my code to
              really find out the problem.

              Oh one other thing..

              The control routine for the GetBMPPixel is a nested loop called fast copy.
              Maybe this is the problem. (It does have an If-Then statement in it.)

              Sub FastCopy (ByVal GType%, ByVal dx%, ByVal dy%, ByVal dWidth%, ByVal dHeight%, ByVal sX%, ByVal sY%, ByVal sWidth%, ByVal sHeight%)
                  Dim x%, y%, px%, py%
                  Dim ax%, ay%

                  For y% = 1 To dHeight%
                      For x% = 1 To dWidth%
                          px% = sX% + Int((x% / dWidth%) * sWidth%): py% = sY% + Int((y% / dHeight%) * sHeight%)
                          ax% = dx% + (x% - 1): ay% = dy% + (y% - 1)

                          If (ax% >= 0 And ax% < CurrentScreenWidth%) And (ay% >= 0 And ay% < CurrentScreenHeight%) Then
                              Screen(ax%, ay%) = GetBMPPixel(GType%, px%, py%)
                          End If
                      Next x%
                  Next y%
              End Sub

              This routine gets the source pixel by pixel and places it on the screen, with the
              possibility to stretch and shrink the source bmp.

              Other than the code shown, the routine is fairly simple. I don't know why
              it's so slow.


              ------------------
              Explorations v3.0 RPG Development System
              http://www.explore-rpg.com

              • #8
                You probably don't need those INT functions. Removing the parentheses from
                the IF statement will let PowerBASIC's early-out optimizations work for you.
                You may wish to put the GetBMPPixel functionality directly into the main
                loop to avoid the overhead of the function call.

                As others have mentioned for GetBMPPixel, you don't need the FIX function,
                and you can use an integer division there instead. You will get better speed
                from IF..THEN..ELSE than SELECT CASE.

                Also, you do not have to initialize page to zero. The compiler has
                already done this for you at the start of GetBMPPixel.
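
                As a rough sketch of the point about parentheses, the bounds test from
                FastCopy can be written as one flat chain of AND terms. The wrapper function
                InsideScreen and its parameter names are hypothetical and exist only to make
                the fragment self-contained.

                ```powerbasic
                ' Hypothetical wrapper around the bounds test used in FastCopy.
                FUNCTION InsideScreen (BYVAL ax AS LONG, BYVAL ay AS LONG, BYVAL scrWidth AS LONG, BYVAL scrHeight AS LONG) AS LONG
                    ' No grouping parentheses around the AND terms, as suggested above
                    IF ax >= 0 AND ax < scrWidth AND ay >= 0 AND ay < scrHeight THEN
                        FUNCTION = -1   ' true: the pixel is on the screen
                    ELSE
                        FUNCTION = 0    ' false: the pixel is clipped
                    END IF
                END FUNCTION
                ```

                In the real copy loop the IF line would sit inline rather than inside a
                function, since the call itself adds overhead.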

                ------------------
                Tom Hanlin
                PowerBASIC Staff

                • #9
                  I will try those changes...

                  The thing that bothers me is that the PowerBasic code isn't
                  even close to the throughput that I need for high speed
                  graphics.

                  I started a new thread with a simple For-Next loop that fills
                  an array with random colors. I used a bitmap that is 418x318 and
                  can barely send the data to the screen 30 times a second.

                  The industry standard is 800x600, so would I be able to get better
                  throughput by grabbing more bytes per iteration? Maybe a bmp line at a time?

                  ------------------
                  Explorations v3.0 RPG Development System
                  http://www.explore-rpg.com

                  • #10
                    Hmm, it also occurs to me that you're doing a great deal of redundant calculation.
                    If you consider (x% / dWidth%) * sWidth% as x% * (sWidth% / dWidth%)
                    then it will be obvious that you can calculate sWidth% / dWidth% outside the
                    loop and store it in a variable. You may need to experiment to find out whether
                    single precision, double precision, or extended precision is required here.

                    418x318x30 means nearly four million pixels per second, which doesn't sound too
                    awful to me. Anyway, while your code can still stand some tuning, the bottleneck
                    may well be elsewhere. Use a profiler or add timing code so you can get a better
                    idea of where to look.
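
                    One way the hoisting could look, as a sketch only. MapColumns and xScale are
                    hypothetical names, and the srcCol() array is an assumption added here just to
                    give the loop somewhere to store its results. The caller would need to
                    dimension srcCol() to at least dWidth elements.

                    ```powerbasic
                    ' Hypothetical sketch: map destination columns 1..dWidth to source
                    ' columns, computing the scale factor once instead of once per pixel.
                    SUB MapColumns (BYVAL sX AS LONG, BYVAL sWidth AS LONG, BYVAL dWidth AS LONG, srcCol() AS LONG)
                        LOCAL x AS LONG
                        LOCAL xScale AS SINGLE

                        xScale = sWidth / dWidth              ' computed once, outside the loop
                        FOR x = 1 TO dWidth
                            srcCol(x) = sX + INT(x * xScale)  ' one multiply per column
                        NEXT x
                    END SUB
                    ```

                    Because x * xScale is rounded differently from (x / dWidth) * sWidth, a few
                    columns may land one pixel off, which is why the precision of xScale is
                    worth experimenting with.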

                    ------------------
                    Tom Hanlin
                    PowerBASIC Staff

                    • #11
                      Tyrone;

                      You need to add some way to test your code to see how much time
                      is actually being spent in different tasks. There are a number of ways, such
                      as writing the value of the clock (timer$) to a log file before
                      executing a procedure that takes some time and after.

                      You need to be able to find out how much time is actually being spent
                      on various parts of your code. If you don't "profile" your code, you
                      could spend days trying to optimize the wrong part of your code.

                      You need to find out where the bottleneck is and then concentrate
                      on that.
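
                      A bare-bones version of such a timing test might look like the sketch below.
                      It assumes a 32-bit compiler and the built-in TIMER function. TimeFill and the
                      surface array are hypothetical, and the array size and frame count are simply
                      the 418x318 and 30 mentioned earlier in the thread.

                      ```powerbasic
                      ' Hypothetical timing harness: returns the seconds spent filling a
                      ' 418 x 318 array 30 times.
                      FUNCTION TimeFill () AS DOUBLE
                          LOCAL x AS LONG, y AS LONG, frame AS LONG
                          LOCAL tStart AS DOUBLE
                          DIM surface(417, 317) AS WORD

                          tStart = TIMER                      ' seconds since midnight
                          FOR frame = 1 TO 30
                              FOR y = 0 TO 317
                                  FOR x = 0 TO 417
                                      surface(x, y) = x + y   ' stand-in for the real pixel work
                                  NEXT x
                              NEXT y
                          NEXT frame
                          FUNCTION = TIMER - tStart           ' elapsed seconds (wrong if midnight passes)
                      END FUNCTION
                      ```

                      Wrapping the array work and the final screen update in separate timings like
                      this would show which of the two is actually eating the time.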

                      As far as the code you posted, the only thing I can say is that
                      you aren't optimizing the code much. The use of ON GOTO or ON GOSUB
                      (which I prefer) are very fast. SELECT CASE is definitely the wrong
                      structure to use if you want high speed.

                      Also, you shouldn't make function calls in the core of a high speed
                      loop. Function calls waste a lot of CPU cycles, because you are pushing
                      parameters on the stack and then you have to clean up the stack
                      when it returns. Definitely a time waster !

                      If you want your code to be modular, but speed is important, then use GOSUB
                      in the core loop, but keep all the functionality in one procedure.

                      Also avoid any redundancy (ie. INT(X&/Y&) - X& and Y& are integers and perform
                      integer divide, so no need for INT).

                      Lastly, first optimize your code as best as possible and then if
                      necessary convert what you can to ASSEMBLER ! Nothing beats assembler !




                      ------------------
                      Chris Boss
                      Computer Workshop
                      Developer of "EZGUI"
                      http://cwsof.com
                      http://twitter.com/EZGUIProGuy

                      • #12
                        The last time I programmed in Assembly was on a C=64. haha - showing my age
                        Thank ALL of you for your support. This code doesn't have to be modular
                        so I can eliminate the Jumps, get rid of the SELECT Case and really
                        attempt to write 1 complete routine to do all the work as FAST as
                        possible. Plus get rid of the redundant division calculations.

                        Speed is the key here.. Thanks again to ALL of you..

                        My goal is 1024x768 x30 but I'll settle for 800x600 x30..

                        Wish me luck!

                        ------------------
                        Explorations v3.0 RPG Development System
                        http://www.explore-rpg.com

                        • #13
                          Originally posted by Chris Boss:
                          > Also avoid any redundancy (ie. INT(X&/Y&) - X& and Y& are integers and perform
                          > integer divide, so no need for INT).
                          >
                          > Lastly, first optimize your code as best as possible and then if
                          > necessary convert what you can to ASSEMBLER ! Nothing beats assembler !

                          Careful with that axe, Eugene! The expression in question isn't of the form
                          Code:
                          INT(X& / Y&)
                          --- which would be an integer division. It's
                          Code:
                          INT((X& / Y&) * Z&)
                          --- which is a floating point division, then a floating point multiplication,
                          finally converted to integer. This is slower than pure integer code, of course,
                          but it's fully likely that the extra precision is required here.

                          While it's true that, potentially, nothing beats programming in assembly language
                          (not "assembler", which is the program that converts assembly language to machine
                          code), it should be said that the results will be extremely dependent on the
                          programmer and the processor involved. The results of inexpert assembly language
                          programming can be much slower than an equivalent PowerBASIC routine.


                          ------------------
                          Tom Hanlin
                          PowerBASIC Staff

                          • #14
                            Tom,

                            > INT((X& / Y&) * Z&)

                            This could be optimized if you use LONGs and the values are pixel coordinates.
                            Using LONGs you can do the multiplication first and the division next, using integer logic.
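
                            A small sketch of that idea. ScaleCoord is a hypothetical helper and the
                            parameter names are borrowed from the FastCopy routine posted earlier.

                            ```powerbasic
                            ' Hypothetical helper: scale a destination coordinate to a source
                            ' coordinate using integer arithmetic only.
                            FUNCTION ScaleCoord (BYVAL x AS LONG, BYVAL sWidth AS LONG, BYVAL dWidth AS LONG) AS LONG
                                ' Multiply first, then integer divide. LONGs give the product room.
                                FUNCTION = (x * sWidth) \ dWidth
                            END FUNCTION
                            ```

                            For example, with x = 400, sWidth = 100 and dWidth = 800 the intermediate
                            product is 40000 and the result is 50. That product is already larger than
                            32767, the most a 16-bit INTEGER can hold.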


                            Peter.


                            ------------------
                            [email protected]

                            • #15
                              Excellent point! And a fine demonstration of how it's almost always possible
                              to optimize something further.

                              ------------------
                              Tom Hanlin
                              PowerBASIC Staff

                              • #16
                                Pete..

                                > INT((X& / Y&) * Z&)

                                Why would you use LONGS instead of Integers in this case? LONGS process faster
                                than INTEGERS?

                                I posted this on another thread but, could someone email me or direct me
                                on places to find the key areas of optimization for PB. In short,
                                what areas can PB process faster than C++? So I can structure my
                                code around the areas that PB will run fastest.

                                If I bought the user manual for PB6.0/2.0 would it have this information?

                                Thanks again guys..
                                ------------------
                                Explorations v3.0 RPG Development System http://www.explore-rpg.com

                                [This message has been edited by Tyrone W. Lee (edited March 08, 2001).]

                                • #17
                                  On a 32-bit system LONGs run faster than Integers. PB is optimized
                                  for LONGs so you should use LONGs wherever possible...


                                  Cheers

                                  Florent

                                  ------------------

                                  • #18
                                    Amplifying Florent's comment, because this can have such a major effect...

                                    When using a 32-bit operating system, 32-bit operations are the most efficient. A LONG is a 32-bit signed integer, the fastest and most efficient variable type.

                                    PB/DOS users should keep in mind that the INTEGER type -- a 16-bit signed integer -- is the fastest variable type in a DOS program, even if that program is running under Windows. That's because the Windows "DOS subsystem" simulates a 16-bit environment for DOS apps. But for 32-bit apps, and that includes all PB/CC apps and all PB/DLL 2.0+ apps, LONGs are fastest. DWORDs (as I recall) come in second, but LONGs are definitely faster.

                                    -- Eric


                                    ------------------
                                    Perfect Sync Development Tools
                                    Perfect Sync Web Site
                                    Contact Us: mailto:[email protected]

                                    [This message has been edited by Eric Pearson (edited March 09, 2001).]
                                    "Not my circus, not my monkeys."

                                    • #19
                                      Tyrone,

                                      "What they said", plus you mentioned that these are pixel values. My comment was that you could eliminate floating point calculations by multiplying first and divide next. But then you'd have to use LONGs, just to make sure there's no overflow from the multiplication.

                                      You can only use this optimization if you're sure there can never be an overflow. That's another reason for using LONGs and integer calculations.

                                      Peter.


                                      ------------------
                                      [email protected]

                                      • #20
                                        Originally posted by Tyrone W. Lee:
                                        > I posted this on another thread but, could someone email me or direct me
                                        > on places to find the key areas of optimization for PB. In short,
                                        > what areas can PB process faster than C++? So I can structure my
                                        > code around the areas that PB will run fastest.

                                        The problem here is that you're talking about two very different languages
                                        (and not even any specific implementation of C++ !). It doesn't make any sense
                                        to compare the two in this fashion. To the extent the languages are directly
                                        comparable, they're liable to provide similar results. To the extent that they're
                                        not directly comparable, you're liable to be able to get similar results with
                                        appropriate coding. The way to get the best results is not to go bouncing
                                        from one compiler to another and expecting a miracle, but to focus on becoming
                                        a better programmer and improving your understanding of the compiler(s) that you
                                        use.

                                        ------------------
                                        Tom Hanlin
                                        PowerBASIC Staff
