PowerBASIC DLL for C# App - Convert to Black and White


  • #41
    "without needing add-ons" would be my preference too but in the event that a gpu wouldn't do what's needed...that 144 core is general purpose.



    • #42
      Yep, there's a substitution folks use: (R+R+B+G+G+G)/6. The exact numbers are not critical at all.

      I've not considered multi-core solutions at all
      I'd try the integer math above to replace the FP math before I would even think about ^%$!$@# around with CPU allocation or graphics chips.

      The formula you give here might even be transferable to at least partially BASIC bitwise Boolean operations.

      Or maybe go around the problem with something like...
      Code:
      gray = 2*R + B + 3*G    ' R counted twice, B once, G three times
      gray = gray \ 6         ' integer divide by 6
      or something like that.

      The FP math is almost certainly what is slowing you down here.
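
      As a rough sketch of the integer substitution above, assuming PBWin (for MSGBOX) and using two made-up function names, GrayWeighted6 and GrayFloat, that do not come from any posted code: the weights become 2/6, 3/6 and 1/6 instead of 0.299, 0.587 and 0.114, so the two results differ slightly, which fits the point that the exact numbers are not critical.

      ```powerbasic
      #COMPILE EXE
      #DIM ALL

      ' Hypothetical helper: (R+R+B+G+G+G)\6 using integer math only.
      FUNCTION GrayWeighted6(BYVAL r AS LONG, BYVAL g AS LONG, BYVAL b AS LONG) AS LONG
          FUNCTION = (2 * r + 3 * g + b) \ 6
      END FUNCTION

      ' Floating-point reference, for comparison only.
      FUNCTION GrayFloat(BYVAL r AS LONG, BYVAL g AS LONG, BYVAL b AS LONG) AS LONG
          FUNCTION = 0.299 * r + 0.587 * g + 0.114 * b
      END FUNCTION

      FUNCTION PBMAIN () AS LONG
          ' Show both results for one sample colour (values made up).
          MSGBOX "Integer:" + STR$(GrayWeighted6(200, 100, 50)) + $CRLF + _
                 "Float:  " + STR$(GrayFloat(200, 100, 50))
      END FUNCTION
      ```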


      I "may" have said here before, the best way to eliminate any performance issues caused by certain code is to not do that code at all.

      MCM
      Michael Mattias
      Tal Systems Inc.
      Racine WI USA
      mmattias@talsystems.com
      http://www.talsystems.com



      • #43
        Originally posted by Michael Mattias

        I'd try the integer math above to replace the FP math before I would even think about ^%$!$@# around with CPU allocation or graphics chips.
        Yes. The FP math is perfectly adequate for single use.
        For continued use where your repetition of code builds up the lag,
        using integer math is going to be better.
        Here, I use integer math and an array of type BGRA REDIMed AT
        the start of the pixel data obtained by GRAPHIC GET BITS.
        Code:
        #COMPILE EXE
        #DIM ALL
        
        TYPE bgra
            b AS BYTE
            g AS BYTE
            r AS BYTE
            a AS BYTE
        END TYPE
        
        ENUM ctrls SINGULAR
            id_grid = 500
            id_new
            id_red
            id_grn
            id_blu
            id_gry
        END ENUM
        
        FUNCTION PBMAIN () AS LONG
            LOCAL hWin AS DWORD, returncode AS LONG
            RANDOMIZE
            DIALOG NEW PIXELS, 0, "BGRA Grid", , , 300, 265, %WS_POPUP OR %WS_SYSMENU OR %WS_CAPTION TO hWin
            CONTROL ADD BUTTON, hWin, %id_new, "NEW", 5, 3, 50, 20
            CONTROL ADD BUTTON, hWin, %id_red, "RED", 65, 3, 50, 20
            CONTROL ADD BUTTON, hWin, %id_grn, "GREEN", 125, 3, 50, 20
            CONTROL ADD BUTTON, hWin, %id_blu, "BLUE", 185, 3, 50, 20
            CONTROL ADD BUTTON, hWin, %id_gry, "GRAY", 245, 3, 50, 20
            CONTROL ADD GRAPHIC, hWin, %id_grid, "bgra grid", 0, 25, 300, 240
            DIALOG SHOW MODAL hWin, CALL DlgProc TO returncode
        END FUNCTION
        
        CALLBACK FUNCTION DlgProc() AS LONG
            STATIC bgstr AS STRING
            SELECT CASE CB.MSG
                CASE %WM_INITDIALOG
                    ' create background
                    bgdraw(CB.HNDL, %id_grid)
                    GRAPHIC ATTACH CB.HNDL, %id_grid
                    ' store background in static string
                    GRAPHIC GET BITS TO bgstr
                CASE %WM_COMMAND
                    SELECT CASE CB.CTL
                        CASE %id_new
                            IF CB.CTLMSG = %BN_CLICKED THEN
                                ' create background
                                bgdraw(CB.HNDL, %id_grid)
                                ' store background in static string
                                GRAPHIC ATTACH CB.HNDL, %id_grid
                                GRAPHIC GET BITS TO bgstr
                            END IF
                        CASE %id_red TO %id_gry
                            IF CB.CTLMSG = %BN_CLICKED THEN
                                ' restore background from static string to prevent data loss in conversion
                                GRAPHIC ATTACH CB.HNDL, %id_grid
                                GRAPHIC SET BITS bgstr
                                ' manipulate background with bgra array
                                changegraphic(CB.HNDL, %id_grid, CB.CTL)
                            END IF
                    END SELECT
            END SELECT
        END FUNCTION
        
        SUB bgdraw(win AS DWORD, ctl AS LONG)
            LOCAL x, y, z, r, c AS LONG
            GRAPHIC ATTACH win, ctl
            GRAPHIC COLOR %WHITE, %BLACK
            GRAPHIC CLEAR
            FOR z = 1 TO 100
                x = RND(0, 299)
                y = RND(0, 239)
                r = RND(10, 100)
                c = RGB(RND(20, 250), RND(20, 250), RND(20, 250))
                GRAPHIC ELLIPSE (x - r, y - r)-(x + r, y + r), c, c
            NEXT
        END SUB
        
        SUB changegraphic(win AS DWORD, ctl AS LONG, todo AS LONG)
            LOCAL strbuf AS STRING, pxl_arr() AS bgra, x, y, p AS LONG
            ' get buffer size to x, y
            CONTROL GET CLIENT win, ctl TO x, y
            ' calculate buffer array length
            x = (x * y) -1
            GRAPHIC ATTACH win, ctl
            ' get string from GRAPHIC object
            GRAPHIC GET BITS TO strbuf
            ' create array over the buffer, offset eight bytes to skip the two LONG values (width, height) at the start
            REDIM pxl_arr(x) AT STRPTR(strbuf) + 8
            ' select which button was pressed
            SELECT CASE todo
                CASE %id_red
                    FOR y = 0 TO x
                        ' use integer math to calculate gray scale brightness
                        p = ((pxl_arr(y).r * 299) + (pxl_arr(y).g * 587) + (pxl_arr(y).b * 114)) \ 1000
                        ' use shades of red
                        pxl_arr(y).r = 255
                        pxl_arr(y).g = p
                        pxl_arr(y).b = p
                    NEXT
                CASE %id_grn
                    FOR y = 0 TO x
                        ' green is too bright, so reduce range and scale green's brightness
                        p = ((pxl_arr(y).r * 299) + (pxl_arr(y).g * 587) + (pxl_arr(y).b * 114)) \ 2000
                        pxl_arr(y).r = p
                        ' use shades of green
                        pxl_arr(y).g = p + 127
                        pxl_arr(y).b = p
                    NEXT
                CASE %id_blu
                    FOR y = 0 TO x
                        p = ((pxl_arr(y).r * 299) + (pxl_arr(y).g * 587) + (pxl_arr(y).b * 114)) \ 1000
                        pxl_arr(y).r = p
                        pxl_arr(y).g = p
                        ' use shades of blue
                        pxl_arr(y).b = 255
                    NEXT
                CASE %id_gry
                    FOR y = 0 TO x
                        p = ((pxl_arr(y).r * 299) + (pxl_arr(y).g * 587) + (pxl_arr(y).b * 114)) \ 1000
                        ' use gray scale brightness value
                        pxl_arr(y).r = p
                        pxl_arr(y).g = p
                        pxl_arr(y).b = p
                    NEXT
            END SELECT
            ' set GRAPHIC object from string
            GRAPHIC SET BITS strbuf
        END SUB
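
        The eight-byte offset used above can also be read back from the string itself. The following is only a sketch, assuming PBWin and assuming the layout described in the code comment (two LONG values for width and height, then four bytes per pixel). The buffer here is built by hand as a stand-in for a real GRAPHIC GET BITS result, and the variable names are made up.

        ```powerbasic
        #COMPILE EXE
        #DIM ALL

        FUNCTION PBMAIN () AS LONG
            LOCAL buf AS STRING, w, h, pixels AS LONG

            ' Stand-in for a GRAPHIC GET BITS result: a 3 x 2 image.
            ' Two LONGs (width, height) followed by 4 bytes per pixel.
            buf = MKL$(3) + MKL$(2) + STRING$(3 * 2 * 4, 0)

            w = CVL(buf, 1)                ' width is stored in bytes 1-4
            h = CVL(buf, 5)                ' height is stored in bytes 5-8
            pixels = (LEN(buf) - 8) \ 4    ' pixel count from the remaining bytes

            MSGBOX "w=" + STR$(w) + "  h=" + STR$(h) + "  pixels=" + STR$(pixels)
        END FUNCTION
        ```

        Taking the size from the buffer rather than from the control means the array bound always matches the data actually returned.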
        The world is strange and wonderful.*
        I reserve the right to be horrifically wrong.
        Please maintain a safe following distance.
        *wonderful sold separately.



        • #44
          From post #30, with 0.044s as the speed that code reaches, here's an analysis of where the time comes from ....

          Code:
          Sub Byte_ConversionToBW(w As Long, h As Long, wh As Long)  'uses BYTE pointer
             Local i,iColor As Long,R,G,B As Long, bp As Byte Ptr, p As Long Ptr
             bmpOUT$ = bmpIN$
             p = StrPtr(bmpOUT$)+8
             For i = 1 To wh                                           '9ms (loop plus initialization lines above)
                bp = p : B = @bp[0] : G = @bp[1] : R = @bp[2]          '17ms
                iColor = 0.299*R + 0.587*G + 0.114*B                   '10ms
                If iColor < BWTrigger Then @p = BgrA Else @p = BgrB    '8ms
                Incr p                                                 '(part of the 9ms group)
             Next i
          End Sub
          The basic looping takes 9ms
          The pointer value extraction 17ms
          The color averaging 10ms
          The color compare/assignment 8ms



          • #45
            And here's the numbers using the Type BGRA suggestion.

            ... be right back ...
            Last edited by Gary Beene; 10 Aug 2017, 07:38 AM.



            • #46
              Gary,

              I don't log on here as often as I used to. I'll try to find the project I sent and resend it ASAP.

              Russ
              "There are two novels that can change a bookish fourteen-year old's life: The Lord of the Rings and Atlas Shrugged. One is a childish fantasy that often engenders a lifelong obsession with its unbelievable heroes, leading to an emotionally stunted, socially crippled adulthood, unable to deal with the real world. The other, of course, involves orcs." - John Rogers



              • #47
                Russ!
                No hurry - as you can tell I'm split between a variety of tasks. But I will watch for it!



                • #48
                  Patrice,
                  In Post #9, the code you posted was just for gray scale conversion, correct? I don't see how that converts pixels into one of a selected pair of colors based on a trigger value. Can you clarify?



                  • #49
                    There is a lot of scope for improving the speed but, sticking to just PB and no ASM, here's a SUB that runs twice as fast.

                    On my PC, the original program from post 30 took about 69ms.
                    The code in post 44 took 58ms.


                    Then remove the unused i loop variable and use p as the loop counter .. now takes 56ms
                    Replace the mixed up BYTE and LONG pointers with a single loop counter and use PEEK and POKE .. now takes 35ms
                    Replace the FP by calculating with scaled integers .. now takes 28ms.

                    Even a 30% change would be useful.
                    Mission accomplished?

                    Code:
                    'PBWin10 SUB
                    SUB Byte_ConversionToBW(w AS LONG, h AS LONG, wh AS LONG)  'uses BYTE pointer
                       LOCAL p,iColor,R,G,B, BWtriggerScaled AS LONG
                       bmpOUT$ = bmpIN$
                    
                       BWtriggerScaled = BWTrigger * 65536
                    
                       p = STRPTR(bmpOUT$)+8
                       FOR p = p TO p + 4 * (wh - 1) STEP 4    'the last pixel starts 4*(wh-1) bytes after the first
                    
                          B = PEEK(BYTE, p) : G = PEEK( BYTE, p+1) : R = PEEK(BYTE, p+2)
                    
                          iColor = 19595 * R + 38470 * G + 7471 * B    'these are the same coefficients but *65536, scaled the same as the trigger value was
                    
                          IF iColor < BWtriggerScaled THEN POKE LONG,p, BgrA ELSE POKE LONG, p, BgrB
                    
                       NEXT p
                    
                    END SUB
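
                    For anyone wondering where the scaled coefficients come from, here is a small sketch, assuming PBWin. The sample pixel and trigger values are made up for illustration. Each weight is multiplied by 65536 and rounded: 0.299 gives 19595, 0.587 gives 38470 and 0.114 gives 7471, and the three still add up to exactly 65536.

                    ```powerbasic
                    #COMPILE EXE
                    #DIM ALL

                    FUNCTION PBMAIN () AS LONG
                        LOCAL r, g, b, trigger, scaledSum, fpGray AS LONG

                        r = 200 : g = 100 : b = 50      ' sample pixel (made up)
                        trigger = 128                   ' sample threshold (made up)

                        ' 0.299, 0.587, 0.114 each multiplied by 65536 and rounded.
                        ' 19595 + 38470 + 7471 = 65536, so the weights still sum to 1.0.
                        scaledSum = 19595 * r + 38470 * g + 7471 * b

                        ' Floating-point version, for comparison.
                        fpGray = 0.299 * r + 0.587 * g + 0.114 * b

                        ' Either compare the scaled sum with the scaled trigger, or
                        ' divide it back down by 65536 to get the 0-255 brightness.
                        MSGBOX "FP gray:" + STR$(fpGray) + $CRLF + _
                               "Scaled gray:" + STR$(scaledSum \ 65536) + $CRLF + _
                               "Below trigger:" + STR$(scaledSum < trigger * 65536)
                    END FUNCTION
                    ```

                    The largest possible scaled sum is 255 * 65536 = 16,711,680, which fits comfortably in a LONG.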



                    • #50
                      Paul!
                      Good grief! That code destroys what I posted earlier. Mine took 0.046s whereas your code takes only 0.017s, just barely 1/3 the time I posted. That's amazing!

                      And you think ASM can do even better? I can't wait to see that too (hint!), but that code alone supports almost 60 fps. Very nice, and thanks for sharing!



                      • #51
                        Paul,
                        I applied your scaling approach to the pointer code I posted. That change alone dropped the 0.046s down to about 0.040s. A worthwhile change all by itself, but the huge difference comes from your use of Peek/Poke to get the job done.

                        Code:
                        Sub Byte_ConversionToBW(w As Long, h As Long, wh As Long)  'uses BYTE pointer
                           Local i,iColor, BWTriggerScaled As Long,R,G,B As Long, bp As Byte Ptr, p As Long Ptr
                           bmpOUT$ = bmpIN$
                           BWtriggerScaled = BWTrigger * 65536
                           p = StrPtr(bmpOUT$)+8
                           For i = 1 To wh                                          
                              bp = p
                              iColor = 7471*(@bp[0]) + 38470*(@bp[1]) + 19595*(@bp[2])   'bytes are stored B, G, R, so B gets the 0.114 weight and R the 0.299 weight
                              If iColor < BWTriggerScaled Then @p = BgrA Else @p = BgrB   
                              Incr p              
                           Next i
                        End Sub



                        • #52
                          Gary,
                          "your code takes only 0.017s,"

                          it shouldn't be that fast, you'd better check carefully that it works.

                          Paul.



                          • #53
                            Gary,
                            "your code takes only 0.017s"

                            As expected, I'm not going to beat that by much with the usual ASM. The following runs in 17 to 18ms on my PC which is better than the 28ms I was getting with the BASIC code but no better than you were getting.
                            The processing takes about 11ms here. The remainder is mostly consumed by your copying of the string from bmpIN$ to bmpOUT$

                            To beat it by more I would have to look at using MMX and SSE code.

                            Code:
                            'PBWin10 SUB
                            SUB Byte_ConversionToBW(w AS LONG, h AS LONG, wh AS LONG)  
                            #REGISTER NONE    'I'm using the registers so don't let the compiler mess them up
                            
                               LOCAL p,BWtriggerScaled AS LONG
                               bmpOUT$ = bmpIN$
                            
                               BWtriggerScaled = BWTrigger * 65536
                            
                               p = STRPTR(bmpOUT$)+8
                            
                                !mov ecx,wh     'get the address of wh (it's a SUB parameter so when I load it I get the address, not the value).
                                !mov ecx,[ecx]  'get the value of wh
                                !dec ecx        'start at index wh-1, the last pixel, so the loop does not run one pixel past the buffer
                            
                                !mov esi,p      'get the pointer to the data into esi
                            
                            #ALIGN 16           'align the jump target of the loop on a cache boundary as it's faster
                            lp1:
                                !movzx eax,byte ptr [4*ecx+esi]     'get the B byte
                                !imul eax,7471                      'multiply by 0.114 * 65536
                            
                                !movzx edx,byte ptr [4*ecx+esi+1]   'get the G byte
                                !imul edx,38470                     'multiply by 0.587 * 65536
                            
                                !movzx edi,byte ptr [4*ecx+esi+2]   'get the R byte
                                !imul edi,19595                     'multiply by 0.299 * 65536
                            
                                !add eax,edx                        'add the B and G results
                                !add eax,edi                        'add in the R result
                            
                                !mov edx,BgrB                       'set the default colour ready for later
                            
                                !cmp eax,BWtriggerScaled            'is the sum greater than the threshold?
                                !cmovl edx,BgrA                     'choose the other colour for less than the threshold
                            
                                !mov [4*ecx+esi],edx                'store the new colour
                            
                                !dec ecx                            'count down the loop
                                !jns short lp1                      'if not negative, go back for the next pixel
                            
                            END SUB
                            Last edited by Paul Dixon; 11 Aug 2017, 02:43 PM.



                            • #54
                              Gary,

                              I looked for that little sample rgb->bw converter that I wrote and I don't remember what I called it. If you have the name, I have a better chance of finding it.

                              Russ


                              • #55
                                Hey Paul!

                                Well, I've run it several more times and the image changes as expected and the time is still showing as 0.017s. When I use your ASM code, that drops to about 0.012s - with the picture changing as expected.

                                I have a 10-year old i7, so nothing magic about my setup that I know of.

                                I'll do some more looking at the code to see if anything seems awry.

                                I'm making a few other changes and when I finish I'll post it again so you can see if my files work the same for you.



                                • #56
                                  Hi Russ!
                                  I'll go take a look ... be back ...



                                  • #57
                                    Gary,
                                    my AMD CPU is about 10 years old too.

                                    It used to be that AMD chips were faster at processing the data but Intel chips could move it around faster.
                                    It may just be that as the amount of processing was reduced and memory access became the dominant factor, my AMD CPU started running into problems with the memory not keeping up so didn't speed up as much as yours which may have been able to take advantage of faster memory.
                                    Whatever the reason, it does look to be doing what it's meant to.

                                    My PC has a maximum transfer rate of about 5GB/s and the ASM version is running at about 3.5GB/s so there isn't much room for improvement. .. unless I get a new PC, which is a few years overdue!



                                    • #58
                                      A threaded version.
                                      This one runs in under 4ms on my PC.
                                      Code:
                                      'PBWin10 snippet
                                      TYPE ThreadToken            'used to pass information to the threads about which part of the image they should process
                                          StartAddress AS LONG
                                          PixelsToDo   AS LONG
                                          BWtriggerScaled AS LONG
                                      END TYPE
                                      
                                      
                                      SUB Byte_ConversionToBW(w AS LONG, h AS LONG, wh AS LONG)
                                      
                                      LOCAL r,p,BWtriggerScaled, NoofThreads AS LONG
                                      STATIC CalledBefore AS LONG
                                      LOCAL BlockInfo() AS ThreadToken
                                      LOCAL hThreads() AS LONG
                                      
                                      IF NOT CalledBefore THEN
                                          CalledBefore = -1   'so we don't do this again
                                          'only do this copy once to create the bitmap rather than re-do it each time
                                          bmpOUT$ = bmpIN$
                                      
                                      END IF
                                      
                                      NoofThreads = 4  'be careful to only use numbers which divide into the number of pixels with no remainder as I don't check for odd pixels. e.g 2,4,8,16 may be ok
                                      DIM hThreads(1 TO NoofThreads)
                                      DIM BlockInfo(1 TO NoofThreads)
                                      
                                         BWtriggerScaled = BWTrigger * 65536
                                      
                                      
                                         p = STRPTR(bmpOUT$)+8
                                      
                                          FOR r = 1 TO NoofThreads
                                              BlockInfo(r).StartAddress = p + (r-1)*(wh\NoofThreads)*4
                                              BlockInfo(r).PixelsToDo = wh \ NoofThreads
                                              BlockInfo(r).BWtriggerScaled = BWtriggerScaled
                                      
                                              THREAD CREATE ProcessingThread(VARPTR(BlockInfo(r))) TO hThreads(r)
                                      
                                          NEXT
                                      
                                          FOR r = 1 TO  NoofThreads
                                              WaitForSingleObject(hThreads(r), BYVAL %INFINITE)
                                      
                                          NEXT
                                      
                                      
                                      END SUB
                                      
                                      
                                      
                                      THREAD FUNCTION ProcessingThread(BYVAL TokenPointer AS LONG) AS LONG
                                      #REGISTER NONE    'I'm using the registers so don't let the compiler mess them up
                                      
                                      LOCAL InputData AS ThreadToken  PTR
                                      LOCAL StartAddress,PixelsToDo, BWtriggerScaled AS LONG
                                      
                                      InputData=TokenPointer
                                      
                                      StartAddress = @InputData.StartAddress
                                      PixelsToDo = @InputData.PixelsToDo
                                      BWtriggerScaled = @InputData.BWtriggerScaled
                                      
                                      !mov ecx,PixelsToDo
                                      !dec ecx             'start at index PixelsToDo-1 so this block does not spill one pixel into the next block
                                      !mov esi,StartAddress
                                      
                                      #ALIGN 16           'align the jump target of the loop on a cache boundary as it's faster
                                      lp1:
                                          !movzx eax,byte ptr [4*ecx+esi]     'get the B byte
                                          !imul eax,eax,7471                      'multiply by 0.114 * 65536
                                      
                                          !movzx edx,byte ptr [4*ecx+esi+1]   'get the G byte
                                          !imul edx,edx,38470                     'multiply by 0.587 * 65536
                                      
                                          !movzx edi,byte ptr [4*ecx+esi+2]   'get the R byte
                                          !imul edi,edi,19595                     'multiply by 0.299 * 65536
                                      
                                          !add eax,edx                        'add the B and G results
                                          !add eax,edi                        'add in the R result
                                      
                                          !mov edx,BgrB                       'set the default colour ready for later
                                      
                                          !cmp eax,BWtriggerScaled            'is the sum greater than the threshold?
                                          !cmovl edx,BgrA                     'no, choose the other colour for less than the threshold
                                      
                                      
                                          !mov [4*ecx+esi],edx                'store the new colour
                                      
                                          !dec ecx                            'count down the loop
                                          !jns short lp1                      'if not negative, go back for the next pixel
                                      
                                      END FUNCTION
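
                                      The comment on NoofThreads warns that odd pixels are not handled. One possible way to lift that restriction is sketched below, assuming PBWin. It shows only the block arithmetic, with no threads, and the variable names and pixel count are made up for illustration. The first few blocks each take one extra pixel until the remainder is used up.

                                      ```powerbasic
                                      #COMPILE EXE
                                      #DIM ALL

                                      FUNCTION PBMAIN () AS LONG
                                          LOCAL wh, nBlocks, baseCount, extra, r, firstPixel, count AS LONG
                                          LOCAL report AS STRING

                                          wh = 1003          ' made-up pixel count that 4 does not divide
                                          nBlocks = 4

                                          baseCount = wh \ nBlocks      ' pixels every block gets
                                          extra = wh MOD nBlocks        ' left-over pixels to hand out

                                          firstPixel = 0
                                          FOR r = 1 TO nBlocks
                                              count = baseCount
                                              IF r <= extra THEN INCR count    ' first "extra" blocks take one more
                                              report = report + "Block" + STR$(r) + ": first=" + STR$(firstPixel) + _
                                                       " count=" + STR$(count) + $CRLF
                                              firstPixel = firstPixel + count
                                          NEXT

                                          MSGBOX report
                                      END FUNCTION
                                      ```

                                      In the threaded SUB, the start address of each block would then be p + firstPixel * 4 and PixelsToDo would be count.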



                                      • #59
                                        Paul,
                                        No fair ... I have to leave for a moment and you picked now to tease me! I'll be back in a few hours and take a look!



                                        • #60
                                          Paul,
                                          That is just this side of magic! I get similar times here - 4ms or less.

                                          So, any PC with a multi-core processor should be able to take advantage of this approach? Your code creates the threads and Windows handles the assignment of threads to available cores. If there's only one core, it still works, just not as fast.

                                          At those speeds, the app I'm working on that uses AForge will take far more time to acquire each video frame than to process it. A frame rate of 1000/4 = 250 fps for the algorithm means it has pretty much zero impact on the task. The overhead of the other video functions dwarfs the frame processing time.

                                          That's awesome, and thank you for the demonstration!

                                          I think I'll make a thread example using one of my earlier posts, one without ASM just to see the speed boost that you've demonstrated.
