Announcement

Collapse
No announcement yet.

ARRAY SCAN oddity

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • ARRAY SCAN oddity

    I've been playing with a variety of table search routines, and benchmarking various scenarios.

    But with one attempt using ARRAY SCAN, I'm getting unexpected results. I create a table (10,000 items), and then later do an ARRAY SCAN for a range of entries. i.e. I provide the starting index and the FOR value.

    But benchmarking says that when the scan has COLLATE UCASE it runs faster than when COLLATE UCASE is left out. In my tests, the COLLATE UCASE runs is about 1/2 the time of the one without COLLATE UCASE.

    This seems opposite to what I expected. Surely there's more overhead to perform COLLATE UCASE comparisons.

    Anyone else experienced this? Or have a possible explanation?

    I've checked the benchmarking measuring and I'm confident it's correct, I've been using the method for years.

    George

    P.S. I can provide a demo if people would like, but thought I'd ask the general question first.


  • #2
    I suspect that it is data-dependent. If you compare sorting BOB, Bob, BoB, and bOb to sorting BOB, BOB, BOB and BOB, the second sort is much simpler than the first.

    Similarly, if you start with BOB, BOB, BOB, and BOB, I'd expect it to sort a little bit slower with COLLATE UCASE because of the useless case-conversions.
    "Not my circus, not my monkeys."

    Comment


    • #3
      It is much easier and less code to compare " text" if they are all the same case upper or lower.

      I timed out posting , Eric hit it.
      A dozen what.

      Comment


      • #4
        Here's a Demo to play with. Sorry it is complicated by having an Object in there, but the routine I'm testing is a METHOD and I want to be able to swap an improved version into the main program once I feel I've optimized it all I can.

        George

        ScanBench.bas

        I tried to insert this within [CODE] brackets, but it lost all formatting. What's the trick to inserting code properly?

        Comment


        • #5
          And this is ARRAY SCAN, not ARRAY SORT.

          Comment


          • #6
            Posted Source Code Losing Indentation/Formatting - PowerBASIC Peer Support Community
            A dozen what.

            Comment


            • #7
              > not ARRAY SORT

              My mistake, but I think it's the same idea. If you have an array of Bobs, and all of them are BOB, you will find the first one faster.
              "Not my circus, not my monkeys."

              Comment


              • #8
                The test data here is randomly generated, and is the same for both tests (with COLLATE and without), so there is no 'items already in sequence' impact. The range of the table being searched IS sorted and in sequence.

                George

                Comment


                • #9
                  Originally posted by George Deluca View Post
                  Here's a Demo to play with. Sorry it is complicated by having an Object in there, but the routine I'm testing is a METHOD and I want to be able to swap an improved version into the main program once I feel I've optimized it all I can.

                  George

                  [ATTACH]n811231[/ATTACH]

                  I tried to insert this within [CODE] brackets, but it lost all formatting. What's the trick to inserting code properly?
                  Put your CODE brackets around your code in the PB Editor and then copy paste the whole thiing
                  Code:
                  #TOOLS OFF
                  #INCLUDE ONCE "Win32Api.inc"
                  %MaxLoop = 10000
                  TYPE KWIX
                     StartItem AS INTEGER
                     EndItem   AS INTEGER
                  END TYPE
                  GLOBAL qFreq, qStart, qStop AS QUAD
                  GLOBAL TT AS iTest
                  
                  CLASS cTest
                     INSTANCE ClrKW()     AS STRING
                     INSTANCE ClrKWAns()  AS LONG
                     INSTANCE ClrKWDlm()  AS LONG
                     INSTANCE ClrKWIX()   AS KWIX
                     INSTANCE ClrKWLoad   AS LONG
                     INSTANCE ClrKWNum    AS LONG
                     INSTANCE IsClrCase  AS LONG
                  
                     CLASS METHOD CREATE()
                     LOCAL t AS STRING, i AS LONG
                        DIM ClrKWDlm(255) AS INSTANCE LONG
                        DIM ClrKWIX(255)  AS INSTANCE KWIX
                        '---- Build some random data
                        ClrKWNum = %Maxloop
                        DIM ClrKW(1 TO %MaxLoop)      AS INSTANCE STRING
                        DIM ClrKWAns(1 TO %MaxLoop)   AS INSTANCE LONG
                        FOR i = 1 TO ClrKWNum
                           t = CHR$(RND(65, 122)) + REPEAT$(RND(1, 10), "X") + FORMAT$(i, "00000")
                           ClrKW(i) = CHR$(LEN(t)) + t
                           ClrKWAns(i) = i
                        NEXT i
                     END METHOD
                  
                     INTERFACE iTest: INHERIT IUNKNOWN
                        PROPERTY GET ClrKWLoad AS LONG:      PROPERTY = ClrKWLoad:   END PROPERTY
                        PROPERTY SET ClrKWLoad(v AS LONG):   ClrKWLoad = v:          END PROPERTY
                  
                        PROPERTY GET IsClrCase AS LONG:     PROPERTY = IsClrCase:    END PROPERTY
                        PROPERTY SET IsClrCase(v AS LONG):  IsClrCase = v:           END PROPERTY
                  
                        METHOD Test1()
                        LOCAL i, j AS LONG, str AS STRING POINTER
                           FOR i = 1 TO ClrKWNum
                              j = me.ClrKWSrch1(ClrKW(i), str)
                  '            IF VAL(RIGHT$(ClrKW(i), 5)) <> j THEN
                  '               MSGBOX "Search failed" + ClrKW(i) + " " + FORMAT$(j)
                  '            END IF
                           NEXT i
                        END METHOD
                  
                        METHOD ClrKWSrch1 (srch AS STRING, found AS DWORD) AS LONG
                        '---------- Do Binary lookup of color keywords
                        REGISTER i AS LONG                                          ' Use registers everywhere possible
                        REGISTER j AS LONG                                          '
                        LOCAL iFnd AS LONG
                  
                           METHOD = 987654: found = VARPTR(srch)                    ' Default fail answer
                  
                           '----- Build Index if KWs just loaded
                           IF ClrKWLoad THEN                                        ' Just loaded?
                              ClrKWLoad = %False                                    ' Just do this once
                              ARRAY SORT ClrKW() FOR ClrKWNum, TAGARRAY ClrKWAns()  ' Sort by length/text
                              RESET ClrKWIX()                                       ' Reset the Index and DLM tables
                              FOR i = 0 TO 255: ClrKWDlm(i) = 987654: NEXT i        '
                              FOR i = 1 TO ClrKWNum                                 ' Build the index
                                 j = ASC(ClrKW(i))                                  ' Get length of this KW
                                 ClrKW(i) = MID$(ClrKW(i), 2)                       ' Strip off length byte
                                 IF j = 1 THEN                                      ' Just a DLM?
                                    ClrKWDlm(ASC(ClrKW(i))) = ClrKWAns(i)           ' Save the Ans in DLM table
                                    ITERATE FOR                                     ' That's all that's needed
                                 END IF                                             '
                                 IF ClrKWIX(j).StartItem = 0 THEN ClrKWIX(j).StartItem = i ' 1st item of this length?
                                 ClrKWIX(j).EndItem   = i                           ' And as the end
                              NEXT i                                                '
                           END IF                                                   '
                  
                           '----- Do the search
                           i = LEN(srch)                                            ' Get length of this search word
                           IF i = 1 THEN METHOD = ClrKWDlm(ASC(srch)): found = VARPTR(srch): EXIT METHOD ' Exit quickly if just a DLM character
                           j = ClrKWIX(i).EndItem                                   ' Set ending range
                           i = ClrKWIX(i).StartItem                                 ' Set starting range
                           IF i = 0 THEN EXIT METHOD                                ' No range, exit
                           IF IsClrCase THEN                                        ' Case sensitive?
                              ARRAY SCAN ClrKW(i) FOR (j - i + 1), =srch, TO j      ' Do normal search
                           ELSE                                                     '
                              ARRAY SCAN ClrKW(i) FOR (j - i + 1), COLLATE UCASE, =srch, TO j ' Do case insensitive search
                           END IF                                                   '
                           IF j = 0 THEN EXIT METHOD                                ' Not found
                           METHOD = ClrKWAns(j + i - 1) : found = VARPTR(ClrKW(j + i - 1)) ' Return result
                        END METHOD
                  
                     END INTERFACE
                  END CLASS
                  
                  FUNCTION PBMAIN () AS LONG
                     QueryPerformanceFrequency qFreq
                     debug " "                                                    ' To get console activated
                  
                  Test1:
                     LET TT = NOTHING
                     LET TT = CLASS "cTest"                                       ' Create the Object structure
                     TT.ClrKWLoad  = %True
                     TT.IsClrCase  = %True
                     QueryPerformanceCounter   qStart
                     TT.Test1()
                     QueryPerformanceCounter   qStop
                     debug "Without COLLATE UCASE = " + FORMAT$((qStop-qStart)*1000/qFreq, "####.##")+"ms"
                  
                  Test2:
                     LET TT = NOTHING
                     LET TT = CLASS "cTest"                                       ' Create the Object structure
                     TT.ClrKWLoad  = %True
                     TT.IsClrCase  = %False
                     QueryPerformanceCounter   qStart
                     TT.Test1()
                     QueryPerformanceCounter   qStop
                     debug "With    COLLATE UCASE = " + FORMAT$((qStop-qStart)*1000/qFreq, "####.##")+"ms"
                  
                  ExitPrompt:
                     MSGBOX "Press OK to terminate"
                  END FUNCTION
                  
                  SUB      DEBUG (st AS STRING)
                  '---------- Print stuff to a console type window
                  STATIC hDebug AS DWORD
                     '----- Allocate a Window if we haven't already done so
                     IF hDebug = 0 THEN                                             ' Got one yet?
                        TXT.WINDOW("Debug Log", 0, 0, 40, 80) TO hDebug             ' No, get it now
                        TXT.PRINT TIME$ + " ========== Debug Log Started =========="' Print header
                     END IF
                     TXT.PRINT TIME$ + " " + st                                     ' Print the string
                  END SUB

                  Comment


                  • #10
                    A is ASC(65) whereas a is ASC(97) so the bit that =32(6th) can be ignored for COLLATE UCASE, no actual substitution has to take place and it only has to check for half the alpha characters. Perhaps it makes up the time difference thus.
                    Rod
                    In some future era, dark matter and dark energy will only be found in Astronomy's Dark Ages.

                    Comment


                    • #11
                      Haven't analysed it in detail, but:
                      Does this:
                      ARRAY SORT ClrKW() FOR ClrKWNum, TAGARRAY ClrKWAns()
                      sort the array so that upper case entries comes first?
                      In which case, COLLATE UCASE, may requirer shorter scans each time.

                      Comment


                      • #12
                        Originally posted by Stuart McLachlan View Post
                        Haven't analysed it in detail, but:
                        Does this:
                        ARRAY SORT ClrKW() FOR ClrKWNum, TAGARRAY ClrKWAns()
                        sort the array so that upper case entries comes first?
                        In which case, COLLATE UCASE, may requirer shorter scans each time.
                        ARRAY SORT ClrKW() FOR ClrKWNum, TAGARRAY ClrKWAns(), DESCEND would work to show if that is the case, methinks. Would require adding a third test to the pot and perhaps even a fourth test.

                        Incidentally, in my post (#10) I pointed out the sixth bit, which is bit #5 since they're zero based and that may not have been clear. A little further study on that revealed that both UCASE$(on ASC(97 to 122)) and LCASE$(on ASC(65 to 90)) seem to use:
                        BIT TOGGLE tmp, 5. Of course, 'under the hood' things may be different.
                        Code:
                        #COMPILE EXE
                        #DIM ALL
                        #INCLUDE ONCE "WIN32API.INC"
                        FUNCTION PBMAIN () AS LONG
                          LOCAL twin, ndx AS LONG
                          LOCAL tmp AS BYTE
                          LOCAL str, newstr AS STRING
                        
                          TXT.WINDOW ("A CASE of CASEs",10,10,60, 80) TO twin
                          FOR ndx=65 TO 122
                            tmp=ndx
                            SELECT CASE ndx
                            CASE 65 TO 90    ' similar to LCASE$
                              str=BIN$(tmp,8)
                              BIT TOGGLE tmp, 5
                              newstr=BIN$(tmp,8)
                              TXT.PRINT ndx, str, newstr, CHR$(tmp), tmp
                            CASE 97 TO 122   ' similar to UCASE$
                              str=BIN$(tmp,8)
                              BIT TOGGLE tmp, 5
                              newstr=BIN$(tmp,8)
                              TXT.PRINT ndx,str, newstr, CHR$(tmp), tmp
                            CASE ELSE
                        
                            END SELECT
                          NEXT ndx
                          TXT.WAITKEY$
                          TXT.END
                        END FUNCTION
                        Rod
                        In some future era, dark matter and dark energy will only be found in Astronomy's Dark Ages.

                        Comment


                        • #13
                          Rodney: I tried it with DESCEND, no effect, the COLLATE UCASE is still faster (about 1/2 the time)

                          Regarding how PB handles UCASE, I'm sure I read in some post that PB UCASE properly handles the ASCII European characters, which means it can't really just ignore a bit, or do a simple character range and subtract 32. I assume COLLATE UCASE is equally thorough.

                          I'm just going to proceed with my optimization and leave this as a curiosity. What I want to try is replacing ARRAY SCAN with a binary search to get a comparison. Anyone have a proven sample binary search code?

                          George

                          Comment


                          • #14
                            George, that was why my comment about 'under the hood'. However, I do think that the gains made are a result of having fewer different characters to scan.
                            Rod
                            In some future era, dark matter and dark energy will only be found in Astronomy's Dark Ages.

                            Comment

                            Working...
                            X