Announcement

Collapse
No announcement yet.

Need better (faster) INSTR

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #21
    Stuart: Finally got to testing out your Flag code. There is a problem.

    The Alloc routine finds the leftmost available bit, this creates the 'flag number'.

    But BIT SET / RESET count the bit number from Right-to-Leftm modulo 'n', based on the definition of the BIT SET variable. So when you allocate a bit, BIT SET is not setting the same bit you just found.

    So the alloc keeps finding the same bit over and over.

    e.g. (Using a 16 bit variable)
    Start X'0000' Alloc chooses Buffer 1, Flag=1
    BIT SET var, Flag - 1
    X'0001' Oops, buffer 32 is set
    Alloc keeps assigning Flag = 1 forever.


    So although BIT SET will handle bit numbers in the millions, it does not set the bits serially from MSB to LSB, but the other way around, for each item in the bit flag array.

    We really need a BIT SET function to handle the bit number from left to right.

    George

    Comment


    • #22
      George,

      Which code? The final version here https://forum.powerbasic.com/forum/u...174#post795174 or one of the earlier versions?


      "The Alloc routine finds the leftmost available bit, this creates the 'flag number'."
      There is no "Alloc routine" in any of the code I have posted. Are you referring to the GetFirstFreeFlag procedure?
      It does not "find the leftmost available bit", it finds the "lowest available flag number".
      Your assumption that this returns a different flag number to the one used in the SetFlag, ClearFlag and FlagIsSet procedures is incorrect

      The application creates a logical array of flags numbered sequentially (from 1). How those flags are physically represented in memory is irrelevant.

      While it is a fact that the physical representation in memory of Flag(1) is the 64th bit in the block of memory allocated as a flag buffer and the first bit in memory represents Flag(64) , that is totally irrelevant to the functionality of the Flag management.

      There is no requirement to set the bits "serially from MSB to LSB" or any other physical order as long as you use the functions provided.
      There is also no requirement for a "BIT SET function to handle the bit number from left to right"

      It is only if you go outside of the flag management procedures and directly try to set bits in memory that a problem can arise.

      Physical storage is predicated on the most efficient way to read and write logical flags, not on how they "look" in memory.
      That's why we need a more complicated procedure to find the first available flag rather than just iterating through the bits in memory. i.e.
      Code:
      FASTPROC GetFirstFreeFlag() AS LONG
      STATIC x AS LONG
      STATIC Flag AS LONG
      Flag = 0
      FOR x = 0 TO (LEN(sFlags)\8)-1
          IF @pFlags[x] <> &HFFFFFFFFFFFFFFFF THEN
              Flag = x * 64 + 65 - INSTR(-1,BIN$(@pFlags[x],64),"0")
              EXIT FOR
          END IF
      NEXT
      END FASTPROC = Flag
      May I say, you remind me of those users who want to directly access the tables in a data application which uses many related tables and keys. Doing that instead of using the built in functionality of an application generally leads to corrupt data since physical storage and logical structures are two entirely different things.

      Comment


      • #23
        Stuart: My apologies if it looked like I was taking 'pot shots' at you, I wasn't. Before I incorporated your code I was running benchmarks against what is currently in the app, and your code. I was also evaluating some other techniques that looked promising.

        As I started getting results, I was trying to combine the best ideas from several methods, when I noticed the difference in bit counting. I goofed, I thought the flaw was in your code.

        Not!

        It was in my understanding.

        However, something good did come out of it. There is an improvement to GetFirstFreeFlag that in my benchmarks runs 3-4 times faster as the number of flags goes up. It uses a binary search to find the first free bit rather than an INSTR of the BIN$ value.. But it does require flipping the 'available' status to a 1 bit from a 0 bit, not a big deal.

        Here's my benchmark test program.

        George
        Code:
        #COMPILE EXE
        #DIM ALL
        #INCLUDE ONCE "Win32Api.inc"
        
        '----- Flag Global Variables
        GLOBAL TIDX AS STRING ' String storing the bit flags
        GLOBAL TIDXqp AS QUAD PTR ' Pointer to blocks of 64 flags
        GLOBAL TIDXbp() AS BYTE ' Overlay of TIDX for bit manipulation
        
        FUNCTION PBMAIN () AS LONG
        LOCAL i, j, FlagCount, TestCount, hWin AS LONG
        LOCAL qFreq, qStart, qStop, n, n1 AS QUAD, Result AS STRING
           TXT.WINDOW("Test Log", 40, 80) TO hWin
           TXT.PRINT " "
           QueryPerformanceFrequency qFreq
           TestCount = 200000
        Test1:
        '----- Initialise buffer array
           Flagcount = 2500000 ' will be rounded up to nearest multiple of 64
           Flagcount = Flagcount + (63 - (Flagcount - 1) MOD 64)
           TIDX = REPEAT$(Flagcount, CHR$(255,255,255,255,255,255,255,255))
           i = STRPTR(TIDX)
           TIDXqp = STRPTR(TIDX)
           REDIM TIDXbp(1 TO LEN(TIDX) / 8) AT TIDXqp
           QueryPerformanceCounter qStart
           FOR i = 1 TO TestCount
              j = GetFirstFreeFlag1
              ClearFlag(j)
           NEXT i
           QueryPerformanceCounter qStop
           TXT.PRINT "BIT INSTR Search = " + FORMAT$((qStop-qStart)*1000/qFreq, "####.##")+"ms"
           TXT.PRINT " "
        Test2:
        '----- Initialise buffer array
           Flagcount = 2500000 ' will be rounded up to nearest multiple of 64
           Flagcount = Flagcount + (63 - (Flagcount - 1) MOD 64)
           TIDX = REPEAT$(Flagcount, CHR$(255,255,255,255,255,255,255,255))
           TIDXqp = STRPTR(TIDX)
           REDIM TIDXbp(1 TO LEN(TIDX)) AT TIDXqp
           QueryPerformanceCounter qStart
           FOR i = 1 TO TestCount
              j = GetFirstFreeFlag2
              ClearFlag(j)
           NEXT i
           QueryPerformanceCounter qStop
           TXT.PRINT "BIT Binary Search = " + FORMAT$((qStop-qStart)*1000/qFreq, "####.##")+"ms"
           TXT.PRINT " "
        
           TXT.LINE.INPUT "All done, press Enter", Result
        
        END FUNCTION
        
        FUNCTION GetFirstFreeFlag1() AS LONG
        '---------- Find next available buffer number
        LOCAL x, Flag AS LONG
           FOR x = 0 TO (LEN(TIDX) \ 8) - 1
              IF @TIDXqp[x] <> &H0000000000000000 THEN
                 Flag = x * 64 + 65 - INSTR(-1, BIN$(@TIDXqp[x], 64), "1")
                 EXIT FOR
              END IF
           NEXT
           FUNCTION = Flag
        END FUNCTION
        
        FUNCTION GetFirstFreeFlag2() AS LONG
        '---------- Find next available buffer number
        REGISTER Flag AS LONG
        REGISTER x AS LONG
        LOCAL y AS QUAD
           FOR x = 0 TO (LEN(TIDX)\8)-1
              IF @TIDXqp[x] <> &H0000000000000000 THEN
                 y = @TIDXqp[x]
                 flag = 64
                 IF (y AND &H00000000FFFFFFFF) THEN SHIFT LEFT y, 32: flag -= 32
                 IF (y AND &H0000FFFF00000000) THEN SHIFT LEFT y, 16: flag -= 16
                 IF (y AND &H00FF000000000000) THEN SHIFT LEFT y, 8: flag -= 8
                 IF (y AND &H0F00000000000000) THEN SHIFT LEFT y, 4: flag -= 4
                 IF (y AND &H3000000000000000) THEN SHIFT LEFT y, 2: flag -= 2
                 IF (y AND &H4000000000000000) THEN SHIFT LEFT y, 1: flag -= 1
                 EXIT FOR
              END IF
           NEXT
           FUNCTION = Flag
        END FUNCTION
        
        FASTPROC ClearFlag(BYVAL FlagNo AS LONG)
        '---------- Mark buffer as 'in use'
           BIT RESET TIDXbp(1),FlagNo - 1
        END FASTPROC

        Comment


        • #24
          I may be misreading it, but ISTM that GetFirstFreeFlag2 will always return a Flag in the range 1 - 64 ?

          That means that your test is going to be locating a free flag in the first block it tests every time

          Change FUNCTION = Flag to FUNCTION = x * 64 +Flag and the comparative times are very different

          And if you change GetFirstFreeFlag1 from
          IF @TIDXqp[x] <> &H0000000000000000 THEN
          to
          IF @TIDXqp[x] THEN
          I think you will find that it is now faster than the corrected GetFirstFreeFlag2

          (but making the same change to GetFirstFreeFlag2 does make it marginally faster again)

          Comment


          • #25
            Add the following:
            Code:
            MACRO GetFirstFreeFlag4 = GetFirstFreeFlag(0,0)  
            FASTPROC GetFirstFreeFlag3(BYVAL x AS LONG,BYVAL flag AS LONG) AS LONG
            FOR x = 0 TO (LEN(TIDX)\8)-1
            IF @TIDXqp[x] THEN
            Flag = x * 64 + 65 - INSTR(-1,BIN$(@TIDXqp[x],64),"0")
            EXIT FOR
            END IF
            NEXT
            END FASTPROC = Flag
            Change
            j = GetFirstFreeFlag1
            to
            j = GetFirstFreeFlag4

            Now compare the times

            Comment


            • #26
              Thanks for getting me to look at optimising this . I've just updated the Source Code forum listing to the much faster use of Macros and Register variable in place of Statics as per Post#25 above

              Comment


              • #27
                OK, I'll go explore your new code. I found it fascinating exploring how many different algorithms there were for the "Find Next Bit" function.

                I must say though, that the Benchmark I posted doesn't show it, but one of the heavily used functions in my App is allocating a large 'block' of buffers (i.e. during a complete file load into the editor) and my old INSTR of a character string for the next buffer beats all others if the INSTR is started AFTER the last successful search. Again, I'm going to try combining all these techniques. This is proving to be a fascinating exploration.

                George

                Comment


                • #28
                  A faster solution was mentioned in post#2 and again in post#6

                  Why search for a flag?
                  Just have a list of available flags.
                  The next free flag is the next on the list.
                  When you finish with a flag, add it back to the list.
                  Either a stack or a circular list would be many times faster than searching a bit array.

                  Comment


                  • #29
                    Originally posted by George Deluca View Post
                    OK, I'll go explore your new code. I found it fascinating exploring how many different algorithms there were for the "Find Next Bit" function.

                    I must say though, that the Benchmark I posted doesn't show it, but one of the heavily used functions in my App is allocating a large 'block' of buffers (i.e. during a complete file load into the editor) and my old INSTR of a character string for the next buffer beats all others if the INSTR is started AFTER the last successful search. Again, I'm going to try combining all these techniques. This is proving to be a fascinating exploration.

                    George
                    Ooooh, Set/ClearBlock(FirstFlag,LaastFlag) I'm off to add that, give me a few minutues

                    Comment


                    • #30
                      Originally posted by Paul Dixon View Post
                      A faster solution was mentioned in post#2 and again in post#6

                      Why search for a flag?
                      Just have a list of available flags.
                      The next free flag is the next on the list.
                      When you finish with a flag, add it back to the list.
                      Either a stack or a circular list would be many times faster than searching a bit array.
                      For a start, because your solution takes 32 times as much memory
                      The current method will set/clear a block of a thousand flags in around 0.01 milliseconds. How long will your list/stack of longs take to do the same?
                      Setting up an initial array of 1 million flags takes less than 0.1 milliseconds
                      It takes 0.05 milliseconds to find the first available free flag where the first million are already set.
                      It counts the number of free flags in a 1 million flag array in about 5 milliseconds
                      It's a generic flag management solution which also lets you set, clear or toggle any flag, not just use the last one that was freed..

                      But feel free to post your alternative version with the same capabilities and the same or better times so that we can compare to the latest version in the thread at
                      https://forum.powerbasic.com/forum/u...ags#post795140
                      Last edited by Stuart McLachlan; 28 Jun 2020, 05:02 AM.

                      Comment


                      • #31
                        Version 1.3 with Set/ClearBlock(lStart,lStop) procedures added in Source Code thread.

                        Comment


                        • #32
                          Originally posted by Stuart McLachlan View Post
                          I may be misreading it, but ISTM that GetFirstFreeFlag2 will always return a Flag in the range 1 - 64 ?

                          That means that your test is going to be locating a free flag in the first block it tests every time

                          Change FUNCTION = Flag to FUNCTION = x * 64 +Flag and the comparative times are very different
                          Doing my regular morning visit to Dilbert and thought today's entry was appropriate to this

                          Click image for larger version  Name:	e1265fd08d7f013808e1005056a9545d.gif Views:	0 Size:	97.2 KB ID:	796188

                          Comment


                          • #33
                            Stuart,
                            there's nothing wrong with some general purpose bit flag system, it's just not the best solution for George if he needs it to be fast.

                            For a stack based system:
                            Getting the next flag is near instantaneous. It's just an array access. It'll be a lot faster than using bits and is the same speed regardless of how many flags are in use.
                            Returning a used flag is the same.
                            These are the 2 main operations needed by George.

                            Counting flags in use means just returning the stack pointer. Maybe a 1000000x faster than counting the bits.

                            It does use more memory, but for a 2.5million flags that's not significant.
                            It doesn't easily allow mass clearing and setting of specific blocks of flags but that doesn't appear to be something George needs.



                            If you're using blocks of bits as flags, you could improve things.
                            Don't convert 64 bits to a string then use INSTR to find a flag. That's incredibly slow.
                            Instead, scan the binary for the set bit.
                            If you don't mind a few ASM instructions then the CPU has the BSF (Bit Scan Forwards) and BSR (Bit Scan Reverse) instructions which do it for you on 32-bit values.

                            Comment


                            • #34
                              Here's a modified version of the code from post#23 posted by George.
                              I've added a 3rd test using stacks.
                              It's untested and no attempt has been made to optimise it.
                              Maybe George can try it out in a real situation as the test he posted isn't very realistic.

                              Code:
                              'PBCC6 program
                              #COMPILE EXE
                              #DIM ALL
                              #INCLUDE ONCE "Win32Api.inc"
                              
                              '----- Flag Global Variables
                              GLOBAL TIDX AS STRING ' String storing the bit flags
                              GLOBAL TIDXqp AS QUAD PTR ' Pointer to blocks of 64 flags
                              GLOBAL TIDXbp() AS BYTE ' Overlay of TIDX for bit manipulation
                              
                              
                              
                              ENUM FlagToDoValues SINGULAR
                              GetNextFreeFlag = 1
                              ReturnFlagToHeap
                              HowManyFlagsInUse
                              InitialiseFlags
                              END ENUM
                              
                              GLOBAL gFlagArray() AS LONG
                              
                              
                              
                              
                              FUNCTION PBMAIN () AS LONG
                              LOCAL i, j, FlagCount, TestCount, hWin AS LONG
                              LOCAL qFreq, qStart, qStop, n, n1 AS QUAD, Result AS STRING
                                 TXT.WINDOW("Test Log", 40, 80) TO hWin
                                 TXT.PRINT " "
                                 QueryPerformanceFrequency qFreq
                                 TestCount = 200000
                              Test1:
                              '----- Initialise buffer array
                                 Flagcount = 2500000 ' will be rounded up to nearest multiple of 64
                                 Flagcount = Flagcount + (63 - (Flagcount - 1) MOD 64)
                                 TIDX = REPEAT$(Flagcount, CHR$(255,255,255,255,255,255,255,255))
                                 i = STRPTR(TIDX)
                                 TIDXqp = STRPTR(TIDX)
                                 REDIM TIDXbp(1 TO LEN(TIDX) / 8) AT TIDXqp
                                 QueryPerformanceCounter qStart
                                 FOR i = 1 TO TestCount
                                    j = GetFirstFreeFlag1
                                    ClearFlag(j)
                                 NEXT i
                                 QueryPerformanceCounter qStop
                                 TXT.PRINT "BIT INSTR Search = " + FORMAT$((qStop-qStart)*1000/qFreq, "####.##")+"ms"
                                 TXT.PRINT " "
                              Test2:
                              '----- Initialise buffer array
                                 Flagcount = 2500000 ' will be rounded up to nearest multiple of 64
                                 Flagcount = Flagcount + (63 - (Flagcount - 1) MOD 64)
                                 TIDX = REPEAT$(Flagcount, CHR$(255,255,255,255,255,255,255,255))
                                 TIDXqp = STRPTR(TIDX)
                                 REDIM TIDXbp(1 TO LEN(TIDX)) AT TIDXqp
                                 QueryPerformanceCounter qStart
                                 FOR i = 1 TO TestCount
                                    j = GetFirstFreeFlag2
                                    ClearFlag(j)
                                 NEXT i
                                 QueryPerformanceCounter qStop
                                 TXT.PRINT "BIT Binary Search = " + FORMAT$((qStop-qStart)*1000/qFreq, "####.##")+"ms"
                                 TXT.PRINT " "
                              
                              
                              
                              
                              Test3:
                              '----- Initialise buffer array
                                 FlagCode(%InitialiseFlags, 2500000)
                                 QueryPerformanceCounter qStart
                                 FOR i = 1 TO TestCount
                                    j = FlagCode(%GetNextFreeFlag,0)
                                    FlagCode(%ReturnFlagToHeap,j)
                                 NEXT i
                                 QueryPerformanceCounter qStop
                                 TXT.PRINT "Flag Stack = " + FORMAT$((qStop-qStart)*1000/qFreq, "####.##")+"ms"
                                 TXT.PRINT " "
                              
                              
                              
                              
                              
                                 TXT.LINE.INPUT "All done, press Enter", Result
                              
                              END FUNCTION
                              
                              FUNCTION GetFirstFreeFlag1() AS LONG
                              '---------- Find next available buffer number
                              LOCAL x, Flag AS LONG
                                 FOR x = 0 TO (LEN(TIDX) \ 8) - 1
                                    IF @TIDXqp[x] <> &H0000000000000000 THEN
                                       Flag = x * 64 + 65 - INSTR(-1, BIN$(@TIDXqp[x], 64), "1")
                                       EXIT FOR
                                    END IF
                                 NEXT
                                 FUNCTION = Flag
                              END FUNCTION
                              
                              FUNCTION GetFirstFreeFlag2() AS LONG
                              '---------- Find next available buffer number
                              REGISTER Flag AS LONG
                              REGISTER x AS LONG
                              LOCAL y AS QUAD
                                 FOR x = 0 TO (LEN(TIDX)\8)-1
                                    IF @TIDXqp[x] <> &H0000000000000000 THEN
                                       y = @TIDXqp[x]
                                       flag = 64
                                       IF (y AND &H00000000FFFFFFFF) THEN SHIFT LEFT y, 32: flag -= 32
                                       IF (y AND &H0000FFFF00000000) THEN SHIFT LEFT y, 16: flag -= 16
                                       IF (y AND &H00FF000000000000) THEN SHIFT LEFT y, 8: flag -= 8
                                       IF (y AND &H0F00000000000000) THEN SHIFT LEFT y, 4: flag -= 4
                                       IF (y AND &H3000000000000000) THEN SHIFT LEFT y, 2: flag -= 2
                                       IF (y AND &H4000000000000000) THEN SHIFT LEFT y, 1: flag -= 1
                                       EXIT FOR
                                    END IF
                                 NEXT
                                 FUNCTION = Flag
                              END FUNCTION
                              
                              FASTPROC ClearFlag(BYVAL FlagNo AS LONG)
                              '---------- Mark buffer as 'in use'
                                 BIT RESET TIDXbp(1),FlagNo - 1
                              END FASTPROC
                              
                              
                              
                              
                              
                              FUNCTION FlagCode(WhatToDo AS LONG, WhichFlag AS LONG) AS LONG
                              
                              STATIC StackPointer AS LONG
                              STATIC StackSize AS LONG
                              LOCAL r AS LONG
                              
                              SELECT CASE WhatToDo
                                  CASE %GetNextFreeFlag
                                      IF WhichFlag <= StackSize THEN
                                          'the stack is not empty so the StackPointer points to the next available free flag number
                                          FUNCTION = gFlagArray(StackPointer)
                                          INCR StackPointer
                              
                                      ELSE
                                          'Out of stack space. Report an error?
                              
                                      END IF
                              
                                  CASE %ReturnFlagToHeap
                                      'StackPointer points to the next available flag so decrement it to point to the previous, unavailable, flag and store the returned flag number there
                                      DECR StackPointer
                                      gFlagArray(StackPointer) = WhichFlag
                              
                              
                                  CASE %HowManyFlagsInUse
                                      FUNCTION = StackPointer
                              
                                  CASE %InitialiseFlags
                                      'set the size of the stack
                                      StackSize = WhichFlag
                              
                                      'Fill in each possible location with the number of a possible flag.
                                      'These start off in sequence but as flags are used and returned in different order the stack gets out of sequence, but that's OK
                                      DIM gFlagArray(WhichFlag)
                                      FOR r = 1 TO WhichFlag
                                          gFlagArray(r) = r
                              
                                      NEXT
                              
                              
                              END SELECT
                              
                              END FUNCTION
                              Last edited by Paul Dixon; 28 Jun 2020, 09:28 AM. Reason: Used wrong index when filling the array.

                              Comment


                              • #35
                                Originally posted by Paul Dixon View Post
                                It doesn't easily allow mass clearing and setting of specific blocks of flags but that doesn't appear to be something George needs.
                                Post #27 ? "one of the heavily used functions in my App is allocating a large 'block' of buffers"

                                Comment


                                • #36
                                  Stuart,
                                  maybe my interpretation of what George is doing is different to yours.
                                  To me, it looks like when he allocates a "large block of buffers" it'll be done one at a time, taking into account those buffers that are already allocated. e.g. get the next 100,000 FREE buffers, not just get a block of 100,000 irrespective of if they are currently in use.

                                  We can wait for George to clarify.



                                  I've done some timings.
                                  With some, your code is faster, with others my code is faster.


                                  But the 2 critical items for George are GetFirstFreeFlag and ClearFlag(x).

                                  With 1,000,000 flags allowed for, getting the first free 1,000,000 flags takes your code 27107ms.
                                  Mine takes 18ms
                                  Freeing up those 1,000,000 flags takes your code 2.9ms
                                  Mine takes 20ms

                                  So the total time to allocate and deallocate 1,000,000 flags is around 27000ms for your code and 38ms for my code, that's about 700x faster.


                                  The time taken by the stack code to allolcate a flag is about the same, regardless of how many flags are already allocated.
                                  The BIT array version takes progressively longer to find a free flag as the number of allocated flags increases.
                                  If we use the 2.5million flag limit that George appears to want then the bit code takes 168,000ms and the stack code takes 98ms. That's 1700x times faster.

                                  Comment


                                  • #37
                                    Originally posted by Paul Dixon View Post
                                    Don't convert 64 bits to a string then use INSTR to find a flag. That's incredibly slow.
                                    I suppose so, if you consider 4 microseconds slow

                                    Comment


                                    • #38
                                      4 microseconds for each of a million items, that's 4 seconds extra.

                                      Comment


                                      • #39
                                        Stuart,
                                        There will be some changes that can speed up your code.
                                        Since flags are always allocated from the start of the block and you always scan from the start of the block then, as you use more flags, you're guaranteed to take longer to find the next flag because you always scan parts of the bit array that are unlikely to have free flags in them.

                                        Instead of always searching from the start of the flags, why not search from the location that the previous flag was found?
                                        It's very likely that the next location will also have an available flag so will be found almost immediately?

                                        Comment


                                        • #40
                                          Stuart,
                                          have you tried overlaying LONGs on the string array instead of QUADs?
                                          QUADs are handled by the FPU and are not as efficient as LONGs. I'd expect LONGs to improve the speed.


                                          Scanning for the next available flag beginning at the location of the last found flag, and wrapping round if necessary, appears to reduce the time to allocate 1,000,000 flags from 27 seconds to under 0.4 seconds.


                                          Now, about 90% of the remaining time is taken up in that INSTR(-1,BIN$(@pFlags[x],64),"1") line so it'll be worth rewriting that to avoid the strings.

                                          Comment

                                          Working...
                                          X