Announcement

Collapse
No announcement yet.

Need better (faster) INSTR

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Stuart McLachlan
    replied
    Originally posted by Michael Mattias View Post
    Um, not "recommended," sir, but SUGGESTED, as something to be tried and evaluated for use in this application. .
    "Instead of INSTR, use REGEXPR, which DOES support a "start position."

    Looks more like a recommendation (or even a direction or instruction) rather that a suggestion to me.

    Leave a comment:


  • Michael Mattias
    replied
    And you recommended the slower REGEXPR in reply
    Um, not "recommended," sir, but SUGGESTED, as something to be tried and evaluated for use in this application. .

    What I "recommended" was changing the design to eliminate the bottleneck, but OP Mr. DeLuca ruled that out as "just too big a change." (Post #12 this thread).

    Leave a comment:


  • Stuart McLachlan
    replied
    Originally posted by Michael Mattias View Post
    And? I read this as a request to get something faster than INSTR.
    And you recommended the slower REGEXPR in reply

    Leave a comment:


  • Michael Mattias
    replied
    Is anyone aware that INSTR() has an optional start position parameter? Just INCR previous INSTR return.
    I am now. That is a feature I totally missed, apparently for many years. My bad.

    And the Heading is: Need better (faster) INSTR
    And? I read this as a request to get something faster than INSTR.

    You will also note earlier in this thread I made two different "design change" suggestions to remove the (currently) INSTR-related bottleneck in the program. I have always, do now and for the forseeable future have this old-fashioned belief that there is a lot more performance gained by design than by programming technique such as choice of intrinsic functions used.

    Leave a comment:


  • George Deluca
    replied
    And I DID say that for multiple requests the INSTR search starts at the last successful position + 1.

    And many thanks to all of you for the many suggestions. Invaluable.

    Leave a comment:


  • Stuart McLachlan
    replied
    Originally posted by Michael Mattias View Post
    Instead of INSTR, use REGEXPR, which DOES support a "start position."

    ??
    So does INSTR

    And the Heading is: Need better (faster) INSTR

    RegExp is much slower than INSTR

    Leave a comment:


  • Paul Dixon
    replied
    use REGEXPR,
    REGEXPR is a lot slower than INSTR.

    Leave a comment:


  • Dale Yarker
    replied
    Is anyone aware that INSTR() has an optional start position parameter? Just INCR previous INSTR return.

    Not saying it is best way to do this job, just that full power of INSTR is not being used.

    Leave a comment:


  • Michael Mattias
    replied
    Instead of INSTR, use REGEXPR, which DOES support a "start position."

    ??

    Leave a comment:


  • George Deluca
    replied
    OK, clarification of my App's requirement..

    When a new buffer allocation is needed, it is because an array of UDTs, is having 1 or more entries inserted. One of the fields in the UDT is a STRING POINTER (since we can't have dynamic strings) So as each new UDT entry is filled in, a free STRING buffer must be allocated and saved in the STRING POINTER.

    This insert a new UDT routine is passed the number of entries being inserted into the UDT array. When it's > 1, the INSTR search for 'next available' buffer starts @ the previous successful INSTR location + 1. (everything left of that point we know is in use). Except the 1st search of course which starts at 1. So these 'bulk mode' free buffer searches are very quick.

    Where my INSTR approach performs poorly is when these inserts are requested for inserts of single items, since for those, the INSTR search always starts at position 1.

    Paul: I looked at your stack approach, it sure looks like it has all these others beat. Yes, it takes 4 bytes per buffer, but nowadays what's 5-10 megabytes.

    George

    Leave a comment:


  • Paul Dixon
    replied
    Stuart,
    have you tried overlaying LONGs on the string array instead of QUADs?
    QUADs are handled by the FPU and are not as efficient as LONGs. I'd expect LONGs to improve the speed.


    Scanning for the next available flag beginning at the location of the last found flag, and wrapping round if necessary, appears to reduce the time to allocate 1,000,000 flags from 27 seconds to under 0.4 seconds.


    Now, about 90% of the remaining time is taken up in that INSTR(-1,BIN$(@pFlags[x],64),"1") line so it'll be worth rewriting that to avoid the strings.

    Leave a comment:


  • Paul Dixon
    replied
    Stuart,
    There will be some changes that can speed up your code.
    Since flags are always allocated from the start of the block and you always scan from the start of the block then, as you use more flags, you're guaranteed to take longer to find the next flag because you always scan parts of the bit array that are unlikely to have free flags in them.

    Instead of always searching from the start of the flags, why not search from the location that the previous flag was found?
    It's very likely that the next location will also have an available flag so will be found almost immediately?

    Leave a comment:


  • Paul Dixon
    replied
    4 microseconds for each of a million items, that's 4 seconds extra.

    Leave a comment:


  • Stuart McLachlan
    replied
    Originally posted by Paul Dixon View Post
    Don't convert 64 bits to a string then use INSTR to find a flag. That's incredibly slow.
    I suppose so, if you consider 4 microseconds slow

    Leave a comment:


  • Paul Dixon
    replied
    Stuart,
    maybe my interpretation of what George is doing is different to yours.
    To me, it looks like when he allocates a "large block of buffers" it'll be done one at a time, taking into account those buffers that are already allocated. e.g. get the next 100,000 FREE buffers, not just get a block of 100,000 irrespective of if they are currently in use.

    We can wait for George to clarify.



    I've done some timings.
    With some, your code is faster, with others my code is faster.


    But the 2 critical items for George are GetFirstFreeFlag and ClearFlag(x).

    With 1,000,000 flags allowed for, getting the first free 1,000,000 flags takes your code 27107ms.
    Mine takes 18ms
    Freeing up those 1,000,000 flags takes your code 2.9ms
    Mine takes 20ms

    So the total time to allocate and deallocate 1,000,000 flags is around 27000ms for your code and 38ms for my code, that's about 700x faster.


    The time taken by the stack code to allolcate a flag is about the same, regardless of how many flags are already allocated.
    The BIT array version takes progressively longer to find a free flag as the number of allocated flags increases.
    If we use the 2.5million flag limit that George appears to want then the bit code takes 168,000ms and the stack code takes 98ms. That's 1700x times faster.

    Leave a comment:


  • Stuart McLachlan
    replied
    Originally posted by Paul Dixon View Post
    It doesn't easily allow mass clearing and setting of specific blocks of flags but that doesn't appear to be something George needs.
    Post #27 ? "one of the heavily used functions in my App is allocating a large 'block' of buffers"

    Leave a comment:


  • Paul Dixon
    replied
    Here's a modified version of the code from post#23 posted by George.
    I've added a 3rd test using stacks.
    It's untested and no attempt has been made to optimise it.
    Maybe George can try it out in a real situation as the test he posted isn't very realistic.

    Code:
    'PBCC6 program
    #COMPILE EXE
    #DIM ALL
    #INCLUDE ONCE "Win32Api.inc"
    
    '----- Flag Global Variables
    GLOBAL TIDX AS STRING ' String storing the bit flags
    GLOBAL TIDXqp AS QUAD PTR ' Pointer to blocks of 64 flags
    GLOBAL TIDXbp() AS BYTE ' Overlay of TIDX for bit manipulation
    
    
    
    ENUM FlagToDoValues SINGULAR
    GetNextFreeFlag = 1
    ReturnFlagToHeap
    HowManyFlagsInUse
    InitialiseFlags
    END ENUM
    
    GLOBAL gFlagArray() AS LONG
    
    
    
    
    FUNCTION PBMAIN () AS LONG
    LOCAL i, j, FlagCount, TestCount, hWin AS LONG
    LOCAL qFreq, qStart, qStop, n, n1 AS QUAD, Result AS STRING
       TXT.WINDOW("Test Log", 40, 80) TO hWin
       TXT.PRINT " "
       QueryPerformanceFrequency qFreq
       TestCount = 200000
    Test1:
    '----- Initialise buffer array
       Flagcount = 2500000 ' will be rounded up to nearest multiple of 64
       Flagcount = Flagcount + (63 - (Flagcount - 1) MOD 64)
       TIDX = REPEAT$(Flagcount, CHR$(255,255,255,255,255,255,255,255))
       i = STRPTR(TIDX)
       TIDXqp = STRPTR(TIDX)
       REDIM TIDXbp(1 TO LEN(TIDX) / 8) AT TIDXqp
       QueryPerformanceCounter qStart
       FOR i = 1 TO TestCount
          j = GetFirstFreeFlag1
          ClearFlag(j)
       NEXT i
       QueryPerformanceCounter qStop
       TXT.PRINT "BIT INSTR Search = " + FORMAT$((qStop-qStart)*1000/qFreq, "####.##")+"ms"
       TXT.PRINT " "
    Test2:
    '----- Initialise buffer array
       Flagcount = 2500000 ' will be rounded up to nearest multiple of 64
       Flagcount = Flagcount + (63 - (Flagcount - 1) MOD 64)
       TIDX = REPEAT$(Flagcount, CHR$(255,255,255,255,255,255,255,255))
       TIDXqp = STRPTR(TIDX)
       REDIM TIDXbp(1 TO LEN(TIDX)) AT TIDXqp
       QueryPerformanceCounter qStart
       FOR i = 1 TO TestCount
          j = GetFirstFreeFlag2
          ClearFlag(j)
       NEXT i
       QueryPerformanceCounter qStop
       TXT.PRINT "BIT Binary Search = " + FORMAT$((qStop-qStart)*1000/qFreq, "####.##")+"ms"
       TXT.PRINT " "
    
    
    
    
    Test3:
    '----- Initialise buffer array
       FlagCode(%InitialiseFlags, 2500000)
       QueryPerformanceCounter qStart
       FOR i = 1 TO TestCount
          j = FlagCode(%GetNextFreeFlag,0)
          FlagCode(%ReturnFlagToHeap,j)
       NEXT i
       QueryPerformanceCounter qStop
       TXT.PRINT "Flag Stack = " + FORMAT$((qStop-qStart)*1000/qFreq, "####.##")+"ms"
       TXT.PRINT " "
    
    
    
    
    
       TXT.LINE.INPUT "All done, press Enter", Result
    
    END FUNCTION
    
    FUNCTION GetFirstFreeFlag1() AS LONG
    '---------- Find next available buffer number
    LOCAL x, Flag AS LONG
       FOR x = 0 TO (LEN(TIDX) \ 8) - 1
          IF @TIDXqp[x] <> &H0000000000000000 THEN
             Flag = x * 64 + 65 - INSTR(-1, BIN$(@TIDXqp[x], 64), "1")
             EXIT FOR
          END IF
       NEXT
       FUNCTION = Flag
    END FUNCTION
    
    FUNCTION GetFirstFreeFlag2() AS LONG
    '---------- Find next available buffer number
    REGISTER Flag AS LONG
    REGISTER x AS LONG
    LOCAL y AS QUAD
       FOR x = 0 TO (LEN(TIDX)\8)-1
          IF @TIDXqp[x] <> &H0000000000000000 THEN
             y = @TIDXqp[x]
             flag = 64
             IF (y AND &H00000000FFFFFFFF) THEN SHIFT LEFT y, 32: flag -= 32
             IF (y AND &H0000FFFF00000000) THEN SHIFT LEFT y, 16: flag -= 16
             IF (y AND &H00FF000000000000) THEN SHIFT LEFT y, 8: flag -= 8
             IF (y AND &H0F00000000000000) THEN SHIFT LEFT y, 4: flag -= 4
             IF (y AND &H3000000000000000) THEN SHIFT LEFT y, 2: flag -= 2
             IF (y AND &H4000000000000000) THEN SHIFT LEFT y, 1: flag -= 1
             EXIT FOR
          END IF
       NEXT
       FUNCTION = Flag
    END FUNCTION
    
    FASTPROC ClearFlag(BYVAL FlagNo AS LONG)
    '---------- Mark buffer as 'in use'
       BIT RESET TIDXbp(1),FlagNo - 1
    END FASTPROC
    
    
    
    
    
    FUNCTION FlagCode(WhatToDo AS LONG, WhichFlag AS LONG) AS LONG
    
    STATIC StackPointer AS LONG
    STATIC StackSize AS LONG
    LOCAL r AS LONG
    
    SELECT CASE WhatToDo
        CASE %GetNextFreeFlag
            IF WhichFlag <= StackSize THEN
                'the stack is not empty so the StackPointer points to the next available free flag number
                FUNCTION = gFlagArray(StackPointer)
                INCR StackPointer
    
            ELSE
                'Out of stack space. Report an error?
    
            END IF
    
        CASE %ReturnFlagToHeap
            'StackPointer points to the next available flag so decrement it to point to the previous, unavailable, flag and store the returned flag number there
            DECR StackPointer
            gFlagArray(StackPointer) = WhichFlag
    
    
        CASE %HowManyFlagsInUse
            FUNCTION = StackPointer
    
        CASE %InitialiseFlags
            'set the size of the stack
            StackSize = WhichFlag
    
            'Fill in each possible location with the number of a possible flag.
            'These start off in sequence but as flags are used and returned in different order the stack gets out of sequence, but that's OK
            DIM gFlagArray(WhichFlag)
            FOR r = 1 TO WhichFlag
                gFlagArray(r) = r
    
            NEXT
    
    
    END SELECT
    
    END FUNCTION
    Last edited by Paul Dixon; 28 Jun 2020, 09:28 AM. Reason: Used wrong index when filling the array.

    Leave a comment:


  • Paul Dixon
    replied
    Stuart,
    there's nothing wrong with some general purpose bit flag system, it's just not the best solution for George if he needs it to be fast.

    For a stack based system:
    Getting the next flag is near instantaneous. It's just an array access. It'll be a lot faster than using bits and is the same speed regardless of how many flags are in use.
    Returning a used flag is the same.
    These are the 2 main operations needed by George.

    Counting flags in use means just returning the stack pointer. Maybe a 1000000x faster than counting the bits.

    It does use more memory, but for a 2.5million flags that's not significant.
    It doesn't easily allow mass clearing and setting of specific blocks of flags but that doesn't appear to be something George needs.



    If you're using blocks of bits as flags, you could improve things.
    Don't convert 64 bits to a string then use INSTR to find a flag. That's incredibly slow.
    Instead, scan the binary for the set bit.
    If you don't mind a few ASM instructions then the CPU has the BSF (Bit Scan Forwards) and BSR (Bit Scan Reverse) instructions which do it for you on 32-bit values.

    Leave a comment:


  • Stuart McLachlan
    replied
    Originally posted by Stuart McLachlan View Post
    I may be misreading it, but ISTM that GetFirstFreeFlag2 will always return a Flag in the range 1 - 64 ?

    That means that your test is going to be locating a free flag in the first block it tests every time

    Change FUNCTION = Flag to FUNCTION = x * 64 +Flag and the comparative times are very different
    Doing my regular morning visit to Dilbert and thought today's entry was appropriate to this

    Click image for larger version  Name:	e1265fd08d7f013808e1005056a9545d.gif Views:	0 Size:	97.2 KB ID:	796188

    Leave a comment:


  • Stuart McLachlan
    replied
    Version 1.3 with Set/ClearBlock(lStart,lStop) procedures added in Source Code thread.

    Leave a comment:

Working...
X