    Speed drop in iterative string concatenation

    I came across this as I was trying to generate a string of 1,000,000 random ASCII characters (codes 65 to 90). My idea was to generate one random character at a time and keep concatenating the results into one long string.

    For this test, I eliminated the RND segment and used the following code, which simply appends the character "A" for varying numbers of iterations...

    ----------------------------------
    Code:
     
    #COMPILE EXE
    #DIM ALL
    
    GLOBAL PanString$
    GLOBAL Start#
    GLOBAL Finish#
    GLOBAL Counter&
    
    FUNCTION PBMAIN () AS LONG
    
        Start# = TIMER

        FOR Counter& = 1 TO 100000
            PanString$ = PanString$ + "A"
        NEXT Counter&

        BEEP

        Finish# = TIMER
        PRINT "ELAPSED TIME="; Finish# - Start#
        WAITKEY$
        
    
    END FUNCTION
    
    ---------------------------------------------
    Using my old 2.6 GHz PC and my PBCC version 4.04.042, I got these results:
    
    ITERATIONS        SECONDS
    ----------        -------
       100,000          1.828
       150,000          4.186
       200,000          7.188
       250,000         15.360
       300,000         39.297
       400,000        141.530

    Take note of the striking slowdown once the iteration count passes 300,000.

    My workaround was to split the processing into smaller batches (say, 100 batches of 10,000 iterations each) and then concatenate the 100 resulting strings into the final long string.
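
    Roughly, that chunked approach looks like this (a sketch only, not the exact code I ran; the variable names are just for illustration):

    Code:
    #COMPILE EXE
    #DIM ALL

    FUNCTION PBMAIN () AS LONG

        LOCAL Chunk$, Final$
        LOCAL i&, j&

        ' Sketch of the chunked workaround described above (illustrative only).
        ' Build 100 chunks of 10,000 characters each, then join them, so no
        ' single concatenation target grows beyond 10,000 bytes until the
        ' final 100-piece join.
        FOR i& = 1 TO 100
            Chunk$ = ""
            FOR j& = 1 TO 10000
                Chunk$ = Chunk$ + "A"
            NEXT j&
            Final$ = Final$ + Chunk$
        NEXT i&

        PRINT LEN(Final$)    ' should print 1000000
        WAITKEY$

    END FUNCTION

    This keeps each concatenation target small, so the per-iteration copy never grows past the chunk size until the final join.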

    Is this relative increase in processing time caused by the increasing length of the contents of PanString$? Or could it be attributable to something else? I'd appreciate any feedback.

    #2
    Jose--

    String concatenation can be a bottleneck when you perform the operation on a string that's 400,000 bytes long, and then repeat the operation 400,000 times.

    A better plan would be to first create a string which is 400,000 bytes long:

    x$ = space$(400000)

    then insert the characters using the ASC() statement or the MID$() statement...

    FOR i& = 1 TO 400000
        ASC(x$, i&) = RND(65,90)
    NEXT
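
    The MID$() statement form of the same loop would be roughly equivalent (just a sketch; CHR$ converts the random character code to a one-character string):

    FOR i& = 1 TO 400000
        MID$(x$, i&, 1) = CHR$(RND(65,90))   'sketch of the MID$() alternative
    NEXT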


    Best regards,

    Bob Zale
    PowerBASIC Inc.

      #3
      Jose,
      >Is this relative increase in processing time caused by the increasing length of the contents of PanString$? Or could it be attributable to something else? I'd appreciate any feedback.
      The sudden drop will be caused by your system's CPU running out of cache.
      Cache memory is a lot faster than main memory, but it's limited to around half a megabyte to one megabyte in most systems. When you run out of cache, the CPU has to fall back on much slower main memory.

      Paul.

        #4
        Oops! Misread Bob's reply. Apologies.

        But actually, going with a full 1 MB string buffer wouldn't take that big a speed hit.

        The following took less than a second to finish:
        Code:
        FUNCTION PBMAIN () AS LONG
            RANDOMIZE TIMER
            x$ = SPACE$(100000)
            LOCATE 5,1 : PRINT;LEFT$(x$,80)
        
            FOR i& = 1 TO 1000000
            ASC(x$, i&) = RND(65,90)
            NEXT
        
            LOCATE 7,1 : PRINT;LEFT$(x$,80);
            LOCATE 9,1 : PRINT;LEN(x$)
            WAITKEY$
        
            END FUNCTION
        Last edited by Mel Bishop; 28 Mar 2008, 07:38 AM.
        There are no atheists in a fox hole or the morning of a math test.
        If my flag offends you, I'll help you pack.

          #5
          Check those numbers, Mel... {smile} You're filling 1,000,000 bytes in a 100,000 byte string.

          Best regards,

          Bob Zale
          PowerBASIC Inc.

            #6
            >You're filling 1,000,000 bytes in a 100,000 byte string

            Hell of a compression technique.
            Michael Mattias
            Tal Systems (retired)
            Port Washington WI USA
            [email protected]
            http://www.talsystems.com

              #7
              Point taken. Changed it to 1,000,000 and it still took < 1 second.
              There are no atheists in a fox hole or the morning of a math test.
              If my flag offends you, I'll help you pack.

                #8
                Here is the absolute array technique, which is faster still: ten million characters in just a couple of seconds. Behold:

                Code:
                #COMPILE EXE
                #DIM ALL
                
                FUNCTION PBMAIN () AS LONG
                
                    LOCAL ii AS LONG
                    LOCAL rndStr, printStr AS STRING
                    rndStr = SPACE$(10000000)                    'use an absolute array now to overlap string memory:
                    DIM rsArr(9999999) AS BYTE AT STRPTR(rndStr) 'size is 1 less than length of string because of rsArr(0) element
                    RANDOMIZE
                                                                 'now quickly fill array:
                    FOR ii = 0 TO 9999999                        '10,000,000 total loop iterations
                       rsArr(ii) = RND(65, 90)                   'changed to characters Jose wanted
                    NEXT
                    
                    printStr = RIGHT$(rndStr, 4000)              'for printing purposes
                    ? "Done. Here are the last 4000 characters:" & $CRLF & printStr
                END FUNCTION
                Last edited by John Gleason; 28 Mar 2008, 12:13 PM. Reason: changed to RND(65, 90) so only those characters appear

                  #9
                  Code:
                   LOCAL rndStr AS STRING
                   LOCAL pb     AS BYTE PTR
                   REGISTER ii  AS LONG

                   rndStr = SPACE$(10000000)
                   pb     = STRPTR(rndStr)
                   RANDOMIZE
                   FOR ii = 1 TO 10000000      '10,000,000 total loop iterations
                       @pb = RND(0, 255)
                       INCR pb
                   NEXT
                  Michael Mattias
                  Tal Systems (retired)
                  Port Washington WI USA
                  [email protected]
                  http://www.talsystems.com

                    #10
                    I'm drifting off topic here, but thanks, John - you have just given me my fastest binary-array-to-hex-string conversion to date, by a factor of 10.

                    Code:
                    Function PBMain () As Long
                      
                      Dim hashByte() As Byte
                      Dim hashByteHex () As String * 2
                      Local hashText As String
                      Local hashSize, i As Long
                      
                      hashSize = 32 ' bytes for example
                      
                      ReDim hashByte(hashSize -1) As Byte
                      
                      For i = 0 To hashSize- 1
                        hashByte(i) = Rnd(0,255) ' Output from a hashing algorithm, for example
                      Next
                      
                      hashText = Space$(2*hashSize)
                      ReDim hashByteHex( hashSize - 1) As String * 2 At StrPtr(hashText)
                      
                      For i = 0 To hashSize-1
                        hashByteHex(i) = Hex$( hashByte( i ),2 )
                      Next
                      
                      MsgBox hashText
                    
                    End Function

                      #11
                      Dave, though it won't help with short strings (unless there are tons of them), here are two mods to your code to speed it up further:
                      http://www.powerbasic.com/support/pb...398#post280398

                        #12
                        Inputs regarding string concatenation slowdown

                        Bob, Paul, Mel, John, Michael and David: Thanks for your fast replies and input.

                        Sorry, I haven't been able to go online in the last two days.

                        I tried out Bob's suggestion, and his code snippet took only .0469 seconds to execute 400,000 iterations on my 2.6 GHz Pentium. I'll now try out the other suggestions.

                        Bob mentioned the "bottleneck" that happens somewhere around the 400,000th iteration. Is this related to what Paul said about running out of cache memory?

                          #13
                          Hi Jose!

                          I think you may have misunderstood me slightly. There is nothing special about the number 400,000, or any other number, except for the fact that it's a very big number.

                          It's concatenation itself which can be a "problem" when you do it too many times on strings that are too long. Every time you add one byte to the end of a string, PowerBASIC must copy both strings (the original and the one new byte) into a new string. In your example, that means you're doing:

                          400,000 copy operations * 200,001 bytes per copy, on average

                          That's 80,000,400,000 bytes copied, plus 400,000 string allocations and deletions. That's a lot of stuff to do! It takes time. Contrast that with my method, which has one allocation and 400,000 byte insertions.
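
                          To see how that total scales, here's a quick back-of-the-envelope sketch (illustrative only; it just multiplies the iteration count by the average copy size described above):

                          Code:
                          #COMPILE EXE
                          #DIM ALL

                          FUNCTION PBMAIN () AS LONG
                              LOCAL n AS LONG
                              LOCAL total AS QUAD

                              n = 400000
                              ' Each concatenation copies the whole string built so far plus the
                              ' one new byte, so the average copy is roughly n \ 2 + 1 bytes and
                              ' the total grows with the square of n. (Illustrative sketch only.)
                              total = n
                              total = total * (n \ 2 + 1)
                              PRINT "Approx. bytes copied for"; n; "concatenations:"; total
                              WAITKEY$
                          END FUNCTION

                          For 400,000 iterations that lands on the same 80,000,400,000 figure.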

                          The cache issue Paul mentioned can have its effect as well, but I think concatenation is the real problem.

                          Best regards,

                          Bob Zale
                          PowerBASIC Inc.

                            #14
                            Yes, I see your point....

                            Bob, thanks for the clarification. I should have realized this when I made the table of run times (see my original post). The repeated concatenations do take much longer, as you pointed out, since the number of bytes copied grows with every iteration.

                            Again, thanks for the help! I do appreciate it.
