String-Speed


  • String-Speed

    This is a question for you bit-crunchers: which is fastest?
    1) globS$ = globS$ & NewS$
    2) globS$ = STRINSERT(globS$, NewS$, Len(globS$) + 1)
    I want your opinion, as this is called milli-billions of times.
    Or is there a better way?
    globS$ is a global string.
    Thank you

    -------------
    Fred
    [email protected]
    http://www.oxenby.se

  • #2
    Fred,

    The first; it does less work.

    regards,

    [email protected]

    PS: Thinking about it, it can be done a lot faster, as over a very long
    string extended concatenations become very slow due to the nature
    of how BASIC does it. What is necessary, though, is the maximum size
    of the buffer, as this makes it so you can just write to the end of the
    buffer on each addition.

    Let us know a bit more about what it does. There is another choice,
    which is to use some C-style zero-terminated string routines for doing
    the concatenations. They are not all that hard to write; I have
    them available in MASM code and they will translate reasonably easily
    into PowerBASIC.
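    The "write to the end of the buffer" idea can be sketched in a few lines. This is a rough illustration in Python rather than PowerBASIC, with a bytearray and an end offset standing in for the string buffer and write position; all names are invented for the example:

```python
# Rough sketch of the preallocated-buffer append described above.
# A bytearray plays the role of the PB string buffer; `end` tracks
# the next free position, so each append is one block copy rather
# than a reallocation of the whole accumulated string.
def append(buf, end, chunk):
    """Copy chunk into buf at offset end; return the new end offset.
    As in the scheme above, the caller must make sure the buffer
    is big enough."""
    n = len(chunk)
    buf[end:end + n] = chunk
    return end + n

buf = bytearray(64)          # preallocated once, like the PB buffer
end = 0                      # "write to the end of the buffer each addition"
for piece in (b"abc", b"defg", b"hi"):
    end = append(buf, end, piece)
result = bytes(buf[:end])    # only the filled part is the string
```

    The point is that each call moves only the new bytes; the already-accumulated prefix is never copied again, which is what makes repeated `globS$ = globS$ & NewS$` slow by comparison.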

    ------------------


    [This message has been edited by Steve Hutchesson (edited February 11, 2000).]
    hutch at movsd dot com
    The MASM Forum

    www.masm32.com



    • #3
      Today file sizes sometimes exceed 2 GB and the average line size is
      50 characters, so there are a lot of string concatenations going on.
      This routine builds an output buffer which is flushed to disk
      when it exceeds 10 kB.
      The input buffer is 50 kB.

      Code:
      Function PCCEBC_TO_ASA()As Long
      '..PCC = SkipAfterWrite and ASA = SkipBeforeWrite
      '  Last cmd in file has to go with first data-record
      '  First cmd to second data-record and so on
      Local Cmd$,Rad$,CmdPos%
      Local PrevCmd$
      Local Tmp$,Rl%,FPos&
      Local CmdASAE As String,CmdPCCE As String
      CmdASAE = Chr$(078,064,240,096,241,242,243,244,245,246,247,248,249,193,194,195, _
                     078,064,240,096,241,242,243,244,245,246,247,248,249,193,194,195)
      CmdPCCE = Chr$(001,009,017,025,137,145,153,161,169,177,185,193,201,209,217,225, _
                     003,011,019,027,139,147,155,163,171,179,187,195,203,211,219,227)
      On Error Resume Next
      '..Get the last command..........................
          FPos& = Lof(gIn.FilNr) - 1
          Get gIn.FilNr,FPos&, Rl%
          If ErrClear > 0 Then Function = 102: Exit Function
          If Rl% = 0 Or Rl% > 255 Then Function = 111: Exit Function
          FPos& = FPos& - Rl%
          Cmd$ = Chr$(32)
          Get gIn.FilNr,FPos&, cmd$
          If ErrClear > 0 Then Function = 102: Exit Function
      '..Save for the first data.......................
          If Instr(CmdPCCE,Cmd$) = 0 Then Function = 110:Exit Function
          PrevCmd$ = Mid$(CmdASAE,Instr(CmdPCCE,Cmd$),1)
      '..loop thru the file............................
          gIn.Refill = 1: Tmp$ = ""
          ErrClear:Seek gIn.FilNr, gIn.FilPos
          If ErrClear > 0 Then Function = 102: Exit Function
          Do
           apiSleep 0
           If gIn.BytesToRead > 0 And gIn.Refill <> 0 Then
            gIn.Refill = 0
            gIn.BuffSize = Min(gIn.BytesToRead,gIn.MaxBuff)
            ErrClear:Get$ gIn.FilNr, gIn.BuffSize, glbInBuff
            If ErrClear > 0 Then Function = 102: Exit Function
            gIn.FilPos = Seek(gIn.FilNr)
            If ErrClear > 0 Then Function = 102: Exit Function
            gIn.BytesToRead = gIn.FilLen - (gIn.FilPos - 1)
          '..Merge buffer..............................
            glbInBuff = Tmp$ + glbInBuff
            gIn.BuffSize = Len(glbInBuff)
            gIn.BuffPos = 1
            If ProgressReport(gIn.FilPos,gIn.FilLen)<> 0 Then Function = 200: Exit Function
           End If
      '..number of char in line........................
           gIn.RadLen    = Cvi(glbInBuff, gIn.BuffPos)
           If gIn.RadLen = 0 Or gIn.RadLen > 255 Then Function = 111: Exit Function
      '..extract cmd and line..........................
           cmd$ = Mid$(glbInBuff, gIn.BuffPos + 2, 1)
           rad$ = Mid$(glbInBuff, gIn.BuffPos + 3, gIn.RadLen - 1)
      '..pos on next line-indicator....................
           gIn.BuffPos = gIn.BuffPos + gIn.RadLen + 4
           If gIn.BuffSize - gIn.BuffPos < 1024 Then
            Tmp$ = Mid$(glbInBuff,gIn.BuffPos)
            gIn.Refill = 1
           End If
      '..get rid of space and NULL....................
           Rad$    = Rtrim$(Rad$,Any Chr$(0,&H40))
      '..Validate Cmd-code.............................
           CmdPos% = Instr(CmdPCCE,Cmd$)
           If CmdPos% = 0 Then Function = 110:Exit Function
           Cmd$ = Mid$(CmdASAE,CmdPos%,1)
      '..Swap the commands.............................
           Swap Cmd$,PrevCmd$
      '..Put ASA-line to outputbuffer..................
           If gIn.BlkLen = 0 Then
            glbUtBuff = glbUtBuff + Cmd$ + Rad$ + Chr$(13,10)
           Else
            Rad$ = Left$(Cmd$ + Rad$ + String$(gIn.BlkLen,&H40),gIn.BlkLen)
            glbUtBuff = glbUtBuff + Rad$
           End If
      '..fill the output-buffer........................
           If Len(glbUtBuff) > gUt.MaxBuff Then
            If gIn.CharSet = %EBC2ASC Then Replace Any glbEbcdic With glbAscii In glbUtBuff
            If gIn.CharSet = %EBC2ANS Then Replace Any glbEbcdic With glbAnsi  In glbUtBuff
            ErrClear:Put$ gUt.FilNr,glbUtBuff
            If ErrClear > 0 Then Function = 102:Exit Function
            glbUtBuff = ""
           End If
      '..EndOfFile and EndOfBuffer.....................
           If (gIn.BytesToRead < 1) And (gIn.BuffPos >= gIn.BuffSize) Then Exit Do
          Loop
      '..Empty outputbuffer............................
          If Len(glbUtBuff) > 0 Then
           If gIn.CharSet = %EBC2ASC Then Replace Any glbEbcdic With glbAscii In glbUtBuff
           If gIn.CharSet = %EBC2ANS Then Replace Any glbEbcdic With glbAnsi  In glbUtBuff
           ErrClear:Put$ gUt.FilNr,glbUtBuff
           If ErrClear > 0 Then Function = 102:Exit Function
           glbUtBuff = ""
          End If
          Function = 0
      End Function
      ------------------
      Fred
      [email protected]
      http://www.oxenby.se



      • #4
        Not to belabor an old point, but when I see questions about "which of these methods is faster?" I always ask myself, "Why not write some code and test it?"

        Granted, writing programs specifically to test performance is not a whole lot of fun; if you are doing it "on the job" your boss will probably remind you that it produces no useful short-term productivity (and "long-term productivity" is an oxymoron for some managers); and even the most complete empirical data may not satisfy that hunger for a technical justification to validate your results.

        My $0.02

        MCM
        Michael Mattias
        Tal Systems Inc. (retired)
        Racine WI USA
        [email protected]
        http://www.talsystems.com



        • #5
          Fred,

          Here is a first try at a very fast dedicated string concatenation
          function. It uses a pre-allocated buffer which must be big enough for the
          strings you wish to add to it. If it is not, you will get a page fault
          when it overwrites the end of the buffer.

          With a pre-allocated buffer it simply writes to the end of the buffer
          each time it is called.

          cnt& = AppendStr(cnt&,adrBuffer&,adrAddin&)

          The three parameters are
          1. the starting position to write to
          2. the address of the main buffer
          3. the address of the string to append

          The function returns the position in the buffer that was last written
          to and it is designed to be used as the starting position in the next
          iteration of the loop.

          As I understand from your original question, you will probably need to
          work with a buffer as no machine I know of will handle that size of
          data in memory so you will need to check the return value to see if
          it is up near the buffer size you allocate. If you can predetermine what
          will be the longest string you will append to the buffer, you can use
          that as an indicator of when the buffer should be processed before
          reusing it again.

          I tested the function by running it a million times with a 10-byte
          string (a 10 MB write) and it is fast enough to be exciting. I made
          the mistake of trying a standard BASIC comparison, and it locked up
          the test app for some minutes until I closed it down.

          The speed limit is probably the source of the string being appended
          to the buffer, depending on how it is obtained, so the function
          itself should be fast enough to do the job.

          A small piece of pure PowerBASIC code for your pleasure.

          Regards,

          [email protected]

          Code:
              Buffer$ = space$(1000000)           ' allocate 1 meg
              
              Addin$ = "1234567890"               ' test string to add to it
              
              adrBuffer& = StrPtr(Buffer$)        ' address of buffer
              adrAddin& = StrPtr(Addin$)          ' address of test string
              
              ref& = 0
              cnt& = 0
              
              Do
                cnt& = AppendStr(cnt&,adrBuffer&,adrAddin&)
                ! inc ref&
                If ref& = 100000 Then Exit Do
              Loop
              
              p1& = instr(Buffer$,chr$(0))        ' get terminating zero
              Buffer$ = left$(Buffer$,p1& - 1)    ' get string left of terminator
          
          ' ###########################################################################
            
          FUNCTION AppendStr(ByVal stPos  as LONG, _
                             ByVal Buffer as LONG, _
                             ByVal Addon  as LONG) as LONG
          
              #REGISTER NONE
            
                ! mov edi, Buffer         ; put Buffer address in edi
                ! add edi, stPos          ; add position to address
            
                ! mov esi, Addon          ; put Addon address in esi
                ! push esi                ; save address
            
                ! cld                     ; read forward
            
              apMainLoop:
                ! mov al, [esi]
                ! inc esi
                ! cmp al, 0
                ! je apOut
                ! mov [edi], al
                ! inc edi
                ! jmp apMainLoop
            
              apOut:
                ! mov al, 0
                ! stosb                   ; put terminator on string
            
                ! pop ecx                 ; restore address
                ! sub esi, ecx
            
                ! add stPos, esi
                ! mov edx, stPos
            
                ! dec edx                 ; correct count
            
                ! mov FUNCTION, edx       ; return position
            
          END FUNCTION
            
          ' ###########################################################################

          ------------------
          hutch at movsd dot com
          The MASM Forum

          www.masm32.com



          • #6
            Appending strings can become very costly. I would use a fixed-length string as the target and
            use POKE$() or the API RtlMoveMemory function to copy the
            new string into the old one. Just maintain a pointer/placeholder for the start of the next copy.

            Hutch's asm would probably be the fastest, but POKE$ or RtlMoveMemory with pointers should rival it. If
            you use binary data then you obviously can't use his ASCIIZ string code.
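            Ron's point about binary data is the crux: a length-based copy (the POKE$/RtlMoveMemory style) carries embedded NULs through intact, while a zero-terminated copy (the ASCIIZ style) stops at the first NUL. A hedged sketch of the contrast in Python, with invented helper names:

```python
# Illustrative contrast (hypothetical helpers, not PB code):
# a length-based copy, in the spirit of POKE$/RtlMoveMemory,
# carries embedded NUL bytes through intact, while a
# zero-terminated copy, like the ASCIIZ routine, stops early.
def append_by_length(buf, end, chunk):
    buf[end:end + len(chunk)] = chunk      # copies len(chunk) bytes, NULs included
    return end + len(chunk)

def append_asciiz(buf, end, chunk):
    n = chunk.find(b"\x00")                # scan for the terminator, like the asm loop
    if n < 0:
        n = len(chunk)
    buf[end:end + n] = chunk[:n]
    return end + n

data = b"12345\x0067890"                   # a record with an embedded NUL
by_length = bytearray(32)
asciiz = bytearray(32)
e1 = append_by_length(by_length, 0, data)  # keeps all 11 bytes
e2 = append_asciiz(asciiz, 0, data)        # stops after 5
```

            Records that may legitimately contain NUL bytes rule out the zero-terminated variant.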


            ------------------
            Ron



            • #7
              I agree with Michael. A simple timer I often use is:
              Code:
                LOCAL t AS SINGLE
                t = TIMER
               
                'Routine to test..
               
                MSGBOX "It took" & STR$(TIMER - t) & " seconds to perform."
              Since you know the final buffer size, why not simply allocate it
              from start with Buffer = SPACE$(10000) and then fill it with
              MID$(Buffer, Start, Length) = Whatever$

              I'm sure the assembler routine from Steve is a lot faster, but
              filling an already allocated string is pretty fast too.

              ------------------



              • #8
                Michael Mattias:
                Not to belabor an old point, but when I see questions about "which of these methods is faster?" I always ask myself, "Why not write some code and test it?"
                You never get a second opinion if you don't talk to people...
                Just talking to yourself is a bit confusing. Whose opinion am I
                going for: mine or mine?
                Hutch,
                Thank you for the code. I will try to adapt what you have shown me.
                But I can't trust that the first NULL in the string means end-of-string.
                In my code I use dynamic strings because they must be able to
                hold the NULL character.
                Buffer size is also hard to predict. When the buffer exceeds 10 kB it is flushed to disk.
                In this particular code the maximum add-on string size is 256 bytes,
                but this kind of problem can be overcome. Just waste some memory.
                STRINSERT is appealing:
                Buff$ = Tmp$ & Buff$ can be substituted with
                Buff$ = STRINSERT(Buff$, Tmp$, 1)
                Buff$ = Buff$ & Tmp$ can be substituted with
                Buff$ = STRINSERT(Buff$, Tmp$, %TwoGigabyte)
                The question is whether STRINSERT is more efficient.




                ------------------
                Fred
                [email protected]
                http://www.oxenby.se

                [This message has been edited by Fred Oxenby (edited February 11, 2000).]



                • #9
                  Fred,

                  Thanks for explaining. The routine I posted will not do the job,
                  as it treats the first null in the appended string as the
                  terminator, and that will not handle what you are doing. I can
                  always use string functions, so the last one was not wasted, and
                  when I am a little more awake I think I know what will do the
                  concatenation in the manner you require.

                  I have one question: is it possible, in the way the data you are
                  processing is being accessed, to work on much larger blocks than
                  the size you are working with? There is always a sizable speed
                  advantage in working on a much larger block.

                  From the code you posted, it seems to be a character conversion
                  from a mainframe or similar, so it will depend on the data-transfer
                  speed of the connection you are using; but if you can safely use a
                  buffer size of a meg or more, there will be a corresponding
                  reduction in the number of disk writes on the local machine and
                  in data accesses to the source machine.

                  Regards,

                  [email protected]

                  ------------------
                  hutch at movsd dot com
                  The MASM Forum

                  www.masm32.com



                  • #10
                    Yes, this is "normalising" of mainframe data.
                    When I first ported this code to PowerBASIC (Windows)
                    I used 1 MB buffers.
                    But as this stores data on a network (Novell/Token Ring, no local writes), real-life tests
                    showed that a 10 kB output buffer was the optimal setting for
                    this particular network.
                    In my NT network (100 Mbit switched) I was able to use much larger buffers.
                    I have often thought about handing the actual writing of
                    data over to a separate thread, thereby allowing increased buffering, but
                    it has never come to a working solution.
                    I use the same code to "count pages" according to dynamic
                    formatting codes in the data, with no concatenation of output and no writes,
                    but analysing every line of data including in-memory database lookups,
                    and that is blindingly fast.
                    My greatest regret is that I have to support platforms from
                    Win95 on a 486 to Win2000 on a PIII, and soon perhaps something
                    completely different.



                    ------------------
                    Fred
                    [email protected]
                    http://www.oxenby.se



                    • #11
                      Mainframe data? Cool..

                      I wrote a program specifically for testing client file-transfer ability... BIGFILE.EXE makes large files by specifying a size in megabytes on the command line. It did 100 MB, no problem.


                      Anyway Fred, wouldn't this be faster?
                      local ptr1 as long
                      local ptr2 as long
                      local ptr3 as long

                      ptr1 = varptr(string1)
                      ptr2 = varptr(string2)
                      ptr3 = varptr(string3)

                      @ptr1 = @ptr2 & @ptr3


                      ???

                      Maybe I don't have my strings right but it would seem pointers would be the fastest way (??)


                      ------------------
                      Scott Turchin


                      Scott Turchin
                      MCSE, MCP+I
                      http://www.tngbbs.com
                      ----------------------
                      True Karate-do is this: that in daily life, one's mind and body be trained and developed in a spirit of humility; and that in critical times, one be devoted utterly to the cause of justice. -Gichin Funakoshi



                      • #12
                        Fred,

                        This is try 2 at the string concatenation routine. It is a simpler
                        function, as it does not need to scan for the terminating zero, but it
                        needs an additional parameter: the length of the string to add.
                        For the 10 kB buffer you use, it seems to be fast enough, but my
                        guess is that this will only fix a minor bottleneck, as the complexity
                        of the task you are performing and the network limitations are the
                        real time cost.

                        Regards, and I hope it's useful to you.

                        [email protected]

                        The function AppendStr2() takes 4 parameters,

                        1. starting position in the string.
                        2. address of the main buffer.
                        3. address of the string to add.
                        4. length of the added string.

                        The return value is the parameter to feed back to the function as its
                        starting position for the next iteration of the loop.

                        The example code to use it checks the return value of the function and
                        exits the loop when the byte count written to the buffer exceeds the
                        predetermined size.

                        Code:
                            Buffer$ = space$(12000) ' make a bit larger for safety margin
                            
                            Addin$ = "1234567890"+chr$(0)+"0987654321"
                            
                            pos& = 0
                            
                            Do
                            ' -------------------
                            ' get the string here
                            ' -------------------
                              pos& = AppendStr2(pos&,StrPtr(Buffer$), _
                                                StrPtr(Addin$),len(Addin$))
                              If pos& > 10000 Then Exit Do
                            Loop
                            
                            Result$ = left$(Buffer$,pos&)
                            
                        ' ###########################################################################
                            
                        FUNCTION AppendStr2(ByVal stPos  as LONG, _
                                            ByVal Buffer as LONG, _
                                            ByVal Addon  as LONG, _
                                            ByVal lenAdd as LONG) as LONG
                            
                            #REGISTER NONE
                            
                            ! cld               ; read forwards
                            
                            ! mov edi, Buffer   ; put buffer address in edi
                            ! add edi, stPos    ; add starting offset to it
                            
                            ! mov esi, Addon    ; put string address in esi
                            ! mov ecx, lenAdd   ; length in ecx as counter
                            
                            ! rep movsb         ; copy ecx count of bytes from esi to edi
                            
                            ! mov edx, stPos
                            ! add edx, lenAdd   ; add stPos and lenAdd for return value
                            
                            ! mov FUNCTION, edx
                            
                        END FUNCTION
                            
                        ' ###########################################################################

                        ------------------
                        hutch at movsd dot com
                        The MASM Forum

                        www.masm32.com



                        • #13
                          Thanks again for your time.
                          I'll give it a shot.
                          You are of course right. The number-one time-waster here is writing
                          to the disk. That is another project to sort out.
                          For now I am trying to find the right path for string handling.
                          For creating physical disk files I have built in the ability
                          to dynamically change both the IN and OUT buffer sizes depending on
                          network type. ATM is coming rapidly to my customers, and that will
                          change the priority list.



                          ------------------
                          Fred
                          [email protected]
                          http://www.oxenby.se

                          [This message has been edited by Fred Oxenby (edited February 12, 2000).]



                          • #14
                            Fred --

                            I have not been following this thread very closely, and these may be dumb ideas, but what the heck…

                            Here's a zen-like solution for you… The fastest way to add strings together is to not add them together. In other words, how about creating a string array with lots of elements. When you build your first short string, put it in array element #1 and INCR a counter that keeps track of where you are in the array. Also add the length of the string to a "total string volume" variable (a LONG). When string #2 is ready, don't add it to string #1, put it in element #2, and add the string length to the volume variable. When the volume reaches 10k, use a tight loop to blast all of the strings from the array to your output file, in sequence.
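                            The array-of-strings idea above can be sketched briefly; this is an illustrative Python version (io.BytesIO standing in for the output file, all names invented), not code from the thread:

```python
# Rough sketch of the array-of-strings approach: short strings are
# stashed unmodified, a running total tracks the volume, and the
# whole batch is written out in one pass at the 10k threshold.
import io

THRESHOLD = 10 * 1024

out = io.BytesIO()               # stands in for the output file
pending = []                     # the string array: no concatenation here
volume = 0                       # the "total string volume" counter

def add_record(rec):
    global volume
    pending.append(rec)          # just remember it, don't add strings
    volume += len(rec)
    if volume >= THRESHOLD:      # blast the batch out in sequence
        out.write(b"".join(pending))
        pending.clear()
        volume = 0

for _ in range(300):
    add_record(b"X" * 50)        # ~50-byte lines, as in the thread
out.write(b"".join(pending))     # final partial batch
```

                            Each short string is stored once and touched once more at write time; no quadratic re-copying ever happens.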

                            Another idea…

                            > NumberOne time-waster here is writing to the disk

                            You're building up a long string and then writing it to disk when it reaches a certain size, right? Why not forget about caching them and just write the individual short strings to the disk as you create them, and the disk-writing process will concatenate them automatically. If it helps the speed, count the bytes as they go out and do a FLUSH every 10k bytes.

                            How much of a real-world speed improvement do you actually get by building up a long string and only writing when the buffer reaches a certain size? The way Windows and Netware automatically cache disk writes, it seems like you wouldn't gain a significant amount of speed by pre-caching them yourself. Here's why I ask… I worked very hard on optimizing a project like this a long time ago, and I eventually found out that I was wasting my time. My stress-test programs showed that the network's "blast" speed was optimized for a certain-size block, but in real-world situations my program was not writing data full time (as I was in my test programs). In real life I was writing a few k and then pausing for a fraction of a second, then writing a few more k. Also, my data did not fit into nice blocks of exactly the correct size, so I was always writing a few bytes too many and hurting the efficiency. In the long run I found that I was better off just writing the data as soon as it was available, rather than caching it. The actual real-world throughput was optimized when I did that.

                            -- Eric


                            ------------------
                            Perfect Sync: Perfect Sync Development Tools
                            Email: [email protected]

                            "Not my circus, not my monkeys."



                            • #15
                              Fred;

                              There are a few factors to consider: how to allocate and manipulate your RAM buffers, and how to write the data to a disk file for optimum speed.

                              First, let's discuss buffer size. Today's hard drives have a certain degree of redundancy when writing data. When the data blocks written to a hard drive are too small, the heads must make multiple passes over the same track to write the small blocks. By using larger blocks when writing data, you can speed up disk access as much as tenfold.

                              I have thoroughly tested this, and the optimum buffer size is about 32 KB. A buffer of 8 KB is OK, but you get an improvement in speed as the buffers get close to 32 KB.

                              You need to use a 32 KB buffer in RAM and flush it to disk when full. A 32 KB buffer can be as much as twice as fast as an 8 KB buffer (and maybe better).

                              Next is how you handle the RAM buffers.

                              The first rule is: never use string functions (except perhaps MID$ as a statement, not a function). String functions can easily force PB to destroy a memory buffer and then create a new one to implement the string function.

                              Hutch's suggestion is the best: use assembler. The next best (and quite good) is to use POKE$ and PEEK$, since they don't allocate memory for variables; they just move bytes. They may be as fast as assembler, or at least close to it.

                              Create your buffer only once:

                              Buffer$=string$(32000, " ")

                              Then use POKE$ to put the bytes into the string buffer. Use a single Long variable to track where the end of the data is, and increment it as data is POKEd into the buffer.

                              When the buffer is to be cleared, you simply reset that end-of-data pointer to zero; you don't actually write anything to the buffer.

                              Your output disk file should be opened for Binary (not OUTPUT) and you should use the PUT # command to write the data. Simply find the EOF, or use a Long as a pointer to track the current EOF position.

                              By using 32 KB buffers and switching to POKE$ when writing data to your buffer, you will greatly speed up your program.

                              If you sometimes need a smaller buffer, you may want to use a UDT with fixed-length strings as a buffer.

                              ie.

                              Type MyBuffer
                              Buffer1 as string * 5000
                              Buffer2 as string * 5000
                              Buffer3 as string * 5000
                              Buffer4 as string * 5000
                              Buffer5 as string * 5000
                              Buffer6 as string * 5000
                              Buffer7 as string * 2000
                              End Type

                              Dim Buffer as MyBuffer

                              If data is flowing quickly, you can wait until the entire UDT (32 KB) is full before flushing.
                              If the data is flowing slowly at the moment, you can flush just the parts of the buffer that are filled (Buffer.Buffer1, Buffer.Buffer2, etc.)

                              This way, if your program has a periodic interval where the buffers must be flushed, you can still use simple PUT # statements to flush just part of the buffer.
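                              As a rough sketch of this scheme in Python (class and names invented for illustration, with io.BytesIO in place of a Binary file):

```python
# Hedged sketch of the fixed-buffer scheme described above: appends
# are block copies at a running end pointer, and "clearing" just
# resets the pointer, so stale bytes past it are never rewritten
# or flushed.
import io

class FlushBuffer:
    def __init__(self, out, size=32 * 1024):
        self.buf = bytearray(size)   # allocated once, never reallocated
        self.end = 0                 # pointer to the first free byte
        self.out = out

    def add(self, data):
        if self.end + len(data) > len(self.buf):
            self.flush()             # make room before copying
        self.buf[self.end:self.end + len(data)] = data
        self.end += len(data)

    def flush(self):
        self.out.write(self.buf[:self.end])  # write only the filled part
        self.end = 0                         # "clear" without touching the bytes

out = io.BytesIO()
fb = FlushBuffer(out, size=100)      # tiny size so the sketch flushes
for _ in range(30):
    fb.add(b"0123456789")
fb.flush()                           # push out the last partial buffer
```

                              Resetting the end pointer instead of rewriting the buffer is what keeps the "clear" free; only the filled prefix is ever flushed.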


                              ------------------
                              Chris Boss
                              Computer Workshop
                              Developer of "EZGUI"
                              http://cwsof.com
                              http://twitter.com/EZGUIProGuy



                              • #16
                                Eric,
                                This is really an old 16-bit NetWare-client hangover.
                                I have been doing what you suggest, but since many of the records are only 1 byte long it generated tremendous network traffic.
                                I am not a network/NetWare specialist, so I really can't argue.
                                With the introduction of the 32-bit NetWare client (DOS and Windows)
                                there was an incredible boost in read performance. Write performance was not boosted the same way.
                                There are probably experts out there who can step in and give
                                a good answer to this...
                                Anyhow, caching versus write-through has been tested, and caching does a lot for the overall performance. You can't live without it.
                                But, Eric, you had me wondering for a while, so I decided to take MCM
                                seriously and wrote some test code.
                                To keep it simple I used an average line length of 100 bytes.
                                Code:
                                '..Test only put$:
                                Open "F:\##\0" For Binary As #1
                                 a$ = String$(97,"X")
                                 TT = TimeGetTime()
                                  For i& = 1 To 500000
                                  b$="1" & a$ & Chr$(10,13)
                                  Replace Any "ABCDEFGHIJKLMNOPQRSTUVWXYZ" With "abcdefghijklmnopqrstuvwxyz" In b$
                                  Put$ #1,b$
                                 Next i&
                                 Print "Put only   " & Format$(TimeGetTime - TT)
                                Close 1
                                'Result 500,000 Put$*100 bytes = 47,25 seconds
                                Open "F:\##\1" For Binary As #1
                                 a$ = String$(97,"X")
                                 TT=TimeGetTime
                                 For j& = 1 To 5000
                                  For i& = 1 To 100
                                   c$="1" & a$ & Chr$(10,13)
                                   b$ = b$ + c$
                                  Next i&
                                  Replace Any "ABCDEFGHIJKLMNOPQRSTUVWXYZ" With "abcdefghijklmnopqrstuvwxyz" In b$
                                  Put$ #1,b$:b$=""
                                 Next j&
                                 Print "Add & Put  " & Format$(TimeGetTime - TT)
                                 Close 1
                                'Result 500,000 adds and 5000 put$*10 kbytes = 52,45 seconds
                                 With a more realistic average line length of 50 characters:
                                 Result WIN98SE: 1,000,000 Put$ * 50 bytes = 52,39 seconds
                                 Result WIN98SE: 1,000,000 adds and 5000 Put$ * 10 kbytes = 80,57 seconds
                                 Result WIN2000: 1,000,000 Put$ * 50 bytes = 36,15 seconds
                                 Result WIN2000: 1,000,000 adds and 5000 Put$ * 10 kbytes = 38,5 seconds
                                 On a busy network (two parallel tests running, one Win98SE and one Win2000):
                                 Result WIN98SE: 1,000,000 Put$ * 50 bytes = 72,12 seconds
                                 Result WIN98SE: 1,000,000 adds and 5000 Put$ * 10 kbytes = 78,44 seconds
                                 OOPS, what happened here?
                                 Result WIN2000: 1,000,000 Put$ * 50 bytes = 93,21 seconds
                                 Result WIN2000: 1,000,000 adds and 5000 Put$ * 10 kbytes = 93,00 seconds
                                 Seems Win2000 cannot cope with a busy network...
                                 It also seems that I, without really understanding it, have found
                                 an ideal combination of adding/Put$ with my 10 kB buffer on a busy (as usual) network.


                                ------------------
                                 Fred
                                 mailto:[email protected]
                                 http://www.oxenby.se



                                [This message has been edited by Fred Oxenby (edited February 12, 2000).]

                                Comment


                                • #17
                                   Sorry Chris, I did not see your posting at first.
                                   But I agree with you, if we talk about local disk drives.
                                   I am not sure about the figures you mention, but when
                                   running the test on a local disk, using a larger buffer
                                   is actually faster:
                                   1,000,000 Put$ * 50 bytes = 58,74 seconds
                                   1,000,000 adds and 5000 Put$ * 10 kB = 49,4 seconds
                                   I haven't tested with a larger buffer, as that involves more
                                   string concatenation than I care to do.


                                  ------------------
                                   Fred
                                   mailto:[email protected]
                                   http://www.oxenby.se


                                  Comment


                                  • #18
                                     Back again with test results using hutch's string-add function.
                                     First, Eric, your idea was not bad...
                                     but with buffers from 50 kB up to 1 MB, hutch's assembly code proved
                                     to be about 20% faster. With smaller buffers, not so good.
                                     Test program:
                                    Code:
                                    c$=Space$(BuffSize& + 2000)
                                    Open "F:\##\00" For Binary As #1
                                     a$ = String$(47,"X")
                                     stPos& = 0
                                     TT = TimeGetTime()
                                     For i& = 1 To 1000000
                                      b$="1" & a$ & Chr$(10,13)
                                      Replace Any "ABCDEFGHIJKLMNOPQRSTUVWXYZ" With "abcdefghijklmnopqrstuvwxyz" In b$
                                      Buffer& = StrPtr(c$)
                                      Addon&  = StrPtr(b$)
                                      lenAdd& = Len(b$)
                                       ! cld               ; read forwards 
                                       ! mov edi, Buffer&   ; put buffer address in edi
                                       ! add edi, stPos&    ; add starting offset to it
                                       ! mov esi, Addon&    ; put string address in esi
                                       ! mov ecx, lenAdd&   ; length in ecx as counter 
                                       ! rep movsb         ; copy ecx count of bytes from esi to edi
                                       ! mov edx, stPos&
                                       ! add edx, lenAdd&   ; add stPos and lenAdd for return value
                                       ! mov stPos&, edx
                                       If stPos& > BuffSize& Then
                                        b$=Left$(c$,stPos&)
                                        Put$ #1,b$
                                        stPos& = 0
                                      End If
                                     Next i&
                                      If stPos& > 0 Then
                                      b$=Left$(c$,stPos&)
                                      Put$ #1,b$
                                      stPos& = 0
                                     End If
                                     Print Format$(BuffSize&,"0000000") & " Adds " & Format$(TimeGetTime - TT)
                                    Close 1
                                     HUTCH: average (independent of buffer size, 100 kB to 1000 kB) to
                                     write 50 MB of data = 40 seconds
                                     ERIC: 50 bytes per Put$, about 52-55 seconds to write 50 MB of data


                                    ------------------
                                     Fred
                                     mailto:[email protected]
                                     http://www.oxenby.se



                                    [This message has been edited by Fred Oxenby (edited February 13, 2000).]

                                    Comment


                                    • #19
                                      Fred;

                                       Most of my tests of hard-drive speed were done when hard drives were smaller (about 500 MB). Today's hard drives (especially a network server drive) are very optimized.

                                       Your tests show that you are pushing the limit of hard-drive speed. My old tests showed a maximum speed of about 1 MB per second.

                                       Your test numbers show a speed of slightly better than 1 MB per second.

                                       I don't think that you will get much better than that.

                                       How big do you expect your data files to get?

                                       If you use Hutch's assembler with a buffer of at least 32 KB, I don't think you will be able to go much faster than that.

                                       The only other solution to speed things up (since working in RAM will always be faster than writing data to the hard drive) is to "compress" the data sent to the hard drive so you are writing less data. Even if you can only decrease the data flow by 20% to 50%, this would mean a lot. Compression doesn't have to mean zip-like compression; even using byte codes (or abbreviations) for commonly used words helps. Remove a decimal part that is not needed (i.e. 1.00 would be just 1).

                                       If you don't need the data in ASCII format, saving numbers as bytes rather than text will save space.
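                                       A minimal sketch of that "write less data" idea (Python, with an invented record layout; the field names and sizes are only for illustration):

```python
# Two encodings of the same (invented) record: ASCII text versus packed
# binary. The binary form is fixed-width and smaller, which is the kind
# of lightweight "compression" Chris describes.
import struct

def as_text(id_, qty, price):
    p = f"{price:g}"                 # "{:g}" drops a needless ".00"
    return f"{id_};{qty};{p}\n".encode()

def as_binary(id_, qty, price):
    # little-endian: 4-byte uint, 1-byte count, 4-byte float = 9 bytes
    return struct.pack("<IBf", id_, qty, price)
```

                                       For a record like (10042, 17, 1.0) the binary form is a fixed 9 bytes against 11 bytes of text, and the gap grows with wider numeric fields.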



                                      ------------------
                                      Chris Boss
                                      Computer Workshop
                                      Developer of "EZGUI"
                                      http://cwsof.com
                                      http://twitter.com/EZGUIProGuy

                                      Comment


                                      • #20
                                        Fred,

                                        I have always kept in mind a piece of your humour from a long time ago
                                        that optimisation of code is like having a discussion with your wife so I
                                        understand the idea that there will always be a point where economy
                                        dictates how much work you do on a problem to get a result.

                                         What I wondered is whether it is easy enough to test the raw data transfer
                                         and disk-write process independently from the text processing, to see what
                                         percentage of the total time is taken up by network data sourcing and disk
                                         write time. This will give a reasonable indication of how much of the total
                                         time the text processing accounts for, and so how much could be gained by
                                         making it faster, if in fact it can be made faster.
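                                         A sketch of that phase-isolation measurement (Python; the record construction stands in for Fred's Replace-and-concatenate step, and io.BytesIO stands in for the real disk write, so the absolute numbers are not meaningful, only the split):

```python
# Time the text processing and the output write as separate phases, as
# hutch suggests, so you can see what fraction of the total each one takes.
# io.BytesIO stands in here for the real file; against a network drive the
# write phase would look very different.
import io
import time

def timed_phases(n=100_000):
    t0 = time.perf_counter()
    # "processing" phase: build and lowercase n 50-byte records
    records = [(b"1" + b"X" * 47).lower() + b"\n\r" for _ in range(n)]
    t1 = time.perf_counter()
    # "write" phase: one big write of the joined buffer
    out = io.BytesIO()
    out.write(b"".join(records))
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1  # (processing seconds, write seconds)
```

                                         Running the same split against the real network file instead of the in-memory buffer shows at once whether the text processing is even worth optimising further.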

                                         Isolating the access speed, the processing speed and the disk-write
                                         time will give a much clearer picture of where any improvements can be made,
                                         if that is possible. While the idea of working on a much larger buffer is an
                                         attractive one, I understand the limits that you have found to be the most
                                         effective, so while an algo that used DWORD-size memory copies for the
                                         string concatenation is possible, it would probably not be much, if any,
                                         faster than the byte-copy algo that you have.

                                         Your suggestion of using multithreaded code sounds an interesting one, if a
                                         bit complex to get going. Depending on available memory on the oldest
                                         box that runs the software, splitting the sourcing, processing and writing
                                         of the data may have some advantage; I guess it depends on how much work
                                         you can afford to put into it.

                                        [email protected]

                                        ------------------
                                        hutch at movsd dot com
                                        The MASM Forum

                                        www.masm32.com

                                        Comment
