Pointers and speed

  • Pointers and speed

    I believed pointers would be useful for increasing execution speed. Then I found the following:

    Ex. 1:

    a$="ABCD"
    b$="AABCDEFGHIJKLMNOPQ..... (etc)" <- a long (1 Mb) string
    j%=instr(a$,b$)

    Ex. 2:
    a$="ABCD"
    b$="AABCDEFGHIJKLMNOPQ..... (etc)" <- a long (1 Mb) string
    c = strptr(a$)
    j%=instr(@c,b$)

    Ex. 1 and Ex. 2 should do the same thing (in fact, they do), but Ex. 2 is about 10 times slower than Ex. 1.

    Any ideas ?

  • #2
    Rob --
    pointers are useful for avoiding data conversion (string to byte, for example).
    In your case it is clear that pointers cannot increase speed, but on the other hand they should not decrease it either.
    I am sure you have a problem with your timing.
    Try this
    Code:
       #Compile Exe
       #Register None
       #Dim All
       Function PbMain
          Dim a As String, b As String, c As String Ptr, i As Long, j As Long
          Dim t1 As Single, t2 As Single, t3 As Single, t4 As Single
          %n = 200
          a$= "ABCD": b$= Space$(1000000): c = VarPtr(a$)
          t1 = Timer: For i = 1 To %n: j = Instr(b$, a$): Next: t2 = Timer
          t3 = Timer: For i = 1 To %n: j =Instr(b$, @c): Next: t4 = Timer
          MsgBox Format$(t2 - t1) + " vs " + Format$(t4 - t3)
       End Function
    ------------------



    • #3
      First, there is no way that pointers would be 10 times slower in your elementary example code, Rob, unless there was something else your code did but you did not post.

      Pointers can certainly increase performance of code, but in the case of INSTR(), you'll be hard pressed to get better performance without resorting to inline assembly because INSTR() is already highly optimized for you!

      A better example of when to use pointers could be if you wished to work on your 1Mb buffer on a character by character basis - in this case using indexed pointers would be much faster than using MID$().
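
      A minimal sketch of that comparison, assuming a made-up 1 MB test buffer and a byte-counting task chosen only for illustration (the names buf, p, n1 and n2 are hypothetical, not from any posted code). Both loops count the same byte: the first goes through MID$(), the second through an indexed BYTE PTR.
      Code:
         #Compile Exe
         #Dim All
         Function PbMain () As Long
            Local buf As String, p As Byte Ptr
            Local i As Long, n1 As Long, n2 As Long
            buf = Repeat$(100000, "ABCDEFGHIJ")     ' about 1 MB of made-up test data
            ' MID$ version: every call returns a new one-character string
            For i = 1 To Len(buf)
               If Mid$(buf, i, 1) = "E" Then Incr n1
            Next
            ' Indexed-pointer version: reads each byte in place (index is 0-based)
            p = StrPtr(buf)
            For i = 0 To Len(buf) - 1
               If @p[i] = 69 Then Incr n2           ' 69 = Asc("E")
            Next
            MsgBox Format$(n1) + " vs " + Format$(n2)   ' both counts should agree
         End Function
      To measure the difference on your own machine, wrap each loop in a pair of Timer calls, as in the timing code in post #2.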


      ------------------
      Lance
      PowerBASIC Support
      mailto:[email protected]



      • #4
        Rob,

        The result you are getting has a bit to do with what you are doing.
        INSTR, as Lance pointed out, is a fast and well optimised piece of
        code, and to try and improve on it you are faced with writing very complex
        search algorithms in assembler with no guarantee that you will improve on
        it.

        Bob Zale has been writing assembler since the days of wire wrapped processors,
        so I suggest that he well knows what he is doing in this area. Boyer-Moore
        and similar search algorithms have been around for a long time, but a modern
        byte scanner will give them a very good run for their money: it has far
        less overhead and does not step forward and then read backwards to find a mismatch.

        The reason why you are getting a speed reduction is that you are trying to
        do more by using the "pointer" operator. The internal structure of the INSTR
        function already handles its own addressing so to try and duplicate it just
        adds a lot more overhead to it.

        There are many places where using pointers will make your code a lot faster,
        this just happens not to be one of them.

        Regards,

        [email protected]

        ------------------
        hutch at movsd dot com
        The MASM Forum

        www.masm32.com



        • #5
          First of all, thnx for the replies.

          It might help if I explain what I need to do here:

          I have two large files of similar size and content and I try to create a file that would just give me the difference of the two files. The idea is that this new file is a lot smaller than the originals. Then by using the original and the delta file, I should be able to reconstruct the second file again.
          The file sizes can be up to 150 Mb and contain binary data.

          So I start by reading a reasonable part of the first file - let's say 5 Mb - and then start comparing it with small chunks of the second file - let's say 100 bytes - using the INSTR function. When I find a match I write the non-matching part to the new file. Then I keep comparing until I find a non-matching part again, etc...
          Now INSTR becomes quite slow when searching for a short string in a very long string. However, since the differences can be small and there can be large amounts of new data in the file, I need to be able to detect small changes and look ahead for quite some bytes before I find my match again. Therefore the idea is to "shift-ahead" the starting point of the INSTR search in the large string by using a string pointer and then adding the last matching position to the pointer as to force INSTR to start looking from that point onwards.

          It was there that I found that INSTR is a lot slower (indeed a factor of 10) when using a pointer.

          I fear I need to re-write the whole thing in assembly language to accomplish what I need, unless there are any *smart* suggestions from the forum....

          Regards,


          Rob de Jong

          ------------------



          • #6
            hm
            *grml*
            interesting problem (about that file comparison), i'm thinking about it

            what I would do is:
            * Use a "progress pointer" for every file
            hm
            think of the following:
            recognizing a correspondence shorter than about 20 bytes is not necessary.
            so in order to optimize your code, do the following:
            * take one 4-byte block (32 bits) from the new file
            * write a function to find the block in the old.
            * then omit the next 16 bytes from the new file, and your next block will be byte 17-20.
            * if you find the block in the old file, check where the correspondence starts and where it ends
            * set the "progress pointers" to the end of the correspondence
            * using the progress pointers, you will be able to determine whether data has to be added or deleted in order to convert the old file to the new one ...

            i hope this is clear for you, else just ask any questions.
            just how i would do it.
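
            A rough sketch of the steps above, under several assumptions: the two short sample strings stand in for chunks of the old and new files, all variable names are hypothetical, the probe stride of 16 is one reading of "your next block will be byte 17-20", and a match is only extended forward (finding where the correspondence starts would need a similar backward scan, left out here).
            Code:
               #Compile Exe
               #Dim All
               Function PbMain () As Long
                  Local oldBuf As String, newBuf As String, report As String
                  Local po As Byte Ptr, pn As Byte Ptr
                  Local oldPos As Long, newPos As Long, hit As Long, n As Long

                  ' Made-up sample data; real code would read chunks of the two files
                  oldBuf = "The quick brown fox jumps over the lazy dog, again and again."
                  newBuf = "The quick brown fox leaps over the lazy dog, again and again."
                  oldPos = 1 : newPos = 1                  ' the two "progress pointers"

                  Do While newPos + 3 <= Len(newBuf)
                     ' Look for a 4-byte block of the new data ahead of the old progress pointer
                     hit = Instr(oldPos, oldBuf, Mid$(newBuf, newPos, 4))
                     If hit = 0 Then
                        newPos = newPos + 16               ' no match: next probe is 16 bytes further on
                     Else
                        ' Extend the match forward byte by byte to find where it ends
                        po = StrPtr(oldBuf) + hit - 1
                        pn = StrPtr(newBuf) + newPos - 1
                        n = 0
                        Do While hit + n <= Len(oldBuf) And newPos + n <= Len(newBuf)
                           If @po[n] <> @pn[n] Then Exit Do
                           Incr n
                        Loop
                        report = report + "old" + Str$(hit) + " new" + Str$(newPos) + " len" + Str$(n) + "; "
                        oldPos = hit + n                   ' move both progress pointers past the match
                        newPos = newPos + n
                     End If
                  Loop
                  MsgBox report
               End Function
            Everything in the new data that lies between two reported matches would go into the delta file; the gap between the corresponding positions in the old data tells you whether bytes were added or deleted.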




            ------------------



            • #7
              Originally posted by Rob de Jong:
              I believed pointers would be useful to increase execution speed. Then I found the following:

              Ex. 1:
              Code:
              a$="ABCD"
              b$="AABCDEFGHIJKLMNOPQ..... (etc)" <- a long (1 Mb) string
              j%=instr(a$,b$)
              Ex. 2:
              Code:
              a$="ABCD"
              b$="AABCDEFGHIJKLMNOPQ..... (etc)" <- a long (1 Mb) string
              c = strptr(a$)
              j%=instr(@c,b$)
              ex.1 and ex.2 should do the same (in fact, they do). But ex. 2 is about 10 times as slow as ex. 1.

              The second version here is likely to be slower than the first not due to any particular lack of speed with pointers, but because you are conducting more operations. In practice, you would not be using STRPTR every time. It's not actually clear that you need to use pointers here at all, though. Are you familiar with the INSTR(startposn, a$, b$) syntax?
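
              A minimal sketch of the start-position form, assuming PowerBASIC's argument order of start position, then the string to search, then the string to find. The sample data and the variable names are made up for illustration.
              Code:
                 #Compile Exe
                 #Dim All
                 Function PbMain () As Long
                    Local hay As String, needle As String, found As String
                    Local p As Long
                    hay = "xxABCDxxxxABCDxx"                ' made-up data to search in
                    needle = "ABCD"
                    p = Instr(hay, needle)                   ' first match, searching from position 1
                    Do While p > 0
                       found = found + Format$(p) + " "
                       ' Resume just past the previous match; no pointer and no string copy needed
                       p = Instr(p + Len(needle), hay, needle)
                    Loop
                    MsgBox "Matches at: " + found
                 End Function
              The loop ends on its own, because INSTR returns 0 once nothing more is found or the start position runs past the end of the string.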

              ------------------
              Tom Hanlin
              PowerBASIC Staff



              • #8
                Tom,

                you are absolutely correct about the usage of "starting pos". I must have had a temporary lapse of reason (aka programmer's blindness). What I was trying to create is what Bob already did in INSTR.

                Still, the speed difference is actually true and remains. I'll beam up an example asap.

                Thnx anyway,

                Rob

                ------------------



                • #9
                  Rob --
                  sorry, but you need to prove the speed difference.
                  So far you have posted only an incorrect sample, in which the parameters are in the wrong order.
                  You should search for the "short" string inside the "long" one.
                  a$="ABCD"
                  b$="AABCDEFGHIJKLMNOPQ..... (etc)" <- a long (1 Mb) string
                  You used j%=instr(a$,b$), but it should be j%=instr(b$,a$)
                  This is probably a reason for the incorrect timing.

                  Another possible reason:
                  If c is declared as String Ptr (as it should be, if you want to compare the contents of binary files), you should use VarPtr instead of StrPtr.

                  [This message has been edited by Semen Matusovski (edited June 09, 2000).]
