Announcement

Collapse
No announcement yet.

Any Tweakers?

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Any Tweakers?

    I wrote a utility that scans a replication log file from Sybase, it searches for "MISSING","FAILED", and "ILLEGAL." It also scans the file and dumps to three text files where Insert,Update, and Delete statements occur and parses the table name. Next, it loads a text file containing just table names and counts 1 at a time per output file to make up the totals. Just to load the 81MB log files takes 10 minutes on NT (with notepad). I have all this happening below in less than 5 minutes. Any suggestions for improvements?

    Thanks,
    Brad

    Code:
    FUNCTION PBMAIN()
           DIM t AS SINGLE
           DIM a$
           DIM x$
           DIM f$
           DIM i$
           DIM m$
           DIM d$
           DIM u$
           DIM count AS DOUBLE
           DIM fcount AS DOUBLE
           DIM icount AS DOUBLE
           DIM mcount AS DOUBLE
           DIM ucount AS DOUBLE
           DIM dcount AS DOUBLE
           DIM incount AS DOUBLE
           count = 0
           fcount = 0
           icount = 0
           mcount = 0
           ucount = 0
           dcount = 0
           incount = 0
           t = TIMER
           u$ = "UPDATE"
           d$ = "DELETE FROM"
           x$ = "INSERT INTO"
           f$ = "FAILED"
           i$ = "MISSING"
           m$ = "ILLEGAL"
           OPEN "c:\temp\qatarmsg.log" FOR INPUT AS #1 LEN = 16384
           OPEN "c:\temp\insert.tmp" FOR OUTPUT AS #2 LEN = 16384
           OPEN "c:\temp\delete.tmp" FOR OUTPUT AS #3 LEN = 16384
           OPEN "c:\temp\update.tmp" FOR OUTPUT AS #4 LEN = 16384
             DO WHILE NOT EOF(1)
                INPUT #1, a$
                'REGEXPR x$ IN a$ TO where&, length&
                'IF length& THEN
                IF INSTR(1,a$,x$)>0 THEN
                  'count = count + 1
                  WRITE# 2, MID$(a$,12,INSTR(1,a$,CHR$(40)) -12) & CHR$(33)
                  END IF
                IF INSTR(1,a$,d$)>0 THEN
                  'count = count + 1
                  WRITE# 3, MID$(a$,12,INSTR(1,a$,CHR$(40)) -12) & CHR$(33)
                END IF
                IF INSTR(1,a$,u$)>0 THEN
                  'count = count + 1
                  WRITE# 4, MID$(a$,7,INSTR(1,a$,CHR$(40)) -7) & CHR$(33)
                END IF
                IF INSTR(1,a$,f$)>0 THEN
                  fcount = fcount + 1
                END IF
                IF INSTR(1,a$,m$)>0 THEN
                  mcount = mcount + 1
                END IF
                IF INSTR(1,a$,i$)>0 THEN
                  icount = icount + 1
                END IF
                LOOP
                CLOSE 1
                CLOSE 2
                CLOSE 3
                CLOSE 4
                OPEN "c:\temp\tables.txt" FOR INPUT AS #1 LEN = 16384
                OPEN "c:\temp\rep.log" FOR OUTPUT AS #2 LEN = 16384
                WRITE# 2, "Replication Log for " & DATE$
                WRITE# 2, "-----------------------------------------------------"
                WRITE# 2, ""
                WRITE# 2, "Failed= " & STR$(fcount) & "  Missing= " & STR$(mcount) & "  Illegal= " & STR$(icount)
                WRITE# 2, ""
                WRITE# 2, "-----------------------------------------------------"
                WRITE# 2, ""
                WRITE# 2, "Tables with deleted items"
                WRITE# 2, "-----------------------------------------------------"
                DO WHILE NOT EOF(1)
                   INPUT #1, a$
                   a$ = REMOVE$(a$,"'")
                   a$ = a$ & CHR$(33)
                   OPEN "c:\temp\delete.tmp" FOR INPUT AS #3 LEN = 16384
                   DO WHILE NOT EOF(3)
                      INPUT #3, d$
                      IF INSTR(1,d$,a$)>0 THEN
                         dcount = dcount + 1
                      END IF
                   LOOP
                   IF dcount > 1 THEN
                      WRITE# 2, REMOVE$(a$,CHR$(33)) & STR$(dcount)
                   END IF
                   dcount = 0
                   CLOSE 3
                LOOP
                CLOSE 1
                OPEN "c:\temp\tables.txt" FOR INPUT AS #1 LEN = 16384
                WRITE# 2, ""
                WRITE# 2, "Tables with inserted items"
                WRITE# 2, "-----------------------------------------------------"
                DO WHILE NOT EOF(1)
                   INPUT #1, a$
                   a$ = REMOVE$(a$,"'")
                   a$ = a$ & CHR$(33)
                   OPEN "c:\temp\insert.tmp" FOR INPUT AS #3 LEN = 16384
                   DO WHILE NOT EOF(3)
                      INPUT #3, d$
                      IF INSTR(1,d$,a$)>0 THEN
                         icount = icount + 1
                      END IF
                   LOOP
                   IF icount > 1 THEN
                      WRITE# 2, REMOVE$(a$,CHR$(33)) & STR$(icount)
                   END IF
                   icount = 0
                   CLOSE 3
                LOOP
                CLOSE 1
                OPEN "c:\temp\tables.txt" FOR INPUT AS #1 LEN = 16384
                WRITE# 2, ""
                WRITE# 2, "Tables with updated items"
                WRITE# 2, "-----------------------------------------------------"
                DO WHILE NOT EOF(1)
                   INPUT #1, a$
                   a$ = REMOVE$(a$,"'")
                   a$ = a$ & CHR$(33)
                   OPEN "c:\temp\update.tmp" FOR INPUT AS #3 LEN = 16384
                   DO WHILE NOT EOF(3)
                      INPUT #3, d$
                      IF INSTR(1,d$,a$)>0 THEN
                         ucount = ucount + 1
                      END IF
                   LOOP
                   IF ucount > 1 THEN
                      WRITE# 2, REMOVE$(a$,CHR$(33)) & STR$(ucount)
                   END IF
                   ucount = 0
                   CLOSE 3
                LOOP
                CLOSE 1
                CLOSE 2
                t = TIMER - t
    END FUNCTION
    [This message has been edited by Bradley Callis (edited June 04, 2000).]
    Thanks,

    Bradley Callis

  • #2
    It would be much easier to look at your code -- so more people will be likely to do it -- if you'd add [ code ] and [ /code ] tags before and after the source code, to preserve indenting. (I''ve typed the tags with additional spaces so they'll show here. You should type them without spaces.)

    -- Eric

    ------------------
    Perfect Sync: Perfect Sync Development Tools
    Email: mailto:[email protected][email protected]</A>

    "Not my circus, not my monkeys."

    Comment


    • #3
      Eric,

      Haven't told you how much I like SQL-TOOLS. Wish it could work with transaction logs. Sorry about the code post, fixed now.

      ------------------
      Thanks,

      Bradley Callis

      Comment


      • #4
        Bradley --

        > Haven't told you how much I like SQL-TOOLS.

        Thanks!

        > Wish it could work with transaction logs.

        It's pretty funny that a system like Sybase produces output files that Sybase can't read! Maybe a more effective way to handle this would be to use SQL Tools to write a program that reads the transaction logs and creates a new database. Then you could use all of your existing reporting tools to analyze the transactions logs. (I bet you could even sell a few copies of a program like that.)

        As far as tweaking this program goes...

        I suspect that 16384 is not the optimum OPEN...LEN. I'd replace "16384" with %BlockSize, and then try different %BlockSize = values at the top of your program. I seem to remember somebody saying that 8192 works best in most cases.

        Also, you appear to be testing every line of the file over and over, even after a match has been found. In other words, when you do this:

        Code:
        IF INSTR(1,a$,x$)>0 THEN
          'count = count + 1
          WRITE# 2, MID$(a$,12,INSTR(1,a$,CHR$(40)) -12) & CHR$(33)
        END IF
        IF INSTR(1,a$,d$)>0 THEN
          'count = count + 1
          WRITE# 3, MID$(a$,12,
        END IF
        (and so on)
        ...if the first test finds a match, you are still performing all of the other tests, even though you know they won't find a match. Try doing this:

        Code:
        IF INSTR(1,a$,x$)>0 THEN
          'count = count + 1
          WRITE# 2, MID$(a$,12,INSTR(1,a$,CHR$(40)) -12) & CHR$(33)
        ELSEIF INSTR(1,a$,d$)>0 THEN
          'count = count + 1
          WRITE# 3, MID$(a$,12,
        ELSEIF INSTR...
        (and so on)
        And put the IF-tests in the order that matches their frequency. If UPDATE is the most common operation, test for that first.

        Also, I seem to remember that LINE INPUT is faster than INPUT. Have you tried it?

        Also, if the position of keywords like DELETE FROM is predictable, then using LEFT$ or MID$ would be much faster than INSTR. Let's say you have a 100-character string, and if it contains DELETE FROM it will always be at the very start of the string. Using LEFT$ would perform a single check and move on. Using INSTR checks the entire string, even if DELETE FROM will never be found after the first few characters, and that takes time.

        Even if the keyword positions are not predictable, take advantage of additional parsing. Sometimes more complex code runs faster! For example, if the keywords always follow a colon, find them that way instead of using INSTR.

        HTH.

        -- Eric

        ------------------
        Perfect Sync: Perfect Sync Development Tools
        Email: mailto:[email protected][email protected]</A>



        [This message has been edited by Eric Pearson (edited June 04, 2000).]
        "Not my circus, not my monkeys."

        Comment


        • #5
          Eric,

          I applied your recommendations and it did speed it up about ten percent. Still a little long. Basically, I just want to total how many unique strings occur in this file. Would an array be faster? Also, I could use SQL to perform this but, I want it as a stand alone app. Working on an all inclusive Utility for Sybase which is going to be released soon, problem is that this program would be run from a client and every client is configured differently, ie DSN name, tcpip,ipx, etc.

          Thanks,

          Brad

          ------------------
          Thanks,

          Bradley Callis

          Comment

          Working...
          X