Refactoring your PB source code with uCalc Transform

  • Refactoring your PB source code with uCalc Transform

    uCalc Transform allows you to modify source code (or any text) in powerful ways. Starting today, I am gradually building a transform you can use to clean up or enhance your PowerBASIC source code (either your own or code you inherited). I plan to post each new addition to it in this thread on an ongoing basis. A transform is simply code containing the rules that tell uCalc Transform how to process and modify the text it receives as input. The transforms in this refactoring tool will range from visual aesthetics to optimizations of your source code.

    Index

    Inserting missing quote in a string literal
    Shortcut for Global, Local, Static, Threaded
    Changing Left$+Mid$ to StrInsert$
    Breaking up multiple statements into separate lines
    Changing IF statement to IF/END IF block

    I plan to continually update this section here, turning it into an index for the new posts as they are posted. For now, here are some of the upcoming transforms you can expect:
    • Normalize indentation
    • Align equal signs vertically
    • Align Type & Union members
    • Add missing end quote
    • Make casing (upper/lower case) of variables, functions, etc, consistent with the way they're declared
    • Change Left$() + Mid$() pattern to StrInsert$()
    • Change patterns like x = x+1 to x += 1
    • Normalize spacing around args
    • Break up extra long lines in a uniform way
    • Sort Select Case members
    • Add As Const or As Long for Select Case where applicable
    • Change patterns like: Local a as long, b as long to Local a, b As Long
    • Change single-line IF statement containing colons into a multi-line IF statement

    My actual list is much longer, but the above selection should give you an idea. Requests are also welcome.

    Useful links:

    http://www.ucalc.com/transform.html (Learn about and download uCalc Transform from here)
    http://www.ucalc.com/misc/pb/refactor.uc
    http://www.ucalc.com/misc/pb/sample.bas
    Last edited by Daniel Corbier; 4 Feb 2014, 03:31 PM. Reason: Updated index
    Daniel Corbier
    uCalc Fast Math Parser
    uCalc Language Builder

  • #2
    Inserting missing quote in a string literal

    Probably for historic reasons, PowerBASIC allows you to compile code containing string literals that have an opening quote but are missing a closing quote. For instance, PowerBASIC will compile the following two lines without complaint:

    Code:
    i$ = "This is a clean string literal."
    i$ = "This string literal has just one quote.
    Even though the second line is allowed, it is probably cleaner for every string literal to always have both quotes. This addition to the refactoring tool will find lines with a missing quote, and add the missing quote for you. In this example, I've loaded the refactor.uc transform, and opened up a file named sample.bas, which has several literals with missing quotes:



    I clicked the Transform button, and it inserted the missing quotes for me like this:
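For readers who prefer to see the rule in code form, here is a minimal Python sketch of the same idea. It is not how uCalc Transform implements it; the function name `close_orphan_quote` is hypothetical, and it assumes that a double quote always toggles a string literal and that an apostrophe outside a string starts a comment (REM comments are not handled).

```python
def close_orphan_quote(line: str) -> str:
    """Append a closing quote if the line ends inside a string literal."""
    in_string = False
    for ch in line:
        if ch == '"':
            # Every double quote opens or closes a literal.
            in_string = not in_string
        elif ch == "'" and not in_string:
            # Apostrophe outside a string: the rest of the line is a comment.
            break
    if in_string:
        return line.rstrip() + '"'
    return line


if __name__ == "__main__":
    print(close_orphan_quote('i$ = "This string literal has just one quote.'))
```

Note that an apostrophe inside an unterminated literal stays part of the string, so a trailing comment on such a line ends up inside the quotes.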

    Daniel Corbier
    uCalc Fast Math Parser
    uCalc Language Builder


    • #3
      Shortcut for Global, Local, Static, and Threaded

      Today's refactoring code involves the shortcut notation for Global, Local, Static, and Threaded, which would turn a line like this:

      Code:
      Local Total As Extended, x As Extended, y As Extended, z As Extended
      To

      Code:
      Local Total, x, y, z As Extended
      The difficulty I encountered with this one is that there are usually various ways to do things with uCalc Transform, and I needed to decide which one was best. I wasn't sure which approach might be easiest for others to understand. So first let me discuss various other options I considered:

      Code:
      Local {etc} As {type:1}, {var} As {type:1}
      I wouldn't have created a separate pass for that one. But then I'd also want it to work equally with Global, and Static. So then I might have extended it to something like this:

      Code:
      { Local | Global | Static | Threaded } {etc} As {type:1}, {var} As {type:1}
      But then I realized that in some cases {type} can be two tokens instead of just one as in:

      Code:
      Local x As Long Ptr
      In this case I needed some kind of delimiter instead of expecting 1 token for {type}. The delimiter here will be a comma or end of line (new line {nl}).

      Code:
      { Local | Global | Static | Threaded } {etc} As {type}, {var} As {type} {delim: , | {nl}}
      The above line was getting long. It could have been broken up in two lines like this:

      Code:
      { Local | Global | Static | Threaded }
      {etc} As {type}, {var} As {type} {delim: , | {nl}}
      Or I could just handle it in a separate pass, which is what I did. Instead of including Local, Global, Static, and Threaded, I made it exclude Dim (I'm not sure why PB doesn't allow this shortcut for Dim). I almost decided against this approach because I thought: what about Sub/Function parameters? As it turns out, PB allows this shortcut for those as well.

      Anyway, I settled for the following pattern in a separate pass:

      Code:
      As {type}, {var} As {type} {delim: , | {nl}}
      One special thing about the line above is that using the same pattern variable in more than one place within a pattern requires every occurrence to match the same text. The actual match doesn't matter. What matters is that the occurrences are the same. So if {type} matches Extended in one place, it must match Extended in the other location, whereas if it matches Byte in one place, it must match Byte in the other location. So a line like this would not match:

      Code:
      Local xx As Long, yy As String, zz As Integer
      Here's what it looks like:


      Notice an additional line to tell it to skip Dim, since Dim doesn't support this shortcut. It's very important to note that the "Pass once" property for this one was set to False. See the red asterisks (above & below):



      So given a line like:

      Code:
      Local Total As Extended, x As Extended, y As Extended, z As Extended
      It will reparse the same line repeatedly until no more changes are left.

      Code:
      Original ==> Local Total As Extended, x As Extended, y As Extended, z As Extended
      Modify 1 ==> Local Total, x As Extended, y As Extended, z As Extended
      Modify 2 ==> Local Total, x, y As Extended, z As Extended
      Modify 3 ==> Local Total, x, y, z As Extended
      After the third modification, the pattern no longer matches, and the parser merrily moves on to the next task. Pass Once is set to True by default. Care should be used when changing the default Pass Once property for a pattern to avoid an infinite reparsing loop.
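The repeated reparsing can be mimicked outside uCalc. The following Python sketch is only an analogy, built on a regular expression with a named backreference standing in for the repeated {type} variable; the names `collapse_once`, `collapse`, and the `max_passes` guard are my own assumptions, and variable names with type suffixes such as `$` or `&` are not covered.

```python
import re

# "As <type>, <var> As <same type>" followed by a comma or end of line.
# (?P=type) plays the role of the repeated {type} pattern variable.
_SAME_TYPE = re.compile(
    r"\s+As\s+(?P<type>\w+(?:\s+\w+)*?)\s*,\s*(?P<var>\w+)\s+As\s+(?P=type)"
    r"(?=\s*(?:,|$))",
    re.IGNORECASE,
)


def collapse_once(line: str) -> str:
    """Apply the rewrite to the first match only (one 'Modify' step)."""
    if line.lstrip().lower().startswith("dim "):
        return line  # Dim is skipped, as in the transform
    return _SAME_TYPE.sub(r", \g<var> As \g<type>", line, count=1)


def collapse(line: str, max_passes: int = 100) -> str:
    """Reapply the rewrite until nothing changes (Pass Once = False)."""
    for _ in range(max_passes):  # guard against an endless reparsing loop
        new_line = collapse_once(line)
        if new_line == line:
            break
        line = new_line
    return line


if __name__ == "__main__":
    print(collapse("Local Total As Extended, x As Extended, y As Extended, z As Extended"))
```

The lookahead at the end of the expression corresponds to the {delim} part of the pattern: it stops `Long` from being treated as the same type as `Long Ptr`.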

      Here's the refactored sample code:
      (don't worry about the spacing around the commas; that's for another time)
      Daniel Corbier
      uCalc Fast Math Parser
      uCalc Language Builder


      • #4
        I'm impressed, Daniel. I wrote something kinda-sorta-related for my own use with PB/DOS, and I know how complex parsing PB syntax can get. The orphan-quote thing is surprisingly hard to do. There are so many syntax variations possible, for example I prefer one variable per line, like

        DIM lValue AS LOCAL LONG
        DIM lResult AS LOCAL LONG

        The words local, global, and static never seemed like verbs to me. I'm a verbose kind of guy, as some have noted elsewhere.

        Anyway, nice work!
        "Not my circus, not my monkeys."



        • #5
          I guess we could always send in a new feature suggestion to make "orphan" quotes cause compiles to fail.
          Michael Mattias
          Tal Systems Inc.
          Racine WI USA
          mmattias@talsystems.com
          http://www.talsystems.com



          • #6
            I lost that argument more than once. Bob always insisted on backward compatibility, and orphan quotes go all the way back to MS BASIC.

            My first real-world experience with it came from a line of TB code like this:

            Code:
            PRINT "ERROR: You must enter a number in this field.          'user is an idiot
            "Not my circus, not my monkeys."



            • #7
              Originally posted by Eric Pearson
              I'm impressed, Daniel. I wrote something kinda-sorta-related for my own use with PB/DOS, and I know how complex parsing PB syntax can get. The orphan-quote thing is surprisingly hard to do. There are so many syntax variations possible, for example I prefer one variable per line, like

              DIM lValue AS LOCAL LONG
              DIM lResult AS LOCAL LONG

              The words local, global, and static never seemed like verbs to me. I'm a verbose kind of guy, as some have noted elsewhere.

              Anyway, nice work!
              Thanks Eric,

              In fact, the PB to C++ converter breaks up DIM statements into one variable per line as well, to make it easier to convert. uCalc Transform can modify text in various ways that can suit any programming style. Here's a transform that would change DIM statements to your style (as I understand it). I started writing a description of what the transform is supposed to do (like break down DIM statements into one variable per line, etc.), but the description seems more complicated than the transform itself. So instead of a description, here's the transform, followed by a before/after example, and an explanation:

              Here's a link for this transform, so you don't have to manually type it all in, in case you want to play with it:
              http://www.ucalc.com/misc/pb/eric.uc

              Input
              Code:
              #DIM ALL
              
              GLOBAL abc AS LONG, xyz AS EXTENDED
              GLOBAL gWord1, gWord2, gWord3 AS WORD
              THREADED tItem AS DWORD, tNumber AS DOUBLE
              
              ' Some other code here
              %This = 123
              %That = 456
              
              Type Testing
                 Etc As Long
              End Type
              
              FUNCTION PBMAIN
                 DIM lValue AS LONG
                 LOCAL lResult AS LONG
                 DIM lTest AS LOCAL LONG
                 DIM x AS LONG, y AS LONG
                 STATIC b1, b2, b3 AS BYTE, iPtr1, iPtr2 AS INTEGER PTR
              END FUNCTION
              Output
              Code:
              #DIM ALL
              
              
              ' Some other code here
              %This = 123
              %That = 456
              
              Type Testing
                 Etc As Long
              End Type
              
              FUNCTION PBMAIN
                 DIM tItem AS THREADED DWORD
                 DIM tNumber AS THREADED DOUBLE
                 DIM gWord1 AS GLOBAL WORD
                 DIM gWord2 AS GLOBAL WORD
                 DIM gWord3 AS GLOBAL WORD
                 DIM abc AS GLOBAL LONG
                 DIM xyz AS GLOBAL EXTENDED
                 DIM lValue AS LOCAL LONG
                 DIM lResult AS LOCAL LONG
                 DIM lTest AS LOCAL LONG
                 DIM x AS LOCAL LONG
                 DIM y AS LOCAL LONG
                 DIM b1 AS STATIC BYTE
                 DIM b2 AS STATIC BYTE
                 DIM b3 AS STATIC BYTE
                 DIM iPtr1 AS STATIC INTEGER PTR
                 DIM iPtr2 AS STATIC INTEGER PTR
              END FUNCTION
              Explanation
              Here's a line by line explanation. First, it should be noted that the order of patterns is important, especially when several patterns may match the same text, in which case the patterns closest to the bottom are tried first. It moves up until it finds the right match.

              Code:
              Find: DIM {var} AS [LOCAL] {type}
              Replace: DIM {var} AS LOCAL {type}
              This can more or less be read like this: Match the word (or technically token) DIM followed by any word(s) (a variable name in this case represented by {var}) followed by the word AS, optionally followed by the word LOCAL (notice square brackets), followed by one or more words (our data type name).

              If the line doesn't have LOCAL, the replacement adds it, otherwise the replacement is the same. Either way, once the match is found, it moves on.

              Code:
              Find: {dim: GLOBAL|INSTANCE|LOCAL|STATIC|THREADED }
                    {var} AS {type}
              Replace: DIM {var} AS {dim} {type}
              This changes either GLOBAL or INSTANCE or LOCAL or STATIC or THREADED followed by a variable name followed by the word AS, followed by type, with DIM followed by the variable name, followed by whichever starting word was matched (GLOBAL, INSTANCE, LOCAL, STATIC, or THREADED denoted by pattern variable I named {dim}), followed by the type.

              I moved {var} AS {type} to another line just for aesthetics, but it's on the same logical line. If you actually needed a new line in the Find section, then you'd use {nl}. In the Replace section the ASCII character(s) for new line are treated literally ({nl} can be used there as well).

              Code:
              Find: {dim: DIM|GLOBAL|INSTANCE|LOCAL|STATIC|THREADED }
                    {var1}, {more} AS {type}
              Replace: {dim} {var1} AS {type}
                          {dim} {more} AS {type}
              This one is similar to the previous one. However, this one takes a DIM statement with multiple variables, and places the first one on a separate line, and the rest on a second line. The process is repeated (by setting the Pass Once property to False) until no line has more than one variable. Only when that is the case will the pattern above this one be considered.

              Code:
              Find: {dim: DIM|GLOBAL|INSTANCE|LOCAL|STATIC|THREADED }
                    {group} AS {type}, {more}
              Replace: {dim} {group} AS {type}
                          {dim} {more}
              This one is again similar to the previous one. But before separating the first variable from the rest, if there's a group of variables with the same type, those are matched first.
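To make the combined effect of these splitting patterns concrete, here is a rough Python sketch that produces the same one-variable-per-line result for a single declaration. It is an illustration under assumptions, not a translation of the uCalc patterns: the function name `one_per_line` is hypothetical, indentation is dropped, arrays with commas inside parentheses are not handled, and moving GLOBAL or THREADED declarations into PBMAIN is left out.

```python
SCOPES = ("GLOBAL", "INSTANCE", "LOCAL", "STATIC", "THREADED")


def one_per_line(stmt: str) -> list[str]:
    """Expand one declaration into 'DIM var AS scope type' lines."""
    keyword, _, rest = stmt.strip().partition(" ")
    keyword = keyword.upper()
    if keyword != "DIM" and keyword not in SCOPES:
        return [stmt]  # not a declaration: leave it alone
    default_scope = "LOCAL" if keyword == "DIM" else keyword
    scope, type_name = default_scope, None
    lines = []
    # Walk right to left so a trailing "AS type" applies to the group before it.
    for item in reversed(rest.split(",")):
        words = item.split()
        if not words:
            return [stmt]
        if len(words) > 2 and words[1].upper() == "AS":
            type_words = words[2:]
            scope = default_scope
            if type_words[0].upper() in SCOPES:  # e.g. DIM x AS LOCAL LONG
                scope, type_words = type_words[0].upper(), type_words[1:]
            type_name = " ".join(type_words)
        if type_name is None:
            return [stmt]  # no type found: leave it alone
        lines.append(f"DIM {words[0]} AS {scope} {type_name}")
    lines.reverse()
    return lines


if __name__ == "__main__":
    for out in one_per_line("STATIC b1, b2, b3 AS BYTE, iPtr1, iPtr2 AS INTEGER PTR"):
        print(out)
```

Scanning from the right is a design shortcut for the sketch: it lets a group such as `b1, b2, b3 AS BYTE` inherit its type without the repeated passes that the patterns rely on.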

              Code:
              Find: {dim: GLOBAL|INSTANCE|THREADED} {variables}{nl}
                    [{etc~+}]
                    FUNCTION PBMAIN
              Replace: {etc}FUNCTION PBMAIN
                          {dim} {variables}
              Before GLOBAL, INSTANCE, or THREADED are matched with other patterns, this one moves them into the PBMAIN area. Notice {nl}. Unlike ordinary pattern variables with user-given names, like {variables} or {etc}, {nl} is a "keyword" that represents a new line. Actually, nothing is really hard-coded. The {nl} "keyword" is actually defined in the patterns.uc file as {#10} (ASCII character 10 for Line Feed). Two pattern variables ({variables} and {etc} in this case) cannot be adjacent to each other unless the number of tokens is specified. Otherwise they must be separated by a delimiter ({nl} in this case).

              The + in {etc} means to continue parsing until the next token, ignoring statement separator boundaries, in this case line feeds, otherwise it would stop at the end of the line. Line Feed is defined as such a boundary in patterns.uc.

              The ~ in {etc} tells it to ignore nested patterns in that area. Everything in that section will be treated as simple tokens.

              There are usually various ways of accomplishing the same thing in uCalc Transform, and the above is just one way. You can tweak it further or re-do it differently according to your tastes.
              Daniel Corbier
              uCalc Fast Math Parser
              uCalc Language Builder


              • #8
                Changing Left$+Mid$ to StrInsert$

                Do you have any line of source code that looks something like this?

                Code:
                MyString$ = Left$(MyString$, x-1) + "Text to insert" + Mid$(MyString$, x)
                If so, you can optimize it for speed (and legibility) by changing it to:

                Code:
                MyString$ = StrInsert$(MyString$, "Text to insert", x)
                Apparently StrInsert$ was added as a new feature starting with PB/CC 3. It seems like I had already been using the Left$+Mid$ combo before that to accomplish the same thing. Then I realized how much faster StrInsert$ was. But finding such lines in my source code could have been like finding a needle in a haystack. An ordinary keyword search for Left$ or Mid$ in my code returns too many hits, only a few of which match the StrInsert$ pattern. I might have left such a tedious search for another time (i.e. maybe never). However, given this simple pattern, uCalc Transform highlighted them, like in the following example:



                Once you're ready, just click Transform, and voila! Notice how it catches only the Left$+Mid$ combos that truly match the pattern. The others are left alone. Here's the output:

                Code:
                Global b1 , b2 As Byte, xPtr , yPtr As Long Ptr
                
                Function PBMain
                   Local xx As Long, yy As String, zz As Integer
                   Local Total , x , y , z As Extended
                
                   Print "PB compiles lines even if they are missing a closing"
                   Print "quote.  Refactor inserts the missing quotes for you."
                
                   ' Only two lines below match the pattern for optimizing with StrInsert$
                   MyString$ = Left$(MyString$, x-1) + UCase$(abc$+".") + Mid$(MyString$, y)
                   MyString$ = StrInsert$(MyString$, UCase$(abc$+"."), x)
                   MyString$ = Left$(MyString$, x-1) + UCase$(abc$+".") + Mid$(OtherString$, x)
                   MyString$ = StrInsert$(MyString$, UCase$(abc$+"."), Len(q$))
                End Function
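The matching rule (same string variable on the left, in Left$, and in Mid$, with the same position expression in both calls) can be approximated with a regular expression. The Python sketch below is an assumption-laden analogue rather than the uCalc pattern itself; `to_strinsert` is a hypothetical name, and a regex is not a real parser, so unusual nesting may defeat it.

```python
import re

# var$ = Left$(var$, pos-1) + text + Mid$(var$, pos)
# Named backreferences force the same variable and the same position.
_LEFT_MID = re.compile(
    r"(?P<var>\w+\$)\s*=\s*Left\$\(\s*(?P=var)\s*,\s*(?P<pos>.+?)\s*-\s*1\s*\)"
    r"\s*\+\s*(?P<text>.+?)\s*\+\s*"
    r"Mid\$\(\s*(?P=var)\s*,\s*(?P=pos)\s*\)\s*$",
    re.IGNORECASE,
)


def to_strinsert(line: str) -> str:
    """Rewrite a matching Left$+Mid$ line as a StrInsert$ call."""
    def repl(m: re.Match) -> str:
        return f"{m['var']} = StrInsert$({m['var']}, {m['text']}, {m['pos']})"

    # Lines that do not fit the pattern are returned unchanged.
    return _LEFT_MID.sub(repl, line, count=1)


if __name__ == "__main__":
    print(to_strinsert('MyString$ = Left$(MyString$, x-1) + UCase$(abc$+".") + Mid$(MyString$, x)'))
```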
                Daniel Corbier
                uCalc Fast Math Parser
                uCalc Language Builder


                • #9
                  Breaking up multiple statements into separate lines

                  There may have been a time, in a previous century, when it might have been acceptable to cram many statements all onto one line, separating them with a colon (":"). Although it is still allowed, perhaps your code might look a little cleaner if you break each statement into a separate line. Today's pattern simply involves replacing a : with a new line, represented by {nl}.



                  Though this pattern will do what we want, indentation for the newly separated statements will be off. This is also true for other upcoming simple patterns I plan to discuss. So at the bottom of the above transform, I've added another pattern that applies indentation transformations from a file named beautify.uc. Indentation involves multiple steps and details about that are for another day. If you don't care about indentation for now, remove the checkmark from the last pattern. If you plan to use it, be sure to download beautify.uc into the same directory as refactor.uc.

                  Before:
                  Code:
                  Global b1 As Byte, b2 As Byte, xPtr As Long Ptr, yPtr As Long Ptr
                  
                  Function PBMain
                     Local xx As Long, yy As String, zz As Integer
                     Local Total As Extended, x As Extended, y As Extended, z As Extended
                  
                     Print "PB compiles lines even if they are missing a closing
                     Print "quote.  Refactor inserts the missing quotes for you.
                  
                     ' Only two lines below match the pattern for optimizing with StrInsert$
                     MyString$ = Left$(MyString$, x-1) + UCase$(abc$+".") + Mid$(MyString$, y)
                     MyString$ = Left$(MyString$, x-1) + UCase$(abc$+".") + Mid$(MyString$, x)
                     MyString$ = Left$(MyString$, x-1) + UCase$(abc$+".") + Mid$(OtherString$, x)
                     MyString$ = Left$(MyString$, Len(q$)-1) + UCase$(abc$+".") + Mid$(MyString$, Len(q$))
                  
                     Incr xx : z = sin(x+1) * 2 : x = x + 25 : Print "Ok"
                     For n& = 1 To 10 : Print n& : Next
                  End Function
                  After:
                  Code:
                  Global b1, b2 As Byte, xPtr, yPtr As Long Ptr
                  
                  Function PBMain
                     Local xx As Long, yy As String, zz As Integer
                     Local Total, x, y, z As Extended
                  
                     Print "PB compiles lines even if they are missing a closing"
                     Print "quote.  Refactor inserts the missing quotes for you."
                  
                     ' Only two lines below match the pattern for optimizing with StrInsert$
                     MyString$ = Left$(MyString$, x - 1) + UCase$(abc$ + ".") + Mid$(MyString$, y)
                     MyString$ = StrInsert$(MyString$, UCase$(abc$ + "."), x)
                     MyString$ = Left$(MyString$, x - 1) + UCase$(abc$ + ".") + Mid$(OtherString$, x)
                     MyString$ = StrInsert$(MyString$, UCase$(abc$ + "."), Len(q$))
                  
                     Incr xx 
                     z = sin(x + 1) * 2 
                     x = x + 25 
                     Print "Ok"
                     For n& = 1 To 10 
                        Print n& 
                     Next
                  End Function
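As a rough stand-in for the colon pattern, the Python sketch below splits a line at colons that sit outside string literals and comments. It is only an analogy for the uCalc pattern; the function name `split_statements` is hypothetical, every new line simply reuses the original indentation (the re-indenting done by beautify.uc is not reproduced), and a line that yields a single statement, such as a label, is returned untouched.

```python
def split_statements(line: str) -> list[str]:
    """Split colon-separated statements onto separate lines."""
    indent = line[: len(line) - len(line.lstrip())]
    body = line.strip()
    parts, start, in_string = [], 0, False
    for i, ch in enumerate(body):
        if ch == '"':
            in_string = not in_string
        elif in_string:
            continue  # colons inside a literal are plain text
        elif ch == "'":
            break  # the rest of the line is a comment
        elif ch == ":":
            parts.append(body[start:i].strip())
            start = i + 1
    parts.append(body[start:].strip())
    parts = [p for p in parts if p]
    if len(parts) < 2:
        return [line]  # nothing to split (also protects "Label:" lines)
    return [indent + p for p in parts]


if __name__ == "__main__":
    for out in split_statements('   Incr xx : z = sin(x+1) * 2 : x = x + 25 : Print "Ok"'):
        print(out)
```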
                  Daniel Corbier
                  uCalc Fast Math Parser
                  uCalc Language Builder


                  • #10
                    Originally posted by Daniel Corbier
                    ... here are some of the upcoming transforms you can expect:
                    ...
                    * Add As Const or As Long for Select Case where applicable
                    ...
                    As Long, where applicable, always improves (or at least doesn’t adversely affect) both speed and size of the executable. As Const, on the other hand, might significantly increase the size of the executable.
                    Politically incorrect signatures about immigration patriots are forbidden. Googling “immigration patriots” is forbidden. Thinking about Googling ... well, don’t even think about it.



                    • #11
                      Originally posted by Mark Hunter
                      As Long, where applicable, always improves (or at least doesn’t adversely affect) both speed and size of the executable. As Const, on the other hand, might significantly increase the size of the executable.
                      Mark, thanks for the feedback. I haven't posted an implementation for As Const just yet. But your observation is absolutely correct. This refactoring tool is being posted here to explain ways of using uCalc Transform. Once you understand how it works, you can customize the refactoring tool to your specific needs. In general, you can remove the checkmark from patterns you want uCalc Transform to ignore. Or you can click X to delete a specific pattern altogether. Or you can edit it to your liking. It should even be possible to have it post a comment next to As Const with the total amount of bytes it costs, or even have it perform the change only if it takes less than n number of bytes. The possibilities are endless.

                      The transform is being posted at http://www.ucalc.com/misc/pb/refactor.uc so you can download it, experiment with it, and modify it as needed.
                      Daniel Corbier
                      uCalc Fast Math Parser
                      uCalc Language Builder


                      • #12
                        Changing IF statement to IF/END IF block

                        Previously, I added a pattern that changes a colon to a new line. It works well in many cases. However, what if the colons represent multiple statements that belong to the same single line IF statement? Then you have a problem. For instance, if you used the previous pattern alone, then a statement like:

                        Code:
                        If x > 1 And z < x Then Incr xx : z = sin(x+1) * 2 : x = x + 25 : Print "Ok"
                        would incorrectly be converted to:

                        Code:
                        If x > 1 And z < x Then Incr xx 
                        z = sin(x + 1) * 2 
                        x = x + 25 
                        Print "Ok"
                        How would you turn an IF statement to an IF/END IF block, and insert the statements within the IF/END IF block? Simple. Here's the uCalc pattern for this:

                        Code:
                        Find:    If {Condition} Then {Code}:{MoreCode}
                        Replace: If {Condition} Then
                                    {Code}
                                    {MoreCode}
                                 End If
                        Do this and you're all set? Almost. This would give you:

                        Code:
                        If x > 1 And z < x Then
                           Incr xx
                           z = sin(x + 1) * 2 : x = x + 25 : Print "Ok"
                        End If
                        which is almost, but not quite the result you want. Now that we have an IF/END IF block, what about the colon-separated multi-statements within that block? You also want those to be broken into separate lines as per the pattern from the previous message in this thread. There are multiple ways to accomplish this. I can set the "Pass once" property to "False" in that pattern so that it will pass over this same block of text again, at which point it will find the colons within the block and replace them with a new line. A different approach, which is the one I'll use here, is to append a % to the {MoreCode} pattern variable. This tells uCalc not just to find text that matches {MoreCode}, but to also expand that text before inserting it in the Replace section of the pattern. So the nested colons will be changed to new lines. If the IF statement has no colons at all, this pattern intentionally leaves it alone. So our final pattern will change:

                        Code:
                        If z = 0 Or Total = 123 Then Print "Ok"
                        If x > 0 And y > x+1 Then Incr xx : z = sin(x+1) * 2 : x = x + 25 : Print "Ok"
                        to:

                        Code:
                        If z = 0 Or Total = 123 Then Print "Ok"
                        If x > 0 And y > x + 1 Then
                           Incr xx
                           z = sin(x + 1) * 2 
                           x = x + 25 
                           Print "Ok"
                        End If
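The whole rewrite (open the block, expand the nested colons, close with End If) can be sketched in Python as follows. This is a hedged analogue, not the uCalc transform: `if_to_block` and `_split_on_colons` are hypothetical names, operator spacing is left as written, and single-line IF statements with an Else clause or with the word Then inside a string literal are assumed not to occur.

```python
import re

_SINGLE_LINE_IF = re.compile(
    r"^(?P<indent>\s*)If\s+(?P<cond>.+?)\s+Then\s+(?P<code>\S.*)$",
    re.IGNORECASE,
)


def _split_on_colons(code: str) -> list[str]:
    """Split at colons outside string literals and comments."""
    parts, start, in_string = [], 0, False
    for i, ch in enumerate(code):
        if ch == '"':
            in_string = not in_string
        elif not in_string and ch == "'":
            break  # comment: stop looking for separators
        elif not in_string and ch == ":":
            parts.append(code[start:i].strip())
            start = i + 1
    parts.append(code[start:].strip())
    return [p for p in parts if p]


def if_to_block(line: str, step: str = "   ") -> list[str]:
    """Turn a single-line IF with colons into an IF/END IF block."""
    m = _SINGLE_LINE_IF.match(line)
    if m is None:
        return [line]
    statements = _split_on_colons(m["code"])
    if len(statements) < 2:
        return [line]  # no colons: intentionally left alone
    indent = m["indent"]
    block = [f"{indent}If {m['cond']} Then"]
    block += [indent + step + s for s in statements]
    block.append(f"{indent}End If")
    return block


if __name__ == "__main__":
    for out in if_to_block('If x > 0 And y > x+1 Then Incr xx : z = sin(x+1) * 2 : x = x + 25 : Print "Ok"'):
        print(out)
```

Splitting the statements inside the same function plays the part of the % suffix on {MoreCode}: the nested text is expanded before it is placed in the block.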
                        And here's what the transform now looks like:
                        Daniel Corbier
                        uCalc Fast Math Parser
                        uCalc Language Builder