REGEXPR Help

  • REGEXPR Help

    I'm not sure if I'm understanding the documentation incorrectly but I am attempting to capture a specific pattern with regular expressions and it does not seem to work. The documentation regarding the feature I am trying to use is somewhat contradictory and vague. The expression I'm trying to use is as follows:

    Code:
    mask$ = "(IA=[0-9]+ )+"
    The documentation says
    +
    (plus) Specifies that one or more matches of the preceding sub-pattern are allowed. Cannot be used with a Tag.
    If I use the above code though it refuses to match multiple instances of the pattern. The description of + says it works on a sub-pattern which later in the documentation is claimed to be a pattern inside parens but then says it can't be used with a Tag. The difference between the two is not clearly made.

    The string I'm matching on looks like this:
    Code:
    "AZ=1 AZ=5 DE=8 HI=5 HI=9 IA=5 IA=6 IA=11 NY=1 NY=4 WI=3 WI=8 "
    If there is a better way to do this in PowerBASIC I'm open to that, but I'm pretty baffled that I can't use REGEXPR to match the subset "IA=5 IA=6 IA=11 " and have it give me the starting position and length so that I can pull it out of the string using parse. Does PowerBASIC really not allow you to match an arbitrary repetition of a sub-pattern, or am I somehow adding the quantifier incorrectly?

  • #2
    You should not be using parens around the whole thing as that makes it a tag, which invalidates the use of classes.

    So "in theory" what should work is

    mask = "[IA=[0-9]+]\x20]+" ' add \x20 to account for the space

    But if that doesn't work (me and PB REGEXPR go way back, and we still have "relationship issues") you can try getting the "one or more numeric digits" another way...

    mask = "IA=[0-9][0-9]?[0-9]?[0-9]?"
    IA= literal
    [0-9] = exactly one numeric digit
    [0-9]? = zero or one numeric digits

    So the above will find "IA=" plus one to four numeric digits. Of course, so should IA=[0-9]+ .. and you can loop for additional occurrences as shown in REGEXPR and REGREPL demo January 16, 2002

    I used this to get "alpha string followed by [n to m numeric digits]" for a client application, but I did not use it with tags, nor did I try to "get 'em all with one pass."

    MCM
    Last edited by Michael Mattias; 14 May 2009, 08:43 AM.
    Michael Mattias
    Tal Systems Inc. (retired)
    Racine WI USA
    http://www.talsystems.com



    • #3
      Come to think of it, looping ain't so bad, as it will find all the occurrences even if they are not contiguous.
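
      For the original goal (one start position and one length for the whole "IA=" block), the loop can also merge matches that sit back to back. What follows is only a sketch, assuming PB/Win (for MSGBOX) and assuming REGEXPR behaves as it is used elsewhere in this thread; the names lBlockStart and lBlockEnd are made up for illustration:

      ```powerbasic
      #COMPILE EXE
      #DIM ALL

      FUNCTION PBMAIN () AS LONG
          LOCAL sText AS STRING
          LOCAL lIndex, lStart, lLength AS LONG
          LOCAL lBlockStart, lBlockEnd AS LONG    ' hypothetical names; both start at 0

          sText = "AZ=1 AZ=5 DE=8 HI=5 HI=9 IA=5 IA=6 IA=11 NY=1 NY=4 WI=3 WI=8 "
          lIndex = 1
          DO
              REGEXPR "IA=[0-9]+ " IN sText AT lIndex TO lStart, lLength
              IF lStart = 0 THEN EXIT DO          ' no further match
              IF lBlockStart = 0 THEN
                  lBlockStart = lStart            ' first match opens the block
              ELSEIF lStart <> lBlockEnd THEN
                  EXIT DO                         ' gap found, so the block has ended
              END IF
              lBlockEnd = lStart + lLength        ' position just past this match
              lIndex = lBlockEnd                  ' resume searching right after it
          LOOP

          IF lBlockStart > 0 THEN
              ' Start is lBlockStart, length is lBlockEnd - lBlockStart
              MSGBOX MID$(sText, lBlockStart, lBlockEnd - lBlockStart)
          END IF
      END FUNCTION
      ```

      With the sample string this should show the three IA entries as one piece. It stops at the first gap, so a later, separate group of IA= entries would be ignored.
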
      Michael Mattias



      • #4
        Are you trying to get the entire block of the string that says "IA=5 IA=6 IA=11"? If so, and the number of times IA= appears is not consistent, I can't think of a way to do it using a single REGEXPR.

        You are correct that the documentation for REGEXPR is kind of confusing, but I'm pretty sure that you can't put that plus sign after the closing parenthesis. It definitely says that plus cannot be used with tags, and parentheses denote tags. Unfortunately, it also says "one or more matches of the preceding sub-pattern", but I'm pretty sure they meant character class.

        By the way, because your mask ends with a space, if IA=42 appeared at the end of your string with no trailing space, the REGEXPR would never find that particular value. You would need to add a |$ to the end of the mask to match either a space or an end of line character/end of string. You also don't need the parentheses, as you never reference that tag again using \01 later in your mask and you aren't grouping together a multi-character match to use with the OR operator.
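
        A different way around the missing trailing space, swapped in here instead of the |$ alternation, is to make sure the string ends with a space before matching. This is only a sketch, assuming PB/Win (for MSGBOX) and a shortened, made-up test string:

        ```powerbasic
        #COMPILE EXE
        #DIM ALL

        FUNCTION PBMAIN () AS LONG
            LOCAL sText AS STRING
            LOCAL lStart, lLength AS LONG

            sText = "HI=5 HI=9 IA=42"            ' made-up sample: last value has no trailing space

            ' Normalise the string so every entry, including the last, ends with a space
            IF RIGHT$(sText, 1) <> $SPC THEN sText = sText + $SPC

            REGEXPR "IA=[0-9]+ " IN sText TO lStart, lLength
            IF lStart > 0 THEN
                ' RTRIM$ drops the space that the mask matched along with the value
                MSGBOX RTRIM$(MID$(sText, lStart, lLength))
            END IF
        END FUNCTION
        ```

        The mask stays simple this way, at the cost of modifying (a copy of) the input string.
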

        If you use a mask of "IA=[0-9]+ |$", you would need to loop like this to get all the values:

        Code:
        sText = "AZ=1 AZ=5 DE=8 HI=5 HI=9 IA=5 IA=6 IA=11 NY=1 NY=4 WI=3 WI=8 "
        lIndex = 1
        DO
            REGEXPR "IA=[0-9]+ |$" IN sText AT lIndex TO lStart, lLength
            IF lStart > 0 THEN
                ' MID$(sText, lStart, lLength) will contain the first IA=n after lIndex
                MSGBOX MID$(sText, lStart, lLength)
                lIndex = lStart + lLength
            END IF
        LOOP UNTIL lStart = 0
        There are a number of ways to get the data you want without using REGEXPR as well. You could do a loop using PARSE$ like so:

        Code:
        sText = "AZ=1 AZ=5 DE=8 HI=5 HI=9 IA=5 IA=6 IA=11 NY=1 NY=4 WI=3 WI=8 "
        FOR lIndex = 1 TO PARSECOUNT(sText, $SPC)
            sTemp = PARSE$(sText, $SPC, lIndex)   ' pass $SPC here too; PARSE$ defaults to a comma delimiter
            IF LEFT$(sTemp, 3) = "IA=" THEN
                ' VAL(MID$(sTemp, 4)) is the value
                MSGBOX sTemp
            END IF
        NEXT lIndex
        If you know all the IAs will be grouped together, as in your example, you could use INSTR to find the start and end of the block like so:

        Code:
        sText = "AZ=1 AZ=5 DE=8 HI=5 HI=9 IA=5 IA=6 IA=11 NY=1 NY=4 WI=3 WI=8 "
        lStart = INSTR(sText, "IA=")
        lTemp = INSTR(-1, sText, "IA=")
        lEnd = INSTR(lTemp, sText, $SPC)
        IF lEnd = 0 THEN
            lLength = LEN(sText) - lStart + 1
        ELSE
            lLength = lEnd - lStart
        END IF
        MSGBOX MID$(sText, lStart, lLength)
        I hope all these samples actually work and that it is of some help.
        Jeff Blakeney

