
Still need help on String Manipulation property


  • Still need help on String Manipulation property

    I am still working on a solid, foolproof two-delimiter SPLIT command. I am sure someone has made one before. Let me describe in detail what I'm trying to do.
    Let's assume that:
    <A> is the first delimiter
    </a> is the second delimiter

    Input String
    ---------------------
    <A><A>hi how are you</a><hello><I am fine><A>really?</a><A>yes really!!!!<ggg></a>

    First Array:
    ---------------------
    <A>hi how are you</a>


    Second Array:
    ---------------------
    really?


    Third Array:
    ---------------------
    yes really!!!!<ggg>


    What I want to do is return the text between two delimiters in a string. The catch is that a delimiter can appear inside a delimited field, like this:

    <A><A>hi</a>

    <A> is one delimiter
    </a> is the other delimiter

    I would want to get the string:

    <A>hi

    A normal split command would return this instead:

    hi

    which is no good. I hope someone can help me. Thanks.

    -------------
    -Greg



    [This message has been edited by Gregery D Engle (edited June 22, 2000).]
    MCP,MCSA,MCSE,MCSD

  • #2
    This is off the top of my head.
    Assume St is the TOTAL String of the entire length.
    Code:
    for x = 1 to len(St)
       StArray(x) = Parse$(St,"</a>",x)
       StArray(x) = Remove$(StArray(x),"<a>") 'in case that 2nd tag shows up
    Next

    That's my basic thought. You will have to expand on it, but it should work. That's how I handle the "|" sign in my decryption process, and it works flawlessly...


    Scott


    ------------------
    Scott Turchin
    MCSE, MCP+I
    http://www.tngbbs.com
    ----------------------
    True Karate-do is this: that in daily life, one's mind and body be trained and developed in a spirit of humility; and that in critical times, one be devoted utterly to the cause of justice. -Gichin Funakoshi



    • #3
      Originally posted by Scott Turchin:
      This is off the top of my head.
      Assume St is the TOTAL String of the entire length.
      Code:
      for x = 1 to len(St)
         StArray(x) = Parse$(St,"</a>",x)
         StArray(x) = Remove$(StArray(x),"<a>") 'in case that 2nd tag shows up
      Next

      That's my basic thought. You will have to expand on it, but it should work. That's how I handle the "|" sign in my decryption process, and it works flawlessly...

      Scott


      I think that would work if it were one delimiter, but I don't think it will with two. I will try it, but I have tried almost everything, and every time I think I've got it, it messes up.

      ------------------
      -Greg


      • #4
        Greg --

        > I think that would work if it was one
        > delimiter but I don't think so with two.

        Use the REPLACE statement to replace delimiter string #1 with CHR$(0), then do the same thing with delimiter #2. Then you're only dealing with one delimiter.

        Or, if you need to distinguish between the two, use CHR$(0) for one and CHR$(1) for the other, then use PARSE$(...ANY CHR$(0,1))
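For readers who want to try the sentinel idea outside PowerBASIC, here is a minimal Python sketch of both variants. It is not PowerBASIC code, the function names are made up for illustration, and it assumes the characters CHR$(0) and CHR$(1) never occur in the input.

```python
import re


def split_on_either(text, open_tag="<A>", close_tag="</A>"):
    """Collapse both delimiters into one sentinel, then split on it."""
    sentinel = "\x00"  # assumption: NUL never appears in the input
    collapsed = text.replace(close_tag, sentinel).replace(open_tag, sentinel)
    # Adjacent delimiters produce empty fields; drop them.
    return [field for field in collapsed.split(sentinel) if field]


def split_keeping_kind(text, open_tag="<A>", close_tag="</A>"):
    """Use two sentinels so each boundary remembers which delimiter it was."""
    marked = text.replace(close_tag, "\x01").replace(open_tag, "\x00")
    # The capture group makes re.split keep the sentinel characters.
    return [piece for piece in re.split("([\x00\x01])", marked) if piece]
```

Note that the single-sentinel version treats every delimiter as a field boundary, so a nested opener such as the second `<A>` in `<A><A>hi</A>` is dropped rather than returned as part of the field.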

        -- Eric


        ------------------
        Perfect Sync: Perfect Sync Development Tools



        [This message has been edited by Eric Pearson (edited June 22, 2000).]
        "Not my circus, not my monkeys."



        • #5
          Originally posted by Eric Pearson:
          Greg --

          > I think that would work if it was one
          > delimiter but I don't think so with two.

          Use the REPLACE statement to replace delimiter string #1 with CHR$(0), then do the same thing with delimiter #2. Then you're only dealing with one delimiter.

          Or, if you need to distinguish between the two, use CHR$(0) for one and CHR$(1) for the other, then use PARSE$(...ANY CHR$(0,1))

          -- Eric

          That would work 99% of the time but what about a string like this?

          <A><A>hi how are you<A> I am fine</a>

          The first delimiter is <A>
          The second delimiter is </a>

          I would want the string:

          <A>hi how are you<A> I am fine


          ------------------
          -Greg


          • #6
            Something like this
            Code:
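            k2 = 1                                    ' position to search from
            Do
               If k1 > Len(Txt$) Then Exit Do         ' ran past the end of the text
               ' find the next opening tag; stop when there is none
               k1 = Instr(k2, Txt$, "<A>"): If k1 = 0 Then El(...) = Mid$(Txt$, k1): Exit Do
               ' find the closing tag; if it is missing, take the rest of the text
               k2 = Instr(k1 + 3, Txt$, "</A>"): If k2 = 0 Then k2 = Len(Txt$) + 1
               El(...) = Mid$(Txt$, k1 + 3, k2 - k1 - 3)   ' text between the two tags
               k1 = k2 + 4                            ' step past "</A>"
            Loop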
            [This message has been edited by Semen Matusovski (edited June 22, 2000).]



            • #7
              It seems to me that Scott's solution is right. The <A> isn't really a "delimiter", right? It can be part of a returned string.

              Your first example isn't very clear... the first string starts with <A> and </A> but the second and third don't. Is that right?

              Maybe you could post a series of sample strings and the results you would like to see, so we can understand the logic better.

              -- Eric


              ------------------
              [This message has been edited by Eric Pearson (edited June 22, 2000).]


              • #8
                My example only removes <a> from StArray(x), NOT from St. The cleanup is applied to the final formatted string, and the original is left intact for parsing on the next pass through the loop.

                And don't forget to allow for upper and lower case with UCASE$, or just do it like this:

                StArray(x) = Remove$(UCASE$(St),"<A>") to make sure you don't overlook the obvious...


                Scott


                ------------------
                Scott


                • #9
                  Originally posted by Eric Pearson:


                  Maybe you could post a series of sample strings and the results you would like to see, so we can understand the logic better.

                  -- Eric

                  I apologize for the confusion. Here are some sample strings and the results that I am trying to achieve:

                  string1$="hello how are you<A>today?</A>I am just fine</A>"
                  string2$="<A>hi how are you?<A> I am just fine</A>"

                  delimiter1 = <A>
                  delimiter2 = </A>

                  STRING1$ response:

                  "today?</A>I am just fine"

                  STRING2$ response:

                  "hi how are you?<A> I am just fine"


                  Notice that a delimiter within a delimiter is being ignored, kind of like a double-quoted string such as this:

                  ""hi how are you""

                  any suggestions?


                  ------------------
                  -Greg


                  • #10
                    Originally posted by Semen Matusovski:
                    Something like this
                    Code:
                    k2 = 1
                    Do
                       If k1 > Len(Txt$) Then Exit Do
                       k1 = Instr(k2, Txt$, "<A>"): If k1 = 0 Then El(...) = Mid$(Txt$, k1): Exit Do
                       k2 = Instr(k1 + 3, Txt$, "</A>"): If k2 = 0 Then k2 = Len(Txt$) + 1
                       El(...) = Mid$(Txt$, k1 + 3, k2 - k1 - 3)
                       k1 = k2 + 4
                    Loop

                    Your code works pretty well, but it fails on this string:

                    "<A><A>HELLO how <A>are </A>you today</A>"

                    I am wanting the response:

                    <A>HELLO how <A>are </A>you today

                    Also, I want to be able to parse a whole document in this format, so there could be an unlimited number of fields. What I actually want is a function that splits all the delimited fields into an array, or into a string that I can split later. Any ideas? I've been messing with this for a long time now and I just can't seem to get it working right.

                    ------------------
                    -Greg

                    [This message has been edited by Gregery D Engle (edited June 22, 2000).]


                    • #11
                      Not that I understand your goal, but to get that result, you have
                      to let delimiter 1 parse from the start, and delimiter 2 from the
                      end of the string. That will do it. Use INSTR(-1.. for backwards
                      search. First parse will give the result you want. Example
                      Code:
                        LOCAL MainString AS STRING, Delim1 AS STRING, Delim2 AS STRING, _
                              Pos1 AS LONG, Pos2 AS LONG
                                          
                        MainString = "<A><A>HELLO how <A>are </A>you today</A>"
                        Delim1 = "<A>" : Delim2 = "</A>"
                                    
                        Pos1 = INSTR(1, MainString, Delim1) + LEN(Delim1)
                        Pos2 = INSTR(-1, MainString, Delim2)
                       
                        DO WHILE Pos1 > 0 AND Pos2 > 0
                           MainString = MID$(MainString, Pos1, Pos2 - Pos1)
                           MSGBOX MainString 
                           Pos1 = INSTR(1, MainString, Delim1) + LEN(Delim1)
                           Pos2 = INSTR(-1, MainString, Delim2)
                        LOOP
                      As for the array, I don't understand either, but if you
                      put it inside the loop above, you can catch the shrinking
                      MainString in each iteration.
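The same "first opener from the start, last closer from the end" idea can be sketched in Python, for anyone following along without PowerBASIC. This is only an illustration: `peel_layers` is a made-up name, and `str.find` / `str.rfind` stand in for INSTR(1, ...) and INSTR(-1, ...). It collects each shrinking string in a list instead of showing a message box.

```python
def peel_layers(text, open_tag="<A>", close_tag="</A>"):
    """Return successively narrower spans between the first opener and the last closer."""
    layers = []
    while True:
        start = text.find(open_tag)    # first opener, searching forward
        end = text.rfind(close_tag)    # last closer, searching backward
        # Stop when either tag is missing or the closer sits before the opener ends.
        if start == -1 or end == -1 or end < start + len(open_tag):
            break
        text = text[start + len(open_tag):end]
        layers.append(text)
    return layers


layers = peel_layers("<A><A>HELLO how <A>are </A>you today</A>")
# layers[0] is the outermost field; each later entry is one level further in.
```

Because it always pairs the first opener with the last closer, this approach treats the whole input as one nested field and does not separate fields that sit side by side.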


                      ------------------



                      • #12
                        REGEXPR should work to extract HTML or XML tags and tag values.

                        I am writing this on-line and untested, but something like...

                        Code:
                        StartTag = "<a>"   ' <-- you can build the tags dynamically
                        EndTag = "</a>"    ' <-- with string concatenation

                        SearchFor = StartTag & ".+" & EndTag   ' "." stands for any character

                        REGEXPR SearchFor IN TextToSearch AT StartPos TO PosVar, LenVar
                        IF PosVar THEN
                           ' the value starts right after the start tag
                           TagValue = MID$(TextToSearch, PosVar + LEN(StartTag), _
                                           LenVar - LEN(StartTag) - LEN(EndTag))
                        END IF

                        Give it a shot and see what happens.
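As a rough cross-check of the regular-expression approach, here is a small Python sketch using Python's `re` module, which is not the same engine as PowerBASIC's REGEXPR and may match differently. The function name `tag_values` is hypothetical. It uses a non-greedy match, so each field ends at the first closing tag after its opener, and it ignores case because the thread mixes `<A>` with `</a>`.

```python
import re


def tag_values(text, start_tag="<a>", end_tag="</a>"):
    """Return the text between each start tag and the next end tag after it."""
    pattern = re.compile(
        re.escape(start_tag) + "(.+?)" + re.escape(end_tag),
        re.IGNORECASE | re.DOTALL,  # ignore tag case; let "." span line breaks
    )
    return pattern.findall(text)
```

With this rule a nested opener stays inside the returned field, which matches the `<A><A>hi</a>` example from the first post, but an inner closing tag always ends the field.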

                        I posted some code using REGEXPR in the source code forum a couple of months ago which may help out a bit.

                        MCM



                        [This message has been edited by Michael Mattias (edited June 23, 2000).]
                        Michael Mattias
                        Tal Systems Inc. (retired)
                        Racine WI USA
                        http://www.talsystems.com


                        • #13
                          Originally posted by Borje Hagsten:
                          Not that I understand your goal, but to get that result, you have
                          to let delimiter 1 parse from the start, and delimiter 2 from the
                          end of the string. That will do it. Use INSTR(-1.. for backwards
                          search. First parse will give the result you want. Example

                          Your code was probably the best I have seen, but it failed on this string:

                          "<A><A>HELLO how <A>are </A>you today</A>I am fine!!!<A>hi</A>"

                          It won't return the final field: hi

                          I think I can modify it to handle that...



                          ------------------
                          -Greg


                          • #14
                            Maybe I'm missing something, but I cannot determine your "rules" for identifying which tags form a pair...

                            What is the purpose of this code Greg?



                            ------------------
                            Lance
                            PowerBASIC Support


                            • #15
                              Greg --
                              If you are trying to write an HTML parser, http://www.freecode.com/cgi-bin/viewproduct.pl?3702
                              could be useful (it is written in Perl).

                              ------------------



                              • #16
                                Lance:

                                I'm looking for a solid XML/HTML parser for an internal project I'm working on. What I want to do is split all the tags it finds into an array, let someone dynamically read and change the content, and then save it later (reassembling the document from the array). That's why I'm looking for this. It's true that a valid HTML/XML document wouldn't have out-of-place delimiters, but I want to make sure it works perfectly.


                                Semen:

                                Thanks, I'll look into it.


                                ------------------
                                -Greg


                                • #17
                                  I think you need to parse out everything into an array and then
                                  keep separate track of tags, so you can let the user edit the
                                  text between the tags freely. The example line would look like:

                                  "<A><A>HELLO how <A>are </A>you today</A>I am fine!!!<A>hi</A>"

                                  Parsed to:

                                  Arr(0) = "<A>"
                                  Arr(1) = "<A>"
                                  Arr(2) = "HELLO how "
                                  Arr(3) = "<A>"
                                  Arr(4) = "are "
                                  Arr(5) = "</A>"
                                  Arr(6) = "you today"
                                  Arr(7) = "</A>"
                                  Arr(8) = "I am fine!!!"
                                  Arr(9) = "<A>"
                                  Arr(10) = "hi"
                                  Arr(11) = "</A>"

                                  Okay, so how do we get there? Try this:
                                  Code:
                                    LOCAL MainString AS STRING, Result AS STRING, Delim1 AS STRING, Delim2 AS STRING, _
                                          Arr() AS STRING, Pos1 AS LONG, Pos2 AS LONG, I AS LONG, J AS LONG

                                    MainString = "<A><A>HELLO how <A>are </A>you today</A>I am fine!!!<A>hi</A>"
                                    I = 0 : REDIM Arr(0)
                                    Delim1 = "<" : Delim2 = ">"

                                    Pos1 = INSTR(MainString, Delim1)       'get first delim1 "<"
                                    Pos2 = INSTR(MainString, Delim2) + 1   'position just past its match ">"

                                    DO WHILE Pos1 > 0 AND Pos2 > 0
                                       Arr(I) = MID$(MainString, Pos1, Pos2 - Pos1)      'put tag in array
                                       INCR I : REDIM PRESERVE Arr(I)                    'redim array
                                       Pos1 = INSTR(Pos2, MainString, Delim1)            'get next delim1
                                       IF Pos1 > Pos2 THEN                               'if text lies between ">" and "<"
                                          Arr(I) = MID$(MainString, Pos2, Pos1 - Pos2)   'place text into array
                                          INCR I : REDIM PRESERVE Arr(I)                 'redim array
                                       END IF
                                       Pos2 = INSTR(Pos1, MainString, Delim2) + 1        'get delim2 pos
                                    LOOP

                                    Arr(2) = "GOODBYE how "                'change text

                                    Result = ""
                                    FOR J = 0 TO I - 1
                                       Result = Result + Arr(J)            'build new string
                                    NEXT

                                    MSGBOX MainString & CHR$(10) & Result  'show result

                                  This code parses the string and places every item into an element.
                                  I already knew what element 2 looked like, so I changed it just
                                  to show you that this allows the user to edit the contents and
                                  then the program can build up a new string between the same tags.

                                  A tag should always start with "<" and end with ">", so in order
                                  to keep track of them, you can use:
                                  Code:
                                    IF LEFT$(Arr(J), 1) <> "<" OR RIGHT$(Arr(J), 1) <> ">" THEN
                                       'not a complete tag, so it is text = the element can be edited
                                    END IF
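To experiment with the split, edit, rebuild cycle described above without PowerBASIC, here is a minimal Python sketch. The names `tokenize`, `is_tag` and `rebuild` are invented for illustration, and the regular expression is one possible way to get the same flat list of tags and text. It also keeps a stray "<" as text, so joining the tokens always reproduces the input.

```python
import re

# A tag is "<...>" with no nested angle brackets; anything else is text.
# The final "<" alternative keeps an unmatched "<" so nothing is lost.
TOKEN = re.compile(r"<[^<>]*>|[^<]+|<")


def tokenize(markup):
    """Split markup into a flat list of tags and text runs, in order."""
    return TOKEN.findall(markup)


def is_tag(token):
    """A token is a tag only if it both starts with "<" and ends with ">"."""
    return len(token) >= 2 and token.startswith("<") and token.endswith(">")


def rebuild(tokens):
    """Concatenate the (possibly edited) tokens back into one string."""
    return "".join(tokens)


tokens = tokenize("<A><A>HELLO how <A>are </A>you today</A>I am fine!!!<A>hi</A>")
tokens[2] = "GOODBYE how "   # edit a text element, leave the tags alone
result = rebuild(tokens)
```

Keeping tags and text in one ordered list means the edited document can be rebuilt by simple concatenation, and `is_tag` tells the caller which elements are safe to offer for editing.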

                                  ------------------
