Still need help on String Manipulation property


  • Borje Hagsten
    replied
    I think you need to parse out everything into an array and then
    keep separate track of tags, so you can let the user edit the
    text between the tags freely. The example line would look like:

    "<A><A>HELLO how <A>are </A>you today</A>I am fine!!!<A>hi</A>"

    Parsed to:

    Arr(0) = "<A>"
    Arr(1) = "<A>"
    Arr(2) = "HELLO how "
    Arr(3) = "<A>"
    Arr(4) = "are "
    Arr(5) = "</A>"
    Arr(6) = "you today"
    Arr(7) = "</A>"
    Arr(8) = "I am fine!!!"
    Arr(9) = "<A>"
    Arr(10) = "hi"
    Arr(11) = "</A>"

    Okay, so how do we get there? Try this:
    [CODE]
    LOCAL MainString AS STRING, Result AS STRING, Delim1 AS STRING, Delim2 AS STRING, _
          Arr() AS STRING, Pos1 AS LONG, Pos2 AS LONG, I AS LONG, J AS LONG

    MainString = "<A><A>HELLO how <A>are </A>you today</A>I am fine!!!<A>hi</A>"
    I = 0 : REDIM Arr(0)
    Delim1 = "<" : Delim2 = ">"

    Pos1 = INSTR(MainString, Delim1)     'get first delim1 "<"
    Pos2 = INSTR(MainString, Delim2) + 1 'get match ">", point just past it

    DO WHILE Pos1 > 0 AND Pos2 > 0
       Arr(I) = MID$(MainString, Pos1, Pos2 - Pos1)    'put tag in array
       INCR I : REDIM PRESERVE Arr(I)                  'redim array
       Pos1 = INSTR(Pos2, MainString, Delim1)          'get next delim1
       IF Pos1 > Pos2 THEN                             'if any text follows ">", even one character
          Arr(I) = MID$(MainString, Pos2, Pos1 - Pos2) 'place text into array
          INCR I : REDIM PRESERVE Arr(I)               'redim array
       END IF
       Pos2 = INSTR(Pos1, MainString, Delim2) + 1      'get delim2 pos
    LOOP

    Arr(2) = "GOODBYE how " 'change text

    Result = ""
    FOR J = 0 TO I - 1
       Result = Result + Arr(J) 'build new string
    NEXT

    MSGBOX MainString & CHR$(10) & Result 'show result
    [/CODE]

    This code parses the string and places every item into an element.
    I already knew what element 2 looked like, so I changed it just
    to show you that this allows the user to edit the contents and
    then the program can build up a new string between the same tags.

    A tag should always start with "<" and end with ">", so in order
    to keep track of them, you can use:
    Code:
      IF LEFT$(Arr(J), 1) <> "<" OR RIGHT$(Arr(J), 1) <> ">" THEN
         'not a complete tag, so the element is text and can be edited
      END IF
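
For anyone who wants to try the parse/edit/rebuild idea outside PowerBASIC, here is a minimal sketch in Python. The names `split_tags` and `is_tag` are made up for this illustration and are not part of the code above. Unlike the loop above, this version also keeps any text before the first tag or after the last one.

```python
def split_tags(s):
    """Split s into a list of tags ("<...>") and the text runs between them."""
    parts = []
    pos = 0
    while pos < len(s):
        lt = s.find("<", pos)
        gt = s.find(">", lt + 1) if lt != -1 else -1
        if lt == -1 or gt == -1:
            parts.append(s[pos:])      # no complete tag left: the rest is text
            break
        if lt > pos:
            parts.append(s[pos:lt])    # text before the tag
        parts.append(s[lt:gt + 1])     # the tag itself, brackets included
        pos = gt + 1
    return parts


def is_tag(part):
    """A part counts as a tag only if it starts with "<" and ends with ">"."""
    return part.startswith("<") and part.endswith(">")


line = "<A><A>HELLO how <A>are </A>you today</A>I am fine!!!<A>hi</A>"
parts = split_tags(line)
parts[2] = "GOODBYE how "              # edit one text element
rebuilt = "".join(parts)               # tags are untouched, text is replaced
```

Only the elements for which `is_tag` is false would be offered to the user for editing.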

    ------------------



  • Gregery D Engle
    replied
    Lance:

    I'm looking for a solid XML/HTML parser for an internal project I'm working on. What I want to do is split all the tags it finds into an array, let someone dynamically read and change the content, and then save it later (by reassembling the array). That's why I'm looking for this. It is true that a valid HTML/XML document wouldn't have out-of-place delimiters, but I want to make sure it works perfectly.


    Semen:

    Thanks, I'll look into it.


    ------------------
    -Greg



  • Semen Matusovski
    replied
    Greg --
    If you are trying to write an HTML parser, http://www.freecode.com/cgi-bin/viewproduct.pl?3702
    (Perl) could be useful.

    ------------------



  • Lance Edmonds
    replied
    Maybe I'm missing something, but I cannot determine your "rules" for identifying which tags form a pair...

    What is the purpose of this code Greg?



    ------------------
    Lance
    PowerBASIC Support



  • Gregery D Engle
    replied
    Originally posted by Borje Hagsten:
    Not that I understand your goal, but to get that result, you have
    to let delimiter 1 parse from the start, and delimiter 2 from the
    end of the string. That will do it. Use INSTR(-1.. for backwards
    search. First parse will give the result you want. Example

    Your code was probably the best I have seen, but it failed on this string:

    "<A><A>HELLO how <A>are </A>you today</A>I am fine!!!<A>hi</A>"

    It won't return the last field: hi

    I'm sure I can modify it, I think...



    ------------------
    -Greg



  • Michael Mattias
    replied
    REGEXPR should work to extract HTML or XML tags and tag values.

    I am writing this on-line and untested, but something like...

    Code:
      StartTag = "<a>"     ' <-- you can build the tags dynamically
      EndTag   = "</a>"    ' <-- with string concatenation

      SearchFor = StartTag & ".+" & EndTag   ' "." matches any character except end-of-line

      REGEXPR SearchFor IN TextToSearch AT StartPos TO PosVar, LenVar
      IF PosVar THEN
         ' the value starts right after the start tag
         TagValue = MID$(TextToSearch, PosVar + LEN(StartTag), _
                         LenVar - LEN(StartTag) - LEN(EndTag))
      END IF

    Give it a shot and see what happens.

    I posted some code using REGEXPR in the source code forum a couple of months ago which may help out a bit.
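
To show the same idea in a form that can be run as-is, here is a rough sketch using Python's `re` module rather than PowerBASIC's REGEXPR, so the pattern syntax is not interchangeable. The function name `tag_value` is made up for this example. It uses a non-greedy `(.+?)` so the match stops at the first end tag, and it ignores case.

```python
import re


def tag_value(text, start_tag, end_tag, start_pos=0):
    """Return the text between the first start_tag and the next end_tag, or None."""
    pattern = re.compile(
        re.escape(start_tag) + "(.+?)" + re.escape(end_tag),
        re.IGNORECASE | re.DOTALL,
    )
    match = pattern.search(text, start_pos)
    return match.group(1) if match else None


value = tag_value("I am fine!!!<A>hi</A>", "<a>", "</a>")
nested = tag_value("<A><A>hi</a>", "<A>", "</a>")
```

Because the search takes the leftmost start tag, a second `<A>` before the end tag stays in the returned value.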

    MCM



    [This message has been edited by Michael Mattias (edited June 23, 2000).]



  • Borje Hagsten
    replied
    Not that I understand your goal, but to get that result, you have
    to let delimiter 1 parse from the start, and delimiter 2 from the
    end of the string. That will do it. Use INSTR(-1.. for backwards
    search. First parse will give the result you want. Example
    Code:
      LOCAL MainString AS STRING, Delim1 AS STRING, Delim2 AS STRING, _
            Pos1 AS LONG, Pos2 AS LONG
                        
      MainString = "<A><A>HELLO how <A>are </A>you today</A>"
      Delim1 = "<A>" : Delim2 = "</A>"
                  
      Pos1 = INSTR(1, MainString, Delim1)    'first opening delimiter, 0 if none
      Pos2 = INSTR(-1, MainString, Delim2)   'last closing delimiter, 0 if none
     
      DO WHILE Pos1 > 0 AND Pos2 > Pos1
         Pos1 = Pos1 + LEN(Delim1)           'skip past the opening delimiter
         MainString = MID$(MainString, Pos1, Pos2 - Pos1)
         MSGBOX MainString 
         Pos1 = INSTR(1, MainString, Delim1)
         Pos2 = INSTR(-1, MainString, Delim2)
      LOOP
    As for the array, I don't understand either, but if you
    put it inside the loop above, you can catch the shrinking
    MainString in each iteration.
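
The forward/backward search can be sketched in Python as well, with `find` standing in for INSTR(1, ...) and `rfind` for INSTR(-1, ...). This is only an illustration; the generator name `peel` is made up, and each yielded string is the shrinking text that could be stored in an array.

```python
def peel(s, open_delim="<A>", close_delim="</A>"):
    """Yield the text between the first open_delim and the last close_delim,
    then repeat on that text until no enclosing pair is left."""
    while True:
        start = s.find(open_delim)      # search forward from the start
        end = s.rfind(close_delim)      # search backward from the end
        if start == -1 or end < start + len(open_delim):
            return                      # no opening/closing pair remains
        s = s[start + len(open_delim):end]
        yield s


layers = list(peel("<A><A>HELLO how <A>are </A>you today</A>"))
```

The first element of `layers` is the outermost result; later elements are what remains after peeling further pairs.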


    ------------------



  • Gregery D Engle
    replied
    Originally posted by Semen Matusovski:
    Something like this
    k2 = 1
    Do
    If k1 > Len(Txt$) Then Exit Do
    k1 = Instr(k2, Txt$, "<A>"): If k1 = 0 Then El(...) = Mid$(Txt$, k1): Exit Do
    k2 = Instr(k1 + 3, Txt$, "</A>"): If k2 = 0 Then k2 = Len(Txt) + 1
    El(...) = Mid$(Txt$, k1 + 3, k2 - k1 - 3)
    k1 = k2 + 4
    Loop

    Your code works pretty well, but it fails on this string:

    "<A><A>HELLO how <A>are </A>you today</A>"

    I am wanting the response:

    <A>HELLO how <A>are </A>you today

    Also, I want to be able to parse a document in a similar format, so there could be an unlimited number of fields. What I actually want is a function that splits all the delimited fields into an array, or into a string that I can split later. Any ideas? I've been messing with this for a long time now and I just can't seem to get it working right.

    ------------------
    -Greg

    [This message has been edited by Gregery D Engle (edited June 22, 2000).]



  • Gregery D Engle
    replied
    Originally posted by Eric Pearson:


    Maybe you could post a series of sample strings and the results you would like to see, so we can understand the logic better.

    -- Eric

    I apologize for the confusion. Here are some sample strings and the results that I am trying to achieve:

    string1$="hello how are you<A>today?</A>I am just fine</A>"
    string2$="<A>hi how are you?<A> I am just fine</A>"

    delimiter1 = <A>
    delimiter2 = </A>

    STRING1$ response:

    "today?</A>I am just fine"

    STRING2$ response:

    "hi how are you?<A> I am just fine"


    Notice that a delimiter within a delimiter is being ignored, kind of like a double-quoted string such as this:

    ""hi how are you""

    Any suggestions?


    ------------------
    -Greg



  • Scott Turchin
    replied
    My example only removes <a> from StArray(x), NOT from St. StArray(x) is the final formatted string, so remove whatever you need to there; the original stays intact for parsing on the next pass through the loop.

    And don't forget to check case with UCASE$, or just do it like this:

    StArray(x) = Remove$(UCASE$(St),"<A>") to make sure you don't overlook the obvious...


    Scott


    ------------------
    Scott
    MCSE, MCP+Internet



  • Eric Pearson
    replied
    It seems to me that Scott's solution is right. The <A> isn't really a "delimiter", right? It can be part of a returned string.

    Your first example isn't very clear... the first string starts with <A> and </A> but the second and third don't. Is that right?

    Maybe you could post a series of sample strings and the results you would like to see, so we can understand the logic better.

    -- Eric


    ------------------
    Perfect Sync: Perfect Sync Development Tools



    [This message has been edited by Eric Pearson (edited June 22, 2000).]



  • Semen Matusovski
    replied
    Something like this
    Code:
    k2 = 1                                  ' k2 = position to resume scanning from
    Do    
       If k2 > Len(Txt$) Then Exit Do
       k1 = Instr(k2, Txt$, "<A>"): If k1 = 0 Then El(...) = Mid$(Txt$, k2): Exit Do
       k2 = Instr(k1 + 3, Txt$, "</A>"): If k2 = 0 Then k2 = Len(Txt$) + 1
       El(...) = Mid$(Txt$, k1 + 3, k2 - k1 - 3)
       k2 = k2 + 4                          ' step past "</A>"
    Loop
    [This message has been edited by Semen Matusovski (edited June 22, 2000).]



  • Gregery D Engle
    replied
    Originally posted by Eric Pearson:
    Greg --

    > I think that would work if it was one
    > delimiter but I don't think so with two.

    Use the REPLACE function to replace delimiter string #1 with CHR$(0), then do the same thing with delimiter #2. Then you're only dealing with one delimiter.

    Or, if you need to distinguish between the two, use CHR$(0) for one and CHR$(1) for the other, then use PARSE$(...ANY CHR$(0,1))

    -- Eric

    That would work 99% of the time, but what about a string like this?

    <A><A>hi how are you<A> I am fine</a>

    The first delimiter is <A>
    The second delimiter is </a>

    I would want the string:

    <A>hi how are you<A> I am fine


    ------------------
    -Greg



  • Eric Pearson
    replied
    Greg --

    > I think that would work if it was one
    > delimiter but I don't think so with two.

    Use the REPLACE function to replace delimiter string #1 with CHR$(0), then do the same thing with delimiter #2. Then you're only dealing with one delimiter.

    Or, if you need to distinguish between the two, use CHR$(0) for one and CHR$(1) for the other, then use PARSE$(...ANY CHR$(0,1))
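
As a rough illustration of the sentinel idea, here is a Python sketch rather than PowerBASIC. The characters "\x00" and "\x01" stand in for CHR$(0) and CHR$(1), `re.split` on a character class stands in for PARSE$ with ANY, and the function name `split_on_both` is made up. It assumes the sentinel characters never occur in the text itself.

```python
import re

OPEN, CLOSE = "\x00", "\x01"   # stand-ins for CHR$(0) and CHR$(1)


def split_on_both(s, delim1="<A>", delim2="</A>"):
    """Replace each delimiter with a one-character sentinel, then split on either."""
    marked = s.replace(delim1, OPEN).replace(delim2, CLOSE)
    return re.split("[" + OPEN + CLOSE + "]", marked)


fields = split_on_both("<A>hi how are you?</A><A>I am just fine</A>")
non_empty = [f for f in fields if f]   # drop the empty fields between adjacent delimiters
```

Every delimiter becomes a split point, so empty fields show up wherever two delimiters touch or the string starts or ends with one.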

    -- Eric


    ------------------
    Perfect Sync: Perfect Sync Development Tools



    [This message has been edited by Eric Pearson (edited June 22, 2000).]



  • Gregery D Engle
    replied
    Originally posted by Scott Turchin:
    Code:
    This is off the top of my head
    Assume St is the TOTAL String of the entire length.

    for x = 1 to len(St)
       StArray(x) = Parse$(St,"</a>",x)
       StArray(x) = Remove$(StArray(x),"<a>") 'in case that 2nd tag shows up
    Next


    That's my basic thought. You will have to expand on that, but it should work. That's how I do it with the "|" sign in my decryption process, and it works flawlessly...


    Scott


    I think that would work if it was one delimiter, but I don't think so with two. I will try, but I have tried almost everything, and every time I think I've got it, it messes up.

    ------------------
    -Greg



  • Scott Turchin
    replied
    Code:
    ' This is off the top of my head.
    ' Assume St is the TOTAL string of the entire length.
    
    FOR x = 1 TO PARSECOUNT(St, "</a>")        'one pass per "</a>"-delimited field
       StArray(x) = PARSE$(St, "</a>", x)
       StArray(x) = REMOVE$(StArray(x), "<a>") 'in case that 2nd tag shows up
    NEXT

    That's my basic thought. You will have to expand on that, but it should work. That's how I do it with the "|" sign in my decryption process, and it works flawlessly...


    Scott


    ------------------
    Scott
    MCSE, MCP+Internet



  • Still need help on String Manipulation property

    I am still working on a bulletproof two-delimiter SPLIT command. I am sure someone has made one before. Let me describe in detail what I'm trying to do.
    Let's assume that:
    <A> is the first delimiter
    </a> is the second delimiter

    Input String
    ---------------------
    <A><A>hi how are you</a><hello><I am fine><A>really?</a><A>yes really!!!!<ggg></a>

    First Array:
    ---------------------
    <A>hi how are you</a>


    Second Array:
    ---------------------
    really?


    Third Array:
    ---------------------
    yes really!!!!<ggg>


    What I want to accomplish is to return the text between two delimiters in a string. The catch is that there can be a delimiter within a delimiter, like this:

    <A><A>hi</a>

    <A> is one delimiter
    </a> is the other delimiter

    I would get the string:

    <A>hi

    With a normal split command, it would return this instead:

    hi

    Very bad. I hope someone can help me. Thanks.
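
One possible reading of the rule, sketched in Python rather than PowerBASIC: each field runs from an opening delimiter to the next closing delimiter, and any opening delimiter met in between is kept as ordinary text. The function name `fields_between` is made up for this illustration, and the matching is case-sensitive. Note that for the input string above it returns the first field without the trailing </a> shown in the post.

```python
def fields_between(s, open_delim="<A>", close_delim="</a>"):
    """Collect the text from each open_delim up to the next close_delim.
    An open_delim that appears before the close is kept as plain text."""
    fields = []
    pos = 0
    while True:
        start = s.find(open_delim, pos)
        if start == -1:
            break                          # no more opening delimiters
        start += len(open_delim)
        end = s.find(close_delim, start)
        if end == -1:
            break                          # unterminated field: stop here
        fields.append(s[start:end])
        pos = end + len(close_delim)       # resume after the closing delimiter
    return fields


text = "<A><A>hi how are you</a><hello><I am fine><A>really?</a><A>yes really!!!!<ggg></a>"
result = fields_between(text)
```

Text that sits outside any pair, such as <hello><I am fine>, is skipped.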

    -------------
    -Greg



    [This message has been edited by Gregery D Engle (edited June 22, 2000).]