Announcement

Collapse
No announcement yet.

Parse$ error

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Parse$ error

    I've discovered a problem with the Parse$ function. It correctly reports the parsecount on some strings but doesn't report the correct value on the Parse$ data when a double quote is within a parse delimited field.
    Code:
    #COMPILE EXE
    DEFLNG A-Z
    
    FUNCTION PBMAIN () AS LONG
       'input string below
       '123,"9" PAN",2.99
       x$ = "123,""9"" PAN"",2.99"
       PRINT PARSECOUNT(x$) 'returns 3
       PRINT PARSE$(x$,1) 'returns 123
       PRINT PARSE$(x$,2) 'returns 9 but should return 9" PAN
       PRINT PARSE$(x$,3) 'returns 2.99
       WAITKEY$
    
    
    END FUNCTION
    Thanks,
    Donnie

  • #2
    Try using "," for param three to EXPLICITLY define the parse character instead of relying on defaults.

    No promises but worth a shot.

    MCM
    Michael Mattias
    Tal Systems Inc. (retired)
    Racine WI USA
    [email protected]
    http://www.talsystems.com

    Comment


    • #3
      Looks like it is working correctly to me. Here is a quote from the help file on PARSE$ .

      "If delim$ is not specified or is null (zero-length), standard comma-delimited (optionally quoted) fields are presumed. In this case only, the following parsing rules apply. If a standard field is enclosed in optional quotes, they are removed. If any characters appear between a quoted field and the next comma delimiter, they are discarded. If no leading quote is found, any leading or trailing blank spaces are trimmed before the field is returned."

      KS

      Comment


      • #4
        Looks like it is working correctly to me.
        Agreed, so one solution is below:
        Code:
        #COMPILE EXE
        DEFLNG A-Z
        
        FUNCTION PBMAIN () AS LONG
           'input string below
           '123,9" PAN,2.99
           x$ = "123,9"" PAN,2.99"
           PRINT x$
           PRINT PARSECOUNT(x$) 'returns 3
           PRINT PARSE$(x$,1) 'returns 123
           PRINT PARSE$(x$,2) 'returns 9" PAN
           PRINT PARSE$(x$,3) 'returns 2.99
           WAITKEY$
        
        END FUNCTION

        Comment


        • #5
          Originally posted by keith shelton View Post
          "If delim$ is not specified or is null (zero-length), standard comma-delimited (optionally quoted) fields are presumed. In this case only, the following parsing rules apply. If a standard field is enclosed in optional quotes, they are removed. If any characters appear between a quoted field and the next comma delimiter, they are discarded. If no leading quote is found, any leading or trailing blank spaces are trimmed before the field is returned."
          the string is: 123,"9" PAN",2.99
          the second field should be 9" PAN or a 9 inch pan. This is standard notation for a standard comma-delimited field parsing. Excel and others parser parse this string correctly. The question is if this is by design to ignore part of a field or an oversite.
          Thanks,
          Donnie

          Comment


          • #6
            Originally posted by John Gleason View Post
            Agreed, so one solution is below:
            Code:
            #COMPILE EXE
            DEFLNG A-Z
            
            FUNCTION PBMAIN () AS LONG
               'input string below
               '123,9" PAN,2.99
               x$ = "123,9"" PAN,2.99"
               PRINT x$
               PRINT PARSECOUNT(x$) 'returns 3
               PRINT PARSE$(x$,1) 'returns 123
               PRINT PARSE$(x$,2) 'returns 9" PAN
               PRINT PARSE$(x$,3) 'returns 2.99
               WAITKEY$
            
            END FUNCTION
            Thanks for the suggestion but I can find solutions to work arounds the issue. I would prefer to not have to rewrite the parse function for this one anomaly. I wrote one back in the QBX days but only recently found this issue with PB. One of the solutions however couldn't be to alter the source data. We have a scripting process that reads data from many external data sources. In my opinion the function should handle Standard comma-delimited fields.
            Actually in doing some more research the standard definition for this case is defined as:
            Fields with embedded double-quote characters must be enclosed within double-quote characters, and the embedded double-quote characters must be represented by a pair of double-quote characters.
            1997,Ford,E350,"Super ""luxurious"" truck"
            Of which I would have no problems telling our customer they are not adhearing to standard encoding rules.
            Thanks,
            Donnie

            Comment


            • #7
              It's not too much of a workaround to keep the data as is. Try specifying the delimiter like MCM pointed out above, then additionally trim the double quotes.

              added: I was still changing the input string, so to avoid any change, replace any $DQ pairs with a single $DQ first.

              Code:
              #COMPILE EXE
              DEFLNG A-Z
              
              FUNCTION PBMAIN () AS LONG
                 'input string below
                 x$ = "1997,Ford,E350,""Super """"luxurious"""" truck""
                 PRINT x$
                 REPLACE CHR$($DQ,$DQ) WITH $DQ IN x$  'added
                 PRINT PARSECOUNT(x$)
                 PRINT TRIM$(PARSE$(x$,",",1), $DQ)
                 PRINT TRIM$(PARSE$(x$,",",2), $DQ)
                 PRINT TRIM$(PARSE$(x$,",",3), $DQ)
                 PRINT TRIM$(PARSE$(x$,",",4), $DQ)
                 WAITKEY$
              
              END FUNCTION
              Last edited by John Gleason; 10 Jan 2009, 08:52 AM. Reason: added REPLACE statement

              Comment


              • #8
                Originally posted by John Gleason View Post
                It's not too much of a workaround to keep the data as is. Try specifying the delimiter like MCM pointed out above, then additionally trim the double quotes.

                added: I was still changing the input string, so to avoid any change, replace any $DQ pairs with a single $DQ first.

                Code:
                #COMPILE EXE
                DEFLNG A-Z
                
                FUNCTION PBMAIN () AS LONG
                   'input string below
                   x$ = "1997,Ford,E350,""Super """"luxurious"""" truck""
                   PRINT x$
                   REPLACE CHR$($DQ,$DQ) WITH $DQ IN x$  'added
                   PRINT PARSECOUNT(x$)
                   PRINT TRIM$(PARSE$(x$,",",1), $DQ)
                   PRINT TRIM$(PARSE$(x$,",",2), $DQ)
                   PRINT TRIM$(PARSE$(x$,",",3), $DQ)
                   PRINT TRIM$(PARSE$(x$,",",4), $DQ)
                   WAITKEY$
                
                END FUNCTION

                This then would fail on strings like 123,"Hughes,Donnie",7.25.

                I know one relatively easy work around would be REPLACE CHR$($DQ,$DQ) WITH CHR$(255) IN x$ or some other non ascii character then replace the result back to a single $DQ.

                The point is Powerbasic parse function is not handing imbedded $DQ character based on the standard definition of a CSV. Excel and Microsoft parsers do but PB does not.

                Thanks,
                Donnie

                Comment


                • #9
                  The point is Powerbasic parse function is not handing imbedded $DQ character based on the standard definition of a CSV
                  A "standard for CSV?"

                  I've probably only seen five or six "standards" for CSVs. ( This does not include all the new 'standards' developed by imaginative users.)

                  e.g.

                  >This then would fail on strings like 123,"Hughes,Donnie",7.25.

                  So what would be "correct?"

                  Four fields:
                  123
                  "Hughes
                  Donnie"
                  7.25
                  Four Fields:
                  123
                  Hughes
                  Donnie
                  7.25
                  Three fields:
                  123,
                  Hughes,Donnie
                  7.25
                  Three fields:
                  123
                  "Hughes,Donnie"
                  7.25

                  Go back to square one: CSV is not a good choice for data containing quotes and/or commas. Change now while you still can.

                  MCM
                  Michael Mattias
                  Tal Systems Inc. (retired)
                  Racine WI USA
                  [email protected]
                  http://www.talsystems.com

                  Comment


                  • #10
                    Here's what Wikipedia has to say about the quandary:
                    No general standard specification for CSV exists. Variations between CSV implementations in different programs are quite common and can lead to interoperation difficulties. For Internet communication of CSV files, an Informational IETF document (RFC 4180 from October 2005) describes the format for the "text/csv" MIME type registered with the IANA. Another relevant specification is provided by Fielded Text which also covers the CSV format.

                    Many informal documents exist that describe the CSV format. How To: The Comma Separated Value (CSV) File Format provides an overview of the CSV format in the most widely used applications and explains how it can best be used and supported.
                    But, since it looks like you can't change your data Donnie, your double substitution idea should work if "Hughes and Donnie" are the desired output as two fields. If it's a one field Hughes,Donnie sort of answer, then... um... there's more work to do.

                    Comment


                    • #11
                      This works fine:
                      Code:
                      #COMPILE EXE
                      DEFLNG A-Z
                      
                      FUNCTION PBMAIN () AS LONG
                         'input string below
                         '123,"9'' PAN",2.99
                         x$ = "123," & $DQ & "9'' PAN" & $DQ & ",2.99"
                         
                         ? x$
                         ? STR$(PARSECOUNT(x$)) 'returns 3
                         ? PARSE$(x$,1) 'returns 123
                         ? PARSE$(x$,2) 'returns  9'' PAN
                         ? PARSE$(x$,3) 'returns 2.99
                      END FUNCTION
                      I suppose, that INCH must be identified as dbl Short REMARK, but not as dbl. quote...

                      Comment


                      • #12
                        RFC 4180 defines a Common Format and MIME Type for Comma-Separated Values (CSV) Files

                        To quote from this document:
                        If double-quotes are used to enclose fields, then a double-quote
                        appearing inside a field must be escaped by preceding it with
                        another double quote. For example:

                        "aaa","b""bb","ccc"
                        Likewise, this format adhears to the way excel saves data when saved as a csv. The above is three fields with the second field containing b"bb. Powerbasic parse would only return the b portion of this field and ignore "bb portion of this field.

                        I have an issue with data being ignored...

                        Donnie

                        Comment


                        • #13
                          Voila! Ok, blast away and see if you can break this:
                          Code:
                          #COMPILE EXE
                          #DIM ALL
                          
                          FUNCTION PBMAIN () AS LONG
                             LOCAL x AS STRING, ii AS LONG
                             'input string below
                             x$ =  "123,""Hughes,Donnie"",7.25,,"""""""",""aaa"",""b""""bb"",""ccc"",""1997"",Ford,""""E350"""",""Super """"luxurious"""" truck""
                             PRINT x$
                             REPLACE CHR$($DQ,$DQ) WITH $BEL IN x$       'forces PARSE to include whole field by removing $DQ's temporarily
                             PRINT PARSECOUNT(x$): PRINT
                          
                             DIM yArray(1 TO PARSECOUNT(x$)) AS STRING
                             PARSE x, yArray()
                          
                             FOR ii = 1 TO PARSECOUNT(x$)
                                 REPLACE ANY $BEL WITH $DQ IN yArray(ii) 'with single chr replacements, use the ANY keyword for speed.
                                 PRINT yArray(ii)
                             NEXT
                             WAITKEY$
                          
                          END FUNCTION
                          (I used $BEL here instead of chr$(255). Any entirely unused chr in the data is fine.)

                          Comment


                          • #14
                            Originally posted by John Gleason View Post
                            Voila! Ok, blast away and see if you can break this:
                            Code:
                            #COMPILE EXE
                            #DIM ALL
                            
                            FUNCTION PBMAIN () AS LONG
                               LOCAL x AS STRING, ii AS LONG
                               'input string below
                               x$ =  "123,""Hughes,Donnie"",7.25,,"""""""",""aaa"",""b""""bb"",""ccc"",""1997"",Ford,""""E350"""",""Super """"luxurious"""" truck""
                               PRINT x$
                               REPLACE CHR$($DQ,$DQ) WITH $BEL IN x$       'forces PARSE to include whole field by removing $DQ's temporarily
                               PRINT PARSECOUNT(x$): PRINT
                            
                               DIM yArray(1 TO PARSECOUNT(x$)) AS STRING
                               PARSE x, yArray()
                            
                               FOR ii = 1 TO PARSECOUNT(x$)
                                   REPLACE ANY $BEL WITH $DQ IN yArray(ii) 'with single chr replacements, use the ANY keyword for speed.
                                   PRINT yArray(ii)
                               NEXT
                               WAITKEY$
                            
                            END FUNCTION
                            (I used $BEL here instead of chr$(255). Any entirely unused chr in the data is fine.)
                            Thanks for your effort and suggestion but actually an empty string may look like like ,"", instead of ,"""",. In this case the parse would return " as a result.

                            It took a little more than my initial easy solution and I solved it with this.
                            Code:
                            #COMPILE EXE
                            DEFLNG A-Z
                            
                            FUNCTION ParseDQReplace(SPtr AS DWORD,strLen&)
                               LOCAL lsptr AS BYTE PTR
                               lsPtr = sPtr
                               qstr& = 0
                               FOR w& = 0 TO strLen&-2
                                  IF @lsptr[w&] = 34 THEN
                                     IF w& = 0 OR @lsptr[w&-1] = 44 THEN
                                        qstr& = -1
                                     ELSEIF qstr& THEN
                                        IF @lsptr[w&+1] = 44 THEN
                                           qstr& = 0
                                        ELSE
                                           @lsPtr[w&] = 255
                                        END IF
                                     END IF
                                  END IF
                               NEXT
                            END FUNCTION
                            
                            FUNCTION PBMAIN () AS LONG
                               '123,"Hughes,Donnie",7.25,,"","aaa","b""bb","ccc","1997",Ford,""E350"","Super ""luxurious"" truck"
                               x$ = "123,""Hughes,Donnie"",7.25,,"""",""aaa"",""b""""bb"",""ccc"",""1997"",Ford,""""""E350"""""",""Super """"luxurious"""" truck""
                               ParseDQReplace(STRPTR(x$),LEN(x$))
                               DIM yArray(1 TO PARSECOUNT(x$)) AS STRING
                               PARSE x$, yArray()
                            
                               FOR ii = 1 TO PARSECOUNT(x$)
                                   REPLACE CHR$(255,255) WITH $DQ IN yArray(ii) 'with single chr replacements, use the ANY keyword for speed.
                                   PRINT yArray(ii)
                               NEXT
                               WAITKEY$
                            END FUNCTION
                            Fast and functional.
                            Thanks,
                            Donnie
                            Last edited by Donnie Hughes; 14 Jan 2009, 11:20 AM.

                            Comment


                            • #15
                              My understanding is that PARSE$ works correctly, and the data is used incorrectly.

                              Code:
                              #COMPILE EXE
                              DEFLNG A-Z
                              
                              FUNCTION PBMAIN () AS LONG
                                 'input string below
                                 '123,"9" PAN",2.99
                                 x$ = "123,""9"" PAN"",2.99" ' < -----Incorrect
                                 x$ = "123,9"" PAN,2.99"     ' < -----Correct
                                 PRINT PARSECOUNT(x$) 'returns 3
                                 PRINT PARSE$(x$,1) 'returns 123
                                 PRINT PARSE$(x$,2) ' did return 9 but now returns 9" PAN
                                 PRINT PARSE$(x$,3) 'returns 2.99
                                 WAITKEY$
                              
                              
                              END FUNCTION
                              The thing is that X$, being defined as "123,""9"" PAN"",2.99" is internally stored as:
                              Code:
                              123,"9" PAN",2.99
                              so will "123,9"" PAN,2.99" be stored as
                              Code:
                              123,9" PAN,2.99
                              before PARSE$ gets to work with it. This format PARSE$ can process as "expected". PARSE$ does not claim to be a CSV parser (on the contrary), and PB do reduce the DQs correctly when storing the string. I agree with Keith when he says PARSE$ do work as documented.

                              If you need to preprocess your strings prior to feeding them to PARSE$ as a result of this, it seems that there are a few CSV parsers available in the source forum. You may even get all you need from such a parser, and be able to do without PARSE$ entirely. Here's one CSV parser: http://www.powerbasic.com/support/pb...V+string+parse

                              ViH

                              PS: I see you've got yourself a working solution while I was typing , but my comment on PARSE$ still applies.
                              Last edited by Vidar Hanto; 11 Jan 2009, 09:24 PM.

                              Comment


                              • #16
                                Donnie said: an empty string may look like like ,"", instead of ,"""",. In this case the parse would return " as a result.
                                Is it possible that an empty string can look like like """"? If so, your algo returns ". Also I noted that for ""E350"" your algo adds a leading space and returns no quotes, that is, | E350| but I was thinking it should be |"E350"|.

                                Comment


                                • #17
                                  Originally posted by John Gleason View Post
                                  Is it possible that an empty string can look like like """"? If so, your algo returns ". Also I noted that for ""E350"" your algo adds a leading space and returns no quotes, that is, | E350| but I was thinking it should be |"E350"|.
                                  Thanks for looking at it and the comments you made. Actually the string of ,"""", is by standard definition a single double quote.

                                  You are correct that the ""E350"" is not a standard definition and returns | E350 | since it should be """E350""". What is actually happening is the inside single occurance of the double quote was replace with a single char 255 and it does not display with a print statement which apprears like a space. Because a supplier that is providing data may send me a string like this in the production program I decide to to treat that as
                                  replace chr$(255,255) with $dq in x$
                                  replace chr$(255) with $dq in x$

                                  This way it will will not discard any data, adhear to the standard definition and allow a little forgiveness to the standard without abandoning it.
                                  Thanks,
                                  Donnie

                                  Comment


                                  • #18
                                    [CSV Files] ..by standard definition...is not a standard definition .... adhear[sic] to the standard definition ... to the standard ...
                                    You, sir, must see even empty glasses as half-full.

                                    MCM
                                    Michael Mattias
                                    Tal Systems Inc. (retired)
                                    Racine WI USA
                                    [email protected]
                                    http://www.talsystems.com

                                    Comment


                                    • #19
                                      Not sure why the attack... I may have overused a term that was used in the PB help document or somehow personally offended you... Sorry if I did.

                                      I work for a fairly large software company ($220m) and one of the things we provide is a scripting tool that converts data from many sources from about 8 format types. We don't define the format the source provider sends to our customers, we just provide the conversions and tranformation mapping scripts. Once converted to XML, we have a standard import process to load data into our databases. The transformation tool was written with PB.

                                      One of the formats the conversion tool supports is CSV. Our customers call our help desk if they are trying to load a description of 9" Pan for an item they're selling through their scanning system and it only scans as 9.

                                      One thing we've learned years ago was to abandon the phrase "works as designed". This tends to tick off our customers. We challenge ourselves to make sure the design is correct. I didn't post this thread in an effort to query a solution but rather to challenge the design. It doesn't matter to me any more since I've worked around the problem.

                                      Thanks for listening...

                                      Comment

                                      Working...
                                      X