Excel csv versus MS DOS csv


  • Excel csv versus MS DOS csv

    Hi, I have a dataset to decipher and pull data out of; it is comma delimited and saved from Excel. When I try to read it natively, I fail to read in lines via "LINE INPUT #1, temp$", for example. However, after saving it as an MS DOS CSV file, things work properly. Is there any quick way to get that original Excel CSV read working? Thanks for any info.

  • #2
    What do you mean by "comma delimited in Excel"? Excel files are not "comma delimited text". There is no such thing as an "Excel CSV file".
    Please describe the original source of the data and how you are getting it to a "comma delimited in Excel" file and to an "MS DOS csv" file.

    Added: If you are actually "saving as type" both "CSV (Comma delimited)" and "CSV (MS-DOS)" and you are getting different results, I would suspect that your spreadsheet contains Unicode and the first option is saving text as such. In that case you need to read it as WSTRING, not STRING.
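
    For what it's worth, a minimal sketch of that check (the file name is just an example, and it leans on the old ACODE$ idiom of holding UTF-16 bytes in a dynamic string, so treat it as a starting point only):

    Code:
    ' Fragment: read the whole file as raw bytes, then look for a Unicode BOM.
    LOCAL hFile AS LONG, sRaw AS STRING, sText AS STRING
    hFile = FREEFILE
    OPEN "data.csv" FOR BINARY ACCESS READ AS #hFile      ' example file name
    GET$ #hFile, LOF(hFile), sRaw
    CLOSE #hFile

    IF LEFT$(sRaw, 2) = CHR$(&HFF, &HFE) THEN             ' UTF-16LE BOM
      sText = ACODE$(MID$(sRaw, 3))                       ' wide characters -> ANSI
    ELSEIF LEFT$(sRaw, 3) = CHR$(&HEF, &HBB, &HBF) THEN   ' UTF-8 BOM
      sText = MID$(sRaw, 4)      ' plain ASCII is fine as-is; anything else needs a real UTF-8 decode
    ELSE
      sText = sRaw                                        ' plain ANSI/ASCII
    END IF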




    • #3
      From a source of CSV files I use, the files would be read by Excel but not by my PB app. Investigation with a hex viewer showed $LFs but no $CRs.

      Cure - load file to a string, use INSTR to check there are no $CRs, REPLACE $LF WITH $CRLF in the string, save and open with your CSV code.

      ((LINE INPUT# depends on $CRLF))

      (((viewing the CSV you have with a hex viewer will reveal if you have UNICODE problem, $LF (no $CR) problem, or both, or other)))
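
      Something along these lines, as a rough sketch (file names are examples only):

      Code:
      ' Fragment: load the file, fix bare $LFs, then LINE INPUT# works again.
      LOCAL hFile AS LONG, sBuf AS STRING
      hFile = FREEFILE
      OPEN "original.csv" FOR BINARY ACCESS READ AS #hFile   ' example name
      GET$ #hFile, LOF(hFile), sBuf
      CLOSE #hFile

      IF INSTR(sBuf, $CR) = 0 THEN          ' no $CRs at all, so the file is LF-only
        REPLACE $LF WITH $CRLF IN sBuf      ' now every line ends in $CRLF
      END IF

      hFile = FREEFILE
      OPEN "fixed.csv" FOR OUTPUT AS #hFile                  ' example name
      PRINT #hFile, sBuf;                   ' trailing ; so PRINT adds no extra $CRLF
      CLOSE #hFile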

      Cheers,
      Dale



      • #4
        Another possible cause - the file might be TAB delimited instead of COMMA and still saved as dot CSV.

        See PB Help for PARSE$ to use TAB instead of the default comma.
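
        For example (the data and variable names are only for illustration):

        Code:
        ' Fragment: split one tab-delimited line into fields with PARSE$.
        LOCAL sLine AS STRING, sField AS STRING, nFields AS LONG, i AS LONG
        sLine = "1997" & $TAB & "Ford" & $TAB & "E350"
        nFields = PARSECOUNT(sLine, $TAB)      ' 3 fields
        FOR i = 1 TO nFields
          sField = PARSE$(sLine, $TAB, i)      ' "1997", then "Ford", then "E350"
          ' ... do something with sField ...
        NEXT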

        Cheers,
        Dale



        • #5
          Dale, Sounds like you've been there too

          Handling all the various possibilities in "text data files" from various sources has caused me no end of grief over the years.

          Dean,
          The things you need to check before trying to parse data files from outside sources:
          How is it encoded - ASCII, UTF8 or UTF16?
          Is it Unix/Mac or MS originated (LF v CRLF)?
          Is it tab, comma, space delimited or something esoteric such as a pipe (|)?
          Does it have a header row?
          Are all columns quoted, all strings quoted or only strings containing the delimiter quoted?
          Are there possible embedded Tabs, LFs or CRLFs within text columns?
          Does every row have the same number of columns?

          And I may have missed some
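
          A rough sketch of sniffing the first few of those automatically (sBuf is assumed to already hold the whole file image, loaded as in the earlier posts):

          Code:
          ' Fragment: rough format checks on a loaded file image.
          LOCAL sLine1 AS STRING, sEncoding AS STRING, sEndings AS STRING
          ' 1) Encoding clue: a BOM at the start of the file
          IF LEFT$(sBuf, 2) = CHR$(&HFF, &HFE) THEN
            sEncoding = "UTF-16LE"
          ELSEIF LEFT$(sBuf, 3) = CHR$(&HEF, &HBB, &HBF) THEN
            sEncoding = "UTF-8"
          ELSE
            sEncoding = "ANSI/ASCII (probably)"
          END IF
          ' 2) Line endings: LF with no CR at all means Unix-style
          IF INSTR(sBuf, $CR) THEN sEndings = "CRLF" ELSE sEndings = "LF only"
          ' 3) Delimiter: count the candidates in the first line, highest count wins
          sLine1 = PARSE$(sBuf, $LF, 1)
          ' compare TALLY(sLine1, ","), TALLY(sLine1, $TAB), TALLY(sLine1, ";"), TALLY(sLine1, "|")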
          Last edited by Stuart McLachlan; 27 Nov 2019, 11:20 PM. Reason: Added: the last potential problem (number of columns)



          • #6
            Still there! Viewing the file to see what's in it is faster than blindly trying things in code.
            Dale



            • #7
              Originally posted by Dale Yarker View Post
              Still there! Viewing the file to see what's in it is faster than blindly trying things in code.
              Yep, you should answer all of those questions before you start coding your parsing procedures. (ZTWin comes in very handy.)

              But in many situations, you also need to include suitable error checks and warnings so that when (not if) some clown changes the way the source data is created, it doesn't screw up anything else (such as a previously consistent database you are adding data to).



              • #8
                I use Free File Viewer to look at strange files

                Saved me a lot of time!
                I made a coding error once - but fortunately I fixed it before anyone noticed
                Kerry Farmer



                • #9
                  Dean,

                  From Wikipedia: The CSV file format is not fully standardized. The basic idea of separating fields with a comma is clear, but that idea gets complicated when the field data may also contain commas or even embedded line breaks. CSV implementations may not handle such field data, or they may use quotation marks to surround the field. Quotation does not solve everything: some fields may need embedded quotation marks, so a CSV implementation may include escape characters or escape sequences.

                  So, there are many implementations of CSV routines, some good, some not so good.
                  Usually the main rules are respected, but some implement custom rules.
                  For example, in a comma-delimited CSV file for Excel, having [sep=;] as the first line will define the separator as a semicolon.
                  Also, because of the mix of nested [,] and ["] and [CR] and [LF], it may become difficult to parse all this correctly.

                  An example:
                  Year,Make,Model,Description,Price
                  1997,Ford,E350,"ac, abs, moon",3000.00
                  1999,Chevy,"Venture ""Extended Edition""","",4900.00
                  1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00
                  1996,Jeep,Grand Cherokee,"MUST SELL!
                  air, moon roof, loaded",4799.00

                  Another example:
                  Player ,Position ,Nicknames ,Years Active
                  Skippy Peterson ,First Base ,"""Blue Dog"", ""The Magician""" ,1908-1913
                  Bud Grimsby ,Center Field ,"""The Reaper"", ""Longneck""" ,1910-1917
                  Vic Crumb ,Shortstop ,"""Fat Vic"", ""Very, Very Fat Vic""" ,1911-1912

                  Also, because of their limitations, LINE INPUT and PARSE$ may not be the best tools for CSV files.
                  A data line may have CR/LF embedded and delimiters (commas or semicolons) between quotes.

                  You may try CSV reader

                  If you could provide us with some part of the CSV file you have trouble with, it would be easier to help...
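
                  To give an idea of the kind of scan that is needed, here is a rough sketch (the function name is made up, not a library routine; it handles quoted commas and doubled quotes on a single line, but NOT fields that span lines):

                  Code:
                  ' Sketch: split one CSV line into fields(), honouring quotes. Returns the field count.
                  FUNCTION SplitCsvLine(BYVAL sLine AS STRING, fields() AS STRING) AS LONG
                    LOCAL i AS LONG, nCount AS LONG, inQuotes AS LONG
                    LOCAL sField AS STRING, sChar AS STRING
                    REDIM fields(1 TO PARSECOUNT(sLine, ",") + 1)   ' generous upper bound
                    FOR i = 1 TO LEN(sLine)
                      sChar = MID$(sLine, i, 1)
                      SELECT CASE sChar
                        CASE $DQ
                          IF inQuotes AND MID$(sLine, i + 1, 1) = $DQ THEN
                            sField = sField & $DQ          ' doubled quote inside a quoted field
                            INCR i                         ' skip the second quote
                          ELSE
                            inQuotes = NOT inQuotes        ' opening or closing quote
                          END IF
                        CASE ","
                          IF inQuotes THEN
                            sField = sField & ","          ' comma is data, not a delimiter
                          ELSE
                            INCR nCount                    ' end of field
                            fields(nCount) = sField
                            sField = ""
                          END IF
                        CASE ELSE
                          sField = sField & sChar
                      END SELECT
                    NEXT
                    INCR nCount                            ' the last field
                    fields(nCount) = sField
                    FUNCTION = nCount
                  END FUNCTION

                  You would call it once per logical record, something like nFields = SplitCsvLine(sLine, arrFields()).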



                  • #10
                    Originally posted by Pierre Bellisle View Post
                    Dean,

                    From Wikipedia: The CSV file format is not fully standardized.
                    ...
                    Which is why I always try to work with Tab or space delimited text data files



                    • #11
                      Well, it will serve you well to have two EXEs: one to convert CRLF records to LF, and one to convert LF records back to CRLF. For that, just read in the whole file, do a global change on the single string you used to read the file, then write it back out.
                      Whatever field delimiter is used in the file, just make sure you scan the file's records and check that the delimiter count is the same in each record for the entire file. But some lines in some files will not have matching fields in each record; you are on your own with that.
                      Tab-delimited fields look like a good way to delimit fields in records, and you can always globally change them.
                      There is no standard in CSV files, as stated above.
                      Massaging data in files has always been something you must learn in dealing with data files when transferring data.
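
                      A quick sketch of that per-record count (it assumes the line endings are already fixed so LINE INPUT# sees whole records; the file name and the comma are examples):

                      Code:
                      ' Fragment: flag any record whose field count differs from the first record's.
                      LOCAL hFile AS LONG, sLine AS STRING, nExpected AS LONG, nThis AS LONG, nRec AS LONG
                      hFile = FREEFILE
                      OPEN "data.csv" FOR INPUT AS #hFile
                      WHILE ISFALSE EOF(hFile)
                        LINE INPUT #hFile, sLine
                        INCR nRec
                        nThis = PARSECOUNT(sLine, ",")      ' or $TAB, "|", ... whatever the file uses
                        IF nRec = 1 THEN nExpected = nThis
                        IF nThis <> nExpected THEN
                          ' record nRec is suspect: wrong delimiter count (or a multi-line field)
                        END IF
                      WEND
                      CLOSE #hFile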
                      p purvis



                      • #12
                        Originally posted by Paul Purvis View Post
                        Whatever field delimiter is used in the file, just make sure you scan the file's records and check that the delimiter count is the same in each record for the entire file.
                        How could I have forgotten that one!

                        I've added it to post #5.



                        • #13
                          If you generate or can choose the file format you are working with, well, you are a lucky man.
                          As long as you work only on simple things, it will be fine.

                          That said, many of us have to work on files whose format we have no control over.
                          Delimiters by themselves are by far the easiest problem to solve.
                          As said above, the trickiest part is the mix of delimiters and quotes and spaces and CR and LF and TAB.
                          Even a simple space-delimited file may become hard to follow when you need to insert a field that contains spaces inside;
                          then you need quotes to preserve this field. And if you get empty fields, and CR and LF for multi-line fields, it can become kind of complex pretty soon.

                          Managing a simple file is simple.
                          If you need to go one step above, a little more work will be needed...



                          • #14
                            Originally posted by Paul Purvis
                            Whatever field delimiter is used in the file, just make sure you scan the file's records and check that the delimiter count is the same in each record for the entire file.
                            Originally posted by Stuart McLachlan View Post
                            How could I have forgotten that one!
                            I've added it to post #5.
                            You may have more delimiters if some of them sit between quotes as part of a field...
                            You may have fewer delimiters if you have a multi-line record...
                            You may have the right delimiter count and still be inside a multi-line record...
                            Some files may use CR, LF, and CRLF with different meanings, especially in multi-line files.
                            Some formats treat missing delimiters as if the last, missing fields are simply empty.
                            If you are not sure of the file format, the first thing is to query the author or do a deep analysis; maybe both is better.



                            • #15
                              The pity is that ASCII already includes separator characters (decimal 30, the record separator, and decimal 31, the unit separator) that are hardly ever used. Commas, carriage returns and line feeds could be embedded in fields without conflict.
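
                              Just to illustrate the idea (the sample data is invented):

                              Code:
                              ' Sketch: ASCII unit separator, CHR$(31), as the field delimiter.
                              LOCAL sRecord AS STRING, sField AS STRING
                              sRecord = "1996" & CHR$(31) & "Jeep" & CHR$(31) & "MUST SELL!, air, moon roof"
                              sField  = PARSE$(sRecord, CHR$(31), 3)   ' the commas stay inside the field, no quoting needed
                              ' CHR$(30), the record separator, could likewise replace $CRLF between records.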

                              Too late now, sigh!
                              Dale



                              • #16
                                Originally posted by Pierre Bellisle View Post
                                Even a simple space-delimited file may become hard to follow when you need to insert a field that contains spaces inside;
                                then you need quotes to preserve this field. And if you get empty fields, and CR and LF for multi-line fields, it can become kind of complex pretty soon.
                                I think we are talking about two different meanings of "space delimited" here. I didn't mean using a single space character as a delimiter.
                                When I said "space delimited", I was talking about fixed-width rows, i.e. what Excel calls "Formatted Text (Space delimited) (*.prn)".

                                In my experience, that's a fairly standard term for that type of file.
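
                                Parsing those is just MID$ with known column positions, for example (the column layout is invented):

                                Code:
                                ' Fragment: fixed-width record - columns 1-4 year, 5-14 make, 15-34 model (example layout).
                                LOCAL sLine AS STRING, sYear AS STRING, sMake AS STRING, sModel AS STRING
                                sLine  = "1997Ford      E350                "
                                sYear  = RTRIM$(MID$(sLine,  1,  4))     ' "1997"
                                sMake  = RTRIM$(MID$(sLine,  5, 10))     ' "Ford"
                                sModel = RTRIM$(MID$(sLine, 15, 20))     ' "E350"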



                                • #17
                                  I've found that NOTEPAD, included in every WINDOWS installation, is a very good tool to visually examine any CSV, TXT, PRN or other type of file, prior to designing the code to manage it.



                                  • #18
                                    Notepad is a VERY poor text editor for seeing what is actually in a text file. Do yourself a favour and download Notepad++.
                                    Then you can actually see the CRs, LFs, tabs, number of spaces etc. that are in the file. You can also see line and column numbers.

                                    [Attached screenshot: TextEditors.jpg]




                                    • #19
                                      Hi Manuel,
                                      Yep, Notepad can be useful for examining many CSV, TXT, and PRN files
                                      as long as "non-printable characters" are not involved in the formatting.
                                      If so, besides Notepad++, you may also use a hex viewer or UltraEdit; there are many choices...



                                      • #20
                                        Originally posted by Stuart McLachlan
                                        Which is why I always try to work with Tab or space delimited text data files
                                        Oh! I see, it is not "(Tab) or (space) delimited text data files"
                                        but "(Tab) or (space delimited text) data files" that you meant.

                                        Confusion can be anywhere.
                                        On Wikipedia, you can read that "a delimited file has the advantage of allowing field values of any length", which may be interpreted as the opposite.

                                        To avoid those situations, I prefer the term "flat data file", if used in the right context of course.
                                        Still better for me would be something like a "fixed-length field" record file.

                                        Note that Excel's PRN formatted text is limited to 240 characters per line;
                                        any characters beyond this will be wrapped to the next line.
                                        So even a simple flat data file may become somewhat problematic.

                                        But all this is outside the scope of Dean's CSV files, I guess. He does not seem to have a choice of file format.

                                        I always come back to this: if you are not sure of the file format, the first thing is to query the author if possible or do a deep analysis; maybe both is better.

