Space delimited data files

  • Space delimited data files

    I've just realized that I've been happily (and apparently successfully) reading in numeric data from space-delimited files using INPUT#, whereas (according to Help, PB/CC5) it is supposed to work only with comma-delimited data. This appears to work with no problems, but should I be aware of any potential pitfalls? It's all quick-and-dirty stuff, processing data for my own analyses...

  • #2
    From Help for INPUT#
    Remarks: filenum& is the file number, or variable containing a file number, given when the file was opened. variable_list is a comma-delimited sequence of one or more string or numeric variables. When the INPUT# statement reads an unquoted data item from a file, it removes leading and trailing spaces. If spaces are significant, place quotes around the file data, either directly or by using WRITE# to save the data to disk. Please note that data to be quoted should not contain embedded quotes.

    The data in the file must match the type(s) of the variable(s) defined in the INPUT# statement. The file data should be separated by commas with a carriage return at the end. The WRITE# statement is ideal for creating such files.

    INPUT# also supports fixed-length and nul-terminated string variables; however, data that is longer than the string is truncated to fit into the string. Dynamic strings receive the data without truncation. UDT variables may not be used, although fixed-length and nul-terminated UDT member variables are supported.
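A rough Python sketch of the splitting rules the quoted Help text describes: comma separators, unquoted items trimmed of leading and trailing spaces, quoted items kept verbatim, and no embedded quotes allowed. It only illustrates those documented rules; it is not how PowerBASIC implements INPUT#, and the function name `input_fields` is made up for this example.

```python
def input_fields(line: str):
    """Split a comma-delimited line per the documented rules: unquoted items
    are trimmed, quoted items keep their spaces. Assumes quoted items contain
    no embedded quotes, as the Help requires."""
    fields = []
    i, n = 0, len(line)
    while True:
        # skip leading spaces before the item
        while i < n and line[i] == " ":
            i += 1
        if i < n and line[i] == '"':
            end = line.index('"', i + 1)   # closing quote (no embedded quotes allowed)
            fields.append(line[i + 1:end])
            i = end + 1
            while i < n and line[i] != ",":  # ignore anything up to the next comma
                i += 1
        else:
            end = line.find(",", i)
            if end == -1:
                end = n
            fields.append(line[i:end].strip(" "))  # unquoted: trim spaces
            i = end
        if i >= n:
            break
        i += 1  # step past the comma
    return fields

print(input_fields('  12 , "  a b  ",x '))  # ['12', '  a b  ', 'x']
print(input_fields("4,,5"))                  # ['4', '', '5']
print(input_fields("1 2 3"))                 # ['1 2 3']
```

Under these documented rules alone, `1 2 3` comes back as a single trimmed item, which is why space-delimited data is outside what the Help describes.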
    INPUT# was created with comma-delimited data in mind; maybe spaces work by accident? See the leading/trailing space removal.

    Potential pitfall, uhm, uhm; what if value being read is 32?

    Cheers,
    Dale
    Last edited by Dale Yarker; 2 Jun 2017, 12:20 PM.

    • #3
      but should I be aware of any potential pitfalls?
      Just as a matter of course, relying on undocumented behavior like this is IMNSHO a really, really bad idea.

      BTW, "apparently working" code not shown!
      Michael Mattias
      Tal Systems Inc.
      Racine WI USA
      mmattias@talsystems.com
      http://www.talsystems.com

      • #4
        IMO, data that was programmatically created is likely to be more consistent than data a user entered somehow. If there is any question about the internal consistency of data read in from a file, the best procedure is probably to read a whole line into a string variable and parse it based on some delimiter such as commas, spaces, tabs, etc. The poorer the data, the trickier it gets. Consider, though, that to allay the issue Michael expressed, you could read a whole line into a string variable, then do a Replace on it to replace white space with commas. Then you would be back in the realm of supported behavior.
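Fred's read-the-whole-line idea, sketched in Python for illustration rather than PowerBASIC. One detail worth noting: a one-for-one replace turns two adjacent spaces into two commas, i.e. an empty field, so this sketch trims the line and collapses each run of spaces or tabs into a single comma. `to_commas` and `parse_numbers` are hypothetical names.

```python
import re

def to_commas(line: str) -> str:
    """Trim the record, then replace each run of spaces/tabs with one comma."""
    return re.sub(r"[ \t]+", ",", line.strip())

def parse_numbers(line: str):
    """Convert a whitespace-delimited record to a list of floats."""
    text = to_commas(line)
    if not text:  # blank line: no fields at all
        return []
    return [float(tok) for tok in text.split(",")]

print(to_commas("  1.5   2\t-3e2 "))      # 1.5,2,-3e2
print(parse_numbers("  1.5   2\t-3e2 "))  # [1.5, 2.0, -300.0]
```

Collapsing runs is a deliberate choice here: a space-delimited file has no way to mark an empty field, so treating several spaces as one separator is the only consistent reading.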
        Fred
        "fharris"+Chr$(64)+"evenlink"+Chr$(46)+"com"

        • #5
          Thanks all. Points well taken. The data files are ones that I created in the first place, but I'll play safe and use an approach along the lines that Fred advocates.

          Code not very interesting, but so that it's shown...
          Code:
            q=FREEFILE
            OPEN "sea-pen present NEMO-ERSEM all vars.asc" FOR INPUT AS #q
            FOR i=1 TO 6
              LINE INPUT #q, iline
            NEXT i
            DIM p(1 TO 501,1 TO 658) AS EXT
            FOR i=1 TO 501
              FOR j=1 TO 658
                INPUT #q, p(i,j)
              NEXT j
            NEXT i
            CLOSE #q
          The file is in ESRI ASCII format (the first six lines are headers).

          I'll replace that with something like (not tested yet)
          Code:
            q=FREEFILE
            OPEN "sea-pen present NEMO-ERSEM all vars.asc" FOR INPUT AS #q
            FOR i=1 TO 6
              LINE INPUT #q, iline
            NEXT i
            DIM p(1 TO 501,1 TO 658) AS EXT
            DIM ParsedData(1 TO 658) AS STRING
            FOR i=1 TO 501
              LINE INPUT #q, iline
              PARSE iline, ParsedData(), " "
              FOR j=1 TO 658
                p(i,j)=VAL(ParsedData(j))
              NEXT j
            NEXT i
            CLOSE #q
          The row and column counts are hard-coded because this is just quick-and-dirty code for my own use, with a lifetime of a few days or hours...

          • #6
            A couple of thoughts for you

            If you tally the spaces before you parse, you can make sure that you are reading what you think you are. If the records are constant, you can also use your tally result as the FOR loop bound and for DIM'ing the arrays.

            The other thought is to use PARSE$, which means that you are proactively managing the field that you are accessing.
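A Python sketch of Kerry's tally check (an illustration only, not PowerBASIC's PARSECOUNT): split each record on a single space, take the first record's field count as the expected width (the number you would otherwise use for the FOR bound and the DIM), and reject any record that differs. Because splitting on a single space turns a doubled space into an empty field, the sketch also rejects empty fields. `read_records` is a hypothetical name.

```python
def read_records(lines, delim=" "):
    """Return rows of floats, checking that every row has the same number of
    fields as the first one and that no field is empty."""
    table = []
    width = None
    for lineno, line in enumerate(lines, start=1):
        parts = line.rstrip("\r\n").split(delim)  # len(parts) == delimiter tally + 1
        if width is None:
            width = len(parts)                    # first record fixes the width
        if len(parts) != width:
            raise ValueError(f"line {lineno}: {len(parts)} fields, expected {width}")
        if "" in parts:
            raise ValueError(f"line {lineno}: empty field (doubled delimiter?)")
        table.append([float(p) for p in parts])
    return table

print(read_records(["1 2 3", "4 5 6"]))  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

Note that a tally alone would not catch `4  6`: it has two spaces, so it counts as three fields, which is why the empty-field check is there.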
            I made a coding error once - but fortunately I fixed it before anyone noticed
            Kerry Farmer

            • #7
              > I've been happily (and apparently successfully) reading in numeric data from space-delimited files using INPUT#

              Are you certain that they are not tab-delimited? How are you looking at the file?
              "Not my circus, not my monkeys."

              • #8
                Thanks Kerry, good advice. In this case I know exactly how many fields there should be, but I'll use PARSECOUNT to flag any problems.

                Eric, thanks for the lead. Yes, I'm certain for the files that I created myself. I just checked another file originating from some other software, and it too is space- rather than tab-delimited (I used Notepad++ with Show All Characters).
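If you'd rather check programmatically than by eye in Notepad++, here is a quick Python sketch that tallies which whitespace characters appear between the values. `delimiter_report` is a hypothetical helper, and it only counts spaces and tabs.

```python
from collections import Counter

def delimiter_report(lines):
    """Count spaces and tabs across the given lines so you can see which
    one is actually separating the values."""
    counts = Counter()
    for line in lines:
        counts["space"] += line.count(" ")
        counts["tab"] += line.count("\t")
    return dict(counts)

print(delimiter_report(["1 2 3\n", "4\t5\t6\n"]))  # {'space': 2, 'tab': 2}
```

For a real file, pass the open file object, e.g. `with open(path) as f: print(delimiter_report(f))`, since iterating a text file yields its lines.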

                Much as I'm tempted to make use of the convenience of the undocumented behavior, I'll take on board MCM's view that it is a "really, really bad idea". Ah, well...

                • #9
                  One advantage commas have over spaces as a delimiter is that you can represent the number zero, or the empty string, with nothing. For example, if you Input #1, x,y,z where the three variables are Longs, inputting the data
                  4,,5
                  sets y = 0. However, inputting
                  4[two spaces]5
                  sets y = 5 and you miss a value for z.
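Mark's example reproduced in Python as an illustration; it mimics the described behavior rather than calling INPUT#. With commas, an empty field survives and can be read as 0, while splitting on whitespace silently drops it and every later value shifts left. `read_longs_comma` and `read_longs_space` are hypothetical names.

```python
def read_longs_comma(line):
    """Comma-delimited: an empty field is kept and read as 0."""
    return [int(tok) if tok.strip() else 0 for tok in line.split(",")]

def read_longs_space(line):
    """Whitespace-delimited: split() treats a run of spaces as one separator,
    so an 'empty' field between two spaces simply disappears."""
    return [int(tok) for tok in line.split()]

x, y, z = read_longs_comma("4,,5")
print(x, y, z)                   # 4 0 5
print(read_longs_space("4  5"))  # [4, 5]: y gets 5 and nothing is left for z
```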

                  Here’s an earlier discussion about removing the “air” from sequential files:
                  Shrunken Sequential Files.

                  Politically incorrect signatures about immigration patriots are forbidden. Googling “immigration patriots” is forbidden. Thinking about Googling ... well, don’t even think about it.

                  • #10
                    While not clearly documented, it is actually doing what is described, i.e.
                    The data in the file must match the type(s) of the variable(s) defined in the INPUT# statement.
                    You are specifically requesting a numeric type (EXT) to be input, so you are effectively applying VAL at that point in the file. The rules for VAL are clear: it strips leading zeroes and then stops at the first character that is not numeric or part of a correctly formed exponent, and that will of course be the next space. So INPUT# stops reading the file until the next INPUT# request.
                    The only weakness is a missing number, i.e. the file contains two consecutive spaces; then you will get numbers out of sequence.
                    Not recommended, but if a number is never missing it should work fine, with no need for any PARSE functions.
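John's description amounts to a scanner: skip leading whitespace, consume the longest numeric prefix, stop at the first character that cannot continue the number, and resume from there on the next request. A Python sketch of that idea follows. It models the behavior described, not PowerBASIC's actual VAL or INPUT# code; `read_next_number` is a hypothetical name, and the regular expression accepts only plain decimal numbers with an optional exponent.

```python
import re

# Optional sign, digits with optional fraction (or a bare fraction), optional exponent.
NUMBER = re.compile(r"\s*([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)")

def read_next_number(text, pos):
    """Skip whitespace from pos, consume the longest numeric prefix, and
    return (value, position just after it)."""
    m = NUMBER.match(text, pos)
    if m is None:
        raise ValueError(f"no number at position {pos}")
    return float(m.group(1)), m.end()

data = "  007 1.5e2\n-3 .25"
pos, values = 0, []
for _ in range(4):
    v, pos = read_next_number(data, pos)
    values.append(v)
print(values)  # [7.0, 150.0, -3.0, 0.25]
```

Because every request starts by skipping whitespace, two consecutive spaces look exactly like one, so a missing value is never detected; the scanner just takes the next number it finds, which is John's out-of-sequence problem.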

                    • #11
                      Thanks for that John, that's a nice explanation.
