parsing command line options

    I have Power Basic 3.0c, and dosemu has revitalized my interest in DOS.

    I have a few projects in mind to do, but the thought of coding the option parsing from scratch is enough to make me question whether they are too small to be worth pursuing.

    I would like to be substantially compliant with the guidelines in http://www.opengroup.org/onlinepubs/.../utilconv.html and accept GNU-style --long-options. But I also want to allow for traditional DOS-speak like /o options and @filename.ext specifications.

    It seems someone's already done it. The package is at http://www.sellsbrothers.com/tools/
    Unfortunately, I don't have a C++ compiler for DOS, and documentation is sparse. Is it possible for someone here to prepare this for use with PB (if that is itself possible)?

    ------------------
    Erich Schulman (KT4VOL/KTN4CA)
    Go Big Orange

  • #2
    Erich,

    parsing the command line in Powerbasic is so much more simple than in C.

    Do a search for COMMAND$ and/or PARSE$ and read the help file topics
    for both.

    Even if the examples you find are for PBCC (for Windows) they will
    be more than enough to get you started. I don't have the DOS version
    of PB installed so I cannot look up all of the parsing and string functions
    available in the DOS version of PB but it would be a good start to
    look into that.

    I think you will find it's not that difficult to write your own
    code to allow for both DOS and UNIX style command line parameters
    with little effort.

    If you are truly lost, I may have time to help.

    As some have said here, it is sometimes much easier (and better) to
    write your own routines from scratch than to try to translate them
    word for word from another language, and after looking at the 'C' code in the
    links you provided, I'd much rather start from scratch in PowerBasic.

    -Michael.




    ------------------



    • #3
      Thanks for the reply. I do have COMMAND$ but not PARSE$.

      Maybe you can look at my methods and see if anything is clearly wrong? This isn't how I was taught to write pseudocode, but it works for me.

      // Do easy stuff first
      Remove any leading and trailing whitespace
      Is COMMAND$ empty?
      -yes: return 0 and exit
      -no: continue
      Convert any TABs to spaces
      Is "--help" OR "-h" OR "-?" OR "/?" present? OR COMMAND$="?" ? // do it case-insensitively too
      // Do not check for "help". Suppose the caller is a dictionary program and the end user wanted to look up "help".
      // We could also check for language strings, like "--ayuda". See language support below.
      -yes: return 1 and exit
      -no: continue

      // Maybe there are no options? Ex: "foo bar.txt" has an operand but no options
      Is first character "-" OR "/" OR "@"?
      -no: return 2 and exit // just an operand, no options
      -yes: continue
      // Limitation: DOS allows filenames to start with "-" so this won't be seen as a filename argument
      // We could try to open -FOO.TXT, but it may not exist because user intends to create it
      // We'll leave this alone and just document the behavior as a feature

      Initialize EOR // Boolean flag to tell us when to quit processing the command line
      Initialize a string array to nulls // Each element will be a found argument with a null to tell the caller when the end of the list is reached

      /*
      DOS command line is limited to 127 characters.
      This would be the most possible options:
      -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o -o
      Thus the array can be DIMed to 42 elements.
      */

      // Valid delimiters of arguments:
      // " -" covers single-letter options (lone or grouped), --long-options, -- (end of options), and - (stdio marker)
      // " /" covers DOS style /o, /o tions and /options
      // " @" marks a filename

      Change all " /" to " -" // a bit less work to do


      DO UNTIL EOR=TRUE
      EXTRACT$ for " -" && EXTRACT$ for " @"
      Does EXTRACT$ return the same thing we started with?
      -yes: we're done (no delimiters left)
      -no: continue
      // EXTRACT$ gave us an option, an option with its argument, or an option group
      Copy this string into the array
      // We could also remove the leading "-" since the caller probably doesn't need it.
      // But do keep @ so the caller will know it's a filename.
      LOOP
      Return the array to the caller

      /*
      Expected results

      These should each come back as one element in the array:
      -
      --
      -f
      -foo
      --foo
      -foo bar
      --foo bar
      -f-oo
      --foo-bar
      -f o
      --fo
      @foobar.txt
      @foo
      @foo-
      @-foo
      @@foo
      @[email protected]

      Limitations:
      -file.txt (mentioned above)
      Options not tested for validity. Maybe we need a Python-esque TRY?
      Option groups are not broken apart. Even if -bar should be treated as -b -a -r, the caller will only get back -bar.
      */

      /*
      A little language support (may or may not be implemented)
      We always check for English --help.
      We can choose a way to receive other help words we want.
      Perhaps SET HELPWORDS=AYUDA\TASUKE\AIUTO\HILFE is a good way?
      */
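Under some assumptions, the pseudocode above translates fairly directly into C. C is used here for testability, and this is not a drop-in PB routine. The function name parse_tail, the return value 3 for "options found", and the fixed-size output array are hypothetical. The rest follows the pseudocode: trim, convert TABs to spaces, check for help as whole words regardless of case, rewrite a word-initial "/" to "-", and start a new element at every " -" or " @", so "-foo bar" stays one element. Because a lone "-" counts as an element, up to 64 elements can fit in 127 characters, so the sketch sizes its array for 64.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_LEN  128  /* the DOS command tail holds at most 127 characters */
#define MAX_ARGS 64   /* 64 lone "-" elements plus 63 spaces = 127 characters */

/* Does the n-character token at s equal word w, ignoring case? */
static int token_is(const char *s, size_t n, const char *w)
{
    size_t i;
    if (strlen(w) != n)
        return 0;
    for (i = 0; i < n; i++)
        if (tolower((unsigned char)s[i]) != tolower((unsigned char)w[i]))
            return 0;
    return 1;
}

/* Hypothetical helper following the pseudocode. Returns 0 (empty),
   1 (help requested), 2 (operands only), or 3 with the option elements
   copied into out[0] .. out[*count - 1]. */
int parse_tail(const char *tail, char out[][MAX_LEN], int *count)
{
    static const char *help[] = { "--help", "-h", "-?", "/?" };
    char buf[MAX_LEN];
    size_t len, i, k, start, end;

    assert(tail != NULL && out != NULL && count != NULL);
    *count = 0;

    /* Copy the tail, convert TABs to spaces, trim both ends. */
    strncpy(buf, tail, MAX_LEN - 1);
    buf[MAX_LEN - 1] = '\0';
    for (i = 0; buf[i] != '\0'; i++)
        if (buf[i] == '\t')
            buf[i] = ' ';
    for (start = 0; buf[start] == ' '; start++)
        ;
    memmove(buf, buf + start, strlen(buf + start) + 1);
    len = strlen(buf);
    while (len > 0 && buf[len - 1] == ' ')
        buf[--len] = '\0';

    if (len == 0)
        return 0;
    if (strcmp(buf, "?") == 0)
        return 1;

    /* Help switches count only as whole words, case-insensitively. */
    for (i = 0; i < len; ) {
        size_t n = 0;
        while (buf[i] == ' ')
            i++;
        while (buf[i + n] != '\0' && buf[i + n] != ' ')
            n++;
        for (k = 0; k < sizeof help / sizeof help[0]; k++)
            if (token_is(buf + i, n, help[k]))
                return 1;
        i += n;
    }

    if (buf[0] != '-' && buf[0] != '/' && buf[0] != '@')
        return 2;                       /* just operands, no options */

    /* DOS style: a '/' that starts a word becomes '-'. */
    for (i = 0; i < len; i++)
        if (buf[i] == '/' && (i == 0 || buf[i - 1] == ' '))
            buf[i] = '-';

    /* Start a new element at every " -" or " @"; an option keeps its argument. */
    start = 0;
    for (i = 1; i <= len; i++) {
        if (i == len || (buf[i - 1] == ' ' && (buf[i] == '-' || buf[i] == '@'))) {
            end = i;
            while (end > start && buf[end - 1] == ' ')
                end--;
            if (*count < MAX_ARGS) {
                memcpy(out[*count], buf + start, end - start);
                out[*count][end - start] = '\0';
                (*count)++;
            }
            start = i;
        }
    }
    return 3;
}
```

Filling an array that the caller supplies also sidesteps the need for the function to return an array.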


      ------------------
      Erich Schulman (KT4VOL/KTN4CA)
      Go Big Orange



      • #4
        Here's a "parse" routine I used for years. It should be easy to
        customize. It is probably adapted from a book by John C. Clark.
        The demo expects the string to be parsed to be in a text file
        rather than COMMAND$, but that's easy to change.

        Code:
        'demo
        
        cls
        redim ary$(1)
        sep$=" "       
         
        open "junk.txt" for input as #1
        open "junk2.txt" for output as #2
        while not eof(1)
            line input #1, work$
            CALL ParseIt(work$, ary$(), sep$) 
            FOR I= 1 to UBOUND(ary$)
               PRINT #2, ary$(I);",";
            NEXT
            PRINT #2,""
            INCR count&
            if UBOUND(ary$) > 0 then locate 24,1: ?count&,ary$(1);   'a blank line leaves no ary$(1)
        wend
        close #1, #2
        end
        
        'SUB ParseIt (wk$, w$(), sep$)
        'SEP$ is separator(s): space, comma / or whatever
        'WK$ is the string to be parsed
        'w$() is an array of the resulting strings
        'UBOUND(ARY$) will tell you its size
        SUB ParseIt (wk$, w$(), sep$)
            wk$ = LTRIM$(RTRIM$(wk$))
            L = LEN(wk$)
            IF L < 1 THEN REDIM w$(0): EXIT SUB
            j = 1
            REDIM w$(1)
            FOR I = 1 TO L
              x$ = MID$(wk$, I, 1)
              IF INSTR(sep$,x$) THEN
                  IF w$(j) <> "" THEN
                      j = j + 1
                      IF j > UBOUND(w$) THEN REDIM PRESERVE w$(j)
                  END IF
              ELSE
                  w$(j) = w$(j) + x$
              END IF
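            NEXT
            IF w$(j) = "" THEN REDIM PRESERVE w$(j - 1)   'drop an empty last element left by a trailing separator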
        END SUB
        ------------------


        [This message has been edited by Emil Menzel (edited February 22, 2006).]



        • #5
          Parsing a COMMAND$ is best done by first copying it to a string,
          such as a$, which will permit editing the contents. You can also
          apply LCASE$() or UCASE$() to limit the number of tests you have
          to perform on the contents.

          There are two switches normally used in calling DOS programs. One
          is the forward slash (/) and the other is the leading hyphen (-).
          Unfortunately, the forward slash is used in path designations in
          Linux and Unix just as the backslash (\) is used in DOS. In fact
          in early DOS, you had the option to use either in path statements.

          The hyphen or minus (-) can also be used in folder and file names,
          so it is not ideal either. The best way I have found to solve
          for this is to require a leading space before the (/) or (-) when
          designating a command line switch.

          Replacing tabs with spaces is a good idea. I do the same thing
          myself, although few people actually use tabs when entering
          command line options. Replacing " -" with " /" or vice-versa is
          also a good idea, since it means fewer cases to test for.

          Parsing a command line may also require accepting something in
          double quotes (or some other symbolic pairing), so one of the
          first things to do is look for such pairs and map them out of
          the passed parameters, replacing it in its entirety in the
          original a$ so that it does not trigger anything by accident. An
          easy way to do this is to create two matching strings, say a$
          and b$, then replace everything between double quotes in one of
          them with either spaces or nulls (CHR$(0)).

          The method of scanning the remaining command string is a matter
          of choice. I generally do something like this:
          Code:
            a$ = " " + UCASE$(COMMAND$)   'leading space so a switch at the very start is found too
            REPLACE CHR$(9) WITH " " IN a$
            b$ = a$
            i& = 0
            DO
              j& = INSTR(i&+1, a$, CHR$(34))  'look for double quote
              IF j&=0 THEN EXIT DO
              i& = INSTR(j&+1, a$, CHR$(34))  'look for second double quote
              IF i&=0 THEN i& = LEN(a$)+1     'in case none given
              MID$(a$, j&+1) = SPACE$(i&-j&-1) 'or STRING$(i&-j&-1,0)
            LOOP
            REPLACE " -" WITH " /" IN a$   'or REPLACE " /" WITH " -" IN a$
            i&=0
            DO
              i&=INSTR(i&+1,a$," /")  'look for a switch indicator  
              IF i&=0 THEN EXIT DO    'no more switches
              INCR i&
              j&=INSTR(i&+1,a$, ANY " /"+CHR$(34)) 'terminate on space, / or "
              IF j&=0 THEN j& = LEN(a$)+1
              c$ = MID$(a$,i&+1,j&-i&-1)  'switch text between the "/" and the terminator
              SELECT CASE c$
              CASE "?", "H", "HELP"
                'explain the command options using STDOUT or PRINT
              CASE ELSE
                IF INSTR(c$, ANY ":\.") THEN
                  'c$ appears to contain a path\filename
                ELSE
                  'STDERR, STDOUT, or PRINT "Unidentified switch /"+c$
                  EXIT FUNCTION   'assumes this code sits inside a FUNCTION
                END IF   
              END SELECT
            LOOP
          The trick sometimes is to think in circular terms in place of
          linear or sequential processes. Multiple switches can all be
          combined in this manner, regardless of sequence. If you
          limit a given switch to one letter, you can actually use more
          than one letter behind a (/) and set multiple flags as you
          step through each one, such as /aeiulk could all be handled
          individually, one letter at a time.

          Circular thinking can be particularly useful when responding to
          external triggers. DO loops are very good for implementing
          circular responses, and you can write your code to run specific
          tasks when some expected event takes place, or to determine if
          an event took place which your program was not designed to
          handle.


          ------------------
          Old Navy Chief, Systems Engineer, Systems Analyst, now semi-retired

          [This message has been edited by Donald Darden (edited March 01, 2006).]



          • #6
            >>
            Parsing a COMMAND$ is best done by first copying it to a string,
            such as a$, which will permit editing the contents. You can also
            apply LCASE$() or UCASE$() to limit the number of tests you have
            to perform on the contents.
            <<

            Copying COMMAND$ does seem to be a good idea. I may have to be careful with using UCASE$/LCASE$, though. I want this parsing routine to be usable for a variety of projects, so I may need to allow for case-sensitive options.

            >>
            Parsing a command line may also require accepting something in
            double quotes (or some other symbolic pairing)
            <<

            That I did not fully consider. Joining with a + (as in DOS's COPY command) is also a potential issue, especially since there may or may not be whitespace around the +.
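One way to handle the optional whitespace around "+" is to normalize it before the line is split into words: drop any spaces that touch a "+", then split each joined word at the "+" signs. Below is a minimal C sketch of that idea. The function names are hypothetical. Quoted names are not handled, so the sketch would run after any quote masking.

```c
#include <assert.h>
#include <string.h>

/* Remove spaces touching a '+', in place, so "a.txt + b.txt",
   "a.txt +b.txt" and "a.txt+b.txt" all become "a.txt+b.txt". */
void squeeze_plus(char *s)
{
    char *src = s, *dst = s;

    assert(s != NULL);
    while (*src != '\0') {
        if (*src == ' ') {
            char *p = src;
            while (*p == ' ')
                p++;                     /* look past the run of spaces */
            if (*p == '+' || (dst > s && dst[-1] == '+')) {
                src = p;                 /* drop spaces next to a '+' */
                continue;
            }
        }
        *dst++ = *src++;
    }
    *dst = '\0';
}

/* Split one joined word such as "a.txt+b.txt" at each '+', in place.
   Empty pieces (from "++") are skipped. Returns the number stored. */
int split_plus(char *tok, char *parts[], int max)
{
    int n = 0;
    char *p = strtok(tok, "+");

    while (p != NULL && n < max) {
        parts[n++] = p;
        p = strtok(NULL, "+");
    }
    return n;
}
```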

            >>
            If you limit a given switch to one letter, you can actually use more
            than one letter behind a (/) and set multiple flags as you
            step through each one, such as /aeiulk could all be handled
            individually, one letter at a time.
            <<

            I was thinking I might not need to break those single letters apart. The receiving program can easily enough do an INSTR on "aeiulk" and do whatever is needed.

            It might also be nice to allow for multiple letters anyway, like /on and /off. I would think those better done as --on and --off, but I want to be fairly supportive of both DOS and Unix conventions.

            If I implement this parser as a .PBU, perhaps single letter options can be specified as a compile-time option. If so, what is the best way of handling that in PB3? Most natural to me would be a line like
            #define SINGLE_LETTER_OPTIONS
            in config.h or
            LETTER_OPTIONS = SINGLE
            in Makefile.
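In C, the config.h idea is ordinary conditional compilation. The sketch below uses the macro name SINGLE_LETTER_OPTIONS from the post and a hypothetical helper, expand_option. In PB, the conditional-compilation metastatements would play the same role, though their exact syntax in 3.0c should be checked in the manual.

```c
#include <assert.h>
#include <string.h>

/* Compile with -DSINGLE_LETTER_OPTIONS (or #define it in a config header)
   to split "-abc" or "/abc" into the options a, b and c. Without it the
   whole word is one option, so /on and /off stay distinct. */

/* Expand a single-dash or slash token (not "--long") into option names,
   written to out[] as strings of at most 31 characters. Returns the count. */
int expand_option(const char *tok, char out[][32], int max)
{
    const char *body = tok + 1;          /* skip the leading '-' or '/' */

    assert(tok[0] == '-' || tok[0] == '/');
#ifdef SINGLE_LETTER_OPTIONS
    {
        int n = 0;
        for (; *body != '\0' && n < max; body++, n++) {
            out[n][0] = *body;           /* one letter per option */
            out[n][1] = '\0';
        }
        return n;
    }
#else
    if (max < 1)
        return 0;
    strncpy(out[0], body, 31);           /* the whole word is one option */
    out[0][31] = '\0';
    return 1;
#endif
}
```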

            D'oh! I saw that my parsing function can't return an entire array to its caller. I may just have to allow that array to have global scope, then. Or at least concatenate everything back into a single variable-length string to return using CHR$(0) as new delimiters.


            ------------------
            Erich Schulman (KT4VOL/KTN4CA)
            Go Big Orange



            • #7
              I can't see a need to create a compiler switch (constant reference)
              when compiling different options for different programs. As the
              programmer, you decide up front what switches will be used with
              each new program, how to implement those switches, and there is
              likely little common ground from one program to the next, except
              for invoking Help.

              I have found it convenient to define my own constant references in
              some of my programs. One is %DEBUG, which I can use to force some
              helpful debugging code to run that I add when something isn't
              working right, and I want MSGBOX or additional PRINT statements to
              show what's happening at some key point - this gets around the
              problem of actually running the DEBUGGER and having it run the
              program super slow when dealing with lots of processing.

              Another constant reference I sometimes use is %SLOWSTEP, which
              activates itself if a timer process shows that the debugger is
              actually running because the time between a couple of instructions
              has been significantly increased. I can pose a question such as
              IF %DEBUG AND %SLOWSTEP THEN ..., and do something that may help
              me isolate some logical error on my part. Alternatively, I have
              defined SLOWSTEP as a function to retest to see if the program is
              slow stepping (in the debugger mode) and conditioning my own
              MSGBOX or PRINT statements that way.

              So I can see the advantage of having your own constants or
              functions that are specific to creating alternate choices inside
              a given program, but I'm not sure about something more generic
              to use with a range of programs, unless you intend to have your
              own alternate debugging routines on tap as part of an Include
              file as well.

              Using UCASE$() or LCASE$() is always optional, but overcomes
              the tendency of some people to always be in upper case, lower
              case, or use proper case. You can always copy the contents of
              one string to another before putting one of them into a single
              case, which preserves any case applied by the user.

              Some programs make extensive use of switches and are forced(?) to
              require the same letter to flag different states, so /P and /p
              mean different things. I see that as awkward, myself. I think
              /p1 or /p2, /P1 or /P2, might be better.

              If you have multiple switches, such as the /aeiulk I mentioned,
              where each letter represents some option, then you cannot test
              for this combination with an INSTR() statement as you suggested.
              This is because each letter represents an optional choice, may
              not be used, or the letters may appear in some different order.
              Trying to exhaust all the possibilities with multiple INSTR()
              statements would quickly become unmanageable. However, processing
              through a combined switch with a DO loop is almost painless:
              Code:
                i&=0
                DO
                  i& = INSTR(i&+1, a$, "/")
                  IF i& = 0 THEN EXIT DO
                  j& = INSTR(i&+1, a$, ANY " /"+CHR$(34))
                  IF j&=0 THEN j& = LEN(a$)+1
                  FOR k& = i&+1 to j&-1
                    SELECT CASE MID$(a$, k&, 1)
                    CASE "A", "a"
                      aSet& = 1
                    CASE "E", "e"
                      eSet& = 1
                    CASE "I", "i"
                      iSet& = 1
                    CASE "U", "u"
                      uSet& = 1
                    CASE "L", "l"
                      lSet& = 1
                    CASE "K", "k"
                      kSet& = 1
                    CASE "P", "p"
                      SELECT CASE MID$(a$, k&+1, 1)
                      CASE "1"
                        p1Set& = 1
                      CASE "2"
                        p2Set& = 1
                      CASE ELSE
                        pSet& = 1
                      END SELECT
                    END SELECT
                  NEXT
                  i& = j&-1
                LOOP
              Note that I avoided using UCASE$() or LCASE$() during testing,
              so you can do this multiple ways and get the same results.

              ------------------
              Old Navy Chief, Systems Engineer, Systems Analyst, now semi-retired



              • #8
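                The quickest way of testing for both upper and lower case
                letters is to AND the ASCII code of the character with &HDF.

                a%=asc(a$) AND &HDF:a$=chr$(a%)

                If the char is "a" it will return "A"

                If the char is "A" it will return "A"

                It also makes for shorter code. Only apply it to letters,
                though: other characters change as well, so "?" (&H3F)
                becomes CHR$(31), which would break a /? help switch.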
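The masking is easy to try out in C. The function names below are hypothetical. The first function mirrors the one-liner above. The second applies the mask only to lowercase ASCII letters, so a switch such as /? keeps its '?' instead of turning into 0x1F.

```c
#include <assert.h>

/* The trick from the post: AND with 0xDF clears bit 5 (0x20), which maps
   'a'..'z' onto 'A'..'Z' and leaves 'A'..'Z' unchanged. Any other byte
   with bit 5 set is changed as well. */
int fold_and_df(int c)
{
    assert(c >= 0 && c <= 0xFF);
    return c & 0xDF;
}

/* Letters-only variant: digits and punctuation such as '?' pass through. */
int fold_letters_only(int c)
{
    assert(c >= 0 && c <= 0xFF);
    return (c >= 'a' && c <= 'z') ? (c & 0xDF) : c;
}
```

For ASCII text in the default C locale, the standard toupper() from <ctype.h> behaves like the second function.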


                ------------------



                • #9
                  >>
                  D'oh! I saw that my parsing function can't return an entire array to its caller. I may just have to allow that array to have global scope, then. Or at least concatenate everything back into a single variable-length string to return using CHR$(0) as new delimiters.
                  <<

                  Doesn't have to.

                  Make sure the array is dynamic, REDIM it to any number of elements (to create a valid array) and pass it as a parameter to the parsing function.

                  In the parsing function, count the number of arguments, REDIM the array to the correct size and fill it.
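This count-then-REDIM approach has a direct C counterpart. The caller hands over the address of its array pointer, and the parser counts the words, allocates exactly that many slots, and fills them. Below is a minimal sketch. split_words, count_words and free_words are hypothetical names, and the sketch splits on spaces only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Count the space-separated words in s. */
static int count_words(const char *s)
{
    int n = 0, in_word = 0;

    for (; *s != '\0'; s++) {
        if (*s == ' ')
            in_word = 0;
        else if (!in_word) {
            in_word = 1;
            n++;
        }
    }
    return n;
}

/* The caller passes the address of its array pointer; this function
   counts the words, sizes the array to match and fills it, much like
   REDIMming a dynamic array passed in as a parameter.
   Returns the word count, or -1 if memory runs out. */
int split_words(const char *s, char ***words)
{
    int n, i = 0;
    char **arr;

    assert(s != NULL && words != NULL);
    n = count_words(s);                              /* pass 1: count */
    arr = malloc((size_t)(n > 0 ? n : 1) * sizeof *arr);
    if (arr == NULL)
        return -1;
    while (*s != '\0') {                             /* pass 2: fill */
        const char *start;
        size_t len;

        while (*s == ' ')
            s++;
        if (*s == '\0')
            break;
        start = s;
        while (*s != '\0' && *s != ' ')
            s++;
        len = (size_t)(s - start);
        arr[i] = malloc(len + 1);
        if (arr[i] == NULL) {
            while (i > 0)
                free(arr[--i]);
            free(arr);
            return -1;
        }
        memcpy(arr[i], start, len);
        arr[i][len] = '\0';
        i++;
    }
    *words = arr;
    return n;
}

/* Release everything split_words() allocated. */
void free_words(char **words, int n)
{
    int i;

    for (i = 0; i < n; i++)
        free(words[i]);
    free(words);
}
```

Counting first means the array is sized once, rather than growing one element at a time with REDIM PRESERVE.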

                  Michael Mattias
                  Tal Systems (retired)
                  Port Washington WI USA
                  http://www.talsystems.com

