Compiler Options and Code Size

  • Compiler Options and Code Size

    Is there a technical note or other document somewhere that explains the effect different compiler options have on code size?

    I have seen various mentions but no comprehensive overall document.

    Commands and options I see as affecting code size are as follows. Have I missed any? Can anyone give estimates of the percentage or absolute increase in size each produces?

    $COM
    $DEBUG PBDEBUG
    $ERROR BOUNDS
    $ERROR NUMERIC
    $ERROR OVERFLOW
    $ERROR STACK

    $EVENT

    $FLOAT EMULATE/NPX/PROCEDURE
    $LIB COM
    $LIB LPT
    $LIB GRAPH
    $LIB CGA/EGA/VGA
    $LIB FULLFLOAT
    $LIB IPRINT

    $OPTIMIZE SIZE/SPEED

    $OPTION CNTLBREAK
    $OPTION GOSUB
    $OPTION SIGNED

    $SOUND

    $STACK

    ON EVENT
    ON ERROR

    Apart from the explicit options mentioned above, does the size of the runtime library code vary according to the statements you use, or is it a single unit?

    Similarly, if you don't use COM- or SOUND-related statements, does the compiler automatically omit the buffers and related library code, or do you have to exclude it explicitly?

    Any other suggestions for reducing program size?

    Paul


    -------------
    The PC Guru, Austin Tx
    Working on data recovery tools.

  • #2
    Paul --

    I don't have any numbers about how much the various PB/DOS metastatements save... I always go by "if I'm not using it, it's wasted space" and I turn off everything I can.

    I'm going to assume that you've done everything you can to avoid repeated code. My usual rule is "8 lines"... If the same 8 lines appear twice in a program, make it into a SUB or FUNCTION.

    The biggest change I ever made in a PB/DOS program was when I changed everything from SELECT CASE to IF/THEN statements. The program was full of SELECTs, and changing them made the program a full 15% smaller. The explanation is that the PB/DOS SELECT statement is "generic" and performs all of its operations with floating point math, while each IF/THEN/ELSE is "tuned" by the compiler to use the most efficient data types, usually INTEGER.
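    A sketch of that kind of rewrite (the variable and procedure names here are made up for illustration):

    Code:
    ' Before: generic SELECT CASE, evaluated with floating point math
    SELECT CASE KeyCode%
        CASE 13 : CALL HandleEnter
        CASE 27 : CALL HandleEscape
        CASE ELSE : CALL HandleOther
    END SELECT

    ' After: IF/THEN/ELSE tuned by the compiler to INTEGER comparisons
    IF KeyCode% = 13 THEN
        CALL HandleEnter
    ELSEIF KeyCode% = 27 THEN
        CALL HandleEscape
    ELSE
        CALL HandleOther
    END IF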

    On that same topic, use INTEGERs whenever you can in a PB/DOS program. (Use LONGs in 32-bit programs.) They are much more efficient than SINGLEs, or anything else, for that matter.

    Another "no loss" thing you can do is to eliminate near-duplicate strings. For example, if your program contains the literal strings "MYPROG.EXE", "MyProg.EXE" and "myprog.exe" (in things like DIR$ and OPEN statements) then all three strings will be stored in the EXE. If you make them all exactly the same, the string will only be stored once, no matter how many times it is used in the program.
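    One way to guarantee a single spelling (just a sketch; the file name and variable are hypothetical) is to assign the literal once and use the variable everywhere:

    Code:
    ProgName$ = "MYPROG.EXE"        ' one spelling, stored once in the EXE
    IF LEN(DIR$(ProgName$)) THEN
        OPEN ProgName$ FOR BINARY AS #1
    END IF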

    I found another surprising one in an older version of PB/DOS, and I don't know whether or not it is true in the later versions... Using:

    Code:
    IF X = 1 THEN 
       CALL DoSomething
    END IF
    ...is a few bytes less efficient than....

    Code:
    IF X = 1 THEN CALL DoSomething
    If you have a lot of one-line blocks, it can make a difference. Of course you can't change blocks with two or more lines.

    If you aren't explicitly using BYVAL numeric parameters in your SUBs and FUNCTIONs, that can make a difference too. If a SUB or FUNCTION does not change the value of a passed parameter (i.e. if you do not intentionally change a value and pass it back to the caller for further use) then doing this...

    Code:
    SUB MySub(BYVAL MyParam%)
        'WHATEVER
    END SUB
    ...will make a significant difference. But don't do that with string parameters unless you need the properties that BYVAL provides.

    As far as your Run Time Library question goes, I'm not sure exactly how "granular" it is, but if you do not use any statements or functions in certain "classes", the compiler will make the RTL smaller. The biggest exception to that rule is that if your program uses CHAIN, the entire RTL must be included in case another module needs it later.

    One very effective technique for reducing program size is to use CHAIN creatively. My largest PB/DOS program starts out with a small "loader" program that displays a splash screen, checks certain hardware and disk file parameters, and loads a ton of setup files. It then CHAINs to the large "meat and potatoes" program, using COMMON to pass certain configuration values. That way I "throw away" all of the code that is necessary for loading the config files and proofing the entries for valid values. Even if all of the config values are not COMMON, the main program can safely load them without proofing the values.
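    A hedged sketch of that loader pattern (the SUB names and file names are invented; note that the COMMON list must match in both modules):

    Code:
    ' LOADER.BAS -- compiled as the small startup EXE
    COMMON UserName$, ComPort%, MaxRecs&

    CALL ShowSplash      ' splash screen, hardware checks...
    CALL LoadConfig      ' ...and config loading live only in the loader
    CALL ProofValues     ' entries are proofed once, here

    CHAIN "MAIN.PBC"     ' loader code is discarded; COMMON values survive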

    -- Eric

    ------------------
    Perfect Sync: Perfect Sync Development Tools
    Email: [email protected]



    [This message has been edited by Eric Pearson (edited March 04, 2000).]
    "Not my circus, not my monkeys."



    • #3
      In addition to Eric's comments:

      1. If CHAIN is used in an EXE, the compiler is forced to add all libraries into the RTL because it has no idea what the chained module will require in terms of RTL support. To ensure that CHAINing works, the EXE has the full RTL linked in.

      2. ERROR testing adds 1 to 2 bytes per statement to the compiled code size. I'm sure that this is discussed in the documentation.

      At its largest, the entire PB/DOS runtime library is < 64Kb. You'll probably find that most effort should be centered around improving your code, rather than worrying about the size of the RTL.

      Finally, just because the EXE size is smaller does not strictly mean that the memory usage of the application code will be smaller; heap usage is a product of your design choices and coding style.

      ------------------
      Lance
      PowerBASIC Support
      mailto:[email protected]



      • #4
        In addition, program size does not necessarily equate to efficiency. Given the following two calculations:

        D = SQR((A - B) ^ 2 + (C - D) ^ 2)

        D = (A + B) * (C - D)

        efficiency is increased by breaking the first down into steps but decreased by breaking the second down. In a few tests I've run, the time required for the first calculation is roughly cut in half by breaking it into steps, while doing the same with the second actually increases the time by a few seconds.
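        Broken into steps, the first calculation would look something like this (the temporary variable names are mine):

        Code:
        T1 = (A - B) ^ 2
        T2 = (C - D) ^ 2
        D = SQR(T1 + T2)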

        Also, there is a difference in whether the calculation is performed in a SUB procedure or a FUNCTION procedure. With the first calculation, a SUB is a few seconds faster than a FUNCTION, but with the second, a FUNCTION is a few seconds faster than a SUB. At least it is on my archaic 486 33 MHz machine and my 166 MHz laptop.

        I've also found that $OPTIMIZE SIZE/SPEED, at least with my test progs, seems to have little effect on speed unless other options are set to OFF, especially $ERROR.

        ------------------
        Walt Decker



        • #5
          Thank you for the comments. As a grey-bearded FORTRAN programmer I instinctively use INTEGER rather than floating point data types, and I have been trying to specify BYVAL wherever possible. I hadn't realised just how inefficient a SELECT CASE statement was, though.

          My main criterion is to increase free space during execution. I'm already using ABSOLUTE ARRAYS to make use of the mono display area at A000 for storage (just don't try to use it as a disk buffer - that works in DOS but not when run under Windows).

          On variable types, is WORD as efficient as INTEGER (or DWORD as efficient as LONG) for FOR loops and array subscripts (or does the compiler start converting values)?

          Since I will probably soon be forced to use CHAIN, the modularity of the run time library won't be significant, but when you say all of the RTL, you aren't including the COM and other libraries that have their own keywords, are you? No one answered my query: are $COM 0 and $LIB COM OFF needed if your program does not have any COM statements? Or are they needed if you use CHAIN? Similarly for the various graphics and sound library code.

          On strings, it is nice that the compiler looks back and reuses the same string definition: presumably that means that where you have similar strings (e.g. error messages) it is more space efficient to break them up into common and differing parts:

          e.g.

          IF errcode = 1 THEN
              PRINT "Error during read "; "Read past end of file."
          ELSEIF errcode = 2 THEN
              PRINT "Error during read "; "Unexpected data format."
          END IF

          I understand that ON ERROR adds an INT instruction to each statement but approximately how much overhead do the various other $ERROR and $OPTION options add anyway? Bounds checking is nice but, depending on how it is done, could be expensive in memory resources. I need to decide whether the memory saved is worth the risks.




          ------------------
          The PC Guru, Austin Tx
          Working on data recovery tools.



          • #6
            Paul --

            Lance may correct me, but I believe that the entire "main" RTL is included if you use CHAIN. The RTL is only included in .EXE files, not .PBC (chain) files. When you CHAIN, the RTL is "left behind" for the next PBC file to use. So if an EXE chained to a .PBC module that contained a function like COS (for example) and the original EXE's RTL did not include it, the program could not continue running. But the $LIB metastatements can still be used to control certain parts of the RTL... see ERR 204 and 244 in the PB/DOS docs.

            > is WORD as efficient as INTEGER (or DWORD as efficient as LONG)

            No. Not nearly as efficient, as I understand it. It's not just a matter of size, but of signed vs. unsigned. The "native DOS" 16-bit data type is INTEGER, and the "native" 32-bit Windows data type is LONG. Using anything else requires the compilers to jump through hoops. (This is true of any compiler, not just PowerBASIC.)

            > it is more space efficient to break them up into the common
            > and different parts

            Not (as I remember it, anyway) for short strings like your examples. If you save (say) 16 bytes by doing that, but the compiler has to use 32 bytes of executable code to add the strings together at runtime, you lose the advantage. (I made up those numbers... I don't know where the actual break-even point is.) But if the first 16 characters are repeated in 10,000 different strings and you can write a function that tacks on the prefix, you might gain some space.
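            A sketch of that prefix-function idea (the function name and messages are made up):

            Code:
            FUNCTION ReadErr$(Detail$)
                ReadErr$ = "Error during read " + Detail$
            END FUNCTION

            ' The shared prefix is stored once; only the tails differ:
            PRINT ReadErr$("Read past end of file.")
            PRINT ReadErr$("Unexpected data format.")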

            If your primary concern is runtime memory space, then I'd add one "condition" to my advice about using INTEGERs. If most of your numbers are byte-sized and you use a lot of large arrays, you might actually be better off using BYTEs. The arrays themselves would only take up half the space that way. The best example of this is BIT arrays. My largest PB program uses a huge, 3-D true/false array so I create a pseudo-bit-array system using the BIT functions. The arrays end up being 16 times smaller (!) than when I used INTEGERs, and that more than makes up for the additional code that is required to "manage" the non-INTEGER values.
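            A rough sketch of that pseudo-bit-array idea, using the PB/DOS BIT statement and function (check the exact syntax against your docs; the index arithmetic is my own):

            Code:
            DIM Flags%(0 TO 4095)            ' 4,096 INTEGERs hold 65,536 flags

            ' set flag number I& (0..65535)
            BIT SET Flags%(I& \ 16), I& MOD 16

            ' test flag number I&
            IsSet% = BIT(Flags%(I& \ 16), I& MOD 16)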

            As far as the "overhead" for things like bounds checking goes, it will depend almost entirely on your program. For example, is it array-intensive? All I can really recommend is "try it and see". It should only take a few seconds to re-compile a module with a different metastatement... Less time than typing a message for the BBS...

            -- Eric

            ------------------
            Perfect Sync: Perfect Sync Development Tools
            Email: [email protected]



            [This message has been edited by Eric Pearson (edited March 05, 2000).]
            "Not my circus, not my monkeys."

