Style or speed, a choice?


  • Guest replied
    Hey Davide

    It ain't over till the fat lady sings you know :-)

    No - you ARE to be congratulated, Davide. The whole issue of pre-processors brings out what is wrong, in my opinion. Why not begin a new thread on pre-processors? Did you take a look at BASNIX?

    And I don't buy your story about your application! We simply MUST see what it is about. Do you have a web page featuring it?

    Thanks Anton

    ------------------



  • Davide Vecchi
    replied
    Anton, come on, don't be irritated.
    Before jumping on the table and shaking your finger as an unneeded defender of the compiler, you had better read the thread (i.e., the thing you talked about) to notice that it was not me who recalled that several compilers do these things (things I don't ask the compiler to do), and in general to see what was actually said about the PB compiler.

    Michael, yes, source code is structured for maintenance, at least mine is. And this is the ONLY reason why I would like a preprocessor. Is it so hard to understand? What's the problem?

    Anyway you're right, this thread needs to be closed.
    It was expected that a topic such as a preprocessor would irritate many people, mainly experienced software marketers and skilled software builders and experts. Since you are very intelligent, I don't need to explain why. Those who intended to understand already did so from the beginning, as did those who intended not to. It was just not expected that they would decide to invest time and exposure in expressing their irritation.
    And I will not give any real-world example to show why, in applications such as mine, a preprocessor is a must. If some day I come to have competitors for this application, I hope they will be minded like you about speed issues.

    Davide Vecchi
    [email protected]

    ------------------



  • Michael Mattias
    replied
    One other thing: Don't worry about the source code.

    Source code is structured for maintenance (um, well, that's the theory). What counts is the performance of the compiled code.

    MCM




  • Guest replied
    Davide

    Geez you have been busy. Methinks you would have been a better playwright! Thanks for the entertainment. Is it true that Lance is selling tickets ;-)

    Actually I think your argument sticks. PERIOD. One thing I love about PB DOS is that you simply KNOW there is a powerful monkey under the hood.

    As regards your preprocessor etc., please take a look at http://kwazulunatal.com/seal.html and look at the boardroom. These guys have been busy creating things of interest, one of which is a site that contains a very interesting project called BASNIX, which has a preprocessor along the lines of your ideas. Take a look.

    Also, one final statement, or is it a META one? DON'T you just WISH for the compiler of the future? TALK TO IT and it does WHAT YOU WANT, figuring out exactly how to do it via online libraries the world over... dreams..

    Thanks
    Anton

    ------------------



  • Davide Vecchi
    replied
    Walt,

    in my opinion, whatever optimization we do inside the procedure, it remains true that if we then turn it into a non-procedure, we'll collect additional speed. The speed comes from the fact that it's no longer a called thing.

    However, I think your point shows well a major difference between kinds of optimization when they are catalogued by the damage they do to code readability (which has been my only point since the beginning).
    The optimization you are attempting (splitting an expression) leads to better readability, so from my point of view it not only can be done by the programmer, it must be.
    Not so, e.g., for exploding a time-critical call. That leads to worse readability and should not be done; not by the programmer, and above all not in the code he will maintain. It should be done, of course, but by a preprocessor.

    I think that if we keep discussing whether a preprocessor could help, it will take longer than writing one in PB/DOS.

    I'll monitor with interest your possible posts about expression optimization, since I have a lot of math in my code (inside time-critical called procedures with lots of local and passed vars, as you can guess).

    Davide Vecchi
    [email protected]

    ------------------



  • Walt Decker
    replied
    One thing I did try but didn't mention was breaking your addition/multiplication into three steps. With most other BASIC compilers, doing so would somewhat decrease the throughput; however, when I did so with PB it actually improved the time by roughly 17 seconds on my machine. I find that rather strange and intend to look into it further.

    ------------------



  • Davide Vecchi
    replied
    Thanks for your two cents Lance.

    About the commercial considerations, I read and noted them with interest, but I want to highlight that I was in NO WAY talking about the compiler's lacks or inefficiencies (it's probably the most efficient, most complete and best-supported BASIC compiler on the market, and I hope never to forget it). I was focusing strictly on a precompiler.

    When I have an application that is not time critical, I want nothing to be unrolled, exploded or optimized; I want the procedures to be really called and the variable scopes to be really separated at runtime too, exactly per the rules of PB, because this makes the runtime safer and lets possible late bugs be caught more easily should they arise. So I LIKE that the compiler, as I think it actually does, supports (and doesn't unroll or explode) loops, procedures, local & passed vars and other things that add value to having the source available and not only the exe.

    About Lance's points: number 1 confirms to me that pointers and ASM are anything but an ultimate answer to speed questions.
    Numbers 2 and 3 apply to unrolling loops or blindly exploding calls, but this wouldn't be mandatory: just exploding the call inside the loop, without unrolling the loop itself, already gets you somewhere new. There could be options, and recognition of REMmed flags in the code.
    About number 4 I really don't know; with number 5 I agree.

    I didn't want to suggest that PB add similar optimization features to the compiler because, as you highlighted, only some of us could benefit from them, and there is no reason to force everyone to buy features they don't need or even hate (this reminds me of another software company, not PowerBASIC Inc.).
    Maybe PB Inc., for obvious knowledge reasons, would be an unmatchable master at writing a precompiler for its own compiler, to be sold separately, but that is another matter; that way we would get into commercial territory again.

    I am thinking of a separate tool that provides TERRIFIC and CHEAP (in terms of development time and required knowledge) speed bonuses to time-critical programs that pointers and ASM have already optimized as far as they can (and thereby made the code forever darker and snobbish, sorry). Don't laugh at my enthusiasm; it comes directly from the real-world economics of my main PB/DOS application.

    We can have as many pointers, ASM routines and BYVALs as we want in our time-critical program, but as long as we also have critical called procedures with local & passed vars, we will keep giving away MUCH speed. Unrolling and exploding them by hand, we give away MUCH readability (and development time, and more).
    This is not an opinion any more; it is a fact, maybe a partial one, but a fact. Questions regarding ASM or pointers apply to the general topic of optimization but don't apply to this fact.

    Michael, I tried to query the stuff between my ears, and it led me to think about a precompiler; and it assured me that, although it has high self-esteem, it wouldn't be offended by being helped by a tool that is the result of someone else's wetware. Once again, I was just asking why I have to force myself to deal with dark and bloated code when I simply don't need to. Why this choice?
    I'm not looking for the best optimizer; I'm looking for an optimizer that accomplishes my tasks. This won't make me forget that I have the responsibility to write good code. Writing good code is usually something like a physical pleasure to me, really.
    I agree with you that the best optimizer is not extra software.

    If I find something like a precompiler, I won't fail to post it here.

    Davide Vecchi
    [email protected]

    ------------------



  • Michael Mattias
    replied
    Repeating something I posted on Usenet a while ago:

    The best optimizer is not extra software; it is better use of existing "wetware" - you know, that stuff between your ears.

    MCM




  • Lance Edmonds
    replied
    I think I'll put my 2 cents in here, just in the interests of a broad discussion. These comments are based on my personal experience and observations, and my knowledge of the commercial world of software development and sales.

    In my opinion, implementing most of the suggestions/discussions so far should be tackled by a "preprocessor" rather than the compiler. By using a preprocessor, we guarantee that there are no new compiler issues that can complicate life.

    Optimizations such as "unrolling" code (i.e., duplicating the contents of a loop sequentially instead of actually performing a loop) are one optimization a compiler *could* make, but they are easier to implement in a pre-processor.
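    Lance's description of unrolling can be sketched in C (a hypothetical illustration of the transformation, not PowerBASIC output):

    ```c
    #include <stdio.h>

    int main(void) {
        /* Rolled loop: one compare-and-branch per element. */
        int sum = 0;
        for (int i = 0; i < 8; i++)
            sum += i;

        /* Unrolled by 4: same arithmetic, a quarter of the loop
           overhead. This is exactly the kind of textual rewrite a
           preprocessor could apply before compilation. */
        int sum2 = 0;
        for (int i = 0; i < 8; i += 4) {
            sum2 += i;
            sum2 += i + 1;
            sum2 += i + 2;
            sum2 += i + 3;
        }

        printf("%d %d\n", sum, sum2);  /* 28 28 */
        return 0;
    }
    ```

    Both loops compute the same sum; only the branch count differs.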

    In the case of sub/function substitution (making the code inline, as described above, rather than called as a procedure), a huge list of possible problems can stop the optimization in its tracks:
    1. Local storage (as it currently works now) is no longer available... any code that makes direct use of the call stack through pointers or assembly would need to be completely "rewritten" by the compiler... this is an (almost) impossible task, as the compiler cannot judge all possible runtime scenarios and produce code that is guaranteed to run as the programmer intended.
    2. What happens when the sub/function is called hundreds of times inside a single loop, or in hundreds of places in a single BAS module?
    3. What about when the call is in a loop whose count is not determined until runtime?
    4. What about CALL DWORD implementations?
    5. ... there are a huge number of other problems... I'm sure others will think of them.

    For all of these problems and more, we would need to add significant code to the compiler... for starters, this increases compile time, and it is possible that such wide-reaching optimizations may produce unexpected results (compared to what the programmer expected) that can add significantly to the process of debugging code. In addition, any changes to the PowerBASIC compiler need to take into account all of the existing code that has been written without problems. For example, even introducing a new keyword may break some code due to a variable name conflict.

    As it stands right now, PowerBASIC's compilers do make certain (low-level) optimizations that most people do not even notice, but these same people reap all the benefits: faster/smaller executable code. The compiler compiles the existing code "as the programmer intended" without, for example, invisibly rewriting your code as it compiles.

    If you need to significantly improve the performance of your code, then the tools for such a task are already at hand: inline assembly code and pointers. Indeed, this has always been the recommendation of PowerBASIC, Inc. (for as long as I can remember): write your code in the high-level BASIC language and, if necessary, optimize your code with assembly/pointers to squeeze out extra performance.

    It is a fact that PowerBASIC compilers already produce some of the fastest compiled code around, comparable to optimizing C compilers, and we are competitively priced.

    Given that the extensive time, effort, and R&D investment required to implement such changes to the compiler must be commercially viable, these costs would have to be passed on to the customer, making the product instantly more expensive... the higher the price, the fewer copies are sold. At the current price of US$59 for a full-product version of PB/DOS (upgrade prices are lower still), we would need to sell thousands of compilers to *new* customers (and a lot of upgrades) just to claw back the costs of development. Any price hike is unlikely to be popular, given the moderate performance increase the compiler might offer over existing versions. To be viable, the upgraded compiler would need to deliver a significant performance gain across the board without breaking existing code.

    There rest my personal reasons for wanting such changes implemented by a preprocessor... the compiler (as it stands today) will happily compile such code, not one new bug is introduced into the compiler, no existing code is broken, and no avalanche of tech support calls is made.

    That said, I would never cast aside any suggestion by a customer for improvement of the PowerBASIC compilers... every suggestion (including those in this thread) is reviewed religiously by R&D and added to the "wish list". Every single new feature added to a PB product is carefully scrutinized by Bob Zale himself. Only Bob can decide to include or exclude a particular suggestion.

    [/Rant mode off]

    Regards,
    Lance



  • Davide Vecchi
    replied
    Walt,

    about the benchmark: after I corrected the error in the DIM (SINGLE is needed instead of INTEGER), the time was 110.73" instead of 110.78", so virtually the same.

    In my example you could easily eliminate the local variables and use the argument values directly because of the way it was written, but you shouldn't focus on this; of course it makes no sense to write A1 = A0, B1 = B0 etc. just to use them in an expression, as I did; let's say those 4 assignments were 4 different statements, the concept is the same.

    I think you're right about BYVAL; I like it even as a way to mark read-only vars; however, passing the 4 parameters BYVAL, the 110.73" became 107.76".

    I think a destructurer could check the data types, and could do some optimizations on this too.
    Of course, the first one who has to care about these things is the programmer. I don't want to minimize the role of the programmer. And if he uses pointers or assembly, as you suggested, he can achieve even bigger speed optimizations.
    Although I can't yet see what's wrong with an automatic code optimizer, I agree with you that if you want real optimization, the programmer has to do it.

    Davide Vecchi
    [email protected]

    ------------------



  • Walt Decker
    replied
    Well, Davide, let me point out a couple of things about your benchmark. I ran it first as-is with 100,000,000 iterations on my 166 MHz laptop. It took about 135.172 seconds. I next changed your proc declarations to the proper data type and ran it with the same number of iterations. It took about 109.851 seconds. Next I eliminated your local variables and used the argument values directly. With the same iterations it took about 100.349 seconds. Finally, I passed the arguments by value, except for X of course. Using the same number of iterations it took about 92.331 seconds.

    Now, when you unrolled your code, or destructured it, as you put it, you took care of those small details. Granted, the unrolled code in the main body took only about 52.832 seconds, but what is really important is that by paying attention to small details, like making sure your local variables are of the same type as those passed and passing the values BYVAL instead of BY REFERENCE, you can significantly improve the throughput. A compiler can't do those things for you. A destructurer can't check the data types for you. Those things the programmer has to do. By using the tools that PB 3.5 provides, e.g., pointers and inline ASM, I'm sure you could cut the run time of your procs without significantly "dirtying" the code. Take a look at Randall Hyde's discussions concerning this very concept.

    Compilers, destructurers, assemblers and linkers can only do so much. If you want real optimization, the programmer has to do it.
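    Walt's BYVAL point carries over to other languages; here is a rough C analogue of by-reference versus by-value parameter passing (illustrative only, and the actual timing difference depends on the compiler):

    ```c
    #include <stdio.h>

    /* By reference (PB's default): every use of a parameter goes
       through a pointer dereference. */
    static double calc_byref(const float *a, const float *b,
                             const float *c, const unsigned long *n) {
        return (double)((*a + *b) * (*c - (float)*n));
    }

    /* By value (PB's BYVAL): the values are copied once at the call
       and then used directly, with no indirection in the body. */
    static double calc_byval(float a, float b, float c, unsigned long n) {
        return (double)((a + b) * (c - (float)n));
    }

    int main(void) {
        float a = 10.0f, b = 20.0f, c = 30.0f;
        unsigned long n = 5;
        double x = calc_byref(&a, &b, &c, &n);
        double y = calc_byval(a, b, c, n);
        printf("%.0f %.0f\n", x, y);  /* 750 750 */
        return 0;
    }
    ```

    The arithmetic mirrors the `(A + B) * (C - N)` expression from the benchmark posted later in the thread.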

    ------------------



  • Davide Vecchi
    replied
    Erratum: the DIM inside the SUB should use SINGLE instead of INTEGER. The speed difference is the same.

    Davide Vecchi
    [email protected]

    ------------------



  • Davide Vecchi
    replied
    This is the benchmark of destructuring a SUB with 5 passed vars and 4 local vars. It ran on a Pentium II 266 MHz, under Caldera DR-DOS 7.03, PB 3.5.

    ============= Beginning of original source code =============
    $DIM ALL
    $CPU 80386
    $OPTIMIZE SPEED
    $OPTION CNTLBREAK OFF
    $COMPILE EXE

    DEFINT A-Z

    DIM A AS SINGLE, B AS SINGLE, C AS SINGLE, X AS DOUBLE
    DIM NumIter AS DWORD, nIter AS DWORD
    DIM zTime AS EXT

    DO

    INPUT "Num. of iterations "; NumIter

    A = 10: B = 20: C = 30

    PRINT USING$("#############,", NumIter); " iterations =";

    zTime = TIMER

    FOR nIter = 1 TO NumIter

    CALL Proc(A, B, C, nIter, X)

    NEXT nIter

    zTime = TIMER - zTime: IF zTime < 0 THEN INCR zTime, 86400##

    PRINT USING$("######,.######", zTime); " sec."

    LOOP UNTIL NumIter <= 0

    END


    SUB Proc(A0 AS SINGLE, B0 AS SINGLE, C0 AS SINGLE, N0 AS DWORD, X0 AS DOUBLE)

    DIM A1 AS LOCAL INTEGER, B1 AS LOCAL INTEGER, C1 AS LOCAL INTEGER, N1 AS LOCAL DWORD

    A1 = A0
    B1 = B0
    C1 = C0
    N1 = N0

    X0 = (A1 + B1) * (C1 - N1)

    END SUB
    ============= End of original source code =============

    ======= Beginning of source code with the procedure destructured =======
    $DIM ALL
    $CPU 80386
    $OPTIMIZE SPEED
    $OPTION CNTLBREAK OFF
    $COMPILE EXE

    DEFINT A-Z

    DIM A AS SINGLE, B AS SINGLE, C AS SINGLE, X AS DOUBLE
    DIM NumIter AS DWORD, nIter AS DWORD
    DIM zTime AS EXT

    ' ------> Added by the destructurer: "globalization" of Proc's LOCAL vars.

    DIM Proc_A1 AS SINGLE, Proc_B1 AS SINGLE, Proc_C1 AS SINGLE, Proc_N1 AS DWORD

    ' <------

    DO

    INPUT "Num. of iterations "; NumIter

    A = 10: B = 20: C = 30

    PRINT USING$("#############,", NumIter); " iterations =";

    zTime = TIMER

    FOR nIter = 1 TO NumIter

    REM CALL Proc(A, B, C, nIter, X)

    ' ^ It is easy to determine that A, B, C, nIter are INPUT vars to Proc,
    ' and that X is OUTPUT var to Proc.
    ' So:

    ' ------> Destructured sub's body

    Proc_A1 = A
    Proc_B1 = B
    Proc_C1 = C
    Proc_N1 = nIter

    X = (Proc_A1 + Proc_B1) * (Proc_C1 - Proc_N1)

    ' <------

    NEXT nIter

    zTime = TIMER - zTime: IF zTime < 0 THEN INCR zTime, 86400##

    PRINT USING$("######,.######", zTime); " sec."

    LOOP UNTIL NumIter <= 0

    END

    ' SUB Proc(A0 AS SINGLE, B0 AS SINGLE, C0 AS SINGLE, N0 AS DWORD, X0 AS DOUBLE)

    ' It is easy to determine that A0, B0, C0, N0 are INPUT vars to Proc,
    ' and that X0 is OUTPUT var to Proc.

    ' DIM A1 AS LOCAL INTEGER, B1 AS LOCAL INTEGER, C1 AS LOCAL INTEGER, N1 AS LOCAL DWORD

    ' A1 = A0
    ' B1 = B0
    ' C1 = C0
    ' N1 = N0

    ' X0 = (A1 + B1) * (C1 - N1)

    ' END SUB
    ======= End of source code with the procedure destructured =======

    With 100,000,000 iterations, the structured code takes 110.78" and the destructured one takes 52.73" (-52%).
    With 1,000,000,000 iterations, the structured code takes 1107.68" and the destructured one takes 538.76" (-51%).

    It is A LOT. And in real-world code we will likely have more than 5 passed vars and, above all, more than 4 local vars.
    Sure, we can avoid using passed and local vars in critical procedures, or avoid critical procedures altogether, and things like that, but I don't want to give those up, and I don't want to pass up running at double speed. I want the wife drunk and the bottle full, because I can have both. So I am looking for a destructurer.

    Since I have never heard this kind of program mentioned, I assume it is not a very common kind of program. Years ago I co-wrote a destructurer for the odd BASIC of a programmable serial keyboard. Maybe some day I'll have the opportunity to write a destructurer for PB/DOS. If somebody is thinking about doing it, I submit my application to be a beta-tester.

    I think a program like this could do even more than the above. It could be an optimizer too. It could easily work on variable scopes to minimize their use while globalizing procedures' local vars. It could support REMmed statements that flag the critical procedures, the most likely truths in comparisons, and who knows what else; even better if it relied on a good knowledge of PB internals.
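    For readers outside PB, the same call-elimination the benchmark performs can be mirrored in C; this is a sketch of the transformation itself, not of the proposed tool:

    ```c
    #include <stdio.h>

    /* Structured version: a real call, with local copies of the
       arguments, mirroring SUB Proc in the benchmark above. */
    static void proc(float a0, float b0, float c0, unsigned long n0,
                     double *x0) {
        float a1 = a0, b1 = b0, c1 = c0;
        unsigned long n1 = n0;
        *x0 = (double)((a1 + b1) * (c1 - (float)n1));
    }

    int main(void) {
        float a = 10.0f, b = 20.0f, c = 30.0f;
        double x = 0.0, y = 0.0;

        proc(a, b, c, 5, &x);  /* called version */

        /* "Destructured" version: the call is replaced by its body
           and the locals are hoisted ("globalized") into the caller,
           exactly as the REMmed-out SUB in the benchmark shows. */
        float proc_a1 = a, proc_b1 = b, proc_c1 = c;
        unsigned long proc_n1 = 5;
        y = (double)((proc_a1 + proc_b1) * (proc_c1 - (float)proc_n1));

        printf("%.0f %.0f\n", x, y);  /* 750 750 */
        return 0;
    }
    ```

    Both paths compute the same value; the destructured path simply skips the call, the stack setup, and the local-variable creation.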

    Davide Vecchi
    [email protected]

    ------------------



  • Davide Vecchi
    replied
    On the contrary, I personally think it is NOT ONLY the programmer's job to optimize the code for speed where it is necessary. It is also the compiler's job. Furthermore, it could easily be a destructurer's job.
    I have perfectly understood that you all think the programmer can do optimizations. I already knew that, and I already did it. That is not the point.
    The problem is that the programmer does *NOT* have the freedom to do optimizations, because every time he adds an optimization to the code, he automatically makes his code dirtier and less readable and maintainable.

    Yes, there are procedures that are time critical and procedures that aren't. The compiler has no means of determining which is which, but it doesn't even need to. It could try to destructure ALL the procedures until the program gets too big; or a SUB/FUNCTION clause could exist that tells the compiler whether the procedure is critical. This is not the point either.

    Try to explain to your customer that the number-crunching code you are providing him has now become 10% slower just because you (the programmer) wanted to revisit it internally to improve source readability. Let me know what he answers. Mine answered "ah."

    Sorry guys but, as you can see, despite all the interesting opinions expressed here, the original post's question, although very simple, has not been answered yet: WHY do we want to be restricted to this choice when we could have both? Yes, why?

    Since I think it's not clear enough what (or HOW MUCH) I'm talking about, I'll put together a benchmark to show you.

    Davide Vecchi
    [email protected]

    ------------------



  • Walt Decker
    replied
    Personally, I think it is the programmer's job to optimize the code for speed where it is necessary. Many applications have processes or procedures that are time critical, while other processes or procedures in the same application are not. The compiler has no means of determining which is which, but the programmer knows.

    ------------------



  • Davide Vecchi
    replied
    I was thinking of destructured code as something the programmer would never even see; it would just be help given to the compiler in doing its optimization, which is obviously not complete (otherwise it wouldn't benefit from things such as moving a procedure body to the place of the calls).
    So the source could remain understandable and maintainable, but it would no longer cost speed. Of course, for the debugging and testing stages the original code could be used, and after debugging, the destructured code. Something like $ERROR x ON|OFF.

    Yes, I assumed that PB does optimizations, but I am not aware of which ones they are, nor can I figure out where one could have learned about this. I would never have imagined that a decrementing FOR/NEXT was faster.

    I think that avoiding multiple calls of the same procedure would take away much of the point of having procedures. Much better if I can call my procedure as often as I want (I wrote it for this) and let another program eliminate the calls and globalize the variables while I look elsewhere, just before compiling.

    About testing the "most likely" truth first when doing comparisons, I was thinking about things like this. But the benchmark I had posted showed that if I write the IF/THEN/ELSE block so that the most likely truth is in the IF, the code is slower, while if the most likely truth is in the ELSE, the code is faster. I couldn't draw anything usable from this, only that there are too many things for me to understand, and this kept me from benchmarking the order in which I write the conditions in an IF/THEN/ELSEIF.

    I think that until we can say that the compiler can't benefit from destructured code, this topic will make sense.

    Davide Vecchi
    [email protected]

    ------------------



  • Michael Mattias
    replied
    The whole idea of a high-level language - COBOL or BASIC - is to make the source code easy to understand and maintain. It is the job of the compiler to maximize efficiency.

    In the mainframe COBOL world, the compiler itself handles "unrolling" of loops, changing PERFORMs (the equivalent of GOSUB) into "in-line" statements, and the consolidation of similar code into called subroutines.

    As you may or may not know, PB does a fair amount of optimization: one example cited here in the past is that if you write "FOR X = 1 TO 10" and X is never referenced inside the loop, PB actually starts at 10 and DECREMENTS the counter, even though your source code uses an INCREMENTING counter.
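    The transformation Michael describes, counting down when the counter is unused, looks like this in C (an illustration of the idea, not PB's actual generated code):

    ```c
    #include <stdio.h>

    int main(void) {
        int n = 10;

        /* As written in the source: incrementing counter. */
        int work = 0;
        for (int x = 1; x <= n; x++)
            work++;

        /* What an optimizer may emit when x is never read in the
           body: count down to zero, because decrement-and-test-
           against-zero is cheaper on most CPUs than a compare
           against an arbitrary limit. */
        int work2 = 0;
        for (int x = n; x > 0; x--)
            work2++;

        printf("%d %d\n", work, work2);  /* 10 10 */
        return 0;
    }
    ```

    Both loops do the same amount of work; only the loop-control cost differs.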

    Let the compiler(s) you use handle the optimization; better yet, do all optimizations yourself by avoiding multiple calls of the same procedure, testing the "most likely" truth first when doing comparisons, and using the 'speed-friendliest' datatype for your platform whenever possible.




    ------------------
    Michael Mattias
    Racine WI USA
    [email protected]



  • Guest's Avatar
    Guest replied
    Some C compilers will let you tell them to "inline" functions... This tells the compiler to put the function inline in your code where possible instead of actually calling the function. There are other optimizations that some compilers can do as well, but I'm not very familiar with them .. as I rarely write in C.
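    The inlining Jason mentions is spelled `inline` (often `static inline`) in C; a minimal sketch:

    ```c
    #include <stdio.h>

    /* 'static inline' suggests substituting the body at each call
       site instead of emitting a CALL: no stack setup, no return
       jump. (The compiler is free to ignore the hint.) */
    static inline int area(int w, int h) {
        return w * h;
    }

    int main(void) {
        int r = area(6, 7);
        printf("%d\n", r);  /* 42 */
        return 0;
    }
    ```

    The source keeps its structured, readable form while the compiled code behaves as if the body had been written at the call site, which is essentially the thread's "destructuring" done by the compiler.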

    Jason


    [This message has been edited by Jason McCarver (edited February 18, 2000).]



  • Davide Vecchi
    replied
    Anton,

    Well, maybe that point about GOTO was wrong. But I think, in general, that there are ways of writing code that result in a faster exe and in badly structured source. For instance, in a previous thread (PBDOS executables speed) Lance answered:

    << "When PB calls a sub or function, it takes time to set the stack up with any parameters that need to be passed, so repetitively calling a sub/function inside a loop will definitely slow your code down.
    It is usually faster to use SHARED variables inside a sub/function, rather than pass them as parameters. Also, local variables take time to get set up when the sub/function is called, so by using shared variables for all of your speed-critical loops, you'll get the best performance within a sub/function." >>

    << "Finally, if the code in the sub/function is not too large (i.e., memory usage is not critical), copy it directly into the loop rather than calling a sub/function within the loop." >>

    Of course many of us prefer to use procedures, with local variables, trying to keep the different scopes as separate as possible, not repeating blocks, and so on. But why not let an appropriate program destructure (?) our structured code before compiling it? I imagine that a program like this exists, and that if it doesn't, then if someone writes it, I would use it, and everyone who is speed critical should use it.
    If I get further info I'll let you know.

    Davide Vecchi
    [email protected]

    ------------------



  • Guest replied
    Davide

    Doesn't the compiler do that automatically? I was under the impression that Mr. Zale hates bloatware, so I presumed the compiler restructured the calls made by bloat artists...

    You are right, though. I have used my own date conversion routine (a one-statement baby) rather than the elaborate ones available. It is definitely faster, although the logic contained in the one-liner is the same as in the bloated examples.

    If your theory is correct, please inform me, as it has implications... I would think.

    Anton

    ------------------

