Vector instructions with PowerBASIC

  • Vector instructions with PowerBASIC

    In Michael J. Flynn's taxonomy, SIMD stands for Single Instruction, Multiple Data; such instructions are also colloquially called vector instructions. An application that may take advantage of SIMD is one where the same value is added to or subtracted from a large number of data points, a common operation in many multimedia applications.

    With a SIMD processor there are two improvements to this process. First, the data is understood to be in blocks, and a number of values can be loaded all at once. Instead of a series of instructions saying "get this pixel, now get the next pixel", a SIMD processor has a single instruction that effectively says "get lots of pixels". For a variety of reasons, this can take much less time than getting each pixel individually, as in a traditional CPU design. Second, a single instruction then applies the operation to all of the values in the block at once. Some common SIMD extensions are MMX, 3DNow!, SSE, and AltiVec (related to VMX).
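To make the "get lots of pixels" idea concrete, here is a minimal sketch in C. It is not taken from SIMD.ZIP; the function names brighten_scalar and brighten_simd are made up for illustration, and it assumes 8-bit pixel values and a compiler that defines __SSE2__ when SSE2 is enabled (as GCC and Clang do). Without SSE2 the second function simply falls back to the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Scalar version: one pixel per iteration, clamped at 255. */
void brighten_scalar(uint8_t *pixels, size_t n, uint8_t amount)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)pixels[i] + amount;
        pixels[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}

/* SIMD version: 16 pixels per iteration where SSE2 is available. */
void brighten_simd(uint8_t *pixels, size_t n, uint8_t amount)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i add = _mm_set1_epi8((char)amount);   /* 16 copies of amount */
    for (; i + 16 <= n; i += 16) {
        /* One load fetches a block of 16 pixels. */
        __m128i block = _mm_loadu_si128((const __m128i *)(pixels + i));
        /* One saturating add updates all 16 pixels. */
        block = _mm_adds_epu8(block, add);
        _mm_storeu_si128((__m128i *)(pixels + i), block);
    }
#endif
    /* Remaining pixels (or all of them without SSE2) go the scalar way. */
    brighten_scalar(pixels + i, n - i, amount);
}
```

Both functions should produce the same result for any buffer length; the SIMD variant just needs far fewer loop iterations for large images.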

    On the other hand, we no longer live in the 8086 era, and the 80386 seems to be history as well. We now have the Pentium II, III and IV, Centrino technology, the Athlon, the Opteron and so on, with several deep instruction pipelines, branch prediction, out-of-order execution, concurrent processing, powerful MMX and SSE instructions, and, last but not least, multiple processor cores. That is the state of the art. But how can we use these features in our programs?

    Currently, implementing an algorithm with SIMD instructions usually requires human labor; most compilers don't generate SIMD instructions from a typical high level language program. Vectorization in compilers is an active area of computer science research. As a first step, I've uploaded the file SIMD.ZIP, with the description "Vectorization with SIMD instructions", to the PB file base (Download). It can be found at http://www.powerbasic.com/support/do.../assembler.htm or
    http://www.powerbasic.com/support/downloads/dos.htm.

    SIMD.ZIP is Free Software under the GPL (General Public License); please check http://www.gnu.org/copyleft/gpl.html. All sources are included. Although the package currently contains only PB/DOS programs, it should be easy to carry the underlying ideas over to PB/Win and PB/CC.

    At the moment, there are only two ways to use SIMD instructions with PowerBASIC: inline assembly and separate assembly language modules. What could the future hold? Some C++ compilers support so-called intrinsic functions. Most intrinsic functions generate one machine instruction each, so each is equivalent to an assembly language instruction.

    Coding with intrinsic functions is a kind of high-level assembly. It can easily be combined with high level language constructs such as IF-statements, loops, functions, classes and operator overloading. The invention of intrinsic functions has made it much easier to do programming tasks that previously required coding with assembly language syntax.
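As a rough illustration of this style, the following C sketch computes a dot product with SSE intrinsics inside an ordinary loop and IF statement. It is only an example under stated assumptions: the function name dot_product is invented here, the instruction names in the comments are what each intrinsic typically compiles to, and the __SSE__ check relies on the GCC/Clang convention (other compilers may need a different test). Where SSE is not enabled, the plain loop at the end does all the work.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics */
#endif

/* Dot product of two float arrays of length n. */
float dot_product(const float *a, const float *b, size_t n)
{
    float result = 0.0f;
    size_t i = 0;
#if defined(__SSE__)
    if (n >= 4) {
        __m128 acc = _mm_setzero_ps();                 /* typically XORPS  */
        for (; i + 4 <= n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);           /* typically MOVUPS */
            __m128 vb = _mm_loadu_ps(b + i);
            acc = _mm_add_ps(acc, _mm_mul_ps(va, vb)); /* MULPS, then ADDPS */
        }
        /* Add up the four partial sums held in the register. */
        float lanes[4];
        _mm_storeu_ps(lanes, acc);
        result = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    }
#endif
    /* Leftover elements (or everything, without SSE) in plain C. */
    for (; i < n; i++)
        result += a[i] * b[i];
    return result;
}
```

The vector part reads like normal C, yet each intrinsic call corresponds closely to one SSE instruction. Note that the SIMD path adds the products in a different order than a purely scalar loop, so results for arbitrary floating point data may differ in the last digits.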

    The main advantage is: There's no need to learn assembly language. But such intrinsic functions for PowerBASIC are currently not available and must be written, perhaps as a result of a programming project. Any feedback is appreciated.

    Gunther
    Encoding Team

  • #2
    Hello Gunther,

    I have tested SIMD.ZIP. I could compile both programs; they run flawlessly. The results are impressive. But I have a few questions:
    1. While the FPU and SSE code runs for a few seconds, the BASIC code runs for several minutes on the same task. Is that okay?
    2. "... most compilers don't generate SIMD instructions from a typical high level language program."
      Are you sure that PowerBASIC does not generate SIMD instructions?
    3. Do you know of any literature on this topic?
    4. "But such intrinsic functions for PowerBASIC are currently not available and must be written, perhaps as a result of a programming project."
      I am not sure what you mean. Could you explain that point a bit more, please?


    Thank you
    Anke






    • #3
      Hello Anke,

      Thank you for your feedback.

      "While the FPU and SSE code is running a few seconds, the BASIC code is running several minutes for the same task. Is that okay?"
      Yes, at least for PB/DOS. But I'm sure that if someone ported the source to, say, PB/CC, the BASIC code would give better results. There are a lot of reasons for that behavior.

      "Are you sure that PowerBASIC does not generate SIMD instructions?"
      It depends. PB/DOS won't generate MMX, SSE or SSE2 instructions. For PB/Win or PB/CC I'm the wrong person to ask; I think only Bob Zale has the right answer. As far as I know, there are only two compilers that generate more or less efficient SIMD instructions: the Intel C and Fortran compilers and GCC. But both PB/Win and PB/CC support the broad range of SIMD instructions through the inline assembler, so it's easy to implement such algorithms.

      "Do you know some literature about that topic?"
      The Intel and AMD manuals are an excellent source. Furthermore, there are some tutorials on the net, but these are rare birds.

      "I am not sure what you mean. Could you explain that point a bit more, please?"
      Of course. I think that some interested programmers should work together. The result could be a library that is callable from PB and easy to combine with BASIC instructions.

      I hope that helps.

      Gunther
      Encoding Team



      • #4
        Yes, that helps. Do you think that the example programs will also work with PB/CC?
        "The Intel and AMD manuals are an excellent source."
        Yes, I've checked some AMD manuals, because my machine has a 64 bit AMD Athlon X2, which supports instructions up to SSE3. I know that from your test programs. The AMD Architecture Manual says that my CPU has a lot of new registers, for example R0 - R15, each of them 64 bit wide. Can one use those registers with PB?
        "I think that some interested programmers should work together."
        Good point. I am very interested in such a project. Let me know if you need help.

        Anke



        • #5
          About your CPU question:
          "The AMD Architecture Manual says that my CPU has a lot new registers, for example R0 - R15, each of them 64 bit wide. Can one use those registers with PB?"
          Modern processors have a lot of interesting new registers. But you can't use the 64 bit wide registers R0 - R15 in your PB applications, although it would be neat. The same is true for the last XMM registers (XMM8 - XMM15), which are 128 bit wide. There's no chance. That has to do with the so-called Long Mode. If you had a native 64 bit OS (Windows or Unix) and a native 64 bit PB compiler, then you could use the complete register set.
          "Do you think that the example programs will also work with PB/CC?"
          Yes.
          "I am very interested in such a project. Let me know, if you need help."
          Yes, fine.

          Gunther
          Encoding Team

