A novel large-bit-size architecture and microarchitecture for the implementation of Superscalar Pipeline VLIW microprocessors
Abstract
Microprocessors have grown tremendously in computing and data-crunching capability since the microprocessor was invented. Today most microprocessors on the market are 32-bit, while the latest microprocessors from IBM, Intel and AMD are 64-bit. There are two possible paths to further increase the computational capability of a microprocessor. The first is to increase the bit size of the microprocessor to 128, 256 or 512 bits: the larger the bit size, the more data can be processed at any one time. The second is to implement multiple microprocessor cores in a single microprocessor unit. For example, Intel's Pentium 4 Dual Core and AMD's Athlon Dual Core both have two cores within a single microprocessor unit. The latest quad-core microprocessors from Intel and AMD use either a pseudo-quad-core or a full quad-core configuration. In a pseudo-quad-core configuration, two dies, each containing a dual-core microprocessor, are packaged in a single microprocessor unit, whereas a full quad-core configuration places four cores on one die in a single package. Both methods have advantages and disadvantages, and each raises different design issues and engineering limitations.

This work explores the first method: increasing the data bus size of the microprocessor from 32/64 bits to 128/256/512 bits to obtain more data-crunching capability. A 64-bit superscalar pipelined VLIW microprocessor with four stages (fetch, decode, execute, writeback) and three parallel pipes is implemented in a TSMC 0.35 micron process, and the implementation is then expanded to 128, 256 and 512 bits in the same process. To prove that such a large-bit-size VLIW microprocessor can indeed be implemented, the 64-, 128- and 256-bit versions are programmed on an Altera Stratix II EP2S180F1508I4 FPGA and back-annotated for verification.

In the TSMC 0.35 micron implementation, critical-path analysis of the 128-, 256- and 512-bit designs shows that the worst path lies in the adder of the ALU in the execute stage. Different adder architectures are therefore investigated for their suitability when a large-data-bus-size adder is synthesized for the ALU. An adder algorithm that uses repetitive constructs in a parallel structure, allowing efficient synthesis at large data bus sizes, is proposed for the ALU adder.

This work has two main findings. First, when synthesized as a large-bit-size adder, the proposed adder architecture gives a better performance-gate-count product than conventional adder architectures. Second, implementing 64-, 128- and 256-bit data sizes on an Altera Stratix II EP2S180F1508I4 FPGA serves as a proof of concept that a large-bit-size VLIW microprocessor is feasible.
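The abstract does not spell out the structure of the proposed adder. Purely as a point of reference, here is a sketch of a Kogge-Stone parallel-prefix adder. This is a well-known adder whose carry network is a single generate/propagate combine step repeated over log2(n) levels, which is the kind of regular, repetitive construct that scales to 128/256/512 bits. It is not claimed to be the author's proposed architecture. The names `kogge_stone_add` and `prefix_cost` are hypothetical, and the cost figures come from an idealized unit-delay model, not from synthesis results.

```python
"""Hypothetical sketch: Kogge-Stone parallel-prefix adder at large widths.

Not the thesis's proposed adder; an illustrative, well-known prefix adder.
"""
from __future__ import annotations


def kogge_stone_add(a: int, b: int, width: int, cin: int = 0) -> tuple[int, int]:
    """Add two unsigned `width`-bit integers through a Kogge-Stone carry network.

    Returns (sum mod 2**width, carry_out). Simulates the logic only, not timing.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    # Per-bit generate (a AND b) and propagate (a XOR b) signals.
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    # Fold the carry-in into bit 0's generate so the prefix tree covers it.
    G = g[:]
    P = p[:]
    G[0] = g[0] | (p[0] & cin)
    # The same combine cell is repeated at distances 1, 2, 4, ... (log2(width) levels).
    dist = 1
    while dist < width:
        new_G = G[:]
        new_P = P[:]
        for i in range(dist, width):
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2
    # G[i] is now the carry out of bit i; the carry into bit i is G[i-1] (cin for bit 0).
    s = 0
    for i in range(width):
        carry_in = cin if i == 0 else G[i - 1]
        s |= (p[i] ^ carry_in) << i
    return s, G[width - 1]


def prefix_cost(width: int) -> dict:
    """Idealized unit-delay counts for the carry network only (assumed model)."""
    levels = 0
    cells = 0
    dist = 1
    while dist < width:
        cells += width - dist  # one combine cell per position i >= dist
        levels += 1
        dist *= 2
    return {
        "ripple_levels": width,  # carry passes through one cell per bit
        "ripple_cells": width,
        "ks_levels": levels,  # ceil(log2(width)) for width >= 2
        "ks_cells": cells,
    }


if __name__ == "__main__":
    import random

    for w in (64, 128, 256, 512):
        x, y = random.getrandbits(w), random.getrandbits(w)
        total, carry = kogge_stone_add(x, y, w)
        # Cross-check against Python's arbitrary-precision integers.
        assert total == (x + y) % (1 << w) and carry == (x + y) >> w
        print(w, prefix_cost(w))
```

Under this model, at 512 bits the Kogge-Stone carry network is 9 combine levels deep, compared with a 512-cell ripple chain, but it uses 4,097 combine cells. A performance-gate-count comparison between adder architectures has to weigh exactly this trade-off between logic depth and area.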