Impact of VLSI Scaling and Synthesis on Multimedia Processor Cores
TOMÁS BAUTISTA AND ANTONIO NÚÑEZ
CAD Division, IUMA – Applied Microelectronics Research Institute. University of Las Palmas de Gran Canaria. E-35017 Las Palmas de Gran Canaria, Canary Islands, Spain. E-mail: bautista@cma.ulpgc.es
Abstract— In this paper we present experimental results obtained during the modelling, design and implementation of a full set of versions of the SPARC v8 Integer Unit core aimed at embedded applications in digital media products. VHDL has been the description language, Synopsys tools have been used for logic synthesis, and Duet Technologies’ Epoch has been used for the physical layout of the final circuits. The circuits have been mapped to 0.50 and 0.35 µm, three-metal-layer processes in order to study the impact of VLSI scaling on SPARC microarchitectural features. The quantitative results characterize suitable points in the design space and show how much microarchitecture, design, datapath granularity and module decisions affect performance and cost functions. Design space exploration down to physical layouts is made possible by modelling techniques based on configurable VHDL descriptions.
I. INTRODUCTION
As the feature sizes of technological processes approach deep-submicron technologies, and as metal layers are no longer a bottleneck, the integration density available on chip is becoming extremely high. The natural trend is to take this density for free. However, deeper submicron technologies also bring new problems, especially related to wire delay and power consumption. A new design paradigm has emerged: the synthesis of large cores (in-house proprietary or outsourced intellectual property) to ultimately build very large and complete systems on a chip. This paradigm also calls for a synthesis approach relying on architectural, logic and layout synthesis tools, in contrast to mainstream design approaches relying on full-custom cores. Complexity issues in processor architectures related to feature-size evolution have been studied by, among others, Palacharla et al. [fro]. They studied the tradeoff between hardware complexity and clock speed from an architectural point of view, using key pieces of full-custom layout to estimate the clock cycle of superscalar processors for geometries ranging from 0.8 µm to 0.18 µm. The layouts for the 0.35 and 0.18 µm processes were obtained by appropriately shrinking the layouts for the 0.80 µm process. We set the goal of conducting a similar study, but under a “synthesis-based approach” to design rather than a “full-custom-based approach”, and of also analyzing the effect of the different synthesis steps on the various levels of description and design options of an architecture. In our case we developed complete processor layouts (over one hundred implementations) for 0.50 and 0.35 µm technologies. A study including the 0.18 µm process is also underway.

One of the industrial fields demanding this dense design paradigm is digital media processing, especially medium and low bit-rate video decoding. In the digital media domain, processor workload is dominated by video processing tasks [Pir98], [Ack94]. To cope with this load, in particular for high bit-rate video coding, high-end architectures are being conceived and developed using superscalar, vector and parallel processors [RS98], [Ses98], [Pur98], [RK96].
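The introduction relies on layout shrinking (from 0.80 µm down to 0.35 and 0.18 µm) and on mapping designs to both 0.50 and 0.35 µm processes. As a rough aid to intuition, the following is a minimal sketch of what an idealized linear shrink implies for layout area and device density. It assumes a pure optical shrink in which every drawn dimension is multiplied by the same factor s; the function names are hypothetical, and the model deliberately ignores the wire-delay and power effects mentioned above, which are precisely what a synthesis-based study has to measure on real layouts.

```python
# Idealized first-order model of a pure linear (optical) layout shrink.
# ASSUMPTION for illustration only: real process ports also change design
# rules, metal stacks and wire parasitics, none of which is modelled here.
# Function names are hypothetical, not taken from any tool or paper.


def linear_scale(old_um: float, new_um: float) -> float:
    """Factor s by which every drawn dimension is multiplied."""
    if old_um <= 0 or new_um <= 0:
        raise ValueError("feature sizes must be positive")
    return new_um / old_um


def area_scale(old_um: float, new_um: float) -> float:
    """Layout area scales with the square of the linear factor s."""
    s = linear_scale(old_um, new_um)
    return s * s


def density_gain(old_um: float, new_um: float) -> float:
    """How many times more devices fit in the same area: 1 / s^2."""
    return 1.0 / area_scale(old_um, new_um)


if __name__ == "__main__":
    # Process pairs mentioned in the text.
    for old, new in [(0.80, 0.35), (0.80, 0.18), (0.50, 0.35)]:
        print(
            f"{old:.2f} -> {new:.2f} um: s = {linear_scale(old, new):.3f}, "
            f"area x{area_scale(old, new):.3f}, "
            f"density x{density_gain(old, new):.2f}"
        )
```

For the 0.50 to 0.35 µm pair, s = 0.7, so an ideally shrunk layout would occupy about 49% of its original area (roughly a 2.04x density gain). Synthesized layouts need not follow this ideal figure, which is one reason to build the layouts in each technology rather than extrapolate.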
A. Limits of specialization
The vector-microprocessor paradigm has been explored in depth in [Asa98] as a result of assessing the vectorizability of SPECint programs. A quantitative analysis of extending the short-vector microprocessor approach to long-vector microprocessors has been given recently in [LS98], demonstrating a clear performance advantage for multimedia applications over simple scalar and superscalar processors, up to a three-fold improvement. The same analysis also shows layout-area costs that can be up to an order of magnitude higher than those of simple scalar processors with multimedia extensions.

Another related architectural trend is represented by VLIW approaches aimed at finding and automatically generating efficient architectures through processor specialization. Relying on the power of the highly optimizing HP Labs Cambridge C Compiler, the quantitative results reported in [FFD96] show performance gains for high-cost, tightly targeted VLIW architectures, but also dramatic performance losses in low- and medium-cost VLIW architectures if overly narrow custom-fit processors are derived from the application. After running 5730 experiments with 191 VLIW architectures tailored to fit, over a wide range, 10 multimedia benchmarks, the authors conclude: “If and when the cost of individual chip design becomes very much lower than it is today, it will make a lot of sense to build chips for the narrowest of embedded applications. Today, that seems like a dangerous route to attempt”.

In recent years the advantages of standard, mainstream, programmable solutions have also been highlighted. These solutions rely on standard processors available as cores for embedded systems. This approach helps software development, since such cores are based on well-established processor architectures and efficient optimising compilers. Process technology advances are also bringing these processors to speed marks that make software solutions ever more attractive.