Highly-Associative Caches for Low-Power Processors



  1. Highly-Associative Caches for Low-Power Processors

     Motivation
     - Caches use 30-60% of processor energy in embedded systems.
       - Example: 43% for StrongArm-1
     - Many academic studies on caches:
       - [Albera, Bahar, ’98]: power and performance trade-offs
       - [Amrutur, Horowitz, ’98, ’00]: speed and power scaling
       - [Bellas, Hajj, Polychronopoulos, ’99]: dynamic cache management
       - [Ghose, Kamble, ’99]: power reduction through sub-banking, etc.
       - [Inoue, Ishihara, Murakami, ’99]: way-predicting set-associative cache
       - [Kin, Gupta, Mangione-Smith, ’97]: filter cache
       - [Ko, Balsara, Nanda, ’98]: multilevel caches for RISC and CISC
       - [Wilton, Jouppi, ’94]: CACTI cache model
     - Many industrial low-power processors use CAM (content-addressable memory):
       - ARM3: 64-way set-associative [Furber et al. ’89]
       - StrongArm: 32-way set-associative [Santhanam et al. ’98]
       - Intel XScale: 32-way set-associative (’01)
     - CAM: fast and energy-efficient

  2. Talk Outline
     - Structural Comparison
     - Area and Delay Comparison
     - Energy Comparison
     - Related Work
     - Conclusion

     Set-Associative RAM-tag Cache
     [Figure: ways each holding Tag, Status, and Data fields; address split into Tag, Index, and Offset]
     - Not energy-efficient
       - All ways are read out
     - Two-phase approach
       - More energy-efficient
       - 2X latency

  3. Set-Associative RAM-tag Sub-bank
     [Figure: RAM-tag sub-bank; labels: gwl, lwl, Address Decoder, Tag SRAM Cells, Data SRAM Cells, Offset Dec., Tag Comp., Sense Amps, addr, offset, I/O BUS]
     - Not energy-efficient
       - All ways are read out
     - Two-phase approach
       - More energy-efficient
       - 2X latency
     - Sub-banking
     - 1 sub-bank = 1 way
     - Low-swing bitlines
       - Only for reads; writes are performed full-swing
     - Wordline gating

     CAM-tag Cache
     [Figure: sub-banks each holding Tag, Status, and Data fields with a HIT? output; address split into Tag, Bank, and Offset; further label: Word]
     - Only one sub-bank activated
     - Associativity within sub-bank
     - Easy to implement high associativity
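
The two address layouts on these slides (Tag/Index/Offset for the RAM-tag cache, Tag/Bank/Offset for the CAM-tag cache) can be illustrated with a short sketch. The function name `split_address`, the 32-byte line size, and the 8 KB geometry are assumptions chosen for illustration; the slides do not give them.

```python
def split_address(addr, offset_bits, select_bits):
    """Split an address into (tag, select, offset).

    `select` is the set index in a RAM-tag cache or the sub-bank number
    in a CAM-tag cache; in both cases it is the field just above the offset.
    """
    offset = addr & ((1 << offset_bits) - 1)
    select = (addr >> offset_bits) & ((1 << select_bits) - 1)
    tag = addr >> (offset_bits + select_bits)
    return tag, select, offset


# Assumed geometry (not from the slides): 8 KB cache, 32-byte lines.
LINES = 8 * 1024 // 32          # 256 lines in total
OFFSET_BITS = 5                 # log2(32)

# 2-way RAM-tag cache: 128 sets, so 7 index bits.
ram_index_bits = (LINES // 2).bit_length() - 1
# 32-way CAM-tag cache: 8 sub-banks of 32 lines, so 3 bank bits
# and a correspondingly wider tag stored in the CAM.
cam_bank_bits = (LINES // 32).bit_length() - 1

addr = 0x12345
ram_fields = split_address(addr, OFFSET_BITS, ram_index_bits)
cam_fields = split_address(addr, OFFSET_BITS, cam_bank_bits)
```

Under these assumptions, raising associativity at a fixed capacity moves bits from the select field into the tag, which is why the CAM-tag organization compares a wider tag but activates only one sub-bank.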

  4. CAM-tag Cache Sub-bank
     [Figure: CAM-tag sub-bank; labels: gwl, lwl, CAM-tag Array, SRAM Cells, Offset Dec., Sense Amps, tag, offset, I/O BUS]
     - Only one sub-bank activated
     - Associativity within sub-bank
     - Easy to implement high associativity

     CAM Functionality and Energy Usage
     [Figure: CAM cell with separate write/search lines and a low-swing match line; labels: SBit, SBit_b, Bit, Bit_b, WL, XOR, SRAM, match, Match, Mismatch. The cell's transistor count is not recoverable from the extracted text.]
     - CAM energy dissipation
       - Search lines
       - Match lines
       - Drivers
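
As a rough functional illustration of the CAM-tag lookup described here, the sketch below models one sub-bank: the tag is compared against every stored tag at once, and the data row is read only when a match line fires. The class name `CamSubBank` and its methods are hypothetical, and the model ignores timing and energy entirely.

```python
class CamSubBank:
    """Functional model of one CAM-tag sub-bank (no timing, no energy)."""

    def __init__(self, ways):
        self.tags = [None] * ways   # None marks an invalid entry
        self.data = [None] * ways

    def search(self, tag):
        # Broadcast the tag on the search lines; every row evaluates
        # its own match line in parallel.
        match_lines = [t is not None and t == tag for t in self.tags]
        hit = any(match_lines)
        # Assumes each tag is stored at most once, so one line at most is high.
        row = match_lines.index(True) if hit else None
        return hit, row

    def read(self, tag):
        hit, row = self.search(tag)
        # The data wordline is enabled only on a hit.
        return self.data[row] if hit else None

    def fill(self, row, tag, line):
        self.tags[row] = tag
        self.data[row] = line


bank = CamSubBank(ways=32)
bank.fill(3, 0x1A, "line-A")
hit_result = bank.search(0x1A)
miss_result = bank.search(0x2B)
```

In hardware the comparison is done by the XOR logic in each CAM cell discharging (or not discharging) the row's match line; the list comprehension stands in for that parallel evaluation.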

  5. CAM-tag Cache Sub-bank Layout
     [Figure: layout of a cache sub-bank implemented in a CMOS technology, showing a RAM array and a CAM array. The sub-bank size, feature size (in µm), and array dimensions are not recoverable from the extracted text.]
     - 10% area overhead over RAM-tag cache

     Delay Comparison
     [Figure: critical-path timing diagrams]
     - RAM-tag cache critical path; labels: Index bits, Global Wordline Decoding (gwl), Local Wordline Decoding (lwl), Decoded offset, Tag bits, Tag readout, Tag Comp., Data readout, Data out
     - CAM-tag cache critical path; labels: Tag bits, Tag bits broadcasting, Tag Comp., Local Wordline Decoding, gwl, lwl, Decoded offset, Data readout, Data out

  6. Hit Energy Comparison
     [Figure: bar chart of hit energy per access (pJ, axis 0 to 450) by associativity and implementation: 1-way RAM, 2-way RAM, 4-way RAM, 8-way RAM, 8-way CAM, 16-way CAM, 32-way CAM. Series: lzw, ijpeg, pegwit, perl, m88ksim, gcc, Avg. The cache size in the axis label is not recoverable from the extracted text.]

     Miss Rate Results
     [Figure: six panels (LZW, pegwit, ijpeg, perl, gcc, m88ksim), each plotting miss rate against associativity from 1-way to 64-way for 8KB and 16KB caches.]
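
To show how associativity can change miss rate at a fixed capacity, here is a minimal trace-driven sketch. It assumes LRU replacement and a synthetic two-address trace; the slides state neither the replacement policy nor the traces behind the charts, so the numbers it produces are illustrative only.

```python
from collections import OrderedDict


def miss_rate(line_addrs, num_lines, ways):
    """Miss rate for a cache of `num_lines` lines and `ways` ways (LRU assumed)."""
    num_sets = num_lines // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for line in line_addrs:
        s = sets[line % num_sets]
        if line in s:
            s.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            if len(s) == ways:
                s.popitem(last=False)  # evict the least recently used line
            s[line] = True
    return misses / len(line_addrs)


# Synthetic trace: two lines that collide in a direct-mapped cache.
trace = [0, 256] * 50
direct_mapped = miss_rate(trace, num_lines=256, ways=1)
two_way = miss_rate(trace, num_lines=256, ways=2)
```

With this trace the direct-mapped cache misses on every access, while the 2-way cache misses only on the first touch of each line. Real benchmarks behave less dramatically, which is why the per-benchmark curves differ.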

  7. Total Access Energy (pegwit)
     - Pegwit: high miss rate for high associativity
     [Figure: total energy per access (pJ, axis 0 to 2500) against miss energy, expressed in multiples of the read access energy (32X, 64X, 128X, 256X, 512X, 1024X; the bit width of the reference read is not recoverable). Series: 1-RAM, 2-RAM, 4-RAM, 8-RAM, 8-CAM, 16-CAM, 32-CAM.]

     Total Access Energy (perl)
     - Perl: very low miss rate for high associativity
     [Figure: same axes and series as the pegwit chart, with the energy axis running 0 to 500.]
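
The total-access-energy charts combine hit energy with the cost of misses. One plausible reading, assumed here rather than stated on the slides, is that total energy per access equals hit energy plus miss rate times miss energy. The sketch below uses that model; the function name and every number in it are hypothetical and are not taken from the charts.

```python
def total_energy_per_access(hit_energy_pj, miss_rate, miss_energy_pj):
    """Assumed model: every access pays the hit energy, misses pay extra."""
    return hit_energy_pj + miss_rate * miss_energy_pj


# Hypothetical configurations (illustrative values only):
# A has cheaper hits but misses more often; B has costlier hits but fewer misses.
config_a = {"hit_energy_pj": 100.0, "miss_rate": 0.05}
config_b = {"hit_energy_pj": 150.0, "miss_rate": 0.01}

cheap_miss = 1000.0   # pJ per miss, hypothetical
costly_miss = 2000.0  # pJ per miss, hypothetical

a_cheap = total_energy_per_access(miss_energy_pj=cheap_miss, **config_a)
b_cheap = total_energy_per_access(miss_energy_pj=cheap_miss, **config_b)
a_costly = total_energy_per_access(miss_energy_pj=costly_miss, **config_a)
b_costly = total_energy_per_access(miss_energy_pj=costly_miss, **config_b)
```

Under this model, the configuration with the lower miss rate overtakes the one with the lower hit energy once misses become expensive enough, which is the kind of crossover a sweep over miss energy is meant to expose.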

  8. Other Advantages of CAM-tag
     - Hit signal generated earlier
     - Simplifies pipelines
     - Simplified store operation
     - Wordline only enabled during a hit
     - Stores can happen in a single cycle
     - No write buffer necessary

     Related Work
     - CACTI and CACTI2
       - [Wilton and Jouppi ’94], [Reinman and Jouppi ’99]
       - Accurate delay and energy estimates
         - Results within 10%
       - Energy estimate not suited to low-power designs
       - Typical low-power features not included in CACTI:
         - Sub-banking
         - Low-swing bitlines
         - Wordline gating
         - Separate CAM search lines
         - Low-swing match lines
       - CACTI's energy estimate is 10X greater than our model for one CAM-tag cache sub-bank
         - Our results closely agree with [Amrutur and Horowitz ’98]

  9. Conclusion
     - CAM tags: high performance and low power
     - Energy consumption of 32-way CAM < 2-way RAM
     - Easy to implement highly-associative tags
     - Low area overhead (10%)
     - Comparable access delay
     - Better CPI by reducing miss rate

     Thank You!
