Optimal DDR4 System with Data Bus Inversion Hing Yan (Thomas) To, - - PowerPoint PPT Presentation



slide-1
SLIDE 1


Optimal DDR4 System with Data Bus Inversion

Hing Yan (Thomas) To (Xilinx Inc.), Changyi Su (Xilinx Inc.), Juan Wang (Xilinx Inc.), Dmitry Klokotov (Xilinx Inc.), Lizhi Zhu (Xilinx Inc.), John Schmitz (Xilinx Inc.), Penglin Niu (Xilinx Inc.), Yong Wang (Xilinx Inc.)

slide-2
SLIDE 2

SPEAKER

Hing Yan (Thomas) To

Technical Director, Xilinx Inc.
tto@xilinx.com

Thomas is a Technical Director in the System Memory Signal Integrity & Device Power Group at Xilinx, Inc. Prior to joining Xilinx, Thomas was with the NVIDIA Advanced Technology Group, where he focused on high-speed (32 GT/s) circuits and system channel designs and supported test chips for advanced process nodes such as 20 nm SoC and 16 nm FinFET. Before NVIDIA, Thomas worked at Intel for more than 16 years, where he covered and led many types of system memory IO development, such as the Sandy Bridge server DDR IO, across system memory technologies ranging from DDR1 to DDR4. Thomas received his PhD in Electrical Engineering from the Ohio State University in 1995, and he holds over 37 patents in the fields of mixed-signal IO circuits, system memory configurations, and high-speed clocking for high-speed memory designs.

slide-3
SLIDE 3

Outline

  • High Performance Computing Performance Requirement Trend
  • Typical Power Distribution in Computing System Example
  • System Memory Power Improvement Approach
  • Technology Process Node Scaling Trend
  • IO Voltage Scaling Trend
  • DDR4 IO signaling
  • Data Bus Inversion (DBI) in DDR4 Interface
  • DQ bus data Functional View with DBI enabled
  • DDR4 System Power Improvement Example
  • DDR4 IO Interface Training & Calibration with DBI
  • Power Noise Improvement with DBI
  • Experimental Data Margin Validation and Results
  • Summary & Conclusions
slide-4
SLIDE 4

Computation Requirement Trend

[Figure: Top #1 System TFLOPs, log scale from 1.00E+00 to 1.00E+10. Source: Top500.org]

  • Computing performance requirements increase exponentially.
  • Systems are expected to maintain a similar or lower power envelope.

slide-5
SLIDE 5

Typical Power Distribution Comparison

[Figure: two pie charts of power distribution across CPU, Board, Net, Mem, and Store. Xeon + DDR3 slice values as extracted: 60%, 14%, 2%, 19%, 5%. Atom + DDR3 slice values as extracted: 30%, 4%, 6%, 48%, 12%.]

  • Traditionally, the CPU has been the dominant component.
  • System memory becomes a significant factor as CPU power improves.

slide-6
SLIDE 6

System Memory Power Improvement Approach

  • Technology Process Node Scaling Trends

– Improving Process Technology improves speed, power and memory density.

  • IO Voltage Scaling Trends

– Scaling down the IO voltage improves IO power.

  • IO signaling Improvements

– IO Signaling can improve IO power

slide-7
SLIDE 7

DRAM Process Technology Trend

[Figure: DRAM process technology timeline, 2005 to 2014. Nodes shown: 8xnm, 6xnm, 5xnm, 4xnm, 3xnm, 2xnm, 2ynm, 2znm.]

* Customer sample shipping date for the first product of each node.

  • A new DRAM process technology node is introduced every year.

slide-8
SLIDE 8

DRAM Power Improvement between DDR3 and DDR4

[Figure: IDD current comparison, DDR3 vs. DDR4, for IDD0, IDD2N, IDD4R, IDD4W, IDD5, and IDD5N (axis 20 to 100); annotated difference of ~35%.]

  • The DDR4 device improves power relative to the DDR3 device.

slide-9
SLIDE 9

DRAM IO Voltage Scaling Trend

  • DDR IO voltage has been scaling down from generation to generation.
  • The scaling rate is slowing down.

slide-10
SLIDE 10

Change of IO Standard

[Figure: IO standard schematics; supply label: VDDQ.]

  • Only a logic low in DDR4 dissipates DC power.
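As a rough illustration of why only a logic low costs DC power: DDR4 DQ lines use pseudo-open-drain (POD) signaling with the receiver terminated to VDDQ, so current flows only while the driver pulls the line low. The sketch below assumes VDDQ = 1.2 V and hypothetical values of 34 Ω for the driver pull-down and 48 Ω for the termination; the function name is illustrative and does not come from the slides.

```python
def pod_dc_power_w(level, vddq=1.2, r_on=34.0, r_term=48.0):
    """DC power (watts) drawn from VDDQ by one POD-terminated line.

    level  : 0 for logic low, 1 for logic high
    vddq   : IO supply voltage (1.2 V assumed for DDR4)
    r_on   : driver pull-down resistance in ohms (assumed value)
    r_term : receiver termination to VDDQ in ohms (assumed value)
    """
    if level == 1:
        # Line sits at VDDQ: no voltage across the termination, no DC current.
        return 0.0
    # Line pulled low: current flows from VDDQ through r_term and r_on to ground.
    return vddq ** 2 / (r_on + r_term)


low_mw = pod_dc_power_w(0) * 1e3
high_mw = pod_dc_power_w(1) * 1e3
print(f"low: {low_mw:.2f} mW, high: {high_mw:.2f} mW")
```

With these assumed resistances, every line held low costs a fixed DC power while a line held high costs none, which is the asymmetry DBI exploits.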

slide-11
SLIDE 11

DDR4 Per Unit Power Distribution Comparison

[Figure: relative power distribution. Total Activate Power: 17%; Total RD/WR/Term Power: 62%; Total Background Power: 21%.]

  • Even with the power reduction relative to DDR3, RD/WR/Term power is still a large portion.
  • DDR4 can enable DBI to further improve IO power opportunistically.

Assumes 70% read / 30% write with DBI disabled.
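To get a feel for the opportunistic saving, one can count how many lines are low on average. The sketch below enumerates all 256 byte values, assuming uniformly random data (real traffic will differ) and DC-mode DBI with the usual DDR4 rule that a byte with more than four zero bits is sent inverted with DBI# low. Because only a logic low dissipates DC termination power, the ratio approximates the DC IO power ratio for one byte lane, not total system power. Function names are illustrative.

```python
from fractions import Fraction


def lows_without_dbi(byte):
    """Number of DQ lines driven low when the byte is sent as-is."""
    return 8 - bin(byte).count("1")


def lows_with_dbi(byte):
    """Number of low lines (8 DQ + DBI#) under DC-mode DBI."""
    zeros = 8 - bin(byte).count("1")
    # More than four zeros: send inverted data and drive DBI# low.
    return (8 - zeros) + 1 if zeros > 4 else zeros


avg_no_dbi = Fraction(sum(lows_without_dbi(b) for b in range(256)), 256)
avg_dbi = Fraction(sum(lows_with_dbi(b) for b in range(256)), 256)
saving = 1 - avg_dbi / avg_no_dbi

print(float(avg_no_dbi), float(avg_dbi), f"{float(saving):.1%}")
```

Under the uniform-data assumption the average drops from 4 low lines to 837/256 (about 3.27), roughly an 18% reduction in low-line count.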

slide-12
SLIDE 12

DBI Functional View

[Figure: data from the core passes through a DBI-capable controller and across the channel (DQ, DQS, and DBI#) to the DRAM.]

slide-13
SLIDE 13

DBI Functional Burst Length View

[Figure: burst-length view. Data from the core passes through a DBI-capable controller and across the channel (DQ, DQS, and DBI#) to the DRAM.]
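The functional view can be sketched in a few lines. This is a minimal model of DDR4 DC-mode DBI for one byte lane, assuming the usual rule that a byte with more than four zero bits is sent inverted with DBI# driven low. The function names are illustrative and not taken from the slides.

```python
def dbi_encode(byte):
    """Return (dq, dbi_n) for one 8-bit value under DC-mode DBI.

    If more than four bits are 0, the byte is inverted and DBI# is driven
    low (0); otherwise the byte is sent unchanged with DBI# high (1).
    """
    zeros = 8 - bin(byte & 0xFF).count("1")
    if zeros > 4:
        return (~byte) & 0xFF, 0
    return byte & 0xFF, 1


def dbi_decode(dq, dbi_n):
    """Recover the original byte at the receiver."""
    return dq if dbi_n == 1 else (~dq) & 0xFF


def low_lines(dq, dbi_n):
    """Count the lines (8 DQ + DBI#) that are driven low."""
    return (8 - bin(dq).count("1")) + (1 - dbi_n)


dq, dbi_n = dbi_encode(0x01)   # seven zeros, so the byte is sent inverted
print(hex(dq), dbi_n, low_lines(dq, dbi_n))
```

A consequence of this rule is that at most 4 of the 9 lines in a byte lane are low at any time, compared with up to 8 of 8 without DBI.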

slide-14
SLIDE 14

System Power Comparison Set Up

FPGA DRAM

[Figure: traffic-generator test programs TG_a through TG_m with different read/write percentage ratios, run with DBI disabled and with DBI enabled.]

slide-15
SLIDE 15

Read & Write Percentage Ratio for Relative Power Comparison

Test program   Rd %   Wr %
TG_A           79     21
TG_B           77     23
TG_C           75     25
TG_D           73     28
TG_E           70     30
TG_F           67     33
TG_G           63     37
TG_H           50     50
TG_J           57     43
TG_K           44     56
TG_M           40     60

  • Analyze the relative power improvement with different workloads.

slide-16
SLIDE 16

Relative Power Improvement with DBI

[Figure: power normalized to the no-DBI case, with DBI disabled and with DBI enabled, plus % improvement. Axes: Relative System Power (%), 0 to 100; Relative Improved System Power (%) referenced to no DBI, 22 to 32.]

  • The system with DBI enabled shows a relative power improvement.
  • The amount of improvement varies with the read/write percentage ratio.

slide-17
SLIDE 17

DBI Needs Calibration

[Figure: byte-lane block diagram. Signals (all I/O): DQ0, DQ1, DQS, DQS#, DBI#. Blocks on each side: CK_GEN_DQ or CK_GEN_DQS, TX FIR, CTLE, RCV, Vref, Delay.]

  • The DBI bit needs to be calibrated together with the other DQ bits.

slide-18
SLIDE 18

Step Function Representation of the DQ Pattern

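$$
\begin{aligned}
DQ[0](t) &= DQ[0]_r(t - r_1 T) - DQ[0]_f(t - f_1 T) + \cdots + DQ[0]_r(t - r_i T) - DQ[0]_f(t - f_i T) + \cdots \\
&\;\;\vdots \\
DQ[7](t) &= DQ[7]_r(t - r_1 T) - DQ[7]_f(t - f_1 T) + \cdots + DQ[7]_r(t - r_i T) - DQ[7]_f(t - f_i T) + \cdots \\
DQS(t) &= DQS_r\!\left(t - r_1\left(T - \tfrac{T}{2}\right)\right) - DQS_f\!\left(t - f_1\left(T - \tfrac{T}{2}\right)\right) + \cdots + DQS_r\!\left(t - r_i\left(T - \tfrac{T}{2}\right)\right) - DQS_f\!\left(t - f_i\left(T - \tfrac{T}{2}\right)\right) + \cdots
\end{aligned}
$$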

Channel Configuration System

slide-19
SLIDE 19

DQ Eye Reference to DQS

$$
\mathrm{DQ\_DQS\ Eye}(t) = y(t + k_i T), \quad 0 \le t \le T, \quad \forall\, k_i \in \mathbb{N}_0, \quad i = r, f
$$

[Figure: DQ eye referenced to DQS; labels: TdivW_total, VdivW_total, Tjit, DV.]

  • Based on the rise and fall unit step responses and their combinations, construct a calibration pattern and search for the worst-case jitter and eye height.

slide-20
SLIDE 20

DBI bit Calibration with DQ

  • Make sure all DQ bits have toggling coverage.

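One way to read the coverage requirement: after DBI encoding, every line in the byte lane, including DBI#, should see both a rising and a falling edge during the calibration pattern. The check below is a sketch under that assumption, using DC-mode DBI (invert when more than four bits are 0). The pattern values and function names are hypothetical and not taken from the slides.

```python
def dbi_encode(byte):
    """DC-mode DBI for one byte; returns the 9 line levels packed in an int.

    Bits 0..7 of the result are DQ0..DQ7, bit 8 is DBI# (0 = data inverted).
    """
    zeros = 8 - bin(byte & 0xFF).count("1")
    if zeros > 4:
        return (~byte) & 0xFF          # inverted data, DBI# (bit 8) stays 0
    return (byte & 0xFF) | 0x100       # data unchanged, DBI# (bit 8) set to 1


def untoggled_lines(pattern):
    """Return indices of lines that miss a rising or a falling edge."""
    rose, fell = 0, 0
    words = [dbi_encode(b) for b in pattern]
    for prev, cur in zip(words, words[1:]):
        rose |= ~prev & cur            # lines going 0 -> 1
        fell |= prev & ~cur            # lines going 1 -> 0
    covered = rose & fell & 0x1FF
    return [i for i in range(9) if not (covered >> i) & 1]


print(untoggled_lines([0xFF, 0x0F, 0xFF]))   # DQ0-DQ3 and DBI# stay static
print(untoggled_lines([0xFF, 0x00, 0xFF]))   # DQ lines stay high; only DBI# toggles
```

The second pattern shows why DBI changes the calibration problem: an all-ones to all-zeros data pattern, which toggles every DQ without DBI, toggles only DBI# once encoding is enabled.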

slide-21
SLIDE 21

Power Noise Improvement with DBI enabled

  • PDN impedance (Z_pdn) is a function of frequency.
  • Jitter is a function of Z_pdn and the step current load characteristic.

slide-22
SLIDE 22

Voltage Droop Improvement with DBI Enabled

  • The average step current is reduced by enabling DBI.
  • Voltage droop performance improves.
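A first-order way to see the droop benefit: supply droop scales roughly with the step current times the PDN impedance magnitude at the relevant frequency. The sketch below compares worst-case simultaneous-low counts for one byte lane (8 lines without DBI, at most 4 of 9 with DC-mode DBI). The per-line current and the 50 mΩ PDN impedance are assumed, illustrative numbers, and this worst-case view does not model the average step current that the slide refers to.

```python
def worst_case_lows(dbi_enabled):
    """Largest number of lines in one byte lane that can be low at once."""
    # Without DBI all 8 DQ can be low; with DC-mode DBI at most 4 of 9 lines.
    return 4 if dbi_enabled else 8


def droop_mv(step_current_a, z_pdn_ohm):
    """First-order droop estimate: delta_V = delta_I * |Z_pdn|, in millivolts."""
    return step_current_a * z_pdn_ohm * 1e3


I_LINE_A = 1.2 / (34.0 + 48.0)   # assumed DC current of one low POD line
Z_PDN_OHM = 0.05                 # assumed PDN impedance magnitude

for enabled in (False, True):
    # Step from all lines high to the worst-case number of lines low.
    step = worst_case_lows(enabled) * I_LINE_A
    print(f"DBI={enabled}: step {step * 1e3:.1f} mA, "
          f"droop {droop_mv(step, Z_PDN_OHM):.2f} mV")
```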

slide-23
SLIDE 23

System Eye Margin Improvement Validation Set Up

  • Validation methods:
  • Direct measurement of the DQ eye at the DRAM inputs.
  • Write and read eye shmoo.
  • Comparison with and without DBI enabled.

slide-24
SLIDE 24

Direct Write Eye Measurement at DRAM

  • The write eye measurement shows a 5% UI jitter improvement.
  • Validation is then extended to functional read and write eye shmoos.

slide-25
SLIDE 25

Read and Write Shmoo Set Up

slide-26
SLIDE 26

Read Eye Shmoo without DBI Enabled

slide-27
SLIDE 27

Read Eye Shmoo with and without DBI Enabled

slide-28
SLIDE 28

Write Eye Shmoo without DBI Enabled

slide-29
SLIDE 29

Write Eye Shmoo with and without DBI Enabled

slide-30
SLIDE 30

Eye Shmoo Comparison

[Figure: Write Eye Width @ Vref and Read Eye Width @ Vref, No DBI vs. DBI (axis 94 to 112); annotated improvements of ~11% (write) and ~7% (read).]

  • Eye width improvement is observed.
  • The improvement amounts are different.
  • Write improved by ~11%.
  • Read improved by ~7%.
  • The different improvement implies a different step current impact.
  • The PDN differs between the DRAM unit and the controller PHY.

slide-31
SLIDE 31

Summary and Conclusions

  • Computing performance requirements drive the need to reduce system power.
  • System memory power has become one of the major factors in total system power.
  • Traditional improvement methods, such as scaling the process node and IO voltage, are slowing down.
  • DDR4 IO introduced the DBI function to opportunistically reduce IO power.
  • The amount of power improvement varies with the write/read ratio.
  • DBI reduces the average step current in the memory system and hence improves channel margin.
  • Experimental data show that the channel jitter improvement differs between the write and read directions.

slide-32
SLIDE 32
  • QUESTIONS?

Thank you!