High Speed and Low-Power SerDes Architectures using Chord Signaling
Amin Shokrollahi
In collaboration with
the Engineering Team of Kandou Bus
Part 1: THEORY
Chip-to-Chip Communication
Chip 1 Chip 2
Communication wires
Task: reliably transmit information from Chip 1 to Chip 2
Or any channel with a lot of “random” noise
                Throughput   Energy/bit   Recovery time/bit
Wireless        Mbps         nJ           nanoseconds
Chip-to-Chip    Gbps         pJ           picoseconds
Hardly any power or time to recover a transmitted bit.
them)
same amount.
SSO Ref Xtlk EMI CM ISI Thermal Deterministic Random
Noise type     Mitigation techniques
SSO            Modulation, design
Ref            Modulation, design
Xtlk           Layout, design
EMI            Shielding, modulation
Common mode    Modulation, design
ISI            Equalization, modulation
Thermal        Design
communication wires
1 1 0 0 0 1 0 1 1 1 1 0 Vcm Vhigh Vlow
Driver Comparator
Transmission line
Reference
Transmits one bit per wire
+1 −1 Error Error Distinguisher
[Comparison table: single-ended signaling, rated for SSO.]
Conclusion: not good for very high speed communication.
Comparator
Transmission line
sgn(x-y)
Transmits one bit per a pair of wires
x b
y
[Comparison table: single-ended and differential signaling, rated for SSO, Ref, EMI, common mode, and ISI.]
Conclusions: single-ended: not good for very high speed communication; differential: pin count can be a problem.
➡ 4-PAM signaling: 4 states, i.e., 2 bits
➡ 8-PAM signaling: 8 states, i.e., 3 bits
➡ What happens to noise?
[Comparison table: single-ended, differential, and 4-PAM differential signaling, rated per noise type.]
Conclusions: single-ended: high speed problematic; differential: pin count problematic; 4-PAM diff.: high speed issues.
➡ Can this be done?
➡ How much more data can be sent?
➡ What happens to noise?
➡ Etc.
Chord signaling
single-ended, differential, differential 4-PAM, etc.)
{1, +1} 3 b
Vref
1 −1 1 −1 1 1 −1 −1
All possibilities
Indistinguishable +1 −1 (+1, +1) (−1, +1) (−1, −1) (+1, −1) Distinguishable Distinguishable Indistinguishable D D Distinguisher New distinguisher
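[Figure: the four comparators y, x − y, x, and x + y divide the plane into 8 chambers. The chambers have the sign patterns ++++, +−++, +−−+, +−−−, −−−−, −+−−, −++−, −+++ and are labeled with the 3-bit words 000, 001, 011, 010, 110, 111, 101, 100.]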
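The chamber count in the figure can be checked numerically. The following is a minimal sketch, assuming the four comparators are y, x − y, x, and x + y evaluated in that order; the function names and the sampling of points on the unit circle are illustrative choices, not part of the original slides.

```python
import math

def sign_pattern(x, y):
    """Signs of the four comparators y, x - y, x, x + y at the point (x, y)."""
    values = (y, x - y, x, x + y)
    return "".join("+" if v > 0 else "-" for v in values)

def chamber_patterns(samples=360):
    """Collect the sign patterns seen on a circle around the origin.

    Sample angles are offset by half a step so that no sample lies on a
    comparator boundary (multiples of 45 degrees).
    """
    patterns = set()
    for k in range(samples):
        angle = 2 * math.pi * (k + 0.5) / samples
        patterns.add(sign_pattern(math.cos(angle), math.sin(angle)))
    return patterns

patterns = chamber_patterns()
print(len(patterns), sorted(patterns))
```

Eight distinct patterns means eight distinguishable points, i.e. 3 bits on 2 wires with 4 comparators.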
6 planes (comparators) 24 chambers = 24 points 4.5 bits on 3 wires
points
distinguish?
$$\sum_{i=0}^{n-1} \binom{c}{i}\left(1 + (-1)^{n-1-i}\right)$$
n = # wires c = # comparators = # hyperplanes
n \ c    2    3    4    5    6    7    8    9    10    11    12    13
2        4    6    8   10   12   14   16   18    20    22    24    26
3        4    8   14   22   32   44   58   74    92   112   134   158
4        4    8   16   30   52   84  128  186   260   352   464   598
5        4    8   16   32   62  114  198  326   512   772  1124  1588
6        4    8   16   32   64  126  240  438   764  1276  2048  3172
7        4    8   16   32   64  128  254  494   932  1696  2972  5020
8        4    8   16   32   64  128  256  510  1004  1936  3632  6604
Possible to transmit 7 bits on 4 wires with a detector using 8 comparators (entry 128 at n = 4, c = 8).
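The table entries follow from the sum over binomial coefficients given for n wires and c comparators. A minimal sketch that evaluates that sum is shown here; the function name `max_points` is a hypothetical label chosen for illustration.

```python
from math import comb, log2

def max_points(n, c):
    """Evaluate sum_{i=0}^{n-1} C(c, i) * (1 + (-1)^(n-1-i)).

    n is the number of wires, c the number of comparators (hyperplanes).
    """
    return sum(comb(c, i) * (1 + (-1) ** (n - 1 - i)) for i in range(n))

# Reproduce the row for n = 4 wires, c = 2..13 comparators.
row_n4 = [max_points(4, c) for c in range(2, 14)]
print(row_n4)

# 8 comparators on 4 wires: number of points and the corresponding bits.
print(max_points(4, 8), log2(max_points(4, 8)))
```

For n = 4 and c = 8 the sum evaluates to 128, which is the 7 bits on 4 wires quoted on the slide.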
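[Comparison table: single-ended, differential, 4-PAM diff., and chord signaling (so far), rated per noise type.]
Conclusions: single-ended: high speed problematic; differential: pin count problematic; 4-PAM diff.: high speed issues; chord signaling (so far): may have issues.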
Chord signaling can help lower the frequency. Does it help?
channel
wire.
single-ended signaling, corresponding to about 8 dB loss.
~36 pico-seconds ~27 mV ~30% of UI
~35 pico-seconds ~22 mV ~23.5% of UI
000 100 010 101 001 011
~38 pico-seconds ~22 mV ~21.5% of UI
000 001 011 010 110 111 101 100
[Bar chart: vertical and horizontal eye opening in %UI for single-ended, hexagon, and octagon signaling.]
Lowering the frequency didn’t really help
ERROR
[Bar chart: vertical and horizontal eye opening in %UI for single-ended, hexagon, octagon, and GW signaling.]
How was this designed?
Leads to errors
Smallest distance x Largest distance y
~ 17.5 - 24 psec 20 Gbps
~ 18 - 23 psec 18.7 Gbps Almost same width Same ISI ratio
~ 7.5 - 12 psec 24 Gbps
0 psec 24 Gbps Higher ISI ratio loses....
[Comparison table repeated; the last chord signaling entry changes from "+/-" to "+ by design".]
(a, b, c, d, e) → (a + x, b + x, c + x, d + x, e + x)
Common mode noise
Means that comparators should evaluate to 0 on vector (1,1,1,...,1)
Common mode component is along vector (1,1,1,...,1) Means that the sum of the values on the wires should be constant.
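The requirement that comparators evaluate to 0 on (1,1,...,1) can be illustrated with a small numeric check. This is a sketch under the assumption that a comparator is a weighted sum of the wire values; the wire values and weights below are made up for illustration.

```python
def comparator_output(weights, wires):
    """Evaluate a linear comparator: the weighted sum of the wire values."""
    return sum(w * v for w, v in zip(weights, wires))

def add_common_mode(wires, x):
    """Add the same noise value x to every wire."""
    return [v + x for v in wires]

wires = [3, -1, 0, 2, -4]        # hypothetical values on 5 wires
balanced = [1, -1, 0, 0, 0]      # weights sum to 0: evaluates to 0 on (1,...,1)
unbalanced = [1, 0, 0, 0, 0]     # weights sum to 1: sees common mode noise

noisy = add_common_mode(wires, 5)
print(comparator_output(balanced, wires), comparator_output(balanced, noisy))
print(comparator_output(unbalanced, wires), comparator_output(unbalanced, noisy))
```

The balanced comparator gives the same output with and without the common mode offset, while the unbalanced one shifts by the full offset.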
[Figure: the (x, y) plane for two wires. The common mode direction is along (1, 1); codewords such as (-1, 1) and (1, -1) should be on the line spanning the orthogonal space.]
[Figure: matrix product illustrating encoding with an orthogonal matrix and with a tempering orthogonal matrix.]
±(1, −1/3, −1/3, −1/3) ±(−1/3, 1, −1/3, −1/3) ±(−1/3, −1/3, 1, −1/3) ±(−1/3, −1/3, −1/3, 1)
ACTUAL IMPLEMENTATION MAY BE DIFFERENT
$$\frac{1}{2}\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$$
Hadamard Transform
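One way to see how the ±(1, −1/3, −1/3, −1/3) codewords relate to the Hadamard matrix is the sketch below. It assumes that three ±1 bits drive the three non-constant rows of the 4×4 Hadamard matrix and that the result is scaled by 1/3 (rather than the 1/2 normalization shown above) so that the wire values land in [−1, 1]. As the slide notes, the actual implementation may be different; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

H4 = [
    [1, 1, 1, 1],     # all-ones row: the common mode direction, left unused
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]

def encode(bits):
    """Map three bits in {+1, -1} to four wire values (assumed 1/3 scaling)."""
    inputs = [0] + list(bits)            # nothing is sent on the all-ones row
    return tuple(
        Fraction(sum(inputs[r] * H4[r][w] for r in range(4)), 3)
        for w in range(4)
    )

def decode(wires):
    """Recover the bits as signs of the three non-constant Hadamard rows."""
    outputs = []
    for row in H4[1:]:
        value = sum(a * v for a, v in zip(row, wires))
        outputs.append(1 if value > 0 else -1)
    return tuple(outputs)

codewords = {encode(bits) for bits in product((1, -1), repeat=3)}
print(sorted(codewords))

# Common mode noise added to all wires does not change the decoded bits.
noisy = tuple(v + Fraction(7, 10) for v in encode((1, -1, 1)))
print(decode(noisy))
```

Each decoding row sums to zero, so it evaluates to 0 on (1,1,1,1) and the common mode offset drops out.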
Disappears by construction.
Disappears, since no reference needed.
Largely mitigated, since sum of currents on the wires is 0 (far- fields cancel each other). Will talk about it later.
Largely mitigated through additional constraint on tempering matrix.
Noise type     Chord Signaling
SSO            +
Ref            +
EMI            +
Common mode    +
ISI            +
Conclusion     Can be used in a wide range of applications
Conclusions for the other schemes: single-ended: high speed problematic; differential: pin count problematic; 4-PAM diff.: high speed issues.
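A (n, N, c, I)-Chordal Code (CC) is a pair $(C, \Lambda)$ where
$C \subset [-1, 1]^n$ is a set of $N$ codewords,
$\Lambda \subset (\mathbb{R}^n)^*$ is a set of $c$ comparators,
such that
$\forall c_1, c_2 \in C,\ c_1 \neq c_2\ \ \exists \lambda \in \Lambda:\ \lambda(c_1)\lambda(c_2) < 0$
$\forall \lambda \in \Lambda,\ c_1, c_2 \in C:\ \dfrac{|\lambda(c_1)|}{|\lambda(c_2)|} \le I$
Lots of practical concerns swept under the rug.
Given n and N, find minimum I, such that there exists a (n, N, c, I)-CC for some c.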
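A (n, N, c, I)-Chordal Code (CC) is a tuple $(C, \Lambda, \mathcal{I})$ where
$C \subset [-1, 1]^n$,
$\Lambda \subset (\mathbb{R}^n)^*$,
$\mathcal{I} \subseteq C \times \Lambda$ (written $\mathcal{I}$ here to keep it apart from the ISI-ratio $I$),
such that
$\forall \lambda \in \Lambda\ \forall c_1, c_2 \in C,\ (c_1, \lambda), (c_2, \lambda) \notin \mathcal{I}:\ \dfrac{|\lambda(c_1)|}{|\lambda(c_2)|} \le I$
$\forall c_1, c_2 \in C,\ c_1 \neq c_2:\ \exists \lambda \in \Lambda,\ (c_1, \lambda), (c_2, \lambda) \notin \mathcal{I}:\ \lambda(c_1)\lambda(c_2) < 0$
Lots of practical concerns swept under the rug.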
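The two conditions of the pair definition (every two codewords are separated by some comparator, and the ratio of comparator magnitudes is bounded by I) can be checked by brute force for small codes. The sketch below does this for two example codes: the ±(1, −1/3, −1/3, −1/3) code with three Hadamard-row comparators, and a code made of the permutations of (1, 0, −1) with pairwise-difference comparators. Both comparator sets and all function names are assumptions chosen for illustration, and the variant with excluded pairs is not covered.

```python
from fractions import Fraction
from itertools import combinations, permutations

def evaluate(comparator, codeword):
    """Apply a linear form (list of weights) to a codeword."""
    return sum(Fraction(a) * v for a, v in zip(comparator, codeword))

def is_separating(code, comparators):
    """True if every pair of distinct codewords has opposite signs on some comparator."""
    return all(
        any(evaluate(lam, c1) * evaluate(lam, c2) < 0 for lam in comparators)
        for c1, c2 in combinations(code, 2)
    )

def isi_ratio(code, comparators):
    """Largest ratio |lam(c1)| / |lam(c2)| over all comparators and codewords."""
    worst = Fraction(0)
    for lam in comparators:
        magnitudes = [abs(evaluate(lam, c)) for c in code]
        if min(magnitudes) == 0:
            raise ValueError("a comparator evaluates to 0 on a codeword")
        worst = max(worst, max(magnitudes) / min(magnitudes))
    return worst

third = Fraction(1, 3)
enrz_code = [
    tuple(s * (1 if i == k else -third) for i in range(4))
    for k in range(4) for s in (1, -1)
]
enrz_comparators = [[1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]

perm_code = list(permutations((1, 0, -1)))
perm_comparators = [[1, -1, 0], [1, 0, -1], [0, 1, -1]]

print(is_separating(enrz_code, enrz_comparators), isi_ratio(enrz_code, enrz_comparators))
print(is_separating(perm_code, perm_comparators), isi_ratio(perm_code, perm_comparators))
```

Under these assumptions the first code has ratio 1 (every comparator sees the same magnitude on every codeword) and the permutation code has ratio 2.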
n: number of wires
N: number of codewords
c: detection complexity (number of comparators)
I: ISI-ratio
Rate, or pin-efficiency: log2(N)/n
For a (n, N, c, I)-CC we have:
$$\frac{|\lambda(c_1)|}{|\lambda(c_2)|} \le I \implies \frac{|\lambda(c_2)|}{|\lambda(c_1)|} \ge \frac{1}{I}$$
$$\sum_{i=0}^{n-1} \binom{c}{i}\left(1 + (-1)^{n-1-i}\right)$$
What is the best ISI-ratio for n = 3, N = 16? Best result so far: 2.39304, 11 comparators, not practical
First order analysis of the strength of the electric far-field generated by a charge loop:
Area = A
Frequency = f
Current = I
Far-field strength ~ f^2 · I · A
Fix all parameters to 1 for a baseline computation. Then the far-field has FOM = 1.
Differential signaling: FOM = 1 or FOM = -1. Average strength = (1+|-1|)/2 = 1
Two differential lanes: Average strength = (2+|-2|)/4 = 1
General form, for wire values c0, c1, c2, c3, c4, c5:
the partial sums c5, c4+c5, c3+c4+c5, c2+c3+c4+c5, and c1+c2+c3+c4+c5 add up to
c1+2c2+3c3+4c4+5c5 = c'(1)
FOM for a code C in which for all codewords the sum of coordinates is zero:
$$\mathrm{FOM}(C) = \frac{1}{|C|} \sum_{c \in C} |c'(1)|, \qquad c'(1) = \sum_{i} i \, c_i$$
Two differential lanes: (1/4)(2 + |−2|) = 1
ENRZ: FOM = (2 + 2/3)/2 = 4/3
±(1, −1/3, −1/3, −1/3) → 2
±(−1/3, 1, −1/3, −1/3) → 2/3
±(−1/3, −1/3, 1, −1/3) → 2/3
±(−1/3, −1/3, −1/3, 1) → 2
Equal throughput: differential runs at 1.5 times the frequency.
Throughput-normalized FOM:
Differential: (1.5)^2 = 2.25
ENRZ: 4/3 ~ 1.33 (smaller)
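The FOM numbers above can be reproduced with a short computation. This sketch assumes the per-codeword FOM is |c'(1)| = |c1 + 2·c2 + 3·c3 + ...| with wires indexed from 0, averaged over all codewords, and that the far-field strength scales with the square of the frequency as in the first order analysis; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def codeword_fom(codeword):
    """|c'(1)| = |sum_i i * c_i| for wire values c_0, c_1, ..."""
    return abs(sum(i * Fraction(c) for i, c in enumerate(codeword)))

def code_fom(code):
    """Average of the per-codeword FOM over the whole code."""
    return sum(codeword_fom(c) for c in code) / len(code)

# Two differential lanes: each lane carries (+1, -1) or (-1, +1).
two_lanes = [(a, -a, b, -b) for a, b in product((1, -1), repeat=2)]

# ENRZ: plus and minus the four permutations of (1, -1/3, -1/3, -1/3).
third = Fraction(1, 3)
enrz = [
    tuple(s * (1 if i == k else -third) for i in range(4))
    for k in range(4) for s in (1, -1)
]

fom_diff = code_fom(two_lanes)
fom_enrz = code_fom(enrz)

# Equal throughput: differential runs at 1.5 times the frequency,
# and the far-field strength scales with frequency squared.
normalized_diff = fom_diff * Fraction(3, 2) ** 2
print(fom_diff, fom_enrz, normalized_diff)
```

Exact fractions are used so the averages come out as 1, 4/3, and 9/4 without rounding.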
ENRZ DS
the design of noise resilient modulation schemes for chip-to-chip communication.
number of codewords.
construction to the hypercube.
subject of current presentation).
Claude Elwood Shannon 1916 - 2001 MSc Thesis, MIT, 1937
Aerial view of Bell Labs in Murray Hill
http://users.ece.gatech.edu/~juang/B%20JUANG%20Georgia%20Tech%20Pictures.html
First Transistor: Bardeen, Shockley, and Brattain, 1948. Received the Nobel Prize in Physics in 1956.
http://www.just2good.co.uk/cpuSilicon.php ARMv6 processor
Reduction in capacitance reduces device delay by the same factor.
Reduction in size reduces effective capacitance by 1/√2.
TD reduced by 1/√2 implies frequency 1/(3*TD) is increased by √2.
To keep a constant electrical field, voltage is multiplied by 1/√2.
[Figure: a device of size x by y and area A scales to x/√2 by y/√2 with area A/2.]
[Figure: ring oscillator with three stages of delay TD each.]
Reduction in feature size by 1/√2
Gordon Moore 1929 - Electronics Magazine 1965
http://www.businessinsider.com/munster-iphone-lines-launch-day-2013-9?IR=T https://www.wsj.com/articles/apple-store-lines-return-as-iphone-x-debuts-1509683635 https://www.gottabemobile.com/why-you-shouldnt-bother-lining-up-for-the-iphone-x-and-iphone-8/
Processor       Transistor count   Manufacturer    Transistor pitch
ARM-1           25'000             ARM Holdings    3000 nm
Intel i-960     250'000            Intel           600 nm
Pentium Pro     5'500'000          Intel           500 nm
Pentium III     45'000'000         Intel           130 nm
Pentium 4       184'000'000        Intel           65 nm
Apple A7        1'000'000'000      Apple           28 nm
Apple A8        2'000'000'000      Apple           20 nm
Xeon Haswell    5'560'000'000      Intel           22 nm
Fiji (GPU)      8'900'000'000      AMD             28 nm
Annotations: iPhone 5S (Apple A7), iPhone 6 (Apple A8), PC's, Gaming
[Chart: actual scaling versus Moore's law, 1974 to 2028.]
Andreas Bechtolsheim 1955 -
[Chart: performance versus time. CPU: 64X/12Y; GigE: 10X/12Y; Routers: 4X/12Y.]
“The amount of data is doubling every 24 months, and will reach 44 zettabytes (4.4 x 10^22 bytes) by 2020” – IDC, 2014
“IP datacenter traffic will be 8.6 zettabytes by 2018” – Cisco forecast, 2013
“Monthly global mobile data traffic will surpass 24.3 exabytes (2.4 x 10^19 bytes) by 2019” – Cisco forecast, 2015
“In 2008 the world’s 27 million business servers processed 9.57 zettabytes” – Computerworld, 2011
[Chart: noise versus throughput. Today: 20G; target: 400G (20x).]
Way too much power consumption in wireline world at super high speeds (at least up until now).
Encoder w0 w1 w2 w3 w4 w5 b0 b1 b2 b3 b4 b0 b1 b2 b3 b4
1 2 1 1 2 3 4 5 3 4 5 4 5
Rx frontend
[Block diagram: System on Chip (SoC) with DRAM memory interfaces, analog DSPs, long-reach SerDes, and photonics (SiGe).]
[Block diagram: two halves (SoC/2 and SoC/2) with long-reach SerDes, photonics (SiGe), and Glasswing links.]
TC0 TC1 TC3 TC2 12 mm 5 mm 24 mm
comparable products
(more than 20Gbps/wire)
(16nm) in products in 2018
several $B per year through
electronics, and wireless space