Accurate Network Clock Synchronization at Scale
Balaji Prabhakar
Joint work with: Yilong Geng, Zi Yin, Shiyu Liu, Ashish Naik, Mendel Rosenblum and Amin Vahdat
A classical hard problem: affects performance of distributed systems
- Can boost performance of existing solutions
  - e.g., in finance, in databases (causality and external consistency)
- Or enable new ones
  - e.g., distributed ledgers, fine-grained resource and task scheduling
In financial trading end-to-end clock synchronization can provide
- Fairness in trading exchanges, enabling them to move to the Cloud
- Accurate timestamps for market data collected at different geographic regions
- Solve a host of other issues (e.g., prevent front-running)
Background: Clock Synchronization
Synchronizing clocks with probes
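[Figure: probes exchanged between clocks A and B on a time axis t, with timestamps $TX_A$, $RX_B$, $TX_B$, $RX_A$]
$t_A = t + \Delta t_A$, $t_B = t + \Delta t_B$
All methods of clock sync use these 4 timestamps; main differences:
1. Where to take timestamps?
2. How to process the timestamps?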
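One common way to process the 4 timestamps is the NTP-style midpoint estimate. The sketch below is illustrative only: the function name and the numbers are hypothetical, and it assumes the A→B and B→A delays are equal, which the later slides point out is often not the case.

```python
def ntp_style_estimate(tx_a, rx_b, tx_b, rx_a):
    """Estimate (offset of B relative to A, round-trip delay) from one quadruple.

    tx_a and rx_a are read from clock A; rx_b and tx_b are read from clock B.
    All values share one unit (microseconds here).
    Assumption: the one-way delays A->B and B->A are equal.
    """
    offset = ((rx_b - tx_a) + (tx_b - rx_a)) / 2.0
    round_trip = (rx_a - tx_a) - (tx_b - rx_b)
    return offset, round_trip


# Hypothetical example: B runs 100 us ahead of A, 20 us delay in each direction.
offset, rtt = ntp_style_estimate(tx_a=0.0, rx_b=120.0, tx_b=150.0, rx_a=70.0)
print(offset, rtt)
```

When the two one-way delays differ, the estimate is off by half of their difference, which is why queueing and path asymmetry matter so much.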
Where to take timestamps
[Figure: Server A (CPU, NIC, PHY), a switch (PHYs and queue) and Server B (PHY, NIC, CPU), marking where NTP, PTP, DTP and Huygens take timestamps]
Huygens
- Uses NIC or CPU timestamps and, respectively, gets ns- and us-level sync accuracy
- Is a software-based approach
First, Let's Look at Factors Which Make Clock Sync Hard
Each clock is different
- Clocks have different resonant frequencies
- Clocks behave differently to the same frequency or offset control signal
[Figure label: Control Stopped]
Factors Which Make Clock Sync Hard
A clock's behavior varies over time
- Due to temperature
- Due to vibration noise (e.g., cooling fans), etc.
[Figure: clock behavior plotted against time of day]
The network connecting the clocks
- Variations in path delays which affect clock sync probe latencies
- Network path asymmetries (propagation delays from A→B and B→A are different)
What Makes Clock Sync Hard
Therefore, to achieve good clock sync
- Need to solve both the heterogeneous clocks and the jittery network problems
- Continuously
How to process timestamps
Two approaches
- 1. Treat each set of timestamp quadruples individually
  - DTP and PTP
  - Susceptible to clock rounding errors
- 2. Treat many sets of timestamp quadruples collectively
  A. NTP does this for timestamp quads from multiple packets between a single pair of machines
  B. Huygens does this for timestamp quads from many probes and across a network of machines
  - Goes below clock rounding errors
  - Synchronizes clocks to 10s of nanoseconds under heavy load
  - Achieves global consensus of time
Accurate sync methods aim to use the network to synchronize clocks
- To synchronize clocks A and B in end-hosts, synchronize all the switches/links between A and B (or have them participate in “transparent mode”)
  ⎼ e.g., PTP, DTP
- Use dedicated wires of precisely known lengths to send time pulses along
  ⎼ PPS
- Use a dedicated synchronous Ethernet network to convey pulses/time
  ⎼ White Rabbit
All of the above
- Require hardware upgrades and/or special protocols
- Do not scale to large distances or large numbers of nodes
NTP, on the other hand, is scalable to 1000s of nodes across large distances
- But is quite inaccurate (100s of us to 10s of milliseconds) and has a high variance
Current Solutions: Discussion
Our Approach: The Huygens Algorithm
Huygens is a software-based algorithm that can
- Work off of timestamps from the NICs, hosts, VMs or containers
- Accuracy
  ⎼ NIC timestamps → nanosecond level
  ⎼ host/VM/container timestamps → 100s of ns to 1-2 us in a single DC
  ⎼ across multiple DCs → 1-10 us, depending on the WAN link quality
Paper: “Exploiting a Natural Network Effect for Scalable, Fine-grained Clock Synchronization”, NSDI 2018. https://www.usenix.org/conference/nsdi18/presentation/geng
Huygens can synchronize clocks at just the desired nodes
- No need to sync all intermediate clocks → enables scaling in size and distance
- Being a software overlay, it needs no hardware upgrades and can deploy in current DCs and Clouds
Problem: Given N clocks connected by a packet-switched network, synchronize them as accurately as possible
Introduce a probe mesh:
- Each clock randomly picks a constant number, say 5, of other clocks to probe
- Probes are acked
- Each probe or ack carries a transmit timestamp and a receive timestamp from the sending and receiving clocks
- Probing overhead: minimal, roughly 700 Kbps in total per node, counting probes and acks
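The probe mesh described above can be sketched in a few lines. This is a minimal illustration, not the Huygens implementation: `build_probe_mesh`, the `seed` parameter and the server names are hypothetical, and only the random choice of a constant number of probe targets comes from the slide.

```python
import random


def build_probe_mesh(nodes, k=5, seed=None):
    """Map each node to k distinct other nodes that it will probe.

    Assumption: targets are drawn uniformly at random without replacement.
    """
    rng = random.Random(seed)
    mesh = {}
    for node in nodes:
        others = [n for n in nodes if n != node]
        mesh[node] = rng.sample(others, min(k, len(others)))
    return mesh


# Hypothetical cluster of 20 servers, each probing 5 peers.
servers = [f"server-{i}" for i in range(20)]
mesh = build_probe_mesh(servers, k=5, seed=1)
for node in servers[:3]:
    print(node, "->", mesh[node])
```

Because every node also receives probes from peers that picked it, each clock typically ends up linked to more than 5 others.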
The Huygens Algorithm
Consider one pair of clocks at servers A and B
[Figure: a probe from A to B and its ack from B to A on a time axis t, with timestamps $TX_A$, $RX_B$, $TX_B$, $RX_A$]
$t_A = t + \Delta t_A$, $t_B = t + \Delta t_B$
Probe from A to B:
- Receive time = transmit time + delay
- $RX_B - \Delta t_B = TX_A - \Delta t_A + \text{propagation and queueing delay}$
- $\Delta t_B - \Delta t_A = RX_B - TX_A - \text{propagation and queueing delay}$
- $\Delta t_B - \Delta t_A < RX_B - TX_A$
Ack from B to A:
- $\Delta t_B - \Delta t_A > TX_B - RX_A$
Each probe/ack bounds clock discrepancy
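The two inequalities above translate directly into code. The sketch below is illustrative: the function names and the timestamp values are hypothetical, and combining several quadruples by taking the tightest bounds assumes the clocks do not drift noticeably over the batch.

```python
def probe_bounds(tx_a, rx_b, tx_b, rx_a):
    """Return (lower, upper) bounds on the discrepancy dt_B - dt_A.

    The probe A->B gives the upper bound, the ack B->A gives the lower bound.
    """
    upper = rx_b - tx_a
    lower = tx_b - rx_a
    return lower, upper


def tightest_bounds(quadruples):
    """Combine bounds from several (tx_a, rx_b, tx_b, rx_a) quadruples.

    Assumption: negligible clock drift across the batch.
    """
    lowers, uppers = zip(*(probe_bounds(*q) for q in quadruples))
    return max(lowers), min(uppers)


# Hypothetical timestamps in us; clock B is truly 100 us ahead of clock A.
quads = [
    (0.0, 120.0, 150.0, 70.0),         # 20 us delay each way
    (1000.0, 1105.0, 1130.0, 1042.0),  # 5 us probe delay, 12 us ack delay
]
lower, upper = tightest_bounds(quads)
print(lower, upper)
```

The less delay a probe or ack experiences, the tighter the bound it contributes.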
Clock bounds over time
[Figure: upper and lower bounds on the clock discrepancy plotted over time; annotations: Offset: -93.3 us, Drift: -1.65 us/sec, Offset: -96.6 us]
Clocks can drift away from each other as fast as 30 us/sec
More examples of drifting clocks
[Figure: histogram of clock drift (us/sec), from -30 to 30, against number of server pairs]
- Clock drifting speed varies over time due to temperature variations
- Approximate clock difference with piecewise linear functions
Nonlinear Clock Drifts
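A piecewise linear approximation can be illustrated with an ordinary least-squares line per time window. This is a simplified stand-in, not the method Huygens uses to place the line: the function names, the 2-second window and the sample data are all hypothetical.

```python
def fit_line(times, values):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def piecewise_fit(times, offsets, window_s=2.0):
    """Fit one line per consecutive window of window_s seconds.

    Assumes times are sorted and distinct. Returns a list of
    (start_time, end_time, drift, offset_at_time_zero) tuples.
    """
    segments = []
    i = 0
    while i < len(times):
        j = i
        while j < len(times) and times[j] < times[i] + window_s:
            j += 1
        if j - i >= 2:  # a line needs at least two points
            slope, intercept = fit_line(times[i:j], offsets[i:j])
            segments.append((times[i], times[j - 1], slope, intercept))
        i = j
    return segments


# Hypothetical offsets (us): drift of -1.65 us/sec, then -0.5 us/sec.
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
offsets = [-93.3 - 1.65 * t for t in times[:4]] + \
          [-96.6 - 0.5 * (t - 2.0) for t in times[4:]]
segments = piecewise_fit(times, offsets, window_s=2.0)
for seg in segments:
    print(seg)
```

Shorter windows track temperature-driven changes in drift more closely but leave fewer points per fit.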
1. Support vector machine
2. Coded probes
3. Network effect
3 Steps to Finding the Middle Red Line
SVM achieves sync accuracy of 300~400 ns. Noisy timestamps cause synchronization errors!
Step 1: SVMs
How to identify packets with zero queueing delays and no timestamp noise?
Step 2: Coded probes
[Figure: a pair of probe packets sent 10 us apart through the network; spacing between the two packets on arrival]
- >> 10 us: second packet delayed more
- << 10 us: first packet delayed more
- ~ 10 us: likely no queueing delay
Empirically, coded probes filter out 90% of bad data and reduce the clock sync error by a factor of 4.
Coded probes
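The coded-probe check can be sketched as a simple filter on the arrival spacing of each probe pair. This is an illustration under assumptions: the function name, the 1 us tolerance and the sample timestamps are hypothetical; only the 10 us send spacing comes from the slide.

```python
def filter_coded_probes(pairs, send_gap_us=10.0, tolerance_us=1.0):
    """Keep probe pairs whose receive spacing is close to the send spacing.

    pairs: list of (rx_first, rx_second) receive timestamps in us,
    both read from the receiver's clock.
    tolerance_us is a hypothetical threshold, not a value from the slides.
    """
    kept = []
    for rx_first, rx_second in pairs:
        spacing = rx_second - rx_first
        if abs(spacing - send_gap_us) <= tolerance_us:
            kept.append((rx_first, rx_second))
    return kept


# Hypothetical receive timestamps for three probe pairs sent 10 us apart.
pairs = [
    (100.0, 110.2),  # ~10 us: likely no queueing delay
    (200.0, 235.0),  # >> 10 us: second packet delayed more
    (300.0, 303.0),  # << 10 us: first packet delayed more
]
clean = filter_coded_probes(pairs)
print(clean)
```

Only the timestamps from surviving pairs would then be passed on to the line-fitting step.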
Step 3: The network effect
[Figure: clocks A, B and C in a loop]
- A: "If my clock is at 10, B's clock must be at 10:15"
- B: "If my clock is at 10:15, C's clock must be at 10:05"
- C: "If my clock is at 10:05, A's clock must be at 9:50"
- "Guys, we are off by 10 minutes!"
[Figure: numeric loop example; the value 3.3 appears on each of the three edges]
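The loop in the example above can be checked numerically: pairwise offsets around a closed loop should sum to zero, and any surplus signals an error. The sketch below spreads the surplus evenly over the edges of a single loop. That even split and the function name are assumptions made for illustration; a full mesh needs a joint correction over all loops.

```python
def correct_loop(edge_offsets):
    """Remove the loop surplus by splitting it evenly across the edges.

    edge_offsets: pairwise offset estimates taken in order around one loop
    (for example A->B, B->C, C->A). Assumption: every edge is equally
    likely to be wrong, so each absorbs the same share of the surplus.
    """
    surplus = sum(edge_offsets)
    share = surplus / len(edge_offsets)
    return [offset - share for offset in edge_offsets]


# Offsets in minutes from the example: A->B +15, B->C -10, C->A -15.
estimates = [15.0, -10.0, -15.0]
print("loop surplus:", sum(estimates))
corrected = correct_loop(estimates)
print("corrected:", corrected)
```

After the correction the offsets around the loop sum to zero (up to floating-point rounding), so the three clocks agree transitively.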
The network effect
[Figure: mesh of clocks A, B, C, D, E and F]
[Figure: mean and 99th-percentile error (ns) against K]
$\mathrm{std}(\text{err after N.E.}) \approx \frac{1}{\ldots}\,\mathrm{std}(\text{err before N.E.})$
Google – Jupiter testbed
- 3-stage 40Gb/s Clos network
- 20 racks, 237 servers
A 10G-40G production network
- 5-stage Clos network
Stanford testbed
- 2-layer 1G network
- 8 racks, 128 servers
- Cisco 2960 and Cisco 3560 switches
Many financial firms
Pilots and deployments
Stanford
- 2-stage 1Gb/s Clos network
- 8 racks, 128 servers
Stanford Testbed
Cisco 2960
Comparison with NTP

| Load | NTP (with NIC timestamps): mean abs. error | NTP (with NIC timestamps): 99th percentile abs. error | Huygens: mean abs. error | Huygens: 99th percentile abs. error |
|---|---|---|---|---|
| 0% load | 177.7 ns | 558.8 ns | 10.2 ns | 18.5 ns |
| 40% load | 77,975 ns | 347,638 ns | 11.2 ns | 22.0 ns |
| 80% load | 211,011 ns | 778,070 ns | 14.3 ns | 32.7 ns |
[Figure: mean and 99th-percentile error (ns) against network load (%)]
Robust to high network load
Huygens: Synchronization error stays under 50 ns at 90% load
Demo of Clock Sync in the Cloud