PixelVault: Using GPUs for Securing Cryptographic Operations


Giorgos Vasiliadis, Elias Athanasopoulos, Michalis Polychronakis, Sotiris Ioannidis


slide-1
SLIDE 1

PixelVault: Using GPUs for Securing Cryptographic Operations

Giorgos Vasiliadis (gvasil@ics.forth.gr)
Elias Athanasopoulos (elathan@ics.forth.gr)
Michalis Polychronakis (mikepo@cs.columbia.edu)
Sotiris Ioannidis (sotiris@ics.forth.gr)

slide-2
SLIDE 2

How SSL/TLS works

  • Secure Sockets Layer (SSL/TLS) is a de-facto standard for secure communication
    – Authentication, confidentiality, integrity

[Figure: Client/Server handshake. Client initiates handshake; server responds and sends its certificate; client sends secret; server and client create keys; secure data exchange. Labels: RSA decryption, AES cipher.]

slide-3
SLIDE 3

Motivation

  • Secret keys may remain unencrypted in CPU registers, RAM, etc.
    – Memory attacks
    – DMA/Firewire attacks
    – Heartbleed attack
    – …

slide-4
SLIDE 4

PixelVault Overview

  • Runs encryption securely outside CPU/RAM
  • Only on-chip memory of GPU is used as storage
  • Secret keys are never observed from host

[Figure: Host (x86 host CPU) and graphics card. PLAINTEXT enters the graphics card, which performs ENCRYPT and returns CIPHERTEXT.]

slide-5
SLIDE 5

Cryptographic Processing with GPUs

  • GPU-accelerated SSL
    – [CryptoGraphics, CT-RSA'05]
    – [Harrison et al., Sec'08]
    – [SSLShader, NSDI'11]
    – …
  • High-performance
  • Cost-effective

[Figure: SSH, Web, and IMAP servers call an OpenSSL stub, which offloads to the GPU.]

slide-6
SLIDE 6

Cryptographic Processing with GPUs

  • GPU-accelerated SSL is high-performance and cost-effective

Can we also make it secure?

slide-7
SLIDE 7

Implementation Challenges

  • How to isolate GPU execution?
  • Who holds the keys?
  • Where is the code?

slide-8
SLIDE 8

Implementation Challenges

  • How to isolate GPU execution?
  • Who holds the keys?
  • Where is the code?

slide-9
SLIDE 9

GPU as a coprocessor

  • Typically handled by the host
    – Load parameters, launch GPU kernel, transfer data, etc.
  • Not secure for our purposes
    – Crypto keys have to be transferred every time

slide-10
SLIDE 10

Autonomous GPU execution

  • Force GPU kernel to run indefinitely
    – i.e., using an infinite while loop
  • Cannot rely on the typical parameter-passing execution of GPU kernels
    – Instead, we allocate a memory segment that is shared between CPU/GPU

slide-11
SLIDE 11

Shared Memory between CPU/GPU

  • Page-locked memory
    – Accessed by the GPU directly, via DMA
    – Cannot be swapped to disk
  • Processing requests are issued through this shared memory space

[Figure: SSH, Web, and IMAP servers call an OpenSSL stub, which talks to the GPU through the shared memory segment.]

slide-12
SLIDE 12

Shared Memory between CPU/GPU

  • GPU continuously monitors the shared space for new requests

slide-13
SLIDE 13

Shared Memory between CPU/GPU

  • When a new request is available, it is transferred to the memory space of the GPU

REQUEST: msg#, offsets[msg#], keyIDs[msg#], msg_buf[]
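The request layout named on this slide can be sketched as a flat buffer. This is an illustrative sketch only, not the PixelVault wire format: the 32-bit little-endian field widths, the extra end-of-buffer offset, and the function names `pack_request` and `unpack_request` are assumptions. Only the field names msg#, offsets[], keyIDs[], and msg_buf[] come from the slide.

```python
import struct


def pack_request(messages, key_ids):
    """Pack messages as: msg# | offsets[msg#+1] | keyIDs[msg#] | msg_buf[]."""
    if len(messages) != len(key_ids):
        raise ValueError("one key ID per message is required")
    count = len(messages)
    offsets, pos = [], 0
    for msg in messages:
        offsets.append(pos)
        pos += len(msg)
    offsets.append(pos)  # assumed sentinel: end of the last message
    return (
        struct.pack("<I", count)
        + struct.pack(f"<{count + 1}I", *offsets)
        + struct.pack(f"<{count}I", *key_ids)
        + b"".join(messages)
    )


def unpack_request(buf):
    """Inverse of pack_request: recover the messages and their key IDs."""
    (count,) = struct.unpack_from("<I", buf, 0)
    offsets = struct.unpack_from(f"<{count + 1}I", buf, 4)
    key_ids = struct.unpack_from(f"<{count}I", buf, 4 + 4 * (count + 1))
    base = 4 + 4 * (count + 1) + 4 * count  # start of msg_buf[]
    msgs = [buf[base + offsets[i]: base + offsets[i + 1]] for i in range(count)]
    return msgs, list(key_ids)
```

Batching many variable-length messages behind one offsets array lets a single transfer carry a whole batch, which matters because the throughput results later in the talk depend on the number of messages per batch.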

slide-14
SLIDE 14

Shared Memory between CPU/GPU

  • The request is processed by the GPU

REQUEST: msg#, offsets[msg#], keyIDs[msg#], msg_buf[]
RESPONSE: msg#, offsets[msg#], keyIDs[msg#], enc_msg_buf[]

slide-15
SLIDE 15

Shared Memory between CPU/GPU

  • When processing is finished, the host is notified by setting the response parameter fields accordingly

RESPONSE: msg#, offsets[msg#], keyIDs[msg#], enc_msg_buf[]
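The request/response handoff walked through on the last few slides can be simulated on the CPU. In this hedged sketch a Python thread stands in for the never-ending GPU kernel and a plain object stands in for the page-locked segment. The names `SharedSegment`, `persistent_worker`, and `submit`, the `stop` flag, and the byte-reversal "processing" are all assumptions made for the demo; none of them is PixelVault code or real encryption.

```python
import threading
import time


class SharedSegment:
    """Stand-in for the memory segment visible to both host and GPU."""

    def __init__(self):
        self.request = None    # written by the host
        self.response = None   # written by the worker
        self.stop = False      # demo-only; the real kernel never exits


def persistent_worker(seg, process):
    """Stand-in for the persistent kernel: poll the segment in a loop."""
    while not seg.stop:
        req = seg.request
        if req is None:
            time.sleep(0.001)  # a GPU kernel would spin instead of sleeping
            continue
        seg.request = None               # consume the request
        seg.response = process(req)      # setting the field notifies the host


def submit(seg, data, timeout=2.0):
    """Host side: post a request, then wait for the response field."""
    seg.response = None
    seg.request = data
    deadline = time.monotonic() + timeout
    while seg.response is None:
        if time.monotonic() > deadline:
            raise TimeoutError("worker did not answer")
        time.sleep(0.001)
    return seg.response


def demo():
    seg = SharedSegment()
    worker = threading.Thread(
        target=persistent_worker,
        args=(seg, lambda msg: msg[::-1]),  # placeholder, not encryption
        daemon=True,
    )
    worker.start()
    try:
        return submit(seg, b"abc")
    finally:
        seg.stop = True
        worker.join(timeout=1.0)
```

The point of the pattern is that the worker is launched once and never receives parameters afterwards; everything it learns arrives through the shared segment.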

slide-16
SLIDE 16

Autonomous GPU execution

  • Non-preemptive execution
  • Only the output block is being written back to host memory

slide-17
SLIDE 17

Implementation Challenges

  • How to isolate GPU execution?
  • Who holds the keys?
  • Where is the code?

slide-18
SLIDE 18

Who holds the keys?

  • GPUs contain different memory hierarchies of …
    – different sizes, and …
    – different characteristics

[Figure: GPU memory hierarchy. The host CPU and host memory sit outside the GPU; inside, global memory serves multiprocessors 1 to N, each with SPs, registers, shared memory, and cache.]

slide-19
SLIDE 19

Who holds the keys?

  • Off-chip global memory: no protection; data can be acquired by the CPU directly.

slide-20
SLIDE 20

Who holds the keys?

  • On-chip memories

slide-21
SLIDE 21

Who holds the keys?

  • Shared memory: comparable with scratchpad RAM in other architectures. Unfortunately, its contents can be acquired by a subsequent GPU kernel.

slide-22
SLIDE 22

Who holds the keys?

  • Caches: many different caches (L1-L3, texture, constant). Unfortunately, the data stored there cannot be managed by the programmer.

slide-23
SLIDE 23

Who holds the keys?

  • Registers: not fully addressable. Reset to zero on each GPU kernel execution.

slide-24
SLIDE 24

Keeping secrets on GPU registers

  • Secret keys are loaded on GPU registers at an early stage of the bootstrapping phase
    – Preferably from an external storage device
  • Unfortunately, the number of available registers in current GPU models is small
    – Enough for a single/few secret keys, but what about multi-homing servers?

slide-25
SLIDE 25

Support for an arbitrary number of keys

  • We can use a separate KeyStore array that holds an arbitrary number of secret keys
    – Encrypted keys are stored in GPU RAM
    – Each key is decrypted in registers during encryption/decryption (copy to registers)

[Figure: the KeyStore holds encrypted keys; the GPU register file holds the master key and the decrypted key.]
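The KeyStore idea, many wrapped keys in unprotected memory and one master key in protected storage, can be sketched as follows. This is a toy illustration under loud assumptions: the SHA-256 counter keystream is a stand-in chosen only because it needs nothing beyond the standard library, and it is not the cipher PixelVault uses. The class and method names are hypothetical, and Python has no notion of registers, so "decrypted only in registers" is modelled merely as "decrypted only inside `use`".

```python
import hashlib


def _keystream(master_key, key_id, length):
    """Toy keystream (assumption): SHA-256 over master key, key ID, counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            master_key + key_id.to_bytes(4, "little") + counter.to_bytes(4, "little")
        ).digest()
        counter += 1
    return out[:length]


def _xor_wrap(master_key, key_id, data):
    """XOR with the keystream; the same call wraps and unwraps."""
    stream = _keystream(master_key, key_id, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class KeyStore:
    """Holds only wrapped keys, as the slide's array in GPU RAM would."""

    def __init__(self):
        self._slots = {}

    def put(self, master_key, key_id, secret_key):
        self._slots[key_id] = _xor_wrap(master_key, key_id, secret_key)

    def use(self, master_key, key_id):
        # The plaintext key exists only for the duration of this call.
        return _xor_wrap(master_key, key_id, self._slots[key_id])
```

With this split, the amount of protected storage needed stays constant (one master key plus one working key) no matter how many keys the store holds.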

slide-26
SLIDE 26

Implementation Challenges

  • How to isolate GPU execution?
  • Who holds the keys?
  • Where is the code?

slide-27
SLIDE 27

Where is the code?

  • GPU code is initially stored in global device memory for the GPU to execute it
    – An adversary could replace it with a malicious version

slide-28
SLIDE 28

Preventing code modification attacks

  • Three levels of instruction caching (icache)
    – 4KB, 8KB, and 32KB, respectively
    – Hardware-managed
  • Opportunity: Load the code to the icache, and then erase it from global device memory
    – The code runs indefinitely from the icache
    – Not possible to be flushed or modified

slide-29
SLIDE 29

PixelVault Crypto Suite

  • AES-128
  • RSA-1024

slide-30
SLIDE 30

AES Implementation

  • The key and all intermediate states are stored in GPU registers
    – 16 bytes for the key
    – 16 bytes for the round key
    – 16 bytes for the input/output block
  • The only data that is written back to global, off-chip device memory is the output block
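A budget of only 16 bytes for the round key suggests deriving each round key from the previous one on the fly instead of keeping the full 176-byte expanded schedule. That reading is an assumption; the slide does not say how the round key is produced. The sketch below shows the standard AES-128 key-schedule step in that style, with the S-box computed rather than tabulated to keep the block short. The function names are illustrative.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return p


def _sbox_entry(x):
    """AES S-box: multiplicative inverse followed by the affine transform."""
    inv = 0 if x == 0 else next(y for y in range(1, 256) if gmul(x, y) == 1)
    s = inv
    for shift in range(1, 5):
        s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return s ^ 0x63


SBOX = [_sbox_entry(x) for x in range(256)]


def next_round_key(rk, rcon):
    """Derive round key i+1 from the 16-byte round key i and its constant."""
    # SubWord(RotWord(last word)) with the round constant on the first byte.
    t = [SBOX[rk[13]] ^ rcon, SBOX[rk[14]], SBOX[rk[15]], SBOX[rk[12]]]
    out = bytearray(16)
    for i in range(16):
        prev = t[i] if i < 4 else out[i - 4]
        out[i] = rk[i] ^ prev
    return bytes(out)


def round_keys(key):
    """Yield the 11 AES-128 round keys, holding only one at a time."""
    rk, rcon = bytes(key), 1
    yield rk
    for _ in range(10):
        rk = next_round_key(rk, rcon)
        rcon = gmul(rcon, 2)
        yield rk
```

Only `rk` and `rcon` live across iterations, so the working state is the original key, one round key, and the data block, matching the three 16-byte items listed on the slide.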

slide-31
SLIDE 31

RSA Implementation

  • During exponentiation, each thread needs three temporary values of (n + 2) words each, where n is the size of the key in bits
    – 408 words for 1024-bit keys
  • Unfortunately, there is not always enough space to hold all three temporary values in registers
    – Store the three temporary values in shared memory (i.e. scratchpad memory)
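The figure 408 can be reproduced under one reading, which is an assumption rather than something the slide states: if n counts 32-bit words instead of bits, a 1024-bit key has n = 32, three temporaries take 3 × (32 + 2) = 102 words, and 102 words occupy 408 bytes. The helper name `rsa_temp_storage` and the 32-bit word size are hypothetical.

```python
def rsa_temp_storage(key_bits, word_bits=32, temporaries=3):
    """Return (words, bytes) needed for the temporaries of (n + 2) words each.

    Assumes n is the key size measured in words of `word_bits` bits.
    """
    n = key_bits // word_bits
    words = temporaries * (n + 2)
    return words, words * word_bits // 8


words, size_bytes = rsa_temp_storage(1024)
print(words, size_bytes)
```

Under this reading the per-thread scratch space is a few hundred bytes, which is consistent with the slide's point that it does not always fit in registers.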

slide-32
SLIDE 32

Performance Evaluation

  • Hardware setup
    – 2x Intel Xeon E5520 Quad-core CPUs at 2.27GHz
    – 12GB of RAM
    – GeForce GTX480
  • Comparison against the standard OpenSSL implementation
    – No AES-NI support

slide-33
SLIDE 33

AES-128 CBC Performance

[Figure: AES-128 CBC throughput (Gbit/s) versus number of messages (1 to 4096), in separate decryption and encryption panels, comparing GPU, PixelVault, PixelVault (w/ KeyStore), and CPU.]

  • Up to 20% overhead on GPU execution
  • Up to 13% overhead on GPU execution
slide-34
SLIDE 34

AES-128 CBC Performance

  • CPU: Intel Nehalem, single core (2.27GHz)
  • 3x-4x faster than CPU for a sufficient number of messages
slide-35
SLIDE 35

RSA 1024-bit Performance

#Msgs   CPU      GPU [25]   PixelVault   PixelVault (w/ KeyStore)
1       1632.7   15.5       15.3         14.3
16      1632.7   242.2      240.4        239.2
64      1632.7   954.9      949.9        939.6
112     1632.7   1659.5     1652.4       1630.3
128     1632.7   1892.3     1888.3       1861.7
1024    1632.7   10643.2    10640.8      9793.1
4096    1632.7   17623.5    17618.3      14998.8
8192    1632.7   24904.2    24896.1      21654.4

  • PixelVault adds a 1%-15% overhead over the default GPU-accelerated RSA

slide-36
SLIDE 36

RSA 1024-bit Performance

  • Still faster than CPU when batch processing >128 messages
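
The two claims about the RSA table can be checked with a few lines of arithmetic. The numbers below are copied from the table; the slides give no unit, so the sketch assumes these are throughput figures where higher is better, which the "faster than CPU" bullet suggests. The variable names are illustrative.

```python
# (#Msgs, CPU, GPU [25], PixelVault, PixelVault w/ KeyStore)
ROWS = [
    (1, 1632.7, 15.5, 15.3, 14.3),
    (16, 1632.7, 242.2, 240.4, 239.2),
    (64, 1632.7, 954.9, 949.9, 939.6),
    (112, 1632.7, 1659.5, 1652.4, 1630.3),
    (128, 1632.7, 1892.3, 1888.3, 1861.7),
    (1024, 1632.7, 10643.2, 10640.8, 9793.1),
    (4096, 1632.7, 17623.5, 17618.3, 14998.8),
    (8192, 1632.7, 24904.2, 24896.1, 21654.4),
]


def overhead_pct(baseline, value):
    """Relative slowdown of `value` against `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline


# Overhead of the KeyStore variant over plain GPU-accelerated RSA.
keystore_overheads = [overhead_pct(gpu, ks) for _, _, gpu, _, ks in ROWS]

# Smallest listed batch size at which the KeyStore variant beats the CPU.
break_even = min(msgs for msgs, cpu, _, _, ks in ROWS if ks > cpu)

print(f"{min(keystore_overheads):.1f}% to {max(keystore_overheads):.1f}%")
print(break_even)
```

For the KeyStore variant the overhead lands within the quoted 1%-15% range, and the first listed batch size at which it exceeds the CPU figure is 128; plain PixelVault is already ahead at 112.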
slide-37
SLIDE 37

Conclusions

  • Cryptography on the GPU is not only fast …
  • … but also secure
    – Preserves the secrecy of keys even when the base system is fully compromised
  • Future work
    – Adapt to other ciphers and application domains
    – Apply to mobile and embedded devices

slide-38
SLIDE 38

PixelVault: Using GPUs for Securing Cryptographic Operations

Giorgos Vasiliadis (gvasil@ics.forth.gr)
Elias Athanasopoulos (elathan@ics.forth.gr)
Michalis Polychronakis (mikepo@cs.columbia.edu)
Sotiris Ioannidis (sotiris@ics.forth.gr)

Thank you!