
Litmus Testing at Rack Scale



  1. Litmus Testing at Rack Scale

  2. We're Going to Build a Large Program Collider, and collide instructions at 0.99 c, and observe the decay products. (Images: CERN; Chaix & Morel et associés.) David Cock | 19. September 20

  3. Programmers Once (Thought They) Understood Computer Architecture. (Image: Computer Systems, A Programmer's Perspective, Bryant & O'Hallaron, 2011.)

  4. Symmetric Multiprocessors Were Fairly Simple. (Diagram: two caches, each with a write buffer (WB), in front of a shared RAM.)

  5. Concurrent Code Makes Architecture Visible. Consider message passing. Pretty much the simplest thing you can do with shared memory. Systems like Barrelfish rely on it. When are barriers required? You can't write good code without sufficiently understanding the hardware. We're combining components in new ways.

  6. Message Passing with Shared Memory. (Diagram: one CPU writes *x = 42, then *y = 1; the other CPU reads *y = 1, then *x = 42. In RAM, *x goes from 0 to 42 and *y from 0 to 1.)

  7. Message Passing with a Write Buffer. (Diagram: one CPU writes *x = 42, then *y = 1, but *x = 42 is still held in the write buffer (WB) when *y = 1 reaches RAM. The other CPU reads *y = 1, then reads the stale *x = 0.)

  8. Message Passing with a Barrier. (Diagram: one CPU writes *x = 42, then *y = 1; *x = 42 passes through the write buffer (WB) to RAM before *y = 1 does. The other CPU reads *y = 1, then *x = 42.)
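
The message-passing pattern on the last three slides can be sketched in portable C. This is a minimal illustration, not code from the talk: it uses C11 release/acquire ordering in place of an explicit barrier instruction, and leaves it to the compiler to emit whatever barrier or ordered access the target needs. The names `writer`, `reader`, and `run_message_passing` are hypothetical; `x` and `y` follow the slides.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Shared locations, named after the slides: x is the payload, y the flag. */
static int x;
static atomic_int y;

/* Writer: store the payload, then publish the flag with release ordering,
 * so the payload store cannot become visible after the flag store. */
static void *writer(void *arg)
{
    (void)arg;
    x = 42;
    atomic_store_explicit(&y, 1, memory_order_release);
    return NULL;
}

/* Reader: spin until the flag is set (acquire ordering), then read the
 * payload. The acquire load pairs with the writer's release store. */
static void *reader(void *arg)
{
    int *seen = arg;
    while (atomic_load_explicit(&y, memory_order_acquire) == 0)
        ;  /* spin */
    *seen = x;
    return NULL;
}

/* Run one round of the test and return the payload value the reader saw. */
int run_message_passing(void)
{
    pthread_t w, r;
    int seen = -1;

    x = 0;
    atomic_store(&y, 0);
    pthread_create(&r, NULL, reader, &seen);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

With both orderings relaxed to `memory_order_relaxed`, the stale read from the write-buffer slide becomes a permitted outcome on weakly ordered hardware, which is the kind of behaviour a litmus test is meant to expose.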

  9. Of Course, CPUs Aren't That Simple. (Diagram: four CPUs, each with a write buffer (WB) and an L1 cache, two L2 caches, an L3, a coherent interconnect, PCI, and RAM; annotated "9 hops".)

  10. You Can't Trust the Hardware. seL4 was verified modulo a hardware model. The Cortex A8 has bugs: cache flushes don't work. As of today, these “errata” are still not public. We rediscovered these by accident. Non-coherent memory is coming. (Source: Chip Errata for the i.MX51, Freescale Semiconductor.)

  11. And Then There's Rack Scale... (Diagram: the multi-core node replicated across the rack. Each node has CPUs with write buffers (WB), L1, L2 and L3 caches, a coherent interconnect, PCI, a NIC, and RAM; the nodes connect through TOR switches to a backhaul.)

  12. There's a Lot of Data Available. (Diagram: the same rack, annotated with data sources: cache dumps, program trace, port mirroring, event triggers, and Openflow.)

  13. ARM High-Speed Serial Trace Port. Streams from the Embedded Trace Macrocell (ETM). Cycle-accurate control flow + events @ 6GiB/s+. Compatible with FPGA PHYs. Well-documented protocol. Available on ARMv8. (Image: Teledyne LeCroy.)

  14. The HSSTP Hardware. The official tool is CHF10,000 per core. The cable run is maximum 15cm. It's PHY-compatible with common FPGAs. A CHF6k FPGA could easily handle 10 – 15x cheaper! We're working with the D-ITET DZ on an interface board. If you like soldering, let us know!

  15. Fancy Triggering and Filtering. The ETM has sophisticated filtering, e.g. the Sequencer. Bn and Fn can be just about any events on the SoC. States can enable/disable trace, or log events. A powerful facility for pre-filtering. (Diagram: State 0 to State 3 in a chain, with events B0/F0, B1/F1 and B2/F2 between neighbouring states.)
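
A four-state sequencer of this shape can be modelled as a small state machine. The sketch below is a software model for illustration only, not the ETM's actual programming interface: the names `seq_events`, `seq_step`, and `seq_trace_enabled` are hypothetical, and it assumes that Fn moves from state n to n+1, that Bn moves from state n+1 back to n, that a forward event wins when both fire, and that trace is enabled only in the last state.

```c
#include <stdbool.h>

enum { NUM_STATES = 4 };

/* Inputs for one step: forward[n] models event Fn, backward[n] models Bn. */
struct seq_events {
    bool forward[NUM_STATES - 1];   /* Fn: state n   -> state n+1 */
    bool backward[NUM_STATES - 1];  /* Bn: state n+1 -> state n   */
};

/* Advance the sequencer by one step and return the new state.
 * Assumption: if both a forward and a backward event fire, forward wins. */
int seq_step(int state, const struct seq_events *ev)
{
    if (state < NUM_STATES - 1 && ev->forward[state])
        return state + 1;
    if (state > 0 && ev->backward[state - 1])
        return state - 1;
    return state;
}

/* Assumed policy: trace is switched on only in the final state. */
bool seq_trace_enabled(int state)
{
    return state == NUM_STATES - 1;
}
```

Chaining three forward events this way means trace only starts after a specific sequence of SoC events has occurred, which is what makes the sequencer useful as a pre-filter.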

  16. Filtering and Offload in an FPGA. We'll need to intelligently filter high-rate data. We're using an FPGA for the physical interface already. How much processing could we do? We have expertise in the group with FPGA query offloading. Zsolt and I are writing a joint Master's project proposal on this.

  17. Hardware Tracing for Correctness. Are HW operations right? Example: unmap(pa); cleanDCache(); flushTLB(); ● Real time pipeline trace on ARM. ● Can halt and inspect caches. ● HW has “errata” (bugs). ● Check that it actually works! ● Catch transient and race bugs. (Diagram: 5Gb/s trace, filter at line rate, check temporal assertions, log & process offline.)

  18. Hardware Tracing for Performance. • Should see N coherency messages. • Do we? The HW knows! Is URPC optimal? Core 0: URPC[0]= x; URPC[1]= 1; Core 1: while(!URPC[1]); x= URPC[0]; (Diagram: Core 0 with Cache 0 and Core 1 with Cache 1, with coherency messages INVAL(0) and READ(1) between them; 5Gb/s trace, filter at line rate, log & process offline.)

  19. Properties to Check: Security. Runtime verification is an established field. Lots of existing work to build on. What properties could we check efficiently? How could we map them to the filtering pipeline? Example: /* A very simple TESLA assertion. */ TESLA_WITHIN(example_syscall, previously(security_check(ANY(ptr), o, op) == 0)); http://www.cl.cam.ac.uk/research/security/ctsrd/tesla/

  20. Processing Engine. That's a lot of data, how can we process it? This is what rack-scale systems are for! Andrei is starting on this as his Master's project.

  21. Properties to Check: Memory Management. void *a = malloc(); ... free(b); {a = b} Could we check this? We don't have data values (a & b). We can play clever tricks with the hardware! PROCID= b[31:16]; PROCID= b[15:0]; The trace then carries the context IDs (CID) B[31:16] ++ ASID and B[15:0] ++ ASID. Shows what we could do with data trace.
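
The context-ID trick can be illustrated with a pure encode/decode model. This sketch does not touch any hardware register; it only shows how a 32-bit pointer value could be split into two 16-bit halves, each packed next to the ASID, and reassembled from the traced words. The function names are hypothetical, and the layout (ASID in the low 8 bits, PROCID above it) is an assumption made for the example.

```c
#include <stdint.h>

/* Assumed layout: ASID in the low 8 bits, PROCID in the bits above. */
#define ASID_BITS 8u

/* Build a context-ID word carrying one 16-bit half of a pointer. */
uint32_t cid_pack(uint16_t half, uint8_t asid)
{
    return ((uint32_t)half << ASID_BITS) | asid;
}

/* Recover the 16-bit payload from a traced context-ID word. */
uint16_t cid_payload(uint32_t cid)
{
    return (uint16_t)(cid >> ASID_BITS);
}

/* Split a 32-bit pointer value b into the two words that would be
 * written in sequence: high half first, then low half. */
void cid_encode(uint32_t b, uint8_t asid, uint32_t out[2])
{
    out[0] = cid_pack((uint16_t)(b >> 16), asid);
    out[1] = cid_pack((uint16_t)(b & 0xFFFFu), asid);
}

/* Reassemble the pointer value from two traced context-ID words. */
uint32_t cid_decode(const uint32_t in[2])
{
    return ((uint32_t)cid_payload(in[0]) << 16) | cid_payload(in[1]);
}
```

A checker that sees these pairs at each malloc() and free() could then match allocations to frees by value, even though the trace itself carries no data values.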

  22. A Streaming Verification Engine. (Diagram: a four-stage pipeline. Sources: ETM Sequencer, Packet Capture. Capture: HSSTP, FPGA Capture. Processing: Dataflow Engine, FPGA Offload. Properties: TESLA, malloc() pairing, Coherence correctness. Further labels: Constraints, Requirements.)

  23. Offloading Example: LTL to Büchi. ● LTL(-ish) formula: A store on core 1 is eventually visible on core 2. ● Think regular expressions for infinite streams. ● As for REs, we compile a checking automaton. ● Run the automaton in real time and look for violations. ● FPGAs are good at state machines.
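
A checking automaton for this property might look like the following sketch. Because "eventually" can never be refuted on a finite prefix of a trace, the sketch checks a bounded variant instead: the store must become visible within `bound` further events. That bound, the event names, and the `monitor` type are all assumptions made for illustration, and the monitor tracks only one outstanding store at a time.

```c
#include <stddef.h>

/* Hypothetical trace events, already decoded from the trace stream. */
enum event { EV_OTHER, EV_STORE_CORE1, EV_VISIBLE_CORE2 };

enum mon_state { MON_IDLE, MON_PENDING, MON_VIOLATED };

struct monitor {
    enum mon_state state;
    unsigned waited;  /* events seen since the pending store */
    unsigned bound;   /* assumed deadline, in events */
};

void mon_init(struct monitor *m, unsigned bound)
{
    m->state = MON_IDLE;
    m->waited = 0;
    m->bound = bound;
}

/* Feed one event to the automaton and return the resulting state. */
enum mon_state mon_step(struct monitor *m, enum event ev)
{
    switch (m->state) {
    case MON_IDLE:
        if (ev == EV_STORE_CORE1) {
            m->state = MON_PENDING;
            m->waited = 0;
        }
        break;
    case MON_PENDING:
        if (ev == EV_VISIBLE_CORE2)
            m->state = MON_IDLE;        /* obligation discharged */
        else if (++m->waited > m->bound)
            m->state = MON_VIOLATED;    /* deadline missed */
        break;
    case MON_VIOLATED:
        break;                          /* violations are sticky */
    }
    return m->state;
}

/* Run the automaton over a recorded trace of n events. */
enum mon_state mon_run(struct monitor *m, const enum event *trace, size_t n)
{
    for (size_t i = 0; i < n; i++)
        mon_step(m, trace[i]);
    return m->state;
}
```

The whole monitor is three states and one counter, which is the kind of logic that maps directly onto FPGA fabric.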

  24. An Instrumented Rack-Scale System. ● 64 SoCs x 5Gb/s = 320Gb/s trace output. ● Online checkers (e.g. automata) will be essential at this scale. ● We're going to build this: – A rack of ARMv8 cores & FPGAs. ● We're starting a fortnightly reading group to get up to speed on the Runtime Monitoring literature – feel free to join. https://code.systems.ethz.ch/project/view/55/ rack-tracing@lists.inf.ethz.ch
