 
Overview of InfiniBand Architecture
Dhabaleswar K. (DK) Panda
The Ohio State University
E-mail: panda@cse.ohio-state.edu
http://www.cse.ohio-state.edu/~panda
HPCA '10

Processing Bottlenecks in Traditional Protocols
• Ex: TCP/IP, UDP/IP
• Generic architecture for all network interfaces
• Host handles almost all aspects of communication
  – Data buffering (copies on sender and receiver)
  – Data integrity (checksum)
  – Routing aspects (IP routing)
• Signaling between different layers
  – Hardware interrupt whenever a packet arrives or is sent
  – Software signals between different layers to handle protocol processing at different priority levels

Capabilities of High-Performance Networks
• Intelligent Network Interface Cards
• Support the entire protocol processing in hardware (hardware protocol offload engines)
• Provide a rich communication interface to applications
  – User-level communication capability
  – Gets rid of intermediate data buffering requirements
• No software signaling between communication layers
  – All layers are implemented on a dedicated hardware unit, not on a shared host CPU

Previous High-Performance Network Stacks
• Virtual Interface Architecture (VIA)
  – Standardized by Intel, Compaq, Microsoft
• Fast Messages (FM)
  – Developed by UIUC
• Myricom GM
  – Proprietary protocol stack from Myricom
• These network stacks set the trend for high-performance communication requirements
  – Hardware offloaded protocol stack
  – Support for fast and secure user-level access to the protocol stack

IB Trade Association
• IB Trade Association was formed with seven industry leaders (Compaq, Dell, HP, IBM, Intel, Microsoft, and Sun)
• Goal: to design a scalable, high-performance communication and I/O architecture by taking an integrated view of computing, networking, and storage technologies
• Many other industry participants contributed to defining the IB architecture specification
• IB Architecture (Volume 1, Version 1.0) was released to the public on Oct 24, 2000
  – Latest version 1.2.1 released January 2008
• http://www.infinibandta.org

IB Hardware Acceleration
• Some IB models have multiple hardware accelerators
  – E.g., Mellanox IB adapters
• Protocol Offload Engines
  – Completely implement layers 2-4 in hardware
• Additional hardware-supported features are also present
  – RDMA, Multicast, QoS, Fault Tolerance, and many more
IB Overview
• InfiniBand
  – Architecture and Basic Hardware Components
  – Communication Model and Semantics
    • Communication Model
    • Memory registration and protection
    • Channel and memory semantics
  – Novel Features
    • Hardware Protocol Offload
      – Link, network and transport layer features
  – Management and Services
    • Subnet Management
    • Hardware support for scalable network management

A Typical IB Network
• Three primary components
  – Channel Adapters
  – Switches/Routers
  – Links and connectors

Components: Channel Adapters
• Used by processing and I/O units to connect to the fabric
• Consume and generate IB packets
• Programmable DMA engines with protection features
• May have multiple ports
  – Independent buffering channeled through Virtual Lanes
• Host Channel Adapters (HCAs)

Components: Switches and Routers
• Relay packets from one link to another
• Switches: intra-subnet
• Routers: inter-subnet
• May support multicast

Components: Links & Repeaters
• Network Links
  – Copper, Optical, Printed Circuit wiring on Back Plane
  – Not directly addressable
• Traditional adapters built for copper cabling
  – Restricted by cable length (signal integrity)
  – For example, QDR copper cables are restricted to 7 m
• Intel Connects: optical cables with copper-to-optical conversion hubs (acquired by Emcore)
  – Up to 100 m length
  – 550 picoseconds copper-to-optical conversion latency
• Available from other vendors (Luxtera)
• Repeaters (Vol. 2 of InfiniBand specification)
(Courtesy Intel)
IB Communication Model
Basic InfiniBand Communication Semantics

Queue Pair Model
• Each QP has two queues
  – Send Queue (SQ)
  – Receive Queue (RQ)
  – Work requests are queued to the QP (WQEs: "Wookies")
• QP is linked to a Completion Queue (CQ)
  – Gives notification of operation completion from QPs
  – Completed WQEs are placed in the CQ with additional information (CQEs: "Cookies")
[Figure: a QP (Send and Recv queues holding WQEs) and a CQ (holding CQEs) on an InfiniBand device]

More on WQEs and CQEs
• Send WQEs contain data about what buffer to send from, how much to send, etc.
• Receive WQEs contain data about what buffer to receive into, how much to receive, etc.
• CQEs contain data about which QP the completed WQE was posted on and how much data actually arrived

Memory Registration
Before we do any communication, all memory used for communication must be registered:
1. Registration request
  • Send virtual address and length
2. Kernel handles the virtual->physical mapping and pins the region into physical memory
  • Process cannot map memory that it does not own (security!)
3. HCA caches the virtual-to-physical mapping and issues a handle
  • Includes an l_key and r_key
4. Handle is returned to the application
[Figure: steps 1-4 flowing between Process, Kernel and HCA/RNIC]

Memory Protection
For security, keys are required for all operations that touch buffers
• To send or receive data, the l_key must be provided to the HCA
  • HCA verifies access to local memory
• For RDMA, the initiator must have the r_key for the remote virtual address
  • Possibly exchanged with a send/recv
  • r_key is not encrypted in IB
[Figure: Process, Kernel and HCA/NIC; the l_key guards local access, and the r_key is needed for RDMA operations]

Communication in the Channel Semantics (Send/Receive Model)
Processor is involved only to:
1. Post receive WQE
2. Post send WQE
3. Pull out completed CQEs from the CQ
• Send WQE contains information about the send buffer (multiple non-contiguous segments)
• Receive WQE contains information on the receive buffer (multiple non-contiguous segments); incoming messages have to be matched to a receive WQE to know where to place the data
[Figure: two nodes, each with processor, memory segments, QP (Send/Recv) and CQ, connected through InfiniBand devices with a hardware ACK]
Communication in the Memory Semantics (RDMA Model)
Initiator processor is involved only to:
1. Post send WQE
2. Pull out completed CQE from the send CQ
No involvement from the target processor
• Send WQE contains information about the send buffer (multiple segments) and the receive buffer (single segment)
[Figure: two nodes, each with processor, memory segments, QP (Send/Recv) and CQ, connected through InfiniBand devices with a hardware ACK]

Hardware Protocol Offload
Complete Hardware Implementations Exist

Link Layer Capabilities
• CRC-based Data Integrity
• Buffering and Flow Control
• Virtual Lanes, Service Levels and QoS
• Switching and Multicast
• IB WAN Capability

CRC-based Data Integrity
• Two forms of CRC to achieve both early error detection and end-to-end reliability
  – Invariant CRC (ICRC) covers fields that do not change per link (per network hop)
    • E.g., routing headers (if there are no routers), transport headers, data payload
    • 32-bit CRC (compatible with Ethernet CRC)
    • End-to-end reliability (does not include I/O bus)
  – Variant CRC (VCRC) covers everything
    • 16-bit CRC
    • Erroneous packets do not have to reach the destination
    • Early error detection

Buffering and Flow Control
• IB provides absolute credit-based flow control
  – Receiver guarantees that it has enough space allotted for N blocks of data
  – Occasional update of available credits by the receiver
• Has no relation to the number of messages, but only to the total amount of data being sent
  – One 1MB message is equivalent to 1024 1KB messages (except for rounding off at message boundaries)