
Overview of InfiniBand Architecture



Dhabaleswar K. (DK) Panda
The Ohio State University
E-mail: panda@cse.ohio-state.edu
http://www.cse.ohio-state.edu/~panda
(HPCA '10 tutorial)

Processing Bottlenecks in Traditional Protocols
• Ex: TCP/IP, UDP/IP
• Generic architecture for all network interfaces
• Host handles almost all aspects of communication
  – Data buffering (copies on sender and receiver)
  – Data integrity (checksum)
  – Routing aspects (IP routing)
• Signaling between different layers
  – Hardware interrupt whenever a packet arrives or is sent
  – Software signals between different layers to handle protocol processing at different priority levels

Capabilities of High-Performance Networks
• Intelligent Network Interface Cards
  – Support entire protocol processing completely in hardware (hardware protocol offload engines)
• Provide a rich communication interface to applications
  – User-level communication capability
  – Gets rid of intermediate data-buffering requirements
• No software signaling between communication layers
  – All layers are implemented on a dedicated hardware unit, not on a shared host CPU

Previous High-Performance Network Stacks
• Virtual Interface Architecture (VIA)
  – Standardized by Intel, Compaq, Microsoft
• Fast Messages (FM)
  – Developed by UIUC
• Myricom GM
  – Proprietary protocol stack from Myricom
• These network stacks set the trend for high-performance communication requirements
  – Hardware-offloaded protocol stack
  – Support for fast and secure user-level access to the protocol stack

IB Trade Association
• The IB Trade Association was formed by seven industry leaders (Compaq, Dell, HP, IBM, Intel, Microsoft, and Sun)
• Goal: to design a scalable and high-performance communication and I/O architecture by taking an integrated view of computing, networking, and storage technologies
• Many other industry members participated in the effort to define the IB architecture specification
• IB Architecture (Volume 1, Version 1.0) was released to the public on Oct 24, 2000
  – Latest version 1.2.1 released January 2008
• http://www.infinibandta.org

IB Hardware Acceleration
• Some IB models have multiple hardware accelerators
  – E.g., Mellanox IB adapters
• Protocol Offload Engines
  – Completely implement layers 2-4 in hardware
• Additional hardware-supported features also present
  – RDMA, Multicast, QoS, Fault Tolerance, and many more
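The per-byte host work that the slides attribute to traditional stacks can be made concrete with a small sketch (not from the slides): the 16-bit ones'-complement checksum that TCP/IP and UDP/IP require the host CPU to compute over every payload byte — exactly the kind of work a hardware protocol offload engine moves off the host.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum (RFC 1071 style): the per-byte
    data-integrity work a traditional host-based TCP/IP or UDP/IP stack
    performs on the CPU for every packet."""
    if len(data) % 2:                # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF

payload = b"example packet payload"
csum = internet_checksum(payload)
# Receiver-side check: a packet that includes its checksum sums to zero.
assert internet_checksum(payload + csum.to_bytes(2, "big")) == 0
```

Every byte passes through the CPU here; an offload engine performs the equivalent loop on the NIC, which is one of the bottlenecks the intelligent-NIC designs above eliminate.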

IB Overview (outline)
• InfiniBand
  – Architecture and Basic Hardware Components
  – Communication Model and Semantics
    • Communication Model
    • Memory registration and protection
    • Channel and memory semantics
  – Novel Features
    • Hardware Protocol Offload
      – Link, network and transport layer features
  – Management and Services
    • Subnet Management
    • Hardware support for scalable network management

A Typical IB Network
• Three primary components
  – Channel Adapters
  – Switches/Routers
  – Links and connectors

Components: Channel Adapters
• Used by processing and I/O units to connect to the fabric
• Consume and generate IB packets
• Programmable DMA engines with protection features
• May have multiple ports
  – Independent buffering channeled through Virtual Lanes
• Host Channel Adapters (HCAs)

Components: Switches and Routers
• Relay packets from one link to another
• Switches: intra-subnet
• Routers: inter-subnet
• May support multicast

Components: Links & Repeaters
• Network links
  – Copper, optical, or printed-circuit wiring on a backplane
  – Not directly addressable
• Traditional adapters built for copper cabling
  – Restricted by cable length (signal integrity); for example, QDR copper cables are restricted to 7 m
• Intel Connects: optical cables with copper-to-optical conversion hubs (acquired by Emcore)
  – Up to 100 m length
  – 550 picoseconds copper-to-optical conversion latency
  – Available from other vendors (Luxtera)
  (Courtesy Intel)
• Repeaters (Vol. 2 of the InfiniBand specification)
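The division of labor described above — switches relaying packets within a subnet and optionally replicating multicast traffic — can be sketched as a toy model. The class and table layout below are illustrative only, not the IB specification's forwarding-table format; what is faithful is the idea of choosing output ports by destination LID and replicating for multicast.

```python
class ToySwitch:
    """Illustrative model of an IB switch: relays a packet to output
    port(s) chosen by destination LID. Intra-subnet only; forwarding
    between subnets is the router's job."""
    def __init__(self):
        self.unicast = {}    # destination LID -> single output port
        self.multicast = {}  # multicast LID  -> set of output ports

    def relay(self, dlid):
        if dlid in self.multicast:
            # Multicast support: replicate to every subscribed port.
            return sorted(self.multicast[dlid])
        return [self.unicast[dlid]]

sw = ToySwitch()
sw.unicast[0x12] = 3
sw.multicast[0xC001] = {1, 2, 4}
print(sw.relay(0x12))    # [3]
print(sw.relay(0xC001))  # [1, 2, 4]
```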

IB Communication Model
• Basic InfiniBand communication semantics

Queue Pair Model
• Each QP has two queues
  – Send Queue (SQ)
  – Receive Queue (RQ)
  – Work requests are queued to the QP (WQEs: "Wookies")
• Each QP is linked to a Completion Queue (CQ)
  – Gives notification of operation completion from QPs
  – Completed WQEs are placed in the CQ with additional information (CQEs: "Cookies")

More on WQEs and CQEs
• Send WQEs contain data about what buffer to send from, how much to send, etc.
• Receive WQEs contain data about what buffer to receive into, how much to receive, etc.
• CQEs contain data about which QP the completed WQE was posted on and how much data actually arrived

Memory Registration
Before any communication can take place, all memory used for communication must be registered:
1. Registration request
   – The process sends the virtual address and length
2. The kernel handles the virtual-to-physical mapping and pins the region into physical memory
   – A process cannot map memory that it does not own (security!)
3. The HCA caches the virtual-to-physical mapping and issues a handle
   – Includes an l_key and an r_key
4. The handle is returned to the application

Memory Protection
For security, keys are required for all operations that touch buffers:
• To send or receive data, the l_key must be provided to the HCA
  – The HCA verifies access to local memory
• For RDMA, the initiator must have the r_key for the remote virtual address
  – Possibly exchanged with a send/recv
  – The r_key is not encrypted in IB

Communication in the Channel Semantics (Send/Receive Model)
• The processor is involved only to:
  1. Post a receive WQE
  2. Post a send WQE
  3. Pull completed CQEs from the CQ
• The send WQE contains information about the send buffer (multiple non-contiguous segments)
• The receive WQE contains information on the receive buffer (multiple non-contiguous segments); the r_key is needed for RDMA operations
• Incoming messages have to be matched to a receive WQE to know where to place the data
• Hardware ACK between the two InfiniBand devices
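The send/receive flow above — post a receive WQE, post a send WQE, pull CQEs from the CQ — can be sketched as a toy Python model. Class and method names are illustrative, not the verbs API (a real program would use calls such as ibv_post_send, ibv_post_recv, and ibv_poll_cq from libibverbs); the point is the ordering: the receive WQE must be posted first so the incoming message can be matched to a buffer.

```python
from collections import deque

class ToyQP:
    """Illustrative queue-pair model: WQEs are posted to send/receive
    queues; completions (CQEs) appear on the linked CQ."""
    def __init__(self, cq):
        self.sq, self.rq, self.cq = deque(), deque(), cq

    def post_recv(self, buf):
        self.rq.append(buf)                 # receive WQE: where to place data

    def post_send(self, data, peer):
        peer._deliver(data)                 # "hardware" moves the payload
        self.cq.append(("send_done", len(data)))

    def _deliver(self, data):
        buf = self.rq.popleft()             # match message to a receive WQE
        buf[:len(data)] = data
        self.cq.append(("recv_done", len(data)))

send_cq, recv_cq = [], []
sender, receiver = ToyQP(send_cq), ToyQP(recv_cq)
buf = bytearray(16)
receiver.post_recv(buf)                # 1. post receive WQE (before the send)
sender.post_send(b"hello", receiver)   # 2. post send WQE
print(send_cq, recv_cq, bytes(buf[:5]))
# [('send_done', 5)] [('recv_done', 5)] b'hello'
```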

Communication in the Memory Semantics (RDMA Model)
• The initiator processor is involved only to:
  1. Post a send WQE
  2. Pull the completed CQE from the send CQ
• No involvement from the target processor
• The send WQE contains information about the send buffer (multiple segments) and the receive buffer (single segment)
• Hardware ACK between the two InfiniBand devices

Hardware Protocol Offload
• Complete hardware implementations exist

Link Layer Capabilities
• CRC-based data integrity
• Buffering and flow control
• Virtual Lanes, Service Levels, and QoS
• Switching and multicast
• IB WAN capability

CRC-Based Data Integrity
• Two forms of CRC to achieve both early error detection and end-to-end reliability
  – Invariant CRC (ICRC) covers fields that do not change per link (per network hop)
    • E.g., routing headers (if there are no routers), transport headers, data payload
    • 32-bit CRC (compatible with the Ethernet CRC)
    • End-to-end reliability (does not include the I/O bus)
  – Variant CRC (VCRC) covers everything
    • 16-bit CRC
    • Erroneous packets do not have to reach the destination
    • Early error detection

Buffering and Flow Control
• IB provides an absolute credit-based flow control
  – The receiver guarantees that it has enough space allotted for N blocks of data
  – Occasional updates of available credits by the receiver
• Has no relation to the number of messages, only to the total amount of data being sent
  – One 1 MB message is equivalent to 1024 1 KB messages (except for rounding off at message boundaries)
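The equivalence between one large message and many small ones falls directly out of a credit-per-block scheme. The toy simulation below illustrates the idea only — it is not the wire protocol's actual credit encoding — with a hypothetical block size of 1 KB per credit:

```python
class CreditLink:
    """Toy absolute credit-based flow control: the receiver advertises
    how many blocks of `block` bytes it has buffer space for; the
    sender may transmit only while credits remain."""
    def __init__(self, block, credits):
        self.block, self.credits = block, credits

    def grant(self, n):
        self.credits += n              # occasional credit update from receiver

    def send(self, nbytes):
        need = -(-nbytes // self.block)  # blocks, rounded up at message boundary
        if need > self.credits:
            return False                 # sender must wait for more credits
        self.credits -= need
        return True

link = CreditLink(block=1024, credits=1024)
assert link.send(1 << 20)            # one 1 MB message consumes 1024 credits
link.grant(1024)
assert all(link.send(1024) for _ in range(1024))  # same cost: 1024 x 1 KB
assert not link.send(1)              # out of credits: sender blocks
```

Note that credits are spent per block of data, never per message, which is exactly why total bytes, not message count, is what the receiver must budget for.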
