Three Critical Use Cases
Online Data Intensive (OLDI) Services
Tail Latency is Critical
OLDI applications have real-time deadlines and run in parallel on 1000s of servers. Incast is a naturally occurring phenomenon. Tail latency reduces the quality of results.
[Figure: a Request enters an Aggregator, which fans out to further Aggregators and then to Workers. Deadline labels: 250 ms, 50 ms, 10 ms.]
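A rough way to see why tail latency matters at this fan-out is to ask how often every worker answers in time. The sketch below is illustrative only: it assumes workers miss their deadlines independently, which the slide does not state, and the function name `prob_all_on_time` and the 99% figure are hypothetical.

```python
def prob_all_on_time(p_worker: float, fan_out: int) -> float:
    """Probability that all `fan_out` workers reply before the deadline.

    Assumption (not from the slide): each worker meets the deadline
    independently with the same probability `p_worker`.
    """
    if not 0.0 <= p_worker <= 1.0:
        raise ValueError("p_worker must be between 0 and 1")
    if fan_out < 0:
        raise ValueError("fan_out must be non-negative")
    return p_worker ** fan_out


if __name__ == "__main__":
    # Hypothetical per-worker on-time rate of 99% at growing fan-out.
    for n in (1, 10, 100, 1000):
        print(n, round(prob_all_on_time(0.99, n), 4))
```

Under this assumption, even a small per-worker miss rate compounds quickly as the number of servers grows, so the slowest responses end up determining how complete the aggregated result is.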
Loss and Latency Sensitive
Disaggregated resource pooling, such as NVMe over Fabrics, uses RDMA and runs over converged network infrastructure. Low latency and low loss are critical.
NVMe over Fabrics
Training Scale is Network Limited
Massively parallel HPC applications, such as AI training, depend on a low-latency, high-throughput network. Models have billions of parameters. Scale-out is limited by network performance.
Deep Learning
[Figure: training timeline from Start along Elapsed Time. The Dataset is split into Partitions assigned to Rank 0, Rank 1, and Rank 2. Stage labels: Feed Data, Training, MPI Allreduce, Weights, Send Weight.]
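The data-parallel flow named in the figure labels (partition the dataset, train per rank, allreduce the weights) can be sketched in plain Python. This is a single-process toy, not MPI: the helpers `partition` and `allreduce_mean` are hypothetical names, and averaging is assumed as the reduction, which the slide does not specify.

```python
from typing import List, Sequence


def partition(dataset: Sequence, n_ranks: int) -> List[list]:
    """Split the dataset round-robin into one partition per rank."""
    if n_ranks <= 0:
        raise ValueError("n_ranks must be positive")
    return [list(dataset[r::n_ranks]) for r in range(n_ranks)]


def allreduce_mean(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean across ranks; every rank would receive this result.

    Assumption: the reduction is an average. A real job would exchange
    these values over the network (for example with MPI), which is the
    step whose cost depends on network latency and throughput.
    """
    if not vectors:
        raise ValueError("need at least one rank")
    width = len(vectors[0])
    if any(len(v) != width for v in vectors):
        raise ValueError("all ranks must hold vectors of the same length")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


if __name__ == "__main__":
    parts = partition(list(range(6)), 3)  # one partition per rank
    # Toy "training": each rank's weight vector is its partition's mean.
    local_weights = [[sum(p) / len(p)] for p in parts]
    print(allreduce_mean(local_weights))
```

Every rank has to wait for the allreduce to finish before the next step can start, so with billions of parameters the exchange time grows with the amount of data moved across the network.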