Session Border Control in the Cloud
Accelerating Virtual Network Functions with GPUs
Kevin Riley, CTO & EVP of Advanced R&D
2 Ribbon Communications Confidential and Proprietary
SBCs: What Are They?

SBC Application
  Secure and Interwork Unified Communications
  Deployed in Service Provider Core, Edge and Customer Premise
  Application Decomposes into Control and Media Plane

Challenges
  Transcoding Inefficiencies Inhibiting Cloud Migration at Scale
  Enhanced Security Capabilities Ill-suited to CPU

Evolution
  Historically Implemented on Purpose-Built HW
  Migration to CPU and Cloud Infrastructure Is Current State of the Art

SBC Application Components
  Control Plane
  Media Transcoding
  Media Security & Forwarding
[Figure: two charts comparing x86 CPU and NVIDIA GPU, 2008 to 2017. Panel 1: Peak Memory Bandwidth (GB/s). Panel 2: Peak Double Precision FLOPS (GFLOPS). Source: Nvidia's presentation; Market Realist.]
Challenges
various types of recursive filters which are ill-suited for parallelization.
be limited by Amdahl’s law.
powerful than a CPU core.
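As a rough illustration of the Amdahl's law limit mentioned above, the sketch below computes the best-case speedup when only part of a workload can run in parallel. The function name and the 90% parallel fraction are hypothetical values chosen for illustration; they are not measurements from this deck.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Best-case speedup under Amdahl's law.

    parallel_fraction: share of the work that can be spread across workers (0..1).
    workers: number of parallel execution units.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at full cost; only the parallel part is divided.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical example: 90% of a codec's work is parallelizable.
    for workers in (1, 16, 1664):
        print(workers, round(amdahl_speedup(0.9, workers), 2))
```

With any serial fraction s, the speedup can never exceed 1/s regardless of how many workers are added, which is why splitting a single channel's codec across many GPU threads gives limited returns.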
For a stable transcoding system, processing of all channels must complete within the codec frame time. CPUs and DSPs process channels sequentially, so the per-channel processing time must be kept low.
[Figure: timeline of Channel 0 through Channel N processed within the Codec Frame Time.]
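The frame-time constraint for sequential processing can be sketched as a simple deadline check. This is a minimal illustration, not the product's scheduler: the function name is hypothetical, and the 35 µs per-channel time and 10 ms frame time used in the usage lines are the G729A figures quoted elsewhere in this deck.

```python
from itertools import accumulate
from typing import Optional, Sequence


def first_late_channel(per_channel_us: Sequence[int], frame_time_us: int) -> Optional[int]:
    """Return the index of the first channel that finishes after the frame
    time when channels are processed one after another, or None if all fit."""
    # accumulate() yields the finish time of each channel in sequence.
    for index, finish_us in enumerate(accumulate(per_channel_us)):
        if finish_us > frame_time_us:
            return index
    return None


if __name__ == "__main__":
    frame_time_us = 10_000  # 10 ms codec frame time
    print(first_late_channel([35] * 285, frame_time_us))  # all channels fit
    print(first_late_channel([35] * 286, frame_time_us))  # last channel is late
```

Any channel that the check reports as late would miss its frame, so a sequential processor has to cap the channel count or shorten the per-channel time.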
New Approach
similar jobs.
less than frame time. G729A transcoding (encode + decode), with a 10 ms frame time, takes approximately 35 µs for one channel on an E5-2690 v2 processor, so on a CPU we can achieve approximately 285 transcodes (10 ms / 35 µs). When we offload per-channel processing to a single GTX 970 thread, it takes approximately 6 ms (initial prototype). However, in those 6 ms we can process 1664 channels, because the GTX 970 has 1664 cores.
[Figure: Traditional Approach vs. New Approach. Labels: Time, Codec Frame Time, Channel 0, Channel 1 through Channel N.]
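The G729A figures quoted above can be checked with a few lines of arithmetic. This is a simplified capacity model, assuming one channel per GPU thread and that only whole batches completing inside the frame time count; the function and constant names are hypothetical.

```python
FRAME_TIME_US = 10_000       # G729A frame time: 10 ms
CPU_PER_CHANNEL_US = 35      # one channel on the E5-2690 v2
GPU_PER_CHANNEL_US = 6_000   # one channel on a single GTX 970 thread (prototype)
GPU_THREADS = 1664           # GTX 970 core count, one channel per thread (assumed)


def sequential_capacity(frame_time_us: int, per_channel_us: int) -> int:
    """Channels that fit in one frame when processed one after another."""
    return frame_time_us // per_channel_us


def parallel_capacity(frame_time_us: int, per_channel_us: int, threads: int) -> int:
    """Channels that fit in one frame when each thread handles one channel
    at a time and only batches that finish inside the frame are counted."""
    batches_per_frame = frame_time_us // per_channel_us
    return threads * batches_per_frame


cpu_channels = sequential_capacity(FRAME_TIME_US, CPU_PER_CHANNEL_US)
gpu_channels = parallel_capacity(FRAME_TIME_US, GPU_PER_CHANNEL_US, GPU_THREADS)
ratio = gpu_channels / cpu_channels

if __name__ == "__main__":
    print(cpu_channels, gpu_channels, round(ratio, 2))
```

Under this model a per-channel time roughly 170 times slower on the GPU is still acceptable, because what matters is that the whole batch finishes inside the frame time, not how fast any single channel runs.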
hardware
reference source code.
hardware
IPP and third-party vendors.
hardware.
third-party vendors.
GPU vs CPU: Sessions (sessions multiplier)

Codec        M60      V100
G729A        518%     605%
EVRC-9.3     1136%    1458%
EVRCB-9.3    534%     1066%
AMR-12.2     320%     519%
AMRWB-6.6    407%     732%
GPU vs CPU: Sessions/Watt

Codec        M60      V100
G729A        193%     356%
EVRC-9.3     333%     543%
EVRCB-9.3    133%     314%
AMR-12.2     81%      172%
AMRWB-6.6    111%     209%
[Figure: SBC application components: Control Plane, Media Transcoding, Media Security & Forwarding.]