Edge Resource Management Systems: From Today to Tomorrow
OpenStack Summit Berlin, November 2018
Who are we?
Adrien Lebre
Professor (HdR) at IMT Atlantique, FEMDC SIG co-chair (2016-2018), Discovery PI
http://beyondtheclouds.github.io

Abdelhadi Chari
Cloud/NFV innovation, Project Manager, Orange
abdelhadi.chari@orange.com

Sandro Mazziotta
Director Product Management, OpenStack NFV, Red Hat
smazziot@redhat.com
Multiple use cases are triggering Edge...
Source: https://wiki.akraino.org/display/AK/Akraino+Edge+Stack
Edge from the Infrastructure viewpoint?
A set of independent computing sites that should be seen as one global, unified infrastructure.
Form factors, from core to far edge:
National Core / Regional Core: multiple racks
Edge: 1 rack, fewer than 10 servers
Far Edge: 1 to 3 servers
New constraints on the resource mgmt system
Latency / Bandwidth / intermittent network: consumer ⇔ service, service ⇔ service, control plane ⇔ resources
Resilience: make edge sites autonomous, minimize the failure domain to one site
Regulations: keep sensitive data on-site / within the regulatory region
Geo-distribution: need to deploy and perform the lifecycle management of distributed systems
Scale: from a few regional sites to a large number of remote resources
And many others...
Orange: Edge through the NFV perspective
NFV hosting infrastructure needs: lower and lower
[Figure: NFV hosting levels, from country/international data centers down to local PoPs, connected by the backbone network]
Distribution level, typical distance, and number of sites:
Country / International Data Centers: 1000+ km, a few per country
Regional PoPs: 300 - 500 km, a few tens
Central Offices: 100 - 250 km, a few hundreds
Local PoPs: 5 - 50 km, a few thousand to tens of thousands
Control-plane-focused functions (mobile core side): Mobile Core GiLAN, vEPC, MVNO, vCDN control, vPCRF, vMME, MVAS, vWiFi access control Gateway, vIMS, vSBC
Data-plane-focused functions (access side): 4G vBBU, 2G/3G vRNC, vSSL/IPSec Gateway, vCDN, vCPE, vBOX, vEPC4Business, vBNG, DSLAM / OLT, RAN Virtual DU (MAC/RLC), vOLT, MEC
- Simply speaking: we should be able to "play" with this distributed cloud infrastructure as if it were located in a single data center
- Not so easy!
○ Scalability
○ Lifecycle of control components
○ Networking: especially the interactions with the WAN
○ On-site operations for initial setup, hardware upgrade/troubleshooting
○ How to architect the control plane components for better resiliency/efficiency?
○ Can we really share the infrastructure among a large variety of NFV functions (mission critical and best effort)?
What do we expect from this distributed infrastructure?
- And additional requirements will appear, driven by the nature of the
NFV functions themselves:
○ Performance and real-time constraints (e.g., for higher-PHY vRAN functions)
○ Mix of workloads to be supported (VMs + containers)
○ Location awareness for resource allocation/reconfiguration/interconnection
Orange, like other global telcos, is strongly interested in preparing different scenarios for how to use OpenStack to address these requirements, and in working on and supporting OpenStack evolutions for this purpose
What do we expect from this distributed infrastructure?
Can we operate such a topology with OpenStack?
Edge: Envisioned Topology
The Akraino View
Source: https://wiki.akraino.org/display/AK/Akraino+Edge+Stack
Legend: E = Edge Site, R = Regional Site, C = Central Site
Another possible View
[Diagram: wired and wireless links over the WAN connecting a large data center (central/regional site), medium/micro DCs (edge sites), distributed DCs, and embedded DCs (constrained and mobile edge sites), with controllers and compute/storage nodes spread across the sites]
- Two questions to address:
○ How does OpenStack behave in each of these deployment scenarios (i.e., what are the challenges of each scenario)?
○ How can we make each OpenStack collaborate with the others?
Can we operate such a topology with OpenStack?
[Diagram: a central-site data center, a regional-site distributed DC, micro DCs, and embedded DCs in customer premises equipment and public transport, interconnected over the WAN through wired and wireless links, with controllers and compute/storage nodes at each site]
Edge: Envisioned Topology
[Diagram: the envisioned topology (data center at the central/regional site, micro, distributed, and embedded DCs at edge, constrained, and mobile edge sites over the WAN), annotated with the dimensions that vary across sites: scalability, footprint, synchronization, and network specifics]
Edge: Envisioned Topology
A few challenges (scalability, latency, intermittent network connectivity, deployment, etc.). Inria, Orange, and Red Hat have been investigating the distributed DC scenario for the last two years.
Let’s focus on Distributed DC
Distributed DC => Distributed Compute Nodes
[Diagram: a central location hosting the OpenStack controllers and TripleO (lifecycle management), connected over the WAN to remote sites 1 to N, each running compute nodes]
Features:
➢ 1 shared OpenStack cluster
➢ 1 central control plane and N remote sites with compute nodes
➢ Each remote site is an AZ (see the CLI sketch below)
A few performance studies:
➢ Evaluating OpenStack WANWide (see the FEMDC OpenStack wiki page)
➢ Supported by Red Hat since Newton
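As a rough illustration of the "each remote site is an AZ" model (a minimal sketch: site, host, and image names are made up, and the exact workflow depends on the deployment tooling), a remote compute node can be grouped into a site-specific availability zone and targeted at boot time:

  # expose the remote site as an availability zone via a host aggregate
  openstack aggregate create --zone edge1 edge1
  openstack aggregate add host edge1 compute-0.edge1.example.com
  # pin a workload to that remote site
  openstack server create --flavor m1.tiny --image cirros --availability-zone edge1 vm-edge1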
Lessons learnt
- The testing effort was focused on clarifying limitations and expectations in Newton
○ Latency
■ At 50 ms round-trip latency, the testing started producing errors and timeouts.
■ Beyond this is not generally supported, but the testing was focused on the infrastructure.
■ Once our testing reached 300 ms round-trip latency, the errors increased to the point where service communication would fail.
○ Size of images
■ Our initial tests validated the size of the images and the impact of deploying them the first time.
■ Beyond 2 GB images, deployments failed when deploying 1000 VMs over 10 compute nodes.
○ Bandwidth
■ This is highly dependent on the environment (size of images, unique images, application needs).
■ Since images are cached, the most bandwidth is needed when sending a unique image for the first time.
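For a rough order of magnitude (illustrative numbers, not taken from the tests above): sending a single 2 GB image over a 100 Mbit/s WAN link already takes about 16,000 Mbit / 100 Mbit/s = 160 s for the first transfer alone, which is why caching images at the remote site matters so much.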
Upstream agenda
- In Queens/Rocky
○ Compute Node with Ephemeral Storage
○ Director Deployment
○ Split Stack => Distributed Deployment
○ (L3 support) / Multi-subnet configurations
- In Stein
○ Compute Node with local (persistent) Storage
○ Distributed Ceph Support: a single Ceph cluster across the central and remote sites
○ Glance Images on multi-store (see the configuration sketch below)
○ Enable HCI Node (Converged Compute, Storage)
○ Ceph Cluster on remote sites (min. of 3 servers)
- In Train
○ Advanced monitoring capabilities to collect data and distribute it to the central location. See the presentation on Thursday at 3 PM, Using Prometheus Operator ...
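To give a hint of what "Glance Images on multi-store" looks like in practice, here is a minimal glance-api.conf sketch (store names, backends, and paths are purely illustrative; a real DCN deployment would more likely use one Ceph/RBD store per site):

  [DEFAULT]
  # one store per location; the format is <store-id>:<store-type>
  enabled_backends = central:file, edge1:file

  [glance_store]
  default_backend = central

  [central]
  filesystem_store_datadir = /var/lib/glance/images-central/

  [edge1]
  filesystem_store_datadir = /var/lib/glance/images-edge1/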
Are there other DCN challenges?
- Ongoing activities
○ New performance evaluations based on Queens (Red Hat)
○ Impact of remote failures/network disconnections (under investigation at Inria/Orange)
http://beyondtheclouds.github.io/blog/
○ Qpid Dispatch Router as an alternative to RabbitMQ: a few studies have been performed since 2017 (see the configuration sketch below)
■ PTG in Dublin / Boston presentation
https://www.openstack.org/videos/vancouver-2018/openstack-internal-messaging-at-the-edge-in-depth-evaluation
■ Berlin Presentation
https://www.openstack.org/summit/berlin-2018/rabbitmq-or-qpid-dispatch-router-pushing-openstack-to-the-edge
- Challenges
○ SDN solution with DCN
■ Not all SDN solutions will work with DCN
■ In particular, the interface with the WAN needs to be taken into account
○ Lifecycle of remote resources
○ Impact throughout the whole infrastructure (control and data planes)
(See videos online)
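For context on the Qpid Dispatch Router studies listed above, switching the messaging layer is largely an oslo.messaging configuration change once the routers are deployed. A minimal sketch (hostname and credentials are made up), e.g. in nova.conf:

  [DEFAULT]
  # AMQP 1.0 driver (Qpid Dispatch Router) instead of the default rabbit:// transport
  transport_url = amqp://openstack:secret@qdr-central.example.com:5672/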
Let's consider several OpenStack control planes
We need several control planes
- Collaborations between/for all services
- A major constraint/objective: do not modify the code
- Two alternative approaches:
○ 1. One ring to rule them all
■ A global AMQP bus and a global shared DB
■ A few presentations have been performed (see the FEMDC OpenStack wiki page)
https://wiki.openstack.org/wiki/Fog_Edge_Massively_Distributed_Clouds#Achieved_Actions
Several control planes: academic investigations
[Diagram: several OpenStack instances (3, 9, 45) sharing state through a global Galera cluster]
Collaboration is not only sharing states (a few services have to be extended)
Almost straightforward (integration at the oslo level)
Scalability / Partitioning / Versioning
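To make the "global AMQP bus and global shared DB" option concrete, here is a minimal sketch (hostnames and credentials are illustrative, and this only covers the state-sharing part, not the service extensions mentioned above): every site's services point at the same message bus and the same Galera endpoint, e.g. in nova.conf:

  [DEFAULT]
  # single message bus shared by all sites
  transport_url = rabbit://openstack:secret@global-bus.example.com:5672/

  [database]
  # single Galera-backed database shared by all sites
  connection = mysql+pymysql://nova:secret@global-galera.example.com/nova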
- A major constraint/objective: do not modify the code
- Two alternative approaches:
○ 2. Reify location information at the API level
Several control planes: academic investigations
openstack server create my-vm --flavor m1.tiny --image cirros.uec --scope {"image":"edge2"}
scope: {"identity":"edge1","compute":"edge1","volume":"edge1","network":"edge1","image":"edge2"}
- A major constraint/objective: do not modify the code
- Two alternative approaches:
○ 2. Reify location information at the API level (cross-service collaborations)
Several control planes: academic investigations
openstack server create my-vm --flavor m1.tiny --image cirros.uec --scope {"image":"edge2"}
openstack image list --scope {"image":"edge1 AND edge2" AND "edge3"}
openstack image create ... --scope {"image":"edge1 AND edge2"}
openstack server create my-vm ... --scope {"compute":"edge1 XOR edge2"}
Collaboration is not only sharing states (a few services have to be extended)
Almost straightforward (with HAProxy)
Scalability / Partitioning / Versioning
Takeaway
- OpenStack WANWide, a single control plane
○ Distributed Compute Nodes for NFV use cases
■ First concrete deployments in an edge context
○ A few challenges to address, but several ongoing efforts
- Multiple control planes: a few technical and research challenges
○ Partitioning/Resynchronisation issues (intermittent networks, embedded DCs...)
○ Cross operations (Neutron/Cinder between sites...)
○ "Single Pane of Glass" (4000 edge sites… how many nodes? see the rough estimate below)
■ Does it make sense?
■ How can it be implemented?
○ etc.
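"How many nodes?" can be estimated from the form factors given earlier (illustrative arithmetic, not a figure from the deck): 4000 edge sites with 1 to 10 servers each already means roughly 4,000 to 40,000 nodes to expose through a single pane of glass.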
Takeaway
- And Kubernetes
○ Can we leverage lessons learnt from previous studies?
■ Does it make sense to perform WANWide Kubernetes evaluations?
■ Cluster federation?
■ Kubernetes has been designed to hide the resource distribution, while edge aims at controlling location aspects. Can these two objectives be addressed simultaneously?
- Could we deal with both ecosystems (some sites with Kubernetes, others with OpenStack)?
- Is the API approach enough to fulfill the expected capabilities?
- Should we consider edge agreements between resource providers, similar to network peering agreements?
Takeaway (cont.)
Interested in all those aspects? See you in Denver to discuss new achievements!
Edge is about all smart-* Apps and IoT devices
[Figure: IoT devices connected through the network up to a national data center]
Source: https://opentechdiary.wordpress.com/2015/07/22/part-5-a-walk-through-internet-of-things-iot-basics/
Smart cities, public transportation, industrial internet (Industry 4.0), internet of skills.
Source: https://www.ericsson.com/thinkingahead/the-networked-society-blog/2017/02/14/virtual-reality-comes-age-internet-skills/