Scaling Nova with CellsV2: The Nova Developer and the CERN Operator Perspective
Dan Smith (Red Hat), Belmiro Moreira (CERN)
Your deployment probably looks like this:
[Diagram: API → DB/MQ → Computes]

Nova with Cells(v1)
[Diagram: a special nova-cells router between the API and the per-cell DB/MQ, doing replication in Python]
This special router needs separate code for almost every feature!
Native sharding of the contended resources
[Diagram: API → per-cell DB/MQ → Computes]

CellsV2 Services
[Diagram: global API, Scheduler, Placement, and a “super” conductor; each cell has its own conductor and compute services]
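As a rough illustration of the sharding above, here is a hedged sketch (not Nova's actual code; compare nova.context.target_cell and the nova_api DB's cell_mappings/instance_mappings tables) of how the API routes a request to the right cell's DB and MQ instead of replicating data in Python:

```python
# Hedged sketch: CellsV2 routing via global mappings.  The nova_api DB
# stores one cell mapping per cell plus an instance->cell mapping; the
# API targets that cell's DB/MQ directly.  Names are simplified.
from dataclasses import dataclass

@dataclass
class CellMapping:
    name: str
    database_connection: str  # this cell's nova DB
    transport_url: str        # this cell's RabbitMQ

# Global data (nova_api DB): which cell each instance lives in.
INSTANCE_MAPPINGS = {
    "8a1b6f60-0000-0000-0000-000000000000": CellMapping(
        name="cell0001",
        database_connection="mysql+pymysql://nova@cell1-db/nova",
        transport_url="rabbit://nova@cell1-mq:5672/"),
}

def target_cell(instance_uuid):
    """Return the DB/MQ endpoints for the cell owning this instance."""
    cell = INSTANCE_MAPPINGS[instance_uuid]
    # A real implementation would open a DB session and RPC transport
    # from these URLs; no Python-level replication is involved.
    return cell.database_connection, cell.transport_url
```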
Design and Development Tenets
Mainstream
- CellsV2 should not be opt-in or a different code path
- Full upstream testing in a reasonably cells-y configuration
- Cells should be invisible to regular API users
No Python Replication
- Data should either live at the global or cell level (not both)
- Aim for no “unsupported in cells” features
Performance
- Optimize cross-cell instance-based API operations
- Introduce caching and fault tolerance as needed
Development Challenges
- Unify two camps of Nova users
○ Those for whom CellsV1 will never be a desirable solution
○ Those for whom CellsV1 is a necessary evil
- Must be able to prescribe a transition for both camps
○ Regular operators have minimal tolerance for unnecessary steps and sometimes fewer resources
○ Typical CellsV1 operators often have more resources, but have large existing deployments
- Major re-architecting of a large amount of Nova internals
- All of this had to happen in parallel with other efforts
- The world kept changing while we worked on this
How’d that go then?
- Mostly good?
○ Obviously this introduced bugs and churn
- Some additional operational overhead for regular operators
- Existing CellsV1 users faced a big transition
○ Deployment assumptions
○ Some of the least-desirable attributes became “features”
- Resulted in some cleanups and stricter rules around existing Nova code
○ Laid the groundwork for future non-scale-related use cases
Status (Rocky)
- Fully developed and tested in mainstream Nova: there is no “non-cells” deployment arrangement
- Good multi-cell performance
○ Focus has been on instance operations
○ Some admin-type operations may still need optimizing
- Some remaining functions fail to work properly in a fully-isolated environment
○ Late affinity check
- Performance is rapidly improving
- Fault tolerance is naive but improving
What’s next?
- Cross-cell migrations
○ Further eliminating the restrictions of running with multiple cells
- Fault tolerance improvements
○ API availability when cells are down
○ Improving quota handling when cells are down
○ Still plenty of room to improve with caching and DB replication
- Affinity via placement
[Screenshot: CERN - Cloud resources status board - 06/11/2018 @ 11:26]
Cells at CERN
- CERN has been using cells since 2013
- Why do we use cells?
○ Single endpoint; scale transparently between different data centres
○ Availability and resilience
○ Isolate failure domains
○ Dedicate cells to projects
○ Hardware type per cell
○ Easy to introduce new configurations
CellsV1 and The Operational Nightmare
- Unmaintained upstream
- Only a few deployments using CellsV1
- Several pieces of functionality missing
○ Flavor propagation
○ No aggregates support
○ No server group support
○ No security groups with nova-network
- A lot of local patches to make other basic functionality work
○ Examples:
■ Boot more than one instance per request
■ Availability Zones support
- DBs can get out of sync
- Upgrades are hard!
Journey to CellsV2 at CERN
[Timeline: Grizzly … Newton, Ocata, Pike, Queens (https://www.youtube.com/watch?v=49CFXNIDM3c&t)]
- 2013: CellsV1 deployed in the CERN Cloud, 2 cells
- 2018: CellsV2 deployed in the CERN Cloud, 70 cells
Why are we excited about CellsV2?
- Upstream code
- All Nova deployments now use cells
○ We are not in the “blackhole” anymore
- Finally we can use Nova’s full feature set
- Promise of sane DBs
- Rolling upgrades for old CellsV1 users
- CERN moved fast to CellsV2
We identified a few interesting issues at scale; most are already fixed in Rocky
HOT Databases
[Diagram: Nova API servers and top cell controllers use the nova_api DB; each cell (CellA … CellZ) has its own controller with a dedicated nova DB and RabbitMQ, plus its compute nodes]
HOT Databases
- Cell database activity increased a lot with CellsV2
○ Simple API operations need to connect to all cell DBs, and most of these operations were sequential
■ nova list; nova boot
○ Most of the issues are already fixed in Queens/Rocky or in progress
■ For example:
- https://bugs.launchpad.net/nova/+bug/1771810
- https://bugs.launchpad.net/nova/+bug/1746558
- https://bugs.launchpad.net/nova/+bug/1746561
[Graph: Number of queries and connections in one cell DB after the Nova Queens upgrade with CellsV2 enabled; the API was only available to a few users]
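Why sequential per-cell queries hurt, and the fix: listing instances across N cells one DB at a time makes latency grow linearly with N (70 cells at CERN), whereas fanning out in parallel costs roughly the slowest cell. A minimal sketch of the scatter-gather idea (compare Nova's scatter_gather_cells helper in nova/context.py; the cell objects and query_instances() method here are hypothetical):

```python
# Minimal scatter-gather sketch for multi-cell listing; illustrative only.
from concurrent.futures import ThreadPoolExecutor, as_completed

def list_instances_sequential(cells):
    # One DB round-trip per cell, one after another: latency adds up
    # linearly with the number of cells.
    results = []
    for cell in cells:
        results.extend(cell.query_instances())
    return results

def list_instances_scatter_gather(cells, timeout=30.0):
    # Issue the same query to every cell DB concurrently and gather the
    # results; total latency is roughly that of the slowest cell.
    results = []
    with ThreadPoolExecutor(max_workers=max(1, len(cells))) as pool:
        futures = [pool.submit(cell.query_instances) for cell in cells]
        for fut in as_completed(futures, timeout=timeout):
            results.extend(fut.result())
    return results
```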
DB Down! Cloud Down!
- A fault-tolerant DB solution per cell is recommended by the Nova team
○ Very challenging for CERN considering the number of cells
○ One of the reasons we decided to use cells was failure domains
- An unavailable cell DB affects the entire cloud
○ Can’t create, list, delete instances…
- No perfect solution... a few compromises
○ https://review.openstack.org/#/q/topic:bp/handling-down-cell
○ For example:
■ Not all information is available when getting instances
- nova list; nova show
- Returns a minimalistic construct from the information available in the API DB
■ It is not possible to calculate quota if a project has instances in an unavailable cell
- Policy: os_compute_api:servers:create:cell_down
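A hedged sketch of the “minimalistic construct” mentioned above: when a cell DB is unreachable, the API can only populate the fields it holds globally in the nova_api DB (the instance mapping), so everything else is reported as unknown. Field names here are illustrative, not Nova’s exact schema:

```python
# Illustrative only: building a partial server record for an instance
# whose cell database is down, from nova_api-DB data alone.
UNKNOWN = "UNKNOWN"

def minimal_instance_view(instance_mapping):
    """instance_mapping: the global record linking instance -> cell."""
    return {
        "id": instance_mapping["instance_uuid"],
        "tenant_id": instance_mapping["project_id"],
        # The cell DB is unreachable, so the real state, flavor and
        # addresses cannot be fetched; list/show report UNKNOWN instead.
        "status": UNKNOWN,
    }
```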
Scheduling
- Central scheduling
○ Filters are not per cell
■ Ex: “PCIPassthroughFilter” runs in every schedule request because we deploy GPUs in one cell
○ “request-filter” for Placement
■ Allows Placement to be aware of project cell mappings and AVZs
■ Basic filtering in Placement; reduces the number of allocation candidates
■ Uses aggregates and placement aggregates
- Automatic sync in Rocky
■ https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregates-in-placement
○ However, we can still get a large number of allocation candidates
■ scheduler/max_placement_results = 10
- Improves scheduling performance
- But… unveiled some issues… (https://bugs.launchpad.net/nova/+bug/1777591)
○ Live migration with a defined target
○ Rebuild with a new image
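To make the request-filter idea concrete, here is a hedged sketch (compare Nova's nova/scheduler/request_filter.py; the function name and dict-based request spec are simplified, not Nova internals): the filter adds an aggregate constraint before Placement is queried, so Placement itself returns fewer allocation candidates:

```python
# Illustrative Placement request filter: constrain the request to the
# placement aggregate mapped to its availability zone, so whole cells
# are excluded server-side before candidates are generated.
def map_az_to_placement_aggregate(request_spec, az_to_aggregate):
    az = request_spec.get("availability_zone")
    if not az:
        return request_spec  # nothing to constrain
    # Placement's GET /allocation_candidates honours member_of=<agg-uuid>
    request_spec.setdefault("member_of", []).append(az_to_aggregate[az])
    return request_spec
```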
Miscellaneous
- Delete "Orphan" request_specs and instance_mappings
○ https://bugs.launchpad.net/nova/+bug/1761198
- Slow AVZ list. Important for Horizon
○ https://bugs.launchpad.net/nova/+bug/1801897
- Scheduling time is higher than in CellsV1
- Don’t expect always a consistent state from 5 years old DBs
○ Delete aggregate_hosts fails if service not available
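A hedged sketch of the orphan-detection idea from the bug above: a request_spec or instance_mapping in the nova_api DB is orphaned when its instance UUID no longer exists in any cell. The inputs here are plain collections; a real cleanup would query the databases and delete with care:

```python
# Illustrative only: find API-DB records whose instance is gone.
def find_orphans(api_db_uuids, cell_db_uuids_by_cell):
    """api_db_uuids: instance UUIDs referenced by request_specs or
    instance_mappings; cell_db_uuids_by_cell: {cell name: set of
    instance UUIDs in that cell's nova DB}."""
    live = set()
    for uuids in cell_db_uuids_by_cell.values():
        live |= set(uuids)
    return sorted(set(api_db_uuids) - live)
```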
Rocky Upgrade - Nova
- Control plane
○ Upgraded in 1h (Nova API unavailable)
○ VMs (4 vCPUs / 8 GB RAM)
○ Top control plane
■ 16 nova-api
■ 10 nova-conductor; 10 nova-scheduler
■ 10 nova-placement-api
○ 73 cell controllers
■ nova-api; nova-conductor; nova-network
- DBs sync done the day before
- upgrade_levels/compute=auto (pins compute RPC to the oldest running service version, allowing mixed-version operation during the rolling upgrade)
[Graph: Number of Nova API requests]
Rocky Upgrade - Nova
- Compute nodes upgraded during the 24h after the control plane
- The number of placement requests increased with the compute node upgrade
- Needed to triple the number of placement nodes
- Impact on VM scheduling time
- nova-compute (ironic driver) rolled back to Queens!
- http://lists.openstack.org/pipermail/openstack-dev/2018-November/136251.html
- https://review.openstack.org/#/c/614886/
Summary
CERN Cloud is running Nova Rocky with CellsV2
- A few issues found during Queens; most of them are already fixed in Rocky
- CellsV2 works at scale
- No more hand-crafted code, as in CellsV1, to get basic functionality
- Performance is improving
- Much easier upgrades