OpenStack Stable: What it actually means to maintain stable branches



SLIDE 1

OpenStack Stable

What it actually means to maintain stable branches

Matthew Treinish (irc: mtreinish)
Matt Riedemann (irc: mriedem)
Ihar Hrachyshka (irc: ihrachys)

Newton Summit, Austin, TX - 2016/04/25

SLIDE 2

Agenda

  • Introductions
  • Who cares?
  • What is it?
  • How is it done?
  • Q & A
SLIDE 3

Introductions

  • Matthew Treinish (HPE)

○ Core reviewer on QA projects and stable-maint-core
○ QA PTL from Juno through Mitaka
○ TC Member

  • Matt Riedemann (IBM)

○ Core reviewer on Nova projects and stable-maint-core
○ Nova PTL for Newton

  • Ihar Hrachyshka (Red Hat)

○ Core reviewer on Neutron projects and stable-maint-core

SLIDE 5

Who cares about stable branches?

  • Production clouds (per April 2016 user survey)
SLIDE 6

Who cares about stable branches?

  • Distributions

○ Red Hat, Mirantis, SUSE, Canonical, etc.
○ Release their latest product from stable branches after they have been ‘stable’ for a while already (2-6 months)
○ Support generally runs far longer than upstream
○ Security fixes

  • Production clouds that roll their own packages and patches
SLIDE 7

What is “stable branch maintenance”?

  • Enforcing policy

○ What is the policy and who enforces it?

  • Backporting fixes

○ What’s appropriate and how are these identified?

  • Keeping the CI system running

○ What are the issues and how are they fixed?

SLIDE 8

What is the stable branch policy?

  • Support phases

○ Phase 1 (first 6 months): all appropriate bug fixes
○ Phase 2 (6-12 months): critical / security fixes
○ Phase 3 (12+ months): security fixes

  • Appropriate fixes

○ No features or backward incompatible changes.
○ Clear user impact/benefit.
○ Self-contained and no new dependencies.
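
The support phases above can be read as a small lookup from branch age to the kinds of fix still accepted. The following is a minimal sketch of that reading, not an official tool: the function names, the fix-kind labels ("bug", "critical", "security"), the handling of the exact 6- and 12-month boundaries, and the release date used in the example are all assumptions made for illustration.

```python
from datetime import date

# Phase limits in whole months since release, as listed on the slide.
# Each entry: (upper bound in months, phase name, kinds of fix accepted).
# The labels "bug", "critical" and "security" are hypothetical names.
PHASES = [
    (6, "phase 1", {"bug", "critical", "security"}),
    (12, "phase 2", {"critical", "security"}),
]
FINAL_PHASE = ("phase 3", {"security"})


def months_between(released, today):
    """Return the number of whole months elapsed since the release date."""
    months = (today.year - released.year) * 12 + (today.month - released.month)
    if today.day < released.day:
        months -= 1  # the current month is not complete yet
    return max(months, 0)


def support_phase(released, today):
    """Return (phase name, accepted fix kinds) for a branch of this age."""
    age = months_between(released, today)
    for limit, name, allowed in PHASES:
        if age < limit:  # assumption: a boundary month belongs to the later phase
            return name, allowed
    return FINAL_PHASE


def is_acceptable(fix_kind, released, today):
    """Check whether a fix of the given kind fits the current support phase."""
    _, allowed = support_phase(released, today)
    return fix_kind in allowed


if __name__ == "__main__":
    released = date(2016, 1, 1)  # hypothetical release date
    print(support_phase(released, date(2016, 8, 15))[0])
```

The phase check only covers timing. The "appropriate fixes" criteria (no features, clear user benefit, self-contained) still need a human reviewer.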

SLIDE 9

What is the stable branch policy?

  • Team structure

○ Project-specific teams: identify potential backports, review backports, request releases, and monitor CI status.
○ Stable-maint-core: approves project-specific stable core additions and guides the project-specific teams on policy.

  • Review guidelines

○ Fixes must land in the upstream branches first, be backported as cherry-picks that keep the same Change-Id, and fall within the branch's current support phase.
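
Part of the review guideline above can be checked mechanically by comparing commit messages. The sketch below is an illustration only: it assumes the backport was made with `git cherry-pick -x`, which appends a "(cherry picked from commit ...)" line, and that Change-Ids follow Gerrit's usual "I" plus 40 hex digits form. The function names are hypothetical.

```python
import re

# Gerrit Change-Id footer: "I" followed by 40 hexadecimal digits.
CHANGE_ID_RE = re.compile(r"^Change-Id: (I[0-9a-f]{40})\s*$", re.MULTILINE)
# Line appended by "git cherry-pick -x".
CHERRY_PICK_RE = re.compile(
    r"^\(cherry picked from commit ([0-9a-f]{7,40})\)\s*$", re.MULTILINE
)


def change_id(message):
    """Return the last Change-Id found in a commit message, or None."""
    found = CHANGE_ID_RE.findall(message)
    return found[-1] if found else None


def check_backport(master_message, backport_message):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    master_id = change_id(master_message)
    backport_id = change_id(backport_message)
    if master_id is None:
        problems.append("master commit has no Change-Id")
    if backport_id is None:
        problems.append("backport has no Change-Id")
    elif master_id is not None and backport_id != master_id:
        problems.append("backport Change-Id differs from master")
    if CHERRY_PICK_RE.search(backport_message) is None:
        problems.append("backport lacks a 'cherry picked from commit' line")
    return problems


if __name__ == "__main__":
    # Made-up commit messages, for illustration only.
    master = "Fix port delete\n\nChange-Id: I" + "a" * 40 + "\n"
    backport = master + "(cherry picked from commit " + "b" * 40 + ")\n"
    print(check_backport(master, backport))
```

Whether the fix falls within the support phase is a separate question that this check does not address.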

SLIDE 10

How are stable branches maintained?

  • Maintaining stable branches involves knowing the Continuous Integration system as well as identifying fixes to backport, reviewing backports, and requesting point releases.

SLIDE 11

Periodic Stable Jobs

  • Nightly runs of unit tests and Tempest for each supported branch
  • Failures reported to openstack-stable@lists.openstack.org
  • All results viewable on openstack-health
SLIDE 12

Periodic Stable Jobs

SLIDE 13

Branchless Tempest

  • Starting with Icehouse, Tempest no longer branches per release
  • Tempest supports running against all stable branches
  • Ensures API backwards compatibility between releases
  • Each proposed Tempest commit runs against master and all supported stable branches
  • Much more throughput than periodic stable jobs, often the first to catch issues

SLIDE 14

Types of failures

  • Bugs in dependencies

○ Examples: kernel bugs, incompatible requirements
○ Response: get the bug reported upstream, pin versions in upper-constraints and global-requirements

  • Infra breaks

○ Examples: bad nodepool images, service outages, broken mirrors
○ Response: make infra more resilient

  • Bugs in OpenStack

○ Examples: IP allocation bugs, races with multiple workers, races with async messaging
○ Response: fix on master if still applicable, backport fixes where possible

  • Bugs in tests

○ Examples: use of timestamps, global state corruption
○ Response: backport fixes to unit and functional tests when appropriate, fix Tempest bugs

  • Upstream service breaks

○ Examples: public cloud outages
○ Response: weather the storm
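
Pinning a misbehaving dependency, the response given above for bugs in dependencies, amounts to editing a `name===version` line in an upper-constraints file. The sketch below shows that edit on an in-memory string. It is an illustration only: the helper names and the package names are made up, and it does not reproduce any actual OpenStack requirements tooling.

```python
def parse_constraints(text):
    """Parse 'name===version' lines into a {lowercase name: version} dict."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        spec = line.split(";", 1)[0].strip()  # drop environment markers
        name, sep, version = spec.partition("===")
        if not sep:
            raise ValueError("not a pinned constraint: %r" % raw)
        pins[name.strip().lower()] = version.strip()
    return pins


def pin(text, package, version):
    """Return the constraints text with `package` pinned to `version`.

    Environment markers on the rewritten line are kept. A trailing
    comment on that line is dropped, which keeps the sketch short.
    """
    wanted = package.lower()
    out, found = [], False
    for raw in text.splitlines():
        body = raw.partition("#")[0]
        spec, semi, marker = body.partition(";")
        name = spec.partition("===")[0].strip()
        if name and name.lower() == wanted:
            found = True
            raw = "%s===%s%s%s" % (name, version, semi, marker.rstrip())
        out.append(raw)
    if not found:
        out.append("%s===%s" % (package, version))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical constraints content, not taken from a real branch.
    constraints = "examplelib===1.0\nOtherLib===2.5;python_version=='2.7'\n"
    print(pin(constraints, "otherlib", "2.4"))
```

A pin like this only buys time. The slide's first step, getting the bug reported upstream, is what removes the need for it.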

SLIDE 15

Stable Failure Rates over Time

[Chart: Tempest failure rates 1 month before Juno EOL]
[Chart: Number of total individual tests run daily per branch]

SLIDE 16

Case study: the Neutron process

SLIDE 17

Bug fix flow: ideal

All bugs fixed on master are also backported to stable (if affected)

[Diagram: branches master, mitaka, liberty, and kilo-eol, with distro branches for mitaka and liberty]

SLIDE 18

Bug fix flow: reality

Sometimes stable gets a fix, if someone cared (read: was affected).

[Diagram: branches master, mitaka, and liberty, with distro branches for mitaka and liberty]

SLIDE 19

Backport starvation: reasons

  • Backports done “on demand”
  • No process to identify bug fixes to backport
  • No process to track backports to merge
  • It’s hard work and not very well understood (and not tracked in Stackalytics)

SLIDE 20

Starvation solutions: neutron/Liberty+

  • Backports done weekly
  • Get major stable branch consumers on board
  • Implement tools and process to manage backport flow

○ automate bug fix classification and review proposal process
○ goodies landing in openstack-infra/release-tools

SLIDE 21

Starvation solutions: neutron/Liberty+

$ ./bugs-fixed-since.py neutron <previous-hash> | ./lp-filter-bugs-by-importance.py neutron | ./annotate-lp-bugs.py neutron
…
https://bugs.launchpad.net/bugs/1513765 "bulk delete of ports cost iptables-firewall too much time" (Critical,Fix Released) [bridge,in-stable-liberty,liberty-backport-potential,linuxbridge,mitaka-rc-potential,ovs,sg-fw] [kevinbenton]
...

SLIDE 22

Starvation solutions: neutron/Liberty+

$ … | grep -v in-stable-mitaka
$ … | grep linuxbridge
$ … | grep -v in-stable-liberty | grep liberty-backport-potential
...
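
The grep filters above work on whole lines, so a tag name would also match if it appeared in a bug title. A stricter variant is sketched below: it parses each annotated line and filters on the tag set itself. The line layout is inferred from the single sample line shown on the previous slide, so the regular expression is an assumption about the tool's output, and the names `Bug`, `parse_line`, and `select` are hypothetical.

```python
import re
from typing import NamedTuple

# Layout assumed from the sample line on the slide:
#   <url> "<title>" (<importance>,<status>) [<tag>,<tag>,...] [<assignee>]
LINE_RE = re.compile(
    r'^(?P<url>\S+) "(?P<title>.*)" '
    r'\((?P<importance>[^,]+),(?P<status>[^)]+)\) '
    r'\[(?P<tags>[^\]]*)\] \[(?P<assignee>[^\]]*)\]$'
)


class Bug(NamedTuple):
    url: str
    title: str
    importance: str
    status: str
    tags: frozenset
    assignee: str


def parse_line(line):
    """Parse one annotated line; return None if it does not match the layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    tags = frozenset(tag for tag in match["tags"].split(",") if tag)
    return Bug(match["url"], match["title"], match["importance"],
               match["status"], tags, match["assignee"])


def select(lines, with_tag=None, without_tag=None):
    """Keep bugs that carry `with_tag` and do not carry `without_tag`."""
    selected = []
    for line in lines:
        bug = parse_line(line)
        if bug is None:
            continue  # skip lines that are not bug entries
        if with_tag is not None and with_tag not in bug.tags:
            continue
        if without_tag is not None and without_tag in bug.tags:
            continue
        selected.append(bug)
    return selected
```

For example, `select(lines, with_tag="liberty-backport-potential", without_tag="in-stable-liberty")` corresponds to the third grep pipeline: fixes flagged for Liberty that have not reached the branch yet.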

SLIDE 23

Starvation solutions: neutron/Liberty+

$ ./lp-reset-backport-potential.py neutron python-neutronclient
Processing project neutron…
Removing liberty-backport-potential tag for 123456
Removing liberty-backport-potential tag for 234567
Processing project python-neutronclient…
Removing liberty-backport-potential tag for 345678
Removing liberty-backport-potential tag for 456789

SLIDE 24

Starvation solutions: gotchas

  • More backports mean a higher chance of regression, but:

○ the author is usually aware of an issue
○ still, it’s worth it
  ■ 50%+ of downstream escalations could be avoided

  • Frequent releases needed for more granular updates

  • openstack/releases: ./new_release neutron mitaka
SLIDE 25

Starvation solutions: future

  • Adopt proactive backport practice for other interested projects
  • Things missing:

○ clear documentation on the process
○ complete backport pipeline automation
○ more involvement from other major contributors

SLIDE 26

Where to get more information

  • #openstack-stable channel on Freenode IRC
  • openstack-dev mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

  • Docs: http://docs.openstack.org/project-team-guide/stable-branches.html
  • Weekly IRC meeting: https://wiki.openstack.org/wiki/Meetings/StableTeam
  • Periodic Stable Test Results: http://status.openstack.org/openstack-health/#/g/build_queue/periodic-stable

SLIDE 27

Questions?