Compartmentalized Continuous Integration David Neto Devin Sundaram - - PowerPoint PPT Presentation



SLIDE 1

Compartmentalized Continuous Integration

David Neto, Senior MTS
Devin Sundaram, Senior MTS
Altera Corp.

SLIDE 2

THAT SPECIAL THING

  • 2000 That special thing…
  • 2007 p4 vs. svn
  • 2009 Collaboration++

SLIDE 3

THREE TAKEAWAYS

  • Continuous Integration is tough with a complex build
  • Compartmentalize = Classify + filter the changes going into your integration build

  • Track your own metadata for a codeline

– With triggers and a second Perforce repository

SLIDE 4

Continuous Integration

SLIDE 5

BROADCAST FEATURES / BUGFIX

SLIDE 6

FIND AND FIX DEFECTS EARLY

[Figure: defect cost and risk to fix, plotted against time up to the release date]

SLIDE 7

SYSTEMS FAIL AT THE SEAMS

No substitute for end-to-end test

SLIDE 8

INTEGRATION BUILD IS YOUR PRODUCT

  • Integration build = put all pieces together
  • It’s what you deliver. Everything else is just pretend.
  • Communicate functionality across your team

– Broadcast new feature / bugfix

  • Complex systems fail at the seams

– Feedback for developers

SLIDE 9

CONTINUOUS INTEGRATION

  • Make an Integration Build as often as possible
  • It’s the heartbeat of your project
SLIDE 10

SHAPES ALL PROCESS AND INFRASTRUCTURE

  • Supporting practices [Fowler]:

– Maintain a code repository
– Automate the build
– Make the build self-testing
– Commit as often as possible
– Every commit to mainline should be built
– Keep the build fast
– Test in a clone of production environment
– Make it easy to get latest deliverables
– Everyone can see result of latest build
– Automate deployment

SLIDE 11

ALTERA’S SOFTWARE BUILD

  • Altera makes Field Programmable Gate Arrays (FPGA)

– Programming = Rewiring
– 3.9 billion transistors!

  • Altera Complete Design Suite (ACDS) = Development tools
  • ACDS Build:

– 255K source files, 45GB
– ~400 developers, 5 locations worldwide
– 14 hour build, multiprocessor, multiplatform
– Hundreds of source changes per day

SLIDE 12

MULTI LAYER SYSTEM CHALLENGE

  • Long time to build Device data
  • Rapid development within and across layers

– E.g. Roll out new device family
– E.g. DDR memory interface support crosses 5 layers

[Figure: layer diagram with labels: Device data; Low level compiler; Debug and analysis; System integration tools; IP cores; Domain specific; Physical models: logic, timing, power]

SLIDE 13

TOO MANY CHANGES → BUILD RISK

  • Probability of a clean build drops quickly with number of new changes

  • When it breaks, hard to tell whose fault

[Figure: probability of a clean build (0.1 to 1) vs. number of changes (1 to 201), with one curve for 99% per-change reliability and one for 95% per-change reliability. Annotated values: 37% and 13%.]
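
The curves can be reproduced with a short calculation. This is a sketch that assumes each change breaks the build independently of the others, so the probability of a clean build is the per-change reliability raised to the number of changes. Pairing the annotated 37% and 13% with 100 and 40 changes is an inference from that formula, not something stated on the slide; the function name is illustrative.

```python
def clean_build_probability(per_change_reliability: float, num_changes: int) -> float:
    """Probability that all changes are good, assuming independent failures."""
    return per_change_reliability ** num_changes


# Assumed pairings: 100 changes at 99% reliability, 40 changes at 95%.
for reliability, changes in [(0.99, 100), (0.95, 40)]:
    p = clean_build_probability(reliability, changes)
    print(f"{reliability:.0%} per change, {changes} changes: {p:.0%} clean")
```

Under this model even very reliable developers produce a mostly broken build once enough changes land together, which is the motivation for limiting how much fresh change enters any one build.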

SLIDE 14

STALE BASELINE → COMPOUNDED RISK

  • The longer you go without an integration build, the higher the risk

– Blind to recent data, API, code

  • Skip too many heartbeats → PROJECT DIES
SLIDE 15

SOLUTION: COMPARTMENTALIZATION

  • Must keep integration build stable
  • Limit the damage to the whole by separating the parts
  • But how?
SLIDE 16

Compartmentalization: Previous approaches

SLIDE 17

STAGED BUILDS [Fowler]

  • Full build = pipeline of smaller builds
  • Most developers work with output of earlier stage
  • In our case: Too slow

– Device data build = 4 hours
– Most layers built later: need device info

  • Verdict: Does not solve our problem
SLIDE 18

INCREMENTAL REBUILD BOT

  • Each change automatically built on latest stable base + changes since stable base

– Tell developer if it passed or broke the bot

  • Tricky policy:

– If a new change breaks the bot: Keep or Eject?

  • In our case:

– Can’t rely on perfect dependencies
– Device change → full integration build
– Apparent developer reliability improves

  • Verdict: We use it, but does not solve whole problem
SLIDE 19

MULTIPLE CODELINES: Strategy

  • Partition active work into different codelines
  • Qualify separately, with module build
  • Frequently integrate “main” → private
  • Occasionally integrate private → “main”

[Figure: codelines Main, Private1, Private2]

SLIDE 20

MULTIPLE CODELINES: Variations

  • Development codelines [Wingerd]
  • Microsoft’s Virtual Build Labs [Maraia]
  • Inside/Outside codelines, Remote development lines, Change Propagation Queues, … [Appleton et al.]
  • Virtual Codelines (one codeline + just-in-time branching) [Appleton et al.]

SLIDE 21

MULTIPLE CODELINES: Issues

  • Integration is manual

– Requires superhero to integrate. Painful.
– Manual implies infrequent. Delays integration.

  • Hard / impossible to develop a change across components

[Figure: codelines Main, Private1, Private2, annotated “Painful?!”]

SLIDE 22

MULTIPLE CODELINES: Verdict

  • Ok if perfect modularity
  • Manual
  • Infrequent
  • Inflexible: Can’t develop across components
  • “Occasional” integration, not Continuous!

“90% of SCM "process" is enforcing codeline promotion to compensate for the lack of a mainline” -- Wingerd

SLIDE 23

Compartmentalization: Altera’s solution

SLIDE 24

REQUIREMENTS

  • One codeline
  • No client customization: Server side only
  • Transparent to most users, most of the time
  • Support ad hoc cross-component development
  • Automatic: Hands off operation
SLIDE 25

GATEKEEPER STRATEGY

  • Limit the amount of untested code accepted into the integration build
  • All code is guilty until proven innocent
  • Integration build uses only innocent (verified) code*
  • Each file revision is in one of two integration states: Fresh or Verified
  • Upgrade from Fresh to Verified when used in a successful Gatekeeper build

*Some exceptions

SLIDE 26

COMPARTMENTALIZE = CLASSIFY + FILTER

[Figure: changes are classified into Domains, pass through Gatekeepers, then reach Integration]

SLIDE 27

CLASSIFICATION: ZONES, DOMAINS

  • Classify each submitted change into a domain
  • Site = Dev location that makes an integration build
  • Zone = named set of depot paths

– One zone for each major component
– Zone can be “site specific”: when lots of activity in that zone, and want to protect a site from bad changes from other sites

  • Domains =

– Zone
– { Zone:Site | for each Site, each site-specific Zone }
– COMBO: if a change touches files in more than one Zone
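
The classification rules above can be sketched in a few lines. This is an illustration under stated assumptions, not Altera's implementation: the `Zone` class, the prefix matching on codeline-relative paths, the example zone named "devices", and the choice to send a change that matches no zone to COMBO are all hypothetical. Only the zone "q", the sites TO and SJ, and the three kinds of domain come from the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    name: str
    path_prefixes: tuple          # depot paths relative to the codeline (assumed form)
    site_specific: bool = False


def classify_change(files, site, zones):
    """Return the domain for a change: a Zone, a Zone:Site, or COMBO."""
    touched = {
        zone.name
        for path in files
        for zone in zones
        if path.startswith(zone.path_prefixes)   # startswith accepts a tuple
    }
    if len(touched) != 1:
        # More than one zone is COMBO per the deck; treating "no zone"
        # as COMBO too is an assumption of this sketch.
        return "COMBO"
    zone = next(z for z in zones if z.name in touched)
    return f"{zone.name}:{site}" if zone.site_specific else zone.name


zones = [Zone("devices", ("devices/",)), Zone("q", ("q/",), site_specific=True)]
print(classify_change(["devices/a.dat"], "TO", zones))
print(classify_change(["q/foo.c", "q/foo.h"], "TO", zones))
print(classify_change(["q/foo.c", "devices/a.dat"], "SJ", zones))
```

The three calls exercise one case each: a plain zone, a site-specific zone, and a change spanning two zones.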
SLIDE 28

GATEKEEPER RESPONSIBILITY

  • Each Gatekeeper is responsible for a Domain

– Validates Fresh changes in that Domain

  • Run part or all of the build

– Uses Fresh revisions from its own Domain
– Verified code otherwise

  • If ok, update integration state:

[Figure: foo.c revisions #1 to #5, before and after the update]

SLIDE 29

EXAMPLE GATEKEEPER

[Figure: Integration N, Gatekeeper, Integration N + 1]

Runs part of the build, on top of previous full build. Responsible for one domain, uses verified source from two others.
SLIDE 30

OTHER GATEKEEPER: SPREAD + LIMIT RISK

[Figure: Integration N, Gatekeeper, Integration N + 1; label: Fresh]

In general, limited amount of change going into any one build. Climb the reliability curve!

SLIDE 31

GATEKEEPER CAN RUN WHOLE BUILD

[Figure: Gatekeeper, Integration N + 1]

But responsible for just one domain. COMBO builds do this.

SLIDE 32

EXCLUSION RULE

  • Should avoid “broken by construction” gatekeepers
  • Rule: Each file may have fresh revisions from at most one domain

– Conflicts from: Site-specific zones; COMBO

  • Allow many fresh revisions from same domain

– Enable rapid development

Allowed: foo.c #1 #2 #3 (A) #4 (A) #5 (A)
Conflict: foo.c #1 #2 #3 (A) #4 (A) #5 (B)

SLIDE 33

E.g. Alice (site TO) submits foo.c, foo.h

foo.c: #4, #5 (q:TO)
foo.h: #1, #2 (q:TO)

  • Alice changed param type
  • q:TO Gatekeeper uses foo.c #5 (q:TO), foo.h #2 (q:TO)
  • q:SJ Gatekeeper uses foo.c #4, foo.h #1
  • Zone “q” is site-specific; TO, SJ are sites

SLIDE 34

Bob (site SJ) develops update to foo.c …

foo.c: #4, #5? (q:SJ)
foo.h: #1

Bob does not know about Alice’s change

SLIDE 35

Bob resolves to Alice’s change

foo.c: #4, #5 (q:TO), #6? (q:SJ)
foo.h: #1, #2 (q:TO)

SLIDE 36

What if we allow Bob to submit?

foo.c: #4, #5 (q:TO), #6 (q:SJ)
foo.h: #1, #2 (q:TO)

q:SJ Gatekeeper uses foo.c #6 (q:SJ), foo.h #1

BROKEN BY CONSTRUCTION: Sees only half of Alice’s change!

SLIDE 37

Exclusion Rule avoids broken-by-construction

foo.c: #4, #5 (q:TO), #6? (q:SJ)
foo.h: #1, #2 (q:TO)

Exclusion rule detects this conflict, rejects Bob’s change

SLIDE 38

Bob waits until Alice’s change is verified

foo.c: #4, #5, #6 (q:SJ)
foo.h: #1, #2

Now Bob’s change is accepted
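
The Alice and Bob walkthrough can be replayed with a small check. This is a sketch of the Exclusion Rule only, assuming the integration status is available as a mapping from file to the single domain that currently holds Fresh revisions on it (absent when every revision is Verified). The function name and data layout are hypothetical.

```python
def exclusion_conflicts(change_domain, files, fresh_domain_by_file):
    """Return the files in a proposed change that already carry Fresh
    revisions from a different domain (an Exclusion Rule violation)."""
    return sorted(
        path
        for path in files
        if fresh_domain_by_file.get(path, change_domain) != change_domain
    )


# Alice's change (domain q:TO) is still Fresh on both files.
fresh = {"foo.c": "q:TO", "foo.h": "q:TO"}
print(exclusion_conflicts("q:SJ", ["foo.c"], fresh))  # Bob's submit conflicts
print(exclusion_conflicts("q:TO", ["foo.c"], fresh))  # same domain: no conflict
print(exclusion_conflicts("q:SJ", ["foo.c"], {}))     # after Alice is Verified
```

A non-empty result corresponds to the "Error with list of conflicts" that a submit trigger would report; an empty list means the change may proceed.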

SLIDE 39

NOMADIC OWNERSHIP

  • Exclusion Rule creates temporary “ownership” of a file

– Delays updates destined to other domains
– Especially within site-specific zones

  • Sometimes annoying

– Willing to pay the price
– Better than the alternatives!

  • Minimized by refactoring: break up files
  • But it’s flexible and automatic

– Temporary ownership migrates according to update patterns

SLIDE 40

SOMETIMES BYPASS GATEKEEPERS

  • Rare, long build time

– E.g. COMBO

  • Site-protected, long build time
  • Acceptable integration risk

SLIDE 41

TURN OFF GATEKEEPERS

  • Late in release cycle

– Development has slowed
– Each change carefully reviewed

  • Low integration risk
  • Avoid annoyance of Exclusion Rule

SLIDE 42

Mechanics

SLIDE 43

INTEGRATION STATUS TRACKING: WHAT

  • For each file revision keep:

– State: Fresh, or Verified
– Domain
– User, change#, depot path

  • Store in log-structured control file
  • But only need this for recent revisions

– Only Fresh revisions can conflict
– Each revision eventually Verified
– Need only from oldest Fresh until #head

  • Purge older records
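
One way to picture the record and the purge step is the sketch below. The field names, the in-memory list, and the per-file interpretation of "oldest Fresh until #head" are assumptions for illustration; the deck does not show the control file's actual format.

```python
from dataclasses import dataclass


@dataclass
class RevisionRecord:
    depot_path: str
    revision: int
    change: int
    user: str
    domain: str
    state: str = "Fresh"   # "Fresh" or "Verified"


def purge_old_records(records):
    """Keep, per file, only the records from its oldest Fresh revision up to
    the head; files with no Fresh revision are dropped entirely."""
    oldest_fresh = {}
    for r in records:
        if r.state == "Fresh":
            current = oldest_fresh.get(r.depot_path, r.revision)
            oldest_fresh[r.depot_path] = min(current, r.revision)
    return [
        r for r in records
        if r.depot_path in oldest_fresh and r.revision >= oldest_fresh[r.depot_path]
    ]


records = [
    RevisionRecord("foo.c", 3, 1001, "alice", "q:TO", "Verified"),
    RevisionRecord("foo.c", 4, 1005, "alice", "q:TO"),
    RevisionRecord("foo.c", 5, 1009, "alice", "q:TO"),
    RevisionRecord("bar.c", 1, 1002, "bob", "q:SJ", "Verified"),
]
print([(r.depot_path, r.revision) for r in purge_old_records(records)])
```

Because only Fresh revisions can conflict, the retained set stays small even on a large codeline, which is consistent with the control file size reported later in the deck.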
SLIDE 44

INTEGRATION STATUS TRACKING: HOW

  • Integration status must always be up-to-date

– Needed for Exclusion Rule, checked in change-submit trigger

  • Can’t just use floating labels:

– P4 triggers can’t update P4 metadata!

  • Need:

– Fast atomic updates to control file
– Only need latest version
– Compact storage

  • Store in a second Perforce repository!

– Filetype text+S512: Purges old contents

SLIDE 45

USING A SECOND PERFORCE REPOSITORY

[Figure: Users submit to the Primary P4D. Triggers and build scripts (code selection, mark-as-Verified) use the Sister P4D, which holds one control file per tracked codeline.]

SLIDE 46

CONFIGURATION

  • Map developers to sites via P4 Group membership

– E.g. p4sip.to.users, p4sip.sj.users

  • Codeline policy defined in “p4sip.zone” file

– Stored in root of codeline: Carried into branches
– Sites
– Named zones: mapping to depot paths relative to the codeline; which zones are site-specific
– Parse output of “p4 print”: Safe in triggers

  • Triggers

– Form-out, form-in: “change” forms
– Per codeline: change-submit, change-commit
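
As an illustration of the group-to-site mapping, the sketch below derives a user's site from group names shaped like the two examples on this slide. The general pattern `p4sip.<site>.users`, the upper-casing of the site code, and the function names are assumptions; how the group list is fetched from the server is left out.

```python
import re

# Assumed naming pattern, generalized from p4sip.to.users and p4sip.sj.users.
SITE_GROUP = re.compile(r"^p4sip\.([a-z0-9]+)\.users$")


def sites_for_groups(group_names):
    """Collect the site codes implied by a user's group memberships."""
    sites = set()
    for name in group_names:
        match = SITE_GROUP.match(name)
        if match:
            sites.add(match.group(1).upper())
    return sites


def site_for_user(group_names):
    """Return the single site for a user, or raise if it is ambiguous."""
    sites = sites_for_groups(group_names)
    if len(sites) != 1:
        raise ValueError(f"expected exactly one site group, found {sorted(sites)}")
    return next(iter(sites))


print(site_for_user(["developers", "p4sip.to.users"]))
```

Rejecting users with zero or several site groups is a design choice of this sketch; a real trigger might instead fall back to a default site.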

SLIDE 47

LIFE CYCLE OF A CHANGE SUBMISSION (1/2)

[Sequence diagram: User, Primary P4D, Sister P4D]

  • User runs p4 submit
  • Form-out: Lookup user site, insert into change description
  • User: Edit change form*
  • Form-in: Validate site in change description
  • Send list of files; Ok

* Can change site string: Masquerade as other site, or force COMBO

SLIDE 48

LIFE CYCLE OF A CHANGE SUBMISSION (2/2)

[Sequence diagram: User, Primary P4D, Sister P4D]

  • User: p4 submit; send list of files
  • Change-submit: Assign to domain; read control file (p4 print); check Exclusion Rule; Ok, or Error with list of conflicts
  • Send file contents
  • Point of no return
  • Change-commit: Assign to domain; add revision records; edit control file (p4 revert, sync, edit, p4 submit); Submit Ok

SLIDE 49

MAKING A BUILD

  • 1. Produce “build label”: revisions to use

– Select: Base build label + base domains, Verified domains, Fresh domains
– Use “painter algorithm”: for each file, take latest revision in any category

  • 2. Compile + smoketest (subset of) code
  • 3. If not ok:

– Notify errant developer
– If integration build: Repair build, update label

  • 4. If ok:

– Publish build label along with binary
– Mark revisions in label as Verified, using the same revert/sync/edit/submit logic as the commit trigger
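
Step 1 can be sketched as a simple layering of revision choices. This is a minimal illustration assuming the base label is a mapping from file to revision and each tracked revision is a `(path, revision, domain, state)` tuple. It does not model the "base domains" mentioned on the slide, and all names are hypothetical.

```python
def make_build_label(base_label, records, verified_domains, fresh_domains):
    """Painter-style selection: start from the base label, then for each file
    take the latest revision found in any selected category."""
    label = dict(base_label)
    for path, revision, domain, state in records:
        selected = (
            (state == "Verified" and domain in verified_domains)
            or (state == "Fresh" and domain in fresh_domains)
        )
        if selected and revision > label.get(path, 0):
            label[path] = revision
    return label


base = {"foo.c": 4, "foo.h": 1}
records = [("foo.c", 5, "q:TO", "Fresh"), ("foo.h", 2, "q:TO", "Fresh")]

# The q:TO Gatekeeper takes Fresh revisions from its own domain only.
print(make_build_label(base, records, {"q:TO", "q:SJ"}, {"q:TO"}))
# The q:SJ Gatekeeper ignores q:TO's Fresh revisions and stays on the base.
print(make_build_label(base, records, {"q:TO", "q:SJ"}, {"q:SJ"}))
```

With the revisions from the Alice example, the q:TO Gatekeeper selects foo.c #5 and foo.h #2, while the q:SJ Gatekeeper keeps foo.c #4 and foo.h #1.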
SLIDE 50

Summary

SLIDE 51

WHY DOES IT WORK?

  • Climb the reliability curve
  • Gatekeepers catch most breaks
  • Run Gatekeepers often: usually fast
  • Reduce “fog of war” in Integration build
SLIDE 52

USABILITY

  • Most people never notice
  • Exclusion rule can annoy
  • “Build temporal mechanics”

– Uncertain delay between submit and getting into the build
– Changes appear out of order in different sites

SLIDE 53

EFFECTIVENESS

  • Integration build each day x 3 sites
  • Changes from site X

– In site X’s integration build next day
– In site Y’s integration build 36-48 hours later

  • No performance loss
  • Control file typically ~ 2K revision records
  • 175,000 changes in 3 years
SLIDE 54

OTHER USES FOR YOUR OWN METADATA?

  • Can track whatever metadata you want

– Within a “recent” time window

  • Change review and approval
  • Nested transactions?

– “Group commit” emerges from Exclusion rule

  • Alice, Andrew, Amy commit changes in TO
  • SJ build sees all or none of them

– Could generalize this…?

SLIDE 55

ANOTHER TOOL FOR YOUR TOOLBOX

Staged builds Unit tests

SLIDE 56

Acknowledgements

  • Rob Romano
  • Jeff Da Silva
  • David Karchmer
  • Mun Soon Yap
  • Addy Yeow
  • Alan Herrmann
  • And all the developers and builders at Altera
SLIDE 57

Thank You!