112 quotes & text 1920x1080 72 URLs & citations 72 - - PowerPoint PPT Presentation

SLIDE 1

112 quotes & text

72 URLs & citations 72 code{:;}

36 credits

1920x1080

SLIDE 2

Growing Pains

Software Repositories at SCALE

SLIDE 3

Do you put all of your bits in a single gigantic repository or many smaller ones?

SLIDE 4

Why are we even asking?

  • Ten years ago, most people were using centralized SCMs.
  • The nature of software development has changed.
  • Software projects have become more complicated.
  • There is more outsourcing and partnering.
SLIDE 5

Outline

  • Some historical context.
  • Kinds of SCMs.
  • Advantages and disadvantages of monorepo and multirepo.
  • What serves you?
SLIDE 6

  • 1999: BitKeeper
  • 2000: Subversion
  • 2002: Arch, Darcs, AccuRev
  • 2003: monotone, ArX, SVK
  • 2005: TFS, Bazaar, Mercurial, Git
  • 2007: Fossil

SLIDE 7

2015 and beyond:

  • BitKeeper (1999)
  • Subversion (2000)
  • Perforce (2000)
  • Git (2005)
  • Mercurial (2005)

SLIDE 8

Distributed vs Centralized

[Diagram: centralized, one SCM db shared by many workspaces; distributed, every workspace has its own SCM db.]
SLIDE 9

Centralized SCM

Advantages:
  • Partial checkouts.
  • Binary handling.
  • A single place to back up; you know where your source is.
  • Security: you can set up permissions on the server.
  • File locking.

Disadvantages:
  • Serializes what is really parallel work.
  • Merge-then-commit model, which means you can’t test your changes in isolation.
  • No local sandboxes: ‘committing’ and ‘publishing’ code are mixed into one step.
  • Branches are heavyweight.
  • Limited workflow.

[Diagram: one SCM db shared by several workspaces.]
SLIDE 10

Distributed SCM

Advantages:
  • Commit then merge.
  • Separates committing from publishing, which gives you a local sandbox.
  • Implicit backup: every clone is a full copy.
  • More flexible workflows.
  • Branches are lightweight.

Disadvantages:
  • Workspaces take up more space since they include the full history.
  • Binary files can be a problem.
  • No partial checkouts.
  • Hard to control access.

[Diagram: every workspace has its own SCM db.]
SLIDE 11

Why did DVCS overtake centralized systems?

SLIDE 12

What role does the SCM have?

SLIDE 13

SCM as Backup

  • Check files in.
  • Check files out.
  • Occasionally revert to a previous version.

SLIDE 14

SCM as Detective

  • When was this bug introduced?
  • Bisect to find the commit that introduced it.
  • History exploration tools.
  • Who deleted this?
  • Why is this code this way?
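Bisecting is a binary search over history. The sketch below is a simplified model that assumes a linear list of commits where the first is known good, the last is known bad, and every commit after the culprit is also bad; `first_bad` and the `is_bad` callback are hypothetical names. `git bisect` applies the same idea to a full commit graph, and `git bisect run` automates the good/bad decision with a script.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit using binary search.

    Assumes commits[0] is good, commits[-1] is bad, and that once the bug
    appears it stays in every later commit.
    """
    good, bad = 0, len(commits) - 1  # invariant: commits[good] good, commits[bad] bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


if __name__ == "__main__":
    history = [f"c{i}" for i in range(1000)]
    probes = []

    def is_bad(commit: str) -> bool:
        probes.append(commit)          # record each "build and test" step
        return int(commit[1:]) >= 637  # pretend the bug landed in c637

    print(first_bad(history, is_bad), len(probes))  # c637 10
```

Ten test runs instead of up to a thousand is why bisect pays off on long histories.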
SLIDE 15

SCM as Data

  • Historically, how long does it take us to develop a feature?
  • How long to fix a bug?
  • Which areas of the code are unmaintained? Obsolete? Can be removed?
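As one example of treating the SCM as data, the sketch below finds areas of the code nobody has committed to recently. It assumes a hypothetical commit log of `(time, paths)` pairs (which could, for instance, be parsed from `git log --name-only` output) and treats each top-level directory as an "area"; the one-year threshold is an arbitrary default.

```python
from datetime import datetime, timedelta

# Hypothetical commit log: (commit time, paths touched by that commit).
Log = list[tuple[datetime, list[str]]]


def last_touched_by_area(log: Log) -> dict[str, datetime]:
    """Most recent commit time for each top-level directory ("area")."""
    last: dict[str, datetime] = {}
    for when, paths in log:
        for path in paths:
            area = path.split("/", 1)[0]
            if area not in last or when > last[area]:
                last[area] = when
    return last


def unmaintained(log: Log, now: datetime,
                 older_than: timedelta = timedelta(days=365)) -> list[str]:
    """Areas with no commits within `older_than`: candidates for review or removal."""
    return sorted(area for area, when in last_touched_by_area(log).items()
                  if now - when > older_than)


if __name__ == "__main__":
    log = [
        (datetime(2013, 1, 10), ["legacy/parser.c", "core/repo.c"]),
        (datetime(2015, 6, 2), ["core/repo.c", "docs/README"]),
    ]
    print(unmaintained(log, now=datetime(2015, 7, 1)))  # ['legacy']
```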

SLIDE 16

SCM as Post Mortem

  • What caused us to ship this bug?
  • What could we have done to prevent it?

SLIDE 17

It’s about Workflow

SLIDE 18

Centralized Workflow with DVCS

[Diagram: eight workspaces, each with its own SCM db; one holds the official bits.]
SLIDE 19

Workflow with DVCS

[Diagram: eight workspaces, each with its own SCM db; one holds the official bits.]
SLIDE 20

Workflow with DVCS

[Diagram: nine workspaces, each with its own SCM db; one holds the official bits.]
SLIDE 21

Workflow with DVCS

[Diagram: nine workspaces, each with its own SCM db, with "merge" and "test" steps; one holds the official bits.]
SLIDE 22

Every workspace is a branch
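Because every workspace is a branch, pulling from another workspace means merging two lines of history, and a three-way merge starts from their common ancestor. A minimal sketch of finding it, assuming the commit graph is a hypothetical `parents` dictionary (commit id to list of parent ids); `git merge-base` answers the same question for real repositories.

```python
from collections import deque


def ancestors(parents: dict[str, list[str]], tip: str) -> set[str]:
    """All commits reachable from `tip`, including `tip` itself."""
    seen, queue = {tip}, deque([tip])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def merge_bases(parents: dict[str, list[str]], a: str, b: str) -> set[str]:
    """Best common ancestors of two heads: the base(s) for a three-way merge."""
    common = ancestors(parents, a) & ancestors(parents, b)
    # Drop any common ancestor that is a proper ancestor of another one.
    redundant: set[str] = set()
    for c in common:
        redundant |= ancestors(parents, c) - {c}
    return common - redundant


if __name__ == "__main__":
    # Two workspaces cloned from "base" and committed independently.
    history = {"base": [], "a1": ["base"], "b1": ["base"], "b2": ["b1"]}
    print(merge_bases(history, "a1", "b2"))  # {'base'}
    # Workspace A pulls from B (merge commit m); B keeps working.
    history.update({"m": ["a1", "b2"], "b3": ["b2"]})
    print(merge_bases(history, "m", "b3"))   # {'b2'}
```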

SLIDE 23

Three Problems with DVCS

Binary Files Security Large Source Bases

SLIDE 24

Three Problems with DVCS

Binary Files Large Source Bases Security

SLIDE 25

Binaries Don’t Diff Well

  • Rolling checksums help “chunk” binary files so unchanged regions can be shared.
  • However, some file formats trickle changes throughout the file:

  • Video formats.
  • Image formats.
  • Storing every copy bloats the history.
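A minimal sketch of rolling-checksum chunking (content-defined chunking) using a polynomial, Rabin-Karp style rolling hash. Boundaries depend on the bytes in a small sliding window rather than on file offsets, so an insertion near the start only disturbs the chunks around it. The window, mask, and size limits are assumed values, not taken from any particular SCM.

```python
import random

# Assumed tuning values, not taken from any particular SCM.
WINDOW = 16           # bytes covered by the rolling hash
MASK = 0x3FF          # cut when the low 10 bits are zero: ~1 KiB average chunks
MIN_SIZE, MAX_SIZE = 64, 8192
BASE, MOD = 257, 1 << 32


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a polynomial rolling hash."""
    drop = pow(BASE, WINDOW, MOD)  # weight of the byte sliding out of the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD                   # add the newest byte
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD   # remove the oldest byte
        size = i + 1 - start
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


if __name__ == "__main__":
    rng = random.Random(0)
    old = rng.randbytes(200_000)
    new = old[:1000] + b"inserted" + old[1000:]
    a, b = set(chunk(old)), set(chunk(new))
    print(f"{len(a & b)} of {len(b)} chunks unchanged")
```

This helps formats where edits stay local; it does little for the formats listed above, where a small edit rewrites bytes across the whole file.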
SLIDE 26

Binary Files

Solution: make them act more like centralized systems!

  • Replace binary files in history with pointers.
  • Store the contents on a server (or many).
  • If someone wants an old copy, it’s fetched on demand.

Examples: BitKeeper BAM, Git LFS, Mercurial LFE.
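A minimal sketch of the pointer approach. The pointer text is modeled on the Git LFS pointer layout (version, oid, size lines), but `BlobServer`, `clean`, and `smudge` here are hypothetical stand-ins: a real tool stores blobs on a remote server and hooks into the SCM's checkin/checkout path.

```python
import hashlib

POINTER_VERSION = "version https://git-lfs.github.com/spec/v1"


class BlobServer:
    """Hypothetical in-memory stand-in for the server holding binary contents."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}
        self.fetches = 0

    def upload(self, data: bytes) -> str:
        oid = hashlib.sha256(data).hexdigest()
        self._blobs[oid] = data
        return oid

    def download(self, oid: str) -> bytes:
        self.fetches += 1
        return self._blobs[oid]


def clean(data: bytes, server: BlobServer) -> str:
    """On check-in: upload the contents and return the small pointer stored in history."""
    oid = server.upload(data)
    return f"{POINTER_VERSION}\noid sha256:{oid}\nsize {len(data)}\n"


def smudge(pointer: str, server: BlobServer) -> bytes:
    """On checkout: fetch the real contents on demand and verify them."""
    fields = dict(line.split(" ", 1) for line in pointer.strip().splitlines())
    oid = fields["oid"].removeprefix("sha256:")
    data = server.download(oid)
    if hashlib.sha256(data).hexdigest() != oid or len(data) != int(fields["size"]):
        raise ValueError("blob does not match pointer")
    return data


if __name__ == "__main__":
    server = BlobServer()
    video = bytes(range(256)) * 4000          # ~1 MB stand-in for a binary asset
    pointer = clean(video, server)
    print(len(pointer), "bytes in history")    # history stores only the pointer
    assert smudge(pointer, server) == video
```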

SLIDE 27

Three Problems with DVCS

Binary Files Security Large Source Bases

SLIDE 28

Security in DVCS

  • With a monorepo → all or nothing.
  • With multirepo (including nested) → access at a repository level.
  • Read vs write access → anyone can commit, don’t let them push!
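A minimal sketch of repository-level access control on the server side. The `Server` class, its permission tables, and the user names are hypothetical; the repository names reuse those from the multirepo slide. Commits made inside a user's own clone are never checked; permissions are enforced only when someone pulls or pushes.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server-side permission table, keyed by repository."""
    readers: dict[str, set[str]] = field(default_factory=dict)
    writers: dict[str, set[str]] = field(default_factory=dict)

    def can_push(self, user: str, repo: str) -> bool:
        return user in self.writers.get(repo, set())

    def can_pull(self, user: str, repo: str) -> bool:
        # Anyone who may write may also read.
        return user in self.readers.get(repo, set()) or self.can_push(user, repo)

    def receive_push(self, user: str, repo: str, commits: list[str]) -> int:
        """Accept the pushed commits, or reject the whole push."""
        if not self.can_push(user, repo):
            raise PermissionError(f"{user} may not push to {repo}")
        return len(commits)


if __name__ == "__main__":
    # Multirepo: a partner gets read access to one repository only.
    acl = Server(readers={"libglue.git": {"partner"}},
                 writers={"libglue.git": {"alice"}, "server.git": {"alice"}})
    print(acl.can_pull("partner", "libglue.git"))                   # True
    print(acl.can_pull("partner", "server.git"))                    # False
    print(acl.receive_push("alice", "libglue.git", ["c1", "c2"]))   # 2
```

With a monorepo there is only one key in these tables, which is why access becomes all or nothing.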

SLIDE 29

Three Problems with DVCS

Binary Files Security Large Source Bases

SLIDE 30

LARGE source bases

SLIDE 31

[Chart: number of files (0 to 1,400K) vs number of commits (0 to 5M) for 4M (bk), Facebook (git), Android (repo), 1M (bk), Linux (git), FreeBSD Ports (SVN), and FreeBSD Src (SVN).]

SLIDE 32

[Chart: number of files (0 to 1,000M) vs number of commits (0 to 40M): "All Combined" (0.08T) vs Google (86T).]

SLIDE 33
SLIDE 34

Monorepo vs Multirepo

SLIDE 35
SLIDE 36

Some Disadvantages

  • A little too easy to share.
  • Access control (e.g. outsourcing).
  • Noisy commit messages.
  • Cloning is no longer an option.
SLIDE 37

Not just LARGE, also COMPLICATED

SLIDE 38

Library API

SLIDE 39

What about multirepo?

SLIDE 40

Library API

[Diagram: the library API split across repositories server.git, webapp.git, restapi.git, app.git, macApp.git, droid.git, WinApp.git, and libglue.git.]

SLIDE 41

ONE DOES NOT SIMPLY CHANGE A PUBLIC API

SLIDE 42

Problems of Multirepo

  • Loss of atomicity.
  • Loss of the ability to use SCM tools.
  • That feeling of “Never change anything”.
  • Having multiple repositories breaks tools that interact with the SCM.

SLIDE 43

Mono vs Multi? How about a Hybrid?

Solution: stitch multiple repositories together into one.

  • Partial checkouts.
  • Preserves atomic commits.
  • You can decouple and reuse components.
SLIDE 44

Case Study: Git Submodules

[Diagram: the repository records the submodule at /submodule/path/in/repo as a pinned commit, e46fe3df01435bf523d2ab4f2755556c0e4e6f78; .gitmodules maps that path to http://some_server/submodule.]

SLIDE 45

Case Study: Git Submodules

[Diagram: repository and submodule, with a "clone" step from http://some_server/submodule.]

SLIDE 46

Case Study: Git Submodules

[Diagram: repository and submodule, with two "clone" steps from http://some_server/submodule.]

SLIDE 47

Case Study: Git Submodules

[Diagram: repository and submodule clones, with two "push" steps; submodule server at http://some_server/submodule.]

SLIDE 48

Case Study: Git Submodules

[Diagram: repository and submodule clones, with a "sync" step; submodule server at http://some_server/submodule.]

SLIDE 49

Case Study: Git Submodules

fatal: reference isn’t a tree: 6c…e0
Unable to checkout '6c…e0' in submodule path 'sub'

Means: someone forgot to push the submodule ‘sub’.
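The superproject only records which submodule commit to use; nothing forces that commit to exist on the submodule's server. A toy model of the check that would catch this before pushing the superproject; `Remote`, `Superproject`, and `unpushed_submodules` are hypothetical names, and commit ids are made up. In real Git, `git push --recurse-submodules=check` performs a check of this kind, and `--recurse-submodules=on-demand` pushes the missing submodule commits first.

```python
from dataclasses import dataclass, field


@dataclass
class Remote:
    """Hypothetical model of a server: just the commit ids it has received."""
    commits: set[str] = field(default_factory=set)


@dataclass
class Superproject:
    # submodule path -> commit id pinned in the superproject's tree
    pinned: dict[str, str]
    # submodule path -> remote that path is cloned from (the URL in .gitmodules)
    remotes: dict[str, Remote]


def unpushed_submodules(sp: Superproject) -> list[str]:
    """Submodule paths whose pinned commit is missing from the submodule's remote.

    A fresh clone of the superproject at this revision would fail with
    "reference isn't a tree" for each of these paths.
    """
    return sorted(path for path, sha in sp.pinned.items()
                  if sha not in sp.remotes[path].commits)


if __name__ == "__main__":
    sub_remote = Remote({"c1"})
    sp = Superproject(pinned={"sub": "c2"}, remotes={"sub": sub_remote})
    print(unpushed_submodules(sp))  # ['sub']: someone forgot to push 'sub'
    sub_remote.commits.add("c2")    # the submodule commit finally gets pushed
    print(unpushed_submodules(sp))  # []
```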

SLIDE 50

Case Study: Git Submodules

submodule $ git push
Everything up-to-date

Means: you made a commit in the submodule while it was in a detached HEAD state (the default). You will cause the problem outlined on the previous slide.
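A toy model of why this happens: a commit made on a detached HEAD moves HEAD but no branch, so pushing branches has nothing new to send. The model pushes every local branch, similar to Git's older `push.default=matching` behavior; depending on `push.default`, a newer Git may instead refuse because you are not on a branch. `Repo`, `reachable`, and `push_branches` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Repo:
    """Toy repository: each commit has one parent; branches are named pointers."""
    parents: dict[str, Optional[str]] = field(default_factory=dict)
    branches: dict[str, str] = field(default_factory=dict)
    head: Optional[str] = None         # commit currently checked out
    on_branch: Optional[str] = None    # None means a detached HEAD

    def commit(self, cid: str) -> None:
        self.parents[cid] = self.head
        self.head = cid
        if self.on_branch is not None:
            self.branches[self.on_branch] = cid  # only an attached HEAD moves a branch


def reachable(repo: Repo, tip: Optional[str]) -> set[str]:
    """Commits reachable from `tip` by following parent links."""
    seen: set[str] = set()
    while tip is not None and tip not in seen:
        seen.add(tip)
        tip = repo.parents.get(tip)
    return seen


def push_branches(repo: Repo, remote: set[str]) -> str:
    """Send every commit reachable from a local branch that the remote lacks."""
    outgoing = set().union(*(reachable(repo, t) for t in repo.branches.values())) - remote
    if not outgoing:
        return "Everything up-to-date"
    remote |= outgoing  # updates the caller's set in place
    return f"pushed {len(outgoing)} commit(s)"


if __name__ == "__main__":
    # The submodule is checked out at its pinned commit c1: a detached HEAD.
    sub = Repo(parents={"c1": None}, branches={"master": "c1"}, head="c1")
    origin = {"c1"}
    sub.commit("c2")                   # lands on no branch
    print(push_branches(sub, origin))  # Everything up-to-date
    sub.branches["fix"] = sub.head     # e.g. create a branch here (git switch -c fix)
    sub.on_branch = "fix"
    print(push_branches(sub, origin))  # pushed 1 commit(s)
```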

SLIDE 51

MY BRAIN HURTS

SLIDE 52

Git Submodules are too loosely coupled with the main repo.

SLIDE 53

Key Insight

  • We’ve seen this problem before: CVS, which versions each file independently.
  • We’ve solved this problem before: ChangeSets bind changes to independent files together.
  • What if we treat repositories the same way we treat files?
SLIDE 54

A component is to a product as a file is to a repository
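One way to picture the analogy: a product-level changeset pins a revision for every component, the way a repository changeset pins a revision for every file. This is a simplified, hypothetical model, not BitKeeper's actual data format; the `Product` class is illustrative, and the component names reuse those from the earlier multirepo slide.

```python
from typing import Optional


class Product:
    """Hypothetical product repository that records component revisions.

    history[i] maps each component name to its revision at product changeset i.
    """

    def __init__(self, components: dict[str, str]):
        self.history: list[dict[str, str]] = [dict(components)]

    def commit(self, changes: dict[str, str]) -> int:
        """Record new revisions of one or more components as ONE product changeset."""
        unknown = set(changes) - set(self.history[-1])
        if unknown:
            raise KeyError(f"not components of this product: {sorted(unknown)}")
        self.history.append({**self.history[-1], **changes})
        return len(self.history) - 1

    def checkout(self, rev: int, only: Optional[set[str]] = None) -> dict[str, str]:
        """Consistent component revisions at `rev`; `only` selects a partial checkout."""
        return {name: r for name, r in self.history[rev].items()
                if only is None or name in only}


if __name__ == "__main__":
    product = Product({"libglue": "g1", "server": "s1", "droid": "d1"})
    # An API change to libglue and the matching change in server land together.
    rev = product.commit({"libglue": "g2", "server": "s2"})
    print(product.checkout(rev))                 # {'libglue': 'g2', 'server': 's2', 'droid': 'd1'}
    print(product.checkout(0, only={"server"}))  # {'server': 's1'}
```

Because a checkout always reads one product changeset, it can never combine the new libglue API with an old server, which is the atomicity that separate repositories lose.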

SLIDE 55

BitKeeper Nested: Clone

[Diagram: two products, each made of component repositories (SCM db + workspace), illustrating clone.]

SLIDE 56

BitKeeper Nested: Pull

[Diagram: two products, each made of component repositories (SCM db + workspace), illustrating pull.]

SLIDE 57

BitKeeper Nested: Push

[Diagram: two products, each made of component repositories (SCM db + workspace), illustrating push.]

SLIDE 58

BitKeeper Nested: Clone

[Diagram: two products, each made of component repositories (SCM db + workspace), illustrating clone.]

SLIDE 59

BitKeeper Nested: Detach

[Diagram: a product with its component repositories, and one component detached into a standalone repository (SCM db + workspace).]

SLIDE 60

BitKeeper Nested: Port

[Diagram: a product with its component repositories and a standalone repository (SCM db + workspace), illustrating port.]

SLIDE 61
So?

Multirepo
  • Goes better with distributed.
  • Project has conceptual boundaries.
  • You can work with a small number of components.
  • Outsourcing, working with partners.

Monorepo
  • Goes better with centralized.
  • Project boundaries are not clear (files move around).
  • Lots of reuse; origin doesn’t matter.
  • Huge source base and you need most of it. No natural boundaries.

Hybrid
  • Goes better with distributed.
  • Takes atomic commits from monorepo.
  • Takes conceptual boundaries from multirepo.
  • You can clone components but still work within the overall structure.

SLIDE 62

Don’t let your tools determine your workflow

SLIDE 63

Distributed SCM workflows are MORE FLEXIBLE, and can be sprinkled with enough centralized JUJU to make them scale.

SLIDE 64

Ahhh, people ask me questions, lost in confusion.
Well, I tell them there’s no problem, only solutions.
SLIDE 65

The End

SLIDE 66
SLIDE 67
SLIDE 68

MY BRAIN STILL HURTS