STAT 605 Data Science Computing: Introduction to Version Control: git



slide-1
SLIDE 1

STAT 605 Data Science Computing

Introduction to Version Control: git

Some materials adapted from Pro Git by Scott Chacon and Ben Straub

slide-2
SLIDE 2

Version control

It is useful to record and track the changes to a project over time

  • Revert to older versions (e.g., if we accidentally introduce a bug)
  • Compare different implementations of a function
  • Track who implemented what changes

We want to do this locally (i.e., stored on our own machine, not in the cloud), but in a distributed manner (i.e., with multiple people working on the project at once).

For a more thorough discussion of why you should use version control, the problems that git seeks to solve, and how it solves them, see https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control

slide-3
SLIDE 3

git: Distributed Version Control

Created by Linus Torvalds (also the creator of Linux)
Free and open-source, available at https://git-scm.com/

Installation
  • Ubuntu: apt install git (you may need to use sudo)
  • Windows/macOS: https://git-scm.com/downloads

slide-4
SLIDE 4

Please see the lecture video for a demonstration of git installation and configuration.

slide-5
SLIDE 5

git stores file snapshots over time

Image credit: S. Chacon and B. Straub. Pro Git

As we make changes to a project, git keeps track of those changes

This allows us to go back to an earlier version, if necessary

slide-6
SLIDE 6

Getting a git repository

Option 1: create a git repository
  • Take a directory on your machine
  • Start tracking files in that directory

Option 2: clone an existing repository from elsewhere
  • Take an existing git repo (e.g., an R package that you like)
  • Create a copy of it on your local machine
  • This also allows you to contribute back to a project, if you wish to do so
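The two options can be sketched on the command line as follows. This is a minimal illustration, not the lecture demo: the directory names, the README file, and the user name and email are placeholders, and everything happens in a temporary directory.

```shell
set -eu
work=$(mktemp -d)                 # scratch area (hypothetical), nothing real is touched
cd "$work"

# Option 1: turn an ordinary directory into a repository.
mkdir project
cd project
git init -q                       # creates the hidden .git directory
git config user.name "Example User"        # placeholder identity, stored in this repo only
git config user.email "user@example.com"
echo "hello" > README
git add README                    # start tracking the file
git commit -q -m "Initial commit"

# Option 2: clone an existing repository (here, the one we just made).
cd "$work"
git clone -q project project-copy
ls -a project-copy                # README plus the copy's own .git directory
```

In practice the argument to git clone is usually a URL rather than a local path.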

slide-7
SLIDE 7

Please see the lecture video for a demonstration of creating and cloning repositories in git.

slide-8
SLIDE 8

Recording changes in your repository

All files that git tracks are in one of three states at a given time

Image credit: S. Chacon and B. Straub. Pro Git

States shown in the figure: Untracked, Modified, Staged, Committed

slide-9
SLIDE 9

Three basic states: modified, staged, committed

A file in your git repository can be in one of three states:

  • Modified: the file has been changed, but is not yet committed to the database
  • Staged: a modified file that is ready to be included in the next snapshot
  • Committed: the file is stored in the database (i.e., a snapshot has been taken)

Image credit: S. Chacon and B. Straub. Pro Git

slide-10
SLIDE 10

Three basic states: modified, staged, committed

Image credit: S. Chacon and B. Straub. Pro Git

Note: not every file in a project directory has to be part of the repo. Thus, there may be files in a directory that are in none of these states, because they are not being tracked at all.
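One way to watch a file move through these states is git status --short, which prints a two-character code per file. The sketch below assumes a throwaway repository and a hypothetical file notes.txt; the identity values are placeholders.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "first draft" > notes.txt
untracked=$(git status --short)   # "?? notes.txt": not tracked at all

git add notes.txt
staged_new=$(git status --short)  # "A  notes.txt": new file staged for the next snapshot

git commit -q -m "Add notes"
committed=$(git status --short)   # empty: nothing differs from the last commit

echo "second draft" >> notes.txt
modified=$(git status --short)    # " M notes.txt": changed, but not staged

git add notes.txt
staged_mod=$(git status --short)  # "M  notes.txt": the modification is staged

printf '%s\n' "$untracked" "$staged_new" "$modified" "$staged_mod"
```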

slide-11
SLIDE 11

Please see the lecture video for a demonstration of adding files to the git repo and tracking changes.

slide-12
SLIDE 12

The basic workflow

  1) Modify one or more files in your repository
  2) Stage the changes that you wish to add to the next snapshot
  3) Commit your changes. A snapshot of the staged files is stored.
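The three steps above can be sketched as follows. The point of the example is that only staged changes enter the snapshot. The file names analysis.R and notes.txt and the identity values are hypothetical.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "x <- 1" > analysis.R
echo "TODO" > notes.txt
git add analysis.R notes.txt
git commit -q -m "Initial snapshot"

# 1) Modify two files.
echo "y <- 2" >> analysis.R
echo "call Bob" >> notes.txt

# 2) Stage only the change we want in the next snapshot.
git add analysis.R

# 3) Commit: only the staged change is recorded.
git commit -q -m "Add y to analysis"

in_commit=$(git diff --name-only HEAD~1 HEAD)   # files changed by the new commit
leftover=$(git status --short)                  # notes.txt is still modified, unstaged
git log --oneline
```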

slide-13
SLIDE 13

Reminder: git stores file snapshots over time

Image credit: S. Chacon and B. Straub. Pro Git

As we make changes to a project, git keeps track of those changes

This allows us to go back to an earlier version, if necessary

slide-14
SLIDE 14

The structure of the git repository

When you commit to the repo:

  • Git stores an object with a pointer to the snapshot of the files you staged
  • The object also includes additional information, e.g., commit author, message, etc.
  • The commit object stores a pointer to its parent(s) (the commit(s) that came directly before)

Example: add three files for staging, and commit.

slide-15
SLIDE 15

The structure of the git repository

Commit object created by git commit

slide-16
SLIDE 16

The structure of the git repository

Now, git has created a commit object, which includes a pointer to the root tree object of the project. A blob object is created for each newly committed file. Roughly speaking, tree objects correspond to UNIX/Linux directories, while blob objects correspond to files. Add three files for staging, and commit.

slide-17
SLIDE 17

  • Commit object, created by git commit
  • Tree object, written by git commit from the staged files (git init does not create it); each later commit writes a new tree
  • Blob objects corresponding to the three files, created by git add
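These objects can be inspected directly with git cat-file and git ls-tree. The following is a small sketch under assumed file names (a.txt, b.txt, c.txt) and a placeholder identity. It also checks that a blob is named by the hash of its content.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "alpha" > a.txt
echo "beta" > b.txt
echo "gamma" > c.txt
git add a.txt b.txt c.txt              # blob objects are written here
git commit -q -m "Add three files"     # tree and commit objects are written here

commit_type=$(git cat-file -t HEAD)            # type of the object HEAD resolves to
tree_type=$(git cat-file -t 'HEAD^{tree}')     # the root tree that commit points to
blob_type=$(git cat-file -t HEAD:a.txt)        # the entry for a.txt inside that tree

git cat-file -p HEAD     # shows the tree pointer, author, committer, and message
git ls-tree HEAD         # one blob entry per file

# A blob's name is the hash of its content, so these two agree.
stored=$(git rev-parse HEAD:a.txt)
computed=$(git hash-object a.txt)
```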

Image credit: S. Chacon and B. Straub. Pro Git

slide-18
SLIDE 18

Please see the lecture video for a demonstration of examining the git commit history

slide-19
SLIDE 19

Managing multiple versions: Branching in git


slide-20
SLIDE 20

As we make additional changes and commit them, each commit points back to the commit immediately before it. A branch is simply a pointer to one of these commit objects.
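Both claims can be checked from the command line. In this sketch (hypothetical file name, placeholder identity), the first branch is explicitly named master to match the slides, since the default name depends on the git version and configuration.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch master, as in the slides
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
first=$(git rev-parse HEAD)

echo "two" >> file.txt
git commit -q -am "Second commit"
second=$(git rev-parse HEAD)

parent_of_second=$(git log -1 --format=%P HEAD)   # parent hash recorded in the commit
branch_target=$(git rev-parse master)             # the commit the branch points to

git log --oneline --decorate    # newest first, with branch pointers shown
```

Here parent_of_second equals first, and branch_target equals second: the branch name is just a label on the newest commit.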

Image credit: S. Chacon and B. Straub. Pro Git

slide-21
SLIDE 21

Two different branches, both pointing to the same commit. HEAD points to the current branch, that is, the branch that we are currently working on.

Image credit: S. Chacon and B. Straub. Pro Git

slide-22
SLIDE 22

Note: the master branch is not special; it is just the default name for the first branch created by git init.

Image credit: S. Chacon and B. Straub. Pro Git

slide-23
SLIDE 23

Creating a new branch

HEAD points to the current (only) branch. The current branch, master, was created on initialization of the repository.

Image credit: S. Chacon and B. Straub. Pro Git

slide-24
SLIDE 24

Creating a new branch

Create a new branch called testing, pointing to the current commit. The new branch is created by git branch. HEAD still points to the current branch, master, which was created on initialization of the repository.
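A short sketch of this step, assuming a throwaway repository with one commit, a hypothetical file name, and a placeholder identity. Note that git branch creates the pointer but does not switch to it.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/master    # match the branch name used in the slides
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"

git branch testing                         # new pointer to the current commit

current=$(git symbolic-ref --short HEAD)   # the branch HEAD points to (still master)
master_commit=$(git rev-parse master)
testing_commit=$(git rev-parse testing)

git log --oneline --decorate               # both branch names appear on the same commit
```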

Image credit: S. Chacon and B. Straub. Pro Git

slide-25
SLIDE 25

Please see the lecture video for a demonstration of creating a new branch with git branch and viewing the branch pointers using git log

slide-26
SLIDE 26

Branches: the basic workflow

Here is a project with three commits, and a single branch.

Image credit: S. Chacon and B. Straub. Pro Git

slide-27
SLIDE 27

Branches: the basic workflow

Create a new branch called iss53, and switch HEAD to that branch.

Image credit: S. Chacon and B. Straub. Pro Git

slide-28
SLIDE 28

Branches: the basic workflow

Now we have a new branch, iss53, pointed to by HEAD (not shown). Any commits we make will be made to iss53, rather than master.

Image credit: S. Chacon and B. Straub. Pro Git

slide-29
SLIDE 29

Branches: the basic workflow

If we make changes and commit them, the current branch moves forward, while master remains unchanged.
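The iss53 scenario can be reproduced in a scratch repository. This is a sketch with a hypothetical file index.html and a placeholder identity; git checkout -b creates the branch and switches to it in one step.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/master    # match the branch name used in the slides
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# A project with three commits and a single branch.
for i in 1 2 3; do
  echo "line $i" >> index.html
  git add index.html
  git commit -q -m "Commit $i"
done

git checkout -q -b iss53                   # create iss53 and point HEAD at it
master_before=$(git rev-parse master)

echo "fix for issue 53" >> index.html
git commit -q -am "Work on issue 53"       # only iss53 moves forward

master_after=$(git rev-parse master)
iss53_parent=$(git rev-parse iss53~1)      # commit directly before the tip of iss53
current=$(git symbolic-ref --short HEAD)
```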

Image credit: S. Chacon and B. Straub. Pro Git

slide-30
SLIDE 30

Please see the lecture video for a demonstration of switching between branches with git checkout and viewing the commit history of multiple branches.

slide-31
SLIDE 31

If we make changes in both of our branches, then they will have divergent histories. The changes in the two branches are isolated from one another. Eventually, we may want to merge them.

Image credit: S. Chacon and B. Straub. Pro Git

slide-32
SLIDE 32

Merging branches in git

Merge the changes made in branch iss53 into branch master.

Image credit: S. Chacon and B. Straub. Pro Git

slide-33
SLIDE 33

Merging branches in git

New commit created by merge operation.

After a merge like this, we can typically delete the branch that we merged: git branch -d iss53
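A sketch of a merge of two diverged branches, followed by deleting the merged branch. The file names and identity are placeholders, and the two branches touch different files so that no conflict arises.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/master    # match the branch name used in the slides
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "home" > index.html
git add index.html
git commit -q -m "Base commit"

git checkout -q -b iss53
echo "footer" > footer.html
git add footer.html
git commit -q -m "Add footer for issue 53"
iss53_tip=$(git rev-parse iss53)

git checkout -q master
echo "about" > about.html
git add about.html
git commit -q -m "Add about page"          # the histories have now diverged
master_tip=$(git rev-parse master)

git merge -q --no-edit iss53               # creates a merge commit on master

first_parent=$(git rev-parse 'HEAD^1')     # previous tip of master
second_parent=$(git rev-parse 'HEAD^2')    # tip of the branch that was merged in

git branch -d iss53                        # safe: its commits are reachable from master
```

The merge commit is the only kind of commit here with two parents, which is how git records that the histories were joined.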

Image credit: S. Chacon and B. Straub. Pro Git

slide-34
SLIDE 34

Please see the lecture video for a demonstration of merging branches with git merge

slide-35
SLIDE 35

Merge conflicts

What if we make changes to the same part of the same file in two branches? git may not know how to merge them, and we’ll get an error like...

Note: You can use git status to get more information about what went wrong.

Files with merge conflicts will have sections that look like this.
  • Contents of index.html in HEAD branch
  • Contents of index.html in branch being merged

slide-36
SLIDE 36

Merge conflicts

We have to fix these sections before we can complete the merge!
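The following sketch provokes a conflict on purpose and then resolves it by hand. The file contents, the resolution text, and the identity are hypothetical. The conflict markers shown in the comments are the ones git writes with its default conflict style.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/master    # match the branch name used in the slides
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "<title>Home</title>" > index.html
git add index.html
git commit -q -m "Base commit"

# Change the same line on two branches.
git checkout -q -b iss53
echo "<title>Issue 53 title</title>" > index.html
git commit -q -am "Retitle on iss53"
git checkout -q master
echo "<title>Master title</title>" > index.html
git commit -q -am "Retitle on master"

# The merge stops and exits non-zero, so guard it under set -e.
if git merge iss53; then merge_ok=yes; else merge_ok=no; fi

conflict_status=$(git status --short)      # "UU index.html": unmerged on both sides
first_marker=$(head -n 1 index.html)       # "<<<<<<< HEAD"
cat index.html
# <<<<<<< HEAD          (content from the current branch follows)
# =======               (separator)
# >>>>>>> iss53         (end of the content from the branch being merged)

# Fix the section by hand, then stage and commit to conclude the merge.
echo "<title>Agreed title</title>" > index.html
git add index.html
git commit -q -m "Merge branch iss53, resolving title conflict"
```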

slide-37
SLIDE 37

Please see the lecture video for a demonstration of fixing merge conflicts.

slide-38
SLIDE 38

Remote repositories

Your git repo can be hosted remotely on a server (e.g., on GitHub)

  • Useful for collaboration (e.g., with a research group, company, etc.)
  • Once the remote is set up, you and your team can push/pull data to/from it

Basic commands

git fetch <remote> : retrieves data from <remote>
  • Note: this only downloads data. You still have to merge it.

git pull : automatically fetch and merge data from a remote branch
  • Note: your repo must be tracking the remote branch. See here for more information

git push <remote> <branch> : upload changes to the remote repo
  • Uploads the changes in your branch <branch> to the remote <remote>

Note: the term “remote” does not necessarily mean that the repo is not on your local machine! It is just a repo that is not the one you are currently working in.
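Because a remote can live on the same machine, push and fetch can be tried without any server. The sketch below uses a bare repository in a temporary directory as the remote. The names alice, bob, remote.git, and data.txt and the identity are hypothetical.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# The "remote": a bare repository (no working files) on the local disk.
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/master   # its default branch

# Alice creates a repo, registers the remote, and pushes.
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "v1" > data.txt
git add data.txt
git commit -q -m "First version"
git remote add origin "$work/remote.git"
git push -q origin master

# Bob clones the remote; origin is set up automatically.
cd "$work"
git clone -q remote.git bob

# Alice pushes a second commit.
cd "$work/alice"
echo "v2" >> data.txt
git commit -q -am "Second version"
git push -q origin master

# Bob fetches: the data is downloaded, but his files do not change yet.
cd "$work/bob"
git fetch -q origin
after_fetch=$(cat data.txt)                # still only "v1"
git merge -q origin/master                 # now his branch catches up
bob_head=$(git rev-parse HEAD)
alice_head=$(git -C "$work/alice" rev-parse HEAD)
```

Running git pull in bob instead of the last fetch and merge would perform both steps at once, since the clone already tracks origin/master.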

slide-39
SLIDE 39

Please see the lecture video for a demonstration of pulling and pushing from/to a remote repo.

slide-40
SLIDE 40

git: a beginner’s recipe

When in doubt:

  1. git add
  2. git commit
  3. git pull
  4. git push

Use git status liberally and at any time.
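As a rough sketch, here is the recipe run end to end against a local stand-in for a remote. The first half is setup only; the directory names, report.txt, and the identity are hypothetical.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Setup only: a bare "remote" holding one commit, and a clone to work in.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "v1" > report.txt
git add report.txt
git commit -q -m "Start report"
cd "$work"
git clone -q --bare seed remote.git
git clone -q remote.git mycopy
cd mycopy
git config user.name "Example User"
git config user.email "user@example.com"

# The recipe itself.
echo "v2" >> report.txt
git status --short                  # check what changed before doing anything
git add report.txt                  # 1. stage
git commit -q -m "Update report"    # 2. commit
git pull -q                         # 3. pick up anyone else's changes first
git push -q                         # 4. publish to the remote
git status --short                  # prints nothing: working tree is clean

local_head=$(git rev-parse HEAD)
remote_head=$(git -C "$work/remote.git" rev-parse master)
```

Pulling before pushing matters because git refuses a push when the remote branch contains commits that the local branch lacks.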

Sticking to these commands and this order will keep you out of trouble, but you’re better off reading the documentation and making sure you understand what’s going on under the hood.

slide-41
SLIDE 41

git: next steps

Of course, we have only scratched the surface of the tools available in git, but you now know more than enough to work on basic projects.

To learn more, I recommend Pro Git by Scott Chacon and Ben Straub, available for free at https://git-scm.com/book/en/v2

Other resources:
  • “Everyday git” quick guide: https://git-scm.com/docs/giteveryday
  • Documentation (also available through man git): https://git-scm.com/docs
  • Version Control with Git by J. Loeliger (2009), O’Reilly