STAT 605 Data Science Computing
Introduction to Version Control: git
Some materials adapted from Pro Git by Scott Chacon and Ben Straub

Version control
It is useful to record and track the changes to a project over time:
● Revert to older versions (e.g., if we accidentally introduce a bug)
● Compare different implementations of a function
● Track who implemented what changes
We want to do this locally (i.e., stored on our own machine, not in the cloud), but in a distributed manner (i.e., with multiple people working on the project at once).
For a more thorough discussion of why you should use version control, the problems that git seeks to solve, and how it solves them, see https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control

git: Distributed Version Control
Created by Linus Torvalds (also the creator of Linux)
Free and open-source, available at https://git-scm.com/
Installation
● Ubuntu: apt install git (you may need to use sudo)
● Windows/MacOS: https://git-scm.com/downloads

Please see the lecture video for a demonstration of git installation and configuration.

git stores file snapshots over time
As we make changes to a project, git keeps track of those changes.
This allows us to go back to an earlier version, if necessary.
Image credit: S. Chacon and B. Straub. Pro Git

Getting a git repository
Option 1: create a git repository
● Take a directory on your machine
● Start tracking files in that directory
Option 2: clone an existing repository from elsewhere
● Take an existing git repo (e.g., an R package that you like)
● Create a copy of it on your local machine
● This also allows you to contribute back to a project, if you wish to do so
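The two options can be sketched at the command line as follows. This is a minimal illustration rather than the demonstration from the lecture video: the directory names, file name, and identity values are hypothetical, and the "existing repository" being cloned is simply the local one created in the first half (in practice it would usually be a URL).

```shell
# Scratch directory standing in for wherever you keep your projects.
work="$(mktemp -d)"

# Option 1: turn a directory into a git repository and start tracking a file.
mkdir "$work/myproject"
cd "$work/myproject"
git init -q                                   # creates the hidden .git directory
git config user.name "Example Student"        # hypothetical identity for this repo
git config user.email "student@example.com"
echo "hello" > notes.txt
git add notes.txt                             # start tracking notes.txt
git commit -q -m "First commit"

# Option 2: clone an existing repository (here, the one made above).
cd "$work"
git clone -q myproject myproject-copy
ls myproject-copy                             # the clone contains notes.txt
```

The clone carries the full commit history, not just the current files, which is what makes it possible to contribute changes back later.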

Please see the lecture video for a demonstration of creating and cloning repositories in git.

Recording changes in your repository
All files that git tracks are in one of three states at a given time.
[Figure: file states labeled Untracked, Committed, Modified, Staged]
Image credit: S. Chacon and B. Straub. Pro Git

Three basic states: modified, staged, committed
A file in your git repository can be in one of three states:
● Modified: the file has been changed, but is not yet committed to the database
● Staged: a modified file that is ready to be included in the next snapshot
● Committed: the file is stored in the database (i.e., a snapshot has been taken)
Image credit: S. Chacon and B. Straub. Pro Git

Three basic states: modified, staged, committed
Note: not every file in a project directory has to be part of the repo. Thus, there may be files in a directory that are in none of these states, because they are not being tracked at all.
Image credit: S. Chacon and B. Straub. Pro Git
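One way to see these states for yourself is git status --short, which prints a two-character code in front of each file. The sketch below builds a throwaway repository with one file in each situation; the file names and identity values are made up for illustration.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"            # tracked.txt is now committed

echo "version two" > tracked.txt              # modified: changed, but not staged
echo "draft" > staged.txt
git add staged.txt                            # staged: will be in the next snapshot
echo "scratch" > untracked.txt                # untracked: git is not following it

# Codes in the output: " M" = modified, "A " = newly staged, "??" = untracked.
git status --short
```

A committed file with no changes does not appear in this listing at all, which is why a clean repository prints nothing.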

Please see the lecture video for a demonstration of adding files to the git repo and tracking changes.

The basic workflow
1) Modify one or more files in your repository.
2) Stage the changes that you wish to add to the next snapshot.
3) Commit your changes. A snapshot of the staged files is stored.
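The three steps above might look like this in practice. This is only a sketch in a scratch repository; the file name analysis.R, its contents, and the commit messages are placeholders.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"

# Starting point: one committed file.
echo "x <- 1" > analysis.R
git add analysis.R
git commit -q -m "Add analysis script"

# 1) Modify a file in the repository.
echo "y <- 2" >> analysis.R

# 2) Stage the change for the next snapshot.
git add analysis.R
git diff --cached --stat                      # optional: review what is staged

# 3) Commit: a snapshot of the staged files is stored.
git commit -q -m "Define y"

git log --oneline                             # two commits, newest first
```

Staging is a separate step so that you can choose which of your changes go into a given snapshot, rather than committing everything at once.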

Reminder: git stores file snapshots over time
As we make changes to a project, git keeps track of those changes.
This allows us to go back to an earlier version, if necessary.
Image credit: S. Chacon and B. Straub. Pro Git

The structure of the git repository
When you commit to the repo:
● Git stores an object with a pointer to the snapshot of the files you staged
● The object also includes additional information, e.g., commit author, message, etc.
● The commit object stores a pointer to its parent(s) (the commit(s) that came directly before)
Example: add three files for staging, and commit.

The structure of the git repository
The commit object is created by git commit.

The structure of the git repository
Add three files for staging, and commit. Now, git has created a commit object, which includes a pointer to the root tree object of the project. A blob object is created for each newly committed file.
Roughly speaking, tree objects correspond to UNIX/Linux directories, while blob objects correspond to files.

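[Figure labels]
● Commit object, created by git commit
● Tree object, written when we commit and recording the contents of the project directory (git init itself only sets up an empty repository)
● Blob objects corresponding to the three files, created by git add
Image credit: S. Chacon and B. Straub. Pro Git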
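These objects can be inspected directly with git cat-file, where -t prints an object's type and -p prints its contents. The sketch below assumes three small placeholder files (the names README, test.rb, and LICENSE are illustrative only) in a scratch repository.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"

echo "readme"  > README
echo "test"    > test.rb
echo "license" > LICENSE
git add README test.rb LICENSE                # one blob object is written per file
git commit -q -m "Initial commit"             # the tree and commit objects are written

git cat-file -t HEAD                          # prints: commit
git cat-file -p HEAD                          # tree hash, author, committer, message
git cat-file -t 'HEAD^{tree}'                 # prints: tree
git cat-file -p 'HEAD^{tree}'                 # one "blob" entry per file
git cat-file -t HEAD:README                   # prints: blob
```

Reading the output top down follows the pointers described on the slide: the commit names a tree, and the tree names one blob for each file.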

Please see the lecture video for a demonstration of examining the git commit history

Managing multiple versions: Branching in git
Recall that when you commit to the repo, the commit object stores a pointer to the snapshot of the files you staged, along with a pointer to its parent(s) (the commit(s) that came directly before).

As we make additional changes and commit them, each commit points back to the commit immediately before it. A branch is simply a pointer to one of these commit objects.
Image credit: S. Chacon and B. Straub. Pro Git
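Both ideas can be checked in a scratch repository: the parent pointer is stored inside each commit object, and a branch name resolves to a single commit hash. This is a sketch with placeholder file contents and identity values; the branch is explicitly named master here to match the slides.

```shell
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q      # name the first branch master
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"

# Make three commits, one after another.
for i in 1 2 3; do
    echo "change $i" >> file.txt
    git add file.txt
    git commit -q -m "Commit $i"
done

git log --oneline                             # three commits, newest first
git cat-file -p HEAD | grep '^parent'         # parent pointer stored in the newest commit
git rev-parse 'HEAD~1'                        # the same hash: the commit just before it
git rev-parse master                          # the branch is a pointer to the newest commit
```

The very first commit has no parent line at all, since nothing came before it.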

Two different branches, both pointing to the same commit. The head points to the current branch. That is, the branch that we are currently working on. Image credit: S. Chacon and B. Straub. Pro Git

Note: the master branch is not special; it is just the default name for the first branch created by git init.
Image credit: S. Chacon and B. Straub. Pro Git

Creating a new branch
HEAD points to the current (and only) branch. The current branch, master, was created on initialization of the repository.
Image credit: S. Chacon and B. Straub. Pro Git

Creating a new branch
Create a new branch called testing, pointing to the current commit, with git branch testing. HEAD still points to the current branch, master.
Image credit: S. Chacon and B. Straub. Pro Git
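A short sketch of this step in a scratch repository (the file and identity values are placeholders): after git branch testing, both branch names resolve to the same commit, and HEAD has not moved.

```shell
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q      # name the first branch master
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"
echo "hello" > file.txt
git add file.txt
git commit -q -m "First commit"

git branch testing                            # new pointer at the current commit
git rev-parse master testing                  # prints the same hash twice
git symbolic-ref --short HEAD                 # prints: master (HEAD did not move)
git log --oneline --decorate                  # both branch names appear next to the commit
```

Note that git branch only creates the pointer; it does not switch to the new branch.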

Please see the lecture video for a demonstration of creating a new branch with git branch and viewing the branch pointers using git log

Branches: the basic workflow Here is a project with three commits, and a single branch. Image credit: S. Chacon and B. Straub. Pro Git

Branches: the basic workflow
Create a new branch called iss53, and switch HEAD to that branch.
Image credit: S. Chacon and B. Straub. Pro Git

Branches: the basic workflow
Now we have a new branch, iss53, pointed to by HEAD (not shown). Any commits we make will be made to iss53, rather than master.
Image credit: S. Chacon and B. Straub. Pro Git

Branches: the basic workflow If we make changes and commit them, the current branch moves forward, while master remains unchanged. Image credit: S. Chacon and B. Straub. Pro Git
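The following sketch reproduces this situation in a scratch repository: three commits on master, then a new branch iss53 that moves forward on its own. The file index.html, its contents, and the identity values are placeholders chosen for illustration.

```shell
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q      # name the first branch master
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"

# A project with three commits and a single branch.
for i in 0 1 2; do
    echo "<p>section $i</p>" >> index.html
    git add index.html
    git commit -q -m "Commit C$i"
done
before="$(git rev-parse master)"              # remember where master points

git checkout -q -b iss53                      # create iss53 and switch HEAD to it
echo "<p>fix for issue 53</p>" >> index.html
git commit -q -am "Work on issue 53"          # the commit goes to iss53

git rev-parse master                          # same hash as before: master is unchanged
git log --oneline --decorate --all            # iss53 is one commit ahead of master
```

Here git checkout -b is shorthand for creating the branch with git branch and then switching to it with git checkout.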

Please see the lecture video for a demonstration of switching between branches with git checkout and viewing the commit history of multiple branches.

If we make changes in both of our branches, then they will have divergent histories . The changes in the two branches are isolated from one another. Eventually, we may want to merge them. Image credit: S. Chacon and B. Straub. Pro Git

Merging branches in git
Merge the changes made in branch iss53 into branch master.
Image credit: S. Chacon and B. Straub. Pro Git

Merging branches in git
The merge operation creates a new commit.
After a merge like this, we can typically delete the branch that we merged:
git branch -d iss53
Image credit: S. Chacon and B. Straub. Pro Git
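A merge of two diverged branches can be sketched as follows, in a scratch repository with placeholder files and identity values. Because each branch changes a different file, git can combine them without help, and the resulting merge commit has two parents.

```shell
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q      # name the first branch master
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"
echo "<h1>Welcome</h1>" > index.html
git add index.html
git commit -q -m "Base commit"

git checkout -q -b iss53                      # work on the issue branch
echo "<footer>contact us</footer>" > footer.html
git add footer.html
git commit -q -m "Add footer for issue 53"

git checkout -q master                        # meanwhile, master also moves forward
echo "<p>about</p>" > about.html
git add about.html
git commit -q -m "Add about page"

git merge -q --no-edit iss53                  # merge iss53 into the current branch, master
git cat-file -p HEAD | grep '^parent'         # two parent lines: one per merged branch
git branch -d iss53                           # the branch pointer is no longer needed
```

The -d flag only deletes a branch whose commits have already been merged, so the work done on iss53 remains reachable from master.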

Please see the lecture video for a demonstration of merging branches with git merge

Merge conflicts
What if we make changes to the same part of the same file in two branches? git may not know how to merge them, and we’ll get an error like...
Note: You can use git status to get more information about what went wrong.
Files with merge conflicts will have sections that look like this.
[Figure labels: contents of index.html in the HEAD branch; contents of index.html in the branch being merged]

Merge conflicts
We have to fix these conflicting sections before we can merge!
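To see a conflict end to end, the sketch below deliberately edits the same line of index.html on two branches, attempts the merge, and then resolves it by hand. The file contents, the resolution chosen, and the identity values are all made up for illustration; this is not the exact example from the lecture video.

```shell
repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q      # name the first branch master
git config user.name "Example Student"        # hypothetical identity
git config user.email "student@example.com"
echo "<h1>Welcome</h1>" > index.html
git add index.html
git commit -q -m "Base commit"

git checkout -q -b iss53                      # change the line on iss53...
echo "<h1>Welcome to issue 53</h1>" > index.html
git commit -q -am "Change heading on iss53"

git checkout -q master                        # ...and change the same line on master
echo "<h1>Welcome home</h1>" > index.html
git commit -q -am "Change heading on master"

# The merge cannot be completed automatically, so git stops and reports a conflict.
git merge --no-edit iss53 || echo "merge stopped because of a conflict"
conflict_state="$(git status --short)"        # "UU" marks a file changed on both sides
echo "$conflict_state"
cat index.html                                # shows the <<<<<<< HEAD, =======, >>>>>>> iss53 markers

# Fix the conflicting section by hand, then stage and commit to finish the merge.
echo "<h1>Welcome home to issue 53</h1>" > index.html
git add index.html
git commit -q -m "Merge iss53, resolving conflict in index.html"
```

Between the markers, the part above ======= is the version from the HEAD branch and the part below it is the version from the branch being merged; resolving the conflict means replacing the whole marked section with the text you actually want.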

Please see the lecture video for a demonstration of fixing merge conflicts.
