Introduction

  • Understanding the use and mechanics of Version Control Systems (VCSs)
  • Focus on Git, the de facto standard for version control

What is Version Control?

  • Tools to track changes in source code or file collections
  • Maintains history, facilitates collaboration
  • Uses snapshots to encapsulate the state of files and folders

Why Use Version Control?

  • Track changes, understand history
  • Collaborate with others seamlessly
  • Answer critical questions about code changes

Git’s Reputation

  • Known for its complexity
  • Emphasizing understanding over memorization of commands
  • XKCD Comic on Git

xkcd 1597


Git’s Data Model


Snapshots in Git

  • Git views history as a series of snapshots (trees and blobs)
  • Example tree structure:
    <root> (tree)
    |
    +- foo (tree)
    |  |
    |  + bar.txt (blob, contents = "hello world")
    |
    +- baz.txt (blob, contents = "git is wonderful")
    

Modeling History

  • History represented as a Directed Acyclic Graph (DAG) of snapshots
  • Snapshots are commits with parent references
  • Commits can have multiple parents (merging branches)

Commit History Visualization

  • Each commit points to its parent(s)
  • Branches and merges in development are clearly visible
o <-- o <-- o <-- o
            ^
             \
              --- o <-- o

Immutable Commits

  • Commits are immutable in Git
  • Edits create new commits; references are updated

Data Model as Pseudocode

  • Conceptual representation of Git’s data model
type blob = array<byte>
type tree = map<string, tree | blob>
type commit = struct
{
    parents: array<commit>
    author: string
    message: string
    snapshot: tree
}

Objects and Content-Addressing

  • Git stores objects (blobs, trees, commits) content-addressed by SHA-1 hash
  • Objects refer to other objects via hash

References

  • Human-readable names for SHA-1 hashes (e.g., master)
  • References are mutable, unlike objects

Repositories

  • A repository is a collection of objects and references

Staging Area

  • Mechanism to specify modifications for the next snapshot
  • Allows for clean, organized commits

Git CLI Basics

  • git help <command>: get help for a git command
  • git init: creates a new git repo, with data stored in the .git directory
  • git status: tells you what’s going on
  • git add <filename>: adds files to staging area
  • git commit: creates a new commit
  • git log: shows a flattened log of history
  • git log --all --graph --decorate: visualizes history as a DAG
  • git diff <filename>: show changes you made relative to the staging area
  • git diff <revision> <filename>: shows differences in a file between snapshots
  • git checkout <revision>: updates HEAD and current branch


Branching and Merging

  • git branch: shows branches
  • git branch <name>: creates a branch
  • git checkout -b <name>: creates a branch and switches to it
    • same as git branch <name>; git checkout <name>
  • git merge <revision>: merges into current branch
  • git mergetool: use a fancy tool to help resolve merge conflicts
  • git rebase: rebase set of patches onto a new base

Remotes

  • git remote: list remotes
  • git remote add <name> <url>: add a remote
  • git push <remote> <local branch>:<remote branch>: send objects to remote, and update remote reference
  • git branch --set-upstream-to=<remote>/<remote branch>: set up correspondence between local and remote branch
  • git fetch: retrieve objects/references from a remote
  • git pull: same as git fetch; git merge
  • git clone: download repository from remote

Undo

  • git commit --amend: edit a commit’s contents/message
  • git reset HEAD <file>: unstage a file
  • git checkout -- <file>: discard changes

Advanced Git

  • git config: Git is highly customizable
  • git clone --depth=1: shallow clone, without entire version history
  • git add -p: interactive staging
  • git rebase -i: interactive rebasing
  • git blame: show who last edited which line
  • git stash: temporarily remove modifications to working directory
  • git bisect: binary search history (e.g. for regressions)
  • .gitignore: specify intentionally untracked files to ignore

Miscellaneous

  • GUI clients, shell and editor integration
  • Different workflows (e.g., GitFlow, pull requests)
  • GitHub and other Git hosting providers

Resources