Introduction to VCS

Understanding how to effectively use Version Control Systems (VCSs) is crucial for any developer or team working on software projects.

VCS helps manage changes to source code and other collections of files over time.

In this lecture, we will dive into the use and mechanics of VCSs, with a particular emphasis on Git, which has become the de facto standard for version control in the industry.


What is Version Control?

Version control is a system that records changes to files over time so that you can recall specific versions later.

# Also a version control system, but not a perfect one
echo "Hello, World!" > hello.txt
echo "Hello, World 2!" > hello2.txt

We need version control systems to track modifications in source code or any file collection to manage and maintain a history of your work.

Version control systems facilitate collaboration by allowing multiple people to work on the same project simultaneously without overwriting each other’s contributions.

They use snapshots to encapsulate the state of files and folders at specific points in time, making it easy to revert to previous versions if necessary.


Why Use Version Control?

Using version control offers many benefits:

  • Track Changes: Keep a detailed history of modifications, making it easy to identify when and why changes were made.
  • Understand History: Review the evolution of your project to understand the reasoning behind certain decisions.
  • Enable Collaboration: Multiple team members can work on the same codebase concurrently without conflicts.
  • Answer Critical Questions: Determine who made specific changes, what changes were made, and when they were implemented, which is essential for debugging and accountability.

Git’s Reputation

While Git is a powerful tool, it is often perceived as complex and challenging to learn.

This complexity can be problematic for newcomers.

However, by focusing on understanding the core concepts and data models rather than just memorizing commands, you can gain proficiency more effectively.

As shown in the XKCD comic on Git, many developers have struggled with Git, but with the right approach, you can master it.

xkcd 1597


Git’s Data Model


Snapshots in Git

Git views the history of your project as a series of snapshots. Each snapshot captures the state of your files and directories at a particular moment. These snapshots are composed of trees and blobs:

  • Trees: Represent directories and the hierarchy of files.
  • Blobs: Store the contents of individual files.

For example, a snapshot might look like:

<root> (tree)
|
+- foo (tree)
|  |
|  + bar.txt (blob, contents = "hello world")
|
+- baz.txt (blob, contents = "git is wonderful")

Modeling History

Git represents the history of your project as a Directed Acyclic Graph (DAG) of snapshots.

Each commit in Git is a node in this graph and points to its parent commit(s):

  • Commits: Snapshots of your project at specific points in time.
  • Parents: References to previous commits, forming the history.

Commits can have multiple parents, which occurs during a merge when two branches are combined.

This DAG structure enables Git to model complex development histories, including branching and merging.


Commit History Visualization

In Git, each commit points back to its parent(s), creating a chain of commits.

This chain can branch and merge, representing different lines of development. A simplified visualization might look like:

o <-- o <-- o <-- o  (main branch)
                ^
                 \
                  --- o <-- o  (feature branch)
  • Main Branch: The primary line of development.
  • Feature Branch: A separate line of work that can later be merged back into the main branch.

Immutable Commits

Commits in Git are immutable, meaning they cannot be changed once created.

If you need to make changes, Git creates a new commit with the modifications.

References, such as branch names, are updated to point to the new commit.


Data Model as Pseudocode

Conceptual representation of Git’s data model

type blob = array<byte>;
 
type tree = map<string, tree | blob>;
 
type commit = struct {
    parents: array<commit>;
    author: string;
    message: string;
    snapshot: tree;
}
  • Blob: Stores the content of a file.
  • Tree: Represents a directory, mapping filenames to blobs or other trees.
  • Commit: Contains metadata (author, message), references to parent commits, and a snapshot of the project state (a tree).

Write Yourself a Git if you want to explore Git’s internals further and write yourself a simple version of Git.


Objects and Content-Addressing

Git stores all data as objects in a content-addressable manner using SHA-1 hashes:

  • SHA-1 Hashes: Unique identifiers generated from the content of the object.
  • Objects: Include blobs, trees, and commits.

Because objects are identified by their content hash, identical files or snapshots are stored only once, saving space and ensuring consistency.


References

References in Git are human-readable names that point to specific commits (SHA-1 hashes):

  • Branches: Named references (e.g., master, develop) that point to the latest commit in a line of development.
  • Tags: Fixed pointers to specific commits, often used for marking release points.

References are mutable, meaning they can be updated to point to different commits as the project evolves.


Repositories

A Git repository is a collection of all your project’s data, including:

  • Objects: Blobs, trees, and commits stored by their content hashes.
  • References: Branches, tags, and other pointers to commits.

The repository contains the complete history of your project.


Staging Area

The staging area (also known as the index) is a crucial part of Git’s workflow:

  • Purpose: It allows you to select specific changes you want to include in your next commit.
  • Benefits:
    • Organize your commits logically by grouping related changes.
    • Review changes before committing.
    • Control over what goes into each commit.

By adding changes to the staging area, you can create clean and meaningful commits, which improve the project’s history and collaboration.


Git CLI Basics

Here are some fundamental Git commands to get you started:

  • Getting Help:
    • git help <command>: Displays help information for a specific Git command.
  • Initializing a Repository:
    • git init: Creates a new Git repository in your current directory. Git data is stored in the .git directory.
  • Checking Status:
    • git status: Shows the status of your working directory and staging area, including untracked files and changes to be committed.
  • Adding Files:
    • git add <filename>: Adds a file to the staging area.
  • Committing Changes:
    • git commit: Creates a new commit with the changes in the staging area.
  • Viewing History:
    • git log: Displays a list of commits in your repository.
    • git log --all --graph --decorate: Provides a visual representation of the commit history as a DAG.
  • Showing Differences:
    • git diff <filename>: Shows changes in the working directory relative to the staging area.
    • git diff <revision> <filename>: Shows differences between a specific revision and the current state of a file.
  • Switching Branches or Revisions:
    • git checkout <revision>: Updates the files in your working directory to match the specified revision or branch.

Writing Good Commit Messages

Writing clear and informative commit messages is essential for maintaining a readable project history. Good commit messages should:

  • Summarize Changes: Provide a concise summary of what the commit does.
  • Explain Why: Offer context about why the changes were made.
  • Follow Conventions: Use a consistent style and format.

For more guidance, refer to:


Branching and Merging

Branching allows you to diverge from the main line of development to work on features or fixes independently.

  • Listing Branches:
    • git branch: Lists all branches in the repository.
  • Creating a Branch:
    • git branch <name>: Creates a new branch with the specified name.
  • Switching Branches:
    • git checkout <name>: Switches to the specified branch.
  • Creating and Switching Branches:
    • git checkout -b <name>: Creates a new branch and switches to it immediately (equivalent to git branch <name> followed by git checkout <name>).
  • Merging Branches:
    • git merge <revision>: Merges the specified branch or commit into the current branch.
  • Resolving Conflicts:
    • git mergetool: Launches a tool to help resolve merge conflicts interactively.
  • Rebasing:
    • git rebase: Moves or combines a sequence of commits to a new base commit, useful for maintaining a linear project history.

Remotes

Remotes are versions of your repository hosted on the internet or network.

  • Listing Remotes:
    • git remote: Lists the names of all remotes.
  • Adding a Remote:
    • git remote add <name> <url>: Adds a new remote repository with the specified name and URL.
  • Pushing Changes:
    • git push <remote> <local branch>:<remote branch>: Sends commits from your local branch to the remote branch.
  • Setting Upstream Branches:
    • git branch --set-upstream-to=<remote>/<remote branch>: Sets the tracking relationship between your local branch and a remote branch.
  • Fetching Changes:
    • git fetch: Retrieves commits, files, and references from a remote repository.
  • Pulling Changes:
    • git pull: Combines git fetch and git merge, fetching changes from the remote and merging them into your current branch.
  • Cloning Repositories:
    • git clone <url>: Creates a local copy of a remote repository.

Undo

Git provides commands to undo changes at various stages:

  • Amending Commits:
    • git commit --amend: Modifies the most recent commit with new changes or an updated commit message.
  • Unstaging Files:
    • git reset HEAD <file>: Removes the specified file from the staging area without changing the working directory.
  • Discarding Changes:
    • git checkout -- <file>: Reverts the file in your working directory to match the last commit, discarding any uncommitted changes.

Advanced Git


Advanced Commands and Features

  • Customization:
    • git config: Adjust Git’s settings and preferences. Git is highly customizable, allowing you to configure aliases, default behaviors, and more. See the Git Configuration documentation for details.
  • Shallow Clones:
    • git clone --depth=1: Performs a shallow clone, downloading only the most recent history. Useful for reducing clone time and disk space.
  • Interactive Staging:
    • git add -p: Allows you to interactively review and stage changes hunk by hunk.
  • Interactive Rebasing:
    • git rebase -i: Offers an interactive interface to edit, reorder, squash, or delete commits during a rebase.
  • Annotating Changes:
    • git blame: Shows who last modified each line of a file, useful for tracking down the origin of changes.
  • Stashing Changes:
    • git stash: Temporarily saves changes in your working directory so you can work on something else and reapply them later.
  • Bisecting:
    • git bisect: Performs a binary search through your commit history to find the commit that introduced a bug or regression.
  • Ignoring Files:
    • .gitignore: A file where you specify patterns for files or directories that Git should ignore. Refer to the Git Ignore documentation for syntax and usage.

Miscellaneous

  • GUI Clients:
    • There are graphical user interfaces available for Git, which can simplify complex tasks and visualize history. I personally use Fork and IntelliJ IDEA’s built-in Git integration.
  • Shell and Editor Integration:
    • Many shells and text editors offer Git integration, providing features like syntax highlighting, auto-completion, and inline diffs.
  • Workflows:
    • Different teams may use various Git workflows, such as GitFlow, feature branching, or pull requests, to manage their development process.
  • Hosting Providers:
    • Platforms like GitHub, GitLab, and Bitbucket offer Git hosting with additional features like issue tracking, continuous integration, and code reviews.

Resources

To further enhance your understanding of Git, consider exploring the following resources: