Understanding Version Control Systems: A Comprehensive Guide

Explore the world of version control systems (VCS). Learn about their key functions, benefits, and the different types available, including centralized and distributed VCS. Discover how VCS helps developers collaborate efficiently and maintain a complete history of their work.



Version Control System

A Version Control System (VCS) is software that helps developers collaborate and maintain a complete history of their work. Its key functions are:

  • Allows developers to work simultaneously.
  • Prevents overwriting each other's changes.
  • Maintains a history of every version.

Types of Version Control Systems

  • Centralized Version Control System (CVCS)
  • Distributed/Decentralized Version Control System (DVCS)

This section focuses on distributed version control systems, and on Git in particular.

Distributed Version Control System

A Centralized Version Control System (CVCS) uses a central server to store all files and enable collaboration. Its main drawback is that the server is a single point of failure: if it goes down, nobody can collaborate until it is back. In the worst case, if the server's disk is corrupted and no proper backup exists, the entire project history is lost.

Distributed Version Control Systems (DVCS) address this by fully mirroring the repository on every client. If the server goes down, any client's copy can be used to restore it. Git, being a DVCS, does not rely on a central server for everyday work, so many operations can be performed offline, such as:

  • Commit changes
  • Create branches
  • View logs

A network connection is only needed to publish changes or fetch the latest updates.

Advantages of Git

Free and Open Source

Git is released under the GNU General Public License version 2 (GPLv2), an open-source license, and can be used free of charge. You can also modify its source code to meet your needs.

Fast and Small

Most Git operations are performed locally, which makes them fast. Git's core is written in C, which avoids the runtime overhead associated with higher-level languages. Although each client mirrors the entire repository, Git compresses the stored data, so the repository stays small on the client side.

Implicit Backup

With multiple copies of the repository, data loss is rare. Each client serves as a backup, useful in case of crashes or disk corruption.

Security

Git uses the SHA-1 cryptographic hash function to name and identify the objects in its database. Every file and commit is checksummed and looked up by that checksum, so stored content cannot change without Git detecting it.

No Need for Powerful Hardware

In a CVCS, the central server must handle the entire team’s requests, which can become a bottleneck. In contrast, DVCS operations happen on the client side, so the server hardware can be simpler.

Easier Branching

In a CVCS, creating, deleting, or merging branches can be time-consuming. In Git, a branch is just a lightweight pointer to a commit, so branches can be created, merged, and deleted quickly.

DVCS Terminologies

Local Repository

In Git, every developer has a private copy of the entire repository, in which they can add, remove, and rename files and commit changes.

Working Directory and Staging Area (Index)

The working directory is where files are checked out. Git does not automatically include every modified file in a commit: only changes that have been added to the staging area (also called the index) are committed. The basic Git workflow is:

  1. Modify a file in the working directory.
  2. Add the file to the staging area.
  3. Commit, which records the staged changes in the repository.

Example: Git Workflow

# First commit: stage sort.c, then record it
git add sort.c
git commit -m "Added sort operation"

# Second commit: stage search.c, then record it
git add search.c
git commit -m "Added search operation"
Result (commit messages, oldest first)

First commit: Added sort operation
Second commit: Added search operation
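
The same workflow can be tried end to end in a throwaway repository. The sketch below is only an illustration: the temporary directory, the file contents, and the identity Tom / tom@example.com are assumptions made for the demo. It creates two files but stages only one, which shows that the unstaged file stays out of the commit.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository (assumed location)
cd "$repo"
git init -q
git config user.name "Tom"               # placeholder identity for the demo
git config user.email "tom@example.com"

# Step 1: create two files in the working directory.
echo "sort placeholder" > sort.c
echo "search placeholder" > search.c

# Step 2: add only sort.c to the staging area.
git add sort.c

# Step 3: commit; only the staged file is recorded.
git commit -q -m "Added sort operation"

committed=$(git ls-tree --name-only HEAD)   # files contained in the new commit
pending=$(git status --porcelain)           # what is still uncommitted
echo "committed: $committed"
echo "pending:   $pending"
```

Here search.c is still reported as pending (untracked) after the commit, because it was never added to the staging area.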

Blobs

Blob stands for Binary Large Object. Each version of a file is stored as a blob that holds the file's data but no metadata about it, not even the file name. A blob is named by the SHA-1 hash of its content (prefixed with a short header giving the object type and size).
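
To see how a blob gets its name, the following sketch asks Git for the hash of some sample content and then recomputes it by hand. The sample content is an assumption, and the sketch assumes the default SHA-1 object format and that the sha1sum utility is available (on some systems a different tool, such as shasum, provides it).

```shell
set -e
content='hello'                          # sample file content (an assumption)

# Ask Git for the blob name of this content; nothing is written to disk.
git_id=$(printf '%s\n' "$content" | git hash-object --stdin)

# Recompute it by hand: SHA-1 over the header "blob <size>\0" plus the content.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')
manual_id=$( { printf 'blob %s\0' "$size"; printf '%s\n' "$content"; } \
             | sha1sum | cut -d ' ' -f 1 )

echo "git:    $git_id"
echo "manual: $manual_id"
```

Both lines should show the same hash. Since the file name plays no part in the computation, two files with identical content share one blob.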

Trees

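A tree object represents a directory. It lists the blobs (files) and other trees (subdirectories) that the directory contains, recording each entry's name and SHA-1 hash. A tree is itself named by its SHA-1 hash.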
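
A small repository makes the relationship between trees and blobs visible. In this sketch the directory layout (README and src/sort.c), the temporary directory, and the identity are assumptions chosen for illustration.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository (assumed location)
cd "$repo"
git init -q
git config user.name "Tom"               # placeholder identity for the demo
git config user.email "tom@example.com"

mkdir src
echo "readme" > README
echo "sort"   > src/sort.c
git add .
git commit -q -m "Initial layout"

# The root tree lists one blob (README) and one subtree (src).
git ls-tree HEAD

readme_type=$(git cat-file -t HEAD:README)      # object type behind README
src_type=$(git cat-file -t HEAD:src)            # object type behind src
src_files=$(git ls-tree --name-only HEAD:src)   # entries inside the src tree
echo "README is a $readme_type, src is a $src_type containing $src_files"
```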

Commits

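A commit object records the state of the repository at the moment of the commit and is named by its SHA-1 hash. Each commit points to its parent commit (a merge commit has more than one parent, and the first commit has none), so following the parent pointers walks back through the history.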
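
The parent pointer can be inspected directly. The sketch below makes two commits in a throwaway repository (the directory, file contents, and identity are assumptions) and checks that the second commit names the first as its parent.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository (assumed location)
cd "$repo"
git init -q
git config user.name "Tom"               # placeholder identity for the demo
git config user.email "tom@example.com"

echo "sort" > sort.c
git add sort.c
git commit -q -m "Added sort operation"
first=$(git rev-parse HEAD)              # hash of the first commit

echo "search" > search.c
git add search.c
git commit -q -m "Added search operation"

# Print the raw commit object: tree, parent, author, committer, message.
git cat-file -p HEAD

parent=$(git rev-parse HEAD^)            # parent recorded in the second commit
count=$(git rev-list --count HEAD)       # commits reachable from HEAD
echo "parent of HEAD: $parent"
```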

Branches

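A branch is a separate line of development. By default, Git names the initial branch master (the name can be changed with the init.defaultBranch setting, and many hosting services use main instead). A feature is usually developed on its own branch; once it is complete, the branch is merged into master and can then be deleted.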
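
The feature-branch cycle described here might look as follows. This is a minimal sketch: the branch name feature, the file names, the temporary directory, and the identity are all illustrative assumptions, and the initial branch is pinned to master so the example does not depend on local configuration.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository (assumed location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # pin the initial branch name to master
git config user.name "Tom"               # placeholder identity for the demo
git config user.email "tom@example.com"

echo "base" > main.c
git add main.c
git commit -q -m "Initial commit"

git checkout -q -b feature               # start a separate line of development
echo "sort" > sort.c
git add sort.c
git commit -q -m "Added sort operation"

git checkout -q master                   # return to the main line
git merge -q feature                     # bring the feature into master
git branch -q -d feature                 # the merged branch can now be deleted

branches=$(git for-each-ref --format='%(refname:short)' refs/heads)
echo "remaining branches: $branches"
```

After the merge, sort.c is present on master and only the master branch remains.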

Tags

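Tags assign a meaningful name to a specific version of the repository. A tag is similar to a branch, but it does not move when new commits are made: it keeps pointing at the same commit unless it is explicitly recreated. Tags are therefore often used for product releases.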
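
The difference between a tag and a branch shows up as soon as another commit is made. In this sketch the tag name v1.0, the file contents, the temporary directory, and the identity are assumptions for illustration.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository (assumed location)
cd "$repo"
git init -q
git config user.name "Tom"               # placeholder identity for the demo
git config user.email "tom@example.com"

echo "sort" > sort.c
git add sort.c
git commit -q -m "Added sort operation"

git tag -a v1.0 -m "Release 1.0"         # annotated tag on the current commit
release=$(git rev-parse 'v1.0^{commit}') # commit the tag points to right now

echo "search" > search.c
git add search.c
git commit -q -m "Added search operation"

tip=$(git rev-parse HEAD)                # the branch has moved forward
tagged=$(git rev-parse 'v1.0^{commit}')  # the tag still names the release commit
echo "branch tip: $tip"
echo "v1.0:       $tagged"
```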

Clone

Cloning creates a local instance of a repository. A clone not only checks out the working copy but also mirrors the entire repository, including its history. Networking is only involved when the repositories are synchronized.

Pull

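The pull operation copies changes from a remote repository into the local one and integrates them into the current branch (in Git, a fetch followed by a merge by default). It is used to synchronize two repositories and is similar to the update operation in Subversion.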

Push

The push operation copies commits from the local repository to a remote one, where they are stored permanently and become available to other developers.
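
Clone, push, and pull can be exercised without a network by letting a local bare repository stand in for the server. Everything named below is an assumption made for the sketch: the temporary paths, the second developer Jerry, the identity, and the choice to pin the branch name to master.

```shell
set -e
work=$(mktemp -d)                        # scratch area (assumed location)

# A bare repository plays the role of the shared server.
git init -q --bare "$work/project.git"
git --git-dir="$work/project.git" symbolic-ref HEAD refs/heads/master

# Tom clones the still-empty repository, commits locally, and pushes.
git clone -q "$work/project.git" "$work/tom" 2>/dev/null
cd "$work/tom"
git symbolic-ref HEAD refs/heads/master  # pin the branch name for the demo
git config user.name "Tom"               # placeholder identity
git config user.email "tom@example.com"
echo "sort" > sort.c
git add sort.c
git commit -q -m "Added sort operation"
git push -q origin master

# Jerry clones afterwards and receives the full history so far.
git clone -q "$work/project.git" "$work/jerry"

# Tom publishes a second commit.
echo "search" > search.c
git add search.c
git commit -q -m "Added search operation"
git push -q origin master

# Jerry pulls it into his local repository.
cd "$work/jerry"
git pull -q --ff-only origin master

tom_head=$(git -C "$work/tom" rev-parse HEAD)
jerry_head=$(git -C "$work/jerry" rev-parse HEAD)
echo "tom:   $tom_head"
echo "jerry: $jerry_head"
```

With a real server, only the repository URL changes; the commit steps in between still run locally.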

HEAD

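HEAD is a pointer to the commit that is currently checked out. Normally it refers to the current branch, which in turn points to the latest commit on that branch, so HEAD advances with each new commit. Branch heads are stored under the .git/refs/heads/ directory (or in .git/packed-refs once Git has packed them).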

Example: View HEAD

# View the master branch HEAD
ls -1 .git/refs/heads/
cat .git/refs/heads/master
Output

master
570837e7d58fa4bccd86cb575d884502188b0c49
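
The same information is available through Git commands, which also work when the branch heads have been packed and the file under .git/refs/heads/ is absent. The sketch below uses a throwaway repository; the directory, file contents, and identity are assumptions.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository (assumed location)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # pin the initial branch name to master
git config user.name "Tom"               # placeholder identity for the demo
git config user.email "tom@example.com"

echo "sort" > sort.c
git add sort.c
git commit -q -m "Added sort operation"

current=$(git symbolic-ref HEAD)               # the branch HEAD refers to
before=$(git rev-parse HEAD)                   # the commit HEAD resolves to
branch_tip=$(git rev-parse refs/heads/master)  # the commit the branch points to

echo "search" > search.c
git add search.c
git commit -q -m "Added search operation"
after=$(git rev-parse HEAD)                    # HEAD after the second commit

echo "HEAD -> $current"
echo "before: $before"
echo "after:  $after"
```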

Revision

A revision represents a specific version of the source code, identified by its commit hash.

URL

The URL gives the location of the remote Git repository and is stored in the repository's .git/config file.

Example: View URL

# View the repository URL
pwd
cat .git/config
Output

/home/tom/tom_repo
url = gituser@git.server.com:project.git
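
Rather than reading the whole config file, the URL can be queried by key. This sketch registers the URL from the example above as a remote named origin in a throwaway repository (the remote name and the temporary directory are assumptions); no connection to the server is made.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository (assumed location)
cd "$repo"
git init -q

# Record the remote location; this only writes to .git/config.
git remote add origin gituser@git.server.com:project.git

# Read back just the URL instead of printing the whole file.
url=$(git config --get remote.origin.url)
echo "origin URL: $url"
```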