DBMS Storage Systems: Understanding Data Persistence

Explore the different storage systems used in database management systems (DBMS). Learn about the distinction between primary storage (volatile memory) and secondary storage (persistent storage), and understand how data is physically stored and accessed on these devices.




In a database, data is stored in files made up of records. At the physical level, those files are held as bits on storage devices, encoded electromagnetically or electronically. Storage devices fall into three broad categories:

Memory Types

Primary Storage

Primary storage is memory that the CPU can access directly: the CPU's internal registers, cache memory, and main memory (RAM). Registers and cache sit on or next to the processor, and main memory is attached to the motherboard. Primary storage is very fast but volatile: it needs a continuous power supply to retain its contents, so everything in it is lost when power fails.

Secondary Storage

Secondary storage devices hold data persistently, for future use or as backup. The CPU cannot address them directly, so data must first be copied into primary storage before it can be processed. Examples include magnetic disks (hard disks), optical disks (e.g., CD, DVD), flash drives, and magnetic tapes.

Tertiary Storage

Tertiary storage is used to hold very large volumes of data. These devices sit outside the computer system and form the slowest tier. Optical disks and magnetic tapes are commonly used as tertiary storage, typically for backups of an entire system.

Memory Hierarchy

The memory hierarchy exists to bridge the speed gap between the CPU and main memory. The CPU can access both its internal registers and main memory directly, but main memory is far slower than the registers. Cache memory sits between the two to reduce this mismatch: it is faster than main memory and holds the data the CPU accesses most frequently.

Cost per byte rises with speed, so the fastest levels of the hierarchy are also the smallest. Larger storage devices are slower and cheaper per byte, and they hold far more data than CPU registers or cache memory.
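To see why a small cache helps so much, the following minimal sketch computes an average access time from a cache hit ratio. It assumes a simplified model in which a hit costs only the cache latency and a miss costs only the main-memory latency. The function name and the latency figures are hypothetical, chosen only for illustration.

```python
def effective_access_time(hit_ratio, cache_ns, memory_ns):
    """Average access time under a simplified two-level model.

    Assumption: a hit costs cache_ns and a miss costs memory_ns.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * memory_ns


# Hypothetical latencies: 1 ns for cache, 100 ns for main memory.
for hit_ratio in (0.50, 0.90, 0.99):
    average = effective_access_time(hit_ratio, cache_ns=1.0, memory_ns=100.0)
    print(f"hit ratio {hit_ratio:.2f}: about {average:.2f} ns per access")
```

Under these assumed numbers, the average access time drops sharply as the hit ratio approaches 1, which is the effect the cache is there to provide.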

Magnetic Disks

Hard disk drives (HDDs) are common secondary storage devices. They are called magnetic disks because they store data by magnetizing spots on metal platters coated with a magnetizable material. The platters are stacked on a spindle, and a read/write head for each surface moves across it to magnetize or demagnetize individual spots. Each spot represents one bit of binary data (0 or 1).

The surface of each disk is organized into concentric circles called tracks, and each track is divided into sectors. A sector traditionally stores 512 bytes of data, although many newer drives use 4,096-byte sectors.
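The track and sector layout makes capacity a simple product. The sketch below assumes an idealized disk with the same number of sectors on every track (real drives place more sectors on outer tracks). The function name and the geometry values are hypothetical.

```python
def disk_capacity_bytes(surfaces, tracks_per_surface, sectors_per_track,
                        bytes_per_sector=512):
    """Capacity of an idealized disk with uniform tracks."""
    return surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector


# Hypothetical geometry: 8 recording surfaces, 1,000 tracks per surface,
# 64 sectors per track, 512 bytes per sector.
capacity = disk_capacity_bytes(surfaces=8, tracks_per_surface=1000,
                               sectors_per_track=64)
print(capacity)  # 262144000
```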

Redundant Array of Independent Disks (RAID)

RAID is a technology that combines multiple secondary storage devices and presents them as a single logical storage unit. The different RAID levels define different ways of using the disk array, each aimed at a particular balance of performance and redundancy.

RAID Levels

RAID 0

RAID 0 uses a striped array of disks. Data is split into blocks and the blocks are distributed across the disks, which improves throughput because the disks can read and write in parallel. RAID 0 stores neither parity nor duplicate copies, so it provides no redundancy: the failure of any one disk loses the data of the whole array.
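Striping can be sketched as a round-robin mapping from a logical block number to a disk and an offset on that disk. This is a minimal illustration that assumes a stripe unit of one block; `raid0_locate` is a hypothetical helper, not part of any RAID tool.

```python
from typing import List, Tuple


def raid0_locate(logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk)."""
    if num_disks < 1 or logical_block < 0:
        raise ValueError("need at least one disk and a non-negative block")
    return logical_block % num_disks, logical_block // num_disks


# Six consecutive logical blocks spread over three disks.
layout: List[Tuple[int, int]] = [raid0_locate(b, 3) for b in range(6)]
print(layout)  # [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```

Consecutive blocks land on different disks, which is what allows them to be read or written in parallel.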

RAID 1

RAID 1, or mirroring, involves duplicating data across multiple disks. Each disk in the array receives a copy of the data, providing 100% redundancy in case of failure.
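Mirroring can be sketched with an in-memory model in which every write goes to all working disks and a read is served by any surviving copy. The `Mirror` class below is a hypothetical toy, with Python lists standing in for disks.

```python
class Mirror:
    """Toy RAID 1 array: every working disk holds a full copy of the data."""

    def __init__(self, num_disks, num_blocks):
        self.disks = [[None] * num_blocks for _ in range(num_disks)]
        self.failed = set()

    def write(self, block, data):
        # Write the same data to every disk that is still working.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block] = data

    def fail(self, disk_index):
        # Simulate a disk failure.
        self.failed.add(disk_index)

    def read(self, block):
        # Serve the read from the first surviving copy.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                return disk[block]
        raise IOError("all mirrored disks have failed")


array = Mirror(num_disks=2, num_blocks=4)
array.write(0, b"record-1")
array.fail(0)
print(array.read(0))  # b'record-1'
```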

RAID 2

RAID 2 protects data with an error-correcting code (ECC) based on Hamming codes. Data is striped at the bit level: the bits of each data word are recorded on separate disks, and the ECC bits for that word are stored on a further set of disks. Because of its complex structure and high cost, RAID 2 is rarely used in practice.
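The idea of spreading data bits and ECC bits over separate disks can be sketched with the Hamming(7,4) code, where four data bits and three check bits would occupy seven disks. The choice of Hamming(7,4) is an assumption made for illustration, since the text does not name a specific code, and the function names are hypothetical.

```python
def hamming74_encode(d1, d2, d3, d4):
    """Return 7 bits [p1, p2, d1, p4, d2, d3, d4], one per (imagined) disk."""
    p1 = d1 ^ d2 ^ d4  # covers code positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers code positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers code positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(code):
    """Fix at most one flipped bit and return the four data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # 1-based position of the bad bit, 0 if none
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
stored = hamming74_encode(*word)
stored[4] ^= 1  # simulate one disk returning a wrong bit
recovered = hamming74_correct(stored)
print(recovered == word)  # True
```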

RAID 3

RAID 3 stripes data across multiple disks at the byte level and stores the parity computed over each stripe on a dedicated parity disk. This lets the array recover from the failure of any single disk.
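Single-disk recovery from parity relies on XOR: the parity is the XOR of the data on all data disks, so XOR-ing the surviving disks with the parity reproduces the missing one. The sketch below uses short byte strings as stand-ins for disk contents, and `xor_blocks` is a hypothetical helper.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Contents of three data disks (made-up sample data).
data_disks = [b"ABCD", b"EFGH", b"IJKL"]
parity_disk = xor_blocks(data_disks)

# Simulate losing the second data disk, then rebuild it from the rest.
survivors = [data_disks[0], data_disks[2], parity_disk]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_disks[1])  # True
```

The same XOR relationship underlies the parity used by the block-level schemes described next.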

RAID 4

RAID 4 writes entire blocks of data onto disks and stores parity on a separate disk. It uses block-level striping, unlike RAID 3, which uses byte-level striping. A minimum of three disks is required for RAID 4.

RAID 5

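RAID 5 stripes data blocks across the disks as RAID 4 does, but distributes the parity blocks across all disks instead of keeping them on a dedicated disk. This provides redundancy against a single disk failure, improves read performance, and avoids the write bottleneck of a single parity disk. A minimum of three disks is required for RAID 5.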
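Distributed parity can be pictured by printing which disk holds the parity block in each stripe. The rotation used below, in which parity moves one disk to the left with every stripe, is only one possible scheme and is an assumption of this sketch; real implementations offer several layouts. The function names are hypothetical.

```python
def raid5_parity_disk(stripe, num_disks):
    """Disk index holding parity for a stripe (assumed rotation scheme)."""
    return (num_disks - 1 - stripe) % num_disks


def raid5_stripe_layout(stripe, num_disks):
    """Labels for one stripe: 'P' for parity, 'D<n>' for data block n."""
    parity = raid5_parity_disk(stripe, num_disks)
    next_block = stripe * (num_disks - 1)
    layout = []
    for disk in range(num_disks):
        if disk == parity:
            layout.append("P")
        else:
            layout.append(f"D{next_block}")
            next_block += 1
    return layout


for stripe in range(4):
    print(raid5_stripe_layout(stripe, num_disks=4))
# ['D0', 'D1', 'D2', 'P']
# ['D3', 'D4', 'P', 'D5']
# ['D6', 'P', 'D7', 'D8']
# ['P', 'D9', 'D10', 'D11']
```

Because the parity block lands on a different disk in each stripe, no single disk has to absorb every parity update.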

RAID 6

RAID 6 extends RAID 5 by computing two independent parity blocks for each stripe and distributing both across the disks. The array can therefore survive the failure of any two disks. A minimum of four disks is required for RAID 6.

Choosing a RAID level is a trade-off among reliability, performance, and usable capacity, and the right level depends on what the system needs.
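One way to compare the levels is by how much of the raw capacity remains usable. The sketch below assumes equally sized disks and follows directly from the descriptions above: mirroring keeps one copy's worth of space, a single parity scheme gives up one disk, and dual parity gives up two. RAID 2 is left out because its overhead depends on the code chosen. The names and the minimum disk counts for levels 0, 1, and 3 are conventional values supplied for illustration.

```python
# Minimum number of disks per level (RAID 2 omitted).
MIN_DISKS = {0: 2, 1: 2, 3: 3, 4: 3, 5: 3, 6: 4}


def usable_disks(level, num_disks):
    """Number of disks' worth of space left for data at a given RAID level."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == 0:
        return num_disks      # striping only, no redundancy
    if level == 1:
        return 1              # every disk holds the same copy
    if level == 6:
        return num_disks - 2  # two parity blocks per stripe
    return num_disks - 1      # levels 3, 4, 5: one disk's worth of parity


def usable_capacity(level, num_disks, disk_size):
    """Usable capacity in the same unit as disk_size."""
    return usable_disks(level, num_disks) * disk_size


# Four hypothetical disks of 1,000 GB each.
for level in (0, 1, 5, 6):
    print(f"RAID {level}: {usable_capacity(level, 4, 1000)} GB usable")
```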