Jan 11, 2021 · 5 min read

RAID

RAID stands for redundant array of independent disks. It is a technology used to combine multiple secondary storage devices into a single array for increased performance, data redundancy, or both. Depending on the RAID level used, it lets the system survive one or more drive failures.

Data redundancy, although it takes up extra space, improves reliability: if a disk fails and the same data is also stored on another disk, we can retrieve the data and continue operating. On the other hand, if data is simply spread across multiple disks without RAID, the loss of a single disk can affect the entire data set.

A RAID system consists of an array of disks connected together to achieve these different goals.

Key evaluation points for a RAID System

Reliability: How many disk faults can the system tolerate?
Availability: What fraction of the total session time is the system up, i.e., how available is the system for actual use?
Performance: How good is the response time? How high is the throughput (rate of processing work)? Note that performance depends on many more parameters than just these two.
Capacity: Given a set of N disks each with B blocks, how much useful capacity is available to the user?
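Under the simple model in the Capacity question (N disks, B blocks each, no spare disks or metadata), the useful capacity of the standard levels can be sketched as a small function. The name `usable_blocks` is hypothetical, and RAID 2 is left out because its number of check disks depends on the word size.

```python
def usable_blocks(level: int, n: int, b: int) -> int:
    """Return the user-visible capacity, in blocks, of an array of n disks
    with b blocks each (simplified model: no spares, no metadata)."""
    if level == 0:
        return n * b              # striping only: every block holds data
    if level == 1:
        return (n // 2) * b       # mirrored pairs: half the disks hold copies
    if level in (3, 4, 5):
        return (n - 1) * b        # one disk's worth of blocks holds parity
    if level == 6:
        return (n - 2) * b        # two disks' worth of blocks hold parity
    raise ValueError(f"RAID {level} is not covered by this sketch")


for level in (0, 1, 4, 5, 6):
    print(f"RAID {level}: {usable_blocks(level, 4, 1000)} of {4 * 1000} blocks usable")
```

With four 1000-block disks, this gives 4000 blocks for RAID 0, 2000 for RAID 1, 3000 for RAID 4 and 5, and 2000 for RAID 6.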

There are seven standard RAID levels: RAID 0, RAID 1, ..., RAID 6.

These levels share the following characteristics:

The array is a set of physical disk drives.
The operating system views these separate disks as a single logical disk.
Data is distributed across the physical drives of the array.
Redundant disk capacity is used to store parity information (RAID 0 has no redundancy, and RAID 1 uses mirroring instead of parity).
In case of disk failure, the parity information can be used to recover the data.

Standard RAID levels

RAID 0

RAID level 0 provides data striping, i.e., data is split into blocks that are placed across multiple disks. Because it relies on striping alone, if one disk fails, all data in the array is lost.
This level doesn't provide fault tolerance but increases system performance.

Example:

In this figure, blocks 0, 1, 2, and 3 form a stripe.

Instead of placing just one block on a disk at a time, we can place two or more consecutive blocks (a larger chunk) on a disk before moving on to the next one.
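The block-to-disk mapping for striping can be sketched as follows. The function name `locate` and the parameter `chunk_size` are illustrative assumptions: `chunk_size` is the number of consecutive blocks placed on one disk before moving to the next, so `chunk_size=1` gives the one-block-at-a-time layout.

```python
def locate(logical_block: int, num_disks: int, chunk_size: int = 1) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)
    for RAID 0 striping with `chunk_size` blocks per chunk."""
    chunk = logical_block // chunk_size        # which chunk the block falls in
    within = logical_block % chunk_size        # position inside that chunk
    disk = chunk % num_disks                   # chunks go round-robin over the disks
    stripe = chunk // num_disks                # stripe (row) number
    return disk, stripe * chunk_size + within  # offset on the chosen disk


# 4 disks, one block per chunk: blocks 0-3 form the first stripe.
print([locate(b, 4) for b in range(8)])
# 4 disks, two blocks per chunk: blocks 0 and 1 both land on disk 0.
print([locate(b, 4, chunk_size=2) for b in range(8)])
```

Larger chunks keep small sequential reads on one disk, while smaller chunks spread a single large request over more disks; the best value depends on the workload.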

In the figure above, there is no duplication of data, so a block, once lost, cannot be recovered.

Pros of RAID 0:

Throughput increases because multiple data requests are likely to go to different disks and can be served in parallel.
This level fully utilizes the disk space and provides high performance.
It requires a minimum of 2 drives.

Cons of RAID 0:

It has no error detection mechanism.
RAID 0 is not a true RAID because it is not fault-tolerant.
The failure of any disk results in complete data loss in the array.

RAID 1

This level is called data mirroring because it copies the data from drive 1 to drive 2. It provides 100% redundancy in case of a failure.

Example:

Only half of the total drive space is used to store data; the other half is just a mirror of the already stored data.
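A toy model of mirroring: every write goes to both drives, and a read can be served by whichever copy survives. The `MirroredPair` class and its in-memory "drives" are hypothetical, for illustration only.

```python
class MirroredPair:
    """Toy RAID 1: two in-memory 'drives' that always hold identical blocks."""

    def __init__(self, num_blocks: int) -> None:
        self.drives = [[b""] * num_blocks, [b""] * num_blocks]
        self.failed = [False, False]

    def write(self, block: int, data: bytes) -> None:
        # Every write is duplicated onto each surviving drive.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                drive[block] = data

    def read(self, block: int) -> bytes:
        # Any surviving copy can serve the read.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                return drive[block]
        raise OSError("both drives have failed: data lost")

    def fail(self, i: int) -> None:
        self.failed[i] = True


pair = MirroredPair(num_blocks=8)
pair.write(3, b"hello")
pair.fail(0)            # drive 0 dies
print(pair.read(3))     # still readable from drive 1
```

Because both drives hold every block, half of the raw capacity is spent on copies, which is the cost noted in the cons below.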

Pros of RAID 1:

The main advantage of RAID 1 is fault tolerance: if one disk fails, the other automatically takes over.
The array continues to function even if one of the drives fails.

Cons of RAID 1:

One extra drive is required for every data drive to hold the mirror, so the cost is higher.

RAID 2

RAID 2 uses bit-level striping with Hamming-code parity. Each data bit in a word is recorded on a separate disk, and the ECC (Hamming) code of the data words is stored on a separate set of disks.
Due to its high cost and complex structure, this level is not used commercially. The same performance can be achieved by RAID 3 at a lower cost.
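To illustrate the kind of code RAID 2 relies on, here is a sketch of the classic Hamming(7,4) code, where 4 data bits (one per data disk) are protected by 3 check bits (one per ECC disk). The 4 + 3 disk layout and the function names are assumptions for illustration; a real RAID 2 array could use a different word size.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(c: list[int]) -> tuple[list[int], int]:
    """Fix up to one flipped bit; return (4 data bits, 1-based error position or 0)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4   # the syndrome is the position of the bad bit
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], pos


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                      # corrupt the bit stored on one "disk"
print(hamming74_correct(word))    # ([1, 0, 1, 1], 5)
```

Three check disks for four data disks is a large overhead, which is one reason the text notes that RAID 3 achieves similar results more cheaply.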

Pros of RAID 2:

This level uses designated drives to store the Hamming-code parity.
It uses the Hamming code for error detection (and correction of single-bit errors).

Cons of RAID 2:

It requires additional drives for error detection.

RAID 3

RAID 3 consists of byte-level striping with dedicated parity. In this level, the parity information is stored for each disk section and written to a dedicated parity drive.
In case of drive failure, the parity drive is accessed, and data is reconstructed from the remaining devices. Once the failed drive is replaced, the missing data can be restored on the new drive.
Data can be transferred in bulk, so high-speed data transfer is possible.

Pros of RAID 3:

Lost data can be regenerated using the parity drive.
It provides high data transfer rates.
In this level, data is accessed in parallel.

Cons of RAID 3:

It requires an additional drive for parity.
It performs poorly when operating on small files.

RAID 4

RAID 4 consists of block-level striping with a parity disk. Instead of duplicating data, RAID 4 adopts a parity-based approach.
Because of the way parity works, this level can recover from at most one disk failure. If more than one disk fails, there is no way to recover the data.
Both level 3 and level 4 require at least three disks.

In this figure, we can observe one disk dedicated to parity.

In this level, parity is calculated using the XOR function. If the data bits are 0, 0, 0, 1, the parity bit is XOR(0, 0, 0, 1) = 1. If the data bits are 0, 0, 1, 1, the parity bit is XOR(0, 0, 1, 1) = 0. That is, an even number of ones results in parity 0, and an odd number of ones results in parity 1.
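The same XOR rule extends from single bits to whole blocks: the parity block is the byte-wise XOR of the data blocks in a stripe. A minimal sketch, with the helper name `xor_parity` assumed for illustration:

```python
from functools import reduce


def xor_parity(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks, i.e. the parity block of a stripe."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of the same size")
    # zip(*blocks) walks the blocks column by column (one byte position at a time).
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# The bit examples from the text, one bit per "disk":
print(xor_parity([b"\x00", b"\x00", b"\x00", b"\x01"]))  # b'\x01' (odd number of ones)
print(xor_parity([b"\x00", b"\x00", b"\x01", b"\x01"]))  # b'\x00' (even number of ones)
```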

Suppose that in the above figure, C2 is lost due to a disk failure. Using the values in all the other columns and the parity bit, we can recompute the data bit stored in C2. This is how this level recovers lost data.
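Recovery works because XOR is its own inverse: XOR-ing the parity block with all surviving data blocks yields the missing one. A self-contained sketch, assuming equally sized blocks and hypothetical helpers `xor_blocks` and `rebuild`:

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    """Recompute the single missing data block of a stripe."""
    return xor_blocks(surviving + [parity])


stripe = [b"C0", b"C1", b"C2", b"C3"]    # four data "columns"
parity = xor_blocks(stripe)
lost = stripe.pop(2)                     # the disk holding C2 fails
print(rebuild(stripe, parity) == lost)   # True
```

The same routine cannot handle two missing blocks: one parity equation has only one unknown it can solve for, which is why RAID 4 tolerates a single failure.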

RAID 5

RAID 5 is a slight modification of the RAID 4 system. The only difference is that in RAID 5, the parity rotates among the drives.
It consists of block-level striping with distributed parity.
As with RAID 4, this level can recover from at most one disk failure. If more than one disk fails, there is no way to recover the data.

This figure shows how the parity block rotates.
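One simple way to rotate the parity block is to start it on the last disk and move it one disk to the left on each stripe. Real implementations offer several layouts, so the specific rotation below and the `raid5_layout` name are assumptions for illustration.

```python
def raid5_layout(stripe: int, num_disks: int) -> list[str]:
    """Return the role of each disk in a stripe: 'P' for parity, 'D<k>' for data block k.

    Assumed rotation: parity starts on the last disk and moves one disk
    to the left on every stripe; data blocks fill the remaining disks in order.
    """
    parity_disk = (num_disks - 1 - stripe) % num_disks
    roles, k = [], 0
    for disk in range(num_disks):
        if disk == parity_disk:
            roles.append("P")
        else:
            roles.append(f"D{stripe * (num_disks - 1) + k}")
            k += 1
    return roles


for s in range(4):
    print(raid5_layout(s, 4))
```

For 4 disks, the parity block lands on disks 3, 2, 1, 0 for stripes 0 to 3, so no single disk holds all the parity.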

This level was introduced to improve random write performance: in RAID 4, every write must also update the single parity disk, which becomes a bottleneck.
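A small (random) write in RAID 4 or RAID 5 does not need to read the whole stripe: the new parity can be computed from the old data, the new data, and the old parity alone (new parity = old parity XOR old data XOR new data). A minimal sketch of that read-modify-write rule, with hypothetical helper names:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same size")
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write parity update for a single-block write:
    P_new = P_old XOR D_old XOR D_new."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# Check against recomputing parity from the whole stripe.
stripe = [b"\x01", b"\x02", b"\x04"]
parity = xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])
new_parity = updated_parity(parity, stripe[1], b"\x0a")
stripe[1] = b"\x0a"
full = xor_bytes(xor_bytes(stripe[0], stripe[1]), stripe[2])
print(new_parity == full)   # True
```

Each such write still touches a parity block; with rotating parity, writes to different stripes usually touch parity on different disks, so they can proceed in parallel.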

Pros of RAID 5:

This level is cost effective and provides high performance.
In this level, parity is distributed across the disks in an array.
It improves random write performance.

Cons of RAID 5:

Recovering from a disk failure takes longer, because the lost data has to be recomputed from all the remaining drives.
This level cannot survive concurrent drive failures.

RAID 6

This level is an extension of RAID 5. It uses block-level striping with two parity blocks per stripe.
With RAID 6, the array can survive two concurrent disk failures. Suppose you are using RAID 5 or RAID 1: when a disk fails, you need to replace it, because if another disk fails at the same time, you won't be able to recover the data. This is where RAID 6 plays its part, since it can survive two concurrent disk failures before you run out of options.
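RAID 6 can survive two failures because each stripe carries two independent parity blocks, usually called P and Q. One widely used construction keeps P as the plain XOR parity and computes Q as a weighted sum over the finite field GF(2^8). The sketch below follows that idea, assuming the generator 2 and the reducing polynomial 0x11d; it works on single bytes (one byte per data disk) to stay short, covers only the case where two data disks fail, and all function names are hypothetical.

```python
from functools import reduce
from typing import Optional

POLY = 0x11D  # assumed reducing polynomial x^8 + x^4 + x^3 + x^2 + 1


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8): carry-less multiply, reduced by POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    # Every nonzero a satisfies a^255 = 1, so a^254 is its inverse.
    return gf_pow(a, 254)


def pq_parity(data: list[int]) -> tuple[int, int]:
    """P = XOR of all data bytes; Q = XOR of g^i * D_i with generator g = 2."""
    p = reduce(lambda x, y: x ^ y, data, 0)
    q = 0
    for i, d in enumerate(data):
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q


def recover_two(data: list[Optional[int]], p: int, q: int) -> list[int]:
    """Rebuild two missing data bytes (marked None) from P and Q."""
    x, y = [i for i, d in enumerate(data) if d is None]
    pxy, qxy = p, q
    for i, d in enumerate(data):
        if d is not None:
            pxy ^= d                          # leaves D_x ^ D_y
            qxy ^= gf_mul(gf_pow(2, i), d)    # leaves g^x*D_x ^ g^y*D_y
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    # Solve the two equations: D_x = (Qxy ^ g^y*Pxy) / (g^x ^ g^y), D_y = Pxy ^ D_x.
    dx = gf_mul(qxy ^ gf_mul(gy, pxy), gf_inv(gx ^ gy))
    dy = pxy ^ dx
    rebuilt = list(data)
    rebuilt[x], rebuilt[y] = dx, dy
    return rebuilt


disks = [0x12, 0x34, 0x56, 0x78, 0x9A]
p, q = pq_parity(disks)
damaged = [0x12, None, 0x56, None, 0x9A]    # disks 1 and 3 fail together
print(recover_two(damaged, p, q) == disks)  # True
```

P alone gives one equation (D_x XOR D_y); Q supplies a second, independent equation, so the two unknowns can be solved for. That second equation is what single-parity RAID 5 lacks.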

Pros of RAID 6:

It can survive two concurrent disk failures.

RAID 10

This level combines RAID 0 and RAID 1: it uses RAID 0 to stripe data and RAID 1 to mirror it. In this level, striping is performed before mirroring.
The number of drives must be a multiple of 2.
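The striped-mirror layout described above can be sketched as a mapping from a logical block to a mirrored pair of disks. This assumes that disks 2k and 2k + 1 form mirrored pair k and that the chunk size is one block; both the pairing and the `raid10_locate` name are illustrative assumptions.

```python
def raid10_locate(logical_block: int, num_disks: int) -> tuple[tuple[int, int], int]:
    """Map a logical block to ((primary disk, mirror disk), offset).

    Assumes an even number of disks, paired as (0, 1), (2, 3), ..., with
    blocks striped round-robin across the pairs, one block per chunk.
    """
    if num_disks < 4 or num_disks % 2:
        raise ValueError("this sketch expects an even number of disks, at least 4")
    num_pairs = num_disks // 2
    pair = logical_block % num_pairs        # striping step: choose a mirrored pair
    offset = logical_block // num_pairs     # stripe row on that pair
    return (2 * pair, 2 * pair + 1), offset  # mirroring step: same offset on both disks


for b in range(4):
    print(b, raid10_locate(b, 4))
```

Each block is written to both disks of its pair, so the array survives one failure per pair, at the cost of half the raw capacity.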