Reference answer
RAID 0, the most basic kind of RAID, is called striping. If you have 3 disks, data gets written to disk 1, disk 2, disk 3, then disk 1, disk 2, disk 3 again, and so on. The advantage is capacity and speed. With 3 drives of 2 TB each in a RAID 0 array, you have 6 TB in total, and throughput can approach 3 times that of a single drive because all 3 drives are working in parallel: write, write, write, then write, write, write again. The disadvantage is that it has zero redundancy. If 1 of the 3 drives in your RAID array fails, you lose everything, because your data is spread across all the drives. RAID 0 gives you great speed and capacity but has no redundancy.

RAID 1 is called mirroring. If you have a 10 TB drive in your computer or server and a second 10 TB drive, everything written to one drive is also written to the other in real time. The advantage is that you always have an identical, ready-to-use copy of your data. If one of the drives is lost, the other drive still has all the data. The disadvantage is that you don't get any increase in capacity. If you have two 10 TB drives, you only have 10 TB of usable capacity, because the second drive holds the mirror copy. You also don't get any write speed improvement, because every write has to land on both drives and each drive is limited to its own speed. Some implementations can speed up reads by serving them from either drive, but writes gain nothing. RAID 1 gives you great redundancy and availability, but no extra capacity and no faster writes.

RAID 5, one of the most common forms of RAID in enterprise environments, is called striping with parity. AWS typically doesn't recommend it for EBS volumes, but much of the enterprise world is running it. My guess, and it is only a guess, is that AWS runs something like it on the internal storage arrays behind the EBS volumes and S3 storage it sells us.
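Before going further into RAID 5, the two layouts covered so far can be sketched in a few lines of Python. This is only an illustration of where chunks of data land; the function names `stripe` and `mirror` are made up for this example, and a real RAID controller works on fixed-size blocks rather than Python lists.

```python
def stripe(chunks, num_disks):
    """Place chunks round-robin across the disks, the way RAID 0 does."""
    disks = [[] for _ in range(num_disks)]
    for i, chunk in enumerate(chunks):
        disks[i % num_disks].append(chunk)
    return disks


def mirror(chunks, num_disks=2):
    """Write every chunk to every disk, the way RAID 1 does."""
    return [list(chunks) for _ in range(num_disks)]


chunks = ["A", "B", "C", "D", "E", "F"]

striped = stripe(chunks, 3)
mirrored = mirror(chunks)

print(striped)   # each disk holds only a third of the data
print(mirrored)  # each disk holds all of the data
```

In the striped layout no single disk holds a complete copy, which is why losing one drive loses everything. In the mirrored layout either disk alone is enough.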
With RAID 5 on 3 disks, data gets written across all 3 disks, and so does what's called parity data (recovery information). Let's say that in the first stripe disk 1 gets data, disk 2 gets data, and disk 3 gets parity. In the next stripe disk 1 gets parity, disk 2 gets data, and disk 3 gets data, and so on. In effect you give up one disk's worth of capacity for recovery, although the parity is rotated across all the drives rather than kept on one dedicated disk. With 4 drives in a RAID 5 array you get the capacity of 3. With 6 drives you get the capacity of 5. The advantage is that RAID 5 generally has very good throughput, especially for reads, and it provides good redundancy: the array survives the loss of any one drive, though not two at once. If a drive fails, you remove the bad drive, pop a new one in, and ask the array to rebuild the missing data from the data and parity on the other drives. The disadvantage is that writes can be slower, because every write also has to update parity, and that adds latency. RAID 5 gives you a good blend of speed, capacity, and redundancy.

RAID 10 combines mirroring and striping. If you need more performance and lower latency than you can get with RAID 5, there is another option: a combination of RAID 1 and RAID 0, called RAID 10. RAID 0 is super-fast because writes are spread from drive to drive to drive. RAID 1 is ideal for redundancy: whatever is on one drive is copied to another. If you put 4 drives in a RAID 0 array, you get 4 times the capacity and up to 4 times the speed of a single drive. If you made another 4-drive RAID 0 array, you'd have the same speed and capacity again. If you mirrored the first array to the second, you'd effectively have one array's worth of capacity, with the other array providing redundancy. Strictly speaking, mirroring two striped arrays like this is usually called RAID 0+1, while RAID 10 stripes across mirrored pairs. The usable capacity is the same, but RAID 10 tolerates more combinations of drive failures.
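The capacity figures quoted so far, including the mirrored pair of 4-drive stripes just described, can be checked with a short calculation. The sketch below is a rough illustration that assumes all drives in an array are the same size; the function name `usable_capacity_tb` and the minimum drive counts it enforces are choices made for this example, not part of any particular RAID tool.

```python
def usable_capacity_tb(level, num_drives, drive_tb):
    """Return usable capacity in TB for an array of identical drives."""
    if level == "0":
        # Striping: every drive contributes its full capacity.
        return num_drives * drive_tb
    if level == "1":
        # Mirroring: every drive holds the same data.
        if num_drives < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        return drive_tb
    if level == "5":
        # Striping with parity: one drive's worth goes to parity.
        if num_drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (num_drives - 1) * drive_tb
    if level == "10":
        # Mirroring plus striping: half the drives hold copies.
        if num_drives < 4 or num_drives % 2 != 0:
            raise ValueError("RAID 10 needs an even number of drives, 4 or more")
        return (num_drives // 2) * drive_tb
    raise ValueError(f"unsupported RAID level: {level}")


print(usable_capacity_tb("0", 3, 2))    # three 2 TB drives, striped
print(usable_capacity_tb("1", 2, 10))   # two 10 TB drives, mirrored
print(usable_capacity_tb("5", 4, 2))    # four 2 TB drives, one drive's worth of parity
print(usable_capacity_tb("10", 8, 2))   # eight 2 TB drives, half used for mirroring
```

For the RAID 10 case, 8 drives of 2 TB give 8 TB usable out of 16 TB raw, which is the same capacity as a single 4-drive RAID 0 array.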
The advantage of RAID 10 is high performance combined with redundancy, without the parity write overhead of RAID 5. The disadvantage is that it requires double the number of disks for the same usable capacity, so it gets very expensive very quickly. RAID 10 gives you the speed of RAID 0, but with redundancy.
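As a closing illustration, the parity-based rebuild described for RAID 5 can be sketched in Python. RAID 5 parity is typically computed with XOR, which is what makes it possible to recover any one missing block from the others. This is a minimal sketch for a single stripe across 3 drives; the helper name `xor_blocks` and the sample data are invented for the example, and a real array works on fixed-size blocks and rotates the parity position from stripe to stripe.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a 3-drive array: two data blocks and one parity block.
data_1 = b"RAID"
data_2 = b"five"
parity = xor_blocks([data_1, data_2])

# Suppose the drive holding data_2 fails. XOR-ing the surviving
# data block with the parity block reproduces the lost block.
rebuilt = xor_blocks([data_1, parity])

print(rebuilt == data_2)  # True
```

The same trick works whichever single block is lost, including the parity block itself, but it cannot recover two missing blocks from one stripe. That is why RAID 5 survives one drive failure and not two.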