RAID 5?

Baba O'Riley · Oct 8, 2005


Hi all,

I'm currently working on my A+ and I'm trying to get my head around RAID 5. I thought I understood it, but the Meyers book doesn't really explain it in depth and it's raised more questions than answers. If you take a look at this diagram, you can see that the top stripe (P) on drive C is providing parity for stripes 1A and 1B. But stripes 1A and 1B combined are twice the size (in terms of storage) of stripe P on disk C.
As far as I can see this is the case for the Parity stripes on the other two drives as well. So how is this possible? Does the RAID array use some kind of compression?

I hope someone can explain it as the websites I've looked at don't. Failing that, it's a call to the NITLC tutors on Monday.

Cheers,

Baba.

ffreeloader · Oct 8, 2005


Raid 5 is pretty cool.

In the example you're talking about, the parity stripe doesn't hold exact copies. It holds parity information calculated from the two data stripes, from which it is possible to restore the data if one of the other two disks is lost.

The math is above my head, but it basically works by being able to figure out what's missing by looking at what is left.

Edit here...

Note that the parity stripe doesn't exist on just one disk. The parity stripe is written across all disks. That's why it's called striping. Thus if one disk is missing only 1/3 of the data can be missing and it's not all in one huge chunk.
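To make "figure out what's missing by looking at what is left" concrete, here is a minimal sketch of one stripe on a 3-disk array. It assumes the parity is a plain bitwise XOR of the data blocks, which is how RAID 5 parity is normally calculated; the block contents and names (`block_1a`, `block_1b`, `parity_p`, `xor_blocks`) are made up for illustration.

```python
# Minimal sketch of RAID 5 style parity for one stripe on a 3-disk array.
# Assumes XOR parity; block contents and names are invented for illustration.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))

block_1a = b"HELLO RAID"                    # data block on disk A
block_1b = b"FIVE WORLD"                    # data block on disk B
parity_p = xor_blocks(block_1a, block_1b)   # parity block on disk C

# The parity block is the same size as ONE data block, not both combined.
assert len(parity_p) == len(block_1a)

# Disk A fails: rebuild its block from the surviving data block and the parity.
rebuilt_1a = xor_blocks(block_1b, parity_p)
assert rebuilt_1a == block_1a

# Disk B fails instead: same operation with the other survivor.
rebuilt_1b = xor_blocks(block_1a, parity_p)
assert rebuilt_1b == block_1b
```

Nothing is compressed here. The rebuild only works because one of the two data blocks is still readable alongside the parity block.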

Baba O'Riley · Oct 8, 2005

Thanks ffreeloader,

I understand that the parity is distributed across all drives. To quote Meyers, "One disk's worth of storage is used for parity", i.e., if you have an array of three 200GB drives, 200GB is used for parity and you have 400GB of storage. Therein lies my confusion. How can 200GB of disk space provide parity for 400GB? They must be some smart algorithms to reproduce fully 50% of all missing data in the event of a drive failure.
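The capacity arithmetic in the Meyers quote can be written out as a rough sketch. The function name `raid5_capacity` is hypothetical, and it assumes every disk in the array is the same size.

```python
def raid5_capacity(num_disks, disk_size_gb):
    """Return (usable_gb, parity_gb) for an array of equal-sized disks.

    One disk's worth of space goes to parity, whatever the array size.
    """
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    parity_gb = disk_size_gb
    usable_gb = (num_disks - 1) * disk_size_gb
    return usable_gb, parity_gb

three_disks = raid5_capacity(3, 200)   # (400, 200)
five_disks = raid5_capacity(5, 200)    # (800, 200)
```

The parity share shrinks as disks are added, but a rebuild never relies on the parity alone: it also reads every surviving data disk.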

ffreeloader · Oct 8, 2005

They must be some smart algorithms to reproduce fully 50% of all missing data in the event of a drive failure.

Well, it's not actually 50%. It's more like a third, because, remember, that no more than 1/3 of the data is written on any one drive, however, that being said, the math is way over my head. WAY OVER!!

You have to get past the idea that 1/2 the data is written on one drive, 1/2 on another drive, and all the parity data written on a third drive. The stripes are written so that the parity stripe and the two data stripes are written equally across all drives.

ffreeloader · Oct 8, 2005

Baba,

Try this link. I think it will give you a much better visualization as to how data is written in a Raid 5 configuration, although it uses a 4 disk array rather than a 3 disk array. The one you have is very simplistic at best.

The_Geek · Oct 8, 2005

Maybe a real world visual will help. This is from one of my test machines:

Baba O'Riley · Oct 8, 2005

OK guys. Geek, I don't know what I'm supposed to infer from your diagram but it's very pretty. ffreeloader, again, I know that the parity data is distributed across the drives.

Well, it's not actually 50%. It's more like a third, because, remember, that no more than 1/3 of the data is written on any one drive, however, that being said, the math is way over my head

Let me explain what I meant a bit more in depth. According to Meyers, in a RAID 5 array, one drive's WORTH of storage is used for parity, and that seems to be borne out by ffreeloader's linked diagram as well. That's one drive's worth, not one drive. So in an array of three 200GB disks, each drive has 133GB dedicated to data storage and 66GB dedicated to parity. Now assume one drive fails: you've lost 133GB of data which is, in theory, duplicated across the other two. The other two drives have between them 133GB of parity data, all well and good. Except that 133GB of space is supposedly duplicating the data on two drives, i.e., 266GB of data. Therefore the algorithms need to reconstruct 133GB of data from 66GB of data stored on the remaining two hard drives. That, my friends, is 50% of the lost data.

Now I can just, only just, believe that an algorithm can reconstruct data to that extent, but what about an array with five 200GB disks? 200GB of parity and 800GB of data. So in the event of a drive failure an algorithm would have to reconstruct 160GB of data from 40GB worth of parity data. That's 75% of the lost data. If an algorithm was that efficient, why isn't it used to compress data, e.g. in Winzip? Particularly media files like mp3s, which are notoriously hard to compress as they are already compressed to the limit.

Nothing I've seen so far adequately explains this, not even ffreeloader's link, which I'd already seen before my original post.

ffreeloader · Oct 8, 2005

Baba,

Raid doesn't compress data. It reconstructs the missing data from what remains of the original data. Big difference.

Read this link and see if it makes sense to you. The algorithm actually takes the saved portions of the striped data and uses it to rebuild the lost data.

Raid doesn't compress data. Get away from the idea that Raid is built on data compression. It has nothing to do with data compression.
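The five-disk case raised earlier in the thread can be sketched the same way. This is only an illustration, assuming XOR parity and invented block contents; `xor_all` is a hypothetical helper. The point it shows is that the lost block is rebuilt from all the surviving data blocks plus the parity block, not from the parity block alone.

```python
from functools import reduce

def xor_all(blocks):
    """XOR any number of equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# One stripe on a hypothetical 5-disk array: 4 data blocks plus 1 parity block.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_all(data)

# Lose one data block (index 2 here), then rebuild it from the 3 surviving
# data blocks AND the parity block: four blocks in, one block out.
lost_index = 2
survivors = [blk for i, blk in enumerate(data) if i != lost_index]
rebuilt = xor_all(survivors + [parity])
assert rebuilt == data[lost_index]
```

So the rebuild reads four blocks to produce one. That is the opposite of compression, and it is why the same trick cannot be used to save space on a single drive.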

Baba O'Riley · Oct 9, 2005

Ok, choice of words was wrong there. Replace compression with "space saving". If the algorithms can rebuild that much data, why not use them more widely, i.e. to "save space" on hard drives etc?

Baba O'Riley · Oct 9, 2005

ffreeloader, that link looks like it's what I was looking for. I haven't read it too in depth as it's late/early and my brain isn't operating at 100%. Looks interesting though, cheers.

ffreeloader · Oct 9, 2005

Ok, choice of words was wrong there. Replace compression with "space saving". If the algorithms can rebuild that much data, why not use them more widely, i.e. to "save space" on hard drives etc?

I would imagine it's performance, but I've never really thought about it or studied into it. Maybe that's something for you to research. It's not something that lights a fire under me.

Baba O'Riley · Oct 9, 2005

Freddy, just read that link you posted. The page on parity explains it all very neatly and simply, thanks. Just out of interest, there's no reason that all the parity data can't be stored on one drive: if it fails, all the original data is still intact, and if one of the other drives fails, all the parity is still intact. So why is it common practice to stripe the parity data as well? Is it just easier to program a RAID controller to stripe everything rather than just some data?

Edit: I suppose because it can read/write the parity data faster by striping it. Obvious really!

Jellyman_4eva · Oct 10, 2005

It is common practice to stripe the parity data for speed, and also to avoid the excessive usage a single parity disk would get if the parity were not striped.

This is because every time a piece of data is written to a hard disk, parity data is also written to disk.

For example, if you have a RAID array with 128K clusters and you write a lot of sub-128K files, only one disk is used to store the data for each file. In our case, with 3 drives, this will be either disk 1 or disk 2. (Note that this happens all the time anyway, because files will not always split across both disks equally because of their sizes. I was just using the 128K example because it's easier to understand.)

Disk 3 however will always be used to store the parity information. So, as you can see, disk 3 will always be used whereas disks 1 and 2 each have a 50% chance of being used, and thus, on average, disk 3 with the parity has a much greater chance of failure because of its higher usage.
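The wear argument above can be checked with a rough simulation. This is a simplified sketch, not a model of any real controller: it assumes each small write touches exactly one data block plus that stripe's parity block, and the rotation rule used for the striped-parity case is just one possible layout. The function name `count_writes` is hypothetical.

```python
def count_writes(num_disks, num_small_writes, rotate_parity):
    """Count block writes per disk when each small write touches one data
    block plus the parity block of its stripe."""
    writes = [0] * num_disks
    data_per_stripe = num_disks - 1
    for n in range(num_small_writes):
        stripe, slot = divmod(n, data_per_stripe)
        if rotate_parity:
            # Striped parity: the parity block moves to a different disk
            # on each successive stripe.
            parity_disk = (num_disks - 1 - stripe) % num_disks
        else:
            # Dedicated parity: always the last disk.
            parity_disk = num_disks - 1
        data_disks = [d for d in range(num_disks) if d != parity_disk]
        writes[data_disks[slot]] += 1
        writes[parity_disk] += 1
    return writes

dedicated = count_writes(3, 600, rotate_parity=False)   # [300, 300, 600]
rotated = count_writes(3, 600, rotate_parity=True)      # [400, 400, 400]
```

With a dedicated parity disk, the third disk takes twice as many writes as either data disk. With the parity rotated, the same 1200 block writes are spread evenly across all three.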

d-Faktor · Oct 10, 2005

Jellyman_4eva said:

Disk 3 however will always be used to store the parity information ... [snip]

sure you're not talking about a raid3 config? (or maybe i should read the thread more carefully)

Jellyman_4eva · Oct 10, 2005

Hi,

Yes, OK, I was being a little vague in my description. My example was showing RAID3, as that is data striping and not parity striping. I was using this to emphasise why we now have RAID5!!

Veteran's son · Oct 12, 2005

Excellent information in this thread!
