RAID 5?

Discussion in 'Hardware' started by Baba O'Riley, Oct 8, 2005.

  1. Baba O'Riley

    Baba O'Riley Gigabyte Poster

    1,760
    23
    99
    Hi all,

    I'm currently working on my A+ and I'm trying to get my head around RAID 5. I thought I understood it, but the Meyers book doesn't really explain it in depth and it's raised more questions than answers. If you take a look at this diagram, you can see that the top stripe (P) on drive C is providing parity for stripes 1A and 1B. But stripes 1A and 1B combined are twice the size (in terms of storage) of stripe P on disk C.
    As far as I can see this is the case for the Parity stripes on the other two drives as well. So how is this possible? Does the RAID array use some kind of compression?

    I hope someone can explain it as the websites I've looked at don't. Failing that, it's a call to the NITLC tutors on Monday.

    Cheers,

    Baba.
     
    Certifications: A+, Network+
    WIP: 70-270
  2. ffreeloader

    ffreeloader Terabyte Poster

    3,661
    106
    167
    Raid 5 is pretty cool.

    In the example you're talking about, the parity stripe doesn't hold exact copies of the data. It holds parity information calculated from the data stripes, from which it is possible to restore the data if one of the other two disks is lost.

    The math is above my head, but it basically works by being able to figure out what's missing by looking at what is left.

    Edit here...

    Note that the parity stripe doesn't exist on just one disk. The parity stripe is written across all disks. That's why it's called striping. Thus if one disk is missing only 1/3 of the data can be missing and it's not all in one huge chunk.
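
    To make "figuring out what's missing by looking at what is left" concrete, here is a minimal sketch. It assumes the parity is a byte-wise XOR of the data blocks, which is the usual textbook description of RAID 5; the thread itself never names the operation, and the block contents and variable names below are made up for illustration.

```python
# Minimal sketch of single-parity (XOR) protection for one stripe on a
# hypothetical three-disk array. Block contents and names are illustrative.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

block_1a = b"HELLO RAID"   # data block on disk A
block_1b = b"FIVE ARRAY"   # data block on disk B

# The parity block on disk C is the same size as ONE data block.
parity = xor_blocks(block_1a, block_1b)

# Simulate losing disk B: rebuild its block from the two blocks that are left.
rebuilt_1b = xor_blocks(block_1a, parity)
print(rebuilt_1b)  # b'FIVE ARRAY'
```

    Note that the rebuild uses the surviving data block as well as the parity block. The parity on its own is not enough to recover anything.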
     
    Certifications: MCSE, MCDBA, CCNA, A+
    WIP: LPIC 1
  3. Baba O'Riley

    Baba O'Riley Gigabyte Poster

    1,760
    23
    99
    Thanks ffreeloader,

    I understand that the parity is distributed across all drives. To quote Meyers, "One disk's worth of storage is used for parity", i.e., if you have an array of three 200GB drives, 200GB is used for parity and you have 400GB of storage. Therein lies my confusion. How can 200GB of disk space provide parity for 400GB? There must be some smart algorithms to reproduce fully 50% of all missing data in the event of a drive failure.
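
    The arithmetic behind the Meyers quote can be sketched as follows. The function name is hypothetical, and it assumes equal-sized drives with exactly one drive's worth of parity, as the quote states.

```python
# Sketch of the "one disk's worth of storage is used for parity" rule.
# raid5_capacity is a hypothetical helper; it assumes equal-sized drives.

def raid5_capacity(num_disks: int, disk_gb: int) -> tuple:
    """Return (usable_gb, parity_gb) for a single-parity array."""
    if num_disks < 3:
        raise ValueError("a RAID 5 array needs at least three disks")
    parity_gb = disk_gb                      # one disk's worth, spread over all disks
    usable_gb = (num_disks - 1) * disk_gb    # everything else holds data
    return usable_gb, parity_gb

print(raid5_capacity(3, 200))  # (400, 200)
print(raid5_capacity(5, 200))  # (800, 200)
```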
     
    Certifications: A+, Network+
    WIP: 70-270
  4. ffreeloader

    ffreeloader Terabyte Poster

    3,661
    106
    167
    Well, it's not actually 50%. It's more like a third because, remember, no more than 1/3 of the data is written on any one drive. That being said, the math is way over my head. WAY OVER!!

    You have to get past the idea that 1/2 the data is written on one drive, 1/2 on another drive, and all the parity data written on a third drive. The stripes are written so that the parity stripe and the two data stripes are written equally across all drives.
     
    Certifications: MCSE, MCDBA, CCNA, A+
    WIP: LPIC 1
  5. ffreeloader

    ffreeloader Terabyte Poster

    3,661
    106
    167
    Baba,

    Try this link. I think it will give you a much better visualization as to how data is written in a Raid 5 configuration, although it uses a 4 disk array rather than a 3 disk array. The one you have is very simplistic at best.
     
    Certifications: MCSE, MCDBA, CCNA, A+
    WIP: LPIC 1
  6. The_Geek

    The_Geek Megabyte Poster

    772
    13
    64
    Maybe a real world visual will help. This is from one of my test machines:

    [​IMG]

    [​IMG]
     
    Certifications: CompTIA and Micro$oft
    WIP: PDI+
  7. Baba O'Riley

    Baba O'Riley Gigabyte Poster

    1,760
    23
    99
    OK guys. Geek, I don't know what I'm supposed to infer from your diagram but it's very pretty. freeloader, again, I know that the parity data is distributed across the drives.

    Let me explain what I meant a bit more in depth. According to Meyers, in a RAID 5 array, one drive's WORTH of storage is used for parity, and that seems to be borne out by freeloader's linked diagram as well. That's one drive's worth, not one drive. So in an array of three 200GB disks, each drive has 133GB dedicated to data storage and 66GB dedicated to parity. Now assume one drive fails. You've lost 133GB of data which is, in theory, duplicated across the other two. The other two drives have between them 133GB of parity data, all well and good. Except that 133GB of space is supposedly duplicating the data on two drives, i.e., 266GB of data. Therefore the algorithms need to reconstruct 133GB of data from 66GB of data stored on the remaining two hard drives. That, my friends, is 50% of the lost data.

    Now I can just, only just, believe that an algorithm can reconstruct data to that extent, but what about an array with five 200GB disks? 200GB of parity and 800GB of data. So in the event of a drive failure, an algorithm would have to reconstruct 160GB of data from 40GB worth of parity data. That's 75% of the lost data. If an algorithm was that efficient, why isn't it used to compress data, e.g. in Winzip? Particularly media files like mp3s, which are notoriously hard to compress as they are already compressed to the limit.

    Nothing I've seen so far adequately explains this, not even freeloader's link, which I'd already seen before my original post.
     
    Certifications: A+, Network+
    WIP: 70-270
  8. ffreeloader

    ffreeloader Terabyte Poster

    3,661
    106
    167
    Baba,

    Raid doesn't compress data. It reconstructs the missing data from what remains of the original data. Big difference.

    Read this link and see if it makes sense to you. The algorithm actually takes the saved portions of the striped data and uses it to rebuild the lost data.

    Raid doesn't compress data. Get away from the idea that Raid is built on data compression. It has nothing to do with data compression.
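
    A rough sketch of that point for a five-disk stripe, again assuming XOR parity (the usual textbook scheme, not something confirmed in this thread). The block size and the helper name `xor_all` are made up. The data is random, so it is effectively incompressible, yet the lost block still comes back, because the rebuild reads all four surviving blocks and not just the parity.

```python
# Sketch: rebuilding one lost block in a five-disk stripe, assuming XOR parity.
# The lost block is recovered from ALL surviving blocks, not from parity alone.
import os
from functools import reduce

def xor_all(blocks):
    """Byte-wise XOR across any number of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

BLOCK_SIZE = 16                                      # illustrative; real stripes are far larger
data = [os.urandom(BLOCK_SIZE) for _ in range(4)]    # four data blocks of random bytes
parity = xor_all(data)                               # one parity block, same size as one data block
stripe = data + [parity]                             # five blocks, one per disk

failed = 2                                           # pretend the third disk died
survivors = [blk for i, blk in enumerate(stripe) if i != failed]
rebuilt = xor_all(survivors)                         # XOR of three data blocks + parity

print(rebuilt == stripe[failed])                     # True
print(len(survivors), "surviving blocks were read to rebuild 1 lost block")
```

    Nothing is being squeezed into less space here. Every byte of the lost block is recomputed from four bytes that are still on disk.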
     
    Certifications: MCSE, MCDBA, CCNA, A+
    WIP: LPIC 1
  9. Baba O'Riley

    Baba O'Riley Gigabyte Poster

    1,760
    23
    99
    OK, choice of words was wrong there. Replace compression with "space saving". If the algorithms can rebuild that much data, why not use them more widely, i.e. to "save space" on hard drives etc.?
     
    Certifications: A+, Network+
    WIP: 70-270
  10. Baba O'Riley

    Baba O'Riley Gigabyte Poster

    1,760
    23
    99
    ffreeloader, that link looks like it's what I was looking for. I haven't read it too in depth as it's late/early and my brain isn't operating at 100%. Looks interesting though, cheers.
     
    Certifications: A+, Network+
    WIP: 70-270
  11. ffreeloader

    ffreeloader Terabyte Poster

    3,661
    106
    167
    I would imagine it's performance, but I've never really thought about it or studied into it. Maybe that's something for you to research. It's not something that lights a fire under me.
     
    Certifications: MCSE, MCDBA, CCNA, A+
    WIP: LPIC 1
  12. Baba O'Riley

    Baba O'Riley Gigabyte Poster

    1,760
    23
    99
    Freddy, just read that link you posted. The page on parity explains it all very neatly and simply, thanks. Just out of interest, there's no reason that all the parity data can't be stored on one drive: if it fails, all the original data is still intact, and if one of the other drives fails, all the parity is still intact. So why is it common practice to stripe the parity data as well? Is it just easier to program a RAID controller to stripe everything rather than just some data?

    Edit: I suppose because it can read/write the parity data faster by striping it. Obvious really!
     
    Certifications: A+, Network+
    WIP: 70-270
  13. Jellyman_4eva

    Jellyman_4eva Byte Poster

    213
    4
    34
    It is common practice to stripe the parity data for speed, and also to avoid the excessive usage the parity disk would see if you did not stripe it.

    This is because every time a piece of data is written to a hard disk, parity data is also written to disk.

    If, for example, you have a RAID array with 128K clusters and you write a lot of sub-128K files, only one disk is used to store the data for each file. In our case, with 3 drives, that will be either disk 1 or disk 2. (Note that this happens all the time anyway, because files will not always split across both disks equally because of their sizes. I was just using the 128K example because it's easier to understand.)

    Disk 3, however, will always be used to store the parity information. So, as you can see, disk 3 will always be used, whereas disks 1 and 2 each have a 50% chance of being used. On average, therefore, disk 3 with the parity has a much greater chance of failure because of its higher usage.
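
    A toy simulation of that wear argument, as a sketch only. It assumes every small write touches exactly one data block plus the parity block for its stripe, that writes alternate evenly between the data disks, and that the rotating layout simply moves parity to the next disk on each stripe. Real controllers use their own layouts, and all names here are made up.

```python
# Toy comparison of per-disk write counts: parity pinned to one disk versus
# parity rotated across all disks. Disks are numbered 0-2 here (disk 3 in the
# post above is disk 2 in this sketch).
from collections import Counter

NUM_DISKS = 3
NUM_WRITES = 9000

def simulate(rotate_parity: bool) -> Counter:
    writes = Counter()
    for stripe in range(NUM_WRITES):
        # Dedicated layout: parity always on the last disk.
        # Rotating layout: parity moves to the next disk on each stripe.
        parity_disk = stripe % NUM_DISKS if rotate_parity else NUM_DISKS - 1
        data_disks = [d for d in range(NUM_DISKS) if d != parity_disk]
        data_disk = data_disks[stripe % len(data_disks)]
        writes[data_disk] += 1      # the small file itself
        writes[parity_disk] += 1    # the parity update that goes with it
    return writes

dedicated = simulate(rotate_parity=False)
rotating = simulate(rotate_parity=True)
print(sorted(dedicated.items()))  # [(0, 4500), (1, 4500), (2, 9000)]
print(sorted(rotating.items()))   # [(0, 6000), (1, 6000), (2, 6000)]
```

    With parity pinned to one disk, that disk takes twice the writes of either data disk. With rotation, the load evens out.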
     
    Certifications: MCDST, MCITP-EDST/EDA/EA/SA/ MCSA 2K3/2K8, MCSE+M 2K3/2K8, ISA/TMG, VCP3/4, CCNA, Exchange, SQL, Citrix, A+, N+, L+, Sec+, Ser+, JNCIA-SSL, JNCIS-SSL
    WIP: Lots
  14. d-Faktor
    Honorary Member

    d-Faktor R.I.P - gone but never forgotten.

    810
    0
    39
    :blink
    sure you're not talking about a raid3 config? (or maybe i should read the thread more carefully)
     
  15. Jellyman_4eva

    Jellyman_4eva Byte Poster

    213
    4
    34
    Hi,

    Yes, OK, I was being a little vague in my description. My example was showing RAID 3, as that uses data striping but not parity striping. I was using it to emphasise why we now have RAID 5!!
     
    Certifications: MCDST, MCITP-EDST/EDA/EA/SA/ MCSA 2K3/2K8, MCSE+M 2K3/2K8, ISA/TMG, VCP3/4, CCNA, Exchange, SQL, Citrix, A+, N+, L+, Sec+, Ser+, JNCIA-SSL, JNCIS-SSL
    WIP: Lots
  16. Veteran's son

    Veteran's son Megabyte Poster

    915
    2
    55
    Excellent information in this thread! :)
     
    Certifications: A+
    WIP: N+
