Curious RAM Controller Failure?

Discussion in 'Hardware' started by GSteer, Jun 8, 2010.

  1. GSteer

    GSteer Megabyte Poster

    Well, bit of a ball ache this one.

    Dell PowerEdge 600SC, 512 MB DDR ECC Registered

    Customer Description: Voicemail software/server freezes and requires rebooting.

    Diagnosis indicated failed RAM: it failed Memtest v3.5, v4.0 and the Windows memory tester, although the internal diagnostics utility gave it the all clear. The system ended up hard freezing in all three of the above tests, with Memtest giving an "unexpected interrupt - halting cpu". It would be the only type of RAM that we didn't currently have a spare stick of lying around, so an order was placed to test.

    Today: New 1 GB stick inserted (old was 512 MB), tested, failed, head in hands, wtf O.o. Did some more tracking and noted that all the errors on the 512 MB stick occur at 511.9 MB, and all errors on the 1 GB stick at 1023.9 MB (NB: a few low-end errors at ~1 MB). Now that's just too much of a coincidence.

    1 GB: 1023.9 MB / 3fffdc80 (error location)
    512 MB: 511.9 MB / 0001fffdc80 (error location)
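
    As a quick sanity check on those two addresses, here is a minimal sketch that converts each one to MiB and works out how far it sits below the top of the stick. It assumes each module is exactly 512 or 1024 MiB and is mapped contiguously from address 0; the name offset_from_top is made up for illustration.

```python
# Error addresses as reported above (hex, no 0x prefix), paired with the
# assumed module size in MiB. Assumption: memory is mapped from address 0.
errors = {
    "1 GB stick": ("3fffdc80", 1024),
    "512 MB stick": ("0001fffdc80", 512),
}

MIB = 1024 * 1024


def offset_from_top(addr_hex, size_mib):
    """Return (position in MiB, bytes below the top of the module)."""
    addr = int(addr_hex, 16)
    top = size_mib * MIB
    return addr / MIB, top - addr


for label, (addr_hex, size_mib) in errors.items():
    pos_mib, gap = offset_from_top(addr_hex, size_mib)
    print(f"{label}: {pos_mib:.2f} MiB, {gap} bytes (0x{gap:x}) below top")
```

    Under those assumptions both addresses land the same 0x2380 bytes below the top of installed memory, which would fit the failure following the top of RAM rather than a particular cell on either stick.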

    The rest of the system passes its internal utility checks: CPU caches OK, timers OK, everything "appears" fine.

    I'm scratching me head here and having a well-deserved coffee after spending hours yesterday hacking this system into a VM from a Clonezilla image. Ultimately I'm going to suggest they virtualise the box anyway, as it's noisy and needs replacing with something newer, but this is bugging me from a technical perspective.

    Anyone come across this before, any ideas? There's a fresh pot of dark roast on if you want some :)
     
    Certifications: BSc. (Comp. Sci.), MBCS, MCP [70-290], Specialist [74-324], Security+, Network+, A+, Tea Lord: Beverage Brewmaster | Courses: LFS101x Introduction to Linux (edX)
    WIP: CCNA Routing & Switching
  2. greenbrucelee
    Highly Decorated Member Award

    greenbrucelee Zettabyte Poster

    Did you run Memtest on one DIMM at a time, in all slots, for several passes?

    If not, then do that. Memtest is only truly accurate on one stick.

    Doing that will prove either that the RAM is faulty or that the slots are faulty.

    If you have done the above, then I think it would be safe to assume the slots are faulty.

    Mixing RAM makes and speeds can cause problems, and some motherboards don't even like different sizes.
     
    Last edited: Jun 8, 2010
    Certifications: A+, N+, MCDST, Security+, 70-270
    WIP: 70-620 or 70-680?
  3. asje1

    asje1 Byte Poster

    Ensure that the memory slot isn’t clogged with dust or anything that would cause connection issues.

    Try the stick in a different memory bank, then run Memtest again to make sure it's not a physical fault (if the board will boot like that).

    Make sure the RAM speeds are configured correctly.

    Try updating the BIOS.
     
    Certifications: A+, N+
  4. GSteer

    GSteer Megabyte Poster

    Unable to test in other slots; the server refuses to POST unless there is a stick of RAM present in slot 1.

    Multiple tests run against it, all fail. Cleared the slots of dust previously (whilst doing the usual loud coughing).

    Quoting this morning on a virtual box. If we end up with the hardware I might do some more digging out of curiosity.
     
  5. greenbrucelee
    Highly Decorated Member Award

    greenbrucelee Zettabyte Poster

    If you can't run without a DIMM in slot one then Memtest may not be entirely accurate, but if it's bringing up loads of errors then I think it's the slots at fault and not the memory. Try cleaning the connectors and contacts with a rubber from the end of a pencil and see if it helps.
     
  6. asje1

    asje1 Byte Poster

    I've had issues similar to this before, but never had the time to find out 'exactly' what the problem was. We had some issues where a mobo would only show 8-10 GB even though 12 GB was installed: Memtest would test 12 GB, the BIOS would recognise 10 GB, etc. We ended up just replacing components until the problem was fixed. This sounds to me like it's the motherboard playing up. It's unlikely that a brand new stick of memory is faulty unless you're very unlucky.

    Is there another machine you can test the newer memory module on?

    I know it's a long shot, but have there been any changes to the server's normal environment? Power cut / surge? Forceful shutdowns? Has it been moved / transported?
     
