[ECCO] yet another hard drive failure

To all users of the ecco.vrdc.cornell.edu cluster:

EMERGENCY DOWNTIME: NOW

REASON: The /home storage unit has had two drive failures.

WHO WILL BE AFFECTED? All Users of the ecco.vrdc.cornell.edu cluster

WHAT WILL BE UNAVAILABLE? ecco.vrdc.cornell.edu & compute nodes

STATUS: NO further jobs can start as of NOW. A follow-up message will be sent upon completion.

QUESTIONS: ecco-help@cac.cornell.edu

We are terribly sorry for the inconvenience and are looking into alternatives. Technical aside: our current hardware (Dell MD-1000) does not support RAID 6, only RAID 5, and RAID 5 can survive the loss of only one drive at a time. The two drive failures came in rapid succession, so even with a hot-spare drive (a drive not used for anything except an emergency like this), the filesystem would have failed: the array could not have finished rebuilding onto the spare before the second drive failed. Newer hardware supports RAID 6, which tolerates up to two simultaneous drive failures. We are investigating what options are available with the current hardware in ECCO.
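To make the aside a bit more concrete, here is a rough Python sketch of why a hot spare does not save a RAID 5 array when a second drive fails during the rebuild, while RAID 6 would have survived. It is a simplified model, not a description of how the ECCO controller actually behaves: the function name `array_survives`, the failure times, and the rebuild duration are all hypothetical, and it assumes a failed drive with no spare available stays failed for the rest of the scenario.

```python
import math

# Simultaneous drive failures each level can absorb without data loss
# (one parity block per stripe for RAID 5, two for RAID 6).
TOLERATED_FAILURES = {"RAID5": 1, "RAID6": 2}


def array_survives(level: str, failure_times_h: list[float],
                   rebuild_hours: float, hot_spares: int = 1) -> bool:
    """Hypothetical model: True if the array never has more unrebuilt
    failed drives than its RAID level tolerates.

    Assumptions: each failure consumes one hot spare if one is left, and
    the rebuild onto it finishes `rebuild_hours` later; a failure with no
    spare left is never repaired within the scenario.
    """
    tolerance = TOLERATED_FAILURES[level]
    spares = hot_spares
    pending = []  # times at which each currently failed drive is restored
    for t in sorted(failure_times_h):
        # Drives whose rebuild completed before this failure are healthy again.
        pending = [done for done in pending if done > t]
        if spares > 0:
            spares -= 1
            pending.append(t + rebuild_hours)
        else:
            pending.append(math.inf)
        if len(pending) > tolerance:
            return False  # too many drives down at once: array is lost
    return True


if __name__ == "__main__":
    # Two failures 2 hours apart, with an assumed 12-hour rebuild.
    print(array_survives("RAID5", [0.0, 2.0], rebuild_hours=12.0))
    print(array_survives("RAID6", [0.0, 2.0], rebuild_hours=12.0))
```

With failures two hours apart, the RAID 5 case returns `False` because the second drive fails while the spare is still rebuilding, and the RAID 6 case returns `True`. If the second failure came well after the rebuild finished (say 48 hours later), the RAID 5 array with a spare would also survive, which is why the rapid succession of failures mattered.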

1 comment to [ECCO] yet another hard drive failure

  • Lars Vilhuber

    UPDATE [2014-08-26]: We are working on restoring a functional /home directory. The best recoverable state of the home filesystem will also be made available. Mid-term, we expect to have a replacement storage system in place within 7-10 days. Sorry for the inconvenience.