Tuesday, July 24, 2012

Troubleshooting a RAID Issue


1. My problem from the other day was rebuilding a RAID when I got ECC errors on a different disk than the one being rebuilt. I did a rescan and the ECC errors went away, but the rebuild seemed to be stuck. I contacted 3ware, the maker of our RAID card, and was told to do this:

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    REBUILDING     89     64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

//cdfs3> maint remove c0 p6
Exporting port /c0/p6 ... Done.

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    DEGRADED       -      64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927
p6     NOT-PRESENT      -      -           -             -
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

//cdfs3> rescan
Rescanning controller /c0 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [/c0/p6].

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    DEGRADED       -      64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927
p6     OK               -      465.76 GB   976773168     WD-WCAPW5103756
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288     

//cdfs3> /c0/u1 start rebuild disk=6 ignoreecc
Sending rebuild start request to /c0/u1 on 1 disk(s) [6] ... Done.

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    REBUILDING     0      64K     1396.95   OFF    OFF      ON       

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288

This seems to be working. I guess I’ll know in a few hours if everything is OK.
If this still doesn’t work, I’m supposed to send 3ware an error log:

./tw_cli info c0 diag > error.txt
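
Rather than rereading the table by hand every few hours, the unit rows can be pulled out with a short script. This is only a rough sketch in Python, assuming the `info c0` output keeps the column layout shown above. The function name `parse_units` is my own and is not part of the 3ware tools.

```python
def parse_units(info_text):
    """Return {unit: (status, percent_complete_or_None)} from `info c0` text."""
    units = {}
    for line in info_text.splitlines():
        fields = line.split()
        # Unit rows start with u<N>, e.g. "u1 RAID-5 REBUILDING 89 64K ..."
        if len(fields) >= 4 and fields[0].startswith("u") and fields[0][1:].isdigit():
            status = fields[2]
            # %Cmpl is "-" when no rebuild is running
            pct = int(fields[3]) if fields[3].isdigit() else None
            units[fields[0]] = (status, pct)
    return units


# Unit table copied from the transcript above
SAMPLE = """\
Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    REBUILDING     0      64K     1396.95   OFF    OFF      ON
"""

if __name__ == "__main__":
    print(parse_units(SAMPLE))
```

Saving the output of `info c0` to a file and feeding it to `parse_units` would give the status and percentage for each unit.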

2. The easy RAID rebuild from yesterday turned out to be not so easy. Checking today, I got this message:
//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    REBUILDING     89     64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981
p5     ECC-ERROR        u1     465.76 GB   976773168     WD-WCANU1109927
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288
This does not look good at all, but the disk seems to be OK. So I tried this:
//cdfs3> /c0 rescan
Rescanning controller /c0 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [none].

//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    REBUILDING     89     64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288
That’s good. I know disk p6 is good because it was just replaced. What I’m unsure of is whether the rebuild will continue without problems or whether I need to restart it. I think I’ll leave it for a while to see if things start working again. If it stays stuck at 89% for a while, I’ll run the rebuild command again.
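
Spotting a port that has dropped out of OK (like p5 showing ECC-ERROR earlier) is easy to miss in a long table. Here is a minimal Python sketch that lists every port row whose status is not OK. It assumes the port table layout shown in the transcripts, and `ports_not_ok` is a hypothetical helper name, not a 3ware command.

```python
def ports_not_ok(info_text):
    """Return [(port, status, unit)] for each port row whose status is not OK."""
    flagged = []
    for line in info_text.splitlines():
        fields = line.split()
        # Port rows start with p<N>, e.g. "p5 ECC-ERROR u1 465.76 GB ..."
        if len(fields) >= 3 and fields[0].startswith("p") and fields[0][1:].isdigit():
            port, status, unit = fields[0], fields[1], fields[2]
            if status != "OK":
                flagged.append((port, status, unit))
    return flagged


# Port rows copied from the earlier output that showed the ECC error
PORTS = """\
Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981
p5     ECC-ERROR        u1     465.76 GB   976773168     WD-WCANU1109927
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288
"""

if __name__ == "__main__":
    for port, status, unit in ports_not_ok(PORTS):
        print(port, status, unit)
```

Note that the port being rebuilt (p6 here) shows DEGRADED for the whole rebuild, so it will always be flagged until the rebuild finishes.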
I tried to reissue the command and got the following:
//cdfs3> /c0/u1 start rebuild disk=6
Sending rebuild start request to /c0/u1 on 1 disk(s) [6] ... Failed.

(0x0B:0x0032): Unit is rebuilding
//cdfs3> info c0

Unit  UnitType  Status         %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-5    OK             -      64K     1396.95   ON     OFF      OFF
u1    RAID-5    REBUILDING     89     64K     1396.95   OFF    OFF      OFF      

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     OK               u0     465.76 GB   976773168     WD-WCANU1137212
p1     OK               u0     465.76 GB   976773168     WD-WCANU1090078
p2     OK               u0     465.76 GB   976773168     WD-WCANU1119743
p3     OK               u0     465.76 GB   976773168     WD-WCANU1089924
p4     OK               u1     465.76 GB   976773168     WD-WCANU1136981
p5     OK               u1     465.76 GB   976773168     WD-WCANU1109927
p6     DEGRADED         u1     465.76 GB   976773168     WD-WCAPW5103756
p7     OK               u1     465.76 GB   976773168     WD-WCANU1125288
So I think I’ll just have to leave it and hope that it finishes.
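
One way to tell "slow" from "stuck" is to note the %Cmpl value at regular intervals and see whether it ever moves. The Python sketch below only captures that idea. The function name `rebuild_stalled` and the default of three readings are my own assumptions, and how far apart the readings should be depends on how fast the controller normally rebuilds.

```python
def rebuild_stalled(percent_samples, window=3):
    """Return True if the last `window` %Cmpl readings are all identical.

    percent_samples: readings in the order taken, e.g. [85, 89, 89, 89].
    With fewer than `window` readings there is not enough evidence,
    so the function returns False.
    """
    if len(percent_samples) < window:
        return False
    recent = percent_samples[-window:]
    return all(p == recent[0] for p in recent)


if __name__ == "__main__":
    # Readings like the ones in this post: the rebuild sits at 89
    print(rebuild_stalled([85, 89, 89, 89]))
    # A rebuild that is still making progress
    print(rebuild_stalled([87, 88, 89]))
```

If the check keeps returning True over several hours, that is probably the point to go back to 3ware with the diagnostic log.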
