Techie blog entry, mostly for me to remember. I did this once before after lots of research, and was frustrated when it happened to me again and I could not find the command, which means there is a dearth of comments about it on the internet. If you find this, may it be exactly what you are looking for… it will be for me the next time I forget.
There are too many boring details in the history of why, and I have already written too much, which is odd considering. The zfs pool I had needed a disk replacement, and here is the story of how I finally got it working.
Sun Thor x4540 Solaris 10.5
23 RAID 1 zfs sets
Greenplum database 22.214.171.124
A disk had too many errors and we needed to replace it – unfortunately they sent me a Hitachi replacement for a Seagate drive (SATA 500GB 7200rpm)
Standard replacement procedure:
– assuming failed disk is c3t5d0
# zpool status
–will show all disks in the zpool including one that failed
–will show all disks and which physical slot c3t5d0 is in the x4540
# cfgadm -alv | grep c3t5d0
–will show the device configuration slot
c3::dsk/c3t5d0 connected configured unknown ATA HITACHI HUA7250S
unavailable disk n /devices/pci@0,0/pci10de,376@f/pci1000,1000@0:scsi::dsk/c3t5d0
# zpool offline <pool> c3t5d0
–will take the disk offline in the zfs raid set; it errors out if there is no redundancy… I like zfs.
# cfgadm -c unconfigure c3::dsk/c3t5d0
— unconfigures device from sun hardware
Remove and replace the device
# cfgadm -alv | grep c3t5d0
–check to see if it is there…
# cfgadm -c configure c3::dsk/c3t5d0
— configures device back into sun hardware
# zpool clear <poolname> [c3t5d0]
–this clears the drive back into the pool and it should start resilvering, which took about 80 minutes on my machine
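The steps above can be collected into a small dry-run helper. This is only a sketch, assuming bash or ksh: replace_plan is a made-up function name, "tank" is a placeholder pool name, and the attachment point is guessed from the disk name in the same c3::dsk/c3t5d0 pattern shown above. It only prints the commands, so you can read them over before typing anything for real.

```shell
# Dry-run helper: prints the commands for the procedure, runs nothing.
# replace_plan is a hypothetical name; pool and disk come from the caller.
replace_plan() {
    pool=$1
    disk=$2
    ctrl=${disk%%t*}            # c3t5d0 -> c3
    ap="${ctrl}::dsk/${disk}"   # c3 + c3t5d0 -> c3::dsk/c3t5d0
    echo "zpool offline $pool $disk"
    echo "cfgadm -c unconfigure $ap"
    echo "# physically swap the drive now"
    echo "cfgadm -c configure $ap"
    echo "zpool clear $pool $disk"
}

# "tank" is a placeholder pool name.
replace_plan tank c3t5d0
```

Printing instead of executing is deliberate: the physical swap in the middle needs a human anyway.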
*** Except that it would not work… I am pretty sure this is because the wwn (world wide name) of the disk changed during the swap (it shows in dmesg),
and you get the output below from # zpool status (the second RAID 1 set shows what a normal set looks like):
mirror DEGRADED 0 0 0
c2t5d0 ONLINE 0 0 0
spare DEGRADED 0 0 0
c3t5d0 UNAVAIL 0 0 0 cannot open
c6t1d0 ONLINE 0 0 0
mirror ONLINE 0 0 0
c2t6d0 ONLINE 0 0 0
c3t6d0 ONLINE 0 0 0
After much online searching I found the fix, then forgot it, and the second time it happened I was frustrated trying to find the solution again… so blog it… some public notes for you too.
In short, the disk needs to be replaced with itself so that zfs accepts the new wwn.
# zpool replace <poolname> c3t5d0 c3t5d0
Yeah, that simple… then it starts resilvering.
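If several disks are stuck like this, a small filter can turn the status text into the matching self-replace commands. A rough sketch, assuming bash or ksh: self_replace_cmds is a made-up name, "tank" is a placeholder pool name, and it assumes the device name is the first column and the state the second, as in the status output above. On Solaris you may want nawk in place of awk. It only prints the commands.

```shell
# Reads `zpool status` text on stdin and prints a self-replace command
# for every device whose state column is UNAVAIL. Nothing is executed.
self_replace_cmds() {
    pool=$1
    awk '$2 == "UNAVAIL" { print $1 }' |
    while read dev; do
        echo "zpool replace $pool $dev $dev"
    done
}

# Sample input copied from the status output above.
self_replace_cmds tank <<'EOF'
mirror DEGRADED 0 0 0
c2t5d0 ONLINE 0 0 0
spare DEGRADED 0 0 0
c3t5d0 UNAVAIL 0 0 0 cannot open
c6t1d0 ONLINE 0 0 0
EOF
```

On a live box the input would come from # zpool status <poolname> piped into the function.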
But wait… there is another interesting issue: if the drive shows as “removed”, you may need to manually “remove” it again… annoying. Here is how:
List out the devices and see what the replaced drive shows up as:
In my case it was
c6::dsk/c6t6d0 connected configured unknown ATA SEAGATE ST35002N
unavailable disk n /devices/pci@3c,0/pci10de,376@f/pci1000,1000@0:scsi::dsk/c6t6d0
c6::sd40 connected configured unknown ATA SEAGATE ST35002N
unavailable disk n /devices/pci@3c,0/pci10de,376@f/pci1000,1000@0:scsi::sd40
Then unconfigure it and let it “reinsert”.
- # cfgadm -c unconfigure c6::sd40
- # cfgadm -x remove_device c6::sd40
Removing SCSI device: /devices/pci@3c,0/pci10de,376@f/pci1000,1000@0/sd@7,0
This operation will suspend activity on SCSI bus: c6
Continue (yes/no)? yes
SCSI bus quiesced successfully.
It is now safe to proceed with hotplug operation.
Enter y if operation is complete or n to abort (yes/no)? yes
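To spot entries like c6::sd40 without eyeballing the whole listing, something like the filter below might help. It is a sketch only, assuming bash or ksh: stale_sd_aps is a made-up name, and it assumes the leftover entry always looks like cN::sdNN with no dsk/ part, which is what I saw but may not hold everywhere. On Solaris you may want nawk in place of awk.

```shell
# Reads `cfgadm -alv` text on stdin and prints attachment points of the
# form cN::sdNN (no dsk/ path), like the c6::sd40 entry above.
stale_sd_aps() {
    awk '$1 ~ /^c[0-9]+::sd[0-9]+$/ { print $1 }'
}

# Sample input copied from the listing above.
stale_sd_aps <<'EOF'
c6::dsk/c6t6d0 connected configured unknown ATA SEAGATE ST35002N
c6::sd40 connected configured unknown ATA SEAGATE ST35002N
EOF
```

Each name it prints is a candidate for the unconfigure and remove_device pair shown above.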
Don’t forget to check:
# fmadm faulty -a
(The -a is important since vanilla fmadm faulty hides fixed problems and maintains the service light)
to see if you need to clear the fault.
All this because there was bad firmware on the 48 drives in each of the three x4540’s we use. Oh yeah… and you did all this 12x on all the drives that had errors… just in case…
Off to Vegas.