Title: PV State - Missing - how to fix?
Post by: pete_d_ats on December 14, 2006, 04:57:28 PM

Hello,

I'm running AIX 4.3.3 ML8, and yesterday we had a SCSI bus problem that caused 2 hdisks to go "stale": hdisk2 is part of rootvg and hdisk3 is part of homevg. Of course IBM will not help me since I'm still at 4.3.3 (we plan on upgrading to 5.2 after the first of the year). The IBM CE ran diags on the SCSI bus and drives, and everything shows as OK.
I'm assuming it's fairly easy to get these 2 drives back and mirrored; I just don't know the correct commands. Of course, this is a production box.

# lsvg -p homevg
homevg:
PV_NAME    PV STATE   TOTAL PPs   FREE PPs   FREE DISTRIBUTION
hdisk3     missing    542         77         00..25..27..25..00
hdisk1     active     542         77         00..13..07..25..32

# lsvg -p rootvg
rootvg:
PV_NAME    PV STATE   TOTAL PPs   FREE PPs   FREE DISTRIBUTION
hdisk0     active     542         9          00..00..00..00..09
hdisk2     missing    542         9          00..00..00..00..09

# lspv hdisk2
PHYSICAL VOLUME:    hdisk2                  VOLUME GROUP:     rootvg
PV IDENTIFIER:      0008f5af182b889f        VG IDENTIFIER     0008f5afec749b98
PV STATE:           missing
STALE PARTITIONS:   239                     ALLOCATABLE:      yes
PP SIZE:            16 megabyte(s)          LOGICAL VOLUMES:  8
TOTAL PPs:          542 (8672 megabytes)    VG DESCRIPTORS:   1
FREE PPs:           9 (144 megabytes)
USED PPs:           533 (8528 megabytes)
FREE DISTRIBUTION:  00..00..00..00..09
USED DISTRIBUTION:  109..108..108..108..100

# lspv hdisk3
PHYSICAL VOLUME:    hdisk3                  VOLUME GROUP:     homevg
PV IDENTIFIER:      0008f5af0071eb82        VG IDENTIFIER     0008f5af372a18c1
PV STATE:           missing
STALE PARTITIONS:   350                     ALLOCATABLE:      yes
PP SIZE:            16 megabyte(s)          LOGICAL VOLUMES:  11
TOTAL PPs:          542 (8672 megabytes)    VG DESCRIPTORS:   1
FREE PPs:           77 (1232 megabytes)
USED PPs:           465 (7440 megabytes)
FREE DISTRIBUTION:  00..25..27..25..00
USED DISTRIBUTION:  109..83..81..83..109

Post by: Michael on December 16, 2006, 08:36:50 AM

I have to think about this a bit. 4.3.3 was a while back, and I need to refresh my memory on what *MISSING* actually means.
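The lsvg -p listings in the first post are what flag hdisk2 and hdisk3 as missing. A minimal sketch for pulling those names out of such a listing, assuming the column layout shown in the post (PV_NAME first, PV STATE second). The function name missing_pvs is hypothetical, not an AIX command; it only reads text and changes nothing on the system.

```sh
#!/bin/sh
# Hypothetical helper: print the PVs that an "lsvg -p <vg>" listing reports
# as missing. Reads the listing on stdin, skips the "<vg>:" line and the
# PV_NAME header, then checks column 2 (PV STATE).
missing_pvs() {
    awk 'NR > 2 && $2 == "missing" { print $1 }'
}

# On the AIX box you might run, for every varied-on VG:
#   for vg in $(lsvg -o); do lsvg -p "$vg" | missing_pvs; done
# Demo with the homevg listing from the first post (prints: hdisk3):
missing_pvs <<'EOF'
homevg:
PV_NAME    PV STATE   TOTAL PPs   FREE PPs   FREE DISTRIBUTION
hdisk3     missing    542         77         00..25..27..25..00
hdisk1     active     542         77         00..13..07..25..32
EOF
```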
In SMS, can you see all 4 disks? Verify the SCSI termination. Since these are two separate VGs, try focusing on rootvg first, with each disk on a separate adapter (SCSI chain), so that you can be certain there is no SCSI address conflict. (As my memory comes back to me, a duplicate SCSI address was the most frequent "cause" of a missing disk, if the disk controller wasn't burned out itself.)

Post by: pete_d_ats on December 16, 2006, 03:23:05 PM

Not using SMS with this server; I'm using smit. But while I was talking with AIX s/w support yesterday, he walked me through removing the mirrors on the 2 affected drives. The only problem now is that I can't "find" the drives after removing them from the system (via rmdev -dl hdiskX), so we are now looking at the SCSI bus (both drives are on the same bus). The IBM CE is "hoping" that our weekly reboot will "clear up" the SCSI bus, since it now seems to be stuck or locked out/busy. If the reboot doesn't release the bus, he'll have to replace the SCSI bus/adapter, which will be a lengthy process...
pete

Post by: Michael on December 16, 2006, 11:28:41 PM

The command you will need is ldeletepv. (That's on 4.3.3; 5.x made it 'easier'.)
See http://www.faqs.org/faqs/aix-faq/part1/section-41.html for some hints; a partial quote is below. There is another command, which I still have to look up, that lists the PVIDs of all the disks in a VG. Maybe someone else will find it first.

Quote
reducevg -f <vgname> <pvid>
ldeletepv -g VGid -p PVid
  -g  Required. Specify the VGid of the volume group you are removing the physical volume from.
  -p  Required. Specify the PVid of the PV to be removed.

Post by: Michael on December 17, 2006, 12:36:25 PM

Ok. Basic steps.
1. Do a complete backup.

2. List what AIX knows:

Quote
root@x054[/usr/sbin]:lqueryvg -p hdisk0 -At
Max LVs:        256
PP Size:        23
Free PPs:       586
LV count:       9
PV count:       2
Total VGDAs:    3
Conc Allowed:   0
MAX PPs per PV  1016
MAX PVs:        32
Conc Autovaryo  0
Varied on Conc  0
Logical:        0040dd9a00004c000000010d5ae1df16.1   hd5 1
                0040dd9a00004c000000010d5ae1df16.2   hd6 1
                0040dd9a00004c000000010d5ae1df16.3   hd8 1
                0040dd9a00004c000000010d5ae1df16.4   hd4 1
                0040dd9a00004c000000010d5ae1df16.5   hd2 1
                0040dd9a00004c000000010d5ae1df16.6   hd9var 1
                0040dd9a00004c000000010d5ae1df16.7   hd3 1
                0040dd9a00004c000000010d5ae1df16.8   hd1 1
                0040dd9a00004c000000010d5ae1df16.9   hd10opt 1
Physical:       0041d26a06c1122e                2   0
                0040dd8acc23029a                1   0
Total PPs:      1074
LTG size:       128
HOT SPARE:      0
AUTO SYNC:      0
VG PERMISSION:  0
SNAPSHOT VG:    0
IS_PRIMARY VG:  0
PSNFSTPP:       4352
VARYON MODE:    0
VG Type:        0
Max PPs:        32512
root@x054[/usr/sbin]:lspv
hdisk0          0041d26a06c1122e    rootvg          active
hdisk1          0040dd8acc23029a    rootvg          active
hdisk2          0040dd9a95b310b7    backupvg        active
hdisk3          0040dd9a95b316b0    backupvg        active
hdisk4          0040dd9a54f30c35    nimvg           active
hdisk5          0040dd9a54f314ed    nimvg           active
hdisk6          0040dd9a54f31b6e    nimvg           active
hdisk7          0040dd9a54f320b8    nimvg           active
root@x054[/usr/sbin]:lsvg rootvg
VOLUME GROUP:       rootvg                   VG IDENTIFIER:  0040dd9a00004c000000010d5ae1df16
VG STATE:           active                   PP SIZE:        8 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      1074 (8592 megabytes)
MAX LVs:            256                      FREE PPs:       586 (4688 megabytes)
LVs:                9                        USED PPs:       488 (3904 megabytes)
OPEN LVs:           8                        QUORUM:         1
TOTAL PVs:          2                        VG DESCRIPTORS: 3
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         2                        AUTO ON:        yes
MAX PPs per VG:     32512
MAX PPs per PV:     1016                     MAX PVs:        32
LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable

3. Perform the removal of a 'dead' disk from a working volume group. Let's assume that my hdisk1 was missing. Then I would use:

# ldeletepv -g 0040dd9a00004c000000010d5ae1df16 -p 0040dd8acc23029a
# rmdev -dl hdisk1
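Step 3 needs two identifiers: the VGID and the PVID of the dead disk. A minimal sketch, assuming the "PV IDENTIFIER: <pvid> VG IDENTIFIER <vgid>" line layout of the lspv listings in this thread, and assuming the VG IDENTIFIER that lspv prints is the same VGID that lsvg shows (cross-check the two before running anything). The helper ldeletepv_cmds is hypothetical; it only prints the ldeletepv and rmdev commands, it does not run them.

```sh
#!/bin/sh
# Hypothetical helper: read "lspv <hdisk>" output on stdin and PRINT (never
# run) the ldeletepv/rmdev commands from step 3. Assumes the line
#   PV IDENTIFIER:  <pvid>   VG IDENTIFIER  <vgid>
# as shown in the lspv listings above. Exits 1 if that line is not found.
ldeletepv_cmds() {
    disk=$1
    awk -v disk="$disk" '
        /^PV IDENTIFIER/ { pvid = $3; vgid = $NF }
        END {
            if (pvid == "" || vgid == "") { exit 1 }
            printf "ldeletepv -g %s -p %s\n", vgid, pvid
            printf "rmdev -dl %s\n", disk
        }'
}

# On AIX (after the backup from step 1): lspv hdisk2 | ldeletepv_cmds hdisk2
# Demo with the first lines of the "lspv hdisk2" listing from the thread; prints:
#   ldeletepv -g 0008f5afec749b98 -p 0008f5af182b889f
#   rmdev -dl hdisk2
ldeletepv_cmds hdisk2 <<'EOF'
PHYSICAL VOLUME:    hdisk2                  VOLUME GROUP:     rootvg
PV IDENTIFIER:      0008f5af182b889f        VG IDENTIFIER     0008f5afec749b98
PV STATE:           missing
EOF
```

Printing instead of executing is deliberate: ldeletepv edits the VGDA directly, so the output is meant to be compared by eye against lsvg and lspv first.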
Post by: pete_d_ats on December 17, 2006, 01:27:54 PM

This is what IBM AIX S/W support had me do to remove hdisk3 from homevg:
lsvg -p homevg
lsvg -l homevg
lquerypv -h /dev/hdisk3 80 10      <-- this should show the PVID, the same info the lspv command reports; nothing was returned, which shows that the ODM is "out of sync"
lquerypv -h /dev/hdisk3            <-- no output
lquerypv -h /dev/hdisk1 80 10      <-- ran this to verify the output was correct for hdisk1 (which is OK)
lquerypv -h /dev/hdisk1            <-- gives a complete dump; look at address 0080 and you'll see the same data as the "80 10" output above
lsvg -M homevg                     <-- shows all hdisks associated with homevg
lsvg -M homevg | grep hdisk3       <-- see what still refers to hdisk3
unmirrorvg -c 1 homevg hdisk1      <-- this failed, so the mirror had to be broken on each LV individually
lsvg -l homevg                     <-- shows the LP:PP ratios again
rmlvcopy lvbdslib 1 hdisk3         <-- picked the smallest and least volatile LV first
lsvg -l homevg | grep lvbdslib     <-- compare LPs and PPs; once the LV is no longer mirrored they match, PVs should show 1, and LV STATE should be open/syncd
rmlvcopy lvatimmback 1 hdisk3      <-- run this command for every LV in homevg (based on the "lsvg -l homevg" output above)
rmlvcopy lvlogfiles 1 hdisk3
rmlvcopy lvcashlogs 1 hdisk3
rmlvcopy lvhauler 1 hdisk3
rmlvcopy lvlocal 1 hdisk3
rmlvcopy lvmim 1 hdisk3
rmlvcopy lvatimm 1 hdisk3
rmlvcopy loglv00 1 hdisk3
rmlvcopy lvinformix 1 hdisk3
lsvg -l homevg                     <-- should only show a 1:1 LP:PP ratio, PVs of 1, and LV STATE "open/syncd"
lsvg -M homevg                     <-- should only show hdisk1 references
lsvg -M homevg | grep -v hdisk1    <-- the only exception is 1 line, "hdisk3:1-542", which just shows hdisk3 is still assigned to homevg
reducevg homevg hdisk3             <-- remove hdisk3 from homevg
lsvg -p homevg                     <-- should show only 1 hdisk
rmdev -dl hdisk3                   <-- completely remove hdisk3 from the system
lsdev -Cc disk                     <-- verify there is no hdisk3
synclvodm -Pv homevg               <-- "syncs up" the ODM references for homevg
lspv                               <-- validate there is no reference to hdisk3
cfgmgr -v                          <-- this "should" rediscover the hdisk
lsdev -Cc disk                     <-- NO HDISK3!!! This means a hardware error at this point.
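Support had the rmlvcopy step done by hand, one LV at a time. A minimal sketch that derives the same list from lsvg -l output, assuming the column order LV NAME, TYPE, LPs, PPs, PVs, LV STATE, MOUNT POINT and treating any LV with more PPs than LPs as mirrored. The helper name rmlvcopy_cmds is hypothetical, and it prints the commands rather than running them. In the demo input only the LV names come from the post; the numbers, states and mount points are made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: read "lsvg -l <vg>" output on stdin and PRINT one
# "rmlvcopy <lv> 1 <disk>" line per LV whose PPs column (4) exceeds its
# LPs column (3), i.e. per LV that still has more than one copy.
# The first two lines ("<vg>:" and the LV NAME header) are skipped.
rmlvcopy_cmds() {
    disk=$1
    awk -v disk="$disk" '
        NR > 2 && ($4 + 0) > ($3 + 0) { printf "rmlvcopy %s 1 %s\n", $1, disk }'
}

# On AIX: lsvg -l homevg | rmlvcopy_cmds hdisk3
# Demo (made-up numbers); prints:
#   rmlvcopy lvbdslib 1 hdisk3
#   rmlvcopy loglv00 1 hdisk3
rmlvcopy_cmds hdisk3 <<'EOF'
homevg:
LV NAME        TYPE     LPs   PPs   PVs  LV STATE      MOUNT POINT
lvbdslib       jfs      4     8     2    open/stale    /bdslib
loglv00        jfslog   1     2     2    open/stale    N/A
lvlocal        jfs      10    10    1    open/syncd    /usr/local
EOF
```

After the printed commands have been run, lsvg -l homevg should show LPs equal to PPs and PVs of 1 for every LV, which is the same check support used before reducevg.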
At this point I need to call IBM hardware support, since the system cannot find either disk on scsi0.

Post by: Michael on December 18, 2006, 07:17:57 AM

The procedure I posted is the one used when a disk has died (especially in a mirror) and therefore can no longer be updated.
Are there any disks active on scsi0? To reset a SCSI controller, you need to bring the device into the 'defined' state. To be sure a device is completely 'quiet', you need to get it into the defined state all the way to its slot, or pciX, definition. Starting with an hdisk, we need to find its parent:

$ lsdev -Cl hdisk1 -F parent

Quote
michael@x100:[/etc/objrepos]lsdev -Cl hdisk1 -F parent
scsi0

To put this controller, and all the devices it supports, into the defined state, use the command:

# rmdev -l scsi0 -R

If all devices are in a state where they can be 'powered off', you will see a list of devices going into the defined state. Chances are you will see something like this instead:

Quote
michael@x100:[/etc/objrepos]rmdev -l scsi0 -R
cd0 Defined
Method error (/etc/methods/ucfgdevice):
        0514-062 Cannot perform the requested function because the
                 specified device is busy.

Some devices are still available:

Quote
michael@x100:[/etc/objrepos]lsdev -Cl scsi0
scsi0 Available 04-C0 Wide SCSI I/O Controller
michael@x100:[/etc/objrepos]lsdev -C | grep 04-C0
scsi0    Available 04-C0        Wide SCSI I/O Controller
cd0      Defined   04-C0-00-3,0 SCSI Multimedia CD-ROM Drive
hdisk0   Available 04-C0-00-4,0 16 Bit SCSI Disk Drive
hdisk1   Available 04-C0-00-5,0 16 Bit SCSI Disk Drive
rmt0     Available 04-C0-00-0,0 4.0 GB 4mm Tape Drive

Conclusion: on this system I won't be able to attempt a reset of scsi0 without rebooting the system.
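The check at the end of the thread boils down to one question: which devices under the adapter's location code are not in the Defined state? A minimal sketch of that filter, assuming the lsdev -C column layout shown above (name, state, location code, description). The helper name still_available is hypothetical, and the lsdev -F location query in the comment is an assumption to verify on the box before relying on it.

```sh
#!/bin/sh
# Hypothetical helper: given an adapter location code and "lsdev -C" output
# on stdin, print "<name> <state>" for every child device that is NOT in
# the Defined state. A child is any device whose location code starts with
# "<code>-", which also keeps the adapter's own line out of the result.
still_available() {
    loc=$1
    awk -v loc="$loc" '
        index($3, loc "-") == 1 && $2 != "Defined" { print $1, $2 }'
}

# On AIX (location lookup is an assumption, check it first):
#   loc=$(lsdev -Cl scsi0 -F location)
#   lsdev -C | still_available "$loc"
# Demo with the lsdev output from the thread; prints:
#   hdisk0 Available
#   hdisk1 Available
#   rmt0 Available
still_available 04-C0 <<'EOF'
scsi0    Available 04-C0        Wide SCSI I/O Controller
cd0      Defined   04-C0-00-3,0 SCSI Multimedia CD-ROM Drive
hdisk0   Available 04-C0-00-4,0 16 Bit SCSI Disk Drive
hdisk1   Available 04-C0-00-5,0 16 Bit SCSI Disk Drive
rmt0     Available 04-C0-00-0,0 4.0 GB 4mm Tape Drive
EOF
```

If this prints anything after rmdev -l scsi0 -R, the adapter cannot be reset in place; on the system shown, hdisk0 is in rootvg, which is why only a reboot would do it.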