LED code 0555 fsck error
 
fbergenh
Full Member
Posts: 16


« Reply #22 on: November 08, 2007, 11:19:00 AM »

By the way, thanks to both of you for the help. :D
fbergenh
Full Member
Posts: 16


« Reply #21 on: November 08, 2007, 11:18:07 AM »

John, when I arrived at work last Monday, I intended to ask somebody to replace the disk. I never needed to, because the machine was up again. ;)

This morning, I ran:

1) diagnostics on the disks (no errors)
2) certify on hdisk0, output:

Certify media task
device: hdisk0 in location U0.1-P1-I2/Z1-A0

The certify operation is in progress

please standby

... % completed
Disk drive capacity.....................18200 MB
Data errors recovered.......................1
Data errors not recovered..................0
Equipment check errors recovered.......0
equipment check errors not recovered..0


3) fsck on the filesystems /dev/hd1, /dev/hd2, /dev/hd3, /dev/hd4 and /dev/hd9var (no errors).

I guess this means everything is ok right now.

The only thing I'll still have to look into is the quorum checking. And yes, I will keep an eye on errpt...
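
The "everything is ok" conclusion rests on the certify counters: one recovered data error and no unrecovered ones. A minimal Python sketch of that check, assuming the summary keeps the dotted "label.....value" layout shown above; `parse_certify` and `certify_ok` are hypothetical helpers, not part of AIX diag:

```python
import re

# Hypothetical helper (not an AIX tool): matches diag "certify" counter lines
# such as "Data errors recovered.......1" (label, two or more dots, number).
COUNTER = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*\.{2,}\s*(\d+)")


def parse_certify(output: str) -> dict:
    """Map each dotted counter label (lower-cased) to its integer value."""
    counters = {}
    for line in output.splitlines():
        match = COUNTER.match(line)
        if match:
            counters[match.group(1).strip().lower()] = int(match.group(2))
    return counters


def certify_ok(counters: dict) -> bool:
    """True when every '... not recovered' counter is zero.

    Recovered errors are deliberately not treated as failures here.
    """
    return all(value == 0 for label, value in counters.items()
               if label.endswith("not recovered"))


if __name__ == "__main__":
    summary = """... % completed
Disk drive capacity.....................18200 MB
Data errors recovered.......................1
Data errors not recovered..................0
Equipment check errors recovered.......0
equipment check errors not recovered..0"""
    counters = parse_certify(summary)
    print(counters["data errors recovered"], certify_ok(counters))
```

Treating only the "not recovered" counters as fatal matches the reasoning in the post; a drive that keeps logging recovered errors may still be worth replacing.
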
Michael
Administrator
Hero Member
Posts: 539


« Reply #20 on: November 08, 2007, 08:45:57 AM »

John,

Regarding VIO: I take that as a challenge to do a write-up on best practices. For those who can't wait, there is an IBM course on APV (Advanced POWER Virtualization) best practices. It has lots of hands-on work: 60%+ of the class time is spent setting up dual VIO and Shared Ethernet Adapter failover.

In most of the world the course code starts with AU78. The code in the USA was Q1378, but that has been changed to AU780 (so the USA is now also in "most of the world" :) ).
John Peck
Global Moderator
Senior Member
Posts: 46


« Reply #19 on: November 08, 2007, 01:06:20 AM »

Oh, sorry, but you said:
Quote
However, I think I am going to stop the efforts and ask the appropriate people to replace the disk (and add another to make mirroring of the rootvg possible).

This has reminded me of another thread where we were talking about running diagnostics to certify (and maybe format) a suspect disk.

As Michael noted, a system can die without being able to write an error to the log on the disk; such dying throes can be extracted from a dump, etc. However, it is entirely possible for a disk to fail, perhaps in only one tiny part, and even kill the system without leaving an error log entry anywhere.

You can test this sort of thing by, for example, adding a disk to rootvg and putting some paging space on it, then pulling out that part of the paging space to simulate a failure: as soon as the system tries to use that area, bang. Like having a piece of your brain removed, I suppose ;-)

Mirroring is always the answer.

Apparently you've done all this now, but when mirroring from a potentially suspect disk, be sure to check that disk thoroughly first with diag and also run fsck checks, which requires maintenance mode. While mirroring, watch especially for errors, since that is the point at which all of the data is accessed.

In future, at the first sign of any errors on that suspect disk, I would replace it. When adding any new disk, it's a good idea (although boringly slow) to run the diag format and certify first.

As an aside, just imagine how troublesome such things can be when you have VIO and maybe dozens of different operating system instances sharing a disk: eggs, basket, ...
« Last Edit: November 08, 2007, 01:23:13 AM by John Peck »
fbergenh
Full Member
Posts: 16


« Reply #18 on: November 07, 2007, 08:24:16 PM »

Quote
Going back to the beginning of this thread, and the original problem...

It appears that the original cause was likely to have been a disk bad block/sector, probably under the hd8 JFSLOG LV, in an unmirrored rootvg.

Sadly the effect of a disk error can be fatal to any system, that's the main reason why it is recommended (here anyway) that you mirror rootvg at least
- and turn quorum checking off on a two disk mirroring set.

You said that you were replacing the duff disk and adding another to mirror it, so there's no reason why that should occur again.  Obviously you will monitor your error log and act on disk errors when and if they are logged.  When you appear to have a failing disk, simply un-mirror it out of the volume group, replace and re-mirror - all of which can be done on the fly these days usually.


Not entirely correct. I didn't replace the disk; I just added another disk to rootvg and mirrored onto it. There are no errors in errpt (not even for the formerly 'corrupt' disk), and there are no signs of, for example, stale partitions.

I am new at this company (I started on the 9th of September) and was checking all 'my' servers for anything odd. I hadn't got to this one yet; otherwise I would have mirrored rootvg immediately, well before the problems occurred... ;)
John Peck
Global Moderator
Senior Member
Posts: 46


« Reply #17 on: November 07, 2007, 07:13:44 PM »


Going back to the beginning of this thread, and the original problem...

It appears that the original cause was likely to have been a disk bad block/sector, probably under the hd8 JFSLOG LV, in an unmirrored rootvg.

Sadly the effect of a disk error can be fatal to any system, that's the main reason why it is recommended (here anyway) that you mirror rootvg at least
- and turn quorum checking off on a two disk mirroring set.

You said that you were replacing the duff disk and adding another to mirror it, so there's no reason why that should occur again.  Obviously you will monitor your error log and act on disk errors when and if they are logged.  When you appear to have a failing disk, simply un-mirror it out of the volume group, replace and re-mirror - all of which can be done on the fly these days usually.
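
Why quorum checking matters on a two-disk rootvg can be worked through in a few lines. A sketch, assuming the usual AIX LVM placement of volume group descriptor areas (VGDAs), two copies on the first disk and one on the second, and assuming quorum means strictly more than half of the copies are readable. `vgda_layout` and `has_quorum` are hypothetical names:

```python
def vgda_layout(num_pvs: int) -> list:
    """VGDA copies per physical volume, assuming the classic AIX placement:
    one PV holds 2 copies, two PVs hold 2 + 1, three or more hold 1 each."""
    if num_pvs < 1:
        raise ValueError("a volume group needs at least one physical volume")
    if num_pvs == 1:
        return [2]
    if num_pvs == 2:
        return [2, 1]
    return [1] * num_pvs


def has_quorum(layout: list, failed: set) -> bool:
    """Quorum holds while strictly more than half of all VGDA copies are readable.
    `failed` holds the indexes of the PVs that are gone."""
    total = sum(layout)
    available = sum(copies for pv, copies in enumerate(layout) if pv not in failed)
    return 2 * available > total


if __name__ == "__main__":
    mirror = vgda_layout(2)          # e.g. hdisk0 + hdisk1 in a mirrored rootvg
    print(has_quorum(mirror, {1}))   # second disk lost: 2 of 3 copies left
    print(has_quorum(mirror, {0}))   # first disk lost: 1 of 3 copies left
```

With quorum checking on, losing the disk that holds two copies leaves one of three, so LVM would consider quorum lost even though a complete mirror copy of the data survives; that is the case the advice to turn quorum checking off is meant to avoid.
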
fbergenh
Full Member
Posts: 16


« Reply #16 on: November 07, 2007, 01:13:06 PM »

I am glad the problem is resolved, but I find it a little disturbing that I have no idea what caused it in the first place, or why a power-down can solve this kind of problem.

As far as I am concerned, it could happen again some time in the future. Luckily, now that rootvg is mirrored, the impact won't be as big as it was this time, but still...
Michael
Administrator
Hero Member
Posts: 539


« Reply #15 on: November 06, 2007, 10:05:30 PM »

Thanks for the updates. Amazing what a power down can resolve!
fbergenh
Full Member
Posts: 16


« Reply #14 on: November 05, 2007, 11:05:35 AM »

I have already found the answer to my question:

"readvgda /dev/hdisk1" shows some old LVs from 2002, so I ran "extendvg -f rootvg hdisk1".

fbergenh
Full Member
Posts: 16


« Reply #13 on: November 05, 2007, 10:01:24 AM »

New development in this case: ???

Last weekend there was some power maintenance that required all the machines to be powered down for the whole weekend. Last Friday I did a shutdown -Fh on all my machines (all except this one, which of course was already down because of the problems) and went home.

This morning I pinged all the machines to check whether they were up again, and out of habit I pinged this one too.

I was very surprised when the ping returned a normal reply... It seems the power-down was the key to the system's recovery.

Naturally, the first thing I tried was to mirror rootvg, but that is also causing some problems:

<hostname>:/root # extendvg rootvg hdisk1
0516-1398 extendvg: The physical volume hdisk1, appears to belong to
another volume group. Use the force option to add this physical volume
to a volume group.
0516-792 extendvg: Unable to extend volume group.


When I look at the disks, hdisk1 does not seem to be assigned to any VG:

<hostname>:/root # lspv
hdisk0          005b835aa01e3d28                    rootvg          active
hdisk2          005b835af7e00482                    vgSQLTST_MOS_20 active
hdisk3          005b835a1195b4dc                    vg_bak          active
hdisk4          005b835aa01e3d68                    vg_esf          active
hdisk6          005b835af7e190cf                    vgSQLTST_MOS_20 active
hdisk1          005b835a1affe1d3                    None
hdisk5          00c8140e11571b1d                    None
hdisk7          none                                None
hdisk8          none                                None
hdisk9          none                                None
hdisk10         none                                None
dlmfdrv7        005b835a2d2517bf                    vg_p44          active
dlmfdrv8        005b835a2d2518e6                    vg_p44          active
dlmfdrv9        005b835abef765e6                    vg_esf          active
dlmfdrv10       005b835ac3bc2959                    vg_p44          active
dlmfdrv         none                                None


I am not sure if I should use the extendvg -f option, because I am not sure what's on hdisk1. Is there another way to check what is on the disk?
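
One way to narrow down which disks deserve a closer look before forcing anything: in the `lspv` output, a disk that has a PVID but shows "None" for its volume group is not recorded as a VG member on this system (assumption: the VG column reflects the ODM), yet it may still carry an old VGDA from an earlier life, which is what extendvg complains about. A Python sketch that picks those disks out of pasted `lspv` output, assuming the three- or four-column layout shown above; the helpers are hypothetical:

```python
import re
from typing import List, NamedTuple, Optional

PVID = re.compile(r"[0-9a-f]{16}")


class PV(NamedTuple):
    name: str
    pvid: Optional[str]   # None when lspv prints "none"
    vg: Optional[str]     # None when lspv prints "None"
    state: Optional[str]  # e.g. "active"; missing for unassigned disks


def parse_lspv(output: str) -> List[PV]:
    """Parse plain `lspv` data lines; prompts and blank lines are skipped."""
    pvs = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        name, pvid, vg = fields[:3]
        if pvid != "none" and not PVID.fullmatch(pvid):
            continue  # not an lspv data line (e.g. a shell prompt)
        pvs.append(PV(name,
                      None if pvid == "none" else pvid,
                      None if vg == "None" else vg,
                      fields[3] if len(fields) > 3 else None))
    return pvs


def candidates_to_inspect(pvs: List[PV]) -> List[str]:
    """Disks with a PVID but no VG: they may still hold an old VGDA."""
    return [pv.name for pv in pvs if pv.pvid is not None and pv.vg is None]


if __name__ == "__main__":
    sample = """hdisk0          005b835aa01e3d28                    rootvg          active
hdisk1          005b835a1affe1d3                    None
hdisk5          00c8140e11571b1d                    None
hdisk7          none                                None"""
    print(candidates_to_inspect(parse_lspv(sample)))  # ['hdisk1', 'hdisk5']
```

This only says where to look; the disk contents still have to be checked with an LVM tool (such as the `readvgda` check in the follow-up post) before deciding on `extendvg -f`.
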

fbergenh
Full Member
Posts: 16


« Reply #12 on: October 29, 2007, 01:41:55 PM »

Thanks for the suggestions, John.

However, I think I am going to stop the efforts and ask the appropriate people to replace the disk (and add another to make mirroring of the rootvg possible).

I tried the following commands this morning:

Remove hd8 from the disk:
# rmlv -f hd8

Output on the console:

0516-062 lqueryvg: unable to read or write logical volume manager record. PV may be permanently corrupted. Run diagnostics.
0516-912: rmlv: unable to remove lv hd8


Make another log LV on the disk:
# mklv -a e -t jfslog -y loglv00 rootvg 1 hdisk0

Output on the console:

0516-062 lqueryvg: unable to read or write logical volume manager record. PV may be permanently corrupted. Run diagnostics.
0516-822: mklv: unable to create logical volume


I even tried to move hd8 to another part of the disk with the chlv -a e hd8 command, but that also gave me a message along the lines of "cannot move logical volume".

I guess the disk is really gone... :(
John Peck
Global Moderator
Senior Member
Posts: 46


« Reply #11 on: October 26, 2007, 05:05:11 PM »


Sounds like it may be the hd8 LV that sits over a bad area of the disk.
A similarly terminal experience can be had when part of the disk under your paging space goes bad, if you haven't mirrored.

From the maintenance shell, try removing hd8 (rmlv) and then recreating it in a different place on the disk:
- Before removing it, check where it was: doubtless "midway", although it should be "center". There probably isn't any space there anyway, so try "edge", which gives the worst performance (maybe move it later).
- The key thing to note is that you have to set the type to "jfslog", not the default "jfs", and of course there is no help or tab list on that smit field.
- Then run logform on the new hd8 LV before running fsck again.

Incidentally, the log doesn't have to be called hd8: you could change
the log associated with each rootvg filesystem to some other new name
and leave hd8 alone and corrupted (like Britney ?-)
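
The order of the steps in this reply can be written down as a dry-run plan. A sketch, assuming the LV, VG and disk names from this thread and the flags already used in it (`rmlv -f`, `mklv -a e -t jfslog -y`, `logform`); the `-y` flag on fsck and the helper `jfslog_rebuild_plan` itself are assumptions, and nothing is executed:

```python
import shlex


def jfslog_rebuild_plan(log_lv="hd8", vg="rootvg", disk="hdisk0", position="e",
                        filesystems=("/dev/hd4", "/dev/hd2"), remove_existing=True):
    """Return (never run) the command sequence from the reply, in order.

    position "e" is the edge allocation suggested in the reply; set
    remove_existing=False when creating a log under a new name and leaving
    hd8 alone. The fsck "-y" (answer yes) flag is an assumption.
    """
    plan = []
    if remove_existing:
        plan.append(["rmlv", "-f", log_lv])
    # The type must be jfslog, not the default jfs.
    plan.append(["mklv", "-a", position, "-t", "jfslog", "-y", log_lv, vg, "1", disk])
    # Format the new log before fsck tries to replay it.
    plan.append(["logform", "/dev/" + log_lv])
    plan.extend(["fsck", "-y", fs] for fs in filesystems)
    return [shlex.join(cmd) for cmd in plan]


if __name__ == "__main__":
    for command in jfslog_rebuild_plan():
        print(command)
```

Pointing the rootvg filesystems at a differently named log is not covered by this sketch. On a disk whose LVM records can no longer be read, the LVM commands fail before logform is reached, as the rmlv and mklv attempts in this thread show.
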

fbergenh
Full Member
Posts: 16


« Reply #10 on: October 26, 2007, 10:52:54 AM »

I tried to do that, but with the following result:

# /usr/sbin/logform /dev/hd8
logform: destroy /dev/hd8 (y)? y
/usr/sbin/logform: I/O error
Michael
Administrator
Hero Member
Posts: 539


« Reply #9 on: October 26, 2007, 10:09:38 AM »

# logform /dev/hd8
fbergenh
Full Member
Posts: 16


« Reply #8 on: October 26, 2007, 09:47:03 AM »

This morning I repeated the whole operation one more time, but the result was the same as yesterday afternoon.

I received exactly the same error messages:

importing volume group
rootvg
checking the / file system

log redo processing for /dev/rhd4

syncpt record at 20c0c8
error writing 7,48
write of map failed 4
update of maps failed
update of superblock failed
write of map failed 5
update of maps failed
update of superblock failed
write of map failed 6
update of maps failed
update of superblock failed
write of map failed 7
update of maps failed
update of superblock failed
write of map failed 8
update of maps failed
update of superblock failed
write of map failed 9
update of maps failed
update of superblock failed
end of log 2142ff8
syncpt record at 20cc0c8
syncpt address 20bf0d4
number of log records = 4998
number of do blocks = 84
number of nodo blocks = 0
failure replaying log = 0
/dev/rhd4: unable to read superblock (TERMINATED)
checking the /usr filesystem

/dev/rhd2: unable to read superblock (TERMINATED)
exit from this shell to continue the process of accessing the root volume group
#


I guess I am in a deadlock: I need to fsck / and /usr. For the fsck on / to succeed, the jfslog needs to be OK, but logform on hd8 gives me an I/O error.

Is there another way to clean the jfslog partition (or am I using the wrong command)?

Or is the I/O error the clue to a hardware failure?

Frank
Copyright 2001-2008 Michael Felt and ROOTVG.NET