Ledcode 0555 fsck error
 
fbergenh
Full Member
***
Posts: 16


« Reply #7 on: October 25, 2007, 02:18:52 PM »

Michael,

What did you mean by clearing the jfslog partition in your comment:

"In particular with / (or hd4), you may need to clear the jfslog partition before fsck will be able to rebuild the superblock information."

Is that the same as "yes | logform /dev/hd8" ?


By the way, I think (or hope) I am making some progress.

The first time I tried to follow the procedure mentioned before, I received the "lqueryvg" messages and the "unable to read superblock" messages one hour after selecting the option:

2) access this volume group and start a shell before mounting the filesystems

I tried running fsck manually on some filesystems, but they gave the same error message (unable to read superblock).

When I followed the procedure today, the console showed this after 11 minutes:

---------------------------------------------
importing volume group
rootvg
checking the / file system

log redo processing for /dev/rhd4

after another 35 minutes the following messages appear on the console:

syncpt record at 20c0c8
error writing 7,48
write of map failed 4
update of maps failed
update of superblock failed
write of map failed 5
update of maps failed
update of superblock failed
write of map failed 6
update of maps failed
update of superblock failed
write of map failed 7
update of maps failed
update of superblock failed
write of map failed 8
update of maps failed
update of superblock failed
write of map failed 9
update of maps failed
update of superblock failed
end of log 2142ff8
syncpt record at 20cc0c8
syncpt address 20bf0d4
number of log records = 4998
number of do blocks = 84
number of nodo blocks = 0
failure replaying log = 0
/dev/rhd4: unable to read superblock (TERMINATED)
checking the /usr filesystem

/dev/rhd2: unable to read superblock (TERMINATED)
exit from this shell to continue the process of accessing the root volume group
#

after that I tried to clean the jfslog:

# /usr/sbin/logform /dev/hd8
logform: destroy /dev/hd8 (y)? y
/usr/sbin/logform: I/O error

and a fsck:

# fsck -n /
fsck: unable to read superblock (TERMINATED)

I am used to the fact that fsck occasionally won't repair all the issues on the first pass, so I will repeat these steps once more tomorrow. If there are fewer error messages (write of map failed, etc.), I will know it is really progressing. Otherwise, I think I will have to replace the disk and install the machine from scratch....
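
One way to judge whether repeated runs are making progress is to save the console output of each attempt and count the failure lines. This is only a minimal sketch, assuming each run was captured to a plain-text file (for example by the terminal emulator attached to the console). The helper name count_map_failures and the file names run1.log and run2.log are hypothetical.

```shell
# Count the "write of map failed" lines in a saved console capture.
# Assumption: $1 is a plain-text file holding the console output of one run.
count_map_failures() {
    # grep -c prints the number of matching lines, or 0 when none match
    grep -c 'write of map failed' "$1"
}

# Example (hypothetical file names):
#   count_map_failures run1.log
#   count_map_failures run2.log
```

A lower count on the second capture would suggest that log replay is getting further. An unchanged count points more towards the disk refusing writes.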
« Last Edit: October 25, 2007, 03:07:39 PM by fbergenh » Logged
Michael
Administrator
Hero Member
*****
Posts: 539


« Reply #6 on: October 25, 2007, 11:29:58 AM »

I have seen it before, though not recently (not in the last three years), and generally on test and/or "ancient" systems. And of course, many times as a training exercise. My comments are based on the various solution paths and the things that go wrong in the training exercise, plus some personal experience with real disk failures (the old 2.0 and 2.2G disks died horribly and often!)

So, I am hoping for some new insights!
Logged
fbergenh
Full Member
***
Posts: 16


« Reply #5 on: October 25, 2007, 08:51:03 AM »

John and Michael,

Thanks for the help, I'll let you know what I've done.

Your last remark was a bit disappointing, Michael. It would have been easier if somebody had seen this before.... ;-)

Luckily, it is a test machine, not a critical one.
Logged
Michael
Administrator
Hero Member
*****
Posts: 539


« Reply #4 on: October 25, 2007, 08:40:31 AM »

Diagnostics depend on the information in the error log files. As the filesystem holding them is not mounted, diagnostics can only see the system as it is right now.

With disks there are some extra checks (certify, and format plus certify) that you need to run to be sure about the status of your disks.

Try to mount /var and review the contents of your system. Also note that if your system crashed, you may have extra information on your dump device (probably hd6). See if you can copy the core; rc.boot uses this code:
            # Mount /var for copycore
            echo "rc.boot: executing \"fsck -fp var\"" \
                >>/../tmp/boot_log
            fsck -fp /var
            echo "rc.boot: executing \"mount /var\"" \
                >>/../tmp/boot_log
            mount /var
            [ $? -ne 0 ] && loopled 0x518
            # retrieve dump
            echo "rc.boot: executing \"copycore\"" \
                >>/../tmp/boot_log
            copycore
            umount /var

I have never tried this in maint mode, so I am not sure if it will be doable.

If successful, you can use kdb to check the status and the last three errpt messages (those that were in core but not yet written to the error log files).

Good luck - keep us informed. These problems don't occur often on live systems - so all steps taken to recover are useful for later review.
Logged
Michael
Administrator
Hero Member
*****
Posts: 539


« Reply #3 on: October 25, 2007, 08:33:30 AM »

The code being executed to give you code 555 is:
            echo "rc.boot: executing \"fsck -fp /dev/hd4\"" \
                >>/tmp/boot_log
            fsck -fp /dev/hd4
            [ "$?" -ne 0 ] && loopled 0x555
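
The pattern in that excerpt is "run the check, and act only on a non-zero exit status". Here is a small sketch of the same logic that prints a message instead of calling loopled, so the check can be repeated by hand. The function name report_if_fails is hypothetical and is not part of rc.boot.

```shell
# Run a command; if it exits non-zero, report the LED code rc.boot would show.
# Assumption: report_if_fails is an illustrative helper, not an AIX command.
report_if_fails() {
    led=$1
    shift
    if "$@"; then
        echo "ok: $*"
    else
        echo "failed: $* (rc.boot would show LED $led)"
        return 1
    fi
}

# From the maintenance shell (not run here); -n reports only and changes nothing:
#   report_if_fails 0x555 fsck -n /dev/hd4
```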

When you enter maintenance mode be sure (as John suggests) to varyon rootvg without mounting the filesystems.

At this point you can run fsck -n on /dev/hd4 and see what errors it reports. If you can at least run the -n option successfully, you should also be able to run a corrective action.

Maybe you have a low level disk failure - you could try mounting /var and reading the error logs, and/or use alog -o to read a log (e.g. the console log) for error messages posted before the server went down.
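
Console logs can be long, so it may help to filter them down to lines that look like failures. This is a minimal sketch: the helper name filter_errors and the keyword list are assumptions, and the alog invocation in the comment is the usual way to dump the console log on AIX.

```shell
# Keep only lines that look like failures; reads stdin, writes stdout.
# Assumption: the keyword list is a guess at what is interesting here.
filter_errors() {
    grep -i -E 'error|fail|unable'
}

# On the AIX system itself (not run here):
#   alog -o -t console | filter_errors
```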

Sometimes the "corrupted" area is the jfslog (or jfs2log). Again, start with fsck -n on all the other filesystems to know which ones need no attention. If you have any reason to doubt logical volume hd8 then reformat it (yes | logform /dev/hd8) and run fsck again.

In particular with / (or hd4), you may need to clear the jfslog partition before fsck will be able to rebuild the superblock information.
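
The sequence described here (reformat the log, then check again) could be wrapped as follows. This is a sketch under assumptions: the helper name clear_log_then_check is hypothetical, and the LOGFORM and FSCK variables exist only so the logic can be tried with stand-in commands before touching a real disk.

```shell
# Command names can be overridden for a dry run; defaults are the real tools.
LOGFORM=${LOGFORM:-logform}
FSCK=${FSCK:-fsck}

# Reformat the log device, then check the filesystem without changing it.
# Assumption: clear_log_then_check is an illustrative helper name.
clear_log_then_check() {
    logdev=$1
    fsdev=$2
    # "yes" answers the "destroy ... (y)?" prompt that logform prints
    if ! yes | "$LOGFORM" "$logdev"; then
        echo "logform failed on $logdev" >&2
        return 1
    fi
    "$FSCK" -n "$fsdev"
}

# From the maintenance shell (not run here):
#   clear_log_then_check /dev/hd8 /dev/hd4
```

If logform itself fails, the helper stops before fsck, because a log device that cannot be written is a stronger hint of a disk problem than of filesystem corruption.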

Advice: be conservative - use fsck -n first to see the errors fsck is finding before actually starting a fix.
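
A conservative first pass over several filesystems might look like the sketch below. The helper name check_readonly is hypothetical, and the device names in the usage comment are the usual default rootvg logical volumes, which is an assumption; lsvg -l rootvg shows the real list.

```shell
# The fsck command can be overridden for a dry run; default is the real fsck.
FSCK=${FSCK:-fsck}

# Run a report-only check (-n) on each device and summarise the results.
# Assumption: check_readonly is an illustrative helper name.
check_readonly() {
    rc=0
    for dev in "$@"; do
        if "$FSCK" -n "$dev" >/dev/null 2>&1; then
            echo "$dev: ok"
        else
            echo "$dev: needs attention"
            rc=1
        fi
    done
    return $rc
}

# From the maintenance shell (not run here):
#   check_readonly /dev/hd1 /dev/hd2 /dev/hd3 /dev/hd4 /dev/hd9var
```

Drop the output redirection inside the loop to see the actual errors fsck reports for a device.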
Logged
fbergenh
Full Member
***
Posts: 16


« Reply #2 on: October 25, 2007, 08:30:43 AM »

John,

Yes, I took option 4, without mounting the filesystems.

My problem now is that I am not sure what's causing the problem. Just a corrupt filesystem or is the disk gone?

I assume that a complete diagnostics run of the system would warn me if the disk is dying or already gone?

Frank

Logged
John Peck
Global Moderator
Senior Member
*****
Posts: 46


« Reply #1 on: October 25, 2007, 05:45:04 AM »

Were you taking option 4, the one without mounting the filesystems?
fsck will usually fail on a mounted filesystem, not necessarily because there is a problem, and it can't write to the filesystem to fix anything while it is mounted.
Logged
fbergenh
Full Member
***
Posts: 16


« on: October 23, 2007, 12:24:16 PM »

Hi everybody,

I am having a problem with one of my P660's (running AIX 5.2 ML7).

Yesterday morning the machine was no longer available, and when I checked it, it was showing LED code 0555 (fsck error).

It is not possible to boot the machine. I suspect a corrupt filesystem or a corrupt boot device/image. Sadly enough, it is the rootvg, which has only one disk.... :'(

I've tried to follow a procedure I found in a redbook ("Problem Solving and Troubleshooting AIX 5L", chapter LED 551,555 or 557 halt), but I am not sure what the actual problem is.

The steps I should do according to the redbook:

1) boot from bootable media
2) choose "start maintenance mode for system recovery"
3) choose "access a root volume group"
4) choose "access this volume group and start a shell before mounting the filesystems"

According to my console, it was trying to do an importvg of the rootvg, but after one hour the following messages were shown:

----------------------------------------
importing volume group
0516-062 lqueryvg: Unable to read or write logical volume manager record. PV may be permantly corrupted. Run diagnostics

rootvg
0516-062 lqueryvg: Unable to read or write logical volume manager record. PV may be permantly corrupted. Run diagnostics

checking / file system
/dev/rhd4: unable to read superblock (TERMINATED)

checking /usr file system
/dev/rhd2: unable to read superblock (TERMINATED)

exit from this shell to continue the process of accessing the root volume group
-----------------------------------

At the prompt I tried to fsck some of the filesystems manually, but naturally these commands failed as well.

After all this, I decided to run full diagnostics on the system, which completed without reporting any problems.

I am a bit confused, because the diagnostics show no trouble at all. On the other hand, the lqueryvg messages make me think that there might be a corrupt PV.

I am not sure how to proceed. Is it enough to re-install AIX (software problem) or should I replace the disk (hardware problem)?
Logged
Copyright 2001-2008 Michael Felt and ROOTVG.NET