haaaard.. disc

 
wdp
Posted: Sat Jan 12, 2008 4:32 am    Post subject: haaaard.. disc

Hey,

I found the following lines in my log files:

Code:
Jan 12 01:12:07 irulan smartd[1815]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 64 to 65
Jan 12 01:12:07 irulan smartd[1815]: Device: /dev/hda, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 64 to 65
Jan 12 02:12:07 irulan smartd[1815]: Device: /dev/hdc, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 58 to 57
Jan 12 02:12:07 irulan smartd[1815]: Device: /dev/hdc, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 58 to 57
Jan 12 03:12:07 irulan smartd[1815]: Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 65 to 64


I tried to find out whether that's good or bad. At the moment I'm still not sure; maybe one of you can help?
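
For reference, the smartd lines above report changes in the normalized VALUE of each attribute, not in the raw counters. A minimal Python sketch, assuming the log format is exactly as quoted (the name `parse_smartd_line` is made up for illustration), pulls out those changes so the direction and size of each step are easy to see:

```python
import re

# Matches the smartd "changed from X to Y" lines quoted above.
LINE_RE = re.compile(
    r"Device: (?P<device>[^,\s]+), SMART (?P<kind>Prefailure|Usage) Attribute: "
    r"(?P<id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)

def parse_smartd_line(line):
    """Return a dict describing one attribute change, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "device": m.group("device"),
        "kind": m.group("kind"),
        "id": int(m.group("id")),
        "name": m.group("name"),
        "old": int(m.group("old")),
        "new": int(m.group("new")),
    }

# The five log lines from the post, shortened to the part after "smartd[1815]: ".
LOG = """\
Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 64 to 65
Device: /dev/hda, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 64 to 65
Device: /dev/hdc, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 58 to 57
Device: /dev/hdc, SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 58 to 57
Device: /dev/hda, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 65 to 64
"""

changes = [c for c in map(parse_smartd_line, LOG.splitlines()) if c]
for c in changes:
    delta = c["new"] - c["old"]
    print(f'{c["device"]} {c["name"]}: {c["old"]} -> {c["new"]} ({delta:+d})')
```

On the quoted lines every step is a single point up or down, so the values wander rather than fall steadily.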

smartctl -A shows:

hda
Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   065   051   006    Pre-fail  Always       -       115551025
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       55
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   079   060   030    Pre-fail  Always       -       17578685589
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       9156
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       64
194 Temperature_Celsius     0x0022   032   053   000    Old_age   Always       -       32
195 Hardware_ECC_Recovered  0x001a   065   051   000    Old_age   Always       -       115551025
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   188   000    Old_age   Always       -       74
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0
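
A note on the huge RAW_VALUE numbers: for Seagate drives such as this ST380011A, a commonly reported reading (not vendor-documented, so treat it as an assumption) is that the 48-bit raw value of attributes 1 and 7 packs an error count into the upper 16 bits and an operation count into the lower 32 bits. Under that assumption, a short Python sketch (the helper `split_seagate_raw` is hypothetical) splits two of the numbers from the hda table:

```python
def split_seagate_raw(raw):
    """Split a 48-bit raw value into (upper 16 bits, lower 32 bits).

    ASSUMPTION: upper bits = error count, lower bits = operation count.
    This layout is community lore for Seagate drives, not a documented format.
    """
    return (raw >> 32) & 0xFFFF, raw & 0xFFFFFFFF

# Raw values copied from the hda table above.
SAMPLES = [
    ("Raw_Read_Error_Rate", 115551025),
    ("Seek_Error_Rate", 17578685589),
]

for name, raw in SAMPLES:
    errors, operations = split_seagate_raw(raw)
    print(f"{name}: upper={errors} lower={operations}")
```

If the assumption holds, most of each raw number is an operation counter, which would explain why it keeps growing on a healthy drive.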


hdc
Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   058   055   006    Pre-fail  Always       -       134704791
  3 Spin_Up_Time            0x0003   099   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       7
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       329042277
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       10253
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       44
194 Temperature_Celsius     0x0022   033   055   000    Old_age   Always       -       33
195 Hardware_ECC_Recovered  0x001a   058   054   000    Old_age   Always       -       134704791
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0
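
The usual rule for reading these tables is that an attribute counts as failed when its normalized VALUE drops to or below THRESH; the RAW_VALUE column is vendor-specific. A minimal Python sketch, assuming the 10-column `smartctl -A` layout shown above (the function names are made up for illustration), applies that rule to a few of the hdc rows:

```python
def parse_attribute_rows(text):
    """Parse `smartctl -A` table rows into dicts; header lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has 10 columns and starts with the numeric attribute ID.
        if len(fields) != 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "type": fields[6],
            "raw": fields[9],  # kept as text: its meaning is vendor-specific
        })
    return rows

def failing(rows):
    """Attributes whose normalized value is at or below the threshold."""
    return [r for r in rows if r["value"] <= r["thresh"]]

# A few rows copied from the hdc table above.
HDC = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   058   055   006    Pre-fail  Always       -       134704791
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       329042277
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
"""

rows = parse_attribute_rows(HDC)
for r in rows:
    print(f'{r["name"]}: value {r["value"]}, threshold {r["thresh"]}, '
          f'margin {r["value"] - r["thresh"]}')
print("failing:", [r["name"] for r in failing(rows)])
```

With the sample rows nothing is at or below its threshold, which is consistent with the WHEN_FAILED column showing "-" everywhere.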


Both hard disks are the same model; here's hdparm -i:

Code:

 Model=ST380011A, FwRev=3.06, SerialNo=3JV822FQ
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=2048kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=156299375
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 *udma4 udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: ATA/ATAPI-6 T13 1410D revision 2:  ATA/ATAPI-1,2,3,4,5,6


Why udma4 if it could run at udma5? In fact, hdc is running in udma5 while hda is running in udma4. I don't know why. In my syslog I got the following:

Code:

VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt8237 (rev 00) IDE UDMA133 controller on pci0000:00:0f.1
    ide0: BM-DMA at 0xfc00-0xfc07, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xfc08-0xfc0f, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide0...
Switched to high resolution mode on CPU 0
Switched to high resolution mode on CPU 1
hda: ST380011A, ATA DISK drive
hda: selected mode 0x45
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: ST380011A, ATA DISK drive
hdc: selected mode 0x45
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 512KiB
hda: Host Protected Area detected.
        current capacity is 156299375 sectors (80025 MB)
        native  capacity is 156301488 sectors (80026 MB)
hda: Host Protected Area disabled.
hda: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=16383/255/63, UDMA(100)
hda: cache flushes supported
 hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 >
hdc: max request size: 512KiB
hdc: Host Protected Area detected.
        current capacity is 156299375 sectors (80025 MB)
        native  capacity is 156301488 sectors (80026 MB)
hdc: Host Protected Area disabled.
hdc: 156301488 sectors (80026 MB) w/2048KiB Cache, CHS=16383/255/63, UDMA(100)
hdc: cache flushes supported
 hdc: hdc1 hdc2 hdc3 hdc4 < hdc5 hdc6 hdc7 >
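
The `selected mode 0x45` lines can be decoded by hand: in the ATA transfer-mode encoding the upper bits give the mode family (0x40 for Ultra DMA, 0x20 for multiword DMA, 0x08 for PIO) and the low three bits give the mode number. A small Python sketch of that decoding (the helper name `decode_xfer_mode` is hypothetical, and only these three families are handled):

```python
# Nominal maximum burst rate per Ultra DMA mode, in MB/s.
UDMA_MB_PER_S = [16.7, 25.0, 33.3, 44.4, 66.7, 100.0, 133.0]

def decode_xfer_mode(mode):
    """Translate an ATA transfer-mode byte into a name like 'udma5'."""
    family = mode & 0xF8   # upper five bits: mode family
    number = mode & 0x07   # lower three bits: mode number
    if family == 0x40:
        return f"udma{number}"
    if family == 0x20:
        return f"mdma{number}"
    if family == 0x08:
        return f"pio{number}"
    return f"unknown(0x{mode:02x})"

for mode in (0x45, 0x44):
    name = decode_xfer_mode(mode)
    rate = UDMA_MB_PER_S[mode & 0x07]
    print(f"0x{mode:02x} -> {name} (about {rate} MB/s)")
```

So both drives were initially set to udma5, which matches the UDMA(100) shown for each drive in the same log.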


No failure so far. After turning the RAID on and mounting it, I get:

Code:

raid1: raid set md0 active with 2 out of 2 mirrors
md: ... autorun DONE.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 292k freed
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
ide0: reset: success
EXT3 FS on hda7, internal journal


So as you can see, it starts with udma5. While or just before mounting hda7, it disables DMA and then switches to udma4. If I try to change that manually with hdparm, I get the same errors again in dmesg and it automatically changes back to udma4.
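
The status and error bytes in the dma_intr lines are bit fields. A rough Python sketch of decoding them, using the names the kernel prints in the log above; the bit-to-name tables are written from the ATA register layout as commonly described and should be treated as an assumption, not as the kernel's actual code:

```python
# ATA status register bits (assumed mapping, names as printed by the IDE driver).
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# ATA error register bits (assumed mapping). Bit 0x80 is labelled BadCRC here
# because that is how the log shows it when the abort bit (0x04) is also set.
ERROR_BITS = [
    (0x80, "BadCRC"),
    (0x40, "UncorrectableError"),
    (0x20, "MediaChanged"),
    (0x10, "SectorIdNotFound"),
    (0x08, "MediaChangeRequested"),
    (0x04, "DriveStatusError"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]

def decode_bits(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in table if value & mask]

print("status=0x51:", decode_bits(0x51, STATUS_BITS))
print("error=0x84: ", decode_bits(0x84, ERROR_BITS))
```

BadCRC means the CRC check on a UDMA transfer between controller and drive failed. It may be worth noting that hda also shows a UDMA_CRC_Error_Count raw value of 74 in the smartctl table, while hdc shows 0.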

The hard disk(s) are running pretty fine anyway. If I run e2fsck on them (read-only), I get the following:

root partition - hda7
Code:

Pass 1: Checking inodes, blocks, and sizes
Deleted inode 1224228 has zero dtime.  Fix? no

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -(2469935--2469937) -(2470117--2470118)
Fix? no

Inode bitmap differences:  -1224228
Fix? no


/usr - hda5
Code:

Pass 1: Checking inodes, blocks, and sizes
Deleted inode 194639 has zero dtime.  Fix? no

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -(418059--418096)
Fix? no

Inode bitmap differences:  -194639
Fix? no


hdc3 (on hdc)
Code:

Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (817363, counted=817362).
Fix? no


hdc7 (not in use atm)
Code:

Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
/lost+found not found.  Create? no

Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (619015, counted=1070784).
Fix? no

Free inodes count wrong (521339, counted=611636).
Fix? no


I checked the RAID device too; there are no filesystem errors. So I'm not seeing anything 'critical'.

Does anyone have an idea about this? I mean: is the hard disk broken, should I replace it, will it die in a few weeks (I have had these errors for a while now and nothing has happened), or is this behavior correctable and maybe chipset/kernel related... ANY ideas?

wdp
Posted: Sat Jan 12, 2008 4:44 am    Post subject: smartctl

Ah well, and here is the health check:

Code:

root@irulan /home # smartctl -H /dev/hda           
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

root@irulan /home # smartctl -H /dev/hdc
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED


Seems okay, too.

wdp
Posted: Sat Jan 12, 2008 5:30 am    Post subject: and again teh me

smartctl --test=long /dev/hda
Code:

# 1  Extended offline    Completed without error       00%      9158         -
# 2  Extended offline    Aborted by host               90%      9157         -
# 3  Short offline       Completed without error       00%      5208         -


Entry # 2 is there because I aborted the check. Anyway, as you can see, there's no problem here either.
_________________
Noooo! Kitty!!! not the penguin!! go take another bird

wdp
Posted: Sat Jan 12, 2008 3:24 pm

JFYI:

el_angelo mentioned that it could be the chipset. He had similar problems for 2 years and nothing happened. So it could be 'safe' (more or less) to just ignore these "errors".

Anyway.