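Checking Disks with the smartctl Command
During troubleshooting (TS) you will often run into disk faults that affect system data or production data, yet it is not always possible to tell whether the problem comes from physical damage to the disk or from a poor connection on the disk's SATA port. In these cases smartctl makes it easy to determine whether the disk is physically damaged, so that a warning can be raised in time.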
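Running SMART Tests with smartctl
Virtually all modern hard disks can monitor their current state through SMART attributes. These values provide information about various disk parameters and can indicate the remaining life of the disk or any possible errors. In addition, various SMART tests can be run to detect hardware problems on the disk. This article describes how to run such tests on Linux with smartctl (Smartmontools).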
Installing Smartmontools
Smartmontools can be installed directly with yum:
# yum install -y smartmontools
To make sure the disk supports SMART and that it is enabled, use the following command (the disk in this example is /dev/sda):
# smartctl -i /dev/sda
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-3.10.0-514.el7.x86_64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: HGST HUS726030ALE610
Serial Number: N8GH1H5Y
LU WWN Device Id: 5 000cca 244c6d728
Firmware Version: APBDT7JN
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Oct 11 07:09:17 2017 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
In the output above, the line "SMART support is: Available - device has SMART capability." shows that the disk supports SMART, and the line "SMART support is: Enabled" shows that SMART is turned on.
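The same check can be scripted. The following is a minimal sketch, not part of Smartmontools: the function name smart_support_state is made up for illustration, and it assumes the two "SMART support is:" lines look like the ones shown above. It reads the output of smartctl -i from standard input.

```shell
# Sketch: classify SMART support from `smartctl -i` output on stdin.
# The function name is illustrative only.
smart_support_state() {
    awk '
        /^SMART support is: *Available/ { available = 1 }
        /^SMART support is: *Enabled/   { enabled = 1 }
        END {
            if (enabled)        print "enabled"
            else if (available) print "available-but-disabled"
            else                print "unsupported"
        }'
}

# Example (needs root and a real device):
#   smartctl -i /dev/sda | smart_support_state
```

If the result is available-but-disabled, SMART can be switched on with smartctl -s on /dev/sda.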
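Available Tests
Depending on the specification type, SMART provides two different tests for ATA and SCSI devices. Each of these tests can be run in one of two modes:
- Foreground mode
- Background mode
In background mode the test runs at low priority, which means the disk continues to process normal commands. If the drive is busy, the test is paused and then continues at a reduced load, so normal operation is not interrupted.
In foreground mode, every command issued during the test is answered with a "CHECK CONDITION" status. This mode is therefore recommended only when the disk is not in use.
As a rule, background mode is the preferred mode.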
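ATA / SCSI Tests
Short Test
The goal of the short test is to identify a defective drive quickly, so its maximum run time is 2 minutes. The test checks the disk in three separate segments, covering the following areas:
- Electrical properties: the controller tests its own electronics. Because this is specific to each manufacturer, it is not possible to say exactly what is tested; plausible examples are the internal RAM, the read/write circuitry, or the head electronics.
- Mechanical properties: the servo system and the positioning mechanism are tested; the exact sequence is also specific to each manufacturer.
- Read/verify: a certain area of the disk is read and some of the data is verified; the size and position of the area read are likewise manufacturer-specific.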
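Long Test
The long test was designed as the final test in production. It is the same as the short test, with two differences: there is no time limit, and the read/verify segment checks the entire disk rather than just one section. The long test can be used, for example, to confirm the result of a short test.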
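ATA-Specific Tests
All tests listed here apply only to ATA drives.
Conveyance Test
This test can be run to identify, within a few minutes, damage that occurred while the disk was being transported.
Selective Test
During a selective test, a specified range of logical blocks (LBAs) is checked. The blocks to scan are specified in the following format: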
# smartctl -t select,10-20 /dev/sda ## LBA 10 to LBA 20 (incl.)
# smartctl -t select,10+11 /dev/sda ## 11 LBAs starting at LBA 10, i.e. LBA 10 to LBA 20 (incl.)
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-358.el6.x86_64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Selective self-test routine immediately in off-line mode".
SPAN STARTING_LBA ENDING_LBA
0 10 20
Drive command "Execute SMART Selective self-test routine immediately in off-line mode" successful.
Testing has begun.
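The two span notations used in the commands above describe the same blocks: START-END names the first and last LBA, while START+SIZE names the first LBA and the number of blocks, so the last LBA is START + SIZE - 1. A small sketch of that arithmetic follows; the helper name span_to_range is hypothetical and not part of smartctl.

```shell
# Sketch: convert a selective-test span from START+SIZE form to START-END form.
# span_to_range is an illustrative helper, not a smartctl feature.
span_to_range() {
    start=${1%%+*}                      # text before the "+"
    size=${1#*+}                        # text after the "+"
    echo "${start}-$((start + size - 1))"
}

# Example:
#   span_to_range 10+11
```

For the span 10+11 this yields 10-20, which matches the STARTING_LBA and ENDING_LBA columns in the output above.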
Several ranges (up to 5) can also be scanned:
# smartctl -t select,0-10 -t select,10-20 /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-358.el6.x86_64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Selective self-test routine immediately in off-line mode".
SPAN STARTING_LBA ENDING_LBA
0 0 10
1 10 20
Drive command "Execute SMART Selective self-test routine immediately in off-line mode" successful.
Testing has begun.
Test Procedure with smartctl
Before running a test, use the following command to display the approximate duration of each test:
# smartctl -c /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-358.el6.x86_64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
General SMART Values:
Offline data collection status: (0x02) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x7d) SMART execute Offline immediate.
No Auto Offline data collection support.
Abort Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 48) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x0025) SCT Status supported.
SCT Data Table supported.
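When only the durations are of interest, the "recommended polling time" lines can be pulled out of this output. The sketch below assumes the two-line layout shown above (the routine name on one line, the polling time on the next); the function name polling_times is made up for illustration.

```shell
# Sketch: list "<routine> <minutes>" pairs from `smartctl -c` output on stdin.
# Assumes the routine name line is directly followed by its polling-time line.
polling_times() {
    awk '
        /^[A-Za-z]+ self-test routine/ { name = $1 }
        /recommended polling time:/ {
            line = $0
            sub(/^[^(]*\(/, "", line)    # drop everything up to and including "("
            sub(/\).*$/, "", line)       # drop ")" and everything after it
            gsub(/[ \t]/, "", line)      # strip padding inside the parentheses
            print name, line
        }'
}

# Example (needs root and a real device):
#   smartctl -c /dev/sda | polling_times
```

For the output above this would list Short 1, Extended 48 and Conveyance 2, all in minutes.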
The following command starts the desired test (in background mode):
# smartctl -t <short|long|conveyance|select> /dev/sda
An "offline" test can also be run. However, it only performs the standard self-test (Short Test).
Sample output:
# smartctl -t short /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-358.el6.x86_64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Wed Oct 11 15:40:28 2017
Use smartctl -X to abort test.
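Because a background test returns immediately, a script has to wait for it to finish before reading the result. One way to do that is sketched below. It rests on an assumption: that smartctl -c reports "Self-test routine in progress" in the "Self-test execution status" field while an ATA self-test is running. Please verify that wording against your own smartctl version. The function name selftest_in_progress is illustrative.

```shell
# Sketch: succeed (exit 0) if `smartctl -c` output on stdin shows a running self-test.
# Assumption: the status text contains "Self-test routine in progress".
selftest_in_progress() {
    grep -q 'Self-test routine in progress'
}

# Example polling loop (needs root and a real device):
#   smartctl -t short /dev/sda
#   while smartctl -c /dev/sda | selftest_in_progress; do
#       sleep 30
#   done
```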
To run a test in foreground mode, add "-C" to the command:
# smartctl -t <short|long|conveyance|select> -C /dev/sda
Viewing the Test Results
In general, the test results are included in the output of the following command:
# smartctl -a /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-358.el6.x86_64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: KINGSTON SV300S37A60G
Serial Number: 50026B724A01E182
LU WWN Device Id: 5 0026b7 24a01e182
Firmware Version: 580ABBF0
User Capacity: 60,022,480,896 bytes [60.0 GB]
Sector Size: 512 bytes logical/physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ACS-2 revision 3
Local Time is: Wed Oct 11 15:41:49 2017 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x02) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x7d) SMART execute Offline immediate.
No Auto Offline data collection support.
Abort Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 48) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x0025) SCT Status supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0032 095 095 050 Old_age Always - 8601004262
5 Reallocated_Sector_Ct 0x0033 099 099 003 Pre-fail Always - 0
9 Power_On_Hours 0x0032 085 085 000 Old_age Always - 145066815403100
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 120
171 Unknown_Attribute 0x000a 100 100 000 Old_age Always - 0
172 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
174 Unknown_Attribute 0x0030 000 000 000 Old_age Offline - 97
177 Wear_Leveling_Count 0x0000 000 000 000 Old_age Offline - 96
181 Program_Fail_Cnt_Total 0x000a 100 100 000 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 000 Old_age Always - 0
187 Reported_Uncorrect 0x0012 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x0000 029 041 000 Old_age Offline - 77312098333
194 Temperature_Celsius 0x0022 029 041 000 Old_age Always - 29 (Min/Max 18/41)
195 Hardware_ECC_Recovered 0x001c 102 102 000 Old_age Offline - 8601004262
196 Reallocated_Event_Count 0x0033 099 099 003 Pre-fail Always - 0
201 Soft_Read_Error_Rate 0x001c 102 102 000 Old_age Offline - 8601004262
204 Soft_ECC_Correction 0x001c 102 102 000 Old_age Offline - 8601004262
230 Head_Amplitude 0x0013 100 100 000 Pre-fail Always - 100
231 Temperature_Celsius 0x0013 091 091 010 Pre-fail Always - 0
233 Media_Wearout_Indicator 0x0032 000 000 000 Old_age Always - 9261
234 Unknown_Attribute 0x0032 000 000 000 Old_age Always - 14820
241 Total_LBAs_Written 0x0032 000 000 000 Old_age Always - 14820
242 Total_LBAs_Read 0x0032 000 000 000 Old_age Always - 6033
SMART Error Log not supported
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 13404 -
# 2 Selective offline Completed without error 00% 13404 -
# 3 Selective offline Completed without error 00% 13404 -
# 4 Selective offline Completed without error 00% 13404 -
# 5 Short offline Completed without error 00% 13403 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 10 Not_testing
2 10 20 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
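In the attribute table, the normalized VALUE is compared against THRESH, and an attribute is considered failed once VALUE drops to or below a non-zero THRESH. The sketch below scans smartctl -a output for such rows. It assumes the ten-column layout shown above and treats a THRESH of 000 as "no threshold"; the function name failing_attributes is hypothetical.

```shell
# Sketch: print "ID NAME VALUE THRESH" for attributes whose normalized value
# is at or below a non-zero threshold. Reads `smartctl -a` output on stdin.
failing_attributes() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 10 {     # attribute rows start with a numeric ID
            value  = $4 + 0
            thresh = $6 + 0
            if (thresh > 0 && value <= thresh)
                print $1, $2, value, thresh
        }'
}

# Example (needs root and a real device):
#   smartctl -a /dev/sda | failing_attributes
```

For the table above this prints nothing, since every attribute with a non-zero threshold (IDs 1, 5, 196 and 231) is well above it.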
To display only the test results, the following command can be used instead:
# smartctl -l selftest /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-358.el6.x86_64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 13404 -
# 2 Selective offline Completed without error 00% 13404 -
# 3 Selective offline Completed without error 00% 13404 -
# 4 Selective offline Completed without error 00% 13404 -
# 5 Short offline Completed without error 00% 13403 -
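For monitoring, it can be handy to reduce this log to a single number. The following sketch counts log entries whose status is anything other than "Completed without error". It assumes that entry rows begin with "#", as in the output above. Note that aborted or still-running tests are counted as well. The function name selftest_failures is illustrative.

```shell
# Sketch: count self-test log entries that did not complete without error.
# Reads `smartctl -l selftest` output on stdin.
selftest_failures() {
    awk '/^#/ && !/Completed without error/ { n++ } END { print n + 0 }'
}

# Example (needs root and a real device):
#   smartctl -l selftest /dev/sda | selftest_failures
```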
Displaying the Overall Health Status of the Disk
# smartctl -H /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-358.el6.x86_64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED ## the self-assessment passed; the disk reports itself as healthy
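Besides the PASSED text, smartctl also reports its findings through its exit status, which is a bit mask documented in the smartctl man page. The sketch below decodes that mask into readable lines. The function name decode_smartctl_status is made up for illustration, and the bit descriptions are abbreviated from the man page.

```shell
# Sketch: decode the smartctl exit status bit mask (bits 0-7).
decode_smartctl_status() {
    status=$1
    bit=0
    for meaning in \
        "command line did not parse" \
        "device could not be opened" \
        "a SMART command failed or returned a checksum error" \
        "SMART status check returned DISK FAILING" \
        "prefail attributes at or below threshold" \
        "attributes at or below threshold at some time in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $meaning"
        fi
        bit=$((bit + 1))
    done
    return 0
}

# Example (needs root and a real device):
#   smartctl -H /dev/sda; decode_smartctl_status $?
```

An exit status of 0 produces no output, which corresponds to a run in which none of these conditions was detected.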
Viewing the Disk Error Log
# smartctl -l error /dev/sdb
Sample output:
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-32-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
ATA Error Count: 1
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 1 occurred at disk power-on lifetime: 10042 hours (418 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 00 00 00 a0
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ec 00 00 00 00 00 a0 08 35d+16:09:47.709 IDENTIFY DEVICE
ec 00 00 00 00 00 a0 08 35d+16:09:42.437 IDENTIFY DEVICE
ec 00 00 00 00 00 a0 08 35d+16:09:42.159 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 35d+16:09:42.159 SET FEATURES [Set transfer mode]
ec 00 00 00 00 00 a0 08 35d+16:09:42.159 IDENTIFY DEVICE
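For a quick check, the number of logged errors can be extracted from the "ATA Error Count:" line. This is a minimal sketch with an illustrative function name (ata_error_count). It prints 0 when the line is absent, on the assumption that a missing line means no errors were logged.

```shell
# Sketch: print the ATA error count from `smartctl -l error` output on stdin.
# Prints 0 if no "ATA Error Count:" line is present.
ata_error_count() {
    awk -F': *' '
        /^ATA Error Count:/ { print $2 + 0; found = 1 }
        END { if (!found) print 0 }'
}

# Example (needs root and a real device):
#   smartctl -l error /dev/sdb | ata_error_count
```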
Troubleshooting Procedure
First check the disk's health status with smartctl -H /dev/sda, then view the detailed disk information with smartctl -a /dev/sda. Next, run a short test with smartctl -t short /dev/sda and view the result with smartctl -l selftest /dev/sda; this is usually enough to establish the basic health of the disk. Finally, check the disk error log with smartctl -l error /dev/sda.
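The sequence described here can be wrapped in a small script. What follows is only a sketch under a few assumptions: the function name check_disk and the SMARTCTL variable are made up for illustration, and the default wait of 120 seconds comes from the 2-minute maximum stated for the short test. The script deliberately does not stop on non-zero exit codes, because smartctl uses its exit status to report findings.

```shell
# Sketch: run the checks from this article in order against one disk.
# SMARTCTL can be overridden, e.g. to dry-run the sequence with a stub.
SMARTCTL=${SMARTCTL:-smartctl}

check_disk() {
    dev=$1
    wait_seconds=${2:-120}             # short test: at most 2 minutes
    "$SMARTCTL" -H "$dev"              # 1. overall health status
    "$SMARTCTL" -a "$dev"              # 2. detailed information
    "$SMARTCTL" -t short "$dev"        # 3. start a short test (background mode)
    sleep "$wait_seconds"              #    give the test time to finish
    "$SMARTCTL" -l selftest "$dev"     # 4. self-test results
    "$SMARTCTL" -l error "$dev"        # 5. error log
}

# Example (needs root and a real device):
#   check_disk /dev/sda
```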