Recently I ran into a similar failure: Bug 624293 - XFS internal error / mount: Structure needs cleaning.
The container engine failed to start, with /home/robot/docker reporting "structure needs cleaning"; the Linux system log showed the same error.
First, ask yourself: why (why) does "structure needs cleaning" appear? When (when) does it appear? How (how) do you recover the environment?
Try to repair: first, attempt a repair.
[root@scheat tmp]# xfs_check /dev/vdb
xfs_check: cannot init perag data (117)
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_check. If you are unable to mount the filesystem, then use the xfs_repair -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.
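The recovery order this error message recommends can be sketched as follows (a sketch only, assuming the affected device is /dev/vdb and a spare mount point /mnt exists; run it against an unmounted filesystem):

```shell
# 1) Preferred: mount the filesystem so the kernel replays the XFS log,
#    then unmount it cleanly.
mount /dev/vdb /mnt && umount /mnt

# 2) Re-check; -n is a dry run that makes no changes to the filesystem.
xfs_repair -n /dev/vdb

# 3) Last resort only: -L zeroes the log and discards in-flight metadata
#    updates, which can itself cause corruption.
xfs_repair -L /dev/vdb
```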
[root@scheat tmp]# xfs_repair /dev/vdb
Phase 1 - find and verify superblock...
Phase 2 - using internal log
? ? ? ? - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.
[root@scheat tmp]#
xfs_metadump -g /dev/vdb ./dev-vdb.dump
xfs_metadump: cannot init perag data (117)
Copying log
[root@scheat tmp]#
Nothing helped.
Going forward with the next step, the -L repair:
xfs_repair -L /dev/vdb
Lots of errors!
Timeline of the problem:
- Everything went fine while I was installing a new virtual fileserver.
- The host has a 3Ware controller in it:
I have a 3Ware 9690SA-8I controller with 4 x 2TB disks (RAID 10 for data) and 2 x 320GB (for the OS).
Then I rebooted to clean the system and check that all was OK. At that point one disk disappeared from the RAID 10, most likely because I had not set it to the fixed link speed of 1.5 Gbps. I then rebuilt the array, but I couldn't mount it because of metadata problems!
I also saw the message:
Aug 15 20:30:05 scheat kernel: Filesystem "vdb": Disabling barriers, trial barrier write failed
Did these filesystem problems happen only because of the disappeared disk and the wrong link speed, or do I need to change something else?
Thanks for the help.
The array controller should be taking care of any data integrity problems.
Background
Q: What is the problem with the write cache on journaled filesystems?
Many drives use a write back cache in order to speed up the performance of writes. However, there are conditions such as power failure when the write cache memory is never flushed to the actual disk. Further, the drive can destage data from the write cache to the platters in any order that it chooses. This causes problems for XFS and journaled filesystems in general because they rely on knowing when a write has completed to the disk. They need to know that the log information has made it to disk before allowing metadata to go to disk. When the metadata makes it to disk then the transaction can effectively be deleted from the log, resulting in movement of the tail of the log and thus freeing up some log space. So if the writes never make it to the physical disk, then the ordering is violated and the log and metadata can be lost, resulting in filesystem corruption.
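As an aside, "knowing when a write has completed to the disk" is exactly what a cache-flush request provides. A minimal, generic illustration (not XFS-specific) using GNU dd, whose conv=fsync flag calls fsync(2) on the output file and thereby asks the drive to flush its volatile write cache before dd exits:

```shell
# Write one 4 KiB block; conv=fsync ensures dd does not return until the
# data has been fsync'ed, i.e. a flush has been requested of the drive.
dd if=/dev/zero of=/tmp/logblock bs=4k count=1 conv=fsync
```

Only after a flush like this completes may a journaled filesystem safely let the dependent metadata writes proceed.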
With hard disk cache sizes of currently (Jan 2009) up to 32MB, that can be a lot of valuable information. In a RAID with 8 such disks this adds up to 256MB, and the chance of having filesystem metadata in the cache is so high that you have a very high chance of big data losses on a power outage. In short: the bigger the drive caches, the greater the chance of losing data.
With a single hard disk and barriers turned on (on=default), the drive write cache is flushed before and after a barrier is issued. A power failure "only" loses data in the cache, but no essential ordering is violated, and corruption will not occur.
With a RAID controller with battery backed controller cache and cache in write back mode, you should turn off barriers - they are unnecessary in this case, and if the controller honors the cache flushes, it will be harmful to performance. But then you *must* disable the individual hard disk write cache in order to ensure to keep the filesystem intact after a power failure. The method for doing this is different for each RAID controller. See the section about RAID controllers below.
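A sketch of the two configurations described above, with placeholder device names; note that the `nobarrier` mount option only applies to older kernels (XFS removed it around Linux 4.19, where barriers are always handled automatically):

```shell
# Battery-backed RAID controller in write-back mode: barriers are redundant.
mount -o nobarrier /dev/vdb /data

# ...but then the individual drives' volatile write caches must be off.
# For a directly attached disk that would be:
hdparm -W0 /dev/sda
# Behind a hardware RAID controller, hdparm cannot reach the member disks;
# the controller's own CLI (e.g. tw_cli for 3Ware) must be used instead.
```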
The problem becomes clearer
That's clear; I already mentioned that the controller may have triggered the problem.
But that night I got another XFS internal error during an rsync job:
----Once again, that is not directory block data that is being dumped there. It looks like a partial path name ("/Pm.Reduzieren/S"), which tends to indicate that the directory read has returned uninitialised data.
Did the filesystem repair cleanly? If you run xfs_repair a second time, did it find more errors or was it clean? I.e. is this still corruption left over from the original incident, or is it new corruption?
----The filesystem repair did work fine; all was OK. The second error was a new problem.
LSI / 3Ware are now replacing the controller, the BBU board, and also the battery, because they don't know what happened.
There were no problems on the host.
I have now disabled the write cache according to the FAQ: /cX/uX set cache=off
But I am not sure how to disable the individual hard disk cache.
The final reveal
File system errors can be a little tricky to narrow down. In some of the rarer cases a drive might be writing out bad data. However, per the logs I didn't see any indication of a drive problem, and not one drive has reallocated a sector. I see that all four are running at the 1.5Gb/s link speed now.
Sometimes the problem can be traced back to the controller and/or the BBU. I did notice something pretty interesting in the driver message log and the controller's advanced diagnostics.
According to the driver message log, the last Health Check [capacity test] was done on Aug 10th:
Aug 10 21:40:35 enif kernel: 3w-9xxx: scsi6: AEN: INFO (0x04:0x0051): Battery health check started:.
However, the controller's advanced log shows this:
/c6/bbu Last Capacity Test? ? ? ? = 10-Jul-2010
There is an issue between the controller and the BBU, and we need to understand which component is at fault. If this is a live server, you may want to replace both components. Or, if you can perform some troubleshooting, power the system down and remove the BBU and its daughter PCB from the RAID controller. Then ensure the write cache setting remains enabled and see if there is a reoccurrence. If so, the controller is bad. If not, it's the BBU that we need to replace.
Just for information: the problem was a bug in the virtio driver with disks over 2 TB!
Bug605757 - 2tb virtio disk gets massively corrupted filesystems
*** This bug has been marked as a duplicate of bug 605757 ***