最近遇到一個使用hbck -repair修復(fù)HBase元數(shù)據(jù)遇到的坑,被坑得不清测秸,周末都沒休息好匀归,時間全花在給hbck填坑了舌镶。今天抽時間在測試環(huán)境重現(xiàn)了這個bug阱扬,總算把整個流程搞明白了幽污,記錄下來框咙,分享給技術(shù)的朋友們咕痛。
詳細(xì)的排查過程如下:
一、問題出現(xiàn)
最近備集群春節(jié)后縮容下架大量的機(jī)器喇嘱,由于緩存問題導(dǎo)致集群不穩(wěn)定(這個問題后面會專門寫文章來分析)茉贡,因此在22號凌晨的時候?qū)浞菁鹤隽酥貑ⅲ貑⒑蠹赫M癯啤5堑诙炜槠停瑐浼汗芾淼耐路答佌f接口成功率比較低,日志反饋是有region沒上線引起王暗,95c85b3886f19a0637a3b8b7749755e1 not deployed on any region server
負(fù)責(zé)的同事使用如下hbase shell命令進(jìn)行上線操作:
assign '95c85b3886f19a0637a3b8b7749755e1'
但是無法上線成功悔据,于是聯(lián)系我上來一起跟進(jìn)。
二俗壹、開始排查
1科汗、首先查看master的日志,看看有什么報錯信息
具體信息如下日志所示:
master.HMaster: Client=hbaseadmin//ip_address1 assign tablename1_201707,745,1553316101646.95c85b3886f19a0637a3b8b7749755e1.
master.AssignmentManager: Force region state offline {95c85b3886f19a0637a3b8b7749755e1 state=OFFLINE, ts=1553400041143,server=null}
master.AssignmentManager: Assigning tablename1_201707,745,1553316101646.95c85b3886f19a0637a3b8b7749755e1. to 9.7.155.40,60020,1553187654611
2019-03-24 12:00:41,146 INFO [FifoRpcScheduler.handler1-thread-80] master.RegionStates: Transition {95c85b3886f19a0637a3b8b7749755e1 state=OFFLINE, ts=1553400041144, server=null} to {95c85b3886f19a06
37a3b8b7749755e1 state=PENDING_OPEN, ts=1553400041146, server=9ip_address2,60020,1553187654611}
.......此處省略1000個字
zookeeper.ZKAssign: master:60000-0x162d87d74bf0b8a, quorum=cft-history-db-ss-hbase-zk-4:2181,cft-history-db-ss-hbase-zk-3:2181,cft-history-db-ss
-hbase-zk-2:2181,cft-history-db-ss-hbase-zk-1:2181,cft-history-db-ss-hbase-zk-5:2181, baseZNode=/hbase_cft_history_db_ss Creating (or updating) unassigned node 95c85b3886f19a0637a3b8b7749755e1 with OFFLINE state
從上面的日志中可以看出绷雏,這個region嘗試在9ip_address2這臺regionserver上進(jìn)行上線操作头滔,但是不成功,后面這個region又嘗試在多臺機(jī)器上嘗試上線涎显,均失敗了坤检。從這些日志中無法快速定位出region無法上線的原因。
2期吓、查看regionserver的日志找到region不能上線的直接原因
繼續(xù)查看9ip_address2這臺regionserver的日志早歇,發(fā)現(xiàn)了region不能上線的原因,具體日志如下:
ERROR [RS_OPEN_REGION-ip_address4:60020-6] regionserver.HRegion: Could not initialize all stores for the region=tablename1_201707,745,1553316101646.95c85b3886f19a0637a3b8b7749755e1.
ERROR [RS_OPEN_REGION-ip_address4:60020-6] handler.OpenRegionHandler: Failed open of region=tablename1_201707,745,1553316101646.95c85b3886f19a0637a3b8b7749755e1., starting to roll back the global memstore size.
java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File does not exist: /hbase/data/default/tablename1_201707/7c89f0da4b54a4cc168fe2284e276bb3/i/0a0bed4104a24af280c69e6b67b9501c_SeqId_2_
從上面日志可以看出,95c85b3886f19a0637a3b8b7749755e1上線的時候會去讀取7c89f0da4b54a4cc168fe2284e276bb3下面的0a0bed4104a24af280c69e6b67b9501c_SeqId_2_這個hfile文件箭跳,而這個文件不存在晨另,導(dǎo)致失敗。
在HBase中谱姓,一般是region有做分裂后借尿,又還沒來得及做major_compact,導(dǎo)致子region中存在指向鏈接文件屉来,這樣才會去讀取另外一個region的信息路翻,使用hdfs命令直接查看該region下面的HFile文件,證實(shí)猜想是正確的奶躯,如下文件列表所示:
/hbase/data/default/tablename1_201707/95c85b3886f19a0637a3b8b7749755e1/i/0a0bed4104a24af280c69e6b67b9501c_SeqId_2_
/hbase/data/default/tablename1_201707/95c85b3886f19a0637a3b8b7749755e1/i/0a0bed4104a24af280c69e6b67b9501c_SeqId_2_.7c89f0da4b54a4cc168fe2284e276bb3
這個就是鏈接文件了
備注:為了控制篇幅帚桩,這里只顯示兩個相關(guān)的文件,并把文件的權(quán)限嘹黔、屬組和屬主都去掉了账嚎,另外把IP和表名相關(guān)信息也替換掉了。
但是這又著實(shí)很奇怪儡蔓,region分裂后郭蕉,子region的鏈接文件的命名規(guī)則如下:
但是,仔細(xì)看上面的HFile的鏈接文件和文件就會發(fā)現(xiàn)喂江,鏈接文件的前半部分的父region的文件名和本Region的文件名一模一樣召锈。這太奇怪了,正常是不可能文件名一模一樣的获询。
3涨岁、跟蹤父region的變動情況
于是,繼續(xù)通過查看master日志跟蹤7c89f0da4b54a4cc168fe2284e276bb3這個父region的相關(guān)分裂日志吉嚣,如下:
Handled SPLIT event; parent=tablename1_201707,745,1498621284887.7c89f0da4b54a4cc168fe2284e276bb3., daughter a=tablename1_201707,745,1552927092514.ad1d42c37c6f13d9bccc9944fd8740ea., daughter b=tablename1_201707,745498567#2017-07-2710:00:17#1991763146,1552927092514.c35511f97dccac11ea9de6aebf467cb4., on 9.7.157.9,60020,1534514346802
從日志中可以看出梢薪,7c89f0da4b54a4cc168fe2284e276bb3通過split操作分裂為了ad1d42c37c6f13d9bccc9944fd8740ea和c35511f97dccac11ea9de6aebf467cb4兩個region,根本就沒有95c85b3886f19a0637a3b8b7749755e1什么事尝哆。繼續(xù)查看日志秉撇,發(fā)現(xiàn)如下異常日志:
master.HMaster: Client=hbaseadmin//ip_address3 offline tablename1_201707,745,1498621284887.7c89f0da4b54a4cc168fe2284e276bb3.
#本來是下線狀態(tài)的母region又被下線了一次
master.CatalogJanitor: Daughter region does not exist: ad1d42c37c6f13d9bccc9944fd8740ea, parent is: 7c89f0da4b54a4cc168fe2284e276bb3
#子region不知道被清理掉了
regionserver.HRegionFileSystem: Trying to open a region that do not exists on disk: hdfs://hbase-ns-cft-history-db-ss/hbase/data/default/tablename1_201707/c35511f97dccac11ea9de6aebf467cb4
master.CatalogJanitor: Daughter region does not exist: c35511f97dccac11ea9de6aebf467cb4, parent is: 7c89f0da4b54a4cc168fe2284e276bb3
#另一個子region也不知道被清理掉了
master.CatalogJanitor: Deleting region tablename1_201707,745,1498621284887.7c89f0da4b54a4cc168fe2284e276bb3. because daughter splits no longer hold references
Region directory (hdfs://cluster1/hbase/data/default/tablename1_201707/7c89f0da4b54a4cc168fe2284e276bb3) was empty, just deleting and returning!
catalog.MetaEditor: Deleted tablename1_201707,745,1498621284887.7c89f0da4b54a4cc168fe2284e276bb3.
#master檢查到母region上沒有任何的鏈接文件后,直接清理掉了秋泄。
備注:這里日志去掉了時間和部分信息琐馆,只保留了核心的日志!
從上面日志看出7c89f0da4b54a4cc168fe2284e276bb3被來自ip_address3的IP強(qiáng)行下線恒序,并且master發(fā)現(xiàn)這個region的兩個子region的文件也不見了瘦麸。master發(fā)現(xiàn)母region沒有了引用鏈接后,也被清理掉了歧胁。因此前面95c85b3886f19a0637a3b8b7749755e1去讀取鏈接中指向的母region的HFile文件不存在滋饲,導(dǎo)致region上線異常彤敛。
三、排查和解決問題
此時了赌,有點(diǎn)一頭霧水,但是有個關(guān)鍵的點(diǎn)是c89f0da4b54a4cc168fe2284e276bb3是直接被ip_address3這個IP的客戶端下線的玄糟,一定是有什么操作觸發(fā)了異常的操作勿她。于是找管理備集群的同事確認(rèn),集群管理的同事反饋沒有執(zhí)行任何下線的操作阵翎,只是執(zhí)行了hbck -repair操作逢并,-repair 打開所有的修復(fù)選項(xiàng),相當(dāng)于-fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans -fixHdfsOverlaps -fixVersionFile -sidelineBigOverlaps郭卫。初步懷疑是hbck執(zhí)行repair的時候做了“壞事”砍聊。
想快速修復(fù)線上問題,想首先確認(rèn)一下95c85b3886f19a0637a3b8b7749755e1這個region是什么時候創(chuàng)建的贰军,搜索了master的日志玻蝌,發(fā)現(xiàn)沒有任何的記錄。繼續(xù)觀察95c85b3886f19a0637a3b8b7749755e1這個region中的文件词疼,由于鏈接文件和region中的HFile文件名稱相同俯树,并且HFile文件創(chuàng)建的時候還是2017年的,想到一個快速上線region的思路贰盗。
本來Region的鏈接文件就是去讀取這個文件许饿,這個文件在自己的region下面,那就不需要通過鏈接文件去讀取了舵盈。因此解決辦法就是講鏈接文件移走陋率,在上線即可,操作如下:
hadoop fs -mv link_filename backupdir
assign 'regionname'
操作完成后秽晚,接口成功率就上去了:
至此瓦糟,問題得到解決,但是還是有三個地方有疑惑:
1爆惧、95c85b3886f19a0637a3b8b7749755e1是什么時候創(chuàng)建的狸页?(初步懷疑是hbck搞的鬼,待驗(yàn)證)
2扯再、95c85b3886f19a0637a3b8b7749755e1上的HFile文件是怎么來的芍耘?(初步猜想是hbck從其他三個region拷貝過來的)
3、為什么hbck會去下線正常的子region熄阻?
這些如果能找到hbck -repair的日志也能確定具體hbck干了什么“壞事”斋竞,可惜集群管理的同事使用的是命令>號直接覆蓋掉了日志,導(dǎo)致無法確定具體的原因秃殉。因此只能通過測試環(huán)境來重現(xiàn)該問題坝初。
四浸剩、測試重現(xiàn),找到罪魁禍?zhǔn)?/h1>
通過之前的排查鳄袍,發(fā)現(xiàn)幾個表都是數(shù)據(jù)量比較大绢要,分裂后,還沒做完major_compact拗小,使用hbck -repair操作會觸發(fā)類型的問題重罪。因此在測試環(huán)境構(gòu)造對應(yīng)環(huán)境,我的測試環(huán)境有兩個region哀九,如下截圖所示:
重現(xiàn)步驟如下:
1剿配、對第一個region進(jìn)行split操作,操作命令如下:
split 'fb1c59eb558dcae821e7409a81642b02'
2阅束、分裂成功并且子region上線成功呼胚,日志如下:
SPLIT event; parent=table_name2_2015,,1553513856271.fb1c59eb558dcae821e7409a81642b02., daughter a=table_name2_2015,,1553514657067.54c8d09ca7d5fa3ee549cebf118045f5.daughter b=table_name2_2015,34992944,1553514657067.b32e72a1179edb587c9768621f54bb2e., on datanode3,60020,1552573473932
從上面的日志可以看出fb1c59eb558dcae821e7409a81642b02成功分裂為了54c8d09ca7d5fa3ee549cebf118045f5和b32e72a1179edb587c9768621f54bb2e這兩個region,并且在datanode3,60020,1552573473932這個regionserver上上線成功餓了息裸。
3蝇更、趁還沒做major_compact的時候發(fā)起hbck -repair的操作,命令如下:
hbck -repair? >>repair.log 2>&1
4呼盆、查看master的日志簿寂,看看異常情況,發(fā)現(xiàn)確實(shí)是hbck把三個region都下線了宿亡,日志如下所示:
Client=hbaseadmin//ip_addr5 offline table_name2_2015,,1553514657067.54c8d09ca7d5fa3ee549cebf118045f5.
Transition {54c8d09ca7d5fa3ee549cebf118045f5 state=OPEN, ts=1553514722000,server=datanode3,60020,1553514716244} to {54c8d09ca7d5fa3ee549cebf118045f5 state=OFFLINE, ts=1553514763802, server=datanode3,60020,1553514716244}
Client=hbaseadmin//ip_addr5 offline table_name2_2015,,1553513856271.fb1c59eb558dcae821e7409a81642b02.
Transition {fb1c59eb558dcae821e7409a81642b02 state=SPLIT, ts=1553514721575, server=null} to {fb1c59eb558dcae821e7409a81642b02 state=OFFLINE, ts=1553514763815, server=null}
Client=hbaseadmin//ip_addr5 offline table_name2_2015,34992944,1553514657067.b32e72a1179edb587c9768621f54bb2e.
Transition {b32e72a1179edb587c9768621f54bb2e state=OPEN,ts=1553514722034,server=datanode3,60020,1553514716244} to {b32e72a1179edb587c9768621f54bb2e state=OFFLINE, ts=1553514764826, server=datanode3,60020,1553514716244}
備注:這里對日志做了下優(yōu)化常遂,去掉了時間和部分內(nèi)容,只保留核心內(nèi)容
5挽荠、此時頁面上也由原來的3個region變成了孤零零的1個region克胳,如下圖所示:
6、再分析hbck -repair的日志圈匆,終于發(fā)現(xiàn)hbck干的“壞事”了漠另,具體日志如下:
ERROR: (region table_name2_2015,,1553514657067.54c8d09ca7d5fa3ee549cebf118045f5.) Multiple regions have the same startkey:
ERROR: (region table_name2_2015,,1553513856271.fb1c59eb558dcae821e7409a81642b02.) Multiple regions have the same startkey:
ERROR: (regions table_name2_2015,,1553513856271.fb1c59eb558dcae821e7409a81642b02. and table_name2_2015,34992944,1553514657067.b32e72a1179edb587c9768621f54bb2e.) There is an overlap in the region chain.
#執(zhí)行merge region的操作,景三個region合并成一個region
util.HBaseFsck: == [hbasefsck-pool1-t26] Merging regions into one region: { meta => null, hdfs => hdfs://slave:9000/hbase/data/default/table_name2_2015/54c8d09ca7d5fa3ee549cebf118045f5, deployed =>replicaId => 0 },{meta=>null,hdfs=>hdfs://slave:9000/hbase/data/default/table_name2_2015/fb1c59eb558dcae821e7409a81642b02, deployed => , replicaId => 0 },{ meta => null, hdfs => hdfs://slave:9000/hbase/data/default/table_name2_2015/b32e72a1179edb587c9768621f54bb2e, deployed => , replicaId => 0 }
#下線三個region
[hbasefsck-pool1-t26] Offlining region: { meta => null, hdfs =>hdfs://slave:9000/hbase/data/default/table_name2_2015/54c8d09ca7d5fa3ee549cebf118045f5, deployed => , replicaId => 0 }
util.HBaseFsck: [hbasefsck-pool1-t26] Offlining region: { meta => null, hdfs => hdfs://slave:9000/hbase/data/default/table_name2_2015/fb1c59eb558dcae821e7409a81642b02, deployed => , replicaId => 0 }
util.HBaseFsck: [hbasefsck-pool1-t26] Offlining region: { meta => null, hdfs => hdfs://slave:9000/hbase/data/default/table_name2_2015/b32e72a1179edb587c9768621f54bb2e, deployed => , replicaId => 0 }
#創(chuàng)建了一個新region
Created new empty container region: {ENCODED => ed8942007ba8404039022851cc77db6a, NAME => 'table_name2_2015,,1553514764826.ed8942007ba8404039022851cc77db6a.', STARTKEY => '', ENDKEY => '700'} to contain regions: { meta => null, hdfs =>hdfs://slave:9000/hbase/data/default/table_name2_2015/54c8d09ca7d5fa3ee549cebf118045f5, deployed => , replicaId => 0 },{ meta => null, hdfs =>hdfs://slave:9000/hbase/data/default/table_name2_2015/fb1c59eb558dcae821e7409a81642b02, deployed => , replicaId => 0 },{ meta => null, hdfs => hdfs://slave:9000/hbase/data/default/table_name2_2015/b32e72a1179edb587c9768621f54bb2e, deployed => , replicaId => 0 }
[hbasefsck-pool1-t26] Merging { meta => null, hdfs => hdfs://slave:9000/hbase/data/default/table_name2_2015/54c8d09ca7d5fa3ee549cebf118045f5, deployed => , replicaId => 0 } into hdfs://slave:9000/hbase/data/default/table_name2_2015/ed8942007ba8404039022851cc77db6a
#將三個region的HFile文件move到新的region目錄下
util.HBaseFsck: [hbasefsck-pool1-t26] Moving files from hdfs://slave:9000/hbase/data/default/table_name2_2015/54c8d09ca7d5fa3ee549cebf118045f5/.tmp into containing region hdfs://slave:9000/hbase/data/default/table_name2_2015/ed8942007ba8404039022851cc77db6a/.tmp
util.HBaseFsck: [hbasefsck-pool1-t26] Moving files from hdfs://slave:9000/hbase/data/default/table_name2_2015/54c8d09ca7d5fa3ee549cebf118045f5/i into containing region hdfs://slave:9000/hbase/data/default/table_name2_2015/ed8942007ba8404039022851cc77db6a/i
util.HBaseFsck: [hbasefsck-pool1-t26] Moving files from hdfs://slave:9000/hbase/data/default/table_name2_2015/54c8d09ca7d5fa3ee549cebf118045f5/recovered.edits into containing region hdfs://slave:9000/hbase/data/default/table_name2_2015/ed8942007ba8404039022851cc77db6a/recovered.edits
util.HBaseFsck: Sidelining files from hdfs://slave:9000/hbase/data/default/table_name2_2015/54c8d09ca7d5fa3ee549cebf118045f5/.tmp into containing region hdfs://slave:9000/hbase/.hbck/hbase-1553514761533/data/default/table_name2_2015/54c8d09ca7d5fa3ee549cebf118045f5/.tmp
util.HBaseFsck: Removing old region dir: hdfs://slave:9000/hbase/data/default/table_name2_2015/54c8d09ca7d5fa3ee549cebf118045f5
util.HBaseFsck: [hbasefsck-pool1-t26] Sidelined region dir hdfs://slave:9000/hbase/data/default/table_name2_2015/54c8d09ca7d5fa3ee549cebf118045f5 into hdfs://slave:9000/hbase/.hbck/hbase-1553514761533
util.HBaseFsck: [hbasefsck-pool1-t26] Merging { meta => null, hdfs => hdfs://slave:9000/hbase/data/default/table_name2_2015/fb1c59eb558dcae821e7409a81642b02, deployed => , replicaId => 0 } into hdfs://slave:9000/hbase/data/default/table_name2_2015/ed8942007ba8404039022851cc77db6a
util.HBaseFsck: [hbasefsck-pool1-t26] Moving files from hdfs://slave:9000/hbase/data/default/table_name2_2015/fb1c59eb558dcae821e7409a81642b02/.splits into containing region hdfs://slave:9000/hbase/data/default/table_name2_2015/ed8942007ba8404039022851cc77db6a/.splits
util.HBaseFsck: [hbasefsck-pool1-t26] Moving files from hdfs://slave:9000/hbase/data/default/table_name2_2015/fb1c59eb558dcae821e7409a81642b02/.tmp into containing region hdfs://slave:9000/hbase/data/default/table_name2_2015/ed8942007ba8404039022851cc77db6a/.tmp
util.HBaseFsck: [hbasefsck-pool1-t26] Moving files from hdfs://slave:9000/hbase/data/default/table_name2_2015/fb1c59eb558dcae821e7409a81642b02/i into containing region hdfs://slave:9000/hbase/data/default/table_name2_2015/ed8942007ba8404039022851cc77db6a/i
util.HBaseFsck: [hbasefsck-pool1-t26] Moving files from hdfs://slave:9000/hbase/data/default/table_name2_2015/fb1c59eb558dcae821e7409a81642b02/recovered.edits into containing region hdfs://slave:9000/hbase/data/default/table_name2_2015/ed8942007ba8404039022851cc77db6a/recovered.edits
util.HBaseFsck: Removing old region dir: hdfs://slave:9000/hbase/data/default/table_name2_2015/fb1c59eb558dcae821e7409a81642b02
util.HBaseFsck: [hbasefsck-pool1-t26] Sidelined region dir hdfs://slave:9000/hbase/data/default/table_name2_2015/fb1c59eb558dcae821e7409a81642b02 into hdfs://slave:9000/hbase/.hbck/hbase-1553514761533
util.HBaseFsck: [hbasefsck-pool1-t26] Merging { meta => null, hdfs => hdfs://slave:9000/hbase/data/default/table_name2_2015/b32e72a1179edb587c9768621f54bb2e, deployed => , replicaId => 0 } into hdfs://slave:9000/hbase/data/default/table_name2_2015/ed8942007ba8404039022851cc77db6a
util.HBaseFsck: [hbasefsck-pool1-t26] Moving files from hdfs://slave:9000/hbase/data/default/table_name2_2015/b32e72a1179edb587c9768621f54bb2e/.tmp into containing region hdfs://slave:9000/hbase/data/default/table_name2_2015/ed8942007ba8404039022851cc77db6a/.tmp
util.HBaseFsck: [hbasefsck-pool1-t26] Moving files from hdfs://slave:9000/hbase/data/default/table_name2_2015/b32e72a1179edb587c9768621f54bb2e/i into containing region hdfs://slave:9000/hbase/data/default/table_name2_2015/ed8942007ba8404039022851cc77db6a/i
util.HBaseFsck: [hbasefsck-pool1-t26] Moving files from hdfs://slave:9000/hbase/data/default/table_name2_2015/b32e72a1179edb587c9768621f54bb2e/recovered.edits into containing region hdfs://slave:9000/hbase/data/default/table_name2_2015/ed8942007ba8404039022851cc77db6a/recovered.edits
util.HBaseFsck: Removing old region dir: hdfs://slave:9000/hbase/data/default/table_name2_2015/b32e72a1179edb587c9768621f54bb2e
INFO [main] util.HBaseFsck: Loading HBase regioninfo from HDFS...
至此跃赚,我們有已經(jīng)知道了hbck -repair干的壞事和我們之前猜測的差不多笆搓,主要問題是hbck在修復(fù)的時候?qū)σ呀?jīng)split的region但是還沒有來得及做大合并的region判斷有問題,會將三個region(1個母region和2個子region)進(jìn)行下線纬傲,新創(chuàng)建一個region满败,并把三個region涉及到的所有文件都移動到新的region中,不會導(dǎo)致數(shù)據(jù)的丟失叹括,但是會導(dǎo)致region無法上線算墨,影響業(yè)務(wù)。
五汁雷、代碼分析
稍后再補(bǔ)充
備注:hbase 0.98 和 hbase 1.26都有此問題