
022 HBase Compaction and Data Locality in Hadoop

1. HBase Compaction and Data Locality With Hadoop


In this Hadoop HBase tutorial on HBase Compaction and Data Locality with Hadoop, we will learn the whole concept of Minor and Major Compaction in HBase, the process by which HBase cleans up after itself, in detail. Also, we will look at Data Locality with Hadoop, because data locality is the solution to data not being available to the Mapper.
So, let's start with HBase Compaction and Data Locality in Hadoop.


HBase Compaction and Data Locality in Hadoop

2. What is HBase Compaction?


As we know, HBase is a distributed data store optimized for read performance. This optimal read performance comes from having one file per column family, but during heavy writes it is not always possible to keep one file per column family. Hence, to reduce the maximum number of disk seeks needed for a read, HBase tries to combine all HFiles into one large HFile. This process is what we call compaction.
Do you know about HBase Architecture?
In other words, compaction in HBase is a process by which HBase cleans itself, and this process is of two types: Minor HBase Compaction and Major HBase Compaction.


a. HBase Minor Compaction


The process of combining a configurable number of smaller HFiles into one large HFile is what we call minor compaction. It is quite important because, without it, reading a particular row can require many disk reads and overall performance can suffer.
Here are the several steps involved in HBase Minor Compaction (a hedged configuration sketch follows the list):


  1. It combines smaller HFiles to create a bigger HFile.
  2. The newly created HFile still stores the deleted and expired entries along with it; they are not purged at this stage.
  3. Because fewer, larger files are kept, more space becomes available to store data.
  4. It uses merge sort to combine the files.
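
The "configurable number" of HFiles mentioned above is governed by store-level settings in hbase-site.xml. The article does not name them, so the snippet below is only a minimal sketch assuming the standard properties hbase.hstore.compaction.min and hbase.hstore.compaction.max; the values are illustrative, not recommendations.

```xml
<!-- hbase-site.xml (region server side): how many StoreFiles a minor
     compaction may pick up. Assumed property names, illustrative values. -->
<property>
  <!-- Minimum number of eligible StoreFiles before a minor compaction runs. -->
  <name>hbase.hstore.compaction.min</name>
  <value>3</value>
</property>
<property>
  <!-- Maximum number of StoreFiles selected for a single minor compaction. -->
  <name>hbase.hstore.compaction.max</name>
  <value>10</value>
</property>
```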


HBase Compaction

b. HBase Major Compaction


The process of combining all the StoreFiles of a region into a single StoreFile is what we call HBase Major Compaction. It also deletes removed and expired versions. As a process, it merges all StoreFiles into a single StoreFile and, by default, runs every 24 hours. However, the region will split into new regions after compaction if the new, larger StoreFile is greater than a certain size (defined by a property; a hedged sketch of that setting follows the list below).
Have a look at HBase Commands.
Here is what happens during HBase Major Compaction:


  1. All the data present for a column family in one region is accumulated into a single HFile.
  2. All deleted or expired cells are removed permanently during this process.
  3. Read performance of the newly created HFile is improved.
  4. It involves a lot of disk and network I/O.
  5. It can cause traffic congestion.
  6. The major compaction process is also known as the write amplification process.
  7. It must therefore be scheduled when demand on network I/O bandwidth is at a minimum.
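
The article refers to the split threshold only as "a certain size (defined by a property)". Below is a minimal hbase-site.xml sketch, assuming this refers to the standard region split threshold hbase.hregion.max.filesize; that property name and the value are supplied here for illustration, not taken from the original text.

```xml
<!-- hbase-site.xml: assumed split threshold. A region whose largest StoreFile
     grows beyond this value becomes a candidate for splitting. -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>10737418240</value> <!-- 10 GB, an illustrative value -->
</property>
```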

HBase Major Compaction


3. HBase Compaction Tuning


a. Short Description of HBase Compaction:


Now, to enhance the performance and stability of the HBase cluster, we can use some lesser-known HBase compaction configuration settings, described below.


b. Disabling Automatic Major Compactions in HBase


Generally, HBase users ask for full control of major compaction events. The way to achieve that is to set **hbase.hregion.majorcompaction** to 0, which disables periodic automatic major compactions in HBase.
However, this does not offer 100% control of major compactions, because HBase can sometimes automatically promote minor compactions to major ones. Luckily, we have another configuration option that will help in this case.
Let's take a tour of HBase Operations.
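
As a minimal sketch, the setting above goes into hbase-site.xml on the region servers. The value is the interval between automatic major compactions in milliseconds, so 0 turns the periodic trigger off:

```xml
<!-- hbase-site.xml: disable periodic automatic major compactions.
     The value is an interval in milliseconds; 0 disables the trigger. -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
```

With this in place, major compactions can still be triggered manually, for example with the HBase shell command major_compact 'table_name' (where table_name stands for your table) during a maintenance window.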


c. Maximum HBase Compaction Selection Size


Another option for controlling the compaction process in HBase is:
hbase.hstore.compaction.max.size (by default the value is set to Long.MAX_VALUE)
In HBase 1.2+ we also have:
hbase.hstore.compaction.max.size.offpeak
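
A minimal hbase-site.xml sketch for these two properties: StoreFiles larger than the configured size are excluded from compaction selection. The 10 GB and 20 GB values below are purely illustrative.

```xml
<!-- hbase-site.xml: cap the size of StoreFiles that compaction will select.
     Files larger than these values are skipped; sizes are illustrative. -->
<property>
  <name>hbase.hstore.compaction.max.size</name>
  <value>10737418240</value> <!-- 10 GB during peak hours -->
</property>
<property>
  <!-- Available in HBase 1.2+; applies during the configured off-peak window. -->
  <name>hbase.hstore.compaction.max.size.offpeak</name>
  <value>21474836480</value> <!-- 20 GB off-peak -->
</property>
```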


d. Off-peak Compactions in HBase


Further, we can use off-peak configuration settings if our deployment has off-peak hours.
These HBase compaction configuration options must be set to enable off-peak compaction:
hbase.offpeak.start.hour = 0..23
hbase.offpeak.end.hour = 0..23
The compaction file ratio for off-peak hours is 5.0 (by default), while for peak hours it is 1.2.
Both can be changed:
hbase.hstore.compaction.ratio
hbase.hstore.compaction.ratio.offpeak
The higher the file ratio value, the more aggressive (frequent) the compaction. So, for the majority of deployments, the default values are fine.
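
A minimal hbase-site.xml sketch tying these four properties together, assuming an off-peak window from 01:00 to 06:00 local time; the hours are illustrative and the ratios shown are the defaults quoted above.

```xml
<!-- hbase-site.xml: define an off-peak window and compact more aggressively
     inside it. Hours use the region server's local time. -->
<property>
  <name>hbase.offpeak.start.hour</name>
  <value>1</value>   <!-- window starts at 01:00 -->
</property>
<property>
  <name>hbase.offpeak.end.hour</name>
  <value>6</value>   <!-- window ends at 06:00 -->
</property>
<property>
  <name>hbase.hstore.compaction.ratio</name>
  <value>1.2</value> <!-- peak-hours file ratio (default) -->
</property>
<property>
  <name>hbase.hstore.compaction.ratio.offpeak</name>
  <value>5.0</value> <!-- off-peak file ratio (default) -->
</property>
```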


4. Data Locality in Hadoop


As we know, in Hadoop, datasets are stored in HDFS. Each dataset is divided into blocks that are stored across the data nodes of a Hadoop cluster. When a MapReduce job is executed against the dataset, individual Mappers process those blocks (input splits). When the data is not available to a Mapper on the same node, it has to be copied over the network from the data node that holds it to the data node executing the Mapper task. How close the data sits to the Mapper that processes it is what we call data locality in Hadoop.
You can learn more about Data Locality in Hadoop.
In Hadoop, there are 3 categories of data locality:


Data Locality in Hadoop

1. Data Local Data Locality


Data-local data locality is when the data is located on the same node as the Mapper working on it. In this case, the data is as close to the computation as possible, which makes it the most preferable option.


2. Intra-Rack Data Locality


However, because of resource constraints, it is not always possible to execute the Mapper on the same node as the data. In that case, the Mapper executes on another node within the same rack as the node that holds the data. This is what we call intra-rack data locality.


3. Inter-Rack Data Locality


Well, there are cases where, because of resource constraints, we can achieve neither data-local nor intra-rack locality. At that point we need to execute the Mapper on a node in a different rack, and the data is then copied across racks from the node that holds it to the node executing the Mapper. This is what we call inter-rack data locality, and it is the least preferable option.
Let's learn the features and principles of Hadoop.
So, this was all about HBase Compaction and Data Locality in Hadoop. Hope you like our explanation.


5. Conclusion: HBase Compaction


Hence, in this Hadoop HBase tutorial on HBase Compaction and Data Locality, we have seen HBase's cleaning process, HBase Compaction. We have also seen, in detail, Apache Hadoop Data Locality, the solution to data not being available to the Mapper. Hope it helps! Please share your experience through comments on our HBase Compaction explanation.
See also –
HBase Performance Tuning
For reference


https://data-flair.training/blogs/hbase-compaction
