HBase Architecture in Detail and the Data Read/Write Flow

Understanding the HBase Architecture Diagram

18.png

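The HMaster connects to ZooKeeper so that it knows which HRegionServers are alive and where each HRegionServer is located, and can then manage them.
Internally, HBase writes data to HDFS through the DFS client.
Each HRegionServer hosts multiple HRegions, each HRegion contains multiple Stores, and each Store corresponds to one column family.
HFile is the storage format for KeyValue data in HBase; it is a Hadoop binary file format. A StoreFile is a thin wrapper around an HFile and is what actually stores the data.
An HStore consists of a MemStore and StoreFiles.
The HLog records every change to the data and can be used for data recovery.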
On HDFS the corresponding directory structure is
namespace -> table -> region -> column family -> HFile (columns and cells are stored inside the HFiles)

17.png
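
The nesting described above (a table holds column families, which hold columns, which hold versioned cells) can be pictured as nested sorted maps. The following is only a minimal sketch of that logical model; `LogicalTable` and its methods are hypothetical names for illustration, not HBase classes.

```java
import java.util.TreeMap;

// Hypothetical in-memory model of one table's logical layout; not HBase code.
class LogicalTable {
    // row key -> "family:qualifier" -> timestamp -> value, every level kept sorted
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    void put(String row, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family + ":" + qualifier, c -> new TreeMap<>())
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell does not exist.
    String getLatest(String row, String family, String qualifier) {
        TreeMap<String, TreeMap<Long, String>> columns = rows.get(row);
        if (columns == null) return null;
        TreeMap<Long, String> versions = columns.get(family + ":" + qualifier);
        return versions == null ? null : versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        LogicalTable tb1 = new LogicalTable();
        tb1.put("20161119_10003", "info", "age", 1L, "19");
        tb1.put("20161119_10003", "info", "age", 2L, "20");
        System.out.println(tb1.getLatest("20161119_10003", "info", "age")); // prints 20
    }
}
```

Keeping timestamps as the innermost sorted key is what lets a read pick the newest version without scanning the older ones.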

Write Path

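ZooKeeper stores the location of the meta table's region. The client gets that location from ZooKeeper and then reads the data in the meta table.
Using the namespace, table name, and rowkey, it looks up in the meta table the region that the write belongs to.
It locates the RegionServer hosting that region.
The data is written to both the HLog and the MemStore.
When the MemStore reaches a threshold, its contents are flushed to a StoreFile. If data in the MemStore is lost, it can be recovered from the HLog.
When the StoreFiles accumulate past a threshold, a compaction is triggered that merges them into a single StoreFile; versions are merged and deleted data is removed at the same time.
As compactions produce ever larger StoreFiles, a split is eventually triggered that divides the current StoreFile in two, which amounts to splitting one large region into two regions, as shown in the figure below: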

19.png
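
The last few steps of the write path (log first, buffer in memory, flush at a threshold, recover from the log, compact) can be sketched with plain collections. This is a toy model under stated assumptions: `MiniStore`, the `TOMBSTONE` marker, and the entry-count threshold are all invented for illustration, and real HBase flushes by size rather than by entry count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical single-store model of the write path; not HBase code.
class MiniStore {
    static final String TOMBSTONE = "<deleted>";               // illustrative delete marker
    final int flushThreshold;                                  // entries, not bytes (assumption)
    final List<String[]> hlog = new ArrayList<>();             // unflushed edits, in arrival order
    TreeMap<String, String> memStore = new TreeMap<>();        // sorted in-memory buffer
    final List<TreeMap<String, String>> storeFiles = new ArrayList<>(); // oldest first

    MiniStore(int flushThreshold) { this.flushThreshold = flushThreshold; }

    void put(String key, String value) {
        hlog.add(new String[] {key, value}); // 1. record the edit in the log
        memStore.put(key, value);            // 2. buffer it in memory
        if (memStore.size() >= flushThreshold) flush();
    }

    void delete(String key) { put(key, TOMBSTONE); }

    // Persist the MemStore as a new immutable StoreFile; its log entries are no longer needed.
    void flush() {
        if (memStore.isEmpty()) return;
        storeFiles.add(memStore);
        memStore = new TreeMap<>();
        hlog.clear();
    }

    // Simulate losing the MemStore and rebuilding it by replaying the log.
    void crashAndRecover() {
        memStore = new TreeMap<>();
        for (String[] edit : hlog) memStore.put(edit[0], edit[1]);
    }

    // Merge all StoreFiles into one: newer files win, deleted keys are dropped.
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : storeFiles) merged.putAll(file);
        merged.values().removeIf(TOMBSTONE::equals);
        storeFiles.clear();
        storeFiles.add(merged);
    }

    public static void main(String[] args) {
        MiniStore store = new MiniStore(2);
        store.put("20161119_10001", "zhangsan");
        store.crashAndRecover();                 // the edit survives through the log
        store.put("20161119_10002", "lisi");     // second entry reaches the threshold: flush
        store.delete("20161119_10002");
        store.flush();
        store.compact();
        System.out.println(store.storeFiles);    // prints [{20161119_10001=zhangsan}]
    }
}
```

Because a delete is just another write, the deleted value only disappears physically when a compaction rewrites the files.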

Read Path

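ZooKeeper stores the location of the meta table's region, so the client first finds that location in ZooKeeper and then reads the meta table, which in turn stores the region information of the user tables.
Using the namespace, table name, and rowkey, it finds the matching region in the meta table.
It locates the RegionServer hosting that region.
It looks up the region on that server.
It looks for the data in the MemStore first and, if it is not there, reads from the StoreFiles (this order is for read efficiency).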
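
The final step above can be sketched as a lookup that tries the in-memory buffer before any file. This is a simplified illustration only: `ReadPath.get` is a hypothetical helper, each StoreFile is modelled as a plain map, and a real RegionServer also compares timestamps across sources rather than stopping at the first hit.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the lookup order inside one Store; not HBase code.
class ReadPath {
    // storeFiles is ordered oldest first, so it is searched from the end (newest first).
    static Optional<String> get(String rowKey,
                                Map<String, String> memStore,
                                List<Map<String, String>> storeFiles) {
        String value = memStore.get(rowKey);           // 1. in-memory data first
        if (value != null) return Optional.of(value);
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            value = storeFiles.get(i).get(rowKey);     // 2. then files, newest to oldest
            if (value != null) return Optional.of(value);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> memStore = Collections.singletonMap("20161119_10003", "wangwu");
        List<Map<String, String>> files = Arrays.asList(
                Collections.singletonMap("20161119_10001", "zhangsan"),
                Collections.singletonMap("20161119_10002", "lisi"));
        System.out.println(get("20161119_10003", memStore, files)); // served from the MemStore
        System.out.println(get("20161119_10001", memStore, files)); // served from the oldest file
    }
}
```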

Basic Usage of the HBase Java API

package org.apache.hadoop.hbase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class HbaseClientTest {
    
    /*
     * Get a table instance by table name
     */
    public static HTable getTable (String name) throws Exception{
        //get the hbase conf instance
        Configuration conf = HBaseConfiguration.create();
        //get the hbase table instance
        HTable table = new HTable(conf, name);
        
        return table;
    }
    
    /**
     * get the data from the hbase table 
     * 
     * get 'tbname','rowkey','cf:col'
     * 
     * column family -> column qualifier -> value -> timestamp
     */
    public static void getData(HTable table) throws Exception {
        // TODO Auto-generated method stub
        Get get = new Get(Bytes.toBytes("20161119_10003"));
        //conf the get 
        //get.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));
        get.addFamily(Bytes.toBytes("info"));
        //load the get 
        Result rs = table.get(get);
        //print the data
        for(Cell cell : rs.rawCells()){
            System.out.println(
                    Bytes.toString(CellUtil.cloneFamily(cell))
                    +"->"+
                    Bytes.toString(CellUtil.cloneQualifier(cell))
                    +"->"+
                    Bytes.toString(CellUtil.cloneValue(cell))
                    +"->"+
                    cell.getTimestamp()
                    );
            System.out.println("------------------------------");
        }
        
    }
    
    /**
     * put the data to the hbase table 
     * 
     * put 'tbname','rowkey','cf:col','value'
     *      
     */
    public static void putData(HTable table) throws Exception {
        //get the put instance
        Put put = new Put(Bytes.toBytes("20161119_10003"));
        //conf the put
        put.add(
                Bytes.toBytes("info"), 
                Bytes.toBytes("age"), 
                Bytes.toBytes("20")
                );
        //load the put 
        table.put(put);
        //print
        getData(table);
    }
    
    /**
     * delete the data from the hbase table 
     * 
     * delete 'tbname','rowkey','cf:col'
     *      
     */
    public static void deleteData(HTable table) throws Exception {
        //get the delete instance
        Delete del = new Delete(Bytes.toBytes("20161119_10003"));
        //conf the del
        //del.deleteColumn(Bytes.toBytes("info"),Bytes.toBytes("age"));
        del.deleteColumns(Bytes.toBytes("info"),Bytes.toBytes("age"));
        //load the del
        table.delete(del);
        //print
        getData(table);
    }
    
    /**
     * scan the all table
     * scan 'tbname'
     *      
     */
    public static void scanData(HTable table) throws Exception {
        //get the scan instance
        Scan scan = new Scan();
        //load the scan
        ResultScanner rsscan = table.getScanner(scan);
        for(Result rs : rsscan){
            System.out.println(Bytes.toString(rs.getRow()));
            for(Cell cell : rs.rawCells()){
                System.out.println(
                        Bytes.toString(CellUtil.cloneFamily(cell))
                        +"->"+
                        Bytes.toString(CellUtil.cloneQualifier(cell))
                        +"->"+
                        Bytes.toString(CellUtil.cloneValue(cell))
                        +"->"+
                        cell.getTimestamp()
                        );
            }
            System.out.println("------------------------------");
        }
        //release the scanner's resources
        rsscan.close();
    }
    
    /**
     * scan the table  with limit
     * 
     * scan 'tbname',{STARTROW => 'row1',STOPROW => 'row2'}
     */
    public static void rangeData(HTable table) throws Exception {
        //get the scan instance
        Scan scan = new Scan();
        //conf the scan
            //scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));
            //scan.addFamily(family);
            //scan.setStartRow(Bytes.toBytes("20161119_10002"));
            //scan.setStopRow(Bytes.toBytes("20161119_10003"));
        Filter filter = new PrefixFilter(Bytes.toBytes("2016111"));
        scan.setFilter(filter);
        //hbase conf
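        //whether blocks read by this scan are kept in the block cache
        scan.setCacheBlocks(true);
        //number of rows fetched per RPC
        scan.setCaching(100);
        //maximum number of cells returned in each Result
        scan.setBatch(10);
        //caching and batch together determine the number of RPC requests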
        
        //load the scan
        ResultScanner rsscan = table.getScanner(scan);
        for(Result rs : rsscan){
            System.out.println(Bytes.toString(rs.getRow()));
            for(Cell cell : rs.rawCells()){
                System.out.println(
                        Bytes.toString(CellUtil.cloneFamily(cell))
                        +"->"+
                        Bytes.toString(CellUtil.cloneQualifier(cell))
                        +"->"+
                        Bytes.toString(CellUtil.cloneValue(cell))
                        +"->"+
                        cell.getTimestamp()
                        );
            }
            System.out.println("------------------------------");
        }
        //release the scanner's resources
        rsscan.close();
    }
    
    public static void main(String[] args) throws Exception {
        HTable table = getTable("test:tb1");
        try {
            getData(table);
            putData(table);
            deleteData(table);
            scanData(table);
            rangeData(table);
        } finally {
            //release the client resources held by the table
            table.close();
        }
    }   
}
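
The comment in `rangeData` notes that the caching and batch settings together determine the number of RPC requests. A rough way to reason about that is sketched below. It assumes a simplified model in which each row is cut into Results of at most `batch` cells and each fetch carries at most `caching` Results; `ScanRpcEstimate` is a hypothetical helper, and the calls that open and close the scanner, as well as any size limits, are ignored.

```java
// Hypothetical back-of-the-envelope estimate; not part of the HBase API.
class ScanRpcEstimate {
    // Approximate number of fetch round trips for a full scan.
    static long estimate(long rows, int colsPerRow, int caching, int batch) {
        long resultsPerRow = (colsPerRow + batch - 1) / batch; // ceil(colsPerRow / batch)
        long results = rows * resultsPerRow;                   // Results the scan produces
        return (results + caching - 1) / caching;              // ceil(results / caching)
    }

    public static void main(String[] args) {
        // 1000 rows of 2 cells with caching=100 and batch=10: each row fits in one Result
        System.out.println(estimate(1000, 2, 100, 10));  // prints 10
        // 1000 rows of 25 cells: each row is cut into 3 Results
        System.out.println(estimate(1000, 25, 100, 10)); // prints 30
    }
}
```

Under this model, raising the caching value lowers the number of round trips at the cost of more client memory per fetch, while the batch value only matters for rows wider than the batch.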

Summary of the Role of Each Component in the HBase Architecture

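** Client **
The entry point for accessing the whole HBase cluster;
Uses the HBase RPC mechanism to communicate with the HMaster and HRegionServers;
Talks to the HMaster for table management operations;
Talks to HRegionServers for data reads and writes;
Contains the interfaces for accessing HBase and maintains a cache to speed up access.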
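
The client-side cache mentioned above can be pictured as remembering region locations so that repeated requests for the same key range skip the meta lookup. The sketch below is an assumption-laden illustration: `CachingLocator`, its `Region` type, and the server names are invented, and the meta table is modelled as a sorted map keyed by region start key.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a client caching region locations; not HBase code.
class CachingLocator {
    static final class Region {
        final String startKey, endKey, server;   // an empty endKey means "no upper bound"
        Region(String startKey, String endKey, String server) {
            this.startKey = startKey; this.endKey = endKey; this.server = server;
        }
        boolean contains(String row) {
            return row.compareTo(startKey) >= 0 && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    private final TreeMap<String, Region> meta = new TreeMap<>();   // stand-in for the meta table
    private final TreeMap<String, Region> cache = new TreeMap<>();  // what the client remembers
    int metaLookups = 0;

    void addRegion(String startKey, String endKey, String server) {
        meta.put(startKey, new Region(startKey, endKey, server));
    }

    String locate(String row) {
        Map.Entry<String, Region> hit = cache.floorEntry(row);
        if (hit != null && hit.getValue().contains(row)) return hit.getValue().server;
        metaLookups++;                                   // cache miss: consult the meta table
        Map.Entry<String, Region> found = meta.floorEntry(row);
        if (found == null) throw new IllegalStateException("no region covers " + row);
        cache.put(found.getKey(), found.getValue());
        return found.getValue().server;
    }

    public static void main(String[] args) {
        CachingLocator locator = new CachingLocator();
        locator.addRegion("", "20161119_10002", "rs1");
        locator.addRegion("20161119_10002", "", "rs2");
        System.out.println(locator.locate("20161119_10001")); // rs1, one meta lookup
        System.out.println(locator.locate("20161119_10000")); // rs1, answered from the cache
        System.out.println(locator.locate("20161119_10003")); // rs2, second meta lookup
    }
}
```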
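** Zookeeper **
Guarantees that the cluster has only one active HMaster at any time;
Stores the addressing entry point for all HRegions;
Monitors HRegionServers coming online and going offline in real time and notifies the HMaster;
Stores the HBase schema and table metadata;
The ZooKeeper quorum stores table addresses and the HMaster address.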
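** HMaster **
The HMaster is not a single point of failure: HBase can start several HMasters, and ZooKeeper's master election ensures that one is always running. The active HMaster is mainly responsible for managing tables and regions.
Handles users' table creation, deletion, and similar operations;
Manages load balancing across HRegionServers and adjusts region distribution;
After a region split, assigns the new regions;
After an HRegionServer goes down, migrates the regions that were on the failed server.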
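
The region migration duty listed above can be illustrated with a very simple policy: hand each orphaned region to whichever surviving server currently holds the fewest regions. This is a sketch of one plausible policy, not the balancer HBase actually uses; `RegionReassigner` and the server and region names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical least-loaded reassignment policy; not HBase code.
class RegionReassigner {
    // Moves every region of the failed server to the surviving server with the fewest regions.
    // Ties are broken by server name so the outcome is deterministic.
    static void reassign(Map<String, List<String>> assignment, String failedServer) {
        List<String> orphaned = assignment.remove(failedServer);
        if (orphaned == null || assignment.isEmpty()) return; // nothing to move, or nowhere to put it
        for (String region : orphaned) {
            String target = Collections.min(assignment.keySet(),
                    Comparator.comparingInt((String s) -> assignment.get(s).size())
                              .thenComparing(Comparator.<String>naturalOrder()));
            assignment.get(target).add(region);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> assignment = new HashMap<>();
        assignment.put("rs1", new ArrayList<>(Arrays.asList("r1", "r2")));
        assignment.put("rs2", new ArrayList<>(Arrays.asList("r3")));
        assignment.put("rs3", new ArrayList<>(Arrays.asList("r4", "r5", "r6")));
        reassign(assignment, "rs3");
        System.out.println(assignment.get("rs1")); // prints [r1, r2, r5]
        System.out.println(assignment.get("rs2")); // prints [r3, r4, r6]
    }
}
```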
** HRegion Server **
Maintains HRegions, handles IO requests for them, and reads and writes data on HDFS;
Splits HRegions that grow too large at runtime.
A client does not need the master to access data in HBase (addressing goes through ZooKeeper and the HRegionServers, and reads and writes go to the HRegionServers). The HMaster only maintains table and region metadata, so its load is low.
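
The splitting duty of the HRegionServer can be pictured as cutting a sorted key range at a middle key, giving two daughter regions that together cover the original range. The sketch below assumes the split point is simply the median row key; `RegionSplitter` is a hypothetical name, and how HBase actually chooses the split point is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of splitting one region into two daughters; not HBase code.
class RegionSplitter {
    // Returns [keys below the split key, keys from the split key upward].
    static List<TreeMap<String, String>> split(TreeMap<String, String> region) {
        if (region.size() < 2) {
            throw new IllegalArgumentException("need at least two rows to split");
        }
        List<String> keys = new ArrayList<>(region.keySet());
        String splitKey = keys.get(keys.size() / 2);          // median row key (assumption)
        List<TreeMap<String, String>> daughters = new ArrayList<>();
        daughters.add(new TreeMap<>(region.headMap(splitKey, false)));
        daughters.add(new TreeMap<>(region.tailMap(splitKey, true)));
        return daughters;
    }

    public static void main(String[] args) {
        TreeMap<String, String> region = new TreeMap<>();
        region.put("10001", "zhangsan");
        region.put("10002", "lisi");
        region.put("10003", "wangwu");
        List<TreeMap<String, String>> daughters = split(region);
        System.out.println(daughters.get(0).keySet()); // prints [10001]
        System.out.println(daughters.get(1).keySet()); // prints [10002, 10003]
    }
}
```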

Integrating HBase with MapReduce

Data in an HBase table can serve as input to the MapReduce framework, and MapReduce results can be written to an HBase table.
We use the MapReduce programs bundled with HBase as an example.

Running it directly fails with missing jar errors, so set the following environment variables first.

$ export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2 
$ export HADOOP_HOME=/opt/modules/hadoop-2.5.0  
# $HBASE_HOME/bin/hbase mapredcp lists the jars HBase needs to run on YARN
$ export HADOOP_CLASSPATH=`$HBASE_HOME/bin/hbase mapredcp`

Run the example

$ $HADOOP_HOME/bin/yarn jar lib/hbase-server-0.98.6-hadoop2.jar rowcounter  test:tb1

Using importtsv for HBase Data Migration

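HBase data typically comes from log files or an RDBMS and has to be migrated into HBase tables. There are three common approaches: (1) the HBase Put API; (2) the HBase bulk load tools; (3) a custom MapReduce job.
importtsv is the MapReduce-based bulk import tool shipped with HBase. It is also a command-line tool: a single command imports delimiter-separated data files stored on HDFS (the default separator is \t) into HBase.
** Test **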

Prepare the data file

[wulei@bigdata-00 datas]$ cat tb1.tsv 
10001   zhangsan        20
10002   lisi    22
10003   wangwu  30

Upload the data file to HDFS

$ bin/hdfs dfs -put /opt/datas/tb1.tsv /

Create the table in HBase
> create 'student','info'
Import the data from HDFS into the HBase table

$HADOOP_HOME/bin/yarn jar lib/hbase-server-0.98.6-hadoop2.jar importtsv  -Dimporttsv.separator=\t -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age  student  /tb1.tsv

-Dimporttsv.separator specifies the field separator
-Dimporttsv.columns specifies how each column of the data file maps to the rowkey and columns of the table
/tb1.tsv is the path of the data file on HDFS
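
To make the column mapping concrete, the sketch below pairs each field of one input line with the corresponding entry of the columns specification. It only illustrates the idea and is not how importtsv is implemented; `TsvLineMapper` is a hypothetical helper.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration of the columns mapping; not the importtsv implementation.
class TsvLineMapper {
    // Pairs the i-th field of the line with the i-th entry of the columns specification.
    static Map<String, String> map(String line, String separator, String columnsSpec) {
        String[] columns = columnsSpec.split(",");
        String[] fields = line.split(Pattern.quote(separator), -1);
        if (fields.length != columns.length) {
            throw new IllegalArgumentException(
                    "expected " + columns.length + " fields but found " + fields.length);
        }
        Map<String, String> mapped = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            mapped.put(columns[i], fields[i]);
        }
        return mapped;
    }

    public static void main(String[] args) {
        Map<String, String> row =
                map("10001\tzhangsan\t20", "\t", "HBASE_ROW_KEY,info:name,info:age");
        // prints {HBASE_ROW_KEY=10001, info:name=zhangsan, info:age=20}
        System.out.println(row);
    }
}
```

The entry mapped to HBASE_ROW_KEY becomes the rowkey, and every other entry becomes a cell in the named column family and qualifier.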

Check the result

hbase(main):010:0> scan 'student'
ROW                       COLUMN+CELL                                                              
 10001                    column=info:age, timestamp=1480123167099, value=20                       
 10001                    column=info:name, timestamp=1480123167099, value=zhangsan                
 10002                    column=info:age, timestamp=1480123167099, value=22                       
 10002                    column=info:name, timestamp=1480123167099, value=lisi                    
2 row(s) in 0.8210 seconds

Last edited: 2017.12.04 16:58:52

