HBase

Introduction

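HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase you can build large structured-storage clusters on inexpensive PC servers.

HBase aims to store and process very large data sets. More specifically, using only commodity hardware, it can handle very large tables with billions of rows and millions of columns.

HBase is an open-source implementation of Google Bigtable, although the two are built on different underlying components. Bigtable uses GFS as its file storage system, whereas HBase uses Hadoop HDFS. Google runs MapReduce to process the massive data in Bigtable, and HBase likewise uses Hadoop MapReduce to process its data. Bigtable uses Chubby as its coordination service, and HBase uses ZooKeeper for that role.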

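Row storage

Advantage: a row is written in one operation, which keeps the record complete

Disadvantage: reads pull in redundant data (columns that are not needed)

Column storage

Advantage: reads do not pull in redundant data, which makes it particularly suited to big-data workloads with lower requirements on data integrity

Disadvantage: lower write efficiency and weaker guarantees of data integrity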

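Comparison with traditional databases

1. Problems with traditional databases

Cannot store data once the volume becomes very large

No good backup mechanism

Slow down once the data reaches a certain size, and basically cannot cope when it becomes very large

2. Advantages of HBase

Linear scalability: as data grows, capacity is added by adding nodes

Data is stored on HDFS, which has a sound replication (backup) mechanism

ZooKeeper coordination is used to locate data, so access is fast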

Roles in an HBase cluster

An HBase table consists of one or more HRegions. Rows are sorted by row key in lexicographic (byte) order, which makes lookups faster and more efficient.
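HBase compares row keys as unsigned byte arrays, so the key "10" sorts before "9". The sketch below reproduces that ordering with a plain TreeMap and a hand-written unsigned lexicographic comparator. It does not use the HBase client, although the client's Bytes.compareTo performs the same kind of comparison. The class name RowKeyOrder and the sample keys are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

class RowKeyOrder {
    // Lexicographic comparison of byte arrays, treating each byte as unsigned (0..255)
    static final Comparator<byte[]> UNSIGNED_LEX = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // A proper prefix sorts first ("1" < "10")
        return a.length - b.length;
    };

    // Return the keys in the order in which HBase would store the rows
    static List<String> sortedRowKeys(String... keys) {
        TreeMap<byte[], String> rows = new TreeMap<>(UNSIGNED_LEX);
        for (String k : keys) {
            rows.put(k.getBytes(StandardCharsets.UTF_8), k);
        }
        return new ArrayList<>(rows.values());
    }

    public static void main(String[] args) {
        System.out.println(sortedRowKeys("9", "10", "1", "2"));     // [1, 10, 2, 9]
        System.out.println(sortedRowKeys("09", "10", "01", "02"));  // [01, 02, 09, 10]
    }
}
```

Zero-padding numeric parts of a row key to a fixed width is what makes byte order match numeric order.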

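1. One or more master nodes: HMaster

Monitors the RegionServers

Handles RegionServer failover

Handles metadata changes

Balances the region load during idle time

Publishes its own location to clients through ZooKeeper

2. Multiple nodes: HRegionServer

Stores HBase's actual data

Serves the regions assigned to it

Flushes the cache (MemStore) to HDFS

Maintains the HLog

Performs compactions

Handles region splits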

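Basic principles

HBase builds on two other systems: a distributed file system (HDFS) for storage, and the MapReduce (MR) framework as its data-processing model.

HBase ships with a built-in ZooKeeper, but a separate ZooKeeper cluster is usually used to oversee the HMaster and the RegionServers. Through leader election, ZooKeeper guarantees that only one HMaster is active in the cluster at any time. The HMaster and the HRegionServers register with ZooKeeper when they start. ZooKeeper stores the addressing entry point for all HRegions, monitors HRegionServers coming online and going offline in real time and notifies the HMaster, and stores HBase schema and table metadata. By default HBase manages the ZooKeeper instance itself. With ZooKeeper in place, the HMaster is no longer a single point of failure. Usually two HMasters are started, and the inactive one periodically communicates with the active HMaster to obtain the latest state and stay up to date. Starting many HMasters, however, only adds load to the active HMaster.

A RegionServer can host multiple HRegions. Each RegionServer maintains one HLog plus multiple HFiles and their MemStores. RegionServers run on the DataNodes, and their number can match the number of DataNodes.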

Components

Write-Ahead logs

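The WAL records HBase's modifications. When data is written to HBase, it is not written to disk directly. Instead it is kept in memory for a while (the time and size thresholds are configurable). Keeping data in memory, however, carries a higher risk of losing it. To solve this, each write is first appended to a file called the Write-Ahead Log and only then written to memory. If the system fails, the data can be rebuilt from this log file.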
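The toy model below follows the write-ahead rule just described. Every edit is appended to a log before it is applied to the in-memory store, so a simulated crash that wipes memory can be repaired by replaying the log in order. WalSketch and its methods are hypothetical. A real WAL is a file on HDFS with sequence IDs; here both the log and the MemStore are plain in-memory collections, and the log simply survives the simulated crash.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of a RegionServer's write path: log first, then memory
class WalSketch {
    // Stand-in for the WAL (an HDFS file in real HBase)
    final List<String[]> wal = new ArrayList<>();
    // Stand-in for the MemStore, sorted by row key
    TreeMap<String, String> memStore = new TreeMap<>();

    void put(String rowKey, String value) {
        wal.add(new String[] {rowKey, value}); // 1. append the edit to the log
        memStore.put(rowKey, value);           // 2. only then apply it in memory
    }

    // Simulated crash: memory is lost, the log is kept
    void crash() {
        memStore = new TreeMap<>();
    }

    // Rebuild the MemStore by replaying the log in write order
    void replay() {
        for (String[] edit : wal) {
            memStore.put(edit[0], edit[1]);
        }
    }

    public static void main(String[] args) {
        WalSketch rs = new WalSketch();
        rs.put("01", "zhangsan");
        rs.put("02", "lisi");
        rs.crash();
        System.out.println(rs.memStore); // {}
        rs.replay();
        System.out.println(rs.memStore); // {01=zhangsan, 02=lisi}
    }
}
```

Replaying in write order is what makes the last write to a row win again after recovery.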

HFile

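The physical file on disk that holds the actual data. It is the real storage file.

Store: HFiles are kept in a Store, and one Store corresponds to one column family of an HBase table.

MemStore: as the name suggests, an in-memory store that holds the current data operations. Once data has been saved to the WAL, the RegionServer stores the key-value pairs in memory.

Region: a shard of an HBase table. A table is split into regions by RowKey range, and the regions are stored on RegionServers. One RegionServer can hold many different regions.

HBase locks at the row level. A RegionServer manages a number of regions and mainly serves user reads and writes. The data itself is stored in HDFS.

HRegion: the wrapper around a region.

Split by RowKey range: region → HRegion → RegionServer

Split by column family: one HRegion → multiple HStores

HStore → MemStore (write cache) + HFiles (sorted key-values)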

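HBase system architecture:

Client: accesses HBase through its interfaces and keeps a cache to speed up access to the RegionServers

Master: load balancing; assigns regions to RegionServers

RegionServer: maintains its regions and handles their I/O

ZooKeeper: ensures the cluster has only one active Master, stores the entry address of all regions (the -ROOT- table in older versions), and monitors RegionServers going online and offline in real time

HMaster functions (master):

Load balancing; manages and assigns HRegions

DDL: creating, deleting and modifying tables

Like the NameNode, manages metadata (table structure metadata)

ACL permission control

HRegionServer (worker):

Manages and stores its local HRegions

Reads and writes HDFS and provides I/O

Locality: an HRegion's data is kept on the same DataNode where possible, but this locality cannot always be achieved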

HBase installation and deployment

Download

http://hbase.apache.org/downloads.html

Upload and extract

[root@master HBase]# tar -zxvf hbase-1.4.6-bin.tar.gz

Configure the environment

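# hbase-site.xml

[root@master conf]# vim hbase-site.xml

## Add the following properties

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.master.port</name>
    <value>16000</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master:2181,slave1:2181,slave2:2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/opt/apps/Zookeeper/data</value>
  </property>
</configuration>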

[root@master conf]# vim hbase-env.sh

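## Add the following

export JAVA_HOME=/opt/apps/Java/jdk1.8.0_172

export HBASE_HOME=/opt/apps/HBase/hbase-1.4.6

# Extra Java CLASSPATH elements. Optional.

export HBASE_CLASSPATH=$CLASSPATH:$HBASE_HOME/lib

# Use the external ZooKeeper cluster (hbase.zookeeper.quorum above) instead of the one bundled with HBase

export HBASE_MANAGES_ZK=false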

[root@master conf]# vim regionservers

## Add the following (one RegionServer host per line)

slave1

slave2

master

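Fix the jar conflicts

Remove the Hadoop and ZooKeeper jars bundled in HBase's lib directory. Then copy in the jars that match the cluster's versions (Hadoop 2.7.6, ZooKeeper 3.4.12), listed below.

## In /opt/apps/HBase/hbase-1.4.6/lib

[root@master lib]# rm -rf hadoop-*

[root@master lib]# rm -rf zookeeper-3.4.10.jar

## In a directory that holds the jars listed below

[root@master lib]# cp ./* /opt/apps/HBase/hbase-1.4.6/lib/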

hadoop-annotations-2.7.6.jar

hadoop-auth-2.7.6.jar

hadoop-client-2.7.6.jar

hadoop-common-2.7.6.jar

hadoop-hdfs-2.7.6.jar

hadoop-mapreduce-client-app-2.7.6.jar

hadoop-mapreduce-client-common-2.7.6.jar

hadoop-mapreduce-client-core-2.7.6.jar

hadoop-mapreduce-client-jobclient-2.7.6.jar

hadoop-mapreduce-client-shuffle-2.7.6.jar

hadoop-yarn-api-2.7.6.jar

hadoop-yarn-client-2.7.6.jar

hadoop-yarn-common-2.7.6.jar

hadoop-yarn-server-common-2.7.6.jar

zookeeper-3.4.12.jar

Add Hadoop's configuration files to HBase

## Create them as symbolic links

[root@master conf]# ln -s /opt/apps/Hadoop/hadoop-2.7.6/etc/hadoop/core-site.xml /opt/apps/HBase/hbase-1.4.6/conf/core-site.xml

[root@master conf]# ln -s /opt/apps/Hadoop/hadoop-2.7.6/etc/hadoop/hdfs-site.xml /opt/apps/HBase/hbase-1.4.6/conf/hdfs-site.xml

Distribute the configured HBase directory to every node

Start

start-hbase.sh

Check the processes

[root@master apps]# jps

115106 Jps

99509 QuorumPeerMain

113333 HRegionServer

103769 ResourceManager

103419 NameNode

113196 HMaster

103615 SecondaryNameNode

Open http://master:16010 in a browser to check that the cluster started successfully.

Shell commands

# list: show all tables

hbase(main):001:0> list

=> ["student", "teacher"]

# create <table>, <column family>[, <column family> ...]: create a table

hbase(main):002:0> create 'test' ,'info','name'

# put <table>, <row key>, <family:qualifier>, <value>: add (or modify) data

hbase(main):006:0> put 'test','01','info:name' ,'zhangsan'

# scan <table>: view the table's data

hbase(main):007:0> scan 'test'

# describe <table>: view the table structure

hbase(main):008:0> describe 'test'

Table test is ENABLED
test
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'name', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.1810 seconds

# get <table>, <row key>: fetch the contents of one row

hbase(main):010:0> get 'test','01'

# Scan a range of rows (row keys are compared byte by byte in lexicographic order; STARTROW is inclusive, STOPROW is exclusive)

hbase(main):013:0> scan 'test',{STARTROW=>'01',STOPROW=>'03'}

ROW     COLUMN+CELL
01      column=info:name, timestamp=1534852652080, value=zhangsan
02      column=info:name, timestamp=1534853260806, value=zhangsan
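STARTROW is inclusive and STOPROW is exclusive, so the scan above covers the half-open range [01, 03) and would not return a row 03 even if one existed. The sketch below models that behaviour with TreeMap.subMap. The class ScanRange and the third row are hypothetical, and String ordering matches HBase's byte ordering only for ASCII keys such as these.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

class ScanRange {
    // Rows from startRow (inclusive) up to stopRow (exclusive), like STARTROW/STOPROW
    static NavigableMap<String, String> scan(TreeMap<String, String> table, String startRow, String stopRow) {
        return table.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        TreeMap<String, String> test = new TreeMap<>();
        test.put("01", "zhangsan");
        test.put("02", "zhangsan");
        test.put("03", "wangwu"); // hypothetical third row
        System.out.println(scan(test, "01", "03").keySet()); // [01, 02]
    }
}
```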

# count: count the rows in the table

hbase(main):014:0> count 'test'

# delete: delete one cell (column) of a row

hbase(main):016:0> delete 'test' ,'03','info:name'

# deleteall: delete an entire row

hbase(main):019:0> deleteall 'test','02'

# truncate: remove all data from the table

hbase(main):021:0> truncate 'test'

# Drop a table (it must be disabled first)

hbase(main):023:0> disable 'test'

hbase(main):026:0> drop 'test'

API operations

package com.zhiyou.HBase;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class HBaseOperate {

    private Configuration conf = null;
    private Connection connection = null;

    // Connect: point the client at the ZooKeeper quorum and open one shared connection
    @Before
    public void connect() throws IOException {
        conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "master,slave1,slave2");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        connection = ConnectionFactory.createConnection(conf);
    }

    // Close the connection after each test
    @After
    public void close() throws IOException {
        if (connection != null) {
            connection.close();
        }
    }

    /**
     * Create the table teacher with the column families info and zhicheng.
     */
    @Test
    public void createTable() throws IOException {
        try (Admin admin = connection.getAdmin()) {
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("teacher"));
            desc.addFamily(new HColumnDescriptor("info"));
            desc.addFamily(new HColumnDescriptor("zhicheng"));
            admin.createTable(desc);
        }
    }

    // Insert one cell: row l00002 (starts with a lowercase letter l), column info:name, value san
    @Test
    public void putToTable() throws IOException {
        try (Table table = connection.getTable(TableName.valueOf("teacher"))) {
            Put put = new Put(Bytes.toBytes("l00002"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("san"));
            table.put(put);
        }
    }

    /**
     * Check whether the table student exists.
     */
    @Test
    public void isExist() throws IOException {
        try (Admin admin = connection.getAdmin()) {
            System.out.println(admin.tableExists(TableName.valueOf("student")));
        }
    }

    // Delete a whole row; the key must match the one written above (lowercase l, not the digit 1)
    @Test
    public void deleteRow() throws IOException {
        try (Table table = connection.getTable(TableName.valueOf("teacher"))) {
            table.delete(new Delete(Bytes.toBytes("l00002")));
        }
    }

    // Scan the whole table and print every cell
    @Test
    public void scanTable() throws IOException {
        try (Table table = connection.getTable(TableName.valueOf("teacher"));
             ResultScanner rs = table.getScanner(new Scan())) {
            for (Result r : rs) {
                printCells(r);
            }
        }
    }

    // Fetch one row by its row key
    @Test
    public void getRow() throws IOException {
        try (Table table = connection.getTable(TableName.valueOf("teacher"))) {
            printCells(table.get(new Get(Bytes.toBytes("l00001"))));
        }
    }

    // Print row key, column family, column qualifier and value of every cell
    private static void printCells(Result r) {
        if (r.isEmpty()) {
            System.out.println("(no cells)");
            return;
        }
        for (Cell c : r.rawCells()) {
            System.out.print("row key " + Bytes.toString(CellUtil.cloneRow(c)) + "\t");
            System.out.print("family " + Bytes.toString(CellUtil.cloneFamily(c)) + "\t");
            System.out.print("qualifier " + Bytes.toString(CellUtil.cloneQualifier(c)) + "\t");
            System.out.println("value " + Bytes.toString(CellUtil.cloneValue(c)));
        }
    }
}

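HBase read and write flow

Read flow

Before HBase 0.96 there were two special catalog tables. The location of the -ROOT- table was stored in ZooKeeper. -ROOT- recorded the region information of the .META. table and was never split.

.META. recorded the region information of the user tables and could consist of multiple regions.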

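Lookup flow:

Since 0.96 the -ROOT- table has been removed, so the flow is as follows. The client gets the location of hbase:meta (the HRegionServer hosting it) from ZooKeeper (/hbase/meta-region-server) and caches it. It then asks that HRegionServer which HRegionServer holds the requested RowKey of the user table, and caches that location as well. Finally it reads the row from that HRegionServer.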
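The lookup-and-cache step can be modelled with sorted maps. In this model, hbase:meta is a map from region start key to the server hosting that region, and the region for a row is the one with the greatest start key not exceeding the row key. The sketch below skips the ZooKeeper step. The class RegionLocator, the split points "g" and "p" and the server assignment are made up for illustration. The sketch only shows why the client's cache saves repeated meta reads.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of region lookup through hbase:meta with a client-side location cache
class RegionLocator {
    // One region covers [startKey, endKey); an empty endKey means "to the end of the table"
    static final class Region {
        final String startKey;
        final String endKey;
        final String server;

        Region(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }

        boolean contains(String row) {
            return row.compareTo(startKey) >= 0 && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }
    }

    // Stand-in for hbase:meta, keyed by region start key
    private final TreeMap<String, Region> meta = new TreeMap<>();
    // Locations the client has already looked up, also keyed by start key
    private final TreeMap<String, Region> cache = new TreeMap<>();
    int metaReads = 0;

    RegionLocator() {
        // Hypothetical table split at "g" and "p" into three regions
        for (Region r : new Region[] {
                new Region("", "g", "slave1"),
                new Region("g", "p", "slave2"),
                new Region("p", "", "master")}) {
            meta.put(r.startKey, r);
        }
    }

    String locate(String row) {
        Map.Entry<String, Region> hit = cache.floorEntry(row);
        if (hit != null && hit.getValue().contains(row)) {
            return hit.getValue().server;            // cache hit: no meta read
        }
        metaReads++;                                 // cache miss: read hbase:meta
        Region r = meta.floorEntry(row).getValue();  // greatest start key <= row
        cache.put(r.startKey, r);
        return r.server;
    }

    public static void main(String[] args) {
        RegionLocator client = new RegionLocator();
        System.out.println(client.locate("apple"));  // slave1 (meta read)
        System.out.println(client.locate("banana")); // slave1 (cache hit)
        System.out.println(client.locate("zebra"));  // master (meta read)
        System.out.println(client.metaReads);        // 2
    }
}
```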

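Data is looked up in this order: BlockCache (the block cache), then MemStore, then StoreFiles (HFiles).

Before 0.96 the flow was as follows. RegionServers hold the catalog tables as well as the data. To access data, the client first gets the location of -ROOT- from ZooKeeper

uses -ROOT- to find the location of the .META. regions

uses .META. to find the location of the region that holds the data

reads the data from that region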

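Write flow

The client first contacts ZooKeeper to find the metadata (the location of hbase:meta)

determines which region the data is to be written to

then sends the write request to that region's RegionServer

the RegionServer first writes the data, together with the operation, to the HLog (WAL) to prevent data loss

then writes it to the MemStore

The write succeeds only if both the HLog and the MemStore writes succeed. If, during this process, the MemStore reaches its threshold, its data is flushed to a StoreFile

As StoreFiles accumulate, the region keeps growing. When it reaches the threshold, the region is split in two (the split is performed by the RegionServer, which then reports it to the HMaster)

StoreFiles are persisted as HFiles (a StoreFile is a thin wrapper around an HFile)

When the RegionServer is idle, it merges (compacts) these small HFiles into larger ones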
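The flush and compaction steps above can be sketched with sorted maps. The MemStore is flushed into an immutable sorted "file" once it reaches a threshold. A compaction then merges all files into one, and the newer file wins for the same row key. FlushAndCompact is a hypothetical class. Its threshold counts cells rather than bytes, and it ignores the versions, timestamps and delete markers that real HBase compactions handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of one Store: MemStore flushes and compaction
class FlushAndCompact {
    private final int flushThreshold; // hypothetical: number of cells, not bytes
    TreeMap<String, String> memStore = new TreeMap<>();
    final List<TreeMap<String, String>> storeFiles = new ArrayList<>(); // oldest first

    FlushAndCompact(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String row, String value) {
        memStore.put(row, value);
        if (memStore.size() >= flushThreshold) {
            flush();
        }
    }

    // Write the sorted MemStore out as a new immutable file and start an empty MemStore
    void flush() {
        if (!memStore.isEmpty()) {
            storeFiles.add(memStore);
            memStore = new TreeMap<>();
        }
    }

    // Merge all files into one; applying them oldest to newest lets the newer value win
    void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : storeFiles) {
            merged.putAll(file);
        }
        storeFiles.clear();
        storeFiles.add(merged);
    }

    public static void main(String[] args) {
        FlushAndCompact store = new FlushAndCompact(2);
        store.put("01", "a");
        store.put("02", "b"); // threshold reached: flush #1
        store.put("01", "c");
        store.put("03", "d"); // flush #2
        System.out.println(store.storeFiles.size()); // 2
        store.compact();
        System.out.println(store.storeFiles);        // [{01=c, 02=b, 03=d}]
    }
}
```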

MapReduce with HBase

With HBase's Java API we can write MapReduce jobs that operate on HBase, for example a job that imports data from the local file system into an HBase table.

Count the rows of an HBase table

[root@master jar]# yarn jar hbase-server-1.4.6.jar rowcounter student

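## Error

## Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/filter/Filter

## Fix

## Add the HBase jars to the HADOOP_CLASSPATH environment variable, for example:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$($HBASE_HOME/bin/hbase classpath)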

Import a file from HDFS into HBase

[root@master jar]# vim input_hbase.tsv

## Add the data (fields must be separated by tabs, the default importtsv separator)

11111 zhangsan 18

11112 lisi 17

11113 wangwu 99

11114 zhaoliu 100

## Upload to HDFS

[root@master jar]# hadoop fs -mkdir /hbase/mr/

[root@master jar]# hadoop fs -put input_hbase.tsv /hbase/mr/

## Create the target table in HBase first, otherwise a table-not-found exception is thrown

hbase(main):001:0> create 'people','info'

## Run

[root@master jar]# yarn jar hbase-server-1.4.6.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age people hdfs://master:9000/hbase/mr/
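ImportTsv maps each tab-separated field to the entry at the same position in -Dimporttsv.columns. The HBASE_ROW_KEY position becomes the row key, and every other entry names a family:qualifier. The sketch below mimics only that mapping and is not ImportTsv's code. The class TsvMapping and the "ROW" label are hypothetical. The sketch rejects any line whose field count does not match, which is what happens when fields are separated by spaces instead of tabs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TsvMapping {
    // Map one tab-separated line onto a row key plus family:qualifier cells,
    // following a column spec such as "HBASE_ROW_KEY,info:name,info:age"
    static Map<String, String> toCells(String columnSpec, String line) {
        String[] columns = columnSpec.split(",");
        String[] fields = line.split("\t", -1);
        if (columns.length != fields.length) {
            throw new IllegalArgumentException("expected " + columns.length + " fields, got " + fields.length);
        }
        Map<String, String> cells = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            // The HBASE_ROW_KEY position becomes the row key; every other position becomes a cell
            cells.put(columns[i].equals("HBASE_ROW_KEY") ? "ROW" : columns[i], fields[i]);
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(toCells("HBASE_ROW_KEY,info:name,info:age", "11111\tzhangsan\t18"));
        // {ROW=11111, info:name=zhangsan, info:age=18}
    }
}
```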

Custom HBase MapReduce

Mapper:

package com.zhiyou.HBase.diymr;

import java.io.IOException;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * Uses an HBase table (people) as input and copies the info:name and info:age cells of every row.
 * @author Administrator
 */
public class HTableToOtherTable extends TableMapper<ImmutableBytesWritable, Put> {

    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        // Extract the data from people so that it can be written into another HBase table
        Put put = new Put(key.get());
        // Parse the cells of this row
        for (Cell c : value.rawCells()) {
            // Keep only cells of the info column family...
            if ("info".equals(Bytes.toString(CellUtil.cloneFamily(c)))) {
                String qualifier = Bytes.toString(CellUtil.cloneQualifier(c));
                // ...and only the name and age columns
                if ("name".equals(qualifier) || "age".equals(qualifier)) {
                    put.add(c);
                }
            }
        }
        // Pass one Put per row to the reducer (only if it holds at least one cell)
        if (!put.isEmpty()) {
            context.write(key, put);
        }
    }
}

Reducer:

package com.zhiyou.HBase.diymr;

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.io.NullWritable;

// Writes every Put received from the mapper into the output table
public class HTableToOtherTableReduce extends TableReducer<ImmutableBytesWritable, Put, NullWritable> {

    @Override
    protected void reduce(ImmutableBytesWritable key, Iterable<Put> values, Context context)
            throws IOException, InterruptedException {
        for (Put put : values) {
            // The output format ignores the key; the Put itself carries the row key
            context.write(NullWritable.get(), put);
        }
    }
}

Driver:

package com.zhiyou.HBase.diymr;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HTableToOtherTableDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Use the configuration passed in by ToolRunner
        Configuration conf = this.getConf();
        // Create the job
        Job job = Job.getInstance(conf, "people-to-people_mr");
        job.setJarByClass(HTableToOtherTableDriver.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // fetch more rows per RPC for a full-table MR scan
        scan.setCacheBlocks(false);  // do not fill the BlockCache with a one-off scan

        // Input: read the table people through the mapper
        TableMapReduceUtil.initTableMapperJob(
                "people",                      // input table
                scan,                          // scanner
                HTableToOtherTable.class,      // mapper class
                ImmutableBytesWritable.class,  // mapper output key class
                Put.class,                     // mapper output value class
                job);
        // Output: the table people_mr (must already exist with the info column family)
        TableMapReduceUtil.initTableReducerJob(
                "people_mr",
                HTableToOtherTableReduce.class,
                job);

        // Run and wait for completion
        boolean ok = job.waitForCompletion(true);
        System.out.println(ok ? "Job succeeded" : "Job failed");
        return ok ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        System.exit(ToolRunner.run(conf, new HTableToOtherTableDriver(), args));
    }
}

HBase filters: FilterList

FilterList represents a list of filters, so several filters can be added to one query. The filters can be combined with AND (all must match), FilterList.Operator.MUST_PASS_ALL, or with OR (any one may match), FilterList.Operator.MUST_PASS_ONE.

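import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

// table is an open org.apache.hadoop.hbase.client.Table
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ONE);
Scan s1 = new Scan();
filterList.addFilter(new SingleColumnValueFilter(
        Bytes.toBytes("f1"),
        Bytes.toBytes("c1"),
        CompareOp.EQUAL, Bytes.toBytes("v1")));
filterList.addFilter(new SingleColumnValueFilter(
        Bytes.toBytes("f1"),
        Bytes.toBytes("c2"),
        CompareOp.EQUAL, Bytes.toBytes("v2")));
// With the next line only the specified cell is returned; the other cells of the same row are not.
// Caution: a SingleColumnValueFilter only sees the columns the Scan selects, so with only f1:c1
// selected, f1:c2 is treated as missing (see the note on missing columns below).
s1.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("c1"));
s1.setFilter(filterList); // set the filter
ResultScanner scanner = table.getScanner(s1); // the matching results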
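The two operators behave like AND and OR over the individual filters. The row-level model below applies two equality conditions, like the f1:c1 = v1 and f1:c2 = v2 filters above, with MUST_PASS_ALL and with MUST_PASS_ONE. FilterListLogic and its helpers are hypothetical. HBase evaluates filters cell by cell on the server. This sketch deliberately uses rows that contain both columns, so that the missing-column behaviour of SingleColumnValueFilter does not come into play.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Row-level model of FilterList: AND (MUST_PASS_ALL) vs OR (MUST_PASS_ONE)
class FilterListLogic {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    // Equality condition on one "family:qualifier" column
    static Predicate<Map<String, String>> eq(String column, String value) {
        return cells -> value.equals(cells.get(column));
    }

    // Build a row from alternating column, value arguments
    static Map<String, String> row(String... columnValuePairs) {
        Map<String, String> cells = new HashMap<>();
        for (int i = 0; i + 1 < columnValuePairs.length; i += 2) {
            cells.put(columnValuePairs[i], columnValuePairs[i + 1]);
        }
        return cells;
    }

    // Row keys (in table order) accepted by the combined conditions
    static List<String> select(Map<String, Map<String, String>> rows, Operator op,
                               List<Predicate<Map<String, String>>> filters) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> r : rows.entrySet()) {
            Map<String, String> cells = r.getValue();
            boolean pass = (op == Operator.MUST_PASS_ALL)
                    ? filters.stream().allMatch(f -> f.test(cells))
                    : filters.stream().anyMatch(f -> f.test(cells));
            if (pass) {
                kept.add(r.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> rows = new TreeMap<>();
        rows.put("r1", row("f1:c1", "v1", "f1:c2", "v2"));
        rows.put("r2", row("f1:c1", "v1", "f1:c2", "x"));
        rows.put("r3", row("f1:c1", "x", "f1:c2", "x"));
        List<Predicate<Map<String, String>>> filters =
                Arrays.asList(eq("f1:c1", "v1"), eq("f1:c2", "v2"));
        System.out.println(select(rows, Operator.MUST_PASS_ALL, filters)); // [r1]
        System.out.println(select(rows, Operator.MUST_PASS_ONE, filters)); // [r1, r2]
    }
}
```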

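Filter types

Column value filter: SingleColumnValueFilter filters on column values (equal, not equal, ranges, etc.)

Column name prefix filter: ColumnPrefixFilter filters columns whose names start with a given prefix

Multiple column name prefix filter: MultipleColumnPrefixFilter filters columns whose names start with any of several prefixes

Row key filter: RowFilter filters row keys, for example with a regular expression

Column value filter: SingleColumnValueFilter

SingleColumnValueFilter can test a column value for equality (CompareOp.EQUAL),

inequality (CompareOp.NOT_EQUAL),

or a range (e.g. CompareOp.GREATER), and so on. The following example checks whether the column value equals the string 'values':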

SingleColumnValueFilter f = new SingleColumnValueFilter(

Bytes.toBytes("cFamily"),

Bytes.toBytes("column"),

CompareFilter.CompareOp.EQUAL,

Bytes.toBytes("values"));

s1.setFilter(f);

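Note: if the tested column is missing in some rows of the table, the filter cannot evaluate those rows, and they are returned anyway. To exclude such rows, call f.setFilterIfMissing(true).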
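The row-level model below makes that note concrete. It replays the filter info:name = "san" over the rows of the teacher table shown in the scan output that follows. Rows without an info:name cell pass unless filterIfMissing is set. MissingColumnDemo is a hypothetical class that mirrors the documented behaviour of SingleColumnValueFilter with CompareOp.EQUAL; it does not call HBase.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Row-level model of SingleColumnValueFilter(family, qualifier, CompareOp.EQUAL, value)
class MissingColumnDemo {
    static boolean accept(Map<String, String> cells, String column, String value, boolean filterIfMissing) {
        String actual = cells.get(column);
        if (actual == null) {
            // Column absent: the row passes unless setFilterIfMissing(true) was called
            return !filterIfMissing;
        }
        return actual.equals(value);
    }

    static List<String> scan(Map<String, Map<String, String>> table, String column, String value,
                             boolean filterIfMissing) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> r : table.entrySet()) {
            if (accept(r.getValue(), column, value, filterIfMissing)) {
                rows.add(r.getKey());
            }
        }
        return rows;
    }

    static void put(Map<String, Map<String, String>> t, String row, String column, String value) {
        t.computeIfAbsent(row, k -> new HashMap<>()).put(column, value);
    }

    // The rows of the teacher table as listed in the scan output
    static Map<String, Map<String, String>> teacherTable() {
        Map<String, Map<String, String>> t = new TreeMap<>();
        put(t, "8000", "info:sex", "hhaha");
        put(t, "9990", "info:sex", "hehe");
        put(t, "l00001", "info:name", "zhangsan");
        put(t, "l00002", "info:name", "san");
        put(t, "l00002", "zhicheng:zhang", "11");
        return t;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> t = teacherTable();
        System.out.println(scan(t, "info:name", "san", false)); // [8000, 9990, l00002]
        System.out.println(scan(t, "info:name", "san", true));  // [l00002]
    }
}
```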

# Contents of the HBase table teacher

hbase(main):002:0> scan 'teacher'

ROW       COLUMN+CELL
8000      column=info:sex, timestamp=1534993262771, value=hhaha
9990      column=info:sex, timestamp=1534993287438, value=hehe
l00001    column=info:name, timestamp=1534844312072, value=zhangsan
l00002    column=info:name, timestamp=1534944349197, value=san
l00002    column=zhicheng:zhang, timestamp=1534944349197, value=11

@Test
public void scanTable() throws IOException {
    HTable table = new HTable(conf, "teacher");
    Scan scan = new Scan();
    ResultScanner rs = table.getScanner(scan);
    for (Result r : rs) {
        Cell[] cs = r.rawCells();
        // Print the whole Result first
        System.out.println(r);
        for (Cell c : cs) {
            System.out.print("row key " + Bytes.toString(CellUtil.cloneRow(c)) + "\t");
            System.out.print("family " + Bytes.toString(CellUtil.cloneFamily(c)) + "\t");
            // The column name is the qualifier, not the row key
            System.out.print("qualifier " + Bytes.toString(CellUtil.cloneQualifier(c)) + "\t");
            System.out.println("value " + Bytes.toString(CellUtil.cloneValue(c)));
        }
    }
    rs.close();
    table.close();
}

Result:

keyvalues={8000/info:sex/1534993262771/Put/vlen=5/seqid=0}

row key 8000	family info	qualifier sex	value hhaha

keyvalues={9990/info:sex/1534993287438/Put/vlen=4/seqid=0}

row key 9990	family info	qualifier sex	value hehe

keyvalues={l00001/info:name/1534844312072/Put/vlen=8/seqid=0}

row key l00001	family info	qualifier name	value zhangsan

Column name prefix filter: ColumnPrefixFilter

ColumnPrefixFilter selects the columns whose names (qualifiers) start with the given prefix:

ColumnPrefixFilter f = new ColumnPrefixFilter(Bytes.toBytes("values"));

s1.setFilter(f);

Multiple column name prefix filter: MultipleColumnPrefixFilter

MultipleColumnPrefixFilter behaves much like ColumnPrefixFilter, but accepts several prefixes:

byte[][] prefixes = new byte[][] {Bytes.toBytes("value1"),Bytes.toBytes("value2")};

Filter f = new MultipleColumnPrefixFilter(prefixes);

s1.setFilter(f);
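Both prefix filters test the start of the column qualifier. MultipleColumnPrefixFilter simply accepts a qualifier that starts with any of its prefixes. The sketch below shows that matching rule on plain strings, whereas HBase compares bytes. The class ColumnPrefixDemo and the qualifiers are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ColumnPrefixDemo {
    // Qualifiers that start with at least one of the prefixes
    static List<String> matching(List<String> qualifiers, String... prefixes) {
        List<String> kept = new ArrayList<>();
        for (String q : qualifiers) {
            for (String p : prefixes) {
                if (q.startsWith(p)) {
                    kept.add(q);
                    break; // one matching prefix is enough
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> qualifiers = Arrays.asList("value1_a", "value2_b", "value3_c", "name");
        System.out.println(matching(qualifiers, "value1"));           // [value1_a]
        System.out.println(matching(qualifiers, "value1", "value2")); // [value1_a, value2_b]
    }
}
```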

Row key filter: RowFilter

RowFilter filters on the row key. When you only need a range of row keys, it is usually better to use the Scan's startRow and stopRow.

Filter f = new RowFilter(
        CompareFilter.CompareOp.EQUAL,
        new RegexStringComparator("^1234")); // match row keys that start with 1234

s1.setFilter(f);
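A row key prefix such as 1234 can be turned into exactly such a range. The range starts at the prefix itself and stops at the smallest key that is greater than every key with that prefix, obtained by adding one to the last byte that is not 0xFF. The helper below is a hypothetical sketch of that computation. Its result could then be used as the Scan's startRow and stopRow instead of a regex RowFilter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PrefixToRange {
    // Smallest row key greater than every key starting with prefix:
    // drop trailing 0xFF bytes, then add one to the last remaining byte.
    // An empty result means "no stop row" (scan to the end of the table).
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // cannot wrap: this byte is not 0xFF
        return stop;
    }

    public static void main(String[] args) {
        byte[] start = "1234".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        System.out.println(new String(stop, StandardCharsets.UTF_8)); // 1235
    }
}
```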

Components of the HBase architecture

HMaster, HRegionServer, HRegion, Store, MemStore, StoreFile, HFile, HLog, etc.

Differences between HBase and Hive

Hive

Data warehouse

In essence, Hive maps files already stored in HDFS to table definitions kept in MySQL (the metastore), so that they can be managed and queried with HQL.

Used for data analysis and cleansing

Hive suits offline data analysis and cleansing, and its latency is high. It is based on HDFS and MapReduce: the data Hive stores still lives on the DataNodes, and HQL statements are ultimately translated into MapReduce jobs.

HBase

Database

A column-oriented, distributed, non-relational database.

Used to store structured and unstructured data. Suited to storing non-relational data in single tables, not to relational queries such as JOIN.

Based on HDFS

Data is persisted as HFiles stored on the DataNodes and managed by RegionServers in the form of regions.

Low latency, suitable for online services. Faced with large volumes of enterprise data, HBase can store huge amounts of data in a single table while still providing efficient data access.

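Hive and HBase integration

Configuration

Replace the HBase and ZooKeeper jars in Hive's lib directory with the versions used by the cluster

Edit the configuration file

[root@master jar]# vim /opt/apps/Hive/hive-2.3.3/conf/hive-site.xml

<property>
  <name>hive.zookeeper.quorum</name>
  <value>master,slave1,slave2</value>
</property>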

hive (default)> create table hive_hbase_people(id int,name string,age int)
              > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
              > with serdeproperties("hbase.columns.mapping"=":key,info:name,info:age")
              > tblproperties("hbase.table.name"="hbase_hive_people");

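## Once the table is created, the corresponding HBase table is created automatically

## Data cannot be loaded into this linked table with LOAD DATA; insert it with INSERT ... SELECT instead

Simple usage

Suppose a table in HBase already holds some data, and we want to link a Hive external table to it so that Hive can be used for offline analysis.

## Create the table in HBase (the namespace zhiyou must already exist; create it with create_namespace 'zhiyou')

hbase(main):003:0> create 'zhiyou:student','haha'

## Create the linked external table in Hive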

hive (default)> create external table hive_external_hbase_student(id int,name string)

stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

with serdeproperties ("hbase.columns.mapping"=":key,haha:name")

tblproperties("hbase.table.name"="zhiyou:student");

hive (default)> select * from hive_external_hbase_student;

Sqoop and HBase integration

Configuration

[root@master ~]# vim /opt/apps/Sqoop/sqoop-1.4/conf/sqoop-env.sh

#set the path to where bin/hbase is available

export HBASE_HOME=/opt/apps/HBase/hbase-1.4.6

#Set the path for where zookeper config dir is

export ZOOKEEPER_HOME=/opt/apps/Zookeeper/zookeeper-3.4.12

export ZOOCFGDIR=$ZOOKEEPER_HOME/conf

[root@master Zookeeper]# sqoop import \
--connect jdbc:mysql://master:3306/mysql_bigdata \
--username root \
--password 123456 \
--table product \
--columns "id,name,price" \
--hbase-create-table \
--hbase-row-key "id" \
--hbase-table "hbase_sqoop_product" \
--column-family "info" \
--split-by id

Related parameters

Parameter                     Description
--column-family <family>      Sets the target column family for the import.
--hbase-create-table          If specified, creates missing HBase tables, so the table does not have to be created in HBase beforehand.
--hbase-row-key <col>         Specifies which input column (from MySQL) to use as the HBase row key. If the input table has a composite key, list the key columns separated by commas. (Avoid duplicate row keys.)
--hbase-table <table-name>    Specifies the HBase table to import into, instead of HDFS.
--hbase-bulkload              Enables bulk loading.

Simple example

[root@master ~]# sqoop import \
--connect jdbc:mysql://master:3306/mysql_bigdata \
--username root \
--password 123456 \
--table product \
--columns "id,name,price" \
--hbase-create-table \
--hbase-row-key "id" \
--hbase-table "hbase_sqoop_product_1" \
--column-family "info"

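Other HBase shell operations

Data backup and restore

Backup

Stop the HBase service, then use the distcp command, which runs a MapReduce job, to copy the data to another location. The target can be another directory in the current cluster or a dedicated backup cluster.

Restore

Very simple: just like the backup, copy the whole data set back.

Node management: commissioning

When a RegionServer starts, it registers with the HMaster and starts accepting data. A newly added node initially holds no data. If the balancer is enabled, regions are moved onto the new RegionServer. If processes are started and stopped with ssh and the HBase scripts, add the new node's hostname to the conf/regionservers file.

Decommissioning

As the name suggests, this removes a RegionServer from the current HBase cluster.

Stop load balancing (in the HBase shell)

balance_switch false

Stop the RegionServer (run on the node being removed)

hbase-daemon.sh stop regionserver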

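High availability

In HBase, the HMaster monitors the life cycle of the RegionServers and balances their load. If the HMaster goes down, the whole HBase cluster becomes unhealthy and cannot keep working in that state for long. HBase therefore supports a high-availability configuration for the HMaster.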

[root@master conf]# vim backup-masters

master

slave1

slave2

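## Copy the file to the other nodes (remote copy)

HBase pre-splitting

First work out how the data's keys are distributed. Then plan how many regions to create and the start key and end key of each region, and write the planned split keys to a file. For example, if the first few characters of every key are the numbers 0001 to 0010, the table can be split into 10 regions.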

Create a pre-split table in the HBase shell, specifying the split file:

create 'split_table_test', 'cf', {SPLITS_FILE => 'region_split_info.txt'}
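For the example above (prefixes 0001 to 0010), ten regions need nine split keys, 0002 to 0010, and the first region covers everything before 0002. The sketch below generates those keys and writes one key per line to region_split_info.txt. SplitFileGenerator is a hypothetical helper, and the zero-padded width of 4 is an assumption taken from the example prefixes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class SplitFileGenerator {
    // Split keys for zero-padded prefixes first..last: n prefixes -> n - 1 split keys -> n regions
    static List<String> splitKeys(int first, int last, int width) {
        List<String> keys = new ArrayList<>();
        for (int i = first + 1; i <= last; i++) {
            keys.add(String.format("%0" + width + "d", i));
        }
        return keys;
    }

    public static void main(String[] args) throws IOException {
        List<String> keys = splitKeys(1, 10, 4);
        // One split key per line, the layout read through SPLITS_FILE
        Files.write(Paths.get("region_split_info.txt"), keys, StandardCharsets.UTF_8);
        System.out.println(keys.size() + " split keys -> " + (keys.size() + 1) + " regions");
    }
}
```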

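When designing an HBase table, should you think about the rowkey or about the partitioning (regions)? Or both?

By default a new HBase table has a single region whose rowkey range is unbounded, i.e. it has no startkey and no endkey. Every write therefore goes to this default region. As the data keeps growing, the region can no longer hold it and is split into two regions. This causes two problems: 1. all writes go to one region, creating a write hotspot; 2. region splits consume precious cluster I/O. To avoid this, we can create several empty regions when the table is created and fix the start and end rowkey of each region. As long as the rowkey design spreads writes evenly across the regions, there is no write hotspot, and splits also become much less likely. Of course, as the data keeps growing, regions that need to split will still split.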
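One common way to make row keys hit the regions evenly is salting. Each key is prefixed with a bucket number derived from a hash of the key, and the table is pre-split at the bucket boundaries. The sketch below uses a hypothetical scheme ("<bucket>_<key>", bucket = hashCode mod 10), not something HBase provides. It shows sequential keys spreading over all ten buckets.

```java
import java.util.Map;
import java.util.TreeMap;

class SaltedRowKey {
    // Hypothetical scheme: "<bucket>_<original key>", bucket = hashCode mod buckets (two digits).
    // The bucket depends only on the key, so a Get can recompute the stored row key.
    static String salt(String key, int buckets) {
        int bucket = Math.floorMod(key.hashCode(), buckets);
        return String.format("%02d_%s", bucket, key);
    }

    public static void main(String[] args) {
        int buckets = 10; // e.g. a table pre-split at "01_", "02_", ..., "09_" into 10 regions
        Map<String, Integer> perBucket = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            String rowKey = salt(String.format("user%04d", i), buckets);
            perBucket.merge(rowKey.substring(0, 2), 1, Integer::sum);
        }
        // Sequential keys are spread over all ten buckets instead of piling into one region
        System.out.println(perBucket);
    }
}
```

The trade-off is on the read side. Because the bucket can be recomputed from the key, a Get stays cheap. A scan in original key order, however, now needs one scan per bucket.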

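Linux tuning

## 1. Enable file read-ahead (ra: readahead)

blockdev --setra 1024 /dev/sda

## 2. Minimize swapping: vm.swappiness=0 tells the kernel to avoid swapping process memory out to disk as far as possible

sysctl -w vm.swappiness=0

## 3. Raise the maximum numbers of open files and processes

ulimit -n ## show the maximum number of open files

ulimit -u ## show the maximum number of user processes

## Both limits can be changed in the following file

/etc/security/limits.conf

## 4. Keep the system patched and updated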

Copyright belongs to the author. For reprints or content collaboration, please contact the author.
