1. Introduction to HBase
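HBase is the database of Apache Hadoop. It provides random, real-time read/write access to large datasets and is an open-source implementation of Google's BigTable. HBase's goal is to store and process large volumes of data; more specifically, to handle very large tables made up of huge numbers of rows and columns using only commodity hardware (the project itself speaks of billions of rows by millions of columns).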
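HBase is an open-source, distributed, multi-version, column-oriented storage model. It can run directly on the local file system or on Hadoop's HDFS. To improve data reliability and system robustness, and to make full use of HBase's ability to handle large data, HDFS is the better choice of storage system. In addition, HBase stores loosely structured data: what it stores falls somewhere between a key/value map and relational data. As the figure below shows, the data stored in HBase is logically one very large table whose columns can be added dynamically as needed. The data in each cell can also have multiple versions, distinguished by timestamp. The figure also shows that HBase sits between a storage layer below it and a computation layer above it.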
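To make the "large table with dynamic columns and timestamped versions" idea concrete, here is a toy sketch in shell. It only imitates the logical model (row key, column, timestamp, value) with a tab-separated text file; the function names `put_cell` and `get_cell` are made up for this illustration and have nothing to do with how HBase actually stores data on disk.

```shell
#!/bin/sh
# Toy model of HBase's logical table: one cell version per line, stored as
# tab-separated fields: row key, column (family:qualifier), timestamp, value.
# Hypothetical helpers for illustration only.

# put_cell FILE ROW COLUMN TIMESTAMP VALUE: append one cell version.
put_cell() {
    printf '%s\t%s\t%s\t%s\n' "$2" "$3" "$4" "$5" >> "$1"
}

# get_cell FILE ROW COLUMN: print the value carrying the newest timestamp.
get_cell() {
    awk -F '\t' -v row="$2" -v col="$3" '
        $1 == row && $2 == col && (!found || $3 + 0 > best) {
            best = $3 + 0; value = $4; found = 1
        }
        END { if (found) print value }
    ' "$1"
}
```

Writing the same row and column twice with different timestamps keeps both versions in the file, and `get_cell` returns the newer one. A column that has never been used before (say `info:email`) can be written at any time without declaring it first, which mirrors the dynamic columns described above.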
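2. HBase Installation Overview
- Configure hosts so that every host name involved resolves to an IP address.
  If Hadoop is already installed and deployed, this step is already done.
- Edit hbase-env.sh.
- Edit hbase-site.xml.
- Edit the regionservers file.
- Copy HBase to the other nodes.
- Start HBase.
- Verify the startup.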
3. Installation Steps
- Configure hosts so that every host name involved resolves to an IP address.
    [hadoop@master ~]$ cat /etc/hosts
    10.10.18.229 master
    10.10.18.221 slave01
    10.10.19.231 slave02
    10.10.19.232 slave03
    10.10.18.230 slave04
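A quick way to confirm this step is to check that every cluster host name has an entry in the hosts file. The sketch below is one possible check; `missing_hosts` is a name invented here, and it only looks at the file you give it (it does not query DNS).

```shell
#!/bin/sh
# missing_hosts HOSTS_FILE NAME...
# Print each NAME that has no entry in HOSTS_FILE (hypothetical helper).
missing_hosts() (
    hosts_file=$1
    shift
    for name in "$@"; do
        awk -v want="$name" '
            { sub(/#.*/, "") }    # drop comments before looking at fields
            { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
            END { exit !found }
        ' "$hosts_file" || printf '%s\n' "$name"
    done
)
```

On the master this could be run as `missing_hosts /etc/hosts master slave01 slave02 slave03 slave04`; no output means every name is present.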
- Download and extract the package
    [hadoop@master ~]$ wget http://www-us.apache.org/dist/hbase/stable/hbase-1.2.5-bin.tar.gz
    [hadoop@master ~]$ tar xvf hbase-1.2.5-bin.tar.gz
- Edit the environment variables
    [hadoop@master hbase-1.2.5]$ vim ~/.bash_profile
    # Add the following lines
    export HBASE_HOME=/home/hadoop/hbase-1.2.5
    export PATH=$PATH:$HBASE_HOME/bin
- Edit hbase-env.sh
  The main change is to add the JAVA_HOME environment variable:
    export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64
- Edit hbase-site.xml
    <configuration>
      <property>
        <!-- Directory where HBase stores its data -->
        <name>hbase.rootdir</name>
        <value>hdfs://master:9000/hbase</value>
      </property>
      <property>
        <!-- Turn on distributed mode -->
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <property>
        <!-- Master node of the HBase cluster. HBase 1.x masters listen on
             port 16000 by default and are located through ZooKeeper, so this
             legacy setting is probably ignored. -->
        <name>hbase.master</name>
        <value>master:60000</value>
      </property>
      <property>
        <!-- Hosts of the ZooKeeper ensemble; ZooKeeper decides by majority vote -->
        <name>hbase.zookeeper.quorum</name>
        <value>master,slave01,slave02,slave03,slave04</value>
      </property>
      <property>
        <!-- Data directory of the ZooKeeper ensemble -->
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/home/hadoop/hbase-1.2.5/zookeeper</value>
      </property>
    </configuration>
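After editing, it can be handy to read a setting back out of hbase-site.xml, for example to confirm that hbase.rootdir points at the right NameNode. The sketch below is a rough text-based lookup, not a real XML parser: `get_property` is a made-up helper, and it assumes each `<name>` and `<value>` element sits on its own line.

```shell
#!/bin/sh
# get_property FILE NAME
# Print the <value> that follows <name>NAME</name> in a Hadoop-style
# configuration file. Assumes one <name> or <value> element per line.
get_property() (
    awk -v want="$2" '
        /<name>/ {
            n = $0
            sub(/.*<name>/, "", n)
            sub(/<\/name>.*/, "", n)
        }
        /<value>/ && n == want {
            v = $0
            sub(/.*<value>/, "", v)
            sub(/<\/value>.*/, "", v)
            print v
            exit
        }
    ' "$1"
)
```

For example, `get_property "$HBASE_HOME/conf/hbase-site.xml" hbase.rootdir` should print `hdfs://master:9000/hbase` for the file above, provided it is laid out one element per line.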
- Edit the regionservers file
    [hadoop@master conf]$ cat regionservers
    slave01
    slave02
    slave03
    slave04
- Sync the modified HBase directory to the other nodes
    [hadoop@master ~]$ scp -r hbase-1.2.5 slave01:~/
    [hadoop@master ~]$ scp -r hbase-1.2.5 slave02:~/
    [hadoop@master ~]$ scp -r hbase-1.2.5 slave03:~/
    [hadoop@master ~]$ scp -r hbase-1.2.5 slave04:~/
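With more than a few nodes, typing one scp per host gets tedious. One option is to generate the commands from the regionservers file, as in the sketch below. `sync_commands` is a hypothetical helper that only prints the commands, so they can be reviewed before anything is copied.

```shell
#!/bin/sh
# sync_commands REGIONSERVERS_FILE DIR
# Print one scp command per host listed in REGIONSERVERS_FILE.
# Blank lines and lines starting with '#' are skipped.
sync_commands() (
    while read -r host rest || [ -n "$host" ]; do
        case $host in
            ''|'#'*) continue ;;
        esac
        printf 'scp -r %s %s:~/\n' "$2" "$host"
    done < "$1"
)
```

Run from the home directory, `sync_commands hbase-1.2.5/conf/regionservers hbase-1.2.5` prints the same four commands shown above; piping the output to `sh` executes them.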
- Start/stop the HBase cluster
    # Before starting HBase, make sure Hadoop is already running
    [hadoop@master ~]$ hdfs dfsadmin -report |less
    Configured Capacity: 9508728098816 (8.65 TB)
    Present Capacity: 7003711967546 (6.37 TB)
    DFS Remaining: 5616475771026 (5.11 TB)
    DFS Used: 1387236196520 (1.26 TB)
    DFS Used%: 19.81%
    Under replicated blocks: 137
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    -------------------------------------------------
    Live datanodes (4):
    ...
    # Start the HBase cluster
    [hadoop@master conf]$ start-hbase.sh
    slave01: starting zookeeper, logging to /home/hadoop/hbase-1.2.5/bin/../logs/hbase-hadoop-zookeeper-slave01.out
    slave03: starting zookeeper, logging to /home/hadoop/hbase-1.2.5/bin/../logs/hbase-hadoop-zookeeper-slave03.out
    master: starting zookeeper, logging to /home/hadoop/hbase-1.2.5/bin/../logs/hbase-hadoop-zookeeper-master.out
    slave04: starting zookeeper, logging to /home/hadoop/hbase-1.2.5/bin/../logs/hbase-hadoop-zookeeper-slave04.out
    slave02: starting zookeeper, logging to /home/hadoop/hbase-1.2.5/bin/../logs/hbase-hadoop-zookeeper-slave02.out
    starting master, logging to /home/hadoop/hbase-1.2.5/logs/hbase-hadoop-master-master.out
    slave01: starting regionserver, logging to /home/hadoop/hbase-1.2.5/bin/../logs/hbase-hadoop-regionserver-slave01.out
    slave03: starting regionserver, logging to /home/hadoop/hbase-1.2.5/bin/../logs/hbase-hadoop-regionserver-slave03.out
    slave02: starting regionserver, logging to /home/hadoop/hbase-1.2.5/bin/../logs/hbase-hadoop-regionserver-slave02.out
    slave04: starting regionserver, logging to /home/hadoop/hbase-1.2.5/bin/../logs/hbase-hadoop-regionserver-slave04.out
    # Two new processes now appear on master: HQuorumPeer and HMaster
    [hadoop@master conf]$ jps
    13154 Jps
    46355 ResourceManager
    9736 RunJar
    45787 NameNode
    46090 SecondaryNameNode
    12668 HMaster
    8641 JobHistoryServer
    12559 HQuorumPeer
    # The slave nodes also gain two processes: HQuorumPeer and HRegionServer
    [hadoop@master conf]$ ssh slave01
    Last login: Thu May 4 17:31:40 2017 from master
    [hadoop@slave01 ~]$ jps
    13244 DataNode
    15944 NodeManager
    5688 HQuorumPeer
    6057 Jps
    5800 HRegionServer
    # To shut the cluster down, run stop-hbase.sh on master
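Eyeballing jps output on every node works for five machines but does not scale well. As a rough sketch, the check can be scripted: the hypothetical helper `missing_daemons` below reads jps output on standard input and prints any expected process name it cannot find.

```shell
#!/bin/sh
# missing_daemons NAME...
# Read `jps` output ("PID NAME" per line) on stdin and print each
# expected NAME that is not running.
missing_daemons() (
    running=$(awk '{ print $2 }')
    for name in "$@"; do
        printf '%s\n' "$running" | grep -qxF "$name" || printf '%s\n' "$name"
    done
)
```

On master, `jps | missing_daemons HMaster HQuorumPeer` should print nothing once HBase is up. For a slave, something like `ssh slave01 jps | missing_daemons HRegionServer HQuorumPeer` could work, assuming `jps` is on the PATH of non-interactive SSH sessions on that host.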
- Check the HBase status
    # Enter the HBase shell
    [hadoop@master conf]$ hbase shell
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/hadoop/hbase-1.2.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Type "exit<RETURN>" to leave the HBase Shell
    Version 1.2.5, rd7b05f79dee10e0ada614765bb354b93d615a157, Wed Mar 1 00:34:48 CST 2017
    # There is one active master and 3 regionservers, so one regionserver seems
    # to be missing. It turned out that this regionserver's clock was not
    # synchronized with the master's (its time was earlier than the master's).
    hbase(main):001:0> status
    1 active master, 0 backup masters, 3 servers, 0 dead, 0.3333 average load
    # After installing and configuring the ntp service, all 4 regionservers are present
    hbase(main):001:0> status
    1 active master, 0 backup masters, 4 servers, 0 dead, 0.5000 average load
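The missing regionserver above came down to clock drift, so it may be worth checking clocks before starting the cluster. The sketch below compares two epoch-second timestamps against a limit. The helper names are invented for this example, and the 30-second default is an assumption based on my understanding that the master's `hbase.master.maxclockskew` setting defaults to 30000 ms; check your version's documentation before relying on it.

```shell
#!/bin/sh
# clock_skew A B
# Print the absolute difference between two epoch-second values.
clock_skew() (
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    echo "$d"
)

# skew_ok A B [MAX]
# Succeed if the two clocks differ by at most MAX seconds.
# MAX defaults to 30, an assumed match for hbase.master.maxclockskew.
skew_ok() {
    [ "$(clock_skew "$1" "$2")" -le "${3:-30}" ]
}
```

A possible use from master is `skew_ok "$(date +%s)" "$(ssh slave01 date +%s)" || echo "slave01 clock out of sync"`. The SSH round trip adds a little delay to the remote reading, so treat results close to the limit with some caution.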