CDH Cluster Installation

Chapter 1: Cloudera Manager

1.1 CM Overview

1.1.1 Introduction to CM

Cloudera Manager is a tool that provides automated cluster installation, centralized management, cluster monitoring, and alerting. It shortens cluster installation from days to a few hours and reduces the operations headcount from dozens of people to just a few, greatly improving the efficiency of cluster management.

1.1.2 CM Architecture

1.2 Environment Preparation

1.2.1 Virtual Machine Preparation

Clone three virtual machines (hadoop102, hadoop103, hadoop104); configure each host's network IP and hostname, and disable the firewall.

Set the memory of hadoop102, hadoop103, and hadoop104 to 16 GB, 4 GB, and 4 GB respectively.

1.2.2 Passwordless SSH Login

Configure passwordless login from hadoop102 to hadoop102, hadoop103, and hadoop104.

1) Generate the public and private keys:

[root@hadoop102 .ssh]$ ssh-keygen -t rsa

Then press Enter three times; two files are generated: id_rsa (private key) and id_rsa.pub (public key).

2) Copy the public key to the target machines for passwordless login

[root@hadoop102 .ssh]$ ssh-copy-id hadoop102

[root@hadoop102 .ssh]$ ssh-copy-id hadoop103

[root@hadoop102 .ssh]$ ssh-copy-id hadoop104

3) Repeat steps 1 and 2 to configure passwordless login from hadoop103 to hadoop102, hadoop103, and hadoop104 (a quick check is shown below).
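To confirm the setup, an optional check (hostnames as used above) is to SSH to another node and run a command; no password prompt should appear:

[root@hadoop102 .ssh]$ ssh hadoop103 hostname
hadoop103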

1.2.3 Cluster Sync Script

1) Create a bin directory under /root, and in it create a file named xsync with the following content:

[root@hadoop102 ~]$ mkdir bin

[root@hadoop102 ~]$ cd bin/

[root@hadoop102 bin]$ vi xsync

Write the following code in the file:

#!/bin/bash
#1 Get the number of input arguments; exit if there are none
pcount=$#
if((pcount==0)); then
echo no args;
exit;
fi

#2 Get the file name
p1=$1
fname=`basename $p1`
echo fname=$fname

#3 Get the absolute path of the parent directory
pdir=`cd -P $(dirname $p1); pwd`
echo pdir=$pdir

#4 Get the current user name
user=`whoami`

#5 Loop over the target hosts
for((host=103; host<105; host++)); do
        echo ------------------- hadoop$host --------------
        rsync -av $pdir/$fname $user@hadoop$host:$pdir
done

2) Make the xsync script executable

[root@hadoop102 bin]$ chmod 777 xsync
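A usage sketch (assuming the scripts in this guide live in /root/bin): distributing that directory itself puts the same scripts on hadoop103 and hadoop104:

[root@hadoop102 ~]$ xsync /root/bin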

1.2.4 Install the JDK (all three nodes)

1) Create the module and software folders under /opt on hadoop102

[root@hadoop102 opt]# mkdir module

[root@hadoop102 opt]# mkdir software

2) Use SecureCRT to upload jdk-8u144-linux-x64.tar.gz to the /opt/software directory on hadoop102

3) Check in the /opt/software directory on Linux that the package was uploaded successfully

[root@hadoop102 software]$ ls

jdk-8u144-linux-x64.tar.gz

4) Extract the JDK to /opt/module and change the owner and group of the files to root

[root@hadoop102 software]$ tar -zxvf jdk-8u144-linux-x64.tar.gz -C /opt/module/

[root@hadoop102 module]# chown root:root jdk1.8.0_144/ -R

5) Configure the JDK environment variables

(1) Open the /etc/profile file

[root@hadoop102 software]$ vi /etc/profile

Append the JDK path at the end of the profile file

#JAVA_HOME

export JAVA_HOME=/opt/module/jdk1.8.0_144

export PATH=$PATH:$JAVA_HOME/bin

(2) Make the changes take effect

[root@hadoop102 jdk1.8.0_144]$ source /etc/profile

6) Verify that the JDK is installed correctly

[root@hadoop102 jdk1.8.0_144]# java -version

java version "1.8.0_144"

7) Distribute the JDK and the environment variables from hadoop102 to hadoop103 and hadoop104

[root@hadoop102 opt]# xsync /opt/module/

[root@hadoop102 opt]# xsync /etc/profile


Run source on hadoop103 and hadoop104 respectively

[root@hadoop103 ~]$ source /etc/profile

[root@hadoop104 ~]# source /etc/profile

1.2.5 Cluster-Wide Command Script

1) Create the script xcall.sh in /root/bin

[root@hadoop102 bin]$ vim xcall.sh

2) Write the following content in the script

#! /bin/bash

for i in hadoop102 hadoop103 hadoop104
do
        echo --------- $i ----------
        ssh $i "$*"
done

3) Make the script executable

[root@hadoop102 bin]$ chmod 777 xcall.sh

4) Append /etc/profile to ~/.bashrc

[root@hadoop102 module]# cat /etc/profile>> ~/.bashrc

[root@hadoop103 module]# cat /etc/profile>> ~/.bashrc

[root@hadoop104 module]# cat /etc/profile>> ~/.bashrc

5) Test

[root@hadoop102 bin]# xcall.sh jps

1.2.6 Install MySQL

Note: perform the following steps as the root user, and uninstall any existing MySQL before installing.

1) Prepare the installation packages

(1) Check whether MySQL is already installed

[root@hadoop102 桌面]# rpm -qa | grep mysql

mysql-libs-5.1.73-7.el6.x86_64

(2) If MySQL is installed, uninstall it first

[root@hadoop102 桌面]# rpm -e --nodeps mysql-libs-5.1.73-7.el6.x86_64

(3) Upload mysql-libs.zip to /opt/software on hadoop102 and unzip it in place

[root@hadoop102 software]# unzip mysql-libs.zip

[root@hadoop102 software]# ls

mysql-libs.zip

mysql-libs

(4) Enter the mysql-libs folder

[root@hadoop102 mysql-libs]# ll
total 76048
-rw-r--r--. 1 root root 18509960 Mar 26 2015 MySQL-client-5.6.24-1.el6.x86_64.rpm
-rw-r--r--. 1 root root  3575135 Dec  1 2013 mysql-connector-java-5.1.27.tar.gz
-rw-r--r--. 1 root root 55782196 Mar 26 2015 MySQL-server-5.6.24-1.el6.x86_64.rpm

2) Install the MySQL server

(1) Install the MySQL server package

[root@hadoop102 mysql-libs]# rpm -ivh MySQL-server-5.6.24-1.el6.x86_64.rpm

(2) View the generated random password

[root@hadoop102 mysql-libs]# cat /root/.mysql_secret

OEXaQuS8IWkG19Xs

(3) Check the MySQL status

[root@hadoop102 mysql-libs]# service mysql status

(4) Start MySQL

[root@hadoop102 mysql-libs]# service mysql start

3) Install the MySQL client

(1) Install the MySQL client package

[root@hadoop102 mysql-libs]# rpm -ivh MySQL-client-5.6.24-1.el6.x86_64.rpm

(2) Connect to MySQL (replace the password with the generated random password)

[root@hadoop102 mysql-libs]# mysql -uroot -pOEXaQuS8IWkG19Xs

(3) Change the password

mysql>SET PASSWORD=PASSWORD('000000');

(4) Exit MySQL

mysql>exit

4) Configure the Host column of the user table in MySQL

Make it so that the root user plus the password can log in to MySQL from any host.

(1) Log in to MySQL

[root@hadoop102 mysql-libs]# mysql -uroot -p000000

(2) Show the databases

mysql>show databases;

(3) Use the mysql database

mysql>use mysql;

(4) Show all tables in the mysql database

mysql>show tables;

(5) Show the structure of the user table

mysql>desc user;

(6) Query the user table

mysql>select User, Host, Password from user;

(7) Update the user table, setting the Host value to %

mysql>update user set host='%' where host='localhost';

(8) Delete the root user's other host entries

mysql>delete from user where Host='hadoop102';
mysql>delete from user where Host='127.0.0.1';
mysql>delete from user where Host='::1';

(9) Flush the privileges

mysql>flush privileges;

(10) Exit

mysql>quit;
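As an optional check of the host configuration above (a sketch only, assuming a MySQL client is available on another node), a remote login with the same credentials should now succeed:

[root@hadoop103 ~]# mysql -h hadoop102 -uroot -p000000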

1.2.7 Create the Databases Used by CM

Create the monitoring database, Hive database, Oozie database, and Hue database in MySQL, one after another.

1) Log in to the database

[root@hadoop102 ~]# mysql -uroot -p000000

2) Cluster monitoring database

mysql> create database amon DEFAULT CHARSET utf8 COLLATE utf8_general_ci;

3) Hive database

mysql> create database hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;

4) Oozie database

mysql> create database oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;

5) Hue database

mysql> create database hue DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
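An optional check before leaving the session is to confirm the four databases now exist:

mysql> show databases;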

6) Exit the database

mysql> quit;

1.2.8 Download Third-Party Dependencies

Run the following on each of the three nodes (all Agent nodes) to install the third-party dependencies (note: Internet access is required)

[root@hadoop102 ~]# yum -y install chkconfig python bind-utils psmisc libxslt zlib sqlite cyrus-sasl-plain cyrus-sasl-gssapi fuse fuse-libs redhat-lsb

[root@hadoop103 ~]# yum -y install chkconfig python bind-utils psmisc libxslt zlib sqlite cyrus-sasl-plain cyrus-sasl-gssapi fuse fuse-libs redhat-lsb

[root@hadoop104 ~]# yum -y install chkconfig python bind-utils psmisc libxslt zlib sqlite cyrus-sasl-plain cyrus-sasl-gssapi fuse fuse-libs redhat-lsb

1.2.9 Disable SELinux

Security-Enhanced Linux (SELinux) is a Linux kernel module and a Linux security subsystem. To avoid errors during installation, it is recommended to disable it. There are two ways to do so:

1) Temporary disable (not recommended)

[root@hadoop102 ~]# setenforce 0

This only lasts for the current boot and is lost after a reboot.

2) Permanent disable (recommended)

Edit the configuration file /etc/selinux/config

[root@hadoop102 ~]# vim /etc/selinux/config

Change SELINUX=enforcing to SELINUX=disabled

SELINUX=disabled

3) Sync the /etc/selinux/config file

[root@hadoop102 ~]# xsync /etc/selinux/config

4) Reboot hadoop102, hadoop103, and hadoop104

[root@hadoop102 ~]# reboot

[root@hadoop103 ~]# reboot

[root@hadoop104 ~]# reboot

1.2.10 Configure NTP Time Synchronization

1) NTP server configuration (on hadoop102)

[root@hadoop102 ~]# vi /etc/ntp.conf

① Comment out every line that begins with restrict
② Change
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
to
restrict 192.168.1.102 mask 255.255.255.0 nomodify notrap
③ Comment out all server lines
④ Add the following two lines (a sketch of the resulting file follows this list)
server 127.127.1.0
fudge 127.127.1.0 stratum 10
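A minimal sketch of what the edited /etc/ntp.conf on hadoop102 ends up containing after these steps (other default lines omitted; addresses as assumed above):

restrict 192.168.1.102 mask 255.255.255.0 nomodify notrap
server 127.127.1.0
fudge 127.127.1.0 stratum 10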

2) Start the NTP service

[root@hadoop102 ~]# service ntpd start

3) NTP client configuration (on the agent hosts hadoop103 and hadoop104)

[root@hadoop103 ~]# vi /etc/ntp.conf


① Comment out all restrict and server lines
② Add: server 192.168.1.102

4) Manual test

[root@hadoop103 ~]# ntpdate 192.168.1.102

Output like the following means it succeeded:

17 Jun 15:34:38 ntpdate[9247]: step time server 192.168.1.102 offset 77556618.173854 sec

If you see the following instead, stop ntpd first:

17 Jun 15:25:42 ntpdate[8885]: the NTP socket is in use, exiting

5) Start ntpd and enable it at boot (on every node: hadoop102, hadoop103, hadoop104)

[root@hadoop103 ~]# chkconfig ntpd on

[root@hadoop103 ~]# service ntpd start

6) Use the broadcast date command to check the result, as shown below.
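A sketch using the xcall.sh script created earlier, which runs date on every node at once so the clocks can be compared:

[root@hadoop102 ~]# xcall.sh date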

1.3 CM Installation and Deployment

1.3.1 CM Download Addresses

1) CM download address: http://archive.cloudera.com/cm5/cm/5/

2) Offline parcel repository: http://archive.cloudera.com/cdh5/parcels

1.3.2 CM Installation

Note: perform all of the following as the root user

1) Create the /opt/module/cm directory

[root@hadoop102 module]# mkdir -p /opt/module/cm

2) Upload cloudera-manager-el6-cm5.12.1_x86_64.tar.gz to /opt/software on hadoop102 and extract it to /opt/module/cm

[root@hadoop102 software]# tar -zxvf cloudera-manager-el6-cm5.12.1_x86_64.tar.gz -C /opt/module/cm

3) Create the user cloudera-scm on hadoop102, hadoop103, and hadoop104

[root@hadoop102 module]# useradd \
--system \
--home=/opt/module/cm/cm-5.12.1/run/cloudera-scm-server \
--no-create-home \
--shell=/bin/false \
--comment "Cloudera SCM User" cloudera-scm

[root@hadoop103 module]# useradd \
--system \
--home=/opt/module/cm/cm-5.12.1/run/cloudera-scm-server \
--no-create-home \
--shell=/bin/false \
--comment "Cloudera SCM User" cloudera-scm

[root@hadoop104 module]# useradd \
--system \
--home=/opt/module/cm/cm-5.12.1/run/cloudera-scm-server \
--no-create-home \
--shell=/bin/false \
--comment "Cloudera SCM User" cloudera-scm

Parameter notes:

--system: create a system account

--home: the user's home directory, overriding the default /home/<username>

--no-create-home: do not create the user's home directory

--shell: the user's login shell

--comment: a description of the user

Note: Cloudera Manager looks for the user cloudera-scm by default; once the user exists it is used automatically (an optional check follows).
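A hedged way to confirm the account exists on every node, using the xcall.sh script from earlier:

[root@hadoop102 ~]# xcall.sh "id cloudera-scm"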

4) Edit the CM Agent configuration

Change the server host name in /opt/module/cm/cm-5.12.1/etc/cloudera-scm-agent/config.ini

[root@hadoop102 cloudera-scm-agent]# vim /opt/module/cm/cm-5.12.1/etc/cloudera-scm-agent/config.ini

Set the server host name:

server_host=hadoop102

5) Configure the CM database

Copy mysql-connector-java-5.1.27-bin.jar to the directory /usr/share/java/

[root@hadoop102 cm]# mkdir -p /usr/share/java/

[root@hadoop102 mysql-libs]# tar -zxvf mysql-connector-java-5.1.27.tar.gz

[root@hadoop102 mysql-libs]# cp /opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /usr/share/java/

[root@hadoop102 mysql-libs]# mv /usr/share/java/mysql-connector-java-5.1.27-bin.jar /usr/share/java/mysql-connector-java.jar

Note: the jar file must be renamed to mysql-connector-java.jar

6) Use the script shipped with CM to create the CM database in MySQL

[root@hadoop102 cm-5.12.1]# /opt/module/cm/cm-5.12.1/share/cmf/schema/scm_prepare_database.sh mysql cm -hhadoop102 -uroot -p000000 --scm-host hadoop102 scm scm scm

Parameter notes:

-h: database host

-u: database username

-p: database password

--scm-host: SCM server's hostname

7) Distribute cm

[root@hadoop102 module]# xsync /opt/module/cm

8) Create the parcel-repo directory

[root@hadoop102 module]# mkdir -p /opt/cloudera/parcel-repo

[root@hadoop102 module]# chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo

9) Copy the downloaded files manifest.json, CDH-5.12.1-1.cdh5.12.1.p0.3-el6.parcel.sha1, and CDH-5.12.1-1.cdh5.12.1.p0.3-el6.parcel into /opt/cloudera/parcel-repo/ on hadoop102

[root@hadoop102 parcel-repo]# ls
CDH-5.12.1-1.cdh5.12.1.p0.3-el6.parcel  CDH-5.12.1-1.cdh5.12.1.p0.3-el6.parcel.sha1  manifest.json

10) Rename CDH-5.12.1-1.cdh5.12.1.p0.3-el6.parcel.sha1 to CDH-5.12.1-1.cdh5.12.1.p0.3-el6.parcel.sha

[root@hadoop102 parcel-repo]# mv CDH-5.12.1-1.cdh5.12.1.p0.3-el6.parcel.sha1 CDH-5.12.1-1.cdh5.12.1.p0.3-el6.parcel.sha

11) Create the directory /opt/cloudera/parcels on hadoop102 and change its owner and group to cloudera-scm

[root@hadoop102 module]# mkdir -p /opt/cloudera/parcels

[root@hadoop102 module]# chown cloudera-scm:cloudera-scm /opt/cloudera/parcels

12) Distribute /opt/cloudera/

[root@hadoop102 opt]# xsync /opt/cloudera/

1.3.3 Start the CM Services

1) Start the server node: hadoop102

[root@hadoop102 cm]# /opt/module/cm/cm-5.12.1/etc/init.d/cloudera-scm-server start
Starting cloudera-scm-server: [OK]

2) Start the agent nodes: hadoop102, hadoop103, hadoop104

[root@hadoop102 cm]# /opt/module/cm/cm-5.12.1/etc/init.d/cloudera-scm-agent start

[root@hadoop103 cm]# /opt/module/cm/cm-5.12.1/etc/init.d/cloudera-scm-agent start

[root@hadoop104 cm]# /opt/module/cm/cm-5.12.1/etc/init.d/cloudera-scm-agent start

Note: startup is very slow; the Manager takes about five minutes to come up, because it creates the required tables in the database during this time.

3) If port 7180 is in use, the installation succeeded:

[root@hadoop102 cm]# netstat -anp | grep 7180
tcp        0      0 0.0.0.0:7180        0.0.0.0:*        LISTEN      5498/java

4) Visit http://hadoop102:7180 (username and password: admin)

1.3.4 Stop the CM Services

1) Stop the agent nodes: hadoop102, hadoop103, hadoop104

[root@hadoop102 cm]# /opt/module/cm/cm-5.12.1/etc/init.d/cloudera-scm-agent stop
Stopping cloudera-scm-agent: [OK]

[root@hadoop103 cm]# /opt/module/cm/cm-5.12.1/etc/init.d/cloudera-scm-agent stop
Stopping cloudera-scm-agent: [OK]

[root@hadoop104 cm]# /opt/module/cm/cm-5.12.1/etc/init.d/cloudera-scm-agent stop
Stopping cloudera-scm-agent: [OK]

2) Stop the server node: hadoop102

[root@hadoop102 cm]# /opt/module/cm/cm-5.12.1/etc/init.d/cloudera-scm-server stop
Stopping cloudera-scm-server: [OK]

1.4 Cluster Deployment with CM

1.4.1 Accept the Terms and Conditions

1.4.2 Choose the Free Edition

1.4.3 Specify the Hosts

1.4.4 Choose CDH Version 5.12.1

1.4.5 Wait for Download and Installation

Chapter 2: Data Collection Module

2.1 Install HDFS, YARN, and Zookeeper

2.1.1 Choose Custom Installation

2.1.2 Choose the Services to Install

2.1.3 Assign the Nodes

2.1.4 Keep All Cluster Settings at Their Defaults

2.1.5 Processes Start Automatically

2.1.6 Change the HDFS Permission Check Setting

Disable permission checking in HDFS: dfs.permissions.

2.1.7 Configure Hadoop to Support LZO

1) Click Hosts and choose Parcel from the drop-down menu

2) Click Configuration

3) Find Remote Parcel Repository URLs, click the plus sign on the last row to add an entry, enter http://archive.cloudera.com/gplextras/parcels/latest/, then click Save Changes

4) Return to the Parcel list; LZO now appears. Click Download; when the download finishes click Distribute, and when distribution finishes click Activate.

5) After installing LZO, open the HDFS configuration, find the Compression Codecs entry, click the plus sign, add com.hadoop.compression.lzo.LzopCodec, and save the changes

6) Open the YARN configuration, find MR Application Classpath, and add /opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/hadoop-lzo-cdh4-0.4.15-gplextras.jar

7) Deploy the stale configuration and restart the affected processes

2.2 Install Flume

2.2.1 Install the Log-Collection Flume

1) Add the service

2) Choose Flume and click Continue

3) Choose the nodes

4) Finish

2.2.2 Configure the Log-Collection Flume

1) Flume configuration analysis

Flume reads the log files directly; the log file name format is app-yyyy-mm-dd.log.

2) The concrete Flume configuration is as follows:

(1) Click Flume on the CM management page

(2) On the Instances page, choose the Agent on hadoop102

(3) In the Flume configuration for hadoop102 in CM, set the agent name to a1

(4) Enter the following configuration (flume-kafka)

a1.sources=r1

a1.channels=c1 c2

a1.sinks=k1 k2


# configure source

a1.sources.r1.type = TAILDIR

a1.sources.r1.positionFile = /opt/module/flume/log_position.json

a1.sources.r1.filegroups = f1

a1.sources.r1.filegroups.f1 = /tmp/logs/app.+

a1.sources.r1.fileHeader = true

a1.sources.r1.channels = c1 c2


#interceptor

a1.sources.r1.interceptors = i1 i2

a1.sources.r1.interceptors.i1.type = com.atguigu.flume.interceptor.LogETLInterceptor$Builder

a1.sources.r1.interceptors.i2.type = com.atguigu.flume.interceptor.LogTypeInterceptor$Builder


# selector

a1.sources.r1.selector.type = multiplexing

a1.sources.r1.selector.header = topic

a1.sources.r1.selector.mapping.topic_start = c1

a1.sources.r1.selector.mapping.topic_event = c2


# configure channel

a1.channels.c1.type = memory

a1.channels.c1.capacity=10000

a1.channels.c1.byteCapacityBufferPercentage=20


a1.channels.c2.type = memory

a1.channels.c2.capacity=10000

a1.channels.c2.byteCapacityBufferPercentage=20


# configure sink

# start-sink

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink

a1.sinks.k1.kafka.topic = topic_start

a1.sinks.k1.kafka.bootstrap.servers = hadoop102:9092,hadoop103:9092,hadoop104:9092

a1.sinks.k1.kafka.flumeBatchSize = 2000

a1.sinks.k1.kafka.producer.acks = 1

a1.sinks.k1.channel = c1


# event-sink

a1.sinks.k2.type = org.apache.flume.sink.kafka.KafkaSink

a1.sinks.k2.kafka.topic = topic_event

a1.sinks.k2.kafka.bootstrap.servers = hadoop102:9092,hadoop103:9092,hadoop104:9092

a1.sinks.k2.kafka.flumeBatchSize = 2000

a1.sinks.k2.kafka.producer.acks = 1

a1.sinks.k2.channel = c2

Note: com.atguigu.flume.interceptor.LogETLInterceptor and com.atguigu.flume.interceptor.LogTypeInterceptor are the fully qualified class names of the custom interceptors. Adjust them to match your own interceptor classes.

(5) Set up /opt/module/flume/log_position.json and its read/write permissions

[root@hadoop102 module]# mkdir -p /opt/module/flume

[root@hadoop102 flume]# touch log_position.json

[root@hadoop102 flume]# chmod 777 log_position.json

[root@hadoop102 module]# xsync flume/

Note: the parent directory of the JSON file must exist and have the correct permissions.

2.2.3 Flume Interceptors

This project defines two custom interceptors: an ETL interceptor and a log-type interceptor.

The ETL interceptor filters out logs whose timestamp is invalid or whose JSON is incomplete.

The log-type interceptor separates startup logs from event logs so they can be sent to different Kafka topics.

1) Create a Maven project flume-interceptor

2) Create the package com.atguigu.flume.interceptor

3) Add the following to pom.xml

<dependencies>
    <dependency>
        <groupId>org.apache.flume</groupId>
        <artifactId>flume-ng-core</artifactId>
        <version>1.7.0</version>
    </dependency>
</dependencies>

<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>2.3.2</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

4) Create the class LogETLInterceptor in the com.atguigu.flume.interceptor package

Flume ETL interceptor LogETLInterceptor

package com.atguigu.flume.interceptor;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class LogETLInterceptor implements Interceptor {

    @Override
    public void initialize() {
    }

    @Override
    public Event intercept(Event event) {
        // 1 Get the event body
        byte[] body = event.getBody();
        String log = new String(body, Charset.forName("UTF-8"));

        // 2 Validate according to the log type
        if (log.contains("start")) {
            if (LogUtils.validateStart(log)) {
                return event;
            }
        } else {
            if (LogUtils.validateEvent(log)) {
                return event;
            }
        }

        // 3 Drop the event if validation failed
        return null;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        ArrayList<Event> interceptors = new ArrayList<>();
        for (Event event : events) {
            Event intercept1 = intercept(event);
            if (intercept1 != null) {
                interceptors.add(intercept1);
            }
        }
        return interceptors;
    }

    @Override
    public void close() {
    }

    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new LogETLInterceptor();
        }

        @Override
        public void configure(Context context) {
        }
    }
}


5) Flume log validation utility class LogUtils

package com.atguigu.flume.interceptor;

import org.apache.commons.lang.math.NumberUtils;

public class LogUtils {

    public static boolean validateEvent(String log) {
        // Format: server timestamp | json
        // e.g. 1549696569054 | {"cm":{"ln":"-89.2","sv":"V2.0.4","os":"8.2.0","g":"M67B4QYU@gmail.com","nw":"4G","l":"en","vc":"18","hw":"1080*1920","ar":"MX","uid":"u8678","t":"1549679122062","la":"-27.4","md":"sumsung-12","vn":"1.1.3","ba":"Sumsung","sr":"Y"},"ap":"weather","et":[]}

        // 1 Split
        String[] logContents = log.split("\\|");

        // 2 Check the number of parts
        if (logContents.length != 2) {
            return false;
        }

        // 3 Check the server timestamp
        if (logContents[0].length() != 13 || !NumberUtils.isDigits(logContents[0])) {
            return false;
        }

        // 4 Check the json part
        if (!logContents[1].trim().startsWith("{") || !logContents[1].trim().endsWith("}")) {
            return false;
        }

        return true;
    }

    public static boolean validateStart(String log) {
        if (log == null) {
            return false;
        }

        // Check the json
        if (!log.trim().startsWith("{") || !log.trim().endsWith("}")) {
            return false;
        }

        return true;
    }
}


6) Flume log-type interceptor LogTypeInterceptor

package com.atguigu.flume.interceptor;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LogTypeInterceptor implements Interceptor {

    @Override
    public void initialize() {
    }

    @Override
    public Event intercept(Event event) {
        // Distinguish the log type: inspect the body, tag the header
        // 1 Get the body
        byte[] body = event.getBody();
        String log = new String(body, Charset.forName("UTF-8"));

        // 2 Get the headers
        Map<String, String> headers = event.getHeaders();

        // 3 Decide the type and set the topic header
        if (log.contains("start")) {
            headers.put("topic", "topic_start");
        } else {
            headers.put("topic", "topic_event");
        }

        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        ArrayList<Event> interceptors = new ArrayList<>();
        for (Event event : events) {
            Event intercept1 = intercept(event);
            interceptors.add(intercept1);
        }
        return interceptors;
    }

    @Override
    public void close() {
    }

    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new LogTypeInterceptor();
        }

        @Override
        public void configure(Context context) {
        }
    }
}

7) Package the project

After packaging, only the interceptor jar itself needs to be uploaded; the dependency jars do not. Put the jar into Flume's lib folder (a packaging sketch follows).

Note: why are the dependencies not needed? Because they already exist in Flume's lib directory.
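A minimal packaging sketch, assuming Maven is installed on the development machine; with the maven-assembly-plugin configured above, running the standard package phase produces flume-interceptor-1.0-SNAPSHOT.jar in the target directory:

mvn clean package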

8) As root, copy flume-interceptor-1.0-SNAPSHOT.jar into /opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/flume-ng/lib/ on hadoop102.

[root@hadoop102 lib]# ls | grep interceptor
flume-interceptor-1.0-SNAPSHOT.jar

9) Distribute the interceptor jar to hadoop103

[root@hadoop102 lib]$ xsync flume-interceptor-1.0-SNAPSHOT.jar

10) Start Flume

11) Check the Flume logs

2.3 Install Kafka

Kafka can be installed online or from offline packages. The online install takes a long time to download; the offline install is quick. To save time, we choose the offline install here.

2.3.1 Import the Offline Packages

1) Create the /opt/cloudera/csd directory on hadoop102

[root@hadoop102 parcel-repo]# mkdir -p /opt/cloudera/csd

2) Upload KAFKA-1.2.0.jar to /opt/cloudera/csd and change its owner and group

[root@hadoop102 cloudera]# chown cloudera-scm:cloudera-scm /opt/cloudera/csd/ -R

3) Upload KAFKA-4.0.0-1.4.0.0.p0.1-el6.parcel and KAFKA-4.0.0-1.4.0.0.p0.1-el6.parcel.sha1 to /opt/cloudera/parcel-repo, and rename KAFKA-4.0.0-1.4.0.0.p0.1-el6.parcel.sha1 to KAFKA-4.0.0-1.4.0.0.p0.1-el6.parcel.sha

[root@hadoop102 parcel-repo]# mv /opt/cloudera/parcel-repo/KAFKA-4.0.0-1.4.0.0.p0.1-el6.parcel.sha1 /opt/cloudera/parcel-repo/KAFKA-4.0.0-1.4.0.0.p0.1-el6.parcel.sha

4) In Cloudera Manager choose Parcel -> Check for New Parcels, then for Kafka click Distribute and then Activate

2.3.2 Download the Package Online (not used here)

1) Click Hosts and choose Parcel

2) Find Kafka and click Download; when the download finishes click Distribute, then Activate. When it shows Distributed and Activated, the step succeeded.

2.3.3 Install Kafka

1) Return to the home page and click Add Service

2) Choose Kafka and click Continue

3) Choose all three machines as Kafka Brokers

4) Change the Kafka heap size to 256M

5) Finish

6) Redeploy the client configuration

7) Click Restart Stale Services

8) Wait for the restart; when it finishes, click Finish

2.3.4 List Kafka Topics

[root@hadoop102 KAFKA-4.0.0-1.4.0.0.p0.1]# /opt/cloudera/parcels/KAFKA-4.0.0-1.4.0.0.p0.1/bin/kafka-topics --zookeeper hadoop102:2181 --list

2.3.5 Create Kafka Topics

Go to /opt/cloudera/parcels/KAFKA-4.0.0-1.4.0.0.p0.1 and create the startup-log topic and the event-log topic.

1) Create the startup-log topic

[root@hadoop102 KAFKA-4.0.0-1.4.0.0.p0.1]$ bin/kafka-topics --zookeeper hadoop102:2181,hadoop103:2181,hadoop104:2181 --create --replication-factor 1 --partitions 1 --topic topic_start

2) Create the event-log topic

[root@hadoop102 KAFKA-4.0.0-1.4.0.0.p0.1]$ bin/kafka-topics --zookeeper hadoop102:2181,hadoop103:2181,hadoop104:2181 --create --replication-factor 1 --partitions 1 --topic topic_event

2.3.6 Delete Kafka Topics

1) Delete the startup-log topic

[root@hadoop102 KAFKA-4.0.0-1.4.0.0.p0.1]$ bin/kafka-topics --delete --zookeeper hadoop102:2181,hadoop103:2181,hadoop104:2181 --topic topic_start

2) Delete the event-log topic

[root@hadoop102 KAFKA-4.0.0-1.4.0.0.p0.1]$ bin/kafka-topics --delete --zookeeper hadoop102:2181,hadoop103:2181,hadoop104:2181 --topic topic_event

2.3.7 Produce Messages

[root@hadoop102 KAFKA-4.0.0-1.4.0.0.p0.1]$ bin/kafka-console-producer --broker-list hadoop102:9092 --topic topic_start
>hello world
>atguigu atguigu

2.3.8 Consume Messages

[root@hadoop103 KAFKA-4.0.0-1.4.0.0.p0.1]$ bin/kafka-console-consumer \
--bootstrap-server hadoop102:9092 --from-beginning --topic topic_start

--from-beginning: reads all of the topic's existing data from the beginning. Decide whether to add this option based on your use case.

2.3.9 Describe a Topic

[root@hadoop102 KAFKA-4.0.0-1.4.0.0.p0.1]$ bin/kafka-topics --zookeeper hadoop102:2181 --describe --topic topic_start

2.4 Flume Consumes Kafka Data and Writes It to HDFS

1) Cluster plan: the Flume agent that consumes Kafka runs only on hadoop104 (hadoop102 and hadoop103 do not run it).

2) Flume configuration analysis

3) The concrete Flume configuration is as follows:

(1) In the Flume configuration for hadoop104 in CM, set the agent name to a1, then enter the following configuration (kafka-hdfs):

## Components

a1.sources=r1 r2

a1.channels=c1 c2

a1.sinks=k1 k2


## source1

a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource

a1.sources.r1.batchSize = 5000

a1.sources.r1.batchDurationMillis = 2000

a1.sources.r1.kafka.bootstrap.servers =hadoop102:9092,hadoop103:9092,hadoop104:9092

a1.sources.r1.kafka.topics=topic_start


## source2

a1.sources.r2.type =org.apache.flume.source.kafka.KafkaSource

a1.sources.r2.batchSize = 5000

a1.sources.r2.batchDurationMillis = 2000

a1.sources.r2.kafka.bootstrap.servers =hadoop102:9092,hadoop103:9092,hadoop104:9092

a1.sources.r2.kafka.topics=topic_event


## channel1

a1.channels.c1.type=memory

a1.channels.c1.capacity=100000

a1.channels.c1.transactionCapacity=10000


## channel2

a1.channels.c2.type=memory

a1.channels.c2.capacity=100000

a1.channels.c2.transactionCapacity=10000


## sink1

a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path = /origin_data/gmall/log/topic_start/%Y-%m-%d

a1.sinks.k1.hdfs.filePrefix = logstart-

a1.sinks.k1.hdfs.round = true

a1.sinks.k1.hdfs.roundValue = 10

a1.sinks.k1.hdfs.roundUnit = second


##sink2

a1.sinks.k2.type = hdfs

a1.sinks.k2.hdfs.path = /origin_data/gmall/log/topic_event/%Y-%m-%d

a1.sinks.k2.hdfs.filePrefix = logevent-

a1.sinks.k2.hdfs.round = true

a1.sinks.k2.hdfs.roundValue = 10

a1.sinks.k2.hdfs.roundUnit = second


## Avoid producing many small files

a1.sinks.k1.hdfs.rollInterval = 10

a1.sinks.k1.hdfs.rollSize = 134217728

a1.sinks.k1.hdfs.rollCount = 0


a1.sinks.k2.hdfs.rollInterval = 10

a1.sinks.k2.hdfs.rollSize = 134217728

a1.sinks.k2.hdfs.rollCount = 0


## Control the output file type.

a1.sinks.k1.hdfs.fileType = CompressedStream

a1.sinks.k2.hdfs.fileType = CompressedStream


a1.sinks.k1.hdfs.codeC = lzop

a1.sinks.k2.hdfs.codeC = lzop


## Bind sources and sinks to channels

a1.sources.r1.channels = c1

a1.sinks.k1.channel= c1


a1.sources.r2.channels = c2

a1.sinks.k2.channel= c2
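Once Flume has been restarted with this configuration and logs are flowing, a hedged way to verify the output is to list the HDFS paths defined in the sinks above (date shown only as an example):

[root@hadoop102 ~]# hadoop fs -ls /origin_data/gmall/log/topic_start/2019-02-10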

2.5 Transferring the Generated Log Data to HDFS

1) Upload log-collector-1.0-SNAPSHOT-jar-with-dependencies.jar to /opt/module on hadoop102

2) Distribute log-collector-1.0-SNAPSHOT-jar-with-dependencies.jar to hadoop103

[root@hadoop102 module]# xsync log-collector-1.0-SNAPSHOT-jar-with-dependencies.jar

3) Create the script lg.sh in /root/bin

[root@hadoop102 bin]$ vim lg.sh

4) Write the following content in the script

#! /bin/bash

for i in hadoop102 hadoop103
do
    ssh $i "java -classpath /opt/module/log-collector-1.0-SNAPSHOT-jar-with-dependencies.jar com.atguigu.appclient.AppMain $1 $2 >/opt/module/test.log &"
done

5) Make the script executable

[root@hadoop102 bin]$ chmod 777 lg.sh

6) Run the script

[root@hadoop102 module]$ lg.sh

Chapter 3: Preparing the Data Warehouse Environment

3.1 Install Hive

1) Add the service

2) Add the Hive service

3) Add the Hive service to Cluster1

4) Configure the Hive metastore database

5) Continue after the test passes

6) Hive processes start automatically

7) Adjust the Hive configuration with the LZO library path:

/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/

3.2 Install Oozie

3.2.1 Add the Oozie Service

3.2.2 Choose the Cluster Nodes

3.2.3 Install on the Node That Has MySQL

3.2.4 Connect to the Database

3.2.5 Continue Through to Completion

3.3 Install Hue

3.3.1 Hue Overview

1) Origin of Hue

HUE = Hadoop User Experience. In plain terms it is an open-source Apache Hadoop UI system that evolved from Cloudera Desktop; Cloudera later contributed it to the Apache Hadoop community. It is built on the Python web framework Django. With HUE you can interact with the Hadoop cluster from a web console in the browser to analyze and process data.

2) Hue website and users

Official website: http://gethue.com/

3.3.2 Preparation Before Installing

1) Install mod_ssl on the LoadBalancer node

[root@hadoop102 ~]# yum -y install mod_ssl

2) Check whether /usr/lib64/mysql contains libmysqlclient_r.so.16; if not, upload libmysqlclient_r.so.16 to /usr/lib64/mysql and symlink it into /usr/lib64/

[root@hadoop102 ~]# ls /usr/lib64/mysql

[root@hadoop103 mysql]# scp /usr/lib64/mysql/libmysqlclient_r.so.16 root@hadoop102:/usr/lib64/mysql/

[root@hadoop102 ~]# ln -s /usr/lib64/mysql/libmysqlclient_r.so.16 /usr/lib64/libmysqlclient_r.so.16

3.3.3 HUE Installation Steps

1) Add the HUE service

2) Choose the cluster nodes

3) Assign the roles

4) Configure the database

5) Installation complete

6) HUE page

http://hadoop102:8888 (before optimization) or http://hadoop102:8889 (after optimization)

The first time HUE is opened the following page appears; the username and password entered here can be anything, and later logins use this first account. For example, username: admin, password: admin

Chapter 4: Building the User-Behavior Data Warehouse

4.1 ODS Layer

The operational data store layer holds the raw data: the original logs and data are loaded directly and kept as-is, without processing.

4.1.1 Create the Database

1) Create the gmall database

hive (default)> create database gmall;

Note: if the database already exists and contains data, force-drop it with: drop database gmall cascade;

2) Use the gmall database

hive (default)> use gmall;

4.1.2 Create the Startup-Log Table ods_start_log

1) Create a partitioned table whose input format is LZO, whose output is text, and that supports JSON parsing

hive (gmall)>
drop table if exists ods_start_log;
CREATE EXTERNAL TABLE ods_start_log (`line` string)
PARTITIONED BY (`dt` string)
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_start_log';

For Hive's LZO compression, see: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO

4.1.3 ODS-Layer Data Load Script

1) Create the script in /root/bin on hadoop102

[root@hadoop102 bin]$ vim ods_log.sh

Write the following content in the script

#!/bin/bash

# Define a variable for easy modification
APP=gmall

# Use the date passed in if one is given; otherwise use yesterday's date
if [ -n "$1" ] ;then
   do_date=$1
else
   do_date=`date -d "-1 day" +%F`
fi

echo "=== log date is $do_date ==="
sql="
load data inpath '/origin_data/gmall/log/topic_start/$do_date' into table "$APP".ods_start_log partition(dt='$do_date');
"

hive -e "$sql"

Note 1:
[ -n string ] tests whether the string is non-empty:
-- a non-empty value returns true
-- an empty value returns false
Note 2:
See date --help for the details of the date command; a small sketch of this default-date pattern follows.
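A minimal standalone sketch of the default-date logic shared by all the load scripts in this guide (the output value is illustrative):

if [ -n "$1" ]; then do_date=$1; else do_date=`date -d "-1 day" +%F`; fi; echo $do_date
# with no argument this prints yesterday's date, e.g. 2019-02-09 when run on 2019-02-10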

2) Make the script executable

[root@hadoop102 bin]$ chmod 777 ods_log.sh

3) Run the script

[root@hadoop102 module]$ ods_log.sh 2019-02-10

4.2 DWD Layer: Parsing the Startup Table

4.2.1 Create the Startup Table

1) Table creation statement

hive (gmall)>

drop table if exists dwd_start_log;

CREATE EXTERNAL TABLE dwd_start_log(

`mid_id` string,

`user_id` string,

`version_code` string,

`version_name` string,

`lang` string,

`source` string,

`os` string,

`area` string,

`model` string,

`brand` string,

`sdk_version` string,

`gmail` string,

`height_width` string,

`app_time` string,

`network` string,

`lng` string,

`lat` string,

`entry` string,

`open_ad_type` string,

`action` string,

`loading_time` string,

`detail` string,

`extend1` string

)

PARTITIONED BY (dt string)

location '/warehouse/gmall/dwd/dwd_start_log/';

4.2.2 DWD-Layer Startup-Table Load Script

1) Create the script in /root/bin on hadoop102

[root@hadoop102 bin]$ vim dwd_start_log.sh

Write the following content in the script

#!/bin/bash

# Define a variable for easy modification
APP=gmall

# Use the date passed in if one is given; otherwise use yesterday's date
if [ -n "$1" ] ;then
   do_date=$1
else
   do_date=`date -d "-1 day" +%F`
fi

sql="
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table "$APP".dwd_start_log
PARTITION (dt='$do_date')
select
    get_json_object(line,'$.mid') mid_id,
    get_json_object(line,'$.uid') user_id,
    get_json_object(line,'$.vc') version_code,
    get_json_object(line,'$.vn') version_name,
    get_json_object(line,'$.l') lang,
    get_json_object(line,'$.sr') source,
    get_json_object(line,'$.os') os,
    get_json_object(line,'$.ar') area,
    get_json_object(line,'$.md') model,
    get_json_object(line,'$.ba') brand,
    get_json_object(line,'$.sv') sdk_version,
    get_json_object(line,'$.g') gmail,
    get_json_object(line,'$.hw') height_width,
    get_json_object(line,'$.t') app_time,
    get_json_object(line,'$.nw') network,
    get_json_object(line,'$.ln') lng,
    get_json_object(line,'$.la') lat,
    get_json_object(line,'$.entry') entry,
    get_json_object(line,'$.open_ad_type') open_ad_type,
    get_json_object(line,'$.action') action,
    get_json_object(line,'$.loading_time') loading_time,
    get_json_object(line,'$.detail') detail,
    get_json_object(line,'$.extend1') extend1
from "$APP".ods_start_log
where dt='$do_date';
"

hive -e "$sql"

2) Make the script executable

[root@hadoop102 bin]$ chmod 777 dwd_start_log.sh

3) Run the script

[root@hadoop102 module]$ dwd_start_log.sh 2019-02-10

4.3 DWS Layer (Requirement: Daily Active Users)

Goal: the per-device detail of devices active in the current day, week, and month

4.3.1 Daily Active-Device Detail

1) Table creation statement

hive (gmall)>
drop table if exists dws_uv_detail_day;
create external table dws_uv_detail_day
(
    `mid_id` string COMMENT 'unique device id',
    `user_id` string COMMENT 'user id',
    `version_code` string COMMENT 'app version code',
    `version_name` string COMMENT 'app version name',
    `lang` string COMMENT 'system language',
    `source` string COMMENT 'channel',
    `os` string COMMENT 'Android version',
    `area` string COMMENT 'region',
    `model` string COMMENT 'phone model',
    `brand` string COMMENT 'phone brand',
    `sdk_version` string COMMENT 'sdkVersion',
    `gmail` string COMMENT 'gmail',
    `height_width` string COMMENT 'screen width and height',
    `app_time` string COMMENT 'time the client log was produced',
    `network` string COMMENT 'network mode',
    `lng` string COMMENT 'longitude',
    `lat` string COMMENT 'latitude'
)
partitioned by(dt string)
stored as parquet
location '/warehouse/gmall/dws/dws_uv_detail_day'
;

4.3.2 DWS-Layer Data Load Script

1) Create the script in /root/bin on hadoop102

[root@hadoop102 bin]$ vim dws_log.sh

Write the following content in the script

#!/bin/bash

# Define a variable for easy modification
APP=gmall

# Use the date passed in if one is given; otherwise use yesterday's date
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

sql="
  set hive.exec.dynamic.partition.mode=nonstrict;

  insert overwrite table "$APP".dws_uv_detail_day partition(dt='$do_date')
  select
    mid_id,
    concat_ws('|', collect_set(user_id)) user_id,
    concat_ws('|', collect_set(version_code)) version_code,
    concat_ws('|', collect_set(version_name)) version_name,
    concat_ws('|', collect_set(lang)) lang,
    concat_ws('|', collect_set(source)) source,
    concat_ws('|', collect_set(os)) os,
    concat_ws('|', collect_set(area)) area,
    concat_ws('|', collect_set(model)) model,
    concat_ws('|', collect_set(brand)) brand,
    concat_ws('|', collect_set(sdk_version)) sdk_version,
    concat_ws('|', collect_set(gmail)) gmail,
    concat_ws('|', collect_set(height_width)) height_width,
    concat_ws('|', collect_set(app_time)) app_time,
    concat_ws('|', collect_set(network)) network,
    concat_ws('|', collect_set(lng)) lng,
    concat_ws('|', collect_set(lat)) lat
  from "$APP".dwd_start_log
  where dt='$do_date'
  group by mid_id;
"

hive -e "$sql"

2) Make the script executable

[root@hadoop102 bin]$ chmod 777 dws_log.sh

3) Run the script

[root@hadoop102 module]$ dws_log.sh 2019-02-10

4.4 ADS Layer (Requirement: Daily Active Users)

Goal: the number of devices active on the current day

4.4.1 Active-Device Count

1) Table creation statement

hive (gmall)>
drop table if exists ads_uv_count;
create external table ads_uv_count(
    `dt` string COMMENT 'statistics date',
    `day_count` bigint COMMENT 'daily active user count'
) COMMENT 'active device count'
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_uv_count/'
;

4.4.2 ADS-Layer Data Load Script

1) Create the script in /root/bin on hadoop102

[root@hadoop102 bin]$ vim ads_uv_log.sh

Write the following content in the script

#!/bin/bash

# Define a variable for easy modification
APP=gmall

# Use the date passed in if one is given; otherwise use yesterday's date
if [ -n "$1" ] ;then
   do_date=$1
else
   do_date=`date -d "-1 day" +%F`
fi

sql="
  set hive.exec.dynamic.partition.mode=nonstrict;

insert into table "$APP".ads_uv_count
select
  '$do_date' dt,
  daycount.ct
from
(
  select
     '$do_date' dt,
      count(*) ct
  from "$APP".dws_uv_detail_day
  where dt='$do_date'
)daycount;
"

hive -e "$sql"

2) Make the script executable

[root@hadoop102 bin]$ chmod 777 ads_uv_log.sh

3) Run the script

[root@hadoop102 module]$ ads_uv_log.sh 2019-02-10

Chapter 5: Building the Business Data Warehouse

5.1 Generating Business Data

5.1.1 Table Creation

1) Create the gmall database through SQLyog

2) Set the database character set

3) Import the table-creation script (1建表腳本.sql)

4) Repeat the import from step 3 for, in order: 2商品分類數(shù)據(jù)插入腳本 (category data), 3函數(shù)腳本 (functions), and 4存儲(chǔ)過(guò)程腳本 (stored procedures).

5.1.2 Generate Business Data

1) The data-generation function:

init_data ( do_date_string VARCHAR(20), order_incr_num INT, user_incr_num INT, sku_num INT, if_truncate BOOLEAN ):

Parameter 1: do_date_string, the date of the generated data
Parameter 2: order_incr_num, the number of order ids
Parameter 3: user_incr_num, the number of user ids
Parameter 4: sku_num, the number of SKUs
Parameter 5: if_truncate, whether to delete the existing data

2) Test case:

(1) Requirement: generate data for 2019-02-10 with 1000 orders, 200 users, and 300 SKUs, deleting the original data.

CALL init_data('2019-02-10',1000,200,300,TRUE);

(2) Query the generated data

SELECT * from base_category1;

SELECT * from base_category2;

SELECT * from base_category3;


SELECT * from order_info;

SELECT * from order_detail;


SELECT * from sku_info;

SELECT * from user_info;


SELECT * from payment_info;

5.2 Importing Business Data into the Warehouse

5.2.1 Install Sqoop

1) Add the service

2) Choose Sqoop

Sqoop 1 is sufficient for now

3) Choose the nodes

5.2.2 Scheduled Sqoop Import Script

1) Create the script sqoop_import.sh in /root/bin

[root@hadoop102 bin]$ vim sqoop_import.sh

Write the following content in the script

#!/bin/bash

db_date=$2
echo $db_date
db_name=gmall

import_data()
{
sqoop import \
--connect jdbc:mysql://hadoop102:3306/$db_name \
--username root \
--password 000000 \
--target-dir /origin_data/$db_name/db/$1/$db_date \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query "$2"' and $CONDITIONS;'
}


import_sku_info(){
  import_data "sku_info" "select
    id, spu_id, price, sku_name, sku_desc, weight, tm_id,
    category3_id, create_time
  from sku_info where 1=1"
}

import_user_info(){
  import_data "user_info" "select
    id, name, birthday, gender, email, user_level, create_time
  from user_info where 1=1"
}

import_base_category1(){
  import_data "base_category1" "select
    id, name from base_category1 where 1=1"
}

import_base_category2(){
  import_data "base_category2" "select
    id, name, category1_id from base_category2 where 1=1"
}

import_base_category3(){
  import_data "base_category3" "select id, name, category2_id from base_category3 where 1=1"
}

import_order_detail(){
  import_data "order_detail" "select
    od.id,
    order_id,
    user_id,
    sku_id,
    sku_name,
    order_price,
    sku_num,
    o.create_time
  from order_info o, order_detail od
  where o.id=od.order_id
  and DATE_FORMAT(create_time,'%Y-%m-%d')='$db_date'"
}

import_payment_info(){
  import_data "payment_info" "select
    id,
    out_trade_no,
    order_id,
    user_id,
    alipay_trade_no,
    total_amount,
    subject,
    payment_type,
    payment_time
  from payment_info
  where DATE_FORMAT(payment_time,'%Y-%m-%d')='$db_date'"
}

import_order_info(){
  import_data "order_info" "select
    id,
    total_amount,
    order_status,
    user_id,
    payment_way,
    out_trade_no,
    create_time,
    operate_time
  from order_info
  where (DATE_FORMAT(create_time,'%Y-%m-%d')='$db_date' or DATE_FORMAT(operate_time,'%Y-%m-%d')='$db_date')"
}

case $1 in
  "base_category1")
     import_base_category1
;;
  "base_category2")
     import_base_category2
;;
  "base_category3")
     import_base_category3
;;
  "order_info")
     import_order_info
;;
  "order_detail")
     import_order_detail
;;
  "sku_info")
     import_sku_info
;;
  "user_info")
     import_user_info
;;
  "payment_info")
     import_payment_info
;;
  "all")
   import_base_category1
   import_base_category2
   import_base_category3
   import_order_info
   import_order_detail
   import_sku_info
   import_user_info
   import_payment_info
;;
esac

2) Make the script executable

[root@hadoop102 bin]$ chmod 777 sqoop_import.sh

3) Run the script to import the data

[root@hadoop102 bin]$ sqoop_import.sh all 2019-02-10
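As a usage sketch, a single table can also be imported on its own by passing its name from the case list above instead of all:

[root@hadoop102 bin]$ sqoop_import.sh sku_info 2019-02-10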

5.3 ODS Layer

Create the ODS tables mirroring the fields of the tables in the business database exactly.

5.3.1 Create the Order Table

hive (gmall)>
drop table if exists ods_order_info;
create external table ods_order_info (
    `id` string COMMENT 'order id',
    `total_amount` decimal(10,2) COMMENT 'order amount',
    `order_status` string COMMENT 'order status',
    `user_id` string COMMENT 'user id',
    `payment_way` string COMMENT 'payment method',
    `out_trade_no` string COMMENT 'payment transaction number',
    `create_time` string COMMENT 'creation time',
    `operate_time` string COMMENT 'operation time'
) COMMENT 'order table'
PARTITIONED BY ( `dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_order_info/'
;

5.3.2 Create the Order Detail Table

hive (gmall)>
drop table if exists ods_order_detail;
create external table ods_order_detail(
    `id` string COMMENT 'order detail id',
    `order_id` string COMMENT 'order id',
    `user_id` string COMMENT 'user id',
    `sku_id` string COMMENT 'sku id',
    `sku_name` string COMMENT 'sku name',
    `order_price` string COMMENT 'sku price',
    `sku_num` string COMMENT 'sku quantity',
    `create_time` string COMMENT 'creation time'
) COMMENT 'order detail table'
PARTITIONED BY ( `dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_order_detail/'
;

5.3.3 Create the SKU Table

hive (gmall)>
drop table if exists ods_sku_info;
create external table ods_sku_info(
    `id` string COMMENT 'skuId',
    `spu_id` string COMMENT 'spuid',
    `price` decimal(10,2) COMMENT 'price',
    `sku_name` string COMMENT 'sku name',
    `sku_desc` string COMMENT 'sku description',
    `weight` string COMMENT 'weight',
    `tm_id` string COMMENT 'brand id',
    `category3_id` string COMMENT 'category id',
    `create_time` string COMMENT 'creation time'
) COMMENT 'sku table'
PARTITIONED BY ( `dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_sku_info/'
;

5.3.4 Create the User Table

hive (gmall)>
drop table if exists ods_user_info;
create external table ods_user_info(
    `id` string COMMENT 'user id',
    `name` string COMMENT 'name',
    `birthday` string COMMENT 'birthday',
    `gender` string COMMENT 'gender',
    `email` string COMMENT 'email',
    `user_level` string COMMENT 'user level',
    `create_time` string COMMENT 'creation time'
) COMMENT 'user info'
PARTITIONED BY ( `dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_user_info/'
;

5.3.5 Create the Level-1 Product Category Table

hive (gmall)>
drop table if exists ods_base_category1;
create external table ods_base_category1(
    `id` string COMMENT 'id',
    `name` string COMMENT 'name'
) COMMENT 'level-1 product category'
PARTITIONED BY ( `dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_base_category1/'
;

5.3.6 Create the Level-2 Product Category Table

hive (gmall)>
drop table if exists ods_base_category2;
create external table ods_base_category2(
    `id` string COMMENT 'id',
    `name` string COMMENT 'name',
    category1_id string COMMENT 'level-1 category id'
) COMMENT 'level-2 product category'
PARTITIONED BY ( `dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_base_category2/'
;

5.3.7 Create the Level-3 Product Category Table

hive (gmall)>
drop table if exists ods_base_category3;
create external table ods_base_category3(
    `id` string COMMENT 'id',
    `name` string COMMENT 'name',
    category2_id string COMMENT 'level-2 category id'
) COMMENT 'level-3 product category'
PARTITIONED BY ( `dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_base_category3/'
;

5.3.8 Create the Payment Table

hive (gmall)>
drop table if exists `ods_payment_info`;
create external table `ods_payment_info`(
    `id` bigint COMMENT 'id',
    `out_trade_no` string COMMENT 'external business number',
    `order_id` string COMMENT 'order id',
    `user_id` string COMMENT 'user id',
    `alipay_trade_no` string COMMENT 'Alipay transaction number',
    `total_amount` decimal(16,2) COMMENT 'payment amount',
    `subject` string COMMENT 'transaction subject',
    `payment_type` string COMMENT 'payment type',
    `payment_time` string COMMENT 'payment time'
) COMMENT 'payment table'
PARTITIONED BY ( `dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_payment_info/'
;

5.3.9 ODS-Layer Data Load Script

1) Create the script ods_db.sh in /root/bin

[root@hadoop102 bin]$ vim ods_db.sh

Write the following content in the script

#!/bin/bash

APP=gmall

# Use the date passed in if one is given; otherwise use yesterday's date
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

sql="
load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table "$APP".ods_order_info partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table "$APP".ods_order_detail partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table "$APP".ods_sku_info partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table "$APP".ods_user_info partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table "$APP".ods_payment_info partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table "$APP".ods_base_category1 partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table "$APP".ods_base_category2 partition(dt='$do_date');

load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table "$APP".ods_base_category3 partition(dt='$do_date');
"
hive -e "$sql"

2) Make the script executable

[root@hadoop102 bin]$ chmod 777 ods_db.sh

3) Load the data with the script

[root@hadoop102 bin]$ ods_db.sh 2019-02-10

5.4 DWD Layer

Filter out rows with null ids from the ODS data, and degenerate the product category dimensions (denormalize them into the SKU table).

5.4.1 Create the Order Table

hive (gmall)>
drop table if exists dwd_order_info;
create external table dwd_order_info (
    `id` string COMMENT '',
    `total_amount` decimal(10,2) COMMENT '',
    `order_status` string COMMENT '1 2 3 4 5',
    `user_id` string COMMENT 'id',
    `payment_way` string COMMENT '',
    `out_trade_no` string COMMENT '',
    `create_time` string COMMENT '',
    `operate_time` string COMMENT ''
)
PARTITIONED BY ( `dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_order_info/'
;

5.4.2 Create the Order Detail Table

hive (gmall)>
drop table if exists dwd_order_detail;
create external table dwd_order_detail(
    `id` string COMMENT '',
    `order_id` decimal(10,2) COMMENT '',
    `user_id` string COMMENT 'id',
    `sku_id` string COMMENT 'id',
    `sku_name` string COMMENT '',
    `order_price` string COMMENT '',
    `sku_num` string COMMENT '',
    `create_time` string COMMENT ''
)
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_order_detail/'
;

5.4.3 Create the User Table

hive (gmall)>
drop table if exists dwd_user_info;
create external table dwd_user_info(
    `id` string COMMENT 'id',
    `name` string COMMENT '',
    `birthday` string COMMENT '',
    `gender` string COMMENT '',
    `email` string COMMENT '',
    `user_level` string COMMENT '',
    `create_time` string COMMENT ''
)
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_user_info/'
;

5.4.4 Create the Payment Table

hive (gmall)>
drop table if exists `dwd_payment_info`;
create external table `dwd_payment_info`(
    `id` bigint COMMENT '',
    `out_trade_no` string COMMENT '',
    `order_id` string COMMENT '',
    `user_id` string COMMENT '',
    `alipay_trade_no` string COMMENT '',
    `total_amount` decimal(16,2) COMMENT '',
    `subject` string COMMENT '',
    `payment_type` string COMMENT '',
    `payment_time` string COMMENT ''
)
PARTITIONED BY ( `dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_payment_info/'
;

5.4.5 Create the SKU Table (with categories added)

hive (gmall)>
drop table if exists dwd_sku_info;
create external table dwd_sku_info(
    `id` string COMMENT 'skuId',
    `spu_id` string COMMENT 'spuid',
    `price` decimal(10,2) COMMENT '',
    `sku_name` string COMMENT '',
    `sku_desc` string COMMENT '',
    `weight` string COMMENT '',
    `tm_id` string COMMENT 'id',
    `category3_id` string COMMENT '3id',
    `category2_id` string COMMENT '2id',
    `category1_id` string COMMENT '1id',
    `category3_name` string COMMENT '3',
    `category2_name` string COMMENT '2',
    `category1_name` string COMMENT '1',
    `create_time` string COMMENT ''
)
PARTITIONED BY ( `dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_sku_info/'
;

5.4.6 DWD-Layer Data Load Script

1) Create the script dwd_db.sh in /root/bin

[root@hadoop102 bin]$ vim dwd_db.sh

Write the following content in the script

#!/bin/bash

# Define a variable for easy modification
APP=gmall

# Use the date passed in if one is given; otherwise use yesterday's date
if [ -n "$1" ] ;then
   do_date=$1
else
   do_date=`date -d "-1 day" +%F`
fi

sql="

set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table "$APP".dwd_order_info partition(dt)
select * from "$APP".ods_order_info
where dt='$do_date' and id is not null;

insert overwrite table "$APP".dwd_order_detail partition(dt)
select * from "$APP".ods_order_detail
where dt='$do_date' and id is not null;

insert overwrite table "$APP".dwd_user_info partition(dt)
select * from "$APP".ods_user_info
where dt='$do_date' and id is not null;

insert overwrite table "$APP".dwd_payment_info partition(dt)
select * from "$APP".ods_payment_info
where dt='$do_date' and id is not null;

insert overwrite table "$APP".dwd_sku_info partition(dt)
select
    sku.id,
    sku.spu_id,
    sku.price,
    sku.sku_name,
    sku.sku_desc,
    sku.weight,
    sku.tm_id,
    sku.category3_id,
    c2.id category2_id,
    c1.id category1_id,
    c3.name category3_name,
    c2.name category2_name,
    c1.name category1_name,
    sku.create_time,
    sku.dt
from
    "$APP".ods_sku_info sku
    join "$APP".ods_base_category3 c3 on sku.category3_id=c3.id
    join "$APP".ods_base_category2 c2 on c3.category2_id=c2.id
    join "$APP".ods_base_category1 c1 on c2.category1_id=c1.id
where sku.dt='$do_date' and c2.dt='$do_date'
and c3.dt='$do_date' and c1.dt='$do_date'
and sku.id is not null;

"
hive -e "$sql"

2) Make the script executable

[root@hadoop102 bin]$ chmod 777 dwd_db.sh

3) Load the data with the script

[root@hadoop102 bin]$ dwd_db.sh 2019-02-10

5.5 DWS Layer: User-Behavior Wide Table

1) Why build a wide table

The goal is to aggregate each user's single-day behavior into one wide, multi-column table so that, after joining in user-dimension information, it can be analyzed from different angles.

5.5.1 Create the User-Behavior Wide Table

hive (gmall)>
drop table if exists dws_user_action;
create external table dws_user_action
(
    user_id          string        comment 'user id',
    order_count      bigint        comment 'number of orders',
    order_amount     decimal(16,2) comment 'order amount',
    payment_count    bigint        comment 'number of payments',
    payment_amount   decimal(16,2) comment 'payment amount'
) COMMENT 'daily user-behavior wide table'
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dws/dws_user_action/'
tblproperties("parquet.compression"="snappy");

5.5.2 User-Behavior Wide-Table Load Script

1) Create the script dws_db_wide.sh in /root/bin

[root@hadoop102 bin]$ vim dws_db_wide.sh

Write the following content in the script

#!/bin/bash

# Define a variable for easy modification
APP=gmall

# Use the date passed in if one is given; otherwise use yesterday's date
if [ -n "$1" ] ;then
   do_date=$1
else
   do_date=`date -d "-1 day" +%F`
fi

sql="
with
tmp_order as
(
    select
        user_id,
        count(*) order_count,
        sum(oi.total_amount) order_amount
    from "$APP".dwd_order_info oi
    where date_format(oi.create_time,'yyyy-MM-dd')='$do_date'
    group by user_id
) ,
tmp_payment as
(
    select
        user_id,
        sum(pi.total_amount) payment_amount,
        count(*) payment_count
    from "$APP".dwd_payment_info pi
    where date_format(pi.payment_time,'yyyy-MM-dd')='$do_date'
    group by user_id
)
insert overwrite table "$APP".dws_user_action partition(dt='$do_date')
select
    user_actions.user_id,
    sum(user_actions.order_count),
    sum(user_actions.order_amount),
    sum(user_actions.payment_count),
    sum(user_actions.payment_amount)
from
(
    select
        user_id,
        order_count,
        order_amount,
        0 payment_count,
        0 payment_amount
    from tmp_order

    union all
    select
        user_id,
        0 order_count,
        0 order_amount,
        payment_count,
        payment_amount
    from tmp_payment
) user_actions
group by user_id;
"

hive -e "$sql"

2) Make the script executable

[root@hadoop102 bin]$ chmod 777 dws_db_wide.sh

3) Run the script

[root@hadoop102 bin]$ dws_db_wide.sh 2019-02-10

5.6 ADS Layer (Requirement: Total GMV)

5.6.1 Table Creation Statement

hive (gmall)>
drop table if exists ads_gmv_sum_day;
create external table ads_gmv_sum_day(
    `dt` string COMMENT 'statistics date',
    `gmv_count` bigint COMMENT 'daily gmv order count',
    `gmv_amount` decimal(16,2) COMMENT 'daily gmv order amount',
    `gmv_payment` decimal(16,2) COMMENT 'daily payment amount'
) COMMENT 'GMV'
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ads/ads_gmv_sum_day/'
;

5.6.2 Data Load Script

1) Create the script ads_db_gmv.sh in /root/bin

[root@hadoop102 bin]$ vim ads_db_gmv.sh

Write the following content in the script

#!/bin/bash

# Define a variable for easy modification
APP=gmall

# Use the date passed in if one is given; otherwise use yesterday's date
if [ -n "$1" ] ;then
   do_date=$1
else
   do_date=`date -d "-1 day" +%F`
fi

sql="
insert into table "$APP".ads_gmv_sum_day
select
    '$do_date' dt,
    sum(order_count) gmv_count,
    sum(order_amount) gmv_amount,
    sum(payment_amount) payment_amount
from "$APP".dws_user_action
where dt ='$do_date'
group by dt;
"

hive -e "$sql"

2) Make the script executable

[root@hadoop102 bin]$ chmod 777 ads_db_gmv.sh

3) Run the script

[root@hadoop102 bin]$ ads_db_gmv.sh 2019-02-10

5.6.3 Data Export Script

1) Create the ads_gmv_sum_day table in MySQL

DROP TABLE IF EXISTS ads_gmv_sum_day;
CREATE TABLE ads_gmv_sum_day(
  `dt` varchar(200) DEFAULT NULL COMMENT 'statistics date',
  `gmv_count` bigint(20) DEFAULT NULL COMMENT 'daily gmv order count',
  `gmv_amount` decimal(16, 2) DEFAULT NULL COMMENT 'daily gmv order amount',
  `gmv_payment` decimal(16, 2) DEFAULT NULL COMMENT 'daily payment amount'
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci COMMENT = 'daily GMV' ROW_FORMAT = Dynamic;

2) Create the script sqoop_export.sh in /root/bin

[root@hadoop102 bin]$ vim sqoop_export.sh

Write the following content in the script

#!/bin/bash

db_name=gmall

export_data() {
sqoop export \
--connect "jdbc:mysql://hadoop102:3306/${db_name}?useUnicode=true&characterEncoding=utf-8" \
--username root \
--password 000000 \
--table $1 \
--num-mappers 1 \
--export-dir /warehouse/$db_name/ads/$1 \
--input-fields-terminated-by "\t" \
--update-mode allowinsert \
--update-key "tm_id,category1_id,stat_mn,stat_date" \
--input-null-string '\\N' \
--input-null-non-string '\\N'
}

case $1 in
  "ads_gmv_sum_day")
     export_data "ads_gmv_sum_day"
;;
  "all")
     export_data "ads_gmv_sum_day"
;;
esac

3) Make the script executable

[root@hadoop102 bin]$ chmod 777 sqoop_export.sh

4) Run the script to export the data

[root@hadoop102 bin]$ sqoop_export.sh all

5) Check the exported data in SQLyog

select * from ads_gmv_sum_day;
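If SQLyog is not at hand, the same check can be done from the shell, using the MySQL credentials configured in sqoop_export.sh above:

[root@hadoop102 bin]$ mysql -uroot -p000000 -e "select * from gmall.ads_gmv_sum_day;"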

5.7 End-to-End Scheduling of the GMV Pipeline with Oozie and Hue

5.7.1 Preparation

1) Add the MySQL driver file

[root@hadoop102 ~]# cp /opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/hadoop/lib

[root@hadoop102 ~]# cp /opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/sqoop/lib/

[root@hadoop102 mysql-connector-java-5.1.27]# hadoop fs -put /opt/software/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /user/oozie/share/lib/lib_20190603220831/sqoop

[root@hadoop102 mysql-connector-java-5.1.27]# xsync /opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/hadoop/lib

[root@hadoop102 mysql-connector-java-5.1.27]# xsync /opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/sqoop/lib/
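A quick sanity check that the driver reached every node, using the xcall.sh helper created earlier and the same parcel path as above:

[root@hadoop102 ~]# xcall.sh "ls /opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/lib/sqoop/lib/ | grep mysql-connector"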

2) Increase the YARN container memory (yarn.nodemanager.resource.memory-mb) to 4G

3) Create a new Hue account

4) Switch to that user

5.7.2 Create the GMV Oozie Job in Hue

1) Generate test data

CALL init_data('2019-02-12',300,200,300,FALSE);
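For reference, the same call can also be issued non-interactively from the shell; this is only a sketch and assumes the init_data stored procedure lives in the gmall database on hadoop102, as set up in the earlier business-data chapters:

[root@hadoop102 ~]# mysql -uroot -p000000 gmall -e "CALL init_data('2019-02-12',300,200,300,FALSE);"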

2) Install the Oozie visualization JS library: copy ext-2.2.zip into /opt/cloudera/parcels/CDH/lib/oozie/libext (or /var/lib/oozie) and unzip it

[root@hadoop102 libext]# unzip ext-2.2.zip

3) Open the Hue web UI

4) Choose Query -> Scheduler -> Workflow

5) Click My Workflow and enter gmv

6) Click Save

5.7.3 Write the Job Scripts and Upload Them to HDFS

1) Click Workspace to check the workspace location

2) Upload the scripts to be executed to the HDFS workspace path

[root@hadoop102 bin]# hadoop fs -put /root/bin/*.sh /user/hue/oozie/workspaces/hue-oozie-1559457755.83/lib
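Listing the workspace lib directory confirms the scripts were uploaded:

[root@hadoop102 bin]# hadoop fs -ls /user/hue/oozie/workspaces/hue-oozie-1559457755.83/lib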

3)點(diǎn)擊左側(cè)的->Documents->gmv

5.7.4 Build the Workflow

1) Click Edit

2) Select Actions

3) Drag an action control onto the canvas to create a task

4) Select the script to execute

5) Under Files, select the same script again

6) Define the script parameter name

7) Repeat steps 3-6, in order, for each of the following scripts (a command-line dry run of the same chain is sketched after this list):

ods_db.sh, dwd_db.sh, dws_db_wide.sh, ads_db_gmv.sh and sqoop_export.sh

ods_db.sh, dwd_db.sh, dws_db_wide.sh and ads_db_gmv.sh each take a single parameter, do_date;

sqoop_export.sh takes a single parameter, all.
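Before wiring the chain into Oozie, it can be worth dry-running the same sequence by hand for one day. This is only a sketch using the five scripts and parameters listed above:

[root@hadoop102 bin]$ ods_db.sh 2019-02-12
[root@hadoop102 bin]$ dwd_db.sh 2019-02-12
[root@hadoop102 bin]$ dws_db_wide.sh 2019-02-12
[root@hadoop102 bin]$ ads_db_gmv.sh 2019-02-12
[root@hadoop102 bin]$ sqoop_export.sh all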

8)點(diǎn)擊保存

5.7.5 執(zhí)行任務(wù)調(diào)度

1)點(diǎn)擊執(zhí)行

2)點(diǎn)擊提交

Chapter 6 Ad-Hoc Query Data Warehouse Setup

6.1 Impala Installation

6.1.1 Add the Service

6.1.2 Select the Impala Service

6.1.3 Role Assignment

Note: it is best to deploy the StateStore and Catalog Server together on their own node.

6.1.4 Configure Impala

6.1.5 Start Impala

6.1.6 Installation Complete

6.1.7 Configure Hue to Support Impala

6.2 Querying Impala from Hue
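Once Impala is running, a simple smoke test is to refresh the metadata and query one of the warehouse tables built in Chapter 5, either from the Impala editor in Hue or with impala-shell. The sketch below assumes an Impala daemon is running on hadoop103; adjust -i to any impalad host in your cluster:

[root@hadoop102 ~]# impala-shell -i hadoop103 -q "invalidate metadata"

[root@hadoop102 ~]# impala-shell -i hadoop103 -q "select count(*) from gmall.dws_user_action"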




Chapter 7 Installing Spark 2.1

In a CDH 5.12.1 cluster the Spark installed by default is version 1.6, and here it needs to be upgraded to Spark 2.1. According to the official documentation, Spark 1.6 and Spark 2.x can be installed side by side, so the default 1.6 does not have to be removed before installing 2.x; the two versions also use different ports.

Cloudera's overview of Apache Spark 2 (the installation instructions and the parcel repository can be found there).

The offline parcel packages can be downloaded from Cloudera's website: https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html

Introduction to Cloudera Manager and parcels (version 5.12.0): https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_parcels.html#cmug_topic_7_11_5__section

7.1 Upgrade Procedure

7.1.1 Download the Offline Packages

1) Required CSD jar: http://archive.cloudera.com/spark2/csd/

2) Parcel download location: http://archive.cloudera.com/spark2/parcels/2.2.0.cloudera1/

7.1.2 Upload the Offline Packages

1) Upload SPARK2_ON_YARN-2.1.0.cloudera1.jar to /opt/cloudera/csd/

2) Upload SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el6.parcel and SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el6.parcel.sha1 to /opt/cloudera/parcel-repo/

3) Rename SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el6.parcel.sha1 to SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el6.parcel.sha

[root@hadoop102 parcel-repo]# mv /opt/cloudera/parcel-repo/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el6.parcel.sha1 /opt/cloudera/parcel-repo/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el6.parcel.sha
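Optionally, before Cloudera Manager picks the parcel up, verify that the hash stored in the .sha file matches the parcel itself; if the two values differ, the download is corrupt and should be repeated:

[root@hadoop102 parcel-repo]# sha1sum SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el6.parcel

[root@hadoop102 parcel-repo]# cat SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el6.parcel.sha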

7.2 Steps in the CM Web UI

7.2.1 Update the Parcel

On the CM home page click Parcel, then click Check for New Parcels.

7.2.2 Click Distribute

7.2.3 Click Activate

7.2.4 Return to the Home Page and Click Add Service

7.2.5 Select Spark2 and Continue

7.2.6 Select a Set of Dependencies

7.2.7 Role Assignment

7.2.8 Deploy and Start

Note: at this point I hit an error: "Client configuration (id=12) exited with 1, while 0 was expected".

1) Cause: when CM deploys Spark it does not look up Java from the environment variables, so the Java path has to be added to the CM configuration.

2) Fix 1 (requires restarting CDH):

[root@hadoop102 java]# vim /opt/module/cm/cm-5.12.1/lib64/cmf/service/client/deploy-cc.sh

Add the following at the end of the file (note there must be no space after the = sign):

JAVA_HOME=/opt/module/jdk1.8.0_144
export JAVA_HOME=/opt/module/jdk1.8.0_144

3) Fix 2 (no CDH restart needed):

Look at /opt/module/cm/cm-5.12.1/lib64/cmf/service/common/cloudera-config.sh to find the Java 8 home directory it expects.

CDH does not use the system's default JAVA_HOME environment variable; it follows the bigtop layout instead, so the JDK is expected under /usr/java/jdk1.8. Since the JDK is already installed under /opt/module/jdk1.8.0_144, creating a symlink on each node is enough:

[root@hadoop102 ~]# ln -s /opt/module/jdk1.8.0_144/ /usr/java/jdk1.8

[root@hadoop103 ~]# ln -s /opt/module/jdk1.8.0_144/ /usr/java/jdk1.8

[root@hadoop104 ~]# ln -s /opt/module/jdk1.8.0_144/ /usr/java/jdk1.8

4) Fix 3 (requires restarting CDH):

In the CM host configuration for hadoop102, hadoop103 and hadoop104, set the Java home directory explicitly.

7.2.9 Verify from the Command Line
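Assuming the parcel has been activated and the client configuration deployed as above, the Spark2 parcel installs its own client commands alongside the Spark 1.6 ones, and a quick check from the shell is:

[root@hadoop102 ~]# spark2-shell --version

[root@hadoop102 ~]# spark2-submit --version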

Chapter 8 Installing Sentry

8.1 Sentry Installation and Deployment

8.1.1 Add the Sentry Service

8.1.2 Select the Cluster to Add It To

8.1.3 Customize the Sentry Role Assignment

8.1.4 Create the Sentry Database in MySQL

8.1.5 Configure the Database Connection

8.1.6 Finish Adding the Sentry Service

8.2 Integrating Sentry with Hive

8.2.1 Parameters Required in the Hive Configuration

Set sentry.hive.testing.mode to true

8.2.2 Configure Hive to Use the Sentry Service

8.2.3 Disable Hive User Impersonation


8.3 Using Sentry with Hive

8.3.1 Connect to HiveServer2 with beeline and log in as the hive user

[root@hadoop104 init.d]# beeline
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Beeline version 1.1.0-cdh5.12.1 by Apache Hive
beeline> ! connect jdbc:hive2://hadoop104:10000
scan complete in 484ms
Connecting to jdbc:hive2://hadoop104:10000
Enter username for jdbc:hive2://hadoop104:10000: hive
Enter password for jdbc:hive2://hadoop104:10000: (just press Enter)
Connected to: Apache Hive (version 1.1.0-cdh5.12.1)
Driver: Hive JDBC (version 1.1.0-cdh5.12.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoop104:10000>

Note: the hive username entered above is not actually verified at this point.

8.3.2 Create an admin role

0: jdbc:hive2://hadoop104:10000> create role admin;

8.3.3 Grant the admin role full privileges

0: jdbc:hive2://hadoop104:10000> grant all on server server1 to role admin;

8.3.4 Grant the admin role to the hive user group

0: jdbc:hive2://hadoop104:10000> grant role admin to group hive;


8.3.5 Create a test table

Log in as the hive user with beeline, create a test table and insert some test data:

0: jdbc:hive2://hadoop104:10000> create table test (s1 string, s2 string) row format delimited fields terminated by ',';

0: jdbc:hive2://hadoop104:10000> insert into test values('a','b'),('1','2');


8.3.6 Create test roles and grant them to user groups

Create two roles:

read: can only read the default.test table; granted to the user_r user group

write: can only write the default.test table; granted to the user_w user group

Note: the user_r and user_w users must exist on every node in the cluster. A user's default group has the same name as the user, and grants are made to groups, not to individual users. (A sketch for creating the users on all nodes follows the example below.)

[root@hadoop104 ~]# useradd user_r
[root@hadoop104 ~]# id user_r
uid=502(user_r) gid=503(user_r) groups=503(user_r)
[root@hadoop104 ~]# useradd user_w
[root@hadoop104 ~]# id user_w
uid=503(user_w) gid=504(user_w) groups=504(user_w)
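Because the accounts must exist on every node, the same commands can be pushed to all three machines with the xcall.sh helper created earlier; this is only a sketch and assumes passwordless SSH is still in place:

[root@hadoop102 ~]# xcall.sh "useradd user_r"
[root@hadoop102 ~]# xcall.sh "useradd user_w"
[root@hadoop102 ~]# xcall.sh "id user_r; id user_w"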

8.3.7 As the hive user, create the read and write roles, grant the read role select on the test table and the write role insert on the test table

0: jdbc:hive2://hadoop104:10000> create role read;

0: jdbc:hive2://hadoop104:10000> grant select on table test to role read;

0: jdbc:hive2://hadoop104:10000> create role write;

0: jdbc:hive2://hadoop104:10000> grant insert on table test to role write;

8.3.8 Grant the read role to the user_r group and the write role to the user_w group

0: jdbc:hive2://hadoop104:10000> grant role read to group user_r;

0: jdbc:hive2://hadoop104:10000> grant role write to group user_w;


8.3.9 Verification with beeline

1. Log in to beeline as the user_r user:

[root@hadoop104 init.d]# beeline
beeline> ! connect jdbc:hive2://hadoop104:10000
scan complete in 4ms
Connecting to jdbc:hive2://hadoop104:10000
Enter username for jdbc:hive2://hadoop104:10000: user_r
Enter password for jdbc:hive2://hadoop104:10000: (just press Enter)
Connected to: Apache Hive (version 1.1.0-cdh5.12.1)
Driver: Hive JDBC (version 1.1.0-cdh5.12.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoop104:10000> show tables;
INFO  : OK
+-----------+--+
| tab_name  |
+-----------+--+
| test      |
+-----------+--+
1 row selected (0.649 seconds)
0: jdbc:hive2://hadoop104:10000> select * from test;
INFO  : OK
+----------+----------+--+
| test.s1  | test.s2  |
+----------+----------+--+
| a        | b        |
| 1        | 2        |
+----------+----------+--+
2 rows selected (0.485 seconds)
0: jdbc:hive2://hadoop104:10000> insert into test values("2", "222");
Error: Error while compiling statement: FAILED: SemanticException No valid privileges
 User user_r does not have privileges for QUERY
 The required privileges: Server=server1->Db=default->Table=test->action=insert; (state=42000,code=40000)
0: jdbc:hive2://hadoop104:10000>


2. Log in to beeline as the user_w user:

[root@hadoop104 init.d]# beeline
beeline> ! connect jdbc:hive2://hadoop104:10000
Connecting to jdbc:hive2://hadoop104:10000
Enter username for jdbc:hive2://hadoop104:10000: user_w
Enter password for jdbc:hive2://hadoop104:10000:
Connected to: Apache Hive (version 1.1.0-cdh5.12.1)
Driver: Hive JDBC (version 1.1.0-cdh5.12.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoop104:10000> select * from test;
Error: Error while compiling statement: FAILED: SemanticException No valid privileges
 User user_w does not have privileges for QUERY
 The required privileges: Server=server1->Db=default->Table=test->Column=s1->action=select; (state=42000,code=40000)
0: jdbc:hive2://hadoop104:10000> insert into test values("2", "333");
…..
INFO  : Total MapReduce CPU Time Spent: 20 seconds 30 msec
INFO  : Completed executing command(queryId=hive_20190603173535_88f71295-e612-4b45-bff7-df1fc4310a54); Time taken: 58.184 seconds
INFO  : OK
No rows affected (59.17 seconds)
0: jdbc:hive2://hadoop104:10000>

驗(yàn)證總結(jié):

user_r用戶所屬組為user_r擁有test表讀權(quán)限,所以只能對(duì)test表進(jìn)行selecth和count操作不能進(jìn)行insert操作拾碌;

user_w用戶所屬組為user_w擁有test表寫權(quán)限吐葱,所以只能對(duì)test表進(jìn)行insert操作不能進(jìn)行select和count操作;

8.4 The Role of Kerberos in the Enterprise

In the CDH platform, authentication (Kerberos/LDAP) is the first layer of security and authorization (Sentry) is the second; authorization cannot be enabled without authentication. CDH does provide a testing mode in which only Sentry authorization is enabled without authentication, but using it in production is strongly discouraged: without user authentication, authorization is meaningless, because anyone can log in to HiveServer2 or Impala as any superuser and no password check is performed. Note: this document is intended for test environments only.
