- 外部表的使用
- 基于ETL的數(shù)據(jù)加載到數(shù)據(jù)倉庫
- 使用Hive進行統(tǒng)計分析
*對比 MR 和 Hive
查看表:
desc formatted <table_name>
MANAGED_TABLE:內(nèi)部表
刪除內(nèi)部表:HDFS上的數(shù)據(jù)被刪除 & Meta也被刪除
創(chuàng)建外部表:
CREATE EXTERNAL TABLE emp_external(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
location '/external/emp/';
刪除外部表: HDFS上的數(shù)據(jù)不被刪除 & Meta被刪除
上次ETL的數(shù)據(jù):
create external table vvtrack_info(
ip string,
country string,
province string,
city string,
url string,
time string,
page string
) partitioned by (day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
location '/project/vvtrackinfo/';
LOAD DATA INPATH 'hdfs://hadoop000:8020/project/input/etl' OVERWRITE INTO TABLE track_info partition(day='2013-07-21');
統(tǒng)計瀏覽量
select count(*) from vvtrack_info where day='2013-07-21';
統(tǒng)計省份:
select province, count(*) from vvtrack_info where day='2013-07-21 group by province;
省份統(tǒng)計表
create table vvtrack_info_province_stat(
province string,
cnt bigint
) partitioned by (day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
insert overwrite table vvtrack_info_province_stat partition(day='2013-07-21')
select province,count(*) as cnt from vvtrack_info where day='2013-07-21' group by province ;
1)ETL
2)把ETL輸出的數(shù)據(jù)加載到track_info分區(qū)表里
3)各個維度統(tǒng)計結(jié)果的數(shù)據(jù)輸出到各自維度的表里(track_info_province_stat)
4)將數(shù)據(jù)導出(optional)
查看函數(shù)
show functions
查看具體函數(shù)用法
desc function <function_name>
查看具體例子
desc function extended <function_name>