CREATE [EXTERNAL]
TABLE
[IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] [CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
分區(qū)建表分為2種咕幻,一種是單分區(qū)渔伯,也就是說在表文件夾目錄下只有一級文件夾目錄。另外一種是多分區(qū)肄程,表文件夾下出現(xiàn)多文件夾嵌套模式锣吼。
a. 單分區(qū)建表語句:create table day_table (id int, content string) partitioned by (dt string);單分區(qū)表,按天分區(qū)蓝厌,在表結(jié)構(gòu)中存在id玄叠,content,dt三列拓提。
b. 雙分區(qū)建表語句:create table day_hour_table (id int, content string) partitioned by (dt string, hour string);雙分區(qū)表读恃,按天和小時分區(qū),在表結(jié)構(gòu)中新增加了dt和hour兩列代态。
表文件夾目錄示意圖(多分區(qū)表):
添加分區(qū)表語法(表已創(chuàng)建狐粱,在此基礎(chǔ)上添加分區(qū)):
ALTER TABLE table_name ADD partition_spec [ LOCATION 'location1' ] partition_spec [ LOCATION 'location2' ] ... partition_spec: : PARTITION (partition_col = partition_col_value, partition_col = partiton_col_value, ...)
用戶可以用 ALTER TABLE ADD PARTITION 來向一個表中增加分區(qū)。當分區(qū)名是字符串時加引號胆数。例:
ALTER TABLE day_table ADD PARTITION (dt='2008-08-08', hour='08') location '/path/pv1.txt' PARTITION (dt='2008-08-08', hour='09') location '/path/pv2.txt';
7肌蜻、. 刪除分區(qū)語法:
ALTER TABLE table_name DROP partition_spec, partition_spec,...
用戶可以用 ALTER TABLE DROP PARTITION 來刪除分區(qū)。分區(qū)的元數(shù)據(jù)和數(shù)據(jù)將被一并刪除必尼。例:
ALTER TABLE day_hour_table DROP PARTITION (dt='2008-08-08', hour='09');
數(shù)據(jù)加載進分區(qū)表中語法:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
例:
LOAD DATA INPATH '/user/pv.txt' INTO TABLE day_hour_table PARTITION(dt='2008-08- 08', hour='08'); LOAD DATA local INPATH '/user/hua/*' INTO TABLE day_hour partition(dt='2010-07- 07');
當數(shù)據(jù)被加載至表中時蒋搜,不會對數(shù)據(jù)進行任何轉(zhuǎn)換。Load操作只是將數(shù)據(jù)復制至Hive表對應的位置判莉。數(shù)據(jù)加載時在表下自動創(chuàng)建一個目錄豆挽,文件存放在該分區(qū)下。
基于分區(qū)的查詢的語句:
SELECT day_table.* FROM day_table WHERE day_table.dt>= '2008-08-08';
查看分區(qū)語句:
hive> show partitions day_hour_table; OK dt=2008-08-08/hour=08 dt=2008-08-08/hour=09 dt=2008-08-09/hour=09