好記性不如爛筆頭恬试,何況記性不好
本文都來自hive文檔, 記下來方便查詢搏嗡。
Database操作
- create database
文法CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name [COMMENT database_comment] [LOCATION hdfs_path] [WITH DBPROPERTIES (property_name=property_value, ...)];
- drop database
文法DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
默認(rèn)是RESTRICT
比默,在數(shù)據(jù)庫不空是鸦致,drop失敗棕诵。CASCADE
會(huì)刪除掉db中的表纪蜒。
- alter database
文法
ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...);
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;
- 查詢當(dāng)前db
SELECT current_database()
Table 操作
- 創(chuàng)建table
文法
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path]
data_type
: primitive_type
| array_type
| map_type
| struct_type
| union_type -- (Note: Available in Hive 0.7.0 and later)
primitive_type
: TINYINT
| SMALLINT
| INT
| BIGINT
| BOOLEAN
| FLOAT
| DOUBLE
| DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
| STRING
| BINARY -- (Note: Available in Hive 0.8.0 and later)
| TIMESTAMP -- (Note: Available in Hive 0.8.0 and later)
| DECIMAL -- (Note: Available in Hive 0.11.0 and later)
| DECIMAL(precision, scale) -- (Note: Available in Hive 0.13.0 and later)
| DATE -- (Note: Available in Hive 0.12.0 and later)
| VARCHAR -- (Note: Available in Hive 0.12.0 and later)
| CHAR -- (Note: Available in Hive 0.13.0 and later)
array_type
: ARRAY < data_type >
map_type
: MAP < primitive_type, data_type >
struct_type
: STRUCT < col_name : data_type [COMMENT col_comment], ...>
union_type
: UNIONTYPE < data_type, data_type, ... > -- (Note: Available in Hive 0.7.0 and later)
row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later)
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
file_format:
: SEQUENCEFILE
| TEXTFILE -- (Default, depending on hive.default.fileformat configuration)
| RCFILE -- (Note: Available in Hive 0.6.0 and later)
| ORC -- (Note: Available in Hive 0.11.0 and later)
| PARQUET -- (Note: Available in Hive 0.13.0 and later)
| AVRO -- (Note: Available in Hive 0.14.0 and later)
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
constraint_specification:
: [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
[, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE
解釋
臨時(shí)表
使用CREATE TEMPRARY table ...
, 臨時(shí)表和當(dāng)前session相關(guān),session斷開之后HIVE會(huì)刪除臨時(shí)表扰付。臨時(shí)表不支持分區(qū)和索引內(nèi)表
使用CREATE table ...
模式創(chuàng)建內(nèi)表堤撵,內(nèi)表元數(shù)據(jù)、表數(shù)據(jù)都有HIVE管理羽莺。表數(shù)據(jù)存放在hive.metastore.warehouse.dir
/table_name下实昨,drop內(nèi)表時(shí),表數(shù)據(jù)和元數(shù)據(jù)都會(huì)被刪除盐固。外表
使用CREATE EXTERNAL table ...
創(chuàng)建外表荒给,創(chuàng)建時(shí)可是使用CREATE EXTERNAL table ... LOCATION hdfs_path
指定外表數(shù)據(jù)存儲(chǔ)的位置,當(dāng)你已經(jīng)有一份數(shù)據(jù)時(shí)闰挡,可以將數(shù)據(jù)擋在hdfs_path路徑下锐墙,然后創(chuàng)建外表礁哄。drop外表時(shí)只會(huì)刪除表的元數(shù)據(jù)长酗,表數(shù)據(jù)不會(huì)刪除。更新表元數(shù)據(jù)庫
當(dāng)通過手動(dòng)的方式(hdfs dfs -put)向hdfs某表路徑下添加分區(qū)數(shù)據(jù)時(shí)桐绒,hive管理的元數(shù)據(jù)無法自動(dòng)追蹤分區(qū)信息夺脾,可以通過MSCK REPAIR TABLE table_name;
檢查并修復(fù)元數(shù)據(jù)庫。表數(shù)據(jù)存儲(chǔ)格式
STORED AS INPUTFORMAT input_format_class OUTPUTFORMAT output_format_class茉继,自定義input format和output format從hdfs讀和寫咧叭。-
ROW FORMAT
有兩種方式:DELIMITED,指定符號(hào)分割行烁竭,行內(nèi)各個(gè)field分割方式菲茬,如果存在column是map或者collection類型,分別通過
MAP KEYS TERMIATED BY char
分割key和value派撕,或者通過COLLECTION ITEMS TERMINATED BY char
分割列表元素婉弹。SERDE, 自定義序列化反序列化類,可以參考hive serde.
-
分區(qū)表
通過語句create table table_name(...) partitioned by (col_name1 col_type1, col_name2 col_type2 )
按照指定列創(chuàng)建分區(qū)表终吼,可以像partitioned by (col_name1 col_type1, col_name2 col_type2镀赌,.. )
創(chuàng)建多個(gè)分區(qū),一個(gè)分區(qū)對(duì)應(yīng)一個(gè)子目錄际跪,比如col_name1可以table_name目錄下的子路徑商佛,col_name2又是col_name1的下一級(jí)路徑喉钢。被用來分區(qū)的列(col_name1,col_name2等)不能在出現(xiàn)在創(chuàng)建表的列里面。
-
通過從select語句的查詢結(jié)果創(chuàng)建表
create table as select xxx from xxx
,這是一個(gè)原子操作良姆,意思就是在select語句產(chǎn)生全部結(jié)果之后肠虽,才會(huì)基于結(jié)果創(chuàng)建table,用戶不會(huì)看見執(zhí)行一部分的select結(jié)果玛追。通過這中方式創(chuàng)建的表要求創(chuàng)建出來的表不可以是:分區(qū)表(PARTITIONED BY xxx)舔痕、分桶表(CLUSTERD BY xxx)、外表豹缀。
create table like
通過CREATE table table_name like another_table
這種方式可以創(chuàng)建一個(gè)新的表table_name伯复,其表定義和another_table一樣,但是新表是一個(gè)空表邢笙,不會(huì)復(fù)制another_table的數(shù)據(jù)啸如,僅僅表結(jié)構(gòu)一樣。-
分桶表
CREATE TABLE table_name(...) CLUSTERED BY (col_name col_type, ...) SORTED BY (col_name ...) INTO num_bucker BUCKETS
創(chuàng)建分桶表氮惯,用來創(chuàng)建bucket的列名必須出現(xiàn)在創(chuàng)建表時(shí)的列里叮雳,這和partition不一樣。同時(shí)分桶和分區(qū)可以同時(shí)出現(xiàn)妇汗,不沖突帘不。往分桶表中插入數(shù)據(jù)時(shí),HIVE不會(huì)自動(dòng)的分桶和排序(分區(qū)表也不會(huì))杨箭,因此需要指定分桶寞焙,有兩種方式:
- 打開分桶開關(guān),強(qiáng)制分桶
SET hive.enforce.bucketing=true;
使用如下語句插入:
insert into table table_name sort by col_name [desc|asc],...
此時(shí)不需要使用CLUSTER BY互婿,但是需要指定sort by捣郊,且和創(chuàng)建時(shí)SORTED BY一樣。 - 設(shè)置
SET mapred.reduce.tasks = <num_buckets>;
慈参,reducer個(gè)數(shù)和創(chuàng)建時(shí)桶個(gè)數(shù)一樣呛牲。
使用如下語句插入:
insert into table table_name cluster by col_name,... sort by col_name [desc|asc],...
- 打開分桶開關(guān),強(qiáng)制分桶
- ALTER table|partition
- 設(shè)置table|partition 的serde class
ALTER TABLE table_name [PARTITION partition_spec] SET SERDE serde_class_name [WITH SERDEPROPERTIES serde_properties];
- 增加分區(qū)
ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location1'] partition_spec [LOCATION 'location2'] ...;
增加分區(qū)只會(huì)修改元數(shù)據(jù),不檢查數(shù)據(jù)是否存在驮配、不加載數(shù)據(jù)娘扩。所以指定的LOCATION即使不存在也不會(huì)出錯(cuò)。
- 設(shè)置table|partition 的serde class
-
修改column
ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name] [CASCADE|RESTRICT];
-
col_old_name col_new_name column_type
將舊的column改成新的壮锻。 - FIRST|AFTER column_name表示將列置于column_name之前和之后琐旁。
- CASCADE會(huì)修改表的元數(shù)據(jù)和partition的元數(shù)據(jù)。RESTRICT只會(huì)修改表的元數(shù)據(jù)躯保。
-
-
ADD/REPLACE column
ALTER TABLE table_name [PARTITION partition_spec] ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...) [CASCADE|RESTRICT]
- ADD 新增列旋膳,新增的列在分區(qū)列之前,其他非分區(qū)列之后途事。
- REPLACE验懊,會(huì)刪除現(xiàn)在所有列擅羞,然后加入新的列。
- 視圖操作
- 創(chuàng)建視圖
CREATE VIEW [IF NOT EXISTS] [db_name.]view_name [(column_name [COMMENT column_comment], ...) ] [COMMENT view_comment] [TBLPROPERTIES (property_name = property_value, ...)] AS SELECT ...;
- view_name 不能和已有的表或者視圖重名义图。
- 不指定column_name 時(shí)會(huì)從SELECT中推斷列名减俏。
- 當(dāng)視圖所基于的底層表schema發(fā)生改變時(shí),視圖的schema不會(huì)改變碱工。
- 視圖是只讀的娃承,不可以使用INSERT/DELETE/ALTER.
- 刪除視圖
DROP VIEW [IF EXISTS]db_name.view_name
- 修改視圖
ALTER VIEW [db_name.]view_name AS select_statement;
其效果和CREATE VIEW一樣,但是要求view_name必須存在怕篷。
- 索引操作
- 創(chuàng)建索引
持續(xù)更新...