Problem description
Hive does not care about the format or the actual content of the data files. Data is converted to the table's defined schema only when it is read, and that is when data type conversion problems appear.
準(zhǔn)備測試表和數(shù)據(jù)
create table test_cast (id int, age int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
Test data:
1,23
2,24c
3,32d
4,30
5,NULL
Load the test data into the test_cast table and query it.
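A minimal way to do this, assuming the sample rows above were saved to a local file. The path /tmp/test_cast.txt is hypothetical; substitute your own.

```sql
-- Copy the local sample file into test_cast's storage directory (hypothetical path)
LOAD DATA LOCAL INPATH '/tmp/test_cast.txt' INTO TABLE test_cast;

-- Read the rows back; this is when Hive applies the table schema to the raw text
SELECT * FROM test_cast;
```

LOAD DATA only copies the file into the table's location without parsing or validating it, which is why malformed values surface only when the table is queried.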
The table defines both id and age as int, but the age column in the loaded sample data contains non-numeric values. When you query the table, every value that fails type conversion is displayed as NULL.
Solution
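Hive has no built-in mechanism for validating data. To find the rows whose values failed type conversion, combine the nvl and cast functions: cast returns NULL when a value cannot be converted, and nvl replaces that NULL with a marker value, here 'error'.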
select id,nvl(cast(age as int), "error") age from test_cast;
To write the rows with invalid data into a new table, use the following SQL:
create table test_exception as
select * from (select id,nvl(cast(age as int), 'error') age from test_cast) as b where b.age='error';
You can also use cast on its own, because a failed conversion already yields NULL. The SQL is as follows:
drop table if exists test_exception;
create table test_exception as
select * from (select id, cast(age as int) age from test_cast) as b where b.age is null;
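To check what was captured, query the new table. For the sample data above you would expect ids 2, 3 and 5. Row 5 is included because the literal text NULL is not a valid int (Hive's default null marker for text files is \N), so with this approach a missing value and a malformed value look the same.

```sql
-- Inspect the rows whose age could not be converted to int
SELECT * FROM test_exception;

-- Or just count them
SELECT COUNT(*) FROM test_exception;
```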