為什么做增量
數(shù)據(jù)量大,只需要增量最新被更改的數(shù)據(jù)盆昙。如何做增量
(1)insert into
比如行為數(shù)據(jù)羽历,發(fā)生一條記錄就插入一條,數(shù)據(jù)不會被update淡喜。
嚴(yán)格T+1秕磷,初始化時候限定created_at的時間。否則凌晨之后的數(shù)據(jù)會被重復(fù)插入炼团。
(2)insert overwrite
初始化的時候不限定時間澎嚣。
增量的時候where to_date(a.updated_at) >= '%s'
(選擇updated_at 做為增量字段,接近凌晨創(chuàng)建的數(shù)據(jù)2016-12-20 23:59:58的更新時間是T日瘟芝,此數(shù)據(jù)不會被增量進易桃,接近嚴(yán)格T+1 。以updated_at做為初始化時候時間的限制字段模狭,會將T日凌晨之后更新創(chuàng)建于幾日前的數(shù)據(jù)剔除颈抚,此為bug。)
增量的方法步驟如下:
1)建表
use default;
create table if not exists loan_f_order_info
(
ord_no string comment "訂單號",
uid string comment "用戶ID",
crt_tim timestamp comment "創(chuàng)建時間",
upd_tim timestamp comment "更新時間"
)
COMMENT "test表";
2)初始化
insert overwrite table loan_f_order_info
select order_no as ord_no,
uid as uid,
created_at as crt_tim,
updated_at as upd_tim
from ods_loan.ods_loan_dsloan_loan_stage_order
3)增量
a)insert into table default.loan_f_order_info
select order_no as ord_no,
uid as uid,
created_at as crt_tim,
updated_at as upd_tim
from ods_loan.ods_loan_dsloan_loan_stage_order
where to_date(updated_at) >= date_sub(from_unixtime(unix_timestamp(),'yyyy-MM-dd'),1)
b)insert overwrite table default.loan_f_order_info
select ord_no,
uid,
crt_tim,
upd_tim
from (
select *,
row_number() over(partition by uid, ord_no order by upd_tim desc) as row_num
from default.loan_f_order_info
) za
where row_num = '1'