ODS.jpg
更多信息卿城,請關注個人博客不圓的珠子
儲),是企業(yè)數(shù)據(jù)的抽取與交換平臺。通常ODS的數(shù)據(jù)不做任何轉化示弓,只保留源系統(tǒng)數(shù)據(jù)损拢。并將數(shù)據(jù)分發(fā)給數(shù)據(jù)倉庫,數(shù)據(jù)集市等下游系統(tǒng)被碗,并在ODS數(shù)據(jù)基礎上開發(fā)各類報表某宪。目前接觸到的項目中锐朴,將數(shù)據(jù)倉庫與ODS集成到一個項目中兴喂,共用同一個數(shù)據(jù)庫衣迷。減少了數(shù)據(jù)傳輸,提高了資源的利用率酱酬。在ODS層數(shù)據(jù)中,我們將源系統(tǒng)數(shù)據(jù)變化的過程保留下來岳悟,由于源系統(tǒng)和ODS不在同一服務器上,源系統(tǒng)通常通過文件交換的方式傳輸數(shù)據(jù)贵少,為了減少文件傳輸,數(shù)據(jù)文件可分為增量文件和全量文件滔灶。根據(jù)保留數(shù)據(jù)類型的不同普碎,又分為流水數(shù)據(jù)與歷史數(shù)據(jù)录平。通常我們在開發(fā)ODS時麻车,涉及到以下幾種算法缀皱,全量歷史加載算法,增量歷史加載算法动猬,流水全量加載算法啤斗,流水增量加載算法,全量覆蓋加載算法赁咙。
2.相關變量說明
變量名 | 變量描述 |
---|---|
${DB} | 數(shù)據(jù)庫實例名 |
${TBNAME} | 表名 |
$COLS | 表字段钮莲,以逗號分隔,col1,col2.. |
$DATA_DT | 數(shù)據(jù)加載日期 |
${SRC_DB} | 源數(shù)據(jù)庫實例名 |
${SRC_TBNAME} | 源表名 |
ND | 臨時表彼水,用于加載當日數(shù)據(jù) |
OD | 臨時表崔拥,用于加載上日數(shù)據(jù) |
${HT_S_DT} | 拉鏈開始日期 |
${HT_E_DT} | 拉鏈結束日期 |
${TX_DATE} | 交易日期,作業(yè)運行日期 |
WT_U | 臨時表凤覆,用于保存需要關鏈數(shù)據(jù) |
WT_I | 臨時表链瓦,用于保存需要插入的數(shù)據(jù) |
3.全量覆蓋加載算法
全量覆蓋加載是所有加載中最簡單的一種數(shù)據(jù)加載方式。它是指直接將目標表中數(shù)據(jù)刪除并將源系統(tǒng)提供的數(shù)據(jù)文件直接加載到目標表中的過程盯桦。偽代碼如下:
//清除表數(shù)據(jù)
TRUNCATE TABLE ${DB}.${TBNAME} ;
//將源表數(shù)據(jù)插入到目標表中
INSERT INTO ${DB}.${TBNAME}
( $COLS
,DATA_DT)
SELECT
$COLS
,TO_DATE('$DATA_DT', 'YYYYMMDD')
FROM ${SRC_DB}.${SRC_TBNAME};
4.全量歷史加載算法
- 將當日數(shù)據(jù)加載到ND表中
INSERT INTO ${ND}( $COLS )
SELECT $COLS
FROM ${SRC_DB}.${SRC_TBNAME}
;
- 將上日正常數(shù)據(jù)加載到臨時表OD中
INSERT INTO OD
SELECT * FROM ${DB}.${TBNAME}
WHERE ${HT_E_DT} > ${TX_DATE}
AND ${HT_S_DT} <= ${TX_DATE}
;
- 產生插入結果數(shù)據(jù)(新增數(shù)據(jù)慈俯,當日數(shù)據(jù)不在昨日數(shù)據(jù)中),并將數(shù)據(jù)加載到臨時表WT_I中
INSERT INTO WT_I(
$COLS
,${HT_S_DT}
,${HT_E_DT}
,DATA_DT )
SELECT
$COLS
,$TX_DATE
,$MAX_DT
,$TX_DATE
FROM ${ND}
WHERE ($COLS_NOT_NULL)
NOT IN (SELECT $COLS_NOT_NULL
FROM OD)
;
- 產生關鏈數(shù)據(jù)(昨日數(shù)據(jù)不在當日數(shù)據(jù)中,已經被源系統(tǒng)刪除的數(shù)據(jù))拥峦,并將數(shù)據(jù)加在到臨時表WT_U中
INSERT INTO WT_U(
$COLS
,${HT_S_DT}
,${HT_E_DT}
,DATA_DT )
SELECT
$COLS
,${HT_S_DT}
,${TX_DATE}
,DATA_DT
FROM OD
WHERE ( $COLS_NOT_NULL )
NOT IN (SELECT $COLS_NOT_NULL
FROM ${ND})
;
- 將目標表中已經失效的數(shù)據(jù)關鏈
UPDATE ${DB}.${TBNAME}
SET ${HT_E_DT} = ${TX_DATE}
WHERE ($COLS_NOT_NULL) IN (SELECT $COLS_NOT_NULL FROM WT_U)
AND ${HT_E_DT} > ${TX_DATE}
AND ${HT_S_DT} <= ${TX_DATE}
;
- 將新增數(shù)據(jù)插入插入到目標表中
INSERT INTO ${DB}.${TBNAME}(
$COLS
,${HT_S_DT}
,${HT_E_DT}
,DATA_DT)
SELECT
$COLS
,${HT_S_DT}
,${HT_E_DT}
,DATA_DT
FROM WT_I;
5.增量歷史加載算法
1.將當日數(shù)據(jù)加載到臨時表ND表中
INSERT INTO ${ND}( $COLS )
SELECT $COLS
FROM ${DB}.${SRC_TBNAME}
;
- 將上日正常數(shù)據(jù)加載到臨時表OD中(根據(jù)表主鍵查找當日數(shù)據(jù))
INSERT INTO OD
SELECT * FROM ${DB}.${TBNAME}
WHERE ($PRI_KEY_NAME) IN (SELECT $PRI_KEY_NAME FROM ${ND})
AND ${HT_E_DT} > ${TX_DATE}
AND ${HT_S_DT} <= ${TX_DATE}
- 產生插入結果數(shù)據(jù)(當日數(shù)據(jù)不在昨日數(shù)據(jù)中)肥卡,并將數(shù)據(jù)加載到臨時表WT_I中
INSERT INTO WT_I(
$COLS
,${HT_S_DT}
,${HT_E_DT}
,DATA_DT )
SELECT
$COLS
,$TX_DATE
,$MAX_DT
,$TX_DATE
FROM ${ND}
WHERE ($COLS_NOT_NULL)
NOT IN (SELECT $COLS_NOT_NULL
FROM OD)
;
- 產生關鏈數(shù)據(jù)(昨日數(shù)據(jù)不在當日數(shù)據(jù)中,源系統(tǒng)已經失效數(shù)據(jù)),并將數(shù)據(jù)加在到臨時表WT_U中
INSERT INTO WT_U(
$COLS
,${HT_S_DT}
,${HT_E_DT}
,DATA_DT )
SELECT
$COLS
,${HT_S_DT}
,${TX_DATE}
,DATA_DT
FROM OD
WHERE ( $COLS_NOT_NULL )
NOT IN (SELECT $COLS_NOT_NULL
FROM ${ND})
;
- 將目標表中失效的數(shù)據(jù)關鏈
UPDATE ${DB}.${TBNAME}
SET ${HT_E_DT} =${TX_DATE}
WHERE ($COLS_NOT_NULL) IN (SELECT $COLS_NOT_NULL FROM WT_U)
AND ($PRI_KEY_NAME) IN (SELECT $PRI_KEY_NAME FROM ${ND})
AND ${HT_E_DT} > ${TX_DATE}
AND ${HT_S_DT} <= ${TX_DATE}
;
- 插入當日新增數(shù)據(jù)
INSERT INTO ${DB}.${TBNAME}(
$COLS
,${HT_S_DT}
,${HT_E_DT}
,DATA_DT)
SELECT
$COLS
,${HT_S_DT}
,${HT_E_DT}
,DATA_DT
FROM WT_I;
6.流水增量加載算法
- 清除當日數(shù)據(jù)
DELETE FROM ${DB}.${TBNAME}
WHERE DATA_DT = '$DATA_DT'
;
- 插入數(shù)據(jù)
INSERT INTO ${DB}.${TBNAME}
( $COLS
,DATA_DT)
SELECT
$COLS
,'$DATA_DT'
FROM ${DB}.${SRC_TBNAME}
MINUS ALL
SELECT
$COLS
,'$DATA_DT'
FROM ${DB}.${TBNAME}
WHERE DATA_DT <= ${TX_DATE}
;
7.流水全量加載算法
- 清除當日數(shù)據(jù)
DELETE FROM ${DB}.${TBNAME}
WHERE DATA_DT = $DATA_DT
- 插入數(shù)據(jù)
INSERT INTO ${DB}.${TBNAME}
( $COLS
,DATA_DT
,HT_S_DT
,HT_E_DT
)
SELECT
$COLS
,'$DATA_DT'
,'$DATA_DT'
,'$MAX_DT'
FROM ${DB}.${SRC_TBNAME};