1. Relational Design Theory
1.1 Relation model introduction
在ER model之外,最流行的數(shù)據(jù)模型是relation model,它大大簡(jiǎn)化了傳統(tǒng)網(wǎng)絡(luò)模型/層次模型的coding工作量稳衬。同時(shí)能夠評(píng)估schema的優(yōu)劣展箱,使得數(shù)據(jù)存儲(chǔ)更加簡(jiǎn)潔炬转,關(guān)系更加清晰属韧。
一個(gè)好的database應(yīng)該使用最小的存儲(chǔ)空間體現(xiàn)所有的必要attributes亮钦,規(guī)避數(shù)據(jù)冗余掩完。數(shù)據(jù)冗余最直接的弊端就是難以操作數(shù)據(jù)庫噪漾,任何一個(gè)點(diǎn)的數(shù)據(jù)update,都要找到所有重復(fù)該數(shù)據(jù)的位置進(jìn)行更新且蓬。所以database的design要盡可能減少tables之間的overlap欣硼,同時(shí)一個(gè)table實(shí)現(xiàn)一個(gè)功能,不要復(fù)合在一起恶阴。通常一個(gè)table只能含有一個(gè)entity诈胜。
相比于ER model,relation model能夠保證minimal redundancy冯事,它可以深入到non-key的relation進(jìn)行設(shè)計(jì)焦匈,但是ER model做不到。
1.2 Notation/Terminology
Relation schemas: upper-case letters, denoting set of all attributes (e.g. R, S, P, Q )
Relation instances: lower-case letter corresponding to schema (e.g. r(R), s(S), p(P), q(Q) )
Tuples: lower-case letters (e.g. t, t', t1, u, v )
Attributes: upper-case letters from start of alphabet (e.g. A, B, C, D )
Sets of attributes: simple concatenation of attribute names (e.g. X=ABCD, Y=EFG )
Attributes in tuples: tuple[attrSet] (e.g. t[ABCD], t[X])
重要概念:functional dependencies are constraints between attributes within a relation.
A relation instance r(R) satisfies a dependency X → Y if
--for any t,u ∈ r, t[X]=u[X] → t[Y]=u[Y]
Y is functionally dependent on X, OR X determines Y
schema-based dependency:
--for any t, u ∈ any r(R), t[X] = u[X] → t[Y] = u[Y]
Inference Rules:
-Reflexivity, X -> X
-Augmentation, X -> Y --> XZ -> YZ
-Transitivity, X -> Y, Y -> Z --> X -> Z
-Additivity, X -> Y, X -> Z --> X -> YZ
-Projectivity, X -> YZ --> X -> Y, X -> Z
-Pseudotransitivity, X -> Y, YZ -> W --> XZ -> W
重要概念:Closures - The largest collection of dependencies that can be derived from F is called the closure of F and is denoted F+.
1.3 Normalization
Normalization是為了減少data/relation redundancy昵仅。共有6個(gè)normal forms(NF): 1NF, 2NF, 3NF, BCNF, 4NF, 5NF缓熟,前述順序是redundancy從大到小。
(1)1NF
要求所有的attribute都是atomic的,所以任何composite的attribute都要把最細(xì)節(jié)的信息當(dāng)做attribute够滑,比如name中必須把所有first, middle, last放在attribute中垦写,不能放入一個(gè)name作為attribute,而實(shí)際上name又分為first, middle, last彰触。
(2)2NF
所有的非primary key都要完全depend on primary key梯澜,沒有partial dependencies。
(3)3NF
所有的non-key attributes不能determine other non-key attributes渴析。
(4)BCNF
所有的non-key attributes不能反過來決定primary key晚伙。
4NF和5NF極少使用,因?yàn)閞edundancy的減少往往要以犧牲performance為代價(jià)俭茧,最常用的是3NF和BCNF咆疗。