1.課程簡介
(1)教師是Rachid Hamadi,名字發(fā)音是Ra’shid空执。
(2)課程主要是理論,少量編程怀伦,使用開源軟件PostgreSQL 9.3.3脆烟,這個軟件是engineer-based山林,市面常見的是web-based MySQL房待。
(3)共3個assignment:
– Assignment 1: ER Diagram / Relational Mapping (10%) (due Week 5)
– Assignment 2: SQL Queries and Functions (15%) (due Week 8)
– Assignment 3: Normalisation / Relational Algebra / Transaction (15%) (due Week 11)
database的一個核心是消除redundancy,一條數(shù)據(jù)不要重復(fù)儲存驼抹。database最常見的問題是如何同時處理數(shù)據(jù)桑孩,比如,在一個賬戶下多人同時save和withdraw框冀。
(4)9311的后續(xù)課程
? COMP9315: how to build relational DBMSs (write your own Oracle)
? COMP9318: techniques for data mining (discovering patterns in DB)
? COMP6714: information retrieval, web search (dealing with text data)
? COMP9319: web search and data compression (dealing searching compressed web data)
? COMP932(1|2|3): service-oriented computing, which relies on DB background
課件中在大標(biāo)題上經(jīng)常能看到“cont’d”流椒,含義是continue
2.Data Modeling
(1)什么是database
最簡單的定義是Database is a set of tables that data are related to each other. 換言之,無關(guān)聯(lián)的數(shù)據(jù)不應(yīng)該用數(shù)據(jù)庫處理明也。
Textbook的定義:a collection of related data
(2)database中的重要概念
1.Data:能描述所代表的對象特征
2.Relationships:定義數(shù)據(jù)間的關(guān)聯(lián)
3.Constraints:關(guān)于數(shù)據(jù)和關(guān)聯(lián)的限制宣虾,比如bank teller不能修改客戶的賬戶金額
4.Redundancy: 每條數(shù)據(jù)只在一處存儲
5.Data manipulation:常見declarative和procedural兩類,data manipulation是數(shù)據(jù)操作温数,比如篩選數(shù)據(jù)绣硝、插入、刪除撑刺。declarative programming描述對象的性質(zhì)而不是流程鹉胖,主要在邏輯層面構(gòu)建管理數(shù)據(jù)庫,procedural programming起源于structured programming,基于程序調(diào)用(procedure call)或函數(shù)調(diào)用(function call)進(jìn)行流程控制甫菠。
6.Transactions:多條數(shù)據(jù)的同時處理挠铲,一條數(shù)據(jù)的多用戶處理,以及引發(fā)的數(shù)據(jù)集聚效應(yīng)
7.Concurrency:多用戶同時調(diào)用一個數(shù)據(jù)
8.Scale:big data被應(yīng)用的越來越多寂诱,如何構(gòu)建成規(guī)模的數(shù)據(jù)庫變成越來越重要的問題
(3)什么是data
Textbook的定義:Known facts that can be recorded and have explicit meaning. 這意味著數(shù)據(jù)應(yīng)該有數(shù)據(jù)類型和存儲方式等信息拂苹。
孤立存在的數(shù)據(jù)沒有什么價值,有關(guān)聯(lián)的數(shù)據(jù)形成確定的數(shù)據(jù)結(jié)構(gòu)痰洒,再通過database management system(DBMS)管理就實現(xiàn)了數(shù)據(jù)的價值醋寝。
(4)什么是DBMS
Textbook的定義:a collection of programs that enables users to create and maintain a database.
(5)什么是database system
Textbook的定義:the database and DBMS together.
按照范圍來記憶,上文提到最小的單元是data带迟,data關(guān)聯(lián)起來后形成database音羞,構(gòu)建對database的管理形成DBMS,database和DBMS加和構(gòu)成了database system仓犬。即
data < database < DBMS < database system
(6)Database Users
1.Database Administrator(DBA):管理數(shù)據(jù)庫權(quán)限嗅绰,協(xié)調(diào)服務(wù)用戶,維護(hù)數(shù)據(jù)庫的軟硬件搀继,提升數(shù)據(jù)庫的效率
2.Databse Designer:定義數(shù)據(jù)結(jié)構(gòu)窘面,數(shù)據(jù)限制,數(shù)據(jù)傳輸叽躯,要直接對終端客戶的需求負(fù)責(zé)
3.End Users:數(shù)據(jù)查詢财边,數(shù)據(jù)更新等。其下又分為4類用戶--a. Casual点骑,偶爾使用數(shù)據(jù)庫酣难;b. Naive,占據(jù)絕大多數(shù)的end users黑滴,他們不需要了解數(shù)據(jù)庫的底層構(gòu)建憨募,而是在頂層使用,例如之前提到的bank-teller袁辈;c. Sophisticated菜谣,這些用戶非常熟悉database,他們的工作和database密切相關(guān)晚缩,例如商業(yè)分析師尾膊、科學(xué)家和工程師;d. Stand-alone荞彼,維護(hù)使用自己的數(shù)據(jù)庫冈敛,并不分享,例如個人通訊錄
(7)DBMS
課程核心是學(xué)會SQL-Structured Query Language卿泽,具體使用什么樣的DBMS不重要莺债,一般Relational database system is more efficient and flexible than object-oriented database system.
注:PL-Procedural Language
1.Oracle:resource hungry, non-flexible
2.MySQL:開源滋觉,高效,web-based齐邦,但oracle控制了MySQL椎侠,未來可能解除開源的屬性
3.PostgreSQL:課程使用
4.MongoDB:開源,NoSQL database program
(8)數(shù)據(jù)庫開發(fā)流程
database application development:
1.analyse application requirements
2.develop a data model to meet these requirements
3.define operations (transactions) on this model
4.implement the data model as relational schema
5.implement transactions via SQL and PLs
6.construct a web interface to these transactions
分析需求-建立模型-定義操作-實現(xiàn)關(guān)系模型-實現(xiàn)操作-構(gòu)建交互界面
(9)數(shù)據(jù)庫語言
1.Request to DBMS(e.g. SQL):
data manipulation language, DML (data level);
data definition language, DDL (data structure constraints);
create and drop database, indexes, functions
2.Results/effects from DBMS requests
tuples or sets of tuples
changes to underlying data store
(10)Data Modeling
1.Data modeling的目標(biāo):describe information, relationships, constraints.
2.Data modelling is a design process, it converts requirements into a data model.
3.Data modeling類型:logical(抽象的概念設(shè)計措拇,比如Entity Relationship, Object Definition Language)我纪,physical(實用型,例如relational)丐吓。一般先design using abstract mode浅悉,再 map to physical model。
4.design ideas:從簡單的信息入手券犁,先找到對象然后分析關(guān)聯(lián)术健,一般自然語言的名詞是數(shù)據(jù),而動詞是關(guān)聯(lián)粘衬,考慮所有可能的數(shù)據(jù)荞估。沒有完美的design,design時要多考慮正確性correctness稚新,完整性completeness勘伺,一致性consistency。
Example Gmail
最重要的是分析model的核心要素:information/data, relationships, constraints
Data Objects:
1.Users--username, password, security questions, backup email, phone, etc.
2.Emails--subject, body, from, to, cc, bcc, date-sent, status, etc.
3.Attachments--content-type, content, etc.
4.Sessions--user, expiry-time, location, etc.
5.Labels--name, color, etc.
Relationships:
Email-has-Label
Email-has-Attachment
Email-is a reply to-Email
3.ER Notation(以Textbook的Entity Relationship notation為準(zhǔn))
(1)ER Model
世界是由互相關(guān)聯(lián)的entities構(gòu)建起來的褂删。
ER的三個核心要素是:
attributes屬性--data item describing a property of interest
entity實體--collection of attributes describing object of interest
relationship關(guān)聯(lián)--association between entities (objects)
(2)ER Diagrams
ER diagrams are a graphical tool for data modelling.
It is consist of Data, Relationships, Attributes, Connections among them.
圖中有3種一維幾何圖形(線段飞醉,加粗線段,箭頭)和3種二維幾何圖形(矩形屯阀,橢圓缅帘,菱形)
1.橢圓,代表attributes
2.矩形蹲盘,代表data/entities
3.菱形股毫,代表relationships
4.線段,代表participation中的”部分關(guān)系“
5.加粗線段召衔,代表participation中的”完全關(guān)系“
6.箭頭,代表cardinality中的”1“祭陷;非箭頭苍凛,代表cardinality中的”多“
7.加橫線的橢圓,代表key attributes兵志,一般是unique的
另外
1.entity/data通常用單數(shù)名詞表示一類概念
2.如果一個橢圓是可以依據(jù)其他橢圓計算得到的醇蝴,則用虛線橢圓表示,比如年齡可以用出生日期計算想罕。叫做derived attributes
3.如果一個橢圓包含多個橢圓悠栓,則用雙線橢圓表示霉涨,比如最喜歡的食物,其中包含多個attributes惭适。叫做multivalued attributes
(3)entity sets
一種是extensional view笙瑟,通過同類事物擴(kuò)展得到一個entity set;
另一種是intensional view癞志,通過抽象事物得到一個class的entity set往枷。
(4)attributs的keys
1.key/superkey指的是attributes中比較獨(dú)特可以代表object的一個/一組,從superkey中如果刪除一個attribute凄杯,往往還可以代表object错洁;
2.candidate key值得也是attributes中比較獨(dú)特可以代表object的一個/一組,但是如果從candidate key中刪除一個attribute戒突,不能再代表object屯碴;
3.primary key是由databse designer選擇的
注:ER圖中key用下劃線標(biāo)注。
(5)relational sets
Relationship被定義為an association among several entities膊存;
Relationship set被定義為collection of relationships of the same type窿锉;
Degree = # entities involved in relationship (in ER model, ≥ 2);
Cardinality = # associated entities on each side of relationship膝舅;
Participation = must every entity be in the relationship
(6)Subclasses and Inheritance
1.A subclass of an entity set A is a set of entities:
? with all attributes of A, plus (usually) its own attributes
? that is involved in all of A's relationships, plus its own
2.Properties of subclasses:
? overlapping or disjoint
? total or partial