Machine Learning - Week 8: Unsupervised Learning & Dimensionality Reduction (k-means & PCA)
Installing Octave 4.0.2 on Ubuntu 16.04.1
Unsupervised Learning
1. Clustering
1.1 K-means algorithm
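Input: the number of clusters K and an unlabeled training set {x(1), x(2), ..., x(m)}, where each x(i) ∈ R^n (no x0 = 1 term)
Steps: randomly initialize K cluster centroids μ1, μ2, ..., μK ∈ R^n, then repeat (1) cluster assignment: for each i, set c(i) to the index of the centroid closest to x(i); (2) move centroids: for each k, set μk to the mean of the points assigned to cluster k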
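The steps above can be sketched in plain Python (no NumPy, to keep it self-contained). The function names, the `init`/`seed`/`max_iters` parameters, and the choice to leave an empty cluster's centroid where it is are assumptions of this sketch, not part of the course material.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance ||a - b||^2 between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(X, K, init=None, max_iters=100, seed=0):
    """Plain K-means on a list of points X; returns (centroids, assignments).

    If `init` is None, K distinct training examples are picked at random
    as the initial centroids; otherwise `init` is used as given.
    """
    if init is None:
        init = random.Random(seed).sample(X, K)
    centroids = [list(p) for p in init]
    assignments = None
    for _ in range(max_iters):
        # Cluster assignment step: c(i) = index of the centroid closest to x(i).
        new_assignments = [
            min(range(K), key=lambda k: squared_distance(x, centroids[k]))
            for x in X
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the centroids are final
        assignments = new_assignments
        # Move-centroid step: mu_k = mean of the points assigned to cluster k.
        for k in range(K):
            members = [x for x, c in zip(X, assignments) if c == k]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]
centroids, c = kmeans(X, 2, init=[X[0], X[3]])
print(c)  # [0, 0, 0, 1, 1, 1]
```

Starting from one example in each group, the first assignment already separates the two groups, the centroids move to the group means, and the second pass changes nothing, so the loop stops.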
1.2 Optimization objective
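Loss function J(): the distortion J(c(1), ..., c(m), μ1, ..., μK) = (1/m) Σ_{i=1}^{m} ||x(i) − μ_{c(i)}||^2, where c(i) is the index of the cluster x(i) is assigned to and μ_{c(i)} is that cluster's centroid
Steps: the cluster assignment step minimizes J over c(1), ..., c(m) with the centroids held fixed, and the move-centroid step minimizes J over μ1, ..., μK with the assignments held fixed, so J never increases from one iteration to the next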
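A small sketch of the distortion J and of the two steps, each written as a minimization of J over one set of variables. It assumes points are lists of floats. The helper names and the four 1D toy points are hypothetical. Recording J after every step shows that it never goes up.

```python
def distortion(X, assignments, centroids):
    """J(c, mu) = (1/m) * sum_i ||x(i) - mu_{c(i)}||^2."""
    return sum(
        sum((xj - mj) ** 2 for xj, mj in zip(x, centroids[c]))
        for x, c in zip(X, assignments)
    ) / len(X)


def assign_step(X, centroids):
    """Minimize J over c(1..m) with the centroids held fixed."""
    return [
        min(range(len(centroids)),
            key=lambda k: sum((xj - mj) ** 2 for xj, mj in zip(x, centroids[k])))
        for x in X
    ]


def move_step(X, assignments, centroids):
    """Minimize J over mu_1..mu_K with the assignments held fixed (cluster means)."""
    new = []
    for k, old in enumerate(centroids):
        members = [x for x, c in zip(X, assignments) if c == k]
        new.append([sum(col) / len(members) for col in zip(*members)]
                   if members else list(old))
    return new


X = [[0.0], [2.0], [10.0], [12.0]]
centroids = [[0.0], [2.0]]  # hypothetical initial centroids
history = []
for _ in range(2):
    c = assign_step(X, centroids)
    history.append(distortion(X, c, centroids))
    centroids = move_step(X, c, centroids)
    history.append(distortion(X, c, centroids))
print(history)  # [41.0, 14.0, 6.0, 1.0]
```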
1.3 Random Initialization
Make K-means avoid local optima: multiple random initializations
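Random initialization: choose K < m, randomly pick K distinct training examples, and set μ1, ..., μK equal to them
Run K-means from many random initializations (typically 50 to 1000) and keep the clustering with the lowest cost J; this helps most when K is small (K = 2 to 10), while for large K the first run is often already close to the best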
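A sketch of the multiple-initialization loop, assuming plain-Python K-means as in the steps above. The names `run_kmeans` and `best_of_n_inits`, the default of 100 runs, and the toy data are illustrative choices, not from the course. A single seeded `random.Random` is shared across runs so that each run starts from a different set of examples.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def run_kmeans(X, K, rng, max_iters=100):
    """One K-means run from a random init; returns (J, centroids, assignments)."""
    # Random initialization: K distinct training examples become the centroids.
    centroids = [list(p) for p in rng.sample(X, K)]
    assignments = None
    for _ in range(max_iters):
        new = [min(range(K), key=lambda k: dist2(x, centroids[k])) for x in X]
        if new == assignments:
            break
        assignments = new
        for k in range(K):
            members = [x for x, c in zip(X, assignments) if c == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    J = sum(dist2(x, centroids[c]) for x, c in zip(X, assignments)) / len(X)
    return J, centroids, assignments


def best_of_n_inits(X, K, n_init=100, seed=0):
    """Run K-means n_init times and keep the run with the lowest distortion J."""
    rng = random.Random(seed)
    return min((run_kmeans(X, K, rng) for _ in range(n_init)), key=lambda r: r[0])


X = [[0.0], [1.0], [10.0], [11.0], [20.0], [21.0]]
J, centroids, assignments = best_of_n_inits(X, K=3, n_init=50)
print(J, sorted(centroids))  # 0.25 [[0.5], [10.5], [20.5]]
```

Some single runs on this data get stuck with two centroids inside one group. Keeping the lowest J over 50 runs recovers the three obvious clusters.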
1.4 Choosing the number of clusters (K): the Elbow method
Elbow method: plot the cost J against K and pick the K at the "elbow", where J stops decreasing rapidly
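A sketch of computing the points of an elbow plot: for each K, keep the lowest J over several random initializations. The names `kmeans_cost` and `elbow_curve`, `k_max = 4`, and the toy data (three well-separated pairs) are assumptions made for illustration.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_cost(X, K, rng, max_iters=100):
    """One K-means run from a random init; returns its final distortion J."""
    centroids = [list(p) for p in rng.sample(X, K)]
    assignments = None
    for _ in range(max_iters):
        new = [min(range(K), key=lambda k: dist2(x, centroids[k])) for x in X]
        if new == assignments:
            break
        assignments = new
        for k in range(K):
            members = [x for x, c in zip(X, assignments) if c == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return sum(dist2(x, centroids[c]) for x, c in zip(X, assignments)) / len(X)


def elbow_curve(X, k_max, n_init=50, seed=0):
    """Best (lowest) J over n_init random initializations for each K = 1..k_max."""
    rng = random.Random(seed)
    return {K: min(kmeans_cost(X, K, rng) for _ in range(n_init))
            for K in range(1, k_max + 1)}


X = [[0.0], [1.0], [10.0], [11.0], [20.0], [21.0]]
curve = elbow_curve(X, k_max=4)
for K, J in curve.items():
    print(K, round(J, 3))  # J drops sharply up to K = 3, then flattens: the elbow
```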
tests
Dimensionality Reduction
2. Motivation
2.1 Motivation 1: Data Compression
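For example, 2D -> 1D (project the points onto a line) or 3D -> 2D (project them onto a plane); compressing the data saves memory and disk space and speeds up learning algorithms.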
2.2 Motivation 2: Data Visualization
Reducing N-dimensional data to 2D or 3D (N larger than the target dimension) lets us plot and visualize it.
3. Principal Component Analysis (PCA)
The data usually needs to be normalized first.
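Data preprocessing: mean normalization (replace each xj(i) with xj(i) − μj, where μj = (1/m) Σ_{i=1}^{m} xj(i)); if features are on different scales, also divide by a scale sj, i.e. xj(i) ← (xj(i) − μj) / sj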
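A minimal sketch of this preprocessing, assuming the scale sj is the population standard deviation. Using max − min would also fit the description. The statistics are computed once on the training data and then reused for any other examples. The function names and the two-example data set are hypothetical.

```python
import math


def fit_feature_scaling(X_train):
    """Per-feature mean mu_j and scale s_j, computed on the training set only."""
    m = len(X_train)
    mu = [sum(col) / m for col in zip(*X_train)]
    # Assumption: s_j is the (population) standard deviation; max - min also works.
    s = [math.sqrt(sum((v - mj) ** 2 for v in col) / m)
         for col, mj in zip(zip(*X_train), mu)]
    return mu, s


def apply_feature_scaling(X, mu, s):
    """x_j <- (x_j - mu_j) / s_j; a constant feature (s_j = 0) is only mean-centered."""
    return [[(v - mj) / sj if sj > 0 else v - mj for v, mj, sj in zip(x, mu, s)]
            for x in X]


X_train = [[1.0, 100.0], [3.0, 300.0]]  # hypothetical features on very different scales
mu, s = fit_feature_scaling(X_train)
X_scaled = apply_feature_scaling(X_train, mu, s)
print(mu, s)     # [2.0, 200.0] [1.0, 100.0]
print(X_scaled)  # [[-1.0, -1.0], [1.0, 1.0]]
```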
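What PCA does: find a k-dimensional surface (spanned by vectors u(1), ..., u(k)) onto which to project the data so that the projection error, the squared distance from each point to its projection, is minimized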
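PCA implementation: compute the covariance matrix Sigma = (1/m) Σ_{i=1}^{m} x(i) x(i)^T, then [U,S,V]=svd(Sigma)
Take the first k columns of U to get Ureduce (an n × k matrix); the compressed representation is z = Ureduce' * x, with z ∈ R^k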
PCA Algorithm Summary
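A sketch of the summarized algorithm in plain Python. It does not call an SVD routine. Instead it swaps in power iteration with deflation. For a symmetric positive semidefinite Sigma, this yields the same leading eigenvectors as the first k columns of U from `svd(Sigma)`, up to sign, assuming the top k eigenvalues are distinct. The fixed iteration count, the function names, and the toy data are assumptions of this sketch. `U_reduce` is stored as a list of the k column vectors u(1), ..., u(k).

```python
import math
import random


def top_eigenvectors(Sigma, k, iters=1000, seed=0):
    """Leading k eigenvectors of a symmetric PSD matrix (power iteration + deflation).

    Stand-in for U(:, 1:k) from [U, S, V] = svd(Sigma); results may differ by sign.
    """
    n = len(Sigma)
    A = [row[:] for row in Sigma]
    rng = random.Random(seed)
    vecs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
        for _ in range(iters):
            w = [sum(a * b for a, b in zip(row, v)) for row in A]  # w = A v
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break  # no variance left in A along v
            v = [t / norm for t in w]
        Av = [sum(a * b for a, b in zip(row, v)) for row in A]
        lam = sum(a * b for a, b in zip(v, Av))  # Rayleigh quotient = eigenvalue
        vecs.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next eigenvector.
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vecs


def pca(X, k):
    """Returns (mu, U_reduce, Z); U_reduce holds the k principal directions u(1..k)."""
    m, n = len(X), len(X[0])
    mu = [sum(col) / m for col in zip(*X)]
    Xc = [[v - mj for v, mj in zip(x, mu)] for x in X]  # mean normalization
    # Covariance matrix: Sigma = (1/m) * sum_i x(i) x(i)^T
    Sigma = [[sum(x[a] * x[b] for x in Xc) / m for b in range(n)] for a in range(n)]
    U_reduce = top_eigenvectors(Sigma, k)
    # z(i) = Ureduce' * x(i): one coordinate per principal direction
    Z = [[sum(uj * xj for uj, xj in zip(u, x)) for u in U_reduce] for x in Xc]
    return mu, U_reduce, Z


X = [[3.0, 0.0], [-3.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
mu, U_reduce, Z = pca(X, k=1)
print(U_reduce, Z)  # direction close to [1, 0]; Z close to [[3], [-3], [0], [0]]
```

Here Sigma is diagonal with entries 4.5 and 0.5, so the first principal direction is the x-axis, and the 1D representation keeps each point's x-coordinate.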
4. Applying PCA
4.1 Reconstruction from compressed representation
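Ureduce: reconstruct an approximation of the original data with x_approx = Ureduce * z, which maps z ∈ R^k back to R^n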
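Compression and reconstruction with a given Ureduce, as a sketch. The single 2D principal direction (1/√2, 1/√2) and the example point are hypothetical values picked so the arithmetic is easy to follow. As in the summary, `U_reduce` is a list of the k unit column vectors.

```python
import math


def compress(x, U_reduce):
    """z = Ureduce' * x: the coordinate of x along each principal direction."""
    return [sum(ui * xi for ui, xi in zip(u, x)) for u in U_reduce]


def reconstruct(z, U_reduce):
    """x_approx = Ureduce * z = sum_j z_j * u(j), mapping z back to R^n."""
    n = len(U_reduce[0])
    return [sum(zj * u[i] for zj, u in zip(z, U_reduce)) for i in range(n)]


# Hypothetical single principal direction in 2D (k = 1), already unit length.
U_reduce = [[1 / math.sqrt(2), 1 / math.sqrt(2)]]
x = [2.0, 1.0]  # a mean-normalized example
z = compress(x, U_reduce)            # about [2.1213], i.e. 3 / sqrt(2)
x_approx = reconstruct(z, U_reduce)  # about [1.5, 1.5]
print(z, x_approx)
```

x_approx is the point on the line spanned by u(1) closest to x. The leftover x − x_approx = (0.5, −0.5) is the projection error that PCA minimizes.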
4.2 Choosing the number of principal components
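Choosing k: pick the smallest k such that the average squared projection error divided by the total variation in the data is at most 0.01, i.e. 99% of the variance is retained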
Call svd() only once; then, for increasing values of k, check whether Σ_{i=1}^{k} S_ii / Σ_{i=1}^{n} S_ii >= 0.99, and take the smallest k that satisfies it.
Choosing k method
Choosing k in Octave
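S from svd(Sigma) is diagonal, so the 99% check needs only a running sum over its diagonal. This sketch takes the diagonal entries S_11, ..., S_nn as a plain Python list. The values below are made up for illustration, and `choose_k` is a hypothetical name.

```python
def choose_k(s_diag, retain=0.99):
    """Smallest k with sum(S_11..S_kk) / sum(S_11..S_nn) >= retain.

    s_diag holds the diagonal of S from [U, S, V] = svd(Sigma), in descending order.
    """
    total = sum(s_diag)
    running = 0.0
    for k, s in enumerate(s_diag, start=1):
        running += s
        if running / total >= retain:
            return k
    return len(s_diag)  # guard against rounding: keeping all n components retains 100%


S_diag = [60.0, 25.0, 14.5, 0.3, 0.2]  # hypothetical values, total = 100
print(choose_k(S_diag))  # 3: the first three components retain 99.5% of the variance
```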
4.3 Advice for applying PCA
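Applications of PCA: compression (reduce the memory/disk needed to store data, speed up a learning algorithm) and visualization (k = 2 or 3). The mapping x(i) -> z(i) should be learned by running PCA on the training set only, then applied to the cross-validation and test sets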
Use regularization, not PCA, to avoid overfitting.
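Bad use of PCA: to prevent overfitting. PCA throws away some information without using the labels y, so it can discard something useful; regularization takes y into account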
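Only use PCA after first running the ML algorithm on the raw data x(i) and finding that it is needed (for example, the algorithm is too slow or uses too much memory).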
When PCA should be used