Principal Component Analysis (PCA) - Part 1
Mean: the average value
Variance: a measure of the spread of the data
Covariance: a measure of the co-dependence of two random variables
Eigenvalues and eigenvectors, matrices
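As a quick check of the definitions above, here is a minimal sketch in plain Python. It assumes the sample versions of variance and covariance (dividing by n - 1); the data values are made up for illustration.

```python
def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance of two equally long lists (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def variance(xs):
    """Sample variance: the covariance of a variable with itself."""
    return covariance(xs, xs)

# Made-up illustrative data: y rises and falls together with x.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 5.0, 7.0]

print("mean(x)      =", mean(x))
print("variance(x)  =", variance(x))
print("cov(x, y)    =", covariance(x, y))
```

A positive covariance means the two variables tend to move in the same direction; a negative one means they move in opposite directions.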
Principal Component Analysis (PCA) - Part 2
• PCA is a method of revealing underlying trends in large amounts of data
• A new coordinate system is constructed by rotating the axes
• The first coordinate is the direction in which the data varies most, the second is the direction of the next-largest variation, and so on
• A few new variables that contain most of the variation in the data are selected, so that the data can be visualized
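The rotation described above can be sketched for the two-dimensional case, where the eigenvectors of the covariance matrix have a closed form. This is only an illustration in plain Python, not the MATLAB routine used in the course; the function name `pca_2d` and the data are assumptions.

```python
from math import atan2, cos, sin, sqrt

def pca_2d(points):
    """PCA for 2-D points via the 2x2 sample covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]

    # Entries of the covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Angle of the axis along which the data varies most.
    theta = 0.5 * atan2(2 * sxy, sxx - syy)
    pc1 = (cos(theta), sin(theta))    # first principal direction
    pc2 = (-sin(theta), cos(theta))   # second direction, orthogonal to pc1

    # Eigenvalues of the covariance matrix = variance along pc1 and pc2.
    half_trace = (sxx + syy) / 2
    radius = sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    variances = (half_trace + radius, half_trace - radius)

    # Coordinates of every point in the rotated system.
    scores = [(x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1])
              for x, y in centered]
    return (pc1, pc2), variances, scores

# Made-up data lying close to the line y = x.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
axes, variances, scores = pca_2d(data)
explained = variances[0] / sum(variances)
print("first axis:", axes[0])
print("share of variance on first axis:", explained)
```

Because almost all of the variation falls on the first axis here, keeping only the first coordinate of `scores` would lose very little information.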
Principal Component Analysis (PCA) Plotting in MATLAB
A MATLAB license is available (valid until 2017.11.27); a PCA example.
Software download: https://www.mathworks.com/licensecenter/classroom/netsysbio
Software tutorials: https://matlabacademy.mathworks.com/
Clustergram in MATLAB
Hierarchical Clustering
Clustering is based on distance. Distance measures include Euclidean, correlation, and Hamming, among others; Euclidean is the most common.
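The three distance measures named above can be sketched as follows. The definitions are assumptions based on common usage: correlation distance is taken as one minus the Pearson correlation, and Hamming distance as the fraction of positions that differ.

```python
from math import sqrt

def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation_distance(a, b):
    """One minus the Pearson correlation of the two vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return 1 - num / den

def hamming(a, b):
    """Fraction of positions at which the two vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

print(euclidean([0, 0], [3, 4]))
print(correlation_distance([1, 2, 3], [2, 4, 6]))
print(hamming([1, 0, 1, 1], [1, 1, 1, 0]))
```

Correlation distance ignores the overall magnitude of a profile and compares only its shape, which is one reason it is sometimes preferred for expression data.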
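In the figure below, the three subtypes (subtype 1-3) are clearly separated, and red marks regions of high expression. The genes at the top are highly expressed in the upper-left region, the genes in the middle are highly expressed in the central region, and the genes at the bottom are highly expressed in the lower-right region.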
Linkage Function
The following options are available:
Average
Median
Single
Complete
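The linkage function decides how the distance between two clusters is derived from the distances between their members. Below is a rough sketch of bottom-up (agglomerative) clustering with three of the options listed; median linkage works on cluster centers rather than pairwise distances and is left out here. The function names and the data are assumptions for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+

# How to reduce all pairwise member distances to one cluster distance.
LINKAGES = {
    "single": min,                                # closest pair
    "complete": max,                              # farthest pair
    "average": lambda ds: sum(ds) / len(ds),      # mean over all pairs
}

def cluster_distance(c1, c2, points, linkage):
    """Distance between two clusters given as lists of point indices."""
    pairwise = [dist(points[i], points[j]) for i in c1 for j in c2]
    return LINKAGES[linkage](pairwise)

def agglomerate(points, n_clusters, linkage="average"):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b], points, linkage)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

# Made-up points: two tight pairs and one isolated point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for name in LINKAGES:
    print(name, agglomerate(points, 3, linkage=name))
```

On well-separated data like this the three linkages agree; they tend to differ on elongated or overlapping clusters.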
Standardization
• Standardization converts data into standardized z-scores.
• Standardization is a normalization process that forces the values into the range most suitable for visualization in a clustergram.
• There are two standardization options: row standardization or column standardization.
• For gene expression data we generally use row standardization.
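Row standardization can be sketched as below, assuming each row is one gene and each column one sample, and using the sample standard deviation (dividing by n - 1). The function name and the values are made up for illustration.

```python
from math import sqrt

def zscore_rows(matrix):
    """Convert every row to z-scores: (value - row mean) / row std."""
    out = []
    for row in matrix:
        n = len(row)
        m = sum(row) / n
        sd = sqrt(sum((v - m) ** 2 for v in row) / (n - 1))
        # A constant row has sd == 0; map it to zeros instead of dividing.
        out.append([(v - m) / sd if sd > 0 else 0.0 for v in row])
    return out

# Two hypothetical genes with the same pattern on very different scales.
expression = [
    [1.0, 2.0, 3.0],
    [10.0, 20.0, 30.0],
]
for row in zscore_rows(expression):
    print(row)
```

After standardization both rows become identical, so a gene with low absolute expression is no longer drowned out by a highly expressed one in the heat map.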
A MATLAB-based clustering example
Self-Organizing Maps
K-means
Find one or more points m_k (the cluster centers) such that the distance from each m_k to the other points in its cluster is as small as possible.
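One common way to search for such centers is the alternating assign-and-update procedure (Lloyd's algorithm), sketched below. It is offered as an illustration rather than the exact method from the course; the initial centers are passed in by hand to keep the run deterministic.

```python
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, centers, n_iter=100):
    """Alternate between assigning points and moving centers."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center
        if new_centers == centers:
            break  # nothing moved, so the assignment is stable
        centers = new_centers
    return centers, labels

# Made-up data: two well-separated groups.
points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centers, labels = kmeans(points, centers=[(0, 0), (10, 10)])
print(centers)
print(labels)
```

The result depends on the starting centers, so in practice the procedure is usually run several times from different starting points.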
Self-Organizing Maps
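Unlike k-means, the nodes are arranged on a grid (these notes describe it as three-dimensional), which makes the method suitable for nonlinear data.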
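To make the contrast with k-means concrete, here is a rough sketch of self-organizing map training on a small two-dimensional grid. The grid size, learning-rate schedule, Gaussian neighborhood, and the name `train_som` are all assumptions chosen for illustration.

```python
import random
from math import dist, exp

def train_som(data, grid_w, grid_h, n_epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a grid of nodes; returns {grid position: weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Start every node at a randomly chosen data point.
    nodes = {(i, j): list(rng.choice(data))
             for i in range(grid_w) for j in range(grid_h)}
    for epoch in range(n_epochs):
        frac = epoch / n_epochs
        lr = lr0 * (1 - frac)                   # learning rate shrinks
        sigma = sigma0 * (1 - frac) + 0.01      # neighborhood shrinks
        for x in rng.sample(data, len(data)):   # shuffled pass over the data
            # Best-matching unit: the node whose weights are closest to x.
            bmu = min(nodes, key=lambda g: dist(nodes[g], x))
            for g, w in nodes.items():
                # Nodes near the BMU on the grid are pulled along with it.
                h = exp(-dist(g, bmu) ** 2 / (2 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return nodes

# Made-up data: two small groups of points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 9.0), (9.0, 10.0)]
som = train_som(data, grid_w=3, grid_h=3)
for position, weights in sorted(som.items()):
    print(position, [round(v, 2) for v in weights])
```

The key difference from k-means is the neighborhood term `h`: updating a node also drags its grid neighbors, so nearby nodes end up representing similar data.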
Network-Based Clustering
Clustering applied to networks.
Gephi can be used to view networks.
Popular Network Clustering Methods