K-Means and Principal Component Analysis
K-Means Clustering
In this part of the exercise, you will implement the K-means algorithm and apply it to image compression. You will first work with an example 2D dataset to build intuition for how K-means works. After that, you will use K-means to compress an image by reducing the number of colors it contains.
Task 1: Implementing the K-means algorithm
K-means is a method for automatically clustering similar data examples together. Its basic code is as follows:
% Initialize centroids
centroids = kMeansInitCentroids(X, K);
for iter = 1:iterations
% Cluster assignment step: Assign each data point to the
% closest centroid. idx(i) corresponds to c^(i), the index
% of the centroid assigned to example i
idx = findClosestCentroids(X, centroids);
% Move centroid step: Compute means based on centroid
% assignments
centroids = computeMeans(X, idx, K);
end
In the code above, each iteration of the loop performs two steps: (1) a cluster assignment step, which assigns each example x^(i) to its closest centroid; and (2) a move centroid step, which recomputes each centroid as the mean of the examples currently assigned to it.
Step 1: Finding the closest centroids
The assignment step picks, for each example, the centroid that minimizes the squared distance: $c^{(i)} := \arg\min_j \lVert x^{(i)} - \mu_j \rVert^2$. Based on this formula, we add the following code to findClosestCentroids.m:
for i = 1 : length(idx)
    % Replicate the i-th example K times so that it can be compared
    % with every centroid at once
    T = repmat(X(i, :), K, 1);
    % sum(A, 2) sums each row of A, giving the squared distance from
    % example i to each of the K centroids
    [~, idx(i)] = min(sum((T - centroids) .^ 2, 2));
end
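The per-example loop above is fine for this dataset, but it can also be replaced by a fully vectorized computation. Below is a minimal sketch of one common alternative, assuming implicit broadcasting (any recent Octave, or MATLAB R2016b and later) and the same X and centroids as in findClosestCentroids.m:
% Squared distance from every example to every centroid, as an m-by-K matrix:
% ||x - mu||^2 = ||x||^2 - 2*x*mu' + ||mu||^2
D = sum(X .^ 2, 2) - 2 * X * centroids' + sum(centroids .^ 2, 2)';
[~, idx] = min(D, [], 2);  % idx(i) = index of the centroid closest to example i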
Step 2: Computing centroid means
The move step sets each centroid to the mean of its assigned examples: $\mu_k := \frac{1}{|C_k|} \sum_{i \in C_k} x^{(i)}$, where $C_k$ is the set of examples currently assigned to centroid k. Based on this formula, we add the following code to computeCentroids.m:
for k = 1 : K
    T = find(idx == k);  % indices of the examples assigned to cluster k
    % Note the explicit dimension argument: sum(X(T, :), 1) sums down the
    % rows even when cluster k contains a single example, whereas sum(X(T, :))
    % would then sum across columns instead.
    centroids(k, :) = sum(X(T, :), 1) / size(T, 1);  % size(A, 1) is the number of rows of A
end
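This loop can be vectorized as well. A minimal sketch, again assuming implicit broadcasting, and assuming no cluster ends up empty (an empty cluster gives a zero count and therefore NaN, a caveat the loop version above shares):
M = double(idx(:) == (1 : K));   % m-by-K indicator matrix; M(i, k) = 1 iff example i is in cluster k
counts = sum(M, 1)';             % K-by-1 vector of cluster sizes
centroids = (M' * X) ./ counts;  % row k is the mean of the examples assigned to cluster k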
Finally, we run this part of the exercise. The corresponding code in ex7.m is as follows:
%% ================= Part 1: Find Closest Centroids ====================
% To help you implement K-Means, we have divided the learning algorithm
% into two functions -- findClosestCentroids and computeCentroids. In this
% part, you should complete the code in the findClosestCentroids function.
%
fprintf('Finding closest centroids.\n\n');
% Load an example dataset that we will be using
load('ex7data2.mat');
% Select an initial set of centroids
K = 3; % 3 Centroids
initial_centroids = [3 3; 6 2; 8 5];
% Find the closest centroids for the examples using the
% initial_centroids
idx = findClosestCentroids(X, initial_centroids);
fprintf('Closest centroids for the first 3 examples: \n')
fprintf(' %d', idx(1:3));
fprintf('\n(the closest centroids should be 1, 3, 2 respectively)\n');
fprintf('Program paused. Press enter to continue.\n');
pause;
The output confirms that the closest centroids for the first three examples are 1, 3, 2 (screenshot omitted).
Task 2: Random initialization (this part is not submitted)
In practice, a good strategy for initializing the centroids is to select random examples from the training set. We add the following code to kMeansInitCentroids.m:
% Initialize the centroids to be random examples
% Randomly reorder the indices of examples
randidx = randperm(size(X, 1));
% Take the first K examples as centroids
centroids = X(randidx(1:K), :);
The code above uses the randperm() function to randomly reorder the indices of the examples, and then selects the first K examples based on that random permutation. This guarantees that the same example is never selected twice.
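Because a random initialization can still land K-means in a poor local optimum, a common practice is to run the algorithm several times and keep the run with the lowest distortion cost. A rough sketch, assuming the runkMeans helper provided with this exercise and K and max_iters as set in ex7.m, with the cost computed by hand:
best_cost = Inf;
for t = 1 : 10                                   % number of restarts is a free choice
    init = kMeansInitCentroids(X, K);
    [c, cidx] = runkMeans(X, init, max_iters);
    cost = mean(sum((X - c(cidx, :)) .^ 2, 2));  % K-means distortion J
    if cost < best_cost                          % keep the best run so far
        best_cost = cost;
        centroids = c;
        idx = cidx;
    end
end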
Task 3: Image compression with K-means (this part is not submitted)
In a 24-bit color image, each pixel is represented by three 8-bit unsigned integers (with values from 0 to 255) specifying the red, green, and blue intensities. Our image contains thousands of colors; in this part of the exercise you will reduce that number to 16. Doing so compresses the image effectively: you only need to store the RGB values of the 16 selected colors, and for each pixel you only need to store the index of its color.
In this exercise, you will use K-means to select the 16 colors that will represent the compressed image. Concretely, you treat every pixel of the original image as a data example and use K-means to find the 16 colors that best cluster the pixels in RGB space.
In Octave or MATLAB, we load the image with the following code:
% Load 128x128 color image (bird_small.png)
A = imread('bird_small.png');
% You will need to have installed the image package to use
% imread. If you do not have the image package installed, you
% should instead change the following line to
%
% load('bird_small.mat'); % Loads the image into the variable A
This creates a three-dimensional matrix A whose first two indices identify a pixel position and whose last index selects the red, green, or blue channel. For example, A(50, 33, 3) gives the blue intensity of the pixel at row 50, column 33.
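As a quick illustrative check of this layout (assuming bird_small.png is on the path):
A = double(imread('bird_small.png'));
size(A)        % ans = 128 128 3: rows, columns, and the three color channels
A(50, 33, 3)   % blue intensity of the pixel at row 50, column 33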
The code for this part is as follows:
%% ============= Part 4: K-Means Clustering on Pixels ===============
% In this exercise, you will use K-Means to compress an image. To do this,
% you will first run K-Means on the colors of the pixels in the image and
% then you will map each pixel onto its closest centroid.
%
% You should now complete the code in kMeansInitCentroids.m
%
fprintf('\nRunning K-Means clustering on pixels from an image.\n\n');
% Load an image of a bird
A = double(imread('bird_small.png'));
% If imread does not work for you, you can try instead
% load ('bird_small.mat');
A = A / 255; % Divide by 255 so that all values are in the range 0 - 1
% Size of the image
img_size = size(A);
% Reshape the image into an Nx3 matrix where N = number of pixels.
% Each row will contain the Red, Green and Blue pixel values
% This gives us our dataset matrix X that we will use K-Means on.
X = reshape(A, img_size(1) * img_size(2), 3);
% Run your K-Means algorithm on this data
% You should try different values of K and max_iters here
K = 16;
max_iters = 10;
% When using K-Means, it is important to initialize the centroids
% randomly.
% You should complete the code in kMeansInitCentroids.m before proceeding
initial_centroids = kMeansInitCentroids(X, K);
% Run K-Means
[centroids, idx] = runkMeans(X, initial_centroids, max_iters);
fprintf('Program paused. Press enter to continue.\n');
pause;
%% ================= Part 5: Image Compression ======================
% In this part of the exercise, you will use the clusters of K-Means to
% compress an image. To do this, we first find the closest centroid for
% each example. After that, we map each pixel to its centroid's color.
fprintf('\nApplying K-Means to compress an image.\n\n');
% Find closest cluster members
idx = findClosestCentroids(X, centroids);
% Essentially, we have now represented the image X in terms of the
% indices in idx.
% We can now recover the image from the indices (idx) by mapping each pixel
% (specified by its index in idx) to the centroid value
X_recovered = centroids(idx,:);
% Reshape the recovered image into proper dimensions
X_recovered = reshape(X_recovered, img_size(1), img_size(2), 3);
% Display the original image
subplot(1, 2, 1);
imagesc(A);
title('Original');
% Display compressed image side by side
subplot(1, 2, 2);
imagesc(X_recovered)
title(sprintf('Compressed, with %d colors.', K));
fprintf('Program paused. Press enter to continue.\n');
pause;
The run displays the original image side by side with its 16-color compressed version (screenshot omitted).
Principal Component Analysis
In this part of the exercise, you will use principal component analysis (PCA) to perform dimensionality reduction. You will first explore PCA on an example 2D dataset, and then apply it to a larger dataset.
Task 1: Visualizing the dataset
The code for this part is as follows:
%% ================== Part 1: Load Example Dataset ===================
% We start this exercise by using a small dataset that is easy to
% visualize
%
fprintf('Visualizing example dataset for PCA.\n\n');
% The following command loads the dataset. You should now have the
% variable X in your environment
load ('ex7data1.mat');
% Visualize the example dataset
plot(X(:, 1), X(:, 2), 'bo');
axis([0.5 6.5 2 8]); axis square;
fprintf('Program paused. Press enter to continue.\n');
pause;
The resulting scatter plot of the example dataset is shown in the figure (screenshot omitted).
Task 2: Implementing PCA
PCA consists of two computational steps: first, compute the covariance matrix of the data; then, use Octave's or MATLAB's svd() function to compute the eigenvectors $U_1, U_2, \ldots, U_n$.
Before applying PCA, however, we should normalize the training set by subtracting the mean of each feature and scaling it; the complete implementation is provided in featureNormalize.m.
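For reference, the core of that file amounts to the following (a paraphrased sketch using implicit broadcasting, not the verbatim file):
mu = mean(X);              % 1-by-n row vector of per-feature means
X_norm = X - mu;           % center each feature at zero
sigma = std(X_norm);       % per-feature standard deviation
X_norm = X_norm ./ sigma;  % scale each feature to unit variance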
The covariance matrix is computed as $\Sigma = \frac{1}{m} X^T X$, where the rows of X are the (normalized) examples and m is the number of examples.
The svd() function is called as: [U, S, V] = svd(Sigma).
We add the following code to pca.m:
Sigma = (X' * X) / m;    % covariance matrix of the normalized data (m = number of examples)
[U, S, V] = svd(Sigma);  % columns of U are the principal components
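Since Sigma is symmetric positive semi-definite, the left singular vectors returned by svd() are also its eigenvectors, which is why this works. An optional sanity check (illustrative only):
u = U(:, 1);                    % first principal component
norm(Sigma * u - S(1, 1) * u)   % should be near 0, since Sigma * u = lambda * u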
The corresponding code in ex7_pca.m is as follows:
%% =============== Part 2: Principal Component Analysis ===============
% You should now implement PCA, a dimension reduction technique. You
% should complete the code in pca.m
%
fprintf('\nRunning PCA on example dataset.\n\n');
% Before running PCA, it is important to first normalize X
[X_norm, mu, sigma] = featureNormalize(X);
% Run PCA
[U, S] = pca(X_norm);
% mu, the mean of each feature, was computed by featureNormalize above
% Draw the eigenvectors centered at mean of data. These lines show the
% directions of maximum variations in the dataset.
hold on;
drawLine(mu, mu + 1.5 * S(1,1) * U(:,1)', '-k', 'LineWidth', 2);
drawLine(mu, mu + 1.5 * S(2,2) * U(:,2)', '-k', 'LineWidth', 2);
hold off;
fprintf('Top eigenvector: \n');
fprintf(' U(:,1) = %f %f \n', U(1,1), U(2,1));
fprintf('\n(you should expect to see -0.707107 -0.707107)\n');
fprintf('Program paused. Press enter to continue.\n');
pause;
The run plots the dataset with the two principal eigenvectors drawn from the data mean, and prints the top eigenvector (screenshot omitted).
Task 3: Dimensionality reduction with PCA
In this part, you project each example in the dataset X onto the first K components (columns) of the principal component matrix U. The code in projectData.m is as follows:
U_reduce = U(:, 1 : K);  % keep only the first K principal components
Z = X * U_reduce;        % project the data onto the K-dimensional subspace
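A useful companion quantity, not required for submission, is the fraction of variance retained by the first K components; it can be read off the diagonal of the S matrix returned by pca.m:
s = diag(S);                        % eigenvalues of Sigma, in decreasing order
retained = sum(s(1 : K)) / sum(s);  % fraction of total variance kept
fprintf('Variance retained by %d component(s): %.2f%%\n', K, 100 * retained);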
Task 4: Reconstructing an approximation of the data
The code added to recoverData.m is as follows:
U_reduce = U(:, 1 : K);  % first K principal components
X_rec = Z * U_reduce';   % map back to the original n-dimensional space
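Putting projection and recovery together, one can measure how much information the compression loses. An illustrative check, assuming the X_norm, U, and K from the earlier steps:
Z = projectData(X_norm, U, K);              % compress: n dimensions down to K
X_rec = recoverData(Z, U, K);               % decompress back to n dimensions
err = mean(sum((X_norm - X_rec) .^ 2, 2));  % average squared reconstruction error
fprintf('Average reconstruction error: %f\n', err);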
The output of Tasks 3 and 4, showing the projected and recovered example points, is as follows (screenshot omitted):
This concludes the portions of this programming assignment that must be submitted. The remaining tasks in the handout are not covered here; please consult the assignment document and try them yourself. Thanks!