1. Data Dimensionality
PCA (principal component analysis)
PCA is a general-purpose technique used across many kinds of data analysis, including feature set compression.
Whenever you are visualizing data, you can apply principal component analysis.
Two-dimensional data
One-dimensional data
This is not strictly one-dimensional data; there are small deviations here and there, but to make sense of the data I am happy to treat those deviations as noise and regard the data set as one-dimensional:
PCA is particularly good at handling shifts and rotations of the coordinate system.
6. PCA for Data Transformation
Given data of any shape,
PCA finds a new coordinate system that's obtained from the old one by translation and rotation only
PCA moves the center of the coordinate system to the center of the data
PCA moves the x-axis onto the principal axis of variation, where you see the most variation relative to all the data points
PCA moves the y-axis into an orthogonal, less important direction of variation
PCA finds these axes for you and tells you how important each of them is.
7. Center of the New Coordinate System
(2,3)
8. Principal Axis of the New Coordinate System
△x=1
△y=1
9. Second Principal Component of the New System
△x=-1
△y=1
When PCA actually outputs these vectors, it normalizes them so that each has length 1.
After normalizing the PCA component vectors:
△x (black) = 1/√2
△y (black) = 1/√2  # the new x-axis
The vectors for the new x-axis and the new y-axis are orthogonal to each other.
△x (red) = -1/√2
△y (red) = 1/√2  # the new y-axis
11. Practice Finding the New Axes
PCA also gives you an important number for each axis: its spread value.
For data like this, where the scatter off the main axis is small, the spread value tends to be large for the principal axis and much smaller for the second principal component axis.
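To make those numbers concrete, here is a minimal sketch (the toy data below is made up for illustration) showing that sklearn's PCA reports the center of the new coordinate system in pca.mean_, the unit-length axis vectors in pca.components_, and the spread value along each axis in pca.explained_variance_:
import numpy as np
from sklearn.decomposition import PCA

# made-up data: points scattered along the direction (1, 1), centered near (2, 3)
rng = np.random.RandomState(0)
t = rng.normal(scale=2.0, size=200)       # spread along the main axis
noise = rng.normal(scale=0.1, size=200)   # small spread along the second axis
X = np.column_stack([2 + (t - noise) / np.sqrt(2),
                     3 + (t + noise) / np.sqrt(2)])

pca = PCA(n_components=2)
pca.fit(X)
print(pca.mean_)                # roughly [2, 3], the center of the new coordinate system
print(pca.components_)          # unit vectors, roughly [0.707, 0.707] and [-0.707, 0.707] (signs may flip)
print(pca.explained_variance_)  # the spread value along each axis: large for the first, small for the second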
12. Which Data Can Be Used with PCA
Part of the beauty of PCA is that the data doesn't have to be perfectly 1D in order to find the principal axis!
13. When Does an Axis Dominate
Does the long axis dominate?
The long axis dominating means that its importance value, i.e. the eigenvalue of the long axis, is larger than the eigenvalue of the short axis.
14. Measurable vs. Latent Features Practice
Given some parameters of a house, which of the following algorithms would you use to predict its price?
□ decision tree classifier
□ SVC
□ √ linear regression
Because the output we want to predict is continuous, a classifier is not the right tool.
15. From Four Features to Two
Given some parameters of a house, predict its price.
Measurable features:
square footage
no. of rooms
school ranking
neighborhood safety
Latent features:
size
neighborhood
16. Compressing While Preserving Information
What is the best way to condense the four features into two, so that we really capture the core information?
What we actually want to probe are the two features size and neighborhood.
Which is the most appropriate feature selection tool?
□ √ SelectKBest (K = the number of features to keep)
□ SelectPercentile (specify the percentage of features you want to keep)
Because we already know we want two features, we use SelectKBest: it keeps the two strongest features and throws away all the rest.
If we also knew how many candidate features there were to begin with and how many we needed at the end, we could use SelectPercentile as well.
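As a minimal sketch of the two selectors (the regression data below is a made-up stand-in for the four housing features; f_regression is one reasonable scoring function for a continuous target like price):
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_regression

# hypothetical stand-in for the four housing features and the price target
features, prices = make_regression(n_samples=100, n_features=4, n_informative=2, random_state=0)

# keep exactly the 2 strongest features
k_best = SelectKBest(f_regression, k=2)
features_k = k_best.fit_transform(features, prices)
print(features_k.shape)  # (100, 2)

# keep the top 50% of features (2 out of 4)
pct = SelectPercentile(f_regression, percentile=50)
features_p = pct.fit_transform(features, prices)
print(features_p.shape)  # (100, 2)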
17. Composite Features
I have many features available, but suppose only a small subset of them is driving the patterns in the data; I will use that to construct a composite feature that helps me get at the underlying phenomenon.
This composite (combined) feature is also called a principal component. PCA is a very powerful algorithm; in this lesson we discuss it mainly for dimensionality reduction, i.e. shrinking a large set of features down to just a few.
PCA is also a very powerful standalone algorithm in unsupervised learning.
Example: turn square footage and no. of rooms into size.
The picture above looks a bit like linear regression, but PCA is not regression: linear regression tries to predict an output value from the inputs, whereas PCA does not predict anything; it finds the dominant direction of the data so that the data can be projected onto that direction while losing as little information as possible.
Once I have found the principal component, i.e. the direction of this vector, I apply a step called projection to every data point: the data starts out two-dimensional, but after I project it onto the principal component it becomes one-dimensional.
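A minimal sketch of that projection step, using made-up (square footage, no. of rooms) pairs; PCA with n_components=1 maps each two-dimensional point to a single composite coordinate:
import numpy as np
from sklearn.decomposition import PCA

# made-up (square footage, number of rooms) pairs
X = np.array([[1200, 3], [1500, 3], [1800, 4], [2100, 4], [2500, 5], [3000, 6]], dtype=float)

pca = PCA(n_components=1)
X_1d = pca.fit_transform(X)   # project each 2-D point onto the principal component

print(X.shape)             # (6, 2) -- original two features
print(X_1d.shape)          # (6, 1) -- one composite "size"-like feature
print(pca.components_[0])  # direction of the principal component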
18. Maximal Variance
variance
- the willingness/flexibility of an algorithm to learn
- technical term in statistics -- roughly the 'spread' of a data distribution (similar to standard deviation)
A feature with large variance has samples spread over a wide range of values; if the variance is small, the samples tend to be tightly clustered together.
In the figure above, draw an ellipse around the data so that it contains most of the points. The ellipse can be parameterized by two numbers: the distance along its long axis and the distance along its short axis. Of these two lines, which one points in the direction of the data's maximal variance, i.e. along which direction are the data more spread out?
The long axis is the direction of maximal variance of the data.
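A quick numpy check of that claim on a made-up elliptical cloud: the variance of the data projected onto the long-axis direction is much larger than onto the short-axis direction.
import numpy as np

rng = np.random.RandomState(1)
# made-up elliptical cloud: wide along (1, 1), narrow along (-1, 1)
long_axis = np.array([1.0, 1.0]) / np.sqrt(2)
short_axis = np.array([-1.0, 1.0]) / np.sqrt(2)
X = rng.normal(scale=3.0, size=(500, 1)) * long_axis + rng.normal(scale=0.5, size=(500, 1)) * short_axis

# variance of the projections onto each candidate direction
print(np.var(X.dot(long_axis)))   # around 9: the long axis carries the maximal variance
print(np.var(X.dot(short_axis)))  # around 0.25: much smaller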
19. 最大方差的優(yōu)點
principal component of a data set is the direction that has the largest variance because ?
why do you think we define the principle component this way?
what's the advantage of looking for the direction that has the largest variance?
when we are doing our project of these two dimension feature space down on to one dimension,why do we project all the data points down onto this heavy red line instead of projecting them onto this shorter line?
□ 計算復(fù)雜度低
□ √可以最大程度保留來自原始數(shù)據(jù)的信息量
□ 只是一種慣例,并沒有什么實際的原因
當(dāng)我們沿著最大方差的維度進行映射時葬馋,它能夠保留原始數(shù)據(jù)中最多的信息
20. Maximal Variance and Information Loss
safety problems
+ school ranking
→(PCA) neighborhood quality
find the direction of maximal variance
The direction of maximal variance is the direction that minimizes the loss of information.
When I project these two-dimensional points onto this one-dimensional line, information is lost; the amount of information lost for a particular point equals the distance between that point and its new position on the line.
21. Information Loss and Principal Components
Information loss: the sum of the distances between each point and its newly projected position on the line.
When we maximize the variance, we are actually minimizing the total distance between the points and their projections onto the line.
projection onto direction of maximal variance minimizes distance from old(higher-dimensional) data point to its new transformed value
→ minimizes information loss
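A minimal sketch of measuring that information loss with sklearn (made-up data again): project onto the first principal component, map back with inverse_transform, and sum the squared distances between the original points and their projections.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(2)
# made-up 2-D data with one dominant direction
t = rng.normal(scale=2.0, size=(300, 1))
X = np.hstack([t, 0.5 * t]) + rng.normal(scale=0.3, size=(300, 2))

pca = PCA(n_components=1).fit(X)
X_projected = pca.inverse_transform(pca.transform(X))  # each point moved onto the principal axis

# total information loss: summed squared distance from each point to its projection
loss = np.sum((X - X_projected) ** 2)
print(loss)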
23. PCA for Feature Transformation
PCA as a general algorithm for feature transformation
We put all four features into PCA together; it automatically combines them into new features and ranks the relative power of those new features. If, as in our case, there are two hidden features driving most of the variation in the data, PCA will pick them out and make them the first and second principal components, the first principal component being the most influential one.
Because the first principal component is a mixture, it may contain a little of every feature, but this unsupervised algorithm is powerful enough to give you real insight into the hidden features in your data. Even if you knew nothing about housing prices, PCA would still let you draw conclusions such as: overall, two factors drive the variation in price. Whether those two factors are neighborhood and size is up to you to interpret. So besides reducing dimensionality, you also learn something important about the patterns of variation in your data.
25. Review / Definition of PCA
review/definition of PCA
- systematized way to transform input features into principal components
- use principal components as new features in regression/classification (see the sketch after this list)
- you can also rank the principal components: the more variance the data has along a given principal component, the higher that principal component is ranked. so the one with the most variance is the first principal component, the one with the second most is the second principal component, and so on.
- the principal components are all perpendicular to each other, so the second principal component is mathematically guaranteed not to overlap at all with the first, the third will not overlap with the first or the second, and so on. so you can treat them as independent features in a sense.
- there is a maximum number of principal components you can find: it equals the number of input features in your data set. usually you will only use the first handful of principal components, but you could go all the way out and use the maximum number. in that case, though, you are not really gaining anything; you are just representing your features in a different way. so PCA won't give you the wrong answer, but it gives you no advantage over the original input features if you use all of the principal components together in a regression or classification task.
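Here is the sketch referred to above: a minimal example (using the iris data purely as a stand-in for a generic labeled data set, and assuming a recent sklearn) of ranking components by explained_variance_ratio_ and feeding the transformed features to a classifier.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 4 input features, so at most 4 principal components
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pca = PCA(n_components=2).fit(X_train)
print(pca.explained_variance_ratio_)  # ranked: the first component explains the most variance

# use the principal components as the new features for a classifier
clf = LogisticRegression(max_iter=1000)
clf.fit(pca.transform(X_train), y_train)
print(clf.score(pca.transform(X_test), y_test))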
26. Applying PCA to Real Data
In the next few videos, Katie and Sebastian examine some of Enron's financial data and look at how PCA applies to it.
Remember, to get the repository with the project code and this data set, visit:
https://github.com/udacity/ud120-projects
The Enron data is located in: final_project/
28. PCA in sklearn
def doPCA():
    from sklearn.decomposition import PCA
    pca = PCA(n_components=2)
    pca.fit(data)
    return pca

pca = doPCA()
print pca.explained_variance_ratio_  # explained variance ratio: the eigenvalues in concrete form; it tells you what fraction of the data's variation the first/second principal component accounts for
first_pc = pca.components_[0]
second_pc = pca.components_[1]
transformed_data = pca.transform(data)
for ii, jj in zip(transformed_data, data):
    plt.scatter(first_pc[0] * ii[0], first_pc[1] * ii[0], color='r')
    plt.scatter(second_pc[0] * ii[1], second_pc[1] * ii[1], color='c')
    plt.scatter(jj[0], jj[1], color='b')
29. When to Use PCA
- latent features driving the patterns in data (big shots at Enron)
if you want access to latent features that you think might be showing up in the patterns in your data; maybe the entire point of what you're trying to do is to figure out whether there is a latent feature. in other words, you just want to know the size of the first principal component, for example to measure who the big shots are at Enron.
- dimensionality reduction
-- visualize high dimensional data
sometimes you have more than two features, so you need to represent three or four or many numbers about a data point, but you only have two dimensions in which to draw. what you can do is project the data down onto the first two principal components, plot just those, and draw that scatter plot.
-- reduce noise
the hope is that your strongest principal components, the first or the second, capture the actual patterns in the data, and the smaller principal components just represent noisy variations about those patterns; by throwing away the less important principal components, you get rid of that noise.
-- make other algorithms (regression, classification) work better with fewer inputs (eigenfaces)
using PCA as pre-processing before another algorithm, say a regression or classification task: if you have very high dimensionality and a complex classification algorithm, the algorithm can have very high variance, it can end up fitting to noise in the data, or it can end up running really slowly. lots of things can go wrong when some of these algorithms get very high input dimensionality, even though the algorithm might work really well for the problem at hand. so one thing you can do is use PCA to reduce the dimensionality of your input features so that your classification algorithm works better.
in the example of eigenfaces, a method of applying PCA to pictures of people, the input space has very high dimensionality: many, many pixels per picture. say you want to identify who is pictured in an image, i.e. you are running some kind of facial identification. with PCA you can reduce the very high input dimensionality to something maybe a factor of ten lower and feed that into an SVM, which then does the actual classification of figuring out who is pictured. the inputs are now the principal components instead of the original pixels of the images.
30. PCA for Facial Recognition
PCA for facial recognition
what makes facial recognition in pictures good for PCA?
□ √ pictures of faces generally have high input dimensionality (many pixels)
In this case dimensionality reduction matters a great deal, because an SVM would struggle with something like a million features.
□ √ faces have general patterns that could be captured in a smaller number of dimensions (two eyes on top, mouth/chin on bottom, etc.)
Between two portraits it is not that all million pixels differ; there are only a few main points of difference, and PCA may be able to pick those out and make the most of them.
□ × facial recognition is simple using machine learning (humans do it easily)
It is not: it would be very hard, for example, to implement facial recognition with a decision tree.
31. Eigenfaces Code
Combining PCA with an SVM is very powerful for facial recognition.
"""
===================================================
Faces recognition example using eigenfaces and SVMs
===================================================
The dataset used in this example is a preprocessed excerpt of the
"Labeled Faces in the Wild", aka LFW_:
http://vis-www.cs.umass.edu/lfw/lfw-funneled.tgz (233MB)
.. _LFW: http://vis-www.cs.umass.edu/lfw/
original source: http://scikit-learn.org/stable/auto_examples/applications/face_recognition.html
"""
print __doc__
from time import time
import logging
import pylab as pl
import numpy as np
from sklearn.cross_validation import train_test_split
from sklearn.datasets import fetch_lfw_people
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import RandomizedPCA
from sklearn.svm import SVC
# Display progress logs on stdout
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')
###############################################################################
# Download the data, if not already on disk and load it as numpy arrays
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
# introspect the images arrays to find the shapes (for plotting)
n_samples, h, w = lfw_people.images.shape
np.random.seed(42)
# for machine learning we use the data directly (as relative pixel
# position info is ignored by this model)
X = lfw_people.data
n_features = X.shape[1]
# the label to predict is the id of the person
y = lfw_people.target
target_names = lfw_people.target_names
n_classes = target_names.shape[0]
print "Total dataset size:"
print "n_samples: %d" % n_samples
print "n_features: %d" % n_features
print "n_classes: %d" % n_classes
###############################################################################
# Split into a training and testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
###############################################################################
# Compute a PCA (eigenfaces) on the face dataset (treated as unlabeled
# dataset): unsupervised feature extraction / dimensionality reduction
n_components = 150
print "Extracting the top %d eigenfaces from %d faces" % (n_components, X_train.shape[0])
t0 = time()
pca = RandomizedPCA(n_components=n_components, whiten=True).fit(X_train)  # figure out what the principal components are
print "the ratio is ", pca.explained_variance_ratio_  # explained variance of each principal component, e.g. 0.19346534, 0.15116844
print "done in %0.3fs" % (time() - t0)
eigenfaces = pca.components_.reshape((n_components, h, w)) #asks for the eigenfaces
print "Projecting the input data on the eigenfaces orthonormal basis"
t0 = time()
X_train_pca = pca.transform(X_train)  # transform the data into the principal components representation
X_test_pca = pca.transform(X_test)
print "done in %0.3fs" % (time() - t0)
###############################################################################
# Train a SVM classification model
print "Fitting the classifier to the training set"
t0 = time()
param_grid = {
'C': [1e3, 5e3, 1e4, 5e4, 1e5],
'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1],
}
# for sklearn version 0.16 or prior, the class_weight parameter value is 'auto'
clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'), param_grid)
clf = clf.fit(X_train_pca, y_train)  # fit the SVC using the principal components as the features
print "done in %0.3fs" % (time() - t0)
print "Best estimator found by grid search:"
print clf.best_estimator_
###############################################################################
# Quantitative evaluation of the model quality on the test set
print "Predicting the people names on the testing set"
t0 = time()
y_pred = clf.predict(X_test_pca)  # the SVC tries to identify who appears in each picture of the test set
print "done in %0.3fs" % (time() - t0)
print classification_report(y_test, y_pred, target_names=target_names)
print confusion_matrix(y_test, y_pred, labels=range(n_classes))
###############################################################################
# Qualitative evaluation of the predictions using matplotlib
def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
"""Helper function to plot a gallery of portraits"""
pl.figure(figsize=(1.8 * n_col, 2.4 * n_row))
pl.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
for i in range(n_row * n_col):
pl.subplot(n_row, n_col, i + 1)
pl.imshow(images[i].reshape((h, w)), cmap=pl.cm.gray)
pl.title(titles[i], size=12)
pl.xticks(())
pl.yticks(())
# plot the result of the prediction on a portion of the test set
def title(y_pred, y_test, target_names, i):
pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
return 'predicted: %s\ntrue: %s' % (pred_name, true_name)
prediction_titles = [title(y_pred, y_test, target_names, i)
for i in range(y_pred.shape[0])]
plot_gallery(X_test, prediction_titles, h, w)
# plot the gallery of the most significative eigenfaces
eigenface_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
plot_gallery(eigenfaces, eigenface_titles, h, w)
pl.show()
The eigenfaces are basically the principal components of the face data.
At the end, the script also shows you the eigenfaces.
Using the composite images produced by PCA as features for the SVM turns out to be very useful for predicting whose face appears in a picture.
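One practical note: the script above targets Python 2 and an older sklearn. If you are on a newer sklearn (roughly 0.18 or later, an assumption about your environment rather than part of the lesson), the deprecated modules it imports have moved, and the equivalents look roughly like this:
# equivalents in newer sklearn versions (assumed >= 0.18)
from sklearn.model_selection import train_test_split, GridSearchCV  # replaces sklearn.cross_validation / sklearn.grid_search
from sklearn.decomposition import PCA  # RandomizedPCA has been deprecated and later removed

# RandomizedPCA(n_components=150, whiten=True) becomes:
pca = PCA(n_components=150, whiten=True, svd_solver='randomized')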
33. PCA Mini-Project
We spent a lot of time on theory while discussing PCA, so in this mini-project we will ask you to write some sklearn code. The eigenfaces code is interesting and rich enough to serve as the testbed for this entire mini-project.
The starter code can be found in pca/eigenfaces.py. It is mostly taken from the example in the sklearn documentation.
Note that when running the code, one parameter of the SVC function called on line 94 of pca/eigenfaces.py has changed. For the 'class_weight' parameter, the argument string 'auto' is a valid value for sklearn version 0.16 and earlier, but it is deprecated by 0.19. If you are running sklearn version 0.17 or later, the expected argument string is 'balanced'. If you get an error or warning when running pca/eigenfaces.py, make sure line 98 contains the correct argument for the sklearn version you have installed.
sklearn 0.16 or earlier: class_weight='auto'
sklearn 0.17 or later: class_weight='balanced'
34. Explained Variance of Each Principal Component
We mentioned that PCA orders the principal components, with the first principal component having the largest variance, the second the second-largest variance, and so on. How much of the variance is explained by the first principal component? By the second?
print "the ratio is ", pca.explained_variance_ratio_  # explained variance of each principal component: 0.19346534 0.15116844
How much of the variation does the first principal component explain? 0.19346534
And the second? 0.15116844
We have found that the Pillow module (used in this example) can sometimes cause trouble. If you get an error related to the fetch_lfw_people() command, try the following command:
pip install --upgrade PILLOW
35. How Many Principal Components Should You Use?
Now you will experiment with keeping different numbers of principal components. In a multi-class classification problem like this one (more than two labels to apply), accuracy is a less intuitive metric than in the two-class case. Instead, a more popular metric is the F1 score.
We will learn about the F1 score properly in the lesson on evaluation metrics, but for now work out for yourself whether a good classifier is characterized by a high or a low F1 score. You will determine this by varying the number of principal components and watching how the F1 score changes in response.
as you add more principal components as features for training your classifier, do you expect it to get better or worse performance?
□ √ could go either way
While ideally, adding components should provide us additional signal to improve our performance, it is possible that we end up at a complexity where we overfit.
36. F1 Score vs. Number of Principal Components Used
Change n_components to the following values: [10, 15, 25, 50, 100, 250]. For each number of principal components, note the F1 score for Ariel Sharon. (With 10 principal components the plotting functionality in the code will break, but you should still be able to see the F1 scores.)
If you see a higher F1 score, does that mean the classifier is doing better or worse?
Ariel Sharon f-score
n_components = 150 f-score=0.65
n_components = 10 f-score=0.11
n_components = 15 f-score=0.33
n_components = 50 f-score=0.67
n_components = 100 f-score=0.67
n_components = 250 f-score=0.62
□ √ better
37. Dimensionality Reduction and Overfitting
did you see any evidence of overfitting when using a large number of PCs? does PCA dimensionality reduction help improve performance?
□ √ yes, performance starts to drop with many PCs.
38. Selecting the Number of Principal Components
selecting a number of principal components
think about how many principal components you should look at.
there is no cut-and-dried answer for how many principal components you should use; you kind of have to figure it out.
what's a good way to figure out how many PCs to use?
□ × just take the top 10%
□ √ train on different numbers of PCs and see how accuracy responds; cut off when it becomes apparent that adding more PCs doesn't buy you much more discrimination (see the sketch below)
□ × perform feature selection on the input features before putting them into PCA, then use as many PCs as you have input features
PCA is going to find a way to combine information from potentially many different input features, so if you throw out input features before you do PCA, you are throwing away information that PCA might be able to rescue, in a sense. it's fine to do feature selection on the principal components after you have made them, but you want to be very careful about throwing out information before performing PCA.
PCA can be fairly computationally expensive, so if you have a very large input feature space and you know that a lot of the features are potentially completely irrelevant, go ahead and try tossing them out; just proceed with caution.
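A minimal sketch of that "train on different numbers of PCs" approach, in the spirit of the eigenfaces script; it assumes X_train, X_test, y_train, y_test already exist as in that script, and uses a fixed, non-grid-searched SVC plus a macro-averaged F1 purely for illustration:
from sklearn.decomposition import PCA
from sklearn.metrics import f1_score
from sklearn.svm import SVC

# assumes X_train, X_test, y_train, y_test are defined as in the eigenfaces script above
for n_components in [10, 15, 25, 50, 100, 250]:
    pca = PCA(n_components=n_components, whiten=True, svd_solver='randomized').fit(X_train)
    clf = SVC(kernel='rbf', class_weight='balanced', C=1000, gamma=0.005)  # fixed params just for this sketch
    clf.fit(pca.transform(X_train), y_train)
    pred = clf.predict(pca.transform(X_test))
    # macro-averaged F1 over all people; watch where adding more PCs stops helping
    print("n_components=%d  macro F1=%.2f" % (n_components, f1_score(y_test, pred, average='macro')))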