Gaussian Mixture
class sklearn.mixture.GaussianMixture(n_components=1, covariance_type='full', tol=0.001, reg_covar=1e-06, max_iter=100,
n_init=1, init_params='kmeans', weights_init=None, means_init=None, precisions_init=None, random_state=None, warm_start=False,
verbose=0, verbose_interval=10)
- n_components: number of Gaussian components in the mixture; defaults to 1.
- covariance_type: type of covariance parameters, one of {'full', 'tied', 'diag', 'spherical'}; defaults to 'full'. 'full': each component has its own general covariance matrix (off-diagonal entries may be nonzero). 'tied': all components share the same general covariance matrix (used in HMMs). 'diag': each component has its own diagonal covariance matrix (off-diagonal entries are zero, diagonal entries are nonzero). 'spherical': each component has its own single variance, i.e. a diagonal covariance matrix whose diagonal entries are all equal.
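- tol: convergence threshold for the EM iterations; defaults to 1e-3.
- reg_covar: non-negative regularization added to the diagonal of the covariance matrices so that they stay positive; defaults to 1e-6.
- max_iter: maximum number of EM iterations; defaults to 100.
- n_init: number of initializations to perform, of which the best result is kept; defaults to 1.
- init_params: {'kmeans', 'random'}, defaults to 'kmeans'. Method used to initialize the parameters: k-means by default, or random initialization.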
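- weights_init: initial (prior) weights of the components, which you may set yourself; if None, they are initialized with the method given by init_params.
- means_init: initial means; if None, they are initialized with the method given by init_params, as for weights_init.
- precisions_init: initial precisions (inverses of the covariance matrices). The shape depends on covariance_type, for example (n_components, n_features) for 'diag'; if None, they are initialized with the method given by init_params.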
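- random_state: seed or random number generator used for the initialization.
- warm_start: if True, a call to fit() uses the result of the previous fit() as its initialization. This suits repeated fits on similar problems and can speed up convergence; defaults to False.
- verbose: enables output of iteration information; defaults to 0, and can be 1 or greater than 1 (the information shown differs).
- verbose_interval: tied to verbose. When iteration output is enabled, sets the number of iterations between printouts; defaults to 10.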
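The effect of covariance_type is easiest to see in the shape of the fitted `covariances_` attribute. The following is a minimal sketch, not part of the original example: the synthetic two-cluster data, the three-feature dimensionality, and the `shapes` dictionary are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data (an assumption for illustration): two clusters, 3 features.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 3)),
    rng.normal(loc=5.0, scale=1.0, size=(100, 3)),
])

shapes = {}
for cov_type in ["full", "tied", "diag", "spherical"]:
    gmm = GaussianMixture(n_components=2, covariance_type=cov_type,
                          random_state=0)
    gmm.fit(X)
    # Record the shape of the fitted covariance parameters.
    shapes[cov_type] = gmm.covariances_.shape
    print(cov_type, gmm.covariances_.shape)
```

With two components and three features, 'full' stores one 3x3 matrix per component, 'tied' a single shared 3x3 matrix, 'diag' one length-3 vector per component, and 'spherical' one scalar per component.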
Reference code
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import StratifiedKFold
colors = ['navy', 'turquoise', 'darkorange']
def make_ellipses(gmm, ax):
    # Draw one ellipse per component, using the first two feature dimensions.
    for n, color in enumerate(colors):
        if gmm.covariance_type == 'full':
            covariances = gmm.covariances_[n][:2, :2]
        elif gmm.covariance_type == 'tied':
            covariances = gmm.covariances_[:2, :2]
        elif gmm.covariance_type == 'diag':
            covariances = np.diag(gmm.covariances_[n][:2])
        elif gmm.covariance_type == 'spherical':
            covariances = np.eye(gmm.means_.shape[1]) * gmm.covariances_[n]
        v, w = np.linalg.eigh(covariances)
        # np.linalg.eigh returns the eigenvectors as the columns of w.
        u = w[:, 0] / np.linalg.norm(w[:, 0])
        angle = np.arctan2(u[1], u[0])
        angle = 180 * angle / np.pi  # convert to degrees
        v = 2. * np.sqrt(2.) * np.sqrt(v)
        # Pass angle by keyword; recent Matplotlib releases reject it positionally.
        ell = mpl.patches.Ellipse(gmm.means_[n, :2], v[0], v[1],
                                  angle=180 + angle, color=color)
        ell.set_clip_box(ax.bbox)
        ell.set_alpha(0.5)
        ax.add_artist(ell)
        ax.set_aspect('equal', 'datalim')
iris = datasets.load_iris()
# Break up the dataset into non-overlapping training (75%) and testing
# (25%) sets.
skf = StratifiedKFold(n_splits=4)
# Only take the first fold.
train_index, test_index = next(iter(skf.split(iris.data, iris.target)))
X_train = iris.data[train_index]
y_train = iris.target[train_index]
X_test = iris.data[test_index]
y_test = iris.target[test_index]
n_classes = len(np.unique(y_train))
# Try GMMs using different types of covariances.
# print(n_classes)
estimators = {cov_type: GaussianMixture(n_components=n_classes,
                                        covariance_type=cov_type,
                                        max_iter=20, random_state=0)
              for cov_type in ['spherical', 'diag', 'tied', 'full']}
n_estimators = len(estimators)
plt.figure(figsize=(3 * n_estimators // 2, 6))
plt.subplots_adjust(bottom=.01, top=0.95, hspace=.15, wspace=.05,
                    left=.01, right=.99)
for index, (name, estimator) in enumerate(estimators.items()):
    # Since we have class labels for the training data, we can
    # initialize the GMM parameters in a supervised manner.
    estimator.means_init = np.array([X_train[y_train == i].mean(axis=0)
                                     for i in range(n_classes)])
    # Train the other parameters using the EM algorithm.
    estimator.fit(X_train)
    h = plt.subplot(2, n_estimators // 2, index + 1)
    make_ellipses(estimator, h)
    for n, color in enumerate(colors):
        data = iris.data[iris.target == n]
        plt.scatter(data[:, 0], data[:, 1], s=0.8, color=color,
                    label=iris.target_names[n])
    # Plot the test data with crosses
    for n, color in enumerate(colors):
        data = X_test[y_test == n]
        plt.scatter(data[:, 0], data[:, 1], marker='x', color=color)
    y_train_pred = estimator.predict(X_train)
    train_accuracy = np.mean(y_train_pred.ravel() == y_train.ravel()) * 100
    plt.text(0.05, 0.9, 'Train accuracy: %.1f' % train_accuracy,
             transform=h.transAxes)
    y_test_pred = estimator.predict(X_test)
    test_accuracy = np.mean(y_test_pred.ravel() == y_test.ravel()) * 100
    plt.text(0.05, 0.8, 'Test accuracy: %.1f' % test_accuracy,
             transform=h.transAxes)
    plt.xticks(())
    plt.yticks(())
    plt.title(name)
    plt.legend(scatterpoints=1, loc='lower right', prop=dict(size=12))
plt.show()
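The accuracy figures in the reference code compare component indices directly with class labels. That comparison is meaningful only because means_init is built from the per-class means, so component i starts at class i. Below is a minimal sketch of the same comparison without plotting, which can be handy on a machine without a display. It assumes the same iris split as above; the names `class_means` and `results` are introduced here for illustration, and means_init is passed through the constructor rather than set as an attribute.

```python
import numpy as np
from sklearn import datasets
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import StratifiedKFold

iris = datasets.load_iris()

# Same split as the reference code: first fold of a 4-fold stratified split.
skf = StratifiedKFold(n_splits=4)
train_index, test_index = next(iter(skf.split(iris.data, iris.target)))
X_train, y_train = iris.data[train_index], iris.target[train_index]
X_test, y_test = iris.data[test_index], iris.target[test_index]
n_classes = len(np.unique(y_train))

# Per-class means: component i is initialized at the mean of class i,
# which is what makes component indices comparable to class labels.
class_means = np.array([X_train[y_train == i].mean(axis=0)
                        for i in range(n_classes)])

results = {}
for cov_type in ["spherical", "diag", "tied", "full"]:
    gmm = GaussianMixture(n_components=n_classes, covariance_type=cov_type,
                          max_iter=20, random_state=0,
                          means_init=class_means)
    gmm.fit(X_train)
    train_acc = np.mean(gmm.predict(X_train) == y_train) * 100
    test_acc = np.mean(gmm.predict(X_test) == y_test) * 100
    results[cov_type] = (train_acc, test_acc)
    print("%-9s train: %.1f  test: %.1f" % (cov_type, train_acc, test_acc))
```

Without the supervised initialization, the component order is arbitrary, and the predicted indices would have to be matched to the class labels before accuracy could be computed.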