The Hidden Markov Model (HMM) was first described in a series of statistical papers by Leonard E. Baum and co-authors in the second half of the 1960s. It was initially applied to speech recognition.
In the late 1980s, HMMs began to be applied to the analysis of biological sequences, DNA in particular. Since then, the HMM has become an indispensable technique in bioinformatics.
This article draws on:
[1] 用hmmlearn學(xué)習(xí)隱馬爾科夫模型HMM (a Chinese tutorial on learning HMMs with hmmlearn)
[2] the official hmmlearn documentation
hmmlearn
hmmlearn was once part of the scikit-learn project and is now a standalone Python package for unsupervised hidden Markov models, installable directly with pip. Its official documentation is at https://hmmlearn.readthedocs.io/en/stable/. Its supervised counterpart is seqlearn.
pip3 install hmmlearn
hmmlearn provides three models:

Name | Description | Observations |
---|---|---|
hmm.GaussianHMM | Hidden Markov Model with Gaussian emissions | continuous |
hmm.GMMHMM | Hidden Markov Model with Gaussian mixture emissions | continuous |
hmm.MultinomialHMM | Hidden Markov Model with multinomial (discrete) emissions | discrete |
MultinomialHMM
The class is declared as follows (note that recent hmmlearn releases renamed the categorical-emission model to CategoricalHMM; the declaration below is for the older API):
class hmmlearn.hmm.MultinomialHMM(n_components=1, startprob_prior=1.0, transmat_prior=1.0,
algorithm='viterbi', random_state=None, n_iter=10, tol=0.01, verbose=False, params='ste', init_params='ste')
其中巡蘸,較為常用(或?qū)⒏拢┑膮?shù)為:
- n_components:(int)隱含狀態(tài)個數(shù)
- n_iter:(int, optional)訓(xùn)練時循環(huán)(迭代)最大次數(shù)
- tol:(float, optional)Convergence threshold. EM will stop if the gain in log-likelihood is below this value.
- verbose:(bool, optional)賦值為
True
時奋隶,會向標(biāo)準(zhǔn)輸出輸出每次迭代的概率(score)與本次 - init_params:(string, optional)決定哪些參數(shù)會在訓(xùn)練時被初始化。
‘s’
for startprob,‘t’
for transmat,‘e’
for emissionprob悦荒。空字符串""
代表全部使用用戶提供的參數(shù)進行訓(xùn)練嘹吨。
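To make the tol stopping rule concrete, here is a minimal plain-Python sketch (not hmmlearn's actual code) of the criterion: training stops once the per-iteration gain in log-likelihood drops below tol, or once n_iter iterations have run.

```python
def stop_iteration(loglikes, tol=0.01, n_iter=10):
    """Return the index at which EM would stop, given the sequence of
    per-iteration log-likelihoods (a sketch of the stopping criterion)."""
    prev = float("-inf")
    for i, ll in enumerate(loglikes):
        if i >= n_iter or ll - prev < tol:
            return i  # converged (gain < tol) or iteration cap reached
        prev = ll
    return len(loglikes)

# gains between iterations: inf, 5.0, 0.5, 0.001 -> stops at index 3
print(stop_iteration([-10.0, -5.0, -4.5, -4.499]))
```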
定義搬味、使用:
import numpy as np
from hmmlearn import hmm
states = ["box 1", "box 2", "box3"]
n_states = len(states)
observations = ["red", "white"]
n_observations = len(observations)
start_probability = np.array([0.2, 0.4, 0.4])
transition_probability = np.array([
[0.5, 0.2, 0.3],
[0.3, 0.5, 0.2],
[0.2, 0.3, 0.5]
])
emission_probability = np.array([
[0.5, 0.5],
[0.4, 0.6],
[0.7, 0.3]
])
model = hmm.MultinomialHMM(n_components=n_states, n_iter=20, tol=0.001)
model.startprob_ = start_probability
model.transmat_ = transition_probability
model.emissionprob_ = emission_probability
Predicting states with the Viterbi algorithm
The return value is said to be ln(prob); the documentation calls it "the log probability".
seen = np.array([[0,1,0]]).T
logprob, box = model.decode(seen, algorithm="viterbi")
print("The ball picked:", ", ".join(map(lambda x: observations[x], seen.ravel())))
print("The hidden box", ", ".join(map(lambda x: states[x], box)))
Output:
The ball picked: red, white, red
The hidden box box3, box3, box3
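To see what decode is doing, here is a small NumPy re-implementation of log-space Viterbi with the same parameters (an illustrative sketch, not hmmlearn's internal code):

```python
import numpy as np

def viterbi(obs, startprob, transmat, emissionprob):
    """Most likely hidden-state path and its log probability."""
    T, N = len(obs), len(startprob)
    log_s, log_t, log_e = np.log(startprob), np.log(transmat), np.log(emissionprob)
    delta = np.empty((T, N))            # best log prob of a path ending in state j at time t
    psi = np.zeros((T, N), dtype=int)   # backpointers
    delta[0] = log_s + log_e[:, obs[0]]
    for t in range(1, T):
        cand = delta[t - 1][:, None] + log_t   # cand[i, j]: come from i, go to j
        psi[t] = cand.argmax(axis=0)
        delta[t] = cand.max(axis=0) + log_e[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):              # follow backpointers
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta[-1].max())

start = np.array([0.2, 0.4, 0.4])
trans = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
emis = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
path, logprob = viterbi([0, 1, 0], start, trans, emis)
print(path)   # [2, 2, 2], i.e. box3, box3, box3, matching decode() above
```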
Computing the probability of the observations
print(model.score(seen))
Output:
-2.03854530992
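score() runs the forward algorithm, which sums the probability over all possible state paths. A minimal NumPy sketch (not hmmlearn's implementation) reproduces the value above for the same parameters:

```python
import numpy as np

def forward_logprob(obs, startprob, transmat, emissionprob):
    """Log P(obs) via the forward algorithm."""
    alpha = startprob * emissionprob[:, obs[0]]   # joint prob of state and first symbol
    for o in obs[1:]:
        alpha = (alpha @ transmat) * emissionprob[:, o]
    return float(np.log(alpha.sum()))

start = np.array([0.2, 0.4, 0.4])
trans = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
emis = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
print(forward_logprob([0, 1, 0], start, trans, emis))   # ~ -2.0385, matching score()
```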
Training and data preparation
import numpy as np
from hmmlearn import hmm
states = ["box 1", "box 2", "box3"]
n_states = len(states)
observations = ["red", "white"]
n_observations = len(observations)
model = hmm.MultinomialHMM(n_components=n_states, n_iter=20, tol=0.01)
D1 = [[1], [0], [0], [0], [1], [1], [1]]
D2 = [[1], [0], [0], [0], [1], [1], [1], [0], [1], [1]]
D3 = [[1], [0], [0]]
X = np.concatenate([D1, D2, D3])
model.fit(X, lengths=[len(D1), len(D2), len(D3)])  # lengths marks the boundaries of the three sequences
print(model.startprob_)
print(model.transmat_)
print(model.emissionprob_)
print(model.score(X))
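Under the hood, fit runs Baum-Welch (EM). A compact NumPy sketch of a single iteration for a discrete-emission HMM, using unscaled forward/backward passes (so only suitable for short sequences; an illustration, not hmmlearn's code):

```python
import numpy as np

def baum_welch_step(obs, start, trans, emis):
    """One Baum-Welch (EM) iteration for a discrete-emission HMM.
    obs: 1-D integer array. Returns updated parameters and the current log-likelihood."""
    T, N = len(obs), len(start)
    # E-step: unscaled forward and backward passes
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = start * emis[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emis[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (emis[:, obs[t + 1]] * beta[t + 1])
    lik = alpha[-1].sum()                      # P(obs | current parameters)
    gamma = alpha * beta / lik                 # gamma[t, i] = P(state_t = i | obs)
    xi = (alpha[:-1, :, None] * trans[None] *
          (emis[:, obs[1:]].T * beta[1:])[:, None, :]) / lik   # xi[t, i, j] = P(i -> j at t | obs)
    # M-step: re-estimate parameters from expected counts
    new_start = gamma[0]
    new_trans = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_emis = np.zeros_like(emis)
    for k in range(emis.shape[1]):
        new_emis[:, k] = gamma[obs == k].sum(axis=0)
    new_emis /= gamma.sum(axis=0)[:, None]
    return new_start, new_trans, new_emis, float(np.log(lik))

start = np.array([0.2, 0.4, 0.4])
trans = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
emis = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
obs = np.array([1, 0, 0, 0, 1, 1, 1])          # D1 from above
s1, t1, e1, ll0 = baum_welch_step(obs, start, trans, emis)
_, _, _, ll1 = baum_welch_step(obs, s1, t1, e1)
# EM guarantees ll1 >= ll0: an iteration can never decrease the log-likelihood
```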