Artificial Intelligence - AutoEncoder [2]

Feel free to follow my GitHub and my Jianshu blog.

An autoencoder reconstructs its own input by recombining sparse, high-level features, so the target output is the same as the input.

For how to set up the TensorFlow framework, see the linked reference.

Get the source code, and also copy the model files from autoencoder_models.

GitHub repository with the source code for this article

AutoEncoder

Project setup

Install the Python dependencies: scikit-learn==0.19.0, scipy==0.19.1, sklearn==0.0

scipy

If installing scipy fails, write scipy==0.19.1 into requirements.txt and install from that file. The error looks like this:

THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    scipy from http://mirrors.aliyun.com/pypi/packages/63/68/c5098f3b6034e69d187e3f2e989f462143d9f8b524f5a4f9e13c4a6f5f47/scipy-0.19.1-cp27-cp27m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl#md5=72415e8da753eea97eb9820602931cb5:
        Expected md5 72415e8da753eea97eb9820602931cb5
             Got        073584eb2c597bbfb82a5865b7055787

Alternatively, list all the dependencies in requirements.txt and install them in one go:

pip install -r requirements.txt

matplotlib

Install matplotlib:

pip install matplotlib -i http://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com

If matplotlib reports the following error:

RuntimeError: Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework. See the Python documentation for more information on installing Python as a framework on Mac OS X. Please either reinstall Python as a framework, or try one of the other backends. If you are using (Ana)Conda please install python.app and replace the use of 'python' with 'pythonw'. See 'Working with Matplotlib on OSX' in the Matplotlib FAQ for more information.

then run these shell commands:

cd ~/.matplotlib
touch matplotlibrc

Import matplotlib:

import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt

opencv

OpenCV is imported as cv2, but the package to install is opencv-python:

sudo pip install opencv-python -i http://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com

When importing cv2, a plain import cv2 does not give auto-completion in the IDE, so import it like this instead:

import cv2.cv2 as cv2

Saving images

Load the MNIST image source: test is the test set, train is the training set, images holds the images, and labels holds the labels. images is an ndarray with 784 values per image; labels is also an ndarray, one-hot encoded.

# Load the data
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
images = mnist.test.images  # images
labels = mnist.test.labels  # labels

Reshape each 784-element vector into a 28x28 image, convert the one-hot label into a digit (0 to 9), and save the first 100 images of the test set.

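# Save the images
import numpy as np
from scipy import misc  # misc.imsave is available in scipy 0.19.1; later releases removed it

size = len(labels)
for i in range(min(size, 100)):  # exactly the first 100 images
    pxl = np.array(images[i])  # pixels
    img = pxl.reshape((28, 28))  # image
    lbl = np.argmax(labels[i])  # label
    misc.imsave('./IMAGE_data/test/' + str(i) + '_' + str(lbl) + '.png', img)  # save via scipy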
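
The reshape and argmax steps can be tried without downloading MNIST. The sketch below uses a synthetic 784-element vector and a hand-written one-hot label; both are made-up stand-ins, not real MNIST data.

```python
import numpy as np

# Synthetic stand-ins for one MNIST sample (assumed values, not real data).
pixels = np.arange(784, dtype=np.float32) / 783.0   # flat vector, values in [0, 1]
one_hot = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0])  # one-hot encoding of the digit 7

image = pixels.reshape((28, 28))  # row-major: row r holds pixels[28*r : 28*r + 28]
digit = int(np.argmax(one_hot))   # the index of the single 1 is the digit itself

print(image.shape, digit)
```

Because reshape is row-major, image[1, 0] is the same value as pixels[28].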

Merge the 100 images into a single image to make comparison easier.

# Merge the images
from PIL import Image

large_size = 28 * 10
large_img = Image.new('RGBA', (large_size, large_size))
paths_list, _, __ = listdir_files('./IMAGE_data/test/')  # listdir_files is the author's helper, not shown here
for i in range(100):
    img = Image.open(paths_list[i])
    loc = ((i // 10) * 28, (i % 10) * 28)  # (x, y) of the tile's upper-left corner
    large_img.paste(img, loc)
large_img.save('./IMAGE_data/merged.png')
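
To see where each tile lands in the merged image, the position arithmetic can be isolated in a small function. This is only an illustration of the loc expression used in the loop; the names TILE, GRID, and tile_location are made up for this sketch.

```python
TILE = 28   # width and height of one MNIST image in pixels
GRID = 10   # the merged image is 10 x 10 tiles

def tile_location(i):
    """Return the (x, y) upper-left corner of tile i, as passed to Image.paste."""
    return ((i // GRID) * TILE, (i % GRID) * TILE)

for i in (0, 9, 10, 99):
    print(i, tile_location(i))
```

Since the first coordinate is x, the grid is filled column by column: tiles 0 to 9 run down the leftmost column, and tile 10 starts the next column.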

Three ways to save an image: scipy, matplotlib (the output includes the axes), and opencv.

# Other ways to save an image
pixel = np.array(images[0])  # 784-element vector
label = np.argmax(labels[0])  # find the label
image = pixel.reshape((28, 28))  # reshape into a 28x28 matrix

# -------------------- scipy -------------------- #
misc.imsave('./IMAGE_data/scipy.png', image)  # save via scipy
# -------------------- scipy -------------------- #

# -------------------- matplotlib -------------------- #
plt.gray()  # use the grayscale colormap
plt.imshow(image)
plt.savefig("./IMAGE_data/plt.png")
# plt.show()
# -------------------- matplotlib -------------------- #

# -------------------- opencv -------------------- #
image = image * 255  # the data are floats in 0~1
cv2.imwrite("./IMAGE_data/opencv.png", image)
# cv2.imshow('hah', image)
# cv2.waitKey(0)
# -------------------- opencv -------------------- #
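
The image is multiplied by 255 because the MNIST pixels are floats in 0~1, while an 8-bit grayscale PNG stores integers in 0~255. A minimal sketch of an explicit conversion using NumPy only; the helper name to_uint8 is made up, and the rounding and clipping are an assumption about what is wanted, not something the article's code does.

```python
import numpy as np

def to_uint8(image):
    """Map float pixels in [0, 1] to 8-bit integers in [0, 255]."""
    scaled = np.rint(np.asarray(image, dtype=np.float64) * 255.0)  # round to nearest integer
    return np.clip(scaled, 0, 255).astype(np.uint8)                # keep out-of-range values in bounds

demo = np.array([[0.0, 0.5], [1.0, 1.2]])  # 1.2 is deliberately out of range
print(to_uint8(demo))
```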

Autoencoder

Read the MNIST data:

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

Standardize the training and test data:

X_train, X_test = standard_scale(mnist.train.images, mnist.test.images)

The mean and standard deviation are computed on the training data only and are then applied to both the training data and the test data.

import sklearn.preprocessing as prep

def standard_scale(X_train, X_test):
    preprocessor = prep.StandardScaler().fit(X_train)  # statistics come from the training data only
    X_train = preprocessor.transform(X_train)
    X_test = preprocessor.transform(X_test)
    return X_train, X_test

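In StandardScaler, mean_ is the per-feature mean, with one entry per pixel; scale_ is the per-feature standard deviation, likewise one entry per pixel. Every value in the matrix has the corresponding mean subtracted and is then divided by the corresponding standard deviation.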

self.scale_ = _handle_zeros_in_scale(np.sqrt(self.var_))
X -= self.mean_
X /= self.scale_
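
The same computation can be sketched in plain NumPy to make the "fit on train, apply to both" idea concrete. This is a simplified illustration, not scikit-learn's implementation; the function names and the tiny arrays are made up.

```python
import numpy as np

def fit_standardizer(X_train):
    """Return the per-feature mean and scale computed from the training data only."""
    mean = X_train.mean(axis=0)
    scale = X_train.std(axis=0)   # population standard deviation (ddof=0)
    scale[scale == 0.0] = 1.0     # constant features: avoid dividing by zero
    return mean, scale

def standardize(X, mean, scale):
    return (X - mean) / scale

X_train = np.array([[1.0, 5.0, 0.0],
                    [3.0, 5.0, 2.0]])
X_test = np.array([[2.0, 6.0, 4.0]])

mean, scale = fit_standardizer(X_train)
print(standardize(X_train, mean, scale))
print(standardize(X_test, mean, scale))
```

The second column is constant in the training data, so its scale is replaced by 1; this matters for MNIST, where many border pixels are always 0.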

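Set the training parameters: n_samples is the total number of samples, training_epochs is the number of epochs, batch_size is the number of samples per batch, and display_step is how often (in epochs) progress is printed.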

n_samples = int(mnist.train.num_examples)
training_epochs = 20
batch_size = 128
display_step = 1

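AdditiveGaussianNoiseAutoencoder, AGN for short, is an autoencoder with additive Gaussian noise. n_input is the number of input nodes and equals the image dimension, 784; n_hidden is the number of hidden nodes, which must be smaller than the number of input nodes, here 200; transfer_function is the activation function, tf.nn.softplus; optimizer is the optimizer, AdamOptimizer with a learning rate of 0.001; scale is the noise coefficient, 0.01.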

autoencoder = AdditiveGaussianNoiseAutoencoder(
    n_input=784, n_hidden=200, transfer_function=tf.nn.softplus,
    optimizer=tf.train.AdamOptimizer(learning_rate=0.001), scale=0.01)

The softplus activation function works as follows:

import tensorflow as tf

mat = [1., 2., 3.]  # must be floats
# softplus: [ln(e^1 + 1), ln(e^2 + 1), ln(e^3 + 1)]
print(tf.Session().run(tf.nn.softplus(mat)))
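
For readers without a TensorFlow session at hand, the same formula can be checked with the standard library. This is a sketch of the math only, rewritten so that large inputs do not overflow; it is not how TensorFlow implements the op.

```python
import math

def softplus(x):
    """softplus(x) = ln(1 + e^x), rearranged so that large |x| does not overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

for v in [1.0, 2.0, 3.0]:
    print(v, softplus(v))
```

The rearrangement relies on the identity ln(1 + e^x) = max(x, 0) + ln(1 + e^(-|x|)), so the exponent passed to exp is never positive.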

random_normal generates an array of normally distributed random values:

rn = tf.random_normal((100000,))  # 1-D tensor; pass seed=... if the values must be reproducible
mean, variance = tf.nn.moments(rn, 0)  # mean and variance; expect a mean of about 0 and a variance of about 1
print(tf.Session().run([mean, variance]))

The constructor of AdditiveGaussianNoiseAutoencoder:

def __init__(self, n_input, n_hidden, transfer_function=tf.nn.softplus, optimizer=tf.train.AdamOptimizer(),
             scale=0.1):
    self.n_input = n_input  # number of input nodes
    self.n_hidden = n_hidden  # number of hidden nodes, fewer than the input nodes
    self.transfer = transfer_function  # activation function
    self.scale = tf.placeholder(tf.float32)  # noise coefficient placeholder, fed with training_scale
    self.training_scale = scale  # Gaussian noise coefficient
    network_weights = self._initialize_weights()  # initialize the weights: w1/b1 on the input side, w2/b2 on the output side
    self.weights = network_weights  # weights

    # model
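    self.x = tf.placeholder(tf.float32, [None, self.n_input])  # input data to feed
    # add Gaussian noise to the input; note this uses the Python value scale, not the self.scale placeholder
    self.hidden = self.transfer(tf.add(tf.matmul(self.x + scale * tf.random_normal((n_input,)),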
                                                 self.weights['w1']),
                                       self.weights['b1']))
    self.reconstruction = tf.add(tf.matmul(self.hidden, self.weights['w2']), self.weights['b2'])

    # cost: 0.5 * sum((x_ - x)^2)
    self.cost = 0.5 * tf.reduce_sum(tf.pow(tf.subtract(self.reconstruction, self.x), 2.0))
    self.optimizer = optimizer.minimize(self.cost)

    init = tf.global_variables_initializer()
    self.sess = tf.Session()
    self.sess.run(init)  # run the variable initializer
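
The graph built by the constructor can be mirrored in a few lines of NumPy, which may help when checking shapes. This is a sketch under stated assumptions rather than the article's model: the forward function, the batch size of 4, and the random placeholder weights are made up, and nothing is trained here.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)  # ln(1 + e^z), computed stably

def forward(x, w1, b1, w2, b2, scale, rng):
    """One pass of the noisy autoencoder: returns (reconstruction, cost)."""
    noise = rng.standard_normal(x.shape[1])          # one noise vector, broadcast over the batch
    hidden = softplus((x + scale * noise) @ w1 + b1)
    reconstruction = hidden @ w2 + b2                # linear output layer
    cost = 0.5 * np.sum((reconstruction - x) ** 2)   # compared with the clean input
    return reconstruction, cost

rng = np.random.default_rng(0)
n_input, n_hidden, batch = 784, 200, 4               # batch size 4 is arbitrary
x = rng.standard_normal((batch, n_input))
w1 = rng.uniform(-0.05, 0.05, (n_input, n_hidden))   # placeholder values, not trained weights
b1 = np.zeros(n_hidden)
w2 = np.zeros((n_hidden, n_input))                   # zero-initialized, like w2 in the article
b2 = np.zeros(n_input)

reconstruction, cost = forward(x, w1, b1, w2, b2, scale=0.01, rng=rng)
print(reconstruction.shape, cost)
```

With w2 and b2 at zero, the reconstruction is all zeros, so the initial cost is simply 0.5 times the sum of the squared inputs.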

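random_normal generates a random tensor. With the argument (n_input,) it is a 1-D tensor of n_input values drawn with mean 0 and variance 1, which is broadcast across every row of the batch. tf.nn.moments returns the mean and the variance.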

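Initialize the weights in two layers: the first maps the n_input-dimensional data to n_hidden dimensions, and the second maps it back. w1 is initialized with xavier_initializer (the Xavier initializer), which gives the weights a mean of 0 and a variance of 2/(n_input+n_hidden); the remaining weights and biases start at zero.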

def _initialize_weights(self):
    all_weights = dict()
    # initialize w1 with xavier_initializer
    all_weights['w1'] = tf.get_variable("w1", shape=[self.n_input, self.n_hidden],
                                        initializer=tf.contrib.layers.xavier_initializer())
    all_weights['b1'] = tf.Variable(tf.zeros([self.n_hidden], dtype=tf.float32))
    all_weights['w2'] = tf.Variable(tf.zeros([self.n_hidden, self.n_input], dtype=tf.float32))
    all_weights['b2'] = tf.Variable(tf.zeros([self.n_input], dtype=tf.float32))
    return all_weights
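
As a rough check of what Xavier initialization produces, the uniform variant can be sketched in NumPy. This assumes the uniform form, which to my knowledge is the default of tf.contrib.layers.xavier_initializer; the function name xavier_uniform and the seed are made up for this sketch.

```python
import numpy as np

def xavier_uniform(n_in, n_out, rng):
    """Sample an (n_in, n_out) matrix from U(-limit, limit), limit = sqrt(6 / (n_in + n_out))."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

rng = np.random.default_rng(0)
w1 = xavier_uniform(784, 200, rng)

# A uniform distribution on (-a, a) has variance a^2 / 3, which gives 2 / (n_in + n_out).
print(w1.mean(), w1.var(), 2.0 / (784 + 200))
```

The sample mean should be close to 0 and the sample variance close to 2/984.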

Train the model and print the average cost avg_cost for each epoch:

for epoch in range(training_epochs):
    avg_cost = 0.
    total_batch = int(n_samples / batch_size)
    # Loop over all batches
    for i in range(total_batch):
        batch_xs = get_random_block_from_data(X_train, batch_size)

        # Fit training using batch data
        cost = autoencoder.partial_fit(batch_xs)
        # Compute average loss
        avg_cost += cost / n_samples * batch_size

    # Display logs per epoch step
    if epoch % display_step == 0:
        print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(avg_cost))

print("Total cost: " + str(autoencoder.calc_total_cost(X_test)))
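
The weighting in avg_cost can be checked with plain arithmetic. The sketch below assumes 55000 training samples (the usual size of the MNIST training split with this loader) and a made-up constant cost per batch; it only illustrates the accumulation and trains nothing.

```python
n_samples = 55000      # assumed size of the MNIST training split
batch_size = 128
total_batch = int(n_samples / batch_size)   # 429 full batches; the remainder is dropped

batch_cost = 1000.0    # made-up constant cost per batch, for illustration only
avg_cost = 0.0
for _ in range(total_batch):
    avg_cost += batch_cost / n_samples * batch_size

print(total_batch, avg_cost)
```

Each batch contributes batch_size / n_samples of its cost, so the sum over an epoch is roughly the mean cost per batch; it falls slightly short of batch_cost because 429 * 128 = 54912 is a little less than 55000.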

Pick a random start index and take one batch-sized block of data.

def get_random_block_from_data(data, batch_size):
    start_index = np.random.randint(0, len(data) - batch_size)  # random start of the block
    return data[start_index:(start_index + batch_size)]  # block of batch_size samples

Call the autoencoder's partial_fit to feed data into the graph. The data is one batch, and the Gaussian noise coefficient is the value given to the constructor (training_scale).

def partial_fit(self, X):
    cost, opt = self.sess.run((self.cost, self.optimizer),
                              feed_dict={self.x: X, self.scale: self.training_scale})
    return cost

Finally, print the cost over the whole test set X_test.

print("Total cost: " + str(autoencoder.calc_total_cost(X_test)))

The original images (100 images):

MNIST

The autoencoder's output (100 images):

MNIST reconstructed by the AutoEncoder

OK, that's all! Enjoy it!

Last edited: 2017.12.10 01:34:35
