Preface
Style transfer is one of the most interesting applications of computer vision. Implementing it with deep learning breaks down into two parts: implementing the neural style transfer algorithm, and using that algorithm to generate new artistic images. Most deep learning algorithms optimize a cost function to obtain a set of parameters; neural style transfer instead optimizes a cost function to obtain pixel values. Before implementing the algorithm, first import the required libraries:
import os
import sys
import scipy.io
import scipy.misc
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
from PIL import Image
from nst_utils import *
import numpy as np
import tensorflow as tf
Neural style transfer takes two images, a content image (denoted C) and a style image (denoted S), and generates a new image (denoted G) that combines the content of C with the style of S.
Transfer Learning
Neural style transfer builds on a pretrained convolutional model. Following the NST (Neural Style Transfer) paper, the algorithm is implemented on top of the VGG-19 deep neural network. VGG-19 has already been trained on a large amount of image data: its shallow layers learn low-level features, while its deeper layers learn high-level features of an image.
The code to load the pretrained model is as follows:
model = load_vgg_model("pretrained-model/imagenet-vgg-verydeep-19.mat")
The entire model is stored in a Python dictionary, in which each key is a variable name and the corresponding value is the tensor associated with that variable.
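As a quick check (a sketch only; the exact key set comes from the load_vgg_model implementation in the appendix), you can list the dictionary keys to see which layers are available:
print(sorted(model.keys()))
# e.g. ['avgpool1', ..., 'conv1_1', 'conv1_2', ..., 'conv5_4', 'input']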
As shown below, you can assign an input image to the model:
model["input"].assign(image)
You can also access the activations of a particular layer; for example, to run the network's conv4_2 layer on this image:
sess.run(model["conv4_2"])
Neural Style Transfer
Building neural style transfer proceeds in the following steps:
- Define the content cost function
- Define the style cost function
- Combine the two into a single total cost function
Computing the Content Cost
Before defining the content cost function, first load the content image:
import imageio
content_image = imageio.imread("images/louvre.jpg")
imshow(content_image)
In a convolutional network, the shallow layers extract simple features such as edges and basic textures, while the deeper layers extract more complex features such as object classes.
We want the generated image G to share the content of the content image C, so we choose the activations of some layer of the network to represent the content. In practice, a layer in the middle of the network works best: neither too shallow nor too deep.
Suppose we choose hidden layer $l$. Set image C as the input to the pretrained VGG model and run forward propagation to obtain its activation at layer $l$, denoted $a^{(C)}$; feeding image G through the network likewise gives an activation $a^{(G)}$. Each of these activations is an $n_H \times n_W \times n_C$ tensor. The content cost is then:
$$J_{content}(C, G) = \frac{1}{4 \times n_H \times n_W \times n_C} \sum_{\text{all entries}} \left( a^{(C)} - a^{(G)} \right)^2$$
The activation of hidden layer $l$ is a 3D tensor; for convenience of computation, it can be unrolled into a 2D matrix.
The content cost function is defined in code as follows:
def compute_content_cost(a_C, a_G):
    """
    Compute the content cost.
    Arguments:
    a_C -- tensor of shape (1, n_H, n_W, n_C), hidden-layer activations for image C
    a_G -- tensor of shape (1, n_H, n_W, n_C), hidden-layer activations for image G
    Returns:
    J_content -- the content cost
    """
    m, n_H, n_W, n_C = a_G.get_shape().as_list()
    # Unroll the 3D activations into 2D matrices of shape (n_H*n_W, n_C)
    a_C_unrolled = tf.reshape(a_C, [n_H*n_W, n_C])
    a_G_unrolled = tf.reshape(a_G, [n_H*n_W, n_C])
    J_content = 1./(4 * n_H * n_W * n_C) * tf.reduce_sum(tf.square(tf.subtract(a_C_unrolled, a_G_unrolled)))
    return J_content
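As a quick sanity check, the function can be run on small random tensors. This is a minimal sketch: the shapes and random values below are made up for illustration, using the TF 1.x API as in the rest of this post.
tf.reset_default_graph()
with tf.Session() as test:
    tf.set_random_seed(1)
    # Hypothetical small activations standing in for real VGG outputs
    a_C = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
    a_G = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
    J_content = compute_content_cost(a_C, a_G)
    print("J_content = " + str(test.run(J_content)))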
Computing the Style Cost
The style matrix
The style matrix is also called the Gram matrix. In linear algebra, the Gram matrix of a set of vectors $(v_1, \dots, v_n)$ is the matrix of their pairwise dot products, with entries $G_{ij} = v_i^T v_j$. In other words, $G_{ij}$ compares how similar $v_i$ and $v_j$ are: if the two vectors are highly similar, their dot product is large, and so $G_{ij}$ is large.
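As a tiny concrete example (illustrative numbers only), take a matrix whose rows play the role of unrolled filter activations; multiplying it by its own transpose gives every pairwise dot product at once:
A = np.array([[1., 0., 1.],   # "filter" 0
              [1., 0., 1.],   # "filter" 1, identical to filter 0
              [0., 1., 0.]])  # "filter" 2, orthogonal to the others
G = A.dot(A.T)
print(G)
# [[2. 2. 0.]
#  [2. 2. 0.]
#  [0. 0. 1.]]
# G[0,1] is as large as G[0,0] because rows 0 and 1 match exactly;
# G[0,2] = 0 because rows 0 and 2 share no active positions.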
In summary, once an image's activations have been unrolled into a matrix, multiplying that matrix by its own transpose yields the Gram matrix directly.
Note that the diagonal entries of the Gram matrix measure how active each filter is. Suppose filter $i$ detects vertical textures in the image; then $G_{ii}$ measures how prevalent vertical textures are across the whole image, and the larger $G_{ii}$, the more vertical texture there is. By capturing the prevalence of individual features ($G_{ii}$) as well as how much different features occur together ($G_{ij}$), the style matrix $G$ measures the style of an image.
The style matrix can be implemented in TensorFlow as follows:
def gram_matrix(A):
    """
    Arguments:
    A -- matrix of shape (n_C, n_H*n_W)
    Returns:
    GA -- Gram matrix of A, of shape (n_C, n_C)
    """
    GA = tf.matmul(A, A, transpose_b=True)
    return GA
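A quick check (again a sketch, with made-up shapes and TF 1.x API):
tf.reset_default_graph()
with tf.Session() as test:
    tf.set_random_seed(1)
    # 3 "filters", each unrolled over 2 positions
    A = tf.random_normal([3, 2], mean=1, stddev=4)
    GA = gram_matrix(A)
    print("GA = " + str(test.run(GA)))  # a (3, 3) matrix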
Style Cost
After generating the style matrices, the optimization goal becomes minimizing the distance between the Gram matrix of the generated image and that of the style image. For a single hidden layer $l$, the style cost is:
$$J^{[l]}_{style}(S, G) = \frac{1}{4 \times n_C^2 \times (n_H \times n_W)^2} \sum_{i=1}^{n_C} \sum_{j=1}^{n_C} \left( G^{(S)}_{ij} - G^{(G)}_{ij} \right)^2$$
where $G^{(S)}$ and $G^{(G)}$ are the Gram matrices of the style image and the generated image respectively. Implementing this formula in code can be broken into the following steps:
- Retrieve the dimensions from the hidden-layer activations a_G
- Unroll the hidden-layer activations a_S and a_G into 2D matrices
- Compute the style (Gram) matrices of both images
- Compute the style cost
Putting these steps together, the implementation is as follows:
def compute_layer_style_cost(a_S, a_G):
    """
    Arguments:
    a_S -- tensor of shape (1, n_H, n_W, n_C), hidden-layer activations for the style image S
    a_G -- tensor of shape (1, n_H, n_W, n_C), hidden-layer activations for the generated image G
    Returns:
    J_style_layer -- the style cost for this layer
    """
    m, n_H, n_W, n_C = a_G.get_shape().as_list()
    # Unroll to (n_H*n_W, n_C), then transpose to (n_C, n_H*n_W) as gram_matrix expects
    a_S = tf.transpose(tf.reshape(a_S, [n_H*n_W, n_C]))
    a_G = tf.transpose(tf.reshape(a_G, [n_H*n_W, n_C]))
    GS = gram_matrix(a_S)
    GG = gram_matrix(a_G)
    J_style_layer = 1./(4 * n_C**2 * (n_H*n_W)**2) * tf.reduce_sum(tf.square(tf.subtract(GS, GG)))
    return J_style_layer
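The same kind of sanity check works here (shapes and values made up for illustration):
tf.reset_default_graph()
with tf.Session() as test:
    tf.set_random_seed(1)
    a_S = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
    a_G = tf.random_normal([1, 4, 4, 3], mean=1, stddev=4)
    J_style_layer = compute_layer_style_cost(a_S, a_G)
    print("J_style_layer = " + str(test.run(J_style_layer)))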
Style Weights
So far we have the style cost for a single layer. If we instead use several layers, assign each layer its own weight, and merge their style costs, the results can be surprisingly good. The layers and weights used here are:
STYLE_LAYERS = [
('conv1_1', 0.2),
('conv2_1', 0.2),
('conv3_1', 0.2),
('conv4_1', 0.2),
('conv5_1', 0.2)]
The style costs of the different layers are combined as follows:
$$J_{style}(S, G) = \sum_{l} \lambda^{[l]} \, J^{[l]}_{style}(S, G)$$
where $\lambda^{[l]}$ is the weight of layer $l$, as given in STYLE_LAYERS above.
The code implementing this formula is shown below:
def compute_style_cost(model, STYLE_LAYERS):
    """
    Compute the overall style cost across several layers.
    Arguments:
    model -- the TensorFlow model
    STYLE_LAYERS -- a Python list of (layer name, weight) pairs
    Returns:
    J_style -- the overall style cost
    """
    # Initialize the overall style cost
    J_style = 0
    for layer_name, coeff in STYLE_LAYERS:
        # Select the output tensor of the current layer
        out = model[layer_name]
        # a_S is evaluated now, with the style image assigned as the model input;
        # a_G stays a tensor and is evaluated later, once G becomes the input.
        # Note: this relies on the session `sess` defined further below.
        a_S = sess.run(out)
        a_G = out
        J_style_layer = compute_layer_style_cost(a_S, a_G)
        J_style += coeff * J_style_layer
    return J_style
Defining the Total Cost Function
The total cost of style transfer combines the content cost and the style cost, as the following formula shows:
$$J(G) = \alpha \, J_{content}(C, G) + \beta \, J_{style}(S, G)$$
The corresponding code is:
def total_cost(J_content, J_style, alpha = 10, beta = 40):
    """
    Compute the total cost.
    Arguments:
    J_content -- the content cost
    J_style -- the style cost
    alpha -- hyperparameter weighting the content cost
    beta -- hyperparameter weighting the style cost
    Returns:
    J -- the total cost
    """
    J = alpha*J_content + beta*J_style
    return J
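Because total_cost is plain arithmetic, it can be checked with ordinary numbers (the values here are made up):
J = total_cost(0.2, 0.1)
print(J)  # 10*0.2 + 40*0.1 = 6.0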
解決優(yōu)化問題
綜上,實現(xiàn)神經(jīng)風(fēng)格遷移的項目可以分為以下幾個步驟:
- 創(chuàng)建一個交互seesion
- 載入內(nèi)容圖像
- 載入風(fēng)格圖像
- 隨機初始化生成的圖像
- 加載VGG-16模型
- 創(chuàng)建tensorflow圖
- 通過VGG-16計算內(nèi)容圖像的損失
- 通過VGG-16計算風(fēng)格圖像的損失
- 計算總體損失
- 定義優(yōu)化函數(shù)并初始化學(xué)習(xí)率
- 運行tensorflow圖踱卵,并經(jīng)過多次迭代更新生成的圖像
與常規(guī)session不同廊驼,交互式session將自身設(shè)定為默認(rèn)session,可以使得運行變量時惋砂,無需經(jīng)常引用session對象妒挎,從而大大簡化了代碼。
如下代碼所示:
tf.reset_default_graph()
sess = tf.InteractiveSession()
Load the content and style images, and convert them to the format the model expects:
# Load the content image
content_image = imageio.imread("images/louvre_small.jpg")
content_image = reshape_and_normalize_image(content_image)
# Load the style image
style_image = imageio.imread("images/monet.jpg")
style_image = reshape_and_normalize_image(style_image)
Initialize the generated image as a random noise image (mixed with the content image), as follows:
generated_image = generate_noise_image(content_image)
imshow(generated_image[0])
Load the VGG-19 model:
model = load_vgg_model("pretrained-model/imagenet-vgg-verydeep-19.mat")
To compute the content cost, assign a_C and a_G to the appropriate hidden-layer activations of the VGG model, using layer conv4_2:
# Assign the content image as the model input
sess.run(model['input'].assign(content_image))
# Select the output tensor of layer conv4_2
out = model['conv4_2']
# Set a_C to the chosen layer's activation, evaluated on the content image
a_C = sess.run(out)
# Set a_G to the same tensor; it is evaluated later, once G becomes the input
a_G = out
# Compute the content cost
J_content = compute_content_cost(a_C, a_G)
The style cost is computed as follows:
sess.run(model['input'].assign(style_image))
J_style = compute_style_cost(model, STYLE_LAYERS)
Compute the total cost:
J = total_cost(J_content, J_style, alpha = 10, beta = 40)
Define the optimizer and set its learning rate to 2.0:
optimizer = tf.train.AdamOptimizer(2.0)
train_step = optimizer.minimize(J)
The following code runs the model for many iterations to carry out the style transfer:
def model_nn(sess, input_image, num_iterations = 200):
    # Initialize the session's global variables
    sess.run(tf.global_variables_initializer())
    # Assign the (noisy) initial image as the model input
    sess.run(model['input'].assign(input_image))
    for i in range(num_iterations):
        # Run one optimizer step to reduce the total cost
        sess.run(train_step)
        # Read back the current generated image from the model input
        generated_image = sess.run(model['input'])
        if i%20 == 0:
            Jt, Jc, Js = sess.run([J, J_content, J_style])
            print("Iteration " + str(i) + " :")
            print("total cost = " + str(Jt))
            print("content cost = " + str(Jc))
            print("style cost = " + str(Js))
            # Save the intermediate generated image
            save_image("output/" + str(i) + ".png", generated_image)
    # Save the final generated image
    save_image('output/generated_image.jpg', generated_image)
    return generated_image
Finally, run the model on the initialized input; after the iterations finish, the new stylized image is produced.
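The final call ties the pipeline together, using the sess, model, and generated_image objects created in the steps above:
generated_image = model_nn(sess, generated_image)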
Appendix: Related Code
### Part of this code is due to the MatConvNet team and is used to load the parameters of the pretrained VGG19 model in the notebook ###
import os
import sys
import scipy.io
import scipy.misc
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
from PIL import Image
from nst_utils import *
import numpy as np
import tensorflow as tf
class CONFIG:
IMAGE_WIDTH = 400
IMAGE_HEIGHT = 300
COLOR_CHANNELS = 3
NOISE_RATIO = 0.6
MEANS = np.array([123.68, 116.779, 103.939]).reshape((1,1,1,3))
VGG_MODEL = 'pretrained-model/imagenet-vgg-verydeep-19.mat' # Pick the VGG 19-layer model from the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition".
STYLE_IMAGE = 'images/stone_style.jpg' # Style image to use.
CONTENT_IMAGE = 'images/content300.jpg' # Content image to use.
OUTPUT_DIR = 'output/'
def load_vgg_model(path):
"""
Returns a model for the purpose of 'painting' the picture.
Takes only the convolution layer weights and wrap using the TensorFlow
Conv2d, Relu and AveragePooling layer. VGG actually uses maxpool but
the paper indicates that using AveragePooling yields better results.
The last few fully connected layers are not used.
Here is the detailed configuration of the VGG model:
0 is conv1_1 (3, 3, 3, 64)
1 is relu
2 is conv1_2 (3, 3, 64, 64)
3 is relu
4 is maxpool
5 is conv2_1 (3, 3, 64, 128)
6 is relu
7 is conv2_2 (3, 3, 128, 128)
8 is relu
9 is maxpool
10 is conv3_1 (3, 3, 128, 256)
11 is relu
12 is conv3_2 (3, 3, 256, 256)
13 is relu
14 is conv3_3 (3, 3, 256, 256)
15 is relu
16 is conv3_4 (3, 3, 256, 256)
17 is relu
18 is maxpool
19 is conv4_1 (3, 3, 256, 512)
20 is relu
21 is conv4_2 (3, 3, 512, 512)
22 is relu
23 is conv4_3 (3, 3, 512, 512)
24 is relu
25 is conv4_4 (3, 3, 512, 512)
26 is relu
27 is maxpool
28 is conv5_1 (3, 3, 512, 512)
29 is relu
30 is conv5_2 (3, 3, 512, 512)
31 is relu
32 is conv5_3 (3, 3, 512, 512)
33 is relu
34 is conv5_4 (3, 3, 512, 512)
35 is relu
36 is maxpool
37 is fullyconnected (7, 7, 512, 4096)
38 is relu
39 is fullyconnected (1, 1, 4096, 4096)
40 is relu
41 is fullyconnected (1, 1, 4096, 1000)
42 is softmax
"""
vgg = scipy.io.loadmat(path)
vgg_layers = vgg['layers']
def _weights(layer, expected_layer_name):
"""
Return the weights and bias from the VGG model for a given layer.
"""
wb = vgg_layers[0][layer][0][0][2]
W = wb[0][0]
b = wb[0][1]
layer_name = vgg_layers[0][layer][0][0][0][0]
assert layer_name == expected_layer_name
return W, b
def _relu(conv2d_layer):
"""
Return the RELU function wrapped over a TensorFlow layer. Expects a
Conv2d layer input.
"""
return tf.nn.relu(conv2d_layer)
def _conv2d(prev_layer, layer, layer_name):
"""
Return the Conv2D layer using the weights, biases from the VGG
model at 'layer'.
"""
W, b = _weights(layer, layer_name)
W = tf.constant(W)
b = tf.constant(np.reshape(b, (b.size)))
return tf.nn.conv2d(prev_layer, filter=W, strides=[1, 1, 1, 1], padding='SAME') + b
def _conv2d_relu(prev_layer, layer, layer_name):
"""
Return the Conv2D + RELU layer using the weights, biases from the VGG
model at 'layer'.
"""
return _relu(_conv2d(prev_layer, layer, layer_name))
def _avgpool(prev_layer):
"""
Return the AveragePooling layer.
"""
return tf.nn.avg_pool(prev_layer, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
# Constructs the graph model.
graph = {}
graph['input'] = tf.Variable(np.zeros((1, CONFIG.IMAGE_HEIGHT, CONFIG.IMAGE_WIDTH, CONFIG.COLOR_CHANNELS)), dtype = 'float32')
graph['conv1_1'] = _conv2d_relu(graph['input'], 0, 'conv1_1')
graph['conv1_2'] = _conv2d_relu(graph['conv1_1'], 2, 'conv1_2')
graph['avgpool1'] = _avgpool(graph['conv1_2'])
graph['conv2_1'] = _conv2d_relu(graph['avgpool1'], 5, 'conv2_1')
graph['conv2_2'] = _conv2d_relu(graph['conv2_1'], 7, 'conv2_2')
graph['avgpool2'] = _avgpool(graph['conv2_2'])
graph['conv3_1'] = _conv2d_relu(graph['avgpool2'], 10, 'conv3_1')
graph['conv3_2'] = _conv2d_relu(graph['conv3_1'], 12, 'conv3_2')
graph['conv3_3'] = _conv2d_relu(graph['conv3_2'], 14, 'conv3_3')
graph['conv3_4'] = _conv2d_relu(graph['conv3_3'], 16, 'conv3_4')
graph['avgpool3'] = _avgpool(graph['conv3_4'])
graph['conv4_1'] = _conv2d_relu(graph['avgpool3'], 19, 'conv4_1')
graph['conv4_2'] = _conv2d_relu(graph['conv4_1'], 21, 'conv4_2')
graph['conv4_3'] = _conv2d_relu(graph['conv4_2'], 23, 'conv4_3')
graph['conv4_4'] = _conv2d_relu(graph['conv4_3'], 25, 'conv4_4')
graph['avgpool4'] = _avgpool(graph['conv4_4'])
graph['conv5_1'] = _conv2d_relu(graph['avgpool4'], 28, 'conv5_1')
graph['conv5_2'] = _conv2d_relu(graph['conv5_1'], 30, 'conv5_2')
graph['conv5_3'] = _conv2d_relu(graph['conv5_2'], 32, 'conv5_3')
graph['conv5_4'] = _conv2d_relu(graph['conv5_3'], 34, 'conv5_4')
graph['avgpool5'] = _avgpool(graph['conv5_4'])
return graph
def generate_noise_image(content_image, noise_ratio = CONFIG.NOISE_RATIO):
"""
Generates a noisy image by adding random noise to the content_image
"""
# Generate a random noise_image
noise_image = np.random.uniform(-20, 20, (1, CONFIG.IMAGE_HEIGHT, CONFIG.IMAGE_WIDTH, CONFIG.COLOR_CHANNELS)).astype('float32')
# Set the input_image to be a weighted average of the content_image and a noise_image
input_image = noise_image * noise_ratio + content_image * (1 - noise_ratio)
return input_image
def reshape_and_normalize_image(image):
"""
Reshape and normalize the input image (content or style)
"""
# Reshape image to match the expected input of VGG-19
image = np.reshape(image, ((1,) + image.shape))
# Subtract the mean to match the expected input of VGG-19
image = image - CONFIG.MEANS
return image
def save_image(path, image):
# Un-normalize the image so that it looks good
image = image + CONFIG.MEANS
# Clip and Save the image
image = np.clip(image[0], 0, 255).astype('uint8')
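    # Note: scipy.misc.imsave was removed in SciPy >= 1.2; on newer
    # installations, imageio.imwrite(path, image) is a drop-in replacement.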
scipy.misc.imsave(path, image)
Based on the Week 4 assignment of Course 4 of Andrew Ng's Deep Learning Specialization.