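This paper presents an algorithm that generates high-frame-rate video from low-frame-rate video; the training data is simply high-frame-rate video.
Original paper: Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation.
Below I walk through my understanding of the training process, the network, and the formulas, with reference to the code.
The reference code is based on TensorFlow.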

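Formulas

To synthesize an arbitrary frame between frame 0 and frame 1, the approach is to estimate the optical flow from t to 0 and from t to 1, then use these flows to reconstruct frame t, as shown in Equation 1.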

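Equation 1:

$$\hat{I}_t = \alpha_0 \odot g(I_0, F_{t \to 0}) + (1 - \alpha_0) \odot g(I_1, F_{t \to 1})$$

where g(·, ·) is the backward warping function and ⊙ is element-wise multiplication.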

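Equation 1 shows how an arbitrary intermediate frame t between 0 and 1 is generated: I0 and Ft→0 produce one estimate of frame t, I1 and Ft→1 produce another, and the two warped images are blended with weights to obtain It. The weight alpha is not a single scalar; computing it has to account for temporal consistency and for occlusion. Temporal consistency means that frame t should look more like whichever input frame it is closer to in time. For occlusion, the paper assumes that every pixel in frame t is visible in at least one of frames 0 and 1. This leads to the following formula.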

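Equation 2:

$$\hat{I}_t = \frac{1}{Z} \odot \left( (1 - t)\, V_{t0} \odot g(I_0, F_{t \to 0}) + t\, V_{t1} \odot g(I_1, F_{t \to 1}) \right), \qquad Z = (1 - t)\, V_{t0} + t\, V_{t1}$$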

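Equation 2 makes the alpha of Equation 1 concrete. The factors (1-t) and t account for temporal consistency. Vt0 and Vt1 are the visibility maps that handle occlusion, with Vt0+Vt1=1: for a pixel that is not visible in frame 1 we rely on frame 0, so Vt0 is close to 1; if the pixel is visible in both frames, Vt0 and Vt1 split 50/50. The sum is then multiplied by the normalization factor 1/Z to give the final result. In practice Equation 2 is the one that is used, as noted in the code below.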
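
To make the weighting concrete, here is a minimal NumPy sketch of the blend described above: each warped candidate is weighted by its temporal factor times its visibility, and the sum is divided by Z. The function name `blend_frames`, the array shapes, and the toy inputs are my own assumptions for illustration, not part of the reference code.

```python
import numpy as np


def blend_frames(warped0, warped1, v_t0, t, eps=1e-12):
    """Blend two warped candidates for frame t using a visibility map.

    warped0, warped1: (H, W, C) float arrays, the candidates obtained by
        warping I0 and I1 toward time t.
    v_t0: (H, W, 1) float array in [0, 1]; Vt1 is taken as 1 - Vt0.
    t: scalar in (0, 1), the time of the intermediate frame.
    """
    v_t1 = 1.0 - v_t0
    w0 = (1.0 - t) * v_t0   # temporal weight times visibility, frame 0
    w1 = t * v_t1           # temporal weight times visibility, frame 1
    z = w0 + w1 + eps       # normalization factor Z (eps avoids 0/0)
    return (w0 * warped0 + w1 * warped1) / z


# Toy example: the candidate from frame 0 is all 0.0, the one from frame 1 all 1.0.
warped0 = np.zeros((2, 2, 3))
warped1 = np.ones((2, 2, 3))

# Visible in both frames (V = 0.5 each): the weights reduce to (1 - t) and t.
both = blend_frames(warped0, warped1, np.full((2, 2, 1), 0.5), t=0.25)

# Occluded in frame 1 (Vt0 = 1): the result comes entirely from frame 0.
only0 = blend_frames(warped0, warped1, np.ones((2, 2, 1)), t=0.25)
```

With equal visibility the result is the plain temporal blend (0.25 here, because t = 0.25 gives frame 1 a weight of 0.25); with Vt0 = 1 the frame-1 candidate is ignored completely.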

Equation 2 gives us frame t, but we still do not have the two flows Ft→0 and Ft→1. The paper's approach is to first compute the two flow maps F1→0 and F0→1.

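Equation 3:

$$\hat{F}_{t \to 1} = (1 - t)\, F_{0 \to 1} \quad \text{or} \quad \hat{F}_{t \to 1} = -(1 - t)\, F_{1 \to 0}$$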

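Equation 3 gives the flow Ft→1. It relies on the assumption that the flow field is locally smooth, which I take to mean that the flow is roughly the same within a local region; otherwise the derivation of Equation 3 does not seem to hold.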

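Equation 4:

$$\hat{F}_{t \to 0} = -(1 - t)\, t\, F_{0 \to 1} + t^2 F_{1 \to 0}$$
$$\hat{F}_{t \to 1} = (1 - t)^2 F_{0 \to 1} - t (1 - t)\, F_{1 \to 0}$$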

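Taking a weighted sum of the two formulas in Equation 3 gives the final Ft→1; Ft→0 is obtained in the same way.
In this way Ft→1 and Ft→0 are both obtained from F1→0 and F0→1.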
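
The weighted sum can be checked numerically. The sketch below assumes the two candidates for each flow are combined with weights (1 - t) and t; that choice is my reading, picked because it reproduces the `Fdasht0` and `Fdasht1` expressions in the code further down. The function name and the toy flows are hypothetical.

```python
import numpy as np


def approximate_intermediate_flows(f01, f10, t):
    """Return (Ft->0, Ft->1) approximated from F0->1 and F1->0."""
    # Two candidates for each flow, borrowed from F0->1 or from F1->0.
    ft1_from_f01 = (1.0 - t) * f01
    ft1_from_f10 = -(1.0 - t) * f10
    ft0_from_f01 = -t * f01
    ft0_from_f10 = t * f10
    # Weighted sum of the candidates, with weights (1 - t) and t (assumed).
    ft0 = (1.0 - t) * ft0_from_f01 + t * ft0_from_f10
    ft1 = (1.0 - t) * ft1_from_f01 + t * ft1_from_f10
    return ft0, ft1


# Constant motion: everything moves 4 px to the right between frames 0 and 1.
f01 = np.zeros((3, 3, 2))
f01[..., 0] = 4.0
f10 = -f01
ft0, ft1 = approximate_intermediate_flows(f01, f10, t=0.25)

# Closed forms, written the same way as Fdasht0 / Fdasht1 in the code below.
t = 0.25
closed_ft0 = (-1 * (1 - t) * t * f01) + (t * t * f10)
closed_ft1 = ((1 - t) * (1 - t) * f01) - (t * (1 - t) * f10)
```

For this constant motion the result is what one would expect: from t = 0.25, frame 0 is 1 px back (`ft0` is -1 in x) and frame 1 is 3 px ahead (`ft1` is 3 in x).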

Network architecture

[Figure 4: overall network architecture]

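The overall network is shown in Figure 4. Both the first and the second network are U-Nets (see the paper for details), which are fully convolutional.
The flow computation network takes the two frames I0 and I1 as input and outputs the two flow maps F0→1 and F1→0. In principle we could then compute Ft→1 and Ft→0 directly with Equation 4, but the paper notes that the result is poor in regions where the flow is not smooth, so a second network is used to refine it.
The input to the second network combines six items: the two frames I1 and I0, the Ft→1 and Ft→0 computed from the output of network 1, the It warped from I1 with Ft→1, and the It warped from I0 with Ft→0. Its output has three parts: the corrections to Ft→1 and Ft→0, and Vt0.
Vt0 is the visibility map of t with respect to frame 0; Vt0 and Vt1 are required to sum to 1.
Adding the corrections to the previously computed Ft→1 and Ft→0 gives the final Ft→1 and Ft→0. At this point every required quantity is available, and Equation 2 gives the final frame It.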
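
Every "It computed from I0 and Ft→0" step above is a backward warp: each output pixel looks up the source image at its own position displaced by the flow. The sketch below is a plain NumPy version with bilinear sampling. The channel layout of the flow (horizontal first, then vertical) and the border clamping are my assumptions; the reference code's `flow_back_wrap` may differ in both.

```python
import numpy as np


def backward_warp(image, flow):
    """Sample `image` at positions displaced by `flow` (bilinear).

    image: (H, W, C) float array.
    flow:  (H, W, 2) float array; flow[..., 0] is the horizontal and
           flow[..., 1] the vertical displacement in pixels (assumed layout).
    Output pixel (y, x) takes the value of image at (y + v, x + u),
    with coordinates clamped to the image border.
    """
    h, w = image.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]   # fractional parts, broadcast over channels
    wy = (y - y0)[..., None]
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom


# Toy check: a horizontal ramp (pixel value = column index) and a flow of +1 px in x.
ramp = np.tile(np.arange(5, dtype=float), (4, 1))[..., None]   # shape (4, 5, 1)
flow = np.zeros((4, 5, 2))
flow[..., 0] = 1.0
warped = backward_warp(ramp, flow)
identity = backward_warp(ramp, np.zeros((4, 5, 2)))
```

Each output pixel now holds the value of its right-hand neighbor, except the last column, which is clamped to the border. A zero flow returns the image unchanged.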

Loss functions

The loss function is a weighted sum of four terms.

Reconstruction loss

The reconstruction loss is the per-pixel L1 loss between the ground-truth It and the predicted It.

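Perceptual loss

The perceptual loss feeds both versions of It into a VGG network, takes the features from an intermediate layer, and computes the L2 loss between them; for a good reconstruction these features should match.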

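Warping loss

The warping loss (named `wrapping_loss` in the code) has four terms: the L1 loss between I0 and the I0 reconstructed from I1 and F0→1; the L1 loss between I1 and the I1 reconstructed from I0 and F1→0; the L1 loss between It and the It reconstructed from I0 and Ft→0; and the L1 loss between It and the It reconstructed from I1 and Ft→1. Here Ft→1 and Ft→0 are the ones computed from the output of the first network, not the final refined values.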

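Smoothness loss

The smoothness loss asks neighboring pixels in F0→1 and F1→0 to have similar flow values; it is computed from the first-order differences of the flow maps.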
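
As a quick illustration of the first-order difference, here is a NumPy sketch for a single flow map that uses the same slicing as the `smoothness_loss` function shown later. The helper name `smoothness` and the toy flow fields are hypothetical.

```python
import numpy as np


def smoothness(flow):
    """Mean absolute difference between neighboring flow vectors.

    flow: (N, H, W, 2) array holding a batch of flow maps.
    """
    dy = np.abs(flow[:, 1:, :, :] - flow[:, :-1, :, :]).mean()   # vertical neighbors
    dx = np.abs(flow[:, :, 1:, :] - flow[:, :, :-1, :]).mean()   # horizontal neighbors
    return dx + dy


# A constant flow field is perfectly smooth.
constant = np.full((1, 4, 4, 2), 3.0)

# A flow whose horizontal component grows by 1 px per column is not.
ramp = np.zeros((1, 4, 4, 2))
ramp[0, :, :, 0] = np.arange(4)

loss_constant = smoothness(constant)   # 0.0
loss_ramp = smoothness(ramp)           # 0.5: channel 0 differs by 1, channel 1 by 0
```

The penalty is zero only when neighboring flow vectors are identical, which matches the intuition of "neighboring pixels should move together".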

The relevant code follows, with comments.

Code walkthrough

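import collections

import tensorflow as tf  # TensorFlow 1.x API (tf.variable_scope, tf.contrib)

# UNet, lrelu, flow_back_wrap, VGG19_slim, l1_loss and l2_loss are defined
# elsewhere in the reference code.


# SloMo vanilla model: this function covers the whole network construction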
def SloMo_model(frame0, frame1, frameT, FLAGS, reuse=False, timestamp=0.5):
    # Define the container of the parameter
    if FLAGS is None:
        raise ValueError('No FLAGS is provided for generator')

    Network = collections.namedtuple('Network', 'total_loss, reconstruction_loss, perceptual_loss, \
                                                wrapping_loss,  smoothness_loss, pred_frameT   \
                                                Ft0, Ft1, Vt0,\
                                                grads_and_vars, train, global_step, learning_rate')
    with tf.variable_scope("SloMo_model", reuse=reuse):
        with tf.variable_scope("flow_computation"):
            flow_comp_input = tf.concat([frame0, frame1], axis=3)
            flow_comp_out, flow_comp_enc_out = UNet(flow_comp_input,
                                                    output_channels=4,  # 2 channel for each flow
                                                    first_kernel=FLAGS.first_kernel,
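                                                    second_kernel=FLAGS.second_kernel)
            # Build the first (flow computation) network as a U-Net; it has two outputs:
            # the first is the bidirectional flow between the two frames (4 channels, 2 per flow);
            # the second is the encoder output from the middle of the network.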
            flow_comp_out = lrelu(flow_comp_out)
            F01, F10 = flow_comp_out[:, :, :, :2], flow_comp_out[:, :, :, 2:]   # split the first output into the two flow maps
            print("Flow Computation Graph Initialized !!!!!! ")

        with tf.variable_scope("flow_interpolation"):
            Fdasht0 = (-1 * (1 - timestamp) * timestamp * F01) + (timestamp * timestamp * F10)
            Fdasht1 = ((1 - timestamp) * (1 - timestamp) * F01) - (timestamp * (1 - timestamp) * F10)
            # Compute the flows from time t to frames 0 and 1 from the two flow maps (Equation 4)
            flow_interp_input = tf.concat([frame0, frame1,
                                           flow_back_wrap(frame1, Fdasht1),
                                           flow_back_wrap(frame0, Fdasht0),
                                           Fdasht0, Fdasht1], axis=3)
            # Input to the second network: frames 0 and 1, the two frames at time t synthesized
            # by flow_back_wrap (one from Fdasht0, one from Fdasht1), and the two flow maps
            # Fdasht0 and Fdasht1; six tensors in total
            flow_interp_output, _ = UNet(flow_interp_input,
                                         output_channels=5,  # 2 channels for each flow, 1 visibilty map.
                                         decoder_extra_input=flow_comp_enc_out,
                                         first_kernel=3,
                                         second_kernel=3)
            # Build the second network with 5 output channels; the extra input is injected
            # midway, into the decoder part of the network
            deltaFt0, deltaFt1, Vt0 = flow_interp_output[:, :, :, :2], flow_interp_output[:, :, :, 2:4], \
                                      flow_interp_output[:, :, :, 4:5]
            # Split the output into two flow corrections (for t->0 and t->1) and one visibility-map channel
            deltaFt0 = lrelu(deltaFt0)
            deltaFt1 = lrelu(deltaFt1)
            Vt0 = tf.sigmoid(Vt0)
            Vt0 = tf.tile(Vt0, [1, 1, 1, 3])  # Copy same in all three channels
            Vt1 = 1 - Vt0

            Ft0, Ft1 = Fdasht0 + deltaFt0, Fdasht1 + deltaFt1  # refine the two flows with the output of network 2; these are the final flows

            normalization_factor = 1 / ((1 - timestamp) * Vt0 + timestamp * Vt1 + FLAGS.epsilon)
            pred_frameT = tf.multiply((1 - timestamp) * Vt0, flow_back_wrap(frame0, Ft0)) + \
                          tf.multiply(timestamp * Vt1, flow_back_wrap(frame1, Ft1))
            pred_frameT = tf.multiply(normalization_factor, pred_frameT)
            # Equation 2: synthesize the final image, taking the visibility maps into account
            print("Flow Interpolation Graph Initialized !!!!!! ")

    rec_loss = reconstruction_loss(pred_frameT, frameT)  # reconstruction loss: per-pixel L1 between the ground-truth and the generated frame T
    percep_loss = perceptual_loss(pred_frameT, frameT, layers=FLAGS.perceptual_mode)
    # Perceptual loss: feed the ground-truth and generated frames into VGG and compute the L2 loss on intermediate-layer features
    wrap_loss = wrapping_loss(frame0, frame1, frameT, F01, F10, Fdasht0, Fdasht1)
    # Warping loss: L1 between each ground-truth frame and its warped reconstruction:
    # frame0 from frame1 and F01, frame1 from frame0 and F10,
    # frameT from frame0 and Fdasht0, frameT from frame1 and Fdasht1
    smooth_loss = smoothness_loss(F01, F10)
    # Smoothness loss: F01 and F10 should be as smooth as possible, i.e. neighboring pixels should have similar flow

    total_loss = FLAGS.reconstruction_scaling * rec_loss + \
                 FLAGS.perceptual_scaling * percep_loss + \
                 FLAGS.wrapping_scaling * wrap_loss + \
                 FLAGS.smoothness_scaling * smooth_loss
    # Weighted sum of the losses

    # The model construction ends here; what follows sets up training in the standard way

    with tf.variable_scope("global_step_and_learning_rate", reuse=reuse):
        global_step = tf.contrib.framework.get_or_create_global_step()
        learning_rate = tf.train.exponential_decay(FLAGS.learning_rate, global_step, FLAGS.decay_step,
                                                   FLAGS.decay_rate,
                                                   staircase=FLAGS.stair)
        incr_global_step = tf.assign(global_step, global_step + 1)

    with tf.variable_scope("optimizer", reuse=reuse):
        with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
            tvars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='SloMo_model')
            optimizer = tf.train.AdamOptimizer(learning_rate, beta1=FLAGS.beta)
            grads_and_vars = optimizer.compute_gradients(total_loss, tvars)
            train_op = optimizer.apply_gradients(grads_and_vars)

    return Network(
        total_loss=total_loss,
        reconstruction_loss=rec_loss,
        perceptual_loss=percep_loss,
        wrapping_loss=wrap_loss,
        smoothness_loss=smooth_loss,
        pred_frameT=pred_frameT,
        Ft0=Ft0,
        Ft1=Ft1,
        Vt0=Vt0,
        grads_and_vars=grads_and_vars,
        train=tf.group(total_loss, incr_global_step, train_op),
        global_step=global_step,
        learning_rate=learning_rate
    )
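
The learning-rate schedule above uses `tf.train.exponential_decay`, which multiplies the base rate by `decay_rate ** (global_step / decay_steps)`; with `staircase=True` the exponent is truncated to an integer, so the rate drops in steps. A plain-Python sketch of that rule follows; the numeric values are hypothetical and do not come from the reference code's FLAGS.

```python
def exponential_decay(base_lr, step, decay_steps, decay_rate, staircase=False):
    """Plain-Python version of the exponential decay rule."""
    exponent = step / decay_steps
    if staircase:
        exponent = step // decay_steps   # integer division: piecewise-constant rate
    return base_lr * decay_rate ** exponent


# Hypothetical settings: halve the rate every 1000 steps.
lr_smooth = exponential_decay(1e-4, step=1500, decay_steps=1000, decay_rate=0.5)
lr_stair = exponential_decay(1e-4, step=1500, decay_steps=1000, decay_rate=0.5,
                             staircase=True)
```

At step 1500 the staircase variant has applied exactly one halving, while the smooth variant is already partway to the second.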

Here are the implementations of the losses.

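def reconstruction_loss(Ipred, Iref):  # signature inferred from the call site; the def line is missing here
    Ipred = tf.image.convert_image_dtype(Ipred, dtype=tf.uint8)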
    Iref = tf.image.convert_image_dtype(Iref, dtype=tf.uint8)

    Ipred = tf.cast(Ipred, dtype=tf.float32)
    Iref = tf.cast(Iref, dtype=tf.float32)

    # tf.reduce_mean(tf.norm(tf.math.subtract(Ipred, Iref), ord=1, axis=[3]))
    return l1_loss(Ipred, Iref)


def perceptual_loss(Ipred, Iref, layers="VGG54"):  # uses the VGG54 layer features
    # Note name scope is ignored in variable naming (scope)
    with tf.name_scope("vgg19_Ipred"):
        Ipred_features = VGG19_slim(Ipred, layers, reuse=tf.AUTO_REUSE)
    with tf.name_scope("vgg19_Iref"):
        Iref_features = VGG19_slim(Iref, layers, reuse=tf.AUTO_REUSE)

    return l2_loss(Ipred_features, Iref_features)


def wrapping_loss(frame0, frame1, frameT, F01, F10, Fdasht0, Fdasht1):
    return l1_loss(frame0, flow_back_wrap(frame1, F01)) + \
           l1_loss(frame1, flow_back_wrap(frame0, F10)) + \
           l1_loss(frameT, flow_back_wrap(frame0, Fdasht0)) + \
           l1_loss(frameT, flow_back_wrap(frame1, Fdasht1))


def smoothness_loss(F01, F10):  # compute the deltas: shift the flow map by one pixel and subtract
    deltaF01 = tf.reduce_mean(tf.abs(F01[:, 1:, :, :] - F01[:, :-1, :, :])) + tf.reduce_mean(
        tf.abs(F01[:, :, 1:, :] - F01[:, :, :-1, :]))
    deltaF10 = tf.reduce_mean(tf.abs(F10[:, 1:, :, :] - F10[:, :-1, :, :])) + tf.reduce_mean(
        tf.abs(F10[:, :, 1:, :] - F10[:, :, :-1, :]))
    return 0.5 * (deltaF01 + deltaF10)

These are my personal reading notes on the paper. If you spot any mistakes, corrections are welcome. Thanks.

Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation (paper reading notes)
