How to use TensorFlow tf.train.string_input_producer to produce several epochs data? - Stack Overflow

When I wanted to use tf.train.string_input_producer to load data for 2 epochs, I used:

filename_queue = tf.train.string_input_producer(filenames=['data.csv'], num_epochs=2, shuffle=True)

col1_batch, col2_batch, col3_batch = tf.train.shuffle_batch(
    [col1, col2, col3], batch_size=batch_size, capacity=capacity,
    min_after_dequeue=min_after_dequeue, allow_smaller_final_batch=True)

But then I found that this op did not produce what I wanted.

It produces each sample in data.csv exactly 2 times, but the order of the output is not what I expect. For example, with 3 lines of data in data.csv:

[[1]
[2]
[3]]

it will produce (each sample appears exactly 2 times, but the order is arbitrary):

[1]
[1]
[3]
[2]
[2]
[3]

but what I want is (each epoch kept separate, shuffled within each epoch):

(epoch 1:)
[1]
[2]
[3]
(epoch 2:)
[1]
[3]
[2]
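In plain Python, the behavior I want can be sketched like this (the helper name per_epoch_shuffle is purely illustrative, not a TensorFlow API):

```python
import random

def per_epoch_shuffle(samples, num_epochs, seed=0):
    # Reshuffle a fresh copy of the samples at the start of every epoch,
    # so shuffling never crosses an epoch boundary.
    rng = random.Random(seed)
    for _ in range(num_epochs):
        epoch = list(samples)
        rng.shuffle(epoch)
        yield from epoch

stream = list(per_epoch_shuffle([1, 2, 3], num_epochs=2))
# The first 3 elements are one permutation of [1, 2, 3];
# the last 3 elements are another, independent permutation.
```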

In addition, how can I know when one epoch is done? Is there some flag variable? Thanks!

My code is here:

import tensorflow as tf

def read_my_file_format(filename_queue):
    reader = tf.TextLineReader()
    key, value = reader.read(filename_queue)
    record_defaults = [['1'], ['1'], ['1']]  
    col1, col2, col3 = tf.decode_csv(value, record_defaults=record_defaults, field_delim='-')
    # col1 = list(map(int, col1.split(',')))
    # col2 = list(map(int, col2.split(',')))
    return col1, col2, col3

def input_pipeline(filenames, batch_size, num_epochs=1):
  filename_queue = tf.train.string_input_producer(
    filenames, num_epochs=num_epochs, shuffle=True)
  col1,col2,col3 = read_my_file_format(filename_queue)

  min_after_dequeue = 10
  capacity = min_after_dequeue + 3 * batch_size
  col1_batch, col2_batch, col3_batch = tf.train.shuffle_batch(
    [col1, col2, col3], batch_size=batch_size, capacity=capacity,
    min_after_dequeue=min_after_dequeue, allow_smaller_final_batch=True)
  return col1_batch, col2_batch, col3_batch

filenames=['1.txt']
batch_size = 3
num_epochs = 1
a1,a2,a3=input_pipeline(filenames, batch_size, num_epochs)

with tf.Session() as sess:
  sess.run(tf.local_variables_initializer())
  # start populating filename queue
  coord = tf.train.Coordinator()
  threads = tf.train.start_queue_runners(coord=coord)
  try:
    while not coord.should_stop():
      a, b, c = sess.run([a1, a2, a3])
      print(a, b, c)
  except tf.errors.OutOfRangeError:
    print('Done training, epoch reached')
  finally:
    coord.request_stop()

  coord.join(threads) 

My data looks like this:

1,2-3,4-A
7,8-9,10-B
12,13-14,15-C
17,18-19,20-D
22,23-24,25-E
27,28-29,30-F
32,33-34,35-G
37,38-39,40-H
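For reference, field_delim='-' makes tf.decode_csv treat '-' (not ',') as the column separator, so each line above yields three string columns; the comma-separated numbers inside the first two columns stay as raw strings. The equivalent split in plain Python:

```python
line = "1,2-3,4-A"
# field_delim='-' splits the line into three string columns;
# "1,2" and "3,4" are not parsed further at this stage.
col1, col2, col3 = line.split('-')
print(col1, col2, col3)  # 1,2 3,4 A
```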

————————————————
As Nicolas observes, the tf.train.string_input_producer() API does not give you the ability to detect when the end of an epoch is reached; instead it concatenates together all epochs into one long batch. For this reason, we recently added (in TensorFlow 1.2) the tf.contrib.data API, which makes it possible to express more sophisticated pipelines, including your use case.

The following code snippet shows how you would write your program using tf.contrib.data:

import tensorflow as tf

def input_pipeline(filenames, batch_size):
    # Define a `tf.contrib.data.Dataset` for iterating over one epoch of the data.
    dataset = (tf.contrib.data.TextLineDataset(filenames)
               .map(lambda line: tf.decode_csv(
                    line, record_defaults=[['1'], ['1'], ['1']], field_delim='-'))
               .shuffle(buffer_size=10)  # Equivalent to min_after_dequeue=10.
               .batch(batch_size))

    # Return an *initializable* iterator over the dataset, which will allow us to
    # re-initialize it at the beginning of each epoch.
    return dataset.make_initializable_iterator() 

filenames=['1.txt']
batch_size = 3
num_epochs = 10
iterator = input_pipeline(filenames, batch_size)

# `a1`, `a2`, and `a3` represent the next element to be retrieved from the iterator.    
a1, a2, a3 = iterator.get_next()

with tf.Session() as sess:
    for _ in range(num_epochs):
        # Resets the iterator at the beginning of an epoch.
        sess.run(iterator.initializer)

        try:
            while True:
                a, b, c = sess.run([a1, a2, a3])
                print(a, b, c)
        except tf.errors.OutOfRangeError:
            # This will be raised when you reach the end of an epoch (i.e. the
            # iterator has no more elements).
            pass                 

        # Perform any end-of-epoch computation here.
        print('Done training, epoch reached')

————————————————
You might want to have a look at this answer to a similar question.

The short story is that:

  • if num_epochs > 1, all the data is enqueued at the same time and shuffled independently of the epoch,

  • so you don't have the ability to monitor which epoch is being dequeued.

What you could do is follow the first suggestion in the quoted answer: work with num_epochs == 1, and reinitialise the local queue variables (and obviously not the model variables) at each run.

init_queue = tf.variables_initializer(tf.get_collection(tf.GraphKeys.LOCAL_VARIABLES, scope='input_producer'))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    for e in range(num_epochs):
        # Reinitialize the local variables in the input_producer scope,
        # which resets the epoch counter of the filename queue.
        sess.run(init_queue)
        # Start populating the filename queue.
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)
        try:
            while not coord.should_stop():
                a, b, c = sess.run([a1, a2, a3])
                print(a, b, c)
        except tf.errors.OutOfRangeError:
            print('Done training, epoch reached')
        finally:
            coord.request_stop()
        coord.join(threads)
