The dataset I generated is on GitHub and includes the training, test, and validation sets:
GitHub - wolverinn/HEVC-CU-depths-dataset: A dataset that contains the Coding Unit image files and their corresponding depths for HEVC intra-prediction.
Some more detailed work is needed to prepare the dataset. The first step is splitting the data into training, validation, and test sets. All YUV sequences used in the dataset are listed below:
type | Train | Validation | Test
---|---|---|---
4K | Bund-Nightscape_3840x2160_30<br>Fountains_3840x2160_30<br>Residential-Building_3840x2160_30<br>Scarf_3840x2160_30<br>Tall-Buildings_3840x2160_30<br>Traffic-and-Building_3840x2160_30<br>Traffic-Flow_3840x2160_30<br>Tree-Shade_3840x2160_30<br>Wood_3840x2160_30 | Campfire-Party_3840x2160_30<br>Library_3840x2160_30<br>Runners_3840x2160_30 | Construction-Field_3840x2160_30<br>Marathon_3840x2160_30<br>Rush-Hour_3840x2160_30
2K | NebutaFestival_2560x1600_60<br>SteamLocomotiveTrain_2560x1600_60 | PeopleOnStreet_2560x1600_30 | Traffic_2560x1600_30
1080p | BasketballDrive_1920x1080_50<br>Kimono1_1920x1080_24<br>Tennis_1920x1080_24<br>ParkScene_1920x1080_24 | BQTerrace_1920x1080_60 | Cactus_1920x1080_50
720p | FourPeople_1280x720_60<br>SlideEditing_1280x720_30<br>Johnny_1280x720_60 | SlideShow_1280x720_20 | KristenAndSara_1280x720_60
480p | BasketballDrill_832x480_50<br>Keiba_832x480_30<br>RaceHorses_832x480_30 | Flowervase_832x480_30<br>Mobisode2_832x480_30 | BQMall_832x480_60<br>PartyScene_832x480_50
288 | waterfall_352x288_20<br>bridge-far_352x288_20<br>flower_352x288_20<br>highway_352x288_20<br>news_352x288_20<br>paris_352x288_20 | akiyo_352x288_20<br>coastguard_352x288_20 | bridge-close_352x288_20<br>container_352x288_20
240 | BasketballPass_416x240_50 | BlowingBubbles_416x240_50 | BQSquare_416x240_60
Due to storage constraints and the high similarity between adjacent frames, training does not use every frame extracted from a YUV file. Instead, a subset of frames is selected with a sampling step n given by the following formula:

n = ⌊(25 + f / 40) / 4⌋

Frames 2 and 27 of every YUV file are always extracted; after that, every n-th frame is extracted (i.e., frames 27 + n, 27 + 2n, ..., up to the end of the file), where f is the total number of frames of the YUV file. Code:
```python
import math
import os
from PIL import Image

# WORKSPACE_PATH and IMG_PATH are constants defined elsewhere in the script

def crop_image_to_ctu(video_number):
    frames = len(os.listdir("{}\\temp-frames".format(WORKSPACE_PATH)))  # total number of frames of the current video
    random_frames = [2, 27]
    n = int((25 + frames / 40) // 4)
    for i in range(frames // n):
        f_index = 27 + n * (i + 1)
        if f_index > frames:
            break
        else:
            random_frames.append(f_index)  # the remaining frame numbers follow from the formula above
    for image_file in os.listdir("{}\\temp-frames".format(WORKSPACE_PATH)):
        frame_number = int(image_file.split('_')[2]) - 1  # ffmpeg numbers frames from 1; subtract 1 so numbering starts at 0 and matches the CTU partition info
        if frame_number in random_frames:
            img = Image.open(os.path.join("{}\\temp-frames".format(WORKSPACE_PATH), image_file))
            img_width, img_height = img.size
            ctu_number_per_img = math.ceil(img_width / 64) * math.ceil(img_height / 64)
            for ctu_number in range(ctu_number_per_img):
                img_row = ctu_number // math.ceil(img_width / 64)
                img_column = ctu_number % math.ceil(img_width / 64)  # bug fix: the column index is modulo the number of CTUs per row, i.e. ceil(width / 64), not ceil(height / 64)
                start_pixel_x = img_column * 64
                start_pixel_y = img_row * 64
                cropped_img = img.crop((start_pixel_x, start_pixel_y, start_pixel_x + 64, start_pixel_y + 64))  # crop the selected frame CTU by CTU
                cropped_img.save("{}\\v_{}_{}_{}_.jpg".format(IMG_PATH, video_number, str(frame_number), str(ctu_number)))
            img.close()
            dump_ctu_file(video_number, str(frame_number))  # save this frame's CTU partition info to a new file; only selected frames are kept
        os.remove(os.path.join("{}\\temp-frames".format(WORKSPACE_PATH), image_file))  # delete each frame once it has been processed
    print("Total frames extracted from video_{} : {}".format(video_number, len(random_frames)))
```
While generating the dataset, I found that if the CTU partition info of all frames of a YUV file is saved to a txt file, a single YUV file produces tens of megabytes of text, and a 1080p YUV can produce over a hundred megabytes. Loading each CTU's partition info from such a file then takes time during training and hurts training efficiency. The first improvement is to not save all of the partition info: only the CTU partition info of the sampled frames needs to be kept, which greatly reduces storage. Second, the data structure used for storage can be optimized: with a plain text file, every lookup has to scan from the first line until the CTU of the target frame is found, which is also inefficient. So I store the CTU partition info that needs to be saved in a Python dictionary (dict), structured as follows:
```python
video_0 = {
    "frame_2": {
        "ctu_0": [...],  # the 16x16 partition depths of one CTU
        "ctu_1": [...],
        # ...
        "ctu_103": [...]
    },
    "frame_27": {
        # ...
    }
}
```
This way, the partition info can be looked up quickly from an image's frame number and CTU index. The dictionary is serialized with `pickle`, Python's built-in persistence library, into `v_0.pkl`, which can then be loaded conveniently when needed. The code is as follows:
```python
import os
import pickle

def dump_ctu_file(video_number, frame_number):
    # Save the CTU partition info of one selected frame into the pickle file:
    # {"frame_number_1": {"ctu_number_1": [...], "ctu_number_2": [...]}, "frame_number_2": ...}
    frame_detected = 0
    ctu_number = 0
    temp_ctu = []
    pkl_filename = "v_{}.pkl".format(video_number)
    if os.path.exists(pkl_filename):
        with open(pkl_filename, 'rb') as f_pkl:
            video_dict = pickle.load(f_pkl)
    else:  # first call for this video: start from an empty dict
        video_dict = {}
    video_dict[frame_number] = {}
    with open(CtuInfo_FILENAME, 'r') as f:
        for line in f:
            if frame_detected == 0:
                if "frame" in line:
                    current_frame = line.split(':')[1]
                    if int(frame_number) == int(current_frame):
                        frame_detected = 1  # found the target frame in the txt file
            elif "frame" in line:
                break  # reached the next frame: all CTUs of the target frame have been read
            elif "ctu" in line:
                temp_ctu = []
                ctu_number = int(line.split(':')[1])
                video_dict[frame_number][str(ctu_number)] = []
            else:
                line_depths = line.split(' ')
                for index in range(16):  # each line of the txt file holds 16 depth values
                    temp_ctu.append(int(line_depths[index]))
                video_dict[frame_number][str(ctu_number)] = temp_ctu
    with open(pkl_filename, 'wb') as f_pkl:
        pickle.dump(video_dict, f_pkl)
```
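Reading the labels back is then just a matter of loading the pickle file and indexing by frame and CTU number. A minimal sketch (the image name `v_0_2_15_.jpg` is a hypothetical example that follows the naming convention above):

```python
import pickle

# Load the partition info of video 0, as written by dump_ctu_file above
with open("v_0.pkl", "rb") as f_pkl:
    video_dict = pickle.load(f_pkl)

# The image v_0_2_15_.jpg is frame 2, CTU 15 of video 0, so its label
# is a direct dictionary lookup instead of a scan through a txt file:
depths = video_dict["2"]["15"]  # the 16x16 grid of depth values for this CTU
```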
Data preprocessing is an important and time-consuming part of training a neural network. It has to take into account the formatted input the network expects, how to match each image with its label, and how to optimize loading time, and then process the raw data with rules derived from these requirements. Only when the data preprocessing is done well can the training stage focus on the neural network itself instead of on tedious data-handling rules.
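As an illustration of how these pieces could fit together at training time, below is a minimal sketch of a PyTorch-style `Dataset` that pairs each cropped CTU image with its depth label. This is a sketch only: the class name `CtuDepthDataset` and the directory parameters are hypothetical, and it assumes the `v_{video}_{frame}_{ctu}_.jpg` and `v_{video}.pkl` naming used above:

```python
import os
import pickle

import torch
from PIL import Image
from torch.utils.data import Dataset

class CtuDepthDataset(Dataset):
    # A sketch only: pairs each cropped 64x64 CTU image with its depth label
    def __init__(self, img_dir, pkl_dir):
        self.img_dir = img_dir
        self.pkl_dir = pkl_dir
        self.images = sorted(os.listdir(img_dir))  # v_{video}_{frame}_{ctu}_.jpg
        self.label_cache = {}  # load each v_{video}.pkl only once

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img_name = self.images[idx]
        video, frame, ctu = img_name.split('_')[1:4]
        if video not in self.label_cache:
            with open(os.path.join(self.pkl_dir, "v_{}.pkl".format(video)), 'rb') as f:
                self.label_cache[video] = pickle.load(f)
        depths = self.label_cache[video][frame][ctu]
        img = Image.open(os.path.join(self.img_dir, img_name)).convert('L')
        x = torch.tensor(list(img.getdata()), dtype=torch.float32).view(1, 64, 64) / 255.0
        y = torch.tensor(depths, dtype=torch.long)
        return x, y
```

With something like this, a DataLoader can batch the 64x64 luminance blocks and their depth labels directly, and each video's pickle file is only deserialized once.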
The GitHub address of the dataset I generated: