一心傀、問題描述和基本要求
對一幅BMP格式的灰度圖像(如個(gè)人證件照片)進(jìn)行二元霍夫曼編碼和譯碼。
二、算法思想
霍夫曼在1952年提出了一中構(gòu)造最佳碼的方法魂务,這種編碼方法被稱為霍夫曼編碼∶谏洌霍夫曼編碼適用于多元獨(dú)立信源粘姜,對于多元獨(dú)立信源來說他是最佳碼。它充分利用了信源概率分布的特性進(jìn)行編碼熔酷,完全依據(jù)字符出現(xiàn)概率來構(gòu)造異字頭的平均長度最短的碼字孤紧,因此它是一種最佳的逐個(gè)符號的編碼方法。
元霍夫曼碼的編碼步驟如下:
① 將個(gè)信源符號按概率分布
的大小拒秘,以遞減的次序排列起來号显,設(shè)
② 為了使最后一步信源的縮減信源有
其中押蚤,
③ 用
④ 以此繼續(xù)下去,直到縮減信源最后只剩
霍夫曼編碼過程中生成的碼樹是霍夫曼樹紊馏。假如采用二元編碼料饥,那么他的碼樹二元霍夫曼樹。在數(shù)據(jù)結(jié)構(gòu)中朱监,二元霍夫曼樹也稱為最優(yōu)二元樹岸啡。若對他的葉全部賦權(quán)值,那么權(quán)值乘上葉子的高度的和最小赫编,這也就是二元霍夫曼編碼的平均碼長巡蘸。
進(jìn)行霍夫曼編碼奋隶,有靜態(tài)霍夫曼編碼和自適應(yīng)霍夫曼編碼(常用的算法有Faller-Gallagher-Knuth算法,簡稱FGK算法悦荒,后來針對FGK算法存在的問題唯欣,Vitter進(jìn)行了改進(jìn),稱為Vitter算法)搬味。本實(shí)驗(yàn)采用靜態(tài)霍夫曼編碼境氢。
靜態(tài)霍夫曼編碼,首先掃描第一遍輸入符號流碰纬,統(tǒng)計(jì)各個(gè)符號出現(xiàn)的概率萍聊,之后按照上述算法構(gòu)造二元最優(yōu)樹,再對輸入符號流進(jìn)行第二遍掃描進(jìn)行編碼悦析。
二元的霍夫曼編碼的過程和構(gòu)造一棵最優(yōu)二元樹的過程是等價(jià)的寿桨。一棵最優(yōu)二元樹,若從他的根開始進(jìn)行遍歷操作强戴,左結(jié)點(diǎn)遍歷字符串插入0亭螟,右結(jié)點(diǎn)遍歷字符串插入1,那么最后葉結(jié)點(diǎn)的得到的遍歷字符串即為碼字骑歹。
構(gòu)造一棵二元最優(yōu)樹的算法可以描述為:
① 令
② 從S中取出兩個(gè)最小的權(quán)
③ 令
④ 判斷
帶有上述
初始化優(yōu)先級隊(duì)列PQ
把所有帶權(quán)重的結(jié)點(diǎn)放入PQ中
while PQ的大小 > 1:
取出PQ的頂端結(jié)點(diǎn)v_1伪节,帶權(quán)重w_1
取出PQ的頂端結(jié)點(diǎn)v_2光羞,帶權(quán)重w_2
構(gòu)造新結(jié)點(diǎn)Father绩鸣,使之左孩子為Node1,右孩子為Node2纱兑,權(quán)重為w_1+w_2
將Father放入到PQ中
帶有PQ的唯一權(quán)重的對應(yīng)節(jié)點(diǎn)是構(gòu)造好最優(yōu)二元樹的根結(jié)點(diǎn)呀闻。
之后,再對構(gòu)造好的最優(yōu)二元樹進(jìn)行遍歷即可得到碼字潜慎。
靜態(tài)霍夫曼編碼容易實(shí)現(xiàn)捡多,但要提前統(tǒng)計(jì)被編碼對象中的符號出現(xiàn)概率,因此必須對輸入符號流進(jìn)行兩遍掃描勘纯,這也限制了靜態(tài)霍夫曼編碼的應(yīng)用局服。
三、源代碼
#!/usr/bin/env python3
import os
import pickle
from math import log
from queue import PriorityQueue, Queue
# python3 -m pip install matplotlib
import matplotlib.pyplot as plt
# python3 -m pip install numpy
import numpy as np
# python3 -m pip install opencv-python
from cv2 import IMREAD_GRAYSCALE, imread, imshow, imwrite, absdiff
# python3 -m pip install prettytable
from prettytable import PrettyTable
class StaticHuffNode:
def __init__(self, s: int, weight: int, lchild=None, rchild=None, parent=None):
self.s = s # 信源符號
self.weight = weight # 權(quán)重
self.parent = parent # 父親
self.lchild = lchild # 左子樹
self.rchild = rchild # 右子樹
self.word = "" # 碼字
def __lt__(self, other):
return self.weight < other.weight
class WriterWrapper:
def __init__(self, f):
self.buf = ""
self.f = f
def __call__(self, con: str):
self.buf += con
while len(self.buf) >= 8:
out = self.buf[0:8]
self.buf = self.buf[8:]
conint = int(out, 2)
self.f.write(bytes([conint]))
def flush(self):
if len(self.buf) > 0:
self.buf += "0"*(7 - ((len(self.buf) - 1) % 8))
while len(self.buf) > 0:
out = self.buf[0:8]
self.buf = self.buf[8:]
conint = int(out, 2)
self.f.write(bytes([conint]))
class ReaderWrapper:
def __init__(self, f):
self.buf = ""
self.f = f
def next(self):
if self.buf != "":
c = self.buf[0]
self.buf = self.buf[1:]
return c
c = self.f.read(1)
if c == b"":
return ""
bstr = bin(ord(c))[2:]
bstr = "0"*(7 - ((len(bstr) - 1) % 8)) + bstr
self.buf = bstr
return self.next()
def read(self, c : int):
s = ""
for i in c:
s += self.next()
return s
def traverse(root, s: str, NodeStringMap: dict):
if (root.lchild == None) and (root.rchild == None):
root.word = s
NodeStringMap[root.s] = root.word
return
traverse(root.lchild, s + "0", NodeStringMap)
traverse(root.rchild, s + "1", NodeStringMap)
def StaticHuffmanEncodingAndDecoding(counter, originsize, originshape, originfilesize):
PQ = PriorityQueue()
table = PrettyTable()
table.field_names = ["信源符號", "數(shù)量", "概率"]
for item in flatten:
counter[item] += 1
for item in counter:
table.add_row([item, counter[item], counter[item] / originsize])
PQ.put(StaticHuffNode(item, counter[item]))
Hs = sum(map(lambda weight: -weight / originsize *
log(weight / originsize, 2), counter.values()))
table.sortby = "數(shù)量"
table.reversesort = True
print("完成")
print("原圖像分辨率為:{1}*{0}驳遵,作為信源淫奔,共有{2}個(gè)、{3}種信源符號堤结,熵為{4:.2f}, "
"信源概率表如下:".format(*originshape, originsize, len(counter), Hs))
print(table)
print("正在進(jìn)行靜態(tài)霍夫曼編碼...")
# 采用優(yōu)先級隊(duì)列唆迁,選擇權(quán)重最小的節(jié)點(diǎn),合并后放入隊(duì)列中繼續(xù)操作竞穷,直到隊(duì)列中只剩一個(gè)節(jié)點(diǎn)
while PQ.qsize() > 1:
left = PQ.get()
right = PQ.get()
PQ.put(StaticHuffNode(0, left.weight + right.weight, left, right))
NodeStringMap = {}
# 遍歷霍夫曼樹唐责,獲取每個(gè)信源符號對應(yīng)的碼字
root = PQ.get()
traverse(root, "", NodeStringMap)
table = PrettyTable()
table.field_names = ["信源符號", "碼字", "碼字長度", "數(shù)量", "概率"]
Len = 0
for item in counter:
table.add_row([item, NodeStringMap[item], len(NodeStringMap[item]),
counter[item], counter[item] / originsize])
Len += counter[item]*len(NodeStringMap[item])
AveLen = Len / originsize
print("編碼完成,編碼表如下:")
table.sortby = "數(shù)量"
print(table)
print("信源的熵為{:.2f}瘾带,平均碼長為{:.2f}鼠哥,編碼效率為{:.2f}%".format(
Hs, AveLen, Hs/AveLen*100))
f = open("statictarget.bin", "wb")
print("將編碼后的文件寫入到新文件statictarget.bin中。編碼采用靜態(tài)霍夫曼編碼看政,先寫入圖像尺寸和文件長度朴恳,再寫入編碼表,最后寫入編碼后圖像矩陣")
# 寫入文件分辨率和長度
f.write(pickle.dumps(originshape))
f.write(pickle.dumps(Len))
# 寫入編碼表
f.write(pickle.dumps(counter))
# 寫入圖像矩陣
wt = WriterWrapper(f)
for item in flatten:
wt(NodeStringMap[item])
wt.flush()
f.close()
TarLen = os.path.getsize("statictarget.bin")
print("新文件statictarget.bin的大小為{}字節(jié)允蚣,源文件大小為{}字節(jié)于颖,壓縮比為{:.2f}%".format(
TarLen, originfilesize, TarLen/originfilesize*100))
print("現(xiàn)在開始從新文件statictarget.bin中恢復(fù)圖像信息...")
f = open("statictarget.bin", "rb")
# 讀取圖像分辨率
originshape = pickle.load(f)
# 讀取長度
Len = pickle.load(f)
# 讀取編碼表
counter = pickle.load(f)
print("正在進(jìn)行霍夫曼譯碼...")
PQ = PriorityQueue()
for item in counter:
PQ.put(StaticHuffNode(item, counter[item]))
while PQ.qsize() > 1:
left = PQ.get()
right = PQ.get()
PQ.put(StaticHuffNode(0, left.weight + right.weight, left, right))
p = root = PQ.get()
cur = 0
rd = ReaderWrapper(f)
imgflatten = []
while (cur != Len):
c = rd.next()
cur += 1
if c == "0":
p = p.lchild
elif c == "1":
p = p.rchild
else:
raise AssertionError("c must be 0 or 1")
break
if (p.lchild == None) and (p.rchild == None):
imgflatten.append(p.s)
p = root
imgarr = np.array(imgflatten, dtype=np.uint8)
imgarr.resize(originshape)
imwrite("statictarget.bmp", imgarr)
dif = absdiff(originimg, imgarr)
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
fig = plt.figure('result')
plt.subplot(1, 3, 1)
plt.title('原圖像')
plt.imshow(originimg, plt.cm.gray)
plt.axis('off')
plt.subplot(1, 3, 2)
plt.title('編解碼后的圖像')
plt.imshow(imgarr, plt.cm.gray)
plt.axis('off')
plt.subplot(1, 3, 3)
plt.title('差異')
plt.imshow(dif, plt.cm.gray)
plt.subplots_adjust(wspace=0.15)
plt.axis('off')
plt.show()
return Len, Hs
if __name__ == "__main__":
print("請將要壓縮的圖像文件命名為origin.bmp,請保證該圖像為灰度圖像嚷兔。")
print("讀入源圖像文件...", end="")
originimg = imread("origin.bmp", IMREAD_GRAYSCALE)
originfilesize = os.path.getsize("origin.bmp")
print(f"成功森渐,源文件大小為{originfilesize}字節(jié)")
print("分析原圖像文件...", end="")
originshape = originimg.shape
originsize = originimg.size
flatten = originimg.flatten()
counter = dict.fromkeys(set(flatten), 0)
Len, Hs = StaticHuffmanEncodingAndDecoding(
counter, originsize, originshape, originfilesize)