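1. Constraints on FPGA implementations of real-time image processing
-
Timing constraints
For non-real-time image processing, timing constraints are not tight. Real-time image processing, however, has to output one pixel of data every cycle, so the computation for each pixel must complete within one pixel period.
-
Bandwidth constraints
In real-time image processing, computing a given pixel requires data from several neighboring pixels; bilinear interpolation, for example, needs the gray levels of 4 adjacent pixels. Because RAM resources and ports are limited, the 4 pixels cannot all be fetched in a single cycle. Solutions: 1. Use multiple RAM banks so that accesses can happen simultaneously. 2. Clock the RAM at double frequency so that it can be accessed twice per standard cycle. 3. Use appropriate caching.
How to handle complex operations
-
Fractional arithmetic
Convert everything to fixed-point numbers before processing.
-
Division
Division is the most resource-hungry operation. We recommend implementing it with a lookup table.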
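The lookup-table division recommended above can be modelled in software before committing to RTL. The sketch below is a Python reference model, not the project's implementation: the 16-bit divisor width, the 32-bit reciprocal scale `FRAC_BITS`, and the function names are assumptions chosen for illustration. It stores fixed-point reciprocals indexed by the upper 8 bits of the divisor and interpolates linearly with the lower 8 bits.

```python
FRAC_BITS = 32  # assumed fixed-point scale of the stored reciprocals

# Entry i holds round(2**FRAC_BITS / (i << 8)); entry 0 is unused (divisor 0).
RECIP_LUT = [0] + [round((1 << FRAC_BITS) / (i << 8)) for i in range(1, 257)]


def recip_lut(x):
    """Approximate 2**FRAC_BITS / x for a 16-bit divisor x >= 256."""
    hi, lo = x >> 8, x & 0xFF          # upper 8 bits index the table
    y0, y1 = RECIP_LUT[hi], RECIP_LUT[hi + 1]
    # Linear interpolation between neighbouring entries using the lower 8 bits.
    return y0 + (((y1 - y0) * lo) >> 8)


def divide(n, x):
    """Approximate n // x with one multiply and one shift."""
    return (n * recip_lut(x)) >> FRAC_BITS
```

In this model the relative error stays below about 0.1% for divisors of 4096 and above, but grows for small divisors, where 1/x curves sharply; a real design would likely normalize the divisor first.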
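2. Techniques for FPGA implementations of real-time image processing
-
Lookup tables
Use lookup tables for complex operations such as division. A lookup table of limited width can reach a precision beyond its own width through interpolation: the table is indexed by the upper 8 bits of the index, and the value for each setting of the lower 8 bits is obtained by interpolating.
-
Methods based on the display pixel scan order
The following square computation can be simplified by exploiting the regularity of the pixel scan order.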
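The formula the notes refer to is not reproduced here, so the following Python sketch only illustrates the general idea under an assumption: that the quantity being squared advances by 1 per pixel along a scan line (for example a horizontal distance from a fixed center). In that case each square follows from the previous one with additions only, because (x+1)^2 - x^2 = 2x + 1. The function name is hypothetical.

```python
def squares_by_scan(x0, count):
    """Return [x0**2, (x0+1)**2, ...] using one seed multiply, then adders only."""
    sq = x0 * x0        # seed value, computed once per scan line
    step = 2 * x0 + 1   # difference between consecutive squares
    out = []
    for _ in range(count):
        out.append(sq)
        sq += step      # next square
        step += 2       # the difference itself grows by 2 each pixel
    return out
```

In hardware this would correspond to two registers and two adders per squared term, instead of a multiplier on the pixel-rate path.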
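Interface (input, output)
line buffer
Linear transform module
Bilinear interpolation module
3. Our project and the situation we face:
Coordinate range of the linear transform (v: vertical axis of the distorted image; y: vertical axis of the original image)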
v | y
---|---
0 | (v-50, 0)
50 | (0, v+5)
Height-50 | (v-5, v+50)
Height | (v, v+50)
4. RAM read/write timing of the example code
{signal: [
{name: 'clk', wave: 'p.........'},
{name: 'data_enable', wave: '01........'},
{name: 'de_1d', wave: '0.1.......'},
{name: 'dot_cnt', wave: '2.22222222', data: ['0000', '0001', '0010', '0011','0100','0101','0110','0111','1000']},
{name: 'dat', wave: '3.33333333', data: ['d0' ,'d1', 'd2', 'd3','d4','d5','d6','d7','d8']},
{name: 'dot_cnt[0]', wave: '0.10101010'},
{name: 'dot_cnt[0]_falling', wave: '0..1010101'},
{name: 'sram_addr0', wave: '4...4.4.4.4', data: ['0' , '1', '2', '3','4','0101','0110','0111','1000']},
{name: 'sram_wdata0', wave: '3.33333333', data: ['0','0,d0','d1,d0','d1,d2','d3,d2','d3,d4','d5,d4','d5,d6','d7,d6']},
{name: 'sram0_web', wave: '1..0101010'},
{name: 'o_sram_addr0', wave: '4....4.4.4', data: ['0' , '1', '2', '3']},
{name: 'o_sram_wdata0', wave: '3..33333333', data: ['0','0,d0','d1,d0','d1,d2','d3,d2','d3,d4','d5,d4','d5,d6','d7,d6']},
{name: 'o_sram0_web', wave: '1...0101010'},
{name: 'sram0_web_d', wave: '1....0101010'},
{name: 'i_sram0_rdata', wave: '5....5.5.5.', data: ['d1,d0','d3,d2','d5,d4','d7,d6']},
{name: 'i_rdata_odd', wave: '5.....5.5.5.', data: ['d1,d0','d3,d2','d5,d4','d7,d6']},
{name: 'rdata_post', wave: '5......5.5.5.', data: ['d1,d0','d3,d2','d5,d4','d7,d6']},
{name: 'sram0_web_1d', wave: '1...010101'},
{name: 'wr_odd', wave: '0.1........'},
{name: 'toggle', wave: '0..10101010'},
{name: 'toggle_d', wave: '0...10101010'},
{name: 'data_3d', wave: '3.....33333', data: ['d0','d1', 'd2', 'd3','d4','d5']},
{name: 'wr_odd_d', wave: '0....1.....'},
]}
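This RAM code has dual read channels.
5. Implementation of each module
5.1 Line Buffer
The 50-line line buffer is split into two 25-line line buffers, LB0 and LB1.
LB0 and LB1 are each 25x1920x48 bit.
LB0 and LB1 can be read and written independently.
5.2 Divider LUT
5. Address increment method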
type 1 2 3 ( v1_1d == v1_2d )
begin
addr <= addr + (flag_sram - flag_sram_1d) * 1920 + u1 - u1_1d ;
end
type 4 5 6
- (v1 == v1_1d + 1)
flag_sram - flag_sram_1d = 1: add 2 lines
flag_sram - flag_sram_1d = 0: add 1 line
begin
addr <= addr + ( flag_sram - flag_sram_1d + 1) * 1920+ u1_1d - u1_2d;
end
type 4 5 6
- (v1 == v1_1d - 1)
flag_sram - flag_sram_1d = 0: subtract 1 line
flag_sram - flag_sram_1d = -1: subtract 2 lines
begin
addr <= addr + ( flag_sram - flag_sram_1d - 1) * 1920 + u1 - u1_1d;
end
In summary, type 3
begin
addr <= addr + ( flag_sram - flag_sram_1d + v1 - v1_1d) * 1920 + u1 - u1_1d;
end
This can be converted into the following general formula
begin
addr <= addr + ( flag_sram - flag_sram_1d + v1 - v1_1d) * 1920 + u1 - u1_1d;
end
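To check that the general formula really covers the individual cases, it can be evaluated in a small software model. The Python sketch below mirrors the signal names from the notes; the function name and the constant `LINE_WORDS` are illustrative, and the 1920 words per line is taken from the formula itself.

```python
LINE_WORDS = 1920  # address stride of one image line, as in the formula above


def next_addr(addr, flag_sram, flag_sram_1d, v1, v1_1d, u1, u1_1d):
    """General address update: line jump from the flag and v1 deltas, plus the u1 delta."""
    line_delta = (flag_sram - flag_sram_1d) + (v1 - v1_1d)
    return addr + line_delta * LINE_WORDS + (u1 - u1_1d)
```

With v1 unchanged and the flag unchanged, only the u1 difference is added; with v1 increasing by 1 and the flag rising, the address advances by 2 lines; with v1 decreasing by 1 and the flag falling, it moves back by 2 lines.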
6. Data caching method
type 1 2 3 (v==v_1d)
pixel_reg1 <= i_sram_rdata[95:72];
type 4 5 6 (v1_1d==v1_2d+1) flag_sram == 1
flag_sram_1d - flag_sram_2d = 1: add 2 lines
pixel_reg1 <= pixel_reg1;
flag_sram_1d - flag_sram_2d = 0: add 1 line
pixel_reg1 <= i_sram_rdata[95:72];
type 4 5 6 (v1_1d == v1_2d - 1) flag_sram == 0
flag_sram - flag_sram_1d = 0: subtract 1 line
pixel_reg1 <= i_sram_rdata[95:72];
flag_sram - flag_sram_1d = -1: subtract 2 lines
pixel_reg1 <= pixel_reg1;
6. How register data is arranged // at stage 4
type 1 2 3 (v1_3d == v1_4d )
if (flag_sram_3d == 0)
part_a <= i_sram_rdat[95:72] * mul_a_2d;
else if (flag_sram_3d == 1)
case (u1_3d - u1_4d)
0: part_a <= pixel_reg0 * mul_a_2d;
1: part_a <= pixel_reg1 * mul_a_2d;
2: part_a <= pixel_reg2 * mul_a_2d;
endcase
if (flag_sram_3d == 0)
part_b <= i_sram_rdat[71:48] * mul_b_2d;
else if (flag_sram_3d == 1)
case (u1_3d - u1_4d)
0: part_b <= pixel_reg1 * mul_b_2d;
1: part_b <= pixel_reg2 * mul_b_2d;
2: part_b <= pixel_reg3 * mul_b_2d;
endcase
if (flag_sram_3d == 0)
case (u1_3d - u1_4d)
0: part_c <= pixel_reg0 * mul_c_2d;
1: part_c <= pixel_reg1 * mul_c_2d;
2: part_c <= pixel_reg2 * mul_c_2d;
endcase
else if ( flag_sram_3d == 1)
part_c <= i_sram_rdat [95:72] * mul_c_2d;
if (flag_sram_3d == 0)
case (u1_3d - u1_4d)
0: part_d <= pixel_reg1 * mul_d_2d;
1: part_d <= pixel_reg2 * mul_d_2d;
2: part_d <= pixel_reg3 * mul_d_2d;
endcase
else if ( flag_sram_3d == 1)
part_d <= i_sram_rdat [71:48] * mul_d_2d;
type 4 5 6 (v1_3d == v1_4d+1 || v1_3d==v1_4d-1)
v1_3d == v1_4d+1 //flag_sram_3d == 1
if (v1_3d == v1_4d+1)
case (u1_3d - u1_4d)
0: part_a <= pixel_reg0 * mul_a_2d;
1: part_a <= pixel_reg1 * mul_a_2d;
2: part_a <= pixel_reg2 * mul_a_2d;
endcase
if (v1_3d == v1_4d+1)
case (u1_3d - u1_4d)
0: part_b <= pixel_reg1 * mul_b_2d;
1: part_b <= pixel_reg2 * mul_b_2d;
2: part_b <= pixel_reg3 * mul_b_2d;
endcase
if (v1_3d == v1_4d+1)
part_c <= i_sram_rdat [95:72] * mul_c_2d;
if (v1_3d == v1_4d+1)
part_d <= i_sram_rdat [71:48] * mul_d_2d;
v1_3d==v1_4d-1 //flag_sram_3d == 0
if (v1_3d == v1_4d - 1)
part_a <= i_sram_rdat [95:72] * mul_a_2d;
if (v1_3d == v1_4d - 1)
part_b <= i_sram_rdat[71:48] * mul_b_2d;
if (v1_3d == v1_4d -1)
case (u1_3d - u1_4d)
0: part_c <= pixel_reg0 * mul_c_2d;
1: part_c <= pixel_reg1 * mul_c_2d;
2: part_c <= pixel_reg2 * mul_c_2d;
endcase
if (v1_3d == v1_4d -1)
case (u1_3d - u1_4d)
0: part_d <= pixel_reg1 * mul_d_2d;
1: part_d <= pixel_reg2 * mul_d_2d;
2: part_d <= pixel_reg3 * mul_d_2d;
endcase
7. Out-of-bounds handling
flag_cross_u1;
flag_cross_u1_1;
flag_cross_v1;
flag_cross_v1_1;
flag_cross = {flag_cross_u1, flag_cross_u1_1, flag_cross_v1, flag_cross_v1_1};
For a, the out-of-bounds condition is
flag_cross_u1 || flag_cross_v1
For b, the out-of-bounds condition is
flag_cross_u1_1 || flag_cross_v1
For c, the out-of-bounds condition is
flag_cross_u1 || flag_cross_v1_1
For d, the out-of-bounds condition is
flag_cross_u1_1 || flag_cross_v1_1
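The four conditions can be modelled as follows. This Python sketch assumes a 1920x1080 source image (the size implied by the examples elsewhere in these notes), that `u1`, `v1` are the integer coordinates of the top-left neighbor, and that the `_1` suffix means "plus one"; the function name is hypothetical.

```python
WIDTH, HEIGHT = 1920, 1080  # assumed source image size


def cross_flags(u1, v1):
    """Return the out-of-bounds flag of each bilinear neighbor a, b, c, d."""
    f_u1 = not (0 <= u1 < WIDTH)
    f_u1_1 = not (0 <= u1 + 1 < WIDTH)
    f_v1 = not (0 <= v1 < HEIGHT)
    f_v1_1 = not (0 <= v1 + 1 < HEIGHT)
    return {
        "a": f_u1 or f_v1,       # (u1,     v1)
        "b": f_u1_1 or f_v1,     # (u1 + 1, v1)
        "c": f_u1 or f_v1_1,     # (u1,     v1 + 1)
        "d": f_u1_1 or f_v1_1,   # (u1 + 1, v1 + 1)
    }
```

For example, a point just left of the image has `u1 = -1`, so a and c are flagged while b and d remain usable.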
8. Line Buffer
Use two sets of buffers in a ping-pong arrangement.
New design:
T0:
Read in the transformed coordinates i_coord_u, i_coord_v
T1:
Compute u1, v1, u2, v2, du, dv, sub_du, sub_dv
T2:
Compute mul_a, mul_b, mul_c, mul_d
if (i_hsync_1d && i_hsync_2d)
u1_offset=u1-u1_src;
v1_offset=v1- v1_src;
else if (i_hsync_1d && !i_hsync_2d)
u1_offset =0;
v1_offset =0;
assign v1_sub1 =v1-1;
assign v1_plus1=v1+1;
assign v1_plus2=v1+2;
if (u1 > u1_src+6 || v1>v1_src+1 || v1<v1_src-1)
addr1<=(u1>>3) + v1_sub1*240;
addr2<=(u1>>3) + v1*240;
addr3<=(u1>>3) + v1_plus1*240;
addr4<=(u1>>3) + v1_plus2*240;
u1_src <= u1;
v1_src <= v1;
addr_en <= 1;
T3:
Drive address: addr<=addr1; web<=1;
u1_offset_1d, v1_offset_1d
mul_a_2d, mul_b_2d, mul_c_2d, mul_d_2d
T4:
Drive address: addr<=addr2;
u1_offset_2d , v1_offset_2d
mul_a_3d, mul_b_3d, mul_c_3d, mul_d_3d
T5:
Drive address: addr<=addr3;
u1_offset_3d, v1_offset_3d;
line1_reg1_a <= sram_data[8*DW-1:7*DW] ..... line1_reg8_a<=sram_data[DW-1:0]
mul_a_4d, mul_b_4d, mul_c_4d, mul_d_4d
T6:
Drive address: addr<=addr4;
u1_offset_4d,v1_offset_4d;
line2_reg1_a <= sram_data[8*DW-1:7*DW] ..... line2_reg8_a<=sram_data[DW-1:0]
mul_a_5d, mul_b_5d, mul_c_5d, mul_d_5d
T7:
web = 0;
u1_offset_5d, v1_offset_5d;
line3_reg1_a <= sram_data[8*DW-1:7*DW] ..... line3_reg8_a<=sram_data[DW-1:0]
mul_a_5d, mul_b_5d, mul_c_5d, mul_d_5d
T8:
u1_offset_6d,v1_offset_6d;
line4_reg1_a <= sram_data[8*DW-1:7*DW] ..... line4_reg8_a<=sram_data[DW-1:0]
mul_a_6d, mul_b_6d, mul_c_6d, mul_d_6d
T9:
v1_offset_7d
case (u1_offset_6d)
'd0: begin
tmp_line1_reg1 = line1_reg1
tmp_line1_reg2 = line1_reg2
tmp_line2_reg1 = line2_reg1
tmp_line2_reg2 = line2_reg2
tmp_line3_reg1 = line3_reg1
tmp_line3_reg2 = line3_reg2
tmp_line4_reg1 = line4_reg1
tmp_line4_reg2 = line4_reg2
end
....
endcase
mul_a_7d, mul_b_7d, mul_c_7d, mul_d_7d
T10:
case (v1_offset_7d)
'd0: begin
pixel_a <= tmp_line2_reg1;
pixel_b <= tmp_line2_reg2;
pixel_c <= tmp_line3_reg1;
pixel_d <= tmp_line3_reg2;
end
endcase
mul_a_8d, mul_b_8d, mul_c_8d, mul_d_8d
T11
part_a_r = pixel_a[23:16] * mul_a_8d;
part_b_r = pixel_b[23:16] * mul_b_8d;
part_c_r = pixel_c[23:16] * mul_c_8d;
part_d_r = pixel_d[23:16] * mul_d_8d;
T12
final_r = part_a_r + part_b_r + part_c_r + part_d_r;
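The T11/T12 arithmetic can be checked against a bit-accurate software model. The Python sketch below handles the red channel only, as in the lines above. It makes assumptions the notes do not state: `du` and `dv` carry 8 fractional bits, the weights are formed as products of `du`/`sub_du` and `dv`/`sub_dv`, and the final sum is shifted right to drop the fractional bits.

```python
FRAC = 8            # assumed number of fractional bits in du and dv
ONE = 1 << FRAC


def bilinear_r(pixels, du, dv):
    """Red channel of the interpolated pixel.

    pixels = (a, b, c, d) as 24-bit RGB words; a, b lie on row v1 and
    c, d on row v1 + 1. du, dv are the fractional offsets in [0, ONE).
    """
    sub_du, sub_dv = ONE - du, ONE - dv
    # Assumed weight layout: mul_a, mul_b, mul_c, mul_d.
    weights = (sub_du * sub_dv, du * sub_dv, sub_du * dv, du * dv)
    parts = [((p >> 16) & 0xFF) * w for p, w in zip(pixels, weights)]  # T11
    return sum(parts) >> (2 * FRAC)                                    # T12
```

Because the four weights always sum to `ONE * ONE`, a flat region interpolates to its own value, which makes a convenient sanity check for the RTL.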
The main issue is how to handle boundary conditions, i.e. when the point to be interpolated lies at the image edge.
Possible boundary cases
flag_cross_u1 && ~flag_cross_u2 && flag_cross_v1 && ~flag_cross_v2; (top-left corner)
eg: (-0.2, -0.5)
flag_cross_u1 && ~flag_cross_u2 && ~flag_cross_v1 && ~flag_cross_v2; (left side)
eg: (-0.2, 50)
flag_cross_u1 && ~flag_cross_u2 && ~flag_cross_v1 && flag_cross_v2; (bottom-left corner)
eg: (-0.2, 1079.5)
~flag_cross_u1 && ~flag_cross_u2 && flag_cross_v1 && ~flag_cross_v2; (top side)
eg: (50, -0.5)
~flag_cross_u1 && ~flag_cross_u2 && ~flag_cross_v1 && flag_cross_v2; (bottom side)
eg: (50, 1079.5)
~flag_cross_u1 && flag_cross_u2 && flag_cross_v1 && ~flag_cross_v2; (top-right corner)
eg: (1919.5, -0.5)
~flag_cross_u1 && flag_cross_u2 && ~flag_cross_v1 && ~flag_cross_v2; (right side)
eg: (1919.5, 50)
~flag_cross_u1 && flag_cross_u2 && ~flag_cross_v1 && flag_cross_v2; (bottom-right corner)
eg: (1919.5, 1079.5)
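The eight cases can be reproduced from the example coordinates with a short Python model. It assumes a 1920x1080 image and that the integer neighbors are obtained by flooring the transformed coordinate; the function name and the returned labels are illustrative only.

```python
import math

WIDTH, HEIGHT = 1920, 1080  # assumed image size


def boundary_case(u, v):
    """Name the boundary case for a transformed coordinate (u, v)."""
    u1, v1 = math.floor(u), math.floor(v)
    left, right = u1 < 0, u1 + 1 > WIDTH - 1      # flag_cross_u1 / flag_cross_u2
    top, bottom = v1 < 0, v1 + 1 > HEIGHT - 1     # flag_cross_v1 / flag_cross_v2
    vert = "top" if top else "bottom" if bottom else ""
    horiz = "left" if left else "right" if right else ""
    return "-".join(p for p in (vert, horiz) if p) or "interior"
```

The model only distinguishes which neighbor leaves the image; what value to substitute for the missing neighbor remains a design decision.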
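Question: how should the address be assigned?
Approach: derive it from the current SRAM write address (which runs 50 lines ahead of the display scan).
To display a video stream, the SRAM has to be expanded to a 150-line line buffer.
The problem of determining this address
11/6 Work report and next steps
- The current design outputs 1 pixel per cycle; the next step is to change it to 4 pixels per cycle.
- Verify the current design in detail: write an SV model that computes the transform matrix, then run randomized verification.
Perspective transform
https://blog.csdn.net/xiaowei_cqu/article/details/26471527
11/7 Derivation of the transform matrix formulas
11/14 Report notes
The SRAM can be 1T16P or 1T32P; keep it a discrete type where possible.
1T4P implementation
Points to note for the coordinate transform module:
The computation of temp_u_1, temp_v_1, temp_w_1, temp_u_2, temp_v_2, temp_w_2 can be optimized; all of them can be implemented with adders.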
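One way to read the adder-only remark: in raster order the homogeneous terms of a perspective transform change by a constant per pixel, so they can be accumulated instead of multiplied. The Python sketch below assumes the usual form `[temp_u, temp_v, temp_w] = M * [x, y, 1]` with a 3x3 matrix `m`; the matrix layout and the function name are assumptions, and the final divisions by `temp_w` are left out.

```python
def scan_transform(m, width, height):
    """Yield (x, y, temp_u, temp_v, temp_w) in raster order using additions only."""
    row = [m[0][2], m[1][2], m[2][2]]          # values at pixel (0, 0)
    for y in range(height):
        cur = list(row)
        for x in range(width):
            yield x, y, cur[0], cur[1], cur[2]
            # One step in x adds the first matrix column.
            cur = [c + m[i][0] for i, c in enumerate(cur)]
        # One step in y adds the second matrix column to the row start.
        row = [r + m[i][1] for i, r in enumerate(row)]
```

With fixed-point matrix entries the accumulators need enough fractional bits that rounding error does not build up across a full line.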
Interpolation module implementation approach (1T4P)
cache structure
Input: 4 consecutive transformed coordinates
stage 0
Set u_a_1, v_a_1, u_a_2, v_a_2, u_a_3, v_a_3, u_a_4, v_a_4
Set du_1, dv_1 ....... du_4, dv_4
stage 1
reg [3:0] flag_cross_1,...,flag_cross_4;
Set the out-of-bounds condition for each point
Set head_v_differ = v_a_2 [1:0] - v_a_1[1:0];
Set u_a_1_1d, v_a_1_1d, v_a_3_1d, v_a_4_1d;
Set v_flag_1d;
Set flag_start = u_b_1 < 1919 && !flag_start && !(v1 > 1080 && v1 < 8190);
stage 2
Determine the SRAM access address
assign addr_tmp = (v_a_1 << 7) - (v_a_1 << 3) + (u_a_1 >> 4);
addr1 = addr_tmp;
Set u_src, v_src
Set addr_en
Set head_v_differ_1d;
Set u_a_1_2d, v_a_1_2d; v_a_3_2d, v_a_4_2d;
Set flag_start_1d;
stage 3
addr2 = addr_tmp + 120;
Set head_v_differ_2d;
Set v_flag_3d;
Set sram_web;
Set sram_addr = addr1;
Set addr_en_1d;
stage 4
assign tail_v_differ = coord_v[1:0] - coord_v[DW+1:DW];
assign head_tail_v_differ = coord_v[1:0] - v_src[1:0];
if (head_tail_v_differ == 'b01)
v_shift <= 120;
else if (head_tail_v_differ == 'b00 && (tail_v_differ == 'b11 || head_v_differ == 'b01))
v_shift <= 120;
else
v_shift <= -240;
if (differ1 || differ2) begin
addr_en_line3 = 1'b1;
end else begin
addr_en_line3 = 0;
end
Set sram_web =
Set addr = addr2
stage 5
if (addr_en_line3) begin
sram_addr = sram_addr + v_shift;
end
if (rdata_vld_1)
{line1_reg_a_1 ...... line1_reg_a_16} <= sram_rdata;
stage 6
if (rdata_vld_2)
{line2_reg_a_1 ...... line2_reg_a_16} <= sram_rdata;
stage 7
if (rdata_vld_3)
{line3_reg_1 ...... line3_reg_16} <= sram_rdata;
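Define a 3*5 pre_cache that caches the previous access
stage 8
Define a 3*6 cache; the difference between the u coordinate of the first of the current 4 pixels and u_src decides which part of the data is cached
temp_line1_reg1 .... temp_line1_reg6
temp_line2_reg1 .... temp_line2_reg6
temp_line3_reg1 .... temp_line3_reg6
stage 9
Based on the difference in v from v_src, and the difference in u between the current point and the first of the 4 consecutive points, select 4 specific points from the temp cache. For the 4 consecutive pixels being computed, the first pixel selects from temp_reg1-temp_reg3, the second from temp_reg2-temp_reg4, ......, and the fourth from temp_reg4-temp_reg6.
stage 10
First step of the interpolation
stage 11
Second step of the interpolation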
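The stage 2 address computation replaces a multiply by 120 (1920 pixels per line divided by 16 pixels per SRAM word) with two shifts and a subtraction. The Python sketch below checks that identity; the function names are hypothetical. It also shows a pitfall that applies equally in Verilog and Python: shift operators bind more loosely than `+` and `-`, so the `u >> 4` term has to be parenthesized.

```python
def line_addr(v, u):
    """Word address of pixel (u, v): v * 120 + u // 16, without a multiplier."""
    return (v << 7) - (v << 3) + (u >> 4)   # 128*v - 8*v = 120*v


def line_addr_unparenthesized(v, u):
    """Same expression without parentheses around the shift; parses as (... + u) >> 4."""
    return (v << 7) - (v << 3) + u >> 4
```

For `v = 2, u = 32` the first form gives 242 while the second gives 17, which is why the order of operations deserves an explicit check in simulation.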
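12/2 Report
How to balance register count against combinational logic complexity.
Which matters more, SRAM throughput or register count? Which matters more, power or resources?
case statement test
Area utilization
The purpose of this test is to compare the resource cost of different ways of writing a case statement.
type2 and type3 are functionally equivalent and cost the same resources; both cost more than type1.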
type1
always @(posedge i_clk or negedge i_rst_n)
begin
if (!i_rst_n) begin
o_value <= 24'b0;
end else begin
casez (i_shift)
8'b01111:o_value <= i_data[5*24-1:4*24];
8'b10111:o_value <= i_data[4*24-1:3*24];
8'b11011:o_value <= i_data[3*24-1:2*24];
8'b11101:o_value <= i_data[2*24-1:1*24];
8'b11110:o_value <= i_data[1*24-1:0*24];
default :o_value <= 24'hffffff;
endcase
end
end
Combinational area: 106.560001
Buf/Inv area: 1.080000
Noncombinational area: 190.080002
Macro/Black Box area: 0.000000
Net Interconnect area: undefined (Wire load has zero net area)
Total cell area: 296.640003
type2
always @(posedge i_clk or negedge i_rst_n)
begin
if (!i_rst_n) begin
o_value <= 24'b0;
end else begin
casez (i_shift)
8'h00,8'h01,8'h02,8'h03,8'h04,8'h05,8'h06,8'h07,8'h08,8'h09,8'h0a,8'h0b,8'h0c,8'h0d,8'h0e,8'h0f:o_value <= i_data[5*24-1:4*24];
5'd16,8'd17,8'd18,8'd19,8'd20,8'd21,8'd22,8'd23:o_value <= i_data[4*24-1:3*24];
5'd24,8'd25,8'd26,8'd27:o_value <= i_data[3*24-1:2*24];
5'd28,8'd29:o_value <= i_data[2*24-1:1*24];
5'd30:o_value <= i_data[1*24-1:0*24];
default :o_value <= 24'hffffff;
endcase
end
end
Combinational area: 230.040000
Buf/Inv area: 1.080000
Noncombinational area: 190.080002
Macro/Black Box area: 0.000000
Net Interconnect area: undefined (Wire load has zero net area)
Total cell area: 420.120002
type3
always @(posedge i_clk or negedge i_rst_n)
begin
if (!i_rst_n) begin
o_value <= 24'b0;
end else begin
casez (i_shift)
5'b0????:o_value <= i_data[5*24-1:4*24];
5'b10???:o_value <= i_data[4*24-1:3*24];
5'b110??:o_value <= i_data[3*24-1:2*24];
5'b1110?:o_value <= i_data[2*24-1:1*24];
5'b11110:o_value <= i_data[1*24-1:0*24];
default :o_value <= 24'hffffff;
endcase
end
end
Combinational area: 230.040000
Buf/Inv area: 1.080000
Noncombinational area: 190.080002
Macro/Black Box area: 0.000000
Net Interconnect area: undefined (Wire load has zero net area)
Total cell area: 420.120002
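The claim that type2 and type3 are functionally equivalent can be confirmed exhaustively, since `i_shift` only has 32 relevant values. The Python sketch below models both decoders and returns the index of the selected 24-bit slice of `i_data` (4 for `i_data[5*24-1:4*24]`, down to 0), or `None` for the default branch. It assumes only the low 5 bits of `i_shift` matter; the helper names are hypothetical.

```python
# type2: explicit value lists per branch.
TYPE2 = {4: range(0, 16), 3: range(16, 24), 2: range(24, 28),
         1: range(28, 30), 0: range(30, 31)}

# type3: casez patterns, in priority order.
TYPE3 = [(4, "0????"), (3, "10???"), (2, "110??"), (1, "1110?"), (0, "11110")]


def decode_type2(shift):
    for sel, values in TYPE2.items():
        if shift in values:
            return sel
    return None  # default branch


def decode_type3(shift):
    bits = format(shift, "05b")
    for sel, pattern in TYPE3:
        if all(p in ("?", b) for p, b in zip(pattern, bits)):
            return sel
    return None  # default branch
```

Both decoders agree on all 32 inputs, which is consistent with the identical area figures reported for the two styles.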
type 4
always @(posedge i_clk or negedge i_rst_n)
begin
if (!i_rst_n) begin
o_value <= 24'hffffff;
end else begin
casez (i_shift)
22'b10??_????_????_????_????_??:o_value <= i_data[22*24-1:21*24];
22'b110?_????_????_????_????_??:o_value <= i_data[21*24-1:20*24];
22'b1110_????_????_????_????_??:o_value <= i_data[20*24-1:19*24];
22'b1111_0???_????_????_????_??:o_value <= i_data[19*24-1:18*24];
22'b1111_10??_????_????_????_??:o_value <= i_data[18*24-1:17*24];
22'b1111_110?_????_????_????_??:o_value <= i_data[17*24-1:16*24];
22'b1111_1110_????_????_????_??:o_value <= i_data[16*24-1:15*24];
22'b1111_1111_0???_????_????_??:o_value <= i_data[15*24-1:14*24];
22'b1111_1111_10??_????_????_??:o_value <= i_data[14*24-1:13*24];
22'b1111_1111_110?_????_????_??:o_value <= i_data[13*24-1:12*24];
22'b1111_1111_1110_????_????_??:o_value <= i_data[12*24-1:11*24];
22'b1111_1111_1111_0???_????_??:o_value <= i_data[11*24-1:10*24];
22'b1111_1111_1111_10??_????_??:o_value <= i_data[10*24-1:9*24];
22'b1111_1111_1111_110?_????_??:o_value <= i_data[9*24-1:8*24];
22'b1111_1111_1111_1110_????_??:o_value <= i_data[8*24-1:7*24];
22'b1111_1111_1111_1111_0???_??:o_value <= i_data[7*24-1:6*24];
22'b1111_1111_1111_1111_10??_??:o_value <= i_data[6*24-1:5*24];
22'b1111_1111_1111_1111_110?_??:o_value <= i_data[5*24-1:4*24];
22'b1111_1111_1111_1111_1110_??:o_value <= i_data[4*24-1:3*24];
22'b1111_1111_1111_1111_1111_0?:o_value <= i_data[3*24-1:2*24];
22'b1111_1111_1111_1111_1111_10:o_value <= i_data[2*24-1:1*24];
default: o_value <= 24'hffffff;
endcase
end
end
Combinational area: 960.840003
Buf/Inv area: 2.520000
Noncombinational area: 198.719994
Macro/Black Box area: 0.000000
Net Interconnect area: undefined (Wire load has zero net area)
Total cell area: 1159.559996
Timing report
type1
always @(posedge i_clk or negedge i_rst_n)
begin
if (!i_rst_n) begin
o_value <= 24'b0;
end else begin
casez (i_shift)
5'b0????:o_value <= i_data[5*24-1:4*24];
5'b10???:o_value <= i_data[4*24-1:3*24];
5'b110??:o_value <= i_data[3*24-1:2*24];
5'b1110?:o_value <= i_data[2*24-1:1*24];
5'b11110:o_value <= i_data[1*24-1:0*24];
default :o_value <= 24'hffffff;
endcase
end
end
Point Incr Path
-----------------------------------------------------------
clock i_clk (rise edge) 0.00 0.00
clock network delay (ideal) 0.00 0.00
input external delay 0.25 0.25 r
i_shift[4] (in) 0.00 0.25 r
U110/Z (INVM1T) 0.19 0.44 f
U112/Z (NR2M2T) 0.60 1.04 r
U205/Z (AOI22M1T) 0.22 1.26 f
U203/Z (ND4M1T) 0.13 1.38 r
o_value_reg[22]/D (DFQRM1T) 0.00 1.38 r
data arrival time 1.38
clock i_clk (rise edge) 2.50 2.50
clock network delay (ideal) 0.00 2.50
clock uncertainty -0.20 2.30
o_value_reg[22]/CK (DFQRM1T) 0.00 2.30 r
library setup time -0.02 2.28
data required time 2.28
-----------------------------------------------------------
data required time 2.28
data arrival time -1.38
-----------------------------------------------------------
slack (MET) 0.90
type 2
always @(posedge i_clk or negedge i_rst_n)
begin
if (!i_rst_n) begin
o_value <= 24'hffffff;
end else begin
casez (i_shift)
22'b10??_????_????_????_????_??:o_value <= i_data[22*24-1:21*24];
22'b110?_????_????_????_????_??:o_value <= i_data[21*24-1:20*24];
22'b1110_????_????_????_????_??:o_value <= i_data[20*24-1:19*24];
22'b1111_0???_????_????_????_??:o_value <= i_data[19*24-1:18*24];
22'b1111_10??_????_????_????_??:o_value <= i_data[18*24-1:17*24];
22'b1111_110?_????_????_????_??:o_value <= i_data[17*24-1:16*24];
22'b1111_1110_????_????_????_??:o_value <= i_data[16*24-1:15*24];
22'b1111_1111_0???_????_????_??:o_value <= i_data[15*24-1:14*24];
22'b1111_1111_10??_????_????_??:o_value <= i_data[14*24-1:13*24];
22'b1111_1111_110?_????_????_??:o_value <= i_data[13*24-1:12*24];
22'b1111_1111_1110_????_????_??:o_value <= i_data[12*24-1:11*24];
22'b1111_1111_1111_0???_????_??:o_value <= i_data[11*24-1:10*24];
22'b1111_1111_1111_10??_????_??:o_value <= i_data[10*24-1:9*24];
22'b1111_1111_1111_110?_????_??:o_value <= i_data[9*24-1:8*24];
22'b1111_1111_1111_1110_????_??:o_value <= i_data[8*24-1:7*24];
22'b1111_1111_1111_1111_0???_??:o_value <= i_data[7*24-1:6*24];
22'b1111_1111_1111_1111_10??_??:o_value <= i_data[6*24-1:5*24];
22'b1111_1111_1111_1111_110?_??:o_value <= i_data[5*24-1:4*24];
22'b1111_1111_1111_1111_1110_??:o_value <= i_data[4*24-1:3*24];
22'b1111_1111_1111_1111_1111_0?:o_value <= i_data[3*24-1:2*24];
22'b1111_1111_1111_1111_1111_10:o_value <= i_data[2*24-1:1*24];
default: o_value <= 24'hffffff;
endcase
end
end
Point Incr Path
-----------------------------------------------------------
clock i_clk (rise edge) 0.00 0.00
clock network delay (ideal) 0.00 0.00
input external delay 0.25 0.25 r
i_shift[17] (in) 0.00 0.25 r
U406/Z (ND4M2T) 0.11 0.36 f
U418/Z (ND2B1M4T) 0.15 0.51 f
U417/Z (NR2B1M4T) 0.06 0.57 r
U441/Z (CKND2M2T) 0.08 0.65 f
U439/Z (NR2B1M2T) 0.10 0.74 r
U438/Z (CKND2M2T) 0.08 0.83 f
U437/Z (NR2B1M2T) 0.10 0.92 r
U405/Z (ND2M2T) 0.09 1.01 f
U414/Z (NR2B1M2T) 0.10 1.11 r
U413/Z (ND2M2T) 0.10 1.21 f
U403/Z (NR2B1M4T) 0.08 1.29 r
U732/Z (NR2B1M2T) 0.52 1.80 r
U770/Z (AO22M1T) 0.31 2.12 r
U694/Z (AOI211M1T) 0.05 2.17 f
U691/Z (ND4M1T) 0.11 2.28 r
o_value_reg[15]/D (DFQSM1T) 0.00 2.28 r
data arrival time 2.28
clock i_clk (rise edge) 2.50 2.50
clock network delay (ideal) 0.00 2.50
clock uncertainty -0.20 2.30
o_value_reg[15]/CK (DFQSM1T) 0.00 2.30 r
library setup time -0.01 2.29
data required time 2.29
-----------------------------------------------------------
data required time 2.29
data arrival time -2.28
-----------------------------------------------------------
slack (MET) 0.01
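The slack figures in the two reports follow from a simple budget: the required time is the clock period minus the clock uncertainty and the library setup time, and the slack is the required time minus the data arrival time. The Python sketch below recomputes both numbers from the values printed in the reports; `Decimal` is used so the two-decimal figures compare exactly. The function name is hypothetical.

```python
from decimal import Decimal


def setup_slack(period, uncertainty, setup, arrival):
    """Setup slack in ns; all arguments are decimal strings taken from the report."""
    required = Decimal(period) - Decimal(uncertainty) - Decimal(setup)
    return required - Decimal(arrival)


slack_5bit = setup_slack("2.50", "0.20", "0.02", "1.38")    # 5-bit casez
slack_22bit = setup_slack("2.50", "0.20", "0.01", "2.28")   # 22-bit casez
```

The 22-bit priority decoder uses almost the entire 2.5 ns period, leaving 0.01 ns of slack against 0.90 ns for the 5-bit version.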
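12/3 Summary of the report to Mr. Peng
- For each row, the position where line3 appears is
- Prepare a PPT and present it to Eason on December 18; duration 1.5h-2h
12/9
12/10
In the following formula:
M is a dimensionless coefficient. The unit of k depends on the dimension of r: if r is in mm, then k is in /mm².
If r is in pixel coordinates, then k is in /(pixel coordinate)².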
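The unit statement implies that k only ever appears multiplied by r², so changing the unit of r rescales k by the square of the conversion factor. The Python sketch below illustrates this under two assumptions that are not in the notes: the formula contains the term k·r², and the sensor has a uniform pixel pitch (`pixel_pitch_mm`, a hypothetical parameter) so that r_mm = pixel_pitch_mm · r_px.

```python
def k_in_pixel_units(k_per_mm2, pixel_pitch_mm):
    """Convert k from 1/mm^2 to 1/pixel^2 so that the product k * r^2 is unchanged."""
    return k_per_mm2 * pixel_pitch_mm ** 2


# Worked example: the dimensionless product k * r^2 is the same in both unit systems.
k_mm, pitch, r_px = 0.5, 0.01, 300
r_mm = pitch * r_px
term_mm = k_mm * r_mm ** 2
term_px = k_in_pixel_units(k_mm, pitch) * r_px ** 2
```

Here both `term_mm` and `term_px` evaluate to approximately 4.5, confirming that only the numerical value of k changes with the unit of r.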
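12/11 Report outline
- Project background (so far only the linear transform is implemented)
- Basic theory (introduce the linear transform)
- Implementation (module block diagram)
- Implementation (coordinate transform module)
- Implementation (bilinear interpolation module)
12/15
- My proposal needs to incorporate the elliptical coordinate transform
- My proposal needs to provide an addressing scheme suitable for video streams
- Run synthesis to check the resource cost and compare against other implementations
12/18
- 1. The case where barrel distortion and the linear transform are mixed.
- 2. First obtain the distorted pattern (linear distortion + barrel distortion). Since the lens distortion coefficients (distortion center) are known, the 4 vertices of the original linear distortion can be computed from the formula, and from them the matrix of the standard linear transform.
radial pipeline:
stage0:
stage1:
stage2:
stage3:
stage4: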
12/19
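Two's-complement numbers cannot be multiplied directly the way they can be added and subtracted; the operands first need to be extended.
In Verilog, signed multiplication requires every operand to be declared signed.
-
sign_c is a signed variable
-
sign_d is not a signed variable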
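The difference between the two cases can be reproduced with plain integers. The Python sketch below models 8-bit operands and a 16-bit product (widths chosen for illustration; helper names are hypothetical). Multiplying the raw bit patterns corresponds to an unsigned multiply, while sign-extending both operands to the product width first gives the correct two's-complement result. Addition needs no such step.

```python
def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's-complement number."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def mul_raw(a, b, bits=8):
    """Multiply the bit patterns as unsigned values; keep 2*bits product bits."""
    return (a * b) & ((1 << (2 * bits)) - 1)


def mul_sign_extended(a, b, bits=8):
    """Sign-extend both operands to 2*bits, then multiply and truncate."""
    mask = (1 << (2 * bits)) - 1
    ea = to_signed(a, bits) & mask
    eb = to_signed(b, bits) & mask
    return (ea * eb) & mask
```

With `a = 0xFE` (-2) and `b = 0x03` (3), the raw product reads as 762, while the sign-extended product reads as -6; the 8-bit sum `(0xFE + 0x03) & 0xFF` reads as 1 without any extension.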
12/20
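coordinate_trans_module pipeline (one cycle is enough)
input
pre_stage
stage0
interpolation module
The second-level cache can be shrunk to 3*6*24 bits
The pre_cache can be shrunk to 3*5*24 bits
Deciding whether line3 needs to be enabled only requires 5 cycles of coordinates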
1/3
img_distortion_v3 (shorter pipeline)
stage 0
Set sram_addr, sram_web, scan_en, u_src, v_src
sram_addr = addr1
sram_web = (!scan_en && flag_start) || (scan_en && u_comp)
scan_en = flag_start_formula
Compute u16 = u1 + delta_h_x *(16+4)
stage 1
Set sram_addr = addr2
Set addr_en=sram_web && !sram_web
Set addr3_line3_en = u16[FLEN]!=u1[FLEN]
Set flag_cross
stage 2
Set sram_addr = addr3
Set line1_cache
stage 3
Set line2_cache
stage 4
Set line3_cache
Set pre_cache (requires sign_v)
stage 5
Set temp_cache (requires the difference between u and u_src)
stage 6
Set pix1_a ... pix1_d ... pix4_a ... pix4_d