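Author: Maxwell Li
Date: 2017/03/25
No part of this article may be reproduced without the author's permission. To request permission to repost, please leave a comment.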
[TOC]
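During a business trip to Xi'an I got to know OpenStack network virtualization a little. After reading 《深入理解 Neutron -- OpenStack 網絡實現》 (Understanding Neutron in Depth: OpenStack Network Implementation), I put together this brief analysis and summary of OpenStack networking.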
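Environment
This post uses a virtual deployment of OpenStack Newton on Ubuntu Xenial with one controller node (host1), two storage nodes (host2, host3) and two compute nodes (host4, host5). The network node is co-located with the controller node. The network configuration of the storage nodes is not discussed here. Two networks, ext-net and demo-net, were created; three instances were launched on demo-net, and each was assigned a floating IP. demo1 and demo3 run on host4, and demo2 runs on host5.
The network topology of the virtual deployment is shown in the figure below:
Basic information: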
root@host1:~# nova list
+--------------------------------------+-------+--------+------------+-------------+---------------------------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-------+--------+------------+-------------+---------------------------------------+
| d88726c6-99a2-4d73-b041-7366aed31d98 | demo1 | ACTIVE | - | Running | demo-net=10.10.10.10, 192.168.116.224 |
| ded0d9d9-6739-41ba-b43c-599af217ad2d | demo2 | ACTIVE | - | Running | demo-net=10.10.10.12, 192.168.116.233 |
| ddcadd1f-ad65-41c4-aeab-5c54b4c61675 | demo3 | ACTIVE | - | Running | demo-net=10.10.10.13, 192.168.116.226 |
+--------------------------------------+-------+--------+------------+-------------+---------------------------------------+
root@host1:~# neutron net-list
+--------------------------------------+----------+-------------------------------------------------------+
| id | name | subnets |
+--------------------------------------+----------+-------------------------------------------------------+
| 773bcf25-d146-41ac-b4b5-d6b3c8bf65d8 | ext-net | 0553d050-379f-42fd-a11b-1d229913e563 192.168.116.0/24 |
| d60e4d79-bc10-4470-9434-297ace28ca84 | demo-net | 4be2b598-a9aa-40df-af3b-812a1df0bf80 10.10.10.0/24 |
+--------------------------------------+----------+-------------------------------------------------------+
root@host1:~# neutron subnet-list
+--------------------------------------+-------------+------------------+--------------------------------------------------------+
| id | name | cidr | allocation_pools |
+--------------------------------------+-------------+------------------+--------------------------------------------------------+
| 0553d050-379f-42fd-a11b-1d229913e563 | ext-subnet | 192.168.116.0/24 | {"start": "192.168.116.223", "end": "192.168.116.253"} |
| 4be2b598-a9aa-40df-af3b-812a1df0bf80 | demo-subnet | 10.10.10.0/24 | {"start": "10.10.10.2", "end": "10.10.10.254"} |
+--------------------------------------+-------------+------------------+--------------------------------------------------------+
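Network Implementation
OpenStack networking can be implemented in several modes, including VLAN, GRE and VXLAN. The OpenStack deployed by Compass4NFV uses VXLAN; the other modes work in a similar way. The basic structure is shown in the figure below:
Compute Node
A compute node has two main OVS bridges: the integration bridge br-int and the tunnel bridge br-tun. In addition, every instance gets its own Linux bridge, qbr, which is mainly used to implement security groups.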
qbr
The Linux bridge an instance is attached to can be found in the instance's libvirt definition (virsh dumpxml). Taking demo2 as an example:
root@host1:~# nova show demo2 | grep instance_name
| OS-EXT-SRV-ATTR:instance_name | instance-00000002 |
root@host5:~# virsh dumpxml instance-00000002
...
<interface type='bridge'>
<mac address='fa:16:3e:50:09:42'/>
<source bridge='qbrbb09acdf-a4'/>
<target dev='tapbb09acdf-a4'/>
<model type='virtio'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
...
root@host5:~# brctl show
bridge name bridge id STP enabled interfaces
qbrbb09acdf-a4 8000.0a6dbff5ddde no qvbbb09acdf-a4
tapbb09acdf-a4
virbr0 8000.525400f0397c yes virbr0-nic
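So demo2 is attached to the qbr Linux bridge through its tap interface, and the Linux bridge is connected to br-int through the qvb interface. qvb is one end of a veth pair whose other end, qvo, is a port on br-int.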
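All four devices share the suffix bb09acdf-a4. A minimal Python sketch of this naming pattern follows. It assumes, from the names observed above, that each device name is a three-letter prefix followed by the Neutron port UUID and cut to 14 characters. The full UUID below is hypothetical; only its first 11 characters come from the output above.

```python
# Sketch of the per-port device naming seen on host5.
# Assumption: name = prefix + port UUID, truncated to 14 characters,
# which reproduces tapbb09acdf-a4, qbrbb09acdf-a4, qvbbb09acdf-a4, qvobb09acdf-a4.
NAME_LEN = 14

PREFIXES = {
    "tap": "VM NIC, attached to the Linux bridge",
    "qbr": "per-instance Linux bridge (security groups)",
    "qvb": "veth end attached to qbr",
    "qvo": "veth end attached to br-int",
}


def device_names(port_id: str) -> dict:
    """Map each prefix to the device name derived from a Neutron port ID."""
    return {prefix: (prefix + port_id)[:NAME_LEN] for prefix in PREFIXES}


if __name__ == "__main__":
    # Hypothetical full UUID; only "bb09acdf-a4" is taken from the output above.
    names = device_names("bb09acdf-a4c0-4f5e-9a3b-0123456789ab")
    for prefix, name in names.items():
        print(f"{name:16} {PREFIXES[prefix]}")
```

Knowing the pattern makes it easy to go from a port ID in `neutron port-list` straight to the right device in `brctl show` or `ovs-vsctl show`.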
br-int
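The integration bridge br-int acts as a layer-2 switch. It is not affected by whichever technology (VLAN, GRE or VXLAN) implements network virtualization underneath.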
root@host5:~# ovs-vsctl show
...
Bridge br-int
Controller "tcp:127.0.0.1:6633"
is_connected: true
fail_mode: secure
Port br-int
Interface br-int
type: internal
Port patch-tun
Interface patch-tun
type: patch
options: {peer=patch-int}
Port "qvobb09acdf-a4"
tag: 1
Interface "qvobb09acdf-a4"
Port int-br-prv
Interface int-br-prv
type: patch
options: {peer=phy-br-prv}
...
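br-int has several ports, mainly the following:
- qvo port: connects to the Linux bridge. Each network is assigned a host-local (internal) VLAN ID, which appears as the tag on its qvo ports. demo2 is on demo-net, so its port carries tag 1; other instances on the same network on this host would carry the same tag.
- patch-tun port: connects to br-tun.
- int-br-prv port: a patch port connecting to br-prv (its peer is phy-br-prv).
Before Juno, all traffic had to be forwarded through the network node, which put heavy load on it. Juno therefore introduced DVR (Distributed Virtual Router), which lets east-west traffic be routed on the compute nodes and lets north-south traffic of instances with floating IPs leave directly through the compute node's br-prv. Compass4NFV does not enable DVR, so DVR is not covered here.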
root@host5:~# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
cookie=0x85b3d27713d4f435, duration=5437.237s, table=0, n_packets=0, n_bytes=0, idle_age=5437, priority=10,icmp6,in_port=3,icmp_type=136 actions=resubmit(,24)
cookie=0x85b3d27713d4f435, duration=5437.232s, table=0, n_packets=1, n_bytes=42, idle_age=5430, priority=10,arp,in_port=3 actions=resubmit(,24)
cookie=0x85b3d27713d4f435, duration=19854.128s, table=0, n_packets=19999, n_bytes=1080954, idle_age=0, priority=2,in_port=1 actions=drop
cookie=0x85b3d27713d4f435, duration=5437.243s, table=0, n_packets=107, n_bytes=10709, idle_age=5420, priority=9,in_port=3 actions=resubmit(,25)
cookie=0x85b3d27713d4f435, duration=19854.886s, table=0, n_packets=84, n_bytes=9442, idle_age=5403, priority=0 actions=NORMAL
cookie=0x85b3d27713d4f435, duration=19854.884s, table=23, n_packets=0, n_bytes=0, idle_age=19854, priority=0 actions=drop
cookie=0x85b3d27713d4f435, duration=5437.240s, table=24, n_packets=0, n_bytes=0, idle_age=5437, priority=2,icmp6,in_port=3,icmp_type=136,nd_target=fe80::f816:3eff:fe50:942 actions=NORMAL
cookie=0x85b3d27713d4f435, duration=5437.235s, table=24, n_packets=1, n_bytes=42, idle_age=5430, priority=2,arp,in_port=3,arp_spa=10.10.10.12 actions=resubmit(,25)
cookie=0x85b3d27713d4f435, duration=19854.883s, table=24, n_packets=0, n_bytes=0, idle_age=19854, priority=0 actions=drop
cookie=0x85b3d27713d4f435, duration=5437.248s, table=25, n_packets=107, n_bytes=10681, idle_age=5420, priority=2,in_port=3,dl_src=fa:16:3e:50:09:42 actions=NORMAL
root@host5:~# ovs-ofctl show br-int
OFPT_FEATURES_REPLY (xid=0x2): dpid:0000e27f47ffb148
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
1(int-br-prv): addr:7a:7b:34:bb:f5:4b
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
2(patch-tun): addr:72:cf:68:02:b0:f4
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
3(qvobb09acdf-a4): addr:ee:d4:71:35:31:3b
config: 0
state: 0
current: 10GB-FD COPPER
speed: 10000 Mbps now, 0 Mbps max
LOCAL(br-int): addr:e2:7f:47:ff:b1:48
config: PORT_DOWN
state: LINK_DOWN
speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
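Port 1 is int-br-prv, port 2 is patch-tun and port 3 is qvobb09acdf-a4 (demo2). In table 0, ARP packets and ICMPv6 neighbor advertisements from port 3 are resubmitted to table 24, which passes ARP only if its sender IP (arp_spa) is demo2's address 10.10.10.12 (on to table 25), and passes neighbor advertisements only for demo2's link-local address (straight to NORMAL). All other traffic from port 3 is resubmitted to table 25, which switches it normally (NORMAL) only if the source MAC is demo2's fa:16:3e:50:09:42. Together these flows prevent the instance from spoofing its IP or MAC address. Packets from port 1 (int-br-prv) are dropped, everything else is switched with NORMAL, and table 23 simply drops every packet.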
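To make the lookup order concrete, here is a toy Python model of these br-int tables on host5. It hard-codes the ports and addresses from the output above, leaves out the ICMPv6 flows, and is only a reading aid, not a description of how OVS evaluates flows internally.

```python
# Toy model of the br-int pipeline on host5 for the flows listed above.
# Port numbers from `ovs-ofctl show br-int`: 1 = int-br-prv, 3 = qvobb09acdf-a4 (demo2).
# ICMPv6 neighbor advertisements (icmp_type=136) are left out for brevity.
from dataclasses import dataclass
from typing import Optional

INT_BR_PRV = 1
DEMO2_PORT = 3
DEMO2_MAC = "fa:16:3e:50:09:42"
DEMO2_IP = "10.10.10.12"


@dataclass
class Frame:
    in_port: int
    dl_src: str
    arp_spa: Optional[str] = None  # ARP sender IP if the frame is ARP, else None


def table25(frame: Frame) -> str:
    # priority=2,in_port=3,dl_src=fa:16:3e:50:09:42 actions=NORMAL.
    # No other table-25 flow is listed, so any other frame matches nothing and is dropped.
    if frame.in_port == DEMO2_PORT and frame.dl_src == DEMO2_MAC:
        return "NORMAL"
    return "drop"


def table24(frame: Frame) -> str:
    # priority=2,arp,in_port=3,arp_spa=10.10.10.12 actions=resubmit(,25)
    if frame.arp_spa == DEMO2_IP:
        return table25(frame)
    return "drop"  # priority=0 actions=drop


def table0(frame: Frame) -> str:
    if frame.in_port == DEMO2_PORT and frame.arp_spa is not None:
        return table24(frame)  # priority=10,arp,in_port=3
    if frame.in_port == DEMO2_PORT:
        return table25(frame)  # priority=9,in_port=3
    if frame.in_port == INT_BR_PRV:
        return "drop"  # priority=2,in_port=1
    return "NORMAL"  # priority=0
```

For example, an ARP reply from demo2 that claims 10.10.10.10 is dropped in table 24, while the same reply with demo2's own address is switched normally.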
br-tun
root@host5:~# ovs-vsctl show
...
Bridge br-tun
Controller "tcp:127.0.0.1:6633"
is_connected: true
fail_mode: secure
Port br-tun
Interface br-tun
type: internal
Port "vxlan-ac100104"
Interface "vxlan-ac100104"
type: vxlan
options: {df_default="true", in_key=flow, local_ip="172.16.1.5", out_key=flow, remote_ip="172.16.1.4"}
Port patch-int
Interface patch-int
type: patch
options: {peer=patch-tun}
Port "vxlan-ac100101"
Interface "vxlan-ac100101"
type: vxlan
options: {df_default="true", in_key=flow, local_ip="172.16.1.5", out_key=flow, remote_ip="172.16.1.1"}
...
The br-tun bridge above mainly has two kinds of ports:
- vxlan ports: the VXLAN tunnel ports used to send packets to other nodes.
- patch-int port: linked to the patch-tun port on br-int through a patch connection.
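The vxlan port names appear to encode the remote tunnel endpoint in hexadecimal: ac100104 is 172.16.1.4 and ac100101 is 172.16.1.1. A small Python sketch, assuming this naming pattern holds (it matches every vxlan port shown in this article), converts between the two:

```python
import ipaddress


def vxlan_port_name(remote_ip: str) -> str:
    """Build a name like vxlan-ac100104 from an IPv4 address (assumed pattern)."""
    return "vxlan-" + ipaddress.IPv4Address(remote_ip).packed.hex()


def remote_ip_from_port(name: str) -> str:
    """Recover the remote IPv4 address from a vxlan-<hex> port name."""
    hex_part = name.split("-", 1)[1]
    return str(ipaddress.IPv4Address(bytes.fromhex(hex_part)))


if __name__ == "__main__":
    for port in ("vxlan-ac100101", "vxlan-ac100104", "vxlan-ac100105"):
        print(port, "->", remote_ip_from_port(port))
```

This is handy when reading flow tables, which refer to tunnels only by port number.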
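The tunnel bridge br-tun is the virtualization-layer bridge and sorts the packets it receives: a packet from the inside carrying a valid local VLAN tag is sent out through the right tunnel, and a packet arriving from the outside over a valid tunnel has its tunnel ID replaced by the corresponding local VLAN tag before being passed inside. The rules are shown in the figure below:
The tables are analyzed one by one below: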
root@host5:~# ovs-ofctl show br-tun
OFPT_FEATURES_REPLY (xid=0x2): dpid:000056e643feb343
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
1(patch-int): addr:42:85:98:00:79:06
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
2(vxlan-ac100101): addr:6a:a7:02:4c:bd:88
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
3(vxlan-ac100104): addr:f6:8e:f8:4f:64:7d
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
LOCAL(br-tun): addr:56:e6:43:fe:b3:43
config: PORT_DOWN
state: LINK_DOWN
speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
table0
cookie=0xacf7091c4749e28a, duration=21993.420s, table=0, n_packets=113, n_bytes=11169, idle_age=7560, priority=1,in_port=1 actions=resubmit(,2)
cookie=0xacf7091c4749e28a, duration=21992.342s, table=0, n_packets=86, n_bytes=9210, idle_age=7569, priority=1,in_port=2 actions=resubmit(,4)
cookie=0xacf7091c4749e28a, duration=21992.039s, table=0, n_packets=18, n_bytes=2372, idle_age=7543, priority=1,in_port=3 actions=resubmit(,4)
cookie=0xacf7091c4749e28a, duration=21993.418s, table=0, n_packets=0, n_bytes=0, idle_age=21993, priority=0 actions=drop
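Packets with in_port=1, i.e. coming from patch-int, are handed to table 2; packets with in_port=2 or in_port=3, i.e. coming from the vxlan ports, are handed to table 4. In other words, table 2 handles packets from local VMs, and table 4 handles packets arriving through VXLAN tunnels.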
table2
cookie=0xacf7091c4749e28a, duration=21993.416s, table=2, n_packets=98, n_bytes=9495, idle_age=7569, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
cookie=0xacf7091c4749e28a, duration=21993.414s, table=2, n_packets=15, n_bytes=1674, idle_age=7560, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22)
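Unicast packets are handed to table 20; multicast and broadcast packets are handed to table 22. The mask 01:00:00:00:00:00 tests only the lowest bit of the first octet of the destination MAC: 0 means unicast, 1 means multicast or broadcast.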
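The value/mask notation in these two flows can be reproduced in a few lines of Python. This sketch compares only the bits set in the mask, which is how an OVS masked match works; the function names are illustrative.

```python
def mac_to_int(mac: str) -> int:
    """Convert aa:bb:cc:dd:ee:ff to a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def masked_match(mac: str, value: str, mask: str) -> bool:
    """OVS-style value/mask match: compare only the bits set in the mask."""
    m = mac_to_int(mask)
    return (mac_to_int(mac) & m) == (mac_to_int(value) & m)


MULTICAST_BIT = "01:00:00:00:00:00"


def table2_next_table(dl_dst: str) -> int:
    if masked_match(dl_dst, "00:00:00:00:00:00", MULTICAST_BIT):
        return 20  # unicast
    if masked_match(dl_dst, "01:00:00:00:00:00", MULTICAST_BIT):
        return 22  # multicast or broadcast
    raise AssertionError("unreachable: the two matches cover every address")
```

Broadcast ff:ff:ff:ff:ff:ff has the bit set, so ARP requests from a VM go to table 22, while a frame to the router MAC fa:16:3e:fe:6e:a8 goes to table 20.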
table3
cookie=0xacf7091c4749e28a, duration=21993.412s, table=3, n_packets=0, n_bytes=0, idle_age=21993, priority=0 actions=drop
Drops all packets.
table4
cookie=0xacf7091c4749e28a, duration=7583.782s, table=4, n_packets=78, n_bytes=8954, idle_age=7543, priority=1,tun_id=0x40b actions=mod_vlan_vid:1,resubmit(,10)
cookie=0xacf7091c4749e28a, duration=21993.411s, table=4, n_packets=26, n_bytes=2628, idle_age=7586, priority=0 actions=drop
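Matches the tunnel ID (tun_id=0x40b, i.e. VNI 1035), adds the corresponding local VLAN tag (1), and hands the packet to table 10. Packets with any other tunnel ID are dropped.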
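Tables 0 and 4 together decide what happens to a packet entering br-tun. A minimal Python sketch of that decision, with the port numbers and the single VNI-to-VLAN mapping hard-coded from host5's output (the function name is illustrative):

```python
from typing import Optional, Tuple

# Values from host5: port 1 = patch-int, ports 2 and 3 = vxlan tunnels.
PATCH_INT = 1
TUNNEL_PORTS = {2, 3}
# Table 4: tun_id=0x40b actions=mod_vlan_vid:1,resubmit(,10)
VNI_TO_LOCAL_VLAN = {0x40B: 1}


def br_tun_classify(in_port: int, tun_id: Optional[int] = None) -> Tuple[str, Optional[int]]:
    """Return (next step, local VLAN) for a packet entering br-tun."""
    if in_port == PATCH_INT:
        return ("table2", None)  # from local VMs
    if in_port in TUNNEL_PORTS:
        vlan = VNI_TO_LOCAL_VLAN.get(tun_id)
        if vlan is None:
            return ("drop", None)  # table 4, priority=0
        return ("table10", vlan)  # tagged with the local VLAN, then learned
    return ("drop", None)  # table 0, priority=0
```

The VNI is global to the tenant network, while the VLAN is local to the host; this table is where one is translated into the other.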
table6
cookie=0xacf7091c4749e28a, duration=21993.409s, table=6, n_packets=0, n_bytes=0, idle_age=21993, priority=0 actions=drop
Drops all packets.
table10
cookie=0xacf7091c4749e28a, duration=21993.407s, table=10, n_packets=78, n_bytes=8954, idle_age=7543, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,cookie=0xacf7091c4749e28a,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:1
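Table 10 learns from packets arriving through tunnels: it adds a rule to table 20 that forwards the return traffic correctly, and then sends the packet to br-int through patch-int (output:1). It does this with Open vSwitch's learn action, which can dynamically modify the rules in other tables based on the flow being processed. The learned rule is built as follows (it expires after hard_timeout=300 seconds):
- NXM_OF_VLAN_TCI[0..11]: match the same VLAN ID as the current flow (NXM stands for Nicira Extensible Match);
- NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[]: match packets whose destination MAC equals the current flow's source MAC;
- load:0->NXM_OF_VLAN_TCI[]: clear the VLAN tag (set the VLAN TCI to 0);
- load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[]: set the tunnel ID to the current packet's tunnel ID;
- output:OXM_OF_IN_PORT[]: send the packet out of the port the current packet came in on.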
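The learning behavior resembles a MAC-learning switch keyed on (VLAN, MAC). A minimal Python sketch of it, assuming that relearning an existing entry restarts its 300-second timeout (consistent with the small hard_age values in the table-20 output below); the class and function names are illustrative:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

HARD_TIMEOUT = 300  # hard_timeout=300 in the learn() action


@dataclass
class LearnedFlow:
    tun_id: int          # load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[]
    out_port: int        # output:OXM_OF_IN_PORT[]
    installed_at: float  # when the entry was (re)learned


Table20 = Dict[Tuple[int, str], LearnedFlow]


def table10(table20: Table20, vlan: int, eth_src: str, tun_id: int,
            in_port: int, now: float) -> int:
    """Learn a return-path entry keyed on (VLAN, source MAC), then output:1."""
    # NXM_OF_VLAN_TCI[0..11] and NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[] form the match.
    # Relearning overwrites the entry, which also restarts its timeout.
    table20[(vlan, eth_src)] = LearnedFlow(tun_id, in_port, now)
    return 1  # patch-int, towards br-int


def table20_lookup(table20: Table20, vlan: int, eth_dst: str,
                   now: float) -> Optional[LearnedFlow]:
    """Return the learned entry, or None if it is missing or past hard_timeout."""
    flow = table20.get((vlan, eth_dst))
    if flow is None or now - flow.installed_at >= HARD_TIMEOUT:
        return None
    return flow
```

A packet from fa:16:3e:81:4a:a6 arriving on port 3 thus makes later unicast frames to that MAC go straight back out of port 3 with the same tunnel ID, instead of being flooded.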
table20
cookie=0xacf7091c4749e28a, duration=7.784s, table=20, n_packets=23, n_bytes=3452, hard_timeout=300, idle_age=1, hard_age=1, priority=1,vlan_tci=0x0001/0x0fff,dl_dst=fa:16:3e:fe:6e:a8 actions=load:0->NXM_OF_VLAN_TCI[],load:0x40b->NXM_NX_TUN_ID[],output:2
cookie=0xacf7091c4749e28a, duration=7.673s, table=20, n_packets=5, n_bytes=434, hard_timeout=300, idle_age=2, hard_age=2, priority=1,vlan_tci=0x0001/0x0fff,dl_dst=fa:16:3e:81:4a:a6 actions=load:0->NXM_OF_VLAN_TCI[],load:0x40b->NXM_NX_TUN_ID[],output:3
cookie=0xacf7091c4749e28a, duration=80448.803s, table=20, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=resubmit(,22)
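The first two rules were learned by table 10 while I was pinging demo1, on the other compute node, from inside demo2. For packets with VLAN tag 1 whose destination MAC is fa:16:3e:fe:6e:a8 (the router's qr-ae8d3c38-85 interface on the network node) or fa:16:3e:81:4a:a6 (learned through the tunnel to 172.16.1.4, presumably demo1), the rules strip the VLAN tag (load:0->NXM_OF_VLAN_TCI[]), set the VXLAN ID learned at the time (load:0x40b->NXM_NX_TUN_ID[]), and send the packet out of the tunnel port it was learned on: port 2 (vxlan-ac100101, towards 172.16.1.1) or port 3 (vxlan-ac100104, towards 172.16.1.4).
Packets for which no rule has been learned are handed to table 22.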
table22
cookie=0xacf7091c4749e28a, duration=66039.186s, table=22, n_packets=12, n_bytes=1312, idle_age=7, hard_age=65534, priority=1,dl_vlan=1 actions=strip_vlan,load:0x40b->NXM_NX_TUN_ID[],output:3,output:2
cookie=0xacf7091c4749e28a, duration=80448.801s, table=22, n_packets=6, n_bytes=488, idle_age=65534, hard_age=65534, priority=0 actions=drop
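Table 22 checks that the VLAN tag is valid (dl_vlan=1). If so, it strips the VLAN header, sets tunnel ID 0x40b and floods the packet out of all tunnel ports (output:3, output:2); packets with any other VLAN are dropped.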
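Putting tables 20 and 22 together, a frame coming from patch-int is either sent down one learned tunnel or flooded to all of them. A minimal Python sketch, with the two learned entries and the flood rule copied from host5's output (the function name is illustrative; the unicast flag stands for table 2's decision):

```python
from typing import Dict, List, Optional, Tuple

# Learned table-20 entries on host5: (VLAN, destination MAC) -> (tunnel ID, tunnel port)
TABLE20: Dict[Tuple[int, str], Tuple[int, int]] = {
    (1, "fa:16:3e:fe:6e:a8"): (0x40B, 2),  # vxlan-ac100101, towards 172.16.1.1
    (1, "fa:16:3e:81:4a:a6"): (0x40B, 3),  # vxlan-ac100104, towards 172.16.1.4
}
# Table 22: dl_vlan=1 -> strip_vlan, load:0x40b->NXM_NX_TUN_ID[], output:3, output:2
TABLE22: Dict[int, Tuple[int, List[int]]] = {1: (0x40B, [3, 2])}


def br_tun_egress(vlan: int, eth_dst: str, unicast: bool) -> Optional[Tuple[int, List[int]]]:
    """(tunnel ID, output ports) for a frame from patch-int, or None if dropped.

    In every forwarding case the local VLAN tag is removed before output."""
    if unicast:  # table 2 sent it to table 20
        learned = TABLE20.get((vlan, eth_dst))
        if learned is not None:
            tun_id, port = learned
            return tun_id, [port]
        # table 20, priority=0: no learned entry, resubmit to table 22
    flood = TABLE22.get(vlan)
    if flood is None:
        return None  # table 22, priority=0: drop
    tun_id, ports = flood
    return tun_id, list(ports)
```

Unknown unicast therefore behaves like broadcast until table 10 has learned where the destination lives.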
Network Node (Controller Node)
The network node (Compass4NFV deploys it together with the controller node) provides the network services, including DHCP, routing and advanced network services. It usually has three bridges: br-tun, br-int and br-prv.
br-tun
As on the compute nodes, the tunnel bridge br-tun serves as the virtualization-layer bridge.
root@host1:~# ovs-vsctl show
...
Bridge br-tun
Controller "tcp:127.0.0.1:6633"
is_connected: true
fail_mode: secure
Port br-tun
Interface br-tun
type: internal
Port patch-int
Interface patch-int
type: patch
options: {peer=patch-tun}
Port "vxlan-ac100104"
Interface "vxlan-ac100104"
type: vxlan
options: {df_default="true", in_key=flow, local_ip="172.16.1.1", out_key=flow, remote_ip="172.16.1.4"}
Port "vxlan-ac100105"
Interface "vxlan-ac100105"
type: vxlan
options: {df_default="true", in_key=flow, local_ip="172.16.1.1", out_key=flow, remote_ip="172.16.1.5"}
...
It mainly has two kinds of ports:
- vxlan ports: form tunnels with the vxlan ports of the other nodes.
- patch-int port: connects to br-int (its peer is patch-tun).
The forwarding rules on br-tun:
root@host1:~# ovs-ofctl dump-flows br-tun
NXST_FLOW reply (xid=0x4):
cookie=0x946168d52f4f06d4, duration=86515.003s, table=0, n_packets=72600, n_bytes=3941480, idle_age=0, hard_age=65534, priority=1,in_port=1 actions=resubmit(,2)
cookie=0x946168d52f4f06d4, duration=86209.803s, table=0, n_packets=135, n_bytes=14633, idle_age=5763, hard_age=65534, priority=1,in_port=2 actions=resubmit(,4)
cookie=0x946168d52f4f06d4, duration=86209.495s, table=0, n_packets=224, n_bytes=22548, idle_age=28565, hard_age=65534, priority=1,in_port=3 actions=resubmit(,4)
cookie=0x946168d52f4f06d4, duration=86515s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=drop
cookie=0x946168d52f4f06d4, duration=86514.997s, table=2, n_packets=261, n_bytes=31232, idle_age=5763, hard_age=65534, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
cookie=0x946168d52f4f06d4, duration=86514.994s, table=2, n_packets=72339, n_bytes=3910248, idle_age=0, hard_age=65534, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,22)
cookie=0x946168d52f4f06d4, duration=86514.992s, table=3, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=drop
cookie=0x946168d52f4f06d4, duration=71853.921s, table=4, n_packets=359, n_bytes=37181, idle_age=5763, hard_age=65534, priority=1,tun_id=0x40b actions=mod_vlan_vid:1,resubmit(,10)
cookie=0x946168d52f4f06d4, duration=86514.988s, table=4, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=drop
cookie=0x946168d52f4f06d4, duration=86514.986s, table=6, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=drop
cookie=0x946168d52f4f06d4, duration=86514.983s, table=10, n_packets=359, n_bytes=37181, idle_age=5763, hard_age=65534, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,cookie=0x946168d52f4f06d4,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:1
cookie=0x946168d52f4f06d4, duration=86514.979s, table=20, n_packets=22, n_bytes=1868, idle_age=5769, hard_age=65534, priority=0 actions=resubmit(,22)
cookie=0x946168d52f4f06d4, duration=71853.924s, table=22, n_packets=19, n_bytes=1586, idle_age=5769, hard_age=65534, priority=1,dl_vlan=1 actions=strip_vlan,load:0x40b->NXM_NX_TUN_ID[],output:3,output:2
cookie=0x946168d52f4f06d4, duration=86514.977s, table=22, n_packets=72342, n_bytes=3910530, idle_age=0, hard_age=65534, priority=0 actions=drop
root@host1:~# ovs-ofctl show br-tun
OFPT_FEATURES_REPLY (xid=0x2): dpid:00000a4919379545
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
1(patch-int): addr:ce:27:3e:4f:8d:e7
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
2(vxlan-ac100105): addr:2a:86:c7:ba:09:c6
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
3(vxlan-ac100104): addr:22:25:b5:30:4b:ea
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
LOCAL(br-tun): addr:0a:49:19:37:95:45
config: PORT_DOWN
state: LINK_DOWN
speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
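The forwarding rules are similar to those on the compute node (note that here port 2 is vxlan-ac100105 and port 3 is vxlan-ac100104), so they are not covered again.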
br-int
root@host1:~# ovs-vsctl show
...
Bridge br-int
Controller "tcp:127.0.0.1:6633"
is_connected: true
fail_mode: secure
Port "tapa4f7640a-a8"
tag: 1
Interface "tapa4f7640a-a8"
type: internal
Port "qr-ae8d3c38-85"
tag: 1
Interface "qr-ae8d3c38-85"
type: internal
Port br-int
Interface br-int
type: internal
Port patch-tun
Interface patch-tun
type: patch
options: {peer=patch-int}
Port int-br-prv
Interface int-br-prv
type: patch
options: {peer=phy-br-prv}
Port "qg-d68d3833-b3"
tag: 2
Interface "qg-d68d3833-b3"
type: internal
...
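The integration bridge br-int mainly has the following ports:
- tap port: connects to the namespace of the network's DHCP service.
- qr port: connects to the namespace of the router service.
- qg port: also connects to the router's network namespace. It holds the router's external IP, which is used as the NAT address; the network's floating IPs are placed on it as well.
- patch-tun port: connects to the br-tun bridge.
- int-br-prv port: connects to the br-prv bridge.
The network-service ports are bound to internal VLAN tags, one per network (here demo-net uses tag 1 and ext-net uses tag 2). Also, if br-int and br-prv were connected only logically, the qg port would be expected to sit on br-prv instead.
The forwarding rules of br-int: in table 0, untagged packets from port 1 (presumably int-br-prv, since they receive the qg port's tag) are given local VLAN 2 and switched with NORMAL, other packets from port 1 are dropped, and everything else is switched with NORMAL. Tables 23 and 24 drop every packet.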
root@host1:~# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
cookie=0xbafffee8ff6e6051, duration=72462.496s, table=0, n_packets=72990, n_bytes=3948906, idle_age=1, hard_age=65534, priority=3,in_port=1,vlan_tci=0x0000/0x1fff actions=mod_vlan_vid:2,NORMAL
cookie=0xbafffee8ff6e6051, duration=87152.543s, table=0, n_packets=14773, n_bytes=798318, idle_age=65534, hard_age=65534, priority=2,in_port=1 actions=drop
cookie=0xbafffee8ff6e6051, duration=87153.303s, table=0, n_packets=733, n_bytes=76248, idle_age=376, hard_age=65534, priority=0 actions=NORMAL
cookie=0xbafffee8ff6e6051, duration=87153.300s, table=23, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=drop
cookie=0xbafffee8ff6e6051, duration=87153.299s, table=24, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=0 actions=drop
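A toy Python model of table 0 on host1 shows how external traffic enters ext-net's local VLAN. It assumes port 1 is int-br-prv, since `ovs-ofctl show br-int` for host1 is not shown, and models an untagged frame as vlan=None.

```python
from typing import Optional, Tuple

EXT_NET_LOCAL_VLAN = 2  # tag of the qg port, i.e. ext-net's local VLAN on host1


def br_int_table0(in_port: int, vlan: Optional[int]) -> Tuple[str, Optional[int]]:
    """Return (action, VLAN after table 0) for a frame entering br-int on host1.

    Assumption: port 1 is int-br-prv."""
    if in_port == 1 and vlan is None:
        # priority=3,in_port=1,vlan_tci=0x0000/0x1fff actions=mod_vlan_vid:2,NORMAL
        return ("NORMAL", EXT_NET_LOCAL_VLAN)
    if in_port == 1:
        return ("drop", vlan)  # priority=2,in_port=1
    return ("NORMAL", vlan)  # priority=0
```

So an untagged frame from the physical network lands on VLAN 2, where the only port is qg-d68d3833-b3.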
br-prv
root@host1:~# ovs-vsctl show
...
Bridge br-prv
Controller "tcp:127.0.0.1:6633"
is_connected: true
fail_mode: secure
Port phy-br-prv
Interface phy-br-prv
type: patch
options: {peer=int-br-prv}
Port br-prv
Interface br-prv
type: internal
Port external
Interface external
type: internal
Port "eth1"
Interface "eth1"
...
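br-prv mainly has the following ports:
- the attached physical interface eth1, through which packets are sent to the external network;
- the phy-br-prv port, which connects to br-int.
Namespaces
In Linux, a network namespace is an environment with its own independent network stack (network interfaces, routing tables, iptables rules). Namespaces are commonly used to isolate network devices and services: only devices in the same network namespace can see each other. The ip net command (an abbreviation of ip netns) lists the existing namespaces: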
root@host1:~# ip net
qrouter-f6f6ebfe-6d93-4b5f-8aea-c2c172645588
qdhcp-d60e4d79-bc10-4470-9434-297ace28ca84
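The namespace names follow a prefix-plus-UUID pattern: the UUID in qdhcp-d60e4d79-... is exactly the ID of demo-net in the `neutron net-list` output above, and the qrouter- suffix is presumably the router's ID (the router list is not shown here). A small Python sketch that labels namespaces this way, with the network table copied from the output above:

```python
# Map namespace names to the Neutron resources they appear to belong to.
NETWORKS = {  # from `neutron net-list` above
    "773bcf25-d146-41ac-b4b5-d6b3c8bf65d8": "ext-net",
    "d60e4d79-bc10-4470-9434-297ace28ca84": "demo-net",
}


def describe_namespace(ns: str) -> str:
    kind, _, resource_id = ns.partition("-")  # split at the first "-" only
    if kind == "qdhcp":
        return f"DHCP for network {NETWORKS.get(resource_id, resource_id)}"
    if kind == "qrouter":
        return f"router {resource_id}"
    return "unknown"


if __name__ == "__main__":
    for ns in ("qrouter-f6f6ebfe-6d93-4b5f-8aea-c2c172645588",
               "qdhcp-d60e4d79-bc10-4470-9434-297ace28ca84"):
        print(ns, "->", describe_namespace(ns))
```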
DHCP Service
root@host1:~# ip net exec qdhcp-d60e4d79-bc10-4470-9434-297ace28ca84 ip addr
...
13: tapa4f7640a-a8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1
link/ether fa:16:3e:44:48:48 brd ff:ff:ff:ff:ff:ff
inet 10.10.10.2/24 brd 10.10.10.255 scope global tapa4f7640a-a8
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fe44:4848/64 scope link
valid_lft forever preferred_lft forever
The DHCP namespace contains only one network interface, tapa4f7640a-a8, which is the tapa4f7640a-a8 port on br-int. The DHCP service is provided by a dnsmasq process bound to this interface inside the DHCP namespace. The related process can be found with:
root@host1:~# ps aux | grep d60e4d79-bc10-4470-9434-297ace28ca84
nobody 20089 0.0 0.0 51592 408 ? S Apr05 0:00 dnsmasq --no-hosts --no-resolv --strict-order --except-interface=lo --pid-file=/var/lib/neutron/dhcp/d60e4d79-bc10-4470-9434-297ace28ca84/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/d60e4d79-bc10-4470-9434-297ace28ca84/host --addn-hosts=/var/lib/neutron/dhcp/d60e4d79-bc10-4470-9434-297ace28ca84/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/d60e4d79-bc10-4470-9434-297ace28ca84/opts --dhcp-leasefile=/var/lib/neutron/dhcp/d60e4d79-bc10-4470-9434-297ace28ca84/leases --dhcp-match=set:ipxe,175 --bind-interfaces --interface=tapa4f7640a-a8 --dhcp-range=set:tag0,10.10.10.0,static,86400s --dhcp-option-force=option:mtu,1450 --dhcp-lease-max=256 --conf-file=/etc/neutron/dnsmasq-neutron.conf --domain=openstacklocal
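The options that tie dnsmasq to this network can be pulled out of that command line. In the real command, --interface=tapa4f7640a-a8 together with --bind-interfaces keeps dnsmasq on the namespace's tap device, and the static keyword in --dhcp-range means dnsmasq does not hand out a dynamic pool, only serving hosts listed in the file given by --dhcp-hostsfile. A minimal Python sketch (the helper name is hypothetical, and the command line is abridged from the ps output above) collects the --key=value options; a repeated option would keep only its last value.

```python
import shlex


def dnsmasq_options(cmdline: str) -> dict:
    """Collect --key=value options from a dnsmasq command line (bare flags map to True)."""
    opts = {}
    for token in shlex.split(cmdline)[1:]:  # skip the program name
        if not token.startswith("--"):
            continue
        key, sep, value = token[2:].partition("=")
        opts[key] = value if sep else True
    return opts


if __name__ == "__main__":
    cmd = ("dnsmasq --no-hosts --bind-interfaces --interface=tapa4f7640a-a8 "
           "--dhcp-range=set:tag0,10.10.10.0,static,86400s "
           "--dhcp-option-force=option:mtu,1450")
    opts = dnsmasq_options(cmd)
    print(opts["interface"], opts["dhcp-range"])
```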
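Router Service
A router interconnects subnets. For example, when a host on a user's internal network wants to reach an address on the Internet, the router has to forward the traffic, so all traffic to and from the external network must pass through the router. The router is implemented inside a network namespace: the namespace's kernel routing table forwards the packets, and iptables performs the address translation (NAT).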
root@host1:~# ip net exec qrouter-f6f6ebfe-6d93-4b5f-8aea-c2c172645588 ip addr
14: qr-ae8d3c38-85: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1
link/ether fa:16:3e:fe:6e:a8 brd ff:ff:ff:ff:ff:ff
inet 10.10.10.1/24 brd 10.10.10.255 scope global qr-ae8d3c38-85
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fefe:6ea8/64 scope link
valid_lft forever preferred_lft forever
15: qg-d68d3833-b3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1
link/ether fa:16:3e:7d:c9:4e brd ff:ff:ff:ff:ff:ff
inet 192.168.116.231/24 brd 192.168.116.255 scope global qg-d68d3833-b3
valid_lft forever preferred_lft forever
inet 192.168.116.224/32 brd 192.168.116.224 scope global qg-d68d3833-b3
valid_lft forever preferred_lft forever
inet 192.168.116.233/32 brd 192.168.116.233 scope global qg-d68d3833-b3
valid_lft forever preferred_lft forever
inet 192.168.116.226/32 brd 192.168.116.226 scope global qg-d68d3833-b3
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fe7d:c94e/64 scope link
valid_lft forever preferred_lft forever
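The namespace contains two interfaces:
- qr-ae8d3c38-85 is the qr port on br-int. Any packet from br-int addressed to 10.10.10.1 (the gateway of the tenant's private subnet) arrives at this interface.
- qg-d68d3833-b3 is the qg port on br-int. Any packet from outside addressed to 192.168.116.231 (the router's external address, used for default SNAT) or to a floating IP requested by the tenant (192.168.116.224, 192.168.116.233 or 192.168.116.226) arrives at this interface.
The routing table of this namespace: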
root@host1:~# ip net exec qrouter-f6f6ebfe-6d93-4b5f-8aea-c2c172645588 ip route
default via 192.168.116.1 dev qg-d68d3833-b3
10.10.10.0/24 dev qr-ae8d3c38-85 proto kernel scope link src 10.10.10.1
192.168.116.0/24 dev qg-d68d3833-b3 proto kernel scope link src 192.168.116.231
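By default, and whenever the external network is accessed, packets leave through the qg-d68d3833-b3 interface and travel through br-int to br-prv and out to the external network. Traffic for the tenant's internal network leaves through the qr-ae8d3c38-85 interface and is sent to br-int.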
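The choice between qg and qr is a longest-prefix match over the three routes above. A small Python sketch using the standard ipaddress module, with the routes copied from the ip route output (the function name route_lookup is just for illustration):

```python
import ipaddress

# Routing table of the qrouter namespace, from `ip route` above:
# (prefix, output device, next hop or None for directly connected)
ROUTES = [
    ("0.0.0.0/0", "qg-d68d3833-b3", "192.168.116.1"),
    ("10.10.10.0/24", "qr-ae8d3c38-85", None),
    ("192.168.116.0/24", "qg-d68d3833-b3", None),
]


def route_lookup(dst: str):
    """Longest-prefix match: return (device, next hop or None if on-link)."""
    addr = ipaddress.ip_address(dst)
    candidates = [
        (ipaddress.ip_network(prefix), dev, via)
        for prefix, dev, via in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    # The default route always matches, so candidates is never empty.
    _, dev, via = max(candidates, key=lambda c: c[0].prefixlen)
    return dev, via


if __name__ == "__main__":
    for dst in ("10.10.10.12", "192.168.116.50", "8.8.8.8"):
        print(dst, "->", route_lookup(dst))
```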
SNAT and DNAT rules map the external floating IPs (192.168.116.*) to the internal IPs (10.10.10.*) and back:
root@host1:~# ip netns exec qrouter-f6f6ebfe-6d93-4b5f-8aea-c2c172645588 iptables -t nat -S
...
-A neutron-l3-agent-OUTPUT -d 192.168.116.233/32 -j DNAT --to-destination 10.10.10.12
-A neutron-l3-agent-OUTPUT -d 192.168.116.224/32 -j DNAT --to-destination 10.10.10.10
-A neutron-l3-agent-OUTPUT -d 192.168.116.226/32 -j DNAT --to-destination 10.10.10.13
-A neutron-l3-agent-PREROUTING -d 192.168.116.233/32 -j DNAT --to-destination 10.10.10.12
-A neutron-l3-agent-PREROUTING -d 192.168.116.224/32 -j DNAT --to-destination 10.10.10.10
-A neutron-l3-agent-PREROUTING -d 192.168.116.226/32 -j DNAT --to-destination 10.10.10.13
-A neutron-l3-agent-float-snat -s 10.10.10.12/32 -j SNAT --to-source 192.168.116.233
-A neutron-l3-agent-float-snat -s 10.10.10.10/32 -j SNAT --to-source 192.168.116.224
-A neutron-l3-agent-float-snat -s 10.10.10.13/32 -j SNAT --to-source 192.168.116.226
...
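Each floating IP appears once as a DNAT rule and once as the reverse SNAT rule. A short Python sketch parses the PREROUTING and float-snat rules shown above with regular expressions and checks that the two mappings are inverses of each other; the OUTPUT-chain rules are left out because they repeat the PREROUTING mappings.

```python
import re

RULES = """\
-A neutron-l3-agent-PREROUTING -d 192.168.116.233/32 -j DNAT --to-destination 10.10.10.12
-A neutron-l3-agent-PREROUTING -d 192.168.116.224/32 -j DNAT --to-destination 10.10.10.10
-A neutron-l3-agent-PREROUTING -d 192.168.116.226/32 -j DNAT --to-destination 10.10.10.13
-A neutron-l3-agent-float-snat -s 10.10.10.12/32 -j SNAT --to-source 192.168.116.233
-A neutron-l3-agent-float-snat -s 10.10.10.10/32 -j SNAT --to-source 192.168.116.224
-A neutron-l3-agent-float-snat -s 10.10.10.13/32 -j SNAT --to-source 192.168.116.226
"""

DNAT_RE = re.compile(r"-d (\S+)/32 -j DNAT --to-destination (\S+)")
SNAT_RE = re.compile(r"-s (\S+)/32 -j SNAT --to-source (\S+)")


def floating_ip_maps(rules: str):
    """Return (floating -> fixed, fixed -> floating) dictionaries."""
    dnat = dict(DNAT_RE.findall(rules))
    snat = dict(SNAT_RE.findall(rules))
    return dnat, snat


if __name__ == "__main__":
    dnat, snat = floating_ip_maps(RULES)
    for floating, fixed in sorted(dnat.items()):
        print(f"{floating} <-> {fixed}")
```

The resulting pairs match the nova list output at the top, e.g. demo1 is 10.10.10.10 with floating IP 192.168.116.224.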
In addition, one SNAT rule maps all other traffic leaving qg-d68d3833-b3 to the external IP 192.168.116.231, so that an internal VM without a floating IP of its own can still reach the external network.
root@host1:~# ip netns exec qrouter-f6f6ebfe-6d93-4b5f-8aea-c2c172645588 iptables -t nat -S
...
-A neutron-l3-agent-snat -o qg-d68d3833-b3 -j SNAT --to-source 192.168.116.231
...
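How the source address of outgoing traffic is rewritten can be summarized in a few lines. This Python sketch assumes that the float-snat rules are evaluated before the catch-all SNAT rule (the chain ordering is elided from the output above) and models only traffic leaving through the qg interface.

```python
QG_DEV = "qg-d68d3833-b3"
FLOAT_SNAT = {  # neutron-l3-agent-float-snat: fixed IP -> floating IP
    "10.10.10.10": "192.168.116.224",
    "10.10.10.12": "192.168.116.233",
    "10.10.10.13": "192.168.116.226",
}
ROUTER_GW_IP = "192.168.116.231"  # neutron-l3-agent-snat -o qg-d68d3833-b3


def source_after_nat(src_ip: str, out_dev: str) -> str:
    """Source address a packet carries after leaving the router namespace."""
    if out_dev != QG_DEV:
        return src_ip  # only traffic towards the external network is modeled
    if src_ip in FLOAT_SNAT:
        return FLOAT_SNAT[src_ip]  # instance has a floating IP
    return ROUTER_GW_IP  # everyone else shares the router's external address
```

Under these assumptions demo2 appears on the external network as 192.168.116.233, while a VM without a floating IP appears as 192.168.116.231.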