DOCKER Networking Basics
Network namespaces (Linux net namespace)
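The Linux kernel supports network namespaces (net namespace), which allow multiple instances of the network protocol stack to exist. The network stacks in different namespaces are completely isolated from each other, and Docker uses net namespaces to isolate containers at the network level.
Each network stack has its own routing table and its own iptables rules (for packet filtering and NAT).
A network namespace contains elements such as processes, sockets and network devices; each of these elements must belong to exactly one namespace.
How network namespaces are implemented: the protocol stack's global variables are moved into the namespace, becoming private per-namespace variables, and a namespace parameter is added to the protocol stack's function calls.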
Namespace structure
When a non-default namespace is created it contains only the lo loopback device, which is down by default. Every other device has to be created separately; in Docker this is done by the docker daemon.
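Here is a minimal Python sketch of the implementation idea above. It is a toy model, not kernel code, and the names NetNamespace, add_device, set_link_up and add_addr are hypothetical. State that would otherwise be global (the device table) lives inside each namespace object. Every operation takes the namespace as an explicit parameter, and a new namespace starts with only a lo device that is down.

```python
from dataclasses import dataclass, field


@dataclass
class NetNamespace:
    """Toy model: state that was once global lives inside each namespace."""
    name: str
    # ifname -> {"up": bool, "addrs": [CIDR strings]}; private to this namespace
    devices: dict = field(default_factory=dict)

    def __post_init__(self):
        # A new namespace starts with only a loopback device, and it is down.
        self.devices["lo"] = {"up": False, "addrs": []}


def add_device(ns: NetNamespace, ifname: str) -> None:
    # Every "stack" operation receives the namespace as an explicit parameter.
    if ifname in ns.devices:
        raise ValueError(f"{ifname} already exists in {ns.name}")
    ns.devices[ifname] = {"up": False, "addrs": []}


def set_link_up(ns: NetNamespace, ifname: str) -> None:
    ns.devices[ifname]["up"] = True


def add_addr(ns: NetNamespace, ifname: str, cidr: str) -> None:
    ns.devices[ifname]["addrs"].append(cidr)
```

Each namespace owns its device table, so two namespaces can both contain an eth0 without conflict. Docker relies on this property when every container gets its own eth0.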
Create a namespace:
ip netns add <ns-name>
Run a command inside a namespace:
ip netns exec <ns-name> <command>
Physical devices normally stay in the root namespace, while virtual devices can be created in, and moved between, namespaces.
Check whether a device can be moved:
# ethtool -k br0
netns-local: on [fixed]
Every device has a netns-local (NETIF_F_NETNS_LOCAL) attribute; if it is on, the device cannot be moved to another namespace.
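Checking the flag can be scripted. The sketch below assumes you already have the text printed by ethtool -k <dev>, for example captured with subprocess.run. The function name can_change_netns is hypothetical, and it only looks at the netns-local: line shown above.

```python
def can_change_netns(ethtool_k_output: str) -> bool:
    """Return False if `ethtool -k` reports 'netns-local: on', True if 'off'."""
    for line in ethtool_k_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "netns-local":
            # value looks like " on [fixed]" or " off [fixed]"
            return value.split()[0] != "on"
    raise ValueError("netns-local feature not found in ethtool -k output")
```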
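Veth device pairs
A veth pair is used for communication between namespaces. The two ends are connected back to back, so whatever is sent into one end comes out of the other.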
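The behaviour of a veth pair can be pictured with a toy Python model. The classes are hypothetical and are neither a kernel nor an iproute2 API. Each end holds a reference to its peer. Transmitting on one end appends the frame to the peer's receive queue, whichever namespace the peer is in. As a simplification of this model, frames are dropped while either end is down.

```python
from collections import deque


class VethEnd:
    """One end of a toy veth pair (hypothetical model, not kernel code)."""

    def __init__(self, name: str, netns: str = "root"):
        self.name = name
        self.netns = netns      # namespace this end currently lives in
        self.up = False
        self.peer = None        # set by make_veth_pair()
        self.rx = deque()       # frames received by this end

    def send(self, frame: bytes) -> bool:
        # There is no wire: transmitting on one end hands the frame straight
        # to the peer's receive queue, whatever namespace the peer is in.
        if not (self.up and self.peer.up):
            return False        # in this model, dropped if either end is down
        self.peer.rx.append(frame)
        return True


def make_veth_pair(name_a: str, name_b: str):
    a, b = VethEnd(name_a), VethEnd(name_b)
    a.peer, b.peer = b, a
    return a, b
```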
Hands-on
Add two namespaces, ns1 and ns2:
root@rancherk8sn1:~# ip netns add ns1
root@rancherk8sn1:~# ip netns add ns2
root@rancherk8sn1:~# ip netns
ns2
ns1
Create a veth pair:
root@rancherk8sn1:~# ip link add veth0 type veth peer name veth1
root@rancherk8sn1:~# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 00:0c:29:71:7d:82 brd ff:ff:ff:ff:ff:ff
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
link/ether 02:42:a1:e0:50:32 brd ff:ff:ff:ff:ff:ff
4: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 2a:95:ac:e4:5f:56 brd ff:ff:ff:ff:ff:ff
5: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether ca:4c:69:0c:d1:76 brd ff:ff:ff:ff:ff:ff
Move veth1 into the ns1 namespace. veth1 no longer appears in the root namespace:
root@rancherk8sn1:~# ip link set veth1 netns ns1
root@rancherk8sn1:~# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 00:0c:29:71:7d:82 brd ff:ff:ff:ff:ff:ff
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
link/ether 02:42:a1:e0:50:32 brd ff:ff:ff:ff:ff:ff
5: veth0@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether ca:4c:69:0c:d1:76 brd ff:ff:ff:ff:ff:ff link-netnsid 0
List the devices in ns1, where veth1 now appears:
root@rancherk8sn1:~# ip netns exec ns1 ip link show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: veth1@if5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 2a:95:ac:e4:5f:56 brd ff:ff:ff:ff:ff:ff link-netnsid 0
Move veth0 into ns2:
root@rancherk8sn1:~# ip link set veth0 netns ns2
root@rancherk8sn1:~# ip netns exec ns2 ip link show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
5: veth0@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether ca:4c:69:0c:d1:76 brd ff:ff:ff:ff:ff:ff link-netnsid 0
In Docker, the container-side end of the veth pair is renamed to eth0.
Assign IP addresses to the veth ends in the two namespaces and bring them up:
root@rancherk8sn1:~# ip netns exec ns2 ip addr add 10.1.1.5/24 dev veth0
root@rancherk8sn1:~# ip netns exec ns1 ip addr add 10.1.1.3/24 dev veth1
root@rancherk8sn1:~# ip netns exec ns1 ip link set dev veth1 up
root@rancherk8sn1:~# ip netns exec ns2 ip link set dev veth0 up
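The two namespaces can now reach each other. The 10.1.1.1/24 address on veth1 in the ns1 output below was added by an earlier command that is not shown; only 10.1.1.3 was added above.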
root@rancherk8sn1:~# ip netns exec ns2 ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
5: veth0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether ca:4c:69:0c:d1:76 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.1.1.5/24 scope global veth0
valid_lft forever preferred_lft forever
inet6 fe80::c84c:69ff:fe0c:d176/64 scope link
valid_lft forever preferred_lft forever
root@rancherk8sn1:~# ip netns exec ns1 ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: veth1@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 2a:95:ac:e4:5f:56 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet 10.1.1.1/24 scope global veth1
valid_lft forever preferred_lft forever
inet 10.1.1.3/24 scope global secondary veth1
valid_lft forever preferred_lft forever
inet6 fe80::2895:acff:fee4:5f56/64 scope link
valid_lft forever preferred_lft forever
root@rancherk8sn1:~# ip netns exec ns1 ping 10.1.1.5
PING 10.1.1.5 (10.1.1.5) 56(84) bytes of data.
64 bytes from 10.1.1.5: icmp_seq=1 ttl=64 time=0.017 ms
64 bytes from 10.1.1.5: icmp_seq=2 ttl=64 time=0.037 ms
64 bytes from 10.1.1.5: icmp_seq=3 ttl=64 time=0.038 ms
64 bytes from 10.1.1.5: icmp_seq=4 ttl=64 time=0.032 ms
How to find a veth device's peer
Use ethtool to find the peer:
root@rancherk8sn1:~# ip netns exec ns1 ethtool -S veth1
NIC statistics:
peer_ifindex: 5 # ifindex of the peer
root@rancherk8sn1:~# ip netns exec ns2 ip link show | grep 5
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
5: veth0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
# veth0@if4 has index 5 (the lo line also matches because "65536" contains a 5)
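The two lookups above can be combined in a small parser. This sketch assumes you feed it two texts: the output of ethtool -S <veth>, and the output of ip link show run inside the namespace that holds the peer. The function names are hypothetical.

Anchoring the match at the start of a line avoids the false hit on lo that grep 5 produces. Interface indexes are not guaranteed to be unique across namespaces, so the lookup must use the peer namespace's listing.

```python
import re


def peer_ifindex(ethtool_s_output: str) -> int:
    """Extract peer_ifindex from `ethtool -S <veth>` output."""
    m = re.search(r"^\s*peer_ifindex:\s*(\d+)", ethtool_s_output, re.MULTILINE)
    if m is None:
        raise ValueError("no peer_ifindex found (is this a veth device?)")
    return int(m.group(1))


def ifname_by_index(ip_link_output: str, index: int) -> str:
    """Find the interface whose `ip link show` line starts with '<index>:'."""
    # Anchored at line start, so 'mtu 65536' on the lo line cannot match 5.
    m = re.search(rf"^{index}:\s+([^:@\s]+)", ip_link_output, re.MULTILINE)
    if m is None:
        raise LookupError(f"no interface with index {index}")
    return m.group(1)
```

The @ifN suffix in ip link show output carries the same peer index; for example, veth1@if5 points at index 5.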
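Bridge
A bridge is a layer-2 virtual device that connects several network interfaces so they can communicate with each other. It learns MAC addresses and stores them in its MAC table, and the Linux kernel supports bridging network interfaces.

A Linux bridge differs from a hardware switch. A switch can only forward or drop a frame, but the Linux kernel is itself a host, so it may also consume the frame and pass it up to its protocol stack. A Linux bridge can therefore be viewed both as a layer-2 device and as a layer-3 device.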
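The MAC learning described above can be sketched in Python. This is a simplified model with hypothetical names (LearningBridge, receive). It behaves as follows:

- It learns the source MAC of each frame.
- It forwards a frame to the learned port for its destination.
- It floods unknown and broadcast destinations.
- It filters frames whose destination sits behind the ingress port.
- It returns "local" for frames addressed to the bridge's own MAC, which is where a Linux bridge hands the frame to the host's protocol stack.

Entry ageing, STP and local delivery of broadcasts are left out.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningBridge:
    """Toy model of a MAC-learning bridge (hypothetical, simplified)."""

    def __init__(self, ports, own_mac):
        self.ports = set(ports)
        self.own_mac = own_mac   # the bridge device's own MAC address
        self.fdb = {}            # MAC table: MAC -> port it was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Return "local", or the sorted list of ports the frame goes out on."""
        self.fdb[src_mac] = in_port                  # learn/refresh sender's port
        if dst_mac == self.own_mac:
            return "local"                           # up to the host's stack
        out_port = self.fdb.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            return sorted(self.ports - {in_port})    # flood to every other port
        if out_port == in_port:
            return []                                # same segment: filter
        return [out_port]
```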
Linux bridge implementation
Linux implements bridging with a virtual bridge device (net device), to which multiple Ethernet interface devices can be attached.
Bridge commands:
brctl addbr br0 # create a bridge
brctl addif br0 eth4 # attach eth4 to the bridge as a port
ifconfig eth4 0.0.0.0 # as a bridge port, the physical NIC works at the link layer and no longer needs an IP
ifconfig br0 192.168.10.10 # give the bridge an IP address
iptables and netfilter
netfilter runs in kernel mode and executes the rules hooked into the kernel's packet path. iptables runs in user mode and maintains the rule tables.
netfilter structure
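The netfilter structure can be summarized by the order of the hooks a packet passes through. The Python sketch below encodes that order, along with the built-in chains of the nat and filter tables as they appear in the iptables-save output later in this article. The function names are hypothetical, and the raw and mangle tables are ignored.

```python
# Built-in chains of the two tables shown in this article's iptables-save output.
BUILTIN_CHAINS = {
    "nat": ["PREROUTING", "INPUT", "OUTPUT", "POSTROUTING"],
    "filter": ["INPUT", "FORWARD", "OUTPUT"],
}


def hooks_traversed(origin: str, to_local_host: bool = False) -> list:
    """Hooks a packet passes, for origin 'network' (arriving) or 'local'."""
    if origin == "local":
        # Locally generated packets. Traffic addressed to the host itself then
        # re-enters through lo, which this sketch does not model.
        return ["OUTPUT", "POSTROUTING"]
    if origin != "network":
        raise ValueError("origin must be 'network' or 'local'")
    # The routing decision is made after PREROUTING, which is why DNAT sits there.
    if to_local_host:
        return ["PREROUTING", "INPUT"]
    return ["PREROUTING", "FORWARD", "POSTROUTING"]


def tables_at(hook: str) -> list:
    """Which of the two tables have a built-in chain at this hook."""
    return [table for table, chains in BUILTIN_CHAINS.items() if hook in chains]
```

Traffic from a container to the outside world enters the host through docker0 and is forwarded, so it takes the PREROUTING, FORWARD, POSTROUTING path. This is why Docker's MASQUERADE rule is in nat POSTROUTING and its ACCEPT rules are in the filter FORWARD chain.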
Docker network implementation
Docker network modes
- host mode: specified with --net=host. A container in host mode talks to the outside world directly with the Docker host's IP address, and services inside the container can use host ports without NAT. Its main advantage is better network performance, but ports already in use on the Docker host cannot be used again, and network isolation is poor.
- container mode: specified with --net=container:NAME_or_ID. The new container shares the Network Namespace of an existing container instead of getting its own. Docker does not create a NIC or assign an IP for it. It sees the same interfaces, IP address and port range as the existing container, and the two can reach each other over lo. Other resources, such as the file system and process list, stay isolated.
- none mode: specified with --net=none (also written --network=none). The container has only the lo loopback interface and no other NIC. Docker configures no networking for it, so any NIC or IP address has to be added manually. Such a container cannot reach the network, and this closed setup helps keep it secure.
- bridge mode: specified with --net=bridge (the default). When a container is created, a new network interface is attached to docker0; this is the virtual NIC created for the container. Bridge mode gives each container its own network stack, so the processes in it use an independent network environment. This isolates containers from each other and from the Docker host.
- user-defined mode: user-defined networks mainly offer three drivers: bridge, overlay and macvlan. The bridge driver creates networks similar to the bridge network described above; the overlay and macvlan drivers create cross-host networks.
Network model of bridge mode
The docker daemon creates the docker0 bridge when it starts.
In the diagram, IP1 (docker0's address) defaults to 172.17.0.1/16. IP2 (a container's address) is an unused address chosen from IP1's subnet, and IP3 is the host NIC's address.
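Choosing an unused address from docker0's subnet can be sketched with Python's standard ipaddress module. The lowest-free-address strategy below is an assumption for illustration, since Docker's IPAM may choose differently, and allocate_container_ip is a hypothetical name. With docker0 holding 172.17.0.1, it yields 172.17.0.2/16, which matches the busybox container below.

```python
import ipaddress


def allocate_container_ip(subnet: str, gateway: str, in_use: set) -> str:
    """Return the lowest free host address in `subnet` as 'a.b.c.d/prefix'."""
    net = ipaddress.ip_network(subnet)
    taken = {ipaddress.ip_address(a) for a in in_use}
    taken.add(ipaddress.ip_address(gateway))      # docker0 itself owns the gateway
    for host in net.hosts():                      # skips network/broadcast addresses
        if host not in taken:
            return f"{host}/{net.prefixlen}"
    raise RuntimeError(f"subnet {subnet} is exhausted")
```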
The container's eth0 and the vethXXX interface on the bridge form a veth pair.
Start a busybox container to look at its network settings. eth0@if17 is a veth device:
root@rancherk8sn1:~# docker run -it busybox
/ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
16: eth0@if17: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
valid_lft forever preferred_lft forever
Now look at the host's network devices:
root@rancherk8sn1:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:0c:29:71:7d:82 brd ff:ff:ff:ff:ff:ff
inet 192.168.10.14/24 brd 192.168.10.255 scope global ens33
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe71:7d82/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 02:42:96:20:83:69 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe80::42:96ff:fe20:8369/64 scope link
valid_lft forever preferred_lft forever
17: vethdc57d12@if16: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
link/ether 1a:26:a7:36:d4:60 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::1826:a7ff:fe36:d460/64 scope link
valid_lft forever preferred_lft forever
root@rancherk8sn1:~# brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.024296208369 no vethdc57d12
docker0 is a bridge device, and vethdc57d12@if16 is one end of a veth pair, attached to docker0. Its peer is interface 16, the container's eth0.
netfilter rules after starting busybox (no port mapping):
root@rancherk8sn1:~# iptables-save
# Generated by iptables-save v1.6.1 on Mon Jul 8 06:46:13 2019
*nat
:PREROUTING ACCEPT [8:520]
:INPUT ACCEPT [8:520]
:OUTPUT ACCEPT [305:21540]
:POSTROUTING ACCEPT [305:21540]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
COMMIT
# Completed on Mon Jul 8 06:46:13 2019
# Generated by iptables-save v1.6.1 on Mon Jul 8 06:46:13 2019
*filter
:INPUT ACCEPT [65604:236051607]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [59111:2530704]
:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
COMMIT
root@rancherk8sn1:~# ip route
default via 192.168.10.2 dev ens33 proto static
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.10.0/24 dev ens33 proto kernel scope link src 192.168.10.14
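The docker daemon has already written rules into netfilter. For example, the nat POSTROUTING rule masquerades traffic from 172.17.0.0/16 that leaves through any interface other than docker0.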
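The nat rule -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE can be read as a small function. The Python sketch below is a hypothetical model of that single rule, not of iptables as a whole. If the source is in the docker subnet and the packet leaves through an interface other than docker0, the source address is rewritten to the outgoing interface's address. That address is assumed here to be ens33's 192.168.10.14.

```python
import ipaddress

DOCKER_SUBNET = ipaddress.ip_network("172.17.0.0/16")


def masquerade_source(src: str, out_iface: str, out_iface_addr: str) -> str:
    """Source address after -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE."""
    if ipaddress.ip_address(src) in DOCKER_SUBNET and out_iface != "docker0":
        return out_iface_addr   # rewritten to the outgoing interface's address
    return src                  # rule does not match: source unchanged
```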
Now look at an example with port mapping:
docker run -d -p 80:80 nginx
Check the netfilter tables and routes again:
root@rancherk8sn1:~# iptables-save
# Generated by iptables-save v1.6.1 on Mon Jul 8 07:04:16 2019
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [1:76]
:POSTROUTING ACCEPT [1:76]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 172.17.0.2/32 -d 172.17.0.2/32 -p tcp -m tcp --dport 80 -j MASQUERADE
-A DOCKER -i docker0 -j RETURN
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.17.0.2:80
COMMIT
# Completed on Mon Jul 8 07:04:16 2019
# Generated by iptables-save v1.6.1 on Mon Jul 8 07:04:16 2019
*filter
:INPUT ACCEPT [288:17512]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [212:20980]
:DOCKER - [0:0]
:DOCKER-ISOLATION-STAGE-1 - [0:0]
:DOCKER-ISOLATION-STAGE-2 - [0:0]
:DOCKER-USER - [0:0]
-A FORWARD -j DOCKER-USER
-A FORWARD -j DOCKER-ISOLATION-STAGE-1
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A DOCKER -d 172.17.0.2/32 ! -i docker0 -o docker0 -p tcp -m tcp --dport 80 -j ACCEPT
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
-A DOCKER-USER -j RETURN
COMMIT
# Completed on Mon Jul 8 07:04:16 2019
root@rancherk8sn1:~# ip route
default via 192.168.10.2 dev ens33 proto static
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.10.0/24 dev ens33 proto kernel scope link src 192.168.10.14
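Two rules were added to the nat table:
-A POSTROUTING -s 172.17.0.2/32 -d 172.17.0.2/32 -p tcp -m tcp --dport 80 -j MASQUERADE
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.17.0.2:80
The DNAT rule handles TCP traffic to port 80 on a local address that arrives from any interface other than docker0, and rewrites its destination to the container at 172.17.0.2:80. The MASQUERADE rule covers the case where the container connects to its own published port. The filter table's DOCKER chain also gained a rule accepting TCP traffic to 172.17.0.2 port 80. The routing table did not change.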
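The port-mapping rules can be traced the same way. This hypothetical Python model encodes only two things: the nat PREROUTING jump (-m addrtype --dst-type LOCAL -j DOCKER) and the two DOCKER chain rules shown above, evaluated in order. The local_addrs argument stands in for the addresses the kernel considers local, which is an assumption of the sketch.

```python
# Rules of the nat DOCKER chain after `docker run -d -p 80:80 nginx`, in order.
DOCKER_CHAIN = [
    # -A DOCKER -i docker0 -j RETURN
    {"in_iface": "docker0", "negate_iface": False, "proto": None, "dport": None,
     "target": "RETURN"},
    # -A DOCKER ! -i docker0 -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.17.0.2:80
    {"in_iface": "docker0", "negate_iface": True, "proto": "tcp", "dport": 80,
     "target": ("DNAT", "172.17.0.2", 80)},
]


def nat_prerouting(in_iface, proto, dst_ip, dst_port, local_addrs):
    """Return the (possibly rewritten) destination of an incoming packet."""
    # -A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
    if dst_ip not in local_addrs:
        return dst_ip, dst_port
    for rule in DOCKER_CHAIN:
        if (in_iface == rule["in_iface"]) == rule["negate_iface"]:
            continue                    # interface condition not met
        if rule["proto"] is not None and rule["proto"] != proto:
            continue
        if rule["dport"] is not None and rule["dport"] != dst_port:
            continue
        if rule["target"] == "RETURN":
            break                       # leave the DOCKER chain unchanged
        _, new_ip, new_port = rule["target"]
        return new_ip, new_port         # DNAT to the container
    return dst_ip, dst_port
```

In this model, connections to 192.168.10.14:80 arriving on ens33 are rewritten to 172.17.0.2:80. Traffic coming from docker0 hits the RETURN rule first and is left alone.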