OpenShift 4.4 online installation (also usable for all-in-one) -- static IP

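This article describes an online bare-metal installation of OpenShift 4.4. My environment is VMware ESXi virtualization, but the steps also apply to virtual machines provided in other ways, or to physical machines.
OCP 4 can be downloaded and tried out once you register an account. If you like it, you can buy an official subscription for more enterprise services.
The setup uses 1 master and 1 worker to save resources. More worker nodes can be added by repeating the worker steps, as many as you want.
For an all-in-one single node, give the master more resources and skip the worker deployment. The installation is finished once the master and the remaining components described below are deployed; the steps are otherwise identical.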

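I previously did an offline installation of 4.3, but that environment broke. This time I take the easier route and install online, with 1 master and 1 worker.
The earlier document covers a highly available 4.3 offline installation using DHCP: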
https://github.com/cai11745/k8s-ocp-yaml/blob/master/ocp4/2020-02-25-openshift4.3-install-offline-dhcp.md

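Deployment environment

Compared with the official layout there is one extra base node, which hosts the DNS, file server and other services the deployment needs. This machine runs CentOS 7.6, because package repositories are easy to sort out on CentOS.

All other machines run RHCOS, the CoreOS operating system version built specifically for OpenShift.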

Machine	OS	vCPU	RAM	Storage	IP
bastion	Centos7.6	2	8GB	100 GB	192.168.2.20
bootstrap-0	RHCOS	2	4GB	100 GB	192.168.2.21
master-0	RHCOS	8	16 GB	100 GB	192.168.2.22
worker-0	RHCOS	16	32 GB	100 GB	192.168.2.23

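Node roles:
1 base services node for the services the installation depends on: DNS (dnsmasq), the HTTP file server and haproxy. Any OS will do. With a single master, the load balancer is only really needed while the bootstrap node is still in use.
1 bootstrap node, used to install the OpenShift cluster; it can be deleted once the cluster installation has finished. OS: RHCOS
1 control plane node, i.e. the master. Normally three are deployed for high availability; etcd also runs on them. OS: RHCOS
1 compute node (worker), running OpenShift infrastructure components and applications. OS: RHCOS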

Installation order

Prepare the base node first, including the DNS, the file server and the ignition files. Then install the bootstrap machine, then the master, and finally the worker nodes.

Installation preparation

Install the base node

|base|centos7.6|4|8GB|100 GB|192.168.2.20|

Install CentOS 7.6 minimal.
Set the IP address and hostname, and disable the firewall and SELinux.
Note that every node's hostname uses a multi-level domain name such as master1.aa.bb.com

hostnamectl set-hostname bastion.ocp4.example.com
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0
systemctl disable firewalld
systemctl stop firewalld

Download the installation files

https://cloud.redhat.com/openshift/install/metal/user-provisioned
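Register an account if you do not have one.
Download openshift-install-linux.tar.gz, pull-secret.txt, openshift-client-linux.tar.gz, rhcos-4.4.3-x86_64-installer.x86_64.iso and rhcos-4.4.3-x86_64-metal.x86_64.raw.gz
These are, respectively, the installer, the image pull secret, the OpenShift Linux client (the oc command) and the RHCOS installation files. All of them come from the same page above. If the node already has an older oc command, delete it and use the freshly downloaded one.

Install the openshift-install and oc commands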

tar -zxvf openshift-install-linux.tar.gz
chmod +x openshift-install
mv openshift-install /usr/local/bin/

tar -zxvf openshift-client-linux.tar.gz
chmod +x oc kubectl
mv oc kubectl /usr/local/bin/

Configure the DNS server

On the base node.
Worker nodes reach the master by domain name, so a DNS server is needed for name resolution.

yum install dnsmasq -y

# Configure dnsmasq; the configuration file is shown below (a copy is also on the network drive)
cd /etc/dnsmasq.d/

vi ocp4.conf
address=/api.ocp4.example.com/192.168.2.20
address=/api-int.ocp4.example.com/192.168.2.20
address=/.apps.ocp4.example.com/192.168.2.22
address=/etcd-0.ocp4.example.com/192.168.2.22
srv-host=_etcd-server-ssl._tcp.ocp4.example.com,etcd-0.ocp4.example.com,2380,10

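# api and api-int point to the base node itself, which runs haproxy and load-balances to the bootstrap and master
# .apps is the wildcard domain for applications; it points to the master, because the router is initially deployed on the master node
# etcd-0 points to the master

# Start the service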
systemctl start dnsmasq
systemctl enable  dnsmasq

# Verify resolution: each nslookup should return the IP configured above
yum install bind-utils -y

nslookup api.ocp4.example.com 192.168.2.20
nslookup api-int.ocp4.example.com 192.168.2.20
nslookup 333.apps.ocp4.example.com 192.168.2.20
nslookup etcd-0.ocp4.example.com 192.168.2.20
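
The manual lookups above can also be wrapped in a small loop that compares each answer with the address configured in ocp4.conf. This is only a sketch: check_dns, resolve and expected_records are hypothetical helper names, and it assumes dig from the bind-utils package is installed.

```shell
# Sketch: compare each record served by dnsmasq with the address expected from ocp4.conf.
# DNS_SERVER and the name/IP pairs come from this article; the helper names are hypothetical.
DNS_SERVER=192.168.2.20

# Expected "name ip" pairs, one per line.
expected_records() {
cat <<'EOF'
api.ocp4.example.com 192.168.2.20
api-int.ocp4.example.com 192.168.2.20
test.apps.ocp4.example.com 192.168.2.22
etcd-0.ocp4.example.com 192.168.2.22
EOF
}

# Resolve one name against DNS_SERVER and print the first A record.
resolve() {
    dig +short "@${DNS_SERVER}" "$1" A | head -n 1
}

# Print OK/FAIL per record; return non-zero if any record is wrong.
check_dns() {
    rc=0
    while read -r name want; do
        got=$(resolve "$name")
        if [ "$got" = "$want" ]; then
            echo "OK   $name -> $got"
        else
            echo "FAIL $name -> '${got}' (expected $want)"
            rc=1
        fi
    done <<EOF
$(expected_records)
EOF
    return $rc
}
```

Running check_dns on the base node prints one line per record, so a wrong or missing entry is easy to spot before any node is booted.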

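Prepare the installation configuration files

On the base node.
Create a directory to hold the installation configuration files. Do not create it under /root, or the httpd service will run into permission problems later.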

mkdir /opt/install
cd /opt/install


# Edit the installation configuration file
vi install-config.yaml


apiVersion: v1
baseDomain: example.com    #1
compute:
- hyperthreading: Enabled 
  name: worker
  replicas: 0   #2
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 1   #3
metadata:
  name: ocp4   #4
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14   #5
    hostPrefix: 23 
  networkType: OpenShiftSDN
  serviceNetwork: 
  - 172.30.0.0/16    #5
platform:
  none: {}    #6
fips: false 
pullSecret: '{"auths": ...}'  #7
sshKey: 'ssh-ed25519 AAAA...'  #8

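Parameter notes (the numbers match the #N markers in the file above):

1. The base domain.
2. Workers are deployed separately later on, so this is 0.
3. Single master, so this is 1.
4. The cluster name: the domain level right after the master/worker node name, which is why hostnames need several levels.
5. The pod IP and service IP ranges. They must not conflict with IP ranges already in use on the internal network.
6. This is a plain bare-metal installation, so nothing is filled in.
7. The content of the pull-secret.txt downloaded in the previous section, wrapped in single quotes.
8. Used for passwordless SSH login later. Run ssh-keygen -t rsa -b 2048 -N "" -f /root/.ssh/id_rsa, then cat /root/.ssh/id_rsa.pub, and put the output, wrapped in single quotes, into sshKey.

Back up the configuration file. This is mandatory, because the yaml file disappears once the commands below have run.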
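
Pasting the pull secret and the public key by hand is where quoting mistakes tend to creep in, so here is a sketch that renders the file from the two source files instead. render_install_config is a hypothetical helper, not part of openshift-install, and the field values simply mirror the file shown above.

```shell
# Sketch: render install-config.yaml from the downloaded pull secret and an SSH public key.
# render_install_config is a hypothetical helper; the values mirror the file in this article.
# Usage: render_install_config <pull-secret.txt> <id_rsa.pub> > install-config.yaml
render_install_config() {
    pull_secret=$(tr -d '\n' < "$1")   # the secret has to stay on a single line
    ssh_key=$(tr -d '\n' < "$2")
    cat <<EOF
apiVersion: v1
baseDomain: example.com
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 1
metadata:
  name: ocp4
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: '${pull_secret}'
sshKey: '${ssh_key}'
EOF
}
```

For example: render_install_config pull-secret.txt /root/.ssh/id_rsa.pub > /opt/install/install-config.yaml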

cp install-config.yaml install-config.yaml.bak.0619

# Generate the Kubernetes manifests
# --dir is the path that contains install-config.yaml
openshift-install create manifests --dir=/opt/install

# Generate the ignition configs
openshift-install create ignition-configs --dir=/opt/install/

The directory now looks like this:
.
├── auth
│   ├── kubeadmin-password
│   └── kubeconfig
├── bootstrap.ign
├── master.ign
├── metadata.json
└── worker.ign

Deploy the httpd file server

Deployed on the base node; the OpenShift nodes pull their configuration files from it during installation.

# Install httpd
yum install httpd -y

# Symlink the /opt/install directory into /var/www/html
mv ocp4.4/rhcos-4.4.3-x86_64-metal.x86_64.raw.gz /opt/install/
ln -s /opt/install/ /var/www/html/install
chmod 777 -R /opt/install/

# Start the service
systemctl start httpd
systemctl enable httpd

Open the base node in a browser at http://192.168.2.20/install/ and you should see the files in the directory.

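Deploy haproxy

Deployed on the base node; it load-balances port 6443 (API) and port 22623 (machine config server) to the bootstrap and master nodes.

yum install haproxy -y
cd /etc/haproxy/
vi haproxy.cfg  # append the configuration below at the end; the stock frontend and backend sections are unused and can be deleted

# Optional: a stats page for checking backend status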
listen stats
    bind :9000
    mode http
    stats enable
    stats uri /
    monitor-uri /healthz

# Load-balance the master API; the bootstrap entry is removed later
frontend openshift-api-server
    bind *:6443
    default_backend openshift-api-server
    mode tcp
    option tcplog

backend openshift-api-server
    balance source
    mode tcp
    server bootstrap 192.168.2.21:6443 check
    server master0 192.168.2.22:6443 check

frontend machine-config-server
    bind *:22623
    default_backend machine-config-server
    mode tcp
    option tcplog

backend machine-config-server
    balance source
    mode tcp
    server bootstrap 192.168.2.21:22623 check
    server master0 192.168.2.22:22623 check

Start the service

systemctl enable haproxy && systemctl start haproxy

Verify the service

Open IP:9000 in a browser to see the haproxy stats page. The backends are not up yet, so most entries are red.

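Install the bootstrap node

In the hypervisor, create the machine according to the plan above and boot it from rhcos-4.4.3-x86_64-installer.x86_64.iso

On the "Install RHEL CoreOS" boot entry, press Tab to edit the boot parameters.
Append the following after coreos.inst=yes. Check the parameters carefully: they cannot be pasted into the console and have to be typed by hand.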

ip=192.168.2.21::192.168.2.1:255.255.255.0:bootstrap.ocp4.example.com:ens192:none nameserver=192.168.2.20 coreos.inst.install_dev=sda coreos.inst.image_url=http://192.168.2.20/install/rhcos-4.4.3-x86_64-metal.x86_64.raw.gz coreos.inst.ignition_url=http://192.168.2.20/install/bootstrap.ign

The fields of ip=... are ip=ipaddr::gateway:netmask:hostnameFQDN:interface name:DHCP mode (none means static, no DHCP)
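
Because the line has to be typed by hand on every node, it helps to print it once per node and read it off a second screen. The sketch below assembles the arguments from the node's IP, hostname and ignition file; rhcos_kargs is a hypothetical helper, and the gateway, netmask, NIC, disk and URLs are the values used in this article.

```shell
# Sketch: assemble the RHCOS kernel arguments for one node.
# rhcos_kargs is a hypothetical helper; the fixed values below are the ones used in this article.
# Usage: rhcos_kargs <ip> <fqdn> <ignition-file>
rhcos_kargs() {
    ip=$1 fqdn=$2 ign=$3
    gateway=192.168.2.1
    netmask=255.255.255.0
    nic=ens192
    dns=192.168.2.20
    base_url=http://192.168.2.20/install
    # ip=<address>::<gateway>:<netmask>:<hostname>:<interface>:none  ("none" = static, no DHCP)
    printf 'ip=%s::%s:%s:%s:%s:none nameserver=%s ' "$ip" "$gateway" "$netmask" "$fqdn" "$nic" "$dns"
    printf 'coreos.inst.install_dev=sda '
    printf 'coreos.inst.image_url=%s/rhcos-4.4.3-x86_64-metal.x86_64.raw.gz ' "$base_url"
    printf 'coreos.inst.ignition_url=%s/%s\n' "$base_url" "$ign"
}

rhcos_kargs 192.168.2.21 bootstrap.ocp4.example.com bootstrap.ign
```

The same function covers the master and worker by changing the three arguments, for example rhcos_kargs 192.168.2.22 master0.ocp4.example.com master.ign.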

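The NIC name and disk name follow the same naming rules as on the base node, so look them up there. Before booting, test on the base node with wget that the two HTTP files can actually be downloaded.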
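
A quick way to run that download test for every file at once is sketched below. It uses curl rather than wget (both ship with CentOS), check_install_urls is a hypothetical helper, and the file list assumes the names used in this article.

```shell
# Sketch: confirm that the files the installer will fetch are reachable over HTTP.
# check_install_urls is a hypothetical helper; BASE_URL matches the boot parameters above.
BASE_URL=http://192.168.2.20/install

check_install_urls() {
    rc=0
    for f in rhcos-4.4.3-x86_64-metal.x86_64.raw.gz bootstrap.ign master.ign worker.ign; do
        # -f: fail on HTTP errors, -s: silent, -I: request headers only (no download)
        if curl -fsI "${BASE_URL}/${f}" >/dev/null; then
            echo "OK      ${f}"
        else
            echo "MISSING ${f}"
            rc=1
        fi
    done
    return $rc
}
```

Run check_install_urls on the base node; any MISSING line usually means a wrong file name or a permission problem under /opt/install.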

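Check everything carefully. If something is wrong, the installer drops into a shell where you can troubleshoot; then reboot and enter the parameters again.

After installation, run ssh core@192.168.2.21 from the base node to log in to the bootstrap node.

Check that the ports are listening: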
 ss -tulnp|grep 6443 
 ss -tulnp|grep 22623

 sudo crictl pods
 There should be 6 or 7 healthy pods.

# Command to follow the service status; it is also printed as a hint when you ssh in
journalctl -b -f -u bootkube.service

Install the master

Same as above; note that the IP, hostname and ignition file differ.

ip=192.168.2.22::192.168.2.1:255.255.255.0:master0.ocp4.example.com:ens192:none nameserver=192.168.2.20 coreos.inst.install_dev=sda coreos.inst.image_url=http://192.168.2.20/install/rhcos-4.4.3-x86_64-metal.x86_64.raw.gz coreos.inst.ignition_url=http://192.168.2.20/install/master.ign

After installation the master's apiserver will have a problem, which is dealt with below.

mkdir ~/.kube
cp /opt/install/auth/kubeconfig ~/.kube/config 

[root@bastion ~]# oc get node
NAME                       STATUS   ROLES           AGE    VERSION
master0.ocp4.example.com   Ready    master,worker   163m   v1.17.1

etcd on the master did not come up, which also left the master's apiserver unhealthy. An etcd setting has to be changed.

[root@bastion ~]# oc get pod -A |grep api
openshift-kube-apiserver                                kube-apiserver-master0.ocp4.example.com                           3/4     CrashLoopBackOff   30         47s

# Patch etcd to allow a single, non-HA member
oc patch etcd cluster -p='{"spec": {"unsupportedConfigOverrides": {"useUnsupportedUnsafeNonHANonProductionUnstableEtcd": true}}}' --type=merge

# The master's etcd is then started
[root@bastion ~]# oc -n openshift-etcd get pod -owide
NAME                                         READY   STATUS      RESTARTS   AGE     IP             NODE                       NOMINATED NODE   READINESS GATES
etcd-master0.ocp4.example.com                3/3     Running     0          10m     192.168.2.22   master0.ocp4.example.com   <none>           <none>

# Watch the apiserver until it recovers; if it never does, delete the pod so that it restarts
[root@bastion ~]# oc -n openshift-kube-apiserver get pod -owide |grep Running
kube-apiserver-master0.ocp4.example.com      4/4     Running     3          2m37s   192.168.2.22   master0.ocp4.example.com   <none>           <none>

The master node is done.

Run the following command on the base node to complete the master installation:

openshift-install --dir=/opt/install wait-for bootstrap-complete --log-level=debug

This command mainly checks whether the master node is working properly. When it finishes, it reports that the bootstrap node can be removed.

Now edit /etc/haproxy/haproxy.cfg to remove the bootstrap node's 6443 and 22623 entries, then restart haproxy. Alternatively, change dnsmasq directly.
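
For the haproxy route, a sketch of the edit: it comments out the two bootstrap server lines rather than deleting them and keeps a backup copy. drop_bootstrap is a hypothetical helper and assumes the "server bootstrap ..." lines look exactly as in the configuration shown earlier.

```shell
# Sketch: comment out the bootstrap backends in haproxy.cfg once bootstrap-complete has succeeded.
# drop_bootstrap is a hypothetical helper; it relies on the "server bootstrap ..." lines shown earlier.
drop_bootstrap() {
    cfg=$1
    cp "$cfg" "$cfg.bak"    # keep a copy of the original file
    # Prefix every "server bootstrap" line with "# ", preserving its indentation.
    sed -i 's/^\([[:space:]]*\)server bootstrap /\1# server bootstrap /' "$cfg"
}

# On the base node:
#   drop_bootstrap /etc/haproxy/haproxy.cfg
#   haproxy -c -f /etc/haproxy/haproxy.cfg && systemctl restart haproxy
```

The haproxy -c -f call only validates the file, so a typo does not take the running load balancer down.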

Because there is only one master node, you can instead edit the DNS server configuration in /etc/dnsmasq.d/ocp4.conf so that api.ocp4.example.com and api-int.ocp4.example.com resolve to the master IP 192.168.2.22, then restart dnsmasq. In that case the haproxy service can be stopped.
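
The dnsmasq route can be scripted the same way. The sketch below rewrites only the api and api-int records and leaves the other lines alone; repoint_api is a hypothetical helper and assumes the address=/name/ip format used in ocp4.conf above.

```shell
# Sketch: point api and api-int at the master in the dnsmasq config.
# repoint_api is a hypothetical helper; it assumes the record format used in ocp4.conf.
# Usage: repoint_api <conf-file> <master-ip>
repoint_api() {
    conf=$1 master_ip=$2
    sed -i \
        -e "s#^address=/api\.ocp4\.example\.com/.*#address=/api.ocp4.example.com/${master_ip}#" \
        -e "s#^address=/api-int\.ocp4\.example\.com/.*#address=/api-int.ocp4.example.com/${master_ip}#" \
        "$conf"
}

# On the base node:
#   repoint_api /etc/dnsmasq.d/ocp4.conf 192.168.2.22
#   systemctl restart dnsmasq
```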

The master's service components are now all installed. The bootstrap node has done its job and can be shut down; it is no longer needed.

Because the worker replica count in /opt/install/install-config.yaml is 0, OCP by default also gives the master node the worker label, as oc get node shows.

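Install the remaining components

Because our master also carries the worker label, it can act as a compute node.

Use the openshift-install command to finish installing the remaining cluster components.

First deal with the etcd-quorum-guard component. By default it runs three replicas on the host network, and we need to reduce it to 1.

# Create the file with the content below. This patch is required; without it, a direct change to the replica count is reverted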
[root@bastion opt]# vi etcd_quorum_guard.yaml

- op: add
  path: /spec/overrides
  value:
  - kind: Deployment
    group: apps/v1
    name: etcd-quorum-guard
    namespace: openshift-machine-config-operator
    unmanaged: true


oc patch clusterversion version --type json -p "$(cat etcd_quorum_guard.yaml)"

oc scale --replicas=1 deployment/etcd-quorum-guard -n openshift-machine-config-operator

# Scale the following services down to 1 replica, otherwise the next step will not get through
oc scale --replicas=1 ingresscontroller/default -n openshift-ingress-operator
oc scale --replicas=1 deployment.apps/console -n openshift-console
oc scale --replicas=1 deployment.apps/downloads -n openshift-console
oc scale --replicas=1 deployment.apps/oauth-openshift -n openshift-authentication
oc scale --replicas=1 deployment.apps/packageserver -n openshift-operator-lifecycle-manager

oc scale --replicas=1 deployment.apps/prometheus-adapter -n openshift-monitoring
oc scale --replicas=1 deployment.apps/thanos-querier -n openshift-monitoring
oc scale --replicas=1 statefulset.apps/prometheus-k8s -n openshift-monitoring
oc scale --replicas=1 statefulset.apps/alertmanager-main -n openshift-monitoring
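
The same scale-down can be written as a table plus a loop, which makes it easier to review or extend. This is a sketch only: scale_to_one and the OC override are hypothetical, and setting OC="echo oc" prints the commands instead of running them.

```shell
# Sketch: the scale-down above expressed as a "resource namespace" table and a loop.
# OC is a hypothetical override: run with OC="echo oc" to print the commands instead of executing them.
OC=${OC:-oc}

scale_to_one() {
    while read -r resource namespace; do
        # </dev/null keeps the command from consuming the rest of the table on stdin
        $OC scale --replicas=1 "$resource" -n "$namespace" </dev/null
    done <<'EOF'
ingresscontroller/default openshift-ingress-operator
deployment.apps/console openshift-console
deployment.apps/downloads openshift-console
deployment.apps/oauth-openshift openshift-authentication
deployment.apps/packageserver openshift-operator-lifecycle-manager
deployment.apps/prometheus-adapter openshift-monitoring
deployment.apps/thanos-querier openshift-monitoring
statefulset.apps/prometheus-k8s openshift-monitoring
statefulset.apps/alertmanager-main openshift-monitoring
EOF
}
```

A dry run with OC="echo oc" followed by scale_to_one shows the nine commands before anything is changed on the cluster.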


openshift-install --dir=/opt/install wait-for install-complete --log-level debug

This mainly checks the platform's web console, monitoring and other components. When it finishes, it prints the login URL and password:
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.ocp4.example.com
INFO Login to the console with user: kubeadmin, password: JzPpM-hVUJn-o2PD7-RKtoe

Check the status of all pods
oc get pod -A

The platform and its components are now deployed. For an all-in-one setup, this is the end.

To add compute nodes, continue with the next step.

Install the worker

Same as above; note that the IP, hostname and ignition file differ.

ip=192.168.2.23::192.168.2.1:255.255.255.0:worker0.ocp4.example.com:ens192:none nameserver=192.168.2.20 coreos.inst.install_dev=sda coreos.inst.image_url=http://192.168.2.20/install/rhcos-4.4.3-x86_64-metal.x86_64.raw.gz coreos.inst.ignition_url=http://192.168.2.20/install/worker.ign

Once the worker's console shows that the installation has finished, run oc get csr on the deployment machine to list the node's join requests and approve them. The node then shows up. Done!

[root@bastion opt]# oc get csr
NAME        AGE   REQUESTOR                                                                   CONDITION
csr-65jnf   10m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending
csr-gqslr   25m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending

[root@bastion opt]# yum install epel-release -y
[root@bastion opt]# yum install jq -y
[root@bastion opt]# oc get csr -ojson | jq -r '.items[] | select(.status == {} ) | .metadata.name' | xargs oc adm certificate approve
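
If jq is not available, the approval can be sketched with awk instead. A new node normally raises a second request (for its serving certificate) after the first one has been approved, so the sketch repeats a few times. approve_pending_csrs and CSR_WAIT are hypothetical names, and the parsing assumes CONDITION is the last column, as in the output above. Approving everything that is pending is only reasonable in a lab like this one.

```shell
# Sketch: approve every Pending CSR, in several rounds, without jq.
# approve_pending_csrs and CSR_WAIT are hypothetical; CONDITION is assumed to be the last column.
# Usage: approve_pending_csrs [rounds]
approve_pending_csrs() {
    rounds=${1:-5}
    i=0
    while [ "$i" -lt "$rounds" ]; do
        pending=$(oc get csr --no-headers | awk '$NF == "Pending" {print $1}')
        if [ -n "$pending" ]; then
            # Word splitting on $pending is intended: one argument per CSR name.
            oc adm certificate approve $pending
        fi
        i=$((i + 1))
        sleep "${CSR_WAIT:-30}"   # give the node time to submit its next request
    done
}
```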

A newly added node is NotReady at first and becomes Ready after a short while.
[root@bastion opt]# oc get node
NAME                       STATUS   ROLES           AGE     VERSION
master0.ocp4.example.com   Ready    master,worker   5h25m   v1.17.1
worker0.ocp4.example.com   Ready    worker          3m4s    v1.17.1

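The whole cluster is now deployed. To add more nodes, repeat this step.

Web console login

The OCP 4 web console is exposed through the router, so first look up its route hostname.
Then add an entry to the hosts file on your own computer that points this hostname at the IP of the node running the router. After that the OpenShift web console is reachable.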

[root@bastion opt]# oc get route -A |grep console-openshift
openshift-console          console             console-openshift-console.apps.ocp4.example.com                       console             https   reencrypt/Redirect     None
[root@bastion opt]# oc get pod -A -owide|grep router
openshift-ingress                                       router-default-679488d97-pt5xh                                    1/1     Running     0          21m     192.168.2.22   master0.ocp4.example.com   <none>           <none>

Add this line to hosts:
192.168.2.22 oauth-openshift.apps.ocp4.example.com console-openshift-console.apps.ocp4.example.com

Then open the console in a browser:
https://console-openshift-console.apps.ocp4.example.com

The username is kubeadmin
The password is in this file:
cat /opt/install/auth/kubeadmin-password

Something to watch later: if a worker is restarted, the router may move between workers. As in OCP 3, you can give one node an infra label and then change the router's nodeSelector.

oc -n openshift-ingress-operator get ingresscontroller/default -o yaml
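
A sketch of that infra pinning, assuming the commonly used node-role.kubernetes.io/infra label and the nodePlacement field of the IngressController resource. pin_router_to_infra and the OC override are hypothetical; set OC="echo oc" to print the commands without running them.

```shell
# Sketch: pin the default router to nodes that carry an infra label.
# pin_router_to_infra is a hypothetical helper; pass the node that should host the router.
# OC="echo oc" prints the commands instead of executing them.
OC=${OC:-oc}

pin_router_to_infra() {
    node=$1
    # The label needs no value; the selector below matches the key with an empty value.
    $OC label node "$node" node-role.kubernetes.io/infra= --overwrite
    $OC -n openshift-ingress-operator patch ingresscontroller/default --type=merge \
        -p '{"spec":{"nodePlacement":{"nodeSelector":{"matchLabels":{"node-role.kubernetes.io/infra":""}}}}}'
}

# Example:
#   pin_router_to_infra worker0.ocp4.example.com
```

After the patch, the hosts entry for the console has to point at the IP of the labelled node.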

References

Official documentation
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.4/html/pipelines/installing-pipelines
https://www.redhat.com/sysadmin/kubernetes-cluster-laptop

米開朗基楊
https://cloud.tencent.com/developer/article/1638330

You can also follow my GitHub; later updates will be synced there.

https://github.com/cai11745/k8s-ocp-yaml