Force-delete the monitoring namespace
kubectl get namespace monitoring -o json \
| tr -d "\n" | sed "s/\"finalizers\": \[[^]]\+\]/\"finalizers\": []/" \
| kubectl replace --raw /api/v1/namespaces/monitoring/finalize -f -
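This clears the finalizers on a namespace stuck in Terminating, letting the API server delete it. A quick check that it is gone (standard kubectl, not part of the original snippet):
kubectl get namespace monitoring
# expected: Error from server (NotFound): namespaces "monitoring" not found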
Error 1
[root@master ~]# kubectl get nodes
The connection to the server 10.0.0.10:6443 was refused - did you specify the right host or port?
Solution
[root@master ~]# kubectl get nodes
The connection to the server 10.0.0.10:6443 was refused - did you specify the right host or port?
[root@master ~]# kubeadm init
I0320 21:32:41.509087 5017 version.go:252] remote version is much newer: v1.23.5; falling back to: stable-1.19
W0320 21:32:42.543039 5017 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.16
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR NumCPU]: the number of available CPUs 1 is less than the required 2
[ERROR Port-6443]: Port 6443 is in use
[ERROR Port-10259]: Port 10259 is in use
[ERROR Port-10257]: Port 10257 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
[root@master ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
master Ready master 9d v1.19.3
node1 Ready node 9d v1.19.3
node2 Ready node 9d v1.19.3
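Note: the preflight errors above (ports in use, manifest files already present) actually show that the control plane was still installed, so re-running `kubeadm init` was not needed; the API server answered again once its static pod came back up. When 10.0.0.10:6443 refuses connections, a minimal checklist (my own sketch, assuming a kubeadm cluster running on Docker):
systemctl status kubelet          # kubelet must be running to start the control-plane static pods
swapoff -a                        # kubelet will not start with swap enabled
docker ps | grep kube-apiserver   # is the apiserver container running?
ss -lntp | grep 6443              # is anything listening on the apiserver port?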
Prometheus installation and deployment
1. ConfigMap — stores the configuration files
2. RBAC — grants the needed permissions
3. PV/PVC — provides the storage location
4. Deployment — runs Prometheus
5. Service — makes it reachable inside the cluster
6. Ingress — opens it in a browser (the apply order is sketched right below)
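These steps map to the numbered YAML files in this section; a minimal sketch of applying them in order:
kubectl apply -f 01prom-na.yaml
kubectl apply -f 02prom-cm.yml
kubectl apply -f 03prom-rbac.yml
kubectl apply -f 04prom-pv-pvc.yml
kubectl apply -f 05prom-dp.yml
kubectl apply -f 06prom-svc.yml
kubectl apply -f 07prom-ingress.yml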
First, create a prom namespace.
01prom-na.yaml
[root@master ~/k8s_yml/prom/prom]# cat 01prom-na.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: prom
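Apply and verify it:
kubectl apply -f 01prom-na.yaml
kubectl get ns prom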
1. Configure the ConfigMap resource first
02prom-cm.yml
[root@master ~/k8s_yml/prom/prom]# cat 02prom-cm.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: prom
data:
  prometheus.yml: |
    global:                  # global configuration
      scrape_interval: 15s   # how often to scrape targets
      scrape_timeout: 15s    # scrape timeout
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ["alertmanager:9093"]
    rule_files:
      - /etc/prometheus/rules.yml
    scrape_configs:                      # scrape configuration
      - job_name: 'prometheus'           # job name
        static_configs:                  # static config, hard-coded targets
          - targets: ['localhost:9090']  # IP:port to scrape; Prometheus monitors itself
      - job_name: 'coredns'
        static_configs:
          - targets: ['10.2.0.4:9153','10.2.0.5:9153']
      - job_name: 'mysql'
        static_configs:
          - targets: ['mysql-svc.default:9104']
      - job_name: 'nodes'
        kubernetes_sd_configs:              # Kubernetes service discovery
          - role: node                      # discover node objects
        relabel_configs:
          - action: replace
            source_labels: ['__address__']  # source label to rewrite
            regex: '(.*):10250'             # captures e.g. (10.0.0.10):10250
            replacement: '${1}:9100'        # rewritten to 10.0.0.10:9100
            target_label: __address__       # overwrite the original label
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
      - job_name: 'kubelet'
        kubernetes_sd_configs:
          - role: node
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
      - job_name: 'cadvisor'
        kubernetes_sd_configs:
          - role: node
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
            replacement: $1
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.*)
            replacement: /metrics/cadvisor
            target_label: __metrics_path__
      - job_name: 'apiservers-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_label_component]
            regex: apiserver
            action: keep
      - job_name: 'pods'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_name
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
  rules.yml: |
    groups:
      - name: test-node-mem
        rules:
          - alert: NodeMemoryUsage
            expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 20
            for: 2m
            labels:
              team: node
            annotations:
              summary: "{{ $labels.instance }}: High Memory usage detected"
              description: "{{ $labels.instance }}: Memory usage is above 20% (current value: {{ $value }})"
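Since a typo in this file silently breaks scraping, it helps to syntax-check the mounted config with promtool, which ships in the prom/prometheus image; a sketch, runnable once the Deployment below is up:
kubectl apply -f 02prom-cm.yml
POD=$(kubectl -n prom get pod -l app=prometheus -o jsonpath='{.items[0].metadata.name}')
kubectl -n prom exec "$POD" -- promtool check config /etc/prometheus/prometheus.yml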
Create the RBAC objects, because Prometheus needs to read resources across namespaces.
03prom-rbac.yml
apiVersion: v1
kind: ServiceAccount       # used by the Prometheus pod
metadata:
  name: prometheus
  namespace: prom
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole          # cluster-scoped, so it works across namespaces
metadata:
  name: prometheus
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "extensions"
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
      - nodes/metrics
    verbs:
      - get
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: prom
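Whether the binding works can be verified by impersonating the ServiceAccount (standard kubectl):
kubectl apply -f 03prom-rbac.yml
kubectl auth can-i list nodes --as=system:serviceaccount:prom:prometheus
# expected: yes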
Create the PV/PVC
04prom-pv-pvc.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-local
  labels:
    app: prometheus
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 10Gi
  storageClassName: local-storage
  local:
    path: /data/k8s/prometheus
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node2
  persistentVolumeReclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
  namespace: prom
spec:
  selector:
    matchLabels:
      app: prometheus
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: local-storage
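A local PV does not create its backing directory; create it on node2 first, then check that the claim binds (sketch):
ssh node2 "mkdir -p /data/k8s/prometheus"
kubectl apply -f 04prom-pv-pvc.yml
kubectl -n prom get pvc prometheus-data   # STATUS should end up Bound once a consumer exists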
Create the Deployment
05prom-dp.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: prom
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus   # reference the ServiceAccount created for RBAC
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: prometheus-data
        - name: config-volume
          configMap:
            name: prometheus-config
      initContainers:                  # init container: fix data-dir ownership
        - name: fix-permissions
          image: busybox
          volumeMounts:
            - name: data
              mountPath: /prometheus
          command: ["chown", "-R", "nobody:nobody", "/prometheus"]
      containers:
        - name: prometheus
          image: prom/prometheus:v2.24.1
          resources:
            requests:
              cpu: 100m
              memory: 512Mi
            limits:
              cpu: 200m
              memory: 512Mi
          ports:
            - name: http
              containerPort: 9090
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"   # config file
            - "--storage.tsdb.path=/prometheus"                # TSDB data directory
            - "--storage.tsdb.retention.time=24h"              # data retention (default is 15d)
            - "--web.enable-admin-api"                         # enable the admin HTTP API
            - "--web.enable-lifecycle"                         # enable hot reload: POST to localhost:9090/-/reload takes effect immediately
          volumeMounts:
            - name: config-volume
              mountPath: "/etc/prometheus"
            - name: data
              mountPath: "/prometheus"
Configure the Service resource
06prom-svc.yml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: prom
  labels:
    app: prometheus
spec:
  selector:
    app: prometheus
  ports:
    - name: web
      port: 9090
      targetPort: http
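The Service targets the container port by its name (http). Before the Ingress is set up, the UI can be reached with a port-forward (standard kubectl):
kubectl apply -f 06prom-svc.yml
kubectl -n prom port-forward svc/prometheus 9090:9090
# then browse http://127.0.0.1:9090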
Configure the Ingress resource
07prom-ingress.yml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus
  namespace: prom
  labels:
    app: prometheus
spec:
  rules:
    - host: prom.k8s.com
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific
            backend:
              service:
                name: prometheus
                port:
                  number: 9090
Note: an ingress controller must already be installed.
Add a hosts entry on your workstation (a sketch follows below).
Apply the YAML files in order.
Remember to create the data directory on the nodes beforehand — the local PV above points at /data/k8s/prometheus and is pinned to node2.
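A sketch of the workstation hosts entry; 10.0.0.11 is a placeholder for whichever node the ingress controller listens on — substitute your own:
# /etc/hosts (Windows: C:\Windows\System32\drivers\etc\hosts)
10.0.0.11  prom.k8s.com grafana.k8s.com   # placeholder ingress node IP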
Add a coredns scrape job:
1. Check coredns's metrics port:
kubectl -n kube-system describe cm coredns
2. Get the coredns pod IPs:
kubectl -n kube-system get pod -o wide
3. Hit a coredns pod's metrics endpoint:
curl 10.2.0.2:9153/metrics
4. Edit the Prometheus config file and add a static config:
- job_name: 'coredns'
  static_configs:
    - targets: ['10.2.0.2:9153', '10.2.1.61:9153']
5. Update the ConfigMap:
kubectl apply -f 02prom-cm.yml
6. Hot-reload Prometheus:
kubectl -n prom get pod -o wide
curl -X POST "http://10.2.4.74:9090/-/reload"
7. Check on the web UI whether the coredns data shows up:
Status --> Targets --> coredns
Create the node-exporter
cat >prom-node-exporter.yaml<<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: prom
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      nodeSelector:
        kubernetes.io/os: linux
      containers:
        - name: node-exporter
          image: prom/node-exporter:v1.1.1
          args:
            - --web.listen-address=$(HOSTIP):9100
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
            - --path.rootfs=/host/root
            - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)   # mount points to exclude from disk metrics
            - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$   # filesystem types to exclude
          ports:
            - containerPort: 9100
          env:
            - name: HOSTIP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
          resources:
            requests:
              cpu: 150m
              memory: 180Mi
            limits:
              cpu: 150m
              memory: 180Mi
          securityContext:
            runAsNonRoot: true
            runAsUser: 65534
          volumeMounts:
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: root
              mountPath: /host/root
              mountPropagation: HostToContainer
              readOnly: true
      tolerations:
        - operator: "Exists"
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: root
          hostPath:
            path: /
EOF
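Because the DaemonSet runs with hostNetwork, every node serves metrics on port 9100 directly; a quick check (any node IP works, e.g. the master 10.0.0.10 from this cluster):
kubectl apply -f prom-node-exporter.yaml
kubectl -n prom get ds node-exporter
curl -s http://10.0.0.10:9100/metrics | head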
Create Grafana
cat >grafana.yml<<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: prom
spec:
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      volumes:
        - name: storage
          hostPath:
            path: /data/k8s/grafana/
      nodeSelector:
        kubernetes.io/hostname: node2
      securityContext:
        runAsUser: 0
      containers:
        - name: grafana
          image: grafana/grafana:7.4.3
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 3000
              name: grafana
          env:
            - name: GF_SECURITY_ADMIN_USER
              value: admin
            - name: GF_SECURITY_ADMIN_PASSWORD
              value: admin
          readinessProbe:
            failureThreshold: 10
            httpGet:
              path: /api/health
              port: 3000
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 30
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /api/health
              port: 3000
              scheme: HTTP
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 3
          resources:
            limits:
              cpu: 150m
              memory: 512Mi
            requests:
              cpu: 150m
              memory: 512Mi
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: storage
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: prom
spec:
  ports:
    - port: 3000
  selector:
    app: grafana
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: prom
  labels:
    app: grafana
spec:
  rules:
    - host: grafana.k8s.com
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific
            backend:
              service:
                name: grafana
                port:
                  number: 3000
EOF
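As with the Prometheus PV, the hostPath directory must exist on node2 first. After applying, log in at http://grafana.k8s.com with admin/admin and add a Prometheus data source; since Grafana runs in the same namespace, the URL http://prometheus:9090 resolves via the Service name. A sketch:
ssh node2 "mkdir -p /data/k8s/grafana"
kubectl apply -f grafana.yml
kubectl -n prom get pod -l app=grafana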
Create prom.sh (a hot-reload helper)
[root@master ~/k8s_yml/prom/prom]# cat prom.sh
#! /bin/bash
curl -X POST "http://10.1.9.237:9090/-/reload"
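The hard-coded 10.1.9.237 is a pod IP and changes whenever the pod is rescheduled. A more stable variant (my own assumption, not from the original) targets the Service DNS name instead; it must be run from inside the cluster, or through a port-forward:
#!/bin/bash
# reload Prometheus through the Service so the script survives pod rescheduling
curl -X POST "http://prometheus.prom.svc.cluster.local:9090/-/reload"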