Ceph Deployment
There are already plenty of write-ups on deploying Ceph, most of them based on the ceph-deploy tool, so I will not repeat that here. I will just record the pitfalls I ran into during my own deployment.
**1. **Deployment failure caused by the SSH port not being 22
Edit the local .ssh/config file and add the hosts the Ceph cluster will be installed on, like this:
host ceph-mon
    Hostname 192.168.68.55
host ceph-osd1
    Hostname 192.168.68.53
host ceph-osd2
    Hostname 192.168.68.54
host ceph-mon ceph-osd1 ceph-osd2
    User root
    Port 8822
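Before running ceph-deploy it is worth confirming that the aliases resolve and that the non-standard port is actually used (a quick manual check against the hosts defined above):
$ ssh ceph-mon hostname    # should log in as root over port 8822
$ ssh ceph-osd1 hostname
$ ssh ceph-osd2 hostname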
**2. **Install the ceph-release repository package
$ yum install -y centos-release-ceph-jewel
**3. **Deploy a specific Ceph release with ceph-deploy; I am using jewel here
$ ceph-deploy install --release jewel ceph-mon ceph-osd1 ceph-osd2
**4. **Switch to the Aliyun Ceph mirror (much faster from inside China)
export CEPH_DEPLOY_REPO_URL=http://mirrors.aliyun.com/ceph/rpm-jewel/el7/
export CEPH_DEPLOY_GPG_URL=https://mirrors.aliyun.com/ceph/keys/release.asc
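ceph-deploy picks these two variables up from the environment when it installs packages on the remote nodes, so export them in the same shell and then re-run the install command from step 3:
$ ceph-deploy install --release jewel ceph-mon ceph-osd1 ceph-osd2    # run in the shell where the variables were exported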
**5. **Configuration file
[global]
fsid = 6013f65b-1178-4d8b-b3b5-dfb52ce811a7
mon_initial_members = l-192168068053-mitaka
mon_host = 192.168.68.53
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 192.168.68.0/24
cluster network = 192.168.68.0/24
rbd_default_features = 3
A note on the rbd_default_features option: I ran into a problem here. Not knowing which image features the kernel supported, mapping an RBD image failed with an error about image features. After checking, the local kernel only supports layering, so setting the value to 3 is enough (the value is a bitmask of the feature bits listed below: 3 = layering (1) + striping (2); a quick way to check and fix an existing image follows the feature list).
RBD image features
- layering: supports layering (cloning)
- striping: supports striping v2
- exclusive-lock: supports exclusive locks
- object-map: supports object maps (depends on exclusive-lock)
- fast-diff: fast diff calculation (depends on object-map)
- deep-flatten: supports flattening of snapshots
- journaling: supports journaling of I/O operations (depends on exclusive-lock)
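If an image was already created with features the local kernel cannot handle, you can inspect and strip them before mapping. A minimal sketch, using the kube/test image that appears later in the Kubernetes section as an example, run with admin credentials:
$ rbd info kube/test                                                          # check the "features:" line
$ rbd feature disable kube/test exclusive-lock object-map fast-diff deep-flatten
$ rbd map kube/test                                                           # should now map on a layering-only kernel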
Using Ceph
**1. **Create storage pools and set quotas
Creating a pool
ceph osd pool create {pool-name} {pg-num} [{pgp-num}] [replicated] \
[crush-ruleset-name] [expected-num-objects]
PG_NUM
This value is mandatory because it cannot be calculated automatically. A few commonly used values:
- Fewer than 5 OSDs: set pg_num to 128
- 5 to 10 OSDs: set pg_num to 512
- 10 to 50 OSDs: set pg_num to 4096
- More than 50 OSDs: you need to understand the trade-offs and work out pg_num yourself
- When calculating pg_num yourself, the pgcalc tool can help (a small worked example follows below)
As the number of OSDs grows, choosing pg_num correctly matters more and more, because it significantly affects cluster behaviour as well as data durability when things go wrong (i.e. the probability that a catastrophic event causes data loss).
replicated
Selects a replicated pool; the replica count (size) defaults to 3.
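As a rough illustration of the pgcalc approach (my own worked example, not part of the original walkthrough): with 3 OSDs, a replica size of 2 and a single pool holding all the data, the suggested total is (100 × 3) / 2 = 150 PGs, rounded up to the next power of two, i.e. 256; when the data is spread across several pools, smaller per-pool values such as the 128 used below are reasonable.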
My settings
$ ceph osd pool create instances 128
$ ceph osd pool set instances size 2
$ ceph osd pool create volumes 128
$ ceph osd pool set volumes size 2
$ ceph osd pool create kube 128
$ ceph osd pool set kube size 2
Quotas
ceph osd pool set-quota {pool-name} [max_objects {obj-count}] [max_bytes {bytes}]
My settings
ceph osd pool set-quota instances max_objects 1000000
ceph osd pool set-quota volumes max_objects 5000000
ceph osd pool set-quota kube max_objects 5000000
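To confirm that the pools, replica sizes and quotas ended up as intended, a few read-only checks can be run (standard ceph commands, shown here for the instances pool):
$ ceph osd lspools
$ ceph osd pool get instances size         # replica count, should be 2
$ ceph osd pool get instances pg_num
$ ceph osd pool get-quota instances        # shows max objects / max bytes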
**2. **Create users
$ ceph auth get-or-create client.libvirt mon 'allow r' osd 'allow class-read class-write object_prefix rbd_children, allow rwx pool=instances, allow rwx pool=volumes' -o ceph.client.libvirt.keyring
$ ceph auth get-or-create client.kube mon 'allow r' osd 'allow class-read class-write object_prefix rbd_children, allow rwx pool=kube' -o ceph.client.kube.keyring
Here the libvirt account only has full access to the instances and volumes pools, and the kube account only has full access to the kube pool.
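You can double-check the capabilities that were granted, and the generated keyrings then have to be copied to the client machines (the /etc/ceph paths below are the conventional locations; the target hosts are placeholders):
$ ceph auth get client.libvirt             # prints the key and caps
$ ceph auth get client.kube
$ scp ceph.client.libvirt.keyring <hypervisor-host>:/etc/ceph/
$ scp ceph.client.kube.keyring <kubelet-node>:/etc/ceph/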
Ceph in Libvirt
**1. **Generate the secret definition
$ cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
  <usage type='ceph'>
    <name>client.libvirt secret</name>
  </usage>
</secret>
EOF
**2. **Import the secret
$ virsh secret-define --file secret.xml
<uuid of secret is output here>
**3. **Set the secret's value using its UUID
$ virsh secret-set-value --secret {uuid of secret} --base64 $(ceph auth get-key client.libvirt)
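A quick way to confirm the secret is in place before wiring it into a domain (read-only virsh queries):
$ virsh secret-list                            # shows the UUID and usage name
$ virsh secret-get-value {uuid of secret}      # should print the base64-encoded key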
**4. **Edit the virtual machine's XML definition
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol='rbd' name='instances/kube_ceph'>
    <host name='192.168.68.53' port='6789'/>
    <host name='192.168.68.54' port='6789'/>
    <host name='192.168.68.57' port='6789'/>
  </source>
  <auth username='libvirt'>
    <secret type='ceph' uuid='3735f424-7724-4489-9ee5-d78066ad7fa1'/>
  </auth>
  <target dev='vda' bus='virtio'/>
</disk>
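Instead of editing the whole domain definition, the same fragment can be attached to an existing guest with virsh (a sketch; kube-vm and rbd-disk.xml are placeholder names for the domain and the file holding the snippet above):
$ virsh attach-device kube-vm rbd-disk.xml --persistent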
**5. **To use Ceph as a storage pool in virt-manager, you need to define a libvirt pool
<pool type="rbd">
  <name>ceph-volumes</name>
  <source>
    <name>volumes</name>
    <host name='192.168.68.53' port='6789'/>
    <host name='192.168.68.54' port='6789'/>
    <host name='192.168.68.57' port='6789'/>
    <auth username='libvirt' type='ceph'>
      <secret type='ceph' uuid='3735f424-7724-4489-9ee5-d78066ad7fa1'/>
    </auth>
  </source>
</pool>
With that in place you can create and delete RBD volumes directly from the host's storage pool.
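To register and exercise the pool from the command line (a sketch, assuming the XML above was saved as ceph-pool.xml):
$ virsh pool-define ceph-pool.xml
$ virsh pool-start ceph-volumes
$ virsh pool-autostart ceph-volumes
$ virsh vol-create-as ceph-volumes test-vol 10G     # creates an RBD image named test-vol in the volumes pool
$ virsh vol-list ceph-volumes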
**6. **Virtual machine images
Because Ceph images support layering, creating a VM image only requires cloning a snapshot of the base image. The steps are as follows:
Import the base image
$ rbd import --image-format 2 Debian-Wheezy-7.11.raw instances/Debian-Wheezy-7.11.raw --name client.libvirt
The raw data format is really the only sensible format option to use with RBD. Technically you could use other QEMU-supported formats (such as qcow2 or vmdk), but doing so adds overhead and makes the volume unsafe for live migration of the VM when caching is enabled.
Snapshot the base image
Create a snapshot of the imported image and protect it so it can be cloned (the snap create command is the prerequisite implied by the original workflow):
$ rbd snap create instances/Debian-Wheezy-7.11.raw@snapshot_backingfile --name client.libvirt
$ rbd snap protect instances/Debian-Wheezy-7.11.raw@snapshot_backingfile --name client.libvirt
Clone the image
$ rbd clone instances/Debian-Wheezy-7.11.raw@snapshot_backingfile instances/kube_ceph --name client.libvirt
Once the clone has finished, the virtual machine can use the instances/kube_ceph block device.
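The parent/child relationship can be verified from the client side with the same credentials (read-only checks):
$ rbd info instances/kube_ceph --name client.libvirt       # the "parent:" field points at the protected snapshot
$ rbd children instances/Debian-Wheezy-7.11.raw@snapshot_backingfile --name client.libvirt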
Ceph in Kubernetes
Whichever mode you use, every kubelet node needs the ceph-common package installed.
**1. **Basic mode
The rbd example files provided by the Kubernetes project are enough for a simple functional test.
- Ceph authentication
Base64-encode the key from the Ceph keyring:
grep key /etc/ceph/ceph.client.kube.keyring |awk '{printf "%s", $NF}'|base64
QVFBclU4dFlGUjhLTXhBQXRGcnFiUXN2cm1hUUU1N1ZpUmpmcUE9PQ==
secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
type: "kubernetes.io/rbd"
data:
  key: QVFBclU4dFlGUjhLTXhBQXRGcnFiUXN2cm1hUUU1N1ZpUmpmcUE9PQ==
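Create the secret and make sure it exists before referencing it from a pod:
$ kubectl create -f secret.yaml
$ kubectl get secret ceph-secret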
- Create a pod
apiVersion: v1
kind: Pod
metadata:
  name: rbd
spec:
  containers:
    - name: rbd-rw
      image: nginx:1.11.10
      volumeMounts:
        - mountPath: "/mnt/rbd"
          name: rbdpd
  nodeSelector:
    ceph: up    # for easier testing, schedule onto the kube nodes labelled ceph=up
  volumes:
    - name: rbdpd
      rbd:
        monitors:
          - 192.168.68.57:6789
        pool: kube
        image: test
        user: kube
        secretRef:
          name: ceph-secret
        fsType: ext4
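The rbd volume plugin does not create the image for you, so kube/test has to exist before the pod starts. A minimal sketch using the kube credentials created earlier (the 10 GB size and the rbd-pod.yaml file name are my own placeholders):
$ rbd create kube/test --size 10240 --image-format 2 --image-feature layering --name client.kube
$ kubectl create -f rbd-pod.yaml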
**2. **PersistentVolume / PersistentVolumeClaim
- PersistentVolume
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ceph-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  rbd:
    monitors:
      - 192.168.68.53:6789
      - 192.168.68.54:6789
      - 192.168.68.57:6789
    pool: kube
    image: magine_test
    user: kube
    secretRef:
      name: ceph-secret
    fsType: ext4
    readOnly: false
  persistentVolumeReclaimPolicy: Recycle
- PersistentVolumeClaim
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ceph-claim
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
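As in the basic mode, the backing image (magine_test here) must already exist in the kube pool; then apply both manifests (ceph-pv.yaml and ceph-pvc.yaml are placeholder names for the two files above):
$ rbd create kube/magine_test --size 10240 --image-format 2 --image-feature layering --name client.kube
$ kubectl create -f ceph-pv.yaml -f ceph-pvc.yaml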
- Status
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESSMODES AGE
ceph-claim Bound ceph-pv1 10Gi RWX 4h
$ kubectl get pv
NAME CAPACITY ACCESSMODES RECLAIMPOLICY STATUS CLAIM REASON AGE
ceph-pv 10Gi RWX Recycle Bound default/ceph-claim 5h
Here you can see that the PVC has been bound successfully.
- Pod
apiVersion: v1
kind: Pod
metadata:
  name: ceph-pod
spec:
  containers:
    - name: ceph-pod
      image: nginx:1.11.10
      volumeMounts:
        - mountPath: "/mnt/rbd"
          name: ceph-pv-test
  nodeSelector:
    ceph: up
  volumes:
    - name: ceph-pv-test
      persistentVolumeClaim:
        claimName: ceph-claim
In my test, because accessModes is set to ReadWriteMany, a single RBD image can be mounted by multiple Pods.
So far I have not found an extension API in Kubernetes' Ceph integration that creates RBD images on behalf of Pods, so the images always have to be created on the Ceph side first.
A somewhat clumsy workaround is to use an initContainer that starts before the Pod's main containers and runs the commands to create an RBD image; I will describe that approach in a future update of this document.