By default, all of our OSDs have the device class hdd:
# ceph osd crush class ls
[
"hdd"
]
Check the current OSD layout:
# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-8 0 root cache
-7 0 host 192.168.3.9-cache
-1 0.37994 root default
-2 0 host 192.168.3.9
-5 0.37994 host kolla-cloud
0 hdd 0.10999 osd.0 up 1.00000 1.00000
1 hdd 0.10999 osd.1 up 1.00000 1.00000
2 hdd 0.10999 osd.2 up 1.00000 1.00000
3 hdd 0.04999 osd.3 up 1.00000 1.00000
Remove osd.3 from the hdd class:
# ceph osd crush rm-device-class osd.3
done removing class of osd(s): 3
Add osd.3 to the ssd class:
# ceph osd crush set-device-class ssd osd.3
set osd(s) 3 to class 'ssd'
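To double-check the class membership, the OSDs belonging to each class can be listed directly (this subcommand assumes a Luminous or newer release); the first command should now list only osd.3:
# ceph osd crush class ls-osd ssd
# ceph osd crush class ls-osd hdd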
After that, check the OSD layout again:
# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-8 0 root cache
-7 0 host 192.168.3.9-cache
-1 0.37994 root default
-2 0 host 192.168.3.9
-5 0.37994 host kolla-cloud
0 hdd 0.10999 osd.0 up 1.00000 1.00000
1 hdd 0.10999 osd.1 up 1.00000 1.00000
2 hdd 0.10999 osd.2 up 1.00000 1.00000
3 ssd 0.04999 osd.3 up 1.00000 1.00000
We can see that the class of osd.3 has changed to ssd.
Listing the CRUSH classes again, a new class named ssd now appears:
# ceph osd crush class ls
[
"hdd",
"ssd"
]
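Internally, CRUSH maintains a per-class "shadow" hierarchy for each device class. If you are curious, it can be inspected as follows (the --show-shadow flag may not exist on older releases; on some versions the shadow trees are printed by default):
# ceph osd crush tree --show-shadow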
Create a CRUSH rule named ssd_rule that only uses OSDs of class ssd:
# ceph osd crush rule create-replicated ssd_rule default host ssd
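The new rule can also be dumped in JSON form to confirm that it takes device class ssd (output omitted here):
# ceph osd crush rule dump ssd_rule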
List the cluster's rules:
# ceph osd crush rule ls
replicated_rule
disks
ssd_rule
View the detailed crushmap as follows:
# ceph osd getcrushmap -o crushmap
26
# crushtool -d crushmap -o crushmap.txt
# cat crushmap.txt
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class ssd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
host 192.168.3.9 {
id -2 # do not change unnecessarily
id -3 class hdd # do not change unnecessarily
id -13 class ssd # do not change unnecessarily
# weight 0.000
alg straw2
hash 0 # rjenkins1
}
host kolla-cloud {
id -5 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
id -14 class ssd # do not change unnecessarily
# weight 0.380
alg straw2
hash 0 # rjenkins1
item osd.2 weight 0.110
item osd.1 weight 0.110
item osd.0 weight 0.110
item osd.3 weight 0.050
}
root default {
id -1 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
id -15 class ssd # do not change unnecessarily
# weight 0.380
alg straw2
hash 0 # rjenkins1
item 192.168.3.9 weight 0.000
item kolla-cloud weight 0.380
}
host 192.168.3.9-cache {
id -7 # do not change unnecessarily
id -9 class hdd # do not change unnecessarily
id -11 class ssd # do not change unnecessarily
# weight 0.000
alg straw2
hash 0 # rjenkins1
}
root cache {
id -8 # do not change unnecessarily
id -10 class hdd # do not change unnecessarily
id -12 class ssd # do not change unnecessarily
# weight 0.000
alg straw2
hash 0 # rjenkins1
item 192.168.3.9-cache weight 0.000
}
# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule disks {
id 1
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule ssd_rule {
id 2
type replicated
min_size 1
max_size 10
step take default class ssd
step chooseleaf firstn 0 type host
step emit
}
# end crush map
Edit crushmap.txt and change "step take default" in the disks rule to "step take default class hdd":
rule disks {
id 1
type replicated
min_size 1
max_size 10
step take default class hdd
step chooseleaf firstn 0 type host
step emit
}
Recompile the crushmap and inject it back into the cluster:
# crushtool -c crushmap.txt -o crushmap.new
# ceph osd setcrushmap -i crushmap.new
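Optionally, before injecting the new map, crushtool's test mode can be used to check that the edited rules still produce mappings; a minimal sketch, where rule ids 1 and 2 are taken from the decompiled map above and the replica counts are only examples:
# crushtool -i crushmap.new --test --rule 1 --num-rep 3 --show-mappings
# crushtool -i crushmap.new --test --rule 2 --num-rep 1 --show-mappings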
Create a storage pool that uses the ssd_rule rule:
# ceph osd pool create cache 64 64 ssd_rule
pool 'cache' created
Querying the cache pool shows that its crush_rule is ssd_rule (rule id 2):
# ceph osd pool get cache crush_rule
crush_rule: ssd_rule
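To confirm that the pool's PGs really land on the ssd device, the PGs of the pool can be listed; with only one ssd OSD, every acting set should contain osd.3 (the exact columns vary by release):
# ceph pg ls-by-pool cache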
Dump the pools to check which rule each one uses; the new cache pool uses crush_rule 2 (ssd_rule), while the existing pools use crush_rule 1:
# ceph osd dump | grep -i size
pool 1 'images' replicated size 1 min_size 1 crush_rule 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 80 lfor 0/71 flags hashpspool stripe_width 0 application rbd
pool 2 'volumes' replicated size 1 min_size 1 crush_rule 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 89 lfor 0/73 flags hashpspool stripe_width 0 application rbd
pool 3 'backups' replicated size 1 min_size 1 crush_rule 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 84 lfor 0/75 flags hashpspool stripe_width 0 application rbd
pool 4 'vms' replicated size 1 min_size 1 crush_rule 1 object_hash rjenkins pg_num 64 pgp_num 64 last_change 86 lfor 0/77 flags hashpspool stripe_width 0 application rbd
pool 5 'cache' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 108 flags hashpspool stripe_width 0
The cache pool has already been created in section 1.3 above (the pool named cache, using ssd_rule); refer back to that if needed.
Now create the backend storage pool:
# ceph osd pool create volumes2 64 64
Bind the cache pool created above to the front end of the storage pool; volumes2 is our backend storage pool:
# ceph osd tier add volumes2 cache
pool 'cache' is now (or already was) a tier of 'volumes2'
Set the cache mode to writeback:
# ceph osd tier cache-mode cache writeback
set cache-mode for pool 'cache' to writeback
Direct all client requests from the standard pool to the cache pool:
# ceph osd tier set-overlay volumes2 cache
overlay for 'volumes2' is now (or already was) 'cache'
Now, inspecting both the backend pool and the cache pool shows the cache tiering configuration:
# ceph osd dump |egrep 'volumes2|cache'
pool 5 'cache' replicated size 1 min_size 1 crush_rule 2 object_hash rjenkins pg_num 64 pgp_num 64 last_change 125 lfor 125/125 flags hashpspool,incomplete_clones tier_of 6 cache_mode writeback stripe_width 0
pool 6 'volumes2' replicated size 1 min_size 1 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 125 lfor 125/125 flags hashpspool tiers 5 read_tier 5 write_tier 5 stripe_width 0
For production deployments, only the bloom filter data structure can currently be used for hit sets (reading the official docs, this appears to be the only filter type supported at the moment):
ceph osd pool set cache hit_set_type bloom
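The hit set has a few related knobs as well; the values below are purely illustrative examples, not tuned recommendations for this cluster:
ceph osd pool set cache hit_set_count 1
ceph osd pool set cache hit_set_period 3600
ceph osd pool set cache min_read_recency_for_promote 1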
Set the thresholds, in bytes or in object count, at which the cache tiering agent starts flushing objects from the cache pool to the backend pool and evicting them:
# Start flushing and evicting when the cache pool holds 1 TB of data
ceph osd pool set cache target_max_bytes 1099511627776
# Start flushing and evicting when the cache pool holds 10,000,000 objects
ceph osd pool set cache target_max_objects 10000000
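Each of these values can be read back with ceph osd pool get, which is a quick way to verify what was actually applied, for example:
ceph osd pool get cache target_max_bytes
ceph osd pool get cache target_max_objects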
Define the minimum age before the cache tier flushes an object to the storage tier or evicts it:
ceph osd pool set cache cache_min_flush_age 600
ceph osd pool set cache cache_min_evict_age 600
Define the ratio of dirty (modified) objects in the cache pool at which the cache tiering agent starts flushing objects from the cache tier to the storage tier:
# Start flushing when dirty objects reach 40% of the cache pool
ceph osd pool set cache cache_target_dirty_ratio 0.4
# Flush more aggressively when dirty objects reach 60%
ceph osd pool set cache cache_target_dirty_high_ratio 0.6
When the cache pool's usage reaches a certain percentage of its total capacity, the cache tiering agent evicts objects to maintain free capacity (once this limit is reached, the cache pool is considered full); at that point unmodified (clean) objects are evicted from the cache:
ceph osd pool set cache cache_target_full_ratio 0.8
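While experimenting, per-pool usage can be watched with the standard reporting commands to see objects accumulate in the cache pool and later drain to the backend pool:
ceph df detail
rados df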
With the cache pool configured, for a quick test we can first lower its minimum flush/evict ages to 60 s:
ceph osd pool set cache cache_min_evict_age 60
ceph osd pool set cache cache_min_flush_age 60
And define the dirty-object ratio as one in a thousand, so that the cache tiering agent starts flushing objects from the cache tier to the storage tier almost immediately:
ceph osd pool set cache cache_target_dirty_ratio 0.001
Then write an object into the storage pool:
rados -p volumes2 put test MySQL-community-client-5.7.31-1.el7.x86_64.rpm
Listing the backend pool now should not show the object yet, while listing the cache pool shows that the object is stored there:
rados -p volumes2 ls |grep test
rados -p cache ls |grep test
After 60 s the object is flushed; it then becomes visible in the backend pool and is evicted from the cache pool.
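Rather than waiting blindly, a simple polling loop can show the object moving between the two pools; a rough sketch (the 10-second interval is arbitrary):
while true; do
    echo "cache: $(rados -p cache ls | grep -c '^test$')  backend: $(rados -p volumes2 ls | grep -c '^test$')"
    sleep 10
done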
Note that the way a cache tier is removed depends on the cache mode.
Since a read-only cache holds no modified data, it can simply be disabled and removed without losing any recent changes to objects in the cache.
Change the cache mode to none to disable caching:
ceph osd tier cache-mode cache none
Remove the cache tier:
# Unbind the cache pool from the backend pool
ceph osd tier remove volumes2 cache
A writeback cache, however, may hold modified data, so steps must be taken to ensure that no recent changes to objects in the cache are lost before it is disabled and removed.
Change the cache mode to forward so that new and modified objects are flushed to the backend storage pool:
ceph osd tier cache-mode cache forward
Check the cache pool to make sure all objects have been flushed (this may take a while):
rados -p cache ls
If objects remain in the cache pool, they can also be flushed manually:
rados -p cache cache-flush-evict-all
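To block until the cache pool has actually drained, the flush can be combined with a small wait loop; a sketch that assumes no other clients are still writing to the pool:
rados -p cache cache-flush-evict-all
while [ "$(rados -p cache ls | wc -l)" -gt 0 ]; do
    sleep 30
done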
Remove the overlay so that clients no longer direct traffic to the cache:
ceph osd tier remove-overlay volumes2
Unbind the storage pool from the cache pool:
ceph osd tier remove volumes2 cache
Finally, if a pool will be consumed by RBD, remember to enable the rbd application on it, for example:
ceph osd pool application enable sata-pool rbd
References:
https://www.cnblogs.com/breezey/p/11080532.html
https://my.oschina.net/hanhanztj/blog/515410