Ceph/Installation
All necessary Ceph software is available through the sys-cluster/ceph package. It contains all services as well as basic administration utilities for managing a Ceph cluster.
Design
Before embarking on a Ceph deployment scenario, take the time to make a basic Ceph cluster design.
What is the purpose of the Ceph cluster? Is it to play around and experiment with Ceph? Is it to host all critical data in the form of RBD devices? Is it to create a highly available file server?
What features are needed on the Ceph cluster? How many monitors are likely to be needed? How much storage will be used, and how will this storage be represented (as in, how many OSDs will be available and where will they run)? Will the cluster provide S3- or Swift-like APIs to the outside world?
What are the IP addresses that will be used by the cluster? Ceph requires a static IP environment, so making a well designed network infrastructure is important for Ceph to function properly.
How will the servers be distributed across the environment? Ceph has a number of buckets that it can use to differentiate servers and make well-thought-through distribution and replication decisions. The default is an OSD on a host in a rack in a row in a room inside a data center.
There are, however, a number of best practices to take into account:
- Most clusters require 3 monitor servers, perhaps 5. Clusters generally do not need more than 5 monitor servers to function in even the harshest environments.
- Distribute the monitor servers across the environment. If the cluster is over a couple of racks, make sure that the monitor servers are distributed across the racks as well.
- There is usually no need for RAID on the file system that an OSD uses. Instead, rely on Ceph's own availability and distribution mechanisms.
- OSD services do not need a lot of CPU or RAM. A metadata server however does benefit from high-speed CPU and lots of memory.
Hardware layout
The hardware layout of this example consists of three machines: host1, host2 and host3. Each has three hard disks: the first drive (/dev/sda) is used for the operating system installation, while the second and third (/dev/sdb and /dev/sdc) are used for the OSD services. A Ceph monitor is deployed on each machine, while the metadata service runs only on host1.
System configuration
The first configuration to decide on is which Ceph version to deploy. At the time of writing, Ceph version 0.87 ("Giant") is available in the tree in ~arch while version 0.80 ("Firefly") is available as stable release. To use the ~arch version, add sys-cluster/ceph to package.accept_keywords:
/etc/portage/package.accept_keywords/ceph
sys-cluster/ceph
Next, validate that the Linux kernel is configured to support Ceph.
Device Drivers --->
    [*] Block devices --->
        <*> Rados block device (RBD)
File systems --->
    [*] Network File Systems --->
        <*> Ceph distributed file system
Ensure that support for extended attributes and POSIX ACL support is enabled in all file systems (such as Ext4, Btrfs, etc.) that will be used to host Ceph.
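If the running kernel exposes its configuration through /proc/config.gz (this requires CONFIG_IKCONFIG_PROC and may not be enabled on every system), a quick check for the relevant options could look like this:
root #
zgrep -E 'CONFIG_BLK_DEV_RBD|CONFIG_CEPH_FS|POSIX_ACL' /proc/config.gz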
Installation
With the system configuration done, install the Ceph software.
The following USE flags are available for fine-tuning the installation.
USE flags for sys-cluster/ceph (Ceph distributed filesystem):
+cephfs : Build support for cephfs, a POSIX compatible filesystem built on top of ceph
+mgr : Build the ceph-mgr daemon
+parquet : Support for s3 select on parquet objects
+radosgw : Add radosgw support
+sqlite : Add support for sqlite - embedded sql database
+ssl : Add support for SSL/TLS connections (Secure Socket Layer / Transport Layer Security)
+system-boost : Use system dev-libs/boost instead of the bundled one
+tcmalloc : Use the dev-util/google-perftools libraries to replace the malloc() implementation with a possibly faster one
+uring : Build with support for sys-libs/liburing
babeltrace : Add support for LTTng babeltrace
custom-cflags : Build with user-specified CFLAGS (unsupported)
diskprediction : Enable local diskprediction module to predict disk failures
dpdk : Enable DPDK messaging
fuse : Build fuse client
grafana : Install grafana dashboards
jaeger : Enable jaegertracing and its dependent libraries
jemalloc : Use dev-libs/jemalloc for memory management
kafka : Rados Gateway's pubsub support for Kafka push endpoint
kerberos : Add kerberos support
ldap : Add LDAP support (Lightweight Directory Access Protocol)
lttng : Add support for LTTng
pmdk : Enable PMDK libraries
rabbitmq : Use rabbitmq-c to build rgw amqp push endpoint
rbd-rwl : Enable librbd persistent write back cache
rbd-ssd : Enable librbd persistent write back cache for SSDs
rdma : Enable RDMA support via sys-cluster/rdma-core
rgw-lua : Rados Gateway's support for dynamically adding lua packages
selinux : !!internal use only!! Security Enhanced Linux support, this must be set by the selinux profile or breakage will occur
spdk : Enable SPDK user-mode storage driver toolkit
systemd : Enable use of systemd-specific libraries and features like socket activation or session tracking
test : Enable dependencies and/or preparations necessary to run tests (usually controlled by FEATURES=test but can be toggled independently)
xfs : Add xfs support
zbd : Enable sys-block/libzbd bluestore backend
With the USE flags defined, install the software:
root #
emerge --ask sys-cluster/ceph
Cluster creation
Use uuidgen to generate a cluster id.
user $
uuidgen
a0ffc974-222e-449a-a078-121bdfcb110b
Create the basic skeleton for the ceph.conf file, and use the generated id for the fsid parameter.
/etc/ceph/ceph.conf
Global section in ceph.conf
[global]
fsid = a0ffc974-222e-449a-a078-121bdfcb110b
cluster = ceph
public network = 192.168.100.0/24
# Enable cephx authentication (which uses shared keys for almost everything)
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
# Replication
osd pool default size = 2
osd pool default min size = 1
In this example, a cluster is used with a replication factor of 2 (which means it is replicated once - there are two instances of each block) and a minimum of 1 (i.e. as long as one copy of the data is available, continue).
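These defaults only apply to newly created pools. Once a pool exists (such as the data pool created later in this guide), its replication factor can be inspected or changed at run time, for example:
root #
ceph osd pool get data size
root #
ceph osd pool set data size 2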
Next create the administrative key. The default administrative key is called client.admin:
root #
ceph-authtool --create-keyring /etc/ceph/ceph.client.admin.keyring --gen-key -n client.admin --set-uid=0 --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow'
root #
grep "key = " /etc/ceph/ceph.client.admin.keyring | awk '{print $3}' > /etc/ceph/ceph.client.admin.secret
Monitors
To create the monitors, first add in the information to the ceph.conf file:
/etc/ceph/ceph.conf
Snippet for monitors
[mon]
# Global settings for monitors
mon host = host1, host2, host3
mon addr = 192.168.100.10:6789, 192.168.100.11:6789, 192.168.100.12:6789
mon initial members = 0, 1, 2

[mon.0]
host = host1
mon addr = 192.168.100.10:6789

[mon.1]
host = host2
mon addr = 192.168.100.11:6789

[mon.2]
host = host3
mon addr = 192.168.100.12:6789
Next create the keyring for the monitor (so that the Ceph monitors can integrate and interact with the Ceph cluster) and add the administrative keyring to it:
root #
ceph-authtool --create-keyring /etc/ceph/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *' --import-keyring /etc/ceph/ceph.client.admin.keyring
Now create the initial monitor map (which is a binary file that the Ceph monitors use to find the default, initial monitor list).
root #
monmaptool --create --fsid a0ffc974-222e-449a-a078-121bdfcb110b --add 0 192.168.100.10 --add 1 192.168.100.11 --add 2 192.168.100.12 /etc/ceph/ceph.initial-monmap
Create the file system that the monitors will use to keep their information in.
root #
mkdir -p /var/log/ceph
root #
mkdir -p /var/lib/ceph/mon
root #
ceph-mon --mkfs -i 0 --monmap /etc/ceph/ceph.initial-monmap --keyring /etc/ceph/ceph.mon.keyring
Repeat this step on each system with the right id (-i 0 becomes -i 1, and so on).
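For example, on host2 (assuming the initial monitor map and the monitor keyring have been copied to the same locations on that host) the command becomes:
root #
ceph-mon --mkfs -i 1 --monmap /etc/ceph/ceph.initial-monmap --keyring /etc/ceph/ceph.mon.keyring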
Finally, create the init script to launch the monitor at boot:
root #
ln -s /etc/init.d/ceph /etc/init.d/ceph-mon.0
root #
rc-update add ceph-mon.0 default
root #
rc-service ceph-mon.0 start
Also repeat this on each system for the right id.
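Once the monitors are running on all hosts, the cluster status can be queried from any host that has the administrative keyring; the monitors should report a quorum:
root #
ceph -s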
Object store devices
Get a UUID for the file system:
root #
uuidgen
e33dcfb0-31d5-4953-896d-007c7c295410
Create a new OSD in the cluster:
root #
ceph osd create e33dcfb0-31d5-4953-896d-007c7c295410
0
Create the mountpoint on which the data of the OSD will be stored. The {id} to use is the number returned by the previous command (0 in this example):
root #
mkdir -p /var/lib/ceph/osd/ceph-{id}
Mount the storage that will be used for the OSD. Then, create the OSD file system on it:
root #
ceph-osd -i {id} --mkfs --mkkey --osd-uuid e33dcfb0-31d5-4953-896d-007c7c295410
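As a worked example for the first OSD on host1 (id 0, backed by /dev/sdb1 as in the hardware layout above, and assuming an ext4 file system as configured further below), these steps could look as follows:
root #
mkfs.ext4 /dev/sdb1
root #
mount -o user_xattr,rw,noatime /dev/sdb1 /var/lib/ceph/osd/ceph-0
root #
ceph-osd -i 0 --mkfs --mkkey --osd-uuid e33dcfb0-31d5-4953-896d-007c7c295410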
Add the OSD keyring to the cluster's authentication database:
root #
ceph auth add osd.{id} osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-{id}/keyring
Add the current host to the CRUSH map if it is the first OSD of this host that participates in the cluster:
root #
ceph osd crush add-bucket $(hostname) host
root #
ceph osd crush move $(hostname) root=default
Add each OSD to the map with a default weight value:
root #
ceph osd crush add osd.{id} 1.0 host=$(hostname)
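After the OSDs have been added, the resulting CRUSH hierarchy can be reviewed to verify that each OSD ended up under the correct host:
root #
ceph osd tree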
Update the OSD information in /etc/ceph/ceph.conf:
/etc/ceph/ceph.conf
OSD snippet
[osd]
# Global defaults for OSDs
osd journal size = 1024
# Needed for Ext4 file systems only
filestore xattr use omap = true
# Choose the file system type: ext4 or xfs
osd mkfs type = ext4
# Mount options for ext4
osd mount options ext4 = user_xattr,rw,noatime
# Mount options for xfs
osd mount options xfs = rw,inode64

[osd.0]
host = host1
devs = /dev/sdb1

[osd.1]
host = host1
devs = /dev/sdc1

[osd.2]
host = host2
devs = /dev/sdb1

[osd.3]
host = host2
devs = /dev/sdc1

[osd.4]
host = host3
devs = /dev/sdb1

[osd.5]
host = host3
devs = /dev/sdc1
Create the init script for the OSD and have it start at boot:
root #
ln -s /etc/init.d/ceph /etc/init.d/ceph-osd.{id}
root #
rc-update add ceph-osd.{id} default
root #
rc-service ceph-osd.{id} start
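For example, for the first OSD (osd.0) this becomes:
root #
ln -s /etc/init.d/ceph /etc/init.d/ceph-osd.0
root #
rc-update add ceph-osd.0 default
root #
rc-service ceph-osd.0 start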
Metadata server
Update the ceph.conf information for the MDS:
/etc/ceph/ceph.conf
MDS snippet
[mds.0]
host = host1
Create two pools - one for data and one for metadata. The number 128 in the example below is the number of placement groups to assign inside the pool. Tune this correctly depending on the size of the cluster (see Ceph's placement groups information).
root #
ceph osd pool create data 128
root #
ceph osd pool create metadata 128
Now create a file system that uses these pools. The name of the file system can be chosen freely - the example uses cephfs:
root #
ceph fs new cephfs metadata data
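The newly created file system should now show up in the list of file systems known to the cluster:
root #
ceph fs ls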
Create the keyring for the MDS service:
root #
mkdir -p /var/lib/ceph/mds/ceph-0
root #
ceph auth get-or-create mds.0 mds 'allow' osd 'allow *' mon 'allow rwx' > /var/lib/ceph/mds/ceph-0/keyring
Create the init script and have it start at boot:
root #
ln -s /etc/init.d/ceph /etc/init.d/ceph-mds.0
root #
rc-update add ceph-mds.0 default
root #
rc-service ceph-mds.0 start
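Once the metadata server is running, its state can be checked; the exact output differs between Ceph releases, but it should eventually report the MDS as active:
root #
ceph mds stat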