In this article by Karan Singh, the author of the book Ceph Cookbook, we will see in detail how storage space, or capacity, is assigned to physical or virtual servers. We'll also cover the various storage formats supported by Ceph.
In this article, we will cover recipes for working with the Ceph block device, resizing, snapshotting, and cloning RBD images, and integrating Ceph with OpenStack glance, cinder, and nova.
Once you have installed and configured your Ceph storage cluster, the next task is storage provisioning. Storage provisioning is the process of assigning storage space or capacity to physical or virtual servers, in the form of block, file, or object storage. A typical computer system or server comes with limited local storage capacity that might not be enough for your data storage needs. Storage solutions such as Ceph provide virtually unlimited storage capacity to these servers, making them capable of storing all your data and making sure that you do not run out of space. Using a dedicated storage system instead of local storage gives you the much needed flexibility in terms of scalability, reliability, and performance.
Ceph can provision storage capacity in a unified way, which includes block, filesystem, and object storage. The following diagram shows the storage formats supported by Ceph; depending on your use case, you can select one or more storage options:
We will discuss each of these options in detail in this article, and we will focus mainly on Ceph block storage.
The RADOS Block Device (RBD), now known as the Ceph Block Device, provides reliable, distributed, and high-performance block storage disks to clients. An RBD image makes use of the librbd library and stores its data in sequential blocks striped over multiple OSDs in a Ceph cluster. RBD is backed by the RADOS layer of Ceph, so every block device is spread over multiple Ceph nodes, delivering high performance and excellent reliability. RBD has native Linux kernel support, and its drivers have been well integrated with the mainline kernel for the past few years. In addition to reliability and performance, RBD also provides enterprise features such as full and incremental snapshots, thin provisioning, copy-on-write cloning, dynamic resizing, and so on. RBD also supports in-memory caching, which drastically improves its performance.
The industry-leading open source hypervisors, such as KVM and Xen, provide full support for RBD and leverage its features for their guest virtual machines. Other proprietary hypervisors, such as VMware and Microsoft Hyper-V, are expected to be supported soon; there has been a lot of work going on in the community to support these hypervisors. The Ceph block device provides full support to cloud platforms such as OpenStack and CloudStack, among others, and has proven successful and feature-rich for these platforms. In OpenStack, you can use the Ceph block device with the cinder (block) and glance (image) components. Doing so, you can spin up thousands of virtual machines (VMs) in very little time, taking advantage of the copy-on-write feature of Ceph block storage.
All these features make RBD an ideal candidate for cloud platforms such as OpenStack and CloudStack. We will now learn how to create a Ceph block device and make use of it.
Any regular Linux host (RHEL- or Debian-based) can act as a Ceph client. The client interacts with the Ceph storage cluster over the network to store or retrieve user data. Ceph RBD support has been included in the Linux mainline kernel since version 2.6.34.
As we have done earlier, we will set up a Ceph client machine using vagrant and VirtualBox, reusing the same Vagrantfile. Vagrant will then launch an Ubuntu 14.04 virtual machine that we will configure as a Ceph client:
$ vagrant status client-node1
$ vagrant up client-node1
$ vagrant ssh client-node1
Note: The username and password that vagrant uses to configure virtual machines is vagrant, and the vagrant user has sudo rights. The default password for the root user is vagrant.
$ lsb_release -a
$ uname -r
Check for RBD support in the kernel:
$ sudo modprobe rbd
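You can optionally confirm that the rbd module has actually loaded:
$ lsmod | grep rbd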
## Log in to the ceph-node1 machine
$ vagrant ssh ceph-node1
$ sudo su -
# ssh-copy-id vagrant@client-node1
Provide the vagrant user's password, vagrant, when prompted for client-node1. Once the ssh keys are copied from ceph-node1 to client-node1, you should be able to log in to client-node1 without a password.
# cd /etc/ceph
# ceph-deploy --username vagrant install client-node1
# ceph-deploy --username vagrant config push client-node1
In our case, we will create a Ceph user, client.rbd, with access to the rbd pool. By default, Ceph block devices are created on the rbd pool:
# ceph auth get-or-create client.rbd mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd'
# ceph auth get-or-create client.rbd | ssh vagrant@client-node1 sudo tee /etc/ceph/ceph.client.rbd.keyring
$ vagrant ssh client-node1
$ sudo su -
# cat /etc/ceph/ceph.client.rbd.keyring >> /etc/ceph/keyring
### Since we are not using the default user client.admin, we need to supply the username that will connect to the Ceph cluster.
# ceph -s --name client.rbd
So far, we have configured the Ceph client; now we will demonstrate creating a Ceph block device from the client-node1 machine.
# rbd create rbd1 --size 10240 --name client.rbd
## The default pool for block device images is 'rbd'; you can also specify the pool name with the rbd command using the -p option:
# rbd ls --name client.rbd
# rbd ls -p rbd --name client.rbd
# rbd list --name client.rbd
# rbd --image rbd1 info --name client.rbd
Now that we have created a block device on the Ceph cluster, in order to use it, we need to map it to the client machine. To do this, execute the following commands from the client-node1 machine.
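The mapping itself is done with the rbd map command; a minimal example using the image and user created earlier would be:
# rbd map --image rbd1 --name client.rbd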
# rbd showmapped --name client.rbd
# fdisk -l /dev/rbd1
# mkfs.xfs /dev/rbd1
# mkdir /mnt/ceph-disk1
# mount /dev/rbd1 /mnt/ceph-disk1
# df -h /mnt/ceph-disk1
# dd if=/dev/zero of=/mnt/ceph-disk1/file1 count=100 bs=1M
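To see how this write shows up on the cluster side, you can also query the pool usage with the restricted client.rbd user (its read capability on the monitor, granted earlier, is sufficient for this):
# ceph df --name client.rbd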
# wget https://raw.githubusercontent.com/ksingh7/ceph-cookbook/master/rbdmap -O /etc/init.d/rbdmap
# chmod +x /etc/init.d/rbdmap
# update-rc.d rbdmap defaults
## Make sure you use the correct keyring value in the /etc/ceph/rbdmap file, which is generally unique for an environment.
# echo "rbd/rbd1 id=rbd,
keyring=AQCLEg5VeAbGARAAE4ULXC7M5Fwd3BGFDiHRTw==" >>
/etc/ceph/rbdmap
# echo "/dev/rbd1 /mnt/ceph-disk1 xfs defaults, _netdev
0 0 " >> /etc/fstab
# mkdir /mnt/ceph-disk1
# /etc/init.d/rbdmap start
Ceph supports thin-provisioned block devices, which means that physical storage space is not occupied until you begin storing data on the block device. The Ceph RADOS block device is very flexible; you can increase or decrease the size of an RBD image on the fly from the Ceph storage end. However, the underlying filesystem should support resizing. Advanced filesystems such as XFS, Btrfs, EXT, and ZFS support filesystem resizing to a certain extent. Please follow the filesystem-specific documentation to learn more about resizing.
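If you want to check how much space a thin-provisioned image actually consumes, one common approach is to sum the extents reported by rbd diff (a quick sketch; the awk expression simply adds up the extent lengths):
# rbd diff rbd/rbd1 --name client.rbd | awk '{ SUM += $2 } END { print SUM/1024/1024 " MB used" }'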
To increase or decrease the Ceph RBD image size, use the --size <New_Size_in_MB> option with the rbd resize command; this will set the new size for the RBD image:
# rbd resize --image rbd1 --size 20480 --name client.rbd
# rbd info --image rbd1 --name client.rbd
# dmesg | grep -i capacity
# xfs_growfs -d /mnt/ceph-disk1
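To confirm that the filesystem now reflects the new size, check the mount point again:
# df -h /mnt/ceph-disk1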
Ceph extends full support to snapshots, which are point-in-time, read-only copies of an RBD image. You can preserve the state of a Ceph RBD image by creating snapshots and restoring the snapshot to get the original data.
Let's see how a snapshot works with Ceph.
# echo "Hello Ceph This is snapshot test" > /mnt/
ceph-disk1/snapshot_test_file
Syntax: rbd snap create <pool-name>/<image-name>@<snap-name>
# rbd snap create rbd/rbd1@snapshot1 --name client.rbd
Syntax: rbd snap ls <pool-name>/<image-name>
# rbd snap ls rbd/rbd1 --name client.rbd
# rm -f /mnt/ceph-disk1/*
Syntax: rbd snap rollback <pool-name>/<image-name>@<snap-name>
# rbd snap rollback rbd/rbd1@snapshot1 --name client.rbd
# umount /mnt/ceph-disk1
# mount /dev/rbd1 /mnt/ceph-disk1
# ls -l /mnt/ceph-disk1
Syntax: rbd snap rm <pool-name>/<image-name>@<snap-name>
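Following this syntax, removing the single snapshot created earlier would look like the following (shown for illustration; the purge command below removes all snapshots of an image in one go):
# rbd snap rm rbd/rbd1@snapshot1 --name client.rbd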
Syntax: rbd snap purge <pool-name>/<image-name>
# rbd snap purge rbd/rbd1 --name client.rbd
Ceph supports a very nice feature for creating Copy-On-Write (COW) clones from RBD snapshots. This is also known as snapshot layering in Ceph. Layering allows clients to create multiple instant clones of a Ceph RBD image. This feature is extremely useful for cloud and virtualization platforms such as OpenStack, CloudStack, and Qemu/KVM. These platforms usually protect a Ceph RBD image containing an OS / VM image in the form of a snapshot. Later, this snapshot is cloned multiple times to spawn new virtual machines or instances. Snapshots are read-only, but COW clones are fully writable; this feature of Ceph provides a greater level of flexibility and is extremely useful for cloud platforms:
Every cloned image (child image) stores a reference to its parent snapshot in order to read image data; hence, the parent snapshot must be protected before it can be used for cloning. When data is written to a COW-cloned image, the new data is stored in the clone itself. COW-cloned images behave just like regular RBD images: they are writable, resizable, and support snapshots and further cloning.
In Ceph, RBD images are of two types: format-1 and format-2. The RBD snapshot feature is available on both types, that is, on format-1 as well as format-2 RBD images. However, the layering feature (the COW cloning feature) is available only for format-2 RBD images. The default RBD image format is format-1.
To demonstrate RBD cloning, we will intentionally create a format-2 RBD image, then create and protect its snapshot, and finally, create COW clones out of it:
# rbd create rbd2 --size 10240 --image-format 2 --name client.rbd
# rbd info --image rbd2 --name client.rbd
# rbd snap create rbd/rbd2@snapshot_for_cloning --name client.rbd
# rbd snap protect rbd/rbd2@snapshot_for_cloning --name client.rbd
Syntax: rbd clone <pool-name>/<parent-image>@<snap-name> <pool-name>/<child-image-name>
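Following this syntax, the clone command for our example (using the child image name that we inspect next) would be:
# rbd clone rbd/rbd2@snapshot_for_cloning rbd/clone_rbd2 --name client.rbd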
# rbd info rbd/clone_rbd2 --name client.rbd
At this point, we have a cloned RBD image, which is dependent upon its parent image snapshot. To make the cloned RBD image independent of its parent, we need to flatten the image, which involves copying the data from the parent snapshot to the child image. The time it takes to complete the flattening process depends on the size of the data present in the parent snapshot. Once the flattening process is completed, there is no dependency between the cloned RBD image and its parent snapshot.
# rbd flatten rbd/clone_rbd2 --name client.rbd
# rbd info --image clone_rbd2 --name client.rbd
After the completion of the flattening process, if you check image information, you will notice that the parent image/snapshot name is not present and the clone is independent.
# rbd snap unprotect rbd/rbd2@snapshot_for_cloning --name client.rbd
# rbd snap rm rbd/rbd2@snapshot_for_cloning --name client.rbd
OpenStack is an open source software platform for building and managing public and private cloud infrastructure. It is governed by an independent, non-profit body known as the OpenStack Foundation. It has the largest and most active community, backed by technology giants such as HP, Red Hat, Dell, Cisco, IBM, Rackspace, and many more. OpenStack's idea for the cloud is that it should be simple to implement and massively scalable.
OpenStack is considered the cloud operating system, where users are allowed to instantly deploy hundreds of virtual machines in an automated way. It also provides an efficient way of hassle-free management of these machines. OpenStack is known for its dynamic scale-up, scale-out, and distributed architecture capabilities, making your cloud environment robust and future-ready. OpenStack provides an enterprise-class Infrastructure-as-a-Service (IaaS) platform for all your cloud needs.
As shown in the preceding diagram, OpenStack is made up of several different software components that work together to provide cloud services. Out of all these components, in this article, we will focus on Cinder and Glance, which provide block storage and image services respectively. For more information on OpenStack components, please visit http://www.openstack.org/.
Over the last few years, OpenStack has become amazingly popular, as it is software-defined across the board, whether for compute, networking, or even storage. And when you talk about storage for OpenStack, Ceph gets all the attention. An OpenStack user survey, conducted in May 2015, showed Ceph dominating the block storage driver market with a whopping 44% production usage. Ceph provides the robust, reliable storage backend that OpenStack was looking for. Its seamless integration with OpenStack components such as cinder, glance, nova, and keystone provides an all-in-one cloud storage backend for OpenStack. Here are some key benefits that make Ceph the best match for OpenStack:
The Ceph and OpenStack communities have been working closely for the last few years to make the integration more seamless and to make use of new features as they land. In the future, we can expect OpenStack and Ceph to be even more closely associated due to Red Hat's acquisition of Inktank, the company behind Ceph; Red Hat is one of the major contributors to the OpenStack project.
OpenStack is a modular system, that is, a system with a dedicated component for each specific set of tasks. Several of these components require a reliable storage backend, such as Ceph, and extend full integration to it, as shown in the following diagram. Each of these components uses Ceph in its own way to store block devices and objects. The majority of cloud deployments based on OpenStack and Ceph use the cinder, glance, and swift integrations with Ceph. Keystone integration is used when you need S3-compatible object storage on the Ceph backend. Nova integration allows boot-from-Ceph-volume capabilities for your OpenStack instances.
The OpenStack setup and configuration is beyond the scope of this article; however, for ease of demonstration, we will use a virtual machine preinstalled with the OpenStack RDO Juno release. If you like, you can also use your own OpenStack environment and can perform Ceph integration.
In this section, we will demonstrate setting up a preconfigured OpenStack environment using vagrant, and accessing it via CLI and GUI:
# cd ceph-cookbook
# vagrant up openstack-node1
$ vagrant status openstack-node1
$ vagrant ssh openstack-node1
$ sudo su -
# source keystonerc_admin
We will now run some native OpenStack commands to make sure that OpenStack is set up correctly. Please note that some of these commands do not show any information, since this is a fresh OpenStack environment and does not have instances or volumes created:
# nova list
# cinder list
# glance image-list
OpenStack nodes should be configured as Ceph clients in order to access the Ceph cluster. To do this, install the Ceph packages on the OpenStack nodes and make sure they can access the Ceph cluster.
In this section, we are going to configure OpenStack as a Ceph client, which will be later used to configure cinder, glance, and nova:
$ vagrant ssh ceph-node1
$ sudo su -
# ping os-node1 -c 1
# ssh-copy-id root@os-node1
# cd /etc/ceph
# ceph-deploy install os-node1
# ceph-deploy config push os-node1
Make sure that the ceph.conf file that we have pushed to os-node1 has permissions of 644.
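If the permissions need adjusting, you can fix them from ceph-node1 (assuming you are still logged in as root, as in the previous steps):
# ssh os-node1 chmod 644 /etc/ceph/ceph.conf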
# ceph osd pool create images 128
# ceph osd pool create volumes 128
# ceph osd pool create vms 128
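You can verify that the new pools have been created:
# ceph osd lspools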
# ceph auth get-or-create client.cinder mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rx pool=images'
# ceph auth get-or-create client.glance mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images'
# ceph auth get-or-create client.glance | ssh os-node1 sudo tee /etc/ceph/ceph.client.glance.keyring
# ssh os-node1 sudo chown glance:glance /etc/ceph/ceph.client.glance.keyring
# ceph auth get-or-create client.cinder | ssh os-node1 sudo tee /etc/ceph/ceph.client.cinder.keyring
# ssh os-node1 sudo chown cinder:cinder /etc/ceph/ceph.client.cinder.keyring
# ceph auth get-key client.cinder | ssh os-node1 tee /etc/ceph/temp.client.cinder.key
$ vagrant ssh openstack-node1
$ sudo su -
# cd /etc/ceph
# ceph -s --name client.glance --keyring ceph.client.glance.keyring
# ceph -s --name client.cinder --keyring ceph.client.cinder.keyring
# cd /etc/ceph
# uuidgen
# cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
<uuid>bb90381e-a4c5-4db7-b410-3154c4af486e</uuid>
<usage type='ceph'>
<name>client.cinder secret</name>
</usage>
</secret>
EOF
Make sure that you use your own UUID generated for your environment.
# virsh secret-define --file secret.xml
# virsh secret-set-value --secret bb90381e-a4c5-4db7-b410-3154c4af486e --base64 $(cat temp.client.cinder.key) && rm temp.client.cinder.key secret.xml
# virsh secret-list
We have completed the configuration required from the Ceph side. In this section, we will configure OpenStack glance to use Ceph as a storage backend.
This section talks about configuring the glance component of OpenStack to store virtual machine images on Ceph RBD. To do this, edit the /etc/glance/glance-api.conf file on the OpenStack node and set the following options:
default_store=rbd
show_image_direct_url=True
# cat /etc/glance/glance-api.conf | egrep -i "default_store|image_direct"
stores = rbd
rbd_store_ceph_conf=/etc/ceph/ceph.conf
rbd_store_user=glance
rbd_store_pool=images
rbd_store_chunk_size=8
# cat /etc/glance/glance-api.conf | egrep -v "#|default" | grep -i rbd
# service openstack-glance-api restart
# source /root/keystonerc_admin
# glance image-list
# wget http://download.cirros-cloud.net/0.3.1/cirros-0.3.1-x86_64-disk.img
# glance image-create --name cirros_image --is-public=true --disk-format=qcow2 --container-format=bare < cirros-0.3.1-x86_64-disk.img
# glance image-list
# rados -p images ls --name client.glance --keyring /etc/ceph/ceph.client.glance.keyring | grep -i id
# nova boot --flavor 1 --image b2d15e34-7712-4f1d-b48d-48b924e79b0c vm1
While you are adding new glance images or creating an instance from a glance image stored on Ceph, you can check the I/O on the Ceph cluster by monitoring it with the # watch ceph -s command.
The cinder component of OpenStack provides block storage to virtual machines. In this section, we will configure OpenStack cinder to use Ceph as a storage backend. OpenStack cinder requires a driver to interact with the Ceph block device. On the OpenStack node, edit the /etc/cinder/cinder.conf configuration file by adding the code snippet given in the following section.
In the last section, we learned to configure glance to use Ceph. In this section, we will learn to use the Ceph RBD with the Cinder service of OpenStack:
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_user = cinder
rbd_secret_uuid = bb90381e-a4c5-4db7-b410-3154c4af486e
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 5
rbd_store_chunk_size = 4
rados_connect_timeout = -1
glance_api_version = 2
# cat /etc/cinder/cinder.conf | egrep "rbd|rados|version" | grep -v "#"
# service openstack-cinder-volume restart
# source /root/keystonerc_admin
# cinder list
# cinder create --display-name ceph-volume01 --display-description "Cinder volume on CEPH storage" 2
# cinder list
# rados -p volumes --name client.cinder --keyring ceph.client.cinder.keyring ls | grep -i id
In order to attach the Ceph RBD to OpenStack instances, we should configure the nova component of OpenStack by adding the rbd user and uuid information that it needs to connect to the Ceph cluster. To do this, we need to edit /etc/nova/nova.conf on the OpenStack node and perform the steps that are given in the following section.
The cinder service that we configured in the last section creates volumes on Ceph; however, to attach these volumes to OpenStack instances, we need to configure nova:
rbd_user=cinder
rbd_secret_uuid= bb90381e-a4c5-4db7-b410-3154c4af486e
# service openstack-nova-compute restart
# nova list
# cinder list
# nova volume-attach 1cadffc0-58b0-43fd-acc4-33764a02a0a6 1337c866-6ff7-4a56-bfe5-b0b80abcb281
# cinder list
In order to boot all OpenStack instances into Ceph, that is, for the boot-from-volume feature, we should configure an ephemeral backend for nova. To do this, edit /etc/nova/nova.conf on the OpenStack node and perform the changes shown next.
This section deals with configuring nova to store entire virtual machines on the Ceph RBD. Note that to boot virtual machines directly from Ceph, the glance image should be in the raw format, which is why we convert the cirros image to raw in the following steps:
inject_partition=-2
images_type=rbd
images_rbd_pool=vms
images_rbd_ceph_conf=/etc/ceph/ceph.conf
# cat /etc/nova/nova.conf|egrep "rbd|partition" | grep -v "#"
# service openstack-nova-compute restart
# qemu-img convert -f qcow2 -O raw cirros-0.3.1-x86_64-disk.img cirros-0.3.1-x86_64-disk.raw
# glance image-create --name cirros_raw_image --is-public=true --disk-format=raw --container-format=bare < cirros-0.3.1-x86_64-disk.raw
# nova image-list
# cinder create --image-id ff8d9729-5505-4d2a-94ad-7154c6085c97 --display-name cirros-ceph-boot-volume 1
# cinder list
# nova boot --flavor 1 --block_device_mapping vda=fd56314b-e19b-4129-af77-e6adf229c536::0 --image 964bd077-7b43-46eb-8fe1-cd979a3370df vm2_on_ceph
--block_device_mapping vda = <cinder bootable volume id >
--image = <Glance image associated with the bootable volume>
# nova list
In this article, we covered the various storage formats supported by Ceph in detail and saw how they are provisioned to physical or virtual servers.