As a research project this year at Caktus, we decided to build our own private cloud. Some of Caktus' clients prefer or require that their data and workloads remain on premises, which raises the question: "How can we bring a modern, cloud-like experience to an on-premises data center?"
We set out to answer this question as economically as possible by building OpenStack and Ceph clusters on gently used hardware, using the existing space in our rack at the Caktus office.
The Hardware
We procured eight (8) used SuperMicro servers with the following specifications:
- 128 GB RAM
- 2x Xeon CPUs
- 2x 10Gbit Intel NICs
- 2x 500 GB SAS drives (in a mirrored configuration for the operating system)
We wanted 10 Gigabit networking to maximize throughput between the compute and storage nodes in our OpenStack and Ceph clusters, so we also found a used Dell 10 Gigabit switch. Tobias has written about the Dell S4820T configuration on his own blog.
Racking the Servers
Racking servers takes time, particularly with older hardware that may need a little dust removed first. We found and fixed the following issues in our servers:
- Several of the machines had minor HDD issues reported by smartctl. In most cases the drives were still healthy overall, but we ran badblocks on them a couple of times just to be sure.
- One or two of the servers initially stopped during BIOS POST with a "B7" error message, the SuperMicro code for a memory issue. Luckily, in our case, simply reseating the RAM did the trick.
- We cleaned the CPUs and applied fresh thermal paste on one server that was crashing intermittently.
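As a rough illustration of the drive checks above, here is a minimal Python sketch that runs `smartctl -H` and looks for the overall health line. It assumes smartmontools is installed and the script runs as root; the two marker strings are the wording smartctl uses for ATA and SAS drives respectively, and the device name in the usage note is hypothetical.

```python
import subprocess

# Lines smartctl prints for a healthy drive: ATA/SATA drives report an
# "overall-health self-assessment test result: PASSED" line, while SAS/SCSI
# drives report "SMART Health Status: OK".
HEALTHY_MARKERS = (
    "self-assessment test result: PASSED",
    "SMART Health Status: OK",
)


def drive_is_healthy(smartctl_output: str) -> bool:
    """Return True if `smartctl -H` output contains a known healthy line."""
    return any(marker in smartctl_output for marker in HEALTHY_MARKERS)


def check_drive(device: str) -> bool:
    """Run `smartctl -H` on one device (needs root and smartmontools)."""
    # smartctl's exit status is a bitmask of warnings, so we don't pass
    # check=True; we rely on the printed health line instead.
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return drive_is_healthy(result.stdout)
```

For example, `check_drive("/dev/sda")` returns True for a drive whose self-assessment passes. That self-assessment is only the drive's own opinion, which is why a surface scan with badblocks is still worthwhile.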
Operating System Deployment
Rather than installing an operating system on each server by hand, we set up Canonical's MAAS (Metal as a Service) on a separate VLAN at our office. MAAS lets us remotely power the servers on and off and reinstall their operating systems through a web interface. As in public cloud environments, we wanted to automate as much as possible, so being able to start over from a clean slate at the touch of a button was key to further testing of Ceph and OpenStack. As of the date of this post, both projects prefer Ubuntu 20.04 as a base operating system, so that is what we deployed via MAAS.
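MAAS also has a command-line client, which is handy for scripting that clean-slate workflow. The sketch below only builds the commands for releasing a machine and redeploying it with Ubuntu 20.04 (`focal`); it assumes you have already logged in to the MAAS CLI, and the profile name `admin` and system ID `abc123` are hypothetical.

```python
from __future__ import annotations

import shlex


def redeploy_commands(profile: str, system_id: str, series: str = "focal") -> list[list[str]]:
    """Build MAAS CLI commands to wipe and redeploy one machine."""
    return [
        # Release the machine back to the Ready state.
        ["maas", profile, "machine", "release", system_id],
        # Deploy it again with the requested Ubuntu series (focal = 20.04).
        ["maas", profile, "machine", "deploy", system_id, f"distro_series={series}"],
    ]


for cmd in redeploy_commands("admin", "abc123"):
    print(shlex.join(cmd))
```

Releasing is asynchronous, so a real script would wait for the machine to return to Ready (for example by polling `maas admin machine read abc123`) before deploying it again.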
Ceph Bootstrapping
Once Ubuntu 20.04 was deployed on the bare-metal servers, we began working on the Ceph cluster. Although at least five nodes are recommended for production clusters, we decided to start with three, which let us reserve three nodes for the OpenStack cluster and keep two as spares.
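The node split is simple arithmetic, but writing it down makes the resource budget explicit. This sketch uses hypothetical hostnames `node1` through `node8` and the 128 GB of RAM per server from the hardware list.

```python
from __future__ import annotations

RAM_GB_PER_SERVER = 128  # from the hardware list above


def assign_roles(hosts: list[str]) -> dict[str, list[str]]:
    """Split eight servers into Ceph, OpenStack, and spare groups."""
    if len(hosts) != 8:
        raise ValueError(f"expected 8 hosts, got {len(hosts)}")
    return {"ceph": hosts[:3], "openstack": hosts[3:6], "spare": hosts[6:]}


hosts = [f"node{i}" for i in range(1, 9)]  # hypothetical hostnames
roles = assign_roles(hosts)
for role, members in roles.items():
    print(f"{role}: {len(members)} nodes, {len(members) * RAM_GB_PER_SERVER} GB RAM")
```

This prints 384 GB of RAM for each of the Ceph and OpenStack groups, with 256 GB held in reserve on the two spares.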
There are many ways to deploy a Ceph cluster, but as of the time of this post the recommended method is cephadm. We also used cephadm-ansible (not ceph-ansible) to automate bootstrapping and configuring the cluster. Here is a copy of the Ansible playbook we used to bootstrap and configure the cluster, including links to the relevant upstream documentation and code.
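For readers who want to see what such a playbook automates, here is a hedged sketch of the equivalent manual cephadm steps, based on the upstream cephadm documentation rather than on our playbook: bootstrap the first host, copy the cluster's SSH key to the others, add them to the orchestrator, and let Ceph create OSDs on every unused disk. The hostnames and IP addresses are hypothetical, and the code only builds the commands.

```python
from __future__ import annotations

import shlex


def ceph_bootstrap_commands(nodes: dict[str, str]) -> list[list[str]]:
    """Build cephadm commands; `nodes` maps hostname -> IP, first entry bootstraps."""
    (_, first_ip), *others = nodes.items()
    cmds = [["cephadm", "bootstrap", "--mon-ip", first_ip]]
    for host, ip in others:
        # Let the orchestrator reach the new host over SSH, then add it.
        cmds.append(["ssh-copy-id", "-f", "-i", "/etc/ceph/ceph.pub", f"root@{host}"])
        cmds.append(["ceph", "orch", "host", "add", host, ip])
    # Create OSDs on every unused disk across the cluster.
    cmds.append(["ceph", "orch", "apply", "osd", "--all-available-devices"])
    return cmds


nodes = {"ceph1": "10.0.0.11", "ceph2": "10.0.0.12", "ceph3": "10.0.0.13"}
for cmd in ceph_bootstrap_commands(nodes):
    print(shlex.join(cmd))
```

Note that `--all-available-devices` also claims empty disks added later; drives already in use, such as the mirrored OS pair, are not considered available.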
Configuring Ceph for OpenStack
Before deploying OpenStack (or configuring it to use Ceph), you need to create the RBD pools. You also need to copy the Ceph keyrings to the deployment machine so that OpenStack can use them. To help automate this process, we created an Ansible playbook that downloads the Ceph keyrings to a Kolla Ansible deployment host.
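Here is a minimal sketch of the Ceph-side preparation, assuming the pool and user names from the upstream Ceph "Block Devices and OpenStack" guide (a given playbook may use different names). It only prints the commands; when run on a Ceph admin node, each `ceph auth get-or-create` prints a keyring that can be saved and copied to the Kolla Ansible deployment host.

```python
from __future__ import annotations

import shlex

# Pool names and client capabilities follow the upstream Ceph
# "Block Devices and OpenStack" guide; adjust to match your own naming.
POOLS = ["volumes", "images", "backups", "vms"]

CLIENT_CAPS = {
    "client.glance": {
        "mon": "profile rbd",
        "osd": "profile rbd pool=images",
        "mgr": "profile rbd pool=images",
    },
    "client.cinder": {
        "mon": "profile rbd",
        "osd": "profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images",
        "mgr": "profile rbd pool=volumes, profile rbd pool=vms",
    },
    "client.cinder-backup": {
        "mon": "profile rbd",
        "osd": "profile rbd pool=backups",
        "mgr": "profile rbd pool=backups",
    },
}


def pool_commands(pools: list[str]) -> list[list[str]]:
    """Create each pool and initialize it for RBD use."""
    cmds = []
    for pool in pools:
        cmds.append(["ceph", "osd", "pool", "create", pool])
        cmds.append(["rbd", "pool", "init", pool])
    return cmds


def auth_commands(caps: dict[str, dict[str, str]]) -> list[list[str]]:
    """Create (or fetch) one cephx user per client; each prints its keyring."""
    cmds = []
    for client, entity_caps in caps.items():
        cmd = ["ceph", "auth", "get-or-create", client]
        for service in ("mon", "osd", "mgr"):
            cmd += [service, entity_caps[service]]
        cmds.append(cmd)
    return cmds


for cmd in pool_commands(POOLS) + auth_commands(CLIENT_CAPS):
    print(shlex.join(cmd))
```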
OpenStack Bootstrapping
Given our affinity for Ansible, we chose the Kolla Ansible project to deploy OpenStack. Kolla Ansible is well maintained and widely used, and its documentation includes numerous tips for operating a cluster, not just deploying it for the first time.
The Kolla Ansible Quick Start is an excellent place to start. We followed these steps almost exactly, and documented a streamlined version of the commands in our own internal README for the project.
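The core of the Quick Start for a multinode deployment is a short sequence of commands. The sketch below prints that sequence (or runs it, stopping at the first failure); it assumes Kolla Ansible is already installed, `globals.yml` has been edited, and a multinode inventory exists at `./multinode`. The exact commands vary between releases (newer ones add steps such as `kolla-ansible install-deps`), so treat this as an outline and check the Quick Start for your version.

```python
from __future__ import annotations

import shlex
import subprocess


def kolla_steps(inventory: str = "./multinode") -> list[list[str]]:
    """Build the main Quick Start commands for a multinode inventory."""
    steps = [["kolla-genpwd"]]  # fill in /etc/kolla/passwords.yml
    for action in ("bootstrap-servers", "prechecks", "deploy", "post-deploy"):
        steps.append(["kolla-ansible", "-i", inventory, action])
    return steps


def run_steps(steps: list[list[str]], dry_run: bool = True) -> None:
    """Print each step; when dry_run is False, also run it."""
    for step in steps:
        print("+", shlex.join(step))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failure.
            subprocess.run(step, check=True)


run_steps(kolla_steps())
```

For an external Ceph cluster like ours, the Kolla Ansible external Ceph guide also has you enable options such as `glance_backend_ceph`, `cinder_backend_ceph`, and `nova_backend_ceph` in `globals.yml` before running the prechecks.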
OpenStack Configuration
Once the OpenStack cluster is created, there is still work to do to configure it for your environment. The init-runonce script included with Kolla Ansible is a good starting point; however, we recommend reviewing the external networking variables at the top of the file before running it, so that the networks it creates match your local network. These are harder to change after they've been created.
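Before running init-runonce, it can help to sanity-check the external network values. In the copies of the script we have seen, the relevant variables are `EXT_NET_CIDR`, `EXT_NET_RANGE`, and `EXT_NET_GATEWAY`, each falling back to a default only if it is not already set in the environment; confirm this against the version you have. The sketch below, using hypothetical addresses, checks that the floating IP range and gateway sit inside the CIDR and prints `export` lines for the shell.

```python
from __future__ import annotations

import ipaddress


def external_net_env(cidr: str, start: str, end: str, gateway: str) -> dict[str, str]:
    """Validate an external network plan and return init-runonce variables."""
    net = ipaddress.ip_network(cidr)
    first, last, gw = (ipaddress.ip_address(a) for a in (start, end, gateway))
    if not all(addr in net for addr in (first, last, gw)):
        raise ValueError("range and gateway must fall inside the CIDR")
    if first > last:
        raise ValueError("range start must not come after range end")
    if first <= gw <= last:
        raise ValueError("gateway must not fall inside the floating IP range")
    return {
        "EXT_NET_CIDR": cidr,
        "EXT_NET_RANGE": f"start={start},end={end}",
        "EXT_NET_GATEWAY": gateway,
    }


# Hypothetical values; substitute your own external network.
env = external_net_env("192.168.100.0/24", "192.168.100.150", "192.168.100.199", "192.168.100.1")
for key, value in env.items():
    print(f"export {key}={value}")
```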
Summary
We hope you've enjoyed reading this summary of our experiments with MAAS, Ceph, and OpenStack over the last few months! Please let us know if you have any questions in the comments below. We plan to write more as we refine our OpenStack configuration, and possibly release more in-depth pieces on some of the above topics if there's interest. If you're interested in working with us on a project (Python, Django, OpenStack, or otherwise), please get in touch with our partnerships team.