Fixed claim-to-volume binding strategy in Kubernetes

Posted in category kubernetes on 2018-09-10
Tags: kubernetes

The problem

To explain the challenge, I will walk you through parts of an Elasticsearch cluster setup. Then I will show a solution that relies on a deeper understanding of the naming convention Kubernetes uses to render Persistent Volume Claim templates into Persistent Volume Claims, and on binding those claims to concrete Persistent Volumes in a fixed fashion.

What’s given

Storage

This is how the storage class is defined:
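The original manifest is not reproduced in this copy of the post, so here is a minimal sketch. The name elasticsearch-data comes from the text below; the no-provisioner setting is an assumption, based on the manually pre-created local volumes described next:

```yaml
# Sketch of the storage class (assumed: static provisioning,
# since the PVs below are created by hand, one per node).
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: elasticsearch-data
provisioner: kubernetes.io/no-provisioner
```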

There is also a bunch of Persistent Volumes to be used with the elasticsearch-data storage class. To spice things up and give the post a bit more meaning, I will add one extra requirement: each Persistent Volume has to be tied to a specific Kubernetes node.
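A Persistent Volume of this kind could look like the sketch below. The PV name follows the elasticsearch-data-N pattern used later in the post; the capacity, local path, and node hostname are illustrative assumptions:

```yaml
# Sketch of one node-pinned Persistent Volume (assumed values:
# 10Gi, /mnt/elasticsearch-data, hostname node-0).
kind: PersistentVolume
apiVersion: v1
metadata:
  name: elasticsearch-data-0
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: elasticsearch-data
  local:
    path: /mnt/elasticsearch-data
  # Pin this volume to a concrete Kubernetes node.
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-0
```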

And so on for all available indices, let's say from 0 to 2, allocating each PV on its own server.

The same thing can be done automatically using something like the local-volume-provisioner from the external-storage project of Kubernetes Incubator, which achieves more dynamic provisioning of local volumes. For the sake of the point this post makes, though, that option is not considered.

StatefulSet

Metadata of Elasticsearch StatefulSet:
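The manifest itself is missing from this copy, so here is a sketch of the relevant metadata. The StatefulSet name elasticsearch and the replica count of 3 are taken from the text; the service name is an assumption:

```yaml
# Sketch of the StatefulSet metadata (serviceName is assumed).
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: elasticsearch
spec:
  serviceName: elasticsearch
  replicas: 3
```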

And here is what the claim templates section of the Elasticsearch StatefulSet looks like:
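Again, a sketch in place of the missing snippet. The template name esdata is implied by the rendered claim names shown below; the access mode and requested storage are illustrative:

```yaml
# Sketch of the volumeClaimTemplates section (storage size assumed).
  volumeClaimTemplates:
    - metadata:
        name: esdata
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: elasticsearch-data
        resources:
          requests:
            storage: 10Gi
```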

Problem statement

When this StatefulSet is scheduled and running (assuming it is configured with 3 replicas), the mapping from claims to volumes may look something like the following:

Claim                    Volume
esdata-elasticsearch-0   elasticsearch-data-1
esdata-elasticsearch-1   elasticsearch-data-2
esdata-elasticsearch-2   elasticsearch-data-0

The problem is this random mapping of claims to volumes. Assume you need to rebuild the Kubernetes cluster from the ground up and restore the data. The next time it boots, you will see yet another mapping of claims to volumes. This will inevitably lead to internal problems in the Elasticsearch cluster itself.

Solution

A possible way to solve this problem is hidden in the way claim templates are resolved into claims. Notice how a claim's name is derived from the names of the claim template and the StatefulSet:

-> Claim Template esdata
-> StatefulSet elasticsearch
== Claim esdata-elasticsearch-N

To phrase it explicitly, the claim name rendered from a claim template for a StatefulSet looks like this: [CLAIM_TEMPLATE]-[STATEFUL_SET]-N.

This hints at how to solve the problem: we simply need to pre-create claims with the expected names, adding parameters that bind each claim to a specific volume, which in turn is tied to a specific Kubernetes node:
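A sketch of one such pre-created claim, using the standard volumeName field of a PVC spec to pin the claim to a concrete PV (the requested storage size is an assumption; one claim like this is needed per replica, with the index incremented in both names):

```yaml
# Sketch: a pre-created claim with the name the StatefulSet
# expects, pinned to a concrete PV via volumeName.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: esdata-elasticsearch-0
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: elasticsearch-data
  # Bind explicitly to the node-pinned volume with the same index.
  volumeName: elasticsearch-data-0
  resources:
    requests:
      storage: 10Gi
```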

Basically, the problem is solved by keeping the claim template on the StatefulSet while also explicitly defining claims that are bound to the correct volumes.

Scheduling such a setup will produce a fixed binding of claims to volumes in the most straightforward and intuitively expected manner:

Claim                    Volume
esdata-elasticsearch-0   elasticsearch-data-0
esdata-elasticsearch-1   elasticsearch-data-1
esdata-elasticsearch-2   elasticsearch-data-2