Linstor: concepts and configuration

In a previous article, I introduced Linstor. Now I’m going to explain some of the concepts needed to use it.

If you’ve read the Linstor user’s guide but are still confused about the relationship between resource-groups, resource-definitions, resources and volumes, then this is for you.

I’m going to leave storage pools aside for the moment, and go straight to the most important part: resources and volumes.

Resources and Volumes

There are six key entities: resource-group, volume-group, resource-definition, volume-definition, resource and volume.

Ultimately, the storage that appears as /dev/drbdXXXX is a volume.

The relationship between resources and volumes is described here:

Volumes are a subset of a Resource. A Resource could have multiple volumes, for example you may wish to have your database stored on slower storage than your logs in your MySQL cluster. By keeping the volumes under a single resource you are essentially creating a consistency group.

In short: if you want a set of volumes that co-exist together on the same nodes and are consistent with each other (e.g. snapshots are taken simultaneously), then create them as volumes under the same resource.

If you don’t need this, then don’t worry about it. For most use cases, resources and volumes are related one-to-one; that is, each resource provides a single volume.

resource-group and volume-group

A resource-group and its linked volume-group(s) form a kind of template from which a resource-definition inherits. Whenever you create a resource-definition, it must belong to a resource-group.

If you make a change to a resource-group, that change is applied retroactively to any existing resource-definitions underneath it. For example, if you specify a replication factor (“place-count”) of 2 in the resource-group and then spawn some resources from it, there will be two replicas of each. If you then change the place-count to 3 in the resource-group, all the attached resource-definitions will use this new value, and additional resources will be created to bring you up to the new replication factor.

You can also move a resource-definition from one resource-group to another.

However, one thing you can’t do is change the number of volume-groups in a resource-group, except when the resource-group is empty (i.e. it has no child resource-definitions).

As already mentioned, except for unusual scenarios, each resource-group will only have a single volume-group anyway.

resource-definition and volume-definition

A resource-definition and its linked volume-definition(s) describe the actual volumes that you want to exist on your cluster (regardless of the number of replicas deployed).

Two essential attributes of the volume-definition are:

  • The volume size, which you choose at creation time (and can increase later)
  • The volume minor device number¹, which defines the device node it will appear as (e.g. VolumeMinor 1000 means /dev/drbd1000)
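To make the minor-number mapping concrete, here is a minimal Python sketch. The function name drbd_device_path is my own, purely for illustration; it is not part of Linstor or DRBD.

```python
def drbd_device_path(volume_minor: int) -> str:
    """Return the device node for a DRBD minor number (hypothetical helper)."""
    if volume_minor < 0:
        raise ValueError("minor number must be non-negative")
    # The device node is simply /dev/drbd followed by the minor number.
    return f"/dev/drbd{volume_minor}"


print(drbd_device_path(1000))  # /dev/drbd1000
```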

resource and volume

Resource and volume are the manifestations of storage on a specific node (host) — representing the actual LVM or ZFS volumes on disk. A resource and its volumes may exist on multiple nodes, if there are replicas. Therefore, each resource belongs to both a resource-definition and a node.

A node can also carry diskless resources, which can access the same volumes over the network, even though the storage is held elsewhere.

In the following example, the resource “my_ssd_res” (and its associated volume) exists on four nodes: two diskless and two with the actual storage. Hence there is one resource-definition for my_ssd_res, but four resources.

The resource-definition has one volume-definition (VolNr 0), and hence each resource has one volume. This volume is accessible as /dev/drbd1000 across all four nodes.
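If it helps to see the relationships as data, the following Python sketch models this example. It is only a mental model under my own assumptions, not how Linstor stores things internally: the class names are invented for illustration, and the node names node1 to node4 are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VolumeDefinition:
    vol_nr: int   # volume number within the resource-definition
    minor: int    # DRBD minor device number


@dataclass
class ResourceDefinition:
    name: str
    volume_definitions: List[VolumeDefinition] = field(default_factory=list)


@dataclass
class Resource:
    definition: ResourceDefinition  # each resource belongs to a resource-definition...
    node: str                       # ...and to a node
    diskless: bool = False

    def volumes(self) -> List[Tuple[str, str, int, str]]:
        """One volume per volume-definition: (node, resource, VolNr, device)."""
        return [
            (self.node, self.definition.name, vd.vol_nr, f"/dev/drbd{vd.minor}")
            for vd in self.definition.volume_definitions
        ]


# One resource-definition with a single volume-definition (VolNr 0, minor 1000).
rd = ResourceDefinition("my_ssd_res", [VolumeDefinition(vol_nr=0, minor=1000)])

# Four resources: two with storage, two diskless (node names are hypothetical).
resources = [
    Resource(rd, "node1"),
    Resource(rd, "node2"),
    Resource(rd, "node3", diskless=True),
    Resource(rd, "node4", diskless=True),
]

for res in resources:
    for vol in res.volumes():
        print(vol)
```

Every resource reports the same device path, because the minor number lives on the volume-definition rather than on the per-node volume.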

Inspecting objects

A resource-group, resource-definition or resource is identified by a name, which you choose when creating that object.

A volume-group, volume-definition or volume is identified by the name of its parent object (resource-group, resource-definition or resource), plus its volume number, starting from zero.

These can all be easily inspected from the command line. You can obtain a list of objects of a given type: in the case of volumes, you need to give the name of the parent resource.

Objects have a set of properties, which you can also examine from the command line. For resource-type objects, give the name:

For volumes, again you need to give the name plus volume number.

Putting it all together

Now we can do a complete worked example.

Firstly, create resource-groups. Use linstor resource-group create --help to see the available options.

Let’s create one for standard workloads which require 2 replicas, and one for critical workloads which require 3 replicas. (Note that long lines have been split)

At this point, we have resource-groups with no child volume-groups:

A resource with no volumes isn’t very useful, so we need to add a volume-group to each template:

Those groups only need to be created once, but can be re-used many times.

Now, to create an actual storage volume, there are four remaining types of object we need to create: resource-definition, volume-definition, resources and volumes. Fortunately, Linstor provides a shortcut to do all this in one step: linstor resource-group spawn-resources.

All you have to do is give the name of an existing resource-group, the name of the resource-definition you want to create, and the size(s) of the volume(s) to create (as many as the number of volume-groups attached to the resource-group).
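As a rough illustration of how those three arguments fit together, here is a small Python sketch that assembles the command line. The helper spawn_resources_argv and the resource-definition name demo_res are my own inventions for this example, and the size check simply mirrors the rule as described above.

```python
from typing import List, Sequence


def spawn_resources_argv(resource_group: str,
                         resource_definition: str,
                         sizes: Sequence[str],
                         volume_group_count: int) -> List[str]:
    """Build the argument list for 'linstor resource-group spawn-resources'.

    Hypothetical helper: it only assembles the arguments, it runs nothing.
    """
    # One size per volume-group attached to the resource-group.
    if len(sizes) != volume_group_count:
        raise ValueError(
            f"expected {volume_group_count} size(s), got {len(sizes)}"
        )
    return ["linstor", "resource-group", "spawn-resources",
            resource_group, resource_definition, *sizes]


argv = spawn_resources_argv("ssd_crit", "demo_res", ["1G"], volume_group_count=1)
print(" ".join(argv))
# linstor resource-group spawn-resources ssd_crit demo_res 1G
```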

That’s it! It has created the resource-definition, volume-definition, and all the resources to deploy it across the correct number of nodes:

Linstor picks the “least used” nodes when deploying the resources, according to this algorithm which you can control via weights. In this case, node1 and node2 were chosen for the volumes. Node3 was also configured as a quorum “tie breaker” diskless node (this isn’t stored as a persistent resource, as it doesn’t matter which node fulfils this role at any given time).

However, if that doesn’t meet your needs for any reason, now that you understand what all the underlying objects are, you can create them individually by hand instead of using spawn-resources:

The result is the same, but you had full control over the process.

Moving a resource-definition to a new resource-group

Suppose we decide that volume “bar” is now part of the critical infrastructure and needs three replicas. We can change it to be part of the ssd_crit resource-group, and tell Linstor to create additional resources to meet the new replication requirement:

Deleting volumes

As you’ve seen, you can increase and reduce the number of replicas under a resource-definition by creating and deleting its resources.

What if you want to remove a volume entirely? Hopefully it’s clear now: you delete the resource-definition. This will delete its child resources and volumes.

Storage

Storage pools

Storage pools are the manifestation of available storage on a particular node. This is very straightforward:

When Linstor wants to allocate space from pool_ssd, you can see it has four choices where to get it, one on each of four nodes.

A node can have multiple entries for the same storage pool. One way of using this is that instead of combining multiple drives into a single LVM volume group, you could present the drives as separate devices. The idea is that if a drive fails, fewer volumes are affected while you swap it out.

Volume sizes

Linstor’s default size units are powers of two: if you ask Linstor for a volume size of “1”, “1G” or “1GiB”, you are asking for one gibibyte, which is 1024 × 1024 × 1024 = 1,073,741,824 bytes. (If instead you ask for “1GB”, you’ll get one gigabyte, which is 1,000,000,000 bytes².)
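The difference between the two unit families is easy to check with a few lines of Python. This is only an arithmetic sketch covering the four spellings mentioned above; size_to_bytes is a made-up helper, not Linstor’s own parser.

```python
import re

# Bytes per unit, for the spellings discussed in the text only.
UNIT_BYTES = {
    "": 1024 ** 3,     # no unit: treated as GiB, as described above
    "G": 1024 ** 3,
    "GiB": 1024 ** 3,
    "GB": 1000 ** 3,
}


def size_to_bytes(text: str) -> int:
    """Convert a size such as '1', '1G', '1GiB' or '1GB' to bytes."""
    match = re.fullmatch(r"(\d+)(GiB|GB|G)?", text)
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    number, unit = match.groups()
    return int(number) * UNIT_BYTES[unit or ""]


for size in ("1", "1G", "1GiB", "1GB"):
    print(size, size_to_bytes(size))

# A "500GB" drive expressed in GiB, rounded to one decimal place.
print(round(size_to_bytes("500GB") / 1024 ** 3, 1))  # 465.7
```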

However, in either case, the volume you get is actually slightly larger than requested:

Why is this? It is because DRBD needs to allocate some extra space for its own metadata to track the replication state of data on disk. For example, if the power is interrupted part-way through replication, the metadata allows it to continue where it left off when restarted. Therefore, it asks LVM for a slightly larger volume than you requested.

LVM allocates space in chunks called “extents”, which by default are 4MiB in size. So the next size up that LVM can offer, after 1024MiB, is 1028MiB:

DRBD doesn’t use all this extra space for metadata (in this example it takes only 264KiB), so the rest is left available to the user.
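The rounding can be reproduced with some simple arithmetic. The Python sketch below assumes the figures quoted in this section (4MiB extents, 264KiB of metadata for a 1GiB volume); the function name is mine, and the real metadata size depends on the volume size and DRBD settings.

```python
KIB_PER_MIB = 1024
EXTENT_KIB = 4 * KIB_PER_MIB  # default LVM extent size of 4MiB, as stated above


def lvm_allocation_kib(requested_kib: int, metadata_kib: int,
                       extent_kib: int = EXTENT_KIB) -> int:
    """Round (requested + metadata) up to a whole number of extents."""
    needed = requested_kib + metadata_kib
    extents = -(-needed // extent_kib)  # ceiling division
    return extents * extent_kib


requested = 1024 * KIB_PER_MIB  # a 1GiB request
metadata = 264                  # KiB, the figure from this example
allocated = lvm_allocation_kib(requested, metadata)

print(allocated // KIB_PER_MIB)  # 1028 (MiB allocated by LVM)
print(allocated - metadata)      # 1052408 (KiB left over after the metadata)
```

With no metadata at all, the same request would fit in exactly 256 extents, which is why even a few hundred KiB of overhead pushes the allocation up by a full 4MiB.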

drbdsetup status

Under the hood, Linstor’s job is to configure the drbd kernel driver on each node. The drbdsetup and drbdadm tools also interact directly with this driver.

Normally you shouldn’t need to touch them, but drbdsetup status --verbose lets you see the relationships between active resources and their child volumes. Run this on the satellite node where the resource is deployed, not on the controller.

CLI hints and tips

Short form keywords

Most keywords have short forms, e.g. rd for resource-definition, l for list, lp for list-properties. This saves a lot of typing!

Bash completion

Bash completion, where you hit tab to get a list of possible options, is available. Unfortunately it is missing from the Ubuntu packages. If tab-completion isn’t working for you, you can install it directly from the git repository (note: this is one long line)

Pastable output

You may have noticed that the tables generated by the linstor command use funky unicode characters for lines and rounded corners. If you want more traditional ASCII output, use the -p (pastable) flag. Compare:

JSON output

At the other extreme, apply -m or -m --output-version v1 to get machine-parseable JSON output. v1 is the newer format; --output-version v0 is the default.
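If you want to consume that JSON from a script, something along these lines should work. It is a sketch under assumptions: linstor_json is a helper name I made up, it assumes the linstor client is on your PATH, and it makes no claims about the structure of the JSON that comes back.

```python
import json
import subprocess
from typing import Any, Callable, List


def linstor_json(args: List[str], run: Callable[..., Any] = subprocess.run) -> Any:
    """Run a linstor subcommand with JSON output and return the parsed result.

    The 'run' parameter exists only so the command runner can be swapped
    out (for example in tests); by default it is subprocess.run.
    """
    cmd = ["linstor", "-m", "--output-version", "v1", *args]
    completed = run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

For example, linstor_json(["resource", "list"]) would return the parsed output of the resource listing; inspect the result interactively before relying on any particular key.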

Preferences

You can store your preferences in ~/.config/linstor/linstor-client.conf or /etc/linstor/linstor-client.conf. For example, to disable the ANSI colours for the “State” values, you can set:

Conclusion

Hopefully you now have a clearer idea of Linstor’s concepts of resource-groups, resource-definitions, resources and how they relate to volumes. Using these, you can create, modify and delete storage volumes in your cluster.

[¹] The minor device number is normally chosen automatically, although you can select it yourself if you wish, e.g. if you are migrating a manual DRBD configuration to Linstor.

[²] Drive manufacturers use power-of-ten figures. This is because a 500GB drive sounds bigger than a 465.7GiB drive (even though they are the same size).
