Linstor: concepts and configuration

Brian Candler
Feb 15, 2021

In a previous article, I introduced Linstor. Now I’m going to explain some of the concepts needed to use it.

If you’ve read the Linstor user’s guide but are still confused about the relationship between resource-groups, resource-definitions, resources and volumes, then this is for you.

I’m going to leave storage pools for a moment, and go straight to the most important part: resources and volumes.

Resources and Volumes

There are six key entities, which this article covers in pairs:

  • resource-group and volume-group
  • resource-definition and volume-definition
  • resource and volume

[Figure: relationships between resource and volume objects]

Ultimately, the storage that appears as /dev/drbdXXXX is a volume.

The relationship between resources and volumes is described in the Linstor user’s guide:

Volumes are a subset of a Resource. A Resource could have multiple volumes, for example you may wish to have your database stored on slower storage than your logs in your MySQL cluster. By keeping the volumes under a single resource you are essentially creating a consistency group.

In short: if you want a set of volumes that co-exist together on the same nodes and are consistent with each other (e.g. snapshots are taken simultaneously), then create them as volumes under the same resource.
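In sketch form, using commands that are covered in more detail later in this article (the resource name and sizes here are invented): each additional volume-definition you create under the same resource-definition becomes the next numbered volume.

root@node1:~# linstor resource-definition create my_db_res
root@node1:~# linstor volume-definition create my_db_res 10G   # becomes volume 0 (database)
root@node1:~# linstor volume-definition create my_db_res 2G    # becomes volume 1 (logs)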

If you don’t need this, then don’t worry about it. For most use cases, resources and volumes are related one-to-one; that is, each resource provides a single volume.

resource-group and volume-group

A resource-group and its linked volume-group(s) form a kind of template from which a resource-definition inherits. Whenever you create a resource-definition, it must belong to a resource-group.

If you make a change to a resource-group, that change is applied retroactively to any existing resource-definitions underneath it. For example, if you specified a replication factor (“place-count”) of 2 in the resource-group, then spawn some resources from it, there will be two replicas of each. If you then change the place-count to 3 in the resource-group, all the attached resource definitions will use this new value — and additional resources will be created to bring you up to the new replication factor.
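For example, a one-line sketch (assuming a resource-group named my_ssd_group, as used below; check linstor resource-group modify --help for the exact options):

root@node1:~# linstor resource-group modify my_ssd_group --place-count 3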

You can also move a resource-definition from one resource-group to another.

However, one thing you can’t do is change the number of volume-groups in a resource-group, except when the resource-group is empty (i.e. it has no child resource-definitions).

root@node1:~# linstor volume-group create my_ssd_group
ERROR:
Description:
Volume group cannot be created while the resource group has already resource definitions.
Details:
Volume groups for resource group 'my_ssd_group'

As already mentioned, except for unusual scenarios, each resource-group will only have a single volume-group anyway.

resource-definition and volume-definition

A resource-definition and its linked volume-definition(s) describe the actual volumes that you want to exist on your cluster (regardless of the number of replicas deployed).

Two essential attributes of the volume-definition are:

  • The volume size, which you choose at creation time (and can increase later)
  • The volume minor device number, which defines the device node it will appear as (e.g. VolumeMinor 1000 means /dev/drbd1000)¹

root@node1:~# linstor resource-definition list
╭─────────────────────────────────────────────╮
┊ ResourceName ┊ Port ┊ ResourceGroup ┊ State ┊
╞═════════════════════════════════════════════╡
┊ my_ssd_res   ┊ 7000 ┊ my_ssd_group  ┊ ok    ┊
╰─────────────────────────────────────────────╯
root@node1:~# linstor volume-definition list
╭───────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ VolumeNr ┊ VolumeMinor ┊ Size  ┊ Gross ┊ State ┊
╞═══════════════════════════════════════════════════════════════╡
┊ my_ssd_res   ┊ 0        ┊ 1000        ┊ 1 GiB ┊       ┊ ok    ┊
╰───────────────────────────────────────────────────────────────╯

resource and volume

Resource and volume are the manifestations of storage on a specific node (host) — representing the actual LVM or ZFS volumes on disk. A resource and its volumes may exist on multiple nodes, if there are replicas. Therefore, each resource belongs to both a resource-definition and a node.

A node can also carry diskless resources, which can access the same volumes over the network, even though the storage is held elsewhere.

In the following example, the resource “my_ssd_res” (and its associated volume) exists on four nodes: two diskless and two with the actual storage. Hence there is one resource-definition for my_ssd_res, but four resources.

The resource-definition has one volume-definition (VolNr 0), and hence each resource has one volume. This volume is accessible as /dev/drbd1000 across all four nodes.
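As a sketch, this is roughly what linstor resource list shows for such a setup (columns trimmed; node names and states illustrative):

root@node1:~# linstor resource list
┊ ResourceName ┊ Node  ┊ Port ┊ Usage  ┊ State    ┊
┊ my_ssd_res   ┊ node1 ┊ 7000 ┊ Unused ┊ UpToDate ┊
┊ my_ssd_res   ┊ node2 ┊ 7000 ┊ Unused ┊ UpToDate ┊
┊ my_ssd_res   ┊ node3 ┊ 7000 ┊ Unused ┊ Diskless ┊
┊ my_ssd_res   ┊ node4 ┊ 7000 ┊ Unused ┊ Diskless ┊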

Inspecting objects

A resource-group, resource-definition or resource is identified by a name, which you choose when creating that object.

A volume-group, volume-definition or volume is identified by the name of its parent object (resource-group, resource-definition or resource), plus its volume number, starting from zero.

These can all be easily inspected from the command line. You can obtain a list of objects of a given type: in the case of volumes, you need to give the name of the parent resource.

root@node1:~# linstor resource-group list
╭─────────────────────────────────────────────────────────────────╮
┊ ResourceGroup ┊ SelectFilter             ┊ VlmNrs ┊ Description ┊
╞═════════════════════════════════════════════════════════════════╡
┊ DfltRscGrp    ┊ PlaceCount: 2            ┊        ┊             ┊
╞┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄┄╡
┊ my_ssd_group  ┊ PlaceCount: 2            ┊ 0      ┊             ┊
┊               ┊ StoragePool(s): pool_ssd ┊        ┊             ┊
╰─────────────────────────────────────────────────────────────────╯
root@node1:~# linstor volume-group list my_ssd_group
╭──────────────────╮
┊ VolumeNr ┊ Flags ┊
╞══════════════════╡
┊ 0        ┊       ┊
╰──────────────────╯

Objects have a set of properties, which you can also examine from the command line. For resource-type objects, give the name:

root@node1:~# linstor resource-definition list
╭─────────────────────────────────────────────╮
┊ ResourceName ┊ Port ┊ ResourceGroup ┊ State ┊
╞═════════════════════════════════════════════╡
┊ my_ssd_res   ┊ 7000 ┊ my_ssd_group  ┊ ok    ┊
╰─────────────────────────────────────────────╯
root@node1:~# linstor resource-definition list-properties my_ssd_res
╭──────────────────────────────────────────────╮
┊ Key                               ┊ Value    ┊
╞══════════════════════════════════════════════╡
┊ DrbdOptions/Resource/on-no-quorum ┊ io-error ┊
┊ DrbdOptions/Resource/quorum       ┊ majority ┊
┊ DrbdPrimarySetOn                  ┊ NODE3    ┊
╰──────────────────────────────────────────────╯

For volumes, again you need to give the name plus volume number.

root@node1:~# linstor volume-definition list
╭───────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ VolumeNr ┊ VolumeMinor ┊ Size  ┊ Gross ┊ State ┊
╞═══════════════════════════════════════════════════════════════╡
┊ my_ssd_res   ┊ 0        ┊ 1000        ┊ 1 GiB ┊       ┊ ok    ┊
╰───────────────────────────────────────────────────────────────╯
root@node1:~# linstor volume-definition list-properties my_ssd_res 0
╭───────────────────────────────────────────────────────────────╮
┊ Key                                         ┊ Value            ┊
╞═══════════════════════════════════════════════════════════════╡
┊ DrbdCurrentGi                               ┊ 7910B7E5EAF8F9CC ┊
┊ DrbdOptions/Disk/discard-zeroes-if-aligned ┊ yes              ┊
┊ DrbdOptions/Disk/rs-discard-granularity    ┊ 8192             ┊
╰───────────────────────────────────────────────────────────────╯
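Properties can be changed as well as read. A sketch (suspend-io is DRBD’s alternative to io-error for the on-no-quorum option; run linstor resource-definition set-property --help to see which keys are accepted):

root@node1:~# linstor resource-definition set-property my_ssd_res DrbdOptions/Resource/on-no-quorum suspend-io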

Putting it all together

Now we can do a complete worked example.

Firstly, create resource-groups. Use linstor resource-group create --help to see the available options.

Let’s create one for standard workloads which require 2 replicas, and one for critical workloads which require 3 replicas:

root@node1:~# linstor resource-group create --storage-pool pool_ssd --place-count 2 ssd_std
SUCCESS:
...
root@node1:~# linstor resource-group create --storage-pool pool_ssd --place-count 3 ssd_crit
SUCCESS:
...

At this point, we have resource-groups with no child volume-groups:

root@node1:~# linstor volume-group list ssd_std
╭──────────────────╮
┊ VolumeNr ┊ Flags ┊
╞══════════════════╡
╰──────────────────╯

A resource-group with no volume-groups would spawn resources with no volumes, which isn’t very useful, so we need to add a volume-group to each template:

root@node1:~# linstor volume-group create ssd_std
SUCCESS:
New volume group with number '0' of resource group 'ssd_std' created.
root@node1:~# linstor volume-group list ssd_std
╭──────────────────╮
┊ VolumeNr ┊ Flags ┊
╞══════════════════╡
┊ 0        ┊       ┊
╰──────────────────╯
root@node1:~# linstor volume-group create ssd_crit
SUCCESS:
New volume group with number '0' of resource group 'ssd_crit' created.

Those groups only need to be created once, but can be re-used many times.

Now, to create an actual storage volume, there are four remaining types of object we need to create: resource-definition, volume-definition, resources and volumes. Fortunately, Linstor provides a shortcut to do all this in one step: linstor resource-group spawn-resources.

All you have to do is give the name of an existing resource-group; the name of the resource-definition you want to create; and the size(s) of the volume(s) to create (as many sizes as there are volume-groups attached to the resource-group).

root@node1:~# linstor resource-group spawn-resources ssd_std bar 2G
SUCCESS:
Volume definition with number '0' successfully created in resource definition 'bar'.
SUCCESS:
Description:
New resource definition 'bar' created.
...
SUCCESS:
Description:
Resource 'bar' successfully autoplaced on 2 nodes
Details:
Used nodes (storage pool name): 'node1 (pool_ssd)', 'node2 (pool_ssd)'
...
INFO:
Tie breaker resource 'bar' created on DfltDisklessStorPool
...

That’s it! It has created the resource-definition, the volume-definition, and all the resources needed to deploy it across the correct number of nodes.
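You can inspect the result with the list commands shown earlier:

root@node1:~# linstor resource-definition list
root@node1:~# linstor volume-definition list
root@node1:~# linstor resource list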

Linstor picks the “least used” nodes when deploying resources, following an autoplacement algorithm whose choices you can influence by setting weights (see the auto-placement section of the user’s guide). In this case, node1 and node2 were chosen for the volumes. Node3 was also configured as a quorum “tie breaker” diskless node (this isn’t stored as a persistent resource, as it doesn’t matter which node fulfils this role at any given time).

However if that doesn’t meet your needs for any reason, now that you understand what all the underlying objects are, you can create them individually by hand instead of using spawn-resources:
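For example, a minimal sketch of the manual equivalent (same nodes and pool as before; this builds the resource 'foo' that is deleted in a later section):

root@node1:~# linstor resource-definition create foo --resource-group ssd_std
root@node1:~# linstor volume-definition create foo 2G
root@node1:~# linstor resource create node1 foo --storage-pool pool_ssd
root@node1:~# linstor resource create node2 foo --storage-pool pool_ssd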

The result is the same, but you had full control over the process.

Moving a resource-definition to a new resource-group

Suppose we decide that volume “bar” is now part of the critical infrastructure and needs three replicas. We can change it to be part of the ssd_crit resource-group, and tell Linstor to create additional resources to meet the new replication requirement:
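With a recent client, a sketch of the single command needed (see linstor resource-definition modify --help):

root@node1:~# linstor resource-definition modify bar --resource-group ssd_crit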

In fact, as of linstor-controller v1.12.3, there is no need to run the auto-place manually; the new resources are created as soon as you apply the new resource-group.

Deleting volumes

As you’ve seen, you can increase and reduce the number of replicas under a resource-definition by creating and deleting its resources.

What if you want to remove a volume entirely? Hopefully it’s clear now: you delete the resource-definition. This will delete its child resources and volumes.

root@node1:~# linstor resource-definition delete foo
SUCCESS:
Description:
Resource definition 'foo' marked for deletion.
Details:
Resource definition 'foo' UUID is: 48d7b991-6a23-4e5f-b325-e13973b0040a
SUCCESS:
Notified 'node3' that diskless resources of 'foo' are being deleted
SUCCESS:
Notified 'node2' that diskless resources of 'foo' are being deleted
SUCCESS:
Notified 'node1' that diskless resources of 'foo' are being deleted
SUCCESS:
Resource 'foo' on 'node2' deleted
SUCCESS:
Resource 'foo' on 'node1' deleted
SUCCESS:
Description:
Resource definition 'foo' deleted.

Storage

Storage pools

Storage pools are the manifestation of available storage on a particular node. This is very straightforward:
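For example, a sketch (the pool is registered once per node, reusing the vg_ssd LVM volume group that appears later in this article; list output abridged and illustrative):

root@node1:~# linstor storage-pool create lvm node1 pool_ssd vg_ssd
...
root@node1:~# linstor storage-pool list
┊ StoragePool ┊ Node  ┊ Driver ┊ PoolName ┊
┊ pool_ssd    ┊ node1 ┊ LVM    ┊ vg_ssd   ┊
┊ pool_ssd    ┊ node2 ┊ LVM    ┊ vg_ssd   ┊
┊ pool_ssd    ┊ node3 ┊ LVM    ┊ vg_ssd   ┊
┊ pool_ssd    ┊ node4 ┊ LVM    ┊ vg_ssd   ┊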

When Linstor wants to allocate space from pool_ssd, you can see it has four choices where to get it, one on each of four nodes.

A node can also have multiple storage pools. One way of using this: instead of combining multiple drives into a single LVM volume group, you could give each drive its own pool. The idea is that if a drive fails, fewer volumes are affected while you swap it out.
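A sketch with hypothetical pool and volume group names, one per drive on node1:

root@node1:~# linstor storage-pool create lvm node1 pool_ssd_a vg_ssd_a
root@node1:~# linstor storage-pool create lvm node1 pool_ssd_b vg_ssd_b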

Volume sizes

Linstor’s default size units are powers-of-two: if you ask Linstor for a volume size of “1”, “1G” or “1GiB”, you are asking for one gibibyte, which is 1024x1024x1024 = 1,073,741,824 bytes. (If instead you ask for “1GB” you’ll get one gigabyte which is 1,000,000,000 bytes²)

However, in either case, the volume you get is actually slightly larger than requested:

root@node1:~# blockdev --getsize64 /dev/drbd1000
1077665792

Why is this? It is because DRBD needs to allocate some extra space for its own metadata to track the replication state of data on disk. For example, if the power is interrupted part-way through replication, the metadata allows it to continue where it left off when restarted. Therefore, it asks LVM for a slightly larger volume than you requested.

LVM allocates space in chunks called “extents”, which by default are 4MiB in size. So the next size up that LVM can offer, after 1024MiB, is 1028MiB:

root@node3:~# lvs --units m
  LV               VG     Attr       LSize    ...
  my_ssd_res_00000 vg_ssd -wi-ao---- 1028.00m

DRBD doesn’t use all this extra space for metadata (in this example it takes only 264KiB), so the rest is left available to the user.
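The numbers tie up: 1028 MiB is 1,077,936,128 bytes, and subtracting the 264 KiB of metadata gives exactly the usable size that blockdev reported earlier:

root@node1:~# echo $(( 1028 * 1024 * 1024 - 264 * 1024 ))
1077665792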

drbdsetup status

Under the hood, Linstor’s job is to configure the drbd kernel driver on each node. The drbdsetup and drbdadm tools also interact directly with this driver.

Normally you shouldn’t need to touch them, but drbdsetup status --verbose lets you see the relationships between active resources and their child volumes. Run this on the satellite node where the resource is deployed, not on the controller.
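For a healthy two-replica resource, the output looks something like this (abridged and illustrative; the exact fields vary with the drbd-utils version):

root@node1:~# drbdsetup status --verbose my_ssd_res
my_ssd_res node-id:0 role:Secondary suspended:no
  volume:0 minor:1000 disk:UpToDate
  node2 node-id:1 connection:Connected role:Secondary
    volume:0 replication:Established peer-disk:UpToDate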

CLI hints and tips

Short form keywords

Most keywords have short forms, e.g. rd for resource-definition, l for list, lp for list-properties. This saves a lot of typing!

root@node1:~# linstor resource-definition list
╭─────────────────────────────────────────────╮
┊ ResourceName ┊ Port ┊ ResourceGroup ┊ State ┊
╞═════════════════════════════════════════════╡
┊ my_ssd_res   ┊ 7000 ┊ my_ssd_group  ┊ ok    ┊
╰─────────────────────────────────────────────╯
root@node1:~# linstor rd l
╭─────────────────────────────────────────────╮
┊ ResourceName ┊ Port ┊ ResourceGroup ┊ State ┊
╞═════════════════════════════════════════════╡
┊ my_ssd_res   ┊ 7000 ┊ my_ssd_group  ┊ ok    ┊
╰─────────────────────────────────────────────╯

Bash completion

Bash completion, where you hit tab to get a list of possible options, should now work out-of-the-box³.

Pastable output

You may have noticed that the tables generated by the linstor command use funky unicode characters for lines and rounded corners. If you want more traditional ASCII output, use the -p (pastable) flag. Compare:

root@node1:~# linstor rd lp my_ssd_res
╭──────────────────────────────────────────────╮
┊ Key                               ┊ Value    ┊
╞══════════════════════════════════════════════╡
┊ DrbdOptions/Resource/on-no-quorum ┊ io-error ┊
┊ DrbdOptions/Resource/quorum       ┊ majority ┊
┊ DrbdPrimarySetOn                  ┊ NODE3    ┊
╰──────────────────────────────────────────────╯
root@node1:~# linstor rd lp my_ssd_res -p
+----------------------------------------------+
| Key                               | Value    |
|==============================================|
| DrbdOptions/Resource/on-no-quorum | io-error |
| DrbdOptions/Resource/quorum       | majority |
| DrbdPrimarySetOn                  | NODE3    |
+----------------------------------------------+

JSON output

At the other extreme, apply -m or -m --output-version v1 to get machine-parseable JSON output. v1 is the newer format; --output-version v0 is the default.

Preferences

You can store your preferences in ~/.config/linstor/linstor-client.conf or /etc/linstor/linstor-client.conf. For example, to disable the ANSI colours for the “State” values, you can set:

[global]
no-color=

Conclusion

Hopefully you now have a clearer idea of Linstor’s concepts of resource-groups, resource-definitions, resources and how they relate to volumes. Using these, you can create, modify and delete storage volumes in your cluster.

[¹] The minor device number is normally chosen automatically, although you can select it yourself if you wish, e.g. if you are migrating a manual DRBD configuration to Linstor.

[²] Drive manufacturers use power-of-ten figures. This is because a 500GB drive sounds bigger than a 465.7GiB drive (even though they are the same size).

[³] Prior to linstor-client v1.7.1 it was missing from the Ubuntu packages. If tab-completion isn’t working for you, either upgrade, or install the completion script directly from the git repository (note: this is one long line)

curl -Ss https://raw.githubusercontent.com/LINBIT/linstor-client/master/scripts/bash_completion/linstor -o /etc/bash_completion.d/linstor
