Shared Volume Controller

The Shared Volume controller is responsible for creating Kubernetes Service and Endpoint resources for attached shared volumes, and then publishing the Service's endpoint to the storageos.com/nfs/mount-endpoint label on the volume.

Create and Publish Shared Volume

When the StorageOS control plane receives a CSI CreateVolume request for a ReadWriteMany (RWX) volume, it creates a standard volume. All StorageOS volumes support both SINGLE_NODE_WRITER and MULTI_NODE_MULTI_WRITER CSI access modes.

Only when the control plane receives a CSI ControllerPublishVolume with VolumeCapability set to the MULTI_NODE_MULTI_WRITER access mode does it attach the volume as shared. When attached, the volume will have:

  • attachmentType set to nfs.
  • nfs.serviceEndpoint set to the address at which the NFS server is bound.

These steps happen without involvement from the Shared Volume controller.
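The two attachment properties above are what the controller can use to recognise a shared volume. As a minimal sketch, assuming a simplified volume type (the Volume struct and its field names are hypothetical, not the real StorageOS API types):

```go
package main

// Volume is a hypothetical, simplified view of a StorageOS volume.
// The field names are illustrative only.
type Volume struct {
	ID                 string
	AttachmentType     string // "nfs" when the volume is attached as shared
	NFSServiceEndpoint string // address at which the NFS server is bound
}

// IsShared reports whether the volume is currently attached as a shared
// volume: attachmentType is nfs and nfs.serviceEndpoint is set.
func IsShared(v Volume) bool {
	return v.AttachmentType == "nfs" && v.NFSServiceEndpoint != ""
}
```

A volume that fails this check would simply be skipped by the controller.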

Kubernetes Resource Evaluation

Shared Volumes must have a Service created with a ClusterIP that does not change for the lifetime of the PVC. The ClusterIP is combined with a static port (2049, the default NFS port), and set as the storageos.com/nfs/mount-endpoint label on the StorageOS volume.

An Endpoint must also exist with the same name and namespace as the Service, with the target set to the NFS server endpoint as defined in nfs.serviceEndpoint.

Since a Shared Volume is tied to a specific PVC, the Service and Endpoint both use the PVC name and namespace.
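The addressing rules above can be sketched as two small helpers. This is an illustration only: the function names are hypothetical, and it assumes that nfs.serviceEndpoint is stored in host:port form.

```go
package main

import (
	"net"
	"strconv"
)

// NFSPort is the static port combined with the Service ClusterIP.
const NFSPort = 2049

// MountEndpoint joins the Service ClusterIP and the static NFS port into
// the value stored in the storageos.com/nfs/mount-endpoint label.
func MountEndpoint(clusterIP string) string {
	return net.JoinHostPort(clusterIP, strconv.Itoa(NFSPort))
}

// EndpointTarget splits nfs.serviceEndpoint into the address and port used
// as the Endpoint target. Assumes the value has the form host:port.
func EndpointTarget(serviceEndpoint string) (string, int, error) {
	host, portStr, err := net.SplitHostPort(serviceEndpoint)
	if err != nil {
		return "", 0, err
	}
	port, err := strconv.Atoi(portStr)
	if err != nil {
		return "", 0, err
	}
	return host, port, nil
}
```

The Service side stays fixed at the ClusterIP and port 2049, while the Endpoint target follows wherever the NFS server is currently bound.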

Resources are checked for existence and equivalence before deciding whether to create, update, or take no action. When a resource is created, it is re-fetched before proceeding. Since the resource may not appear in the Kubernetes API immediately, it is polled every -k8s-create-poll-interval (default 1s) for up to -k8s-create-wait-duration (default 20s).

Mount Endpoint Publishing

Only once the Kubernetes resources have been successfully re-evaluated does the Service endpoint get published in storageos.com/nfs/mount-endpoint on the StorageOS volume, and only if different from the existing value.

In normal operation the Service endpoint should not change. A change invalidates client caches and leads to "Stale NFS filehandle" errors. The only likely cause would be the Service being manually deleted.
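The "only if different" rule for publishing can be sketched as a compare-then-set on the volume's labels. PublishIfChanged is a hypothetical helper, and the in-memory map stands in for the label update on the StorageOS volume.

```go
package main

// MountEndpointLabel is the volume label that holds the Service endpoint.
const MountEndpointLabel = "storageos.com/nfs/mount-endpoint"

// PublishIfChanged stores endpoint in the mount-endpoint label only when it
// differs from the current value, and reports whether an update was made.
// The labels map must be non-nil.
func PublishIfChanged(labels map[string]string, endpoint string) bool {
	if labels[MountEndpointLabel] == endpoint {
		return false // already published, nothing to do
	}
	labels[MountEndpointLabel] = endpoint
	return true
}
```

Skipping the write when the value is unchanged avoids needless volume updates on every reconcile.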

Mount Shared Volume

After the CSI ControllerPublishVolume succeeds, it's likely that NodePublishVolume will be called immediately to mount the volume into the application container. This will not succeed until storageos.com/nfs/mount-endpoint has been set by the Shared Volume Controller. The control plane uses the mount endpoint to remotely (or locally) mount the shared volume.

Volume Failover

When a StorageOS master volume fails over to another node, the NFS service gets restarted on that node and the nfs.serviceEndpoint is updated to reflect the new endpoint.

The shared volume control loop will either:

  • ignore the volume if the volume no longer has nfs.serviceEndpoint set.
  • see that the cached volume no longer matches because nfs.serviceEndpoint differs, and trigger a resource re-evaluation.

The Service will be updated with the new target port and the Endpoint will be updated with the new address and port.

During failover and update, the Service endpoint (<ClusterIP>:2049) does not change but it will not respond until the Endpoint has been updated.
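The control loop's choice during failover can be sketched as a comparison between the cached and current nfs.serviceEndpoint values. The Action type and Decide function are hypothetical names used for illustration; the NoChange case is an assumption covering a volume whose endpoint is unchanged.

```go
package main

// Action is the outcome of comparing a cached volume with its current state.
type Action int

const (
	Ignore     Action = iota // volume no longer has nfs.serviceEndpoint set
	NoChange                 // endpoint matches the cached value
	Reevaluate               // endpoint differs, Service and Endpoint need updating
)

// Decide compares the cached and current nfs.serviceEndpoint values and
// returns the action the control loop would take for the volume.
func Decide(cachedEndpoint, currentEndpoint string) Action {
	switch {
	case currentEndpoint == "":
		return Ignore
	case cachedEndpoint != currentEndpoint:
		return Reevaluate
	default:
		return NoChange
	}
}
```

A Reevaluate result leads back to the resource evaluation step, which updates the Service target port and the Endpoint address and port while the ClusterIP stays the same.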

Garbage Collection

Services and Endpoints are automatically removed when the PVC is deleted. The PVC is set as the owner of the Service, so the Kubernetes garbage collector deletes the Service along with the Endpoint, which is automatically associated with the Service.
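The ownership link can be sketched as follows. To keep the example free of dependencies, OwnerReference here is a local type that mirrors a subset of the fields of the Kubernetes owner reference; PVC, ServiceMeta, and NewServiceMeta are hypothetical names.

```go
package main

// OwnerReference mirrors a subset of the Kubernetes owner reference fields.
// It is redeclared locally so the sketch needs no external packages.
type OwnerReference struct {
	APIVersion string
	Kind       string
	Name       string
	UID        string
}

// PVC holds the identifying fields of a PersistentVolumeClaim.
type PVC struct {
	Name      string
	Namespace string
	UID       string
}

// ServiceMeta is a hypothetical stand-in for the Service's object metadata.
type ServiceMeta struct {
	Name            string
	Namespace       string
	OwnerReferences []OwnerReference
}

// NewServiceMeta names the Service after the PVC, places it in the PVC's
// namespace, and records the PVC as its owner for garbage collection.
func NewServiceMeta(pvc PVC) ServiceMeta {
	return ServiceMeta{
		Name:      pvc.Name,
		Namespace: pvc.Namespace,
		OwnerReferences: []OwnerReference{{
			APIVersion: "v1",
			Kind:       "PersistentVolumeClaim",
			Name:       pvc.Name,
			UID:        pvc.UID,
		}},
	}
}
```

Because the owner is identified by UID, a PVC that is deleted and recreated with the same name does not inherit the old Service.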

Prometheus Metrics

The following metrics are collected:

  • storageos_shared_volume_reconcile_duration_seconds: distribution of the length of time taken to reconcile all shared volumes.