The Shared Volume controller is responsible for creating Kubernetes Service and
Endpoint resources for attached shared volumes, then publishing the Service's
endpoint to the `storageos.com/nfs/mount-endpoint` label on the volume.
When the StorageOS control plane receives a CSI `CreateVolume` request for a
`ReadWriteMany` (RWX) volume, it creates a standard volume. All StorageOS
volumes support both the `SINGLE_NODE_WRITER` and `MULTI_NODE_MULTI_WRITER` CSI
access modes.
Only when the control plane receives a CSI `ControllerPublishVolume` request
with `VolumeCapability` set to the `MULTI_NODE_MULTI_WRITER` access mode does it
attach the volume as shared. When attached, the volume will have:

- `attachmentType` set to `nfs`.
- `nfs.serviceEndpoint` set to the address at which the NFS server is bound.

These steps happen without involvement from the Shared Volume controller.
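The attach decision above can be sketched as a predicate on the requested access mode. This is an illustrative sketch only: the constant and function names are hypothetical, and the real CSI spec models access modes as protobuf enum values rather than strings.

```go
package main

import "fmt"

// Access modes modelled as strings for illustration; the CSI spec defines
// these as enum values in its protobuf definitions.
const (
	singleNodeWriter     = "SINGLE_NODE_WRITER"
	multiNodeMultiWriter = "MULTI_NODE_MULTI_WRITER"
)

// isSharedAttach reports whether a ControllerPublishVolume request's
// VolumeCapability access mode should result in a shared (NFS) attach.
func isSharedAttach(accessMode string) bool {
	return accessMode == multiNodeMultiWriter
}

func main() {
	fmt.Println(isSharedAttach(multiNodeMultiWriter)) // shared attach
	fmt.Println(isSharedAttach(singleNodeWriter))     // standard attach
}
```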
Shared Volumes must have a Service created with a `ClusterIP` that does not
change for the lifetime of the PVC. The `ClusterIP` is combined with a static
port (`2049`, the default NFS port) and set as the
`storageos.com/nfs/mount-endpoint` label on the StorageOS volume.
An Endpoint must also exist with the same name and namespace as the Service,
with the target set to the NFS server endpoint as defined in
`nfs.serviceEndpoint`.
Since a Shared Volume is tied to a specific PVC, the Service and Endpoint both use the PVC name and namespace.
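Combining the `ClusterIP` with the static port can be sketched as below. The function name is hypothetical, but the port value and label semantics follow the text above.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// defaultNFSPort is the static port combined with the Service ClusterIP to
// form the mount endpoint (2049 is the IANA-registered NFS port).
const defaultNFSPort = 2049

// mountEndpoint builds the value published to the
// storageos.com/nfs/mount-endpoint label from a Service's ClusterIP.
// Illustrative helper, not from the actual codebase.
func mountEndpoint(clusterIP string) string {
	return net.JoinHostPort(clusterIP, strconv.Itoa(defaultNFSPort))
}

func main() {
	fmt.Println(mountEndpoint("10.96.44.7")) // prints "10.96.44.7:2049"
}
```

Using `net.JoinHostPort` rather than plain string concatenation keeps the sketch correct for IPv6 ClusterIPs, which must be bracketed.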
Resources are checked for existence and equivalence before deciding whether a
create, update, or no action is required. When a resource is created, it is
re-fetched before proceeding. Since the resource may not appear in the
Kubernetes API immediately, the resource is polled every
`-k8s-create-poll-interval` (default `1s`) for up to
`-k8s-create-wait-duration` (default `20s`).
Only once the Kubernetes resources have been successfully re-evaluated does the
Service endpoint get published in the `storageos.com/nfs/mount-endpoint` label
on the StorageOS volume, and only if it differs from the existing value.
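The publish-only-if-different step can be sketched as below, with the volume's labels modelled as a plain map for illustration; the real volume API differs.

```go
package main

import "fmt"

const mountEndpointLabel = "storageos.com/nfs/mount-endpoint"

// publishEndpoint sets the mount-endpoint label only when the value has
// changed, and reports whether a write to the volume is needed.
// Hypothetical helper, not the controller's actual code.
func publishEndpoint(labels map[string]string, endpoint string) bool {
	if labels[mountEndpointLabel] == endpoint {
		return false // already up to date, no write needed
	}
	labels[mountEndpointLabel] = endpoint
	return true
}

func main() {
	labels := map[string]string{}
	fmt.Println(publishEndpoint(labels, "10.96.44.7:2049")) // first publish: true
	fmt.Println(publishEndpoint(labels, "10.96.44.7:2049")) // unchanged: false
}
```

Skipping the write when the value is unchanged keeps the reconcile idempotent and avoids needless calls to the StorageOS API.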
In normal operation the Service endpoint should not change: doing so invalidates client caches and leads to "Stale NFS filehandle" errors. The only likely cause would be if the Service was manually deleted.
After the CSI `ControllerPublishVolume` succeeds, it's likely that
`NodePublishVolume` will be called immediately to mount the volume into the
application container. This will not succeed until
`storageos.com/nfs/mount-endpoint` has been set by the Shared Volume
controller. The control plane uses the mount endpoint to remotely (or locally)
mount the shared volume.
When a StorageOS master volume fails over to another node, the NFS service is
restarted on that node and `nfs.serviceEndpoint` is updated to reflect the new
endpoint.
The shared volume control loop will either:

- ignore the volume, if it no longer has `nfs.serviceEndpoint` set, or
- see that the cached volume no longer matches due to a different
  `nfs.serviceEndpoint`, and trigger a resource re-evaluation.
The Service will be updated with the new target port and the Endpoint will be updated with the new address and port.
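The failover branch of the control loop can be sketched as a small decision function. The type, constant, and function names here are hypothetical, chosen only to mirror the two outcomes described above.

```go
package main

import "fmt"

// action is the outcome of reconciling one volume during failover.
type action int

const (
	ignore     action = iota // nfs.serviceEndpoint unset: no longer shared
	reEvaluate               // endpoint changed: reconcile Service/Endpoint
	noChange                 // cached state still matches
)

// reconcileAction compares the cached nfs.serviceEndpoint with the volume's
// current value and decides what the control loop should do.
func reconcileAction(cachedEndpoint, currentEndpoint string) action {
	switch {
	case currentEndpoint == "":
		return ignore
	case cachedEndpoint != currentEndpoint:
		return reEvaluate
	default:
		return noChange
	}
}

func main() {
	fmt.Println(reconcileAction("10.0.0.5:1234", "") == ignore)             // true
	fmt.Println(reconcileAction("10.0.0.5:1234", "10.0.0.9:5678") == reEvaluate) // true
}
```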
During failover and update, the Service endpoint (`<ClusterIP>:2049`) does not
change, but it will not respond until the Endpoint has been updated.
Services and Endpoints are automatically removed when the PVC is deleted. The PVC is set as the owner of the Service, so the Kubernetes garbage collector will delete the Service along with its Endpoint, which is automatically associated with the Service.
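The ownership wiring can be sketched as below. The struct mirrors the fields of Kubernetes' `metav1.OwnerReference` but is defined locally so the sketch is self-contained; the helper name and plumbing are illustrative.

```go
package main

import "fmt"

// ownerReference mirrors the relevant fields of Kubernetes'
// metav1.OwnerReference, defined locally for a self-contained sketch.
type ownerReference struct {
	APIVersion string
	Kind       string
	Name       string
	UID        string
}

// pvcOwnerRef builds the owner reference that ties the Service to its PVC,
// so the garbage collector removes the Service (and the Endpoint associated
// with it) once the PVC is deleted. Hypothetical helper.
func pvcOwnerRef(pvcName, pvcUID string) ownerReference {
	return ownerReference{
		APIVersion: "v1",
		Kind:       "PersistentVolumeClaim",
		Name:       pvcName,
		UID:        pvcUID,
	}
}

func main() {
	ref := pvcOwnerRef("my-shared-pvc", "1234-abcd")
	fmt.Println(ref.Kind, ref.Name) // prints "PersistentVolumeClaim my-shared-pvc"
}
```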
The following metrics are collected:

- `storageos_shared_volume_reconcile_duration_seconds`: distribution of the
  length of time taken to reconcile all shared volumes.