diff --git a/docs/design/overcommit-plugin-enhancement.md b/docs/design/overcommit-plugin-enhancement.md
new file mode 100644
index 00000000000..0c4fed1980b
--- /dev/null
+++ b/docs/design/overcommit-plugin-enhancement.md
@@ -0,0 +1,107 @@
+# overcommit-plugin enhancements
+
+[@googs1025](https://github.com/googs1025); Jul. 29, 2024
+## Introduction
+Currently, the overcommit-plugin scales up the Allocatable resources of a node by a single overcommit-factor to implement AddJobEnqueuedFn. However, different resources call for different factors, so applying the same overcommit-factor to every resource is not appropriate.
+- For example:
+
+The Binpack plugin already assigns different weights to different resources.
+
+
+```yaml
+actions: "enqueue, reclaim, allocate, backfill, preempt"
+tiers:
+- plugins:
+  - name: binpack
+    arguments:
+      binpack.weight: 10
+      binpack.cpu: 5
+      binpack.memory: 1
+      binpack.resources: nvidia.com/gpu, example.com/foo
+      binpack.resources.nvidia.com/gpu: 2
+      binpack.resources.example.com/foo: 3
+```
+
+## Solution
+We can break the overcommit-factor down into more granular components: cpu-overcommit-factor, mem-overcommit-factor, and other-overcommit-factor.
+To maintain compatibility with the existing approach, we will retain the original overcommit-factor field and introduce cpu-overcommit-factor, mem-overcommit-factor, and other-overcommit-factor as optional fields.
+
+The priority of these fields, from low to high, is:
+
+`defaultOverCommitFactor -> overcommit-factor -> cpu-overcommit-factor, mem-overcommit-factor, other-overcommit-factor`
+
+
+- overcommitPlugin struct
+
+```go
+type overcommitFactors struct {
+	cpu    float64 // from cpu-overcommit-factor
+	memory float64 // from mem-overcommit-factor
+	other  float64 // from other-overcommit-factor
+}
+
+type overcommitPlugin struct {
+	// Arguments given for the plugin
+	pluginArguments   framework.Arguments
+	totalResource     *api.Resource
+	idleResource      *api.Resource
+	inqueueResource   *api.Resource
+	overCommitFactors overcommitFactors
+}
+```
+
+#### Example
+
+Example 1:
+Specifying all three of cpu-overcommit-factor, mem-overcommit-factor, and other-overcommit-factor: each resource uses its own value.
+```yaml
+actions: "enqueue, allocate, backfill"
+tiers:
+- plugins:
+  - name: overcommit
+    arguments:
+      cpu-overcommit-factor: 1.2
+      mem-overcommit-factor: 1.0
+      other-overcommit-factor: 1.2
+```
+
+Example 2:
+Specifying only overcommit-factor: all three factors take that value.
+```yaml
+actions: "enqueue, allocate, backfill"
+tiers:
+- plugins:
+  - name: overcommit
+    arguments:
+      overcommit-factor: 1.0
+```
+Example 3:
+Specifying any of cpu-overcommit-factor, mem-overcommit-factor, or other-overcommit-factor together with overcommit-factor: the specified resource uses its own value, while the remaining resources use overcommit-factor.
+```yaml
+actions: "enqueue, allocate, backfill"
+tiers:
+- plugins:
+  - name: overcommit
+    arguments:
+      cpu-overcommit-factor: 1.2
+      overcommit-factor: 1.0
+```
+Example 4:
+Specifying any of cpu-overcommit-factor, mem-overcommit-factor, or other-overcommit-factor without overcommit-factor: the specified resource uses its own value, while the remaining resources use defaultOverCommitFactor.
+```yaml
+actions: "enqueue, allocate, backfill"
+tiers:
+- plugins:
+  - name: overcommit
+    arguments:
+      cpu-overcommit-factor: 1.2
+```
+Example 5:
+Specifying no arguments: all three factors use defaultOverCommitFactor.
+```yaml
+actions: "enqueue, allocate, backfill"
+tiers:
+- plugins:
+  - name: overcommit
+    arguments:
+```
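
The fallback order described in the design doc (`defaultOverCommitFactor -> overcommit-factor -> per-resource factors`) can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: `resolveFactors` is a hypothetical helper, the arguments are modeled as a plain `map[string]float64` rather than `framework.Arguments`, and the value `1.2` for `defaultOverCommitFactor` is an assumption made for the sketch.

```go
package main

import "fmt"

// defaultOverCommitFactor is assumed to be 1.2 for this sketch;
// the design doc does not state its value.
const defaultOverCommitFactor = 1.2

// overcommitFactors mirrors the struct proposed in the design doc.
type overcommitFactors struct {
	cpu    float64
	memory float64
	other  float64
}

// resolveFactors is a hypothetical helper that applies the priority
// order from low to high: the default, then overcommit-factor, then
// the per-resource factors.
func resolveFactors(args map[string]float64) overcommitFactors {
	// Lowest priority: the built-in default.
	base := defaultOverCommitFactor
	// Next: the legacy overcommit-factor, if present.
	if v, ok := args["overcommit-factor"]; ok {
		base = v
	}
	f := overcommitFactors{cpu: base, memory: base, other: base}
	// Highest priority: each per-resource factor overrides the base.
	if v, ok := args["cpu-overcommit-factor"]; ok {
		f.cpu = v
	}
	if v, ok := args["mem-overcommit-factor"]; ok {
		f.memory = v
	}
	if v, ok := args["other-overcommit-factor"]; ok {
		f.other = v
	}
	return f
}

func main() {
	// Mirrors Example 3: cpu gets its own factor, the rest fall back
	// to overcommit-factor.
	f := resolveFactors(map[string]float64{
		"cpu-overcommit-factor": 1.2,
		"overcommit-factor":     1.0,
	})
	fmt.Printf("%+v\n", f)
}
```

Resolving the shared base value first and then overriding per resource keeps the five examples in the doc as cases of one rule, so existing configurations that set only `overcommit-factor` behave as before.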