docs: overcommit-plugin enhancements
Signed-off-by: googs1025 <[email protected]>
Showing 3 changed files with 117 additions and 0 deletions.
@@ -0,0 +1,117 @@
# overcommit-plugin

[@googs1025](https://github.com/googs1025); Jul. 29, 2024

### Background:
The overcommit-plugin amplifies a node's Allocatable resources by an overcommit factor, and the amplified amount is used when deciding whether a job can be enqueued.
### Objective:
Use different amplification factors for different resource types.

## Introduction
Currently, the overcommit-plugin amplifies the Allocatable resources of a node by a single overcommit-factor in order to implement AddJobEnqueuedFn. However, different resource types should be able to use different factors, so applying the same overcommit-factor to every resource is not appropriate.
![factor](images/overcommit-plugin.png)

For example, the Binpack plugin already assigns different weights to different resources:

```yaml
actions: "enqueue, reclaim, allocate, backfill, preempt"
tiers:
- plugins:
  - name: binpack
    arguments:
      binpack.weight: 10
      binpack.cpu: 5
      binpack.memory: 1
      binpack.resources: nvidia.com/gpu, example.com/foo
      binpack.resources.nvidia.com/gpu: 2
      binpack.resources.example.com/foo: 3
```
## Solution
We can break the overcommit-factor down into more granular components: cpu-overcommit-factor, mem-overcommit-factor, and other-overcommit-factor.
To stay compatible with the existing configuration, the original overcommit-factor field is retained, and cpu-overcommit-factor, mem-overcommit-factor, and other-overcommit-factor are introduced as optional fields.
![factors](images/overcommit-plugin-with-multi-factors.png)
The fields take precedence in the following order, from lowest to highest priority:
`defaultOverCommitFactor -> overcommit-factor -> cpu-overcommit-factor, mem-overcommit-factor, other-overcommit-factor`

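The precedence chain above can be sketched as a small resolution function. This is only an illustration, not the plugin's actual code: `resolveFactors` is a hypothetical helper, the plugin arguments are modeled as a plain `map[string]float64`, and the value `1.2` for `defaultOverCommitFactor` is an assumption, since this document does not state the default.

```go
package main

import "fmt"

// Argument keys named in this proposal.
const (
	overCommitFactorKey      = "overcommit-factor"
	cpuOverCommitFactorKey   = "cpu-overcommit-factor"
	memOverCommitFactorKey   = "mem-overcommit-factor"
	otherOverCommitFactorKey = "other-overcommit-factor"

	// Assumption: illustrative default only; the document does not give the value.
	defaultOverCommitFactor = 1.2
)

type overcommitFactors struct {
	cpu    float64
	memory float64
	other  float64
}

// resolveFactors is a hypothetical helper. args stands in for the plugin
// arguments, already parsed into float64 values.
func resolveFactors(args map[string]float64) overcommitFactors {
	// Lowest priority: the built-in default.
	base := defaultOverCommitFactor
	// Next: the original overcommit-factor field, if present.
	if v, ok := args[overCommitFactorKey]; ok {
		base = v
	}
	f := overcommitFactors{cpu: base, memory: base, other: base}
	// Highest priority: the optional per-resource factors.
	if v, ok := args[cpuOverCommitFactorKey]; ok {
		f.cpu = v
	}
	if v, ok := args[memOverCommitFactorKey]; ok {
		f.memory = v
	}
	if v, ok := args[otherOverCommitFactorKey]; ok {
		f.other = v
	}
	return f
}

func main() {
	// CPU gets its own factor; memory and other resources fall back
	// to overcommit-factor.
	f := resolveFactors(map[string]float64{
		cpuOverCommitFactorKey: 1.2,
		overCommitFactorKey:    1.0,
	})
	fmt.Printf("cpu=%.1f memory=%.1f other=%.1f\n", f.cpu, f.memory, f.other)
	// prints "cpu=1.2 memory=1.0 other=1.0"
}
```

Resolving the shared base value first and then overriding per resource keeps the backward-compatible path (only overcommit-factor set) identical to the single-factor behavior.
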
- overcommitPlugin struct

```go
// overcommitFactors holds one amplification factor per resource type.
type overcommitFactors struct {
	cpu    float64 // from cpu-overcommit-factor
	memory float64 // from mem-overcommit-factor
	other  float64 // from other-overcommit-factor
}

type overcommitPlugin struct {
	// Arguments given for the plugin
	pluginArguments   framework.Arguments
	totalResource     *api.Resource
	idleResource      *api.Resource
	inqueueResource   *api.Resource
	overCommitFactors overcommitFactors
}
```

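To make the role of the three factors concrete, here is a minimal sketch of applying them to a node's resources. It makes several assumptions: `resource` is a simplified, hypothetical stand-in for `api.Resource`, `amplify` is an illustrative helper rather than a function from the scheduler, and the `other` factor is assumed to apply to every resource that is neither CPU nor memory (for example `nvidia.com/gpu`).

```go
package main

import "fmt"

type overcommitFactors struct {
	cpu    float64
	memory float64
	other  float64
}

// resource is a simplified, hypothetical stand-in for the scheduler's
// resource type; it is not the real api.Resource.
type resource struct {
	milliCPU float64
	memory   float64            // bytes
	scalars  map[string]float64 // everything else, e.g. "nvidia.com/gpu"
}

// amplify returns a copy of r with each dimension multiplied by the
// factor for its resource type. Assumption: the "other" factor covers
// all scalar resources.
func amplify(r resource, f overcommitFactors) resource {
	out := resource{
		milliCPU: r.milliCPU * f.cpu,
		memory:   r.memory * f.memory,
		scalars:  make(map[string]float64, len(r.scalars)),
	}
	for name, quantity := range r.scalars {
		out.scalars[name] = quantity * f.other
	}
	return out
}

func main() {
	total := resource{
		milliCPU: 8000,
		memory:   16 * 1024 * 1024 * 1024,
		scalars:  map[string]float64{"nvidia.com/gpu": 4},
	}
	// Overcommit CPU and GPUs, but leave memory at its real capacity.
	amplified := amplify(total, overcommitFactors{cpu: 1.5, memory: 1.0, other: 2.0})
	fmt.Println(amplified.milliCPU, amplified.memory, amplified.scalars["nvidia.com/gpu"])
}
```

Returning a new value instead of mutating the input keeps the node's real capacity available for any other calculation that needs it.
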
#### Example

Example 1:
Specify all three values (cpu-overcommit-factor, mem-overcommit-factor, and other-overcommit-factor) at the same time.
```yaml
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: overcommit
    arguments:
      cpu-overcommit-factor: 1.2
      mem-overcommit-factor: 1.0
      other-overcommit-factor: 1.2
```

Example 2:
Specify only overcommit-factor. All three factors then take the same value.
```yaml
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: overcommit
    arguments:
      overcommit-factor: 1.0
```

Example 3:
Specify one or more of cpu-overcommit-factor, mem-overcommit-factor, and other-overcommit-factor together with overcommit-factor. Each resource that has its own factor uses that value, and the remaining resources use overcommit-factor. Here CPU uses 1.2, while memory and other resources use 1.0.
```yaml
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: overcommit
    arguments:
      cpu-overcommit-factor: 1.2
      overcommit-factor: 1.0
```

Example 4:
Specify one or more of cpu-overcommit-factor, mem-overcommit-factor, and other-overcommit-factor without overcommit-factor. Each resource that has its own factor uses that value, and the remaining resources use defaultOverCommitFactor. Here CPU uses 1.2, while memory and other resources use the default.
```yaml
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: overcommit
    arguments:
      cpu-overcommit-factor: 1.2
```

Example 5:
Specify no factor at all. Every resource then uses defaultOverCommitFactor.
```yaml
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: overcommit
    arguments:
```