docs: overcommit-plugin enhancements #3634
Why not put this in the existing overcommit document instead of creating a new one?
I would like to, but I can't seem to find any design documentation related to the overcommit plugin.
You can name your document overcommit-plugin.md
I can change it. I named it overcommit-plugin-enhancements because I did not design the overcommit plugin myself and may not know all of its background.
/kind docs
Never mind, the overcommit plugin is simple, and I believe you can understand it completely. This also fills in documentation that is missing in the community.
Thanks! Done.
If you have time, please take a look at this issue. @lowang-bh @Monokaix thanks a lot
docs/design/overcommit-plugin.md
arguments:
  cpu-overcommit-factor: 1.2
  mem-overcommit-factor: 1.0
  other-overcommit-factor: 1.2
In which scenario would other resources need to be overcommitted? Does GPU support overcommit?
In my understanding, GPU resources should not be overcommitted. (Please correct me if I am wrong.) Originally, this plugin multiplied all resources by one fixed overcommit factor. This proposal enhances that: not every resource should share the same setting, and the overcommit factor should be configurable per resource.
This effectively separates incompressible resources from compressible resources and sets them differently.
The overcommit plugin is activated when a job is enqueued, and it allows more jobs to enter the Inqueue state by amplifying node resources with the overcommit factor.
Please refer to |
Signed-off-by: googs1025 <[email protected]>
Fix: #3635
overcommit-plugin
@googs1025; Jul. 29, 2024
Background:
The overcommit-plugin amplifies the allocatable resources of nodes so that more jobs can be admitted when they are enqueued.
Objective:
Use different amplification factors based on different resource types.
Introduction
Currently, the overcommit-plugin amplifies the Allocatable resources of a node by a single overcommit-factor to implement AddJobEnqueuedFn. However, different resources should have different factors, so applying the same overcommit-factor to every resource is not appropriate.
The Binpack plugin follows a similar idea: it assigns different weights to different resources.
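The difference between one shared factor and per-resource factors can be sketched in Go as follows. This is only an illustration under stated assumptions: the helper name `amplify`, the plain `map[string]float64` representation of resources, and the sample quantities are hypothetical and are not taken from the plugin's source.

```go
package main

import "fmt"

// amplify multiplies each allocatable quantity by the factor configured for
// that resource name, and uses fallback when no specific factor is given.
// Hypothetical helper for illustration; not the plugin's actual code.
func amplify(allocatable, factors map[string]float64, fallback float64) map[string]float64 {
	out := make(map[string]float64, len(allocatable))
	for name, quantity := range allocatable {
		factor, ok := factors[name]
		if !ok {
			factor = fallback
		}
		out[name] = quantity * factor
	}
	return out
}

func main() {
	// Sample node capacity (made-up numbers).
	allocatable := map[string]float64{"cpu": 8, "memory": 32, "nvidia.com/gpu": 2}

	// Per-resource factors: overcommit CPU, leave memory and GPU untouched.
	factors := map[string]float64{"cpu": 1.5, "memory": 1.0, "nvidia.com/gpu": 1.0}

	amplified := amplify(allocatable, factors, 1.2)
	fmt.Println(amplified["cpu"], amplified["memory"], amplified["nvidia.com/gpu"])
}
```

With a single shared factor, the GPU count in this sample would be amplified together with CPU, which is the behavior the proposal wants to make avoidable.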
Solution
We can further break down the overcommit-factor into more granular components of the form overcommit-factor.<resource name>. For example:
overcommit-factor.cpu
overcommit-factor.memory
overcommit-factor.pods
overcommit-factor.ephemeral-storage
overcommit-factor.nvidia.com/gpu
To maintain compatibility with the existing approach, we will keep the original overcommit-factor field and introduce an optional field of the form overcommit-factor.<resource name>. The priority of these fields, from low to high, is:
defaultOverCommitFactor -> overcommit-factor -> overcommit-factor.<resource name>
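The priority order above can be sketched as a small lookup in Go. This is a minimal illustration, not the plugin's implementation: the function name `resolveFactor`, the flat `map[string]float64` standing in for the plugin arguments, and the value 1.2 used for `defaultOverCommitFactor` are assumptions made for the example.

```go
package main

import "fmt"

// defaultOverCommitFactor is the built-in fallback; 1.2 is an assumed value
// for this sketch.
const defaultOverCommitFactor = 1.2

// factorKey is the argument name of the shared factor; per-resource fields
// are assumed to be spelled factorKey + "." + resource name.
const factorKey = "overcommit-factor"

// resolveFactor returns the factor for one resource, checking the most
// specific field first: overcommit-factor.<resource name>, then
// overcommit-factor, then the built-in default.
// Hypothetical helper for illustration only.
func resolveFactor(args map[string]float64, resource string) float64 {
	if v, ok := args[factorKey+"."+resource]; ok {
		return v
	}
	if v, ok := args[factorKey]; ok {
		return v
	}
	return defaultOverCommitFactor
}

func main() {
	// CPU has its own factor; every other resource falls back to the
	// shared overcommit-factor.
	args := map[string]float64{
		"overcommit-factor":     1.3,
		"overcommit-factor.cpu": 1.5,
	}
	for _, resource := range []string{"cpu", "memory", "nvidia.com/gpu"} {
		fmt.Printf("%s -> %v\n", resource, resolveFactor(args, resource))
	}
}
```

Checking the most specific key first keeps existing configurations working unchanged, since a configuration that only sets overcommit-factor resolves exactly as before.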
Example
Example 1:
Explicitly specify all the overcommit factors
Example 2:
Specifying only overcommit-factor applies the same factor to all resources.
Example 3:
Specifying overcommit-factor.cpu and overcommit-factor.nvidia.com/gpu together with overcommit-factor means that those two resources use their specific values, while all other resources use the overcommit-factor field.
Example 4:
Specifying only a per-resource field such as overcommit-factor.cpu means that this resource uses its specific value, while all other resources use the defaultOverCommitFactor value.
Example 5:
Specifying no factor at all means that every resource uses the defaultOverCommitFactor value.