-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamically update the ddm_quota of tape rses #728
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I like that we will have a way to prioritise, 'largeness' vs 'freeness'.
It is kinda complicated to think what the math is doing. It is a hard problem to find a balance/single number.
What can help here is a table with numbers and plots to justify what the function and current constants do.
The table could be something like,
| TAPE_RSE | PLEDGE | FREE_SPACE | DDM_QUOTA |
and a plots that show,
pledge
vsddm_quota
free_space
vsddm_quota
@@ -90,28 +81,35 @@ def calculate_ddm_quotas(): | |||
if static == 0: | |||
continue # Skip if static is 0 | |||
|
|||
# Apply penalty for special rses | |||
rse_attributes = client.list_rse_attributes(rse) | |||
if "ddm_quota_penalty" in rse_attributes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add this penalty as a multiplier to the weights instead of the values. It makes it more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see two problems with this suggestion:
- It wouldn't be the same thing if you mean
static, rucio, expired = (x for x in (static, rucio, expired))
free = static - rucio
ddm_quota = static_weight * (static / total_static) + free_weight * \
(free / static) + expired_weight * (expired / static)
if "ddm_quota_penalty" in rse_attributes:
ddm_quota *= float(rse_attributes["ddm_quota_penalty"])
It yields different values.
- If we do it as you suggest, it becomes harder to explain what we're doing. We explain
ddm_quota_penalty
as the factor which reduces all space metrics of an RSE, e.g. if CERN has 100PB of static, 10PB of free, we'll treat it as if has 50PB of static and 5PB of free if the penalty value is 0.5. I don't know how to explain yours.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A penalty should not be used to say reduce the actual values to lower
rather to reduce the factor
.
So, you say, we use a smaller coefficient (as a penalty) compared to rest.
And not that I shrink the rse to a smaller one - that would not be clear actually - we aren't reducing the site space just giving it less weight.
Yes, it would end up in different numbers, given how the relative weights are calculated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the light of latest discussions and your comment, I suggest two things:
-
rename "ddm_quota" to "dm_weight". This will be used for tape placements only for now. @amaltaro already agreed with this and they will update their code such that they'll start using
dm_weight
soon. We'll coordinate before deployment indeed -
I suggest removing ddm_quota_penalty attribute and replace it with "override_factor" attribute which should be applied as follows:
dm_weight = (STATIC_WEIGHT * <relative-pledge> + FREE_WEIGHT * <relative-free-space>) * <override_factor>
With the current status quo, we can set the "override_factor" to a value more than 1 for FNAL Tape and a value lower than 1 for CERN Tape. Does it sounds reasonable? @dynamic-entropy
Thanks a lot Rahul, I liked reporting the values in a tabular format. I can do it as
I think the plot is outside of the scope of this PR. It only makes sense to do it in a time series format on Grafana. We can do it, but I don't know if it's worth it. Perhaps |
Ah apologies. I did not mean to track these numbers on grafana. |
Hi @dynamic-entropy I applied the changes we talked about. The normalization that you suggest doesn't work between 0, 100, since the sum is too large. It produces very small and similar values. So, I kept the previous normalization. I also removed
I'm setting https://gitlab.cern.ch/cmsdmops/Documentation/-/merge_requests/24 We need to link it to the code, once you merge it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have left some minor comments for changes.
Also, please rebase.
|
||
|
||
# Calculate dm_weights for tape rses | ||
run(rse_expression = "rse_type=TAPE&wmcore_output_tape=True\cms_type=test", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which tapes do not have wmcore_output_tape=True
?
If we already have wmcore_output_tape=True
as an attribute that defines if data goes or not on the tape. Then we do not need to explicitly set dm_weight
(or ddm_quota
) to 0
.
Also, do we not have cms_type=real
for the TAPE rses?
We shall prefer it for consistency with the disk expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CERN, JINR and MIT Tape aren't used for prod output and we're not setting dm_weight
for them. However, @amaltaro was asking to set the attribute for them as well: link to jira, but it's not clear whether this is strongly necessary, so I keep those RSEs untouched for now.
Also, do we not have cms_type=real for the TAPE rses?
We shall prefer it for consistency with the disk expression.
We do, I'll update it.
It would give numbers close to the fraction of free space. Ultimately keeping things even. If you still prefer it this way, simply add a small number (1 to 10) to the present calculation. Just don't let any site have 0 chances when not required explicitly. I only see your comment and the link to docs now. I guess you can merge the docs, that won't change things. |
…ssions consistent
Thanks Rahul, I applied your requested changes. For normalization, I made the minimum [1]
|
@haozturk All is good. I did not mean a value of Just squash the commits into one and force push. |
Thanks Rahul, can you not just "Squash and Merge" via Github? Since, I fixed the merge conflict via |
Hmm, let me try. I thought you would want to edit the commit message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Ok, done. |
Fixes #727
This PR includes changes which updates the ddm_quota of tape rses according to their relative free space and relative pledge. In addition, there is a special operation for CERN and FNAL tapes (configurable) which treats them smaller than they are to lower their weight for the compensation of their special asymmetric usage of RAW data.
If you run it as is, you'll get the following weights
It's possible to lower FNAL's and CERN's weights even more by lowering their
ddm_quota_penalty
, which are set as 0.75 and 0.5 at the moment. Additionally,static_weight
andfree_weight
are set as 0.5 each at the moment. It's possible to increase the weights of larger sites by increasingstatic_weight