Dynamically update the ddm_quota of tape rses #728

Merged on Apr 10, 2024 (5 commits)

Contributor

@haozturk haozturk commented Feb 27, 2024

Fixes #727

This PR updates the ddm_quota of tape RSEs according to their relative free space and relative pledge. In addition, there is a special (configurable) treatment for the CERN and FNAL tapes: they are treated as smaller than they are, which lowers their weight to compensate for their asymmetric usage for RAW data.

If you run it as is, you get the following weights:

DRY-RUN: Set ddm_quota for T1_IT_CNAF_Tape to 7
DRY-RUN: Set ddm_quota for T1_ES_PIC_Tape to 1000
DRY-RUN: Set ddm_quota for T1_US_FNAL_Tape to 223
DRY-RUN: Set ddm_quota for T1_DE_KIT_Tape to 78
DRY-RUN: Set ddm_quota for T1_FR_CCIN2P3_Tape to 0
DRY-RUN: Set ddm_quota for T0_CH_CERN_Tape to 140
DRY-RUN: Set ddm_quota for T1_UK_RAL_Tape to 296

FNAL's and CERN's weights can be lowered further by lowering their ddm_quota_penalty, currently set to 0.75 and 0.5 respectively. Additionally, static_weight and free_weight are currently set to 0.5 each; increasing static_weight increases the weights of larger sites.
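To make the role of the two weights concrete, here is a minimal sketch. It assumes the raw score has the form static_weight * (static / total_static) + free_weight * (free / static), as in the snippet quoted later in this thread. The site names and numbers are made up for illustration, and the final scaling to integer weights is left out.

```python
# Hypothetical sites: (pledged "static" space in PB, free space in PB).
SITES = {
    "BIG_FULL_Tape": (100.0, 10.0),    # large, but only 10% free
    "SMALL_EMPTY_Tape": (50.0, 40.0),  # smaller, but 80% free
}


def raw_scores(sites, static_weight, free_weight):
    """Combine relative pledge ("largeness") and relative free space ("freeness")."""
    total_static = sum(static for static, _ in sites.values())
    return {
        name: static_weight * (static / total_static) + free_weight * (free / static)
        for name, (static, free) in sites.items()
    }


balanced = raw_scores(SITES, static_weight=0.5, free_weight=0.5)
size_heavy = raw_scores(SITES, static_weight=0.8, free_weight=0.2)

for label, scores in (("balanced", balanced), ("size-heavy", size_heavy)):
    for name, score in scores.items():
        print(f"{label:>10}  {name:<18} {score:.3f}")
```

With equal weights the emptier site scores higher; shifting weight towards static_weight flips the order in favour of the larger site.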

Contributor

@dynamic-entropy dynamic-entropy left a comment

In general, I like that we will have a way to prioritise 'largeness' vs 'freeness'.
It is kind of hard to follow what the math is doing, though. Finding a balance in a single number is a hard problem.

What can help here is a table with numbers and plots to justify what the function and current constants do.
The table could be something like,

| TAPE_RSE | PLEDGE | FREE_SPACE | DDM_QUOTA |

and plots that show:

  • pledge vs ddm_quota
  • free_space vs ddm_quota

docker/CMSRucioClient/scripts/updateDDMQuota
@@ -90,28 +81,35 @@ def calculate_ddm_quotas():
        if static == 0:
            continue  # Skip if static is 0

        # Apply penalty for special rses
        rse_attributes = client.list_rse_attributes(rse)
        if "ddm_quota_penalty" in rse_attributes:
Contributor

Can you add this penalty as a multiplier to the weights instead of the values? That makes it clearer.

Contributor Author

I see two problems with this suggestion:

  1. It wouldn't be the same thing if you mean
        static, rucio, expired = (x for x in (static, rucio, expired))
        free = static - rucio
        ddm_quota = static_weight * (static / total_static) + free_weight * \
                    (free / static) + expired_weight * (expired / static)
        if "ddm_quota_penalty" in rse_attributes:        
            ddm_quota *= float(rse_attributes["ddm_quota_penalty"])

It yields different values.

  2. If we do it as you suggest, it becomes harder to explain what we're doing. We explain ddm_quota_penalty as the factor which reduces all space metrics of an RSE, e.g. if CERN has 100PB of static and 10PB of free space, we'll treat it as if it has 50PB of static and 5PB of free when the penalty value is 0.5. I don't know how to explain yours.
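The difference between the two placements of the penalty can be seen in a small sketch. It uses the CERN numbers from the comment above; the 400 PB total, the function names, and the decision to keep total_static fixed are assumptions made for illustration, and the expired term is omitted for brevity.

```python
STATIC_WEIGHT = 0.5
FREE_WEIGHT = 0.5


def score(static, free, total_static):
    """Raw score from relative pledge and relative free space (expired term omitted)."""
    return STATIC_WEIGHT * (static / total_static) + FREE_WEIGHT * (free / static)


def penalty_on_values(static, free, total_static, penalty):
    """Treat the RSE as if it were smaller: shrink its space metrics first."""
    return score(static * penalty, free * penalty, total_static)


def penalty_on_result(static, free, total_static, penalty):
    """Leave the space metrics alone and scale the final score instead."""
    return score(static, free, total_static) * penalty


# 100 PB static, 10 PB free, penalty 0.5; the 400 PB total is made up.
on_values = penalty_on_values(100.0, 10.0, 400.0, 0.5)
on_result = penalty_on_result(100.0, 10.0, 400.0, 0.5)
print(f"penalty on values: {on_values:.4f}")
print(f"penalty on result: {on_result:.4f}")
```

Scaling static and free by the same factor leaves free / static unchanged, so a penalty on the values only lowers the relative-pledge term, while a multiplier on the result lowers both terms. That is why the two variants give different numbers.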

Contributor

A penalty should not mean reducing the actual values; rather, it should reduce the factor.
That is, we use a smaller coefficient (as a penalty) for this RSE compared to the rest.
We should not say that we shrink the RSE to a smaller one. That would not be clear: we aren't reducing the site's space, just giving it less weight.

Yes, it would end up in different numbers, given how the relative weights are calculated.

Contributor Author

@haozturk haozturk Mar 22, 2024

In the light of latest discussions and your comment, I suggest two things:

  1. rename "ddm_quota" to "dm_weight". This will be used for tape placements only for now. @amaltaro already agreed with this and they will update their code such that they'll start using dm_weight soon. We'll coordinate before deployment indeed

  2. I suggest removing the ddm_quota_penalty attribute and replacing it with an "override_factor" attribute, which should be applied as follows:

dm_weight =  (STATIC_WEIGHT * <relative-pledge> + FREE_WEIGHT * <relative-free-space>) * <override_factor>

With the current status quo, we can set the "override_factor" to a value above 1 for FNAL Tape and below 1 for CERN Tape. Does that sound reasonable? @dynamic-entropy

Contributor Author

haozturk commented Mar 6, 2024

Thanks a lot Rahul, I liked reporting the values in a tabular format. I can do it as

| TAPE_RSE | PLEDGE (original) | FREE_SPACE (original) | PLEDGE (after penalty) | FREE_SPACE (after penalty) | DDM_QUOTA |

I think the plot is outside the scope of this PR. It only makes sense as a time series on Grafana. We can do it, but I don't know if it's worth it. Perhaps ddm_quota can be added to this panel [1] (preferably after it's migrated to the probes).

[1] https://monit-grafana.cern.ch/d/WPDxqoL4z/rucioaccounts?orgId=11&var-RSE=T1_ES_PIC_Tape&var-AccountType=All&from=now-7d&to=now&viewPanel=9&var-AccountName=All&var-ArchivalAccount=All&var-diskserviceaccount=All

@dynamic-entropy
Contributor

Ah apologies. I did not mean to track these numbers on grafana.
I simply meant we put them here (in the chat) to be able to see what our inputs are and what the output is.
And by plots, I meant a simple matplotlib (or excel) rendering (of the same table) to have a more visual representation of the input-output relation.

Contributor Author

haozturk commented Apr 8, 2024

Hi @dynamic-entropy I applied the changes we talked about. The normalization that you suggest doesn't work between 0, 100, since the sum is too large. It produces very small and similar values. So, I kept the previous normalization. I also removed override_ddm_quota attribute, since dm_weight_coefficient replaces that. Here's the dry-run results:

RSE                    PLEDGE (PB)    FREE SPACE (PB)    RELATIVE FREE (%)    DM WEIGHT COEFFICIENT    DM_WEIGHT
-------------------  -------------  -----------------  -------------------  -----------------------  -----------
T2_PT_NCG_Lisbon            0.5             0.0690387            13.8077                          1           10
T2_FR_IPHC                  2.2             0.27894              12.6791                          1           15
T2_PL_Cyfronet              0.4             0.0412895            10.3224                          1            3
T2_FR_GRIF                  2.832           0.579064             20.4472                          1           16
T2_IT_Pisa                  2.85            0.327395             11.4876                          1           25
T2_CH_CERN                 28.5            -0.258027             -0.905357                        1            0
T2_US_Florida               4.52            0.876801             19.3983                          1            6
T2_ES_IFCA                  0.7             0.10252              14.6458                          1            9
T2_AT_Vienna                0.5             0.0634887            12.6977                          1            4
T2_BR_UERJ                  0.21            0.122233             58.2062                          1           58
T2_US_Caltech               4.9             0.777826             15.874                           1            9
T2_RU_ITEP                  0.231           0.0294863            12.7646                          1            3
T2_IT_Bari                  3.05            1.1677               38.2853                          1           18
T1_UK_RAL_Disk              7.693           0.780169             10.1413                          1           25
T2_UK_SGrid_RALPP           1.8             0.251705             13.9836                          1            8
T2_US_Vanderbilt           11.2             2.66535              23.7978                          1            5
T2_DE_RWTH                  3               0.618512             20.6171                          1           16
T1_IT_CNAF_Disk            10               1.06531              10.6531                          1           17
T2_US_Nebraska              4.324           0.45741              10.5784                          1            3
T2_KR_KISTI                 1.2             0.141435             11.7863                          1           13
T2_US_MIT                   5.53            0.618379             11.1823                          1            8
T2_UA_KIPT                  1               0.0582457             5.82457                         1           12
T2_US_Wisconsin             3.9             0.539743             13.8396                          1            3
T2_BE_IIHE                  4.992           0.617938             12.3786                          1           17
T2_CN_Beijing               0.55            0.0553334            10.0606                          1            7
T2_UK_London_IC             6.3             0.800395             12.7047                          1           10
T2_DE_DESY                  6.5             0.756525             11.6388                          1           11
T2_IT_Rome                  2.35            0.475621             20.2392                          1           12
T2_RU_INR                   0.24            0.0238311             9.92964                         1            1
T2_FI_HIP                   2.175           0.370199             17.0206                          1           15
T2_FR_GRIF_IRFU             1.3             1.29085              99.2961                          1           98
T2_UK_London_Brunel         0.65            0.26212              40.3262                          1           20
T2_TW_NCHC                  0.7             0.101478             14.4969                          1            9
T2_CH_CSCS                  2.78            0.444761             15.9986                          1           14
T2_BE_UCL                   1.96            0.251748             12.8443                          1           10
T2_US_UCSD                  2.735           0.299978             10.9681                          1            2
T2_EE_Estonia               1.38            0.161557             11.7071                          1            8
T2_IT_Legnaro               3.75            0.731454             19.5055                          1           12
T2_IN_TIFR                  5.75            0.892409             15.5202                          1           30
T2_FR_GRIF_LLR              1.532           1.52876              99.7885                          1          100
T1_ES_PIC_Disk              4.1             0.606416             14.7906                          1           17
T2_PL_Swierk                0.63            0.0671292            10.6554                          1            8
T1_US_FNAL_Disk            36.95            1.97735               5.35141                         1            4
T2_US_Purdue                4.6825          0.487229             10.4053                          1            4
T2_TR_METU                  0.925           0.089409              9.66583                         1            6
T1_RU_JINR_Disk            10.6             1.92817              18.1903                          1           25
T2_ES_CIEMAT                4.25            0.509059             11.9779                          1           16
T1_DE_KIT_Disk             10.93            3.86501              35.3615                          1           32
T2_BR_SPRACE                2               0.474925             23.7462                          1           11
T2_RU_JINR                  1.57            0.210147             13.3851                          1           16
T2_RU_IHEP                  0.3             0.0330585            11.0195                          1            1
T2_UK_SGrid_Bristol         0.4             0.0319308             7.9827                          1           17
T1_FR_CCIN2P3_Disk          8               1.52812              19.1015                          1           17
T2_HU_Budapest              1.45            0.156867             10.8184                          1           13
T2_PK_NCP                   0.401           0.218931             54.5963                          1           66
RSE                   PLEDGE (PB)    FREE SPACE (PB)    RELATIVE FREE (%)    DM WEIGHT COEFFICIENT    DM_WEIGHT
------------------  -------------  -----------------  -------------------  -----------------------  -----------
T1_DE_KIT_Tape             38                4.51753             11.8882                       1             13
T1_IT_CNAF_Tape            41.08             2.31124              5.6262                       1              6
T1_FR_CCIN2P3_Tape         32.548            1.45373              4.46641                      1              0
T1_UK_RAL_Tape             24.424            6.93688             28.4019                       1             30
T1_US_FNAL_Tape           126.4             20.5856              16.2861                       1.4          100
T1_ES_PIC_Tape             17.2              7.59287             44.1446                       1             48
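
As a cross-check, the formula proposed earlier in the thread can be applied to the tape rows above. The sketch below assumes relative pledge = pledge / total pledge, relative free space = free / pledge, weights of 0.5 each, and a min-max scaling to 0..100 with truncation to an integer. The scaling step is a guess, since the script's normalization is not shown in this conversation, and the function names are hypothetical.

```python
STATIC_WEIGHT = 0.5
FREE_WEIGHT = 0.5

# (pledge in PB, free space in PB, dm_weight_coefficient), copied from the table above.
TAPE_RSES = {
    "T1_DE_KIT_Tape": (38.0, 4.51753, 1.0),
    "T1_IT_CNAF_Tape": (41.08, 2.31124, 1.0),
    "T1_FR_CCIN2P3_Tape": (32.548, 1.45373, 1.0),
    "T1_UK_RAL_Tape": (24.424, 6.93688, 1.0),
    "T1_US_FNAL_Tape": (126.4, 20.5856, 1.4),
    "T1_ES_PIC_Tape": (17.2, 7.59287, 1.0),
}


def raw_weights(rses):
    """(STATIC_WEIGHT * relative pledge + FREE_WEIGHT * relative free) * coefficient."""
    total_pledge = sum(pledge for pledge, _, _ in rses.values())
    return {
        name: (STATIC_WEIGHT * (pledge / total_pledge) + FREE_WEIGHT * (free / pledge)) * coeff
        for name, (pledge, free, coeff) in rses.items()
    }


def scale_to_percent(raw):
    """Min-max scale to 0..100 and truncate (assumed, not taken from the script)."""
    low, high = min(raw.values()), max(raw.values())
    return {name: int(100 * ((value - low) / (high - low))) for name, value in raw.items()}


dm_weights = scale_to_percent(raw_weights(TAPE_RSES))
for name, weight in dm_weights.items():
    print(f"{name:<20} {weight:>3}")
```

Under these assumptions the result matches the DM_WEIGHT column for the tape RSEs, including the 0 for T1_FR_CCIN2P3_Tape, which is simply the lowest raw score after min-max scaling.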

I'm setting dm_weight values for tape RSEs now, since MSOutput will start using them tomorrow. Let me know what you think of these changes. I think we don't need to over-engineer it. If there are points that you're not sure of, we can monitor it and update it later. This is the documentation:

https://gitlab.cern.ch/cmsdmops/Documentation/-/merge_requests/24

We need to link it to the code, once you merge it.

Contributor

@dynamic-entropy dynamic-entropy left a comment

I have left some minor comments for changes.

Also, please rebase.

docker/rucio_client/scripts/updateDDMQuota


# Calculate dm_weights for tape rses
run(rse_expression = "rse_type=TAPE&wmcore_output_tape=True\cms_type=test",
Contributor

Which tapes do not have wmcore_output_tape=True?
If wmcore_output_tape=True is already the attribute that defines whether data goes to a tape or not, then we do not need to explicitly set dm_weight (or ddm_quota) to 0.

Also, do we not have cms_type=real for the TAPE rses?
We shall prefer it for consistency with the disk expression.

Contributor Author

CERN, JINR and MIT Tape aren't used for prod output and we're not setting dm_weight for them. However, @amaltaro was asking to set the attribute for them as well: link to jira, but it's not clear whether this is strongly necessary, so I keep those RSEs untouched for now.

> Also, do we not have cms_type=real for the TAPE rses?
> We shall prefer it for consistency with the disk expression.

We do, I'll update it.
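
For readers unfamiliar with RSE expressions: in Rucio's expression syntax `&` is intersection, `|` is union and `\` is complement (set difference). The sketch below mimics the two variants discussed here with plain Python sets, assuming left-to-right evaluation. It does not call Rucio; the `matching` helper, the attribute values and the `T1_XX_Example_Tape_Test` RSE are hypothetical.

```python
# Illustrative attribute values; the real ones live in Rucio and may differ.
RSE_ATTRIBUTES = {
    "T1_DE_KIT_Tape": {"rse_type": "TAPE", "wmcore_output_tape": "True", "cms_type": "real"},
    "T0_CH_CERN_Tape": {"rse_type": "TAPE", "cms_type": "real"},
    "T1_DE_KIT_Disk": {"rse_type": "DISK", "cms_type": "real"},
    "T1_XX_Example_Tape_Test": {"rse_type": "TAPE", "wmcore_output_tape": "True", "cms_type": "test"},
}


def matching(key, value):
    """Hypothetical helper: names of RSEs whose attribute `key` equals `value`."""
    return {name for name, attrs in RSE_ATTRIBUTES.items() if attrs.get(key) == value}


tape = matching("rse_type", "TAPE")
output_tape = matching("wmcore_output_tape", "True")

# Variant in the diff: tape output RSEs minus test RSEs.
exclude_test = (tape & output_tape) - matching("cms_type", "test")

# Variant suggested in the review: tape output RSEs that are explicitly "real".
only_real = tape & output_tape & matching("cms_type", "real")

# A raw string keeps the backslash literal when the expression is passed on.
expression = r"rse_type=TAPE&wmcore_output_tape=True\cms_type=test"
print(sorted(exclude_test), sorted(only_real), expression)
```

The two variants select the same RSEs only as long as every RSE carries cms_type=real or cms_type=test; an RSE without a cms_type attribute would pass the first filter but not the second.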

@dynamic-entropy
Contributor

> The normalization that you suggest doesn't work between 0, 100, since the sum is too large. It produces very small and similar values.

It would give numbers close to the fraction of free space. Ultimately keeping things even.
Having too drastic differences only puts the load on a few sites at the same time.

If you still prefer it this way, simply add a small number (1 to 10) to the present calculation. Just don't let any site have 0 chances when not required explicitly.

I only see your comment and the link to docs now. I guess you can merge the docs, that won't change things.
And then add the link too.

Contributor Author

haozturk commented Apr 9, 2024

Thanks Rahul, I applied your requested changes. For normalization, I made the minimum dm_weight 1. I think this is okay for now. If we do value/sum(values), it generates the output in [1], which is not what we need. I also fixed the merge conflicts. Is anything still missing?

[1]

RSE                    PLEDGE (PB)    FREE SPACE (PB)    RELATIVE FREE (%)    DM WEIGHT COEFFICIENT    DM_WEIGHT
-------------------  -------------  -----------------  -------------------  -----------------------  -----------
T2_US_Nebraska              4.324           0.455954              10.5447                         1            0
T2_RU_INR                   0.24            0.0241191             10.0496                         1            0
T2_US_Purdue                4.6825          0.636235              13.5875                         1            0
T2_BR_UERJ                  0.21            0.122232              58.2057                         0            0
T2_FR_IPHC                  2.2             0.277428              12.6104                         1            1
T2_FI_HIP                   2.175           0.368057              16.9221                         1            1
T2_US_Vanderbilt           11.2             2.74783               24.5342                         1            0
T2_UA_KIPT                  1               0.0573702              5.73702                        1            1
T1_US_FNAL_Disk            36.95            2.15713                5.83796                        1            0
T2_DE_DESY                  6.5             0.753922              11.5988                         1            1
T1_FR_CCIN2P3_Disk          8               1.52518               19.0648                         1            1
T1_RU_JINR_Disk            10.6             1.92432               18.154                          1            2
T2_RU_IHEP                  0.3             0.0330585             11.0195                         0            0
T2_CN_Beijing               0.55            0.0548313              9.96933                        1            0
T2_IN_TIFR                  5.75            0.889611              15.4715                         1            3
T2_KR_KISTI                 1.2             0.140466              11.7055                         1            1
T2_EE_Estonia               1.38            0.159888              11.5861                         1            1
T2_IT_Pisa                  2.85            0.32497               11.4025                         1            2
T2_CH_CSCS                  2.78            0.442781              15.9274                         1            1
T2_ES_CIEMAT                4.25            0.505959              11.9049                         1            1
T2_BR_SPRACE                2               0.603086              30.1543                         1            2
T2_CH_CERN                 30.5             1.73296                5.68183                        1            0
T2_HU_Budapest              1.45            0.156043              10.7616                         1            1
T2_BE_IIHE                  4.992           0.614046              12.3006                         1            1
T2_FR_GRIF                  2.832           0.577513              20.3924                         1            1
T2_IT_Rome                  2.35            0.474096              20.1743                         1            1
T2_RU_JINR                  1.57            0.208562              13.2842                         1            1
T2_BE_UCL                   1.96            0.249846              12.7472                         1            1
T2_UK_SGrid_RALPP           1.8             0.24945               13.8583                         1            0
T1_DE_KIT_Disk             10.93            3.86151               35.3295                         1            3
T2_TR_METU                  0.925           0.0889149              9.61243                        1            0
T2_ES_IFCA                  0.7             0.100259              14.3227                         1            1
T2_US_Wisconsin             3.9             0.687359              17.6246                         1            0
T2_IT_Bari                  3.05            1.16349               38.1473                         1            1
T2_PL_Cyfronet              0.4             0.0410616             10.2654                         1            0
T1_ES_PIC_Disk              4.1             0.593401              14.4732                         1            1
T2_FR_GRIF_IRFU             1.3             1.29085               99.2961                         1           10
T2_TW_NCHC                  0.7             0.100549              14.3642                         1            1
T1_UK_RAL_Disk              7.693           0.796345              10.3515                         1            2
T2_DE_RWTH                  3               0.614908              20.4969                         1            1
T2_PK_NCP                   0.401           0.218931              54.5963                         1            7
T2_UK_London_IC             6.3             0.797182              12.6537                         1            1
T2_US_MIT                   5.53            0.614398              11.1103                         1            0
T2_PT_NCG_Lisbon            0.5             0.0682805             13.6561                         1            1
T2_UK_SGrid_Bristol         0.4             0.0319308              7.9827                         1            1
T2_US_UCSD                  2.735           0.298285              10.9062                         1            0
T2_US_Florida               4.52            1.06341               23.5267                         1            1
T2_PL_Swierk                0.63            0.06647               10.5508                         1            0
T1_IT_CNAF_Disk            10               1.06214               10.6214                         1            1
T2_AT_Vienna                0.5             0.0623828             12.4766                         1            0
T2_FR_GRIF_LLR              1.532           1.52876               99.7885                         1           10
T2_RU_ITEP                  0.231           0.0292225             12.6504                         1            0
T2_US_Caltech               4.9             0.890109              18.1655                         1            1
T2_UK_London_Brunel         0.65            0.258093              39.7066                         1            2
T2_IT_Legnaro               3.75            0.729108              19.4429                         1            1 
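
The difference between the two normalizations can be sketched with a few made-up raw scores (hypothetical values and function names, not taken from the script). With value/sum(values) the shares add up to 100, so with N sites the average share is only 100/N, which is consistent with the 0s and 1s in the table above. Min-max scaling with a floor of 1 spreads the values over the full range while keeping every site eligible.

```python
def share_of_sum(scores):
    """value / sum(values) as an integer percent; shares add up to roughly 100."""
    total = sum(scores)
    return [int(100 * (s / total)) for s in scores]


def min_max_with_floor(scores, floor=1):
    """Min-max scale to 0..100, then lift anything below `floor` up to it."""
    low, high = min(scores), max(scores)
    return [max(floor, int(100 * ((s - low) / (high - low)))) for s in scores]


# 50 made-up raw scores spread evenly between 0.05 and 0.54.
raw = [0.05 + 0.01 * i for i in range(50)]

shares = share_of_sum(raw)
scaled = min_max_with_floor(raw)
print("share of sum:   ", min(shares), "to", max(shares))
print("min-max + floor:", min(scaled), "to", max(scaled))
```

In this toy input the sum-based shares stay in the low single digits, while the min-max variant uses the whole 1..100 range.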

@dynamic-entropy
Contributor

@haozturk All is good. I did not mean a value of dm_weight/total. The calculation of the weights should itself have given a value between zero and 1 (they need not sum up to 1).
However, if you are happy with the results of this calculation, I am okay to merge it.
I would have preferred to wrap up the whole calculation into a single function; however, this doesn't matter much if we see data distribution as expected.

Just squash the commits into one and force push.
I will merge it.

Contributor Author

haozturk commented Apr 9, 2024

Thanks Rahul, can you not just "Squash and Merge" via GitHub? Since I fixed the merge conflict via git merge instead of git rebase, I cannot simply squash the last 5 commits; I would need to cherry-pick them. It's easier to do it via GitHub.

@dynamic-entropy
Contributor

Hmm, let me try. I thought you would want to edit the commit message.

Contributor

@dynamic-entropy dynamic-entropy left a comment

LGTM

@dynamic-entropy dynamic-entropy merged commit d38e167 into dmwm:master Apr 10, 2024
@dynamic-entropy
Contributor

Ok, done.
I did get an option to modify the commit message.

Successfully merging this pull request may close these issues.

Enhancement: Update ddm_quota automatically for TAPE rses as well
2 participants