Dynamically update the ddm_quota of tape rses #728

haozturk · 2024-02-27T14:12:31Z

Fixes #727

This PR updates the ddm_quota of tape RSEs according to their relative free space and relative pledge. In addition, there is a special (configurable) operation for the CERN and FNAL tapes which treats them as smaller than they are, lowering their weight to compensate for their special, asymmetric usage for RAW data.

If you run it as is, you'll get the following weights:

DRY-RUN: Set ddm_quota for T1_IT_CNAF_Tape to 7
DRY-RUN: Set ddm_quota for T1_ES_PIC_Tape to 1000
DRY-RUN: Set ddm_quota for T1_US_FNAL_Tape to 223
DRY-RUN: Set ddm_quota for T1_DE_KIT_Tape to 78
DRY-RUN: Set ddm_quota for T1_FR_CCIN2P3_Tape to 0
DRY-RUN: Set ddm_quota for T0_CH_CERN_Tape to 140
DRY-RUN: Set ddm_quota for T1_UK_RAL_Tape to 296

It's possible to lower FNAL's and CERN's weights even more by lowering their ddm_quota_penalty values, which are set to 0.75 and 0.5 at the moment. Additionally, static_weight and free_weight are set to 0.5 each at the moment. It's possible to increase the weights of larger sites by increasing static_weight.

dynamic-entropy

In general, I like that we will have a way to prioritise 'largeness' vs 'freeness'.
It is kind of complicated to work out what the math is doing. Finding a balance in a single number is a hard problem.

What can help here is a table with numbers and plots to justify what the function and current constants do.
The table could be something like,

| TAPE_RSE | PLEDGE | FREE_SPACE | DDM_QUOTA |

and plots that show:

pledge vs ddm_quota
free_space vs ddm_quota

docker/CMSRucioClient/scripts/updateDDMQuota

dynamic-entropy · 2024-03-05T12:19:34Z

@@ -90,28 +81,35 @@ def calculate_ddm_quotas():
        if static == 0:
            continue  # Skip if static is 0

+        # Apply penalty for special rses
+        rse_attributes = client.list_rse_attributes(rse)
+        if "ddm_quota_penalty" in rse_attributes:


Can you add this penalty as a multiplier to the weights instead of to the values? That makes it clearer.

I see two problems with this suggestion:

It wouldn't be the same thing if you mean

    static, rucio, expired = (x for x in (static, rucio, expired))
    free = static - rucio
    ddm_quota = static_weight * (static / total_static) + free_weight * \
        (free / static) + expired_weight * (expired / static)
    if "ddm_quota_penalty" in rse_attributes:
        ddm_quota *= float(rse_attributes["ddm_quota_penalty"])

It yields different values.

If we do it as you suggest, it becomes harder to explain what we're doing. We explain ddm_quota_penalty as the factor which reduces all space metrics of an RSE, e.g. if CERN has 100PB of static and 10PB of free, we'll treat it as if it has 50PB of static and 5PB of free when the penalty value is 0.5. I don't know how to explain yours.
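
To make the difference between the two variants concrete, here is a minimal sketch using the CERN numbers from the example above (100PB static, 10PB free, penalty 0.5). The total pledge of 400PB, the function names, and the choice to leave total_static unscaled are assumptions made for illustration, and the expired term is left out.

```python
STATIC_WEIGHT = 0.5
FREE_WEIGHT = 0.5


def quota_penalty_on_values(static, free, total_static, penalty):
    """Shrink the RSE's space metrics first, then compute the raw quota."""
    static *= penalty
    free *= penalty
    return STATIC_WEIGHT * (static / total_static) + FREE_WEIGHT * (free / static)


def quota_penalty_on_result(static, free, total_static, penalty):
    """Compute the raw quota from the real metrics, then scale the result."""
    raw = STATIC_WEIGHT * (static / total_static) + FREE_WEIGHT * (free / static)
    return raw * penalty


# Hypothetical total pledge of 400PB across all tape RSEs.
on_values = quota_penalty_on_values(100.0, 10.0, 400.0, 0.5)
on_result = quota_penalty_on_result(100.0, 10.0, 400.0, 0.5)
print(on_values, on_result)
```

Under these assumptions the two results differ because free / static is unchanged when both are scaled by the same penalty: shrinking the values only reduces the pledge term, while multiplying the result reduces both terms.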

A penalty should not reduce the actual values; rather, it should reduce the factor.
So you say that we use a smaller coefficient (as a penalty) for these RSEs compared to the rest,
and not that we shrink the RSE to a smaller one. The latter would actually be unclear: we aren't reducing the site's space, just giving it less weight.

Yes, it would end up in different numbers, given how the relative weights are calculated.

In light of the latest discussions and your comment, I suggest two things:

Rename "ddm_quota" to "dm_weight". This will be used for tape placements only for now. @amaltaro already agreed with this, and they will update their code to start using dm_weight soon. We'll coordinate before deployment.

Remove the ddm_quota_penalty attribute and replace it with an "override_factor" attribute, which should be applied as follows:

dm_weight = (STATIC_WEIGHT * <relative-pledge> + FREE_WEIGHT * <relative-free-space>) * <override_factor>

Given the current status quo, we can set "override_factor" to a value greater than 1 for FNAL Tape and a value lower than 1 for CERN Tape. Does that sound reasonable? @dynamic-entropy
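
As a rough illustration of the proposed formula, the sketch below computes the raw (not yet normalized) weight. The function name, the default weights of 0.5, and the input numbers are assumptions for illustration; relative pledge is taken as pledge / total pledge and relative free space as free / pledge.

```python
STATIC_WEIGHT = 0.5  # assumed value, matching the weights mentioned earlier
FREE_WEIGHT = 0.5


def raw_dm_weight(pledge, free, total_pledge, override_factor=1.0):
    """Apply the proposed formula; the override factor scales the whole sum."""
    relative_pledge = pledge / total_pledge
    relative_free = free / pledge
    return (STATIC_WEIGHT * relative_pledge + FREE_WEIGHT * relative_free) * override_factor


# Hypothetical RSE: 100PB pledge, 10PB free, 400PB pledged in total.
neutral = raw_dm_weight(100.0, 10.0, 400.0)
boosted = raw_dm_weight(100.0, 10.0, 400.0, override_factor=1.4)
reduced = raw_dm_weight(100.0, 10.0, 400.0, override_factor=0.5)
print(neutral, boosted, reduced)
```

A factor above 1 raises the weight and a factor below 1 lowers it, without changing the space metrics that go into the calculation.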

docker/CMSRucioClient/scripts/updateDDMQuota

haozturk · 2024-03-06T09:13:59Z

Thanks a lot Rahul, I like the idea of reporting the values in a tabular format. I can do it as

| TAPE_RSE | PLEDGE (original) | FREE_SPACE (original) | PLEDGE (after penalty) | FREE_SPACE (after penalty) | DDM_QUOTA |

I think the plot is outside the scope of this PR. It only makes sense to do it in a time-series format on Grafana. We can do it, but I don't know if it's worth it. Perhaps ddm_quota can be added to this panel [1] (preferably after it's migrated to the probes).

[1] https://monit-grafana.cern.ch/d/WPDxqoL4z/rucioaccounts?orgId=11&var-RSE=T1_ES_PIC_Tape&var-AccountType=All&from=now-7d&to=now&viewPanel=9&var-AccountName=All&var-ArchivalAccount=All&var-diskserviceaccount=All

dynamic-entropy · 2024-03-06T14:44:32Z

Ah apologies. I did not mean to track these numbers on grafana.
I simply meant we put them here (in the chat) to be able to see what our inputs are and what the output is.
And by plots, I meant a simple matplotlib (or excel) rendering (of the same table) to have a more visual representation of the input-output relation.

haozturk · 2024-04-08T12:30:23Z

Hi @dynamic-entropy, I applied the changes we talked about. The normalization that you suggest doesn't work between 0, 100, since the sum is too large. It produces very small and similar values. So, I kept the previous normalization. I also removed the override_ddm_quota attribute, since dm_weight_coefficient replaces it. Here are the dry-run results:

RSE                    PLEDGE (PB)    FREE SPACE (PB)    RELATIVE FREE (%)    DM WEIGHT COEFFICIENT    DM_WEIGHT
-------------------  -------------  -----------------  -------------------  -----------------------  -----------
T2_PT_NCG_Lisbon            0.5             0.0690387            13.8077                          1           10
T2_FR_IPHC                  2.2             0.27894              12.6791                          1           15
T2_PL_Cyfronet              0.4             0.0412895            10.3224                          1            3
T2_FR_GRIF                  2.832           0.579064             20.4472                          1           16
T2_IT_Pisa                  2.85            0.327395             11.4876                          1           25
T2_CH_CERN                 28.5            -0.258027             -0.905357                        1            0
T2_US_Florida               4.52            0.876801             19.3983                          1            6
T2_ES_IFCA                  0.7             0.10252              14.6458                          1            9
T2_AT_Vienna                0.5             0.0634887            12.6977                          1            4
T2_BR_UERJ                  0.21            0.122233             58.2062                          1           58
T2_US_Caltech               4.9             0.777826             15.874                           1            9
T2_RU_ITEP                  0.231           0.0294863            12.7646                          1            3
T2_IT_Bari                  3.05            1.1677               38.2853                          1           18
T1_UK_RAL_Disk              7.693           0.780169             10.1413                          1           25
T2_UK_SGrid_RALPP           1.8             0.251705             13.9836                          1            8
T2_US_Vanderbilt           11.2             2.66535              23.7978                          1            5
T2_DE_RWTH                  3               0.618512             20.6171                          1           16
T1_IT_CNAF_Disk            10               1.06531              10.6531                          1           17
T2_US_Nebraska              4.324           0.45741              10.5784                          1            3
T2_KR_KISTI                 1.2             0.141435             11.7863                          1           13
T2_US_MIT                   5.53            0.618379             11.1823                          1            8
T2_UA_KIPT                  1               0.0582457             5.82457                         1           12
T2_US_Wisconsin             3.9             0.539743             13.8396                          1            3
T2_BE_IIHE                  4.992           0.617938             12.3786                          1           17
T2_CN_Beijing               0.55            0.0553334            10.0606                          1            7
T2_UK_London_IC             6.3             0.800395             12.7047                          1           10
T2_DE_DESY                  6.5             0.756525             11.6388                          1           11
T2_IT_Rome                  2.35            0.475621             20.2392                          1           12
T2_RU_INR                   0.24            0.0238311             9.92964                         1            1
T2_FI_HIP                   2.175           0.370199             17.0206                          1           15
T2_FR_GRIF_IRFU             1.3             1.29085              99.2961                          1           98
T2_UK_London_Brunel         0.65            0.26212              40.3262                          1           20
T2_TW_NCHC                  0.7             0.101478             14.4969                          1            9
T2_CH_CSCS                  2.78            0.444761             15.9986                          1           14
T2_BE_UCL                   1.96            0.251748             12.8443                          1           10
T2_US_UCSD                  2.735           0.299978             10.9681                          1            2
T2_EE_Estonia               1.38            0.161557             11.7071                          1            8
T2_IT_Legnaro               3.75            0.731454             19.5055                          1           12
T2_IN_TIFR                  5.75            0.892409             15.5202                          1           30
T2_FR_GRIF_LLR              1.532           1.52876              99.7885                          1          100
T1_ES_PIC_Disk              4.1             0.606416             14.7906                          1           17
T2_PL_Swierk                0.63            0.0671292            10.6554                          1            8
T1_US_FNAL_Disk            36.95            1.97735               5.35141                         1            4
T2_US_Purdue                4.6825          0.487229             10.4053                          1            4
T2_TR_METU                  0.925           0.089409              9.66583                         1            6
T1_RU_JINR_Disk            10.6             1.92817              18.1903                          1           25
T2_ES_CIEMAT                4.25            0.509059             11.9779                          1           16
T1_DE_KIT_Disk             10.93            3.86501              35.3615                          1           32
T2_BR_SPRACE                2               0.474925             23.7462                          1           11
T2_RU_JINR                  1.57            0.210147             13.3851                          1           16
T2_RU_IHEP                  0.3             0.0330585            11.0195                          1            1
T2_UK_SGrid_Bristol         0.4             0.0319308             7.9827                          1           17
T1_FR_CCIN2P3_Disk          8               1.52812              19.1015                          1           17
T2_HU_Budapest              1.45            0.156867             10.8184                          1           13
T2_PK_NCP                   0.401           0.218931             54.5963                          1           66
RSE                   PLEDGE (PB)    FREE SPACE (PB)    RELATIVE FREE (%)    DM WEIGHT COEFFICIENT    DM_WEIGHT
------------------  -------------  -----------------  -------------------  -----------------------  -----------
T1_DE_KIT_Tape             38                4.51753             11.8882                       1             13
T1_IT_CNAF_Tape            41.08             2.31124              5.6262                       1              6
T1_FR_CCIN2P3_Tape         32.548            1.45373              4.46641                      1              0
T1_UK_RAL_Tape             24.424            6.93688             28.4019                       1             30
T1_US_FNAL_Tape           126.4             20.5856              16.2861                       1.4          100
T1_ES_PIC_Tape             17.2              7.59287             44.1446                       1             48

I'm setting dm_weight values for tape RSEs now, since MSOutput will start using them tomorrow. Let me know what you think of these changes. I think we don't need to over-engineer it. If there are points that you're not sure about, we can monitor it and update it later. This is the documentation:

https://gitlab.cern.ch/cmsdmops/Documentation/-/merge_requests/24

We need to link it to the code, once you merge it.

dynamic-entropy

I have left some minor comments for changes.

Also, please rebase.

docker/rucio_client/scripts/updateDDMQuota

dynamic-entropy · 2024-04-09T08:38:58Z

+
+
+    # Calculate dm_weights for tape rses
+    run(rse_expression = "rse_type=TAPE&wmcore_output_tape=True\cms_type=test",


Which tapes do not have wmcore_output_tape=True?
If we already have wmcore_output_tape=True as the attribute that defines whether data goes to a tape, then we do not need to explicitly set dm_weight (or ddm_quota) to 0.

Also, do we not have cms_type=real for the TAPE rses?
We shall prefer it for consistency with the disk expression.

CERN, JINR and MIT Tape aren't used for prod output and we're not setting dm_weight for them. However, @amaltaro was asking to set the attribute for them as well: link to jira, but it's not clear whether this is strongly necessary, so I keep those RSEs untouched for now.

> Also, do we not have cms_type=real for the TAPE rses?
> We shall prefer it for consistency with the disk expression.

We do, I'll update it.
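
For readers less familiar with RSE expressions: in Rucio's expression syntax, `&` is an intersection and `\` is a complement, so the expression in the diff selects tape RSEs with wmcore_output_tape=True and then excludes those with cms_type=test. The sketch below mimics that with plain Python sets. The attribute snapshot is hypothetical (including the made-up `Example_Test_Tape` entry) and does not query Rucio. It also uses a raw string, since `\c` is not a recognized escape sequence in a normal Python string literal and recent Python versions warn about it.

```python
# Raw string: the backslash is the complement operator of the RSE expression,
# so it has to reach Rucio as a literal character.
TAPE_EXPRESSION = r"rse_type=TAPE&wmcore_output_tape=True\cms_type=test"

# Hypothetical attribute snapshot, for illustration only.
RSE_ATTRIBUTES = {
    "T1_ES_PIC_Tape": {"rse_type": "TAPE", "wmcore_output_tape": "True", "cms_type": "real"},
    "T0_CH_CERN_Tape": {"rse_type": "TAPE", "cms_type": "real"},
    "T1_ES_PIC_Disk": {"rse_type": "DISK", "cms_type": "real"},
    "Example_Test_Tape": {"rse_type": "TAPE", "wmcore_output_tape": "True", "cms_type": "test"},
}


def matching(key, value):
    """Return the names of all RSEs whose attribute `key` equals `value`."""
    return {rse for rse, attrs in RSE_ATTRIBUTES.items() if attrs.get(key) == value}


# "A&B\C": intersect the first two sets, then remove the third.
exclude_test = (matching("rse_type", "TAPE") & matching("wmcore_output_tape", "True")) - matching("cms_type", "test")

# "A&B&C" with cms_type=real, the form suggested for consistency with disk.
require_real = matching("rse_type", "TAPE") & matching("wmcore_output_tape", "True") & matching("cms_type", "real")

print(sorted(exclude_test), sorted(require_real))
```

In this snapshot both forms select the same RSE; they would differ only for RSEs whose cms_type is neither test nor real, or is unset.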

docker/rucio_client/scripts/updateDDMQuota

dynamic-entropy · 2024-04-09T08:53:33Z

> The normalization that you suggest doesn't work between 0, 100, since the sum is too large. It produces very small and similar values.

It would give numbers close to the fraction of free space, ultimately keeping things even.
Having too drastic differences only puts the load on a few sites at the same time.

If you still prefer it this way, simply add a small number (1 to 10) to the present calculation. Just don't let any site have zero chance unless that is explicitly required.

I only saw your comment and the link to the docs now. I guess you can merge the docs, since that won't change things,
and then add the link too.
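
Two simple ways to keep every site above zero are sketched below. The sketch assumes the present normalization is a min-max rescale to 0..100, which is consistent with the dry-run output (lowest weight 0, highest 100) but is not confirmed by the thread; the function names and raw values are hypothetical.

```python
def rescale_to_100(raw_weights):
    """Min-max rescale raw weights to integers in 0..100 (assumed scheme)."""
    lowest, highest = min(raw_weights.values()), max(raw_weights.values())
    if highest == lowest:
        return {rse: 100 for rse in raw_weights}
    return {rse: round(100 * (w - lowest) / (highest - lowest)) for rse, w in raw_weights.items()}


def with_floor(weights, minimum=1):
    """Raise any weight below `minimum` up to `minimum`."""
    return {rse: max(minimum, w) for rse, w in weights.items()}


def with_offset(weights, offset=5):
    """Add a constant to every weight, shifting the whole range upwards."""
    return {rse: w + offset for rse, w in weights.items()}


raw = {"RSE_A": 0.1, "RSE_B": 0.2, "RSE_C": 0.5}  # hypothetical raw weights
scaled = rescale_to_100(raw)
print(scaled, with_floor(scaled), with_offset(scaled))
```

The floor only touches the lowest entries, while the offset shifts every weight and pushes the maximum above 100, so the consumer of the attribute would need to tolerate that.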

haozturk · 2024-04-09T11:57:49Z

Thanks Rahul, I applied your requested changes. For normalization, I made the minimum dm_weight 1. I think this is okay for now. If we do value/sum(values), it generates this output [1], which is not what we need. I also fixed the merge conflicts. Is there anything left to do?

[1]

RSE                    PLEDGE (PB)    FREE SPACE (PB)    RELATIVE FREE (%)    DM WEIGHT COEFFICIENT    DM_WEIGHT
-------------------  -------------  -----------------  -------------------  -----------------------  -----------
T2_US_Nebraska              4.324           0.455954              10.5447                         1            0
T2_RU_INR                   0.24            0.0241191             10.0496                         1            0
T2_US_Purdue                4.6825          0.636235              13.5875                         1            0
T2_BR_UERJ                  0.21            0.122232              58.2057                         0            0
T2_FR_IPHC                  2.2             0.277428              12.6104                         1            1
T2_FI_HIP                   2.175           0.368057              16.9221                         1            1
T2_US_Vanderbilt           11.2             2.74783               24.5342                         1            0
T2_UA_KIPT                  1               0.0573702              5.73702                        1            1
T1_US_FNAL_Disk            36.95            2.15713                5.83796                        1            0
T2_DE_DESY                  6.5             0.753922              11.5988                         1            1
T1_FR_CCIN2P3_Disk          8               1.52518               19.0648                         1            1
T1_RU_JINR_Disk            10.6             1.92432               18.154                          1            2
T2_RU_IHEP                  0.3             0.0330585             11.0195                         0            0
T2_CN_Beijing               0.55            0.0548313              9.96933                        1            0
T2_IN_TIFR                  5.75            0.889611              15.4715                         1            3
T2_KR_KISTI                 1.2             0.140466              11.7055                         1            1
T2_EE_Estonia               1.38            0.159888              11.5861                         1            1
T2_IT_Pisa                  2.85            0.32497               11.4025                         1            2
T2_CH_CSCS                  2.78            0.442781              15.9274                         1            1
T2_ES_CIEMAT                4.25            0.505959              11.9049                         1            1
T2_BR_SPRACE                2               0.603086              30.1543                         1            2
T2_CH_CERN                 30.5             1.73296                5.68183                        1            0
T2_HU_Budapest              1.45            0.156043              10.7616                         1            1
T2_BE_IIHE                  4.992           0.614046              12.3006                         1            1
T2_FR_GRIF                  2.832           0.577513              20.3924                         1            1
T2_IT_Rome                  2.35            0.474096              20.1743                         1            1
T2_RU_JINR                  1.57            0.208562              13.2842                         1            1
T2_BE_UCL                   1.96            0.249846              12.7472                         1            1
T2_UK_SGrid_RALPP           1.8             0.24945               13.8583                         1            0
T1_DE_KIT_Disk             10.93            3.86151               35.3295                         1            3
T2_TR_METU                  0.925           0.0889149              9.61243                        1            0
T2_ES_IFCA                  0.7             0.100259              14.3227                         1            1
T2_US_Wisconsin             3.9             0.687359              17.6246                         1            0
T2_IT_Bari                  3.05            1.16349               38.1473                         1            1
T2_PL_Cyfronet              0.4             0.0410616             10.2654                         1            0
T1_ES_PIC_Disk              4.1             0.593401              14.4732                         1            1
T2_FR_GRIF_IRFU             1.3             1.29085               99.2961                         1           10
T2_TW_NCHC                  0.7             0.100549              14.3642                         1            1
T1_UK_RAL_Disk              7.693           0.796345              10.3515                         1            2
T2_DE_RWTH                  3               0.614908              20.4969                         1            1
T2_PK_NCP                   0.401           0.218931              54.5963                         1            7
T2_UK_London_IC             6.3             0.797182              12.6537                         1            1
T2_US_MIT                   5.53            0.614398              11.1103                         1            0
T2_PT_NCG_Lisbon            0.5             0.0682805             13.6561                         1            1
T2_UK_SGrid_Bristol         0.4             0.0319308              7.9827                         1            1
T2_US_UCSD                  2.735           0.298285              10.9062                         1            0
T2_US_Florida               4.52            1.06341               23.5267                         1            1
T2_PL_Swierk                0.63            0.06647               10.5508                         1            0
T1_IT_CNAF_Disk            10               1.06214               10.6214                         1            1
T2_AT_Vienna                0.5             0.0623828             12.4766                         1            0
T2_FR_GRIF_LLR              1.532           1.52876               99.7885                         1           10
T2_RU_ITEP                  0.231           0.0292225             12.6504                         1            0
T2_US_Caltech               4.9             0.890109              18.1655                         1            1
T2_UK_London_Brunel         0.65            0.258093              39.7066                         1            2
T2_IT_Legnaro               3.75            0.729108              19.4429                         1            1
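
The effect in the table above is easy to reproduce: value/sum(values) distributes 100 points among all RSEs, so with roughly fifty of them the average share is about 2. A minimal sketch with 50 hypothetical raw weights (names and values are made up, not taken from the dry-run):

```python
def share_of_total(raw_weights):
    """Express each raw weight as an integer percentage of the sum."""
    total = sum(raw_weights.values())
    return {rse: round(100 * w / total) for rse, w in raw_weights.items()}


# 50 hypothetical RSEs whose raw weights span almost a factor of six.
raw = {f"RSE_{i:02d}": 0.10 + 0.01 * i for i in range(50)}
shares = share_of_total(raw)
print(min(shares.values()), max(shares.values()))
```

Even though the raw weights differ by almost a factor of six, the rounded shares end up within a few points of each other, which is the "very small and similar values" behaviour described earlier.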

dynamic-entropy · 2024-04-09T12:18:05Z

@haozturk All is good. I did not mean a value of dm_weight/total. The calculation of the weights should itself have given a value between zero and 1 (they need not sum up to 1).
However, if you are happy with the results of this calculation, I am okay to merge it.
I would have preferred to wrap up the whole calculation into a single function; however, this doesn't matter much if we see data distribution as expected.
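
A single-function version along these lines could look like the sketch below. It is not the merged implementation: the function name, the weights of 0.5, the clamping, and the use of the six listed tape pledges as the total are assumptions, and it does not reproduce the DM_WEIGHT column of the dry-run. The input numbers are copied from the tape table in the dry-run output.

```python
STATIC_WEIGHT = 0.5  # assumed; the two weights sum to 1
FREE_WEIGHT = 0.5

# (pledge in PB, free space in PB, dm_weight coefficient) from the tape dry-run.
TAPE_RSES = {
    "T1_DE_KIT_Tape": (38.0, 4.51753, 1.0),
    "T1_IT_CNAF_Tape": (41.08, 2.31124, 1.0),
    "T1_FR_CCIN2P3_Tape": (32.548, 1.45373, 1.0),
    "T1_UK_RAL_Tape": (24.424, 6.93688, 1.0),
    "T1_US_FNAL_Tape": (126.4, 20.5856, 1.4),
    "T1_ES_PIC_Tape": (17.2, 7.59287, 1.0),
}


def compute_dm_weights(rses):
    """Return an integer weight in 0..100 per RSE without a separate normalization step."""
    total_pledge = sum(pledge for pledge, _, _ in rses.values())
    weights = {}
    for rse, (pledge, free, coefficient) in rses.items():
        relative_pledge = pledge / total_pledge          # in [0, 1]
        relative_free = min(max(free / pledge, 0.0), 1.0)  # clamp over-full RSEs
        raw = (STATIC_WEIGHT * relative_pledge + FREE_WEIGHT * relative_free) * coefficient
        weights[rse] = round(100 * min(raw, 1.0))        # a coefficient > 1 could exceed 1
    return weights


dm_weights = compute_dm_weights(TAPE_RSES)
print(dm_weights)
```

Because both ratios lie between 0 and 1 and the two weights sum to 1, the raw value is already bounded, so the results stay comparable between runs and do not need to sum to any particular total.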

Just squash the commits into one and force push.
I will merge it.

haozturk · 2024-04-09T13:50:01Z

Thanks Rahul, can you not just "Squash and Merge" via GitHub? Since I fixed the merge conflict via git merge instead of git rebase, I cannot simply squash the last 5 commits; I would need to cherry-pick them. It's easier to do it via GitHub.

dynamic-entropy · 2024-04-10T07:07:45Z

Hmm, let me try. I thought you would want to edit the commit message.

dynamic-entropy

LGTM

dynamic-entropy · 2024-04-10T07:11:40Z

Ok, done.
I did get an option to modify the commit message.

Dynamically update the ddm_quota of tape rses as well

b1781ec

haozturk requested a review from dynamic-entropy February 27, 2024 14:12

dynamic-entropy requested changes Mar 5, 2024

Hasan Ozturk added 3 commits April 8, 2024 14:13

Update dm_weight according to the latest discussions

d20510d

Keep setting ddm_quota for disk rses until we deprecate it

e0849ad

Merge branch 'master' of https://github.com/dmwm/CMSRucio into update…

f478bf9

…_tapeddmquota

haozturk requested a review from dynamic-entropy April 8, 2024 12:30

dynamic-entropy reviewed Apr 9, 2024

Add docs link, make the lowest weight non-zero and make the RSE expre…

a3a9222

…ssions consistent

haozturk mentioned this pull request Apr 9, 2024

Feature: Rename ddm_quota to dm_weight for disk RSEs too #772

Open

haozturk requested a review from dynamic-entropy April 9, 2024 11:58

dynamic-entropy approved these changes Apr 10, 2024

dynamic-entropy merged commit d38e167 into dmwm:master Apr 10, 2024
