Module to update site lists for WMAgents #12123

vkuznet · 2024-09-30T17:48:40Z

Fixes #12039

Status

In development

Description

Provide site list updates for WMAgents with the following logic:
- obtain list of current active workflows from the agent using WMBS database
- we extract workflow name and pickle file name (the spec)
- requests their JSON specs from upstream ReqMgr2 server
- check and if necessary update site lists of all active workflows
- push new specs to the agent local WorkQueue and update spec pickle file

Please note: This code will be used with polling cycle method, i.e. it will run as separate thread within WorkflowUpdater component. As such, it can have potentially impact on:

upstream ReqMgr2 service, as we will perform concurrent requests to this service. The number of requests will correspond to total number of active workflows in a particular WMAgent. Since we have multiple agents this will spread across all of them.
- for instance, we we have 100 active workflows at any given time these 100 calls will be spread across all agents
- on ReqMgr2 side it means it will load all JSON specs and send them to the wire which may increase its RAM usage (as JSON loading has such behavior)
to update the spec we need to load pickle file, update its site lists, and write it back. This is potentially slow IO operation which will impact the component execution time:
- if we'll have X active workflows in an agent it means and we'll spend O(sec) per pickle file load/save operation it means we'll spend X seconds in each agent during the update. The X number should be relatively small since not all active workflows may need site list change
- it also means that our component can load X pickle files content into RAM to make an update
there is non-zero probability that local WorkQueue content for spec will deviate from pickle file if presented code will fail at run-time because local WorkQueue is separate service from actual pickle spec file on local file disk. At the moment there is no mechanism how to ensure that both operations will successfully persist with more complexity of the code, e.g. we need to record that local WorkQueue update was successful along with name of updated workflow(s) in order to update their pickle files later.

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

External dependencies / deployment changes

cmsdmwmbot · 2024-09-30T18:02:43Z

Jenkins results:

Python3 Unit tests: failed
- 1 new failures
- 1 tests no longer failing
- 1 changes in unstable tests
Python3 Pylint check: failed
- 3 warnings and errors that must be fixed
- 7 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 3 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15257/artifact/artifacts/PullRequestReport.html

amaltaro · 2024-09-30T20:23:47Z

src/python/WMComponent/WorkflowUpdater/SiteListUpdater.py

+        self.url = url
+        self.logger = logger
+
+    def getActiveRequests(self, wflowSpecs):


This should be using a SQL statement, similar to what you have implemented for pileup updates in WorkflowUpdater.

If we'll put this class to be used in WorkflowUpdaterPoller.py (as I think it will be), then the WorkflowUpdaterPoller.py code does this SQL statement I didn't want to do it again as we already got wflowSpecs in here: https://github.com/dmwm/WMCore/blob/master/src/python/WMComponent/WorkflowUpdater/WorkflowUpdaterPoller.py#L286

amaltaro · 2024-09-30T20:24:52Z

src/python/WMComponent/WorkflowUpdater/SiteListUpdater.py

+                    ]
+        return requests, wmaSpecs
+
+    def getRequestSpecs(self, requests):


Even though we are going to need it to fetch the workflow spec itself, I think we will have to implement a call to the wmstatserver service and retrieve the site lists with a filter+mask call.

Why? if we fetch the whole spec it contains site lists, why do we need to get it again from wmstatserver? I just don't understand the reason.

It is not to get it again, but to get it from WMStats instead of ReqMgr2.
For WMStats, we can retrieve data in bulk.

amaltaro · 2024-09-30T20:28:57Z

src/python/WMComponent/WorkflowUpdater/SiteListUpdater.py

+                wmaSpec = wmaSpecs[key]
+                workloadHelper = WMWorkloadHelper()
+                workloadHelper.load(wmaSpec)
+                taskIterator = workloadHelper.taskIterator()


no need for iterating over the object ourselves. You can simply call the relevant WMWorkload object method, e.g.:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMSpec/WMWorkload.py#L676

yes, we do need to iterate to check if sitelist needs to be changed, otherwise we only do assignment without knowing if it is required. But I don't mind to make the assignments. The reason I structured the code to identify which task must be updated and therefore when update happens we should push back the task to whatever place it should be.

cmsdmwmbot · 2024-09-30T20:59:06Z

Can one of the admins verify this patch?

cmsdmwmbot · 2024-10-01T12:54:27Z

Jenkins results:

Python3 Unit tests: succeeded
- 1 tests no longer failing
- 3 changes in unstable tests
Python3 Pylint check: failed
- 3 warnings and errors that must be fixed
- 7 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 2 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15260/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2024-10-01T14:27:15Z

Jenkins results:

Python3 Unit tests: failed
- 1 new failures
- 1 tests no longer failing
- 1 changes in unstable tests
Python3 Pylint check: failed
- 6 warnings and errors that must be fixed
- 3 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15261/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2024-10-02T12:55:03Z

Jenkins results:

Python3 Unit tests: failed
- 1 new failures
- 2 changes in unstable tests
Python3 Pylint check: failed
- 6 warnings and errors that must be fixed
- 1 warnings
- 88 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 16 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15268/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2024-10-02T13:56:13Z

Jenkins results:

Python3 Unit tests: succeeded
- 1 tests no longer failing
- 2 changes in unstable tests
Python3 Pylint check: failed
- 10 warnings and errors that must be fixed
- 1 warnings
- 91 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 15 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15269/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2024-10-02T15:00:14Z

Jenkins results:

Python3 Unit tests: succeeded
- 2 changes in unstable tests
Python3 Pylint check: failed
- 9 warnings and errors that must be fixed
- 1 warnings
- 91 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 15 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15270/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2024-10-02T17:07:52Z

Jenkins results:

Python3 Unit tests: succeeded
- 3 changes in unstable tests
Python3 Pylint check: failed
- 6 warnings and errors that must be fixed
- 1 warnings
- 91 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 15 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15272/artifact/artifacts/PullRequestReport.html

cmsdmwmbot · 2024-10-02T17:58:10Z

Jenkins results:

Python3 Unit tests: succeeded
- 1 tests no longer failing
- 1 changes in unstable tests
Python3 Pylint check: failed
- 10 warnings and errors that must be fixed
- 1 warnings
- 91 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 15 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15273/artifact/artifacts/PullRequestReport.html

todor-ivanov

It looks good to me. I left just tow comments inline

src/python/WMComponent/WorkflowUpdater/SiteListUpdater.py

todor-ivanov · 2024-10-03T11:18:00Z

src/python/WMCore/Services/WorkQueue/WorkQueue.py

+                                 'reduce': False})
+        elementsToUpdate = [x['id'] for x in data.get('rows', [])]
+        if elementsToUpdate:
+            self.updateElements(*elementsToUpdate, SiteWhiteList=siteWhiteList, SiteBlackList=siteBlackList)


Just a note here, not a request for changing anything. When it comes to the global Workqueue we are about to attempt to create a method to update the workqueue elements per workflow :

WMCore/src/python/WMCore/Services/WorkQueue/WorkQueue.py

Lines 259 to 291 in 1f78a64

def updateElementsByWorkflow(self, wf, updateParams, status=None):

"""

Update all available WorkQueue elements of a given workflow with a set

of arguments provided through the `updateParams` dictionary

:param wf: The workflow name

:param updateParams: A dictionary with parameters to be updated

:param status: A list of allowed WorkQueue elements statuses to be considered for updating

Default: None - do not filter by status

:return: No value, raises exceptions from internal methods in case of errors.

"""

# Fetch the whole view with Workqueue elements per given workflow

data = self.db.loadView('WorkQueue', 'elementsDetailByWorkflowAndStatus',

{'startkey': [wf], 'endkey': [wf, {}],

'reduce': False})

# Fetch only a list of WorkQueue element Ids && Filter them by allowed status

if status:

elementsToUpdate = [x['id'] for x in data.get('rows', []) if x['value']['Status'] in status]

else:

elementsToUpdate = [x['id'] for x in data.get('rows', [])]

# Update all WorkQueue elements with the parameters provided in a single push

if elementsToUpdate:

self.updateElements(*elementsToUpdate, **updateParams)

# Update the spec, if it exists

if self.db.documentExists(wf):

wmspec = WMWorkloadHelper()

wmspec.load(self.hostWithAuth + "/%s/%s/spec" % (self.db.name, wf))

wmspec.updateWorkloadArgs(updateParams)

dummy_values = {'name': wmspec.name()}

wmspec.saveCouch(self.hostWithAuth, self.db.name, dummy_values)

return

And yet another method to update the full specs with all wmspec parameters at a single push. For the later the following methods was developed, but still not merged in the master branch:

WMCore/src/python/WMCore/WMSpec/WMWorkload.py

Lines 78 to 121 in 1f78a64

def updateWorkloadArgs(self, reqArgs):

"""

Method to take a dictionary of arguments of the type:

{reqArg1: value,

reqArg2: value,

...}

and update the workload by a predefined map of reqArg to setter methods.

:param reqArgs: A Dictionary of request arguments to be updated

:return: Nothing, Raises an error of type WMWorkloadException if

fails to apply the proper setter method

"""

# NOTE: So far we support only a single argument setter methods, like

# setSiteWhitelist or setPriority. This may change in the future,

# but it will require a change in the logic of how we validate and

# call the proper setter methods bellow.

# populate the current instance settersMap

self.settersMap['RequestPriority'] = setterTuple('RequestPriority', self.setPriority, inspect.signature(self.setPriority))

self.settersMap['SiteBlacklist'] = setterTuple('SiteBlacklist', self.setSiteBlacklist, inspect.signature(self.setSiteBlacklist))

self.settersMap['SiteWhitelist'] = setterTuple('SiteWhitelist', self.setSiteWhitelist, inspect.signature(self.setSiteWhitelist))

# First validate if we can properly call the setter function given the reqArgs passed.

for reqArg, argValue in reqArgs.items():

if not self.settersMap.get(reqArg, None):

msg = f"Unsupported or missing setter method for updating reqArg: {reqArg}."

raise WMWorkloadException(msg)

try:

self.settersMap[reqArg].setterSignature.bind(argValue)

except TypeError as ex:

msg = f"Setter's method signature does not match the method calls we currently support: Error: req{str(ex)}"

raise WMWorkloadException(msg) from None

# Now go through the reqArg again and call every setter method according to the map

for reqArg, argValue in reqArgs.items():

try:

self.settersMap[reqArg].setterFunc(argValue)

except Exception as ex:

currFrame = inspect.currentframe()

argsInfo = inspect.getargvalues(currFrame)

argVals = {arg: argsInfo.locals.get(arg) for arg in argsInfo.args}

msg = f"Failure while calling setter method {self.settersMap[reqArg].setterFunc.__name__} "

msg += f"With arguments: {argVals}"

msg += f"Full exception string: {str(ex)}"

raise WMWorkloadException(msg) from None

I know the scale here while working with the local queue is far from what we see in the global workqueue, but maybe once we converge on the other PR you may consider using the this same method here as well. If not anything else it at least

cmsdmwmbot · 2024-10-03T12:02:46Z

Jenkins results:

Python3 Unit tests: succeeded
Python3 Pylint check: failed
- 5 warnings and errors that must be fixed
- 1 warnings
- 91 comments to review
Pylint py3k check: succeeded
Pycodestyle check: succeeded
- 15 comments to review

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/15276/artifact/artifacts/PullRequestReport.html

amaltaro

Valentin, please find my comments along the code.

In addition to those, I would suggest to use wmstatsserver instead of ReqMgr2, such that we can drastically reduce the amount of HTTP calls made during each cycle.

An example url would be: https://cmsweb.cern.ch/wmstatsserver/data/filtered_requests?RequestStatus=running-open&mask=SiteWhitelist&mask=SiteBlacklist

and I would suggest to repeat it for RequestStatus=acquired as well.

For that, I think we can simply use this wrapper method:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Services/WMStatsServer/WMStatsServer.py#L92

instead of writing the pycurl calls from scratch.

amaltaro · 2024-10-14T11:04:57Z

src/python/WMComponent/WorkflowUpdater/SiteListUpdater.py

+from WMCore.WorkerThreads.BaseWorkerThread import BaseWorkerThread
+
+
+class SiteListUpdater(Harness):


Here you are basically creating a new component called SiteListUpdater.
Instead, pleas remove this code and add a worker thread (SiteListUpdatePoller) to the component manager thread:
https://github.com/dmwm/WMCore/blob/master/src/python/WMComponent/WorkflowUpdater/WorkflowUpdater.py

ok, moved SiteListUpdatePoller into WorkflowUpdater.

amaltaro · 2024-10-14T11:06:51Z

src/python/WMComponent/WorkflowUpdater/SiteListUpdater.py

+        self.logger = myThread.logger
+
+        # the reqmgr2Url should be points to ReqMgr2 data services, i.e. /reqmgr2 end-point
+        self.reqmgrUrl = getattr(config.SiteListUpdater, "reqmgr2Url")


As briefly discussed in different moments, I think we should rely on WMStats cached data to reduce the amount of calls by orders of magnitude. I will give you further details by the end of this review.

ok, switching to wmstats then

amaltaro · 2024-10-14T11:08:38Z

src/python/WMComponent/WorkflowUpdater/SiteListUpdater.py

+    def getActiveWorkflows(self):
+        """
+        Provide list of active requests in a system (from assigned to running)
+        obtained the agent


"in"?
And the line below seems incorrect to me, as this method seem to be returning the workflow name and its spec path (not a dict)

ok, adjusted

src/python/WMComponent/WorkflowUpdater/SiteListUpdater.py

amaltaro · 2024-10-14T11:11:13Z

src/python/WMComponent/WorkflowUpdater/SiteListUpdater.py

+                    ]
+        return requests, wmaSpecs
+
+    def getRequestSpecs(self, requests):


It is not to get it again, but to get it from WMStats instead of ReqMgr2.
For WMStats, we can retrieve data in bulk.

amaltaro · 2024-10-14T12:21:09Z

src/python/WMCore/Services/WorkQueue/WorkQueue.py

@@ -237,6 +237,31 @@ def cancelWorkflow(self, wf):
        elements = [x['id'] for x in data.get('rows', []) if x['key'][1] not in nonCancelableElements]
        return self.updateElements(*elements, Status='CancelRequested')

+    def updateSiteLists(self, wf, siteWhiteList, siteBlackList):
+        """
+        Update site lists of a workflow


Can you please improve this docstring? Please refer to https://github.com/dmwm/WMCore/blob/master/CONTRIBUTING.rst#project-docstrings-best-practices

In addition, I think we should make both sitewhitelist and siteblacklist optional, as a workflow could be updating only one of those lists.

I'm not sure what to improve as docstring complaint with contributing requirements, i.e. it provides description of input parameters and what is returned by the function. If further improvements in docstring itself is required please be specific.

Meanwhile, I converted site/black lists to be optional parameters.

Short description of the method can likely be better, e.g. "Update site list parameters in elements matching a given workflow and a list of element statuses.
In addition, parameters are supposed to mention the data type, as written in the document I linked above.

amaltaro · 2024-10-14T12:23:09Z

src/python/WMComponent/WorkflowUpdater/SiteListUpdater.py

+                except Exception as ex:
+                    msg = f"Caught unexpected exception in SiteListUpdater. Details:\n{str(ex)}"
+                    logging.exception(msg)
+                    raise LocalWorkqueueUpdateException(msg) from None


If we raise this exception, we end up breaking the cycle. Instead, we should record the error but proceed with the other workflows in the pipeline.

ok, switched to continue

amaltaro · 2024-10-14T12:24:07Z

src/python/WMComponent/WorkflowUpdater/SiteListUpdater.py

+
+                # update workload only if we updated local WorkQueue
+                # update site white/black lists together
+                wHelper.setWhitelist(siteWhiteList)


IMO, we should only call this if there was an actual change to these lists. The if statement above does not distinguish which of the 2 lists was changed.

amaltaro · 2024-10-14T12:28:05Z

src/python/WMComponent/WorkflowUpdater/SiteListUpdater.py

+
+                    # save back pickle file
+                    newPklFileName = pklFileName.split('.pkl')[0] + '_new.pkl'
+                    wHelper.save(newPklFileName)


I'd suggest to simply save the workload changes in its expected location, hence no need for additional file and checks that can add new failure scenarios.

the code is correct, the check is required as newPklFileName may not be written, e.g. disk is full, in fact we need even better check to see that the size of newPklFileName is comparable with old one to make sure that file content is written.

Valentin, we use this module for 10+ years. This code is very noisy and there is no need at all for all these concerns and checks. If the node runs out of filesystem storage, everything else would be broken, including SQL/NoSQL databases. Keep it simple please.

amaltaro · 2024-10-14T12:28:45Z

src/python/WMComponent/WorkflowUpdater/SiteListUpdater.py

+                except Exception as ex:
+                    msg = f"Caught unexpected exception in SiteListUpdater. Details:\n{str(ex)}"
+                    logging.exception(msg)
+                    raise SpecFileUpdateException(msg) from None


Same as the comment made for the other exception. We cannot break the whole cycle, instead, the component needs to keep processing further workflows and simply skip the one where we had problems updating the list.

ok, switched to continue

vkuznet · 2024-10-14T15:01:35Z

test this please

amaltaro · 2024-10-14T18:50:12Z

test this please

vkuznet · 2024-10-15T13:10:06Z

test this please

amaltaro · 2024-10-15T14:56:20Z

It looks like tests ran with your last trigger, as we can see here:
https://cmssdt.cern.ch/dmwm-jenkins/job/DMWM-WMCore-PR-test/15310/

but Jenkins failed to publish those results back to the PR. Maybe @d-ylee and/or @khurtado are doing something that might have cause this(?)

vkuznet · 2024-10-16T11:39:50Z

test this please

vkuznet · 2024-10-16T14:52:40Z

@amaltaro , I give up with awaiting Jenkins post to this PR. I looked at https://cmssdt.cern.ch/dmwm-jenkins/job/DMWM-WMCore-PR-test/15310/ and style wise everything is fine. The reported unit tests failure do not seem to be relevant here either. Therefore I'm asking for your review again to speed up things.

amaltaro · 2024-10-19T10:15:40Z

test this please

amaltaro

Valentin, changes are coming to a final shape. However, please look into my comments along the code.

amaltaro · 2024-10-19T11:12:07Z

src/python/WMComponent/WorkflowUpdater/SiteListUpdater.py

@@ -0,0 +1,164 @@
+#!/usr/bin/env python
+"""
+File       : SiteListUpdater


Just to keep things more consistent across components, perhaps naming this as SiteListPoller would be better.

amaltaro · 2024-10-19T11:17:52Z

src/python/WMComponent/WorkflowUpdater/SiteListUpdater.py

+        for state in states:
+            url = "{}/data/filtered_requests?RequestStatus={}&mask=SiteWhitelist&mask=SiteBlacklist".format(self.wmstatsUrl, state)
+            urls.append(url)
+        response = getdata(urls, ckey(), cert())


Please use the wrapper module for interacting with the WMStats Server. There is this API that reaches the endpoint you need:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Services/WMStatsServer/WMStatsServer.py#L92

amaltaro · 2024-10-19T11:23:02Z

src/python/WMComponent/WorkflowUpdater/SiteListUpdater.py

+            wmaWhiteList = wHelper.getSiteWhiteList()
+            wmaBlackList = wHelper.getSiteBlackList()
+            if set(wmaWhiteList) != set(siteWhiteList) or set(wmaBlackList) != set(siteBlackList):
+                self.logger.info(f"Updating {wflow}: siteWhiteList {wmaWhiteList} => {siteWhiteList} and siteBlackList {wmaBlackList} => {siteBlackList}")


I would suggest breaking this log record into multiple lines. Something like:

Updating {wflow}: SiteWhiteList {wmaWhiteList} => {siteWhiteList} SiteBlackList {wmaBlackList} => {siteBlackList}

amaltaro · 2024-10-19T11:24:47Z

src/python/WMComponent/WorkflowUpdater/SiteListUpdater.py

+                    # update local WorkQueue first
+                    self.localWQ.updateSiteLists(wflow, siteWhiteList, siteBlackList)
+                except Exception as ex:
+                    msg = f"Caught unexpected exception in SiteListUpdater. Details:\n{str(ex)}"


We are executing one method here, so I would suggest to make the error message less generic. E.g.: "Unexpected exception while updating elements in local workqueue....

amaltaro · 2024-10-19T11:28:37Z

src/python/WMComponent/WorkflowUpdater/SiteListUpdater.py

+
+                    # save back pickle file
+                    newPklFileName = pklFileName.split('.pkl')[0] + '_new.pkl'
+                    wHelper.save(newPklFileName)


Valentin, we use this module for 10+ years. This code is very noisy and there is no need at all for all these concerns and checks. If the node runs out of filesystem storage, everything else would be broken, including SQL/NoSQL databases. Keep it simple please.

amaltaro · 2024-10-19T11:31:53Z

src/python/WMCore/Services/WorkQueue/WorkQueue.py

@@ -237,6 +237,31 @@ def cancelWorkflow(self, wf):
        elements = [x['id'] for x in data.get('rows', []) if x['key'][1] not in nonCancelableElements]
        return self.updateElements(*elements, Status='CancelRequested')

+    def updateSiteLists(self, wf, siteWhiteList, siteBlackList):
+        """
+        Update site lists of a workflow


Short description of the method can likely be better, e.g. "Update site list parameters in elements matching a given workflow and a list of element statuses.
In addition, parameters are supposed to mention the data type, as written in the document I linked above.

amaltaro · 2024-10-19T11:44:28Z

src/python/WMCore/Services/WorkQueue/WorkQueue.py

+        data = self.db.loadView('WorkQueue', 'elementsDetailByWorkflowAndStatus',
+                                {'startkey': [wf], 'endkey': [wf, {}],
+                                 'reduce': False})
+        elementsToUpdate = [x['id'] for x in data.get('rows', [])]


To filter it in the query, I think we would have to pass in a tuple with the (wflow_name, status). However, I actually think we should use a cheaper view like:
https://cmsweb.cern.ch/couchdb/workqueue/_design/WorkQueue/_view/jobStatusByRequest?reduce=false

then iterate over the rows and compare the Status field with what we want.

amaltaro · 2024-10-19T12:06:01Z

@khurtado did you hard-wire any tests to this specific PR? I am surprised to see that most of the other PRs are returning with jenkins report, but this one.

Looking into the log that was running this morning for this PR:
https://cmssdt.cern.ch/dmwm-jenkins/job/WMCore-PR-test-job-khurtado/621/console

I see specifically:

...
Setting status of 10bc178f7b2e453915fc1cf2e656d617d03482fe to PENDING with url https://cmssdt.cern.ch/dmwm-jenkins/job/WMCore-PR-test-job-khurtado/621/ and message: 'Build started for merge commit.'
...
Recording test results
ERROR: Step ?Publish JUnit test result report? failed: No test report files were found. Configuration error?
Setting status of 10bc178f7b2e453915fc1cf2e656d617d03482fe to FAILURE with url https://cmssdt.cern.ch/dmwm-jenkins/job/WMCore-PR-test-job-khurtado/621/ and message: 'Build finished. '

These tests have been persistently failing for 5 days now. Can you please look into this?

amaltaro reviewed Sep 30, 2024

View reviewed changes

vkuznet self-assigned this Oct 1, 2024

vkuznet changed the title ~~Add new module to manage site list updates~~ Module to update site lists for WMAgents Oct 1, 2024

vkuznet force-pushed the fix-issue-12039 branch from e9a371c to c3e0a2b Compare October 2, 2024 16:57

vkuznet force-pushed the fix-issue-12039 branch from 9236b4b to 7c5dce9 Compare October 2, 2024 17:14

vkuznet force-pushed the fix-issue-12039 branch from 7c5dce9 to 98c3804 Compare October 2, 2024 18:02

vkuznet requested review from todor-ivanov and amaltaro October 2, 2024 18:57

todor-ivanov approved these changes Oct 3, 2024

View reviewed changes

vkuznet force-pushed the fix-issue-12039 branch from 601f325 to 358c344 Compare October 9, 2024 12:34

amaltaro requested changes Oct 14, 2024

View reviewed changes

Add new module to manage site list updates

f5cbee0

vkuznet force-pushed the fix-issue-12039 branch from 358c344 to f5cbee0 Compare October 14, 2024 13:34

new unit tests for getSiteWhitelist and getSiteBlacklist functions

10bc178

vkuznet requested a review from amaltaro October 16, 2024 14:52

amaltaro requested changes Oct 19, 2024

View reviewed changes

	def updateElementsByWorkflow(self, wf, updateParams, status=None):
	"""
	Update all available WorkQueue elements of a given workflow with a set
	of arguments provided through the `updateParams` dictionary
	:param wf: The workflow name
	:param updateParams: A dictionary with parameters to be updated
	:param status: A list of allowed WorkQueue elements statuses to be considered for updating
	Default: None - do not filter by status
	:return: No value, raises exceptions from internal methods in case of errors.
	"""
	# Fetch the whole view with Workqueue elements per given workflow
	data = self.db.loadView('WorkQueue', 'elementsDetailByWorkflowAndStatus',
	{'startkey': [wf], 'endkey': [wf, {}],
	'reduce': False})

	# Fetch only a list of WorkQueue element Ids && Filter them by allowed status
	if status:
	elementsToUpdate = [x['id'] for x in data.get('rows', []) if x['value']['Status'] in status]
	else:
	elementsToUpdate = [x['id'] for x in data.get('rows', [])]

	# Update all WorkQueue elements with the parameters provided in a single push
	if elementsToUpdate:
	self.updateElements(elementsToUpdate, *updateParams)

	# Update the spec, if it exists
	if self.db.documentExists(wf):
	wmspec = WMWorkloadHelper()
	wmspec.load(self.hostWithAuth + "/%s/%s/spec" % (self.db.name, wf))
	wmspec.updateWorkloadArgs(updateParams)
	dummy_values = {'name': wmspec.name()}
	wmspec.saveCouch(self.hostWithAuth, self.db.name, dummy_values)
	return

	def updateWorkloadArgs(self, reqArgs):
	"""
	Method to take a dictionary of arguments of the type:
	{reqArg1: value,
	reqArg2: value,
	...}
	and update the workload by a predefined map of reqArg to setter methods.
	:param reqArgs: A Dictionary of request arguments to be updated
	:return: Nothing, Raises an error of type WMWorkloadException if
	fails to apply the proper setter method
	"""
	# NOTE: So far we support only a single argument setter methods, like
	# setSiteWhitelist or setPriority. This may change in the future,
	# but it will require a change in the logic of how we validate and
	# call the proper setter methods bellow.

	# populate the current instance settersMap
	self.settersMap['RequestPriority'] = setterTuple('RequestPriority', self.setPriority, inspect.signature(self.setPriority))
	self.settersMap['SiteBlacklist'] = setterTuple('SiteBlacklist', self.setSiteBlacklist, inspect.signature(self.setSiteBlacklist))
	self.settersMap['SiteWhitelist'] = setterTuple('SiteWhitelist', self.setSiteWhitelist, inspect.signature(self.setSiteWhitelist))

	# First validate if we can properly call the setter function given the reqArgs passed.
	for reqArg, argValue in reqArgs.items():
	if not self.settersMap.get(reqArg, None):
	msg = f"Unsupported or missing setter method for updating reqArg: {reqArg}."
	raise WMWorkloadException(msg)
	try:
	self.settersMap[reqArg].setterSignature.bind(argValue)
	except TypeError as ex:
	msg = f"Setter's method signature does not match the method calls we currently support: Error: req{str(ex)}"
	raise WMWorkloadException(msg) from None

	# Now go through the reqArg again and call every setter method according to the map
	for reqArg, argValue in reqArgs.items():
	try:
	self.settersMap[reqArg].setterFunc(argValue)
	except Exception as ex:
	currFrame = inspect.currentframe()
	argsInfo = inspect.getargvalues(currFrame)
	argVals = {arg: argsInfo.locals.get(arg) for arg in argsInfo.args}
	msg = f"Failure while calling setter method {self.settersMap[reqArg].setterFunc.__name__} "
	msg += f"With arguments: {argVals}"
	msg += f"Full exception string: {str(ex)}"
	raise WMWorkloadException(msg) from None

		from WMCore.WorkerThreads.BaseWorkerThread import BaseWorkerThread


		class SiteListUpdater(Harness):

Module to update site lists for WMAgents #12123

Are you sure you want to change the base?

Module to update site lists for WMAgents #12123

Conversation

vkuznet commented Sep 30, 2024 • edited Loading

Status

Description

Is it backward compatible (if not, which system it affects?)

Related PRs

External dependencies / deployment changes

cmsdmwmbot commented Sep 30, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cmsdmwmbot commented Sep 30, 2024

cmsdmwmbot commented Oct 1, 2024

cmsdmwmbot commented Oct 1, 2024

cmsdmwmbot commented Oct 2, 2024

cmsdmwmbot commented Oct 2, 2024

cmsdmwmbot commented Oct 2, 2024

cmsdmwmbot commented Oct 2, 2024

cmsdmwmbot commented Oct 2, 2024

todor-ivanov left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cmsdmwmbot commented Oct 3, 2024

amaltaro left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vkuznet commented Oct 14, 2024

amaltaro commented Oct 14, 2024

vkuznet commented Oct 15, 2024

amaltaro commented Oct 15, 2024

vkuznet commented Oct 16, 2024

vkuznet commented Oct 16, 2024

amaltaro commented Oct 19, 2024

amaltaro left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

amaltaro commented Oct 19, 2024

vkuznet commented Sep 30, 2024 •

edited

Loading