This looks really nice. Just a general comment on Dask at large scale. The issue we came across was that Dask will assign as many tasks as possible to a single worker, which means that if you set your cluster parameters to request 5 GB of memory because that is how much one job requires, the limit is violated immediately: that 5 GB goes to one worker, which then runs as many jobs as it can fit.
The way I am trying to avoid this is to assign worker resources that limit the number of tasks each worker can run, e.g. 1 GPU per model training means only one model can run on a worker at a time, or, for espresso, 1 "espresso". But in the case you have here, if your task were a large matrix computation and you parallelised heavily over nodes, I think you would hit a dead worker pretty fast.
Originally posted by @SamTov in #21 (review)
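For reference, a minimal sketch of the worker-resources pattern described above, shown with a `LocalCluster` for illustration rather than a real job-queue deployment; the resource name `"espresso"` and the `run_espresso` function are placeholders, not part of any existing API in this project.

```python
from dask.distributed import Client, LocalCluster


def run_espresso(config):
    """Placeholder for one memory-heavy job (hypothetical, for illustration)."""
    return config


if __name__ == "__main__":
    # Each worker advertises a single abstract "espresso" slot.
    cluster = LocalCluster(
        n_workers=2,
        threads_per_worker=4,
        resources={"espresso": 1},
    )
    client = Client(cluster)

    # Every submitted task claims the full slot, so a worker runs at most
    # one of these jobs at a time, regardless of how many threads it has.
    futures = [
        client.submit(run_espresso, cfg, resources={"espresso": 1})
        for cfg in range(10)
    ]
    print(client.gather(futures))
```

On a batch cluster the same idea applies by passing `--resources "espresso=1"` to each worker process when it starts, and annotating the submitted tasks with the matching `resources=` constraint.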