Better registration support for a wide range of third-party hardware #20349
base: master
Conversation
Examples are here: https://github.com/uniartisan/RWKV-PEFT/blob/device-enhance/train.py#L499. There are a lot of things to be checked; I will try to do that later and make it clearer in the documentation.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff            @@
##           master   #20349    +/-   ##
========================================
- Coverage      88%      88%      -0%
========================================
  Files         267      267
  Lines       23266    23321      +55
========================================
+ Hits        20375    20418      +43
- Misses       2891     2903      +12
Thank you for the interesting PR! I added a few comments.
@@ -46,6 +46,11 @@ def parse_devices(devices: Any) -> Any:
     def get_parallel_devices(devices: Any) -> Any:
         """Gets parallel devices for the Accelerator."""

+    @staticmethod
+    @abstractmethod
+    def get_device() -> Any:
Should we name this `get_device_type` instead? This would be consistent with the fact that in PyTorch, `x.device.type` is a string ("cpu", "cuda", etc.).
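For illustration only, a minimal sketch of what the renamed hook could look like; the `CUDAAccelerator` override below is an invented example, not part of the diff:

```python
from abc import ABC, abstractmethod


class Accelerator(ABC):
    @staticmethod
    @abstractmethod
    def get_device_type() -> str:
        """Return the device type string, matching ``torch.device(...).type`` ("cpu", "cuda", ...)."""


class CUDAAccelerator(Accelerator):
    @staticmethod
    def get_device_type() -> str:
        # Consistent with torch.device("cuda:0").type == "cuda"
        return "cuda"
```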
if _TORCH_GREATER_EQUAL_2_4
else getattr(
    torch,
    "cuda" if not isinstance(device, str) or device.split(":")[0] == "cpu" else device.split(":")[0],
I'm not sure I understand this condition, can you please clarify?
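For context, the condition can be read as the following standalone helper (a sketch, not code from the PR), which makes the fallback behaviour visible:

```python
import torch


def _resolve_device_module(device):
    # Fall back to torch.cuda when `device` is not a string or is "cpu";
    # otherwise use the prefix before ":" (e.g. "xpu:0" -> torch.xpu).
    name = "cuda" if not isinstance(device, str) or device.split(":")[0] == "cpu" else device.split(":")[0]
    return getattr(torch, name)


print(_resolve_device_module("cuda:0").__name__)  # torch.cuda
print(_resolve_device_module("cpu").__name__)     # also torch.cuda, which is the surprising part
```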
@@ -49,13 +49,16 @@ class FSDPPrecision(Precision):

     """

-    def __init__(self, precision: _PRECISION_INPUT, scaler: Optional["ShardedGradScaler"] = None) -> None:
+    def __init__(
+        self, precision: _PRECISION_INPUT, scaler: Optional["ShardedGradScaler"] = None, device: Optional[str] = None
This is `device_type`, since `device` may not be a string and may have a `device_id` appended to it.
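A small illustration of the distinction (not from the PR): a `torch.device` carries an index, while autocast-style APIs want only the type string.

```python
import torch

device = torch.device("cuda:1")   # a full device, possibly with a device_id appended
device_type = device.type         # "cuda" -- the plain string that precision plugins need

assert str(device) == "cuda:1"
assert device_type == "cuda"
```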
@@ -111,7 +114,9 @@ def module_init_context(self) -> AbstractContextManager:
     @override
     def forward_context(self) -> AbstractContextManager:
         if "mixed" in self.precision:
-            return torch.autocast("cuda", dtype=(torch.bfloat16 if self.precision == "bf16-mixed" else torch.float16))
+            return torch.autocast(
+                self.device, dtype=(torch.bfloat16 if self.precision == "bf16-mixed" else torch.float16)
Suggested change:
-                self.device, dtype=(torch.bfloat16 if self.precision == "bf16-mixed" else torch.float16)
+                self.device_type, dtype=(torch.bfloat16 if self.precision == "bf16-mixed" else torch.float16)
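For reference, `torch.autocast` takes the device type string as its first argument, so the suggested `device_type` attribute maps directly onto it. A quick sketch, using "cpu" here so it runs anywhere:

```python
import torch

device_type = "cpu"  # would be "cuda", "xpu", ... depending on the accelerator

with torch.autocast(device_type, dtype=torch.bfloat16):
    x = torch.randn(8, 8)
    y = x @ x  # executed under autocast for the given device type
```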
supported_precision = get_args(_PRECISION_INPUT)
if precision not in supported_precision:
    raise ValueError(
        f"`precision={precision!r})` is not supported in FSDP."
        f" `precision` must be one of: {supported_precision}."
    )
self.device = device if device is not None else "cuda"
Suggested change:
-        self.device = device if device is not None else "cuda"
+        self.device_type = device_type if device_type is not None else "cuda"
@@ -124,7 +124,13 @@ def setup_module(self, module: Module) -> DistributedDataParallel:
     """Wraps the model into a :class:`~torch.nn.parallel.distributed.DistributedDataParallel` module."""
     device_ids = self._determine_ddp_device_ids()
     # https://pytorch.org/docs/stable/notes/cuda.html#id5
-    ctx = torch.cuda.stream(torch.cuda.Stream()) if device_ids is not None else nullcontext()
+    ctx = (
+        getattr(torch, f"{self.root_device.type.split(':')[0]}").stream(
I'd avoid doing this inline. Better to `getattr` and assign to a variable, and then use the variable.

My main question here is what is the contract that ensures that an accelerator has a concept of streams. Unless I read through the code, as a developer I wouldn't know that I need to register streams as `torch.mygpu.stream` and `torch.mygpu.Stream()`.

So we should either guard the strategy to only apply to "cuda", or introduce a stream contract to the accelerator. I'd much rather do the former.
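A sketch of the shape suggested here, with the module assigned to a variable and the stream trick guarded to CUDA only (names are illustrative, not the final implementation):

```python
from contextlib import nullcontext

import torch


def _ddp_setup_context(root_device: torch.device, device_ids):
    # Only CUDA documents torch.cuda.stream / torch.cuda.Stream, so guard on it
    # instead of assuming every accelerator registers the same API under torch.<type>.
    if device_ids is not None and root_device.type == "cuda":
        device_module = getattr(torch, root_device.type)  # torch.cuda, fetched once
        return device_module.stream(device_module.Stream())
    return nullcontext()
```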
@@ -507,7 +507,11 @@ def load_checkpoint(

     optimzer_state_requested = any(isinstance(item, (Optimizer, DeepSpeedOptimizer)) for item in state.values())

-    torch.cuda.empty_cache()
+    if isinstance(self.accelerator, Accelerator) and self.accelerator.get_device() != "cpu":
+        getattr(torch, self.root_device.type.split(":")[0]).empty_cache()
Same goes for the comment above. `empty_cache` is not part of a contract (nor is the fact that a device is registered as a submodule of the `torch` module). It needs to be if we want to rely on calling `empty_cache` on whatever device we pass.

BTW, being a `torch` submodule is too strong of a requirement in my opinion. In this case we should probably guard the strategy to be GPU-only.
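A sketch of such a GPU-only guard (illustrative only; `root_device` is assumed to be a `torch.device`):

```python
import torch


def _empty_device_cache(root_device: torch.device) -> None:
    # Restrict the cache flush to CUDA rather than assuming every device type
    # is registered as a torch submodule that exposes empty_cache().
    if root_device.type == "cuda" and torch.cuda.is_available():
        torch.cuda.empty_cache()
```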
What does this PR do?
Thank you to the Lightning team for providing such an easy-to-use, clearly designed library.

This draft PR aims to provide better registration support for a wide range of third-party hardware, including Intel XPU, Huawei Ascend NPU, Cambricon, Moore Threads, and more. It is designed to integrate third-party hardware with minimally intrusive changes.
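As a rough sketch of how a third-party backend might plug in under this design (the import path, class name, the "npu" string, and the omitted hooks are assumptions for illustration; only the `get_device()` hook comes from this PR):

```python
from lightning.pytorch.accelerators import Accelerator


class MyNPUAccelerator(Accelerator):
    """Illustrative third-party accelerator; the remaining hooks are omitted for brevity."""

    @staticmethod
    def get_device() -> str:
        # The new hook from this PR: report the torch device type string so that
        # autocast, streams, and cache handling can be dispatched generically.
        return "npu"

    # parse_devices, get_parallel_devices, setup_device, teardown,
    # auto_device_count, and is_available would be implemented here as well.
```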
Fixes #<issue_number>
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines.
Reviewer checklist
📚 Documentation preview 📚: https://pytorch-lightning--20349.org.readthedocs.build/en/20349/