add support for CPU and MPS

Do not use distributed when it is not available; use CPU or MPS instead.

This entails a few changes:

- `--device` is now a valid flag to the library, since `ilab` can pass CPU, MPS, or default to cuda
- when using CPU or MPS, do not initialize DS; instead, put the model on the device and initialize the `Adafactor` optimizer, which is more efficient than the Adam-based one
- inside of `train`, add logic that checks `torch.cuda.is_available()` and `torch.distributed.is_initialized()`, since we don't use distributed torch on consumer systems
- the train loop needs some custom step and loss logic for a `LlamaForCausalLM` model, so add that in
- when using CPU or MPS, we are always at `world_size == 1` and `local_rank == 0`

Signed-off-by: Charlie Doern <[email protected]>
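
The device rules in the message above can be sketched as a small decision function. This is a minimal illustration, not the code from this commit: `RuntimePlan` and `plan_runtime` are hypothetical names, the optimizer is represented by a plain string, and the fallback from an unavailable `cuda` to `cpu` is an assumption that the message does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimePlan:
    """Hypothetical summary of how a training run would be set up."""
    device: str
    init_deepspeed: bool
    optimizer: str
    world_size: int
    local_rank: int


def plan_runtime(device: str = "cuda", cuda_available: bool = False,
                 world_size: int = 1, local_rank: int = 0) -> RuntimePlan:
    """Hypothetical helper mirroring the rules in the commit message.

    `cuda_available` stands in for a check such as torch.cuda.is_available(),
    passed in as a plain bool so the sketch runs without torch installed.
    """
    if device not in ("cuda", "cpu", "mps"):
        raise ValueError(f"unsupported device: {device!r}")

    if device == "cuda" and not cuda_available:
        # Assumption: fall back to CPU when CUDA was requested but is absent.
        device = "cpu"

    if device == "cuda":
        # Distributed path: DeepSpeed is initialized, Adam-based optimizer.
        return RuntimePlan("cuda", True, "adam-based", world_size, local_rank)

    # CPU or MPS: no DeepSpeed, Adafactor, and always a single process.
    return RuntimePlan(device, False, "adafactor", 1, 0)
```

For example, `plan_runtime("mps")` yields a plan with `init_deepspeed=False`, `world_size=1`, and `local_rank=0`, whatever rank values the caller passes in.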
instructlab · Aug 29, 2024 · f7d33d3
1 parent 0de1e36
commit f7d33d3
Showing 5 changed files with 207 additions and 96 deletions.
diff --git a/src/instructlab/training/__init__.py b/src/instructlab/training/__init__.py
@@ -22,9 +22,9 @@
 
 
 # defer import of main_ds
-def run_training(torch_args: TorchrunArgs, train_args: TrainingArgs) -> None:
+def run_training(torch_args: TorchrunArgs, train_args: TrainingArgs, device: str) -> None:
     """Wrapper around the main training job that calls torchrun."""
     # Local
     from .main_ds import run_training
 
-    return run_training(torch_args=torch_args, train_args=train_args)
+    return run_training(torch_args=torch_args, train_args=train_args, device=device)