v0.33.0: MUSA backend support and bugfixes
A small release this month, focused on added backend support and bugfixes:
- Support MUSA (Moore Threads GPU) backend in accelerate by @fmo-mt in #2917
- Allow multiple processes per device by @cifkao in #2916
- Add `torch.float8_e4m3fn` format to `dtype_byte_size` by @SunMarc in #2945
- Properly handle `Params4bit` in `set_module_tensor_to_device` by @matthewdouglas in #2934
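
As a quick illustration, here is a minimal training step using Accelerate's device-agnostic API, which should now also run on Moore Threads GPUs once the MUSA backend is detected. This is a sketch only: it assumes a working MUSA-enabled PyTorch install and uses a toy model for brevity.

```python
import torch
from accelerate import Accelerator

# The backend (CUDA, XPU, MPS, and now MUSA) is detected automatically;
# no backend-specific code is needed in the training loop.
accelerator = Accelerator()

model = torch.nn.Linear(8, 2)  # toy model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

x = torch.randn(4, 8, device=accelerator.device)
loss = model(x).sum()
accelerator.backward(loss)
optimizer.step()
```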
What's Changed
- [tests] fix bug in torch_device by @faaany in #2909
- Fix slowdown on init with `device_map="auto"` by @muellerzr in #2914
- fix: bug where `multi_gpu` was being set and warning being printed even with `num_processes=1` by @HarikrishnanBalagopal in #2921
- Better error when a bad directory is given for weight merging by @muellerzr in #2852
- add xpu device check before moving tensor directly to xpu device by @faaany in #2928
- Add huggingface_hub version to setup.py by @nullquant in #2932
- Correct loading of models with shared tensors when using accelerator.load_state() by @jkuntzer in #2875 (see the sketch after this list)
- Hotfix PyTorch Version Installation in CI Workflow for Minimum Version Matrix by @yhna940 in #2889
- Fix import test by @muellerzr in #2931
- Consider pynvml available when installed through the nvidia-ml-py distribution by @matthewdouglas in #2936
- Improve test reliability for Accelerator.free_memory() by @matthewdouglas in #2935
- delete CCL env var setting by @Liangliang-Ma in #2927
- feat(ci): add `pip` caching in CI by @SauravMaheshkar in #2952
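
For the `accelerator.load_state()` fix above (#2875), a minimal checkpoint round trip looks roughly like this. It is a sketch only: the checkpoint directory name and toy model are illustrative, not taken from the PR.

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()
model = accelerator.prepare(torch.nn.Linear(4, 4))  # toy model for illustration

# save_state writes the prepared objects' state and RNG state to the directory;
# load_state restores it, now handling models with shared (tied) tensors correctly.
accelerator.save_state("my_checkpoint")
accelerator.load_state("my_checkpoint")
```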
New Contributors
- @HarikrishnanBalagopal made their first contribution in #2921
- @fmo-mt made their first contribution in #2917
- @nullquant made their first contribution in #2932
- @cifkao made their first contribution in #2916
- @jkuntzer made their first contribution in #2875
- @matthewdouglas made their first contribution in #2936
- @Liangliang-Ma made their first contribution in #2927
- @SauravMaheshkar made their first contribution in #2952
Full Changelog: v0.32.1...v0.33.0