Context: We recently had a discussion about torchao developer experience in the torchao team meeting and collected feedback from OSS contributors. The general sentiment is that AffineQuantizedTensor in its current state is hard to work with and extend. I followed up to get people's thoughts on how we can improve the torchao developer experience; here is a summary of our response to that feedback.
tldr; We won’t split AffineQuantizedTensor right now, but it’s fine to add new tensor subclasses for new use cases even when they could fit into AffineQuantizedTensor. We’ll revisit whether to split AQT once we better understand the extension points and the variations of quantization people care about. We will also think about which utils make sense as more use cases (different training dtypes, composing with DTensor, etc.) are developed in torchao.
From the internal and OSS feedback, at a high level it seems we should not make unifying everything in AffineQuantizedTensor a goal. AffineQuantizedTensor is currently designed for inference and uses one specific implementation of quantization, and there are many small and large differences once we expand the scope to training or optimizers (for example, the slight differences for int8 training in 1 and 2). It’s better not to design a general API at this moment, before we see more use cases and possible extension points. Instead we should allow people to contribute new tensor subclasses to torchao (to lower the bar for contribution and optimize for developer velocity), and think about unification when that approach stops scaling and the duplication becomes a blocking issue. The good news is that these are implementation details for users, so changing the implementation should not break BC. In the meantime we want to define clear public-facing APIs so that how the tensor subclasses are implemented remains an implementation detail.
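For example, the public entry point already keeps the choice of tensor subclass behind a config-style call. The snippet below is a minimal sketch using quantize_ and int8_weight_only from torchao.quantization (the exact import path may shift between torchao versions):

```python
import torch
from torch import nn
# public quantization API; exact import path may differ across torchao versions
from torchao.quantization import quantize_, int8_weight_only

model = nn.Sequential(nn.Linear(16, 32))

# Users only touch the config-style public API; which tensor subclass ends up
# backing the quantized weight (AffineQuantizedTensor or something else) stays
# an implementation detail behind this call, so it can change without breaking BC.
quantize_(model, int8_weight_only())

print(type(model[0].weight))
```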
In terms of concrete plans around AffineQuantizedTensor:
For use cases already integrated with AffineQuantizedTensor, we’ll try to make them more readable and hackable. It’s also OK to split them out if there is a clear benefit, but that’s probably not a priority at this point.
For new use cases, it’s fine for users to add new tensor subclasses even if they would fit into AffineQuantizedTensor, especially the ones dealing with training (a minimal sketch of such a subclass follows below).
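To make "adding a new tensor subclass" concrete, here is a minimal, hypothetical sketch of a standalone weight subclass built directly on torch.Tensor. None of the names below are torchao APIs, and a real contribution would dispatch to proper int8 kernels instead of the dequantize-then-compute fallback:

```python
import torch
import torch.nn.functional as F

class Int8WeightTensor(torch.Tensor):
    """Hypothetical standalone subclass (not a torchao API): an int8 weight
    that dequantizes on use."""

    @staticmethod
    def __new__(cls, int_data, scale, orig_dtype):
        # the wrapper carries the logical shape/dtype; real data lives in int_data/scale
        return torch.Tensor._make_wrapper_subclass(
            cls, int_data.shape, dtype=orig_dtype, device=int_data.device
        )

    def __init__(self, int_data, scale, orig_dtype):
        self.int_data = int_data
        self.scale = scale

    @classmethod
    def from_float(cls, w):
        # simple symmetric per-tensor quantization, for illustration only
        scale = w.abs().amax() / 127.0
        int_data = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
        return cls(int_data, scale, w.dtype)

    def dequantize(self):
        return self.int_data.to(self.dtype) * self.scale

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is F.linear:
            x, w = args[0], args[1]
            bias = args[2] if len(args) > 2 else kwargs.get("bias")
            # naive dequantize-then-compute fallback
            return F.linear(x, w.dequantize(), bias)
        with torch._C.DisableTorchFunctionSubclass():
            return func(*args, **kwargs)

# usage
qw = Int8WeightTensor.from_float(torch.randn(32, 16))
x = torch.randn(4, 16)
print(F.linear(x, qw).shape)  # torch.Size([4, 32])
```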
In the meantime, we can improve the readability of AffineQuantizedTensor. Here are some concrete items; @jainapurva will be working on these, and more feedback is welcome.
[Done] The current Layout/LayoutType naming is confusing and does not correspond to PyTorch's notion of layout:
The current Layout class and aqt.layout_tensor correspond to TensorImpl in PyTorch
The current LayoutType class and aqt.layout_type correspond to tensor.layout in PyTorch
Rename “aqt.layout_type” to “aqt._layout” for better alignment with the native PyTorch Tensor (see the short illustration below)
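As a quick illustration of the PyTorch-native concepts the renaming lines up with (the old AQT attribute names are the ones listed above; the post-rename name shown is _layout):

```python
import torch

t = torch.randn(4, 4)
print(t.layout)  # torch.strided -- in PyTorch, "layout" is a format descriptor on the tensor
# The C++ TensorImpl, by contrast, is the object that actually holds the data
# (sizes, strides, storage) behind a tensor.
#
# The renaming aligns AffineQuantizedTensor with that split:
#   old aqt.layout_tensor (holds the packed data)   ~ TensorImpl
#   old aqt.layout_type   (describes the packing)   ~ tensor.layout, renamed to aqt._layout
```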
The AffineQuantizedTensor file is currently large, which can make onboarding harder since it’s not clear which code people need to understand in order to extend it. To keep all relevant things in one place, we can move the layout definitions and the corresponding kernel dispatch code into a single file, with a central place that imports everything and includes it in the same dispatch table, so it’s clear which kernels AffineQuantizedTensor supports. This reduces the size of the main file and makes it easier to find the important parts when adding a new layout or new kernel dispatches (a small sketch of such a dispatch table follows below).
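To make the "single dispatch table" idea concrete, here is a small hypothetical sketch of what a central registration point could look like (the names below are illustrative, not torchao's actual helpers):

```python
from typing import Callable, List, Tuple

# central table of (condition, implementation) pairs that layout/kernel files register into
_AQT_LINEAR_DISPATCH: List[Tuple[Callable, Callable]] = []

def register_linear_dispatch(condition: Callable, impl: Callable) -> None:
    """Each layout/kernel file calls this once, so all supported kernels are visible in one table."""
    _AQT_LINEAR_DISPATCH.append((condition, impl))

def quantized_linear(input, weight, bias=None):
    """Walk the table and run the first implementation whose condition matches."""
    for condition, impl in _AQT_LINEAR_DISPATCH:
        if condition(input, weight, bias):
            return impl(input, weight, bias)
    raise NotImplementedError("no registered kernel matches these inputs")

# a layout file would then register its kernel next to its layout definition, e.g.:
# register_linear_dispatch(_is_int8_plain_layout, _int8_plain_linear_impl)
```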
Some feedback that has already been addressed/fixed also includes:
We also want to think about what utils we can provide for training tensor subclasses, and about the overall training and inference story in torchao. This will be an ongoing effort throughout Q4; we have been thinking about how to generalize the existing DTensor support (tensor parallelism, FSDP, etc.) from float8, nf4, and mx to other dtype tensor subclasses as well.