[RFC] Follow Up for torchao developer experience discussion #1184

Open
jerryzh168 opened this issue Oct 28, 2024 · 0 comments
Context: We recently had a discussion about torchao developer experience in the torchao team meeting and collected some feedback from OSS contributors. The general sentiment is that AffineQuantizedTensor in its current state is hard to work with and extend. I had a few follow-ups to get people’s thoughts on how we can improve the torchao developer experience; here is a summary of responses to that feedback.

tldr; We won’t split AffineQuantizedTensor right now, but it’s fine to add a new tensor subclass for a new use case even if it would fit into AffineQuantizedTensor. We’ll revisit whether we need to split AQT once we understand the different extension points and the variations of quantization that people care about. We will also think about which utils make sense as we see more use cases (different training dtypes, composing with DTensor, etc.) being developed in torchao.

  • Based on the feedback I got internally and from OSS, at a high level it seems we should not make unifying everything in AffineQuantizedTensor a goal. AffineQuantizedTensor is currently designed for inference and uses one specific implementation of quantization, and there are many small and large differences once we expand the scope to training or optimizers (for example, the slight differences for int8 training: 1 and 2). It’s better not to design a general API at this moment, before we have seen more use cases and possible extension points; instead we should let people contribute new tensor subclasses to torchao (to lower the bar for contribution and optimize for developer velocity), and think about unification when this approach stops scaling and the duplication becomes a blocking issue. The good thing is that these are implementation details for users, so changing the implementation should not break BC. In the meantime we want to define clear public-facing APIs so that how tensor subclasses are implemented stays an implementation detail. (A minimal sketch of what adding a new tensor subclass can look like is included after this list.)

  • In terms of concrete plans around AffineQuantizedTensor:

    • For use cases already integrated with AffineQuantizedTensor, we’ll try to make them more readable and hackable. It’s OK to split them out if there is a clear benefit, but that is probably not a priority at this point
    • For new use cases, it’s fine for users to add new tensor subclasses even if they would fit into AffineQuantizedTensor, especially the ones dealing with training
    • People can also start with "General Guide on Extending torchao" in [RFC] torchao Contributor Guide #391
  • In the meantime, we can improve the readability of AffineQuantizedTensor. Here are some concrete items; @jainapurva will be working on these, and more feedback is welcome

    • [Done] The current Layout/LayoutType naming is confusing and does not correspond to PyTorch’s notion of layout

      • The current Layout class and aqt.layout_tensor correspond to TensorImpl in PyTorch
      • The current LayoutType class and aqt.layout_type correspond to tensor.layout in PyTorch
      • Rename them so the torchao names match the PyTorch concepts they correspond to (i.e. Layout → TensorImpl, LayoutType → Layout)
    • The AffineQuantizedTensor file is currently large, which can make onboarding harder since it’s not clear which code people need to understand in order to extend it. To keep all relevant things in one place, we can move both the layout definitions and the kernel dispatch code into a single file, and have a central place that imports everything and includes it in the same dispatch table, so that it’s clear which kernels AffineQuantizedTensor supports. This reduces the size of the file and makes it easier to find the important parts when adding a new layout or new kernel dispatches (see the dispatch table sketch after this list).

    • Some feedback that has already been addressed/fixed includes

  • We also want to think about what utils we can provide for training tensor subclasses and how we think about the training and inference story in torchao. This will be an ongoing effort throughout Q4; we have been thinking about how to generalize the existing DTensor support (tensor parallelism, FSDP, etc.) from float8, nf4 and mx to other dtype tensor subclasses as well
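
To make the “contribute a new tensor subclass” path above more concrete, here is a minimal sketch of an int8 weight-only tensor subclass that dequantizes on the fly via __torch_dispatch__. The class and helper names are illustrative only and are not torchao’s actual API.

```python
# Minimal sketch of a new weight-only int8 tensor subclass; names are
# illustrative, not torchao's actual API.
import torch
import torch.nn.functional as F
from torch.utils._pytree import tree_map


class Int8WeightTensor(torch.Tensor):
    """Stores an int8 payload plus a per-row scale; dequantizes on use."""

    @staticmethod
    def __new__(cls, int_data, scale):
        # Wrapper subclass: the outer tensor only carries shape/dtype/device metadata.
        return torch.Tensor._make_wrapper_subclass(
            cls, int_data.shape, dtype=scale.dtype, device=int_data.device
        )

    def __init__(self, int_data, scale):
        self.int_data = int_data
        self.scale = scale

    @classmethod
    def from_float(cls, w):
        # Symmetric per-row quantization of a 2D weight.
        scale = w.abs().amax(dim=1, keepdim=True) / 127.0
        int_data = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
        return cls(int_data, scale)

    def dequantize(self):
        return self.int_data.to(self.scale.dtype) * self.scale

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Preserve the subclass for ops that nn.Parameter and friends rely on.
        if func is torch.ops.aten.detach.default:
            (self,) = args
            return cls(self.int_data, self.scale)

        # Generic fallback: dequantize every subclass argument and run the op.
        def unwrap(t):
            return t.dequantize() if isinstance(t, cls) else t

        return func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs))


# Usage: the fallback path dequantizes, so the result matches the float
# reference up to int8 rounding error.
w = torch.randn(8, 16)
x = torch.randn(2, 16)
qw = Int8WeightTensor.from_float(w)
print((F.linear(x, qw) - F.linear(x, w)).abs().max())
```

A real kernel path would replace the generic dequantize-and-fallback branch with fused int8 ops; the point of the sketch is only to show how small a self-contained new subclass can be.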

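Similarly, here is a minimal sketch of the central dispatch table idea from the readability items above: each layout/kernel file registers a (condition, implementation) pair, and one table lists every kernel path the quantized tensor supports. The registry and function names are illustrative (they reuse the Int8WeightTensor fields from the previous sketch) and are not necessarily torchao’s actual API.

```python
# Sketch of a central dispatch table for quantized linear; names are illustrative.
from typing import Callable, List, Tuple

import torch

# (dispatch_condition, implementation) pairs, checked in registration order.
_QUANTIZED_LINEAR_DISPATCH_TABLE: List[Tuple[Callable, Callable]] = []


def register_quantized_linear_dispatch(condition: Callable, impl: Callable) -> None:
    """Each layout/kernel file registers its path here, so a single table
    lists every kernel the quantized tensor supports for linear."""
    _QUANTIZED_LINEAR_DISPATCH_TABLE.append((condition, impl))


def quantized_linear(input, weight, bias=None):
    for condition, impl in _QUANTIZED_LINEAR_DISPATCH_TABLE:
        if condition(input, weight, bias):
            return impl(input, weight, bias)
    # Fallback: dequantize the weight and use the reference float path.
    return torch.nn.functional.linear(input, weight.dequantize(), bias)


# Example registration from a hypothetical plain int8 layout file: the condition
# inspects the weight's storage, the impl runs the actual int8 computation.
def _is_plain_int8(input, weight, bias):
    return getattr(weight, "int_data", None) is not None and weight.int_data.dtype == torch.int8


def _plain_int8_linear(input, weight, bias):
    # out = x @ (int_data * scale).T == (x @ int_data.T) * scale.T
    out = torch.mm(input, weight.int_data.t().to(input.dtype)) * weight.scale.t()
    return out if bias is None else out + bias


register_quantized_linear_dispatch(_is_plain_int8, _plain_int8_linear)
```

With this structure, adding a new kernel means adding one file containing a condition, an implementation, and one register_quantized_linear_dispatch call, instead of editing the large AffineQuantizedTensor file.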