[RFC] Follow Up for torchao developer experience discussion #1184

Open
jerryzh168 opened this issue Oct 28, 2024 · 0 comments
Context: We recently had a discussion about torchao developer experience in the torchao team meeting and collected some feedback from OSS contributors. The general sentiment is that AffineQuantizedTensor in its current state is hard to work with and extend. I had a few follow-ups to get people’s thoughts on how we can improve the torchao developer experience; here is a summary of responses to that feedback.

tldr; We won’t split AffineQuantizedTensor right now, but it’s fine to add a new tensor subclass for a new use case even if it would fit into AffineQuantizedTensor. We’ll revisit whether we need to split AQT once we understand the different extension points and the variations of quantization that people care about. We will also think about which utils make sense as we see more use cases (different training dtypes, composing with DTensor, etc.) being developed in torchao.

  • Based on the feedback I got internally and from OSS, at a high level it seems we should not make unifying everything in AffineQuantizedTensor a goal. AffineQuantizedTensor is currently designed for inference and uses one specific implementation of quantization, and there are many small and large differences once we expand the scope to training or optimizers (for example, the slight differences for int8 training: 1 and 2). It’s better not to design a general API at this moment, before we have seen more use cases and possible extension points; instead we should let people contribute new tensor subclasses to torchao (to lower the bar for contribution and optimize for developer velocity), and think about unification when this approach stops scaling and the duplication becomes a blocking issue. The good thing is that these are implementation details for users, so changing the implementation should not break BC. In the meantime we want to define clear public-facing APIs so that how tensor subclasses are implemented stays an implementation detail. (A minimal sketch of what adding a new tensor subclass can look like is included after this list.)

  • In terms of concrete plans around AffineQuantizedTensor:

    • For use cases already integrated with AffineQuantizedTensor, we’ll try to make them more readable and hackable. It’s OK to split them out if there is a clear benefit, but that is probably not a priority at this point
    • For new use cases, it’s fine for users to add new tensor subclasses even if they would fit into AffineQuantizedTensor, especially the ones dealing with training
    • People can also start with "General Guide on Extending torchao" in [RFC] torchao Contributor Guide #391
  • In the meantime, we can improve the readability of AffineQuantizedTensor. Here are some concrete items; @jainapurva will be working on these, and more feedback is welcome

    • [Done] The current Layout/LayoutType naming is confusing and does not correspond to PyTorch’s notion of layout

      • The current Layout class and aqt.layout_tensor correspond to TensorImpl in PyTorch
      • The current LayoutType class and aqt.layout_type correspond to tensor.layout in PyTorch
      • Rename them so the torchao names match the PyTorch concepts they correspond to (i.e. Layout → TensorImpl, LayoutType → Layout)
    • The AffineQuantizedTensor file is currently large, which can make onboarding harder since it’s not clear which code people need to understand in order to extend it. To keep all relevant things in one place, we can move both the layout definitions and the kernel dispatch code into a single file, and have a central place that imports everything and includes it in the same dispatch table, so that it’s clear which kernels AffineQuantizedTensor supports. This reduces the size of the file and makes it easier to find the important parts when adding a new layout or new kernel dispatches (see the dispatch table sketch after this list).

    • Some feedback that has already been addressed/fixed includes

  • We also want to think about what utils we can provide for training tensor subclasses and how we think about the training and inference story in torchao. This will be an ongoing effort throughout Q4; we have been thinking about how to generalize the existing DTensor support (tensor parallelism, FSDP, etc.) from float8, nf4 and mx to other dtype tensor subclasses as well
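
To make the “contribute a new tensor subclass” path above more concrete, here is a minimal sketch of an int8 weight-only tensor subclass that dequantizes on the fly via __torch_dispatch__. The class and helper names are illustrative only and are not torchao’s actual API.

```python
# Minimal sketch of a new weight-only int8 tensor subclass; names are
# illustrative, not torchao's actual API.
import torch
import torch.nn.functional as F
from torch.utils._pytree import tree_map


class Int8WeightTensor(torch.Tensor):
    """Stores an int8 payload plus a per-row scale; dequantizes on use."""

    @staticmethod
    def __new__(cls, int_data, scale):
        # Wrapper subclass: the outer tensor only carries shape/dtype/device metadata.
        return torch.Tensor._make_wrapper_subclass(
            cls, int_data.shape, dtype=scale.dtype, device=int_data.device
        )

    def __init__(self, int_data, scale):
        self.int_data = int_data
        self.scale = scale

    @classmethod
    def from_float(cls, w):
        # Symmetric per-row quantization of a 2D weight.
        scale = w.abs().amax(dim=1, keepdim=True) / 127.0
        int_data = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
        return cls(int_data, scale)

    def dequantize(self):
        return self.int_data.to(self.scale.dtype) * self.scale

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Preserve the subclass for ops that nn.Parameter and friends rely on.
        if func is torch.ops.aten.detach.default:
            (self,) = args
            return cls(self.int_data, self.scale)

        # Generic fallback: dequantize every subclass argument and run the op.
        def unwrap(t):
            return t.dequantize() if isinstance(t, cls) else t

        return func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs))


# Usage: the fallback path dequantizes, so the result matches the float
# reference up to int8 rounding error.
w = torch.randn(8, 16)
x = torch.randn(2, 16)
qw = Int8WeightTensor.from_float(w)
print((F.linear(x, qw) - F.linear(x, w)).abs().max())
```

A real kernel path would replace the generic dequantize-and-fallback branch with fused int8 ops; the point of the sketch is only to show how small a self-contained new subclass can be.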

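Similarly, here is a minimal sketch of the central dispatch table idea from the readability items above: each layout/kernel file registers a (condition, implementation) pair, and one table lists every kernel path the quantized tensor supports. The registry and function names are illustrative (they reuse the Int8WeightTensor fields from the previous sketch) and are not necessarily torchao’s actual API.

```python
# Sketch of a central dispatch table for quantized linear; names are illustrative.
from typing import Callable, List, Tuple

import torch

# (dispatch_condition, implementation) pairs, checked in registration order.
_QUANTIZED_LINEAR_DISPATCH_TABLE: List[Tuple[Callable, Callable]] = []


def register_quantized_linear_dispatch(condition: Callable, impl: Callable) -> None:
    """Each layout/kernel file registers its path here, so a single table
    lists every kernel the quantized tensor supports for linear."""
    _QUANTIZED_LINEAR_DISPATCH_TABLE.append((condition, impl))


def quantized_linear(input, weight, bias=None):
    for condition, impl in _QUANTIZED_LINEAR_DISPATCH_TABLE:
        if condition(input, weight, bias):
            return impl(input, weight, bias)
    # Fallback: dequantize the weight and use the reference float path.
    return torch.nn.functional.linear(input, weight.dequantize(), bias)


# Example registration from a hypothetical plain int8 layout file: the condition
# inspects the weight's storage, the impl runs the actual int8 computation.
def _is_plain_int8(input, weight, bias):
    return getattr(weight, "int_data", None) is not None and weight.int_data.dtype == torch.int8


def _plain_int8_linear(input, weight, bias):
    # out = x @ (int_data * scale).T == (x @ int_data.T) * scale.T
    out = torch.mm(input, weight.int_data.t().to(input.dtype)) * weight.scale.t()
    return out if bias is None else out + bias


register_quantized_linear_dispatch(_is_plain_int8, _plain_int8_linear)
```

With this structure, adding a new kernel means adding one file containing a condition, an implementation, and one register_quantized_linear_dispatch call, instead of editing the large AffineQuantizedTensor file.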