Use other model file based on Flux-schnell? #102
@puppyapple if you're up for installing a dev build, try:

```shell
pip install git+https://github.com/anthonywu/mflux.git@support-hf-models

mflux-generate \
--base-model schnell \
--steps 4 \
--model shuttleai/shuttle-3-diffusion \
--prompt "A dog holding a sign that says mflux rocks"
```

Make sure to accept the repo terms of service first at https://huggingface.co/shuttleai/shuttle-3-diffusion before trying to use mflux / huggingface-hub to download the models to disk. Sample from my local gen using the branch in #103:
@anthonywu for this syntax, how does quant work? For shuttle-3-diffusion, what's the difference between --quant 8 on the full model and cloning the fp8 version?
Here's my investigation. TL;DR at the end.

Download the official models:

```shell
# official bfloat16 model
huggingface-cli download shuttleai/shuttle-3-diffusion

# official fp8 model
huggingface-cli download shuttleai/shuttle-3-diffusion-fp8
```

Observe the model size on disk:

```shell
du -sh ~/.cache/huggingface/hub/models--shuttleai--shuttle-3-diffusion*
54G    ~/.cache/huggingface/hub/models--shuttleai--shuttle-3-diffusion
11G    ~/.cache/huggingface/hub/models--shuttleai--shuttle-3-diffusion-fp8
```

Use mflux to save a q8 model (this should work in the latest commit of PR #103):

```shell
mflux-save \
-m shuttleai/shuttle-3-diffusion \
--base-model schnell \
-q 8 \
--path /tmp/shuttle-3-diffusion-q8
```

This produces a local 17G converted model, compared to the 11G fp8 produced by their util: https://huggingface.co/shuttleai/shuttle-3-diffusion-fp8/blob/main/convert.py

```shell
du -sh /tmp/shuttle-3-diffusion-q8
17G    /tmp/shuttle-3-diffusion-q8
```
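To sanity-check what mflux-save wrote, you can list the shard files and read their safetensors metadata. This is just a quick sketch, assuming `pip install safetensors`; the shard layout under the output path is whatever mflux chose to write:

```python
# Hypothetical helper (not part of mflux): list the shards that mflux-save
# wrote and print the size and metadata of each one.
from pathlib import Path

from safetensors import safe_open

shard_dir = Path("/tmp/shuttle-3-diffusion-q8")
for shard in sorted(shard_dir.rglob("*.safetensors")):
    with safe_open(str(shard), framework="np") as f:
        meta = f.metadata()  # the __metadata__ dict from the safetensors header
    size_gib = shard.stat().st_size / 2**30
    print(f"{shard.relative_to(shard_dir)}  {size_gib:5.2f} GiB  {meta}")
```

For the mflux-converted model, each shard's metadata should carry a quantization_level tag, written by the save_weights code shown below.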
In mflux's ModelSaver, the caller is:

```python
def save_weights(base_path: str, bits: int, model: nn.Module, subdir: str):
    path = Path(base_path) / subdir
    path.mkdir(parents=True, exist_ok=True)
    weights = ModelSaver._split_weights(base_path, dict(tree_flatten(model.parameters())))
    for i, weight in enumerate(weights):
        # each shard records the quantization level in its safetensors metadata
        mx.save_safetensors(
            str(path / f"{i}.safetensors"),
            weight,
            {"quantization_level": str(bits)},
        )
```

So the difference between the official 11G fp8 version and the mflux-converted 17G q8 version comes down to how each one quantizes and saves the weights.

Now compare outputs.

Generating with official bfloat16:

```shell
mflux-generate \
--seed 42 \
--base-model schnell \
--model shuttleai/shuttle-3-diffusion \
--prompt "a cat holding a sign saying meow" \
--steps 4
```

Generating with mflux-converted q8:

```shell
mflux-generate \
--seed 42 \
--model schnell \
--path /tmp/shuttle-3-diffusion-q8 \
--prompt "a cat holding a sign saying meow" \
--steps 4
```

The two images look almost identical; watch for a diff in the lower right corner of the sign. On my M1 Max 64GB, generation time was 1m24s with q8 and 1m28s with bfloat16, so speed is almost identical too. My conclusion is that q8 isn't worth it just for the perf improvement, at least for now. I presume q8 would use less RAM, but I don't have time to instrument that right now (a rough way to measure it is sketched at the end of this comment).

Generating with upstream fp8:

```shell
mflux-generate \
--seed 42 \
--base-model schnell \
--model shuttleai/shuttle-3-diffusion-fp8 \
--prompt "a cat holding a sign saying meow" \
--steps 4
```

This doesn't actually work out of the box, because the HF repo for fp8 does not contain all the assets from the bfloat16 version! That's probably why it weighs in at 11G rather than closer to 17G, so a full repo may not be much different in size after all. mflux raises an error when it can't find those assets; if the official fp8 repo were laid out like the full bfloat16 version, then I think the HF model would just work. So as of this writing, the only way to get out-of-the-box q8 functionality is to use mflux to convert to q8 locally. This does create a todo: better display the errors for HF repos that do not include their full assets.
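If you want to see exactly which assets the fp8 repo is missing, comparing the two repos' file listings makes it obvious. A minimal sketch, assuming huggingface_hub is installed (it is, if huggingface-cli works) and that you've accepted both repos' terms:

```python
# Compare the file listings of the bfloat16 and fp8 repos to see which
# assets are absent from the fp8 repo.
from huggingface_hub import list_repo_files

full = set(list_repo_files("shuttleai/shuttle-3-diffusion"))
fp8 = set(list_repo_files("shuttleai/shuttle-3-diffusion-fp8"))

print("only in the bfloat16 repo:")
for name in sorted(full - fp8):
    print("  ", name)

print("only in the fp8 repo:")
for name in sorted(fp8 - full):
    print("  ", name)
```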
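And to follow up on the RAM question from earlier: here is a rough, hypothetical way to compare peak memory of the q8 and bfloat16 runs by polling the process RSS. It assumes `pip install psutil`, and RSS is only an approximation of what MLX actually allocates:

```python
# Rough peak-RSS probe for a mflux-generate run (not part of mflux).
# Swap the command-line arguments to compare the q8 and bfloat16 runs.
import subprocess
import time

import psutil

cmd = [
    "mflux-generate",
    "--seed", "42",
    "--model", "schnell",
    "--path", "/tmp/shuttle-3-diffusion-q8",
    "--prompt", "a cat holding a sign saying meow",
    "--steps", "4",
]
proc = subprocess.Popen(cmd)
ps = psutil.Process(proc.pid)
peak_rss = 0
while proc.poll() is None:
    try:
        peak_rss = max(peak_rss, ps.memory_info().rss)
    except psutil.NoSuchProcess:
        break
    time.sleep(0.5)
print(f"peak RSS: {peak_rss / 2**30:.1f} GiB")
```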
Thank you! Happy to confirm all this works as you say with the PR.
Thanks for the great work!
Recently I've seen new models like shuttleai/shuttle-3-diffusion on Hugging Face, which are fine-tuned on flux-schnell.
Is there any way to use mflux on these models?