v1.7
This release includes new features, improvements, and bug fixes.
New Features
- Added a benchmarking tool for measuring data loading performance with gcsfuse. (#863)
- Added a Prometheus server to the Latency Profile Generator (LPG) running on port 9090, along with new metrics for prompt_length, response_length, and time_per_output_token. (#857)
- Added support for Google Cloud Monitoring and Managed Collection for the gke-batch-refarch. (#856)
- Added a tutorial on packaging models and low-rank adapters (LoRA) from Hugging Face as images, pushing them to Artifact Registry, and deploying them in GKE. (#855)
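The LPG's new Prometheus server exposes metrics such as prompt_length, response_length, and time_per_output_token on port 9090. As a minimal sketch of consuming them, the snippet below parses Prometheus text exposition format; the sample payload and label values are illustrative, not actual LPG output.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition format into
    {metric_name: [(label_string, value), ...]}.
    Simplified sketch: ignores HELP/TYPE lines and assumes
    label values contain no spaces."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and HELP/TYPE metadata
        name_part, _, value = line.rpartition(" ")
        if "{" in name_part:
            name, labels = name_part.split("{", 1)
            labels = labels.rstrip("}")
        else:
            name, labels = name_part, ""
        metrics.setdefault(name, []).append((labels, float(value)))
    return metrics

# Illustrative sample resembling the metrics named in the release notes.
sample = """\
# HELP time_per_output_token Time per generated token in seconds
# TYPE time_per_output_token gauge
time_per_output_token{model="m1"} 0.042
prompt_length{model="m1"} 128
response_length{model="m1"} 256
"""
print(parse_metrics(sample)["prompt_length"])
```

In practice the payload would come from an HTTP GET against the LPG's metrics endpoint (by default Prometheus servers serve /metrics), or be scraped directly by a Prometheus server.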
Improvements
- Updated outdated references to the Text Generation Inference (TGI) container to use the Hugging Face Deep Learning Containers (DLCs) hosted on Google Cloud's Artifact Registry. (#816)
- Added the ability to benchmark multiple models concurrently in the LPG. (#850)
- Added support for an "inf" (infinity) value for the request rate and the number of prompts in the LPG. (#847)
- Fixed the latency_throughput_curve.sh script to correctly parse non-integer request rates and added "errors" to the benchmark results. (#850)
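The two LPG changes above mean request rates can now be non-integer values or the string "inf". A minimal sketch of such parsing (the function name is hypothetical, not from the LPG codebase):

```python
def parse_request_rate(value):
    """Parse a request-rate argument, accepting integers,
    non-integer values such as "2.5", and "inf" (case-insensitive)
    to mean an unbounded request rate."""
    if value.strip().lower() == "inf":
        return float("inf")
    rate = float(value)  # raises ValueError on malformed input
    if rate <= 0:
        raise ValueError("request rate must be positive")
    return rate
```

Treating "inf" as float("inf") lets downstream code keep a single numeric code path, since comparisons and min()/max() work naturally with infinity.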
Bug Fixes
- Fixed an issue where the README was not rendering correctly. (#862)
New Contributors
- @alvarobartt made their first contribution in #816
- @liu-cong made their first contribution in #850
- @coolkp made their first contribution in #855
- @JamesDuncanNz made their first contribution in #856
Full Changelog: v1.6...v1.7