v1.7

@roberthbailey roberthbailey released this 04 Nov 21:11
0127782

This release adds four new features, four improvements, and one bug fix.

New Features

  • Added a benchmarking tool for measuring data loading performance with gcsfuse. (#863)
  • Added a Prometheus server, running on port 9090, to the Latency Profile Generator (LPG), along with new metrics for prompt_length, response_length, and time_per_output_token. (#857)
  • Added support for Google Cloud Monitoring and Managed Collection for the gke-batch-refarch. (#856)
  • Added a tutorial on packaging models and low-rank adapters (LoRA) from Hugging Face as images, pushing them to Artifact Registry, and deploying them in GKE. (#855)

Improvements

  • Updated outdated references to the Text Generation Inference (TGI) container to use the Hugging Face Deep Learning Containers (DLCs) hosted on Google Cloud's Artifact Registry. (#816)
  • Added the ability to benchmark multiple models concurrently in the LPG. (#850)
  • Added support for "inf" (infinity) request rate and number of prompts in the LPG. (#847)
  • Fixed the latency_throughput_curve.sh script to correctly parse non-integer request rates and added "errors" to the benchmark results. (#850)
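
To illustrate the request-rate items above, here is a minimal Python sketch of how a rate given as text could accept integers, decimals, and "inf". It is not the LPG's actual code: the function names `parse_request_rate` and `mean_interval_seconds` are hypothetical, and treating an infinite rate as "no delay between requests" is an assumption about the intended behavior.

```python
import math


def parse_request_rate(value: str) -> float:
    """Parse a request rate given as text (hypothetical helper).

    Accepts integers ("4"), decimals ("0.5"), and "inf".
    """
    rate = float(value)  # float() understands "inf" as well as "4" and "0.5"
    if math.isnan(rate) or rate <= 0:
        raise ValueError(f"request rate must be positive, got {value!r}")
    return rate


def mean_interval_seconds(rate: float) -> float:
    """Average gap between requests; assumed to be zero for an infinite rate."""
    return 0.0 if math.isinf(rate) else 1.0 / rate


if __name__ == "__main__":
    for text in ("4", "0.5", "inf"):
        rate = parse_request_rate(text)
        print(text, rate, mean_interval_seconds(rate))
```

Parsing with `float` rather than `int` is what lets one code path handle both fractional rates and the infinite case.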

Bug Fixes

  • Fixed an issue where the README was not rendering correctly. (#862)

Full Changelog: v1.6...v1.7