Update README with the latest input variables (GoogleCloudPlatform#759)
achandrasekar authored Jul 30, 2024
1 parent 464a071 commit d57be9e
Showing 1 changed file with 20 additions and 11 deletions.
31 changes: 20 additions & 11 deletions benchmarks/inference-server/text-generation-inference/README.md
@@ -113,14 +113,23 @@ terraform apply
<!-- BEGIN TFDOC -->
## Variables

| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [credentials_config](variables.tf#L17) | Configure how Terraform authenticates to the cluster. | <code>object({…})</code> | ✓ | |
| [gpu_count](variables.tf#L55) | Tensor parallelism. This is the number of gpus the server should use. Note that Huggingface TGI server supports gpu_count equal to one of 1, 2, 4, or 8. | <code>number</code> | | <code>1</code> |
| [hugging_face_secret](variables.tf#L81) | Secret id in Secret Manager. Required if your model requires a Huggingface user access token. | <code>string</code> | | <code>null</code> |
| [hugging_face_secret_version](variables.tf#L88) | Secret version in Secret Manager. Required if your model requires a Huggingface user access token. | <code>string</code> | | <code>null</code> |
| [ksa](variables.tf#L62) | Kubernetes Service Account used for workload. | <code>string</code> | | <code>"default"</code> |
| [model_id](variables.tf#L48) | Model used for inference. | <code>string</code> | | <code>"tiiuae/falcon-7b"</code> |
| [namespace](variables.tf#L36) | Namespace to deploy server in. | <code>string</code> | | <code>"default"</code> |
<!-- END TFDOC -->
| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_credentials_config"></a> [credentials\_config](#input\_credentials\_config) | Configure how Terraform authenticates to the cluster. | <pre>object({<br> fleet_host = optional(string)<br> kubeconfig = optional(object({<br> context = optional(string)<br> path = optional(string, "~/.kube/config")<br> }))<br> })</pre> | n/a | yes |
| <a name="input_gpu_count"></a> [gpu\_count](#input\_gpu\_count) | Parallelism based on number of gpus. | `number` | `1` | no |
| <a name="input_hpa_averagevalue_target"></a> [hpa\_averagevalue\_target](#input\_hpa\_averagevalue\_target) | AverageValue target for the `hpa_type` metric. Must be set if `hpa_type` is not null. | `number` | `null` | no |
| <a name="input_hpa_max_replicas"></a> [hpa\_max\_replicas](#input\_hpa\_max\_replicas) | Maximum number of HPA replicas. | `number` | `5` | no |
| <a name="input_hpa_min_replicas"></a> [hpa\_min\_replicas](#input\_hpa\_min\_replicas) | Minimum number of HPA replicas. | `number` | `1` | no |
| <a name="input_hpa_type"></a> [hpa\_type](#input\_hpa\_type) | How the TGI workload should be scaled. | `string` | `null` | no |
| <a name="input_hugging_face_secret"></a> [hugging\_face\_secret](#input\_hugging\_face\_secret) | Secret id in Secret Manager | `string` | `null` | no |
| <a name="input_hugging_face_secret_version"></a> [hugging\_face\_secret\_version](#input\_hugging\_face\_secret\_version) | Secret version in Secret Manager | `string` | `null` | no |
| <a name="input_ksa"></a> [ksa](#input\_ksa) | Kubernetes Service Account used for workload. | `string` | `"default"` | no |
| <a name="input_max_concurrent_requests"></a> [max\_concurrent\_requests](#input\_max\_concurrent\_requests) | Max concurrent requests allowed for TGI to handle at once. TGI will drop all requests once it hits this max-concurrent-requests limit. | `number` | `128` | no |
| <a name="input_model_id"></a> [model\_id](#input\_model\_id) | Model used for inference. | `string` | `"tiiuae/falcon-7b"` | no |
| <a name="input_namespace"></a> [namespace](#input\_namespace) | Namespace used for TGI resources. | `string` | `"default"` | no |
| <a name="input_project_id"></a> [project\_id](#input\_project\_id) | Project id of existing or created project. | `string` | n/a | yes |
| <a name="input_quantization"></a> [quantization](#input\_quantization) | Quantization used for the model. Can be one of the quantization options mentioned in https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/launcher#quantize. `eetq` and `bitsandbytes` can be applied to any models whereas others might require the use of quantized checkpoints. | `string` | `""` | no |
| <a name="input_secret_templates_path"></a> [secret\_templates\_path](#input\_secret\_templates\_path) | Path where secret configuration manifest templates will be read from. Set to null to use the default manifests | `string` | `null` | no |
| <a name="input_templates_path"></a> [templates\_path](#input\_templates\_path) | Path where manifest templates will be read from. Set to null to use the default manifests | `string` | `null` | no |

<!-- END_TF_DOCS -->
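
For orientation, a minimal `terraform.tfvars` sketch that exercises the inputs documented above could look like the following. The project id, namespace, service account name, and HPA values are illustrative assumptions, not values taken from the module; `variables.tf` remains the authoritative definition of each input.

```hcl
# Hypothetical terraform.tfvars for this module. Every concrete value below
# (project id, namespace, KSA name, HPA numbers) is an assumed placeholder,
# not a recommendation from the module itself.

credentials_config = {
  kubeconfig = {
    path = "~/.kube/config" # matches the variable's documented default
  }
}

project_id = "my-gcp-project" # assumed placeholder

namespace = "tgi"    # namespace used for TGI resources
ksa       = "tgi-sa" # Kubernetes Service Account for the workload

model_id  = "tiiuae/falcon-7b" # the documented default model
gpu_count = 1                  # number of GPUs used for tensor parallelism

max_concurrent_requests = 128 # documented default request ceiling for TGI

# Optional autoscaling: the table notes that hpa_averagevalue_target must be
# set whenever hpa_type is non-null. The metric name and target below are
# hypothetical placeholders.
# hpa_type                = "cpu"
# hpa_averagevalue_target = 60
# hpa_min_replicas        = 1
# hpa_max_replicas        = 5
```

With a `terraform.tfvars` file like this in the module directory, `terraform plan` and `terraform apply` pick the values up automatically.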
