Update README with the latest input variables (GoogleCloudPlatform#759)
achandrasekar authored Jul 30, 2024
1 parent 464a071 commit d57be9e
Showing 1 changed file with 20 additions and 11 deletions.
31 changes: 20 additions & 11 deletions benchmarks/inference-server/text-generation-inference/README.md
@@ -113,14 +113,23 @@ terraform apply
<!-- BEGIN TFDOC -->
## Variables

| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [credentials_config](variables.tf#L17) | Configure how Terraform authenticates to the cluster. | <code>object({…})</code> | ✓ | |
| [gpu_count](variables.tf#L55) | Tensor parallelism. This is the number of gpus the server should use. Note that Huggingface TGI server supports gpu_count equal to one of 1, 2, 4, or 8. | <code>number</code> | | <code>1</code> |
| [hugging_face_secret](variables.tf#L81) | Secret id in Secret Manager. Required if your model requires a Huggingface user access token. | <code>string</code> | | <code>null</code> |
| [hugging_face_secret_version](variables.tf#L88) | Secret version in Secret Manager. Required if your model requires a Huggingface user access token. | <code>string</code> | | <code>null</code> |
| [ksa](variables.tf#L62) | Kubernetes Service Account used for workload. | <code>string</code> | | <code>"default"</code> |
| [model_id](variables.tf#L48) | Model used for inference. | <code>string</code> | | <code>"tiiuae/falcon-7b"</code> |
| [namespace](variables.tf#L36) | Namespace to deploy server in. | <code>string</code> | | <code>"default"</code> |
<!-- END TFDOC -->
| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_credentials_config"></a> [credentials\_config](#input\_credentials\_config) | Configure how Terraform authenticates to the cluster. | <pre>object({<br> fleet_host = optional(string)<br> kubeconfig = optional(object({<br> context = optional(string)<br> path = optional(string, "~/.kube/config")<br> }))<br> })</pre> | n/a | yes |
| <a name="input_gpu_count"></a> [gpu\_count](#input\_gpu\_count) | Parallelism based on number of gpus. | `number` | `1` | no |
| <a name="input_hpa_averagevalue_target"></a> [hpa\_averagevalue\_target](#input\_hpa\_averagevalue\_target) | AverageValue target for the `hpa_type` metric. Must be set if `hpa_type` is not null. | `number` | `null` | no |
| <a name="input_hpa_max_replicas"></a> [hpa\_max\_replicas](#input\_hpa\_max\_replicas) | Maximum number of HPA replicas. | `number` | `5` | no |
| <a name="input_hpa_min_replicas"></a> [hpa\_min\_replicas](#input\_hpa\_min\_replicas) | Minimum number of HPA replicas. | `number` | `1` | no |
| <a name="input_hpa_type"></a> [hpa\_type](#input\_hpa\_type) | How the TGI workload should be scaled. | `string` | `null` | no |
| <a name="input_hugging_face_secret"></a> [hugging\_face\_secret](#input\_hugging\_face\_secret) | Secret id in Secret Manager | `string` | `null` | no |
| <a name="input_hugging_face_secret_version"></a> [hugging\_face\_secret\_version](#input\_hugging\_face\_secret\_version) | Secret version in Secret Manager | `string` | `null` | no |
| <a name="input_ksa"></a> [ksa](#input\_ksa) | Kubernetes Service Account used for workload. | `string` | `"default"` | no |
| <a name="input_max_concurrent_requests"></a> [max\_concurrent\_requests](#input\_max\_concurrent\_requests) | Max concurrent requests allowed for TGI to handle at once. TGI will drop all requests once it hits this max-concurrent-requests limit. | `number` | `128` | no |
| <a name="input_model_id"></a> [model\_id](#input\_model\_id) | Model used for inference. | `string` | `"tiiuae/falcon-7b"` | no |
| <a name="input_namespace"></a> [namespace](#input\_namespace) | Namespace used for TGI resources. | `string` | `"default"` | no |
| <a name="input_project_id"></a> [project\_id](#input\_project\_id) | Project id of existing or created project. | `string` | n/a | yes |
| <a name="input_quantization"></a> [quantization](#input\_quantization) | Quantization used for the model. Can be one of the quantization options mentioned in https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/launcher#quantize. `eetq` and `bitsandbytes` can be applied to any models whereas others might require the use of quantized checkpoints. | `string` | `""` | no |
| <a name="input_secret_templates_path"></a> [secret\_templates\_path](#input\_secret\_templates\_path) | Path where secret configuration manifest templates will be read from. Set to null to use the default manifests | `string` | `null` | no |
| <a name="input_templates_path"></a> [templates\_path](#input\_templates\_path) | Path where manifest templates will be read from. Set to null to use the default manifests | `string` | `null` | no |

<!-- END_TF_DOCS -->
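
For orientation, a minimal `terraform.tfvars` sketch that exercises the inputs documented above could look like the following. The project id, namespace, service account name, and HPA values are illustrative assumptions, not values taken from the module; `variables.tf` remains the authoritative definition of each input.

```hcl
# Hypothetical terraform.tfvars for this module. Every concrete value below
# (project id, namespace, KSA name, HPA numbers) is an assumed placeholder,
# not a recommendation from the module itself.

credentials_config = {
  kubeconfig = {
    path = "~/.kube/config" # matches the variable's documented default
  }
}

project_id = "my-gcp-project" # assumed placeholder

namespace = "tgi"    # namespace used for TGI resources
ksa       = "tgi-sa" # Kubernetes Service Account for the workload

model_id  = "tiiuae/falcon-7b" # the documented default model
gpu_count = 1                  # number of GPUs used for tensor parallelism

max_concurrent_requests = 128 # documented default request ceiling for TGI

# Optional autoscaling: the table notes that hpa_averagevalue_target must be
# set whenever hpa_type is non-null. The metric name and target below are
# hypothetical placeholders.
# hpa_type                = "cpu"
# hpa_averagevalue_target = 60
# hpa_min_replicas        = 1
# hpa_max_replicas        = 5
```

With a `terraform.tfvars` file like this in the module directory, `terraform plan` and `terraform apply` pick the values up automatically.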
