This module deploys and configures the Kube-Prometheus Stack inside a Kubernetes Cluster.
Name | Version |
---|---|
terraform | >= 1.0 |
helm | >= 2.0.0 |
Name | Version |
---|---|
helm | >= 2.0.0 |
kubernetes | n/a |
Name | Description | Type | Default | Required |
---|---|---|---|---|
chart_version | Version of the Helm chart | any | n/a | yes |
helm_namespace | The namespace Helm will install the chart under | any | n/a | yes |
cluster_domain | Cluster domain for DestinationRules | string | "cluster.local" | no |
destinationrules_labels | Labels applied to DestinationRules | map(string) | {} | no |
destinationrules_mode | DestinationRule TLS mode | string | "DISABLE" | no |
enable_destinationrules | Creates DestinationRules for Prometheus, Alertmanager, Grafana, and Node Exporters | bool | false | no |
enable_prometheusrules | Adds PrometheusRules for alerts | bool | true | no |
helm_release | The name of the Helm release | string | "kube-prometheus-stack" | no |
helm_repository | The repository where the Helm chart is stored | string | "https://prometheus-community.github.io/helm-charts" | no |
helm_repository_password | The password of the repository where the Helm chart is stored | string | "" | no |
helm_repository_username | The username of the repository where the Helm chart is stored | string | "" | no |
prometheus_pvc_name | Used for storage alert. Set if using non-default helm_release | string | "prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0" | no |
values | Values to be passed to the Helm chart | string | "" | no |
alertmanager_replicas | Number of replicas for Alertmanager | number | 1 | no |
Name | Description |
---|---|
helm_namespace | n/a |
helm_release | The name of the Helm release. For use by external ServiceMonitors |
status | n/a |
module "helm_kube_prometheus_stack" {
  source = "git::https://github.com/canada-ca-terraform-modules/terraform-kubernetes-kube-prometheus-stack?ref=v3.3.0"
  chart_version = "43.3.0"
  depends_on = [
    module.namespace_monitoring,
  ]
  helm_namespace  = module.namespace_monitoring.name
  helm_release    = "kube-prometheus-stack"
  helm_repository = "https://prometheus-community.github.io/helm-charts"
  enable_destinationrules = true
  values = <<EOF
EOF
}
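
The `helm_release` output is described above as being for use by external ServiceMonitors. The following is a minimal sketch of that use, assuming the chart's default behaviour of selecting ServiceMonitors by a `release` label equal to the Helm release name. The resource name, the `example` namespace, the `app` label, and the `metrics` port name are all hypothetical.

```hcl
# Hypothetical ServiceMonitor for an application deployed outside this module.
resource "kubernetes_manifest" "example_app_servicemonitor" {
  manifest = {
    apiVersion = "monitoring.coreos.com/v1"
    kind       = "ServiceMonitor"
    metadata = {
      name      = "example-app"
      namespace = "example"
      labels = {
        # Assumption: Prometheus only picks up ServiceMonitors whose
        # "release" label matches the Helm release name.
        release = module.helm_kube_prometheus_stack.helm_release
      }
    }
    spec = {
      # Selects the Service(s) to scrape; the label is illustrative.
      selector = {
        matchLabels = {
          app = "example-app"
        }
      }
      endpoints = [
        {
          port = "metrics"
        }
      ]
    }
  }
}
```

If the Prometheus ServiceMonitor selector has been customized through Helm values, the label would need to match that selector instead.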
To upgrade an existing Helm release created from the previous module instead of reinstalling into a new Helm release, set `helm_release` to `"prometheus-operator"`. This will persist Helm release history and some temporary data, but may result in resource name and label aberrations.
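
As a sketch of the in-place upgrade described above, the module call might look like the following. The source reference, chart version, and namespace module are taken from the Usage example and may differ in your configuration.

```hcl
module "helm_kube_prometheus_stack" {
  source = "git::https://github.com/canada-ca-terraform-modules/terraform-kubernetes-kube-prometheus-stack?ref=v3.3.0"

  chart_version  = "43.3.0"
  helm_namespace = module.namespace_monitoring.name

  # Reuse the release name created by the previous module so that Helm
  # upgrades the existing release instead of installing a new one.
  helm_release = "prometheus-operator"

  # Note: the Inputs table says prometheus_pvc_name should be set when a
  # non-default helm_release is used. The correct value depends on the PVC
  # that exists in your cluster, so it is deliberately not shown here.
}
```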
It is alternatively possible to reinstall into a new release while persisting existing data in Persistent Volumes from the previous module. This process involves downtime and does not guarantee data compatibility. A guide is available here. Note that there are further steps if multiple components (e.g. both Prometheus and Grafana) were configured with Persistent Volume storage. Their Persistent Volumes will need to be given different labels, and the components' `volumeClaimTemplate`s (defined in Helm values) will need to be given corresponding selectors.
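
To illustrate the selector step, here is a hedged sketch for Prometheus only. It assumes the existing Persistent Volume has already been given the label `component: prometheus`; that label, the access mode, and the storage size are hypothetical and must match your existing volume. Grafana's persistence is configured under different Helm values and is not shown.

```hcl
module "helm_kube_prometheus_stack" {
  source = "git::https://github.com/canada-ca-terraform-modules/terraform-kubernetes-kube-prometheus-stack?ref=v3.3.0"

  chart_version  = "43.3.0"
  helm_namespace = module.namespace_monitoring.name

  values = <<EOF
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]   # hypothetical; match the existing volume
          resources:
            requests:
              storage: 50Gi                # hypothetical; match the existing volume
          # Binds the claim only to Persistent Volumes carrying this label,
          # so each component claims its own volume.
          selector:
            matchLabels:
              component: prometheus        # hypothetical label
EOF
}
```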
Date | Release | Change |
---|---|---|
2021-03-26 | v1.0.0 | 1st release |
2021-07-05 | v1.1.0 | 1st set of general project alerts |
2021-09-07 | v1.1.1 | CompletedJobsNotCleared scope set to project |
2022-03-16 | v2.0.0 | Convert DestinationRules and PrometheusRules to `kubernetes_manifest`s. Updates for Terraform v1 and nomenclature |
2022-07-28 | v2.0.1 | PrometheusRule severity label updates |
2022-08-10 | v2.0.2 | Refactor the threshold for the VeleroHourlyBackupPartialFailure & VeleroHourlyBackupFailure alerts |
2022-08-10 | v2.0.3 | Create the NodeDiskMayFillIn60Hours alert |
2022-08-10 | v2.0.4 | Delete the ManyAlertsFiring & ManyManyAlertsFiring alerts |
2022-08-19 | v2.0.5 | Create the VeleroBackupTakingLongTime alert |
2022-08-22 | v2.0.6 | Fix the VeleroBackupTakingLongTime alert severity level |
2022-08-31 | v2.0.7 | Update nodepool pod capacity alerts and remove unused recording rule |
2022-09-02 | v2.0.8 | Update threshold for when to expect a backup for the VeleroBackupTakingLongTime alert |
2022-11-04 | v2.1.0 | Add several alerts and associated test cases regarding cert manager certificates |
2022-11-08 | v2.1.1 | Adjust ContainerWaiting alert duration to align with PodNotReady |
2022-11-16 | v2.1.2 | Fix node and nodepool pod capacity, NodePodsFull, and NodeReachingPodCapacity alerts |
2022-11-24 | v2.2.0 | Add alert: PrometheusDiskMayFillIn60Hours |
2022-12-06 | v2.3.0 | Add alert: NodeReadinessFlapping |
2022-12-15 | v2.3.1 | Fix the NodeUnschedulable alert severity level |
2023-01-04 | v3.0.0 | Refactor general cluster and namespace alerts. enable_prometheusrules false->true. Removes variables: prometheusrules_labels, cluster_rules_name, namespace_rules_name, cert_manager_rules_name |
2023-01-09 | v3.1.0 | Add runbook links to Prometheus rules |
2023-01-11 | v3.1.1 | Fix ManyContainerRestarts alert to account for multiple metrics sources |
2023-02-01 | v3.2.0 | Node clock alerts and README update |
2023-02-03 | v3.2.1 | Specify sensitive variables |
2023-02-08 | v3.3.0 | Add ability to add DestinationRule for Alertmanager replicas |
2023-02-16 | v3.4.0 | Add rules for CoreDNS alerts |
2023-03-10 | v3.4.1 | Fix syntax error in CoreDNS alert rules |
2023-03-14 | v3.5.0 | Add rule for ContainerImagePullProblem, refactor container alert unit tests |
2023-03-15 | v3.6.0 | Add DestinationRule for Thanos Sidecar |
2023-03-28 | v3.7.0 | Add generic PVC alerts |
2023-04-05 | v3.8.0 | Add "cluster" to Prometheus rule aggregations for compatibility with Thanos. Add Prometheus heartbeat recording rule |
2023-04-19 | v3.8.1 | Fix CoreDNSDown alert |
2023-04-21 | v3.8.2 | Ensure prometheus heartbeat recording rule is evaluated by Prometheus |
2023-05-04 | v3.8.3 | Fix ContainerImagePullProblem flapping |
2023-06-08 | v3.9.0 | Ignore terminated pods in pod capacity alerts |
2023-06-19 | v3.9.1 | Fix PersistentVolume status alerts |
2023-12-07 | v3.9.2 | Adjust node alerts for clock synchronization |
2024-02-29 | v3.9.3 | Adjust Node and PVC storage alerts |
2024-04-15 | v3.9.4 | Adjust Node alerts, report agentpool, standardize node label |
2024-05-31 | v3.9.5 | Update container alerts |
2024-09-09 | v3.9.6 | Debounce ContainerCrashLooping |
2024-12-03 | v3.9.7 | Add NodeDiskFull and fix/refactor some node alerts |
- Note that in Usage the `dependencies` array has been replaced by the `depends_on` array.
- If `enable_destinationrules` was `true` in v1.x, locate the DestinationRules that were created in `helm_namespace`. There should be 4, corresponding to Prometheus, Alertmanager, Grafana, and the Prometheus Node Exporter. Delete them prior to the upgrade. If `enable_destinationrules` remains `true`, they will be recreated with minimal downtime.
- If `enable_prometheusrules` was `true` in v1.x, locate the PrometheusRule definitions that were created in `helm_namespace`. There should be 2: `general-platform-alerts` and `general-project-alerts`. Delete them prior to the upgrade. If `enable_prometheusrules` remains `true`, they will be recreated. This may resolve any presently firing alerts. If it does, they will fire again once their conditions are met.
  - The default names for these PrometheusRule resources are now `general-cluster-alerts` and `general-namespace-alerts`. The scopes have changed from `platform` to `cluster` and from `project` to `namespace`. Adjust Alertmanager routing criteria accordingly.
  - The severities for these rules have been adjusted from `minor/major/urgent` to `debug/minor/major`. Adjust Alertmanager routing criteria accordingly.
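
The routing adjustment mentioned in the notes above might look roughly like the following sketch. It assumes the alerts carry `scope` and `severity` labels and that Alertmanager is configured through the chart's `alertmanager.config` values. The receiver names are hypothetical, and real receivers would need notification settings that are omitted here.

```hcl
module "helm_kube_prometheus_stack" {
  source = "git::https://github.com/canada-ca-terraform-modules/terraform-kubernetes-kube-prometheus-stack?ref=v3.3.0"

  chart_version  = "43.3.0"
  helm_namespace = module.namespace_monitoring.name

  values = <<EOF
alertmanager:
  config:
    route:
      receiver: "null"
      routes:
        # Previously matched scope="platform" and severity="urgent".
        - receiver: "cluster-team"        # hypothetical receiver
          matchers:
            - scope="cluster"
            - severity="major"
        # Previously matched scope="project".
        - receiver: "namespace-team"      # hypothetical receiver
          matchers:
            - scope="namespace"
    receivers:
      - name: "null"
      - name: "cluster-team"
      - name: "namespace-team"
EOF
}
```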
This module replaces terraform-kubernetes-prometheus. The previous module used the custom chart prometheus-operator, which used the now-deprecated upstream chart prometheus-operator as a sub-chart and added DestinationRules.
This new module uses the new upstream chart kube-prometheus-stack directly. DestinationRules, as well as a set of general alerts, can be added through the module.
To migrate from the old custom chart to the new upstream chart, the following changes should be made to Helm values:
- Remove the top-level `prometheus-operator:` and realign indentation, as you are no longer applying values to a subchart.
- Remove any `destinationRule:` specification and its contents, as this is now handled by Terraform variables.
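
The two changes above can be sketched as follows. The old values shown in the comment are illustrative only: the position and contents of `destinationRule:` in the old custom chart are an assumption, and `retention: 30d` is a placeholder setting.

```hcl
module "helm_kube_prometheus_stack" {
  source = "git::https://github.com/canada-ca-terraform-modules/terraform-kubernetes-kube-prometheus-stack?ref=v3.3.0"

  chart_version  = "43.3.0"
  helm_namespace = module.namespace_monitoring.name

  # Takes the place of any destinationRule: values from the old chart.
  enable_destinationrules = true

  # Old values for the custom chart, for comparison (hypothetical layout):
  #
  #   prometheus-operator:
  #     prometheus:
  #       destinationRule:
  #         enabled: true
  #       prometheusSpec:
  #         retention: 30d
  #
  # New values: the top-level key is gone, indentation is shifted left by
  # one level, and the destinationRule block is removed.
  values = <<EOF
prometheus:
  prometheusSpec:
    retention: 30d
EOF
}
```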
The upstream `prometheus-operator` chart was renamed to `kube-prometheus-stack` to reflect that additional components beyond the Prometheus Operator are installed.