
KEDA : Client rate limit issues #6359

Open
Sathyam-Hotstar opened this issue Nov 25, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@Sathyam-Hotstar

Sathyam-Hotstar commented Nov 25, 2024

Report

Running into multiple "client rate limiter Wait returned an error" issues in keda-operator. We have around 50-100 ScaledObjects across various EKS clusters, and this problem has only been seen since we upgraded KEDA from 2.13.0 to 2.15.1.

The number of ScaledObjects varies across our EKS clusters, and as suggested in previous threads we had already increased the QPS values, but since the upgrade we are getting frequent errors.

Our K8s client config is as follows:

  • kube-api-qps: 35
  • kube-api-burst: 70
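
For reference, these values are passed to keda-operator as command-line flags. A minimal sketch of the relevant Deployment args (excerpt only; flag names should be verified against the KEDA version in use):

containers:
  - name: keda-operator
    args:
      - --kube-api-qps=35    # client QPS towards the K8s API server
      - --kube-api-burst=70  # client burst towards the K8s API server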

I want to understand why KEDA sends so many concurrent requests to the Kubernetes API server. Since we do not have a very large number of ScaledObjects and the default pollingInterval of 30s is used, why is the KEDA client rate-limiting the requests? Are KEDA retries causing this, or is it something else?
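
For context, the polling interval can also be overridden per ScaledObject; a minimal sketch (object name is illustrative only, we rely on the default value):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: example-app      # illustrative name only
spec:
  pollingInterval: 30    # seconds; 30 is the default we rely on
  scaleTargetRef:
    name: example-app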

Expected Behavior

We should not run into client rate limit issues.

Actual Behavior

We are getting error messages in keda-operator logs for client rate limit issues.

Steps to Reproduce the Problem

Observed in clusters with around 50-100 ScaledObjects, KEDA version 2.15.1 and EKS version 1.31.

Logs from KEDA operator

2024-11-25T11:24:51Z	ERROR	prometheus_scaler	error executing prometheus query	{"type": "ScaledObject", "namespace": "internal-namespace", "name": "internal-app-1", "error": "Get \"http://vmselect.internal-endpoint.com:8481/select/0/prometheus/api/v1/query?query=sum%28sum%28rate%28envoy_http_downstream_rq_total%7Benvoy_http_conn_manager_prefix%3D~%22ingress_http%7Cingress_https%22%2Csource_cluster%3D%22eks-cluster-1%22%2C+namespace%3D%22internal-namespace%22%2C+service%3D%22internal-app-1%22%2C+pod%3D~%27internal-app-1.%2A%27%7D%5B1m%5D%29%2A60%29+by+%28pod%2C+namespace%29+%2A+ignoring%28pod%2C+namespace%29+group_left%28%29+max%281+%2B+max%28request_buffer_per_datacenter%7Bdatacenter%3D%22sgp%22%7D+or+request_buffer_per_service%7Bdatacenter%3D%22sgp%22%2C+exported_namespace%3D%22internal-namespace%22%2C+exported_service%3D%22internal-app-1%22%7D%29%29+or+sum%28rate%28envoy_http_downstream_rq_total%7Benvoy_http_conn_manager_prefix%3D~%22ingress_http%7Cingress_https%22%2Csource_cluster%3D%22eks-cluster-1%22%2C+namespace%3D%22internal-namespace%22%2C+service%3D%22internal-app-1%22%2C+pod%3D~%27internal-app-1.%2A%27%7D%5B1m%5D%29%2A60%29+by+%28pod%2C+namespace%29%29&time=2024-11-25T11:24:51Z\": context canceled"}
2024-11-25T11:24:51Z	ERROR	scale_handler	error getting scale decision	{"scaledObject.Namespace": "internal-namespace", "scaledObject.Name": "internal-app-1", "scaler": "prometheusScaler", "error": "scaler with id 0 not found, len = 0, cache has been probably already invalidated"}
2024-11-25T11:24:51Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-1", "scaledObject.Namespace": "internal-namespace", "scaleTarget.Name": "internal-app-1", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:51Z	ERROR	scaleexecutor	error setting ready condition	{"scaledobject.Name": "internal-app-1", "scaledObject.Namespace": "internal-namespace", "scaleTarget.Name": "internal-app-1", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:51Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-1", "scaledObject.Namespace": "internal-namespace", "scaleTarget.Name": "internal-app-1", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:51Z	ERROR	scaleexecutor	Error setting active condition when triggers are not active	{"scaledobject.Name": "internal-app-1", "scaledObject.Namespace": "internal-namespace", "scaleTarget.Name": "internal-app-1", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:51Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace", "scaledObject.Name": "internal-app-1", "scaler": "prometheusScaler", "error": "scaler with id 0 not found. Len = 0"}
2024-11-25T11:24:51Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace", "scaledObject.Name": "internal-app-1", "scaler": "cpuMemoryScaler", "error": "scaler with id 1 not found. Len = 0"}
2024-11-25T11:24:52Z	ERROR	prometheus_scaler	error executing prometheus query	{"type": "ScaledObject", "namespace": "internal-namespace-2", "name": "internal-app-2", "error": "Get \"http://vmselect.internal-endpoint.com:8481/select/0/prometheus/api/v1/query?query=%28sum%28sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_2_%28%28default%29%7C%28headless%29%29_internal-namespace-2.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+%2B+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_2_%28%28default%29%7C%28headless%29%29_internal-namespace-2.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_2_%28%28default%29%7C%28headless%29%29_internal-namespace-2.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_2_%28%28default%29%7C%28headless%29%29_internal-namespace-2.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29%29%29+%2A+max%281%2B+max%28request_buffer_per_datacenter%7Bdatacenter%3D%22sgp%22%7D+or+request_buffer_per_service%7Bdatacenter%3D%22sgp%22%2C+exported_namespace%3D%22internal-namespace-2%22%2C+exported_service%3D%22internal-app-2%22%7D%29%29+by+%28namespace%2C+service%29&time=2024-11-25T11:24:51Z\": context canceled"}
2024-11-25T11:24:52Z	ERROR	scale_handler	error getting scale decision	{"scaledObject.Namespace": "internal-namespace-2", "scaledObject.Name": "internal-app-2", "scaler": "prometheusScaler", "error": "scaler with id 2 not found, len = 0, cache has been probably already invalidated"}
2024-11-25T11:24:52Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-2", "scaledObject.Namespace": "internal-namespace-2", "scaleTarget.Name": "internal-app-2", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:52Z	ERROR	scaleexecutor	error setting ready condition	{"scaledobject.Name": "internal-app-2", "scaledObject.Namespace": "internal-namespace-2", "scaleTarget.Name": "internal-app-2", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:52Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-2", "scaledObject.Namespace": "internal-namespace-2", "scaleTarget.Name": "internal-app-2", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:52Z	ERROR	scaleexecutor	Error setting active condition when triggers are not active	{"scaledobject.Name": "internal-app-2", "scaledObject.Namespace": "internal-namespace-2", "scaleTarget.Name": "internal-app-2", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:52Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace-2", "scaledObject.Name": "internal-app-2", "scaler": "cpuMemoryScaler", "error": "scaler with id 0 not found. Len = 0"}
2024-11-25T11:24:52Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace-2", "scaledObject.Name": "internal-app-2", "scaler": "cpuMemoryScaler", "error": "scaler with id 1 not found. Len = 0"}
2024-11-25T11:24:53Z	ERROR	prometheus_scaler	error executing prometheus query	{"type": "ScaledObject", "namespace": "internal-namespace-3", "name": "internal-app-3", "error": "Get \"http://vmselect.internal-endpoint.com:8481/select/0/prometheus/api/v1/query?query=%28sum%28sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_3_%28%28default%29%7C%28headless%29%29_internal-namespace-3.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+%2B+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_3_%28%28default%29%7C%28headless%29%29_internal-namespace-3.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_3_%28%28default%29%7C%28headless%29%29_internal-namespace-3.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_3_%28%28default%29%7C%28headless%29%29_internal-namespace-3.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29%29%29+%2A+max%281%2B+max%28request_buffer_per_datacenter%7Bdatacenter%3D%22sgp%22%7D+or+request_buffer_per_service%7Bdatacenter%3D%22sgp%22%2C+exported_namespace%3D%22internal-namespace-3%22%2C+exported_service%3D%22internal-app-3%22%7D%29%29+by+%28namespace%2C+service%29&time=2024-11-25T11:24:53Z\": context canceled"}
2024-11-25T11:24:53Z	ERROR	scale_handler	error getting scale decision	{"scaledObject.Namespace": "internal-namespace-3", "scaledObject.Name": "internal-app-3", "scaler": "prometheusScaler", "error": "scaler with id 2 not found, len = 0, cache has been probably already invalidated"}
2024-11-25T11:24:53Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-3", "scaledObject.Namespace": "internal-namespace-3", "scaleTarget.Name": "internal-app-3", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:53Z	ERROR	scaleexecutor	error setting ready condition	{"scaledobject.Name": "internal-app-3", "scaledObject.Namespace": "internal-namespace-3", "scaleTarget.Name": "internal-app-3", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:53Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-3", "scaledObject.Namespace": "internal-namespace-3", "scaleTarget.Name": "internal-app-3", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:53Z	ERROR	scaleexecutor	Error setting active condition when triggers are not active	{"scaledobject.Name": "internal-app-3", "scaledObject.Namespace": "internal-namespace-3", "scaleTarget.Name": "internal-app-3", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:53Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace-3", "scaledObject.Name": "internal-app-3", "scaler": "cpuMemoryScaler", "error": "scaler with id 0 not found. Len = 0"}
2024-11-25T11:24:53Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace-3", "scaledObject.Name": "internal-app-3", "scaler": "cpuMemoryScaler", "error": "scaler with id 1 not found. Len = 0"}
2024-11-25T11:24:54Z	ERROR	prometheus_scaler	error executing prometheus query	{"type": "ScaledObject", "namespace": "internal-namespace-4", "name": "internal-app-4", "error": "Get \"http://vmselect.internal-endpoint.com:8481/select/0/prometheus/api/v1/query?query=%28sum%28sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_4_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+%2B+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_4_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_4_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_4_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29%29%29+%2A+max%281%2B+max%28request_buffer_per_datacenter%7Bdatacenter%3D%22sgp%22%7D+or+request_buffer_per_service%7Bdatacenter%3D%22sgp%22%2C+exported_namespace%3D%22internal-namespace-4%22%2C+exported_service%3D%22internal-app-4%22%7D%29%29+by+%28namespace%2C+service%29&time=2024-11-25T11:24:53Z\": context canceled"}
2024-11-25T11:24:54Z	ERROR	scale_handler	error getting scale decision	{"scaledObject.Namespace": "internal-namespace-4", "scaledObject.Name": "internal-app-4", "scaler": "prometheusScaler", "error": "scaler with id 2 not found, len = 0, cache has been probably already invalidated"}
2024-11-25T11:24:54Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-4", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z	ERROR	scaleexecutor	error setting ready condition	{"scaledobject.Name": "internal-app-4", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-4", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z	ERROR	scaleexecutor	Error setting active condition when triggers are not active	{"scaledobject.Name": "internal-app-4", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace-4", "scaledObject.Name": "internal-app-4", "scaler": "cpuMemoryScaler", "error": "scaler with id 0 not found. Len = 0"}
2024-11-25T11:24:54Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace-4", "scaledObject.Name": "internal-app-4", "scaler": "cpuMemoryScaler", "error": "scaler with id 1 not found. Len = 0"}
2024-11-25T11:24:54Z	ERROR	prometheus_scaler	error executing prometheus query	{"type": "ScaledObject", "namespace": "internal-namespace-4", "name": "internal-app-4-p0", "error": "Get \"http://vmselect.internal-endpoint.com:8481/select/0/prometheus/api/v1/query?query=%28sum%28sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_4_p0_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+%2B+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_4_p0_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_total%7Benvoy_cluster_name%3D~%27internal_app_4_p0_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29+or+sum%28rate%28envoy_cluster_upstream_rq_5xx%7Benvoy_cluster_name%3D~%27internal_app_4_p0_%28%28default%29%7C%28headless%29%29_internal-namespace-4.%2A%27%2C+source_cluster%3D%22eks-cluster-1%22%2C+namespace%3D~%22%28infrastructure%7Cinternal-namespace%29%22%7D%5B1m%5D%29%29+by+%28envoy_cluster_name%29%29%29+%2A+max%281%2B+max%28request_buffer_per_datacenter%7Bdatacenter%3D%22sgp%22%7D+or+request_buffer_per_service%7Bdatacenter%3D%22sgp%22%2C+exported_namespace%3D%22internal-namespace-4%22%2C+exported_service%3D%22internal-app-4-p0%22%7D%29%29+by+%28namespace%2C+service%29&time=2024-11-25T11:24:54Z\": context canceled"}
2024-11-25T11:24:54Z	ERROR	scale_handler	error getting scale decision	{"scaledObject.Namespace": "internal-namespace-4", "scaledObject.Name": "internal-app-4-p0", "scaler": "prometheusScaler", "error": "scaler with id 2 not found, len = 0, cache has been probably already invalidated"}
2024-11-25T11:24:54Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-4-p0", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4-p0", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z	ERROR	scaleexecutor	error setting ready condition	{"scaledobject.Name": "internal-app-4-p0", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4-p0", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z	ERROR	scaleexecutor	failed to patch Objects	{"scaledobject.Name": "internal-app-4-p0", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4-p0", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z	ERROR	scaleexecutor	Error setting active condition when triggers are not active	{"scaledobject.Name": "internal-app-4-p0", "scaledObject.Namespace": "internal-namespace-4", "scaleTarget.Name": "internal-app-4-p0", "error": "client rate limiter Wait returned an error: context canceled"}
2024-11-25T11:24:54Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace-4", "scaledObject.Name": "internal-app-4-p0", "scaler": "cpuMemoryScaler", "error": "scaler with id 0 not found. Len = 0"}
2024-11-25T11:24:54Z	ERROR	scale_handler	error getting metric spec for the scaler	{"scaledObject.Namespace": "internal-namespace-4", "scaledObject.Name": "internal-app-4-p0", "scaler": "cpuMemoryScaler", "error": "scaler with id 1 not found. Len = 0"}

KEDA Version

2.15.1

Kubernetes Version

1.31

Platform

Amazon Web Services

Scaler Details

CpuMemoryScaler & PrometheusScaler

Anything else?

No response

@Sathyam-Hotstar Sathyam-Hotstar added the bug Something isn't working label Nov 25, 2024
@JorTurFer
Member

Hello,
I see multiple scaler issues in your logs. When there is a scaling error, KEDA needs to record the status in the K8s API, as that's where the state is stored. Since there are Prometheus timeouts, that could trigger the rate limiter. Could you share one of the ScaledObjects that you use?

@Sathyam-Hotstar
Author

Sathyam-Hotstar commented Nov 28, 2024

Hi @JorTurFer,
Sharing one ScaledObject definition for which the prometheusScaler failed and we then got the rate limit logs:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  annotations:
    scaledobject.keda.sh/transfer-hpa-ownership: "true"
  finalizers:
  - finalizer.keda.sh
  labels:
    scaledobject.keda.sh/name: service-name
  name: service-name
  namespace: umsp
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      name: service-name
    scalingModifiers: {}
  maxReplicaCount: 1000
  minReplicaCount: 12
  scaleTargetRef:
    kind: Deployment
    name: service-name
  triggers:
  - metadata:
      value: "50"
    metricType: Utilization
    type: cpu
  - metadata:
      value: "60"
    metricType: Utilization
    type: memory
  - metadata:
      activationThreshold: "0"
      query: (sum(sum(rate(envoy_cluster_upstream_rq_total{envoy_cluster_name=~'hs_um_session_tracking_service_((default)|(headless))_umsp.*',
        source_cluster="CLUSTER_NAME", namespace=~"(infrastructure|infrastructure-internal)"}[1m]))
        by (envoy_cluster_name) + sum(rate(envoy_cluster_upstream_rq_5xx{envoy_cluster_name=~'hs_um_session_tracking_service_((default)|(headless))_umsp.*',
        source_cluster="CLUSTER_NAME", namespace=~"(infrastructure|infrastructure-internal)"}[1m]))
        by (envoy_cluster_name) or sum(rate(envoy_cluster_upstream_rq_total{envoy_cluster_name=~'hs_um_session_tracking_service_((default)|(headless))_umsp.*',
        source_cluster="CLUSTER_NAME", namespace=~"(infrastructure|infrastructure-internal)"}[1m]))
        by (envoy_cluster_name) or sum(rate(envoy_cluster_upstream_rq_5xx{envoy_cluster_name=~'hs_um_session_tracking_service_((default)|(headless))_umsp.*',
        source_cluster="CLUSTER_NAME", namespace=~"(infrastructure|infrastructure-internal)"}[1m]))
        by (envoy_cluster_name))) * max(1+ max(request_buffer_per_datacenter{datacenter="sgp"}
        or request_buffer_per_service{datacenter="sgp", exported_namespace="umsp",
        exported_service="service-name"})) by (namespace, service)
      serverAddress: http://VMSELECT-DNS.custom-domain.com:8481/select/0/prometheus
      threshold: "220"
    type: prometheus
status:
  conditions:
  - message: ScaledObject is defined correctly and is ready for scaling
    reason: ScaledObjectReady
    status: "True"
    type: Ready
  - message: Scaling is performed because triggers are active
    reason: ScalerActive
    status: "True"
    type: Active
  - message: No fallbacks are active on this scaled object
    reason: NoFallbackFound
    status: "False"
    type: Fallback
  - status: Unknown
    type: Paused
  externalMetricNames:
  - s2-prometheus
  health:
    s2-prometheus:
      numberOfFailures: 0
      status: Happy
  lastActiveTime: "2024-11-28T06:01:57Z"
  originalReplicaCount: 15
  resourceMetricNames:
  - cpu
  - memory
  scaleTargetGVKR:
    group: apps
    kind: Deployment
    resource: deployments
    version: v1
  scaleTargetKind: apps/v1.Deployment

@JorTurFer
Member

Hello,
We have been talking about this during the community call. To summarize: there was a problem with the amount of information sent to the K8s API for fallback calculation, even when fallback isn't enabled for a given ScaledObject.

Reducing the fallback-related information sent to the K8s API when fallback is enabled is more complex and will be done as part of a refactor of that logic, but this PR (shipped as part of v2.16.0) significantly reduces the amount of fallback-related traffic when the fallback feature isn't enabled for the given ScaledObject.
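
For anyone following along: "fallback enabled" refers to the optional fallback block on the ScaledObject spec, which the ScaledObject shared above does not set. A minimal sketch of what enabling it would look like (values are illustrative only):

spec:
  fallback:
    failureThreshold: 3   # consecutive scaler failures before the fallback applies
    replicas: 6           # replica count used while the scaler keeps failing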
