[ZK] Triggering validation plan returns an error for zookeeper operator #308

Open
rishabh96b opened this issue Jan 4, 2021 · 0 comments
Labels
bug Something isn't working

rishabh96b commented Jan 4, 2021

Description

The validation plan of the zookeeper operator does not run properly, yet it is marked as COMPLETE. The detailed logs are below.

└── zookeeper-instance (Operator-Version: "zookeeper-3.4.14-0.3.1" Active-Plan: "validation")
    ├── Plan deploy (serial strategy) [NOT ACTIVE]
    │   ├── Phase zookeeper (parallel strategy) [NOT ACTIVE]
    │   │   └── Step deploy [NOT ACTIVE]
    │   └── Phase validation (serial strategy) [NOT ACTIVE]
    │       ├── Step validation [NOT ACTIVE]
    │       └── Step cleanup [NOT ACTIVE]
    ├── Plan not-allowed (serial strategy) [NOT ACTIVE]
    │   └── Phase not-allowed (serial strategy) [NOT ACTIVE]
    │       └── Step not-allowed [NOT ACTIVE]
    └── Plan validation (serial strategy) [COMPLETE], last updated 2021-01-04 20:10:40
        └── Phase connection (serial strategy) [COMPLETE]
            ├── Step connection [COMPLETE]
            └── Step cleanup [COMPLETE]

Command

kubectl kudo plan trigger --name=validation --instance=zookeeper-instance
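
For anyone reproducing this, the trigger can be followed by a plan status check, which prints a tree like the one in the description. This is a minimal sketch: the helper name `trigger_and_check` is hypothetical, and the defaults are the instance and plan names from this report.

```shell
# Hypothetical helper: trigger a plan, then print the instance's plan status tree.
# Defaults are the instance and plan names used in this report.
trigger_and_check() {
  instance="${1:-zookeeper-instance}"
  plan="${2:-validation}"
  # Stop early if the trigger itself is rejected.
  kubectl kudo plan trigger --name="$plan" --instance="$instance" || return 1
  kubectl kudo plan status --instance="$instance"
}
```

Calling `trigger_and_check` with no arguments runs the command above and then shows the current state of every plan.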

The kudo-controller logs are flooded with the following messages:

2021/01/04 14:20:10 HealthUtil: unknown type *v1beta1.PodDisruptionBudget is marked healthy by default
2021/01/04 14:20:10 HealthUtil: statefulset "zookeeper-instance-zookeeper" is not healthy: Waiting for 1 pods to be ready...
2021/01/04 14:20:10 TaskExecution: object default/zookeeper-instance-zookeeper is NOT healthy: statefulset "zookeeper-instance-zookeeper" is not healthy: Waiting for 1 pods to be ready...
2021/01/04 14:20:10 PlanExecution: 'deploy' step(s) (instance: default/zookeeper-instance) of the deploy.zookeeper are not ready
2021/01/04 14:20:10 InstanceController: Received Reconcile request for instance default/zookeeper-instance
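
Because the same few messages repeat on every reconcile, it can help to collapse the log before reading it. This is a minimal sketch that assumes the `YYYY/MM/DD HH:MM:SS` timestamp prefix shown above. The function name `summarize_kudo_log` is hypothetical, and the namespace and pod name in the usage comment are assumptions about a default KUDO install.

```shell
# Hypothetical helper: read controller log lines on stdin, strip the leading
# "YYYY/MM/DD HH:MM:SS " timestamp, and count how often each message occurs,
# most frequent first.
summarize_kudo_log() {
  sed -E 's#^[0-9]{4}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} ##' \
    | sort | uniq -c | sort -rn
}

# Usage (namespace and pod name are assumptions, adjust to your install):
#   kubectl logs -n kudo-system kudo-controller-manager-0 | summarize_kudo_log
```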

The plan is supposed to trigger a job that prints the zookeeper URI, but it does not appear to create any job. The controller logs state:

 HealthUtil: job "zookeeper-instance-validation" still running or failed
2021/01/04 14:20:28 TaskExecution: object default/zookeeper-instance-validation is NOT healthy: job "zookeeper-instance-validation" still running or failed
2021/01/04 14:20:28 PlanExecution: 'validation' task(s) (instance: default/zookeeper-instance) of the deploy.validation.validation are not ready
2021/01/04 14:20:28 PlanExecution: 'validation,cleanup' step(s) (instance: default/zookeeper-instance) of the deploy.validation are not ready
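
To see why the job is reported as "still running or failed", the Job object itself can be inspected with standard kubectl commands. This is a sketch that takes the namespace `default` and the job name `zookeeper-instance-validation` from the log lines above; the helper name `inspect_validation_job` is hypothetical.

```shell
# Hypothetical helper: show the state, events and recent output of the
# validation job. Defaults come from the controller log lines in this report.
inspect_validation_job() {
  ns="${1:-default}"
  job="${2:-zookeeper-instance-validation}"
  kubectl get job "$job" -n "$ns" -o wide      # completions and age
  kubectl describe job "$job" -n "$ns"         # events, pod statuses
  kubectl logs "job/$job" -n "$ns" --tail=50   # output of one of the job's pods
}
```

If `kubectl get job` reports that the job does not exist, that would match the observation that no job is created.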

The zookeeper-instance StatefulSet appears to be healthy. Its pod logs show the ruok checks being processed:

2021-01-04 14:24:16,272 [myid:3] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@222] - Accepted socket connection from /127.0.0.1:39720
2021-01-04 14:24:16,272 [myid:3] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@908] - Processing ruok command from /127.0.0.1:39720
2021-01-04 14:24:16,273 [myid:3] - INFO  [Thread-290:NIOServerCnxn@1056] - Closed socket connection for client /127.0.0.1:39720 (no session established for client)

Lastly, the controller logs also show TLS handshake errors:

2021/01/04 14:20:31 InstanceController: Error when updating instance status. Operation cannot be fulfilled on instances.kudo.dev "zookeeper-instance": the object has been modified; please apply your changes to the latest version and try again
2021/01/04 14:20:32 InstanceController: Received Reconcile request for instance default/zookeeper-instance
2021/01/04 14:20:32 Computing health out of 0 Deployments, 0 ReplicaSets, 1 StatefulSets, 0 DaemonSets, 3 Pods
2021/01/04 14:20:32 Updating instance default/zookeeper-instance readiness to: true
2021/01/04 14:20:32 InstanceController: Readiness did not change for default/zookeeper-instance. Not updating.
2021/01/04 14:20:32 http: TLS handshake error from 10.0.130.81:56732: EOF
2021/01/04 14:20:42 http: TLS handshake error from 10.0.130.81:56844: EOF
...

KUDO Version

KUDO Version: version.Info{GitVersion:"0.17.2", GitCommit:"d902714c", BuildDate:"2020-11-16T20:34:11Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64", KubernetesClientVersion:"v0.19.2"}

I also tried this with KUDO version 0.17.0 and got the same error.

rishabh96b added the bug label on Apr 15, 2021