[BUG] Test case test_node_eviction_multiple_volume failed to reschedule replicas after volume detached #9857

Open
yangchiu opened this issue Nov 26, 2024 · 6 comments
Assignees
Labels
area/volume-replica-scheduling (Volume replica scheduling related), backport/1.6.4, backport/1.7.3, kind/bug, kind/regression (Regression which has worked before), priority/0 (Must be implemented or fixed in this release (managed by PO)), reproduce/always (100% reproducible), severity/1 (Function broken (a critical incident with very high impact, e.g. data corruption or failed upgrade))
Milestone

Comments

@yangchiu (Member) commented Nov 26, 2024

Describe the bug

Test case test_node_eviction_multiple_volume failed to reschedule replicas after volume detached:

https://ci.longhorn.io/job/public/job/master/job/sles/job/amd64/job/longhorn-tests-sles-amd64/1104/testReport/junit/tests/test_node/test_node_eviction_multiple_volume/

To Reproduce

  1. Disable scheduling on node 1.
  2. Create a PV, PVC, and pod with volume 1, which has 2 replicas.
  3. Set 'Eviction Requested' to 'true' and disable scheduling on node 2.
  4. Set 'Eviction Requested' to 'false' and enable scheduling on node 1.
  5. Check that the volume is 'healthy' and wait for replicas to run on nodes 1 and 3.
  6. Delete the pods to detach volume 1.
  7. Set 'Eviction Requested' to 'false' and enable scheduling on node 2.
  8. Set 'Eviction Requested' to 'true' and disable scheduling on node 1.
  9. Wait for replicas to run on nodes 2 and 3.

In v1.7.2, the detached volume automatically re-attaches in step 9 so that the replica on node 1 can be rescheduled to node 2.

But in master-head, the re-attachment and rescheduling never happen.
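
For readers unfamiliar with the test, here is a rough sketch of steps 1-9 written against the Longhorn Python client used by longhorn-tests. The call patterns (client.update(node, allowScheduling=..., evictionRequested=...) and client.create_volume(...)) follow that repo's conventions, but the function name, the volume size, and the omitted PV/PVC/pod and wait helpers are illustrative assumptions rather than the failing test's actual code.

```python
# Rough, illustrative sketch of the reproduce steps, assuming the Longhorn Python
# client conventions used by longhorn-tests. The wait/assert helpers and the
# PV/PVC/pod plumbing are only hinted at in comments; this is not the actual test.

def reproduce_eviction_reschedule(client, node1, node2, node3, volume_name="vol-1"):
    # 1. Disable scheduling on node 1.
    node1 = client.update(node1, allowScheduling=False)

    # 2. Create volume 1 with 2 replicas; PV/PVC/pod creation and attachment
    #    are omitted here for brevity.
    volume = client.create_volume(name=volume_name, size="1Gi", numberOfReplicas=2)

    # 3. Request eviction and disable scheduling on node 2.
    node2 = client.update(node2, allowScheduling=False, evictionRequested=True)

    # 4. Cancel eviction and re-enable scheduling on node 1.
    node1 = client.update(node1, allowScheduling=True, evictionRequested=False)

    # 5. Wait for the volume to become healthy with replicas on nodes 1 and 3
    #    (e.g. a wait_for_volume_healthy-style helper).

    # 6. Delete the pods so volume 1 detaches.

    # 7. Cancel eviction and re-enable scheduling on node 2.
    node2 = client.update(node2, allowScheduling=True, evictionRequested=False)

    # 8. Request eviction and disable scheduling on node 1.
    node1 = client.update(node1, allowScheduling=False, evictionRequested=True)

    # 9. Expected: the detached volume auto-reattaches so the replica on node 1 is
    #    rescheduled to node 2, ending with running replicas on nodes 2 and 3.
    return volume
```

The regression shows up at step 9: in master-head the volume stays detached, so the replica on the evicted node 1 is never rescheduled.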

Expected behavior

In step 9, the detached volume should automatically re-attach so that the replica on node 1 is rescheduled to node 2, leaving healthy replicas running on nodes 2 and 3, as in v1.7.2.

Support bundle for troubleshooting

Environment

  • Longhorn version: master-head
  • Impacted volume (PV):
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): kubectl
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: v1.31.1+k3s1
    • Number of control plane nodes in the cluster:
    • Number of worker nodes in the cluster:
  • Node config
    • OS type and version: sles 15-sp6
    • Kernel version:
    • CPU per node:
    • Memory per node:
    • Disk type (e.g. SSD/NVMe/HDD):
    • Network bandwidth between the nodes (Gbps):
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
  • Number of Longhorn volumes in the cluster:

Additional context

Workaround and Mitigation

@yangchiu added the kind/bug, severity/1, reproduce/always, priority/0, kind/regression, and area/volume-replica-scheduling labels on Nov 26, 2024
@yangchiu added this to the v1.8.0 milestone on Nov 26, 2024
The github-project-automation bot moved this to New Issues in Longhorn Sprint on Nov 26, 2024
@derekbit (Member) commented

@mantissahz Please help investigate the issue. Thank you.

@yangchiu (Member, Author) commented Nov 26, 2024

Could this be related to #9781?

@c3y1huang (Contributor) commented Nov 27, 2024

> Could this be related to #9781?

Yes, it seems to be a regression caused by it. I will handle this in #9781.

cc @derekbit @mantissahz

@longhorn-io-github-bot commented Nov 27, 2024

Pre Ready-For-Testing Checklist

  • Where are the reproduce steps/test steps documented?
    The reproduce steps/test steps are at:

    • Issue description
  • Is there a workaround for the issue? If so, where is it documented?
    The workaround is at:

  • Does the PR include the explanation for the fix or the feature?

  • Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
    The PR for the YAML change is at:
    The PR for the chart change is at:

  • Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore, etc.) (including backport-needed/*)?
    The PR is at:

  • Which areas/issues this PR might have potential impacts on?
    Area replica scheduling, node eviction
    Issues

  • If labeled: require/LEP Has the Longhorn Enhancement Proposal PR been submitted?
    The LEP PR is at

  • If labeled: area/ui Has the UI issue filed or ready to be merged (including backport-needed/*)?
    The UI issue/PR is at

  • If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?
    The documentation issue/PR is at

  • If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)
    The automation skeleton PR is at
    The automation test case PR is at
    The issue of automation test case implementation is at (please create by the template)

  • If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?
    The engine automation PR is at

  • If labeled: require/manual-test-plan Has the manual test plan been documented?
    The updated manual test plan is at

  • If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at

@innobead (Member) commented

> Could this be related to #9781?
>
> Yes, it seems to be a regression caused by it. I will handle this in #9781.
>
> cc @derekbit @mantissahz

So this is not a regression in the existing versions, but is caused by the recent fix for #9781?

@c3y1huang (Contributor) commented

> So this is not a regression in the existing versions, but is caused by the recent fix for #9781?

Yes, this is caused by a recently merged PR: longhorn/longhorn-manager#3270.
