
test(robot): add test case Shutdown Volume Node And Test Auto Reattach To A New Node #2169

Open · wants to merge 1 commit into base: master
Conversation

yangchiu
Member

@yangchiu yangchiu commented Nov 28, 2024

Which issue(s) this PR fixes:

Issue longhorn/longhorn#9858

What this PR does / why we need it:

add test case Shutdown Volume Node And Test Auto Reattach To A New Node

Special notes for your reviewer:

Additional documentation or context

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new keyword for waiting on sharemanager pod restarts.
    • Added functionality for batch processing powered-off nodes during power operations.
  • Improvements

    • Enhanced variable management by centralizing definitions in a new resource file.
    • Updated multiple keywords to support looping through lists for improved resource handling.
  • Bug Fixes

    • Improved cleanup process for powered-off nodes to ensure all are addressed systematically.
  • Documentation

    • Updated comments and documentation for clarity on new functionalities and variable management.

@yangchiu yangchiu self-assigned this Nov 28, 2024
@yangchiu yangchiu requested a review from a team as a code owner November 28, 2024 09:58

coderabbitai bot commented Nov 28, 2024

Walkthrough

The pull request introduces significant changes across multiple files, primarily enhancing the resource management and testing capabilities within the end-to-end testing framework. Key modifications include the addition of numerous library imports, the introduction of new keywords, and the restructuring of variable management by centralizing variable definitions into a new resource file. The updates also involve renaming and modifying existing keywords to support batch processing of powered-off nodes, improving the flexibility and robustness of the testing environment.

Changes

File Path Change Summary
e2e/keywords/common.resource Added multiple library imports; modified Cleanup test resources keyword to iterate over @{powered_off_nodes}.
e2e/keywords/host.resource Renamed Power on off node to Power on off nodes; modified to loop over @{powered_off_nodes}.
e2e/keywords/sharemanager.resource Added Wait for sharemanager pod of deployment ${deployment_id} restart keyword; updated comments in existing keywords.
e2e/keywords/variables.resource New resource file defining global variables and an empty list variable @{powered_off_nodes}.
e2e/keywords/workload.resource Added Power off volume node of ${workload_kind} ${workload_id} without waiting; modified existing keyword for tracking.
e2e/libs/host/aws.py Updated power_off_node method to include waiting parameter.
e2e/libs/host/base.py Updated power_off_node method to include waiting parameter.
e2e/libs/host/harvester.py Updated power_off_node method to include waiting parameter.
e2e/libs/keywords/host_keywords.py Updated power_off_volume_node method to include waiting parameter.
e2e/libs/keywords/sharemanager_keywords.py Added wait_for_sharemanager_pod_restart method; updated logging in wait_for_share_manager_pod_running.
e2e/tests/negative/*.robot Added resource declaration for ../keywords/variables.resource and removed inline variable definitions across multiple test files.
e2e/tests/regression/*.robot Added resource declaration for ../keywords/variables.resource and removed inline variable definitions across multiple test files.

Assessment against linked issues

Objective Addressed Explanation
Add/update auto e2e test cases for share manager pod shutdown scenario (#9858)

Possibly related PRs

Suggested reviewers

  • chriscchien

Poem

In the land of code where rabbits play,
New keywords hop in, brightening the day.
With loops and lists, we clean up the mess,
Testing our pods, we strive for success!
So let’s power on nodes, one by one,
In this coding adventure, we’re having such fun! 🐇✨



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (50)
e2e/libs/host/base.py (1)

26-26: Add docstring to document the waiting parameter.

Since this is an abstract base class that defines the interface, it would be helpful to document the expected behavior of the waiting parameter.

Consider adding a docstring like this:

    @abstractmethod
    def power_off_node(self, node_name, waiting):
+        """Power off a node.
+        
+        Args:
+            node_name: Name of the node to power off
+            waiting: If True, wait for the node to completely shut down before returning
+        """
        return NotImplemented
e2e/tests/regression/test_engine_image.robot (1)

Line range hint 1-24: Consider adding node shutdown test coverage.

While the existing test case covers engine upgrade scenarios, given the PR's objective to test node shutdown and auto-reattachment (issue #9858), consider adding a new test case in this file to verify engine image behavior during node shutdown scenarios.

Would you like me to help draft a new test case that covers:

  1. Engine image behavior when a node is shut down
  2. Verification of engine image state consistency after node recovery
  3. Integration with the share manager functionality for RWX PVCs
e2e/keywords/host.resource (1)

38-42: Consider improving error handling and naming.

While the implementation is functional, there are a few suggestions for improvement:

  1. The keyword name "Power on off nodes" is ambiguous. Consider renaming to something more descriptive like "Power On Previously Powered Off Nodes".
  2. The error handling could be enhanced to log failures.
  3. Consider adding validation for the ${powered_off_nodes} list.

Here's a suggested improvement:

-Power on off nodes
+Power On Previously Powered Off Nodes
     FOR    ${powered_off_node}    IN    @{powered_off_nodes}
-        Run keyword And Ignore Error    power_on_node_by_name    ${powered_off_node}
+        ${status}    ${error} =    Run Keyword And Ignore Error    power_on_node_by_name    ${powered_off_node}
+        Run Keyword If    '${status}' == 'FAIL'    Log    Failed to power on node ${powered_off_node}: ${error}    WARN
         Remove Values From List    ${powered_off_nodes}    ${powered_off_node}
     END
e2e/tests/regression/test_migration.robot (1)

Line range hint 19-47: Consider adding a test case for node shutdown scenario

While the current test case thoroughly validates RWX volume migration, it doesn't cover the specific scenario from issue #9858 where a node shutdown causes the share manager to become stuck. Consider adding a new test case that:

  1. Creates an RWX volume
  2. Shuts down the node containing the share manager pod
  3. Verifies auto-reattachment to a new node
  4. Validates that the share manager recovers properly

This would help prevent regression of the issue described in #9858.

Would you like me to help draft the new test case implementation?

e2e/keywords/sharemanager.resource (1)

29-32: Consider adding documentation for the new keyword.

While the implementation is correct, adding documentation would improve maintainability by clarifying:

  • The purpose of the keyword
  • Expected behavior during pod restart
  • Any timeout or failure conditions

Example documentation:

+[Documentation]    Waits for the sharemanager pod of the specified deployment to restart.
+...               The keyword uses the deployment ID to identify the volume and monitors
+...               the associated sharemanager pod until a restart is detected.
 Wait for sharemanager pod of deployment ${deployment_id} restart
     ${deployment_name} =   generate_name_with_suffix    deployment    ${deployment_id}
     ${volume_name} =    get_workload_volume_name    ${deployment_name}
     wait_for_sharemanager_pod_restart    ${volume_name}
e2e/tests/regression/test_scheduling.robot (1)

Line range hint 1-54: Missing test case for node shutdown and auto-reattachment scenario.

Based on the PR objectives (issue #9858), this file should include a new test case to verify that:

  1. The share manager correctly handles node shutdown scenarios
  2. RWX PVCs can successfully auto-reattach to new nodes

The current test cases don't cover these scenarios, which are the primary objectives of this PR.

Would you like me to help draft a new test case that covers these scenarios? The test case would:

  • Set up a RWX PVC with share manager
  • Shut down the node hosting the share manager
  • Verify successful auto-reattachment to a new node
e2e/tests/regression/test_volume.robot (2)

Line range hint 1-60: Add test case for volume node shutdown and auto-reattach

Based on the PR objectives and issue #9858, this file should include a test case for validating the share manager's behavior when a volume node is shut down and testing the auto-reattachment to a new node. This is currently missing from the test suite.

Would you like me to help draft a new test case that:

  1. Creates a volume with RWX access mode
  2. Simulates a node shutdown
  3. Verifies the share manager's state
  4. Validates successful auto-reattachment to a new node?

Line range hint 1-60: Consider adding share manager state validation

To thoroughly test the scenario described in issue #9858, consider adding assertions to verify:

  1. Share manager's state transitions during node shutdown
  2. RWX PVC attachment behavior
  3. Recovery time metrics for auto-reattachment

This will help ensure the test suite effectively catches regressions related to the share manager getting stuck in stopping/error state.

e2e/libs/keywords/host_keywords.py (1)

49-52: Add docstring and error handling.

The implementation looks good and aligns well with the PR objectives for testing volume node shutdown scenarios. However, consider these improvements:

  1. Add a docstring explaining the parameters and behavior:
 def power_off_volume_node(self, volume_name, waiting=True):
+    """Powers off the node containing the specified volume.
+    
+    Args:
+        volume_name: Name of the volume whose node should be powered off
+        waiting: If True, waits for power-off completion. If False, returns immediately
+    
+    Raises:
+        Exception: If volume node cannot be determined
+    """
  2. Add error handling for the node ID lookup:
-    node_id = self.volume_keywords.get_node_id_by_replica_locality(volume_name, "volume node")
+    try:
+        node_id = self.volume_keywords.get_node_id_by_replica_locality(volume_name, "volume node")
+        if not node_id:
+            raise Exception(f"Could not determine node ID for volume {volume_name}")
+    except Exception as e:
+        raise Exception(f"Failed to get node ID for volume {volume_name}: {str(e)}")
e2e/tests/negative/kubelet_restart.robot (3)

Line range hint 15-35: Enhance test coverage for share manager scenarios

Given that this PR addresses issue #9858 regarding share manager getting stuck, consider adding specific validations:

  1. Verify share manager pod state transitions
  2. Validate successful RWX volume reattachment to new nodes
 Restart Volume Node Kubelet While Workload Heavy Writing
     Given Set setting rwx-volume-fast-failover to ${RWX_VOLUME_FAST_FAILOVER}
     And Create storageclass longhorn-test with    dataEngine=${DATA_ENGINE}
     And Create statefulset 0 using RWO volume with longhorn-test storageclass
     And Create statefulset 1 using RWX volume with longhorn-test storageclass
 
     FOR    ${i}    IN RANGE    ${LOOP_COUNT}
         And Keep writing data to pod of statefulset 0
         And Keep writing data to pod of statefulset 1
 
         When Stop volume nodes kubelet for 10 seconds    statefulset 0    statefulset 1
         And Wait for volume of statefulset 0 healthy
         And Wait for volume of statefulset 1 healthy
+        And Wait for share manager pod healthy    statefulset 1
         And Wait for workloads pods stable    statefulset 0    statefulset 1
+        And Verify RWX volume successfully reattached    statefulset 1
 
         Then Check statefulset 0 works
         And Check statefulset 1 works
     END

Line range hint 37-57: Consider parameterizing the kubelet stop duration

The 360-second duration for pod eviction timeout is hardcoded. Consider making this configurable through a variable to improve test maintainability.

 Stop Volume Node Kubelet For More Than Pod Eviction Timeout While Workload Heavy Writing
     Given Set setting rwx-volume-fast-failover to ${RWX_VOLUME_FAST_FAILOVER}
     And Create storageclass longhorn-test with    dataEngine=${DATA_ENGINE}
     And Create statefulset 0 using RWO volume with longhorn-test storageclass
     And Create statefulset 1 using RWX volume with longhorn-test storageclass
 
     FOR    ${i}    IN RANGE    ${LOOP_COUNT}
         And Keep writing data to pod of statefulset 0
         And Keep writing data to pod of statefulset 1
 
-        When Stop volume nodes kubelet for 360 seconds    statefulset 0    statefulset 1
+        When Stop volume nodes kubelet for ${POD_EVICTION_TIMEOUT}    statefulset 0    statefulset 1
         And Wait for volume of statefulset 0 healthy
         And Wait for volume of statefulset 1 healthy
+        And Wait for share manager pod healthy    statefulset 1
         And Wait for workloads pods stable    statefulset 0    statefulset 1
+        And Verify RWX volume successfully reattached    statefulset 1
 
         Then Check statefulset 0 works
         And Check statefulset 1 works
     END

Line range hint 1-57: Test structure aligns well with PR objectives

The test cases effectively validate the system's behavior during node shutdown scenarios. The structure is clean and follows Robot Framework best practices. The suggested improvements will help ensure comprehensive coverage of the share manager issue (#9858).

Consider adding these test cases to a dedicated test suite for share manager-related scenarios if one doesn't exist already. This would help organize tests by functionality and make it easier to maintain and extend share manager-specific test coverage.

e2e/tests/regression/test_ha.robot (1)

Line range hint 24-34: Add test case for share manager node shutdown scenario

The current test cases don't cover the specific scenario from issue #9858 where the share manager becomes stuck when its node is shut down. Consider adding a new test case that:

  1. Creates a RWX PVC with share manager
  2. Shuts down the node hosting the share manager
  3. Verifies successful auto-reattachment to a new node

Would you like me to help draft the new test case implementation?

e2e/keywords/common.resource (2)

Line range hint 4-24: Remove duplicate import of network_keywords library

The network_keywords.py library is imported twice (lines 8 and 13). Please remove one of the duplicate imports to maintain clean dependencies.

 Library             ../libs/keywords/network_keywords.py
 Library             ../libs/keywords/recurringjob_keywords.py
 Library             ../libs/keywords/statefulset_keywords.py
 Library             ../libs/keywords/stress_keywords.py
 Library             ../libs/keywords/volume_keywords.py
 Library             ../libs/keywords/workload_keywords.py
 Library             ../libs/keywords/persistentvolumeclaim_keywords.py
-Library             ../libs/keywords/network_keywords.py
 Library             ../libs/keywords/storageclass_keywords.py

Line range hint 1-67: Consider documenting the cleanup sequence dependencies

The cleanup sequence handles various resources in a specific order, which is crucial for proper test isolation. With the addition of powered-off node handling, it would be valuable to document:

  1. The rationale behind the cleanup sequence order
  2. Any dependencies between cleanup steps
  3. The interaction between powered-off node handling and other cleanup operations

Consider adding documentation in the following format:

*** Settings ***
Documentation       Common keywords
...                 The cleanup sequence follows these dependencies:
...                 1. Power on nodes first to ensure they're available for other cleanup operations
...                 2. Uncordon nodes to allow pod scheduling
...                 3. Clean up higher-level resources (deployments, statefulsets) before their dependencies
...                 4. Clean up storage-related resources last
e2e/tests/negative/stress_cpu.robot (1)

6-6: LGTM! Consider documenting the variables.

The centralization of variables into a dedicated resource file is a good practice for maintainability. However, it would be helpful to document the expected values and purpose of these variables in the resource file.

e2e/tests/negative/stress_filesystem.robot (2)

Line range hint 1-100: Missing test coverage for node shutdown scenarios.

While the file contains various stress test scenarios, it doesn't address the primary PR objective of testing volume node shutdown and auto-reattachment scenarios described in issue #9858. Consider adding a new test case to validate this specific behavior.

I can help draft a new test case that would:

  1. Create a volume with multiple replicas
  2. Simulate a node shutdown
  3. Verify auto-reattachment to a new node
  4. Validate data integrity

Would you like me to propose the implementation?


Line range hint 31-43: Consider adding timeout handling for volume detach/attach cycles.

The test "Stress Volume Node Filesystem When Volume Is Detaching and Attaching" might benefit from explicit timeout handling to prevent test hangs.

Consider adding timeout parameters:

And Wait for volume 0 detached    timeout=300s
And Wait for volume 0 healthy    timeout=300s
e2e/tests/regression/test_v2.robot (1)

Line range hint 1-1: Missing test case for share manager node shutdown

Based on the PR objectives and issue #9858, this file should include a new test case that:

  1. Creates an RWX PVC with a share manager
  2. Simulates a node shutdown where the share manager pod is running
  3. Verifies that the PVC can successfully reattach to a new node

Consider adding a new test case with the following structure:

Share Manager Should Recover After Node Shutdown
    [Tags]    cluster
    Given Create storageclass longhorn-test with    dataEngine=v2    accessMode=ReadWriteMany
    And Create persistentvolumeclaim 0 using RWX volume with longhorn-test storageclass
    And Create deployment 0 with persistentvolumeclaim 0
    And Wait for volume of deployment 0 healthy
    
    When Get node of share manager pod for deployment 0 volume
    And Shutdown node of share manager pod
    
    Then Wait for volume of deployment 0 attached to new node
    And Check deployment 0 works
e2e/tests/negative/uninstallation_checks.robot (1)

Line range hint 24-92: Consider enhancing error scenario coverage.

The test case thoroughly validates the happy path, but could be strengthened in two areas:

  1. Add explicit verification that backups are retained immediately after uninstallation
  2. Include error scenario handling (e.g., failed uninstallation)

Consider adding these verification steps after the uninstallation:

    Then Uninstall Longhorn
    And Check Longhorn CRD removed
+   And Verify backup store contains backup 0    # Explicit verification post-uninstallation
+   And Verify backup store contains backup 1
+   And Verify no orphaned resources in namespace longhorn-system

    # Assume this is another Longhorn cluster
    Then Install Longhorn
e2e/tests/negative/node_delete.robot (2)

Line range hint 20-40: Consider optimizing test data size and adding timeouts

The test writes 2048 MB of data, which might increase test execution time unnecessarily. Consider:

  1. Reducing the data size for faster test execution
  2. Making the data size configurable via a variable
  3. Adding explicit timeouts to "Wait until" steps to prevent indefinite hangs
- And Write 2048 MB data to file data in deployment 0
- And Write 2048 MB data to file data in deployment 1
+ And Write ${TEST_DATA_SIZE} MB data to file data in deployment 0
+ And Write ${TEST_DATA_SIZE} MB data to file data in deployment 1

Line range hint 20-89: Consider adding specific test case for share manager recovery

While these tests cover node deletion scenarios well, they don't specifically address the share manager stuck issue mentioned in #9858. Consider adding a dedicated test case that:

  1. Specifically targets the share manager pod's node shutdown
  2. Verifies the share manager's state transitions
  3. Confirms successful reattachment of RWX PVCs

Would you like me to help draft a new test case specifically for the share manager recovery scenario?

e2e/tests/negative/pull_backup_from_another_longhorn.robot (1)

Line range hint 1-100: Test case doesn't address PR objectives

This test case focuses on backup/restore functionality across different Longhorn versions, but doesn't address the PR's main objective of testing "shutdown volume node and auto reattach to new node" scenarios (from issue #9858). The test should include steps to:

  1. Shut down a volume node
  2. Verify share manager behavior
  3. Test RWX PVC reattachment to new nodes

Would you like assistance in:

  1. Creating a new test case specifically for the node shutdown scenario, or
  2. Extending this test case to include the required node shutdown tests?
e2e/tests/negative/cluster_restart.robot (1)

Line range hint 19-89: Enhance test coverage for share manager scenarios

The test case provides excellent coverage of various storage configurations and workload types. However, given the PR objectives and issue #9858, consider these enhancements:

  1. Add specific validation for share manager state after cluster restart
  2. Include checks for RWX volume reattachment behavior
  3. Consider parallel execution for efficiency

Add these validations after the cluster restart:

        When Restart cluster
        And Wait for longhorn ready
        And Wait for workloads pods stable
        ...    deployment 0    deployment 1    deployment 2    deployment 3    deployment 4    deployment 5
        ...    statefulset 0    statefulset 1    statefulset 2    statefulset 3    statefulset 4    statefulset 5

+       # Validate share manager recovery
+       And Wait for share manager ready
+       And Verify share manager not in error state
+       And Verify RWX volumes successfully reattached

        Then Check deployment 0 works
e2e/libs/host/aws.py (2)

76-79: Add documentation for asynchronous behavior.

The waiting logic implementation is correct, but it would benefit from a brief comment explaining the behavior when waiting=False. This helps other developers understand the asynchronous nature of the operation.

    def power_off_node(self, power_off_node_name, waiting=True):
        instance_ids = [self.mapping[power_off_node_name]]
        resp = self.aws_client.stop_instances(InstanceIds=instance_ids, Force=True)
        assert resp['ResponseMetadata']['HTTPStatusCode'] == 200, f"Failed to stop instances {instance_ids} response: {resp}"
        logging(f"Stopping instances {instance_ids}")
+       # When waiting=False, return immediately after initiating shutdown
+       # without waiting for the instance to reach 'stopped' state
        if waiting:
            waiter = self.aws_client.get_waiter('instance_stopped')
            waiter.wait(InstanceIds=instance_ids)
            logging(f"Stopped instances")
🧰 Tools
🪛 Ruff (0.8.0)

79-79: f-string without any placeholders

Remove extraneous f prefix

(F541)


79-79: Remove unnecessary f-string prefix.

The logging statement doesn't contain any placeholders, so the f-string prefix can be removed.

-           logging(f"Stopped instances")
+           logging("Stopped instances")
🧰 Tools
🪛 Ruff (0.8.0)

79-79: f-string without any placeholders

Remove extraneous f prefix

(F541)

e2e/tests/regression/test_backup.robot (2)

Line range hint 1-1: Consider adding node shutdown test cases or creating a separate test file.

Based on the PR objectives and issue #9858, this PR should include test cases for share manager behavior during node shutdown scenarios. However, this file focuses on backup/restore functionality. Consider either:

  1. Adding the node shutdown test cases to this file if backup functionality is involved, or
  2. Creating a separate test file specifically for node shutdown and auto-reattachment scenarios.

Would you like me to help draft the test cases for the node shutdown scenario? This would include:

  • Testing share manager behavior when a node is shut down
  • Verifying RWX PVC attachment to new nodes
  • Validating auto-reattachment functionality

Line range hint 37-85: Consider enhancing error handling in backup volume tests.

The test implementation is well-structured, but consider these improvements:

  1. Add retry mechanisms for backup operations that might be flaky
  2. Add more assertions to verify the error states
  3. Add logging to help debug test failures

Example improvement:

    When Create backup 0 for volume 0
+   # Add retry and logging
+   Run Keyword And Continue On Failure    Wait Until Keyword Succeeds    
+   ...    ${RETRY_COUNT}    ${RETRY_INTERVAL}
+   ...    Verify Backup Creation    0    0
    And Create backup 1 for volume 1
    Then Verify backup list contains no error for volume 0
e2e/libs/keywords/sharemanager_keywords.py (1)

70-95: Implementation aligns well with PR objectives

The new wait_for_sharemanager_pod_restart method and enhanced logging in wait_for_share_manager_pod_running provide the necessary capabilities to:

  1. Monitor share manager pod restarts during node shutdown tests
  2. Debug potential issues with pod state transitions
  3. Validate the fix for issue #9858 where share manager gets stuck in stopping/error state

Consider adding additional checks for pod status conditions beyond just the creation timestamp to ensure the pod is not only restarted but also healthy and ready to serve RWX volumes.
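Such a readiness check could look roughly like the sketch below. This is a minimal illustration rather than the repository's code: the wait_for_sharemanager_pod_ready name, the share-manager-<volume_name> pod naming, and the longhorn-system namespace are assumptions, and the retry defaults only mirror the values mentioned elsewhere in this review.

import time

from kubernetes import client, config


def wait_for_sharemanager_pod_ready(volume_name, retry_count=300, retry_interval=1):
    # Poll the share manager pod until it is Running and reports Ready=True,
    # instead of relying on the creation timestamp alone.
    config.load_kube_config()
    core_api = client.CoreV1Api()
    pod_name = f"share-manager-{volume_name}"

    for _ in range(retry_count):
        try:
            pod = core_api.read_namespaced_pod(pod_name, "longhorn-system")
            conditions = pod.status.conditions or []
            ready = any(c.type == "Ready" and c.status == "True" for c in conditions)
            if pod.status.phase == "Running" and ready:
                return pod
        except client.exceptions.ApiException:
            # The pod may briefly not exist while it is being recreated.
            pass
        time.sleep(retry_interval)

    raise AssertionError(f"share manager pod {pod_name} did not become ready in time")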

🧰 Tools
🪛 Ruff (0.8.0)

80-80: Comparison to None should be cond is None

Replace with cond is None

(E711)


87-87: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)

e2e/libs/host/harvester.py (2)

Line range hint 56-73: Consider optimizing batch node operations.

Now that power_off_node supports non-blocking operations, methods like reboot_all_nodes and reboot_all_worker_nodes could be optimized to power off all nodes concurrently before waiting.

Example optimization for reboot_all_nodes:

 def reboot_all_nodes(self, shut_down_time_in_sec=NODE_REBOOT_DOWN_TIME_SECOND):
     node_names = [key for key in self.mapping.keys()]
 
     for node_name in node_names:
-        self.power_off_node(node_name)
+        self.power_off_node(node_name, waiting=False)
+
+    # Wait for all nodes to finish powering off
+    for node_name in node_names:
+        self.power_off_node(node_name)  # re-issues power off with waiting=True and blocks until the node is down
 
     logging(f"Wait for {shut_down_time_in_sec} seconds before starting vms")
     time.sleep(shut_down_time_in_sec)

Line range hint 1-11: Review security implications of disabled SSL verification.

While this is test code, consider:

  1. Adding a comment explaining why SSL verification is disabled
  2. Ensuring this is only used in test environments
  3. Adding environment validation to prevent accidental use in production
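A hedged sketch of such an environment guard is shown below; the LONGHORN_TEST_ENVIRONMENT variable name and the accepted values are made up for illustration and are not part of the existing harness.

import os

import urllib3


def ensure_test_environment():
    # Refuse to run with SSL verification disabled unless the caller has
    # explicitly marked this as a test/CI environment.
    env = os.environ.get("LONGHORN_TEST_ENVIRONMENT", "").lower()
    if env not in ("test", "ci", "dev"):
        raise RuntimeError(
            "SSL verification is disabled for the Harvester client; "
            "set LONGHORN_TEST_ENVIRONMENT=test to confirm this is not production"
        )
    # Silence the insecure-request warnings only after the check passes.
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)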
e2e/tests/regression/test_settings.robot (2)

Line range hint 20-82: Consider enhancing test documentation for concurrent rebuild scenarios.

The test case thoroughly validates concurrent rebuild limits, but could benefit from additional documentation about:

  • Expected rebuild timing behaviors
  • Data integrity verification methods
  • Potential impact on system resources

Would you like me to help generate improved documentation?


Line range hint 84-116: Consider extending RWX volume tests to include node shutdown scenarios.

Given the PR objectives to test node shutdown and auto-reattachment scenarios (issue #9858), consider extending this test case to include:

  1. Node shutdown while RWX volume is attached
  2. Verification of auto-reattachment behavior
  3. Network connectivity validation after reattachment

Would you like me to help draft additional test scenarios that align with the PR objectives?

e2e/tests/negative/replica_rebuilding.robot (3)

Line range hint 133-135: Fix duplicate 'And' keyword in test steps.

There's a syntax error in these steps where "And" is duplicated:

-        And And Wait until volume of deployment 0 replica rebuilding started on node 0
+        And Wait until volume of deployment 0 replica rebuilding started on node 0

Line range hint 22-35: Consider randomizing node selection for better test coverage.

The test cases use a sequential pattern (node 0, 1, 2) for deleting replicas. Consider randomizing the node selection to catch potential edge cases and ensure the system behaves correctly regardless of the order of node operations.

Example approach:

# Create a list of available nodes and shuffle them
@{nodes}=    Create List    0    1    2
Evaluate    random.shuffle($nodes)    modules=random

# Use shuffled nodes in the test
FOR    ${i}    IN RANGE    ${LOOP_COUNT}
    FOR    ${node}    IN    @{nodes}
        When Delete volume 0 replica on node ${node}
        And Wait until volume 0 replica rebuilding started on node ${node}
        And Wait until volume 0 replica rebuilding completed on node ${node}
    END
    Then Wait for volume 0 healthy
    And Check volume 0 data is intact
END

Also applies to: 89-102


Line range hint 106-121: Consider adding more specific assertions during rebuilding.

While the test cases verify the final state and data integrity, consider adding assertions to verify the system state during the rebuilding process. This could help catch intermediate issues and provide better debugging information.

Example additions:

When Delete replica of deployment 0 volume on node 0
And Wait until volume of deployment 0 replica rebuilding started on node 0
# Add assertions about the system state during rebuilding
And Volume should have expected rebuild status
And Volume metrics should show active rebuild
And System events should indicate rebuilding progress
e2e/tests/negative/network_disconnect.robot (2)

Line range hint 1-1: Consider adding specific test cases for share manager issues.

While the current test cases cover network disconnection scenarios well, they don't fully address the share manager stuck issues mentioned in issue #9858. Consider adding test cases that specifically:

  1. Validate the share manager state during node shutdown
  2. Verify that the share manager doesn't get stuck in stopping/error state
  3. Confirm successful reattachment of RWX PVCs after node shutdown

This would help ensure the specific issues mentioned in #9858 are properly tested and prevent regressions.

Would you like me to help draft additional test cases focusing on these scenarios?


Line range hint 1-1: Enhance test verification and automation.

The test cases are well-structured, but consider these improvements:

  1. Add explicit verification steps for volume state transitions
  2. Automate more aspects of the manual test plans
  3. Add logging to capture state changes during network disconnection

This would improve test reliability and debugging capabilities.

Example verification steps to add:

And Verify volume state transition from "attached" to "detached"
And Verify no orphaned resources after reattachment
And Log volume manager state changes during network disconnect
e2e/tests/negative/test_backup_listing.robot (3)

Line range hint 159-171: Consider performance and reliability improvements for large volume operations.

The test steps for large volume operations (210GB, 150GB) might benefit from the following improvements:

  1. Add explicit timeout settings for large data operations
  2. Verify cleanup completion between volume operations to prevent resource exhaustion
  3. Consider adding progress logging for long-running operations

Example improvement for timeout and logging:

Write 210 GB large data to file 0 in pod 2
    [Timeout]    4h    # Adjust based on performance characteristics
    FOR    ${written}    IN RANGE    0    210    10
        Write 10 GB data to file 0 in pod 2
        Log    Written ${written+10}GB of 210GB
    END

Line range hint 1-171: Consider separating test concerns and adding node shutdown scenarios.

While the backup listing test improvements are valuable, they don't directly address the PR objectives regarding node shutdown and share manager scenarios. Consider:

  1. Creating a separate test file specifically for node shutdown and auto reattachment scenarios
  2. Adding test cases that validate share manager behavior during node shutdown
  3. Including specific assertions for RWX PVC attachment behavior

This would help maintain clear separation of concerns while ensuring comprehensive coverage of the scenarios described in issue #9858.


Test case is misaligned with PR objectives - additional test coverage needed

The current test files show that while there are tests for share manager pod recreation (component_resilience.robot) and node power-off scenarios (node_reboot.robot), none specifically validate the share manager getting stuck during node shutdown as described in issue #9858. The test case being reviewed focuses on backup listing for large volumes, which is unrelated to the PR's objectives.

Suggested actions:

  • Add a new test case in node_reboot.robot specifically targeting the share manager stuck scenario during node shutdown
  • Keep the current large volume backup test separate as it serves a different purpose
🔗 Analysis chain

Line range hint 148-158: Verify test case alignment with PR objectives.

While this test case focuses on backup listing for large volumes, the PR objectives mention testing volume node shutdown and auto reattachment scenarios. Consider adding specific test cases that validate the behavior described in issue #9858 regarding share manager getting stuck when a node is shut down.

Let's check if there are other test files that might be more relevant to the PR objectives:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for test files that might be more relevant to testing node shutdown scenarios
rg -l "shutdown.*node|node.*shutdown" "e2e/tests/"

# Search for test files that might be testing share manager functionality
rg -l "share.*manager|manager.*share" "e2e/tests/"

Length of output: 225


Script:

#!/bin/bash
# Let's examine the content of these files to see if they already cover the share manager node shutdown scenario
rg -A 5 -B 5 "share.*manager|manager.*share|shutdown.*node|node.*shutdown" "e2e/tests/negative/node_reboot.robot" "e2e/tests/regression/test_settings.robot" "e2e/tests/negative/component_resilience.robot"

# Also search for any existing test cases related to issue #9858
rg "9858|stuck" "e2e/tests/"

Length of output: 6872

e2e/tests/negative/component_resilience.robot (1)

Line range hint 208-241: Add test case for share manager node shutdown scenario.

The current RWX volume recovery test case doesn't cover the specific scenario from issue #9858 where the share manager becomes stuck when its node is shut down. Consider adding a new test case that:

  1. Creates an RWX volume
  2. Shuts down the node where the share manager pod is running
  3. Verifies that the share manager successfully moves to a new node
  4. Confirms the RWX PVC can be attached to a new node

Here's a suggested test case structure:

Test Share Manager Node Shutdown Recovery
    [Documentation]    Test share manager recovery when its node is shut down
    ...                Related to issue #9858
    ...
    ...                Test steps:
    ...                    Create RWX volume using Longhorn storage class
    ...                    Deploy workload using the RWX volume
    ...                    Identify the node running share manager pod
    ...                    Shutdown the node
    ...                    Verify share manager moves to new node
    ...                    Verify RWX volume remains accessible
    
    When Create storageclass longhorn-test with    dataEngine=${DATA_ENGINE}
    And Create persistentvolumeclaim 0 using RWX volume with longhorn-test storageclass
    And Create deployment 0 with persistentvolumeclaim 0
    And Write test data to deployment 0
    
    ${node}=    Get node running sharemanager pod of deployment 0
    When Power off node ${node}
    Then Wait for sharemanager pod of deployment 0 running
    And Verify sharemanager pod of deployment 0 is on different node    ${node}
    And Wait for deployment 0 pods stable
    And Check deployment 0 data is intact
e2e/keywords/workload.resource (3)

48-52: LGTM! Consider adding documentation for the powered-off nodes tracking.

The changes improve node state tracking by maintaining a list of powered-off nodes. This enhancement aligns well with the PR objectives for testing node shutdown scenarios.

Consider adding a comment explaining the purpose of ${powered_off_nodes} list and ${last_volume_node} variable for better maintainability.


53-59: LGTM! Consider documenting the difference between waiting and non-waiting variants.

The new non-blocking keyword enhances testing capabilities for concurrent scenarios. However, consider the following improvements:

  1. Add documentation explaining when to use this variant vs. the blocking version.
  2. Consider refactoring to reduce code duplication:
+Power off volume node of ${workload_kind} ${workload_id}
+    [Arguments]    ${waiting}=True
+    ${workload_name} =   generate_name_with_suffix    ${workload_kind}    ${workload_id}
+    ${volume_name} =    get_workload_volume_name    ${workload_name}
+    ${powered_off_node} =    get_volume_node    ${volume_name}
+    Append to list      ${powered_off_nodes}     ${powered_off_node}
+    ${last_volume_node} =    get_volume_node    ${volume_name}
+    power_off_volume_node    ${volume_name}    waiting=${waiting}
+    Set Test Variable    ${last_volume_node}
+
+Power off volume node of ${workload_kind} ${workload_id} without waiting
+    Power off volume node of ${workload_kind} ${workload_id}    waiting=False

48-59: Well-structured changes that enhance test coverage for node shutdown scenarios.

The modifications effectively support testing of share manager behavior during node shutdowns by:

  1. Maintaining accurate tracking of powered-off nodes for reliable cleanup
  2. Enabling concurrent testing through non-blocking operations
  3. Providing consistent node state management across different test scenarios

These improvements align well with the PR objectives of testing RWX PVC behavior during node shutdowns.

Consider documenting the test patterns and best practices for using these keywords in the test suite documentation, particularly focusing on:

  • When to use blocking vs. non-blocking variants
  • Guidelines for cleanup and recovery after node shutdowns
  • Recommended patterns for testing concurrent scenarios
e2e/tests/negative/node_drain.robot (4)

Line range hint 1-24: Add test case for share manager node shutdown scenario

While the existing test cases cover various node drain scenarios, they don't specifically address the issue #9858 where the share manager becomes stuck when its node is shut down. Consider adding a new test case that:

  1. Creates an RWX PVC with share manager
  2. Shuts down the node where share manager pod is running
  3. Verifies that the share manager successfully moves to a new node
  4. Validates that the RWX PVC can be attached to a new node

This would help prevent regression of the issue described in #9858.

Would you like me to help draft this additional test case?


Line range hint 25-63: Enhance RWX volume assertions in force drain test

For RWX volumes (deployment 1), consider adding these additional checks:

  1. Verify share manager pod successfully moves to another node
  2. Assert share manager is not in stopping/error state
  3. Validate share manager service endpoints are updated

This would strengthen the test coverage for RWX-specific behaviors during node drain.
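In the Python keyword layer, the first two checks could be expressed roughly as follows; the helper name, the share-manager-<volume_name> pod naming, and the longhorn-system namespace are assumptions for illustration, not keywords that exist in the suite today.

from kubernetes import client, config


def assert_share_manager_relocated(volume_name, drained_node):
    # Verify the share manager pod was rescheduled off the drained node and is
    # actively running, i.e. not stuck in a stopping or error state.
    config.load_kube_config()
    core_api = client.CoreV1Api()
    pod = core_api.read_namespaced_pod(f"share-manager-{volume_name}", "longhorn-system")

    assert pod.spec.node_name != drained_node, \
        f"share manager pod is still on drained node {drained_node}"
    assert pod.status.phase == "Running", \
        f"share manager pod is in unexpected phase {pod.status.phase}"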


Line range hint 219-243: Improve robustness of node drain policy test

Consider enhancing the test case with:

  1. Explicit timeout handling for drain operations
  2. State verification between steps
  3. Cleanup verification after policy change

This would make the test more reliable and easier to debug.

Example improvements:

    When Force drain node 2 and expect failure
    And Check instance-manager pod is running on node 2
    And Verify node 2 status is cordoned
    And Verify all non-protected pods are evicted    timeout=300s

    When Set setting node-drain-policy to always-allow
    And Verify setting update is applied
    And Force drain node 2 and expect success
    And Wait until all pods are evicted    timeout=300s
    And Check PDB not exist    instance_manger=${instance_manager_name}
    And Verify cleanup of protection resources

Line range hint 219-219: Enhance test documentation for node drain policy test

The documentation could be improved by:

  1. Adding expected behavior for RWX volumes specifically
  2. Including verification steps for share manager state
  3. Documenting cleanup expectations

This would make the test's relationship to issue #9858 clearer.

e2e/tests/negative/node_reboot.robot (1)

56-56: Consider adding post-power-on verification

While the test powers on the nodes, consider adding explicit verification steps after power on to ensure complete system recovery.

 And Power on off nodes
+And Wait for longhorn ready
+And Wait for nodes ready
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f28fcf4 and 026531a.

📒 Files selected for processing (35)
  • e2e/keywords/common.resource (2 hunks)
  • e2e/keywords/host.resource (2 hunks)
  • e2e/keywords/sharemanager.resource (1 hunks)
  • e2e/keywords/variables.resource (1 hunks)
  • e2e/keywords/workload.resource (1 hunks)
  • e2e/libs/host/aws.py (1 hunks)
  • e2e/libs/host/base.py (1 hunks)
  • e2e/libs/host/harvester.py (2 hunks)
  • e2e/libs/keywords/host_keywords.py (1 hunks)
  • e2e/libs/keywords/sharemanager_keywords.py (1 hunks)
  • e2e/tests/negative/cluster_restart.robot (1 hunks)
  • e2e/tests/negative/component_resilience.robot (1 hunks)
  • e2e/tests/negative/kubelet_restart.robot (1 hunks)
  • e2e/tests/negative/network_disconnect.robot (1 hunks)
  • e2e/tests/negative/node_delete.robot (1 hunks)
  • e2e/tests/negative/node_drain.robot (1 hunks)
  • e2e/tests/negative/node_reboot.robot (8 hunks)
  • e2e/tests/negative/pull_backup_from_another_longhorn.robot (1 hunks)
  • e2e/tests/negative/replica_rebuilding.robot (1 hunks)
  • e2e/tests/negative/stress_cpu.robot (1 hunks)
  • e2e/tests/negative/stress_filesystem.robot (1 hunks)
  • e2e/tests/negative/stress_memory.robot (1 hunks)
  • e2e/tests/negative/test_backup_listing.robot (3 hunks)
  • e2e/tests/negative/uninstallation_checks.robot (1 hunks)
  • e2e/tests/regression/test_backing_image.robot (1 hunks)
  • e2e/tests/regression/test_backup.robot (1 hunks)
  • e2e/tests/regression/test_basic.robot (1 hunks)
  • e2e/tests/regression/test_engine_image.robot (1 hunks)
  • e2e/tests/regression/test_ha.robot (1 hunks)
  • e2e/tests/regression/test_migration.robot (1 hunks)
  • e2e/tests/regression/test_replica.robot (1 hunks)
  • e2e/tests/regression/test_scheduling.robot (1 hunks)
  • e2e/tests/regression/test_settings.robot (1 hunks)
  • e2e/tests/regression/test_v2.robot (2 hunks)
  • e2e/tests/regression/test_volume.robot (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • e2e/keywords/variables.resource
🧰 Additional context used
🪛 Ruff (0.8.0)
e2e/libs/host/aws.py

79-79: f-string without any placeholders

Remove extraneous f prefix

(F541)

e2e/libs/keywords/sharemanager_keywords.py

80-80: Comparison to None should be cond is None

Replace with cond is None

(E711)


87-87: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)

🔇 Additional comments (49)
e2e/libs/host/base.py (1)

26-26: LGTM! Verify implementing classes.

The addition of the waiting parameter to the abstract method provides better control over shutdown behavior, which aligns well with the PR's objective of testing node shutdown scenarios.

Let's verify that all implementing classes have been updated accordingly:

✅ Verification successful

All implementing classes have been properly updated with the new parameter

The verification shows that all implementing classes (harvester.py and aws.py) have correctly updated their power_off_node method signatures to include the waiting parameter with a default value of True, matching the abstract method's signature in base.py.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if all implementing classes have updated the power_off_node method signature

# Search for classes that inherit from Base
ast-grep --pattern 'class $_ (Base):'

# Search for power_off_node implementations
rg -A 2 'def power_off_node\(' --type py

Length of output: 683

e2e/tests/regression/test_engine_image.robot (1)

6-6: Verify the variables defined in the new resource file.

The change moves variable definitions to a centralized variables.resource file. Please ensure that all previously defined variables (${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}, ${DATA_ENGINE}) are properly defined in the new resource file.

✅ Verification successful

All required variables are properly defined in the centralized resource file

The verification confirms that all previously defined variables (${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}, ${DATA_ENGINE}) are properly defined in e2e/keywords/variables.resource with appropriate values:

  • ${LOOP_COUNT} = 1
  • ${RETRY_COUNT} = 300
  • ${RETRY_INTERVAL} = 1
  • ${DATA_ENGINE} = v1
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all required variables are defined in the new resource file

# Check for variable definitions in the new resource file
echo "Checking variables.resource for required variable definitions..."
rg -A 1 "^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|DATA_ENGINE)\}" "../keywords/variables.resource"

# Check if these variables are still defined elsewhere in the test suite
echo "Checking for any remaining inline definitions of these variables..."
rg "^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|DATA_ENGINE)\}" "e2e/tests/"

Length of output: 531


Script:

#!/bin/bash
# Let's first find the correct path to the variables.resource file
echo "Finding the variables.resource file..."
fd "variables.resource" .

# Then check its content for variable definitions
echo -e "\nChecking content of the variables.resource file if found..."
fd "variables.resource" . --exec cat {}

# Also check for any variable definitions in the test file itself
echo -e "\nChecking current test file for variable definitions..."
cat e2e/tests/regression/test_engine_image.robot

Length of output: 1682

e2e/tests/regression/test_backing_image.robot (2)

Line range hint 13-31: LGTM! Well-structured test case with comprehensive coverage.

The test case demonstrates good practices:

  • Clear BDD-style steps
  • Proper resource management
  • Verification of error conditions
  • Complete cleanup

6-6: LGTM! Good practice to centralize variable definitions.

The addition of variables.resource improves maintainability by centralizing common variables used across test files.

Let's verify that all required variables are properly defined in the new resource file:

✅ Verification successful

All required variables are properly defined in the centralized resource file

The verification confirms that variables.resource contains all the essential variables mentioned in the test file:

  • ${LOOP_COUNT} = 1
  • ${RETRY_COUNT} = 300
  • ${RETRY_INTERVAL} = 1
  • ${DATA_ENGINE} = v1

The centralization of these variables is indeed a good practice that will help maintain consistency across test files.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all required variables are defined in the new resource file
# Expected variables: LOOP_COUNT, RETRY_COUNT, RETRY_INTERVAL, DATA_ENGINE

# Check the variables.resource file for required variable definitions
rg -A 1 '^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|DATA_ENGINE)\}' ../keywords/variables.resource

Length of output: 172


Script:

#!/bin/bash
# Let's first find the correct path to the variables.resource file
fd variables.resource

# Then check the content of the file for variable definitions
fd variables.resource --exec cat {}

Length of output: 384

e2e/keywords/host.resource (2)

Line range hint 4-9: LGTM: Library imports are well-organized and complete.

The imports provide all necessary dependencies for node management operations.


44-47: Verify implementation against test requirements.

The implementation tracks powered-off nodes for later recovery, which aligns with the test objectives. However, consider adding parameter validation and error handling.

Let's verify the implementation against the test requirements:

Consider adding parameter validation:

 Power off node ${node_id}
+    # Validate node_id is a positive integer
+    Should Be True    ${node_id} >= 0    msg=Node ID must be a non-negative integer
+    
     ${powered_off_node} =     get_node_by_index    ${node_id}
+    Should Not Be Empty    ${powered_off_node}    msg=Invalid node ID: ${node_id}
+    
     Append to list      ${powered_off_nodes}     ${powered_off_node}
     power_off_node_by_name    ${powered_off_node}
✅ Verification successful

Implementation is correctly integrated with test framework

The verification shows that the implementation is properly integrated with the test framework:

  1. powered_off_nodes is initialized as an empty list in variables.resource
  2. The keyword is used in node reboot and drain test scenarios
  3. Proper cleanup is implemented in Cleanup test resources keyword which powers on nodes and cleans the list
  4. Multiple test files track powered-off nodes consistently using the same list

The suggested parameter validation is not necessary because:

  • get_node_by_index is a framework-level function that would fail if the index is invalid
  • The list operation is safe as powered_off_nodes is pre-initialized
  • Test teardown ensures proper cleanup of powered-off nodes
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of this keyword in test cases
# and ensure it's properly integrated with the shutdown volume node test case

# Check for test cases using this keyword
rg -l "Power off node" --type robot

# Check for related test implementations
rg -l "shutdown.*volume.*node.*test.*auto.*reattach" --type robot

Length of output: 178


Script:

#!/bin/bash
# Let's examine the test files that use this keyword to understand the context
rg -B 5 -A 5 "Power off node" e2e/tests/negative/node_reboot.robot e2e/tests/negative/node_drain.robot

# Check if powered_off_nodes is initialized somewhere
rg "powered_off_nodes" --type robot

# Check the implementation of get_node_by_index
rg -B 2 -A 2 "get_node_by_index" --type robot

Length of output: 3494


Script:

#!/bin/bash
# Let's check the initialization of powered_off_nodes and related setup
rg -B 5 -A 5 "Test Setup|Suite Setup" e2e/tests/negative/node_reboot.robot e2e/tests/negative/node_drain.robot

# Check the implementation of power_off_node_by_name
rg -B 2 -A 2 "power_off_node_by_name" --type robot

# Check if there are any test teardown procedures handling powered off nodes
rg -B 5 -A 5 "Test Teardown|Suite Teardown" e2e/tests/negative/node_reboot.robot e2e/tests/negative/node_drain.robot

Length of output: 3561


Script:

#!/bin/bash
# Let's check the Set test environment keyword which might initialize powered_off_nodes
rg -B 2 -A 5 "Set test environment" ../e2e/keywords/

# Check the Cleanup test resources keyword for handling powered off nodes
rg -B 2 -A 5 "Cleanup test resources" ../e2e/keywords/

# Check for any other references to powered_off_nodes in keyword files
rg -B 2 -A 2 "powered_off_nodes" ../e2e/keywords/

Length of output: 332


Script:

#!/bin/bash
# Let's check the Set test environment keyword which might initialize powered_off_nodes
rg -B 2 -A 5 "Set test environment" e2e/keywords/

# Check the Cleanup test resources keyword for handling powered off nodes
rg -B 2 -A 5 "Cleanup test resources" e2e/keywords/

# Check for any other references to powered_off_nodes in keyword files
rg -B 2 -A 2 "powered_off_nodes" e2e/keywords/

# Let's also check the content of the host.resource file
cat e2e/keywords/host.resource

Length of output: 5101

e2e/tests/regression/test_migration.robot (1)

6-6: LGTM! Good practice moving variables to a centralized resource file.

The addition of variables.resource improves maintainability by centralizing variable definitions that are used across multiple test files.

e2e/keywords/sharemanager.resource (2)

29-32: LGTM! The new keyword follows consistent patterns.

The implementation correctly follows the established patterns in the file:

  • Reuses existing helper functions for name generation
  • Maintains consistent naming conventions
  • Follows the same structure as similar keywords

29-32: Verify the restart detection logic.

Since this keyword is crucial for testing the auto-reattachment functionality mentioned in issue #9858, we should verify the underlying implementation.

✅ Verification successful

Let's examine the implementation of the function in the identified file.


Let's check the retry configuration and pod utilities being used to ensure the restart detection is robust.


Let's check the retry configuration values and the get_pod implementation to complete our verification.


The restart detection logic is properly implemented

The implementation correctly detects pod restarts by:

  • Tracking pod creation timestamps to detect new instances
  • Using robust retry mechanism with configurable count and interval
  • Properly handling pod absence during restart
  • Using the official Kubernetes API for pod status checks
  • Failing explicitly if restart is not detected within the retry window
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check the implementation of wait_for_sharemanager_pod_restart
# to ensure it correctly detects pod restarts and handles timeouts

# Search for the implementation
ast-grep --pattern 'def wait_for_sharemanager_pod_restart($$$)'

# Look for related test cases
rg -l "wait_for_sharemanager_pod_restart" --type py

Length of output: 160


Script:

#!/bin/bash
# Get the implementation of wait_for_sharemanager_pod_restart and surrounding context
rg -A 20 "def wait_for_sharemanager_pod_restart" e2e/libs/keywords/sharemanager_keywords.py

# Also check for any restart detection related functions in the same file
rg "restart" e2e/libs/keywords/sharemanager_keywords.py

Length of output: 1432


Script:

#!/bin/bash
# Check the retry configuration
rg "get_retry_count_and_interval" --type py

# Check pod utilities
rg "def get_pod" --type py

# Look for any test cases using this function
rg -B 5 -A 10 "wait_for_sharemanager_pod_restart" e2e/

Length of output: 10658


Script:

#!/bin/bash
# Check the retry configuration implementation
rg -A 5 "def get_retry_count_and_interval" e2e/libs/utility/utility.py

# Check the get_pod implementation
cat e2e/libs/utility/utility.py | rg -A 10 "def get_pod"

Length of output: 641

e2e/tests/regression/test_scheduling.robot (1)

6-6: LGTM! Good practice to centralize variable management.

The addition of variables.resource aligns with best practices by centralizing variable definitions, making the test suite more maintainable and consistent.

e2e/tests/regression/test_replica.robot (2)

6-6: LGTM! Good practice to centralize variable management.

The addition of variables.resource improves code organization by centralizing variable definitions that were previously scattered across test files.

Let's verify the resource file exists and contains the required variables:

✅ Verification successful

Resource file contains all expected variables and follows Robot Framework conventions

The variables.resource file exists and contains essential variables including:

  • ${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL} for test control flow
  • ${DATA_ENGINE} for engine specification
  • Additional configuration variables for volume types and network settings
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the variables.resource file exists and contains the required variables
# Expected: File exists and contains LOOP_COUNT, RETRY_COUNT, RETRY_INTERVAL, and DATA_ENGINE

# Check if the file exists and show its contents
cat e2e/keywords/variables.resource

Length of output: 326


Line range hint 1-24: Consider adding node shutdown test cases to align with PR objectives.

The PR objectives mention testing node shutdown and auto-reattachment scenarios (issue #9858), but this file only contains replica rebuilding tests. Consider adding test cases for:

  1. Share manager pod behavior during node shutdown
  2. RWX PVC auto-reattachment to new nodes

Let's check if these test cases exist elsewhere:

e2e/tests/regression/test_volume.robot (1)

6-6: Verify variables.resource contains all required variables

The change to centralize variables is good for maintainability. However, we should verify that all previously defined variables (${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}) are properly defined in the new resource file.

✅ Verification successful

All required variables are properly defined in variables.resource

The verification confirms that all previously defined variables (${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}) have been successfully moved to e2e/keywords/variables.resource with the same values:

  • ${LOOP_COUNT} = 1
  • ${RETRY_COUNT} = 300
  • ${RETRY_INTERVAL} = 1

The centralization of these variables is a good practice and has been implemented correctly.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all required variables are defined in variables.resource

# Check if the variables exist in the new resource file
rg -l "(?:\$\{LOOP_COUNT\}|\$\{RETRY_COUNT\}|\$\{RETRY_INTERVAL\})" "e2e/keywords/variables.resource"

Length of output: 98


Script:

#!/bin/bash
# Let's first check if the variables.resource file exists and see its content
fd "variables.resource" e2e/keywords/ --type f --exec cat {}

# Also, let's check the original test file to see what variables were previously defined
fd "test_volume.robot" e2e/tests/regression/ --type f --exec git show HEAD~1:{}

Length of output: 2590

e2e/libs/keywords/host_keywords.py (1)

49-52: Implementation aligns with PR objectives.

The addition of the waiting parameter enhances the testing capabilities for issue #9858, allowing verification of share manager behavior during both immediate and delayed node shutdowns. This flexibility is crucial for thorough testing of RWX PVC auto-reattachment scenarios.
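To make the two call patterns concrete, here is a self-contained sketch of the behavior implied by the diff; the FakeHost backend and helper names are illustrative stand-ins, not the framework's actual classes:

# A minimal sketch (assumed names, stub backend) of the waiting semantics:
# waiting=True blocks until the node reports stopped, waiting=False returns
# immediately so the test can observe in-flight behavior.
import time


class FakeHost:
    """Stand-in for the real AWS/Harvester backend used by the framework."""

    def power_off_node(self, node_name, waiting=True):
        print(f"requested power-off of {node_name}")
        if not waiting:
            return  # fire-and-forget: shutdown continues in the background
        time.sleep(1)  # pretend to poll the provider until the node is stopped
        print(f"{node_name} is powered off")


def power_off_volume_node(host, node_name, waiting=True):
    # Mirrors the keyword-layer call, forwarding the flag to the host backend.
    host.power_off_node(node_name, waiting=waiting)


host = FakeHost()
power_off_volume_node(host, "worker-1")                  # default: wait for shutdown
power_off_volume_node(host, "worker-2", waiting=False)   # don't wait; observe share manager behavior meanwhile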

Let's verify the usage of this method in test cases:

✅ Verification successful

Implementation aligns with PR objectives and is being utilized correctly

The verification confirms that the power_off_volume_node method with the new waiting parameter is being used in the test framework:

  • In e2e/keywords/workload.resource: Used with both default behavior (waiting=True) and immediate shutdown (waiting=False)
  • The implementation provides the flexibility needed for testing share manager behavior during node shutdowns as intended in issue #9858
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the new waiting parameter is utilized in test cases
# Expected: Find test cases using power_off_volume_node with waiting=False

# Search for test cases using the new parameter
rg -l "power_off_volume_node.*waiting.*=.*False" "e2e/test"

Length of output: 108


Script:

#!/bin/bash
# Let's first find the correct test directory structure
fd -t d "test"

# Then search for any usage of power_off_volume_node
rg "power_off_volume_node" -A 2 -B 2

Length of output: 1641

e2e/tests/negative/kubelet_restart.robot (1)

6-6: Verify variables in the new resource file

The variables previously defined in this file have been moved to variables.resource. Let's ensure all required variables are properly defined.

✅ Verification successful

Variables are properly defined in the centralized resource file

All required variables used in kubelet_restart.robot are correctly defined in e2e/keywords/variables.resource:

  • ${LOOP_COUNT} = 1
  • ${RETRY_COUNT} = 300
  • ${RETRY_INTERVAL} = 1
  • ${RWX_VOLUME_FAST_FAILOVER} = false
  • ${DATA_ENGINE} = v1

The centralization of these variables maintains consistency and follows good practices for test maintenance.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all required variables are defined in the new resource file
# Expected variables: LOOP_COUNT, RETRY_COUNT, RETRY_INTERVAL, RWX_VOLUME_FAST_FAILOVER, DATA_ENGINE

# Check for variable definitions in the resource file
rg -A 1 '^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|RWX_VOLUME_FAST_FAILOVER|DATA_ENGINE)\}' ../keywords/variables.resource

Length of output: 197


Script:

#!/bin/bash
# Let's first find the correct path to the variables.resource file
fd variables.resource

# Then check the content of the file for variable definitions
# We'll also search for these variables in the original test file to understand what we're looking for
rg -A 1 '^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|RWX_VOLUME_FAST_FAILOVER|DATA_ENGINE)\}' e2e/tests/negative/kubelet_restart.robot

Length of output: 192


Script:

#!/bin/bash
# Now that we found the correct path, let's check the content of variables.resource
cat e2e/keywords/variables.resource

# Also let's check the original test file to see how these variables were used before
cat e2e/tests/negative/kubelet_restart.robot

Length of output: 2688

e2e/tests/regression/test_ha.robot (2)

6-6: Verify variable definitions in the new resource file

The migration of variables to variables.resource improves maintainability. However, we should ensure that all required variables (${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}) are properly defined in the new resource file.

✅ Verification successful

Variables are properly defined in the resource file

The verification confirms that all required variables (${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}) are correctly defined in e2e/keywords/variables.resource with appropriate default values:

  • ${LOOP_COUNT} = 1
  • ${RETRY_COUNT} = 300
  • ${RETRY_INTERVAL} = 1
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all required variables are defined in the new resource file

# Check for required variables in the resource file
echo "Checking for required variables in variables.resource..."
rg -A 1 "LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL" "../keywords/variables.resource"

Length of output: 274


Script:

#!/bin/bash
# Let's first find the correct path to the variables.resource file
fd -t f "variables.resource$"

# Then check the content of the file for our variables
echo "=== Checking test file for variable usage ==="
rg "LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL" "e2e/tests/regression/test_ha.robot"

Length of output: 289


Script:

#!/bin/bash
# Now that we found the correct path, let's check the content of variables.resource
echo "=== Content of variables.resource ==="
cat "e2e/keywords/variables.resource"

# Also check if these variables might be defined in other resource files
echo -e "\n=== Checking other resource files for these variables ==="
rg "LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL" "e2e/**/*.resource"

Length of output: 665


Line range hint 36-52: Consider implementing the skipped test case

The "Don't Orphan Processes" test case has detailed documentation but is currently skipped. This test could be valuable for verifying proper cleanup of resources during node failures, which aligns with the PR's objectives around node shutdown scenarios.

e2e/tests/negative/stress_cpu.robot (1)

Line range hint 15-89: Consider adding a test case for node shutdown scenario.

While the existing test cases provide good coverage for CPU stress scenarios, they don't directly address the PR objective of testing node shutdown and auto-reattachment behavior. Consider adding a new test case that specifically tests this scenario.

Example structure for the new test case:

Stress Volume Node CPU During Node Shutdown And Auto Reattachment
    Given Create volume 0 with    size=5Gi    numberOfReplicas=3
    And Attach volume 0
    And Write data to volume 0

    FOR    ${i}    IN RANGE    ${LOOP_COUNT}
        When Stress CPU of node with volume 0
        And Shutdown volume node
        And Wait for volume auto-reattachment to new node
        Then Check volume 0 data is intact
    END

Let's verify if similar test cases exist in other files:

e2e/tests/negative/stress_memory.robot (3)

6-6: LGTM! Good practice centralizing variables.

The addition of variables.resource improves maintainability by centralizing variable definitions, which is particularly important for test parameters like ${LOOP_COUNT} that are used across multiple test cases.


Line range hint 19-31: Add documentation about expected behavior under memory stress.

The test case verifies data integrity during replica rebuilding under memory stress, but it would be helpful to document:

  • Expected memory thresholds
  • Expected system behavior under stress
  • Success criteria beyond data integrity

Let's verify the memory stress implementation:

✅ Verification successful

Add documentation about memory stress test configuration and behavior

The test uses stress-ng with the following configuration:

  • Memory load: 80% of available memory
  • Single VM worker process
  • 1-hour timeout period
  • Stress pod runs on the same node as the volume

Consider documenting:

  • The memory stress configuration in test description
  • Expected behavior: system should maintain data integrity while under 80% memory pressure
  • Success criteria: volume replica rebuilding completes successfully under memory stress
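To make the configuration above easier to document, a hypothetical helper that assembles the equivalent stress-ng invocation could look like the following; the constant names are assumptions, not the framework's actual identifiers:

# Hypothetical helper mirroring the memory stress configuration described above
# (80% of available memory, one VM worker, 1-hour timeout). Names are illustrative only.
STRESS_NG_MEM_LOAD_PERCENTAGE = 80
STRESS_NG_MEM_VM_WORKERS = 1
STRESS_NG_MEM_TIMEOUT = "1h"


def build_memory_stress_command():
    """Return the stress-ng argv a stress pod could run on the volume node."""
    return [
        "stress-ng",
        "--vm", str(STRESS_NG_MEM_VM_WORKERS),
        "--vm-bytes", f"{STRESS_NG_MEM_LOAD_PERCENTAGE}%",
        "--timeout", STRESS_NG_MEM_TIMEOUT,
    ]


if __name__ == "__main__":
    print(" ".join(build_memory_stress_command()))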
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check how memory stress is implemented
# Expected: Find the stress memory implementation to understand thresholds

ast-grep --pattern 'Stress memory of node with volume $_'
rg -A 5 'Stress memory of node with volume'

Length of output: 1356


Script:

#!/bin/bash
# Let's check the implementation of stress_node_memory functions
rg -A 10 'stress_node_memory_by_volume|stress_node_memory_of_all_volumes'

# Also check if there are any configuration or constant files that might define thresholds
fd -e yaml -e py -e robot -e resource | xargs rg -l 'stress.*memory|memory.*stress'

Length of output: 2860


Script:

#!/bin/bash
# Let's check the actual memory stress implementation in stress.py
cat e2e/libs/node/stress.py

Length of output: 5186


Script:

#!/bin/bash
# Let's check the constant values used for memory stress
cat e2e/libs/node/constant.py

Length of output: 373


Line range hint 33-47: Consider potential race conditions in stress test loops.

The remaining test cases follow similar patterns of stressing memory during volume operations. Consider:

  1. Adding retry mechanisms for volume operations that might fail under stress
  2. Implementing proper cleanup between iterations
  3. Adding health checks before proceeding to the next iteration

Let's verify the retry mechanisms:

Also applies to: 49-67, 69-89

✅ Verification successful

Based on the search results, I can see that there are already retry mechanisms and health checks implemented in the test framework:

  1. The Wait for volume healthy keyword is consistently used after volume operations
  2. Data integrity checks are performed using Check volume data is intact
  3. The test framework has proper cleanup between iterations with detach/attach cycles
  4. Health checks are performed before proceeding to next iterations

The review comment's concerns about retry mechanisms and health checks are already addressed in the codebase.

Let me verify the cleanup mechanisms between iterations:


Let me try a different search pattern to verify the cleanup mechanisms:


The review comment's concerns are already properly addressed in the codebase.

Looking at the test file and previous search results, I can confirm that:

  1. Retry mechanisms: The framework uses Wait for volume healthy and Wait for volume detached keywords consistently to ensure operations complete
  2. Cleanup between iterations: The test properly detaches and reattaches volumes between iterations with proper wait states
  3. Health checks: Each iteration includes data integrity verification with Check volume data is intact and volume health checks with Wait for volume healthy

The test follows best practices for stress testing with proper cleanup, verification, and health checks between iterations. No changes are needed.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for retry implementations in volume operations
# Expected: Find retry mechanisms for volume operations under stress

rg -A 5 '(Wait for|Wait until).*volume.*healthy'
ast-grep --pattern 'Wait for volume $_ healthy'

Length of output: 53636


Script:

#!/bin/bash
# Check cleanup implementations between iterations
rg -A 5 'Detach volume.*\s+.*Wait for volume.*detached'

Length of output: 57


Script:

#!/bin/bash
# Check cleanup implementations between iterations
rg -A 2 'Detach volume.*\s+And Wait for volume'

# Also check the test file directly
cat e2e/tests/negative/stress_memory.robot

Length of output: 2641

e2e/tests/negative/stress_filesystem.robot (2)

6-6: LGTM: Good practice to centralize variable definitions.

The addition of variables.resource aligns with best practices by centralizing common variable definitions, making the test framework more maintainable.


Line range hint 19-29: Consider enhancing error handling in the replica rebuilding test.

The test "Stress Volume Node Filesystem When Replica Is Rebuilding" should verify that the filesystem remains stable during rebuilding. Consider adding additional validation steps.

Consider adding these validation steps within the loop:

And Verify volume 0 filesystem is accessible
And Verify no I/O errors in volume 0 logs
e2e/tests/regression/test_v2.robot (2)

6-6: Verify variables migration completeness

The addition of variables.resource and removal of inline variables is a good practice for maintainability. However, we should verify that all necessary variables (especially ${LOOP_COUNT}) are properly defined in the new resource file.

✅ Verification successful

Variables migration is complete and correct

The variables.resource file contains all essential variables including ${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}, and other global variables needed for testing. The migration to a centralized variable definition is properly implemented.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all variables used in the test file are defined in the resource file
# Check if LOOP_COUNT and other variables are defined in the new resource file
rg -A 1 "^\*\*\* Variables \*\*\*" "../keywords/variables.resource"
rg "LOOP_COUNT" "../keywords/variables.resource"

Length of output: 256


Script:

#!/bin/bash
# Let's first find the correct path to the variables.resource file
fd "variables.resource"

# Then check the content of test_v2.robot to see what variables are being used
rg "\${[A-Z_]+}" "e2e/tests/regression/test_v2.robot"

Length of output: 203


Script:

#!/bin/bash
# First, let's check the content of the variables resource file
cat e2e/keywords/variables.resource

# Then check variables used in test_v2.robot with a corrected regex
rg "\\\${[A-Z_]+}" "e2e/tests/regression/test_v2.robot"

Length of output: 477


54-54: Test coverage gap for node shutdown scenario

While the cluster tag addition is appropriate for this test case, I notice that this file doesn't include the test case mentioned in the PR objectives for testing volume node shutdown and auto-reattachment scenarios (issue #9858).

Would you like me to help draft a new test case that specifically addresses the share manager node shutdown scenario described in issue #9858?

e2e/tests/negative/uninstallation_checks.robot (2)

Line range hint 24-92: LGTM on test structure and documentation!

The test case is well-documented with clear prerequisites, steps, and environment requirements. The implementation follows good practices:

  • Tests both v1 and v2 data engines
  • Includes proper data verification
  • Validates cross-cluster DR functionality

6-6: Verify variable definitions in the new resource file.

The addition of variables.resource and removal of local variable declarations is a good practice for centralization. However, we should ensure all required variables are properly defined.

✅ Verification successful

Resource file change verified successfully

The variables.resource file properly defines all the required variables including ${LOOP_COUNT}, ${RETRY_COUNT}, and ${RETRY_INTERVAL}. The file is correctly located at e2e/keywords/variables.resource and is being used by multiple test files in the e2e test suite. The uninstallation_checks.robot file doesn't contain any local variable declarations that would conflict with the centralized variables.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all required variables are defined in the new resource file
# and check for any potential impacts on other test files.

# Check the content of the variables resource file
echo "Checking variables.resource content:"
cat ../keywords/variables.resource

# Look for any remaining direct references to the moved variables
echo -e "\nChecking for any remaining references to variables in test files:"
rg "(\${LOOP_COUNT}|\${RETRY_COUNT}|\${RETRY_INTERVAL})" ./e2e/tests/

Length of output: 532


Script:

#!/bin/bash
# Let's fix the path and regex issues in our verification

# First, let's find the correct path to the variables.resource file
echo "Looking for variables.resource file:"
fd "variables.resource"

# Check for variable references with properly escaped regex
echo -e "\nChecking for variable references in test files:"
rg '\$\{LOOP_COUNT\}|\$\{RETRY_COUNT\}|\$\{RETRY_INTERVAL\}' e2e/tests/

# Also check the content of the uninstallation_checks.robot to understand variable usage
echo -e "\nChecking uninstallation_checks.robot content:"
cat e2e/tests/negative/uninstallation_checks.robot

Length of output: 7391


Script:

#!/bin/bash
# Let's check the content of the variables.resource file since we found its correct path
echo "Checking variables.resource content:"
cat e2e/keywords/variables.resource

# Also check if there are any variable declarations in the uninstallation_checks.robot
# that might have been moved to variables.resource
echo -e "\nChecking for any variable declarations in uninstallation_checks.robot:"
rg '^\$\{.*\}.*=' e2e/tests/negative/uninstallation_checks.robot

Length of output: 633

e2e/tests/negative/node_delete.robot (1)

6-6: Verify all required variables are defined in variables.resource

The centralization of variables is a good practice. However, we need to ensure that all previously defined variables (${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}, ${RWX_VOLUME_FAST_FAILOVER}, ${DATA_ENGINE}) are properly defined in the new resource file.

✅ Verification successful

All required variables are properly defined in variables.resource

The verification confirms that all necessary variables have been successfully moved to the centralized resource file:

  • ${LOOP_COUNT} = 1
  • ${RETRY_COUNT} = 300
  • ${RETRY_INTERVAL} = 1
  • ${RWX_VOLUME_FAST_FAILOVER} = false
  • ${DATA_ENGINE} = v1
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all required variables are defined in the new resource file

# Check if all required variables are defined in variables.resource
echo "Checking for required variables in variables.resource..."
rg -A 1 "(\${LOOP_COUNT}|\${RETRY_COUNT}|\${RETRY_INTERVAL}|\${RWX_VOLUME_FAST_FAILOVER}|\${DATA_ENGINE})" "../keywords/variables.resource"

Length of output: 434


Script:

#!/bin/bash
# Description: Verify all required variables are defined in the new resource file

# Check if all required variables are defined in variables.resource
echo "Checking for required variables in variables.resource..."
# Search for each variable separately to avoid regex issues
echo "Searching for LOOP_COUNT..."
rg "LOOP_COUNT" "e2e/keywords/variables.resource"
echo "Searching for RETRY_COUNT..."
rg "RETRY_COUNT" "e2e/keywords/variables.resource"
echo "Searching for RETRY_INTERVAL..."
rg "RETRY_INTERVAL" "e2e/keywords/variables.resource"
echo "Searching for RWX_VOLUME_FAST_FAILOVER..."
rg "RWX_VOLUME_FAST_FAILOVER" "e2e/keywords/variables.resource"
echo "Searching for DATA_ENGINE..."
rg "DATA_ENGINE" "e2e/keywords/variables.resource"

# Also check the test file to see what variables were previously defined
echo "Checking git history for previously defined variables..."
git log -p "e2e/tests/negative/node_delete.robot" | rg "^\-.*(\$\{.*\})"

Length of output: 992

e2e/tests/negative/pull_backup_from_another_longhorn.robot (1)

6-6: LGTM: Good refactoring of variable management

The addition of variables.resource aligns with best practices by centralizing variable definitions, making the test framework more maintainable.

e2e/tests/negative/cluster_restart.robot (1)

6-6: Verify the variables.resource file setup

The centralization of variables into variables.resource is a good practice. However, let's ensure all required variables are properly defined.

✅ Verification successful

Resource file setup is correct and properly configured

The variables.resource file exists at e2e/keywords/variables.resource and contains all the required variables (LOOP_COUNT, RETRY_COUNT, RETRY_INTERVAL, CONTROL_PLANE_NODE_NETWORK_LATENCY_IN_MS, RWX_VOLUME_FAST_FAILOVER, DATA_ENGINE). The import path in cluster_restart.robot correctly references this file using a relative path.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the variables.resource file exists and contains all required variables

# Check if the file exists and contains the required variables
rg -U "(?s)\*\*\* Variables \*\*\*.*\\\${LOOP_COUNT}.*\\\${RETRY_COUNT}.*\\\${RETRY_INTERVAL}.*\\\${CONTROL_PLANE_NODE_NETWORK_LATENCY_IN_MS}.*\\\${RWX_VOLUME_FAST_FAILOVER}.*\\\${DATA_ENGINE}" ../keywords/variables.resource

Length of output: 502


Script:

#!/bin/bash
# Let's first check if the variables.resource file exists
fd "variables.resource" ../keywords/

# Then let's examine its content
rg "^\\\$\{.*\}" ../keywords/variables.resource

# Also check for Variables section marker
rg "\*\*\* Variables \*\*\*" ../keywords/variables.resource

Length of output: 388


Script:

#!/bin/bash
# Let's search for the variables.resource file in the entire repository
fd "variables.resource"

# Let's also check the actual path relative to the test file
ls -l e2e/tests/keywords/variables.resource

Length of output: 186


Script:

#!/bin/bash
# Let's examine the content of the found variables.resource file
cat e2e/keywords/variables.resource

# Also check the import statement in the test file
cat e2e/tests/negative/cluster_restart.robot | head -n 10

Length of output: 682

e2e/libs/host/aws.py (2)

71-71: LGTM! Well-designed parameter addition.

The new waiting parameter with default value True maintains backward compatibility while enabling more flexible testing scenarios. This change aligns well with the PR objective of testing node shutdown behaviors.


77-78: Verify AWS waiter timeout configuration.

Since this code is critical for testing share manager behavior during node shutdown (issue #9858), we should verify the default timeout settings of the AWS waiter are sufficient.
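For reference, a minimal boto3 sketch of bounding the wait with an explicit WaiterConfig is shown below; the region and instance ID are placeholders, and the values shown match boto3's documented defaults for the instance_stopped waiter (15s delay × 40 attempts, roughly 10 minutes):

# A minimal sketch (not the project's code) of capping how long power_off_node
# can block when waiting=True, by passing an explicit WaiterConfig.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
instance_ids = ["i-0123456789abcdef0"]  # placeholder instance ID

ec2.stop_instances(InstanceIds=instance_ids)

waiter = ec2.get_waiter("instance_stopped")
waiter.wait(
    InstanceIds=instance_ids,
    # Tuning Delay/MaxAttempts keeps a stuck shutdown from stalling the whole test run.
    WaiterConfig={"Delay": 15, "MaxAttempts": 40},
)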

e2e/tests/regression/test_backup.robot (1)

6-6: Verify the variables.resource file setup and variable accessibility.

The addition of variables.resource aligns with the effort to centralize variable management. This is a good practice for maintainability and consistency.

Let's verify the variables.resource file setup:

✅ Verification successful

Resource file setup is correct and contains the required variables

The variables.resource file exists at the correct location (e2e/keywords/variables.resource) and contains all the necessary variables including ${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}, and ${DATA_ENGINE}. The file path in the test file (../keywords/variables.resource) correctly references this centralized variable file.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that variables.resource exists and contains the necessary variables

# Check if the file exists and contains the required variables
rg -l "^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|DATA_ENGINE)\}" ../keywords/variables.resource

# Check for any remaining direct variable declarations in other test files
# that should have been moved to variables.resource
rg "^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|DATA_ENGINE)\}" "e2e/tests/**/*.robot"

Length of output: 318


Script:

#!/bin/bash
# Let's first find the correct path to the variables.resource file
fd "variables.resource" .

# Then check for any variable declarations in Robot Framework files
fd -e robot . --exec rg -l "^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|DATA_ENGINE)\}"

# Also check the content of the test_backup.robot file to understand the correct relative path
cat e2e/tests/regression/test_backup.robot

Length of output: 4327


Script:

#!/bin/bash
# Let's check the content of the variables.resource file
cat e2e/keywords/variables.resource

# Also check if these variables are used in the test file
rg "\${(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|DATA_ENGINE)}" e2e/tests/regression/test_backup.robot

Length of output: 566

e2e/libs/keywords/sharemanager_keywords.py (1)

95-95: Great addition of phase logging

The added logging statement improves test observability by showing the current phase of the share manager pod during the waiting period. This will be helpful for debugging test failures.

e2e/libs/host/harvester.py (2)

56-56: LGTM! The non-blocking power-off option is a good addition.

The new waiting parameter with a default value of True maintains backward compatibility while adding flexibility for batch operations. The early return when waiting=False is clean and straightforward.

Also applies to: 71-73


Line range hint 56-89: Consider adding additional power state validation.

Given this code is critical for testing scenarios like share manager recovery after node shutdown, consider adding additional validation to ensure the VM is truly powered off before returning.

Suggested additional validation:

     def power_off_node(self, node_name, waiting=True):
         vm_id = self.mapping[node_name]
         url = f"{self.url}/{vm_id}"
+        initial_state = None
+        try:
+            resp = requests.get(url, cookies=self.cookies, verify=False)
+            initial_state = resp.json()['metadata']['fields'].get('status', None)
+        except Exception as e:
+            logging(f"Failed to get initial VM state: {e}")
+
         for i in range(self.retry_count):
             logging(f"Trying to stop vm {vm_id} ... ({i})")
e2e/tests/regression/test_settings.robot (1)

6-6: LGTM! Good practice to centralize variable definitions.

The addition of variables.resource improves maintainability by centralizing variable definitions. This change aligns with Robot Framework best practices.

Let's verify that all required variables are properly defined in the new resource file:

✅ Verification successful

Let me try to locate the correct path for the variables resource file:


Let's check the content of the variables.resource file and look for any duplicate variable definitions:


Resource file addition verified and looks good

The verification shows that:

  1. The variables.resource file exists at the correct location (e2e/keywords/variables.resource)
  2. It contains global variable definitions that are properly scoped
  3. The test file test_settings.robot correctly references it using a relative path
  4. The test file uses ${DATA_ENGINE} which is defined in the variables resource file
  5. No conflicting variable definitions were found in the test file
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all required variables are defined in the new resource file
# and that no other test files still define these variables locally

# Check if all required variables are defined in the new resource file
echo "Checking variables.resource for required variables..."
cat ../keywords/variables.resource

# Check if any other test files still define these variables
echo "Checking for duplicate variable definitions..."
rg -l "^\*\*\* Variables \*\*\*" "e2e/tests/"

Length of output: 410


Script:

#!/bin/bash
# First, let's find the actual path of the variables.resource file
fd -e resource variables

# Check the content of test_settings.robot to understand the relative path
cat e2e/tests/regression/test_settings.robot

# List all resource files to understand the structure
fd -e resource

Length of output: 5504


Script:

#!/bin/bash
# Check the content of the variables resource file
cat e2e/keywords/variables.resource

# Look for any variables defined in test_settings.robot
echo -e "\nChecking for variables in test_settings.robot:"
rg "^\$\{.*\}" e2e/tests/regression/test_settings.robot

# Check for any duplicate variable definitions in other files
echo -e "\nChecking for duplicate variable definitions in other files:"
rg "^\$\{.*\}" e2e/tests/

Length of output: 654

e2e/tests/negative/replica_rebuilding.robot (1)

6-6: Verify variable definitions in the new resource file.

The centralization of variables is a good practice. However, we should verify that all required variables (${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}, ${DATA_ENGINE}) are properly defined in the new resource file.

✅ Verification successful

All required variables are properly defined in the resource file

The variables.resource file contains all the necessary variable definitions that are used in replica_rebuilding.robot:

  • ${LOOP_COUNT} = 1
  • ${RETRY_COUNT} = 300
  • ${RETRY_INTERVAL} = 1
  • ${DATA_ENGINE} = v1

The centralization of these variables is correctly implemented and all test cases in replica_rebuilding.robot can access these variables through the resource import.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all required variables are defined in the new resource file

# Check if the variables are defined in the resource file
rg -l "(?:\$\{LOOP_COUNT\}|\$\{RETRY_COUNT\}|\$\{RETRY_INTERVAL\}|\$\{DATA_ENGINE\})" ../keywords/variables.resource

Length of output: 185


Script:

#!/bin/bash
# Let's first find the correct path to the variables.resource file
fd variables.resource

# Then check the content of the file for the required variables
rg -l "(?:\$\{LOOP_COUNT\}|\$\{RETRY_COUNT\}|\$\{RETRY_INTERVAL\}|\$\{DATA_ENGINE\})"

Length of output: 139


Script:

#!/bin/bash
# Now that we found the file, let's check its content for the required variables
rg "(?:\$\{LOOP_COUNT\}|\$\{RETRY_COUNT\}|\$\{RETRY_INTERVAL\}|\$\{DATA_ENGINE\})" e2e/keywords/variables.resource -A 1

# Also check how these variables are defined
rg "^(?:\$\{LOOP_COUNT\}|\$\{RETRY_COUNT\}|\$\{RETRY_INTERVAL\}|\$\{DATA_ENGINE\}).*=.*" e2e/keywords/variables.resource

Length of output: 236


Script:

#!/bin/bash
# Let's try a different approach to check variable definitions in Robot Framework syntax
cat e2e/keywords/variables.resource

# Also check where these variables are currently used in the test file
cat e2e/tests/negative/replica_rebuilding.robot

Length of output: 5173

e2e/tests/regression/test_basic.robot (2)

6-6: LGTM! Good refactoring of variable management.

The addition of variables.resource and removal of inline variable declarations is a positive change that:

  • Centralizes variable management
  • Reduces duplication across test files
  • Improves maintainability

Line range hint 1-1: Verify location of the new test case.

The PR objectives mention adding a test case for "Shutdown Volume Node And Test Auto Reattach To A New Node", but this test case is not present in this file. Please verify:

  1. If the test case was added to a different file
  2. If the test case still needs to be added

This verification is important as the test case is crucial for validating the fix for issue #9858 where the share manager becomes stuck when a node is shut down.

✅ Verification successful

Let me check the content of the identified file to verify if it contains the test case.


The test case has been found and is properly implemented.

The test case "Shutdown Volume Node And Test Auto Reattach To A New Node" is present in e2e/tests/negative/node_reboot.robot. The test case verifies:

  • Volume node shutdown scenario for both RWO and RWX volumes
  • Auto reattachment of volumes to new nodes
  • Data integrity after node shutdown and reattachment
  • Share manager pod recovery for RWX volumes
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for the new test case across all robot files
echo "Searching for the new test case..."
rg -l "Shutdown.*Volume.*Node.*Auto.*Reattach" --type robot

# Search for any recent changes to robot test files
echo "Checking recent changes to robot test files..."
git diff --name-only HEAD~1 | grep "\.robot$"

Length of output: 1366


Script:

#!/bin/bash
# Check the content of node_reboot.robot for the test case
echo "Content of e2e/tests/negative/node_reboot.robot:"
cat e2e/tests/negative/node_reboot.robot

Length of output: 22057

e2e/tests/negative/network_disconnect.robot (2)

6-6: LGTM! Good practice to centralize variable management.

The addition of variables.resource improves maintainability by centralizing variable definitions that were previously declared locally.


Line range hint 1-1: Add more specific assertions for volume state and health.

While the test cases cover the scenarios well, consider adding more specific assertions to verify:

  1. Volume attachment state transitions
  2. Replica health status
  3. Data consistency after network recovery
  4. Resource cleanup after test completion

This would help catch subtle issues and ensure proper system recovery.

e2e/tests/negative/test_backup_listing.robot (1)

5-5: LGTM! Good improvements to test organization.

The addition of the 'negative' tag and centralization of variables through variables.resource improves test organization and maintainability.

Also applies to: 7-7

e2e/tests/negative/component_resilience.robot (2)

6-6: Verify the new resource file exists and contains the required variables.

The change to centralize variable management is good practice. However, we need to ensure all required variables (LOOP_COUNT, RETRY_COUNT, RETRY_INTERVAL, RWX_VOLUME_FAST_FAILOVER, DATA_ENGINE) are properly defined in the new resource file.

✅ Verification successful

Resource file verification successful

The variables.resource file exists at e2e/keywords/variables.resource and contains all the required variables with appropriate default values:

  • ${LOOP_COUNT} = 1
  • ${RETRY_COUNT} = 300
  • ${RETRY_INTERVAL} = 1
  • ${RWX_VOLUME_FAST_FAILOVER} = false
  • ${DATA_ENGINE} = v1
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the variables resource file exists and contains the required variables
rg -A 1 "^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|RWX_VOLUME_FAST_FAILOVER|DATA_ENGINE)\}" "e2e/tests/keywords/variables.resource"

Length of output: 210


Script:

#!/bin/bash
# Let's first check if the file exists in a different location
fd -g "*variables.resource"

# Also check for any Robot Framework resource files that might contain these variables
rg -A 1 "^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|RWX_VOLUME_FAST_FAILOVER|DATA_ENGINE)\}" -g "*.resource"

Length of output: 172


Script:

#!/bin/bash
# Let's check the content of the variables.resource file
cat e2e/keywords/variables.resource

# Also check where these variables were defined previously in the component_resilience.robot file
rg -A 1 "^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|RWX_VOLUME_FAST_FAILOVER|DATA_ENGINE)\}" "e2e/tests/negative/component_resilience.robot"

Length of output: 467


Line range hint 142-166: Clarify data engine restrictions on test cases.

Several test cases are only executed when DATA_ENGINE='v1', including backing image and RWX volume tests. Please clarify:

  1. Why are these tests restricted to v1 engine?
  2. Should we have equivalent tests for other data engines?
  3. Are there specific limitations that prevent these tests from running with other engines?

Also applies to: 208-241

✅ Verification successful

Based on the gathered information, I can now provide a final response:

Tests are correctly restricted to v1 data engine for backing image and RWX volume features

The restriction of certain tests to DATA_ENGINE='v1' is intentional and correct because:

  1. The backing image feature is specifically designed for v1 data engine, as evidenced by dedicated v2 engine tests in test_v2.robot that don't include backing image operations
  2. The tests are properly structured with conditional blocks to handle engine-specific features
  3. The default data engine is set to 'v1' in variables.resource, with separate test files for v2-specific functionality

The test implementation follows good practices by:

  • Using clear conditionals to separate engine-specific features
  • Having dedicated test files for v2 engine features
  • Maintaining proper test coverage for both engines in their supported feature sets
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for data engine related configurations and documentation
rg -A 5 "DATA_ENGINE" "e2e/tests/keywords/variables.resource"
rg -A 5 "dataEngine" "e2e/tests/keywords/"

Length of output: 238


Script:

#!/bin/bash
# Let's first find the correct paths for test files and keywords
fd "variables" "e2e"

# Then search for DATA_ENGINE usage across e2e tests
rg -A 5 "DATA_ENGINE" "e2e"

# Search for dataEngine configurations
rg -A 5 "dataEngine" "e2e"

# Look for documentation about data engines
fd "README|\.md" "e2e"

Length of output: 84892


Script:

#!/bin/bash
# Let's check the README for data engine documentation
cat e2e/README.md

# Let's also check for v2 data engine specific tests
rg "dataEngine=v2" "e2e/tests"

# And check for any comments or documentation about data engine restrictions
rg -B 2 -A 2 "DATA_ENGINE.*v1" "e2e"

Length of output: 8071

e2e/tests/negative/node_drain.robot (1)

6-6: Verify variable definitions in the new resource file

The centralization of variables is a good practice. However, we should verify that all required variables (${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}, ${RWX_VOLUME_FAST_FAILOVER}, ${DATA_ENGINE}) are properly defined in variables.resource.

✅ Verification successful

All required variables are properly defined in variables.resource

The variables.resource file contains all the required variables mentioned in the test file:

  • ${LOOP_COUNT} = 1
  • ${RETRY_COUNT} = 300
  • ${RETRY_INTERVAL} = 1
  • ${RWX_VOLUME_FAST_FAILOVER} = false
  • ${DATA_ENGINE} = v1

The centralization of these variables in variables.resource is correct and follows good practices. The test file properly imports this resource file and uses these variables consistently throughout the test cases.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if all required variables are defined in the new resource file

# Check for required variables in the resource file
rg -l "(?:LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|RWX_VOLUME_FAST_FAILOVER|DATA_ENGINE)" "e2e/tests/keywords/variables.resource"

Length of output: 204


Script:

#!/bin/bash
# Let's first find the correct path of the variables.resource file
fd "variables.resource" 

# Then check the content of the file for the required variables
rg "(?:LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|RWX_VOLUME_FAST_FAILOVER|DATA_ENGINE)" -A 1

Length of output: 105799


Script:

#!/bin/bash
# Let's check the content of the variables.resource file
cat e2e/keywords/variables.resource

# Also check if there are any other variable definitions in the test file
cat e2e/tests/negative/node_drain.robot

Length of output: 13088

e2e/tests/negative/node_reboot.robot (3)

6-7: LGTM: Resource imports align with test requirements

The addition of variables.resource and sharemanager.resource appropriately supports the new test case requirements for RWX volume and share manager testing.


24-54: Well-structured test implementation with comprehensive verification

The test case effectively validates the auto reattachment behavior with a thorough sequence of steps:

  1. Creates both RWO and RWX volumes
  2. Verifies initial deployment health
  3. Writes test data
  4. Simulates node shutdown
  5. Validates sharemanager recovery
  6. Verifies data integrity and functionality

This implementation directly addresses the issue #9858 regarding share manager recovery after node shutdown.


319-319: LGTM: Consistent power on handling across test cases

The modifications maintain a consistent approach to power on handling across different test scenarios, ensuring proper system recovery and test cleanup.

Also applies to: 337-337, 358-358, 375-375, 396-396, 412-412

Comment on lines +42 to +45
    FOR    ${powered_off_node}    IN    @{powered_off_nodes}
        Run keyword And Ignore Error    power_on_node_by_name    ${powered_off_node}
        Remove Values From List    ${powered_off_nodes}    ${powered_off_node}
    END

💡 Codebase verification

🛠️ Refactor suggestion

List modification is safe, but error handling can be improved

After analyzing the codebase, I found that:

  1. The @{powered_off_nodes} list is properly initialized as an empty list in variables.resource
  2. Nodes are added to the list via Power off node keyword in host.resource
  3. The same list modification pattern is used in both common.resource and host.resource, indicating it's an established pattern
  4. The list modification during iteration is safe because:
    • Each node is processed only once
    • The list is only used for cleanup purposes
    • The same pattern is consistently used across files

However, the error handling suggestion from the original review is still valid:

    FOR    ${powered_off_node}    IN    @{powered_off_nodes}
-        Run keyword And Ignore Error    power_on_node_by_name    ${powered_off_node}
-        Remove Values From List    ${powered_off_nodes}    ${powered_off_node}
+        ${status}    ${error}=    Run Keyword And Ignore Error    power_on_node_by_name    ${powered_off_node}
+        Remove Values From List    ${powered_off_nodes}    ${powered_off_node}
+        IF    '${status}' == 'FAIL'
+            Log    Failed to power on node ${powered_off_node}: ${error}    WARN
+        END
    END
🔗 Analysis chain

Improve error handling and list modification safety

While the loop effectively handles multiple powered-off nodes, there are a few concerns to address:

  1. The node is removed from the list regardless of whether the power-on operation succeeds.
  2. Modifying the list while iterating over it could lead to unexpected behavior.
  3. The @{powered_off_nodes} list's initialization is not visible in this file.

Consider this safer implementation:

-    FOR    ${powered_off_node}    IN    @{powered_off_nodes}
-        Run keyword And Ignore Error    power_on_node_by_name    ${powered_off_node}
-        Remove Values From List    ${powered_off_nodes}    ${powered_off_node}
-    END
+    ${nodes_to_process}=    Copy List    ${powered_off_nodes}
+    FOR    ${powered_off_node}    IN    @{nodes_to_process}
+        ${status}    ${error}=    Run Keyword And Ignore Error    power_on_node_by_name    ${powered_off_node}
+        IF    '${status}' == 'PASS'
+            Remove Values From List    ${powered_off_nodes}    ${powered_off_node}
+        ELSE
+            Log    Failed to power on node ${powered_off_node}: ${error}    WARN
+        END
+    END

Let's verify the initialization of the powered_off_nodes variable:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find where powered_off_nodes is initialized
# Expected: Find the variable declaration in variables.resource or similar file

rg -l "powered_off_nodes" --type robot

Length of output: 38


Script:

#!/bin/bash
# Let's try a broader search without type filtering to find any references
rg -l "powered_off_nodes"

Length of output: 144


Script:

#!/bin/bash
# Let's also check the specific file content to understand the context better
cat "e2e/keywords/common.resource"

Length of output: 2476


Script:

#!/bin/bash
# Search for any power-related keywords that might help understand the context
rg "power_on_node_by_name" -A 2 -B 2

Length of output: 1149


Script:

#!/bin/bash
# Let's check the variables.resource file to find the initialization
cat "e2e/keywords/variables.resource"

Length of output: 326


Script:

#!/bin/bash
# Let's also check host.resource since it has similar power-related code
cat "e2e/keywords/host.resource"

Length of output: 1735

Comment on lines +70 to +87
def wait_for_sharemanager_pod_restart(self, name):
    sharemanager_pod_name = "share-manager-" + name
    sharemanager_pod = get_pod(sharemanager_pod_name, "longhorn-system")
    last_creation_time = sharemanager_pod.metadata.creation_timestamp

    retry_count, retry_interval = get_retry_count_and_interval()
    for i in range(retry_count):
        logging(f"Waiting for sharemanager for volume {name} restart ... ({i})")
        time.sleep(retry_interval)
        sharemanager_pod = get_pod(sharemanager_pod_name, "longhorn-system")
        if sharemanager_pod == None:
            continue
        creation_time = sharemanager_pod.metadata.creation_timestamp
        logging(f"Getting new sharemanager which is created at {creation_time}, and old one is created at {last_creation_time}")
        if creation_time > last_creation_time:
            return

    assert False, f"sharemanager pod {sharemanager_pod_name} isn't restarted"

🛠️ Refactor suggestion

Good addition, but some improvements needed

The new method effectively monitors share manager pod restarts with good logging. However, there are a few improvements to consider:

  1. Follow Python best practices:
-            if sharemanager_pod == None:
+            if sharemanager_pod is None:
  2. Use proper assertion:
-        assert False, f"sharemanager pod {sharemanager_pod_name} isn't restarted"
+        raise AssertionError(f"sharemanager pod {sharemanager_pod_name} isn't restarted")
  3. Consider extracting common code with delete_sharemanager_pod_and_wait_for_recreation as both methods share similar logic for monitoring pod recreation (a rough sketch follows below).
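
As a rough illustration of that extraction, a shared recreation helper could look like the sketch below; the namespace and retry values are assumptions rather than the framework's actual configuration:

# A rough, self-contained sketch of a shared recreation helper that both
# wait_for_sharemanager_pod_restart and delete_sharemanager_pod_and_wait_for_recreation
# could reuse. Namespace and retry values here are assumptions.
import time

from kubernetes import client, config

NAMESPACE = "longhorn-system"
RETRY_COUNT = 300
RETRY_INTERVAL = 1


def get_pod(name, namespace=NAMESPACE):
    config.load_kube_config()
    v1 = client.CoreV1Api()
    try:
        return v1.read_namespaced_pod(name=name, namespace=namespace)
    except client.exceptions.ApiException as e:
        if e.status == 404:
            return None  # pod is temporarily absent while being recreated
        raise


def wait_for_pod_recreation(pod_name, last_creation_time):
    """Poll until a pod with a newer creation timestamp appears, else fail."""
    for i in range(RETRY_COUNT):
        time.sleep(RETRY_INTERVAL)
        pod = get_pod(pod_name)
        if pod is None:
            continue
        if pod.metadata.creation_timestamp > last_creation_time:
            return pod
    raise AssertionError(f"pod {pod_name} was not recreated")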
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 def wait_for_sharemanager_pod_restart(self, name):
     sharemanager_pod_name = "share-manager-" + name
     sharemanager_pod = get_pod(sharemanager_pod_name, "longhorn-system")
     last_creation_time = sharemanager_pod.metadata.creation_timestamp

     retry_count, retry_interval = get_retry_count_and_interval()
     for i in range(retry_count):
         logging(f"Waiting for sharemanager for volume {name} restart ... ({i})")
         time.sleep(retry_interval)
         sharemanager_pod = get_pod(sharemanager_pod_name, "longhorn-system")
-        if sharemanager_pod == None:
+        if sharemanager_pod is None:
             continue
         creation_time = sharemanager_pod.metadata.creation_timestamp
         logging(f"Getting new sharemanager which is created at {creation_time}, and old one is created at {last_creation_time}")
         if creation_time > last_creation_time:
             return

-    assert False, f"sharemanager pod {sharemanager_pod_name} isn't restarted"
+    raise AssertionError(f"sharemanager pod {sharemanager_pod_name} isn't restarted")
🧰 Tools
🪛 Ruff (0.8.0)

80-80: Comparison to None should be cond is None

Replace with cond is None

(E711)


87-87: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)
