test(robot): add test case Shutdown Volume Node And Test Auto Reattach To A New Node #2169
base: master
Conversation
…h To A New Node Signed-off-by: Yang Chiu <[email protected]>
Walkthrough

The pull request introduces significant changes across multiple files, primarily enhancing the resource management and testing capabilities within the end-to-end testing framework. Key modifications include the addition of numerous library imports, the introduction of new keywords, and the restructuring of variable management by centralizing variable definitions into a new resource file. The updates also involve renaming and modifying existing keywords to support batch processing of powered-off nodes, improving the flexibility and robustness of the testing environment.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (50)
e2e/libs/host/base.py (1)
26-26: Add docstring to document the waiting parameter.

Since this is an abstract base class that defines the interface, it would be helpful to document the expected behavior of the waiting parameter. Consider adding a docstring like this:

@abstractmethod
def power_off_node(self, node_name, waiting):
+    """Power off a node.
+
+    Args:
+        node_name: Name of the node to power off
+        waiting: If True, wait for the node to completely shut down before returning
+    """
    return NotImplemented

e2e/tests/regression/test_engine_image.robot (1)
Line range hint 1-24: Consider adding node shutdown test coverage.

While the existing test case covers engine upgrade scenarios, given the PR's objective to test node shutdown and auto-reattachment (issue #9858), consider adding a new test case in this file to verify engine image behavior during node shutdown scenarios.
Would you like me to help draft a new test case that covers:
- Engine image behavior when a node is shut down
- Verification of engine image state consistency after node recovery
- Integration with the share manager functionality for RWX PVCs
e2e/keywords/host.resource (1)
38-42: Consider improving error handling and naming.

While the implementation is functional, there are a few suggestions for improvement:
- The keyword name "Power on off nodes" is ambiguous. Consider renaming to something more descriptive like "Power On Previously Powered Off Nodes".
- The error handling could be enhanced to log failures.
- Consider adding validation for the ${powered_off_nodes} list.

Here's a suggested improvement:
-Power on off nodes
+Power On Previously Powered Off Nodes
    FOR    ${powered_off_node}    IN    @{powered_off_nodes}
-        Run keyword And Ignore Error    power_on_node_by_name    ${powered_off_node}
+        ${status}    ${error} =    Run Keyword And Ignore Error    power_on_node_by_name    ${powered_off_node}
+        Run Keyword If    '${status}' == 'FAIL'    Log    Failed to power on node ${powered_off_node}: ${error}    WARN
        Remove Values From List    ${powered_off_nodes}    ${powered_off_node}
    END

e2e/tests/regression/test_migration.robot (1)
Line range hint 19-47: Consider adding a test case for node shutdown scenario

While the current test case thoroughly validates RWX volume migration, it doesn't cover the specific scenario from issue #9858 where a node shutdown causes the share manager to become stuck. Consider adding a new test case that:
- Creates an RWX volume
- Shuts down the node containing the share manager pod
- Verifies auto-reattachment to a new node
- Validates that the share manager recovers properly
This would help prevent regression of the issue described in #9858.
Would you like me to help draft the new test case implementation?
e2e/keywords/sharemanager.resource (1)
29-32: Consider adding documentation for the new keyword.

While the implementation is correct, adding documentation would improve maintainability by clarifying:
- The purpose of the keyword
- Expected behavior during pod restart
- Any timeout or failure conditions
Example documentation:
Wait for sharemanager pod of deployment ${deployment_id} restart
+    [Documentation]    Waits for the sharemanager pod of the specified deployment to restart.
+    ...    The keyword uses the deployment ID to identify the volume and monitors
+    ...    the associated sharemanager pod until a restart is detected.
    ${deployment_name} =    generate_name_with_suffix    deployment    ${deployment_id}
    ${volume_name} =    get_workload_volume_name    ${deployment_name}
    wait_for_sharemanager_pod_restart    ${volume_name}
Line range hint 1-54: Missing test case for node shutdown and auto-reattachment scenario.

Based on the PR objectives (issue #9858), this file should include a new test case to verify that:
- The share manager correctly handles node shutdown scenarios
- RWX PVCs can successfully auto-reattach to new nodes
The current test cases don't cover these scenarios, which are the primary objectives of this PR.
Would you like me to help draft a new test case that covers these scenarios? The test case would:
- Set up a RWX PVC with share manager
- Shut down the node hosting the share manager
- Verify successful auto-reattachment to a new node
e2e/tests/regression/test_volume.robot (2)
Line range hint 1-60: Add test case for volume node shutdown and auto-reattach

Based on the PR objectives and issue #9858, this file should include a test case for validating the share manager's behavior when a volume node is shut down and testing the auto-reattachment to a new node. This is currently missing from the test suite.
Would you like me to help draft a new test case that:
- Creates a volume with RWX access mode
- Simulates a node shutdown
- Verifies the share manager's state
- Validates successful auto-reattachment to a new node?
Line range hint 1-60: Consider adding share manager state validation

To thoroughly test the scenario described in issue #9858, consider adding assertions to verify:
- Share manager's state transitions during node shutdown
- RWX PVC attachment behavior
- Recovery time metrics for auto-reattachment
This will help ensure the test suite effectively catches regressions related to the share manager getting stuck in stopping/error state.
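For the recovery-time metric above, a minimal Python sketch of one way to measure it is shown below. It assumes the Longhorn Volume CR is served at longhorn.io/v1beta2 in the longhorn-system namespace and that kubeconfig access is available; the e2e framework's own volume keywords would likely wrap this differently.

import time

from kubernetes import client, config


def wait_for_reattach_seconds(volume_name, timeout=300, interval=1):
    """Return how long the volume takes to report the 'attached' state again."""
    config.load_kube_config()
    api = client.CustomObjectsApi()
    start = time.time()
    while time.time() - start < timeout:
        volume = api.get_namespaced_custom_object(
            group="longhorn.io", version="v1beta2",
            namespace="longhorn-system", plural="volumes", name=volume_name)
        if volume.get("status", {}).get("state") == "attached":
            return time.time() - start  # recovery-time metric in seconds
        time.sleep(interval)
    raise TimeoutError(f"{volume_name} did not reattach within {timeout}s")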
e2e/libs/keywords/host_keywords.py (1)
49-52: Add docstring and error handling.

The implementation looks good and aligns well with the PR objectives for testing volume node shutdown scenarios. However, consider these improvements:
- Add a docstring explaining the parameters and behavior:

def power_off_volume_node(self, volume_name, waiting=True):
+    """Powers off the node containing the specified volume.
+
+    Args:
+        volume_name: Name of the volume whose node should be powered off
+        waiting: If True, waits for power-off completion. If False, returns immediately
+
+    Raises:
+        Exception: If volume node cannot be determined
+    """
- Add error handling for the node ID lookup:

- node_id = self.volume_keywords.get_node_id_by_replica_locality(volume_name, "volume node")
+ try:
+     node_id = self.volume_keywords.get_node_id_by_replica_locality(volume_name, "volume node")
+     if not node_id:
+         raise Exception(f"Could not determine node ID for volume {volume_name}")
+ except Exception as e:
+     raise Exception(f"Failed to get node ID for volume {volume_name}: {str(e)}")

e2e/tests/negative/kubelet_restart.robot (3)
Line range hint 15-35: Enhance test coverage for share manager scenarios

Given that this PR addresses issue #9858 regarding share manager getting stuck, consider adding specific validations:
- Verify share manager pod state transitions
- Validate successful RWX volume reattachment to new nodes
Restart Volume Node Kubelet While Workload Heavy Writing
    Given Set setting rwx-volume-fast-failover to ${RWX_VOLUME_FAST_FAILOVER}
    And Create storageclass longhorn-test with dataEngine=${DATA_ENGINE}
    And Create statefulset 0 using RWO volume with longhorn-test storageclass
    And Create statefulset 1 using RWX volume with longhorn-test storageclass
    FOR    ${i}    IN RANGE    ${LOOP_COUNT}
        And Keep writing data to pod of statefulset 0
        And Keep writing data to pod of statefulset 1
        When Stop volume nodes kubelet for 10 seconds    statefulset 0    statefulset 1
        And Wait for volume of statefulset 0 healthy
        And Wait for volume of statefulset 1 healthy
+       And Wait for share manager pod healthy    statefulset 1
        And Wait for workloads pods stable    statefulset 0    statefulset 1
+       And Verify RWX volume successfully reattached    statefulset 1
        Then Check statefulset 0 works
        And Check statefulset 1 works
    END
Line range hint 37-57: Consider parameterizing the kubelet stop duration

The 360-second duration for pod eviction timeout is hardcoded. Consider making this configurable through a variable to improve test maintainability.
Stop Volume Node Kubelet For More Than Pod Eviction Timeout While Workload Heavy Writing
    Given Set setting rwx-volume-fast-failover to ${RWX_VOLUME_FAST_FAILOVER}
    And Create storageclass longhorn-test with dataEngine=${DATA_ENGINE}
    And Create statefulset 0 using RWO volume with longhorn-test storageclass
    And Create statefulset 1 using RWX volume with longhorn-test storageclass
    FOR    ${i}    IN RANGE    ${LOOP_COUNT}
        And Keep writing data to pod of statefulset 0
        And Keep writing data to pod of statefulset 1
-       When Stop volume nodes kubelet for 360 seconds    statefulset 0    statefulset 1
+       When Stop volume nodes kubelet for ${POD_EVICTION_TIMEOUT}    statefulset 0    statefulset 1
        And Wait for volume of statefulset 0 healthy
        And Wait for volume of statefulset 1 healthy
+       And Wait for share manager pod healthy    statefulset 1
        And Wait for workloads pods stable    statefulset 0    statefulset 1
+       And Verify RWX volume successfully reattached    statefulset 1
        Then Check statefulset 0 works
        And Check statefulset 1 works
    END
Line range hint 1-57: Test structure aligns well with PR objectives

The test cases effectively validate the system's behavior during node shutdown scenarios. The structure is clean and follows Robot Framework best practices. The suggested improvements will help ensure comprehensive coverage of the share manager issue (#9858).
Consider adding these test cases to a dedicated test suite for share manager-related scenarios if one doesn't exist already. This would help organize tests by functionality and make it easier to maintain and extend share manager-specific test coverage.
e2e/tests/regression/test_ha.robot (1)
Line range hint 24-34: Add test case for share manager node shutdown scenario

The current test cases don't cover the specific scenario from issue #9858 where the share manager becomes stuck when its node is shut down. Consider adding a new test case that:
- Creates a RWX PVC with share manager
- Shuts down the node hosting the share manager
- Verifies successful auto-reattachment to a new node
Would you like me to help draft the new test case implementation?
e2e/keywords/common.resource (2)
Line range hint 4-24: Remove duplicate import of network_keywords library

The network_keywords.py library is imported twice (lines 8 and 13). Please remove one of the duplicate imports to maintain clean dependencies.

Library    ../libs/keywords/network_keywords.py
Library    ../libs/keywords/recurringjob_keywords.py
Library    ../libs/keywords/statefulset_keywords.py
Library    ../libs/keywords/stress_keywords.py
Library    ../libs/keywords/volume_keywords.py
Library    ../libs/keywords/workload_keywords.py
Library    ../libs/keywords/persistentvolumeclaim_keywords.py
-Library    ../libs/keywords/network_keywords.py
Library    ../libs/keywords/storageclass_keywords.py
Line range hint 1-67: Consider documenting the cleanup sequence dependencies

The cleanup sequence handles various resources in a specific order, which is crucial for proper test isolation. With the addition of powered-off node handling, it would be valuable to document:
- The rationale behind the cleanup sequence order
- Any dependencies between cleanup steps
- The interaction between powered-off node handling and other cleanup operations
Consider adding documentation in the following format:
*** Settings ***
Documentation    Common keywords
...    The cleanup sequence follows these dependencies:
...    1. Power on nodes first to ensure they're available for other cleanup operations
...    2. Uncordon nodes to allow pod scheduling
...    3. Clean up higher-level resources (deployments, statefulsets) before their dependencies
...    4. Clean up storage-related resources last

e2e/tests/negative/stress_cpu.robot (1)
6-6: LGTM! Consider documenting the variables.

The centralization of variables into a dedicated resource file is a good practice for maintainability. However, it would be helpful to document the expected values and purpose of these variables in the resource file.
e2e/tests/negative/stress_filesystem.robot (2)
Line range hint 1-100: Missing test coverage for node shutdown scenarios.

While the file contains various stress test scenarios, it doesn't address the primary PR objective of testing volume node shutdown and auto-reattachment scenarios described in issue #9858. Consider adding a new test case to validate this specific behavior.
I can help draft a new test case that would:
- Create a volume with multiple replicas
- Simulate a node shutdown
- Verify auto-reattachment to a new node
- Validate data integrity
Would you like me to propose the implementation?
Line range hint 31-43: Consider adding timeout handling for volume detach/attach cycles.

The test "Stress Volume Node Filesystem When Volume Is Detaching and Attaching" might benefit from explicit timeout handling to prevent test hangs.
Consider adding timeout parameters:
And Wait for volume 0 detached    timeout=300s
And Wait for volume 0 healthy    timeout=300s

e2e/tests/regression/test_v2.robot (1)
Line range hint 1-1: Missing test case for share manager node shutdown

Based on the PR objectives and issue #9858, this file should include a new test case that:
- Creates an RWX PVC with a share manager
- Simulates a node shutdown where the share manager pod is running
- Verifies that the PVC can successfully reattach to a new node
Consider adding a new test case with the following structure:
Share Manager Should Recover After Node Shutdown
    [Tags]    cluster
    Given Create storageclass longhorn-test with dataEngine=v2 accessMode=ReadWriteMany
    And Create persistentvolumeclaim 0 using RWX volume with longhorn-test storageclass
    And Create deployment 0 with persistentvolumeclaim 0
    And Wait for volume of deployment 0 healthy
    When Get node of share manager pod for deployment 0 volume
    And Shutdown node of share manager pod
    Then Wait for volume of deployment 0 attached to new node
    And Check deployment 0 works

e2e/tests/negative/uninstallation_checks.robot (1)
Line range hint 24-92: Consider enhancing error scenario coverage.

The test case thoroughly validates the happy path, but could be strengthened in two areas:
- Add explicit verification that backups are retained immediately after uninstallation
- Include error scenario handling (e.g., failed uninstallation)
Consider adding these verification steps after the uninstallation:
 Then Uninstall Longhorn
 And Check Longhorn CRD removed
+And Verify backup store contains backup 0    # Explicit verification post-uninstallation
+And Verify backup store contains backup 1
+And Verify no orphaned resources in namespace longhorn-system
 # Assume this is another Longhorn cluster
 Then Install Longhorn

e2e/tests/negative/node_delete.robot (2)
Line range hint 20-40: Consider optimizing test data size and adding timeouts

The test writes 2048 MB of data, which might increase test execution time unnecessarily. Consider:
- Reducing the data size for faster test execution
- Making the data size configurable via a variable
- Adding explicit timeouts to "Wait until" steps to prevent indefinite hangs
- And Write 2048 MB data to file data in deployment 0
- And Write 2048 MB data to file data in deployment 1
+ And Write ${TEST_DATA_SIZE} MB data to file data in deployment 0
+ And Write ${TEST_DATA_SIZE} MB data to file data in deployment 1
Line range hint 20-89: Consider adding specific test case for share manager recovery

While these tests cover node deletion scenarios well, they don't specifically address the share manager stuck issue mentioned in #9858. Consider adding a dedicated test case that:
- Specifically targets the share manager pod's node shutdown
- Verifies the share manager's state transitions
- Confirms successful reattachment of RWX PVCs
Would you like me to help draft a new test case specifically for the share manager recovery scenario?
e2e/tests/negative/pull_backup_from_another_longhorn.robot (1)
Line range hint 1-100: Test case doesn't address PR objectives

This test case focuses on backup/restore functionality across different Longhorn versions, but doesn't address the PR's main objective of testing "shutdown volume node and auto reattach to new node" scenarios (from issue #9858). The test should include steps to:
- Shut down a volume node
- Verify share manager behavior
- Test RWX PVC reattachment to new nodes
Would you like assistance in:
- Creating a new test case specifically for the node shutdown scenario, or
- Extending this test case to include the required node shutdown tests?
e2e/tests/negative/cluster_restart.robot (1)
Line range hint 19-89: Enhance test coverage for share manager scenarios

The test case provides excellent coverage of various storage configurations and workload types. However, given the PR objectives and issue #9858, consider these enhancements:
- Add specific validation for share manager state after cluster restart
- Include checks for RWX volume reattachment behavior
- Consider parallel execution for efficiency
Add these validations after the cluster restart:
 When Restart cluster
 And Wait for longhorn ready
 And Wait for workloads pods stable
 ...    deployment 0    deployment 1    deployment 2    deployment 3    deployment 4    deployment 5
 ...    statefulset 0    statefulset 1    statefulset 2    statefulset 3    statefulset 4    statefulset 5
+# Validate share manager recovery
+And Wait for share manager ready
+And Verify share manager not in error state
+And Verify RWX volumes successfully reattached
 Then Check deployment 0 works

e2e/libs/host/aws.py (2)
76-79: Add documentation for asynchronous behavior.

The waiting logic implementation is correct, but it would benefit from a brief comment explaining the behavior when waiting=False. This helps other developers understand the asynchronous nature of the operation.

 def power_off_node(self, power_off_node_name, waiting=True):
     instance_ids = [self.mapping[power_off_node_name]]
     resp = self.aws_client.stop_instances(InstanceIds=instance_ids, Force=True)
     assert resp['ResponseMetadata']['HTTPStatusCode'] == 200, f"Failed to stop instances {instance_ids} response: {resp}"
     logging(f"Stopping instances {instance_ids}")
+    # When waiting=False, return immediately after initiating shutdown
+    # without waiting for the instance to reach 'stopped' state
     if waiting:
         waiter = self.aws_client.get_waiter('instance_stopped')
         waiter.wait(InstanceIds=instance_ids)
         logging(f"Stopped instances")

🧰 Tools
🪛 Ruff (0.8.0)
79-79: f-string without any placeholders. Remove extraneous f prefix (F541)
79-79: Remove unnecessary f-string prefix.

The logging statement doesn't contain any placeholders, so the f-string prefix can be removed.
- logging(f"Stopped instances")
+ logging("Stopped instances")

🧰 Tools
🪛 Ruff (0.8.0)
79-79: f-string without any placeholders. Remove extraneous f prefix (F541)
e2e/tests/regression/test_backup.robot (2)
Line range hint 1-1: Consider adding node shutdown test cases or creating a separate test file.

Based on the PR objectives and issue #9858, this PR should include test cases for share manager behavior during node shutdown scenarios. However, this file focuses on backup/restore functionality. Consider either:
- Adding the node shutdown test cases to this file if backup functionality is involved, or
- Creating a separate test file specifically for node shutdown and auto-reattachment scenarios.
Would you like me to help draft the test cases for the node shutdown scenario? This would include:
- Testing share manager behavior when a node is shut down
- Verifying RWX PVC attachment to new nodes
- Validating auto-reattachment functionality
Line range hint 37-85: Consider enhancing error handling in backup volume tests.

The test implementation is well-structured, but consider these improvements:
- Add retry mechanisms for backup operations that might be flaky
- Add more assertions to verify the error states
- Add logging to help debug test failures
Example improvement:
 When Create backup 0 for volume 0
+# Add retry and logging
+Run Keyword And Continue On Failure    Wait Until Keyword Succeeds
+...    ${RETRY_COUNT}    ${RETRY_INTERVAL}
+...    Verify Backup Creation    0    0
 And Create backup 1 for volume 1
 Then Verify backup list contains no error for volume 0

e2e/libs/keywords/sharemanager_keywords.py (1)
70-95: Implementation aligns well with PR objectives

The new wait_for_sharemanager_pod_restart method and the enhanced logging in wait_for_share_manager_pod_running provide the necessary capabilities to:
- Monitor share manager pod restarts during node shutdown tests
- Debug potential issues with pod state transitions
- Validate the fix for issue #9858 where share manager gets stuck in stopping/error state
Consider adding additional checks for pod status conditions beyond just the creation timestamp to ensure the pod is not only restarted but also healthy and ready to serve RWX volumes.
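A rough sketch of such a health check follows; it assumes the share-manager-<volume> pod naming convention and direct access to the Kubernetes API through the official Python client, whereas the real helpers in e2e/libs resolve the pod through their own utilities.

from kubernetes import client, config


def sharemanager_pod_is_ready(volume_name, namespace="longhorn-system"):
    """Check the Ready condition, not just the creation timestamp."""
    config.load_kube_config()
    core = client.CoreV1Api()
    pod = core.read_namespaced_pod(name=f"share-manager-{volume_name}",
                                   namespace=namespace)
    if pod.status.phase != "Running":
        return False
    # A restarted but crash-looping pod should not be treated as recovered,
    # so require the Ready condition in addition to a fresh pod instance.
    for cond in pod.status.conditions or []:
        if cond.type == "Ready" and cond.status == "True":
            return True
    return False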
🧰 Tools
🪛 Ruff (0.8.0)
80-80: Comparison to None should be cond is None. Replace with cond is None (E711)
87-87: Do not assert False (python -O removes these calls), raise AssertionError(). Replace assert False (B011)
e2e/libs/host/harvester.py (2)
Line range hint 56-73: Consider optimizing batch node operations.

Now that power_off_node supports non-blocking operations, methods like reboot_all_nodes and reboot_all_worker_nodes could be optimized to power off all nodes concurrently before waiting.

Example optimization for reboot_all_nodes:

def reboot_all_nodes(self, shut_down_time_in_sec=NODE_REBOOT_DOWN_TIME_SECOND):
    node_names = [key for key in self.mapping.keys()]
    for node_name in node_names:
-       self.power_off_node(node_name)
+       self.power_off_node(node_name, waiting=False)
+
+   # Wait for all nodes to finish powering off
+   for node_name in node_names:
+       self.power_off_node(node_name)  # This call will only wait
    logging(f"Wait for {shut_down_time_in_sec} seconds before starting vms")
    time.sleep(shut_down_time_in_sec)
Line range hint 1-11: Review security implications of disabled SSL verification.

While this is test code, consider:
- Adding a comment explaining why SSL verification is disabled
- Ensuring this is only used in test environments
- Adding environment validation to prevent accidental use in production
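One possible shape for that environment validation is sketched below; the E2E_ALLOW_INSECURE_TLS variable is a hypothetical name used only for illustration, not an existing flag in this repository.

import os

import urllib3


def configure_tls_verification():
    """Make disabled TLS verification an explicit, test-only opt-in."""
    if os.environ.get("E2E_ALLOW_INSECURE_TLS", "false").lower() != "true":
        raise RuntimeError(
            "Refusing to disable TLS verification: set E2E_ALLOW_INSECURE_TLS=true "
            "only against disposable test clusters.")
    # Harvester test clusters commonly use self-signed certificates, so
    # verification is disabled deliberately; suppress the resulting warnings.
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    return False  # value to pass as `verify=` to the requests calls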
e2e/tests/regression/test_settings.robot (2)
Line range hint 20-82: Consider enhancing test documentation for concurrent rebuild scenarios.

The test case thoroughly validates concurrent rebuild limits, but could benefit from additional documentation about:
- Expected rebuild timing behaviors
- Data integrity verification methods
- Potential impact on system resources
Would you like me to help generate improved documentation?
Line range hint 84-116: Consider extending RWX volume tests to include node shutdown scenarios.

Given the PR objectives to test node shutdown and auto-reattachment scenarios (issue #9858), consider extending this test case to include:
- Node shutdown while RWX volume is attached
- Verification of auto-reattachment behavior
- Network connectivity validation after reattachment
Would you like me to help draft additional test scenarios that align with the PR objectives?
e2e/tests/negative/replica_rebuilding.robot (3)
Line range hint 133-135: Fix duplicate 'And' keyword in test steps.

There's a syntax error in these steps where "And" is duplicated:

- And And Wait until volume of deployment 0 replica rebuilding started on node 0
+ And Wait until volume of deployment 0 replica rebuilding started on node 0
Line range hint 22-35: Consider randomizing node selection for better test coverage.

The test cases use a sequential pattern (node 0, 1, 2) for deleting replicas. Consider randomizing the node selection to catch potential edge cases and ensure the system behaves correctly regardless of the order of node operations.
Example approach:
# Create a list of available nodes and shuffle them
@{nodes}=    Create List    0    1    2
Evaluate    random.shuffle($nodes)    random

# Use shuffled nodes in the test
FOR    ${i}    IN RANGE    ${LOOP_COUNT}
    FOR    ${node}    IN    @{nodes}
        When Delete volume 0 replica on node ${node}
        And Wait until volume 0 replica rebuilding started on node ${node}
        And Wait until volume 0 replica rebuilding completed on node ${node}
    END
    Then Wait for volume 0 healthy
    And Check volume 0 data is intact
END

Also applies to: 89-102
Line range hint 106-121: Consider adding more specific assertions during rebuilding.

While the test cases verify the final state and data integrity, consider adding assertions to verify the system state during the rebuilding process. This could help catch intermediate issues and provide better debugging information.
Example additions:
When Delete replica of deployment 0 volume on node 0
And Wait until volume of deployment 0 replica rebuilding started on node 0
# Add assertions about the system state during rebuilding
And Volume should have expected rebuild status
And Volume metrics should show active rebuild
And System events should indicate rebuilding progress

e2e/tests/negative/network_disconnect.robot (2)
Line range hint 1-1: Consider adding specific test cases for share manager issues.

While the current test cases cover network disconnection scenarios well, they don't fully address the share manager stuck issues mentioned in issue #9858. Consider adding test cases that specifically:
- Validate the share manager state during node shutdown
- Verify that the share manager doesn't get stuck in stopping/error state
- Confirm successful reattachment of RWX PVCs after node shutdown
This would help ensure the specific issues mentioned in #9858 are properly tested and prevent regressions.
Would you like me to help draft additional test cases focusing on these scenarios?
Line range hint 1-1: Enhance test verification and automation.

The test cases are well-structured, but consider these improvements:
- Add explicit verification steps for volume state transitions
- Automate more aspects of the manual test plans
- Add logging to capture state changes during network disconnection
This would improve test reliability and debugging capabilities.
Example verification steps to add:
And Verify volume state transition from "attached" to "detached"
And Verify no orphaned resources after reattachment
And Log volume manager state changes during network disconnect

e2e/tests/negative/test_backup_listing.robot (3)
Line range hint 159-171: Consider performance and reliability improvements for large volume operations.

The test steps for large volume operations (210GB, 150GB) might benefit from the following improvements:
- Add explicit timeout settings for large data operations
- Verify cleanup completion between volume operations to prevent resource exhaustion
- Consider adding progress logging for long-running operations
Example improvement for timeout and logging:
Write 210 GB large data to file 0 in pod 2
    [Timeout]    4h    # Adjust based on performance characteristics
    FOR    ${written}    IN RANGE    0    210    10
        Write 10 GB data to file 0 in pod 2
        Log    Written ${written+10}GB of 210GB
    END
Line range hint 1-171: Consider separating test concerns and adding node shutdown scenarios.

While the backup listing test improvements are valuable, they don't directly address the PR objectives regarding node shutdown and share manager scenarios. Consider:
- Creating a separate test file specifically for node shutdown and auto reattachment scenarios
- Adding test cases that validate share manager behavior during node shutdown
- Including specific assertions for RWX PVC attachment behavior
This would help maintain clear separation of concerns while ensuring comprehensive coverage of the scenarios described in issue #9858.
Test case is misaligned with PR objectives - additional test coverage needed
The current test files show that while there are tests for share manager pod recreation (component_resilience.robot) and node power-off scenarios (node_reboot.robot), none specifically validate the share manager getting stuck during node shutdown as described in issue #9858. The test case being reviewed focuses on backup listing for large volumes, which is unrelated to the PR's objectives.

Suggested actions:
- Add a new test case in node_reboot.robot specifically targeting the share manager stuck scenario during node shutdown
- Keep the current large volume backup test separate as it serves a different purpose
🔗 Analysis chain
Line range hint 148-158: Verify test case alignment with PR objectives.

While this test case focuses on backup listing for large volumes, the PR objectives mention testing volume node shutdown and auto reattachment scenarios. Consider adding specific test cases that validate the behavior described in issue #9858 regarding share manager getting stuck when a node is shut down.
Let's check if there are other test files that might be more relevant to the PR objectives:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for test files that might be more relevant to testing node shutdown scenarios
rg -l "shutdown.*node|node.*shutdown" "e2e/tests/"

# Search for test files that might be testing share manager functionality
rg -l "share.*manager|manager.*share" "e2e/tests/"

Length of output: 225
Script:
#!/bin/bash
# Let's examine the content of these files to see if they already cover the share manager node shutdown scenario
rg -A 5 -B 5 "share.*manager|manager.*share|shutdown.*node|node.*shutdown" "e2e/tests/negative/node_reboot.robot" "e2e/tests/regression/test_settings.robot" "e2e/tests/negative/component_resilience.robot"

# Also search for any existing test cases related to issue #9858
rg "9858|stuck" "e2e/tests/"

Length of output: 6872
e2e/tests/negative/component_resilience.robot (1)
Line range hint 208-241: Add test case for share manager node shutdown scenario.

The current RWX volume recovery test case doesn't cover the specific scenario from issue #9858 where the share manager becomes stuck when its node is shut down. Consider adding a new test case that:
- Creates an RWX volume
- Shuts down the node where the share manager pod is running
- Verifies that the share manager successfully moves to a new node
- Confirms the RWX PVC can be attached to a new node
Here's a suggested test case structure:
Test Share Manager Node Shutdown Recovery
    [Documentation]    Test share manager recovery when its node is shut down
    ...    Related to issue #9858
    ...
    ...    Test steps:
    ...    Create RWX volume using Longhorn storage class
    ...    Deploy workload using the RWX volume
    ...    Identify the node running share manager pod
    ...    Shutdown the node
    ...    Verify share manager moves to new node
    ...    Verify RWX volume remains accessible
    When Create storageclass longhorn-test with dataEngine=${DATA_ENGINE}
    And Create persistentvolumeclaim 0 using RWX volume with longhorn-test storageclass
    And Create deployment 0 with persistentvolumeclaim 0
    And Write test data to deployment 0
    ${node}=    Get node running sharemanager pod of deployment 0
    When Power off node ${node}
    Then Wait for sharemanager pod of deployment 0 running
    And Verify sharemanager pod of deployment 0 is on different node ${node}
    And Wait for deployment 0 pods stable
    And Check deployment 0 data is intact

e2e/keywords/workload.resource (3)
48-52: LGTM! Consider adding documentation for the powered-off nodes tracking.

The changes improve node state tracking by maintaining a list of powered-off nodes. This enhancement aligns well with the PR objectives for testing node shutdown scenarios.
Consider adding a comment explaining the purpose of the ${powered_off_nodes} list and the ${last_volume_node} variable for better maintainability.
53-59: LGTM! Consider documenting the difference between waiting and non-waiting variants.

The new non-blocking keyword enhances testing capabilities for concurrent scenarios. However, consider the following improvements:
- Add documentation explaining when to use this variant vs. the blocking version.
- Consider refactoring to reduce code duplication:
+Power off volume node of ${workload_kind} ${workload_id}
+    [Arguments]    ${waiting}=True
+    ${workload_name} =    generate_name_with_suffix    ${workload_kind}    ${workload_id}
+    ${volume_name} =    get_workload_volume_name    ${workload_name}
+    ${powered_off_node} =    get_volume_node    ${volume_name}
+    Append to list    ${powered_off_nodes}    ${powered_off_node}
+    ${last_volume_node} =    get_volume_node    ${volume_name}
+    power_off_volume_node    ${volume_name}    waiting=${waiting}
+    Set Test Variable    ${last_volume_node}
+
+Power off volume node of ${workload_kind} ${workload_id} without waiting
+    Power off volume node of ${workload_kind} ${workload_id}    waiting=False
48-59: Well-structured changes that enhance test coverage for node shutdown scenarios.

The modifications effectively support testing of share manager behavior during node shutdowns by:
- Maintaining accurate tracking of powered-off nodes for reliable cleanup
- Enabling concurrent testing through non-blocking operations
- Providing consistent node state management across different test scenarios
These improvements align well with the PR objectives of testing RWX PVC behavior during node shutdowns.
Consider documenting the test patterns and best practices for using these keywords in the test suite documentation, particularly focusing on:
- When to use blocking vs. non-blocking variants
- Guidelines for cleanup and recovery after node shutdowns
- Recommended patterns for testing concurrent scenarios
e2e/tests/negative/node_drain.robot (4)
Line range hint 1-24: Add test case for share manager node shutdown scenario

While the existing test cases cover various node drain scenarios, they don't specifically address the issue #9858 where the share manager becomes stuck when its node is shut down. Consider adding a new test case that:
- Creates an RWX PVC with share manager
- Shuts down the node where share manager pod is running
- Verifies that the share manager successfully moves to a new node
- Validates that the RWX PVC can be attached to a new node
This would help prevent regression of the issue described in #9858.
Would you like me to help draft this additional test case?
Line range hint 25-63: Enhance RWX volume assertions in force drain test

For RWX volumes (deployment 1), consider adding these additional checks:
- Verify share manager pod successfully moves to another node
- Assert share manager is not in stopping/error state
- Validate share manager service endpoints are updated
This would strengthen the test coverage for RWX-specific behaviors during node drain.
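As a library-level illustration of those assertions, a hedged Python sketch is shown below. It assumes the share-manager-<volume> pod naming convention and that the ShareManager CR is available at longhorn.io/v1beta2 under the plural "sharemanagers"; both assumptions should be confirmed against the cluster under test.

from kubernetes import client, config


def assert_sharemanager_moved(volume_name, old_node, namespace="longhorn-system"):
    """Assert the share manager left the drained node and is not stuck."""
    config.load_kube_config()
    core = client.CoreV1Api()
    custom = client.CustomObjectsApi()

    pod = core.read_namespaced_pod(f"share-manager-{volume_name}", namespace)
    assert pod.spec.node_name != old_node, \
        f"share manager pod still scheduled on drained node {old_node}"

    sm = custom.get_namespaced_custom_object(
        group="longhorn.io", version="v1beta2", namespace=namespace,
        plural="sharemanagers", name=volume_name)
    state = sm.get("status", {}).get("state", "")
    assert state not in ("stopping", "error"), \
        f"share manager for {volume_name} is stuck in state {state!r}"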
Line range hint 219-243: Improve robustness of node drain policy test

Consider enhancing the test case with:
- Explicit timeout handling for drain operations
- State verification between steps
- Cleanup verification after policy change
This would make the test more reliable and easier to debug.
Example improvements:
When Force drain node 2 and expect failure
And Check instance-manager pod is running on node 2
And Verify node 2 status is cordoned
And Verify all non-protected pods are evicted    timeout=300s
When Set setting node-drain-policy to always-allow
And Verify setting update is applied
And Force drain node 2 and expect success
And Wait until all pods are evicted    timeout=300s
And Check PDB not exist    instance_manger=${instance_manager_name}
And Verify cleanup of protection resources
Line range hint 219-219: Enhance test documentation for node drain policy test

The documentation could be improved by:
- Adding expected behavior for RWX volumes specifically
- Including verification steps for share manager state
- Documenting cleanup expectations
This would make the test's relationship to issue #9858 clearer.
e2e/tests/negative/node_reboot.robot (1)
56-56: Consider adding post-power-on verification

While the test powers on the nodes, consider adding explicit verification steps after power on to ensure complete system recovery.

 And Power on off nodes
+And Wait for longhorn ready
+And Wait for nodes ready
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (35)
- e2e/keywords/common.resource (2 hunks)
- e2e/keywords/host.resource (2 hunks)
- e2e/keywords/sharemanager.resource (1 hunks)
- e2e/keywords/variables.resource (1 hunks)
- e2e/keywords/workload.resource (1 hunks)
- e2e/libs/host/aws.py (1 hunks)
- e2e/libs/host/base.py (1 hunks)
- e2e/libs/host/harvester.py (2 hunks)
- e2e/libs/keywords/host_keywords.py (1 hunks)
- e2e/libs/keywords/sharemanager_keywords.py (1 hunks)
- e2e/tests/negative/cluster_restart.robot (1 hunks)
- e2e/tests/negative/component_resilience.robot (1 hunks)
- e2e/tests/negative/kubelet_restart.robot (1 hunks)
- e2e/tests/negative/network_disconnect.robot (1 hunks)
- e2e/tests/negative/node_delete.robot (1 hunks)
- e2e/tests/negative/node_drain.robot (1 hunks)
- e2e/tests/negative/node_reboot.robot (8 hunks)
- e2e/tests/negative/pull_backup_from_another_longhorn.robot (1 hunks)
- e2e/tests/negative/replica_rebuilding.robot (1 hunks)
- e2e/tests/negative/stress_cpu.robot (1 hunks)
- e2e/tests/negative/stress_filesystem.robot (1 hunks)
- e2e/tests/negative/stress_memory.robot (1 hunks)
- e2e/tests/negative/test_backup_listing.robot (3 hunks)
- e2e/tests/negative/uninstallation_checks.robot (1 hunks)
- e2e/tests/regression/test_backing_image.robot (1 hunks)
- e2e/tests/regression/test_backup.robot (1 hunks)
- e2e/tests/regression/test_basic.robot (1 hunks)
- e2e/tests/regression/test_engine_image.robot (1 hunks)
- e2e/tests/regression/test_ha.robot (1 hunks)
- e2e/tests/regression/test_migration.robot (1 hunks)
- e2e/tests/regression/test_replica.robot (1 hunks)
- e2e/tests/regression/test_scheduling.robot (1 hunks)
- e2e/tests/regression/test_settings.robot (1 hunks)
- e2e/tests/regression/test_v2.robot (2 hunks)
- e2e/tests/regression/test_volume.robot (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- e2e/keywords/variables.resource
🧰 Additional context used
🪛 Ruff (0.8.0)
e2e/libs/host/aws.py
79-79: f-string without any placeholders. Remove extraneous f prefix (F541)
e2e/libs/keywords/sharemanager_keywords.py
80-80: Comparison to None should be cond is None. Replace with cond is None (E711)
87-87: Do not assert False (python -O removes these calls), raise AssertionError(). Replace assert False (B011)
🔇 Additional comments (49)
e2e/libs/host/base.py (1)
26-26: LGTM! Verify implementing classes.

The addition of the waiting parameter to the abstract method provides better control over shutdown behavior, which aligns well with the PR's objective of testing node shutdown scenarios.
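For reference, a minimal sketch of the resulting interface shape is given below; the Base class and power_off_node names come from the file under review, while the concrete implementation body is purely illustrative.

from abc import ABC, abstractmethod


class Base(ABC):

    @abstractmethod
    def power_off_node(self, node_name, waiting=True):
        """Power off a node, optionally returning before shutdown completes."""
        return NotImplemented


class ExampleHost(Base):  # stands in for the AWS or Harvester implementation

    def power_off_node(self, node_name, waiting=True):
        self._stop_instance(node_name)            # issue the shutdown request
        if waiting:
            self._wait_until_stopped(node_name)   # block until fully powered off

    def _stop_instance(self, node_name):
        print(f"stopping {node_name}")

    def _wait_until_stopped(self, node_name):
        print(f"waiting for {node_name} to stop")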
Let's verify that all implementing classes have been updated accordingly:
✅ Verification successful

All implementing classes have been properly updated with the new parameter.

The verification shows that all implementing classes (harvester.py and aws.py) have correctly updated their power_off_node method signatures to include the waiting parameter with a default value of True, matching the abstract method's signature in base.py.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if all implementing classes have updated the power_off_node method signature
# Search for classes that inherit from Base
ast-grep --pattern 'class $_ (Base):'
# Search for power_off_node implementations
rg -A 2 'def power_off_node\(' --type py
Length of output: 683
e2e/tests/regression/test_engine_image.robot (1)
6-6: Verify the variables defined in the new resource file.

The change moves variable definitions to a centralized variables.resource file. Please ensure that all previously defined variables (${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}, ${DATA_ENGINE}) are properly defined in the new resource file.
✅ Verification successful

All required variables are properly defined in the centralized resource file.

The verification confirms that all previously defined variables (${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}, ${DATA_ENGINE}) are properly defined in e2e/keywords/variables.resource with appropriate values:
${LOOP_COUNT} = 1
${RETRY_COUNT} = 300
${RETRY_INTERVAL} = 1
${DATA_ENGINE} = v1
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that all required variables are defined in the new resource file
# Check for variable definitions in the new resource file
echo "Checking variables.resource for required variable definitions..."
rg -A 1 "^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|DATA_ENGINE)\}" "../keywords/variables.resource"
# Check if these variables are still defined elsewhere in the test suite
echo "Checking for any remaining inline definitions of these variables..."
rg "^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|DATA_ENGINE)\}" "e2e/tests/"
Length of output: 531
Script:
#!/bin/bash
# Let's first find the correct path to the variables.resource file
echo "Finding the variables.resource file..."
fd "variables.resource" .
# Then check its content for variable definitions
echo -e "\nChecking content of the variables.resource file if found..."
fd "variables.resource" . --exec cat {}
# Also check for any variable definitions in the test file itself
echo -e "\nChecking current test file for variable definitions..."
cat e2e/tests/regression/test_engine_image.robot
Length of output: 1682
e2e/tests/regression/test_backing_image.robot (2)
Line range hint 13-31
: LGTM! Well-structured test case with comprehensive coverage.
The test case demonstrates good practices:
- Clear BDD-style steps
- Proper resource management
- Verification of error conditions
- Complete cleanup
6-6
: LGTM! Good practice to centralize variable definitions.
The addition of variables.resource improves maintainability by centralizing common variables used across test files.
Let's verify that all required variables are properly defined in the new resource file:
✅ Verification successful
All required variables are properly defined in the centralized resource file
The verification confirms that variables.resource contains all the essential variables mentioned in the test file:
- ${LOOP_COUNT} = 1
- ${RETRY_COUNT} = 300
- ${RETRY_INTERVAL} = 1
- ${DATA_ENGINE} = v1
The centralization of these variables is indeed a good practice that will help maintain consistency across test files.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that all required variables are defined in the new resource file
# Expected variables: LOOP_COUNT, RETRY_COUNT, RETRY_INTERVAL, DATA_ENGINE
# Check the variables.resource file for required variable definitions
rg -A 1 '^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|DATA_ENGINE)\}' ../keywords/variables.resource
Length of output: 172
Script:
#!/bin/bash
# Let's first find the correct path to the variables.resource file
fd variables.resource
# Then check the content of the file for variable definitions
fd variables.resource --exec cat {}
Length of output: 384
e2e/keywords/host.resource (2)
Line range hint 4-9
: LGTM: Library imports are well-organized and complete.
The imports provide all necessary dependencies for node management operations.
44-47
: Verify implementation against test requirements.
The implementation tracks powered-off nodes for later recovery, which aligns with the test objectives. However, consider adding parameter validation and error handling.
Let's verify the implementation against the test requirements:
Consider adding parameter validation:
Power off node ${node_id}
+ # Validate node_id is a positive integer
+ Should Be True ${node_id} >= 0 msg=Node ID must be a non-negative integer
+
${powered_off_node} = get_node_by_index ${node_id}
+ Should Not Be Empty ${powered_off_node} msg=Invalid node ID: ${node_id}
+
Append to list ${powered_off_nodes} ${powered_off_node}
power_off_node_by_name ${powered_off_node}
✅ Verification successful
Implementation is correctly integrated with test framework
The verification shows that the implementation is properly integrated with the test framework:
- powered_off_nodes is initialized as an empty list in variables.resource
- The keyword is used in node reboot and drain test scenarios
- Proper cleanup is implemented in the Cleanup test resources keyword, which powers on nodes and cleans the list
- Multiple test files track powered-off nodes consistently using the same list

The suggested parameter validation is not necessary because:
- get_node_by_index is a framework-level function that would fail if the index is invalid
- The list operation is safe as powered_off_nodes is pre-initialized
- Test teardown ensures proper cleanup of powered-off nodes
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the usage of this keyword in test cases
# and ensure it's properly integrated with the shutdown volume node test case
# Check for test cases using this keyword
rg -l "Power off node" --type robot
# Check for related test implementations
rg -l "shutdown.*volume.*node.*test.*auto.*reattach" --type robot
Length of output: 178
Script:
#!/bin/bash
# Let's examine the test files that use this keyword to understand the context
rg -B 5 -A 5 "Power off node" e2e/tests/negative/node_reboot.robot e2e/tests/negative/node_drain.robot
# Check if powered_off_nodes is initialized somewhere
rg "powered_off_nodes" --type robot
# Check the implementation of get_node_by_index
rg -B 2 -A 2 "get_node_by_index" --type robot
Length of output: 3494
Script:
#!/bin/bash
# Let's check the initialization of powered_off_nodes and related setup
rg -B 5 -A 5 "Test Setup|Suite Setup" e2e/tests/negative/node_reboot.robot e2e/tests/negative/node_drain.robot
# Check the implementation of power_off_node_by_name
rg -B 2 -A 2 "power_off_node_by_name" --type robot
# Check if there are any test teardown procedures handling powered off nodes
rg -B 5 -A 5 "Test Teardown|Suite Teardown" e2e/tests/negative/node_reboot.robot e2e/tests/negative/node_drain.robot
Length of output: 3561
Script:
#!/bin/bash
# Let's check the Set test environment keyword which might initialize powered_off_nodes
rg -B 2 -A 5 "Set test environment" ../e2e/keywords/
# Check the Cleanup test resources keyword for handling powered off nodes
rg -B 2 -A 5 "Cleanup test resources" ../e2e/keywords/
# Check for any other references to powered_off_nodes in keyword files
rg -B 2 -A 2 "powered_off_nodes" ../e2e/keywords/
Length of output: 332
Script:
#!/bin/bash
# Let's check the Set test environment keyword which might initialize powered_off_nodes
rg -B 2 -A 5 "Set test environment" e2e/keywords/
# Check the Cleanup test resources keyword for handling powered off nodes
rg -B 2 -A 5 "Cleanup test resources" e2e/keywords/
# Check for any other references to powered_off_nodes in keyword files
rg -B 2 -A 2 "powered_off_nodes" e2e/keywords/
# Let's also check the content of the host.resource file
cat e2e/keywords/host.resource
Length of output: 5101
e2e/tests/regression/test_migration.robot (1)
6-6
: LGTM! Good practice moving variables to a centralized resource file.
The addition of variables.resource improves maintainability by centralizing variable definitions that are used across multiple test files.
e2e/keywords/sharemanager.resource (2)
29-32
: LGTM! The new keyword follows consistent patterns.
The implementation correctly follows the established patterns in the file:
- Reuses existing helper functions for name generation
- Maintains consistent naming conventions
- Follows the same structure as similar keywords
29-32
: Verify the restart detection logic.
Since this keyword is crucial for testing the auto-reattachment functionality mentioned in issue #9858, we should verify the underlying implementation.
✅ Verification successful
Let's examine the implementation of the function in the identified file.
Let's check the retry configuration and pod utilities being used to ensure the restart detection is robust.
Let's check the retry configuration values and the get_pod implementation to complete our verification.
The restart detection logic is properly implemented
The implementation correctly detects pod restarts by:
- Tracking pod creation timestamps to detect new instances
- Using robust retry mechanism with configurable count and interval
- Properly handling pod absence during restart
- Using the official Kubernetes API for pod status checks
- Failing explicitly if restart is not detected within the retry window
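A condensed sketch of that detection pattern is shown below; get_pod is passed in as a callable and the retry values as plain arguments, whereas the actual helper in sharemanager_keywords.py resolves these through the framework's utilities.

import time


def wait_for_pod_restart(get_pod, original_created_at, retry_count=300, retry_interval=1):
    """Wait until a pod instance newer than original_created_at appears."""
    for _ in range(retry_count):
        pod = get_pod()
        # The pod may briefly disappear while the old instance is torn down;
        # treat that as "not restarted yet" rather than an error.
        if pod is not None and pod.metadata.creation_timestamp > original_created_at:
            return pod  # a new instance has been created
        time.sleep(retry_interval)
    raise AssertionError("pod was not restarted within the retry window")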
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check the implementation of wait_for_sharemanager_pod_restart
# to ensure it correctly detects pod restarts and handles timeouts
# Search for the implementation
ast-grep --pattern 'def wait_for_sharemanager_pod_restart($$$)'
# Look for related test cases
rg -l "wait_for_sharemanager_pod_restart" --type py
Length of output: 160
Script:
#!/bin/bash
# Get the implementation of wait_for_sharemanager_pod_restart and surrounding context
rg -A 20 "def wait_for_sharemanager_pod_restart" e2e/libs/keywords/sharemanager_keywords.py
# Also check for any restart detection related functions in the same file
rg "restart" e2e/libs/keywords/sharemanager_keywords.py
Length of output: 1432
Script:
#!/bin/bash
# Check the retry configuration
rg "get_retry_count_and_interval" --type py
# Check pod utilities
rg "def get_pod" --type py
# Look for any test cases using this function
rg -B 5 -A 10 "wait_for_sharemanager_pod_restart" e2e/
Length of output: 10658
Script:
#!/bin/bash
# Check the retry configuration implementation
rg -A 5 "def get_retry_count_and_interval" e2e/libs/utility/utility.py
# Check the get_pod implementation
cat e2e/libs/utility/utility.py | rg -A 10 "def get_pod"
Length of output: 641
e2e/tests/regression/test_scheduling.robot (1)
6-6
: LGTM! Good practice to centralize variable management.
The addition of variables.resource aligns with best practices by centralizing variable definitions, making the test suite more maintainable and consistent.
e2e/tests/regression/test_replica.robot (2)
6-6
: LGTM! Good practice to centralize variable management.
The addition of variables.resource improves code organization by centralizing variable definitions that were previously scattered across test files.
Let's verify the resource file exists and contains the required variables:
✅ Verification successful
Resource file contains all expected variables and follows Robot Framework conventions ✅
The variables.resource file exists and contains essential variables including:
- ${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL} for test control flow
- ${DATA_ENGINE} for engine specification
- Additional configuration variables for volume types and network settings
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the variables.resource file exists and contains the required variables
# Expected: File exists and contains LOOP_COUNT, RETRY_COUNT, RETRY_INTERVAL, and DATA_ENGINE
# Check if the file exists and show its contents
cat e2e/keywords/variables.resource
Length of output: 326
Line range hint 1-24
: Consider adding node shutdown test cases to align with PR objectives.
The PR objectives mention testing node shutdown and auto-reattachment scenarios (issue #9858), but this file only contains replica rebuilding tests. Consider adding test cases for:
- Share manager pod behavior during node shutdown
- RWX PVC auto-reattachment to new nodes
Let's check if these test cases exist elsewhere:
e2e/tests/regression/test_volume.robot (1)
6-6
: Verify variables.resource contains all required variables
The change to centralize variables is good for maintainability. However, we should verify that all previously defined variables (${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}) are properly defined in the new resource file.
✅ Verification successful
All required variables are properly defined in variables.resource
The verification confirms that all previously defined variables (${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}) have been successfully moved to e2e/keywords/variables.resource with the same values:
${LOOP_COUNT} = 1
${RETRY_COUNT} = 300
${RETRY_INTERVAL} = 1
The centralization of these variables is a good practice and has been implemented correctly.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that all required variables are defined in variables.resource
# Check if the variables exist in the new resource file
rg -l "(?:\$\{LOOP_COUNT\}|\$\{RETRY_COUNT\}|\$\{RETRY_INTERVAL\})" "e2e/keywords/variables.resource"
Length of output: 98
Script:
#!/bin/bash
# Let's first check if the variables.resource file exists and see its content
fd "variables.resource" e2e/keywords/ --type f --exec cat {}
# Also, let's check the original test file to see what variables were previously defined
fd "test_volume.robot" e2e/tests/regression/ --type f --exec git show HEAD~1:{}
Length of output: 2590
e2e/libs/keywords/host_keywords.py (1)
49-52
: Implementation aligns with PR objectives.
The addition of the waiting parameter enhances the testing capabilities for issue #9858, allowing verification of share manager behavior during both immediate and delayed node shutdowns. This flexibility is crucial for thorough testing of RWX PVC auto-reattachment scenarios.
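To illustrate how the two modes differ at call time, a hypothetical usage sketch follows; only the power_off_volume_node signature comes from the change under review, while the surrounding helper and assertions are assumptions.

def exercise_power_off_modes(host, volume_name):
    """`host` is assumed to be an instance of the host_keywords class."""
    # Non-blocking: initiate the shutdown and observe the share manager while
    # the node is still going down (the scenario from issue #9858).
    host.power_off_volume_node(volume_name, waiting=False)
    # ... assert here that the share manager does not stay in stopping/error ...

    # Blocking (default): continue only once the node is fully powered off.
    host.power_off_volume_node(volume_name, waiting=True)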
Let's verify the usage of this method in test cases:
✅ Verification successful
Implementation aligns with PR objectives and is being utilized correctly
The verification confirms that the power_off_volume_node method with the new waiting parameter is being used in the test framework:
- In e2e/keywords/workload.resource: used with both the default behavior (waiting=True) and immediate shutdown (waiting=False)
- The implementation provides the flexibility needed for testing share manager behavior during node shutdowns as intended in issue #9858
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if the new waiting parameter is utilized in test cases
# Expected: Find test cases using power_off_volume_node with waiting=False
# Search for test cases using the new parameter
rg -l "power_off_volume_node.*waiting.*=.*False" "e2e/test"
Length of output: 108
Script:
#!/bin/bash
# Let's first find the correct test directory structure
fd -t d "test"
# Then search for any usage of power_off_volume_node
rg "power_off_volume_node" -A 2 -B 2
Length of output: 1641
e2e/tests/negative/kubelet_restart.robot (1)
6-6
: Verify variables in the new resource file
The variables previously defined in this file have been moved to variables.resource. Let's ensure all required variables are properly defined.
✅ Verification successful
Variables are properly defined in the centralized resource file
All required variables used in kubelet_restart.robot are correctly defined in e2e/keywords/variables.resource:
- ${LOOP_COUNT} = 1
- ${RETRY_COUNT} = 300
- ${RETRY_INTERVAL} = 1
- ${RWX_VOLUME_FAST_FAILOVER} = false
- ${DATA_ENGINE} = v1
The centralization of these variables maintains consistency and follows good practices for test maintenance.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that all required variables are defined in the new resource file
# Expected variables: LOOP_COUNT, RETRY_COUNT, RETRY_INTERVAL, RWX_VOLUME_FAST_FAILOVER, DATA_ENGINE
# Check for variable definitions in the resource file
rg -A 1 '^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|RWX_VOLUME_FAST_FAILOVER|DATA_ENGINE)\}' ../keywords/variables.resource
Length of output: 197
Script:
#!/bin/bash
# Let's first find the correct path to the variables.resource file
fd variables.resource
# Then check the content of the file for variable definitions
# We'll also search for these variables in the original test file to understand what we're looking for
rg -A 1 '^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|RWX_VOLUME_FAST_FAILOVER|DATA_ENGINE)\}' e2e/tests/negative/kubelet_restart.robot
Length of output: 192
Script:
#!/bin/bash
# Now that we found the correct path, let's check the content of variables.resource
cat e2e/keywords/variables.resource
# Also let's check the original test file to see how these variables were used before
cat e2e/tests/negative/kubelet_restart.robot
Length of output: 2688
e2e/tests/regression/test_ha.robot (2)
6-6
: Verify variable definitions in the new resource file
The migration of variables to variables.resource improves maintainability. However, we should ensure that all required variables (${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}) are properly defined in the new resource file.
✅ Verification successful
Variables are properly defined in the resource file
The verification confirms that all required variables (${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}) are correctly defined in e2e/keywords/variables.resource with appropriate default values:
- ${LOOP_COUNT} = 1
- ${RETRY_COUNT} = 300
- ${RETRY_INTERVAL} = 1
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that all required variables are defined in the new resource file
# Check for required variables in the resource file
echo "Checking for required variables in variables.resource..."
rg -A 1 "LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL" "../keywords/variables.resource"
Length of output: 274
Script:
#!/bin/bash
# Let's first find the correct path to the variables.resource file
fd -t f "variables.resource$"
# Then check the content of the file for our variables
echo "=== Checking test file for variable usage ==="
rg "LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL" "e2e/tests/regression/test_ha.robot"
Length of output: 289
Script:
#!/bin/bash
# Now that we found the correct path, let's check the content of variables.resource
echo "=== Content of variables.resource ==="
cat "e2e/keywords/variables.resource"
# Also check if these variables might be defined in other resource files
echo -e "\n=== Checking other resource files for these variables ==="
rg "LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL" "e2e/**/*.resource"
Length of output: 665
Line range hint 36-52
: Consider implementing the skipped test case
The "Don't Orphan Processes" test case has detailed documentation but is currently skipped. This test could be valuable for verifying proper cleanup of resources during node failures, which aligns with the PR's objectives around node shutdown scenarios.
e2e/tests/negative/stress_cpu.robot (1)
Line range hint 15-89
: Consider adding a test case for node shutdown scenario.
While the existing test cases provide good coverage for CPU stress scenarios, they don't directly address the PR objective of testing node shutdown and auto-reattachment behavior. Consider adding a new test case that specifically tests this scenario.
Example structure for the new test case:
Stress Volume Node CPU During Node Shutdown And Auto Reattachment
    Given Create volume 0 with    size=5Gi    numberOfReplicas=3
    And Attach volume 0
    And Write data to volume 0
    FOR    ${i}    IN RANGE    ${LOOP_COUNT}
        When Stress CPU of node with volume 0
        And Shutdown volume node
        And Wait for volume auto-reattachment to new node
        Then Check volume 0 data is intact
    END
Let's verify if similar test cases exist in other files:
e2e/tests/negative/stress_memory.robot (3)
6-6
: LGTM! Good practice centralizing variables.
The addition of variables.resource improves maintainability by centralizing variable definitions, which is particularly important for test parameters like ${LOOP_COUNT} that are used across multiple test cases.
Line range hint 19-31
: Add documentation about expected behavior under memory stress.
The test case verifies data integrity during replica rebuilding under memory stress, but it would be helpful to document:
- Expected memory thresholds
- Expected system behavior under stress
- Success criteria beyond data integrity
Let's verify the memory stress implementation:
✅ Verification successful
Add documentation about memory stress test configuration and behavior
The test uses stress-ng with the following configuration:
- Memory load: 80% of available memory
- Single VM worker process
- 1-hour timeout period
- Stress pod runs on the same node as the volume
Consider documenting:
- The memory stress configuration in test description
- Expected behavior: system should maintain data integrity while under 80% memory pressure
- Success criteria: volume replica rebuilding completes successfully under memory stress
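As a rough illustration of what such a note could state, the configuration reported above maps to a stress-ng invocation along these lines (flag values restated from the review findings; the constant name is made up for the example):

# Illustrative only: mirrors the reported 80% memory load, one VM worker, 1-hour timeout.
STRESS_NG_MEMORY_ARGS = [
    "stress-ng",
    "--vm", "1",           # single VM worker process
    "--vm-bytes", "80%",   # consume 80% of available memory
    "--timeout", "1h",     # stop the stressor after one hour
]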
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check how memory stress is implemented
# Expected: Find the stress memory implementation to understand thresholds
ast-grep --pattern 'Stress memory of node with volume $_'
rg -A 5 'Stress memory of node with volume'
Length of output: 1356
Script:
#!/bin/bash
# Let's check the implementation of stress_node_memory functions
rg -A 10 'stress_node_memory_by_volume|stress_node_memory_of_all_volumes'
# Also check if there are any configuration or constant files that might define thresholds
fd -e yaml -e py -e robot -e resource | xargs rg -l 'stress.*memory|memory.*stress'
Length of output: 2860
Script:
#!/bin/bash
# Let's check the actual memory stress implementation in stress.py
cat e2e/libs/node/stress.py
Length of output: 5186
Script:
#!/bin/bash
# Let's check the constant values used for memory stress
cat e2e/libs/node/constant.py
Length of output: 373
Line range hint 33-47
: Consider potential race conditions in stress test loops.
The remaining test cases follow similar patterns of stressing memory during volume operations. Consider:
- Adding retry mechanisms for volume operations that might fail under stress
- Implementing proper cleanup between iterations
- Adding health checks before proceeding to the next iteration
Let's verify the retry mechanisms:
Also applies to: 49-67, 69-89
✅ Verification successful
Based on the search results, I can see that there are already retry mechanisms and health checks implemented in the test framework:
- The Wait for volume healthy keyword is consistently used after volume operations
- Data integrity checks are performed using Check volume data is intact
- The test framework has proper cleanup between iterations with detach/attach cycles
- Health checks are performed before proceeding to next iterations
The review comment's concerns about retry mechanisms and health checks are already addressed in the codebase.
Let me verify the cleanup mechanisms between iterations:
Let me try a different search pattern to verify the cleanup mechanisms:
The review comment's concerns are already properly addressed in the codebase.
Looking at the test file and previous search results, I can confirm that:
- Retry mechanisms: The framework uses Wait for volume healthy and Wait for volume detached keywords consistently to ensure operations complete
- Cleanup between iterations: The test properly detaches and reattaches volumes between iterations with proper wait states
- Health checks: Each iteration includes data integrity verification with Check volume data is intact and volume health checks with Wait for volume healthy
The test follows best practices for stress testing with proper cleanup, verification, and health checks between iterations. No changes are needed.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for retry implementations in volume operations
# Expected: Find retry mechanisms for volume operations under stress
rg -A 5 '(Wait for|Wait until).*volume.*healthy'
ast-grep --pattern 'Wait for volume $_ healthy'
Length of output: 53636
Script:
#!/bin/bash
# Check cleanup implementations between iterations
rg -A 5 'Detach volume.*\s+.*Wait for volume.*detached'
Length of output: 57
Script:
#!/bin/bash
# Check cleanup implementations between iterations
rg -A 2 'Detach volume.*\s+And Wait for volume'
# Also check the test file directly
cat e2e/tests/negative/stress_memory.robot
Length of output: 2641
e2e/tests/negative/stress_filesystem.robot (2)
6-6
: LGTM: Good practice to centralize variable definitions.
The addition of variables.resource aligns with best practices by centralizing common variable definitions, making the test framework more maintainable.
Line range hint 19-29
: Consider enhancing error handling in the replica rebuilding test.
The test "Stress Volume Node Filesystem When Replica Is Rebuilding" should verify that the filesystem remains stable during rebuilding. Consider adding additional validation steps.
Consider adding these validation steps within the loop:
And Verify volume 0 filesystem is accessible
And Verify no I/O errors in volume 0 logs
e2e/tests/regression/test_v2.robot (2)
6-6
: Verify variables migration completeness
The addition of variables.resource and removal of inline variables is a good practice for maintainability. However, we should verify that all necessary variables (especially ${LOOP_COUNT}) are properly defined in the new resource file.
✅ Verification successful
Variables migration is complete and correct
The variables.resource file contains all essential variables including ${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}, and other global variables needed for testing. The migration to a centralized variable definition is properly implemented.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify all variables used in the test file are defined in the resource file
# Check if LOOP_COUNT and other variables are defined in the new resource file
rg -A 1 "^\*\*\* Variables \*\*\*" "../keywords/variables.resource"
rg "LOOP_COUNT" "../keywords/variables.resource"
Length of output: 256
Script:
#!/bin/bash
# Let's first find the correct path to the variables.resource file
fd "variables.resource"
# Then check the content of test_v2.robot to see what variables are being used
rg "\${[A-Z_]+}" "e2e/tests/regression/test_v2.robot"
Length of output: 203
Script:
#!/bin/bash
# First, let's check the content of the variables resource file
cat e2e/keywords/variables.resource
# Then check variables used in test_v2.robot with a corrected regex
rg "\\\${[A-Z_]+}" "e2e/tests/regression/test_v2.robot"
Length of output: 477
54-54
: Test coverage gap for node shutdown scenario
While the cluster tag addition is appropriate for this test case, I notice that this file doesn't include the test case mentioned in the PR objectives for testing volume node shutdown and auto-reattachment scenarios (issue #9858).
Would you like me to help draft a new test case that specifically addresses the share manager node shutdown scenario described in issue #9858?
e2e/tests/negative/uninstallation_checks.robot (2)
Line range hint 24-92
: LGTM on test structure and documentation!
The test case is well-documented with clear prerequisites, steps, and environment requirements. The implementation follows good practices:
- Tests both v1 and v2 data engines
- Includes proper data verification
- Validates cross-cluster DR functionality
6-6
: Verify variable definitions in the new resource file.
The addition of variables.resource and removal of local variable declarations is a good practice for centralization. However, we should ensure all required variables are properly defined.
✅ Verification successful
Resource file change verified successfully
The variables.resource file properly defines all the required variables including ${LOOP_COUNT}, ${RETRY_COUNT}, and ${RETRY_INTERVAL}. The file is correctly located at e2e/keywords/variables.resource and is used by multiple test files in the e2e test suite. The uninstallation_checks.robot file doesn't contain any local variable declarations that would conflict with the centralized variables.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that all required variables are defined in the new resource file
# and check for any potential impacts on other test files.
# Check the content of the variables resource file
echo "Checking variables.resource content:"
cat ../keywords/variables.resource
# Look for any remaining direct references to the moved variables
echo -e "\nChecking for any remaining references to variables in test files:"
rg "(\${LOOP_COUNT}|\${RETRY_COUNT}|\${RETRY_INTERVAL})" ./e2e/tests/
Length of output: 532
Script:
#!/bin/bash
# Let's fix the path and regex issues in our verification
# First, let's find the correct path to the variables.resource file
echo "Looking for variables.resource file:"
fd "variables.resource"
# Check for variable references with properly escaped regex
echo -e "\nChecking for variable references in test files:"
rg '\$\{LOOP_COUNT\}|\$\{RETRY_COUNT\}|\$\{RETRY_INTERVAL\}' e2e/tests/
# Also check the content of the uninstallation_checks.robot to understand variable usage
echo -e "\nChecking uninstallation_checks.robot content:"
cat e2e/tests/negative/uninstallation_checks.robot
Length of output: 7391
Script:
#!/bin/bash
# Let's check the content of the variables.resource file since we found its correct path
echo "Checking variables.resource content:"
cat e2e/keywords/variables.resource
# Also check if there are any variable declarations in the uninstallation_checks.robot
# that might have been moved to variables.resource
echo -e "\nChecking for any variable declarations in uninstallation_checks.robot:"
rg '^\$\{.*\}.*=' e2e/tests/negative/uninstallation_checks.robot
Length of output: 633
e2e/tests/negative/node_delete.robot (1)
6-6
: Verify all required variables are defined in variables.resource
The centralization of variables is a good practice. However, we need to ensure that all previously defined variables (${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}, ${RWX_VOLUME_FAST_FAILOVER}, ${DATA_ENGINE}) are properly defined in the new resource file.
✅ Verification successful
All required variables are properly defined in variables.resource
The verification confirms that all necessary variables have been successfully moved to the centralized resource file:
- ${LOOP_COUNT} = 1
- ${RETRY_COUNT} = 300
- ${RETRY_INTERVAL} = 1
- ${RWX_VOLUME_FAST_FAILOVER} = false
- ${DATA_ENGINE} = v1
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify all required variables are defined in the new resource file
# Check if all required variables are defined in variables.resource
echo "Checking for required variables in variables.resource..."
rg -A 1 "(\${LOOP_COUNT}|\${RETRY_COUNT}|\${RETRY_INTERVAL}|\${RWX_VOLUME_FAST_FAILOVER}|\${DATA_ENGINE})" "../keywords/variables.resource"
Length of output: 434
Script:
#!/bin/bash
# Description: Verify all required variables are defined in the new resource file
# Check if all required variables are defined in variables.resource
echo "Checking for required variables in variables.resource..."
# Search for each variable separately to avoid regex issues
echo "Searching for LOOP_COUNT..."
rg "LOOP_COUNT" "e2e/keywords/variables.resource"
echo "Searching for RETRY_COUNT..."
rg "RETRY_COUNT" "e2e/keywords/variables.resource"
echo "Searching for RETRY_INTERVAL..."
rg "RETRY_INTERVAL" "e2e/keywords/variables.resource"
echo "Searching for RWX_VOLUME_FAST_FAILOVER..."
rg "RWX_VOLUME_FAST_FAILOVER" "e2e/keywords/variables.resource"
echo "Searching for DATA_ENGINE..."
rg "DATA_ENGINE" "e2e/keywords/variables.resource"
# Also check the test file to see what variables were previously defined
echo "Checking git history for previously defined variables..."
git log -p "e2e/tests/negative/node_delete.robot" | rg "^\-.*(\$\{.*\})"
Length of output: 992
e2e/tests/negative/pull_backup_from_another_longhorn.robot (1)
6-6
: LGTM: Good refactoring of variable management
The addition of variables.resource aligns with best practices by centralizing variable definitions, making the test framework more maintainable.
e2e/tests/negative/cluster_restart.robot (1)
6-6
: Verify the variables.resource file setup
The centralization of variables into variables.resource is a good practice. However, let's ensure all required variables are properly defined.
✅ Verification successful
Resource file setup is correct and properly configured
The variables.resource file exists at e2e/keywords/variables.resource and contains all the required variables (LOOP_COUNT, RETRY_COUNT, RETRY_INTERVAL, CONTROL_PLANE_NODE_NETWORK_LATENCY_IN_MS, RWX_VOLUME_FAST_FAILOVER, DATA_ENGINE). The import path in cluster_restart.robot correctly references this file using a relative path.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the variables.resource file exists and contains all required variables
# Check if the file exists and contains the required variables
rg -U "(?s)\*\*\* Variables \*\*\*.*\\\${LOOP_COUNT}.*\\\${RETRY_COUNT}.*\\\${RETRY_INTERVAL}.*\\\${CONTROL_PLANE_NODE_NETWORK_LATENCY_IN_MS}.*\\\${RWX_VOLUME_FAST_FAILOVER}.*\\\${DATA_ENGINE}" ../keywords/variables.resource
Length of output: 502
Script:
#!/bin/bash
# Let's first check if the variables.resource file exists
fd "variables.resource" ../keywords/
# Then let's examine its content
rg "^\\\$\{.*\}" ../keywords/variables.resource
# Also check for Variables section marker
rg "\*\*\* Variables \*\*\*" ../keywords/variables.resource
Length of output: 388
Script:
#!/bin/bash
# Let's search for the variables.resource file in the entire repository
fd "variables.resource"
# Let's also check the actual path relative to the test file
ls -l e2e/tests/keywords/variables.resource
Length of output: 186
Script:
#!/bin/bash
# Let's examine the content of the found variables.resource file
cat e2e/keywords/variables.resource
# Also check the import statement in the test file
cat e2e/tests/negative/cluster_restart.robot | head -n 10
Length of output: 682
e2e/libs/host/aws.py (2)
71-71
: LGTM! Well-designed parameter addition.
The new waiting parameter with default value True maintains backward compatibility while enabling more flexible testing scenarios. This change aligns well with the PR objective of testing node shutdown behaviors.
77-78
: Verify AWS waiter timeout configuration.
Since this code is critical for testing share manager behavior during node shutdown (issue #9858), we should verify the default timeout settings of the AWS waiter are sufficient.
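For reference, a minimal sketch of how the stop waiter's timeout can be inspected or tuned with boto3 (instance id and region are placeholders; the Delay/MaxAttempts values shown are boto3's documented defaults for the instance_stopped waiter, roughly a 10-minute window):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region
waiter = ec2.get_waiter("instance_stopped")
# Defaults poll every 15s for up to 40 attempts; raise MaxAttempts if nodes shut down slowly.
waiter.wait(
    InstanceIds=["i-0123456789abcdef0"],  # placeholder instance id
    WaiterConfig={"Delay": 15, "MaxAttempts": 40},
)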
e2e/tests/regression/test_backup.robot (1)
6-6
: Verify the variables.resource file setup and variable accessibility.
The addition of variables.resource aligns with the effort to centralize variable management. This is a good practice for maintainability and consistency.
Let's verify the variables.resource file setup:
✅ Verification successful
Resource file setup is correct and contains the required variables
The variables.resource file exists at the correct location (e2e/keywords/variables.resource) and contains all the necessary variables including ${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}, and ${DATA_ENGINE}. The file path in the test file (../keywords/variables.resource) correctly references this centralized variable file.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that variables.resource exists and contains the necessary variables
# Check if the file exists and contains the required variables
rg -l "^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|DATA_ENGINE)\}" ../keywords/variables.resource
# Check for any remaining direct variable declarations in other test files
# that should have been moved to variables.resource
rg "^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|DATA_ENGINE)\}" "e2e/tests/**/*.robot"
Length of output: 318
Script:
#!/bin/bash
# Let's first find the correct path to the variables.resource file
fd "variables.resource" .
# Then check for any variable declarations in Robot Framework files
fd -e robot . --exec rg -l "^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|DATA_ENGINE)\}"
# Also check the content of the test_backup.robot file to understand the correct relative path
cat e2e/tests/regression/test_backup.robot
Length of output: 4327
Script:
#!/bin/bash
# Let's check the content of the variables.resource file
cat e2e/keywords/variables.resource
# Also check if these variables are used in the test file
rg "\${(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|DATA_ENGINE)}" e2e/tests/regression/test_backup.robot
Length of output: 566
e2e/libs/keywords/sharemanager_keywords.py (1)
95-95
: Great addition of phase logging
The added logging statement improves test observability by showing the current phase of the share manager pod during the waiting period. This will be helpful for debugging test failures.
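A log line of roughly this shape is what the comment refers to (the exact message text is assumed, not quoted from the diff):

# Illustrative only: surfaces the pod phase on every retry while waiting for the share manager pod.
logging(f"Waiting for sharemanager pod {sharemanager_pod_name} to be running, currently {sharemanager_pod.status.phase} ... ({i})")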
e2e/libs/host/harvester.py (2)
56-56
: LGTM! The non-blocking power-off option is a good addition.
The new waiting parameter with a default value of True maintains backward compatibility while adding flexibility for batch operations. The early return when waiting=False is clean and straightforward.
Also applies to: 71-73
Line range hint 56-89
: Consider adding additional power state validation.
Given this code is critical for testing scenarios like share manager recovery after node shutdown, consider adding additional validation to ensure the VM is truly powered off before returning.
Suggested additional validation:
 def power_off_node(self, node_name, waiting=True):
     vm_id = self.mapping[node_name]
     url = f"{self.url}/{vm_id}"
+    initial_state = None
+    try:
+        resp = requests.get(url, cookies=self.cookies, verify=False)
+        initial_state = resp.json()['metadata']['fields'].get('status', None)
+    except Exception as e:
+        logging(f"Failed to get initial VM state: {e}")
+
     for i in range(self.retry_count):
         logging(f"Trying to stop vm {vm_id} ... ({i})")
e2e/tests/regression/test_settings.robot (1)
6-6
: LGTM! Good practice to centralize variable definitions.
The addition of variables.resource improves maintainability by centralizing variable definitions. This change aligns with Robot Framework best practices.
Let's verify that all required variables are properly defined in the new resource file:
✅ Verification successful
Let me try to locate the correct path for the variables resource file:
Let's check the content of the variables.resource file and look for any duplicate variable definitions:
Resource file addition verified and looks good
The verification shows that:
- The variables.resource file exists at the correct location (e2e/keywords/variables.resource)
- It contains global variable definitions that are properly scoped
- The test file test_settings.robot correctly references it using a relative path
- The test file uses ${DATA_ENGINE}, which is defined in the variables resource file
- No conflicting variable definitions were found in the test file
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify all required variables are defined in the new resource file
# and that no other test files still define these variables locally
# Check if all required variables are defined in the new resource file
echo "Checking variables.resource for required variables..."
cat ../keywords/variables.resource
# Check if any other test files still define these variables
echo "Checking for duplicate variable definitions..."
rg -l "^\*\*\* Variables \*\*\*" "e2e/tests/"
Length of output: 410
Script:
#!/bin/bash
# First, let's find the actual path of the variables.resource file
fd -e resource variables
# Check the content of test_settings.robot to understand the relative path
cat e2e/tests/regression/test_settings.robot
# List all resource files to understand the structure
fd -e resource
Length of output: 5504
Script:
#!/bin/bash
# Check the content of the variables resource file
cat e2e/keywords/variables.resource
# Look for any variables defined in test_settings.robot
echo -e "\nChecking for variables in test_settings.robot:"
rg "^\$\{.*\}" e2e/tests/regression/test_settings.robot
# Check for any duplicate variable definitions in other files
echo -e "\nChecking for duplicate variable definitions in other files:"
rg "^\$\{.*\}" e2e/tests/
Length of output: 654
e2e/tests/negative/replica_rebuilding.robot (1)
6-6
: Verify variable definitions in the new resource file.
The centralization of variables is a good practice. However, we should verify that all required variables (${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}, ${DATA_ENGINE}) are properly defined in the new resource file.
✅ Verification successful
All required variables are properly defined in the resource file
The variables.resource file contains all the necessary variable definitions that are used in replica_rebuilding.robot:
- ${LOOP_COUNT} = 1
- ${RETRY_COUNT} = 300
- ${RETRY_INTERVAL} = 1
- ${DATA_ENGINE} = v1
The centralization of these variables is correctly implemented and all test cases in replica_rebuilding.robot can access these variables through the resource import.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that all required variables are defined in the new resource file
# Check if the variables are defined in the resource file
rg -l "(?:\$\{LOOP_COUNT\}|\$\{RETRY_COUNT\}|\$\{RETRY_INTERVAL\}|\$\{DATA_ENGINE\})" ../keywords/variables.resource
Length of output: 185
Script:
#!/bin/bash
# Let's first find the correct path to the variables.resource file
fd variables.resource
# Then check the content of the file for the required variables
rg -l "(?:\$\{LOOP_COUNT\}|\$\{RETRY_COUNT\}|\$\{RETRY_INTERVAL\}|\$\{DATA_ENGINE\})"
Length of output: 139
Script:
#!/bin/bash
# Now that we found the file, let's check its content for the required variables
rg "(?:\$\{LOOP_COUNT\}|\$\{RETRY_COUNT\}|\$\{RETRY_INTERVAL\}|\$\{DATA_ENGINE\})" e2e/keywords/variables.resource -A 1
# Also check how these variables are defined
rg "^(?:\$\{LOOP_COUNT\}|\$\{RETRY_COUNT\}|\$\{RETRY_INTERVAL\}|\$\{DATA_ENGINE\}).*=.*" e2e/keywords/variables.resource
Length of output: 236
Script:
#!/bin/bash
# Let's try a different approach to check variable definitions in Robot Framework syntax
cat e2e/keywords/variables.resource
# Also check where these variables are currently used in the test file
cat e2e/tests/negative/replica_rebuilding.robot
Length of output: 5173
e2e/tests/regression/test_basic.robot (2)
6-6
: LGTM! Good refactoring of variable management.
The addition of variables.resource and removal of inline variable declarations is a positive change that:
- Centralizes variable management
- Reduces duplication across test files
- Improves maintainability
Line range hint 1-1
: Verify location of the new test case.
The PR objectives mention adding a test case for "Shutdown Volume Node And Test Auto Reattach To A New Node", but this test case is not present in this file. Please verify:
- If the test case was added to a different file
- If the test case still needs to be added
This verification is important as the test case is crucial for validating the fix for issue #9858 where the share manager becomes stuck when a node is shut down.
✅ Verification successful
Let me check the content of the identified file to verify if it contains the test case.
The test case has been found and is properly implemented.
The test case "Shutdown Volume Node And Test Auto Reattach To A New Node" is present in e2e/tests/negative/node_reboot.robot
. The test case verifies:
- Volume node shutdown scenario for both RWO and RWX volumes
- Auto reattachment of volumes to new nodes
- Data integrity after node shutdown and reattachment
- Share manager pod recovery for RWX volumes
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for the new test case across all robot files
echo "Searching for the new test case..."
rg -l "Shutdown.*Volume.*Node.*Auto.*Reattach" --type robot
# Search for any recent changes to robot test files
echo "Checking recent changes to robot test files..."
git diff --name-only HEAD~1 | grep "\.robot$"
Length of output: 1366
Script:
#!/bin/bash
# Check the content of node_reboot.robot for the test case
echo "Content of e2e/tests/negative/node_reboot.robot:"
cat e2e/tests/negative/node_reboot.robot
Length of output: 22057
e2e/tests/negative/network_disconnect.robot (2)
6-6
: LGTM! Good practice to centralize variable management.
The addition of variables.resource improves maintainability by centralizing variable definitions that were previously declared locally.
Line range hint 1-1
: Add more specific assertions for volume state and health.
While the test cases cover the scenarios well, consider adding more specific assertions to verify:
- Volume attachment state transitions
- Replica health status
- Data consistency after network recovery
- Resource cleanup after test completion
This would help catch subtle issues and ensure proper system recovery.
e2e/tests/negative/test_backup_listing.robot (1)
5-5
: LGTM! Good improvements to test organization.
The addition of the 'negative' tag and centralization of variables through variables.resource improves test organization and maintainability.
Also applies to: 7-7
e2e/tests/negative/component_resilience.robot (2)
6-6
: Verify the new resource file exists and contains the required variables.
The change to centralize variable management is good practice. However, we need to ensure all required variables (LOOP_COUNT, RETRY_COUNT, RETRY_INTERVAL, RWX_VOLUME_FAST_FAILOVER, DATA_ENGINE) are properly defined in the new resource file.
✅ Verification successful
Resource file verification successful
The variables.resource file exists at e2e/keywords/variables.resource and contains all the required variables with appropriate default values:
- ${LOOP_COUNT} = 1
- ${RETRY_COUNT} = 300
- ${RETRY_INTERVAL} = 1
- ${RWX_VOLUME_FAST_FAILOVER} = false
- ${DATA_ENGINE} = v1
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if the variables resource file exists and contains the required variables
rg -A 1 "^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|RWX_VOLUME_FAST_FAILOVER|DATA_ENGINE)\}" "e2e/tests/keywords/variables.resource"
Length of output: 210
Script:
#!/bin/bash
# Let's first check if the file exists in a different location
fd -g "*variables.resource"
# Also check for any Robot Framework resource files that might contain these variables
rg -A 1 "^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|RWX_VOLUME_FAST_FAILOVER|DATA_ENGINE)\}" -g "*.resource"
Length of output: 172
Script:
#!/bin/bash
# Let's check the content of the variables.resource file
cat e2e/keywords/variables.resource
# Also check where these variables were defined previously in the component_resilience.robot file
rg -A 1 "^\$\{(LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|RWX_VOLUME_FAST_FAILOVER|DATA_ENGINE)\}" "e2e/tests/negative/component_resilience.robot"
Length of output: 467
Line range hint 142-166
: Clarify data engine restrictions on test cases.
Several test cases are only executed when DATA_ENGINE='v1', including backing image and RWX volume tests. Please clarify:
- Why are these tests restricted to v1 engine?
- Should we have equivalent tests for other data engines?
- Are there specific limitations that prevent these tests from running with other engines?
Also applies to: 208-241
✅ Verification successful
Based on the gathered information, I can now provide a final response:
Tests are correctly restricted to v1 data engine for backing image and RWX volume features
The restriction of certain tests to DATA_ENGINE='v1' is intentional and correct because:
- The backing image feature is specifically designed for the v1 data engine, as evidenced by dedicated v2 engine tests in test_v2.robot that don't include backing image operations
- The tests are properly structured with conditional blocks to handle engine-specific features
- The default data engine is set to 'v1' in variables.resource, with separate test files for v2-specific functionality
The test implementation follows good practices by:
- Using clear conditionals to separate engine-specific features
- Having dedicated test files for v2 engine features
- Maintaining proper test coverage for both engines in their supported feature sets
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for data engine related configurations and documentation
rg -A 5 "DATA_ENGINE" "e2e/tests/keywords/variables.resource"
rg -A 5 "dataEngine" "e2e/tests/keywords/"
Length of output: 238
Script:
#!/bin/bash
# Let's first find the correct paths for test files and keywords
fd "variables" "e2e"
# Then search for DATA_ENGINE usage across e2e tests
rg -A 5 "DATA_ENGINE" "e2e"
# Search for dataEngine configurations
rg -A 5 "dataEngine" "e2e"
# Look for documentation about data engines
fd "README|\.md" "e2e"
Length of output: 84892
Script:
#!/bin/bash
# Let's check the README for data engine documentation
cat e2e/README.md
# Let's also check for v2 data engine specific tests
rg "dataEngine=v2" "e2e/tests"
# And check for any comments or documentation about data engine restrictions
rg -B 2 -A 2 "DATA_ENGINE.*v1" "e2e"
Length of output: 8071
e2e/tests/negative/node_drain.robot (1)
6-6
: Verify variable definitions in the new resource file
The centralization of variables is a good practice. However, we should verify that all required variables (${LOOP_COUNT}, ${RETRY_COUNT}, ${RETRY_INTERVAL}, ${RWX_VOLUME_FAST_FAILOVER}, ${DATA_ENGINE}) are properly defined in variables.resource.
✅ Verification successful
All required variables are properly defined in variables.resource
The variables.resource file contains all the required variables mentioned in the test file:
- ${LOOP_COUNT} = 1
- ${RETRY_COUNT} = 300
- ${RETRY_INTERVAL} = 1
- ${RWX_VOLUME_FAST_FAILOVER} = false
- ${DATA_ENGINE} = v1
The centralization of these variables in variables.resource is correct and follows good practices. The test file properly imports this resource file and uses these variables consistently throughout the test cases.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if all required variables are defined in the new resource file
# Check for required variables in the resource file
rg -l "(?:LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|RWX_VOLUME_FAST_FAILOVER|DATA_ENGINE)" "e2e/tests/keywords/variables.resource"
Length of output: 204
Script:
#!/bin/bash
# Let's first find the correct path of the variables.resource file
fd "variables.resource"
# Then check the content of the file for the required variables
rg "(?:LOOP_COUNT|RETRY_COUNT|RETRY_INTERVAL|RWX_VOLUME_FAST_FAILOVER|DATA_ENGINE)" -A 1
Length of output: 105799
Script:
#!/bin/bash
# Let's check the content of the variables.resource file
cat e2e/keywords/variables.resource
# Also check if there are any other variable definitions in the test file
cat e2e/tests/negative/node_drain.robot
Length of output: 13088
e2e/tests/negative/node_reboot.robot (3)
6-7
: LGTM: Resource imports align with test requirements
The addition of variables.resource and sharemanager.resource appropriately supports the new test case requirements for RWX volume and share manager testing.
24-54
: Well-structured test implementation with comprehensive verification
The test case effectively validates the auto reattachment behavior with a thorough sequence of steps:
- Creates both RWO and RWX volumes
- Verifies initial deployment health
- Writes test data
- Simulates node shutdown
- Validates sharemanager recovery
- Verifies data integrity and functionality
This implementation directly addresses the issue #9858 regarding share manager recovery after node shutdown.
319-319
: LGTM: Consistent power on handling across test cases
The modifications maintain a consistent approach to power on handling across different test scenarios, ensuring proper system recovery and test cleanup.
Also applies to: 337-337, 358-358, 375-375, 396-396, 412-412
FOR    ${powered_off_node}    IN    @{powered_off_nodes}
    Run keyword And Ignore Error    power_on_node_by_name    ${powered_off_node}
    Remove Values From List    ${powered_off_nodes}    ${powered_off_node}
END
💡 Codebase verification
🛠️ Refactor suggestion
List modification is safe, but error handling can be improved
After analyzing the codebase, I found that:
- The @{powered_off_nodes} list is properly initialized as an empty list in variables.resource
- Nodes are added to the list via the Power off node keyword in host.resource
- The same list modification pattern is used in both common.resource and host.resource, indicating it's an established pattern
- The list modification during iteration is safe because:
- Each node is processed only once
- The list is only used for cleanup purposes
- The same pattern is consistently used across files
However, the error handling suggestion from the original review is still valid:
FOR ${powered_off_node} IN @{powered_off_nodes}
- Run keyword And Ignore Error power_on_node_by_name ${powered_off_node}
- Remove Values From List ${powered_off_nodes} ${powered_off_node}
+ ${status} ${error}= Run Keyword And Ignore Error power_on_node_by_name ${powered_off_node}
+ Remove Values From List ${powered_off_nodes} ${powered_off_node}
+ IF '${status}' == 'FAIL'
+ Log Failed to power on node ${powered_off_node}: ${error} WARN
+ END
END
🔗 Analysis chain
Improve error handling and list modification safety
While the loop effectively handles multiple powered-off nodes, there are a few concerns to address:
- The node is removed from the list regardless of whether the power-on operation succeeds.
- Modifying the list while iterating over it could lead to unexpected behavior.
- The @{powered_off_nodes} list's initialization is not visible in this file.
Consider this safer implementation:
- FOR ${powered_off_node} IN @{powered_off_nodes}
- Run keyword And Ignore Error power_on_node_by_name ${powered_off_node}
- Remove Values From List ${powered_off_nodes} ${powered_off_node}
- END
+ ${nodes_to_process}= Copy List ${powered_off_nodes}
+ FOR ${powered_off_node} IN @{nodes_to_process}
+ ${status} ${error}= Run Keyword And Ignore Error power_on_node_by_name ${powered_off_node}
+ IF '${status}' == 'PASS'
+ Remove Values From List ${powered_off_nodes} ${powered_off_node}
+ ELSE
+ Log Failed to power on node ${powered_off_node}: ${error} WARN
+ END
+ END
Let's verify the initialization of the powered_off_nodes variable:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Find where powered_off_nodes is initialized
# Expected: Find the variable declaration in variables.resource or similar file
rg -l "powered_off_nodes" --type robot
Length of output: 38
Script:
#!/bin/bash
# Let's try a broader search without type filtering to find any references
rg -l "powered_off_nodes"
Length of output: 144
Script:
#!/bin/bash
# Let's also check the specific file content to understand the context better
cat "e2e/keywords/common.resource"
Length of output: 2476
Script:
#!/bin/bash
# Search for any power-related keywords that might help understand the context
rg "power_on_node_by_name" -A 2 -B 2
Length of output: 1149
Script:
#!/bin/bash
# Let's check the variables.resource file to find the initialization
cat "e2e/keywords/variables.resource"
Length of output: 326
Script:
#!/bin/bash
# Let's also check host.resource since it has similar power-related code
cat "e2e/keywords/host.resource"
Length of output: 1735
def wait_for_sharemanager_pod_restart(self, name):
    sharemanager_pod_name = "share-manager-" + name
    sharemanager_pod = get_pod(sharemanager_pod_name, "longhorn-system")
    last_creation_time = sharemanager_pod.metadata.creation_timestamp

    retry_count, retry_interval = get_retry_count_and_interval()
    for i in range(retry_count):
        logging(f"Waiting for sharemanager for volume {name} restart ... ({i})")
        time.sleep(retry_interval)
        sharemanager_pod = get_pod(sharemanager_pod_name, "longhorn-system")
        if sharemanager_pod == None:
            continue
        creation_time = sharemanager_pod.metadata.creation_timestamp
        logging(f"Getting new sharemanager which is created at {creation_time}, and old one is created at {last_creation_time}")
        if creation_time > last_creation_time:
            return

    assert False, f"sharemanager pod {sharemanager_pod_name} isn't restarted"
🛠️ Refactor suggestion
Good addition, but some improvements needed
The new method effectively monitors share manager pod restarts with good logging. However, there are a few improvements to consider:
- Follow Python best practices:
- if sharemanager_pod == None:
+ if sharemanager_pod is None:
- Use proper assertion:
- assert False, f"sharemanager pod {sharemanager_pod_name} isn't restarted"
+ raise AssertionError(f"sharemanager pod {sharemanager_pod_name} isn't restarted")
- Consider extracting common code with delete_sharemanager_pod_and_wait_for_recreation, as both methods share similar logic for monitoring pod recreation.
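One possible shape for that shared helper, sketched under the assumption that both callers only need the "wait until a newer pod exists" part (the method name is illustrative, the helpers it calls appear in the reviewed code):

def _wait_for_newer_sharemanager_pod(self, name, last_creation_time):
    # Shared polling loop: wait until a share manager pod newer than last_creation_time appears.
    sharemanager_pod_name = "share-manager-" + name
    retry_count, retry_interval = get_retry_count_and_interval()
    for i in range(retry_count):
        logging(f"Waiting for sharemanager pod {sharemanager_pod_name} to be recreated ... ({i})")
        time.sleep(retry_interval)
        pod = get_pod(sharemanager_pod_name, "longhorn-system")
        if pod is None:
            continue
        if pod.metadata.creation_timestamp > last_creation_time:
            return pod
    raise AssertionError(f"sharemanager pod {sharemanager_pod_name} wasn't recreated")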
📝 Committable suggestion
def wait_for_sharemanager_pod_restart(self, name):
    sharemanager_pod_name = "share-manager-" + name
    sharemanager_pod = get_pod(sharemanager_pod_name, "longhorn-system")
    last_creation_time = sharemanager_pod.metadata.creation_timestamp

    retry_count, retry_interval = get_retry_count_and_interval()
    for i in range(retry_count):
        logging(f"Waiting for sharemanager for volume {name} restart ... ({i})")
        time.sleep(retry_interval)
        sharemanager_pod = get_pod(sharemanager_pod_name, "longhorn-system")
        if sharemanager_pod is None:
            continue
        creation_time = sharemanager_pod.metadata.creation_timestamp
        logging(f"Getting new sharemanager which is created at {creation_time}, and old one is created at {last_creation_time}")
        if creation_time > last_creation_time:
            return

    raise AssertionError(f"sharemanager pod {sharemanager_pod_name} isn't restarted")
🧰 Tools
🪛 Ruff (0.8.0)
80-80: Comparison to None should be cond is None. Replace with cond is None (E711)
87-87: Do not assert False (python -O removes these calls); raise AssertionError() instead. Replace assert False (B011)
Which issue(s) this PR fixes:
Issue longhorn/longhorn#9858
What this PR does / why we need it:
add test case Shutdown Volume Node And Test Auto Reattach To A New Node
Special notes for your reviewer:
Additional documentation or context
Summary by CodeRabbit
Release Notes
New Features
Improvements
Bug Fixes
Documentation