
test(robot): migrate test_replica_auto_balance_disk_in_pressure #2167

Open · wants to merge 1 commit into base: master
Conversation


@yangchiu yangchiu commented Nov 16, 2024

Which issue(s) this PR fixes:

Issue longhorn/longhorn#9796

What this PR does / why we need it:

migrate test_replica_auto_balance_disk_in_pressure

Special notes for your reviewer:

Additional documentation or context

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced disk management capabilities with new keywords for enabling, disabling, and checking disk pressure on nodes.
    • New functionality to create StatefulSets with specified volume sizes.
    • Added a keyword to check for running replicas on specified nodes and disks.
  • Bug Fixes

    • Improved logic for disk updates and scheduling in the Node class.
  • Tests

    • Introduced new test cases for replica management and resource handling.
    • Enhanced test organization with new tags for better filtering.
  • Documentation

    • Updated test cases and resources for clarity and organization.

@yangchiu yangchiu self-assigned this Nov 16, 2024
@yangchiu yangchiu requested a review from a team as a code owner November 16, 2024 07:51

coderabbitai bot commented Nov 16, 2024

Walkthrough

The changes in this pull request primarily enhance disk management functionalities across various resources in the end-to-end (e2e) testing framework. Modifications include updates to existing keywords, the introduction of new keywords for disk scheduling and monitoring, and enhancements to the Node and Replica classes. Additionally, new test cases and resource management strategies are added to improve the testing framework's capabilities.

Changes

File Path Change Summary
e2e/keywords/common.resource Modified Set test environment keyword to change parameters for add_disk.
e2e/keywords/node.resource Added new disk management keywords and modified existing keyword to prepend ${disk_type}-disk.
e2e/keywords/replica.resource Imported node_keywords.py and added a new keyword for checking replicas on nodes.
e2e/keywords/statefulset.resource Added two new keywords for creating StatefulSets with specified volume sizes.
e2e/keywords/workload.resource Added a new keyword for checking volume replicas on nodes.
e2e/libs/keywords/node_keywords.py Updated add_disk method signature and added several new methods for disk management.
e2e/libs/keywords/replica_keywords.py Added get_replicas method to retrieve replica information with optional parameters.
e2e/libs/keywords/statefulset_keywords.py Updated create_statefulset method to include a size parameter.
e2e/libs/node/node.py Added several methods for disk management and updated existing methods for improved logic.
e2e/libs/replica/base.py Updated get method to include disk_uuid parameter.
e2e/libs/replica/crd.py Modified get method to accept disk_uuid and updated internal logic.
e2e/libs/replica/replica.py Updated get method to include disk_uuid as an optional parameter.
e2e/libs/replica/rest.py Updated get method signature to include disk_uuid.
e2e/libs/workload/statefulset.py Updated create_statefulset method to include a size parameter.
e2e/tests/regression/test_scheduling.robot Added new test case for replica auto-balance and modified resource usage for tests.
e2e/tests/regression/test_v2.robot Added a new tag to an existing test case for better organization.
e2e/tests/regression/test_volume.robot Introduced a new variable ${DATA_ENGINE} for dynamic configuration in a test case.

Assessment against linked issues

Objective Addressed Explanation
Migrate test_scheduling.test_replica_auto_balance_disk_in_pressure (#9796)

Possibly related PRs

Suggested reviewers

  • chriscchien

Poem

🐇 In the land of disks and nodes,
Where data flows and logic codes,
We add, we check, we manage well,
With every change, new stories to tell.
So hop along, let’s test and play,
For in this code, we find our way! 🌟


📥 Commits

Reviewing files that changed from the base of the PR and between 1e777ba and e621156.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Outside diff range and nitpick comments (21)
e2e/libs/keywords/replica_keywords.py (1)

12-13: LGTM! Consider adding documentation and type hints.

The implementation correctly delegates to the underlying replica.get() method. However, to improve maintainability and IDE support, consider adding:

  1. A docstring explaining the method's purpose and parameters
  2. Type hints for the parameters and return value

Here's a suggested improvement:

-    def get_replicas(self, volume_name=None, node_name=None, disk_uuid=None):
-        return self.replica.get(volume_name, node_name, disk_uuid)
+    def get_replicas(self, volume_name: str | None = None,
+                     node_name: str | None = None,
+                     disk_uuid: str | None = None) -> list:
+        """Retrieve replicas based on optional filtering criteria.
+
+        Args:
+            volume_name: Optional name of the volume to filter replicas
+            node_name: Optional name of the node to filter replicas
+            disk_uuid: Optional UUID of the disk to filter replicas
+
+        Returns:
+            list: List of matching replica objects
+        """
+        return self.replica.get(volume_name, node_name, disk_uuid)
e2e/libs/replica/base.py (1)

7-7: Add docstring to document the new parameter.

Consider adding a docstring to document the purpose and expected format of the disk_uuid parameter.

     def get(self, volume_name, node_name, disk_uuid):
+        """Get replica information.
+        
+        Args:
+            volume_name: Name of the volume
+            node_name: Name of the node
+            disk_uuid: UUID of the disk to filter replicas
+        
+        Returns:
+            Implementation specific replica information
+        """
         return NotImplemented
e2e/libs/replica/replica.py (1)

23-23: Fix spacing in method parameter

There's inconsistent spacing in the wait_for_rebuilding_start method call - missing space after the comma.

-    def wait_for_rebuilding_start(self, volume_name, node_name):
-        return self.replica.wait_for_rebuilding_start(volume_name,node_name)
+    def wait_for_rebuilding_start(self, volume_name, node_name):
+        return self.replica.wait_for_rebuilding_start(volume_name, node_name)
e2e/libs/replica/crd.py (2)

13-13: Add docstring to document the expanded functionality

Consider adding a docstring to document the purpose and parameters of this method, especially since it now supports additional filtering capabilities.

 def get(self, volume_name=None, node_name=None, disk_uuid=None):
+    """Retrieve Longhorn replicas with optional filtering.
+
+    Args:
+        volume_name (str, optional): Filter replicas by volume name
+        node_name (str, optional): Filter replicas by node name
+        disk_uuid (str, optional): Filter replicas by disk UUID
+
+    Returns:
+        list: List of replica objects matching the specified filters
+    """

15-20: Consider adding input validation and using constants for label keys

The current implementation could benefit from:

  1. Input validation for disk_uuid format
  2. Using constants for label keys to avoid hardcoding and ensure consistency
+LABEL_KEY_VOLUME = "longhornvolume"
+LABEL_KEY_NODE = "longhornnode"
+LABEL_KEY_DISK = "longhorndiskuuid"

 def get(self, volume_name=None, node_name=None, disk_uuid=None):
+    if disk_uuid and not self._is_valid_uuid(disk_uuid):
+        raise ValueError(f"Invalid disk UUID format: {disk_uuid}")
     label_selector = []
     if volume_name:
-        label_selector.append(f"longhornvolume={volume_name}")
+        label_selector.append(f"{LABEL_KEY_VOLUME}={volume_name}")
     if node_name:
-        label_selector.append(f"longhornnode={node_name}")
+        label_selector.append(f"{LABEL_KEY_NODE}={node_name}")
     if disk_uuid:
-        label_selector.append(f"longhorndiskuuid={disk_uuid}")
+        label_selector.append(f"{LABEL_KEY_DISK}={disk_uuid}")
e2e/keywords/common.resource (1)

37-37: Consider making the disk name more dynamic.

Instead of hardcoding block-disk, consider using the ${disk_type} variable that's already available in the loop to construct the disk name.

-        add_disk    block-disk    ${worker_node}    block    ${disk_path}
+        add_disk    ${disk_type}-disk    ${worker_node}    ${disk_type}    ${disk_path}

This would make the code more maintainable and reduce the chance of inconsistencies if the disk type needs to be changed in the future.

e2e/keywords/node.resource (1)

56-63: Consider adding error handling and cleanup for disk creation.

The disk creation process involves multiple steps (create, attach, wait, mount, add) which could fail independently. Consider:

  1. Adding error handling to clean up resources if any step fails
  2. Making the operation more atomic

Example improvement:

Create ${disk_size} Gi disk ${disk_id} on node ${node_id}
    ${node_name} =    get_node_by_index    ${node_id}
    ${disk_name} =    generate_name_with_suffix    disk    ${disk_id}
    TRY
        create_volume    ${disk_name}    size=${disk_size}Gi    numberOfReplicas=1
        attach_volume    ${disk_name}    ${node_name}
        wait_for_volume_healthy    ${disk_name}
        ${mount_path} =    mount_disk    ${disk_name}    ${node_name}
        add_disk    ${disk_name}    ${node_name}    filesystem    ${mount_path}
    EXCEPT    AS    ${error}
        # Cleanup on failure
        Run Keyword If Test Failed    cleanup_volume    ${disk_name}
        Run Keyword If Test Failed    cleanup_mount    ${mount_path}
        FAIL    Failed to create disk: ${error}
    END
e2e/libs/keywords/statefulset_keywords.py (1)

31-31: LGTM! Consider adding docstring for the new parameter.

The addition of the size parameter is well-structured and maintains backward compatibility.

Add a docstring to document the new parameter:

 def create_statefulset(self, name, volume_type="RWO", sc_name="longhorn", size=None):
+    """
+    Create a statefulset with the specified parameters.
+    
+    Args:
+        name: Name of the statefulset
+        volume_type: Volume access mode (default: "RWO")
+        sc_name: Storage class name (default: "longhorn")
+        size: Volume size (e.g., "1Gi", "500Mi", default: None)
+    """
e2e/tests/regression/test_v2.robot (2)

Line range hint 58-77: Consider enhancing test documentation.

While the test implementation is solid, it would be beneficial to add documentation that explains:

  • The expected trim behavior in degraded vs. healthy states
  • The purpose of the loop count in this specific test case
  • The significance of testing trim operations during degraded states

Add documentation like this:

 V2 Volume Should Block Trim When Volume Is Degraded
+    [Documentation]    Validates that trim operations are:
+    ...    - Blocked when the volume is in degraded state after cluster restart
+    ...    - Allowed once the volume returns to healthy state
+    ...    The test repeats to ensure consistent behavior across multiple iterations.
     [Tags]    cluster

Line range hint 65-77: Consider improving test resilience.

The test involves cluster restart and data operations which can be sensitive to timing. Consider these improvements:

  1. Add explicit timeouts for cluster operations
  2. Verify data integrity after cluster restart
  3. Add detailed error messages for debugging failures

Example improvements:

         When Restart cluster
         And Wait for longhorn ready
+        And Wait for longhorn ready timeout=600
         And Wait for volume of deployment 0 attached and degraded
+        And Verify data integrity in deployment 0
         Then Trim deployment 0 volume should fail
+        ...    msg=Trim operation should be blocked in degraded state
e2e/libs/replica/rest.py (1)

15-16: Consider adding type hints and documentation

To improve maintainability and clarity, consider:

  1. Adding type hints for the parameters
  2. Adding a docstring explaining the purpose of disk_uuid
  3. Documenting the expected return type
-    def get(self, volume_name, node_name, disk_uuid):
+    def get(self, volume_name: str, node_name: str, disk_uuid: str) -> dict:
+        """Retrieve replica information for a volume on a specific node and disk.
+
+        Args:
+            volume_name: Name of the volume
+            node_name: Name of the node
+            disk_uuid: UUID of the disk where the replica resides
+
+        Returns:
+            dict: Replica information
+        """
         return NotImplemented
e2e/keywords/statefulset.resource (2)

18-20: Consider adding documentation for the size parameter.

The keyword implementation looks good and follows the established patterns. Consider adding documentation about the valid range for the size parameter to help other developers use this keyword correctly.

Add documentation like this:

 Create statefulset ${statefulset_id} using ${volume_type} volume with ${sc_name} storageclass and size ${size} Mi
+    [Documentation]    Creates a StatefulSet with specified volume size in MiB.
+    ...    The size parameter must be a positive integer representing the volume size in MiB.
     ${statefulset_name} =    generate_name_with_suffix    statefulset    ${statefulset_id}
     create_statefulset    ${statefulset_name}    ${volume_type}    ${sc_name}    ${size}Mi

22-24: Consider adding size parameter validation.

The keyword implementation looks good. Consider adding parameter validation to ensure the size is a positive number before creating the StatefulSet.

You might want to add a validation keyword:

Validate Volume Size
    [Arguments]    ${size}
    Should Be True    ${size} > 0    Volume size must be a positive number

Then update the keyword to use it:

 Create statefulset ${statefulset_id} using ${volume_type} volume with ${sc_name} storageclass and size ${size} Gi
+    [Documentation]    Creates a StatefulSet with specified volume size in GiB.
+    ...    The size parameter must be a positive integer representing the volume size in GiB.
+    Validate Volume Size    ${size}
     ${statefulset_name} =    generate_name_with_suffix    statefulset    ${statefulset_id}
     create_statefulset    ${statefulset_name}    ${volume_type}    ${sc_name}    ${size}Gi
e2e/tests/regression/test_scheduling.robot (1)

94-102: Consider enhancing verification steps

The current verification steps check for:

  • Replica distribution across disks
  • Pressure status
  • Data integrity

Consider adding:

  1. Verification that replicas were not rebuilt simultaneously
  2. Specific pressure percentage checks
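As a sketch of what a pressure-percentage check could build on: Longhorn exposes per-disk capacity in the node's disk status, so usage can be derived and compared against a threshold. The field names below (`storageMaximum`, `storageAvailable`) follow the Longhorn node disk status; the helper functions themselves are hypothetical, not part of this PR:

```python
def disk_usage_percentage(disk_status: dict) -> float:
    """Compute the used-space percentage from a Longhorn disk status entry."""
    maximum = disk_status["storageMaximum"]
    available = disk_status["storageAvailable"]
    if maximum <= 0:
        raise ValueError("storageMaximum must be positive")
    return (maximum - available) * 100.0 / maximum


def is_disk_under_pressure(disk_status: dict, threshold_percent: float) -> bool:
    """A disk counts as under pressure once usage reaches the threshold."""
    return disk_usage_percentage(disk_status) >= threshold_percent
```

A test could then assert a specific percentage range after filling the disk, rather than only the boolean pressure flag.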
e2e/libs/workload/statefulset.py (1)

13-13: Add type hints and docstring for better clarity.

Consider adding type hints and a docstring to clarify the expected format and optional nature of the size parameter.

-def create_statefulset(statefulset_name, volume_type, sc_name, size):
+def create_statefulset(statefulset_name: str, volume_type: str, sc_name: str, size: str | None = None) -> None:
+    """Create a StatefulSet with specified configuration.
+    
+    Args:
+        statefulset_name: Name of the StatefulSet
+        volume_type: Volume access mode ('RWX' or 'RWO')
+        sc_name: Storage class name
+        size: Storage size (e.g., '1Gi', '500Mi'), optional
+    """
e2e/keywords/workload.resource (1)

218-225: Implementation looks good but could be enhanced

The keyword implementation follows good practices and integrates well with existing helper methods. However, there are a few potential improvements:

  1. The assertion len(${replicas}) > 0 could be more specific based on the expected number of replicas.
  2. Consider adding error handling for cases where disk_uuid is not found.
  3. Documentation for the keyword's purpose and parameters is missing.

Consider enhancing the implementation with this diff:

 Check volume of ${workload_kind} ${workload_id} replica on node ${node_id} disk ${disk_id}
+    [Documentation]    Verifies that the specified volume has replicas on the given node and disk.
+    ...    
+    ...    Arguments:
+    ...    - workload_kind: Type of the workload (e.g., pod, deployment)
+    ...    - workload_id: Identifier for the workload
+    ...    - node_id: Node index where to check for replicas
+    ...    - disk_id: Disk identifier where to check for replicas
     ${workload_name} =   generate_name_with_suffix    ${workload_kind}    ${workload_id}
     ${volume_name} =    get_workload_volume_name    ${workload_name}
     ${node_name} =    get_node_by_index    ${node_id}
     ${disk_name} =    generate_name_with_suffix    disk    ${disk_id}
     ${disk_uuid} =    get_disk_uuid    ${node_name}    ${disk_name}
+    Should Not Be Empty    ${disk_uuid}    msg=Disk ${disk_name} not found on node ${node_name}
     ${replicas} =    get_replicas    volume_name=${volume_name}    node_name=${node_name}    disk_uuid=${disk_uuid}
-    Should Be True    len(${replicas}) > 0
+    ${replica_count} =    Get Length    ${replicas}
+    Should Be True    ${replica_count} > 0    msg=No replicas found for volume ${volume_name} on node ${node_name} disk ${disk_name}
e2e/libs/keywords/node_keywords.py (4)

14-16: Ensure consistent parameter order for 'node_name' and 'disk_name'

For consistency across methods, consider placing node_name before disk_name in the mount_disk method, as other methods have node_name first.

Apply this diff:

-def mount_disk(self, disk_name, node_name):
+def mount_disk(self, node_name, disk_name):
    logging(f"Mount device /dev/longhorn/{disk_name} on node {node_name}")
    return self.node.mount_disk(disk_name, node_name)

Line range hint 18-25: Rename 'type' to 'disk_type' and adjust parameter order for consistency

Using type as a parameter name shadows the built-in type function, which can lead to confusion or unexpected behavior. Additionally, to maintain consistency, consider placing node_name before disk_name in the method parameters.

Apply this diff:

-def add_disk(self, disk_name, node_name, type, path):
-    logging(f"Adding {type} type disk {disk_name} {path} to node {node_name}")
+def add_disk(self, node_name, disk_name, disk_type, path):
+    logging(f"Adding {disk_type} type disk {disk_name} {path} to node {node_name}")
     disk = {
-        f"{disk_name}": {
-            "diskType": type,
+        disk_name: {
+            "diskType": disk_type,
             "path": path,
             "allowScheduling": True
         }
     }
     self.node.add_disk(node_name, disk)

45-46: Add logging to 'disable_disk' method for consistency

Consider adding a logging statement to the disable_disk method to maintain consistency and aid in debugging.

Apply this diff:

 def disable_disk(self, node_name, disk_name):
+    logging(f"Disabling disk {disk_name} on node {node_name}")
     self.node.set_disk_scheduling(node_name, disk_name, allowScheduling=False)

48-49: Add logging to 'enable_disk' method for consistency

Similarly, add a logging statement to the enable_disk method.

Apply this diff:

 def enable_disk(self, node_name, disk_name):
+    logging(f"Enabling disk {disk_name} on node {node_name}")
     self.node.set_disk_scheduling(node_name, disk_name, allowScheduling=True)
e2e/libs/node/node.py (1)

27-27: Remove unused variable res to clean up the code.

The variable res is assigned the result of NodeExec(node_name).issue_cmd(cmd) but is not used thereafter. This can be removed to improve code clarity.

Apply this diff to remove the unused variable:

-    res = NodeExec(node_name).issue_cmd(cmd)
+    NodeExec(node_name).issue_cmd(cmd)

Also applies to: 29-29, 31-31

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL


📒 Files selected for processing (17)
  • e2e/keywords/common.resource (1 hunks)
  • e2e/keywords/node.resource (2 hunks)
  • e2e/keywords/replica.resource (1 hunks)
  • e2e/keywords/statefulset.resource (1 hunks)
  • e2e/keywords/workload.resource (2 hunks)
  • e2e/libs/keywords/node_keywords.py (3 hunks)
  • e2e/libs/keywords/replica_keywords.py (1 hunks)
  • e2e/libs/keywords/statefulset_keywords.py (1 hunks)
  • e2e/libs/node/node.py (6 hunks)
  • e2e/libs/replica/base.py (1 hunks)
  • e2e/libs/replica/crd.py (1 hunks)
  • e2e/libs/replica/replica.py (1 hunks)
  • e2e/libs/replica/rest.py (1 hunks)
  • e2e/libs/workload/statefulset.py (2 hunks)
  • e2e/tests/regression/test_scheduling.robot (2 hunks)
  • e2e/tests/regression/test_v2.robot (1 hunks)
  • e2e/tests/regression/test_volume.robot (1 hunks)
🧰 Additional context used
🪛 Ruff
e2e/libs/node/node.py

31-31: Local variable res is assigned to but never used

Remove assignment to unused variable res

(F841)

🔇 Additional comments (20)
e2e/keywords/replica.resource (2)

6-6: LGTM: Library import is appropriate

The addition of node_keywords.py is necessary for the new functionality and follows the existing import pattern.


13-18: Consider enhancing the keyword's robustness

While the implementation is functional, consider these improvements:

  1. The empty volume_name parameter in get_replicas call looks suspicious. Should this be parameterized?
  2. The validation is very permissive (any number > 0). Consider adding a parameter for expected replica count.
  3. Missing error handling for cases where disk_uuid might not be found.

Consider this enhanced implementation:

- There should be replicas running on node ${node_id} disk ${disk_id}
+ There should be ${expected_count} replicas running on node ${node_id} disk ${disk_id}
     ${node_name} =    get_node_by_index    ${node_id}
     ${disk_name} =    generate_name_with_suffix    disk    ${disk_id}
     ${disk_uuid} =    get_disk_uuid    ${node_name}    ${disk_name}
+    Should Not Be Empty    ${disk_uuid}    msg=Disk UUID not found for node ${node_name} disk ${disk_name}
-    ${replicas} =    get_replicas    volume_name=    node_name=${node_name}    disk_uuid=${disk_uuid}
+    ${replicas} =    get_replicas    node_name=${node_name}    disk_uuid=${disk_uuid}
-    Should Be True    len(${replicas}) > 0
+    Length Should Be    ${replicas}    ${expected_count}    msg=Expected ${expected_count} replicas but found ${replicas}

Let's verify the usage of empty volume_name parameter:

e2e/libs/replica/replica.py (1)

19-20: LGTM: Clean implementation of the disk_uuid parameter

The addition of the optional disk_uuid parameter is well-implemented and maintains backward compatibility. This change aligns well with the PR objective of migrating disk pressure-related test cases.

e2e/tests/regression/test_volume.robot (1)

22-22: LGTM! Good practice using a configurable variable for data engine.

The addition of ${DATA_ENGINE} variable allows for flexible configuration of the data engine version used in tests.

e2e/libs/replica/crd.py (1)

13-20: Verify compatibility with existing callers

The modified signature of the get method might affect other parts of the codebase that call this method.

✅ Verification successful


The modified get method signature is compatible with existing usage

The changes to the get method are safe as all callers are already using the method with named arguments that match the new signature:

  • All calls use volume_name, node_name and/or disk_uuid as named parameters
  • No callers rely on positional arguments
  • The parameters remain optional which maintains backward compatibility
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential compatibility issues with the modified get method

# Find all calls to the get method in the codebase
echo "Searching for direct calls to get method..."
rg -A 2 "\.get\([^)]*\)" --type py

# Look for potential imports of this class
echo "Searching for imports of the CRD class..."
rg "from replica.crd import CRD" --type py

Length of output: 47208


Script:

#!/bin/bash
# Search for actual usage of the get method from the replica CRD class
echo "Searching for actual usage of get method from replica CRD..."
rg -A 3 "replica\.get\(" --type py

# Search for method implementations in the replica class
echo "Searching for method implementations in replica class..."
ast-grep --pattern 'def get($$$)'

Length of output: 1922

e2e/keywords/common.resource (1)

37-37: Verify the impact on existing test cases.

The change to include block-disk as the first parameter in add_disk looks good, as it aligns with the standardization of disk naming. However, we should ensure this change doesn't break existing test cases.

✅ Verification successful

The add_disk signature change is consistent with existing usage

The change is verified to be correct as it follows the existing pattern in the codebase:

  • In node.resource, add_disk is already using the ${disk_type}-disk naming convention
  • The modified line in common.resource follows the same pattern by using block-disk
  • All instances of add_disk follow the same parameter order: disk name, node name, disk type, and path
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for other test cases that might be affected by the add_disk signature change

# Look for other direct calls to add_disk that might need updating
rg "add_disk\s+(?!block-disk)[^\s]+" --type robot

# Look for any test cases that might depend on specific disk names
rg "block-disk" --type robot

Length of output: 326


Script:

#!/bin/bash
# Search for add_disk calls in robot files
rg "add_disk\s+\S+" --type robot -A 2

# Search for block-disk string in robot files
rg "block-disk" --type robot

Length of output: 68


Script:

#!/bin/bash
# List all robot framework files first to ensure we're searching in the right place
fd ".robot$|.resource$"

# Then search for add_disk pattern with word boundaries
rg "\badd_disk\b" --type-add 'robot:*.{robot,resource}' -t robot

Length of output: 2160

e2e/keywords/node.resource (4)

6-6: LGTM: Library import is appropriate.

The addition of volume_keywords.py library is necessary to support the new disk management functionalities.


36-44: LGTM: Well-structured disk scheduling keywords.

The implementation follows consistent patterns and uses appropriate helper functions for disk scheduling management.


46-54: LGTM: Disk pressure check keywords align with PR objectives.

These keywords are essential for the test migration and properly implement waiting mechanisms for reliable pressure checks.


12-12: Verify impact of disk naming change.

The modification to prepend ${disk_type}-disk to the disk name is a good practice for clarity, but we should verify this doesn't break existing tests.

✅ Verification successful



Disk naming change is safe to proceed

Based on the code analysis:

  • The modified keyword is used in a higher-level keyword that adds disks to worker nodes
  • The existing usage in other contexts (like generate_name_with_suffix) already handles disk naming independently
  • The change only affects the specific worker node disk addition scenario where the type-based naming adds clarity
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for existing test cases that might be affected by the disk naming change
rg -l "Add \w+ type disk" --type robot

Length of output: 40


Script:

#!/bin/bash
# Search for usage of add_disk keyword in test files
rg "add_disk" --type robot -A 2 -B 2

# Also search for any disk-related test cases
rg "test.*disk" --type robot -i

Length of output: 381


Script:

#!/bin/bash
# Let's look at the actual implementation of add_disk keyword
rg "add_disk" --type robot -A 5 -B 5 e2e/keywords/node.resource

# Also check for any disk-related variables or settings that might reference disk names
rg "\${.*disk.*}" --type robot

Length of output: 954

e2e/libs/keywords/statefulset_keywords.py (1)

32-33: Verify the underlying create_statefulset function implementation.

The changes look good, but we should verify that the imported create_statefulset function properly handles the new size parameter.

✅ Verification successful

Let me gather more information about the implementation details of the create_statefulset function in e2e/libs/workload/statefulset.py since we can see it's defined there.


Based on the implementation shown in e2e/libs/workload/statefulset.py, I can now provide the final response:

The create_statefulset function properly handles the size parameter

The implementation correctly uses the size parameter to set the storage request size in the StatefulSet's volumeClaimTemplate:

if size:
    manifest_dict['spec']['volumeClaimTemplates'][0]['spec']['resources']['requests']['storage'] = size
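A minimal, self-contained sketch (manifest shape assumed from the review excerpt, not taken from the actual template file) of how the optional size argument overrides the PVC storage request:

```python
# Assumed shape of the StatefulSet manifest dict; the real template
# defaults to 3Gi per the review findings.
manifest_dict = {
    "spec": {
        "volumeClaimTemplates": [
            {"spec": {"resources": {"requests": {"storage": "3Gi"}}}}
        ]
    }
}

size = "316Mi"  # hypothetical caller-supplied size
if size:
    # Only override the template default when a size is provided.
    manifest_dict["spec"]["volumeClaimTemplates"][0]["spec"]["resources"]["requests"]["storage"] = size

print(manifest_dict["spec"]["volumeClaimTemplates"][0]["spec"]["resources"]["requests"]["storage"])  # 316Mi
```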
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the imported create_statefulset function accepts the size parameter

# Search for the create_statefulset function definition
ast-grep --pattern 'def create_statefulset($$$)' 

# Search for any existing usage of create_statefulset with size parameter
rg -A 2 'create_statefulset.*size' 

Length of output: 1425


Script:

#!/bin/bash
# Check the implementation of create_statefulset function and how it uses the size parameter
cat e2e/libs/workload/statefulset.py

# Also check the template file it uses
cat ./templates/workload/statefulset.yaml

Length of output: 5092

e2e/tests/regression/test_v2.robot (1)

58-58: LGTM! The cluster tag is appropriate for this test case.

The addition of the cluster tag is well-justified as this test case involves cluster-wide operations (cluster restart) and validates system behavior at the cluster level for trim operations in degraded states.

e2e/tests/regression/test_scheduling.robot (4)

8-14: LGTM: Resource imports align with test requirements

The addition of replica, storageclass, statefulset, and node resources is appropriate for testing replica auto-balancing functionality. The resource organization follows a logical order.


57-65: LGTM: Well-documented test case

The test documentation clearly outlines:

  • The test scenario (disk pressure at 80%)
  • Expected behavior (non-simultaneous replica rebuilding)
  • Traceability to the original issue

85-89: Verify pressure threshold calculations

The test writes 254MB to each statefulset to trigger disk pressure. Please verify:

  1. The calculation comment: "950 Mi * 80% / 3 = 254 Mi"
  2. Whether this consistently triggers the pressure threshold

Consider adding a verification step after writing data to ensure the exact pressure percentage is reached before proceeding.
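The arithmetic in the quoted comment can be checked directly; rounding the per-workload share up to the next MiB yields 254:

```python
import math

# "950 Mi * 80% / 3 = 254 Mi": 950 Mi usable space, 80% pressure
# threshold, split evenly across three statefulsets.
usable_mi = 950
threshold = 0.80
statefulsets = 3

# 760 / 3 = 253.33..., rounded up so the combined writes cross the threshold.
per_workload = math.ceil(usable_mi * threshold / statefulsets)
print(per_workload)  # 254
```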


76-83: Verify disk size calculations and assumptions

The test assumes 950Mi available space out of 1Gi disk, and divides it into 3 parts (316Mi each). Please verify:

  1. Is 950Mi a reliable assumption across different environments?
  2. Should we add a pre-check step to verify available space?
e2e/libs/workload/statefulset.py (1)

Line range hint 13-35: Verify impact on existing test cases.

Since this is a test infrastructure change, we should ensure it doesn't break existing test cases that might not provide the size parameter.

✅ Verification successful

Optional size parameter is safe to use

The changes are safe because:

  1. The template file has a default storage size of 3Gi
  2. The size parameter is optional in the function signature with a default value of None
  3. The size is only modified in the manifest if explicitly provided
  4. Existing test cases in other files use the function through statefulset_keywords.py which maintains the same optional parameter pattern
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for existing calls to create_statefulset to verify compatibility
rg -A 2 "create_statefulset\(" --type py

# Look for any test files that might be affected by this change
fd -e py -x grep -l "test_replica_auto_balance_disk_in_pressure" {}

Length of output: 1937


Script:

#!/bin/bash
# Let's check the actual implementation of create_statefulset in test_scheduling.py
rg -B 5 -A 10 "_create_statefulset" manager/integration/tests/test_scheduling.py

# Also check the template file to understand default values
cat e2e/libs/workload/templates/workload/statefulset.yaml

# Check for any other test files using statefulset
fd -e py -x grep -l "statefulset" {}

Length of output: 2275


Script:

#!/bin/bash
# Let's check how the size parameter is used in other test files
rg -B 2 -A 2 "create_statefulset.*size" ./e2e/libs/keywords/statefulset_keywords.py ./manager/integration/tests/test_statefulset.py ./manager/integration/tests/test_rwx.py

# Check the actual template content since the path was incorrect
fd -t f statefulset.yaml
cat ./templates/workload/statefulset.yaml 2>/dev/null || cat ./e2e/templates/workload/statefulset.yaml 2>/dev/null

Length of output: 2041

e2e/keywords/workload.resource (1)

11-11: LGTM: Library import follows conventions

The import of replica_keywords.py is properly placed and follows the existing import conventions.

e2e/libs/node/node.py (2)

75-78: Ensure consistent logging messages for disk scheduling actions.

The logging messages in reset_disks are now more informative. Good job on improving the clarity of log outputs.


53-55: Confirm the logic change in disk condition checks.

The condition in wait_for_disk_update has been modified. Ensure this change aligns with the desired behavior for disk readiness and scheduling.

Run the following script to verify disk conditions:

✅ Verification successful

Based on the search results, I can now verify the logic change in disk condition checks. The code in wait_for_disk_update is checking for two conditions:

  1. The disk's allowScheduling is true
  2. Either:
    • The disk has no conditions, or
    • The disk's Ready condition status is not "True"

This check is valid: it ensures that a disk allowed for scheduling has its Ready condition set to "True" before it is considered fully updated. This is consistent with the test cases shown in the search results, where:

  • Tests wait for disk conditions to be updated after disk operations
  • The Ready condition is checked alongside Schedulable condition
  • Disk operations (mount/unmount) affect the Ready condition status

The logic change is correct and aligns with the expected disk readiness verification behavior.
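A hedged sketch of the readiness predicate described above (the dictionary shape is assumed from the review excerpts, not the actual Longhorn API objects): a disk is still pending an update when scheduling is allowed but the Ready condition is absent or not "True".

```python
def disk_pending_update(disk):
    """Return True when the disk is allowed for scheduling but not yet Ready."""
    if not disk.get("allowScheduling", False):
        # Disks excluded from scheduling are not waited on.
        return False
    conditions = disk.get("conditions") or {}
    ready = conditions.get("Ready", {})
    return ready.get("status") != "True"

# Allowed for scheduling but conditions not populated yet -> still pending.
assert disk_pending_update({"allowScheduling": True, "conditions": {}})
# Ready condition reported as "True" -> no longer pending.
assert not disk_pending_update(
    {"allowScheduling": True,
     "conditions": {"Ready": {"status": "True"}}})
```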

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify disk conditions and readiness status.

# Expected: Disks should have 'Ready' status as 'True' when 'allowScheduling' is true.
ast-grep --pattern $'node.disks[d] {
  "allowScheduling": true,
  "conditions": {
    "Ready": {
      "status": "True"
    }
  }
}'

Length of output: 133


Script:

#!/bin/bash
# Let's examine the full context of the wait_for_disk_update function and its usage
rg -A 10 -B 10 "wait_for_disk_update" 

# Also search for any tests or related disk condition checks
rg -A 5 -B 5 "disk.*condition" 
rg -A 5 -B 5 "allowScheduling.*Ready"

Length of output: 105668

@@ -4,7 +4,7 @@
 class Base(ABC):

     @abstractmethod
-    def get(self, volume_name, node_name):
+    def get(self, volume_name, node_name, disk_uuid):

💡 Codebase verification

Implementation inconsistency found in derived classes

The verification reveals that not all implementing classes have been properly updated with the new disk_uuid parameter:

  • e2e/libs/replica/crd.py is missing the get() method implementation entirely
  • e2e/libs/replica/replica.py has inconsistent signature with optional disk_uuid=None
  • e2e/libs/replica/rest.py matches the base class signature correctly

The abstract base class requires a mandatory disk_uuid parameter, but implementations are either missing or inconsistent. This needs to be fixed to maintain the contract defined by the base class.
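An illustrative sketch (class and method names assumed for illustration) of why the contract matters: Python's ABC machinery only checks that an override exists, not that its signature matches, so a subclass defaulting disk_uuid to None still instantiates and the mismatch surfaces only at call sites.

```python
from abc import ABC, abstractmethod

class Base(ABC):
    @abstractmethod
    def get(self, volume_name, node_name, disk_uuid):
        """Subclasses are expected to accept all three arguments."""

class Consistent(Base):
    # Signature matches the base class: disk_uuid is mandatory.
    def get(self, volume_name, node_name, disk_uuid):
        return (volume_name, node_name, disk_uuid)

print(Consistent().get("vol", "node-1", "uuid-1"))
```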

🔗 Analysis chain

LGTM! Verify implementations in derived classes.

The addition of disk_uuid parameter to the abstract get() method aligns well with the PR's objective of enhancing disk management capabilities.

Let's verify that all implementing classes have been updated with the new parameter:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if all implementations of the get() method include the disk_uuid parameter
# Expected: All implementing classes should have the updated signature

# Search for class definitions that inherit from Base
ast-grep --pattern 'class $_ (Base):'

# Search for get method implementations
ast-grep --pattern 'def get(self, volume_name, node_name, $_):'

Length of output: 105


Script:

#!/bin/bash
# Let's try a different approach using ripgrep to find implementations

# First, find the base class definition
echo "=== Base class definition ==="
rg -l "class.*Base.*:" e2e/libs/replica/

# Then find all implementations of get method in replica-related files
echo -e "\n=== Get method implementations ==="
rg "def get\(self, volume_name, node_name" -A 1 e2e/libs/replica/

Length of output: 822

Comment on lines +15 to 16
def get(self, volume_name, node_name, disk_uuid):
return NotImplemented

💡 Codebase verification

Breaking change detected in method signature

The verification reveals that while the base class Base and implementation class CRD have been properly updated to include the disk_uuid parameter, the Replica class implementation has a mismatched signature where disk_uuid is marked as optional with a default value (disk_uuid=None). This inconsistency with the abstract base class could lead to runtime issues.

  • e2e/libs/replica/replica.py: Update the get method signature to match the base class by removing the default value:
    def get(self, volume_name, node_name, disk_uuid):
🔗 Analysis chain

Verify implementations of the updated method signature

The addition of the disk_uuid parameter is aligned with the test migration objectives. However, since this is an abstract method, we should ensure all implementing classes have been updated accordingly.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all implementations of the Rest class to verify they implement
# the updated get method signature

# Search for classes that inherit from Rest
ast-grep --pattern 'class $_ (Rest):'

# Search for get method implementations with the new signature
ast-grep --pattern 'def get(self, volume_name, node_name, disk_uuid):'

Length of output: 112


Script:

#!/bin/bash
# Let's try a different approach using ripgrep to find implementations

# Search for class definitions inheriting from Rest
rg "class.*\(Rest\)" -l

# Search for all get method implementations in the codebase
rg "def get\(.*volume_name.*node_name.*\):" -l

# Get content of the base class to understand the inheritance
cat e2e/libs/replica/rest.py

Length of output: 3259


Script:

#!/bin/bash
# Let's examine the implementation in other files that might be affected

# Check the content of files that might have the get method implementation
cat e2e/libs/replica/crd.py
cat e2e/libs/replica/base.py
cat e2e/libs/replica/replica.py

Length of output: 3895

Comment on lines +33 to +35
# correct request storage size
if size:
manifest_dict['spec']['volumeClaimTemplates'][0]['spec']['resources']['requests']['storage'] = size

🛠️ Refactor suggestion

Add validation and improve error handling for storage size.

The current implementation could be more robust:

  1. No validation of size format (e.g., '1Gi', '500Mi')
  2. No error handling for invalid sizes
  3. Nested dictionary access could fail if manifest structure changes

Consider this safer implementation:

-        # correct request storage size
-        if size:
-            manifest_dict['spec']['volumeClaimTemplates'][0]['spec']['resources']['requests']['storage'] = size
+        # Set PVC storage size if specified
+        if size:
+            # Validate size format
+            if not isinstance(size, str) or not any(size.endswith(unit) for unit in ['Ki', 'Mi', 'Gi', 'Ti']):
+                raise ValueError(f"Invalid storage size format: {size}. Expected format: '<number>Ki|Mi|Gi|Ti'")
+            
+            try:
+                template = manifest_dict['spec']['volumeClaimTemplates'][0]
+                template.setdefault('spec', {}).setdefault('resources', {}).setdefault('requests', {})['storage'] = size
+            except (KeyError, IndexError) as e:
+                raise ValueError(f"Invalid manifest structure: {str(e)}")

Comment on lines +246 to +248
def get_disk_uuid(self, node_name, disk_name):
node = get_longhorn_client().by_id_node(node_name)
return node["disks"][disk_name]["diskUUID"]

⚠️ Potential issue

Ensure disk existence before accessing disk UUID.

In get_disk_uuid, accessing node["disks"][disk_name]["diskUUID"] without checking may lead to a KeyError if the disk does not exist.

Apply this diff to add a check for disk existence:

 def get_disk_uuid(self, node_name, disk_name):
     node = get_longhorn_client().by_id_node(node_name)
-    return node["disks"][disk_name]["diskUUID"]
+    try:
+        return node["disks"][disk_name]["diskUUID"]
+    except KeyError:
+        raise KeyError(f"Disk '{disk_name}' not found on node '{node_name}'")

Comment on lines +204 to +211
def set_disk_scheduling(self, node_name, disk_name, allowScheduling):
node = get_longhorn_client().by_id_node(node_name)

for name, disk in iter(node.disks.items()):
if name == disk_name:
disk.allowScheduling = allowScheduling
self.update_disks(node_name, node.disks)


⚠️ Potential issue

Check for existence of the specified disk before updating scheduling.

In set_disk_scheduling, there is no check to confirm if the specified disk_name exists in node.disks. This may lead to a KeyError if the disk does not exist.

Apply this diff to add a check for disk existence:

 def set_disk_scheduling(self, node_name, disk_name, allowScheduling):
     node = get_longhorn_client().by_id_node(node_name)

+    if disk_name not in node.disks:
+        raise KeyError(f"Disk '{disk_name}' not found on node '{node_name}'")

     for name, disk in iter(node.disks.items()):
         if name == disk_name:
             disk.allowScheduling = allowScheduling
     self.update_disks(node_name, node.disks)

Comment on lines +224 to +226
def is_disk_in_pressure(self, node_name, disk_name):
node = get_longhorn_client().by_id_node(node_name)
return node["disks"][disk_name]["conditions"]["Schedulable"]["reason"] == "DiskPressure"

⚠️ Potential issue

Add error handling for missing disk conditions.

The method is_disk_in_pressure may raise a KeyError if the expected keys are missing in the disk's conditions. It's safer to handle such cases to prevent unexpected exceptions.

Apply this diff to safely access the dictionary keys:

 def is_disk_in_pressure(self, node_name, disk_name):
     node = get_longhorn_client().by_id_node(node_name)
-    return node["disks"][disk_name]["conditions"]["Schedulable"]["reason"] == "DiskPressure"
+    disk = node["disks"].get(disk_name, {})
+    conditions = disk.get("conditions", {})
+    schedulable = conditions.get("Schedulable", {})
+    reason = schedulable.get("reason", "")
+    return reason == "DiskPressure"

Comment on lines +228 to +245
def wait_for_disk_in_pressure(self, node_name, disk_name):
for i in range(self.retry_count):
is_in_pressure = self.is_disk_in_pressure(node_name, disk_name)
logging(f"Waiting for disk {disk_name} on node {node_name} in pressure ... ({i})")
if is_in_pressure:
break
time.sleep(self.retry_interval)
assert self.is_disk_in_pressure(node_name, disk_name), f"Waiting for node {node_name} disk {disk_name} in pressure failed: {get_longhorn_client().by_id_node(node_name)}"

def wait_for_disk_not_in_pressure(self, node_name, disk_name):
for i in range(self.retry_count):
is_in_pressure = self.is_disk_in_pressure(node_name, disk_name)
logging(f"Waiting for disk {disk_name} on node {node_name} not in pressure ... ({i})")
if not is_in_pressure:
break
time.sleep(self.retry_interval)
assert not self.is_disk_in_pressure(node_name, disk_name), f"Waiting for node {node_name} disk {disk_name} not in pressure failed: {get_longhorn_client().by_id_node(node_name)}"


🛠️ Refactor suggestion

Fail explicitly when wait methods exhaust their retries.

In wait_for_disk_in_pressure and wait_for_disk_not_in_pressure, the loop is already bounded by self.retry_count, but when the condition is never met the failure surfaces only through the trailing assert, which is stripped when Python runs with -O. Raise an explicit exception instead.

Use a for/else so the method raises a TimeoutError when the condition isn't met within the allotted retries.

Apply this diff to enforce maximum retries:

 def wait_for_disk_in_pressure(self, node_name, disk_name):
     for i in range(self.retry_count):
         is_in_pressure = self.is_disk_in_pressure(node_name, disk_name)
         logging(f"Waiting for disk {disk_name} on node {node_name} in pressure ... ({i})")
         if is_in_pressure:
             break
         time.sleep(self.retry_interval)
-    assert self.is_disk_in_pressure(node_name, disk_name), f"Waiting for node {node_name} disk {disk_name} in pressure failed: {get_longhorn_client().by_id_node(node_name)}"
+    else:
+        raise TimeoutError(f"Disk {disk_name} on node {node_name} did not enter pressure state within the expected time.")

 def wait_for_disk_not_in_pressure(self, node_name, disk_name):
     for i in range(self.retry_count):
         is_in_pressure = self.is_disk_in_pressure(node_name, disk_name)
         logging(f"Waiting for disk {disk_name} on node {node_name} not in pressure ... ({i})")
         if not is_in_pressure:
             break
         time.sleep(self.retry_interval)
-    assert not self.is_disk_in_pressure(node_name, disk_name), f"Waiting for node {node_name} disk {disk_name} not in pressure failed: {get_longhorn_client().by_id_node(node_name)}"
+    else:
+        raise TimeoutError(f"Disk {disk_name} on node {node_name} did not exit pressure state within the expected time.")
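The for/else form used in the suggested fix is idiomatic Python for bounded retry loops: the else branch runs only when the loop finishes without hitting break. A minimal standalone sketch (function and parameter names hypothetical):

```python
def wait_until(condition, retries=3):
    """Poll `condition(attempt)` up to `retries` times; raise on exhaustion."""
    for i in range(retries):
        if condition(i):
            break  # success: skip the else branch
        # a real implementation would time.sleep(interval) here
    else:
        # Reached only when the loop ran to completion without break.
        raise TimeoutError("condition not met within retry budget")

wait_until(lambda i: i == 2)  # succeeds on the third attempt, returns None
```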

Comment on lines +23 to +33
def mount_disk(self, disk_name, node_name):
mount_path = os.path.join(self.DEFAULT_DISK_PATH, disk_name)
device_path = os.path.join(self.DEFAULT_VOLUME_PATH, disk_name)
cmd = f"mkdir -p {mount_path}"
res = NodeExec(node_name).issue_cmd(cmd)
cmd = f"mkfs.ext4 {device_path}"
res = NodeExec(node_name).issue_cmd(cmd)
cmd = f"mount {device_path} {mount_path}"
res = NodeExec(node_name).issue_cmd(cmd)
return mount_path


⚠️ Potential issue

Handle potential exceptions during disk mounting.

The mount_disk method executes system commands that may fail. It's advisable to handle exceptions or check the result of each command to ensure robustness.

Apply this diff to handle exceptions:

 def mount_disk(self, disk_name, node_name):
     mount_path = os.path.join(self.DEFAULT_DISK_PATH, disk_name)
     device_path = os.path.join(self.DEFAULT_VOLUME_PATH, disk_name)
     cmd = f"mkdir -p {mount_path}"
-    NodeExec(node_name).issue_cmd(cmd)
+    if not NodeExec(node_name).issue_cmd(cmd):
+        raise Exception(f"Failed to create mount path {mount_path} on node {node_name}")
     cmd = f"mkfs.ext4 {device_path}"
-    NodeExec(node_name).issue_cmd(cmd)
+    if not NodeExec(node_name).issue_cmd(cmd):
+        raise Exception(f"Failed to format device {device_path} on node {node_name}")
     cmd = f"mount {device_path} {mount_path}"
-    NodeExec(node_name).issue_cmd(cmd)
+    if not NodeExec(node_name).issue_cmd(cmd):
+        raise Exception(f"Failed to mount {device_path} to {mount_path} on node {node_name}")
     return mount_path
🧰 Tools
🪛 Ruff

31-31: Local variable res is assigned to but never used

Remove assignment to unused variable res

(F841)
