Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix/optimize the logic of finding cluster service/pod ip range #3285

Open
wants to merge 1 commit into
base: devel
Choose a base branch
from

Conversation

unilinu
Copy link

@unilinu unilinu commented Nov 28, 2024

What type of PR is this?

Fix or Optimize

What this PR does / why we need it:

According to the original logic, but fix some issues and optimize:

  1. Remove find pod ip range from pod's spec. The node.spec.podCIDR field is not the cluster pod ip range, but the node-assigned.

kubectl explain node.spec.podCIDR shows:
PodCIDR represents the pod IP range assigned to the node

  1. Correct the label selector of kube-proxy from component=kube-proxy to k8s-app=kube-proxy

  2. Similar to findPodIPRange function, add the redundant finding from kube-controller-manager.

@submariner-bot
Copy link
Contributor

🤖 Created branch: z_pr3285/unilinu/fix/optimize-finding-cluster-iprange
🚀 Full E2E won't run until the "ready-to-test" label is applied. I will add it automatically once the PR has 2 approvals, or you can add it manually.

@@ -78,22 +77,31 @@ func discoverNetwork(ctx context.Context, client controllerClient.Client) (*Clus

func findClusterIPRange(ctx context.Context, client controllerClient.Client) (string, error) {
clusterIPRange, err := findClusterIPRangeFromApiserver(ctx, client)
if err != nil || clusterIPRange != "" {
return clusterIPRange, err
if clusterIPRange != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove the error checking here and other places below? This hides the error which isn't best practice. An error here means it could not determine if the IP range could be obtained, most likely because the API server wasn't available, in which case we want to abort and report the error.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, It seems I understand wrong the purpose of these error checking. I thought this function is to detect the available ClusterIP range, but it seems to detect the availability of component/service related to ClusterIP. I will cancel the related modifications later.

In addition, I think the point 1&2 should be retained, but for point 3, I wonder if add the redundant finding from kube-controller-manager.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tpantelis Another question, the original if conditions err != nil || clusterIPRange != "" is almost true, which will cover/skip the following judging!!
In my opinion, this function is used to find the service cluster ip range. Even if occurs error, which should be detected and handled somewhere, adding a waring/error log but continuing finding is a good idea here.

If you agree, I will modify it as above. If not, looking forward to your reply.

Copy link
Contributor

@tpantelis tpantelis Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, this function is used to find the service cluster ip range. Even if occurs error, which should be detected and handled somewhere, adding a waring/error log but continuing finding is a good idea here.

If you agree, I will modify it as above. If not, looking forward to your reply.

I don't agree. Consider this scenario:

Let's say in a particular environment the first call, findClusterIPRangeFromApiserver, under normal circumstances with no error finds the cluster IP. But on a particular run a transient error occurs, eg the API server is busy and times out, in which case we would just log a warning and continue. For the rest of the code below, the API server responds and no error occurs but the cluster IP isn't found by any of the following methods. This function would return and report to the caller that no cluster IP was found and no error occurred. But this is incorrect as an error did occur but it was essentially ignored by just logging it. We must propagate errors to the caller to indicate that the result was indeterminate and to let the caller handle the error as it sees fit.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can understand it. So, each error should be checked and not be covered by assignments. However, there is no doubt that err != nil || clusterIPRange != "" condition will cover/skip the following statements. This point must be corrected. I'll modify the PR correspondingly.

Copy link
Contributor

@tpantelis tpantelis Dec 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - the error checking was fine as is. Given your recent changes and the scenario I outlined above, it would return a valid cluster IP and an error which is non-sensical and non-idiomatic.

Please just focus on the purpose of this PR as outlined in your summary and not modify the existing error handling.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your patience. I know what I misunderstood. err != nil || clusterIPRange != "" is not always true. There is a situation not finding cluster ip range from the args of apiserver/controller-manager.

I have modified my patch, removing the changes of error checking.

@unilinu unilinu force-pushed the fix/optimize-finding-cluster-iprange branch from e588a15 to 128bf17 Compare December 3, 2024 02:45
@unilinu unilinu force-pushed the fix/optimize-finding-cluster-iprange branch from 128bf17 to 66c3f51 Compare December 4, 2024 01:41
// the cluster is a single node deployment, we should rely on the node.Spec.PodCIDR as podCIDR of the cluster.
if len(nodes) == 1 {
return nodes[0].Spec.PodCIDR, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the PR summary:

The node.spec.podCIDR field is not the cluster pod ip range, but the node-assigned.

But this is only considered if there's only one node as explained in the comments above and if the pod CIDR wasn't found from any prior methods. It seems valid to keep this.

@tpantelis
Copy link
Contributor

tpantelis commented Dec 4, 2024

@unilinu The DCO job failed - you need to add Signed-off-by your commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants