
Fail with I/O timeout due to bad configuration of the Kubernetes provider #104

Closed
Nuru opened this issue Feb 10, 2021 · 7 comments · Fixed by #119
Labels
bug 🐛 An issue with the system

Comments

@Nuru
Contributor

Nuru commented Feb 10, 2021

Describe the Bug

Creating or modifying an EKS cluster sometimes fails with errors suggesting the Kubernetes provider is not properly configured. This appears to be more of a problem with Terraform 0.14 than with Terraform 0.13.

Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp [::1]:80: i/o timeout

Steps to Reproduce

Steps to reproduce the behavior:

  1. Run terraform plan
  2. See the error

Environment (please complete the following information):

  • Terraform 0.14.6
  • terraform-aws-eks-cluster 0.34.1

Update

This appears to be a bug in the Kubernetes v2 Terraform provider (see the comments below). Until v2 works the way we want, we recommend using terraform-aws-eks-cluster v0.37.0 or later and pinning the Kubernetes provider to ~> 1.13, as shown below.
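
For reference, pinning the provider in a root module looks something like this (a minimal sketch; place it wherever you keep your terraform settings block):

terraform {
  # required_providers with a "source" attribute needs Terraform 0.13 or later
  required_version = ">= 0.13"

  required_providers {
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 1.13"
    }
  }
}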

Nuru added the bug 🐛 label Feb 10, 2021
@Nuru
Contributor Author

Nuru commented Feb 10, 2021

We are investigating. This appears to be related to some or all of the following:

There is a documented warning against using the Kubernetes provider in the same Terraform configuration that creates the Kubernetes cluster itself, but we had not had a problem with that pattern in this module until recently. Although we have made several recent changes to the modules, we expect this is primarily triggered by a change in behavior in Terraform 0.14 and by the new Kubernetes provider v2.0.0.
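
For context, the pattern in question looks roughly like this (a minimal sketch, not the module's exact code): the Kubernetes provider is configured from data sources that read the EKS cluster created in the same configuration. If a plan makes those data source results unknown, the provider is left effectively unconfigured and falls back to its default of localhost, which matches the errors reported here.

data "aws_eks_cluster" "this" {
  name = aws_eks_cluster.default.id
}

data "aws_eks_cluster_auth" "this" {
  name = aws_eks_cluster.default.id
}

provider "kubernetes" {
  # When these values are unknown at plan time, the provider effectively
  # has no configuration and tries http://localhost instead.
  host                   = data.aws_eks_cluster.this.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.this.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.this.token
}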

Nuru pinned this issue Feb 10, 2021
@z0rc
Contributor

z0rc commented Feb 18, 2021

I got into a similar situation when updating an EKS cluster (provisioned by version 0.34.1 of this module) from 1.18 to 1.19. Changing the kubernetes_version attribute on an already-deployed cluster breaks the Kubernetes provider with the error from the opening post. I had to run terraform state rm module.eks.kubernetes_config_map.aws_auth[0] prior to terraform apply, then terraform import module.eks.kubernetes_config_map.aws_auth[0] kube-system/aws-auth after the apply completed, as spelled out below.
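
For clarity, the full command sequence was (quoting the resource address so the shell doesn't interpret the brackets):

$ terraform state rm 'module.eks.kubernetes_config_map.aws_auth[0]'
$ terraform apply
$ terraform import 'module.eks.kubernetes_config_map.aws_auth[0]' kube-system/aws-auth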

This case should probably be documented as a known issue, along with the workaround.

@vsimon

vsimon commented Mar 5, 2021

Thanks for investigating, @Nuru. I hit this error through a different pathway.

I'm using the cloudposse/eks-cluster module together with cloudposse/named-subnets, mostly configured like the examples:

Configuration:
module "vpc" {
  source  = "cloudposse/vpc/aws"
  version = "0.20.4"

  context    = module.this.context
  cidr_block = "10.0.0.0/16"
}

locals {
  us_east_1a_public_cidr_block  = cidrsubnet(module.vpc.vpc_cidr_block, 2, 0)
  us_east_1a_private_cidr_block = cidrsubnet(module.vpc.vpc_cidr_block, 2, 1)
  us_east_1b_public_cidr_block  = cidrsubnet(module.vpc.vpc_cidr_block, 2, 2)
  us_east_1b_private_cidr_block = cidrsubnet(module.vpc.vpc_cidr_block, 2, 3)
}

module "us_east_1a_public_subnets" {
  source  = "cloudposse/named-subnets/aws"
  version = "0.9.2"

  context           = module.this.context
  subnet_names      = ["eks"]
  vpc_id            = module.vpc.vpc_id
  cidr_block        = local.us_east_1a_public_cidr_block
  type              = "public"
  igw_id            = module.vpc.igw_id
  availability_zone = "us-east-1a"
  attributes        = ["us-east-1a"]
  # The usage of the specific kubernetes.io/cluster/* resource tags below are required
  # for EKS and Kubernetes to discover and manage networking resources
  # https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html#vpc-subnet-tagging
  tags = {
    "kubernetes.io/cluster/cluster" : "shared"
    "kubernetes.io/role/elb" : "1"
  }
}

module "us_east_1a_private_subnets" {
  source  = "cloudposse/named-subnets/aws"
  version = "0.9.2"

  context           = module.this.context
  subnet_names      = ["eks"]
  vpc_id            = module.vpc.vpc_id
  cidr_block        = local.us_east_1a_private_cidr_block
  type              = "private"
  availability_zone = "us-east-1a"
  attributes        = ["us-east-1a"]
  ngw_id            = module.us_east_1a_public_subnets.ngw_id
  # The usage of the specific kubernetes.io/cluster/* resource tags below are required
  # for EKS and Kubernetes to discover and manage networking resources
  # https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html#vpc-subnet-tagging
  tags = {
    "kubernetes.io/cluster/cluster" : "shared"
    "kubernetes.io/role/internal-elb" : "1"
  }
}

module "us_east_1b_public_subnets" {
  source  = "cloudposse/named-subnets/aws"
  version = "0.9.2"

  context           = module.this.context
  subnet_names      = ["eks"]
  vpc_id            = module.vpc.vpc_id
  cidr_block        = local.us_east_1b_public_cidr_block
  type              = "public"
  igw_id            = module.vpc.igw_id
  availability_zone = "us-east-1b"
  attributes        = ["us-east-1b"]
  # The usage of the specific kubernetes.io/cluster/* resource tags below are required
  # for EKS and Kubernetes to discover and manage networking resources
  # https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html#vpc-subnet-tagging
  tags = {
    "kubernetes.io/cluster/cluster" : "shared"
    "kubernetes.io/role/elb" : "1"
  }
}

module "us_east_1b_private_subnets" {
  source  = "cloudposse/named-subnets/aws"
  version = "0.9.2"

  context           = module.this.context
  subnet_names      = ["eks"]
  vpc_id            = module.vpc.vpc_id
  cidr_block        = local.us_east_1b_private_cidr_block
  type              = "private"
  availability_zone = "us-east-1b"
  attributes        = ["us-east-1b"]
  ngw_id            = module.us_east_1b_public_subnets.ngw_id
  # The usage of the specific kubernetes.io/cluster/* resource tags below are required
  # for EKS and Kubernetes to discover and manage networking resources
  # https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html#vpc-subnet-tagging
  tags = {
    "kubernetes.io/cluster/cluster" : "shared"
    "kubernetes.io/role/internal-elb" : "1"
  }
}

module "eks_cluster" {
  source  = "cloudposse/eks-cluster/aws"
  version = "0.34.0"

  context = module.this.context
  region  = "us-east-1"
  vpc_id  = module.vpc.vpc_id
  subnet_ids = [
    module.us_east_1a_public_subnets.named_subnet_ids["eks"],
    module.us_east_1b_public_subnets.named_subnet_ids["eks"],
    module.us_east_1a_private_subnets.named_subnet_ids["eks"],
    module.us_east_1b_private_subnets.named_subnet_ids["eks"]
  ]
  kubernetes_version                = "1.18"
  oidc_provider_enabled             = true
  enabled_cluster_log_types         = ["api", "authenticator", "controllerManager", "scheduler"]
  cluster_log_retention_period      = 90
  cluster_encryption_config_enabled = true
  map_additional_aws_accounts       = [REDACTED]
}

# Ensure ordering of resource creation to eliminate the race conditions when applying the Kubernetes Auth ConfigMap.
# Do not create Node Group before the EKS cluster is created and the `aws-auth` Kubernetes ConfigMap is applied.
# Otherwise, EKS will create the ConfigMap first and add the managed node role ARNs to it,
# and the kubernetes provider will throw an error that the ConfigMap already exists (because it can't update the map, only create it).
# If we create the ConfigMap first (to add additional roles/users/accounts), EKS will just update it by adding the managed node role ARNs.
data "null_data_source" "wait_for_cluster_and_kubernetes_configmap" {
  inputs = {
    cluster_name             = module.eks_cluster.eks_cluster_id
    kubernetes_config_map_id = module.eks_cluster.kubernetes_config_map_id
  }
}

module "eks-node-group" {
  source  = "cloudposse/eks-node-group/aws"
  version = "0.18.1"

  context = module.this.context
  subnet_ids = [
    module.us_east_1a_private_subnets.named_subnet_ids["eks"],
    module.us_east_1b_private_subnets.named_subnet_ids["eks"]
  ]
  cluster_name              = data.null_data_source.wait_for_cluster_and_kubernetes_configmap.outputs["cluster_name"]
  desired_size              = 2
  min_size                  = 1
  max_size                  = 2
}

After creation, terraform plan works fine.

When I change subnet_names in the existing us_east_1a_private_subnets and us_east_1b_private_subnets modules to ["eks", "mysql"] and run terraform plan, I see:

Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp [::1]:80: connect: connection refused

Releasing state lock. This may take a few moments...

With the subnet_names change, the debug output contains a warning: "Invalid provider configuration was supplied. Provider operations likely to fail: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable".

Logs:
2021-03-05T10:10:43.078-0800 [INFO]  plugin.terraform-provider-kubernetes_v2.0.2_x5: 2021/03/05 10:10:43 [WARN] Invalid provider configuration was supplied. Provider operations likely to fail: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable: timestamp=2021-03-05T10:10:43.078-0800
2021-03-05T10:10:43.079-0800 [INFO]  plugin.terraform-provider-kubernetes_v2.0.2_x5: 2021/03/05 10:10:43 [DEBUG] Enabling HTTP requests/responses tracing: timestamp=2021-03-05T10:10:43.078-0800
2021/03/05 10:10:43 [INFO] ReferenceTransformer: reference not found: "local.enabled"
2021/03/05 10:10:43 [INFO] ReferenceTransformer: reference not found: "var.apply_config_map_aws_auth"
2021/03/05 10:10:43 [INFO] ReferenceTransformer: reference not found: "var.kubernetes_config_map_ignore_role_changes"
2021/03/05 10:10:43 [INFO] ReferenceTransformer: reference not found: "local.map_worker_roles"
2021/03/05 10:10:43 [INFO] ReferenceTransformer: reference not found: "var.map_additional_iam_roles"
2021/03/05 10:10:43 [INFO] ReferenceTransformer: reference not found: "var.map_additional_iam_users"
2021/03/05 10:10:43 [INFO] ReferenceTransformer: reference not found: "var.map_additional_aws_accounts"
2021/03/05 10:10:43 [DEBUG] ReferenceTransformer: "module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0]" references: []
module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0]: Refreshing state... [id=kube-system/aws-auth]
2021-03-05T10:10:43.083-0800 [INFO]  plugin.terraform-provider-kubernetes_v2.0.2_x5: 2021/03/05 10:10:43 [INFO] Checking config map aws-auth: timestamp=2021-03-05T10:10:43.083-0800
2021-03-05T10:10:43.083-0800 [INFO]  plugin.terraform-provider-kubernetes_v2.0.2_x5: 2021/03/05 10:10:43 [DEBUG] Kubernetes API Request Details:
---[ REQUEST ]---------------------------------------
GET /api/v1/namespaces/kube-system/configmaps/aws-auth HTTP/1.1
Host: localhost
User-Agent: HashiCorp/1.0 Terraform/0.14.7
Accept: application/json, */*
Accept-Encoding: gzip


-----------------------------------------------------: timestamp=2021-03-05T10:10:43.083-0800
2021-03-05T10:10:43.084-0800 [INFO]  plugin.terraform-provider-kubernetes_v2.0.2_x5: 2021/03/05 10:10:43 [DEBUG] Received error: &url.Error{Op:"Get", URL:"http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth", Err:(*net.OpError)(0xc000e67cc0)}: timestamp=2021-03-05T10:10:43.084-0800

Without the change, there is no WARNING.

Logs:
2021-03-05T10:07:59.545-0800 [INFO]  plugin.terraform-provider-kubernetes_v2.0.2_x5: 2021/03/05 10:07:59 [DEBUG] Enabling HTTP requests/responses tracing: timestamp=2021-03-05T10:07:59.545-0800
2021/03/05 10:07:59 [INFO] ReferenceTransformer: reference not found: "local.enabled"
2021/03/05 10:07:59 [INFO] ReferenceTransformer: reference not found: "var.apply_config_map_aws_auth"
2021/03/05 10:07:59 [INFO] ReferenceTransformer: reference not found: "var.kubernetes_config_map_ignore_role_changes"
2021/03/05 10:07:59 [INFO] ReferenceTransformer: reference not found: "local.map_worker_roles"
2021/03/05 10:07:59 [INFO] ReferenceTransformer: reference not found: "var.map_additional_iam_roles"
2021/03/05 10:07:59 [INFO] ReferenceTransformer: reference not found: "var.map_additional_iam_users"
2021/03/05 10:07:59 [INFO] ReferenceTransformer: reference not found: "var.map_additional_aws_accounts"
2021/03/05 10:07:59 [DEBUG] ReferenceTransformer: "module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0]" references: []
module.eks_cluster.kubernetes_config_map.aws_auth_ignore_changes[0]: Refreshing state... [id=kube-system/aws-auth]
2021-03-05T10:07:59.548-0800 [INFO]  plugin.terraform-provider-kubernetes_v2.0.2_x5: 2021/03/05 10:07:59 [INFO] Checking config map aws-auth: timestamp=2021-03-05T10:07:59.548-0800
2021-03-05T10:07:59.548-0800 [INFO]  plugin.terraform-provider-kubernetes_v2.0.2_x5: 2021/03/05 10:07:59 [DEBUG] Kubernetes API Request Details:
---[ REQUEST ]---------------------------------------
GET /api/v1/namespaces/kube-system/configmaps/aws-auth HTTP/1.1
Host: [REDACTED]
User-Agent: HashiCorp/1.0 Terraform/0.14.7
Accept: application/json, */*
Authorization: Bearer [REDACTED]
Accept-Encoding: gzip

Versions:
Terraform v0.14.7
+ provider registry.terraform.io/hashicorp/aws v3.28.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.0.2
+ provider registry.terraform.io/hashicorp/local v2.0.0
+ provider registry.terraform.io/hashicorp/null v3.0.0
+ provider registry.terraform.io/hashicorp/random v3.0.1
+ provider registry.terraform.io/hashicorp/template v2.2.0

@osterman
Member

osterman commented Mar 8, 2021

It also seems related to hashicorp/terraform-provider-kubernetes#1167. Unfortunately, we're already using the data source workaround suggested there.

@osterman
Member

osterman commented Mar 8, 2021

It also seems related to #58 (comment)


@vsimon

vsimon commented May 19, 2021

Just sharing another datapoint.

Today, I tried changing the module's kubernetes_version input from 1.18 to 1.19 and hit the dial tcp [::1]:80: connect: connection refused error both locally and in CI.

$ terraform plan

Error: Get "http://localhost/api/v1/namespaces/kube-system/configmaps/aws-auth": dial tcp [::1]:80: connect: connection refused

Releasing state lock. This may take a few moments...

I wanted to avoid going down the terraform state rm ... route.

I happened to try passing the KUBE_CONFIG_PATH environment variable, with success (this assumes the ~/.kube/config file already exists and is valid):

$ KUBE_CONFIG_PATH=~/.kube/config terraform plan

  ...
  ...
  # module.eks_cluster.data.aws_eks_cluster_auth.eks[0] will be read during apply
  # (config refers to values not yet known)
 <= data "aws_eks_cluster_auth" "eks"  {
      ~ id    = "cluster-xxxx" -> (known after apply)
        name  = "cluster-xxxx"
      ~ token = (sensitive value)
    }

  # module.eks_cluster.aws_eks_cluster.default[0] will be updated in-place
  ~ resource "aws_eks_cluster" "default" {
        id                        = "cluster-xxxx"
        name                      = "cluster-xxxx"
        tags                      = {
            "Attributes" = "cluster-xxxx"
            "Name"       = "cluster-xxxx"
        }
      ~ version                   = "1.18" -> "1.19"
        # (9 unchanged attributes hidden)



        # (3 unchanged blocks hidden)
    }

Plan: 0 to add, 4 to change, 0 to destroy.

After applying, plan works again without the KUBE_CONFIG_PATH specified.
