eks-core-services module failing | Pinging Tiller!

Hello,

I am in the process of bringing up my EKS cluster using Gruntwork code.
I am able to create the cluster successfully, but when I run the eks-core-services module, it fails at the stage where "helm is trying to ping Tiller". I have set up helm on my local VM, and Tiller is deployed in EKS.

time="2020-03-06T13:35:15+05:30" level=info msg="Verified helm home directory /tmp/657119618/.helm and all its subdirectories exist" name=kubergrunt
time="2020-03-06T13:35:15+05:30" level=info msg="Initializing repository file" name=kubergrunt
time="2020-03-06T13:35:15+05:30" level=info msg="Creating helm repository file /tmp/657119618/.helm/repository/repositories.yaml" name=kubergrunt
time="2020-03-06T13:35:15+05:30" level=info msg="Initializing repository file /tmp/657119618/.helm/repository/repositories.yaml with stable repo" name=kubergrunt
time="2020-03-06T13:35:15+05:30" level=info msg="Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com" name=kubergrunt
time="2020-03-06T13:35:23+05:30" level=info msg="Verifying repository file format" name=kubergrunt
time="2020-03-06T13:35:23+05:30" level=info msg="Verifed repository file format is up to date" name=kubergrunt
time="2020-03-06T13:35:23+05:30" level=info msg="Successfully initializing repository file" name=kubergrunt
time="2020-03-06T13:35:23+05:30" level=info msg="Successfully initialized helm home directory /tmp/657119618/.helm" name=kubergrunt
time="2020-03-06T13:35:23+05:30" level=info msg="Done initializing helm home" name=kubergrunt
time="2020-03-06T13:35:23+05:30" level=info msg="Downloading TLS certificates to access specified Tiller server." name=kubergrunt
time="2020-03-06T13:35:23+05:30" level=info msg="Successfully downloaded TLS certificates." name=kubergrunt
time="2020-03-06T13:35:23+05:30" level=info msg="Generating environment file to setup helm client." name=kubergrunt
time="2020-03-06T13:35:23+05:30" level=info msg="Successfully generated environment file." name=kubergrunt
time="2020-03-06T13:35:23+05:30" level=info msg="Verifying client setup" name=kubergrunt
time="2020-03-06T13:35:23+05:30" level=info msg="Loading Kubernetes Client" name=kubergrunt
time="2020-03-06T13:35:23+05:30" level=info msg="Using direct auth methods to setup client." name=kubergrunt
time="2020-03-06T13:35:23+05:30" level=info msg="Setting up connection to Tiller Pod in Namespace kube-system" name=kubergrunt
time="2020-03-06T13:35:23+05:30" level=info msg="Using direct auth methods to setup client." name=kubergrunt
time="2020-03-06T13:35:30+05:30" level=info msg="Successfully opened tunnel to Tiller Pod in Namespace kube-system: 127.0.0.1:33609" name=kubergrunt
time="2020-03-06T13:35:30+05:30" level=info msg="Setting up new helm client with home /tmp/657119618/.helm and pinging Tiller" name=kubergrunt


  on main.tf line 56, in data "external" "configured_helm_home":
  56: data "external" "configured_helm_home" {

I see that helm has opened a tunnel to localhost; is that what is causing the problem? Do I need to configure this to point at the EKS endpoint in some way?

I am able to get all EKS cluster information from my local VM. Also, since I have an OpenVPN connection established from my local VM to the AWS account, I can access the EKS worker nodes from my local VM.

kubectl get namespace
NAME          STATUS   AGE
default       Active   3d19h
kube-public   Active   3d19h
kube-system   Active   3d19h

kubectl get nodes -o wide
NAME                            STATUS   ROLES    AGE     VERSION               INTERNAL-IP     EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
ip-172-21-103-4.ec2.internal    Ready    <none>   3d19h   v1.13.12-eks-c500e1   172.21.103.4    <none>        Amazon Linux 2   4.14.165-131.185.amzn2.x86_64   docker://18.9.9
ip-172-21-80-42.ec2.internal    Ready    <none>   3d19h   v1.13.12-eks-c500e1   172.21.80.42    <none>        Amazon Linux 2   4.14.165-131.185.amzn2.x86_64   docker://18.9.9
ip-172-21-89-120.ec2.internal   Ready    <none>   3d19h   v1.13.12-eks-c500e1   172.21.89.120   <none>        Amazon Linux 2   4.14.165-131.185.amzn2.x86_64   docker://18.9.9

kubectl get pods --namespace=kube-system -o wide
NAME                             READY   STATUS    RESTARTS   AGE     IP              NODE                            NOMINATED NODE   READINESS GATES
aws-node-2kb88                   1/1     Running   0          3d19h   172.21.89.120   ip-172-21-89-120.ec2.internal   <none>           <none>
aws-node-st75j                   1/1     Running   0          3d19h   172.21.80.42    ip-172-21-80-42.ec2.internal    <none>           <none>
aws-node-t65zb                   1/1     Running   0          3d19h   172.21.103.4    ip-172-21-103-4.ec2.internal    <none>           <none>
coredns-69bc49bfdd-2q2v2         1/1     Running   0          3d19h   172.21.96.94    ip-172-21-103-4.ec2.internal    <none>           <none>
coredns-69bc49bfdd-wjcw7         1/1     Running   0          3d19h   172.21.97.185   ip-172-21-103-4.ec2.internal    <none>           <none>
kube-proxy-bt98t                 1/1     Running   0          3d19h   172.21.103.4    ip-172-21-103-4.ec2.internal    <none>           <none>
kube-proxy-cvxrc                 1/1     Running   0          3d19h   172.21.80.42    ip-172-21-80-42.ec2.internal    <none>           <none>
kube-proxy-sxwtd                 1/1     Running   0          3d19h   172.21.89.120   ip-172-21-89-120.ec2.internal   <none>           <none>
tiller-deploy-797864d97c-pppp7   1/1     Running   0          19h     172.21.90.164   ip-172-21-89-120.ec2.internal   <none>           <none>

I have even tried configuring helm using the kubergrunt command. That also times out while pinging Tiller.

kubergrunt helm configure --tiller-namespace kube-system --resource-namespace kube-system --rbac-user sr_full_admin_access

INFO[2020-03-06T13:44:41+05:30] Setting up connection to Tiller Pod in Namespace kube-system  name=kubergrunt
INFO[2020-03-06T13:44:41+05:30] No direct auth methods provided. Using config on disk and context.  name=kubergrunt
INFO[2020-03-06T13:44:43+05:30] Successfully opened tunnel to Tiller Pod in Namespace kube-system: 127.0.0.1:33161  name=kubergrunt
INFO[2020-03-06T13:44:43+05:30] Setting up new helm client with home /home/sanoop/.helm and pinging Tiller  name=kubergrunt

I have the following files in my helm home directory.

total 32
drwxr-xr-x 2 sanoop sanoop 4096 Mar  6 13:20 starters
drwxr-xr-x 2 sanoop sanoop 4096 Mar  6 13:20 plugins
drwxr-xr-x 3 sanoop sanoop 4096 Mar  6 13:20 cache
drwxr-xr-x 4 sanoop sanoop 4096 Mar  6 13:20 repository
-rw-r--r-- 1 sanoop sanoop  227 Mar  6 13:44 key.pem
-rwx------ 1 sanoop sanoop  129 Mar  6 13:44 env
-rw-r--r-- 1 sanoop sanoop  729 Mar  6 13:44 cert.pem
-rw-r--r-- 1 sanoop sanoop  814 Mar  6 13:44 ca.pem

And my helm env file has the following contents.

export HELM_HOME=/home/sanoop/.helm
export TILLER_NAMESPACE=kube-system
export HELM_TLS_VERIFY=true
export HELM_TLS_ENABLE=true
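
For reference, this is roughly how I load those settings before running helm (with HELM_TLS_ENABLE=true, helm should pick up the ca.pem/cert.pem/key.pem files under HELM_HOME automatically, without extra flags):

# Load the generated helm client settings into the current shell
source /home/sanoop/.helm/env
helm version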

I have also set the HELM_HOST environment variable, but it still does not connect to Tiller.

export HELM_HOST=172.21.90.164:44134

helm version --debug --tiller-connection-timeout 4

Client: &version.Version{SemVer:"v2.16.2", GitCommit:"bbdfe5e7803a12bbdf97e94cd847859890cf4050", GitTreeState:"clean"}
[debug] SERVER: "172.21.90.164:44134"

Kubernetes: &version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.12-eks-eb1860", GitCommit:"eb1860579253bb5bf83a5a03eb0330307ae26d18", GitTreeState:"clean", BuildDate:"2019-12-23T08:58:45Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"linux/amd64"}
[debug] context deadline exceeded
Error: cannot connect to Tiller


@yoriy @jim @josh-padnick Can you please help here?

Hi Sanoop,

Can you describe your local VM setup a bit more? In particular, what are the network settings on the VM? Also, how did you launch the EKS cluster, and what customizations have you made to it?

172.21.90.164 is actually the cluster-local IP of the Tiller server and is only accessible from within the EKS cluster. This is why it won't work if you try to set HELM_HOST to that address directly.

The way helm access works is that the client creates a tunnel to the pod using the same mechanism as kubectl port-forward (see https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/ for more info), and then accesses Tiller through the local port forward. Since it looks like the port forward is being opened correctly, that is probably not the issue. The fact that it is timing out suggests that a firewall is blocking access through the local port forward.
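
If you want to take kubergrunt out of the equation, you can open the same kind of tunnel by hand and point helm at it. A rough sketch, using the Tiller pod name from your kubectl output above (Tiller listens on 44134):

# Open a port forward to the Tiller pod, same mechanism kubergrunt uses
kubectl port-forward -n kube-system tiller-deploy-797864d97c-pppp7 44134:44134

# In a second shell, point helm at the local end of the tunnel and ping Tiller over TLS
HELM_HOST=127.0.0.1:44134 helm version --tls --tls-verify \
  --tls-ca-cert /home/sanoop/.helm/ca.pem \
  --tls-cert /home/sanoop/.helm/cert.pem \
  --tls-key /home/sanoop/.helm/key.pem

If that also times out, the problem is somewhere in the port forwarding path; if it fails with a TLS error instead, the tunnel itself is fine.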

There are a few ways this kind of block can happen:

  • You have network restrictions that prevent you from accessing the opened port forward locally.
  • The security group is preventing port forwarding between the EKS control plane and the workers.
  • A local firewall is preventing port forwarding between the kubelet on the worker and the docker container.

I would check each of these to make sure they are open in your setup. Here are some leading questions to help:

  • Did you install any firewalls on the workers? In particular, ip-lockdown and older versions of fail2ban are known to cause problems on the EKS optimized AMIs.
  • Did you customize the security group rules on the worker nodes?
  • Do you know of any network configurations locally that might prevent the port forwarding from working?

I would also suggest using the redis example in the docs to debug this, as it is easier to ping redis than it is to ping Tiller, which uses gRPC.
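
Something like this is usually enough for the connectivity check (a minimal sketch, not the exact example from the docs):

# Run a throwaway redis pod and forward a local port to it
kubectl run redis-test --image=redis:5 --restart=Never -n kube-system
kubectl port-forward -n kube-system redis-test 6379:6379

# In a second shell: if port forwarding works end to end, this prints PONG
redis-cli -h 127.0.0.1 -p 6379 ping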

Yori

Thank you!

I tried the port forward with redis, and the redis ping worked perfectly, which means there is no network connectivity or firewall issue.
Later I did a packet capture on the tunnel port and noticed that the failure is caused by a bad certificate: the server (Tiller pod) is sending a Bad Certificate (42) TLS alert back to the client.
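
For reference, this is roughly how I captured it (the port is whatever kubergrunt prints when it opens the tunnel, 33161 in my run above):

# Watch the loopback tunnel port while helm pings Tiller; a fatal TLS alert
# with description 42 (bad_certificate) shows up in the captured bytes
sudo tcpdump -i lo -nn -X port 33161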

I looked at the certificate that was downloaded into my helm home (/home/sanoop/.helm) and at the certificates in the Tiller pod (/etc/cert). The two do not match.
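
This is roughly how I compared them; the secret name is from the listing below, and the exact data keys inside the secret may differ, so check with kubectl describe first:

# Fingerprint of the CA that kubergrunt downloaded into my helm home
openssl x509 -in /home/sanoop/.helm/ca.pem -noout -fingerprint

# Fingerprint of the CA the Tiller server was actually deployed with
# (assuming the secret stores it under a ca.crt key)
kubectl get secret kube-system-namespace-tiller-certs -n kube-system \
  -o 'jsonpath={.data.ca\.crt}' | base64 -d | openssl x509 -noout -fingerprint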

I am not sure where it is downloading the wrong certificates from, or whether there is something wrong in the way I am looking at them.
I see multiple secrets in the kube-system namespace when I list them. Is that causing the problem?

kubectl get secrets -n kube-system | grep certs
applications-tiller-namespace-tiller-ca-certs          Opaque                                3      2d21h
kube-system-namespace-tiller-ca-certs                  Opaque                                3      3d21h
kube-system-namespace-tiller-certs                     Opaque                                4      3d21h
tiller-client-23b0749d7d3a9ee3c0b024a86fe3e1c2-certs   Opaque                                4      3d21h
tiller-client-2e3a9c6ce8da519bb53fafcea0f28db4-certs   Opaque                                4      3d21h
tiller-client-a7cfa110b509eb5603c721481b1595f8-certs   Opaque                                4      3d23h
tiller-client-c3e2a2a00ed25bff37d4bc0902c56f42-certs   Opaque                                4      3d22h

The problem was caused by the multiple secrets. I deleted all of those secrets in the kube-system namespace and created only the required one, and it worked!
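
Roughly what I ran (secret names from the listing above; double-check before deleting anything in kube-system):

# Remove the stale Tiller client cert secrets
kubectl delete secret -n kube-system \
  tiller-client-23b0749d7d3a9ee3c0b024a86fe3e1c2-certs \
  tiller-client-2e3a9c6ce8da519bb53fafcea0f28db4-certs \
  tiller-client-a7cfa110b509eb5603c721481b1595f8-certs \
  tiller-client-c3e2a2a00ed25bff37d4bc0902c56f42-certs

# Regenerate a single client cert and a fresh local helm config
kubergrunt helm configure --tiller-namespace kube-system \
  --resource-namespace kube-system --rbac-user sr_full_admin_access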

Thanks a lot!

Thanks for closing the loop!